Tuesday 21 August 2012

Joiner Transformation

  • Used to join source data from two related heterogeneous sources residing in Different locations or file systems. Or, we can join data  from the same source.
  • If we need to join 3 tables, then we need 2 Joiner Transformations.
  • The Joiner transformation joins two sources with at least one matching port. The Joiner transformation uses a condition that matches one or more pairs of Ports between the two sources.
JOIN CONDITION:
The join condition contains ports from both input sources that must match for the Power Center Server to join two rows.
Example: DEPTNO=DEPTNO1.
  1. Edit Joiner Transformation -> Condition Tab
  2. Add condition
  • We can add as many conditions as needed.
  • Only = operator is allowed.
If we join Char and Varchar data types, the Power Center Server counts any spaces that pad Char values as part of the string. So if you try to join the following:
Char (40) = “abcd” and Varchar (40) = “abcd”
Then the Char value is “abcd” padded with 36 blank spaces, and the Power Center Server does not join the two fields because the Char field contains trailing spaces.
Note: The Joiner transformation does not match null values.
MASTER and DETAIL TABLES
In Joiner, one table is called as MASTER and other as DETAIL.
  • MASTER table is always cached. We can make any table as MASTER.
  • Edit Joiner Transformation -> Ports Tab -> Select M for Master table.
Table with less number of rows should be made MASTER to improve Performance.
Reason:
  • When the Power Center Server processes a Joiner transformation, it reads rows from both sources concurrently and builds the index and data cache based on the master rows. So table with fewer rows will be read fast and cache can be made as table with more rows is still being read.
  • The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
JOINER TRANSFORMATION PROPERTIES TAB
  •  Case-Sensitive String Comparison: If selected, the Power Center Server uses case-sensitive string comparisons when performing joins on string columns.
  •  Cache Directory: Specifies the directory used to cache master or detail rows and the index to these rows.
  •  Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
 Tracing Level
 Joiner Data Cache Size
 Joiner Index Cache Size
 Sorted Input
JOIN TYPES
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation acts in much the same manner, except that tables can originate from different databases or flat files.
Types of Joins:
  • Normal
  • Master Outer
  • Detail Outer
  • Full Outer
Note: A normal or master outer join performs faster than a full outer or detail outer join.
Example: In EMP, we have employees with DEPTNO 10, 20, 30 and 50. In DEPT, we have DEPTNO 10, 20, 30 and 40. DEPT will be MASTER table as it has less rows.
Normal Join:
With a normal join, the Power Center Server discards all rows of data from the master and detail source that do not match, based on the condition.
  • All employees of 10, 20 and 30 will be there as only they are matching.
Master Outer Join:
This join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
  • All data of employees of 10, 20 and 30 will be there.
  • There will be employees of DEPTNO 50 and corresponding DNAME and LOC Columns will be NULL.
Detail Outer Join:
This join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
  • All employees of 10, 20 and 30 will be there.
  • There will be one record for DEPTNO 40 and corresponding data of EMP columns will be NULL.
Full Outer Join:
A full outer join keeps all rows of data from both the master and detail sources.
  • All data of employees of 10, 20 and 30 will be there.
  • There will be employees of DEPTNO 50 and corresponding DNAME and LOC Columns will be NULL.
  • There will be one record for DEPTNO 40 and corresponding data of EMP Columns will be NULL.
USING SORTED INPUT
  • Use to improve session performance.
  • to use sorted input, we must pass data to the Joiner transformation sorted by the ports that are used in Join Condition.
  • We check the Sorted Input Option in Properties Tab of the transformation.
  • If the option is checked but we are not passing sorted data to the Transformation, then the session fails.
  • We can use SORTER to sort data or Source Qualifier in case of Relational tables.
JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds Index cache and Data Cache based on MASTER table.
1) Joiner Index Cache:
  • All Columns of MASTER table used in Join condition are in JOINER INDEX CACHE.
· Example: DEPTNO in our mapping.
2) Joiner Data Cache:
  • Master column not in join condition and used for output to other transformation or target table are in Data Cache.
· Example: DNAME and LOC in our mapping example.
Performance Tuning:
  • Perform joins in a database when possible.
  • Join sorted data when possible.
  • For a sorted Joiner transformation, designate as the master source the source with fewer duplicate key values.
  • Joiner can’t be used in following conditions:
  1. Either input pipeline contains an Update Strategy transformation.
  2. We connect a Sequence Generator transformation directly before the Joiner transformation.

No comments:

Post a Comment