In general when requirement come to implement SCD type 2, we look out for any date field in the Target. If not date field we go for SCD type 2 with flag column, still we can go for SCD type two with versioning. This all scenario holds good when there is a date column or flag column in the table it’s easy for a developer to implement SCD type2.
Let’s say we come across a situation where there are no date columns or flag column, then how to implement SCD type 2. Usually we come across such situation when you are dealing with very old legacy system which doesn’t have such fields in the table. Then the solution is to generate hash code for the entire row and use it as identifier column for reference. To achieve this informatica allow us to use hash function namely MD5 () – Message Digest and CRC32 () – Cyclic Redundancy Check. The hash code in ETL perspective is referred as checksum value
The above is the mapping for the same. So to begin with in expression transformation create two new output ports like below
And edit the expression of the two new ports as below
MD5_CHECKSUM à MD5 (COL1||COL2|COL3||…..||COLn)
CRC32_CHECKSUM à CRC32(COL1||COL2|COL3||…..||COLn)
So how does it works, we are passing a concat value of all the fields to these hash function. It will create a unique hash value for these columns and we can use it for further unique key reference.
NOTE: It’s good to handle NULL values in the column which are used by NVL2() function to avoid any data discrepancies.
Next in look up transformation look up on the primary key for any changes and pass primary key and MD5_CHECKSUM and CRC32_CHECKSUM to a router transformation and populate rest fields from the source.
In Router create groups for insert n update rows as below
INSERT à ISNULL(LKP_PK)
UPDATE à NOT ISNULL(LKP_PK) AND ( LKP_MD5_CHECKSUM != MD5_CHECKSUM AND LKP_CRC_CHECKSUM != CRC32_CHECKSUM )
Here you can use either of MD5 or CRC32 function, I have used both just for understabd process. So once you have driven the records for INSERT and UPDATE rows perform SCD type 2 then.
NOTE: Tacke care of Nulls and spcaces in the column your are performing concat operation and passing it to MD5() function, Nulls create a big problem here as teh value generated by MD5 won't be unique.
MD5(IIF(ISNULL(TRIM(COL1)),'A',TRIM(COL1))||IIF(ISNULL(TRIM(COL2)),'A',TRIM(COL2))||IIF(ISNULL(TRIM(COL3)),'A',TRIM(COL3)).....)
This way you can implement SCD type 2 without bothering about DATE or FLAG columns in the table.
MD5(IIF(ISNULL(TRIM(COL1)),'A',TRIM(COL1))||IIF(ISNULL(TRIM(COL2)),'A',TRIM(COL2))||IIF(ISNULL(TRIM(COL3)),'A',TRIM(COL3)).....)
This way you can implement SCD type 2 without bothering about DATE or FLAG columns in the table.
Thnaks,
Hari