Blog Categories

DataStage Data Transformation

David Cross

Data Analyst

DataStage Data Transformation

Transforming Data with the Transformer Stage 

The Transformer stage is used within a DataStage job (referred to as DS for the remainder of this article) to filter, transform, or re-direct data. Overall, you can define conditions in the Transformer that will affect the output. In this article, we will explore some of the ways that you can use the Transformer. 
 
Another common use of the Transformer, is splitting the output to multiple tables or databases. You can also test for specific conditions, and if it fails, you can define a ‘reject’ link, and examine the failed data.     
These concepts are best demonstrated with examples.   
 

Transform Types 

There are 3 general types of transformations.  

  • Derivations change data. 
  • Constraints filter data 
  • Rejections direct data. 

Job Configuration 

We have created a simple Parallel job with a source DB2 connector and the same for the target. The Transformer links the two connectors. Both the source and target have been configured. The three different types of Transforms will be described as follows:

A Simple Parallel Job with Transformer Stage

Filter/Constraint – 1

Filters, or Constraints, can allow rows to pass that satisfy specific conditions. For example, you can filter customer data so that only Canadian customers can pass into a table. In addition, you can split data into multiple outputs, based on conditions. We could have one table populated with customers from Canada only, and another table populated with customers from any other country, excluding Canada. This is what the filter would look like the stage using ‘Constraints’.

Parallel Job using the Transformer Stage with Multiple Outputs

Derivations – 2

Derivations are typically used in conjunction with Stage Variables. As stated previously, Derivations are used to change data. Let us look at an example.

We have a field in our source table named ACTIVE. The value is either ‘Y’ or ‘N’. Our requirement in the Target table field ACTIVE is for that to be displayed as either ‘Active’ or ‘Inactive’. A Stage Variable ‘AcctActiveFlag’ is created to hold the output. An If/Else statement tests for the input values and outputs to ‘AcctActiveFlag’.


‘Y’ becomes ‘Active’ and ‘N’ becomes ‘Inactive’ in the target field. There are also various functions available (Date, Time, Number, String, and Type Conversion) that can assist us in converting data. We could change a null to a value using the ‘NullToValue’ function for instance.

Reject Links – 3

We can think of Reject Links as a direction type of transformation. Data that passes our criteria will continue to the Target table. Data that fails our criteria is ‘re-directed’ to a separate link and output (ex: text file). We can then examine the data and ascertain the reason for the rejection.

Loop Processing

This is yet another function available in the Transformer Stage. Multiple output rows can be written from an input row. Looping is useful for writing repeating values to the output for an input, as in grouping.

There are two components to Loop Processing:

  • Loop Condition – This is a True/False expression that is evaluated after a row is read.
  • Loop Variable – While the condition is met, the Variable will continue to execute.

 Note ** You must ensure that the condition eventually changes to false. This is necessary to prevent an infinite loop state.
In the following example, we have a source table with a ‘MODEL’ field. The model names are delimited by the backslash character (Input Table). However, we need to split them. Our desired result is that each model will be on a separate row (Output Table).

Input Table
Output Table

To implement this, we would create a Stage Variable ‘CarCount’ in the Transformer Stage to hold the count of the fields. In the derivation we would create an expression assigned to split and count for ‘CarCount’. It would look like this:

Stage variable CarCount

Next, we set our Loop Condition as less than or equal to our Stage Variable (CarCount).

Furthermore, we create a Loop Variable ‘modelLoopVar’ that is assigned the value of the Derivation formula:             Field(to_transformer.MODEL, “/”, @ITERATION, 1)

Drag the Loop Variable ‘modelLoopVar’ down to the Input Derivation. Your Transformer configuration should look like this:

Conclusion


The Transformer stage is quite useful at filtering and modifying data. However, it does have some drawbacks. The overhead when using a Transformer is relatively high. Keep this in mind when designing your job. A good practise is to examine all the requirements, then build your job accordingly. 

If performance is paramount, DS provides different stages, where the functionality overlaps. These alternatives have a smaller performance ‘’cost’.
If you just want to split data to multiple outputs, you would be better served using the ‘Copy’ stage instead of the ‘Transformer’. Likewise, for filtering, use the ‘Filter’ stage. However, what makes the Transformer appealing is that it has all the features of the Copy, Filter, and Switch stage combined.

IBM’s online documentation provides a listing and description for every DS stage here at DS Stages


Collect and Analyze your Data with Indellient

Need support in collecting, organizing, storing, or reading your data? Indellient has teams of experts that can help you no matter how big or small the project. Contact us today to have a no-obligation conversation on your specific needs.

Indellient is a Software Development Company that specializes in Data AnalyticsCloud Application Development, Managed IT SolutionsDevOps Services, and Blue Relay.

Learn More

About The Author

Hello, my name is David Cross, Data Analyst at Indellient. The skill set I have developed is honed from over 20 years in the IT field, after graduating from Sheridan College's Systems Analyst program as a Sheridan Scholar. I have an enthusiasm for both Data Analytics, and the Extraction and Transformation of data between systems (ETL).