ETL operation taxonomy for the context of data generation




Comparison of ETL operations through selected ETL tools


Operation Level (Taxonomy) | Operation Type | Pentaho PDI | Talend Data Integration | SSIS | Oracle Warehouse Builder
--- | --- | --- | --- | --- | ---
Attribute | Attribute Value Alteration | Add constant; Formula; Number ranges; Add sequence; Calculator; Add a checksum | tMap; tConvertType; tReplaceList | Character Map; Derived Column; Copy Column; Data Conversion | Constant Operator; Expression Operator; Data Generator; Transformation; Mapping Sequence
Dataset | Duplicate Removal | Unique Rows; Unique Rows (HashSet) | tUniqRow | Fuzzy Grouping | Deduplicator
Dataset | Sort | Sort Rows | tSortRow | Sort | Sorter
Dataset | Sampling | Reservoir Sampling; Sample Rows | tSampleRow | Percentage Sampling; Row Sampling |
Dataset | Aggregation | Group by; Memory Group by | tAggregateRow; tAggregateSortedRow | Aggregate | Aggregator
Dataset | Dataset Copy | | tReplicate | Multicast |
Entry | Duplicate Row | Clone Row | tRowGenerator | |
Entry | Filter | Filter Rows; Data Validator | tFilterRow; tMap; tSchemaComplianceCheck | Conditional Split | Filter
Entry | Join | Merge Join; Stream Lookup; Database lookup; Merge Rows; Multiway Merge Join; Fuzzy Match | tJoin; tFuzzyMatch | Merge Join; Fuzzy Lookup | Joiner; Key Lookup Operator
Entry | Router | Switch/Case | tMap | Conditional Split | Splitter
Entry | Set Operation - Intersect | Merge Rows (diff) | tMap | Merge Join | Set Operation
Entry | Set Operation - Difference | Merge Rows (diff) | tMap | | Set Operation
Entry | Set Operation - Union | Sorted Merge; Append streams | tUnite | Merge; Union All | Set Operation
Schema | Attribute Addition | Set field value; Set field value to a constant; String operations; Strings cut; Replace in string; Formula; Split Fields; Concat Fields; Add value fields changing sequence; Sample rows | tMap; tExtractRegexFields; tAddCRCRow | Derived Column; Character Map; Row Count; Audit Transformation | Constant Operator; Expression Operator; Data Generator; Mapping Input/Output parameter
Schema | Datatype Conversion | Select Values | tConvertType | Data Conversion | Anydata Cast Operator
Schema | Attribute Renaming | Select Values | tMap | Derived Column |
Schema | Projection | Select Values | tFilterColumns | |
Relation | Pivoting | Row Denormalizer | tDenormalize; tDenormalizeSortedRow | Pivot | Unpivot
Relation | Unpivoting | Row Normalizer; Split field to rows | tNormalize; tSplitRow | Unpivot | Pivot
Value | Single Value Alteration | If field value is null; Null if; Modified Java Script Value; SQL Execute | tMap; tReplace | Derived Column | Constant Operator; Expression Operator; Match-Merge Operator; Mapping Input/Output parameter
Source Operation | Extraction | CSV file input; Microsoft Excel Input; Table input; Text file input; XML Input | tFileInputDelimited; tDBInput; tFileInputExcel | ADO.NET / DataReader Source; Excel Source; Flat File Source; OLE DB Source; XML Source | Table Operator; Flat File Operator; Dimension Operator; Cube Operator
Target Operation | Loading | Text file output; Microsoft Excel Output; Table output; XML Output | tFileOutputDelimited; tDBOutput; tFileOutputExcel | Dimension Processing; Excel Destination; Flat File Destination; OLE DB Destination; SQL Server Destination | Table Operator; Flat File Operator; Dimension Operator; Cube Operator
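
One way to put this comparison to use is as a lookup from a tool-specific operator to its taxonomy level and operation type. The sketch below encodes only four rows of the table (Duplicate Removal, Filter, Router, Projection). The names `TAXONOMY`, `build_reverse_index`, and `classify` are hypothetical and do not come from any of the tools. Because one operator can implement several operation types (for example, Talend's tMap), the lookup returns a list rather than a single type.

```python
from collections import defaultdict

# Subset of the comparison table: (level, operation type) -> tool -> operators.
# Only four rows are included; the layout of this dict is an illustrative assumption.
TAXONOMY = {
    ("Dataset", "Duplicate Removal"): {
        "Pentaho PDI": ["Unique Rows", "Unique Rows (HashSet)"],
        "Talend Data Integration": ["tUniqRow"],
        "SSIS": ["Fuzzy Grouping"],
        "Oracle Warehouse Builder": ["Deduplicator"],
    },
    ("Entry", "Filter"): {
        "Pentaho PDI": ["Filter Rows", "Data Validator"],
        "Talend Data Integration": ["tFilterRow", "tMap", "tSchemaComplianceCheck"],
        "SSIS": ["Conditional Split"],
        "Oracle Warehouse Builder": ["Filter"],
    },
    ("Entry", "Router"): {
        "Pentaho PDI": ["Switch/Case"],
        "Talend Data Integration": ["tMap"],
        "SSIS": ["Conditional Split"],
        "Oracle Warehouse Builder": ["Splitter"],
    },
    ("Schema", "Projection"): {
        "Pentaho PDI": ["Select Values"],
        "Talend Data Integration": ["tFilterColumns"],
        # SSIS and Oracle Warehouse Builder cells are empty in the table.
    },
}


def build_reverse_index(taxonomy):
    """Map (tool, operator) to every (level, operation type) it appears under."""
    index = defaultdict(list)
    for (level, op_type), tools in taxonomy.items():
        for tool, operators in tools.items():
            for operator in operators:
                index[(tool, operator)].append((level, op_type))
    return dict(index)


INDEX = build_reverse_index(TAXONOMY)


def classify(tool, operator):
    """Return the taxonomy entries for a tool operator, or [] if it is not listed."""
    return INDEX.get((tool, operator), [])


if __name__ == "__main__":
    print(classify("Talend Data Integration", "tMap"))
    print(classify("SSIS", "Conditional Split"))
    print(classify("Oracle Warehouse Builder", "Deduplicator"))
```

An operator missing from the encoded subset yields an empty list, which mirrors the empty cells in the table rather than raising an error.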