IT4BI MSc Thesis in 2015
Representing ETL Flows with BPMN 2.0
Extract, Transform and Load (ETL) processes are widely used in data warehousing to extract, cleanse and load data into a centralized location for better analysis and decision-making. As users increasingly demand online decision-making, ETL processes grow larger and more complex. Most processes are deployed at the physical level without any abstraction, so maintenance costs and the effort required for reuse are considerable. Logical and conceptual abstractions of ETL processes make such tasks substantially easier. In this thesis, we provide an algorithm that automatically translates a logical ETL flow into its BPMN representation. To achieve this, we create a dictionary that defines simple and composite ETL flow patterns and their corresponding BPMN elements. The pattern dictionary follows a formalized grammar and can be extended with additional ETL flow patterns. As a result, we can produce conceptual ETL flows in BPMN 2.0 format that the business user can edit further. The patterns defined in the dictionary abstract away the technical details and complexity of the ETL flows and make the semantics of the output model more intuitive and understandable for business users, as shown during the validation of the approach.
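The dictionary-driven translation described above can be illustrated with a minimal sketch. It is not the thesis's algorithm or its actual dictionary: the operator names, the pattern entries, the fallback to a generic `task`, and the restriction to a linear flow are all assumptions made for illustration. Only the BPMN 2.0 element names (`serviceTask`, `exclusiveGateway`, `parallelGateway`, `subProcess`, `task`) come from the standard.

```python
# Hypothetical pattern dictionary (illustrative entries, not taken from the thesis).
# Simple patterns map one logical ETL operator to one BPMN 2.0 element.
SIMPLE_PATTERNS = {
    "extract": "serviceTask",
    "filter": "exclusiveGateway",
    "join": "parallelGateway",
    "load": "serviceTask",
}

# Composite patterns map a sequence of operators to a single BPMN 2.0 element.
COMPOSITE_PATTERNS = {
    ("sort", "aggregate"): "subProcess",
}


def translate(flow):
    """Translate a linear list of operator names into (bpmn_element, label) pairs.

    Assumption: the logical flow is a simple sequence; real ETL flows are graphs.
    """
    result = []
    i = 0
    while i < len(flow):
        matched = False
        # Try composite patterns first, longest first, so that they take
        # precedence over the simple patterns they contain.
        for pattern in sorted(COMPOSITE_PATTERNS, key=len, reverse=True):
            n = len(pattern)
            if tuple(flow[i:i + n]) == pattern:
                result.append((COMPOSITE_PATTERNS[pattern], "+".join(pattern)))
                i += n
                matched = True
                break
        if not matched:
            op = flow[i]
            # Operators without a dictionary entry fall back to a generic task.
            result.append((SIMPLE_PATTERNS.get(op, "task"), op))
            i += 1
    return result


if __name__ == "__main__":
    for element, label in translate(["extract", "sort", "aggregate", "load"]):
        print(element, label)
```

In this sketch, extending the dictionary amounts to adding an entry to one of the two mappings. Matching composite patterns before simple ones is one possible way to hide technical detail behind a single conceptual element.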