January 2013
Héctor Candón, Alberto Abelló, Petar Jovanovic, Sergi Nadal, Oscar Romero, Vasileios Theodorou
The Quarry project is one of the pillar projects of the DTIM research group. Throughout the years, the project has gathered many researchers and PhD, Master, and Bachelor students, all working together towards the final goal of providing an end-to-end system for assisting users of various technical skills in managing the incremental design and deployment of analytical infrastructures (e.g., MD schemata and ETL processes).
The main idea behind Quarry is to automate the complex and time-consuming task of incremental data warehouse (DW) design from high-level information requirements. Moreover, Quarry provides tools for efficiently accommodating MD schema and ETL process designs to the new or changed information needs of its end users. Finally, Quarry facilitates the deployment of the generated DW design over an extensible list of execution engines.
Nomenclature (source: http://dictionary.reference.com/)
noun, plural quarries: an excavation or pit, usually open to the air, from which building stone, slate, or the like, is obtained by cutting, blasting, etc.
verb (used with object), quarried, quarrying: to obtain (stone) from or as if from a quarry.
In our context: starting from the raw conceptual knowledge of the sources available for analysis, in Quarry we plan to identify, cut, excavate, transform, and integrate pieces to create the infrastructure that suits the analytical needs of business users.
Quarry comprises four core components: Requirements Elicitor, Requirements Interpreter, Design Integrator, and Design Deployer; as well as the Communication&Metadata layer.
To support non-expert users in providing their information requirements, Quarry offers a graphical component, the Requirements Elicitor. The Requirements Elicitor connects to the Requirements Interpreter component, which semi-automatically generates a validated partial MD schema and ETL process design for each input information requirement. Quarry further provides the Design Integrator component, comprising two modules that integrate the partial MD schema and ETL process designs processed so far and generate unified design solutions satisfying the complete set of requirements. At each step, after integrating the partial designs of a new requirement, Quarry guarantees the soundness of the unified design solutions and the satisfiability of all requirements processed so far. The produced DW design solutions are then sent to the Design Deployer component for the initial deployment of the DW schema and of the ETL process that populates it. The deployed design solutions are then available for use and for further user-preferred tuning.
To support intra- and cross-platform communication, Quarry includes a generic Communication&Metadata layer into which other components can plug in to communicate with the platform.
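The incremental pipeline described above can be sketched as follows. This is a minimal illustration only: all class, function, and field names here are hypothetical and do not reflect Quarry's actual API; the point is the flow of one requirement at a time through interpretation, integration, and deployment.

```python
# Hypothetical sketch of Quarry's component pipeline:
# Requirements Elicitor -> Requirements Interpreter -> Design Integrator -> Design Deployer.
from dataclasses import dataclass, field

@dataclass
class Requirement:
    """One information requirement, as captured by the Requirements Elicitor."""
    text: str

@dataclass
class PartialDesign:
    """A validated partial MD schema and ETL process design for one requirement."""
    requirement: Requirement
    md_schema: str
    etl_process: str

@dataclass
class UnifiedDesign:
    """Unified design solution covering all requirements processed so far."""
    md_schemas: list = field(default_factory=list)
    etl_processes: list = field(default_factory=list)
    satisfied: list = field(default_factory=list)

def interpret(req: Requirement) -> PartialDesign:
    # Requirements Interpreter: derive a partial design from one requirement.
    return PartialDesign(req,
                         md_schema=f"MD({req.text})",
                         etl_process=f"ETL({req.text})")

def integrate(unified: UnifiedDesign, partial: PartialDesign) -> UnifiedDesign:
    # Design Integrator: fold the new partial design into the unified solution,
    # recording which requirements are satisfied so far.
    unified.md_schemas.append(partial.md_schema)
    unified.etl_processes.append(partial.etl_process)
    unified.satisfied.append(partial.requirement)
    return unified

def deploy(unified: UnifiedDesign) -> str:
    # Design Deployer: emit the DW schema and the ETL process that populates it.
    return (f"deployed {len(unified.md_schemas)} schema parts, "
            f"{len(unified.etl_processes)} ETL parts")

# Incremental processing: requirements arrive and are integrated one at a time.
unified = UnifiedDesign()
for text in ["revenue per region", "monthly active users"]:
    unified = integrate(unified, interpret(Requirement(text)))
print(deploy(unified))  # deployed 2 schema parts, 2 ETL parts
```

The design choice worth noting is that integration is incremental: each new requirement is folded into the existing unified solution rather than triggering a redesign from scratch, which mirrors Quarry's guarantee that all previously processed requirements remain satisfied.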
The Quarry project has resulted in several conference and journal publications, and involved many successful Bachelor and Master theses.
Related publications, Master and Bachelor theses:
- Quarry: Digging Up the Gems of Your Data Treasury. Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló, Héctor Candón, Sergi Nadal. EDBT 2015: 549-552
- Requirement elicitor (GEM):
- GEM: Requirement-Driven Generation of ETL and Multidimensional Conceptual Designs. Oscar Romero, Alkis Simitsis, Alberto Abelló. DaWaK 2011: 80-95
- Requirement-Driven Creation and Deployment of Multidimensional and ETL Designs. Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. ER Workshops 2012: 391-395
- Integration of Multidimensional and ETL design. Petar Jovanovic, Master Thesis, 2011
- GEM. Petar Jovanovic, Oscar Romero, Alberto Abelló, Alkis Simitsis, eBISS, 2013
- MD Schema integrator (ORE):
- ORE: an iterative approach to the design and evolution of multi-dimensional schemas. Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. DOLAP 2012: 1-8
- A requirement-driven approach to the design and evolution of data warehouses. Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló, Daria Mayorova. Inf. Syst. 44: 94-119 (2014)
- Implementation of the multidimensional schemas integration method ORE. Daria Mayorova, Master Thesis, 2013
- ETL Process integrator (CoAl):
- Incremental Consolidation of Data-Intensive Multi-flows. Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. IEEE Trans. Knowl. Data Eng., in press (2016)
- Integrating ETL Processes from Information Requirements. Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. DaWaK 2012: 65-80
- CoAl: Incremental requirement-driven design and deployment of data intensive flows. Petar Jovanovic, Oscar Romero, Alberto Abelló, Alkis Simitsis, eBISS, 2014
- Communication and Metadata (Minecart):
- Towards Next Generation BI Systems: The Analytical Metadata Challenge. Jovan Varga, Oscar Romero, Torben Bach Pedersen, Christian Thomsen. DaWaK 2014: 89-101
- SM4AM: A Semantic Metamodel for Analytical Metadata. Jovan Varga, Oscar Romero, Torben Bach Pedersen, Christian Thomsen. DOLAP 2014: 57-66
- Metadata Management for Knowledge Discovery. Varunya Thavornun, Master Thesis, 2015
- Semi-Automatic Ontology Matching and Enrichment. Rizkallah Touma, Master Thesis, 2015.
- The Minecart Project: A Wee Step Towards BI 2.0. Héctor Candón Arenas, Bachelor Thesis, 2014
- Iterative optimization (Forge):
- Multi-Objective Materialized View Selection in Data-Intensive Flows. Sergi Nadal, Master Thesis, 2015