Business Intelligence
Alberto Abelló, Petar Jovanovic, Sergi Nadal, Oscar Romero

Description
Business Intelligence (BI) is the set of techniques and tools that empowers an organisation with the capability of collecting and analyzing internal and external data to generate knowledge and value, providing decision support at the strategic, tactical, and operational levels. This traditionally includes the areas of Data Warehousing, OLAP (descriptive analysis), and Data Mining (predictive analysis). In the group, we mainly focus on the two main areas.
Data Warehousing refers to the extraction of data from the sources and its storage and management in a common, integrated, long-lasting repository with temporal capabilities (both Valid Time and Transaction Time), with the ultimate purpose of analyzing it. Sometimes, from a theoretical point of view, a Data Warehouse is simply defined as a set of Materialized Views. Optimally selecting and updating such Materialized Views has been an active research area in recent years.
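As a minimal illustration of this view (a toy sketch with invented data, not the group's formalism), a materialized view is a stored query result that must be kept consistent with its sources; incremental maintenance applies only the delta instead of recomputing the view from scratch:

```python
from collections import defaultdict

# Toy source "fact table": (product, region, amount) rows.
sales = [
    ("tv", "EU", 100),
    ("tv", "US", 200),
    ("phone", "EU", 150),
]

def compute_view(rows):
    """Full recomputation: total amount per product."""
    view = defaultdict(int)
    for product, _region, amount in rows:
        view[product] += amount
    return view

mv = compute_view(sales)  # materialized: stored once, not re-derived per query

def refresh(view, delta_rows):
    """Incremental maintenance: fold in only the newly arrived rows."""
    for product, _region, amount in delta_rows:
        view[product] += amount

delta = [("phone", "US", 50)]
sales.extend(delta)
refresh(mv, delta)
assert mv == compute_view(sales)  # incremental refresh matches recomputation
```

The trade-off sketched here (query speed from precomputation versus the cost of keeping the view fresh) is precisely what makes view selection and maintenance an optimization problem.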
On the other hand, OLAP (standing for On-Line Analytical Processing) tools are those that allow the navigation of data by means of the Multidimensional Model, based on the Data Cube metaphor. Thus, cubes are defined in terms of a Star Schema composed of a Fact, the subject of analysis, and different Dimensions around it that facilitate operations like Roll-up, Drill-down, Slice, Dice, etc. Such a conceptual schema can be implemented in different technologies (named ROLAP if the DBMS is Relational), but in any case, it must result in high performance of the aggregation operations (most of the time obtained by precomputing query results).
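The core operations can be sketched over a toy star schema (invented data and names, independent of any particular OLAP tool): a Sales fact with Time and Place dimensions, each carrying an aggregation hierarchy (day to month, city to country):

```python
from collections import defaultdict

# Toy fact data: (day, city, amount).
facts = [
    ("2024-01-03", "Barcelona", 10),
    ("2024-01-15", "Madrid",    20),
    ("2024-02-07", "Barcelona", 30),
]

def month_of(day):
    return day[:7]  # "2024-01-03" -> "2024-01"

country_of = {"Barcelona": "ES", "Madrid": "ES"}  # city -> country level

cube = {(d, c): a for d, c, a in facts}

def roll_up(cube):
    """Roll-up: aggregate from (day, city) up to (month, country)."""
    out = defaultdict(int)
    for (day, city), amount in cube.items():
        out[(month_of(day), country_of[city])] += amount
    return dict(out)

def slice_(cube, city):
    """Slice: fix one dimension value, dropping that dimension."""
    return {day: a for (day, c), a in cube.items() if c == city}

print(roll_up(cube))            # {('2024-01', 'ES'): 30, ('2024-02', 'ES'): 30}
print(slice_(cube, "Barcelona"))  # {'2024-01-03': 10, '2024-02-07': 30}
```

Precomputing the `roll_up` result and storing it is exactly the aggregate-materialization strategy mentioned above for achieving high performance.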
Finally, it is important to acknowledge the relevance of automation in this context, given that the users (typically executives) of these tools are not necessarily experts in Information Technologies. Thus, some effort is being devoted to providing Self-service capabilities that hide the technological complexity underneath.
Research line: Multidimensional Conceptual Modelling
We have proposed YAM², a multidimensional conceptual model for OLAP defined as an extension of UML (Unified Modeling Language). The aim was to benefit from Object-Oriented concepts and relationships to allow the definition of semantically rich multi-star schemas. Thus, the usage of Generalization, Association, Derivation, and Flow relationships (in UML terminology) was studied.
An architecture based on different levels of schemas was also proposed and the characteristics of its different levels defined. The benefits of this architecture are twofold. Firstly, it relates Federated Information Systems with Data Warehousing, so that advances in one area can also be used in the other. Moreover, the Data Mart schemas are defined so that they can be implemented on different Database Management Systems, while still offering a common integrated vision that allows users to navigate through the different stars.
The main concepts of any multidimensional model are facts and dimensions. Both were analyzed separately, based on the assumption that relationships between aggregation levels are part-whole (or composition) relationships. Thus, mereology axioms were used in that analysis to prove some properties.
Besides structures, operations and integrity constraints were also defined for YAM². Since a data cube was defined as a function, the operations (i.e., Drill-across, ChangeBase, Roll-up, Projection, and Selection) were defined over functions. The set of integrity constraints reflects the importance of summarizability (or aggregability) of measures, and pays special attention to it.

Research Line: Automating the Multidimensional Design of Data Warehouses
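The cube-as-function view can be sketched as follows (a toy rendering with invented data; YAM² itself is defined over UML, not code): a cube maps dimension coordinates to a measure, and operations map cubes to cubes.

```python
from collections import defaultdict

# A cube as a partial function from coordinates (month, country) to a measure.
cube = {
    ("2024-01", "ES"): 30,
    ("2024-02", "ES"): 30,
    ("2024-01", "FR"): 5,
}

def selection(cube, pred):
    """Selection: restrict the cube's domain by a predicate on coordinates."""
    return {c: m for c, m in cube.items() if pred(c)}

def roll_up(cube, to_level, agg=sum):
    """Roll-up: compose coordinates with a level mapping, aggregating each group."""
    groups = defaultdict(list)
    for coord, measure in cube.items():
        groups[to_level(coord)].append(measure)
    return {c: agg(ms) for c, ms in groups.items()}

# Roll Time up from month to year, summing the measure:
yearly = roll_up(cube, lambda c: (c[0][:4], c[1]))
print(yearly)  # {('2024', 'ES'): 60, ('2024', 'FR'): 5}
```

Passing the aggregation function explicitly (`agg`) is also where summarizability constraints bite: not every measure may be legally summed along every dimension.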
Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources as first-class citizens. Like in any other system, requirements guarantee that the system devised meets the end-user needs. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end-user to discover unknown additional analysis capabilities.
Several methods for supporting the data warehouse modeling task have been provided. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore do not consider that the data sources may contain alternative interesting evidence for analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which misleads the user. Furthermore, the automation of the design task is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the method chosen, as well as the need to analyze the data sources, which is a tedious and time-consuming task (and can be unfeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook the process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario is repeated in the data-driven and requirement-driven stages within current hybrid approaches, which suffer from the same drawbacks as pure data-driven or requirement-driven approaches.
In this research line we introduced two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations previously discussed. On the one hand, we rely on the end-user requirements, but we do not deny that the data sources may also contain hidden analysis capabilities that, eventually, may be of interest. Nevertheless, in no case do we generate overwhelming amounts of results from the sources. On the contrary, we aim at filtering, by means of objective evidence, the results obtained by analyzing the sources. Importantly, our approaches start from opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens. Furthermore, we also focus on the automation of the process, to facilitate the designer's task as much as possible.
Related publications
Alberto Abelló, James Cheney: Eris: efficiently measuring discord in multidimensional sources. VLDB J. 2024
Amine Ghrab, Oscar Romero, Sabri Skhiri, Esteban Zimányi: TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes. Inf. Syst. Frontiers 2021
Enrico Gallinucci, Matteo Golfarelli, Stefano Rizzi, Alberto Abelló, Oscar Romero: Interactive multidimensional modeling of linked data for exploratory OLAP. Inf. Syst. 2018
Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló: Requirement-Driven Creation and Deployment of Multidimensional and ETL Designs. ER Workshops 2012
Oscar Romero, Alberto Abelló: MDBE: Automatic Multidimensional Modeling. ER 2008
Alberto Abelló, José Samos, Fèlix Saltor: Benefits of an Object-Oriented Multidimensional Data Model. Objects and Databases 2000