
Publications

2016
  • Vasileios Theodorou, Alberto Abelló, Wolfgang Lehner, Maik Thiele. Quality measures for ETL processes: from goals to implementation. In Concurrency and Computation: Practice and Experience, 28(15). John Wiley & Sons, 2016. Pages 3969-3993. ISSN: 1532-0634. DOI: 10.1002/cpe.3729
    Extraction transformation loading (ETL) processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and there is a need for a more human-centric approach to bring them closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for the evaluation and analysis of alternative design decisions.
  • Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. Incremental Consolidation of Data-Intensive Multi-Flows. In Transactions on Knowledge and Data Engineering, 28(5). IEEE Press, May 2016. Pages 1203-1216. ISSN: 1041-4347. DOI: 10.1109/TKDE.2016.2515609
    Business intelligence (BI) systems depend on efficient integration of disparate and often heterogeneous data. The integration of data is governed by data-intensive flows and is driven by a set of information requirements. Designing such flows is in general a complex process, which due to the complexity of business environments is hard to do manually. In this paper, we deal with the challenge of efficient design and maintenance of data-intensive flows and propose an incremental approach, namely CoAl, for semi-automatically consolidating data-intensive flows satisfying a given set of information requirements. CoAl works at the logical level and consolidates data flows from either high-level information requirements or platform-specific programs. As CoAl integrates a new data flow, it opts for maximal reuse of existing flows and applies a customizable cost model tuned for minimizing the overall cost of a unified solution. We demonstrate the efficiency and effectiveness of our approach through an experimental evaluation using our implemented prototype.
  • Petar Jovanovic, Oscar Romero, Alberto Abelló. A Unified View of Data-Intensive Flows in Business Intelligence Systems: A Survey. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIX. Lecture Notes in Computer Science 10120. Springer, 2016. Pages 66-107. ISBN (printed): 978-3-662-54036-7. ISBN (online): 978-3-662-54037-4. ISSN: 0302-9743. DOI: 10.1007/978-3-662-54037-4_3
    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing the challenges that are still to be addressed, and showing how current solutions can be applied to address them.
  • Ayman Alserafi, Alberto Abelló, Oscar Romero, Toon Calders. Towards Information Profiling: Data Lake Content Metadata Management. In the 3rd Workshop on Data Integration and Applications (DINA) held in conjunction with IEEE International Conference on Data Mining Workshops (ICDMW). Barcelona, December 12-15, 2016. IEEE, 2016. ISBN (online): 978-1-5090-5910-2. ISBN: 978-1-5090-5911-9. DOI: 10.1109/ICDMW.2016.0033
    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently no systematic approach to this kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to handle this effectively. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach.
  • Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel. Towards Intelligent Data Analysis: The Metadata Challenge. In International Conference on Internet of Things and Big Data (IoTBD). Rome (Italy), April 23-25, 2016. ScitePress, 2016. Pages 331-338. ISBN: 978-989-758-183-0. DOI: 10.5220/0005876203310338
    Once analyzed correctly, data can yield substantial benefits. The process of analyzing the data and transforming it into knowledge is known as Knowledge Discovery in Databases (KDD). The plethora and subtleties of algorithms in the different steps of KDD render it challenging. Effective user support is of crucial importance, even more now, when the analysis is performed on Big Data. Metadata is the necessary component to drive user support. In this paper we study the metadata required to provide user support on every stage of the KDD process. We show that intelligent systems addressing the problem of user assistance in KDD are incomplete in this regard. They do not exploit the full potential of metadata to enable assistance during the whole process. We present a comprehensive classification of all the metadata required to provide user support. Furthermore, we present our implementation of a metadata repository for storing and managing this metadata and explain its benefits in a real Big Data analytics project.
  • Petar Jovanovic. Requirement-Driven Design and Optimization of Data-Intensive Flows. PhD Thesis, Universitat Politècnica de Catalunya. Barcelona, September 2016.

    Data have become the number one asset of today's business world. Thus, their exploitation and analysis have attracted the attention of people from different fields and with different technical backgrounds. Data-intensive flows are central processes in today's business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. However, designing and optimizing such data flows, to satisfy both users' information needs and agreed quality standards, have long been known as burdensome tasks, typically left to the manual efforts of a BI system designer. These tasks have become even more challenging for next generation BI systems, where data flows typically need to combine data from in-house transactional storages, and data coming from external sources, in a variety of formats (e.g., social media, governmental data, news feeds). Moreover, to make an impact on business outcomes, data flows are expected to answer unanticipated analytical needs of a broader set of business users and deliver valuable information in near real-time (i.e., at the right time). These challenges largely indicate a need for boosting the automation of the design and optimization of data-intensive flows. This PhD thesis aims at providing automatable means for managing the lifecycle of data-intensive flows. The study primarily analyzes the remaining challenges to be solved in the field of data-intensive flows, by performing a survey of current literature, and envisioning an architecture for managing the lifecycle of data-intensive flows. Following the proposed architecture, we further focus on providing automatic techniques for covering different phases of the data-intensive flows' lifecycle. In particular, the thesis first proposes an approach (CoAl) for incremental design of data-intensive flows, by means of multi-flow consolidation.
CoAl not only facilitates the maintenance of data flow designs in front of changing information needs, but also supports the multi-flow optimization of data-intensive flows, by maximizing their reuse. Next, in the data warehousing (DW) context, we propose a complementary method (ORE) for incremental design of the target DW schema, along with systematically tracing the evolution metadata, which can further facilitate the design of back-end data-intensive flows (i.e., ETL processes). The thesis then studies the problem of implementing data-intensive flows into deployable formats of different execution engines, and proposes the BabbleFlow system for translating logical data-intensive flows into executable formats, spanning single or multiple execution engines. Lastly, the thesis focuses on managing the execution of data-intensive flows on distributed data processing platforms, and to this end, proposes an algorithm (H-WorD) for supporting the scheduling of data-intensive flows by workload-driven redistribution of data in computing clusters. The overall outcome of this thesis is an end-to-end platform for managing the lifecycle of data-intensive flows, called Quarry. The techniques proposed in this thesis, plugged into the Quarry platform, largely reduce the manual effort and assist users of different technical skills in their analytical tasks. Finally, the results of this thesis largely contribute to the field of data-intensive flows in today's BI systems, and advocate for further attention by both academia and industry to the problems of design and optimization of data-intensive flows.
  • Alberto Abelló, Xavier Burgués, María José Casany, Carme Martín, Maria Carme Quer, M. Elena Rodríguez, Oscar Romero, Antoni Urpí. A software tool for e-assessment of relational database skills. In International Journal of Engineering Education, 32(3). Tempus Publications, February 2016. Pages 1289-1312. ISSN: 0949-149X/91
    The objective of this paper is to present a software tool for the e-assessment of relational database skills of students. The tool is referred to as LearnSQL (Learning Environment for Automatic Rating of Notions of SQL). LearnSQL is able to correct, provide automatic feedback on, and grade the responses to relational database exercises. It can assess the acquisition of knowledge and practical skills in relational databases that are not assessed by other systems. The paper also reports on the impact of using the tool over the past 8 years by 2500 students.
  • Petar Jovanovic, Oscar Romero, Toon Calders, Alberto Abelló. H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution. In 20th East European Conference on Advances in Databases and Information Systems (ADBIS). Prague (Czech Republic), August 28-31, 2016. Lecture Notes in Computer Science 9809, Springer, 2016. Pages 306-320. ISBN: 978-3-319-44038-5. DOI: 10.1007/978-3-319-44039-2_21
    Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality for reducing network traffic. In such systems the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, for timely bringing data close to the execution. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.
  • Emmanouil Valsomatzis, Torben Bach Pedersen, Alberto Abelló, Katja Hose, Laurynas Siksnys. Towards constraint-based aggregation of energy flexibilities. In poster session in Seventh International Conference on Future Energy Systems (e-Energy 2016). Waterloo, ON (Canada), June 21-24, 2016. ACM, 2016. Pages 6:1-6:2. ISBN: 978-1-4503-4417-3. DOI: 10.1145/2939912.2942351
    The aggregation of energy flexibilities enables individual producers and/or consumers with small loads to directly participate in the emerging energy markets. On the other hand, aggregation of such flexibilities might also create problems for the operation of the electrical grid. In this paper, we present the problem of aggregating energy flexibilities taking into account grid capacity limitations and introduce a heuristic aggregation technique. We show through an experimental setup that our proposed technique, compared to a baseline approach, not only leads to a valid unit commitment result that respects the grid constraint, but also improves the quality of the result.
  • Victor Herrero, Alberto Abelló, Oscar Romero. NOSQL Design for Analytical Workloads: Variability Matters. In 35th International Conference on Conceptual Modeling (ER). Gifu (Japan), November 14-17, 2016. Lecture Notes in Computer Science 9974. Springer, 2016. Pages 50-64. ISBN: 978-3-319-46396-4. DOI: 10.1007/978-3-319-46397-1_4
    Big Data has recently gained popularity and has strongly questioned relational databases as universal storage systems, especially in the presence of analytical workloads. As a result, co-relational alternatives, commonly known as NOSQL (Not Only SQL) databases, are extensively used for Big Data. As the primary focus of NOSQL is on performance, NOSQL databases are directly designed at the physical level, and consequently the resulting schema is tailored to the dataset and access patterns of the problem at hand. However, we believe that NOSQL design can also benefit from traditional design approaches. In this paper we present a method to design databases for analytical workloads. Starting from the conceptual model and adopting the classical 3-phase design used for relational databases, we propose a novel design method considering the new features brought by NOSQL and encompassing relational and co-relational design altogether.
  • Rana Faisal Munir, Oscar Romero, Alberto Abelló, Besim Bilalli, Maik Thiele, Wolfgang Lehner. ResilientStore: A Heuristic-Based Data Format Selector for Intermediate Results. In 6th International Conference on Model and Data Engineering (MEDI). Almería (Spain), September 21-23, 2016. Lecture Notes in Computer Science 9893. Springer, 2016. Pages 42-56. ISBN: 978-3-319-45546-4. DOI: 10.1007/978-3-319-45547-1_4
    Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed, these workflows generate large intermediate results, which are typically pipelined from one operator to the following. However, if materialized, these results become reusable, hence, subsequent workflows need not recompute them. There are already many solutions that materialize intermediate results, but all of them assume a fixed data format. A fixed format, however, may not be the optimal one for every situation. For example, it is well-known that different data fragmentation strategies (e.g., horizontal and vertical) behave better or worse according to the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists in selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.
  • Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel. Automated Data Pre-processing via Meta-learning. In 6th International Conference on Model and Data Engineering (MEDI). Almería (Spain), September 21-23, 2016. Lecture Notes in Computer Science 9893. Springer, 2016. Pages 194-208. ISBN: 978-3-319-45546-4. DOI: 10.1007/978-3-319-45547-1_16
    A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives and non-experienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
  • Stefano Rizzi, Enrico Gallinucci, Matteo Golfarelli, Alberto Abelló, Oscar Romero. Towards Exploratory OLAP on Linked Data. In 24th Italian Symposium on Advanced Database Systems (SEBD). Ugento, Lecce (Italy), June 19-22, 2016. Matematicamente.it, 2016. Pages 86-93. ISBN: 9788896354889
    In the context of exploratory OLAP, coupling the information wealth of linked data with the precision and detail of corporate data can greatly improve the effectiveness of the decision-making process. In this paper we outline an approach that enables users to extend the hierarchies in their corporate cubes through a user-guided process that explores selected linked data and derives hierarchies from them. This is done by identifying in the linked data the recurring modeling patterns that express roll-up relationships between RDF concepts and translating them into multidimensional knowledge.
  • Emmanouil Valsomatzis, Torben Bach Pedersen, Alberto Abelló, Katja Hose. Aggregating energy flexibilities under constraints. In 2016 IEEE International Conference on Smart Grid Communications (SmartGridComm 2016). Sydney (Australia), 6-9 November 2016. IEEE, 2016. Pages 484-490. ISBN: 978-1-5090-4075-9. DOI: 10.1109/SmartGridComm.2016.7778808
    The flexibility of individual energy prosumers (producers and/or consumers) has drawn a lot of attention in recent years. Aggregation of such flexibilities provides prosumers with the opportunity to directly participate in the energy market and at the same time reduces the complexity of scheduling the energy units. However, aggregated flexibility should support normal grid operation. In this paper, we build on the flex-offer (FO) concept to model the inherent flexibility of a prosumer (e.g., a single flexible consumption device such as a clothes washer). An FO captures flexibility in both time and amount dimensions. We define the problem of aggregating FOs taking into account grid power constraints. We also propose two constraint-based aggregation techniques that efficiently aggregate FOs while retaining flexibility. We show through a comprehensive evaluation that our techniques, in contrast to state-of-the-art techniques, respect the constraints imposed by the electrical grid. Moreover, our techniques also reduce the scheduling input size significantly and improve the quality of scheduling results.
  • Esteban Zimányi, Alberto Abelló (Editors). Business Intelligence. Tutorial Lectures of 5th European Summer School in Business Intelligence (eBISS). Barcelona (Spain), July 5-10, 2015. In Lecture Notes in Business Information Processing, 253. Springer, 2016. ISBN: 978-3-319-39242-4. DOI: 10.1007/978-3-319-39243-1
2015
  • Oscar Romero, Victor Herrero, Alberto Abelló, Jaume Ferrarons. Tuning small analytics on Big Data: Data partitioning and secondary indexes in the Hadoop ecosystem. In Information Systems, 54. Pages 336-356. Elsevier, December 2015. ISSN: 0306-4379. DOI: 10.1016/j.is.2014.09.005
    In recent years, the problems of using generic storage (i.e., relational) techniques for very specific applications have been detected and outlined and, as a consequence, some alternatives to Relational DBMSs (e.g., HBase) have bloomed. Most of these alternatives sit on the cloud and benefit from cloud computing, which is nowadays a reality that helps us save money by eliminating fixed hardware and software costs in favor of a pay-per-use model. On top of this, specific querying frameworks to exploit the brute force in the cloud (e.g., MapReduce) have also been devised. The question that arises next is whether this (rather naive) exploitation of the cloud is an alternative to tuning DBMSs, or whether it still makes sense to consider other options when retrieving data from these settings. In this paper, we study the feasibility of solving OLAP queries with Hadoop (the Apache project implementing MapReduce) while benefiting from secondary indexes and partitioning in HBase. Our main contribution is the comparison of different access plans and the definition of criteria (i.e., cost estimation) to choose among them in terms of consumed resources (namely CPU, bandwidth and I/O).
  • Alberto Abelló, Oscar Romero, Torben Bach Pedersen, Rafael Berlanga Llavori, Victoria Nebot, María José Aramburu Cabo, Alkis Simitsis. Using Semantic Web Technologies for Exploratory OLAP: A Survey. In IEEE Transactions on Knowledge and Data Engineering, 27(2). Pages 571-588. IEEE, February 2015. ISSN: 1041-4347. DOI: 10.1109/TKDE.2014.2330822
    This paper describes the convergence of some of the most influential technologies in the last few years, namely data warehousing (DW), on-line analytical processing (OLAP), and the Semantic Web (SW). OLAP is used by enterprises to derive important business-critical knowledge from data inside the company. However, the most interesting OLAP queries can no longer be answered on internal data alone; external data must also be discovered (most often on the web), acquired, integrated, and (analytically) queried, resulting in a new type of OLAP, exploratory OLAP. When using external data, an important issue is knowing the precise semantics of the data. Here, SW technologies come to the rescue, as they allow semantics (ranging from very simple to very complex) to be specified for web-available resources. SW technologies not only support capturing the "passive" semantics, but also support active inference and reasoning on the data. The paper first presents a characterization of DW/OLAP environments, followed by an introduction to the relevant SW foundation concepts. Then, it describes the relationship of multidimensional (MD) models and SW technologies, including the relationship between MD models and SW formalisms. Next, the paper goes on to survey the use of SW technologies for data modeling and data provisioning, including semantic data annotation and semantic-aware extract, transform, and load (ETL) processes. Finally, all the findings are discussed and a number of directions for future research are outlined, including SW support for intelligent MD querying, using SW technologies for providing context to data warehouses, and scalability issues.
  • Alberto Abelló. Big Data Design. In 18th International Workshop on Data Warehousing and OLAP (DOLAP). Melbourne (Australia), November 2015. ACM Press, 2015. Pages 35-38. ISBN: 978-1-4503-3785-4. DOI 10.1145/2811222.2811235
    It is widely accepted today that Relational databases are not appropriate in highly distributed shared-nothing architectures of commodity hardware that need to handle poorly structured, heterogeneous data. This has brought the blooming of NoSQL systems with the purpose of mitigating this problem, especially in the presence of analytical workloads. Thus, the change in the data model and the new analytical needs beyond OLAP lead us to rethink methods and models to design and manage these newborn repositories. In this paper, we analyze the state of the art and future research directions.
  • Vasileios Theodorou, Alberto Abelló, Maik Thiele, Wolfgang Lehner. POIESIS: a Tool for Quality-aware ETL Process Redesign. In demonstration session in 18th International Conference on Extending Database Technology (EDBT). Brussels (Belgium), March 2015. Open Proceedings, 2015. Pages 545-548. ISBN 978-3-89318-067-7
    We present a tool, called POIESIS, for automatic ETL process enhancement. ETL processes are essential data-centric activities in modern business intelligence environments and they need to be examined through a viewpoint that concerns their quality characteristics (e.g., data quality, performance, manageability) in the era of Big Data. POIESIS responds to this need by providing a user-centered environment for quality-aware analysis and redesign of ETL flows. It generates thousands of alternative flows by adding flow patterns to the initial flow, in varying positions and combinations, thus creating alternative design options in a multidimensional space of different quality attributes. Through the demonstration of POIESIS we introduce the tool's capabilities and highlight its efficiency, usability and modifiability, thanks to its polymorphic design.
  • Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló, Héctor Candón, Sergi Nadal. Quarry: Digging Up the Gems of Your Data Treasury. In demonstration session in 18th International Conference on Extending Database Technology (EDBT). Brussels (Belgium), March 2015. Open Proceedings, 2015. Pages 549-552. ISBN 978-3-89318-067-7
    The design lifecycle of a data warehousing (DW) system is primarily led by requirements of its end-users and the complexity of underlying data sources. The process of designing a multidimensional (MD) schema and back-end extract-transform-load (ETL) processes is a long-term and mostly manual task. As enterprises shift to more real-time and ’on-the-fly’ decision making, business intelligence (BI) systems require automated means for efficiently adapting a physical DW design to frequent changes of business needs. To address this problem, we present Quarry, an end-to-end system for assisting users of various technical skills in managing the incremental design and deployment of MD schemata and ETL processes. Quarry automates the physical design of a DW system from high-level information requirements. Moreover, Quarry provides tools for efficiently accommodating MD schema and ETL process designs to new or changed information needs of its end-users. Finally, Quarry facilitates the deployment of the generated DW design over an extensible list of execution engines. On-site, we will use a variety of examples to show how Quarry manages the complexity of the DW design lifecycle.
2014
  • Ruth Raventós, Stephany García, Oscar Romero, Alberto Abelló, and Jaume Viñas. On the Complexity of Requirements Engineering for Decision-Support Systems: The CID Case Study. In Fourth European Business Intelligence Summer School (eBISS'14). Lecture Notes in Business Information Processing, Volume 205. Springer, July 2015. Pages 1-38. ISBN (printed): 978-3-319-17551-5. ISBN (electronic): 978-3-319-17550-8. DOI: 10.1007/978-3-319-17551-5
    The Chagas disease is classified as a life-threatening disease by the World Health Organization (WHO) and is currently causing death to 534,000 people every year. In order to advance with the disease control, the WHO presented a strategy that included the development of the Chagas Information Database (CID) for surveillance to raise awareness about Chagas. CID is defined as a decision-support system to support national and international authorities in both their day-to-day and long-term decision making. The requirements engineering for this project was particularly complex, and Pohl’s framework was followed. This paper describes the results of applying the framework in this project. Thus, it focuses on the requirements engineering stage. The difficulties found motivated the further study and analysis of the complexity of requirements engineering in decision-support systems and the feasibility of using said framework.
  • Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló, Daria Mayorova. A requirement-driven approach to the design and evolution of data warehouses. Information Systems, Volume 44. Pages 94-119. Elsevier, August 2014. ISSN: 0306-4379. DOI: 10.1016/j.is.2014.01.004
    Designing data warehouse (DW) systems in highly dynamic enterprise environments is not an easy task. At each moment, the multidimensional (MD) schema needs to satisfy the set of information requirements posed by the business users. At the same time, the diversity and heterogeneity of the data sources need to be considered in order to properly retrieve needed data. Frequent arrival of new business needs requires that the system be adaptable to changes. To cope with such an inevitable complexity (both at the beginning of the design process and when potential evolution events occur), in this paper we present a semi-automatic method called ORE, for creating DW designs in an iterative fashion based on a given set of information requirements. Requirements are first considered separately. For each requirement, ORE expects the set of possible MD interpretations of the source data needed for that requirement (in a form similar to an MD schema). Incrementally, ORE builds the unified MD schema that satisfies the entire set of requirements and meets some predefined quality objectives. We have implemented ORE and performed a number of experiments to study our approach. We have also conducted a limited-scale case study to investigate its usefulness to designers.
  • Alberto Abelló, Boualem Benatallah, Ladjel Bellatreche (Eds.). Special Issue on: Model and Data Engineering. J. Data Semantics 3(3). Springer, 2014. Pages 141-142. ISSN (printed): 1861-2032. ISSN (electronic): 1861-2040. DOI: 10.1007/s13740-013-0033-1

  • Vasileios Theodorou, Alberto Abelló, Wolfgang Lehner. Quality Measures for ETL Processes. 16th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Munich (Germany), September 2-4, 2014. Pages 9-22. Lecture Notes in Computer Science 8646, Springer 2014. ISBN (printed): 978-3-319-10159-0. ISBN (electronic): 978-3-319-10160-6. DOI: 10.1007/978-3-319-10160-6_2
    ETL processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of Business Process Management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and there is a need for a more human-centric approach to bring them closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using Goal Modeling techniques.
  • Emona Nakuçi, Vasileios Theodorou, Petar Jovanovic, Alberto Abelló. Bijoux: Data Generator for Evaluating ETL Process Quality. In 17th International Workshop on Data Warehousing and OLAP (DOLAP). Shanghai (China), November 2014. ACM Press, 2014. Pages 23-32. ISBN: 978-1-4503-0999-8. DOI: 10.1145/2666158.2666183
    Obtaining the right set of data for evaluating the fulfillment of different quality standards in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to different privacy constraints, while providing a synthetic set of data is known to be a labor-intensive task that needs to take various combinations of process parameters into account. Additionally, having a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such a demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over data, and automatically generates testing datasets. At the same time, it considers different dataset and transformation characteristics (e.g., size, distribution, selectivity) in order to cover a variety of test scenarios. We report our experimental findings showing the effectiveness and scalability of our approach.
  • Vasileios Theodorou, Alberto Abelló, Maik Thiele, Wolfgang Lehner. A Framework for User-Centered Declarative ETL. In 17th International Workshop on Data Warehousing and OLAP (DOLAP). Shanghai (China), November 2014. ACM Press, 2014. Pages 67-70. ISBN: 978-1-4503-0999-8. DOI: 10.1145/2666158.2666178
    As business requirements evolve with increasing information density and velocity, there is a growing need for efficiency and automation of Extract-Transform-Load (ETL) processes. Current approaches to the modeling and optimization of ETL processes provide platform-independent optimization solutions for the (semi-)automated transition among different abstraction levels, focusing on cost and performance. However, the suggested representations are not abstract enough to communicate business requirements, and the role of process quality in a user-centered perspective has not yet been adequately examined. In this paper, we introduce a novel methodology for the end-to-end design of ETL processes that takes into consideration both functional and non-functional requirements. Based on existing work, we raise the level of abstraction for the conceptual representation of ETL operations and we show how process quality characteristics can generate specific patterns in the process design.
  • Alberto Abelló, Ramon Bragós, Margarita Cabrera, Antonia Cortés, Àlex Fabra, Josep Fernández, José Lázaro, Jordi Amorós, Neus Arroyo, Francesc Garófano, Daniel González, Aleix Guash, Ferran Recio. Plataforma per a la interoperabilitat de laboratoris virtuals i remots. Revista de Tecnologia, Número 5, 2014. Pages 35-43. ISSN (printed): 1698-2045. ISSN (electronic): 2013-9861. DOI: 10.2436/20.2004.01.14

2013
  • Unleashing the Potential of Big Data. A white paper based on the 2013 World Summit on Big Data and Organization Design. http://www.e-pages.dk/aarhusuniversitet/775/
    "While knowledge is the engine of the economy, Big Data is its fuel." This characterization of Big Data was made by Ms. Neelie Kroes, European Commission Vice President in charge of the digital agenda for Europe. Kroes calls Big Data the "new oil". For traditional industries and the service sector, Big Data will create a huge number of commercial opportunities. For the public sector, Big Data offers a promising route to service improvement and transparency as well as a tool for making infrastructure and other investments. Politicians and policymakers are aware of both the potential and the dangers of Big Data. In 2012, the Obama Administration launched the Big Data Research and Development Initiative in the United States, and the European Commission (EC) is taking steps to remove obstacles to the use of Big Data through legislation, standards setting, and its R&D programmes. Hand-in-hand with new data-protection legislation, the EC wants to formulate an overall cybersecurity strategy to ensure that individual and organizational data are properly used and protected. Alongside harmonized rules for how data is handled, the EC is pushing for standards to allow the interoperability and integration of data. Other government initiatives focus on technological development and infrastructure projects. This White Paper offers ideas and recommendations to further increase the value of Big Data initiatives while protecting against their risks. Governments, universities, and business all have a role to play in this endeavor, and we hope that decision makers will find the paper helpful as they pursue their respective tasks.
  • Alberto Abelló, Jérôme Darmont, Lorena Etcheverry, Matteo Golfarelli, José-Norberto Mazón, Felix Naumann, Torben Bach Pedersen, Stefano Rizzi, Juan Trujillo, and Gottfried Vossen. Fusion Cubes: Towards Self-Service Business Intelligence. In International Journal on Data Warehousing and Mining (IJDWM), volume 9, number 2. Idea Group, 2013. Pages 66-88. ISSN: 1548-3924. DOI: 10.4018/jdwm.2013040104
    Self-service business intelligence is about enabling non-expert users to make well-informed decisions by enriching the decision process with situational data, i.e., data that have a narrow focus on a specific business problem and, typically, a short lifespan for a small group of users. Often, these data are not owned and controlled by the decision maker; their search, extraction, integration, and storage for reuse or sharing should be accomplished by decision makers without any intervention by designers or programmers. The goal of this paper is to present the framework we envision to support self-service business intelligence and the related research challenges; the underlying core idea is the notion of fusion cubes, i.e., multidimensional cubes that can be dynamically extended both in their schema and their instances, and in which situational data and metadata are associated with quality and provenance annotations.
  • Oscar Romero, Alberto Abelló. Semantic Aware Business Intelligence. In Third European Business Intelligence Summer School (eBISS'13). Lecture Notes in Business Information Processing, Volume 172. Pages 121-149. Springer, July 2014. ISBN (printed): 978-3-319-05460-5. ISBN (electronic): 978-3-319-05461-2. DOI: 10.1007/978-3-319-05461-2_4
    The vision of an interconnected and open Web of data is still a chimera far from being accomplished. Fortunately, though, one can find considerable evidence in this direction, and despite the technical challenges behind such an approach, recent advances have shown its feasibility. Semantic-aware formalisms (such as RDF and ontology languages) have been successfully put into practice in approaches such as Linked Data, whereas movements like Open Data have stressed the need for a new open access paradigm to guarantee free access to Web data.

    In the face of such a promising scenario, traditional business intelligence (BI) techniques and methods have been shown not to be appropriate. BI was born to support decision making within organizations, and the data warehouse, the most popular IT construct to support BI, has typically been nurtured with data either owned or accessible within the organization. With the new linked open data paradigm, BI systems must meet new requirements, such as providing on-demand analysis tasks over any relevant (either internal or external) data source in right-time. In this paper we discuss the technical challenges behind such requirements, which we refer to as exploratory BI, and envision a new kind of BI system to support this scenario.
  • Carme Martín, Toni Urpí, M. José Casany, Xavier Burgués, Carme Quer, M. Elena Rodríguez and Alberto Abelló. Improving Learning in a Database Course using Collaborative Learning Techniques. In International Journal of Engineering Education (IJEE), volume 29, number 4. Tempus publications, 2013. Pages 1-12. ISSN: 0949-149X
    In recent years, European universities have been adapting their curricula to the new European Higher Education Area, which implies the use of active learning methodologies. In most database courses, project-based learning is the active methodology most widely used, but the authors of this paper face contextual constraints that prevent its use. This paper presents a quantitative and qualitative analysis of the results obtained from the use of collaborative learning, in both the cross-curricular competences and the subject-specific ones, in the "Introduction to Databases" course of the Barcelona School of Informatics. Relevantly, this analysis demonstrates the positive impact this methodology had, allowing us to conclude that project-based learning is not the only methodology that fits this kind of course.
2012
  • Alberto Abelló, Oscar Romero. Ontology driven search of compound IDs. Knowledge and Information Systems, Volume 32, Issue 1. Pages 191-216. Springer, July 2012. ISSN (printed): 0219-1377. ISSN (electronic): 0219-3116. DOI: 10.1007/s10115-011-0418-0
    Object identification is a crucial step in most information systems. Nowadays, we have many different ways to identify entities such as surrogates, keys, and object identifiers. However, not all of them guarantee the entity identity. Many works have been introduced in the literature for discovering meaningful identifiers (i.e., guaranteeing the entity identity according to the semantics of the universe of discourse), but all of them work at the logical or data level and they share some constraints inherent to the kind of approach. Addressing it at the logical level, we may miss some important data dependencies, while the cost to identify data dependencies purely at the data level may not be affordable. In this paper, we propose an approach for discovering meaningful identifiers driven by domain ontologies. In our approach, we guide the process at the conceptual level and we introduce a set of pruning rules for improving the performance by reducing the number of identifier hypotheses generated and to be verified with data. Finally, we also introduce a simulation over a case study to show the feasibility of our method.
  • Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. Integrating ETL Processes from Information Requirements. 14th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Vienna, Austria, September 3-6, 2012. Lecture Notes in Computer Science 7448. Springer, 2012. Pages 65-80. ISBN (printed): 978-3-642-32583-0. ISBN (electronic): 978-3-642-32584-7. DOI: 10.1007/978-3-642-32584-7_6
    Data warehouse (DW) design is based on a set of requirements expressed as service level agreements (SLAs) and business level objects (BLOs). Populating a DW system from a set of information sources is realized with extract-transform-load (ETL) processes based on SLAs and BLOs. The entire task is complex, time consuming, and hard to perform manually. This paper presents our approach to the requirement-driven creation of ETL designs. Each requirement is considered separately and a respective ETL design is produced. We propose an incremental method for consolidating these individual designs and creating an ETL design that satisfies all given requirements. Finally, the design produced is sent to an ETL engine for execution. We illustrate our approach through an example based on TPC-H and report on our experimental findings that show the effectiveness and quality of our approach.
  • Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. ORE: an iterative approach to the design and evolution of multi-dimensional schemas. In 15th International Workshop on Data Warehousing and OLAP (DOLAP). Maui (USA), October 2012. ACM Press, 2012. Pages 1-8. ISBN: 978-1-4503-1721-4. DOI 10.1145/2390045.2390047
    Designing a data warehouse (DW) highly depends on the information requirements of its business users. However, tailoring a DW design that satisfies all business requirements is not an easy task. In addition, complex and evolving business environments result in a continuous emergence of new or changed business needs. Furthermore, for building a correct multidimensional (MD) schema for a DW, the designer should deal with the semantics and heterogeneity of the underlying data sources. To cope with such an inevitable complexity, both at the beginning of the design process and when a potential evolution event occurs, in this paper we present a semi-automatic method, named ORE, for constructing the MD schema in an iterative fashion based on the information requirements. In our approach, we consider each requirement separately and incrementally build the unified MD schema satisfying the entire set of requirements.
  • Petar Jovanovic, Oscar Romero, Alkis Simitsis, Alberto Abelló. Requirement-Driven Creation and Deployment of Multidimensional and ETL Designs. In 31st International Conference on Conceptual Modeling (ER) Workshops. Springer 2012. Pages 391-395. ISBN: 978-3-642-33999-8
    We present our tool, GEM, for assisting designers in the error-prone and time-consuming tasks carried out at the early stages of a data warehousing project. Our tool semi-automatically produces multidimensional (MD) and ETL conceptual designs from a given set of business requirements (like SLAs) and data source descriptions. Subsequently, our tool translates both the MD and ETL conceptual designs produced into physical designs, so they can be further deployed on a DBMS and an ETL engine. In this paper, we describe the system architecture and present our demonstration proposal by means of an example.
  • Alberto Abelló, Ladjel Bellatreche, Boualem Benatallah (Eds.). Model and Data Engineering - 2nd International Conference, MEDI 2012, Poitiers, France, October 3-5, 2012. Proceedings. Lecture Notes in Computer Science 7602, Springer 2012. ISBN (printed): 978-3-642-33608-9. ISBN (electronic): 978-3-642-33609-6. DOI: 10.1007/978-3-642-33609-6
  • José Fernández, Ramón Bragós, Margarita Cabrera, Alberto Abelló, Neus Arroyo, Daniel González, Francesc Garófano, A. Cortés, A. Fabra. Interoperability platform for virtual and remote laboratories. In 9th International Conference on Remote Engineering and Virtual Instrumentation (REV). IEEE, 2012. Pages 1-7. ISBN: 978-1-4673-2542-4
    This communication describes the interoperability platform that has been developed at the Technical University of Catalonia (UPC) to integrate access to different virtual and remote laboratories. Up to eleven laboratories belonging to GilabViR, our University's group of interest in virtual and remote laboratories, have been analyzed to generate a set of specifications and to develop the architecture and applications that allow access to them through the university LMS. Although the current LMS platform (Atenea) is implemented over Moodle 1.9, the new modules have been developed using Moodle 2.2.1, given that the migration to this version will be done in the coming months. The interoperability platform defines new Moodle modules that allow the interconnection between the LMS and a set of laboratories and provide the intrinsic LMS features (user identification, activity recording, educational materials repository, ...). There are modules that allow interaction with a web service interface giving access to the laboratory, with Java applet virtual laboratories, and others that make possible the link with LabVIEW-based remote laboratories, all of them recording experiment parameters in SQL databases placed on the experiment servers.
  • Carme Martín, Antoni Urpí, Alberto Abelló, Xavier Burgués, M. José Casañ, Carme Quer, M. Elena Rodríguez. Avaluació de la incorporació d'activitats d'aprenentatge actiu i cooperatiu a les assignatures de bases de dades de la Facultat d'Informàtica de Barcelona. In VII Congrés Internacional de Docència Universitària i Innovació (CIDUI). 2012. Pages 1-38. ISBN: 9788499213002
  • Alberto Abelló, and Oscar Romero. Service-Oriented Business Intelligence. In First European Business Intelligence Summer School (eBISS'11). Lecture Notes in Business Information Processing Volume 96. Springer, 2012. Pages 156-185. ISSN: 1865-1348. ISBN (paper): 978-3-642-27357-5. ISBN (electronic): 978-3-642-27358-2. DOI: 10.1007/978-3-642-27358-2_8
    The traditional way to manage Information Technologies (IT) in companies is having a data center and licensing monolithic applications based on the number of CPUs, allowed connections, etc. This also holds for Business Intelligence environments. Nevertheless, technologies have evolved, and today other approaches are possible. Specifically, the service paradigm allows outsourcing hardware as well as software in a pay-as-you-go model. In this work, we will introduce the concepts related to this paradigm and analyze how they affect Business Intelligence (BI). We will analyze the specificity of services and present specific techniques for engineering service systems (e.g., Cloud Computing, Service-Oriented Architectures -SOA- and Business Process Modeling -BPM-). Then, we will also analyze to what extent it is possible to consider Business Intelligence just a service and use these same techniques on it. Finally, we explore the other way round: since service companies represent around 70% of the Gross Domestic Product (GDP) in the world, special attention must be paid to their characteristics and how to adapt BI techniques to enhance services.
2011
  • Alberto Abelló, Jaume Ferrarons, Oscar Romero. Building cubes with MapReduce. In 14th International Workshop on Data Warehousing and OLAP (DOLAP). Glasgow (United Kingdom), October 2011. ACM Press, 2011. Pages 18-24. ISBN: 978-1-4503-0963-9. DOI: 10.1145/2064676.2064680
    In the last years, the problems of using generic storage techniques for very specific applications have been detected and outlined. Thus, some alternatives to relational DBMSs (e.g., BigTable) are blooming. On the other hand, cloud computing is already a reality that helps to save money by eliminating hardware as well as software fixed costs and just paying per use. Indeed, specific software tools to exploit a cloud are also here. The trend in this case is toward using tools based on the MapReduce paradigm developed by Google. In this paper, we explore the possibility of having data in a cloud by using BigTable to store the corporate historical data and MapReduce as an agile mechanism to deploy cubes in ad-hoc Data Marts. Our main contribution is the comparison of three different approaches to retrieve data cubes from BigTable by means of MapReduce and the definition of criteria to choose among them.
  • Oscar Romero, Patrick Marcel, Alberto Abelló, Verónika Peralta, Ladjel Bellatreche. Describing Analytical Sessions Using a Multidimensional Algebra. In 13th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Toulouse, France, August 29-September 2, 2011. Lecture Notes in Computer Science 6862, Springer 2011. Pages 224-239. ISBN: 978-3-642-23543-6. DOI:10.1007/978-3-642-23544-3_17
    Recent efforts to support analytical tasks over relational sources have pointed out the necessity of coming up with flexible, powerful means for analyzing the issued queries and exploiting them in decision-oriented processes (such as query recommendation or physical tuning). Issued queries should be decomposed, stored, and manipulated in a dedicated subsystem. With this aim, we present a novel approach for representing SQL analytical queries in terms of a multidimensional algebra, which better characterizes the analytical efforts of the user. In this paper we discuss how an SQL query can be formulated as a multidimensional algebraic characterization. Then, we discuss how to normalize such characterizations in order to bridge (i.e., collapse) several SQL queries into a single characterization (representing the analytical session), according to their logical connections.
  • Oscar Romero, Alkis Simitsis, Alberto Abelló. GEM: Requirement-Driven Generation of ETL and Multidimensional Conceptual Designs. In 13th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Toulouse, France, August 29-September 2, 2011. Lecture Notes in Computer Science 6862. Springer, 2011. Pages 80-95. ISBN: 978-3-642-23543-6. DOI:10.1007/978-3-642-23544-3_7
    At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs, and translate them into an appropriate conceptual, multidimensional design. Typically, this task is performed manually, through a series of interviews involving two different parties: the business analysts and technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesigning, until the business needs are satisfied. It is of great importance for the business of an enterprise to facilitate and automate such a process. The goal of our research is to provide designers with a semi-automatic means for producing conceptual multidimensional designs and also, conceptual representation of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources along with the business requirements, for validating and completing -if necessary- these requirements, producing a multidimensional design, and identifying the ETL operations needed. We present our method in terms of the TPC-DS benchmark and show its applicability and usefulness.
  • Oscar Romero, Alberto Abelló. A Comprehensive Framework on Multidimensional Modeling. In Advances in Conceptual Modeling. Recent Developments and New Directions - ER 2011 Workshops (MoRE-BI). Brussels, Belgium, October 31 - November 3, 2011. Lecture Notes in Computer Science 6999. Springer, 2011. Pages 108-117. ISBN: 978-3-642-24573-2. DOI: 10.1007/978-3-642-24574-9_14
    In this paper we discuss what current multidimensional design approaches provide and what their major flaws are. Our contribution lies in a comprehensive framework that does not focus on how these approaches work but on what they provide for usage in real data warehouse projects. Thus, we do not aim at comparing current approaches but at setting up a framework (based on four criteria: the role played by end-user requirements and data sources, the degree of automation achieved, and the quality of the output produced) highlighting their drawbacks and the need for further research in this area.
  • Oscar Romero, Alberto Abelló. Data-Driven Multidimensional Design for OLAP. Poster session in the 23rd International Conference on Scientific and Statistical Database Management (SSDBM). Portland, OR, USA, July 2011. Lecture Notes in Computer Science 6809. Springer, 2011. Pages 594-595. ISBN: 978-3-642-22350-1. DOI: 10.1007/978-3-642-22351-8_51
    OLAP is a popular technology to query scientific and statistical databases, but its success heavily depends on a proper design of the underlying multidimensional (MD) databases (i.e., based on the fact/dimension paradigm). Relevantly, different approaches to automatically identify facts are nowadays available, but all MD design methods rely on discovering functional dependencies (FDs) to identify dimensions. However, an unbound FD search generates a combinatorial explosion and, accordingly, these methods produce MD schemas with too many dimensions whose meaning has not been analyzed in advance. In contrast, i) we use the available ontological knowledge to drive the FD search and avoid the combinatorial explosion, and ii) we only propose dimensions of interest to analysts by performing a statistical study of data.
  • A. Abelló, X. Burgués. Puntuación entre iguales para la evaluación del trabajo en equipo. In XVII Jornadas de Enseñanza Universitaria de la Informática (JENUI), Sevilla (Spain), July 2011. Pages 73-80. ISBN: 978-84-694-5156-4
    The entry into the EHEA and the adoption of a competence-based assessment system, some of those competences being non-technical, forces us to consider changes not only in the way we teach but also in the way we assess. Assessing, for example, attitude towards work, teamwork, or the capacity for innovation by means of an exam is clearly inappropriate, if not impossible. With this in mind, we have experimented over two semesters with peer assessment for the generic competence of teamwork. In this work, we present the experience and the conclusions drawn.
  • A. Abelló. NOSQL: The death of the Star. Invited talk at the VII journées francophones sur les entrepots de Données et Analyses en ligne (EDA), Clermont-Ferrand (France), June 2011. Pages 1-2. Hermann, 2011. ISBN: 978-27056-81-2
    In the last years, the problems of using generic storage techniques for very specific applications have been detected and outlined. Thus, some alternatives to relational DBMSs (e.g., BigTable and C-Store) are blooming. On the other hand, cloud computing is already a reality that helps to save money by eliminating hardware as well as software fixed costs and just paying per use. Thus, specific software tools to exploit the cloud have also appeared. The trend in this case is to use implementations based on the MapReduce paradigm developed by Google. The basic goal of this talk will be the introduction and discussion of these ideas from the point of view of Data Warehousing and OLAP. We will see advantages, disadvantages, and some possibilities it offers.
  • Oscar Romero, Alberto Abelló. Multidimensional Design Methods for Data Warehousing. Chapter 5 in Integrations of Data Warehousing, Data Mining and Database Technologies: Innovative Approaches. Editors David Taniar, Li Chen. IGI Global, 2011. Pages 78-105. ISBN (printed): 978-1-60960-537-7. ISBN (electronic): 978-1-60960-538-4. DOI: 10.4018/978-1-60960-537-7.ch005
    In the last years, data warehousing systems have gained relevance to support decision making within organizations. The core component of these systems is the data warehouse, and nowadays it is widely assumed that the data warehouse design must follow the multidimensional paradigm. Thus, many methods have been presented to support the multidimensional design of the data warehouse. The first methods introduced were requirement-driven, but the semantics of the data warehouse (since the data warehouse is the result of homogenizing and integrating relevant data of the organization in a single, detailed view of the organization business) require that the data sources also be considered during the design process. Considering the data sources gave rise to several data-driven methods that automate the data warehouse design process, mainly from relational data sources. Currently, research on multidimensional modeling is still a hot topic, and we have two main research lines. On the one hand, new hybrid automatic methods have been introduced, proposing to combine data-driven and requirement-driven approaches. These methods focus on automating the whole process and improving the feedback retrieved by each approach to produce better results. On the other hand, some new approaches focus on considering scenarios other than relational sources. These methods also consider (semi-)structured data sources, such as ontologies or XML, that have gained relevance in the last years. Thus, they introduce innovative solutions for overcoming the heterogeneity of the data sources. All in all, we discuss the current scenario of multidimensional modeling by carrying out a survey of multidimensional design methods. We present the most relevant methods introduced in the literature and a detailed comparison showing the main features of each approach.
  • Rafael Berlanga, Oscar Romero, Alkis Simitsis, Victoria Nebot, Torben Bach Pedersen, Alberto Abelló, María José Aramburu. Semantic Web Technologies for Business Intelligence. Chapter 14 in Business Intelligence Applications and the Web: Models, Systems, and Technologies. Editors Marta E. Zorrilla, Jose-Norberto Mazón, Óscar Ferrández, Irene Garrigós, Florian Daniel, Juan Trujillo. IGI Global, 2011. Pages 310-339. ISBN (printed): 978-1-61350-038-5. ISBN (electronic): 978-1-61350-039-2. ISBN (perpetual access): 978-1-61350-040-8. DOI: 10.4018/978-1-61350-038-5.ch014
    This chapter describes the convergence of two of the most influential technologies in the last decade, namely business intelligence (BI) and the Semantic Web (SW). Business intelligence is used by almost any enterprise to derive important business-critical knowledge from both internal and (increasingly) external data. When using external data, most often found on the Web, the most important issue is knowing the precise semantics of the data. Without this, the results cannot be trusted. Here, Semantic Web technologies come to the rescue, as they allow semantics ranging from very simple to very complex to be specified for any web-available resource. SW technologies do not only support capturing the "passive" semantics, but also support active inference and reasoning on the data. The chapter first presents a motivating running example, followed by an introduction to the relevant SW foundation concepts. The chapter then goes on to survey the use of SW technologies for data integration, including semantic data annotation and semantics-aware extract, transform, and load processes (ETL). Next, the chapter describes the relationship of multidimensional (MD) models and SW technologies, including the relationship between MD models and SW formalisms, and the use of advanced SW reasoning functionality on MD models. Finally, the chapter describes in detail a number of directions for future research, including SW support for intelligent BI querying, using SW technologies for providing context to data warehouses, and scalability issues. The overall conclusion is that SW technologies are very relevant for the future of BI, but that several new developments are needed to reach the full potential.
2010
  • Alberto Abelló, Oscar Romero. Using ontologies to discover fact IDs. In 13th International Workshop on Data Warehousing and OLAP (DOLAP 2010). Toronto (Canada), October 2010. ACM Press, 2010. Pages 3-10. ISBN: 978-1-4503-0383-5. DOI: 10.1145/1871940.1871944
    Object identification is a crucial step in most information systems. Nowadays, we have many different ways to identify entities such as surrogates, keys and object identifiers. However, not all of them guarantee the entity identity. Many works have been introduced in the literature for discovering meaningful IDs, but all of them work at the logical or data level and they share some constraints inherent to the kind of approach. Addressing it at the logical level, we may miss some important data dependencies, while the cost to identify data dependencies at the data level may not be affordable. In this paper, we propose an approach for discovering fact IDs from domain ontologies. In our approach, we guide the process at the conceptual level and we introduce a set of pruning rules for improving the performance by reducing the number of ID hypotheses generated and to be verified with data. Finally, we also introduce a simulation over a case study to show the feasibility of our method.
  • A. Abelló, X. Burgués, M. E. Rodríguez. Utilización de glosarios de Moodle para incentivar la participación y dedicación de los estudiantes. In XVI Jornadas de Enseñanza Universitaria de la Informática (JENUI), Santiago de Compostela (Spain), 2010. Pages 309-316. ISBN: 84-693-3741-7

    The entry into the EHEA and the adoption of the new ECTS credit system, which measures the student's hours of dedication rather than the lecturer's, forces us to consider new teaching methods that encourage, while also bounding and monitoring, the students' work outside the classroom. With this in mind, we have experimented with the glossaries provided by Moodle to encourage students to review at home the theory presented in class, continuously throughout the course (not only on the eve of the final exam).
  • Xavier Burgués, Carme Quer, Carme Martín, Alberto Abelló, M. José Casany, M. Elena Rodríguez, Toni Urpí. Adapting LEARN-SQL to Database computer supported cooperative learning. In Workshop on Methods and Cases in Computing Education (MCCE). Cadiz (Spain), July 2010.
    LEARN-SQL is a tool that we have been using for three years in several database courses, and that has shown positive effects on the learning of different database topics. The tool allows proposing remote questionnaires to students, which are corrected automatically, giving them feedback and promoting self-learning and self-assessment of their work. However, as currently used, the tool cannot propose structured exercises to teams in a way that promotes cooperative learning. In this paper, we present our adaptation of LEARN-SQL to support some Computer-Supported Collaborative Learning techniques.
  • Carme Martín, Alberto Abelló, Xavier Burgués, M. José Casany, Carme Quer, M. Elena Rodríguez, Toni Urpí. Adaptació d'assignatures de bases de dades a l'EEES. In VII Congreso Internacional de Docencia Universitaria e Innovación (CIDUI). Barcelona (Spain), July 2010.
    Recent changes in the curricula of UPC and UOC take into account the new European Higher Education Area (EHEA). One direct consequence of these changes is the need to bound and optimize the time devoted to learning activities that require the student's active participation and that take place continuously throughout the semester. Moreover, the EHEA stresses the importance of practical work, interpersonal relationships and the ability to work in teams, suggesting a reduction of lectures and an increase of activities that foster both the student's individual work and cooperative work. In database courses within computer science education the problem is especially complex, because exam questions rarely have a unique solution. We have developed a tool, called LEARN-SQL, whose goal is to automatically grade any kind of SQL statement (queries, updates, stored procedures, triggers, etc.) and to decide whether the student's answer is correct regardless of the particular solution the student proposes. In this way we foster self-learning and self-assessment, making supervised blended learning possible and facilitating learning tailored to each student's needs. Additionally, the tool helps teachers design assessment tests, also offering the option of reviewing the students' solutions qualitatively. Finally, the system helps students learn from their own mistakes by providing quality feedback.
  • Oscar Romero. Automating the multidimensional design of data warehouses. PhD Thesis, Universitat Politècnica de Catalunya. Barcelona, February 2010.

    Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources as first-class citizens. Like in any other system, requirements guarantee that the system devised meets the end-user needs. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end-user to discover unknown additional analysis capabilities.

    Currently, several methods for supporting the data warehouse modeling task have been provided. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore do not consider the data sources to contain alternative interesting evidence for analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, automation of the design task is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the method chosen, and the need to analyze the data sources, which is a tedious and time-consuming task (and can be unfeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario repeats for the data-driven and requirement-driven stages within current hybrid approaches, which suffer from the same drawbacks as pure data-driven or requirement-driven approaches.

    In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, our approaches consider opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens.

    1. MDBE follows a classical approach, in which the end-user requirements are well known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to requirements; consequently, it is able to handle semantically poorer data sources. In other words, given high-quality end-user requirements, we can guide the process from the knowledge they contain and compensate for data sources of poor semantic quality.

    2. AMDO, in contrast, assumes a scenario in which the available data sources are semantically richer. Thus, the proposed approach is guided by a thorough analysis of the data sources, which is properly adapted to shape the output result according to the end-user requirements. In this context, given high-quality data sources, we can compensate for the lack of expressive end-user requirements.

    Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which approach is best to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well known and in one where the end-user requirements are not evident or cannot be easily elicited (e.g., when the users are not aware of the analysis capabilities of their own sources). Interestingly, the need for requirements beforehand is softened by the availability of semantically rich data sources; lacking those, requirements gain relevance for extracting the multidimensional knowledge from the sources.

    Thus, we claim to provide two approaches whose combination is exhaustive with regard to the scenarios discussed in the literature.
  • Oscar Romero, Alberto Abelló. A framework for multidimensional design of data warehouses from ontologies. In Data & Knowledge Engineering, Volume 69, Issue 11. Elsevier, 2010. Pages 1138-1157. ISSN: 0169-023X. DOI: 10.1016/j.datak.2010.07.007
    The data warehouse design task needs to consider both the end-user requirements and the organization data sources. For this reason, the data warehouse design has been traditionally considered a reengineering process, guided by requirements, from the data sources.

    Most current design methods demand highly expressive end-user requirements as input in order to carry out the exploration and analysis of the data sources. However, eliciting the end-user information requirements can be an arduous task. Importantly, in the data warehousing context, the analysis capabilities of the target data warehouse depend on what kind of data is available in the data sources. Thus, in scenarios where the analysis capabilities of the data sources are not (fully) known, it is possible to help the data warehouse designer identify and elicit unknown analysis capabilities.

    In this paper we introduce a user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks. Our proposal is based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain. It starts by fully analyzing the data sources to identify, without considering requirements yet, the multidimensional knowledge they capture (i.e., data likely to be analyzed from a multidimensional point of view). Next, we propose to exploit this knowledge in order to support the requirements elicitation task. In this way, we are already reconciling requirements with the data sources, and we are able to fully exploit the analysis capabilities of the sources. Once requirements are clear, we automatically create the data warehouse conceptual schema according to the multidimensional knowledge extracted from the sources.
  • Oscar Romero, Alberto Abelló. Automatic validation of requirements to support multidimensional design. In Data & Knowledge Engineering, Volume 69, Issue 9. Elsevier, 2010. Pages 917-942. ISSN: 0169-023X. DOI: 10.1016/j.datak.2010.03.006
    It is widely accepted that the conceptual schema of a data warehouse must be structured according to the multidimensional model. Moreover, it has been suggested that the ideal scenario for deriving the multidimensional conceptual schema of the data warehouse would consist of a hybrid approach (i.e., a combination of data-driven and requirement-driven paradigms). Thus, the resulting multidimensional schema would satisfy the end-user requirements and would be conciliated with the data sources. Most current methods follow either a data-driven or requirement-driven paradigm and only a few use a hybrid approach. Furthermore, hybrid methods are unbalanced and do not benefit from all of the advantages brought by each paradigm.

    In this paper we present our approach for multidimensional design. The most relevant step in our framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements. MDBE introduces several advantages over previous approaches, which can be summarized as three main contributions. (i) The MDBE method is a fully automatic approach that handles and analyzes the end-user requirements automatically. (ii) Unlike data-driven methods, we focus on data of interest to the end-user. However, the user may not be aware of all the potential analyses of the data sources and, in contrast to requirement-driven approaches, MDBE can propose new multidimensional knowledge related to concepts already queried by the user. (iii) Finally, MDBE proposes meaningful multidimensional schemas derived from a validation process. Therefore, the proposed schemas are sound and meaningful.
  • Alberto Abelló, Il-Yeol Song. Data warehousing and OLAP (DOLAP'08). In Data & Knowledge Engineering, Volume 69, Issue 1. Elsevier, 2010. Pages 1-2. ISSN: 0169-023X. DOI: 10.1016/j.datak.2009.08.011
2009
  • Oscar Romero, Diego Calvanese, Alberto Abelló, Mariano Rodriguez-Muro. Discovering functional dependencies for multidimensional design. In 12th International Workshop on Data Warehousing and OLAP (DOLAP 2009). Hong Kong (China), November 2009. ACM Press, 2009. Pages 1-8. ISBN: 978-1-60558-801-8
    Nowadays, it is widely accepted that the data warehouse design task should be largely automated. Furthermore, the data warehouse conceptual schema must be structured according to the multidimensional model; as a consequence, the most common way to automatically look for subjects and dimensions of analysis is by discovering functional dependencies (as dimensions functionally depend on the fact) over the data sources. Most advanced methods for automating the design of the data warehouse carry out this process from relational OLTP systems, assuming that an RDBMS is the most common kind of data source we may find, and taking a relational schema as the starting point. In contrast, we propose to rely instead on a conceptual representation of the domain of interest, formalized through a domain ontology expressed in the DL-Lite Description Logic. We propose an algorithm to discover functional dependencies from the domain ontology that exploits the inference capabilities of DL-Lite, thus fully taking the semantics of the domain into account. We also provide an evaluation of our approach in a real-world scenario.
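    The data-level alternative that this abstract contrasts with can be sketched as a brute-force check of whether a functional dependency holds in a table — exactly the costly per-tuple verification the ontology-driven approach avoids. The helper and sample table below are hypothetical illustrations, not the authors' code:

    ```python
    def holds_fd(rows, lhs, rhs):
        """Check whether the functional dependency lhs -> rhs holds in a
        table given as a list of dicts. Runs in one pass over the data,
        but must touch every tuple (the cost the paper sidesteps by
        reasoning over a DL-Lite ontology instead)."""
        seen = {}
        for row in rows:
            key = tuple(row[a] for a in lhs)
            val = tuple(row[a] for a in rhs)
            if seen.setdefault(key, val) != val:
                return False  # same determinant, different dependent value
        return True

    # Hypothetical fact data: product determines category, but not quantity
    sales = [
        {"product": "p1", "category": "food", "qty": 3},
        {"product": "p1", "category": "food", "qty": 5},
        {"product": "p2", "category": "toys", "qty": 1},
    ]
    print(holds_fd(sales, ["product"], ["category"]))  # True
    print(holds_fd(sales, ["product"], ["qty"]))       # False
    ```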
  • Alberto Abelló, Oscar Romero. On-Line Analytical Processing (OLAP). In Encyclopedia of Database Systems (editors-in-chief: Tamer Ozsu & Ling Liu). Springer, 2009. Pages 1949-1954. ISBN: 978-0-387-39940-9
  • A. Abelló, X. Burgués, M. J. Casany, C. Martín, C. Quer, T. Urpí, M. E. Rodríguez. LEARN-SQL: Herramienta de gestión de ejercicios de SQL con autocorrección. In XV Jornadas de Enseñanza Universitaria de la Informática (JENUI), Barcelona (Spain), 2009. Pages 353-360. ISBN: 978-84-692-2758-9

    Some auto-grading tools already exist in computer science education. However, in database courses the problem is especially complex, both because of the wide variety of exercise types (existing systems are limited to queries) and because exercises do not have a unique solution. Our system aims to automatically grade any kind of SQL statement (queries, updates, procedures, triggers, index creation, etc.) and to decide whether the student's answer is correct regardless of the particular solution the student proposes. In this paper we specifically present the module in charge of exercise management and all the exercise typologies we currently use.
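    The grading idea described above — correctness independent of the particular formulation — can be approximated by executing both the student's and the instructor's statements against the same database and comparing results. This is a minimal sketch using Python's `sqlite3`; `grade_sql` and the example schema are hypothetical, not the LEARN-SQL implementation:

    ```python
    import sqlite3

    def grade_sql(student_sql, reference_sql, setup_sql):
        """Judge a student's SELECT correct when it returns the same rows
        as the reference solution, however the query is written
        (hypothetical helper; LEARN-SQL also handles updates, triggers, etc.)."""
        conn = sqlite3.connect(":memory:")
        conn.executescript(setup_sql)
        try:
            student = conn.execute(student_sql).fetchall()
            reference = conn.execute(reference_sql).fetchall()
        except sqlite3.Error:
            return False  # invalid SQL counts as an incorrect answer
        finally:
            conn.close()
        # compare as multisets: row order is irrelevant without ORDER BY
        return sorted(student) == sorted(reference)

    setup = """
    CREATE TABLE emp (name TEXT, dept TEXT, salary INT);
    INSERT INTO emp VALUES ('ana','IT',300), ('bob','IT',200), ('eve','HR',100);
    """
    # two different formulations of 'employees earning over 150'
    print(grade_sql("SELECT name FROM emp WHERE salary > 150",
                    "SELECT name FROM emp WHERE NOT salary <= 150",
                    setup))  # True
    ```

    A real grader would also need to neutralize nondeterminism (unordered duplicates, side effects of updates), which is where most of the engineering effort lies.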
  • Oscar Romero, Alberto Abelló. A Survey of Multidimensional Modeling Methodologies. In International Journal on Data Warehousing and Mining (IJDWM), volume 5, number 2. Idea Group, 2009. Pages 1-23. ISSN: 1548-3924

    Many methodologies have been presented to support the multidimensional design of the data warehouse. The first methodologies introduced were requirement-driven, but the semantics of a data warehouse require also considering the data sources along the design process. In the following years, data sources gained relevance in multidimensional modeling and gave rise to several data-driven methodologies that automate the data warehouse design process from relational sources. Currently, research on multidimensional modeling is still a hot topic, with two main research lines. On the one hand, new hybrid automatic methodologies propose to combine data-driven and requirement-driven approaches. On the other hand, new approaches focus on other kinds of structured data sources that have gained relevance in recent years, such as ontologies or XML. In this article we present the most relevant methodologies introduced in the literature, together with a detailed comparison showing the main features of each approach.
2008
  • Il-Yeol Song and Alberto Abelló. Foreword. In 11th International Workshop on Data Warehousing and OLAP (DOLAP). Napa (USA), November 2008. ACM Press, 2008. ISBN: 978-1-60558-387-7.

  • Oscar Romero and Alberto Abelló. MDBE: Automatic Multidimensional Modeling. In 27th International Conference on Conceptual Modeling (ER). Barcelona (Spain), October 2008. LNCS 5231. Springer, 2008. Pages 534-535. ISSN: 0302-9743.

    The goal of this demonstration is to present MDBE, a tool implementing our methodology for automatically deriving multidimensional schemas from relational sources, bearing in mind the end-user requirements. Our approach starts gathering the end-user information requirements that will be mapped over the data sources as SQL queries. Based on the constraints that a query must preserve to make multidimensional sense, MDBE automatically derives multidimensional schemas which agree with both the input requirements and the data sources.
  • Alberto Abelló, M. Elena Rodríguez, Toni Urpí, Xavier Burgués, M. José Casany, Carme Martín, Carme Quer. LEARN-SQL: Automatic Assessment of SQL Based on IMS QTI Specification. Poster session in 8th International Conference on Advanced Learning Technologies (ICALT). Santander (Spain), July 2008. IEEE, 2008. Pages 592-593. ISBN: 978-0-7695-3167-0.

    In this paper we present LEARN-SQL, a system conforming to the IMS QTI specification that allows on-line learning and assessment of students on SQL skills in an automatic, interactive, informative, scalable and extensible manner.
  • Xavier Burgués, Carme Quer, Alberto Abelló, M. José Casany, Carme Martín, M. Elena Rodríguez, Toni Urpí. Uso de LEARN-SQL en el aprendizaje cooperativo de Bases de Datos. In XIV Jornadas de Enseñanza Universitaria de la Informática (JENUI). Granada (Spain), July 2008. FER fotocomposición, 2008. Pages 359-366. ISBN: 978-84-612-4475-1

    This article describes the changes made in some courses of the database area along two lines: organizational and technological. In the first, the main goal has been the introduction of cooperative learning techniques. In the second, the goal has been to foster self-learning and self-assessment through the LEARN-SQL tool. So far, the changes along the two lines have been applied to different courses. The article closes with an evaluation of the results obtained and an outline of future changes aimed at combining both lines.
  • M. José Casany, Carme Martín, Alberto Abelló, Xavier Burgués, Carme Quer, M. Elena Rodríguez, Toni Urpí. LEARN-SQL: A blended learning tool for the database area. In V Congreso Internacional de Docencia Universitaria e Innovación (CIDUI). Lleida (Spain), July 2008. ISBN: 978-84-8458-279-3.

    The academic programs of the UPC and UOC are adapting to the European Credit Transfer System (ECTS). One of the changes introduced in the academic programs of these universities tries to optimize the time devoted to activities that require the active participation of the students. Defining these activities is a very complex task, especially when dealing with database teaching in ICT engineering degrees, because the questions usually do not have a unique solution. LEARN-SQL is the tool developed by our group that automatically evaluates the correctness of any SQL statement (queries, updates, stored procedures, triggers, etc.) independently of the student's particular solution. Furthermore, LEARN-SQL helps teachers design their tests and allows them to review the solutions provided by the students. Finally, the system provides students with valuable feedback, so that they can learn from their mistakes.
2007
  • Oscar Romero and Alberto Abelló. Automating Multidimensional Design from Ontologies. In 10th International Workshop on Data Warehousing and OLAP (DOLAP). Lisbon (Portugal), November 2007. ACM Press, 2007. Pages 1-8. ISBN: 1-59593-827-5.

    This paper presents a new approach to automate the multidimensional design of Data Warehouses. We propose a semi-automatable method aimed at finding the business multidimensional concepts from a domain ontology representing different and potentially heterogeneous data sources of our business domain. In short, our method identifies business multidimensional concepts from heterogeneous data sources that have nothing in common except that they are all described by an ontology.
  • Alberto Abelló, Toni Urpí, M. Elena Rodríguez, and Marc Estévez. Extensión de Moodle para facilitar la corrección automática de cuestionarios y su aplicación en el ámbito de las bases de datos. In MoodleMoot. Cáceres (Spain), October 2007.

    Moodle 1.5 provides a quiz module that manages a pool of questions for later use in different quizzes, which can be defined according to the needs of each course. Basically, questions can be multiple choice or short answer. For short-answer questions, a single extra or missing blank space in the student's answer (with respect to the solution previously entered by the teacher) makes it be considered incorrect. In computer science teaching, in courses such as programming or databases, the problem is especially acute, because exercises rarely have a unique solution. For this reason, we considered developing a new Moodle module that would allow richer grading than a simple character-by-character comparison against the teacher's solution. We have thus developed a new quiz type whose questions reside in a repository external to Moodle. Each question is associated with one or more Web Services capable of deciding whether the student's answer is correct. In our case, we were interested in grading SQL queries over a database, but by connecting the same module to a different Web Service any kind of question can be graded, not necessarily in the database area. Essentially, the only requirement is that grading be objective and, consequently, that a procedure exists to perform it automatically.
  • Oscar Romero, and Alberto Abelló. MDBE: Una herramienta Automática para el Modelado Multidimensional. Demonstration in Jornadas de Ingeniería del Software y Bases de Datos (JISBD). Zaragoza (Spain), September 2007. Thomson Editores, 2007. Pages 387-388. ISBN: 978-84-9732-595-0.

    To ease the multidimensional modeling of a DW, in this work we present MDBE (Multidimensional Design By Examples): our proposed tool to validate multidimensional requirements provided by the end user and expressed as SQL queries over the operational data sources. MDBE decomposes the input SQL query to extract the relevant multidimensional knowledge it contains and, according to that information, derives a set of multidimensional schemas satisfying the user's requirements (queries). That is, it automatically proposes candidate multidimensional schemas.
  • Oscar Romero and Alberto Abelló. On the Need of a Reference Algebra for OLAP. In 9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Regensburg (Germany), September, 2007. Lecture Notes in Computer Science volume 4654. Springer, 2007. Pages 99-110. ISSN: 0302-9743. ISBN: 3-540-28566-0.

    Although multidimensionality has been widely accepted as the best solution for conceptual modeling, there is no such agreement about the set of operators to handle multidimensional data. This paper presents a comparison of the existing multidimensional algebras, trying to find a common backbone, and discusses the need for a reference multidimensional algebra as well as the current state of the art.
  • Oscar Romero and Alberto Abelló. Generating Multidimensional Schemas from the Semantic Web. Poster session in 19th Conference on Advanced Information Systems Engineering (CAiSE). Trondheim (Norway), June 2007.

    In this paper, we introduce a semi-automatable method aimed at finding the business multidimensional concepts from an ontology representing the organization domain. With these premises, our approach falls into the Semantic Web research area, where ontologies play a key role in providing a common vocabulary describing the meaning of relevant terms and the relationships among them.
2006
  • Stefano Rizzi, Alberto Abelló, Jens Lechtenbörger, and Juan Trujillo. Research in Data Warehouse Modeling and Design: Dead or Alive? In 9th International Workshop on Data Warehousing and OLAP (DOLAP). Arlington (USA), November 2006. ACM Press, 2006. Pages 3-10. ISBN: 1-59593-530-4.

    Multidimensional modeling requires specialized design techniques. Though a lot has been written about how a data warehouse should be designed, there is no consensus on a design method yet. This paper follows from a wide discussion that took place in Dagstuhl, during the Perspectives Workshop "Data Warehousing at the Crossroads", and is aimed at outlining some open issues in modeling and design of data warehouses. More precisely, issues regarding conceptual models, logical models, methods for design, interoperability, and design for new architectures and applications are considered.
  • Alberto Abelló, Roberto García, Rosa Gil, Marta Oliva, and Ferran Perdix. Semantic Data Integration in a Newspaper Content Management System. Poster session in 5th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE). Lyon (France), October, 2006. Lecture Notes in Computer Science volume 4277. Springer, 2006. Pages 41-41. ISSN: 0302-9743. ISBN: 3-540-28566-0.

    A newspaper content management system has to deal with a very heterogeneous information space, as the experience in the Diari Segre newspaper has shown us. The greatest problem is to harmonise the different ways the involved users (journalists, archivists) structure the newspaper information space, i.e. news, topics, headlines, etc. Our approach is based on ontologies and differentiated universes of discourse (UoD). Users interact with the system and, from this interaction, integration rules are derived. These rules are based on Description Logic ontological relations for subsumption and equivalence. They relate the different UoD and produce a shared conceptualisation of the newspaper information domain.
  • Oscar Romero and Alberto Abelló. Multidimensional Design by Examples. In 8th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Krakow (Poland), September, 2006. Lecture Notes in Computer Science volume 4081. Springer, 2006. Pages 85-94. ISSN: 0302-9743, ISBN: 3-540-28566-0.

    In this paper we present a method to validate user multidimensional requirements expressed in terms of SQL queries. Furthermore, our approach automatically generates and proposes the set of multidimensional schemas satisfying the user requirements, from the organizational operational schemas. If no multidimensional schema is generated for a query, we can state that the requirement is not multidimensional.
  • Alberto Abelló, José Samos, and Fèlix Saltor. YAM²: A Multidimensional Conceptual Model Extending UML. In Information Systems 31 (6), September, 2006. Elsevier, 2006. Pages 541-567. ISSN: 0306-4379.

    This paper presents a multidimensional conceptual Object-Oriented model for Data Warehousing and OLAP tools, including its structures, integrity constraints and query operations. It has been developed as an extension of UML core metaclasses to facilitate its usage and to fill the absence of a standard model. Being a UML extension allows reusing modeling constructs and techniques, and integrating multidimensional modeling into more general modeling processes. Moreover, while existing multidimensional models are restricted to the modeling of isolated stars, this paper investigates the representation of several semantically related star schemas. Summarizability and identification constraints can also be represented in the model, and a closed and complete set of algebraic operations has been defined in terms of functions (so that mathematical properties of functions can be smoothly applied).
  • Adriana Marotta, Federico Piedrabuena, and Alberto Abelló. Managing Quality Properties in a ROLAP Environment. In 18th Conference on Advanced Information Systems Engineering (CAiSE). Luxembourg, June 2006. Lecture Notes in Computer Science volume 4001. Springer, 2006. Pages 127-141. ISSN: 0302-9743, ISBN: 3-540-28566-0.

    In this work we propose, for an environment where multidimensional queries are made over multiple Data Marts, techniques for providing the user with quality information about the retrieved data. This meta-information behaves as an added value over the obtained information, or as an additional element to take into account when posing the queries. The quality properties considered are freshness, availability and accuracy. We provide a set of formulas that allow estimating or calculating the values of these properties for the result of any multidimensional operation of a predefined basic set.
  • Oscar Romero and Alberto Abelló. On the Mismatch Between Multidimensionality and SQL. Technical Report LSI-06-32-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), June 2006.

    ROLAP tools are intended to ease information analysis and navigation through the whole Data Warehouse. These tools automatically generate a query according to the multidimensional operations performed by the end-user, using relational database technology to implement multidimensionality and, consequently, automatically translating multidimensional operations to SQL. In this paper, we consider this automatic translation process in detail and, to do so, we present an exhaustive comparison (both theoretical and practical) between the multidimensional algebra and the relational one. Firstly, we discuss the need for a multidimensional algebra with regard to the relational one, and later we thoroughly study the considerations to be made to guarantee the correctness of a cube-query (an SQL query making multidimensional sense). With this aim, we analyze the expressiveness of the multidimensional algebra with regard to SQL, pointing out the features a query must satisfy to make multidimensional sense, and we also focus on the problems that can arise in a cube-query due to SQL's intrinsic restrictions. The SQL translation of an isolated operation does not represent a problem, but when the modifications brought about by a set of operations are mixed in a single cube-query, some conflicts derived from SQL can emerge depending on the operations involved. Therefore, if these problems are not detected and treated appropriately, the automatic translation can retrieve unexpected results.
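    The cube-query pattern discussed in the report — a set of multidimensional operations collapsed into one SQL statement — follows a fixed SELECT … GROUP BY shape. A minimal, hypothetical generator (a sketch, not the authors' translator) might look like:

    ```python
    def cube_query(fact, measures, dims, slicers=None):
        """Assemble a cube-query: group the fact table by the chosen
        dimension levels and aggregate the measures. Real translators must
        additionally detect the inter-operation conflicts the report
        classifies; this sketch ignores them."""
        select = dims + [f"SUM({m}) AS {m}" for m in measures]
        sql = f"SELECT {', '.join(select)} FROM {fact}"
        if slicers:  # selection operations become WHERE predicates
            sql += " WHERE " + " AND ".join(slicers)
        sql += f" GROUP BY {', '.join(dims)}"  # roll-up level fixes the grouping
        return sql

    # hypothetical roll-up of a 'sales' fact to region x month, sliced by year
    print(cube_query("sales", ["amount"], ["region", "month"], ["year = 2006"]))
    # SELECT region, month, SUM(amount) AS amount FROM sales WHERE year = 2006 GROUP BY region, month
    ```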
  • Alberto Abelló, and Fernando Carpani. Using OWL to integrate relational Schemas. Technical Report LSI-06-10-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), March 2006.

    Ontologies offer two contributions to the Semantic Web. On the one hand, they show a vocabulary consensus inside a community. On the other hand, they provide reasoning capabilities. In this paper we present a completely automatic translation from relational schemas to OWL, so that inference mechanisms can be used to integrate different schemas by dealing with structural heterogeneities. The output of the translation algorithm, which makes the functional dependencies in the relational schema explicit, belongs to OWL Full.
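    A rough, hypothetical sketch of the general idea (not the paper's actual algorithm): map each relation to an OWL class and each column to a property, making the key's functional dependencies explicit by declaring non-key columns as functional properties. All names below are invented.

```python
# Hypothetical sketch: relation -> OWL-like axioms, with functional
# dependencies (key -> non-key column) made explicit.

def relation_to_owl(table, columns, key):
    axioms = [f"Class: {table}"]
    for col in columns:
        axioms.append(f"DatatypeProperty: has_{col} (domain {table})")
        if col not in key:
            # key -> col is a functional dependency, so the property
            # maps each individual to at most one value.
            axioms.append(f"Functional: has_{col}")
    return axioms

axioms = relation_to_owl("Employee", ["id", "name", "dept"], key={"id"})
```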
2005
  • Oscar Romero, and Alberto Abelló. Improving automatic SQL translation for ROLAP tools. In Proceedings of Jornadas de Ingeniería del Software y Bases de Datos (JISBD). Granada (Spain), September 2005. Thomson Editores, 2005. Pages 123-130. ISBN: 84-9732-434-X

    In recent years, although a vast amount of work has been devoted to modeling multidimensionality, the translation of multidimensional algebra to SQL has been overlooked. ROLAP tools automatically generate a cube-query according to the operations performed by the user. The SQL translation does not represent a problem when treating isolated operations, but when combining the modifications brought about by a set of operations in the same cube-query, conflicts can emerge depending on the operations involved. Therefore, if these problems are not detected and treated appropriately, the automatic translation can retrieve unexpected results. In this paper, we define and classify the conflicts raised when automatically translating a multidimensional algebra to SQL, and analyze how to solve them or minimize their impact.
  • Alberto Abelló, Xavi de Palol, and Mohand-Saïd Hacid. On the Midpoint of a Set of XML Documents. In 16th International Conference on Database and Expert Systems Applications (DEXA). Copenhagen (Denmark), August 2005. Lecture Notes in Computer Science volume 3588. Springer, 2005. Pages 441-450. ISSN: 0302-9743, ISBN: 3-540-28566-0

    The WWW contains a huge amount of documents. Some of them share a subject, but are generated by different people or even organizations. To guarantee the interchange of such documents, we can use XML, which allows sharing documents that do not have the same structure. However, it makes it difficult to understand the core of such heterogeneous documents (in general, a schema is not available). In this paper, we offer a characterization and an algorithm to obtain the midpoint (in terms of a resemblance function) of a set of semi-structured, heterogeneous documents without optional elements. The trivial case of midpoint would be the elements common to all documents. Nevertheless, with several heterogeneous documents this may result in an empty set. Thus, we consider that those elements present in a given amount of documents belong to the midpoint. An exact schema could always be found by generating optional elements. However, the exact schema of the whole set may result in overspecialization (lots of optional elements), which would make it useless.
  • Alberto Abelló, Xavi de Palol, and Mohand-Saïd Hacid. Approximating the DTD of a set of XML documents. Technical Report LSI-05-7-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), March 2005.

    Extended/preliminary version of the previous paper: "On the Midpoint of a Set of XML Documents".
2003
  • Alberto Abelló, and Carme Martín. The Data Warehouse: A Temporal Database. In Proceedings of Jornadas de Ingeniería del Software y Bases de Datos (JISBD). Alacant (Spain), November 2003. Campobell S.L., 2003. Pages 675-684. ISBN: 84-688-3836-5

    The aim of this paper is to bring together two research areas involving the representation of time, i.e. "Data Warehouses" and "Temporal Databases". In order to achieve this goal, data warehouse and temporal database research results have been surveyed. Looking at temporal aspects within a data warehouse, more similarities than differences between temporal databases and data warehouses have been found. The first point of closeness between these areas is the possibility of redefining a data warehouse in terms of a bitemporal database. Another relation is the use of temporal languages in data warehousing. Moreover, the correspondence between advances in temporal evolution and storage, and data warehouses is presented. Finally, Object-Oriented temporal data models contribute the integration and subject-orientation that is required by a data warehouse. Therefore, this paper focuses on how contributions of temporal database research could benefit data warehouses.
  • Alberto Abelló, José Samos, and Fèlix Saltor. Implementing Operations to Navigate Semantic Star Schemas. In 6th International Workshop on Data Warehousing and OLAP (DOLAP). New Orleans (USA), November 2003. ACM Press, 2003. Pages 56-62. ISBN: 1-58113-727-3

    In recent years, much work has been devoted to multidimensional modeling, star-shaped schemas and OLAP operations. However, drill-across has not attracted as much attention as other operations. This operation allows changing the subject of analysis while keeping the same analysis space we were using to analyze another subject. It is assumed that this can be done if both subjects share exactly the same analysis dimensions. In this paper, besides the implementation of an algebraic set of operations on an RDBMS, we show when and how we can change the subject of analysis in the presence of semantic relationships, even if the analysis dimensions do not exactly coincide.
  • Carme Martín, and Alberto Abelló. A Temporal Study of Data Sources to Load a Corporate Data Warehouse. In 5th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Prague (Czech Republic), September 2003. Lecture Notes in Computer Science volume 2737. Springer, 2003. Pages 109-118. ISSN: 0302-9743. ISBN: 3-540-40807-X

    The input data of the corporate data warehouse is provided by the data sources, which are integrated. In the temporal database research area, a bitemporal database is a database supporting both valid time and transaction time. Valid time is the time when a fact is true in the modeled reality, while transaction time is the time when the fact is stored in the database. Defining a data warehouse as a bitemporal database containing integrated and subject-oriented data in support of the decision-making process, transaction time in the data warehouse can always be obtained, because it is internal to a given storage system. When an event is loaded into the data warehouse, its valid time is transformed into a bitemporal element by adding transaction time, generated by the database management system of the data warehouse. However, depending on whether or not the data sources manage transaction time and valid time, we may or may not be able to obtain the valid time for the data warehouse. The aim of this paper is to present a temporal study of the different kinds of data sources used to load a corporate data warehouse, using a bitemporal storage structure.
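    The valid-time/transaction-time distinction can be sketched with a toy illustration (not taken from the paper; all names and values are invented): valid time arrives with the source event, while transaction time is stamped by the warehouse's DBMS at load time.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BitemporalFact:
    value: str
    valid_from: date      # when the fact is true in the modeled reality
    valid_to: date
    tx_from: date         # when the fact was stored in the warehouse
    tx_to: date = date.max

def load(value, valid_from, valid_to, now):
    # Transaction time is generated by the warehouse at load time,
    # turning the source's valid time into a bitemporal element.
    return BitemporalFact(value, valid_from, valid_to, tx_from=now)

fact = load("price=10", date(2003, 1, 1), date(2003, 6, 30),
            now=date(2003, 7, 2))
```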
  • Alberto Abelló, Elena Rodríguez, Fèlix Saltor, Marta Oliva, Cecilia Delgado, Eladio Garví and José Samos. On Operations to Conform Object-Oriented Schemas. In International Conference on Enterprise Information Systems (ICEIS). Angers (France), April 2003. Selected among the best papers of the conference to be published in "Enterprise Information Systems V", Kluwer Academic Publishers, 2004. Pages 49-56. ISBN: 1-4020-1726-X

    To build a Cooperative Information System from several preexisting, heterogeneous systems, the schemas of these systems must be integrated. Operations used for this purpose include conforming operations, which change the form of a schema. In this paper we present a systematic approach to establish which conforming operations for Object-Oriented schemas are needed, and which of them can be considered as primitive, all others being derivable from these. We organize these operations in matrices according to the Object-Oriented dimensions -Generalization/Specialization, Aggregation/Decomposition- on which they operate.
  • Alberto Abelló, and Carme Martín. A Bitemporal Storage Structure for a Corporate Data Warehouse. Short paper in International Conference on Enterprise Information Systems (ICEIS). Angers (France), April 2003.

    This paper brings together two research areas involving the representation of time, i.e. "Data Warehouses" and "Temporal Databases". Looking at temporal aspects within a data warehouse, more similarities than differences between temporal databases and data warehouses have been found. The first point of closeness between these areas is the possibility of redefining a data warehouse in terms of a bitemporal database. A bitemporal storage mechanism is proposed in this paper. In order to meet this goal, a temporal study of data sources is developed. Moreover, we show how Object-Oriented temporal data models contribute the integration and subject-orientation that is required by a data warehouse.
2002
  • Alberto Abelló, Francisco Araque, Cecilia Delgado, Eladio Garví, Marta Oliva, Elena Rodríguez, Emilia Ruíz, Fèlix Saltor, José Samos, and Manolo Torres. Operaciones para Conformar Esquemas Orientados a Objetos. In Taller sobre Integración Semántica de Fuentes de Datos Distribuidas y Heterogéneas de las Jornadas de Ingeniería del Software y Bases de Datos (JISBD2002). El Escorial (Spain), November 2002. (In Spanish)
  • Alberto Abelló, José Samos, and Fèlix Saltor. On Relationships Offering New Drill-across Possibilities. In 5th International Workshop on Data Warehousing and OLAP (DOLAP). McLean (USA), November 2002. ACM Press, 2002. Pages 7-13. ISBN: 1-58113-590-4

    OLAP tools divide concepts based on whether they are used as analysis dimensions or are the fact subject of analysis, which gives rise to star-shaped schemas. Operations are always provided to navigate inside such star schemas. However, navigation among different stars is usually overlooked. This paper studies different kinds of Object-Oriented conceptual relationships (part of the UML standard) between stars (namely Derivation, Generalization, Association, and Flow) that allow drilling across them.
  • Carme Martín, and Alberto Abelló. The Data Warehouse: A Temporal Database. Technical Report LSI-02-66-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), November 2002.

    Extended version of the homonymous paper published in 2003.
  • Alberto Abelló, José Samos, and Fèlix Saltor. YAM² (Yet Another Multidimensional Model): An extension of UML. In International Database Engineering & Applications Symposium (IDEAS). Edmonton (Canada), July 2002. Mario A. Nascimento, M. Tamer Özsu, Osmar Zaïne Editors. IEEE Computer Society Press, 2002. Pages 172-181. ISBN: 0-7695-1638-6. ISSN: 1098-8086

    This paper presents a multidimensional conceptual Object-Oriented model, its structures, integrity constraints and query operations. It has been developed as an extension of UML core metaclasses to facilitate its usage, as well as to avoid the introduction of completely new concepts. YAM² allows the representation of several semantically related star schemas, as well as summarizability and identification constraints.
  • Alberto Abelló. YAM²: A Multidimensional Conceptual Model. PhD Thesis, Universitat Politècnica de Catalunya. Barcelona, April 2002.

    This thesis proposes YAM², a multidimensional conceptual model for OLAP (On-Line Analytical Processing). It is defined as an extension of UML (Unified Modeling Language). The aim is to benefit from Object-Oriented concepts and relationships to allow the definition of semantically rich multi-star schemas. Thus, the usage of Generalization, Association, Derivation, and Flow relationships (in UML terminology) is studied.

    An architecture based on different levels of schemas is proposed and the characteristics of its different levels are defined. The benefits of this architecture are twofold. Firstly, it relates Federated Information Systems with Data Warehousing, so that advances in one area can also be used in the other. Moreover, the Data Mart schemas are defined so that they can be implemented on different Database Management Systems, while still offering a common integrated vision that allows navigation through the different stars.

    The main concepts of any multidimensional model are facts and dimensions. Both are analyzed separately, based on the assumption that relationships between aggregation levels are part-whole (or composition) relationships. Thus, mereology axioms are used on that analysis to prove some properties.

    Besides structures, operations and integrity constraints are also defined for YAM². Because a data cube is defined in this thesis as a function, operations (i.e. Drill-across, ChangeBase, Roll-up, Projection, and Selection) are defined over functions. Regarding the set of integrity constraints, they reflect the importance of summarizability (or aggregability) of measures and pay special attention to it.
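    The cube-as-function view can be sketched with a toy example (hypothetical data, not from the thesis): a cube maps dimension coordinates to a measure, and Roll-up composes the cube with a level mapping, aggregating measures that land on the same coarser coordinate.

```python
# Toy sketch: a data cube as a mapping from dimension coordinates to
# a measure, with Roll-up defined over that mapping.

cube = {                      # (city, month) -> sales
    ("BCN", "2002-01"): 5,
    ("BCN", "2002-02"): 7,
    ("LYO", "2002-01"): 3,
}
city_to_country = {"BCN": "ES", "LYO": "FR"}

def roll_up(cube, mapping):
    """Aggregate the first coordinate to a coarser level."""
    out = {}
    for (lo, rest), m in cube.items():
        key = (mapping[lo], rest)
        out[key] = out.get(key, 0) + m
    return out

rolled = roll_up(cube, city_to_country)
# rolled[("ES", "2002-01")] == 5, rolled[("FR", "2002-01")] == 3
```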
2001
  • Alberto Abelló, Francisco Araque, José Samos, and Fèlix Saltor. Bases de Datos Federadas, Almacenes de Datos y Análisis Multidimensional. In Taller de Almacenes de Datos y Tecnologia OLAP de las Jornadas de Ingeniería del Software y Bases de Datos (JISBD2001). Almagro (Spain), November 2001. (In Spanish)
  • Alberto Abelló, José Samos, and Fèlix Saltor. Understanding Facts in a Multidimensional Object-Oriented Model. In 4th International Workshop on Data Warehousing and OLAP (DOLAP 2001). Atlanta (USA), November 2001. Pages 32-39. ACM Press, 2001. ISBN 1-58113-437-1.

    "On-Line Analytical Processing" tools are used to extract information from the "Data Warehouse" in order to help in the decision making process. These tools are based on multidimensional concepts, i.e. facts and dimensions. In this paper we study the meaning of facts, and the dependencies in multidimensional data. This study is used to find relationships between cubes (in an Object-Oriented framework) and explain navigation operations.
  • Alberto Abelló, José Samos, and Fèlix Saltor. Multi-star Conceptual Schemas for OLAP Systems. Technical Report LSI-01-45-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), October 2001.

    Extended version of the paper published in 2002: "On Relationships Offering New Drill-across Possibilities".
  • Alberto Abelló, José Samos, and Fèlix Saltor. YAM2 (Yet Another Multidimensional Model): An extension of UML. Technical Report LSI-01-43-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), October 2001.

    Extended version of the homonymous paper published in 2002.
  • Elena Rodríguez, Alberto Abelló, Marta Oliva, Fèlix Saltor, Cecilia Delgado, Eladio Garví and José Samos. On Operations along the Generalization/Specialization Dimension. In International Workshop on Engineering Federated Information Systems (EFIS). Berlin (Germany), October 2001. Pages 70-83. ISBN: 3-89838-027-0

    The need to derive a database schema from one or more existing schemas arises in Federated Database Systems as well as in other contexts. Operations used for this purpose include conforming operations, which change the form of a schema. In this paper we present a systematic approach to establish a set of primitive conforming operations that operate along the Generalization/Specialization dimension in the context of Object-Oriented schemas.
  • Alberto Abelló, José Samos, and Fèlix Saltor. A Framework for the Classification and Description of Multidimensional Data Models. In 12th International Conference on Database and Expert Systems Applications (DEXA). Munich (Germany), September 2001. Lecture Notes in Computer Science volume 2113. Springer, 2001. Pages 668-677. ISSN: 0302-9743, ISBN: 3-540-42527-6

    The words On-Line Analytical Processing bring together a set of tools that use multidimensional modeling in the management of information to improve the decision-making process. Lately, a lot of work has been devoted to modeling the multidimensional space. The aim of this paper is twofold. On the one hand, it compiles and classifies some of that work with regard to the design phase it is used in. On the other hand, it allows comparing the different terminology used by each author, by placing all the terms in a common framework.
  • Alberto Abelló, José Samos, and Fèlix Saltor. Understanding Analysis Dimensions in a Multidimensional Object-Oriented Model. In 3rd International Workshop on Design and Management of Data Warehouses (DMDW). Interlaken (Switzerland), June 2001. SwissLife, 2001. ISSN: 1424-4691

    OLAP defines a set of data warehousing query tools characterized by providing a multidimensional view of data. Information can be shown at different aggregation levels (often called granularities) for each dimension. In this paper, we try to outline the benefits of understanding the relationships between those aggregation levels as Part-Whole relationships, and how it helps to address some semantic problems. Moreover, we propose the usage of other Object-Oriented constructs to keep as much semantics as possible in analysis dimensions.
2000
  • Alberto Abelló, José Samos, and Fèlix Saltor. A Data Warehouse Multidimensional Data Models Classification. Technical Report LSI-2000-6. Dept. Lenguajes y Sistemas Informáticos (Universidad de Granada), December 2000.

    The words On-Line Analytical Processing (OLAP) bring together a set of tools that use multidimensional modeling in the extraction of information from the Data Warehouse. Lately, a lot of work has been devoted to modeling the multidimensional space. The aim of this paper is twofold. On the one hand, it compiles and classifies most of that work. On the other hand, it allows comparing the different terminology used by each author, by placing all the terms in a common framework.
  • Elena Rodríguez, Alberto Abelló, and Marta Oliva. Resumen del Simposium en Objetos y Bases de Datos del ECOOP'2000. In Taller de Bases de Datos Orientadas a Objetos dentro de las Jornadas de Ingeniería del Software y Bases de Datos (JISBD2000). Valladolid (Spain), November 2000. (In Spanish)

  • Alberto Abelló, and Elena Rodríguez. Describing BLOOM99 with regard to UML Semantics. In Proceedings of Jornadas de Ingeniería del Software y Bases de Datos (JISBD). Valladolid (Spain), November 2000. Gráficas Andrés Martín S.L., 2000. Pages 307-319. ISBN: 84-8448-065-8

    In this paper, we describe the BLOOM metaclasses with regard to the Unified Modeling Language (UML) semantics. We concentrate essentially on the Generalization/Specialization and Aggregation/Decomposition dimensions, because they are used to guide the integration process BLOOM was intended for. Here we focus on the conceptual data modeling constructs that UML offers. Although UML provides many more abstractions than BLOOM, we show that BLOOM still has some abstractions that UML does not. For some of these abstractions, we sketch how UML can be extended to deal with the semantics that BLOOM adds.
  • Fèlix Saltor, Marta Oliva, Alberto Abelló, and José Samos. Building Secure Data Warehouse Schemas from Federated Information Systems. In International CODATA Conference on Data and Information for the Coming Knowledge Millennium (CODATA), Baveno (Italy), October 2000 (Extended abstract). "Heterogeneous Information Exchange and Organizational Hubs", Bestougeff, Dubois and Thuraisingham Editors. Kluwer Academic Publishers, 2002. Pages 123-134. ISBN: 1-4020-0649-7

    There are similarities between architectures for Federated Information Systems and architectures for Data Warehousing. In the context of an integrated architecture for both Federated Information Systems and Data Warehousing, we discuss how additional schema levels provide security, and operations to convert from one level to the next.
  • Alberto Abelló, José Samos, and Fèlix Saltor. Benefits of an Object-Oriented Multidimensional Data Model. In Objects and Databases - International Symposium- in 14th European Conference on Object-Oriented Programming (ECOOP). Sophia Antipolis and Cannes (France), June 2000. Lecture Notes in Computer Science volume 1944. Springer, 2000. Pages 141-152. ISSN: 0302-9743. ISBN: 3-540-41664-1

    In this paper, we try to outline the benefits of using an O-O model when designing multidimensional Data Marts. We argue that multidimensional modeling is lacking in semantics, which can be obtained by using the O-O paradigm. Some benefits that can be obtained by doing this are classified into six O-O dimensions (i.e. Classification/Instantiation, Generalization/Specialization, Aggregation/Decomposition, Caller/Called, Derivability, and Dynamicity), and exemplified with specific cases.
  • Alberto Abelló, Marta Oliva, José Samos, and Fèlix Saltor. Information System Architecture for Data Warehousing from a Federation. In Proc. of the Int. Workshop on Engineering Federated Information Systems (EFIS). Dublin (Ireland), June 2000. IOS Press, 2000. Pages 33-40. ISBN: 1-58603-075-2

    This paper is devoted to the Data Warehousing architecture and its data schemas. We relate a federated database architecture to Data Warehouse schemas, which allows us to provide a better understanding of the characteristics of every schema, as well as the way they should be defined. Because of the confidentiality of the data used to make decisions, and the federated architecture used, we also pay attention to data protection.
  • Alberto Abelló, Marta Oliva, José Samos, and Fèlix Saltor. Information System Architecture for Secure Data Warehousing. Technical Report LSI-00-26-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), April 2000.

    Extended version of the previous paper: "Information System Architecture for Data Warehousing from a Federation".
1999
  • José Samos, Alberto Abelló, Marta Oliva, Elena Rodríguez, Fèlix Saltor, Jaume Sistac, Francisco Araque, Cecilia Delgado, Eladio Garví and Emilia Ruíz. Sistema Cooperativo para la Integración de Fuentes Heterogéneas de Información y Almacenes de Datos. In Novatica, 142 (Nov-Dec 1999). Asociación de Técnicos de Informática (ATI), 1999. Pages 44-49. (In Spanish). ISSN: 0211-2124

    This work presents our proposal for a prototype cooperative system for the integration of heterogeneous information sources and data warehouses, on which our research currently focuses. The overall goal is to provide a software layer that enables cooperation among several information sources interconnected through a network of communication lines. Each source has its own services for answering the questions its users pose over its data; additionally, we want to offer certain users the ability to access the whole set of data in a uniform way (integrated access), either in real time or through data warehouses.
  • Alberto Abelló, Marta Oliva, Elena Rodríguez, and Fèlix Saltor. The syntax of BLOOM99 schemas. Technical Report LSI-99-34-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), July 1999.

    The BLOOM (BarceLona Object Oriented Model) data model was developed to be the Canonical Data Model (CDM) of a Federated Database Management System prototype. Its design satisfies the features that a data model should have to be suitable as a CDM. The initial version of the model (BLOOM91) has evolved into the present version, BLOOM99.

    This report specifies the syntax of the schema definition language of BLOOM99. In our model, a schema is a set of classes, related through two dimensions: the generalization/specialization dimension, and the aggregation/decomposition dimension. BLOOM supports several features in each of these dimensions, through their corresponding metaclasses.

    Even if users are supposed to define and modify schemas in an interactive way, using a Graphical User Interface, a linear schema definition language is clearly needed. Syntax diagrams are used in this report to specify the language; an alternative using grammar productions appears as Appendix A. A possible graphical notation is given in Appendix B.

    A comprehensive running example illustrates the model, the language and its syntax, and the graphical notation.
  • Alberto Abelló, Marta Oliva, Elena Rodríguez, and Fèlix Saltor. The BLOOM model revisited: An evolution proposal (poster session). In Workshop Reader of the 13th European Conference on Object-Oriented Programming (ECOOP). Lisbon (Portugal), June 1999. Lecture Notes in Computer Science, Vol. 1743. Springer, 2000. Pages 376-378. ISBN: 3-540-66954-X

    Once the desirable characteristics of a suitable CDM had been argued, the BLOOM model (BarceLona Object Oriented Model) was progressively defined. The result is an extension of an object-oriented model with a semantically rich set of abstractions. BLOOM was not developed as a whole but underwent extensions in different phases. Its abstractions were conceived for building the FDBS on an as-needed basis. This led to a lack of unity and to differences in nomenclature.

    The need to revise the BLOOM model surfaced during the design process of the directory of the FDBS. It is essential to have such a storage system because of the amount of information needed in building and operating an FDBS. The directory is the core of our FDBS architecture and must contain the different schema levels as well as the mappings among them. Therefore, the model had to be fixed in order to store those schemas and mappings in a structured manner.
  • Alberto Abelló. CORBA: A middleware for an heterogeneous cooperative system. Technical Report LSI-99-21-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), May 1999.

    Two kinds of heterogeneities interfere with the integration of different information sources: those in systems and those in semantics. They generate different problems and require different solutions. This paper tries to separate them by proposing the usage of a distinct tool for each one (i.e. CORBA and BLOOM, respectively), and by analyzing how they could collaborate. CORBA offers many ways to deal with distributed objects and their potential needs, while BLOOM takes care of the semantic heterogeneities. Therefore, it seems promising to handle the system heterogeneities by wrapping the components of the BLOOM execution architecture into CORBA objects.
  • Alberto Abelló, and Fèlix Saltor. Implementation of the BLOOM data model on ObjectStore. Technical Report LSI-99-7-T. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), May 1999.

    BLOOM is a semantically enriched object-oriented data model. It offers extra semantic abstractions to better represent the real world. Those abstractions are not implemented in any commercial product. This paper explains how all of them could be simulated with a software layer on top of an object-oriented database management system. Concretely, it proved to work on ObjectStore.

"A celebrity is a person who works hard all his life to become known, then wears dark glasses to avoid being recognized."

Copyright © 1997, Alberto Abelló Gamazo
Dept. Enginyeria de Serveis i Sistemes d'Informació.
Universitat Politècnica de Catalunya.
All rights reserved.
Revised: January 21st, 2017
URL: http://www.essi.upc.edu/~aabello/publications/home.html
Please, send comments and suggestions to: aabello [at] essi.upc.edu