Publications
2011
- Alberto Abelló, Jaume Ferrarons, Oscar Romero. Building cubes with MapReduce. In 14th International Workshop on Data Warehousing and OLAP (DOLAP). Glasgow (Scotland), October 2011. Pages 17-24. ACM Press, 2011. ISBN: 978-1-4503-0963-9. DOI: 10.1145/2064676.2064680
In recent years, the problems of using generic storage techniques for very specific applications have been detected and outlined. Thus, some alternatives to relational DBMSs (e.g., BigTable) are blooming. On the other hand, cloud computing is already a reality that helps to save money by eliminating fixed hardware and software costs and paying only per use. Indeed, specific software tools to exploit a cloud are also here. The trend in this case is toward using tools based on the MapReduce paradigm developed by Google. In this paper, we explore the possibility of having data in a cloud by using BigTable to store the corporate historical data and MapReduce as an agile mechanism to deploy cubes in ad-hoc Data Marts. Our main contribution is the comparison of three different approaches to retrieve data cubes from BigTable by means of MapReduce and the definition of criteria to choose among them.
- Oscar Romero, Patrick Marcel, Alberto Abelló, Verónika Peralta, Ladjel Bellatreche. Describing Analytical Sessions Using a Multidimensional Algebra. 13th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Toulouse, France, August 29-September 2, 2011. Pages 224-239. Lecture Notes in Computer Science 6862, Springer 2011. ISBN: 978-3-642-23543-6. DOI:10.1007/978-3-642-23544-3_17
Recent efforts to support analytical tasks over relational sources have pointed out the necessity to come up with flexible, powerful means for analyzing the issued queries and exploiting them in decision-oriented processes (such as query recommendation or physical tuning). Issued queries should be decomposed, stored and manipulated in a dedicated subsystem. With this aim, we present a novel approach for representing SQL analytical queries in terms of a multidimensional algebra, which better characterizes the analytical efforts of the user. In this paper we discuss how an SQL query can be formulated as a multidimensional algebraic characterization. Then, we discuss how to normalize them in order to bridge (i.e., collapse) several SQL queries into a single characterization (representing the analytical session), according to their logical connections.
- Oscar Romero, Alkis Simitsis, Alberto Abelló. GEM: Requirement-Driven Generation of ETL and Multidimensional Conceptual Designs. 13th International Conference on Data Warehousing and Knowledge Discovery (DaWaK). Toulouse, France, August 29-September 2, 2011. Pages 80-95. Lecture Notes in Computer Science 6862, Springer 2011. ISBN: 978-3-642-23543-6. DOI:10.1007/978-3-642-23544-3_7
At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs, and translate them into an appropriate conceptual, multidimensional design. Typically, this task is performed manually, through a series of interviews involving two different parties: the business analysts and technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesigning, until the business needs are satisfied. It is of great importance for the business of an enterprise to facilitate and automate such a process. The goal of our research is to provide designers with a semi-automatic means for producing conceptual multidimensional designs and also, conceptual representation of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources along with the business requirements, for validating and completing (if necessary) these requirements, producing a multidimensional design, and identifying the ETL operations needed. We present our method in terms of the TPC-DS benchmark and show its applicability and usefulness.
- Oscar Romero, Alberto Abelló. A Comprehensive Framework on Multidimensional Modeling. Advances in Conceptual Modeling. Recent Developments and New Directions - ER 2011 Workshops (MoRE-BI). Brussels, Belgium, October 31 - November 3, 2011. Pages 108-117. Lecture Notes in Computer Science 6999, Springer 2011. ISBN: 978-3-642-24573-2. DOI: 10.1007/978-3-642-24574-9_14
In this paper we discuss what current multidimensional design approaches provide and what their major flaws are. Our contribution lies in a comprehensive framework that does not focus on how these approaches work but on what they provide for usage in real data warehouse projects. Thus, we do not aim at comparing current approaches but at setting up a framework (based on four criteria: the role played by end-user requirements and data sources, the degree of automation achieved and the quality of the output produced) highlighting their drawbacks, and the need for further research in this area.
- Oscar Romero, Alberto Abelló. Data-Driven Multidimensional Design for OLAP. Poster session in 23rd International Conference Scientific and Statistical Database Management (SSDBM 2011). Portland, OR, USA, July 2011. Pages 594-595. Lecture Notes in Computer Science 6809, Springer 2011. ISBN: 978-3-642-22350-1. DOI:10.1007/978-3-642-22351-8_51. See poster.
OLAP is a popular technology to query scientific and statistical databases, but its success heavily depends on a proper design of the underlying multidimensional (MD) databases (i.e., based on the fact / dimension paradigm). Relevantly, different approaches to automatically identify facts are nowadays available, but all MD design methods rely on discovering functional dependencies (FDs) to identify dimensions. However, an unbound FD search generates a combinatorial explosion and accordingly, these methods produce MD schemas with too many dimensions whose meaning has not been analyzed in advance. In contrast, i) we use the available ontological knowledge to drive the FD search and avoid the combinatorial explosion and ii) only propose dimensions of interest for analysts by performing a statistical study of data.
- A. Abelló, X. Burgués. Puntuación entre iguales para la evaluación del trabajo en equipo. In XVII Jornadas de Enseñanza Universitaria de la Informática (JENUI), Sevilla (España), July 2011. Pages 73-80. ISBN: 978-84-694-5156-4
Joining the European Higher Education Area (EHEA) and adopting a competency-based assessment system, in which some competencies are non-technical, forces us to consider changes not only in the way we teach but also in the way we assess. Assessing, for example, attitude towards work, teamwork, or capacity for innovation by means of an exam is clearly inappropriate, if not impossible. With this in mind, we experimented over two semesters with peer assessment for the generic competency "teamwork". In this paper, we present the experience and the conclusions drawn.
- A. Abelló. NOSQL: The death of the Star. Invited speaker in VII journées francophones sur les entrepots de Données et Analyses en ligne (EDA), Clermont-Ferrand (France), June 2011. Pages 1-2. Hermann, 2011. ISBN: 978-27056-81-2
In recent years, the problems of using generic storage techniques for very specific applications have been detected and outlined. Thus, some alternatives to relational DBMSs (e.g., BigTable and C-Store) are blooming. On the other hand, cloud computing is already a reality that helps to save money by eliminating fixed hardware and software costs and paying only per use. Thus, specific software tools to exploit the cloud have also appeared. The trend in this case is to use implementations based on the MapReduce paradigm developed by Google. The basic goal of this talk will be the introduction and the discussion of these ideas from the point of view of Data Warehousing and OLAP. We will see the advantages, disadvantages and some of the possibilities they offer.
- Oscar Romero, Alberto Abelló. Multidimensional Design Methods for Data Warehousing. Chapter 5 in Integrations of Data Warehousing, Data Mining and Database Technologies: Innovative Approaches. Editors David Taniar, Li Chen. Pages 78-105. IGI Global, 2011. ISBN: 978-1-60960-537-7 (hardcover), 978-1-60960-538-4 (ebook). DOI: 10.4018/978-1-60960-537-7.ch005
In recent years, data warehousing systems have gained relevance to support decision making within organizations. The core component of these systems is the data warehouse and nowadays it is widely assumed that the data warehouse design must follow the multidimensional paradigm. Thus, many methods have been presented to support the multidimensional design of the data warehouse. The first methods introduced were requirement-driven but the semantics of the data warehouse (since the data warehouse is the result of homogenizing and integrating relevant data of the organization in a single, detailed view of the organization business) require also considering the data sources during the design process. Considering the data sources gave rise to several data-driven methods that automate the data warehouse design process, mainly, from relational data sources. Currently, research on multidimensional modeling is still a hot topic and we have two main research lines. On the one hand, new hybrid automatic methods have been introduced proposing to combine data-driven and requirement-driven approaches. These methods focus on automating the whole process and improving the feedback retrieved by each approach to produce better results. On the other hand, some new approaches focus on considering scenarios other than relational sources. These methods also consider (semi)-structured data sources, such as ontologies or XML, that have gained relevance in recent years. Thus, they introduce innovative solutions for overcoming the heterogeneity of the data sources. All in all, we discuss the current scenario of multidimensional modeling by carrying out a survey of multidimensional design methods. We present the most relevant methods introduced in the literature and a detailed comparison showing the main features of each approach.
- Rafael Berlanga, Oscar Romero, Alkis Simitsis, Victoria Nebot, Torben Bach Pedersen, Alberto Abelló, María José Aramburu. Semantic Web Technologies for Business Intelligence. Chapter 14 in Business Intelligence Applications and the Web: Models, Systems, and Technologies. Editors Marta E. Zorrilla, Jose-Norberto Mazón, Óscar Ferrández, Irene Garrigós, Florian Daniel, Juan Trujillo. Pages 310-339. IGI Global, 2011. ISBN: 978-1-61350-038-5 (hardcover), 978-1-61350-039-2 (ebook), ISBN 978-1-61350-040-8 (print & perpetual access). DOI: 10.4018/978-1-61350-038-5.ch014
This chapter describes the convergence of two of the most influential technologies in the last decade, namely business intelligence (BI) and the Semantic Web (SW). Business intelligence is used by almost any enterprise to derive important business-critical knowledge from both internal and (increasingly) external data. When using external data, most often found on the Web, the most important issue is knowing the precise semantics of the data. Without this, the results cannot be trusted. Here, Semantic Web technologies come to the rescue, as they allow semantics ranging from very simple to very complex to be specified for any web-available resource. SW technologies do not only support capturing the "passive" semantics, but also support active inference and reasoning on the data. The chapter first presents a motivating running example, followed by an introduction to the relevant SW foundation concepts. The chapter then goes on to survey the use of SW technologies for data integration, including semantic data annotation and semantics-aware extract, transform, and load processes (ETL). Next, the chapter describes the relationship of multidimensional (MD) models and SW technologies, including the relationship between MD models and SW formalisms, and the use of advanced SW reasoning functionality on MD models. Finally, the chapter describes in detail a number of directions for future research, including SW support for intelligent BI querying, using SW technologies for providing context to data warehouses, and scalability issues. The overall conclusion is that SW technologies are very relevant for the future of BI, but that several new developments are needed to reach the full potential.
2010
- Alberto Abelló, Oscar Romero. Using ontologies to discover fact IDs. In 13th International Workshop on Data Warehousing and OLAP (DOLAP 2010). Toronto (Canada), October 2010. Pages 3-10. ACM Press, 2010. ISBN: 978-1-4503-0383-5. DOI: 10.1145/1871940.1871944
Object identification is a crucial step in most information systems. Nowadays, we have many different ways to identify entities such as surrogates, keys and object identifiers. However, not all of them guarantee the entity identity. Many works have been introduced in the literature for discovering meaningful IDs, but all of them work at the logical or data level and they share some constraints inherent to the kind of approach. Addressing it at the logical level, we may miss some important data dependencies, while the cost to identify data dependencies at the data level may not be affordable. In this paper, we propose an approach for discovering fact IDs from domain ontologies. In our approach, we guide the process at the conceptual level and we introduce a set of pruning rules for improving the performance by reducing the number of ID hypotheses generated and to be verified with data. Finally, we also introduce a simulation over a case study to show the feasibility of our method.
- A. Abelló, X. Burgués, M. E. Rodríguez. Utilización de glosarios de Moodle para incentivar la participación y dedicación de los estudiantes. In XVI Jornadas de Enseñanza Universitaria de la Informática (JENUI), Santiago de Compostela (España), 2010. Pages 309-316. ISBN: 84-693-3741-7
Joining the European Higher Education Area (EHEA) and adopting the new ECTS credit system, which measures the student's hours of dedication rather than the teacher's, means we must consider new teaching methods that encourage, while also bounding and controlling, students' dedication outside the classroom. With this in mind, we experimented with the glossaries provided by Moodle to encourage students to review at home the theory presented in class, continuously throughout the course (not only on the eve of the final exam).
- Xavier Burgués, Carme Quer, Carme Martín, Alberto Abelló, M. José Casany, M. Elena Rodríguez, Toni Urpí. Adapting LEARN-SQL to Database computer supported cooperative learning. In Workshop on Methods and Cases in Computing Education (MCCE). Cadiz (Spain), July 2010.
LEARN-SQL is a tool that we have been using for three years in several database courses, and that has shown its positive effects on the learning of different database issues. This tool allows proposing remote questionnaires to students, which are automatically corrected, giving them feedback and promoting their self-learning and self-assessment of their work. However, as currently used, this tool cannot propose structured exercises to teams in order to promote their cooperative learning. In this paper, we present our adaptation of the LEARN-SQL tool to support some Computer-Supported Collaborative Learning techniques.
- Carme Martín, Alberto Abelló, Xavier Burgués, M. José Casany, Carme Quer, M. Elena Rodríguez, Toni Urpí. Adaptació d'assignatures de bases de dades a l'EEES. In VII Congreso Internacional de Docencia Universitaria e Innovación (CIDUI). Barcelona (Spain), July 2010.
Recent changes in the curricula of UPC and UOC take into account the new European Higher Education Area (EHEA). One direct consequence of these changes is the need to bound and optimize the time devoted to learning activities that require the active participation of the student and are carried out continuously during the semester. In addition, the EHEA stresses the importance of practical work, interpersonal relationships, and the ability to work in a team, suggesting fewer lectures and more activities that foster both the student's individual work and cooperative work. In the teaching of database courses within computer science, the problem is especially complex because exercise statements do not usually have a unique solution. We have developed a tool, called LEARN-SQL, whose goal is to automatically correct any kind of SQL statement (queries, updates, stored procedures, triggers, etc.) and determine whether the answer given by the student is correct, regardless of the specific solution the student proposes. In this way we promote self-learning and self-assessment, enabling supervised blended learning and facilitating individualized learning according to each student's needs. Additionally, this tool helps teachers design assessment tests, and also allows them to review qualitatively the solutions provided by students. Finally, the system helps students learn from their own mistakes by providing quality feedback.
- Oscar Romero. Automating the multidimensional design of data warehouses. PhD Thesis, Universitat Politècnica de Catalunya. Barcelona, February 2010.
Previous experiences in the data warehouse field have shown that the data warehouse multidimensional conceptual schema must be derived from a hybrid approach: i.e., by considering both the end-user requirements and the data sources, as first-class citizens. As in any other system, requirements guarantee that the system devised meets the end-user necessities. In addition, since the data warehouse design task is a reengineering process, it must consider the underlying data sources of the organization: (i) to guarantee that the data warehouse can be populated from data available within the organization, and (ii) to allow the end-user to discover unknown additional analysis capabilities.
Currently, several methods for supporting the data warehouse modeling task have been provided. However, they suffer from some significant drawbacks. In short, requirement-driven approaches assume that requirements are exhaustive (and therefore, do not consider the data sources to contain alternative interesting evidences of analysis), whereas data-driven approaches (i.e., those leading the design task from a thorough analysis of the data sources) rely on discovering as much multidimensional knowledge as possible from the data sources. As a consequence, data-driven approaches generate too many results, which mislead the user. Furthermore, the design task automation is essential in this scenario, as it removes the dependency on an expert's ability to properly apply the method chosen, and the need to analyze the data sources, which is a tedious and time-consuming task (which can be unfeasible when working with large databases). In this sense, current automatable methods follow a data-driven approach, whereas current requirement-driven approaches overlook the process automation, since they tend to work with requirements at a high level of abstraction. Indeed, this scenario is repeated regarding data-driven and requirement-driven stages within current hybrid approaches, which suffer from the same drawbacks as pure data-driven or requirement-driven approaches.
In this thesis we introduce two different approaches for automating the multidimensional design of the data warehouse: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both approaches were devised to overcome the limitations from which current approaches suffer. Importantly, our approaches consider opposite initial assumptions, but both consider the end-user requirements and the data sources as first-class citizens.
1. MDBE follows a classical approach, in which the end-user requirements are well-known beforehand. This approach benefits from the knowledge captured in the data sources, but guides the design task according to requirements and consequently, it is able to handle semantically poorer data sources. In other words, given high-quality end-user requirements, we can guide the process from the knowledge they contain, and compensate for data sources of poor quality (from a semantic point of view).
2. AMDO, as a counterpart, assumes a scenario in which the data sources available are semantically richer. Thus, the approach proposed is guided by a thorough analysis of the data sources, which is properly adapted to shape the output result according to the end-user requirements. In this context, given high-quality data sources, we can compensate for the lack of expressive end-user requirements.
Importantly, our methods establish a combined and comprehensive framework that can be used to decide, according to the inputs provided in each scenario, which is the best approach to follow. For example, we cannot follow the same approach in a scenario where the end-user requirements are clear and well-known, and in a scenario in which the end-user requirements are not evident or cannot be easily elicited (e.g., this may happen when the users are not aware of the analysis capabilities of their own sources). Interestingly, the need to have requirements beforehand is lessened by having semantically rich data sources. Lacking those, requirements gain relevance to extract the multidimensional knowledge from the sources.
Thus, we claim to provide two approaches whose combination turns out to be exhaustive with regard to the scenarios discussed in the literature.
- Oscar Romero, Alberto Abelló. A framework for multidimensional design of data warehouses from ontologies (© Elsevier). In Data & Knowledge Engineering, Volume 69, Issue 11. Pages 1138-1157. Elsevier, 2010. ISSN: 0169-023X. DOI: 10.1016/j.datak.2010.07.007
The data warehouse design task needs to consider both the end-user requirements and the organization data sources. For this reason, the data warehouse design has been traditionally considered a reengineering process, guided by requirements, from the data sources.
Most current design methods available demand highly-expressive end-user requirements as input, in order to carry out the exploration and analysis of the data sources. However, eliciting the end-user information requirements can be an arduous task. Importantly, in the data warehousing context, the analysis capabilities of the target data warehouse depend on what kind of data is available in the data sources. Thus, in those scenarios where the analysis capabilities of the data sources are not (fully) known, it is possible to help the data warehouse designer to identify and elicit unknown analysis capabilities.
In this paper we introduce a user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks. Our proposal is based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain. It starts by fully analyzing the data sources to identify, without considering requirements yet, the multidimensional knowledge they capture (i.e., data likely to be analyzed from a multidimensional point of view). Next, we propose to exploit this knowledge in order to support the requirements elicitation task. In this way, we are already conciliating requirements with the data sources, and we are able to fully exploit the analysis capabilities of the sources. Once requirements are clear, we automatically create the data warehouse conceptual schema according to the multidimensional knowledge extracted from the sources.
- Oscar Romero, Alberto Abelló. Automatic validation of requirements to support multidimensional design (© Elsevier). In Data & Knowledge Engineering, Volume 69, Issue 9. Pages 917-942. Elsevier, 2010. ISSN: 0169-023X. DOI: 10.1016/j.datak.2010.03.006
It is widely accepted that the conceptual schema of a data warehouse must be structured according to the multidimensional model. Moreover, it has been suggested that the ideal scenario for deriving the multidimensional conceptual schema of the data warehouse would consist of a hybrid approach (i.e., a combination of data-driven and requirement-driven paradigms). Thus, the resulting multidimensional schema would satisfy the end-user requirements and would be conciliated with the data sources. Most current methods follow either a data-driven or requirement-driven paradigm and only a few use a hybrid approach. Furthermore, hybrid methods are unbalanced and do not benefit from all of the advantages brought by each paradigm.
In this paper we present our approach for multidimensional design. The most relevant step in our framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements. MDBE introduces several advantages over previous approaches, which can be summarized as three main contributions. (i) The MDBE method is a fully automatic approach that handles and analyzes the end-user requirements automatically. (ii) Unlike data-driven methods, we focus on data of interest to the end-user. However, the user may not be aware of all the potential analyses of the data sources and, in contrast to requirement-driven approaches, MDBE can propose new multidimensional knowledge related to concepts already queried by the user. (iii) Finally, MDBE proposes meaningful multidimensional schemas derived from a validation process. Therefore, the proposed schemas are sound and meaningful.
- Alberto Abelló, Il-Yeol Song. Data warehousing and OLAP (DOLAP'08) (© Elsevier). In Data & Knowledge Engineering, Volume 69, Issue 1. Pages 1-2. Elsevier, 2010. ISSN: 0169-023X. DOI: 10.1016/j.datak.2009.08.011
2009
- Oscar Romero, Diego Calvanese, Alberto Abelló, Mariano Rodriguez-Muro. Discovering functional dependencies for multidimensional design. In 12th International Workshop on Data Warehousing and OLAP (DOLAP 2009). Hong Kong (China), November 2009. Pages 1-8. ACM Press, 2009. ISBN 978-1-60558-801-8.
Nowadays, it is widely accepted that the data warehouse design task should be largely automated. Furthermore, the data warehouse conceptual schema must be structured according to the multidimensional model and as a consequence, the most common way to automatically look for subjects and dimensions of analysis is by discovering functional dependencies (as dimensions functionally depend on the fact) over the data sources.
Most advanced methods for automating the design of the data warehouse carry out this process from relational OLTP systems, assuming that an RDBMS is the most common kind of data source we may find, and taking as starting point a relational schema. In contrast, we propose to rely instead on a conceptual representation of the domain of interest formalized through a domain ontology expressed in the DL-Lite Description Logic. We propose an algorithm to discover functional dependencies from the domain ontology that exploits the inference capabilities of DL-Lite, thus fully taking into account the semantics of the domain. We also provide an evaluation of our approach in a real-world scenario.
- Alberto Abelló, Oscar Romero. On-Line Analytical Processing (OLAP). In Encyclopedia of Database Systems (editors-in-chief: Tamer Ozsu & Ling Liu). Springer 2009. Pages 1949-1954. ISBN: 978-0-387-39940-9
- A. Abelló, X. Burgués, M. J. Casany, C. Martín, C. Quer, T. Urpí, M. E. Rodríguez. LEARN-SQL: Herramienta de gestión de ejercicios de SQL con autocorrección. In XV Jornadas de Enseñanza Universitaria de la Informática (JENUI), Barcelona (España), 2009. Pages 353-360. ISBN 978-84-692-2758-9
Some self-correction tools already exist in the field of computer science teaching. However, in database courses the problem is especially complex because of the wide variety of exercise types (existing systems are limited to queries) and because exercises do not have a unique solution. Our system aims to automatically correct any kind of SQL statement (queries, updates, procedures, triggers, index creation, etc.) and determine whether the answer given by the student is correct, regardless of the specific solution the student proposes. In this paper we specifically present the module in charge of exercise management and all the exercise types we currently use.
- Oscar Romero, Alberto Abelló. A Survey of Multidimensional Modeling Methodologies. In International Journal on Data Warehousing and Mining (IJDWM), volume 5, number 2. Idea Group 2009. Pages 1-23. ISSN: 1548-3924
Many methodologies have been presented to support the multidimensional design of the data warehouse. The first methodologies introduced were requirement-driven but the semantics of a data warehouse require also considering data sources during the design process. In the following years, data sources gained relevance in multidimensional modeling and gave rise to several data-driven methodologies that automate the data warehouse design process from relational sources. Currently, research on multidimensional modeling is still a hot topic and we have two main research lines. On the one hand, new hybrid automatic methodologies have been introduced proposing to combine data-driven and requirement-driven approaches. On the other hand, new approaches focus on considering other kinds of structured data sources that have gained relevance in recent years, such as ontologies or XML. In this article we present the most relevant methodologies introduced in the literature and a detailed comparison showing the main features of each approach.
2008
- Il-Yeol Song and Alberto Abelló. Foreword (© ACM). In 11th International Workshop on Data Warehousing and OLAP (DOLAP 2008). Napa (USA), November 2008. ACM Press, 2008. ISBN 978-1-60558-387-7.
- Oscar Romero and Alberto Abelló. MDBE: Automatic Multidimensional Modeling (© Springer). In 27th International Conference on Conceptual Modeling (ER 2008). Barcelona (Spain), October 2008. LNCS 5231, Pages 534-535. Springer, 2008. ISSN 0302-9743.
The goal of this demonstration is to present MDBE, a tool implementing our methodology for automatically deriving multidimensional schemas from relational sources, bearing in mind the end-user requirements. Our approach starts gathering the end-user information requirements that will be mapped over the data sources as SQL queries. Based on the constraints that a query must preserve to make multidimensional sense, MDBE automatically derives multidimensional schemas which agree with both the input requirements and the data sources.
- Alberto Abelló, M. Elena Rodríguez, Toni Urpí, Xavier Burgués, M. José Casany, Carme Martín, Carme Quer. LEARN-SQL: Automatic Assessment of SQL Based on IMS QTI Specification (© IEEE). Poster session in 8th International Conference on Advanced Learning Technologies (ICALT 2008). Santander (Spain), July 2008. Pages 592-593. IEEE, 2008. ISBN 978-0-7695-3167-0. See poster
In this paper we present LEARN-SQL, a system conforming to the IMS QTI specification that allows on-line learning and assessment of students on SQL skills in an automatic, interactive, informative, scalable and extensible manner.
- Xavier Burgués, Carme Quer, Alberto Abelló, M. José Casany, Carme Martín, M. Elena Rodríguez, Toni Urpí. Uso de LEARN-SQL en el aprendizaje cooperativo de Bases de Datos. In XIV Jornadas de Enseñanza Universitaria de la Informática (JENUI 2008). Granada (Spain), July 2008. Pages 359-366. FER fotocomposición, 2008. ISBN 978-84-612-4475-1.
In this article we describe the changes made to some courses in the database area along two lines: organizational and technological. In the first, the main objective was the introduction of cooperative learning techniques. In the second, the objective was to foster self-learning and self-assessment through the LEARN-SQL tool. So far, the changes related to the two lines have been applied to different courses. To conclude the article, we assess the results obtained and outline future changes aimed at combining the two lines.
- M. José Casany, Carme Martín, Alberto Abelló, Xavier Burgués, Carme Quer, M. Elena Rodríguez, Toni Urpí. LEARN-SQL: A blended learning tool for the database area. In V Congreso Internacional de Docencia Universitaria e Innovación (CIDUI 2008). Lleida (Spain), July 2008. ISBN 978-84-8458-279-3.
The academic programs of the UPC and UOC are adapting to the European Credit Transfer System (ECTS). One of the changes introduced in the academic programs of these universities tries to optimize the time of the activities that require the active participation of the students. The definition of these activities is a very complex task, especially when dealing with database teaching in ICT engineering degrees, because usually the questions do not have a unique solution. LEARN-SQL is the tool developed by our group that automatically evaluates the correctness of any SQL statement (queries, updates, stored procedures, triggers, etc.) independently of the student's solution. Furthermore, LEARN-SQL helps teachers design their tests and allows them to review the solutions provided by the students. Finally, the system provides students with valuable feedback, so that they can learn from their mistakes.
2007
- Oscar Romero and Alberto Abelló. Automating Multidimensional Design from Ontologies (© ACM). In 10th International Workshop on Data Warehousing and OLAP (DOLAP 2007). Lisbon (Portugal), November 2007. Pages 1-8. ACM Press, 2007. ISBN 1-59593-827-5.
This paper presents a new approach to automate the multidimensional design of Data Warehouses. In our approach we propose a semi-automatable method aimed to find the business multidimensional concepts from a domain ontology representing different and potentially heterogeneous data sources of our business domain. In short, our method identifies business multidimensional concepts from heterogeneous data sources having nothing in common but that they are all described by an ontology.
- Alberto Abelló, Toni Urpí, M. Elena Rodríguez, and Marc Estévez. Extensión de Moodle para facilitar la corrección automática de cuestionarios y su aplicación en el ámbito de las bases de datos. In MoodleMoot'07 (Moodle). Cáceres (Spain), October 2007.
Moodle 1.5 provides a quiz module that makes it easy to manage a set of questions for later use in different quizzes, which can be defined according to the needs of each course. Basically, questions can be either multiple choice or short answer. For short-answer questions, a single extra or missing blank space in the student's answer (with respect to the solution previously entered by the teacher) causes it to be considered incorrect. In computer science teaching, in courses such as "programming" or "databases", the problem is especially acute, because exercises do not usually have a unique solution. This is why we considered developing a new module for Moodle that would allow more grading options than a simple character-by-character comparison against the solution provided by the teacher. We have therefore developed a new type of quiz whose questions reside in a repository external to Moodle. Each of these questions is associated with one or more Web Services that are able to determine whether the student's answer is correct or not. In our case, we were interested in grading queries over a database using SQL, but with the same module, by connecting to a different Web Service, any kind of question can be graded, not necessarily from the database area. Basically, it only requires that the grading be objectifiable and, consequently, that a procedure exist to perform it automatically.
- Oscar Romero, and Alberto Abelló. MDBE: Una herramienta Automática para el Modelado Multidimensional. Demonstration in Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2007). Zaragoza (Spain), September 2007. Pages 387-388. Thomson Editores, ISBN 978-84-9732-595-0.
To ease the multidimensional modeling process of a DW, in this work we present MDBE (Multidimensional Design By Examples): our proposed tool for validating multidimensional requirements provided by the end user and expressed as SQL queries over the operational data sources. MDBE decomposes the input SQL query to extract the relevant multidimensional knowledge it contains and, according to that information, derives a set of multidimensional schemas that satisfy the user's requirements (queries). That is, it automatically proposes possible multidimensional schemas.
- Oscar Romero and Alberto Abelló. On the Need of a Reference Algebra for OLAP (© Springer-Verlag). In 9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK'07). Regensburg (Germany), September, 2007. Pages 99-110, Lecture Notes in Computer Science volume 4654. Springer, 2007. ISSN 0302-9743, ISBN 3-540-28566-0.
Although multidimensionality has been widely accepted as the best solution to conceptual modeling, there is no such agreement about the set of operators to handle multidimensional data. This paper presents a comparison of the existing multidimensional algebras, trying to find a common backbone, and discusses the need for a reference multidimensional algebra and the current state of the art.
- Oscar Romero and Alberto Abelló. Generating Multidimensional Schemas from the Semantic Web. Poster session in 19th Conference on Advanced Information Systems Engineering (CAiSE'07). Trondheim (Norway), June 2007.
In this paper, we introduce a semi-automatable method aimed to find the business multidimensional concepts from an ontology representing the organization domain. With these premises, our approach falls into the Semantic Web research area, where ontologies play a key role to provide a common vocabulary describing the meaning of relevant terms and relationships among them.
2006
- Stefano Rizzi, Alberto Abelló, Jens Lechtenbörger, and Juan Trujillo. Research in Data Warehouse Modeling and Design: Dead or Alive? (© ACM). In 9th International Workshop on Data Warehousing and OLAP (DOLAP 2006). Arlington (USA), November 2006. Pages 3-10. ACM Press, 2006. ISBN 1-59593-530-4.
Multidimensional modeling requires specialized design techniques. Though a lot has been written about how a data warehouse should be designed, there is no consensus on a design method yet. This paper follows from a wide discussion that took place in Dagstuhl, during the Perspectives Workshop "Data Warehousing at the Crossroads", and is aimed at outlining some open issues in modeling and design of data warehouses. More precisely, issues regarding conceptual models, logical models, methods for design, interoperability, and design for new architectures and applications are considered.
- Alberto Abelló, Roberto García, Rosa Gil, Marta Oliva, and Ferran Perdix. Semantic Data Integration in a Newspaper Content Management System (© Springer-Verlag). In 5th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE'06) poster session. Lyon (France), October, 2006. Pages 41-41, Lecture Notes in Computer Science volume 4277. Springer, 2006. ISSN 0302-9743, ISBN 3-540-28566-0. See poster
A newspaper content management system has to deal with a very heterogeneous information space, as the experience in the Diari Segre newspaper has shown us. The greatest problem is to harmonise the different ways the involved users (journalists, archivists...) structure the newspaper information space, i.e. news, topics, headlines, etc. Our approach is based on ontology and differentiated universes of discourse (UoD). Users interact with the system and, from this interaction, integration rules are derived. These rules are based on Description Logic ontological relations for subsumption and equivalence. They relate the different UoD and produce a shared conceptualisation of the newspaper information domain.
- Oscar Romero and Alberto Abelló. Multidimensional Design by Examples (© Springer-Verlag). In 8th International Conference on Data Warehousing and Knowledge Discovery (DaWaK'06). Krakow (Poland), September, 2006. Pages 85-94, Lecture Notes in Computer Science volume 4081. Springer, 2006. ISSN 0302-9743, ISBN 3-540-28566-0.
In this paper we present a method to validate user multidimensional requirements expressed in terms of SQL queries. Furthermore, our approach automatically generates and proposes the set of multidimensional schemas satisfying the user requirements, from the organizational operational schemas. If no multidimensional schema is generated for a query, we can state that the requirement is not multidimensional.
- Alberto Abelló, José Samos, and Fèlix Saltor. YAM²: A Multidimensional Conceptual Model Extending UML (© Elsevier). In Information Systems 31 (6), September, 2006. Pages 541-567. Elsevier, 2006. ISSN 0306-4379.
This paper presents a multidimensional conceptual Object-Oriented model for Data Warehousing and OLAP tools, its structures, integrity constraints and query operations. It has been developed as an extension of UML core metaclasses to facilitate its usage, and tries to fill the absence of a standard model. Being a UML extension allows reusing modeling constructs and techniques, and integrating multidimensional modeling in more general modeling processes. Moreover, while existing multidimensional models are restricted to the modeling of isolated stars, this paper investigates the representation of several semantically related star schemas. Summarizability and identification constraints can also be represented in the model, and a closed and complete set of algebraic operations has been defined in terms of functions (so that mathematical properties of functions can be smoothly applied).
- Adriana Marotta, Federico Piedrabuena, and Alberto Abelló. Managing Quality Properties in a ROLAP Environment (© Springer-Verlag). In 18th Conference on Advanced Information Systems Engineering (CAiSE'06). Luxemburg, June 2006. Pages 127-141, Lecture Notes in Computer Science volume 4001. Springer, 2006. ISSN 0302-9743, ISBN 3-540-28566-0.
In this work we propose, for an environment where multidimensional queries are made over multiple Data Marts, techniques for providing the user with quality information about the retrieved data. This meta-information behaves as an added value over the obtained information or as an additional element to take into account during the proposition of the queries. The quality properties considered are freshness, availability and accuracy. We provide a set of formulas that allow estimating or calculating the values of these properties, for the result of any multidimensional operation of a predefined basic set.
- Oscar Romero and Alberto Abelló. On the Mismatch Between Multidimensionality and SQL. Technical Report LSI-06-32-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), June 2006.
ROLAP tools are intended to ease information analysis and navigation through the whole Data Warehouse. These tools automatically generate a query according to the multidimensional operations performed by the end-user, using the relational database technology to implement multidimensionality and, consequently, automatically translating multidimensional operations to SQL. In this paper, we consider this automatic translation process in detail and, to do so, we present an exhaustive comparison (both theoretical and practical) between the multidimensional algebra and the relational one. Firstly, we discuss the need for a multidimensional algebra with regard to the relational one and, later, we thoroughly study the considerations to be made to guarantee the correctness of a cube-query (an SQL query making multidimensional sense). With this aim, we analyze the multidimensional algebra expressiveness with regard to SQL, pointing out the features a query must satisfy to make multidimensional sense, and we also focus on those problems that can arise in a cube-query due to SQL intrinsic restrictions. The SQL translation of an isolated operation does not represent a problem, but when mixing up the modifications brought about by a set of operations in a single cube-query, some conflicts derived from SQL could emerge depending on the operations involved. Therefore, if these problems are not detected and treated appropriately, the automatic translation can retrieve unexpected results.
- Alberto Abelló, and Fernando Carpani. Using OWL to integrate relational Schemas. Technical Report LSI-06-10-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), March 2006.
Ontologies offer two contributions to the Semantic Web. On the one hand, they show a vocabulary consensus inside a community. On the other hand, they provide reasoning capabilities. In this paper we present a completely automatic translation from relational schemas to OWL, so that inference mechanisms can be used to integrate different schemas, by dealing with structure heterogeneities. The output of the translation algorithm, which makes functional dependencies in the relational schema explicit, belongs to OWL Full.
2005
- Oscar Romero, and Alberto Abelló. Improving automatic SQL translation for ROLAP tools. In Proceedings of Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2005). Granada (Spain), September 2005. Pages 123-130. Thomson Editores, ISBN 84-9732-434-X.
In the last years, although a vast amount of work has been devoted to modeling multidimensionality, the translation of multidimensional algebra to SQL has been overlooked. ROLAP tools automatically generate a cube-query according to the operations performed by the user. The SQL translation does not represent a problem when treating isolated operations, but when mixing together modifications brought about by a set of operations in the same cube-query, some conflicts could emerge depending on the operations involved. Therefore, if these problems are not detected and treated appropriately, the automatic translation can retrieve unexpected results. In this paper, we define and classify conflicts raised when automatically translating a multidimensional algebra to SQL, and analyze how to solve or minimize their impact.
- Alberto Abelló, Xavi de Palol, and Mohand-Saïd Hacid. On the Midpoint of a Set of XML Documents (© Springer-Verlag). In 16th International Conference on Database and Expert Systems Applications (DEXA 05). Copenhagen (Denmark), August 2005. Pages 441-450, Lecture Notes in Computer Science volume 3588. Springer, 2005. ISSN 0302-9743, ISBN 3-540-28566-0.
The WWW contains a huge amount of documents. Some of them share the subject, but are generated by different people or even organizations. To guarantee the interchange of such documents, we can use XML, which allows sharing documents that do not have the same structure. However, it makes it difficult to understand the core of such heterogeneous documents (in general, the schema is not available). In this paper, we offer a characterization and algorithm to obtain the midpoint (in terms of a resemblance function) of a set of semi-structured, heterogeneous documents without optional elements. The trivial case of midpoint would be the elements common to all documents. Nevertheless, in cases with several heterogeneous documents this may result in an empty set. Thus, we consider that those elements present in a given amount of documents belong to the midpoint. An exact schema could always be found by generating optional elements. However, the exact schema of the whole set may result in overspecialization (lots of optional elements), which would make it useless.
- Alberto Abelló, Xavi de Palol, and Mohand-Saïd Hacid. Approximating the DTD of a set of XML documents. Technical Report LSI-05-7-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), March 2005.
Extended/preliminary version of the previous paper: "On the Midpoint of a Set of XML Documents".
2003
- Alberto Abelló, and Carme Martín. The Data Warehouse: A Temporal Database. In Proceedings of Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2003). Alacant (Spain), November 2003. Pages 675-684. Campobell S.L., ISBN 84-688-3836-5.
The aim of this paper is to bring together two research areas, i.e. "Data Warehouses" and "Temporal Databases", involving representation of time. In order to achieve this goal, data warehouse and temporal database research results have been surveyed. Looking at temporal aspects within a data warehouse, more similarities than differences between temporal databases and data warehouses have been found. The first closeness between these areas consists in the possibility of a data warehouse redefinition in terms of a bitemporal database. Another relation is the use of temporal languages in data warehousing. Moreover, the correspondence between advances in temporal evolution and storage, and data warehouses is presented. Finally, Object-Oriented temporal data models contribute to add the integration and subject-orientation that is required by a data warehouse. Therefore, this paper is focussed on how contributions of the temporal database research could benefit data warehouses.
- Alberto Abelló, José Samos, and Fèlix Saltor. Implementing Operations to Navigate Semantic Star Schemas (© ACM). In 6th International Workshop on Data Warehousing and OLAP (DOLAP 2003). New Orleans (USA), November 2003. Pages 56-62. ACM Press, 2003. ISBN 1-58113-727-3.
In the last years, lots of work has been devoted to multidimensional modeling, star shape schemas and OLAP operations. However, drill-across has not captured as much attention as other operations. This operation allows changing the subject of analysis while keeping the same analysis space we were using to analyze another subject. It is assumed that this can be done if both subjects share exactly the same analysis dimensions. In this paper, besides the implementation of an algebraic set of operations on a RDBMS, we are going to show when and how we can change the subject of analysis in the presence of semantic relationships, even if the analysis dimensions do not exactly coincide.
- Carme Martín, and Alberto Abelló. A Temporal Study of Data Sources to Load a Corporate Data Warehouse (© Springer-Verlag). In 5th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2003). Prague (Czech Republic), September 2003. Pages 109-118, Lecture Notes in Computer Science volume 2737. Springer, 2003. ISSN 0302-9743, ISBN 3-540-40807-X.
The input data of the corporate data warehouse is provided by the data sources, that are integrated. In the temporal database research area, a bitemporal database is a database supporting valid time and transaction time. Valid time is the time when the fact is true in the modeled reality, while transaction time is the time when the fact is stored in the database. Defining a data warehouse as a bitemporal database containing integrated and subject-oriented data in support of the decision making process, transaction time in the data warehouse can always be obtained, because it is internal to a given storage system. When an event is loaded into the data warehouse, its valid time is transformed into a bitemporal element, adding transaction time, generated by the database management system of the data warehouse. However, depending on whether the data sources manage transaction time and valid time or not, we could obtain the valid time for the data warehouse or not. The aim of this paper is to present a temporal study of the different kinds of data sources to load a corporate data warehouse, using a bitemporal storage structure.
- Alberto Abelló, Elena Rodríguez, Fèlix Saltor, Marta Oliva, Cecilia Delgado, Eladio Garví and José Samos. On Operations to Conform Object-Oriented Schemas. Long paper in International Conference on Enterprise Information Systems (ICEIS 2003). Angers (France). April, 2003. Selected among the best papers of the conference to be published in "Enterprise Information Systems V", Kluwer Academic Publishers, 2004. Pages 49-56. ISBN 1-4020-1726-X
To build a Cooperative Information System from several preexisting, heterogeneous systems, the schemas of these systems must be integrated. Operations used for this purpose include conforming operations, which change the form of a schema. In this paper we present a systematic approach to establish which conforming operations for Object-Oriented schemas are needed, and which of them can be considered as primitive, all others being derivable from these. We organize these operations in matrixes according to the Object-Oriented dimensions -Generalization/Specialization, Aggregation/Decomposition- on which they operate.
- Alberto Abelló, and Carme Martín. A Bitemporal Storage Structure for a Corporate Data Warehouse. Short paper in International Conference on Enterprise Information Systems (ICEIS 2003). Angers (France). April, 2003.
This paper brings together two research areas, i.e. "Data Warehouses" and "Temporal Databases", involving representation of time. Looking at temporal aspects within a data warehouse, more similarities than differences between temporal databases and data warehouses have been found. The first closeness between these areas consists in the possibility of a data warehouse redefinition in terms of a bitemporal database. A bitemporal storage mechanism is proposed along this paper. In order to meet this goal, a temporal study of data sources is developed. Moreover, we will show how Object-Oriented temporal data models contribute to add the integration and subject-orientation that is required by a data warehouse.
2002
- Alberto Abelló, Francisco Araque, Cecilia Delgado, Eladio Garví, Marta Oliva, Elena Rodríguez, Emilia Ruíz, Fèlix Saltor, José Samos, and Manolo Torres. Operaciones para Conformar Esquemas Orientados a Objetos. In Taller sobre Integración Semántica de Fuentes de Datos Distribuidas y Heterogéneas de las Jornadas de Ingeniería del Software y Bases de Datos (JISBD2002). El Escorial (Spain), November 2002. (In Spanish)
- Alberto Abelló, José Samos, and Fèlix Saltor. On Relationships Offering New Drill-across Possibilities (© ACM). In 5th International Workshop on Data Warehousing and OLAP (DOLAP 2002). McLean (USA), November 2002. Pages 7-13. ACM Press, 2002. ISBN 1-58113-590-4.
OLAP tools divide concepts based on whether they are used as analysis dimensions, or are the fact subject of analysis, which gives rise to star shape schemas. Operations are always provided to navigate inside such star schemas. However, the navigation among different stars is usually overlooked. This paper studies different kinds of Object-Oriented conceptual relationships (part of UML standard) between stars (namely Derivation, Generalization, Association, and Flow) that allow to drill across them.
- Carme Martín, and Alberto Abelló. The Data Warehouse: A Temporal Database. Technical Report LSI-02-66-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), November 2002.
Extended version of the homonymous paper published in 2003.
- Alberto Abelló, José Samos, and Fèlix Saltor. YAM² (Yet Another Multidimensional Model): An extension of UML (© IEEE). In International Database Engineering & Applications Symposium (IDEAS'02). Edmonton (Canada), July 2002. Pages 172-181. Mario A. Nascimento, M. Tamer Özsu, Osmar Zaïne Editors. IEEE Computer Society Press, 2002. ISBN 0-7695-1638-6. ISSN 1098-8086.
This paper presents a multidimensional conceptual Object-Oriented model, its structures, integrity constraints and query operations. It has been developed as an extension of UML core metaclasses to facilitate its usage, as well as to avoid the introduction of completely new concepts. YAM² allows the representation of several semantically related star schemas, as well as summarizability and identification constraints.
- Alberto Abelló. YAM²: A Multidimensional Conceptual Model. PhD Thesis, Universitat Politècnica de Catalunya. Barcelona, April 2002.
This thesis proposes YAM², a multidimensional conceptual model for OLAP (On-Line Analytical Processing). It is defined as an extension of UML (Unified Modeling Language). The aim is to benefit from Object-Oriented concepts and relationships to allow the definition of semantically rich multi-star schemas. Thus, the usage of Generalization, Association, Derivation, and Flow relationships (in UML terminology) is studied.
An architecture based on different levels of schemas is proposed and the characteristics of its different levels defined. The benefits of this architecture are twofold. Firstly, it relates Federated Information Systems with Data Warehousing, so that advances in one area can also be used in the other. Moreover, the Data Mart schemas are defined so that they can be implemented on different Database Management Systems, while still offering a common integrated vision that allows to navigate through the different stars.
The main concepts of any multidimensional model are facts and dimensions. Both are analyzed separately, based on the assumption that relationships between aggregation levels are part-whole (or composition) relationships. Thus, mereology axioms are used on that analysis to prove some properties.
Besides structures, operations and integrity constraints are also defined for YAM². Due to the fact that, in this thesis, a data cube is defined as a function, operations (i.e. Drill-across, ChangeBase, Roll-up, Projection, and Selection) are defined over functions. Regarding the set of integrity constraints, they reflect the importance of summarizability (or aggregability) of measures, and pay special attention to it.
2001
- Alberto Abelló, Francisco Araque, José Samos, and Fèlix Saltor. Bases de Datos Federadas, Almacenes de Datos y Análisis Multidimensional. In Taller de Almacenes de Datos y Tecnologia OLAP de las Jornadas de Ingeniería del Software y Bases de Datos (JISBD2001). Almagro (Spain), November 2001. (In Spanish)
- Alberto Abelló, José Samos, and Fèlix Saltor. Understanding Facts in a Multidimensional Object-Oriented Model (© ACM). In 4th International Workshop on Data Warehousing and OLAP (DOLAP 2001). Atlanta (USA), November 2001. Pages 32-39. ACM Press, 2001. ISBN 1-58113-437-1.
"On-Line Analytical Processing" tools are used to extract information from the "Data Warehouse" in order to help in the decision making process. These tools are based on multidimensional concepts, i.e. facts and dimensions. In this paper we study the meaning of facts, and the dependencies in multidimensional data. This study is used to find relationships between cubes (in an Object-Oriented framework) and explain navigation operations.
- Alberto Abelló, José Samos, and Fèlix Saltor. Multi-star Conceptual Schemas for OLAP Systems. Technical Report LSI-01-45-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), October 2001.
Extended version of the paper published in 2002: "On Relationships Offering New Drill-across Possibilities".
- Alberto Abelló, José Samos, and Fèlix Saltor. YAM2 (Yet Another Multidimensional Model): An extension of UML. Technical Report LSI-01-43-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), October 2001.
Extended version of the homonymous paper published in 2002.
- Elena Rodríguez, Alberto Abelló, Marta Oliva, Fèlix Saltor, Cecilia Delgado, Eladio Garví and José Samos. On Operations along the Generalization/Specialization Dimension. In Proc. of the Int. Workshop on Engineering Federated Information Systems (EFIS 2001). Berlin (Germany), October 2001. Pages 70-83. ISBN 3-89838-027-0
The need to derive a database schema from one or more existing schemas arises in Federated Database Systems as well as in other contexts. Operations used for this purpose include conforming operations, which change the form of a schema. In this paper we present a systematic approach to establish a set of primitive conforming operations that operate along the Generalization/Specialization dimension in the context of Object-Oriented schemas.
- Alberto Abelló, José Samos, and Fèlix Saltor. A Framework for the Classification and Description of Multidimensional Data Models (© Springer-Verlag). In 12th International Conference on Database and Expert Systems Applications (DEXA 2001). Munich (Germany), September 2001. Pages 668-677, Lecture Notes in Computer Science volume 2113. Springer, 2001. ISSN 0302-9743, ISBN 3-540-42527-6.
The words On-Line Analytical Processing bring together a set of tools, that use multidimensional modeling in the management of information to improve the decision making process. Lately, a lot of work has been devoted to modeling the multidimensional space. The aim of this paper is twofold. On one hand, it compiles and classifies some of that work, with regard to the design phase they are used in. On the other hand, it allows to compare the different terminology used by each author, by placing all the terms in a common framework.
- Alberto Abelló, José Samos, and Fèlix Saltor. Understanding Analysis Dimensions in a Multidimensional Object-Oriented Model. In 3rd International Workshop on Design and Management of Data Warehouses (DMDW'2001). Interlaken (Switzerland), June 2001. SwissLife, ISSN 1424-4691.
OLAP defines a set of data warehousing query tools characterized by providing a multidimensional view of data. Information can be shown at different aggregation levels (often called granularities) for each dimension. In this paper, we try to outline the benefits of understanding the relationships between those aggregation levels as Part-Whole relationships, and how it helps to address some semantic problems. Moreover, we propose the usage of other Object-Oriented constructs to keep as much semantics as possible in analysis dimensions.
2000
- Alberto Abelló, José Samos, and Fèlix Saltor. A Data Warehouse Multidimensional Data Models Classification. Technical Report LSI-2000-6. Dept. Lenguajes y Sistemas Informáticos (Universidad de Granada), December 2000.
The words On-Line Analytical Processing (OLAP) bring together a set of tools, that use multidimensional modeling in the extraction of information from the Data Warehouse. Lately, a lot of work has been devoted to modeling the multidimensional space. The aim of this paper is twofold. On one hand, it compiles and classifies most of that work. On the other hand, it allows to compare the different terminology used by each author, by placing all the terms in a common framework.
- Elena Rodríguez, Alberto Abelló, and Marta Oliva. Resumen del Simposium en Objetos y Bases de Datos del ECOOP'2000. In Taller de Bases de Datos Orientadas a Objetos dentro de las Jornadas de Ingeniería del Software y Bases de Datos (JISBD2000). Valladolid (Spain), November 2000. (In Spanish)
- Alberto Abelló, and Elena Rodríguez. Describing BLOOM99 with regard to UML Semantics. In Proceedings of Jornadas de Ingeniería del Software y Bases de Datos (JISBD2000). Valladolid (Spain), November 2000. Pages 307-319. Gráficas Andrés Martín S.L., ISBN 84-8448-065-8.
In this paper, we describe the BLOOM metaclasses with regard to the Unified Modeling Language (UML) semantics. We concentrate essentially on the Generalization/Specialization and Aggregation/Decomposition dimensions, because they are used to guide the integration process BLOOM was intended for. Here we focus on the conceptual data modeling constructs that UML offers. Although UML provides many more abstractions than BLOOM, we show that BLOOM still has some abstractions that UML does not. For some of these abstractions, we sketch how UML can be extended to deal with the semantics that BLOOM adds.
- Fèlix Saltor, Marta Oliva, Alberto Abelló, and José Samos. Building Secure Data Warehouse Schemas from Federated Information Systems. In Int. CODATA Conference on Data and Information for the Coming Knowledge Millennium (CODATA2000), Baveno (Italy), October 2000 (Extended abstract). Heterogeneous Information Exchange and Organizational Hubs, pages 123-134. Bestougeff, Dubois and Thuraisingham Editors. Kluwer Academic Publishers, 2002. ISBN: 1-4020-0649-7.
There are similarities between architectures for Federated Information Systems and architectures for Data Warehousing. In the context of an integrated architecture for both Federated Information Systems and Data Warehousing, we discuss how additional schema levels provide security, and operations to convert from one level to the next.
- Alberto Abelló, José Samos, and Fèlix Saltor. Benefits of an Object-Oriented Multidimensional Data Model (© Springer-Verlag). In Objects and Databases - International Symposium- in 14th European Conference on Object-Oriented Programming (ECOOP 2000). Sophia Antipolis and Cannes (France), June 2000. Pages 141-152, Lecture Notes in Computer Science volume 1944. Springer, 2000. ISSN 0302-9743, ISBN 3-540-41664-1.
In this paper, we try to outline the benefits of using an O-O model in designing multidimensional Data Marts. We argue that multidimensional modeling is lacking in semantics, which can be obtained by using the O-O paradigm. Some benefits that could be obtained by doing this are classified in six O-O-Dimensions (i.e. Classification/Instantiation, Generalization/Specialization, Aggregation/Decomposition, Caller/Called, Derivability, and Dynamicity), and exemplified with specific cases.
- Alberto Abelló, Marta Oliva, José Samos, and Fèlix Saltor. Information System Architecture for Data Warehousing from a Federation. In Proc. of the Int. Workshop on Engineering Federated Information Systems (EFIS 2000). Dublin (Ireland), June 2000. Pages 33-40, IOS Press. ISBN 1-58603-075-2
This paper is devoted to Data Warehousing architecture and its data schemas. We relate a federated database architecture to Data Warehouse schemas, which allows us to provide a better understanding of the characteristics of every schema, as well as the way they should be defined. Because of the confidentiality of the data used to make decisions, and the federated architecture used, we also pay attention to data protection.
- Alberto Abelló, Marta Oliva, José Samos, and Fèlix Saltor. Information System Architecture for Secure Data Warehousing. Technical Report LSI-00-26-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), April 2000.
Extended version of the previous paper: "Information System Architecture for Data Warehousing from a Federation".
1999
- José Samos, Alberto Abelló, Marta Oliva, Elena Rodríguez, Fèlix Saltor, Jaume Sistac, Francisco Araque, Cecilia Delgado, Eladio Garví and Emilia Ruíz. Sistema Cooperativo para la Integración de Fuentes Heterogéneas de Información y Almacenes de Datos. In Novatica, 142 (Nov-Dec 1999), pages 44-49. Asociación de Técnicos de Informática (ATI), 1999. (In Spanish). ISSN: 0211-2124.
In this work we present our proposal for building a prototype of a cooperative system for the integration of heterogeneous information sources and data warehouses, on which our research currently focuses. The overall goal is to provide a software layer that enables cooperation among several information sources interconnected through a network of communication lines. Each source has its own services for answering the queries that its users pose over its data and, additionally, we want to offer certain users the ability to access the whole set of data in a uniform way (integrated access), either in real time or through data warehouses.
- Alberto Abelló, Marta Oliva, Elena Rodríguez, and Fèlix Saltor. The syntax of BLOOM99 schemas. Technical Report LSI-99-34-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), July 1999.
The BLOOM (BarceLona Object Oriented Model) data model was developed to be the Canonical Data Model (CDM) of a Federated Database Management System prototype. Its design satisfies the features that a data model should have to be suitable as a CDM. The initial version of the model (BLOOM91) has evolved into the present version, BLOOM99.
This report specifies the syntax of the schema definition language of BLOOM99. In our model, a schema is a set of classes, related through two dimensions: the generalization/specialization dimension, and the aggregation/decomposition dimension. BLOOM supports several features in each of these dimensions, through their corresponding metaclasses.
Even if users are supposed to define and modify schemas in an interactive way, using a Graphical User Interface, a linear schema definition language is clearly needed. Syntax diagrams are used in this report to specify the language; an alternative using grammar productions appears as Appendix A. A possible graphical notation is given in Appendix B.
A comprehensive running example illustrates the model, the language and its syntax, and the graphical notation.
- Alberto Abelló, Marta Oliva, Elena Rodríguez, and Fèlix Saltor. The BLOOM model revisited: An evolution proposal (poster session). In Workshop Reader of the 13th European Conference on Object-Oriented Programming (ECOOP'99). Lisboa, June 1999. Pages 376-378, Springer-Verlag, Lecture Notes in Computer Science. Vol. 1743, Springer, 2000. ISBN 3-540-66954-X
Once the desirable characteristics of a suitable CDM had been argued, the BLOOM model (BarceLona Object Oriented Model) was progressively defined. The result is an extension of an object oriented model with a semantically rich set of abstractions. BLOOM was not developed as a whole but underwent extensions in different phases. Its abstractions were conceived for building the FDBS on an as-needed basis. This led to a lack of unity and to differences in the nomenclature.
The need to revise the BLOOM model arose during the design process of the directory of the FDBS. Such a storage system is essential because of the amount of information needed to build and operate an FDBS. The directory is the core of our FDBS architecture and it must contain the different schema levels as well as the mappings among them. Therefore, the model had to be fixed in order to store those schemas and mappings in a structured manner.
- Alberto Abelló. CORBA: A middleware for an heterogeneous cooperative system. Technical Report LSI-99-21-R. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), May 1999.
Two kinds of heterogeneities interfere with the integration of different information sources: those in systems and those in semantics. They generate different problems and require different solutions. This paper tries to separate them by proposing the usage of a distinct tool for each one (i.e. CORBA and BLOOM respectively), and analyzing how they could collaborate. CORBA offers many ways to deal with distributed objects and their potential needs, while BLOOM takes care of the semantic heterogeneities. Therefore, it seems promising to handle the system heterogeneities by wrapping the components of the BLOOM execution architecture into CORBA objects.
- Alberto Abelló, and Fèlix Saltor. Implementation of the BLOOM data model on ObjectStore. Technical Report LSI-99-7-T. Dept Llenguatges i Sistemes Informàtics (Universitat Politècnica de Catalunya), May 1999.
BLOOM is a semantically enriched object oriented data model. It offers extra semantic abstractions to better represent the real world. Those abstractions are not implemented in any commercial product. This paper explains how all of them could be simulated with a software layer on an object oriented database management system. Specifically, the approach proved to work on ObjectStore.
1998
- Alberto Abelló, Benet Càmpderrich, Marta Oliva, Elena Rodríguez, Fèlix Saltor, José Samos, and Jaume Sistac. El proyecto BLOOM: Bases de Datos Federadas, Interoperables y Cooperativas. In Encuentro de Investigadores en Software del Nordeste Ibérico (EncISO), Enciso (Spain), September 1998. (In Spanish)