Advanced Data Management

Alberto Abelló, Amine Ghrab, Elvis Koci, Oscar Romero, Emmanouil Valsomatzis

  • Description

    This research line focuses on data management for non-traditional data formats. In our group we focus on three main kinds of data: graphs, flexible energy data and spreadsheets.

    Research line: Graph Data Warehouses

    Graphs are widely used to represent domains with complex structural properties. Applications include emerging topics such as social network analysis, ontology management and bioinformatics. The greater expressive power of graphs can reveal valuable insights into both the data and its structure. However, it also makes graph data modeling, querying and processing more complex.

    Graph analysis is performed by traversing the network structure. Queries such as k-neighborhood or pattern matching are hard to express in traditional query languages such as SQL. The analysis relies on arbitrary traversals of the graph structure, which cannot be performed efficiently with block reads. Traditional data management approaches are therefore not naturally suited to managing graph data efficiently. This calls for new database models, query languages and processing frameworks designed specifically for graph-structured data.
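    To make the traversal-based nature of such queries concrete, the following Python sketch computes the k-neighborhood of a node (all nodes reachable in at most k hops) by breadth-first search over an in-memory adjacency list. The toy graph and the function name `k_neighborhood` are hypothetical illustrations, not part of any system described here. Each hop depends on the nodes reached in the previous one; in SQL, the same query typically needs k self-joins over an edge table or a recursive common table expression.

```python
from collections import deque


def k_neighborhood(adj: dict[str, list[str]], source: str, k: int) -> set[str]:
    """Return every node reachable from `source` in at most k hops (source excluded)."""
    visited = {source}
    frontier = deque([(source, 0)])  # (node, number of hops from source)
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adj.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                result.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return result


# Hypothetical toy social graph; undirected edges are stored in both directions.
graph = {
    "ann": ["bob", "carl"],
    "bob": ["ann", "dora"],
    "carl": ["ann"],
    "dora": ["bob", "eve"],
    "eve": ["dora"],
}
print(sorted(k_neighborhood(graph, "ann", 2)))  # ['bob', 'carl', 'dora']
```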

    At the multidimensional level, traditional OLAP frameworks provide a multi-level, multi-perspective view of the data. They place the relevant measures within a multidimensional space and support navigating and summarizing them following the cube metaphor. In addition to numerical measures, graphs provide a new class of complex structural measures, such as the shortest path between nodes or node centrality. Computing and aggregating these measures requires specific algorithms that operate on graphs. ROLAP is widely accepted as the most common approach for implementing data warehouses. Its star and snowflake schemas are built on the relational model and are designed to handle numerical data; they are not well equipped to support the analysis and aggregation of the structural properties of graphs. Therefore, ROLAP systems in their current state are also not ready for efficient multidimensional analysis of graph data.
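    One possible, simplified reading of "aggregating a graph" is a roll-up along a dimension: nodes are grouped by an attribute, and the edges between groups are merged by summing their weights. The Python sketch below assumes undirected weighted edges and a hypothetical `city` attribute; the functions `roll_up` and `weighted_degree` are illustrative names, not a model from this group's work. A simple structural measure (weighted degree) is then computed on the aggregated graph.

```python
from collections import defaultdict


def roll_up(edges, node_group):
    """Aggregate an undirected weighted graph by grouping its nodes.

    edges: iterable of (u, v, weight) tuples
    node_group: mapping from node to its group (the dimension value)
    Returns {(group_a, group_b): total_weight}; edges inside one group
    become self-loops (group, group).
    """
    aggregated = defaultdict(float)
    for u, v, weight in edges:
        a, b = sorted((node_group[u], node_group[v]))  # canonical key for undirected edges
        aggregated[(a, b)] += weight
    return dict(aggregated)


def weighted_degree(aggregated):
    """Total weight of the edges incident to each group (self-loops counted once)."""
    degree = defaultdict(float)
    for (a, b), weight in aggregated.items():
        degree[a] += weight
        if a != b:
            degree[b] += weight
    return dict(degree)


# Hypothetical interaction graph between four people, rolled up by city.
edges = [("ann", "bob", 3), ("ann", "carl", 1), ("bob", "dora", 2), ("carl", "dora", 4)]
city = {"ann": "Barcelona", "bob": "Barcelona", "carl": "Brussels", "dora": "Brussels"}
by_city = roll_up(edges, city)
print(by_city)
# {('Barcelona', 'Barcelona'): 3.0, ('Barcelona', 'Brussels'): 3.0, ('Brussels', 'Brussels'): 4.0}
print(weighted_degree(by_city))  # {'Barcelona': 6.0, 'Brussels': 7.0}
```

    Unlike edge weights, measures such as shortest-path lengths or centrality are generally not additive, so they usually have to be recomputed on the aggregated graph rather than summed from the finer level.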

    These limitations, at both the database and the multidimensional levels, call for the development of next-generation data warehousing systems that provide the required features and performance.

    Research line: Flexible Energy Data Management

    The use of energy produced by renewable sources such as wind and solar is increasing. Furthermore, new technologies such as electric vehicles and heat pumps may overload the power grid in the future, especially in peak demand situations. In this emerging energy landscape, the power grid is gradually being transformed into a Smart Grid that uses information and communication technologies to improve existing energy services.

    Within the Smart Grid, we aim to provide an alternative based on the flex-offer (Micro-request) concept. The idea is that energy consumption does not have to occur in fixed time slots: it can be flexible in time, so that part of it can be shifted away from demand peaks or towards production peaks. Flex-offers can also be flexible in the amount of energy and even in its price. For example, a consumer could run a dishwasher a few hours later than intended because more wind power will be produced during that later period. As a result, the future energy market will need to manage, store and process large amounts of data representing such flexibilities. Furthermore, introducing a new commodity (the flex-offer) into the energy market will create a new energy market model, in which business intelligence techniques will help ensure its best operation. Specifically, we focus on advanced aggregation techniques over complex energy-related data.
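    As a minimal sketch of how time flexibility can be represented and aggregated, the Python code below models a simplified flex-offer as an earliest and latest start slot plus a per-slot (minimum, maximum) energy profile, and aggregates several offers with start-alignment: every offer is aligned at its earliest start, the profiles are summed, and the aggregate keeps the smallest time flexibility. The class `FlexOffer`, the function `start_aligned_aggregate` and the example values are assumptions for illustration; this is one known aggregation technique, not necessarily the constraint-based aggregation studied in this research line.

```python
from dataclasses import dataclass


@dataclass
class FlexOffer:
    """Simplified flex-offer (hypothetical model for illustration)."""
    earliest_start: int                 # first time slot at which consumption may start
    latest_start: int                   # last time slot at which consumption may start
    profile: list[tuple[float, float]]  # (min_kwh, max_kwh) for each consecutive slot

    @property
    def time_flexibility(self) -> int:
        return self.latest_start - self.earliest_start


def start_aligned_aggregate(offers: list[FlexOffer]) -> FlexOffer:
    """Aggregate flex-offers by aligning each one at its earliest start.

    The aggregate keeps the smallest time flexibility, so shifting the aggregate
    by d slots can be undone by shifting every member offer by the same d.
    """
    es = min(o.earliest_start for o in offers)
    tf = min(o.time_flexibility for o in offers)
    length = max(o.earliest_start - es + len(o.profile) for o in offers)
    profile = [(0.0, 0.0)] * length
    for o in offers:
        offset = o.earliest_start - es  # position of this offer inside the aggregate
        for i, (lo, hi) in enumerate(o.profile):
            acc_lo, acc_hi = profile[offset + i]
            profile[offset + i] = (acc_lo + lo, acc_hi + hi)
    return FlexOffer(es, es + tf, profile)


# Hypothetical offers: a dishwasher that may start between slots 18 and 22,
# and a heat pump that may start between slots 19 and 21 with a flexible amount.
dishwasher = FlexOffer(18, 22, [(0.5, 0.5), (0.75, 0.75)])
heat_pump = FlexOffer(19, 21, [(1.0, 2.0)])
print(start_aligned_aggregate([dishwasher, heat_pump]))
# FlexOffer(earliest_start=18, latest_start=20, profile=[(0.5, 0.5), (1.75, 2.75)])
```

    Keeping the minimum time flexibility sacrifices some flexibility, but guarantees that any start time chosen for the aggregate can be disaggregated by shifting every member offer by the same number of slots.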

    Research line: Automating Information Extraction from Spreadsheets

    Spreadsheet applications have evolved into tools of great importance for businesses, open data and scientific communities. With these applications, users can perform various transformations, address quality issues, generate new content, and format the data so that it is visually comprehensible. The same data can be presented in different ways, depending on the preferences and intentions of the user.

    All this makes spreadsheet applications user-friendly, but much less machine-friendly. When spreadsheets must be integrated with other sources, their structural and formatting flexibility becomes a disadvantage: it is rather difficult to interpret the contents of these files algorithmically. Current practices require manual intervention, which is cumbersome and time-consuming.

    Overall, the lack of automatic processing methods limits our ability to explore and reuse the large amount of rich data stored in partially structured documents such as spreadsheets. In this research line, we aim to solve this issue by developing a system able to understand the characteristics (e.g., structure and content type) of the data in spreadsheets. Such a system has to perform many consecutive tasks automatically, each dealing with a different aspect (challenge), before it can extract the data in a usable form. However, not all spreadsheets contain meaningful data. Besides tables, they are used to create forms, scorecards, charts and other structures that are not genuine tables. The intended solution should be able to discard such files.

    In this research project, we are particularly interested in spreadsheets containing data that can be transformed into the relational model. On the one hand, this allows us to put spreadsheet data under the control of DBMSs; on the other hand, it makes these data available to a wide range of applications for data analysis, entity augmentation, etc. Since spreadsheets containing relational data can exhibit different characteristics, we need a flexible workflow of different transformation activities.
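    The following Python sketch illustrates, under strong simplifying assumptions, the final step of such a workflow: a sheet is assumed to hold at most one table, represented as a list of rows with `None` for empty cells. The hypothetical function `extract_table` crops the sheet to its non-empty region, uses the first row as the header and emits one dictionary per relational tuple; sheets without a text header or without data rows (e.g., a simple form) are discarded. Real spreadsheets often contain several tables, titles, notes and merged cells, so this naive heuristic is much simpler than the machine-learning-based layout inference pursued in this research line.

```python
from typing import Any, Optional


def extract_table(grid: list[list[Any]]) -> Optional[list[dict[str, Any]]]:
    """Naively convert a sheet holding a single table into relational tuples.

    Hypothetical heuristic, for illustration only:
      1. crop the sheet to the bounding box of its non-empty (non-None) cells;
      2. treat the first cropped row as the header (attribute names);
      3. discard the sheet if a header cell is not a string or there are no data rows.
    """
    filled = [(r, c) for r, row in enumerate(grid)
              for c, value in enumerate(row) if value is not None]
    if not filled:
        return None  # empty sheet
    top = min(r for r, _ in filled)
    bottom = max(r for r, _ in filled)
    left = min(c for _, c in filled)
    right = max(c for _, c in filled)
    # Pad ragged rows with None so every row spans the bounding box.
    cropped = [[grid[r][c] if c < len(grid[r]) else None
                for c in range(left, right + 1)]
               for r in range(top, bottom + 1)]
    header, body = cropped[0], cropped[1:]
    if not body or not all(isinstance(h, str) for h in header):
        return None  # looks like a form, a label or another non-tabular layout
    return [dict(zip(header, row)) for row in body]


# Hypothetical sheet: a small table offset by one empty row and one empty column.
sheet = [
    [None, None, None, None],
    [None, "Region", "Year", "Sales"],
    [None, "North", 2016, 120],
    [None, "South", 2016, 95],
]
print(extract_table(sheet))
# [{'Region': 'North', 'Year': 2016, 'Sales': 120}, {'Region': 'South', 'Year': 2016, 'Sales': 95}]

form = [["Name:", None], [None, None]]
print(extract_table(form))  # None
```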

    Finally, we aim for a solution able to work with large spreadsheet corpora. This will enable us to build a system that can be used at enterprise level or as an integral component of research projects in related areas, such as information retrieval and data management.

    Related publications
    2017
    Rohit Kumar, Alberto Abelló, Toon Calders: Cost Model for Pregel on GraphX. ADBIS 2017
    Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner: Table Identification and Reconstruction in Spreadsheets. CAiSE 2017
    2016
    Emmanouil Valsomatzis, Torben Bach Pedersen, Alberto Abelló, Katja Hose: Aggregating energy flexibilities under constraints. SmartGridComm 2016
    Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner: A Machine Learning Approach for Layout Inference in Spreadsheets. KDIR 2016
    Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro A. Vaisman, Esteban Zimányi: GRAD: On Graph Database Modeling. CoRR 2016
    Emmanouil Valsomatzis, Torben Bach Pedersen, Alberto Abelló, Katja Hose, Laurynas Siksnys: Towards constraint-based aggregation of energy flexibilities. e-Energy (Posters) 2016
    2015
    Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro A. Vaisman, Esteban Zimányi: A Framework for Building OLAP Cubes on Graphs. ADBIS 2015