Linked Open Data
Oscar Romero
Description
The Semantic Web paradigm and related technologies brought new possibilities for publishing and sharing data on the Web. In this context, the Resource Description Framework (RDF) emerged as a standard model for representing data as subject-predicate-object triples, which makes it simple and flexible for modeling different domains. These possibilities gained particular momentum with the Linked Open Data initiative, which brought a large amount of new publicly available (i.e., open) RDF data that are mutually interlinked. The data include common knowledge (e.g., DBpedia), public institution statistics (e.g., Eurostat), research publications (e.g., DBLP), and other diverse data, where related (e.g., identical) concepts are linked, making the (Semantic) Web one publicly available database.
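To make the triple model concrete, here is a minimal sketch in plain Python (standard library only). It treats an RDF dataset as a list of subject-predicate-object triples and follows `owl:sameAs` links to gather what two interlinked sources say about the same concept. The `example.org` IRIs, the population value, and the helper names `linked_resources` and `describe` are illustrative assumptions, not data or code from any real source.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Standard W3C vocabulary IRIs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical IRIs and values, for illustration only.
KB_CITY = "http://example.org/kb/Barcelona"
STATS_CITY = "http://example.org/stats/BCN"
POPULATION = "http://example.org/stats/population"

triples = [
    Triple(KB_CITY, RDFS_LABEL, "Barcelona"),
    Triple(KB_CITY, OWL_SAME_AS, STATS_CITY),   # link between two sources
    Triple(STATS_CITY, POPULATION, "1600000"),  # placeholder value
]


def linked_resources(start, triples):
    """Return every resource connected to `start` by owl:sameAs links,
    followed in both directions (including `start` itself)."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for t in triples:
            if t.predicate != OWL_SAME_AS:
                continue
            if t.subject == current:
                neighbor = t.obj
            elif t.obj == current:
                neighbor = t.subject
            else:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return seen


def describe(resource, triples):
    """Collect (predicate, object) pairs stated about `resource` or about
    any resource linked to it as the same concept."""
    same = linked_resources(resource, triples)
    return [
        (t.predicate, t.obj)
        for t in triples
        if t.subject in same and t.predicate != OWL_SAME_AS
    ]


facts = describe(KB_CITY, triples)
```

Here `facts` combines the label from the first source with the statistic from the second, which is the sense in which interlinked datasets behave like a single database.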
Although these data are publicly available, users still face challenges in consuming (i.e., analyzing) them and gaining the insights they look for. We believe that the area can benefit greatly from techniques and ideas inspired by traditional Business Intelligence (BI) approaches. Thus, on the one hand, we focus on preparing and modeling data to conform to the multidimensional model, which has proven user-friendly for analysis in traditional BI settings. On the other hand, we aim to support the user as much as possible by means of metadata that can be exploited for user assistance.
Research Line: Data Integration and ETL for Semantic Data
In recent years, more and more semantic data has become freely available on the Web: websites are annotated with RDF markup, data collections are offered for download, and even interfaces for structured queries over such data can be used free of charge. One reason for the success of semantic data is that publishing the data and making it available takes little effort and does not rely on a sophisticated schema. Instead, various standard ontologies and self-designed extensions can be used. While this highly flexible data format is an advantage of the Semantic Web paradigm, it is a disadvantage during the ETL process, where the schema plays an important role. In addition, the schema of semantic data is often not known beforehand, but encoded as part of the dataset itself. Furthermore, many sources have been generated automatically by converting other data formats into RDF or by information extraction techniques, and hence contain errors. Thus, in addition to the heterogeneities that ETL for traditional data has to deal with, additional challenges arise for semantic data, especially regarding cleansing and duplicate detection. The aim of this topic is to develop an approach that enables the ETL process for semantic data, despite the above-mentioned problems, by (1) developing scalable data integration techniques that can handle multiple semantic data sources, (2) implementing an appropriate environment to facilitate the ETL process, and (3) evaluating the proposed solutions.
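Two of the challenges above, a schema that is encoded in the dataset itself and duplicated statements, can be illustrated with a small sketch in plain Python (standard library only). It is not the approach of the publications listed on this page. It only shows the simplest possible case, assuming the data are already available as (subject, predicate, object) tuples. The `example.org` IRIs and the helper names `drop_exact_duplicates` and `discover_schema` are hypothetical.

```python
from collections import defaultdict

# Standard W3C vocabulary IRI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace and data, for illustration only.
EX = "http://example.org/"

triples = [
    (EX + "obs1", RDF_TYPE, EX + "Observation"),
    (EX + "obs1", EX + "country", EX + "ES"),
    (EX + "obs1", EX + "value", "42"),
    (EX + "obs2", RDF_TYPE, EX + "Observation"),
    (EX + "obs2", EX + "value", "17"),
    (EX + "obs2", EX + "value", "17"),  # exact duplicate statement
]


def drop_exact_duplicates(triples):
    """Remove repeated triples while keeping first-seen order.
    Detecting near-duplicates or duplicate entities is a much harder
    problem and is not covered by this sketch."""
    return list(dict.fromkeys(triples))


def discover_schema(triples):
    """Derive a rough schema from the data itself: for each class,
    the sorted list of predicates used by its instances."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    schema = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for cls in types.get(s, ()):
            schema[cls].add(p)
    return {cls: sorted(preds) for cls, preds in schema.items()}


clean = drop_exact_duplicates(triples)
schema = discover_schema(clean)
```

The resulting `schema` maps the `Observation` class to the `country` and `value` predicates, which an ETL step could then use as candidate columns or dimensions even though no schema was supplied up front.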
Research Line: Discovering Analytical Concepts from User Profiles
Decision-making in the era of the information society is based on the analysis of available data resources. A growing number of publicly available data resources represents a wealth of information to be explored. However, data exploration in these settings is often tedious because it requires non-trivial technical skills (e.g., knowledge of specific query languages). Non-expert users need assistance to navigate this data landscape and perform their analysis. Traditional BI systems provide different user support functionalities, typically for querying and data visualization. These features are based on the exploitation of metadata artifacts (e.g., queries). The metadata are the fuel for different user assistance algorithms (e.g., query recommendation) and directly determine which kinds of assistance are possible. However, their management and organization are typically overlooked. The aim of this research line is to provide a metadata foundation for the user assistance features of next-generation BI systems. Our claim is that the metadata need to be considered and handled as first-class citizens. Moreover, in these novel settings the user wants to analyze data coming from external and non-controlled data sources. Therefore, the metadata need to be designed in a flexible and reusable manner so that they can be used in the context of these new and heterogeneous data sources. We will show that metadata are the neglected and unexploited treasure of user-centric BI systems.
Related publications
2022 Rudra Pratap Deb Nath, Oscar Romero, Torben Bach Pedersen, Katja Hose: High-level ETL for semantic data warehouses. Semantic Web 2022
2020 Rudra Pratap Deb Nath, Oscar Romero, Torben Bach Pedersen, Katja Hose: High-Level ETL for Semantic Data Warehouses - Full Version. CoRR 2020
2017 Rudra Pratap Deb Nath, Katja Hose, Torben Bach Pedersen, Oscar Romero: SETL: A programmable semantic extract-transform-load framework for semantic data warehouses. Inf. Syst. 2017
2006 Alberto Abelló, Roberto García, Rosa M. Gil, Marta Oliva, Ferran Perdrix: Semantic Data Integration in a Newspaper Content Management System. OTM Workshops (1) 2006