Semantic Data Management (SDM)

Anna Queralt, Oscar Romero


  • Description

    Big Data is traditionally defined by the three V's: Volume, Velocity and Variety. Historically, Big Data has been associated with Volume (e.g., the Hadoop ecosystem), and more recently Velocity has gained momentum (especially with the arrival of stream processors such as Apache Flink). However, associating Big Data only with Volume or Velocity is nowadays a mistake. The biggest challenge in Big Data Management today is Variety: how to tackle it in real-world projects is not yet clear, and there are no standardized solutions.

    In this course, students will be introduced to advanced database technologies, modeling techniques and methods for tackling Variety in decision making. The fundamental underlying theory is that of graph data management and processing. We will also explore the difficulties that arise when Variety is combined with Volume and/or Velocity. The course focuses on the need to enrich the available data (typically owned by the organization) with external repositories (with special attention to Open Data) in order to gain further insight into the organization's business domain. There are many examples of external data that may be relevant to the decision-making processes of any company: data coming from social networks such as Facebook or Twitter; data released by governmental bodies (such as town councils or governments); data coming from sensor networks (such as those deployed for city services within the Smart Cities paradigm); data from third parties, etc.

    This is a new, hot topic without a clear and sufficiently mature methodology. For this reason, mastering the inclusion of external data in an organization's decision-making processes requires rigorous thinking, innovation and a strong technical background. Accordingly, this course focuses on two main aspects:

    1) Technical aspects, which form the core of the course and include:

    • dealing with semi-structured or unstructured data (as found on the Web),
    • using metadata effectively to understand external data,
    • mastering the main formalisms (mostly coming from the Semantic Web) for enriching data with metadata (ontology languages, RDF, XML, etc.),
    • determining relevant sources and applying semantic mechanisms to automate the addition (and potentially the integration), linkage and/or crossing of data from heterogeneous sources, and
    • learning the main approaches to performing data analysis natively on graph-based formalisms (i.e., reasoning, graph-based algorithms and machine learning).


    2) Entrepreneurship and innovation, which includes:

    • working on the visionary side: opening new analytical perspectives on a business domain by considering external sources, and
    • adding value to current systems by means of such external data.