Big Data is traditionally defined with the three V's: Volume, Velocity and Variety. Traditionally, Big Data has been associated with Volume (e.g., the Hadoop ecosystem) and recently Velocity has earned its momentum (especially, with the arrival of Stream processors such as Spark). However, even if Variety has been part of the Big Data definition, how to tackle Variety in real-world projects is yet not clear and there are no standarized solutions (such as Hadoop for Volume or Spark for Velocity) for this challenge.
In this course the student will be introduced to advanced database technologies, modeling techniques and methods for tackling Variety for decision making. We will also explore the difficulties that arise when combining Variety with Volume and / or Velocity. The focus of this course is on the need to enrich the available data (typically owned by the organization) with external repositories (special attention will be paid to Open Data), in order to gain further insights into the organization business domain. There is a vast amount of examples of external data to be considered as relevant in the decision making processes of any company. For example, data coming from social networks such as Facebook or Twitter; data released by governmental bodies (such as town councils or governments); data coming from sensor networks (such as those in the city services within the Smart Cities paradigm); etc.
This is a new hot topic without a clear and established (mature enough) methodology. For this reason, it requires rigorous thinking, innovation and a strong technical background in order to master the inclusion of external data in an organization decision making processes. Accordingly, this course focuses on three main aspects:
1.- Technical aspect. This represents the core discussion in the course and includes:
- dealing with semi-structured or non-structured data (as in the Web),
- the effective use of metadata to understand external data as by means of Linked Data,
- mastering the main formalisms (mostly coming from the Semantic Web) to enrich the data with metadata (ontology languages, RDF, XML, etc.),
- determine relevant sources, apply and use semantic mechanisms to automate the addition (potentially integration), linkage and / or cross of data between heterogeneous data sources,
- refining and visualizing Open Data
2.- Ethic and social aspects, which includes:
- data ownership aspects,
- ethics and,
- identifying knowing legal frameworks (such as that of the LOPDP in Spain)
3.- Entrepreneurship and innovation, which includes:
- working on the visionary aspect to boost new analytical perspectives on a business domain by considering external sources and,
- developing added value to current systems by means of (such) external data