Nextia_DI: Incremental and Agnostic Data Integration

Nextia_DI is a library for incremental and agnostic data integration (DI) that facilitates generating the schemata of heterogeneous data sources and integrating them. This website is a companion to the research paper submitted to the Semantic Web Journal, where we present the method underlying our approach. Nextia_DI's novelty lies in a) the extraction of schemata by leveraging the structure of schemaless data sources; b) the standardization of the extracted schemata into a canonical data model (i.e., the RDFS graph data model) using production rules; c) annotation-based schema integration for RDF graphs, which captures the relationships between the modeled data sources via unions and joins; and d) the automated derivation of the DI constructs required by specific querying systems (i.e., source schemata, schema mappings, and target schema). All these features are agnostic of the target system and are performed incrementally. Nextia_DI is implemented as a Java library. We showcase its effectiveness by automatically generating all the DI constructs of the ODIN tool.
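To make points a) and b) more concrete, the following is a minimal sketch of how production rules could turn the structure of a schemaless JSON-like document into RDFS triples. It does not use the Nextia_DI API and is not the method from the paper: the class name BootstrapSketch, the three rules, the IRI naming scheme, and the example.org IRIs are all assumptions made for illustration, and a plain Java Map stands in for a parsed JSON object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: NOT the Nextia_DI API. All names and rules here are hypothetical.
class BootstrapSketch {

    // Standard RDF, RDFS and XSD vocabulary IRIs.
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return "<" + s + "> <" + p + "> <" + o + "> ."; }
    }

    // Assumed mapping from Java value types to XSD datatypes.
    static String datatypeOf(Object value) {
        if (value instanceof Integer || value instanceof Long) return XSD + "integer";
        if (value instanceof Double || value instanceof Float) return XSD + "double";
        if (value instanceof Boolean) return XSD + "boolean";
        return XSD + "string";
    }

    // Rule 1: every object becomes an rdfs:Class.
    // Rule 2: every key becomes an rdf:Property whose rdfs:domain is that class.
    // Rule 3: the rdfs:range is an XSD datatype for primitives, or a new class for nested objects.
    static List<Triple> bootstrap(String classIri, Map<String, Object> doc) {
        List<Triple> out = new ArrayList<>();
        out.add(new Triple(classIri, RDF + "type", RDFS + "Class"));
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            String propIri = classIri + "/has_" + entry.getKey();
            out.add(new Triple(propIri, RDF + "type", RDF + "Property"));
            out.add(new Triple(propIri, RDFS + "domain", classIri));
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                String nestedIri = classIri + "/" + entry.getKey();
                out.add(new Triple(propIri, RDFS + "range", nestedIri));
                out.addAll(bootstrap(nestedIri, nested)); // recurse into the nested object
            } else {
                out.add(new Triple(propIri, RDFS + "range", datatypeOf(value)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a parsed JSON document: {"name": ..., "visitors": ..., "location": {"city": ...}}
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("city", "Madrid");
        Map<String, Object> museum = new LinkedHashMap<>();
        museum.put("name", "Prado");
        museum.put("visitors", 3000000);
        museum.put("location", location);

        for (Triple t : bootstrap("http://example.org/Museum", museum)) {
            System.out.println(t);
        }
    }
}
```

Because every source ends up in the same graph model, later steps such as integration or mapping generation can operate on triples alone, which is the sense in which a canonical data model keeps the pipeline agnostic of the original source format.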

People

Publications

2022

Incremental Schema Integration for Data Wrangling via Knowledge Graphs. Paper submitted to the Semantic Web Journal.

Resources

Software repository

The source code of the system can be found in the following GitHub repository.

The easiest way to use Nextia_DI is as a Maven dependency. For Gradle, add the following dependency to your build.gradle:

implementation 'edu.upc.essi.dtim:nextiadi:0.1.0'

For bootstrapping, the following dependency is also required:

implementation group: 'org.glassfish', name: 'javax.json', version: '1.1.4'

For more ways to add Nextia_DI using Maven, please go here.

You can check how to use Nextia_DI here, or follow the step-by-step Zeppelin notebook described in the Demonstration section.

Reproducibility

We believe in transparent and shareable research [1], [2]. Hence, below you can find all the material (e.g., notebooks, code, answers) related to our experiments.

User study

This user study evaluates the efficiency and quality of NextiaDI in automatically supporting the task of schema integration, compared to a conventional schema integration pipeline. The study is therefore divided into three tasks: (i) generation of source schemata, (ii) generation of an integrated schema, and (iii) generation of mappings. Below you can find all the material related to this survey:

Survey instructions: Download V1, Download V2
Survey datasets: Download
Pre-study questionnaire (participants' answers): Download
Task 1 (participants' answers): Download
Task 2 (participants' answers): Download
Task 3 (participants' answers): Download
Jupyter Notebook: Go
ODIN system powered by NextiaDI: Go

Scalability experiments

We evaluate our two technical contributions (i.e., bootstrapping and schema integration) to assess their computational complexity and runtime performance. We provide you with detailed instructions on how to reproduce the experiments presented in our work and the data sources used in each scenario:

Demonstration

Notebook step by step

A live demo for learning how to use Nextia_DI is available here. Bear in mind that, to access it, you must first log in with the following credentials (user: user2, password: nextiadi). The login button can be found at the top right of the page.

Showcase

We showcase the effectiveness of Nextia_DI in the ODIN tool by automatically generating all DI constructs. Previously, these constructs were created manually. You can access the ODIN tool here.

Last update: 2022/10/12 by Javier Flores

People

Javier Flores

Kashif Rabbani

Sergi Nadal

Cristina Gómez

Oscar Romero

Emmanuel Jamin

Stamatia Dasiopoulou