DTIM | UPC

Université Libre de Bruxelles (ULB)

Esteban Zimányi, Stijn Vansummeren, Toon Calders

Description

Research Collaboration: MapReduce Data Flow Scheduling

This collaboration aims to provide “proactive” scheduling mechanisms for data-intensive flows across shared and distributed resources.

We tackle the problem of scheduling data-intensive flows, focusing on self-adapting data distribution inside the cluster based on the provided and/or predicted workload (i.e., both data and function shipping). Adapting the data distribution to the workload in a timely manner will improve the performance of distributed data-intensive flows that depend heavily on the locality of their input data (e.g., MapReduce).

We consider a typical distributed data processing system (e.g., Hadoop), with different clients submitting data flows for execution (multi-tenancy).
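
As a rough illustration of the trade-off between function shipping (running a task where its input already resides) and data shipping (copying the input to another node first), the following Python sketch greedily places each task on the node where it would finish earliest. It is a toy model under stated assumptions, not the scheduler developed in this collaboration: the names `Node` and `schedule` and the fixed costs `run_time` and `transfer_time` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    blocks: set = field(default_factory=set)  # data blocks stored locally
    busy_until: float = 0.0                   # time at which the node becomes free


def schedule(tasks, nodes, run_time=10.0, transfer_time=4.0):
    """Greedily place each task on the node where it would finish earliest.

    tasks: list of (task_id, block_id) pairs in submission order.
    run_time and transfer_time are assumed, fixed costs (hypothetical values).
    Returns a list of (task_id, node_name, mode, finish_time) tuples.
    """
    plan = []
    for task_id, block in tasks:
        best = None
        for node in nodes:
            local = block in node.blocks
            # A node without the block must first receive a copy of it.
            finish = node.busy_until + run_time + (0.0 if local else transfer_time)
            # Prefer the earlier finish time; on ties, prefer the local option.
            key = (finish, not local)
            if best is None or key < best[0]:
                best = (key, node, local)
        (finish, _), node, local = best
        node.busy_until = finish
        if not local:
            node.blocks.add(block)  # the shipped copy stays available for later tasks
        mode = "function shipping" if local else "data shipping"
        plan.append((task_id, node.name, mode, finish))
    return plan
```

With two nodes, of which only the first holds the input block, and three tasks reading that block, this sketch runs the first task locally, ships the block to the idle second node for the second task, and runs the third task locally again. A proactive approach as described above would go further and redistribute data ahead of time from the provided or predicted workload, which this reactive toy does not attempt.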

Goals

Improving the utilization, load balancing, and data balancing of cluster resources.
Maximizing the throughput of a distributed data processing system.
Maximizing the satisfaction of data flows' Service Level Agreements (SLAs).
Enabling the system and its scheduling policies to self-adapt in a timely manner, so as to provide optimal data flow execution and guarantee that the data flows' SLAs are satisfied.

Research Collaboration: Self-Optimizing Data Stream Processing

This collaboration aims to equip the Lambda-architecture with semantic-aware, self-optimizing capabilities for optimal data stream processing.

Goals

Refine the Lambda-architecture in order to provide semantic awareness to raw data.
Study all characteristics that represent a data stream and that can be drivers of the self-optimizing process. Assess available options, study their interdependence and propose extensions.
Study available benchmarks that can vary the identified characteristics.
Develop self-optimizing capabilities for data stream processing in the architecture.

Related publications

2023
Sergi Nadal, Alberto Abelló, Oscar Romero, Stijn Vansummeren, Panos Vassiliadis: Graph-Driven Federated Data Management. IEEE Trans. Knowl. Data Eng. 2023
Moditha Hewasinghage, Sergi Nadal, Alberto Abelló, Esteban Zimányi: Automated database design for document stores with multicriteria optimization. Knowl. Inf. Syst. 2023

2021
Amine Ghrab, Oscar Romero, Sabri Skhiri, Esteban Zimányi: TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes. Inf. Syst. Frontiers 2021
Moditha Hewasinghage, Alberto Abelló, Jovan Varga, Esteban Zimányi: A cost model for random access queries in document stores. VLDB J. 2021
Moditha Hewasinghage, Alberto Abelló, Jovan Varga, Esteban Zimányi: Managing polyglot systems metadata with hypergraphs. Data Knowl. Eng. 2021

2020
Moditha Hewasinghage, Alberto Abelló, Jovan Varga, Esteban Zimányi: DocDesign: Cost-Based Database Design for Document Stores. SSDBM 2020
Ayman Alserafi, Alberto Abelló, Oscar Romero, Toon Calders: Keeping the Data Lake in Form: Proximity Mining for Pre-Filtering Schema Matching. ACM Trans. Inf. Syst. 2020

2019
Muhammad Aamir Saleem, Rohit Kumar 0002, Toon Calders, Torben Bach Pedersen: Effective and efficient location influence mining in location-based social networks. Knowl. Inf. Syst. 2019
Sergi Nadal, Oscar Romero, Alberto Abelló, Panos Vassiliadis, Stijn Vansummeren: An integration-oriented ontology to govern evolution in Big Data ecosystems. Inf. Syst. 2019
Ayman Alserafi, Alberto Abelló, Oscar Romero, Toon Calders: Keeping the Data Lake in Form: DS-kNN Datasets Categorization Using Proximity Mining. MEDI 2019
Jam Jahanzeb Khan Behan, Oscar Romero, Esteban Zimányi: Multidimensional Integration of RDF Datasets. DaWaK 2019

2018
Rohit Kumar 0002, Toon Calders: 2SCENT: An Efficient Algorithm to Enumerate All Simple Temporal Cycles. Proc. VLDB Endow. 2018
Sergi Nadal, Alberto Abelló, Oscar Romero, Stijn Vansummeren, Panos Vassiliadis: MDM: Governing Evolution in Big Data Ecosystems. EDBT 2018
Sergi Nadal, Oscar Romero, Alberto Abelló, Panos Vassiliadis, Stijn Vansummeren: An Integration-Oriented Ontology to Govern Evolution in Big Data Ecosystems. CoRR 2018
Moditha Hewasinghage, Jovan Varga, Alberto Abelló, Esteban Zimányi: Managing Polyglot Systems Metadata with Hypergraphs. ER 2018

2017
Rohit Kumar 0002, Alberto Abelló, Toon Calders: Cost Model for Pregel on GraphX. ADBIS 2017
Muhammad Aamir Saleem, Rohit Kumar 0002, Toon Calders, Xike Xie, Torben Bach Pedersen: IMaxer: A Unified System for Evaluating Influence Maximization in Location-based Social Networks. CIKM 2017
Rohit Kumar 0002, Toon Calders: Information Propagation in Interaction Networks. EDBT 2017
Rohit Kumar 0002, Toon Calders: Finding simple temporal cycles in an interaction network. TD-LSG@PKDD/ECML 2017
Rohit Kumar 0002, Muhammad Aamir Saleem, Toon Calders, Xike Xie, Torben Bach Pedersen: Activity-Driven Influence Maximization in Social Networks. ECML/PKDD (3) 2017
Muhammad Aamir Saleem, Rohit Kumar 0002, Toon Calders, Xike Xie, Torben Bach Pedersen: Location Influence in Location-based Social Networks. WSDM 2017
Sergi Nadal, Victor Herrero, Oscar Romero, Alberto Abelló, Xavier Franch, Stijn Vansummeren, Danilo Valerio: A software reference architecture for semantic-aware Big Data systems. Inf. Softw. Technol. 2017
Sergi Nadal, Oscar Romero, Alberto Abelló, Panos Vassiliadis, Stijn Vansummeren: An Integration-Oriented Ontology to Govern Evolution in Big Data Ecosystems. EDBT/ICDT Workshops 2017
Ayman Alserafi, Toon Calders, Alberto Abelló, Oscar Romero: DS-Prox: Dataset Proximity Mining for Governing the Data Lake. SISAP 2017

2016
Petar Jovanovic, Oscar Romero, Toon Calders, Alberto Abelló: H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution. ADBIS 2016
Ayman Alserafi, Alberto Abelló, Oscar Romero, Toon Calders: Towards Information Profiling: Data Lake Content Metadata Management. ICDM Workshops 2016
Esteban Zimányi, Alberto Abelló: Business Intelligence - 5th European Summer School, eBISS 2015, Barcelona, Spain, July 5-10, 2015, Tutorial Lectures. eBISS 2016
Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro A. Vaisman, Esteban Zimányi: GRAD: On Graph Database Modeling. CoRR 2016

2015
Rohit Kumar 0002, Toon Calders, Aristides Gionis, Nikolaj Tatti: Maintaining Sliding-Window Neighborhood Profiles in Interaction Networks. ECML/PKDD (2) 2015
Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro A. Vaisman, Esteban Zimányi: A Framework for Building OLAP Cubes on Graphs. ADBIS 2015