Description
Research Collaboration: MapReduce Data Flow Scheduling
This collaboration aims to provide “proactive” scheduling mechanisms for data-intensive flows across shared, distributed resources.
We tackle the problem of scheduling data-intensive flows by focusing on self-adapting data distribution inside the cluster, based on the provided and/or predicted workload (i.e., both data and function shipping). Adapting the data distribution to the workload in a timely manner will improve the performance of distributed data-intensive flows that depend largely on the locality of their input data (e.g., MapReduce).
We consider a typical distributed data processing system (e.g., Hadoop), with different clients submitting data flows for execution (multi-tenancy).
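As a rough illustration of why adapting data distribution to the workload can help locality-dependent flows, the toy sketch below measures the fraction of tasks that read a locally stored block, then greedily copies the most demanded missing blocks to the nodes that need them. It is not the project's scheduling algorithm: the function names (`local_task_fraction`, `redistribute`), the `budget` parameter, and the node and block identifiers are all hypothetical, and real systems would also account for transfer cost, storage capacity, and timing.

```python
from collections import Counter


def local_task_fraction(tasks, placement):
    """Fraction of tasks whose input block is stored on the node they run on.

    tasks: list of (node, block) pairs (hypothetical predicted workload).
    placement: dict mapping block -> set of nodes holding a replica.
    """
    if not tasks:
        return 1.0
    local = sum(1 for node, block in tasks if node in placement.get(block, set()))
    return local / len(tasks)


def redistribute(tasks, placement, budget):
    """Greedy toy policy: add up to `budget` replicas where demand is highest.

    Returns a new placement; the input placement is left unchanged.
    """
    # Count how many tasks would need a remote read for each (node, block) pair.
    demand = Counter(
        (node, block)
        for node, block in tasks
        if node not in placement.get(block, set())
    )
    new_placement = {block: set(nodes) for block, nodes in placement.items()}
    for (node, block), _count in demand.most_common(budget):
        new_placement.setdefault(block, set()).add(node)
    return new_placement


# Hypothetical workload: three tasks read block b1, one reads block b2.
tasks = [("n1", "b1"), ("n2", "b1"), ("n2", "b1"), ("n3", "b2")]
placement = {"b1": {"n1"}, "b2": {"n1"}}

before = local_task_fraction(tasks, placement)   # 0.25
adapted = redistribute(tasks, placement, budget=1)
after = local_task_fraction(tasks, adapted)      # 0.75
```

With a budget of a single transfer, copying block `b1` to node `n2` makes three of the four tasks local, which is the kind of effect workload-driven redistribution is meant to achieve.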
Goals
- Improving the utilization of cluster resources, as well as load and data balancing.
- Maximizing the throughput of a distributed data processing system.
- Maximizing the satisfaction of data flows' Service Level Agreements (SLAs).
- Enabling the system and its scheduling policies to self-adapt in a timely manner, so as to provide optimal data flow execution and guarantee that the data flows' SLAs are satisfied.
Research Collaboration: Self-Optimizing Data Stream Processing
This collaboration aims to equip the Lambda-architecture with semantic-aware, self-optimizing capabilities for optimal data stream processing.
Goals
- Refine the Lambda-architecture in order to provide semantic awareness to raw data.
- Study the characteristics that describe a data stream and can drive the self-optimizing process. Assess the available options, study their interdependence, and propose extensions.
- Study available benchmarks capable of varying the identified characteristics.
- Develop self-optimizing capabilities for data stream processing in the architecture.
Related publications
2019 |
---|
Muhammad Aamir Saleem, Rohit Kumar 0002, Toon Calders, Torben Bach Pedersen: Effective and efficient location influence mining in location-based social networks. Knowl. Inf. Syst. 2019 |
Sergi Nadal, Oscar Romero, Alberto Abelló, Panos Vassiliadis, Stijn Vansummeren: An integration-oriented ontology to govern evolution in Big Data ecosystems. Inf. Syst. 2019 |
Ayman Alserafi, Alberto Abelló, Oscar Romero, Toon Calders: Keeping the Data Lake in Form: DS-kNN Datasets Categorization Using Proximity Mining. MEDI 2019 |
Jam Jahanzeb Khan Behan, Oscar Romero, Esteban Zimányi: Multidimensional Integration of RDF Datasets. DaWaK 2019 |
2018 |
---|
Rohit Kumar 0002, Toon Calders: 2SCENT: An Efficient Algorithm to Enumerate All Simple Temporal Cycles. Proc. VLDB Endow. 2018 |
Sergi Nadal, Alberto Abelló, Oscar Romero, Stijn Vansummeren, Panos Vassiliadis: MDM: Governing Evolution in Big Data Ecosystems. EDBT 2018 |
Sergi Nadal, Oscar Romero, Alberto Abelló, Panos Vassiliadis, Stijn Vansummeren: An Integration-Oriented Ontology to Govern Evolution in Big Data Ecosystems. CoRR 2018 |
Moditha Hewasinghage, Jovan Varga, Alberto Abelló, Esteban Zimányi: Managing Polyglot Systems Metadata with Hypergraphs. ER 2018 |
2017 |
---|
Rohit Kumar 0002, Alberto Abelló, Toon Calders: Cost Model for Pregel on GraphX. ADBIS 2017 |
Muhammad Aamir Saleem, Rohit Kumar 0002, Toon Calders, Xike Xie, Torben Bach Pedersen: IMaxer: A Unified System for Evaluating Influence Maximization in Location-based Social Networks. CIKM 2017 |
Rohit Kumar 0002, Toon Calders: Information Propagation in Interaction Networks. EDBT 2017 |
Rohit Kumar 0002, Toon Calders: Finding simple temporal cycles in an interaction network. TD-LSG@PKDD/ECML 2017 |
Rohit Kumar 0002, Muhammad Aamir Saleem, Toon Calders, Xike Xie, Torben Bach Pedersen: Activity-Driven Influence Maximization in Social Networks. ECML/PKDD (3) 2017 |
Muhammad Aamir Saleem, Rohit Kumar 0002, Toon Calders, Xike Xie, Torben Bach Pedersen: Location Influence in Location-based Social Networks. WSDM 2017 |
Sergi Nadal, Victor Herrero, Oscar Romero, Alberto Abelló, Xavier Franch, Stijn Vansummeren, Danilo Valerio: A software reference architecture for semantic-aware Big Data systems. Inf. Softw. Technol. 2017 |
Sergi Nadal, Oscar Romero, Alberto Abelló, Panos Vassiliadis, Stijn Vansummeren: An Integration-Oriented Ontology to Govern Evolution in Big Data Ecosystems. EDBT/ICDT Workshops 2017 |
Ayman Alserafi, Toon Calders, Alberto Abelló, Oscar Romero: DS-Prox: Dataset Proximity Mining for Governing the Data Lake. SISAP 2017 |
2016 |
---|
Petar Jovanovic, Oscar Romero, Toon Calders, Alberto Abelló: H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution. ADBIS 2016 |
Ayman Alserafi, Alberto Abelló, Oscar Romero, Toon Calders: Towards Information Profiling: Data Lake Content Metadata Management. ICDM Workshops 2016 |
Esteban Zimányi, Alberto Abelló: Business Intelligence - 5th European Summer School, eBISS 2015, Barcelona, Spain, July 5-10, 2015, Tutorial Lectures eBISS 2016 |
Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro A. Vaisman, Esteban Zimányi: GRAD: On Graph Database Modeling. CoRR 2016 |