Description
Realizing BI for the masses, exploratory BI, self-service BI, and similar concepts requires that all users be able to carry out the analytical and maintenance tasks they need without outside support. The user-centric nature of these systems aims to let non-technical users analyze data on demand. Next-generation BI systems should therefore give such users flexible means to create the reports and data analyses they want. This implies that the system must be self-configuring and adapt to day-to-day usage. For this reason, the system must be monitored continuously in order to detect and overcome potential bottlenecks of any kind (such as performance, information, design, or quality bottlenecks).
Methodology
In this project we propose to monitor the BI system, gather the metadata relevant for its assessment, and develop self-tuning features based on past evidence. Fulfilling this objective requires several tasks. First, the main storage alternatives must be characterized, including NoSQL trends. This classification should not be only model-based (e.g., relational, key-value, document stores, graph databases), as is usually done; it should also consider other decisions such as the system architecture (e.g., hash-based, clustered, in-memory, disk-based), the design (e.g., fragmentation and replication capabilities, indexing), and the optimizations implemented by the query execution engine.
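As an illustration only, the multi-dimensional characterization described above could be captured in a small descriptor type. The sketch below is not the project's actual classification: the `StorageProfile` type, its fields, the `matching` helper, and the two catalog entries (`store_a`, `store_b`) are hypothetical names chosen to mirror the dimensions named in the text (data model, architecture, design, engine optimizations).

```python
from dataclasses import dataclass
from enum import Enum


class DataModel(Enum):
    RELATIONAL = "relational"
    KEY_VALUE = "key-value"
    DOCUMENT = "document-store"
    GRAPH = "graph"


@dataclass(frozen=True)
class StorageProfile:
    """Hypothetical descriptor of one storage alternative."""
    name: str
    model: DataModel
    architecture: frozenset  # e.g. {"in-memory", "hash-based"}
    design: frozenset        # e.g. {"fragmentation", "replication", "indexing"}
    engine_optimizations: frozenset = frozenset()


def matching(profiles, *, architecture=(), design=()):
    """Return the profiles that offer every requested architecture and design trait."""
    need_arch, need_design = set(architecture), set(design)
    return [p for p in profiles
            if need_arch <= p.architecture and need_design <= p.design]


# Illustrative catalog; the entries are placeholders, not real systems.
CATALOG = [
    StorageProfile("store_a", DataModel.RELATIONAL,
                   frozenset({"disk-based", "clustered"}),
                   frozenset({"fragmentation", "replication", "indexing"})),
    StorageProfile("store_b", DataModel.KEY_VALUE,
                   frozenset({"in-memory", "hash-based"}),
                   frozenset({"replication"})),
]
```

Filtering on traits other than the data model, for example `matching(CATALOG, architecture=["in-memory"])`, shows why a purely model-based classification would be too coarse: two stores with different models can share a design trait such as replication, while differing in architecture.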
Outcome
Once the storage options have been clearly characterized and the factors relevant for choosing among them have been identified, the desired output, given a certain workload (i.e., past evidence gathered in the system), is a deterministic algorithm (probably cost-based) that enables self-tuning BI systems and, in turn, more user-friendly BI tools that bridge the gap between business needs and IT limitations.
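A minimal sketch of what a deterministic, cost-based choice could look like is given below. It is an assumption-laden illustration, not the project's algorithm: the option names (`row_layout`, `column_layout`), the operation kinds, and all cost figures are hypothetical, and the workload is reduced to simple operation counts standing in for the past evidence gathered by monitoring.

```python
from collections import Counter

# Hypothetical per-operation cost units for each storage option
# (illustrative numbers only, not measurements).
COSTS = {
    "row_layout":    {"scan": 10.0, "point_lookup": 1.0, "write": 1.0},
    "column_layout": {"scan": 2.0,  "point_lookup": 4.0, "write": 3.0},
}


def estimate_cost(option_costs, workload):
    """Total estimated cost of a workload: sum of (unit cost x observed count)."""
    return sum(option_costs[op] * count for op, count in workload.items())


def select_storage(costs, workload):
    """Pick the cheapest option; iterating names in sorted order makes ties deterministic."""
    return min(sorted(costs), key=lambda name: estimate_cost(costs[name], workload))


# Workloads as operation counts observed in the monitored system (made-up values).
scan_heavy = Counter(scan=100, point_lookup=10, write=20)
lookup_heavy = Counter(scan=1, point_lookup=100, write=50)
```

With these made-up figures, `select_storage(COSTS, scan_heavy)` picks `column_layout` (estimated cost 300 versus 1030) and `select_storage(COSTS, lookup_heavy)` picks `row_layout` (160 versus 552). The same workload always yields the same choice, which is the property the text asks of a deterministic algorithm.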
Related publications
2020 |
---|
Julius Gonsior, Josephine Rehak, Maik Thiele, Elvis Koci, Michael Günther, Wolfgang Lehner: Active Learning for Spreadsheet Cell Classification. EDBT/ICDT Workshops 2020 |
Rana Faisal Munir, Alberto Abelló, Oscar Romero, Maik Thiele, Wolfgang Lehner: A cost-based storage format selector for materialized results in big data frameworks. Distributed Parallel Databases 2020 |
Rana Faisal Munir, Alberto Abelló, Oscar Romero, Maik Thiele, Wolfgang Lehner: Configuring Parallelism for Hybrid Layouts Using Multi-Objective Optimization. Big Data 2020 |
2019 |
---|
Elvis Koci, Dana Kuban, Nico Luettig, Dominik Olwig, Maik Thiele, Julius Gonsior, Wolfgang Lehner, Oscar Romero: XLIndy: Interactive Recognition and Information Extraction in Spreadsheets. DocEng 2019 |
Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner: A Genetic-Based Search for Adaptive Table Recognition in Spreadsheets. ICDAR 2019 |
Elvis Koci, Maik Thiele, Josephine Rehak, Oscar Romero, Wolfgang Lehner: DECO: A Dataset of Annotated Spreadsheets for Layout and Table Recognition. ICDAR 2019 |
Rana Faisal Munir, Alberto Abelló, Oscar Romero, Maik Thiele, Wolfgang Lehner: Automatically Configuring Parallelism for Hybrid Layouts. ADBIS (Short Papers and Workshops) 2019 |
2018 |
---|
Elvis Koci, Maik Thiele, Wolfgang Lehner, Oscar Romero: Table Recognition in Spreadsheets via a Graph Representation. DAS 2018 |
Rana Faisal Munir, Sergi Nadal, Oscar Romero, Alberto Abelló, Petar Jovanovic, Maik Thiele, Wolfgang Lehner: Intermediate Results Materialization Selection and Format for Data-Intensive Flows. Fundam. Inform. 2018 |
Rana Faisal Munir, Alberto Abelló, Oscar Romero, Maik Thiele, Wolfgang Lehner: ATUN-HL: Auto Tuning of Hybrid Layouts Using Workload and Data Characteristics. ADBIS 2018 |
Rana Faisal Munir, Alberto Abelló, Oscar Romero, Maik Thiele, Wolfgang Lehner: A Cost-based Storage Format Selector for Materialization in Big Data Frameworks. CoRR 2018 |
2016 |
---|
Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner: A Machine Learning Approach for Layout Inference in Spreadsheets. KDIR 2016 |
Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner: Cell Classification for Layout Recognition in Spreadsheets. IC3K 2016 |
Katrin Braunschweig, Maik Thiele, Elvis Koci, Wolfgang Lehner: Putting Web Tables into Context. KDIR 2016 |
Rana Faisal Munir, Oscar Romero, Alberto Abelló, Besim Bilalli, Maik Thiele, Wolfgang Lehner: ResilientStore: A Heuristic-Based Data Format Selector for Intermediate Results. MEDI 2016 |
Vasileios Theodorou, Alberto Abelló, Wolfgang Lehner, Maik Thiele: Quality measures for ETL processes: from goals to implementation. Concurr. Comput. Pract. Exp. 2016 |