Metadata Management
Victor Herrero, Alberto Abelló, Besim Bilalli, Ayman Elserafi, Pedro González, Sergi Nadal, Oscar Romero, Jovan Varga
Metadata is data that describes data. In essence, it covers all descriptions of the data stored in data repositories such as Data Warehouses (DW) and Data Lakes (DL). It includes data provenance, information content, and semantics. Data provenance metadata describes the history of the data residing in the repositories: for example, the source of the data, the transformations applied to process it, and its owners. Information content metadata describes the structure of the data (i.e., data schemas) and the type of information it stores (i.e., the topics and keywords describing the data). Semantic metadata describes the meaning of the data, including the business definition of what the data means for the business owner.
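As an illustration only, the three kinds of metadata described above could be grouped into a single record per dataset. The following is a minimal sketch in Python; all class names, field names, and example values (`DatasetMetadata`, `crm_export`, and so on) are hypothetical and are not taken from any system developed by the group.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Provenance:
    """History of the data: where it came from and who owns it."""
    source: str
    transformations: List[str] = field(default_factory=list)
    owners: List[str] = field(default_factory=list)


@dataclass
class InformationContent:
    """Structure of the data and the topics it covers."""
    schema: Dict[str, str] = field(default_factory=dict)  # attribute -> type
    keywords: List[str] = field(default_factory=list)


@dataclass
class Semantics:
    """Meaning of the data and its business definition."""
    meaning: str = ""
    business_definition: str = ""


@dataclass
class DatasetMetadata:
    """One metadata record per dataset in the DW / DL (hypothetical layout)."""
    dataset: str
    provenance: Provenance
    content: InformationContent
    semantics: Semantics


def example_metadata() -> DatasetMetadata:
    # All values below are invented purely for illustration.
    return DatasetMetadata(
        dataset="customers",
        provenance=Provenance(
            source="crm_export",
            transformations=["deduplicate", "normalize_country_codes"],
            owners=["sales"],
        ),
        content=InformationContent(
            schema={"id": "int", "name": "string", "country": "string"},
            keywords=["customer", "contact"],
        ),
        semantics=Semantics(
            meaning="One row per registered customer",
            business_definition="A party that has placed at least one order",
        ),
    )
```

Keeping the three categories as separate types makes it possible to collect and update each one independently, which matches the way they are described above.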
Metadata Management is the science that studies which metadata is required for Business Intelligence (BI) and how to collect it in a dynamic enterprise environment. This includes the techniques for collecting the metadata accurately and efficiently, as well as the processes for updating, using, and maintaining it for BI. The goals of managing such metadata are to support data governance processes, to support the automation and optimization of the DW / DL, and to support user assistance for data analytics and BI.
Research Line: Managing Metadata for Analytics
There is currently a pressing need for metadata Master Data Management (MDM) and data governance inside the DL. Users need help understanding the information held in the data sources, and metadata must be collected efficiently from the DL, including within distributed and streaming environments. In addition, there should be a capability for the optimized, automatic collection of statistical and profiling metadata (i.e., metadata which describes the data profiles). This includes supporting landmarking of semi-structured data (e.g., XML) to detect important features, and preparing the data from such sources for analytics. These topics are researched within the DL environment using experimentation and simulation.
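To make the notion of statistical and profiling metadata concrete, the sketch below computes a simple profile for each attribute of a small set of records. It is a minimal illustration in Python, not the group's method: the function names `profile_column` and `profile_records` are hypothetical, the chosen statistics are an assumption, and attribute values are assumed to be hashable.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Compute basic profiling metadata for one attribute (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    # Treat int/float as numeric, but exclude bool (a subclass of int).
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    profile: Dict[str, Any] = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),  # assumes hashable values
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }
    # Numeric statistics only when every non-null value is numeric.
    if numeric and len(numeric) == len(non_null):
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["mean"] = sum(numeric) / len(numeric)
    return profile


def profile_records(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Profile every attribute found in a list of semi-structured records.

    Attributes missing from a record are counted as nulls.
    """
    columns = sorted({key for record in records for key in record})
    return {c: profile_column([r.get(c) for r in records]) for c in columns}
```

For example, calling `profile_records` on records that do not all share the same attributes still yields one profile per attribute, which is the situation that arises with semi-structured sources.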
2005 Alberto Abelló, Xavier de Palol, Mohand-Said Hacid: On the Midpoint of a Set of XML Documents. DEXA 2005