Rudra Pratap Deb Nath
Research topic: Data Integration and ETL for Semantic Data
Home University: Aalborg University (AAU)
Host University: Universitat Politècnica de Catalunya (UPC)
Advisor (Home University): Torben Bach Pedersen (AAU)
Advisor (Host University): Oscar Romero (UPC)
Research Interests: Semantic Data Warehousing, Exploratory Business Intelligence, Knowledge Integration and Engineering, Semantic Web, Affective Computing.
EDUCATION
September 2014 to the present:
Doctoral Candidate IT4BI. Aalborg University, Universitat Politècnica de Catalunya
October 2010 - September 2012:
MEng, Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
Thesis title: “An Efficient and Scalable Approach for Ontology Instance Matching”
September 2004 - March 2009:
BSc(Eng), Computer Science and Engineering. University of Chittagong (CU), Chittagong, Bangladesh
Thesis title: “Prediction of Travel Time using Modified K-means Clustering Method on Historical Traffic Data”
PROFESSIONAL EXPERIENCE
September 2014 to the present:
PhD fellow, Computer Science and Engineering, Aalborg, Denmark
March 2013 to the present (on study leave):
Lecturer, Computer Science and Engineering, University of Chittagong, Chittagong, Bangladesh
October 2010 - September 2012:
Research Assistant, Knowledge and Data Engineering Lab (KDE), Toyohashi University of Technology, Aichi, Japan
January 2010 - August 2010:
Lecturer, Computer Science, Chittagong Cantonment Public College, Chittagong, Bangladesh
RESEARCH
In recent years, more and more semantic data has become freely available on the Web: websites are annotated with RDF markup, data collections are offered for download, and even interfaces for structured queries over such data can be used free of charge. One reason for the success of semantic data is that publishing data is low-effort and does not rely on a sophisticated schema; instead, various standard ontologies and self-designed extensions can be used. This flexibility of the data format is an advantage of the Semantic Web paradigm, but it becomes a disadvantage during the ETL process, where the schema plays an important role. In addition, the schema of semantic data is often not known beforehand but is encoded as part of the dataset itself. Furthermore, many sources have been generated automatically, either by converting other data formats into RDF or by information extraction techniques, and hence contain errors. Thus, in addition to the heterogeneities that ETL for traditional data has to deal with, additional challenges arise for semantic data, especially regarding cleansing and duplicate detection. The aim of this topic is to develop an approach that enables the ETL process for semantic data despite the above-mentioned problems by (1) developing scalable data integration techniques that can handle multiple semantic data sources, (2) implementing an appropriate environment to facilitate the ETL process, and (3) evaluating the proposed solutions.