Data Integration over Distributed and Heterogeneous Data Endpoints
Citations over time: top 10% of 2014 papers
Abstract
Data integration is a broad area encompassing techniques for merging data across data sources. Although there are many efficient and effective methods for integrating homogeneous data, where instances share the same schema and range of values, their applicability to heterogeneous data is less clear. This thesis considers data integration in the setting of the Semantic Web. In particular, we propose a novel architecture for instance matching that takes into account the particularities of this heterogeneous and distributed setting. Instead of assuming that instances share the same schema, the proposed method operates even when the schemas do not overlap at all, apart from a key label that matching instances must share. Moreover, we take the distributed nature of the Semantic Web into account to propose a new architecture for general data integration, which operates on the fly and in a pay-as-you-go fashion. We show that our view and the view of the traditional data integration school each address the problem only partially, but that the two complement each other. We observe that this unified view gives better insight into their relative importance and into how data integration methods can benefit from their combination. The results of this work are of particular interest to the Semantic Web and Data Integration communities.
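To make the idea of matching instances that share nothing but a key label more concrete, here is a minimal Python sketch. It is not the architecture proposed in the thesis: the function names `normalize` and `match_by_label`, the property names, and the toy records are all assumptions made for illustration, and "sharing a key label" is read here simply as "having the same label value after light normalization".

```python
from collections import defaultdict


def normalize(label):
    # Case-fold and collapse whitespace so trivially different labels compare equal.
    return " ".join(label.casefold().split())


def match_by_label(source_a, source_b, label_key_a, label_key_b):
    """Pair instances from two sources whose label values agree after normalization.

    The two sources may use entirely different property names; only the
    label property of each source is consulted (hypothetical helper).
    """
    # Index the second source by normalized label for constant-time lookup.
    index = defaultdict(list)
    for inst in source_b:
        label = inst.get(label_key_b)
        if label:
            index[normalize(label)].append(inst)

    # Probe the index with each labeled instance of the first source.
    matches = []
    for inst in source_a:
        label = inst.get(label_key_a)
        if label:
            for candidate in index.get(normalize(label), []):
                matches.append((inst, candidate))
    return matches


# Made-up records from two hypothetical endpoints with disjoint schemas.
endpoint_a = [
    {"rdfs:label": "Berlin", "ex:category": "city"},
    {"rdfs:label": "Paris", "ex:category": "city"},
]
endpoint_b = [
    {"name": "  berlin ", "ex:code": "B-01"},
    {"name": "Madrid", "ex:code": "M-02"},
]

matches = match_by_label(endpoint_a, endpoint_b, "rdfs:label", "name")
```

In this toy run only the two "Berlin" records are paired, even though their schemas have no property in common. A realistic matcher would also need to handle ambiguous labels and approximate string similarity, which this sketch deliberately leaves out.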