ABSTRACT
Discovering links between different data items in a single data source or across different data sources is a challenging problem faced by many information systems today. In particular, the recent Linking Open Data (LOD) community project has highlighted the paramount importance of establishing semantic links among web data sources. Currently, LOD sources provide billions of RDF triples, but only millions of links between data sources. Many of these data sources are published using tools that operate over relational data stored in a standard RDBMS. In this paper, we present a framework for the discovery of semantic links from relational data. Our framework is based on a declarative specification of linkage requirements by the user. We illustrate the use of our framework with several link discovery algorithms in a real-world scenario. Our framework allows data publishers to easily find and publish high-quality links to other data sources, and could therefore significantly enhance the value of the data in the next generation of the web.