VDoc+: a virtual document based approach for matching large ontologies using MapReduce

Journal of Zhejiang University SCIENCE C

Abstract

Many ontologies have been published on the Semantic Web and are shared to describe resources. Among them, large ontologies covering real-world domains pose a scalability problem for semantic technologies such as ontology matching (OM): matching them either suffers from excessively long run times or relies on strong assumptions about the running environment. To deal with this issue, we propose VDoc+, a three-stage approach for matching large ontologies, based on the MapReduce framework and the virtual document technique. Specifically, two MapReduce processes are performed in the first stage to extract the textual descriptions of named entities (classes, properties, and instances) and blank nodes, respectively. In the second stage, the extracted descriptions are exchanged with neighbors in Resource Description Framework (RDF) graphs to construct virtual documents. This process also benefits from the MapReduce-based implementation. In the third stage, a word-weight-based partitioning method is proposed to conduct parallel similarity calculation using the term frequency-inverse document frequency (TF-IDF) model. Experimental results on two large-scale real datasets and on the benchmark testbed from the Ontology Alignment Evaluation Initiative (OAEI) are reported, showing that the proposed approach significantly reduces the run time with only a minor loss in precision and recall.
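The three stages described in the abstract can be illustrated with a small, single-machine sketch. It is not the authors' implementation and uses no MapReduce: entity labels stand in for the extracted textual descriptions, `neighbor_weight` is a hypothetical damping factor for descriptions received from RDF neighbors, the smoothed IDF formula is one common variant chosen for illustration, and a plain inverted index over words stands in for the word-weight-based partitioning. All function names and the toy data are assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately naive assumption)."""
    return text.lower().split()


def virtual_documents(labels, edges, neighbor_weight=0.5):
    """Build weighted bags of words: own description plus damped neighbor words.

    labels: dict entity -> textual description (stage 1 output, assumed given)
    edges:  iterable of (subject, object) links between entities in the graph
    """
    local = {entity: Counter(tokenize(text)) for entity, text in labels.items()}
    neighbors = defaultdict(set)
    for s, o in edges:
        neighbors[s].add(o)
        neighbors[o].add(s)
    vdocs = {}
    for entity, words in local.items():
        doc = defaultdict(float)
        for word, count in words.items():
            doc[word] += count
        for nb in neighbors[entity]:
            for word, count in local.get(nb, {}).items():
                doc[word] += neighbor_weight * count
        vdocs[entity] = dict(doc)
    return vdocs


def tfidf_vectors(vdocs):
    """Weight each word by tf * (log(N / df) + 1), a smoothed IDF variant."""
    n = len(vdocs)
    df = Counter(word for doc in vdocs.values() for word in doc)
    idf = {word: math.log(n / df[word]) + 1.0 for word in df}
    return {e: {w: tf * idf[w] for w, tf in doc.items()} for e, doc in vdocs.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match(labels_a, edges_a, labels_b, edges_b, threshold=0.0):
    """Score cross-ontology entity pairs that share at least one word.

    Entity names are assumed to be unique across the two ontologies.
    """
    docs_a = virtual_documents(labels_a, edges_a)
    docs_b = virtual_documents(labels_b, edges_b)
    vectors = tfidf_vectors({**docs_a, **docs_b})
    # Inverted index: word -> entities of ontology B containing that word.
    index = defaultdict(set)
    for entity in docs_b:
        for word in vectors[entity]:
            index[word].add(entity)
    scores = {}
    for a in docs_a:
        candidates = set()
        for word in vectors[a]:
            candidates |= index.get(word, set())
        for b in candidates:
            score = cosine(vectors[a], vectors[b])
            if score > threshold:
                scores[(a, b)] = score
    return scores


if __name__ == "__main__":
    # Toy ontologies; every name here is made up for illustration.
    labels_a = {"a:Paper": "scientific paper", "a:Author": "author writer"}
    labels_b = {"b:Article": "scientific article paper", "b:Person": "person"}
    result = match(labels_a, [("a:Paper", "a:Author")],
                   labels_b, [("b:Article", "b:Person")])
    for pair, score in sorted(result.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Grouping candidates by shared word means that entity pairs with no word in common are never compared. In a distributed setting each word's group could be handled by a separate reduce task, which is the intuition this sketch borrows from the abstract's third stage.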

Author information

Corresponding author

Correspondence to Wei Hu.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61003018), the Natural Science Foundation of Jiangsu Province, China (No. BK2011189), and the National Social Science Foundation of China (No. 11AZD121).

About this article

Cite this article

Zhang, H., Hu, W. & Qu, Yz. VDoc+: a virtual document based approach for matching large ontologies using MapReduce. J. Zhejiang Univ. - Sci. C 13, 257–267 (2012). https://doi.org/10.1631/jzus.C1101007

  • DOI: https://doi.org/10.1631/jzus.C1101007
