Towards Large-Scale Schema and Ontology Matching

Schema Matching and Mapping

Part of the book series: Data-Centric Systems and Applications ((DCSA))

Abstract

Manually specifying semantic correspondences between schemas is almost infeasible for very large schemas or when many different schemas have to be matched. Hence, solving such large-scale match tasks calls for automatic or semiautomatic schema matching approaches. Large-scale matching needs to be supported especially for XML schemas and different kinds of ontologies because of their increasing use and size, e.g., in e-business, web, and life science applications. Unfortunately, correctly and efficiently matching large schemas and ontologies is very challenging, and most previous match systems have addressed only small match tasks. We provide an overview of recently proposed approaches to achieve high match quality and/or high efficiency for large-scale matching. In addition to describing some recent matchers that utilize instance and usage data, we cover approaches for early pruning of the search space, divide-and-conquer strategies, parallel matching, tuning matcher combinations, the reuse of previous match results, and holistic schema matching. We also provide a brief comparison of selected match tools.

Notes

  1. F-Measure combines Recall and Precision, two standard measures to evaluate the effectiveness of schema matching approaches (Do et al. 2003).
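
As an illustration of the note above, the following Python sketch computes Precision, Recall, and F-Measure for a match result against a reference mapping. The note does not say which F-Measure variant is meant, so the balanced harmonic mean, F = 2 · Precision · Recall / (Precision + Recall), is an assumption here. The function name `match_quality` and the element names in the sample correspondences are hypothetical and only serve the example.

```python
def match_quality(found, gold):
    """Return (precision, recall, f_measure) for two sets of correspondences.

    found: correspondences proposed by a matcher (hypothetical input)
    gold:  manually confirmed reference correspondences (hypothetical input)
    """
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)

    # Precision: share of proposed correspondences that are correct.
    precision = true_positives / len(found) if found else 0.0
    # Recall: share of correct correspondences that were proposed.
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # Assumed variant: balanced F-Measure (harmonic mean of both values).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: 4 reference correspondences, 3 proposed, 2 of them correct.
gold = {
    ("Order.custName", "PO.customer"),
    ("Order.date", "PO.orderDate"),
    ("Order.total", "PO.amount"),
    ("Order.id", "PO.number"),
}
found = {
    ("Order.custName", "PO.customer"),
    ("Order.date", "PO.orderDate"),
    ("Order.total", "PO.shipTo"),
}

p, r, f = match_quality(found, gold)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
# precision=0.667 recall=0.500 f_measure=0.571
```

Because the harmonic mean is dominated by the smaller of the two values, a matcher cannot reach a high F-Measure by proposing very many correspondences (high Recall, low Precision) or very few (high Precision, low Recall).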

References

  • Alexe B, Gubanov M, Hernandez MA, Ho H, Huang JW, Katsis Y, Popa L, Saha B, Stanoi I (2009) Simplifying information integration: Object-based flow-of-mappings framework for integration. In: Proceedings of BIRTE08 (business intelligence for the real-time enterprise) workshop. Lecture Notes in Business Information Processing, vol 27. Springer, Heidelberg, pp 108–121

  • Algergawy A, Schallehn E, Saake G (2009) Improving XML schema matching performance using Prüfer sequences. Data Knowl Eng 68(8):728–747

  • Algergawy A et al. (2010) Combining schema and level-based matching for web service discovery. In: Proceedings of 10th international conference on web engineering (ICWE). Lecture Notes in Computer Science, vol 6189. Springer, Heidelberg, pp 114–128

  • Aumueller D, Do HH, Massmann S, Rahm E (2005) Schema and ontology matching with COMA++. In: Proceedings of ACM SIGMOD conference, demo paper. ACM, NY, pp 906–908

  • Avesani P, Giunchiglia F, Yatskevich M (2005) A large scale taxonomy mapping evaluation. In: Proceedings of international conference on semantic web (ICSW). LNCS, vol 3729. Springer, Heidelberg, pp 67–81

  • Bellahsene Z, Duchateau F (2011) Tuning for schema matching. In: Bellahsene Z, Bonifati A, Rahm E (eds) Schema matching and mapping, Data-Centric Systems and Applications Series. Springer, Heidelberg

  • Bellahsene Z, Bonifati A, Duchateau F, Velegrakis Y (2011) On evaluating schema matching and mapping. In: Bellahsene Z, Bonifati A, Rahm E (eds) Schema matching and mapping, Data-Centric Systems and Applications Series. Springer, Heidelberg

  • Bernstein PA, Melnik S, Petropoulos M, Quix C (2004) Industrial-strength schema matching. ACM SIGMOD Rec 33(4):38–43

  • Bernstein PA, Melnik S, Churchill JE (2006) Incremental schema matching. In: Proceedings of VLDB, demo paper. VLDB Endowment, pp 1167–1170

  • Chen K, Madhavan J, Halevy AY (2009) Exploring schema repositories with Schemr. In: Proceedings of ACM SIGMOD Conference, demo paper. ACM, NY, pp 1095–1098

  • Cruz IF, Antonelli FP, Stroe C (2009) AgreementMaker: Efficient matching for large real-world schemas and ontologies. In: PVLDB, vol 2(2), demo paper. VLDB Endowment, pp 1586–1589

  • Das Sarma A, Dong X, Halevy AY (2008) Bootstrapping pay-as-you-go data integration systems. In: Proceedings of ACM SIGMOD conference. ACM, NY, pp 861–874

  • Das Sarma A, Dong X, Halevy AY (2011) Uncertainty in data integration and dataspace support platforms. In: Bellahsene Z, Bonifati A, Rahm E (eds) Schema matching and mapping, Data-Centric Systems and Applications Series. Springer, Heidelberg

  • Do HH (2006) Schema Matching and Mapping-based Data Integration. Dissertation, Dept of Computer Science, Univ. of Leipzig

  • Do HH, Rahm E (2002) COMA – A System for Flexible Combination of Schema Matching Approaches. Proceedings VLDB Conf., pp 610–621

  • Do HH, Rahm E (2007) Matching large schemas: Approaches and evaluation. Inf Syst 32(6):857–885

  • Do HH, Melnik S, Rahm E (2003) Comparison of schema matching evaluations. In: web, web-services, and database systems, LNCS, vol 2593. Springer, Heidelberg

  • Doan A, Madhavan J, Dhamankar R, Domingos P, Halevy AY (2003) Learning to match ontologies on the semantic web. VLDB J 12(4):303–319

  • Dong X, Halevy AY, Madhavan J, Nemes E, Zhang J (2004) Similarity search for web services. In: Proceedings of VLDB conference. VLDB Endowment, pp 372–383

  • Duchateau F, Coletta R, Bellahsene Z, Miller RJ (2009) (Not) yet another matcher. In: Proceedings of CIKM, poster paper. ACM, NY, pp 1537–1540

  • Ehrig M, Staab S (2004) Quick ontology matching. In: Proceedings of international conference semantic web (ICSW). LNCS, vol 3298. Springer, Heidelberg, pp 683–697

  • Ehrig M, Staab S, Sure Y (2005) Bootstrapping ontology alignment methods with APFEL. In: Proceedings of international conference on semantic web (ICSW). LNCS, vol 3729. Springer, Heidelberg, pp 1148–1149

  • Elmagarmid AK, Ipeirotis PG, Verykios VS (2007) Duplicate record detection: A survey. IEEE Trans Knowl Data Eng 19(1):1–16

  • Elmeleegy H, Ouzzani M, Elmagarmid AK (2008) Usage-based schema matching. In: Proceedings of ICDE conference. IEEE Computer Society, Washington, DC, pp 20–29

  • Euzenat J, Shvaiko P (2007) Ontology matching. Springer, Heidelberg

  • Euzenat J et al. (2009) Results of the ontology alignment evaluation initiative 2009. In: Proceedings of the 4th international workshop on Ontology Matching (OM-2009)

  • Fagin R, Haas LM, Hernández MA, Miller RJ, Popa L, Velegrakis Y (2009) Clio: Schema mapping creation and data exchange. In: Conceptual modeling: Foundations and applications. LNCS, vol 5600. Springer, Heidelberg

  • Falconer SM, Noy NF (2011) Interactive techniques to support ontology mapping. In: Bellahsene Z, Bonifati A, Rahm E (eds) Schema matching and mapping. Data-Centric Systems and Applications Series. Springer, Heidelberg

  • Gligorov R, ten Kate W, Aleksovski Z, van Harmelen F (2007) Using Google distance to weight approximate ontology matches. In: Proceedings WWW Conf., pp 767–776

  • Gross A, Hartung M, Kirsten T, Rahm E (2010) On matching large life science ontologies in parallel. In: Proceedings of 7th international conference on data integration in the life sciences (DILS). LNCS, vol 6254. Springer, Heidelberg

  • Gubanov M et al (2009) IBM UFO repository: Object-oriented data integration. PVLDB, demo paper. VLDB Endowment, pp 1598–1601

  • Hamdi F, Safar B, Reynaud C, Zargayouna H (2009) Alignment-based partitioning of large-scale ontologies. In: Advances in knowledge discovery and management. Studies in Computational Intelligence Series. Springer, Heidelberg

  • Hanif MS, Aono M (2009) An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size. J Web Sem 7(4):344–356

  • He B, Chang KC (2006) Automatic complex schema matching across Web query interfaces: A correlation mining approach. ACM Trans Database Syst 31(1):346–395

  • He H, Meng W, Yu CT, Wu Z (2004) Automatic integration of Web search interfaces with WISE-Integrator. VLDB J 13(3):256–273

  • Hu W, Qu Y, Cheng G (2008) Matching large ontologies: A divide-and-conquer approach. Data Knowl Eng 67(1):140–160

  • Jean-Mary YR, Shironoshita EP, Kabuka MR (2009) Ontology matching with semantic verification. J Web Sem 7(3):235–251

  • Kappel G et al. (2007) Matching metamodels with semantic systems – An experience report. In: Proceedings of BTW workshop on model management, pp 1–15

  • Kirsten T, Thor A, Rahm E (2007) Instance-based matching of large life science ontologies. In: Proceedings of data integration in the life sciences (DILS). LNCS, vol 4544. Springer, Heidelberg, pp 172–187

  • Koepcke H, Rahm E (2010) Frameworks for entity matching: A comparison. Data Knowl Eng 69(2):197–210

  • Koudas N, Marathe A, Srivastava D (2004) Flexible string matching against large databases in practice. In: Proceedings of VLDB conference. VLDB Endowment, pp 1078–1086

  • Lambrix P, Tan H, Xu W (2008) Literature-based alignment of ontologies. In: Proceedings of the 3rd International Workshop on Ontology Matching (OM-2008)

  • Lee Y, Sayyadian M, Doan A, Rosenthal A (2007) eTuner: Tuning schema matching software using synthetic scenarios. VLDB J 16(1):97–122

  • Li J, Tang J, Li Y, Luo Q (2009) RiMOM: A dynamic multistrategy ontology alignment framework. IEEE Trans Knowl Data Eng 21(8):1218–1232

  • Madhavan J, Bernstein PA, Rahm E (2001) Generic Schema Matching with Cupid. In: Proceedings VLDB Conf., pp 49–58

  • Madhavan J, Bernstein PA, Doan A, Halevy AY (2005) Corpus-based schema matching. In: Proceedings of ICDE conference. IEEE Computer Society, Washington, DC, pp 57–68

  • Mao M, Peng Y, Spring M (2008) A harmony based adaptive ontology mapping approach. In: Proceedings of international conference on semantic web and web services (SWWS), pp 336–342

  • Massmann S, Rahm E (2008) Evaluating instance-based matching of web directories. In: Proceedings of 11th international Workshop on the Web and Databases (WebDB 2008)

  • Mork P, Seligman L, Rosenthal A, Korb J, Wolf C (2008) The harmony integration workbench. J Data Sem 11:65–93

  • Nandi A, Bernstein PA (2009) HAMSTER: Using search clicklogs for schema and taxonomy matching. PVLDB, vol 2(1), pp 181–192

  • Peukert E, Berthold H, Rahm E (2010a) Rewrite techniques for performance optimization of schema matching processes. In: Proceedings of 13th international conference on extending database technology (EDBT). ACM, NY, pp 453–464

  • Peukert E, Massmann S, König K (2010b) Comparing similarity combination methods for schema matching. In: Proceedings of 40th annual conference of the German computer society (GI-Jahrestagung). Lecture Notes in Informatics 175, pp 692–701

  • Pirrò G, Talia D (2010) UFOme: An ontology mapping system with strategy prediction capabilities. Data Knowl Eng 69(5):444–471

  • Rahm E, Bernstein PA (2001) A survey of approaches to automatic schema matching. VLDB J 10(4):334–350

  • Rahm E, Do HH, Massmann S (2004) Matching large XML schemas. SIGMOD Rec 33(4):26–31

  • Saha B, Stanoi I, Clarkson KL (2010) Schema covering: A step towards enabling reuse in information integration. In: Proceedings of ICDE conference, pp 285–296

  • Saleem K, Bellahsene Z, Hunt E (2008) PORSCHE: Performance oriented SCHEma mediation. Inf Syst 33(7–8):637–657

  • SAP (2010) Warp10 community-based integration. https://cw.sdn.sap.com/cw/docs/DOC-120470 (white paper), https://cw.sdn.sap.com/cw/community/esc/cdg135. Accessed April 2010

  • Seligman L, Mork P, Halevy AY et al (2010) OpenII: An open source information integration toolkit. In: Proceedings of ACM SIGMOD conference. ACM, NY, pp 1057–1060

  • Shi F, Li J et al (2009) Actively learning ontology matching via user interaction. In: Proceedings of international conference on semantic web (ICSW). Springer, Heidelberg, pp 585–600

  • Smith K, Morse M, Mork P et al (2009) The role of schema matching in large enterprises. In: Proceedings of CIDR

  • Spiliopoulos V, Vouros GA, Karkaletsis V (2010) On the discovery of subsumption relations for the alignment of ontologies. J Web Sem 8(1):69–88

  • Su W, Wang J, Lochovsky FH (2006) Holistic schema matching for web query interfaces. In: Proceedings of international conference on extending database technology (EDBT). Springer, Heidelberg, pp 77–94

  • Tan H, Lambrix P (2007) A method for recommending ontology alignment strategies. In: Proceedings of international conference on semantic web (ICSW). LNCS, vol 4825. Springer, Heidelberg

  • Thor A, Kirsten T, Rahm E (2007) Instance-based matching of hierarchical ontologies. In: Proceedings of 12th BTW conference (Database systems for business, technology and web). Lecture Notes in Informatics 103, pp 436–448

  • Zhang S, Mork P, Bodenreider O, Bernstein PA (2007) Comparing two approaches for aligning representations of anatomy. Artif Intell Med 39(3):227–236

  • Zhong Q, Li H et al. (2009) A gauss function based approach for unbalanced ontology matching. In: Proceedings of ACM SIGMOD conference. ACM, NY, pp 669–680

Corresponding author

Correspondence to Erhard Rahm.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rahm, E. (2011). Towards Large-Scale Schema and Ontology Matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds) Schema Matching and Mapping. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16518-4_1

  • DOI: https://doi.org/10.1007/978-3-642-16518-4_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16517-7

  • Online ISBN: 978-3-642-16518-4

  • eBook Packages: Computer Science (R0)