Abstract
Attribute-level schema matching is a critical step in numerous database applications, such as DataSpaces, Ontology Merging and Schema Integration. Many studies have addressed this topic; however, they ignore the implicit categorical information, which is crucial for finding high-quality matches between schema attributes. In this paper, we discover the categorical semantics implicit in source instances and associate them with the matches in order to improve the overall quality of schema matching. Our method works in three phases. The first phase is a pre-detection step that identifies the possible categories of source instances by using clustering techniques. In the second phase, we employ information entropy to find the attributes whose instances imply the categorical semantics. In the third phase, we introduce a new concept, the c-mapping, to represent the associations between the matches and the categorical semantics, and we employ an adaptive scoring function to evaluate the c-mappings and thereby associate the matches with the semantics. Moreover, we show how to translate the matches with semantics into schema mapping expressions and use the chase procedure to transform source data into target schemas. An experimental study shows that our approach is effective and performs well.
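The abstract does not give the exact entropy formulation used in the second phase. The following is a minimal sketch, assuming a normalized Shannon entropy test, of how information entropy can flag attributes whose instances look categorical. The function names `shannon_entropy` and `categorical_candidates`, and the `threshold` parameter, are hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def categorical_candidates(rows, threshold=0.5):
    """Hypothetical helper: return attributes that look categorical.

    Normalized entropy is H(A) / log2(n), where n is the number of rows.
    It is 0 when every row shares one value and 1 when all values differ.
    An attribute is a candidate when 0 < normalized entropy <= threshold:
    constant attributes (entropy 0) carry no category distinction, and
    near-unique attributes (e.g. keys) are not categories.
    `threshold` is an assumed tuning parameter, not taken from the paper.
    """
    n = len(rows)
    if n < 2:
        return []
    candidates = []
    for attr in rows[0]:
        h = shannon_entropy([row[attr] for row in rows])
        if 0 < h / math.log2(n) <= threshold:
            candidates.append(attr)
    return candidates


if __name__ == "__main__":
    # Toy source instances: "type" splits the rows into two categories.
    rows = [
        {"id": i, "name": f"item{i}", "vendor": "acme",
         "type": "book" if i % 2 else "dvd"}
        for i in range(8)
    ]
    print(categorical_candidates(rows))  # → ['type']
```

In this toy table, `id` and `name` are unique per row (normalized entropy 1.0), `vendor` is constant (entropy 0), and only `type` falls in between, so it is reported as a candidate categorical attribute. A real implementation would likely combine such a test with the clusters detected in the first phase rather than rely on a fixed threshold.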
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ding, G., Wang, G. (2011). Discovering Implicit Categorical Semantics for Schema Matching. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20152-3_14
DOI: https://doi.org/10.1007/978-3-642-20152-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20151-6
Online ISBN: 978-3-642-20152-3
eBook Packages: Computer Science, Computer Science (R0)