
Duplicate detection and deletion in the extended NF2 data model

  • Data Organizations For Extended DBMSs
  • Conference paper
Foundations of Data Organization and Algorithms (FODO 1989)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 367)

Abstract

A current research topic in the area of relational databases is the design of systems based on the Non First Normal Form (NF2) data model. One particular development, the so-called extended NF2 data model, even permits structured values like lists and tuples to be included as attributes in relations. It is thus well suited to represent complex objects for non-standard database applications. A DBMS which uses this model, called the Advanced Information Management Prototype, is currently being implemented at the IBM Heidelberg Scientific Center. In this paper we examine the problem of detecting and deleting duplicates within this data model. Several alternative approaches are evaluated and a new method, based on sorting complex objects, is proposed, which is both time- and space-efficient.
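The abstract does not spell out the proposed algorithm, so the following is only a generic sketch of sort-based duplicate elimination over nested values, not the authors' method. It assumes a relation is given as a Python list of tuples whose attributes may be atomic values, lists (order matters), nested tuples, or frozensets (order does not matter). The names `sort_key` and `eliminate_duplicates` are hypothetical and chosen for illustration.

```python
def sort_key(value):
    """Map a nested value to a key that is totally ordered and identical
    for values that should count as duplicates (assumed semantics)."""
    if isinstance(value, frozenset):
        # Set-valued attribute: element order is irrelevant, so sort the element keys.
        return (0, tuple(sorted(sort_key(v) for v in value)))
    if isinstance(value, list):
        # List-valued attribute: element order is significant, so keep it.
        return (1, tuple(sort_key(v) for v in value))
    if isinstance(value, tuple):
        # Tuple-valued attribute: compare field by field.
        return (2, tuple(sort_key(v) for v in value))
    # Atomic value: tag with the type name so mixed types never compare directly.
    return (3, type(value).__name__, value)


def eliminate_duplicates(rows):
    """Sort rows by their canonical key, then keep the first row of each run
    of equal keys. Duplicates become adjacent after sorting."""
    decorated = sorted(((sort_key(r), r) for r in rows), key=lambda pair: pair[0])
    result = []
    previous = object()  # sentinel that equals no key
    for key, row in decorated:
        if key != previous:
            result.append(row)
            previous = key
    return result


rows = [
    ("bolt", frozenset({"red", "blue"}), [1, 2]),
    ("nut", frozenset({"red"}), [2, 1]),
    ("bolt", frozenset({"blue", "red"}), [1, 2]),  # duplicate of the first row
    ("nut", frozenset({"red"}), [1, 2]),           # differs from row 2: list order matters
]
unique_rows = eliminate_duplicates(rows)
```

In this sketch the whole relation is decorated and sorted in memory, which is the simplest way to make duplicates adjacent. It makes no claim about the time or space behaviour of the method proposed in the paper.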

on leave from TU Braunschweig, FB Informatik, D-3300 Braunschweig, West Germany




Editor information

Witold Litwin Hans-Jörg Schek


Copyright information

© 1989 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Küspert, K., Saake, G., Wegner, L. (1989). Duplicate detection and deletion in the extended NF2 data model. In: Litwin, W., Schek, H.-J. (eds) Foundations of Data Organization and Algorithms. FODO 1989. Lecture Notes in Computer Science, vol 367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-51295-0_120


  • DOI: https://doi.org/10.1007/3-540-51295-0_120


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-51295-0

  • Online ISBN: 978-3-540-46186-9

  • eBook Packages: Springer Book Archive
