research-article

An information-theoretic analysis of worst-case redundancy in database design

Published: 15 February 2008

Abstract

Normal forms that guide database schema design have several key goals, including the elimination of redundancies and the preservation of integrity constraints such as functional dependencies. It has long been known that complete elimination of redundancies and complete preservation of constraints cannot be achieved simultaneously. In this article, we use a recently introduced information-theoretic framework to provide a quantitative analysis of the trade-off between redundancy and integrity preservation, and we give techniques for comparing different schema designs in terms of the amount of redundancy they carry.

The main notion of the information-theoretic framework is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. We start by providing a combinatorial criterion that lets us calculate, for a relational schema with functional dependencies, the lowest information content in its instances. This indicates how good the schema design is in terms of allowing redundant information. We then study the normal form 3NF, which tolerates some redundancy to guarantee preservation of functional dependencies. The main result provides a formal justification for normal form 3NF by showing that this normal form pays the smallest possible price, in terms of redundancy, for achieving dependency preservation. We also give techniques for quantitative comparison of different normal forms based on the redundancy they tolerate.
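The trade-off described above can be illustrated with a small sketch. It does not compute the information-content measure from the article. It only applies the textbook definitions of BCNF and 3NF to a toy schema, and the schema itself (street, city, zip) and every function name are assumptions chosen for illustration. The schema is in 3NF but not in BCNF, so it keeps all its functional dependencies at the cost of some redundancy.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Brute-force enumeration of minimal keys (exponential; toy schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            # Smaller sizes are visited first, so any key contained in cand is already known.
            if closure(cand, fds) == schema and not any(k <= cand for k in keys):
                keys.append(cand)
    return keys

def is_bcnf(schema, fds):
    """Every nontrivial dependency must have a superkey on its left-hand side."""
    return all(rhs <= lhs or closure(lhs, fds) == schema for lhs, rhs in fds)

def is_3nf(schema, fds):
    """Like BCNF, but a dependency may also add only prime attributes."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    return all(
        closure(lhs, fds) == schema or (rhs - lhs) <= prime
        for lhs, rhs in fds
    )

# Hypothetical example schema, not taken from the article.
schema = frozenset({"street", "city", "zip"})
fds = [
    (frozenset({"street", "city"}), frozenset({"zip"})),
    (frozenset({"zip"}), frozenset({"city"})),
]

keys = candidate_keys(schema, fds)
in_3nf = is_3nf(schema, fds)
in_bcnf = is_bcnf(schema, fds)
```

Here `zip → city` violates BCNF because `zip` alone is not a key, yet the schema is in 3NF because `city` belongs to a candidate key. Splitting the relation to reach BCNF would lose the dependency `{street, city} → zip`, which is the kind of price the article sets out to quantify.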

          • Published in

            ACM Transactions on Database Systems, Volume 35, Issue 1
            February 2010
            310 pages
            ISSN: 0362-5915
            EISSN: 1557-4644
            DOI: 10.1145/1670243

            Copyright © 2008 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Accepted: 1 August 2009
            • Revised: 1 May 2009
            • Received: 1 September 2008
            • Published: 15 February 2008

            Qualifiers

            • research-article
            • Research
            • Refereed
