Abstract
Normal forms that guide the process of database schema design have several key goals such as elimination of redundancies and preservation of integrity constraints, such as functional dependencies. It has long been known that complete elimination of redundancies and complete preservation of constraints cannot be achieved simultaneously. In this article, we use a recently introduced information-theoretic framework, and provide a quantitative analysis of the redundancy/integrity preservation trade-off, and give techniques for comparing different schema designs in terms of the amount of redundancy they carry.
The main notion of the information-theoretic framework is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. We start by providing a combinatorial criterion that lets us calculate, for a relational schema with functional dependencies, the lowest information content in its instances. This indicates how good the schema design is in terms of allowing redundant information. We then study the normal form 3NF, which tolerates some redundancy to guarantee preservation of functional dependencies. The main result provides a formal justification for normal form 3NF by showing that this normal form pays the smallest possible price, in terms of redundancy, for achieving dependency preservation. We also give techniques for quantitative comparison of different normal forms based on the redundancy they tolerate.
- Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- Aho, A. V., Beeri, C., and Ullman, J. D. 1979. The theory of joins in relational databases. ACM Trans. Datab. Syst. 4, 3, 297--314. Google ScholarDigital Library
- Arenas, M. and Libkin, L. 2004. A normal form for XML documents. ACM Trans. Datab. Syst. 29, 195--232. Google ScholarDigital Library
- Arenas, M. and Libkin, L. 2005. An information-theoretic approach to normal forms for relational and XML data. J. ACM 52, 2, 246--283. Google ScholarDigital Library
- Beeri, C., Bernstein, P. A., and Goodman, N. 1978. A sophisticate's introduction to database normalization theory. In Proceedings of the 4th International Conference on Very Large Data Bases. 113--124. Google ScholarDigital Library
- Beeri, C., Dowd, M., Fagin, R., and Statman, R. 1984. On the structure of Armstrong relations for functional dependencies. J. ACM 31, 1, 30--46. Google ScholarDigital Library
- Bernstein, P. A. 1976. Synthesizing third normal form relations from functional dependencies. ACM Trans. Datab. Syst. 1, 4, 277--298. Google ScholarDigital Library
- Bernstein, P. A. and Goodman, N. 1980. What does boyce-codd normal form do? In Proceedings of the 6th International Conference on Very Large Data Bases. IEEE Computer Society, 245--259. Google ScholarDigital Library
- Biskup, J. 1995. Achievements of relational database schema design theory revisited. In Semantics in Databases. 29--54. Google ScholarDigital Library
- Biskup, J., Dayal, U., and Bernstein, P. A. 1979. Synthesizing independent database schemas. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 143--151. Google ScholarDigital Library
- Biskup, J. and Meyer, R. 1987. Design of relational database schemes by deleting attributes in the canonical decomposition. J. Comput. Syst. Sci. 35, 1, 1--22. Google ScholarDigital Library
- Cavallo, R. and Pittarelli, M. 1987. The theory of probabilistic databases. In Proceedings of the 13th International Conference on Very Large Data Bases. 71--81. Google ScholarDigital Library
- Cover, T. M. and Thomas, J. A. 1991. Elements of Information Theory. John Wiley and Sons. Google ScholarDigital Library
- Dalkilic, M. M. and Robertson, E. L. 2000. Information dependencies. In Proceedings of the 19th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 245--253. Google ScholarDigital Library
- Demetrovics, J. and Thi, V. 1987. Keys, antikeys and prime attributes. Annales Univ. Sci., Sect. Comp., Budapest 8, 35--52.Google Scholar
- Dewson, R. 2006. Beginning SQL Server 2005 for Developers: From Novice to Professional. Apress. Google ScholarDigital Library
- Fagin, R. 1979. Normal forms and relational database operators. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 153--160. Google ScholarDigital Library
- Fagin, R. 1981. A normal form for relational databases that is based on domians and keys. ACM Trans. Datab. Syst. 6, 3, 387--415. Google ScholarDigital Library
- Greenwald, R., Stackowiak, R., and Stern, J. 2007. Oracle Essentials: Oracle Database 11g, 4th Ed. O'Reilly Media.Google Scholar
- Kanellakis, P. C. 1990. Elements of relational database theory. In Handbook of Theoretical Computer Science, Vol. B: Formal Models and Semantics, 1073--1156. Google ScholarDigital Library
- Kifer, M., Bernstein, A., and Lewis, P. M. 2006. Database Systems: An Application-Oriented Approach. Addison-Wesley. Google ScholarDigital Library
- Kolahi, S. 2007. Dependency-Preserving normalization of relational and XML data. J. Comput. Syst. Sci. 73, 4, 636--647. Google ScholarDigital Library
- Kolahi, S. and Libkin, L. 2006. On redundancy vs dependency preservation in normalization: an information-theoretic study of 3NF. In Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 114--123. Google ScholarDigital Library
- LeDoux, C. H. and Parker, D. S. 1982. Reflections on boyce-codd normal form. In Proceedings of the 8th International Conference on Very Large Data Bases. Morgan Kaufmann, 131--141. Google ScholarDigital Library
- Lee, T. T. 1987. An information-theoretic analysis of relational databases - Part i: Data dependencies and information metric. IEEE Trans. Softw. Engin. 13, 10, 1049--1061. Google ScholarDigital Library
- Levene, M., Levene, M., and Loizou, G. 1999. A Guided Tour of Relational Databases and Beyond. Springer. Google ScholarDigital Library
- Levene, M. and Loizou, G. 2003. Why is the snowflake schema a good data warehouse design? Inf. Syst. 28, 3, 225--240. Google ScholarDigital Library
- Levene, M. and Vincent, M. W. 2000. Justification for inclusion dependency normal form. IEEE Trans. Knowl. Data Engin. 12, 2, 281--291. Google ScholarDigital Library
- Ling, T. W., Tompa, F. W., and Kameda, T. 1981. An improved third normal form for relational databases. ACM Trans. Datab. Syst. 6, 2, 329--346. Google ScholarDigital Library
- Mannila, H. and Räihä, K.-J. 1986. Design by example: An application of Armstrong relations. J. Comput. Syst. Sci. 33, 3, 126--141. Google ScholarDigital Library
- Stephens, R. K. and Plew, R. R. 2002. Sams Teach Yourself SQL in 21 Days 4th Ed. Sams. Google ScholarDigital Library
- Valiant, L. G. 1979. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 3, 410--421.Google ScholarDigital Library
- Vincent, M. W. 1999. Semantic foundations of 4NF in relational database design. Acta Inf. 36, 3, 173--213.Google ScholarCross Ref
- Zaniolo, C. 1982. A new normal form for the design of relational database schemata. ACM Trans. Datab. Syst. 7, 3, 489--499. Google ScholarDigital Library
Index Terms
- An information-theoretic analysis of worst-case redundancy in database design
Recommendations
Database design with equality-generating dependencies
DASFAA'05: Proceedings of the 10th international conference on Database Systems for Advanced ApplicationsIn relational database systems, traditional normalization techniques (eg, BCNF, 4NF) remove data redundancies from a single relation, but can not detect and remove redundancies across multiple relations. However, redundancies among multiple relations ...
SQL Schema Design: Foundations, Normal Forms, and Normalization
SIGMOD '16: Proceedings of the 2016 International Conference on Management of DataNormalization helps us find a database schema at design time that can process the most frequent updates efficiently at run time. Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are ...
A normal form for preventing redundant tuples in relational databases
ICDT '12: Proceedings of the 15th International Conference on Database TheoryWe introduce a new normal form, called essential tuple normal form (ETNF), for relations in a relational database where the constraints are given by functional dependencies and join dependencies. ETNF lies strictly between fourth normal form and fifth ...
Comments