Empirical studies to assess the understandability of data warehouse schemas using structural metrics

Software Quality Journal

Abstract

Data warehouses are powerful tools for making better and faster decisions in organizations where information is an asset of primary importance. Due to the complexity of data warehouses, metrics and procedures are required to continuously assure their quality. This article describes an empirical study and a replication aimed at investigating the use of structural metrics as indicators of the understandability, and by extension the cognitive complexity, of data warehouse schemas. More specifically, a four-step analysis is conducted: (1) check whether, individually and collectively, the considered metrics can be correlated with schema understandability using classical statistical techniques, (2) evaluate whether understandability can be predicted by case similarity using the case-based reasoning technique, (3) determine, for each level of understandability, the subsets of metrics that are important by means of a classification technique, and (4) assess, by means of a probabilistic technique, the degree of participation of each metric in the understandability prediction. The results obtained show that although a linear model is a good approximation of the relation between structure and understandability, the associated coefficients are not significant enough. Additionally, the classification analyses reveal, respectively, that prediction can be achieved by considering structure similarity, that extracted classification rules can be used to estimate the magnitude of understandability, and that some metrics, such as the number of fact tables, have more impact than others.
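The first two steps of the analysis can be illustrated with a minimal sketch. It is not the authors' procedure: the schema names, metric values, and understanding times below are hypothetical, and plain Euclidean distance with a k-nearest-case average is an assumed stand-in for whatever similarity measure the case-based reasoning step actually uses.

```python
from math import sqrt

# Hypothetical cases: (fact tables, dimension tables, foreign keys) -> understanding time in seconds.
# All values are invented for illustration; they are not data from the study.
SCHEMAS = {
    "A": ((1, 3, 3), 100.0),
    "B": ((2, 4, 6), 160.0),
    "C": ((1, 5, 5), 130.0),
    "D": ((3, 6, 10), 240.0),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def euclidean(a, b):
    """Euclidean distance between two metric vectors (an assumed similarity measure)."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def predict_by_similarity(metrics, cases, k=1):
    """Average the understanding time of the k stored cases closest to `metrics`."""
    ranked = sorted(cases.values(), key=lambda case: euclidean(metrics, case[0]))
    nearest = ranked[:k]
    return sum(time for _, time in nearest) / len(nearest)


if __name__ == "__main__":
    fact_tables = [metrics[0] for metrics, _ in SCHEMAS.values()]
    times = [time for _, time in SCHEMAS.values()]
    # Step 1: does a single metric correlate with understanding time?
    print("correlation (fact tables vs. time):", round(pearson(fact_tables, times), 3))
    # Step 2: predict the time for a new schema from its most similar stored case.
    print("predicted time for (2, 4, 7):", predict_by_similarity((2, 4, 7), SCHEMAS))
```

With real measurements, the metric vectors would normally be normalized before computing distances so that a metric with a large range does not dominate the similarity.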

Notes

  1. Figure 6 does not show schema S07 because it was removed in the replication. See Sect. 4.3.1 for more information.

Acknowledgements

This research is part of the CALIPO project, supported by the Dirección General de Investigación of the Ministerio de Ciencia y Tecnología (TIC2003-07804-C05-03). This research is also part of the ENIGMAS project, supported by the Junta de Comunidades de Castilla-La Mancha, Consejería de Ciencia y Tecnología (PBI-05-058). This work was performed during the stay of Houari Sahraoui at the University of Castilla-La Mancha under the “Programa Nacional De Ayudas Para La Movilidad de Profesores en Régimen de año sabático” of the Spanish Ministerio de Educación y Ciencia, REF: 2004-0161. We would like to thank all of the volunteer subjects who participated in these experiments; their invaluable assistance helped us reach the conclusions in this paper. We also want to thank the reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Manuel Angel Serrano.

Appendix: Collected time

Table 19 Appendix: Collected time

About this article

Cite this article

Serrano, M.A., Calero, C., Sahraoui, H.A. et al. Empirical studies to assess the understandability of data warehouse schemas using structural metrics. Software Qual J 16, 79–106 (2008). https://doi.org/10.1007/s11219-007-9030-7

  • DOI: https://doi.org/10.1007/s11219-007-9030-7
