Commentary on Coefficient Alpha: A Cautionary Tale

Psychometrika

Abstract

The general use of coefficient alpha to assess reliability should be discouraged on a number of grounds. The assumptions underlying coefficient alpha are unlikely to hold in practice, and violation of these assumptions can result in nontrivial negative or positive bias. Structural equation modeling was discussed as an informative process both to assess the assumptions underlying coefficient alpha and to estimate reliability.
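For readers who want the quantity under discussion in concrete form, the following is a minimal sketch of the conventional sample formula for coefficient alpha, k/(k − 1) × (1 − sum of item variances / variance of the summed score). It is not taken from the article: the function name `coefficient_alpha`, the respondents-by-items layout, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def coefficient_alpha(scores):
    """Sample coefficient alpha for a respondents-by-items score matrix.

    Hypothetical helper for illustration; rows are respondents and
    columns are items (an assumed layout, not one given in the article).
    """
    x = np.asarray(scores, dtype=float)
    if x.ndim != 2:
        raise ValueError("scores must be a 2-D array")
    n, k = x.shape
    if n < 2 or k < 2:
        raise ValueError("need at least 2 respondents and 2 items")

    # Unbiased (n - 1) variances of each item and of the summed score.
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    if total_variance == 0:
        raise ValueError("summed score has zero variance")

    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Two identical items: all variance in the sum is shared.
identical = [[1, 1], [2, 2], [3, 3]]
# Two uncorrelated items: the sum's variance equals the item variances.
uncorrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]]

alpha_identical = coefficient_alpha(identical)
alpha_uncorrelated = coefficient_alpha(uncorrelated)
```

The two toy matrices only check the arithmetic at its extremes. They say nothing about the bias the abstract warns of, which depends on whether the model assumptions behind alpha hold for the items in question.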

Author information

Corresponding author

Correspondence to Samuel B. Green.

About this article

Cite this article

Green, S.B., Yang, Y. Commentary on Coefficient Alpha: A Cautionary Tale. Psychometrika 74, 121–135 (2009). https://doi.org/10.1007/s11336-008-9098-4

  • DOI: https://doi.org/10.1007/s11336-008-9098-4
