ABSTRACT
Background: Construct validity concerns the use of indicators to measure a concept that is not directly measurable. Aim: This study aims to identify, categorize, assess, and quantify discussions of threats to construct validity in the empirical software engineering literature, and to use the findings to suggest ways to improve the reporting of construct validity issues. Method: We analyzed 83 articles reporting human-centric experiments published in five top-tier software engineering journals from 2015 to 2019. The articles’ text concerning threats to construct validity was divided into segments (the unit of analysis) based on predefined categories. The segments were then evaluated regarding whether they clearly discussed a threat and a construct. Results: Three-fifths of the segments were associated with topics not related to construct validity. Two-thirds of the articles discussed construct validity without using the definition of construct validity given in the article. The threats were clearly described in more than four-fifths of the segments, but the construct in question was clearly described in only two-thirds of the segments. The construct was unclear when the discussion was not related to construct validity but to other types of validity. Conclusions: The results show potential for improving the understanding of construct validity in software engineering. Recommendations addressing the identified weaknesses are given to improve the awareness and reporting of construct validity.
Index Terms
- Improving the Reporting of Threats to Construct Validity