Abstract
Prediction of fault-prone modules is an important area of software engineering. We assume that the occurrence of faults is related to the semantics of source code modules, and that these semantics can be extracted from the identifiers in a module. We therefore analyze the relationship between the occurrence of “words” in identifiers and the existence of faults. To do so, we first decompose identifiers into words and examine which words occur in each module. Using the random forest technique, we then build a model that relates word occurrence to the existence of faults. We compare this word occurrence model with traditional models based on Chidamber and Kemerer (CK) metrics and lines of code (LOC). The comparison shows that word occurrence is, like CK metrics and LOC, a good prediction measure.
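The decomposition step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the splitting rules the authors used, so the rules here (split on underscores and camelCase boundaries, lowercase everything) and the names `split_identifier` and `word_occurrences` are assumptions made only for illustration. The resulting per-module word counts are the kind of feature vector that could be passed to a random forest classifier; the modeling step itself is not shown.

```python
import re
from collections import Counter

# Assumed splitting rule (not taken from the chapter):
#   - a run of capitals not followed by a lowercase letter is an acronym ("HTTP")
#   - an optional capital followed by lowercase letters is a word ("Response", "parse")
#   - a run of digits is kept as its own token
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Split one identifier into lowercase words."""
    return [word.lower() for word in _WORD_RE.findall(identifier)]


def word_occurrences(identifiers):
    """Count how often each word occurs across a module's identifiers."""
    counts = Counter()
    for identifier in identifiers:
        counts.update(split_identifier(identifier))
    return counts


if __name__ == "__main__":
    # Hypothetical identifiers from a single module.
    module_identifiers = ["getFileName", "file_name", "setName"]
    print(word_occurrences(module_identifiers))
```

Each module would yield one such count table; aligning the tables over a shared vocabulary gives one row per module for model training.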
Currently, the author is at Nara Advanced Institute of Science and Technology.
Notes
https://github.com/doofuslarge/lscp. lscp is a lightweight source code preprocessor. It can be used to isolate and manipulate the linguistic data (i.e., identifier names, comments, and string literals) in source code files.
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 24500038.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Kawashima, N., Mizuno, O. (2015). Predicting Fault-Prone Modules by Word Occurrence in Identifiers. In: Lee, R. (eds) Software Engineering Research, Management and Applications. Studies in Computational Intelligence, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-11265-7_7
DOI: https://doi.org/10.1007/978-3-319-11265-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11264-0
Online ISBN: 978-3-319-11265-7
eBook Packages: Engineering, Engineering (R0)