Predicting Fault-Prone Modules by Word Occurrence in Identifiers

Software Engineering Research, Management and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 578)

Abstract

Prediction of fault-prone modules is an important area of software engineering. We assume that the occurrence of faults is related to the semantics of source code modules, and that the semantics of a module can be extracted from its identifiers. We therefore analyze the relationship between the occurrence of “words” in identifiers and the existence of faults. To do so, we first decompose the identifiers into words and investigate the occurrence of those words in each module. Using the random forest technique, we build a model relating word occurrence to the existence of faults. We compare this word occurrence model with traditional models based on CK metrics and LOC. The comparison shows that word occurrence is a good prediction measure, as are CK metrics and LOC.
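The decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes identifiers are split on underscores, digits, and camelCase boundaries; the chapter's actual splitting rules are not given here, and the function names (`split_identifier`, `word_occurrence`, `to_feature_rows`) are hypothetical. The simple regular expression also treats language keywords as identifiers, which a real preprocessing tool would likely filter out.

```python
import re
from collections import Counter

# Assumption: any run of letters, digits, and underscores that starts with a
# letter or underscore is an identifier. Keywords, comments, and string
# literals are NOT filtered out in this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Assumption: a "word" is either an acronym (run of capitals not followed by
# a lowercase letter) or an optional capital followed by lowercase letters.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(identifier):
    """Split one identifier into lowercase words."""
    return [word.lower() for word in _WORD.findall(identifier)]


def word_occurrence(source):
    """Count how often each word occurs in the identifiers of one module."""
    counts = Counter()
    for identifier in _IDENTIFIER.findall(source):
        counts.update(split_identifier(identifier))
    return counts


def to_feature_rows(modules):
    """Build one count row per module over a shared, sorted vocabulary."""
    counts = [word_occurrence(source) for source in modules]
    vocabulary = sorted(set().union(*counts)) if counts else []
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows
```

Each row produced by `to_feature_rows` could then be paired with a faulty or non-faulty label and passed to any random forest implementation; that modeling step is omitted here.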

Currently, the author is with Nara Advanced Institute of Science and Technology.

Notes

  1. https://github.com/.

  2. http://promisedata.googlecode.com/.

  3. https://github.com/doofuslarge/lscp. lscp is a lightweight source code preprocessor. It can be used to isolate and manipulate the linguistic data (i.e., identifier names, comments, and string literals) from source code files.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 24500038.

Author information

Corresponding author

Correspondence to Naoki Kawashima.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kawashima, N., Mizuno, O. (2015). Predicting Fault-Prone Modules by Word Occurrence in Identifiers. In: Lee, R. (eds) Software Engineering Research, Management and Applications. Studies in Computational Intelligence, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-11265-7_7

  • DOI: https://doi.org/10.1007/978-3-319-11265-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11264-0

  • Online ISBN: 978-3-319-11265-7

  • eBook Packages: Engineering, Engineering (R0)
