Article

Information theoretic evaluation of change prediction models for large-scale software

Authors:
Mina Askari, University of Waterloo, Waterloo, Canada
Ric Holt, University of Waterloo, Waterloo, Canada

MSR '06: Proceedings of the 2006 international workshop on Mining software repositories, May 2006, Pages 126–132, https://doi.org/10.1145/1137983.1138013

Published: 22 May 2006

ABSTRACT

In this paper, we analyze data extracted from several open source software repositories. We observe that the change data follows a Zipf distribution. Based on the extracted data, we then develop three probabilistic models to predict which files will have changes or bugs. The first model is Maximum Likelihood Estimation (MLE), which simply counts the number of events, i.e., changes or bugs, that happen to each file and normalizes the counts to compute a probability distribution. The second model is Reflexive Exponential Decay (RED), in which we postulate that the predictive rate of modification of a file is incremented by any modification to that file and decays exponentially. The third model is called RED-Co-Change. With each modification to a given file, the RED-Co-Change model not only increments that file's predictive rate, but also increments the rate for other files that are related to the given file through previous co-changes. We then present an information-theoretic approach to evaluate the performance of different prediction models. In this approach, the closeness of a model's distribution to the actual, unknown probability distribution of the system is measured using cross entropy. We evaluate our prediction models empirically using the proposed information-theoretic approach for six large open source systems. Based on this evaluation, we observe that, of our three prediction models, the RED-Co-Change model predicts the distribution that is closest to the actual distribution for all the studied systems.
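
The three models and the cross-entropy evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the decay constant `decay`, the co-change weight `co_weight`, the smoothing term `eps`, the commit representation `(time, [files])`, and the toy history are all assumptions made for illustration. Cross entropy is estimated here as the average negative log2 probability that a model assigns to the files that actually changed afterwards; lower values mean the model's distribution is closer to the observed one.

```python
import math


def _normalize(scores):
    """Turn non-negative scores into a probability distribution."""
    total = sum(scores.values())
    return {f: s / total for f, s in scores.items()}


def mle(commits, files, eps=1e-6):
    """MLE: count events per file and normalize (eps avoids zero probability)."""
    scores = {f: eps for f in files}
    for _, changed in commits:
        for f in changed:
            scores[f] += 1.0
    return _normalize(scores)


def red(commits, files, now, decay=0.1, eps=1e-6):
    """RED: each change adds 1 to the file's rate, decayed exponentially with age."""
    scores = {f: eps for f in files}
    for t, changed in commits:
        weight = math.exp(-decay * (now - t))  # assumed decay form
        for f in changed:
            scores[f] += weight
    return _normalize(scores)


def red_co_change(commits, files, now, decay=0.1, co_weight=0.5, eps=1e-6):
    """RED-Co-Change: a change also raises the rate of earlier co-change partners."""
    scores = {f: eps for f in files}
    partners = {f: set() for f in files}
    for t, changed in sorted(commits, key=lambda c: c[0]):
        weight = math.exp(-decay * (now - t))
        for f in changed:
            scores[f] += weight
            # Boost files that changed together with f in an earlier commit.
            for g in partners[f] - set(changed):
                scores[g] += co_weight * weight  # assumed partner increment
        # Record this commit's co-changes only after scoring it.
        for f in changed:
            partners[f].update(g for g in changed if g != f)
    return _normalize(scores)


def cross_entropy(model, observed):
    """Average negative log2 probability of the observed events, in bits."""
    return -sum(math.log2(model[f]) for f in observed) / len(observed)


# Hypothetical toy data: (time, files changed together), then later changes.
files = ["a.c", "b.c", "c.c"]
history = [(0, ["a.c", "b.c"]), (1, ["c.c"]), (2, ["c.c"]), (9, ["a.c"])]
future = ["a.c", "b.c"]

for name, model in [
    ("MLE", mle(history, files)),
    ("RED", red(history, files, now=10)),
    ("RED-Co-Change", red_co_change(history, files, now=10)),
]:
    print(name, round(cross_entropy(model, future), 3))
```

In this sketch every model is trained on `history` and scored on `future`, so the models are compared on events they have not seen. The ranking on such a tiny example says nothing about the paper's results for the six studied systems.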

Published in
MSR '06: Proceedings of the 2006 international workshop on Mining software repositories
May 2006
191 pages
ISBN: 1595933972
DOI: 10.1145/1137983
General Chairs: Stephan Diehl (University Trier, Germany), Harald Gall (University of Zurich, Switzerland), Ahmed E. Hassan (Research in Motion RIM, Canada)
Copyright © 2006 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
evaluation approach
information theory
prediction models