
1 Introduction

Systematic reuse involves using previously developed software components/artifacts to build new software systems, thereby reducing the overall cost, development time, and effort of software development [1]. For instance, to assess the extent of actual reuse in existing software projects, an investigation was performed on numerous sizeable open-source and individual software projects bundled with popular BSD and Linux distributions [2]. The observations showed that almost half of the 5.3 million files scrutinized in the projects had been employed in at least two projects. However, no software metrics were employed by the author to conduct this analysis.

Software metrics that quantify the reusability of software projects are vital for accomplishing "development by reuse" and "development for reuse" [3]. Such reuse metrics can also contribute to reusability prediction models that let software developers estimate, in advance and without examining the complete codebase, the total code that can be reused, i.e., integrated without change, in a new version of an existing software system, and hence the aggregate expenditure of developing or updating it. Since developing new code requires substantial time and effort, reusability assessment of source code components can minimize this development effort and time and provide a means of estimating the development cost of the new software.

Machine learning (ML) algorithms are being effectively utilized to build potent prediction models in varied areas such as engineering, medicine, and geology [4]. Classification, in ML, is the task of assigning a new observation to one of a set of categories or classes (subpopulations) on the basis of a training set of instances (observations) whose class membership is known [4]. An algorithm implementing classification is known as a classifier. Meta-classifiers build a collection of classifiers and then classify new data records by combining the outputs of these classifiers through some mechanism. Experimental results have indicated that meta-classifiers are often more accurate and more robust, even in the presence of noisy data, and achieve a lower average error rate than their constituent individual classifiers [5].

Therefore, in this study, we conduct an empirical validation using reusability datasets constructed (with respect to seven randomly selected reuse metrics) from four consecutively released versions of a Java software system to determine the competency of seven meta-classification algorithms for developing version-to-version source code reusability prediction models. In addition, we examine and compare the results of these meta-classifiers with the logistic regression (LR) [6] technique using performance indicators such as accuracy and the area under the ROC curve (AUC). Lastly, we statistically rank all the techniques used in this study with the Friedman test to determine which algorithm performs best.

Though studies [7, 8] in the existing literature assess the ability of classification techniques for change-proneness and fault prediction, no study to date has statistically compared the performance of meta-classifiers with the LR technique for predicting version-to-version source code reusability.

The rest of this paper is organized as follows. The next section delivers a concise summary of the existing literature on the topic. Section 3 describes the empirical data collected and the independent and dependent variables selected as part of the research background. Section 4 reports the meta-classification techniques employed and the performance measures selected to evaluate them. Section 5 presents the empirical results of applying the seven meta-classifiers and the LR technique, along with the Friedman test results. Section 6 discusses the threats to the validity of our work, and the last section states the conclusions and future work.

2 Related Work

This section provides a brief summary of the literature on software reusability prediction; for detailed reading, one can refer to [9], the most recent and the sole systematic literature review of reusability metrics and prediction of software components conducted according to established systematic-review guidelines. Recently, self-organizing maps (SOM) were employed to cluster datasets of CK metric values gathered from three Java-based projects [10]. It was also recently established that the reusability of a source code class is inversely related to its depth of inheritance and number of children [11]. Moreover, authors [12] have considered different reuse variants (common reuse, high-reuse variation, low-reuse variation, and single use) to empirically and correctly estimate fault proneness across products and across releases of software product lines. Though many metrics have been proposed for measuring the reusability of a software component or of a system as a whole [3, 9, 13,14,15], in the majority of cases these have been qualitative reuse metrics whose evaluation depends on individual judgment. Also, only a few machine learning techniques (K-means and hierarchical clustering, support vector machines, artificial neural networks, and decision trees) have been explored for reusability prediction, and these do not include any ensemble/meta-classifiers or the LR technique. Moreover, these articles do not compare actual metric values against concrete reuse results from a realistic software development environment to validate the reusability predictions. Finally, an empirical evaluation via reuse metrics of software reuse occurring within the same product family and from version to version has not been reported in the literature examined.

To address this gap, we assess a wide range of meta-classification techniques for reusability prediction using four datasets created from four consecutively released versions of a realistic Java-based software system, combining the actual reuse results with some randomly selected reuse metrics. We also use the Friedman test to assign statistical ranks to the techniques and determine whether the selected meta-classifiers significantly outperform the LR technique, thereby providing empirical evidence for selecting the best version-to-version reusability prediction model.

3 Research Background

The following subsections provide an overview of the empirical data collected and the metrics (the independent variables and the dependent variable) used to build the reusability prediction models.

3.1 Empirical Data Collection

To construct the version-to-version file reusability prediction models, we created four datasets using four consecutive versions of the JFreeChart software, a free (LGPL) chart library for the Java platform. The details of the four selected versions are given in Table 1. Numerical values of seven different, commonly used static code metrics were collected for each of the Java files in each of the four selected versions of JFreeChart using two static code analysis tools, Stan4J and JHawk 6.1.3. These Java files are composed of one or more Java classes (or interfaces); where a file contains more than one class, an aggregate value of the metrics over all its classes is collected. A brief description of the seven selected reuse metrics (Coupling between objects, Efferent Coupling, Depth of inheritance, Lack of cohesion between methods, Number of Calls, Number of methods defined in a file, and Cyclomatic Complexity) is given below:

Table 1 Details of the four JFreeChart versions employed in this study

Coupling between objects [16] is defined as the total number of files coupled to a given file; two source code files are coupled if the methods declared in one use the instance variables or methods of the other. Efferent Coupling [17], in contrast, counts only the number of external files used by a given file. Depth of inheritance [16] is the maximum length of the path from a given source code file to the root file in the inheritance structure of the software. Lack of cohesion between methods [16] counts the distinct methods in a given code file that reference a given instance variable. Number of Calls is the number of method calls (in statements as well as in logical expressions) in the target file. Number of methods defined in a file [18] is the total number of methods contained in a given Java code file. Lastly, Cyclomatic Complexity [19] measures the number of linearly independent paths through the program source code.
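As a rough illustration of how the last of these metrics can be computed, the sketch below approximates Cyclomatic Complexity by counting decision points in a hypothetical Java snippet; real analyzers such as Stan4J and JHawk parse the code properly rather than matching keywords, so this is only a simplification.

```python
import re

def cyclomatic_complexity(java_source: str) -> int:
    """Approximate McCabe's cyclomatic complexity as the number of
    decision points plus one. A real analyzer would walk the AST;
    this keyword count is only a rough sketch."""
    decision_keywords = r"\b(if|for|while|case|catch)\b"
    # Each decision point adds one independent path through the code.
    decisions = len(re.findall(decision_keywords, java_source))
    # Boolean short-circuit operators also branch control flow.
    decisions += java_source.count("&&") + java_source.count("||")
    return decisions + 1

snippet = """
public int sign(int x) {
    if (x > 0) { return 1; }
    else if (x < 0) { return -1; }
    return 0;
}
"""
print(cyclomatic_complexity(snippet))  # two if-branches -> complexity 3
```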

We used a clone detection tool called AntiCut&Paste [20] to estimate the reusability of the individual source code files of each version. The source code files of two successive versions of JFreeChart were supplied to the tool as input, and it returned the Java files found to be common to the two releases. After this step, a binary reuse label of "Yes" or "No" was assigned to every source code file, with "Yes" indicating that the file had been reused in the next version in its entirety and without modification, and "No" indicating that the file had not been used in the next version or had been used with modifications.
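The labeling step can be sketched as follows; the two tiny version maps are hypothetical stand-ins for the file sets of two releases, and exact equality of file contents stands in for the clone detection tool's report.

```python
def reuse_labels(old_version: dict, new_version: dict) -> dict:
    """Assign the binary reuse label to each file of the old version.
    old_version / new_version map file names to file contents.
    'Yes' means the file appears unchanged in the next version;
    'No' means it is absent or was modified."""
    labels = {}
    for name, contents in old_version.items():
        unchanged = new_version.get(name) == contents
        labels[name] = "Yes" if unchanged else "No"
    return labels

# Hypothetical file contents for two successive versions.
v1 = {"Axis.java": "class Axis {}", "Plot.java": "class Plot {}"}
v2 = {"Axis.java": "class Axis {}", "Plot.java": "class Plot { int n; }"}
print(reuse_labels(v1, v2))  # {'Axis.java': 'Yes', 'Plot.java': 'No'}
```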

3.2 Dependent and Independent Variables

In our study, the binary reusability variable is the dependent variable, which is to be estimated via the independent variables. The independent variables are those from which the reuse status of a Java file in the next release of the software is predicted; in our context, they are the seven software metrics discussed in Sect. 3.1.

4 Research Methodology

Having described the construction of the four datasets in Sect. 3, this section discusses the meta-classification techniques incorporated in this study for predicting source code file reusability, along with the measures used to evaluate their performance.

4.1 Meta-Classification Techniques Employed

This section provides a brief description of the seven meta-classifiers [21] (AdaBoost, Bagging, Filtered, Multi-class (M-Class), Random Sub Space (RSS), Stacking and Voting) incorporated in this study.

AdaBoost (Adaptive Boosting) trains a sequence of simple weighted classifiers, reweighting the training data so that each subsequent classifier concentrates on the instances its predecessors misclassified; the weighted combination of these classifiers is likely to achieve a lower misclassification rate than any individual classifier. The Bagging (Bootstrap Aggregation) classifier supplies random subsets of the original dataset to each base classifier and then combines their separate predictions (by averaging or voting) to arrive at the final prediction. The Filtered classifier runs an arbitrary classifier on data that has first been passed through an arbitrary filter, which applies some mathematical evaluation based on an intrinsic characteristic of the training set, such as correlation. The Multi-class (M-Class) classifier converts a given multi-class problem into several binary-class problems: a metric is evaluated for each class by treating it as a binary classification problem against the union of all remaining classes, and the per-class results are then averaged, either weighted by class frequency or as a macro average that treats each class equally. Unlike in binary classification, no threshold score needs to be selected to generate predictions; the class obtaining the highest predicted score is the answer. The Random Sub Space (RSS) classifier is similar to bagging, except that it draws random subsets of the features, whereas bagging draws random samples of the instances with replacement. Stacking resembles boosting, but instead of an empirical weighting formula, the base learners' predictions are given as input to a meta-level classifier whose output is the final class. In the Voting methodology, the base-level classifiers' predictions are combined according to a static voting scheme, usually plurality voting.
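To make the sampling and combination schemes concrete, here is a minimal sketch (not the WEKA implementations) of three of the mechanisms named above: bootstrap sampling of instances (Bagging), random feature subsets (RSS), and plurality voting. All data and feature names in the usage lines are illustrative.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    # Bagging: draw as many instances as the original dataset,
    # sampling with replacement, for each base classifier.
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(feature_names, k, rng):
    # Random Sub Space: draw a random subset of the features instead
    # of a subset of the instances.
    return rng.sample(feature_names, k)

def plurality_vote(predictions):
    # Voting: the class predicted by most base learners wins.
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
rows = [("file%d.java" % i, "Yes" if i % 2 else "No") for i in range(6)]
print(len(bootstrap_sample(rows, rng)))                    # same size as rows
print(random_feature_subset(["CBO", "DIT", "LCOM", "NOC"], 2, rng))
print(plurality_vote(["Yes", "No", "Yes"]))                # 'Yes'
```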

4.2 Performance Evaluation Measures

Two performance evaluation measures, accuracy and the area under the ROC curve (AUC), are chosen to assess the predictive performance of the selected algorithms for version-to-version source code file reusability. The accuracy of a model is defined as the ratio of the number of correctly classified Java files to the total number of Java files in the version. The AUC is obtained from the ROC curve, which also indicates the optimal cutoff point, at which both sensitivity and specificity are maximized. AUC values in [0.7, 0.8) indicate acceptable discrimination, values in [0.8, 0.9) indicate excellent discrimination, and values of 0.9 or more indicate outstanding discrimination between the reused and non-reused files by the prediction algorithm.
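Both measures can be sketched directly. The AUC function below uses the rank-based (Mann–Whitney) formulation, which is equivalent to the area under the ROC curve; the example labels and scores are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of files whose reuse label is predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen reused
    file (label 1) receives a higher score than a randomly chosen
    non-reused file (label 0); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(accuracy(y, [1, 0, 0, 0]))  # 0.75
print(auc(y, scores))             # 0.75
```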

5 Empirical Analysis

Results of the models constructed with the meta-classifiers to predict version-to-version file reusability on the four selected versions of JFreeChart are described in this section. The models were built using the WEKA tool, an open-source workbench freely available at http://www.cs.waikato.ac.nz/ml/weka/. The Naïve Bayes [5] classifier was used as the base classifier/learner, and the default settings of the tool were used to construct the seven meta-classification models.

5.1 Model Evaluation Results

The columns in Table 2 show the version-wise accuracy (Acc.) in % and area under the curve (AUC) scored by each of the seven meta-classification algorithms on each of the four versions of JFreeChart. To obtain a more accurate assessment of the predictive ability of the selected classification models, a k-fold cross-validation [22] was conducted for all models generated in this research: the dataset is randomly partitioned into roughly k equal subsets, and in each iteration one of the k subsets is used as the test set while the remaining k−1 subsets form the training set; the process is repeated for all k subsets. The meta-classification results reported in Table 2 were validated with k = 10.
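The cross-validation procedure described above can be sketched as follows (a simplified illustration of the splitting scheme, not WEKA's implementation):

```python
import random

def k_fold_splits(n_items, k=10, seed=0):
    """Randomly partition item indices into k roughly equal folds;
    each fold serves once as the test set while the rest train."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With 20 files and k = 5, each iteration trains on 16 and tests on 4.
for train, test in k_fold_splits(20, k=5):
    print(len(train), len(test))
```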

Table 2 Results of the meta-classification analysis

From the performance values in Table 2, six of the seven models (all except the Stacking meta-classifier) exhibit good results, with high scores on both performance metrics: accuracy values ranging from 75.8% to 90% and AUC values ranging from 0.64 to 0.88. The AUC results in particular show that these six meta-classification algorithms display acceptable, and in some cases excellent, discrimination between the reused and non-reused files in the four JFreeChart versions, demonstrating their effectiveness for developing sound version-to-version source code reusability prediction models. The Stacking technique achieves the lowest AUC values (< 0.50); even though it achieves high accuracy values (owing solely to the correct classification of the reusable classes), it does not qualify as an effective predictor of version-to-version reusability.

We also applied the LR technique on the four selected datasets, the results of which are stated in Table 3.

Table 3 Binary logistic regression results

The results indicate that the LR technique performs comparably to six of the seven meta-classifiers, especially the Vote meta-classifier, with accuracy values ranging from 73.3% to 83.9% and AUC values ranging from 0.79 to 0.87.

Though there is a difference, however minor, in the prediction performance of the models developed via the selected techniques, we needed to ascertain whether this difference is statistically significant, for which we conducted the Friedman test [23].

Table 4 reports the mean rank scored by each technique under the Friedman test, where the model with the lowest mean rank performs the worst. The test was based on the AUC results obtained by each model on the four selected JFreeChart datasets. According to Table 4, the best-performing technique across the four datasets is the Vote meta-classifier, with a mean rank of 7. It is followed by the LR algorithm with a mean rank of 6, closely followed by the M-Class meta-classifier with a mean rank of 5.25. The Stacking meta-classifier performs the worst, with the lowest mean rank of 1.
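For reference, the Friedman statistic underlying Table 4 can be computed from a datasets-by-techniques AUC table as sketched below. Ties are ignored for simplicity, and the small table in the usage example is hypothetical, not the paper's data.

```python
def friedman_statistic(auc_table):
    """auc_table: rows = datasets, columns = techniques. Ranks each
    row (higher AUC -> higher rank, so the best technique gets the
    highest mean rank, as in Table 4) and returns (mean_ranks,
    chi_square); the statistic has k - 1 degrees of freedom."""
    n, k = len(auc_table), len(auc_table[0])
    rank_sums = [0.0] * k
    for row in auc_table:
        order = sorted(range(k), key=lambda j: row[j])
        for rank_minus_1, j in enumerate(order):
            rank_sums[j] += rank_minus_1 + 1  # rank 1 = worst in the row
    mean_ranks = [s / n for s in rank_sums]
    chi_sq = (12 * n / (k * (k + 1))) * sum(r * r for r in mean_ranks) \
             - 3 * n * (k + 1)
    return mean_ranks, chi_sq

# Hypothetical AUCs: 2 datasets x 3 techniques.
ranks, chi = friedman_statistic([[0.9, 0.5, 0.7],
                                 [0.8, 0.4, 0.6]])
print(ranks, chi)  # [3.0, 1.0, 2.0] 4.0
```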

Table 4 Friedman statistical test results

The Friedman statistic with seven degrees of freedom was 15.421, significant at α = 0.05; the associated p-value of 0.031 shows that the results hold at the 95% confidence level. Therefore, the null hypothesis of the Friedman test, which states that all techniques perform the same, is rejected: the eight techniques (seven meta-classifiers and the LR technique) differ significantly in their performance. The results, however, indicate that only one of the seven meta-classifiers, Vote, performs better than the LR statistical technique, establishing that LR is also effective for predicting version-to-version source code file reusability.

6 Threats to Validity

Like any empirical study, this study faces potential threats to validity. Our work focuses only on estimating the prediction performance of the selected meta-classifiers using statistical and machine learning methods, with the metrics as the independent variables and the Yes/No reuse parameter as the dependent variable. Thus, a threat to internal validity exists, since this work does not attempt to establish cause-effect relationships. The most critical threat to external validity is that our results may not generalize beyond the surveyed systems, i.e., the four selected JFreeChart versions, to similar samples or new research environments. To ascertain the generalizability of the classification inferences made here, the predictive performance of the selected techniques would need to be evaluated on similar datasets built from software written in other programming languages; this threat therefore remains. Construct validity concerns whether the independent and dependent variables properly represent the intended concepts. The metric data was collected via mature source code analysis and clone detection tools; although we make no claims about the accuracy of these tools, we assume they collect the data reliably, as they are employed effectively in practice [24, 25], thereby decreasing the threat to construct validity.

7 Conclusion and Future Work

This research work evaluates seven meta-classification techniques for predicting version-to-version source code file reusability. The empirical validation was performed on four datasets created from four consecutively released versions of JFreeChart. To the best of the authors' knowledge, no study to date has used meta-classifiers for reusability prediction. We further compared the performance of the selected meta-classifiers with the statistical LR technique and ranked the algorithms using the Friedman statistical test.

The chief conclusions from this analysis are as follows:

All the selected meta-classifiers except the Stacking technique showed reasonably good performance (accuracy values ranging from 75.8% to 90% and AUC values from 0.64 to 0.88) over the four selected versions of the JFreeChart software, and did not show extremely divergent outcomes across the four versions. The LR technique also showed performance comparable to the meta-classifiers (accuracy values ranging from 73.3% to 83.9% and AUC values from 0.79 to 0.87) for predicting version-to-version source code reusability over the four selected datasets.

Moreover, the Friedman test showed, at the 95% confidence level, that only one meta-classifier, Vote, significantly outperforms the LR technique; the LR technique ranks second, and the remaining meta-classifiers rank below it. This establishes that the LR technique and six of the seven selected meta-classifiers (AdaBoost, Bagging, Filtered, Multi-class (M-Class), Random Sub Space (RSS), and Voting) are indeed effective for developing sound version-to-version source code reusability prediction models.

Future work may involve replicating the study with the selected meta-classifiers on other similar software datasets to yield more general results. Other prediction models, such as deep learning, could also be applied to establish their suitability for developing software reusability models.