Survey Based Analysis of Effect of Code Clones on Software Quality

Code clones are similar or identical code fragments. Cloning, the duplication of code segments through copy-paste, is a common activity in software development. The presence of code clones is believed to be one of the factors that strongly affect software quality attributes, and on this basis many techniques have been proposed in the literature to detect and eliminate them. Various research efforts aim to reduce the serious problems caused by code clones. This paper studies the effect of code clones on software quality. It presents an industrial study of the impact of code clones on a software system from the software developer's point of view. The study uses a questionnaire survey to collect data about the reasons behind cloning and the impact of code clones on a software system. The results show that clones have a harmful effect on the system and suggest that maintenance is the most affected software quality attribute.
Keywords: Code Clones, Abstract Syntax Tree (AST), Program Dependence Graph (PDG).

I. INTRODUCTION
Code clones are similar or identical code portions in software programs. Generally, code clones are introduced by copy-paste programming during software development [1]. Such practices are common and easy, and they reduce programming effort and time because existing code is reused rather than rewritten from scratch. In the long run, however, they can cause serious problems for a software system. Code clones are said to have a negative impact on software development and quality. For example, clones may increase the occurrence of bugs: if an expression in duplicated code is changed to fix a bug or add a feature, its counterparts must be changed at the same time, and if they are not, new bugs are introduced. It is agreed that clones exist in software systems and that they must be detected in order to maintain, manage or remove them. Clones are also believed to have a negative impact on software quality attributes, one of which is maintenance. Software maintenance is the most expensive phase of the entire software development process, and software development companies are reported to spend a large amount of money on maintaining existing systems [2].
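The bug-propagation scenario described above can be made concrete with a small sketch. The function names `average_rating` and `average_latency` are hypothetical and do not come from any surveyed system; the second function is assumed to be a copy-pasted clone of the first, written before an empty-input fix was applied to the original.

```python
def average_rating(ratings):
    """Original fragment, later fixed to handle an empty list."""
    if not ratings:  # bug fix added after the code had been cloned
        return 0.0
    return sum(ratings) / len(ratings)


def average_latency(samples):
    """Copy-pasted clone (hypothetical): the fix above was never propagated."""
    return sum(samples) / len(samples)  # still fails on an empty list


if __name__ == "__main__":
    print(average_rating([]))  # the fixed original handles the edge case
    try:
        average_latency([])
    except ZeroDivisionError:
        print("clone still contains the original bug")
```

The two functions behave identically on non-empty input, so the inconsistency only shows up when the edge case occurs, which is why such clones are easy to overlook during maintenance.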
Researchers have made considerable efforts to resolve the problems caused by code clones. Many techniques have been proposed to detect, manage and remove code clones [3]. Clone detection techniques based on the Abstract Syntax Tree (AST) [4], the Program Dependence Graph (PDG) [5], code metrics [6] and program tokens [7] [8] provide automated assistance in identifying code clones in source code. Visualization and query-based techniques [9] [10] have been proposed to manage and inspect detected code clones in large software systems. It has also been proposed that code clones can be removed through refactoring [11]. Many clone detection tools use these techniques for automatic detection of clones. All of these tools have advantages and disadvantages, and more hybrid approaches are needed to overcome their limitations [12].
In spite of the great success of these clone detection and refactoring techniques, little work has been done on understanding why and in which situations developers introduce clones into a software system. Clones are believed to have a large impact on software quality attributes, but whether the clones in a software system are harmful or not is still an open question. This study attempts to answer these questions through a questionnaire survey presented to people working in the software industry. The paper reports the experience of industry professionals with clones: how they define a clone and how they see the impact of clones on software quality attributes. Some studies [1] [11] [13] [14] examine the intentions and reasons behind code cloning practices. However, these studies are based on the personal experience of researchers, with no support from industrial studies. They focus on the introduction of clones into the system, with little attention to the impact of clones on software quality attributes.
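To give a flavour of how a token-based approach can recognise copy-pasted code even after identifiers have been renamed, the following is a minimal sketch that uses Python's standard `tokenize` module. It is an illustrative assumption, not the algorithm of the tools cited in [7] [8]; the helper names `normalized_tokens` and `looks_like_clone` are hypothetical.

```python
import io
import keyword
import tokenize

# Token types that carry no lexical content relevant to the comparison.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source):
    """Tokenize Python source, replacing identifiers and literals by placeholders."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # any identifier
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")   # any numeric or string literal
        else:
            tokens.append(tok.string)  # keywords and operators are kept as-is
    return tokens


def looks_like_clone(fragment_a, fragment_b):
    """Hypothetical check: fragments match if their normalized tokens are equal."""
    return normalized_tokens(fragment_a) == normalized_tokens(fragment_b)


original = "total = price * quantity + 5\n"
renamed_copy = "amount = cost * count + 10\n"
different = "total = price - quantity\n"

print(looks_like_clone(original, renamed_copy))
print(looks_like_clone(original, different))
```

Exact equality of whole fragments is the simplest possible comparison; practical detectors typically search for long common subsequences of such normalized tokens across a code base.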
Many research questions remain unanswered. This work focuses on the following questions related to code clones:
1. What is a code clone?
2. Why and how often do developers perform cloning activities?
3. Are clones harmful or not?
4. Which software quality attribute is most affected by clones?
5. Effect of clones on performance, complexity, scalability and maintenance of a software system.
To answer these questions, a questionnaire was given to industry professionals, and the results are derived from their answers. In this way, data is collected from an industrial perspective to answer the above questions regarding code cloning practices in software development.

II. CATEGORICAL VARIABLES FREQUENCY TABLE
To understand code cloning practice in the software industry, a survey was conducted through a questionnaire sent to software professionals working in industry, in order to capture their perception of code cloning. The survey consisted of 34 predefined multiple-choice questions designed to capture the participants' views on the definition of a clone, why they copy-paste code, the impact of clones on software quality, and which software quality attribute is most affected by code clones. Table 1 shows a summary of the survey questions. The responses were analyzed using SPSS-19. In total, 40 engineers participated in the survey. The majority of the participants have 6-10 years of experience. 70% of them were developers, 20% were test engineers, 7.5% were maintenance engineers and 2.5% were design engineers. The participants had varying experience in the software industry with the C, C++, C# and Java programming languages. The response frequency and percentage for each categorical variable are shown in Table 2.

Table 1: Summary of survey questions
1. How do you define a "code clone"?
2. How often do you perform copy-paste or cloning activities?
3. What according to you are the reasons for cloning?
4. How do you agree that clones have a negative impact on software quality?
5. Which of the following software attributes is most affected by code clones?
6. How do you agree that clones reduce the performance of the software system?
7. How do you agree that clones make complex the maintenance of a software system?
8. How often do clones have negative impact on performance of software?
9. What percentage of negative impact do clones have on the performance of software system?
10. How do you agree that clones increase the size of a software program?
11. How do you agree difficult to change code is reason for code cloning?
12. How do you agree risk avoidance is reason for code cloning?
13. How do you agree that clones increase the size of software program?
14. How do you agree that clones affect maintenance effort and cost?
15. What percentage of negative impact do clones have on the maintenance of software system?

III. SURVEY BASED ANALYSIS OF CODE CLONES
Code cloning has been an active research area since the 1980s. A large amount of research on code clones has been carried out, mainly focusing on automatic techniques to identify, detect, manage and eliminate them. In spite of the promising results of these clone detection techniques, code clones remain a big challenge to the quality of a software system. Most of the available studies have examined open-source systems and focused on code analysis, without considering the reasons and intentions behind the introduction of code clones. Furthermore, these studies do not analyze information from the developer and industrial perspectives. This study therefore conducts a survey-based analysis to understand the views of software developers regarding the effect of code clones on software quality. The work is based on the five questions discussed in Section I, and the responses of the industry professionals are analyzed against these questions.

A. What is a Code Clone?
The existing literature indicates that the definition of clones is still vague. This study gathers the opinions of software developers about the definition. 40% of participants define a clone as duplicate code, 10% as a copy of original code, 37.5% as copy-pasted code and 12.5% as similar code. Table 3 shows the frequencies and percentages for the definition of code clones. The majority of developers thus describe a clone as duplicate code or copy-pasted code. Developers hold different opinions on the definition of code clones, so the definition remains ambiguous. Fig. 1 shows the responses for the definition of code clones.
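The reported percentages can be related back to respondent counts with a short calculation. The sketch below assumes that all 40 survey participants answered this question, which the text does not state explicitly; the function name `percentages_to_counts` is hypothetical.

```python
# Percentages reported above for the definition of a code clone.
reported_percentages = {
    "duplicate code": 40.0,
    "copy of original code": 10.0,
    "copy-pasted code": 37.5,
    "similar code": 12.5,
}

# Assumption: every one of the 40 participants answered this question.
TOTAL_PARTICIPANTS = 40


def percentages_to_counts(percentages, total):
    """Convert percentage shares into whole-number respondent counts."""
    return {
        label: round(share * total / 100)
        for label, share in percentages.items()
    }


counts = percentages_to_counts(reported_percentages, TOTAL_PARTICIPANTS)
for label, count in counts.items():
    print(f"{label}: {count} of {TOTAL_PARTICIPANTS}")
```

Under that assumption the four shares correspond to whole numbers of respondents that add up to 40, which is a quick consistency check on a frequency table of this kind.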

B. Why and how often do developers perform cloning activities?
Another objective of the survey is to understand the reasons for cloning practices in the software industry. The responses suggest that there are several reasons. Risk avoidance is the most common motivation for introducing clones: developers avoid making changes to the existing system because they are afraid of breaking it. Another reason is time limitation: the pressure to deliver a project on time forces a developer to reuse existing code, which ultimately leads to clones. The other three reasons are difficulty of changing existing code, unawareness of the harmfulness of clones, and skill limitation. Developers in industry find it difficult to change existing code, so they keep reusing it, and they copy-paste code in the initial phase of project development because they are unaware of the harmful consequences. Developers with less knowledge and limited skills tend to look for available code that solves their problem, which also introduces code clones into a system. Table 4 summarizes the response percentages for the reasons for code cloning. 47.5% of developers cite risk avoidance as a reason for cloning, 27.5% strongly agree and 60% agree that time limitation is a reason, and 55% state unawareness of the harmfulness of clones as a reason. The satisfaction level of the survey participants for the reasons is shown in Fig. 2. Of the five reasons, risk avoidance and skill limitation are the two most important.
This study also examines how often software developers copy-paste. Table 5 shows the response frequencies for how often developers perform cloning activities. The responses suggest that cloning is a frequent activity from the developer's point of view. 17.5% of developers state that they often copy-paste code, 60% state that they sometimes do so, 20% say that they rarely do so, and none of the participants said that they never copy-paste code. Fig. 3 depicts the percentages of how often developers perform cloning.

C. Are Clones Harmful or Not?
To answer the question of whether code clones are harmful or not, the questionnaire included questions concerning the impact of clones on software. Participants rated their level of agreement with each statement on a scale from strongly agree to strongly disagree. Table 6 shows the responses on whether clones are harmful or not. 27.5% of software developers strongly agree that clones have a negative impact on software quality, 67.5% agree, 5% cannot say, and none disagree. The responses thus indicate that clones are harmful for a system. To find out how clones are harmful to a software system, the questionnaire contained related questions on the most affected software quality attribute, the impact of clones on the performance of a software system, whether scalability is affected by clones, whether clones increase the complexity of a software system, and the impact of code clones on the maintenance of the software system.
This study also examines how often clones have a negative effect on a software system; Table 6 also shows these responses. 50% of the responses suggest that clones sometimes negatively affect a software system, 32.5% of participants report a frequent negative impact, 10% say seldom, and 7.5% state that it depends on the number of clones present in a system. Clones therefore often have a negative impact on a software system. A common view among the developers is that clones are harmful for the software system, although the extent depends on the type of system, the environment and the situation. Fig. 4 shows whether clones are harmful or not and how often.

D. Which software quality attribute is most affected by clones?
Although researchers have found that clones have a negative impact on maintenance, other software attributes are also affected by clones. To find out which software quality attribute is most affected from the developer's point of view, a few related questions were included in the questionnaire. Based on previous studies of code clones, four software quality attributes that are most significantly related to cloning activity were chosen: performance, scalability, complexity and maintenance. Table 7 shows the frequencies for the most affected software attribute. 52.5% of the professionals working in industry state that maintenance is the attribute most affected by code clones. 17.5% claim that performance is most affected, 5% state complexity, and 25% claim that scalability is most affected. Fig. 5 shows the percentages for the most affected software attribute. Of the four software quality attributes, maintenance and scalability are the two most significantly affected by code clones. It is therefore suggested that necessary measures be taken at the inception of a project to prevent the serious problems caused by code clones.

E. Effect of clones on performance, complexity, scalability and maintenance of a software system
This section presents the impact of code clones on the software quality attributes performance, complexity, scalability and maintenance, together with the impact ratio of clones on these attributes. Table 8 shows the response frequencies and percentages for the effect of clones on performance, complexity, scalability and maintenance. Although maintenance is the attribute most affected by code clones, the other attributes are affected as well. 72.5% of participants agree that clones reduce the performance of the software system and 12.5% strongly agree, while only 5% disagree. In response to the statement "clones increase complexity", 50% of participants disagree and only 32% agree. Clones are thus perceived to have some negative impact on complexity, though less than on the other attributes. Clones are also found to have a great impact on the scalability of a system: in response to the statement "clones increase size" of a software program, 55% of participants strongly agree, 40% agree and just 5% disagree. The responses to the statement "clones have negative impact on maintenance" suggest that clones have a hugely harmful effect on maintenance, as 37.5% strongly agree and 62.5% agree. Fig. 6 shows the effect of clones on performance, complexity, scalability and maintenance. Table 8 shows that the majority of participants agree with most of the statements, which implies that clones have a negative impact on the quality attributes.
Table 9 shows how often clones have a negative impact on software attributes. The responses suggest that complexity is the only attribute that is seldom affected by code clones, as stated by 52.5% of participants. For performance, 37.5% of participants state often and 50% state sometimes; only 2.5% state that performance is never affected by clones. For scalability, 50% state often and 47.5% state sometimes, and none of the participants says never. 65% of participants claim that maintenance is often affected and 35% claim that it is sometimes affected. The response percentages for how often clones affect quality attributes are shown in Fig. 7.
Fig. 7: How often clones affect quality attributes.
... scalability and maintenance attributes. 5% of the participants claim that clones have a 61-80% impact on scalability, and 2.5% state the same percentage of impact on the maintenance attribute of a software system. The survey responses suggest that complexity is the attribute least affected by code clones. Fig. 8 shows the percentage to which software attributes are affected by clones.
Since the results of this study are questionnaire-based responses of professionals, this may be a threat to the validity of the results. Some participants might have hidden their real thoughts for personal or organizational reasons, and participants may have misunderstood certain questions. The analysis may also be biased because of incomplete knowledge about the background of the participants.

IV. CONCLUSION
This study presented a survey-based analysis of the effect of code clones on software quality. The survey was conducted in the software industry using a questionnaire to answer some significant research questions. The questionnaire was developed based on previous studies and the existing literature on code clones. The survey responses were analyzed using the SPSS-19 software package. A good amount of empirical data was collected, and the results were presented based on the analysis of this data. The definition of clones is found to be ambiguous. Risk avoidance was found to be the most significant reason for code cloning. This work suggests that clones have a harmful effect on software quality attributes.