CodeRAnts: A recommendation method based on collaborative searching and ant colonies, applied to the reuse of open source code

This paper presents CodeRAnts, a new recommendation method based on a collaborative searching technique and inspired by the ant colony metaphor. This method aims to fill a gap in the current state of the art regarding recommender systems for software reuse, in which prior works present two problems. The first is that recommender systems based on these works cannot learn from the collaboration of programmers. The second is that assessments carried out on these systems report low precision and recall measures, and in some of these systems these metrics have not been evaluated at all. The work presented in this paper contributes a recommendation method that addresses both problems.


Introduction
This paper presents the concepts and design taken into account in CodeRAnts, a new recommendation method proposed to assist software engineers and computer programmers in the reuse of source code by allowing them to retrieve useful snippets of code (potentially written in any programming language) for the implementation of new software products. CodeRAnts is based on two approaches. The first is collaborative searching, which takes advantage of the similarity and repetition of queries that have been used by programmers, who are the stakeholders in the search for snippets of code. The second is the ant colony metaphor. We adopt this approach to tackle two issues. First, it aims to mitigate the cold start problem; for example, a system that implements CodeRAnts can suggest snippets of code even as it receives new queries. Second, when a recommender system starts operating, the structure used to store the queries holds little information; therefore, it is necessary to address the data sparsity of the query-ranking matrix, which is used in collaborative searching.
The preliminary evaluation carried out in this work shows better values for the metrics of precision and recall than those achieved in the state of the art. These metrics are the most commonly used to evaluate recommender systems (Basu et al., 1998; Billsus and Pazzani, 1998; Sarwar et al., 2000a,b; Picault et al., 2010; Bedi and Sharma, 2012). In mathematical terms, precision (see Expression 1) is the number of retrieved and relevant items divided by the total number of retrieved items. Recall (see Expression 2), on the other hand, is the number of retrieved relevant items divided by the total number of relevant items.
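The two definitions above can be restated directly on sets of item identifiers. The following helper functions are our illustrative restatement of Expressions 1 and 2, not code from the paper:

```python
def precision(retrieved, relevant):
    """Expression 1: retrieved-and-relevant items over all retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Expression 2: retrieved-and-relevant items over all relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0
```

For instance, retrieving four snippets of which two are among three relevant ones yields a precision of 0.5 and a recall of about 0.67.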

Motivation
According to Ricci et al. (2010), a recommender system is a set of software tools and techniques which provide suggestions of worthy items for users. These suggestions are related to several decision-making processes that are difficult when users have a large number of optional items to choose from. In the e-commerce context, these processes are related to the buying of items such as books. Amazon's recommender system, for example, assists its users in finding books that meet their needs.
The open source software engineering context is similar to that of e-commerce. Today, there is a substantial amount of open source code available on the World Wide Web, stored inside repositories and available through search engines (e.g., Koders, Krugle, SourceForge, Google Code, GitHub, and CodePlex). This source code belongs to world-class software products (e.g., the Linux operating system kernel, the JBoss application server, GNU Emacs, etc.). This plethora of source code is available to be reused. In fact, software reuse is acknowledged as an important activity because it allows programmers to use preexisting core assets or artifacts rather than creating them from scratch. Indeed, Raymond (1999, p. 4) highlights the importance of software reuse: "Good programmers know what to write. Great ones know what to rewrite (and reuse)".
Moreover, computer scientists such as Mcilroy (1968), Standish (1984), Brooks (1987), Poulin et al. (1993), Boehm (1999), and Pohl and Böckle (2005) have highlighted the following advantages of reusing software: i) reducing time and costs, ii) improving the quality of software, iii) reducing the number of defects, and iv) increasing the chance of detecting failures and fixing them.
Search engines allow programmers to find useful source code; however, some problems still remain: i) the probability that two people choose the same word to describe a concept is less than 20% (Furnas et al., 1987; Harman, 1995); ii) the users who consider search engines useful for finding code are those who already know how to employ the search (Bajracharya and Lopes, 2010); iii) in a study by Coyle and Smyth (2007) covering more than 20,000 queries, the results showed that, on average, Google delivered at least one useful result only 48% of the time; iv) in the domain of collaborative-based recommender systems, research indicates that the design problems of search engines are twofold: solitary nature and one-size-fits-all (Resnick and Varian, 1997; Balabanovic and Shoham, 1997; Schafer et al., 1999; Jameson and Smyth, 2007; Smyth, 2007; Morris, 2008).
On the one hand, solitary nature means that searches take the form of an isolated interaction between the user and the search engine. Due to this drawback, search engines overlook the experience of users, which would be useful for offering a more accurate result list to other users with similar preferences. On the other hand, one-size-fits-all means that several users obtain the same result list when they use the same query, in spite of having different preferences.
The same researchers highlight the importance of recommender systems technology, in particular the concepts of collaboration and user preferences, in order to cope with the above-mentioned search engine design problems. A preference is information about a user's needs, and the collaboration concept refers to preferences supplied by a group (or community) of users. The proposed solution consists of influencing recommendations with information learned from users' preferences and their collaboration; thereby, suggestions are guided by users' behavior rather than only by the items' features.
For instance, suppose a user performs the following query: "I need something with four legs where I can sit down". The search engine's answer is a result list with items like horses, tigers, chairs, and tables. These objects match the user's query, but the engine replies regardless of the user's preferences and the collaboration of similar users. Even if the user selects the chair, the search engine is not able to learn the preference of the active user, and hence the engine cannot raise the relevance level of the chair for future users with the same preference. Conversely, a recommender system suggests items based on what it has learned from the users' preferences and collaboration. In this case, the suggestion of the recommender system is to use a chair. This illustrative example depicts the advantages of recommendation techniques based on collaboration, which motivated the design of CodeRAnts.

Related Work
In the context of software engineering, Robillard et al. (2010) define recommender systems as a set of software applications that provide information items which are considered valuable to perform software engineering tasks, e.g., reusing artifacts, maintaining software products, and identifying defects and bugs.
All prior recommender systems for software reuse present two important drawbacks, which represent the gap in the state of the art: i) These recommender systems are not able to learn from users' preferences and their collaboration. Therefore, these systems cannot learn to identify items which were considered useful by users in the past, and hence they will not suggest them in the future. Consequently, these recommender systems have the same problems as the search engines mentioned above, i.e., solitary nature and one-size-fits-all. ii) Only in Hipikat and CodeBroker were precision and recall measures evaluated, and these indicators are still far from being satisfactory. It is possible that in the other systems these metrics were not assessed because no dataset with information about programmers retrieving source code existed, unlike other kinds of systems based on collaborative filtering, which are assessed with classical datasets such as Jester (http://www.ieor.berkely.edu/~goldberg/jester-data) and MovieLens (http://www.grouplens.org/node/73).
Table 1 presents a summary of the literature review on recommender systems for software reuse. In this table, the third requirement (Rq3) is not fulfilled by these systems because they lack the second one (Rq2): a recommender system that cannot learn from the preferences and collaboration of its users shares the design problems of search engines (one-size-fits-all and solitary nature). The contribution of this work is a recommendation technique designed to fulfill all the requirements described in this table.

Design of CodeRAnts
CodeRAnts is designed to be implemented in a recommender system with a proxy architecture, where this system acts as the proxy of a code search engine (e.g., Koders). Fig. 1 depicts a recommender system that implements the CodeRAnts method; this system may be plugged into a search engine (or into another recommendation system, such as Strathcona). Its operation is as follows: 1) the proxy agent receives a query, q, which comes from the user agent; 2) the proxy agent redirects q to the code search engine; 3) the code search engine retrieves a result list, L, with links to code that could be useful for the user; 4) the proxy agent forwards L to the recommender system; 5) the recommender system computes a new result list, L′, in accordance with the recommendation technique described below (CodeRAnts), and sends L′ to the user agent.

CodeRAnts method
Similar to Collaborative Searching (Smyth et al., 2010), the design goal of CodeRAnts is to take advantage of similarity and repetition of queries performed by programmer communities as a source of recommendations.
Nevertheless, the collaborative searching approach faces research challenges similar to those of collaborative filtering, i.e., data sparsity of the input-ranking matrix and the cold start problem for recently introduced queries. Therefore, in order to address these drawbacks, we have taken into account the ant colony metaphor, which was successfully used by Bedi and Sharma (2010) to overcome these problems in the context of collaborative filtering, achieving good values of precision and recall through off-line assessment.
Algorithms based on this metaphor reproduce the behavior of real ants in order to build better solutions, using artificial pheromones as a means of communication among ants, which tend to lay pheromone trails while walking from their nests to a food source and vice versa.
Ants do not communicate directly with each other. These insects are guided by pheromone smell and hence choose paths marked by the highest concentration of pheromones. The indirect communication among ants through pheromone trails enables them to find a shorter path between their nest and food sources.
The CodeRAnts method consists of creating a directed graph whose vertexes represent queries performed in the past and whose edge weights are based on the textual similarity, the correlation, and the confidence between vertexes.
Let Q = {q1, q2, …, qn} be a set of queries performed by programmers in the past and C = {c1, c2, …, cm} a set of snippets of code. In the same way that the collaborative searching method proposed by Smyth et al. (2010) computes a query-by-item hit matrix, the CodeRAnts method computes the matrix H of size n × m, such that Hi,j corresponds to the number of times that the snippet of code cj was retrieved when the query qi was used by a programmer in the past.
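As an illustration, the matrix described above can be accumulated from a log of (query, selected snippet) events. This sketch is ours, not code from the paper, and a nested dictionary stands in for the n × m matrix:

```python
from collections import defaultdict

def build_hit_matrix(interactions):
    """H[qi][cj]: number of times snippet cj was retrieved for query qi.
    `interactions` is an iterable of (query, snippet) selection events."""
    H = defaultdict(lambda: defaultdict(int))
    for query, snippet in interactions:
        H[query][snippet] += 1
    return H
```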
Similar to the technique proposed by Bedi and Sharma (2012), the CodeRAnts method is structured in two processes. The first is performed off-line; it consists of creating the directed graph, which shall be used as a set of paths with pheromone trails for the ants. The second is on-line and is designed to generate recommendations through ant movement in order to reach the goal, namely, to collect a ranking for each snippet of source code.
The off-line process consists of two steps: i) the matrix H is normalized; Table 3 shows the resulting normalized matrix, namely the matrix of rankings R.
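The normalization expression itself is not legible in this copy; a common choice, taken here purely as an assumption, is to divide each row of the hit matrix by its maximum so the rankings fall in [0, 1]:

```python
def normalize_rows(H):
    """Turn the hit matrix H into a matrix of rankings R with values in [0, 1].
    Dividing by the row maximum is an assumption; the paper's own
    normalization expression is not legible in this copy."""
    R = {}
    for q, row in H.items():
        top = max(row.values()) if row else 1
        R[q] = {c: hits / top for c, hits in row.items()}
    return R
```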
ii) the directed graph G = (V, E) is created; let V be the set of vertexes and E the set of edges. The vertexes represent queries performed by programmers in the past; hence, V = Q. Edges, on the other hand, are links between queries. Each edge represents a path where ants have laid pheromone trails at a certain time t; therefore, the edge weight is the level of the pheromone trail at time t between queries qi and qj, denoted by τi,j(t). τi,j(t) is computed based on the similarity and confidence between vertexes. If the level of pheromone between two vertexes is equal to zero, then there is no edge between them.
Before delving into the similarity between queries, it is important to clarify certain issues concerning the language used to form queries. Recall from the previous section that a system which implements CodeRAnts has a proxy architecture and is plugged into a search engine; thereby, the query language is the same as that supported by the engine. For instance, if the recommender system is plugged into Koders, queries are formed with words from natural language and with the same syntax supported by Koders, using identifiers such as cdef, fdef, mdef, idef, and sdef that refer to names of classes, files, methods, interfaces, and structures, respectively. In this particular case, a query could be: cdef:util mdef:compress. With this query, classes whose name contains the word util and whose methods contain the word compress are searched.
The similarity between queries is defined in Expression 3 as the weighted sum sim(qi, qj) = α·simd(qi, qj) + β·simr(qi, qj), where α, β ∈ ℝ and α + β = 1. These constants represent weights for balancing two similarity measures. The first, simd(qi, qj), is the similarity between queries based on the number of edit operations (the edit distance proposed by Levenshtein (1966), d(qi, qj)) needed to transform qi into qj (see Expression 4). If qi = qj, then simd(qi, qj) = 1; that is, simd reaches its maximum value because the edit distance is zero, d(qi, qj) = 0. Following the above-mentioned example, an engineer may use the following words for the query: compress, conpresser (in French), or comprimir (in Spanish). If qi = conpresser, qj = compress, and th = 5 (a threshold equal to five), then simd(qi, qj) = 0.4, because three edits change one query into the other: 1) conpresser → compresser (substitution of the letter n by m), 2) compresser → compresse (removal of the letter r), and 3) compresse → compress (removal of the letter e). Table 4 presents all computations of the edit distance and simd between the queries from the above-mentioned example.
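Working backwards from the worked example (three edits, th = 5, similarity 0.4), Expression 4 is consistent with simd = 1 − d/th, floored at zero; the sketch below implements that reading, which is our reconstruction rather than the paper's exact formula:

```python
def edit_distance(a, b):
    """Levenshtein distance between two queries (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def sim_d(qi, qj, th=5):
    """Edit-distance-based similarity (our reconstruction of Expression 4):
    1 - d/th, floored at 0 when the distance exceeds the threshold."""
    return max(0.0, 1.0 - edit_distance(qi, qj) / th)
```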
Confidence between queries is computed using Expression 7: conf(qi, qj) = P(qj | qi), the conditional probability of making the query qj given the query qi. Table 5 shows all computations of confidence between the queries in the above-mentioned example. Fig. 2 depicts the directed graph created by using Expression 8 with the initial pheromone paths at t = 0, namely τi,j(0), where k ∈ ℝ, k → 0 (i.e., k tends to be very small). f(qi, qj) is a function based on the similarity and confidence between queries; it is computed using Expression 9 (adapted from Bedi and Sharma, 2012).
Once the directed graph G = (V, E) has been created, as in the example depicted in Fig. 2, the on-line process may start. In this process, an active query vertex is selected by searching for the vertex most similar to the query, q, sent by a programmer. Given that this algorithm is inspired by the ant colony, for all the generated ants there is a time-to-live (TTL) parameter that bounds the number of iterations during which the ants can explore the graph G. If the destination vertex is not found within the TTL limit, each ant is removed. Because the destination vertex is not known, or in the worst case does not exist, it is mandatory to set up a stopping point (the TTL parameter) for the on-line process in order to prevent it from running indefinitely. The virtual ants' source of food consists of the rankings of the items. Hence, ants move through queries which are similar to, or probably related with, the active query in order to collect their rankings for those snippets of code that are not ranked for the active query. The on-line process of CodeRAnts is described in the following steps:
i) Seek the active query vertex, qa, which is selected as the vertex most similar to the query, q, sent by a programmer. Suppose an engineer makes a query with the word comprimir (i.e., q = comprimir); by using the edit distance as a method for measuring similarity between queries, q1 is the most similar vertex in the graph depicted in Fig. 2, because there is no edit distance between q and q1, both being exactly alike. Hereafter, for this example, q1 is the active query vertex (i.e., qa = q1) in the graph depicted in Fig. 2.
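Expressions 7 through 9 are garbled in this copy, so the sketch below fixes two assumptions of ours: the confidence P(qj | qi) is estimated from co-occurrence of queries in search sessions, and the function combining similarity and confidence is taken to be their product.

```python
def confidence(sessions, qi, qj):
    """Estimate P(qj | qi): fraction of the sessions containing qi that
    also contain qj (one plausible reading of Expression 7)."""
    with_qi = [s for s in sessions if qi in s]
    if not with_qi:
        return 0.0
    return sum(qj in s for s in with_qi) / len(with_qi)

def build_graph(queries, sim_fn, conf_fn):
    """Directed graph as nested dicts: graph[qi][qj] = tau_ij(0).
    Taking f = sim * conf for the initial pheromone is an assumption;
    edges whose pheromone level is zero are omitted."""
    graph = {qi: {} for qi in queries}
    for qi in queries:
        for qj in queries:
            if qi != qj:
                tau0 = sim_fn(qi, qj) * conf_fn(qi, qj)  # assumed form of f
                if tau0 > 0:  # zero pheromone => no edge
                    graph[qi][qj] = tau0
    return graph
```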
ii) Create z ants, where z is equal to the number of outgoing edges from the active query vertex, qa. In Fig. 2, if the active query vertex is q1, then two ants are created, because this vertex has two outgoing edges.
iii) Each x-th ant selects the next vertex to be visited with a probability based on the pheromone levels (adapted from Bedi and Sharma, 2012), restricted to vertexes qj ∉ visited(x), where visited(x) is the set of vertexes which the x-th ant has already visited. nj denotes the number of code snippets traced by the vertex qj which have not been ranked by the active vertex qa, and N is the total number of code snippets that have not been ranked by the active vertex qa. The x-th ant stops when all its adjacent vertexes are in the set visited(x). The whole process finishes when no ant can move anymore or when the ants' TTL reaches its maximum value.
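The selection expression itself is not legible in this copy; taking the transition probability proportional to the pheromone level times nj/N over unvisited neighbors is our assumption in the sketch below:

```python
import random

def select_next_vertex(graph, current, visited, unranked_counts, total_unranked,
                       rng=random):
    """Roulette-wheel choice of the next vertex, weighting each unvisited
    neighbor by pheromone * (n_j / N); the weight form is an assumption."""
    weights = {v: tau * (unranked_counts.get(v, 0) / total_unranked)
               for v, tau in graph[current].items() if v not in visited}
    total = sum(weights.values())
    if total == 0:
        return None  # the ant cannot move any further
    r = rng.uniform(0, total)
    acc = 0.0
    for v, w in weights.items():
        acc += w
        if r <= acc:
            return v
    return v
```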
Taking q1 as the active query vertex, Table 3 shows that the snippets of code c1, c2, c3, c5, c6, and c7 do not have a ranking in q1. Among the neighbors of q1, q2 has a ranking for c2 and q3 has a ranking for c5 and c6; hence, these rankings are collected. Therefore, c1, c3, and c7 are still without a ranking. Thereafter, the ants keep moving, choosing q3 as the new destination vertex because the path between q1 and q3 has the greatest concentration of pheromones. This step is repeated, without passing twice through the same vertex, until the ants reach their goal (i.e., to rank all snippets of code) or until the maximum TTL is reached. It is important to clarify that when a certain snippet of code, cj, is ranked by at least two neighbor vertexes, the rankings provided by the neighbors are stored sorted in descending order according to the trails of pheromone between the active vertex and its neighbors. For instance, when the ants are on the vertex q3, the snippet of code c1 is ranked by its neighbor vertexes q4 and q5. In this example, the ranking provided by q4 is stored before the one provided by q5, because the trail of pheromones between q3 and q4 is stronger than the concentration of pheromones between q3 and q5.
iv) Generate suggestions through the prediction method proposed by Resnick et al. (1994).
The dataset used in the evaluation is randomly generated because there is no real published dataset of interactions between programmers and search engines, or between programmers and recommender systems based on collaborative searching.
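Step iv can be sketched as follows. The paper's expression is only partly legible, so normalizing the weighted deviations by the sum of pheromone weights, as in Resnick et al.'s original formula, is our assumption:

```python
def predict_ranking(active, snippet, neighbors, R, tau, averages):
    """Resnick-style prediction: r_hat(a, c) = avg(a) +
    sum_i tau(a, i) * (R[i][c] - avg(i)) / sum_i tau(a, i),
    where the sum runs over the neighbors whose rankings were collected.
    Pheromone levels stand in for the user-user similarity weights."""
    num = sum(tau[(active, qi)] * (R[qi][snippet] - averages[qi]) for qi in neighbors)
    den = sum(tau[(active, qi)] for qi in neighbors)
    return averages[active] + (num / den if den else 0.0)
```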
A search engine and programmers are simulated in order to train and test the simulated recommender system, which implements the CodeRAnts method. In the training phase, programmers randomly choose certain snippets of code by performing searches through a simulated search engine. Queries are randomly selected from a dataset, and their terms appear with high frequency in the chosen snippet of code; in this way, the programmers' knowledge for performing a code search is simulated. The queries are words from natural language which are used to write source code. This phase aims to fill the H matrix by recording which snippets of code are selected by programmers during the searches.
After the training phase, the simulator program performs the off-line phase of the CodeRAnts method. Then the on-line phase begins with the testing phase, in which other simulated programmers perform source code searches through the simulated recommender system. During the training phase, simulated programmers search for at most 99 snippets of code, while during the testing phase the other instances seek at most 25 snippets of code stored in the dataset (i.e., relevant items). In both phases, programmers perform from 5 to 10 queries in order to search for a snippet of code. Each query is randomly selected from a bag-of-queries. When a programmer finds a snippet of code, it is counted as a retrieved item in order to compute the precision and recall metrics. Three fourths of the set of programmers are used for training and one fourth is used for testing. Assessments were performed with sets of 30, 50, 100, 500, 1000, 5000, and 10000 programmers. Other parameters considered in the assessment are: α = 0.75, β = 0.25, th = 1, timeout = 10, ρ = 10^-2, and k = 10^-3. These parameters were chosen by running the simulation several times; thus, we tuned them until the best performance was achieved. The next section shows the results of the assessments with this experimental setting.

Results and discussion
Table 8 shows the results of the simulation. The averages of the precision and recall values are 0.53 and 0.71, respectively. These values are better than those achieved by Cubranic et al. (2005) with Hipikat: on average, 0.11 and 0.65 for precision and recall, respectively. Furthermore, the outcomes of the simulation performed on CodeRAnts are better than those observed by Ye and Fischer (2005) with CodeBroker, whose precision is not greater than 0.4, although its recall reached 1. Nevertheless, although with the simulations we achieved better values than those obtained by other researchers, this assessment method is not rigorous and its outcomes are biased, given that the above-mentioned systems were evaluated through user-based assessments. Cubranic et al. (2005) evaluated Hipikat with a group of real programmers and the Eclipse source code (version 2.1). In a similar fashion, Ye and Fischer (2005) carried out experiments on CodeBroker with real programmers, but with the Java 1.1.8 core library and the JGL 1.3 library. Thereby, in future studies, CodeRAnts must be assessed against the other systems using the same experimental method and setting. Additionally, CodeRAnts was assessed with a randomly generated dataset; hence, for further work we must collect a real dataset through user-based experiments.
The results of these experiments reveal that the edit-distance-based similarity between queries is not useful, because the best performance is achieved when the threshold parameter is equal to one; this is almost the same procedure as checking whether both queries are equal. In the future, we will assess other similarity measures (e.g., cosine distance, Euclidean distance, etc.).

Conclusions and directions for further work
The contributions of this study are: i) a recommendation method that can be implemented as a proxy of a code search engine or of one of the above-mentioned recommender systems (e.g., Strathcona), in order to allow them to improve their answers and recommendations for programmers; in this way, these systems could learn from users' collaboration through CodeRAnts; ii) a recommendation method which tackles the cold start problem, given that a system which implements CodeRAnts can suggest snippets of code even when it receives new queries that do not belong to the set Q, by searching for similar queries in this set; moreover, through the ant colony technique, such a system can suggest snippets of code even when the matrix of rankings R holds no ranking for these snippets given certain queries, thanks to the search performed by the ants through possibly related or correlated queries and the collection of rankings for these snippets; iii) a recommendation method that, in a simulated environment, shows better preliminary values of precision and recall than prior systems designed for software reuse; however, it is important to highlight that the results of the simulations are not definitive evidence of the quality of the recommendations provided by the CodeRAnts method, because in the simulation settings the bag-of-terms and the dataset are randomly generated; moreover, CodeRAnts was not evaluated with the same experimental method and setting used for the other systems by other researchers.
For future studies, the following are proposed: i) collect a real dataset through user-based experiments, ii) carry out the evaluation of CodeRAnts against the other systems with the same experimental method and setting, and iii) test other similarity measures (e.g., cosine distance, Euclidean distance, etc.).

Figure 1. CodeRAnts method implemented in a recommender system

Table 1. Comparison of recommender systems for software reuse according to the requirements described at the bottom of this table. In this table, X means a fulfilled requirement, and NA means a requirement that has not been assessed.
Rq1: The recommender system can reuse code stored inside open source repositories.
Rq2: The recommender system can learn from the preferences and collaboration of its users.
Rq3: The results of the assessment of the recommender system with respect to the values of precision and recall are acceptable.
Rq4: The recommender system lacks a cold start problem.

Table 2. Instance of the H matrix.

Table 3. Instance of the matrix of rankings.

Table 4. Computation of edit distance
The edit-distance-based measure captures the extent of the typographical similarity between two queries. It has been considered in this work because typographical mistakes are sometimes included within users' queries. Nevertheless, this measure does not consider the semantic similarity between two queries (for example, two synonymous queries written with different words). Therefore, sim(qi, qj) is also based on the correlation between the rankings associated with both queries. Let simr(qi, qj) be the similarity based on the correlation coefficient between queries; it is calculated using Expression 5. ri,j is the correlation coefficient between the row vectors Ri and Rj, which is defined in Expression 6, where ri,j ∈ ℝ and −1 ≤ ri,j ≤ 1; the mean and the standard deviation of the row vector Ri are used in its computation. If the value of ri,j tends to one, both queries are correlated; if the value is close to zero, there is no correlation; otherwise, there is an inverse correlation. Table 4 shows all computations of simr between the queries, taking into account the matrix of rankings.
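Expression 6 is the familiar Pearson correlation coefficient; Expression 5's mapping from the coefficient to a similarity is not legible here, so clamping negative correlations to zero is our assumption in this sketch:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two ranking row vectors
    (a reconstruction of Expression 6)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def sim_r(row_i, row_j):
    """Correlation-based similarity; max(r, 0) is an assumed mapping,
    since Expression 5 is not legible in this copy."""
    return max(0.0, pearson(row_i, row_j))
```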

Table 6. Computation of the initial level of pheromones in the directed graph in Fig. 2
The suggestions are generated with Expression 10, r̂(a, c) = r̄(a) + Σi τ(a, i)·(r(i, c) − r̄(i)), where r(i, c) and r(a, c) represent the rankings of the vertexes qi and qa for the snippet of code c, and r̄(a) and r̄(i) denote the average rankings of the vertexes qa and qi, respectively; the sum runs over the first neighbors of the active vertex with the biggest trails of pheromones. Finally, τi,j(t) is updated with Expression 11, τi,j(t) = (1 − ρ)·τi,j(t − 1) + Δτi,j(t − 1), where ρ is the evaporation rate of the pheromones and Δτ is computed with Expression 12, where ηi,j = 1/di,j and di,j represents the distance from the vertex qi to the vertex qj. Table 7 and Fig. 3 depict the update of the pheromone graph when ρ = 0.01.
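The evaporation-and-deposit update (Expression 11) and the η = 1/d heuristic (Expression 12) can be sketched as follows; treating the deposit as exactly η is our assumption, since Expression 12 is only partly legible in this copy:

```python
def update_pheromone(tau_prev, delta_prev, rho=0.01):
    """Expression 11: tau(t) = (1 - rho) * tau(t - 1) + delta_tau(t - 1).
    Old pheromone evaporates at rate rho; new pheromone is then deposited."""
    return (1 - rho) * tau_prev + delta_prev

def delta_pheromone(distance):
    """Deposit based on eta = 1 / d (assumed reading of Expression 12):
    shorter paths between queries receive more pheromone."""
    return 1.0 / distance
```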

Table 7. Update of the pheromone graph in Fig. 3, when ρ = 0.01
Shani and Gunawardana (2010) present three methods to evaluate recommender systems, namely off-line evaluation, user studies, and on-line evaluation. In this work, CodeRAnts was evaluated through the first method, with a simulation program written in Java. The program generates a bag-of-terms by assigning random values to the matrix termsCode, which holds the frequency of each term in the source code (i.e., if termsCode[3][4] is equal to 15, the term t3 appears fifteen times in the snippet of code c4).
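The dataset generation step can be sketched as below. The paper's simulator is written in Java; this Python restatement, the uniform distribution, and the max_freq bound are our assumptions:

```python
import random

def generate_terms_code(num_terms, num_snippets, max_freq=20, seed=None):
    """Randomly fill termsCode: termsCode[i][j] is the frequency of term t_i
    in snippet c_j, mirroring the paper's example termsCode[3][4] == 15."""
    rng = random.Random(seed)
    return [[rng.randint(0, max_freq) for _ in range(num_snippets)]
            for _ in range(num_terms)]
```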