Improvised Analogy Based Software Cost Estimation with Ant Colony Optimization

The aim of this study is to provide an efficient methodology for estimating project development cost using analogy. Cost estimation is one of the greatest challenges in the software industry, because delivering a project on schedule and with quality depends on it. In most cases, the delivered product falls short on either quality or the expected timeline, owing to improper and imprecise estimation of the project cost. A near-accurate project cost can be derived by analogy, wherein data from previous projects are used to arrive at the cost of the current project. The Ant Colony Optimization (ACO) technique applied to analogy provides a better way to overcome the challenges faced during cost estimation. In our study, we follow a three-step methodology based on ACO to arrive at the project development cost from the PROMISE datasets. First, we extract the matching projects from the data set based on the nearest values of a parameter. Second, we identify and group projects based on sizing. Third, we improve the similarity measures used to match our project against those in the data set.


INTRODUCTION
The practice of effort estimation is critical during software development. Without accurate estimates, project managers cannot set realistic goals for software delivery dates, nor can they staff their projects in an efficient and optimal way. Software effort estimation can be done through two major approaches: estimates based on formal models, such as the Constructive Cost Model and function points, or estimates based on analogy, where the process of generating an estimate is based largely on past experience, expert judgment and Parkinson's law.
The Constructive Cost Model (COCOMO) is an algorithmic software cost estimation model developed by Boehm (1981). It uses a basic regression formula with parameters derived from historical project data. Its limitations are that it ignores hardware issues and personnel turnover levels. Function Point Analysis is an ISO-recognized method for measuring the functional size of an information system. It was developed by Albrecht (1979). Its limitation is that it is performed after the creation of design specifications. The oldest metric for software projects is "Lines of Code" (LOC). This metric was first introduced in 1960 and was used for economic, productivity and quality studies. Its limitation is a lack of accountability and developer experience. Putnam's SLIM (1978) is the first algorithmic cost model. It is based on the Norden/Rayleigh function and is generally known as a macro estimation model (for large projects). Its limitations are that estimates are extremely sensitive to the technology factor and that it is not suitable for small projects.
Estimation by analogy means creating an accurate estimate for the proposed project by comparing it with similar past projects. Much new research is being done on analogy. In "Exploiting the Essential Assumptions of Analogy-Based Effort Estimation", Kocaguneli et al. (2012) proposed an approach to design TEAK using an easy-path principle in order to avoid the computational cost of tools. In "Visual Comparison of Software Cost Estimation Models by Regression Error Characteristic Analysis", Mittas and Angelis (2010) introduced the Regression Error Characteristic (REC), a powerful visualization tool with interesting geometrical properties that makes it easy to compare and validate different prediction models. In "The Adjusted Analogy-Based Software Effort Estimation Based on Similarity", Chiu and Huang (2007) proposed an estimation model that adopts a GA method to adjust the effort based on similarity distance. In "Providing Statistical Inference to Analogy-Based Software Cost Estimation", Keung et al. (2008) used the strength of the correlation between the distance matrix of project features and the distance matrix of known effort values of the data. In "Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data", Kocaguneli et al. (2013) reported that software effort estimation data can be reduced to a small essential

METHODOLOGY
ACO is a computational methodology that exhibits an ability to gain knowledge and deal with new situations, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. Particle Swarm Optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. ACO is a technique for solving computational problems that can be reduced to finding the shortest path. Our approach is a member of the ACO family, which belongs to the swarm intelligence methods and constitutes a class of metaheuristic optimization. The aim of the first ACO algorithm was to search for an optimal path in a graph, based on the behavior of ants travelling along a path from the source to the destination. ACO algorithms have since been applied to many combinatorial optimization problems.
In the natural world, ants move randomly to find food and return to their colony while laying down pheromone trails. Other ants find such a trail based on the pheromone concentration deposited on the path. They are then likely to stop travelling at random and instead follow the trail to reach the food. Equation (1) is used to find the shortest path. However, as the pheromone trail evaporates, the path loses its attractive strength for the ants. The more time it takes for an ant to travel down the path and back again, the more time the pheromones have to evaporate. On a shorter path, the pheromone density therefore becomes higher than on longer ones. If there were no evaporation at all, the paths chosen by the first ants would tend to be excessively attractive to the following ones. The pheromone concentration deposited on the path is calculated using Eq. (2) and (3), where TSP = total system pheromone.
When one ant finds a good path from the colony to the food source, other ants are more likely to follow that path, and positive feedback eventually leads to all the ants following a single path. The idea of the ant colony algorithm is to mimic this behavior with "simulated ants" walking around the graph and obtaining the best solution.
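The evaporation and positive-feedback behavior described above can be illustrated with a small simulation. This is a minimal sketch that assumes the textbook Ant System update (evaporate a fraction `rho`, then deposit an amount inversely proportional to path length); it is not the formula of Eq. (1) to (3), and the function name, parameter names and path lengths are hypothetical.

```python
def simulate_two_paths(lengths, n_ants=10, rho=0.5, iterations=20):
    """Expected-value sketch of pheromone dynamics on competing paths.

    lengths: dict mapping a path name to its length (hypothetical units).
    rho:     evaporation rate per iteration, assumed to lie in (0, 1].
    """
    # Every path starts with the same amount of pheromone.
    pheromone = {name: 1.0 for name in lengths}
    for _ in range(iterations):
        total = sum(pheromone.values())
        # Ants choose a path in proportion to its current pheromone level.
        share = {name: n_ants * pheromone[name] / total for name in lengths}
        for name, length in lengths.items():
            # Shorter paths are completed more often, so each ant on them
            # contributes a larger deposit (1 / length) per iteration.
            deposit = share[name] / length
            # Evaporation removes a fraction rho before the new deposit.
            pheromone[name] = (1.0 - rho) * pheromone[name] + deposit
    return pheromone


trails = simulate_two_paths({"short": 1.0, "long": 2.0})
print(trails["short"] > trails["long"])
```

With these assumed settings the shorter path accumulates more pheromone at every iteration, which is the positive feedback that draws the colony onto a single path.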
The analogy technique plays a major role in estimation: it compares data from previous projects with the proposed project to find better solutions to the challenges encountered during software cost estimation. Analogy-based cost estimation involves many ambiguities when we try to pick a project from the data set and match it against our own. There are two different matching techniques we could implement for our project, namely probabilistic matching and semantic matching. Probabilistic matching is a technique based on statistical analysis that determines an overall probability value using more than one record. Semantic matching is a technique used to identify similarity between projects based on semantics. From our analysis, we find that semantic matching does not produce accurate results because it is related to ontology matching. Probabilistic matching, on the other hand, helps overcome human errors that occur during dataset retrieval. ACO is a technique based on probabilistic matching, in which statistical reports are derived for each record in the data set. Using this probabilistic value, we arrive at the exact value with the help of ACO.
Figure 1 shows that each dataset contains 70 projects and that each project has a large number of attributes. The ACO data filter collects the requirements from the requirement specification and filters the projects from the data set based on two parameters:
• Domain name
• KDSI
The ACO path finder helps to group projects based on sizing. The pheromone value improves the degree of the similarity measure and helps to find the nearest matching data. The proposed algorithm is as follows:
Step 1: Read the data from the dataset that are not null.
Step 2: Assign a variable to each relevant parameter in the data.
Let a be effort
Let b be time
Let α, β and γ be positive constants
Let J be the percentage of pheromone concentration
Let c be the number of projects taken from the dataset for our study.
Step 3: Arrive at the feasible path, which is used to choose data from the dataset that match the current project, using the formula from Eq. (1); in our example the result is 182.1635.
Step 4: Arrive at the percentage of pheromone concentration 'ph' using the formula from Eq. (2) and (3).
Based on the ACO technique, we find from Table 2 that the minimum path value and the maximum pheromone value correspond to the optimal values for effort, time and people.
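Steps 1 and 2 of the algorithm, together with the domain part of the ACO data filter, can be sketched as follows. This is only an illustration under stated assumptions: the `Project` record, its field names and the three sample records are hypothetical and are not taken from the dataset, and the path and pheromone formulas of Steps 3 and 4 are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    """Hypothetical record layout; the real dataset has many more attributes."""
    name: str
    domain: str
    kdsi: Optional[float]    # size in thousands of delivered source instructions
    effort: Optional[float]  # variable "a" in Step 2
    time: Optional[float]    # variable "b" in Step 2


def read_non_null(projects: List[Project]) -> List[Project]:
    """Step 1: keep only records whose numeric fields are all present."""
    return [p for p in projects if None not in (p.kdsi, p.effort, p.time)]


def filter_by_domain(projects: List[Project], domain: str) -> List[Project]:
    """Data filter: keep projects from the required domain (case-insensitive)."""
    return [p for p in projects if p.domain.lower() == domain.lower()]


# Made-up sample records, for illustration only.
sample = [
    Project("P1", "banking", 27.0, 120.0, 14.0),
    Project("P2", "banking", 31.0, None, 16.0),   # dropped: effort is null
    Project("P3", "telecom", 25.0, 98.0, 12.0),   # dropped: other domain
]

usable = filter_by_domain(read_non_null(sample), "banking")
c = len(usable)  # "c" in Step 2: number of projects taken for the study
print([p.name for p in usable], c)
```

Filtering out incomplete records first keeps the later path and pheromone computations from having to handle missing effort or time values.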

RESULTS AND DISCUSSION
Analogy-based cost estimation is a technique used at the early stages of estimation. Sometimes analogy fails at this early stage for reasons such as assumption error, measurement error and the skill of the estimator. Our study helps to minimize the assumption and measurement errors. Using evaluation criteria, we define the following three steps; a complete analogy-based estimation satisfies all three:
Step 1: Find the nearest project effort value with the help of the ACO path value.
Step 2: Identify and group projects based on sizing.
Step 3: Improve the similarity measure and extract the correct project for our requirements.
Steps 1 and 2 help resolve the measurement error and step 3 helps minimize the assumption error.
Step 1: Identify and group projects based on sizing: Grouping the projects from the dataset is not easy because each project contains 72 attributes and all the attributes are related to one another. Two important parameters justify the grouping, namely the domain name and the size of the project. We collect appropriate values for KLOC and domain from the business requirement specification. In our case, where KLOC = 27, we set 127 as the upper boundary value (27 + 100) and 27 - 100 as the lower boundary value; the output is presented in Table 2.
Table 2 shows that, using the ACO algorithm and based on the requirement, we have eliminated unrelated data from Table 1. Having collected the requirement data from the business requirement specification, we match them against the dataset and extract the correct project using the path value. We identified the similar projects by adding and subtracting the boundary value of 100 with the path value. These values are displayed in Table 2, where duplicate projects with different path values can be found easily.
From Fig. 2 we conclude that KLOC and path value are directly proportional.
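The boundary computation in this step is simple arithmetic and can be written out directly. The sketch below assumes each candidate is a dictionary with a `kloc` key; the function names and the sample candidates are hypothetical, and only the KLOC value of 27 and the boundary of 100 come from the text.

```python
def kloc_window(kloc, boundary=100.0):
    """Return (lower, upper) bounds around the required KLOC value."""
    return kloc - boundary, kloc + boundary


def within_window(candidates, kloc, boundary=100.0):
    """Keep candidates whose size falls inside the window (inclusive)."""
    lower, upper = kloc_window(kloc, boundary)
    return [c for c in candidates if lower <= c["kloc"] <= upper]


# Values from the text: KLOC = 27 and a boundary of 100.
lower, upper = kloc_window(27)
print(lower, upper)

# Made-up candidates, for illustration only.
candidates = [{"id": "A", "kloc": 12.0}, {"id": "B", "kloc": 126.0},
              {"id": "C", "kloc": 300.0}]
grouped = within_window(candidates, 27)
print([c["id"] for c in grouped])
```

Because a project size cannot be negative, the lower bound of 27 - 100 = -73 effectively admits every project up to the upper bound of 127.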
Step 2: Improving the similarity measure and extracting the correct project from our requirements: Finding the similarity measure plays an important role in analogy-based cost estimation. Each project contains more than 72 attributes, so the similarity measure cannot easily be found across all of them once the similar projects have been identified. It is very difficult to find the maximum and minimum values in the projects.
Steps followed to arrive at the similarity measure:
1. Read the input data
2. Retrieve the similar projects based on KLOC and domain
3. If (Input KLOC value == similar project KLOC value) { Retrieved project = choose the minimum path value row or max pheromone value row }
The ACO algorithm follows the principle that when the path value is low the pheromone value is high; the similarity measure algorithm uses this principle. If the KLOC value matches a similar project from Table 3, we choose the project that has the minimum path value or the maximum pheromone value.
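The selection rule in the steps above can be sketched as a short function. This is an illustration under assumptions: the dictionary keys `kloc`, `path` and `pheromone`, the function name and the sample rows are hypothetical and do not reproduce Table 3.

```python
def retrieve_project(similar_projects, input_kloc, use_pheromone=False):
    """Pick one project among those whose KLOC equals the input KLOC.

    By default the row with the minimum path value is chosen; with
    use_pheromone=True the row with the maximum pheromone value is chosen.
    Returns None when no project has a matching KLOC.
    """
    matches = [p for p in similar_projects if p["kloc"] == input_kloc]
    if not matches:
        return None
    if use_pheromone:
        return max(matches, key=lambda p: p["pheromone"])
    return min(matches, key=lambda p: p["path"])


# Made-up rows, for illustration only.
rows = [
    {"id": "A", "kloc": 27, "path": 190.0, "pheromone": 0.20},
    {"id": "B", "kloc": 27, "path": 182.0, "pheromone": 0.35},
    {"id": "C", "kloc": 40, "path": 150.0, "pheromone": 0.45},
]

by_path = retrieve_project(rows, 27)
by_pheromone = retrieve_project(rows, 27, use_pheromone=True)
print(by_path["id"], by_pheromone["id"])
```

In this sample the two criteria agree because the row with the lower path value also carries the higher pheromone value, which is the relationship the text relies on.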
Step 3: Finding the nearest project effort value with the help of the ACO path value.
Data availability is an important factor in analogy-based estimation. Some estimators skip analogy estimation at the early stages because data are unavailable. We classify data availability into two cases. In the first, the requirement data are not available in the dataset, and we cannot handle this situation. In the second, the dataset contains data that are not exactly related to the requirement but are close to it, and we can handle this situation by finding the nearest project, that is, the one whose path value lies closest to the requirement's path value. In this way we identify the related project information.
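Finding the project whose path value lies closest to the requirement's path value amounts to a nearest-neighbour lookup on a single number. The sketch below assumes each project is a dictionary with `path` and `effort` keys; the function name and the candidate rows are hypothetical, and only the path value 182.1635 is taken from the text.

```python
def nearest_by_path(projects, requirement_path):
    """Return the project whose path value is closest to the requirement's.

    Returns None for an empty candidate list, which corresponds to the
    case where no usable data are available in the dataset.
    """
    if not projects:
        return None
    return min(projects, key=lambda p: abs(p["path"] - requirement_path))


# Made-up candidates, for illustration only.
candidates = [
    {"id": "A", "path": 150.0, "effort": 90.0},
    {"id": "B", "path": 180.0, "effort": 118.0},
    {"id": "C", "path": 240.0, "effort": 160.0},
]

nearest = nearest_by_path(candidates, 182.1635)
print(nearest["id"], nearest["effort"])
```

The effort of the returned project then serves as the starting point for the estimate of the new project.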

CONCLUSION
Early-stage estimation plays a vital role in industry because an inaccurate estimate is costly: overestimation can lose the project at bidding time, while underestimation can lead to project failure. To avoid these situations, we improve the quality of analogy-based estimation with the help of ACO. Our study helps to solve common problems such as measurement error and data unavailability. In future work we will minimize the estimator knowledge error, and the approach is to be implemented in a CMM level 5 company. Based on the experts' feedback we will minimize the errors further.

Table 1
content and the simple methods are still able to perform well on the essential content.
Table 1 presents the data collected from the PROMISE dataset. The original dataset contains 106 records with 94 attributes; here we have worked on 15 records with 4 attributes, namely KLOC, people, effort and time, to generate sample input and output. For all the projects we applied three estimation techniques and obtained differing outputs; the outputs do not correspond one to one. The main reason is that the COCOMO model helps to estimate the project at an early stage, whereas the function point model does not.

Table 3: Retrieve the path and pheromone values