A Fuzzy AHP-based approach for prioritization of cost overhead factors in agile software development


Introduction
Agile software development (ASD) is a widely used, practical approach to project management and software development. The underlying intention is to satisfy the client's needs through continuous testing, frequent delivery, and requirement change management [1]. In other words, ASD intends to drive a continuous improvement process, which helps enhance the productivity of the development team. Agile methods have attracted wide attention from software organizations since the introduction of the agile manifesto in 2001 [2]. Nowadays, ASD is a well-known development process, mainly because it handles changes in the elicited requirements of a software project effectively. For this reason, software organizations have widely adopted ASD in preference to traditional methodologies, such as waterfall and incremental, which struggle to accommodate changing requirements. In the traditional development approach, the project manager designs the project plans for the identified tasks. In the ASD context, by contrast, the entire team is responsible for completing the project [3].
Cost estimation plays a significant role in the ASD domain because a project's success or failure depends heavily on cost factors such as the software project's effort, cost, and quality. Accurate cost estimation nevertheless remains a core challenge because software organizations face continuous changes in requirements [4]. To estimate cost accurately, it is essential to know the factors and challenges that most influence the project cost. Because agile accepts changing requirements, precise effort estimates are difficult to obtain. Effort estimates need to be adjusted in each sprint of ASD to ensure delivery within the time frame of a time-boxed release [5]. Accurate and precise effort estimates contribute significantly to the success of a software development project, whereas incorrect estimates can harm the company's sales and marketing and result in significant financial losses [6].
In the literature, several systematic literature reviews (SLRs) have reported different techniques and cost factors for cost estimation. The most recent SLR was reported by Fernández-Diego et al. [7] in 2020, who compared their results with the original study of Usman et al. [8]. The authors identified various cost factors, evaluation measures, effort estimation methods, and effort predictors and investigated the agile methods. They used a thematic analysis method to investigate the cost issues. However, they did not validate or prioritize the identified cost factors. Dantas et al. [9] also conducted an SLR to address various cost factors, evaluation measures, effort predictors, dataset characteristics, and agile effort estimation methods. However, further research is needed to investigate the identified cost factors and estimation techniques, especially in the context of ASD [10].
An empirical study [11] was conducted in agile global software development, in which the authors investigated different techniques and cost factors in the ASD context. A multi-vocal study [12] focused on identifying different challenges and success factors; however, the identified factors were not validated with practitioners. Furthermore, only one questionnaire-based survey [6] has been conducted to validate the identified cost factors. The authors received 65 responses, most of which came from developers rather than agile practitioners. To the best of our knowledge, none of the reported work has focused on validating and prioritizing the cost overhead factors that contribute to a software project's success. Compared with [6], we intend to identify an additional set of cost overhead factors and validate them with practitioners (i.e., 154 agile project managers) through a questionnaire-based research protocol.
Cost estimation remains a core challenge in the ASD context. Thus, there is a need to provide an up-to-date view of the factors impacting the cost overhead of agile-based projects. With such a view, agile project managers would have a clear idea of the critical factors and key areas that cause cost overhead in the ASD context [13]. Fig. 1 provides a general overview of the leading causes of overhead costs in the context of ASD.
In the past few decades, various effort estimation techniques have been proposed and classified as algorithmic, non-algorithmic, expert-based, or machine learning-based [14]. However, to the best of our knowledge, the current state of the art lacks both additional cost factors and a prioritization process to rank the identified factors, which could improve ASD estimation accuracy [7][8][9][10]. Motivated by this, the current work aims to identify various factors from the literature and validate them through a questionnaire-based survey with agile project managers. To accomplish this, we conduct an SLR and an empirical study to answer the formulated research questions. Consequently, we found additional cost overhead factors compared to the existing studies. Moreover, we propose a practical quantitative framework for prioritizing the cost overhead factors in the context of ASD. The proposed framework employs a Fuzzy Analytic Hierarchy Process (F-AHP) technique to rank the identified factors by considering the opinions of multiple perspectives.
In the literature, many Multi-Criteria Decision Making (MCDM) techniques have been proposed, including TOPSIS [15], AHP [16], and F-AHP [17]. Although researchers have applied these MCDM techniques in other research contexts, none of the published studies has employed them for ranking cost overhead factors in the ASD context. The AHP technique does not account for the uncertainty associated with mapping cost factors. In contrast, F-AHP addresses the subjectivity and imprecision of the AHP approach, thereby improving the decision-making process. Furthermore, F-AHP is effective in solving weightage-oriented hierarchical problems. Because of these advantages, we adopt the F-AHP technique to prioritize the identified pool of cost factors in the agile development context. The current work classifies and prioritizes the cost overhead factors using the F-AHP technique and develops a thematic taxonomy of cost factors and their categories.
The proposed quantitative framework supports the prioritization of cost overhead factors grounded on the 4Ps: people, project, process, and product. The implementation results provide a complete list of prioritized cost overhead factors that would assist agile practitioners during the cost estimation process in the ASD context. In our prior study [18], we conducted an SLR to identify critical cost factors and effort estimation techniques in the context of ASD. However, that preliminary study found only a limited set of cost factors and estimation techniques, considering the literature published until 2021. Furthermore, it did not consult practitioners to validate the identified cost factors, did not rank the cost factors for use by agile practitioners during cost estimation, and did not develop a thematic taxonomy of cost factors and their main classes. Thus, the following are the main research contributions (RC) of this study:
The remaining part of this article is structured as follows: Section 2 discusses the related work. Section 3 provides the details of the adopted research methodology. The designed questionnaire-based survey and statistical analysis are described in Section 4. Section 5 presents the proposed framework. The results and analysis are discussed in Section 6. Section 7 presents a discussion of the attained results, while implications and future work are described in Section 8. Finally, Section 9 concludes this study.

Related work
Tanveer et al. [19] analyzed the agile estimation process and its accuracy with agile development groups using a case study of two observations. The authors conducted 11 interviews with the three agile groups. Furthermore, they investigated why and how estimation was carried out and which factors are important for improving cost estimates. They found that ignoring the developer's experience, prior team knowledge, and technical complexity negatively impacts estimation accuracy. Later, Tanveer et al. [20] updated their previously developed gradient boosted tree and compared its results with their predictions-based model. The authors evaluated the usefulness and effectiveness of their proposed model with an agile team's effort estimation. However, the developers ranked the model as neutral due to unfamiliarity with the system.
In comparison to [19,20], Conoscenti et al. [21] identified the underlying reasons for inaccurate estimation and incorporated software project data analytics with development teams. After identifying the causes of inaccurate estimates, the authors used an interactive visualization tool highlighting actual and estimated effort levels based on the developers' feedback. They found that developer inexperience was the most prominent cause of inaccurate estimates.
Conversely, Taibi et al. [22] developed the Experience Factory (EF) to collect historical data using an agile methodology to reduce errors due to poor decisions. The authors examined the data in effort estimation and set goals for successful project improvement. They conducted an exploratory case study and compared it with seven projects. However, the four considered projects (without using the Agile EP) did not improve the effort estimation accuracy over time.
Usman et al. [6] conducted a survey in an agile context, focusing on the estimation techniques and various topics, including estimation methodologies and effort predictors. The authors used an online questionnaire as the instrument for the survey. They reported that expert judgment is a commonly used strategy for agile-based estimation. Moreover, they reported that the major causes of inaccurate estimates are requirements and management concerns. In another study, Usman et al. [8] analyzed the state of the art in agile-based effort estimation. The authors reported that most agile teams focused on expert-based estimation techniques such as planning poker, analogy, and expert opinion. In addition, they mentioned that estimation accuracy could be enhanced using a hybrid method at the cost of high computational complexity.
COSMIC-based estimation has shown significant performance, and software organizations can utilize it for practical effort estimation. Inspired by this, Salmanoglu et al. [23] performed three case studies utilizing agile methodologies presented by various experts. The authors concluded that COSMIC functional size is a successful and effective effort estimation technique for agile methodologies. However, data collection and evaluation methods differ because all case studies are independent. Moreover, they tested these models with the same projects, which led to an inaccurate estimation.
Raslan and Darwish [24] proposed a system based on mixing the story point. The authors used magnitude relative error (MRE) and percentage relative error deviation (PRED) for evaluation purposes. Given the potential benefits of using cross-company datasets for agile effort estimation, further research is required to improve cost estimation accuracy in agile-based development.
In the literature, several studies have been reported in the ASD context. However, further studies are needed to identify additional cost overhead factors in the agile development context. Prior work has proposed various techniques and cost factors in the context of ASD, but none of the published work has focused on validating, classifying, and prioritizing cost factors in the agile development domain. Moreover, no systematic approach has been adopted for the identification and validation of cost factors from both state-of-the-art and state-of-the-practice viewpoints. Thus, further research is required to cover all aspects of cost estimation in the ASD context. Table 1 provides the summary of related work.

Research methodology
To accomplish the targeted research objectives, we followed an SLR protocol to identify the cost factors. Moreover, we conducted an empirical study in which agile project managers validated the identified cost factors. Notice that an SLR is a standard review type that aims to review and synthesize the existing literature systematically. Through the performed SLR, we systematically identified the overhead cost factors in an unbiased manner, evaluated the results, and classified the identified factors [31,32]. In this work, we followed Kitchenham's guidelines [33] to conduct the SLR. Generally speaking, an SLR consists of three main phases: (i) planning the review, (ii) conducting the review, and (iii) reporting the review. The following sections briefly describe the phases of the conducted SLR.
Fig. 2 illustrates the adopted research methodology. In the first phase, a general literature review was conducted for problem formulation. Notice that different factors can be associated with the cost overhead in the context of ASD; hence, it is selected as the base problem. In the second phase, the SLR was used to extract different cost factors, challenges, and cost estimation models in the ASD domain. After data collection, we performed various statistical tests to analyze the data and tested the devised hypotheses. In phase 5, the experts validated the proposed framework.

Planning the review
To conduct the SLR, it is essential to state all steps of this phase explicitly. Moreover, different prerequisite activities need to be completed before conducting an SLR. The following are the main performed steps:
• Devising the research question
• Selecting the best research repositories
• Creating a search string for extraction of the relevant studies
• Specifying the study's inclusion and exclusion criteria
• Formulating the quality assessment criteria
The following sections provide the details about the steps of the planning phase of the conducted SLR.

Research questions
Petticrew and Roberts [34] suggested the PICOC strategy to define research questions. We followed the PICOC process in this work to formulate the research questions. Table 2 describes the employed PICOC process.
Considering the targeted research objective and the PICOC method, the following four research questions (RQ) were formulated, covering the cost estimation facet in the ASD context.
RQ3: How could the identified factors be prioritized to measure their impact on cost estimation?
RQ4: How can a thematic taxonomy of cost overhead factors be devised in the ASD context?

Data repositories
This section discusses the search repositories selected to find potential studies in the targeted research context. The digital libraries include IEEE Xplore, ACM Digital Library, Springer Link, Science Direct, and Wiley Library. Notice that the considered digital repositories were selected based on the recommendations of Chen et al. [35]. After devising the general search string, we tailored it by following the guidelines of each considered digital library. In this work, we considered the studies published from 2010 to 2022 in order to provide an up-to-date view of the literature. The databases listed in Table 3 were selected based on previously conducted SLRs [7][8][9][10] to find the relevant primary studies. We identified the main key terms with potential relevance to the topic through a search of various academic databases. Notice that each search query was executed differently in each considered data repository. After selecting the studies, inclusion and exclusion criteria were applied to select the primary studies. Consequently, we found a comprehensive list of primary studies.
Fig. 3 depicts the overall procedure of the search process. Initially, we selected a data repository and executed the devised search strings. Then, after identifying the initial studies, we applied the formulated inclusion and exclusion criteria. Next, we analyzed the studies according to the targeted research objective. The primary studies fulfilling the inclusion and exclusion criteria were selected to answer the devised research questions.

Table 1 (fragment): Summary of related work
Tayh et al. [26]: The authors demonstrated how fuzzy logic could be used to improve effort estimation accuracy using user stories. Framework based; fuzzy logic; story size by using the Fibonacci sequence, COM levels. Actual data from software projects is not used to compare results. There is no systematic approach to factor quantification.
Manal El Bajta [27]: The model is similar to Case-Based Reasoning. Similar projects must be identified first to calculate costs in ASD. Model based; analogy. The work is still ongoing, so the approach has yet to be evaluated. Not having data or not having previously completed a similar project makes this method difficult.
Khuat and Le [28]: Proposed a PSO approach to reduce the project duration, effort, and estimation of schedule to build software in ASD. Algorithmic based; ABC-PSO algorithm; MMRE, PRED, MER. Lacks categorization and prioritization of identified factors.
Lenarduzzi et al. [29]: Introduced ASD estimation by demonstrating how stories can be quantified using function points to determine the project cost. Replicated study; function point; weightage factor (low, average, high). There is no systematic approach to factor prioritization.
Gandomani et al. [

Search string
After devising the research questions and selecting the appropriate data repositories, the next step is to define a search string to identify different estimation techniques, methods, cost factors, and challenges associated with cost estimation. In this work, we followed the Kitchenham guidelines [33,36] to define the search string using the following steps:
• Make a list of all main keywords.
• Define alternative words, synonyms, or terms related to the main keywords.
• Check whether the studies contain the main keywords.
• Use the Boolean 'OR' operator to join synonyms, alternative words, or terms related to the main keywords, and the Boolean 'AND' operator to link the main terms.
Table 4 provides the main terms and their synonyms used to devise the final search string. The following is the devised search string used to find the relevant studies:
Studies that satisfied the search criteria in their title, abstract, or keywords were selected. Additionally, peer-reviewed journals and conferences were selected as viable sources of the required information.
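The OR/AND construction described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_search_string` and the keyword groups are hypothetical and are not the terms of Table 4 or the authors' actual search string.

```python
def build_search_string(term_groups):
    """Join the synonyms of each main keyword with OR, then link the groups with AND."""
    clauses = []
    for synonyms in term_groups:
        quoted = ['"{}"'.format(term) for term in synonyms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical keyword groups, for illustration only (not the terms of Table 4).
example_groups = [
    ["agile", "scrum"],
    ["cost estimation", "effort estimation"],
    ["factor", "driver"],
]
print(build_search_string(example_groups))
```

In practice, the resulting string would still need to be adapted to the query syntax of each digital library, as noted in the text.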

Inclusion criteria
Kitchenham's guidelines [37] were followed to develop the inclusion and exclusion criteria for the conducted study. We devised these criteria to ensure that the selected studies contained the relevant information. For primary study selection, the following inclusion criteria were used:
IC1: The potential study must be published in a peer-reviewed journal, conference, or book chapter.
IC2: The study should cover effort and cost estimation in the ASD context.
IC3: The study discusses the cost drivers or factors in the context of ASD.
IC4: The potential study should report the cost estimation techniques or models in the ASD context.
IC5: The selected studies should be written in English and available in full text.

Exclusion criteria
For primary study selection, the following exclusion criteria were followed:
EC1: The title, abstract, or any part of the content is unrelated to the research topic, even though it is lexically related to the search string.
EC2: The study is not written in English.
EC3: The study has no resemblance to the targeted research theme.
EC4: The study does not cover the cost estimation issues in the context of ASD.
EC5: The study is a duplicate or a short paper lacking technical details.

Quality assessment criteria
In this work, we followed the quality checklist (QC) guidelines of Kitchenham [33] and Khan et al. [38] for the selection of relevant primary studies. The studies were scored based on how well they satisfied the quality criteria. Each criterion is answered on a predefined scale (Y, N, P), signifying acceptance (Y = 1), rejection (N = 0), and partial acceptance (P = 0.5). The threshold value for the quality of work was set at 2.0. Table 5 lists the quality checklist questions used to ensure the quality of the studies.
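The scoring rule can be illustrated with a short Python sketch. It assumes that a study passes when its total score is at least the 2.0 threshold (the text does not say whether the comparison is strict), and the example answers are hypothetical rather than taken from Table 7.

```python
# Scale from the text: Y = 1, P = 0.5, N = 0.
SCORES = {"Y": 1.0, "P": 0.5, "N": 0.0}
THRESHOLD = 2.0  # threshold stated in the text


def quality_score(answers):
    """Sum the per-question scores of one study."""
    return sum(SCORES[answer] for answer in answers)


def passes_quality(answers, threshold=THRESHOLD):
    """Assumption: a study is retained when its score is >= the threshold."""
    return quality_score(answers) >= threshold


# Hypothetical checklist answers for two studies (illustration only).
print(quality_score(["Y", "P", "P"]), passes_quality(["Y", "P", "P"]))
print(quality_score(["Y", "N", "P"]), passes_quality(["Y", "N", "P"]))
```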

Conducting the review
In this phase, we applied different queries to select primary studies and used a standard approach to extract the studies. A significant number of identified studies were discarded based on the inclusion/exclusion criteria because they were unrelated to the targeted research context. The following subsections discuss the details of the performed activities.

Selection of primary studies
The tollgate approach proposed by Afzal et al. [39] was used to refine the selection of primary studies. Following the devised ICs, the tollgate approach comprises five phases, as depicted in Fig. 4. After removing the duplicate studies, 1746 studies were retrieved from the data repositories using a search strategy that included only journal and conference studies. The approach yields a subset of primary studies that can be analyzed against the research questions.
In the initial phase, 1095 studies were selected using IC1, based on study type (conference and journal).
In the second phase, 643 studies were selected using IC2, i.e., the study should be related to effort and cost estimation.
In the third phase, 314 studies were retained using IC3, i.e., the potential study should focus on factors and challenges.
In the fourth phase, 136 studies were retained using IC4, i.e., the study should discuss cost estimation techniques. After applying the full-text criterion, only those studies that included effort estimation techniques and discussed cost factors and challenges were chosen, which resulted in 24 papers. Moreover, applying a snowballing technique added seven papers, giving a final set of 31 primary studies.
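The selection funnel can be checked with simple arithmetic. The sketch below uses only the counts reported above; the phase labels and the `funnel` helper are illustrative names, not part of the tollgate approach itself.

```python
# Study counts reported in the text, in order of filtering.
phases = [
    ("after duplicate removal", 1746),
    ("phase 1", 1095),
    ("phase 2", 643),
    ("phase 3", 314),
    ("phase 4", 136),
    ("full-text screening", 24),
]
SNOWBALLED = 7  # additional papers found through snowballing


def funnel(steps):
    """Return (phase, kept, excluded, percent kept) for each filtering step."""
    rows = []
    for (_, before), (name, after) in zip(steps, steps[1:]):
        rows.append((name, after, before - after, round(100.0 * after / before, 1)))
    return rows


for name, kept, excluded, pct in funnel(phases):
    print(f"{name}: kept {kept}, excluded {excluded} ({pct}% retained)")

final_total = phases[-1][1] + SNOWBALLED
print("primary studies:", final_total)
```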

Data synthesis
In this phase, the data collected from the primary studies are organized and compared against the research questions. As shown in Fig. 4, the studies were filtered through the different levels of the tollgate approach for data synthesis. From a total of 31 primary studies, 15 cost factors and 11 ASD-specific cost estimation models were identified. Among the excluded studies, most (54) were unrelated to effort and cost estimation, 17 were not agile-based, two were written in other languages, and 15 lacked techniques. We obtained a set of 31 primary studies, as shown in Table 6.

Reporting the review
In this phase, the selected primary studies were reviewed against the formulated research questions, and the primary studies list was finalized. This section also illustrates the temporal distribution graph and preliminary study types to identify trends. The details of the steps are described in the following sections.

Quality attributes
The selected studies were reviewed and analyzed to ensure the quality of each study for final inclusion in the set of selected primary studies. Table 7 shows the quality score for each primary study based on the formulated quality questions. Table 7 shows that the average quality score is more than 2.0. This high quality score indicates that the potential studies met the quality criteria and that most were relevant to the topic. The selected studies targeted cost estimation, specifically in the context of ASD. Some primary studies are based on cost factors, while others propose models in the context of ASD.

Temporal distribution of primary studies
Fig. 4 shows that we settled on 31 primary studies. Of these, 45% were published from 2010-2015, and 55% were published from 2015-2022. This trend shows the importance of this topic in contemporary ASD. We also applied the snowballing technique, following Wohlin's [47] guidelines, to cover all the relevant studies through citations, and used Google Scholar for this purpose. As a result, 7 additional studies were included in the potential primary studies. The rationale for including these studies is that they contained all relevant information and answered the formulated research questions.

Prepare questionnaire
A questionnaire is a data collection instrument consisting of a set of questions that must be answered. We designed a questionnaire using Google Forms (https://workspace.google.com/). The survey's primary objective was to collect data from developers and agile project managers. The questionnaire contains a series of questions for conducting the analysis, including questions related to cost estimation techniques and cost factors in the context of ASD. A second questionnaire was conducted for the F-AHP, for which we targeted senior agile project managers with more than five years of experience. After contacting them via LinkedIn (https://www.linkedin.com/), we sent the questionnaire to the identified agile project managers. Finally, we received 22 responses, which were further analyzed and used to rank the factors with the F-AHP technique.

Sample size and technique
We selected agile project managers, agile practitioners, Scrum Masters, and senior developers from various software organizations as recipients of the questionnaire. We distributed the questionnaire via LinkedIn and social platforms by searching for the keywords ''Agile Project Managers'', ''Agile Practitioners'' and ''Senior Developers'' and then selecting the ''group only'' filter. After joining the various groups, we asked participants to complete the questionnaire. Over 100 organizations were contacted to ensure the results were authentic. A total of 200+ responses were received. We used a questionnaire because it is a convenient method when respondents cannot be reached in person. A five-point Likert scale [48] was used in the questionnaire for each identified factor (Strongly agree, Agree, Neutral, Disagree, Strongly disagree). The obtained responses were then converted into percentages and refined with data analysis techniques.
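Converting Likert responses into percentages can be sketched as follows. The response list is hypothetical, and `agreement_rate` (the combined share of "Strongly agree" and "Agree") is an assumed summary measure; the text does not state how the authors aggregated positive responses.

```python
from collections import Counter

LIKERT = ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]


def likert_percentages(responses):
    """Return the percentage of responses at each Likert level for one factor."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    total = len(responses)
    return {level: 100.0 * counts.get(level, 0) / total for level in LIKERT}


def agreement_rate(responses):
    """Assumed summary: share of 'Strongly agree' plus 'Agree' responses."""
    pct = likert_percentages(responses)
    return pct["Strongly agree"] + pct["Agree"]


# Hypothetical responses for a single cost factor (illustration only).
sample = ["Agree", "Agree", "Agree", "Neutral", "Strongly agree"]
print(likert_percentages(sample))
print(agreement_rate(sample))
```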

Data cleaning
After collecting the responses, the data were analyzed to remove duplicates and replications. In total, 154 responses remained after the data were cleaned, corrected, and validated by the respondents. In the second survey, we received 22 responses, of which only 18 remained after the data-cleaning process.

Data analysis
The data were analyzed using frequency analysis. Next, we determined the significance of the framework using the levels of agreement of the agile project managers. After obtaining the percentages and creating the frequency chart, we used Levene's test [49], the t-test [50], and the Spearman correlation test [51]. These statistical tests were performed to ensure the validity of the results and to determine whether the identified cost factors are as significant as reported in the literature.

Levene's test for RQ1
Levene's test result indicates homogeneity of variance between the SLR results and the empirical study. Table 8 shows these variance values and the percentages gathered from the reviewed literature and questionnaire.

T-test for RQ1
The T-Test was used in addition to Levene's test in

Spearman test for RQ1
The Spearman correlation test was used in addition to the t-test. We used Spearman's rank-order correlation test to determine the significance of the difference between the SLR and empirical study outcomes. A coefficient value (Rs) closer to 1 indicates a positive correlation, whereas a value of Rs closer to -1 indicates a negative correlation. The Spearman coefficient for the current study was 0.667414, demonstrating a strong positive correlation between the rankings generated from the SLR and the results of the empirical research. The p-value is 0.01, so the correlation is statistically significant and we reject the null hypothesis of no association between the two rankings.
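For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on ranks. The following dependency-free sketch uses average ranks for ties; the input lists are hypothetical and are not the SLR or survey data, and the p-value computation is omitted.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(vx * vy)


# Hypothetical factor frequencies from two sources (illustration only).
slr_pct = [80, 65, 50, 40, 30]
survey_pct = [75, 70, 45, 50, 20]
print(round(spearman_rho(slr_pct, survey_pct), 3))
```

The coefficient always lies between -1 and 1, which is why values near those two bounds are read as strong negative and strong positive association.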

Proposed framework
This section presents the proposed conceptual framework for prioritization of cost factors in ASD. Fig. 5 provides details about the phases of the proposed conceptual framework.

Phase 1: Data Collection
In Phase 1, we conducted an SLR to identify all overhead cost factors in ASD and classify them as critical, moderate, or low frequency. Following that, we validated all cost factors with the industry using a questionnaire-based survey and identified some additional factors. The identified factors were categorized using the 4Ps [52] (People, Product, Process, and Project). Agile project managers, agile practitioners, scrum masters, and developers participated in this study.

Phase 2: Data Processing
In Phase 2, we applied the F-AHP technique to prioritize the identified factors. We first constructed a hierarchical structure for the factors and performed pairwise comparisons. After that, we examined the consistency ratio (CR) of each pairwise comparison matrix. The criteria weights are calculated if the CR is less than 0.10; otherwise, the pairwise comparisons are adjusted and the consistency is rechecked. Next, a thematic taxonomy was developed by assigning each factor its local and global weight.

Phase 3: Output
This phase of the proposed conceptual framework generates a final prioritized list of cost overhead factors to effectively assist the agile project managers during the cost estimation process.

Application of Fuzzy-Analytical Hierarchy Process (F-AHP)
The first step is to create a pairwise comparison matrix [17] of the factors to rank the overhead cost factors based on the four identified classes (i.e., people, process.project, and product).After comparing the factors, the next step is constructing a pairwise comparison matrix of the alternatives against each criterion individually.The F-AHP was applied to a data set of cost factors from a questionnaire-based survey from the Agile project managers.Therefore, prior knowledge of the AHP is required to comprehend the F-AHP.The following are the steps involved in the AHP approach: 1. Establish a hierarchy of complex decision problems 2. Compute the weight of each hierarchy level (criteria and alternatives) with the help of a pairwise comparison matrix.3. Calculate the consistency ratio of the pairwise comparison matrix.4. Compute the normalized weight and determine their local ranking.

Multiply criteria weight with their alternative weights to
demonstrate their global ranking.
The traditional AHP must address decision-makers ambiguities to finalize the priorities of different criteria.To overcome the ambiguities, F-AHP is proposed by combining AHP with fuzzy theory for determining accurate judgments in real-time problems [53].Furthermore, F-AHP could be used for qualitative and quantitative data in MCDM problems.In this study, we have implemented the F-AHP (Geometric Mean Method) proposed by Buckley [17], which provides more precise results than traditional F-AHP.The technique was implemented in the Microsoft Excel3 environment.The step-by-step implementation of the F-AHP method [17] is explained in the following steps.
Step 1: Formulate the pairwise comparison matrix.The first step in applying the F-AHP is constructing the hierarchical structure.The hierarchical structure consists of three levels; level 1 identifies the primary goal of performing the analytical hierarchy process.In this case, the main goal is to prioritize the overhead cost factors.Level 2 of the hierarchical structure identifies the factors to achieve the goal.Four classes (4Ps) [32], namely people, project, process, and product, have been identified via systematic literature review.In addition, their level of impact has been identified via questionnaires distributed amongst agile practitioners and project managers.The hierarchical structure for the targeted problem is shown in Fig. 6.
A fuzzy pairwise comparison matrix is constituted using the fuzzy conversion scale [54] given in Table 10. The fuzzy pairwise comparison captures the linguistic variables that the experts used to give their survey opinions. The resultant pairwise comparison matrix is constructed by using Eq. (1).

$$\tilde{Z}_{ij} = \frac{1}{K}\sum_{k=1}^{K} \tilde{z}_{ij}^{k} \qquad (1)$$

where $\tilde{z}_{ij}^{k}$ denotes the preference of the k-th decision maker for factor i over factor j, and K is the number of decision makers.
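As an illustration of how several decision makers' judgments can be combined into one matrix entry, the following minimal sketch averages triangular fuzzy numbers component-wise. It assumes each judgment is stored as a (lower, middle, upper) tuple; the function name `average_tfn` and the example values are hypothetical and are not taken from the survey data or from Table 10.

```python
from typing import List, Tuple

# A triangular fuzzy number, assumed to be stored as (lower, middle, upper).
TFN = Tuple[float, float, float]


def average_tfn(judgments: List[TFN]) -> TFN:
    """Component-wise arithmetic mean of the decision makers' fuzzy judgments."""
    k = len(judgments)
    if k == 0:
        raise ValueError("at least one judgment is required")
    lower = sum(j[0] for j in judgments) / k
    middle = sum(j[1] for j in judgments) / k
    upper = sum(j[2] for j in judgments) / k
    return (lower, middle, upper)


# Hypothetical judgments of three experts for a single pair of factors.
example = average_tfn([(1.0, 1.0, 1.0), (2.0, 3.0, 4.0), (3.0, 5.0, 7.0)])
```

Applying the function to every cell of the comparison matrix would give the aggregated fuzzy matrix used in the subsequent steps.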
Step 2: Update the pairwise comparison matrix. Update the pairwise comparison matrix based on the averaged preferences, as shown in Eq. (2), where P represents the pairwise contribution matrix.
Step 3: Determine the geometric mean. Calculate the geometric mean of the fuzzy comparison values using Buckley's Eq. (3).
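The geometric mean of Step 3 can be sketched as follows. This is a minimal illustration, assuming that each row of the fuzzy comparison matrix is a list of (lower, middle, upper) tuples and that the geometric mean is taken component-wise, as in Buckley's method; the name `fuzzy_geometric_mean` and the example row are hypothetical.

```python
import math
from typing import List, Tuple

# A triangular fuzzy number, assumed to be stored as (lower, middle, upper).
TFN = Tuple[float, float, float]


def fuzzy_geometric_mean(row: List[TFN]) -> TFN:
    """Component-wise geometric mean of one row of the fuzzy comparison matrix."""
    n = len(row)
    if n == 0:
        raise ValueError("the row must contain at least one fuzzy number")
    lower = math.prod(p[0] for p in row) ** (1.0 / n)
    middle = math.prod(p[1] for p in row) ** (1.0 / n)
    upper = math.prod(p[2] for p in row) ** (1.0 / n)
    return (lower, middle, upper)


# Hypothetical row with two entries: the diagonal (1, 1, 1) and one comparison.
t_example = fuzzy_geometric_mean([(1.0, 1.0, 1.0), (4.0, 9.0, 16.0)])
```

For this row the result is approximately (2, 3, 4), since each component is the square root of the corresponding product.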
Step 4: Determine the fuzzy weights. The fuzzy weight of each criterion is found in three sub-steps:
Step 4.1: Determine the sum of all t i values.
Step 4.2: Take the inverse of the total sum.
Step 4.3: Multiply each t i by this inverse to obtain the fuzzy weights.
Eq. (4) formally describes the fuzzy weight of each criterion.
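Steps 4.1 to 4.3 can be sketched as below. The sketch assumes (lower, middle, upper) tuples and follows the usual fuzzy-arithmetic convention that the inverse of a triangular fuzzy number reverses the order of its components so that the result stays in increasing order; the function name `fuzzy_weights` and the input values are hypothetical.

```python
from typing import List, Tuple

# A triangular fuzzy number, assumed to be stored as (lower, middle, upper).
TFN = Tuple[float, float, float]


def fuzzy_weights(t: List[TFN]) -> List[TFN]:
    """Fuzzy weight of each criterion from the geometric means t_i."""
    # Step 4.1: component-wise sum of all t_i values.
    total_l = sum(x[0] for x in t)
    total_m = sum(x[1] for x in t)
    total_u = sum(x[2] for x in t)
    # Step 4.2: inverse of the sum, reordered so the result is increasing.
    inverse = (1.0 / total_u, 1.0 / total_m, 1.0 / total_l)
    # Step 4.3: multiply each t_i by the inverse.
    return [(l * inverse[0], m * inverse[1], u * inverse[2]) for (l, m, u) in t]


# Hypothetical geometric means for two criteria.
w_example = fuzzy_weights([(1.0, 2.0, 3.0), (1.0, 2.0, 3.0)])
```

Note that only the middle components of the resulting weights sum to 1; the lower and upper components do not, which is why de-fuzzification and normalization follow.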
Step 5: De-fuzzify the triangular fuzzy numbers. Since the W i are still triangular fuzzy numbers [55], they must be de-fuzzified using the center of area method, Eq. (5), which yields the non-fuzzy number M i.
Step 6: Normalize the weights. M i is then normalized by applying Eq. (6), where N i is the final weight after normalization.
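Steps 5 and 6 can be illustrated with the short sketch below. It assumes that the center of area of a triangular fuzzy number is the arithmetic mean of its three components, which is the form commonly used with Buckley's method, and that normalization divides each crisp value by the sum of all crisp values. The function names and the input weights are hypothetical.

```python
from typing import List, Tuple

# A triangular fuzzy number, assumed to be stored as (lower, middle, upper).
TFN = Tuple[float, float, float]


def defuzzify_center_of_area(w: TFN) -> float:
    """Crisp value M_i, assumed here to be the mean of the three components."""
    return (w[0] + w[1] + w[2]) / 3.0


def normalize(values: List[float]) -> List[float]:
    """Final weights N_i: each M_i divided by the sum of all M_i."""
    total = sum(values)
    if total == 0:
        raise ValueError("the crisp weights must not sum to zero")
    return [v / total for v in values]


# Hypothetical fuzzy weights for two criteria.
m_values = [defuzzify_center_of_area(w) for w in [(0.1, 0.2, 0.3), (0.2, 0.4, 0.6)]]
n_values = normalize(m_values)
```

After normalization the weights sum to 1, so they can be compared directly and used for ranking.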
These six steps are followed to obtain the normalized weights of both the criteria and the sub-criteria. The score of each sub-criterion is then calculated by multiplying its weight by the weight of the related criterion. Based on these results, the sub-criterion with the highest score is recommended to the decision-maker.
Step 7: Check the consistency ratio. The graded mean method [56] is used, whereby the fuzzy pairwise comparison matrix is converted into a crisp matrix: a triangular fuzzy number denoted as P = (g, e, f) is defuzzified to a crisp number using Eq. (7).
Moreover, after the de-fuzzification of the matrices, the Consistency Index (CI) and the Consistency Ratio (CR) are found using Eqs. (8) and (9):

$$CI = \frac{\lambda_{max} - n}{n - 1} \qquad (8)$$

$$CR = \frac{CI}{RI} \qquad (9)$$

where λ max is the largest eigenvalue of the comparison matrix, n is the number of items being compared in the matrix, and RI is the Random Index, whose value can be obtained from Table 11.

Results and analysis
This section discusses the findings and results for each formulated research question. Furthermore, analysis and empirical evaluation are performed to validate the obtained results. This section also provides the results of F-AHP and presents a thematic taxonomy of the cost overhead factors in the context of ASD.

RQ1: What factors cause cost overhead, and how to validate them by the ASD experts empirically?
The cost factors are the factors that have a multi-dimensional impact (i.e., positive or negative) in the ASD context. From the literature perspective, the researchers have reported various cost overhead factors. Thus, we have followed standard value ranges to highlight the criticality level of the identified cost overhead factors. In this research, a factor is classified as critical if its frequency is greater than 50%, moderate if it is between 25% and 50%, and low impact if it is less than 25%, as followed by other works [36], [38]. However, the practitioners might have a different opinion about their criticality level compared to the state-of-the-art studies. To handle this issue, we have performed a questionnaire-based survey to determine the consequence level from the practitioners' viewpoint. By doing this, we aim to appropriately report the critical, moderate, and low impact of cost overhead factors in the ASD context.
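The frequency thresholds described above can be expressed as a small helper. This is only a sketch: the function name `criticality` is hypothetical, and the treatment of the exact boundary values (25% and 50% counted as moderate) is an assumption based on the ranges stated in the text.

```python
def criticality(frequency_percent: float) -> str:
    """Map a factor's frequency (in percent) to its criticality level."""
    if not 0.0 <= frequency_percent <= 100.0:
        raise ValueError("frequency must be between 0 and 100")
    if frequency_percent > 50.0:
        return "critical"
    if frequency_percent >= 25.0:
        return "moderate"
    return "low"


# Hypothetical frequencies for three factors.
levels = [criticality(f) for f in (62.5, 40.0, 12.0)]
```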
A total of 15 (out of 31) studies focused on the cost factors, while the others discussed ASD-specific cost estimation techniques. The existing studies explained cost factors but lacked empirical evidence to validate and prioritize them. Table 12 describes the cost factors extracted from the literature. A total of 16 cost factors were identified, which are further explored in the following sections.

Factors Having a Critical Impact On Cost
To evaluate the significance of each cost factor, we treated a frequency greater than 50% as the threshold for a critical cost factor, meaning that the factor has severe consequences for the cost. A relevant study [38] also adheres to the same criterion. We identified six critical cost factors using this criterion. The most critical cost factors include [CF9].

Factors With Moderate Impact On Cost
We classified cost factors as having a moderate impact when their frequency ranged from 25% to 50%. Based on the available literature, we identified six cost factors with a moderate impact on cost estimation.

Factors Having Low Significant Impact On Cost
We adopted the criterion of a frequency less than 25% to identify the least significant cost factors. These cost factors have a negligible influence on the cost and could therefore be overlooked. Four cost factors fall into this category: [CF10], [CF11], [CF13], and [CF15].

Empirical Validation of RQ1
The identified factors were analyzed to determine the critical cost factors. Factors with a frequency greater than 50% in both the literature and the industry were considered crucial. Such a frequency indicates that a highlighted cost factor is equally important to practitioners and must be considered in the estimation process. The extraction of critical cost factors is of the greatest priority. Table 13 shows a comparison of the ranks obtained from the SLR and the empirical study.

RQ2: What are the current state-of-the-art cost estimation techniques in the ASD context?
To answer RQ2, we gathered research data from several repositories to ensure the quality of the literature review. For this purpose, we have reviewed different techniques, frameworks, and approaches for effort estimation in ASD. The review of the current state-of-the-art techniques for managing effort estimation in ASD is presented in Table 1. Each technique is labeled with its description, study type, technique, evaluation measures, and limitations.

RQ2.1: What are the ASD-specific effort estimation techniques from the industrial perspective?
This study has also considered an industrial perspective regarding the cost estimation techniques used in the ASD environment. To accomplish this, we performed a questionnaire-based survey of practitioners (i.e., agile project managers) to identify the cost estimation techniques used in the context of ASD. For this purpose, 154 agile project managers from various ASD companies were considered in this study. It has been observed that companies do not rely on a single cost estimation technique but instead use multiple techniques for project estimation. The literature also reports benefits from combining multiple techniques. Multiple organizations use a combination of cost estimation techniques in the context of ASD, as shown in Table 14. However, they mainly rely on expert opinion and analogy-based techniques, indicating the need for an appropriate formal cost estimation technique. Software organizations based on ASD continue to prefer non-formal cost estimation techniques because the results of formal techniques in this context are unsatisfactory; they fail to account for the additional cost factors of ASD. The limitations of existing cost estimation techniques are discussed in Table 1. Another observation is that algorithmic-based estimation has a higher percentage than formal models, so these techniques can be enhanced to increase estimation precision in the future.

RQ3: How could the identified factors be prioritized to measure their impact on cost estimation?
The Fuzzy-Analytical Hierarchy Process (F-AHP) technique is used to rank the identified cost factors and classes. The steps of the adapted F-AHP are described as follows:

1: Proposed hierarchy model of identified cost overhead factors and their classes
The foremost step of applying F-AHP is decomposing the complex decision-making problem into steps and sub-steps. A hierarchy structure has been developed by considering the identified influential factors and their categories, as presented in Fig. 6.

2: Conducting the Pairwise Comparison
This study aims to rank the identified cost factors and their respective classes. The second step of F-AHP is the pairwise comparison. The purpose of the second survey was to compare the identified factors and their respective classes pairwise. Respondents of the first survey were contacted to take part in the second survey. Few individuals responded, so additional experts were contacted to rank the identified influential factors and their classes using F-AHP. We received 22 responses, and after refining and cleaning the results, the responses of 18 experts were considered for this study. Table 16 illustrates the pairwise comparison of the main classes. The limited sample size can complicate the investigation; however, both AHP and F-AHP have been applied to small datasets in existing studies. For example, Akbar et al. [58] ranked the cloud-based outsourcing development challenges using F-AHP and collected responses from 31 experts. Similarly, Khan et al. [59] performed F-AHP prioritization on the factors of software process improvement in global software development, and 26 experts provided feedback. Shameem et al. [60] performed an AHP analysis to rank the influential factors of distributed agile software development based on the responses of five experts. Moreover, Farid et al. [61] used AHP to prioritize the critical issues of eLearning in Pakistan based on the responses of 18 experts.
In line with the above-discussed studies, we have conducted an F-AHP analysis by considering the responses of 18 experts, which permits generalizing the study's findings. The pairwise comparison matrix is initially constructed using the frequency analysis method and the survey results. The geometric mean must be calculated for the pairwise comparisons of the identified cost factors and their respective classes. The F-AHP data are transformed using Eq. (1) [17], using the sum of columns and the inverse of the sum, as depicted in Table 17.

3: Calculating the Local Weight
The Local Weight (LW) of all cost factors and of each main class has been calculated using Eqs. (5) and (6) [17]. For de-fuzzification, the average of each record of the matrix presented in Table 19 is calculated using Eq. (5) [17], and the results are shown in Table 17.
The sum of M i in Table 18 is not equal to 1, so these weights must be normalized. Eq. (6) [17] has been used for this purpose, and the results are given in Table 19.

4: Consistency check of the pairwise comparison matrix
The consistency ratio of each matrix was computed by following the method highlighted in Step 7. For example, the largest eigenvalue λ max was calculated for the pairwise comparison matrix of the primary classification of cost factors. For determining λ max, it was necessary to de-fuzzify the fuzzy triangular matrix.
The largest eigenvalue was calculated by multiplying each column's sum (Table 15) by its corresponding local weight (Table 17) and summing the products. The local priority weights are developed by following Steps 1 to 6. Based on the calculated eigenvalue, the CI and CR were computed using Eqs. (8) and (9). The FCM covers the four main classes, so n = 4, and the RI value taken from Table 11 is 1.12.
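The column-sum approach to the largest eigenvalue can be sketched as follows. It uses the common AHP shortcut in which λ max is approximated by the sum, over all columns, of the column sum multiplied by the corresponding priority weight. The function name `approximate_lambda_max` and the 2 × 2 example matrix are hypothetical and unrelated to the survey data.

```python
from typing import List


def approximate_lambda_max(matrix: List[List[float]], weights: List[float]) -> float:
    """Approximate lambda_max as the sum of (column sum * local weight)."""
    n = len(matrix)
    if any(len(row) != n for row in matrix) or len(weights) != n:
        raise ValueError("matrix must be square and match the number of weights")
    column_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return sum(s * w for s, w in zip(column_sums, weights))


# Hypothetical, perfectly consistent crisp matrix and its priority weights.
lam = approximate_lambda_max([[1.0, 2.0], [0.5, 1.0]], [2.0 / 3.0, 1.0 / 3.0])
```

For a perfectly consistent matrix the approximation equals n (here 2), which gives a consistency index of zero.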
The resulting values are λ max = 3.9343, CI = −0.0219, and CR = −0.02. The calculated CR of −0.02 is less than 0.1; therefore, the pairwise comparison matrix developed to classify the cost factors is consistent and adequate. Similarly, the CR of all the matrices is calculated. Table 21 presents the CR results related to people. Similarly, Tables 22 and 23 describe the CR results for the process and project classes, respectively. Finally, Table 24 provides the details of the product class's CR results.
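As a check on the reported values for the main classes, the arithmetic can be written out using the standard AHP definitions of CI and CR. The worked example takes n = 4 (the four main classes) and RI = 1.12 as stated in the text; the standard form of both formulas is an assumption here.

```latex
\begin{align*}
CI &= \frac{\lambda_{max} - n}{n - 1} = \frac{3.9343 - 4}{4 - 1} = \frac{-0.0657}{3} \approx -0.0219,\\
CR &= \frac{CI}{RI} = \frac{-0.0219}{1.12} \approx -0.0196 \approx -0.02.
\end{align*}
```

Both results agree with the reported CI and CR after rounding.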

5: Calculating the Global Weight
The Global Weight (GW) is calculated by multiplying the LW of each cost factor by the LW of its respective class. For example, CF17 in the ''Process'' class ranks 1 in the local ranking, but its Global Ranking (GR) is 7 with a GW of 0.0508. This GW is determined by multiplying the LW of CF17 (0.4224) with the LW (0.2401) of the ''Process'' category. The GWs of all other CFs are calculated and presented in Table 25.
Based on Table 25, CF7 and CF1 impact agile practitioners the most due to their high GW, whereas CF16 and CF20 are the least significant cost factors due to their low GW.
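The relationship between local and global weights can be sketched as below. The sketch assumes that the global weight of a factor is its local weight multiplied by the weight of its class and that the global ranking sorts factors by descending global weight. The class names, factor names, and numbers are purely hypothetical and are not the values reported in Table 25.

```python
from typing import Dict, List, Tuple


def global_weights(
    class_weights: Dict[str, float],
    local_weights: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Return (factor, global weight) pairs sorted from highest to lowest."""
    ranked: List[Tuple[str, float]] = []
    for cls, factors in local_weights.items():
        for factor, lw in factors.items():
            # Global weight = local weight of the factor * weight of its class.
            ranked.append((factor, lw * class_weights[cls]))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical weights for two classes with two factors each.
ranking = global_weights(
    {"A": 0.6, "B": 0.4},
    {"A": {"F1": 0.7, "F2": 0.3}, "B": {"F3": 0.9, "F4": 0.1}},
)
```

In this hypothetical case, F3 is ranked first inside class B but only second globally, which mirrors how a locally top-ranked factor can receive a lower global rank.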

6: Final Ranking of Identified Cost Overhead Factors
The final ranking of the identified factors is determined by multiplying the Local Weights with their Global Weights. Table 26 reveals that CF7 (Technical Complexity) is the highest cost overhead factor and CF8 (Developer Knowledge) is the second-highest cost overhead factor. CF2 (Team Prior Experience) is the third most significant cost factor. The results of the F-AHP showed that CF10 (Changing/Unclear Requirements), CF3 (Client/User Communication), and CF13 (Time Zone) are the other high-cost overhead factors.

RQ4: How to devise a thematic taxonomy of cost overhead factors in the ASD context?
The taxonomy of cost overhead factors has been developed by identifying and prioritizing ASD-related factors. As a result, the 20 cost factors were categorized into four main classes based on the 4Ps (People, Project, Process, and Product) of the PMBOK (Project Management Body of Knowledge) standard [31,32]; PMBOK is a compilation of project management practices [52]. According to the survey experts, the results presented in Fig. 7 demonstrate that the People class is the top-ranked class with a weight of 0.4494. Process (0.2101) is the second-highest ranked class, as shown in Fig. 7. Moreover, Project (0.1899) and Product (0.0882) are the classes with the lowest significance.
Furthermore, CF7 (Technical Complexity) is considered the most influential factor in the project class and is ranked ''1'' among all 20 factors. Similarly, CF1 (Developer Knowledge) is top-ranked in the people class but ranked second among the 20 factors. CF2 (Team Prior Experience) is the third most influential factor and belongs to the people class.

Discussion
This study aims to investigate the primary overhead cost factors that could assist agile project managers in accurately estimating project costs. To accomplish this, we have devised four main research questions. The following sections discuss the results and analysis of the current study.

RQ1: Cost overhead factors
We have reported 16 cost factors in the ASD context, briefly discussed in Section 6. The identified cost factors are critical and may negatively or positively impact the projects' success. The main focus was to identify the hidden cost factors, which are often neglected and act as the leading cause of cost overhead in the ASD context. In addition, we have identified additional factors from the SLR that are lacking in the published studies. The classification of cost factors is presented in Section 4. The cost factors with a frequency greater than 50% are critical factors that are highly recommended for the improvement of cost estimation accuracy. The critical factors are the developer's experience, the team's prior experience, team size, technical complexity, and changing requirements. The identified critical cost factors could be used to estimate the project cost in the ASD context accurately. Moreover, only one questionnaire-based study [6] has been conducted, but it focused on validating the identified cost factors. In contrast, we intend to identify and validate the cost factors with agile project managers through a questionnaire-based survey. To this end, 300+ agile project managers of various organizations were contacted through LinkedIn and other social platforms. We received 184 responses, of which 154 remained after applying the data-cleaning process. The empirical study also highlighted some additional critical factors that are lacking in recent work, including scope, poor planning, changing market needs, and team coordination. Neglecting these factors can lead to hidden costs that influence the cost overhead. The comparative rank of the cost factors identified through the SLR and the questionnaire-based survey is presented in Table 13.

RQ2: Cost estimation techniques
Various cost estimation techniques were identified through the performed SLR, which reviewed frameworks, methods, and taxonomies. Table 1 presents the literature review matrix. Each cost estimation technique is briefly discussed, along with the study type and the model or technique name, and the limitation of each technique is explicitly mentioned. The results for RQ2 show that most of the proposed techniques lacked validation and are still in an early stage of development. Some techniques were analogy and experiment based, where experts were required to execute the model, and lacked quantification of factors. However, none of the reported studies has adopted a systematic approach for identifying overhead cost factors in the context of ASD. Further research is still required to cover all aspects of cost estimation in the ASD context.

RQ2.1: ASD specific cost estimation techniques
The ASD-specific cost estimation techniques were identified based on input from agile project managers. We conducted a questionnaire-based survey of agile practitioners, asking them to rank the provided cost estimation techniques and to report any additional techniques. Table 14 presents the categories of the identified cost estimation techniques. Agile practitioners did not rely on a single cost estimation technique. The main cost estimation categories are: (i) expert judgment, (ii) analogy-based, (iii) pay-as-go, (iv) algorithmic, (v) hybrid, and (vi) machine learning. We found that the majority of the agile practitioners used expert judgment (38%), analogy-based (26%), or pay-as-go (14%) techniques. This indicates that the literature still needs a formal cost estimation technique that can be regarded as a standard technique. We explicitly discussed the limitation of each existing cost estimation technique in Table 1. Notice that algorithmic-based estimation techniques attained higher accuracy than formal models. Hence, the accuracy of cost estimation techniques can be further enhanced in the future by employing algorithmic-oriented cost estimation techniques.

RQ3: Prioritization of the cost overhead factors
The Fuzzy-AHP technique was used to prioritize the identified cost factors and their categories. In this regard, we performed a pairwise comparison of the overhead cost factors and their respective classes. The pairwise comparison aims to identify the priorities of each identified cost factor and its category based on the F-AHP results. Compared to TOPSIS and AHP, the F-AHP technique can be viewed as an advanced analytical method developed from the traditional AHP. Moreover, F-AHP effectively rectifies the subjectivity and imprecision of the AHP approach, thereby significantly improving the decision-making process. Furthermore, F-AHP is effective in solving weightage-oriented hierarchical problems. We proposed a quantitative-based framework, which ranked the cost overhead factors based on the criteria identified from the existing literature.
We believe that the current study's findings could be beneficial in dealing with the issues associated with the cost estimation of agile-driven projects. The attained result is presented in Table 26, illustrating that CF7 (Technical Complexity) is the top-ranked cost factor globally and locally in the project category. In other words, the survey's respondents strongly believe that technical complexity can negatively impact the project cost in the context of ASD. CF8 (Developer Knowledge) is the second most significant overhead cost factor. CF2 (Team Prior Experience) is the third most significant cost factor. The results of the F-AHP showed that CF10 (Changing/Unclear Requirements), CF3 (Client/User Communication), and CF13 (Time Zone) are the other high-cost overhead factors. Notice that this is the first extensive study that focused on identifying, validating, and prioritizing cost overhead factors from an SLR and a questionnaire-based survey.

RQ4: Taxonomy of ASD cost overhead factors
A thematic taxonomy of the identified cost factors is developed by identifying and prioritizing the influence of cost overhead factors and their categories. The identified factors were further mapped into four categories (i.e., ''people'', ''process'', ''project'', and ''product'') by following the relevant works [31,32].
The results are shown in Fig. 7, in which people (0.4494) has been ranked as the top category based on the responses of agile project managers. Consequently, the people category is regarded as the most crucial category from the ASD and industrial perspective, and practitioners should focus on it to estimate the cost of projects accurately. Fig. 7 highlights that the second top-ranked category according to the survey experts is process (0.241). Agile practitioners should focus on this category to increase the success rate of ASD-based projects.

Implications and future work
This work has both research and practical implications, as it identifies, through an SLR and a questionnaire-based survey, additional cost overhead factors that are lacking in the published works. Furthermore, we validated the identified factors with practitioners (i.e., agile project managers) through a questionnaire-based research protocol. In addition, the current study emphasized the importance of prioritizing the overhead cost factors and provided a detailed thematic taxonomy using an F-AHP technique. Consequently, it is useful in providing in-depth understanding and knowledge to agile practitioners during cost estimation in ASD.
In the future, we plan to compare the results of the F-AHP technique with other MCDM techniques (e.g., Fuzzy-TOPSIS, Fuzzy-Delphi, and so on) for a better decision-making process in the ASD context. This would help find the most effective cost estimation technique in the agile-driven context through an empirical analysis of existing MCDM techniques. Another potential future research dimension is the application of the proposed framework in other domains. For example, the proposed quantitative framework could be applied in the Global Software Development (GSD) domain. For this purpose, the proposed framework must be adapted to the GSD environment.

Conclusion
In this work, we proposed a quantitative-based framework to rank the identified cost overhead factors in the agile software development (ASD) context. Although prior work has reported various cost factors in the ASD context, it lacks validation and prioritization of the identified cost overhead factors. In this regard, we performed a systematic literature review to identify the additional cost overhead factors in ASD. The identified cost overhead factors were classified into four categories using the 4Ps standard (i.e., people, process, project, and product). Moreover, an empirical study was conducted to validate and determine the additional cost factors through a questionnaire-based survey of agile project managers. The empirical data was collected from 154 agile project managers for the identified 16 cost factors. We found additional cost factors from practitioners, including scope, poor planning, changing market needs, and team coordination. To check the criticality level of the identified factors, we followed a frequency-based analysis in which a factor is critical if its frequency is greater than 50%, moderate if it is between 25% and 50%, and low impact if it is below 25%. For data processing, the proposed framework adopted an F-AHP technique to prioritize the identified cost factors and their categories. The results indicated that the ''people'' category is regarded as the most critical category in the ASD domain. In other words, the ''people'' category cannot be neglected during the cost estimation process. The most critical factors are technical complexity, developer's knowledge, prior team experience, changing/unclear requirements, and client/user communication. Based on the implementation results, we developed a prioritization-based thematic taxonomy of cost overhead factors using the local and global weights of each identified cost factor. Finally, we believe that the developed taxonomy would assist agile practitioners in accurately estimating the cost in the ASD context.

• RC1: It extensively reviews different cost estimation techniques and cost overhead factors in the ASD context.
• RC2: It investigates various overhead cost factors that affect the cost estimation process for agile-based development.
• RC3: It provides a validation of critical cost factors by the practitioners (i.e., agile project managers).
• RC4: It prioritizes the identified cost factors and proposes a conceptual framework for handling the overhead cost factors in the ASD context.
• RC5: It proposes a thematic taxonomy of overhead cost factors in agile-based projects.

RQ1: What factors cause cost overhead, and how to validate them by ASD experts empirically?
RQ2: What are the current state-of-the-art cost estimation techniques in the ASD context?
RQ2.1: What are the ASD-specific effort estimation techniques from the industrial perspective?

Fig. 4. Tollgate Approach for the Selection of Primary Studies.

Fig. 5.The Proposed Conceptual Framework for Cost Factors Prioritization in ASD.

Fig. 7. Prioritization Based Taxonomy of Identified Influential Factors.

Table 1
Summary of related work.
Population: Agile software development
Intervention: Effort/Cost estimation in ASD
Comparison: The findings of this work will be compared with empirical results.
Outcome: We are identifying different cost factors, challenges, and cost estimation techniques in the ASD context.
Context: Agile methods.

Table 4
Main terms and their synonyms.

Table 5
Quality assessment checklist questions.

Table 6
The result after full-text screening.

Table 7
Quality score of selected primary studies.

Table 9
to compare the mean between SLR and empirical research data. The results indicate no significant differences between SLR and empirical study methods. They produce similar outcomes with only minor variations in magnitude (t = −2.09; p = 0.02). The comparison of cost factors based on SLR and empirical study is shown in Table 9. The comparison considers only cost factors with a moderate or critical influence. The cost factors with a low impact on estimation are excluded because they have no real significance.

Table 12
Identified cost factors of cost estimation in ASD.

Table 13
Comparative rank of identified cost factors.

Table 14
Industrial perspective of ASD-specific cost estimation techniques.

Table 15
Classification of identified cost factors.

Table 17
Geometric mean of main classes.

Table 20
Fuzzy crisp matrix for factors classes.

Table 25
Global Weights of identified cost overhead factors.

Table 26
Final ranking of identified cost overhead factors.