Selecting a portfolio of projects considering both optimization and balance of sub-portfolios

Over the past four decades, portfolio selection has been one of the most important concerns of researchers, project managers, project-oriented companies, and public agencies around the world. Although numerous studies have been conducted in this field, there is still room for improvement in both theory and practice. One of the topics that remains largely unexplored is improving and balancing the efficiency of sub-portfolios while also optimizing the main portfolio. This study employs data-mining tools to categorize projects into sub-portfolios and rank them. Multiple Criteria Decision Making (MCDM) methods are also used to weight the criteria on which the ranking process is based. Finally, a novel multi-objective model is designed to optimize the efficiency of the sub-portfolios and the gain of the main portfolio. The model is solved by the NSGA-II algorithm. This study introduces a hybrid framework by which the project portfolio selection process can be carried out with regard to strategic alignment, cost, and risk.


Introduction
Project portfolio management (PPM) is managing a set of programs and projects as an integrated system and providing an appropriate allocation of financial, technological, and human resources (Tiggemann, et al., 1998). Project portfolio management is an important issue for organizations because it prioritizes and ranks projects in accordance with organizational strategies. AXELOS defines portfolio management as an integrated set of strategic processes and decisions to create the most effective balance between organizational and business changes (AXELOS, 2011).
PPM includes activities such as identifying, evaluating, selecting, and prioritizing projects, as well as balancing the portfolio of projects, so that projects are aligned with the strategies, vision, mission, and values of the organization. Organizational strategies are the outcome of strategic planning, in which the vision and mission turn into a strategic plan. The strategies can be further subdivided into a set of initiatives that are influenced by several factors, including market dynamics and competition, customer satisfaction, shareholder requirements, and government regulations. These initiatives can turn into projects and programs forming the project portfolio. Projects and programs are typically related to strategic objectives (Fig. 1). From the methodological point of view, project portfolio management activities can be divided into three categories: process planning, coordination, and control. Project Portfolio Selection (PPS), which is the subject of this paper, is a coordination process and one of the main components of the project portfolio management system. Selection of projects should be executed after all of the potential alternatives are identified and evaluated based on some pre-determined criteria. In fact, PPS is a periodic activity to select a set of projects regarding the organizational objectives within the available resources (Archer & Ghasemzadeh, 1999). The outcome of the Project Portfolio Selection Problem (PPSP) is a set of projects and (or) programs which optimize one or more objective functions without violating real-world limits and constraints.

Mission and Vision
Several qualitative and quantitative methods have been proposed for PPSP so far. The right method for project portfolio selection must be chosen according to various factors such as the type of projects, problem sensitivity, the type and accuracy of input and output parameters, and the level of expertise and skill of the decision-makers. The method proposed in this study is suitable for organizations where the balance between sub-portfolios must be maintained in addition to optimizing the main portfolio. The allocation of research funding to the departments of a university is a good example. If only criteria such as profitability and risk are taken into account, the entire university research funding may go to the engineering or medical colleges, and other schools such as the arts, history, or political science will receive very little funding. In this study, an objective function has been introduced that balances the performance of the sub-portfolios. Balance, as the term is used in this study, is the ratio of the number of projects selected from a sub-portfolio to the total number of projects in that sub-portfolio. This objective function maximizes the minimum value of this ratio among the sub-portfolios. As a result, each sub-portfolio achieves an acceptable level of performance relative to itself, and the main portfolio is balanced, while the profitability of the main portfolio is reduced as little as possible.
The innovation of this research, in addition to its mathematical model, lies in the use of data mining tools to automatically categorize and rank projects. This clustering is based on the impact of projects on the strategic goals of the organization. Therefore, the projects of each cluster have similar effects on the strategies of the organization. When the number of projects or strategies is large and several criteria are involved in decision making, this method has a great advantage over expert-based and supervised methods.
The structure of the rest of this article is as follows: Section 2 briefly outlines the different types of project portfolio selection methods that have been used in previous research. In Section 3, the methodology proposed in this research for project portfolio selection is outlined. This methodology includes project clustering, criteria weighting, project ranking, and final portfolio selection. In Section 4, the proposed mathematical model is solved using a multi-objective genetic algorithm, and the output of the proposed framework is shown. The last section presents the conclusion and suggestions for future work.
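The balance measure described above can be illustrated with a minimal sketch. The department names and project identifiers below are hypothetical and are not taken from the case study; the sketch only shows how the ratio of each sub-portfolio and the value targeted by the max-min objective could be computed.

```python
def balance_ratios(sub_portfolios, selected):
    """Return, for each sub-portfolio, selected projects / total projects."""
    chosen = set(selected)
    return {
        name: sum(1 for p in projects if p in chosen) / len(projects)
        for name, projects in sub_portfolios.items()
    }


def balance_objective(sub_portfolios, selected):
    """Value the balance objective tries to maximize: the smallest ratio."""
    return min(balance_ratios(sub_portfolios, selected).values())


# Hypothetical data: three university departments with candidate projects.
sub_portfolios = {
    "engineering": ["E1", "E2", "E3", "E4"],
    "medicine": ["M1", "M2"],
    "history": ["H1", "H2"],
}
unbalanced = ["E1", "E2", "E3", "E4"]  # all funding goes to engineering
balanced = ["E1", "E2", "M1", "H1"]    # every department gets a share

print(balance_objective(sub_portfolios, unbalanced))  # 0.0
print(balance_objective(sub_portfolios, balanced))    # 0.5
```

Both portfolios contain four projects, but only the second one gives every sub-portfolio a non-zero share, which is what the max-min objective rewards.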

A review of the PPS methods
Various approaches for selecting the project portfolio have been proposed in the literature, all of which aim to guide the project selection process. However, each of these methods covers only a part of this hybrid problem, addressing one or more of the features described in the previous section.
Disadvantages: these methods cannot directly deal with the risk parameter, and their results are sensitive to the weights of the criteria. (Helin & Souder, 1974) (Baker & Freeland, 1975) (Souder & Mandovic, 1986)
MCDM tools (Martel & Khoury, 1988) (Ravanshadnia, et al., 2010) (Almeida & Vetschera, 2012)
Engineering Economics (Paolini & Glaser, 1977) (Liberatore & Stylianou, 1993)
Mathematical Programming Models

Linear Programming
Advantages: these models allow us to consider all the different combinations of a large number of candidate projects implicitly; the structure of the models is clear; sensitivity analysis can be done to determine the consequences of changes in resources; the models can easily represent various interdependencies between projects; compulsory projects can be considered; and the models clearly identify resource constraints throughout the planning horizon.

Decision Tree
The decision tree is a set of decisions that are made over time and under uncertainty. Game theory also facilitates PPS when it is necessary to take into account the competitors' strategy in the decision-making process. Since there is a significant gap between the real world and the theoretical world, many simplifying assumptions are needed to solve these models. The purpose of developing these techniques is the systematic collection of knowledge and expert judgments in specialized fields and the generation of the data required for more complex models.
(Kalashnikov, et al., 2017)
Game Theory (Winkofsky, et al., 1981) (Badri & Davis, 2001)
Group Decision Techniques (Winkofsky, et al., 1981) (Cook & Seifford, 1982)
Statistical Approaches (Mathieu & Gibson, 1993)
Expert Systems (Liberatore & Stylianou, 1993)
Decision Process Analysis (Winkofsky, et al., 1981) (Schmidt, 1993)
Table 1. Project Portfolio Selection Methods (Continued)

Simulations and innovative methods
The Monte Carlo simulation uses random numbers that are generated by probability distributions.
In real-world conditions, it is not always necessary to find an optimal solution. This happens when decision-makers prefer to use more realistic strategies. As a result, the models become less complex and can be optimized in reasonable processing time. Innovative approaches can balance optimality and solving time. (Schniederjans & Wilson, 1991)
Ad hoc Models
This method estimates the economic value of projects under uncertainty. However, many experts believe that the results of this method are not fully reliable because they overestimate the flexibility of projects. (Luehrman, 1998)
Hybrid Methods
Fuzzy Programming: selecting projects based on fuzzy expressions. (Ghapanchi, et al., 2012) (Pérez, et al., 2018)
DEA + QFD. QFD: all customer requirements and needs are identified and effectively transmitted to different parts of the organization. DEA: determining the efficient frontier. (Jafarzadeh, et al., 2018)
MCDM + IP + DEA. MCDM: ranking the strategies. IP: selecting projects based on the strategy weights and alignments. DEA: balancing the portfolio. (Ghorbaniane, et al., 2015)
DEMATEL + DEA. DEMATEL: cause and effect analysis. DEA: selecting an efficient project portfolio. (Alinezhad & Simiari, 2013)
DEA + BSC. DEA: choosing the best strategies. BSC: extracting operational plans to execute the strategies. (Abbasi, et al., 2013)
System Dynamic + IP. SD: selecting a portfolio of projects while some projects are already running in the system. IP: selecting the projects that should start in order to complete the mission of ongoing projects. (Rowzan, 2018)

MCDM + Goal
A review of the different project portfolio selection methods in Table 1 shows that, over time, the methods have shifted from qualitative to quantitative and hybrid approaches, and that social and environmental issues have also been considered in PPS problems in addition to strategic and economic considerations. Thus, we get the benefits of both data-driven and expert-based approaches at the same time.

Methodology
The proposed methodology of this study consists of three main stages. Firstly, projects are divided into clusters according to their alignment with each of the specified strategies of the organization. Due to the similarity of the projects in terms of strategic direction, each cluster can be considered a sub-portfolio. In the second stage, a hybrid Fuzzy Analytical Hierarchy Process-Artificial Neural Network (FAHP-ANN) method is used to rank the projects and determine the weight factor of all criteria used in the prioritization process. In the final stage, an integer programming model is proposed to select the best projects for implementation considering two objective functions. The first one is to choose the projects with higher priority, and the second one is to balance the sub-portfolios, which leads to balance in the strategic goals. A sample problem is presented to clarify the proposed methodology and show how it can be used in real situations. The data of 20 real candidate projects are shown in Table 2. The proposed methodology aims to form a project portfolio from these projects by dividing them into sub-portfolios and identifying which of them can be implemented within the organizational budget and risk constraints.

Project clustering
Cluster analysis is a process by which a set of objects can be divided into separate groups. Each group is called a cluster. Members of each cluster are very similar to each other, and the similarity between clusters is low. In the context of project portfolio management, clustering techniques can be used to form the portfolio structure. In this study, the projects are clustered based on their alignment with the organizational strategies. In multivariate models, different properties of objects must be used to cluster them. Therefore, the clustering process deals with multidimensional data, whose dimensions are often referred to as features or properties. Some properties are quantitative and some are qualitative.
Measuring the similarity or dissimilarity between objects is the most important issue in clustering and should be carefully considered. Different distance functions can be employed to measure the degree of similarity. In this study, the k-means algorithm is used to carry out the clustering process. The k-means algorithm is one of the simplest and most popular algorithms used in data mining, especially when an unsupervised learning approach is needed, and it belongs to the group of partitioning clustering methods. Cluster assignments in this method are obtained by minimizing or maximizing an objective function: when the objective measures the distance between objects in the same cluster, it is minimized; conversely, when it measures the dissimilarity between different clusters, it is maximized. Suppose the observations $(X_1, X_2, \ldots, X_n)$, each of which has $d$ dimensions, must be divided into $k$ sections or clusters. We denote these clusters by the set $S = \{S_1, S_2, \ldots, S_k\}$. Cluster members should be selected in a way that minimizes the within-cluster sum of squares (WCSS) function, which, like variance, is a single scalar value. Therefore, the objective function in this algorithm is written as Eq. (1):

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{X \in S_i} \lVert X - \mu_i \rVert^2 \qquad (1)$$

where $\mu_i$ is the mean of the points in $S_i$.

In this study, the clustering process is done by SPSS Modeler 18. Fig. 2 shows the projects after being clustered. Each cluster is represented in a specific color. By looking at the figure, it can be inferred that the clustering has been done with good accuracy. However, visual inspection does not give a precise measure in many cases. SPSS Modeler reports an index called the Silhouette measure of cohesion and separation. This index takes continuous values between -1 and 1. If the index is larger than 0.5, it can be inferred that the quality of the clustering is good. In our case, the Silhouette measure is 0.8.
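To make the k-means procedure and the WCSS objective more concrete, the following is a minimal sketch in plain Python. It is not the SPSS Modeler implementation used in the study: the alignment scores are hypothetical, and seeding the centroids with three hand-picked projects is an assumption made only to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return labels, centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity k-means minimizes."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


# Hypothetical alignment scores of six projects with three strategies.
points = [(9, 1, 1), (8, 2, 1), (1, 9, 2), (2, 8, 1), (1, 1, 9), (2, 2, 8)]
labels, centroids = kmeans(points, [points[0], points[2], points[4]])
print(labels)                           # [0, 0, 1, 1, 2, 2]
print(wcss(points, labels, centroids))  # 4.0
```

Each resulting cluster groups the projects that are aligned with the same strategy, which is the sense in which a cluster can be treated as a sub-portfolio.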

Prioritization
As mentioned before, at this stage the projects are ranked in terms of strategic value using the hybrid FAHP-ANN approach. FAHP is used to determine the weights of the criteria on which the ranking process is based (Strategy 1 to 3). After the weights are determined, ANN is used to rank the projects and measure the weight of each of them.

Weighting the criteria
Fig. 2. The projects after the clustering process
Analytical Hierarchy Process (AHP) is a good way to elicit the opinions of experts, but it does not properly reflect human thinking, because the respondents must express their opinions in precise numbers. As the nature of pairwise comparisons is fuzzy, experts prefer to use an interval in their judgments rather than a constant value. In fuzzy logic, accurate results can be derived from a set of imprecise information defined by words and verbal quantities. In this study, the Chang method (Chang, 1996) for Fuzzy-AHP is applied. The respondents are asked to state their opinions using verbal expressions including Equal (1,1,1), Weakly Preferable (2,3,4), Fairly Preferable (4,5,6), Strongly Preferable (6,7,8), and Perfectly Preferable (8,9,10). Then the inconsistency rate of the pairwise comparisons is examined; if the rate is less than 0.1, the pairwise comparisons are of good consistency. The Gogus and Boucher method is used in this study to calculate the inconsistency rate. If the pairwise comparison matrix passes the test, the fuzzy average of the opinions of the different respondents is calculated and used as the final pairwise comparison matrix. Then the weights of the criteria can be obtained as follows:
1. Calculate $S_i$, which are triangular fuzzy numbers, for each row of the final pairwise comparison matrix according to Eq. (2), where $M$ are the fuzzy components of the pairwise comparison matrix. According to Eq. (2), each component of the fuzzy number is added to its peers in the same row and then multiplied by the fuzzy inverse of the summation of all components in the matrix. This step is similar to computing normalized weights in the conventional AHP method.
2. Then, according to Eq. (3), the magnitude (degree of preference) of each pair of $S_i$ must be compared.
3. Finally, using Eq. (4), the raw weights of the criteria are calculated. By dividing each raw weight by the sum of the raw weights, the normalized weight is obtained.
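The three steps above can be sketched in code. The following is a minimal illustration of Chang's extent analysis as it is usually stated, not a reproduction of the study's own calculations: the function names are hypothetical, and the 3x3 comparison matrix is made-up data that only reuses the verbal scale given above (with reciprocal judgments below the diagonal).

```python
def fuzzy_synthetic_extents(matrix):
    """Step 1: row sums times the fuzzy inverse of the sum of all entries."""
    row_sums = [tuple(sum(cell[k] for cell in row) for k in range(3))
                for row in matrix]
    total = tuple(sum(r[k] for r in row_sums) for k in range(3))
    # The inverse of a triangular number (l, m, u) is (1/u, 1/m, 1/l).
    return [(l / total[2], m / total[1], u / total[0]) for l, m, u in row_sums]


def degree_of_possibility(s2, s1):
    """Step 2: V(S2 >= S1) for triangular fuzzy numbers (l, m, u)."""
    l1, m1, _ = s1
    _, m2, u2 = s2
    if m2 >= m1:
        return 1.0
    if l1 >= u2:
        return 0.0
    return (l1 - u2) / ((m2 - u2) - (m1 - l1))


def chang_weights(matrix):
    """Step 3: raw weight = min over the other criteria, then normalize."""
    s = fuzzy_synthetic_extents(matrix)
    n = len(s)
    raw = [min(degree_of_possibility(s[i], s[k]) for k in range(n) if k != i)
           for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical averaged judgments for three criteria.
equal = (1.0, 1.0, 1.0)
weak, weak_inv = (2.0, 3.0, 4.0), (1 / 4, 1 / 3, 1 / 2)
fair, fair_inv = (4.0, 5.0, 6.0), (1 / 6, 1 / 5, 1 / 4)
matrix = [
    [equal, weak, fair],
    [weak_inv, equal, weak],
    [fair_inv, weak_inv, equal],
]
weights = chang_weights(matrix)
print(weights)
```

In this example the first criterion is judged preferable to both others, so it receives the largest weight. Extent analysis can assign a weight of zero to a criterion whose fuzzy extent does not overlap the others.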
After gathering the opinions of 5 experts and managers of the case study organization and performing the above calculations, the weight of each criterion was obtained. Table 3 contains these weights. These weights will be used in the next stage to rank the projects.

Ranking the projects
Artificial neural networks are one of the most well-known and widely used tools for data classification. These networks consider a group of variables as covariates and at least one variable as the target. The goal is to determine the degree of importance (weight) of each covariate so as to predict the target variable with the highest possible accuracy. The network partitions the samples into training and test sets. It uses the training set to determine the weights and then examines how accurately it can predict the target variable using the second part of the data.
In this study, a Multi-layer Perceptron (a type of ANN) is used for the classification process. The target variable is the weights of the strategies obtained in the previous part, and the projects are used as covariates. The neural network has to find the degree of importance of the projects in a way that predicts the target variable accurately.
The classification process is performed by SPSS Modeler 18. Fig. 3 shows a part of the neural network in which the first layer represents the covariates. As the alignment of the projects had been stated by verbal expressions, the covariates were taken as ordinal variables. The second layer is called the hidden layer in the multi-layer perceptron literature, and the number of variables in this layer is automatically determined by the software. The target variable (Weights) can take any value between 0 and 1, so it is a continuous variable. Bias variables are automatically added to the model to boost accuracy. These variables are continuous as well.
Fig. 3. Architecture of the neural network of this study
Fig. 4 shows the graphical report of the software about the degree of importance and rank of the projects. According to this report, P16 is the most important project with a normalized weight of 0.21, and P12 is the least important project with a normalized weight of 0.01. The reported accuracy for this test is 96%. The obtained weights will be used in the mathematical model (Section 3.3) as an input of the objective function.

Project selection
As mentioned in Section 2, different mathematical models have been proposed for the project portfolio selection problem. One of the most well-known models, originally designed for investment portfolio selection and later employed to solve the PPS problem, is the Markowitz Modern Portfolio Theory. This model aims to maximize the return of the portfolio while minimizing its associated risk. According to this theory, the more similar the financial investments of a portfolio are, the riskier the portfolio will be. In the project portfolio context, the return has been interpreted as strategic gain as well as financial benefits. Similarly, the balance of sub-portfolios or strategic goals can substitute for the risk in the model. It means that if the portfolio managers focus on one or a few goals and omit the others, the risk of failure increases accordingly. The proposed model of this study is inspired by the Markowitz Modern Portfolio Theory, although it is altered and customized significantly. The model consists of two objective functions, the first of which aims to maximize the strategic gain of the portfolio. This objective function is presented in Eq. (5), where:
$S_T$: total strategic gain of the portfolio
Block distance between project $i$ and project $j$ in the space of strategies (see Figure 1)
$D_{ij}$: normalized block distance for each pair of projects, obtained by Eq. (6)
In Eq. (5), one term enhances the strategic diversity of the selected projects and the other increases the alignment of the portfolio with the organizational strategies.
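As an illustration of the distance used in the first objective, the sketch below computes the block (Manhattan) distance between projects in the space of strategies. The normalization shown here, dividing by the largest pairwise distance, is an assumption made for the example and may differ from Eq. (6); the alignment scores are hypothetical.

```python
def block_distance(a, b):
    """Block (Manhattan) distance between two alignment vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normalized_block_distances(scores):
    """Pairwise block distances scaled to [0, 1].

    Assumption: distances are divided by the largest pairwise distance.
    """
    n = len(scores)
    d = [[block_distance(scores[i], scores[j]) for j in range(n)]
         for i in range(n)]
    largest = max(max(row) for row in d)
    if largest == 0:  # all projects have identical alignment
        return [[0.0] * n for _ in range(n)]
    return [[value / largest for value in row] for row in d]


# Hypothetical alignment of three projects with three strategies.
scores = [(9, 1, 1), (8, 2, 1), (1, 9, 2)]
D = normalized_block_distances(scores)
print(D[0][1], D[0][2])
```

Projects 0 and 1 serve the same strategy and are therefore close to each other, whereas projects 0 and 2 are the most distant pair and receive a normalized distance of 1.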
The second objective function aims at the strategic balance of the sub-portfolios. It maximizes the minimum strategic gain among the sub-portfolios. This objective function also considers the resources consumed by each sub-portfolio: it divides the strategic gain of each sub-portfolio by its consumed resources to obtain the efficiency ($E_l$) and maximizes the minimum $E_l$. The second objective function is shown in Eq. (7), where:
$l$: sub-portfolios
$C_i$: number of human resources used in project $i$
Eq. (7) can be rewritten as Eq. (8), which is a maximization form more suitable for coding in solvers.
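A max-min objective of this kind is commonly rewritten with one auxiliary variable and one constraint per sub-portfolio. The following is a generic sketch of that reformulation rather than a reconstruction of the paper's own equation; the auxiliary variable name $z$ is an assumption, while $E_l$ is the efficiency of sub-portfolio $l$.

```latex
\begin{aligned}
\max \quad & z \\
\text{s.t.} \quad & z \le E_l \qquad \forall\, l
\end{aligned}
```

At the optimum, $z$ equals the smallest $E_l$, so maximizing $z$ is equivalent to maximizing the minimum efficiency.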
A set of constraints can be added to the model to reflect the real limitations of the organization. In our case, budget and risk constraints have been considered in the model. The budget constraint is shown in Eq. (10), where $B_T$ is the maximum available budget. In our case, $B_T = 550$.
The projects are categorized into three levels, namely high-risk, medium-risk, and low-risk projects, to manage the risk of the portfolio more efficiently. This technique is known as risk mapping in the literature. In this technique, a 2D or 3D diagram is drawn and projects are classified based on the probability of failure, the severity of the failure consequence, and (in the 3D case) the probability of discovering and eliminating the risk of failure. Fig. 5 shows the result of risk mapping for the case study.
Fig. 5. Project Risk Mapping (Axis X: Probability, Axis Y: Severity)
Decision-makers can define a risk constraint model that requires at least a percent of the projects in the final portfolio to be low-risk and at least b percent to be medium- or low-risk (b>a), while the others (if any) can be chosen from the high-risk projects. Eq. (11) to Eq. (13) show these constraints.
After creating the mathematical model, it is time to solve it. Exact methods such as Benders' algorithm or meta-heuristic methods such as the genetic algorithm can be used to solve these models. The solution method is chosen based on the degree of complexity of the problem, the number and types of variables and constraints, the type of objective function, and the accuracy of the input variables and decision variables. In this study, due to the fuzziness of the variables, the nonlinearity of the model, and the relatively large number of binary decision variables, the multi-objective genetic algorithm NSGA-II was used to solve the model.
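The budget and risk constraints can be checked for a candidate portfolio with a few lines of code. The sketch below is illustrative only: the project costs, risk levels, and the thresholds a = 30 and b = 60 are hypothetical, and only the budget limit of 550 is taken from the case described above.

```python
def is_feasible(selected, cost, risk, budget, a, b):
    """Check the budget constraint and the two risk-share constraints.

    a: minimum percentage of selected projects that must be low-risk.
    b: minimum percentage that must be low- or medium-risk (b > a).
    """
    n = len(selected)
    total_cost = sum(cost[p] for p in selected)
    low = sum(1 for p in selected if risk[p] == "low")
    low_or_medium = sum(1 for p in selected if risk[p] in ("low", "medium"))
    # Integer arithmetic avoids floating-point issues in the percentage tests.
    return (
        total_cost <= budget
        and 100 * low >= a * n
        and 100 * low_or_medium >= b * n
    )


# Hypothetical project data.
cost = {"P1": 200, "P2": 150, "P3": 180, "P4": 120}
risk = {"P1": "low", "P2": "medium", "P3": "high", "P4": "low"}

print(is_feasible(["P1", "P2", "P3"], cost, risk, 550, 30, 60))        # True
print(is_feasible(["P1", "P2", "P3", "P4"], cost, risk, 550, 30, 60))  # False
print(is_feasible(["P2", "P3"], cost, risk, 550, 30, 60))              # False
```

The second portfolio exceeds the budget, and the third contains no low-risk project, so both are rejected.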

Solution procedure
NSGA-II is one of the most widely used and powerful algorithms for solving multi-objective optimization problems, as its effectiveness in solving various problems has been proven. In 1995, Srinivas and Deb introduced the NSGA optimization method for solving multi-objective optimization problems. The flowchart of the NSGA-II algorithm is as follows:
Fig. 6. The procedure for solving the proposed model
• Generating an initial answer: The first step of the multi-objective genetic algorithm is to generate some initial answers. Initial answers must be chosen randomly to cover the feasible space. Quality means the superiority of one answer over another. In optimization problems, superiority can be determined by the objective function. If the objective function is of the maximization type, the quality is the same as the value of the objective function, but if the objective function is of the minimization type, the quality function is $-f$ or $1/f$ (where $f$ is the objective function).
• Sorting the non-dominated answers: First, it is necessary to define the concept of dominance:
1. X dominates all members of A. The members of A are the points that are worse than X in at least one criterion and better than X in no criterion.
2. X is dominated by all members of C. X is not better than any member of C in any criterion and is worse than each of them in at least one criterion.
• Computing the crowding distance: Among the non-dominated answers, the answers with a larger crowding distance are better than the others, because they yield a wider range of answers. Therefore, the answers with the highest crowding distance are selected. The crowding distance of an answer is the sum, over the objectives, of the normalized distance between its two neighboring answers on the front.
• Crossover: The crossover operation in the NSGA-II algorithm is the same as in the single-objective genetic algorithm. In this operation, the non-dominated answers are selected by a predetermined mechanism (roulette wheel, tournament, or random) and then combined to produce new answers. The combination is done in ways such as single-point, two-point, or uniform crossover. The purpose of crossover is to combine the characteristics of different answers to produce better answers.
• Mutation: The mutation operation in the NSGA-II algorithm is exactly the same as in the single-objective genetic algorithm. In this operation, some answers are randomly selected and some of their cells are randomly changed. The purpose is to make abrupt changes to the answers and escape the local optimum trap.
• Updating the Pareto frontier: In this stage, the Pareto frontier of the previous iteration is compared with that of the new iteration. If one or more answers on the new Pareto frontier are better than those on the previous frontier, they replace them. Finally, the non-dominated answer with the highest crowding distance is reported.
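The dominance test, the extraction of the non-dominated answers, and the crowding distance described in the steps above can be sketched as follows. This is a minimal illustration for maximization objectives using the standard NSGA-II definition of crowding distance; the function names and the objective values are hypothetical and are not taken from the Matlab implementation used in the study.

```python
def dominates(u, v):
    """True if u is no worse than v everywhere and better somewhere (maximization)."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def non_dominated(points):
    """Indices of the points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def crowding_distance(front):
    """Crowding distance of every point on one non-dominated front."""
    n = len(front)
    distance = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary answers are always kept, so they get an infinite distance.
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distance[order[k]] += gap / (hi - lo)
    return distance


# Hypothetical (strategic gain, balance) values of six candidate portfolios.
points = [(1, 5), (2, 4), (3, 3), (4, 1), (2, 2), (1, 1)]
front_idx = non_dominated(points)
front = [points[i] for i in front_idx]
print(front_idx)                 # [0, 1, 2, 3]
print(crowding_distance(front))
```

The two interior answers on the front receive finite distances, and the one lying in the sparser region of the front gets the larger value and is therefore preferred when the front has to be truncated.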
In this study, NSGA-II is used to solve the mathematical model; its parameters and specifications are listed next to Fig. 7.
The decision-makers can take any point on the Pareto front as the answer and select the project portfolio based on its corresponding values. Here, the highlighted point in Figure 6 is chosen as the answer.
By checking this answer, we can verify the selected projects, the portfolio risk status, the way the sub-portfolios are balanced, and the optimality of this answer. Fig. 7 displays the values of the above factors for the selected answer. This report can also be prepared as a management dashboard for the rest of the points on the Pareto front.

Conclusion
In this study, a framework for project portfolio selection was presented in which, in addition to the strategic alignment of projects with organizational goals, a balance between the efficiency of the main sub-portfolios was also considered. The output of this framework is an efficient frontier that enables decision-makers to choose the final project portfolio from the points located on it. Some of the limiting assumptions considered in this study can be relaxed in future work to bring the model closer to reality. For example, grey numbers or probability functions can be used to express the project costs instead of fixed numbers. The costs can also be time-dependent, or they can be divided into fixed and variable costs. The model presented in this study is designed for static portfolios, meaning that no project is underway and work begins at the start of the planning horizon.
Researchers can extend this model into a dynamic model in future work. Moreover, efficient heuristic or exact methods can be proposed for solving this problem.

Fig. 4. Ranks and normalized weights of the projects
NSGA-II parameters and specifications:
• Population Size: 150
• Number of Generations: 100
• Crossover probability: 0.8
• Mutation probability: 0.3
• Crossover Method: Binary Selection
• Mutation Method: Binary Selection
• Selection Method: Tournament
Fig. 7 shows the Pareto Frontier obtained after running the NSGA-II algorithm in Matlab 2016.

Fig. 7. The Pareto Frontier
Table 1 lists the various methods for selecting the project portfolio and refers to some articles that have used each of the techniques. This categorization includes a summary of review articles provided by previous authors. It also includes PPS hybrid methods.

Table 2
The list of candidate projects

Table 3
Weights of the criteria (By FAHP method)