On the Resilience of Ant Algorithms. Experiment with Adapted MMAS on TSP

This paper focuses on the resilience of a nature-inspired class of algorithms. The issues related to resilience fall under a very wide umbrella. The uncertainties that we face in the world require the need of resilient systems in all domains. Software resilience is certainly of critical importance, due to the presence of software applications which are embedded in numerous operational and strategic systems. For Ant Colony Optimization (ACO), one of the most successful heuristic methods inspired by the communication processes in entomology, performance and convergence issues have been intensively studied by the scientific community. Our approach addresses the resilience of MAX–MIN Ant System (MMAS), one of the most efficient ACO algorithms, when studied in relation with Traveling Salesman Problem (TSP). We introduce a set of parameters that allow the management of real-life situations, such as imprecise or missing data and disturbances in the regular computing process. Several metrics are involved, and a statistical analysis is performed. The resilience of the adapted MMAS is analyzed and discussed. A broad outline on future research directions is given in connection with new trends concerning the design of resilient systems.


Introduction
In recent decades, the community of researchers from various domains has shown a growing interest towards systems resilience. Of course, systems of any kind are expected to meet requirements and maintain their operational characteristics as long as possible, while facing changing (sometimes unpredictable) conditions, environments and challenges. The concept of resilience has emerged for complex, dynamic systems and can be generally defined as "the capacity of a system to tolerate disturbances while retaining its structure and functions" [1]. The list of domains where systems resilience is important is long, and specific definitions have been provided: engineering [1], economics [2,3], environment [4,5], ecology [6,7], psychology and neurobiology [8][9][10], sociology [11]. In computer science, resilience has been defined mainly for networks [12][13][14][15] and large-scale distributed systems [16,17], security [18] and soft infrastructure systems [19]. The term resilience is derived from the Latin "resilire", with the meaning "springing back" or "jumping back up", which sends us to the views of resilience considering the recovery of a system after facing adverse situations.
Given the wide interest and importance of the concept, not only for researchers but for policymakers too, numerous and sometimes diverging interpretations and perceptions have been proposed for

The Concept of Resilience and Some of Its Instances
This section presents the concept of resilience as it evolved over the ages and across areas of human activity and research, with the intent to underline its main characteristics and emphasize both its essence and its multiple facets.
For physical and social systems, resilience is defined in Bruneau and Reinhorn [26] as consisting of the following properties: robustness (the ability of elements to withstand a given level of stress or demand without suffering degradation or loss function), redundancy (the extent to which the system's elements are substitutable), resourcefulness (the capacity to identify problems, establish priorities and mobilize resources when disruption conditions manifest), and rapidity (the capacity to meet priorities and achieve goals in a timely manner in order to recover functionality and avoid future disruption). Defining the concepts of vulnerability and resilience, Proag [27] differentiates between two broad forms of the last one: hard resilience-the direct strength of a structure when placed under pressure, and soft resilience-the ability of systems to absorb and recover from the impact of disruptive events without fundamental changes in function or structure.
The need for designing systems that allow for their evolvability has been recognized [28], since the majority of changes that a system is going to cope with are not foreseeable at the moment of its design. Technology refreshment programs have been proposed [29] to enhance systems capability for facing external sources of change.
Computer scientists have used the term "resilient" since 1976 and use it more and more often, sometimes as a synonym of fault-tolerant [30]. Avizienis et al. [31] define resilience as "the persistence of service delivery that can justifiably be trusted, when facing changes" or "the persistence of avoidance of failures that are unacceptably frequent or severe, when facing changes". Resilient software systems are needed in applications executed on distributed platforms (very often, with a dynamically changing topology), satisfying real-time requirements, interacting with real-world and managed externally by an authority. Approaching software resilience implies to model the functionality that the system provides, the way the system can provide that functionality and the deployment of constraints. Axelrod [32] notes the definition given in Marcus and Stern [33] for a resilient system: "one that can take a hit to a critical component and recover and come back for more in a known, bounded, and generally acceptable period of time" and presents the factors that may diminish software resiliency: complexity, interdependency and interconnectivity, globalization, hybridization, rapid change and reuse in different contexts.
In Chandra [34], an extensive study regarding the synergy between biology and systems resilience is given. The thesis focuses on extracting principles and heuristics useful for designing resilient engineering systems, based on the resilience seen in nature. Resilience attributes (adaptability, redundancy, scalability, self-organization, interoperability, dynamic learning) are identified in immune systems, colonies of social insects and ecosystems, and a domain-independent qualitative model of resilience is developed.
When talking about resilience, it is important to measure it. Enhancing the resilience of a system involves the understanding of the relationships that exist between the various components of that system. Although models for systems resilience can be useful, these should be interpreted on specific context and purpose [35] and it is important to be able to quantify the resilience of systems.
Measuring resilience naturally depends on the system being studied. While qualitative assessment reveals how bad a system can be disturbed, quantitative measures give estimates of performance [36]. Some general considerations on metrics for resilience are given in Ford et al. [18].
For the purposes of our work, which approaches TSP with a nature-inspired type of algorithm, we have used several metrics, further described in Section 4.3.

Adaptation of an Ant Algorithm to Allow Disturbances Simulation
During the last century, TSP was one of the most studied NP-hard problems in Combinatorial Optimization and Operations Research, mostly due to its applications in industry, transportation and logistics. The ant colony metaphor was easily adapted to TSP. Sections 3.1-3.4 present the foundations of the ant algorithmic approach, TSP as test problem and the particular Ant algorithm adapted for tackling real-world issues of TSP.

An Efficient Nature-Inspired Class of Algorithms
An ant colony is an example of a highly distributed natural, cooperative multiagent system. It is comprised of hundreds (or thousands) of completely autonomous simple agents (the ants) and is robust with respect to loss of individual agents and to changes in the environment [37]. The performance of the activities of the colony can be seen as highly effective [38] and are, in some cases, optimal or near-optimal.
In a colony, the scouts are the ants exploring the territory around the nest in search for food. At the beginning, a scout wanders and finds a food source only by chance. As a result, the route the scout ant follows from the nest to the food source displays a random pattern. Conversely, the route back is straight-lined and impregnated with pheromones, which are a very elaborated form of chemical communication [39]. The scout alerts its worker mates, who follow the pheromone trail to the food source. The number of ants using the trail is proportional to the available amount of food and, as long as the food source is not exhausted, the ants will reinforce the path while carrying their load back to the nest. This behavior determines higher and higher concentrations of pheromone on the shorter paths, as more and more ants are using them. On longer paths, the pheromone concentration decays over time.
Based on this ant foraging model, Dorigo et al. [40,41] have founded a new paradigm of evolutionary computation that has come to be known as Ant Colony Optimization (ACO) [42]. ACO is perhaps the most successful application inspired by ants-and perhaps by any insect society. There are three ideas from natural ant behavior that have been transferred to the artificial ant colony: the preference for paths with high pheromone level, the higher rate of growth of pheromone on shorter paths and the trail mediated communication among ants.
Since its development, researchers have applied ACO to a significant number of combinatorial optimization problems including the Traveling Salesman Problem (TSP), the Vehicle Routing Problem, graph coloring, allocation problems, swarm robotics, routing in telecommunication networks, constraint satisfaction and many other related issues.

TSP as Test Problem
TSP [43][44][45] was the first combinatorial optimization problem successfully solved with ACO [41]. We chose the Euclidean TSP as a support problem for the experiments on the resilience of ant algorithms implementations mostly due to its numerous real-life applications, most of which involve uncertainty. This characteristic is highly adequate for the aim of our study, namely, to analyze the ant algorithms capacity to "turn back" towards good solutions, even after significant interventions in the normal ACO execution process.
Given a map with n sites and their pairwise distances, TSP requires finding one shortest cyclic tour through all the sites or, more formally, the least cost Hamiltonian cycle in a weighted graph. Formally, Each arc (i, j) is assigned the positive value d(i, j) which is the distance between locations i and j. TSP is the problem of finding the shortest closed tour visiting the n = |N| nodes of G exactly once.

Implementations of Ant Algorithms for TSP
The software package ACOTSP [64], freely available for both researchers and practitioners, includes implementations of ACO Algorithms for the symmetric TSP: Ant System, Elitist Ant System, MAX-MIN Ant System, Rank-based Ant System, Best-worst Ant System, and Ant Colony System. The performance of these implementations depends on instance dimension, parameter variation strategies, and particular improvements that may be operated upon (such as ants' specialization [65], human intervention in the ant loop [66], pheromone correction methods [67]) but is generally perceived by the scientific community as reasonably high. What makes these implementations very "usable" is their capacity to return as high-quality solutions as possible, independently of the allowed computation time [68,69].
Ant System (AS) was the first ACO algorithm, proposed by Dorigo et al. in 1991 [70-72]. The algorithm of TSP solving with AS assumes a colony of m artificial ants that operate on the graph G and a pheromone matrix τ = τ ij for the construction of the solutions. At the beginning, τ ij are set to the same initial, positive value. The m ants are randomly deployed in the nodes of G. Within t max iterations, the following repeat.
Each ant constructs a complete tour using Equation (1) below. If t is the index of the current iteration, an ant k currently at node i chooses the node j to move to (from the set J k (i) of the nodes not visited yet) by applying the following probabilistic transition rule: where η ij = 1/d(i, j) stands for the heuristic information called "visibility" of arc (i, j), while τ ij (t) represents the "desirability" that the arc (i, j) belongs to the tour of the ant k. This desirability reflects the experience acquired by ants during the problem-solving and models a learning process. Parameters α and β set the balance between the influence of the pheromone level and that of the heuristic information. Pheromone update is applied according to Equation (2) for all ants k and all arcs(i, j): where ρ ∈ (0, 1] represents the proportion of the old pheromone quantity that is carried over to the next iteration of the algorithm and the added quantity of pheromone as defined in Equation (3), with T k (t) the tour built by ant k and L k (t) its length: For our experimental study concerning the resilience of the ant algorithms, we have chosen the MAX-MIN Ant System (MMAS) proposed by Stützle and Hoos [73] to be modified according to the purpose of our research. Although the focus is on resilience and not on performance, it is important to note that MMAS was designed as an algorithmic variant to achieve better performance than the ant system, from which MMAS originates.
MMAS is the same as AS except that the pheromone trails are updated offline: only the arcs which were used by the best ant in the current iteration receive additional quantities of pheromones. Also, the pheromone trail values are placed in an interval [τ min , τ max ] and the trails are initialized to their maximum value τ max .
Moreover, MMAS calls the 3-opt local search procedure [58] at the end of the tour completion for every ant, thus being a hybrid method, which integrates both constructive and heuristic local-search.

New Parameters for MMAS
Let MMAS(epoch, p_nod_ch, p_inten, ampl) be the MAX-MIN Ant System enriched with a set of parameters, which are described below. The role of the parameters resides in introducing various disturbances in the algorithmic loop which determines the solution construction.
The idea is to alter the pheromone matrix periodically. This process allow us to simulate features such as uncertainty and dynamicity in the TSP instances to be approached with MMAS(epoch, p_nod_ch, p_inten, ampl). These features reside in numerous real-world problems that are modelled with TSP, which justifies our approach. It assumes that the number of iterations when the system works without intervention defines the parameter epoch. At the beginning of MMAS(epoch, p_nod_ch, p_inten, ampl), the algorithm works as presented in Section 3.3. At the end of the first period defined by epoch (after a number of iterations equal to epoch), the values in the pheromone matrix are changed non-deterministically: the degree of perturbation is introduced by means of the parameters p_nod_ch-the percentage of the nodes to suffer modifications, and p_inten-the percentage of arcs incident to the chosen nodes, arcs whose pheromone load are modified. The parameter ampl gives the amplitude of the modification, setting an interval of length 2 * ampl * old_value, centred in the old pheromone value. The new pheromone value is randomly generated within this interval. The process repeats all along the algorithmic loop of MMAS. The above parameters' identifiers head the data columns in the files available at [74].
This design of MMAS was also used by Crişan et al. in [75] in order to perform an ant-based system analysis on TSP under real-world settings, when comparing it with other hybrid optimization techniques. The feature introduced by these new parameters is the uncertainty that manifests in the real world. This alteration of the pheromones may simulate missing or imprecise data, which frequently happens in real-world systems that include TSP solvers.

Experimental Settings
Our experiment concerning the ant algorithms resilience was performed with the modified MMAS(epoch, p_nod_ch, p_inten, ampl) and three TSP instances. Specific details regarding instances, software, values for its parameters and organization of the output data are given below.

TSP Test Instances
The instances chosen for tests in our experiment are from TSPLIB [76,77], the well-known library of sample instances for combinatorial optimization problems. The problem names (which also include the number of nodes of the corresponding graph) are d2103, fl3795 and rl1889. For all three, the 2-dimensional Euclidean distances between nodes are rounded up to the next integer. The optimum tour lengths are known to be 80,450, 28,722 and 316,536, respectively.
d2103 is one of the instances proposed by Reinelt [76], coming from the practical problem of drilling holes in printed circuit boards. The drilling problem can be approached as a sequence of TSP instances, one for each hole diameter [78]. fl3795 is another drilling instance, with a high degree of difficulty [79] given by the placement of the nodes in very particular clusters. The last instance, rl1889, is one of the examples drawn by Reinelt [76] from geographic problems featuring locations of cities on maps.

Parameters Values and Ant Implementation
The Ant implementation that we propose is synthetically described by the following algorithmic scheme. The skeleton is according to [73], and the new parameters epoch, p_nod_ch, p_inten, ampl are introduced. procedure MMAS(epoch, p_nod_ch, p_inten, ampl) set parameters m, α, β, ρ set the maximum pheromone trail τ max to an estimate /* the maximum possible pheromone trail is asymptotically bound */ initialize pheromone trails to τ max set epoch, p_nod_ch, p_inten, ampl set previous_best and global_best to an initial solution /* e.g., found with Nearest-Neighbor heuristic */ no_epochs = 0, better_sol = 0, epoch_previous_best = 0 iteration = 0 while (termination condition not met) do iteration = iteration + 1 ConstructSolutions /* each of the m ants builds its solution*/ ApplyLocalSearch /* 3-opt local search procedure is called */ set current_best to the best solution, delivered by best_ant if current_best < global_best then global_best = current_best set global_time_best to the duration (seconds) required by best_ant to deliver its tour iteration_best= iteration endif epoch_current_best = no_epochs UpdatePheromoneTrails /* only the arcs used by best_ant are updated */ /* according to (2) For each of the three instances, all the 3·3·2·5 = 90 variants of MMAS, generated by the possible combinations of the parameter values, were executed. For each variant, ten trials have been performed in order to report a line in the result files, following the guidelines proposed by Birattari et al. [80]. One core (of a dual-core processor at 2.5GHz frequency and 250 GB RAM) was assigned only for the execution of MMAS software.

Recorded Data
For every TSP instance, the following execution outputs and statistics were recorded. These data are stored in a file with the same identifier as the name of the instance, available for download at [74].
• better_solutions: number of times when the application manages to find a better approximation of the optimum tour, although the information stored on arcs (the pheromone quantities) was disturbed (averaged over ten trials). • number of epochs: how many times during the execution the information on the arcs was disturbed (on average, in ten trials). During an epoch, the pheromones evolve according to Equations (2) and (3). • average-best: this is the main metric based on which we study the algorithm resilience. It represents the approximation of the TSP instance solution, as delivered by the modified ant algorithm MMAS (also averaged over ten trials). • recovery speed: is the ratio between better_solutions and number of epochs. It gives a measure of the recovering capacity, showing how many times (on average, in ten trials) average-best improves during an epoch. • average-iterations: is the average of the ten iteration numbers that found the best solution. • stddev-best: standard deviation for the ten best solutions delivered by the ten trials. • stddev-iterations: standard deviation for the ten iterations when the best solution was found. • best try: the best solution in ten trials. • worst try: the worst solution in ten trials. • avg. time-best: average of the times for finding the best solution in ten trials. • stddev.time best: standard deviation for the time durations needed to find the best solution, for the ten trials. • distance-to-optimum: is the difference between the (average) solution found by the ant algorithm (average-best) and the optimum solution. • relative-error: distance-to-optimum/optimum tour length * 100.
Using the instance d2013 and the following values for the new parameters: epoch = 20, p_nod_ch = 20, p_inten = 50, ampl = 10 (which corresponds to line 22 in the data file d2013), the adapted MMAS works as described below.
In a trial of MMAS (20,20,50,10), after every 20 iterations, for 20/100·2103 = 420 nodes randomly chosen, the pheromone load is modified for half of the arcs (also randomly chosen) of the selected nodes. The number of arcs in this situation is 420· 50/100·2102 = 441,420. For every arc, the new pheromone value is randomly chosen in an interval of length 2·(10/100)·v, centred in v, which is the old pheromone value. Under these settings, on average in 10 trials, we obtained the following outcome: • number of epochs = 516. We obtained 516 series of 20 iterations and, starting with the second one, the pheromone alteration takes place as described above. that 105.5 represents 0.13% (= the value of the relative-error) from the optimum. The algorithm succeeded to provide a solution that is only 0.13% worse than the optimum, which is a very good outcome, given the repeated intervention into the solution construction mechanism. In terms of resilience, we can assess that this is high for MMAS (20,20,50,10). • stddev-best = 54.64. The solutions delivered by the algorithm are spread around the average-best = 80,555.5 with a standard deviation of 54.64. The best solution among the ten collected in ten trials is best try = 80,476 and the worst is worst try = 80,638. • average-iterations = 633.5, that is, the average of the iteration numbers when the ten best solutions were delivered. The standard deviation of these ten values is stddev-iterations = 249.93, quite a big value, showing that the best solution can be found early in the iterative process, or after a significant number of iterations. • avg. time-best = 36.81 s. This is the average value of the execution durations (measured in seconds) of the 10 trials. Given the computer system configuration that the experiment was processed on, the result is very good. The standard deviation for the execution times of the ten trials is stddev.time best = 13.

Metrics for TSP Resilience Assessment with Adapted MMAS
According to Jackson [81], "System resilience is the ability of [...] systems to mitigate the severity and likelihood of failures and loses, to adapt to changing conditions, and to respond appropriately after the fact".
Researchers in systems resilience point that resilience is not a 1-dimensional quantity [18]. Moreover, a shift from robustness-centered design to principles of more flexible and adaptive design has been noticed [33]. This approach includes the following ideas: composing the measures of different aspects in order to reason about resilience; the metrics should be particular for a specific perturbation; the metrics should be dependent on the system boundaries; the customer requirements drive the metrics for resilience; take into consideration other aspects except for the system output. There is no universal way to combine the particular metrics into a global one, but the assessment process is specific to the synergy of all considered aspects.
When transferring the above considerations to the transportation field, resilience could be seen as the ability of transportation systems to react to unexpected disturbance. This can be seen at the individual level, when transportation needs are satisfied for each customer, as well as at the community level-such as in emergency state or special events, when this perspective comes first. Transportation systems diversity (which includes multiple modes, routes, and system components) and critical information related to collection and distribution are key elements for resilience [82]. Both these features can be modelled in ant algorithms [83].
In line with these considerations, when assigning the resilience of MMAS(epoch, p_nod_ch, p_inten, ampl) we chose to consider two kind of metrics: • Time-performance metrics: better_solutions, number of epochs, average-iterations, stddev-iterations, avg. time-best, stdev.time best and recovery speed. All these data give us a feedback on how fast the altered applications work. • Path-performance metrics: average-best (which is the objective function) and stddev-best, best try and worst try, three measures indicating how good the altered applications still perform.
Sections 5.2-5.4 present both a brief descriptive analysis (which can be used for a qualitative assessment of the MMAS applications resilience) and a statistical one, based on the results of the experiment, for each of the three TSP instances. The files presenting extensive statistical analysis are available at [74].
For the descriptive analysis, only a part of the data from the files d2103, fl3795 and rl1889 was extracted in Tables 1, 4 and 7, respectively. The tables present the results obtained for the metrics recovery speed and average-best, reported for the highest value of ampl: 40%. That is, if ω is a pheromone value on a selected arc (according to p_nod_ch and p_inten) then the altered pheromone value randomly falls within the interval [0.6 ·ω, 1.40 ·ω]. These 18 variants of MMAS (of all 90) are those simulating the most substantial disturbances in the ACO constructive process of the approximate solution. Moreover, each table includes the lines for which the smallest and the highest values for the relative-error have been registered in all the 90 MMAS variants. For example, in Table 1, these correspond to executions with ampl = 0 (which is the original MMAS, with no altered pheromones). For the statistical analysis, two directions are set. With epoch, p_nod_ch, p_inten and ampl as factors, the following are studied: • The way the time-performance metrics depend on the factors and how these are inter-correlated.

•
The way the path-performance metrics depend on the factors and how these are inter-correlated.
The material displaying the results of the two approaches is quite extensive, and thus we present in Tables 2, 3 (for d2103), 5, 6 (for fl3795), and 8, 9 (for rl1889), the regression models for the metrics average-best and recovery speed only. The factors for the non-zero values of the categorical variable ampl (the amplitude of the pheromone alteration) are ampl10%, ampl20%, ampl30%, and ampl40%.
The statistical analysis was performed with the free software environment for statistical computing and graphics R, version 3.4.4.

Discussion on Behaviour of Adapted MMAS with TSP Instance d2103
The length of the optimum solution for the TSP instance d2103 is 80,450. As data in Table 1 display in all lines except no. 4 and no. 12, the relative error for the 18 variants of MMAS with ampl = 40 is placed in the interval [0.1, 0.25], corresponding to an absolute error (distance-to-optimum in the file d2013) in the interval [82.1, 197.5]. We note that the minimum but also the maximum relative-error values (in lines 4 and 12) correspond to MMAS with ampl = 0.
For the original MMAS, recovery speed ∈[0.4934, 1.6154] (see file d2013), while for the applications corresponding to data in Table 1 (with ampl = 40), recovery speed ∈ [0.4173, 1.3642] and only three values are smaller than the minimum recovery speed for MMAS with ampl = 0. These findings show that MMAS(*,*,*, 40) continue to behave normally, in the sense of building efficient solutions for the instance under consideration, successfully recovering after pheromone alterations.
The tableau depicted by the metrics average-best and recovery speed allows us to qualitatively assess the resilience of the adapted MMAS applications (according to the qualitative model in [34]) as High.
The regression model for the metric average-best, for instance d2103, follows in Table 2.  The adjusted low value of the R-Squared index shows that the linear regression model is bad. There is no relation between average-best and the predictors. The interpretation is that average-best is not influenced by the perturbations introduced by the parameters epoch, p_nod_ch, p_inten, ampl.
The regression model for recovery speed follows in Table 3.  In Figure 1, the correlogram between the other time-performance metrics is displayed. Positive correlations are displayed in blue and negative correlations in red. Colour intensity and the size of the circles are proportional to the correlation coefficients. In the right side of the correlogram, the legend colour shows the correlation coefficients and the corresponding colours. In Figure 1, the correlogram between the other time-performance metrics is displayed. Positive correlations are displayed in blue and negative correlations in red. Colour intensity and the size of the circles are proportional to the correlation coefficients. In the right side of the correlogram, the legend colour shows the correlation coefficients and the corresponding colours.

Discussion on Behaviour of Adapted MMAS with TSP Instance fl3795
The optimum for TSP instance fl3795 is 28,722. As data in Table 4 (except line no. 9) display, the relative error for the 18 variants of MMAS with ampl = 40 is placed in the interval [3.84, 4.72], corresponding to an absolute error (distance-to-optimum in the file fl3795) in the interval [1101. 5, 1354.3]. For this instance, the relative error is higher than in the case of the instance d2103, but it can

Discussion on Behaviour of Adapted MMAS with TSP Instance fl3795
The optimum for TSP instance fl3795 is 28,722. As data in Table 4 (except line no. 9) display, the relative error for the 18 variants of MMAS with ampl = 40 is placed in the interval [3.84, 4.72], corresponding to an absolute error (distance-to-optimum in the file fl3795) in the interval [1101. 5, 1354.3]. For this instance, the relative error is higher than in the case of the instance d2103, but it can still be appreciated as small. We note that the minimum value for the relative-error is obtained for MMAS with ampl=10 (3.42, in line 9). If we compute the average values of the variable avg.time-best for d2103 and for fl3795, we notice that the first is 47.09 and the second is 38.34. This translates into larger execution times until the best solution is found (on average) for d2103 than for fl3795, although the number of nodes is smaller for the first instance. The explanation for this has to be searched into the structure of the instances. If we characterize the two instances by their order parameter OP, defined in [84] as the standard deviation of the distance matrix divided by its average value, we have OP = 0.4866 for d2103 and OP = 0.5393 for fl3795.
For the original MMAS, recovery speed ∈ [0.3721, 0.725] (see file fl3795), while for the applications corresponding to data in Table 4 (with ampl = 40), recovery speed ∈ [0.4167, 0.7027]. In this case, no value is smaller than the minimum recovery speed for MMAS with ampl = 0. For this data set, the metric shows that MMAS(*,*,*, 40) perform even better than the original implementation MMAS.
The results above allow us to qualitatively assess the resilience of the adapted MMAS applications as High [34].
Linear regression models for the metrics average-best and recovery speed follow in Tables 5 and 6.   In Table 5, the adjusted R-Squared index shows that the linear regression model is good in the case of this instance, but knowing epoch, p_nod_ch, p_inten, and ampl, we could only explain 50.84% of the variance in average-best. ampl10% is negative correlated to H. Other independents are all positive correlated to H.
The p values show that ampl20%, ampl30%, ampl40% are statistically significant in the model. Again, the behavior of the adapted MMAS differs from the case of d2013. On fl3795, the quality of the solutions provided by our implementation is influenced (mostly) by epoch and ampl.
The adjusted R-Squared index shows that, in this case, there is no linear relation between the factors epoch, p_nod_ch, p_inten, ampl and recovery speed. Again, the result differs from the correspondent one for d2103.
The plot in Figure 2 presents the intercorrelations between the path-performance metrics average-best, stddev-best, best try and worst try and displays the distribution of each of them on the diagonal. On the bottom of the diagonal, the bivariate scatter plots with a fitted line are displayed. The value of the correlation and the significance level appear as stars on the top of the diagonal. Each significance level is associated to a symbol: p-values (0, 0.001, 0.01, 0.05, 0.1, 1) have the corresponding symbols ('***', '**', '*', '.', ' '). Intercorrelations between the path-performance metricsaverage-best, stddev-best, best try and worst try for fl3795.

Discussion on Behaviour of Adapted MMAS with TSP Instance rl1889
The optimum for TSP instance rl1889 is 316,536. As data in Table 7 (except line no. 17) display, the relative error for the 18 variants of MMAS with ampl = 40 is placed in the interval [0.49, 0.69], corresponding to an absolute error (distance-to-optimum in the file rl1889) in the interval [1565.4, 2184.6]. For this instance, the relative error is very small also, as it is for d2103. We note that the minimum value for the relative-error is also obtained for MMAS with ampl=10 (0.49, in line 17), and Figure 2. Intercorrelations between the path-performance metricsaverage-best, stddev-best, best try and worst try for fl3795.

Discussion on Behaviour of Adapted MMAS with TSP Instance rl1889
The optimum for TSP instance rl1889 is 316,536. As data in Table 7 (except line no. 17) display, the relative error for the 18 variants of MMAS with ampl = 40 is placed in the interval [0.49, 0.69], corresponding to an absolute error (distance-to-optimum in the file rl1889) in the interval [1565. 4, 2184.6]. For this instance, the relative error is very small also, as it is for d2103. We note that the minimum value for the relative-error is also obtained for MMAS with ampl=10 (0.49, in line 17), and that this value corresponds to a smaller absolute error of 1543.8. For the original MMAS, recovery speed ∈[0.7864, 2.5652] (see file rl1889), while for the applications corresponding to data in Table 7 (with ampl = 40), recovery speed ∈ [0.641, 2.444], with only two values smaller than the minimum recovery speed for MMAS with ampl = 0. For this data set, the metric shows that MMAS(*,*,*, 40) continue to behave very well and deliver quality solutions.
The results above allow us to qualitatively assess the resilience of the adapted MMAS applications as High [34] for this instance, too.
Linear regression model for the metric average-best follows in Table 8. Considering the adjusted R-Squared index we conclude that the linear regression model does not explain the average-best dependence of epoch, p_nod_ch, p_inten, ampl. As in the case of d2103, the implementation performs very well for every combination of parameters.
For the metric recover speed, the linear regression model is presented in Table 9. This model indicates a negative correlation of the metric recovery speed with ampl and positive correlations with p_nod-ch, p_inten and epoch. As for the case of instance d2103, only the factors epoch and ampl are significant in the model.
In Figure 3 we chose to display the same type of correlogram as in Figure 1, in order to compare the intercorrelations of other time-performance metrics for d2103 and rl1889. In Figure 3 we chose to display the same type of correlogram as in Figure 1, in order to compare the intercorrelations of other time-performance metrics for d2103 and rl1889. We notice the same types of correlations for d2103 as for rl1889, except for average-iterations and stddev.time best, for which the correlation is positive for d2103 and negative for rl1889. For rl1889, the order parameter [84] is OP = 0.5129. We notice the same types of correlations for d2103 as for rl1889, except for average-iterations and stddev.time best, for which the correlation is positive for d2103 and negative for rl1889. For rl1889, the order parameter [84] is OP = 0.5129.

Discussion
The statistical analysis of data recorded for two of the studied TSP instances (d2103 and rl1889) reveals that the adapted MMAS implementation is highly resilient: both the delivered solutions and the time-performance metrics show that the ant algorithms perform very well when distortions appear. For these two instances, under the settings presented in Section 4.2, the quality of the solutions returned by MMAS(epoch, p_nod_ch, p_inten, ampl) do not depend on the four factors, while the recovery speed depends on epoch and ampl. These facts, supported by the statistical analysis of the experimental data, allow us to conclude that the applications are highly resilient. For the other TSP instance, fl3795, the adapted MMAS also proves to be resilient, but with a different behavior pattern: the quality of the solutions returned is slightly influenced by epoch and ampl, while the recovery speed does not depend on the four factors. The answer to the question "Why does this happen?" could be searched among multiple causes, but two are immediate: the number of nodes and the instances of structures. These will drive our next studies: one focused on larger instances and another based on the structural features [85] of TSP instances. The nondeterministic nature of MMAS can also suggest that research on another output data set, for the same instances, is worth being investigated.
Our previous research with ACO [86,87] gave us some insight on the ant algorithms behavior under different circumstances. When referring to the resilience attributes synthesized in [34], we notice that all of them are representative for this class of algorithms. A brief enumeration of resilience attributes (and their translation to ACO and the artificial ants) include: interconnection among the agents (artificial ants communicate, which leads to system flexibility); redundancy of units (artificial ants perform the same simple tasks, but the performance of the system emerges from their interaction); diversity and variety in resilient systems, which means that there are different parts and a variety of ways of functioning (all these features have been implemented in ACO); the diversity of approaches, which is about generating various views on a situation (also possible in ACO). Moreover, parallelism and distributiveness (implicit in ACO, with no central command and no single point of failure) and handling of increased workloads (easily achievable with ACO), dynamic learning (the pheromone updating mechanism) are important attributes. Finally, self-organization lies in the natural model that inspired ACO and further developments of this heuristic approach.
When optimization problems arise in practice, the users need to have confidence in the quality of the solutions provided by their applications. There are many situations when things do not go "by the book". Sometimes unexpected events happen and change the graph representation of a problem, or make some data unavailable. Other demanding situations can emerge when communication is interrupted, data are imprecise or mistaken, or even human errors appear. In such cases, if applications are resilient, these manage to provide solutions which are sufficiently close to those provided in normal conditions. Ant algorithms are systems that behave in a manner that is widely accepted by researchers and practitioners as resilient. One goal for our future work would be the design of ant algorithms that embed other types of intelligence and are able to learn and adapt. This falls into line with the innovative proposal of Florio [88]: " . . . we need to look at the nature and change the paradigm of resilience from «being at work while staying the same» to «being at work while getting better», namely becoming a better system or being".