Multiagent Learning of Asset Maintenance Plans Through Localised Subnetworks

Maintenance planning of networked multi-asset systems is a complex problem due to the inherent individual and collective asset constraints and dynamics, as well as the size of the system and interdependencies among assets. Although multi-asset systems have been studied numerous times in the past decades, the maintenance planning implications of the system's network characteristics have barely been analysed. Likewise, solutions that consider the network perspective suffer from scalability issues, as network-wide observability is assumed. This paper proposes a network maintenance planning approach based on the decomposition of the multi-asset network into fixed-size localised subnetworks. The overall network maintenance plan is produced by aggregating the subnetwork maintenance plans, which are computed independently via a multiagent deep reinforcement learning (MARL) algorithm. The results are evaluated against a network-wide approach as well as the commonly used individual approach. The paper also introduces a systematic approach to integrate the resulting MARL policy into a multi-asset agent-based model. Simulation results of several random asset networks and a large nationwide network infrastructure show that, although a network-wide approach outperforms, on average, the other approaches considered, the localised subnetworks approach provides an acceptable alternative in networks with small-world properties, without the need for a network-wide view.


Introduction
Maintaining critical infrastructure assets in a timely manner is key to prolonging their lifespan. This reduces maintenance operation costs and the overall impact on service delivery. Most of the attention in research and practice has been on individual asset maintenance, and only recently have the sources of complexity of multi-asset systems been considered for system-wide maintenance planning (Petchrompo and Parlikad, 2019).
Although progress has been made in understanding the collective nature of the maintenance planning problem, there is still a lack of studies covering the implications of maintenance planning in networked multi-asset systems, as discussed in section 2. These studies are relevant to support management decisions in the context of large asset networks aggregating subnetworks with multiple topologies, such as those present in nationwide critical infrastructures, among others (Zio, 2007). This type of networked multi-asset system brings additional challenges to the maintenance problem. On the one hand, the interdependencies among assets, and on the other, the scale of the systems, restrict the application of (global) network-wide approaches that consider the state of every individual asset and the resulting network dynamics.
This paper explores the use of network-specific maintenance planning with the aim of minimising maintenance costs and the impact on service provision. A network-specific planning approach captures the network characteristics and uses that information to identify opportunities for synergistic maintenance activities in a multi-asset system. Our previous work has shown that, although network-wide approaches might yield the best results compared to individual approaches, thanks to the global view of the system, the computational resources needed to generate the plan scale up as the multi-asset system grows (Pérez Hernández, Puchkova and Parlikad, 2022). Hence, the motivation of this study is to identify alternative approaches that consider the network characteristics of the multi-asset system and generate an acceptable maintenance plan without the need for a global network view.
To address this challenge, this paper proposes a maintenance planning approach grounded in breaking down the networked multi-asset system into fixed-size subnetworks and solving each subnetwork maintenance plan separately. This task is approached by framing the planning as a multiagent reinforcement learning (MARL) problem (Busoniu, Babuska and De Schutter, 2010). The paper also introduces a systematic approach to integrate the resulting MARL policy into an agent-based model that enables simulation of the dynamics of a networked multi-asset system. This approach is evaluated against a network-wide approach and common preventive and corrective individual approaches.
The contents of the paper are structured as follows. The relevant literature on multi-asset maintenance planning is reviewed in section 2, mainly considering nationwide infrastructures where complex network characteristics are evident. The formulation of the network maintenance problem is presented in section 3. A comprehensive Network-wide approach is explained in section 4. As an alternative to this approach, a novel localised approach is introduced in section 5. This approach uses only information from the local subnetworks to produce the maintenance plan. As it relies on multiagent reinforcement learning (MARL) to learn the optimal policy, the mechanism to integrate MARL policies into a networked-assets agent-based model is also proposed in section 6. The network-specific approaches are evaluated against alternatives for several random networks and a large nationwide infrastructure network. The context of evaluation is provided in section 7. The evaluation results are presented and discussed in section 8. Finally, conclusions and future work are drawn in section 9.

Multi-Asset Maintenance Planning
In recent years, there has been significant attention to the maintenance planning of multi-asset systems. These are systems of several homogeneous or heterogeneous assets that might depend on each other (Petchrompo and Parlikad, 2019). These dependencies can be physical or logical, and they are the origin of networks of assets.
Although a network perspective is not always considered in the study of multi-asset systems, this perspective brings tools to capture the static and dynamic properties of the system and better understand its behaviour (Vespignani, 2018). This behaviour is key to determining the best maintenance approaches. The network structure and dynamics of the system can be considered, among other factors, at different planning levels. For example, a maintenance strategy encompasses a wide organisational perspective, as highlighted in industry standards such as ISO 55001 (ISO (International Organization for Standardization), 2014), and strategies can be classified according to the distinctive approaches of corrective (breakdown), preventive and predictive maintenance (Poór, Ženíšek and Basl, 2019). Likewise, multiple objectives, such as maximising system availability, minimising maintenance costs, or minimising the risk of failures, can be considered within these maintenance strategies (Pinciroli, Baraldi and Zio, 2023).
Civil infrastructures are usually seen as networks with multiple maintenance planning drivers. Researchers have structured a solution based on dynamic programming to plan the maintenance of a bridge network, considering safety and cost objectives (Frangopol and Liu, 2007). Their approach aims to reach optimal solutions, firstly at the individual level and secondly at the network level. Similarly, minimisation of pavement costs while maintaining quality requirements has been achieved with a multi-objective optimisation model (Meneses and Ferreira, 2012). Moreover, a model combining analytical and numerical techniques for multi-component multi-system networks is presented in (Liang and Parlikad, 2020). The model introduces a genetic algorithm where the mutation is based on an agglomerative procedure, altogether solving the Markov Decision Process (MDP) for maintaining the entire system. The model is demonstrated on a two-bridge network, obtaining reductions in overall maintenance costs. A weighted random forest algorithm also enables maintenance planning decisions in a road network (Han, Ma, Xu, Chen and Huang, 2022). The decisions are based on the sequence of the conservation plan, the time after maintenance, and specific maintenance indicators.
Maintenance plan optimisation has also been a recurrent challenge in transport industries. Maintenance of railway networks is studied in (Mohammadi and He, 2022). The authors use double deep Q-network reinforcement learning to find the policy that optimises maintenance and renewal planning, aiming to reduce costs and failure occurrence in large railway networks. The case study focused on a 5-year plan for a railway network of 4,000 miles, which was discretised into segments as part of the state model. Mixed integer programming has enabled the optimisation of maintenance activities while taking into account long-term railway traffic as well as the required maintenance cycles (Lidén and Joborn, 2017). Another proposal aims to formulate a plan for the Chinese railway infrastructure by considering both the maintenance requirements and the constraints derived from the railway network schedule (Zhang, Gao, Yang, Gao and Qi, 2020). These authors use a heuristic algorithm built on Lagrangian relaxation to solve the joint optimisation problem. Simulation of these railway asset networks has also been addressed by researchers. Fleets of assets have been simulated to study the condition-based maintenance of critical components (Márquez, Alberca and del Castillo, 2023). Their focus is on optimising the maintenance activities for each asset's critical components based on their remaining useful life (RUL). Moreover, vehicle fleets have been studied as a multi-objective problem (Wang, Limmer, Van Nguyen, Olhofer, Bäck and Emmerich, 2022). In their work, the authors use the predicted RUL of the vehicle components to compute the maintenance schedule of the vehicle fleet. The schedule is obtained by using a tailored evolutionary algorithm that seeks to reduce repair costs, improve safety and reduce downtime.
In addition to specific asset condition, different aspects, including geography, customer needs and risk, among others, are also considered in maintenance planning of different infrastructures. Researchers of water distribution networks have proposed a Maintenance Grouping Optimisation model based on genetic algorithms (Li, Ma, Sun and Mathew, 2014). This approach enables the grouping of adjacent pipelines to plan maintenance, showing cost benefits compared to ungrouped plans. Empirical, single- and multi-objective optimisation approaches for the maintenance of pipeline networks have also been evaluated with evolutionary algorithms (Chu, Zhou, Ding and Tian, 2022). In this case, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) with elitist selection obtains the maintenance plan that contemplates costs, reliability and overall network health. Online and offline deep Q-network reinforcement learning has also been used in maintenance planning of water pipes (Bukhsh, Molegraaf and Jansen, 2023). The authors train an agent to learn the optimal rehabilitation policies based on pipe deterioration profiles. A risk-oriented perspective has been adopted for the study of natural gas distribution systems in Italy (Leoni, BahooToroody, De Carlo and Paltrinieri, 2019). By considering the relevant risks, the authors are able to optimise the maintenance time for the components of the analysed gas monitoring stations.
Maintenance planning of power networks has also received significant attention. Customer requirements and long-term economic savings are considered in microgrids (Moradi, Vahidinasab, Kia and Dehghanian, 2019). The researchers propose a multi-attribute decision-making model that enables identification of critical components and their failure rate over time. Focusing on power distribution networks, another proposal considers not only maintenance planning but also day-ahead scheduling (Matin, Mansouri, Bayat, Jordehi and Radmehr, 2022). In their work, the authors demonstrate that an approach based on the epsilon-constraint method can lead to significant reductions in operating costs and improved reliability of multi-microgrids. The authors of (Rocchetta, Bellani, Compare, Zio and Patelli, 2019) study a deep reinforcement learning framework for planning maintenance and operations of a power grid system. An agent was trained to select the combined operations and maintenance actions. Solutions were found to be comparable with true optima for a scaled-down grid scenario.
Approaching the maintenance problem as a Markov decision process has enabled researchers to use reinforcement learning (RL) algorithms. Researchers have developed an RL model based on neural networks for the maintenance of pavement (Yao, Dong, Jiang and Ni, 2020). The single-agent deep Q-learning solution approach achieves long-term cost-effectiveness in the context of the Ningchang and Zhenli expressways. Deep Q-learning is also applied to K-component mechanical systems with structural dependencies, producing policies that reduce system life-cycle costs (Chen and Wang, 2023). Multi-component systems are also approached using deep reinforcement learning (Zhang and Si, 2020). These authors incorporate dependent and competing risks into the problem formulation. A deep Q-network (DQN) model is also applied to planning maintenance of regional deteriorating bridges (Lei, Xia, Deng and Sun, 2022). Their model optimises regional life-cycle strategies according to various budget constraints.
So far, a limited number of works have adopted a multiagent perspective, particularly in the maintenance planning of nationwide infrastructure. A multiagent environment is considered for the optimisation of the maintenance of parallel homogeneous working machines (Kuhnle, Jakubik and Lanza, 2019). In this work, opportunistic agents learn, via proximal policy optimisation, when to trigger maintenance actions as close as possible to breakdown, hence reducing downtime and maintenance costs. Although the paper considers interdependencies and interactions between the different machines in the production system, the topology studied is simple, based on parallel machines. Another multiagent approach is used to coordinate maintenance scheduling among a set of partially observed machines (Rodriguez, Kubler, de Giorgio, Cordy, Robert and Le Traon, 2022). A mix of sequential/parallel and centralised/distributed agent architectures is analysed. The problem is approached as a Markov game that is tackled with the proximal policy optimisation algorithm. Likewise, a Multi-Actor Critic (MAAC) framework has been used to plan the maintenance of radio access networks with a single grid topology (Thomas, Hernandez, Parlikad and Piechocki, 2021). Although network dependencies in a particular network are considered, there is no indication of how this could work in different network configurations.
Based on this review, the maintenance of multiple assets has generated more interest in the context of some civil infrastructures, such as bridges and power networks, and less attention in other infrastructures, such as telecommunications.
A summary of the works reviewed is presented in Table 1. Although network structures are implied when identifying dependencies, the topologies analysed are usually simple, with a limited number of assets or simple sequential/parallel structures in the multi-asset systems. Furthermore, there is still limited research on the assessment of the benefits and trade-offs of different maintenance approaches for networked multi-asset systems.

Table 1
Summary of reviewed works covering multi-asset infrastructure maintenance planning and selected works using reinforcement learning.

Network Maintenance Problem
The goal of the maintenance planner is to identify the optimal plan for maintaining a portfolio of assets. In a network setting, not only the characteristics of the assets (network elements) but also the structure of the connections among assets (network topology) become relevant when considering the potential impact of the maintenance plan. The optimal plan should consider minimum maintenance costs but also minimum impact on the quality of the services enabled by the assets. For simplification, this study considers a network of assets where asset heterogeneity is limited to the speed of the deterioration patterns, all assets following a common linear deterioration function. The focus is on the role of network topology, the effect on throughput as the key quality indicator, and the overall cost per cycle. The total maintenance cost function is made up of the downtime cost, labour cost, lost-life cost and cost of parts. Pérez Hernández et al. (2022) provide a detailed description and discussion of the cost function and the parameters used in this study.
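The composition of the total cost function described above can be sketched as follows. This is a minimal illustration only: the rates, the asset value and the linear combination are hypothetical placeholders, not the paper's calibrated parameters (see Pérez Hernández et al., 2022 for those).

```python
def total_maintenance_cost(downtime_h, labour_h, lost_life_frac, parts_cost,
                           downtime_rate=100.0, labour_rate=50.0,
                           asset_value=10_000.0):
    """Illustrative total cost per maintenance cycle: downtime cost +
    labour cost + lost-life cost + cost of parts. All rates are
    hypothetical placeholders, not the paper's parameters."""
    downtime_cost = downtime_h * downtime_rate     # service unavailability
    labour_cost = labour_h * labour_rate           # technician time
    lost_life_cost = lost_life_frac * asset_value  # remaining useful life forfeited
    return downtime_cost + labour_cost + lost_life_cost + parts_cost
```

A preventive intervention that forfeits remaining life would score worse on the lost-life term but may avoid the (typically larger) downtime term of an unplanned failure.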
To tackle this problem, well-known corrective and preventive approaches can be used to support decisions from an individual perspective of the assets. Beyond that, multi-asset approaches can incorporate the dynamics of multiple assets into the decision problem; however, these approaches do not capture in detail the network properties of the system.
The following sections introduce two approaches that exploit those properties, enabling solutions that are tailored to every network of assets. For the sake of clarity, the approaches are explained in the context of a telecommunications network; however, the abstractions and solution approach used make it possible to capture the dynamics of other complex network systems, such as nationwide critical infrastructures (i.e. transport, water, energy or others). Note that this problem focuses on the network perspective of multi-asset systems. This enables the formulation and use of analysis techniques that are common to multiple domains; however, comprehensive planning also requires elements that are specific to the domain.

Network-wide Approach
This approach assumes complete observability of the relevant features of the network. A centralised optimiser is fed, periodically, with snapshots of the condition of individual assets and the state of the traffic flows during a time window.
In a telecommunication network, the traffic flows pertain to the data packets being transported across multiple pieces of network equipment from the source of the data (e.g. servers, camera feeds, sensors) to the consumers (e.g. mobile phones, industrial computers, laptops, TVs, cars).
We represent the telecommunication network as nodes $n \in N$ linked by arcs $a \in A$, with a set of services $\{(s, d)\}$, where the traffic flow of each service starts from a source node $s$ and ends at a destination node $d$. Nodes that are likely to fail are denoted by $\bar{N} \subset N$. A number of control variables are defined to model traffic flow and maintenance decisions as follows: $x_{a,t}^{s,d}$ is the traffic amount flowing through arc $a$ at time $t$ from source $s$ to destination $d$; $z_{n,t}$ equals 1 if a predictive maintenance job on node $n \in \bar{N}$ starts at time $t$, and 0 otherwise.
To guarantee that the decision variables behave as desired, the following constraints are introduced. Constraints (1) ensure that the sum of all traffic leaving source node $s$ equals the traffic demand of each service. The sum of traffic leaving any intermediate node does not exceed the sum of traffic flowing into that node, see constraints (2).
Constraints (3) imply that the sum of traffic going through arc $a$ cannot exceed its capacity $c_a$, and that traffic flow is not permitted on arcs incident to a node undergoing maintenance (i.e. when $z_{n,t} = 1$).
A node $n \in \bar{N}$ on maintenance is shut down for the duration of its predictive maintenance job, see constraints (4). In this model we consider continuous maintenance without pre-emption, as represented by constraints (5).
The optimisation model aims to identify the values of the decision variables defined earlier that minimise the total cost, consisting of the maintenance cost, the cost of traffic loss and the rerouted traffic cost, where $p_n$ is the failure probability of node $n \in \bar{N}$.
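The demand, flow-conservation, capacity and maintenance-shutdown constraints can be illustrated with a small feasibility check for one time step. This is a simplified single-service sketch, not the paper's mixed-integer model: the dictionary layout and the function name are assumptions for illustration.

```python
def feasible(flows, capacity, maintenance, demand):
    """Check the sketched constraints for one time step.

    flows:       {(u, v): amount} traffic per arc
    capacity:    {(u, v): max}    arc capacities
    maintenance: set of nodes currently shut down
    demand:      {(src, dst): amount} per-service demand
    """
    # Capacity, and no flow on arcs incident to a node under maintenance.
    for (u, v), f in flows.items():
        if f > capacity[(u, v)]:
            return False
        if f > 0 and (u in maintenance or v in maintenance):
            return False
    # Traffic leaving an intermediate node must not exceed traffic into it.
    nodes = {n for arc in flows for n in arc}
    endpoints = {n for sd in demand for n in sd}
    for n in nodes - endpoints:
        inflow = sum(f for (u, v), f in flows.items() if v == n)
        outflow = sum(f for (u, v), f in flows.items() if u == n)
        if outflow > inflow:
            return False
    # Traffic leaving each source must meet its demand.
    for (src, dst), d in demand.items():
        out = sum(f for (u, v), f in flows.items() if u == src)
        if out < d:
            return False
    return True
```

In the full model these checks become linear constraints indexed per service and per time step, and a solver searches over flows and maintenance start times jointly.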

Local-networks (Localised) Approach
Due to the scale of the asset network or specific deployment limitations, there are cases where full observability of the network cannot be guaranteed, or where there is not enough capacity to process the data collected from the entire network in a timely manner. In these cases, alternative approaches considering only partial network information are necessary. The Local-networks (Localised) approach defines the network maintenance problem as a multiagent reinforcement learning (MARL) problem.
The reinforcement learning framework enables an agent to learn, from its interactions with the environment, a sequence of actions that maximises a given cumulative reward (Sutton and Barto, 2018). An RL problem is formally defined as a Markov decision process (MDP), and RL algorithms aim to find a policy that drives the agent's decision-making process (Sutton and Barto, 2018). MARL is an extension of the single-agent reinforcement learning (RL) problem where multiple agents interact with the environment and take actions, hence potentially influencing each other (Busoniu et al., 2010).
The overall network maintenance plan is built from the localised maintenance decisions that independent agents take, based on the observability of their local networks. Agents observe their environment and learn decentralised policies that seek to maximise individual rewards. Collectively, the aggregation of individual rewards yields a system-level reward. The idea of applying this approach is to determine the ability of agents to learn an acceptable policy and to understand the magnitude of the compromise that the maintenance planning decision-maker faces when a Network-wide approach is not feasible. The rationale for this approach is to reduce the dependency on full network information to drive maintenance decisions. Mathematically, the problem can be formulated as an adaptation of the stochastic game definition (Busoniu et al., 2010), as the tuple $\langle S, A_1, \dots, A_M, P, R \rangle$ over an environment $\Gamma$, where $S$ is the set of possible states, of which each of the $M$ agents observes a part (see section 5.1). Every agent $i$ is able to take actions from $A_i$; note that the original MARL definition is simplified by assuming that $A_1 = A_2 = \dots = A_M$, in other words, all agents share the same action space, which is discrete with $a_0$: do nothing and $a_1$: start maintenance. Likewise, $P$ and $R$ are functions that capture the transition probabilities and represent the collective reward, respectively. In this case, the collective reward is a simple aggregation of the individual rewards of every agent: $R = \sum_{i=1}^{M} r_i$.
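The stochastic-game simplification above (shared two-action space, collective reward as a sum of individual rewards) can be sketched as a toy environment skeleton. The deterioration rule and the reward magnitudes here are illustrative assumptions, not the paper's model.

```python
class LocalisedMaintenanceGame:
    """Toy stochastic-game skeleton: each agent picks a_0 (do nothing)
    or a_1 (start maintenance); the collective reward is the sum of
    the individual rewards, as in the paper's simplification.
    Deterioration and reward numbers are illustrative only."""

    A0_NOTHING, A1_MAINTAIN = 0, 1

    def __init__(self, n_agents, decay=0.1):
        self.condition = [1.0] * n_agents  # per-asset health in [0, 1]
        self.decay = decay

    def step(self, actions):
        rewards = []
        for i, a in enumerate(actions):
            if a == self.A1_MAINTAIN:
                self.condition[i] = 1.0      # restore the asset
                rewards.append(-1.0)         # maintenance cost
            else:
                self.condition[i] = max(0.0, self.condition[i] - self.decay)
                rewards.append(-5.0 if self.condition[i] == 0 else 0.0)
        # Collective reward R is the simple aggregation of individual rewards.
        return self.condition, rewards, sum(rewards)
```

Each agent would observe only its own asset plus its egonet, not the full `condition` vector, which is what makes the policies decentralised.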

Figure 1: Local network observation process. Local networks (egonets) are extracted for each node of the full network, representing assets and their links. Agents are then trained using information about nearby unavailable assets (red nodes) to learn when maintenance is more detrimental, according to local network robustness metrics.

Networked Asset State
The environment state is formed by continuous, discrete and network-representation components. It is defined by the tuple $s = \langle c, u, g \rangle$, where $c$ is the current condition of the asset, $u$ is the individual tracker of the maintenance state of the asset (up, down or on maintenance) and $g$ is the local network state. Every agent is assumed to have visibility of the local network of its asset. This is also known as an ego-centric perspective or egonet (Scott and Carrington, 2011). More information ($c$ and $u$) is available for the ego (focal) node, and the number of nodes of the egonet varies depending on its depth. Both the size of the egonet and the information shared by neighbours are limited to control the communication overhead required in the learning process. As every agent computes the maintenance plan for an egonet, this subnetwork serves as a limit on the number of assets considered in every maintenance plan.
Similar approaches using limited-range network centrality measurements have been used to address the complexity of characterising the structure of large networks in other domains (Ercsey-Ravasz and Toroczkai, 2010). Following a standard reinforcement learning process with deep Q-learning, the agents are trained by observing their network state, taking individual actions and receiving rewards according to the state-action pair. As one of the main drivers of this approach is to offer a low-overhead localised alternative, the state space is transformed into a uniform continuous space. This is possible by computing the "health" of the local network based on the available edges at every time step. This metric is regarded as the network density (Newman, 2018): $\rho = m / \binom{n}{2} = 2m / (n(n-1))$, which corresponds to the number of connected edges $m$ over the number of possible connections among the $n$ nodes. The rationale for using this metric is that it can inform the agent about the impact of going to maintenance at time $t$ at the local network level. Note there is no consideration of flows in the state space; this is also a design decision to reduce the overhead, as live traffic is computationally expensive to collect and process.
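Extracting an egonet and computing its density can be sketched in a few lines over an adjacency dictionary. The treatment of unavailable nodes (failed or on maintenance edges not counted, while the denominator keeps the full egonet size, so density drops below its expected value) is our reading of the description and should be treated as an assumption.

```python
def egonet(adj, ego, depth=1):
    """Nodes within `depth` hops of `ego` in adjacency dict `adj`."""
    nodes, frontier = {ego}, {ego}
    for _ in range(depth):
        frontier = {v for u in frontier for v in adj[u]} - nodes
        nodes |= frontier
    return nodes

def density(adj, nodes, unavailable=frozenset()):
    """Network density rho = m / (n(n-1)/2) over the available part of
    the subnetwork: edges touching failed/maintained nodes don't count,
    while n stays the full egonet size (an assumption of this sketch)."""
    up = nodes - unavailable
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(1 for u in up for v in adj[u] if v in up and u < v)
    return m / (n * (n - 1) / 2)
```

With one neighbour down, the observed density falls below the all-assets-up value, which is exactly the signal the agent's penalty term relies on.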

Reward Function
The reward function of every agent, $r_i(s, a)$ at time step $t$, is inversely derived from the cost function (section 3).
Moreover, an additional term is added to account for the local network information. When the $a_1$ (start maintenance) action is selected, the agent is penalised proportionally according to the density of the egonet at the current step. This is multiplied by a factor that weighs the importance of the egonet information in the agent's reward, where $C_{i,t}$ is the total cost of maintenance calculated according to section 3. The density of the (egonet) subnetwork at time $t$ is influenced by the condition of the assets that are part of the subnetwork. In particular, if assets have failed, the density will be lower than the expected density $\rho^*$ when all assets are working. The expected effect of this function is that the agent learns to balance the decision of when maintenance is due because the asset condition has deteriorated, while also being discouraged from maintenance when there are other assets within its local network on maintenance or failed.
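One plausible reading of this reward, given that the effect sought is to discourage maintenance while neighbouring assets are already down, is a penalty that grows as the observed egonet density falls below its expected value. The functional form, the symbol `omega` for the weighting factor, and the indicator-style gating on the maintenance action are all assumptions of this sketch.

```python
def agent_reward(cost, action, rho, rho_star, omega=1.0):
    """Sketch of a per-agent reward consistent with the description:
    negative total maintenance cost, minus a penalty proportional to
    how far the egonet density rho has dropped below its expected
    value rho_star, applied only when maintenance (a_1) is started.
    The exact functional form and omega are assumptions."""
    A1_MAINTAIN = 1
    penalty = omega * (rho_star - rho) if action == A1_MAINTAIN else 0.0
    return -cost - penalty
```

With all neighbours up (`rho == rho_star`) the penalty vanishes, so the agent's decision reduces to the cost-driven trade-off of section 3.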

Independent DQN
A reinforcement learning (RL) algorithm finds the set of actions, namely a policy $\pi$, that maximises the agent's cumulative reward. Algorithms are normally suited to a particular environment, including specific action and state spaces. Independent Deep Q-Networks (I-DQN) (Tampuu, Matiisen, Kodelja, Kuzovkin, Korjus, Aru, Aru and Vicente, 2017) is a foundational algorithm that has been used in multiagent environments with discrete action spaces. I-DQN is a multiagent adaptation of the single-agent DQN algorithm, which has been benchmarked in different environments (Mnih, Kavukcuoglu, Silver, Rusu, Veness, Bellemare, Graves, Riedmiller, Fidjeland, Ostrovski et al., 2015). As expected in a multiagent environment, this algorithm does not provide guarantees of convergence to an optimal global policy due to non-stationarity. However, similar approaches have been successfully used in two-player games (Foerster, Assael, De Freitas and Whiteson, 2016). This algorithm was selected for its suitability for the discrete action space, its simplicity and its decentralised nature, which is aligned with the aim of learning a policy based only on the local subnetworks.
Algorithm 1 is derived from the multiagent I-DQN adapted by Tampuu et al. (2017). It aims to find the policy starting from the observation of the current environment state. Every agent selects an action by using the $Q$-value function that estimates the quality of each action $a$ at the given state $s$. To allow for exploration, instead of always selecting the highest-valued action, agents occasionally take random actions.

Algorithm 1: Independent DQN. Derived from (Tampuu et al., 2017).

Figure 2: MARL and Multi-Asset Agent-Based (ABM) Simulation. MARL agents are trained offline to learn a policy that is later used to generate maintenance plans for specific test networks. These plans are pushed to the ABM simulator, enabling examination of the networked multi-asset system dynamics for the given plan.
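The independent-learner idea can be sketched with a tabular stand-in for one agent. The paper uses neural Q-networks with replay memory; a Q-table keeps this sketch dependency-free, and the hyperparameter values are illustrative assumptions.

```python
import random
from collections import defaultdict

class IndependentQAgent:
    """Tabular stand-in for one I-DQN agent. Each agent learns its own
    Q-function independently, treating the other agents as part of the
    environment (the source of the non-stationarity noted above)."""

    def __init__(self, n_actions=2, eps=0.1, alpha=0.5, gamma=0.9):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.eps = n_actions, eps
        self.alpha, self.gamma = alpha, gamma

    def act(self, state):
        # Epsilon-greedy: explore with probability eps, else exploit.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        qs = self.q[state]
        return qs.index(max(qs))

    def update(self, s, a, r, s_next):
        # One-step Q-learning update towards the bootstrapped target.
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

In the Localised approach, `state` would be the tuple of asset condition, maintenance status and egonet density described in section 5.1, discretised or fed to a network instead of a table.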

Reinforcement Learning In The Multi-Asset Agent-Based Model
NAssets.jl is an agent-based model and simulator (ABMS) introduced in (Pérez Hernández et al., 2022) that enables modelling and simulation of networked multi-asset systems. This model represents assets as agents whose condition deteriorates over time, following a defined model. Likewise, NAssets.jl allows for the configuration of network topologies, identifying assets as vertices, with the edges among them enabling traffic flows. The simulator supports the introduction of an agent-based control system that manages the maintenance operations and routing of the underlying network of assets. This control system can be defined by a single agent or by several agents arranged in their own control network. As part of this work, the NAssets.jl model is extended to enable integration of the offline network-specific approaches described in sections 4 and 5.
Network-specific maintenance approaches rely on an offline planning phase, which uses available network data to determine the maintenance plan. The network topology and condition deterioration functions are used as starting points in both the Network-wide and Localised approaches. In the Localised approach, the complete topology is only necessary when evaluating the learned policy on the network of interest; subnetworks are obtained around every critical asset.
Once the maintenance plan is generated, the NAssets.jl model simulates traffic and condition deterioration dynamics during a defined observation time.
The integration of reinforcement learning approaches into agent-based models (ABMs) has been identified as a way to support decision-making processes within the simulation of complex systems. Similar techniques have been explored in domains other than network maintenance planning, for example in (Vargas-Pérez, Mesejo, Chica and Cordón, 2023; Lee, Rucker, Scherer, Beling, Gerber and Kang, 2017). For network maintenance, MARL processes are integrated into the ABM according to the flow presented in Figure 2.
There are two main phases in this flow. During the first, Offline Planning, phase, the MARL agents are trained according to the process described in section 5.3. The networks of interest are pushed to the agents, which use the reward function (section 5.2) to drive the policy learning process that determines the maintenance actions. Once a policy is learned by the agents, test networks are used to generate maintenance plans for the required period. The resulting plans are consolidated into a single plan in the form of an $N \times T$ binary matrix, with $N$ assets and $T$ time steps, where an entry is set to 1 when maintenance is due.
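The consolidation of per-agent plans into the binary matrix can be sketched as below. The input layout (a map from asset index to the list of maintenance start steps) is an assumption for illustration.

```python
def consolidate_plans(agent_plans, n_assets, horizon):
    """Merge per-agent maintenance decisions into one N x T binary plan
    matrix: plan[i][t] == 1 when maintenance on asset i is due at step t.
    agent_plans maps asset index -> list of start steps (assumed layout)."""
    plan = [[0] * horizon for _ in range(n_assets)]
    for asset, starts in agent_plans.items():
        for t in starts:
            plan[asset][t] = 1
    return plan
```

This matrix is the artefact handed over to the simulation phase, which schedules maintenance and re-routing events from it.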
During the Network Dynamics Simulation phase, the consolidated plan is loaded into NAssets.jl, which is also configured according to the network topology, the service portfolio supported by each network, the condition deterioration model for the assets and the traffic dynamics parameters. At the start, the ABM configures maintenance activities and traffic re-routing as events in line with the input plan. The agent-based control system monitors asset condition and acts according to the planned events.

Case Study: Multi-Asset Networks In Nationwide Digital Infrastructure
The nationwide digital infrastructure is a multi-asset system where routers, mobile antennas, ad hoc computing resources and many other devices enable data packet transport across the country. This infrastructure is also a large complex network of networks that is carefully designed considering several requirements, such as performance, quality, reliability and cost-efficiency. In particular, it is expected that data packets across the network travel only a few hops to their destination; this is known as the small-world effect (Newman, 2018). Likewise, others have highlighted that traffic flows within this type of network follow a scale-free model (Pastor-Satorras and Vespignani, 2004).
The infrastructure makes possible the transport of data between providers and consumers. Service requirements specify data transfer expectations from individual and business customers. Likewise, mobile or broadband operators use the nationwide digital infrastructure to support their own service portfolio (Amin et al., 2000). There is a Service Level Agreement (SLA) for each service, which also includes Key Performance Indicators (KPIs), facilitating evaluation of the delivered quality of service against the specification (Kosinski, Nawrocki, Radziszowski, Zielinski, Zielinski, Przybylski and Wnek, 2008). As multiple KPIs are monitored depending on the service, the focus of this case is on one of the most common: throughput, which indicates the rate of data packets delivered over time from end to end (providers to consumers).
Although the network perspective is not constrained to a particular planning level, this case focuses on tactical maintenance planning. This planning assumes a stable network of assets and a set of fixed contracted services according to the network capacity for a medium-term period, e.g. six to twelve months. A challenging task at this level is to balance the maintenance costs while keeping an adequate quality of service across the infrastructure, built from geographically distributed assets. The infrastructure follows a hierarchical architecture organised in network segments with different technologies and protocols (Tanenbaum, 2003). Access networks enable users, either data providers or consumers, to join the network, while metro/regional networks connect specific geographical areas to the core/backbone network, which ensures national long-distance data packet transfer (Stavdas, 2010).
The environment for evaluation of the localised maintenance approach is motivated by the characteristics and dynamics of the UK's nationwide digital infrastructure. At the small scale, random networks are generated to resemble some of the networks present in this infrastructure. In particular, the Barabási-Albert (BA) model (Barabási and Albert, 1999) facilitates the generation of scale-free networks and the Watts-Strogatz (WS) model (Watts and Strogatz, 1998) is used for networks that exhibit small-world properties. At the large scale, the UK's metro-core network presented in Figure 3 is used.

The most serious impact of the maintenance activities is presented in Figure 4, when, due to the lack of backup paths, the service throughput drops to zero. The figure shows the different timing of the maintenance activities according to each approach. For example, if maintenance starts too early and there are no alternative paths, more maintenance activities are required during the same time frame, as shown in the preventive approach.
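The two random-network families mentioned above can be generated with `networkx`; the sizes and parameters in this sketch are illustrative, not the experimental settings of the paper.

```python
import networkx as nx

# Scale-free (BA) network: new nodes preferentially attach to hubs.
ba = nx.barabasi_albert_graph(n=50, m=2, seed=1)

# Small-world (WS) network: ring lattice with randomly rewired edges.
ws = nx.connected_watts_strogatz_graph(n=50, k=4, p=0.1, seed=1)

# BA networks concentrate edges in a few highly connected hubs,
# while WS degrees stay close to the lattice degree k.
print(max(d for _, d in ba.degree()), max(d for _, d in ws.degree()))
```

The hub structure of BA networks is what later provides the alternative paths discussed in the results.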
The descriptive statistics of the throughput reduction due to maintenance, in the networks studied, are presented in Table 3. Moreover, Figures 5-6 show the throughput reduction for 10 BA and 10 WS random networks. Each point in the plots represents the reduction of throughput of one of the services provisioned within one random network. As expected, the Network-wide approach yields the minimum average impact on the throughput, with a mean of only 0.12 (reduction of the expected throughput) for BA networks and 0.22 for WS. The standard deviation shows the dispersion of the measured impact of a given maintenance approach among networks of similar characteristics. For the Network-wide approach the standard deviation is 0.084 for BA and 0.097 for WS networks, showing significant dispersion of the measured impact, although lower than the standard deviation observed in other approaches. This shows that the impact of this maintenance approach is slightly more consistent than that of the others, across the various sets of services and networks analysed. Although not fully shown in the plots, in a few cases the Network-wide approach causes a higher impact on specific services running in BA networks than other approaches. In particular, the corrective approach is the best one in these cases. This might be due to the greater availability of alternative paths for certain nodes in BA networks; hence the path chosen after a node fails leads to a lower throughput reduction than the anticipated path chosen in the Network-wide approach. More details are presented in (Pérez Hernández et al., 2022).
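The reported mean and standard deviation can be reproduced from per-service reduction samples; the small sketch below uses made-up values, not the paper's data.

```python
import statistics

# Hypothetical per-service throughput reductions in one set of
# networks (fractions of the expected throughput; not real data).
reductions = [0.05, 0.10, 0.12, 0.20, 0.13]

mean = statistics.mean(reductions)   # average impact of the approach
std = statistics.pstdev(reductions)  # dispersion across services
print(round(mean, 2), round(std, 3))
```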
For the BA networks, the greater throughput reductions are obtained by the Localised and Corrective approaches, with 0.24 and 0.18 less throughput than expected, respectively. For WS random networks, the performance of the Network-wide approach decreases, making the Localised approach an acceptable alternative, slightly better than the preventive approach. However, its standard deviation is the highest, at 0.12 for both types of networks. Note that, as per the statistics in Table 3, the performance of the Localised approach is stable across BA and WS networks. This might be explained as both types of networks were used when training the agent that yields plans according to this approach. Note also that the standard deviation is high in comparison to the ranges of the reductions obtained, which limits the power to generalise the behaviours observed. This needs to be investigated further and could be due to the throughput reduction being specific to the characteristics of the selection of services simulated and the network configuration used.
The cross-network analysis shows that the average throughput reduction, as a measure of the impact of the maintenance approach, is higher in random networks created with the WS model than in those created with the BA model. Extremely low reduction, or no reduction at all, in some services is due to the availability of backup paths that can be used to re-route traffic during maintenance operations. This is evidenced by the overlapping markers close to 0.0 for several services and across all approaches in Figure 5. The lack of backup paths seems to affect the performance of the Network-wide approach, while it does not show a substantial impact on the Localised approach. Overall, the Network-wide approach performs better, on average, than the alternatives, with larger differences in the BA networks. These results show that in BA networks the maintenance planner is better off using an individual preventive or corrective approach in case the Network-wide is not possible. In WS networks, the difference among approaches is only 0.09 of reduction; however, the Localised approach offers performance close to the Network-wide with lower overhead. Likewise, the distribution of the throughput reduction for the Localised approach in the WS networks shows a close-to-normal shape, which is useful to enable assumptions when looking at wider simulation scenarios.
The comparative performance of the approaches in the Backbone (Metro-core) network is presented in Figure 7 and the descriptive statistics also in Table 3. The trend is similar to that observed in the WS random networks. The Network-wide approach has the lowest impact on the services, with only 0.14 of throughput reduction; the next best performance is by the Localised approach, with 0.15. In this case the standard deviation is slightly higher for the individual approaches and lower for the Network-wide and Localised approaches.
The results of the maintenance costs per cycle indicate that the sensitivity among approaches was minimal for the parts costs. Likewise, the labour costs behaved similarly to the downtime costs. For the sake of clarity, only downtime and lost life costs for the three types of networks analysed are presented in Figure 8.
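The sensitivity analysis varies the cost multipliers while the cost per cycle remains the sum of its components; a minimal sketch with hypothetical base costs:

```python
# Hypothetical base costs per maintenance cycle (arbitrary units).
BASE = {"parts": 10.0, "labour": 20.0, "downtime": 15.0, "lost_life": 5.0}

def cost_per_cycle(downtime_mult, lost_life_mult):
    """Total cost per cycle under the given sensitivity multipliers."""
    return (BASE["parts"] + BASE["labour"]
            + downtime_mult * BASE["downtime"]
            + lost_life_mult * BASE["lost_life"])

# High downtime (20x), low lost life (1x): downtime dominates.
print(cost_per_cycle(20, 1))
# Low downtime (1x), medium lost life (5x).
print(cost_per_cycle(1, 5))
```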
The positive slope of the preventive approach confirms that this approach is highly sensitive to the lost life cost. It is the most expensive when lost life costs increase, as shown by the blue lines above the others at the right side of the subplots.
When downtime costs are high (20x) and lost life costs are medium (5x) or low (1x), the corrective approach shows the highest costs per cycle.

As a result of the cases studied, the Network-wide approach is the one that leads, on average, to the lowest impact on the quality of the services and is the most cost-effective. Cost-wise, there are minimal differences between the Network-wide and the Localised approach for the cost parameters analysed. This is explained as both approaches are designed to optimise, or learn the policy that optimises, the defined cost function. The spread of the data obtained suggests that specific analysis is required for different portfolios of services and network configurations. This analysis discourages the use of an asset's individual preventive approach for networked assets, as the costs and impact on quality are higher than those of the network-specific approaches. The individual corrective approach causes the highest impact on the quality, and its cost per cycle is only as low as that of the network-specific alternatives when the downtime costs are also low. However, corrective is the simplest approach to implement. Hence, when there is tolerance to quality reduction and downtime costs are low, the corrective approach seems an acceptable alternative.
The network-specific approaches are more complex to implement. In particular, as the Network-wide approach requires a comprehensive view of the network assets' state, the maintenance plan generation is more computationally demanding than in individual approaches. In this case the planning process is highly sensitive to the scale of the network, because the greater the number of assets to consider, the higher the computational resources needed to both store condition trajectories of every asset and calculate alternative paths along a large network. The Localised approach, although also computationally demanding in the training phase, is not as sensitive to the scale as the Network-wide, as the scale of the subnetwork of assets considered for planning is capped to the size of the egonet with a fixed depth. This reduces and breaks down the demand for computational resources, compared to computing the plan for the entire network, and thus offers an acceptable alternative for networks whose topology shows small-world properties (WS model).
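The fixed-depth egonet that caps each agent's planning scope amounts to a bounded breadth-first search around the asset; a sketch assuming `networkx`, with a toy graph and an illustrative depth:

```python
import networkx as nx

# Toy stand-in for the asset network: a simple path of 10 assets.
G = nx.path_graph(10)

# Egonet of asset 4 with fixed depth 2: only assets within 2 hops
# are considered when planning maintenance for this subnetwork.
sub = nx.ego_graph(G, n=4, radius=2)
print(sorted(sub.nodes()))
```

Whatever the size of the full network, the subnetwork handed to the agent is bounded by the chosen radius and the local degree, which is what keeps the approach scalable.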
The Localised approach still shows room for improvement, as only one Independent DQN algorithm was evaluated, while this is an active area of research. Likewise, alternative approaches for the approximation of the agent policy can be based on Graph Neural Networks (Cappart, Chételat, Khalil, Lodi, Morris and Veličković, 2021), which are naturally suited to represent the local network state. Although this approach seems promising for the network maintenance problem, the additional computational and environment design overhead must also be considered when using these approaches.
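As a rough sketch of the independent-learner idea behind I-DQN, simplified here to tabular Q-learning (the paper uses deep networks; the agent names, states and rewards below are hypothetical):

```python
import random
from collections import defaultdict

random.seed(0)
ACTIONS = ("wait", "maintain")
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1  # learning rate, discount, exploration

# One independent Q-table per subnetwork agent; each agent learns
# from its local observation only, ignoring the other agents.
agents = {name: defaultdict(float) for name in ("subnet_A", "subnet_B")}

def select(q, obs):
    """Epsilon-greedy action selection from the agent's own Q-table."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(obs, a)])

def update(q, obs, act, reward, next_obs):
    """Standard Q-learning update, applied independently per agent."""
    best_next = max(q[(next_obs, a)] for a in ACTIONS)
    q[(obs, act)] += ALPHA * (reward + GAMMA * best_next - q[(obs, act)])

# One illustrative step per agent: maintaining a degraded asset pays off.
for q in agents.values():
    act = select(q, "degraded")
    reward = 1.0 if act == "maintain" else -1.0
    update(q, "degraded", act, reward, "healthy")
```

In I-DQN proper, each Q-table is replaced by a deep network trained on replayed experience, but the per-agent independence is the same.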

Conclusion and Future Work
This paper explores the use of network properties to plan the maintenance of multi-asset systems, aiming to reduce the impact of maintenance operations on the quality of services and the overall costs. Two network-specific maintenance planning approaches are introduced: a Network-wide and a Localised approach. The former considers the network topology and the dynamic traffic flows of the multi-asset system to jointly plan maintenance operations and re-route traffic flows accordingly, while optimising impact reduction and costs. The latter approach identifies local subnetworks and uses the Independent Deep Q-Learning Networks (I-DQN) algorithm to learn a policy that generates the maintenance plan for each local subnetwork. The purpose of this latter approach is to reduce the overhead of considering the full network topology, the flows and every asset's condition when planning maintenance operations, by providing an alternative working on smaller subnetworks with a fixed size.
The performance of the proposed approaches is evaluated against individual corrective and preventive approaches over twenty random networks and an example of the UK's nationwide digital infrastructure backbone network. For the evaluation of the Localised approach, an approach for the integration of Multiagent Reinforcement Learning (MARL) and a multi-asset agent-based model is also introduced. The Network-wide approach yields, on average, the lowest reduction in service throughput across all approaches and networks analysed. In networks with small-world properties, particularly the random networks generated from the Watts-Strogatz model and the backbone core, the Localised approach shows a performance close to the Network-wide with less overhead. The cost analysis across all networks, covering various combinations of parameters, shows minimal differences between the network-specific approaches, which are less sensitive to network and parameter changes, in contrast to the individual approaches.
The current work sets the basis for the design of network-specific maintenance approaches using agent-based modelling, mathematical optimisation and multiagent reinforcement learning. Further work is required to evaluate the approaches in a wider mix of network topologies and dynamics, as the standard deviation of the results obtained in this study is high. Moreover, the Localised approach shows promising results, and alternative MARL algorithms should be evaluated. More complex scenarios, where assets have heterogeneous capacity and traffic can be distributed among more than one asset, deserve further exploration, as these resemble more closely existing nationwide infrastructure networks.

CRediT authorship contribution statement
Marco Pérez Hernández: Concept, Methods and Writing. Alena Puchkova: Concept, Methods and Writing.

Figure 1
Figure 1 illustrates the local network observation process. Starting with the identification of the local subnetwork (left) to the exploration of different scenarios where the neighbouring nodes are undergoing maintenance (right), every agent observes the individual maintenance tracker of the local network of assets. This allows the agent to determine which assets are undergoing maintenance (red nodes in the figure) and use this information to drive the policy learning.

Figure 2 :
Figure 2: Integration of Models: Multiagent Reinforcement Learning (MARL) of Network Maintenance Plans and Networked Multi-Asset Agent-Based (ABM) Simulation. MARL agents are trained offline to learn a policy that is later used to generate maintenance plans for specific test networks. These plans are pushed to the ABM simulator, enabling examination of the networked multi-asset system dynamics for the given plan.

Figure 3 :
Figure 3: Nationwide backbone network. Nodes are network elements distributed geographically across the country. The exact geographic location of the nodes has been randomised.

Figure 4 :
Figure 4: Impact of Maintenance Activities on Service Quality. Extreme example of throughput (packets/time) reduction against the expected throughput (grey line) in a simulated service. Due to the lack of alternative paths when active network elements are under maintenance, the throughput drops to 0. Differences in the timing of maintenance according to each approach are shown.

Figure 5 :
Figure 5: Reduction Of Service Throughput Owing To Maintenance Operations In 10 Barabási-Albert Networks.

Figure 6 :
Figure 6: Reduction Of Service Throughput Owing To Maintenance Operations In 10 Watts-Strogatz Networks.

Figure 7 :
Figure 7: Reduction Of Service Throughput Owing To Maintenance Operations In The Backbone Network.

Figure 8 :
Figure 8: Sensitivity To Downtime And Lost Life Cost Parameters Of The Maintenance Costs Per Cycle For The Network Models Analysed.

Starting from an observation of the environment state, agents estimate the Q-value of the actions and select one, keeping a record of their experiences every time. Then deep neural networks are used to approximate the Q-function by minimising the loss L(θ).