Distributed Configuration of Sensor Network for Fault Detection in Spatio-Temporal Systems

The problem of fault detection in spatio-temporal systems is formulated as that of maximizing the power of a parametric hypothesis test verifying the nominal state of the process under consideration. Then, adopting a pairwise communication scheme, a computational procedure is developed for the spatial configuration of observation locations for sensor networks which monitor changes in the underlying parameters of a distributed parameter system. As a result, the problem of planning the percentage of experimental effort spent at given sensor locations can be solved in a fully decentralized fashion. The approach is verified on a numerical example involving sensor selection for a convective diffusion process.


Introduction
Recently, one can observe an extremely fast development of methods of Fault Detection and Isolation (FDI) for dynamical systems. Despite the vast number of existing approaches (for surveys the reader is referred to [3,7,9,24,39]), most contributions focus on methods for lumped parameter systems, with just a few approaches dedicated to spatio-temporal systems, also known as distributed-parameter systems (DPSs). Since in DPSs it is usually impossible to observe the system state over the entire spatial domain, the problems of sensor location and observation scheduling become of vital importance for proper system monitoring. Yet within the framework of FDI systems for DPSs, the optimization of the measurement process, which increases the reliability of the diagnosis, is usually neglected, and there is a substantial lack of techniques dedicated to this area.
Technological advances in communication systems and the development of inexpensive mobile measurement systems have made it possible to deploy groups of networked devices in a variety of environments, cf. [13,4]. A cooperating and scalable network of sensors has the potential to substantially enhance the performance of observation systems. Although the number of sensor placement techniques able to manage the configuration of sensor networks of practical scale is very limited, some effective approaches have been proposed to cover a number of different experimental settings, including stationary [12,34,22,26,17,10,21,11], scanning [22,33,15] or moving observations [29,27,34,19,36]. Their adaptation to the sensor-network setting was reported in [32,31].
However, most techniques communicated by various authors rely on centralized schemes, which assume the existence of some superior entity that maintains the whole network and is responsible for optimizing the observation strategy. The distributed nature of the design problem is taken into account only occasionally. Recent advancements in sensor networks necessitate effective, distributed and fault-tolerant algorithms for computation and information exchange.
In the context of diagnostic problems, another difficulty is establishing a proper relation between the performance of the diagnostic system and the observation strategy. For this purpose, classical hypothesis testing is used here for model-based diagnosis in DPSs. This allows us to construct a qualitative criterion of sensor allocation using the notion of D-optimum designs originating in optimum experiment design theory. Then, proper fault detection and localization schemes are developed based on the results reported by Patan [22] for stationary sensors and further extended to scanning and mobile sensor networks [16,22,37,23,14,30].
In this paper we focus on certain computational aspects of sensor placement problems for fault detection in DPSs. Given a finite set of stationary sensors, the observation horizon is divided into consecutive stages. Then, non-negative weights are assigned to each sensor at each stage so as to maximize the determinant of the Fisher Information Matrix (FIM) related to the parameters to be identified. The weight assigned to a measurement point can be interpreted as the proportion of observations performed at this point, or the percentage of experimental effort spent at this point. The resulting solutions are of considerable interest when assessing which sensors are more informative than others, and they allow for a complexity reduction of the measurement system. The solution proposed here is close in spirit to classical optimum experimental design theory for lumped systems [1,38]. Its novelty lies in a generalization of a known numerical algorithm for computation of a D-optimum design on a finite set, together with its efficient distributed implementation dedicated to sensor networks.
As a result, an extremely simple computational scheme is developed to determine the best sensor configurations. Its idea is based on the class of so-called gossip algorithms, in which each node communicates with no more than one neighbor at each time instant [40,5,6]. The different nodes of the network independently calculate and store the desired quantities, and their final estimates are obtained in a fully decentralized fashion. The advantage of such an information exchange is clear: it is robust with respect to individual sensor faults, and the global estimate of the information matrix is stored at all sensor nodes and thus can be easily recovered.

Model-based fault detection in DPS
Parameter estimation is one of the fundamental approaches in the framework of analytical techniques of fault detection [9]. It is especially useful in situations when a faulty system state not only changes the system response but also perturbs the values of model parameters. In practice, this is quite common provided the parameters have a physical interpretation built upon proper analysis and modeling of the monitored process variables.
Let y = y(x, t; θ) denote the scalar state of a given DPS at a spatial point x ∈ Ω ⊂ R^d and time instant t ∈ T = (0, t_f]. Here θ represents an unknown constant m-dimensional parameter vector which must be estimated using observations of the system.
The state y is observed by N scanning sensors, which possibly change their positions at time instants 0 < t_0 < t_1 < · · · < t_K = t_f and remain stationary for the duration of each subinterval T_k = [t_{k−1}, t_k], k = 1, . . . , K. The observations can be formally represented as [20,15]

z^j(t) = y(x^j_k, t; θ) + ε(x^j_k, t),  t ∈ T_k,

for j = 1, . . . , N and k = 1, . . . , K, where x^j_k is an element selected from the a priori defined set of sensor locations X = {x^1, . . . , x^N}, and ε(x^j_k, t) is measurement noise assumed to be zero-mean, Gaussian, spatially uncorrelated and white [14,20], i.e. E[ε(x^j_k, t)] = 0 and var(ε(x^j_k, t)) = σ^2, where σ > 0 denotes the standard deviation of the measurement noise. A further assumption is that the estimation of the unknown parameter vector θ is performed via minimization of the least-squares criterion

J(θ) = Σ_{k=1}^{K} Σ_{j=1}^{N} ∫_{T_k} [z^j(t) − ŷ(x^j_k, t; θ)]^2 dt,

where θ ∈ Θ_ad, Θ_ad is the set of admissible parameters, and ŷ(·, ·; θ) denotes the system model response corresponding to θ. The estimate of the true value of θ is the vector θ̂ which minimizes J(θ). An elementary idea of fault diagnosis is to compare the resulting parameter estimates with the corresponding known nominal values, treating possible differences as residuals which contain information about potential faults. Then, using thresholding techniques, an appropriate decision-making system can be constructed to detect abnormal situations in system functioning, see [22,18,37,23]. Based on the observations, it is possible to test the simple parametric statistical hypothesis H_0 : θ = θ^0, where θ^0 is the nominal value of the vector θ corresponding to normal system performance.
Introducing the generalization of the likelihood function L(z; θ) for the considered experiment [8] and setting Θ_0 = {θ ∈ Θ_ad : θ = θ^0}, the following generalized log-likelihood ratio can be defined:

λ(z) = 2 log [ sup_{θ ∈ Θ_ad} L(z; θ) / sup_{θ ∈ Θ_0} L(z; θ) ].

The generalized log-likelihood ratio test is widely used in statistics because it can be shown that, assuming the validity of the null hypothesis H_0, the sequence λ(z) for N → ∞ is weakly convergent to a χ^2 random variable with m degrees of freedom [8, Thm. 3.6.1, p. 55]. The meaning of this fact is that, for a given significance level γ representing a fixed range of model uncertainty, we can compare the observed value of λ(z) with a threshold k_γ obtained from the cumulative chi-squared distribution with m degrees of freedom, k_γ being the (1 − γ)-quantile of this distribution. Then, the decision rule for fault detection takes the following form: reject H_0 if λ(z) > k_γ, otherwise accept H_0. A rejection of H_0 indicates a significant deviation of the parameter vector from its nominal value and forms a basis for proper detection of abnormal states in the system. When the null hypothesis H_0 is true, we do not want to reject it. Rejecting the null hypothesis H_0 when it is true is called a Type I error. Similarly, accepting the null hypothesis when the alternative hypothesis H_1 : θ ≠ θ^0 is true is called a Type II error [8]. In fault diagnosis, there is a direct connection between these errors and the probabilities of a false alarm and of a missed detection, respectively.
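As a toy illustration of the decision rule above, consider the following Python sketch. All helper names and numeric least-squares values are hypothetical; the chi-squared critical values are the standard 95% quantiles. Under Gaussian noise with known σ, the generalized log-likelihood ratio reduces to the scaled difference of the least-squares criteria evaluated at the nominal and estimated parameters.

```python
# Sketch of the fault-detection decision rule (illustrative, not from the paper).
# Under Gaussian noise, lambda(z) = (J(theta_nominal) - J(theta_hat)) / sigma**2,
# which under H0 is asymptotically chi-squared with m degrees of freedom.

# 95% critical values k_gamma of the chi-squared distribution (gamma = 0.05).
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def glr_statistic(J_nominal, J_hat, sigma):
    """Generalized log-likelihood ratio from the two least-squares criteria."""
    return (J_nominal - J_hat) / sigma**2

def detect_fault(lam, m, crit=CHI2_CRIT_95):
    """Reject H0 (signal a fault) when the statistic exceeds the threshold."""
    return lam > crit[m]

# Example with m = 3 parameters: statistics above and below the 7.815 threshold.
lam_faulty = glr_statistic(J_nominal=25.0, J_hat=5.0, sigma=1.0)   # lambda = 20.0
lam_nominal = glr_statistic(J_nominal=6.0, J_hat=4.0, sigma=1.0)   # lambda = 2.0
print(detect_fault(lam_faulty, 3), detect_fault(lam_nominal, 3))
```

Raising the significance level γ lowers the threshold k_γ, trading a higher false-alarm rate for a lower missed-detection rate, which is exactly the trade-off discussed in the next section.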

Optimal measurement problem
In order to achieve a reliable diagnosis of the DPS state, we have to keep the probabilities of false alarms and missed detections low. In practice, in order to avoid multi-criteria optimization, a typical approach is to minimize the Type II error with a constraint imposed on the maximum level of the Type I error.
It can be shown that the power of the presented hypothesis test (i.e. the probability of not committing a Type II error) can be increased by taking a large number of measurements N or, alternatively, by maximizing the D-optimality criterion log det M (see [18] for details), defined on the so-called average Fisher Information Matrix given by [15]

M = (1/N) Σ_{k=1}^{K} Σ_{j=1}^{N} ∫_{T_k} g(x^j_k, t) g^T(x^j_k, t) dt,

where g(x, t) = ∂ŷ(x, t; θ)/∂θ |_{θ=θ_0} stands for the so-called sensitivity vector, θ_0 being a prior estimate of the unknown parameter vector θ. D-optimality leads to minimization of the volume of the uncertainty ellipsoid for the parameter estimates. Since in real-world problems the number of available observations is always limited, in practice such a strategy is an important way to obtain a high-powered test, which leads directly to a low probability of missed detection. The introduction of an optimality criterion renders it possible to formulate the sensor location problem as an optimization problem with respect to x^j_k, j = 1, . . . , N, k = 1, . . . , K, belonging to the admissible set X. Owing to the assumption of independent measurement noise, we admit replicated measurements, i.e. some values x^j_k may appear several times in the optimal solution (we can use multiple sensors located at the same position or we can repeat the experiment). Consequently, we introduce r^j_k as the number of replicated measurements corresponding to the sensor locations x^1, . . . , x^N in the consecutive time subinterval T_k. The collection of variables

ξ_k = { x^1, . . . , x^N ; p_k1, . . . , p_kN },

where p_kj = r^j_k / N and N = Σ_{j=1}^{N} r^j_k, is called the exact design of the experiment for subinterval T_k. The proportion p_kj of observations performed at x^j_k can be considered as the percentage of experimental effort spent at that point.
Introducing ξ = (ξ_1, ξ_2, . . . , ξ_K) as the total experimental design, we can rewrite the FIM in the form

M(ξ) = Σ_{k=1}^{K} Σ_{j=1}^{N} p_kj ∫_{T_k} g(x^j_k, t) g^T(x^j_k, t) dt.

Here the p_kj's are rational numbers, since both the r^j_k's and N are integers. This discrete nature of N-observation exact designs causes serious difficulties, and the problem cannot be solved easily by standard optimization techniques, particularly when N is large. A reasonable approach is to relax the definition of the design: when N is large, the weights p_kj can be approximated by real numbers in the interval [0, 1], not necessarily integer multiples of 1/N. Obviously, we must have Σ_{j=1}^{N} p_kj = 1 for k = 1, . . . , K, so we may think of the designs as probability distributions on X. This leads to the so-called continuous designs, which constitute the basis of the modern theory of optimal experiments [2,28]. It turns out that such an approach drastically simplifies the design. Thus, we shall operate on designs which concentrate approximately N p_kj measurements at each location x^j_k. Then we may redefine an optimal design as a solution to the optimization problem

ξ* = arg max_{ξ ∈ Ξ(X)} log det M(ξ),

where Ξ(X) denotes the set of all probability distributions on X = {x^1, . . . , x^N}.
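To make the relaxed formulation concrete, the following sketch (all names and data are illustrative, not from the paper) assembles the FIM of a continuous design as a weighted sum of per-location information matrices, with the time integral over T_k replaced by a simple rectangle-rule sum over samples of the sensitivity vector.

```python
import numpy as np

# Illustrative assembly of M(xi) = sum_j p_j * integral_{T_k} g g^T dt for one
# subinterval, with the integral approximated by a discrete sum over time samples.

def atomic_fim(g_samples, dt):
    """Integral of g g^T over one subinterval; g_samples is an (n_t, m) array."""
    return dt * sum(np.outer(g, g) for g in g_samples)

def design_fim(atoms, weights):
    """Weighted FIM of a design: atoms[j] is the information matrix of site j."""
    return sum(p * A for p, A in zip(weights, atoms))

rng = np.random.default_rng(0)
m, n_loc, n_t, dt = 3, 5, 10, 0.1          # parameters, sites, time samples, step
atoms = [atomic_fim(rng.normal(size=(n_t, m)), dt) for _ in range(n_loc)]
weights = np.full(n_loc, 1.0 / n_loc)      # the uniform design on X
M = design_fim(atoms, weights)
print(np.linalg.det(M) > 0)                # the D-criterion is well defined
```

Since each atom is positive semidefinite, any design with enough support points yields a nonsingular M, so log det M(ξ) can be maximized over the weight simplex.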

Characterization of the optimal solutions
In the remainder of this paper we shall assume that g ∈ C(X × T ; R^m). A number of characterizations of the optimal design ξ* can be derived in a rather straightforward manner from the general results given in [35,16]. The next result constitutes a necessary and sufficient condition for the optimality of designs. It is usually called an equivalence theorem [35,16].
The sensitivity function φ_k(x, ξ) is of vital importance here, providing us with a simple test for the optimality of designs. In particular, if it takes values less than or equal to ς_k(ξ) for all x ∈ X, then ξ is optimal; otherwise ξ is not optimal.

Distributed algorithm
Analytical determination of optimal designs is possible only in simple situations, and for general systems it is usually the case that some iterative design procedure is required. In the case considered in this paper, i.e. the design for fixed sensor locations, a computational algorithm can be derived based on the mapping T : Ξ(X) → Ξ(X) defined by

(T ξ)_kj = p_kj φ_k(x^j_k, ξ) / ς_k(ξ),  j = 1, . . . , N,  k = 1, . . . , K.

From Theorem 1 it follows that a design ξ* is D-optimal if it is a fixed point of the mapping T, i.e. T ξ* = ξ*.
Also, we have the valuable property that the sum of the weights p_kj is invariant with respect to T for any k = 1, . . . , K (i.e. it is always equal to 1). As for the interpretation of this mapping, note that if ξ is not an optimal design, T increases the weights of those sensor locations where we observe high values of the sensitivity function, at the cost of decreasing the weights of non-informative locations with small values of φ. Using this simple idea, it is possible to develop a decentralized configuration algorithm for a sensor network, a distributed generalization of the algorithm proposed in [25, p. 139] for the classical optimum experimental design problem, which consists in iterative computation of a D-optimum design on a finite set (and was further extended in [35,16]).
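Before distributing the computation, it is instructive to see the mapping T at work in a centralized form. The sketch below uses illustrative data for a single subinterval, with the time integral collapsed to one sensitivity vector per site, and assumes the standard D-optimality sensitivity function φ(x_j, ξ) = g_j^T M(ξ)^{-1} g_j, whose design-weighted average (and hence the normalizer ς) equals m; under that assumption the multiplicative update automatically preserves Σ_j p_j = 1.

```python
import numpy as np

# Centralized multiplicative update p_j <- p_j * phi_j / m (a sketch of the
# mapping T for one subinterval, under the D-optimality sensitivity function).

rng = np.random.default_rng(1)
m, n_loc = 3, 20
G = rng.normal(size=(n_loc, m))      # one (time-collapsed) sensitivity vector per site
p = np.full(n_loc, 1.0 / n_loc)      # start from the uniform design

def fim(p):
    """M(xi) = sum_j p_j g_j g_j^T."""
    return (G * p[:, None]).T @ G

dets = [np.linalg.det(fim(p))]
for _ in range(200):
    Minv = np.linalg.inv(fim(p))
    phi = np.einsum('jm,mn,jn->j', G, Minv, G)   # phi_j = g_j^T M^-1 g_j
    p = p * phi / m                               # the mapping T (multiplicative update)
    dets.append(np.linalg.det(fim(p)))

# The weights stay a probability distribution and the D-criterion never decreases;
# by the equivalence theorem, max_j phi_j approaches m at the optimum.
print(abs(p.sum() - 1.0) < 1e-9, dets[-1] >= dets[0])
```

In line with the interpretation above, the weights of sites with φ_j > m grow while those with φ_j < m shrink, so the experimental effort concentrates on the informative locations.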
In the following we assume an asynchronous time model for the configuration process. Let r = 0, 1, 2, . . . be the discrete time index, which partitions the continuous configuration time axis into time slots Z_r = [z_{r−1}, z_r).
Owing to Theorem 1, we can use the mapping T to iteratively improve the distribution of experimental effort. Unfortunately, φ_k(·, ξ) and ς_k(ξ) computed in a centralized fashion are not very useful in our setting. However, the only component which cannot be calculated independently of the other network nodes is the global information matrix M(ξ). Moreover, the information matrix is clearly a weighted average of the subinterval information matrices. Thus, our task is closely related to the problem of distributed averaging on a network, which appears in many applications and has been the subject of extensive studies [40,5,6]. Distributed averaging can be achieved in many ways. One straightforward technique is pairwise communication flooding, also known as a gossip scheme, which in its classic version assumes that at the r-th reconfiguration time slot the i-th sensor contacts some neighboring node j with probability Q_ij, i.e. a pair (i, j) is randomly and independently selected. At this time, both nodes set their values equal to the average of their current values. Denoting by M^j_k(ξ^(r)) the estimate of the global FIM for subinterval T_k maintained by the j-th sensor at configuration time slot Z_r, we have

M^i_k(ξ^(r)), M^j_k(ξ^(r)) ← ½ (M^i_k(ξ^(r)) + M^j_k(ξ^(r)))

(← is an update operator).
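The averaging core of the scheme is easy to prototype. The standalone sketch below (scalar values instead of matrices, node and variable names invented for illustration) runs randomized pairwise gossip on a complete graph: at each slot a random pair replaces both of its values by their average, the total sum stays invariant, and every node converges to the global mean.

```python
import random

# Randomized pairwise gossip averaging: each slot, one random pair of nodes
# replaces both of their values by the pair's average.  The sum is invariant,
# so all values converge to the global mean of the initial data.

random.seed(42)
values = [float(v) for v in range(10)]       # initial local values, mean 4.5
for _ in range(2000):
    i, j = random.sample(range(len(values)), 2)
    avg = 0.5 * (values[i] + values[j])
    values[i] = values[j] = avg

print(all(abs(v - 4.5) < 1e-6 for v in values))
```

The spread of the node values contracts at an exponential rate in expectation, which is the property the FIM exchange of the next paragraph builds on.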
However, in our setting the averaging problem is not typical, as the contribution of the nodes whose weights tend to zero as r increases should be eliminated from the averaging. This can be achieved by increasing the contribution of the local information matrix, via the concept of so-called running consensus [6], in which the gossip average is combined with the local information matrix Υ_k(x^i_k) = ∫_{T_k} g(x^i_k, t) g^T(x^i_k, t) dt according to

M^i_k(ξ^(r)) ← ((r − 1)/r) M^i_k(ξ^(r)) + (1/r) Υ_k(x^i_k).

The first term enforces consensus among the nodes (it represents the average information from the rest of the network), while the second accounts for the increase in the total contribution of the local node. Note that there is no need to record the whole design structure of ξ^(r) at each node, as the information about the current solution is propagated via the local estimates of the global FIM, i.e. M^i_k(ξ^(r)). The whole communication process is given by Algorithm 1.

Algorithm 1 Distributed optimization of experimental effort. The indices i and j denote, respectively, data from the local repository and data obtained from a neighbor; r is the index of the current configuration slot Z_r. The function EXCHANGE is responsible for both sending and receiving data to/from the contacted neighbor (the order depending on who initiated the communication).

 1: procedure EXCHANGE_PROTOCOL(r)
 2:   for k ← 1, . . . , K do                          ▷ for all time subintervals T_k
 3:     EXCHANGE(M^j_k(ξ^(r)), M^i_k(ξ^(r)))           ▷ send and receive FIM estimates
 4:     EXCHANGE(p_kj, p_ki)                           ▷ and FIM weights
 5:     p ← p_ki + p_kj                                ▷ store old weights for normalization
 6:     M^i_k(ξ^(r)) ← (p_ki M^i_k(ξ^(r)) + p_kj M^j_k(ξ^(r)))/p
 7:     calculate φ_k(x^ℓ_k, ξ^(r)) and ς_k(ξ^(r)), ℓ ∈ {i, j}
 8:     p_ki ← p_ki φ_k(x^i_k, ξ^(r))/ς_k(ξ^(r))       ▷ multiplicative weight update
 9:     p_ki ← p · p_ki/(p_ki + p_kj)                  ▷ normalize the p_ki's to sum up to 1
10:     M^i_k(ξ^(r)) ← ((r − 1)/r) M^i_k(ξ^(r)) + (1/r) Υ_k(x^i_k)   ▷ running consensus
11:   end for
12:   r ← r + 1
13: end procedure

For every subinterval T_k the data are processed separately, so as to reach the most informative configuration for the corresponding part of the experiment.
At r = 0 each network node starts with a global FIM estimate M^i_k(ξ^(r)) initialized with its local information matrix Υ_k(x^i_k) and with non-zero weights (this is important since, once a weight attains zero, the multiplicative update forces it to remain zero). Then, at each asynchronous configuration time slot Z_r, an appropriate pair of nodes exchanges information according to Algorithm 1. It can be shown that the algorithm converges to the optimal solution, which is a consequence of the independent weight update at each network node and the stochastic convergence of gossip algorithms [40].
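A compact single-subinterval prototype of Algorithm 1 in Python is sketched below. It rests on several assumptions not fixed by the text: φ is taken as the D-optimality sensitivity g^T M^{-1} g, the global normalizer ς is subsumed by the pairwise renormalization of step 9, and a small ridge term is added before inversion because the early, rank-deficient FIM estimates would otherwise be singular. All variable names are illustrative.

```python
import random
import numpy as np

rng = np.random.default_rng(2)
random.seed(2)
m, n = 3, 12
G = rng.normal(size=(n, m))                    # one sensitivity vector per node
Upsilon = [np.outer(g, g) for g in G]          # local information matrices
p = np.full(n, 1.0 / n)                        # non-zero initial weights
M_est = [U.copy() for U in Upsilon]            # local estimates of the global FIM

def phi(i, M):
    """phi_i = g_i^T M^-1 g_i (ridge added: early estimates are singular)."""
    return G[i] @ np.linalg.inv(M + 1e-6 * np.eye(m)) @ G[i]

for r in range(1, 5000):
    i, j = random.sample(range(n), 2)          # a random pair gossips
    mass = p[i] + p[j]
    if mass == 0.0:
        continue
    # steps 5-6: merge the two FIM estimates, weighted by the design weights
    M = (p[i] * M_est[i] + p[j] * M_est[j]) / mass
    # steps 7-9: multiplicative update and pairwise renormalization
    wi, wj = p[i] * phi(i, M), p[j] * phi(j, M)
    p[i], p[j] = mass * wi / (wi + wj), mass * wj / (wi + wj)
    # step 10: running-consensus correction with the local matrices
    M_est[i] = (r - 1) / r * M + 1 / r * Upsilon[i]
    M_est[j] = (r - 1) / r * M + 1 / r * Upsilon[j]

print(abs(p.sum() - 1.0) < 1e-9, bool((p >= 0).all()))
```

Note that the pairwise renormalization keeps the total weight mass at exactly 1 regardless of which pairs communicate, mirroring the invariance property of the mapping T.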
Obviously, the sensor network topology and the choice of a proper communication scheme significantly influence the convergence rate. In general, under some assumptions on the network connectivity graph, a suitable gossip algorithm with an exponential convergence rate can be provided via semidefinite programming. Since a discussion of these very important issues is far beyond the scope of this work, we refer the reader to the seminal papers [40,5] for details.

Simulation example
As an illustration of the presented approach, consider the problem of sensor configuration for parameter estimation in the process of air pollutant transport over a given urban area. Within this domain, which has been normalized to the unit square Ω = (0, 1)^2, an active source of pollution is present, which yields the pollutant spatial concentration y = y(x, t). The evolution of y over the normalized observation interval T = (0, 1] is described by the following advection-diffusion equation:

∂y(x, t)/∂t + ∇ · (v(x, t) y(x, t)) = ∇ · (a(x) ∇y(x, t)) + f(x),  (x, t) ∈ Ω × T,

subject to the boundary and initial conditions

∂y/∂n = 0 on Γ × T,  y(x, 0) = y_0 in Ω,

where the term f(x) = 50 exp(−50‖x − c‖^2) represents the pollutant source located at the point c = (0.3, 0.3), and ∂y/∂n stands for the partial derivative of y with respect to the outward normal to the boundary Γ. The mean spatio-temporal changes of the wind velocity field over the area were approximated by v(x, t) = (2(x_1 + x_2 − t), x_2 − x_1 + t). The assumed functional form of the spatially varying diffusion coefficient is a(x) = θ_1 + θ_2 x_1 x_2 + θ_3 x_1^2 + θ_4 x_2^2. The subject of interest here is the detection of a significant increase in the intensity of the pollutant emission from the source. As a symptom of this abrupt fault we can observe an excessive deviation of the parameters θ from their nominal values. Therefore, these parameters need to be estimated based on measurement data from monitoring stations.
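In an implementation, the sensitivity coefficients g = ∂ŷ/∂θ are typically obtained by solving sensitivity equations alongside the PDE; a simpler fallback, sketched below, is a central finite-difference approximation around the prior estimate θ_0. The "model" here is just the diffusion coefficient a(x) from the example, truncated to the three parameters whose nominal values are reported, and evaluated at a fixed point; the code is illustrative rather than a PDE solver.

```python
import numpy as np

# Central finite-difference approximation of the sensitivity vector
# g = d(model)/d(theta), evaluated at a prior estimate theta_0.

def sensitivities(model, theta, h=1e-6):
    """Gradient of a scalar model w.r.t. theta by central differences."""
    theta = np.asarray(theta, dtype=float)
    g = np.empty_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (model(theta + e) - model(theta - e)) / (2.0 * h)
    return g

# For a model linear in theta, the approximation is exact up to rounding:
# here a(x) = theta_1 + theta_2*x1*x2 + theta_3*x1**2 at a fixed point x.
x1, x2 = 0.3, 0.7
model = lambda th: th[0] + th[1] * x1 * x2 + th[2] * x1**2
g = sensitivities(model, [0.02, 0.01, 0.005])
print(np.allclose(g, [1.0, x1 * x2, x1**2], atol=1e-6))
```

For the actual state y(x, t; θ), each perturbed evaluation would require one forward PDE solve, which is why the sensitivities are precomputed and stored in the node repositories, as described below.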
A Matlab program was written and run on a PC equipped with an Intel Core i5 processor (1.9 GHz, 12 GB RAM) running Windows 10. The nominal values of the system parameters were assumed to be θ^0_1 = 0.02, θ^0_2 = 0.01, θ^0_3 = 0.005. Calculations were performed with a finite element method for a spatial mesh composed of 682 triangles and 378 nodes, and an evenly partitioned time interval (3 subintervals T_k = ((k − 1)/3, k/3], k = 1, 2, 3). The sensitivity coefficients were then linearly interpolated and stored in the sensor node repositories. In our scenario, the observation grid was assumed to be created at locations selected from among those elements of the above-mentioned 378-node triangulation mesh which do not lie on the outer boundary (there were 312 such nodes, indicated with dots in Fig. 2). Given N = 312 prospective sites in Ω ∪ Γ, we aim at finding the locations at which the measurements made by sensors lead to D-optimum least-squares estimates of the parameters θ. In the simulation it was assumed that the network is fully connected, with a uniform probability distribution for the connection between each pair of nodes. Sensor configurations at different stages of the algorithm are shown in Fig. 2. The sensors tend to form a pattern reflecting the areas of greatest changes in the pollutant concentration, but the observations are averaged over time and it is not trivial to follow the dynamics of the observation strategy. It can be observed that during each time subinterval the system dynamics enforce the distribution of experimental effort. In the first stage (T_1), the pollutant concentration takes significant values only in the very close vicinity of the source, thus all measurement effort is spent at this location (cf. Figs. 2(a)-(c)).
As time elapses, the pollutant spreads over the whole spatial domain, and the sensors therefore tend to measure the system state in other regions, where we observe the greatest changes in concentration and can gather rich information about the system dynamics (Figs. 2(d)-(f)). Note that the algorithm approximates the pattern of the final design very quickly, i.e. after a relatively small number of configuration slots r. Then the algorithm slows down, and achieving a very high accuracy of the solution becomes expensive in terms of the number of pairwise communications. An application of a more efficient communication scheme at the later stages of the configuration process would be helpful in overcoming this effect.

Concluding remarks
The problem of selecting optimal observation points in view of reliable detection of abnormal states of DPSs has been addressed. The main contribution of this work is a proper characterization of the problem relaxation and its adaptation to a distributed scheme of computation and information exchange. The advantage of introducing continuous designs lies in the fact that the problem dimensionality is dramatically reduced. A decided majority of existing approaches to the sensor location problem for DPSs are designed only for centralized monitoring systems, which in the context of modern sensor networks often turns out to be impractical. On the other hand, numerous effective routines for distributed information fusion designed for sensor, peer-to-peer or wireless ad hoc networks are not directly applicable to parameter estimation in DPSs. The decided advantages of the proposed distributed calculation scheme are its great simplicity and its ability to produce solutions reasonably fast.