Achieving Fault-Tolerant Network Topology in Wireless Mesh Networks

A Wireless Mesh Network (WMN) is an ad-hoc network with a fixed network infrastructure (see the example in figure 1). The physical structure of a WMN includes base stations, a backbone and mobile stations. The base stations (also known as mesh routers or mesh points) are static wireless nodes that form the network infrastructure and provide wireless network access to the mobile stations. The backbone is a wireless ad-hoc network among the base stations. The fixed network infrastructure provides wireless network access to the mobile stations in a service area. The service area is a finite three-dimensional space. The mobile stations are wireless nodes which move within the service area and communicate with other stations via the WMN. The stations in a WMN use a multi-hop routing protocol for communication. This protocol automatically discovers the network topology and delivers the messages to the destination, if needed over multiple hops. We can think of a WMN as an infrastructure wireless network in which the backbone is replaced by a wireless one and the communication is done in a (multi-hop) ad-hoc way.
We consider a wireless mesh network which supports a business process and is under the administration of an organization. This is not a MANET (Mobile Ad-hoc Network) consisting of self-dependent mobile nodes, as is often assumed in the literature. The organization has control over the network infrastructure and aims at providing radio coverage and connectivity in a clearly defined service area. The management appliance is a central instance for basic configuration and diagnosis of the WMN, including topology monitoring, protocol settings, traffic management, etc.
Radio coverage and connectivity are basic services of a wireless mesh network which are required for communication. Radio coverage ensures that the mobile stations can access the network infrastructure (backbone) while they are located or moving in the service area. Connectivity ensures that the topology of the backbone is connected.

Radio coverage
The radio coverage service is correct if the service area is covered by the base stations. The service area is covered if the union of the radio cells of all base stations contains the whole service area. The radio cell of a base station is the part of the space around it in which a mobile station observes the base station with a radio signal strength sufficient for communication. Sufficient radio signal strength in the service area is a basic requirement for the mobile stations to be able to access the WMN. The radio coverage service ensures this sufficient signal strength in the service area. A service location is a point of the service area, specified by its coordinates. A service location is covered if the union of the radio cells of all base stations contains the service location.

Connectivity
The connectivity service is correct if the backbone graph is connected. The backbone graph is a graph with the base stations as vertices and the routing layer links among them as edges. A link exists if two wireless devices can communicate through the wireless medium while meeting certain quality parameters (see section 4.3 for more information). The backbone graph represents the network topology at the routing layer. This graph is connected if a path (a sequence of edges) exists between every two vertices. A connected backbone graph means a connected routing layer topology, which is a basic requirement for communication through the WMN. The connectivity service ensures that the backbone graph is connected.
In the example WMN in figure 1, the radio coverage and the connectivity are correct. The union of the radio cells contains the service area and the backbone graph is connected.
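The two correctness definitions above can be illustrated with a minimal Python sketch. It assumes that each radio cell is given as a set of discrete service locations and that the backbone links are given as pairs of base stations; the station names, locations and helper functions are hypothetical and only serve the illustration.

```python
from collections import deque

def is_covered(service_locations, radio_cells):
    """True if the union of all radio cells contains every service location."""
    union = set().union(*radio_cells.values())
    return set(service_locations) <= union

def is_connected(base_stations, links):
    """True if a path exists between every pair of base stations (breadth-first search)."""
    stations = list(base_stations)
    if not stations:
        return True
    neighbours = {s: set() for s in stations}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {stations[0]}
    queue = deque([stations[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(stations)

# Hypothetical toy network: three base stations, four service locations.
radio_cells = {"BS1": {"p1", "p2"}, "BS2": {"p2", "p3"}, "BS3": {"p4"}}
links = [("BS1", "BS2"), ("BS2", "BS3")]
service_locations = ["p1", "p2", "p3", "p4"]

coverage_ok = is_covered(service_locations, radio_cells)
connectivity_ok = is_connected(radio_cells, links)
```

In this toy network both services are correct; dropping the link between BS2 and BS3 would leave the coverage intact but disconnect the backbone graph.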

Problem exposition and contributions
In this chapter, we address the problem of guaranteeing radio coverage of Wireless Mesh Networks, which are exposed to environmental dynamics.
The environmental dynamics are unpredictable changes of the radio propagation and radio attenuation properties of the environment (e.g. new obstacles, movement of obstacles, increased humidity). They occur due to reconfiguration of the plant layout. Environmental dynamics occur, for instance, in Reconfigurable Manufacturing Systems (RMS) [11,26]. An RMS is a production system with an adjustable structure that is able to meet the market requirements with respect to capacity, functionality and cost. This adjustable structure at the system level includes changes in the plant layout, for instance "adding, removing or modifying machine modules, machines, cells, material handling units and/or complete lines" [11]. In an RMS the environmental dynamics are unpredictable at design time, because the system layout adjustments are made on the fly to meet the actual production demand. The environmental dynamics negatively affect the radio coverage (radio signal strength between mobile stations and base stations) and the backbone network connectivity among the base stations of a WMN.
The first contribution of this chapter is a fault-tolerance method for guaranteeing radio coverage of wireless mesh networks in dynamic propagation environments. The basic idea of this approach is to automatically detect an error state, which is a lack of redundancy in radio coverage and connectivity, and to correct this error by reconfiguring the base stations before the radio coverage fails. The error detection is based on a radio propagation model: if an error is detected in the model, this is an indicator that an error exists in the real radio coverage. To justify this conclusion, the model is automatically calibrated to the real environment by using radio signal strength measurements.
The second contribution of this chapter is an automatic base station planning algorithm for the reconfiguration phase of the fault-tolerance approach. In this phase base stations are added to the network in order to correct errors in the radio coverage and connectivity. The question is: what is the minimum number of base stations to be added, and at which positions, in order to restore the original state of radio coverage and connectivity? Our approach is an optimization algorithm which uses knowledge from the calibrated radio propagation model and answers this question within an acceptable time frame.

Structure of the chapter
In section 2 we will discuss related work. In section 3 we will present our fault-tolerance approach for ensuring the availability of radio coverage and connectivity of wireless mesh networks. In section 4 we will present our approach for automatic base station planning in wireless mesh networks, which is used in the reconfiguration phase of the fault-tolerance approach. Section 6 provides a conclusion and a summary of future work.

Related work
Firstly, we will present related work aiming at availability of the radio coverage and connectivity. Then we will discuss related work to the automatic base station planning algorithm.

Availability of the radio coverage
The availability of the radio coverage service is a necessary condition for reliable communication in wireless networks. The issue of reliable communication via the wireless medium has been extensively investigated during the design of every wireless communication system. Since the wireless medium is unshielded, the effect of the environment on the wireless communication is specific to the environment. Different methods have been developed for increasing the reliability of the communication through the wireless medium. Most of them are at the physical layer. Examples are robust modulation methods (e.g. MIMO), frequency hopping, spread spectrum transmission, redundancy in the antennas and redundancy of the transmitters. At the data link layer, error correction codes and retransmissions are typical measures. These methods mostly address the time-variability of the wireless channel caused by multi-path propagation. However, all these methods require some minimum radio signal strength at the receiver, which is a basic requirement for decoding the frames successfully. Providing this minimum radio signal strength is a matter of network deployment and configuration in the particular environment.
The state-of-the-art method for ensuring radio coverage has a static nature (e.g. [8,10,35]). Figure 2 shows the general procedure of this method. The method ensures radio coverage during the network deployment, before the network starts operation. Usually, an expert plans the properties of the base stations so that the requirements for the radio coverage are fulfilled. The expert makes this planning based on knowledge about the environment and the requirements. For this purpose, measurements in the particular environment are typically needed. Then, the base stations are installed. After the installation, a manual site survey is conducted with the purpose of proving that the requirements are satisfied. The site survey includes manual measurements of the radio signal strength at selected service locations in the whole area. If the requirements are not satisfied, adjustments have to be made. The adjustments are site-specific and may include removing obstacles, changing frequencies, or adding new equipment [10]. When the requirements are fulfilled, the wireless network enters the operational phase. In the operational phase, there is no automatic function for monitoring and maintaining the radio coverage. The only way to do this is by making a manual site survey, which is expensive in terms of time and effort. The loss of radio coverage can only be detected by the mobile stations and the applications: the network connection is lost and no communication is possible. The repair of radio coverage is started when the applications report a problem of this kind. During the radio coverage repair the presence of an expert is required for troubleshooting and base station planning.
For compensating the dynamics of the environment, the static method uses static radio signal strength redundancy (called fade margin). In communication systems design the term fade margin (or margin) denotes the amount of signal strength reserve. This is the power added on top of the minimum level needed for reception of the frames at the receiver. The fade margin is configured during the planning phase via adequate selection of transmitters and antennas [8,37]. The fade margin is used for compensating temporal variations in the environment. When the environment changes, the radio coverage may degrade. But if the redundancy is sufficient, the radio coverage is still correct and the applications are not affected. However, the radio coverage could have entered a critical state, meaning that further changes in the environment may lead to service failure. Since there are no automatic monitoring functions for the radio coverage, this state of lost redundancy is not detected and remains in the system. In this state, the next change in the environment can lead to service failures.
In the context of this chapter, we have high availability requirements. We have an environment which can change in an unpredictable way during the network's life-cycle, which is typically longer than 10-20 years. For this reason, it is hardly possible to plan sufficient static redundancy for all possible changes of the environment, because they are not known at the deployment phase. Even if this were possible, it would be extremely inefficient. Consequently, a new method is needed for guaranteeing radio coverage. When the factory layout changes for adapting to a new market, the method should enable an easy adaptation of the WMN and should guarantee high availability of the radio coverage and the connectivity.
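The role of the fade margin can be made concrete with a small worked example in Python. The numeric values (received power, receiver sensitivity, additional attenuation) are hypothetical and chosen only to show how the reserve is consumed by changes in the environment without being noticed by the applications.

```python
def fade_margin_db(received_dbm, sensitivity_dbm):
    """Signal strength reserve above the minimum level needed for reception."""
    return received_dbm - sensitivity_dbm

def remaining_margin_db(received_dbm, sensitivity_dbm, extra_attenuation_db):
    """Margin that is left after the environment adds extra attenuation."""
    return fade_margin_db(received_dbm - extra_attenuation_db, sensitivity_dbm)

# Hypothetical planning values: -65 dBm received, -85 dBm receiver sensitivity.
planned = fade_margin_db(-65.0, -85.0)                      # 20 dB of reserve
after_obstacle = remaining_margin_db(-65.0, -85.0, 15.0)    # 5 dB left, service still correct
after_second = remaining_margin_db(-65.0, -85.0, 25.0)      # negative: reception fails
```

After the first change the service is still correct, but most of the reserve is gone; in the static method nothing reports this state before the second change causes a failure.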

Connectivity and base station planning
In this section we focus on the deployment and operation of the base stations, which is an essential function for connectivity. For the routing protocol and the topology discovery we build on the research within our working group (e.g. [15,29,32]).
Industrial automation networks have usually been isolated, single-cell networks or classic infrastructure networks with multiple cells. This means that base station planning is required only for the 'last mile', i.e. the connection between a base station and a mobile station, e.g. [8]. In the case of multi-hop wireless mesh networks, the planning of the backbone network is a new research aspect that needs to be considered. Research on radio network planning considers network throughput as a main planning goal, e.g. [7]. However, the most common requirement of industrial networks is availability. With the introduction of technologies for multi-hop communication in industrial environments (e.g. Zigbee, Wireless HART), the base station planning problem gains importance. Paper [37], for instance, presents the challenges for developing a planning tool for industrial wireless sensor networks. However, to the best of our knowledge, no systematic approach exists for planning multi-hop wireless networks with respect to fault-tolerance requirements of industrial automation networks.
The existing algorithms for base station planning in wireless mesh networks [2,39] have a different goal, namely to design a mesh network with a minimum number of base stations such that the end-to-end throughput requirements of application flows are fulfilled. These requirements are typical for Internet access in areas with no alternative high-speed wired connection. The approach is to transform the planning problem into a linear optimization problem which is a combination of a set covering problem and a network flow problem. As a result, the backbone is a connected graph, but with no fault-tolerance. Another disadvantage is the intractability of the proposed approaches. For some inputs, the algorithm takes too much time for the result to be useful. This is because the underlying linear optimization problem is a binary integer problem, which is known to be NP-complete. Paper [39] addresses this issue by a decomposition method, but the algorithm still runs about 22 hours for a network with 58 nodes. This is acceptable for the mentioned scenarios, but for network reconfiguration in automation scenarios a faster algorithm is required. Extending these algorithms to fault-tolerance would mean an additional increase in the complexity.
Paper [41] considers the problem of coverage control in wireless sensor networks, including various aspects like activating/deactivating of the nodes, finding the coverage characteristics of a given network, and sensor node deployment. However, all considerations include only the aspect of last mile coverage, i.e. the sensing function of the nodes. They do not consider the problem of the backbone connectivity for communicating the sensed data to a central instance.
Our approach is to extend the existing methods from infrastructure network planning to planning multi-hop wireless mesh networks with fault-tolerance aspects. Other papers about fault-tolerance in wireless multi-hop networks can benefit from our approach for generating a fault-tolerant topology. Papers considering fault-tolerant routing, for instance [4,19,27], have a prerequisite of a biconnected backbone network, but do not address the base station planning problem. The base station planning problem has been little addressed so far because in most mobile ad-hoc and sensor network scenarios the number and position of the nodes are considered uncontrolled or hardly controlled. However, in automation scenarios the networks are typically planned to provide service in some predefined geographical area (e.g. a production hall). This requires careful base station planning for ensuring high availability of the radio coverage.
The topology control problem is to configure a given instance of a multi-hop network such that it is connected and a quality of service property is fulfilled. Depending on the configured parameter, these methods adjust the transmission power [6] or the time of activity and sleeping periods of the nodes [5]. Paper [6] presents an algorithm for distributed adjustment of the transmission powers of the nodes with the purpose of minimizing the interference and keeping the network topology connected with a high probability. Paper [5] presents a distributed protocol for topology management which determines the active and sleeping periods for the nodes in such a way that the network is connected, the energy consumption is minimized, and the data is delivered with real-time guarantees. Paper [40] considers the issue of data forwarding in industrial wireless sensor networks and the integration in a wired backbone. It proposes a chain-based communication protocol for real-time communication over multiple hops. It is common to all topology control protocols that they operate on some existing instance of a multi-hop network. For achieving the required quality of service property, these protocols require some topological properties of the network (like connectivity or k-connectivity). The difference is that our base station planning algorithm plans a network so that it is deployed with the desired topological properties. In this way, our algorithm can be used in a first phase for planning the topological properties of the network. In a second phase a topology control algorithm can be used to additionally adjust the transmission powers or active/sleep times of the nodes for achieving the required QoS property.

Fault-tolerant radio coverage and connectivity
This section presents our approach for fault-tolerant radio coverage and connectivity of wireless mesh networks in dynamic propagation environments. A preliminary version of this approach, considering only radio coverage, has been published in [22].

Fault-tolerance approach
We consider the goal of this chapter at a general abstraction level. It is to guarantee availability of the services (radio coverage and connectivity) of a system (wireless mesh network) which is exposed to dynamic external behavior (the dynamic propagation environment). The environmental dynamics is an external factor to the wireless network. It results from the changing surroundings of the wireless network.
For this general type of problem, a well-known method exists in the field of dependable computing. This is the fault-tolerance approach [3]. Fault-tolerance avoids service failures in the presence of faults. A service failure, or failure, is the inability of a system to perform a service according to the service specification. An error is a part of the system state which may lead to a subsequent service failure. A fault is the cause of an error. The fault-tolerant system design includes fault model definition, error detection and system recovery. The fault model definition identifies a set of faults for which service failures do not occur. The error detection identifies errors in the system caused by the faults. The system recovery transforms a system with errors into a system without errors. The idea is to detect errors and perform system recovery before the errors lead to failures. In this way, the fault-tolerance approach avoids failures if faults from the fault model occur. In this chapter we apply the fault-tolerance approach for guaranteeing availability of radio coverage and connectivity of wireless mesh networks in dynamic propagation environments.

Fault model definition
A fault in our system is the environmental dynamics. Environmental dynamics are changes of the radio attenuation properties of the environment (e.g. new obstacles, movement of obstacles, increased humidity). The attenuation describes the ability of the radio propagation environment to absorb and weaken the radio waves. An increased attenuation has a negative effect on radio coverage and connectivity. Regarding radio coverage, it reduces the radio signal strength at the service locations. As a result, some service locations may no longer be covered. The effect on connectivity is that some backbone links can be lost. This can disconnect the backbone network. If no measures are taken, the fault environmental dynamics can lead to service failures. More precisely, a fault is an event of environmental dynamics which decreases the ARSS (Average Radio Signal Strength) by up to a user-specified amount ΔARSS.

Fault-tolerant system design
Our system design uses redundancy for tolerating the faults. Figure 3 shows the state machine of our fault-tolerant system. The figure shows the system states, their attributes and their entry actions. The initial state is the normal state. In addition to the correct service, the normal system state contains redundancy for compensating the faults at run-time. In this normal state the system performs concurrent error detection, meaning that the error detection takes place during the normal service delivery. In the error state the redundancy is lost due to a fault, but the service is correct because the initial redundancy has compensated the negative effects of the fault. In this state, the system performs system recovery. The system recovery restores the initial redundancy. In the following sections we will specify how we applied this concept to the services radio coverage and connectivity. For each service we will define the correct service specification, the redundancy and the error. A failure for both services occurs when the service consumer (a mobile station) tries to use the service and the service is not correct. Our fault-tolerant system design avoids the failures.

Correct service
Radio coverage is correct if every service location is covered by at least one base station with a radio signal strength of at least $ARSS_{Min}$.

Redundancy
In order to ensure correct radio coverage in case of faults, the normal system state uses radio signal strength redundancy. This means that every service location is covered by at least one base station with a radio signal strength of at least $ARSS_{RED}$. $ARSS_{RED}$ is the value of the redundant radio signal strength needed for compensating the environmental dynamics during the error detection and system recovery ($ARSS_{RED} = ARSS_{Min} + \Delta ARSS$).

Error
In the error state, the radio coverage is not as good as the radio coverage in the normal state, but the radio coverage is still correct. An error exists if at some service location the ARSS is less than the redundancy value, but still reaches the minimum threshold for correct coverage: $ARSS_{Min} \le ARSS < ARSS_{RED}$.
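The three radio coverage definitions (correct service, redundancy, error) can be summarized in a short Python sketch. It assumes that a predicted ARSS value in dBm is available for every pair of service location and base station; the function names, state labels and numeric values are hypothetical.

```python
def best_arss(per_station_dbm):
    """Strongest predicted ARSS at one service location over all base stations."""
    return max(per_station_dbm.values(), default=float("-inf"))

def location_state(arss_dbm, arss_min, delta_arss):
    """Classify one service location against ARSS_Min and ARSS_RED."""
    arss_red = arss_min + delta_arss      # ARSS_RED = ARSS_Min + delta ARSS
    if arss_dbm >= arss_red:
        return "normal"                   # coverage correct, redundancy available
    if arss_dbm >= arss_min:
        return "error"                    # coverage correct, redundancy lost
    return "not covered"                  # coverage not correct

def coverage_state(predictions, arss_min, delta_arss):
    """Overall state of the radio coverage service plus the per-location states."""
    states = {loc: location_state(best_arss(p), arss_min, delta_arss)
              for loc, p in predictions.items()}
    if "not covered" in states.values():
        overall = "incorrect"
    elif "error" in states.values():
        overall = "error"
    else:
        overall = "normal"
    return overall, states

# Hypothetical values: ARSS_Min = -80 dBm, delta ARSS = 10 dB, hence ARSS_RED = -70 dBm.
predictions = {
    "p1": {"BS1": -60.0, "BS2": -75.0},
    "p2": {"BS1": -78.0, "BS2": -74.0},
}
overall, states = coverage_state(predictions, arss_min=-80.0, delta_arss=10.0)
```

Here location p2 is still covered at -74 dBm, but it has lost its redundancy, so the service as a whole is in the error state and recovery should start.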

Correct service
Connectivity is correct if the backbone graph is connected.

Redundancy
In order to ensure correct connectivity in case of faults, the backbone graph is biconnected (2-connected). A graph is biconnected if any two vertices can be joined by two independent paths [9]. This backbone redundancy compensates for the loss of a backbone link as a result of a fault.

Error
In the error state, the backbone graph is not biconnected, but it is connected. The loss of biconnectivity can be caused by environmental dynamics leading to link loss. The loss of a link is not necessarily a connectivity error. It is an error only if it leads to loss of the biconnectivity.

Error detection
When faults occur and lead to errors, the errors have to be automatically detected by the system. Since we are considering two services, radio coverage and connectivity, we need methods for detecting radio coverage errors and connectivity errors. Figure 4 shows our methods for error detection and their integration in our fault-tolerant system design.

Connectivity error detection
For detecting connectivity errors we use monitoring at the routing layer and a classic biconnectivity testing algorithm from graph theory [9]. This algorithm uses information about the backbone graph and determines whether it is biconnected or not. If the graph is not biconnected, then there is an error. The information required for biconnectivity testing is the set of edges (links) among the vertices (base stations) of the graph. In our scenario, this information is globally available at the management appliance. As a part of the routing protocol, the base stations monitor the backbone link states by exchanging control messages with other base stations [17]. The state of every link is determined by two communication endpoints (base stations). One of them sends control messages and the other one determines the link state based on statistics of the received messages. The link state information is periodically updated and communicated, so the management appliance has an up-to-date global view of the backbone network. Based on this global view, the management appliance performs biconnectivity testing. The fact that every link state is determined by two communication endpoints enables us to detect connectivity errors by monitoring at the routing layer. If the backbone link state information is not available globally, distributed biconnectivity testing algorithms can be used (e.g. [34]).
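As an illustration of biconnectivity testing on a global view of the backbone graph, the following Python sketch uses the classic depth-first search for articulation points (vertices whose removal disconnects the graph). It is one common textbook technique and not necessarily the exact algorithm of [9]; the function names, state labels and the example ring are hypothetical.

```python
def articulation_points(vertices, edges):
    """Return (articulation points, number of connected components) of a simple graph."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    disc, low, points = {}, {}, set()

    def dfs(v, parent):
        disc[v] = low[v] = len(disc)          # discovery index of v
        children = 0
        for w in adj[v]:
            if w == parent:
                continue
            if w in disc:                     # back edge to an already visited vertex
                low[v] = min(low[v], disc[w])
            else:                             # tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if parent is not None and low[w] >= disc[v]:
                    points.add(v)
        if parent is None and children > 1:   # DFS root with several subtrees
            points.add(v)

    components = 0
    for v in vertices:
        if v not in disc:
            components += 1
            dfs(v, None)
    return points, components

def connectivity_state(vertices, edges):
    """'normal' if biconnected, 'error' if only connected, 'incorrect' if disconnected."""
    vertices = list(vertices)
    points, components = articulation_points(vertices, edges)
    if components > 1:
        return "incorrect"
    if points or len(vertices) < 3:           # graphs with fewer than 3 vertices are not 2-connected
        return "error"
    return "normal"

ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
state_ring = connectivity_state("ABCD", ring)
state_one_link_lost = connectivity_state("ABCD", ring[:3])
state_two_links_lost = connectivity_state("ABCD", [("A", "B"), ("C", "D")])
```

The ring tolerates the loss of any single link: after one loss the backbone is still connected but no longer biconnected, which is exactly the error state that triggers recovery.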

Radio coverage error detection
The information required for radio coverage error detection is the radio signal strength at every service location. However, a communication endpoint does not exist at every service location. Therefore, radio coverage errors cannot be detected by monitoring, as is done for the connectivity errors. Nevertheless, a method for detecting these errors is needed because the environmental dynamics affect the radio coverage. The radio coverage should be guaranteed for every service location before a mobile station moves to that location.
Our approach is to use a model-based assessment for detecting radio coverage errors at the physical layer. We use a radio propagation model for assessing the radio signal strength at every service location. This model has a tight relation to the propagation environment. We use measurements from the wireless network for calibrating the model to reality.
In the state-of-the-art assessment approaches the radio propagation models are static, meaning that they do not reflect the dynamics of the environment. The innovation of our approach is that the radio propagation model is automatically calibrated to the real environment. Radio model calibration is the process of adjusting the model parameters in such a way that the model better reflects a set of measurements from the actual propagation environment. Radio coverage assessment is the model-based estimation of the radio signal strength for the purpose of error detection. The radio model calibration method is out of the scope of this chapter, but the reader can find a detailed description in [20,23,24].

System recovery
The system recovery transforms a system with errors into a system without errors. In our approach we use the same mechanism for recovery from radio coverage errors and for recovery from connectivity errors. This mechanism adds new base stations to the network. The new base stations improve the radio coverage by increasing the radio signal strength at the service locations. The new base stations also improve the connectivity by adding new links to the backbone network. Given a wireless mesh network with radio coverage and/or connectivity errors, we have to decide how many base stations to install and where to install them in order to correct the errors. For this purpose, we have developed an automatic base station planning algorithm (see section 4). The error recovery includes automatic base station planning and manual reconfiguration (see figure 4). The management appliance runs the base station planning algorithm and gives instructions to the operating staff for the reconfiguration. The operating staff performs the reconfiguration, which restores the redundancy of the services.

Automatic base station planning
This section describes our algorithm for automatic base station planning. It starts with a problem definition for the base station planning, followed by an overview of our approach in section 4.2. The following sections define the details of the algorithm, namely the used link state model, the optimization approach and the graph consolidation approach. This algorithm is published in [25]; in addition, this section describes the integration with the presented fault-tolerance framework.

Problem definition
The problem of the base station planning algorithm is to find a minimum number of base stations to be installed which transform a wireless mesh network with radio coverage errors and/or connectivity errors into a system without errors. The existing algorithms for this type of problem in wireless mesh networks are computationally intractable, or do not provide the required fault-tolerance (see section 2.2 for a discussion). The following input information is given to the base station planning algorithm:
• Service location information. This is information about the service locations which have to be covered.
• Candidate sites information. This is information about possible locations of the base stations. The candidate sites and the service locations are specified by the deployment staff.
• Radio coverage information. This information is obtained from the radio propagation model. For every service location, it lists the candidate sites that would cover this service location if base stations were installed at all candidate sites.
• Connectivity information. For every candidate site, it lists the candidate sites to which it would have a link in the backbone network if base stations were installed at all candidate sites. To derive this information, we use our calibrated radio propagation model and a link state model (section 4.3).
• The currently installed base stations and their positions.
The base station planning algorithm has to determine the number and positions of base stations to be installed such that the following requirements are met:
• The radio coverage and the connectivity enter the normal state. The normal state includes redundancy in the services, which has been defined in section 3.
• The algorithm provides an acceptable trade-off between the minimality of the number of base stations and the running time. The running time of the algorithm should be appropriate for error detection and system recovery in a dynamic propagation environment.
The challenge of the defined problem is the connectivity requirement. The coverage requirement can be formally defined as a local property which depends only on the considered entities (e.g. a base station covers a service location). For the connectivity, the requirement is global. It includes all network paths among all pairs of base stations. The existence of a path between two base stations depends not only on the considered base stations, but also on the number and positions of all other base stations in the network. The fault-tolerance (biconnectivity) requirement increases the complexity of the problem. It has been shown that finding a minimum number of base stations for this type of problem is NP-complete. For this reason, we are looking for an approach with a good balance between minimality and running time.

Overview of the algorithm
Our idea is to perform an optimization, satisfying a simple local network property which significantly affects the fulfillment of the global property (biconnectivity).This local property is the minimum degree.For the backbone (multi-hop) network, the degree of a base station is the number of links to other base stations.The minimum degree of the network is the least degree among all base stations.In graph theory, the minimum degree is a necessary but not sufficient condition for k-connectivity [9].This means that a k-connected graph has a minimum degree of k, but a graph with minimum degree of k is not necessary k-connected.Formally, this rule applies to the backbone of wireless mesh networks.We consider both radio coverage and connectivity.The service locations are spread in some area (e.g.production hall).Hence, the probability that the necessary condition is also sufficient in mesh networks is significantly higher than the probability in graph theory.Therefore, our algorithm fulfills the local necessary condition and checks whether the global sufficient condition is also fulfilled.If not, the algorithm performs an incremental correction.The advantage of this approach is that it fulfills the connectivity requirement without increasing the complexity of the underlying optimization problem.
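That the minimum degree is necessary but not sufficient for biconnectivity can be checked on two small graphs. The Python sketch below uses a brute-force test (remove each vertex in turn and test connectivity), which is adequate for illustration only; the helper names and both example graphs are hypothetical.

```python
def build_adj(edges):
    """Adjacency sets of an undirected simple graph given as a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def min_degree(adj):
    """Least number of links over all vertices."""
    return min(len(neighbours) for neighbours in adj.values())

def connected(adj, removed=None):
    """True if the graph without the vertex `removed` is connected."""
    nodes = [v for v in adj if v != removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w != removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def is_biconnected(adj):
    """Brute force: connected, and still connected after removing any single vertex."""
    return len(adj) >= 3 and connected(adj) and all(connected(adj, removed=v) for v in adj)

# Two triangles sharing vertex C: every vertex has degree >= 2, yet C is a cut vertex.
bowtie = build_adj([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "C")])
# A ring of five base stations: minimum degree 2 and biconnected.
ring = build_adj([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")])

bowtie_result = (min_degree(bowtie), is_biconnected(bowtie))
ring_result = (min_degree(ring), is_biconnected(ring))
```

Both graphs satisfy the local condition (minimum degree 2), but only the ring satisfies the global one, which is why the sufficient condition has to be tested after the optimization step.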
The algorithm operates in three steps: optimization, connectivity testing, and graph consolidation (figure 5). The optimization step finds an optimal solution for the optimization criteria. The optimization criteria are the radio coverage requirement and the necessary condition for the connectivity (the local property minimum degree). The optimization uses the radio propagation model and the link state model. The connectivity testing step tests the resulting graph for biconnectivity (the sufficient condition). If the sufficient condition is true, the algorithm finishes. Otherwise the algorithm performs a graph consolidation step. The consolidation step maps biconnected parts of the graph to a single vertex. After the consolidation, the algorithm continues with the optimization step, which is done based on the consolidated graph. After a few (expected 1-3) iterations, the algorithm produces a solution that satisfies the coverage requirements.

Example
The optimization step has produced a graph with minimum degree 2 (figure 6A) according to the necessary condition. This graph does not satisfy the biconnectivity requirements (one edge and two vertices exist whose removal disconnects the graph). The consolidation step identifies two sub-graphs which are biconnected, and maps them to vertices (figure 6B). Note that after the consolidation, the minimum degree of the graph is 1. Then the optimization step places a new base station, such that the consolidated graph plus the new vertex result in a graph with a minimum degree of 2 (figure 6C). Finally, the deconsolidated graph satisfies the biconnectivity requirements.

Link state model
This section defines the link model used, which models the link state based on the radio signal strength. The link model used in this chapter considers the operation of an ad-hoc routing protocol. We have shown in [17] that the communication in a mesh network is possible only if the links have some quality level.
The routing protocols determine the state of a link by analyzing the periodically received Hello packets from the neighbors. Depending on the mobility and the required stability of a link, different approaches for determining the link state at the routing layer exist [28,31,42]. What is common to all of them is the analysis of received Hello packets at the routing layer. The AWDS (Ad-hoc Wireless Distribution System) [1][17] routing software, for instance, identifies a link as existing if 10 consecutive Hello packets in both directions are received correctly. A link is identified as non-existent if 3 consecutive Hello packets in either direction are not received.
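The Hello-based rule can be sketched as a small state tracker. This is a minimal illustration of the 10/3 rule described above and not AWDS code; the class name, the direction labels "in" and "out", and the way the counters are reset are assumptions.

```python
class HelloLinkTracker:
    """Tracks one link from Hello packet outcomes in both directions."""

    def __init__(self, up_after=10, down_after=3):
        self.up_after = up_after      # consecutive Hellos needed in both directions
        self.down_after = down_after  # consecutive losses in either direction
        self.received = {"in": 0, "out": 0}
        self.missed = {"in": 0, "out": 0}
        self.exists = False

    def observe(self, direction, ok):
        """Record one Hello interval for `direction` and return the link state."""
        if ok:
            self.received[direction] += 1
            self.missed[direction] = 0
        else:
            self.missed[direction] += 1
            self.received[direction] = 0
        if all(n >= self.up_after for n in self.received.values()):
            self.exists = True
        if any(n >= self.down_after for n in self.missed.values()):
            self.exists = False
        return self.exists


link = HelloLinkTracker()
for _ in range(10):
    link.observe("in", True)
    link.observe("out", True)
print("after 10 Hellos in both directions:", link.exists)
for _ in range(3):
    link.observe("in", False)
print("after 3 lost Hellos in one direction:", link.exists)
```

The asymmetric thresholds make the link slow to appear and quick to disappear, which favors stable routes over short-lived ones.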
The radio signal strength (RSS) is one of the main factors which determine the reception of the packets at the receiver [14,35]. This means that if the RSS is too low, the wireless adapter cannot decode the frame correctly. Therefore, to model the existence of a link, we use a threshold model based on the average radio signal strength (ARSS). If the ARSS reaches or exceeds the threshold (ARSS ≥ ARSS_Min), then a link exists, otherwise a link does not exist. Remember that our fault-tolerance approach ensures that ARSS ≥ ARSS_Min + ΔARSS.
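A minimal sketch of the threshold model follows. The numeric values (a threshold of -85 dBm, a margin of 10 dB, and the three measurements) are hypothetical and only serve the illustration; the source does not state them.

```python
def link_exists(arss, arss_min):
    """Threshold model: a link exists if ARSS >= ARSS_Min."""
    return arss >= arss_min

def link_has_margin(arss, arss_min, delta_arss):
    """Redundant state: ARSS >= ARSS_Min + delta ARSS."""
    return arss >= arss_min + delta_arss

def backbone_links(arss_table, arss_min):
    """Keep the base station pairs whose ARSS reaches the threshold."""
    return {pair for pair, arss in arss_table.items() if link_exists(arss, arss_min)}

ARSS_MIN = -85.0    # hypothetical threshold in dBm
DELTA_ARSS = 10.0   # hypothetical fault-tolerance margin in dB
measurements = {("A", "B"): -70.0, ("B", "C"): -80.0, ("A", "C"): -92.0}

links = backbone_links(measurements, ARSS_MIN)
print(sorted(links))
print({pair: link_has_margin(measurements[pair], ARSS_MIN, DELTA_ARSS) for pair in sorted(links)})
```

In this example the link B-C exists but does not keep the margin, so it would count as a link without redundancy.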
There are other factors influencing the packet loss and the link state (e.g. collision, radio interference). But a sufficient RSS is a necessary condition for successful frame decoding. In wireless mesh networks, it is one of the most influential factors for the link state. This has been shown in our research on wireless mesh network routing [16][17][18] and on wireless network simulation and emulation [21]. Other researchers in our group are working on improving the link state model. They apply a data mining based approach for predicting the link state from various network monitoring information [28].

Minimization approach
Our algorithm uses a minimization approach based on binary search for finding the minimum number of base stations (BS_min) which satisfies the optimization criteria. It searches iteratively

Optimization problem formulation
The optimization performed at each iteration can be defined by the following:

• Variables
The optimization variables are the positions of the base stations (X, Y, Z)_BS. We consider a typical multi-hop network, operating on a single frequency. Therefore, the frequency assignment is a constant for all base stations.

• Bounds
The variables have lower and upper bounds according to the candidate sites information provided by the user. For instance, if the base stations are to be installed on the ceiling of a production hall with dimensions 200 x 300 x 6 m, then the bounds are: 0 ≤ X ≤ 200, 0 ≤ Y ≤ 300, Z = 6. For the currently installed base stations, the lower and upper bounds are equal to the base station coordinates. In this way, they are considered in the solution, but are not relocated by the algorithm.
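The construction of the bounds can be sketched as follows. The function name, the tuple layout and the position of the installed base station are illustrative assumptions; only the hall dimensions come from the example above.

```python
def build_bounds(hall, n_new, installed):
    """Return (lower, upper) lists of (x, y, z) bounds, one entry per base station.

    hall      -- (length_x, length_y, ceiling_height) in meters
    n_new     -- number of base stations the optimizer may place freely
    installed -- positions of already installed base stations (kept fixed)
    """
    length_x, length_y, height = hall
    lower, upper = [], []
    for pos in installed:
        # Equal bounds pin an installed base station to its current position.
        lower.append(tuple(pos))
        upper.append(tuple(pos))
    for _ in range(n_new):
        # New base stations may be placed anywhere on the ceiling.
        lower.append((0.0, 0.0, height))
        upper.append((length_x, length_y, height))
    return lower, upper


# Hall from the example; one hypothetical installed base station at (50, 60, 6).
lower, upper = build_bounds((200.0, 300.0, 6.0), n_new=2, installed=[(50.0, 60.0, 6.0)])
for lo, up in zip(lower, upper):
    print(lo, "<= (X, Y, Z) <=", up)
```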
parameters (X, Y, Z)_BS. This is because the objective function contains the radio coverage model, which includes the geometry of the model. Several algorithms exist for solving this type of problem (pattern search, genetic algorithm, simulated annealing). We have selected pattern search because it has a proven convergence and supports any type of constraints [33].

Connectivity testing
For k-connectivity testing in a graph with n vertices, we use existing algorithms from graph theory [9]. The complexity of this algorithm is O(k · n^3), under the condition that k < √n, which is true in our case.

Graph consolidation
In this step, the algorithm finds sub-graphs satisfying the connectivity requirements and transforms each sub-graph into a single vertex. The formal specification of the graph consolidation step is described by pseudo code in algorithm 2, which is explained in the following list. Figure 7 shows an example of the operation of the graph consolidation step.
1. Given a graph G, identify all biconnected components G_c containing at least 3 vertices and store them in a set BC. For finding biconnected components, existing graph theory algorithms are used.
2. Identify the special articulation points, which are articulation points shared between the biconnected components in the set BC. An articulation point is a vertex whose removal disconnects a graph. In figure 7B, vertices 1, 2 and 3 are articulation points. Vertex 1 is a special articulation point, since it is shared between two biconnected components of size at least 3. For identifying biconnected components and articulation points, existing graph algorithms are used [9].
3. Every vertex which is either a special articulation point or another vertex not belonging to a biconnected component in BC is directly transformed into a vertex in the consolidated graph. The consolidated vertex inherits all edges of the original vertex.

Evaluation approach and implementation
We will present an evaluation of the base station planning algorithm according to the following evaluation criteria:
• Fault-tolerance: this shows the algorithm's ability to generate a network configuration that satisfies the fault-tolerance coverage requirements.

Algorithm 2. Pseudo code of the graph consolidation step

We performed a model-based evaluation of the algorithm. We generated different inputs to the algorithm, then executed the algorithm and observed the evaluation criteria. As an input of the algorithm, we used a service area with various sizes typical for a production environment (see table 1 for the parameter values). The service locations comprise the entire floor. The candidate sites comprise the entire ceiling. We also varied the attenuation of the propagation environment. For the radio connectivity model, we used the log-normal shadowing propagation model [36], which is used for radio coverage assessment. The path loss exponent has been fixed in these experiments. The shadowing factor X_σ models the inhomogeneity of the propagation environment and has been varied in these experiments.
The other parameters of the propagation model are fixed. To determine the connectivity, we used our threshold-based link state model. The base station planning algorithm has been implemented in Matlab (about 600 lines of code). The algorithm has been tested on all combinations of the input parameters (area size and shadowing deviation), which makes a total of 36 executions. At the end of each algorithm execution, we performed a requirements test. We tested whether the radio coverage and the connectivity were in the normal (redundant) state.
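For reference, the log-normal shadowing model in its common textbook form is PL(d) = PL(d0) + 10 · n · log10(d / d0) + X_σ, where n is the path loss exponent and X_σ is a zero-mean Gaussian random variable (in dB) with standard deviation σ. The sketch below is written in Python for illustration and is not the Matlab implementation; all numeric values are hypothetical, since the parameter values of table 1 are not reproduced here.

```python
import math
import random

def path_loss_db(d, pl_d0_db, exponent, sigma_db, d0=1.0, rng=random):
    """Log-normal shadowing: PL(d) = PL(d0) + 10 n log10(d / d0) + X_sigma."""
    x_sigma = rng.gauss(0.0, sigma_db) if sigma_db > 0 else 0.0
    return pl_d0_db + 10.0 * exponent * math.log10(d / d0) + x_sigma

def received_power_dbm(tx_power_dbm, d, **model):
    """Received signal strength for a given transmit power and distance."""
    return tx_power_dbm - path_loss_db(d, **model)


# Hypothetical parameters: 40 dB loss at 1 m, exponent 3, 20 dBm transmit power.
deterministic = dict(pl_d0_db=40.0, exponent=3.0, sigma_db=0.0)
print("mean RSS at 10 m:", received_power_dbm(20.0, 10.0, **deterministic), "dBm")

# With shadowing (sigma = 8 dB), the RSS at the same distance varies per location.
rng = random.Random(1)
shadowed = dict(pl_d0_db=40.0, exponent=3.0, sigma_db=8.0, rng=rng)
print("sampled RSS at 10 m:", [round(received_power_dbm(20.0, 10.0, **shadowed), 1) for _ in range(3)])
```

Feeding such RSS values into the threshold-based link state model yields the links of the backbone graph; a larger σ produces the irregular link lengths that the shadowing deviation is meant to model.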

Results for fault-tolerance
With all the inputs, the algorithm has generated a network topology in which the radio coverage and the connectivity were in the normal (redundant) state, as defined in section 3. An example graph of the network topology, generated by the algorithm for area size 200/200m and shadowing deviation 8, is shown in figure 8. The related work algorithms [2,39] generated topologies which are not fault-tolerant. Their topologies optimized the network throughput, but the backbone network was not biconnected (see figure 3 in [2], and figure 4 in [39]). Figure 8 clearly shows the effect of the shadowing (inhomogeneous environment) on the base station planning. Because of the shadowing, some links are shorter than others, and in some areas more base stations are needed to provide coverage.

Results for termination, minimality and running time
Figure 9 shows the measured termination property of the algorithm within the performed evaluation. The figure shows the cumulative termination, i.e. the percentage of the algorithm executions that have terminated up to some number of iterations. 30% of the algorithm executions generated a correct fault-tolerant solution directly after the first iteration. This means that in these cases, the graph consolidation step was not performed at all. These were the cases when the area sizes were smaller (50/50m and 100/100m). 80% of the algorithm executions generated a correct fault-tolerant topology after the second iteration. This means that only two optimizations and one graph consolidation were needed. The algorithm needed a maximum of four iterations to complete all the inputs. 90% of the base stations were selected at the first algorithm iteration. This means that 90% were selected according to the global optimization function and were optimally placed. The remaining 10% of the base stations were selected during the subsequent algorithm iterations in order to ensure the biconnectivity of the backbone. Figure 10 shows the result after the first iteration for area size 150/150m and shadowing deviation 7. In the middle of the graph (around coordinates 65/44), a base station exists whose removal would disconnect the network. In the next iteration, the algorithm corrected this by inserting one base station in proximity of the first one (see figure 11).
For the total of 36 executions, the algorithm needed about 25 minutes to complete on a laptop with a dual core 2.5GHz processor and 3GB operating memory. This means that the average running time was about 42 seconds per execution. As a comparison, a related work algorithm in [39]    hours for a 58-node scenario because of the intractability of the approach. This means that for the purpose of the system recovery, our algorithm has an acceptable running time.

Conclusion
In this chapter, we developed a new approach for guaranteeing the availability of the services radio coverage and connectivity of Wireless Mesh Networks in dynamic propagation environments. Our approach is to apply fault-tolerance for avoiding service failures in the presence of environmental dynamics. Differing from the existing methods, we use reconfigurable redundancy of the services. As the radio propagation environment changes, our method changes the redundancy of services.
When environmental dynamics are detected, the system recovery adds base stations to the network for restoring the redundancy of the services. But first, it has to be decided what the minimum number of base stations (and their respective positions) would be which will restore the redundancy. For this purpose, we developed a new base station planning algorithm which takes the required decision and proposes reconfiguration instructions. Since the underlying optimization problem is NP-complete, our algorithm is a trade-off between a minimum number of base stations and a minimum running time. The operating staff performs the network reconfiguration, which restores the redundancy of the services.
In future work, the presented concept will be integrated in a system for dependable end-to-end communication in wireless mesh networks. This system will incorporate other ongoing research works within our working group [30,31], developing concepts for end-to-end quality of service guarantees (throughput, packet loss, latency) in Wireless Mesh Networks. Another aspect of our future work is to integrate the developed concepts in components for industrial wireless communication in cooperation with German product manufacturers.

Figure 4. The error detection and system recovery of our fault-tolerant system

Figure 5. Base station planning algorithm

Figure 6. Example operation of the base station planning algorithm

the interval between a lower bound BS_low and an upper bound BS_up. At each iteration, the algorithm chooses the middle of the interval as the current value for BS and determines whether a solution is possible by solving an optimization problem. If the solution satisfies the optimization criteria, then the algorithm decreases BS by searching the lower half of the interval, otherwise it increases BS by searching the upper half of the interval. Finally, the algorithm finds a minimum value for BS which satisfies the optimization criteria.
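The interval search described above can be sketched as follows. The callback `is_feasible` is a hypothetical stand-in for solving the optimization problem with a given number of base stations, and the sketch assumes that feasibility is monotone (if BS base stations suffice, any larger number suffices as well).

```python
def minimize_base_stations(bs_low, bs_up, is_feasible):
    """Smallest BS in [bs_low, bs_up] for which is_feasible(BS) holds, or None."""
    best = None
    while bs_low <= bs_up:
        bs = (bs_low + bs_up) // 2      # middle of the current interval
        if is_feasible(bs):
            best = bs                   # criteria satisfied: search the lower half
            bs_up = bs - 1
        else:
            bs_low = bs + 1             # criteria violated: search the upper half
    return best


# Hypothetical feasibility test: pretend that 17 base stations are required.
calls = []
def toy_feasible(bs):
    calls.append(bs)
    return bs >= 17

print("BS_min =", minimize_base_stations(1, 50, toy_feasible), "after", len(calls), "optimizations")
```

Each call to the feasibility test corresponds to one optimization run, so the number of runs grows only logarithmically with the width of the interval.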

4. For every biconnected component in the set BC: (a) If it contains special articulation points, then they are removed from the component. (b) All vertices from the component are transformed into a single vertex in the consolidated graph. (c) The consolidated vertex inherits all edges of the original vertices to other vertices in the graph. Other vertices are vertices not belonging to the same biconnected component. (d) Duplicated edges in the consolidated graph are removed.
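The four consolidation steps can be sketched in Python as follows. This is an illustration under assumptions and not the pseudo code of algorithm 2: the component search is a standard recursive depth-first search, the function names and the labels ("BC", i) are made up, and the example graph (two triangles joined by one edge) is only meant to resemble the situation of figure 6A.

```python
def make_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def biconnected_components(adj):
    """Vertex sets of the biconnected components (recursive DFS with an edge stack)."""
    index, low, comps, stack = {}, {}, [], []

    def dfs(v, parent):
        index[v] = low[v] = len(index)
        for w in adj[v]:
            if w not in index:
                stack.append((v, w))
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] >= index[v]:          # v separates the subtree of w
                    comp = set()
                    while True:
                        a, b = stack.pop()
                        comp.update((a, b))
                        if (a, b) == (v, w):
                            break
                    comps.append(comp)
            elif w != parent and index[w] < index[v]:
                stack.append((v, w))            # back edge to an ancestor
                low[v] = min(low[v], index[w])

    for v in adj:
        if v not in index:
            dfs(v, None)
    return comps

def consolidate(adj):
    """Map every biconnected component with at least 3 vertices to one vertex."""
    bc = [c for c in biconnected_components(adj) if len(c) >= 3]   # step 1
    special = {v for v in adj if sum(v in c for c in bc) >= 2}      # step 2
    label = {}
    for i, comp in enumerate(bc):                                   # steps 4a, 4b
        for v in comp - special:
            label[v] = ("BC", i)
    for v in adj:                                                   # step 3
        label.setdefault(v, v)
    merged = {lab: set() for lab in label.values()}
    for v, nbrs in adj.items():                                     # steps 3, 4c
        for w in nbrs:
            if label[v] != label[w]:
                merged[label[v]].add(label[w])                      # sets drop duplicates (4d)
    return merged


# Two triangles joined by the edge 3-4: minimum degree 2, but not biconnected.
graph = make_graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)])
consolidated = consolidate(graph)
print(consolidated)
```

For this input, the consolidated graph consists of two vertices joined by a single edge, so its minimum degree is 1 and the next optimization step has to add a base station that raises it to 2.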

Figure 7. Example of the graph consolidation step

• Termination: this shows the number of iterations the algorithm needs to complete and the running time.

Figure 9. Algorithm termination: 80% of all algorithm executions terminated after 2 iterations. The algorithm needed a maximum of 4 iterations to complete.

Figure 10. Example network topology after the first algorithm iteration

Figure 11. Example network topology after the second algorithm iteration. Only one additional base station results in a biconnected topology.

Table 1. Evaluation parameters