Design and Implementation of a Single-Frequency Mesh Network Using OpenAirInterface

OpenAirInterface is an experimental open-source real-time hardware and software platform for experimentation in wireless communications and signal processing. With the help of OpenAirInterface, researchers can demonstrate novel ideas quickly and verify them in a realistic environment. Its current implementation provides a full open-source software modem comprising physical and link layer functionalities for cellular and mesh network topologies. The physical (PHY) layer of the platform targets fourth generation wireless networks and thus uses orthogonal frequency division multiple access (OFDMA) together with multiple-input multiple-output (MIMO) techniques. The current hardware supports 5 MHz bandwidth and two transmit/receive antennas. The media access (MAC) layer of the platform supports abundant two-way signaling for enabling collaboration, scheduling protocols, as well as traffic and channel measurements. In this paper, we focus on the mesh topology and show how to implement a single-frequency mesh network with OpenAirInterface. The key ingredients to enable such a network are a dual-stream MIMO receiver structure and a distributed network synchronization algorithm. We show how to implement these two algorithms in real-time on the OpenAirInterface platform. Furthermore, we provide results from field trials and compare them to the simulation results.


Introduction
The design and implementation of next generation wireless networks is a very challenging task. To ensure optimal performance it is necessary to carry out performance evaluations and field trials in parallel to standard development. Easily reconfigurable testbeds are a convenient way to investigate new ideas and to tackle many problems at an early development stage.
Novel ideas for wireless networks are usually first studied using computer simulations based on some kind of model of the network, the hardware and the radio channel. These models usually make assumptions in order to simplify or isolate the problem at hand. However, it might turn out that the assumptions are not fulfilled in a real environment. An easily reconfigurable experimental platform allows novel algorithms to be studied under realistic conditions. Comparing simulation results with results from lab tests and field trials reveals whether the initial assumptions were correct or whether they need to be refined. This paper presents the Eurecom testbed OpenAirInterface, which is an experimental real-time, open-source hardware and software platform for future wireless networks. OpenAirInterface can be seen as a mock standard for realistic experimentation purposes which retains the salient features of a real radio system, without all the required mechanisms one would find in a standard used in deployment of commercial networks. Its aim is to study techniques such as multicell cooperation, distributed synchronization, interference coordination and cancellation.
OpenAirInterface features an open-source software modem written in C comprising physical and link layer functionalities for cellular and mesh network topologies. This software modem can be used either for extensive computer simulations using different channel models or it can be used for real-time operation. In the latter case, it is run under the control of the real-time application interface (RTAI) which is an extension of the Linux operating system.
The use of an open-source software modem has several advantages. Firstly, the same code can be initially debugged and tuned in simulation before using it in the real-time modem (where debugging and performance analysis are much harder). Secondly, the system is very flexible and parameters like frame structure, pilot placement, and so forth, can be changed rather easily. Thirdly, researchers can implement new ideas rather fast, without having to use very sophisticated hardware description languages (HDL). Last but not least, since all code is open-source, other researchers can easily adjust the modem to their needs and collaboration is fostered.
Other highlights of the OpenAirInterface platform are its usage as a measurement platform or as an emulation platform, which allows different aspects of a wireless network to be studied in isolation. In the emulation mode, the physical layer is abstracted and emulated over Ethernet. This approach makes it possible to test and investigate MAC and link-level algorithms without using the radio interface [1]. OpenAirInterface can also be used to perform channel measurements which can be used for channel characterization and capacity analysis [2].
Apart from a general overview of OpenAirInterface, in this paper we present OpenAirMesh, a specification of a wireless mesh network, and its implementation on the OpenAirInterface platform. OpenAirMesh exemplifies two major challenges in future wireless networks. The first challenge is interference, which is caused by a very tight frequency reuse in order to increase the network throughput. Interference is especially strong for users at the cell edge, severely limiting their throughput. We propose a low-complexity dual-stream MIMO receiver that is able to cancel out interference from a neighboring cell and show its implementation and performance on the OpenAirInterface. This example also highlights the insight that OpenAirInterface provides for developing multicell algorithms. More concretely, it was found that frequency offsets and receive correlation have a very strong influence on the receiver performance and can thus not be neglected in the simulations.
Another big challenge in future wireless networks is synchronization between nodes, especially indoors where a reference timing signal such as the one provided by the global positioning system (GPS) cannot be used. Synchronization is needed for example for the dual-stream MIMO receiver and to enable the collaboration between base-stations both on the media access (MAC) and the physical (PHY) layer. The distributed synchronization algorithm proposed in this paper can be interpreted as a form of firefly synchronization [3].

Related Work.
Many people have recognized the advantages of testbeds for wireless networks. However, many of them focus on a particular layer, such as the physical layer, the link layer or the network layer. A good overview of testbeds that focus on the physical layer and especially MIMO communications is given in [4,5].
Most of the testbeds that provide both physical and link layer functionalities (like OpenAirInterface) are either based on the Universal Software Radio Peripheral (USRP) from Ettus Research [6] together with the GNU software radio [7] or the wireless open-access research platform (WARP) from Rice University [8]. For example, the Hydra testbed of the University of Texas at Austin [9] is based on the GNU radio platform while the WARPnet testbed [10] is based on the WARP platform. The WiTestLab from the Polytechnic Institute of NYU [11] has been experimenting with both platforms. Another testbed example is the Cognitive Radio Testbed from Berkeley [12], which uses the Berkeley Emulation Engine 2 (BEE2) [13] as an implementation platform.
Compared to the OpenAirInterface platform, the GNU radio project does not provide a full reference design, but only building blocks. Further, a MAC layer implementation is missing in the current distribution. Also the USRP hardware has its limitations, mainly due to the connection to the PC over USB or Ethernet, which severely limits the achievable system throughput. Like the OpenAirInterface, WARP is also a full software defined radio (SDR), but physical layer algorithms have to be developed either directly in VHDL or using the Xilinx System Generator toolchain for Matlab. Compared to the use of C language in OpenAirInterface, the use of VHDL is more cumbersome and time consuming. Also, the Xilinx System Generator is not openly available. The BEE2 platform is a very flexible hardware platform, which has been designed for a multitude of applications. However, no software modem exists for this platform.
Last but not least we mention here the two testbeds developed within the EASY-C project (http://www.easy-c.de/) that were set up in Berlin [14] and in Dresden [15] (both Germany). The project is a cooperation between German universities, research centers and industry and focuses on LTE-Advanced technologies. However, both testbeds use proprietary hardware and software and are not openly accessible.
Organization. Section 2 gives an overview of the OpenAirInterface experimental platform. Section 3 presents the network, the link layer and the physical layer architecture of OpenAirMesh, a mesh network built using the OpenAirInterface. Section 4 describes the two building blocks for the implementation of OpenAirMesh: a novel low-complexity dual-stream receiver architecture and the distributed synchronization algorithm. Finally we show results from computer simulations as well as real experiments in Section 5. We conclude the paper in Section 6.
Notation. Let $\mathbb{C}$ denote the set of complex numbers. Scalars are denoted by $x$. Column vectors and matrices are denoted by $\mathbf{a}$ and $\mathbf{A}$, and their elements are denoted by $a_i$ and $A_{i,j}$, respectively. Transpose and Hermitian transpose are denoted by $(\cdot)^T$ and $(\cdot)^H$. $\mathbf{I}_M$ is the identity matrix of size $M$ and $\mathbf{0}_M$ is an $M$-dimensional vector of zeros. The Euclidean ($\ell_2$) norm of a vector $\mathbf{a}$ is denoted by $\|\mathbf{a}\|$ and the Frobenius norm of a matrix $\mathbf{A}$ is denoted by $\|\mathbf{A}\|_F$. $\mathrm{E}$ denotes expectation, and $\mathcal{CN}(\mathbf{m}, \mathbf{C})$ denotes a multivariate proper complex normal distribution with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$.

OpenAirInterface Overview
The OpenAirInterface platform consists of both hardware and software components. Additionally it comprises different simulation tools as well as collaborative web tools. The hardware components are described in Section 2.1. In Section 2.2, we describe the basic organization of the OpenAirInterface software components, which are available under the GNU GPL from the OpenAirInterface website (http://www.openairinterface.org/).

Hardware Components.
In OpenAirInterface there are two different hardware modules available: CardBus MIMO1 (CBMIMO1) and its successor Express MIMO. All current activities (including the experiments described in this paper) are based on CBMIMO1. In the following we describe the main characteristics of the two boards.
The CBMIMO1 board (cf. Figure 1) comprises two time-division duplex (TDD) radio frequency (RF) chains operating at 1.900-1.920 GHz with 5 MHz channels and 21 dBm transmit power per antenna for an orthogonal frequency division modulated (OFDM) waveform (EURECOM has a frequency allocation for experimentation around its premises in Sophia Antipolis). The cards house a medium-scale field programmable gate array (FPGA) (Xilinx X2CV3000) which makes use of the open-source LEON3 embedded processor from Gaisler Research [16]. In the current version, the FPGA implements the interfaces with the Peripheral Component Interconnect (PCI) bus, with the RF frontend as well as with the A/D and D/A converters. The card can be connected to a host PC (in our lab we use Dell Precision M2300 laptops) using a CardBus PCI interface. See Table 1 for an overview of the card's components.
Express MIMO is a baseband processing board, which provides significantly more processing power and bandwidth than CBMIMO1 and will be used for future applications. It comprises two FPGAs: one Xilinx XC5VLX330 for real-time embedded signal processing applications [17] and one Xilinx XC5VLX110T for control. The card uses an eight-way PCI Express interface to communicate with the host PC. The card employs four high-speed A/D and D/A converters from Analog Devices (AD9832), which can drive four RF chains using quadrature modulation or eight RF chains in low intermediate frequency (IF) for bandwidths of up to 20 MHz. See Table 2 for an overview of the card's components. An RF board for Express MIMO called Agile RF is also available. It offers significantly more RF functionality in terms of tuning range and channel bandwidth than CBMIMO1. The tuning range per RF chain is 180 MHz-8 GHz with 20 MHz channels.

Software Components.
The software components are organized into four areas (folders), which correspond more or less to the different layers of the Open Systems Interconnection (OSI) reference model. The areas also correspond to the directory structure on the OpenAirInterface Subversion (SVN) server.
(b) Openair1: Baseband Signal Processing. This folder contains the code for the physical layer software modem along with RTAI/Linux device drivers and user-space tools to control the hardware. It also contains simulation environments and channel models to test the code without the hardware or to do performance simulations. Further, openair1 also provides the functionality for the Eurecom MIMO OpenAir Sounder (EMOS) to perform MIMO channel measurements over multiple users [2].
(c) Openair2: Medium-Access Protocols. This folder contains the layer 2 protocol stack development for PCs along with Linux networking device drivers for Internet Protocol (IP) and Multiprotocol Label Switching (MPLS) interconnection. This pertains to both cellular and mesh network topologies. The folder also contains an abstraction of the PHY layer, providing an efficient emulation platform for layer 2 and higher algorithms.

OpenAirMesh Specification
In this section, we present the specification of OpenAirMesh, a mesh network built using the OpenAirInterface [18]. We start off by describing the network topology in Section 3.1. In Section 3.2 we describe the layer 2 and finally in Section 3.3 the physical layer. A more detailed specification can be found in [19].

Network Topology.
In OpenAirMesh, the network is organized in clusters, where nodes can take the role of either a cluster head (CH) or a mesh router (MR). CHs are typically the best-connected nodes in a particular geographical area and manage radio resources within the cluster. MRs are used to relay information between CHs. An example of a mesh architecture with 5 nodes is shown in Figure 2.

Cluster Head.
The primary role of a CH is to manage radio resources in its cluster, much as a base-station would do in a cellular network. The cluster is defined as the set of nodes which are characterized by one-hop connectivity with the CH. The system is designed as a TDD system, where CHs and MRs transmit in alternating transmission time intervals. Thus, due to the half-duplex constraint, direct CH ↔ CH communication is not supported. The downlink (DL) signaling channels (CH → MR) allow the CH to schedule transmission of labels (in the form of time and frequency mappings on the radio resource), each of which carries a different type of traffic throughout the mesh network according to pre-defined quality-of-service (QoS) descriptors. The uplink (UL) signaling channels (MR → CH) are used for relaying bandwidth requirement indicators and channel quality measurements from nodes within the cluster. These feed the scheduling algorithms residing in the CH and allow for proper resource allocation satisfying QoS negotiations carried out using Layer 3 (L3) signaling. The latter is beyond the scope of this paper.

Mesh Router.
The primary role of an MR is to interpret the scheduling information from the CH on the downlink (DL) signaling channels in order to route the traffic corresponding to the scheduled labels on the allocated physical resources. MRs can also be connected to more than one cluster at the same time. Since all CHs transmit on the same time-frequency resources, MRs must be able to cancel interference; see Section 4.1 for details.

Layer 2 Protocol Stack.
The OpenAirMesh Layer 2 protocol stack is depicted in Figure 3 and comprises:
(i) An IP/MPLS networking device (nonaccess stratum (NAS) driver) responsible for provision of IP/MPLS layer services to Layer 2 and vice versa.
(ii) A radio resource control (RRC) entity responsible for MAC layer signaling for configuration of logical channels and retrieval of measurement information.
(iii) A radio link control (RLC) entity which is responsible for hybrid automatic repeat request (HARQ) protocols and IP packet segmentation.
(iv) A convergence protocol (PDCP) responsible for the IP interface and related functions (header compression, ciphering, etc.).
(v) A scheduling and multiplexing unit to control the media access (MAC).
The information flow is organized into different traffic queues.
(i) Radio bearers are the user-plane traffic queues at the PDCP-RLC interface.
(ii) Signaling radio bearers are the control-plane traffic queues at the RRC-RLC interface.
(iii) Logical channels are the traffic queues at the RLC-MAC interface (both control and user-plane data, see Table 3).
(iv) Transport channels are the traffic queues at the MAC-PHY interface which are mapped to physical channels by PHY (see Table 4).
The MAC layer scheduling and multiplexing entity is responsible for scheduling control plane and user plane traffic (logical channels) on the physical OFDMA resources (transport channels). It is important to note that although dedicated resources are configured at the input of the MAC layer, the physical resources allocated in the scheduling entities (with the exception of the CHBCH) are dynamically allocated every CH transmission time interval (TTI) and thus all physical resources are shared. The BCCH is multiplexed in the scheduling entity responsible for generation of the CH-BCH transport channel along with MAC-layer signaling. MAC signaling concerns both allocations of CH-SACH in the current frame and MR-SACH in the next frame (uplink, downlink and direct link map of PHY resources). The CCCH (uplink) is used exclusively during the attachment phase of the MR with a particular cluster and corresponds to the only random-access resources allocated by the CH in the frame. The DCCHs are multiplexed along with user-plane traffic DTCHs on the available CH-SACH resources. Based on measurement and feedback information, SACH scheduling (see Figure 4) aims to respect the negotiated QoS of each logical channel, while maximizing the aggregate spectral efficiency of the data streams. Different wideband scheduling policies taking into account both queuing measures from RLC and channel quality feedback can be accommodated (see, e.g., [20]). Channel quality information is signaled between corresponding MAC layers based on quantized wideband channel estimates received from PHY.
EURASIP Journal on Wireless Communications and Networking
Figure 2: The mesh network topology is organized in clusters. Each cluster is controlled by a cluster head (CH). Other nodes in the network are called mesh routers (MR) since they can be used to relay information between CHs.

Physical Layer.
Signaling strategies are included to provide the means for exploiting channel state feedback at the transmitters in order to allow for advanced PHY allocation of OFDMA resources via the MAC. In addition to the physical channels of Table 4, there are two synchronization channels (see Table 5) which are used for parameter estimation. In the following we describe the framing and channel multiplexing as well as the coding and modulation scheme.

Framing and Channel Multiplexing.
The physical resources are organized in frames of OFDM symbols. One frame consists of 64 OFDM symbols and is divided equally into a CH transmission time interval (TTI) and an MR TTI (see Figure 5). The first four symbols of the CH TTI are reserved for pilot symbols. Each CH transmits one common pilot symbol (CHSCH$_0$) at position 0 and one dedicated pilot symbol (CHSCH$_i$) at position $i \in \{1, 2, 3\}$. This way, we can ensure orthogonality between the pilots of different CHs received at one MR. The pilot symbols are followed by the broadcast channel (CH-BCH), which contains 128 data subcarriers and 32 pilot subcarriers. The remaining 20 OFDM symbols of the CH TTI are divided into 16 resource blocks (RB), which constitute the multiplexed scheduled access channels (CH-SACH). Each RB contains 8 subcarriers for data and 2 pilot subcarriers (one for each CH), which are used for frequency offset compensation.
The MR TTI contains the random access channel (MR-RACH) with an associated pilot symbol (SCH$_0$). The next two symbols are reserved for pilots. Each MR transmits a pilot symbol SCH$_i$, $i \in \{1, 2\}$, corresponding to the cluster it belongs to. This way we can ensure orthogonality between the pilots of different CHs. The pilot symbols are followed by the uplink broadcast channel (MR-BCH) with an associated pilot symbol (MRSCH). The rest of the uplink frame contains the multiplexed scheduled access channels (MR-SACH). The ends of the CH and MR TTIs are protected by a guard interval of two symbols. All pilots are designed for MIMO and/or multiuser channel estimation at the corresponding end.
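The numbers given for the CH TTI can be cross-checked with a short sketch. The constants below come from the text; the length of the CH-BCH in OFDM symbols is not stated and is inferred here as the remainder of the TTI, which is an assumption of this sketch. The function names are hypothetical and not part of the OpenAirInterface code base.

```python
# Frame-structure constants taken from the text.
SYMBOLS_PER_FRAME = 64
CH_TTI_SYMBOLS = SYMBOLS_PER_FRAME // 2  # frame is split equally into CH and MR TTIs
PILOT_SYMBOLS = 4                        # CHSCH_0 ... CHSCH_3
SACH_SYMBOLS = 20
GUARD_SYMBOLS = 2
RESOURCE_BLOCKS = 16
DATA_PER_RB, PILOTS_PER_RB = 8, 2


def ch_tti_layout():
    """Return (name, first_symbol, length) tuples for the CH TTI.

    Assumption: the CH-BCH occupies whatever is left after the pilots,
    the CH-SACH symbols and the guard interval.
    """
    bch_symbols = CH_TTI_SYMBOLS - PILOT_SYMBOLS - SACH_SYMBOLS - GUARD_SYMBOLS
    parts = [("CHSCH", PILOT_SYMBOLS), ("CH-BCH", bch_symbols),
             ("CH-SACH", SACH_SYMBOLS), ("guard", GUARD_SYMBOLS)]
    layout, start = [], 0
    for name, length in parts:
        layout.append((name, start, length))
        start += length
    return layout


def useful_subcarriers():
    """Subcarriers covered by the resource blocks of one SACH symbol."""
    return RESOURCE_BLOCKS * (DATA_PER_RB + PILOTS_PER_RB)
```

Under these assumptions the 16 resource blocks of 10 subcarriers each cover the same 160 subcarriers as the 128 data and 32 pilot subcarriers of the CH-BCH.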

Coding and Modulation.
OpenAirMesh makes use of punctured binary codes (64-state rate 1/2 convolutional or 8-state rate 1/3 3GPP/LTE Turbo code). Puncturing can use either 3GPP rate matching or random puncturing in order to fine-tune the coding rate to adapt to configurable transport block sizes delivered to PHY by the MAC. The overall coding sub-system is shown in Figure 6. New transport blocks arriving from the MAC layer (based on multiuser scheduling) are coded using a CRC extension and the chosen binary code. These are then fed to the active transport block buffer along with those that are to be retransmitted. Each transmitted block is punctured and then passed to a bit interleaver and modulation mapper (bit-interleaved coded modulation, BICM). OpenAirMesh supports QPSK, 16-QAM and 64-QAM modulation. The transmitted transport blocks can be split into two spatial streams in the case of point-to-point MIMO transmission.
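Random puncturing can be illustrated with a minimal sketch. It assumes that transmitter and receiver derive the same pseudo-random pattern from a shared seed; the text does not specify how the pattern is generated, so the seed mechanism and all function names are hypothetical.

```python
import random


def puncture_pattern(n_coded, n_out, seed=0):
    """Sorted indices of the coded bits that are kept (assumed shared seed)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_coded), n_out))


def puncture(coded_bits, pattern):
    """Keep only the coded bits selected by the pattern."""
    return [coded_bits[i] for i in pattern]


def depuncture(llrs, pattern, n_coded):
    """Re-insert erasures (LLR 0.0) at the punctured positions."""
    out = [0.0] * n_coded
    for llr, i in zip(llrs, pattern):
        out[i] = llr
    return out
```

For example, 12 information bits encoded at rate 1/2 give 24 coded bits; keeping 16 of them yields an effective rate of 3/4, and the decoder sees the 8 punctured positions as erasures.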
The modulated symbols are then multiplied by an adjustable amplitude and passed to the space-time-frequency (STF) parser. The STF parser multiplexes the pilot symbols and the data symbols into OFDM symbols, taking into account the sub-band allocation from the scheduler. In the case of one available spatial stream the STF parser also performs fast antenna cycling, that is, every subcarrier is transmitted from a different antenna. This way each stream can access all the degrees of freedom of the channel. In the case of two spatial streams the STF parser guarantees that both streams use different antennas in the same time/frequency dimension. This is a form of superposition coding since the two streams are combined additively in the air through the use of multiple transmit antennas. Last but not least the symbols are transformed to the time domain using an IFFT and a cyclic prefix is appended.
This design makes it possible to use the same transmitter and receiver structure for both point-to-point MIMO and distributed MIMO transmission. In the latter case one spatial stream is used at each source and the second stream originates in another part of the network, either in the same cluster or an adjacent cluster. A particular user can decode both streams or simply select the one it requires. In Section 4.1 we derive a low-complexity successive interference cancellation (SIC) receiver for this design.
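The antenna mapping performed by the STF parser can be sketched as follows. The text only states that a single stream cycles over the antennas from subcarrier to subcarrier and that two streams never share an antenna on the same subcarrier; the concrete modulo rule below is an assumption that satisfies both properties, and the function names are hypothetical.

```python
def antenna_for(subcarrier, stream=0, n_antennas=2):
    """Antenna index used by `stream` on `subcarrier` (assumed cycling rule)."""
    return (subcarrier + stream) % n_antennas


def map_to_antennas(symbols_per_stream, n_antennas=2):
    """Place the symbols of each stream onto the antenna grid.

    symbols_per_stream: list of equal-length lists of complex symbols,
    one list per spatial stream. Returns grid[antenna][subcarrier];
    entries that carry no symbol are zero.
    """
    n_subcarriers = len(symbols_per_stream[0])
    grid = [[0j] * n_subcarriers for _ in range(n_antennas)]
    for stream, symbols in enumerate(symbols_per_stream):
        for q, x in enumerate(symbols):
            grid[antenna_for(q, stream, n_antennas)][q] = x
    return grid
```

With one stream, exactly one antenna is active per subcarrier, which is what allows the receiver to treat the channel of each CH as a single equivalent vector per subcarrier.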

Implementation of OpenAirMesh
In this section, we show how OpenAirMesh as specified in Section 3 can be implemented as a single-frequency network. The solution makes use of a distributed network synchronization procedure and a dual-stream MIMO interference cancellation receiver. In this section we describe these novel solutions and their implementation on OpenAirInterface. We present results from simulations and field trials in Section 5.
The implementation is based on the CBMIMO1 hardware (cf. Section 2.1) and is thus restricted to two antennas. Therefore we can process up to two spatial streams coming from two different CHs. The extension of the receiver structure to more antennas can be found in [21].

Dual-Stream MIMO Receiver Architecture.
In this section, after the general overview of the receiver structure in Section 4.1.1, we describe two different dual-stream multiantenna demodulators, namely a linear minimum mean squared error (MMSE) receiver (see Section 4.1.3) and an approximate maximum likelihood receiver (see Section 4.1.4) [22][23][24]. The derivations are based on the signal model presented in Section 4.1.2.

Receiver Architecture.
The overall receiver structure is shown in Figure 7. After the CP removal and the FFT, a channel estimation based on the least squares algorithm is performed for CH $i$ using the corresponding synchronization symbol (CHSCH$_i$). Further, the frequency offset is estimated using the pilot subcarriers in the CHBCH. The main block of the receiver is either the spatial MMSE filter or the reduced complexity max-log MAP receiver described in the following subsections. Finally we perform inverse bit interleaving and Viterbi decoding.

Signal Model.
Consider the scenario depicted in Figure 2 with two clusterheads but only one MR (MR2). We assume that each CH has $n_t$ antennas and MR2 has $n_r$ antennas. Let $\mathbf{x}^{(j)}_{m,q}$ denote the $n_t \times 1$ vector of the transmit symbols for subcarrier $q$ of OFDM symbol $m$ of CH $j$, $j = 1, 2$. We assume that the transmit symbols are taken from a signal set $\chi_j \subseteq \mathbb{C}$ of size $|\chi_j| = M_j$ with a Gray labeling map $\mu_j : \{0, 1\}^{\log_2 M_j} \to \chi_j$.
Cascading the IFFT and the CP extension at the CHs and the FFT and the CP removal at MR2, the received signal at MR2 at the $q$th frequency tone and the $m$th OFDM symbol can be expressed as
$$\mathbf{y}_{m,q} = e^{2\pi j \varphi_1 m}\, \mathbf{H}^{(1)}_q \mathbf{x}^{(1)}_{m,q} + e^{2\pi j \varphi_2 m}\, \mathbf{H}^{(2)}_q \mathbf{x}^{(2)}_{m,q} + \mathbf{z}_{m,q}, \qquad (1)$$
where $\mathbf{H}^{(1)}_q$ and $\mathbf{H}^{(2)}_q$ denote the $n_r \times n_t$ MIMO channel between CH1 and MR2 and between CH2 and MR2. The channel is assumed to be frequency selective (i.e., it varies with subcarrier index $q$) and block fading (i.e., constant over the OFDM symbols of a frame). $\varphi_1$ is the frequency offset between CH1 and MR2 and $\varphi_2$ is the one between CH2 and MR2 in radians (to convert them to Hertz, multiply by the OFDM symbol rate). $\mathbf{z}_{m,q} \in \mathbb{C}^{n_r}$ is the vector of circularly symmetric complex white Gaussian noise of variance $\sigma^2$.
Since each clusterhead transmits only one spatial stream and antenna cycling is used, only one element of $\mathbf{x}^{(j)}_{m,q}$, $j = 1, 2$, is nonzero for every $m$ and $q$. We identify this nonzero element with the scalar $x^{(j)}_{m,q}$, $j = 1, 2$, and can rewrite (1) equivalently as
$$\mathbf{y}_{m,q} = e^{2\pi j \varphi_1 m}\, \mathbf{h}^{(1)}_q x^{(1)}_{m,q} + e^{2\pi j \varphi_2 m}\, \mathbf{h}^{(2)}_q x^{(2)}_{m,q} + \mathbf{z}_{m,q}, \qquad (2)$$
where $\mathbf{h}^{(1)}_q$ and $\mathbf{h}^{(2)}_q$ are the equivalent channel vectors for the nonzero elements. The complex symbols $x^{(1)}_{m,q}$, $x^{(2)}_{m,q}$ of the two streams are assumed to be independent and of variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Assuming that the first stream is the desired stream, the signal to noise ratio (SNR) is given by $\sigma_1^2/\sigma^2$ and the signal to interference ratio (SIR) by $\sigma_1^2/\sigma_2^2$.

MMSE Receiver.
Linear spatial filters such as minimum mean square error (MMSE) and zero forcing (ZF) filters can be used to suppress interference: the MMSE filter minimizes the level of interference, while the ZF filter nulls it out. Linear MMSE filters exhibit better performance compared to ZF ones and are thus being considered as favorable candidates for future wireless systems [25,26]. However, it is well known that MMSE detection for non-Gaussian alphabets in low dimensional systems (low number of interferers) is sub-optimal [27] and moreover MMSE detection cannot exploit the interference structure. The frequency domain MMSE filter $\mathbf{M}_q$ is given as
$$\mathbf{M}_q = \left(\mathbf{H}_q^H \mathbf{H}_q + \sigma^2 \mathbf{P}^{-1}\right)^{-1} \mathbf{H}_q^H, \qquad (3)$$
where $\mathbf{P}$ is the diagonal power distribution matrix with diagonal $[\sigma_1^2, \sigma_2^2]$ and $\mathbf{H}_q = [\mathbf{h}^{(1)}_q\ \mathbf{h}^{(2)}_q]$. The estimates of the transmitted symbols $\hat{\mathbf{x}}_{m,q} = [\hat{x}^{(1)}_{m,q}, \hat{x}^{(2)}_{m,q}]^T$ are computed in three steps. Firstly, the frequency offset is compensated by computing $\mathbf{y}'_{m,q} = e^{-2\pi j \varphi_1 m}\, \mathbf{y}_{m,q}$. Secondly, the spatial filter $\mathbf{M}_q$ is applied by computing $\tilde{\mathbf{x}}_{m,q} = \mathbf{M}_q \mathbf{y}'_{m,q}$. Finally, an unbiasing operation is performed on the filter output of each stream to obtain $\hat{\mathbf{x}}_{m,q}$. Post detection interference is assumed to be Gaussian, which on the one hand reduces the computational complexity but on the other hand adds to the sub-optimality of MMSE detection. MMSE pre-processing decouples the spatial streams, and the bit metric for the $i$th bit with bit value $b$ of the symbol $x^{(k)}_{m,q}$ on the $k$th stream is computed over $\chi^i_{k,b}$, which denotes the subset of the signal set $\chi_k$ whose labels have the value $b \in \{0, 1\}$ in position $i$. Based on these bit metrics, bit log likelihood ratios (LLRs) are calculated which after de-interleaving are passed to the channel decoder.
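The first two detection steps can be sketched in floating point for the 2 × 2 case. The sketch assumes the regularized form $\mathbf{M}_q = (\mathbf{H}_q^H \mathbf{H}_q + \sigma^2 \mathbf{P}^{-1})^{-1} \mathbf{H}_q^H$, which matches the adjugate-based expression used in the implementation notes; the unbiasing step is left out, and all function names are hypothetical. The real-time modem uses fixed-point arithmetic instead.

```python
import cmath


def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mmse_filter(h1, h2, sigma2, p1=1.0, p2=1.0):
    """Return M = (H^H H + sigma2 P^-1)^-1 H^H for H = [h1 h2]."""
    H = [[h1[0], h2[0]], [h1[1], h2[1]]]
    Hh = [[H[j][i].conjugate() for j in range(2)] for i in range(2)]
    A = matmul2(Hh, H)
    A[0][0] += sigma2 / p1
    A[1][1] += sigma2 / p2
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # 2x2 inverse via the adjugate: A^-1 = adj(A) / det(A)
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    return matmul2(A_inv, Hh)


def mmse_detect(y, h1, h2, sigma2, phi1, m):
    """Compensate the offset of stream 1, then apply the spatial filter."""
    rot = cmath.exp(-2j * cmath.pi * phi1 * m)
    y_comp = [rot * v for v in y]
    M = mmse_filter(h1, h2, sigma2)
    return [M[k][0] * y_comp[0] + M[k][1] * y_comp[1] for k in range(2)]
```

At high SNR the filter approaches the channel inverse, so a noise-free observation with a common frequency offset is recovered almost exactly; with different offsets on the two streams a residual rotation remains on the second stream.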
Implementation. The core of the MMSE receiver is the matrix inversion that is needed to calculate the filter $\mathbf{M}_q$. The matrix $\mathbf{A}_q = \mathbf{H}_q^H \mathbf{H}_q + \sigma^2 \mathbf{P}^{-1}$ needs to be calculated and inverted for every subcarrier and every frame. Since we are limited to a 2 × 2 MIMO system, the matrix inversion can be calculated directly using Cramer's rule $\mathbf{A}_q^{-1} = (1/\det(\mathbf{A}_q))\,\mathrm{adj}(\mathbf{A}_q)$, where $\mathrm{adj}(\mathbf{A}_q)$ denotes the adjugate (classical adjoint) of the matrix $\mathbf{A}_q$. Care has to be taken to properly scale the intermediate results in a fixed-point implementation. The entries of the channel matrices $\mathbf{H}_q$ are stored in signed 16 bit-wide variables, but their resolution is limited to 14 bit due to the A/D converters. Since the calculation of the determinant $\det(\mathbf{A}_q)$ involves terms up to the fourth power, the dynamic range of the determinant can reach up to 48 bit. In order to handle this high dynamic range we first use a 64 bit-wide variable to calculate $\det(\mathbf{A}_q)$, thus not losing any accuracy. This intermediate result is then shifted such that $\max_q \det(\mathbf{A}_q)$ uses 16 bits.
In order to calculate the inverse of $\det(\mathbf{A}_q)$, we interpret all numbers as fractional Q15 numbers and use standard fixed-point arithmetic. Intermediate results are stored in double precision. The inverse is scaled back by the mean (over all subcarriers) of $\det(\mathbf{A}_q)$ and saturated to 16 bit. Finally, the MMSE filter matrix is calculated according to $\mathbf{M}_q = (1/\det(\mathbf{A}_q))\,\mathrm{adj}(\mathbf{A}_q)\mathbf{H}_q^H$, always scaling the intermediate results to 16 bits.
The high dynamic range of the determinant can cause severe problems. Especially in a frequency selective channel its inverse may saturate on some carriers and can be zero on some other frequencies. This is one of the reasons why the MMSE receiver has a worse performance than the max-log MAP detector described in the next subsection.
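The dynamic range problem can be made concrete with a small integer sketch. It uses the fact that $\mathbf{A}_q$ is Hermitian, so its determinant is real. Python integers stand in for the 64-bit accumulator of the C implementation; the common shift and the reciprocal scaling below are assumptions chosen for illustration and are not taken from the OpenAirInterface source.

```python
def det_hermitian_2x2(a00, a11, a01_re, a01_im):
    """Determinant of a Hermitian 2x2 matrix with integer entries.

    Python integers do not overflow; in C this would need a 64-bit
    accumulator.
    """
    return a00 * a11 - (a01_re * a01_re + a01_im * a01_im)


def common_shift(dets, target_bits=15):
    """Right shift after which the largest determinant fits in target_bits."""
    return max(0, max(dets).bit_length() - target_bits)


def saturate16(value):
    """Clip to the range of a signed 16-bit word."""
    return max(-32768, min(32767, value))


def scaled_reciprocal_q15(det16, mean16):
    """mean16 / det16 as a Q15 number, saturated (hypothetical scaling)."""
    if det16 == 0:
        return 32767  # division by zero is replaced by saturation
    return saturate16((mean16 << 15) // det16)
```

With two subcarriers whose determinants differ by 20 bits, the common shift maps the weaker one to zero, and its reciprocal then saturates. This is the kind of behavior on frequency selective channels that the text describes.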

Low Complexity max-log MAP Detector.
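This detector is a low complexity version of the max-log MAP detector and is based on the matched filter outputs [21]. Its low complexity is based on the reduction of one complex dimension. Instead of attenuating the interference, this detector exploits its structure and mitigates its effect. Without loss of generality, consider the first stream to be the desired stream.
Contrary to the MMSE detection, we do not compensate the frequency offsets in the received signal, but instead integrate them into the channel estimates. We therefore define $\mathbf{h}^{(k)}_{m,q} = \mathbf{h}^{(k)}_q e^{2\pi j \varphi_k m}$, $k = 1, 2$. For clarity we omit the subindices $m, q$ in the following derivation and write $\mathbf{h}_k$ and $x_k$. Up to a constant scaling and the term $\|\mathbf{y}\|^2$, which does not depend on the symbols, the max-log MAP bit metric for the $i$th bit with value $b$ of the desired stream $x_1$ is given as [28]
$$\Lambda^i_1(\mathbf{y}, b) \approx \min_{x_1 \in \chi^i_{1,b},\, x_2 \in \chi_2} \left\{ \|\mathbf{h}_1\|^2 |x_1|^2 + \|\mathbf{h}_2\|^2 |x_2|^2 - 2 (y_1^* x_1)_R + 2 (p_{12} x_1^* x_2)_R - 2 (y_2^* x_2)_R \right\}, \qquad (6)$$
where $y_1 = \mathbf{h}_1^H \mathbf{y}$ and $y_2 = \mathbf{h}_2^H \mathbf{y}$ are the matched filter outputs for the first and the second stream and $p_{12} = \mathbf{h}_1^H \mathbf{h}_2$ is the cross correlation between the first and the second channel. Note that the subscript $(\cdot)_R$ indicates the real part. Writing terms in their real and imaginary parts, we have
$$\Lambda^i_1(\mathbf{y}, b) \approx \min_{x_1 \in \chi^i_{1,b},\, x_2 \in \chi_2} \left\{ \|\mathbf{h}_1\|^2 |x_1|^2 + \|\mathbf{h}_2\|^2 |x_2|^2 - 2 (y_{1,R} x_{1,R} + y_{1,I} x_{1,I}) + \left[ 2 (p_{12,R} x_{1,R} + p_{12,I} x_{1,I}) - 2 y_{2,R} \right] x_{2,R} + \left[ 2 (p_{12,R} x_{1,I} - p_{12,I} x_{1,R}) - 2 y_{2,I} \right] x_{2,I} \right\}, \qquad (7)$$
where $(\cdot)_I$ indicates the imaginary part. For $x_2$ belonging to an equal energy alphabet (such as QPSK), the values of $x_{2,R}$ and $x_{2,I}$ which minimize (7) need to be in the opposite directions of $2(p_{12,R} x_{1,R} + p_{12,I} x_{1,I}) - 2y_{2,R}$ and $2(p_{12,R} x_{1,I} - p_{12,I} x_{1,R}) - 2y_{2,I}$, respectively, thereby avoiding a search over the alphabet of $x_2$ and reducing the system by one complex dimension. The bit metric is therefore written as
$$\Lambda^i_1(\mathbf{y}, b) \approx \min_{x_1 \in \chi^i_{1,b}} \left\{ \|\mathbf{h}_1\|^2 |x_1|^2 + \|\mathbf{h}_2\|^2 |x_2|^2 - 2 (y_{1,R} x_{1,R} + y_{1,I} x_{1,I}) - \left| 2 (p_{12,R} x_{1,R} + p_{12,I} x_{1,I}) - 2 y_{2,R} \right| |x_{2,R}| - \left| 2 (p_{12,R} x_{1,I} - p_{12,I} x_{1,R}) - 2 y_{2,I} \right| |x_{2,I}| \right\}, \qquad (8)$$
where $|x_{2,R}|$ and $|x_{2,I}|$ are fixed by the alphabet. For non-equal energy alphabets (such as 16-QAM), the minimization over $x_2$ is that of a quadratic function, again trimming one complex dimension of the system. In that case, the real and imaginary parts of $x_2$ which minimize (6) are given as
$$x_{2,R} \to -\frac{(p_{12,R} x_{1,R} + p_{12,I} x_{1,I}) - y_{2,R}}{\|\mathbf{h}_2\|^2}, \qquad x_{2,I} \to -\frac{(p_{12,R} x_{1,I} - p_{12,I} x_{1,R}) - y_{2,I}}{\|\mathbf{h}_2\|^2}, \qquad (9)$$
where $\to$ indicates the quantization process in which, amongst the finite available points, the point closest to the calculated continuous value is selected. The reduced complexity max-log MAP detector has a much lower complexity than the MMSE receiver [22]. Furthermore, it can be implemented without any division and therefore it is numerically more stable than the MMSE receiver.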
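The dimension reduction for an equal energy interferer can be checked numerically. The sketch below compares an exhaustive search over $x_2$ with the closed-form minimization for unit-energy QPSK. It works with the full squared distance $\|\mathbf{y} - \mathbf{h}_1 x_1 - \mathbf{h}_2 x_2\|^2$, so the term $\|\mathbf{y}\|^2$ is kept to make the two values directly comparable. Function names and test values are hypothetical.

```python
import math

# Unit-energy QPSK alphabet.
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def inner(u, v):
    """u^H v for two complex vectors given as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def metric_full(y, h1, h2, x1):
    """min over x2 of ||y - h1 x1 - h2 x2||^2 by exhaustive search."""
    return min(sum(abs(yr - a * x1 - b * x2) ** 2
                   for yr, a, b in zip(y, h1, h2))
               for x2 in QPSK)


def metric_reduced(y, h1, h2, x1):
    """Same minimum without a search over x2 (QPSK interferer only)."""
    y1, y2, p12 = inner(h1, y), inner(h2, y), inner(h1, h2)
    n1, n2 = inner(h1, h1).real, inner(h2, h2).real
    c_re = 2 * (p12.real * x1.real + p12.imag * x1.imag) - 2 * y2.real
    c_im = 2 * (p12.real * x1.imag - p12.imag * x1.real) - 2 * y2.imag
    amp = 1 / math.sqrt(2)  # |x2_R| = |x2_I| for unit-energy QPSK
    return (inner(y, y).real + n1 * abs(x1) ** 2 + n2 * 1.0
            - 2 * (y1.real * x1.real + y1.imag * x1.imag)
            - abs(c_re) * amp - abs(c_im) * amp)
```

The reduced metric needs one evaluation per candidate $x_1$ instead of one per pair $(x_1, x_2)$, which is where the saving of one complex dimension comes from.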

Network Synchronization.
The dual-stream MIMO receiver described above requires timing and frequency synchronization. It has to be assured that the transmit frames are aligned and that the carrier frequency offsets between different nodes are small. The timing error, that is, the time difference between the signals arriving from the two different CHs, has to be smaller than the CP length of the OFDM system. Although carrier frequency offsets are compensated in the receiver, large frequency offsets cause intercarrier interference and thus degrade the performance of the receiver. We will evaluate the maximum allowable frequency offset in Section 5 by simulation.
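The two requirements can be made concrete with a small calculation. The sketch below is only an illustration: the sample rate, FFT size and CP length are hypothetical values chosen for the example and are not taken from the text, and the function names are likewise made up.

```python
# Hypothetical OFDM numerology, for illustration only.
SAMPLE_RATE_HZ = 7.68e6
FFT_SIZE = 256
CP_SAMPLES = 64


def timing_ok(delay_diff_s, cp_samples=CP_SAMPLES, sample_rate_hz=SAMPLE_RATE_HZ):
    """True if the arrival-time difference of the two CH signals fits in the CP."""
    return abs(delay_diff_s) * sample_rate_hz < cp_samples


def relative_cfo(offset_hz, sample_rate_hz=SAMPLE_RATE_HZ, fft_size=FFT_SIZE):
    """Carrier frequency offset as a fraction of the subcarrier spacing."""
    subcarrier_spacing_hz = sample_rate_hz / fft_size
    return offset_hz / subcarrier_spacing_hz


print(timing_ok(1e-6))      # a 1 microsecond difference fits in this CP
print(relative_cfo(500.0))  # offset relative to the subcarrier spacing
```

The smaller the relative offset, the less intercarrier interference is to be expected; which value is still tolerable is exactly what the simulations are meant to determine.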
Timing synchronization can be achieved by using high-accuracy reference clocks, such as rubidium oscillators or global positioning system (GPS) receivers. However, rubidium oscillators are very expensive (in comparison with the other components of the receiver) and very large. GPS receivers, on the other hand, are not able to operate indoors. Therefore we take a distributed network synchronization approach.
In nature, a distributed synchronization scheme can be observed in the flashing of fireflies [29]. Recently, this nature-inspired scheme has been applied to synchronization in wireless networks [30][31][32][33]. However, most of these works consider the synchronization problem in isolation and neglect the actual data communication. The pulse-coupled oscillator model (the model inspired by firefly synchronization) assumes that a node listens to all other nodes except during its own transmission of the synchronization pulse and immediately afterwards (the refractory period). Therefore data transmission can only take place in the refractory period. However, this period must not be very long because otherwise the system becomes unstable [32].
In OpenAirMesh, we follow an approach similar to that in [3] for distributed timing synchronization. It is based on the two physical channels CHSCH and MRSCH (see Table 5), which are transmitted in alternating TTIs by the CHs and the MRs, respectively. Initially we declare one CH to be the primary CH, which is the reference clock in the system. The primary CH continuously sends out a synchronization signal (the CHSCH) that allows every MR within the CH's broadcast region to synchronize to the network. As soon as an MR is synchronized (i.e., when it can detect the CHBCH successfully), it sends out a synchronization signal itself (the MRSCH). A secondary CH that is not within the broadcast domain of the primary CH can use the MRSCH to synchronize to the network. As soon as a secondary CH is synchronized to the network, it also sends out a CHSCH, allowing further MRs to synchronize, and so on. A positive side effect of this method is that several MRs form a distributed antenna array when sending out the MRSCH. This means that the secondary CH can benefit from this array gain when detecting the MRSCH.
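The order in which nodes join the network under this scheme can be sketched as a breadth-first traversal starting at the primary CH. The code below is a simplified illustration, not the OpenAirMesh implementation: it ignores detection failures and timing drift, assumes symmetric links that exist only between CHs and MRs, and uses a hypothetical five-node connectivity map.

```python
from collections import deque


def sync_order(links, primary):
    """Breadth-first spread of synchronization starting at the primary CH.

    links maps each node to the nodes whose synchronization signal it can
    hear. Returns {node: hops}; even hops are CHs sending the CHSCH, odd
    hops are MRs sending the MRSCH.
    """
    hops = {primary: 0}
    queue = deque([primary])
    while queue:
        node = queue.popleft()
        for peer in links.get(node, ()):
            if peer not in hops:
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops


# Hypothetical connectivity: MR2 hears both CHs, MR1 only CH1, MR3 only CH2.
links = {
    "CH1": ["MR1", "MR2"],
    "CH2": ["MR2", "MR3"],
    "MR1": ["CH1"],
    "MR2": ["CH1", "CH2"],
    "MR3": ["CH2"],
}
print(sync_order(links, "CH1"))
# {'CH1': 0, 'MR1': 1, 'MR2': 1, 'CH2': 2, 'MR3': 3}
```

Nodes that never appear in the result are outside the reach of the primary CH and would remain unsynchronized.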
For carrier phase synchronization, we use off-line calibration prior to system deployment. However, the granularity of the calibration on the CBMIMO1 cards is on the order of 500 Hz, which leaves residual frequency offsets.

Integration.
The dual-stream MIMO receiver and the distributed network synchronization procedure described above enable the implementation of a single-frequency mesh network. Since all CHs are synchronized and transmit on the same frequency, MRs that are in the broadcast domain of two such clusters must use the dual-stream MIMO receiver to decode the CHBCHs of both CHs concurrently. The receiver can also be used for the SACH (both in the uplink and in the downlink), allowing the CHs to schedule their resources independently and thus significantly reducing the signalling overhead. For the downlink SACH, the receiver is used in exactly the same way as for the CHBCH. On the uplink, the MR transmits two independent data streams as described in Section 3.3.2, and each CH decodes only the stream intended for it, treating the other one as interference.

Experiments and Results
In this section, we investigate the performance of the two dual-stream receiver structures described in the previous section. First, in Section 5.1, we perform computer simulations of the two receiver structures using a simple synthetic channel model. Second, in Section 5.2, we present performance results from the real-time implementation on the OpenAirInterface platform. Finally, in Section 5.3, we present field trial experiments that were conducted within the CHORIST project (http://www.chorist.eu/) close to Barcelona, Spain, in February 2009. All performance comparisons (both for simulation and lab tests) were done using the broadcast channel (BCH) of the primary clusterhead (CH1) with interference from the BCH of the secondary clusterhead (CH2). The BCH uses QPSK modulation and a rate-1/2 convolutional code. The block length is 1056 bits, which corresponds to 8 OFDM symbols with 132 data subcarriers each. We use 2 antennas on all nodes.
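The stated block length can be cross-checked with a short calculation. This sketch reads the block length as the number of bits entering the encoder and ignores any tail or CRC bits, which is an assumption; the variable names are illustrative.

```python
from fractions import Fraction

OFDM_SYMBOLS = 8
DATA_SUBCARRIERS = 132
BITS_PER_QPSK_SYMBOL = 2
CODE_RATE = Fraction(1, 2)

resource_elements = OFDM_SYMBOLS * DATA_SUBCARRIERS       # modulation symbols per block
coded_bits = resource_elements * BITS_PER_QPSK_SYMBOL     # bits after the encoder
info_bits = int(coded_bits * CODE_RATE)                   # bits before the encoder

print(resource_elements, coded_bits, info_bits)  # 1056 2112 1056
```

With QPSK and a rate-1/2 code, each data subcarrier carries one uncoded bit, which is why the block length equals the number of data subcarriers in the block.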

Computer Simulations.
In the computer simulations we isolate and study the effect of the following phenomena on the performance of the receiver: (i) channel state information at the receiver (CSIR), (ii) frequency selective fading versus frequency flat fading, (iii) Rayleigh fading versus Ricean fading, (iv) receive antenna correlation, and (v) frequency offsets.

Channel Model.
For the simulations the 2 × 2 MIMO channel matrices $H^{(1)}_q$ and $H^{(2)}_q$ are modeled as spatially white and independent. The channel is assumed to be constant during a block and to vary independently between blocks. We use both a frequency flat fading model and a frequency selective model. In the frequency flat case the channel matrices stay constant over all subcarriers $q$, with channel coefficients drawn from a Rayleigh distribution with unit variance. In the frequency selective case we model the channel as a tapped delay line with 8 sample-spaced taps and an exponentially decaying power delay profile. Each tap undergoes Rayleigh fading. If a line-of-sight (LOS) component is present, the first tap undergoes Ricean fading. Receive correlation is modeled by multiplying (from the left) the MIMO channel matrices with the square root of the receive correlation matrix $R_{\mathrm{Rx}} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.
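A channel realization of this kind could be drawn as in the following sketch. It is not the simulator used for the results: it assumes a real-valued ρ, a hypothetical decay constant for the power delay profile, unit total power, and an all-ones LOS component on the first tap, and all function names are made up for the example.

```python
import math
import random


def sqrt_rx_correlation(rho):
    """Square root of [[1, rho], [rho, 1]] for real rho with |rho| <= 1."""
    s, t = math.sqrt(1 + rho), math.sqrt(1 - rho)
    a, b = (s + t) / 2, (s - t) / 2
    return [[a, b], [b, a]]


def exp_pdp(num_taps=8, decay=1.0):
    """Exponentially decaying tap powers, normalized to sum to one."""
    raw = [math.exp(-decay * tap) for tap in range(num_taps)]
    total = sum(raw)
    return [p / total for p in raw]


def cgauss(var, rng):
    """Circularly symmetric complex Gaussian sample with variance var."""
    sigma = math.sqrt(var / 2)
    return complex(rng.gauss(0, sigma), rng.gauss(0, sigma))


def draw_channel(rho=0.0, k_factor=0.0, rng=random):
    """One realization: a list of 8 taps, each a 2x2 matrix (rows = Rx antennas)."""
    root = sqrt_rx_correlation(rho)
    taps = []
    for tap, power in enumerate(exp_pdp()):
        # Only the first tap is Ricean; k_factor = 0 gives pure Rayleigh fading.
        k = k_factor if tap == 0 else 0.0
        los = math.sqrt(power * k / (k + 1))
        w = [[los + cgauss(power / (k + 1), rng) for _ in range(2)]
             for _ in range(2)]
        # Left-multiply by the square root of the receive correlation matrix.
        taps.append([[sum(root[i][m] * w[m][j] for m in range(2))
                      for j in range(2)] for i in range(2)])
    return taps


channel = draw_channel(rho=0.75, k_factor=10.0, rng=random.Random(0))
print(len(channel), len(channel[0]), len(channel[0][0]))  # 8 2 2
```

The frequency flat case corresponds to keeping a single tap; the closed form in `sqrt_rx_correlation` avoids a general matrix square root because the 2 × 2 correlation matrix has the fixed eigenvectors (1, 1) and (1, -1).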

Simulation Results.
The two receiver structures described in Section 4.1 were implemented in fixed-point C. The simulation model follows the model (1) with the difference that the C simulator includes the IFFT and CP insertion at the transmitter and the corresponding FFT and CP removal at the receiver. The channel is thus simulated in the time domain rather than in the frequency domain. Further, we can simulate carrier frequency offsets. By not adding any noise on the pilot symbols we can also simulate the case of perfect CSIR. This allows us to study the impact of imperfect channel estimates on the two receiver structures.
We perform Monte Carlo simulations with the MMSE receiver and the max-log MAP receiver. We fix the SNR for the first stream and vary the interference from the second stream on the y-axis. Each figure shows the frame error rates of the first stream for both receivers and several SNR values. We only show a representative subset of the simulation results.
(i) Figure 8 shows results for the frequency flat Rayleigh fading channel with perfect CSIR.
(ii) Figures 9, 10, and 11 show results for the frequency flat and the frequency selective Rayleigh fading channel as well as for a frequency selective Ricean fading channel with a Ricean K-factor of K = 10.
(iii) Figure 12 shows results for a frequency flat Rayleigh fading channel with a receive antenna correlation of |ρ| = 0.75.
(iv) Figure 13 shows results for the frequency selective channel, where CH2 has a frequency offset of 1500 Hz w.r.t. the receiver.
We can make the following observations. First of all, the max-log MAP receiver always performs better than the MMSE receiver. Together with the fact that the max-log MAP receiver actually has lower complexity than the MMSE receiver [22], it is clearly the first choice for such a system. Also note that the performance of the max-log MAP receiver actually improves when the interference gets stronger. Channel estimation errors have a stronger impact on the max-log MAP receiver than on the MMSE receiver. On the other hand, it can be seen that in a frequency selective channel the max-log MAP receiver profits most from the additional diversity, while the performance of the MMSE receiver hardly improves. This is due to the fact that the max-log MAP receiver has full diversity gain while the MMSE receiver loses one order of diversity [34]. In a LOS Ricean fading channel, the performance of the max-log MAP receiver is better than in a Rayleigh fading channel only if the power of the interference is either stronger or weaker than that of the desired signal. If the powers are similar, a LOS component is not beneficial for the performance. Frequency offsets also have a very strong negative impact on the performance of the system. In fact, as the interference gets stronger, the max-log MAP receiver is not able to cancel out the interference as well as in the case with no frequency offsets. Finally, receive correlation also degrades the performance of the system.

Test Setup.
The dual stream receiver was tested in the lab using an extended version of the Eurecom MIMO OpenAir Sounder (EMOS) [2]. The EMOS can be seen as a stand-alone version of the physical layer of the OpenAirMesh testbed. Only the synchronization symbols (CHSCH, MRSCH) and the broadcast channels (CHBCH, MRBCH) are transmitted. Instead of the scheduled access channels (SACH), additional pilot symbols are transmitted that can be used for channel sounding purposes, but this functionality is not presented in this paper. Instead we record the frame error rates (based on the CRC check) of the CHBCH. Note that the real-time system uses the same fixed point code for the receiver as the simulator.
For the experiments we set up three nodes (CH1, CH2 and MR2) in our lab. To simulate different SNR levels at the receiver we changed the transmit powers of the two CHs between −20 dBm and 0 dBm. Figure 14 shows the FER of the first stream for different SNR values for the max-log MAP and the MMSE receiver respectively. It is worth noting that in the test setup we encountered a frequency offset of the second stream of around 500 Hz, while there was very little offset on the first stream (less than 10 Hz). This is very likely the reason why the interference cannot be cancelled out that well at a high interference level. Another difference to the simulations is the correlation of the MIMO channel, which was approximately |ρ| = 0.9 in the measurements. As we have seen in the simulations this has a very strong impact on the results. Nevertheless, we can still observe similar trends in the measurements as in the simulations, such as that the MMSE receiver has in general a worse performance than the max-log MAP receiver.

Field Trials.
One major application of the OpenAirMesh platform is the demonstration of rapidly deployable broadband ad hoc communication systems for public safety units in interventions following natural disasters and industrial accidents. Such a demonstration took place in February 2009 in Bellaterra, Spain, in the context of the European project CHORIST, which is funded by the 6th framework program of the European Commission. During the trials we set up a mesh network with five nodes as depicted in Figure 2 on the parking lot of the fire brigade building in Bellaterra (see Figure 15). MR1 was placed on the roof of the building and served as an edge router establishing the connection to the core network and the control room. CH1 and MR2 were placed on the parking lot in front of the building. CH2 and MR3 were placed behind the building, such that there was no connection between CH1 and MR3 or between CH2 and MR1. MR2 was in the broadcast domain of both CHs, relaying traffic between them.
Both MR2 and MR3 were used as gateways to other networks. Two different end-to-end applications were tested on the network: a video surveillance application and a push-to-talk VoIP application. During the trials we managed to establish a reliable connection (in the sense that both applications were running smoothly) between MR1 and MR3. See the CHORIST website (http://www.chorist.eu/) for more details and a video of the demo.

Conclusions
In this paper, we have shown the feasibility of distributed network synchronization and distributed MIMO on the real-time open-source OpenAirInterface platform. We conclude this paper by describing a few lessons we have learned during the implementation and the field trials.
Synchronization is a prerequisite for the dual-stream MIMO receiver described in this paper and for other cooperative communication schemes. We have seen that the proposed synchronization is feasible for small-scale networks in indoor and medium-range outdoor scenarios. For larger networks, the requirement of a single reference clock is somewhat restrictive, since the whole network fails when that clock fails. Also, it has not been proven that the algorithm is stable in larger networks. We are planning to investigate this issue in future work.
As for the implementation of the dual-stream MIMO receiver we have seen that the reduced complexity max-log MAP detector has several advantages over the linear MMSE receiver. First of all its performance is much better (both diversity and coding gain), especially when the interference level is high. Further it can be implemented without any divisions which is very advantageous on a fixed point processor. The implementation of the MMSE receiver on the other hand requires a matrix inversion, which is not trivial using fixed point arithmetic.
During the trials we have also seen that the dual-stream MIMO receiver is very sensitive to channel conditions. The best performance is achieved if the two transmitters have a line of sight to the receiver and if the receive correlation is small. However, positioning the nodes and their antennas in such a way is not trivial. In the case of the max-log MAP receiver, significant differences in the received powers from the two sources can also improve the performance.
In future work we would like to include distributed space-time coding and collaborative beamforming in OpenAirInterface. This could, for example, be used in a multiple relay channel, where several relays are placed between two clusterheads. One particular aspect we would like to investigate is the impact of such scenarios on design aspects related to spatial HARQ and channel coding mechanisms.