Self-Organizing Architecture for Information Fusion in Distributed Sensor Networks

Abstract

The management of heterogeneous distributed sensor networks requires new solutions that can address the problem of automatically fusing the information coming from different sources in an efficient and effective manner. In the literature it is possible to find different types of data fusion and information fusion techniques in use today, but it is still a challenge to obtain systems that allow the automation or semiautomation of information processing and fusion. In this paper, we present a multiagent system that, based on organizational theory, proposes a new model to automatically process and fuse information in heterogeneous distributed sensor networks. The proposed architecture is applied to a case study for indoor location where information is taken from different heterogeneous sensors.


Introduction
During the last decades, sensor networks have become increasingly relevant and are nowadays present in practically all sectors of our society [1]. Their great capacity to acquire data and act on the environment facilitates the construction of smart environments, allowing a detailed and flexible analysis of the processes that occur in the environment and of the services that can be provided to the users.
Despite the many advances that exist today, the interconnection of sensor networks is challenging due to the complexity of interacting with the different existing technologies and communication protocols, each of which has certain requirements [2,3]. Additionally, it is a challenge to efficiently manage the information generated by sensors. This may be complicated because the data can be inconsistent for different reasons, notably (i) deployment over open environments that lack the ability to integrate sensor technologies (which are highly complex and heterogeneous) or (ii) the difficulty of merging the information from such sensor networks in a simple and efficient manner. Thus, at present there is a growing need for versatile open platforms capable of integrating different sensing technologies over wireless technologies, fusing data sets from heterogeneous sources, and intelligently managing the generated information. Nowadays there are already some architectures or frameworks that allow the interconnection of sensors, both in academia [2,4,5] and in industry [6][7][8][9]. However, the reality is that existing platforms are designed for a specific end, using a specific technology stack for each deployment and offering a very specific set of services whose functionality is very limited [10]. Therefore, current systems are limited by the preinstallation of infrastructure, and integrators have to face the decision of choosing between other technologies or adapting their existing systems and infrastructure [11]. It is also difficult for integrators to combine the information obtained from heterogeneous wireless sensor networks [12][13][14], since there are no adequate tools to do so.
In this paper, we present an open platform that facilitates the integration and management of sensor networks and provides services based on the information supplied by those sensor networks. One of the main novelties of the platform is its open character, allowing the addition of networks built with different existing technologies, or with new technologies, as well as the integration of the data in a cloud computing environment. Open systems exist in dynamic operative environments where new components are continually integrated or existing components leave the system, and where even the operating conditions can change unpredictably. Open systems are characterized by the heterogeneity of their participants, limited trust, individual goals in conflict, and a great probability of disagreement with the specifications [15]. The proposed platform combines virtual organizations of multiagent systems [15][16][17] with information fusion algorithms [8,12,13], which provides great adaptability to changes in the controlled environment and a more efficient usage of the technologies available in that environment. In addition, we propose to integrate the virtual organization within cloud computing environments, providing ubiquitous computation and communication mechanisms and an adequate environment for the application of intelligent algorithms, which will allow the extraction of knowledge from great volumes of data as well as the customization of the services offered to the end user. These mechanisms differ from the existing security alternatives, which are based on firewall-like models and do not enable dynamic adaptation. As of today, a platform like this has not been developed, that is, a platform with an open character, able to incorporate information fusion algorithms, capable of integration with cloud technology [18], and built upon the base of social computation.
This paper proposes a level-based information fusion architecture for distributed sensor networks. This model is able to obtain the information provided by the sensor networks and to be integrated with virtual organizations of agents. The model proposes the use of signal filtering algorithms, normalization services, and other signal processing services at a basic level. It also proposes new algorithms for information fusion at higher levels that can be integrated with intelligent agents, following processes similar to the mixture of experts [19], which obtain the data coming from heterogeneous sources and merge them according to the preferences of the environment. Each expert is implemented by an intelligent agent, making it easy to dynamically add it to the architecture. Each mixture of experts is also implemented by an intelligent agent. These systems incorporate probabilistic models, Bayesian networks, fuzzy systems, and image recognition algorithms to process the information from the sensors. The system also incorporates agents to process and automatically generate efficient information fusion processes using statistical analysis of workflows. The information fusion incorporates neural networks, Bayesian models, and linear programming.
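For illustration, the gating idea behind the mixture of experts mentioned above can be sketched as follows. The inverse-error weighting, the function names, and the example values are our own illustrative assumptions, not the exact model used in the architecture.

```python
def gate_weights(historical_errors):
    """Weight each expert inversely to its historical error (normalized to sum to 1)."""
    inverse = [1.0 / max(e, 1e-9) for e in historical_errors]
    total = sum(inverse)
    return [w / total for w in inverse]

def fuse(estimates, historical_errors):
    """Fuse expert estimates (e.g., (x, y) positions) as a weighted average."""
    weights = gate_weights(historical_errors)
    dims = len(estimates[0])
    return tuple(sum(w * est[d] for w, est in zip(weights, estimates))
                 for d in range(dims))

# Three hypothetical experts (e.g., WiFi, BLE, camera agents) estimating a 2-D position:
position = fuse([(1.0, 2.0), (1.2, 2.2), (0.8, 1.8)], [1.5, 3.0, 3.0])
```

In the proposed architecture, each expert and each mixture would be wrapped in an intelligent agent, so new experts can be added dynamically without changing the fusion logic.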
This paper is organized as follows: Section 2 reviews the related work, Section 3 describes the proposed architecture, Section 4 presents the case study in which the platform is applied, and finally Section 5 shows the results and conclusions obtained.

Related Work
Nowadays, there is a lack of open platforms that integrate new sensor technologies and that are capable of efficiently fusing the information from such networks while having capacities for self-organization [18,20,21]. If we review the related work, we find that traditionally WSNs have been developed without considering a management solution that can dynamically adapt to the changes that occur in the environment and to the user needs. In Table 1, we present a review of the existing approaches for the management of WSNs.
As can be seen in Table 1, most of the existing approaches are designed for specific environments or specific purposes, and none of them incorporates capacities for self-organization and open integration. Besides, the possibility of using information fusion techniques is not explicitly defined as a capacity of these architectures. Although significant progress has been made in the development of architectures to manage wireless sensor networks, at present there is no single open platform that efficiently integrates heterogeneous wireless sensor networks and provides intelligent information fusion techniques and intelligent services. The use of multiagent systems with capabilities for self-organization and dynamic management of information fusion workflows can notably help to improve the management of heterogeneous sensor networks.
Information fusion is applied to classification problems or to merging the solutions of different systems [22,23]. However, it is not possible to find solutions that allow the semiautomatic creation of information fusion workflows, which would reduce the development time of integrating new components. Some existing classifiers include bagging and boosting techniques [23,24], which allow merging the outputs of several classifiers to improve the results of the individual ones. However, their results are not always satisfactory, as we have proven in previous works [25,26]. In order to find the desired behavior for our system, we can look at the structure of the mixtures of experts used in artificial intelligence, which merge the information based on the output provided by several experts [27]. This issue has been explored in several studies to improve prediction processes [27,28]. It is also common to find tools such as Spring Integration, Apache Camel, or even Weka, which can define workflows in data analysis. However, it would be of interest to obtain a new architecture that would learn from the information fusion workflows and generate new flows automatically or semiautomatically, as previously stated [17,29]. This leads to the possibility of creating an open architecture with capacities for learning and self-organization making use of virtual organizations of agents [15,17,29]. Virtual organizations of agents combine the multiagent paradigm and the theory of organization, allowing the design of artificial societies regulated by organizational aspects [30,31]. In the next section, we describe our proposal, a multiagent architecture based on virtual organizations, specifically designed to manage distributed sensor networks in an organizational way.

Proposed Architecture
Our proposal consists of a multiagent architecture based on virtual organizations that integrates an information fusion model. The architecture is designed on the basis of the previously developed PANGEA platform [31], a multiagent platform that facilitates the management of virtual organizations of agents, including self-organizing capacities. A complete description of PANGEA can be found in [31]. The new architecture is organized in 4 layers, as can be seen in Figure 1, and is developed according to the classification of the JDL fusion model [31,32].
The platform will facilitate the integration of sensing technologies, regardless of their nature, and will provide an open environment that allows the dynamic addition of new sensor systems and technologies. For this, the platform will provide data encapsulation mechanisms that standardize the information received from standard communication protocols such as WiFi and ZigBee. In this way, the platform will have sufficient dynamism to allow for the incorporation of emergent technologies. The incorporation of a new sensor network technology to the platform depends on the communication technology used by the network.

(i) Layer 0. Layer 0 of the platform is a broker that defines communication with sensor networks of different natures (Wi-Fi, ZigBee, Bluetooth, etc.) and gets the raw data from the sensor networks. This process of obtaining raw data from sensor networks is associated with Level 0, Data Assessment, of the JDL model. The authors of this paper have conducted some preliminary research in which data from ZigBee and Wi-Fi sensors were obtained [10,[29][30][31]. The process of obtaining data requires studying the existing technologies and communication protocols for sensor networks and defining a new broker and adapters to provide the platform with openness. The main novelty of this layer is its ability to provide the platform and the upper layers with openness regarding the connection to sensor networks of different natures and thus to ensure that the upper layers of the architecture have access to the information and are able to perform data fusion at different levels.
(ii) Layer 1: Low-Level Services. The contextual information obtained from layer 0 is processed by a set of low-level services. After obtaining the raw data, a gateway [20] is defined through adapters that allow standardizing the information received. This data processing corresponds to Level 1, Object Assessment, of the JDL model. In this first stage, the platform provides services such as signal filtering, normalization services, or other basic signal treatment services. These services are provided by the adapters and are associated with algorithms that perform the initial treatment of the data. Each of these services exposes an API to the higher layers which allows interaction with each low-level service and thus with the underlying sensing technologies.
(iii) Layer 2. This layer includes Levels 2 to 4 of information fusion in the JDL model. The platform is structured as a multiagent system based on virtual organizations. Each organization contains the roles required to facilitate an intelligent management of the information obtained from the lower levels of the architecture. The multiagent system incorporates agents specifically designed to interact with the low-level services. In addition, we propose the design of intelligent agents specialized in information fusion. For this purpose, roles that allow merging information automatically through supervised learning and previous training are included. This procedure is similar to the one used in the mixture of experts [32,33]. The layer incorporates different statistical techniques based on linear programming [34], neural networks [32], and classifiers [34]. Thus, agents specialized in information fusion can combine different sensing technologies that provide heterogeneous data and supply more accurate information to the upper layer services. Besides providing information fusion techniques, this layer provides automatic generation of information fusion flows between the different levels, which is described in detail in the following subsection.
(iv) Layer 3. The top layer of the platform provides an innovative module that allows the management and customization of services to end users, thanks to the capabilities of the multiagent system, which is deployed over the cloud computing environment. This layer includes Levels 5 and 6 of the JDL model. At these levels, tasks associated with the man-machine interfaces are performed according to the characteristics of the user, facilitating decision making by the user.

3.1. Layer 2: Workflow and Fusion Organizations.
In this paper, we focus on layer 2 of the architecture, which defines different organizations of agents specialized in information fusion. Specifically, we focus on two organizations: the workflow organization and the fusion organization. The former aims to explore and find new workflows [17] that can analyze the data more efficiently, while the latter applies a series of optimization techniques to optimize the process in terms of different variables. In addition to these two virtual organizations, the architecture has its own organizations from the PANGEA architecture [31], as well as those of the case study.
3.1.1. Workflow Organization. This organization contains agents that play roles such as workflow analysis, workflow prediction, and workflow processing. The workflow analysis role is responsible for analyzing a given workflow and for creating new workflows in an automatic way [21]. The workflows combine the different information fusion techniques and algorithms integrated into the virtual organization for the data processing of a specific case study. To carry out the analysis and prediction of the workflows, techniques based on statistical models are applied to estimate the most appropriate execution flows [21]. The workflows obtained are presented to the top layer as a high-level API specific for each service, so that they can be used in an easy and simple way. The extraction of the relevant actions is carried out by applying different tests according to the characteristics of a given action. The chi-squared test is a nonparametric test that can be applied to analyze the independence of two variables $X$ and $Y$; the null and alternative hypotheses are $H_0$: $X$ and $Y$ are independent and $H_1$: $X$ and $Y$ are not independent. Applying this test is recommended when at least 80% of the expected frequencies of the contingency table have a value greater than 5. If this restriction is not met, it is possible to introduce Yates's correction [35], or a Monte Carlo simulation [36] should be applied if the frequencies are between 3 and 5; if a frequency is lower than 3, Fisher's exact test should be applied [37]. The statistic for the chi-squared test is defined by

$$\chi^2_{\mathrm{exp}} = \sum_{i,j} \frac{\left(|O_{ij} - E_{ij}| - c\right)^2}{E_{ij}}, \quad (1)$$

where $O_{ij}$ is the observed value in row $i$ and column $j$ of the contingency table, $E_{ij}$ is the corresponding expected frequency, and $c = 0$; when Yates's correction is introduced, $c = 0.5$. Once $\chi^2_{\mathrm{exp}}$ has been calculated, it is compared at significance level $\alpha$ with the critical value $\chi^2_{(r-1)(c-1);\alpha}$, which allows us to determine the importance of the variable.
Fisher's exact test is usually applied to 2 × 2 contingency tables, but it can be applied to higher dimensions; for example, some statistical software performs a chi-squared test if the Cochran conditions hold and applies Fisher's exact test in the other cases. Thus, the flowchart in Figure 2 represents the selection process for the statistical test used to establish whether a variable is relevant to the problem or not.
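The expected frequencies, the (optionally Yates-corrected) statistic of (1), and the test-selection rule described above can be sketched directly. The function names are our own, and the selection thresholds follow the text rather than any particular statistical package.

```python
def expected_freqs(table):
    """Expected frequencies E_ij = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_stat(table, yates=False):
    """Chi-squared statistic of (1), with optional Yates continuity correction."""
    c = 0.5 if yates else 0.0
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_freqs(table)):
        for o, e in zip(obs_row, exp_row):
            stat += (abs(o - e) - c) ** 2 / e
    return stat

def choose_test(table):
    """Select the statistical test following the flowchart described in the text."""
    flat = [e for row in expected_freqs(table) for e in row]
    if sum(e > 5 for e in flat) / len(flat) >= 0.8:
        return "chi-squared"
    if min(flat) >= 3:
        return "chi-squared with Yates correction"
    return "Fisher's exact test"
```

For a perfectly independent 2 × 2 table such as `[[10, 10], [10, 10]]`, the uncorrected statistic is 0 and the plain chi-squared test is selected.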
According to the $p$ value calculated in (1), it is possible to determine the degree of relevance for each of the possible actions, as shown in (2), where $s$ is the service that separates the data set $D$ into $D_1$ and $D_2$, $n_i$ is the number of elements in $D_i$ ($i = 1$ nonefficient and $i = 2$ efficient), $C$ is the set of classes in $D$, $n_{ic}$ is the number of elements of class $c$ in $D_i$, $e_{\max}$ is the maximum error in the workflows, and $\bar{e}_{ic}$ is the average error in $D_i$ for class $c$.
Once the weights for each of the services are defined, it is possible to obtain a graph, as shown in Figure 3, that captures the knowledge about the relevance of the actions. The graph contains information about the inputs and outputs of each of the services, which can limit the connectivity between the different nodes in the graph. To select a workflow, we apply Floyd's algorithm [22]. The nodes that are connected to a fusion node are added to the workflow if the value obtained by the algorithm is higher than the previous one.
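A minimal sketch of Floyd's (Floyd-Warshall) all-pairs algorithm over such a service graph follows; the toy adjacency matrix and its weights are illustrative stand-ins for the relevance-derived weights described above.

```python
INF = float("inf")

def floyd_warshall(weights):
    """All-pairs shortest-path costs for an adjacency matrix (INF = no edge)."""
    n = len(weights)
    dist = [row[:] for row in weights]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Toy 3-service graph: chaining service 0 -> 1 -> 2 (cost 8) beats the
# direct edge 0 -> 2 (cost 10), so the two-step workflow would be preferred.
graph = [[0, 5, 10],
         [INF, 0, 3],
         [INF, INF, 0]]
```

Note that, as discussed in Section 5, negative weights can be placed on edges into fusion nodes to keep them from greedily absorbing every connected service.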

3.1.2. Fusion Organization.
The fusion organization includes algorithms to select and combine the optimum technique(s) according to the characteristics of the service or information that will be fused. In this organization, we define different techniques for information fusion, such as the multilayer perceptron (MLP) [38] and Bayesian networks [39], which can work in a similar way to a mixture of experts, selecting the inputs that provide better results [25,27]. These are supervised methods that require previous training, and it is necessary to have a dataset available for training the system. As an alternative to the supervised methods, we also incorporate into the fusion organization a linear programming method based on the simplex algorithm [33].
In the next section, we present a case study where the proposed approach is evaluated. In the case study, we focus on the information fusion capacities of the architecture.

Case Study
In this case study, we present the application of the proposed platform to an indoor location system that combines information coming from four heterogeneous sensor networks: a WiFi system, a BLE based system, accelerometers and compass, and cameras. The environment for testing the proposed architecture was located at the University of Salamanca, Spain, and contains a WiFi network, a BLE network, and a set of cameras; the users were provided with smartphones that gather information by means of the accelerometer and the compass. The architecture was deployed in the test environment making use of Wi-Fi agents, BLE agents, camera agents, user agents, workflow agents (analysis, prediction, and workflow agents), and fusion agents (ANN, classification, and linear programming agents). The WiFi indoor location system is based on the use of fingerprints on a map of intensities. In a first stage, it is necessary to generate the maps of intensities. Once the maps are generated, the location of the users can be obtained by comparing the current level of intensities to those stored in the map of intensities. Thus, the first step is to obtain a representative number of intensity samples that allow us to define a radio electric map of the environment. For example, Figure 4 corresponds to a high-level service of layer 3 of the architecture designed for mobile devices and presents a route carried out to obtain the initial fingerprints in a laboratory of the University of Salamanca. Each fingerprint is composed of different measurements of the WiFi signal intensity levels and contains the MAC addresses and RSSI level of each of the hotspots detected at a given location.
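The fingerprint comparison described above can be sketched as follows; each fingerprint maps hotspot MAC addresses to RSSI levels, and a new reading is assigned to the closest stored fingerprint. The Euclidean metric and the default level assumed for unseen hotspots are illustrative choices, not the exact matching rule of the deployed system.

```python
DEFAULT_RSSI = -100.0  # assumed level for a hotspot missing from a reading

def rssi_distance(sample, fingerprint):
    """Euclidean distance over the union of hotspots seen in either reading."""
    macs = set(sample) | set(fingerprint)
    return sum((sample.get(m, DEFAULT_RSSI) - fingerprint.get(m, DEFAULT_RSSI)) ** 2
               for m in macs) ** 0.5

def locate(sample, fingerprints):
    """Return the location label of the closest stored fingerprint."""
    return min(fingerprints, key=lambda loc: rssi_distance(sample, fingerprints[loc]))

# Two hypothetical fingerprints, each a {MAC: RSSI in dBm} map:
fingerprints = {"lab": {"aa:bb": -40.0, "cc:dd": -60.0},
                "hall": {"aa:bb": -80.0, "cc:dd": -50.0}}
```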
Once the fingerprints have been obtained, we create a classifier that allows calculating the probability that a measurement belongs to each of the available fingerprints, based on a significance level and according to a normal distribution [39]. The reason for creating a classifier agent for each of the floors of the building is to avoid an excessive growth of the classifier agent and the corresponding computational cost for the smartphone when estimating the position of the user. To determine the specific floor where the user is located, we apply a RandomTree classification agent. Once the floor is determined, the classifier agent that was previously created for that floor is selected, and the system can proceed to calculate the location of the user. This way, it is possible to reduce the time required to calculate the location of a given target and to have a more realistic approach when working with resource-constrained devices.
To calculate the user location, we use a Bayesian network integrated within a classifier agent that takes into account the probability that a given input belongs to each of the previously scanned fingerprints. Taking into account the probability of belonging provided by the Bayesian network, we proceed to triangulate the location of the users at intermediate points of the route. Besides, to stabilize the position of the users, we apply temporal series, which allow us to avoid continuous changes in the position of the users. The algorithm proposed to locate the users in the environment is shown in Algorithm 1.
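Two ingredients of this step can be sketched briefly: the membership of a reading under a fingerprint's normal model, and a temporal stabilization of successive position estimates. The paper does not detail its temporal-series method, so plain exponential smoothing is used here as an illustrative stand-in.

```python
import math

def gaussian_membership(value, mean, std):
    """Likelihood of a reading under the fingerprint's normal (Gaussian) model."""
    return math.exp(-0.5 * ((value - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def smooth(positions, alpha=0.5):
    """Exponentially smooth successive (x, y) estimates to avoid position jumps.

    Illustrative stand-in for the temporal-series stabilization mentioned above.
    """
    sx, sy = positions[0]
    out = [(sx, sy)]
    for x, y in positions[1:]:
        sx = alpha * x + (1 - alpha) * sx
        sy = alpha * y + (1 - alpha) * sy
        out.append((sx, sy))
    return out
```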
The algorithm based on WiFi technology can be improved if the information from additional sensors is taken into account. More specifically, we use the information coming from the accelerometer and compass of the smartphone, the information provided by the cameras, and the information provided by the BLE tags. The accelerometer provides a way to define a step detection system that can be used together with the compass to locate the user. This system is available by default in the modern smartphones used in this case study, such as the Sony Xperia Z3, but for older devices it is necessary to design a new algorithm based on thresholds [39], which is shown in Algorithm 2.
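A threshold-based step detector of the kind referred to above can be sketched as follows; the numeric thresholds and the rise-then-fall rule are illustrative assumptions, not a reproduction of the paper's Algorithm 2.

```python
def detect_steps(magnitudes, high=11.5, low=9.0):
    """Count steps in a stream of acceleration magnitudes (m/s^2).

    A step is registered when the magnitude rises above `high` and then
    falls back below `low`. Threshold values are illustrative only.
    """
    steps, armed = 0, False
    for a in magnitudes:
        if not armed and a > high:
            armed = True          # rising edge of a candidate step
        elif armed and a < low:
            steps += 1            # falling edge completes the step
            armed = False
    return steps
```

Combined with the compass heading, each detected step advances the estimated position by an assumed stride length, which is how the dead-reckoning component described above complements the WiFi estimate.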
The main problem presented by a location system such as the one in this case study is the error reset. To correctly reset the error of the system, it is required to determine the location in an accurate way. If this is not possible, the location is provided based on the step detector and on the compass. To obtain the location of the user in a very accurate and exact manner, it is necessary to combine the WiFi system with complementary technology. In our case, we use BLE tags, which were installed throughout the environment on the different floors of the building, and the camera of the mobile phone. Our first option was the use of BLE tags. We defined a threshold for the BLE signal and, when the RSSI level obtained for the signal is under the threshold and over the minimum detected value, we identify a situation where the user is close to a BLE tag.
The problem with the BLE tags is the variability of the signal when the user moves away from the tag. The levels of the signal do not have a meaningful variation when the distance is up to 1.5 meters; thus, it is difficult to detect changes in the signal, for instance, when the distance is between 2.5 and 4 meters. In the case study, we distributed the BLE tags on each of the floors as shown in Figure 5. Figure 5 shows a high-level visualization service that is offered to the user in layer 3 of the architecture. The BLE tags were installed by default at locations such as elevators, stairs, corridors, or doors.
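The proximity rule described above (RSSI under the threshold but over the minimum detected value) can be sketched as follows; the numeric threshold values are illustrative assumptions.

```python
def nearest_tag(readings, threshold=-60.0, floor_rssi=-95.0):
    """Return the id of the BLE tag the user is considered close to, or None.

    readings: {tag_id: RSSI in dBm}. Following the rule in the text, a tag is
    a proximity candidate when its RSSI is under the threshold but over the
    minimum detected level; among candidates, the strongest signal wins.
    Threshold values are illustrative only.
    """
    candidates = {t: r for t, r in readings.items() if floor_rssi < r < threshold}
    return max(candidates, key=candidates.get) if candidates else None
```

When a tag is identified this way, the user's position can be reset to the tag's known installation point, which is exactly the error-reset role the BLE tags play in the case study.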
As an alternative to the use of BLE tags, we also use the camera of the mobile phone. The camera detects patterns that are used to locate the users in an accurate manner. The feature detection is carried out by means of the FAST algorithm [12], and the search of features is carried out through a kd-tree [40].
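The kd-tree feature search can be sketched as follows, assuming SciPy is available. The descriptors and labels are synthetic stand-ins (real descriptors would be computed around the FAST keypoints mentioned above), and the absolute distance threshold for accepting a match is our own assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical reference descriptors for the room-name patterns and their labels:
ref_descriptors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
ref_labels = ["room-201", "room-202", "room-203", "stairs"]

tree = cKDTree(ref_descriptors)  # kd-tree over the stored descriptors

def match(descriptor, max_dist=0.5):
    """Return the label of the nearest reference descriptor, or None if too far."""
    dist, idx = tree.query(descriptor)
    return ref_labels[idx] if dist <= max_dist else None
```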
The information obtained from the different heterogeneous sensors is combined by means of a neural network integrated within a classifier agent. The MLP neural network acts as an information fusion mechanism and is designed taking into account the parameters presented in Table 2.
The set of low-level services that interact with the sensors and provide the processing and normalization capacities necessary for the indoor location high-level service is presented in Table 3.

Results and Conclusions
To evaluate the results obtained by testing the proposed architecture in the case study, we proceeded to calibrate three floors of the Multiusos + Building at the University of Salamanca, Spain. Each of the floors has an area of 53 × 54 meters. Figures 4 and 5 show screenshots of the indoor localization service on the second floor of the building. The other two floors have a similar distribution of the space. The WiFi calibration process was conducted as shown in Figure 4, and we used a similar route for the rest of the floors. On the second floor of the building, we obtained a total of 2031 measurements distributed over 138 locations. That is, we obtained 138 fingerprints with about 14 measurements for each fingerprint. In the case of the camera based system, we obtained 20 images per floor, which correspond to the names of the rooms that are placed on the walls, close to the door of each room.
The training and identification of location areas were implemented through a training service, which is executed in the location server. This is because the training process and the tests to analyze the functioning of the calibration process are carried out offline, since they require dedicated computational resources. Furthermore, it is easier for the user to interact with the architecture to define the location areas in a simple manner, as well as to monitor the calibration process. As can be seen in Figure 6, the user can define the location areas; in our case study, the second floor of the building is one area, and new polygons can be defined to create new areas.
Once the training data were obtained using the low-level services defined in layer 1 of the architecture, it was possible to establish the initial settings for the workflow and fusion organizations of agents. The initial test consisted of a calculation of the error for each of the individual location techniques used in the case study. In the case of the WiFi based algorithm, the calibration obtained during the training process was compared to the estimations obtained in a second route. In the case of the accelerometers and the camera, we obtained data on different routes in such a way that, when a BLE tag or an image was detected, the difference between the estimation obtained and the real location of the BLE tag or the camera was computed as the error. For the rest of the points on the route, the error was computed for each of the steps of the user. Table 4 shows the average error obtained for all the floors of the building for each of the individual location services defined in the architecture. The routes correspond to each of the lines presented in Figure 6, also taking into account crossings and changes of direction. Nine possible routes were defined for the second floor of the building. Carrying out the routes allows the evaluation of the image processing location technique and the gathering of data to evaluate the functioning of the WiFi location system. The WiFi location service was tested using different classifiers. Table 5 shows the results obtained for the WiFi location service. As can be seen in Table 5, the WiFi location algorithm proposed in the architecture notably improved the performance of existing classifiers. The classification processes were executed on an i7 3612QM, 8 GB RAM server.
After testing the individual location services, we evaluated the self-organizing capacities of the architecture. More specifically, we evaluated the fusion organization (fusion location service) and the workflow organization defined for the case study. In the case of the fusion organization, taking into account the definitions given in Section 3.1.2, we chose an MLP technique, incorporated inside an agent at the fusion organization level, to merge the location algorithms, as the inputs for the MLP are continuous. The self-adaptation capacities of the organization are mainly given by the MLP agent, as it automatically rescales the input variables to take values in the interval [0.1, 0.9]; when there is no data available from a sensor, the value is −1. Besides, the MLP agent was trained taking into account the inputs for each of the sensors and with a value of −1 for the rest of the inputs. The hidden layer of the MLP is designed with 2n + 1 neurons [41], where n is the number of neurons in the input layer. The output layer is composed of three neurons that correspond to the coordinates (x, y, z). In addition to the MLP, the linear programming method defined in [34] was also implemented in the fusion organization in order to schedule the different actions to be carried out.
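The input encoding just described can be sketched directly; the linear rescaling into [0.1, 0.9], the −1 code for missing sensors, and the 2n + 1 hidden-layer rule come from the text, while the function names are our own.

```python
def rescale(value, lo, hi):
    """Rescale a sensor reading linearly into [0.1, 0.9]; None (no data) -> -1."""
    if value is None:
        return -1.0
    return 0.1 + 0.8 * (value - lo) / (hi - lo)

def hidden_neurons(n_inputs):
    """Hidden-layer size 2n + 1, where n is the number of input neurons."""
    return 2 * n_inputs + 1
```

Training with −1 placeholders for absent sensors is what lets the fused MLP degrade gracefully when, for example, no BLE tag is in range.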
In the case study, the workflow organization was trained to manage twenty flows that were applied according to the techniques shown in Table 2. The workflow organization allows us to dynamically define different possible workflows for the organization, which can be defined and used in real time in the virtual organization. Thus, it is possible to dynamically make decisions in the workflow organization to provide the most efficient workflows for the problem at a given moment. Figure 7 shows three of these flows. The efficiency of each action was estimated according to the efficiency of each flow. The best flow was determined by the system applying the process presented in Section 3.1. The error obtained was 1.43 meters, which is lower than the error obtained by the individual location techniques and lower than the error obtained using previous workflows. The previous workflow ranked with the lowest error rate (1.57 meters) is the one shown in blue in Figure 7.
The system provides self-organizing capacities, as it can automatically adjust workflows and merge information from different low-level services to obtain a more precise location. The neural network can perform this task efficiently, improving on the precision of systems based only on RSSI signal levels. The basic reason why the proposed approach improves previous results is that the MLP selects the technique with the lowest error rate taking into account the training data. For example, the WiFi location system does not work well in open spaces, such as the area shown in Figure 8(a); in the corridors with no BLE tags, shown in Figure 8(b), when no other information was available, the location was carried out via WiFi. The camera-based location system was not used most of the time, given the difficulties of obtaining a natural way of using the camera of the mobile device to detect the patterns on the doors of the rooms.
The camera-based system processes the images and detects patterns that correspond to the names of the rooms. The performance of this system was not good, mainly due to the problems of detecting the pattern with the camera of the smartphone, as it is difficult for the user to use the camera and detect the patterns while walking. In the case of the BLE based location service, the performance can be considered good, although it is possible to have problems when the user behaves erratically and moves and changes orientation continuously or shakes the smartphone. Regarding the sensors of the smartphone, in the current service the user orientation is determined by means of the position of the smartphone, taking into account that the user holds the mobile looking at the screen. If this position changes, or if the user moves in a way different from the usual one, the behaviour of the system can be erroneous. To determine the user orientation, it is possible to take into account the linear accelerometer sensor of the smartphone, which together with the compass allows us to detect the direction of the movement of the user. This option will be evaluated in future works. The WiFi location system is the most stable, but the intensity maps have to be continuously updated.
From the point of view of the organizational architecture with information fusion capacities, we have proposed an adaptive approach that calculates the workflow with the minimum cost and optimum error rate. It was necessary to include negative weights in the connections of the graph to prevent the fusion nodes from adding all the services that are connected to them. In this way, it was possible to reduce the number of connections of the fusion nodes. As an alternative to the workflow selection approach presented in this paper, it is possible to consider another gain function, such as the Gini index or Bayes probabilities. In this sense, there exists a variety of methods that can be incorporated into the architecture in order to fuse information in a dynamic way. In this paper, we have presented an innovative algorithm that proposes an information fusion technique depending on the variables in the environment: for continuous variables the MLP is selected, for nominal variables the classifier is the chosen option, and in those cases where a supervised learning technique is not possible, the system proposes a linear programming solution.

Figure 2: Flowchart to select the correct statistical test.

Figure 4: Screenshot of the WiFi indoor location service for smartphones.

Figure 5: BLE tags location in the environment.

Figure 7: Execution flows in the system.

Figure 8: (a) Open space on the second floor and (b) corridors with no BLE tags.

Table 2: Input and output variables in the neural network.

Table 3: Low-level services available for the indoor location high-level service.

Table 4: Average error with the different sensors.

Table 5: Average error according to the different classifiers.