MPD-Model: A Distributed Multipreference-Driven Data Fusion Model and Its Application in a WSNs-Based Healthcare Monitoring System

We first propose MPD-Model, a novel distributed multipreference-driven data fusion model for WSNs. Here, preferences are regarded as the core elements of the collaboration mechanism in a data fusion procedure. We then present MFA, a distributed multipreference feature-level fusion algorithm based on the weighted average method. Next, to implement feature extraction of wrist-pulse data, we propose FEA, a light-weight adaptive feature extraction algorithm for time series sensed data. We also design TFD-Pattern, a unique human pulse pattern. Based on historical data, we propose an SVM-based algorithm for health status detection tasks. Finally, we implement the proposed methods in a real wearable healthcare monitoring system that was previously developed in-house. We validate the proposed methods using real-world data sets with 2046 pulse samples. Experimental results show that the proposed methods outperform the baseline methods, and that the proposed MPD-Model is reasonable and effective.


Introduction
The rapid development of Wireless Sensor Networks (WSNs) brings some new situations. On the one hand, WSNs applications are rapidly expanding from traditional fields, for example, military and national defense and environmental monitoring, to emerging civil fields such as intelligent transportation, healthcare monitoring, and smart home. On the other hand, more mobile devices (e.g., smartphones) or local network systems (e.g., Body Sensor Networks) are frequently treated as one single node of WSNs. Undoubtedly, these situations will lead to a large-scale and complex sensor network. We think such a network is a typical Internet of Things (IoT) [1], and it has characteristics such as interactivity and sociality. It also emphasizes the intelligence and sensing-actuating ability of WSNs applications. Consequently, sensed data generated under these situations have the following features [1][2][3][4]. (1) The data has polymorphism and heterogeneity. (2) The data is massive. (3) The data is real time. (4) Storage locations of the data are highly diverse. (5) The data has high complexity and strong relationships.
In the new situations, users attach more and more importance to the intelligence of sensor network systems because of growing practical demands. But it is very difficult to implement this intelligence due to seriously limited resources on nodes and the lack of historical sensed data. This is particularly true in healthcare monitoring applications based on WSNs. Fortunately, system detection/identification accuracy and intelligence can be greatly improved if multisource data fusion technologies are fully leveraged. However, some challenges still exist. (1) How to select and evaluate the multisource sensing parameters to fuse? (2) How to establish a novel data fusion model/mechanism to improve detection accuracy? (3) How to leverage limited resources on nodes to automatically recognize complex parameter patterns?
International Journal of Distributed Sensor Networks
To address these challenges, we propose MPD-Model, a novel distributed multipreference-driven data fusion model for WSNs. Based on the traditional data fusion architecture, MPD-Model regards preference information as the core element of the collaboration mechanism in a data fusion procedure. It introduces large-scale historical sensed data into machine learning for data fusion, treats the massive data as a real-world test set, and thus effectively ensures sufficient identification accuracy and high intelligence of WSNs application systems. Here, preference is defined as a kind of descriptive information that differs from the nature, accuracy, and detail level information of monitoring tasks used in traditional data fusion methods. It can reflect the subjective wishes of individuals or a group. For example, the preference Task-Parameter Relevance is a kind of importance degree information indicating that the accuracy of detection tasks depends on the sensing parameters selected for fusion. Preference information has two characteristics: uncertainty and dynamics. The former means that preference information varies with both the size and the type of WSNs applications. This well embodies individual data fusion. The latter indicates that preference information can be quantified and then evaluated by formalizing it.
We systematically address the problems of data fusion and health status detection in a healthcare monitoring system based on WSNs with the following contributions.
(i) To address the basic problem of data fusion, we propose MPD-Model, a novel distributed multipreference-driven data fusion model for WSNs. Based on MPD-Model, this paper conducts the individual data fusion of multisource sensing parameters in a real-world healthcare monitoring system. MPD-Model can not only address the incompleteness, inaccuracy, and uncertainty of preference information, but also significantly improve the detection accuracy and intelligence of WSNs applications. According to MPD-Model, this paper presents several fusion algorithms to carry out practical health status detection tasks.
(ii) In MPD-Model, we design and formalize four kinds of preference information (or impact factors). Through analyzing multipreference information, we can quantify and evaluate the importance degree of different multisource sensing parameters during data fusion. We propose MFA, a distributed and unified multipreference fusion algorithm based on the weighted average method, for generating complex parameter patterns that are further used for health status detection tasks.
(iii) To implement MPD-Model in a real wearable health status detection system [1], we propose FEA, a light-weight adaptive feature extraction algorithm for pulse data, to obtain physiological features of human wrist pulse waveforms. According to these features, we design TFD-Pattern, a novel time series pattern, and use the pattern as the input of the MFA algorithm. Considering massive historical pulse data, this paper also presents an SVM-based feature-decision-making algorithm to detect human health statuses, for example, subhealth. The algorithm is also an instance of the $y_i$ element in MPD-Model.
(iv) We validate our proposed MPD-Model and fusion algorithms using large-scale real-world healthcare data sets. Experimental results show that FEA is effective and outperforms the existing derivative-based method by more than 16% in feature extraction accuracy, and that an average fusion ratio of 98.73% is achieved when running MFA. The SVM-based health status detection method supports the intelligence of MPD-Model well, outperforming the AR and WPT methods by 20∼30%. Different weight values of the multipreference parameters yield different diagnosis performance. This indicates that our individual fusion model based on preferences and the weighted average method is reasonable and effective.

MPD-Model.
In this paper, we propose MPD-Model, a novel distributed multipreference-driven data fusion model for Wireless Sensor Networks, as shown in Figure 1.
Explanations of MPD-Model are given below. Its input may include environment monitoring information, human physiological parameters, and movement/activity information. $R_i$ ($1 \le i \le n$) indicates the output result of the signal/data processing procedure $P_i$ on the sensor node $S_i$. A fusion operation can be applied between any two output results $R_i$ and $R_j$ ($1 \le i \ne j \le n$). $D_k$ ($1 \le k \le m$) denotes one monitoring target state or decision produced as the output of the model. $y_i$ stands for any method, algorithm, or model for recognition or decision-making.
The model is suitable for all kinds of WSNs applications for human sensing, particularly healthcare monitoring, due to the following flexible features. (1) $P_i$, a signal/data processing procedure, may be any kind of processing method or technology that can run on the sensor nodes; it should be distributed, light-weight, or interactive. (2) The fusion operation between $R_i$ and $R_j$ may also be any kind of calculation, algorithm, or technology. (3) The decision-making element $y_i$ may be a complicated classification method (e.g., Support Vector Machine or Time-Space Factor Graph Model), a user-defined rule, or even a simple threshold test such as an IF statement. (4) Values of $D_k$ depend on the specific WSNs application. The model can identify one monitored target state with Yes or No when n = 2. Otherwise, multiple states of one target can be recognized and output.
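The roles described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `run_mpd_pipeline`, the per-node averaging used as $P_i$, the plain mean used as the fusion step, and the threshold of 100 are all assumptions chosen only to show how processing, fusion, and an IF-style decision fit together.

```python
from statistics import mean


def run_mpd_pipeline(raw_by_node, processors, fuse, decide):
    """Hypothetical helper mirroring the roles P_i, R_i, fusion, and y_i."""
    # R_i = P_i(data sensed on node S_i)
    results = [process(raw) for process, raw in zip(processors, raw_by_node)]
    fused = fuse(results)   # fusion over the outputs R_i
    return decide(fused)    # y_i maps the fused value to a state D_k


# Two nodes report pulse-rate samples (made-up values, beats per minute).
raw_by_node = [[72, 75, 78], [70, 74, 75]]
processors = [mean, mean]                              # P_1, P_2: per-node averaging
fuse = lambda results: sum(results) / len(results)     # illustrative fusion step
decide = lambda value: "Yes" if value > 100 else "No"  # simple IF-style rule

decision = run_mpd_pipeline(raw_by_node, processors, fuse, decide)
print(decision)
```

Because each stage is passed in as a plain callable, any of the three can be swapped (for example, replacing the threshold rule with a trained classifier) without touching the others, which is the flexibility the four features describe.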
MPD-Model views intelligence and historical sensed data as core elements and key features. This is reasonable because they reflect the development tendency of data fusion models, for the following reasons. (1) The characteristics of sensed data generated in WSNs have changed dramatically. (2) Historical sensed data is playing a more and more important role in data fusion procedures. Thus, this role should be clearly embodied in a novel data fusion model; otherwise, the fusion performance of intelligent methods based on analyzing historical data will be seriously weakened. (3) With the rapid expansion of WSNs applications from traditional domains to civil fields, demands on intelligence have dramatically increased. (4) Data processing methods depend on data types, but only a data fusion model can effectively meet this requirement when intelligence is considered. Such a model should be driven by the physical model of WSNs and, aiming at the intelligence of WSNs applications, should combine all kinds of data processing technologies and establish a unified architecture or a novel fusion mechanism to efficiently implement users' requirements.

Formalization of Preferences' Information/Impact Factors.
The influence of sensing parameters on detection decision-making results depends on the specific WSNs application. Thus, we propose an individual data fusion model for WSNs. The model first computes the fusion weights of different impact factors according to the features of a given WSNs application, and then combines and fuses these factors using the weighted average method. In this way, more reasonable and accurate event detection results can be obtained. In the model, we design four kinds of ubiquitous preferences (impact factors or constraints) for data fusion in WSNs. They describe the distinguishing features of each WSNs application. They are formalized and described below.

Task-Parameter Relevance (TPR).
The impact factor measures the relevance degree between sensing parameters and task types of event detection. For instance, although parameters such as temperature, humidity, and air composition are often combined to detect a fire event, the temperature parameter will obviously have more influence on fire event detection than the others. TPR is designed from the perspective of event detection types and computed using formula (1), where $q_E$ is the vector of event types and $q_P$ is the vector of sensing parameters for the detection or monitoring task.

Event-Location-Collection-Position Relevance (ECR).
The impact factor reflects the relevance degree between the event location and the sensing parameter collection position. Parameters collected at the event location generally have more influence on event detection performance. For example, in fire detection, temperature parameters at the fire location have higher reference value than those at other locations. ECR is computed according to formula (2) as e_score(P, E), where es indicates the event and $La = \{l_1, l_2, \ldots, l_m\}$ is the set of positions from which sensing parameters are collected.

Observer Recommendation Influence (ORI).
It indicates one parameter's importance degree in the eyes of observers or domain experts during event detection. Different experts will emphasize different sensing parameters in the same event detection task. Thus, the preference that experts assign to a sensing parameter is very important information for fusion. For example, environmental experts can obtain more accurate detection results for an algae outbreak by combining water quality, pH value, and water temperature parameters, whereas ordinary observers can detect it directly by using the algae content value. ORI is calculated using formula (3), where O denotes an observer or a domain expert, E an event, and spw the weight of the shortest path between O and E. The term "observers" is equivalent to "recommenders."

Collection Cost Preference (CCP).
It measures the collection cost of the sensing parameters selected in data fusion of WSNs applications in order to design a more reasonable detection solution. For example, we can easily detect an algae outbreak event using algae content sensors. However, for most environmental managers, this is not feasible because of the technical difficulty or the high cost of the solution. Instead, following the suggestions of environmental experts, they can combine water quality, water temperature, water visibility, and pH value to detect the same event. These kinds of information can be obtained using sensors or devices with high quality and low price. In this case, the collection cost of sensing parameters undoubtedly affects the detection precision of the algae outbreak event.
Computing CCP requires mapping all kinds of information from both sensing parameters and collecting sensors/devices into the same number interval so that comparisons can be made at the same order of magnitude. Therefore, the sigmoid function is used to map all real-valued inputs into the interval [0, 1], and then the rank fraction function is applied to map ranking information into the interval [0, 1]. They are defined in (4) as the sigmoid function and the rank score function R(r, c), respectively, where r indicates the rank of a sensing parameter or a device, and c indicates the number of ranks. We first adjust input errors of devices and then carry out formatting and normalization. Finally, combining the sensing parameters' information, we compute the following quantities: t_rank, the rank of the technology maturity degree of sensors/devices; t_count, the number of technology maturity degrees; p_rank, the rank of the popularity degree of devices; p_count, the number of popularity degrees; d_rank, the rank of the difficulty grade for collecting sensing parameters; d_count, the number of difficulty grades; and gpa_score, the percentage converted from the price of devices. We can then compute the collection cost score of a sensor/device for a parameter using formula (5), and the final formula for computing the CCP impact weight is given in (6), where d_count = n, t_count = m, p_count = k, P denotes a kind of sensing parameter, I a sensor/device, and E a monitoring task or an event.
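The two normalization steps can be sketched as follows. The sigmoid below is the standard logistic function; the rank mapping `(c - r + 1) / c` is only an assumed form of the rank fraction function, since the exact definition of R(r, c) in (4) is not reproduced here. How the normalized values are combined into the collection cost score in (5) and (6) is deliberately left out.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function mapping a real input into [0, 1]."""
    # Two branches avoid overflow in exp() for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_score(r: int, c: int) -> float:
    """Assumed rank-fraction mapping: rank 1 (best) -> 1.0, rank c (worst) -> 1/c."""
    if not 1 <= r <= c:
        raise ValueError("rank must satisfy 1 <= r <= c")
    return (c - r + 1) / c


# Hypothetical ranks for one sensor/device.
t_score = rank_score(2, 5)   # technology maturity: rank 2 of 5 grades
p_score = rank_score(1, 4)   # popularity: rank 1 of 4 grades
d_score = rank_score(3, 3)   # collection difficulty: rank 3 of 3 grades
print(t_score, p_score, d_score, sigmoid(0.0))
```

Both functions return values in [0, 1], so rank-type and real-valued inputs become directly comparable, which is the purpose stated in the text.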

Historical Sensed Data in MPD-Model.
Though many WSN-based applications have employed machine learning methods to conduct recognition in WSNs [1,2], to the best of our knowledge, few works explicitly introduce Historical Sensed Data (or Historical Data), HSD for short, as the core component of a fusion model [6]. In our opinion, HSD should be involved in data fusion model design due to the following facts.
(i) In WSNs, long-term continuous monitoring will inevitably produce massive data. These data are important because they not only provide query services for users, but also lay a solid foundation of decision-making for future data mining.
(ii) In WSNs, most parameters, such as pulse, ECG, and physical image, have complex data patterns. Accordingly, diverse machine learning methods are needed to train, learn, and classify these patterns. Therefore, HSD should undoubtedly be involved in the data fusion procedure of WSNs systems.
(iii) Parameter types often determine which decision-making methods should be employed to carry out the health status detection task. Specifically, in the case of unattended monitoring, only some simple data processing methods can be employed. In the case of human sensing, complex factors such as users' social relationships might be considered and accordingly call for more complicated data processing methods (e.g., Time-Space Factor Graph Model).
To illustrate the role of HSD in data fusion tasks, we conduct comparative diagnosis experiments based on the AR [7] and WPT [8] methods to recognize the five kinds of human health statuses. SVM and user-defined rules are employed as decision-making methods. The comparison results, shown in Figure 2, indicate that SVM-based methods with HSD often have higher recognition accuracies (by 30∼40%) than user-defined rules without HSD.

Health Status Detection
Chinese Pulse Diagnosis Theory (CPDT) [9] has been used in clinical diagnosis for thousands of years and has proved to be valuable. According to CPDT, wrist-pulse data contains abundant physiological and pathological parameter information about the human body. Pulse information has different patterns, which can identify groups of people with a specific health status or symptom/disease. Thus, sensed pulse data can be leveraged to effectively detect human health status.

TFD-Pattern: A Novel Time Series Pattern for Health Status Detection.
In this paper, a unique kind of pulse pattern is designed based on CPDT [9] and our previous research on pulse diagnosis [1,5]. We first compute 90 time-domain feature variables based on the five feature points (see Figure 3). For example, the feature variable $T_1/T$ can be obtained from the T-point and the P-point. Then we compute 36 frequency-domain feature variables based on the discrete Fourier transform formula, as in (7), and the discrete power spectral density function, as in (8).
Finally, we obtain 6 feature variables for pressure value. Each of them is the average value of all the real-time external pulse pressures in one pulse waveform. These computations are suboperations of $f(\cdot)$ in Figure 1. We merge the three types of feature variables into one 132-dimensional pulse pattern that stands for a wrist-pulse waveform and call it the time-frequency-domain pattern, TFD-Pattern for short, as shown in Figure 4.
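As an illustration of the frequency-domain step and of how the three feature groups are merged, the sketch below computes a discrete Fourier transform and a periodogram-style power spectrum in plain Python, then concatenates 90 + 36 + 6 values into a 132-dimensional vector. The normalization $|X(k)|^2 / N$ is one common power spectral density estimate and is an assumption here, as are the function names; which spectral values make up the 36 frequency-domain variables is not specified in the text and is not modeled.

```python
import cmath
import math


def dft(samples):
    """Direct O(N^2) discrete Fourier transform of a real-valued sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def power_spectrum(samples):
    """Periodogram estimate |X(k)|^2 / N (assumed normalization)."""
    n = len(samples)
    return [abs(coeff) ** 2 / n for coeff in dft(samples)]


def build_tfd_pattern(time_feats, freq_feats, pressure_feats):
    """Concatenate 90 time-domain, 36 frequency-domain, and 6 pressure features."""
    if (len(time_feats), len(freq_feats), len(pressure_feats)) != (90, 36, 6):
        raise ValueError("expected 90, 36, and 6 feature variables")
    return list(time_feats) + list(freq_feats) + list(pressure_feats)


# Toy signal: one cosine cycle sampled 8 times.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
psd = power_spectrum(signal)
dominant_bin = max(range(1, 5), key=lambda k: psd[k])

pattern = build_tfd_pattern([0.0] * 90, [0.0] * 36, [0.0] * 6)
print(dominant_bin, len(pattern))
```

The direct transform is quadratic in the number of samples; on a real node or server an FFT routine would normally replace it, but the direct form keeps the correspondence with the textbook definition visible.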

SVM-Based Feature-Decision-Making Algorithms for Health Status Detection.
Since the SVM performs well on problems with small training sets, we conduct SVM-based identification experiments on pulse data sample groups with different human health statuses. Based on the decision function of SVM theory, as in (9), we propose a health status detection algorithm that runs on the central server. In (9), k is the number of pulse data samples, E is a set of feature vectors, and K may be one of the Linear, Quadratic, Polynomial, RBF, and MLP kernel functions. $\alpha_i y_i$ is the weight vector and $\sum_{i=1}^{n} \alpha_i y_i = 0$. Note that the algorithm was designed according to our proposed MPD-Model and corresponds to the decision-making element of the model (denoted $y_i$ there). Another characteristic of this algorithm is that it is based on our TFD-Pattern, which is totally different from the patterns used by other pulse diagnosis (or health status detection) methods [7,8].
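For readers unfamiliar with kernel SVMs, the sketch below evaluates the textbook decision rule $\operatorname{sign}\big(\sum_i \alpha_i y_i K(x_i, x) + b\big)$ with an RBF kernel. It only illustrates the general form, not the paper's formula (9): the support vectors, coefficients, bias, and `gamma` value are hand-picked toy numbers rather than trained values, and the function names are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decide(x, support_vectors, coeffs, bias, kernel=rbf_kernel):
    """Return +1 or -1 from sign(sum_i coeff_i * K(sv_i, x) + bias).

    Each coeff_i stands for the product alpha_i * y_i.
    """
    score = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coeffs)) + bias
    return 1 if score >= 0 else -1


# Toy model: one support vector per class; the coefficients sum to zero,
# matching the constraint sum_i alpha_i * y_i = 0.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
coeffs = [-1.0, 1.0]
bias = 0.0

print(svm_decide([1.9, 2.1], support_vectors, coeffs, bias))
print(svm_decide([0.1, 0.0], support_vectors, coeffs, bias))
```

In the setting described above, `x` would be a 132-dimensional TFD-Pattern and the coefficients would come from training on historical pulse samples.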

Distributed Multipreference-Driven Feature-Level Fusion Algorithms Based on Weighted Average Method
For $f(\cdot)$ over $R_1, R_2, \ldots, R_m$ in MPD-Model, we present MFA, a novel distributed and unified multipreference fusion algorithm based on the weighted average method. MFA first computes each parameter's impact weight for detecting health statuses using the probabilities of the four impact factors described above, as in (10), and then fuses all physiological parameters to generate parameter patterns for further health status detection decision-making, as in (11), where the initial values of all weights are set to 1 and $\sum_{i=1}^{m} \mathrm{impact\_weight}(P_i, D) = 1$. As feedback is received from users, the model uses the feedback data to optimize these weight values.
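A weighted average fusion step of this kind can be sketched as below. This is an illustration under assumptions, not formulas (10) and (11): raw weights are taken as given (how the four impact factors produce them is not modeled), and they are normalized so that they sum to 1, which is one way to reconcile "initial weights set to 1" with the sum-to-one constraint. The function names are hypothetical.

```python
def normalize_weights(raw_weights):
    """Scale non-negative raw weights so that they sum to 1."""
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in raw_weights]


def fuse_weighted_average(patterns, raw_weights):
    """Fuse equal-length feature vectors (one per parameter P_i) element-wise."""
    if len(patterns) != len(raw_weights):
        raise ValueError("need exactly one weight per pattern")
    dim = len(patterns[0])
    if any(len(p) != dim for p in patterns):
        raise ValueError("all patterns must have the same length")
    weights = normalize_weights(raw_weights)
    return [sum(w * p[j] for w, p in zip(weights, patterns)) for j in range(dim)]


patterns = [[1.0, 2.0], [3.0, 4.0]]                    # toy feature vectors
equal = fuse_weighted_average(patterns, [1, 1])        # all weights start at 1
biased = fuse_weighted_average(patterns, [3, 1])       # first parameter favored
print(equal, biased)
```

With equal initial weights the result is the plain mean; raising one weight pulls the fused pattern toward that parameter, which is the effect that feedback-driven weight optimization would exploit.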

Experimental Setup.
In our previous work, we developed a wearable healthcare monitoring sensor network system [1] for health status detection. This system helped us collect the real-world healthcare data sets in large-scale clinical experiments conducted at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS) in 2009 and at the 2010 Hi-tech Fair of China in Shenzhen City, respectively. The distribution of wrist-pulse data samples is shown in Figure 5.
We validate the effectiveness of the proposed methods by comparing them with the following baseline methods.
(i) Derivative-based method [11]. In this method, an automated derivative-based time-domain feature extraction method on wrist pulse waveforms is proposed to extract the magnitude-type features $h_i$ ($i = 1$ to $5$). Here, the set of $h_i$ is equal to the five feature points (see Figure 3) that are employed in our work.
(ii) AR-based method [7]. In this health status detection method, an autoregressive (AR) based method is proposed to extract the pulse signal features. The mean and variance of the prediction error are calculated and selected as features. The selected features are then taken as inputs to a support vector machine (SVM) for disease classification.
(iii) WPT-based method [8]. This health status detection method performs pulse waveform feature extraction based on the optimal Wavelet Packet Transform (WPT). Subband energies contained in the best basis are extracted as features, and SVM classifiers are trained to differentiate cholecystitis patients and nephritic syndrome patients from normal people.

Feature Extraction.
According to CPDT [9], we must first extract the five most important time-domain feature points (see Figure 3) before running the MFA algorithm in our health status detection. Thus, we propose FEA, an adaptive feature extraction algorithm for pulse data. It can be considered an instance of $P_i \to R_i$ ($1 \le i \le n$) in MPD-Model. Specifically, we formalize the slope variability as a trend decision function, as in (12). Then, the final desired feature points are selected from the candidate ones according to several adaptive parameters. The most important adaptive parameter, Threshold_of_PULSE, is used to find the PB-point, the TD-point, and the PD-point, and it is computed by (13). Another adaptive parameter, Amp_pulse, indicates the mean value of all pulse cycles in one pulse waveform. It is computed by (14).
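The general idea of a slope-based trend function followed by threshold-based selection can be sketched as follows. This is not FEA itself: formulas (12) to (14) are not reproduced here, so the trend function (sign of the first difference), the peak rule, and the threshold (a fixed fraction of the maximum amplitude standing in for Threshold_of_PULSE) are all assumptions made only for illustration.

```python
def trend(samples, i):
    """Sign of the slope between samples i and i+1: +1 rising, -1 falling, 0 flat."""
    delta = samples[i + 1] - samples[i]
    return (delta > 0) - (delta < 0)


def candidate_peaks(samples, threshold):
    """Indices where the trend turns from rising to non-rising and the value
    reaches the threshold."""
    peaks = []
    for i in range(1, len(samples) - 1):
        turns_down = trend(samples, i - 1) > 0 and trend(samples, i) <= 0
        if turns_down and samples[i] >= threshold:
            peaks.append(i)
    return peaks


# Toy waveform with a main peak (index 3) and a smaller secondary peak (index 6).
wave = [0, 2, 5, 9, 6, 4, 5, 3, 1, 0]
all_peaks = candidate_peaks(wave, threshold=0)
main_peaks = candidate_peaks(wave, threshold=0.6 * max(wave))
print(all_peaks, main_peaks)
```

Raising the threshold filters out the smaller secondary peak, which shows in miniature why an adaptive threshold matters when candidate points must be narrowed down to the desired feature points.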

Effectiveness of Adaptive Parameters.
To illustrate the effectiveness of Threshold_of_PULSE, we run the FEA algorithm on normal pulse data with 50 Hz, 100 Hz, and 1000 Hz sampling frequencies, respectively. The results are shown in Figure 6. From this figure, we can see that all the feature points can be accurately extracted. Similarly, according to our prior knowledge, we may set Amp_pulse = T/2 when an irregular slippery pulse is processed, and Amp_pulse = T/3 when an irregular normal pulse is processed, where T indicates the length of a pulse waveform cycle. The results in Figure 7 indicate that the FEA algorithm also performs well under different wrist-pulse conditions.

Performance Evaluation of Feature Extraction.
We compare the proposed FEA algorithm with a typical derivative-based method [11] by extracting the five feature points from the wrist-pulse samples of 354 healthy people. Table 1 shows the comparison of extraction accuracies for 10 random examples.
From Table 1, FEA outperforms the derivative-based method (+16%) for the following reasons. (1) Though the derivative-based method is also light-weight, it is vulnerable to heavy noise from human body activities. (2) FEA not only is light-weight and well suited to real-time pulse signal processing tasks, but also makes full use of adaptive parameters obtained by automatic computation or from prior knowledge.

Fusion Effectiveness.
In our experiments, the different types of data at each processing stage are saved into respective files. Considering existing information fusion estimators, for example, the sample-based estimator using measurements [12], we calculate data fusion/reduction ratios by comparing file sizes before and after processing sensed pulse data. The degree of reduction in file size directly shows the performance of the MFA algorithm. In Table 2, the first column gives the number of a wrist-pulse sample, the second column gives the file size before running the MFA algorithm, the third column gives the file size after running the MFA algorithm, and the final column gives the data fusion ratio. The results show that the MFA algorithm has an excellent fusion effect for wrist-pulse data.
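A ratio of this kind can be computed as in the sketch below. The formula `1 - size_after / size_before` is an assumed reading of "fusion/reduction ratio", since the exact definition is not given, and the byte counts are made-up values rather than entries of Table 2.

```python
def fusion_ratio(size_before: int, size_after: int) -> float:
    """Fraction of the original file size removed by processing (assumed definition)."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    if size_after < 0:
        raise ValueError("size_after must be non-negative")
    return 1.0 - size_after / size_before


# Hypothetical sizes in bytes for one pulse sample file.
ratio = fusion_ratio(size_before=2000, size_after=50)
print(f"{ratio:.2%}")
```

For files on disk, the two sizes could be obtained with `os.path.getsize` from the Python standard library.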

Health Status Detection Performance.
We make evaluations from the following aspects.    [13]) have great impacts on revealing disease/symptom information. Based on this investigation (see Figure 8), we further compute different weight values, which are listed in columns 2-6 of Table 3. According to formula (1), we detect four kinds of health statuses or diseases/symptoms: hypertension (H), coronary (Co), cirrhosis (Ci), and pregnancy (P). These weights are selected and grouped into the three weight solutions shown in Table 3. According to these weight solutions, we conduct comparative experiments and give the results in Figures 9 and 10. From the two figures, we can see that the greater the weight values of the related impact factor ($P_i$), the higher the health status detection precision. This indicates that our multipreference-driven, data fusion-based impact factor evaluation works well.

Comparative Evaluation.
In this paper, we compare our approach with the baseline AR [7] and WPT [8] methods to validate the effectiveness of the proposed MPD-Model. The two methods and ours all focus on feature extraction of wrist-pulse data and use SVM to distinguish patients with different diseases from normal people. The difference is that each method employs a distinct wrist-pulse pattern for recognition. Thus, we can obtain the solid comparison results shown in Figure 11. These comparative experiments are conducted on the real healthcare data sets mentioned above (see Figure 5). The CPU times of our method, the AR method, and the WPT method are approximately 17.3121 s, 164.7448 s, and 42.0289 s, respectively. Among all comparative experiments, our SVM-based health status detection method takes the least running time, which shows that it is light-weight and better suited to wearable healthcare monitoring systems based on WSNs. For hypertension, coronary, cirrhosis, and pregnancy symptoms, our method outperforms the AR and WPT methods (by 15∼30% on average). Thus, our method is more effective in detecting human health status. However, the identification accuracy of our method is a little lower than that of the two methods when detecting subhealth status. There are two reasons for this. (1) Low feature extraction accuracy for pulse waveform data makes it hard to clearly distinguish some pulse patterns from others. The AR and WPT methods do not have this problem because their algorithms for computing feature vectors are totally different from ours. (2) Generally, it is very difficult to distinguish people with subhealth status from normal ones, since such people often have no clear disease characteristics.

Effectiveness Evaluation.
Finally, feature variables with significant differences between the two groups, as listed in Table 4, are obtained by conducting a t-test statistical analysis. As Figure 3 shows, the feature variable $S_5/S$ indicates the ratio of the area between the TD-point and the T-point to the whole area of one pulse cycle. It has a clear physiological meaning: the reflected wave of blood flow in one heartbeat cycle is stronger in patients than in healthy people. This may be one reason why the blood pressure values of people with hypertension are always high. Another feature variable, real-time External Pulse Pressure (EPP), indicates that the real-time blood pressure values recorded by pulse-sensor nodes differ distinctly for the patients. It explains the pathological phenomenon that patients' blood pressure is higher than that of healthy people more reasonably than $S_5/S$ does.

Related Work
WSN-based healthcare solutions have become a hot research topic. The field has been well developed through research on activity recognition [14], physiological data gathering [15], pattern recognition [16], and so forth. Though many existing works collect blood pulse data for healthcare monitoring purposes, few study systematically how to fuse parameters from diverse sources based on impact factors. The five most important feature points (see Figure 3) are considered the basis of the objectification of pulse study by most researchers [17]. Currently, most extraction methods for time-domain features are derivative based [11]. This kind of method is easy to implement on sensor nodes, but is only suitable for regular pulse waveforms with a high Signal-to-Noise Ratio (SNR). To address the problem of limited resources, feature extraction in our work focuses on sensed pulse data with low SNR. The proposed algorithm is light-weight and adaptive. Much work focuses on pulse detection and classification. For example, the work in [18] proposed a Lempel-Ziv complexity analysis-based approach to classify seven pulse patterns that exhibit different rhythms. Besides, Chinese Pulse Diagnosis Theory (CPDT) has also been objectified to support pulse acquisition, feature extraction of pulse, and pulse diagnosis [17]. As a powerful method for pattern classification with small training sets, SVM is often leveraged to diagnose pulse signals/patterns that are computerized using Modified Auto-Regressive Models [7] or spatial and spectrum features [19].
Though some data fusion methods already exist, for example, for Crash Fault Correction [20], Privacy Protection [21], position-based aggregator node election in WSNs [22,23], and environmental monitoring [24], they are not suitable for Wearable Sensor Networks due to complicated sensed data patterns and high intelligence requirements [6]. Reference [25] also focuses on a multisource fusion model, but employs the Chi-square distribution with n degrees of freedom instead of the weighted average method. Reference [12] discusses information fusion estimation problems and provides useful estimation methods for our proposed data fusion model. The continuous classification of multisource sensed data in [26,27] inspired us to introduce historical data into the design of a multisource-driven data fusion model.

Conclusion
In this paper, we systematically investigate the problems of data fusion and health status detection and propose MPD-Model, a novel distributed multipreference-driven data fusion model for WSNs. Based on MPD-Model, we present the MFA algorithm (an instance of $f(\cdot)$ in MPD-Model), the FEA algorithm ($P_i$), and an SVM-based health status detection method ($y_i$). We also design TFD-Pattern, a novel time series pattern based on CPDT. It is a kind of human pulse pattern that serves as both FEA's output and MFA's input. Experimental results show that they all outperform baseline methods with the feature extraction accuracy of +86% and the system diagnosis accuracy of +75%, and that our proposed MPD-Model is reasonable and effective.