Wearable Inertial Sensors to Assess Standing Balance: A Systematic Review

Wearable sensors are revolutionizing the assessment of standing balance. The aim of this work is to review the state-of-the-art literature that adopts this new posturographic paradigm, i.e., the analysis of human postural sway through inertial sensors worn directly on the subject's body. After a systematic search of the PubMed and Scopus databases, two raters evaluated the quality of 73 full-text articles and selected 47 high-quality contributions. Inter-rater reliability was good (Cohen’s kappa = 0.79). This selection of papers was used to summarize the available knowledge on the types of sensors used and their positioning, the data acquisition protocols, and the main applications in this field (e.g., “active aging”, biofeedback-based rehabilitation for fall prevention, and the management of Parkinson’s disease and other balance-related pathologies), as well as the most commonly adopted outcome measures. A critical discussion of the validation of wearable systems against gold standards is also presented.


Introduction
Human balance in upright stance can be quantitatively evaluated by means of a posturographic examination. Posturography is the systematic measurement and interpretation of quantities that characterize postural sway in upright stance. In the clinical field, posturography is used to estimate fall risk in geriatric subjects [1] and to objectively evaluate balance-related disabilities (such as Parkinson's disease, concussion, and stroke) and rehabilitation protocols [2][3][4][5], while in sport science it is used to appraise subtle differences in the balance performance of athletes [6]. The increasing interest in the study of balance has led to a continuous evolution of the methods used to carry out this examination. Traditionally, posturography exploits a force plate to evaluate the body's postural sway by recording the trajectory of the Center of Pressure (COP), i.e., the point of application of the resultant ground reaction force [7]. Although the force plate is considered the gold standard for obtaining reliable balance measurements, it is expensive and heavy to transport, which makes it impractical in clinical settings and sport centers. In recent years, wearable sensors based on miniaturized Inertial Measurement Units (IMUs) or Magneto-Inertial Measurement Units (MIMUs) have been increasingly used in posturography, as shown by the large number of papers on this topic [8][9][10][11][12][13][14].
Subjects can easily wear these sensors on various body segments by means of elastic belts or Velcro® bands. The number of sensors and their positioning generally depend on the application considered.
A wearable inertial sensing unit typically includes accelerometers, gyroscopes, and magnetometers. A triaxial accelerometer measures the proper linear acceleration along the three axes of a sensor-fixed three-dimensional (3D) frame; the measured data include both motion and gravity components. A triaxial gyroscope measures the angular velocity (rate of turn) about the three axes of a sensor-fixed frame; by integrating these rates, the orientation of the sensor can be estimated, and this orientation is commonly expressed as Euler angles, i.e., "roll", "pitch", and "yaw". A magnetometer measures both the amplitude and the direction of the local magnetic field, whose components are likewise expressed in a sensor-fixed three-axis frame. Usually, accelerometer, gyroscope, and magnetometer measurements refer to a common three-axis frame fixed to the IMU.
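Since an accelerometer at rest senses only gravity, its three components can be used to estimate the tilt of the sensor (roll and pitch), whereas yaw cannot be observed from gravity and requires the magnetometer. The minimal Python sketch below assumes a sensor frame whose z axis points up when the unit is level and a ZYX (yaw-pitch-roll) Euler convention; the function name `tilt_from_gravity` is hypothetical and not taken from any of the reviewed systems.

```python
import math

def tilt_from_gravity(ax, ay, az):
    """Estimate roll and pitch (radians) from one static accelerometer sample.

    Assumptions: the sensor is at rest (it senses only gravity), its z axis
    points up when level, and Euler angles follow the ZYX convention.
    Yaw is not observable from gravity and is therefore not returned.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch

# Simulate the gravity reading of a sensor tilted by roll = 5 deg, pitch = -3 deg.
g = 9.81
roll_true, pitch_true = math.radians(5.0), math.radians(-3.0)
ax = -g * math.sin(pitch_true)
ay = g * math.cos(pitch_true) * math.sin(roll_true)
az = g * math.cos(pitch_true) * math.cos(roll_true)

roll_est, pitch_est = tilt_from_gravity(ax, ay, az)
print(round(math.degrees(roll_est), 6), round(math.degrees(pitch_est), 6))  # prints: 5.0 -3.0
```

During quiet standing the motion component is small, which is why this static approximation is often acceptable for an initial alignment check, but it degrades whenever the body accelerates.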
However, wearable sensors have not yet become a standard in posturography, because the accuracy of IMU-based balance assessment with respect to the gold-standard force platform has not been established. If proven accurate, wearable sensors would be ideal for balance measurements, since they are low cost and easily portable across different environments.
In the literature on balance control, fall risk assessment through wearable sensors is a widely debated topic [15][16][17][18][19][20][21][22][23]. Three systematic reviews specifically focused on the objective estimation of fall risk in geriatric populations: the first, dating back to 2013, addressed the use of inertial sensors for fall risk assessment [19]; the second, in 2017, addressed balance and fall risk assessment with mobile phone technology [20]; and the third, in 2018, considered novel sensing technologies for fall risk assessment in older adults [21]. Another review provided insight into the detection of "near falls" (slips, trips, stumbles, and temporary loss of balance) using wearable devices [22]. A further review examined activity trackers for senior citizens [23], covering the monitoring of various physical activity indicators as well as fall detection and prediction.
Among the various pathologies affecting balance performance, Parkinson's Disease (PD) is widely recognized as a condition that may greatly benefit from innovative clinical management based on wearable monitoring technologies. A systematic review published in 2013 discussed wearable technology and the principal postural parameters that should be analyzed when assessing PD [24]. Another systematic review, published in 2015, analyzed the use of wearable sensors for assessing both standing balance and walking stability in people with PD [25]. Finally, a systematic review published in 2016 highlighted the characteristics and validity of monitoring technologies for assessing PD [26]. Multiple Sclerosis (MS), another condition commonly associated with balance disorders, may also strongly benefit from wearable sensing technology. A recent systematic review, published in 2018, analyzed the validity of wearable sensors for tracking mobility and balance in patients with MS [27]. Besides the constant need of rehabilitation professionals for reliable balance outcome measures, there is growing interest in the development of wearable systems specifically designed for the "active aging" market. These systems may address both healthy and pathological populations. In this context, balance training based on wearable sensors and biofeedback is a promising field of investigation. In 2016, a systematic review focused on the balance improvement effects of biofeedback systems with wearable sensors [28]. In 2018, a systematic review and meta-analysis of randomized controlled trials in both healthy and patient populations provided valuable knowledge on the effects of wearable sensor-based balance and gait training on balance, gait, and functional performance [29]. In addition, a review specifically analyzed smartphone applications for body balance assessment [30].
Despite the expanding body of evidence supporting the use of wearable sensors to assess postural balance, this area of research is still developing. As described above, several systematic reviews have been published in recent years, focusing on the postural balance of populations affected by different balance-related pathologies. This study extends previous efforts by reviewing a large number of papers that use wearable sensors to assess postural balance and by providing a detailed overview of their most commonly reported applications. The objectives of this work are (1) to select high-quality papers that adopt wearable inertial sensors for quantitatively evaluating standing balance; (2) to highlight the most important clinical applications in the framework of the fast-growing consumer market of IMUs, including rehabilitation and biofeedback; (3) to investigate the most common sensor placements and test protocols; (4) to describe the main parameters and outcome measures adopted; (5) to indicate which works perform a validation against a gold standard or a clinical score; and (6) to suggest future design directions for IMU-based wearable systems.

Methods
This review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [31].

Search Strategy
The PubMed and Scopus electronic databases were searched in February 2019 to identify articles measuring postural balance through wearable IMU sensors. The following keywords were searched for within the title and/or the abstract: "posturography", "postural sway", "postural control", "balance", "IMU", "MIMU", "inertial sensor", "accelerometer", "sensor", "wearable", "smartphone", and "activity tracker". Specifically, the query used to search the databases was ("posturography" OR "postural sway" OR "postural control" OR "Balance") AND ("IMU" OR "inertial measurement unit" OR "MIMU" OR "magneto inertial measurement unit" OR "inertial sensor" OR "accelerometer" OR "wearable sensor" OR "smartphone" OR "activity tracker"). In addition to the electronic database search, the reference lists of all identified articles were searched by hand to identify additional relevant studies. The literature search was conducted by M.G.
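The Boolean query above is the conjunction (AND) of two OR-groups of quoted terms. As a sketch only, it can be assembled programmatically so that the same term lists are reused consistently across databases; the helper `build_query` is hypothetical and is not part of any PubMed or Scopus API.

```python
def build_query(*groups):
    """Join the quoted terms of each group with OR, then join the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)

# Term lists copied from the query reported in the text.
balance_terms = ["posturography", "postural sway", "postural control", "Balance"]
sensor_terms = [
    "IMU", "inertial measurement unit", "MIMU",
    "magneto inertial measurement unit", "inertial sensor",
    "accelerometer", "wearable sensor", "smartphone", "activity tracker",
]

query = build_query(balance_terms, sensor_terms)
print(query)
```

Field tags that restrict the search to titles and abstracts differ between PubMed and Scopus and are deliberately left out of this sketch.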

Study Selection and Quality Assessment
After the initial electronic database search was completed, one rater (M.G.) screened the title and abstract of each retrieved article and decided whether the study was suitable for inclusion in this review. Articles were excluded if they (i) were not written in English, (ii) were abstracts and/or conference proceedings, (iii) were review articles or case studies, (iv) were similar to other studies, (v) were published before January 2010, (vi) were not available in full text, (vii) enrolled fewer than 10 subjects, (viii) were not ranked on Thomson Reuters, or (ix) did not use any form of wearable sensor to measure variables associated with standing balance. Furthermore, articles were excluded if they were out of topic with respect to the aims of the present review, i.e., the study of standing balance using wearable sensors. Hence, we excluded studies focused on gait analysis, walking balance, fall detection, anticipatory postural adjustments, and other dynamic tasks such as sit-to-stand and Timed Up and Go (TUG) tests. If a study included both gait and balance analysis, only the balance part of the study was considered.
The full texts of the articles that met the initial inclusion criteria were retrieved and imported into Mendeley Desktop 1.19.4 for further screening. To further narrow down the large number of available studies, a quality assessment was performed for each included article. Full-text articles were independently assessed by two raters in terms of internal, statistical, and external validity [32] (V.A. and L.G. for papers with clinical applications, S.P. and L.G. for the remaining papers). Internal validity concerns possible biases in the research design and methods, statistical validity concerns the statistical significance of the results, and external validity concerns the generalizability of the study [33]. Each rater answered a 15-item checklist similar to those commonly used in the literature for systematic reviews and/or meta-analyses [34][35][36][37][38], modified for the specific review topic. In particular, the proposed checklist (Table 1) provided information on (i) internal validity (questions 1, 3-6, and 9-11); (ii) statistical validity (questions 12-15); and (iii) external validity (questions 2-4 and 6-8). Each item of the checklist had to be answered with "Y", "N", or "Maybe", corresponding to scores of 1, 0, and 0.5, respectively. For each article, the total score was computed as the sum of the scores of all items in the checklist. Once each rater had completed the quality assessment, Cohen's kappa statistic [39] was used to quantify the degree of agreement between raters.
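The scoring rule above (Y = 1, Maybe = 0.5, N = 0, summed over the 15 items) and Cohen's kappa can be sketched as follows. The text does not state whether kappa was computed on item-level answers or on another unit; this sketch assumes item-level categorical answers, and the function names and example answers are hypothetical.

```python
from collections import Counter

SCORE = {"Y": 1.0, "Maybe": 0.5, "N": 0.0}

def checklist_score(answers):
    """Total quality score: sum of item scores (Y = 1, Maybe = 0.5, N = 0)."""
    if len(answers) != 15:
        raise ValueError("expected answers to all 15 checklist items")
    return sum(SCORE[a] for a in answers)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if p_e == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical answers of two raters for one article.
a = ["Y"] * 12 + ["Maybe", "N", "Y"]
b = ["Y"] * 11 + ["Maybe", "Maybe", "N", "Y"]
print(checklist_score(a), checklist_score(b))  # prints: 13.5 13.0
print(round(cohens_kappa(a, b), 3))            # prints: 0.773
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most items tend to be answered "Y" by both raters.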
For each article, the final quality-assessment score was computed as the average of the scores assigned by each reviewer. The analysed articles were then divided into three different classes based on the final quality-assessment score: (i) "high quality" (final score >10), (ii) "medium quality" (final score between 5 and 10), and (iii) "low quality" (final score <5). Only articles classified as "high quality" were included in the present review.
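Averaging the two raters' totals and thresholding the result takes only a few lines. In this sketch the boundary values 5 and 10 are treated as "medium quality", which is one reading of "between 5 and 10"; the function names and the example totals are hypothetical.

```python
def final_score(score_rater1, score_rater2):
    """Final quality score: the mean of the two raters' checklist totals."""
    return (score_rater1 + score_rater2) / 2.0

def quality_class(score):
    """Map a final score (0-15) to the review's three quality classes.

    Scores of exactly 5 or 10 are treated as "medium" (an assumed reading
    of "between 5 and 10").
    """
    if score > 10:
        return "high"
    if score < 5:
        return "low"
    return "medium"

# Hypothetical rater totals for four articles.
pairs = [(13.5, 13.0), (10.0, 10.5), (9.0, 10.0), (4.0, 5.0)]
classes = [quality_class(final_score(s1, s2)) for s1, s2 in pairs]
print(classes)  # prints: ['high', 'high', 'medium', 'low']
```

Only articles labelled "high" would be retained for the review.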

Search Results and Study Selection
A detailed flow diagram illustrating the search results and the screening strategy is provided in Figure 1. A total of 696 potentially relevant articles were identified. The initial screening of titles and abstracts removed 204 studies according to the previously stated exclusion criteria: (ii) abstracts or conference proceedings (46 articles), (iii) systematic reviews or case studies (34 articles), (iv) duplicated studies (26 articles), (v) studies published before January 2010 (25 articles), (vi) unavailable full text (7 articles), (vii) studies that enrolled fewer than 10 subjects (30 articles), (viii) studies not ranked on Thomson Reuters (32 articles), and (ix) studies that did not use any form of wearable sensor to measure variables related to standing balance (4 articles). A further 419 articles were removed because they were out of topic. The remaining 73 articles were reviewed in full text to assess their inclusion in the review after the quality check (details are in the next section). Finally, 47 high-quality articles were included in this systematic review.
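The counts in the flow diagram are internally consistent. This quick arithmetic check uses only numbers copied from the text and reproduces the 204 records removed at screening and the 73 full-text articles:

```python
identified = 696
excluded_by_criterion = {
    "(ii) abstract or conference proceedings": 46,
    "(iii) review or case study": 34,
    "(iv) duplicated study": 26,
    "(v) published before January 2010": 25,
    "(vi) full text unavailable": 7,
    "(vii) fewer than 10 subjects": 30,
    "(viii) not ranked on Thomson Reuters": 32,
    "(ix) no wearable sensor for standing balance": 4,
}
out_of_topic = 419
included_high_quality = 47

screened_out = sum(excluded_by_criterion.values())
full_text = identified - screened_out - out_of_topic
not_retained = full_text - included_high_quality
print(screened_out, full_text, included_high_quality, not_retained)  # prints: 204 73 47 26
```

The 26 full-text articles that were not retained match the 24 medium-quality and 2 low-quality articles reported in the quality assessment.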

Quality Assessment Results
Internal, statistical, and external validity were evaluated by the two raters for each of the 73 full-text papers analysed. The quality assessment is summarized in Table 2. Based on the final quality-assessment score, each article was classified as low, medium, or high quality. Forty-seven articles (64.4%) were classified as high-quality contributions, 24 articles (32.9%) as medium-quality contributions, and 2 articles (2.7%) as low-quality contributions. The detailed results of the quality assessment performed by the raters on the 73 full-text articles are reported in Table S1 (articles included in the systematic review) and Table S2 (articles not included in the systematic review). The inter-rater agreement, computed by means of Cohen's kappa, was 0.79, indicating good agreement between raters. After the quality assessment, 47 articles (those classified as "high quality") were included in this review, with an average quality score of 13 ± 1 (maximum possible score: 15).
A summary of the main characteristics of the articles included is reported in Table 3.

Sample Population Characteristics
As detailed in Table 3, sample population characteristics and sizes varied across the included articles. The subjects enrolled in these studies consisted of healthy, young, and/or older adults (mean age between 15 and 78 years), persons with sport-related concussions, and patients with Parkinson's Disease (PD), Multiple Sclerosis (MS), ankle sprain, Traumatic Brain Injury (TBI), Diabetic Peripheral Neuropathy (DPN), degenerative cerebellar ataxia, stroke, high fall risk, and haemophilia. The studies involving these patient populations are summarized in Table 4. The most commonly reported balance disorders were Parkinson's disease (14 articles), degenerative cerebellar ataxia (4 articles), sport-related concussion (4 articles), and diabetic peripheral neuropathy (3 articles). Among the 47 included studies, 29 articles (61.7%) compared the standing balance of pathological subjects with that of a healthy control population, 11 articles (23.4%) assessed standing balance only in healthy subjects, and 7 articles (14.9%) assessed standing balance only in pathological subjects. Sample size ranged from 10 (the minimum allowed by the exclusion criteria) to 135 subjects.

Sensor Type and Placement
Several types of wearable sensors were used to assess standing balance: inertial motion sensors equipped with accelerometers, gyroscopes, and magnetometers; standalone multiaxial accelerometers; and smartphones equipped with inertial sensors. Of the 47 included articles, 26 (55.3%) used commercial inertial sensors, 13 (27.7%) used commercial 3D accelerometers, and the remaining 8 (17.0%) used homemade one-dimensional or two-dimensional accelerometers. The most commonly used inertial sensors were Opal (APDM Wearable Technologies; 10 articles), MTx (Xsens, Enschede; 8 articles), and BalanSens (BioSensics LLC; 3 articles). A wide range of sampling frequencies (from 10 Hz to 1000 Hz) was used to acquire the signals during standing balance measurements, with 100 Hz being the most common.
Sensor placement likewise varied across the experimental protocols, depending on the postural task. Among the 47 included articles, 38 (80.9%) placed the wearable sensors on the lower back near the center of mass (e.g., lumbar region of the trunk at L5 and sacral region at S2), 15 (31.9%) on the lower limbs (e.g., thigh, malleolus, and shank), 7 (14.9%) on the sternum, 5 (10.6%) on the upper back (e.g., thoracic region of the trunk at Th4), 3 (6.4%) on the upper limbs (e.g., wrists), and 1 (2.1%) on the forehead; these percentages sum to more than 100% because several studies used more than one sensor. Figure 2 shows all the sensor placements used in the reviewed articles. All wearable sensors were attached to the subjects by means of elastic belts or Velcro® bands. Further details on the type and placement of the wearable sensors used in the included articles are summarized in Table 3.

Parameters for Standing-Balance Assessment
Several parameters were calculated from the signals acquired through the wearable sensors to assess standing balance. The acquired signals were usually low-pass filtered by means of digital filters with cut-off frequencies ranging between 0.5 Hz and 10 Hz. The most commonly reported parameters computed from the filtered acceleration signals were the Root-Mean-Square (RMS) (21 articles), expressed in m/s², the jerk index (8 articles), expressed in m²/s⁵, the range of accelerations (8 articles), expressed in m/s², the centroidal frequency (7 articles), expressed in Hz, and the frequency dispersion (6 articles). The most frequently used parameter computed from the velocity signals (first integral of acceleration) was the mean sway velocity (12 articles), expressed in m/s. The most commonly reported parameters computed from the displacement signals (second integral of acceleration) were RMS (6 articles), expressed in mm, sway area (5 articles), expressed in mm², mean distance (5 articles), expressed in mm, and sway path length (4 articles), expressed in mm.
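The acceleration-, velocity-, and displacement-based parameters listed above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the algorithm of any specific reviewed study: the input is assumed to be already low-pass filtered and expressed along the AP and ML axes in m/s²; the jerk index is taken as half the time integral of the squared jerk (consistent with the m²/s⁵ unit); mean sway velocity is taken as the mean horizontal speed; and integration drift is limited by simple linear detrending, whereas published pipelines often use high-pass filtering instead. The function names are hypothetical.

```python
import numpy as np

def _detrend(x, t):
    """Remove the least-squares linear trend (offset and drift) from x."""
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def _cumtrapz(x, dt):
    """Cumulative trapezoidal integral of x, starting from zero."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (x[1:] + x[:-1]) * dt)))

def sway_metrics(acc_ap, acc_ml, fs):
    """Time-domain sway parameters from low-pass filtered AP/ML acceleration.

    Inputs are in m/s^2 sampled at fs Hz; displacement outputs are in metres
    (multiply by 1000 for mm).
    """
    ap = np.asarray(acc_ap, dtype=float)
    ml = np.asarray(acc_ml, dtype=float)
    dt = 1.0 / fs
    t = np.arange(ap.size) * dt
    ap, ml = ap - ap.mean(), ml - ml.mean()

    # Acceleration-based parameters.
    out = {
        "rms_ap": np.sqrt(np.mean(ap ** 2)),
        "rms_ml": np.sqrt(np.mean(ml ** 2)),
        "range_ap": ap.max() - ap.min(),
        "range_ml": ml.max() - ml.min(),
    }
    jerk_ap, jerk_ml = np.diff(ap) / dt, np.diff(ml) / dt
    # Half the time integral of squared jerk (m^2/s^5), an assumed definition.
    out["jerk_index"] = 0.5 * np.sum(jerk_ap ** 2 + jerk_ml ** 2) * dt

    # First integral (velocity) and second integral (displacement), each
    # detrended to limit integration drift.
    vel_ap = _detrend(_cumtrapz(ap, dt), t)
    vel_ml = _detrend(_cumtrapz(ml, dt), t)
    dis_ap = _detrend(_cumtrapz(vel_ap, dt), t)
    dis_ml = _detrend(_cumtrapz(vel_ml, dt), t)

    out["mean_velocity"] = np.mean(np.hypot(vel_ap, vel_ml))  # m/s
    out["path_length"] = np.sum(np.hypot(np.diff(dis_ap), np.diff(dis_ml)))  # m
    out["rms_displacement"] = np.sqrt(np.mean(dis_ap ** 2 + dis_ml ** 2))  # m
    return out

# Synthetic check: 1 Hz AP sway of 1 cm amplitude, 10 s at 100 Hz, no ML sway.
fs = 100.0
t = np.arange(1000) / fs
A, w = 0.01, 2 * np.pi * 1.0
m = sway_metrics(-A * w ** 2 * np.sin(w * t), np.zeros_like(t), fs)
print({k: round(float(v), 4) for k, v in m.items()})
```

For this synthetic sine, the displacement-based values should approach their analytic counterparts (path length of about 40 A and RMS displacement of about A/√2); on real recordings, the choice of drift correction strongly affects the integrated measures.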
A summary and a brief description of the balance parameters used in at least two articles is provided in Table 5, with indication of the corresponding references. Parameters used only by a single article were not reported.
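For the two frequency-domain parameters mentioned above, a commonly used definition relies on the spectral moments μ0, μ1, and μ2 of the sway power spectrum: centroidal frequency = sqrt(μ2/μ0) and frequency dispersion = sqrt(1 - μ1²/(μ0·μ2)). The sketch below assumes a plain periodogram and an analysis band of 0.15-5 Hz; both the spectral estimator and the band limits are assumptions, and individual studies may differ. The function names are hypothetical.

```python
import numpy as np

def spectral_moments(x, fs, band=(0.15, 5.0)):
    """Spectral moments mu0, mu1, mu2 of a sway signal within a frequency band."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2  # unscaled periodogram; scaling cancels below
    mask = (freqs >= band[0]) & (freqs <= band[1])
    f, p = freqs[mask], power[mask]
    return np.sum(p), np.sum(f * p), np.sum(f ** 2 * p)

def centroidal_frequency(x, fs, band=(0.15, 5.0)):
    """Frequency (Hz) around which spectral power is concentrated: sqrt(mu2/mu0)."""
    mu0, _, mu2 = spectral_moments(x, fs, band)
    return np.sqrt(mu2 / mu0)

def frequency_dispersion(x, fs, band=(0.15, 5.0)):
    """Dimensionless spectral spread: 0 for a pure tone, approaching 1 for broadband."""
    mu0, mu1, mu2 = spectral_moments(x, fs, band)
    return np.sqrt(max(0.0, 1.0 - mu1 ** 2 / (mu0 * mu2)))  # clamp rounding error

# Synthetic check: a pure 1 Hz tone, 20 s at 100 Hz.
fs = 100.0
t = np.arange(2000) / fs
tone = 0.05 * np.sin(2 * np.pi * 1.0 * t)
print(round(centroidal_frequency(tone, fs), 3), round(frequency_dispersion(tone, fs), 3))
```

Because both parameters are ratios of moments, the absolute scaling of the periodogram does not matter, which is why an unscaled spectrum is sufficient here.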

Validation Against a Gold Standard
Validation against a gold standard (e.g., force plate and/or clinical score) was performed by some authors to check the sensitivity and experimental validity of the accelerometric measures (acquired through inertial sensors) compared with standard laboratory measures (COP and clinical scores). Among the 47 articles included in the review, only 17 validated their results against a gold standard: 10 articles (21.3%) against a force plate (e.g., AMTI AccuSway-O, Kistler, and Synapsis Posturography System) and 7 articles (14.9%) against a clinical score (e.g., Balance Error Scoring System (BESS) and Berg Balance Scale (BBS)). Among the articles that included a validation against a gold standard, 4 articles (8.5%) also compared the test-retest reliability of wearable-sensor and force-plate measurements. A summary of the articles that included a validation against a gold standard is reported in Table 6.
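Test-retest reliability, and agreement between wearable-sensor and force-plate measures, are often summarized with an intraclass correlation coefficient (ICC). As an illustration, the sketch below implements the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss notation. Which ICC form each reviewed study used is not stated here, so this choice is an assumption, and the function name is hypothetical.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` is an (n subjects x k sessions or raters) array.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between sessions
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Textbook example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))  # prints: 0.29
```

The absolute-agreement form penalizes systematic offsets between sessions, which is usually what matters when a wearable measure is meant to be repeatable across visits.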

Discussion
This review showed that the literature contains a large body of high-quality papers (47 articles) evaluating postural balance through wearable sensors. We obtained good inter-rater agreement in the quality assessment of the full-text papers analysed (Cohen's kappa = 0.79), meaning that the raters had only minor discrepancies in their judgments of the internal, statistical, and external validity of the articles examined.
The authors believe that, in clinical practice, the advantages of using wearable-sensor outcome measures of balance instead of subjective clinical scores are evident. Wearable sensors can provide a large amount of data that, if properly processed and correctly interpreted, may allow balance performance to be assessed in a more useful, accurate, reliable, and repeatable manner. Indeed, wearable sensors make it possible to easily include a large number of subjects and task repetitions, to collect data outside the lab, to engage patients in more personalized rehabilitation protocols, and to promote active aging and fall prevention among older adults.
Among the many different applications, postural sway assessment through wearable sensors emerged as particularly important for patients with Parkinson's disease. This is not surprising, considering the difficulties clinicians may face in prescribing the correct levodopa dose and its fractionation, and in adjusting therapy at follow-up to control both patient symptoms and the effects that the drug itself may have on balance performance [44,45,47,48,52,53,56,69-71,74,76,77,79].
Wearable sensor technology is widely available at low cost. In the simplest applications, the inertial sensors embedded in smartphones can be used to measure postural sway [49,65,66,69,76]. More recently, a number of wearable systems have been specifically designed to perform instrumented balance analysis; in some cases, these systems were customized for specific applications in the rehabilitation field, including systems relying on biofeedback.
From a technical perspective, while reviewing the articles for this work, the authors noticed a general lack of information on sensor calibration procedures. Commonly, two of the IMU sensing axes were oriented along the mediolateral (ML) and anteroposterior (AP) anatomical directions, and the third axis along the vertical direction (i.e., the gravity line). Since balance tasks involve quiet standing trials and small sway angles, measurements in the ML and AP directions are ideally unaffected by gravitational acceleration. In the reviewed papers, it was generally assumed that the components of gravitational acceleration in the AP and ML directions due to sensor misalignment were negligible, and little or no information was provided on this important aspect. A rigorous measurement approach requires that the estimated orientation of the sensor axes with respect to a fixed global frame be used to rotate the measured acceleration from the sensor-fixed frame to the global frame, and that gravitational acceleration then be subtracted to obtain the net motion acceleration.
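The rigorous approach described above can be sketched as follows. The sketch assumes that an orientation estimate (roll, pitch, yaw) is already available from a sensor-fusion filter (not shown), that the accelerometer outputs specific force, and that the global frame is z-up with x along AP and y along ML; the helper names and the ZYX Euler convention are illustrative assumptions. The example also quantifies the misalignment issue: a sensor at rest tilted by only 1° reports about g·sin(1°) ≈ 0.17 m/s² on its AP axis, which the rotation removes.

```python
import numpy as np

G = 9.81  # assumed gravitational acceleration, m/s^2

def rotation_zyx(roll, pitch, yaw):
    """Sensor-to-global rotation matrix, ZYX (yaw-pitch-roll) Euler convention."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx

def net_acceleration(acc_sensor, roll, pitch, yaw):
    """Rotate a specific-force reading into the z-up global frame and remove gravity."""
    acc_global = rotation_zyx(roll, pitch, yaw) @ np.asarray(acc_sensor, dtype=float)
    return acc_global - np.array([0.0, 0.0, G])

# A sensor at rest but pitched by 1 degree, e.g. from imperfect alignment at L5.
pitch = np.radians(1.0)
reading = rotation_zyx(0.0, pitch, 0.0).T @ np.array([0.0, 0.0, G])
print("uncorrected AP reading:", reading[0])  # gravity leaking into the AP axis
print("net acceleration:", net_acceleration(reading, 0.0, pitch, 0.0))
```

The transpose of the rotation matrix maps the global gravity vector into the tilted sensor frame, which is how the simulated reading is generated before being corrected.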
A variety of research protocols was found in the examined articles. In many practical situations, a single sensor positioned on the lower back of the subject, mostly at the L5 level, is used to perform the posturographic examination. In some papers, additional sensors are placed on the lower limbs to assess the postural strategy (e.g., hip or ankle strategy) [44,45,59,60]. A few articles also report sensor placement on the upper limbs and trunk; in these cases, the additional aims are the assessment of the base of support [61], trunk tilt [47], an objective BESS [46], and the correction of the vertical position of the Center of Mass (COM) [56]. In most cases, subjects are asked to maintain double-leg stance for 30 s. Although it is widely recognized in the literature that the position of the feet on the support surface heavily influences postural sway, since it modifies the base of support, a standard feet position has not been fully established. Indeed, the feet position is sometimes not even reported in the study protocol (10 papers did not report this information). Typical feet positions in double-leg stance are: (1) feet together (opening angle: 0°, inter-malleolus distance: 0 cm); (2) feet opening angle ranging from 10° to 30° (the latter being the most frequent value), with inter-malleolus distance ranging from 0 cm up to 10 cm; (3) self-selected feet position; and (4) footprint, i.e., the same template position for every subject. Although the feet-together position might seem easy to standardize, it can be challenging for subjects suffering from balance-related disabilities, who may prefer keeping their feet apart to maintain balance. Furthermore, keeping the feet apart in a comfortable self-selected position seems to provide an ecological test condition, closer to real-life upright stance.
The obvious drawback of this choice is that balance performance may be biased by the subjective selection of the base of support (the larger the base of support, the better the balance performance). The issues discussed above are probably the main reasons why researchers have not yet reached a consensus on feet positioning during the examination of postural sway. The same debate has characterized "traditional" posturography, i.e., posturography performed with force plates. However, since many current and forthcoming applications based on wearable sensors will be carried out outside the lab, in uncontrolled environments, with subjects tested at home and/or during their habitual activities of daily living, the self-selected feet position might still be the best compromise. Typically, at least two test conditions are considered, i.e., eyes open and eyes closed, to estimate the effect of visual deprivation on balance. In some cases, a foam surface is used in addition to a firm surface to stimulate the proprioceptive system of subjects differently during the postural balance task. In other cases, subjects stand in tandem, semi-tandem, or single-leg stance (on the dominant side, on the contralateral side, or alternating both) to challenge their balance control. In a few studies, a dual-task protocol is also introduced, e.g., asking subjects to count down by 3 from 100 while standing upright, to study the interference of a concomitant cognitive load with balance.
Most of the outcome measures introduced in the analysed studies are based on accelerometric signals; a few studies use gyroscope signals, and magnetometer signals are only rarely mentioned. The most frequently used outcome measure is the Root-Mean-Square (RMS) calculated from acceleration signals. This parameter is typically evaluated separately for the anteroposterior and mediolateral directions; in some cases, the total RMS is reported. With regard to acceleration signals, it should be noted that a direct comparison with traditional force-platform (COP) signals is not possible [87], because parameters obtained from acceleration and COP signals estimate different physical quantities. Furthermore, wearable sensors are placed on the body (most commonly on the lower back at the L5 level, close to the COM), whereas COP signals originate at the support surface, between the feet and within the base of support. In some cases, a one-link or two-link inverted pendulum model is applied in an attempt to bridge this gap [55,56,62,71,76,77,79,81,84]. The fact that acceleration signals from wearable sensors and COP signals from a force platform cannot be directly compared is not a problem in itself if the concept of a new, wearable-based posturography is introduced. By this, the authors mean that, as long as wearable sensors provide useful information on postural balance, it is irrelevant that this information is based on parameters that are not directly comparable with those used in traditional posturography. This point of view is supported by valuable contributions such as the Instrumented test of Postural Sway (ISway) proposed by Mancini et al. in 2012 [71].
The basic idea of this kind of approach is that the new wearable technology, introducing an IMU-based assessment of postural sway, is mature enough to "replace" clinical balance scales and scores without the limitations of the traditional posturographic approach.
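The single-link inverted pendulum mentioned above relates COM kinematics to COP position. One textbook point-mass, small-angle formulation is x_COP = x_COM - (h/g)·a_COM, where h is the COM height above the ankles. The sketch below implements only this relation and is not necessarily the model used in the cited studies; using the L5 sensor height as h, and obtaining x_COM by integrating the measured acceleration, are assumptions that a real pipeline would need to justify. The function name is hypothetical.

```python
import numpy as np

def cop_from_com(x_com, a_com, h, g=9.81):
    """Single-link, point-mass inverted-pendulum estimate of COP position.

    Small-angle approximation: x_cop = x_com - (h / g) * a_com, where h is the
    COM height above the ankle axis (m), and x_com (m) and a_com (m/s^2) are the
    horizontal COM displacement and acceleration along the same axis.
    """
    return np.asarray(x_com, dtype=float) - (h / g) * np.asarray(a_com, dtype=float)

# Hypothetical 1 Hz AP sway of the COM (1 cm amplitude) with the COM at 1 m.
fs, h = 100.0, 1.0
t = np.arange(1000) / fs
w = 2 * np.pi * 1.0
x_com = 0.01 * np.sin(w * t)
a_com = -0.01 * w ** 2 * np.sin(w * t)
x_cop = cop_from_com(x_com, a_com, h)
print(np.max(np.abs(x_cop)) / np.max(np.abs(x_com)))  # about 5.02, i.e. 1 + h*w**2/g
```

The example illustrates why acceleration-based and COP-based amplitudes cannot be compared one-to-one: in this model, the ratio between COP and COM excursions grows with the square of the sway frequency.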
Moreover, analysis of the most discriminative parameters associated with different balance disorders shows that, in PD populations, the parameters that best discriminate postural sway in the time domain are the jerk index [48,70,71,77], the sway amplitude [56,77], and the range of the acceleration signals [76], while in the frequency domain they are frequency dispersion [70,77] and centroidal frequency [71,79]. People with MS show increased sway acceleration amplitude [83], and, among instrumented standing-balance measures, spatiotemporal measures were the most reliable, whereas frequency measures were less reliable [50]. Individuals with concussion displayed increased normalized path lengths of the acceleration signal in the AP [42] and ML [68] directions, as well as wider sway volume and area of the acceleration signals [54]. Among the high-quality articles selected in this review, 17 (36%) included a validation of wearable sensors against a gold-standard approach: 10 compared the performance of wearable sensors and force plates for postural balance assessment, while 7 focused on correlations with clinical scores or scales (such as BESS or BBS). For the former, investigations were frequently limited to evaluating the repeatability of the wearable-sensor approach compared with traditional COP measurements, through intra-class correlation coefficients or analogous measures. The authors noticed a lack of information on the sensitivity of wearable systems compared with traditional force-plate approaches. This is nevertheless a crucial aspect: especially in the clinical field, the "least detectable change" of an outcome measure must be small enough for the specific application under consideration. Hence, the authors believe that one open issue in this research field is the sensitivity of wearable systems with respect to traditional gold-standard force plates.
Future studies should investigate this aspect in more depth.
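The smallest change that an outcome measure can reliably detect is commonly derived from test-retest data as the standard error of measurement, SEM = SD·sqrt(1 - ICC), and the minimal detectable change at 95% confidence, MDC95 = 1.96·sqrt(2)·SEM. The sketch below applies these standard formulas; the function names and the numbers in the example are hypothetical and are not taken from the reviewed studies.

```python
import math

def sem(sd, icc):
    """Standard error of measurement from the between-subject SD and test-retest ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)

def mdc95(sd, icc):
    """Minimal detectable change (95% confidence) for an individual subject."""
    return 1.96 * math.sqrt(2.0) * sem(sd, icc)

# Hypothetical example: RMS acceleration with SD = 0.02 m/s^2 and ICC = 0.75.
print(round(mdc95(0.02, 0.75), 4))  # prints: 0.0277
```

Computing MDC95 in the same units for a wearable measure and its force-plate counterpart, and comparing each with the change expected from an intervention, is one way to address the sensitivity question raised above.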

Conclusions
After a quality assessment of the selected papers, we summarized the state-of-the-art knowledge on wearable sensors used to evaluate standing balance, highlighting the main applications in clinics and active aging and discussing the best sensor location and most effective data acquisition protocols. The results of this review suggest that efforts in the validation of wearable systems against traditional posturographic approaches should focus on the evaluation of the sensitivity of the outcome measures provided by this promising technology.
Supplementary Materials: The following are available online at http://www.mdpi.com/1424-8220/19/19/4075/s1, Table S1: Results of the quality assessment performed by the raters on the 47 full-text articles. Table S2: Results of the quality assessment performed by raters on the articles not included in the systematic review.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The