Deep Learning Model to Evaluate Sensorimotor System Ability in Patients With Dizziness for Postural Control

Balanced posture without dizziness is achieved via harmonious coordination of the visual, vestibular, and somatosensory systems. Specific frequency bands of center of pressure (COP) signals during quiet standing are closely related to the sensory inputs of the sensorimotor system. In this study, we proposed a novel deep learning-based protocol that uses COP signal frequencies to estimate the equilibrium score (ES), a measure of sensory system contribution. The sensory organization test (SOT) was performed in normal controls (n=125) and in patients with Meniere's disease (n=72) and vestibular neuritis (n=105). COP signals recorded during quiet standing were preprocessed by filtering, detrending, and augmentation, and were converted to the frequency domain using the short-time Fourier transform. Four convolutional neural network (CNN) backbones, GoogleNet, ResNet-18, SqueezeNet, and VGG16, were trained and tested using the frequency-transformed COP data and the ES under conditions #2 to #6. Additionally, the 100 original output classes (ES values of 1 to 100) were encoded into 50, 20, 10, and 5 sub-classes to improve the performance of the prediction model. The absolute difference between the measured and predicted ES was about 1.7 (ResNet-18 with 20 encoded sub-classes). The average error of each sensory analysis calculated using the measured and predicted ES was approximately 1.0%. The results suggest that the sensory system contribution of patients with dizziness can be quantitatively assessed using only the COP signal from a single standing test. This approach has the potential to reduce balance testing time (six conditions with three trials each in the SOT) and the size of computerized dynamic posturography equipment (movable visual surround and force plate), and to support the widespread application of balance assessment.


I. INTRODUCTION
VERTIGO, a subtype of dizziness, is a perceptual phenomenon characterized by a false sensation of moving or of being surrounded by moving objects [1]. It can evoke nystagmus and walking abnormalities along with nausea and vomiting [2]. The etiology of dizziness is broadly divided into vestibular disorders, related to the vestibular system, and non-vestibular disorders, which include psychogenic and cardiogenic impairments; vestibular disorders are more than three times as common [2]. Vestibular disorders can be divided into peripheral types (85%), caused by dysfunction of the labyrinth and vestibular nerve, and central types (15%), caused by defective or excessive activity of the vestibular nuclear complex [3]. Typical peripheral vestibular disorders include Meniere's disease, which affects the membranous labyrinth and is caused by endolymphatic hydrops [1], and vestibular neuritis, a neurological degeneration caused by viruses [4]. Because of dizziness, these peripheral disorders make it difficult to maintain the correct posture when standing or walking.
Successful postural balance without dizziness is achieved through harmonious coordination and integration of the visual, vestibular, and somatosensory systems [5], [6]. The visual and vestibular systems contribute to postural stability through feedback from the external environment and by controlling the position and movement of the head [7]. The somatosensory system helps maintain a stable posture by regulating muscle activity through the nervous system to increase joint stability [8]. Each sensory system detects posture changes independently via the sensorimotor pathway, and the central nervous system activates muscles appropriately for posture control based on the integrated sensory information [9]. Accordingly, dizziness occurs when the sensory information reaching the central nervous system does not match patterns experienced during development [10]. This type of dizziness can arise from abnormal excitation or damage anywhere along the sensory pathways and the central nervous system.
Clinical diagnosis and rehabilitation assessment of dizziness are performed using computerized dynamic posturography (Fig. 1). The sensory organization test (SOT) of computerized dynamic posturography is the most widely used clinical balance assessment method certified by the American Society of Otolaryngology and Neuroscience [11]. The SOT induces posture control by altering human sensory inputs (Fig. 2) and assesses balance by isolating sensory and motor factors [12]. Somatosensory inputs (forward/backward rotation of the support surface), visual inputs (eye closure), or their combination can thus be manipulated to measure the resulting postural sway and to determine the contribution of each sensory system to posture control, expressed as the equilibrium score (ES). The ES normalizes the forward and backward sway angles of the center of gravity (CG) during each of the six SOT conditions by the maximum sway angle (12.5°) and is used to evaluate and record an individual's postural control ability. However, a computerized dynamic posturography system, which includes a movable visual surround and a dual force plate, is relatively large, which makes space utilization challenging. In addition, evaluating each sensory system through artificial manipulation under separate test conditions cannot simultaneously evaluate the role of the integrated sensory system in posture control [13]. Therefore, a novel evaluation protocol is needed that can identify the integrated sensory contribution from a simple posture such as quiet standing.
The rest of this article is organized as follows. Section II reviews previous studies on clinical rehabilitation assessment methods for dizziness. Section III presents the role of deep learning and the objectives of this study. Section IV presents the materials and methods, including the experiments and the technical details of the novel deep learning-based evaluation protocol for postural control. Section V presents the experimental results of the proposed deep learning model. Finally, Section VI interprets the results, discusses the limitations of this study, and provides future research directions.

II. CLINICAL REHABILITATION ASSESSMENT OF POSTURAL CONTROL
Clinical rehabilitation assessment of dizziness is based on perturbed tests (PT) and non-perturbed tests (NPT), depending on whether an external perturbation is applied in the standing position [14]. PT evaluates balance by inducing perturbations through tasks such as reaching or leaning [15], or through movements imposed by external devices. Krishnan et al. [16] used a force plate to evaluate balance recovery after rapid arm flexion/extension movements during standing. Patients with multiple sclerosis showed less anticipatory muscle activity than normal controls, leading to larger center of pressure (COP) displacement during the balance recovery stage [16]. As a mechanical perturbation test, the SOT protocol of computerized dynamic posturography is the most widely used clinical balance assessment method [11]. The SOT protocol quantifies the contribution of the remaining sensory systems after sensory inputs are perturbed or removed by anterior/posterior movements of the visual surround or force plate. Laufer et al. [17] evaluated the role of visual and somatosensory inputs in maintaining standing balance in hemiplegic patients within two months of stroke, and used the SOT protocol to confirm that the patient group depended more on vision to maintain balance than the normal control group [17]. Jayakaran et al. [18] used the SOT protocol to compare postural control among individuals with dysvascular transtibial amputation, individuals with traumatic transtibial amputation, and able-bodied adults with or without a dysvascular condition. Under SOT conditions #3 and #4, both amputation groups showed greater anterior/posterior sway than adults without a dysvascular condition [18]. More recently, the correlation between postural instability and autonomic dysfunction was evaluated in patients with early Parkinson's disease [19]. Postural instability was assessed with the SOT protocol of computerized dynamic posturography, combined with autonomic function testing and heart rate variability analysis. Postural instability in patients with early Parkinson's disease was found to be strongly correlated with dysfunction of the parasympathetic autonomic nervous system. Various other studies have used the SOT protocol to evaluate postural control in patients with dizziness due to abnormal vestibular or somatosensory systems [20], concussion [21], stroke [22], and Parkinson's disease [23]. However, as mentioned above, computerized dynamic posturography equipment is relatively large. In addition, evaluating posture control through artificial manipulation of each sensory system cannot quantify the role of the integrated sensory system.
In contrast, NPT evaluates COP trajectories in a standing posture without any external artificial perturbation [24], and the COP can be measured with a relatively simple and small pressure-sensing device. The COP trajectory reflects the body motion used to maintain balance [25], and time-series variables such as the sway area and velocity of the COP trajectory can be used to evaluate balance ability. Furthermore, specific frequency bands of COP signals are closely related to sub-structures of the sensory system [26]. Bizid et al. [27] evaluated the effects of voluntary muscle contraction and electrically induced fatigue on postural control. They divided the COP frequency into low (0-0.5 Hz, visuo-vestibular system), medium (0.5-2 Hz, cerebellar regulation), and high (2 Hz or more, somatosensory system) bands, and confirmed that muscle fatigue affects the somatosensory system in postural control [27]. Suarez et al. [28] used COP frequencies to evaluate posture control in patients with hearing loss carrying cochlear implants. In their work, COP frequencies were divided into bands 1 (0-0.1 Hz), 2 (0.1-0.78 Hz), and 3 (0.78-25 Hz), and bands 2 and 3 were related to the vestibular system [28]. More recently, balance in patients with scoliosis was evaluated using sensory-related COP frequency bands of 0-0.1 Hz, 0.1-0.5 Hz, and 0.5-1.0 Hz [13]. Scoliosis increased the energy ratios of the medium and high frequency bands compared with the control group, suggesting that the morphological changes of the whole spine in scoliosis may be compensated for by balancing strategies mediated via the vestibular and somatosensory systems. Another study analyzed the contribution of the sensory system in four bands: moderate (1.56-6.25 Hz), low (0.39-1.56 Hz), very low (0.1-0.39 Hz), and ultralow (below 0.10 Hz) [25]. Because the frequency bands used in different studies may overlap or differ, and vary widely with subjects and experimental conditions, the ability of the sensory system to control posture might not be accurately determined [14], [29].
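To make the band-based analysis concrete, the following minimal sketch computes the fraction of COP spectral power falling in each band, using the low/medium/high split described for Bizid et al. [27]. It is an illustration only, not the method of any cited study: it uses a plain NumPy periodogram rather than a smoothed spectral estimate, and the function name `band_energy_ratios` and the synthetic signal are hypothetical.

```python
import numpy as np

# Band edges (Hz) following the low/medium/high split described for [27];
# the "high" band is left open-ended, up to the Nyquist frequency.
BANDS = {"low": (0.0, 0.5), "medium": (0.5, 2.0), "high": (2.0, np.inf)}


def band_energy_ratios(cop, fs, bands=BANDS):
    """Return the fraction of COP spectral power in each frequency band.

    cop : 1-D array of anterior/posterior COP displacement samples
    fs  : sampling frequency in Hz
    """
    x = np.asarray(cop, dtype=float)
    x = x - x.mean()                        # remove the mean COP position (DC offset)
    power = np.abs(np.fft.rfft(x)) ** 2     # one-sided periodogram (unscaled)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = power.sum()
    return {
        name: float(power[(freqs >= lo) & (freqs < hi)].sum() / total)
        for name, (lo, hi) in bands.items()
    }


if __name__ == "__main__":
    fs = 100.0
    t = np.arange(2000) / fs                # 20 s at 100 Hz
    # Synthetic COP: a 1 Hz (medium-band) and a 0.2 Hz (low-band) component.
    cop = 2.0 * np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t)
    print(band_energy_ratios(cop, fs))
```

Because the band edges differ between studies, they are passed in as a parameter rather than hard-coded into the function.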

III. ROLE OF DEEP LEARNING AND OBJECTIVES OF THIS STUDY
Although existing studies have reported specific COP frequency bands for the visual, vestibular, and somatosensory systems, a clear distinction between bands remains a challenge. Although the underlying factors are unclear, the bands attributed to each sensory system are wide and vary with the dizziness-related disease and subject characteristics. Because the band boundaries are ambiguous, a machine learning model can be used to relate COP frequency content to each sensory input. Machine learning provides an effective solution by modeling complex COP frequency content as inputs and its effects on each sensory system as outputs. As a representative machine learning model, an artificial neural network is a distributed adaptive computational system of interconnected neurons, similar to the neural network structure of the human brain [30], [31]. Furthermore, the convolutional neural network (CNN), a specialized type of neural network, has been used in various applications [32], [33]. A CNN automatically extracts discriminative features through multiple convolutional and subsampling layers and shows superior performance compared with conventional single-layer neural networks [34]. Because key features are learned directly from raw data through the multiple layers of the model, input variables need not be extracted manually, which improves learning accuracy [35].
Therefore, in this study, we proposed a novel protocol to evaluate the capabilities of the sensory system using COP signals recorded during quiet standing. We compared four typical CNN architectures (GoogleNet, ResNet-18, SqueezeNet, and VGG16) that use the frequency content of COP signals in standing posture to estimate the ES, the sensory system variable of the SOT protocol. To improve the performance of the CNN models, an encoding and decoding technique that groups the model output into sub-classes for training and testing was included. Additionally, the clinical application of the model was evaluated in patients with peripheral vestibular disorders associated with vertigo, namely Meniere's disease and vestibular neuritis.

IV. MATERIALS AND METHODS
A. Experiments
1) Participants: This study involved normal controls and patients who underwent dynamic posturography at Korea University Anam Hospital from January 2018 to December 2020. Patients with established peripheral vestibular disorders, namely Meniere's disease and vestibular neuritis, were selected. Patients with musculoskeletal disorders or visual impairments, and children who had difficulty standing for more than 20 seconds, were excluded. Patients who had to rely on the harness for support owing to loss of balance during the experiment were also excluded. Finally, a total of 302 participants were included in this study. Seventy-two patients with Meniere's disease (age: 15∼87 years; height: 143.3∼181.2 cm) were selected based on the 2016 American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) diagnostic criteria for probable and definite Meniere's disease [36]. Another 105 patients (age: 22∼83 years; height: 147.5∼184.3 cm) were selected for the vestibular neuritis group; they had acute-onset (within one day) sustained vertigo, unidirectional horizontal spontaneous nystagmus aggravated by the head-shaking test, and a positive unilateral bedside head impulse test. Patients with vestibular neuritis showed normal results in other auditory tests and brain MRI. This study also included 125 normal controls (age: 18∼80 years; height: 144.2∼184.3 cm) who sought medical attention for dizziness but were diagnosed by clinicians as having no abnormalities in the visual, somatosensory, or vestibular systems. The research protocol was approved by the Clinical Research Review Committee of Korea University Anam Hospital (IRB No. 2018AN0297) and complied with the tenets of the Declaration of Helsinki.
2) Apparatus and Experimental Procedures: An EquiTest system (Neurocom International Inc., Clackamas, OR, USA), developed by Nashner and Peters in 1985, was used as the computerized dynamic posturography device in this study (Fig. 1). This system is currently the most commonly used in hospitals [37]. The equipment consists of a movable visual surround that stimulates vision, a fixed frame and harness that prevent injury from falls during the examination, a dual force plate that moves forward and backward and measures the reaction force, and a Windows-based computer that analyzes the integrated data.
For each patient, the SOT was conducted to evaluate the sensory system with three 20-second trials under each of six test conditions while standing on the force plate (Fig. 2). The six test conditions are as follows: 1) eyes open with the force plate and visual surround fixed; 2) eyes closed with the force plate and visual surround fixed; 3) eyes open with the force plate fixed and the visual surround rotating; 4) eyes open with the visual surround fixed and the force plate rotating up and down; 5) eyes closed with the visual surround fixed and the force plate rotating up and down; and 6) eyes open with both the visual surround and the force plate rotating. Here, the up and down rotations of the force plate and visual surround were calculated from the forward and backward movements of the body's center of gravity. Consequently, conditions #2 and #5 assessed the role of the somatosensory or vestibular system in maintaining balance with the eyes closed, while conditions #4, #5, and #6 suppressed somatosensory function through rotation of the force plate. In addition, conditions #3 and #6 made it possible to examine whether the effect of inaccurate visual input, caused by rotation of the visual surround, was suppressed.
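The six conditions can also be encoded as a small data structure that makes explicit which sensory inputs remain accurate in each one. This is an illustrative sketch: the class and property names are hypothetical, and "moving" stands for the sway-referenced rotation of the surround or plate described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SOTCondition:
    number: int
    eyes_open: bool
    surround_moving: bool  # visual surround rotates with the subject's sway
    plate_moving: bool     # force plate rotates with the subject's sway

    @property
    def vision_accurate(self) -> bool:
        # Vision is informative only with the eyes open and a fixed surround.
        return self.eyes_open and not self.surround_moving

    @property
    def somatosensory_accurate(self) -> bool:
        # A moving support surface disrupts somatosensory cues.
        return not self.plate_moving


SOT_CONDITIONS = [
    SOTCondition(1, eyes_open=True, surround_moving=False, plate_moving=False),
    SOTCondition(2, eyes_open=False, surround_moving=False, plate_moving=False),
    SOTCondition(3, eyes_open=True, surround_moving=True, plate_moving=False),
    SOTCondition(4, eyes_open=True, surround_moving=False, plate_moving=True),
    SOTCondition(5, eyes_open=False, surround_moving=False, plate_moving=True),
    SOTCondition(6, eyes_open=True, surround_moving=True, plate_moving=True),
]

if __name__ == "__main__":
    for c in SOT_CONDITIONS:
        inputs = [name for name, ok in (("visual", c.vision_accurate),
                                        ("somatosensory", c.somatosensory_accurate),
                                        ("vestibular", True)) if ok]
        print(f"Condition #{c.number}: accurate inputs = {inputs}")
```

Filtering this list reproduces the grouping in the text: conditions #4 to #6 disrupt the somatosensory input, conditions #3 and #6 present inaccurate vision, and in conditions #5 and #6 only the vestibular input remains accurate.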
3) Equilibrium Score and Sensory Analysis: The ES was computed from the anterior/posterior sway angle (θ) of the body's center of gravity under each of the six SOT conditions. During standing, the sway angle of the center of gravity is limited to 12.5° (8.0° forward and 4.5° backward), and the ES normalizes the observed sway by this maximum angle on a 0-100 scale (Eq. 2). Thus, a sway angle of 0° yields a score of 100 (the most stable state), whereas a sway angle of 12.5° yields a score of 0 (the state just before falling).
$$\text{Equilibrium score (ES)} = \frac{12.5^{\circ} - (\theta_{\max} - \theta_{\min})}{12.5^{\circ}} \times 100 \qquad (2)$$
Based on the six ESs, we calculated the ratios of the visual, vestibular, and somatosensory systems and the visual preference contributing to posture control [38]. The somatosensory ratio is the ES under condition #2 divided by the ES under condition #1, which indicates postural stability when visual input is removed. The visual ratio is the ES under condition #4 divided by the ES under condition #1, which indicates postural stability when the somatosensory input is altered by the movement of the force plate. The vestibular ratio is the ES under condition #5 divided by the ES under condition #1, which indicates postural stability when visual input is removed and somatosensory input is altered. Finally, the visual preference ratio is the sum of the ESs under conditions #3 and #6 divided by the sum of the ESs under conditions #2 and #5, and measures the degree of excessive dependence on visual information.
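Eq. (2) and the four sensory ratios can be written as a pair of short functions. This is a sketch under stated assumptions: the function names are hypothetical, each condition is assumed to have a single ES (how the three trials per condition are combined is not specified here), and flooring scores at 0 when sway exceeds 12.5° is an assumption.

```python
MAX_SWAY_DEG = 12.5  # sway limit: 8.0 deg forward + 4.5 deg backward


def equilibrium_score(theta_max, theta_min):
    """Eq. (2): ES from the peak forward (theta_max) and backward (theta_min) sway angles in degrees."""
    sway = theta_max - theta_min
    es = (MAX_SWAY_DEG - sway) / MAX_SWAY_DEG * 100.0
    return max(0.0, es)  # assumption: sway beyond the limit (a fall) is floored at 0


def sensory_analysis(es):
    """Sensory ratios from a mapping {condition number: ES}; one ES per condition is assumed."""
    return {
        "somatosensory": es[2] / es[1],                          # eyes closed vs. baseline
        "visual": es[4] / es[1],                                 # moving plate vs. baseline
        "vestibular": es[5] / es[1],                             # eyes closed + moving plate
        "visual_preference": (es[3] + es[6]) / (es[2] + es[5]),  # inaccurate vs. absent vision
    }


if __name__ == "__main__":
    print(equilibrium_score(2.0, -0.5))  # 2.5 deg of total sway gives an ES of 80
    print(sensory_analysis({1: 90, 2: 81, 3: 80, 4: 72, 5: 45, 6: 50}))
```

Note that the ES depends only on the peak-to-peak sway, so two trials with very different sway time courses can share the same score.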

B. Proposed Evaluation Protocol for Postural Control
Fig. 3 shows the estimation of ES (conditions #2 to #6) from COP trajectories recorded with the eyes open in a quiet standing posture (condition #1). The input matrix of the deep learning model was created via preprocessing and frequency transformation of the anterior/posterior COP signals during standing. The output vector is defined as a sub-class obtained by encoding the ES derived from the SOT. These inputs and outputs are used to train a CNN deep learning model, and the final ES values and sensory analysis are obtained by decoding the model output.
1) Data Preprocessing (Model Input): We removed the linear trend from the COP trajectory over condition #1 to normalize the position of the subject's ground reaction force. Next, to eliminate noise, a fourth-order Butterworth low-pass filter with a cut-off frequency of 5 Hz was applied to the detrended COP trajectories. The cut-off frequency was verified using a power spectral density test [38]. The data were then augmented to enhance the performance of the learning model: jittering and scaling, which are commonly used for time-series data, were applied within -20% to 20% of the raw COP signals. The processed COP data were converted into the frequency domain via the short-time Fourier transform (STFT). Because the Fourier transform cannot account for frequency variation over time, we used the STFT, which applies a moving window, to observe changes in frequency over time, as shown in Eq. 1:
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x(m)\, h(n-m)\, e^{-j\omega m} \qquad (1)$$
where x(m) is the preprocessed COP signal, h(n) denotes a Hamming window function, and n and ω represent time and frequency, respectively. The condition #1 data were sampled at 100 Hz for 20 seconds (a discrete interval of 0.01 seconds). Accordingly, the signal was divided into windows of 50 samples (0.5 seconds each) with 50% overlap. Based on the STFT results, we calculated the power spectral density of the spectrogram.
Four CNN architectures (GoogleNet, ResNet-18, SqueezeNet, and VGG16) were used in this study (Fig. 4). These models were chosen from the most popular architectures reported in previous studies [39], [40].
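The preprocessing and STFT steps described above can be sketched with SciPy as follows. This is not the authors' code: zero-phase filtering (`sosfiltfilt`) is an assumption, the exact jittering and scaling rules are not specified in the text, so the `augment` function below is a hypothetical reading of "within -20% to 20%", and the random-walk input is placeholder data.

```python
import numpy as np
from scipy import signal

FS = 100.0      # COP sampling rate (Hz)
CUTOFF = 5.0    # low-pass cut-off frequency (Hz)
NPERSEG = 50    # 50-sample (0.5 s) Hamming window
NOVERLAP = 25   # 50 % overlap between consecutive windows


def preprocess(cop, fs=FS):
    """Remove the linear trend, then low-pass filter with a 4th-order Butterworth filter."""
    x = signal.detrend(np.asarray(cop, dtype=float), type="linear")
    sos = signal.butter(4, CUTOFF, btype="low", fs=fs, output="sos")
    # Assumption: zero-phase (forward-backward) filtering; the text does not specify this.
    return signal.sosfiltfilt(sos, x)


def augment(x, rng, max_frac=0.2):
    """Hypothetical jitter + scaling augmentation, each bounded to +/-20 %."""
    scale = rng.uniform(1.0 - max_frac, 1.0 + max_frac)                # amplitude scaling
    jitter = rng.uniform(-max_frac, max_frac, size=x.shape) * x.std()  # additive jitter
    return scale * x + jitter


def cop_spectrogram(x, fs=FS):
    """STFT-based power spectral density with a Hamming window and 50 % overlap."""
    return signal.spectrogram(
        x, fs=fs, window="hamming", nperseg=NPERSEG, noverlap=NOVERLAP,
        scaling="density", mode="psd",
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cop = np.cumsum(rng.normal(size=2000)) * 0.01  # placeholder random-walk COP, 20 s at 100 Hz
    f, t, sxx = cop_spectrogram(augment(preprocess(cop), rng))
    print(sxx.shape)  # (26, 79): 26 frequency bins (0-50 Hz, 2 Hz apart) x 79 windows
```

With a 0.5 s window, the frequency resolution is 2 Hz, so only a few bins fall below the 5 Hz cut-off. Because the jitter is added after low-pass filtering, in the order described, it reintroduces some content above 5 Hz; whether the original pipeline re-filtered the augmented data is not stated.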
To improve the performance of the CNN models, the hyperparameters were tuned using Bayesian optimization. The objective function was the classification error, and five hyperparameters (initial learning rate, L2 regularization value, optimizer type, mini-batch size, and maximum number of epochs) were selected as design variables. The optimization was performed over 30 iterations, and the values that minimized the error were selected as the final hyperparameters of the models. Table I lists the data types, ranges, and final values of the hyperparameters.
A total of 604 raw recordings were collected from the 302 participants, each performing two trials. The data were randomly divided into training and test datasets at a 50:50 ratio, and this process was repeated five times using the five-fold cross-validation technique. Augmentation was applied primarily to the training dataset, and model testing used only the original SOT experimental data [41].
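A minimal sketch of this evaluation loop, assuming repeated random 50:50 splits at the recording level with augmentation applied only to the training portion. The function names, the placeholder arrays, and the scaling-only augmentation are hypothetical; they illustrate the split logic, not the authors' implementation.

```python
import numpy as np


def repeated_half_splits(n_samples, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random 50:50 splits."""
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)
        yield np.sort(perm[:half]), np.sort(perm[half:])


def build_training_set(x, y, train_idx, augment_fn, n_copies, rng):
    """Return the training recordings plus augmented copies; the test set is never augmented."""
    xs, ys = [x[train_idx]], [y[train_idx]]
    for _ in range(n_copies):
        xs.append(np.stack([augment_fn(x[i], rng) for i in train_idx]))
        ys.append(y[train_idx])
    return np.concatenate(xs), np.concatenate(ys)


def scale_augment(sig, rng):
    """Hypothetical augmentation: random amplitude scaling within +/-20 %."""
    return sig * rng.uniform(0.8, 1.2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(604, 2000))       # placeholder: 604 recordings, 20 s at 100 Hz
    y = rng.integers(1, 101, size=604)     # placeholder ES labels
    for train_idx, test_idx in repeated_half_splits(len(x)):
        x_tr, y_tr = build_training_set(x, y, train_idx, scale_augment, n_copies=2, rng=rng)
        x_te, y_te = x[test_idx], y[test_idx]
        print(x_tr.shape, x_te.shape)      # (906, 2000) (302, 2000)
```

Because each participant contributed two recordings, a recording-level split can place the two trials of one person in different sets; splitting by participant would be a stricter alternative.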

C. Performance Measures
Accuracy and F-1 score were used to compare the CNN models without output encoding (GoogleNet, ResNet-18, SqueezeNet, and VGG16 architectures). In addition, the absolute difference between the actual ES values from the SOT protocol and the values predicted by these models was computed. The CNN architecture with the highest F-1 score was then selected. Confusion matrices were derived for each encoding model using the selected CNN architecture, and performance was compared using the absolute difference between the actual ES values and the decoded ES values predicted by the encoding models. Each decoded value was set to the midpoint of the predicted sub-class; for instance, with 10 sub-classes, the predicted ES values for the 1st through 5th classes would be 5, 15, 25, 35, and 45 points, respectively. Finally, the sensory contributions calculated from the measured ES and from the model-estimated ES were compared using a paired t-test. All statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS, v.18.0.0), and the significance level was set at p < 0.05.
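The output encoding and midpoint decoding can be sketched as follows. The decoding rule matches the example in the text (with 10 sub-classes, classes 1 to 5 decode to 5, 15, 25, 35, and 45), but the class boundaries used by `encode_es` are an assumption, and all function names are hypothetical.

```python
import math

import numpy as np


def encode_es(es, n_classes):
    """Map an ES in 1..100 to a 1-based sub-class of width 100 / n_classes (assumed boundaries)."""
    width = 100 / n_classes
    return min(n_classes, max(1, math.ceil(es / width)))


def decode_class(k, n_classes):
    """Decode a 1-based sub-class to the midpoint ES of that class."""
    width = 100 / n_classes
    return (k - 1) * width + width / 2


def mean_absolute_difference(measured, predicted):
    """Mean absolute difference between measured and decoded ES values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(measured - predicted)))


if __name__ == "__main__":
    print([decode_class(k, 10) for k in range(1, 6)])  # [5.0, 15.0, 25.0, 35.0, 45.0]
    measured = [73, 88, 41]
    # Round trip through 20 sub-classes, i.e. a perfect classifier:
    # only the within-class quantization error remains.
    decoded = [decode_class(encode_es(es, 20), 20) for es in measured]
    print(decoded, mean_absolute_difference(measured, decoded))
```

Under these assumed boundaries, the worst-case quantization error of a correctly classified sample is half the class width, which is the price paid for reducing the number of output classes.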

V. RESULTS
A. Model Input
Fig. 5 presents the anterior/posterior and medial/lateral COP displacements under the six SOT conditions. Representative results from a normal participant showed different displacement patterns in each condition. In both the anterior/posterior and medial/lateral directions, the magnitude and fluctuation of displacement were larger under conditions #4-#6 than under conditions #1-#3.
Fig. 6 presents representative spectrograms of the anterior/posterior COP time series for each group (normal, Meniere's disease, and vestibular neuritis). Consistent with the low-pass cut-off frequency, power spectral density was concentrated in frequency bands below 5 Hz. In all groups, power spectral density increased mainly below approximately 3 Hz as the condition changed from #1 to #5 or #6. Across groups, the power spectral densities of patients with Meniere's disease and vestibular neuritis were higher than those of the normal group. In particular, the difference was significant under conditions #4-#6 (perturbation of the somatosensory system). However, there was no significant difference in power spectral density value or distribution between the two diseases (Meniere's disease vs. vestibular neuritis).

B. Performance of CNN Model Architectures
The GoogleNet, ResNet-18, SqueezeNet, and VGG16 CNN architectures were trained for a total of 25 epochs. In all four architectures, validation accuracy increased and loss decreased as the number of epochs increased (Fig. 7). The ResNet-18 model in particular showed a markedly rapid increase in accuracy and decrease in loss at the initial training stage, and then reached a steady state with almost no further change. At epoch 25, ResNet-18 showed the highest validation accuracy, followed by VGG16 and SqueezeNet, with GoogleNet showing the lowest.
Fig. 8 presents the scatter plots used to evaluate the performance of the CNN models with the four architectures, showing a representative comparison between the measured and predicted ES values under SOT condition #5. Each predicted value was normalized to a percentage and displayed as blue color intensity. Compared with GoogleNet and SqueezeNet, the ResNet-18 and VGG16 architectures showed stronger blue intensity along the diagonal and relatively little misclassification off the diagonal.
Fig. 9 presents the performance indices of each CNN architecture calculated via five-fold cross-validation. ResNet-18 achieved the highest values on all indices except specificity, with an accuracy of 76.0 ± 1.0% and an F-1 score of 76.0 ± 1.1%. The absolute difference between the ES measured in SOT and the ES predicted by ResNet-18 (about 2.9 ± 1.3) was smaller than that of GoogleNet (5.9 ± 2.8), SqueezeNet (6.3 ± 2.8), and VGG16 (4.7 ± 3.5). Therefore, ResNet-18 performed best among the four architectures.

C. Effects of Output Encoding on CNN Model Performance
The encoded labels for conditions #2-#6, estimated from the power spectral density of the COP trajectories in condition #1, were evaluated using confusion matrices. Fig. 10 (A) to (D) display the confusion matrices of the ResNet-18 CNN model with the 100-point ES encoded into 5, 10, 20, and 50 sub-classes. The average F-1 scores for the models with 5, 10, 20, and 50 sub-classes were 84, 92, 97, and 93%, respectively. The blue intensity of the diagonal elements was prominent in all conditions except condition #2 of the model with 5 encoded sub-classes. For the model with 20 encoded sub-classes, the F-1 score for each condition ranged between 95 and 98%, with the highest F-1 score under condition #2 (Fig. 10 (C)). The spread of estimated labels deviating from the true label tended to increase from condition #2 to condition #6; however, the diagonal elements remained prominent in all conditions, indicating satisfactory test performance of the trained model. Table II lists the absolute differences between the measured and estimated ES for each patient group. The overall absolute error ranged from 1.1 to 2.2 and tended to increase from condition #2 through conditions #5 and #6. However, absolute errors were similar across groups (normal, 1.7; Meniere's disease, 1.7; and vestibular neuritis, 1.6).
Fig. 12 presents the sensory system contributions calculated using the measured ES and the ES estimated by the model. The measured values for the overall data were 96 ± 5.7, 83.4 ± 11.4, 63.4 ± 14.3, and 94.2 ± 7.9 for the somatosensory system, visual system, vestibular system, and visual preference, respectively. The error of each sensory contribution computed from the estimated ES values was around 1% on average, with no statistically significant difference between the measured and estimated values for any sensory system (p > 0.05).

VI. DISCUSSION
Specific frequency bands of the COP signal measured in the standing posture have been found to be closely related to the sensory inputs for postural stability. In this study, a deep learning model was proposed to estimate the contributions of the visual, vestibular, and somatosensory systems and visual preference using the power spectral density of the COP signal. In particular, a novel strategy was suggested that encodes the output of the model into sub-classes by grouping ES values, which improved performance. The results of this study could help simplify medical devices and reduce testing time for the diagnosis of dizziness, contributing to the wider availability of diagnosis and rehabilitation.
Meniere's disease and vestibular neuritis are, along with benign paroxysmal positional vertigo, the most common peripheral vestibular disorders. Meniere's disease is known to be caused by endolymphatic hydrops of the inner ear. Clinically, vertigo attacks occur due to vestibular excitation, and the persistent condition results in vestibular dysfunction. Vestibular dysfunction in Meniere's disease has been reported to progress variably over long periods (35∼50% decrement over 5 to 10 years) [43]. Vestibular neuritis is characterized by acute unilateral loss of vestibular function, and viral infection and vascular and immunological factors are thought to cause it. In most cases of vestibular neuritis, severe vestibular dysfunction occurs in the acute phase, and the recovery phase varies from continued decline in vestibular function to partial improvement or complete recovery [43]. The deep learning algorithm in this study can be used to estimate the contributions of the visual, vestibular, and somatosensory systems and visual preference in postural control. In particular, the model was trained on data from three groups (normal controls, Meniere's disease, and vestibular neuritis), which is intended to make the training more robust.
During SOT, the power spectral densities of COP signals in patients with vestibular disorders (Meniere's disease and vestibular neuritis) differed from those of the normal group (Fig. 6). In the normal group, the more difficult the condition, the higher the density in the low frequency band and the larger the sway compared with condition #1 (Fig. 5), because the remaining sensory systems must maintain control when the visual or somatosensory input is eliminated or disturbed. This result is consistent with previous studies showing a larger power spectrum under increasingly difficult SOT conditions [38], and with a previous study showing that perturbing one sensory system increases the energy used by the remaining sensory systems to maintain balanced posture control [13]. In patients with Meniere's disease and vestibular neuritis as well, the more difficult the condition, the greater the power density in the low frequency band. Compared with the normal group, however, the power density in these patients was more concentrated in conditions #4-#6 than in conditions #1-#2, indicating that the vestibular system did not adequately contribute to posture control. In SOT conditions #4-#6, the somatosensory input is disrupted, so balance depends on the visual and vestibular systems or on the vestibular system alone. In particular, conditions #5 and #6 rely heavily on the vestibular system because visual input is eliminated or disrupted (Fig. 2), and in these conditions the power density was concentrated in the low frequency band.
The structure of the proposed prediction model differs from that of a general machine learning model. CNN models are deep learning networks widely used in other studies; however, the application of a classifier with encoded and decoded outputs was new in our study. The ES estimated in this work ranges from 1 to 100, so either a classifier or a regressor was needed for the last layer of the deep learning model. Classifiers are typically used to determine the type or presence of an output [33], whereas regressors are mainly used to estimate time series data [44], [45]. Correctly classifying 100 ES classes can be very challenging. Therefore, we proposed a novel architecture that improves performance by encoding the 100-point ES into intervals of 2, 5, 10, and 20 points; the five-point interval (20 sub-classes) gave the highest performance (Figs. 10 and 11). The five-point interval encoding has an inherent error of up to two points within each class. However, reducing the number of classes dramatically improved the model's performance. After training, the final ES estimation error was 1.1 to 2.2, similar to the maximum within-class error of two points, meaning that the performance of the trained model was effectively maximized. Based on the trial-and-error results of this study, further work to optimize the number of classes, for example through manifold learning or clustering, will be necessary.
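The inherent within-class error can be checked with a short calculation. The Python sketch below assumes that each predicted class is decoded to the center of its ES interval; the paper's exact decoding rule is not stated in this section, so this is an assumption, and the function names are hypothetical. Under that assumption the worst-case quantization error for an interval of w points is (w - 1)/2, which gives two points for the five-point interval.

```python
def class_bounds(c, width):
    """ES range [lo, hi] covered by 1-based class c for a given interval width."""
    lo = (c - 1) * width + 1
    return lo, lo + width - 1


def decode_center(c, width):
    """Decode a class to the center of its ES interval (assumed decoding rule)."""
    lo, hi = class_bounds(c, width)
    return (lo + hi) / 2


def worst_case_error(width, es_max=100):
    """Largest |ES - decoded ES| over all integer ES values 1..es_max."""
    worst = 0.0
    for es in range(1, es_max + 1):
        c = (es - 1) // width + 1
        worst = max(worst, abs(es - decode_center(c, width)))
    return worst


# Interval widths for 50, 20, 10 and 5 sub-classes of a 100-point ES.
for n_classes in (50, 20, 10, 5):
    width = 100 // n_classes
    print(n_classes, width, worst_case_error(width))
```

The calculation makes the trade-off explicit: fewer classes make classification easier but raise the error floor (0.5, 2.0, 4.5, and 9.5 points for 2-, 5-, 10-, and 20-point intervals), which is consistent with an intermediate number of classes performing best.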
This study has several limitations. First, only data from patients with peripheral vestibular disorders (Meniere's disease and vestibular neuritis) were used to train the model when developing the new method for evaluating the sensory system. Besides the diseases investigated in this study, other diseases also cause vertigo attacks, such as benign paroxysmal positional vertigo [46] and delayed endolymphatic hydrops [47]. Nevertheless, this work proposed a new protocol and a deep learning model for assessing the performance of the sensory systems contributing to posture control. In the future, various diseases associated with dizziness should be included to enhance the value of the model. Second, training was limited to sensory system evaluation based on the ES values of SOT. ES is obtained by normalizing the maximum anterior/posterior sway to 12.5°, the anterior/posterior sway limit suggested by experimental data on human stance. Therefore, only the time points at which the maximum and minimum sway values occurred were used, and anterior/posterior sway characteristics that differ from person to person could not be fully represented. In the future, additional parameters such as the PSI index [48] and entropy [49] should be analyzed to further validate the proposed protocol. Finally, there are limitations related to the deep learning models. Although Bayesian optimization was used for hyperparameter tuning, it has constraints such as high computational cost and early restriction of the acquisition function, which may lead to important features being overlooked [50]. Additionally, only a limited set of deep learning architectures (GoogleNet, ResNet-18, SqueezeNet, and VGG16) was employed in this study. With the rapid advances in machine learning, powerful recent architectures such as Attention-56 are being introduced. Future studies should aim to maximize performance by leveraging state-of-the-art deep learning models and robustly validating them using external datasets.
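The normalization described above corresponds to the commonly used SOT equilibrium score, ES = [12.5° - (θmax - θmin)] / 12.5° × 100, where θ is the anterior/posterior sway angle. The Python sketch below implements that formula and shows why only the two extreme time points matter. It assumes the sway-angle trace in degrees is already available; how the device derives sway angle from force-plate data is outside this sketch, and the function name is hypothetical.

```python
def equilibrium_score(sway_deg, limit_deg=12.5):
    """Equilibrium score (0-100) from an anterior/posterior sway-angle trace in degrees.

    Only the peak-to-peak excursion (max minus min) is used, i.e. the two
    time points at which the extreme sway angles occur.
    """
    peak_to_peak = max(sway_deg) - min(sway_deg)
    score = (limit_deg - peak_to_peak) / limit_deg * 100.0
    # Excursions at or beyond the limit (e.g. a fall) are scored 0.
    return max(score, 0.0)


# Usage: a trace swinging between -1.0 and +1.5 degrees (hypothetical values).
print(equilibrium_score([0.0, 1.5, -1.0, 0.3]))  # peak-to-peak 2.5 deg -> 80.0
```

Two traces with very different sway patterns but the same extremes receive the same score, which is the limitation noted above.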
In this work, we proposed a novel protocol to evaluate the capability of the human sensory systems contributing to posture control using only COP signals recorded in a standing posture. We constructed CNN models that take as input the power spectral density obtained by frequency transformation of COP signals and output the ES values extracted from SOT. Four different types of CNN backbone architecture, GoogleNet, ResNet-18, SqueezeNet, and VGG16, were used in this study. Additionally, the 100 original output classes (ESs of 1 to 100) were encoded into 50, 20, 10, and 5 sub-classes to improve the performance of the prediction model. As a result, the absolute difference between the measured and predicted ES ranged from 1.1 to 2.2 (ResNet-18 with 20 sub-classes), and the average error of each sensory analysis score was approximately 1.0%. The results suggest that the sensory system contribution of patients with dizziness can be quantitatively assessed using only the COP signal from a single test of standing posture. This study has the potential to reduce balance testing time (spent on six conditions with three trials each in the sensory organization test) and the size of computerized dynamic posturography equipment (movable visual surround and force plate), leading to more widespread use of balance assessment.

Fig. 3. Overall methodological protocol used to evaluate sensory system capabilities based on raw COP signals during quiet standing.

2) Encoding of Equilibrium Scores (Model Output): We encoded the ESs (conditions #2 to #6) into classes for use as the output of the learning model. Because the ES ranges from 1 to 100, we proposed reducing the number of classes to improve the performance of the prediction model. In this study, the 100 original classes (ESs of 1 to 100) were encoded into 50, 20, 10, and 5 sub-classes. When encoding into 20 sub-classes, each class spans 5 scores (ES 1 to 5 as class 1, ES 6 to 10 as class 2, . . ., and ES 96 to 100 as class 20). When encoding into 10 sub-classes, each class spans 10 scores (ES 1 to 10 as class 1, ES 11 to 20 as class 2, . . ., and ES 91 to 100 as class 10).

3) Convolutional Neural Network Architectures: The power spectral density of the spectrogram during quiet standing (SOT condition #1) and the encoded classes under conditions #2 to #6 were used as the input and output for CNN training, respectively. As inputs to the deep learning model, spectrograms were converted to three-channel RGB images and resized to a resolution of 224 × 224. Four different types of CNN backbone architecture, GoogleNet, ResNet-18, SqueezeNet, and VGG16, were used. To address the non-uniform distribution of ES scores and encoded classes, we adjusted the weights in the loss function based on the inverse square root of the class frequency [42]:

$w_c = 1/\sqrt{n_c}$

where $c$ is the encoded class number and $n_c$ is the number of samples in that class. In each cross-validation fold, training was carried out with weight values assigned according to the number of samples of each class in the training set. The classes estimated by the proposed model were compared, using a confusion matrix, with the classes obtained from the ESs measured with the computerized dynamic posturography device. MATLAB (R2020b, MathWorks Inc., Natick, MA, USA) was used for training, validation, testing, and optimization of the CNN models on an RTX 2080 Ti GPU with 4352 CUDA cores, a 1665 MHz base clock speed, and 11 GB of memory.
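The encoding and the class weighting above can be sketched in a few lines of Python. This is not the authors' MATLAB code; `encode_es` and `class_weights` are hypothetical helper names, and the example ES values are made up. Classes that do not occur in a training fold simply receive no weight here; how the weights enter the loss depends on the framework used.

```python
import math
from collections import Counter


def encode_es(es, n_classes):
    """Map an integer ES in 1..100 to a 1-based sub-class out of n_classes equal bins."""
    if 100 % n_classes:
        raise ValueError("n_classes must divide 100 (e.g. 50, 20, 10 or 5)")
    if not 1 <= es <= 100:
        raise ValueError("ES must lie in 1..100")
    width = 100 // n_classes  # 2, 5, 10 or 20 ES points per class
    return (es - 1) // width + 1


def class_weights(train_es, n_classes):
    """Inverse-square-root weights w_c = 1/sqrt(n_c) from training-fold class counts."""
    counts = Counter(encode_es(es, n_classes) for es in train_es)
    return {c: 1.0 / math.sqrt(n) for c, n in sorted(counts.items())}


# Usage with hypothetical training-fold ESs, 20 sub-classes:
print(encode_es(5, 20), encode_es(6, 20), encode_es(100, 20))  # 1 2 20
print(class_weights([71, 72, 73, 75, 80], 20))  # {15: 0.5, 16: 1.0}
```

Recomputing the weights per cross-validation fold, as described above, amounts to calling `class_weights` on each fold's training labels; rarer classes receive larger weights.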

Fig. 5. The anterior/posterior and medial/lateral center of pressure (COP) displacement in the six conditions used for the sensory organization test.

Fig. 6. Representative anterior/posterior spectrograms of the COP trajectory during SOT for normal controls and patients with Meniere's disease and vestibular neuritis.

Fig. 11. Effect of the number of encoded output sub-classes on the performance of the ResNet-18 CNN model.

Fig. 11 shows the absolute differences for the ResNet-18 model whose outputs were obtained by encoding the 100 ES points into 5, 10, 20, and 50 sub-classes. Overall, the estimation error (the absolute difference between the ESs measured with SOT and those predicted by the CNN models) was lowest in condition #2 (2.2 ± 0.8), and the mean errors in conditions #4, #5, and #6 were similar (3.3, 3.5, and 3.5, respectively). In addition, the ResNet-18 model with 20 sub-classes yielded the lowest absolute differences in all conditions compared with the models with 5, 10, and 50 sub-classes. In conditions #4 to #6, the absolute difference decreased as the number of sub-classes increased, reached a minimum with 20 sub-classes, and then increased.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
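The two evaluation views used here, the mean ± SD absolute ES difference and the class confusion matrix, can be computed as in the following Python sketch. The helper names are hypothetical, the values in the usage lines are made up, and the sample (n - 1) standard deviation is an assumption because the section does not say which SD was reported.

```python
import statistics


def abs_diff_summary(measured, predicted):
    """Mean and sample SD of |measured ES - predicted ES| for one SOT condition."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    diffs = [abs(m - p) for m, p in zip(measured, predicted)]
    return statistics.mean(diffs), statistics.stdev(diffs)


def confusion_matrix(true_cls, pred_cls, n_classes):
    """Counts with rows = class from measured ES, columns = predicted class (1-based)."""
    if len(true_cls) != len(pred_cls):
        raise ValueError("true_cls and pred_cls must have the same length")
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_cls, pred_cls):
        cm[t - 1][p - 1] += 1
    return cm


# Usage with hypothetical values:
print(abs_diff_summary([80, 72, 65, 90], [78, 73, 68, 90]))  # mean 1.5
print(confusion_matrix([1, 2, 2, 3], [1, 2, 3, 3], 3))
```

Running `abs_diff_summary` separately for conditions #2 to #6 and for each sub-class encoding yields the kind of per-condition comparison summarized in Fig. 11.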

TABLE I
HYPERPARAMETER TYPES, RANGES, AND THE SELECTED VALUES IN THE RESNET-18 CNN MODEL

A total of 2,718 STFT spectrogram-transformed datasets, comprising four-level augmented datasets (2,416) and raw experimental datasets (302), were used to train the models.

TABLE II
ABSOLUTE DIFFERENCE BETWEEN ESs MEASURED WITH SOT IN COMPUTERIZED DYNAMIC POSTUROGRAPHY AND PREDICTED WITH THE DEEP LEARNING MODEL

Fig. 12. Contribution of each sensory system calculated as an analysis score using the measured and predicted ESs from the proposed model.
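The sensory analysis scores can be illustrated with the ratios commonly used in SOT reports: somatosensory SOM = C2/C1, visual VIS = C4/C1, vestibular VEST = C5/C1, and visual preference PREF = (C3 + C6)/(C2 + C5), where Ck is the ES of condition #k. This section does not spell out its definitions, so it is an assumption that Fig. 12 follows these ratios. The error is taken here as the relative percentage difference, which is also an assumption. Because the model predicts conditions #2 to #6, the sketch uses the measured condition #1 ES in both dictionaries. The function names are hypothetical.

```python
def sensory_analysis(es):
    """Sensory-analysis ratios from ESs keyed by SOT condition number 1..6."""
    return {
        "SOM": es[2] / es[1],                       # somatosensory: C2 / C1
        "VIS": es[4] / es[1],                       # visual: C4 / C1
        "VEST": es[5] / es[1],                      # vestibular: C5 / C1
        "PREF": (es[3] + es[6]) / (es[2] + es[5]),  # visual preference
    }


def percent_errors(measured, predicted):
    """Relative error (%) of each ratio computed from predicted vs measured ESs."""
    m, p = sensory_analysis(measured), sensory_analysis(predicted)
    return {k: abs(m[k] - p[k]) / m[k] * 100.0 for k in m}


# Usage with hypothetical ESs (condition #1 is measured in both):
measured = {1: 95, 2: 93, 3: 90, 4: 85, 5: 70, 6: 65}
predicted = {1: 95, 2: 92, 3: 91, 4: 84, 5: 71, 6: 64}
print(percent_errors(measured, predicted))
```

Averaging these per-ratio errors over participants gives a figure comparable to the approximately 1.0% sensory analysis error reported for the proposed model.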