A Bayesian Model for Prediction of Stroke with Voice Onset Time

The purpose of this study was to examine and compare the voice onset time (VOT) of elderly stroke patients and healthy elderly subjects, and to propose a prediction model based on speech analysis data. One hundred and fifty-three healthy elderly subjects and forty-six stroke patients participated in this study. Participants in each group pronounced a Korean word containing a plosive three times, and the voice signals were recorded. Speech analysis was used to calculate probability parameters of the speech signals: the mean, standard deviation, minimum value, and maximum value of the VOT. A Bayesian model was then prepared with these parameters as inputs to predict stroke. Analysis of the speech signals of both groups confirmed that their VOT parameters differed significantly. Using the calculated probabilities for the stroke and healthy elderly groups, a Bayesian prediction model was proposed for stroke prediction. The present study shows that the proposed model could assist in classifying whether or not a person has had a stroke from their VOT data.


Introduction
Stroke is rated as the third leading cause of death in developed countries [1]. To predict stroke incidence, Amini et al [1] conducted a study in 2010-2011 at the Al-Zahra and Mashhad Ghaem hospitals. The study considered a total of 807 subjects, both healthy and sick, and risk factors were collected using a checklist of fifty risk factors for stroke, including heart disease, diabetes, and hypertension. The collected data were analyzed using K-nearest neighbour and a C4.5 decision tree. The authors concluded that both data mining techniques predicted stroke with good accuracy, 95.42% and 94.18% respectively. Mroczek et al [2] described a supervised machine learning technique, the New General To Specific algorithm, which generates production rules from data. They applied the algorithm and extracted 84% of the data from 162 specimen records with 84.8% accuracy. Tjortjis et al [3] proposed a method to build accurate predictive and descriptive models for medical health records. In addition, the authors compared various data mining methods and reported their pros and cons. They reported that T3, a decision tree classifier, achieved a classification error of 0.4%, compared with 33.6% for the other decision tree classifiers. The authors of [4] used an artificial intelligence method, fuzzy cognitive maps, together with a nonlinear Hebbian learning algorithm to predict stroke probability, considering factors including blood pressure, body mass index, smoking, heart disease, blood sugar levels, and cholesterol levels. Their method was compared with other machine learning algorithms such as support vector machine and KNN classifiers. Min et al [5] proposed a pre-diagnosis algorithm to predict stroke from modifiable risk factors, and developed a logistic regression model for pre-diagnosis of stroke symptoms.
Signs of stroke include vision problems, dizziness, difficulty in understanding, lack of balance, trouble walking, numbness, confusion, severe headache, and speech difficulties. To prevent muscle fatigue, it is recommended to study fatigue itself. In the same way, studying the individual signs related to stroke improves understanding of stroke prevention and pre-diagnosis. This paper focuses on one sign of stroke, speech difficulties. Language disorders affect oral communication in daily life. These disorders tend to recover naturally, but neurological speech impairment may occur due to partial or complex problems of the nervous system, even if recovery is achieved [6]. Park et al [6] discussed dysphagia, which is caused by various abnormalities that can occur along the pathway from the oral cavity to the gastrointestinal tract. It is reported that 40-80% of stroke patients experience some form of dysphagia. Many clinicians also consider a change of voice after aspiration to be one of the most common symptoms for identifying dysphagia, with an accuracy rate as high as 80%, yet the evaluation method itself is somewhat subjective. A number of previous studies have attempted to find a more concrete and objective way of demonstrating the link between changes in the voice and swallowing disorders. Likewise, analysing abnormal speech can be one of the most effective ways to accurately identify stroke in emergency situations and to provide appropriate treatment. The purpose of this study was to examine and compare the voice onset time (VOT) of elderly stroke patients and healthy elderly subjects, and to propose a prediction model based on speech analysis data.

Methods
This research study considered two test groups: healthy elderly subjects and patients known to have had a stroke, i.e., the stroke patient group. Forty-six stroke patients and one hundred and seventy-three healthy elderly subjects participated. Participants in each group pronounced a Korean word containing a plosive three times, and the voice signals were recorded to measure the voice onset time (VOT). After recording, the start time of vocal vibration was extracted and measured from the collected voice data, and stroke patients were identified according to a statistical analysis of the measured VOT. In another method, a voice analysis unit extracts the VOT from each of the voice recordings stored in the voice recording unit and obtains one or more probability parameters associated with the VOT. A stroke determination unit then calculates a first probability of belonging to the normal population and a second probability of belonging to the stroke patient group over an integration section defined from the parameters obtained by the voice analysis unit, and applies Bayes' theorem to the first and second probabilities to determine whether an individual is a stroke patient (Fig. 1). The average VOT of both groups is presented in Table 1.
Figure 1. Methodology adopted in the system [6,7].
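The probability parameters named in this study (mean, standard deviation, minimum, and maximum of the VOT) could be computed from a subject's repeated measurements roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `vot_parameters`, the millisecond unit, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import statistics


def vot_parameters(vot_ms):
    """Summarise repeated VOT measurements (assumed to be in milliseconds).

    Returns the four parameters named in the text. The sample standard
    deviation is used here; that choice is an assumption, as the paper
    does not say which estimator was applied.
    """
    if len(vot_ms) < 2:
        raise ValueError("at least two measurements are needed for a standard deviation")
    return {
        "mean": statistics.mean(vot_ms),
        "std": statistics.stdev(vot_ms),
        "min": min(vot_ms),
        "max": max(vot_ms),
    }


# Hypothetical values for three repetitions of the word, for illustration only.
params = vot_parameters([10.0, 20.0, 30.0])
```

With three repetitions per subject the standard deviation rests on very few points, so in practice the group-level parameters would carry most of the information.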
The probability density function is used to draw a normal distribution curve for the VOT of each group. The probability density function is given in Eqn. 1.

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right) \qquad (1) $$
Here, "x" is a random variable for the VOT of the normal or stroke patient population, and f(x) is the probability density function for the VOT of that group. "m" represents the mean value of the normal population or the stroke patient population, and "σ" represents the corresponding standard deviation.
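The normal density described above, with mean m and standard deviation σ, can be evaluated directly. The following is a small sketch using only the Python standard library; the function name `normal_pdf` is chosen here for illustration and does not come from the paper.

```python
import math


def normal_pdf(x, m, sigma):
    """Normal probability density at x for mean m and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The density peaks at the mean and is symmetric around it.
peak = normal_pdf(0.0, 0.0, 1.0)
```

Evaluating this function over a grid of VOT values, once with each group's m and σ, gives the two curves that the text describes drawing.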
Bayes' theorem gives the relationship between the prior probability and the posterior probability of two random events. For two generic events A and B, it states that P(A|B) = P(B|A) P(A) / P(B). Here, P(A) is the prior or marginal probability of event A, before any information about event B is taken into account. P(A|B) is the conditional probability that event A occurs given that event B has occurred. It is called the posterior probability because it depends on specific information about B. P(B|A) is the conditional probability that event B occurs given that event A has occurred. P(B) is the prior or marginal probability of event B, and it serves to normalize P(A|B). In this study, the prior probabilities of both groups, P(B) for the stroke patients and P(H) for the normal elderly, were set to 0.5.
Our system calculates the probability of belonging to the normal group, P(H|I), within the range of the VOT through Eqn. 2.

$$ P(H|I) = \frac{P(I|H)\,P(H)}{P(I)} \qquad (2) $$

Here, "H" denotes the normal group within the test group, "B" denotes the stroke patient group within the test group, and "I" denotes interval data indicating a range of the VOT. Thus, P(H) is the probability of belonging to the normal population, and P(B) is the probability of belonging to the stroke patient population. P(I) is the probability that the VOT falls within the range I. Because every subject belongs to exactly one of the two groups, P(I) = P(I|H) P(H) + P(I|B) P(B). P(I|H) is the integral of the normal population's density over the measurement interval I of the VOT, and P(I|B) is the corresponding integral for the stroke patient population. The probability analysis unit may likewise calculate the probability of belonging to the stroke patient group in the integration section, P(B|I), through Eqn. 3.

$$ P(B|I) = \frac{P(I|B)\,P(B)}{P(I)} \qquad (3) $$

The result of the determination can be represented by the population distribution probability. The system applies the calculated probabilities to Bayes' theorem to determine whether the subject is a stroke patient. A detailed explanation and examples can be found in Park et al [6,7].
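The calculation of P(H|I) and P(B|I) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system described in [6,7]: it assumes the interval I is given as a lower and upper VOT bound, that each group's VOT is normally distributed, and that P(I) follows from the law of total probability over the two groups. The function names and all numeric values are hypothetical.

```python
import math


def normal_cdf(x, m, sigma):
    """Cumulative normal distribution with mean m and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))


def interval_probability(lo, hi, m, sigma):
    """Integral of the normal density over [lo, hi], i.e. P(I | group)."""
    return normal_cdf(hi, m, sigma) - normal_cdf(lo, m, sigma)


def posterior(lo, hi, healthy, stroke, prior_stroke=0.5):
    """Return (P(H|I), P(B|I)) for the VOT interval [lo, hi].

    healthy and stroke are (mean, std) pairs for each group's VOT.
    prior_stroke is P(B); the text sets both priors to 0.5.
    """
    if not lo < hi:
        raise ValueError("interval must satisfy lo < hi")
    prior_healthy = 1.0 - prior_stroke
    p_i_h = interval_probability(lo, hi, *healthy)  # P(I|H)
    p_i_b = interval_probability(lo, hi, *stroke)   # P(I|B)
    p_i = p_i_h * prior_healthy + p_i_b * prior_stroke  # P(I), total probability
    if p_i == 0.0:
        raise ValueError("interval has zero probability under both groups")
    p_b = p_i_b * prior_stroke / p_i
    return 1.0 - p_b, p_b


# Hypothetical group parameters in milliseconds, NOT values from the study.
p_h, p_b = posterior(25.0, 35.0, healthy=(20.0, 5.0), stroke=(30.0, 8.0))
is_stroke = p_b > p_h
```

The final comparison of the two posteriors is one simple decision rule; the paper does not state the exact threshold used, so it is an assumption here.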