Acoustic to kinematic projection in Parkinson’s disease dysarthria

https://doi.org/10.1016/j.bspc.2021.102422

Highlights

  • An acoustic-to-kinematic model to project acoustic dynamics to jaw-tongue kinematics has been proposed.

  • A weight estimation method based on gradient descent has been used to reduce estimation errors.

  • A weight optimization method based on signal realignment has been defined.

  • Time delays from male and female Parkinson’s Disease patients have been estimated.

  • Large and small movement ranges in hypokinetic dysarthria have been obtained from acoustic signals.

Abstract

Speech signal analysis is a powerful tool for monitoring and tracking the symptom deterioration caused by neurodegenerative disorders, typically using sustained vowels, diadochokinetic exercises or running speech. This study expands our previous work on the movement produced by the jaw-tongue biomechanical system. The aim is to further investigate the effects of neuromotor activity during muscular exertion that translates formant acoustics into speech articulatory movements affected by hypokinetic dysarthria in Parkinson’s Disease (PD). The objective is to estimate the parameters of an inverse acoustic-to-kinematic projection model that takes as input the variations of the first and second formants and produces as output the spatial variation of the jaw-tongue biomechanical system. The reference spatial variations have been extracted from 3D accelerometry (3DAcc). These serve as ground truth for comparison with the estimated activity projected from speech kinematics, as a measure of fitness of the inverse model. The estimation method is a two-step process. First, initial weight values are produced using multiple regression between each of the formant dynamic signals (acoustical analysis) and the estimated spatial variations (accelerometry). Second, the weights are refined using a method based on gradient descent. Additionally, a time-realignment study has been carried out on the acoustic-to-kinematic projection model, based on estimating the relative time displacements that maximize the cross-correlation between signals. The study is complemented with an estimation of the model weights on a dataset from PD participants and Healthy Controls (HC). This methodology opens up new ways to investigate the underlying physiological voice production mechanism, which may offer new insights into PD symptoms.
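As a rough illustration of the estimation steps summarized above, the following Python sketch shows one way a regression-based initialization, a gradient-descent refinement and a cross-correlation delay search could be written. It is not the authors' implementation. It assumes that the projection is a linear map from formant variations to spatial variations with a 2×2 weight matrix, and that the refinement minimizes the mean squared error; neither the cost function nor the exact model expression is given in this excerpt. The function names, learning rate, iteration count and synthetic data are all hypothetical.

```python
import numpy as np


def initial_weights(dF, dS):
    """Step 1 (assumed form): least-squares regression.

    dF: (n_samples, 2) formant variations (F1, F2).
    dS: (n_samples, 2) spatial variations (x, y), e.g. from accelerometry.
    Returns W of shape (2, 2) such that dS is approximated by dF @ W.
    """
    W, *_ = np.linalg.lstsq(dF, dS, rcond=None)
    return W


def refine_weights(dF, dS, W0, lr=0.1, n_iter=200):
    """Step 2 (assumed form): gradient descent on the mean squared error."""
    W = np.array(W0, dtype=float)  # work on a float copy of the start point
    n = dF.shape[0]
    for _ in range(n_iter):
        residual = dF @ W - dS            # projection error per sample
        grad = 2.0 * dF.T @ residual / n  # gradient of the mean squared error
        W -= lr * grad
    return W


def best_lag(a, b, max_lag):
    """Return the shift k (in samples) that maximizes sum_n a[n + k] * b[n]."""
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for k in lags:
        if k >= 0:
            scores.append(np.dot(a[k:], b[:len(b) - k]))
        else:
            scores.append(np.dot(a[:len(a) + k], b[-k:]))
    return int(lags[int(np.argmax(scores))])


# Synthetic demonstration only; these are not data from the study.
rng = np.random.default_rng(0)
dF = rng.standard_normal((500, 2))
W_true = np.array([[0.5, -0.2], [0.1, 0.8]])  # hypothetical weights
dS = dF @ W_true + 0.01 * rng.standard_normal((500, 2))

W0 = initial_weights(dF, dS)
W = refine_weights(dF, dS, W0)
print(W)

# Delay between one kinematic channel and a copy advanced by 3 samples.
print(best_lag(dS[:, 0], np.roll(dS[:, 0], -3), max_lag=10))
```

Note that for a purely linear model with a squared-error cost, the least-squares solution is already a stationary point, so the refinement step in this sketch leaves it essentially unchanged. Under these assumptions, the refinement only has an effect when the cost or the data differ from those used for initialization, for example after the signals have been realigned by the estimated delay.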

Introduction

The influence of neurological and cognitive processes on speech is well established [1,2,3]. Many studies in the last decade have explored diverse signals such as EEG, MEG, fMRI and other non-invasive methods to provide new insights into the vocal production process [4,5]. This is of particular interest when investigating neurological diseases (cognitive and neuromotor) such as Alzheimer’s disease, PD, Amyotrophic Lateral Sclerosis (ALS), Huntington’s Chorea, and related disorders [6]. Speech can be recorded remotely and without contact on smart terminals such as phones, tablets or laptop computers. It offers the added value of mapping acoustic estimates to neuromuscular activity, which is an advantage in the remote detection and monitoring of diseases dependent on neuromotor pathway transmission [7]. A comprehensive study on the effects of PD on speech [8,9] could provide insights into the underlying physiology, associating speech characteristics to the physical manifestations of the disease. This can be achieved through the study of phonation, articulation, prosody and fluency [10], which would offer valuable information on the activity of specific brain areas involved in speech production, such as motor planning, premotor and motor, and working memory. There is an unmet need to establish a robust and reliable methodology to map estimates extracted from the speech acoustics to motor actions in certain muscles involved in speech articulation and production. One such example is the masseter muscle, responsible for raising the lower jaw. Such a projection methodology is proposed in this research work to transform speech formant dynamics to articulatory kinematics [11,12]. First proposals of an inverse model (relating formant dynamics and articulation) were presented; as a result, several indicators were developed to capture articulatory movements from speech alone (e.g. Absolute Kinematic Velocity, AKV) [13,14]. The problem with these first attempts was the lack of a robust model parameter estimation. This led to further exploratory work, where the relationships between sEMG, accelerometry and speech were investigated [15]. After an in-depth study of the effects of PD on these biometric signals [16], the conclusions were applied to the characterization of PD hypokinetic dysarthria [17,18].

The aim of the present study is to provide insight into acoustic-to-kinematic projection, which could eventually allow acoustically relevant articulation features to be extracted and mapped to neuromotor actions. These could then be used to characterize and monitor neuromotor activity in specific diseases such as PD, which serves as the testbed in this study; that application is the objective of future research already in progress. This approach can provide new insights into the physiological voice production mechanism and tentatively assign any effects of PD to specific components of the vocal production model. Such a model would add new semantic value over other standard approaches in the state of the art. In this regard, the previously proposed mapping model is reformulated in terms of time and space variables to allow a dynamic description of the model coefficients, to be used in further mapping processes for remotely monitoring PD. This description also includes estimations from HCs.

The paper is structured as follows. In Section 2 the proposed acoustic-mechanical model is reformulated in the time-space domain to allow a more robust estimation of its coefficients. Section 3 describes the data acquisition framework, platform and protocols, the biometrical data of participants, and the estimation methodology for the model coefficients in the space-time domain. The results derived from the PD and HC subsets are presented and described in Section 4. Section 5 discusses the robustness of the methodology in light of the results, and analyzes their impact and limitations in possible online applications. The study’s key findings are summarized in Section 6.

Section snippets

The neuromechanical model of the lower jaw articulation

The present study is based on a simplified jaw-tongue articulation model [16] which is known to be representative of PD dysarthria [18]. It relates acoustic and kinematic variables, linking the first two formants F={F1, F2} to the horizontal and vertical coordinates S={xr, yr} of the joint Jaw-Tongue Reference Point (PrJT) in the sagittal plane. This point represents the center of moments of the biomechanical system formed by the maxillary bone, tongue and

Data acquisition framework

The study cohort comprises eight PD participants (four male, four female, all Spanish native speakers, stage 2 on the Hoehn and Yahr (H&Y) scale) who were recruited from a PD patient association in the metropolitan area of Madrid (Asociación de Pacientes de Parkinson de Alcorcón y Móstoles, APARKAM). For comparison purposes, four male and four female healthy control participants have been included in the study. The biometrical data of participants are given in Table 1.

The study was approved by the Ethical Committee of

Data recording examples

The speech signal, the sEMG and the three acceleration channels from two repetitions of the […aja…] by a female HC participant (CF1) are shown in Fig. 3 as an illustrative example. The sEMG signal has been included in the plots (channel b) to confirm that the acceleration and speech signals are concordant with the action of the masseter.

Similarly, the same set of recordings from one of the PD female participants included in the study (PF1) is shown in Fig. 4.

The repetition of [

Discussion

In Section 3 an inverse linear model based on an acoustic-to-kinematic projection has been presented. This model has been validated by the results shown in Section 4. Consequently, the following findings may be highlighted:

  • The relationship between acoustic and kinematic variables (ΔF to ΔS) has been established and may be explained using the inverse model described in expression (1).

  • The initial estimation of the model weights has been carried out using least squares linear regression.

  • A

Conclusions

The present study has been conceived to provide further insights into the acoustic-to-kinematic model of the jaw-tongue articulation joint, based on preliminary approaches. In summary, the key findings derived from this study are:

  • An acoustic-to-kinematic model to predict the jaw-tongue joint kinematics from acoustic dynamics expressed in formants has been examined in depth, with special emphasis on weight estimation procedures.

  • A weight estimation refinement method based on an iterative gradient

CRediT authorship contribution statement

A. Gómez: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization. A. Tsanas: Validation, Formal analysis, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition. P. Gómez: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - review & editing, Visualization, Supervision. D. Palacios-Alonso: . V. Rodellar:

Acknowledgments

This research has been funded by grants TEC2016-77791-C4-4-R (MINECO, Spain) and CENIE_TECA-PARK_55_02 INTERREG V-A Spain-Portugal (POCTEP). The authors thank Asociación de Parkinson de Alcorcón y Móstoles (APARKAM), and Azucena Balandín (director) and Zoraida Romero (speech therapist) for their help and advice.

Declaration of Competing Interest

The authors report no declarations of interest.

References (27)

  • D. Skaper et al. An inflammation-centric view of neurological disease: beyond the neuron. Front. Cell. Neurosci. (2018)

  • S. Arora et al. Developing a large scale population screening tool for the assessment of Parkinson’s disease using telephone-quality speech. J. Acoust. Soc. Am. (2019)

  • Y. Yunusova et al. Classifications of vocalic segments from articulatory kinematics: healthy controls and speakers with dysarthria. J. Speech Lang. Hear. Res. (2011)