External lighting and sensing photoglottography: Characterization and MSePGG algorithm
Introduction
Observing and measuring the glottal area between the moving vocal folds during breathing, speech production or swallowing has long been a major technological challenge. Since Garcia's pioneering experiments using mirrors [1], several different techniques have been developed and optimized. Video recording using an endoscope coupled with a stroboscopic light or a high-speed camera has become a very popular technique despite the cost of the equipment and the need for extensive post-processing in the case of high-speed recordings [2], [3], [4], [5], [6]. This technique is invasive and might cause discomfort because optical devices are inserted (through the oral cavity in the case of a rigid endoscope, or through the nasal cavity in the case of a flexible endoscope); a medical environment is therefore required, and the pronunciation of certain phonemes can be hindered or inhibited. Quantitative area extraction from endoscopic images remains challenging even when stereo-endoscopy or additional devices are used [3], [7], [8], [9], owing among other things to the trade-off between spatial and temporal resolution in image acquisition. Non-invasive alternatives are very few [10]. Ultrasound techniques have been tested but lack spatial resolution [11], [12], [13], and therefore ultrasound-based imaging is mostly used for innocuous visualization only [14], [15], [16], [17], [18].
PhotoGlottoGraphy (PGG) [10], [19] uses devices that illuminate the glottis and measure the amount of light passing between the vocal folds. In its original form, PGG is an invasive technique, as it requires the insertion of a light source or a light sensor through the oral or nasal cavity. In contrast, External PhotoGlottoGraphy (ePGG) [20], [21], [22] is a non-invasive technique, with both light source and sensor placed outside the vocal tract on the exterior of the neck (Fig. 1). Another difference from classical PGG is the use of near-infrared (IR) lighting instead of visible light. Indeed, IR wavelengths in the range 700–1000 nm are reported to transilluminate large sections of human tissue [23], [24], [25]. Because ePGG is non-invasive, it no longer requires a medical environment and allows continuous measurements with as little disturbance as possible, e.g. during speech production. Consequently, if a relationship between measured ePGG signals and glottal area variation Ag(t) can be established, ePGG is suitable for observing variations of the glottal area non-invasively and continuously regardless of location (medical practice, laboratory, field, …), which makes it an interesting technique for many disciplines. Therefore, the aim of this work is to investigate and formalize the relationship between ePGG measurements and glottal area.
In Section 2, the ePGG system is detailed. Next, mechanical replicas and setups are presented (Section 3) and used to characterize experimentally (Section 4) the relationship between varying glottal area and ePGG signal under controlled conditions. From this characterization, a Multi-Signal-ePGG (MSePGG) model and a parameter estimation procedure are proposed (Section 5). Glottal areas estimated with MSePGG are then validated (Section 6) on a deformable glottal replica, and the application of MSePGG to a human subject is discussed. The general discussion and conclusion are given in Sections 7 and 8.
Section snippets
ePGG measurement system
The ePGG system [20], [21], [22] consists of two main elements (Fig. 1): a light source (infrared LED, LSF812N1, wavelength 810 nm, size ≤5 mm, beam angle 45 ± 5°) and a single light sensor (photo-diode, Vishay Semiconductors BP104, peak sensitivity at wavelength 950 nm, size ≤3 mm) placed in a holder. Electrical ePGG signals (between 0 V and 5 V) are acquired using a data acquisition card (Data Translation, 16 bit) and software (QuickDaq 7.8.10). In addition, the ePGG signal is amplified
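As a rough illustration of the acquisition step, the sketch below converts raw converter counts to volts and computes the quantization step. It assumes a unipolar, linear 16-bit converter whose input range exactly matches the 0 V to 5 V signal range quoted above; the actual input range and coding of the acquisition card are not stated here, and the function name `counts_to_volts` is hypothetical.

```python
def counts_to_volts(counts, v_min=0.0, v_max=5.0, bits=16):
    """Map a raw ADC count to volts.

    Assumption (not from the source): unipolar linear coding over
    [v_min, v_max] with 2**bits - 1 as the full-scale count.
    """
    full_scale = (1 << bits) - 1
    if not 0 <= counts <= full_scale:
        raise ValueError("counts outside converter range")
    return v_min + (v_max - v_min) * counts / full_scale


# Quantization step (one least-significant bit) under the same assumption.
lsb_volts = counts_to_volts(1) - counts_to_volts(0)
```

Under these assumptions the quantization step is about 76 µV, which gives an idea of the smallest signal change that could be resolved before any amplification is taken into account.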
Mechanical replicas and setups
To fully assess the potential of ePGG as a non-invasive measurement of glottal area Ag, the relationship between ePGG and Ag needs to be studied quantitatively as a function of parameters potentially affecting the ePGG signal. Therefore, mechanical replicas of laryngeal airway portions are mounted on an experimental setup developed to control and measure physical quantities in a reproducible and accurate way. An overview of variables of interest and their orders of magnitude on human adult
ePGG signal characterization
The ePGG system (Section 2) is assessed on the mechanical replicas (Section 3). Since the experimental setups are equipped to measure the glottal area, the relationship between ePGG signal and glottal area can be systematically studied on these replicas as a function of parameters potentially affecting the ePGG signal (Fig. 1). In the following, the experimental ePGG signal characterization is presented first for static geometrical configurations with constant glottal area (Section 4.1) and
Multi-signal-ePGG (MSePGG)
In Section 4, it was shown that the ePGG signal is mainly determined by (1) the source-sensor distance, (2) the minimum area of the channel portion between the source and sensor and (3) the measurement condition determined by the combination of wall properties (e.g. absorption), environment (e.g. light) and ePGG system settings (e.g. amplification outlined in Section 2) and positioning (e.g. orientation angle). In the following, a Multi-Signal-ePGG (MSePGG) approach is proposed accounting for
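The MSePGG equations are not reproduced in this excerpt, so the following is only a toy sketch of the general idea of calibrating with several source-sensor distances. It assumes, purely for illustration, that the signal follows V = k · A · exp(−μ · d), where A is the minimum area, d the source-sensor distance, k a gain lumping the measurement condition and μ an attenuation coefficient. This two-parameter form, the function names and the synthetic numbers are all assumptions and are not the authors' model.

```python
import math


def fit_toy_model(distances_mm, voltages, area_ref_mm2):
    """Fit k and mu in the ASSUMED toy model V = k * A * exp(-mu * d).

    Uses ordinary least squares on ln(V / A_ref) = ln(k) - mu * d,
    with signals recorded at several distances for one known area.
    """
    n = len(distances_mm)
    if n != len(voltages) or n < 2:
        raise ValueError("need at least two (distance, voltage) pairs")
    y = [math.log(v / area_ref_mm2) for v in voltages]
    d_mean = sum(distances_mm) / n
    y_mean = sum(y) / n
    sxx = sum((d - d_mean) ** 2 for d in distances_mm)
    if sxx == 0.0:
        raise ValueError("distances must not all be equal")
    sxy = sum((d - d_mean) * (yi - y_mean)
              for d, yi in zip(distances_mm, y))
    slope = sxy / sxx
    intercept = y_mean - slope * d_mean
    return math.exp(intercept), -slope  # (k, mu)


def estimate_area(voltage, distance_mm, k, mu):
    """Invert the assumed toy model to recover the area from one sample."""
    return voltage / (k * math.exp(-mu * distance_mm))


# Synthetic, noise-free calibration data (hypothetical values).
K_TRUE, MU_TRUE, A_REF = 0.02, 0.05, 50.0
cal_d = [30.0, 40.0, 50.0, 60.0]
cal_v = [K_TRUE * A_REF * math.exp(-MU_TRUE * d) for d in cal_d]

k_fit, mu_fit = fit_toy_model(cal_d, cal_v, A_REF)
sample_v = K_TRUE * 20.0 * math.exp(-MU_TRUE * 40.0)
area_mm2 = estimate_area(sample_v, 40.0, k_fit, mu_fit)
```

With noise-free synthetic data the fit recovers the parameters used to generate it. With real recordings, the choice of model and the number of parameters would have to follow the characterization results rather than this simplified form.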
MSePGG results
MSePGG outlined in Section 5 is applied following the workflow shown in Fig. 10. Measurements on the deformable mechanical replica (Section 6.1) and on a human speaker (Section 6.2) are assessed.
Discussion
Results shown in Section 6 illustrate that MSePGG provides an estimation of the time-varying minimum area on a mechanical replica and on a human speaker following the workflow summarized in Section 5.4. The MSePGG algorithm and workflow provide an elegant and innocuous method relying on three parameters to be estimated simultaneously. The MSePGG algorithm avoids dealing with the complexity of the composing tissue layers and anatomy. This way some restrictions related to the use of other techniques
Conclusion
Following characterization of ePGG measurements on mechanical replicas, the MSePGG algorithm and workflow are proposed in order to provide a quantitative estimation of the time-varying glottal area following a brief calibration protocol exploiting several source-sensor distances. The good quantitative agreement obtained on mechanical replicas (mean error 5.4%) and preliminary observations on a human subject (estimations within 12%) suggest that MSePGG is a promising technique to estimate the
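The excerpt does not define how the mean error is computed. One common reading, assumed here, is the mean absolute relative difference between estimated and reference areas, expressed in percent. The sketch below uses that definition with made-up values; the function name and the numbers are hypothetical.

```python
def mean_relative_error_percent(estimated, reference):
    """Mean absolute relative error in percent.

    Assumption (not from the source): error is |estimate - reference|
    divided by |reference|, averaged over all samples.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - r) / abs(r) for e, r in zip(estimated, reference))
    return 100.0 * total / len(reference)


# Made-up areas in mm^2, each estimate off by 5% of its reference value.
reference_mm2 = [10.0, 20.0, 40.0]
estimated_mm2 = [10.5, 19.0, 42.0]
error_percent = mean_relative_error_percent(estimated_mm2, reference_mm2)
```

Because the error is normalized by the reference area, this metric becomes unstable when the reference area approaches zero, for example near glottal closure, so such samples may need separate handling.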
Acknowledgements
Partly funded by ArtSpeech project (ANR-15-CE23-0024). Human ePGG data registration was approved by ethics committee 1922081 (dated 02/02/2016). Thanks to D. Sathiyanarayanan for his contribution to measurements on a human speaker.
References (36)
- et al. Calibration of laryngeal endoscopic high-speed image sequences by an automated detection of parallel laser line projections. Med. Image Anal. (2008)
- et al. Comparison of an audio-based and a video-based approach for detecting diplophonia. Biomed. Signal Process. (2017)
- Editorial: translaryngeal vocal cord ultrasound: ready for prime time. Surgery (2016)
- et al. Mucosal wave measurement and visualization techniques. J. Voice (2011)
- et al. Anatomical factors affecting the use of ultrasound to predict vocal fold motion: a pilot study. Am. J. Otolaryngol. (2018)
- et al. Realistic glottal motion and airflow rate during human breathing. Med. Eng. Phys. (2015)
- Observations on the human voice. Proc. R. Soc. Lond. (1855)
- et al. Phonovibrography: mapping high-speed movies of vocal fold vibrations into 2-d diagrams for visualizing and analyzing the underlying laryngeal dynamics. IEEE Trans. Med. Imaging (2008)
- et al. State of the art laryngeal imaging: research and clinical implications. Curr. Opin. Otolaryngol. Head Neck Surg. (2011)
- et al. Tracking of multiple fundamental frequencies in diplophonic voices. IEEE Trans. Audio Speech Lang. Process. (2018)
- Stereo-fiberscopic measurement of the larynx: a preliminary experiment by use of ordinary laryngeal fiberscopes. Ann. Bull. RILP
- Measurements of the vocal fold length by use of stereoendoscope – a preliminary study. Ann. Bull. RILP
- Estimation of glottal area function using stereo-endoscopic high-speed digital imaging. Proc. Interspeech
- Clinical Measurement of Speech and Voice
- Ultrasonic recording of the vibrating vocal folds. Arch. Otolaryngol.
- Transmission of ultrasound through the larynx as a means of determining vocal-fold activity. IEEE Trans. Biomed. Eng.
- Ultrasonic observation of vocal folds vibration. Vocal Folds Physiology
- Role of ultrasound in the assessment of vocal cord function in infants and children. Ann. Otol. Rhinol. Laryngol.