External lighting and sensing photoglottography: Characterization and MSePGG algorithm

https://doi.org/10.1016/j.bspc.2019.01.014

Abstract

Continuous observation of the time-varying glottal area still lacks a direct, quantitative, non-invasive measurement method, despite its relevance to the study of breathing, speech production, swallowing, etc. External photoglottography (ePGG) relies on external glottal transillumination and sensing and is therefore suitable for non-invasive and continuous observation. Nevertheless, a formalized relationship between the ePGG signal and the glottal area is lacking. The current paper proposes a Multi-Signal-ePGG (MSePGG) algorithm based on the characterization of ePGG measurements under controlled conditions using mechanical glottal replicas. MSePGG accounts for the main parameters affecting the ePGG signal: the glottal area to be quantified and measurement conditions such as tissue properties and signal amplification. It is shown that MSePGG enables quantitative and continuous measurement of the time-varying glottal area on mechanical replicas. Application to a human subject is illustrated and discussed.

Introduction

Observation, and further measurement, of the glottal area between the moving vocal folds during breathing, speech production or swallowing has long been a major technological challenge. Since Garcia's pioneering experiments using mirrors [1], several different techniques have been developed and optimized. Video recording using an endoscope coupled with a stroboscopic light or a high-speed camera has become a very popular technique despite the cost of the equipment and the need for extensive post-processing in the case of high-speed recordings [2], [3], [4], [5], [6]. This technique might cause discomfort and is invasive due to the insertion of optical devices (through the oral cavity in the case of a rigid endoscope, or through the nasal cavity in the case of a flexible endoscope). A medical environment is therefore required, and pronunciation of certain phonemes can be hindered or inhibited. Quantitative area extraction from endoscopic images remains challenging even when stereo-endoscopy or additional devices are used [3], [7], [8], [9], due among other things to the trade-off between spatial and temporal resolution in image acquisition. Non-invasive alternatives are very few [10]. Ultrasound techniques have been tested but lack spatial resolution [11], [12], [13], and therefore ultrasound-based imaging is mostly used for innocuous visualization only [14], [15], [16], [17], [18].

PhotoGlottoGraphy (PGG) [10], [19] uses devices that illuminate the glottis and measure the amount of light that passes between the vocal folds. In its original form, PGG is an invasive technique, as it requires the insertion of a light source or of a light sensor through the oral or nasal cavity. In contrast, External PhotoGlottoGraphy (ePGG) [20], [21], [22] is a non-invasive technique, with both light source and sensor placed outside of the vocal tract on the exterior of the neck (Fig. 1). Another difference from classical PGG is the use of lighting in the near infrared (IR) instead of visible light. Indeed, IR wavelengths in the 700–1000 nm range are reported to transilluminate large sections of human tissue [23], [24], [25]. Given its non-invasive nature, ePGG no longer requires a medical environment and allows continuous measurements with as little disturbance as possible, e.g. during speech production. Consequently, if a relationship between measured ePGG signals and glottal area variation Ag(t) can be established, ePGG is suitable for observing variations of the glottal area non-invasively and continuously regardless of location (medical practice, laboratory, field, …), which makes it an interesting technique for many disciplines. Therefore, the aim of this work is to investigate and formalize the relationship between ePGG measurements and glottal area.

In Section 2, the ePGG system is detailed. Next, mechanical replicas and setups are presented (Section 3) and used to experimentally characterize (Section 4) the relationship between varying glottal area and ePGG signal under controlled conditions. From this characterization, a Multi-Signal-ePGG (MSePGG) model and parameter estimation procedure are proposed (Section 5). MSePGG-estimated glottal areas are then validated (Section 6) on a deformable glottal replica, and the application of MSePGG to a human subject is discussed. The general discussion and conclusion are given in Sections 7 and 8.

Section snippets

ePGG measurement system

The ePGG system [20], [21], [22] consists of two main elements (Fig. 1): a light source (infrared LED, LSF812N1, wavelength 810 nm, size ≤5 mm, beam angle 45 ± 5°) and a single light sensor (photo-diode, Vishay Semiconductors BP104, peak sensitivity at wavelength 950 nm, size ≤3 mm) placed in a holder. Electrical ePGG signals (between 0 V and 5 V) are acquired using a data acquisition card (Data Translation, 16 bit) and software (QuickDaq 7.8.10). In addition, the ePGG signal is amplified

Mechanical replicas and setups

To fully assess the potential of ePGG as a non-invasive measurement of glottal area Ag, the relationship between ePGG and Ag needs to be studied quantitatively as a function of parameters potentially affecting the ePGG signal. Therefore, mechanical replicas of laryngeal airway portions are mounted on an experimental setup developed to control and measure physical quantities in a reproducible and accurate way. An overview of variables of interest and their orders of magnitude on human adult

ePGG signal characterization

The ePGG system (Section 2) is assessed on the mechanical replicas (Section 3). Since the experimental setups are equipped to measure the glottal area, the relationship between ePGG signal and glottal area can be systematically studied on these replicas as a function of parameters potentially affecting the ePGG signal (Fig. 1). In the following, the experimental ePGG signal characterization is presented first for static geometrical configurations with constant glottal area (Section 4.1) and

Multi-signal-ePGG (MSePGG)

In Section 4, it was shown that the ePGG signal is mainly determined by (1) the source-sensor distance, (2) the minimum area of the channel portion between the source and sensor and (3) the measurement condition determined by the combination of wall properties (e.g. absorption), environment (e.g. light) and ePGG system settings (e.g. amplification outlined in Section 2) and positioning (e.g. orientation angle). In the following, a Multi-Signal-ePGG (MSePGG) approach is proposed accounting for

MSePGG results

MSePGG outlined in Section 5 is applied following the workflow shown in Fig. 10. Measurements on the deformable mechanical replica (Section 6.1) and on a human speaker (Section 6.2) are assessed.
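
As a hedged illustration of how agreement between estimated and reference areas can be quantified, the sketch below computes a mean absolute relative error in percent. The exact error definition used in this work is not given in the text shown here, so the formula, the function name `mean_relative_error` and the sample values are assumptions made for illustration only.

```python
def mean_relative_error(estimated, reference):
    """Mean absolute relative error, in percent, between two area series.

    Assumed definition (not taken from the paper):
        100 * mean(|estimated_i - reference_i| / |reference_i|)
    """
    if len(estimated) != len(reference):
        raise ValueError("series must have the same length")
    if len(reference) == 0:
        raise ValueError("series must not be empty")
    total = 0.0
    for est, ref in zip(estimated, reference):
        if ref == 0:
            raise ValueError("reference area must be non-zero")
        total += abs(est - ref) / abs(ref)
    return 100.0 * total / len(reference)


# Hypothetical areas in mm^2; illustrative values, not data from the paper.
reference_mm2 = [10.0, 20.0, 40.0, 80.0]
estimated_mm2 = [11.0, 19.0, 42.0, 76.0]

print(f"mean relative error: {mean_relative_error(estimated_mm2, reference_mm2):.2f} %")
```

A relative (rather than absolute) error is used in this sketch so that small and large areas contribute equally; a reference area of zero (fully closed glottis) is rejected here because the ratio is undefined in that case.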

Discussion

Results shown in Section 6 illustrate that MSePGG provides an estimation of the time-varying minimum area on a mechanical replica and on a human speaker following the workflow summarized in Section 5.4. The MSePGG algorithm and workflow provide an elegant and innocuous method relying on 3 parameters to be estimated simultaneously. The MSePGG algorithm avoids dealing with the complexity of the constituent tissue layers and anatomy. This way some restrictions related to the use of other techniques

Conclusion

Following characterization of ePGG measurements on mechanical replicas, the MSePGG algorithm and workflow are proposed in order to provide a quantitative estimation of the time-varying glottal area following a brief calibration protocol exploiting several source-sensor distances. The good quantitative agreement obtained on mechanical replicas (mean error 5.4%) and preliminary observations on a human subject (estimations within 12%) suggest that MSePGG is a promising technique to estimate the

Acknowledgements

Partly funded by ArtSpeech project (ANR-15-CE23-0024). Human ePGG data registration was approved by ethics committee 1922081 (dated 02/02/2016). Thanks to D. Sathiyanarayanan for his contribution to measurements on a human speaker.

References (36)

  • M. Sawashima et al.

    Stereo-fiberscopic measurement of the larynx: a preliminary experiment by use of ordinary laryngeal fiberscopes

    Ann. Bull. RILP

    (1974)
  • M. Sawashima et al.

    Measurements of the vocal fold length by use of stereoendoscope – a preliminary study

    Ann. Bull. RILP

    (1981)
  • H. Imagama et al.

    Estimation of glottal area function using stereo-endoscopic high-speed digital imaging

    Proc. Interspeech

    (2010)
  • R. Baken

    Clinical Measurement of Speech and Voice

    (1987)
  • C. Hertz et al.

    Ultrasonic recording of the vibrating vocal folds

    Arch. Otolaryngol.

    (1970)
  • S. Hamlet et al.

    Transmission of ultrasound through the larynx as a means of determining vocal-fold activity

    IEEE Trans. Biomed. Eng.

    (1972)
  • T. Kaneko et al.

    Ultrasonic observation of vocal folds vibration

    Vocal Folds Physiology

    (1981)
  • E. Friedman

    Role of ultrasound in the assessment of vocal cord function in infants and children

    Ann. Otol. Rhinol. Laryngol.

    (1997)