Electrolarynx in voice rehabilitation

doi:10.1016/j.anl.2006.11.010

Auris Nasus Larynx

Volume 34, Issue 3, September 2007, Pages 327-332


Abstract

Objective

Patients with laryngeal cancer who have undergone surgical removal of the entire larynx lose the ability to phonate. Electrolarynx (EL) speech is the most commonly adopted form of alaryngeal phonation. However, EL speech is notorious for its monotonic, robotic sound quality, which results from the lack of pitch control and the presence of radiated noise. This paper reviews the modalities of EL speech and introduces technologies for controlling the pitch and reducing the noise of the device.

Methods

Improvements in EL speech quality fall into two parts: improving the sound quality of the EL device by applying different enhancement algorithms to reduce radiated and additive noise, and adding a pitch-control function to the EL using advanced technology.

Results

Adaptive filtering and subtractive-type algorithms have been shown to reduce the noise level associated with EL speech. More mature technologies show promise for a hands-free EL system with more accurate and better synchronized control of pitch and voice onset.

Conclusion

The advent of micro-technology and human-machine integration promises to improve EL speech quality, and more efficient algorithms enhance EL sound quality. Such improvements appear to improve the intelligibility of EL speech and, in turn, the quality of life of EL speakers.

Introduction

The removal of the entire larynx as a treatment for laryngeal cancer usually results in the loss of the ability to produce voice and speech. Statistical data show that there are over 600,000 laryngectomees in the world [1], and voice restoration is clearly essential to these people. Standard esophageal (SE) speech and tracheoesophageal (TE) speech are the two main methods used by laryngectomees for voice rehabilitation. However, because of the low acquisition rate of SE speech (∼6%) [2] and the fact that as many as one-third of laryngectomized patients find TE speech unsuitable for anatomical or personal reasons [3], electrolarynx (EL) phonation is the most commonly adopted form of phonation. An EL is a battery-powered device with an internal preset pitch that can be adjusted to suit the individual preferences of male and female speakers. Lauder [4] and Rothman [5] found that the EL was easier to use, allowed longer sentences to be produced without special care, and was more effective for communication in many situations.

Since the debut of the first EL, named Sonovox, by Wright in 1942, the EL has undergone many modifications. In 1945, the Aurex company in Chicago started producing an EL named the Aurex Neovox M-520T, setting the design foundation of the modern EL. In 1959, a transistorized EL was developed by Bell Laboratories [6]. To date, several ELs are in commercial use, including Nu-voice, Romet, Amplicode, Cooper-Rand, and Servox. The first four do not allow pitch adjustment during speaking, and Servox has only two preset pitch levels (high and low), selected with an external tone activation switch during conversation (see Fig. 1). There are two types of EL: the neck type and the intra-oral type. The neck-type EL is the most widely used among laryngectomees. During phonation, the hand-held device is held against the neck, approximately at the level of the former glottis, so that an electromechanical vibrator introduces sound into the oral and pharyngeal cavities. The vibration from the electronic sound source is transmitted through the neck tissues, and the user modulates it to create speech by moving articulators such as the lips, teeth, tongue, jaw and velum [6]. What distinguishes the intra-oral type EL is the path through which the sound is transmitted: an intra-oral tube delivers the sound directly into the mouth. The energy leakage of the sound is therefore limited, and the speech quality is better than that of the neck-type EL. Yet because the device is inconvenient to use, causing articulation and sanitary problems, the intra-oral type EL is more commonly used by laryngectomees immediately after laryngectomy or while they are still undergoing radiation therapy.

Acoustic and perceptual characteristics associated with EL phonation have been studied extensively in the English-speaking alaryngeal population. Because the pitch and intensity of EL speech are fixed during phonation, only a few studies have focused on the acoustic perspective [5], [7], [8]. Experiments performed to characterize spectral differences between EL and normal phonation indicate that EL speech has an energy peak near 500 Hz with a maximum amplitude near 2.5 kHz, whereas normal speech displays an energy peak near 500 Hz. Qi and Weinberg [8] also reported that EL speech has a substantial low-frequency deficit: compared with normal speech, its output is 30 dB lower below 500 Hz but 5–10 dB higher above 2 kHz. Most studies instead focused on describing the perceptual characteristics of EL speech, which were usually investigated in comparison with SE or TE speech [9], [10], [11], [12], [13], [14], [15], [16]. It is generally agreed that EL speech is associated with lower intelligibility, poorer listener acceptability, and more serious voicing confusions than SE or TE speech, although all alaryngeal modes are considerably worse than normal laryngeal (NL) speech. A review of the literature also indicates that some studies examined the characteristics of EL speech in tone languages including Thai, Cantonese, and Mandarin [17], [18], [19], [20], [21], [22], [23], [24], [25]. Compared with non-tone languages, tone languages are characterized by wider ranges of fundamental frequency (F₀) and faster F₀ changes. This explains the poorer performance of EL speech in these languages. Results indicated that EL speakers were generally not able to produce the phonemic tones at a level of proficiency comparable to that of normal speakers. The patterns of tonal confusion in EL speech were different from those in NL speech. In EL speech, level tones were associated with higher intelligibility than rising or falling tones. Such poor performance was related to EL speakers’ inability to consistently produce pitch contours comparable to those of normal speakers, owing to the limitations of the instrument itself. Although some studies have focused on the acoustic and perceptual characteristics of EL speech, the literature presents conflicting information on comparisons between EL and other alaryngeal speakers. Discrepancies in the findings of these studies may be due to differences in subject sampling, recording methods, methods of analysis, or the speech samples used.
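The decibel figures reported by Qi and Weinberg [8] can be easier to judge when read as ratios. The short sketch below is only an illustration of the standard decibel conversions (power ratio 10^(dB/10), amplitude ratio 10^(dB/20)); the function names are ours and do not come from the cited study.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a ratio of powers."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a ratio of amplitudes."""
    return 10.0 ** (db / 20.0)


# Level differences of EL speech relative to normal speech, as quoted in the text.
low_band_db = -30.0               # below 500 Hz
high_band_db = (5.0, 10.0)        # above 2 kHz

print("below 500 Hz, power ratio:", db_to_power_ratio(low_band_db))
print("below 500 Hz, amplitude ratio:", db_to_amplitude_ratio(low_band_db))
for db in high_band_db:
    print(f"above 2 kHz, +{db} dB, power ratio:", db_to_power_ratio(db))
```

Read this way, a 30 dB deficit means that EL speech carries one thousandth of the power of normal speech below 500 Hz, while the 5–10 dB excess above 2 kHz corresponds to roughly 3 to 10 times the power.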

Section snippets

EL with pitch-control function

With more advanced technologies, several newer generations of EL have been developed to improve the sound quality. As indicated in the previous discussion, the monotonic and robotic sound quality associated with EL speech is due to the lack of pitch adjustment during phonation. Pitch is preset at a certain level before use and remains steady during speech production. Improvements in EL design should therefore focus on real-time adjustment of the pitch. According to the method of pitch
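To make the difference between a preset pitch and real-time pitch adjustment concrete, the following sketch generates the pulse timing of an idealized vibrator from a fundamental-frequency contour. It is a minimal illustration under our own assumptions (a phase-accumulator pulse generator, an 8 kHz sample rate, and the hypothetical names `pulse_onsets` and `intervals`); it does not describe the control method of any particular device.

```python
def pulse_onsets(f0_contour_hz, sample_rate_hz):
    """Return the sample indices at which a pulse is emitted.

    A phase accumulator advances by f0 / sample_rate per sample and
    emits one pulse each time a full cycle has been completed.
    """
    onsets = []
    phase = 0.0
    for n, f0 in enumerate(f0_contour_hz):
        phase += f0 / sample_rate_hz
        if phase >= 1.0:
            phase -= 1.0
            onsets.append(n)
    return onsets


def intervals(onsets):
    """Distances in samples between consecutive pulses."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


SAMPLE_RATE = 8000      # assumed sample rate in Hz
N = SAMPLE_RATE         # one second of signal

# Preset pitch: F0 stays at 125 Hz for the whole utterance.
fixed_contour = [125.0] * N
# Adjustable pitch: F0 rises linearly from 125 Hz to 250 Hz.
rising_contour = [125.0 + 125.0 * n / (N - 1) for n in range(N)]

fixed_intervals = intervals(pulse_onsets(fixed_contour, SAMPLE_RATE))
rising_intervals = intervals(pulse_onsets(rising_contour, SAMPLE_RATE))

print("fixed pitch, distinct pulse intervals:", sorted(set(fixed_intervals)))
print("rising pitch, first and last interval:",
      rising_intervals[0], rising_intervals[-1])
```

With the fixed contour every pulse interval is identical, which corresponds to the monotonic quality described above; with the rising contour the intervals shorten over time, which is what a pitch-control function has to achieve during phonation.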

EL speech enhancement

During EL phonation, some of the sound produced by the vibrating diaphragm is radiated directly from the device. A poor interface between the device and the surrounding neck tissues may result in radiated noise, which interferes with the intelligibility of the speech. In extreme cases, stiff neck tissue resulting from radiation therapy may reflect all the acoustic energy from the EL back into the environment without propagating into the oral cavity for articulation. This apparently fails the
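As an illustration of how adaptive filtering can reduce directly radiated noise, the sketch below implements a generic least-mean-squares (LMS) noise canceller. It is a textbook LMS filter, not the algorithm of any study reviewed here, and every signal in it is a synthetic stand-in: we assume that a reference copy of the device noise is available, model the leakage into the microphone with a hypothetical three-tap path, and use a sinusoid in place of speech.

```python
import math
import random


def lms_cancel(primary, reference, num_taps=8, step_size=0.01):
    """Subtract an adaptively filtered copy of `reference` from `primary`."""
    weights = [0.0] * num_taps   # adaptive FIR coefficients
    history = [0.0] * num_taps   # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate           # also the enhanced output sample
        weights = [w + 2.0 * step_size * error * h
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def mean_square(xs):
    return sum(x * x for x in xs) / len(xs)


# Synthetic stand-ins (assumptions, not measured EL data).
rng = random.Random(0)
n_samples = 4000
reference = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]   # device noise
speech = [0.5 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(n_samples)]
leak_path = [0.6, 0.3, 0.1]      # hypothetical device-to-microphone path
leak = [sum(c * reference[n - k] for k, c in enumerate(leak_path) if n - k >= 0)
        for n in range(n_samples)]
primary = [s + v for s, v in zip(speech, leak)]   # what the microphone records

cleaned = lms_cancel(primary, reference)

# Compare the noise power before and after, once the filter has adapted.
half = n_samples // 2
noise_before = mean_square(leak[half:])
noise_after = mean_square([c - s for c, s in zip(cleaned[half:], speech[half:])])
print(f"noise power before: {noise_before:.4f}, after: {noise_after:.4f}")
```

The filter works only to the extent that the reference is correlated with the noise and uncorrelated with the speech. In a real EL the radiated noise is a periodic buzz that shares its harmonics with the speech, so the choice of reference signal and step size would need far more care than in this idealized setting.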

Summary

As mentioned above, considerable research has been conducted to investigate practical and theoretical improvements of EL speech. With the development of state-of-the-art technology, we have reason to believe that significant advances in the quality of EL speech can be expected in two aspects. The first aspect is the ability of the EL to adjust pitch and intensity in real time during phonation. The EMG-control EL will likely be adopted for several reasons. First of all, the method of

References (40)

A.A. Knox et al.
The effects of training in comprehension of electrolaryngeal speech
J Commun Disord
(1973)
J. Gandour et al.
Perception of contrastive stress in alaryngeal speech
J Phonet
(1982)
M. Ng et al.
Speech performance of adult Cantonese-speaking laryngectomees using different types of alaryngeal phonation
J Voice
(1997)
H.J. Liu et al.
Aerodynamic characteristics of laryngectomees breathing quietly and speaking with the electrolarynx
J Voice
(2004)
H. Takahashi et al.
Alaryngeal speech aid using an intra-oral electrolarynx and a miniature fingertip switch
Auris Nasus Larynx
(2005)
Hirokazu S, Takahashi H. Voice generation system using an intra-mouth vibrator for the laryngectomee. MS thesis. Japan:...
R. Hillman et al.
Functional outcomes following treatment for advanced laryngeal cancer. Part 1. Voice preservation in advanced laryngeal cancer. Part II. Laryngectomy rehabilitation: the state-of-the-art in the VA system
Ann Otol Rhinol Laryngol
(1998)
C. Karen et al.
Utilization of microprocessors in voice quality improvement: the electrolarynx
Curr Opin Otolaryngol Head Neck
(2000)
E. Lauder
The laryngectomee and the artificial larynx—a second look
J Speech Hear Disord
(1970)
H. Rothman
Acoustic analysis of artificial electronic larynx speech

H.L. Barney et al.
An experimental transistorized artificial larynx
Bell Syst Tech J
(1959)
M.S. Weiss et al.
Acoustic and perceptual characteristics of speech produced with an electronic artificial larynx
J Acoust Soc Am
(1979)
Y. Qi et al.
Low-frequency energy deficit in electrolaryngeal speech
J Speech Hear Res
(1991)
M. Hyman
An experimental study of artificial-larynx and esophageal speech
J Speech Hear Disord
(1955)
R.L. McCroskey et al.
The relative intelligibility of esophageal speech and artificial larynx speech
J Speech Hear Disord
(1963)
T. Shipp
Frequency, duration, and perceptual measures in relation to judgments of alaryngeal speech acceptability
J Speech Hear Res
(1967)
S.C. Holly et al.
A comparison of the intelligibility of esophageal, electrolarynx, and normal speech in quiet and in noise
J Commun Disord
(1983)
S.E. Williams et al.
Differences in speaking proficiency in three laryngectomy groups
Arch Otol
(1985)
J. Gandour et al.
Production of intonation and contrastive stress in electrolaryngeal speech
J Speech Hear Res
(1984)
J. Gandour et al.
Vowel length in Thai alaryngeal speech
Folia Phoniatric Logo
(1987)

