Feature Fusion and Ablation Analysis in Gender Identification of Preschool Children from Spontaneous Speech

Radha, Kodali; Bansal, Mohan

doi:10.1007/s00034-023-02399-y

Published: 20 May 2023

Volume 42, pages 6228–6252, (2023)

Circuits, Systems, and Signal Processing

Abstract

Children below 6 years of age are preliterate and use speech as one of their primary forms of communication. Fundamental frequency, or pitch, is commonly used to classify gender, but because their vocal tracts are immature, young children of both genders have similar pitch, ranging from 215 to 390 Hz. Most gender identification studies have relied on pitch and mel frequency cepstral coefficients (MFCC) because these features capture the salient characteristics of speech signals. However, pitch and MFCC perform poorly on noisy speech and consequently fail to detect gender characteristics accurately. To address this limitation, the proposed work investigates a novel fusion and ablation analysis of MFCC and gammatone frequency cepstral coefficients (GFCC). To improve the accuracy of a robust, text-independent gender identification model for children, the cepstral features are combined with tonal descriptors (pitch and harmonic ratio). The most contributing front-end features are selected by fusion and ablation analysis and passed to a bagged tree ensemble classifier. To manage memory requirements, redundant features are trimmed using principal component analysis (PCA). Hyperparameter optimization is performed with grid search to further increase frame-level accuracy. This study is likely to be a forerunner in the field of children’s speech recognition, and the proposed approach is shown to be a reliable and accurate method of gender identification.
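The fusion and ablation procedure summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: fusion is assumed here to mean frame-wise concatenation of feature groups, ablation is assumed to mean re-scoring with one group left out at a time, and a nearest-centroid classifier stands in for the bagged tree ensemble (PCA and grid search are omitted). The function names and the toy feature values are hypothetical and carry no experimental meaning.

```python
from itertools import chain


def fuse(groups, names):
    """Concatenate the named feature groups frame by frame."""
    columns = [groups[n] for n in names]
    return [list(chain.from_iterable(parts)) for parts in zip(*columns)]


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Stand-in scorer (NOT the paper's bagged trees): nearest class centroid."""
    cents = {
        label: centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in set(train_y)
    }

    def predict(x):
        # Pick the class whose centroid has the smallest squared distance.
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

    hits = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def ablation(train_groups, train_y, test_groups, test_y):
    """Score the full fusion, then re-score with each group left out."""
    names = sorted(train_groups)
    results = {}
    for dropped in [None] + names:
        kept = [n for n in names if n != dropped]
        acc = nearest_centroid_accuracy(
            fuse(train_groups, kept), train_y,
            fuse(test_groups, kept), test_y,
        )
        results["all" if dropped is None else f"without {dropped}"] = acc
    return results


# Toy per-frame features (made-up numbers, purely illustrative).
train_groups = {
    "mfcc": [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]],
    "gfcc": [[0.5], [0.4], [0.5], [0.6]],
}
train_y = ["girl", "girl", "boy", "boy"]
test_groups = {
    "mfcc": [[0.1, 0.1], [0.9, 0.9]],
    "gfcc": [[0.45], [0.55]],
}
test_y = ["girl", "boy"]

results = ablation(train_groups, train_y, test_groups, test_y)
for setting, accuracy in results.items():
    print(f"{setting}: {accuracy:.2f}")
```

Comparing the "all" score with each "without ..." score indicates how much a feature group contributes: a large drop when a group is removed suggests that the group carries discriminative information, while no change suggests it is redundant for the chosen classifier.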

Data Availability Statement

The open access data that support the findings of this study are available from the ZENODO repository, “https://zenodo.org/record/200495#.Yit0zXpBxPZ”. More details about the data are given in Sect. 3.

Acknowledgements

The authors would like to extend their gratitude to VIT-AP University for providing the essential resources necessary to conduct this research at the High-Performance Computing Laboratory.

Funding

This research received no external funding.

Author information

Authors and Affiliations

School of Electronics Engineering, VIT-AP University, Amaravati, Andhra Pradesh, 522237, India
Kodali Radha & Mohan Bansal

Corresponding author

Correspondence to Mohan Bansal.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Radha, K., Bansal, M. Feature Fusion and Ablation Analysis in Gender Identification of Preschool Children from Spontaneous Speech. Circuits Syst Signal Process 42, 6228–6252 (2023). https://doi.org/10.1007/s00034-023-02399-y

Received: 11 March 2022
Revised: 04 May 2023
Accepted: 05 May 2023
Published: 20 May 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s00034-023-02399-y
