Abstract
Children below 6 years of age are considered preliterate and use speech as one of their primary forms of communication. Fundamental frequency, or pitch, is a characteristic commonly used to classify gender, but young children of both genders have reasonably similar pitch, ranging from 215 to 390 Hz, because their vocal tracts are immature. Most gender identification studies have used pitch and mel frequency cepstral coefficients (MFCC) because they capture the characteristics of speech signals well. However, pitch and MFCC perform poorly on noisy speech signals and, as a result, fail to accurately detect gender characteristics. To address this limitation, the proposed work investigates a novel fusion and ablation study of MFCC and gammatone frequency cepstral coefficients (GFCC). To improve the accuracy of a robust, text-independent gender identification model for children, the cepstral features are combined with tonal descriptors (pitch and harmonic ratio). The most contributing front-end features are selected by fusion and ablation analysis and passed to a bagged tree classifier ensemble. To manage memory requirements, redundant features are trimmed using principal component analysis (PCA). Hyper-parameter optimization is performed with grid search to further increase frame-level accuracy. This study is likely to be a forerunner in the field of children’s speech recognition, and the proposed approach is shown to be a reliable and accurate method of gender identification.
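The fusion and ablation analysis described above can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the helper names (`fuse`, `ablation`, `nearest_centroid_accuracy`), the toy frame values, and the nearest-centroid scorer, which stands in for the bagged tree ensemble only so that the example runs with the standard library alone. Fusion is taken to mean frame-level concatenation of feature groups, and ablation to mean removing one group at a time and recording the change in score.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]
Groups = Dict[str, List[Frame]]


def fuse(groups: Groups, names: Sequence[str]) -> List[Frame]:
    """Concatenate the named feature groups frame by frame (early fusion)."""
    n_frames = len(next(iter(groups.values())))
    return [
        [value for name in names for value in groups[name][i]]
        for i in range(n_frames)
    ]


def nearest_centroid_accuracy(X: List[Frame], y: List[int]) -> float:
    """Toy scorer (an assumption, not the paper's classifier):
    training-set accuracy of a nearest-centroid rule."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a: Frame, b: Frame) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    correct = 0
    for x, label in zip(X, y):
        # min() keeps the first class on ties, so ties go to the lowest label.
        predicted = min(classes, key=lambda c: sq_dist(x, centroids[c]))
        correct += predicted == label
    return correct / len(y)


def ablation(
    groups: Groups,
    y: List[int],
    score: Callable[[List[Frame], List[int]], float],
) -> Dict[str, float]:
    """Score drop when each feature group is left out of the full fusion."""
    names = sorted(groups)
    full = score(fuse(groups, names), y)
    drops = {}
    for left_out in names:
        kept = [n for n in names if n != left_out]
        drops[left_out] = full - score(fuse(groups, kept), y)
    return drops


# Hypothetical frame-level features for four frames (not data from the study).
toy_groups: Groups = {
    "mfcc": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    "gfcc": [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]],
    "pitch": [[250.0], [250.0], [250.0], [250.0]],
    "harmonic_ratio": [[0.5], [0.5], [0.5], [0.5]],
}
toy_labels = [0, 0, 1, 1]

if __name__ == "__main__":
    for name, drop in ablation(toy_groups, toy_labels, nearest_centroid_accuracy).items():
        print(f"without {name}: score drop = {drop:.2f}")
```

In this toy setup only the group whose values differ between the two classes produces a score drop when removed. In practice the scorer would be replaced by cross-validated accuracy of the actual classifier, with PCA and grid search applied inside each evaluation.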
Data Availability Statement
The open-access data that support the findings of this study are available from the Zenodo repository at https://zenodo.org/record/200495#.Yit0zXpBxPZ. More details about the data are given in Sect. 3.
Acknowledgements
The authors would like to extend their gratitude to VIT-AP University for providing the essential resources necessary to conduct this research at the High-Performance Computing Laboratory.
Funding
This research received no external funding.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Radha, K., Bansal, M. Feature Fusion and Ablation Analysis in Gender Identification of Preschool Children from Spontaneous Speech. Circuits Syst Signal Process 42, 6228–6252 (2023). https://doi.org/10.1007/s00034-023-02399-y
DOI: https://doi.org/10.1007/s00034-023-02399-y