
Feature Fusion and Ablation Analysis in Gender Identification of Preschool Children from Spontaneous Speech

Published in Circuits, Systems, and Signal Processing

Abstract

Children below six years of age are considered preliterate and use speech as one of their primary forms of communication. Fundamental frequency, or pitch, is a characteristic commonly used to classify gender, but young children of both genders have fairly similar pitch, varying from 215 to 390 Hz, because of their immature vocal tracts. Most gender identification studies have relied on pitch and mel frequency cepstral coefficients (MFCC) because of their ability to capture the salient characteristics of the speech signal. However, pitch and MFCC perform poorly on noisy speech signals and consequently fail to detect gender characteristics accurately. To address this limitation, the proposed work investigates a novel fusion and ablation experimentation of MFCC and gammatone frequency cepstral coefficients (GFCC). To enhance the accuracy of a robust text-independent gender identification model for children, the cepstral features are combined with tonal descriptors (pitch and harmonic ratio). The most contributing front-end features are selected through fusion and ablation analysis and passed to a bagged tree classifier ensemble. To manage memory requirements, redundant features are trimmed using principal component analysis (PCA). Hyperparameter optimization is performed with a grid search to further increase frame-level accuracy. This study is expected to be a forerunner in the field of children's speech recognition and offers a reliable and accurate method of gender identification.
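
The pipeline summarized above (cepstral features fused with pitch at the frame level, trimmed by PCA, classified by a bagged tree ensemble, and tuned by grid search) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: it uses librosa for MFCC and pitch extraction and scikit-learn (version 1.2 or later) for PCA, bagging, and grid search; GFCC and the harmonic ratio are omitted because they require a gammatone filterbank that librosa does not provide; and the pitch bounds, frame alignment, and parameter grid are illustrative choices only.

import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

def frame_features(path, sr=16000, n_mfcc=13):
    """Frame-level MFCC fused with pitch for one recording -> (frames, n_mfcc + 1)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape (n_mfcc, T)
    # Pitch bounds bracket the 215-390 Hz range cited for preschool voices.
    f0 = librosa.yin(y, fmin=150, fmax=450, sr=sr)             # shape (T',)
    t = min(mfcc.shape[1], f0.shape[0])                        # align frame counts
    return np.vstack([mfcc[:, :t], f0[None, :t]]).T

# X: frame features stacked over all recordings; y: per-frame gender labels (0/1).
pipe = Pipeline([
    ("pca", PCA(n_components=0.95)),                           # keep 95% of the variance
    ("bag", BaggingClassifier(estimator=DecisionTreeClassifier())),
])
param_grid = {                                                 # illustrative grid only
    "bag__n_estimators": [10, 30, 50],
    "bag__estimator__max_depth": [None, 10, 20],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
# search.fit(X, y); print(search.best_params_, search.best_score_)

In practice, frame_features would be computed for every recording in the corpus, the resulting matrices stacked into X with matching per-frame labels y, and frame-level accuracy reported from the cross-validated grid search.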


Data Availability Statement

The open-access data that support the findings of this study are available from the ZENODO repository at https://zenodo.org/record/200495#.Yit0zXpBxPZ. More details about the data are given in Sect. 3.


Acknowledgements

The authors would like to extend their gratitude to VIT-AP University for providing the essential resources necessary to conduct this research at the High-Performance Computing Laboratory.

Funding

This research received no external funding.

Author information


Corresponding author

Correspondence to Mohan Bansal.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Radha, K., Bansal, M. Feature Fusion and Ablation Analysis in Gender Identification of Preschool Children from Spontaneous Speech. Circuits Syst Signal Process 42, 6228–6252 (2023). https://doi.org/10.1007/s00034-023-02399-y


