Identification of speaker gender by voice characteristics under background of multi-talker noise

© 2024 O. V. Labutina, S. P. Pak, E. A. Ogorodnikova

Pavlov Institute of Physiology, Russian Academy of Sciences, 199034, Makarov emb., 6, St. Petersburg, Russia

Received 05 Feb 2024

Psychophysical methods were used to study how listeners identify a speaker's gender from voice characteristics in the presence of speech-like interference presented through headphones. We used a set of speech signals and multi-talker noise from earlier experiments in a free sound field, that is, a spatial scene (Andreeva et al., 2019). The set included 8 disyllabic words spoken by 4 speakers: 2 male and 2 female voices with average fundamental frequencies of 117, 139, 208 and 234 Hz. The multi-talker noise was obtained by mixing all audio files (8 words × 4 speakers). The signal-to-noise ratio was 1:1, which subjectively corresponded to the maximum noise level in the spatial scene (SNR = –14 dB). Adult subjects aged 17 to 57 years (n = 42) participated in the experiments. Three age subgroups were additionally identified: 18.6±1.5 years (n = 27); 28±4.1 years (n = 7); 46±5.4 years (n = 8). All subjects had normal hearing. The results, compared with the data of the earlier free-field study, confirmed the importance of voice characteristics for the auditory analysis of complex spatial (free sound field) and non-spatial (headphones) scenes, and demonstrated the role of masking and binaural perception mechanisms, in particular the high-frequency mechanism of spatial hearing. A relationship was also found between the perceptual assessment of gender by voice in noise and both the age of the subjects and the gender of the speakers (male/female voice). The results are of practical importance for organizing hearing-speech training, for early detection of impaired noise immunity of speech hearing, and for developing noise-resistant systems for automatic speaker verification and hearing aid technologies.
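As an illustration only, the stimulus construction described in the abstract (mixing 8 words × 4 speakers into one multi-talker noise and presenting it at a 1:1 signal-to-noise ratio) might be sketched as follows. This is not the authors' procedure: the RMS-based level definition, the sampling rate `FS`, the sine-wave stand-ins for recorded words, and all function names are assumptions introduced for the example.

```python
import math
import random

FS = 16000                      # assumed sampling rate, Hz (not stated in the abstract)
F0S = [117, 139, 208, 234]      # average fundamental frequencies from the abstract, Hz
WORDS_PER_SPEAKER = 8


def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def fake_token(f0, phase, n=1600):
    """Hypothetical stand-in for one recorded word: a sine at the speaker's F0."""
    return [math.sin(2 * math.pi * f0 * i / FS + phase) for i in range(n)]


def mix_babble(tokens):
    """Sum equal-length tokens sample by sample to form multi-talker noise."""
    n = len(tokens[0])
    return [sum(tok[i] for tok in tokens) for i in range(n)]


def scale_to_snr(signal, noise, snr_db):
    """Rescale noise so that 20*log10(rms(signal)/rms(noise)) equals snr_db."""
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [gain * v for v in noise]


def measured_snr_db(signal, noise):
    """SNR in dB computed from RMS levels."""
    return 20 * math.log10(rms(signal) / rms(noise))


rng = random.Random(0)
tokens = [
    fake_token(f0, rng.uniform(0, 2 * math.pi))
    for f0 in F0S
    for _ in range(WORDS_PER_SPEAKER)
]
babble = mix_babble(tokens)             # 8 words x 4 speakers = 32 files mixed
target = tokens[0]
noise_1to1 = scale_to_snr(target, babble, 0.0)   # 1:1 ratio corresponds to 0 dB
stimulus = [s + n for s, n in zip(target, noise_1to1)]
```

Under this RMS definition a 1:1 ratio is 0 dB; the –14 dB figure in the abstract refers to the subjectively equivalent noise level in the free-field scene, not to the physical ratio used with headphones.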

Key words: perception, voice, gender feature, imitation of a complex scene, noise, polyphony, spatial acoustic scene

DOI: 10.31857/S0235009224020041  EDN: DDOTRT

Cite: Labutina O. V., Pak S. P., Ogorodnikova E. A. Opredelenie pola diktora po kharakteristikam golosa na fone shuma mnogogolosiya [Identification of speaker gender by voice characteristics under background of multi-talker noise]. Sensornye sistemy [Sensory systems]. 2024. V. 38(2). P. 54–61 (in Russian). doi: 10.31857/S0235009224020041

References:

  • Balyakova A.A., Labutina O.V., Medvedev I.S., Pak S.P., Ogorodnikova Ye.A. Osobennosti raspoznavaniya rechevykh signalov v usloviyakh golosovoy konkurentsii v norme i pri narusheniyakh slukhorechevoy funktsii [Features of speech signal recognition in conditions of vocal competition with normal hearing and with hearing or speech disorders]. Sensornyye sistemy. 2023. V. 37. № 4. P. 342–347. DOI: 10.31857/S0235009223040029.
  • Koroleva I.V. Osnovy audiologii i slukhoprotezirovaniya. [Fundamentals of audiology and hearing aid]. St. Petersburg: KARO, 2022. 448 p. (in Russian).
  • Koroleva I.V., Ogorodnikova E.A., Pak S.P., Levin S.V., Baliakova A.A., Shaporova A.V. Metodicheskiye podkhody k otsenke dinamiki razvitiya protsessov slukhorechevogo vospriyatiya u detey s kokhlearnymi implantami. [Methodological approaches to assessing the dynamics of the development of hearing and speech perception processes in children with cochlear implants] Russian Otorhinolaryngology. 2013. № 3. P. 75–85. (in Russian).
  • Lopotko A.I., Berdnikova I.P., Boboshko M.Yu., Zhuravleva T.A., Zhuravskiy S.G., Kvasova T.V., Lomovatskaya L.G., Mal’tseva N.V., Molchanov A.P., Ryndina A.M., Savenko I.V., Slesarenko N.P., Soldatova G.Sh. Prakticheskoye rukovodstvo po surdologii [A practical guide to audiology]. St. Petersburg: Dialog, 2008. 273 p. (in Russian).
  • Lyashevskaya O.N., Sharov S.A. Chastotnyy slovar’ sovremennogo russkogo yazyka (na materialakh Natsional’nogo korpusa russkogo yazyka) [Frequency dictionary of the modern Russian language (based on materials from the National Corpus of the Russian Language)]. Moscow: Azbukovnik, 2009. 1090 p. (in Russian).
  • Ogorodnikova Ye.A., Labutina O.V., Andreyeva I.G., Gvozdeva A.P., Baulin Yu.A. Faktor prosodiki v vospriyatii kommunikativnoy stseny s prostranstvennym razdeleniyem istochnikov rechi i rechepodobnoy pomekhi [Prosody factor in the perception of a communicative scene with spatial separation of speech sources and speech-like interference]. Tezisy dokladov Mezhdunarodnoy konferentsii “Lingvisticheskiy forum 2020: Yazyk i iskusstvennyy intellekt” / Pod red. A.A. Kibrika, V. Yu. Guseva, D.A. Zalmanova. Moscow: Institut yazykoznaniya RAN, 2020. P. 127–128. (in Russian).
  • Sapogova Ye.Ye. Psikhologiya razvitiya cheloveka [Psychology of human development]. M.: Aspekt press. 2001. 460 p. (in Russian).
  • Khukhlayeva O.V. Psikhologiya razvitiya. Molodost’, zrelost’, starost’ [Developmental psychology. Youth, maturity, old age]. Moscow: Akademiya, 2006. 208 p. (in Russian).
  • Andreeva I.G. Spatial selectivity of hearing in speech recognition in speech-shaped noise environment. Hum. Physiol. 2018. V. 44(2). P. 226–236. https://doi.org/10.1134/S0362119718020020
  • Andreeva I.G., Dymnikowa M., Gvozdeva A.P., Ogorodnikova E.A., Pak S.P. Spatial separation benefit for speech detection in multi-talker babble-noise with different egocentric distances. Acta Acustica united with Acustica. 2019. V. 105. № 3. P. 484–491. https://doi.org/10.3813/AAA.919330
  • Balling L.W., Mølgaard L.L., Townend O., Nielsen J.B.B. The collaboration between hearing aid users and artificial intelligence to optimize sound. Seminars in Hearing. 2021. № 42(3). P. 282–294. https://doi.org/10.1055/s-0041-1735135
  • Bharathi R., Nalina H.D. Survey of Recent Advances in Hearing Aid Technologies and Trends. International Research Journal on Advanced Engineering Hub. 2024. V. 2. I. 2. P. 303–308. https://doi.org/10.47392/IRJAEH.2024.0046
  • Bregman A.S. Auditory scene analysis: the perceptual organization of sound. Cambridge: MIT Press, 1990.
  • Bronkhorst A.W. The cocktail-party problem revisited: Early processing and selection of multi-talker speech. Attention, Perception & Psychophysics. 2015. V. 77(5). P. 1465–1487. https://doi.org/10.3758/s13414-015-0882-9.
  • Cherry E.C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 1953. V. 25. № 5. P. 975.
  • Darwin C.J., Brungart D.S., Simpson B.D. Effects of fundamental frequency and vocal-tract length changes on attention to one or two simultaneous talkers. J. Acoust. Soc. Am. 2003. V. 114. P. 2913–2922.
  • Davis A., McMahon C.M., Pichora-Fuller K.M., Russ S., Lin F., Olusanya B.O., Chadha S., Tremblay K.L. Aging and Hearing Health: The Life-course Approach. Gerontologist. 2016. № 56 (Suppl 2). P. 256–267. https://doi.org/10.1093/geront/gnw033
  • Fostick L., Ben-Artzi E., Babkoff H. Aging and speech perception: beyond hearing threshold and cognitive ability. J. Basic Clin. Physiol. Pharmacol. 2013. № 24(3). P. 175–183. https://doi.org/10.1515/jbcpp-2013-0048
  • Gutschalk A., Dykstra A.R. Functional imaging of auditory scene analysis. Hear. Res. 2014. V. 307. P. 98.
  • Lesica N.A., Mehta N., Manjaly J.G., Deng L., Wilson B.S., Zeng F.-G. Harnessing the power of artificial intelligence to transform hearing healthcare and research. Nat. Mach. Intell. 2021. № 3. P. 840–849. https://doi.org/10.1038/s42256-021-00394-z
  • Moore B.C.J. An Introduction to the Psychology of Hearing. Leiden: Brill, 2012. 442 p.
  • Musiek F.E., Chermak G.D. Handbook of central auditory processing disorder. San Diego: Plural Publishing, 2014. V. 1. Auditory neuroscience and diagnosis. 768 p.
  • Pernet C.R., Belin P. The Role of Pitch and Timbre in Voice Gender Categorization. Front. Psychol. 2012. Sec. Perception Science. V. 3. https://doi.org/10.3389/fpsyg.2012.00023
  • Popper A.N., Fay R.R. (Eds). Perspectives on auditory research. Springer handbook of auditory research. 2014. 680 p.
  • Shamma S.A., Elhilali M., Micheyl C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 2011. V. 34. P. 114.
  • Smirnova V.A., Labutina O.V., Gvozdeva A.P. Chapter 9: Speech detection in spatially distributed speech-like noise. In: Neural Networks and Neurotechnologies (eds: Yu. Shelepin, E. Ogorodnikova, N. Solovyev, E. Yakimova). St. Petersburg, VVM, 2019. P. 52–60.
  • Weston P., Hunter M.D., Sokhi D.S., Wilkinson I. Discrimination of voice gender in the human auditory cortex. NeuroImage. 2014. V. 105. P. 208–214. https://doi.org/10.1016/j.neuroimage.2014.10.056