Perception of Femininity and Masculinity in Voices as Rated by Transgender and Gender Diverse People, Professional Speech and Language Pathologists, and Cisgender Naive Listeners

Summary: Objective. To explore whether cisgender naive listeners, transgender and gender diverse (TGD) listeners, and speech-language pathologists (SLPs) experienced in providing gender-affirming voice training differ in their perception of femininity and masculinity in voices. Methods. Samples of spontaneous speech were collected from 95 cisgender and 37 TGD speakers. Three listener groups of cisgender naive (N = 77), TGD (N = 30), and SLP (N = 14) listeners rated the voices on visual analog scales in two randomly ordered blocks, in which the perceived degree of femininity was rated separately from the perceived degree of masculinity. Results. The three listener groups showed similar patterns in their distribution of ratings on the femininity and masculinity scales. The TGD listeners’ mean ratings did not differ from the cisgender naive listeners’, whereas SLPs showed a small, but significant, difference in their ratings compared with both TGD and cisgender naive listeners and rated the voices lower on both the femininity and masculinity scales. Conclusion. The results differ from previous studies in that TGD and cisgender naive listeners rated the voices very similarly. The lower ratings of femininity and masculinity by the SLPs were likely influenced by their awareness of the complexity in the perception of voices. Therefore, SLPs providing gender-affirming voice training should be attentive to how their professional training may influence their perception of femininity and masculinity in voices and encourage discussions and explorations of the TGD voice client’s perceptions of voices.


INTRODUCTION
For transgender and gender diverse (TGD) individuals, their birth-assigned sex is not congruent with their gender identity. To reach a better alignment with their gender identity, some TGD individuals wish to modify their gender expression, including their expression in voice and speech. Gender-affirming voice training provided by speech-language pathologists (SLPs) aims to assist TGD individuals in reaching an increased level of satisfaction with how their voice is perceived. 13][4] There is therefore a clear link between the clients' gender identity and the process of setting goals for gender-affirming voice training. For TGD individuals identifying as men or women, the goal for voice training can be to reach a voice that is consistently perceived as feminine 5,6 or masculine by listeners. 2 Others aim for a voice that they can adjust according to their preferred expression in a specific social context. 2,4 TGD individuals identifying outside the binary categories man and woman may wish to modify their voice to various extents to match their preferred self-presentation within a nonbinary or fluid gender identity. 
7][9][10] In research on voice aspects that influence the perception of voice, a predominantly binary view on gender has been presented, as listeners have been asked to rate voices according to their perceived agreement with the gender categories man or woman, [11][12][13] with the consequence that nonbinary people who identify outside of these categories may be disregarded. Replacing a binary gender identification task with continuous ratings of the perceived degree of femininity and masculinity in voices has the potential to incorporate a broader and more inclusive view on gender and gender expression in voice. In fact, in a study by Houle et al, 14 the choice of terminology was seen to nuance listeners' ratings related to gender expression, since the anchor terms female/male tended to elicit more binary ratings toward the scale endpoints compared with more centralized ratings of femininity/masculinity. The authors argued that the use of femininity/masculinity as anchor terms did not elicit the same binary stereotypes as the anchor terms female/male. 146][17][18] Assuming a dichotomous relationship between femininity and masculinity may be too much of a simplification, and the perception of femininity and masculinity in voices is likely more thoroughly evaluated if listeners are able to evaluate the two dimensions independently. 14,19 That is why, in our study, we used separate scales for the perceived level of femininity and masculinity, respectively, to better represent listeners' presumably multidimensional conception of vocal gender 20 and afford a more nuanced analysis of femininity and masculinity in voice.
A related aspect that we think warrants additional attention is the influence of listener characteristics, such as their own gender identity or formal training related to perception of voices. 6][17] A few of these studies have included SLPs to represent experienced listeners [22][23][24] but most studies with cisgender participants have included naive listeners without formal training or extensive experience of performing voice assessments. 13,17,25 Hence, gender-affirming voice training offered to TGD individuals has only to a limited extent been supported by research based on the perceptions of TGD individuals or SLPs with professional training and experience of working with TGD voice clients.
A predominantly cisgender naive basis for gender-affirming voice training may well be appropriate for TGD speakers who aim for a voice that others perceive to be in accordance with the speakers' gender identity. Cisgender listeners' perceptions give an idea of how the speakers' vocal expression will be perceived in their daily life. However, how their voices are perceived by others may not always agree with the speakers' own perceptions of their voices. 6,26 In fact, Hope and Lilley 20 concluded, based on their perceptual experiment with synthetic voices, that TGD listeners exhibited a different rating pattern than cisgender listeners. While cisgender listeners more frequently rated voices in a binary manner, with ratings clustered near the endpoints of the scales, TGD listeners tended to use the full range of the femininity and masculinity scales. The TGD listeners' perception of the gender-ambiguous synthetic voices was suggested by the authors to reflect the listeners' conscious processing and challenging of established gender categories. 208][29] However, research on potential differences between TGD and cisgender naive listeners is scarce and has shown inconclusive results 19,21,22 and, hence, needs to be further explored.
The sociocultural background and trans experiences in terms of gender nonconformity can be assumed to affect TGD clients' goals for voice training, 2 as well as satisfaction with training results. Identifying possible differences between TGD individuals' and SLPs' perceptions of voices can thereby inform the SLP in planning voice training that is affirmative of the client's views on voice. While previous research has shown that SLPs with voice expertise do not differ from naive listeners when rating femininity and masculinity in voices, 6,22,24,30,31 potential differences between SLPs and TGD listeners when rating a variety of voices have, to our knowledge, not yet been studied.
The present study aimed to explore how cisgender naive listeners, TGD listeners, and SLPs who regularly provide gender-affirming voice training to transgender clients rate masculinity and femininity. A large set of voices representing cisgender and TGD speakers with a variety of self-reported gender identities was collected and rated perceptually by the three listener groups. Two questions were of particular interest:
− Do cisgender naive and TGD listeners rate femininity and masculinity in the voices differently overall, or use the rating scales differently?
− Do TGD listeners and SLPs experienced in providing gender-affirming voice training rate femininity and masculinity differently, or use the rating scales differently, when rating the same sample of voices?
The study was designed to support SLPs and TGD voice clients to reach a common ground in setting goals for gender-affirming voice training that brings the clients forward toward their preferred gender expression in voice.

METHOD
Ethical approval for the study was obtained from the Swedish Ethical Review Authority (Case number 2019-05374). All speakers gave written informed consent for their speech samples to be used in listener ratings.

Collection of recordings for listener ratings

Speaker characteristics
Speech samples from 132 speakers were used for the listener rating task. Efforts were made to recruit adult speakers of working age (18-65 years) and with different gender identities. Fifty-four speakers were cisgender women, 41 were cisgender men, and 37 were TGD speakers who self-identified with a gender other than their birth-assigned sex. Among the TGD speakers, 17 identified as women, of whom three were retransitioning women who had been assigned female at birth but had received testosterone treatment due to their previous male gender identity. In addition, 12 identified as men and seven as nonbinary. One speaker described their gender identity as unsure and was included in the nonbinary group in the analysis, because a TGD gender identity other than man or woman was indicated (Table 1). Previous or ongoing gender-affirming voice training led by an SLP was reported by 14 TGD speakers (nine identifying as women, four as men, and one as having a nonbinary gender identity). Among the TGD men, 10 had been on hormonal treatment with testosterone for > 6 months, and one participant for > 3 months. All three retransitioning women had experienced virilization of the voice due to previous testosterone treatment for > 1 year. The mean age for all speakers was 33.8 (19-62) years.
All participants were native speakers of Swedish. Speaker demographics that can potentially affect the voice were collected, such as smoking habits and medical treatment with corticosteroids. A diagnosis of voice disorder or impaired voice quality was not seen as a reason for exclusion, contrary to previous studies. 32,33 Instead, a variety of voices was assumed to be representative of speakers heard by listeners in their daily lives. Six cisgender speakers reported a voice disorder, of whom four had been diagnosed with phonasthenia, one with vocal fold polyp, and one with chronic laryngitis. None of the TGD speakers had a voice disorder diagnosis. Speakers were recruited from the TGD client load at the Speech and Language Pathology clinics at Umeå University Hospital, Umeå, and at Karolinska University Hospital, Stockholm, and by convenience sampling.

Procedure
To facilitate the recruitment of speakers, recordings were made at a location convenient for the speaker, most often at a hospital or in the participant's home. In preparation for the study, the recording equipment was calibrated to a reference tone, using a Cirrus Research sound pressure meter (CR831A) held at a 5-cm distance from the mouth, and at an approximately 45-degree angle. The calibration procedure was assisted by the Sopran software program (http://www.tolvan.com). When possible, a separate room was used to reduce ambient noise. Before the recording started, the mobile application Noise 34 was used to ensure that the surrounding sound level was less than 38 dB(A). 35 An omnidirectional RØDE SmartLav+ microphone with a frequency range of 20 Hz-20 kHz and a signal-to-noise ratio of 67 dB was head-mounted at a distance of 5 cm from the angle of the speaker's mouth. 36 The speech recordings were collected via the mobile application AVR X 37 at a sampling rate of 44.1 kHz and recording rate of 128 kbps. Android mobile phones were used as they allowed for automatic gain control to be disabled. 38 Recordings were subsequently transferred via a portable memory stick to a computer and then erased from the mobile device.

Speech material
The spontaneous speech of the 132 speakers was elicited by asking them to talk about something they like to do (eg, to describe a hobby). If speakers expressed that the task was too demanding, they were asked to talk about something they had seen earlier that day (eg, on TV, in a magazine, or on the way to the recording session). The participants were given follow-up questions by the test leader to support monologs of approximately 1 minute. No specific instructions were given to the participants about how to use their voices during the recording. However, a few TGD participants asked about this, and were then encouraged to use a voice that they felt was most representative of themselves.
From the collected recordings of spontaneous speech, shorter sections were extracted for the listening task to reduce the overall listening time for the listeners rating the degree of femininity and masculinity, respectively. Statements that could be expected to contain direct or indirect gendered information about the speaker (for example, "my sons' father" or "me and the other tenors") and person-identifiable portions were excluded to limit the influence of speech content on listeners' perceptions of the speaker. Nonspeech sounds were also edited out to prevent, for example, low-frequency coughs from affecting the perception of a voice. The spontaneous speech samples used in the listening experiment included a minimum of three phrases. 39 In cases of a short total duration, one to two phrases were added to the speech sample to approximate a total duration of 10 seconds. The speech samples were on average 11.4 seconds long (SD = 3.8). The software program Praat 40 was used to edit the speech material.

Rating procedure
The listener rating task consisted of two separate parts, in which listeners were asked to rate the perceived degree of femininity and masculinity in the samples, respectively. Hence, the listeners listened to the 132 speech samples twice, and were randomized to start either with the femininity or the masculinity ratings. Written information about the rating procedure was presented on a computer screen at the start of the listening task, encouraging listeners to perform the ratings in a silent room, and to wear headphones. 6,33 Before the ratings started, the listeners were asked to complete two test rounds in which sound files were played, allowing listeners to adjust the sound playback to a comfortable level and familiarize themselves with the rating procedure. 33 The listeners were also informed that closing the program before the rating task had been completed would cause the data to be lost. The listeners were instead encouraged to take a short break between the two parts of the rating task. Ratings were made on visual analog scales by moving a cursor to match the listener's perception of femininity and masculinity, with "not feminine at all" and "not masculine at all," and "very feminine" and "very masculine" as low and high endpoints, respectively. No information about speakers' gender identity was presented to the listeners. Listeners could not proceed to the next voice sample until the entire previous sound file had been played and rated. In each listener rating task (rating of femininity and of masculinity, respectively), each sample was played only once, and the listener could not go back and listen to previously rated speech samples. The Psychopy software package 41 was used to produce a per-participant randomization of speech sample presentation order and to present the stimuli to listeners. The stimulus presentation was transferred to the Pavlovia experiment online presentation platform 42 due to the onset of the COVID-19 pandemic, as well as the demand to recruit from a wider geographical area to support participation of more TGD individuals and SLPs providing gender-affirming voice training. The perceptual ratings could not be traced back to a specific participant, regardless of the experiment platform used.
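The two levels of randomization described above (which scale is rated first, and the per-listener order of the 132 samples) can be illustrated with a minimal Python sketch. It is not the Psychopy or Pavlovia implementation used in the study. The function name `make_session`, the equal-probability block order, and the independent shuffle within each block are assumptions made for illustration only.

```python
import random

BLOCKS = ("femininity", "masculinity")  # the two rating scales named in the text


def make_session(n_samples=132, seed=None):
    """Return a list of (block_name, sample_order) pairs for one listener.

    Assumptions (not stated in the text): the starting block is drawn with
    equal probability, and each block gets its own independent shuffle of
    the sample indices.
    """
    rng = random.Random(seed)
    block_order = list(BLOCKS)
    rng.shuffle(block_order)  # randomize which scale is rated first
    session = []
    for block in block_order:
        order = list(range(n_samples))
        rng.shuffle(order)  # per-listener presentation order within the block
        session.append((block, order))
    return session


# Example: one listener's session, showing the first five sample indices per block
for block, order in make_session(seed=1):
    print(block, order[:5])
```

Passing a seed makes a session reproducible, which can be convenient when checking a presentation script. An unseeded call gives a fresh order for each listener.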

Listener participants
Three listener groups were recruited to perform the listener ratings: TGD individuals, cisgender naive listeners, and SLPs with experience in providing gender-affirming voice training to TGD clients. The listeners were required to be between 18 and 65 years of age, to be native speakers of Swedish, and to have no hearing deficit of a severity that would interfere with everyday verbal communication. The participants were recruited via convenience and snowball sampling (TGD and cisgender naive listeners), through direct contact with national and regional trans and LGBTQI+ organizations (TGD listeners), or via a national professional network of SLPs specialized in gender-affirming voice training (SLP listeners). A total of 121 participants (77 cisgender naive, 30 TGD, and 14 SLP listeners) completed the two listener rating tasks (separate ratings of femininity and masculinity), and provided personal demographic information about their age, birth-assigned gender, gender identity, and experience of receiving or providing voice training. The listener groups are described in Table 2. The first 60 listeners performed the ratings on a computer provided by a research assistant at a location convenient for the participant, and the remaining 61 participants performed the ratings within the online Pavlovia platform. The SLPs reported a mean of 12.7 (1-20) years of experience in providing gender-affirming voice training.

Data analysis
In the first step of the analysis, differences in how listener groups rated the 132 voices in terms of femininity and masculinity were investigated using the Kruskal-Wallis H test, with subsequent post hoc pairwise comparisons. The Kruskal-Wallis H test was used because the ratings were observed to be non-normally distributed. In the second step of the analysis, speaker gender identity was taken into account when the ratings of the three listener groups were compared. This allowed for a more detailed description of the listener ratings that would also be of clinical relevance to TGD speakers and SLPs within the context of gender-affirming voice assessment and training.
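As an illustration of the first analysis step, the following Python sketch computes the tie-corrected Kruskal-Wallis H statistic using only the standard library. It is a sketch of the general technique, not the authors' analysis script, and the function names and example ratings are hypothetical. For three groups, H is compared with a chi-square distribution with 2 degrees of freedom, whose upper tail probability has the closed form exp(-H/2).

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for k groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    counts = Counter(pooled)
    # Assign each distinct value the mean of the rank positions it occupies.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = position + (t + 1) / 2  # mean of ranks position+1 .. position+t
        position += t
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction; it equals 1 when all values are distinct.
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Upper tail of the chi-square distribution with 2 df (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical ratings on a 0-1000 scale for three listener groups
cis_naive = [120, 480, 910, 655, 300]
tgd = [140, 510, 880, 700, 260]
slp = [60, 400, 790, 520, 180]
h = kruskal_wallis_h(cis_naive, tgd, slp)
print(round(h, 3), round(p_value_three_groups(h), 3))
```

Applied to a statistic of H = 3.509, the closed form gives P = 0.173, and for H = 30.226 it gives a value well below 0.001.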
In the final step of the analysis, the listener groups' rating patterns were further explored, based on their distribution on the visual analog scales. This allowed for a listener group comparison of their tendency to place ratings toward the endpoints of the scales, indicating a more binary perception of femininity and masculinity, or toward the midrange of the scale, indicating a perception of a neither low nor high degree of femininity and masculinity, respectively. Ratings of femininity and masculinity were categorized as low when below 250, mid when in the 251-750 range, and high when above 750. The 250 and 750 points were selected to afford separate investigation of clearly high and low femininity or masculinity ratings; speakers rated in the midrange on average were also better represented by using these boundaries compared with a split of the range into three portions of equal size (ie, 0-333, 334-666, and 667-1000). A chi-square test was applied to test whether any listener group rated a particular speaker group into one of the rating regions (low, mid, and high) more often than expected. The level for statistical significance was set at P < 0.05.
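The categorization and chi-square step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function names and the example ratings are hypothetical, and a rating of exactly 250, which the stated boundaries leave unassigned, is treated here as low. With three listener groups and three rating regions, the table has (3 - 1) × (3 - 1) = 4 degrees of freedom.

```python
import math
from collections import Counter

REGIONS = ("low", "mid", "high")


def rating_region(rating):
    """Map a 0-1000 rating to low / mid / high (assumption: exactly 250 counts as low)."""
    if rating <= 250:
        return "low"
    if rating <= 750:
        return "mid"
    return "high"


def region_counts(ratings):
    """Count how many ratings fall in each region, in the order low, mid, high."""
    counts = Counter(rating_region(r) for r in ratings)
    return [counts[region] for region in REGIONS]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper tail probability of the chi-square distribution, valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


# Hypothetical ratings per listener group (rows: cisgender naive, TGD, SLP)
table = [
    region_counts([100, 300, 520, 640, 800, 900]),
    region_counts([90, 260, 500, 700, 760, 950]),
    region_counts([50, 120, 200, 480, 820, 990]),
]
stat, df = chi_square(table)
print(table, round(stat, 2), df, round(chi2_sf_even_df(stat, df), 3))
```

The closed-form tail probability avoids a dependency on a statistics library, but it holds only for an even number of degrees of freedom, as is the case for a 3 × 3 table.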

RESULTS
Listener group comparisons of femininity and masculinity ratings
In the first step of the analysis, the three listener groups (cisgender naive, TGD, and SLP listeners) were compared regarding their femininity and masculinity ratings of all 132 voices. The Kruskal-Wallis H test indicated no differences in median femininity ratings among the listener groups (H (2) = 3.509, P = 0.173). For the median masculinity ratings, a significant difference among listener groups was found (H (2) = 30.226, P < 0.001), for which post hoc pairwise comparisons showed that the SLP group significantly differed from both cisgender naive and TGD listeners. However, the actual differences among listener groups were small, and the standard deviations were large in all three groups. The mean (with standard deviation) masculinity rating for the SLP listeners was 404 ± 372 on a 0-1000 rating scale, while the corresponding ratings given by cisgender naive and TGD listeners were 427 ± 352 and 438 ± 353, respectively. All listener groups displayed ratings that ranged the full length of the two scales. Extreme values of "0" and "1000" were, however, seldom used in any of the listener groups (2.1%, 1.2%, and 0.7% in the TGD, cisgender naive, and SLP listener groups, respectively) (see Figure 1).

Listener group comparisons related to speaker gender identity
As a next step in the analysis, the three listener groups were compared by analyzing their femininity and masculinity ratings in relation to speakers' self-reported gender identity. The analysis showed that cisgender naive, TGD, and SLP listeners applied predominantly similar rating patterns for each of the different speaker subgroups. All three listener groups applied femininity and masculinity ratings toward the scale endpoints for voices belonging to cisgender men and women, and to TGD nonbinary speakers assigned female at birth. Voices belonging to TGD speakers identifying as men, women, and nonbinary persons assigned male at birth, on the other hand, displayed a high variability in both femininity and masculinity ratings in all three listener groups. A slightly higher variability in terms of a wider interquartile range (representing the mid 50% of the ratings) was seen in the SLP group compared with the other two groups. TGD listeners showed a tendency toward a higher variability in ratings of TGD speakers, but a lower variability in ratings of cisgender speakers, compared with cisgender naive listeners. Femininity and masculinity mean values did not differ between TGD and cisgender naive listeners (see Figure 2).

Listener group rating patterns
In the final step of the analysis, the rating patterns among the listener groups were analyzed based on their distribution of low, mid, or high ratings of masculinity and femininity, respectively. Listener group comparisons indicated that SLPs more often applied low ratings, and more seldom applied ratings in the midrange of the scales (ratings corresponding to 251-750 on a visual analog scale ranging from 0 to 1000), compared with the ratings provided by TGD and cisgender naive listeners. Cisgender naive listeners showed the largest proportion of ratings in the midrange. The differences among listener groups were more evident in the femininity ratings than in the masculinity ratings. However, the applied chi-square test indicated significant listener group differences for both femininity ratings (χ² (4) ≥ 44.83, P < 0.001) and masculinity ratings (χ² (4) ≥ 21.11, P < 0.001). The proportion of low, mid, and high ratings applied by the three listener groups is shown in Figure 3, which also shows the differing rating patterns applied to cisgender and TGD speakers, as was previously mentioned. While cisgender women's voices were predominantly rated with a high degree of femininity and a low degree of masculinity, and vice versa for cisgender men, TGD speakers identifying as men, women, and nonbinary (assigned male at birth) predominantly received ratings in the midrange of the femininity and masculinity scales by all listener groups. A few exceptions were seen in the SLP group as TGD men and nonbinary (assigned male at birth) were more often perceived with a low degree of femininity, and retransitioning women were more often perceived with a low degree of

DISCUSSION
This study explored whether listeners' trans experience, and their professional training and experience in providing gender-affirming voice training, influence their perception of femininity and masculinity in voices. Three listener groups were compared: cisgender naive listeners, TGD listeners, and SLPs providing voice training to TGD clients. The results showed predominantly similar patterns in how the three listener groups rated the perceived degree of femininity and masculinity in voices of both cisgender and TGD speakers. The results indicated, however, that SLPs more often applied lower ratings of both femininity and masculinity, compared with cisgender naive and TGD listeners. No differences in how cisgender and TGD listeners rated voices were observed, regardless of the group (cisgender or TGD) the speaker belonged to. All listener groups rated cisgender speakers' voices in a more binary manner compared with those of TGD speakers, whose voices more frequently received ratings in the midrange of both the femininity and masculinity scales.
The results from this listening study of perceived femininity and masculinity in voices comprise a large data set regarding both the number of speakers and the number of listeners. Both the speakers and listeners included predominantly cisgender people, but also people with a variety of gender expressions or identities. To our knowledge, there is only one recently published study in which TGD listeners have rated a variety of voices, including both TGD and cisgender speakers, 23 but the ratings of the different listener groups were not compared in that study. In all, previous research that has included TGD people as listeners is scarce and has shown inconsistent results. TGD and cisgender naive listeners have been indicated to either agree 21 or differ 20,22 in their ratings of femininity and masculinity in TGD and gender-ambiguous voices. The fact that our study did not show any clear differences between cisgender naive and TGD listeners' ratings goes against previous studies which have suggested that TGD listeners may be more attentive to variability among voices 22 and rate ambiguous voices in a less binary way, and instead more often use the midrange of femininity and masculinity scales, 20 compared with cisgender listeners. In our study, however, cisgender naive listeners were seen to place ratings in the midrange of the femininity and masculinity scales as frequently as TGD listeners. In addition, ratings indicating a binary perception of a specific voice (near the "0" and "1000" endpoints) were observed infrequently for all listener groups. We therefore conclude that our listeners, regardless of whether they reported cisgender or TGD identities, rated femininity and masculinity in voice similarly, and as continuous rather than dichotomous aspects, contrary to the listeners in some previous studies. 20 Differences between our study and the study of Hope and Lilley 20 that may have led to a differing result and interpretation can be found in the nature of the speech stimuli used. In our study, spontaneous speech of human speakers was used. It is possible that the speech samples therefore conveyed spectral and linguistic information omitted in synthesized speech 43 and in simple sentences 20,22,23 that aligned the perceptions of the listener groups. While there are advantages of keeping the linguistic content constant, the use of spontaneous speech for listener assessments was considered preferable for our study as it provides the most ecologically valid information about how speakers will be perceived by listeners in daily conversations.

FIGURE 3. Proportion of low, mid, and high ratings of femininity (top "(F)") and masculinity (bottom "(M)") in different speaker categories, given by cisgender naive, TGD, and SLP listeners. Speaker categories are based on speakers' gender identity. AFAB, assigned female at birth; AMAB, assigned male at birth.

Jenny Holmberg, et al

A further difference between our study and previous work is the differing societal contexts in which listeners were situated. The frequently upheld view of Sweden as a country of gender equality and upending gender stereotypes 44 may have contributed to listeners' reluctance to use stereotypical labels and, hence, to their being less inclined to use binary ratings toward the scale endpoints. In fact, a recently published study showed that the binary extremes of the rating scale were used much less often by Swedish listeners (68%) than by Australian listeners (92%) when rating the degree of femininity in TGD women's voices. 5 A better understanding of the views and societal norms upheld by listeners would likely provide a better understanding of perceptual rating behaviors and should be considered in future research.
While TGD listeners' ratings did not differ from those of cisgender naive listeners, we acknowledge that TGD people's unique experiences may lead to a more reflective listening 20 and exposure to a variety of voices 23 that may influence the individual listener's perception of voices, including vocal aspects that are not captured in this study. A limitation of our study is that we did not control for individual listeners' familiarity with TGD voices. Although none of the cisgender naive listeners in our study reported personal trans experiences, there may also have been participants in the cisgender group who were well-acquainted with voice-related aspects of gender, or in other ways familiar with gender diverse identities and expressions, which could have increased their alignment with TGD listeners' ratings. However, considering the large number of cisgender participants, a few participants experienced with TGD speakers are unlikely to have significantly influenced the ratings of the cisgender listener group. Hence, the ratings of the cisgender listeners are considered representative of people that TGD speakers are likely to encounter in their daily conversations, at least in Sweden.
Although our study did not confirm that TGD listeners' ratings differ from those of cisgender naive listeners, the results did show slight differences between SLPs and the other two listener groups. Previous research has concluded that SLPs with voice expertise do not differ from naive listeners when rating the perceived degree of femininity/masculinity in voices. 6,22,24,30,31 The differing results might be due to the varying compositions of the SLP listener group, including ratings by SLP students, 6,31 and ratings performed through a consensus listening procedure by only two SLPs. 24 In only one previous study, the participating SLPs were included based on their experience in providing gender-affirming training to TGD clients, 22 which was the inclusion criterion used in the present study. Experience of working toward clients' varying goals for their vocal gender expression can be assumed to increase sensitivity to multiple aspects in voice and speech that can influence the perceived degree of femininity and masculinity in voice. Further, clinical training in audio-perceptual assessments could be expected to increase the ability to simultaneously identify several distinct aspects in voice. In fact, listener group comparisons performed outside of research on the perception of femininity and masculinity have shown that expert listeners attend to other, and a larger set of, voice aspects when rating voices, compared with naive listeners who mainly attend to fundamental frequency. 45 Hence, an attentiveness to a wider and potentially more nuanced set of voice aspects might explain SLPs' tendency to be more restrictive in their ratings of femininity and masculinity, compared with naive listeners, who presumably have a coarser attention to acoustic aspects of the voice. Our results highlight the need for SLPs to be aware that their professional experience or training may cause small divergences between how they perceive voices and how surrounding naive listeners perceive them. In addition, we argue that SLPs should reflect on how their own understanding of voice is incorporated in the assessment of femininity and masculinity. While trained to assess voices from an anatomical and physiological understanding of voice production, it is important that SLPs also reflect on how voice production and perception are influenced by speakers' and listeners' sociocultural background, which may lead the SLPs' perception to differ from that of the client. 46
Besides the SLPs' professional experience of providing gender-affirming voice training, their higher mean age should also be considered when listener groups are compared. Working as a certified SLP in Sweden requires a 4-year education. For SLPs providing gender-affirming voice training, an additional education regarding voice and communication for TGD voice clients, preceded by two mandatory years of clinical work with voice clients, is indicated in the national guidelines by the Swedish Professional Association for Transgender Health. Therefore, a lack of SLP participants between 18 and 25 years was expected to result in a higher mean age in the SLP listener group. However, the influence of listener age on the perception of femininity and masculinity in voices has not been conclusively supported in research. 21 The effect of the extensive professional education and training was therefore regarded as a more potent influential factor guiding SLP listeners' voice ratings than age.
The high variability in femininity and masculinity ratings of TGD speakers' voices shows that these speakers were not perceived in a homogeneous way by any of the listener groups. The high variability is assumed to mainly result from the varying vocal gender expressions in the TGD speaker groups. However, a high variability in listeners' ratings might also reflect an unfamiliarity with voices outside the binary, in line with previous studies that have reported listener-perceived uncertainty when rating voices that are considered not to consistently reflect the norms of either cismen's or ciswomen's voices. 5,47 For SLPs providing gender-affirming voice training, there is a need for consistent and valid assessments of voices with varying vocal gender expressions. The large variability found also in the SLPs' ratings indicates that agreed-upon standardized rating procedures that specifically target vocal gender expression would be beneficial for SLP practice within the area of gender-affirming voice training. A standardized and validated protocol for these assessments, including variables of importance for TGD voices such as resonance, would, in turn, facilitate consensus listening training to strengthen SLPs' internal standards and make them better matched and less variable. 48
The listener characteristics in focus in this study were gender identification (as cisgender or within a TGD identification) and listener expertise in providing gender-affirming voice training. Gender identity and listener expertise are, however, not the only listener characteristics with the potential to influence listeners' perception of voices. While previous research has shown no consistent effect of listener gender, 21,49 age, 21 and sexuality 32,49 on ratings of perceived femininity and masculinity in TGD voices, characteristics reflecting the listener's sociocultural identity and background have been argued to be likely to influence the listener's constructs and perceptions of femininity and masculinity. 5,27,29 Geographically bound sociocultural constructs, as well as language-specific linguistic structure, have, for example, been argued to influence listeners' perception of vocal gender. 5,27 The present study enrolled only listeners who were native speakers of Swedish, which may limit the generalizability of the results to languages close to spoken Swedish. Further studies are needed to explore the potential influence of listeners' sociocultural identity, including linguistic and cultural background, on the perception of femininity and masculinity in voice.
The number of SLPs participating in this study was small and included only cisgender women, despite the recruitment effort having access to the national network that gathers all SLPs who regularly provide gender-affirming voice training to TGD voice clients. Therefore, generalizations of the findings to a larger group of SLPs providing gender-affirming voice training should be made with caution. However, the study is one of few in which the participating expert listeners have been restricted to SLPs with experience in providing gender-affirming voice training. Hence, the rating patterns seen for SLPs provide valuable insights for SLPs in the field to reflect upon in their work with TGD voice clients. The small number of SLP participants limits the possibility of analyzing the effect of length of experience and of formal specialized training on how voices are rated in terms of perceived degree of femininity and masculinity. Further research is needed to elucidate the impact of these aspects.
No assessments were made of listeners' reliability in rating the perceived degree of femininity and masculinity in voices. While we acknowledge this to be a limitation of our study, the risk of participant dropout was considered to increase substantially with duplicated voice samples added to the already-extensive rating task. Assessment of intrarater reliability is recommended in future studies on listeners' perception of voice.
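As an illustration of how such an assessment might look, intrarater reliability could be estimated by presenting a subset of voice samples twice and comparing each listener's first and repeated ratings. The following is a minimal sketch only and was not part of the present study: the function names, the 0-100 scale range, and the example ratings are all hypothetical, and Pearson's r together with the mean absolute difference is just one possible choice of agreement measures (an intraclass correlation is a common alternative).

```python
from math import sqrt


def pearson_r(first, repeat):
    """Pearson correlation between first and repeated ratings of the same samples."""
    if len(first) != len(repeat) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two ratings")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(repeat) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, repeat))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in repeat)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one set of ratings is constant")
    return cov / sqrt(var_a * var_b)


def mean_abs_diff(first, repeat):
    """Average absolute change between first and repeated ratings, in scale units."""
    if len(first) != len(repeat) or not first:
        raise ValueError("need two equally long, non-empty lists")
    return sum(abs(a - b) for a, b in zip(first, repeat)) / len(first)


if __name__ == "__main__":
    # Hypothetical femininity ratings by one listener on an assumed 0-100 scale:
    # the same five voice samples rated once and then again later in the session.
    first_pass = [12, 35, 58, 80, 94]
    second_pass = [15, 30, 61, 76, 90]
    print("Pearson r:", round(pearson_r(first_pass, second_pass), 3))
    print("Mean absolute difference:", mean_abs_diff(first_pass, second_pass))
```

A high correlation combined with a small mean absolute difference would suggest that a listener rates the same voice consistently. Because every duplicated sample lengthens the rating task, the number of repeated samples would need to be weighed against the dropout risk noted above.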
We employed two separate ratings of femininity and masculinity to support the listeners' consideration of these properties separately and independently. While voices that received low ratings on one scale generally received high ratings on the other scale, voices that received ratings in the midrange of the scales did not show an equally dichotomous distribution of femininity and masculinity ratings. The use of two separate scales thus had the benefit that perceptions were not shoehorned into a one-dimensional scale with femininity and masculinity as opposing endpoints; instead, voices could be rated as both quite feminine and quite masculine. As listeners may direct their perception of vocal gender differently depending on the terminology used and how the listening task is presented, 14,19 we suggest that future research on the perception of voice carefully consider the choice of scales and scale endpoint labels, to avoid forcing a dichotomous representation of femininity and masculinity upon listeners.

CONCLUSIONS
The results from this listening study of perceived masculinity and femininity in voices are based on spontaneous speech and comprise a large data set in terms of both the number of speakers and the number of listeners. The study explored whether TGD listeners and SLPs who provide gender-affirming voice training perceive femininity and masculinity in voices similarly to cisgender naive listeners. TGD listeners did not perceive voices differently than cisgender naive listeners. SLPs providing gender-affirming voice training perceived voices as both less feminine and less masculine than the other two listener groups. This may be explained by SLPs' clinical training in assessing voices, which is assumed to increase the awareness of the complex nature of voice perception and may cause small differences in their perception of femininity and masculinity compared with other listeners. SLPs should, therefore, be attentive to how their professional training may influence their perception of voices, and encourage discussions and explorations of the TGD clients' perception of femininity and masculinity to ensure client-centered voice assessment and training.

Declaration of Competing Interest
The authors declare no conflict of interest.

FIGURE 1. Ratings of the perceived degree of femininity (left) and masculinity (right) given by cisgender naive, TGD, and SLP listeners, presented in boxplots showing mean (central dots) and median (horizontal lines) ratings, with boxes representing interquartile range (the mid 50% of the ratings).