Exterior sounds for electric and automated vehicles: Loud is effective

Exterior vehicle sounds have been introduced in electric vehicles and as external human-machine interfaces for automated vehicles. While previous research has studied the effect of exterior vehicle sounds on detectability and acceptance, the present study takes a different approach by examining the efficacy of such sounds in deterring people from crossing the road. An online study was conducted in which 226 participants were presented with different types of synthetic sounds, including sounds of a combustion engine, pure tones, combined tones, and beeps. Participants were presented with a scenario where a vehicle moved in a straight trajectory at a constant velocity of 30 km/h, without any accompanying visual information. Participants, acting as pedestrians, were asked to hold down a key when they felt safe to cross. After each trial, they assessed whether the vehicle sound was easy to notice, whether it gave enough information to realize that a vehicle was approaching, and whether the sound was annoying. The results showed that sounds of higher modeled perceived loudness, such as continuous tones with high frequency, were the most effective in deterring participants from crossing the road. The tested intermittent beeps resulted in lower crossing deterrence than continuous tones, presumably because no valuable information could be derived during the inter-pulse intervals. Tire noise proved to be effective in deterring participants from crossing while being the least annoying among the sounds tested. These results may prove insightful for the improvement of synthetic exterior vehicle sounds.


Introduction
More than 270,000 fatal pedestrian traffic accidents occur annually worldwide [1], the majority of which occur during road crossing [2]. Causes of pedestrian-vehicle accidents include underestimation of the crossing gap or the time needed to cross [3], low visibility [4], and visual obstruction of the approaching vehicle [5]. Augmenting the sounds emitted by vehicles could potentially help prevent unsafe crossing. Such solutions have been the topic of investigation for electric vehicles (EVs) as well as in the form of external human-machine interfaces (eHMIs) for automated vehicles (AVs).

Sound design for electric vehicles
The market penetration of EVs is increasing, and countries worldwide have announced plans to cease sales of new internal combustion engine vehicles (ICEVs) between 2025 and 2035 [6-10]. However, one of the issues with EVs is that the lack of combustion noise can make them so quiet that they may become unsafe for vulnerable road users (VRUs) with visual impairments [11,12]. Therefore, in recent years, legislation in Europe [13,14] and the United States [15] has been introduced to address the inherent quietness of EVs at low speeds. The legislation specifies that, up to 20 km/h in Europe and 30 km/h in the USA, EVs and hybrid vehicles must emit sound at a minimum level (in decibels), distributed across a minimum number of frequency bands. Beyond these speeds, the noise generated by the tires and aerodynamic drag is considered sufficient, making synthetic noise unnecessary. The legislation also stipulates additional requirements: the sound must depend on the speed [14,15], and within the European legislative framework, the sound must be continuous [13]. In Europe, there is even an allowance for specific preferred sounds selectable by the driver [13]. In summary, the current legislation aims to ensure safety through synthetic sounds but also provides some room for creativity, allowing automobile manufacturers to create vehicle-specific branding through the selection of different frequencies [13].
Only a few studies have investigated the relationship between detectability and annoyance in EVs, and which sound characteristics may result in high detectability, yet low annoyance. In an experiment with 30 participants, Lee et al. [28] reported that sounds with amplitude and frequency modulation led to faster detection and lower perceived annoyance than saw-tooth signals. Petiot et al. [27] used an interactive genetic algorithm to develop sounds: 15 assessors rated sounds in terms of their detectability and pleasantness. The sounds were weighted combinations of four components (a thermic motor sound, a harmonic sound, and two types of filtered broadband noises). Two different filters were also applied to the final sound. In total, more than 70 parameters were adjustable, including the frequencies and amplitudes of the four components. The genetically evolved sound was found to result in statistically significantly higher 'fitness' (i.e., a combination of higher detectability and lower unpleasantness) than sounds developed by human designers instructed to develop sounds satisfying these two criteria. It was also found that including the sound of a motor is important for detectability.
The effect of sounds on participants' willingness to cross has not been extensively investigated. An exception is the on-road study by Wall Emerson et al. [32], in which visually impaired participants were standing next to a road with approaching traffic and asked to indicate their willingness to cross by pressing and releasing a button. The authors reported that hybrid vehicles switching to internal combustion soon after accelerating from a stop were considerably more detectable than hybrid vehicles switching to internal combustion later (i.e., after having reached a higher speed), a finding that indicates the importance of engine sound as a cue of approaching traffic.

Sound design for automated vehicles
Beyond EVs, auditory signals could also be useful in the form of auditory eHMIs that communicate the state or intention of AVs to VRUs. The majority of eHMIs developed so far are visual, but a number of auditory eHMIs have also been proposed (see [33] for an overview). Mahadevan et al. [34] used the verbal messages "I see you" and "cross" together with visual and tactile (via a mobile phone) feedback and found that a combination of modalities improved pedestrians' awareness of the approaching AV compared to single modalities. Deb et al. [35] tested a horn, music, and the verbal message "safe to cross" and found that verbal messages were preferred over the abstract sounds tested. Music was tested by Florentine et al. [36], who noted that it helped draw pedestrians' attention. Inspired by research on EV sounds, Moore et al. [37] proposed the use of a synthetic engine sound to indicate the intention of a driverless AV to stop in front of a pedestrian. The authors tested the concept with a hybrid vehicle in a Wizard-of-Oz naturalistic setting. The results showed that participants in the role of pedestrians rated the clarity of the AV's intention higher in the presence of engine sound compared to a condition without engine sound.
The mandatory exterior vehicle sounds for EVs are primarily intended to ensure that EVs are audible in the same manner as ICEVs, compensating for the intrinsic quietness of EVs. This implementation does not exploit the full potential of what can be achieved with sound. Just as visual eHMIs on AVs provide the opportunity to offer additional information that would otherwise be invisible to pedestrians [38], external vehicle sounds can be more than just mimics of traditional ICEVs. They could take various forms, such as loud tones or warning beeps, as investigated in the current work. Other possibilities that are not pursued in this study could include linking the sounds to the stopping or turning intentions of the AV or dynamically adapting the sounds to the traffic situation, with options such as directional emission or sounds that rely on pedestrian recognition by the AV.

Study aim
The aim of this study is to examine the effectiveness of various types of synthetic exterior vehicle sounds that could be used as auditory eHMIs to inform pedestrians of an approaching vehicle. We evaluated the effect of pure tones, combined tones, intermittent tones, beeps, and engine/tire sounds, presented with and without background noise, on perceptual factors such as the extent to which the sound is easy to notice and annoying. Moreover, instead of asking participants to press a key as soon as they detected the exterior vehicle sound, we asked them to hold a key when they would be willing to cross the road. We expected crossing deterrence to increase with loudness and tonal frequency, and annoyance to increase with loudness [39-41]. Participants were recruited via crowdsourcing, resulting in a larger sample and higher statistical power compared to previous studies.

Sound emission from a moving source
In this study, auditory stimuli were generated from a vehicle passing a stationary observer. In this section, we describe the assumptions made during the simulation of the passing vehicle. In the subsequent section, we detail the various artificial sounds that were used to produce the auditory stimuli presented to the participants in the experiment.
This study assumes a simple sound source and observer geometry. A two-dimensional arrangement is used, where the observer and the sound source are on the same plane, and there is no reflection of sound from buildings or other structures. Furthermore, the sound source is assumed to be a monopole, a theoretical point source that radiates sound evenly in all directions. Modeling a passing car as a monopole could be a reasonable simplification for studying how a stationary observer perceives the sound. When the observer's distance is considerably greater than the size of the car itself, the specifics of the car's acoustic near field can often be disregarded. Previous research on the design of external vehicle sounds for EVs also used a monopole assumption (see [27,28,42]; although other methods such as multiple monopoles [43], directional emission [44], and physical model methods [45] have also been used for the simulation of exterior vehicle sounds).
Fig. 1. Schematic representation of the vehicle approaching a stationary pedestrian. In this figure, the x-axis and y-axis originate from the observer. The x-axis runs parallel to the road, and the y-axis is perpendicular to the road. In the audio clips, the car exhibits uniform motion from (x, y) = (x_{s,0}, y_s) = (−60, 3) to (x_{s,f}, y_s) = (53, 3).
The sound source was modeled to move at a constant velocity V of 30 km/h (≈ 8.33 m/s, Mach number M ≈ 0.024) in the positive x direction along a straight line at a distance y_s = 3 m from the observer (Fig. 1). In Fig. 1, θ is the angle between the velocity vector V and the source position vector r. The position of the sound source is (x_{s,0}, y_s) at t = 0 and (x_{s,0} + V⋅t, y_s) at a general instant t. In the generated auditory stimuli, the source moved from x_{s,0} = −60 m to a final position of x_{s,f} = 53 m. At the given speed of 30 km/h, this meant that the vehicle passed the observer after 7.2 s and that the audio clip concluded after 13.6 s.
The decision to use −60 m and +53 m was driven by the objective to keep the audio clips limited in duration (specifically, 13.6 s); this approach enabled participants to assess a substantial number of stimuli without the experiment becoming overly long. At a position of −60 m, the sound volume was still low (see Fig. 2), and at 53 m, where the vehicle had already passed more than 5 s earlier, no additional valuable information could be obtained regarding the participants' inclination to cross.
For a stationary observer, the amplitude of a sound emitted by a moving source changes over time due to the relative motion of the source. Equation (1) gives the amplitude modulation factor [46,47]. It describes the amplitude of the sound pressure observed by the pedestrian, A(t_obs), relative to the amplitude A_0 that the pedestrian would observe if the vehicle were standing still (V = 0) in front of the pedestrian at coordinates (0, y_s).

A(t_obs) / A_0 = …   (1)

in which M is the Mach number, defined as V/c ≈ (8.33 m/s)/(343 m/s) ≈ 0.024, where c is the speed of sound.
The sound signal is observed with a slight delay with respect to the emission time. Equation (1) describes that the observed sound amplitude increases as the vehicle gets closer to the pedestrian and decreases once the vehicle has passed the pedestrian. Furthermore, the source motion causes a frequency shift due to the Doppler effect, as described by Equation (3) [46].
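The geometry described in this section can be explored numerically. The following Python sketch is illustrative only: it uses textbook relations for a point source in uniform subsonic motion (a propagation delay of r/c, 1/r spreading, and a Doppler factor based on the velocity component toward the observer). The function name `source_state`, the constant names, and in particular the exponent of the convective factor are assumptions made for this example; the exact form of Equations (1) and (3) may differ.

```python
import math

C = 343.0                      # speed of sound in m/s (value used in the text)
V = 30 / 3.6                   # source speed in m/s (30 km/h)
X0, XF, YS = -60.0, 53.0, 3.0  # start, end, and lateral offset in m


def source_state(t_emit, exponent=2):
    """Return (t_obs, relative_amplitude, doppler_ratio) for emission time t_emit.

    `exponent` is the power of the convective factor (1 - m_r). The default
    of 2 is an assumption taken from common textbook treatments of a moving
    monopole, not a value confirmed by the paper.
    """
    x = X0 + V * t_emit                  # source position along the road
    r = math.hypot(x, YS)                # source-observer distance
    m_r = (V / C) * (-x / r)             # Mach number component toward the observer
    t_obs = t_emit + r / C               # emission time plus propagation delay
    amplitude = (YS / r) / (1 - m_r) ** exponent  # relative to a parked source at (0, YS)
    doppler = 1 / (1 - m_r)              # observed frequency / emitted frequency
    return t_obs, amplitude, doppler


t_pass = -X0 / V        # emission time at which the source is directly in front
t_end = (XF - X0) / V   # emission time at the end of the trajectory
```

Under these assumptions, the source is directly in front of the observer at an emission time of 7.2 s, where the relative amplitude and the Doppler ratio both equal 1. During the approach, the observed frequency is raised by at most about 2.5 %.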

Auditory stimuli
Thirty auditory stimuli were synthetically generated. These stimuli can be classified into four categories: (1) continuous pure tones at a single frequency, (2) intermittent pure tones (500 ms of emission followed by 500 ms of silence), (3) combined tones, and (4) double beeps. In addition, a stimulus with a diesel engine sound signal [48] was included as a baseline representing an ICEV. A diesel engine sound was selected because of its unique and characteristic noise, which sets it apart from other sounds such as those made by tires or wind. A stimulus with only tire noise [49] was also included to assess the performance of a quiet EV/AV.
The tonal sounds of categories (1), (2), and (3) were presented at four frequencies: 350 Hz, 500 Hz, 1000 Hz, and 2000 Hz. It is known that the hearing of young adults is most sensitive in the range of 2000 Hz to 5000 Hz, with peak sensitivity around 3000 Hz [50]. We opted for lower frequencies, from 350 to 2000 Hz. Reasons for avoiding higher frequencies included potential issues with the directivity of the exterior loudspeakers, the possibility that atmospheric absorption might attenuate the emitted sound too much (necessitating very loud emission levels to remain effective), and the fact that the optimal frequency tends to decrease with biological age [51], making testing at 3000 Hz less relevant. For comparison, the (recommended) frequencies of sirens of emergency vehicles are also within the range of 400 Hz to 2000 Hz [51-53].
The combined tones of category (3) were the same as the continuous pure tones of category (1) but with two additional tones of lower amplitude (see Fig. A2) at frequencies 90 Hz above and below the main tone. The combined tones were expected to be perceived as more annoying because of the additional tones [54]. The double-beep signal of category (4), with each beep at a tone of 1800-1900 Hz, consisted of eight pairs of 240-ms beeps separated by a silent 100-ms interval within each pair and a 1000-ms interval between pairs. This stimulus was tested by Bazilinskyy et al. [55] (in a series of double-beep stimuli with 2000, 1000, 750, and 430 ms intervals). These authors found that shorter intervals between beeps led to a higher perceived urgency. For the current study, the stimulus with a medium (1000 ms) interval was selected.
The set of 15 signals (3 tonal sounds × 4 frequencies + double beeps + ICEV + tires) was offered with background noise (a recording of a quiet street [56]) and without, resulting in a total of 30 stimuli. In all cases with background noise, the tire noise sound was also added (for the tire noise stimulus, this means that it was offered once with and once without background noise). In the cases without background noise, no tire noise was added to test the pure effect of the synthetic sound. The sounds with background noise were presented at a lower volume than their counterparts without background noise (see Fig. 2 for an illustration), with the purpose of making the background noise prominent compared to the artificial tones that were part of the stimuli. All stimuli were generated at a sampling rate of 44.1 kHz. The duration of each stimulus was 13.6 s. Table 1 provides an overview of the auditory stimuli used in the experiment. Appendix A provides a full account of all auditory stimuli used in this study. There, we display the signal amplitude as a function of time and also provide the discrete Fourier transforms of these signals.
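To make the four tonal categories concrete, the sketch below generates the raw source signals (before any amplitude modulation, Doppler shift, or background noise is applied) using only the Python standard library. It is a minimal illustration, not the generation code used in the study: the function names, the relative amplitude of the side tones (`side_amp`), and the choice of 1850 Hz for the beeps (the text gives a range of 1800-1900 Hz) are assumptions.

```python
import math

FS = 44_100  # sampling rate in Hz, as stated in the text


def tone(freq, dur, amp=1.0):
    """Category (1): continuous pure tone as a list of samples."""
    n = round(dur * FS)
    return [amp * math.sin(2 * math.pi * freq * i / FS) for i in range(n)]


def intermittent(freq, dur, on=0.5, off=0.5):
    """Category (2): tone emitted for `on` seconds, then silent for `off` seconds."""
    on_n, period_n = round(on * FS), round((on + off) * FS)
    return [s if i % period_n < on_n else 0.0
            for i, s in enumerate(tone(freq, dur))]


def combined(freq, dur, side_amp=0.3, offset=90.0):
    """Category (3): main tone plus two weaker tones 90 Hz above and below.

    side_amp is a hypothetical value; the text only says "lower amplitude".
    """
    parts = [tone(freq, dur),
             tone(freq - offset, dur, side_amp),
             tone(freq + offset, dur, side_amp)]
    peak = 1.0 + 2 * side_amp  # upper bound of the summed amplitude
    return [sum(s) / peak for s in zip(*parts)]


def double_beeps(freq=1850.0, pairs=8, beep=0.240, gap=0.100, pause=1.000):
    """Category (4): pairs of short beeps; freq=1850 Hz is an assumed value."""
    def silence(d):
        return [0.0] * round(d * FS)

    out = []
    for _ in range(pairs):
        out += tone(freq, beep) + silence(gap) + tone(freq, beep) + silence(pause)
    return out
```

With the stated timings, eight beep pairs occupy 8 × (0.24 + 0.10 + 0.24 + 1.00) s = 12.64 s, which fits within the 13.6-s clip.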

Participants
Participants subscribed to the study through the crowdsourcing service Appen (https://appen.com). They could become aware of this research by logging into one of the channel websites (e.g., https://www.ysense.com), where our study was presented on a list of other projects. We allowed contributors from all countries to participate. Participants were not allowed to complete the study more than once using the same worker ID. A payment of 0.50 USD was offered after the completion of the experiment.
In addition to the crowdsourced participants, a small number of participants were recruited among acquaintances to complete the same study. Due to COVID-19 restrictions at the time of the study, these participants conducted the experiment online using their personal computers rather than in the laboratory. They answered the same pre-experimental questions as the crowdsourced participants but did so using a Google Form questionnaire instead of Appen. The experiment itself was presented in the same online environment as that of the crowdsourced participants.
The research was approved by the Human Research Ethics Committee of the Delft University of Technology (reference number 1233).

Procedure
The study was presented in English. At the top of the webpage introducing our study, contact information was provided. Participants were informed that they could contact the investigators to ask questions about the study and that they had to be at least 18 years of age. Information about anonymity and the voluntary nature of participation was also provided.

Table 1. Auditory stimuli included in the experiment.

Participants first provided demographic information about their age, gender, hearing problems, current use of headphones, and driving experience. Next, they were asked to leave the questionnaire by clicking on a link that opened a webpage with the experiment and were presented with the following instructions: "Imagine that you are a pedestrian standing on the side of the road. You will listen to 60 sounds of vehicles driving by you. When the sound is playing, press and HOLD 'F' when you feel safe to cross the road in front of the car. You can release the button and then press it again multiple times during the sound. After each sound, you will be asked to answer a few questions. After each 10 sounds you will be able to take a short break. Sometimes you will be asked to listen to a phrase and type what was said.
Please make sure that your audio is on. On the next page, you will listen to a song. When you will be listening to the song, adjust your volume level to be able to hear the song clearly. Do NOT change your volume level till the end of the experiment. Press 'C' to proceed." The song used to adjust the volume was instrumental and copyright-free, taken from [57]. On a scale from 0 to 1, the maximum amplitude of the digital sound from the music was 0.1 (compare this with Fig. 2 and Fig. A1 in the Appendix, which indicate that the maximum amplitude of the auditory stimuli without background noise is 1.0). Studies have shown that the preferred playback level for music listeners with headphones or earphones corresponds to a mean A-weighted ear canal sound pressure level between 70 and 80 dBA, with a standard deviation of 7 to 10 dBA [58-60].
The experiment was created using a modified framework based on jsPsych [61] that was used in a previous study on the measurement of reaction times to auditory, visual, and multi-modal stimuli [62], as well as in studies investigating the willingness of pedestrians to cross in front of an automated vehicle, using the same keypress method as in this study [63,64]. The sounds were pre-loaded before the start of the experiment to prevent delays during the experiment.
The participants had to respond to 60 sounds presented in blocks. Participants were randomly assigned to either listen to 30 auditory stimuli with background noise first, followed by 30 without background noise, or vice versa. Each stimulus was presented twice per block, in a random order that differed for each participant. Before each stimulus, the participants were instructed as follows: "Start by HOLDING the 'F' key. Release the key when it becomes unsafe to cross; press again when safe to cross." The instruction remained visible throughout the duration of the stimulus. After each stimulus, they were asked to rate the sound on three criteria: (1) "easy to notice", (2) "gave me enough information to realize that a vehicle was approaching", and (3) "annoying". Each criterion was rated by moving a slider between 0 and 10 (Fig. 4). The participant could not proceed to the next stimulus before having moved all the sliders. Participants did not receive feedback on their responses.
In order to ensure attentive participation, five test phrases were injected, randomly selected from the following six: (1) "Oranges are orange", (2) "Lemons are yellow", (3) "Cherries are red", (4) "Apples are green", (5) "Blackberries are black", and (6) "Grapes are blue". The test phrases were generated using the British English Amy female voice available at [65]. The participants had to type the test phrase they listened to. On a scale from 0 to 1, the maximum amplitude of the digital sound from the test phrases was 0.1, identical to the amplitude of the music that was used by the participants to adjust their volume settings. After the experiment, the response typed by the participant was automatically compared to the correct response to determine whether participants were able to hear the sound and were still attentive to the task.
It should be noted that the synthesized phrases may have been difficult to understand because they were presented without context. The purpose of the test phrases was to ascertain whether participants had their sound enabled and maintained sufficient attentiveness throughout the experiment, not to evaluate their proficiency in English. Consequently, we adopted a lenient scoring method in which a response was marked as correct if it contained the first three letters of the fruit or the first three letters of the color. For instance, for the test phrase "lemons are yellow", responses containing the strings "lem" or "yel" were marked as correct. The comparison was not case-sensitive.
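The lenient scoring rule for the test phrases can be expressed in a few lines. The sketch below is one possible implementation of the rule as described; the function name `is_correct` and the way the fruit and color are extracted from the phrase are assumptions made for this example.

```python
TEST_PHRASES = [
    "Oranges are orange", "Lemons are yellow", "Cherries are red",
    "Apples are green", "Blackberries are black", "Grapes are blue",
]


def is_correct(phrase, response):
    """True if the response contains the first three letters of the fruit
    or the first three letters of the color (case-insensitive)."""
    words = phrase.lower().split()
    fruit, color = words[0], words[-1]   # phrases have the form "<fruit> are <color>"
    typed = response.lower()
    return fruit[:3] in typed or color[:3] in typed
```

For example, `is_correct("Lemons are yellow", "LEMON is yelow")` returns `True`, whereas `is_correct("Lemons are yellow", "apples are green")` returns `False`.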
After every ten trials, participants were presented with text indicating how many of the 60 sounds they had completed, for example: "You have now completed 10 sounds out of 60. When ready press 'C' to proceed to the next batch." At the end of the experiment, the participants were given a unique code. They had to enter the code on the questionnaire as proof that they completed the experiment in order to receive their remuneration.

Data analysis
For each auditory stimulus, a 'crossing deterrence score' was calculated based on the participants' keypress behavior. The concept of such a score computation originates from previous research [63,66-68], where the time intervals used for calculating the score corresponded to triggers like the moment the vehicle started to brake, the activation of an eHMI, or the vehicle coming to a halt. In our experiment, we expected participants to release the response key (indicating that they did not feel safe to cross) as the vehicle approached, and to press it again after the vehicle had passed, i.e., after the 7.2-s mark. We defined the crossing deterrence score over the interval from 1.0 to 7.2 s, incorporating a 1-s start-up margin (see Fig. 5 for justification). More specifically, the crossing deterrence score was calculated as 100 % minus the percentage of keypresses in the 1.0 to 7.2-s interval, averaged across all trials and participants.
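As an illustration of this definition, the sketch below computes a crossing deterrence score from keypress data. It assumes a hypothetical data layout in which each trial is a list of non-overlapping (press time, release time) pairs in seconds; the original analysis may instead have sampled the key state at fixed time steps, which yields the same quantity up to the time resolution.

```python
T_START, T_END = 1.0, 7.2  # scoring window in s, as defined in the text


def held_fraction(intervals, t0=T_START, t1=T_END):
    """Fraction of the window [t0, t1] during which the key was held.

    intervals: list of (press_time, release_time) pairs in seconds,
    assumed to be non-overlapping (hypothetical data layout).
    """
    held = sum(max(0.0, min(rel, t1) - max(prs, t0)) for prs, rel in intervals)
    return held / (t1 - t0)


def crossing_deterrence(trials):
    """100 % minus the mean percentage of the window with the key held."""
    mean_held = sum(held_fraction(t) for t in trials) / len(trials)
    return 100.0 * (1.0 - mean_held)


# Made-up example: three trials of one stimulus (not data from the study)
example_trials = [
    [(0.0, 4.1), (8.0, 13.6)],  # released at 4.1 s, pressed again after the pass
    [(0.0, 7.2)],               # key held throughout the window
    [(0.0, 1.0)],               # key released within the start-up margin
]
score = crossing_deterrence(example_trials)
```

In this made-up example, the key is held for 50 %, 100 %, and 0 % of the window, giving a score of 50 %.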
Moreover, the loudness for the 1.0 to 7.2-s interval was computed from the sound signal using ISO 532-1 [69] (Zwicker loudness; for stationary sounds: [70]; for time-varying sounds: [71]) and ISO 532-2 [72] (Moore-Glasberg method [73]). These methods of determining acoustic loudness take into account characteristics of human hearing, such as the dependence of sound transmission through the middle ear on frequency [70]. Swart and Bekker [74] compared the Zwicker loudness with other psychoacoustic metrics and showed that it was suitable for distinguishing between similar vehicle sounds. In addition to the Zwicker and Moore-Glasberg methods, we also included Integrated Loudness (LUFS) as a loudness score [75,76]. Integrated loudness is considered a more versatile measure of perceived loudness; it is commonly applied to broadcasting and streaming services, where dynamic content spans a range from speech to music and sound effects [77].
Pearson correlation coefficients were computed between the crossing deterrence scores, loudness levels, and the scores of the three questions presented after each stimulus ("easy to notice", "gave me enough information to realize that a vehicle was approaching", and "annoying"), averaged across all participants.
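The correlation analysis operates on one value per stimulus, obtained by averaging over participants. The following sketch shows the two steps with the standard library only; the function names and the example numbers are made up for illustration and are not data from the study.

```python
import math


def stimulus_means(ratings):
    """Average ratings[participant][stimulus] over participants, per stimulus."""
    n_stimuli = len(ratings[0])
    return [sum(p[s] for p in ratings) / len(ratings) for s in range(n_stimuli)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up example: two participants rating three stimuli on "annoying"
annoying = stimulus_means([[2, 5, 9], [4, 7, 9]])  # [3.0, 6.0, 9.0]
deterrence = [40.0, 55.0, 70.0]                    # hypothetical scores in %
r = pearson_r(annoying, deterrence)
```

Because the example values are exactly linear, `r` equals 1 up to rounding; real data would of course give lower values.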

Results
A total of 995 people participated between September 16 and 17, 2020. However, due to a data storage error, data for 420 participants were unavailable, reducing the effective sample size to 575. From this pool, we removed participants based on the following criteria: (1) self-reported non-compliance with reading the instructions, (2) being under 18 years of age, (3) completion of the study in less than 15 min, which was deemed implausible given that the total duration of the stimuli was 13.6 s multiplied by 60 audio clips (equaling 816 s), plus the time required to answer the questions, (4) self-reported hearing problems, and (5) incorrect answers to two or more of the test phrases. These five criteria led to the exclusion of 370 participants, 352 of whom were excluded due to errors in the test phrases, thus yielding a final sample size of 205.
The 205 participants had a mean age of 37.1 years (SD = 11.4 years, median = 36). Of the 205 participants, 140 were male, 63 were female, and 2 preferred not to respond. The countries that were most represented were Venezuela (n = 77), India (n = 13), Russia (n = 11), and the United States (n = 11). The participants took, on average, 48.0 min to complete the study (SD = 19.8 min, median = 43.3 min). Of the 205 participants, 84 confirmed the use of headphones at the time, while 120 did not, and one chose the option 'I prefer not to respond'.
From the recruitment via acquaintances, 21 people participated between 4 December 2020 and 12 January 2021. This sample consisted of […]
Fig. 5 shows the corresponding keypress percentages for two example signals from the 30 auditory stimuli. A substantial difference in keypresses can be distinguished, with the continuous tone being a stronger deterrent to cross (i.e., participants released the key earlier) compared to the beeping sound.
Table 2 shows the correlation matrix between the crossing deterrence score, the three self-reported measures, and the loudness for the interval from 1.0 to 7.2 s. The scores were averaged over the 226 participants. One noteworthy observation, illustrated in Fig. 6, is that louder sounds generally had a higher crossing deterrence score. When sounds implicit to driving (such as tire noise and diesel engine noise) are excluded, the correlation between the Moore-Glasberg loudness and the crossing deterrence score, as depicted in Fig. 6, is considerably stronger (r = 0.86, n = 26) compared to when these sounds are included (r = 0.72, n = 30).
Table 2 and Fig. 7 show that sounds with higher crossing deterrence scores were generally also perceived as more annoying.However, driving-related sounds, namely tire noise and the sound of a diesel engine, were found to be less annoying than the other tested sounds (Fig. 7).Tire noise and the sound of a diesel engine were also deemed less easy to notice than the rest of the sounds (Fig. 8) but still resulted in above-average crossing deterrence (Figs. 6 & 7).
High tones yielded a high crossing deterrence score, encouraging participants to release the key early, while lower tones scored substantially lower, as seen in Figs. 6 and 7. Additionally, intermittent tones resulted in lower crossing deterrence than continuous tones.
Finally, Table 2 highlights the noteworthy finding that the four different measures of perceived loudness capture various aspects. The ISO 532 measures (Zwicker, Moore-Glasberg) primarily correlate with the crossing deterrence score (r = 0.66-0.72), which is based on keypress behavior, while the integrated loudness seems to correlate more strongly with the subjective measures of 'easy to notice' and 'annoying' (r = 0.60 and 0.75, respectively).

Discussion
This online study tested how vehicle sounds, both naturalistic ones and tones and beeps, influence participants' crossing intentions. In order to isolate the effect of sound, participants were not presented with any visual information, which may be representative of situations in which visual information is unavailable due to visual impairment or […]
A distinctive feature of our research is that we not only queried subjective qualities, as previous studies have done (e.g., [16,21,30]), but also obtained objective data through a keypress. With regard to the latter, a unique aspect is that we did not measure reaction times with the intention of assessing detectability (see, for example, [12,18,22,26-29]), but instead asked participants to press and hold down the response key for as long as they believed it was safe to cross. This approach yields information comparable to that obtained from reaction times, while also allowing participants to apply their own decision threshold for what they deem a 'safe crossing'. Furthermore, through our keypress method, we were able to depict the dynamics of crossing tendencies, as demonstrated in Fig. 5, from which a crossing deterrence score was subsequently calculated.
The results showed that loud sounds were the most effective in discouraging participants from crossing the road.Our findings also indicated that across all 30 stimuli, the more effective sounds, i.e., those causing higher crossing deterrence, were also the more annoying sounds.This supports the "trade-off hypothesis of pleasantness and power" that was reported by Bisping [78] for in-vehicle sounds and according to which an increase in perceived pleasantness of a sound beyond a certain level has negative consequences on the perceived powerfulness of the car and vice versa (and see [79] for a similar negative correlation between valence and dominance/arousal of auditory stimuli).These results raise the question of whether current EVs, which tend to emit naturalistic sounds or pleasant tones [18], are optimally effective in ensuring the safety of pedestrians.
Beeps and intermittent sounds, particularly those of low tonal frequency, yielded low crossing deterrence, even when there was no background noise. One possible explanation is that the inter-pulse intervals we used were long (1000 and 500 ms for beeps and intermittent tones, respectively). Previous studies that investigated shorter intervals found that perceived urgency increases as the beep rate increases, following Stevens' power law [55,80-82]. In practice, slow beeps are typically used for slowly evolving situations, such as a truck reversing, whereas fast beeps indicate an approaching hazard [83]. Moreover, the duration of the beeps themselves was short, which may have inhibited speed estimation. At the same time, during the non-emitting intervals, particularly if these are long, no new information is conveyed, leading to a lag in information processing.
In assessing our results, it is important to note that loudness has various definitions and that some of our loudness scores did not strongly intercorrelate. The highest correlation was 0.81 (between Moore-Glasberg loudness and Zwicker time-varying loudness) and the lowest was only 0.04 (between Moore-Glasberg and integrated loudness). The loudness scores correlated with objective outcomes (crossing deterrence score) and subjective outcomes to different extents. One possible explanation resides in the underlying purposes of the loudness scores: the ISO 532 scores (Zwicker and Moore-Glasberg) are based on psychoacoustic models that take into account human hearing as a function of frequency content [69-73,84], with the Moore-Glasberg method being a more recent alternative [72,73,85]. Integrated loudness, on the other hand, represents a more practical industry standard used to assess the loudness of a wide variety of audio content, including different types of music [75,76,86,87]. These differences are also observable in our results where, for a given sound type (such as the pure tone), there exists a linear relationship between the Moore-Glasberg loudness and objective and subjective outcomes (e.g., Fig. 6), while the other types of sounds do not fit the same trendline, presumably because the ISO 532 methods were not originally intended or validated to compare sounds that are fundamentally distinct from one another [88].
A limitation of this study is that only a small range of simple signals was tested. Moreover, only non-verbal stimuli were tested because, while verbal messages can have richer semantic content, non-verbal ones can convey information faster [89]. Nevertheless, a direct comparison of verbal and non-verbal sounds would deserve further investigation.
Another limitation is that we did not have control over the participants' volume settings or the quality of their audio equipment, which is particularly relevant given that many participants were from low-income countries where laptops or PCs might be of inferior quality. However, we believe that the findings of our study, which concern relative comparisons between stimuli, are robust due to the substantial sample size. In our previous research using a similar crowdsourcing approach, we found that despite absolute differences in the outcomes between individual countries and variations in age and gender among participants from these nations, the ranking of stimuli in comparison to each other remains consistent across countries [68,90]. This means that the ordinal relationships are invariant, even though specific differences exist between the countries.
Lastly, the ecological validity of our experiment is limited because the sounds were tested in a computer environment and in a predictable scenario of an approaching vehicle. In real-world situations, road users may have to divide their attention over a large number of audiovisual stimuli, some distracting from the crossing task, which would potentially impact the detectability of auditory signals emitted from approaching cars.
Our study queried each participant about every sample, assessing to what extent the sound was easy to notice, to what extent it provided sufficient information that a vehicle was approaching, and to what extent it was annoying. The first two items exhibited a very strong correlation (r = 0.94) across the 30 samples and are thus nearly redundant. Accordingly, our three questions capture the two key features of sounds, specifically whether the sound is loud (powerful, potent) and whether it is annoying (unpleasant). Previous research, which examined a large number of samples from the automotive domain and encompassed a greater number of rating scales for these sounds, indicates a third dimension based on multivariate analysis. This dimension can be characterized as brightness or clarity and may also relate to the timbre of the sound [91,92]. In other words, our three questions may provide a reasonable depiction of the subjective attributes of sounds but are not exhaustive; future research could include a larger number of diverse rating scales. A disadvantage of this, however, is that the experiment may become unduly lengthy for participants and may be perceived as monotonous. As a remedy, a between-subjects design could potentially be chosen, wherein not all participants would need to evaluate all sound samples.
It is important to note that our findings do not necessarily imply that electric vehicles or eHMIs for automated vehicles should be made louder (and, by extension, more annoying). Doing so is one of the possibilities but not the only way of informing vulnerable road users. Another potential method to increase detectability and possibly increase crossing deterrence is to reduce background noise, for instance, by mitigating various forms of noise pollution on streets. For example, it may be expected that with the increasing electrification of vehicle fleets and the corresponding decline of combustion engine vehicles, artificial engine sounds may be more readily heard. Other options would be to continue research into sounds that are easily detectable yet not annoying, a topic that has been explored before in the literature [27].
Our research provides insights regarding the design of deterrent sounds that are nonetheless not annoying. Research suggests that implicit communication, defined by signals emanating from the vehicle itself, such as speed and pedestrian-vehicle distance, is often sufficient and that explicit forms of communication, like light-emitting eHMIs, hand gestures, or eye contact, are not frequently used or needed [93,94]. Our research seems to present a similar picture when it comes to exterior vehicle sounds, where the sounds of the tires and engine alone yielded a high crossing deterrence score while being considered the least annoying. One possible explanation is that tire and engine sounds constitute noise across a broad frequency spectrum, making them less annoying than an explicit tone. Another possible explanation is that tire and engine noises are familiar and naturally associated with a moving vehicle, whereas understanding the meaning of a beep or tone might require some learning, equivalent to visual eHMIs necessitating learning [95–97]. On the other hand, it should be noted that tire noise can also be confused with the sound of wind or other noises that are naturally present [18]. In conclusion, it is our belief that the quest for optimal exterior vehicle sounds, those that are effective without being annoying, resides in the domain of implicit communication, such as the use of tire or engine noise. These findings correspond with earlier suggestions made by [16,30].
Finally, as indicated in the introduction, it must be noted that the precise imitation of the sounds of an ICEV may not necessarily be required or optimal in this context. An interesting approach is used by Maunder [98], who refrained from imitating ICEV sounds. Instead, microphones were used to amplify the sound of the AV motors, aiming to provide a unique experience for both interior and exterior vehicle sounds. A sound resembling that of tires or engines could also be amplified and modified, an approach that is already being adopted by EV manufacturers to comply with legislation. This strategy could be further developed toward more intelligent or adaptive exterior vehicle sounds for AVs and EVs, a process that would necessitate the amendment of existing legislation.
To conclude, this study found a fairly strong relationship between the acoustic loudness of a sound and its effectiveness in preventing participants from crossing the road. The present findings provide a useful reminder that loudness may be the primary (yet not the only) factor to consider in exterior sound design for EVs and AVs. The study also suggests that intermittent beeps may need to be avoided as they may impede the pedestrian's ability to perceive the speed and distance of the approaching vehicle.

Fig. 2. Sound signal as a function of the elapsed time for two example stimuli without background noise (top) and the same stimuli with background noise (bottom).

Fig. 3 shows the spectrogram of the continuous pure tone at 2000 Hz, with and without background noise. From the spectrogram, the sound of the vehicle can be clearly distinguished. The change in frequency at around 7.2 s corresponds to the Doppler effect (see Section 2.2). The regular peaks between 3000 and 6000 Hz in the bottom spectrogram correspond to bird tweets in the background noise.
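As a rough indication of the size of this frequency change, the sketch below applies the classical Doppler formula for a moving source and a stationary listener to the 2000 Hz tone and the vehicle speed of 30 km/h used in this study. The speed of sound (343 m/s, air at about 20 °C) and the function name are assumptions, and the formula gives the limiting values for a vehicle moving directly toward or away from the listener; for a listener standing beside the road, the actual shift is smaller and changes gradually as the vehicle passes.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees Celsius


def doppler_frequency(source_hz, vehicle_speed_kmh, approaching=True,
                      speed_of_sound=SPEED_OF_SOUND_M_S):
    """Frequency heard by a stationary listener on the vehicle's line of travel."""
    v = vehicle_speed_kmh / 3.6  # km/h to m/s
    if v >= speed_of_sound:
        raise ValueError("formula only applies to subsonic sources")
    if approaching:
        return source_hz * speed_of_sound / (speed_of_sound - v)
    return source_hz * speed_of_sound / (speed_of_sound + v)


f_approach = doppler_frequency(2000.0, 30.0, approaching=True)
f_recede = doppler_frequency(2000.0, 30.0, approaching=False)
print(f"Approaching: {f_approach:.1f} Hz, receding: {f_recede:.1f} Hz")
```

Under these assumptions, the tone would be heard at roughly 2050 Hz during approach and roughly 1953 Hz after passing, that is, a total drop of just under 100 Hz.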

Fig. 4. A depiction of the questions that the participants received after listening to each sound.

Fig. 5. The mean safe cross perception rate (equivalent to the percentage of trials in which the 'F' key was pressed) plotted as a function of elapsed time for two selected samples. The legend shows the mean and SD of the crossing deterrence score. The gray background indicates the time interval across which the crossing deterrence score was computed.
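The safe cross perception rate described in this caption can be sketched as follows, assuming each trial is stored as a list of (start, end) times in seconds during which the 'F' key was held. The data layout and the function name are hypothetical, and the sketch covers only the percentage of trials with the key pressed; the crossing deterrence score itself is defined elsewhere in the paper and is not reproduced here.

```python
def safe_cross_rate(trials, times):
    """Percentage of trials in which the key is held at each time point.

    trials: list of trials, each a list of (start_s, end_s) key-hold intervals.
    times:  time points (in seconds) at which to evaluate the rate.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    rates = []
    for t in times:
        # Count the trials in which any hold interval covers time t.
        held = sum(
            any(start <= t < end for start, end in intervals)
            for intervals in trials
        )
        rates.append(100.0 * held / len(trials))
    return rates


# Hypothetical keypress data for four trials of one stimulus.
example_trials = [[(0.0, 2.0)], [(0.0, 1.0)], [], [(0.5, 3.0)]]
example_rates = safe_cross_rate(example_trials, [0.0, 0.5, 1.0, 2.0])
print(example_rates)
```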

14 males and 7 females. Collectively, they had an average age of 32.0 years (SD = 11.1, median = 30). None reported wearing headphones at the time of the study. The remainder of the analysis will be conducted for the 205 crowdsourced and the 21 additionally recruited participants together (n = 226). This combined sample consisted of 154 males, 70 females, and 2 participants who indicated 'I prefer not to respond' to the gender question. The mean age for this combined group was 36.6 years (SD = 11.5, median = 34.5).

Fig. 6. Scatter plot of the crossing deterrence score (based on keypress inputs) and the computed loudness score of the 30 stimuli. Filled markers pertain to sounds with added background noise; open markers relate to sounds without any added background noise.

Fig. 7. Scatter plot of the crossing deterrence score (based on keypress inputs) and the perceived annoyance of the 30 stimuli.Filled markers pertain to sounds with added background noise; open markers relate to sounds without any added background noise.

Fig. 8. Scatter plot of the perceived annoyance and the perceived 'easy to notice' of the 30 stimuli. Filled markers pertain to sounds with added background noise; open markers relate to sounds without any added background noise.

Fig. A1. Sound signal as a function of the elapsed time for the auditory stimuli (without background noise).

Fig. A3. Sound signal as a function of the elapsed time for the auditory stimuli (with background noise).

P. Bazilinskyy et al.

Table 2
Means, standard deviations, and correlation coefficients of the variables for the auditory stimuli (n = 30).