Audible Threshold of Early Reflections with Different Orientations and Delays

Early reflections are an important component of an enclosed sound field. Owing to the precedence effect, an early reflection may not be the dominant factor in sound source localization; however, it still clearly influences perceived spatial position, loudness, timbre, and other attributes. To date, few studies have evaluated the audible threshold of early reflections, and general, quantitative results are lacking. This work investigated the audible threshold of early reflections with a simplified sound field model under various experimental conditions, comprising combinations of eight incidence angles and five time delays. A three-down-one-up adaptive strategy with a three-interval three-alternative forced-choice (3I-3AFC) paradigm was used because of its efficiency and robustness. Results indicate that: (1) the audible threshold of early reflections decreases monotonically with increasing time delay relative to the direct sound, and a linear equation relating early reflection threshold to time delay is established with a correlation coefficient higher than 0.9; (2) when the direct sound and the reflection are located in the same half-plane, the audible threshold of early reflections decreases with increasing angular deviation between the direct sound and the reflection, and a front-back symmetry of the early reflection threshold is observed for stimuli below 5 kHz; (3) considerable variations in early reflection threshold are found among individuals, especially at large angular deviations and long time delays of the early reflection relative to the direct sound.


Introduction
A sound field in an enclosed space can basically be divided into three components: the direct sound, early reflections, and late reflections (i.e. reverberation) [1,2]. Many studies have demonstrated that the direct sound has a dominant effect on the localization of the auditory event. Compared with the direct sound, early reflections usually cannot form a separate auditory event because of the precedence effect (also called the law of the first wavefront) [3,4]. However, they may change sound attributes (such as loudness, timbre, and spatiality) and influence perceived source width, speech intelligibility, and so on [5-9]. Early reflections are usually defined as those arriving within the initial 50 ms to 80 ms after the direct sound [6,10,11]. From a physical point of view, early reflections represent the first few orders of interaction of sound with the surroundings, such as floors, ceilings, and sidewalls. Because of the room masking effect, early reflections may be masked by the direct sound as well as by the late reflections, which makes them difficult to detect. Accordingly, the level of an early reflection relative to the direct sound at which it begins to cause a detectable change in auditory perception is defined as the audible threshold of early reflections. Studies of this threshold are important for understanding the human auditory mechanism in acoustically complex environments. Moreover, they are helpful in determining the necessary absorptive treatment in architectural acoustic design, reducing the complexity of digital signal processing parameters in real-time auralization systems, and designing loudspeaker-based sound reproduction [12-14].
The audible threshold of early reflections has been reported previously [9,15-18]. Bech examined early reflection audible thresholds in the context of a loudspeaker-based simulation system positioned in a listening room of normal size [9]. Olive and Toole measured the audible threshold of early reflections in stereophonic reproduction in rooms of domestic or control-room size by using three loudspeakers [15]. Taking advantage of virtual auditory technology, Begault et al. obtained the audible threshold for early reflections as a function of spatial position, time delay, stimulus type, and reverberation in an ordinary room [16] or a simulated 5.1 surround sound listening room [17]. In Reference 17, rules of thumb, such as "a modest amount of reverberation added to anechoic speech stimuli (reverberant-direct ratio of -20 dB) increases thresholds by up to 11 dB", were first formulated for relevant applications. Furthermore, Buchholz et al. pointed out that room masking is the underlying mechanism of the early reflection audible threshold, and developed a monaural perceptual model of room masking [18]. As this review shows, previous measurements of the early reflection threshold were mainly confined to specific rooms or reproduction layouts, so the spatial and temporal distributions of early reflections relative to the direct sound were limited. Moreover, the number of subjects in previous studies was always fewer than ten because of the intensive experimental load. In summary, the audible threshold of early reflections has not been systematically investigated over a complete set of parameters, and a quantitative relationship between the audible threshold and the parameters of early reflections has not yet been established.
This work aims to measure the audible threshold of early reflections comprehensively under various conditions, and to establish a formula relating the audible threshold to the time delay of early reflections. As seen from the above literature, with the development of modern signal processing, experiments on the early reflection threshold have moved from real loudspeaker setups to virtual headphone-based reproduction [19]; the latter allows flexible parameter manipulation and hardware control. Therefore, this work employed virtual headphone-based reproduction to evaluate the audible threshold of early reflections. The paper is organized as follows: Section 2 describes the model and paradigm in detail; Section 3 analyzes the results qualitatively and quantitatively, and proposes the linear relationship between audible threshold and time delay of early reflections; Section 4 compares the present results with those of previous studies and examines some existing findings; finally, a summary is given in Section 5.

Method
2.1 A simplified sound field model
Although an infinite number of temporal and spatial distributions of early reflections exist in real cases, this work used a simplified sound field consisting of a single direct sound and a single early reflection. As pointed out in References 9 and 17, the presence of multiple reflections, whether early or late, raises the audible threshold of an individual reflection because of masking. Therefore, the simplified model with a single direct sound and a single early reflection represents the worst case, in which listeners are most sensitive to the early reflection, and yields the minimum value of the audible threshold. This model has also been used in research on the precedence effect [3,4] and on the simplification of binaural head-related impulse responses [20]. Note that, although the simplified sound field may cause some unnatural perception, this does not affect discrimination in the three-interval three-alternative forced-choice paradigm; moreover, the experiment was designed to test a principle and therefore need not be realistic.

2.2 Experimental method
The transformed up-down adaptive method is recognized as a robust and efficient way to estimate thresholds [21]; in this method, the stimulus level on any one trial is determined by the preceding stimuli and responses. Four aspects need to be considered in the experimental design.
(1) Choice of up-down strategy. The up-down strategy determines the convergence point on the psychometric function. A three-down-one-up adaptive strategy, which converges on the level yielding 79.4% correct responses, was adopted here. Under this rule, three consecutive positive responses lead to a decrease in the level of the early reflection, whereas a single negative response leads to an increase.
(2) Choice of response paradigm. On each trial, stimuli were presented sequentially according to a three-interval three-alternative forced-choice (3I-3AFC) paradigm, chosen for its efficiency and robustness [22]. In 3I-3AFC, a trial (or stimulus presentation) consists of three segments, each of which is either the reference stimulus A or the comparison stimulus B. Here, the reference stimulus A contained only a single direct sound, while the comparison stimulus B contained a single direct sound plus an early reflection. There were three kinds of stimulus presentation in total: A-A-B, A-B-A, and B-A-A. Subjects were asked to judge which segment differed from the other two, based on any perceived difference, and then to give a response.
(3) Choice of initial value and step size of stimulus level. The level of the early reflection relative to the direct sound in the first trial is termed the initial value. To make it easy for subjects to give positive responses at the start, the initial level of the early reflection was set to 0 dB relative to the direct sound. The step size is usually reduced from a large to a small value after a certain number of trials, so that the track converges gradually. In this work, the initial step size of the early reflection level was 8 dB, and it was halved at each reversal until a step size of 1 dB was reached.
(4) Choice of termination condition. In the up-down adaptive method, a run is a sequence of trials in which the changes in stimulus level are all in one direction ("up" or "down"), and a reversal is the point at which the direction of stimulus adjustment changes. As recommended by Wetherill and Levitt [23], a test should be terminated after six to eight reversals. In this work, a test was terminated after a total of eight reversals, and the threshold was calculated as the mean level at the final five reversals.
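The 79.4% figure quoted in item (1) can be recovered from the usual equilibrium argument for transformed up-down rules: the track settles at the level where a downward step (three consecutive correct responses) is as likely as an upward step. Writing p for the probability of a correct response on a single trial at the convergence level (a symbol introduced here for illustration), this gives:

```latex
p^{3} = \tfrac{1}{2}
\quad\Longrightarrow\quad
p = 2^{-1/3} \approx 0.794
```

For comparison, guessing in a three-alternative task yields 1/3 correct, so the convergence point lies well above chance.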
Figure 1 shows an illustrative experimental block in the measurement of the audible threshold of early reflections. The initial level of the early reflection in the comparison stimulus B was set to 0 dB relative to the direct sound. Under this condition, the subject was able to make three consecutive positive responses because of the obvious perceptual difference between stimuli A and B. Thus, the level of the early reflection was decreased from 0 dB to -8 dB, the initial step size being 8 dB. The subject was still able to make three consecutive positive responses, and the level was then decreased from -8 dB to -16 dB. At this level, the first reversal occurred because the subject made a negative response, and the level was increased from -16 dB to -12 dB with a step size of 4 dB (i.e. half of the initial step size of 8 dB). The second reversal then occurred because the subject again gave three consecutive positive responses, and the level was decreased from -12 dB to -14 dB with a step size of 2 dB (i.e. half of the previous step size of 4 dB). The step size became 1 dB after the third reversal and remained at 1 dB until the eighth reversal. The mean value across the last five reversals (No. 4 to No. 8), -16.2 dB, was taken as the audible threshold of early reflections for this experimental block.
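The track described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the experiments: the function name run_staircase, its parameters, and the deterministic simulated listener are all hypothetical, and the sketch assumes that a positive response means a correct identification of the odd segment.

```python
from statistics import mean


def run_staircase(respond, start_db=0.0, first_step_db=8.0, min_step_db=1.0,
                  n_reversals=8, n_average=5, max_trials=500):
    """Three-down-one-up track (hypothetical helper, for illustration only).

    respond(level_db) -> bool stands in for one 3I-3AFC trial and returns
    True when the odd segment was identified correctly.
    Returns (threshold_db, reversal_levels, n_trials).
    """
    level = start_db
    step = first_step_db
    correct_run = 0          # consecutive correct responses at this level
    last_direction = None    # -1: level last moved down, +1: last moved up
    reversals = []
    trials = 0
    while len(reversals) < n_reversals and trials < max_trials:
        trials += 1
        if respond(level):
            correct_run += 1
            if correct_run < 3:
                continue             # need three in a row before stepping down
            direction = -1
        else:
            direction = +1           # a single error steps the level up
        correct_run = 0
        if last_direction is not None and direction != last_direction:
            reversals.append(level)              # level at which the track turned
            step = max(step / 2.0, min_step_db)  # halve the step, floor at 1 dB
        level += direction * step
        last_direction = direction
    if len(reversals) < n_reversals:
        raise RuntimeError("track did not reach the requested number of reversals")
    return mean(reversals[-n_average:]), reversals, trials


# Simulated listener (an assumption): always correct at or above -15.5 dB.
threshold_db, reversal_levels, n_trials = run_staircase(lambda level_db: level_db >= -15.5)
print(threshold_db, reversal_levels, n_trials)
```

With this idealized listener the first moves reproduce the pattern described for Figure 1 (0 dB, -8 dB, -16 dB, then up to -12 dB and down to -14 dB); a real subject responds probabilistically, so the later reversals and the resulting threshold will differ.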

2.3 Experimental apparatus and procedure
The head centre was chosen as the origin of coordinates. The distance from a sound source to the origin is denoted by r. The sound source direction is specified by the azimuth θ from 0° to 360° and the elevation ϕ from -90° to 90°, where ϕ=0° and ϕ=90° correspond to the horizontal plane and the top, respectively. In the horizontal plane, θ=0° is the front and θ=90° is the right.
Figure 2 illustrates the modules of the experimental platform. In the binaural stimulus synthesis module, the binaural signals representing sound transmission from the source to the ears through direct or reflected paths were synthesized using the virtual auditory technique, in which a virtual source is synthesized by filtering a mono stimulus with the head-related transfer functions (HRTFs) for the position of the intended virtual source [19]. In our experiments, an HRTF dataset of KEMAR was used because KEMAR is regarded as a representative manikin and has been widely used in research on binaural hearing [24-27]. Our laboratory's dataset contains HRTFs at 3259 spatial positions at r=1.0 m with a sampling frequency of 44.1 kHz [25]. A typical segment of Chinese speech ("Mei Tan Bu Mei"), which has been adopted as one of the materials for subjective assessment of the sound quality of electro-acoustical products in Chinese National Standard GSBM.6001-89, was used as the mono stimulus [30]. In the experiments, the direct sound was always fixed directly in front of the subject at (θ=0°, ϕ=0°). To investigate thoroughly how the audible threshold of early reflections varies with incidence angle and with time delay relative to the direct sound, eight horizontal incidence angles of the early reflection (θ=0°, 30°, 45°, 60°, 90°, 120°, 150°, and 180°; ϕ=0°) were combined with five time delays (10 ms, 20 ms, 30 ms, 40 ms, and 50 ms), giving 40 experimental conditions. Note that, although HRTFs represent sound transmission from a source to the ears in the free field without reflections, the reflected path in the comparison stimulus B described in Section 2.2 can also be synthesized by filtering the mono stimulus with the HRTFs for the position of the mirrored virtual source determined by the image-source method [28,29].
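The synthesis of the comparison stimulus B can be illustrated as follows. This is a rough sketch, not the authors' implementation: the function names and arguments are hypothetical, the HRTFs are assumed to be available as time-domain impulse responses (one left/right pair per direction), and the reflection level is assumed to be an amplitude ratio in dB relative to the direct sound. A plain-Python convolution keeps the sketch free of dependencies; an array library would normally be used for real speech signals.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def synthesize_binaural(mono, hrir_direct, hrir_reflection,
                        delay_ms, level_db, fs=44100):
    """Return [left, right] signals for a direct sound plus one reflection.

    hrir_direct and hrir_reflection are (left, right) pairs of impulse
    responses for the direct-sound direction and for the direction of the
    mirrored (image) source, respectively.
    """
    delay = round(delay_ms * 1e-3 * fs)   # reflection delay in samples
    gain = 10.0 ** (level_db / 20.0)      # dB re direct sound -> amplitude
    ears = []
    for h_dir, h_ref in zip(hrir_direct, hrir_reflection):
        direct = convolve(mono, h_dir)
        reflection = convolve(mono, h_ref)
        out = [0.0] * max(len(direct), delay + len(reflection))
        for n, v in enumerate(direct):
            out[n] += v
        for n, v in enumerate(reflection):
            out[delay + n] += gain * v    # delayed, attenuated reflection
        ears.append(out)
    return ears


# Toy example with made-up two-tap impulse responses and a 1 kHz sample rate.
left, right = synthesize_binaural(
    [1.0],
    ([1.0, 0.0], [1.0, 0.0]),      # placeholder "HRIRs" for the direct sound
    ([0.5, 0.0], [0.25, 0.0]),     # placeholder "HRIRs" for the reflection
    delay_ms=10, level_db=-20.0, fs=1000)
```

In an adaptive block, only level_db changes from trial to trial, while the delay and the reflection direction stay fixed for the condition under test.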
A pair of circumaural headphones (Sennheiser HD 250 II) was used to render the synthesized binaural signals. To eliminate the adverse influence of the non-ideal headphone transfer characteristics on reproduction performance, headphone equalization was implemented in the headphone equalization module [31] (see Figure 2). The three-down-one-up adaptive procedure with the 3I-3AFC paradigm described in Section 2.2 was implemented through a graphical user interface (GUI) created in MATLAB. On each trial, the subject indicated his or her response by pressing the corresponding button in the GUI.
Fifteen human subjects aged 21 to 25 years participated in the listening experiments. Audiometric screening confirmed that all fifteen subjects had normal hearing (thresholds within 15 dB HL below 8 kHz) and well-balanced hearing in both ears. To ensure experimental stability, the subjects underwent an extensive training program before the main experiments, comprising procedural training and auditory training, which were intended to familiarize them with the experimental procedure and the signals, respectively. In total, each subject completed 40 experimental blocks (8 incidence angles × 5 time delays). Note that the number of trials in each block varied among subjects because of differences in auditory discrimination ability and experimental stability. For example, in Figure 1, 32 trials were carried out before eight reversals were reached. In general, 30 to 60 trials were needed before a test terminated.

Results
Figure 3 shows the mean early reflection thresholds across the fifteen subjects. As shown, the early reflection threshold decreases monotonically with increasing time delay between the direct sound and the reflection for all incidence angles, which is consistent with previous reports (for example, see Figure 6 in Reference 15). This phenomenon is attributable to release from masking by the direct sound as the time delay of the reflection relative to the direct sound increases. Moreover, a clear linear relationship between early reflection threshold and time delay is observed, and a linear fit was therefore performed (see Table 1). The fitting results indicate that, although the early reflection thresholds for different incidence angles have different values (i.e. different intercepts), the rate at which the threshold decreases with time delay is similar (i.e. similar slopes). Compared with the frontal incidence angles, the relationship between early reflection threshold and time delay for the lateral and rear incidence angles (θ≥90°) deviates slightly from linearity, which may be due to the complex diffraction of the early reflection caused by the protruding shape of the pinna.
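A straight-line fit of threshold against time delay, of the kind summarized in Table 1, can be computed with ordinary least squares. The sketch below is only an illustration: the function name is hypothetical, and the threshold values in the example are made-up placeholders, not the measured data.

```python
from math import sqrt


def fit_threshold_vs_delay(delays_ms, thresholds_db):
    """Least-squares line threshold = slope * delay + intercept.

    Returns (slope in dB/ms, intercept in dB, Pearson correlation r).
    """
    n = len(delays_ms)
    mean_x = sum(delays_ms) / n
    mean_y = sum(thresholds_db) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_ms)
    syy = sum((y - mean_y) ** 2 for y in thresholds_db)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(delays_ms, thresholds_db))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


delays_ms = [10, 20, 30, 40, 50]
# Placeholder thresholds for one incidence angle (illustrative only,
# NOT the values reported in Table 1).
placeholder_thresholds_db = [-12.0, -15.5, -18.0, -21.5, -24.0]
slope, intercept, r = fit_threshold_vs_delay(delays_ms, placeholder_thresholds_db)
print(f"slope = {slope:.3f} dB/ms, intercept = {intercept:.2f} dB, r = {r:.3f}")
```

Because the threshold falls as the delay grows, the Pearson coefficient computed this way is negative; it is presumably its magnitude that corresponds to the correlation coefficient quoted in the text.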
Figure 4 shows how the mean early reflection threshold varies with the incidence angle of the early reflection. For each time delay, two general tendencies are observed: (1) When the reflection deviates from the direct sound within the frontal half-plane (θ=0° to 90°), the early reflection threshold decreases gradually, and the maximum threshold occurs when the direct sound and the reflection coincide in space (i.e. both at θ=0°). This phenomenon is also attributable to release from masking by the direct sound as the difference in spatial position between the direct sound and the reflection increases.
(2) There is a front-back symmetry of the early reflection threshold for each time delay; that is, the early reflection thresholds for θ=0°, 30°, and 60° are close to those for θ=180°, 150°, and 120°, respectively. The frequency content of the Chinese speech employed in the current work lies mainly below 5 kHz. As pointed out in our previous study [32], human anatomical structures (such as the head) exhibit good front-back symmetry below 5 kHz, so the sound transmission path for a frontal incidence (e.g. θ=30°) is similar to that for the mirrored rear incidence (e.g. θ=150°). Therefore, the early reflection thresholds for the two positions (e.g. θ=30° and 150°) are similar.

Figure 3. The mean value of early reflection threshold varying with time delay.
Although the general tendencies of the mean early reflection thresholds are obvious in Figures 3 and 4, considerable inter-subject differences are observed, as shown in Figure 5, especially at lateral positions. The maximum standard deviation of 4.2 dB appears at an incidence angle of 120° and a time delay of 50 ms, while the minimum standard deviation of 1.7 dB occurs at an incidence angle of 0° and a time delay of 50 ms. For most incidence angles and time delays, the standard deviation of the early reflection threshold is around 2 to 3 dB.

Discussion
This work comprehensively investigated the audible threshold of early reflections at various incidence angles and time delays using an HRTF-based virtual auditory technique, and established a linear relationship between early reflection threshold and time delay. The results indicate that, although early reflection thresholds vary among individuals, the general tendencies are clear and consistent with the previous literature.
Previous studies reported that the early reflection threshold decreases monotonically with increasing time delay between the direct sound and the reflection [15,17], which is also observed in this work (see Figure 3). However, in those studies, this conclusion was derived from a limited number of time delays, since they were mainly confined to specific rooms or loudspeaker layouts. In contrast, the current work employed time delays from 10 ms to 50 ms at a uniform interval of 10 ms, covering the whole time span of early reflections. Previous studies also demonstrated that the early reflection threshold decreases with increasing angular deviation between the direct sound and the reflection, which is likewise observed in this work (see Figure 4). Similarly, previous studies derived this conclusion from a limited number of incidence angles, mainly distributed in the frontal half-plane, whereas the current work employed incidence angles from directly in front to directly behind. On the basis of these more extensive experimental data, this work makes two main contributions: (1) establishing the linear relationship between early reflection threshold and time delay; (2) pointing out the front-back symmetry of the early reflection threshold.
Although there is good qualitative agreement between the current and previous works, some quantitative differences exist. For example, the early reflection threshold at an incidence angle of 0° and a time delay of 10 ms was about -18 dB in the work of Begault et al. [17], but about -12 dB in this work. As previous studies pointed out, the threshold for a single early reflection in combination with a direct sound is independent of reproduction level, provided that the absolute level of the reflection is above the hearing threshold [9]. Therefore, the difference in early reflection thresholds is unlikely to be attributable to a difference in reproduction level between the two studies. Since the early reflection threshold is highly stimulus-dependent [9], the difference may be due to the different acoustical characteristics of the foreign-language speech used in Begault's work and the Chinese speech used in the current work.
In the virtual auditory technique, individualized HRTFs measured from the listener are optimal for accurate source localization [33]. However, this work used generic HRTFs from KEMAR for the following reasons. First, in a study evaluating the spatial discrimination threshold of HRTF magnitude, the authors concluded that it is reasonable to use mean data (such as HRTFs from KEMAR) rather than individualized data when evaluating discrimination thresholds, just as statistical mean HRTFs are used in the ANSI loudness model; moreover, the ability of auditory discrimination depends on human hearing rather than on the HRTF data [34]. Second, individualized HRTFs are difficult to obtain because they require tedious measurements and sophisticated equipment. Finally, this work aims to obtain general results on the early reflection threshold by using the mean value across all subjects, which naturally averages out the individual characteristics of each subject.

Conclusions
This work investigated the audible threshold of early reflections in the worst case by using a simplified sound field model under various experimental conditions (different incidence angles and time delays). The major conclusions are summarized as follows.
(1) The audible threshold of early reflections decreases monotonically with increasing time delay relative to the direct sound. Moreover, a good linear relationship between early reflection threshold and time delay is established, with a correlation coefficient higher than 0.9.
(2) When the direct sound and the reflection are located in the same half-plane, the audible threshold of early reflections decreases with increasing angular deviation between the direct sound and the reflection. Moreover, a front-back symmetry of the early reflection threshold is observed.
(3) There are considerable variations in early reflection threshold among individuals, which grow with increasing angular deviation and time delay, although the general tendency remains distinct.

Figure 1. An illustrative experimental block with eight reversals in measurement of the audible threshold of early reflections.

Figure 2. Modules of the experimental platform.

Figure 4. The mean value of early reflection threshold varying with incidence angle.

Figure 5. The standard deviation of inter-subject differences in early reflection thresholds.

Table 1. The linear fitting between early reflection threshold and time delay.