The soundscape dynamics of human agglomeration

We report a statistical analysis of the soundscape produced by agglomerations of people. Specifically, we investigate the normalized sound amplitudes and intensities that emerge from collective gatherings. Our findings support the existence of nontrivial dynamics characterized by heavy-tailed distributions of the sound amplitudes, long-range correlations in the sound intensity, and non-exponential return interval distributions. Additionally, motivated by the time-dependent behavior of the volatility (variance) series, we compare the observational data with those obtained from a minimalist autoregressive stochastic model, a GARCH process, and find good agreement.


Introduction
Physicists are now addressing problems very far from their traditional domain. Social phenomena, in particular, have become a common subject of research in statistical physics [1]. The general framework of statistical physics has been successfully applied to diverse interdisciplinary fields ranging from finance [2], genetics [3], and biology [4] to religion [5], tournaments [6], cuisine [7], and music [8].
Naturally, in social phenomena the basic constituent of the system is the human being. Humans are known to have nontrivial collective dynamics, much more complicated than those of idealized interacting physical systems. Moreover, detailed information about the individual social agents may not even be available. This complex scenario is reflected, in some sense, in several human activities. For instance, elections [9], collaborations between actors [10] and between scientists [11], phone text-message [12], mail [13,14], and email [14,15] communication, human travel [16,17], and collective listening habits [18,19] are just a few examples where complex structures have been found.
Most previous investigations deal with recorded data obtained directly or indirectly from the system and try to extract patterns or regularities in the system dynamics. This approach has become a trend in the investigation of social phenomena and of complex systems in general [20,21,22,23,24]. Within this framework a wide variety of data has been used, including sound. "Listening to" the system dynamics can be both a simple task and a minimally invasive measurement. In this direction, several studies have focused on sound time series. To mention just a few: studies of the acoustic emission from crumpled paper [25,26], from paper fracture [27], and from fractures in general [28,29] show several features related to critical phenomena; the power spectrum of music and speech presents 1/f-like behavior [30], and the normalized sound amplitude shows non-Gaussian features [31]; traffic flows have been investigated through their sound noise, revealing scaling and memory [32]; and avalanche-like dynamics have been found in the sound of popping bubbles in foams [33] and in lung sounds [34].
In this work, we investigate a very common situation in human collective activities: the agglomeration of people. Such agglomerations emerge in various places for different reasons, for example, people having lunch in a restaurant, parties, and work meetings. In all these situations a common and conspicuous feature is perceptible: the sound noise that results from the agglomeration. Our main goal is to show that nontrivial dynamics emerge when this kind of time series is analyzed. In addition, by employing a minimalist model, we are able to reproduce statistical aspects of the empirical data. In the following, we present the details of the data acquisition, the statistical analysis of the data, and our minimal model, and we end with a summary.

Data presentation
The observational data were obtained by recording the soundscape of an agglomeration of people during recreation time at our university. The meeting point is an open place where students spend time until the next class. All measurements were made with a condenser microphone (Shure Microflex MX202W/N) positioned in the central part of the agglomeration. We employed a sampling rate of 44.1 kHz in order to cover the full range of human hearing (approximately 20 Hz to 20 kHz). The measurements were made during different periods on nine days, totaling 16 recordings. The number of people during the recordings ranged approximately between 100 and 200, and these variations do not significantly change the statistical results. Typical recording times are about 10 minutes, and during each recording the number of people was approximately constant. We also analyzed 10 recordings from a web sound database, finding results similar to those of our measurements. Figure 1(a) shows a representative recorded signal in terms of the normalized sound amplitude $A_t$, i.e., the sound amplitude minus its mean value, divided by its standard deviation. Figure 1(b) presents the sound intensity, $A_t^2$, divided by its standard deviation. In these two figures we observe bursts in which the sound amplitude and the sound intensity reach values much larger than their standard deviations. Qualitatively, these extreme events may originate, for instance, from the fact that people want to be heard: if their neighbors are talking loudly, they also have to raise their voices.
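The normalization described above can be sketched as follows. This is only an illustration: the sample values are hypothetical placeholders for a recording, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def normalize_amplitude(raw):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(raw)
    std = statistics.pstdev(raw)  # population std (assumed convention)
    return [(a - mean) / std for a in raw]


def normalized_intensity(amplitude):
    """Square the normalized amplitude and divide by the std of the squares."""
    squared = [a * a for a in amplitude]
    std = statistics.pstdev(squared)
    return [s / std for s in squared]


# Hypothetical raw samples standing in for a microphone recording.
raw_samples = [0.02, -0.01, 0.05, -0.30, 0.45, 0.01, -0.02, 0.00]
A = normalize_amplitude(raw_samples)   # zero mean, unit standard deviation
I = normalized_intensity(A)            # non-negative, unit standard deviation
```

With real data, `raw_samples` would be replaced by the samples read from the audio file.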

Statistical analysis
One of the most direct ways to characterize the sound amplitude is to evaluate its probability density function (pdf). We show this analysis in Figure 2(a) for three typical recordings, together with a Gaussian distribution with zero mean and unit variance (dashed line). Very similar behavior was found for all the other realizations of the experiment and also for the web recordings (at least in the central part of the distribution). The empirical distribution clearly differs from the Gaussian one, especially for large values of the sound amplitude (|A| greater than four standard deviations). Naturally, this heavy-tailed behavior reflects the presence of the extreme events that we see qualitatively in Figure 1.
One way to investigate the dynamics of these extreme events is to evaluate the time intervals between them. These intervals are obtained by choosing a threshold value q and storing all the initial times $t_i$ at which the normalized sound intensity rises above this threshold. The difference between two consecutive times, $\tau_i = t_{i+1} - t_i$, is the so-called return interval. For Gaussian uncorrelated (or weakly correlated) random variables, the distribution of $\tau_i$ is well known to follow an exponential distribution, $p(\tau) \sim e^{-\tau/\bar{\tau}_q}$, where $\bar{\tau}_q$ is the average value of $\tau_i$ for the threshold value q. Additionally, empirical results have shown that, in the presence of power-law correlations in the data, the distribution is well fitted by a stretched exponential [35,36,37] or by a Weibull distribution [38], i.e.,

$p(\tau) = \frac{A}{\bar{\tau}_q} \exp\left[-B\,(\tau/\bar{\tau}_q)^{\gamma}\right]$ or $p(\tau) = \frac{A}{\bar{\tau}_q}\,(\tau/\bar{\tau}_q)^{\gamma-1} \exp\left[-B\,(\tau/\bar{\tau}_q)^{\gamma}\right]$,   (1)

where A and B are constants and γ is the exponent of the power-law autocorrelation function. These distributions also emerge in the analytical framework of Santhanam and Kantz [39] for long-range correlated noise with a Gaussian pdf. Notice that all these distributions depend on q, but this dependence is eliminated when the scaled variable $\tau_i/\bar{\tau}_q$ is used.
Before we investigate the return intervals, let us address the question of correlations by using detrended fluctuation analysis (DFA) [40]. This technique considers the root mean square fluctuation function F(n) of the integrated and detrended series over windows of size n (see for instance [41]). For self-similar series, $F(n) \sim n^{h}$, where h is the Hurst exponent: for h = 0.5 the series is uncorrelated (or short-range correlated), whereas for h > 0.5 the series is long-range correlated. Figure 2(b) shows the fluctuation function versus n for the same three recordings as in Figure 2(a); we found h ≈ 0.88, indicating that long-range correlations are present in the data. Note that the three curves are practically identical. This is confirmed by the mean value of h ($\bar{h}$) and its standard deviation ($\sigma_h$) over the 16 recordings: $\bar{h} = 0.88$ and $\sigma_h = 0.001$. For the web recordings these values remain close: $\bar{h} = 0.89$ and $\sigma_h = 0.01$.
Returning to the return interval distribution, we emphasize that the exponents h and γ are related via γ = 2(1 − h). Moreover, since the distribution of $\tau_i/\bar{\tau}_q$ must be normalized and have unit mean, the only fit parameter is γ, which can be obtained from h, leading to γ ≈ 0.24. Figure 2(c) shows this distribution for three values of q; we observe a reasonable data collapse but only modest agreement with the distributions of equation (1). A similar situation has recently been observed for non-Gaussian distributions related to water boiling [42].
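The extraction of return intervals and the relation γ = 2(1 − h) can be sketched as follows. The sketch assumes that the "initial time" of an event is the first sample at which the intensity rises above the threshold q, and it measures intervals in units of samples; the toy series and the function names are hypothetical.

```python
def exceedance_onsets(intensity, q):
    """Indices at which the series rises above the threshold q."""
    onsets = []
    above_before = False
    for t, value in enumerate(intensity):
        above_now = value > q
        if above_now and not above_before:   # start of a new excursion
            onsets.append(t)
        above_before = above_now
    return onsets


def return_intervals(intensity, q):
    """Differences between consecutive onset times, in samples."""
    onsets = exceedance_onsets(intensity, q)
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def scaled_intervals(intervals):
    """Divide each interval by the mean interval, giving unit mean."""
    mean = sum(intervals) / len(intervals)
    return [tau / mean for tau in intervals]


def gamma_from_hurst(h):
    """Correlation exponent from the DFA exponent via gamma = 2(1 - h)."""
    return 2.0 * (1.0 - h)


# Toy intensity series (hypothetical values) with threshold q = 1.
toy_intensity = [0, 5, 5, 0, 0, 6, 0, 7, 7, 7, 0, 9]
taus = return_intervals(toy_intensity, q=1.0)
scaled = scaled_intervals(taus)
gamma = gamma_from_hurst(0.88)
```

Multiplying the sample intervals by the sampling interval converts them to seconds; the scaled intervals are unaffected by this choice of unit.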
We can also investigate the bursts observed in Figure 1(a) by evaluating the volatility of the normalized sound amplitude. This time series is the local standard deviation of A(t) estimated over a time window w = n∆t, i.e.,

$v^2(t) = \frac{1}{n} \sum_{t'=t}^{t+(n-1)\Delta t} \left[A(t') - \langle A \rangle_w\right]^2$, where $\langle A \rangle_w = \frac{1}{n} \sum_{t'=t}^{t+(n-1)\Delta t} A(t')$,

n is an integer, and ∆t is the sampling time interval. Figure 2(d) shows the volatility distribution of our empirical data for time windows ranging from 1/100 to 1 second. Notice that the data collapse well and that this distribution has an asymptotic power-law decay characterized by an exponent η = 4.1. The mean value and the standard deviation of η calculated over the 16 realizations are, respectively, $\bar{\eta} = 4.29$ and $\sigma_\eta = 0.35$ ($\bar{\eta} = 4.90$ and $\sigma_\eta = 1.10$ for the web recordings).
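A minimal sketch of the volatility series is given below. It assumes non-overlapping windows and the population standard deviation; both are assumptions made for illustration, since a sliding window or a different normalization would also match the verbal definition. The helper `window_samples` is a hypothetical convenience for converting a window length in seconds to a number of samples at 44.1 kHz.

```python
import statistics


def window_samples(window_seconds, sampling_rate_hz=44100):
    """Number of samples n in a window of the given duration."""
    return round(window_seconds * sampling_rate_hz)


def local_volatility(amplitude, n):
    """Standard deviation of the amplitude in consecutive windows of n samples.

    Windows do not overlap; a trailing incomplete window is dropped.
    """
    if n < 2:
        raise ValueError("window must contain at least two samples")
    return [statistics.pstdev(amplitude[start:start + n])
            for start in range(0, len(amplitude) - n + 1, n)]


# Toy series: a flat window followed by an oscillating one.
toy_amplitude = [1, 1, 1, 1, 0, 2, 0, 2]
v = local_volatility(toy_amplitude, n=4)
```

For the window range quoted in the text, `window_samples(0.01)` and `window_samples(1.0)` give the corresponding values of n at the 44.1 kHz sampling rate.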

Modeling
Our starting point for modeling the data is the non-stationary character of the volatility. Figure 2(d) supports the conclusion that the volatility of the sound amplitude is a time-dependent stochastic process, and Figure 2(b) indicates that long-term memory is present in the sound intensity series. These features are very common in financial data, where the volatility (or risk) is one of the most essential ingredients of the price dynamics. Much work has been done in this context [43], and consequently a large number of models are available. From a qualitative point of view, the interactions (competition) among people in financial markets seem to be similar to those in our social system. This picture motivates us to apply a typical financial model to our data.
One of these models is the generalized autoregressive conditional heteroskedastic process, or simply the GARCH process. This model was proposed [44] (at least in part) to take into account the long memory typically found in financial data. It is defined in its most general form, GARCH(p, q), by

$x_t = \sigma_t\,\xi_t$, $\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i\, x_{t-i}^2 + \sum_{i=1}^{p} \beta_i\, \sigma_{t-i}^2$,

where $\alpha_i$ and $\beta_i$ are positive control parameters and $\xi_t$ is an uncorrelated random variable with zero mean and unit variance. Thus, the GARCH process is uncorrelated in $x_t$ but correlated in the variance. Also note that for $\beta_i = 0$ the GARCH process reduces to the so-called ARCH process [45].
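To make the process concrete, here is a minimal simulation sketch of the simplest case, GARCH(1, 1), written in the standard form x_t = σ_t ξ_t with σ_t² = α_0 + α_1 x_{t−1}² + β_1 σ_{t−1}² and Gaussian ξ_t. The parameter values in the example are illustrative only (chosen so that the stationary variance equals one and the sample variance converges quickly); they are not the fitted values, and starting the recursion at the stationary variance is an assumption of this sketch.

```python
import math
import random


def simulate_garch11(alpha0, alpha1, beta1, n_steps, seed=0):
    """Simulate x_t = sigma_t * xi_t with Gaussian xi_t and
    sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2 + beta1 * sigma_{t-1}^2."""
    if alpha1 + beta1 >= 1.0:
        raise ValueError("alpha1 + beta1 must be < 1 for a finite variance")
    rng = random.Random(seed)
    # Assumption: start the recursion at the stationary variance.
    var_t = alpha0 / (1.0 - alpha1 - beta1)
    xs = []
    for _ in range(n_steps):
        x_t = math.sqrt(var_t) * rng.gauss(0.0, 1.0)
        xs.append(x_t)
        # Update the conditional variance for the next step.
        var_t = alpha0 + alpha1 * x_t * x_t + beta1 * var_t
    return xs


# Illustrative parameters with stationary variance 0.1 / (1 - 0.9) = 1.
x = simulate_garch11(alpha0=0.1, alpha1=0.1, beta1=0.8, n_steps=100_000, seed=1)
sample_variance = sum(value * value for value in x) / len(x)
```

The simulated series can then be passed through the same analyses as the recordings (pdf, DFA, return intervals, volatility) for comparison.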
Here, for simplicity, and because it proves sufficient, we focus on the GARCH(1, 1) process and choose the distribution of $\xi_t$ to be the standard Gaussian. After this simplification the model has three parameters: $\alpha_0$, $\alpha_1$, and $\beta_1$. However, since the sound amplitude is scaled to unit variance, we can eliminate one of these parameters by using the expected variance of the GARCH(1, 1) process $x_t$:

$\sigma_x^2 = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}$.

We are thus left with two parameters, which we update incrementally to minimize, via the method of least squares, the difference between the simulated values of the sound amplitude and the observational ones. The best values for the parameters are $\alpha_1 = 0.011$ and $\beta_1 = 0.9889$, leading to $\alpha_0 = \sigma_x^2 (1 - \alpha_1 - \beta_1) = 0.0001$ since $\sigma_x = 1$. The comparison with the empirical data is shown in Figure 2, where the GARCH(1, 1) predictions are indicated by the continuous lines. The agreement between the data and the GARCH(1, 1) process is very good. Concerning Figure 2(b), where we compare the DFA results, we remark that the variable $x_t^2$ is not truly long-range correlated. In fact, its autocorrelation function decays exponentially [44], i.e.,

$\mathrm{corr}(x_t^2, x_{t+\tau}^2) \sim e^{-\tau/\tau_c}$, with $\tau_c = |\ln(\alpha_1 + \beta_1)|^{-1}$.

However, the GARCH(1, 1) process can mimic a long-range decay for large values of the characteristic time $\tau_c$. This is achieved by choosing the sum $\alpha_1 + \beta_1$ close to unity. In our case, $\alpha_1 + \beta_1 = 0.9999$, leading to a characteristic time $\tau_c \sim 10^4$ seconds, which is large enough to mimic, at least in part, the long-range correlations. Notice that the empirical data also deviate from the straight line, suggesting that the correlations present in the data may have a kind of exponential cutoff.
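The two constraints used in this paragraph can be checked numerically. The sketch below assumes the standard GARCH(1, 1) results: a stationary variance of α_0 / (1 − α_1 − β_1), and an autocorrelation of x_t² that decays as (α_1 + β_1)^τ, so that τ_c = 1 / |ln(α_1 + β_1)|. The function names are hypothetical, and τ_c is returned in units of the model time step.

```python
import math


def alpha0_for_variance(alpha1, beta1, variance=1.0):
    """alpha0 that gives the requested stationary variance."""
    persistence = alpha1 + beta1
    if not 0.0 < persistence < 1.0:
        raise ValueError("alpha1 + beta1 must lie strictly between 0 and 1")
    return variance * (1.0 - persistence)


def correlation_time(alpha1, beta1):
    """Characteristic decay time of the x_t^2 autocorrelation, in time steps."""
    return 1.0 / abs(math.log(alpha1 + beta1))


# Parameter values quoted in the text.
alpha1, beta1 = 0.011, 0.9889
alpha0 = alpha0_for_variance(alpha1, beta1)   # unit variance constraint
tau_c = correlation_time(alpha1, beta1)       # close to 1e4 time steps
```

Because α_1 + β_1 is so close to one, small changes in either parameter shift τ_c substantially, which is why the exponential decay can pass for a power law over the range of scales examined.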

Summary
In this work we investigated some statistical aspects of the collective sound emitted by people gathered in a meeting place. The empirical evidence showed that (i) the normalized sound amplitude is not Gaussian distributed, (ii) the sound intensity presents long-range correlations, (iii) the return interval distribution of the sound intensity is not exponential, and (iv) the volatility of the sound amplitude is nonstationary, with a power-law tail in its distribution. Motivated by the time dependence of the volatility, we compared the observational quantities with the predictions of the GARCH(1, 1) model, finding good agreement with all of them.
Before concluding, we would like to point out some possible mechanisms responsible for the presence of heavy-tailed distributions and long-term correlations in the data. First, humans already have intrinsically complex behavior, which may manifest itself in our measurements. Second, these individuals form small interacting groups, adding more complexity to the system. Third, interactions also emerge between groups. Naturally, more detailed measurements and models than those presented here should be considered in order to obtain a broad understanding of this system.