Introduction

Noise pollution is a serious environmental stressor affecting human health, second only to ultra-fine particulate matter (PM2.5) in its impact. Noise may be defined as unwanted and unwelcome sound which causes nuisance and irritability, or more simply as sound out of place. About 75% of Europe’s population lives in urban areas and, with increasing urbanisation, population density and associated daily human activity, urban environments are becoming noisier and complaints against environmental noise are increasing. Approximately 20% of the European population is exposed to unacceptable noise levels. Road traffic is the main cause of noise pollution in urban settings, followed by rail and air traffic and industry. Beyond Europe the situation may be even worse because noise pollution is not always considered an environmental problem there.

Although people are generally quite resilient to noise exposure, the level of adaptation (which is never totally complete) differs substantially from individual to individual. Long-term exposure can lead to annoyance, stress, sleep disturbance and daytime sleepiness, and can affect cognitive performance in schoolchildren as well as staff performance and patient outcomes in hospitals. Apart from causing hearing loss and tinnitus, exposure to noise is also linked to the occurrence of hypertension and an increased risk of heart attack, stroke, ischemia and other cardiovascular diseases.

Noise pollution is characteristically a spatial and temporal phenomenon, largely shaped by urban form and land-use. Considering the elements of urban form in the assessment of urban noise is undoubtedly important for understanding exposure to noise at the street scale. It is widely accepted that the mapping and modelling of noise and sound across space is crucial for visualising and quantifying potential impacts in urban environments. Yet, despite all the evidence and the ability to measure it precisely, the amount of averaged energy or peak levels of noise pollution that people are exposed to in cities has been neglected. Even less is known about the spatial pattern of the various sound frequency bands that make up the acoustic environment in urban areas. Indeed, very few large-scale surveys have been carried out to capture urban sounds and noise from all sources at street level in a spatially-explicit manner, most noise maps being based on traffic data and sound propagation modelling. Assessing the acoustic environment of an entire city is important because the city is often the management and governance unit within which planning decisions are made. If planners knew the entire landscape they were managing, more informed decisions could be made. However, traditional large-scale acoustic surveys and soundscape assessments are complex, expensive and time-consuming to mount. Most involve measurement at a small scale at either fixed locations or at stops along soundwalks where sound (or noise) levels and people’s reactions are recorded for at least five minutes per location.

In this paper, the authors present a different approach, using mobile acoustic surveys made by walking observers. Using this rapid survey approach, we were able to record ~52,000 sound clips across an entire city (~52 km²) and report here the findings from a data mining analysis of the spatial patterning in sound pressure levels according to frequency bands and band combinations. To illustrate the value of a city-wide sound survey, we consider two applications: one related to biodiversity conservation and human well-being; and the second to studies of environmental social equity.

Concern about the negative health and well-being consequences of noise has led many authors to ask where its impacts are least felt, and this is especially relevant to urban greenspace as a key provider of ecosystem services. There is also growing recognition that acoustic environments that are good for humans may also benefit urban biodiversity. The linking concept is “naturalness”, recognised as a desirable but elusive property in biodiversity conservation since the 1970s and now applied in studies of well-being. Methods to characterize naturalness in the acoustic environment have come from soundscape ecology, although they have been developed mainly for natural areas. Frequency bands have been divided into the characteristic sounds of the biotic world, human sources and the physical environment, known as biophony, anthropophony and geophony respectively. There remains doubt, however, whether this classification is fully applicable to urban areas because of the potentially different spectral composition present. If ecoacoustic indices could be applied in urban areas, they could offer a useful way to characterize the spatial pattern in naturalness within cities.

One of the potential consequences arising from spatial pattern in the acoustic environment is social inequity in exposure. This applies not only to noise but also to differential exposure to the various frequency components. There is often a general presumption that socially disadvantaged groups are more exposed to environmental stressors including noise, although this view has not been supported in some studies. Again, such questions are challenging to address at the whole-city scale and most studies have focused only on road traffic noise rather than sounds from all sources at street level. Using the data gathered from our rapid survey approach, we consider whether perceptions of inequity depend on the sound frequency bands being studied and not only the usual A-weighted noise levels (dBA).

The research questions specifically addressed in this paper at the whole city scale are therefore:

  1. How does sound frequency composition differ across urban space?
  2. How useful are ecoacoustic indices at characterising naturalness across the city?
  3. Do perceptions of social inequity in sound exposure differ according to the frequency bands considered?

Methods

Data collection and pre-processing

Sound data were collected in the city of Southampton on the south coast of the UK and covered the whole area (51.8 km²) within the administrative boundaries of the City Council. Much of Southampton lies on a peninsula formed by the Itchen and Test rivers which funnel traffic towards the city centre and the busy freight and passenger terminals at the port. The city’s traffic congestion causes air pollution and noise problems, yet this contrasts with many quieter areas of greenspace amounting to c. 11.0 km². This varied urban form might be expected to lead to a similarly varied acoustic environment.

Using a spatially-stratified sampling scheme based on land characteristics (see details below), continuous sound recordings were made by walking surveyors within the period 14 July to 25 August 2016 during the morning rush hour (7:00–9:00), afternoon (13:00–15:00) and evening rush hour (16:30–18:30). It was not practicable to define precisely the routes taken by surveyors due to road closures, traffic and other constraints beyond our control. Instead, start and finish points were defined and surveyors were asked to walk through as many land cover types as possible along their route. The data for all time periods have been merged for the analyses in this paper and temporal differences are not considered further.

Recordings were made using Fostex FR–2LE and TASCAM DR–40 recorders, with PCB sensor signal conditioners and microphones. Equipment was carried inside rucksacks, with the microphone mounted above the shoulders at a height of 1.65–1.70 m from the ground. To minimise impact on recordings, surveyors wore soft-soled shoes and soft clothes, avoided jangling accessories such as necklaces, and walked at a constant pace. Surveys were not carried out on windy or rainy days, although variations in wind during surveys were inevitable. The location of the walking observer was logged every ten seconds as a track using a Garmin Oregon 400t GPS unit.

The recording sampling rate used was 96 kHz to ensure coverage of the ultrasonic range, theoretically to 48 kHz but limited to 22.7 kHz here due to microphone sensitivity. Microphones were calibrated using a Brüel & Kjær sound level calibrator type 4230 emitting 94 dB at 1000 Hz, at the beginning or end of survey days. Both low and high frequency components showed some difference between equipment sets and therefore an empirical calibration was additionally used with corrections of around 1 dB below 44 Hz and 2 dB above 5.7 kHz. Sound clips were saved as uncompressed wav files. Custom-written routines in Matlab used Fast Fourier Transforms to calculate the sound pressure levels (SPL) in octave bands (Table 1) for non-overlapping 10-second clips from each sound recording, centred on the timestamp of each stored GPS coordinate. SPLs were then used to calculate dBA (A-weighted decibel levels) through the logarithmic addition of the factors in Table 1.

Table 1

Octave bands used for analysing the frequency composition of city sound clips and conversion factors for A-weighting (dBA).

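| Octave band | Lower frequency (Hz) | Upper frequency (Hz) | A-weighting factor (dB) |
|---|---|---|---|
| B1 | 11 | 22 | –57.21 |
| B2 | 22 | 44 | –39.80 |
| B3 | 44 | 88 | –26.43 |
| B4 | 88 | 177 | –16.21 |
| B5 | 177 | 355 | –8.65 |
| B6 | 355 | 710 | –3.22 |
| B7 | 710 | 1420 | 0.01 |
| B8 | 1420 | 2840 | 1.20 |
| B9 | 2840 | 5680 | 0.96 |
| B10 | 5680 | 11360 | –1.17 |
| B11 | 11360 | 22720 | –6.75 |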
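The logarithmic addition mentioned in the Methods can be sketched as follows. This is only an illustration in Python, not the authors' Matlab routine: the function name and the flat 60 dB example spectrum are hypothetical, and the weighting factors are simply copied from Table 1. Each band level is shifted by its A-weighting factor, converted to linear power, summed, and converted back to decibels.

```python
import math

# A-weighting factors (dB) for octave bands B1..B11, copied from Table 1.
A_WEIGHTS = [-57.21, -39.80, -26.43, -16.21, -8.65, -3.22,
             0.01, 1.20, 0.96, -1.17, -6.75]

def a_weighted_level(band_spl_db):
    """Combine 11 octave-band SPLs (dB) into one A-weighted level (dBA)."""
    if len(band_spl_db) != len(A_WEIGHTS):
        raise ValueError("expected one SPL value per octave band (11 values)")
    # Weight each band, convert to linear power, sum, then return to decibels.
    total_power = sum(10 ** ((spl + w) / 10)
                      for spl, w in zip(band_spl_db, A_WEIGHTS))
    return 10 * math.log10(total_power)

# Hypothetical clip with a flat 60 dB spectrum in every band.
example_dba = a_weighted_level([60.0] * 11)
print(round(example_dba, 1))
```

Because the sum is taken on linear power, the loudest weighted bands dominate the result; the very low bands (B1 to B3) contribute almost nothing to dBA after weighting.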

Ecoacoustic indices

Many ecoacoustic indices have been proposed based on weighted combinations of SPL in frequency bands. However, most use narrow bands of 1 kHz in contrast to the octaves used here. Biophony is often defined as sounds between 2 and 8 kHz, and anthropophony as sounds between 1 and 2 kHz. The nearest equivalent for biophony calculated from octaves would be bands 9 and 10 (i.e. 2.8 to 11.4 kHz) and for anthropophony bands 7 and 8 (710 Hz to 2.8 kHz). In calculating approximations to ecoacoustic indices based on octave bands, here they have been re-named using the word “octave” to avoid confusion. Specifically, a Normalized Difference Octave Index (NDOI) was calculated as:

$$\mathrm{NDOI} = \frac{(B_9 + B_{10}) - (B_7 + B_8)}{B_7 + B_8 + B_9 + B_{10}}$$

where $B_x$ refers to band x in Table 1. An Octave Diversity Index (ODI) was calculated using the Shannon-Wiener function as:

$$\mathrm{ODI}_{30} = -\sum_x p_x \ln p_x$$

where $p_x$ is the proportion of energy in band x, included only if its SPL was >30 dB. (The 30 dB cut-off was simply used to focus the index on higher SPLs.) For both indices, the calculations were made on linear measures of power, e.g. an SPL of 50 dB was first converted using $10^{50/10}$ before manipulation. These measures are not equivalents of the usual ecoacoustic indices but have some of the same characteristics.
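A minimal Python sketch of the two octave indices defined above may help to make the calculation concrete. The function names are hypothetical, and one detail is an assumption rather than something stated in the text: the proportions in ODI30 are normalized over only those bands that pass the 30 dB cut-off.

```python
import math

def db_to_power(spl_db):
    """Convert an SPL in dB to linear power, e.g. 50 dB -> 10**(50/10)."""
    return 10 ** (spl_db / 10)

def ndoi(bands_db):
    """Normalized Difference Octave Index from 11 octave-band SPLs (B1..B11)."""
    p = [db_to_power(x) for x in bands_db]
    biophony = p[8] + p[9]        # B9 + B10 (2.8 to 11.4 kHz)
    anthropophony = p[6] + p[7]   # B7 + B8 (710 Hz to 2.8 kHz)
    return (biophony - anthropophony) / (biophony + anthropophony)

def odi30(bands_db, cutoff_db=30.0):
    """Shannon diversity of energy across bands whose SPL exceeds the cut-off.

    Assumption: proportions are normalized over the included bands only.
    """
    p = [db_to_power(x) for x in bands_db if x > cutoff_db]
    total = sum(p)
    if total == 0:
        return 0.0  # no band exceeded the cut-off
    return -sum((v / total) * math.log(v / total) for v in p)

# Hypothetical clip: biophony bands 10 dB above the anthropophony bands.
clip = [40, 40, 40, 40, 40, 40, 50, 50, 60, 60, 40]
print(round(ndoi(clip), 3), round(odi30(clip), 3))
```

NDOI ranges from –1 (all energy in bands 7 and 8) to +1 (all energy in bands 9 and 10), and ODI30 is largest when energy is spread evenly over the included bands.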

Spatial patterns

Two approaches were used to detect patterns in the sound pressure levels among octaves within the sound clips. First, treating the dB levels in the 11 octave bands as variables, a variant of spectral clustering was used to identify natural groupings of the sound clips while making minimal assumptions about the data. Spectral clustering is a non-parametric eigenvector approach based on graph theory that is unaffected by outliers, noise in the data or the shape of clusters, and which often outperforms traditional clustering methods. It becomes computationally unfeasible, however, when the sample size and number of variables are large. To overcome this, we used the novel implementation in the R package SamSPECTRAL which combines a faithful sub-sampling scheme with spectral clustering through a modification of the similarity matrix based on potential theory. Furthermore, it integrates an objective, data-driven method to identify the optimum number of clusters in contrast with techniques such as K-Means clustering or Kohonen’s self-organising maps that require the user to predefine the number of clusters to extract. In SamSPECTRAL this is achieved through two tuning parameters which determine the resolution in the initial spectral clustering stage, and then the extent to which the identified clusters are finally combined. After randomizing the order of sound clips to remove dependencies, the tuning parameters were defined on a random sample of 5000 clips for efficiency, and then used for clustering the entire data set of 52,366 sound clips based on the 11 octave variables. The result in our case is an objective definition of how many different patterns of SPL exist across the octave bands within the city.

In the second approach, standardized principal components analysis (PCA) was applied to identify linear combinations of the dB levels in the 11 octave bands that best summarised the variance across all sound clips. A varimax-rotated solution was chosen to provide good separation of octaves among principal components (PCs), and all PCs with eigenvalues greater than 1.0 were extracted. A second PCA was also run including the 11 octave bands plus dBA, NDOI and ODI30 since, according to Devos, ecoacoustic indices are naturally co-linear with each other and other acoustic information. The aim here was to reduce the octave bands and ecoacoustic indices to fewer, uncorrelated factors to ease interpretation and to facilitate mapping.
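In a standardized PCA every variable contributes unit variance, so the total variance equals the number of variables and each component's share is its eigenvalue divided by that number. The short Python sketch below illustrates this relationship and the eigenvalue > 1.0 retention rule, using the three eigenvalues reported in Table 2. The function names are hypothetical, and the fourth eigenvalue in the retention example is an invented value included only to show a component being dropped.

```python
def percent_variance(eigenvalues, n_variables):
    """Percentage of total variance per component in a standardized PCA."""
    return [100 * ev / n_variables for ev in eigenvalues]

def retained_components(eigenvalues, threshold=1.0):
    """Keep only components whose eigenvalue exceeds the threshold."""
    return [ev for ev in eigenvalues if ev > threshold]

# Eigenvalues of the three rotated PCs reported in Table 2 (11 octave bands).
table2_eigenvalues = [5.37, 3.44, 1.56]
shares = [round(s, 1) for s in percent_variance(table2_eigenvalues, 11)]
print(shares)

# 0.40 is a hypothetical fourth eigenvalue that fails the > 1.0 rule.
print(retained_components(table2_eigenvalues + [0.40]))
```

The computed shares agree with the "% variance explained" row of Table 2.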

The PCs and sound frequency components were mapped in ArcGIS 10.4 and interpolated to 30 m resolution using Inverse Distance Weighting (IDW) with a power of 2 and fixed radius of 200 m. A resolution of 30 m is appropriate because the sound clips were gathered during 10 seconds of walking, positioned using GPS with 5–10 m error, making the longest axis of the sampled space around the recorder about 20–30 m in length. Each sound clip was assigned membership to a cluster identified by spectral clustering and a majority rule used to map the modal cluster per 30 m pixel.
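The interpolation settings described above (power of 2, fixed 200 m search radius) can be illustrated with a small Python sketch. It is not the ArcGIS implementation: the function name and sample points are hypothetical, coordinates are assumed to be in metres, and the values are averaged as given, without converting decibels to linear power.

```python
import math

def idw_estimate(target, samples, power=2.0, radius=200.0):
    """Inverse Distance Weighted estimate at target=(x, y), in metres.

    samples is a list of (x, y, value) tuples; only samples within `radius`
    of the target are used. Returns None if no sample lies inside the radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d > radius:
            continue  # outside the fixed search radius
        if d == 0.0:
            return value  # a sample exactly at the target is returned as-is
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None

# Hypothetical sound clips: (easting, northing, dBA).
clips = [(0.0, 0.0, 50.0), (100.0, 0.0, 70.0), (500.0, 0.0, 90.0)]
print(idw_estimate((25.0, 0.0), clips))
```

With a power of 2, a clip three times closer receives nine times the weight, so the estimate at (25, 0) stays close to the nearer 50 dBA clip, while the clip 475 m away is ignored entirely.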

To interpret the maps in terms of land use and land cover (especially relevant to the consideration of naturalness), we overlaid the sound data on OS MasterMap 1:1250 scale topographic vector data (downloaded 10 Mar 2015 from the EDINA Digimap Ordnance Survey Service http://digimap.edina.ac.uk). The vector data were rasterized to 1 m resolution and then aggregated to 30 m resolution, resulting in a % land cover classification based on 900 sample pixels. Only the % of vegetated cover is used here as a generalized gradient of urbanisation. In addition, we used the OS MasterMap Greenspace product to identify polygons of greenspace within the city. To focus on vegetated areas, we removed the polygons whose primary function was classified as ‘Land Use Changing’ or ‘Private Garden’, and areas where the primary form was listed as ‘Beach Or Foreshore’, ‘Inland Water’ or ‘Multi Surface’.
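The aggregation from 1 m to 30 m resolution amounts to counting vegetated cells within each block of 30 × 30 = 900 pixels. A minimal Python sketch follows, assuming a binary raster held as nested lists (1 = vegetated, 0 = other); the function name and the toy raster are hypothetical and the GIS workflow actually used may differ.

```python
def percent_cover(raster, block=30):
    """Aggregate a binary 1 m raster to % vegetated cover per block x block cell.

    Incomplete blocks at the right or bottom edge are ignored.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    out = []
    for r0 in range(0, n_rows - block + 1, block):
        row_out = []
        for c0 in range(0, n_cols - block + 1, block):
            count = sum(raster[r][c]
                        for r in range(r0, r0 + block)
                        for c in range(c0, c0 + block))
            row_out.append(100.0 * count / (block * block))
        out.append(row_out)
    return out

# Toy 30 m x 60 m raster: left pixel fully vegetated, right pixel vegetated
# only in its first 9 rows (270 of 900 cells).
raster = [[1] * 30 + ([1] * 30 if r < 9 else [0] * 30) for r in range(30)]
cover = percent_cover(raster)
print(cover)
```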

Social equity

To derive evidence on whether certain sections of society live in areas with less favourable acoustic conditions, we examined the relationship between sound characteristics and ethnicity or social deprivation for the 766 Output Areas (OAs) in Southampton. Output Areas are the UK’s base geographic unit for census data, comprising spatial clusters of a minimum of 40 resident households and 100 resident people, designed to have similar population sizes and to be as socially homogeneous as possible. For the analyses here, ethnicity was collapsed into a single metric “% self-declared white” and deprivation into “% with no deprivation”, based on the 2011 Census returns (data available at https://www.nomisweb.co.uk/). Spatial multiple regression models (predicting three different sound characteristics from ethnicity and deprivation) were created using the R package spdep, spatial weights being defined using queen contiguity and spatial dependency calculated using Moran’s I and Lagrange Multiplier tests. The choice between spatial lag and spatial error models was based on the decision rules of Anselin. Specifically, we first compared the probabilities associated with the Lagrange Multiplier tests for spatial lag and spatial error terms. Where these were both significant, we chose the spatial model based on the smaller of the probabilities when Robust Lagrange Multiplier tests were applied instead. In the one case where a spatial lag model was selected over a spatial error model (see Results), we ran both models (not shown here) and found no material difference in their interpretation, meaning the outcome was robust to the model used.
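To illustrate what Moran’s I with queen contiguity measures, the Python sketch below computes the global statistic on a small regular grid. This is a simplification under stated assumptions: Output Areas are irregular polygons rather than grid cells, the weights here are binary rather than row-standardized, and the function names are hypothetical. It does not reproduce the spdep calculations or the associated significance tests.

```python
def queen_neighbours(n_rows, n_cols):
    """Map each cell index of a regular grid to its queen-contiguity neighbours.

    Queen contiguity treats cells sharing an edge or a corner as neighbours.
    """
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours[r * n_cols + c] = [
                rr * n_cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbours

def morans_i(values, neighbours):
    """Global Moran's I using binary contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of cross-products of deviations over all neighbouring pairs.
    cross = sum(dev[i] * dev[j] for i in neighbours for j in neighbours[i])
    s0 = sum(len(js) for js in neighbours.values())  # total weight
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical 4 x 4 grid with low values on the left and high on the right.
clustered = [0, 0, 1, 1] * 4
print(round(morans_i(clustered, queen_neighbours(4, 4)), 3))
```

A positive value indicates that similar values tend to be neighbours, which is the situation in which OLS standard errors become unreliable and a spatial lag or spatial error model is preferable.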

A workflow for the entire analysis is given in Figure 1.

Figure 1 

Data analysis workflow. Input data are shown in blue boxes and outputs (with corresponding Table and Figure numbers) are given in pink boxes. Calculation stages are shown in parallelograms.

Results

Frequency composition and loudness

With the settings used (see Methods), spectral clustering classified 52,341 of the 52,366 sound clips into five natural groupings, the remaining 25 sound clips being unresolved. Of the classified clips, 45.7% fell into the first cluster, 35.7% into the second and 18.5% into the third, the remaining two clusters making up less than 0.1% of clips. The mean frequency profiles of the 99.9% of clips comprising clusters 1, 2 and 3 showed a gradual decline in SPL across the octaves (Figure 2), all profiles showing a slight peak in band 7 (710 to 1420 Hz). Clusters 4 and 5 were very different, showing a wide peak in octaves 7, 8 and 9 for cluster 5, and a pronounced single peak in octave 10 for cluster 4 (Figure 2). The elevations of the lines in Figure 2 show strong differences in the overall SPLs between clusters and this was also evident for dBA for the three dominant clusters (Figure 3). With the sample size being so large, p-values are unreliable indicators of differences between groups, but 95% confidence intervals for the means of dBA, ODI30 and NDOI did not overlap for clusters 1, 2 and 3. Effect sizes (eta squared) were 77.2%, 4.7% and 1.6% respectively for dBA, ODI30 and NDOI, meaning that despite the differences in spectral composition, the dominant feature differing among sound clips was loudness.
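Eta squared, the effect size quoted above, is the ratio of the between-group sum of squares to the total sum of squares. A minimal Python sketch with a hypothetical function name and invented dBA values (not the survey data) is:

```python
def eta_squared(groups):
    """Eta squared = between-group sum of squares / total sum of squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

# Invented dBA values for three clusters, for illustration only.
toy_clusters = [[67.0, 69.0], [49.0, 51.0], [58.0, 60.0]]
print(round(eta_squared(toy_clusters), 3))
```

Unlike a p-value, this proportion does not shrink towards significance simply because the sample is large, which is why it is the more informative summary for a data set of this size.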

Figure 2 

Mean frequency response profiles for the five clusters recognised by spectral clustering. The light blue, grey and orange profiles made up 99.9% of sound clips.

Figure 3 

Frequency distribution of SPL (dBA) by spectral cluster. The means were 68.0, 49.7 and 58.8 dBA respectively for clusters 1, 2 and 3 (n = 52,296).

Principal components analysis extracted three rotated PCs from the 11 octave bands accounting for 48.8%, 31.3% and 14.2% of the variance respectively (Table 2). PC1 was dominated by mid-range frequencies from 177 to 11360 Hz, whereas PC2 featured frequencies below 177 Hz and PC3 high frequencies from 5.7 to 22.7 kHz. Note that PC1 included all the frequency range defined as characterising either biophony or anthropophony, without distinguishing them.

Table 2

Rotated component matrix for the PCA on the 11 octave bands using varimax rotation with Kaiser Normalization. Highest weightings are emphasised in bold.

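| Octave band | PC1 | PC2 | PC3 |
|---|---|---|---|
| B1 | .009 | .944 | .091 |
| B2 | .194 | .946 | .148 |
| B3 | .480 | .810 | .181 |
| B4 | .577 | .730 | .170 |
| B5 | .843 | .444 | .198 |
| B6 | .924 | .242 | .245 |
| B7 | .937 | .176 | .218 |
| B8 | .915 | .172 | .306 |
| B9 | .832 | .200 | .476 |
| B10 | .721 | .218 | .627 |
| B11 | .529 | .255 | .783 |
| Eigenvalue | 5.37 | 3.44 | 1.56 |
| % variance explained | 48.8 | 31.3 | 14.2 |
| Cum. % var explained | 48.8 | 80.1 | 94.3 |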

When the PCA was repeated including the ecoacoustic indices, PC1 was dominated by SPL in the frequencies from 177 Hz into the ultrasonic region, and overall SPL (dBA). PC2 again focused on lower frequencies whereas PC3 captured NDOI and PC4 the ODI30 (Table 3). As PCs are orthogonal, this is confirmation that NDOI and ODI30 represent characteristics not captured by the octave bands or each other, although together they accounted for only 17.5% of variance.

Table 3

Component matrix for the PCA on the 11 octave bands, dBA and two ecoacoustic indices using varimax rotation with Kaiser Normalization. Highest weightings are emphasised in bold.

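| Variable | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| B1 | .086 | .961 | –.003 | –.063 |
| B2 | .288 | .931 | .010 | –.112 |
| B3 | .568 | .746 | –.020 | –.183 |
| B4 | .666 | .624 | –.006 | –.304 |
| B5 | .894 | .340 | –.117 | –.219 |
| B6 | .950 | .186 | –.181 | .007 |
| B7 | .931 | .152 | –.270 | .128 |
| B8 | .946 | .157 | –.163 | .168 |
| B9 | .951 | .186 | .085 | .173 |
| B10 | .917 | .208 | .232 | .172 |
| B11 | .815 | .248 | .281 | .127 |
| dBA | .940 | .268 | –.157 | –.046 |
| NDOI | –.101 | –.003 | .976 | .054 |
| ODI30 | .213 | –.264 | .064 | .926 |
| Eigenvalue | 7.67 | 3.22 | 1.27 | 1.17 |
| % variance explained | 54.8 | 23.0 | 9.1 | 8.4 |
| Cum. % var explained | 54.8 | 77.8 | 86.9 | 95.3 |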

Spatial patterns

PC1 in Table 3, which represented all octave bands 5 to 11 and overall dBA, showed strong spatial patterning in the city and was, in fact, indistinguishable from an interpolated map of dBA alone (Figure 4). Although some quiet areas corresponded to greenspace (e.g. the Common outlined in panel b of Figure 4), others were simply suburban neighbourhoods and areas with less traffic. The main road network is obvious as the primary source of broad-spectrum urban noise in the city.

Figure 4 

Almost identical spatial pattern in PC1 from Table 3 (panel a) and dBA (panel b). The Common lies within the dashed polygon in panel b.

High octave diversity (as represented by PC4 in Table 3) visually correlated with the occurrence of trees in the city (red areas in Figure 5). This was especially obvious on the Common (marked on Figure 4). Whether the source of more diverse sounds is the trees themselves (e.g. the rustling of leaves) or associated birdlife is not known. However, this relationship was not universal and other parts of the city without trees also had high octave diversity. As the ODI30 index only included bands with an SPL >30 dB, these areas might simply be noisier across a wide spectrum of frequencies, but further research is clearly needed.

Figure 5 

Spatial pattern in PC4 in Table 3 which mostly correlates with acoustic diversity above 30 dB (ODI30). Red colours show greater diversity in sound frequencies.

NDOI (PC3: Table 3 and representing the contrast between biophony and anthropophony) also showed strong spatial patterning although its interpretation was not always clear (Figure 6). By overlaying areas defined as greenspace in the city, it is apparent that not all greenspace has acoustic characteristics that might be regarded as natural (Figure 6b) and, equally, not all acoustically natural areas were greenspace (Figure 6a). This might partly be explained by the presence of major roads which are clearly highlighted by cluster 1 from the spectral clustering (also plotted on Figure 6) as the major source of noise in the city with a mean SPL of 68 dBA and high frequency components.

Figure 6 

Spatial pattern in PC3 from Table 3 (cf. NDOI) that lies outside areas defined as greenspace (panel a) and within greenspace (panel b). Cluster 1 from Figure 3 is overlaid as black pixels and shows a close match to the main road network.

Figures 7 and 8 focus on opposite ends of the sound spectrum. The hotspot map in Figure 7 shows areas with the highest SPL in frequencies from 5.7 kHz into the ultrasonic range. The pattern is difficult to explain but some city locations with high SPL appear to coincide with industrial and commercial premises while others might be transient sounds from road vehicles (e.g. motorbikes). The map of low frequency sound (11 to 88 Hz) in Figure 8 includes noise below the frequency range of normal human hearing. The concentration of high SPL in the south-west might be related to port activity such as heavy goods vehicles and loading cranes, but sounds at this frequency are the ones most affected by wind noise and need careful attribution to source. Other locations with high SPL at low frequencies appear to be redevelopment sites undergoing building work.

Figure 7 

Hotspot map of high sound frequencies concentrated at 5.7–22.7 kHz (PC3 in Table 2). Red colours show higher SPL.

Figure 8 

Hotspot map of low sound frequencies concentrated below 88 Hz (PC2 in Table 2). Red colours show higher SPL.

Relationship with vegetated land cover

When the fraction of vegetated land cover was binned into 20% quantiles and the mean SPL extracted for each octave band, there was a tendency for band means to decline with vegetated cover (Figure 9). The separated components for biophony (bands 9 and 10), anthropophony (bands 7 and 8) and dBA generally declined with vegetated cover over 60%, although there was little difference in the NDOI or ODI30 as vegetated cover increased suggesting that the greenest pixels tended to be quieter rather than possessing different frequency characteristics (Figure 10).

Figure 9 

Mean SPL profiles across 11 octave bands for sound clips grouped according to vegetated cover within the 30 m pixel where the recording was made.

Figure 10 

Sound characteristics grouped by percentage of vegetated cover within the 30 m pixel where the recording was made.

If sound characteristics are analysed only against the land cover in the immediate 30 m pixel, sound sources beyond the pixel are ignored, for example, an adjacent road. By relating the sound characteristics to clusters of pixels (Table 4), it is possible to account for the composition of neighbouring pixels too. Here there was a slight tendency for the percentage of vegetated cover to differ according to sound clip clusters both within the 30 m pixel where the recording was taken and in windows of 3 × 3, 5 × 5 or 7 × 7 pixels i.e. up to a distance of 195 m away (Table 4). Cluster 2 showed a slightly elevated percentage of vegetated cover at all patch sizes and cluster 5 was especially distinct although based on the smallest sample size. These results suggest there is some relationship between the sound characteristics identified by the clusters and vegetated land cover at the landscape scale. This further supports the visual interpretations of the maps (Figures 5 and 6) given earlier.

Table 4

Percentage of vegetated land cover within pixel groupings associated with the sound clip clusters derived from spectral clustering. Values are mean % ± the standard deviation followed by the sample size of sound clips (slight differences between columns due to missing pixels in the land cover data).

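| Cluster | One 30 m pixel | 3 × 3 pixels | 5 × 5 pixels | 7 × 7 pixels |
|---|---|---|---|---|
| 1 | 18.7 ± 24.97 (n = 23885) | 21.1 ± 21.25 (n = 23903) | 21.8 ± 19.76 (n = 23918) | 22.3 ± 18.61 (n = 23923) |
| 2 | 27.3 ± 34.14 (n = 18626) | 28.9 ± 29.26 (n = 18665) | 29.7 ± 26.42 (n = 18665) | 30.0 ± 24.60 (n = 18665) |
| 3 | 21.8 ± 27.63 (n = 9687) | 22.6 ± 22.26 (n = 9706) | 23.2 ± 19.93 (n = 9707) | 23.6 ± 18.51 (n = 9708) |
| 4 | 21.6 ± 26.89 (n = 38) | 24.4 ± 22.18 (n = 38) | 24.8 ± 19.86 (n = 38) | 25.3 ± 18.08 (n = 38) |
| 5 | 60.7 ± 32.62 (n = 7) | 53.7 ± 34.36 (n = 7) | 49.1 ± 35.87 (n = 7) | 42.9 ± 35.32 (n = 7) |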

Social inequity

The different spatial patterns in Figures 4, 7 and 8 (dBA, high frequencies and low frequencies respectively) suggest the possibility of different levels of sound exposure by residents across the city. If, in addition, those residents are spatially clustered according to socio-economic metrics, there may be different patterns to inequity of exposure depending on which sound frequency bands are used. To assess this, we applied multiple ordinary least squares (OLS) and spatial regression models to predict the mean value of each of these three sound measures per census Output Area (OA), using the percentage of self-declared white residents and percentage of residents showing no deprivation as the predictor variables (Table 5).

Table 5

Evidence of social inequity in noise exposure as assessed by Ordinary Least Squares (OLS) and Spatial Lag or Spatial Error models. AIC = Akaike’s Information Criterion. ns = not significant, * p < 0.05; ** p < 0.01; *** p < 0.001. –ve or +ve indicate the sign of the regression coefficient.

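| Model | Term or statistic | dBA | Low frequencies (Table 2: PC2) | High frequencies (Table 2: PC3) |
|---|---|---|---|---|
| OLS | % white | –ve, *** | +ve, *** | –ve, * |
| OLS | % with no deprivation | –ve, ** | –ve, *** | –ve, * |
| OLS | Adjusted R2 | 3.9% | 10.7% | 1.0% |
| OLS | AIC | 4415.13 | 731.85 | 119.87 |
| OLS | Moran’s I | 20.92 *** | 21.86 *** | 20.48 *** |
| OLS | Multicollinearity condition number | 18.46 | 18.46 | 18.46 |
| OLS | Jarque-Bera test for normality of errors | 5.60, ns | 24.19 *** | 32.76 *** |
| OLS | Breusch-Pagan test for heteroskedasticity | 0.19, ns | 5.29, ns | 5.41, ns |
| Spatial regression | % white | –ve, ns | +ve, ns | –ve, * |
| Spatial regression | % with no deprivation | –ve, ** | –ve, *** | –ve, ns |
| Spatial regression | Lag coefficient | +ve, *** | +ve, *** | +ve, *** |
| Spatial regression | Pseudo R2 | 48.0% | 55.5% | 43.5% |
| Spatial regression | AIC | 4059.15 | 317.91 | –206.80 |
| Spatial regression | Likelihood ratio test for spatial dependence | 355.98 *** | 415.94 *** | 326.66 *** |
| Spatial regression | Lag or error | Error | Lag | Error |
| Spatial regression | Breusch-Pagan test for heteroskedasticity | 4.80, ns | 12.52 ** | 4.30, ns |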

For the three sound frequency groupings, Moran’s I showed highly significant spatial autocorrelation among Output Areas and highly significant spatial lag coefficients in the regression models (see rows 6, 12 and 15 in Table 5). This indicates strong bias if OLS is used to analyse these data instead of the spatial models. For example, the apparent highly significant difference in dBA and low frequencies according to the percentage of self-declared white residents became non-significant when spatial dependency was adequately modelled. Despite this, dBA and sound pressure levels at low frequencies differed significantly according to the percentage of the population showing no deprivation. As the regression signs were negative, this indicates decreasing SPLs and less noisy environments as the percentage of the population showing no indicators of deprivation increased. The sound pressure levels at high frequencies also differed significantly (p < 0.05) according to the percentage of self-declared white residents, again with a negative sign indicating lower SPLs as the percentage of white residents increased. Some caution is needed in interpreting these results as the effect sizes were small and errors showed some signs of heteroskedasticity. Also, although the multicollinearity condition number (Table 5, row 7) was below the critical threshold of 30, a closer look at social deprivation and ethnicity as independent predictors may be warranted. The crucial point here though is evidence of different spatial patterns in perceptions of inequity according to which sound frequency band groupings are used.

Discussion and Conclusion

In this paper, the authors mapped the acoustic environment of an entire city using a rapid field survey technique as opposed to the more usual traffic count and propagation modelling or spot measurement approaches. By recording sound on the move rather than at static recording stations, it was possible to gather over 52,000 georeferenced sound clips of 10 seconds duration each within a six-week survey period. Based on an average walking speed of 1.4 m s⁻¹, the surveyors covered 733 km and produced 145 hours of sound recordings. These figures may help future researchers to decide whether a similarly extensive survey is appropriate in their setting. Short duration sound recordings show greater between-clip variation than longer recordings, and some form of spatial averaging (such as the Inverse Distance Weighted interpolations used here) is necessary to smooth out chance events. More work is needed on the equivalence of mobile and static surveys but we believe that our approach was successful in addressing the need for better information on the measured levels of sound energy citizens are exposed to from all sources at the whole city scale. Furthermore, extensive surveys can go some way to overcoming the limitations highlighted by Fairbrass et al. that arise when studying only a few land cover types. The outputs produced in this paper are largely consistent with expectation (e.g. dBA tracking the main road network – Figure 4) but with additional information across a wide frequency range. An important refinement for the future would be to include the temporal component of the acoustic environment, omitted in this paper for brevity.
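The survey totals quoted above follow directly from the number of clips, the clip duration and the assumed walking speed. A short Python check of that arithmetic (variable names are illustrative only) is:

```python
CLIPS = 52_366        # sound clips analysed (from the Methods)
CLIP_SECONDS = 10     # duration of each clip, in seconds
WALKING_SPEED = 1.4   # assumed average walking speed, in metres per second

total_seconds = CLIPS * CLIP_SECONDS
hours_recorded = total_seconds / 3600
distance_km = total_seconds * WALKING_SPEED / 1000

print(f"{hours_recorded:.0f} h of recordings, {distance_km:.0f} km walked")
```

Researchers planning a similar survey could substitute their own clip count or walking speed to estimate the field effort required.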

No attempt was made here to consider the human perception of sounds in the city. Our survey therefore focused on the acoustic environment, defined in ISO 12913-1:2014 as the “sound at the receiver from all sound sources as modified by the environment”. In contrast, the term “soundscape” (formerly used in a general sense to indicate the combination of sounds in a landscape) is now reserved for the “acoustic environment as perceived or experienced and/or understood by a person or people, in context” (see ISO 12913-1:2014). There is thus a distinction between studies that focus on perception (a psycho-acoustic problem) and those that consider exposure to levels of sound which may or may not be perceived. For example, there are potential health impacts from ultrasound which lies beyond the range of human hearing and therefore cannot be perceived by the individual being exposed. Human acoustic perception is also irrelevant in biodiversity studies.

By undertaking an empirical study of an entire city rather than at selected study sites, it was possible to address a number of research questions at the scale relevant to urban planners. To address our first research question (whether sound frequency composition differs across urban space), two approaches were used to identify groupings in the sound pressure levels among octave bands within the sound clips. In the first, spectral clustering was applied to find natural groupings in the data, implemented using the R package SamSPECTRAL (, ). This package was developed for large flow cytometry data sets and we know of no other application to sound data. Despite the need to set a scaling parameter and separation factor, we found the number of clusters defined after the initial separation and then final combining stages was always low. SamSPECTRAL is good at separating rare and overlapping clusters (), which are often challenging for other algorithms, and we therefore accept that only a few distinctive clusters exist in the city. In fact, at observer level, Southampton is dominated by broad-spectrum noise that naturally clusters 99.9% of locations into just three groups with mean levels at 50, 59 and 68 dBA (Figure 3). As these are averaged levels along the routes our surveyors walked, some routes taken by pedestrians are likely to include far higher noise levels that could potentially impact on human health. The cluster of locations with the loudest sounds was characterised by a broader shoulder at frequencies below 88 Hz and the largest difference in sound pressure levels between clusters occurred at around 710 to 1420 Hz, within our definition of anthropophony. This, plus the fact that the commonest cluster mapped neatly onto the main road network (Figure 6), leaves little doubt that road traffic is the principal source of noise in the city and, in our experience, there are few places where the sound of traffic is not audible. Further work is encouraged using spectral clustering on sound data, for example, examining how choice of similarity graph and associated metrics affect the outcome ().
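To make the role of the similarity graph concrete, the following is a minimal, generic sketch of spectral clustering reduced to its simplest case: a two-way split using the sign of the second eigenvector of the normalised graph. It is not SamSPECTRAL and does not reproduce its sampling or cluster-combining stages. The function name, the fully connected Gaussian similarity graph, the scale `sigma` and the six three-band clips are assumptions chosen for illustration.

```python
import math
import random

def spectral_bipartition(points, sigma=1.0, iters=500, seed=0):
    """Split points into two groups by the sign of the Fiedler vector.

    points: list of equal-length numeric sequences (e.g. octave-band SPLs).
    sigma:  scale of the Gaussian similarity (a tuning assumption).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Fully connected Gaussian similarity graph (self-similarity included).
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))
          for q in points] for p in points]
    deg = [sum(row) for row in w]
    # Symmetrically normalised similarity M = D^-1/2 W D^-1/2.
    m = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(deg).
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]

    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Deflate: remove the component along the trivial eigenvector.
        dot = sum(a * b for a, b in zip(x, v1))
        x = [a - dot * b for a, b in zip(x, v1)]
        # Power step on (M + I); the shift keeps all eigenvalues non-negative.
        x = [sum(m[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    # Rescale by D^-1/2 and label each point by the sign of its coordinate.
    return [0 if x[i] / math.sqrt(deg[i]) < 0 else 1 for i in range(n)]

# Hypothetical clips described by SPLs (dB) in three octave bands.
quiet = [[50, 48, 45], [51, 47, 46], [49, 49, 44]]
loud = [[68, 66, 60], [69, 65, 61], [67, 67, 59]]
labels = spectral_bipartition(quiet + loud, sigma=5.0)
```

Changing `sigma`, or replacing the fully connected graph with a nearest-neighbour graph, alters the edge weights and therefore the eigenvectors, which is the sensitivity that the suggested further work would examine.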

As an alternative to spectral clustering, the simpler (but less robust) principal components analysis was also applied to the sound data as an unsupervised classifier (). Two key findings emerged: (i) that when octave bands were used alone, the first three principal axes extracted corresponded to mid, low and high frequencies; and (ii) when ecoacoustic indices and dBA were also included, these fell onto their own principal components. By definition, these findings indicate weak correlation between the SPLs in the low, mid and high frequency ranges that translate into different spatial patterns in sound frequency components. The PCAs also indicate that the ecoacoustic indices contain information that is distinct both from one another and from octave band combinations such as dBA. This somewhat counters the view that ecoacoustic indices are collinear with each other and other acoustic metrics (). Thus, despite the overwhelming nature of noise in the city, principal components analysis was able to recognise distinctive contributions (totalling ~17% variance) from the ecoacoustic indices used here. This suggests that some signal of the natural acoustic environment might be detected although in combination with anthropogenic sounds its influence in terms of SPL is weak ().
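The link between separate principal axes and weak correlation can be shown with a small sketch. It extracts the leading components of a correlation matrix by power iteration with deflation, using only the Python standard library. The function name and the eight three-band observations are hypothetical, constructed so that two bands are perfectly correlated and the third is uncorrelated with both; they are not the survey data, and the sketch assumes no variable is constant.

```python
import math
import random

def pca_power(rows, n_components=2, iters=500, seed=0):
    """Leading principal components of the correlation matrix of `rows`.

    rows: list of observations, each a list of variables (e.g. band SPLs).
    Returns (loadings, explained); explained holds variance fractions.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    # Standardise each variable, then form the correlation matrix.
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

    rng = random.Random(seed)
    loadings, explained = [], []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            v = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
            length = math.sqrt(sum(x * x for x in v))
            v = [x / length for x in v]
        # Rayleigh quotient gives the eigenvalue of the converged vector.
        lam = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                  for a in range(p))
        loadings.append(v)
        explained.append(lam / p)
        # Hotelling deflation: subtract the component just found.
        corr = [[corr[a][b] - lam * v[a] * v[b] for b in range(p)]
                for a in range(p)]
    return loadings, explained

# Hypothetical SPLs (dB): columns are low, mid and high frequency bands.
low = [50, 51, 52, 53, 54, 55, 56, 57]
mid = [40, 42, 44, 46, 48, 50, 52, 54]   # rises in step with low
high = [31, 29, 29, 31, 31, 29, 29, 31]  # uncorrelated with low and mid
data = [list(row) for row in zip(low, mid, high)]
loadings, explained = pca_power(data, n_components=2)
```

In this constructed case the two correlated bands share the first component and the uncorrelated band receives a component of its own. Bands that fall onto separate axes, as reported above, are therefore the ones that vary largely independently across space.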

In examining our second research question, whether ecoacoustic indices are useful for characterising naturalness across the city, we found some (but not a unique) correspondence between the principal component summarising NDOI and greenspace across the city (Figure 6). Furthermore, analysis showed some tendency for noise to decrease when vegetation cover was over 60%, although this could be an artefact of those areas having fewer roads rather than an attenuating effect of vegetation. In reality, many areas of greenspace are affected by adjacent road traffic noise and there are few places where it is quiet enough to appreciate the sounds of nature (biophony: , ). This should not be seen as a failure of NDOI to recognise naturalness, but rather as an indication that other factors are involved. Possibly of more concern was the finding that NDOI was capable of suggesting naturalness where there was no greenspace. Although most anthropogenic sounds occur at low frequencies (), even the upper frequencies more characteristic of biophony appeared to be dominated by human-generated sounds in Southampton’s urban environment. Thus NDOI (an index of biophony and anthropophony) showed limited value as a measure of naturalness when used in isolation. A similar conclusion was reached by Fairbrass et al. () who found London’s urban environment to be dominated by a wider frequency range of anthropogenic sounds than occur in the more natural habitats where ecoacoustic indices were developed. One limitation of the data analysed here is that surveys took place only in the summer months when the breeding season of birds is over and they are less inclined to sing; this may have weakened the prominence of biophony in the recordings. A further complicating factor is the lack of a single definition of greenspace () and this is especially problematic in cities. We used as our starting point the OS MasterMap Greenspace product () but removed private gardens (and a few other features – see Methods) because gardens are mixed surfaces of unmapped composition which dominate the city. While some definitions of greenspace include private gardens, others consider only publicly-accessible land (). However, in terms of the acoustic environment and indeed for biodiversity, gardens may actually function as “natural” habitats and could explain some of the anomalies in Figure 6a. More work is clearly needed on the role of private gardens in the urban acoustic environment.

These findings have implications for Southampton City Council’s Green Space Strategy, which has the admirable aims of enhancing economic value, social inclusivity and cohesion, health and wellbeing, and biodiversity (, ). Neither document mentions the impact of noise or sound on greenspace and our data suggest that many green urban areas may simply be too narrow to preserve a natural acoustic environment. If the city were to devote space to soundscape design for the public good () and for biodiversity, one obvious large location would be the Common (Figure 4), which already has protected status as a Site of Special Scientific Interest. (Opportunities for creating new greenspace are almost non-existent.) However, the heavy traffic that travels north-south along the Avenue bisects the Common, lessening the chance for the development of a natural acoustic environment (see the dominance of noise in Figure 4b). The incursion of some noise into greenspace, however, may not lessen its benefits to human well-being (in contrast to biodiversity) since the perceived benefits of vegetation in noise reduction far outweigh the actual attenuation achieved (). Very little appears to be known about the difference between exposure to the acoustic environment and its perception in non-human species. The fact that wild species occupy noisy urban areas in which sounds interfere with their ability to breed (, ) might suggest that species perceive a site as suitable breeding habitat when it is not, an example of an “ecological trap” (e.g. ).

Our third research question asked whether perceptions of social inequity in sound exposure differed according to the frequency bands considered. We found that residents living in different parts of the city, partitioned by census Output Area, were exposed to unequal levels of noise and to different frequency components. The strongest effect (although still relatively weak) was that those living with social deprivation were exposed to noisier environments at low frequencies. The difference was less significant for dBA and not significant (at p < 0.05) for high frequencies. There was also a very slight tendency for exposure to noise to be lower in areas with a higher percentage of white residents but only at high frequencies (once adjustment had been made for spatial autocorrelation). Other European studies have found similar inequity in noise exposure in London (), Hamburg () and Bradford (), but not in Paris () or Amsterdam (). However, none of these studies considered differences in exposure according to frequency band because measurement data were lacking. Given the relationship between the characteristics of the acoustic environment and urban land use, our findings for Southampton are possibly a consequence of the unequal access to greenspace experienced by different societal groupings, an effect previously noted for Leicester (). The variation in people’s responses to sounds () makes it difficult to generalise about what the consequences of differential exposure to sound frequency components might be and also what might be regarded as a bad or a pleasant soundscape. This is something that might itself vary with experience and ethnicity (). However, having acoustic data across an entire city makes it possible to consider the impacts of alternative locations for developments on issues of equity within the planning process (), an important step towards sustainable urban development.