Diversity of fish sound types in the Pearl River Estuary, China

Background. Repetitive species-specific sound enables the identification of the presence and behavior of soniferous species by acoustic means. Passive acoustic monitoring has been widely applied to monitor the spatial and temporal occurrence and behavior of calling species.
Methods. Underwater biological sounds in the Pearl River Estuary, China, were collected using passive acoustic monitoring, with special attention paid to fish sounds. A total of 1,408 suspected fish calls comprising 18,942 pulses were qualitatively analyzed using a customized acoustic analysis routine.
Results. We identified 66 types of fish sounds. In addition to single pulses, the sounds tended to have a pulse train structure. The pulses were characterized by an approximate 8 ms duration, with a peak frequency from 500 to 2,600 Hz and a majority of the energy below 4,000 Hz. The median inter-pulsepeak interval (IPPI) of most call types was 9 or 10 ms. Most call types with median IPPIs of 9 ms and 10 ms were observed at mutually exclusive times, suggesting that they might be produced by different species. According to the literature, the two-section signal types of 1 + 1 and 1 + N10 might belong to big-snout croaker (Johnius macrorhynus), and 1 + N19 might be produced by Belanger's croaker (J. belangerii).
Discussion. Categorization of the baseline ambient biological sound is an important first step in mapping the spatial and temporal patterns of soniferous fishes. The next step is the identification of the species producing each sound. The distribution pattern of soniferous fishes will be helpful for the protection and management of local fishery resources and in marine environmental impact assessment. Since the local vulnerable Indo-Pacific humpback dolphin (Sousa chinensis) mainly preys on soniferous fishes, the fine-scale distribution pattern of soniferous fishes can aid in the conservation of this species. Additionally, prey and predator relationships can be observed when a database of species-identified sounds is completed.


INTRODUCTION
The Pearl River Estuary (21°40′-22°50′N; 112°50′-114°30′E) is in a subtropical area of the northern South China Sea. The estuary is one of the most economically developed regions in China, and the rapid local industrialization and large-scale infrastructure projects, e.g., the ongoing construction of the Hong Kong-Zhuhai-Macao bridge (Wang et al., 2014b) and the Guishan wind farm project (Wang et al., 2015b), have placed an extraordinarily heavy burden on coastal environments and accelerated human damage to coastal ecosystems.
Sound production in soniferous fish has been shown to be associated with reproduction (e.g., courtship and spawning) and territorial or aggressive behavior (Hawkins & Amorim, 2000; Takemura, Takita & Mizue, 1978). Most of the repetitive fish sounds are species specific (Tavolga, 1964), which enables the identification of the distribution and behavior of soniferous species by acoustic means. As a noninvasive technology, passive acoustic monitoring has been widely applied to map the spatial (over a wide range of habitats and at varied depths) (Wall, Lembke & Mann, 2012; Wall et al., 2013) and temporal (diel, seasonal and annual) (Locascio & Mann, 2011; Ruppé et al., 2015; Turnure, Grothues & Able, 2015) occurrence and behavior of soniferous fishes, even in severe conditions.
Overfishing and ocean pollution in the past decade have led to a dramatic decrease in fish in the wild fisheries of China (Liu & Sadovy, 2008; Sadovy & Cheung, 2003). The endemic giant yellow croaker (Bahaba taipingensis), whose swim bladder is highly valued in traditional medicine and which was an important fish stock before the 1960s, collapsed in the wild and was determined to be commercially extinct in 1997 (Sadovy & Cheung, 2003). The spotted drum (Protonibea diacanthus) and large yellow croaker (Larimichthys crocea, which is endemic to East Asia and was once one of the three top commercial marine fishes in China) have been severely depleted throughout their geographic range since the 1980s and have now almost entirely disappeared from landings (Liu & Sadovy, 2008; Sadovy & Cheung, 2003). The most recent study of the biosonar activity of Indo-Pacific humpback dolphins (Sousa chinensis, locally called the Chinese white dolphin) in the Pearl River Estuary indicated that its diel, seasonal and tidal patterns might be ascribed to the spatial-temporal variability of its prey (Wang et al., 2015b); however, little attention has been paid to local fishes, with only sporadic fishery distribution data with poor temporal and spatial resolution obtained from 1986 to 1987 by bottom trawl and in 1998 by beam trawl and hang trawl (Li, Chen & Sun, 2000; Wang & Lin, 2006). The fine-scale distribution pattern of humpback dolphin prey has yet to be investigated.
In this study, the ambient biological sounds in the Pearl River Estuary were recorded using passive acoustic monitoring. Suspected fish sounds were quantitatively and qualitatively characterized. We compared them with species-specific sounds through a literature review, especially of those species that are distributed in the research area, to confirm the caller's identity. These baseline data can serve as a first step toward mapping the spatial and temporal distribution patterns of soniferous fishes in the estuary. Moreover, they are helpful for planning fisheries management and evaluating the damage to aquatic environments (e.g., spawning grounds of the sciaenids) from various large-scale infrastructure projects because marine environmental impact assessments must be based upon a good understanding of the local baseline biodiversity. Additionally, the baseline data can aid in the protection of local humpback dolphins and the implementation of conservation strategies.

Acoustic data recording system
Underwater acoustic recordings were made using a Song Meter Marine Recorder (Wildlife Acoustics, Inc., Maynard, MA, USA), which included an HTI piezoelectric omnidirectional hydrophone (model HTI-96-MIN; High Tech, Inc., Long Beach, MS, USA) with a sensitivity of −164 dB re 1 V/µPa at 1 m distance, a recording bandwidth of 2 Hz-48 kHz and a flat frequency response over a wide range of 2 Hz-37 kHz (±3 dB). The hydrophone also included a programmable autonomous signal processing unit integrated with a band-pass filter and a pre-amplifier. The signal processing unit can log data at a resolution of 16 bits and at a 96 kHz sampling rate, with a storage capacity of 512 GB. The signal processing unit was sealed inside a waterproof PVC housing and was submersible to 150 m. The recording system was calibrated prior to shipment from the manufacturer.

Data collection
Static acoustic monitoring was conducted underwater at the base of a telephone signal tower (22°07′54″N, 113°43′54″E) located among the Sanjiao, Chitan and Datou islands (Fig. 1). The recordings were taken continuously throughout deployment periods from May 26 to June 4, 2014, and June 17 to 22, 2014, at a 96 kHz sampling rate. The acoustic recording system was attached to a steel wire rope and suspended below the signal tower in the middle of the water column, 4.0 m above the ocean floor and approximately 3.0-5.8 m (depending on the tide conditions) below the water surface. A 40 kg anchor block was attached to the bottom of the steel wire rope and laid on the seabed to reduce the movement of the recording system due to water currents.

Acoustic data analysis
Upon retrieval of the recorder, the acoustic data were downloaded and processed. Raven Pro Bioacoustics Software (version 1.4; Cornell Laboratory of Ornithology, NY, USA) was used to initially visualize the acoustic data as spectrograms (window type: Hann; fast Fourier transform (FFT) size: 2048 samples; frame overlap: 80%; frequency grid spacing: 46.88 Hz; temporal grid resolution: 4.26 ms). Only calls with good signal-to-noise ratios (SNR >15 dB, noise level obtained just before or after the pulse) and free of interference from other sounds were extracted for further quantitative analyses. To make the data more independent and reduce the possibility of using multiple sounds from the same individual, only one signal was extracted for each call type in every 10 min bin for further analysis. The recorded sounds generally featured single or multiple-pulse structures. A custom acoustic analysis routine based on MATLAB 7.11.0 (The Mathworks, Natick, MA, USA) was developed to analyze the extracted calls. For each call, the peak amplitude time for each pulse within the call was logged using a pulse-peak detector. Through trial and error, the pulse was defined and extracted as an 8 ms signal that began 2.5 ms before and ended 5.5 ms after the time point of the peak amplitude (Figs. 2B and 2C). The 8 ms definition was considered valid because it encompassed the majority of the energy of a pulse and was shorter than the shortest interval between pulse peaks within a call. The sonic parameters of the number of pulses in a call, total call duration (in ms), inter-pulsepeak interval (IPPI), and the inter-pulse interval (IPI) were calculated for each call.

Figure 2 [panel (A) caption missing] (B) Pulses detected by the pulse-peak detector. Vertical dashed lines denote the starting (green), peak (red), and ending (blue) points of a pulse. (C) Close-up of the oscillogram of extracted 8 ms pulses showing the fine-scale call structure. (D) The cumulative energy of the extracted pulse; τ 95% was the duration containing 95% of the cumulative energy of the pulse, which was derived from the time difference between the 2.5th and 97.5th cumulative energy percentiles. (E) Normalized signal envelope of the extracted pulse; τ −3 dB and τ −10 dB are the time differences between the −3 dB and −10 dB end points relative to the peak amplitude of the signal envelope, respectively. (F) Normalized power spectrum of the extracted pulse. Spectrum configuration: FFT size, 96,000; frequency grid spacing, 1 Hz. Full-size DOI: 10.7717/peerj.3924/fig-2

Call duration was derived by adding 8 ms to the time difference between the last and the first pulse peaks; IPPI is the time difference between the peak amplitudes of consecutive pulse units in the train, which is equal to the pulse period in the literature (Parmentier et al., 2009); and IPI is the time interval between the end of one pulse and the onset of the next one in a series. The temporal characteristics for each 8 ms pulse were computed as τ 95%, τ −3 dB and τ −10 dB. τ 95% is the duration containing 95% of the cumulative energy of the pulse (Fig. 2D), which began when 2.5% of the cumulative signal energy was reached (CE 2.5% in Fig. 2D) and ended when 97.5% of the cumulative signal energy was reached (CE 97.5% in Fig. 2D), and τ −3 dB and τ −10 dB are the time differences between the end points that were 3 dB and 10 dB lower than the peak amplitude of the envelope of the pulse waveform, respectively (Fig. 2E). The signal envelope was generated by taking the absolute value of the analytic signal obtained by applying the Hilbert transform to the waveform (Au, 1993; Madsen & Wahlberg, 2007). The frequency and bandwidth properties for each 8 ms pulse were determined from the power spectrum, which was calculated as the squared magnitude of the fast Fourier transform using a 96,000-point Hanning window. Parameters of the peak frequency (f p, the frequency at which the spectrum has its maximum value) (Fig. 2F), center frequency (f c, the frequency that divides the power spectrum into equal energy halves) and centralized root-mean-square bandwidth (BW rms, the spectral standard deviation about the f c of the spectrum) (Au, 1993; Madsen & Wahlberg, 2007) were measured since they were proposed to be good descriptive parameters for signals with bimodal spectra (Au, 2004). Parameters of 3-dB and 10-dB bandwidths were not measured since they might only cover the frequency range near the peak frequency and tend to misrepresent the bandwidth of signals with bimodal spectra (Au, 2004).
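The call-level timing parameters defined above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB routine; the function name `call_timing` and the example peak times are hypothetical, and the sketch assumes the pulse-peak times (in ms) have already been detected.

```python
from statistics import median

PULSE_MS = 8.0  # fixed pulse length from the text (2.5 ms before + 5.5 ms after the peak)

def call_timing(peak_times_ms):
    """Return duration, IPPIs and IPIs (all in ms) for one call.

    peak_times_ms: pulse-peak times of a single call, in milliseconds.
    """
    peaks = sorted(peak_times_ms)
    # IPPI: time difference between consecutive pulse peaks
    ippi = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    # IPI: end of one 8 ms pulse to onset of the next, i.e. IPPI minus 8 ms
    ipi = [gap - PULSE_MS for gap in ippi]
    # Call duration: last peak minus first peak, plus one pulse length
    duration = (peaks[-1] - peaks[0]) + PULSE_MS
    return {
        "n_pulses": len(peaks),
        "duration_ms": duration,
        "ippi_ms": ippi,
        "ipi_ms": ipi,
        "median_ippi_ms": median(ippi) if ippi else None,
    }

# Hypothetical peak times (ms) for illustration only, not measured data
example = call_timing([0.0, 40.0, 50.0, 60.0, 70.0])
```

For the hypothetical five-pulse call, the duration is the 70 ms span between the first and last peaks plus the 8 ms pulse length.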
The quality factor of each pulse (Q, an appropriate way to define the relative width of a signal) was computed as the ratio of the f c to the BW rms (Au, 1993; Au, 2004). The sound pressure levels (SPLs, dB re 1 µPa) and energy flux density (EFD, dB re 1 µPa² s) were derived for each 8 ms pulse over its τ 95%. The SPL parameters included the zero-to-peak SPL (SPL zp) and the root-mean-square SPL (SPL rms) (Urick, 1983). The absolute pressure levels were derived by subtracting the sensitivity of the hydrophone and the gain due to the amplifier (Urick, 1983).
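As a rough illustration of the spectral descriptors described above, the following Python sketch computes f p, f c, BW rms and Q from an already-computed power spectrum. It follows the definitions as worded in the text (f c as the frequency splitting the spectrum into equal energy halves, BW rms as the spectral standard deviation about f c); the function name `spectral_descriptors` and the toy spectrum are assumptions for illustration, and the original MATLAB implementation may differ in detail.

```python
import math

def spectral_descriptors(freqs_hz, power):
    """Compute peak frequency, center frequency, rms bandwidth and Q.

    freqs_hz: frequency of each spectral bin (Hz), ascending.
    power:    power in each bin (linear units, not dB).
    """
    total = sum(power)
    # Peak frequency: bin with the maximum power
    peak_index = max(range(len(power)), key=power.__getitem__)
    fp = freqs_hz[peak_index]
    # Center frequency: first bin at which cumulative power reaches half the total
    running = 0.0
    fc = freqs_hz[-1]
    for f, p in zip(freqs_hz, power):
        running += p
        if running >= total / 2:
            fc = f
            break
    # Centralized rms bandwidth: power-weighted standard deviation about fc
    bw_rms = math.sqrt(sum(p * (f - fc) ** 2 for f, p in zip(freqs_hz, power)) / total)
    return {"fp": fp, "fc": fc, "bw_rms": bw_rms, "q": fc / bw_rms}

# Toy three-bin spectrum (hypothetical values, for illustration only)
toy = spectral_descriptors([500.0, 1000.0, 1500.0], [1.0, 2.0, 1.0])
```

Because BW rms weights every bin by its power, it reflects energy far from the peak, which is why it suits bimodal spectra better than a 3-dB or 10-dB bandwidth.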
The pooled distribution of the IPPI for all analyzed calls was multimodal, with peaks at 9, 10, 12, 13 and 18 ms (Fig. 3A). Previous fish acoustic analyses by other investigators indicated that the IPPI was the most reliable basis for signal identification and species-specific recognition (Mann & Lobel, 1997; Parmentier et al., 2009; Spanier, 1979), and most signals in our database ended with a pulse train featuring regular IPPIs (Table 1). In this study, calls were classified into types primarily based on their IPPI patterns and their amplitude and temporal modulation patterns (Table 1). The calls were initially grouped according to the number of sections they contained (Table 1). For each call, pulses separated by IPPIs greater than 1.5 times the median IPPI of the call were assigned to different sections. Based on the bimodal distribution of the IPPI for calls that consisted of fewer than three pulses, pulses separated by an IPPI greater than 24 ms (three times the 8 ms duration of a single pulse) were assigned to different sections (Fig. 3B). To name each call type, such as 2+1+N 10, (1−) 4 +(2−) 2 +N 10 and i N 13 (Figs. 4-6, Figs. S1-S26), '+' was used to separate the different sections of a call, a number was used to denote the number of pulses in that section, and '(1−)' and '(2−)' were used to denote repeated sections that consist of one or two pulses, respectively, with numeric superscripts denoting the number of repeats of the repeating section. 'N' was used to denote the last section of a call with a variable number of pulses, and the numeric subscripts denote the median IPPIs of the last portion of the call; the subscript i was used to denote calls in which the zero-to-peak sound pressure level of the first pulse was approximately 10 dB weaker than that of the remainder of the call.
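The sectioning rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the classification routine used in the study: the function names `split_sections` and `label_call` are hypothetical, and the labeler handles neither the repeated-section notation ('(1−)', '(2−)') nor the subscript i for a weak first pulse.

```python
from statistics import median

def split_sections(peak_times_ms, ratio=1.5, short_call_gap_ms=24.0):
    """Return the number of pulses in each section of a call.

    A new section starts when an IPPI exceeds 1.5 x the call's median IPPI,
    or 24 ms for calls with fewer than three pulses.
    """
    peaks = sorted(peak_times_ms)
    if not peaks:
        raise ValueError("a call needs at least one pulse")
    ippi = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    if len(peaks) < 3:
        threshold = short_call_gap_ms
    else:
        threshold = ratio * median(ippi)
    sections = [1]  # the first pulse opens the first section
    for gap in ippi:
        if gap > threshold:
            sections.append(1)   # long gap: start a new section
        else:
            sections[-1] += 1    # short gap: same section
    return sections

def label_call(peak_times_ms):
    """Build a simplified call label such as '1 + N10' (illustrative only)."""
    peaks = sorted(peak_times_ms)
    sections = split_sections(peaks)
    head = [str(n) for n in sections[:-1]]
    last_n = sections[-1]
    if last_n < 2:
        tail = str(last_n)  # a single trailing pulse has no IPPI
    else:
        last_peaks = peaks[-last_n:]
        last_ippi = [b - a for a, b in zip(last_peaks, last_peaks[1:])]
        tail = "N" + str(round(median(last_ippi)))
    return " + ".join(head + [tail])

# Hypothetical peak times (ms), not measured data
purr_like = label_call([0.0, 40.0, 50.0, 60.0, 70.0])
dual_knock_like = label_call([0.0, 40.7])
```

With these hypothetical inputs, the five-pulse call is labeled 1 + N10 and the two-pulse call 1 + 1, mirroring the two-section call types discussed in the text.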
Occasionally, a train of calls was extracted with significantly higher SNR (SNR > 25 dB), a regular inter-call interval, and a gradually changing pattern in its sound pressure level distinct from the ambient biological sounds. These sounds were likely produced by the same individual fish, which facilitated the estimation of the inter-call intervals.

Statistical analysis
Descriptive statistics were used to summarize the biographical information. All the parameters were tested for normality (using the Shapiro-Wilk test for data sets <50 or the Kolmogorov-Smirnov test for data sets ≥50) and homoscedasticity (using Levene's test for equality of variance) (Zar, 1999). Because of the grossly skewed distribution of the majority of the data, the descriptive parameters of median, quartile deviation (QD), 5th percentile (P5), and 95th percentile (P95) were adopted. The QD was defined as one-half the interquartile range, which is the difference between the 75th and 25th percentiles in a frequency distribution.

Notes.
For each signal, pulses with an inter-pulsepeak interval (IPPI) greater than 1.5 times the median IPPI of the signal were grouped into different sections. For signals that consisted of fewer than three pulses, pulses with an IPPI greater than 24 ms (three times the duration of a single pulse) were further grouped into different sections. In the call name column, '+' is used to separate different sections of a call; the number denotes the number of pulses in that section; '(1−)' and '(2−)' denote repeated sections that consist of one and two pulses, respectively; the numeric superscripts denote the number of repeats in the repeating section; 'N' denotes the last section of a call that varied in the number of pulses; the numeric subscripts denote the median IPPIs of the last portion of the call; the subscript i denotes calls with a zero-to-peak sound pressure level of the first pulse approximately 10 dB weaker than that of the remainder within the call. For call types with more than one portion, the IPPI pattern of the last section is given.

[Figure caption, opening words not recovered] Row 1 (A-D) and row 2 (E-H) are the oscillogram and sonogram, respectively, of a representative signal for each call type. Row 3 (I-L) is the duration of a call as a function of the number of pulses within the call for each call type. Results of the pooled inter-pulsepeak interval (M-P in row 4), sound pressure level (Q-T in row 5), peak frequency (U-X in row 6), and center frequency (Y-BB in row 7) of each pulse versus the order at which it occurs within a call for each call type are also given. For the boxplot, the line inside the box indicates the median value, and the lower and upper box borders are the first and third quartiles, respectively. The length of the box is the interquartile range (IQR). The whiskers extend to the most extreme data within the limit of 1.5 IQRs from the end of the box. Open circles (o) denote mild outliers with values greater than 1.5 IQRs but less than 3 IQRs from the end of the box. Asterisks (*) denote extreme outliers with values greater than three box lengths from the upper or lower edges of the box. Sonogram configuration: FFT size, 96,000; window type, Hanning; overlap samples per frame, 95%.

Principal component analysis was used to identify the variables explaining the most variance among the acoustic parameters. Call types with an analyzed number greater than five were extracted for further discriminant and cluster analyses. Canonical discriminant analysis was used to assess the variation among call types relative to the variation within call types and determine the validity of our call types. Hierarchical cluster analysis (Romesburg, 2004), a step-wise process that merges the two closest or furthest data points at each step and builds a hierarchy of clusters based on the distance between them, was applied to discover similar call types in each set. Because the amplitude parameters were not critical for species recognition (Ha, 1973) and the call duration was dependent on the number of pulses in a call (Parmentier et al., 2009), these parameters were not included in the principal component analysis, canonical discriminant analysis and hierarchical cluster analysis.
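The descriptive statistics adopted in this section (median, QD, P5, P95) and the boxplot outlier convention can be sketched in Python. The helper names `describe` and `classify_outlier` are hypothetical, and the percentile interpolation used here (the "inclusive" method of the standard library) is an assumption; statistical packages differ in how they interpolate percentiles, so values may not match the published tables exactly.

```python
from statistics import median, quantiles

def describe(values):
    """Return median, quartile deviation, P5 and P95 of a sample."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    return {
        "median": median(values),
        "qd": (q3 - q1) / 2,   # quartile deviation: half the interquartile range
        "p5": cuts[4],         # 5th percentile
        "p95": cuts[94],       # 95th percentile
        "q1": q1,
        "q3": q3,
    }

def classify_outlier(x, q1, q3):
    """Classify a value by its distance from the box, in units of IQR."""
    iqr = q3 - q1
    distance = max(q1 - x, x - q3, 0.0)  # 0 when x lies inside the box
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "inside"

# Hypothetical sample (integers 1..101), for illustration only
summary = describe(list(range(1, 102)))
```

The quartile deviation is reported alongside the median because, as noted above, most parameters were strongly skewed and a mean with standard deviation would be misleading.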

RESULTS
Ambient biological sounds and suspected fish sounds were recorded on all 16 recording days and sometimes formed dense choruses of individual sound emissions that were produced simultaneously and/or overlapped with each other, which obscured the signals so that they could not be discriminated individually, especially before dusk. In addition to some single pulses, individual calls tended to possess a multi-pulse burst structure. The most representative pulse consisted of six oscillations (Fig. 2C). Owing to the single hydrophone methodology, animal localization was not possible in this study. The recorded sound was occasionally clipped, indicating that the source level of the sound was higher than 164 dB (limited by the hydrophone sensitivity). A total of 1,408 calls comprising 18,942 pulses were extracted for statistical analysis and were categorized into 66 call types (Table 1).

Notes.
P50, median; P5 and P95, 5th percentile and 95th percentile, respectively; QD, quartile deviation; Dur, duration; IPPI, inter-pulsepeak interval; τ 95%, duration of 95% cumulative energy; τ −3 dB and τ −10 dB, duration of −3 dB and −10 dB of the peak amplitude of the enveloped signal, respectively; f p, peak frequency; f c, center frequency; BW rms, centralized root-mean-square bandwidth; Q, quality factor; SPL zp and SPL rms, zero-to-peak and root-mean-square sound pressure levels, respectively; EFD, energy flux density; N1, N2 and N3, number of calls, inter-pulsepeak intervals and pulses analyzed, respectively.
The duration is in seconds, the frequency is in Hz, the SPL is in dB re 1 µPa, and the EFD is in dB re 1 µPa² s. The IPIs are not shown here and can be obtained by subtracting 8 ms from the IPPIs. The same notation was used for the following tables.

Figure 7 Representative oscillograms and sonograms of two-section signals in which the first section contains two pulses (2 + N 9 in A and D and 2 + N 18 in G and J), three pulses (3 + N 9 in B and E and 3 + N 17 in H and K) or four pulses (4 + N 9 in C and F and 4 + N 17 in I and L).
Oscillograms in row 1 (A-C) and the corresponding sonograms in row 2 (D-F) are call types with IPPI medians at 9 ms, whereas oscillograms in row 3 (G-I) and their corresponding sonograms in row 4 (J-L) are call types with IPPI medians at 17 ms.

Principal component, discriminant function and hierarchical cluster analyses
The principal component analysis indicated that approximately 81.1% of the variability is explained by the first four principal components (39.2% by principal component 1, 18.1% by principal component 2, 13.2% by principal component 3, and 10.6% by principal component 4). Principal component 1 was loaded with the τ −3 dB , τ −10 dB , f c , BW rms and Q parameters. Principal component 2 was loaded with f p . The third component describes the temporal parameter of the IPPI, and the fourth component describes the temporal parameters of τ −10 dB and the IPPI. The validity of our call types was confirmed using a canonical discriminant function that grouped N 17 , 1 + N 19 , 2 + N 18 and 3 + N 17 (Fig. 9A).
Call types with an analyzed number greater than five were extracted for further discriminant and cluster analyses; 31 call types met this requirement and accounted for 93.82% of all analyzed calls (Fig. S27). Hierarchical clustering using a between-groups linkage method that measures the squared Euclidean distance automatically grouped the 31 extracted call types into five clusters. The N 17, 1 + N 19, 2 + N 18 and 3 + N 17 call types were grouped into one cluster, and i N 13 and i N 15 were grouped together (Fig. 9B). Most of the call types with an IPPI median of 10 ms were grouped together, and those with an IPPI median of 9 ms were grouped together (Fig. 9B).
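To make the clustering step more concrete, here is a minimal Python sketch of agglomerative clustering with between-groups (average) linkage on squared Euclidean distances. It is an illustration only, not the SPSS procedure used in the study: the function names are hypothetical, the feature vectors are invented, and no variable standardization is applied, which a real analysis mixing units such as ms and Hz would normally need.

```python
from itertools import combinations

def sq_euclid(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_linkage(points, n_clusters):
    """Merge clusters until n_clusters remain; return lists of point indices.

    The distance between two clusters is the mean of all pairwise squared
    Euclidean distances between their members (between-groups linkage).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            pair_sum = sum(sq_euclid(points[p], points[q]) for p in ci for q in cj)
            d = pair_sum / (len(ci) * len(cj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    return clusters

# Hypothetical call-type features: (median IPPI in ms, peak frequency in kHz)
features = [(9.0, 1.0), (9.2, 1.1), (10.0, 1.0), (17.0, 0.8), (18.0, 0.8)]
groups = average_linkage(features, n_clusters=2)
```

In this toy example the three short-IPPI vectors end up in one group and the two long-IPPI vectors in the other, analogous to how call types with similar IPPI medians clustered together in Fig. 9B.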

Call occurrence patterns
Almost all call types with median IPPIs of 9 ms for the last section (i.e., call types with median IPPIs of 9 ms except the N 9 call type) were only observed from June 18 to 20, 2014 (Fig. 10). Most of the call types with median IPPIs of 10 ms for the last section (88%, 29 out of 33), except 1 + N 10, (1−) 2 + N 10, 1 + 2 + N 10, and (1−) 3 + N 10, were only observed from May 26 to June 4 and June 21 to 22, 2014 (Fig. 10).

[Dendrogram figure caption, beginning not recovered] ... shows the distance at which the clusters combine. When creating a dendrogram, SPSS rescales the actual distance between the cases to fall into a 0-25 unit range; thus, the last merging step to a one-cluster solution occurs at a distance of 25.

DISCUSSION
Fish sonic muscles are the fastest-contracting vertebrate muscles (Rome & Lindstedt, 1998). Many soniferous fishes produce species-specific sounds by driving their swim bladders with the highly specialized sonic muscles during courtship to aggregate males and females and facilitate successful mating, especially at night and/or in highly turbid water (Fine & Parmentier, 2015; Tavolga, 1964). The spawning-related sounds produced by soniferous fishes have been widely used to identify the timing of spawning and map the areas where spawning occurs (Locascio & Mann, 2011; Turnure, Grothues & Able, 2015). The sound recording period in our study fell within the spawning seasons of a majority of the local fishes because their reproductive behavior was most evident from March through June in the Pearl River Estuary (Sadovy, 1998). The spawning activity of the greyfin croaker (Pennahia anea) occurred from March-April to June (Tuuli, De Mitcheson & Liu, 2011), the spawning season of the spiny-head croaker (Collichthys lucidus) began in March and lasted until December, and the season for Belanger's croaker (Johnius belangerii) was from April to December (Li, Chen & Sun, 2000; Sadovy, 1998).
In the present study, presumed spawning choruses were recorded daily, indicating that the sound recording location is a spawning ground for local soniferous fish. The smallest inter-pulsepeak interval in our study was 8.32 ms, which was longer than and further validated the conservatively defined 8 ms pulse duration. In this study, the call types were categorized primarily by their IPPI patterns rather than the IPPI ranges. Although there was some overlap in the range of IPPIs, N 9 and N 10 (A4 and B4 in Fig. 4 and Fig. S28) and i N 13 and i N 15 (A4 and B4 in Fig. 5) were separated based on the distribution pattern of their IPPIs.

Figure 10 Occurrence pattern of the 66 call types during passive acoustic monitoring periods. Yellow patches in the matrix indicate the corresponding call types (x-axis) observed on that day (y-axis). Call types are clustered according to their median IPPI and the number on the y-axis corresponds to the call type sequence in Table 1. Full-size DOI: 10.7717/peerj.3924/fig-10

Sound comparison of soniferous fish in the PRE
The South China Sea, with at least 2,321 fish species belonging to 35 orders, 236 families and 822 genera (Ma et al., 2008), has long been recognized as a global center of marine tropical biodiversity (Barber et al., 2000) and is one of the richest areas in China, even globally, in terms of its marine fish diversity (Huang, 1994;Ma et al., 2008). More than 834 fish species belonging to 25 orders, 124 families and 390 genera were recorded in the waters near Hong Kong (Ni & Kwok, 1999).

Comparisons with Sciaenid sounds
Fishes of the family Sciaenidae, which are commonly known as croakers or drums, are some of the most well-studied soniferous fish species, and more than 23 species in this family were recorded in the waters near Hong Kong (Ni & Kwok, 1999).

Voluntary sounds
In free-ranging conditions, big-snout croaker (J. macrorhynus) can emit voluntary purr signals with the first and the remaining IPPIs averaging 40.1 ms and 9.7 ms in the field and 35.3 ms and 10.4 ms in a large aquarium, respectively (Table 5) (Lin, Mok & Huang, 2007), which resembles the 1 + N 10 call type in our study (Table 4, Fig. 6A) (note that the IPPI was equal to the summation of the pulse duration and the inter-pulse interval in Lin, Mok & Huang, 2007). In addition, the peak frequency of the pulses in 1 + N 10 (mean ± sd: 1,077 ± 244, N = 1,507) was intermediate between those in the pulses of big-snout croaker purr signals as recorded in the field (mean ± sd: 1,146 ± 131, N = 250) and in a large aquarium (mean ± sd: 1,050 ± 84, N = 60). Additionally, the voluntary dual-knock signal of big-snout croaker with an average IPPI of 36.7 ms and 39.4 ms as recorded in the field and in a large aquarium, respectively (Table 5) (Lin, Mok & Huang, 2007), resembled the 1 + 1 call type in our study with an IPPI of 40.70 ± 4.08 (mean ± sd) (Table S1, Fig. S1B). These matches were further supported by the fact that the peak frequency of the pulses in the 1 + 1 call type (mean ± sd: 1077.75 ± 219.58, N = 126) was close to that of the dual-knock recorded in the field (mean ± sd: 1,133 ± 119, N = 40) or a large aquarium (mean ± sd: 1,135 ± 85, N = 50).
It is possible that J. macrorhynus might emit dual-knock and purr signals in series and thereby create a multiple-section call type. For example, one dual-knock combined with one purr may result in a synthetic three-section call type of 1 + 2 + N 10 (time gap between the two signals equal to 10 ms) or a four-section call type of 1 + 1 + 1 + N 10 (time gap between the two signals over 20 ms). However, in both synthetic signals the third IPPI corresponds to the first IPPI of the purr signal, which averaged 40.1 ms (Lin, Mok & Huang, 2007), so neither can match the 1 + 2 + N 10 or the 1 + 1 + 1 + N 10 call types in our study, both of which had a third IPPI of less than 30 ms (Fig. S7A and Fig. S12B).
Belanger's croaker can emit sounds with the first IPPI much longer than subsequent IPPIs, which follow at regular intervals of approximately 20 ms (Pilleri, Kraus & Gihr, 1982) and resemble the 1 + N 19 call type in our study, although the first IPPI in Belanger's croaker (approximately 40 ms) (Table 5) (Pilleri, Kraus & Gihr, 1982) was smaller than that in the 1 + N 19 call type (median at 71.36 ms) (Table 4, Fig. 6C). Their similarity was further strengthened by the fact that the temporal and frequency characteristics of the signal emitted by Belanger's croaker, which consists of 4-14 pulses with a 140-260 ms call duration, a 500-1,000 Hz peak frequency and a majority of the energy within the 500-4,000 Hz frequency band (Pilleri, Kraus & Gihr, 1982), resemble those of the 1 + N 19 call type, which consists of 3-12 pulses with a 97.37-272.85 ms call duration and a peak frequency median of approximately 789 Hz (Table 4).

Notes.
Except when mentioned, the results are given as the mean or mean ± standard deviation (sd). a denotes results given in a range. b denotes results given for the inter-pulse interval. c denotes results recorded in the field. d denotes results recorded in a large aquarium. e denotes results that are the mean of all the IPPIs except the first IPPI.

Comparison with biological sounds from other passive acoustic monitoring sites
The statistical parameters of the eight types of wild fish sounds recorded in seven estuaries of the west coast of Taiwan using passive acoustics were unfortunately not available, which restricted direct comparison (Mok, Lin & Tsai, 2011). However, the general trend of the 1 + N 10 and 1 + N 12 call types in our study resembles their type B signal (Mok, Lin & Tsai, 2011), in which the first inter-pulse interval is much longer than the following ones and the inter-pulse interval does not increase toward the end of the call, and the N 17 call type in our study resembles their type E signal (Mok, Lin & Tsai, 2011), with a gradually increasing inter-pulse interval toward the end of the call and the sound energy concentrated in discrete bands. Sounds with much longer second or third inter-pulse intervals, which resemble our 2 + N and 3 + N call types, respectively, were also observed in the Chosui River in Taiwan (Mok, Lin & Tsai, 2011), but the sound producer was not identified. Four call types were recorded at three recording sites on the northwestern coast of Taiwan; the call type identical to the purr signal of J. macrorhynus dominated the soundscape and was the most abundant call type at these sites (Huang, 2016). The waveform of call type T3 resembles our call types of i N 13 and i N 15 (Huang, 2016).

Occurrence pattern of call types
In the field environment, to communicate without misinterpreting messages and to avoid jamming, different species of a fish community will partition the underwater acoustic environment (Ruppé et al., 2015). In our study, most call types with IPPI medians at 9 ms and 10 ms were observed at times that were exclusive from each other, suggesting they might have been produced by different species.
The spotted seatrout (Cynoscion nebulosus) is one of the few sciaenid species that produces as many as four types of call (Mok & Gilmore, 1983). It is likely that most sciaenid species have fewer call types. Of the 66 call types recognized at the survey site, some might come from the same species. The cluster analysis revealed five clades. However, it is still too early to hypothesize that these groups represent the call repertoires of five species. Additional studies under more controlled conditions, such as in an aquarium or with field recordings paired with a high-definition sonar system such as the Dual-frequency Identification Sonar (DIDSON), will be required to identify the species producing the calls in our study.

Call trains
Due to the relative simplicity of their vocal mechanisms and their inability to produce complex calls, fish typically emit sounds that vary in temporal and/or frequency patterning (Rice & Bass, 2009). As most of the call types were identified based on the number of sections and the repetition of the anterior section, it is likely that a species can produce several call types by varying the anterior sections of the call in response to variable external stimuli. Additionally, the temporal and spectral characteristics of fish signals are involved in information coding and are important parameters for the recognition of sound in fishes (Malavasi, Collatuzzo & Torricelli, 2008;Spanier, 1979). In the present study, fish sounds tended to be frequency modulated, e.g., the peak frequency of the pulses within a call was variable (Fig. 2F), and amplitude modulated, e.g., the i N 13 and i N 15 call types. This is possible because the amplitude of the sound is determined by the swim bladder (Fine et al., 2001;Tavolga, 1964) and the dominant frequency of the signal is determined by the sonic muscle twitch duration and the forced response of the swim bladder to sonic muscle contractions rather than the natural resonant frequency of the swim bladder (Connaughton, Fine & Taylor, 2002). Additionally, the length of the sonic muscle fibers is related to the body size of the fish (Parmentier & Fine, 2016).

Passive hearing by the dolphin
The Pearl River Estuary shelters the world's largest known population of Indo-Pacific humpback dolphins (Chen et al., 2010;Jefferson & Smith, 2016;Preen, 2004), with an estimated population of 2,637 (coefficient of variation of 19% to 89%) (Chen et al., 2010;Jefferson & Smith, 2016). The general preference of this species for estuarine habitats and its coastal, shallow-water (<30 m depth) distribution make it susceptible to the impacts of human activity (Jefferson & Smith, 2016). The current conservation status of the humpback dolphin (locally known as the Chinese white dolphin) meets the IUCN Red List criteria for classification as Vulnerable; however, conservation management across most of its distribution range is severely inadequate, and the humpback dolphin population in the Pearl River Estuary is declining by 2.5% annually (Karczmarski et al., 2016).
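To put the 2.5% annual decline in perspective, the short calculation below projects the population under a constant geometric decline. It is only an illustrative sketch: it assumes that the rate stays constant over time and that the abundance estimate and the decline rate, which come from different studies, can be combined. The function names are hypothetical.

```python
import math

ANNUAL_DECLINE = 0.025  # 2.5% per year (Karczmarski et al., 2016)
N0 = 2637               # abundance estimate cited above (Chen et al., 2010)


def projected_population(n0, years, rate=ANNUAL_DECLINE):
    """Population after `years` under a constant annual decline: n0 * (1 - rate)^years."""
    return n0 * (1.0 - rate) ** years


def halving_time(rate=ANNUAL_DECLINE):
    """Years needed for the population to halve under a constant annual decline."""
    return math.log(0.5) / math.log(1.0 - rate)


print(round(halving_time(), 1))            # roughly 27 years at 2.5% per year
print(round(projected_population(N0, 10)))  # projected abundance after a decade
```

Under these assumptions the halving time depends only on the decline rate, not on the starting abundance, so the result is insensitive to the wide coefficient of variation of the abundance estimate.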
In addition to emitting high-frequency pulsed sounds for echolocation and navigation, humpback dolphins can produce narrow-band, frequency-modulated whistles for communication, with a fundamental frequency range of 520-33,000 Hz (Wang et al., 2013) and apparent source levels of 137.4 ± 6.9 dB re 1 µPa in rms (Wang et al., 2016). The fish sounds recorded in this study, which were characterized by a peak frequency between 500 and 2,600 Hz and a maximum zero-to-peak sound pressure level greater than 164 dB, fell almost entirely within the frequency range of humpback dolphin whistles. It is highly probable that the fish sounds function as acoustic cues of prey for the dolphin, i.e., the dolphin relies heavily on passive hearing during the search phase of the foraging process. Moreover, the brackish water species C. lucidus and tapertail anchovy (Coilia mystus, Family: Engraulidae) were the two most abundant species in the seawater/freshwater mixing zones of the Pearl River Estuary (Zhan, 1998), accounting for 89% and 72% of the numbers and biomass, respectively, of the whole fish stock in the Pearl River Estuary region (Wang & Lin, 2006). However, whereas the soniferous fish C. lucidus was observed to be the second-most important prey of the humpback dolphin, the non-soniferous fish C. mystus was not identified in its prey spectrum (Barros, Jefferson & Parsons, 2004). This observation further supports the hypothesis that the local humpback dolphin relies on passive hearing.
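The comparison between the two frequency bands quoted above can be checked with a simple interval intersection. This is a minimal sketch using only the band limits reported in the text; the function and variable names are hypothetical, and treating each band as a closed interval is a simplifying assumption.

```python
def band_overlap(band_a, band_b):
    """Return the (low, high) intersection of two frequency bands in Hz, or None."""
    low = max(band_a[0], band_b[0])
    high = min(band_a[1], band_b[1])
    return (low, high) if low < high else None


FISH_PEAK_HZ = (500.0, 2600.0)             # peak frequency range of the fish pulses
WHISTLE_FUNDAMENTAL_HZ = (520.0, 33000.0)  # whistle fundamentals (Wang et al., 2013)

shared = band_overlap(FISH_PEAK_HZ, WHISTLE_FUNDAMENTAL_HZ)
# Fraction of the fish peak-frequency band that lies inside the whistle band.
fraction = (shared[1] - shared[0]) / (FISH_PEAK_HZ[1] - FISH_PEAK_HZ[0])

print(shared)              # (520.0, 2600.0)
print(round(fraction, 3))  # 0.99
```

Only the lowest 20 Hz of the fish peak-frequency band falls below the reported lower limit of whistle fundamentals. Overlap with the whistle band is used here only as a proxy, since the dolphin's hearing range is not given in the text.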

Importance and application
The high biodiversity of the fish fauna dwelling in the Pearl River Estuary is a treasure trove of genetic resources with great potential application value. However, the loss of fishery stocks over time has been devastating. Historically poor management and overfishing of wild stocks of the large yellow croaker resulted in overwhelming collapses throughout its geographic range (Liu & Sadovy, 2008). Substantial funds have been provided and many remedial actions, such as fishery control, restocking and marine aquaculture, have been applied; however, aquaculture can only supplement, rather than substitute for, wild fisheries (Goldburg & Naylor, 2005). No evidence of recovery in the wild stock of large yellow croaker has been observed, and its genetic diversity continues to decrease (Liu & Sadovy, 2008). Similar lessons can be learned from the Atlantic salmon (Salmo salar) (Goldburg & Naylor, 2005). Given the sharp declines in fish stocks in the Pearl River Estuary owing to overfishing, especially of the larger species of croakers, and given that fishing pressure is still high and may become even higher in the future, management activities such as more effective fishing moratoriums should be applied to protect the remaining croakers and other fisheries during the spawning season, especially at their spawning grounds. The baseline data on ambient biological acoustics in our study represent a first step toward mapping the spatial and temporal patterns of soniferous fishes and are helpful for the protection, management and effective utilization of fishery resources. In addition, since marine environmental impact assessment must be based upon a good understanding of the local biodiversity, the baseline data on suspected fish sounds in our study can facilitate the evaluation of the impacts of various infrastructure projects on local aquatic environments by comparing the baseline to post-construction and/or post-mitigation effort data.
Additionally, there is a large body of evidence that the distribution pattern of marine mammals tends to be correlated with the spatial-temporal variability of their prey (Benoit-Bird & Au, 2003;Wang et al., 2015a;Wang et al., 2014a); this correlation was also proposed for the vulnerable local humpback dolphin (Wang et al., 2015b), and the fine-scale distribution pattern of soniferous fishes can aid in the conservation of these emblematic dolphins.

CONCLUSION
Using passive acoustic monitoring, the ambient biological sounds in the Pearl River Estuary were recorded and analyzed. In addition to single pulses, the sounds tended to have a pulse train structure with a peak frequency between 500 and 2,600 Hz and most of the energy below 4,000 Hz. Sixty-six call types were identified based on the number of sections, temporal characteristics and amplitude modulation patterns. Most of the call types with IPPI medians of 9 ms and those with medians of 10 ms were observed at mutually exclusive times, suggesting that they might be produced by different species. A literature review suggested that the 1 + 1 and 1 + N 10 call types might belong to big-snout croaker (J. macrorhynus) and that 1 + N 19 might be produced by Belanger's croaker (J. belangerii). The baseline data on suspected fish sounds in our study can facilitate the evaluation of the impacts of various infrastructure projects on local aquatic environments by comparing the baseline to post-construction and/or post-mitigation effort data, and the fine-scale distribution pattern of soniferous fishes can aid in the conservation of the local vulnerable humpback dolphins.