Our high-frequency suspended sediment samples were collected at approximately 2-hr intervals during a spring storm that took place between March 29 and 31, 2017 (ordinal days 88.46–90.41). Samples were collected near Coralville, at the outlet of Clear Creek, a small agricultural watershed in Iowa, using an autosampler installed adjacent to the USGS stream gage (USGS 05454300). Clear Creek was one of the Intensively Managed Landscapes Critical Zone Observatory (IML-CZO) study sites. A total of 24 storm samples were collected, and 36 biomarkers were identified and quantified in each sample using TMAH-thermochemolysis sample preparation followed by GC-MS analysis. Biomarker concentrations were obtained by normalizing the mass of each biomarker in a sample to the mass of OC in the sample and to an internal standard. The %OC of each sample was obtained from elemental analysis. Detailed procedures of the chemical analyses and information about the quantified biomarkers are available in our previous publications [22, 34].
A matrix of 24 samples \(\times\) 36 biomarkers was imported and scaled by biomarker, so that the concentrations of each biomarker were scaled across the 24 samples. After scaling, the matrix was transposed to place the time-series samples in columns and was then used to construct a biomarker heatmap. Hierarchical clustering analysis (Jain et al., 1999) was performed on the biomarkers (rows), while the samples were kept in the order of their collection time in the columns of the heatmap, without clustering (Fig. 1). In contrast to conventional data plots, such as line graphs displaying the concentration profiles of individual biomarkers, this biomarker heatmap provides a comprehensive overview of biomarker concentration variations. The concentration variations can thus be recognized effectively and intuitively as ‘patterns’.
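The processing steps described above (scale by biomarker, transpose, cluster the biomarkers only) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code used in this study: the toy matrix, the biomarker names, the z-score scaling, the Euclidean distance, and the average-linkage rule are all assumptions made for the example, since the text does not specify them.

```python
from math import dist
from statistics import mean, pstdev

# Toy matrix: rows = samples in collection order, columns = biomarkers.
# Values and names are invented for illustration, not data from the study.
biomarkers = ["early_a", "early_b", "late_a", "late_b"]
samples_by_biomarker = [
    [9.0, 8.0, 1.0, 2.0],
    [7.0, 9.0, 2.0, 1.0],
    [4.0, 5.0, 5.0, 4.0],
    [2.0, 1.0, 8.0, 9.0],
    [1.0, 2.0, 9.0, 7.0],
]


def scale_by_biomarker(matrix):
    """Z-score each biomarker (column) across samples (assumed scaling).

    The scaled columns are returned as rows, which is the transpose:
    biomarkers in rows, time-ordered samples in columns.
    """
    scaled_rows = []
    for column in zip(*matrix):
        mu, sigma = mean(column), pstdev(column)
        scaled_rows.append([(value - mu) / sigma for value in column])
    return scaled_rows


def average_linkage(rows, n_clusters):
    """Agglomerative clustering of rows; returns lists of row indices."""
    clusters = [[i] for i in range(len(rows))]

    def cluster_distance(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return mean(dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: cluster_distance(*pair),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


# Rows (biomarkers) are clustered; columns (samples) keep their time order.
heatmap_rows = scale_by_biomarker(samples_by_biomarker)
groups = average_linkage(heatmap_rows, n_clusters=2)
named_groups = sorted(sorted(biomarkers[i] for i in group) for group in groups)
print(named_groups)
```

Only the row order is changed by the clustering, so any temporal pattern in the columns remains directly readable along the time axis of the heatmap.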
The clustering analysis results show three biomarker clusters with different activation periods, during which their normalized concentrations are at their maxima (Fig. 2). For example, the first cluster of biomarkers (cluster 1 in Fig. 2) showed peak concentrations in the early portion of the storm, and the color progressively shifted to white and then to blue for the later samples. This indicates that the source of this group of biomarkers was mobilized in the early stage of the storm and was either exhausted or overprinted by other sources in the later stage of the storm. The second cluster of biomarkers peaked later. The transition between the two sources occurred between ordinal days 88.74 and 88.91, a period of approximately 4 hours. After the peak of the second biomarker cluster, the next transition between sources occurred more gradually, over a period of ~12 hours (between ordinal days 89.24 and 89.74). Later in the storm, a third biomarker cluster dominated the POC composition, while the other two clusters were relatively low in concentration.
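The transition durations quoted above follow directly from the fractional ordinal days. A short sketch of the arithmetic is given below; it assumes that ordinal day 1.0 corresponds to 00:00 on January 1, 2017, which is consistent with the storm dates given earlier but is not stated explicitly, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def ordinal_day_to_datetime(year, ordinal_day):
    """Convert a fractional ordinal day to a datetime.

    Assumes day 1.0 is 00:00 on January 1 of the given year.
    """
    return datetime(year, 1, 1) + timedelta(days=ordinal_day - 1)


def span_hours(start_day, end_day):
    """Length of an interval between two ordinal days, in hours."""
    return (end_day - start_day) * 24.0


# Transition between the first and second biomarker clusters.
first_transition = span_hours(88.74, 88.91)
# Transition between the second and third biomarker clusters.
second_transition = span_hours(89.24, 89.74)

# Calendar dates of the first and last ordinal days reported for the storm.
storm_start = ordinal_day_to_datetime(2017, 88.46)
storm_end = ordinal_day_to_datetime(2017, 90.41)

print(round(first_transition, 2), round(second_transition, 2))
print(storm_start.date(), storm_end.date())
```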
Biomarkers in the first cluster include short- (C12-14) to mid-chain (C16-24) even-carbon-number saturated fatty acids and unsaturated C16 and C18 fatty acids. These biomarkers could be derived from multiple sources such as algae, bacteria, and some vascular plants [28]. These sources may be further resolved by looking for the presence of additional compounds. For example, vascular plant inputs are typically characterized by high ratios of long-chain fatty acids (C26 or longer saturated fatty acids) to short-chain fatty acids (C12-14) and by the presence of vascular plant-specific biomarkers such as lignin. Activation of the first cluster coincided with lower concentrations (blue) of long-chain fatty acids and lignin-derived compounds. We therefore suggest that POC from in-channel algal and microbial communities was activated first.
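As an illustration of the ratio mentioned above, the following sketch computes a long-chain to short-chain fatty acid ratio using the chain-length ranges given in the text (C26 or longer versus C12-14). The function name, the dictionary layout, and the concentration values are hypothetical and are not measurements from this study.

```python
def long_to_short_ratio(fatty_acids):
    """Ratio of long-chain (>= C26) to short-chain (C12-14) fatty acids.

    `fatty_acids` maps carbon number to the concentration of the
    corresponding saturated fatty acid (any consistent unit).
    """
    long_chain = sum(c for n, c in fatty_acids.items() if n >= 26)
    short_chain = sum(c for n, c in fatty_acids.items() if 12 <= n <= 14)
    if short_chain == 0:
        raise ValueError("no short-chain fatty acids detected")
    return long_chain / short_chain


# Invented example values: an early-storm sample rich in short-chain acids
# and a later sample rich in long-chain acids.
early_sample = {12: 4.0, 14: 6.0, 16: 20.0, 18: 10.0, 26: 1.0, 28: 1.0}
late_sample = {12: 1.0, 14: 1.0, 16: 8.0, 18: 5.0, 26: 6.0, 28: 4.0}

early_ratio = long_to_short_ratio(early_sample)
late_ratio = long_to_short_ratio(late_sample)
print(early_ratio, late_ratio)
```

A low ratio, as in the invented early sample, would be in line with a predominantly algal or microbial source, whereas a high ratio would point toward vascular plant material.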
The second and third clusters were more consistent with vascular plant inputs. Our biomarker heatmap divides what were presumed to be allochthonous sources into two groups (Fig. 2; clusters 2 and 3). Cluster 2 includes long-chain saturated fatty acids, bacterial fatty acids, and phytosterols. Cluster 3 includes lignin phenols and cutin acids. Long-chain fatty acids, lignin phenols, phytosterols, and cutin acids are all typical biomarkers of vascular plants and are used as molecular indicators of allochthonous sources in aquatic systems (Canuel and Hardison, 2016; Hatten et al., 2012). However, our heatmap indicates that there are at least two time-resolved vascular plant sources with different distributions of biomarkers. Hypotheses that arise from this observation are that 1) different sources or different parts of a source (e.g., leaves, stems) were mobilized and introduced into the system sequentially, 2) materials in different diagenetic (degradation) states, with altered biomarker distributions, were delivered at different times, and/or 3) hydrodynamic sorting separated sources with different biomarker distributions during transport. Separation and preferential transport of different OC sources by hydrodynamic sorting was observed by Schmidt et al. (2010) in marine sediments.
The clustering technique may ultimately provide clues concerning the sources of compounds with uncertain origins. As an example, hydroxybenzoic acids (Bds) have been proposed as indicators of the oxidation of soil organic carbon (SOC) and used as tracers of soil input, though the exact origins of the compounds remain unclear [35]. 3,5-dihydroxybenzoic acid (3,5-bd) and meta-hydroxybenzoic acid (m-bd) are the most often used soil indicators [35, 21, 36]. In our heatmap, m-bd and 3,5-bd, as well as ortho-hydroxybenzoic acid (o-bd) and para-hydroxybenzoic acid (p-bd), are distributed among all three clusters rather than grouped together (Fig. 2), suggesting that different sources may be responsible for each compound. Further research is needed to identify potential precursors of each compound within the clusters. This example, in which we use clustering to investigate the origin of compounds with poorly documented sources, demonstrates the power of the clustering approach.
The sequence of in-channel algal activation followed by vascular plant inputs during storm events is commonly observed in rivers and streams [22, 12, 37]. The observations of separate vascular plant inputs and of the time-resolved behavior of the putative soil indicators are novel and were made possible by the combination of high-temporal-resolution sampling, broad-spectrum biomarker analyses, and the heatmap-cluster data processing approach. What has typically been treated as an integrated mixture may in fact have a complex temporal structure.