The use of multivariate PCA dataset in identifying the underlying drivers of critical stressors, looking at global problems through a local lens

Palynology-based multivariate datasets comprising geological, ecological, and geochemical data were used to identify the relative importance of the underlying drivers of critical stressors to coastal wetlands by distinguishing between fluvial flooding, saline water intrusion, delta switching, and the landward migration of coastal plants. A sediment core was retrieved using a vibracorer from an intermediate marsh in Lake Salvador, Louisiana, USA. X-ray fluorescence (XRF) quantified fluvial and marine elemental concentrations (Cl, Sr, Ca, Mn, K, Ti, Fe, Zn, Zr, Br). Palynology-based agglomerative hierarchical analysis of thirty-two pollen taxa was employed to define ecological clusters. Applying multivariate principal component analysis (PCA) to the geochemical and ecological variables made it possible to infer the source of sedimentary material by correlating four taxonomic groups (floodplain trees, upland trees, tidal freshwater herbs, and inland herbs) with specific geochemical signatures. It also facilitated the testing of potential correlations between geological and hydrological conditions and the six ecosystems (interdistributary, delta-plain, deltaic lake, bottomland and swamp forests, freshwater marsh, and intermediate marsh) depicted in each PCA biplot. The PCA scores quantified the relative importance of multiple variables. The squared cosine (cos2), which measures the quality of representation of a variable on the principal components, was used to assess how well each variable is represented on the principal component biplots. Multivariate statistical datasets can be valuable to any scientist working across the spectrum of environmental and planetary science fields as a means of identifying the relative importance of diverse background parameters in controlling ecological and environmental conditions. This methodology is applicable across both natural and social sciences as a means of distinguishing natural and anthropogenic impacts.

Subject: Earth and Planetary Science
Specific subject area: Multivariate PCA data set identifying mechanisms (e.g., subsidence, eustatic sea-level rise, tropical cyclones) driving coastal environmental changes
Type of data: Table, Figure
How the data were acquired: Sediment coring: A 448 cm long sediment core (LSWMA) was collected using a vibracorer.

Value of the Data
• Multivariate (ecological, geological, and chemical) statistical data, designed for coastal and environmental studies, provide valuable information for distinguishing the relative importance of background parameters in controlling environmental conditions, which can then be used to infer the responsible external forcing agents.
• Multivariate statistical datasets can be useful to any scientist working in environmental and planetary science fields as a means of identifying the relative importance of background parameters in controlling ecological and environmental conditions. This methodology is applicable across both natural and social sciences as a means of distinguishing natural and anthropogenic impacts.
• The datasets can be used in all scientific fields dealing with such pressing global problems as oceanic transgression, species extinction, global warming, and the abundance of plastic trash. They provide a starting point and points of attack for devising solutions. This approach can infer hidden forcing agent(s) if the relevant dataset provides sufficient temporal and/or spatial coverage.

Experimental Design, Materials and Methods
The datasets and data frames were devised to identify the external forcing agent(s) driving each ecological transition and to assess whether these mechanisms can be applied to current and future conditions in coastal environments, a nexus of such globally important environmental and societal stresses as coastal erosion, oceanic transgression, infrastructural vulnerability, human displacement, flooding, saline water intrusion, landward migration of coastal plants, and species extinction. Although these negative impacts are generally global, solutions are not, as wide variability in local conditions requires site-specific approaches. In any specific location, the first step is to identify the proximate cause of the stressor, which can be driven by a variety of external factors such as eustatic sea-level rise (SLR), subsidence, climatic shifts, anthropogenic activities, extreme events, and fluvial dynamics (river course changes), all of which leave potentially identifiable ecological and geochemical imprints in the sedimentary record. This methodology was created to identify the specific external forcing agent(s) driving environmental change at a specific site at a specific time. This is accomplished through multivariate Principal Component Analysis (PCA) of sedimentary data (ecological and geochemical). The methodology involves the following steps.
Step 1. Sample collection and modern vegetation database. The sampling site was carefully chosen after a series of reconnaissance surveys in order to minimize sedimentary disturbance associated with anthropogenic activities (e.g., canal dredging). Plants around the study site were surveyed extensively in order to establish modern pollen and vegetation databases.
Step 2. Creating data frames. In order to perform the multivariate statistical tests, pollen data (Table 1) were matched with the core-log geochemical datasets, creating an ecological and geochemical data frame of eighty-two rows (Table 2). Based on the agglomerative hierarchical procedure [7,8], six ecosystems (interdistributary, delta-plain, deltaic lake, bottomland and swamp forests, freshwater marsh, and intermediate marsh) were determined (Table 3). The "data.table()" function was employed to assign the six ecosystems to the core-log datasets, yielding eighty-two individuals (rows).
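The data-frame construction in Step 2 can be illustrated with a minimal sketch. It is written in plain Python rather than the R workflow used for the datasets. All depths, values, column names, and ecosystem assignments below are hypothetical placeholders, not entries from Tables 1 to 3. The helper "build_frame" is likewise an illustrative name. The sketch assumes that pollen and geochemical records share a depth key.

```python
# Hypothetical records: depths (cm), pollen counts, and element values are
# invented for illustration and are NOT taken from Tables 1-3.
pollen = [
    {"depth_cm": 10, "floodplain_trees": 12, "inland_herbs": 40},
    {"depth_cm": 20, "floodplain_trees": 30, "inland_herbs": 15},
    {"depth_cm": 30, "floodplain_trees": 45, "inland_herbs": 5},
]
geochem = [
    {"depth_cm": 10, "Cl": 0.9, "Ti": 0.2},
    {"depth_cm": 20, "Cl": 0.4, "Ti": 0.5},
    {"depth_cm": 30, "Cl": 0.1, "Ti": 0.8},
]
# Hypothetical ecosystem label per depth, standing in for the cluster output.
ecosystem = {10: "intermediate marsh", 20: "freshwater marsh", 30: "deltaic lake"}


def build_frame(pollen_rows, geochem_rows, labels):
    """Inner-join pollen and geochemical rows on depth and attach a label."""
    geo_by_depth = {row["depth_cm"]: row for row in geochem_rows}
    frame = []
    for row in pollen_rows:
        depth = row["depth_cm"]
        # Keep only depths present in all three sources.
        if depth in geo_by_depth and depth in labels:
            frame.append({**row, **geo_by_depth[depth], "ecosystem": labels[depth]})
    return frame


frame = build_frame(pollen, geochem, ecosystem)
for record in frame:
    print(record)
```

Each resulting record corresponds to one individual (row) in the combined frame, with the ecosystem label available as a categorical variable for grouping in later steps.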
Step 3. Multivariate PCA technique. The multivariate statistical technique facilitated the testing of potential correlations between the marine (Cl, Sr, Ca) and terrestrial (Mn, K, Ti, Fe, Zn, Zr, Br) elements and the four taxonomic groups (floodplain trees, upland trees, tidal freshwater herbs, and inland herbs) using R packages (e.g., "factoextra", "FactoMineR") [6]. Because the datasets are on different scales, they were standardized by means of the "scale()" function [12]. Eigenvalues, which measure the amount of variation retained by each principal component, were computed with "get_eigenvalue()" and visualized with "fviz_eig()" [12]. The representation of each variable on the principal component biplots was estimated with the squared cosine function [13]. The environmental information in the geological and geochemical datasets corresponds to the total variation captured by the PCA scores, which were visualized graphically in biplots. An ellipse function [14] was used to plot confidence ellipses around group mean points.
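The standardization, eigenvalue, and squared-cosine computations in Step 3 can be sketched as follows. This is a language-neutral illustration in plain Python, not the R code used for the datasets. The input values are hypothetical, and the function names ("standardize", "jacobi_eigen", "variable_cos2") are illustrative. The sketch assumes a correlation-based PCA with sample standard deviations; details such as the divisor used for the standard deviation may differ between software implementations.

```python
import math


def standardize(rows):
    """Center each column and divide by its sample standard deviation (n - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix computed from already-standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues (descending) and eigenvectors (columns) of a symmetric matrix."""
    p = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle that zeroes a[i][j], kept within [-pi/4, pi/4].
                diff = a[j][j] - a[i][i]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[i][j] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # A <- A J (columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # A <- J^T A (rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
                for k in range(p):  # V <- V J (accumulate eigenvectors)
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i], v[k][j] = c * vki - s * vkj, s * vki + c * vkj
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    eigvals = [a[k][k] for k in order]
    eigvecs = [[v[r][k] for k in order] for r in range(p)]
    return eigvals, eigvecs


def variable_cos2(eigvals, eigvecs):
    """cos2 of variable j on component k: (sqrt(eigenvalue_k) * loading_jk) ** 2."""
    return [[(eigvecs[j][k] ** 2) * eigvals[k] for k in range(len(eigvals))]
            for j in range(len(eigvecs))]


# Hypothetical rows: columns are Cl, Ti, and a floodplain-tree pollen percentage.
data = [
    [0.9, 0.2, 12.0],
    [0.7, 0.3, 20.0],
    [0.4, 0.5, 31.0],
    [0.3, 0.6, 28.0],
    [0.1, 0.8, 45.0],
    [0.2, 0.7, 40.0],
]
z = standardize(data)
eigvals, eigvecs = jacobi_eigen(correlation_matrix(z))
cos2 = variable_cos2(eigvals, eigvecs)
explained = [100.0 * lam / sum(eigvals) for lam in eigvals]
print("eigenvalues:", eigvals)
print("percent of variance:", explained)
print("cos2 (rows = variables, columns = components):", cos2)
```

For standardized data the eigenvalues sum to the number of variables, and the cos2 values of each variable sum to one across all components, which provides a quick sanity check on any implementation.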
Step 4. Biplot interpretation. The biplots (SI1 to SI6) display the correlations between ecological and geochemical variables. The ellipses of the categorical variables visualize the relationships between the six ecosystems and hydrological conditions, with a larger ellipse indicating greater variability within that ecosystem [14]. The horizontal Dim1 axis represents the first principal component, while the orthogonal Dim2 axis represents the second principal component. The PCA values elucidate the relative importance of the variables, as they reduce the dimensionality of the multivariate data to a small number of dimensions that capture distinct aspects of dataset variability. The representation of each variable in the biplots indicates how well that variable is captured by the displayed components. The distance of a variable from the biplot center correlates positively with its relative importance.
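The link between a variable's distance from the biplot center and its squared cosine can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the original: \lambda_k is the k-th eigenvalue, v_{jk} is the loading of variable j on component k (unit-length eigenvector), and the PCA is assumed to be computed on standardized variables with the usual correlation-circle scaling. Plotting software may rescale variable arrows for display, so the relation holds up to that scaling.

```latex
\[
r_{jk} = \sqrt{\lambda_k}\, v_{jk}, \qquad
\cos^2_{jk} = r_{jk}^{2}, \qquad
d_j = \sqrt{r_{j1}^{2} + r_{j2}^{2}} = \sqrt{\cos^2_{j1} + \cos^2_{j2}} \le 1 .
\]
```

Here r_{jk} is the coordinate of variable j on component k (equal to the correlation between the variable and the component), and d_j is its distance from the origin in the Dim1-Dim2 plane. A variable lying close to the unit circle is therefore well represented by the first two components, whereas a variable near the center is mostly explained by other components.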
These multivariate PCA datasets facilitate millennial-scale paleoenvironmental reconstructions, including hydrological and ecological analyses supported by accurate dating techniques (e.g., low-energy germanium γ-spectrometer and AMS radiocarbon [10]). The detailed information regarding hydrology, geology, vegetation, and soil chemistry over a large swath of the Louisiana coast for an extended time frame will provide reliable insights into plant migration, relative sea-level rise, salinity variability, and climate change (SI6), thus providing significant datasets for future wetland studies.