Temporal series analysis of abiotic data for a subtropical Brazilian rocky shore

Monthly data of abiotic variables (sea surface temperature; minimum and maximum air temperature; minimum, mean and maximum air humidity; minimum, mean and maximum atmospheric pressure; minimum, mean and maximum dew point; sea surface salinity; wind speed and direction; minimum and maximum tidal level and photosynthetically available radiation) were collected from different online repositories, all regarding the period between January 2013 and December 2017, from localities near Mar Casado Beach rocky shore, in São Paulo State southern coast, Brazil. Principal Component Analysis was performed to verify data variance and correlations among variables. Linear regression decomposition methods were applied to identify trend and seasonal patterns within the time indexed data. Deseasonalized time series were analyzed to identify structural breaks in trend patterns. Spectral analysis was applied to detrended time series.


Data
Intertidal rocky shore communities are daily and seasonally exposed to a variety of abiotic stressors, such as variations in tidal level, air and sea water temperature, solar radiation and desiccation rates. These abiotic factors are known to affect survival and reproduction of organisms inhabiting the intertidal zone of rocky shores [1]. Although there are monitoring initiatives of marine communities for the Brazilian coast (e.g., ReBentos monitoring sites [2]), the monitoring of abiotic factors is still scarce. In an attempt to supplant this lack, we gathered historical data available in different online databases, all concerning Brazilian southeast coast, specifically São Paulo State southern coast (Table 1), from localities near Mar Casado Beach rocky shore. Thus, biotic responses to environmental factors may be studied in a temporal context.
A principal component analysis of quarterly data (as explained further in section 2.2) recovered two distinct seasons: (I) a hot and moist season from October to March (encompassing Spring and Summer) and (II) a cold and dry season from April to September (encompassing Autumn and Winter); an expected result for subtropical zones [3] (Fig. 1). Principal Components (PC) 1 and 2 explain 56.4% of data variance. The plot shows only non-covariant variables identified after a first-round PCA considering all 18 variables, in which SST co-varied with maximum and minimum air temperature and also maximum, mean and minimum dew point. Mean atmospheric pressure co-varied with maximum and minimum values of this same abiotic variable. Also, mean air humidity co-varied with maximum and minimum values of this same abiotic variable.
Time series modelling was carried using only nine of the original 18 abiotic variables (SST, mean air humidity, PAR, mean atmospheric pressure, maximum and minimum tide level, wind speed, wind direction and SSS), as the remaining variables co-varied through time with some of the modelled data. Fig. 2 shows trend, seasonal variation and residuals decomposition for eight variables, since trend and seasonal features for wind direction time series could not be deterministically modelled, a result indicating that wind direction might be stochastic.
Structural breaks in trend pattern were accessed through deseasonalized time series. These structural breaks are points in time in which statistical patterns change, highlighting sections of time when an increasing or a decreasing trend is more pronounced [4]. Fig. 3 shows deseasonalized time series, overlaid to their structural breaks (vertical lines) and increasing or decreasing trend for each time interval. No breakpoints were identified in minimum tide level time series, consistent with time series decomposition (Fig. 2F)  breakpoints were identified in PAR time series. While no deterministic trend could be identified in mean atmospheric pressure time series (Fig. 2D), a breakpoint could be identified (Fig. 3C) although the confidence interval stands outside the observation time interval. A confidence interval problem also occurred in maximum tide level structural break analysis (Fig. 3D). We expect that the number of breaks in a biotic variable time series (for example, the percentual cover of barnacles on the studied rocky shore) would be similar to those of a correlated abiotic variable. In order to understand the periodicity of each abiotic factor, spectral analysis was applied to detrended time series. Significant Fourier frequencies at 1.0, corresponding to a quarterly rate, were observed in SST, PAR, mean atmospheric pressure and maximum tide level time series (Fig. 4AeD, respectively). Significant Fourier frequencies at 2.0, corresponding to a semestral rate, were observed in mean air humidity, minimum tide level and wind speed time series (Fig. 4EeG, respectively). A significant Fourier frequency at 1.3, corresponding to an approximate 4 months rate, was observed in SSS time series (Fig. 4H).

Data mining
Monthly abiotic data were collected from different online repositories, all regarding the period between January 2013 and December 2017, from localities near Mar Casado Beach rocky shore, in São Paulo State southern coast, Brazil. Different sources were used for the construction of this dataset as different monitoring initiatives presented specific monitoring resolutions, and there was no database that assembled all the abiotic data of interest for the same site.
Air temperature ( C), air humidity (%), atmospheric pressure (Hg mm) and dew point ( C) daily data were collected from WeatherUnderground [5] for Santos Air Base and missing points in time were filled with data from Instituto Nacional de Pesquisas Espaciais [6] for Itanha em, São Paulo, Brazil. Data for SST (ºC) was extracted from the NOAA OI daily SST with a 0.25-degree resolution model [7] for the coordinates À23.875, À46.125. Monthly data for PAR (E/m 2 s ¼ 1 mol of photons m À2 s À1 [SI]) was collected from NASA Ocean Color MODIS Terra satellite data bank [8] for the coordinates À23.979170, À46.187496. Monthly data for SSS (PSU ¼ g/kg [SI]) were collected and assembled from NASA Aquarius [9] with a 1-degree resolution and Remote Sensing Systems [10] with a 0.25-degree resolution for the coordinates À24.5, À46.5 and À24.875, À46.625, respectively. Wind direction (º) in relation to the parallels and wind speed (Km/h) data were collected from Blended daily averaged 0.25-degree Sea Surface Winds assembly (at a 10 m level), provided by NOAA [11], for the coordinates À24.00, À45.75. Daily tidal level (m) data was collected from the Tide Board of Marinha do Brasil [12] for the Port of Santos. Averaging was applied to daily data in order to build a monthly database.

Principal component analysis
In order to identify co-variant abiotic variables, a Principal Component Analysis (PCA) was carried using the software R-0.3.5.0 [13]. Pre-treatment of the data before PCA was averaging the sampled months of January to March (Summer); April to June (Autumn); July to September (Winter) and October to December (Spring), centering (subtracting the mean value of each variable) and scaling (dividing each variable by its standard deviation) [14]. Afterwards, PCA was performed using the function princomp from the stats R package [13].

Time series analysis
A time series is a collection of observations indexed by time. The main features of a time series are trend and seasonal variations that can be modelled deterministically in function of time [15]. Each abiotic variable collected was treated as a continuous quarterly time series, using the average (as explained in section 2.2). While decomposing a time series, its trend and seasonal features might interact additively or multiplicatively. Once identified the relation of seasonal and trend variations, and log transforming the data if a multiplicative relation is observed, a linear regression (considering higher order polynomial in function of time and seasonal variations) was used to find a model that better fitted the observed data. Two models were considered: (I) linear model with seasonal variables:  Where x t is the modelled time series in function of time t (t ¼ 1,2,…,20 [4]), a is a coefficient, s t is the seasonal component for which each season has a coefficient b and z t is the residual time series, defined as the difference between original data and fitted model; and (II) harmonic seasonal model: Where x t is the modelled time series in function of time t (t ¼ 1,2,…,20 [4]), a is a coefficient, S is the number of seasons (S ¼ 4, for a quarterly time series), s i and c i are coefficients, and z t is the residual time series, defined as above [15]. The best model was elected by the minimization of its Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Residuals Sums of Squares (RSS).

Trend analysis: structural breaks approach
A structural breaks approach to trend analysis is commonly used to identify intermediate time intervals in which the data follow an increasing or decreasing trend pattern that might be different from the global trend pattern [4]. The R package strucchange [16] includes functions that identify breakpoints in a deseasonalized time series. Deseasonalized time series were obtained by subtracting the seasonal component modelled in time series analysis (Section 2.3). The function breakpoints [16] was used to determine the number of structural breaks, using a maximum of 5 breaks (m ¼ 5) and choosing the number of breaks which minimized the BIC and RSS. Finally, a linear regression was applied to each time interval, considering only first order polynomial in function of time (trend ¼ a 0 þ a 1 t, t ¼ 1,2,…,20 [4]).

Spectral analysis
Spectral analysis can be used to identify periodic events in a stationary time series that may be corrupted by other intrinsic variations [15]. Firstly, detrended time series were obtained by subtracting the trend component modelled in the time series analysis (Section 2.3). Time series stationarity was accessed using the function adf.test from the tseries R package [17]. When not stationary, differentiation was applied to the detrended time series imposing a differentiation order enough to make it stationary, using the function diff from the base R package [13]. The periodogram for each time series was then obtained using the function spectrum from the stats R package [13].