Quantifying ecosystem states and state transitions of the Upper Mississippi River System using topological data analysis

Aquatic systems worldwide can exist in multiple ecosystem states (i.e., a recurring collection of biological and chemical attributes), and effectively characterizing multidimensionality will aid protection of desirable states and guide rehabilitation. The Upper Mississippi River System is composed of a large floodplain river system spanning 2200 km and multiple federal, state, tribal, and local governmental units. Multiple ecosystem states may occur within the system, and characterization of the variables that define these ecosystem states could guide river rehabilitation. We coupled a long-term (30-year) highly dimensional water quality monitoring dataset with multiple topological data analysis (TDA) techniques to classify ecosystem states, identify state variables, and detect state transitions over 30 years in the river to guide conservation. Across the entire system, TDA identified five ecosystem states. State 1 was characterized by exceptionally clear, clean, and cold-water conditions typical of winter (i.e., a clear-water state); State 2 had the greatest range of environmental conditions and contained most of the data (i.e., a status-quo state); and States 3, 4, and 5 had extremely high concentrations of suspended solids (i.e., turbid states, with State 5 as the most turbid). The TDA mapped clear patterns of the ecosystem states across several riverine navigation reaches and seasons that furthered ecological understanding. State variables were identified as suspended solids, chlorophyll a, and total phosphorus, which are also state variables of shallow lakes worldwide. The TDA change detection function showed short-term state transitions based on seasonality and episodic events, and provided evidence of gradual, long-term changes due to water quality improvements over three decades.
These results can inform decision making and guide actions for regulatory and restoration agencies by assessing the status and trends of this important river and provide quantitative targets for state variables. The TDA change detection function may serve as a new tool for predicting the vulnerability to undesirable state transitions in this system and other ecosystems with sufficient data. Coupling ecosystem state concepts and TDA tools can be transferred to any ecosystem with large data to help classify states and understand their vulnerability to state transitions.

ABSTRACT

Aquatic systems worldwide can exist in multiple ecosystem states (i.e., a recurring collection of biological and chemical attributes), and effectively characterizing multidimensional states will aid protection of desirable states and guide restorations. The Upper Mississippi River System is a large floodplain river spanning 2200 km with a 30-year water quality monitoring dataset to depict ecosystem states and guide river restoration. We coupled this highly dimensional dataset with multiple topological data analysis (TDA) techniques to classify ecosystem states, identify state variables, and detect state transitions over 30 years. Across the entire river system, TDA identified five ecosystem states. State 1 was characterized by exceptionally clear, clean, and cold water (i.e., a 'clear-water state'); State 2 had the greatest range of environmental conditions; and States 3, 4, and 5 had very high concentrations of suspended solids (i.e., 'turbid-states'). The TDA mapped clear patterns of the ecosystem states across several riverine navigation pools, seasons, and habitat types that furthered ecological understanding. State variables were identified as suspended solids, chlorophyll a, and total phosphorus. Some pools underwent seasonal state transitions each year, entering State 1 in winter and State 2 in other seasons. One pool transitioned sporadically several times between States 2 and 3. The TDA was a novel technique that showed the river was characterized by predictable ecosystem states and a few state variables, and pools experienced state transitions based on seasonal or episodic events. These results can inform decision making and guide actions for regulatory and restoration agencies by assessing the status and trends of this important river and providing quantitative targets for state variables if management aims to purposely shift the turbid-state to more desirable conditions like States 1 and 2.
The TDA correlation function may serve as a new tool for predicting the vulnerability to undesirable state transitions. These state concepts and TDA methodologies can be transferred to any ecosystem to help classify states and possibly predict vulnerability and causes of state transitions.

AUTHOR SUMMARY

The Mississippi River in the USA is a large, complex ecosystem that has had many changes to water quality since the 1990s. We used a new set of mathematical tools called 'topological data analysis' to help identify common water quality conditions or 'states' of this important river. We also explored when the states have changed over the past 25 years and vulnerability to change. The tools identified five common water quality states, distinguished by either clear waters or muddy waters. The state changes were not abrupt but rather had improved or declined gradually over many years, depending on the river reach. River reaches named Pool 4 and Pool 8 had become clearer over time. Pool 13 became muddier over time. Three other river reaches were consistently muddy. Our study is a good example of how these topological data analysis tools may help others managing ecosystems by improving communications about what states exist and which are desirable (for example, the clear water state is desirable). Using these tools with long-term data can also alert ecosystem managers to the ongoing, complex changes and vulnerability to undesirable changes so they can intervene with restoration.

Many ecosystems worldwide are undergoing severe and rapid changes, and managing ecosystem states and state transitions is a time-sensitive and difficult challenge (1). Large floodplain rivers are especially degraded and continue to experience major changes to hydrology and water quality (2-5).
Monitoring ecosystem states and state transitions with long-term data and advanced analytical tools is critical for protecting and restoring ecosystems that humanity values or depends on (6).

Ecosystem states (i.e., a recurring collection of biological and chemical attributes) are well defined in many types of ecological systems, but not for large rivers. Many biomes have at least two states defined by the lower trophic levels, such as the plant-dominated or phytoplankton-dominated states (7,8), marine kelp forest or deforestation (9), mesic grasslands or desertification (6), mesic grasslands or shrublands (10), and diverse coral reefs or macroalgal-dominated reefs (11). State characterization is always multidimensional, and typically has only a few state variables although sometimes is defined by complex biological communities (12). A few examples of state transitions include the restoration of aquatic vegetation and water quality in shallow lakes (8), altered disturbance regimes and subsequent restorations in tallgrass prairies (13,14), and successive state shifts and a novel state arising from multiple stressors in a deep lake (15).

Ecosystem states have been conceptualized (16,17) but not well tested in rivers until recently (18,19). The global scientific and management community calls for applying an ecosystem states framework to rivers, while acknowledging most rivers do not have sufficient data to properly evaluate states (20,21). When big data are available, identifying the states and their constraints is key for ecological restoration planning, post-project monitoring, and adaptive management (22).

State concepts are complex and multidimensional, and therefore require substantial data, advanced and novel analysis techniques for disentangling patterns, and expertise for ecological interpretation.
More analytical tools for 'big data' that can reasonably predict ecosystem states and impending state transitions will help conserve, restore, and manage ecosystem resilience (6). Topological data analysis (TDA) provides a powerful suite of tools for disentangling patterns and relationships among multidimensional data and presenting data visually (23). Tools of TDA have effectively shown changes in the human gut following bacterial disturbance (24), predicted wildfire severity (25), and assessed vulnerability of residential homes to climatic changes (26). The use of TDA remains rare in ecology, although TDA may be an ideal tool for many ecological applications because of its multidimensional framework (27).

Objectives and Hypotheses:

We provide an analysis of ecosystem states and transitions using topological data analysis (TDA) tools to better characterize the Upper Mississippi River System in the USA, which is a significant ecosystem for humans and wildlife (28). We hypothesized that this river would be in at least two states (17,21): a 'clear-water state' characterized by clean and clear water, and a 'turbid-state' characterized by high turbidity due to high suspended solids and chlorophyll a (Hypothesis 1). We expected at least three state variables based on nutrients and turbidity (Hypothesis 2) but included up to eight hypothesized state variables within TDA. We explored whether different riverine pools were in different states (Hypothesis 3) given their differing water quality conditions and restoration priorities (28,29). We expected TDA would detect an abrupt state transition around years 2008-2010 when aquatic vegetation rebounded (30) and water quality conditions improved (28) (Hypothesis 4a), and that TDA could also detect gradual water quality improvements based on the proportion of sites entering a 'clear-water state' (Hypothesis 4b).

How many ecosystem states were in the UMRS?

The TDA Mapper graph output had six connected components where the main component represented more than 99% of the data (Figure 1). Only <1% of the data points were excluded from the main TDA structure as 'outliers' or 'noise' (Figure 1 dark violet nodes). The main component had a "body" and a "tail," suggesting there were at least two main ecosystem states as determined by topology (in support of Hypothesis 1).

were refined and nested within the body and tail of the TDA structure; States 1 and 2 were nested within the body, and States 3-5 were nested within the tail. There were five small transition nodes in the TDA structure, which lay between States 1 and 2. States 1, 2, 3, 4, 5 contained 20%, 85%, 2%, <1%, and <1% of data, respectively.

What were the state variables that defined and distinguished riverine ecosystem states?

State 2 contained most sites (Table 1), and the state variables differentiating State 2 from others included total phosphorus, chlorophyll a, and suspended solids (in support of Hypothesis 2). However, State 2 had the greatest variability and many statistical outliers for all state variables. State 1 may be considered the 'clear-water state' because this state had the lowest total phosphorus, highest dissolved oxygen, lowest chlorophyll a, coldest water temperatures, slowest water velocity, and lowest suspended solids relative to the other four states (Figure 2). State 1 was differentiated from State 2 principally by its colder water temperatures and lower suspended solids (Figures 2f, 2h). State 2 typically had lower nutrients but higher chlorophyll a relative to States 3, 4, and 5.

Table 1. The distribution of sites in each upper Mississippi River pool (column 1: Pool name, field station city and state) that were classified into a topological/ecosystem state within years 1993-2019.
Table 1a shows the proportion (%) of sites in each state according to river pool for comparisons of ecosystem states among the six pools. Table 1b shows the proportion (%) of sites classified within a pool to compare the distribution of states within each pool. Sites sometimes can exist in more than one node, and therefore the percentage of sites within a state may sum to greater than 100%.

System (in support of our Hypotheses 1-3 and visualized in Figures 2-4). The TDA detected short-term state changes based on seasonality and episodic events and provided evidence of gradual, long-term changes due to water quality improvements over three decades (Figure 4). We discuss how TDA may be used as a new technique for describing ecosystem states at multiple spatiotemporal scales, how the results will inform ecological restoration and management, and how the state concepts tested with TDA methods may be transferred to any ecosystem that exists in one or more states.

We used big data (70,000 sites and 8 variables) from long-term river monitoring (43), and these TDA results can guide structured decision making, adaptive management, and vulnerability assessments in the Upper Mississippi River System. This is the first study in this large watershed to classify ecosystem states, which is foundational for describing the system's ecological structures and functions. Having defined ecosystem states allows practitioners to communicate which states are desirable for preservation and which regimes are undesirable for mitigation, as initiated in (17,29,32). We identified the state variables (specifically total phosphorus, suspended solids, and chlorophyll a), which can guide decision making and actions for regulatory and restoration agencies by providing quantitative targets.

TDA as a novel technique for ecosystem states
Currently, specific regional water quality criteria are recommended for total phosphorus and chlorophyll a, but not suspended solids (44). There remains a lack of specified targets for suspended solids concentrations although water clarity remains a common restoration goal (29), and our TDA results echo the importance of suspended solids as a state variable.

The TDA's correlation function can be a novel tool for assessing state transitions and vulnerability as new, long-term data are gathered. This function effectively signaled state changes in the Upper Mississippi River System due to seasonality, episodic events, and gradual changes to water quality (Figure 4). In addition to assessing individual water quality variables through time and across ecological gradients (28), TDA can describe the status and trends of the holistic ecosystem due to natural changes and restoration impacts. The state transition modeling can continue to reveal gradual improvements to water quality and signal vulnerability to undesirable state transitions if States 3-5 become more prominent (Figure 4).

Selection of state variables for TDA should be ecologically meaningful and allow managers to influence those variables to advance conservation (46). The state concepts and TDA methodologies can be transferred to any ecosystem to help classify states and predict their vulnerability to state transitions.

environmental gradients like geomorphology, hydrology, and water quality. The study area was a high restoration priority and had long-term data on water quality.

Study Area, Ecosystem State Context, and Long-term Data
We assessed ecosystem states based on water quality data at 69,307 sampling sites across the six pools (collected four times annually in each season from years 1993-2020), which is a 27-year consecutive record. The water quality data were collected by a long-term, standardized protocol whereby sites were selected by a stratified random sampling design and strata were habitat types like main channel borders, contiguous backwaters, impounded areas above the lock and dams, and lotic side channels (47). Using ecological literature and a priori hypotheses, we chose eight water quality variables thought to be important for defining ecosystem states and ensured the variables were not highly correlated (Pearson's r < 0.45). The eight variables and their units were: total nitrogen (mg L⁻¹); total phosphorus (mg L⁻¹); dissolved oxygen (mg L⁻¹); chlorophyll a (μg L⁻¹); water depth (cm); temperature (°C); velocity (m sec⁻¹); and suspended solids (mg L⁻¹).

The three variables missing ~66% of the values per the study design were total nitrogen, total phosphorus, and water velocity. To address missingness, we compared seven interpolation methods and utilized a random forests algorithm as the top-performing method. The other occasional missing values were imputed with the median value for each variable. Two water clarity variables, turbidity and Secchi disk depth, were not used because they were highly correlated with suspended solids. We used robust scaling so each variable's differing measurement units were accounted for in the multivariate analysis.

Quantifying ecosystem states and the state variables

Input: X, input dataset of continuous variables
Filter X using f : X → Z, where dim(Z) < dim(X)
Cover f(X) with a finite cover U
Obtain f⁻¹(U), a finite cover of X
Apply

The parameters of the TDA Mapper algorithm are chosen by the user and include filter function, size of the hypercubes, percent overlap, and clustering algorithm. The first two principal components (as determined from principal component analysis, PCA) were used as the filter function. As of 2022, there were no established methods for selecting parameters. As such, following Chang et al. (2020), we used these heuristics to determine the optimal cube size and percent overlap parameters: each node must contain no more than 10% of the data, and at least 90% of the data must be used in the main structure. We used the ratio between the explained variances of the first two PCA components of our filter function as the ratio of the components in the cube size parameter. Since

We applied a density algorithm to the undirected TDA Mapper graph to produce a directed graph with defined states as adopted and modified from Chang et al. (2020). We used k-nearest neighbor

whiskers (25th-75th percentiles) did not overlap among the states, we considered that a likely state variable. We did not run formal statistical analysis for 'statistical significance' among the five boxplots (e.g., analysis of variance among the states) because the high number of data points (n=76,669) and degrees of freedom precluded that option.

Quantifying ecosystem state transitions

Temporal Correlation Function
We tested whether states were stable or transitioned over 27 years. We analyzed the state composition of each pool during each season (autumn, winter, spring, and summer). On average, each season contained 122 datapoints from each pool that were collected within a range of 15 days. We defined a temporal correlation function for a pool-state pair as:

$$C_{p,s}(t) = \frac{n_{p,s}(t)}{N_{p}(t)}$$

where $n_{p,s}(t)$ is the number of sites collected in pool $p$ that lie within nodes of state $s$ during season $t$, and $N_{p}(t)$ is the total number of sites collected in pool $p$ during that season. In other words, the temporal correlation function is the proportion of sites in that state compared to the total number of sites collected during that season. A hypothetical example showing a state transition between two states is found in Figure 5.
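
As an illustration only, the proportion described above can be computed in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in the study: the function names `temporal_correlation` and `dominant_states`, the representation of each site as a set of state labels, and the four-season dataset are all hypothetical. A site is allowed to carry more than one state label because sites can fall in more than one node.

```python
def temporal_correlation(site_states, state):
    """Proportion of sites in `state` among all sites sampled in one pool-season.

    `site_states` holds one entry per sampled site; each entry is the set of
    states whose nodes contain that site (hypothetical data layout).
    """
    total = len(site_states)
    if total == 0:
        return float("nan")  # no sites sampled, so the proportion is undefined
    in_state = sum(1 for states in site_states if state in states)
    return in_state / total


def dominant_states(series):
    """For each season index, return the state with the largest proportion."""
    n_seasons = len(next(iter(series.values())))
    return [max(series, key=lambda s: series[s][i]) for i in range(n_seasons)]


# Made-up example for a single pool (not study results): four sites per season.
seasons = {
    "winter": [{1}, {1}, {1}, {1, 2}],
    "spring": [{1}, {2}, {2}, {2}],
    "summer": [{2}, {2}, {2}, {2}],
    "autumn": [{2}, {2}, {1, 2}, {2}],
}

# One time series of proportions per state, in season order.
series = {
    state: [temporal_correlation(sites, state) for sites in seasons.values()]
    for state in (1, 2)
}

print(series)
print(dominant_states(series))
```

In this made-up pool, State 1 has the larger proportion in winter and State 2 in the other three seasons, which is the kind of crossing a plot of the function would show for a seasonal transition. Because a site can belong to two states, the proportions within a season can sum to more than 1.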