A comprehensive review on the design and optimization of surface water quality monitoring networks. Environmental Modelling and Software,

The surface water quality monitoring network (WQMN) is crucial for effective water environment management. How to design an optimal monitoring network is an important scientific and engineering problem, and one that poses special challenges in the smart city era. This comprehensive review provides a timely and systematic overview and analysis of quantitative design approaches. A bibliometric analysis shows the chronological pattern, journal distribution, authorship, citations, and country pattern of the literature. The administration types of the monitored water bodies and the design methods are classified. The flexibility of four types of direct design methods and the optimization objectives are systematically summarized, and conclusions are drawn from experience with the WQMN design parameters: station locations, sampling frequency, and water quality indicators. The paper concludes by identifying four main future directions that the research community should pursue. This review sheds light on how to better design and construct WQMNs.


Introduction
Surface water, such as streams, rivers, wetlands, lakes, estuaries and coasts, is an important source of water for human life and industrial production; in many countries it is also the most accessible and the most polluted. Monitoring activities help in understanding, protecting and improving aquatic habitats, and the analysis of water quality data helps to quantify environmental changes and to develop best management practices for informed decisions (YSI, 2020). The water quality monitoring network (WQMN) is therefore a key element for managing and protecting the water environment, as it captures information about the state of water systems.
WQMN design and deployment involve not only scientific but also economic, legal, and technical aspects. A WQMN usually needs to meet different administration requirements, such as regulatory monitoring of violations and emergency monitoring of pollution incidents. Although the earliest monitoring activities started in the 1960s (Sanders et al., 1983), WQMN design remains a critical challenge in both developed and developing countries (Nguyen et al., 2019).
Scientists and practitioners have made many efforts to improve the design of WQMNs. They have to balance management requirements against many constraints and influencing factors, including budget, monitoring sites, sampling frequency, technology, administrative purpose, and representativeness (Behmel et al., 2016). Reported methods mainly concentrate on the allocation of water quality indicators, sampling locations, frequencies and durations. However, few official guidelines on quantitative design methods are available in practice (Shi et al., 2018). In the guidelines officially published by the WHO and by the environmental protection agencies of different countries (e.g., USEPA, EUEPA, and China EPA), monitoring strategies mainly focus on how to organize monitoring activities (Bartram and Balance, 1996; Behmel et al., 2016; EPA, 2015; Loo et al., 2012; Watkinson, 2000; Zhang et al., 2010).
The recent rapid development of water quality monitoring technology and instruments provides more alternatives; however, it also gives network design more flexibility and makes network implementation more complex when meeting different administration requirements. Monitoring has evolved from traditional field sampling with laboratory analysis to online monitoring with wet chemistry approaches and advanced in-situ sensors. Moreover, proxy/surrogate monitoring technologies, which combine mathematical models with in-situ sensors, have recently emerged (Jones et al., 2011; Viviano et al., 2014). Water quality monitoring has expanded from direct stoichiometric analysis to spectrometry based on optical reflection, scattering or absorption, and to approaches such as laser radar, remote sensing, and UV/Vis (ultraviolet-visible) spectrometers. New sensor carriers, such as unmanned vehicles (e.g., drones and boats), buoys, and monitoring cars, have also become increasingly popular and spread rapidly in practice.
The development and implementation of the 'smart city' concept worldwide (Chapman, 2019) introduce new requirements for network design: water-related problems are amongst the issues of concern, and smart urban environmental protection is an important frontier in smart city construction (Alavi et al., 2018; Butler et al., 2014; Ramaswami et al., 2016). In this context, monitoring infrastructure serves as the perception neurons and plays a fundamental role in "smartness" as part of the smart city platform or "city brain" (Chen and Han, 2018; Reis et al., 2015). The WQMN is thus one of the important infrastructure components of a smart city and is closely linked, through timely processing and response, with environmental system models, another important component of smart cities. Administrative departments such as EPAs face more challenges than ever in the optimal design of WQMNs in the smart city era (Chen and Han, 2018). Traditional non-quantitative design approaches, including expert panels and brainstorming based on general guidelines, are not sufficient to find an optimal balance (Chapman et al., 2016).
There is an urgent need for a systematic literature review that provides a timely academic and practical reference for the wave of WQMN construction around the world. Nguyen et al. (2019) reviewed WQMN design for rivers, mainly highlighting the influence of the scale of the study area (i.e., watershed size) and of water quality indicators for routine regulatory networks. Behmel et al. (2016) provided a review and perspective on the management of monitoring strategies rather than on the design methods themselves. However, these reviews offer limited guidance on management requirements, regional differences, the evaluation and flexibility of design methods, experiences and recommendations, new trends and opportunities, and the linkage between monitoring and modelling.
This paper aims to provide a comprehensive review of the published design approaches for WQMNs of natural surface water bodies. It covers the following aspects: (1) holistically analysing the methods and cases proposed in the literature, (2) identifying challenges, future trends and opportunities, and (3) providing the most important experiences and guidelines for decision makers, public service managers and other stakeholders in practice. Section 2 describes the review methodology. Section 3 presents the characteristics of related research based on a bibliometric analysis, focusing on publishing history, areas, countries, affiliations, technical classifications, and data availability. Section 4 summarizes the characteristics, advantages and disadvantages of reported design methods, including topology, information entropy, geostatistics, multivariate statistics and different optimization approaches. Section 5 analyses experiences in the design of monitoring locations, monitoring frequency and water quality indicators. Section 6 analyses the linkage between monitoring and water quality modelling. Section 7 discusses specific concerns, outlines important implications learned from the review, and recommends future research directions.

WQMN categories and design parameters
A logical structure for the design/optimization of WQMNs is shown in Fig. 1. Management requirements of WQMNs refer to the different administration types (i.e., monitoring purposes) and the associated measurement resolution of the monitored water bodies (Chacon-Hurtado et al., 2017); the constraints include financial resources, data availability, monitoring technologies, accessibility of locations, and administrative and legal considerations. All these drivers of network design can be linked to three basic design parameters of a WQMN: monitoring locations, monitoring frequency, and water quality indicators, which were first identified by Sanders et al. (1983) and are now widely accepted. This review mainly focuses on these three key parameters of network design and optimization.
Generally, WQMNs can be categorized into three administration types. The first type is regulation monitoring, the most fundamental function of WQMNs, which is used to monitor the status of the water environment and regulatory compliance. WQMNs can also be used for the emergency management of pollution events, which usually involves two different functions: early warning/forecast monitoring (type 2) and source identification monitoring (type 3). In practice, many established regulation WQMNs can be upgraded for emergency use, so it can be difficult to distinguish between these three administration types.

Literature search and selection criteria
A comprehensive literature search (through November 2019) was conducted for studies using quantitative methods to design river WQMNs. The review was mainly based on research published in international journals or conference proceedings (Cetrulo et al., 2019). We searched Google Scholar and Web of Science using different combinations of the following Boolean search strings for the paper theme: "water quality" AND ("monitoring network" OR "sample") AND ("design" OR "optimize"). We then manually checked the retrieved records one by one and applied the following rules to condense the final sample of papers: inclusion of studies focusing on rivers, lakes, watersheds, estuaries, or coasts; exclusion of studies on groundwater; exclusion of studies on urban water supply and drainage systems; exclusion of studies on hydrological monitoring, such as water discharge, water level, and flow velocity; exclusion of studies that do not include quantitative results; and exclusion of studies in languages other than English. Ninety articles were selected, most of which were published in the last 20 years.

Review strategy and analysis approaches
The review strategy is shown in Fig. 2. The bibliometric characteristics were analysed first. The chronological pattern was analysed to show the development of the research topic and the outlook of trends. The distribution of countries studied was analysed to illustrate how the leading countries varied at different stages and their potential for applying WQMN design methods. The journal distribution was analysed to illustrate the research outputs in different disciplines, academic societies, and specific areas. Authors and affiliations were analysed to identify active scholars, the breadth of attention, and collaborations. Citations were analysed to reflect the spread of knowledge and whether WQMN design is a hot topic. The underlying reasons/drivers were also discussed.
To provide valuable references for practitioners and policy makers, this work emphasizes the review of design methods and associated network parameters. The design methods were classified; their principles, objective functions and optimization criteria were systematically summarized; and their advantages and flexibility in different scenarios were identified. Then, experiences with the network design parameters were summarized, and research gaps in the literature were identified.
Related issues were reviewed and discussed, including modern water quality sensors, the relationship with and implications for surface water quality modelling, data scarcity and the associated uncertainty, more complex network architectures, and comparison with other water systems. These issues have emerged recently and are important for applying network design methods in practice. Finally, future research directions are comprehensively provided from the perspectives of methodology, monitoring technologies and management practices.

Chronological pattern
Fig. 3 shows the temporal distribution of publications on the given topic. As illustrated in this figure, the study history can be divided into three stages analogous to plant growth: I. sprouting stage (before 2005), II. seedling stage (2005-2013), and III. growing stage (after 2013). Very few studies were published before 2004. The first study found was Sharp (1971) in Water Resources Research, who used the topological centroid concept to develop a uniform sampling plan for a river in South Carolina, USA. In the sprouting stage, the majority of study areas were in developed countries or regions, and USA scholars conducted pioneering work on this topic, accounting for 40% of studies. Publication became stable after 2004, with three papers per year on average. In the seedling stage, interestingly, Taiwan (18%) became predominant, and developing countries such as Iran (14%) began to focus on WQMN design. After 2013, an average of up to 9.5 papers was published per year. Iran dominated this field with an 18% share, followed by mainland China and the USA. This changing distribution implies that WQMN design is receiving increasing attention owing to the development of monitoring technology and increased environmental infrastructure construction around the world.

Countries studied
A total of 22 countries and regions were covered by the retrieved WQMN design studies. As shown in Fig. 4, the USA, Iran, Taiwan, and mainland China have conducted the most extensive research, with 16, 14, 12, and 10 studies, respectively. As reported in Fig. 3 and Section 3.1, the study of WQMN design has undergone an obvious shift from developed countries or regions to developing ones, owing to the more challenging environmental problems in the latter. The USA was at the forefront in the initial two stages before 2010 owing to the extensive water quality monitoring campaigns that followed the Clean Water Act amendment in 1972 and the implementation of pollution control programmes. In recent years, many developing countries or regions, such as China, Iran, Turkey, India, and Brazil, have made effective attempts, which may be partly attributed to growing environmental pressure on water.

Journal distribution
Table 1 shows the distribution of publications among journals. Eighty-seven papers (the remaining 3 are conference papers) were published in 42 different journals, of which 39 (92%) are indexed in the Thomson database, i.e., are SCI journals. This profile indicates that the design of surface water WQMNs has a broad readership and is well received in many journals. The journal Environmental Monitoring and Assessment published 23% of all studies. Three journals have four publications each, and six journals have three publications each. The top 10 journals account for 57% of all publications.

Authors and affiliations
In total, 266 researchers contributed to the 90 articles, which indicates that this specific topic has received wide attention. Only 4.5% of the authors had three or more publications: Kerachian, R. with 5 publications; Karmakar, S. and Nikoo, M.R. with 4 publications each; and 9 authors, including Aral, with 3 publications each. Another 14.3% of the authors published two articles. Fig. 5 presents the distribution of the coauthors' affiliations. Universities clearly led the studies and were involved in a total of 84 papers (93%). Independent research institutions and governments participated in 18 and 12 papers, respectively, mostly in collaboration with universities. There is still much room for improving cooperation between research institutes and the administrative departments of local governments.

Citation analysis
Here, we analyse the citation statistics of the reviewed papers, as shown in Fig. 6. The citation distribution follows a power law (note the logarithmic y-axis in Fig. 6), i.e., a few papers account for most of the citations (Gupta et al., 2005; Redner, 1998). The top 10 papers received more than 50% of the citations, while the 45 least-cited papers (the lower half) received only 7.1%. The h-index of all reviewed papers is 26, i.e., 26 of the papers have each been cited at least 26 times, which indicates that the reviewed topic is 'hot' in water environment modelling and management (Banks, 2006). Nine of the 10 most-cited papers were published before 2010. Ouyang (2005) is the most cited of all 90 articles, with up to 432 citations, mostly from other fields rather than from WQMN-related papers, because the author presented a comprehensive demonstration of the possibilities of multivariate statistics. By contrast, the earliest paper in this research, Sharp (1971), limited by its methodology, has 113 citations, mostly in related fields.
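The h-index and the concentration of citations among the top papers quoted above are simple counts. The sketch below shows how they can be computed from a list of per-paper citation counts; the counts used here are hypothetical and are not the data of the reviewed papers.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


def top_k_share(citations, k):
    """Fraction of all citations received by the k most-cited papers."""
    ranked = sorted(citations, reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical citation counts, not the data of the reviewed papers.
counts = [50, 30, 12, 8, 5, 4, 2, 1, 0]
print(h_index(counts))                   # 5
print(round(top_k_share(counts, 3), 3))  # 0.821
```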

Administration types and water bodies
As shown in Table 2, 78 studies considered the design or redesign of networks for regulation monitoring, with far fewer studies on the other two types of use (see Fig. 1). The authors of the reported studies usually did not emphasize the administration type unless the article focused on emergency monitoring (Shi et al., 2018). Other network functions can be embodied in the design requirements or constraints.
Of the reviewed cases, 45 articles (50%) designed WQMNs for a whole watershed, while 20 focused on a single river reach and 13 on lake or reservoir monitoring (Table 2). A few studies were conducted on bays and estuaries. The topological characteristics of different water bodies require different design approaches. For example, stream order approaches (Beveridge et al., 2012; Sanders et al., 1983) can easily be used to design a stream WQMN, whereas the kriging method is straightforward for lake WQMN design (Beveridge et al., 2012).
The majority of papers focus on the design of monitoring locations, which determine the number of stations and thus greatly affect the final cost. As stated before, advancements in monitoring technology have weakened the importance of designing/optimizing the monitoring frequency and the monitored water quality indicators. Among data types, water quality data are clearly the most important, whether historical data from existing WQMNs (63 cases) or simulation data from hydrological models (25 cases), as shown in Table 3. Natural conditions include all natural factors in a study area, such as climate, topography, land use, and even river structure. Social conditions include all anthropogenic factors, such as factories, population density, and GDP. Both natural and social conditions are mainly used in optimization methods based on multiple criteria, and the specific data types used in a given study area are determined by the criteria.

Categories of design methods
The reported design methods can be divided into two categories, direct design methods without optimization and optimization methods, as shown in Fig. 7. The former category includes five major sub-classes: topology, multivariate statistics, geostatistics, information entropy and "Others". Optimization methods can be divided into two sub-classes according to the data inputs: single-criterion optimization, which usually concerns how well the network represents the nature of water quality changes and requires only water quality data, and multiple-criteria optimization, which also considers the social values of the water body, such as drinking water supply and irrigation. Optimization methods generally depend on the four fundamental direct design methods to process the original data. Special and uncommonly used methods, such as matter-element analysis (Chen et al., 2012) and the concept of the station ratio (Keum and Kaluarachchi, 2015), are labelled as "Others".
The classification and the number of studies in each class are summarized in Fig. 7. Except for the geostatistics approach, the proportions of the other five methods are similar. The sum of all numbers is greater than 90, the number of reviewed papers, because some studies combine several methods.
The following subsections summarize the six sub-types of quantitative design methods in terms of their fundamental theory, design process, flexibility and limitations.

Topological methods
River topology-based methods are amongst the earliest WQMN design methods proposed in the literature. The Sanders approach is a typical example (Dixon et al., 1999; Sanders and Adrian, 1978; Sanders et al., 1983). It is named after Emeritus Professor Thomas G. Sanders of Colorado State University, who published a book in 1983 that was long a standard reference for monitoring programme design (Sanders et al., 1983). The Sanders approach is derived from the sampling method of Sharp (1971), which is based on the basic topological identification of river systems by Shreve (1967); details of the Shreve approach can be found in many hydrology textbooks. Sharp used the concept of a centroid to divide the river network into approximately equal halves: the centroid link is simply the link whose weight (the number of upstream tributaries) is closest to half the weight of the outlet (Sharp, 1971). That is,
$$\left| M_c - \left[ \frac{M_o}{2} \right] \right| = \min_i \left| M_i - \left[ \frac{M_o}{2} \right] \right|$$
where $M_c$ is the weight of the centroid, $M_o$ is the weight of the outlet, $M_i$ is the weight of the $i$th interior link, $|\cdot|$ denotes the absolute value, and $[\cdot]$ denotes the integer part. The first-order potential sampling station can then be set in the first-order centroid link, by default usually at its downstream end. In the second-order river networks obtained by dividing the original network, second-order centroid links and the corresponding sampling stations are found in the same way, and a similar procedure is used for the remaining networks.
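The centroid rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical representation of the river network as a dictionary that maps each link to the links draining directly into it, using Shreve magnitudes as weights and, as a simplification, treating every non-outlet link as a candidate rather than interior links only.

```python
from typing import Dict, List

# Hypothetical representation: each link maps to the links draining directly into it.
Network = Dict[str, List[str]]


def shreve_magnitudes(network: Network, outlet: str) -> Dict[str, int]:
    """Shreve magnitude: source links get 1; a downstream link sums its tributaries."""
    mags: Dict[str, int] = {}

    def visit(link: str) -> int:
        upstream = network.get(link, [])
        mags[link] = 1 if not upstream else sum(visit(u) for u in upstream)
        return mags[link]

    visit(outlet)
    return mags


def sharp_centroid(network: Network, outlet: str) -> str:
    """Non-outlet link whose magnitude is closest to [M_o / 2] (integer part)."""
    mags = shreve_magnitudes(network, outlet)
    target = mags[outlet] // 2
    candidates = [link for link in mags if link != outlet]
    # min() keeps the first link on ties; no tie rule is specified in the text.
    return min(candidates, key=lambda link: abs(mags[link] - target))


# Hypothetical network with four source links (a, b, A2, B).
net = {"L0": ["A", "B"], "A": ["A1", "A2"], "A1": ["a", "b"]}
print(shreve_magnitudes(net, "L0")["L0"])  # 4
print(sharp_centroid(net, "L0"))           # A1
```

Applying `sharp_centroid` again to each of the two sub-networks it creates gives the second-order centroids, and so on down the hierarchy.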
Table 2. The types of administration and water bodies in the cases in the reviewed papers.
Table 3. The design/optimization items and data used in the cases in the reviewed papers.
Sanders (Sanders et al., 1983) later modified this approach by adding pollution loadings and the number of outfalls, which is equivalent to the number of tributaries, to the calculation of weights. From this perspective, the weight can also be the total length of upstream reaches or the area of the upstream basin (Dixon et al., 1999). Moreover, the pollution loadings of the whole basin can be simulated and then used as weights to determine the centroid of the river network (Do et al., 2011, 2012). A notable benefit of topological approaches is that the resulting monitoring network can be used to identify potential pollution sources effectively. As shown in Fig. 8, if a river network has a single pollution source that is detectable at the outlet, a sequential search is carried out by locating the successive centroids of the network and sampling at their outlets. The pollution source is located by noting the presence or absence of the pollutant at the successive sampling sites (#A, #B, #C, and #D) and eliminating all portions of the network where the pollutant is absent (dashed lines). Topology-based design algorithms are also easily implemented in Geographic Information Systems (GIS). However, topological methods are not applicable to WQMNs on lakes, reservoirs or coastal waters.
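The sequential search illustrated in Fig. 8 can be sketched as repeated centroid selection on a shrinking network. This minimal sketch assumes a single source, a hypothetical `detects` function that reports whether the pollutant is present at the downstream end of a link, and Shreve magnitudes recomputed on the remaining network after each elimination; it is not the exact procedure of any cited study.

```python
from typing import Callable, Dict, List, Set

Network = Dict[str, List[str]]  # link -> links draining directly into it (hypothetical)


def upstream_links(network: Network, root: str, removed: Set[str]) -> List[str]:
    """Links draining to `root` (inclusive), skipping eliminated ones; parents precede children."""
    stack, found = [root], []
    while stack:
        link = stack.pop()
        found.append(link)
        stack.extend(u for u in network.get(link, []) if u not in removed)
    return found


def magnitudes(network: Network, root: str, removed: Set[str]) -> Dict[str, int]:
    """Shreve magnitudes of the remaining (not eliminated) network."""
    mags: Dict[str, int] = {}
    for link in reversed(upstream_links(network, root, removed)):  # children first
        ups = [u for u in network.get(link, []) if u not in removed]
        mags[link] = 1 if not ups else sum(mags[u] for u in ups)
    return mags


def locate_source(network: Network, outlet: str, detects: Callable[[str], bool]) -> str:
    """Sequential centroid search for a single source that is detectable at the outlet."""
    root: str = outlet
    removed: Set[str] = set()
    while True:
        links = upstream_links(network, root, removed)
        if len(links) == 1:
            return links[0]
        mags = magnitudes(network, root, removed)
        target = mags[root] // 2
        centroid = min((lk for lk in links if lk != root),
                       key=lambda lk: abs(mags[lk] - target))
        if detects(centroid):  # pollutant present: source is at or above the centroid
            root = centroid
        else:                  # pollutant absent: eliminate that whole branch
            removed.update(upstream_links(network, centroid, removed))


net = {"L0": ["A", "B"], "A": ["A1", "A2"], "A1": ["a", "b"]}
source = "A2"  # hypothetical true source
print(locate_source(net, "L0", lambda lk: source in upstream_links(net, lk, set())))  # A2
```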

Multivariate statistics
Many multivariate statistical methods can easily be adopted for WQMN design and optimization. Principal component analysis (PCA), principal factor analysis (PFA), and cluster analysis (CA) are the most commonly used (Calazans et al., 2018b; Mavukkandy et al., 2014; Varekar et al., 2016). PCA and PFA are similar multivariate statistical techniques that are widely used to identify the principal components or factors that explain most of the variance of a system. These methods reduce the number of variables to a small number of indices (i.e., principal components or factors) while attempting to preserve the relationships present in the original data. CA groups a set of objects such that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups.
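To make the variance-reduction idea concrete, the sketch below runs PCA on standardized indicators through an eigendecomposition of their correlation matrix with NumPy. The synthetic data and the 80% cumulative-variance threshold in `components_needed` are assumptions for illustration; studies use a variety of retention rules.

```python
import numpy as np


def pca_on_indicators(data: np.ndarray):
    """PCA of standardized indicators (columns) via the correlation matrix.

    Returns the explained-variance ratio of each component (descending) and the
    loadings, i.e. correlations between indicators (rows) and components (columns).
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)           # equals the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(eigvals)
    return ratio, loadings


def components_needed(ratio: np.ndarray, threshold: float = 0.8) -> int:
    """Smallest number of components whose cumulative explained variance reaches threshold."""
    k = int(np.searchsorted(np.cumsum(ratio), threshold)) + 1
    return min(k, len(ratio))


# Synthetic data: indicators 0 and 1 share a common signal; 2 and 3 are independent.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
data = np.column_stack([
    signal + 0.1 * rng.normal(size=200),
    signal + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])
ratio, loadings = pca_on_indicators(data)
print(ratio.round(2), components_needed(ratio))
```

Indicators that load strongly on the same retained component (here indicators 0 and 1) carry overlapping information, which is the usual argument for dropping one of them.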
These multivariate statistical approaches are therefore mainly used to remove redundant monitoring locations and unnecessary water quality indicators based on historical monitoring datasets. They do not work when a new network must be designed from watershed characteristics alone. Varekar et al. (2016) compared the Sanders and multivariate statistical approaches under the effect of seasonal variation and a limited water quality data scenario. FA/PCA was shown to be applicable if adequate water quality data are available, while the Sanders approach is ideal if water quality data are limited but considerable watershed information is available.
Fig. 8. Location of the pollution source in a hypothetical river network, where subplot (a) shows the weights of links according to Sharp (1971).
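Redundant stations can be flagged with cluster analysis on historical records. The sketch below is one simple variant, assuming single-linkage clustering with distance 1 - r (Pearson correlation between station time series) cut at a hypothetical threshold of r = 0.9; within each resulting group, a single station might be retained.

```python
import numpy as np


def correlated_station_groups(series: np.ndarray, min_corr: float = 0.9):
    """Single-linkage clusters of stations (columns) using distance 1 - r.

    Cutting the single-linkage tree at 1 - min_corr is equivalent to taking the
    connected components of the graph that links stations with r >= min_corr.
    """
    corr = np.corrcoef(series, rowvar=False)
    n = corr.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= min_corr:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Synthetic records: stations 1 and 3 nearly duplicate stations 0 and 2.
rng = np.random.default_rng(1)
s0, s2, s4 = rng.normal(size=(3, 100))
series = np.column_stack([s0, s0 + 0.05 * rng.normal(size=100),
                          s2, s2 + 0.05 * rng.normal(size=100), s4])
print(correlated_station_groups(series))  # [[0, 1], [2, 3], [4]]
```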

Geostatistical methods
By incorporating spatial correlations, geostatistical methods provide another useful alternative for WQMN design. Kriging and Moran's I are the two typical methods proposed (Beveridge et al., 2012;Ou et al., 2012). Geostatistical methods are also data-driven approaches that require large spatial-scale datasets. However, long-term records of water quality observations are not necessary.
Kriging (Krige, 1951; Matheron, 1963) is a widely used geospatial interpolation method that uses observed data from nearby locations to predict the value of a single variable at an unmeasured location. The kriging estimator at a given point is the best linear unbiased estimator (BLUE) of the value at that point. Spatial correlation is expressed using a semivariogram, which describes how the dissimilarity between values (half their mean squared difference) varies as a function of spatial or temporal distance and direction. Kriging also provides an estimate of the variance. Unlike the variance in linear regression models, the kriging variance at an unsampled point is not a measure of the local estimation accuracy of the variable, but it is a useful statistic for comparing network configurations, because it depends solely on the overall covariance structure (which is a function of inter-station distance) and the kriging weights (Deutsch and Journel, 1992).
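The network-level use of kriging variance can be sketched with ordinary kriging in NumPy. The exponential semivariogram, its sill and range, and the station and grid coordinates are all hypothetical; because the kriging variance does not depend on the observed values, no concentrations are needed to compare configurations.

```python
import numpy as np


def exp_semivariogram(h, sill=1.0, corr_range=0.5):
    """Exponential semivariogram without nugget (hypothetical parameters)."""
    return sill * (1.0 - np.exp(-np.asarray(h) / corr_range))


def ok_variance(stations: np.ndarray, target: np.ndarray, **vg) -> float:
    """Ordinary kriging variance at `target` for station coordinates (n x 2).

    Solves  sum_j w_j g(x_i, x_j) + mu = g(x_i, x0),  sum_j w_j = 1,
    then returns  sigma^2 = sum_i w_i g(x_i, x0) + mu.
    """
    n = len(stations)
    dist = np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exp_semivariogram(dist, **vg)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exp_semivariogram(np.linalg.norm(stations - target, axis=1), **vg)
    sol = np.linalg.solve(lhs, rhs)
    return float(sol[:n] @ rhs[:n] + sol[n])


def mean_network_variance(stations: np.ndarray, grid: np.ndarray, **vg) -> float:
    """Average kriging variance over grid points: a network-level score."""
    return float(np.mean([ok_variance(stations, p, **vg) for p in grid]))


stations = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
axis = np.linspace(0.0, 1.0, 5)
grid = np.array([[x, y] for x in axis for y in axis])
full = mean_network_variance(stations, grid)
reduced = mean_network_variance(stations[:4], grid)  # remove the centre station
print(full, reduced)
```

Removing a station can only keep or raise the ordinary kriging variance, so `reduced` is never smaller than `full`; the size of the increase indicates how much information that station contributes.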
Moran's I is an important method in spatial statistics for analysing spatial autocorrelation, clusters and outliers. Local Moran's I, proposed by Anselin (1995), identifies clusters of points whose values are similar to or different from those of their neighbours. It can therefore be used to estimate the importance of sites.
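Local Moran's I can be computed directly from its definition. This sketch assumes binary distance-band weights (neighbours within a hypothetical `max_dist`) that are row-standardized; the significance testing with Z scores or permutations used in practice is omitted.

```python
import numpy as np


def local_morans_i(values, coords: np.ndarray, max_dist: float) -> np.ndarray:
    """Anselin's local Moran's I: I_i = (z_i / m2) * sum_j w_ij z_j.

    Weights are binary within `max_dist` (a hypothetical choice) and row-standardized.
    """
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    m2 = (z ** 2).sum() / len(x)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = ((dist > 0) & (dist <= max_dist)).astype(float)
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    return (z / m2) * (w @ z)


# Six stations along a reach; the lower three read low, the upper three read high.
coords = np.column_stack([np.arange(6.0), np.zeros(6)])
print(local_morans_i([1, 1, 1, 5, 5, 5], coords, max_dist=1.0))
```

Positive values mark stations that resemble their neighbours (candidates for redundancy within a cluster); values near zero mark stations at the boundary between contrasting zones.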
For WQMN design, kriging is normally used in two ways: (1) to evaluate the errors associated with the removal of sampling stations (Beveridge et al., 2012) and (2) to evaluate the variance, as a measure of uncertainty, when sampling stations are added (Sabzipour et al., 2017). It can also be used for sampling frequency design. Moran's I has been used to identify clusters of redundant stations that can be removed while minimizing the loss of information, e.g., in combination with the Z score (Beveridge et al., 2012).

Information entropy
Recently, information entropy-based design methods have received attention. For example, four related articles were published in 2018. Information entropy is a core concept in information theory. In hydrology, entropy is a measure of the degree of uncertainty of random hydrological processes (Singh, 2015). It is also a quantitative measure of the information content of time series.
Information entropy measures the degree of uncertainty (dispersion) of a random variable X: the more dispersed the variable, the greater its entropy. The marginal entropy H(X), which can be interpreted as the potential information of the variable, is calculated as
$$H(X) = -\sum_{i=1}^{n} p(x_i) \ln p(x_i)$$
where $x_i$, $i = 1, 2, \ldots, n$, are the values of the discrete variable X, $p(x_i)$ is the discrete probability of occurrence of $x_i$, and $-\ln p(x_i)$ is the information content of the state $X = x_i$.
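The formula above applies to continuous water quality records once they are discretized into classes. This is a minimal sketch, assuming hypothetical class intervals passed as histogram bin edges; note that the choice of intervals strongly affects the entropy value.

```python
import numpy as np


def marginal_entropy(samples, bin_edges) -> float:
    """H(X) = -sum_i p(x_i) ln p(x_i) for observations discretized into class intervals."""
    counts, _ = np.histogram(samples, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()     # empty classes contribute 0 (0 ln 0 := 0)
    return float(np.sum(p * np.log(1.0 / p)))  # p ln(1/p) == -p ln p


edges = [0, 1, 2, 3, 4]  # hypothetical class intervals, e.g. for a concentration in mg/L
print(marginal_entropy([0.5, 1.5, 2.5, 3.5], edges))  # ln 4, about 1.386
print(marginal_entropy([0.5, 0.6, 0.7, 0.8], edges))  # 0.0
```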
The information transfer index (ITI) has been widely used in WQMN design. It is a normalized measure of dependence, obtained by transforming the transinformation (mutual information) between two variables. ITI indicates the transfer of standardized information from one variable to another and provides a direct and effective means of assessing the dependence of two random variables (Mogheir et al., 2004). For long-term monitoring data series, information entropy is a good index for evaluating the information content and redundancy of WQMNs.
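The following sketch computes transinformation and a normalized index from two discretized station records. It assumes ITI = T(X,Y)/H(X,Y), where T(X,Y) = H(X) + H(Y) - H(X,Y); other normalizations appear in the literature, so the exact definition should be checked against the study being reproduced.

```python
import numpy as np


def _entropy(counts: np.ndarray) -> float:
    """Entropy (nats) of a frequency table; empty cells are ignored."""
    p = counts[counts > 0] / counts.sum()
    return float(np.sum(p * np.log(1.0 / p)))


def transinformation_and_iti(x, y, bin_edges):
    """Transinformation T(X,Y) = H(X) + H(Y) - H(X,Y) and a normalized index.

    ITI is taken here as T / H(X,Y), one common normalization (an assumption);
    it is 0 for independent series and 1 when each series determines the other.
    """
    joint, _, _ = np.histogram2d(x, y, bins=[bin_edges, bin_edges])
    h_xy = _entropy(joint.ravel())
    h_x = _entropy(joint.sum(axis=1))
    h_y = _entropy(joint.sum(axis=0))
    t = h_x + h_y - h_xy
    iti = t / h_xy if h_xy > 0 else float("nan")  # undefined if both series are constant
    return t, iti


edges = [0, 1, 2, 3, 4]  # hypothetical class intervals
station_a = [0.5, 1.5, 2.5, 3.5]
print(transinformation_and_iti(station_a, station_a, edges))  # T equals H(X), ITI is 1.0
print(transinformation_and_iti([0.5, 0.5, 1.5, 1.5], [0.5, 1.5, 0.5, 1.5], edges))
```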

Optimization methods
According to the difference in design drivers, i.e., the input information for design, the optimization methods proposed in studies can be divided into two classes: one group considers only the representativeness of water quality monitoring, and the other is based on multiple criteria that take natural and social conditions into account (Fig. 7). Fuzzy optimization (Ning and Chang, 2004), genetic algorithms (Icaga, 2005), the artificial bee colony algorithm (Pérez et al., 2017) and other algorithms have been used in these studies.
The optimization objectives of the water quality data-driven approach include minimizing the errors between detected and simulated data, maximizing coverage, covering highly contaminated areas (Park et al., 2014), minimizing the detection probability for lower compliance areas, and minimizing redundant information among monitoring stations. Ning and Chang (2004) proposed many specific objectives with clear mathematical expressions. Minimizing the cost of WQMNs is also considered in many studies; cost is formulated as one of multiple objective functions, as a constraint, or as a reference for selecting the final optimization result.
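A genetic algorithm for station selection can be sketched with binary genes (1 = station selected). This is a toy illustration, not the formulation of any cited study: the coverage values, costs, redundancy penalty, weight `lam`, and GA settings are all hypothetical, and cost is handled as a constraint.

```python
import random
from itertools import combinations


def fitness(genes, coverage, redundancy, cost, budget, lam=1.0):
    """Weighted-sum objective: covered value minus lam * pairwise redundancy.

    Selections over budget are infeasible and receive -inf (budget as a constraint).
    """
    chosen = [i for i, g in enumerate(genes) if g]
    if sum(cost[i] for i in chosen) > budget:
        return float("-inf")
    gain = sum(coverage[i] for i in chosen)
    overlap = sum(redundancy[i][j] for i, j in combinations(chosen, 2))
    return gain - lam * overlap


def genetic_select(coverage, redundancy, cost, budget,
                   pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Binary GA: tournament selection, uniform crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n = len(coverage)

    def score(genes):
        return fitness(genes, coverage, redundancy, cost, budget)

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    # Seed with the empty network so at least one feasible individual exists.
    pop = [[0] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism: keep the best individual
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = children
    best = max(pop, key=score)
    return [i for i, g in enumerate(best) if g], score(best)


# Hypothetical candidates: value of covering each site, cost, and one redundant pair.
coverage = [5.0, 4.0, 3.0, 3.0, 2.0, 1.0]
cost = [3, 2, 2, 1, 1, 1]
redundancy = [[0.0] * 6 for _ in range(6)]
redundancy[0][1] = redundancy[1][0] = 2.0
chosen, value = genetic_select(coverage, redundancy, cost, budget=5)
print(chosen, value)
```

Seeding the population with the empty network and always carrying the best individual forward guarantees that the returned selection is feasible; for a handful of candidates an exhaustive search would be simpler, but the same encoding scales to networks where enumeration is impractical.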
Optimization approaches can also be combined with direct design methods to set up the objective function. WQMNs for regulation monitoring require tools such as information entropy (Nikoo et al., 2016) to minimize system redundancy, while emergency-use WQMNs often consider minimizing detection time and maximizing system reliability. Tables 4 and 5 summarize optimization objectives and criteria proposed in the literature.

Flexibility of design methods
The adequacy or flexibility of the four main direct design methods is summarized in Table 6. These methods have specific advantages for suitable water bodies, network parameters, functionality and data requirements. It is important to select the design method according to the objective, requirements and available information.
In addition, many applications combining different types of design methods have been reported. For example, Ou et al. (2012) combined the geostatistical method with PCA as pre-treatment and fuzzy optimization to design two WQMNs in a lake in Canada. Memarzadeh et al. (2013) coupled dynamic factor analysis (DFA) and entropy methods to evaluate the station locations of river WQMNs. To some extent, combined approaches facilitate the holistic design of a network, but they require more historical data, which raises the technical threshold in practice.
Optimization methods based on multiple criteria may be the best approach to design a new WQMN without any historical WQ data. There are more studies on the optimization of location than on the optimization of frequency and water quality indicators. Topology methods are recommended to help pre-process stations or sub-basins. Geostatistics methods are recommended for specific WQMNs with high spatial correlation. Statistical analysis, i.e., information entropy and multivariate statistics, of existing WQ data is only useful for developing a contraction strategy. However, hydrological models combined with optimization methods can facilitate statistical analysis to propose new stations.
Managers need to carefully select a quantitative design approach when setting up or revising a WQMN according to the flexibility of the technology.

Experiences with design network parameters
Among the WQMN design variables, location is the most important, frequency is more flexible, and water quality indicators mostly depend on administrative requirements and the need for localization. This section summarizes the issues in designing these variables.

Design station locations
Limited financial resources require as few stations as possible; on the other hand, the monitoring network has to cover a large enough area that the monitoring data are representative of the water body and merit interpretation and presentation. Early monitoring practices relied on manual sampling and laboratory analysis, so easy access to sites was the primary consideration. Over time, advances in monitoring techniques have allowed an increasing number of wireless sensor stations to monitor water quality at points of interest, such as areas with point pollution sources, areas close to water intakes, or points upstream and downstream of highly industrialized and populated areas.
A number of approaches have been proposed to select both the number and the location of monitoring stations. Almost all reviewed studies address location design (84 of 90 studies). The methods summarized in Table 4 are all appropriate for location design.
(1) Topological methods. As mentioned in Section 3.1, topological methods were the earliest semi-quantitative design method for monitoring locations (Sharp, 1971) and continue to be actively used; the most recent paper using the Sanders approach is Varekar et al. (2016). Previous work has shown that the Sharp and Sanders approaches are suitable for the hierarchical analysis of centroids in large river basins with many tributaries. Because the method relies on this tributary hierarchy, it does not work for mainstreams without tributaries, lakes/reservoirs, or dense river networks with loops. The Sanders approach has been proposed for early warning and source identification to cope with sudden water pollution, and in practice it has also proved useful for regulatory monitoring. To design a new WQMN in a watershed, the Sanders approach can be a good choice because of its simplicity and low demand for historical water quality data. However, it is not recommended for the evaluation or redesign of existing WQMNs.
(2) Multivariate statistics. Multivariate statistical methods are the most widely used approach for WQMN design, as shown in Fig. 7, perhaps because they analyse water quality data without requiring other types of input. Because they can only analyse data from existing stations, these methods can only be used as part of a reduction strategy. However, their adaptability to the available data makes them feasible for various data scenarios and types of water body.
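To make the tributary hierarchy behind the Sharp and Sanders approaches concrete, the Python sketch below computes the magnitude of each link in a dendritic network (the number of source, or exterior, links draining into it) and picks the centroid link for the next station below the outlet. It assumes a loop-free network supplied as a hypothetical mapping from each link to its downstream link, and it uses a common statement of Sharp's rule, namely that the centroid is the link whose magnitude is closest to (M + 1)/2, where M is the outlet magnitude; that rule is our assumption, not a formula given in this review. Variants that weight links by pollution sources instead of tributary counts are not shown.

```python
from typing import Dict, List, Optional


def link_magnitudes(downstream: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Magnitude of each link: number of source (exterior) links draining into it."""
    # Invert the mapping so each link knows which links flow into it.
    upstream: Dict[str, List[str]] = {link: [] for link in downstream}
    for link, down in downstream.items():
        if down is not None:
            upstream[down].append(link)

    memo: Dict[str, int] = {}

    def magnitude(link: str) -> int:
        # A source link has magnitude 1; an interior link sums its tributaries.
        if link not in memo:
            children = upstream[link]
            memo[link] = 1 if not children else sum(magnitude(c) for c in children)
        return memo[link]

    return {link: magnitude(link) for link in downstream}


def centroid_link(downstream: Dict[str, Optional[str]], outlet: str) -> str:
    """Link whose magnitude is closest to (M + 1) / 2, M being the outlet magnitude (assumed rule)."""
    mags = link_magnitudes(downstream)
    target = (mags[outlet] + 1) / 2
    candidates = sorted(link for link in mags if link != outlet)  # sorted for deterministic ties
    return min(candidates, key=lambda link: abs(mags[link] - target))


# Hypothetical basin: sources A-E; AB, ABC and DE are interior links; O is the outlet.
network = {
    "A": "AB", "B": "AB", "AB": "ABC", "C": "ABC",
    "D": "DE", "E": "DE", "ABC": "O", "DE": "O", "O": None,
}
print(link_magnitudes(network)["O"])  # 5 source links drain to the outlet
print(centroid_link(network, "O"))    # ABC (magnitude 3, closest to (5 + 1) / 2)
```

Applying centroid_link recursively to the sub-networks above and below each chosen link would yield lower-order stations; that recursion is omitted for brevity.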
(3) Geostatistics. There are relatively few cases of geostatistical methods being used to design station locations. The basic requirement of the geostatistical method is spatial correlation of the data, which is rarely observed in networks consisting of a small number of stations. Moran's I requires more stations than kriging does (Ou et al., 2012), although its calculation is less complicated; in any case, existing geostatistical analysis tools (e.g., ArcGIS) can effectively aid the calculation. Moreover, geostatistical methods do not require data over a certain length of time, and analyses can be performed with a single snapshot of data. These characteristics make geostatistical methods more feasible in areas lacking historical water quality data but also create a limitation: networks designed using this method have difficulty coping with temporal variations in water quality. Based on these features, this approach is recommended for lakes or reservoirs for both the design and optimization of WQMNs.
Table 4. Optimization objectives proposed in studies driven by water quality values.
Table 5. Different criteria proposed in studies for multi-criteria optimization.
(4) Information entropy. It has been nearly three decades since the application of information entropy to WQMN design was first reported by Harmancioglu and Alpaslan (1992). The general idea is to minimize the redundancy of information, namely, to remove stations that share substantial mutual entropy with others so that the system is efficient. In many cases, entropy indexes such as ITI and value of information (VOI) are combined with multi-objective functions. Alameddine et al. (2013) proposed a maximum entropy-based hierarchical spatiotemporal Bayesian model with three entropy-based criteria: dissolved oxygen standard violation entropy, total system entropy, and chlorophyll-a standard violation entropy.
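The redundancy idea behind the entropy approach can be illustrated with discrete entropies. The Python sketch below discretizes concentration series from paired stations into classes, computes the transinformation T(X;Y) = H(X) + H(Y) - H(X,Y), and normalizes it by the joint entropy H(X,Y), so that 1 means fully redundant and 0 means independent. The class width, the station data, and the choice of H(X,Y) as the normalizer are assumptions for illustration; the studies cited above may define ITI differently.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(symbols: Sequence[Hashable]) -> float:
    """Discrete Shannon entropy in bits from observed class frequencies."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def discretize(values: Sequence[float], width: float) -> List[int]:
    """Map concentrations to equal-width classes (the class width is an assumption)."""
    return [math.floor(v / width) for v in values]


def transinformation(x: Sequence[int], y: Sequence[int]) -> float:
    """T(X;Y) = H(X) + H(Y) - H(X,Y): information shared by two stations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def redundancy_index(x: Sequence[int], y: Sequence[int]) -> float:
    """Transinformation normalized by joint entropy (assumed form of the index)."""
    joint = entropy(list(zip(x, y)))
    return transinformation(x, y) / joint if joint > 0 else 1.0


# Hypothetical paired samples (mg/L) from three stations on the same dates.
a = discretize([2.1, 3.4, 5.0, 2.2, 4.8, 3.3, 5.1, 2.0], width=1.0)
b = discretize([2.3, 3.6, 5.2, 2.4, 4.9, 3.5, 5.3, 2.1], width=1.0)
c = discretize([4.0, 2.5, 3.1, 5.2, 2.2, 4.4, 3.9, 5.5], width=1.0)
print(redundancy_index(a, b))  # 1.0: station B adds no information beyond A
print(round(redundancy_index(a, c), 2))
```

In a reduction strategy, a station in the most redundant pair (here B) would be the first candidate for removal; with real records, far longer series are needed for stable entropy estimates.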
In Nikoo et al. (2016), entropy was calculated from simulation data produced by the CE-QUAL-W2 model to cope with the shortage of historical data. Beyond the design of routine regulatory WQMNs, Shi et al. (2018) were the first to use ITI to design an emergency monitoring network.
(5) Optimization framework. Single-criterion and multi-criteria optimization methods provide general frameworks, and the detailed objectives are summarized in Tables 4 and 5. These approaches require several types of data: in addition to water quality data, the multi-objective function usually requires geographic, population, and pollution source data for the basin. The methods for quantifying these data as part of the objective function vary between studies, but some common formulations can be summarized as follows.
The first objective, $f_1$, is computed from C, the pollutant concentration at a station, and S, the corresponding water quality standard; maximizing $f_1$ helps to monitor highly polluted areas (Liyanage et al., 2016; Ning and Chang, 2002; Park et al., 2006; Pérez et al., 2017). The second objective, $f_2$, is computed from D, the distance between a station and key points such as water intakes, confluences, or even roads (to account for the accessibility of stations); maximizing $f_2$ strengthens the monitoring of important points (Bastidas et al., 2017; Liyanage et al., 2016; Ning and Chang, 2002; Park et al., 2006). The third objective, $f_3$, is computed from P, the total population near the monitoring stations, usually within a radius of 10 km; in the evaluation of sub-basins, P can also denote the population of each sub-basin. Maximizing $f_3$ helps to cover areas of high population density as much as possible (Liyanage et al., 2016; Ning and Chang, 2002; Pérez et al., 2017). Overall, however, the heterogeneity of data types and quantification methods remains a problem. The weighting of the different objectives is also an important issue. Usually, after the data are normalized by min-max scaling or Z-scores, all objectives are weighted equally by default (Chang et al., 2014; Icaga, 2005; Park et al., 2006). Alternatively, expert scoring can be used to determine the weights (Bastidas et al., 2017; Chang and Lin, 2014b; Liyanage et al., 2016; Ning and Chang, 2002), or fuzzy theory can be used to analyse them (Ning and Chang, 2004).
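Because the explicit forms of the three objective functions are not reproduced here, the following Python sketch only illustrates the general workflow described above: compute one criterion per candidate site, normalize each criterion by min-max scaling, combine them with equal default weights, and keep the best-scoring sites. The criterion forms C/S (pollution relative to the standard), 1/D (proximity to a key point), and P (nearby population), the candidate data, and all function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Candidate:
    name: str
    concentration: float  # C, pollutant concentration at the site (mg/L)
    standard: float       # S, water quality standard for that pollutant (mg/L)
    distance_km: float    # D, distance to the nearest key point (intake, confluence, ...)
    population: float     # P, population within an assumed radius of the site


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; a constant criterion contributes nothing."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(cands: Sequence[Candidate],
                    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> Dict[str, float]:
    """Equal-weight sum of min-max normalized criteria (assumed criterion forms)."""
    pollution = min_max([c.concentration / c.standard for c in cands])  # assumed C/S form
    proximity = min_max([1.0 / c.distance_km for c in cands])           # assumed 1/D form
    people = min_max([c.population for c in cands])
    w1, w2, w3 = weights
    return {c.name: w1 * p + w2 * d + w3 * h
            for c, p, d, h in zip(cands, pollution, proximity, people)}


def select_sites(cands: Sequence[Candidate], k: int) -> List[str]:
    """Keep the k candidates with the highest combined score."""
    scores = weighted_scores(cands)
    return sorted(scores, key=scores.get, reverse=True)[:k]


sites = [
    Candidate("S1", concentration=8.0, standard=4.0, distance_km=1.0, population=50_000),
    Candidate("S2", concentration=2.0, standard=4.0, distance_km=10.0, population=5_000),
    Candidate("S3", concentration=4.0, standard=4.0, distance_km=2.0, population=20_000),
]
print(select_sites(sites, k=2))  # ['S1', 'S3']
```

Replacing min_max with Z-scores, or the equal weights with expert-assigned ones, changes only the corresponding helper step.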
In theory, the search space for the multi-objective function, i.e., all possible monitoring locations, spans the entire river. In practice, however, a set of potential monitoring stations is usually preselected using the Sanders approach to reduce the computational burden of the subsequent optimization (Alilou et al., 2018; Icaga, 2005; Lettenmaier et al., 1984; Park et al., 2006). This problem does not arise in the evaluation of sub-basins because the number of sub-basins is usually small.
All objectives and criteria are optional, depending on management requirements and data constraints. Thus, multi-criteria optimization is a convenient and feasible approach across different data scenarios and can provide a complete solution for the design, evaluation, or optimization of WQMNs.
(6) Method selection. Previous studies have accumulated the most experience in the design of monitoring stations. However, how to select the best design method remains unclear. Nguyen et al. (2019) concluded that river size and the extent of the WQMN do not appear to influence the choice of design method. Therefore, comparisons or combinations of different design methods are recommended. For example, a topological approach can be used to pre-allocate the network, and optimization approaches based on different criteria can then refine the solution.

Design sampling frequency
Methodologically, the design of monitoring frequency is quite different from the design of station locations. Only 22 cases address the design of sampling frequency (Table 3; e.g., Naddeo et al., 2007, 2013; Sanders and Adrian, 1978; Vilmin et al., 2018). The first study was conducted as early as 1978 by Sanders and Adrian (1978), who used confidence interval (CI) approaches to define the most suitable sampling frequencies, treating river flow as a random variable. The goal is to select a frequency at which monitoring can estimate the mean value of the water quality data within a specified confidence interval (Lo et al., 1996). The most recent work was conducted by Vilmin et al. (2018), who used a hydro-biogeochemical modelling approach to design the sampling frequencies for six major water quality indicators defined by the European Water Framework Directive in a large river under strong human impact. The optimal frequency depends on the station location and the water quality indicator.
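The confidence-interval logic can be written as a worked sample-size calculation. Under the simplifying assumptions of independent, normally distributed samples with a standard deviation sigma estimated from historical records, the number of samples per year needed to estimate the annual mean within plus or minus E is n = (z * sigma / E)^2, where z is the two-sided standard normal quantile for the chosen confidence level. The Python sketch below evaluates this; it ignores the flow-related variability and serial correlation that a full analysis would treat, and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_year(sigma: float, half_width: float, confidence: float = 0.95) -> int:
    """Samples needed so the CI of the annual mean is within +/- half_width.

    Assumes independent, normally distributed samples (a simplification).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided standard normal quantile
    return math.ceil((z * sigma / half_width) ** 2)


def sampling_interval_days(n_samples: int, days: float = 365.0) -> float:
    """Approximate interval between evenly spaced samples."""
    return days / n_samples


# Hypothetical: dissolved oxygen with sigma = 2.0 mg/L, target precision +/- 0.5 mg/L.
n = samples_per_year(sigma=2.0, half_width=0.5)
print(n, round(sampling_interval_days(n), 1))  # 62 5.9
```

Halving the tolerated half-width quadruples the required number of samples, which is why the precision target, rather than the method, usually dominates the resulting frequency.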
Another typical method is information entropy, first used by Harmancioglu and Alpaslan (1992), in which the optimal sampling interval is determined after the entropy concept has been introduced to determine the monitoring locations. The application of information entropy is also simple and straightforward: for a given station, the more intensive the monitoring, the richer the water quality information obtained and the greater the marginal entropy of the time series for a water quality indicator. As the monitoring frequency is progressively reduced, the water quality information and marginal entropy decrease, as does the mutual entropy with the original full-frequency data. That is, the impact of changes in monitoring frequency on the acquisition of water quality information can be measured quantitatively (Karamouz et al., 2009a; Ozkul et al., 2000; Shi et al., 2018). For the time series analysed in the information entropy method, the station and the sampling frequency are merely two labels, with no substantial difference between them. Therefore, by considering temporal frequency and spatial distribution together, the best combination of locations and frequency for capturing water quality information can be found (Harmancioglu and Alpaslan, 1992; Karamouz et al., 2009b). Furthermore, with information entropy as a bridge, optimization methods can also be applied to frequency design (Mahjouri and Kerachian, 2011; Maymandi et al., 2018; Pourshahabi et al., 2018).
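The statement that information falls as frequency is reduced can be checked numerically. The Python sketch below thins a discretized daily series to every k-th sample, rebuilds a full-length series by holding each retained value until the next sample, and reports the mutual information with the original series as a fraction of the original marginal entropy (1.0 means no information lost). The sample-and-hold reconstruction and the synthetic data are assumptions chosen for simplicity, not necessarily the procedure used in the cited studies.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence


def entropy(symbols: Sequence[Hashable]) -> float:
    """Discrete Shannon entropy in bits."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def hold_reconstruction(series: Sequence[int], step: int) -> List[int]:
    """Keep every `step`-th sample and hold it until the next retained sample."""
    return [series[(i // step) * step] for i in range(len(series))]


def information_retained(series: Sequence[int], steps: Iterable[int]) -> Dict[int, float]:
    """Fraction of the full-frequency entropy still captured at each sampling step."""
    h_full = entropy(series)
    return {s: mutual_information(series, hold_reconstruction(series, s)) / h_full
            for s in steps}


# Hypothetical daily water quality classes over four weeks.
daily = [1, 1, 2, 3, 3, 2, 1, 1, 2, 4, 4, 3, 2, 1,
         1, 1, 2, 3, 4, 4, 3, 2, 2, 1, 1, 2, 3, 3]
for step, frac in information_retained(daily, steps=(1, 2, 7)).items():
    print(f"every {step} day(s): {frac:.2f} of the daily information")
```

Running the same function on the series of several stations gives the joint space-time view mentioned above, with station and step treated as interchangeable labels.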
In addition to information entropy, various statistical methods in a broad sense are used to select the sampling frequency. These methods fall into two classes: those that reduce the frequency and those that increase it. The basic idea of the former is to reduce the frequency until the data obtained are no longer as representative as those at the original frequency. This representativeness can be measured by the confidence interval (Lo et al., 1996) or by a self-defined index (e.g., the water pollution index (WPI) of Liu et al. (2014)). In addition, analysis of variance (ANOVA) (Guigues et al., 2013) and trend analysis (Naddeo et al., 2007, 2013) can be used to check representativeness. Some studies have also incorporated a water quality model into frequency design. Hunt et al. (2008) used existing monitoring data to model and analyse the trends of dissolved oxygen and chlorophyll and found that an appropriate reduction in monitoring frequency had little effect on the accuracy of the statistical model.
Research on increasing frequency has generally produced qualitative results. Combined cluster and discriminant analysis (CCDA) can capture temporal changes in water quality and provide some qualitative guidance. The basic idea is to increase the monitoring frequency in periods with more pronounced changes identified by statistical approaches, such as spring and autumn (Tanos et al., 2015) or the rainy season (Calazans et al., 2018b). Nguyen et al. (2019) provided good insights into the differences between low-frequency and high-frequency sampling; a sub-daily frequency can be defined as high-frequency monitoring. For sensors, the maximum frequency they can stably provide corresponds to intervals of 5 to 15 min. However, few studies have focused on the design of high-frequency monitoring.
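A frequency-reduction check of the kind described above can be sketched as follows: starting from a full-frequency record, the Python code coarsens the sampling step as long as every possible thinned schedule still yields a mean inside the confidence interval of the full-record mean. The normal-approximation interval, the all-phases rule, and the test series are assumptions for illustration; WPI-, ANOVA-, or trend-based criteria would replace the mean_ci test.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def mean_ci(values: List[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval of the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    m = mean(values)
    half = z * stdev(values) / math.sqrt(len(values))
    return m - half, m + half


def coarsest_representative_step(series: List[float], max_step: int,
                                 confidence: float = 0.95) -> int:
    """Largest sampling step whose thinned records all keep their mean inside the CI."""
    lo, hi = mean_ci(series, confidence)
    best = 1
    for step in range(2, max_step + 1):
        for offset in range(step):  # every possible starting day of the reduced schedule
            thinned = series[offset::step]
            if len(thinned) < 2 or not lo <= mean(thinned) <= hi:
                return best  # representativeness lost: keep the previous step
        best = step
    return best


# Hypothetical records: a slowly varying series and one with a strong 2-day cycle.
smooth = [5.0 + 0.1 * math.sin(i / 5) for i in range(60)]
cyclic = [0.0, 10.0] * 30
print(coarsest_representative_step(smooth, max_step=7))
print(coarsest_representative_step(cyclic, max_step=7))  # 1: halving the frequency aliases the cycle
```

The cyclic case shows why the check tries every phase of the thinned schedule: a single lucky starting day could hide the loss of representativeness.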

Select water quality indicators
The design of water quality indicators is a semi-structured problem and is less complex than the design of the other two network parameters. We therefore use the term 'select' here, consistent with Sanders et al. (1983). Only 8 studies address the selection of water quality indicators (Table 3), 7 of which were published after 2013. The indicators studied thus far are mainly general physicochemical parameters and organic pollutant indicators. Nguyen et al. (2019) summarized the most frequently reported water quality indicators in rivers; the top five are BOD, DO, nitrate, pH, and conductivity.
The most-used approach is multivariate statistics, such as PCA and PFA, whose primary purpose is to reduce the number of water quality indicators. For instance, Ouyang (2005) was the first to use PCA and PFA to evaluate 20 water quality indicators in the WQMN of the lower St. Johns River (LSJR) and identified the key indicators that contributed most significantly. Calazans et al. (2018a, 2018b) used CA to divide the stations into groups and then used PCA/PFA to identify the main factors and major pollutants in each group. Interestingly, Villas-Boas et al. (2017) used nonlinear principal component analysis (NLPCA) based on an autoassociative neural network to evaluate the redundancy of water quality indicators in the Piabanha River, Brazil. Guigues et al. (2013) recognized three very different behaviours of water quality variability: indicators with high temporal and low spatial variability (e.g., suspended solids), indicators with high spatial and average temporal variability (e.g., calcium), and indicators with both high temporal and high spatial variability (e.g., nitrate). Thus, indicators cannot be reduced beyond these three basic categories.
Existing studies all focus on rivers or watersheds. Other water bodies, such as lakes, reservoirs, and estuaries, have distinctly different patterns of water quality change, and more studies are needed.
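As an illustration of PCA-based indicator reduction, the Python sketch below (using NumPy) computes principal components of the indicator correlation matrix, keeps components with eigenvalues greater than 1 (the Kaiser criterion, a common convention assumed here rather than one taken from the cited studies), and reports the indicator with the largest absolute loading on each retained component. The data are synthetic and the indicator names A-E are placeholders; PFA rotations and the NLPCA variant are not covered.

```python
import numpy as np


def key_indicators(data: np.ndarray, names: list[str], min_eigenvalue: float = 1.0) -> list[str]:
    """Indicator with the largest |loading| on each retained principal component.

    data: samples in rows, indicators in columns.
    """
    corr = np.corrcoef(data, rowvar=False)    # PCA on standardized indicators
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue           # Kaiser criterion (assumed)
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    picks: list[str] = []
    for j in range(loadings.shape[1]):
        name = names[int(np.argmax(np.abs(loadings[:, j])))]
        if name not in picks:
            picks.append(name)
    return picks


# Synthetic data: indicators A-C follow one latent factor, D-E another.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 200))
columns = [f + rng.normal(scale=0.1, size=200) for f in (f1, f1, f1, f2, f2)]
data = np.column_stack(columns)
picks = key_indicators(data, ["A", "B", "C", "D", "E"])
print(picks)  # one representative from {A, B, C}, then one from {D, E}
```

Because A-C (and D-E) carry nearly the same information, monitoring one indicator per group would preserve most of the variance in this synthetic case; the grouping logic of Guigues et al. (2013) suggests why real data rarely collapse this far.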

Surface water quality modelling helps WQMN design
Surface water management often involves both the monitoring and the modelling of water quality and quantity. Water quality models can be used in areas or periods in which monitoring is not feasible or accessible, and to assess and predict the water quality status resulting from different management strategies (Fu et al., 2019; Reis et al., 2015).
The design of monitoring networks is also closely related to water quality models and modelling, as shown in Fig. 9. On the one hand, a WQMN provides measurement data, according to its designed network parameters, for the calibration, validation and training of water quality models. On the other hand, water quality models in turn improve network design.
Water quality models can extend monitoring data in data-scarce scenarios (Shi et al., 2018), and model performance can be used as an optimization objective for network representativeness and reliability. In the investigated studies, water quality models, whether process-based (Nikoo et al., 2016) or data-driven (Hunt et al., 2008), have recently been combined with network design methods (Chen et al., 2012; Do et al., 2011; Hunt et al., 2008; Nikoo et al., 2016; Puri et al., 2017; Shi et al., 2018).

WQMN deployment helps water quality modelling
An effective water quality modelling platform needs a well-deployed WQMN. The development of online multi-parameter water sensors, in turn, improves the performance of both data-driven and process-based models. A new trend is the integration of modelling and smart sensors under the "Big data" paradigm. Zheng et al. (2018) provided an in-depth analysis of how crowdsourced data acquisition, acting as a largely distributed monitoring network, affects and improves geophysical modelling. Developing real-time monitoring and early warning systems for hydrology and water quality, with a bespoke network of wireless water sensors combined with machine learning, is promising.
Data assimilation techniques supported by real-time monitoring have been widely used to improve the forecasting performance of process-based models (Cooper et al., 2018; Park et al., 2020). Assimilating high-frequency water quality data enables the identification of multiple sources of uncertainty, involving model parameters, model structure (hydrology-hydraulics-water quality), future forcing (e.g., rainfall, temperature, wind speed and solar radiation), and observations, which challenge the validation and application of water quality models (Cooper et al., 2018; Kim et al., 2014). Research has shown that real-time monitoring of river water quality can be used to improve the control of urban wastewater systems and thus meet downstream river water quality requirements with reduced energy consumption for wastewater treatment (Meng et al., 2017, 2020). Thus, the design of WQMNs to support water quality modelling should be considered in future research. Reis et al. (2015) proposed a data-to-information transformation through the integration of monitoring and smart sensors. Chacon-Hurtado et al. (2017) demonstrated the roles of measurements in rainfall-runoff modelling and classified model-free and model-based approaches to network design. More investigations are needed on the appropriate combination of water quality models for various scenarios (Fu et al., 2019), such as lakes and reservoirs, different non-point source pollution scenarios, and different spatial-temporal scales.
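To make the data assimilation idea concrete, the Python sketch below runs a scalar Kalman filter, one of the simplest assimilation schemes: a toy linear model forecasts dissolved oxygen and its error variance, and each sensor reading pulls the forecast toward the observation in proportion to their relative uncertainties. The model, its parameters (decay, source, error variances), and the readings are hypothetical; the studies cited above use their own process-based models and assimilation schemes.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Estimate:
    mean: float      # state estimate, e.g. dissolved oxygen (mg/L)
    variance: float  # its error variance


def forecast(prev: Estimate, decay: float, source: float, process_var: float) -> Estimate:
    """Toy linear model x_t = decay * x_{t-1} + source, with added model error."""
    return Estimate(decay * prev.mean + source, decay ** 2 * prev.variance + process_var)


def assimilate(prior: Estimate, obs: float, obs_var: float) -> Estimate:
    """Kalman update: weight the observation by the relative uncertainties."""
    gain = prior.variance / (prior.variance + obs_var)
    return Estimate(prior.mean + gain * (obs - prior.mean), (1 - gain) * prior.variance)


def run_filter(start: Estimate, readings: Sequence[Optional[float]],
               obs_var: float = 0.25) -> List[Estimate]:
    """Forecast each step and assimilate the sensor reading when one is available."""
    state, history = start, []
    for obs in readings:
        state = forecast(state, decay=0.9, source=0.8, process_var=0.1)
        if obs is not None:  # a missing reading leaves the forecast unchanged
            state = assimilate(state, obs, obs_var)
        history.append(state)
    return history


print(assimilate(Estimate(6.0, 1.0), obs=7.0, obs_var=1.0))  # Estimate(mean=6.5, variance=0.5)
for step in run_filter(Estimate(8.0, 1.0), [7.6, 7.4, None, 7.1]):
    print(round(step.mean, 2), round(step.variance, 3))
```

The variance grows at the step without a reading and shrinks again once data arrive, which is one way to see how sensor placement and frequency feed back into model uncertainty.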

Modern water quality sensors and WQMN design relative to smart cities
In situ sensor observation has become increasingly popular with the spread of Internet-of-Things applications (Reis et al., 2015). Water environment management based on automatic high-frequency monitoring (AHFM) is emerging, and a few studies in the reviewed literature have discussed this trend (Horsburgh et al., 2010; Jácome et al., 2018; Nam and Aral, 2007). The design of sensor-based WQMNs involves communication, data storage, power management and other special factors, and this topic deserves particular attention.
Monitoring devices in smart cities tend to be miniaturized, intelligent, and multifunctional, and their portability is greatly enhanced.
Unlike traditional monitoring stations, these devices do not need to remain fixed in one place for a long time. The optimal monitoring layout will therefore become increasingly important, and the layout of the monitoring network can be dynamically optimized and adjusted at any time.
Information collection technologies in smart cities are expanding and now include passive and active remote sensing using radars and satellites, microwave links, crowdsourcing, and citizen observatories. This unconventional information can compensate for the limitations of traditional networks, and new network design methods are needed to build a unified heterogeneous sensor network (Chacon-Hurtado, 2019).
The papers reviewed here mainly focus on the design of large-scale monitoring projects covering watersheds, lakes, and bays, and the methods used perform well at this scale. In the foreseeable future, it will be neither practical nor necessary to carry out the detailed, smart-city-style network design described above at such large scales. Traditional in-basin monitoring projects will therefore continue to provide large-scale background support for water quality management. For smart city construction, new design frameworks are urgently needed that provide refined management of urban water bodies while complementing traditional networks.

Similarity and nexus with other monitoring networks
Comparison with the design of water quality monitoring networks for other water systems is helpful. These systems typically include groundwater and engineered municipal water systems, such as water supply networks (Bragalli et al., 2019; He et al., 2018), drainage systems (Casal-Campos et al., 2018) and channels (Chen and Han, 2018). In contrast to surface water quality monitoring, the design of groundwater monitoring networks usually places greater emphasis on identifying pollution sources (Amirabdollahian and Datta, 2013; Loaiciga et al., 1992). Because pollutants diffuse in three dimensions in groundwater, monitoring design is more complicated, and because many types of subsurface data are difficult to obtain, models are also crucial for network design. There are fewer reports in the literature on the quantitative design of monitoring networks for urban water distribution and sewer systems (He et al., 2018). Compared with open water bodies, external influences on these well-defined systems are relatively controllable, and various statistical methods may be more suitable for designing their monitoring networks (Yazdi, 2018).
The design and optimization of hydrological monitoring networks are similar to those of surface water quality networks, i.e., they involve acquiring information, reducing redundant information, and reducing uncertainty at other points; therefore, they share the same methodology (Chacon-Hurtado et al., 2017). In many circumstances, monitoring of hydrological parameters (e.g., rainfall and stream flow) can be integrated with monitoring of water quality indicators. However, combined design that considers their interactions is rarely reported in the literature.
Nevertheless, hydrological models, including rainfall-runoff models, are better developed than water quality models, and combining monitoring networks with models can provide more reliable and accurate results. In addition, modern monitoring approaches such as remote sensing, microwave links, and crowdsourcing are easier to incorporate into hydrological monitoring campaigns and can complement traditional monitoring networks (Chacon-Hurtado, 2019). As a result, traditional monitoring networks must be updated to allow the assimilation of such heterogeneous dynamic data. Similar trends are apparent for WQMNs in smart cities, as discussed in Section 5.1.

Additional related issues
Other issues warrant specific discussion.
(1) Data scarcity in practice and associated uncertainty. The majority of studies have been conducted under the condition or assumption that sufficient data are available for design. In reality, however, available data are often scarce, especially when setting up a new WQMN in ungauged water bodies. Alilou et al. (2019) noted this limitation and proposed a combined approach coupling an analytic network process, fuzzy logic and river mixing length, which identified the six most appropriate locations and four candidate locations in a watershed in northwest Iran. How to treat the uncertainty associated with limited data availability during design is an important question. One way to resolve the dilemma of data scarcity is coupling with a water quality model.
(2) Post-implementation evaluation of redesigned WQMNs. Few studies have investigated the performance of redesigned networks. How to evaluate the performance of updated networks is an important question for good practice.
(3) Adapting to more complex network architectures. Some studies have proposed station locations with different levels of priority (Alilou et al., 2019; Chang and Lin, 2014a; Chang et al., 2014), which is usually a natural output of the optimization process. Setting up such a more complex network architecture is a good way to balance financial limitations and network functions. By analogy with the computer storage hierarchy, which runs from CPU registers through L1-L3 caches, main memory, and local secondary storage to remote secondary storage (distributed file systems, Web services) (Berger, 2005), a hierarchy for WQMNs is proposed in Fig. 10. This pathway can also be incorporated into phased network construction, such as phase I and phase II stations.

Future research directions
Based on the outcomes of this review and emerging trends in water environmental management, the following four research directions with ten specific questions or issues have been identified for the research community and professionals working on surface water WQMN construction (Fig. 11).
(1) Innovate design patterns and methods. (A) Meta-analysis to find new patterns. When suitable cases are available in the literature, meta-analysis could provide new insights into the factors influencing network design (such as water body size and the extent of human activity impacts) and the performance of design methods (Huang and Han, 2014; Zhuo et al., 2015). (B) Novel design methods.
Novel design methods are still needed, particularly for water bodies linked with complex social and economic activities, e.g., WQMNs for urban receiving waters connected to drainage systems. Researchers from Alibaba Business College used complex network theory to design a WQMN for an urban water environment (Xiang et al., 2016); this approach makes good use of topological characteristics and water quality records.
(2) Embrace emerging monitoring technologies. (A) Surfing the wave of automatic high-frequency monitoring. The recent rise of high-frequency monitoring has promoted innovation in water quality management (Kunz et al., 2017; Marcé et al., 2016; Rode et al., 2016). Marcé et al. (2016) argued that AHFM maximizes the provision of ecosystem services by lakes and reservoirs and is conducive to reporting lake status to management agencies. It also uncovers new patterns, such as concentration-discharge relationships (Bouchez et al., 2017; Moatar et al., 2017), storm event responses (Blaen et al., 2017), and water chemistry (Kunz et al., 2017). How to optimally design sampling frequencies and select water quality indicators under high-time-resolution observations is an open question. (B) Surrogate monitoring and soft measurement. Surrogate monitoring is a counterpart to AHFM and has been used in water management practice (Horsburgh et al., 2010; Jones et al., 2011). It increases flexibility in the selection of water quality indicators. Machine learning can be used in water quality monitoring to link external variables such as rainfall, temperature, and solar radiation with observations from in situ water quality sensors to provide better water quality estimation. (C) Sensor networks. The use of sensors is increasingly popular, and this topic deserves particular attention; some discussion has been provided in Section 5.1. (D) Design towards manifold measurement instruments.
As mentioned in Section 5.1, options for monitoring measurements are increasing. A modern monitoring network usually combines various monitoring instruments and several functions. The quantitative design of an optimal network under a complex architecture is very challenging. Some have recognized this issue in smart city construction. Chen and Han (2018) demonstrated how to construct water quality monitoring infrastructures for a smart city. effective network to meet the environmental forensics requirements of water agencies by holistically considering the monitoring location, frequency and indicators is a significant issue that remains challenging for urban water management in developing countries due to the complexity of point source release and transport processes.

Conclusion
The design of an appropriate water monitoring network is a fundamental aspect of water management, as it is the first step in providing a representative and reliable estimation of the quality of surface waters for all stakeholders. Keeping in mind that the best monitoring network is a fit-for-purpose and cost-effective one, great care should be dedicated to the process of (re)designing such a network (Guigues et al., 2013).
The following major lessons can be learned from this critical review. The quantitative design of WQMNs is currently in a stage of rapid development, and several successful methods have been proposed in the literature; topology, multivariate statistics, geostatistics, information entropy, and single- and multiple-criteria optimization are typical categories. Among the network parameters, station location has been investigated extensively, whereas sampling frequency and water quality indicators have received relatively little attention. The pros and cons of these methods for different network parameters have been summarized in this work. The chronological changes, journal distribution, authors and affiliations, citations, and study areas of the reviewed literature all present interesting and meaningful patterns.
Judging from the country distribution of publications, WQMN construction practice appears to be growing in developing countries. The purpose of WQMN design varies from country to country and from decade to decade owing to the diversity and succession of environmental problems. Hence, research on WQMN design is likely to remain active in the long run, driven by the development of monitoring technology and continued investment.
In the smart city era, surface water WQMNs present new characteristics: they are dynamic and heterogeneous and are coupled with other urban monitoring infrastructures, such as those for urban flood control, transportation, and security. Moreover, the spatial scale of a city is sometimes inconsistent with the natural boundaries of water bodies, and the time scale or observation frequency required for precise urban environment management may not match that of traditional WQMNs. These challenges demand smart solutions for network design. The design methods summarized here lay the foundation for success under these more complex management conditions. Large gaps in knowledge or methods imply opportunities. For example, how can the design of sampling frequency and water quality indicators be improved in the age of high-frequency water quality management? How should appropriate optimization objectives and the representativeness and stability of the network under different conditions or restrictions be defined? How should the results provided by different methods be evaluated? How should WQMNs be designed under uncertainty? New design methods are still very much needed, particularly for non-point source management, emergency monitoring, mobile monitoring, and pollution source identification.
Furthermore, the international hydrology community proposed 23 unsolved problems in hydrology in 2019 (Blöschl et al., 2019). Precise design of monitoring networks will play a fundamental role in solving these problems. There is still a long road ahead before mature, officially standardized design guidelines can be issued for practical use.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.