Identification of sampling points for the detection of SARS-CoV-2 in the sewage system

Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; Coronaviridae Study Group of the International Committee on Taxonomy of Viruses et al., 2020), which caused the COVID-19 pandemic, had resulted in the death of more than 4.5 million people by the end of August 2021, and the number of recorded cases had reached 218 million. The virus is similar to the SARS-CoV coronavirus and was transmitted from animals to humans by a host switch (Morens et al., 2020). The global epidemic has severe and pervasive socio-economic impacts (Nicola et al., 2020), including the direct mortality of the disease and the mortality related to depression, social isolation and unemployment (VanderWeele, 2020). In general, the pandemic posed challenges in achieving several of the sustainability goals (Wang and Huang, 2021), with cities being more affected (Megahed and Ghoneim, 2020). The density of the built environment correlates positively with incidence rates (Liu et al., 2021a), alongside income and poverty (Sannigrahi et al., 2020). Based on mobile phone positioning data, it was revealed that even after the incidence rates were under control, certain regions in Beijing could not recover to their pre-COVID-19 states (Liu et al., 2021b). A decrease in primary preventive services and hospital admissions for non-COVID-related diseases was also observed, partly because patients were afraid to seek medical care during the pandemic, which might lead to increased morbidity and mortality in the future due to the delayed start of treatment (Barach et al., 2020). The potential effect of the pandemic on the social arrangement can be compared to the effects of the 1918-1920 Great Influenza Pandemic (Barro et al., 2020). The resilience of cities against the socio-economic effects of the COVID-19 pandemic and the efficiency of control measures are associated with urban governance and the level of smart city construction (Chu et al., 2021).
Additionally, attention should be paid to the preparedness of water utilities, as they play an important role in tackling the infection (Gude and Muire, 2021).
The geographical tracking of the population-level infection rate and occurrence of COVID-19 is one of the most important tasks of epidemiology, since geographical surveillance studies provide the information for predictive risk mapping and the consequent intervention activities. For this purpose, several techniques were suggested and tested, such as the online real-time or near-real-time mapping of disease (Boulos and Geraghty, 2020) by processing the data of self-reported symptoms (Menni et al., 2020) or high-resolution transaction data (Carvalho et al., 2020). The more 'orthodox' surveillance techniques rely on the medical diagnostic-based detection of the virus, although these methods are relatively expensive and resource-limited, and only allow the estimation of the population-level infection and occurrence rate patterns if an adequate number of tests is performed in a representative manner. The medical record itself is an unsuitable real-time epidemiological predictor for disease control because SARS-CoV-2 can spread from individuals in the pre-symptomatic (Arons et al., 2020) and symptomatic clinical phases (Gandhi et al., 2020), as well as from asymptomatic infections. This circumstance is problematic for tracking the pandemic because 18-75% of the positive individuals can be asymptomatic (Day, 2020; Mizumoto et al., 2020; Nishiura et al., 2020), and these people also release the virus. Moreover, the latency of the appearance of the first symptoms after the infection is estimated to be about 2-14 days by Yang et al. (2020) or 2-12 days by Lauer et al. (2020), with a median of 5 days. If pooled sampling is applied, a larger group of people can be tested by combining their samples (Hanel and Thurner, 2020), saving time (Mutesa et al., 2020) and resources (Hirotsu et al., 2020), but this still requires the collection of the individual clinical specimens.
Wastewater-based epidemiology (WBE) provides an alternative for the population-level surveillance of the COVID-19 pandemic by testing wastewater samples for the genetic markers of SARS-CoV-2. The technique was used successfully on several occasions previously, for example in identifying the population-wide occurrence of infectious diseases caused by human norovirus (Pérez-Sautu et al., 2012), poliovirus (Asghar et al., 2014) and hepatitis A (Hellmér et al., 2014). A common feature of the substances detectable by WBE is that they are unique to human metabolism, are discharged with wastewater and can therefore be linked to a community (Sims and Kasprzyk-Hordern, 2020). Though the respiratory system is the main 'target' of the SARS-CoV-2 infection (Fan et al., 2020; Gattinoni et al., 2020), diarrhoea is also among the frequent symptoms of COVID-19, appearing in 2 to 50 percent of the cases (D'Amico et al., 2020). A reason for this is that SARS-CoV-2 uses both the transmembrane serine protease 2 (TMPRSS2) and the angiotensin-converting enzyme 2 (ACE2) for S protein priming (Hoffmann et al., 2020), both of which are also expressed in the small intestinal epithelia, promoting the infection of the enterocytes and triggering the consequential enteral symptoms (Zang et al., 2020). Reports showed that the virus can be detected in stool samples (Chen et al., 2020a; Xu et al., 2020), and the persistent shedding of SARS-CoV-2 in faeces was also confirmed (Gupta et al., 2020).
While several factors influence the stability of the viral ribonucleic acid (RNA) in wastewater and thus the reliability of the detection methods (Hart and Halden, 2020), it was shown that SARS-CoV-2 can be successfully detected in municipal wastewater (Medema et al., 2020). Kumar et al. (2020) proved that the ORF1ab, N and S genes of SARS-CoV-2 can be discerned in the influent and that the increase in the SARS-CoV-2 genetic loading co-occurred with the growing number of active COVID-19 patients. Using three Reverse Transcription-Droplet Digital Polymerase Chain Reaction (RT-ddPCR) assays (N1, N2, N3), Gonzalez et al. (2020) showed that WBE could be used as a pre-screening tool for the early detection of small, localized outbreaks of COVID-19. Larsen and Wigginton (2020) showed that WBE serves as a cost-effective, rapid and real-time method to survey the transmission dynamics of entire communities for viral infections such as COVID-19. If continuous monitoring is carried out, it can also aid the detection of uncontrolled mutations before symptoms appear (Pulicharla et al., 2021). Róka et al. (2021) used composite samples taken once a week from all three wastewater treatment plants of Budapest to forecast the outbreak of the second wave in Hungary and found that the correlation is more pronounced when the number of new cases is increasing than at the plateau of the infection. WBE provides results at least 1-3 days before tests based on specimen collection (Peccia et al., 2020). Its aptness as an early warning system becomes more pronounced when the clinical diagnostic testing capacity is insufficient (Daughton, 2020), and it provides synergistic insights when considered together with diagnostic testing (Olesen et al., 2021).
It is important to note that comparable wastewater-based epidemiological surveillance can only be performed if the concentration and quantification of SARS-CoV-2 and other markers follow validated protocols (Polo et al., 2020); in turn, WBE can also be used to validate the trends calculated from diagnosed cases (Larsen and Wigginton, 2020).
Several of the studies mentioned above tested the influent of a wastewater treatment plant (WWTP). Kumar et al. (2020) used a WWTP of 180,000 m³/d as a case study, while Gonzalez et al. (2020) tested nine major plants of the Hampton Roads Sanitation District with a combined capacity of 656,000 m³/d. The smallest of the WWTPs in question accepts the wastewater of approximately 50,000 people. In Europe, and specifically in Hungary, a plant of that size is considered a medium-sized facility. Medema et al. (2020) chose to test the treatment facilities of the airport and of five cities. Ahmed et al. (2020) covered only 36 percent of the primary health networks that had clinical prevalence data (still accounting for 700 thousand people in the two catchment areas). On the other hand, Sharif et al. (2020) utilised the previously established polio-monitoring sites and showed that SARS-CoV-2 detection through wastewater surveillance can be applied as an early warning system in heavily populated areas where personal laboratory monitoring and door-to-door tracing would not be possible.
City-wide or catchment-based measures to control the pandemic could provide a more efficient solution than state-wide restrictions, and a district-based approach could further widen the options to tackle the spread of the disease. Both location-based social network data (Beria and Lunkar, 2021) and data on vacation rentals (Liang et al., 2021) showed dynamic changes within cities as restrictions were imposed. That also implies changes in water usage. According to our assumption, in regions where the wastewater is collected from several settlements and transported to a central facility, monitoring only the influent of the plant would result in low detection efficiency due to the dilution effect in case of low prevalence, and thus in unnecessary restrictions in unaffected areas. As Hamouda et al. (2020) pointed out, measurement at the wastewater treatment plant suffices only for a qualitative approach; for semi-quantitative or quantitative evaluation, samples from different parts of the wastewater collection system have to be taken and analysed. This is underlined by the spatio-temporal analysis of confirmed cases in Tehran, which identified several types of hot-spots at neighborhood scale (Lak et al., 2021).
Describing the distribution of the infected population in cities helps in planning appropriate measures on an early warning basis, as the structural description of the wastewater collection system can be used to determine how strongly each part of the settlement is affected. Identifying zones that represent the residents living in an area consistently and without blind spots has to be carried out considering several factors, and requires information on the wastewater collection system and, if possible, a hydraulic model of it. The aim of this study was to provide a Geographic Information System (GIS) based analytical tool to aid this purpose by evaluating the wastewater-based surveillance data with regard to the structure of the wastewater collection system, and thus to assess the relative incidence patterns of the COVID-19 infected population within the catchment area of the WWTP. This can provide a more detailed understanding of the hot-spots within a settlement and support more tailored control measures.
Based on the literature presented above, it can be assumed that the identification of different zones within a city allows for proactive resource allocation during the defence against the epidemic. This includes the planning of medical care (e.g. appointment of doctors, allocation of ventilators), but also the introduction of rules related to the use of facilities at risk, such as switching to distance learning in schools located in areas with higher concentrations of genetic markers in wastewater. Continuous monitoring can also allow the timing of the spread of infection to be described, so that these control interventions can be planned even more accurately.

Methodology
The main focus of this research is to develop a decision support tool to identify monitoring points for WBE surveillance. This section presents the method that we developed for identifying potential monitoring points in sewage networks, together with the SARS-CoV-2 detection procedure and the supporting wastewater characterisation measurements. The main steps of the method are shown in Figure 1.

Network analysis-based identification of the monitoring points
The developed methodology begins with the physical description of the sewer network. Ideally, the system is equipped with smart metering devices, or the necessary data of the wastewater collection system are available in digital form: the layout of pipes and pumping stations, the position of access points for sampling, and the flow rate, travel time, flow direction and diameter of each pipe.
In this paper we present an alternative solution for the case when the hydraulic model of the sewer network is not available and only a map of the serviced areas exists. Based on the premise that the utilities and the road infrastructure coincide in urban areas, the structure of the wastewater collection system can be estimated from the publicly accessible road network processed in a GIS environment. Utilising a publicly accessible source makes the method widely applicable. In the network structure, potential sampling points are proposed by the algorithm described in this paper, so that qPCR and wastewater analytical measurements can be performed in a representative manner.
The road network is available in vector format in the OpenStreetMap (OSM) database, where each road is listed as a separate object. To estimate the sewer network, the roads have to be divided into individual sections at the intersections representing the public sewer branches.
In addition to route identification, potential shaft locations also need to be modelled because sampling can be performed efficiently in these structures. In the model, the shafts are placed at the intersections, so that the distance between any two of them can be calculated along the network.
Taking advantage of the fact that the OSM database also stores the necessary data for the buildings, the wastewater discharges were estimated and then assigned to the nodes that represent the sections. The cumulative discharge of a node is estimated from the total area of the nearest buildings in the given section:

$$Q_i = \frac{PE}{s} \sum_{j=1}^{m} A_j$$

where $Q_i$ denotes the cumulative wastewater discharge of node $i$ (m³/day), $m$ stands for the number of buildings closest to node $i$, $A_j$ refers to the area of building $j$ (m²), $s$ represents the specific housing area per capita (m²/person) and $PE$ denotes the national population equivalent design value (m³/person/day).
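The building-based estimate can be illustrated with a minimal sketch. It assumes that the discharge of a node is the total building area divided by the specific housing area per capita and multiplied by the population equivalent design value, which is the form suggested by the units of the quantities defined above. The function name and all numerical inputs are hypothetical and serve illustration only.

```python
from typing import Sequence


def node_discharge(building_areas_m2: Sequence[float],
                   housing_area_per_capita_m2: float,
                   pe_m3_per_person_day: float) -> float:
    """Estimate the cumulative discharge Q_i (m3/day) of one node.

    Assumed form: Q_i = PE / s * sum_j A_j, inferred from the units of
    the quantities in the text, not taken from a published formula.
    """
    if housing_area_per_capita_m2 <= 0:
        raise ValueError("specific housing area per capita must be positive")
    # Estimated number of residents in the buildings closest to the node.
    residents = sum(building_areas_m2) / housing_area_per_capita_m2
    return residents * pe_m3_per_person_day


# Hypothetical inputs: three buildings, 40 m2/person, 0.15 m3/person/day.
example_q = node_discharge([120.0, 80.0, 200.0], 40.0, 0.15)
```

In practice the building footprints would come from the OSM building layer, and the values of s and PE from national statistics and design guidelines.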
In the case of SARS-CoV-2, the building-based estimation of the wastewater discharges is justified on the one hand by the reduced migration due to restrictive measures, and on the other hand by the collection of wastewater samples at the morning peak, which better characterizes the location and involvement of the population. The wastewater collection system can be represented as a directed network where the directionality of the edges is determined by the flow directions. As we assume an optimal structure of the system, the set of directed edges is determined by the Dijkstra algorithm (Noto and Sato, 2000), which is used to identify the shortest path between the nodes by minimising the distances measured on the road network.
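A minimal sketch of how shortest paths can orient the edges is given below. It assumes that flow follows the shortest road-network path from every node to a single outlet (the WWTP) and uses a textbook Dijkstra search from the outlet. The adjacency structure, node names and lengths are hypothetical.

```python
import heapq


def direct_edges_towards_outlet(adjacency, outlet):
    """Orient an undirected road graph towards the outlet.

    adjacency: {node: [(neighbour, length_m), ...]} (undirected graph).
    Returns (downstream, dist), where downstream[node] is the next node
    on the shortest path to the outlet and dist[node] is its length.
    """
    dist = {outlet: 0.0}
    downstream = {}
    heap = [(0.0, outlet)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was found already
        for v, length in adjacency.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                downstream[v] = u  # flow from v continues to u
                heapq.heappush(heap, (nd, v))
    return downstream, dist


# Hypothetical four-node road network with section lengths in metres.
roads = {
    "WWTP": [("A", 100.0)],
    "A": [("WWTP", 100.0), ("B", 50.0), ("C", 200.0)],
    "B": [("A", 50.0), ("C", 60.0)],
    "C": [("A", 200.0), ("B", 60.0)],
}
flow_to, path_length = direct_edges_towards_outlet(roads, "WWTP")
```

Because every node keeps exactly one downstream neighbour, the result is a tree rooted at the outlet, which is one simple way to obtain an acyclic directed network.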

The proposed measurement placement algorithm
The method presented in the previous section generates a directed acyclic graph (DAG) where the nodes $i = 1, \ldots, N$ of the network are characterised by their wastewater discharges $Q_i$.
We have worked out an algorithm to optimally place $M$ measurements. The measurements are assigned to a subset of the nodes, so the solution of the placement problem is represented by a set $P = \{p_1, \ldots, p_M\}$, where $p_m$, $m = 1, \ldots, M$, is the index of the node to which the $m$-th measurement is assigned. For example, $p_3 = 2153$ means that the 3rd measurement is assigned to the 2153rd node.
The proposed method is based on the reachability analysis of the network, which calculates which discharges are observable from a given measurement. When $R_{p_m}$ denotes the set of nodes observable from the $p_m$-th node (measuring point), the sum of the discharges sampled by that measurement is $q_{p_m} = \sum_{i \in R_{p_m}} Q_i$. The sets of nodes (discharges) observed by the sensors can overlap significantly, e.g. all of the nodes observed by the $m$-th sensor may also be observed by the $k$-th sensor, so that $R_{p_m} \subset R_{p_k}$. Although this case can be considered a redundancy, placing both sensors provides additional information, as the difference $q_{p_k} - q_{p_m}$ characterises the nodes in the set $R_{p_k} \setminus R_{p_m}$ (the nodes that are observable from the $k$-th sensor but not from the $m$-th sensor).

Fig. 1. The workflow of the COVID-19 sampling point identification
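The reachability sets and the sampled discharges can be sketched as follows. The sketch assumes that the network is given as a list of directed edges pointing in the flow direction and that a node is observable from a sensor if its flow passes through the sensor node. The edge list and discharge values are hypothetical.

```python
from collections import deque


def observable_nodes(edges, sensor):
    """Return the nodes whose discharge passes through `sensor`.

    edges: iterable of (u, v) pairs meaning that flow goes from u to v.
    The sensor node itself is included in the result.
    """
    upstream = {}
    for u, v in edges:
        upstream.setdefault(v, []).append(u)
    seen = {sensor}
    queue = deque([sensor])
    while queue:  # breadth-first search against the flow direction
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def sampled_discharge(edges, discharges, sensor):
    """Sum of the discharges Q_i of all nodes observable from `sensor`."""
    return sum(discharges[n] for n in observable_nodes(edges, sensor))


# Hypothetical network: nodes 1 and 2 drain to 3; nodes 3 and 4 drain to 5.
example_edges = [(1, 3), (2, 3), (3, 5), (4, 5)]
example_Q = {1: 1.0, 2: 2.0, 3: 0.5, 4: 4.0, 5: 0.25}
q_3 = sampled_discharge(example_edges, example_Q, 3)
q_5 = sampled_discharge(example_edges, example_Q, 5)
```

Here the set observed from node 3 is contained in the set observed from node 5, so `q_5 - q_3` equals the combined discharge of nodes 4 and 5, which is the nested case discussed in the text.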
The intersections of these sets segment the network. The measurements that provide information about the nodes of the segments are assigned to these segments. When $x = [x_1, \ldots, x_M]^T$ denotes the vector of the variables $x_k$ that represent the estimated (indirectly measured) sum of the discharges of the nodes of segment $S_k$, i.e. $x_k = \sum_{i \in S_k} Q_i$, the vector of the measurements $q = [q_1, \ldots, q_M]^T$ can be calculated as $q = Bx$, where the matrix $B$ is obtained from the structure of the network and encodes which segments $S_k$ are observable from each measurement.
When the matrix $B$ is designed to provide linearly independent aggregates of the segments, the sum of the discharges of each segment can be estimated as:

$$\hat{x} = B^{-1} q$$

A good measurement placement provides good resolution, which can be evaluated based on the maximum or the average of the $x_k$ values, so the measurement placement problem can be formulated as the minimisation of the maximum or the average of the cumulative discharge of the identified segments:

$$\min_{P} \max_{k} x_k \quad \text{or} \quad \min_{P} \frac{1}{M} \sum_{k=1}^{M} x_k$$

This optimisation problem can be solved by any meta-heuristic optimisation algorithm, such as simulated annealing (Leitold et al., 2020).
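The relation q = Bx and its inversion can be sketched for a small case. The sketch assumes a tree-like network in which the observable sets of any two sensors are either nested or disjoint, so that each sensor defines exactly one segment and B is square. The function names and the input data are hypothetical.

```python
def segments_and_matrix(reach):
    """Build the segments S_k and the 0/1 matrix B from reachability sets.

    reach: {sensor: set of nodes observable from that sensor}.
    Assumes the sets are pairwise nested or disjoint (tree-like network).
    B[m][k] = 1 when segment k is observable from sensor m.
    """
    sensors = sorted(reach)
    segments = {}
    for k in sensors:
        nested = set()
        for m in sensors:
            if m != k and reach[m] < reach[k]:  # strict subset
                nested |= reach[m]
        segments[k] = reach[k] - nested
    B = [[1.0 if reach[k] <= reach[m] else 0.0 for k in sensors]
         for m in sensors]
    return sensors, segments, B


def solve(B, q):
    """Solve Bx = q by Gaussian elimination with partial pivoting."""
    n = len(B)
    A = [row[:] + [rhs] for row, rhs in zip(B, q)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("B is singular: aggregates are not independent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - tail) / A[r][r]
    return x


# Hypothetical case: sensor 3 sees nodes {1, 2, 3}, sensor 5 sees all nodes.
example_reach = {3: {1, 2, 3}, 5: {1, 2, 3, 4, 5}}
sensor_ids, example_segments, example_B = segments_and_matrix(example_reach)
x_hat = solve(example_B, [3.5, 7.75])
```

The largest entry of `x_hat` is the resolution measure that the placement problem seeks to minimise.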
To provide an approximate solution to this NP-hard optimisation problem, a heuristic algorithm has been developed. The proposed method is based on the Zahn clustering algorithm (Zahn, 1971), which iteratively finds the inconsistent edges whose removal leads to the best clustering result. In each step of the iterative procedure the largest inconsistent edge is removed, so the procedure can be viewed as a hierarchical clustering algorithm that follows the divisive approach, as the clusters are recursively divided into subclusters (Vathy-Fogarassy and Abonyi, 2013).
The inconsistency of an edge is measured by the difference between the sums of the discharges that can be sampled from the two nodes it connects. The proposed greedy algorithm finds the edges with the greatest inconsistency and places the sensors at the tail of these edges. The algorithm stops when the desired number of sensors has been added or the desired resolution (measured by the maximal value of the $x$ vector) is reached.
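The greedy idea can be illustrated with a simplified stand-in. The sketch below does not implement the edge-inconsistency measure of the proposed algorithm; instead, at every step it tries each remaining node as a sensor and keeps the one that gives the smallest maximal segment discharge. It assumes a tree-like network described by a downstream map and an outlet that always carries a sensor. All names and values are hypothetical.

```python
def segment_sums(downstream, discharges, sensors):
    """Sum of discharges per segment for a given sensor set.

    A node belongs to the first sensor met when following the flow
    downstream (a sensor node belongs to its own segment).
    """
    sums = {s: 0.0 for s in sensors}
    for node, q in discharges.items():
        current = node
        while current not in sensors:
            current = downstream[current]
        sums[current] += q
    return sums


def greedy_placement(downstream, discharges, outlet, n_sensors):
    """Greedily add sensors so that the largest segment shrinks the most."""
    sensors = {outlet}
    while len(sensors) < n_sensors:
        best_node = None
        best_score = max(segment_sums(downstream, discharges, sensors).values())
        for candidate in discharges:
            if candidate in sensors:
                continue
            trial = segment_sums(downstream, discharges, sensors | {candidate})
            score = max(trial.values())
            if score < best_score:
                best_node, best_score = candidate, score
        if best_node is None:  # no candidate improves the resolution
            break
        sensors.add(best_node)
    return sensors


# Hypothetical tree: B and C drain to A; A and D drain to the outlet W.
example_downstream = {"A": "W", "B": "A", "C": "A", "D": "W"}
example_discharges = {"W": 0.0, "A": 1.0, "B": 4.0, "C": 2.0, "D": 3.0}
two_sensors = greedy_placement(example_downstream, example_discharges, "W", 2)
three_sensors = greedy_placement(example_downstream, example_discharges, "W", 3)
```

A stopping rule based on a target resolution could replace the fixed sensor count by testing `best_score` against a threshold inside the loop.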

Wastewater analysis
The regional wastewater treatment plant of Nagykanizsa and its wastewater collection system were chosen as the case study. The capacity of the WWTP is 72,837 population equivalent; the actual number of people served was 58,856, of whom 48,241 were residents of Nagykanizsa as of 2014 according to the respective government decree (Government of Hungary). Since then, the number of residents has changed slightly (46,049 in 2020; Hungarian Central Statistical Office). A further 17 small villages (with populations between 122 and 1,562) are connected to the wastewater collection system of the WWTP. This setup is quite typical for Hungary.
The digital map of the wastewater collection system was provided, but data on the flow direction or flow rate in the pipes were not available, as a hydraulic model of the wastewater collection system of Nagykanizsa does not exist. Possible sampling points were first defined based on the layout of the sewer system. The water company professionals then specified the manholes where it was possible to take samples safely. Weather forecasts were taken into consideration, and the precipitation registered for each sampling day was collected from the website of the Hungarian Meteorological Service (Országos Meteorológiai Szolgálat). As the daily reports contain the precipitation data from 6 am on the previous day to 6 am on the following day, both values were recorded.
For logistic reasons grab samples were taken, as it would have been difficult to leave an automatic sampler on the side of the road for safety and security reasons. Three liters of wastewater were collected in a temporary vessel with several immersions to ensure some level of randomness in the samples. During the process all health and safety measures were adhered to.
The wastewater was then transferred into three one-liter glass containers that had previously been cleaned and disinfected. Samples were then transported to the laboratory in Nagykanizsa and stored at 4 °C until processing could begin. One container was sent to Veszprém for wastewater characterisation, while the other two were used for qPCR measurements after concentration.
For wastewater characterisation the following components were chosen: chemical oxygen demand (COD), total nitrogen (TN), ammonium-nitrogen (NH4-N), nitrate-nitrogen (NO3-N) and total phosphorus (TP). COD measurements were performed on homogenized raw and filtered samples according to the standard method MSZ ISO 6060:1991 (a literal translation of ISO 6060:1989). The following standards were used for measuring the concentration of the nitrogen forms: MSZ ISO 7150-1:1992 (ammonium-nitrogen) and MSZ 260-11:1971 (nitrate-nitrogen). For TN, Macherey-Nagel tube tests were used. The phosphorus concentration was measured according to the standard MSZ EN 1189:1998.
Total nucleic acid was eluted from the silica particles in 300 μl of diethylpyrocarbonate-treated water (Sigma, St. Louis, MO), then the eluate was centrifuged (10 min at 18,000 × g at 25 °C), and the supernatant was collected in sterile nuclease-free tubes (stored at -80 °C before the molecular analysis).
Quantitative SARS-CoV-2 detection was carried out using a conventional real-time RT-PCR system. We used the Modular SARS-CoV E-gene and Modular SARS-CoV-2 RNA-dependent RNA Polymerase (RdRp) detection kits (Roche, Germany) on the MyGo Pro real-time PCR platform. The resulting quantitation cycle values (Cq) provide information on the prevalence of SARS-CoV-2 in the studied area and time period. Cycling conditions were as follows: 5 min at 55 °C for reverse transcription and 5 min at 95 °C for denaturation, followed by 45 cycles of 95 °C for 5 s, 60 °C for 15 s and 72 °C for 15 s. The fluorescence signal was detected during the annealing step (60 °C). The difference in the quantitation cycle (raw Cq values) was used to compare the concentration of SARS-CoV-2 in the wastewater samples. To confirm assay specificity, representative qPCR amplicons from positive samples were sequenced by the Sanger sequencing method. To assess the sensitivity and efficacy of the complete wastewater sample preparation prior to SARS-CoV-2 RNA detection by RT-qPCR, an inactivated virus stock was employed for calibration. A SARS-CoV-2 strain established in our laboratory (GISAID accession number: EPI_ISL_483637; TCID50 5.6 × 10^6) on Vero E6 kidney cells (ATCC® CRL-1586™) was used for this experiment. Inactivation was performed at 60 °C for 60 min as described previously by Pastorino et al. (2020). Complete inactivation was checked by in vitro inoculation onto a Vero E6 cell monolayer. For calibration, a serial dilution (100 to 0.01 µl l⁻¹) of inactivated virus particles in DMEM (Gibco, Germany) suspension was introduced into raw sewage samples, after which virus concentration and nucleic acid extraction took place. The SARS-CoV-2 concentration of the nucleic acid mixtures was checked by RT-qPCR (as described above).
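How a dilution series and Cq differences can be evaluated is sketched below under common qPCR assumptions: Cq is linear in the logarithm of the template concentration, the amplification efficiency follows from the slope of that line, and a Cq difference translates into a fold difference of (1 + efficiency) raised to that difference. The Cq values are invented for illustration and are not measurement results of this study; only the dilution range mirrors the text.

```python
import math


def fit_standard_curve(log10_conc, cq):
    """Least-squares line cq = slope * log10(conc) + intercept."""
    n = len(cq)
    mean_x = sum(log10_conc) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, cq))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency from the standard-curve slope (1.0 means doubling per cycle)."""
    return 10.0 ** (-1.0 / slope) - 1.0


def fold_difference(cq_a, cq_b, efficiency=1.0):
    """How many times more template sample A holds than sample B."""
    return (1.0 + efficiency) ** (cq_b - cq_a)


# Dilution series as in the text (µl per litre); Cq values are hypothetical.
dilutions_ul_per_l = [100.0, 10.0, 1.0, 0.1, 0.01]
hypothetical_cq = [25.0, 28.3, 31.6, 34.9, 38.2]
log_conc = [math.log10(d) for d in dilutions_ul_per_l]
curve_slope, curve_intercept = fit_standard_curve(log_conc, hypothetical_cq)
curve_efficiency = amplification_efficiency(curve_slope)
```

With perfect doubling per cycle, a sample whose Cq is three cycles lower than another holds about eight times more template, which is why raw Cq differences can serve as a relative concentration measure.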

GIS sewer model
The reliability of the GIS sewer model can be characterized by the overlap of the road network and the real sewer network, which is shown in Figure 2. As can be seen, the road network follows the real sewer network well: the overlap of the two structures is 82%. The orientation of the directed network identified by the shortest path algorithm is in most cases the same as that of the real sewer system.
The accuracy of the population equivalent estimation algorithm was checked for three Hungarian cities, Nagykanizsa, Veszprém and Székesfehérvár, which are small urban areas according to the OECD's grouping (OECD). The data and the estimated PE are summarized in Table 1. The results show that the population equivalents determined on the basis of the floor area of the buildings adequately follow the real wastewater volumes.
It is typical of wastewater treatment plants in Hungary that, besides treating the wastewater of a city, they also serve agglomerations (Government of Hungary) and accept wastewater from nearby industrial facilities. Depending on the industry and its water conservation program, the water consumption relative to the building area might be lower than in residential areas, which might result in a slight overestimation of the population, as can be seen for Székesfehérvár. Nonetheless, if local domestic water consumption data are not available, this method provides sufficiently accurate information on the approximate number of residents in the catchment area.

Wastewater samples
The SARS-CoV-2 RNA concentration in the influent of the wastewater treatment plant of Nagykanizsa has been monitored regularly since October 2020. For this study, five occasions were chosen for illustration (Table 2). To connect the WBE results with direct human testing, the ±7 day cumulative case numbers at the county level (NUTS2) were taken from the governmental information page (https://koronavirus.gov.hu), as the case numbers of Zala county are the closest publicly available approximation for Nagykanizsa. Clearly, with county-level resolution of the case numbers, a causal relationship with the Cq E values of a city within the county cannot be established. Nonetheless, comparing the values to each other can give some insight into the spatial and temporal distribution of the disease if the effect of the environment is ruled out.
Since the city has a combined sewerage system, precipitation affects the Cq E values by diluting the samples. On 13-14 October there was moderate rainfall, which evidently diluted the virus concentration in the samples taken on the 14th, as can be seen from the slightly higher Cq E value despite the visible increase in the prevalence numbers. The number of deaths during this period was 382 in the whole country, and the mortality rate at that time was 3.7% considering the total confirmed cases in Hungary.
Two measurement campaigns were carried out from late October to late November to assess suitable monitoring points. To better understand the quality of the wastewater in the sewerage system, the physicochemical parameters were also measured alongside the SARS-CoV-2 detection (Table 3). In the first campaign six points in Nagykanizsa were selected and sampled on three consecutive days (one of the original points was discarded, but the numbering was kept). Precipitation was registered only for 27th October, but it had a negligible effect (0.7 mm). The samples were analysed in Veszprém for COD, TN and TP. Ammonium was determined with a quick test to check the total nitrogen values, and nitrate was measured to make sure there was no infiltration from nitrate-containing groundwater (data not shown).
In a second campaign, the sub-catchments of the outskirts of Nagykanizsa were monitored for three days along with the influent of the wastewater treatment plant (included in the prevalence analysis). The locations of the first and second measurement campaigns are shown in Fig. 3. The points only represent the wastewater generated within the perimeters of Nagykanizsa.
The WBE sampling points shown in Fig. 3 were selected based on the specification of the sewerage network. The uneven distribution of the sampling points is justified by the confluence and branching characteristics of the wastewater collection system. The identification had a dual purpose: on the one hand to follow the infection of the surrounding smaller settlements, and on the other hand to optimally monitor the infection changes within the city.

Fig. 2. Overlap of the road network (black) and the sewer system (green)
In the first campaign, the lower prevalence (county-level ±7 day cumulative incidence 412-469) led to negative qPCR results in a few cases. Some of the samples (NK01 all, NK02/1, NK06/1) gave very high COD results and contained sediments of plant origin, suggesting that the flow in these sewer pipes was low, allowing the water to stagnate. From this, it became evident that moving the sampling points to another location for safety or other reasons can lead to discrepancies in the sampling strategy.
Regression analysis was carried out on the measurement results (Table 4) to evaluate the relationship between the parameters. The filtered COD results of the second measurement campaign were also included in the evaluation. The strongest correlation was found between the homogeneous COD and TP, while the COD-TN and TP-TN connections are weaker, suggesting that the wastewater is mostly domestic but its quality fluctuates even at the same measuring point. From the available data, a moderate correlation can be observed between the filtered and homogeneous COD values, underlining the heterogeneity of municipal wastewater. The correlation between the Cq E values and the chemical parameters was also calculated. The results show that the connection between the viral particles and the concentrations of the chemical parameters is weak, therefore they cannot be linked to each other.
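The strength of such pairwise relationships is commonly summarised by the Pearson correlation coefficient. The sketch below assumes this measure, which the text does not name explicitly. The COD and TP values are invented placeholders and not the measurements of Table 4.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical concentrations in mg/l, for illustration only.
hypothetical_cod = [400.0, 650.0, 820.0, 1100.0]
hypothetical_tp = [5.0, 8.0, 9.5, 13.0]
r_cod_tp = pearson_r(hypothetical_cod, hypothetical_tp)
```

The same function can be applied to each parameter pair, including the Cq values against a chemical parameter, to populate a correlation table.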
Considering the characteristics of the studied wastewater can be very useful in interpreting the results of wastewater epidemiological studies, as it can be used to infer the nature of wastewater (industrial or residential wastewater). In the case of combined wastewater and rainwater collection systems we can have a better understanding about the level of dilution caused by precipitation or infiltration based on basic chemical parameters together with easily accessible meteorological data. Thus, the wastewater characterisation should be included in the methodologies as a validation of WBE methods.

Sensor placement model
Within Nagykanizsa, a series of RT-qPCR and wastewater analytical measurements were performed at 5 measurement points, the results of which are shown in Table 3. The measurement points were selected based on the developed heuristic algorithm. The methodology divides the settlement into zones by the optimal placement of the measurement points, which is shown in Figure 4 for the sewer network. In Figure 4, the different colors represent the zones identified by the sensors. In this case, 6 sensors are placed, one of which is the wastewater treatment plant itself. The zones and, consequently, the number of inhabitants living there can be analyzed with a freely chosen resolution using the approach we propose, for which expert-based sampling point identification is not suitable. The number of sensors (sampling points) can thus be specified in a way that can be optimized for the number of inhabitants. The algorithm assigns a point to the WWTP in each case because this point provides information about the total infection.

Fig. 3. The identified measuring points in the city of Nagykanizsa based on the optimization algorithm and the national sampling strategy

The sewer network model shown in Fig. 4 includes a total of 3256 nodes connected by 3212 edges. The network diameter is 102 and the average path length is 29.079. The network details, including the list of nodes and edges, are included in the supplementary material.
The extent of the designated sampling points and zones can also be represented in map form based on the sewer network, which helps decision makers to interpret the results. The map overview of the Nagykanizsa case study can be seen in Figure 5.
The sensor locations (potential sampling points) identified by the algorithm are represented by red points, and the locations of the performed sampling are shown in green. The zones delimited by the potential points are represented in gray, red, cyan, magenta, blue, green, and yellow.
The sampling point NK02 completely coincides with point 1022 identified by the algorithm. The sampling point NK05 is used to monitor the gray zone delimited by sensor 885. It is very close to sensor 135, but the inflowing wastewater comes from different areas. Measuring point NK04 provides information about the red area designated by sensor 135, while measuring point NK01 monitors the purple zone, delimited by node 593 according to the algorithm. Data for 1453 (cyan zone) and 892 (blue zone) were not measured directly. The deviations between the proposed sensor locations and the locations of the performed measurements are due to practical reasons: we looked at nearby shafts and assessed whether they could be safely approached and opened for the duration of sampling (traffic and maintenance were the main issues).
From Figure 4, it can be seen that the identified zones are related to each other. We use this feature to improve the spatial resolution of SARS-CoV-2 infection monitoring within the city. The relationships between the potential measurement points (135, 593, 885, 892, 1022, and 1453) marked with red circles in Figure 5 are summarized in Table 5. From Table 5, it can be seen that the measurement result of the zone delimited by sensor 892 also includes the prevalence of the zone delimited by sensor 593, so the results of 593 should be taken into account when determining the involvement of zone 892. All other areas are included in the measurement results of sensor 1022. Thus, by comparing the measurement results, it is possible to identify in which parts of the city the infection is increasing and in which parts it is less pronounced, so that targeted measures can be identified. Measurements taken only at the wastewater treatment plant cannot distinguish between districts, so they are only suitable for monitoring and forecasting the infection level of the entire settlement.
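The comparison of nested zones can be illustrated with a short sketch. It is not the procedure used in this study. It assumes that the quantity compared between sensors is an additive load (for example, concentration multiplied by flow), uses only the two inclusion relations stated in the text (593 lies upstream of 892, and all sensors drain to 1022), assumes no further nesting among the remaining sensors, and uses made-up load values in arbitrary units.

```python
def direct_upstream(contains):
    """Keep, for each sensor, only its nearest upstream sensors."""
    direct = {}
    for sensor, upstream in contains.items():
        # Sensors already covered by another upstream sensor are not direct.
        nested = set()
        for u in upstream:
            nested |= contains.get(u, set())
        direct[sensor] = upstream - nested
    return direct


def zone_loads(measured, contains):
    """Load attributable to each sensor's own zone (measured minus inflow)."""
    direct = direct_upstream(contains)
    return {
        s: measured[s] - sum(measured[u] for u in direct.get(s, set()))
        for s in measured
    }


# Inclusion relations: key = sensor, value = sensors lying upstream of it.
# Only 892 > 593 and 1022 > all others come from the text; the rest is assumed.
contains = {
    1022: {135, 593, 885, 892, 1453},
    892: {593},
    135: set(), 593: set(), 885: set(), 1453: set(),
}
# Hypothetical loads in arbitrary units, not measured data.
measured = {593: 10, 892: 25, 135: 5, 885: 0, 1453: 8, 1022: 50}
own = zone_loads(measured, contains)
```

In this illustration the load attributable to zone 892 alone is 15 units, because 10 of the 25 units measured at sensor 892 already passed sensor 593.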

Discussion
Monitoring and predicting SARS-CoV-2 infection patterns in urban areas are key tasks for which different methods were developed. There are several methods in the literature identifying infectious clusters from confirmed cases (Lak et al., 2021;Li et al., 2021;Liu et al., 2021a;Sannigrahi et al., 2020) which could serve as a base for further actions, but the dynamics of the infection require robust prediction solutions. Cities equipped with smart surveillance systems can take advantage of deep learning-based real-time object detection (Shorfuzzaman et al., 2021) not just to monitor social distancing but also to identify individuals with symptoms. A big data analysis framework on social network services was suggested by Azzaoui et al. (2021) and predicted an outbreak seven days before an upsurge of confirmed cases occurred. Others used mobile phone information of infected individuals to backtrack their movement and thus possible infectious sites (Ghayvat et al., 2021) or a hybrid machine learning and beetle antennae search approach on existing data of confirmed cases to predict later incidences (Zivkovic et al., 2021). While these methods can be of great help in smart cities or more developed regions, some of them raise privacy issues. Testing of genetic markers in wastewater is an effective tool (e.g. Bivins et al. (2020)) that does not pry into the lives of individuals and also does not rely on previously identified cases or the availability of mobile or social network data.
Fig. 4. The identified communities in the sewer network - the different numbers represent the specific communities identified in the network, based on which the city zones can be delineated.
Though its advantages are without question, sampling is often carried out at the wastewater treatment plant, thus producing information for the entirety of its service area, including settlements in the agglomeration besides the town where the plant is situated. It follows that no information is available on the distribution of the infection within the municipality, which could be seen as a drawback. On the other hand, cities and metropolises may be served by more than one facility of various capacities. For example, New York has 14 plants serving in the range of 90,474-1,068,012 residents each (https://www1.nyc.gov/site/dep/water/wastewater-treatment-plants.page), and Budapest, the capital of Hungary, is served by three plants (PE: 426,279, 1,034,475 and 1,085,061), two of which have catchment areas on both sides of the Danube and all of which accept wastewater from neighbouring towns. For the larger plants, it would be difficult to identify the hot-spots of infection within the catchment area of the wastewater treatment plant. This would result in a slower reaction time due to the dilution effect from the non-infected regions, and in stricter than necessary restrictions in the areas that are not affected by the outbreak.
The application of our novel wastewater epidemiological methodology was demonstrated through the example of Nagykanizsa city, where seven different zones were demarcated with six potential measurement points. Sampling was carried out in four of these zones with six points for qPCR measurements. Wastewater analytical measurements were also performed on the samples to check the characteristics of the wastewater. This is particularly important for cities with integrated wastewater and rainwater collection systems, as the dilution effect caused by precipitation must be taken into account, as was seen in the sample taken on 14.10.2020. The dilution and the residential nature of the wastewater can be identified in the qPCR analysis if the number of genetic markers is increased, for example by using an internal positive control (Wurtzer et al., 2020) or by involving more genetic markers (Trottier et al., 2020) in the tests for each sample. There may be several reasons why this could be difficult to achieve in an epidemic. Fortunately, the information can be inferred from analysing the chemical parameters of the wastewater that are routinely measured in the laboratories of the water companies.
Expert opinion in itself (without the correct information on the people served by a sewerage section) cannot weigh the role of a node as a sampling point in the overall network and achieve equal distribution of the sampling points. Our methodological approach takes precisely this feature into account. Thus, the coordination of the actual sampling site based on the modelled sewage network is a particularly important task, as the selection of nodes close to the potential sampling point (even neighbouring nodes) identified by the algorithm can result in significantly different spatial coverage. In other words, single-point offset in sampling can lead to drastically different results, for the elimination of which we recommend using the general wastewater characteristic measurements.
In the presented case study, the sampling point NK01 is located close to the identified node 593 delimiting a separate zone in the wastewater collection system; however, the wastewater characteristic measurements confirm that the sample was taken from a stagnant part of the sewage system that collects the wastewater from only a small area. NK04 collects wastewater from a larger area, including the section NK06 covers. The quality of the wastewater is similar, except for the first day when some additional sediment got into the sampling vessels. On the other hand, NK06 provided positive results for all three days, while NK04 gave higher Cq values in two instances and came back negative for the second day. Samples were deemed positive if two parallel measurements were positive (Cq E <40). This suggests that the wastewater generated after NK06 had a lower number of viral copies, indicating lower prevalence in that region.
Real-time hydraulic modelling of sewer systems increases the efficiency of WBE methods, as a very accurate picture of wastewater flows can be obtained. Sewage volume flowing in the sewer network can be estimated with high accuracy from a large number of sensors in water distribution systems (WDS) or based on smart metering (Boyle et al., 2013) from intelligent water management systems (Nguyen et al., 2018). Bottom-up approaches facilitate the development of demand profiles based on smart meters, which allows for the acquisition of comprehensive water use data sets (Gurung et al., 2016). A more accurate description of the real-time hydraulics of sewer networks and the exact identification of wastewater discharges result in more efficient early warning systems in WBE applications. Nonetheless, this information can be obtained even when these options are not available. Nourinejad et al. (2020) developed a Bayesian probability-based sensor placement algorithm where ideal placement is identified based on the modelled manholes, which is suitable for identifying the locations of infected communities. The approach we propose identifies the source of wastewater on the basis of buildings, the reliability of which we have demonstrated for three cities. Remote-sensing-based characterization of urban ecosystems is a commonly used method (Heiden et al., 2012), and our results showed that there is a substantial overlap between the map of the wastewater collection system and the road network. The use of publicly available maps and satellite images can help differentiate building types to improve the estimation if necessary.
By increasing the number of measurement points, the settlement can be divided into several smaller zones, based on which the temporal development of the infection can be monitored more accurately. Figure 6 shows the zones of the settlement in the sewer network in the case of 24 sampling points. Figure 6 shows two things: on the one hand, the size of the designated zones becomes smaller, and on the other hand, the number of elements of the intersection table between the identified zones increases from the previous 36 (6⋅6) (Table 5) to 576 (24⋅24). If a larger number of optimally designated sampling points is used, the origin of the infection within the city can be identified more accurately.
The number of sampling points depends mainly on measures to prevent the spread of infection and the scale of the analysed settlement. If the location of particularly vulnerable institutions, such as schools, hospitals or nursing homes, is known, the zones delimited by the sampling points should follow the location of the objects.
A sensitivity analysis of the zoning available in the territorial coverage was also prepared for the presented settlement, which is illustrated in Figure 7, expressed as a percentage of the nodes of the whole settlement size.
It can be seen in Figure 7 that the areas of the zones delimited by the measurement points are not evenly distributed. Thus, the number of proposed sampling points is decisively influenced by the size of the settlement and the number of inhabitants.
As the methodology we propose can also be used to identify neighborhoods (zones), the interventions can be implemented in a more cost-effective way (Randazzo et al., 2020). The timeliness of our method is justified by the fact that SARS-CoV-2 RNA not only can consistently be detected in the sewage of those cities where the infection occurs, but viral RNA concentrations also increase rapidly after the rise in the number of declared cases in an area (Randazzo et al., 2020). Fluctuations in population-normalized SARS-CoV-2 concentrations in the sewage can follow the epidemiological trends at a detectable level (Gonzalez et al., 2020).
The importance of having detailed knowledge of the wastewater collection network can easily be understood if we consider that the viral RNA detection power is estimated to be at least 1 case out of 388-822 inhabitants according to Baldovin et al. (2021). Considering the given properties of wastewater monitoring and SARS-CoV-2 infection, such as the occasional lack of fecal shedding of confirmed patients (Xu et al., 2020), (Chen et al., 2020b), the varying shedding rate (Zhang et al., 2020a), (Wölfel et al., 2020), the difference in daily water consumption (Hart and Halden, 2020) etc., the theoretical range of feasible detection extends from 1 infected person in 100 to 1 in 2,000,000 people (Hart and Halden, 2020), (Baldovin et al., 2021), (Hata et al., 2020). Based on the above-listed factors, the overall sensitivity of the detection can influence the selection of sampling points, i.e., how the inhabitants are divided among the zones, though population density, spatial distribution, and high-risk facilities should be the leading considerations. If our knowledge of the sewage system is rough, the relatively high sensitivity cannot be realized in the planning of mitigation actions, hence the wide range of feasible detection mentioned above. While the failure to monitor with sufficient sensitivity can result in significant social and economic expenditures, the cost of wastewater epidemiology was valued at only 0.005-0.10 USD per capita by Weidhaas et al. (2021).
Another important outcome of this study is that the presented method provides an adequate tool for the good practice of the selection of sampling points. This is crucial both for cost-effectiveness and for the rapid evaluation of results. Because the half-life of the virus in wastewater is a critical point in the evaluation of the gained viral RNA concentration data (Nghiem et al., 2020), the exact identification of the sampling points is crucial as it provides the option to parameterise the degradation rate of viral RNA in the models.
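How a known sampling location allows degradation to be accounted for can be sketched as follows. The example assumes first-order (exponential) decay of the viral RNA signal during in-sewer transport, which is a common simplification and not a result of this study. The function name is hypothetical, and the half-life and travel time values are placeholders rather than measured data.

```python
def back_calculate_concentration(c_measured, travel_time_h, half_life_h):
    """Estimate the concentration at the source from the sampled value.

    Assumes first-order decay: C_measured = C_source * 2 ** (-t / t_half).
    """
    if half_life_h <= 0:
        raise ValueError("half_life_h must be positive")
    if travel_time_h < 0:
        raise ValueError("travel_time_h must be non-negative")
    return c_measured * 2 ** (travel_time_h / half_life_h)


# Placeholder values: a sample that travelled for one half-life before
# reaching the sampling point has lost half of its original signal.
c_source = back_calculate_concentration(
    c_measured=100.0, travel_time_h=10.0, half_life_h=10.0
)
```

The travel time in such a correction depends on the position of the sampling point in the network, which is why an exactly identified sampling point is a prerequisite for parameterising the degradation rate.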

Conclusions
It can be concluded that the developed method 1) is capable of modelling the sewage system using publicly available data on urban infrastructure, 2) can determine the optimal number and location of sampling points, creating cost-efficient surveillance conditions, and 3) is applicable in practice, as demonstrated in the small city of Nagykanizsa with 58,856 inhabitants, where six sampling points were identified based on the results of the optimization algorithm. The developed heuristic modelling approach can be used in countries regardless of their gross national income per capita and of whether the structure of the wastewater network is available at a detailed level. It is important to point out that the developed method allows the designation of sampling points even in cases when the sewer network is unknown or difficult to access. This method significantly facilitates the designation of appropriate sampling points, decreases costs without reducing the representativeness of the measurements and, last but not least, allows measurements to be made in areas where wastewater network documentation does not exist or is incomplete. Advanced monitoring activities based on the wastewater collection system contribute to the UN goals of sustainable cities and communities (SDG11), good health and well-being (SDG3), industry, innovation and infrastructure (SDG9) and clean water and sanitation (SDG6), because sustainability is a strongly interconnected complex system. The applicability of the presented methodology in medium-sized and large cities requires further investigation, and the exploration of the similarities and differences in the network structures would be an exciting contribution to wastewater-based epidemiology.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.