Long-term SARS-CoV-2 surveillance in wastewater and estimation of COVID-19 cases: An application of wastewater-based epidemiology


 The role of wastewater-based epidemiology (WBE), a powerful tool to complement clinical surveillance, has increased as many grassroots-level facilities, such as municipalities and cities, are actively involved in wastewater monitoring, and the clinical testing of coronavirus disease 2019 (COVID-19) is downscaled widely. This study aimed to conduct long-term wastewater surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Yamanashi Prefecture, Japan, using one-step reverse transcription-quantitative polymerase chain reaction (RT-qPCR) assay and estimate COVID-19 cases using a cubic regression model that is simple to implement. Influent wastewater samples (n = 132) from a wastewater treatment plant were collected normally once weekly between September 2020 and January 2022 and twice weekly between February and August 2022. Viruses in wastewater samples (40 mL) were concentrated by the polyethylene glycol precipitation method, followed by RNA extraction and RT-qPCR. The K-6-fold cross-validation method was used to select the appropriate data type (SARS-CoV-2 RNA concentration and COVID-19 cases) suitable for the final model run. SARS-CoV-2 RNA was successfully detected in 67 % (88 of 132) of the samples tested during the whole surveillance period, 37 % (24 of 65) and 96 % (64 of 67) of the samples collected before and during 2022, respectively, with concentrations ranging from 3.5 to 6.3 log10 copies/L. This study applied a nonnormalized SARS-CoV-2 RNA concentration and nonstandardized data for running the final 14-day (1 to 14 days) offset models to estimate weekly average COVID-19 cases. Comparing the parameters used for a model evaluation, the best model showed that COVID-19 cases lagged 3 days behind the SARS-CoV-2 RNA concentration in wastewater samples during the Omicron variant phase (year 2022). 
Finally, 3- and 7-day offset models successfully predicted the trend of COVID-19 cases from September 2022 until February 2023, indicating the applicability of WBE as an early warning tool.


• Weekly average cases lagged 3 to 7 days behind virus concentration in wastewater.

Introduction
Wastewater-based epidemiology (WBE) is a novel approach for quantifying chemical and biological markers in wastewater and further using this information to calculate several public health estimates (Sims and Kasprzyk-Hordern, 2020). WBE links the environmental surveillance of wastewater and the community's health information (Daughton and Jones-Lepp, 2001). In the past, wastewater surveillance was conducted to detect polioviruses and identify high-risk areas under the Global Polio Eradication Initiative of the World Health Organization (WHO) (Asghar et al., 2014). Similar investigations have been carried out to provide early warning of hepatitis A virus and norovirus outbreaks (Hellmér et al., 2014). With the onset of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, wastewater surveillance sped up, and several studies matched the viral RNA concentration in wastewater with the reported coronavirus disease 2019 (COVID-19) cases (Haramoto et al., 2020; Sherchan et al., 2020).
Gradually, numerous WBE studies applied correlation analysis to establish the relationship between SARS-CoV-2 RNA concentration in wastewater and COVID-19 prevalence (Medema et al., 2020) and morbidity (Bar-Or et al., 2021) and to show that viral RNA detection preceded clinical test results by 2 to 4 days (Nemudryi et al., 2020). Other studies used time-series models to identify the relationship of the viral RNA signal in wastewater with local hospitalization rates (Peccia et al., 2020) or to report a 48-h lead time of the viral RNA signal in wastewater over new COVID-19 cases in the community (D'Aoust et al., 2021). Community COVID-19 prevalence has also been estimated (Ahmed et al., 2020; Saththasivam et al., 2021) using mass-balance approaches. Numerous studies have used data modeling concepts to interpret wastewater monitoring results, understand COVID-19 epidemiological dynamics, and forecast several epidemiological parameters. A mathematical model capable of predicting epidemiological parameters, such as incidence and prevalence, from the wastewater signal is valuable in pandemic preparedness and management (Aberi et al., 2021). Such models include regression models incorporating time-series correlation structure through an autoregressive model (Stadler et al., 2020), multivariate time-series models, such as the vector autoregression model (Cao and Francis, 2021), the multiple linear regression model and adaptive neuro-fuzzy inference system model (Li et al., 2021), the artificial neural network (ANN) model (Li et al., 2021; Jiang et al., 2022; Zhu et al., 2022), the deep neural network (DNN) (Ai et al., 2022), the generalized additive model (GAM), support vector regression, and the multilayer perceptron model (Aberi et al., 2021).
Data-driven models or artificial intelligence techniques, such as ANN and DNN, have huge potential for achieving the WBE back-estimation of COVID-19 cases despite the current limited knowledge and many uncertainties (Jiang et al., 2022). However, considerable complexity is involved in using these intricate models to translate the wastewater viral RNA signal into epidemiological parameters. Because WBE is a simple and complementary approach to clinical surveillance for mitigating the COVID-19 pandemic and future outbreaks (Hart and Halden, 2020), using complex models that require advanced mathematical expertise may hinder their acceptance as a complementary tool. Simple models, such as simple regression models, have accurately predicted COVID-19 cases (Vallejo et al., 2022), and the polynomial regression model has outperformed complex models like GAM (Aberi et al., 2021). The linear regression model is considered better than other models because of its simplicity and low prediction error (Vallejo et al., 2022). In addition, the inclusion of too many variables was identified as a contributing factor to poor performance (Vaughan et al., 2022).
Moreover, the number of new weekly reported cases has either declined or become stable across all WHO regions, largely because countries are changing their COVID-19 strategies, resulting in less testing and consequently fewer detected cases (WHO, 2022). For example, Japan has decided to scale down the reporting of COVID-19 cases to ease the burden on health facilities. Instead, the Japanese Government launched grassroots-level pilot projects of wastewater surveillance involving municipalities, nursing homes, and other facilities to track COVID-19, commencing in July 2022 and ending in January 2023.
Nationwide wastewater surveillance has been adopted by >50 countries, mainly the United States (CDC, 2021) and European countries (European Commission, 2021). Hence, simple and easily applicable models are more appropriate for grassroots-level facilities that use wastewater surveillance data to prepare public health estimates.
Long-term wastewater surveillance is more reliable than short-term monitoring for forecasting COVID-19 cases (Cao and Francis, 2021). In addition, high sampling frequency and large training sets for modeling contribute to the models' accuracy (Vaughan et al., 2022). With this background, this study aimed to conduct long-term surveillance of SARS-CoV-2 in wastewater in Yamanashi Prefecture, Japan, and develop a simplistic model for forecasting COVID-19 cases in the community. Furthermore, this study investigated the comparative predictive ability of models using nonnormalized and fecal marker-normalized SARS-CoV-2 RNA concentration data. In addition, this study revealed the complications of including different waves of COVID-19 in a single model.

Collection of wastewater samples
In total, 132 grab influent wastewater samples were collected normally once weekly from September 2020 to January 2022 and twice weekly from February to August 2022 from a wastewater treatment plant (WWTP) in Yamanashi Prefecture. Wastewater samples were collected in a sterile 1 L plastic bottle, immediately transported to the laboratory in a cool bag containing ice packs, and processed within 2 to 4 h of arrival in the laboratory. The concentrated samples were kept at −25 °C before RNA extraction.

Enumeration of Escherichia coli
A culture-based method using CHROMagar ECC (Kanto Chemical, Tokyo, Japan) was applied to enumerate the E. coli concentration in wastewater samples according to the manufacturer's instructions. Petri plates were incubated at 37 °C for 24 h, the number of blue colonies was counted, and the E. coli concentration was expressed in colony-forming units (CFU)/mL.

Virus concentration, RNA extraction, and reverse transcription
Upon arrival at the laboratory, wastewater samples were subjected to the polyethylene glycol precipitation method for virus concentration, as described previously (Malla et al., 2022). Briefly, in a tube containing a 40 mL wastewater sample, 4.0 g polyethylene glycol 8000 (Sigma-Aldrich, St. Louis, MO, USA) and 0.94 g NaCl (Kanto Chemical) were added and mixed for 10 min at room temperature, followed by a centrifugation step at 12,000 ×g for 99 min at 4 °C. Subsequently, the supernatant was discarded, and a ~5 mL sample was left for further centrifugation at 12,000 ×g for 5 min at 4 °C to remove the supernatant completely. Finally, the pellet was recovered in 800 μL polymerase chain reaction (PCR)-grade water (Sigma-Aldrich) or autoclaved MilliQ water (Merck, Rahway, NJ, USA) to obtain a virus concentrate, and the volume was recorded.

Sample process control
One microliter of a mixture of F-specific RNA coliphage MS2 (ATCC 15597-B1; American Type Culture Collection, Manassas, VA, USA) and Pseudomonas bacteriophage Φ6 (NBRC 105899; National Institute of Technology and Evaluation, Tokyo, Japan) was added to 140 μL virus concentrate and PCR-grade water [i.e., a noninhibitory control (NIC) sample] as molecular process controls (MPCs), as recommended previously (Haramoto et al., 2018). However, the extraction-reverse transcription-quantitative PCR (RT-qPCR) efficiency was calculated only for Φ6 phage, a surrogate of enveloped viruses. Viral RNA was extracted from a 140 μL concentrated sample using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany) in a QIAcube platform (Qiagen) according to the manufacturer's instructions to obtain 60 μL RNA extract. Reverse transcription was performed in a TaKaRa PCR Thermal Cycler Dice Touch (Takara Bio, Kusatsu, Japan) only for the samples collected until September 2021 to analyze pepper mild mottle virus (PMMoV) (Zhang et al., 2006; Haramoto et al., 2013) and Φ6 phage assays (Gendron et al., 2010) using the High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific, Waltham, MA, USA) to obtain 20 μL cDNA from 10 μL viral RNA, according to the manufacturer's protocol.
The concentration ratio of cDNA in a sample qPCR tube to that in an NIC tube was used to calculate the extraction-RT-qPCR efficiency.
The calculated extraction-RT-qPCR efficiency of the MPC (37.8 ± 50.2 %; n = 103) indicated that there was no substantial viral RNA loss and/or inhibition in the water samples during RNA extraction and RT-qPCR.
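The efficiency calculation described above can be sketched as a simple ratio. This is a minimal illustration, assuming the efficiency is expressed as a percentage of the NIC signal; the function name and the example Φ6 concentrations are hypothetical and are not data from this study.

```python
def extraction_rt_qpcr_efficiency(sample_copies, nic_copies):
    """Percent recovery of the process control: sample signal relative to the NIC signal."""
    if nic_copies <= 0:
        raise ValueError("NIC concentration must be positive")
    return 100.0 * sample_copies / nic_copies


# Hypothetical Φ6 phage concentrations (copies/reaction) in a sample tube and an NIC tube.
efficiency = extraction_rt_qpcr_efficiency(3.0e4, 8.0e4)
print(efficiency)  # 37.5
```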

Quantification of viral genomes by RT-qPCR
One-step RT-qPCR was performed for all wastewater samples for the SARS-CoV-2 assay. In contrast, for the PMMoV and Φ6 phage assays, two-step RT-qPCR was performed for the samples collected until September 2021 and one-step RT-qPCR for the samples collected from October 2021 onward. A commercial one-step RT-qPCR kit, SARS-CoV-2 Detection RT-qPCR Kit for Wastewater (Takara Bio), was used for the detection of SARS-CoV-2, PMMoV, and Φ6 phage, in which CDC-N1N2, PMMoV, and Φ6 phage assays were labeled with Cy5, FAM, and HEX probes, respectively, as mentioned previously (Angga et al., 2023). Briefly, each 25.0 μL one-step RT-qPCR mixture consisted of 12.5 μL One-Step RT-qPCR Mix (2×), 2.5 μL primers and probe of SARS-CoV-2, or of PMMoV and Φ6 phage combined in a single tube (10×), 5.0 μL RNase-free water, and 5.0 μL viral RNA extract. In the two-step RT-qPCR, each 25.0 μL qPCR mixture consisted of 12.5 μL Probe qPCR Mix with UNG (2×; Takara Bio), 0.1 μL each of forward and reverse primers (100 μM), 0.05 μL probe (100 μM), 9.75 μL PCR-grade water, and 2.5 μL cDNA template. qPCR runs were performed in a Thermal Cycler Dice Real Time System III (Takara Bio). Thermal conditions for one-step RT-qPCR were as follows: 25 °C for 10 min, 52 °C for 5 min, 95 °C for 10 s, and 45 cycles at 95 °C for 5 s and 60 °C for 30 s. Thermal conditions for two-step RT-qPCR were as follows: 25 °C for 10 min, 95 °C for 30 s, followed by 45 cycles of denaturation at 95 °C for 5 s and primer annealing and extension reaction at 60 °C for 60 s (PMMoV) or at 60 °C for 30 s (Φ6 phage).
Six serial 10-fold dilutions of gBlocks (Integrated DNA Technologies, Coralville, IA, USA) or Positive Control DNA (concentration from 5.0 × 10^0 to 5.0 × 10^5 copies/reaction) provided in the SARS-CoV-2 Detection RT-qPCR Kit for Wastewater were included in each RT-qPCR run to obtain a standard curve. A negative control was included in every qPCR run, and the samples and positive and negative controls were run in duplicate. Threshold cycle (Ct) values above 40 were considered negative for SARS-CoV-2 and Φ6 phage assays, whereas the cutoff Ct value was set at 35 for the PMMoV assay.
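The quantification step can be illustrated with a short sketch: a least-squares standard curve converts a Ct value to copies/reaction, and the processing volumes reported above (40 mL sample, about 800 μL concentrate, 140 μL extracted, 60 μL RNA extract, 5.0 μL template) scale this to copies/L. The back-calculation formula, the function names, and the standard-curve Ct values are assumptions made for illustration; the text does not state the exact conversion used, and no recovery correction is applied here.

```python
import math


def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_per_reaction(ct, slope, intercept):
    """Invert the standard curve for one sample Ct value."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_litre(per_reaction, template_ul=5.0, extract_ul=60.0,
                     extracted_ul=140.0, concentrate_ul=800.0, sample_ml=40.0):
    """Scale copies/reaction back to copies/L of wastewater (assumed conversion)."""
    in_extract = per_reaction * extract_ul / template_ul
    in_concentrate = in_extract * concentrate_ul / extracted_ul
    return in_concentrate / (sample_ml / 1000.0)


# Six 10-fold standards from 5.0 x 10^0 to 5.0 x 10^5 copies/reaction.
standards = [math.log10(5.0 * 10 ** k) for k in range(6)]
# Hypothetical Ct values lying exactly on a line with slope -3.32.
cts = [38.0 - 3.32 * s for s in standards]

slope, intercept = fit_standard_curve(standards, cts)
sample = copies_per_litre(copies_per_reaction(30.0, slope, intercept))
print(round(math.log10(sample), 2), "log10 copies/L")
```

Because the concentrate volume was recorded for each sample, `concentrate_ul` would normally be set per sample rather than left at the nominal 800 μL.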

Modeling procedure

Data preparation
Cubic regression (Eq. (1)), the third-degree polynomial regression, was used to describe the relationship between SARS-CoV-2 RNA signal in wastewater and clinically reported COVID-19 cases:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon \qquad (1)$$

where y is the dependent variable (COVID-19 cases), x is the independent variable (viral RNA concentration), $\beta_0$ to $\beta_3$ are the regression coefficients, and $\varepsilon$ is random error.
A standardized modeling procedure was applied for implementing and analyzing the cubic regression model, as described in the flowchart (Fig. 1). Several studies have normalized raw SARS-CoV-2 RNA concentration data with E. coli or PMMoV to adjust for wastewater flow and fecal load (D'Aoust et al., 2021; Wu et al., 2020). Modeling studies standardize their variables to bring them onto the same scale (Aberi et al., 2021). Both data sets were standardized in this study, as shown in the flowchart (Fig. 1). Data preprocessing and preparation are important to improve the model's performance and accuracy (Aberi et al., 2021). In this study, final data for modeling were selected between nonnormalized (raw) and normalized viral RNA concentration data and between standardized and nonstandardized viral RNA concentration and COVID-19 cases data using a K-fold cross-validation approach. Data between September 16, 2020, and August 31, 2022 (n = 132) were randomly divided into six groups (K-6-fold). One group at a time was held out as a "test" set, whereas the remaining five groups were combined to make a "training" set, on which a model was calibrated. The calibrated model was then validated on the "test" set. This process was iterated six times. The evaluation parameters of the models were the Pearson correlation coefficient, R, and the root mean square error (RMSE) between the actual and predicted values. RMSE and R averages were calculated from the six models. The K-6-fold cross-validation approach was applied separately for normalized and nonnormalized data and for standardized and nonstandardized data. The data type with the lowest average RMSE and higher R was selected to run the final model on the whole data.
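The cubic regression and six-fold cross-validation procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the data are synthetic (a known cubic plus Gaussian noise), the solver uses the normal equations with Gaussian elimination, and the function names and the random seed are hypothetical. The study itself reports using Microsoft Excel for modeling.

```python
import random


def fit_cubic(x, y):
    """Least-squares coefficients [b0, b1, b2, b3] of y = b0 + b1*x + b2*x^2 + b3*x^3."""
    # Normal equations (X^T X) b = X^T y for the design columns 1, x, x^2, x^3.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(4)] for r in range(4)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(4)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 4):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * 4
    for r in range(3, -1, -1):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, 4))
        coef[r] = (b[r] - tail) / a[r][r]
    return coef


def predict(coef, x):
    return [coef[0] + coef[1] * xi + coef[2] * xi ** 2 + coef[3] * xi ** 3 for xi in x]


def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


def pearson_r(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def k_fold_cv(x, y, k=6, seed=0):
    """Average RMSE and R over k held-out folds of a random split."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        coef = fit_cubic([x[i] for i in train], [y[i] for i in train])
        observed = [y[i] for i in test]
        predicted = predict(coef, [x[i] for i in test])
        scores.append((rmse(observed, predicted), pearson_r(observed, predicted)))
    return sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k


# Synthetic data: 132 "samples" spanning 3.5 to 6.3 log10 copies/L (not study data).
rng = random.Random(1)
x = [3.5 + 2.8 * i / 131 for i in range(132)]
y = [2 + xi - 0.5 * xi ** 2 + 0.3 * xi ** 3 + rng.gauss(0, 1) for xi in x]
mean_rmse, mean_r = k_fold_cv(x, y)
print(round(mean_rmse, 2), round(mean_r, 3))
```

Running the same function on each candidate data type (normalized or not, standardized or not) and comparing the averaged RMSE and R would mirror the selection step described above.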

Offset models
Several studies reported that clinical COVID-19 cases lag the viral RNA signal in wastewater (Medema et al., 2020; D'Aoust et al., 2021; Li et al., 2021). Thus, this study evaluated the prediction performance of 14 offset models in which the relationship between viral RNA signal in wastewater and reported clinical cases was offset by 1 to 14 days, with clinical cases lagging viral RNA signals. These models were run using the whole data set (from September 16, 2020 to August 31, 2022). The prediction performance of the models was based on RMSE and R between actual and predicted values. The model with the lowest RMSE was selected as the best model. For the final models, a threshold value of R was set at 0.7. The best model and the other better-performing models were selected for predicting COVID-19 cases from September 2022 until February 2023 as an application of the developed models in real situations.
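The pairing behind the 14 offset models can be sketched as follows: each wastewater sampling day is matched with the case count reported a fixed number of days later. This is a minimal sketch of the alignment step only (the regression fit and its evaluation by RMSE and R would then be run on each paired data set); the function name and the toy data are hypothetical.

```python
from datetime import date, timedelta


def offset_pairs(concentration_by_day, cases_by_day, offset_days):
    """Pair each sampling day's concentration with the case count offset_days later."""
    pairs = []
    for day, conc in sorted(concentration_by_day.items()):
        target = day + timedelta(days=offset_days)
        if target in cases_by_day:  # skip samples whose lagged date has no case record
            pairs.append((conc, cases_by_day[target]))
    return pairs


# Hypothetical toy data: daily cases and five sampling days (not study data).
start = date(2022, 2, 1)
cases = {start + timedelta(days=i): 10 * i for i in range(30)}
conc = {start + timedelta(days=i): 4.0 + 0.05 * i for i in (0, 3, 7, 10, 14)}

# One paired data set per offset, 1 to 14 days, with cases lagging the RNA signal.
datasets = {d: offset_pairs(conc, cases, d) for d in range(1, 15)}
print(datasets[3][0])  # (4.0, 30)
```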

Statistical analysis
The independent t-test was used to compare the SARS-CoV-2 RNA concentration between influent samples collected from September 2020 to December 2021 and those from January to August 2022. Microsoft Office Excel 2019 (Microsoft Corporation, Redmond, WA, USA) was used for statistical analysis, with a significance level of 0.05, and for modeling.

Data preparation and selection of data set type for the final model run
This study used cubic regression to predict COVID-19 cases from wastewater viral RNA concentration data. The models used in this study were offset models (1-14 days) with clinical cases lagging viral RNA signals. The best data types of SARS-CoV-2 RNA concentration and COVID-19 cases for the final model were identified by the K-fold cross-validation approach. SARS-CoV-2 RNA concentration data types considered for the cross-validation were (a) nonnormalized (raw), (b) E. coli-normalized, and (c) PMMoV-normalized SARS-CoV-2 RNA concentrations, which were further categorized as (i) standardized and (ii) nonstandardized. Likewise, COVID-19 case data types were (a) standardized and (b) nonstandardized. The data type with the lowest average RMSE and high R was selected.
Table 1 summarizes the results of the K-6-fold cross-validation method. The RMSE and R values were the average values of six models for each data type. The average RMSE values of the offset models using nonnormalized SARS-CoV-2 RNA concentration data were lower than those of the corresponding offset models using E. coli- and PMMoV-normalized viral RNA concentration data for standardized and nonstandardized forms. The average R values for all models using standardized and nonstandardized data types were equal, indicating equal performance regardless of the standardization of data sets. However, data interpretation would be much easier when using a nonstandardized data set. Hence, for the final offset models, selected data types were nonstandardized and nonnormalized SARS-CoV-2 RNA concentration and nonstandardized daily COVID-19 cases.

Fig. 2. Long-term monitoring of SARS-CoV-2 RNA in a WWTP in Yamanashi Prefecture from September 2020 to August 2022.

Offset models comparison
Using nonnormalized SARS-CoV-2 RNA concentration and daily COVID-19 cases data types, the final 14 offset models (with clinical cases lagging viral RNA signals) were run using data for the whole surveillance period. Table 2 shows the heat map of the model evaluation parameters, RMSE and R, of the 14 offset models. Among the models, the 4-day offset model had the lowest RMSE, and R was 0.75. The model showed that the wastewater viral RNA signal led the COVID-19 cases by 4 days; thus, SARS-CoV-2 RNA concentration in wastewater can predict the cases 4 days ahead.
The surge in COVID-19 cases started in the prefecture at the beginning of 2022, coinciding with the Omicron variant's spread in Japan. When considering year 2022 as the Omicron variant phase, the mean daily COVID-19 cases during the Omicron variant phase (January-August 2022) was 85.5 ± 95.8 cases (n = 243) and that before the Omicron variant phase (September 2020-December 2021) was 3.1 ± 5.4 cases (n = 472). Considering the significant difference in the average cases per day between the two phases (P < 0.05, independent t-test), it was decided to run separate offset models for these two phases. Interestingly, among the 14 offset models, the 4-day offset model had the lowest RMSE before the Omicron variant phase (RMSE = 3.88, R = 0.59) and during the Omicron variant phase (RMSE = 62.2, R = 0.64). Because the R between actual and predicted COVID-19 cases was quite low for the best models for both phases, the predictive performances of these models were expected to be low.
Reporting of COVID-19 clinical cases suffers from a weekly rhythm (weekend effect) in many countries, with underreporting on some days of the week and overreporting on others (Alvarez et al., 2020). Such administrative noise can mislead trend estimation (Alvarez et al., 2020). Therefore, to adjust for the weekly rhythm, COVID-19 cases were averaged weekly, and, to match them, the weekly average SARS-CoV-2 RNA concentration was calculated for the sample collection days. Then, weekly average data were used for the 14 offset models before and during the Omicron variant phases. Following the changes in the modeling approach, 1- and 2-day offset models had the lowest RMSE (both 3.75) and highest R (0.64) values before the Omicron variant phase. In contrast, during the Omicron variant phase, the 3-day offset model had the lowest RMSE value (53.4) among the 14 offset models, and R was 0.82. The correlation between actual and predicted COVID-19 cases for both phases improved when weekly average data were used instead of daily data. However, the predictive capability of the best models before the Omicron variant phase was expected to be poor because of the low R value (0.64). Hence, this study focused only on models during the Omicron variant phase. Based on the 3-day offset model (the best model), the wastewater viral RNA signal led the weekly average COVID-19 cases by 3 days; thus, the weekly average SARS-CoV-2 RNA concentration in wastewater can predict weekly average cases 3 days ahead.
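The effect of weekly averaging on a reporting rhythm can be illustrated with a short sketch. It assumes a trailing 7-day window ending on the day of interest; the exact window definition used in the study is not stated here, and the function name and the toy case series are hypothetical.

```python
def trailing_mean(series, end_index, window=7):
    """Mean of the `window` values ending at end_index (fewer at the start of the series)."""
    start = max(0, end_index - window + 1)
    chunk = series[start:end_index + 1]
    return sum(chunk) / len(chunk)


# Hypothetical daily cases with a weekend dip followed by a catch-up day (not study data).
week = [100, 100, 100, 100, 100, 40, 160]
daily_cases = week * 4

# Daily values swing between 40 and 160, whereas the 7-day mean stays flat.
weekly_average = [trailing_mean(daily_cases, i) for i in range(6, len(daily_cases))]
print(min(daily_cases), max(daily_cases), set(weekly_average))  # 40 160 {100.0}
```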
The RMSE (R) values of 3-, 4-, 5-, 6-, and 7-day offset models using weekly average COVID-19 cases and weekly average SARS-CoV-2 RNA .2(0.76), and 60.0 (0.74), respectively. Although the 3-day offset model outperformed the other offset models, the R values of the 4- to 7-day offset models were higher than the threshold value of 0.7, with slightly higher prediction errors than the best model. Fig. 3 shows the smoothed curves of actual and predicted weekly average COVID-19 cases for the 3- and 7-day offset models, obtained with the exponential smoothing method. The predicted curves in both models showed good correspondence with the actual curves in the figure.
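Simple exponential smoothing, the kind of smoothing applied to the curves in Fig. 3, can be sketched as below. The smoothing constant used in the study is not reported here, so `alpha = 0.3` is a hypothetical default, as are the function name and the example values.

```python
def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]  # initialise with the first observation
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly average case counts (not study data).
weekly_cases = [20, 35, 80, 150, 120, 90, 60]
print([round(v, 1) for v in exponential_smoothing(weekly_cases)])
```

A smaller `alpha` gives a smoother curve that reacts more slowly to changes, so the choice affects how closely the plotted curves track sudden surges.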

Application of selected models for predicting COVID-19 cases from September 2022 until February 2023
The 3- and 7-day offset models were selected to predict the unseen weekly average COVID-19 cases from September 2022 until February 2023. As shown in Fig. 4, although the observed and predicted COVID-19 cases matched only in a few instances, both models demonstrated their capability to predict the trend of weekly average cases for the whole 6-month period. Both offset models predicted the decreasing trend of weekly average COVID-19 cases from the beginning of September 2022 until mid-October 2022. Then, the predicted and observed cases increased until the middle of November 2022, after which both curves flattened until the middle of January 2023 before decreasing until the end of February 2023. This study showed that the SARS-CoV-2 RNA concentration in wastewater alone can predict the trend of weekly average COVID-19 cases 3 to 7 days ahead.

Discussion
In this study, a WWTP in Yamanashi Prefecture was monitored for the presence of SARS-CoV-2 RNA from September 16, 2020, to August 31, 2022, and a simplistic model for predicting COVID-19 cases was developed. For the whole surveillance period, the detection ratio of SARS-CoV-2 RNA was 67 %. The detection ratio was 37 % until the end of 2021, but it was 96 % for the remaining 67 samples collected in 2022. This increase in the detection ratio during 2022 reflects the significantly higher COVID-19 cases between January and August 2022 than between September 2020 and December 2021 in the WWTP catchment area. Japan reported the first case of the highly transmissible Omicron variant in late November 2021 (Maruki et al., 2022). However, in Yamanashi Prefecture, the Omicron variant spread rapidly from January 2022 (Hirotsu et al., 2022). Hence, the higher SARS-CoV-2 RNA detection ratio during 2022 could be because of the spread of the Omicron variant in the prefecture.
This study applied the K-fold cross-validation method for data preparation for the final model. Data preparation included selection between nonnormalized and PMMoV- and E. coli-normalized SARS-CoV-2 RNA concentrations and between standardized and nonstandardized data. PMMoV is commonly used as a biomarker to normalize SARS-CoV-2 RNA concentration to account for variations in wastewater flow and fecal matter amount (D'Aoust et al., 2021; Wu et al., 2020). However, in this study, the performances were higher for models using nonnormalized SARS-CoV-2 RNA concentration than for models using PMMoV- and E. coli-normalized SARS-CoV-2 RNA concentration (Table 1). Previous studies reported that normalizing SARS-CoV-2 RNA concentration by PMMoV reduced or did not improve the correlation with COVID-19 cases (Feng et al., 2021; Vadde et al., 2022; Zheng et al., 2022; Schill et al., 2023). In line with the reduced predictive ability of the PMMoV- and E. coli-normalized data in the present study, using normalized data neither enhanced the models (Ai et al., 2021) nor improved the correlation efficiency and model fitting (Ando et al., 2023) in previous studies. The possible reason reported was higher variation in samples than in the fecal loads (Graham et al., 2021). Some argued for the effectiveness of normalization in long-term monitoring (Amoah et al., 2022). Others believed that normalization of SARS-CoV-2 RNA concentration with fecal markers is unnecessary for WWTP influent samples (Feng et al., 2021).
Modeling studies usually standardize the variables to put different variables on the same scale (Sharma and Singh, 2019; Aberi et al., 2021). In this study, correlation coefficients between actual and predicted cases from all tested models were equal for standardized and nonstandardized data (Table 1). Hence, considering the easy interpretation of the estimates, we selected nonstandardized data for running the final models. Finally, of the different data types tested, nonnormalized and nonstandardized data were selected for the final model run.
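Equal R values for standardized and nonstandardized data are what one would expect from a least-squares polynomial fit. The sketch below assumes z-score standardization with positive standard deviations $s_x$ and $s_y$ (an assumption; the exact standardization formula is not stated here) and is offered as a plausible explanation rather than as the authors' own argument.

```latex
% z-score standardization is an affine map of each variable
x' = \frac{x - \bar{x}}{s_x}, \qquad y' = \frac{y - \bar{y}}{s_y}

% a cubic in x' is still a cubic in x, so least squares selects the same fitted curve,
% expressed on the standardized scale of y
\hat{y}'(x') = \frac{\hat{y}(x) - \bar{y}}{s_y}

% Pearson's R is unchanged by positive affine rescaling of either argument
R\left(y', \hat{y}'\right) = R\left(y, \hat{y}\right)
```

RMSE, by contrast, is expressed in the units of y and is therefore divided by $s_y$ after standardization, so RMSE values are comparable only within the same data scale.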
Among the 14 offset models prepared using data from the whole wastewater surveillance period, the 4-day offset model was the best when evaluated with RMSE and R between actual and predicted COVID-19 cases. However, considering the significant difference in daily COVID-19 cases before and during the Omicron variant phases, two sets of offset models were prepared for the two phases. Unfortunately, correlation coefficients were poor for the best models of both phases. After obtaining these results, the weekly rhythm of reporting COVID-19 cases was accounted for using the weekly average of COVID-19 cases. RMSE and R values of the 14 offset models of both phases improved afterward, indicating that the weekly average of COVID-19 cases could be a good epidemiological parameter for modeling studies.
The 3-day offset model performed best among the 14 offset models during the Omicron variant phase, with RMSE and R of 53.4 and 0.82, respectively. Before the Omicron variant phase, the best models were the 1- and 2-day offset models with R of only 0.64; hence, these models were expected to have poor predictive ability. This difference in model performances between the two phases could be attributed to significantly higher weekly average COVID-19 cases during the Omicron variant phase (82 cases) than before the Omicron variant phase (3 cases). The performance of the 3-day offset model during the Omicron variant phase was higher (R = 0.82) than that of the best model prepared using whole surveillance data (R = 0.75), indicating that the predictive performance of a model could be compromised if two different COVID-19 waves are included in a single model, as the relationship between wastewater and cases varies with the emergence of new variants (Frampton et al., 2021). In addition, results before the Omicron variant phase could indicate that estimations from SARS-CoV-2 RNA concentration could be less reliable at low disease prevalence. In such cases, detection frequency or positive rate instead of viral RNA concentration data could be used to model relevant epidemiological metrics (Zhu et al., 2022).
The weekly average SARS-CoV-2 RNA concentration data during the Omicron variant phase fitted best with the 3-day offset model. Thus, this study observed a 3-day lead time of wastewater virus concentration over the weekly average COVID-19 cases. The 3-day offset model and other models (4-, 5-, 6-, and 7-day offset models) had considerably high R, and the selected 3- and 7-day offset models successfully predicted the trend of COVID-19 cases from September 2022 until February 2023 (Fig. 4). The forecasting result of this study, a 3-day lead time in viral RNA concentration, is in line with the results of studies that forecasted new cases on the 2nd or 4th day or 2 to 7 days ahead of clinical reporting (Aberi et al., 2021; Jiang et al., 2022; Li et al., 2021). A 3-day lead time in wastewater SARS-CoV-2 RNA concentration and successful prediction of the trend of COVID-19 cases 3 to 7 days ahead illustrated the potential of wastewater surveillance as an early warning tool.
Wastewater surveillance is a cost-effective tool for mass screening, as the wastewater sample is a pooled sample representing the corresponding community (Daughton and Jones-Lepp, 2001; Hart and Halden, 2020; Kitajima et al., 2020; Shrestha et al., 2021). Results of this study provided evidence that, in combination with simplistic predictive models, wastewater surveillance can provide timely estimates of COVID-19 cases in the catchment. In this study, a sharp decline in observed weekly average COVID-19 cases from the middle of January 2023 (Fig. 4) coincided with the announcement by the government of Japan on January 20, 2023, of the downgrading of COVID-19's status from an infectious disease to the same level as the seasonal flu (NHK, 2023). However, the SARS-CoV-2 RNA concentration in wastewater, and hence the predicted weekly average cases, did not decline distinctly. This result provided clear evidence that WBE can be an informative addition to clinical disease surveillance (Morvan et al., 2022), especially when clinical surveillance is limited for reasons such as low capacity and downscaled testing strategies.
Currently, SARS-CoV-2 variants with changed phenotypic properties (Davies et al., 2021) are continuously emerging, but pandemic mitigation measures and restrictions have been relaxed, including reduced testing (WHO, 2022). In such a scenario, wastewater surveillance will remain an important data stream for public health surveillance, complementing clinical surveillance (Morvan et al., 2022). Considering the popularity and benefit of tracking the COVID-19 pandemic through wastewater monitoring, the institutionalization of wastewater surveillance as a municipal and prefectural testing strategy is highly probable in the near future. Given this prospect, the current requirement is a simplistic model that can provide epidemiological estimates using wastewater data and that can be implemented by grassroots-level facilities with simple mathematical knowledge. Therefore, this study made a new contribution by implementing a simplistic model that predicted the trend of COVID-19 cases in the catchment area and demonstrated that WBE can serve as an effective early warning tool.

Conclusions
• SARS-CoV-2 RNA was detected in 37 % of the 65 wastewater samples collected from September 2020 to December 2021, with concentrations ranging from 3.5 to 5.0 log10 copies/L. In contrast, SARS-CoV-2 RNA was detected in 96 % of the 67 samples collected from January to August 2022, with concentrations ranging from 3.6 to 6.3 log10 copies/L.
• The predictive performance of the models using nonnormalized SARS-CoV-2 RNA concentration data was higher than that of the models using PMMoV- and E. coli-normalized viral RNA concentration, as assessed by the K-6-fold cross-validation approach.
• The predictive performance of the models was better when the data were divided at the onset of the Omicron variant phase than when the whole surveillance period was used as one set, revealing the complications of including different waves of COVID-19 in a single model.
• A simplistic model using cubic regression predicted that the weekly average COVID-19 cases lagged 3 days behind the SARS-CoV-2 RNA concentration in wastewater.

Fig. 3. Predicted and actual curves of weekly average COVID-19 cases by the (a) 3-day and (b) 7-day offset models using data from January to August 2022 and smoothed by the exponential smoothing method.

• Simplistic prediction model was useful for predicting the trend of COVID-19 cases.

Table 2
Heat map of evaluation parameters of the different final offset models from the whole data from September 16, 2020 to August 31, 2022.

Table 1
Average RMSE and correlation coefficient (R) between observed and predicted COVID-19 cases from the K-6-fold cross-validation method.