Relating SARS-CoV-2 shedding rate in wastewater to daily positive tests data: A consistent model based approach

During the COVID-19 pandemic, wastewater-based epidemiology (WBE) has been engaged to complement medical surveillance and in some cases to also act as an early diagnosis indicator of viral spreading in the community. Most efforts worldwide by the scientific community and commercial companies focus on the formulation of protocols for SARS-CoV-2 analysis in wastewater and approaches addressing the quantitative relationship between WBE and medical surveillance are lacking. In the present study, a mathematical model is developed which uses as input the number of daily positive medical tests together with the highly non-linear shedding rate curve of individuals to estimate the evolution of global virus shedding rate in wastewater along calendar days. A comprehensive parametric study by the model using as input actual medical surveillance and WBE data for the city of Thessaloniki (~700,000 inhabitants, North Greece) during the outbreak of November 2020 reveals the conditions under which WBE can be used as an early warning tool for predicting pandemic outbreaks. It is shown that early warning capacity is different along the days of an outbreak and depends strongly on the number of days apart between the day of maximum shedding rate of infected individuals in their disease cycle and the day of their medical testing. The present data indicate for Thessaloniki an average early warning capacity of around 2 days. Moreover, the data imply that there exists a proportion between unreported cases (asymptomatic persons with mild symptoms that do not seek medical advice) and reported cases. The proportion increases with the number of reported cases. The early detection capacity of WBE improves substantially in the presence of an increasing number of unreported cases. For Thessaloniki at the peak of the pandemic in mid-November 2020, the number of unreported cases reached a maximum around 4 times the number of reported cases.


H I G H L I G H T S
• Model estimates viral load evolution in wastewater from infected people dynamics. • Identifying actual conditions for which WBE can be used as an early warning tool. • Early warning capacity increases with an increasing number of unreported cases. • In Thessaloniki November 2020 outbreak,~2 days of WBE early warning is suggested. • In Thessaloniki Nov.'20, WBE suggests unreported cases up to 4 times reported ones.

G R A P H I C A L A B S T R A C T
a b s t r a c t a r t i c l e i n f o

Introduction
In the COVID-19 pandemic, wastewater-based epidemiology (WBE) has become an important tool, supplementing public health surveillance, in the hands of scientists, medical experts, and state officials to evaluate the spread of SARS-CoV-2 in the community (la Rosa et al., 2020;Medema et al., 2020b;Wurtzer et al., 2020). This was more so after the Centers for Disease Control and Prevention (CDC) adopted wastewater disease surveillance as a valid monitoring tool and provided guidance and recommendations for the selection and application of testing methods (CDC, 2021). The latter cover issues from sample collection and sample processing to RNA measurement and use of laboratory controls for the estimation of the performance of the applied methods and of data quality. Laboratory controls deal with difficulties arising from the chemical and biological variability of wastewater quality across different places and over time, e.g. due to season, weather, human activities etc., and also cope with problems of proper RNA extraction and quantification, elimination of inhibition and contamination of reagents.
The so-called matrix recovery controls suggested by CDC refer exclusively to the amount of virus lost during sample processing. However, it is well known that viruses strongly adsorb, and so get inaccessible ("lost"), in the pores of solid particles suspended in wastewater in sewerage networks (Sellaoui et al., 2020;Ye, 2018). Therefore, if WBE studies aim to quantify the virus shedding rate at the source (households) from analysis of samples taken at the entrance of wastewater treatment plants (WTP) then it is of paramount importance to appraise the amount of virus lost in sewerage networks. Determining the extent of recovery, as suggested by CDC, by spiking techniques in samples obtained at the entrance of WTP is incapable of describing the amount of virus lost in sewerage networks. This is because such samples have traveled in sewerage pipes for hours and so have (most of) their adsorption sites saturated with adsorbed species. Desorption of spiked material would have been completely different if a sample had been taken fresh at a shedding spot, i.e., from the sanitary plumbing system of a building.
Recently, a mathematical model has been proposed to account for virus loss in sewerage networks by means of adsorption of virus on porous particles suspended in wastewater (Petala et al., 2021). To do so, the model rationalizes the SARS-CoV-2 concentration in samples taken at the entrance of WTPs with respect to certain quality characteristics of wastewater samples. Rationalization is based on rigorous physicochemical phenomena in adsorption (and not on a statistical dependence of virus concentration on wastewater parameters), also including the effect of large-scale topological complexity of actual sewage networks. At the examined period (April to June 2020), the rationalized decreasing global shedding rate was in agreement with the observed clinical conditions, contrary to the non-rationalized data which showed a different picture. Yet, even with rationalized data it is still difficult to make the critical step ahead and associate the virus shedding rate with the number of cases.
Τhere are several reasons for the discrepancy between viral shedding rate in wastewater and number of infected people (cases) reported by public health surveillance. Limitations in the capacity of medical testing, strict criteria for the application of medical testing and parts of the population being reluctant to seek medical care are responsible for underdiagnosis of cases, mainly asymptomatic cases or patients with mild symptoms who are not tested. These factors lead to reporting a lower number of cases than the actual one. On the other hand, poor recovery in the collected wastewater samples and virus loss in sewage networks, along with ineffectiveness to properly rationalize for the latter, lead to a smaller measured concentration of virus in the collected samples and, thus, to a lower estimated viral shedding rate than the actual one.
The time delay between the onset of patients' viral shedding in infected people and symptoms onset that prompts patients being tested, is another reason for discrepancy. Several reports in literature indicate a time delay up to 8 days between wastewater signal and infected cases reported in public health surveillance system (e.g. (D'Aoust et al., 2021;Nemudryi et al., 2020;Peccia et al., 2020)). This is in line with the reported incubation period of around 6 days from exposure/infection to onset of symptoms of COVID-19 Lauer et al., 2020;Li et al., 2020). Therefore, the discrepancy between the onset of viral shedding and the onset of symptoms is actually an advantage that might explain the early detection capacity of WBE.
Another important reason of discrepancy between daily reports of wastewater viral titers and daily reports of new confirmed cases is the cumulative character of viral shedding rate in wastewater during the disease. Several clinical studies Wu et al., 2020;Zheng et al., 2020) showed that viral shedding is not uniform during the disease, but there is a maximum individual's shedding rate in the very acute phase of infection, probably even before respiratory symptoms appear, being followed by an exponential decline in subsequent days. If one considers that shedding lasts at least 3-4 weeks after the inception of symptoms Zheng et al., 2020), it is apparent that daily wastewater viral titers correspond more to the cumulative shedding of infected people rather than to the shedding of the daily new cases. It must be stressed here that the evidence from clinical studies is limited, and only for samples collected from hospitalized patients, thus after symptoms onset, so one should be extremely careful in estimating individual's shedding rates at the first days right after infection.
If one aims to associate wastewater surveillance data with public health surveillance data, the different sources of bias between these data should be also taken into account. Laboratory practice shows that variation in wastewater data is considerably high from one day to the next because of the many parameters that are involved, e.g., individual's shedding rate (orders of magnitude variation among infected people ; sampling protocol, sample concentration techniques, RNA extraction methodology, normalization with respect to population indicators and flow rate, inhibition assessment in RNA recovery, RNA quantification, etc. On the other hand, public health surveillance is susceptible to a systematic bias because of limitations in the capacity of medical testing along with the unpredictable human behavior under stressful conditions. In principle, one should be concerned about the reliability of both wastewater viral measurements and clinical testing data. The goal of this work is to setup a methodology for the consistent comparison between the number of infected people reported by clinical surveillance and SARS-CoV-2 RNA concentration in wastewater samples. To do so, a model is developed for the estimation of the evolution of global virus shedding rate in a sewage system based on the daily positive medical tests. Apart from corrections based on the laboratory controls suggested by CDC, global virus shedding rate data are further rationalized with respect to physicochemical parameters of wastewater to account for virus loss by adsorption to sewage solids (Petala et al., 2021). To increase accuracy, clinical testing data are based on the date of specimen collection and not on the reporting date. The model is first developed in a generalized continuous time regime and then it is transformed to its discrete counterpart dictated by the daily basis data.

Sampling
Wastewater samples were collected at the exit of Thessaloniki's (a city at North Greece) main sewerage pipe, right before the entrance at the wastewater treatment plant of the city, as described elsewhere (Petala et al., 2021). This plant serves an estimated population of about 700,000 people. The present work reports 24-h composite samples (during a 24-h period, 100 mL were collected every 1 h, while the final volume of each sample transferred to the lab was 1 L out of 2.4 L) taken three times per week (Monday-Wednesday-Friday) from October 5th, 2020 until January 6th, 2021. This period includes the second wave burst of COVID-19 for Thessaloniki in November 2020 which was heavy and led to severe restrictions. Notably, the stringent index was above 80 for more than 5 months (Oxford, 2021). Typical range of values of the physicochemical parameters of wastewater samples for the examined period is displayed in Table 1 (supplementary materials). These parameters were employed in the rationalization of the measured viral concentration with respect to quality characteristics of wastewater according to Petala et al. (Petala et al., 2021). In this study, the model developed by Petala et al. (2021) was employed to calculate the relative global shedding rate, R exp /R expo .

Virus concentration
Upon wastewater collection, 200 mL of each wastewater sample was centrifuged at 4000 ×g for 30 min to remove particles and pH of the supernatant was adjusted to 4 using a solution of 2.0 M HCl. An aliquot of 40 mL supernatant was then passed through an electronegatively charged surface of 0.45 μm-Ø47 mm cellulose nitrate HA membrane ((HAWP04700; Merck Millipore Ltd., Tullagreen, Ireland). Filtration was performed using a magnetic funnel mounted on a glass filtration flask (Pall Corporation) Filtration step was conducted in triplicate and membranes were stored in 15 mL falcon tubes for further processing of RNA extraction and virus quantification.
SARS-CoV-2 quantified RNA (1 × 10 6.3 genomic copies) from human clinical samples was added to a subset of concentrates to estimate the recovery efficiency and reproducibility of the RNA extraction procedure. Similarly, heat inactivated SARS-CoV-2 from human clinical samples was spiked (1 × 10 7 viral particles) in a subset of sewage samples to assess the recovery and reproducibility of the virus concentration procedure.
Each RNA extract was subjected to real-time RT-PCR for SARS-CoV-2 quantification in triplicates. The EQ SARS-CoV-2 RT-qPCR kit (Product No. RN011; EnzyQuest, Heraklion, Crete, Greece) was used, based on two primer/probe sets: the N2 set from CDC that targets the nucleocapsid (N) gene and the set targeting the genomic region that encodes the E protein . The assays were performed on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA, USA). Reactions were considered positive if the cycle threshold was below 40 cycles. Calibration curves were generated using the synthetic single-stranded RNA standard "EURM-019" (Joint Research Centre, European Commission). The possible presence of RT-PCR inhibitors in each RNA extract was assessed in duplicates using spiked EURM-019 included at~1000 copies in additional RT-PCR reactions. Inhibition was expressed as % reduction in reported copy number, compared to the sum of spiked EURM-019 copies and the mean value of measured SARS-CoV-2 genomic copies in the non-spiked RNA. SARS-CoV-2 viral load in each sewage sample was expressed as mean ± standard deviation genome copies per liter, after correcting for RT-PCR inhibition if present, and recovery efficiencies (virus concentration and RNA extraction). Precision of each individual sewage sample quantification was assessed using the coefficient of variation (CV) of the estimated SARS-CoV-2 viral load of the three electronegative membranes processed. We set our precision threshold at 35% CV for each sewage sample measurement.

Epidemiological data
Daily numbers of COVID-19 infected people reported in the city of Thessaloniki, corresponding to actual specimen collection date, were obtained from the National Public Health Organization (National Public Health Organization, 2020). These data reflect residents found positive when tested in public and private laboratories for the area of Thessaloniki and topographically match the geographical area of sewage collection system of the city. Data are presented for the period between September 1st, 2020 and January 6th, 2021 (original data are provided in supplementary materials).

Problem formulation
A model is developed for estimating the evolution of global virus shedding rate to sewage system, that is the theoretically expected shedding rate in sewage, based on the official (announced by the state) results of daily positive tests (cases), registered by the date of specimen collection and not by the date of public reporting. There is typically a delay of few days between specimen collection and public reporting.
The model is first developed in a generalized continuous time regime and then it is transformed to its discrete counterpart dictated by the daily basis data.
Let us denote as f(t) (t: calendar day) the evolution of positive test counts density with f(t)dt being the number of positive tests in the time period between t and t + dt. The first step is to transform the function f(t) to the function F(t) which denotes the total number of reported infected people at time t. It is essential to stress here that F(t) includes people that at time t either had already a positive test or they are in the first days after infection, so do not have symptoms, but still they shed virus and will be tested positive later after their onset of symptoms. The effect of infected but unreported people, such as asymptomatic ones and those not tested because of mild symptoms, is dealt with later.
In order to proceed let us consider the course of the disease of an infected person: Infection starts (disease incidence) at day τ = 1, detection occurs (specimen collection) at day τ = τ d and end of viral shedding occurs at day τ e . The number of days for detection and end of shedding are in general not the same among cases but they show a dispersion which can be described by the respective probability density functions P d (τ d ) and P e (τ e ). It is understood that both functions take non-zero values only in restricted domains of their arguments. The limits of these domains are denoted as τ d1 , τ d2 and τ e1 , τ e2 , respectively. The functions P d and P e satisfy the conditions: where the subscript "av" denotes the average value of the corresponding variable. Taking into account the above definitions, it can be shown that the functions F(t) and f(t) can be related to each other as: This relation is actually the mathematical expression of the statement that infected persons at time t include those already detected and those to be detected in the next days. In principle, a multiplication of this number by the average shedding rate per person would give the required global shedding rate evolution function R(t). Such averaging over all infected individuals would have been sufficient for data reduction purposes only if the shedding rate of individuals during their disease cycle had a constant value. However, as already explained, this is not true since there is a strong variation in virus concentration in stool not only among infected individuals but also across the days of their disease cycle.
Let us denote this variable function of the daily shedding rate per person as S(τ), (τ: day of the disease onset). It has been shown that not only S is not a constant, but on the contrary, it is a strong function of its argument τ. In order to incorporate the effect of function S(τ) to the global shedding rate R(t), knowledge of the distribution of disease days among the infected population is required. The corresponding function is denoted as F*(τ,t) and represents the density function of the distribution of disease days, τ, at time t. It is fruitful to decompose the function F*(τ,t) into the product of the total number of infected people and the probability density function of the days of infection as F*(τ, t) = F(t)g(t,τ) where g(t,τ) is a probability density function with respect to τ and satisfies the condition: It can be shown that the function g can be derived as where U is a step function taking the values 0 for negative and 1 for positive argument. The domain of definition of the above expression is from t = τ e2 − τ d1 to t = T − τ d2 in the case of f(t) known from t = 1 to t = T.

Parametrization of the function S(τ)
In order to proceed, one should know the function of virus shedding rate in stool per person and per day of the disease. A first question is how the onset of infection (i.e., time τ = 1) is defined. There are several possibilities for this definition but in the present context the most relevant one is that τ = 1 day is the first day of non-zero shedding of virus in the stool of a person.
There are only a few available clinical studies in literature (e.g. Wölfel et al. (Wölfel et al., 2020), Huang et al. (Huang et al., 2020); Tan et al. ) referring to the kinetics of SARS-CoV-2 individual's shedding rate in stool during the course of disease (from τ = 1 to τ e ). To our best knowledge, from those studies only the work of Wölfel et al. (Wölfel et al., 2020) reports actual viral concentrations whereas all other studies refer to Ct values from molecular analysis. It must be mentioned that these studies present data exclusively from hospitalized patients, thus from people presenting moderate to severe symptoms. Apparently, it is not easy to collect stool samples from infected persons prior the onset of symptoms, and as a result, asymptomatic or mild cases are not registered in clinical studies. A first common observation in these studies is that virus concentration in stool among infected persons varies by several orders of magnitude. Therefore, here an average shedding function per person will be employed. Interestingly, many past studies assumed in addition a uniform with respect to time individual's shedding rate despite the orders of magnitude variability across the course of the disease (Ahmed et al., 2020;Gonzalez et al., 2020;Medema et al., 2020a;Saththasivam et al., 2021). This assumption simplifies the algebra of the problem enormously but it is incorrect.
A thorough fitting procedure of the virus titers data in stool of Wölfel et al. (Wölfel et al., 2020), led to a peculiar two-steps model (Miura et al., 2021): a first step with zero viral shedding, where virus load simply accumulates in infected hosts up to a maximum concentration reached at the day of symptoms onset, and a subsequent second step characterized by viral shedding at an exponentially decreasing concentration over the days of the disease. This fitting yields a Gamma function which degenerates to an exponential one. Evidently, it is quite arbitrary to assume that there is an initial viral accumulation period up to a maximum concentration without any shedding at all.
Here, an even more general parameterization is introduced. The average over the infected persons function is represented as a product of the following factors: (1) the average stool amount produced by a person per day, A (g stool /day), (2) the maximum in time (averaged over infected individuals) virus concentration in stool, B (g virus /g stool ) and (3) a time distribution function s(τ) of this concentration. This distribution is assumed to consist of an exponential increase from a minimum initial value up to a maximum value being followed by an exponential decrease down to a minimum final value. These minimum initial and final values of the distribution are assumed to be 1% of the maximum value. As a result, the whole distribution spans shedding rates over two orders of magnitude. This assumption may be changed to any other option, e.g., to 1‰ (three orders of magnitude span), but the approach remains the same. However, a two orders of magnitude span represents adequately most of the shedding of infected individuals during a typical course of the disease, so it is adopted herein. A new parameter is introduced, τ a , which denotes the value of τ days at which the maximum virus concentration in stool, i.e. maximum individual's shedding rate, occurs. Summarizing, the above the function S(τ;τ a ,τ e ) is given as: where s(τ;τ a ,τ e ) is: The number 4.6 (since exp.(-4.6) = 0.01) is used to ensure that the individual's shedding rate is 1% of the maximum at the start and at the end of the disease. The shape of the function s(τ) for several pairs of (τ a ,τ e ) -(6,32), (12,32), (9,25) -is shown in Fig. 1. The particular parametrization of the function S proposed in the present work is more realistic than the Miura et al. (Miura et al., 2021) model, as it incorporates viral shedding even in the early days of the disease and before the peak maximum (peak) viral concentration in stool is reached. Such early shedding in stool is in line with clinical studies reporting patients with shedding in oropharyngeal swabs  and gastrointestinal symptoms (Siegel et al., 2020) a few days before the appearance of respiratory symptoms. Nevertheless, one should keep in mind that respiratory shedding is only a proxy for fecal shedding. In addition, the proposed function S is very attractive because there is one to one correspondence between the parameters and major features of the function. The height (amplitude) of S is determined by the product AB, its time length is determined by the parameter τ e and its skewness by the parameter τ a . In particular, the closer τ a is to τ e /2 the smaller the skewness is. It must be stressed that the proposed parametrization of S does not require explicit information on the duration of the incubation period after infection nor on the day of symptoms onset. What actually matters in the context of the present analysis is the day of the maximum individual's shedding rate as this is described by the parameter τ a . Some clinical studies indicated that this maximum might occurs at the end of the incubation period which is taken also as the day of symptom onset (Huang et al., 2020;Miura et al., 2021;Tan et al., 2020). However, in the present formulation this day is flexible and can be anytime from the moment of infection to the end of the shedding period (i.e., the active days of the disease).
The final expression for the global shedding rate is taken by integrating the individual's shedding rate over the days of the disease.
The development up to now is rather complex and includes several unknown probability density functions. In the absence of any information about them, it is convenient to consider them as Dirac delta functions. In this way, we assign to each distribution its average value. This approach not only simplifies considerably the mathematical problem but it also allows -through sensitivity analysis-to assess bounds on the effect of using a different distribution than the Dirac delta function. This is based on the principle that any secondary feature of a distribution has much smaller effect on the result than its average value. By considering the relations P d (τ d ) = δ(τ d − τ d,av ) and P e (τ e ) = δ(τ e − τ e,av ) (where δ denotes the Dirac delta function) and substituting them in Eqs.
(2), (4), (7) leads after some algebra to (the subscript "av" is dropped in the following for clarity): The above set of equations is discretized to be compatible with the present data. These data refer to daily values of virus shedding rate so a finite volume discretization is followed here. Let us denote as f i the number of positive medical tests at day i (measurement period i = 1 to N), F i is the total number of infected people (that is daily cases) at day i (with the specimen of their positive test collected at day i), R i the global shedding rate at day i and g i,j the probability of being at the j-th day of the disease at the calendar day i. The governing equations take the form: for i > m and i < N-nwhere S j is the shedding rate per person at the j-th day of the disease (m = τ e − τ d , n = τ d ).

Results
Fig. 2 displays medical surveillance data reported for the city of Thessaloniki (~700,000 inhabitants) (National Public Health Organization, 2020). More specifically, it presents the daily number of infected people versus the date of their specimen collection for medical testing. The date of specimen collection is back-dated by 1 to 4 days (median of 3 days) from the date of reporting by the Hellenic National Public Health Organization (National Public Health Organization, 2020). The difference between the two characteristic dates is small but the epidemiological data adjusted to the date of specimen collection are more appropriate to compare with wastewater measurements because they are better associated with the date of infection and the date of symptoms onset. Nevertheless, if early warning by wastewater measurements is the stake then comparisons should be made with medical surveillance data as announced, i.e., based on the date of reporting. Delays in the announcement of sewage surveillance results or intermittent sewage sampling and testing can limit the early warning capacity of wastewater. Other factors like sampling methodology and daily population movement can affect the results of sewage surveillance but are not discussed in detail in this study.
The presented data cover the period from September, 1st 2020 to January 6th, 2021. A fitting curve is fitted over the raw data to smooth out the noise. Most of the parametric study that follows is performed with respect to this curve, to skip the noise. Following the loose atmosphere of summer 2020 (stringency index in the country around 50) (Oxford, 2021), in September and October there was no strict quarantine in the city (stringency index in the country between 50 and 60) (Oxford, 2021) and only a rule for social distancing was active along with a rule for limited number of people in confined places, like stores and restaurants. In September and the first 10 days of October the daily reported cases were always well below 20 and often even below 10. At around mid-October 2020 the number of infected cases started to escalate exponentially. In just two weeks, from mid until the end of October, the number of cases increased about ten-fold. This dramatic rise was consistent with the about 2.7 days doubling time of the epidemic reported for European countries in the absence of control measures (SET-C Steering Committee, 2020). A strict lock-down was imposed on the city on November 2nd meant to suppress the outburst of the disease (stringency index between 78.7 and 84.26) (Oxford, 2021). Between Nov 3rd and Nov 17th the number of infected cases fluctuated approximately from 800 to 300 cases between weekdays and weekends. Similar fluctuations were noticed also in other studies (Peccia et al., 2020). On December 12th only few of the strict measures were released; the stringency index was maintained at levels above 80 (Oxford, 2021), e.g., retail stores opened to serve people but only at their entrance door and only through pre-scheduled appointments. As a result, after mid-November the number of infected cases gradually went down until it became pretty stable at below 100 daily cases in the first week of 2021. Fig. 3 presents the experimentally determined relative global shedding rate of viral RNA copies, r 1 (t) = R exp /R expo , in sewage from October 5th, 2020 to January 6th, 2021 for Thessaloniki. October 5th is the last day after summer that the measured viral RNA copies in wastewater fluctuated around the experimental limit of quantification of the employed technique (~10 viral RNA copies/mL). R exp represents the daily value of global shedding rate, whereas R expo is a reference global shedding rate. Presenting experimental shedding data in the form of a ratio reduces the effect of determination uncertainty. R expo has been defined as the average R exp value across the days of the first week of October which was a period with calm epidemiological conditions in the city (less than 10 daily reported cases).
There are certain similarities and differences between the medical surveillance data in Fig. 2 and the wastewater data in Fig. 3. They both show an initial abrupt ascend followed by a later gradual decline. They both show a peak in the first half of November 2020. Yet, the peak in wastewater data is sharp at around November 13th whereas the peak in medical surveillance, despite the intense noise, looks like a plateau between November 3rd and 12th. Next, the time series of the reported daily infected cases in the city of Thessaloniki is analyzed using the procedure in the previous section to determine the functions F, g, R. It is reminded that these three functions correspond respectively to (i) the total number of infected people with positive test at a calendar day, t, (those already registered as positive to day t but also those that will be tested in the next days and their test result will be also registered to day t), (ii) the distribution of the number of infected people with respect to the days of the disease, τ, at a calendar day, t, and (iii) the total shedding rate with respect to the calendar day, t. It is important to note that since the temporal discretization quantity is 1 day, the continuous and the discrete forms of the above functions are arithmetically similar. Thus, the presented results can be read both as continuous functions (per day) or as discrete values (i.e., histograms). The essence of the proposed approach is that it accounts for the time dependence of the individual's shedding rate during the disease. Yet, this is important only when the distribution of the daily individuals shedding rates is not uniform among the population, but it evolves along the days of the disease.
Before a parametric analysis is performed it is useful to identify realistic range of values of the model parameters in order to examine their influence. It is reminded that the three model parameters are (i) the total number of shedding days counted from the infection day, τ e , (ii) the day of the maximum individual's shedding rate, τ a , and (iii) the day of detection (specimen collection), τ d . The analysis of the data of Wölfel et al. (Wölfel et al., 2020) by Miura et al. (Miura et al., 2021) indicated an average shedding period of 26 days after the onset of symptoms. This is a bit longer than the 17 and 18 days, reported by  Huang et al. (Huang et al., 2020) and Tan et al. (Tan et al., 2020), respectively. But both the latter studies reported a high variance in their average shedding period plus they mentioned that in some individuals shedding exceeded five weeks. Apart from this, Tan et al. , indicated a 6 day median incubation period from the day of infection until the day of symptoms onset. If one considers that the day of symptoms onset is a good candidate for the average day of maximum individual's shedding rate then τ a = 6. For Greek patients it is realistic to assume that specimen collection for medical testing takes place on the average about 3 days after the onset of symptoms. This is in line with information in literature that the median time period from symptom onset to hospital admission is 3 days (Huang et al., 2020). Therefore, a realistic average day of detection is τ d = 8. The above information combined implies a total number of shedding days τ e = 32 (6 + 26) and this is the value adopted herein. Summarizing, the base case parameter values around which parametric analysis is performed are τ a = 6, τ d = 8 and τ e = 32. Fig. 4 shows the computed values of g(τ,t) for selected values of calendar days, namely, September 25th, October 10th, October 23th, December 9th and December 29th, 2020. The small oscillation appeared is due to the oscillations in the curve f(t). However, as f(t)the positive tests count density-increases new patients are added to the number of infected people. The addition occurs at a higher rate than the rate of withdrawal of cured people. This leads to the accumulation of infected people at early shedding days, τ, so the function g(τ) acquires the characteristic, decreasing with τ, profiles shown for October 10th and October 23th. In the time regime of the maximum f(t), e.g. curve for December 9th, the function g(τ) tends fast towards uniformity, like at the early shedding days. Finally, during the decreasing period of f(t), e.g. curve for December 29th, the withdrawal of cured people rate overwhelms the entry of new patients so g(t) is an almost linearly increasing function of τ. The above description manifests that the function g(τ) varies a lot with calendar time t and it is far from uniform along the shedding days so the present analysis is necessary for the estimation of global shedding rate across calendar days. It is clear that at a particular calendar day, infected people are at a different stage of the disease, and thus, at a different stage of viral shedding.
The early warning capacity of wastewater surveillance with respect to medical surveillance is estimated by comparing in Fig. 5 the dynamics of functions f(t): daily reported infected people, F(t): total reported infected people, and R(t): experimentally determined global viral shedding rate in sewage. More specifically, Fig. 5a shows this comparison for the base case τ a = 6, τ d = 8, τ e = 32; Fig. 5b for τ a = 2, τ d = 8, τ e = 32; Fig. 5c for τ a = 12, τ d = 8, τ e = 32 (all values in days). Curves in Fig. 5 stop at December 29th, 2020, and not at January 6th, 2021. This is because calculations of F(t) -and accordingly of R(t)-can be performed only up to December 29th, 2020, as this function requires for every daily value available data for the subsequent τ d = 8 days. To avoid experimental noise in the parametric analysis, calculations in Fig. 5 are based on the fitted curve in Fig. 2. For the comparison it is imperative to express these three functions in a similar scale. This is done by transforming them in their probability density function counterparts. For the present case, this is done simply by dividing each function with the sum of its values for the period of interest. The function F(t) always resides after f(t) which is typical for a cumulative type of function. The temporal distance apart between Fig. 5. Comparison of the relative global viral shedding rate in sewage with the number of the daily and total reported infected people per surveillance day. In all cases individual's shedding duration in stool is set as 32 days, laboratory test is assumed at DAY 8, whereas the day of individual's shedding peak in stool is at: DAY 6 (a), DAY 2 (b) and DAY 12 (c). these two functions depends on the values of the parameters τ d and τ e . This distance increases as τ d decreases and τ e increases. The dynamics of the function R(t) depend also on τ a . In case of τ a < τ d then R(t) precedes f(t). This condition is extremely important as it demonstrates that for wastewater surveillance to precede medical surveillance the day of the peak of individual's viral shedding rate in stool during the disease days must lie before the day of specimen collection for medical testing. In all other cases, the curve R(t) resides between f(t) and F(t), with the latter being the approximate long time bound for the dynamics of R(t). The previous statement can be confirmed by observing the three aforementioned functions for the three set of parameters in Fig. 5. Depending on the calendar day, function R (t) precedes f(t) roughly from 0 to 4 days (Fig. 5b). For the base case in Fig. 5a the difference between R(t) and f(t) appears marginal. Yet, one must recall that in Fig. 2 the ascending part of the fitting curve precedes raw data, so in reality even for the base case sewage data lie earlier than medical data.
In summary, the viral shedding rate evolution curve measured in wastewater lies from a few days before the curve of the number of daily infected people up to the curve of the total number of infected people (Fig. 5). The exact position of the curves depends on the relation between the day of maximum individual's shedding rate in stool and the day of specimen collection for medical testing. In literature there is experimental evidence of identifying SARS-CoV-2 in wastewater earlier than medical reporting by several days, e.g., 2 days in Ottawa, Canada, (D'Aoust et al., 2021). Nemudryi et al. (Nemudryi et al., 2020) in Montana, USA, found that wastewater surveillance for SARS-CoV-2 foreshadowed new case reports by 2-4 days, by comparing wastewater surveillance data to the frequencies of reported lab-confirmed cases. Interestingly, Kumar et al. (Kumar et al., 2021)realized the severity of the pandemic situation in the municipal territory of Gandhinagar in India, 1-2 weeks prior to the official reports by the regulatory body based on clinical tests; likewise, Lastra et al. in Madrid, Spain, reported a period of 3 to 11 days early warning detection of SARS-CoV-2 in wastewater (Lastra et al., 2022). Yet, in the latter studies no data is provided regarding time lag between medical testing and reporting. On the other hand, Koureas et al. (2021) in Larissa and Volos, Greece, applied numerous machine learning models to determine the association between RNA concentrations in sewage and 7d cumulative cases data based on the date of sampling and found no clear evidence that wastewater measurements can foreshadow reported cases at all. Such differences are observed because WBE as an early warning system depends on both the viral shedding dynamics relative to symptom onset and the discrepancies between WBE and clinical data reporting. The current study uses literature evidence for viral shedding in stool and actual daily reported cases to calculate theoretical global shedding in sewage (Fig. 5), elucidating the conditions under which sewage monitoring data can precede daily reported cases.
For appraising in real life the early diagnostic capacity of wastewater surveillance it is important to compare wastewater data with medical surveillance data based on the date of reporting (and not on the date of specimen collection). Peccia et al. (Peccia et al., 2020) found in New Haven (USA) that although wastewater data was ahead by only 0-2 days of positive test results by the date of specimen collection, it was 6-8 days ahead of positive test results by the reporting date. In Thessaloniki (National Public Health Organization, 2020) the date of reporting follows the date of specimen collection by about 3 days on the average and so even the base case parameters in Fig. 5a permit an early diagnosis capacity of wastewater surveillance of around 2 days. Besides medical surveillance data reporting, the on-time reporting of wastewater surveillance data is a critical issue. Regarding big cities, wastewater produced during a specific day is travelling for several hours to reach the entrance of the wastewater treatment facility. Normally, 24-h composite samples refer to the wastewater produced the previous day, providing 1-day lag in the global shedding determination. Furthermore, if sampling and analysis is not performed on a single day basis, additional time lags are observed. Delays in wastewater analysis increase the time lag for reporting and suppresses the early warning capacity of WBE.
A critical issue in the disease dynamics is the effect of the number of unreported infected people (unreported cases). Let us denote as U the ratio of unreported to reported infected people. If U is constant along the days of the shedding period, the analysis is exactly the same with that shown above for the reported infected people, F, since the probability density functions do not change. However, if the number of unreported people varies along the shedding period, then this may yield a time lag between wastewater and medical data. Fig. 6 compares the relative medical surveillance data of the smoothed daily reported cases, f (t)/f o , versus the theoretically estimated relative global shedding rate in wastewater, r 2 (t) = R(t)/R o . Normalization parameters, f o and R o are average values of the respective parameters over the same reference first week of October. Comparisons are for the base case τ a = 6, τ d = 8, τ e = 32. The ratio U is assumed to vary proportionally with the number of reported infected people. The parameter U max is used to parameterize U designating the maximum value of U for the case considered. The parameter U max takes the value 0.5, 2 and 4 in Fig. 6. Apparently, for U max = 0.5 (unreported cases becomes at most 50% of reported cases) wastewater and medical data coincide at the ascent of the curves but at the descent wastewater lags behind. This changes for U max = 2 (unreported cases becomes at most 200% of reported cases) and for U max = 4 (unreported cases becomes at most 400% of reported cases) where wastewater data clearly go before medical data at the ascent of the curve. Summing up, a rising number of unreported cases as the shedding rate increases leads to earlier wastewater signal than medical surveillance. Furthermore, the higher the maximum number of unreported cases the more in advance the wastewater data from the medical data.
Next, the experimental relative global shedding rate, r 1 (t) = R exp (t)/ R expo , shown in Fig. 3 is compared with the estimated total number of infected people, including both reported and unreported ones, Fig. 7. It is apparent that from low to moderate relative global shedding rates (R exp (t)/R expo 〈100) and total number of infected people (F < 15,000) there is a roughly linear relationship between the two quantities. This changes dramatically for higher values of either quantities. Moreover, during the increasing phase of the relative global shedding rate, i.e., outbreak of pandemic wave, the same number of infected people sheds a higher viral load than during the decreasing phase. This is so because at the outbreak phase of the pandemic most infected people are at the early phase of their disease when individual's shedding rate is higher, as shown in Fig. 1. In line with this, when the relative global shedding rate starts to fall right after its peak, the total number of infected people continues to rise to about 10% more cases. This happens because the turning point (peak) of the curve designates the moment when the daily new infected cases begin to reduce, Fig. 2, so the majority of the total number of infected people are at later phase of their disease when individual's shedding rate is lower.
Let us now try to correlate the theoretically derived function R (t) with the actual one found from experimental wastewater analysis R exp (t). The latter was shown as a relative global shedding rate, r 1 (t) = R exp (t)/R expo , in Figs. 3 and 7. Contrary to what was done in Fig. 6, now the theoretically estimated relative global shedding rate r 2 (t) = R(t)/R o is computed from the noisy medical surveillance raw data in Fig. 2 for the base case parameter values τ a = 6, τ d = 8 and τ e = 32. As a result, r 2 (t) data are also noisy. Therefore, β(t) = r 1 (t)/ r 2 (t) is a ratio reflecting the total number of reported and unreported infected people, F(1 + U), over the total number of reported infected people, F, at every calendar day. In other words, β = 1 + U, that is, β-1 stands for U, the ratio of the total unreported over the total reported cases. Thus, the minimum value of β is 1 and corresponds to zero unreported cases. As before, the values of R expo and R o both refer to the same reference first week of October. Selection of the reference week at an epidemiologically calm period allows assuming that R expo /R o ≈ 1 so the normalization in β can be ignored. The evolution of β with the calendar days appears in Fig. 8. Considering the intense day-by-day scatter, the ratio β takes values roughly between 1 and 5, following a qualitatively similar trend with the wastewater shedding data in Fig. 3. Interestingly, these results imply that there are essentially no unreported cases at periods of low shedding but unreported cases rise up to four times the reported ones (U/F ≈ 4) when shedding is maximum. Analysis of seroprevalence data in Qatar indicated that diagnosed infections represented about 10% of actual cases (Saththasivam et al., 2021). Another serological-testing study (Havers et al., 2020) showed that the actual number of infections could be from 6 to 24 times the number of the reported cases. Overall, the presently estimated values of U/F from 0 to 4 is close to the above published values. To our knowledge, this is the first time in the SARS-CoV-2 pandemic that wastewater data are employed to indicate a proportion between unreported and reported infected cases. Moreover, the present study shows that the proportion of unreported to reported cases is not constant during the outbreak of a disease wave but as the number of infected people (and global shedding rate) increases, it attains higher values and its scatter decreases.
Based on the above, an effort is made to correlate β to the total number of the reported infected cases F. The result is shown in Fig. 9a where once more red circles denote increasing shedding rates and green squares denote decreasing shedding rates. Alike in Fig. 7, β starts decreasing before the maximum F is reached. In addition, at high F values there is a hysteresis in β between increasing and decreasing branches of the curves. This might imply that virus spread among unreported Fig. 6. Comparison of the theoretical relative global shedding rate in sewage with the relative medical surveillance data of the smoothed daily reported cases for different scenarios of unreported daily cases. Unreported daily cases vary proportionally with the number of reported daily cases starting from 0 and reaching a maximum value of 0, 50, 200, 400% of reported cases at the date of maximum global shedding rate. Fig. 7. Experimental relative global shedding rate in sewage, R exp (t)/R expo , versus the estimated total number of infected people (reported and unreported), F. The latter is estimated assuming that, on the average, the shedding duration in stool is 32 days and medical tests are taken at DAY 8 during the disease. Red circles denote increasing global shedding rates whereas green squares denote decreasing global shedding rates. people is a bit higher in the declining phase of the disease either because of the cumulatively higher virus prevalence in the community or because more people undergo testing during the ascending spreading period of the disease than in the milder descending period. Fig. 9b presents β versus the daily reported number of the infected cases, f. Interestingly there is an almost linear relation between the two parameters up to about 400 daily reported cases. After that point, spreading of the virus in the community is too high so β depends on the total number infected people F and not on the daily reported cases.
It must be mentioned that β may depend also on other quantities (apart from F and the ascending/descending mode), like the value of the F slope, but at present there are not enough data to support this. Having an estimation of the total number of infected people through β, a crude estimation of the product AB (average maximum virus shedding rate per person) is obtained. The estimated value of AB is at about 2 · 10 11 day −1 . This value is in fair proximity with that estimated by the wastewater analysis of Wu et al. . However, more work is needed to allow judging on the correctness of this value.

Conclusions
A mathematical model is developed that permits estimation of SARS-CoV-2 global shedding rate in wastewater from the number of daily infected people (daily cases) announced by medical surveillance. The problem is complex as it requires a cumulative function of the total number of infected people at every calendar day and a function of the average shedding rate among infected individuals at every day along the course of the disease. Based on the limited available evidence in literature, a realistic function for the shedding rate of SARS-CoV-2 in stool of infected individuals is proposed which calls for an exponential increase of shedding rate from the day of infection to the day of symptoms onset, being followed by an exponential decay until the end of the disease. Three characteristic times describe this function: the day of maximum individual's shedding rate, the day of detection (specimen collection) and the end day after the onset of the disease. Applying this model to the public health surveillance data for the city of Thessaloniki (~700,000 inhabitants, North Greece) a thorough parametric study is performed. It is shown that for WBE to afford an early warning capacity, the day of maximum individual's shedding rate must precede the day of detection by a few days. In particular, at the beginning of an outbreak curve, a 4-day early WBE signal requires these two characteristic times to be apart by 6 days. For Thessaloniki where on the average these two characteristic times are apart by just 2 days but, additionally, the day of reporting follows the date of specimen collection also by about 3 days, the early diagnosis capacity of wastewater surveillance is around 2 days, indicating that the foreshadow of wastewater surveillance is quite marginal if there is no lag in reporting. Furthermore, comparison of wastewater surveillance and public health data indicates the existence of a number of unreported infected people. For Thessaloniki in the outbreak of November 2020, this number was negligible at epidemiologically calm days with low global shedding rate but went up to four times the number of reported people when global shedding rate reached a climax at mid-November 2020. Interestingly, the presence of an increasing number of unreported cases with shedding rate enhances the early warning capacity of WBE. To this end, the present model is an essential tool to investigate the dynamics of virus spreading based on wastewater measurements. When such knowledge is adequately acquired then the inverse problem of estimating the number of cases from wastewater data can be attempted.

Declaration of competing interest
(a) (b) , β Fig. 9. Variation of the ratio of the total number of infected people (reported + unreported) over the number of reported infected people with regards to (a) the total number of reported infected people and (b) the daily number of infected people. The ratio is calculated from unsmoothed raw data in Figs. 2 and 3. Red circles denote increasing global shedding rates whereas green squares denote decreasing global shedding rates.