MODELLING SEASONAL INFLUENZA IN ISRAEL

Mathematical modeling approaches are used to study the epidemic dynamics of seasonal influenza in Israel. The recent availability of highly resolved ten year timeseries of influenza cases provides an opportunity for modeling and estimating important epidemiological parameters in the Israeli population. A simple but well known SIR discrete-time deterministic model was fitted to consecutive epidemics allowing estimation of the initial number of susceptibles in the population S0, as well as the reproductive number R0 each year. The results were corroborated by implementing a stochastic model and using a maximum likelihood approach. The paper discusses the difficulties in estimating these important parameters especially when the reporting rate of influenza cases might only be known with limited accuracy, as is generally the case. In such situations invariant parameters such as the percentage of susceptibles infected, and the effective reproductive rate might be preferred, as they do not depend on reporting rate. Results are given based on the Israeli timeseries.

1. Introduction.Seasonal influenza recurs annually in most temperate climatic zones of the world with large epidemics that occur in winter followed by fade-out periods during the warmer months.In Tropical regions influenza is spread more uniformly over the year.Influenza outbreaks have been documented in the scientific literature in records that extend back to at least 1650, although reports of possible influenza may be found in Greek writings from 412 BC [19].These historical records serve to attest to the tenacity and unusual persistence of this respiratory disease.Being highly infective, influenza propagates rapidly through human populations in the form of virus aerosol particles with large attack rates.Although mostly a self-limiting disease, there are few other diseases that cause as much absenteeism 2 O. BARNEA, R. YAARI, G. KATRIEL AND L. STONE from work and schools, suffering, visits to outpatient clinics, and hospitalizations, [17].But influenza is also a source of considerable human mortality, reaching some 250,000 to 500,000 deaths per year globally, and clearly exacting a large overall economic toll on society.
Of the three genera of influenza viruses (A, B and C), Influenzavirus A is the most virulent and has the greatest potential to cause pandemics (see [3] for a review).The virus has several subtypes which are identified by the copies of hemagglutinin and neuraminidase glycoproteins found on the surface of the virus membrane.Hemagglutinin acts to recognize target cells by binding to the cells receptors and then allows entry to the target cell by fusion of the cells membrane with the viral membrane.Neuraminidase helps viruses to be released by the host cell [15].There are 16 known subtypes of Hemagglutinin and 9 known subtypes of Neuraminidase and the strain of influenza is determined by their combinations (eg., H1N1, H3N2).Human-to human transmission is possible for H1, H2 and H3 subtypes, and it is not yet clear if other subtypes have this ability as well [24].
The influenza virus is prone to unusually high mutation rates in the genes encoding the hemagglutinin and neuraminidase surface antigens.These mutations can make the virus unrecognizable by preexisting host antibodies, generated during an earlier influenza infection, and allows the virus to continually evade the immune system.This process of antigenic drift is driven further by the host's immune system, which directs positive selection on the influenza viruss main antigen hemagglutinin.Thus the host's immunity to a particular influenza strain gradually erodes or wanes as antigenic drift proceeds through rapid evolution.In effect, antigenic drift creates an important source of new susceptibles in the human population to fuel the next epidemic outbreak.
There is still much controversy in identifying the seasonal drivers that generate annual influenza oscillations [8].Certainly antigenic drift plays an important role.Various other explanations are usually related to characteristics particular to the winter months: more indoor crowding [16], increased virus survival [13], and decreased immunity of the host, perhaps mediated by a decrease in Vitamin D synthesis from lack of sunlight [2].As in many other infectious diseases the contact structure of the population, and its changing properties over a year, also have enormous influence in driving seasonal epidemics.In a seminal modeling study, Fine and Clarkson [7] demonstrated conclusively how the opening and closing of the school year, as well as holiday periods, manifested strongly in population contact rates reconstructed from long-term measles incidence records.Figure 1 shows that despite the different year to year dynamics exhibited in the measles timeseries, the annual periodic pattern of the contact rate hardly changes, and closely reflects the school year and holiday periods.Despite its simplicity, their modeling approach proves an interesting means for understanding infectious disease dynamics when surveillance timeseries are available.The model itself has a long history and is a simple variant of the Hamer-Soper model (see [22], [10]).
We begin by taking advantage of Fine and Clarkson's method for the purpose of characterizing important epidemiological features of seasonal influenza in Israel, and then we attempt to substantiate our results by using a more sophisticated chain binomial epidemic model.In particular, given a population that experiences recurrent annual flu, it seems to be of considerable interest to ask what percentage of the population is susceptible to the disease at the start of the winter season.This is a question that has rarely been addressed in the literature given that it is difficult to measure in practice, although laboratory seroprevalence tests have been attempted through surveillance.However, these tests are expensive and require large surveillance efforts often at the national scale to achieve reliable results.On the other hand, timeseries models are capable of shedding some light on this question.Moreover, it is a question of particular relevance when studying influenza dynamics.Unlike childhood infectious diseases studied by Fine and Clarkson, where the susceptibility of a population is closely related to the number of new births, for influenza the situation is much more complicated due to the different virus strain experienced each year and due to the process of antigenic drift described above.The procedures we use are also capable of providing estimates of the important epidemiological parameter, the reproductive number R 0 which gives a measure of the average number of secondary infections a typical infected individual will generate in a fully susceptible population.
Our work is motivated by recent access to an unusually detailed ten-year dataset (1998-2008) of influenza-like-illness (ILI) amongst the members of Maccabi Healthcare Services in Israel.Maccabi is the second-largest health maintenance organization in Israel and gives healthcare services to around 23% of the population of Israel.The number of members N varies between 1.4 to 1.8 million in the aforementioned period.The surveillance is implemented by a network of over 3,000 doctors using a computerized medical record system.The Maccabi dataset covers ten full flu seasons, two of which were dominated by influenza B and the rest dominated by influenza A. The daily timeseries of infectives plotted in Figure 2. In all A-dominated seasons the dominant subtype was H3N2, except for the 2000-2001 season when H1N1 was the dominant subtype.A weakness in our estimation schemes for both S 0 and R 0 is that we are unable to obtain reliable information about the reporting rate of the population.Thus while we know the number of individuals reporting ill to the doctor on a daily basis, it is unclear what percentage of the population in the end seeks medical support, and what percentage of those reported as ill really have influenza.In such a situation, we are able to predict S 0 and R 0 only up to the accuracy of the reporting rate.As such, it would seem to be of interest to also investigate parameters that are independent of the reporting rate.Two such parameters are investigated here: the percentage of susceptibles that are infected over the epidemic (Z), and the effective reproductive ratio R ef f .The latter parameter is often used when, as in many situations, the population of N individuals is not fully susceptible (S 0 < N ).In this case the effective reproduction number R ef f is related to the basic reproduction number by Because population susceptibility changes from year to year due to antigenic drift, or through sudden punctuated evolutionary jumps [21] as has recently been suggested, it becomes overly complicated to derive a single model that will simulate all ten-years continuously.Instead we have found it more appropriate to model the population over individual seasons.We find that despite the simplicity of this "single season" approach, new and valuable information is obtained.
2. The Fine and Clarkson (FC) model.We make use of the standard SIR modeling approach which divides the N members of a population into three classes: susceptible (S), infective (I) and recovered (R).As we are modeling a single epidemic it is supposed that on this time-frame the effects of antigenic drift are minimal and there are no reinfections, so that after having the flu a person moves directly to the recovered class and remains there.The model is formulated to estimate the number of newly infected cases I t and the number of susceptibles S t in time step t.To achieve this it is assumed that time steps are given in terms of influenza's average generation time, which is the average time in which an infected person remains infectious.For influenza, this may be reasonably approximated to be three days [5].Thus it was necessary to smooth and bin the daily data to track the number of new infectives every three day period.This has the added advantage of smoothing the data and reducing spurious irregularities and artifacts such as "weekend effects." Similar to Fine and Clarkson's SIR model, we set the number of new infectives generated in the next three-day time step to be: This relies on the assumption that the population is randomly mixing so that the number of new infectives relates to the probability that an infective person meets a susceptible and is thus proportional to the product S t • I t .The infection is transmitted at a rate proportional to the coefficient β t which relates to both the number of contacts a typical infected person will meet over the generation time of the disease, and the probability of infection in each meeting.After every timestep, the number of susceptible decreases by the number of new infective cases: The transmission rate can be calculated by rearranging (1) as Thus the above formulae 1,2 and 3 may be used to reconstruct the transmission rates from the time series of infectives.Figure 1 displays Fine and Clarkson's reconstruction of transmission rates for the England and Wales measles data.
Fine and Clarkson [7] estimated the initial number of susceptibles using information concerning birth rates, age-specific measles incidence data and results of seroepidemiological surveys.However, the method is designed for standard childhood diseases such as measles, and cannot be applied in its current form to the Israeli influenza data.While measles is caused by a relatively stable virus and infection with measles results in most cases in a lifelong immunity against this virus, influenza is caused by different types, subtypes and strains of viruses, with a different strain being the dominant one in each influenza season, and an infection with one strain results in different, and usually unknown, levels of immunity against other strains [1].The number of susceptibles in the beginning of each flu season is, therefore, unknown.We therefore explored model runs with different initial conditions S 0 .
Figure 3 shows the model results for the 2007-2008 season based on two different values of S 0 .The left panel is based on S 0 = 60,000 while the right panel is based on S 0 = 400,000.We first note that direct application of Eqns.1-3 will always reproduce the observed infective cases exactly, as this is built into the model formulation (see Figure 3 top panels).The goal is to use the infectives to reconstruct the susceptibility and missing transmission parameters.For the left panels with initial condition S 0 = 60,000, one notes the number of susceptibles drop significantly as the epidemic evolves and reach almost 50% of their initial value by the time the epidemic has ended.During the epidemic (t = 25 to t = 45) the transmission rate β t remains relatively constant.These two features are typical for an epidemic that is purely driven by the infection process with little interference from seasonal factors or changes in contact rates due to school term.In short, this scenario represents an epidemic that dies out solely due to an exhaustion of susceptibles.
In contrast, in the right panel with initial condition S 0 = 400,000, as the epidemic evolves the number of susceptibles remain relatively constant and are only reduced by less than 10% by the end of the outbreak.It is important to note that the contact rate shows a marked decrease in the late stage of the epidemic, possibly indicative of a seasonal change.This scenario with a higher initial S 0 has all the features of an epidemic that is curtailed [23], [18] and disease dies out due to a change of season.The curtailing manifests by the large drop in the contact rate as seen from t = 80 of the bottom-right panel of Figure 3.
Over the ten years of the surveillance study, the December-February period in Israel would seem to have no major holiday period where contact rates might be considered to change radically.Moreover, features concerning the timing of the epidemic such as the point where infectives/susceptibles drop simultaneously does not appear to coincide with any obvious climatic event or event that relates to closure or opening of the school year.(See, however, [11], where fine-scale changes were detected.)One is tempted to conclude that the left panel with low initial condition and relatively constant transmission rate (at least over the epidemic period), is likely to be the most representative for Israel.Moreover, such a possibility considerably simplifies the modeling process and allows us to explore a number of scenarios that would otherwise be inaccessible.We would suppose that seasonal changes are where the epidemic is clearly curtailed.As such, in this exploratory investigation of seasonal influenza A in Israel, we base our modeling on the plausible assumption that the seasonal contact rate is relatively constant over the epidemic period.We now attempt to estimate the initial value of S 0 and the transmission parameter by fitting the model with constant contact rate to the data.Any model fit will depend on the actual date that is set for the epidemic initiation (t i ) as well as the initial number of susceptibles S 0 .We therefore searched over the parameter space of these two variables (S 0 , t i ) until obtaining the model that best fits the data (in the least squares sense) thus yielding the best fitting parameters.In this scheme, for each pair (S 0 , t i ), the model was run using the observed data and the transmission rates β t were reconstructed using Eqns 2-4.As an example, Figure 4b displays β t for the 2007-2008 season.The transmission rates were averaged in the weeks over the epidemic, to find the mean transmission rate β, which is represented in Figure 4b by the horizontal line segment.The deterministic epidemic model eqns 1 and 2 were then iterated after setting β t = β, yielding a time series of simulated infectives.The least squares distance between the simulation and the observed data was calculated and recorded.The procedure was repeated over (S 0 , t i ) parameter space until the closest fitting model was found that covered a timespan to include a minimum of 80% of the epidemic's infectives.The best fitting initial conditions S 0 are presented in Table 1 for all relevant seasons over the 10 year period, together with the effective reproductive number R ef f = R 0 • S 0 /N .Note that for this particular case of a constant transmission rate, we have the simple relation As a check of self-consistency, it was useful to examine the epidemic dynamics produced from a stochastic simulation model that uses the best-fitting initial susceptibility (S 0 ) and the constant transmission rate β, as calculated arbitrarily for the 2007-2008 season.The model assumes that the epidemic trajectory may be approximated by a Poisson distribution such that: For large populations, the model is in fact a good approximation to the better known chain binomial model (see eg., [6]).By running 10,000 Simulations of the Poisson model, it is possible to report both the mean and range for which 95% of the simulations fall within.We might consider these as "pseudo 95% confidence intervals."As seen in Figure 4a the observed time series of the 2007-2008 season (black) fits well within the pseudo 95% confidence intervals of the stochastic model.
A key difference between this model and the FC model is that here the transmission term β is constant.Eqn.4 states that the number of newly infected individuals in a population at time-step (t + 1) is binomially distributed and may be viewed as a random draw from S t individuals with probability of any susceptible individual coming into contact with an infected individual (1 − e −βIt ).As before, β is the rate of contacts transferring infection between any two members of the population in interval ∆t.
The conditional probability of I t+1 is: The likelihood for both β and S 0 given the entire dataset I is: and β and S 0 can be found by numerically maximizing this likelihood function.This is a straightforward numerical optimization implemented with the Matlab routine fmincon.Table 1 summarizes the results of the analysis for the eight flu seasons in which the dominant strain was influenza A using the SIR Fine Clarkson (FC) method and the maximum likelihood (ML) method described in [6].There we report estimates of the percentage of susceptibles in the population before the epidemic (S 0 ), the effective reproductive number R ef f , the percentage of susceptibles infected over the epidemic Z (as estimated by the FC model), as well as the attack rate A, the percentage of the population infected.3. Discussion.The SIR (FC) model used to estimate S 0 and R ef f is the simplest form of an SIR model, which assumes uniform mixing of susceptibles and infectives within the population, a closed population without migration, and an infectivity period of exactly three days for each infective person.In spite of these unrealistic assumptions, this simplistic FC model gives, in many of the seasons checked here, a good fit to the data and parameter estimates which match well with those found by an established maximum likelihood approach based on the chain binomial model.The FC method which our model is based on dictates that the model uses discrete time, with time units being close to the average generation time of influenza.As Diekmann and Hesterbeek [4] show, for the simple SIR model, there is a direct equivalence between continuous and discrete time versions if the time unit is the generation time.Nevertheless, some differences are usually to be expected when discretizing a process that operates in continuous time.Our choice of three day time-intervals is relatively small (c.f.2-week intervals of Fine and Clarkson) and should minimize any discrepancies.Although the results here were not compared to results of continuous-time models, it would be interesting to do so in the future.
Apart from the 2000-2001 season, the SIR-FC method appears to give slightly larger estimates for S 0 compared to the maximum likelihood method.Conversely, the calculated values of R ef f given by the FC method were always slightly lower.The seasons in which the results were most similar by both methods were found to be those in which the SIR model gave the best fit to the data.
It is important to note that the estimates of S 0 and R 0 made here depend on the assumption that the number of cases in the data is in fact the true number of cases.In fact we expect that only a certain fraction r of cases are reported, for various reasons such as incorrect diagnosis of the disease by the doctor and simply failure to visit a doctor.Another possible source for unreported cases would be asymptomatic cases.These individuals lack symptoms of influenza, and are thus not registered in the dataset, but nevertheless infect others.If one assumes that the infectivity of non-asymptomatic individuals is the same as that of asymptomatic individuals, then the asymptomatics can be treated as just another group of unreported cases, which is what we do here.Some models have accounted for different transmission rates for asymptomatic cases (e.g.[9]).However, as discussed in [14] (supplement), the difference in transmission rate of asymptomatics relative to symptomatics is a parameter which is not identifiable based on the incidence data alone.We note also that being asymptomatic has two opposite effects on the transmission rate: on the one hand asymptomatics are likely to be less infectious upon contact, but on the other hand they are less likely to withdraw to bed, which makes them more likely to infect others.Therefore we believe that the assumption that asymptomatics are approximately as infective as symptomatics, and can thus be treated simply as unreported cases, will not distort the parameter estimates to a great extent.
Assuming that a fraction r of cases are reported, then the true number of cases is I t /r.This implies that to fit the Fine-Clarkson model in (2), we must replace S t by S t /r.The true value of the number of susceptibles at the beginning is S true 0 = 1 r •S 0 .Thus in (1) β t must be replaced by r•β t , so the true value of the reproductive number is R true 0 = r • R 0 .Hence without an independent estimate of the reporting rate r we cannot know the values of of S 0 and R 0 .However the quantity R ef f = R 0 • S 0 /N is invariant, that is, it does not change if R 0 is multiplied and S 0 is divided by the same constant, so that our estimate of it does not depend on the reporting rate r.Another invariant quantity is the ratio of the number of people infected to the number S 0 of susceptibles at the beginning of the season, since changing the reporting rate means multiplying both the numerator and the denominator of this ratio by a constant.We denote this ratio by Z, and note that it expresses the percentage of those people who are susceptible at the beginning of the epidemic who become infected.
The models estimate that the number of susceptibles S 0 in the population range approximately between 3-8% of the population at the beginning of each influenza season.This may at first seem perplexing but it should be kept in mind that the calculations assume 100% reporting rate (r = 1).In this respect we note that the average attack rate is close to A = 2% (Table 1), yet seasonal influenza is generally estimated to have an attack rate in the vicinity of A = 5 − 15% [20].Following the arguments of Katriel and Stone [12], we suppose the real attack rate A = 10% which would suggest a reporting rate of r = 1/5, a rate that would appear a reasonable first approximation in the Israeli context.Factoring this in, the true number of susceptibles are expected to be in the range S 0 = 15 − 40%.
Given the uncertainty in assessing the exact value of the reporting rate r, it is also worth examining Z, the percentage of susceptibles which become infected, which as mentioned is a quantity that is independent of the reporting rate.Table 1 shows that Z ranges between 25 and 41.5%.Intriguingly, independent estimates [12] based only on rough quantitative features of influenza epidemics find that for seasonal influenza, Z should approximate Z = 30%.These results give a testable prediction (in principle): if one can identify individuals who are susceptible by serology.One might then take pre and post-season samples and estimate the percentage of susceptibles who were infected.If this differs radically from 1/3, then the basic SIR model equations which are widely used in the literature may be in need of modification.
Our calculations show that the effective reproductive ratio is approximately R ef f ∼ 1.2 (Table 1).This is the same as the estimate for R ef f made in [12], made using the general properties of seasonal influenza.Again, if we assume a reporting rate of r = 1/5 this implies that R 0 = R ef f /S 0 is expected to be in the range R 0 =2.7-8.0 which is also not unlike that reported in [12].
In summary, these approaches for parameter estimations have the potential to provide very useful information concerning a population's epidemiological status.However, one of the key limitations appears to be sources of quality surveillance data and a realistic estimate of the reporting rate.But even in the absence of the latter, it is still possible to estimate invariant parameters as Z and R ef f .Fortunately, the Israeli seasonal influenza dataset is sufficiently refined to mostly overcome the former problem.

Figure 1 .
Figure 1.Analysis of measles patterns in England and Wales from 1950 to 1965.(A) Average number of infectives notified per week, (B) Weekly transmission parameter N • β t .Shaded blocks indicate school summer and Christmas periods.Figure from Fine and Clarkson [7].

Figure 2 .
Figure 2. Daily number of influenza-like illness cases amongst Maccabi Health Services, 1 Jan 1998 -30 May 2008.The 2002-2003 season and the 2005-2006 seasons were dominated by the influenza B virus, and are characterized by a low number of ILI cases, a relatively gradual increase in the number of cases and a sudden curtailing of the epidemic in the spring.

Figure 3 .
Figure 3. ILI cases (top panels), estimated susceptible numbers (middle panels) and contact rates (bottom panels) in the 2007-2008 season.The left hand side column has figures generated from simulations with S 0 = 60, 000 while the right hand side panels have S 0 = 400, 000.Time steps are in units of three days.

2. 1 .
The Chain Binomial (CB) Model.Ferrari et al.[6] describe a technique for Maximum Likelihood estimation of S 0 and β based on fitting a chain binomial SIR model of the following form:

Figure 4 .
Figure 4. ILI cases in the 2007-2008 season (black dots) and F-C model fit (red dots) and "pseudo 95% confidence intervals" (red bars).Bottom: contact rates for the 2007-2008.The period to which the model was fit is highlighted in dots and the horizontal line is β.

Table 1 .
Results comparing the FC and ML models (see text).