Estimation of invasive pneumococcal disease dynamic parameters and the impact of conjugate vaccination in

Pneumococcal diseases, or infections from the etiological agent Streptococcus pneumoniae, have long been a major cause of morbidity and mortality worldwide. Recent advances in the development of vaccines for these infections have raised questions concerning their widespread and/or long-term use. In this work, we use surveillance data collected by the Australian National Notifiable Diseases Surveillance system to estimate parameters in a mathematical model of pneumococcal infection dynamics in a population with partial vaccination. The parameters obtained are of particular interest as they are not typically available in reported literature or measurable. The calibrated model is then used to assess the impact of the recent federally funded program that provides pneumococcal vaccines to large risk groups. The results presented here suggest the state of these infections may be changing in response to the programs, and warrants close quantitative monitoring.

1. Introduction.Infections caused by Streptococcus pneumoniae, or the pneumococcus, have long been a major cause of disease burden and mortality worldwide.Most affected are children under the ages of 5 in developing countries, where it is estimated that around 1 million die each year of pneumococcal pneumonia [40].In more developed countries these diseases are prevalent in the young and the old, with the elderly population commonly being affected by invasive pneumococcal diseases (IPDs), commonly defined as pneumonia, meningitis, bacteremia, and sepsis.These infections can result in case fatality, an outcome which occurs increasingly often in older age groups [1].While children in more developed countries are typically afflicted by less serious infections such as otitis media, or the common ear infection, it is one of the most common causes of childhood morbidity.In the US it is estimated that 90% of children experience at least one episode by 2 years of age [26,27].Treatment efforts, usually consisting of antibiotic regimes, not only cost the US an estimated $2 billion per year [30], but also result in the generation of antimicrobial resistance among common serotypes.
The pneumococcus is a gram-positive bacterium with over 90 serotypes/serogroups identified thus far.S. pneumoniae commonly colonizes the nasopharyngeal region of healthy adults, with estimates between 9-14 % of the population in the US and Spain.This prevalence varies depending on factors such as crowding, age-profiles, and previous exposure.This colonization is asymptomatic, occurs as a result of casual contacts with other colonized individuals, and occurs for shorter periods of time in healthy adults as opposed to young children [25,29].It is during this stage that the pneumococcus may successfully establish an infection or be cleared by one's immune response.Thus, it is important to note that this stage always precedes infection.Most individuals are able to clear these transient colonies, although some cannot without the aid of vaccination.
A polysaccharide vaccine protecting against 23 serotypes, which is relatively inexpensive and safe, has been available since 1983.However, this vaccine is ineffective in young children, who have not yet developed the ability to respond to polysaccharide antigens.In response to this problem, the conjugate vaccine, in which polysaccharides from 7 serotypes have been conjugated to a nontoxic protein, was developed and licensed in 2001.The implementation of this vaccine in the US has dramatically reduced cases of invasive disease in children, as well as decreased rates of carriage.In addition, infections have been reduced in those who are not vaccinated, suggesting a herd immunity effect.However, the long-term effects of these and other vaccines currently in development are unknown.Thus, theoretical population studies of the effects of vaccination on pneumococcal dynamics are of increasing interest.Of particular interest are theoretical studies that have been calibrated to a specific population and are therefore in a position to directly address questions about the long-term effects of vaccines and the effects of potential control scenarios.
In this paper we discuss a mathematical model of pneumococcal infection dynamics with generic vaccination.Crucial model parameters, not typically available in relevant literature, are estimated using surveillance data collected by the Australian government [10].An accurate picture of the disease dynamics in the Australian population is thereby described for January 2002 through December 2004.As of January 1, 2005, pneumococcal vaccination for all recommended risk groups was made publicly available and the number of infections has been dramatically reduced.The impact of this widespread vaccination is assessed based on the current data available, and the need for qualitative monitoring of this population is illustrated.

2.
Model.We introduce a model of Streptococcus pneumoniae infection dynamics in which individuals are classified according to their epidemiological status.Individuals may be susceptible, represented by the S class, or asymptomatically colonized by S. pneumoniae, represented by the E class.In the event that a colony of S. pneumoniae is able to successfully establish an infection, the individual is then considered infected, represented by the I class.
Pneumococcal infection dynamics with vaccination.
Individuals are recruited into the population at constant rate λ, with all newborns assumed to be susceptible.Susceptible individuals become asymptomatically colonized through effective contacts, measured by per capita rate β, with other colonized individuals (E and E V ) and those with established pneumococcal infections (I and I V ).Thus, the transfer between these two compartments occurs at rate βS E+E V +I+I V N where N represents the total population and is calculated by the sum of all classes in the model, i.e., N = S + S V + E + E V + I + I V , which is not assumed constant.Most individuals return to the susceptible class after colonization, i.e., clearing the bacteria, at per capita rate α.Infection occurs in only a small fraction of the colonized individuals, and is modeled by the rate lκE, where l < 1 is a scaling factor for the proportion of the population that is 'at risk' and κ is the infection rate of this 'at risk' population.This linear term is a gross simplification, as this process is actually a function of many factors, namely the health of the individual's immune system as well as their previous exposure to S. pneumoniae of similar serotype.Thus it is likely that a good way to model this transfer will require further study at the 'within host' level.Death due to pneumococcal infection occurs at per capita rate η, or the infection is cleared at rate γ.Natural death can occur at any stage at per capita rate µ.
Seasonality of invasive pneumococcal diseases has been observed and studies support a seasonal infection rate, κ, rather than in a seasonal effective contact rate, β.We assume that the observed fluctuations are due to seasonal changes in host susceptibility [15,12].For instance, it has been noted that the seasonal fluctuations of pneumococccal pneumonia correspond to those observed in influenza cases [21,37].While seasonal influenza fluctuations are typically attributed to changes in effective contacts, it is reasonable to interpret the effect of influenza on pneumococcal disease dynamics as increasing host susceptibility.Thus, seasonal influenza infections serve to exacerbate the number of new infections and not the number of individuals colonized.In addition, a household longitudinal study of nasopharyngeal carriage rates over a 10 month period (including the months of highest IPD incidence) reported the prevalence of colonized individuals remained relatively constant [19].In a study of frequent nasopharyngeal colonizers, Marchisio et al. found seasonal changes in colonization prevalence by some bacteria, but not that of Streptococcus pneumoniae [22].Hendley, et al., found increased nasopharyngeal carriage due to S. pneumoniae among nasal swabs in children, but also demonstrated that these changes were due to increased nasal drainage, and thus heightened sensitivity of the tests, during the colder months [18].These studies support the implementation of a time-dependent infection rate and not a time-dependent effective contact rate, and thus we ascribe the form to κ(t) as We incorporate general pneumococcal immunization-that is, any type(s) of vaccine implemented may be considered simply by specifying parameter values.For this study, we consider a population vaccinated with either the 7-valent proteinpolysaccharide conjugate vaccine (PCV7) or the 23-valent polysaccharide vaccine (PPV23).Upon effective vaccination, susceptible and colonized individuals enter the S V and E V classes, respectively, at per capita rate φ.Thus, only individuals who respond in some way progress to the vaccinated classes.Typically, adults over the age of 65 lose protection from the PPV23, which we model by the transfer of individuals in S V and E V to their corresponding unvaccinated classes at per capita rate ρ.In the case that only the PCV7 is being implemented, ρ = 0, since this is only recommended for children.However, PCV7 has noticeably conferred protection against colonization and thus this rate would be reduced by a factor 0 < < 1, (1 − would be the fraction by which colonization events are reduced).
In contrast, PPV23 does not reduce susceptibility to colonization and = 1 in that case.Upon immunization, not all individuals respond to all included serotypes, and even in that case, protection against infection is not complete.Therefore, effective vaccination is modeled as preventing a fraction 1 − δ of infections.Thus, colonized individuals who have been vaccinated, E V , progress to an infected state at rate δκE V .We assume that the duration of a given infection in vaccinated and unvaccinated individuals is roughly the same.However, the infection of vaccinated individuals are not assumed to indicate any loss of protection, and therefore, these individuals return to the S V class upon recovery.A complete list of variables can be found in Table 1, and the model equations are given by 3. Methodology: parameter estimation.Here we describe the formulation of the inverse problem used to calibrate the model to the Australian population and then to assess the impact of widespread vaccination.We briefly describe the data set itself and the model quantities represented by the data in Section 3.1.Fixing certain parameters and initial conditions, as described in Sections 3.3 and 3.4, respectively, we use the data reported during the period January 2002 to December 2004 to estimate unknown model parameters, θ = (β, κ 0 , κ 1 , δ) T .The estimation of these parameters via a weighted least squares algorithm is described in Section 3.2.The calibrated model, along with data collected after January 2005, is then used to estimate vaccine parameters or δ.

Description of data.
The number of cases of invasive pneumococcal diseases (clinically defined here as pneumonia, meningitis and bacteremia, or the presence of S. pneumoniae in normally sterile fluid) by month dating back to January 2001 are available on the Australian NNDS website [10].However, it has been noted in annual reports that there were some reporting discrepancies in the first year of collection [38], which were resolved in subsequent years.Therefore, all of our parameter estimates are based on data collected beginning in January 2002, although information from December 2001 is used to calculate initial conditions (in our model t = 0 corresponds to December 1, 2001).
Annual reports [31,32,33,34] on this collected data are published in the Australian journal Communicable Diseases Intelligence.The reports contain vaccination information if collected, and also case fatality estimates for the year.From the case fatality data we directly calculate the disease-induced mortality rate (η) as described in Section 3.3.The vaccination information from 2002, 2003, and 2004 provides another six data points with information on the efficacy of the implemented vaccine program.There is some slight underreporting evident when the summaries in the annual reports are compared to the same data published on the NNDS website.Because of this discrepancy, proportions were used to extrapolate to the total number of cases in the data set for each year.This is described in further detail in the Appendix.While this is of course not exact, we are likely not committing errors of a magnitude that would influence the estimates of the parameters significantly.Moreover, the data is not assumed to be error-free and the extent to which this affects our estimates is taken into consideration in the error analysis described in Section 3.2.
Monthly case notifications d (1) j are best represented as integrals of the new infection rates (including those in the vaccinated class) over each month, since they represent the number of cases reported during the month and do not provide any information on how long individuals remain in an infected state.Similarly, the annual observations d k indicate the cases recorded during the year occurring in either vaccinated or unvaccinated individuals.These data are integrals of the infection rate into the vaccinated or unvaccinated classes over the year.Thus, the observations and the model quantities are related by [31,32,33]), [31,32,33]).
Let us define Y (1) (t j , θ), Y (2) (t k , θ) and Y (3) (t k , θ) to represent the integrals above, so that d (1) j ∼ Y (1) (t j , θ), d (2) k ∼ Y (2) (t k , θ), and d The data collected during the period January 2002 and December 2004 are used to estimate the parameters, β, κ 0 , κ 1 , and δ, and to directly calculate η.The monthly case notification data beginning in January 2005 are then used to examine the effects of increased conjugate vaccination on the population, by determining values of and δ for fixed vaccination rates (φ) that could have generated the number of cases observed.

Least squares estimation of parameters.
To begin, we use n = 42 data points (n 1 = 36 monthly cases and n 2 = 6 annual vaccinated or unvaccinated cases) to estimate four unknown parameters: θ = (β, κ 0 , κ 1 , δ) T (θ ∈ Θ ⊂ R p where p = 4 and Θ is the feasible parameter space).As described above we have a vector system of three types of longitudinal data.We use a statistical model to represent these observations, made up of the deterministic model and error accounting for discrepancies between the model predictions and the data.These include errors incurred when the measurements are reported, along with any stochastic behavior not accounted for by the deterministic model.We assume there exists a 'true' set θ 0 of parameters which generated the data, and our statistical model is then given by The errors ( (i) j in ( 8) - (10) for i = 1, 2, 3) in the above model are assumed to be random variables with means E[ (i) j ] = 0 and constant variances var( (i) j ) = σ 2 0,i , where σ 0,i is unknown.Here we have assumed that the size of the errors committed at each time for a given kind of "measurement" is constant and also does not depend on the magnitude of the measurement itself.We also assume that (i) j are independent and identically distributed (i.i.d.) random variables for each fixed i. Thus since each d (i) j depends on the random variable i.i.d. for fixed i.We further assume that var( (2) j ) = σ 2 0,2 = σ 2 0,3 since these data arose from the same counting process.
We seek the set of parameters θ that minimizes thereby minimizing the weighted differences between the model quantities and data.We remind the reader here that we have assumed σ 2 = σ 3 .In the above cost function, J 42 (θ, σ 2 1 , σ 2 2 ), each residual has been weighted by the variance in the observations as well as a scaling factor to give equal importance to each type of data (w 1 = 1 and w 2 = 9), giving rise to a weighted least squares (WLS) estimator.The estimator θW LS is also a random variable then, and its distribution is called the sampling distribution.This sampling distribution provides information about the uncertainty in the estimates θ of the parameters obtained for a specific data set, which is actually one realization of d (1) , d (2) , d (3) where d (1) = {d (1) k=1 , and d (3) = {d (3) k } 3 k=1 .For a summary of formulation involving ordinary and weighted least squares along with asymptotic distribution results as discussed below, see [7].
The parameters to be estimated are ψ = (θ, σ 2 1 , σ 2 2 ) T since the estimation of θ depends on the unknowns σ 1 and σ 2 .We employ an iterative approach, explained in the algorithm: 2 where the superscript indexes the iteration.2. Minimize the objective functional (11) for θ(0) : 2 by the standard formulas σ(1) 4. Repeat steps 2 and 3 until || ψ(k) || − || ψ(k−1) || ≤ 10 −q where ψ = ( θ, σ1 , σ2 ) T and q is the resolution desired for convergence.Under reasonable smoothness and regularity assumptions, the standard nonlinear regression approximation theory for asymptotic (as n → ∞) distributions can be invoked (outlined in [11,13,20] and Chapter 12 of [35]).Smoothness requirements are easily checked for this example and regularity requirements involve, among others, conditions on the manner in which observations are taken as sample size increases.This theory yields that the sampling distribution for θW LS is approximately a p-multivariate Gaussian with mean E[ θW LS (d (1) , d (2) , d (3) )] = θ 0 and covariance matrix cov[ θW LS (d (1) , d (2) , d (3) . Since θ 0 , σ 0,1 , σ 0,2 , σ 0,3 are unknown we must approximate them in Σ 0 and we obtain Here χ( θ) = ∂Y ∂θ is the n × p sensitivity matrix where Y is a column vector of model representation of the data given by Y = (Y (1) , Y (2) , Y (3) ) T .The sensitivity matrix is then defined by θ= θ , with the quantities C, X, and Z defined as follows.Let C i be the linear operator such that Y (i) = C i X, X denote the state variables of the model, X = (S, E, S V , E V , I, I V ) T , and let G(X(t), t) denote the model dynamics so that dX dt = G(X(t), t).Then z is defined by z = ∂X ∂θ and satisfies the sensitivity equations Note that z depends on Ẋ(t) = G(X(t), t), a nonlinear system of ODEs, and its solution X, and is therefore not available in closed form.We thus use the numerical solution of both X and z in the estimation of Σ.
The standard errors for θ = ( β, κ0 , κ1 , δ) T can be computed from the diagonal entries of the covariance matrix, SE( θk ) = Σkk .These standard errors do not reveal how close the estimates are to the 'true' values for the parameters.But rather, they are an indicator of the reliability of the estimation procedure we have used to obtain them.Small standard errors can provide confidence in the values obtained for the parameter estimate, but do not indicate that these values are 'correct' in any way.On the other hand, if the standard errors are relatively large, our ability to estimate that parameter accurately is suspect.The standard error in that case would not suggest reasonable confidence in the corresponding estimate.
Once the model has been calibrated to the Australian population, i.e., parameter estimates found for β, κ0 , κ1 , δ using the data from 2002-2004, we proceed by estimating vaccination parameters that would quantify the impact of the widespread vaccination program beginning in 2005.To this end, we use the monthly case notification data from January 2005 through December 2005 to estimate θ = or θ = δ.The algorithm is simpler in the sense that we are now using one type of data and therefore can use an ordinary least squares (OLS) approach to find the parameters θ = ˆ or θ = δ that minimizes the objective function Since this expression does not involve σ 4 , an iterative process is not required here and we can first determine the values of the parameter estimates before using the standard formula to arrive at an estimate for the observational variance where n 3 = 12 and now p = 1.
We can apply the asymptotic theory of distributions again and see that the sampling distribution of θOLS (d (1) ) is approximately a Gaussian with mean E[ θ(d (1) Again we estimate Σ 0 by Σ( θ, σ4 ).Further computations are analogous to those described above.

3.3.
Parameters calculated from literature.Many of the model parameters are available from various sources of scientific literature.In this section we briefly discuss which parameters can be reasonably estimated from reported values.Details, if omitted here, of these calculations along with their explanation can be found in the Appendix.The information here was gathered from primary scientific research papers, S. pneumoniae reviews, or from the website of the Australian Bureau of Statistics.All resources used for the calculation of each parameter are cited next to their values in Table 2.
Parameters α, ρ, ω, and τ are readily calculated from literature.The length of asymptomatic colonization, or carriage, has been described for children and adults in a review by Bridy-Pappas et al., [8].This information together with the age-sex profile of the Australian population [5] was used to calculate a population average for α.The age-sex profile was also used to calculate the population average for the rate of loss of protection from vaccination, which occurs about three years after PPV23 vaccination of adults over 65 years of age.
Two parameters governing the oscillatory behavior of the infection rate were calculated directly from observing the infection patterns in the monthly data.It was noted that the infections peak annually, and thus the frequency ω = 2π 6 , and that they peak in July, so the phase shift is then τ = 8.
The proportion of the population who is considered to have a higher risk of progressing to infection is not easily quantified, so we have chosen a value that seems reasonable.The magnitude of this parameter seems to primarily affect the value for the vaccine efficacy, δ, with smaller values of l corresponding to smaller values of δ.Thus, we have chosen l = 0.05 simply because this value consistently results in 0 < δ < 1.Therefore, our estimates of these parameters are not going to be absolute, but rather allows for the discussion of relative changes in the values of δ.
The rate of vaccination can be estimated by a description of the recommended target groups and an estimate of the actual vaccine coverage attained reported in [31,34].The first of these reports contains estimates for coverage as of 2001, and the These numbers reported do not account for the individuals who do not respond to the vaccine, and therefore this is not an effective vaccination rate.However, in vaccine efficacy studies, this occurs in less than 5% of those who have received either of the vaccines considered here.Therefore we do not distinguish between an effective or practical vaccination rate.
Initial estimates for the parameters governing the overall population growth, λ and µ, can be obtained from the Australian Bureau of Statistics website.They have reported there the annual number of births and deaths for the years 1995-2005 along with the estimated resident population estimates.The calculation of these parameters from the reported births and deaths is discussed in the Appendix.However, these parameters do not produce solutions that agree well with the population data, and the model drastically underestimates the resident population growth with the calculated parameters.Since the calculation of both parameters, λ and µ, are equally reliable, both values were slightly changed to produce better agreement between the model total population and the reported population size of Australia.Further discussion of the values used for these parameters can be found in the Appendix.
The reduction in colonization events has been measured in previous vaccination trials and in other populations after the implementation of the PCV7 vaccine.These estimates were used to calculate an estimate for the parameter during the period of January 2002 -December 2004.However, in populations where the use of the conjugate vaccine has been widespread, significant reductions in colonization have been observed, even in unvaccinated persons.While this parameter in our model literally quantifies the direct reduction in vaccinated persons, a reduction in this value when the model is fitted to the set of data beginning in January 2005 would indicate that either the protection from colonization is greater than thought, or herd immunity could be occurring.On the other hand, a large value of would indicate that serotype replacement has occurred to such an extent as to increase the number of colonized individuals.This particular event would be disconcerting as it would implicate the increased threat of serotype replacement in invasive infections, a threat that the scientific and medical communities are aware of, although no documentation has been found to support or dispell this fear.
In the literature, there is minimal information concerning the recovery rates for pneumococcal infections, and certainly no mention of population averages for this parameter.However, we can reason intuitively that it is not common to recover from these serious infections in under 2 weeks.Similarly, we can guess that it is probably rare to remain ill for more than 2 months.Thus, a broad reasonable range for the recovery rate is 1  2 ≤ γ ≤ 2. Since the data we have obtained contains virtually no information about this parameter (discussed in Section 4) we numerically explore estimates of the other model parameters for fixed values of γ in this range in Section 5.
The disease-induced mortality rate can be calculated directly from the annual reports of case fatalities under the assumption that the recovery rate, γ >> 1/year.This assumption would allow us to justify the approximation  [4].The estimates discussed here are by no means exact and therefore sensitivity of estimated model parameters to these initial conditions are later considered in Section 5.The number of cases of IPD reported for December 2001 is 38 [10], but the vaccination status of these cases are unknown, and we initially take I(t 0 ) = 33 and I V (t 0 ) = 5.Many studies have estimated carriage prevalence rates in various populations, especially in light of the effects that conjugate vaccination may have on this aspect of pneumococcal diseases.Here we use studies that have focused on the Australian population (both aboriginal and non-aboriginal) in an effort to quantify the proportion of the population that will be colonized at any given time.In a review of the relevant literature, we have found carriage rates in Australian aboriginal children to be 89% ( [16]), 90% ( [36]), and 49% ( [39]), which results in an average of 76%.Non-aboriginal children in Australia were found to be colonized at rates of 29% ( [17]), 43% ( [36]), and 25% ( [39]), or an average of 32.3%.Nasopharyngeal colonization is not as commonly quantified in adults, so the data here is more sparse.Carriage rates in aboriginal adults are around 34% ( [16]), and in a study of UK households, the adult population there (assumed to have characteristics similar to Australian non-aboriginal adults) had about an 8% colonization prevalence ( [19]).These numbers, in conjunction with the age and sex profiles presented in [5] and [6], allow for the estimation of the colonization prevalence, which we determine to be around 10-15%.
The population for whom pneumococcal vaccination is recommended consists of adults including and above the age of 65 years, and indigenous adults over 50 years of age.Indigenous children 0-5 years of age in Central Australia and all indigenous children 0-2 years of age were also recommended to be vaccinated.The vaccine coverage as of 2002 is discussed in [31], and while it seems to be well quantified in adults (≈ 26.5%), the coverage attained during the year 2001 among targeted children is not mentioned.To begin, we assume that also 26.5% of the children have been vaccinated.Later in the model calibration section, we will discuss any errors that this could have introduced into our estimates for model parameters by using initial conditions where 0 and 100% coverage of targeted children is assumed.By this reasoning, we arrive at the estimates for the model initial conditions as shown in Table 3. 4. Algorithm testing.As routine good practice before use with experimental data, we explore the strength and weaknesses of the algorithm with simulated data from model quantities which we believe the surveillance data represents.Through this examination we are able to test the convergence of the parameter estimates θ to the known values θ 0 .In this way we are able to identify which parameters we can estimate from our data set.Further, we are able to explore how the reliability of the algorithm, and hence that of the estimates, may change as error is introduced in the observations.This testing may also suggest a poor interpretation of the surveillance data if the algorithm gives drastically different results when using the observed data as compared to the generated data.Thus this procedure provides us with a means to avoid common pitfalls during the estimation process.
We generate a set of data Y (1) , Y (2) , Y (3) that corresponds to what we have on hand, described in Section 3.1 with a fixed set of parameters ψ 0 = (θ 0 , σ 0.1 , σ 0.2 ) T .We initiate the iterative weighted least squares estimation process (Section 3.2) with an initial guess ψ that is slightly perturbed from the ψ 0 .If the estimation procedure is working well, the ψ should converge to the 'true' set rather quickly.We look to the standard errors as one indication of the ability of the algorithm to estimate that particular parameter using the data set.
In reality, no data set is free of observation error, and we would like to see how well our algorithm performs when the data contains some noise.To this end, we add error by sampling from a normal distribution with mean 0 and constant variance ( as per the assumptions of our statistical model (equations ( 8) -( 10)).The magnitude of this variance determines how much noise is added.We can expect that 99% of the time, numbers generated from this distribution will be within the interval [−3σ i , 3σ i ].Thus we take Table 4. Algorithm testing for parameter estimation of θ = (β, κ 0 , κ 1 , δ) T .The model was fit to generated data with k = 0, 1, 5, 10% noise where subscripts in ψ (k) denote then the level of noise in the simulated data. where We add noise that is scaled to our observations, but with variance that is constant with respect to time.For k% noise then, we take σ i = k 100 * average j (d (i) j ), so k scales the constant variance relative to the magnitude of the observations.
The results from the testing of the algorithm's ability to estimate the parameters ψ = ( β, κ0 , κ1 , δ, σ1 , σ2 ) T with k = 0, 1, 5, 10% noise are displayed in Table 4 and Figure 2. The estimation procedure appears to be reliable for all parameters since the values for the parameters are close to their true value and the standard errors SE( θk ) are considerably less than their corresponding parameter value θk .Initially, the standard error of all parameters are many orders of magnitude less than their parameter value, but the standard errors increase with added noise, indicating that our ability may depend on the amount of observational error in the data.Still the standard errors are sufficiently small to suggest confidence in parameter estimates.
As discussed in Section 3.3, there is relatively little information on recovery rates from or duration of invasive pneumococcal diseases, so our ability to estimate this rate (γ) is of interest.However, when the parameters to be estimated are expanded to include γ, its estimated value does not converge to its true value (shown in Table 5).Although the fit to clean data is not discouraging, we see that with only 1% noise the standard error SE(γ) for the recovery rate increases dramatically, and we cannot justify estimating this parameter.While we cannot reliably determine a value for this parameter with data of the type used in this paper, it is nevertheless  The lack of information concerning the parameter l, the proportion of the susceptible population which is particularly likely to develop infections, motivates us to attempt to estimate this parameter with our data set.The results of this are shown in Table 6, and suggests that the data available does not enable us to reliably estimate this parameter since the estimated value δ, is significantly different from the true value δ 0 , and the standard errors of both δ and l are quite large.Notably, the parameter most affected is δ, suggesting that it is reasonable to fix l to a value that restricts δ to a reasonable range.7. Top panel: monthly cases; bottom left panel: annual unvaccinated cases; bottom right panel: annual vaccinated cases.
5. Model calibration.Using the iterative weighted least squares procedure described in Section 3.2, we find a best fit solution to the Australian IPD data (Figure 3) and estimates for the model parameters, shown in Table 7.The model solution and data agree well, and parameter values are on the scale of our initial guesses, although their values differ slightly to minimize the cost function in the functional (11).Further, the standard errors are relatively small and suggests that the estimates obtained here are reliable (as we would have expected based on the results in Section 4).These parameters are displayed several times in boldface throughout this section for ease of comparison.The data fits in Figure 3 reveal that the model solution with the parameters shown in Table 7 fits the Australian surveillance data from 2002-2004, with the top panel showing the fit to the monthly case notification data, the bottom left panel the unvaccinated cases reported annually, and the bottom right the annual vaccinated cases.The demographic parameters here are λ = 25, 000 and µ = 0.0003 instead of those reported on the Australian Bureau of Statistics website, since those parameters seemed to underestimate the resident population of Australia, as discussed in further detail in the Appendix.For the sake of consistency of the model to the Australian population, we use these values throughout the rest of the paper.The initial conditions and vaccination rate used to generate this solution are calculated under the assumption of 26.5% vaccine coverage of both the adult and child target groups.This is fairly accurate for the adults, but no data is available on the coverage for the younger target group at that time.We next explore how the calibrated parameter values might change in the event that the vaccination rate and initial conditions have been miscalculated.The effects of changes in the vaccination rate φ on the estimated parameters are given in Table 8.When φ is decreased by about 15% there are no notable changes in the values from the original value of φ (displayed in boldface), and only δ changes by about 80% if φ is increased by an order of magnitude.Since we are not attempting to report an 'absolute' value for δ accurately, but are instead looking for relative changes in this value based on data before and after 2005, there is no cause for concern.Additionally, it is not likely that we have erred this greatly in the calculation of the vaccination rate.In the annual report of 2005, it is stated that 51% of adults over the age of 65 had been vaccinated, which would be reflected by a vaccination rate of φ ≈ 0.001.However, the model and annual data with this  vaccination rate no longer agreed, even in their general trends, as is evident by the large estimated observational variance σ2 .Thus we conclude that the current value of the vaccination rate, for this set of initial conditions, appears adequate.
In the event that our assumption that 26.5% of targeted children to be vaccinated is incorrect, the initial conditions along with the vaccination rate would be erroneous.In Table 9 we investigate the effect that this could possibly have on the estimated values of the parameters.Results of all three inverse problem calculations appear to be essentially identical with only minor differences in the parameter values.These results suggest that possible errors in the calculation of φ and the initial conditions are inconsequential in the estimation of the desired parameters.
Another possible source of error in our calibration may be the recovery rate, γ, on which there is relatively little information.However, we are able to reasonably deduce a range over which this parameter might vary.To investigate the effects of large changes in γ on our parameters we proceed by fixing γ = 1 2 and γ = 2.The corresponding results are displayed in Table 10.Again, the estimates of all parameters change minimally and we conclude that even large changes in the recovery rate γ do not affect our estimates.
The vaccination status of some of the cases were classified as partial for some individuals who had not received the recommended doses for their age group.Recall that this is not relevant for elderly receiving the polysaccharide vaccine, due to this being given as a single dose.In the previous fits partial vaccinated cases were considered fully vaccinated, resulting in an overestimation of the value for δ.Parameters were estimated from data in which 'partial' vaccinated cases were considered unvaccinated.This is an underestimation of the cases due to vaccine failure and therefore also an underestimation of δ (and an overestimation of the vaccine efficacy 1 − δ).The parameters from both of these fits are displayed in Table 11.
With the exception of δ, the estimates for all of the parameters change very little.This agrees with our intuition since the only difference in the data sets used to estimate these parameters is the annual data, which contains mostly information on δ and relatively little on the other parameters.Thus, this numerical experiment has provided us with a range of where a feasible value for δ during the time period 2002-2004 likely lies, i.e., δ ∈ [0.35, 0.72].
To test the assumptions of the statistical model that we have chosen to represent our data, we plot the residuals between the model and observations as a function of the model, that is, |d 4).The lack of a clear relationship between these two quantities would indicate that our assumptions are reasonable and the errors of each observation does not depend on the model values.However, we see six groups of points, which can be explained by the oscillatory pattern of the infections.In the top panel we have plotted just one half of the period of the infection rate and see a completely random pattern, indicating no relationship among these quantities.When we extend this time period for another half of a period, thus plotting an entire period in the middle panel, we see that there are two points in each group of points.Thus, the pattern observed is driven by the seasonality of the infections and not by any incorrect assumptions.On the contrary, only a pattern in the dependent variable (the residuals) would suggest that incorrect assumptions have been made.This analysis suggests that it is reasonable to assume constant variance among observations of the same type, providing support for the statistical model underlying the parameter estimation procedure.
A similar plot of the residuals from the best fit solution to data in which partial vaccination is considered equivalent to unvaccination is not shown here.The difference between these two data sets lies in the annual data, and so the monthly data is unchanged.Although the vaccine efficacy parameter δ is drastically different when estimated from each data set, this does not appear to affect the model fit to monthly data, as can be seen from Figure 5.In fact, when the difference between the two monthly solutions are plotted the solution is in the range of [−1.5, 2] and varies around zero.In this section, we have tested various sources of error as well as assumptions of our statistical model used in the estimation.We have confirmed that the estimation of these parameters for the purposes of calibrating this model to the Australian invasive pneumococcal disease data has been successful.The calibrated parameters are then taken as an average of the two sets displayed in Table 11 and are shown here in Table 12.We seek to find vaccination parameters that minimize the difference between the model and observations via least squares estimation (see Section 3.2).Thus, we are looking for values of and δ that could have resulted in the observed number of cases.We do not estimate the vaccination rate φ since there is sufficient information to determine a realistic range for this parameter in the annual report concerning the data collected during the year 2005 [34].It is reported here that 91% of children under the age of 2 were vaccinated during the year 2005.Neglecting the loss of protection during that time period, especially since this is unobserved in children and most children were previously unvaccinated, we calculate a vaccination rate of children.The proportion of adults 65 and over who had been vaccinated as of 2004 is reported here as 51% and we calculate a range of φ that would have resulted from zero to 100% vaccinated.Additionally, the number of cases that occurred in vaccinated or unvaccinated individuals is given in the 2005 annual report (again, this information was reported in 70% of cases and we have applied these proportions to characterize the remaining cases in the data set).While this only represents one annual data point and hence is not appropriate for least squares estimation, it does provide additional information against which we can verify the validity of our fitted solutions.
We first consider changes in that could have resulted in the observed decreases in infections during the year 2005 with estimates displayed in Table 13 and a sample fit shown in Figure 6.Initial conditions for these fits have been taken as state variable quantities when the model has been run with calibrated parameters to the time step corresponding to January 1, 2005.All estimates for are drastically lower (by about 90%) than the feasible lower bound for this parameter, which is 0.93875.This parameter is strictly interpreted here as representing the direct reduction in colonization in vaccinated individuals, an effect which has only been observed in protein conjugate vaccination.A substantially lower value, like those estimated here, could suggest herd immunity, an effect in which the vaccination of a portion of the population provides protection for the unvaccinated portion as well.In this case, herd immunity could be interpreted as a decrease in infections in unvaccinated individuals who are colonized at a lower rate due to the decrease in colonized vaccinated individuals.This suggests that the unvaccinated cases would be overestimated by our model and the vaccinated cases underestimated if herd immunity is occuring which is precisely what occurs across all three vaccination rates used.The standard errors are notably large, and do not provide additional confidence in our estimates.However, the annual report did note that there were reduced infections in unvaccinated groups, and thus, it is likely that herd immunity is occurring.However, other changes in the population could have occurred which would have also produced a reduction in infections.
The PCV7 vaccine is thought to be much more effective in children than the PPV23 is in the individuals for whom it is recommended, and the large increase of PCV7 vaccine could have resulted in improved protection of vaccinated individuals, notable at even the population level.With this in mind, we have held = 0.93875 constant and estimated δ as δ = 3.22e −14 .The standard error SE( δ) = 2, suggests that we will not be able to get a reliable estimate for δ from this data with the parameters used.Also, the number of vaccinated individuals estimated was 7.58e −11 , which is, of course, unrealistic.We conclude that it is not likely that a sufficiently large decrease in δ could have accounted for the reduction of infections observed in 2005.With the increased vaccination of recommended groups it is likely that the parameter, l, would decrease.This parameter, which scales for the relative risk of the individuals in the unvaccinated population to that of the vaccinated population, is not easily quantified and we were unable to calculate any reasonable estimate of it.We do not attempt to estimate this parameter here, instead, we note the effect of a 20% decrease in l on estimates of and δ (Table 14).Most notable in this experiment is the substantial change observed in the estimate of δ, which could indicate that a smaller herd immunity effect is occurring and also the population vaccine efficacy has changed when PCV7 is implemented.This indicates that some measurement of the relative risk to pneumococcal infections is of interest to those wishing to understand the effects of the vaccination programs.
Another aspect of immunization by PCV7 is that the long-term effects on a population are essentially unknown, so the effects of the new vaccination program should be evaluated as time progresses.If we overlay a best fit solution of the model with the data up to June 2007, we see a general underestimation of cases after 2005 (Figure 7), with the effect exaggerated in 2007 as compared with 2006.The data suggests that the landscape of pneumococcal disease dynamics is changing on a yearly basis under this vaccination program.If the changes could be explained via vaccination rates alone, we would likely see an overestimation of the data by the model, since the vaccination rate, φ, has likely decreased after the first year of the federally funded program.Since the opposite is observed, it appears that the values of or δ (or possibly both) have increased over these time periods.However, we are not able to estimate these parameters reliably with the available data, as their standard errors are an order of magnitude larger than their values and thus are not shown here.Although we are not able to reliably estimate or δ over the 2006-2007 time period, we see by inspection better agreement between the model and data when has been increased from 0.353 to 0.7 (Figure 8 left) or δ has been increased from 0.53551 to 1 (Figure 8 right).Here, the cases are still underestimated considerably, and it is likely that even greater increases in these parameters would have produced the data.Increases in could indicate serotype replacement by non-vaccine serotypes, a possibility which authorities are aware of, but no conclusive evidence has yet been found.Similarly, an increase in δ would indicate that the vaccines are, in general, less effective in the population than in previous times.Again, this would suggest either an evolutionary change in the pneumococcus, or the host response to the vaccines.In either case, a reevaluation of the vaccination programs would be necessary.These results are presented not to support or dispute specific changes associated with the Australian population or the pneumococci prevalent there, but rather, to demonstrate that changes appear to be occurring at a rapid pace.These studies suggest further close and quantitative study of the vaccination programs.
7. Discussion and concluding remarks.In this work, we have calibrated a mathematical model describing the dynamics of pneumococcal infections in the Australian population.We have estimated via a weighted least squares procedure pertinent model parameters of interest, but not typically measurable or reported in literature.All standard errors for these parameters provide confidence in the values obtained, even when initial assumptions have been relaxed as discussed in Section 5. We have shown in Section 4 that certain parameter values, namely the vaccine efficacy and infection rate parameters, depend on the relative risk of unvaccinated to vaccinated individuals, a measure not available in literature.Studies quantifying rates of infection in populations comparing predisposing effects of certain risk factors would provide some measure of this effect.With this knowledge, we would be better able to determine a suitable value for l, which is otherwise complicated by the movement of "at risk" individuals from the unvaccinated to the vaccinated classes.Alternatively, we could explicitly model the distinction between individuals who are more or less likely to proceed to infection upon colonization.However, this would introduce additional complexity which would likely result in difficulty in the least squares determination of model parameters.Here, the parameter values reported allow for the study of changes in the effects of the immunization programs before and after 2005.
Once pneumococcal vaccination was made publicly available to large recommended risk groups, a notable decrease in infections was observed.Our analysis suggests that this reduction is not solely accounted for by an increase in vaccination, but also a decrease in vaccinated colonized individuals, and even appears to suggest the presence of herd immunity in 2005.The potential impact of this reduction of colonization due to PCV7 is very promising from a public health perspective, but serotype replacement has been observed in some populations in the colonization stage, and its effects are uncertain.It is possible that widespread and long-term vaccination by PCV7 could provide a selective pressure for the increased invasiveness of non-vaccine-included pneumococcal serotypes.
We have shown that vaccination parameters which accurately describe the infections for 2005 underestimate the cases for 2006 and 2007.In contrast, changing parameters to reduce the effect of the vaccines provides better agreement between the model and observed data.These results do not conclusively support any one effect generating this behavior but clearly suggest that changes are occurring within the Australian population, such as susceptibility to infection, and possibly the evolution of the pneumococci themselves endemic in that area.The dynamics of S. pneumoniae infections, along with many aspects of their vaccination, are highly dependent on age and other physiological factors such as nutrition and previous infections.As such, mathematical models structured in this way would better be able to comment on specific mechanisms responsible for disease persistence, and to assess the impact of vaccination programs.However, here the data available in this case does not motivate the use of such a framework, and the simpler model is more appropriate.At any rate, the analysis here clearly suggests that the high rate of PCV7 immunization will likely result in a drastic change of the landscape of pneumococcal disease dynamics, and requires close monitoring and quantitative study by public health officials.8. Acknowledgements.The authors would like to express thanks to Alun Lloyd and Eunha Shim for helpful discussions and insight.This research was supported in part by the National Institute of Allergy and Infectious Diseases under grant 9R01AI071915-05, the National Science Foundation under grant DMS-0502349, and an Achievement Rewards for College Scientists Foundation Scholarship, generously donated by Ralph and Sandra Matteucci.

Appendix.
Data.In the annual reports of the pneumococcal surveillance data there is a slight record-keeping discrepancy that is evident when comparing the annual number of total infections reported on the NNDS website and in the papers.There is no explanation for this in the reports, and thus we have no reason to assume that there are known errors in the monthly data presented online.Here we describe the manner in which the vaccinated/unvaccinated cases from the annual reports were converted for use in the least squares estimation.
The total number of cases for which vaccination information is reported (tot v ), along with the number of cases considered fully vaccinated (f ull), the cases which occurred in partially vaccinated individuals (part) and those which occurred in unvaccinated individuals (un).Unless otherwise noted, partially vaccinated cases were considered equivalent to fully vaccinated cases, and the vaccination data was then calculated by the formulas where total refers to the total annual cases reported on the NNDS website.
Parameter calculation.In this section we describe in detail the calculation of the parameters which are not estimated in the paper.
• α: The average length of carriage is 4 months in infants (up to 24 months old), and 2-4 weeks in adults [8].The age-sex profile of the Australian population was found in the Health and Ageing Factbook 2006 [5], with children ages 0-4 comprising 6.2% of the population.Since the length of time in this compartment adversely affects (increases) the number of colonization events, an overestimate of this parameter would result in an overestimate in colonized individuals and thus, infections.Also, it would likely be more reasonable that there is some continuity in the colonization period as a function of age, so while this average overestimates the period for children 2-4 yrs old, it likely underestimates the period for children 5-10 yrs old.Therefore, it is reasonable to estimate α as α = 1 4 * 0.062 + 1 0.7 * 0.938 = 1.3555.
• ρ: Loss of protection from vaccination only occurs in adults ≥ 65, and antibody levels reach pre-vaccination levels in ≈ 3 yrs (36 months) [1].The proportion of the elderly individuals, both non-indigenous and indigenous, in the target group of vaccinated individuals is 1 2,548,936 * (40, 112 + 2, 475, 623) based on demographic information as of 2001, found in [5].ρ is then ρ = 1 36 * (40,112+2,475,623)   2,548,936 = 0.02741.For a description of the target population for vaccination see the discussion for the calculation of the initial conditions in Section 3.4.
• φ: The vaccination rate is estimated by assuming that the vaccinated and unvaccinated portions of the population have reached a quasi-steady state in that no notable exchange is occuring between the two kinds of classes.That is, no significant surge of vaccination or loss of protection change is reported in the annual summaries or observed in the annual data.If the system is completely free of infection, the steady state, or disease-free equilibrium is given by The proportions of the population in S 0 or S 0 V were calculated based on the vaccine coverage estimates used in calculating the initial conditions, and we then solved for the vaccination rate φ that would give those proportions.In this way, we have calculated vaccination rates corresponding to 0, 26.When the model is run with these initial conditions and vaccination rates, very little transfer between the vaccinated and unvaccinated classes is noticeable.This is by no means accurate and in the model calibration we have checked the effects of miscalculating the initial conditions, as well as various vaccination rates for a fixed set of initial conditions.In both cases, it appears that our initial guesses for initial conditions and vaccination rates are sufficient.
• λ: The number of births per year for the years 1995-2005 is found on the Australian Bureau of Statistics website at [2].An estimate for λ can be obtained by averaging the annual births for Australia over the years 2002-2004 and dividing this by 12. Thus, λ = 21, 011 births/month.
• µ: The number of deaths per year for the years 1995-2005 is found on the Australian Bureau of Statistics website [3], along with the annual estimated resident population.The IPD case fatalities were subtracted from the deaths for the years 2002-2004, and this average was then divided by the average resident population for those years.Thus, µ = deaths − IPD deaths total popn ≈ 0.000551.
• : Reduced susceptibility to colonization has been observed in children vaccinated by PCV7 in numerous studies.Three such studies' results showed vaccinated individuals susceptibilities being reduced to 80.2% [9], 86% [14], and 60.2% (average ≈ 75.5%) as compared to unvaccinated children.Two studies, however, showed no significant decrease in carriage in two high-risk populations ( [23], [24]).Notably, Australian aborigines are considered to be a very high-risk population, with some of the largest prevalence rates observed anywhere in the world.If we assume the carriage in aboriginal children could be reduced as well, and that children will make up at most 25% of the vaccinated population, then we would arrive at a lower bound for as ≈ 0. ∼ N (t l ) = S(t l ) + E(t l ) + S V (t l ) + E V (t l ) + I(t l ) for l = 1, ..., 12 and (t 1 , ..., t 12 ) = (3, 6, .., 36) (quarterly population estimates [4]).The demographic parameters, λ and µ, are estimated from the population data only, represented by the statistical model below (16).Again, we assume that Since this is a scalar system, we can use an ordinary least squares approach to estimate the parameters (θ = (λ, µ) T ) by minimizing the objective functional in (17) where n 3 = 12 and p = 2.
The standard errors for θ = ( λ, μ) T are computed analogously to the vector case where the covariance matrix is defined as Table 15 contains the results of fixing the demographic parameters, λ and µ, or estimating them via OLS.The first two columns show the residual when the model is compared with the data and the parameters are fixed at the displayed values.In the other columns, standard errors are reported for parameters which have been estimated, and otherwise the parameter is fixed at the displayed value.When we estimate both parameters, they quickly reach the constraints placed on them, regardless of how large the restraints are set.This suggests that we may not be able to estimate both of these parameters simultaneously.When estimating just one parameter at a time, we see that the value does not differ much from the initial guess, and from a comparison of the residual sum of squares, we see that not   much is to be gained from estimating either λ or µ.However, the model population corresponds much closer to the observed data when λ = 25, 000 and µ = 0.0003, and since it is important that our model follows the population over a period of years, we take these as the parameters used in the study.

4 .
j+12 j I(s)ds.Under this assumption η is calculated as Calculation of initial conditions.Initial conditions for the model are a snapshot of the Australian population in December 2001,

Figure 2 .
Figure 2. Best fit model solutions to generated data with 0% noise (left) and 5% noise (right).On each side, the top panels represent the monthly cases of IPD, and the bottom shows the annual data with the unvaccinated cases on the left side and vaccinated cases shown on the right.

Figure 3 .
Figure 3. Best fit solution to Australian IPD data with parameters shown in Table 7. Top panel: monthly cases; bottom left panel: annual unvaccinated cases; bottom right panel: annual vaccinated cases.

Figure 4 .Figure 5 .
Figure 4. Residuals as a function of model variables.Top panel is over the period January 2003 through June 2003, middle panel is for January 2003 through December 2003, and bottom panel is for all three years

Figure 6 .
Figure 6.Best fit solution to 2005 Australian surveillance data assuming a vaccination rate of φ = 0.007984.

Figure 7 .
Figure 7. Best fit solution from 2005 Australian surveillance data run out to June 2007.
25 * 0.755+0.75= 0.93875.Thus a feasible range for is ∈ [0.93875, 1].We hold = 1 fixed for the model calibration since the portion of vaccinated individuals who received PCV7 during 2002-2004 was small.Population data.Estimates of the resident population of Australia is collected quarterly, available on the website of the Australian Bureau of Statistics [4].We use the estimate in December 2001 in the calculation of the initial conditions, and the estimates reported for March 2002 through June 2006 to refine our estimates of λ and µ from those calculated in Section 3.3.These data are then of the form • d (4) l

Table 1 .
Descriptions and units for model variable and parameters.

Table 2 .
Estimates of "known" model parameters.

Table 3 .
Initial conditions of model state variables.
important to investigate any effect large changes in this value could have on our results.This is done in Section 5.

Table 9 .
Effect of varied vaccine coverage of children on the estimation of parameters ψ = ( β, κ0 , κ1 , δ, σ1 , σ2 ) T .The subscripts denote the percent vaccination coverage of targeted children used in the calculation of the ICs and φ.

Table 10 .
Model parameters using Australian IPD data from 2002-2004 where the value of the recovery rate γ is shown in parentheses.

Table 11 .
Model parameters using Australian IPD data from 2002-2004.The second column contains parameter estimates from data with partial vaccinated cases considered as unvaccinated, and the fourth column considers partial vaccination equivalent to full vaccinated.

Table 13 .
Estimation of , the direct reduction of colonization due to vaccination, from 2005 Australian IPD surveillance data.The observed unvaccinated and vaccinated cases are 1211 and 537, respectively.The three vaccination rates used are based on 0, 75, 100% increases in vaccinated adults.Effect of increased vaccination.As of January 1, 2005, pneumococcal vaccination in Australia was offered free of charge to all adults over the age of 65, indigenous adults over 50, and children under the age of 2. Free vaccination of adults of these ages had been offered in some provinces such as Victoria since 1998, and for some children, but it was not until 2005 that vaccination was funded nationally.In this section, we use the model with parameters estimated from the 2002-2004 data together with the 2005-2006 data to investigate the impact of the vaccination program on the Australian population.

Table 15 .
Estimation of demographic parameters, θ = (λ, µ) T , from quarterly Australian population estimates.Standard errors, SE( θ) are displayed next to parameters estimated via OLS.The first 2 columns shows the residual sum of squares between the model and observed data when θ = (λ, µ) T are fixed at the displayed values.The remaining columns show the results of estimating both or one of the demographic parameters.*denotesestimatedvalues coinciding with the imposed limits on the feasible parameter space.RSS 4.5096e115.7456e 8.6146e8