MODELING AND PREDICTION OF HIV IN CHINA: TRANSMISSION RATES STRUCTURED BY INFECTION AGES

HIV transmission process involves a long incubation and infection period, and the transmission rate varies greatly with infection stage. Consequently, modeling analysis based on the assumption of a constant transmission rate during the entire infection period yields an inaccurate description of HIV transmission dynamics and long-term projections. Here we develop a general framework of mathematical modeling that takes into account this heterogeneity of transmission rate and permits rigorous estimation of important parameters using a regression analysis of the twenty-year reported HIV infection data in China. Despite the large variation in this statistical data attributable to the knowledge of HIV, surveillance efforts, and uncertain events, and although the reported data counts individuals who might have been infected many years ago, our analysis shows that the model structured on infection age can assist us in extracting from this data set very useful information about transmission trends and about effectiveness of various control measures.


1.
Introduction. Human immunodeficiency virus (HIV) has been one of the major public health problems worldwide: in 2005, an estimated 38.6 million people were living with HIV, an estimated 4.1 million became newly infected, and an estimated 2.8 million lost their lives to AIDS [17]. HIV has also been officially recognized by the Chinese government as one of the four infectious diseases that impose a significant challenge to the country's economic growth and public health: in 2005, the basic reproductive number is defined and the threshold for the persistence or extinction of disease is found [20]. Zhou, Ma, and Brauer formulated a discrete model to investigate the transmission of SARS in China, where model parameters were estimated using available but limited statistical data, and numerical simulations were carried out to mimic closely the SARS transmission process in China [21].
In summary, the characteristics of epidemics caused by HIV with long-term infections indicates sensitive dependence of the transmission dynamics on the transmission rates in different stages of infection and calls for infection age-structured models. On the other hand, such models will naturally involve a large number of infection-age-dependent parameters, and estimating these parameters based on very limited surveillance data is a challenging task. The fact that the yearly reported data does not include those newly infected makes the mathematical modeling, the parameter estimation, and simulations even more complicated and difficult. In this paper, for the discrete epidemic model with infection age structure we developed, we propose a simplification procedure based on the epidemiological features of HIV in China and the available data from the Chinese Center for Disease Control's State Key Laboratory for Infectious Disease Prevention and Control to reduce the problem of estimating a large number of parameters to the estimation of a baseline transmission rate and the functional relation between the transmission rates and the viral load of an infected individual.
The model formulation and the calculation of the basic reproduction number are presented in Section 2, while the proof of the fact that the basic reproduction number characterizes the long-term dynamics is given in the appendix. The parameter estimation and projection of HIV in China for 2006-2010 are carried out in Section 3, and a short summary and some discussions are given in Section 4.
2. Model formulation and stability. As discussed earlier, the infectious period of an HIV-infected individual is very long and the transmission rate of HIV-infected individuals varies greatly in different periods of infection. There are no clear symptoms for a long time after the infection, so that infected individuals can not be identified and recorded timely. The number of the reported HIV-infected individuals each year thus consists of individuals who were infected in different years and are in different infection stages.
This motivates the use of an infection-age-structured HIV transmission model, where individuals are assumed to be in one of the following epidemiological groups: susceptible (at risk of contracting the disease), infected (those who were infected and are capable of transmitting HIV, but have no symptoms), AIDS (those who have various symptoms). We also note that an HIV-infected individual usually develops AIDS after eight or ten years, and thus we divide the infected group into twelve subgroups, indexed by the infection age (year) j; j = 1, 2, · · · , 12. Later, for the purpose of parameter identification we also lump together certain infection ages according to their viral loads and CD4 cell numbers. The idea is similar to that in [8,11,5,15]. More precisely, at time t let S(t) be the number of the susceptibles, I j (t) the number of the infected with infection age j, and A(t) the number of AIDS. Let β j , µ j , and α j be the transmission rate, the death rate, and the AIDS rate of the infected individual in the group j, respectively, and δ the death rate of AIDS individual. The AIDS rate α j is introduced to describe the slow and fast progression of AIDS: some HIV infected individuals will enter the AIDS stage faster, some may be slower. We assume that α 1 = 0, α j ≥ 0, j = 2, 3, ..., 11, and α 12 = 1 − µ 12 . The transfer among those groups is shown in a schematic diagram (see Figure 1), where the standard incidence rate is used for the transmission rate: From the the schematic diagram, we formulate the following discrete HIV transmission mode with infection age structure: 11, where N (t) is the total population at time t, namely, Therefore, we conclude from the first equation of the above model that S(t + 1) ≥ 0 for all t ≥ 0 provided S(0) ≥ 0 and max 1≤j≤12 β j ≤ 1 − µ. This shows the positivity of solutions.
Using the following variables and changes of parameters, we can rewrite model (1) as , j=1β jĨj (t), I j+1 (t + 1) =Ĩ j (t), j = 1, 2, 3, ..., 11, Let where 1 − µ 1 − α 1 is the rate at which an infected individual can reach the second is the rate at which an infected individual can reach the j th year, and β j is the new infection that an infected individual can generate in the j th infection year. Therefore, the above number gives the average number of secondary cases generated by a primary case in the considered pool of susceptible individuals, though as shown in [6] this is not necessarily true in general. Nevertheless, as shall be shown mathematically, the basic reproductive number R 0 is indeed the threshold parameter to determine the stability of the disease-free equilibrium and the existence of an endemic equilibrium. So, R 0 serves as a threshold parameter that predicts whether an infection will spread. Simple algebraic calculation shows that there is a unique disease-free equilibrium E 0 of (2) with S 0 = Λ/µ, I 0 j = 0, j = 1, 2, ..., 12 and N 0 = Λ/µ, if R 0 ≤ 1. There exist disease-free equilibrium E 0 and endemic equilibrium E * of (2) if R 0 > 1, with The stability of the disease-free equilibrium E 0 of model (2) is completely determined by the magnitude of R 0 as shown in the following theorem.
We defer the proof to the Appendix. The endemic equilibrium E * of (2) exists when R 0 > 1. At the endemic equilibrium E * the matrix of the linearized system of (2) is The stability of the endemic equilibrium is determined by the locations of the eigenvalues of matrix L * , and a full stability analysis of the endemic equilibrium seems to be difficult because of the complicated expression of the equilibrium's coordinates and because of the high order of the polynomial f . We can, however, estimate the root of f (ρ) = 0 for sufficiently smallĨ * 1 /Ñ * , a situation naturally anticipated for Other roots are either real and negative, or complex, and are given by 1 = 12 Therefore, for sufficiently smallĨ * 1 /N * and for ρ with max{1 − µ, 1 − δ} < ρ < 1 we have Consequently, for sufficiently smallĨ * 1 /N * , f (ρ) = 0 has three positive roots: one is close to 1−µ, one is close to 1−δ, and the other is in the interval (max{1−µ, 1−δ}, 1). In addition to these three positive roots located in (0, 1), other roots of f (ρ) = 0 have moduli less than 1 for sufficiently smallĨ * 1 /N * . Whether the endemic equilibrium E * is globally asymptotically stable or not for the realistic set of parameter values remains an open problem, though our simulations seem to indicate the convergence of solutions to this endemic equilibrium for a wide range of parameter values and initial data. As an illustration, we use the following set of parameters (see next section for parameter identification and estimation): .., 12, µ = 0.006, δ = 0.2 and Λ = 1000. The basic reproductive number is thus R 0 = 3.21. The endemic equilibrium of (2) is The fourteen roots of the equation f (ρ) = 0 are It is easy to see that |ρ j | < 1 holds for all j = 1, 2, ..., 14; hence, the endemic equilibrium is asymptotically stable.  Figure 2: The simulation of model (2) with three sets of initial data specified in the text, for the total population N (t), the total number of infected individuals 12 j=1Ĩ j (t) in all 12 groups, the total number of susceptiblesS(t), and the total number of infected individualsĨ 1 (t). The simulation indicates that solutions corresponding to different initial data are all convergent to the endemic equilibrium E * .
In the simulation reported in Figure 2, we use the following three sets of initial values for our simulation: The simulation results are shown in Figure 2, indicating the convergence of solutions to the endemic equilibrium.
3. Estimation of transmission rates and prediction for 2006-2010. In this section, we use a simplified version of (2) to estimate the transmission rates and to make a prediction for the HIV trend in China during 2006-2010.
There are several factors we should consider when we estimate the parameters in our model. At first, the statistical data of HIV is the reported number. The reported number consists of infected individuals who might have been infected many years ago. Therefore, the annual reported number can not be regarded as the newly infected number. Second, the statistical data contain many variations resulting from the knowledge of HIV, the surveillance efforts, and uncertain events. Third, there are too many model parameters to be determined, and hence it is necessary to reduce model parameters by some epidemiological facts.  Figure  3). These three periods reflect different ways the surveillance system was set up over the period 1985-2004. There was very limited knowledge of HIV infection and virtually no surveillance system in the 1980s, when HIV infection was first reported in China. The surveillance system was then established gradually as more and more attention to HIV infection was given at various levels of government. There were local outbreaks due to contaminated blood between 1995 and 1997, and then a large scale screening program was launched in 2004 and 2005 to trace and identify the HIV infection status for those with possible contact with the contaminated blood. These factors greatly influence the normal and natural surveillance system and the reported data. In our opinion, the annual reported data from 1998 to 2002 are collected in a natural manner that reflects the HIV transmission tendency in China. We therefore will use the regression curve based on those five years of data to estimate the model parameters.
We now outline our assumptions to simplify the model. We observed that, although there are a large number of infected individuals because of the large basic reproductive number R 0 and the large susceptible population, the percentage of HIV infected in China is quite small. As such, S(t) ≈ N (t). Therefore, the structured epidemic model (1) can be approximated by We note that the above twelve equations for the infected individuals are similar to the classical Leslie population model.
We need to estimate the transmission rates β i , 1 ≤ i ≤ 12, and here the underlining assumption is that infected individuals with the same infection age have the same transmission rate since the viral load, the CD4 cell count, and the behavior activity determine the HIV progress and transmission capability. Due to the very limited number of data, we further regroup these infection ages to five stages according to the viral load, the CD4 cell count, and the behavior activity of a HIV infected individual. We used a data set from the State Key Laboratory for Infectious Disease Prevention and Control that contains information of 433 HIV nontreated infected individuals about their CD4 cell counts and viral loads. We regress the logarithm of the viral load with base 10 on the CD4 cell count to obtain the following regression model log 10 (viral load) = 4.790712 − 0.001689 CD4 counts.
The p-value of the F-test is less than 0.0001, indicating a very significant regression relationship. Table 2 shows that the median of the CD4 cell count (100, 275, 425, 750) for each of stages 2 to 5 is quite close to its corresponding average value. We then substitute the median value of the CD4 count at each stage to the above regression model to obtain the corresponding viral load, as shown in Table 2. We also assume that the transmission rate in each stage rate is proportional to the viral load; thus we can obtain the proportionality coefficient of the transmission rate for each of the stages 2 to 5 when the baseline transmission rate at stage 2 is set to 1. It is difficult to catch a newly infected person within three months of the infection, and thus Table 2 does not contain relevant information about the viral load or CD4 for those individuals in the first three months of the infection. Existing literature does indicate that the viral load is extremely high during this stage [10], and it is estimated that the transmission rate in the first three months of infection is thirty to fifty times higher than that in the second stage [13]. Therefore, if we denote the transmission rate in the second stage by x, then we shall assume the transmission in the first three months of the infection as 40x. On the other hand, the time scale of our recurrent calculation is one year, we need to spread the very high transmission rate in the first three months into the first year with the transmission rate given by (3(month) × 40x(hign transmission rate) + 9(month) × x)/12 = 10.7x.
Behavior analysis shows that there is a significant behavior change in the HIVinfected because of the illness and, in the last two stages, because of others' awareness of the infection. This behavior change will reduce the transmission rate, and we assume the reduction rates are 0.5 and 0.25 respectively for the last two stages (seven to nine years, and ten to twelve years). Therefore, we obtain the the transmission rates used in model (6) as follows: 54x, β 7 = β 8 = β 9 = 6.34 × 0.5x = 3.17x, β 10 = β 11 = β 12 = 12.53 × 0.25x = 3.13x.
Hence, estimating the 12 transmission rates β j is now reduced to the determination of the baseline value x.
From the regression function h 5 (t) we see that the average yearly rate increase is 31.3%, and the average number that an infected individual can transmit per year is 0.329, and the regressed number in the first year (1985) is 97. It is important to note that the reported number does not represent the newly infected patients; instead it is the number of individuals who were infected at an early time and whose infection was confirmed that year. We note that only a part of the HIV infected is reported each year. At the end of 2005, the estimated number of HIV-infected individuals in China is 650,000, and the cumulative number of the annually reported number from 1985 to 2005 is approximately 17.76% of the estimated number; 17.76% is the ratio of the accumulated reported number over the twenty-one years to the estimated number 650,000.
Note that 17.76% is not the annual report rate. To estimate the annual report rate, we assume that the report rate is a constant every year, and we use a simple recursive model to calculate this constant as follows. Let I(t), H(t), and U (t) be the HIV-infected number, the reported number, and the unreported number at time t, respectively. The HIV-infected number increased by 32.9% each year. The unreported number is the remainder of the previously unreported infected individuals plus the newly infected. The reported number is part of the infected who have not been reported that year. The infected, the reported, and the unreported numbers therefore satisfy the following recursive equations where the parameter r is the report rate. Starting from any initial value, we add all the reported numbers from the recursive model (8), then calculate the ratio of the total reported to the total infected, and finally, let the ratio be 17.76%. Thus, the annual report rate can be determined by the equation The calculation gives the value of the annual report rate r = 5.1%. After we understand the features of the annually reported HIV-infected number, we now use following model to estimate the transmission rate: , I j+1 (t + 1) = p j I j (t) for j = 1, 2, ..., 11, H 1 (t + 1) = rI 1 (t + 1), H j+1 (t + 1) = rp j U j (t) for j = 1, 2, ..., 11, U 1 (t + 1) = (1 − r)I 1 (t + 1), U j+1 (t + 1) = (1 − r)p j U j (t) for j = 1, 2, ..., 11, where The equation for the HIV-infected I j (t + 1) is just the simplified model (6). The equation for H j (t + 1) calculates the reported number at year t + 1. The reported number of the newly infected at t + 1 year is the product of the reported rate and the infected. The reported number in each of other groups is the product of the report rate and the infected who are not reported at year t + 1. The equation on U j+1 (t + 1) calculates the unreported number at year t + 1. The last equation of H(t + 1) sums up the total number of the reported HIV-infected in the year t + 1.
We do not have the national data for the death rate µ j and the conversion rate α j of the HIV-infected; here we use the survival date of a recent survey in the province of Anhui. It is estimated from 159 HIV-infected individuals that the survival time distribution is g(t) = exp(−0.0006(t − 0.5) 3.099 ); see [19]. The survival rate p j from the j th infection year to the (j + 1) th year in the estimate model (9)  After substituting those parameters and the initial values into model (9), we observe that the reported number H(t) at year t contains only one unknown parameter x. We thus minimize the following function to determine x and estimate the transmission rate, where h 5 (t) is the regression curve of the reported number.
Needless to say, the function φ(x) is such a long polynomial that it is of no interest or value to write it explicitly here. In practice, we use Maple to find the expression of φ(x) and then determine the minimal point x 0 . The calculation gives x = 0.0812.
Therefore, the transmission rates in the five stages are 0.868, 0.0812, 0.287, 0.257, and 0.254, respectively. The basic reproduction number is R 0 = 2.72, the average infection period is 9.23 year, and the average number of transmissions by an infected individual each year is 0.294. We have also carried out similar calculations by using the curve h 13 (t) and h 20 (t), the resulted basic reproductive number is 3.38 and 4.01, respectively. The prediction from model (9) with the estimated transmission rates is shown in the first two rows of Table 3, which clearly shows the fast growth of the reported and the total HIV infection. More efforts must be made for the intervention and control. In particular, if more education and effective control measures can be implemented to reduce the HIV transmission rate by 50% within five years, as policy makers hope, the HIV infection can be substantially reduced. The last two rows of Table 3 show the reported and total numbers of the HIV infection if the transmission rates can be reduced 13% annually from 2006 to 2010.
The simulation results given in Table 3 are obtained under the assumption of the same constant report rate for the HIV infected in each group: r = 0.051. More realistically, the report rate of the HIV infected is an increasing function of the infection time. Unfortunately, we do not have enough data and effective methods to determine infection-age-dependent report rates.
As an example to illustrate the effect of age-infection dependent report rates on the accuracy of the prediction, we use following increasing function r j = 0.051 + 0.6 j − 1 12 2 , for j = 1, 2, ...12. Another contribution of this study is the estimation of transmission rates in the different infection age group. In particular, we develop a simplification procedure based on the epidemiological features of HIV in China and the available data from the Chinese CDC's laboratory to reduce the problem of estimating a large number of parameters to the estimation of a baseline transmission rate and the functional relation between the transmission rates and the viral load of an infected individual. In our parameter identification, we also incorporate the behavior reduction factor for those infected in the last two stages of their infection. We emphasize the importance of introducing the reported rate to the model, derive the reported numbers of the HIV-infected in every infection group, and compare these derived reported numbers to the regression numbers of the real data to determine the transmission rates. We then estimate the transmission rates and the basic reproduction number in two cases: the constant reported rate and death rate, and the infection-age-dependent reported rate and death rate. Simulations and analysis show that the total number of projected infections from 2006 through 2010 is sensitive to the initial infections, while the anticipated reported number is also sensitive to the dependence of the report rate on the infection stages. It is hoped the techniques developed here also apply to other infections with long infection periods.