Introduction

Zika virus (ZIKV) is predominantly vector-borne and the outbreaks are strongly dependent on the mosquito vector density which is influenced by the climatic conditions1. There have been substantial evidence2 and reports3 of sexual transmission of ZIKV. Although ZIKV epidemic models mostly focus on the mosquito-vectored transmission, multiple studies have quantified the contribution of sexual transmission4,5,6,7,8. Modeling analysis has helped to determine the relative importance of different transmission mechanisms. To enhance our understanding on how the combination of both vectored and sexual transmission can help sustain the pathogen during the vector free seasons (e.g. winter), we develop a novel epidemic model that interconnects a homogeneous mosquito vector population with a heterogeneous sexual contact network.

Zika virus is a positive-sense single-stranded RNA virus of the Flaviviridae family9. It is related to other flaviviruses such as dengue, yellow fever, West Nile, and Japanese encephalitis viruses. ZIKV infection symptoms include acute fever, maculopapular rash, arthralgia, and conjunctivitis. However, only about 20% of infected people suffer any significant symptoms10. A large proportion of asymptomatic cases poses a major problem in determining the outbreaks to initiate early response measures. ZIKV infection in pregnant women can result into congenital microcephaly, a severe birth defect11. The transmission of ZIKV occurs primarily by infected mosquito vectors such as Aedes aegypti and Aedes albopictus. However, it can also be spread via infected semen and blood12. The first autochthonous case of Zika in the mainland US was reported in July 2016, in Miami, Florida. As of April 2018, 5,676 cases have been reported in the US States13. In 2016, 224 autochthonous cases have been reported where the state of Florida itself is responsible for 97% of those cases, and the remaining 3% cases occurred in Texas13. Due to the complex nature and multiple pathways of pathogen transmission, several epidemic parameters are still unknown.

There have been numerous attempts in modeling the outbreaks of ZIKV. Several models only considered vectored transmission1,14,15,16 and several others considered both sexual and vectored transmission routes4,5. Kuhlman et al. developed a hybrid agent-based model16 to simulate Zika outbreaks in the central Miami region. While the work demonstrates interlinked host and vector populations, it does not address host-to-host sexual transmissions. Vertical transmission has also been modeled in a work of Agusto et al.17. Olawoyin et al. analyzed the effects of multiple transmission pathways and concluded that, secondary transmision mechanisms increase the basic reproductive number and cause the outbreaks to occur sooner compared to the vector-only transmission18. The work of Gao et al. concluded that sexual transmission contributes about 3.044% of the overall transmission4. Moghadas et al. also found similar contribution from sexual transmission and they emphasized the importance of asymptomatic transmission19. Maxian et al. concluded that this contribution is too minor to sustain an outbreak, as they found that sexual transmission contributes about 4.8% to the basic reproductive number, R06. They also concluded that sexual transmission indirectly helps the vectored transmission by increasing the pool of infectious individuals. The relatively higher prevalence of infection in the female population compared to male population was analyzed by Pugliese et al.20, who concluded that higher susceptibility of females compared to their male counterparts in the case of sexual transmission could be the reason. The work of Sasmal et al. estimated that sexual transmission contribution could be as high as 15.36% for sexual risk-stratified population21. Another study concludes that existing research could potentially underestimate the risk of sustained sexual transmission by an order of magnitude22. It has been found that Zika virus can sustain in semen for a long time after it disappears from blood23. Hence, even though it is clear that sexual transmission alone can’t sustain an outbreak, we don’t have clear information about the persistence of pathogen in a human host network without the help of vectors. Limited information on the reduction of transmissibility for both asymptomatically infected hosts and during the extended sexual transmission period adds to the complexity in understanding how the outcomes are affected.

In this paper, we develop an individual-based (i.e., node-based) network model, which is different from basic compartmental models as it features heterogeneous mixing. Each individual (i.e., a node) is connected to a specific number of individuals and this number can be very different. In the context of real world sexual networks, this is realistic. Using survey data24 on sexual behavior, we develop a network generator tool that can produce networks with properties that closely match the given data. As the vector population is much larger than the host population and vectors don’t transmit between them, it is unnecessary to use node-based models for vectors. We employ a homogeneous population model for the vectors. We interconnect these two populations (hosts and vectors) via the use of infection rate parameters where the infection of one population depends on the number of infected in the other population. There have been several works on epidemic spreading in interconnected networks25. However, due to the multi-host (one being the vector) scenario, our model is built differently from conventional interconnected networks. Our goal is to incorporate heterogeneity in the host population but reduce unnecessary computational overhead by interconnecting with a homogeneous vector population. Our model also differentiates between symptomatically and asymptomatically affected individuals by assigning them different states. We furthermore, incorporate the extended sexual transmissibility in our model. In the host network sub-model, each host node can be in either one of the seven states: SEIsIaJsJaR (Susceptible, Exposed, Infectious (Symptomatic), Infectious (Asymptomatic), Convalescent (Symptomatic), Convalescent (Asymptomatic), and Recovered). The host population is also divided based on gender, sexual orientation, and age. In the vector population sub-model, the population is modeled into three compartments: SVEVIV (Susceptible, Exposed, and Infected). We only use birth-death demography for the vector model and the climatic variation is incorporated into the vector birth rate, which is one of the variable parameters. The simulation model is based on Gillespie’s Stochastic Simulation Algorithm (SSA)26. However, we have a few variable rate parameters in our model, which prompted us to use the non-Markovian Gillespie Algorithm (n-MGA)27 to simulate our model, which we implement using a modified version of the GEMF28 tool.

The contributions of this paper are threefold: (i) we propose a novel interconnected model to evaluate host-to-host and host-to-vector-to-host pathogen transmission, (ii) we develop a sexual contact network generator based on aggregate data, and (iii) using our implementation of the non-Markovian Gillespie Algorithm (n-MGA) we examine survival of pathogen in the host network. We also perform sensitivity analyses on key model parameters to evaluate their role in disease outcomes. The interconnected model is presented in the Model Formulation section, three independent approaches in evaluating model outputs are presented in the Results section, and the Methods section contains the details on the simulation model.

Model Formulation

Interconnected population model

We propose an interconnected population model to investigate the spread of ZIKV within the human host and the mosquito vector population during an outbreak. A basic diagram of the model is shown in Fig. 1. It has two major components: the vector population marked by the cloud on top, and the host contact network marked by the solid circle nodes and the solid edges. The vector population is assumed to be in the vicinity of the host population such that each mosquito vector can bite any of the hosts (indicated by dashed lines).

Figure 1
figure 1

Coupled population model for ZIKV. The black solid circles indicate host nodes (individuals) and the cloud shape above indicates the vector population. The solid edges connecting the nodes indicate the sexual contact network and the dashed edges indicate contacts of the vector population with the host nodes. A host node can be in any of the seven states (SEIsIaJsJaR) and the entire vector population is divided into three compartments (SVEVIV).

Each host node can be in either one of the seven states: Susceptible (S), Exposed (E), Infectious (Symptomatic) (Is), Infectious (Asymptomatic) (Ia), Convalescent (Symptomatic) (Js), Convalescent (Asymptomatic) (Ja), and Recovered (R). The convalescent states were used to model the extended presence of ZIKV in semen as reported by previous works23. People in Js or Ja states cannot transmit infections via vectors, they can only transmit sexually. The vector population is divided into three compartments: Susceptible (SV), Exposed (EV), and Infected (IV). The unidirectional arrows indicate transitions from one state/compartment to another. The symbol adjacent to each arrow indicates the rate of that particular transition. The set of all these symbols constitute the model parameters, and they are listed in Table 1. This table also lists the nominal values and the range of values to be used for the simulation experiments. Some of these values are not readily available in the literature and we performed calibrations to determine acceptable ranges of values to conduct the experiments. We also performed sensitivity analyses with several critical parameters to evaluate model behavior. We have also used boolean condition functions marked by the symbol, Cx(n) as shown in Fig. 1. The subscript’x’ indicates which condition a node n has to satisfy. If a condition is met, the condition function returns True (=1), otherwise it returns False (=0). For example, CAM(i) will return 1 if node i is an adult male (AM), it will return 0 otherwise. The conditions that we have used are: AM = Adult Male, and NAM = Not Adult Male. It should be obvious that for any i, CAM(i) is the complement of CNAM(i). FST(i) is the multiplier component function of the force of infection for sexual transmission from other nodes to node i. The symbols Is, Ia, Js, and Ja in the formulation of FST(i) are host state indicators for neighbors (j) of node i. CST(i, j) is the sexual transmission condition function. It only returns True (1) if i is an adult and j is an adult male. Here, j is considered the transmitter and i is consider the recipient of infection.

Table 1 Epidemic model parameters and functions.

As indicated in Fig. 1, a susceptible host node can get infected (S → E) via two mechanisms: (i) getting bitten by an infected mosquito (at the rate λHV) or (ii) having sexual contact with an infected or convalescent (Is, Ia, Js or, Ja) host (at the rate β). We use the adjacency matrix representation of a graph, where \({\mathscr{A}}\) is the adjacency matrix. An element \({a}_{ij}\in {\mathscr{A}}\) indicates the connection/link between node i and node j. If a link exists, then aij = 1, otherwise aij = 0. The sexual transmission rate, β was computed using the formula β = 1 −(1 − α)n, which was previously used in the work of Agusto5. Here, α is the probability of transmission per coital act with an infected partner. As data specific to Zika virus is not available, we assume a range of α between 0.001 and 0.129. The exponent n is the average number of coital acts per unit time. Some estimated data indicate that people have sex an average of 60 times per year24. Hence, we estimate n = 60/365 = 0.1644 per day. The range of β for the range α is 0.000164 to 0.0172. Once a human host is infected, he/she can either develop symptoms or not show any sign of infection at all. Hence, there are two possible transitions from the exposed (E) state: (i) a proportion, τ of the people becomes symptomatically infected (Is) and (ii) a proportion, 1 − τ of the people becomes asymptomatically infected (Ia). The proportion of symptomatic ZIKV infections is claimed by previous works10 to be about 20%. However, one recent work30 claims that this proportion could be higher ranging from 27–50%. For our analysis, we take this into consideration as shown in Table 1. Once someone is infected, they will stay infected for an average duration of 1/γ1. However, once the virus disappears from blood, the adult male hosts will go through some extended period of infectiousness which is modeled by the convalescent states (Js and Ja). An adult male in the convalescent state may only transmit sexually and would recover after an average duration of 1/γ2. The infectiousness of the 4 states (Is, Ia, Js, and Ja) are not considered the same. The relative transmissibility of the asymptomatic states is represented by θ. The relative transmissibility for the convalescent states is denoted by μ. Although, symptomatically infected people could be spreading less if they are seeking medical attention, we do not reduce the transmissibility for the symptomatic cases in this model. Historically, it has been seen that, possible negligence associated with ZIKV has contributed to sexual transmissions31. A healthy mosquito vector can get infected with some nonzero probability only if it bites an infectious human (Is or Ia) (at the rate λVH). The time dependent per-capita mosquito recruitment rate is represented by F(t) where, F(t) = (1/12)A(t). Here, 1/12 is a fixed birth rate assumed to be the same as the mortality rate (ε). The mosquito abundance factor A(t) is the seasonality parameter which incorporates seasonal variations into the model.

To generate a realistic sexual network, we develop a network generator tool based on the ‘configuration model32’. Our tool takes as input, data of sexual behavior, age, gender, and produces representative random networks. Details are featured in the Methods section. An example network generated by our tool is given in Fig. 2. In our model, we also intend to incorporate the temporal variations of the climate in different seasons and the consequences of such variations on the mosquito vector population. To simplify this task, we use a vector abundance factor (A(t)) expressed as a fraction in the range [0,1]. This parameter was derived from data points used in the work of Monaghan et al.33, which they originally extracted from the works of Reiskind et al.34. For our simulations, we use vector abundance data from Miami, Florida. The mosquito abundance factor over the course of 12 months is plotted in Fig. 3. The data originally had 12 sample points (monthly values). As the model requires the abundance factor on a daily basis, we use linear interpolations between data points of adjacent months.

Figure 2
figure 2

Generated sexual contact network based on sexual behavior24. The given network has N = 1000 nodes. The Methods section contains the details on how it is generated. The nodes are presented in two distinct colors and three distinct sizes. The males are marked as blue and the females are marked as pink. The three node sizes correspond to Children (small), Adults (medium), and Elderly (large) people. This particular instance of the network has an average node degree of 0.803 whereas the 64% adult population have an average degree of 1.26. This particular instance of network has 463 male and 537 female host nodes. The edge density of this network is 0.0008038. There are 600 connected components and the largest one consists of 117 nodes.

Figure 3
figure 3

Variation in mosquito vector abundance with time. Due to changing temperature, rainfall, and other climatic factors, mosquito abundance varies throughout the year. The squares show the data points and the dotted lines are linear interpolations. The data represent Palm Beach, Florida where the vector abundance peaks during June-July. The plot was constructed from data observed in Palm Beach, FL over 27 four-week periods from 2006–200833,34.

Simulation tool

If the transition rate parameters are constant, the overall stochastic model is a Markov process with Poisson arrivals and exponential inter-event times. A stochastic algorithm such as the Gillespie SSA26 can be used to perform simulations in most of these cases. The well-developed GEMFsim28 has been used previously35 to solve such stochastic spreading processes in the networks. However, our model requires a few changes before such simulations can be performed.

In our model, the transition rate parameters are not constant. The mosquito birth rate is a seasonally dependent parameter that varies according to the given vector abundance input data. The vector population compartments are changing with time during the outbreak simulation, which change the host infection rate λHVIV. Similarly, the host infected populations are also changing with time which modify the vector infection rate \({\lambda }_{VH}{N}_{H}{\sum }_{j=1}^{{N}_{H}}{[{I}_{s}+\theta {I}_{a}]}_{j}\). These constitute a set of exogenously varying parameters (i.e., varying due to external forces/catalysts). As a consequence, the processes are no longer Poisson, rather these are called non-homogeneous Poisson. To cope with this issue, we use the non-Markovian generalized Gillespie Algorithm (nMGA)27 which assumes general renewal processes (which is more generic) that allow exogenous variations of parameters. We modified the existing GEMFsim28 accordingly, and added a vector compartmental model solver on top of it. The vector compartmental solver is effectively coupled to the modified network-based nMGA solver via the infection rate parameters (λHVIV and \({\lambda }_{VH}{N}_{H}{\sum }_{j=1}^{{N}_{H}}{[{I}_{s}+\theta {I}_{a}]}_{j}\)), where a parameter in one population depends on some quantity in the other population.

Results

The simulation tool uses the parameters listed in Table 1, the initial conditions, the seasonal variation data, the population data, and the contact network representation. The initial condition is a single infected vector (IV(0) = 1) while keeping the remaining vector population and the entire host population susceptible (S). We specify the pathogen introduction month (Mstart) and the max run time (Tend). For example, if we set Mstart = 5 and Tend = 100, the simulation will start from 1st of May and run up to the 2nd week of August. A simulation can terminate before Tend if the outbreak dies out. As we are running stochastic simulations, we smooth out our results by averaging over 500 repetitions. The population data contains the distribution of genders, sexual orientations and age groups, used to generate the contact network. For the survival analysis, we introduce the pathogen in November (Mstart = 11) to indicate the start of winter. For the sensitivity analysis, we introduce the pathogen into the population in April (Mstart = 4), which is the most likely scenario for the 2016 Miami-Dade outbreak36.

Seasonal analysis

First, we explore the effect of seasonal variations on the epidemic progression. We use the nominal parameter values defined in Table 1 and run the simulations for three distinct pathogen introduction months, Mstart. Taking some cues from the seasonal patterns of Miami, FL shown in the Fig. 3, we choose to run independent scenarios starting on 1st of April (Mstart = 4), 1st of August (Mstart = 8), and 1st of October (Mstart = 10). The averaged results of 500 simulations are shown in Fig. 4.

Figure 4
figure 4

The time series plots of ZIKV outbreaks for three different pathogen introduction months, Mstart in Miami, FL. The left column contains the host plots and the right column contains the corresponding vector plots. The three rows indicate three different pathogen introduction times: 1st of April, 1st of August, and 1st of October respectively. The hosts in different states are expressed as fractions of the total population. The number of vectors in different compartments are expressed in log scaled axes. Here, we present 5 out of the 7 host states (E, Is, Ia, Js, and Ja) and the all three vector compartments (SV, EV, and IV) which are marked with distinct colors described by the legends at the bottom. All plots are averages of 500 independent stochastic simulations.

Without any intervention, we observe a large outbreak (more than 50% of the population affected) if pathogen is introduced on 1st of August. For the remaining two cases, the outbreak sizes are comparatively much smaller. This can be explained using the seasonal variations of Fig. 3. In Miami, the mosquito abundance is high during the beginning of August. This contributes to a boost in the infected vector population compared to the other two dates.

We also observe some interesting behaviors in the outbreak dynamics. Despite the vector abundance having a large positive slope during the month of April and a large negative slope during the month of August, Miami suffers a larger outbreak in the August introduction compared to the April introduction. Figure 4b shows that the infected vectors die out soon in the April outbreak although the upcoming summer (positive slope in abundance) sustains healthy vectors for a long time. In the case of the August introduction, the infected vectors rise rapidly during the first 50 days (Fig. 4d). Although the vector population declines rapidly after 50 days due to unsuitable climates of the upcoming winter, the initial surge in infected mosquitoes causes a large outbreak. The results indicate that the suitability of the climate during pathogen introduction plays a major role in determining the size of the outbreak. The impacts of climatic changes in the following months are minimal.

Survival analysis

As a part of the survival analysis we will be observing several measured quantities: the epidemic/infection lengths, the pathogen survival periods, and the epidemic attack rates. Here, we define them first before discussing the results.

Host Infection Length (T HL)

The host infection length is defined as,

$${T}_{HL}=The\,last\,time\,instant\,between\,0\,and\,{T}_{end}\,when\,there\,is\,at\,least\,one\,infected\,host\,left$$
(1)

It is the last day on the outbreak time-line an infected host can be found. For this property, the infected hosts who are asymptomatic or in the convalescent state are also considered infected. In this article, when we use the term “epidemic length”, we imply host infection length THL unless otherwise mentioned.

Vector Infection Length (T VL)

The vector infection length is defined as,

$${T}_{VL}=The\,last\,time\,instant\,between\,0\,and\,{T}_{end}\,when\,there\,is\,at\,least\,one\,infected\,vector\,left$$
(2)

It is the last day on the outbreak time-line an infected vector can be found. In typical outbreak situations, infected vectors die out before all the infected hosts recover.

Pathogen Survival Period, (T PS)

The vector free pathogen survival period is defined as,

$${T}_{PS}={T}_{HL}-{T}_{VL}$$
(3)

It measures how long pathogen can survive in the host population without the presence of vectors.

Epidemic Attack Rate (AR)

The epidemic attack rate (AR) is defined as,

$$AR=\frac{Number\,of\,hosts\,who\,experience\,infection\,throughout\,the\,outbreak}{Total\,number\,of\,hosts,{N}_{H}}$$
(4)

The value of AR is in the range [0, 1]. We sometimes express this quantity as a percentage instead of a fraction. For example, an AR of 0.3 means 30% of the population were infected during the outbreak and the remaining 70% never experienced any infection.

To see how some of the above mentioned quantities relate to sexual transmission, we vary the sexual transmission rate, β in the range mentioned in Table 1. We find that varying β does not noticeably increase or decrease the epidemic length (Fig. 5a), which remains between 2–4 days for the range of β in consideration. Outbreaks in most of these cases last for about 158 days. The effect of increased sexual transmission in modifying pathogen survival is also minor (Fig. 5b), being within 74–78 days with an average of 76.75 days. A similar situation arises for the attack rate as well (Fig. 5c), with the mean attack rate being 15.12%. If there is no sexual transmission (β = 0), the attack rate is 14.48%. With a sexual transmission rate of 0.0175 (corresponding to about 10% probability of transmission per coital act), the attack rate increases to 15.97%. This accounts to about 10.29% increase in the epidemic size that can be caused by a high rate of sexual transmission. We also have an unknown factor, the relative sexual transmissibility (μ) when recovering hosts are in the convalescent phases. For all other simulations, we assume the infectiousness will reduce by 50% in convalescent stages. However, we also vary this parameter to determine how it affects the outcomes. Just like the effects we saw for β, the epidemic lengths remain within 1–2 days of the mean of 157.6 days (Fig. 5d). The pathogen survival also show similar characteristics as before and stays within 73–76 days (Fig. 5e). The effect on the epidemic size is barely noticeable (Fig. 5f), with small fluctuations.

Figure 5
figure 5

The plots depicting the results of pathogen survival analysis. The results were obtained by varying the sexual transmission rate β (1st row), the relative transmissibility of the convalescent states μ (2nd row), the proportion of symptomatically infected individuals τ (3rd row), and the relative transmissibility of the asymptomatic states θ (4th row). The plots demonstrate the host infection length THL (1st col), the pathogen survival TPS (2nd col), and the attack rate AR (3rd col). Each data point (blue square) in the above plots is an average of 500 independent stochastic simulations. The shaded regions indicate 95% confidence intervals.

ZIKV disease outcomes are also affected by a large proportion of asymptomatic hosts. We vary the two parameters (τ and θ) that we use to model the asymptomatically infected individuals. Although in most works we found that about 20% of the cases were symptomatic, one recent work estimated that this proportion could be higher30 (27–50%). To evaluate the effect of such variations, we vary the symptomatic proportion parameter, τ from 10% to 60%. The epidemic length increases gradually with τ starting from 155 days for 10% symptomatic to 172 days for 60% symptomatic population (Fig. 5g). A 50% increase in the symptomatic proportion causes a 11% increase in the length of the outbreak. The pathogen survival is affected much less compared to epidemic length. However, we find a small fluctuating decrease followed by a gradual increase (Fig. 5h). The vector free pathogen survival remains within 2–3 days of the average length of 75.6 days. The epidemic size has a clear increasing trend as shown in Fig. 5i. This is expected because the symptomatically infected individuals are subject to higher transmission rates. As an example, the outbreaks will be about 45% larger if 50% of the people are symptomatic compared to the usual proportion of 20%. The relative transmissibility has significantly larger effect in the disease outcomes and all three properties increase (Fig. 5 bottom row). In most other simulations, it is assumed by default that infectiousness reduces to half (θ = 0.5) when someone is asymptomatic. Compared to our default situation, if we make infectiousness of both symptomatic and asymptomatic individuals the same, we see 17.5% increase in the epidemic length (Fig. 5j), 14.45% increase in the pathogen survival period (Fig. 5k), and a massive 123.5% increase in the epidemic size (Fig. 5l).

Sensitivity analysis

A sensitivity analysis is performed in order to evaluate the model response with respect to several key parameters. For a vector borne disease such as ZIKV, the ratio of vector and host population is expected to play a prominent role in the disease outcomes. In our previous analysis, we have also found that the proportion of asymptomatic individuals and their relative transmission rate affect the disease outcomes. To evaluate how these parameters interplay, we perform a pairwise sensitivity analysis of the three parameters: the vector-to-host ratio (NV/NH), the proportion of asymptomatic hosts (τ), and the relative transmissibility of asymptomatic hosts (θ). The results are depicted in Fig. 6.

Figure 6
figure 6

Contour plots depicting the results of the sensitivity analysis. The parameters varied in each plot are marked as axes labels. The quantity for which the contours are being displayed is mentioned on top of each plot. The contours are color mapped and a color-bar on the side of each plot indicates the range of values represented by the plot. The 1st row shows the sensitivity of the epidemic length (THL) on the parameters, the 2nd row shows the sensitivity of the pathogen survival (TPS) on the parameters, and the 3rd row shows the sensitivity of the epidemic attack rate (AR) on the parameters. The parameters that were varied here are: the vector-to-host ratio (NV/NH), the proportion of symptomatically infected individuals (τ), and the relative transmissibility of the asymptomatic states (θ). Each data point shown in the above plots is an average of 500 independent stochastic simulations.

The epidemic length (THL) is affected by the three parameters in question. However, we can see that, the effect of symptomatic proportion depends on the relative transmissibility. The symptomatic proportion can regulate epidemic length if relative transmissibility is low as seen in Fig. 6b. If θ is above 0.4, the capability of τ is greatly reduced. The vector-to-host ratio (NV/NH) always increases the epidemic length (Fig. 6a,c), which is also true for the relative transmissibility of asymptomatic individuals (θ) (Fig. 6b,c).

We obtain some interesting results when we analyze the sensitivity of pathogen survival, TPS. Pathogens can survive longer for an intermediate range of host to vector ratio if the proportion of symptomatic individuals is low as shown in Fig. 6d. On the other hand, despite having some large epidemics on the upper right corner of Fig. 6g, we see that pathogen survival is relatively lower (as low as 67 days) (Fig. 6d). In Fig. 6e, we see that depending on the value of τ, the survival can be higher for certain intermediate values of θ. For example, When τ < 0.2 and 0.3 < θ < 0.7, there is a region where pathogen can survive more compared to the other situations. Survival reduces for both high and low values of asymptomatic relative transmissibility (θ). When comparing vector-to-host ratio along with the asymptomatic relative transmissibility, we see that a thick band or a region appears, where the pathogen survival is high (Fig. 6f). There are slight fluctuations in that peak region, however, they are relatively minor to indicate any particular phenomenon. If we compare this plot with the attack rate plot (Fig. 6i), we see that pathogen survival is relatively long for intermediate values of attack rates and relatively short for small or large attack rates. Small outbreaks naturally end sooner. Very large outbreaks also can end soon due to faster spreading dynamics and herd immunity of the recovered population.

The epidemic attack rate (AR), as expected, is highly sensitive to vector-to-host ratio, NV/NH for the entire range of asymptomatic proportion, τ. For the nominal value of τ = 0.2, doubling the NV/NH ratio increases epidemic size by 249.78% (Fig. 6g). However, if the asymptomatic individuals have very low transmission capabilities, it can limit the effect of vectors on epidemic size as seen in Fig. 6i. The fact that majority (80%) of the hosts are asymptomatic in ZIKV infections, indicate that they hold a critical role in spreading. The parameters τ and θ both demonstrate importance in determining epidemic size (Fig. 6h) and among them, the effect of θ is more radical.

Discussion

In this study we have proposed an individual based interconnected network model for ZIKV that can also be used to simulate any vector-borne disease that features contact based direct transmissions. We employ heterogeneous mixing based on the host contact network which is generated based on real world data on human sexual behavior, sexual orientation, gender, and age structure. We utilize the approximations of the non-Markovian Gillespie algorithm to run stochastic simulations. In the beginning, we explore how the seasons can affect this predominantly vector-borne disease. Later, we focus on the survival of pathogen in the climates similar to Florida if an outbreak starts prior to the colder months. In this step, we examine how some of the important model parameters affect the outcomes. Finally, we perform a sensitivity analysis to evaluate our model behavior in response to changes in some key parameters.

Our seasonal analysis results indicate that outbreak size is strongly related to the environmental conditions during the pathogen introduction. The first few weeks are crucial in determining how much the pathogen would spread. After that period, environmental variations have a much weaker effect in reducing or increasing the outbreak size. If the pathogen is introduced during the peak mosquito season, there is a high probability that we will see a large outbreak. Even if climatic suitability of mosquito vectors decline rapidly after the first few weeks, the vectors manage to spread the pathogen in the host population rapidly and cause large outbreaks. This suggests that a ZIKV outbreak can spread rapidly out of control if it is not effectively contained during the initial stage. Early interventions are crucial even though it may be difficult to identify outbreaks due to large asymptomatic group of hosts. The ratio of vector to host, as expected, has shown its prominence in determining outbreak size and the length (Fig. 6).

We have analyzed how sexual transmission affect the outbreaks. Our results in Fig. 5c,f align with the conclusions drawn by some of the earlier works4,5,6 that sexual transmission is a small component in the overall force of infection, which is dominated by the vectored-transmission. However, pathogen still survives up to 75 days in the host network without the help of infected vectors. This long survival can be attributed to small amount of sexual transmission but it is mainly due to the extended infectiousness (convalescence) of the hosts as it is assumed that pathogen can survive up to a month in the semen of recovering males. The use of certain birth controls (e.g. condoms) could be effective in combating the spread of pathogen during this period. These conclusions should inspire further clinical studies in this area to test the efficacy of control measures.

A large number of asymptomatically infected individuals play an important role in outbreak dynamics. The proportion of symptomatic individuals do not affect the survival of pathogen noticeably. However, it is positively correlated to outbreak size. The relative transmissibility of asymptomatic states is one of the key factors in determining disease outcomes which can extend the vector-free pathogen survival up to 3 months. It is one of the most important parameters that needs to be properly estimated in order to obtain informative outbreak predictions.

The conclusions drawn from our results can be useful in evaluating potential endemic scenarios for Zika virus disease in temperate regions. Although the contribution of sexual transmission is small, the ability of the pathogen to survive in a human sexual contact network for extended periods can have consequences in sustaining a Zika outbreak in a region as well as spreading to other regions due to long distance travels. In addition to that, a large asymptomatic proportion could be the most important hurdle in controlling ZIKV outbreaks. Although we provide conclusions on the relative importance of key parameters, the unavailability of data on some of those warrants for further efforts towards estimations. This model should be applicable to other vector-borne disease which have the potential of being transmitted sexually. Our model is also mesoscale (medium sized population) in terms of hosts, due to the limitations imposed by computational complexities. This work can be extended in the future by the use of activity driven networks (ADN), vertical transmission, and larger populations. Those studies would provide more insights into the study of Zika virus epidemics.

Methods

Host network characterization

For the host population, we assume equal number of males and females. We divide the population into three age groups: Children (0–14), Adults (15–64), and Elderly (65+). Based on the World Development Indicators (WDI) data published for 201737, we find that for the USA population, the three age groups are distributed as 19% Children, 66% adults, and 15% elderly people. The adults are the only ones assumed to be capable of transmitting the disease sexually, hence the remaining population are only affected by vectors. Based on the data provided in a sexual behavior study24, we find that sexual orientation of men consisted of approximately 97.2% heterosexuals, 2.5% homosexuals, and 0.3% bisexuals. For women, the study revealed 98.9% heterosexuals, 0.9% homosexuals, and 0.2% bisexuals. The same work24 also compiles a distribution of the population based on the number of sexual partners which is listed in Table 2. The data indicate that majority of the population have a single partner in a timespan of 12 months. The average number of partners for the adult population was 1.28.

Table 2 Average number of sexual partners in the last 12 months24.

The network generator tool is designed using the configuration model32. It takes all the above mentioned population and sexual behavior properties as inputs, and generates a graph whose statistical properties closely match the given partner distribution of Table 2 and the distribution of sexual orientation. Studies on sexual networks also indicate that sexual contact networks have node degree distributions that follow the scale-free structure38. There are large variations in the number of sexual partners while a small group of highly active people form the core38. Core groups are important in sustaining pathogen transmission specially if the duration of infection is short39. To maintain these features, the network generator is designed in such a way that, high degree nodes have higher probabilities of connecting to low degree nodes. An example of network generated by our generation tool is shown in Fig. 2.

Vector characterization

The mosquito vectors are modeled as homogeneous population and we use the classical Ross-Macdonald approach used by Keeling et al. in their book40. We divide the vector population into three compartments, Susceptible (SV), Exposed (EV), and Infected (IV). The transitions between the three compartments are showed in Fig. 1. Table 1 describes the different parameters that were used for the model. The equations for disease dynamics in mosquito vectors are given below,

$$\begin{array}{rcl}{\dot{S}}_{V} & = & F(t){N}_{V}-{\lambda }_{VH}{N}_{H}(\sum _{j=1}^{{N}_{H}}{[{I}_{s}+\theta {I}_{a}]}_{j}){S}_{V}-\varepsilon {S}_{V}\\ {\dot{E}}_{V} & = & {\lambda }_{VH}{N}_{H}(\sum _{j=1}^{{N}_{H}}{[{I}_{s}+\theta {I}_{a}]}_{j}){S}_{V}-\sigma {E}_{V}-\varepsilon {E}_{V}\\ {\dot{I}}_{V} & = & \sigma {E}_{V}-\varepsilon {I}_{V}\end{array}$$
(5)

We incorporate seasonality into this model by using a time dependent mosquito recruitment rate, F(t). This rate depends on the time (day) of the year. The transmission parameters, the λ’s are computed from the mosquito bite rate, r and transmission probability, T. The formula are given in Table 1.

Non-Markovian Gillespie Algorithm

The nMGA (Non-Markovian Gillespie Algorithm) was proposed by Boguná et al.27. The following derivation was also described in the work of Masuda et al.41.

We first consider N renewal processes running in parallel. Let ti be The time elapsed since the last event of the ith process. We denote ψi(τ) as the probability density function of inter-event times for the ith process. The survival function of the ith process (i.e., the probability that the inter-event time is larger than ti) is,

$${{\rm{\Psi }}}_{i}({t}_{i})={\int }_{{t}_{i}}^{\infty }{\psi }_{i}(\tau )d\tau $$
(6)

Now, the probability that no process generates an event for time Δt is,

$${\rm{\Phi }}({\rm{\Delta }}t|\{{t}_{j}\})=\prod _{j=1}^{N}\frac{{{\rm{\Psi }}}_{j}({t}_{j}+{\rm{\Delta }}t)}{{{\rm{\Psi }}}_{j}({t}_{j})}$$
(7)

To determine the time until the next event, Δt, we take a sample u from uniform distribution over [0, 1] and solve Φ(Δt|{tj}) = u. This step is computationally expensive when N is large. To improve performance, we approximate this step as proposed by Boguná et al.27. This approximation is exact as N → ∞. When Δt is small (N is large), equation (7) becomes41,

$${\rm{\Phi }}({\rm{\Delta }}t|\{{t}_{j}\})\approx exp[-{\rm{\Delta }}t(\sum _{j=1}^{N}{\lambda }_{j}({t}_{j}))]$$
(8)

Now, the instantaneous (hazard) rate of the ith process, which is generally assumed to be a function of time since the last event is determined by,

$${\lambda }_{i}({t}_{i})\equiv \frac{{\psi }_{i}({t}_{i})}{{{\rm{\Psi }}}_{i}({t}_{i})}$$
(9)

With the above equations in hand, we can run the Non-Markovian Gillespie algorithm as follows,

  1. 1.

    Initialize tj for all (1 ≤ j ≤ N).

  2. 2.

    Determine the time until next event from,

    $${\rm{\Delta }}t=\frac{-\mathrm{ln}\,u}{{\sum }_{j=1}^{N}{\lambda }_{j}({t}_{j})}$$
    (10)
  3. 3.

    Select the process i that has generated the event with probability,

    $${{\rm{\Pi }}}_{i}\equiv \frac{{\lambda }_{i}({t}_{i})}{{\sum }_{j=1}^{N}{\lambda }_{j}({t}_{j})}$$
    (11)
  4. 4.

    Update the time since the last event, tj = tj + Δt for all j ≠ i. Set ti = 0.

  5. 5.

    Repeat steps 2–4.

The original Gillespie Algorithm can be recovered from this nMGA using λi(ti) = λi. This is the case for all the parameters that are constant.

Numerical simulation

We use the non-Markovian Gillespie Algorithm (nMGA) to simulate the processes related to host nodes. The vector population is simulated using deterministic ordinary differential equation (ODE) solver. We combine the hosts and the vector simulations together by calling the ODE solver at the end of each event. The GEMFsim28 tool, which already supports the Gillespie algorithm was modified in order to include the vector population model and time varying parameters. We use the variables X and Y to describe the host and vector populations in different states/compartments respectively. Here, X contains information about the state of each host node and Y contains the population count of each vector compartment. X0 and Y0 are the initial states/compartments at the start of the simulation. \({\mathscr{A}}\) is the host network representation. The term PARAM is used to denote the set of model parameters listed in Table 1. TH and TV in the output are the host and vector indexing arrays that contain information about time where the data points (X and Y) were generated. We denote xnX as the state of node n. λn(xn → j) is the transition rate of node n from its current state xn to state j. We use M to denote the total number of host states (= 7 in our case). For a node n, Λn denotes the sum of transition rates from its current state to all other possible states. The VECTSOLVER is an ordinary differential equation (ODE) solver that takes as input the current situation of the vector population Y and solves them from the current instance to the time δt. The ODE equations that are being solved are given in equation (5). This solver is invoked once after each event (with the updated parameters) and the time indexing vectors are updated with respect to the global time t. The simplified pseudo-code of the modified GEMFsim is given below,

figure a

As the simulation is event-based and parameters are updated following each event, the inter-event times should be small to keep the variable parameters up to date in both populations. Longer inter-event times in a simulation can increase parameter discrepancy between the host and the vector sub-models. These sub-models are coupled by inter-dependent parameters that vary with time and update after each event. Hence, it is important that events occur at shorter intervals to keep the model outcomes accurate. Fortunately, due to the nature of the Gillespie algorithm, it is indeed the case when a pathogen is present in the host population. We have demonstrated this fact in Fig. 7 that ensures parameter accuracy. The inter-event times are also exponentially distributed which is shown in Fig. 7b. We have a small number of outliers that indicate long inter-event times. Most of these occur when the infection is either very low in the population or it is about to die out. Hence, the simulation results remain unaffected.

Figure 7
figure 7

Inter-event times in different stages for all of the 500 simulations during the first 100 days (left) and their distribution (truncated inter-event times >5) (right). For this case, the average inter-event time is 0.2074 day (95% CI [0.2056 to 0.2092]). About 96.87% of the events that occurred had intervals shorter than a day. There are a few outliers though, which are mostly due to slowing down of the events at the end of the epidemics (when pathogen is low in the host population or has been wiped out).