Data-driven contact structures: From homogeneous mixing to multilayer networks

The modeling of the spreading of communicable diseases has experienced significant advances in the last two decades or so. This has been possible due to the proliferation of data and the development of new methods to gather, mine and analyze it. A key role has also been played by the latest advances in new disciplines like network science. Nonetheless, current models still lack a faithful representation of all possible heterogeneities and features that can be extracted from data. Here, we bridge a current gap in the mathematical modeling of infectious diseases and develop a framework that accounts simultaneously for both the connectivity of individuals and the age-structure of the population. We compare different scenarios, namely, i) the homogeneous mixing setting, ii) one in which only the social mixing is taken into account, iii) a setting that considers the connectivity of individuals alone, and finally, iv) a multilayer representation in which both the social mixing and the number of contacts are included in the model. We analytically show that the thresholds obtained for these four scenarios are different. In addition, we conduct extensive numerical simulations and conclude that heterogeneities in the contact network are important for a proper determination of the epidemic threshold, whereas the age-structure plays a bigger role beyond the onset of the outbreak. Altogether, when it comes to evaluating interventions such as vaccination, both sources of individual heterogeneity are important and should be concurrently considered. Our results also provide an indication of the errors incurred in situations in which one cannot access all needed information in terms of connectivity and age of the population.

An accurate determination of epidemic thresholds in contact networks is of huge importance, both for mitigating and predicting epidemic spreading, as well as for devising effective vaccination strategies. This research points out clearly for the first time that, when it comes to the evaluation of interventions such as vaccination, both sources of individual heterogeneity are important and should be considered jointly. This was an important open problem in the realm of an intensely investigated subject with obvious practical ramifications. By introducing a clever new approach based on empirical data and network science, this study thus fills an incredibly important gap, and it reveals just how wrong one could be by neglecting or not having access to all the needed information in terms of connectivity and age of the population.
The paper is well-written, comprehensive, and clear. I find it is among the finest papers that I have had the pleasure of reading in the recent past. The motivation behind the approach and the insights it affords towards improving the modeling of the spreading of communicable diseases are ingenious, and as such it will surely not fail to impress the diverse readership of PLOS Computational Biology. For these reasons, I warmly recommend publication.
It is quite a challenge to suggest improvements for such an excellent contribution. Perhaps a reference to the current COVID-19 pandemic and how the approach could improve forecasting, as studied in "Forecasting COVID-19", Front. Phys. 8, 127 (2020), would be worthwhile. Apart from this, I can only reiterate my overall very positive impressions and congratulate the authors on a fine contribution.
We sincerely thank the reviewer for the very positive assessment of our work.
Following his/her advice, we have added a brief comment in the introduction and a few lines regarding the current COVID-19 pandemic in the conclusions section. We sincerely thank the reviewer for pointing this out because we indeed believe that the modeling of this disease would particularly benefit from our results. Furthermore, the current lack of data regarding the mixing patterns in countries such as Spain shows that the importance of this data is usually neglected. We hope that our paper will help to raise the interest of the community in this information.
We agree with the reviewer that we have only used one set of data. However, we do not think that extending the analysis to other countries will add much information. In particular, the most detailed data in this regard comes from the POLYMOD study, which was performed in Europe. As such, the mixing patterns and the contact distribution of the population are fairly similar from country to country. It is true that the precise values might change, for instance the average degree in Italy is 19, while in Germany it is 9. Yet, although the exact values of the attack rate (and of other analyses) will change, the shape of the contact patterns matrix and the demography are fairly similar. Thus, the qualitative results will be the same. Since we are focused precisely on a qualitative analysis of the role that this kind of data plays, adding more countries would obscure the discussion. Note, additionally, that although we used a dataset for Italy, the methodology is general and not specific to this data set.
Many concepts and legends in the figures are not well defined. For example, the "Frequency" in Fig. 1D; and is the "number of contacts" should be "age"? What is the 'X' in Fig. 2B? What is the "Relative Difference (%)" in Fig. 2C.
We have added a sentence in the caption to better explain the meaning of figure 1D. We understand that there might be some confusion regarding the term "frequency", since sometimes it is used as the total count while in other cases, as in ours, it is normalized. Regarding the x axis, the number of contacts is the correct term. In the plot we show the number of contacts that any individual has, ranging from between 0 and 5 contacts to over 45 contacts per day. We hope that the new sentence in the caption clarifies this.
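To make the distinction between a raw count and a normalized frequency concrete, the following is a minimal sketch and not the code used for figure 1D. It assumes bins of width 5 starting at 0, with a final open bin for 45 or more contacts, and that each bin includes its lower edge; the function name `normalized_frequency` and the sample values are hypothetical.

```python
from bisect import bisect_right


def normalized_frequency(contacts_per_day, lower_edges):
    """Fraction of individuals falling in each contact bin.

    lower_edges holds the (sorted) lower edge of every bin; the last bin
    is open-ended. Assumption: a value equal to an edge belongs to the
    bin that starts at that edge.
    """
    counts = [0] * len(lower_edges)
    for k in contacts_per_day:
        if k < lower_edges[0]:
            raise ValueError("number of contacts below the first bin edge")
        # Index of the last edge that is <= k.
        counts[bisect_right(lower_edges, k) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    # Normalizing by the total turns counts into frequencies that sum to 1.
    return [c / total for c in counts]


# Hypothetical bin edges: [0, 5), [5, 10), ..., [40, 45), 45 or more.
lower_edges = list(range(0, 50, 5))
# Hypothetical daily contact numbers for eight individuals.
sample = [2, 3, 7, 12, 50, 46, 8, 1]
frequency = normalized_frequency(sample, lower_edges)
```

With this normalization the y axis is independent of the number of respondents, which is why the values do not read as head counts.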
In figure 2B, X represents the susceptibility. We have rephrased the sentence citing this figure to make this clearer. Besides, we also discussed this measurement in the Materials and Methods section.
Lastly, in figure 2C, as explained in the caption, the y axis represents the relative difference in the number of infected individuals between the results obtained using each method and the homogeneous setting. Thus, it is the difference between the number of infected individuals using one method and using the homogeneous mixing over the value of the homogeneous mixing. In other words, we are using the relative difference.
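As a minimal sketch of the quantity on the y axis of figure 2C as described in this reply, the relative difference can be computed as follows. The function name and the numbers are hypothetical, and the factor of 100 is an assumption based on the "(%)" in the axis label.

```python
def relative_difference_percent(infected_model, infected_homogeneous):
    """Relative difference (%) with respect to the homogeneous-mixing result.

    Computes 100 * (I_model - I_homogeneous) / I_homogeneous, so a positive
    value means the model yields more infected individuals than the
    homogeneous mixing setting.
    """
    if infected_homogeneous == 0:
        raise ValueError("homogeneous-mixing baseline must be non-zero")
    return 100.0 * (infected_model - infected_homogeneous) / infected_homogeneous


# Hypothetical values: 1200 infected in one setting, 1000 under homogeneous mixing.
example = relative_difference_percent(1200.0, 1000.0)
print(example)  # 20.0
```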
3. Some results in the figures do not seem sufficiently reliable. For example, generally, human contacts are symmetric and reciprocal. However, the gray plot in Fig. 1C looks asymmetric.
We are not sure whether we understood this comment by the reviewer. Please note that, as we mention in the text, the more general situation corresponds to asymmetry. Indeed, figure 1C is asymmetric. This was explained in the paragraph starting in line 176. These matrices would be symmetrical only if the demographic distribution of the population were homogeneous. Suppose that we have a group of 2 individuals, each of them having one link towards a third individual who constitutes its own group. The matrix representing these interactions would be M = ((0, 1), (2, 0)), written row by row, so that M_{1,2} = 1 != M_{2,1} = 2. Yet, the relation M_{1,2} N_1 = M_{2,1} N_2 will hold, due to the different sizes of the groups (N_1=2, N_2=1). This was explained in equation (2) and line 180.
Fig. 3B, what is the definition of R0 according to the age and more importantly how is this age-dependent R0 obtained. Are the results in Fig. 3B