A mixed-method approach to determining contact matrices in the Cox’s Bazar refugee settlement

Contact matrices are an important ingredient in age-structured epidemic models to inform the simulated spread of the disease between subgroups of the population. These matrices are generally derived using resource-intensive diary-based surveys, and few exist for the Global South or are tailored to vulnerable populations. In particular, no contact matrices exist for refugee settlements, locations under-served by epidemic models in general. In this paper, we present a novel, mixed-method approach for deriving contact matrices in populations, which combines a lightweight, rapidly deployable survey with an agent-based model of the population informed by census and behavioural data. We use this method to derive the first set of contact matrices for the Cox’s Bazar refugee settlement in Bangladesh. To validate our approach, we apply it to the UK population and compare our derived matrices with well-known contact matrices collected using traditional methods. Our findings demonstrate that our mixed-method approach successfully addresses some of the challenges faced by traditional and agent-based approaches to deriving contact matrices. It also shows potential for implementation in resource-constrained environments. This work therefore contributes to a broader aim of developing new methods and mechanisms of data collection for modelling disease spread in refugee and internally displaced person (IDP) settlements and better serving these vulnerable communities.


Introduction
Epidemics such as COVID-19 have led to devastating consequences for afflicted individuals and their societies. Understanding how such infectious diseases spread, anticipating future trajectories for transmission, and gathering evidence to inform decision-making efforts to prevent, mitigate and respond to epidemics is therefore of vital importance. Mathematical and computational models to simulate disease spread are regularly used to support these efforts. Contact matrices are key to understanding social mixing patterns in populations, and a vital input to epidemiological models 1,2. Despite renewed efforts to develop such models, additional work must be done to ensure they are available to all 3.
In this paper we present a new method for determining contact patterns based on combining the information gained from increasingly sophisticated models of disease spread with that from lightweight surveys which can be rapidly rolled out to populations of interest. We attempt to provide information on contact patterns without requiring the traditional, costly methods of contact data collection. Specifically, we will focus on the use case of the Cox's Bazar refugee settlement in Bangladesh. Epidemics in refugee and internally displaced person (IDP) settlements are commonplace and tend to spread rapidly 4, yet only very few models have been designed to simulate outbreaks in these unique environments and to inform public health decision-making 3. Given the application domain, we believe this is not just an important area in which to contribute knowledge about disease spread patterns, but also a challenging test case which demonstrates the strengths of our methodology.
Throughout this work we will use the JUNE-COX model 5, an agent-based model built on the JUNE framework 6. The model constructs a virtual population at the level of individual residents within a digital twin of the Cox's Bazar settlement. Interactions are simulated between the agents (the virtual residents) in a number of "venues" or "locations" that include shelters, food distribution centres, market places and learning centres. We use the information from the lightweight survey to guide these interaction patterns based on the demographics of the agents attending the venues contemporaneously.
The contact matrices encode information on the number and duration of contacts between people of one age group and another, and are usually specific to certain venues or locations in which people interact. There are various types of matrices which can be used both separately and combined, including (i) one-directional contact matrices NCM 7, which count the (normalised) number of contacts a person in category i has with a person in category j, (ii) bi-directional reciprocal matrices NCM_R 7, which also add the number of contacts people in category j have with persons in i, and (iii) venue contact matrices NCM_V 8,9, which assume that every person at venue L has contact with everybody else present. In this article, we will discuss an approach to estimating all three types of matrices.
Traditionally, contact matrices are derived using large-scale surveys in which participants record the number of contacts they have in different locations and the ages of the people they came into contact with. Additional metadata is sometimes collected, such as the intensity of the contact (e.g. physical or non-physical) and the duration of each individual contact. Surveys of these types have predominantly been run in the Global North, with comparatively few serving countries in which many particularly vulnerable communities reside 10. Indeed, to date and to our knowledge only one work has published contact matrices for an IDP settlement 11, and no such work exists on contact matrices in refugee settlements. While such traditional methods of collecting contact data may be considered the gold standard, they are extremely resource-intensive, and therefore cannot be run easily during an ongoing outbreak. As an alternative to these expensive direct means of contact data collection, several other methods have sought a more indirect approach. Using the information from existing contact surveys conducted in 8 European countries 7, and knowledge of the underlying demographic structures in these populations, Prem et al. 12 used a Bayesian hierarchical model to project these matrices onto those of 144 countries, given similar demographic data and underlying similarities between each of these countries and the original 8 selected in the direct data collection. This has recently been expanded to 177 countries 13.
Similarly, census/demographic data have also been used to construct synthetic populations which are then used to estimate contact matrices. Fumanelli et al. 8 use such data from 26 European countries to construct representative synthetic household, school, workplace and 'general community' environments and then assume that each individual in each setting has a single contact with every other member. This has been extended to 35 countries, while also incorporating finer-grained data to develop more representative virtual populations 14. The same approach is used by Xia et al. 15 for the setting of Hong Kong. While such approaches are beneficial as they do not require the expensive collection of long-term contact survey data, they are limited by the assumption that different venues contain static populations and that within-venue mixing is homogeneous.
By combining demographic data with data sources such as time use surveys 16,17 or transportation surveys 9, stochastic approaches (e.g. agent-based models) have been developed to capture a broader variety of mixing patterns in populations. These approaches expand on those described above by exploring many permutations of possible within-venue mixing patterns. Despite this, these methods still present similar limitations to those described above. Namely, in the absence of any prior information on interaction patterns, it is largely assumed that each agent contacts every other agent in those venues. As a partial remedy to this challenge, disease data is commonly used to fit integer multipliers to these matrices. While this is generally a necessity to be able to forecast disease spread even when using directly collected contact data 18, due to differences between disease transmission routes this may not resolve the errors at the matrix element level. Indeed, the output of this process does not provide an understanding of the base level of contacts, but rather a set of contact matrices for each disease. This limits the usefulness and generalisability of such matrices in comparison to corresponding matrices from directly collected data.
In this paper, we seek to contribute at two levels: i) We develop a methodology which addresses the challenges above by taking a mixed-method approach to deriving contact matrices. It combines techniques of extracting contact matrices from sophisticated agent-based models with information derived from a lightweight survey designed to inform and validate the model-derived matrices, while being significantly less expensive to run than the traditional large-scale contact surveys. ii) We use this new approach to present, to the best of our knowledge, the first contact matrices for a refugee settlement. Because of their use in different types of models, the matrices need different normalisations: either to the full population, as in the case of location-unspecific simple compartment models of the SEIR type, or to the part of the population actually visiting a venue. We will therefore present results for all three types of contact matrices, for a variety of locations, either normalised to the overall population, "P" type contact matrices (PNCM, PNCM_R and PNCM_V), or to the actual users of a location, "U" type matrices (UNCM, UNCM_R and UNCM_V).
This work therefore also contributes to the global call to action laid out in prior work, which aims, among other things, to develop new methods and mechanisms of data collection for modelling disease spread in refugee and IDP settlements 3.

Methods
The goal of our method is to construct location-dependent social contact matrices with a high level of granularity without resorting to detailed contact surveys. We achieve this by fitting the (virtual) contact matrices of an individual-based model, constructed from higher-resolution demographic data of the population, to the real-world results from lightweight surveys with a much lower resolution. The resolution and accuracy implicit to the model allow us not only to infer the highly granular contact matrices, but also to give a first estimate of the associated uncertainties. In the following we further detail this procedure and exemplify it with the construction of social contact matrices for the residents of the Cox's Bazar refugee settlement.

The Survey
The level of detail accessed by surveys in refugee camp settings is often heavily constrained by resource considerations (timing, number of enumerators, need for rapid results etc.), and the highly aggregate contact survey we ran in the Cox's Bazar refugee settlement between October and November 2020 is no exception. During this period, the settlement was continuing to experience cases of COVID-19 19. However, reported case numbers were low, and the settlement activity had largely returned to pre-pandemic levels, with the exception that learning centres (schools) remained closed and masks were still being worn 20,21.
The following demonstrates the ability to rapidly run a survey during a public health emergency, in a resource-light way, while producing representative results of the contact patterns which can be used in future studies and modelling works. Although a more intensive survey, such as a diary-based longitudinal study, would provide more precise and accurate data, the ability to perform such a survey may be limited by the number of researchers available or by more practical concerns such as limiting social contacts between members of the community and enumerators during a public health crisis.
The survey underpinning our study was conducted by experienced enumerators from the UNHCR Community Based Protection (CBP) team, following standard UNHCR practices 22,23. Its objective was to collect information on the number of contacts people of different demographics estimate they have with others in different venues they attend during a typical day. The survey considered only three categories of residents, defined by their age: children (< 18 years), adults (≥ 18 and < 60 years), and seniors (≥ 60 years), and we constrained the set of surveyed locations to those contained in the digital twin, JUNE-COX. Data was collected from 22 camps in the Kutapalong-Balukhali Expansion Site (part of the Cox's Bazar refugee settlement). In each camp the respondents were two male and two female residents in each of the three age brackets. In addition, two persons with disabilities were surveyed in each camp, resulting in a total of 22 × 14 = 308 respondents. Details of the survey can be found in Appendix C and the accompanying metadata to the anonymised results 24. The respondents were asked if they attend various venues and, if so, to estimate the number of adults and children they come into contact with there. To avoid skewing results through uncharacteristically long or short times at a venue, the respondents were asked how much time they generally spend at those venues at any given visit, such that the total contacts can be re-scaled to contacts per hour; the JUNE modelling framework normalises the contact matrices to represent the mean rate of contacts per hour. Many of the demographic data underpinning JUNE-COX do not distinguish adults and seniors, and so we combine the data in these two age bins into one "adult" category, thereby arriving at highly aggregate 2 × 2 total contact matrices $t_{ij}$ for the various locations $L$. We use the survey to calculate UNCM_R type matrices for different locations. Here we present the methodology to calculate the different versions of the contact matrices:
1. One-directional contact matrices 7, NCM (UNCM and PNCM): Following the notation in 25, the PNCM are denoted as $M$ with elements $m_{ij}$ defined by $m_{ij} = t_{ij}/n_j$, with $t_{ij}$ the aggregate total number of contacts of $n_j$ survey respondents in category $j$ reported with people in category $i$.
There is a subtle difference to the UNCM with elements $\mu_{ij}$, where the aggregate number of contacts $t_{ij}$ is normalised to the number of actual users of the venue, $\eta_j$: $\mu_{ij} = t_{ij}/\eta_j$. To make contact between the PNCM and UNCM, one therefore merely has to re-normalise to the overall number of respondents in category $j$, $m_{ij} = t_{ij}/n_j = \mu_{ij}\,\eta_j/n_j = \mu_{ij}\,a_j$, where $a_j = \eta_j/n_j$ denotes the attendance rate to the venue in category $j$. This re-normalisation can be performed for any conversion between population-normalised "P" and user-normalised "U" matrices.
2. Bi-directional, reciprocal contact matrices 7, NCM_R (UNCM_R and PNCM_R): Following, again, 25, the PNCM_R are denoted by $C$ and their elements $c_{ij}$ are defined in Eq. (1), where the $w_{i,j}$ are the overall population sizes in categories $i$ and $j$. This motivates the notion of these matrices being normalised to the overall population. While these matrices are suited to compartment models, their application in individual-based models may lead to unwanted results. As an example, consider the case of contacts between adults and children in school settings, assuming that this is meant to primarily capture the contact of teachers and pupils. Normalising the number of contacts to the overall adult population size would obviously lead to a massively reduced average number of contacts compared to a more correct normalisation to the number of teachers in the respective age bins. We therefore define the user-normalised contact matrices UNCM_R, $\Gamma$, with entries $\gamma_{ij}$ defined in Eq. (2), where the $\omega_{i,j}$ denote the actual users attending the venue, i.e. $\omega_i = w_i a_i$. In fact, since we resolve the random movement of individuals to distinct locations in JUNE, we use the $\Gamma$ instead of the $C$ that are more relevant for compartment models. However, we also present results for the population-normalised PNCM_R, which can be obtained from the $\Gamma$ by simple rescaling by the attendance factors $a_i$ and $a_j$.
3. Isotropic venue contact matrices, NCM_V (UNCM_V and PNCM_V): Due to the lack of attendance data we cannot directly derive such matrices, $v_{ij}$ and $\nu_{ij}$, from the survey. However, they can be determined virtually.
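The two conversions described in items 1 and 2 can be illustrated with a minimal Python sketch. It rescales a user-normalised matrix to its population-normalised counterpart via $m_{ij} = \mu_{ij} a_j$ and then applies a reciprocity correction. The function names and all numerical values are hypothetical. The form of the correction, $(m_{ij} w_j + m_{ji} w_i)/(2 w_j)$, is an assumption derived from the convention stated above (respondents in category $j$ report contacts with people in category $i$); it is not necessarily the exact expression used in Eqs. (1) and (2).

```python
def user_to_population(mu, attendance):
    """Rescale user-normalised entries to population-normalised ones.

    m[i][j] = mu[i][j] * a[j], with a[j] the attendance rate of the
    reporting category j (column index).
    """
    return [[mu_ij * attendance[j] for j, mu_ij in enumerate(row)] for row in mu]


def reciprocal(m, w):
    """Assumed reciprocity correction.

    m[i][j]: contacts per person in category j with people in category i.
    w[k]:    number of people in category k (population or venue users).
    The total contacts seen from both sides are averaged and then
    expressed per person in category j.
    """
    n = len(w)
    return [
        [(m[i][j] * w[j] + m[j][i] * w[i]) / (2.0 * w[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2 x 2 example (index 0: children, index 1: adults).
mu = [[2.0, 1.0], [4.0, 3.0]]   # user-normalised contacts per hour
attendance = [0.5, 0.25]        # attendance rates a_j
w = [100, 300]                  # population sizes w_j

m = user_to_population(mu, attendance)
c = reciprocal(m, w)
```

By construction the corrected totals are symmetric, i.e. `c[i][j] * w[j]` equals `c[j][i] * w[i]`. Passing the venue users $\omega$ in place of $w$ would give the user-normalised analogue under the same assumption.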
Finally, we comment on our treatment of the uncertainties in the survey results. Given the small survey sample size, we right-censor the data at the level of the 90th percentile and perform a bootstrap analysis 26 to determine the median number of contacts between subgroups, $\mu_{ij}$. We assume the uncertainty of this value, $\Delta\mu_{ij}$, to be well estimated by the standard error of the bootstrap distribution. From $\Delta\mu_{ij}$ it is straightforward to derive the uncertainty, $\Delta\gamma_{ij}$, of the reciprocated matrices; we assume the error in the contacts is dominated by the error from the reported number of contacts per hour at a venue. We take $\omega_{i,j}$ in Eq. (2) as an exact quantity from the survey.
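A minimal sketch of such an uncertainty estimate, using only the Python standard library, is given below. It makes several assumptions that are not specified in the text: right-censoring is read as capping values at the 90th percentile, the percentile is computed with linear interpolation, and the number of resamples, the seed and the sample data are hypothetical.

```python
import random
import statistics


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_median(values, q=90.0, n_resamples=2000, seed=0):
    """Return (median, standard error) for one matrix entry.

    The standard error is taken as the standard deviation of the
    bootstrap distribution of medians.
    """
    rng = random.Random(seed)
    cap = percentile(values, q)
    capped = [min(v, cap) for v in values]  # assumed reading of right-censoring
    medians = [
        statistics.median(rng.choices(capped, k=len(capped)))
        for _ in range(n_resamples)
    ]
    return statistics.median(medians), statistics.stdev(medians)


# Hypothetical reported contacts per hour for one (i, j) pair, with one outlier.
reports = [1, 2, 2, 3, 3, 3, 4, 5, 6, 50]
mu_ij, delta_mu_ij = bootstrap_median(reports)
```

Capping before resampling limits the influence of a single very large report on the bootstrap distribution, which is the motivation given above for censoring a small sample.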

The Model
For the construction of the digital twin and simulator we use an existing individual-based model, JUNE-COX 5, specifying the original JUNE modelling framework 6 to the demographics of the Cox's Bazar refugee settlement. (Note that the original application of the JUNE framework was to model the spread of COVID-19 in the UK and we will refer to this UK-specific specification as JUNE-UK.) Both JUNE-UK and JUNE-COX use census data to create a virtual population at the individual level, with JUNE-COX specifically focusing here on the Kutapalong-Balukhali Expansion Site of Cox's Bazar. The census data of its population is organised according to a geographical hierarchy: the ∼600,000 residents are distributed over the 21 camps ("regions") which make up the Kutapalong-Balukhali Expansion Site (in reality there are 22; however, we combine Camp-20 and the Camp-20 extension given data availability constraints). These contain between 2-7 UNHCR Admin level-2 blocks ("super areas") comprising ∼5000 people, which in turn are composed of sub-blocks ("areas") with 90 households on average. The geographical distribution of individuals and their households is explicitly incorporated in the model through the geo-locations of the area centres. For a more complete description of how we distribute individuals into households see Appendix E and the original work describing JUNE-COX 5.
After the individuals are created and clustered into households, JUNE-COX constructs different venues in the settlement given their latitude and longitude coordinates: food distribution centres, non-food distribution centres (including LPG distribution centres), e-voucher outlets, community centres, safe spaces for women and girls, religious centres, learning centres, hand pumps and latrines. To simulate the movement of individuals in the settlement we decompose each calendar day into discrete time-steps in units of single hours. JUNE uses calendar days to distinguish weekday and weekend activity profiles where certain venues will be closed. Many individuals have fixed, static activities, such as the 4 hours at the learning centres for enrolled children and the adults specified as teachers. There is also a fixed 14-hour night-time period, during which everyone returns to their shelter. However, the remaining time is free and people are distributed dynamically. Each person not otherwise occupied (e.g. working, or at a medical facility) is assigned a set of probabilities for undertaking other activities in their free time in the model. These probabilities are part of our social interaction model, and depend on the age and sex of the person (Figure 1). They are based on previously collected data capturing daily attendance rates and coarse estimates of the proportions of adult/child and male/female attendance (see previous work for details on these calculations and associated data sources 5) and have been further augmented by a series of interviews with CBP officials, as detailed in Appendix D.
Given $N$ possible activities with associated probabilities per hour given by $\lambda_1, ..., \lambda_N$, for a person with characteristic properties $p$, the overall probability, $P$, of an individual being involved with any activity in a given time interval $\Delta t$ is modelled through a Poisson process:

$$P = 1 - \exp\left(-\sum_{i=1}^{N} \lambda_i(p)\,\Delta t\right). \qquad (3)$$

If the individual participates in at least one of these activities, the specific activity $i$ is selected according to

$$P_i = \frac{\lambda_i(p)}{\sum_{j=1}^{N} \lambda_j(p)}, \qquad (4)$$

and the person is moved to the relevant location. If no activity is selected, the individual will stay in their shelter. One of the outcomes of this exercise is condensed in Figure 1, which shows the likelihoods that men and women attend the different venues in the model as a function of their age.
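The two-step draw can be sketched in a few lines of Python. The sketch assumes that the probability of doing any activity in a time-step is $1 - \exp(-\sum_i \lambda_i \Delta t)$ and that, given participation, activity $i$ is chosen with probability proportional to $\lambda_i$. The function name, the activity labels and the rates are hypothetical and are not taken from JUNE.

```python
import math
import random


def choose_activity(rates, dt, rng):
    """Pick an activity for one person and one time-step.

    rates: dict mapping activity name -> rate per hour (lambda_i) for this person.
    dt:    length of the time-step in hours.
    Returns the chosen activity name, or None if the person stays in the shelter.
    """
    total = sum(rates.values())
    p_any = 1.0 - math.exp(-total * dt)  # probability of doing any activity
    if rng.random() >= p_any:
        return None
    names = list(rates)
    weights = [rates[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical hourly rates for one person.
rng = random.Random(42)
rates = {"community_centre": 0.2, "distribution_centre": 0.1}
draws = [choose_activity(rates, 1.0, rng) for _ in range(20000)]
stay_fraction = draws.count(None) / len(draws)
```

With these illustrative rates the fraction of time-steps spent in the shelter should be close to $\exp(-0.3)$, and the community centre should be chosen roughly twice as often as the distribution centre.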
It is important to stress that such census and demographic data is by default recorded by UNHCR and other non-governmental organisations (NGOs) operating in refugee and IDP settlements, and it can be further supplemented or clarified by the survey described above or by interviews with settlement staff. This implies that it is a relatively straightforward exercise to apply the procedure outlined here to other settlements.

A Mixed-Method Approach
We have now set the stage to combine the information about the aggregate contact patterns with our highly detailed model of interactions in a representative virtual population and to interrogate the model and extract detailed, survey-informed matrices. JUNE uses stochastic methods to simulate contacts between members of the virtual population which can be used to construct synthetic CMs. The random behaviour of the virtual population is encoded in repeatedly sampling the $\gamma_{ij}$ from a Poisson distribution, $\hat{\gamma}_{ij} \sim \mathcal{P}(\kappa_{ij})$, with the argument $\kappa_{ij}$ distributed according to a normal distribution, with the $\gamma_{ij}$ and their uncertainty taken from the survey and re-scaled by the ratio of the typical time people attend a location, $T$, and the size of the emulation time-step in the model, $\Delta T$. Finally, we statistically round the individual instances $\hat{\gamma}_{ij}$ to integer values. The resulting emulated set of $\hat{\gamma}_{ij}$ are normalised such that they represent an individual's contacts per hour. Averaging generates the $\hat{\gamma}_{ij}$ which can be directly compared with the $\gamma_{ij}$ obtained from the survey.
In the simulation we aim to perform a virtual survey on the virtual population, as close as possible to the conditions in the real-world lightweight surveys. We sample individual behaviour over 28 virtual days to obtain individual $\hat{\gamma}_{ij}$'s every time a person attends a venue. The venues are filled according to the probabilities described above, Eqs. (3, 4), and we "measure" the total raw contacts $\hat{t}_{ij}$ (see Algorithm 1 in Appendix F) in the simulation. To further ensure the correct total expected attendance time at the virtual venues compared with the real world, we proportionally close venues to approximate their possible fractional opening times.
This procedure allows us to directly compare the resulting matrices $\hat{t}_{ij}$, $\hat{\gamma}_{ij}$ and $\hat{c}_{ij}$ with their real-world counterparts $t_{ij}$, $\gamma_{ij}$ and $c_{ij}$ above. Moreover, we are not constrained to the creation of virtual 2 × 2 contact matrices only, but can infer matrices for any sub-classification $i$ and $j$ that our simulation allows; in the results we present here, the $i$ and $j$ are age brackets of size 1 year. The final types of contact matrix, PNCM_V and UNCM_V, $\hat{v}_{ij}$ and $\hat{\nu}_{ij}$, can also be calculated with a minor modification to the algorithm that counts the averaged total contacts per hour, Algorithm 1. Instead of generating a list of people $p_j$ at the venue in contact with each person $p_i$, we allow "democratic/isotropic" contacts of all people. For each entry, $ij$, the resulting count represents the total contacts the $\hat{\eta}_i$ people with characteristics $i$ at the venue have with the population of the venue in each subgroup. A Kronecker-$\delta$ corrects for "self-contacts".
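The isotropic counting can be illustrated with a short Python sketch. It assumes that the total contacts for entry $ij$ take the form $\hat{\eta}_i(\hat{\eta}_j - \delta_{ij})$, which is one reading of the description above rather than a reproduction of Algorithm 1; the function names and head counts are hypothetical.

```python
def isotropic_contacts(eta):
    """Total 'everyone meets everyone' contacts at one venue.

    eta[k]: number of people of subgroup k present at the venue.
    Entry [i][j] is eta[i] * (eta[j] - delta_ij); the delta term removes
    the contact of each person with themselves.
    """
    n = len(eta)
    return [
        [eta[i] * (eta[j] - (1 if i == j else 0)) for j in range(n)]
        for i in range(n)
    ]


def per_user(total, eta):
    """Contacts per attending person of subgroup i (row-wise normalisation)."""
    return [
        [t_ij / eta[i] if eta[i] else 0.0 for t_ij in row]
        for i, row in enumerate(total)
    ]


# Hypothetical venue with 3 children and 2 adults present.
eta = [3, 2]
total = isotropic_contacts(eta)
nu = per_user(total, eta)
```

Summing all entries of `total` gives $N(N-1)$ for $N$ people present, as expected when every person contacts every other person exactly once in each direction.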

Results
In this section we present the results of the contact matrices derived from our mixed-method approach. We begin by validating our method in the context of the UK, where we compare our results against contact patterns directly collected by a traditional survey 25. Once our method has been validated, we present the matrices for the Cox's Bazar refugee settlement. Throughout, we use several key metrics to determine the similarity between any two sets of matrices:
1. Normalised Canberra distance, $D_C$ 27, where $C$ and $C'$ represent two contact matrices we wish to compare, Dim denotes the number of elements, $\mathrm{Dim}(C_{n \times m}) = n \cdot m$, and $Z$ is the number of non-zero elements of the difference $(C_{ij} - C'_{ij})$;
2. $Q$ index as a measure of assortativity 28;
3. Dissimilarity index, $I^2_s$ 29, where $\sigma_p$ is the standard deviation of the ages of the population, and $\langle (S - T)^2 \rangle_{F_c}$ represents the expected squared age difference between contacts $s$ and $t$ under the function $F_c(s,t)$. Here, $\Delta t$ and $\Delta s$ are the age bin sizes from the contact survey.
The normalised Canberra distance gives an estimation of the similarity between two matrices, approaching 0 when they are more similar and 1 when dissimilar. The remaining statistics measure the level of assortativity, i.e. the level of diagonal dominance and therefore the rate at which similar ages interact compared with dissimilar ages. The $Q$ index ranges from 0 (homogeneous, proportionate mixing) to 1 (fully assortative). $I^2_s$ measures the deviation from perfect assortativity, with a value of 0 when fully assortative and 1 for homogeneous interactions.
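To make the behaviour of the first two metrics concrete, the following Python sketch implements one common form of each. Both forms are assumptions: the Canberra distance is normalised here by the number of element pairs that are not both zero, which may differ from the normalisation via Dim and $Z$ used in this work, and the $Q$ index is taken as $(\mathrm{Tr}(P) - 1)/(n - 1)$ with $P$ the row-normalised matrix. Function names and test matrices are hypothetical.

```python
def canberra_normalised(a, b):
    """Mean of |a_ij - b_ij| / (|a_ij| + |b_ij|) over pairs that are not both zero."""
    terms = []
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            denom = abs(x) + abs(y)
            if denom > 0:
                terms.append(abs(x - y) / denom)
    return sum(terms) / len(terms) if terms else 0.0


def q_index(c):
    """Assortativity index (Tr(P) - 1) / (n - 1) for a square matrix with n >= 2.

    P is obtained by dividing each row of c by its row sum, so that
    identical rows (proportionate mixing) give 0 and a diagonal matrix gives 1.
    """
    n = len(c)
    trace = 0.0
    for i, row in enumerate(c):
        row_sum = sum(row)
        if row_sum > 0:
            trace += row[i] / row_sum
    return (trace - 1.0) / (n - 1)


# Hypothetical 2 x 2 matrices illustrating the limiting cases.
diagonal = [[2.0, 0.0], [0.0, 5.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
```

In this sketch a fully diagonal matrix has $Q = 1$ and a uniform matrix has $Q = 0$, while two matrices with disjoint non-zero entries are at distance 1 and identical matrices at distance 0, matching the ranges described above.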

UK Validation
The first step of our virtual survey validation is to compare our results with those of real surveys conducted at far greater granularity. JUNE-UK has had extensive tuning for COVID-19 modelling in the UK 6,18,30. As a proof-of-concept, we focus on the most complex contact matrix, that of the household, and compare the contact matrices produced by the simulation with those from a traditional diary-based survey 25. The input contact matrix is constructed from a combination of this data, the Office for National Statistics (ONS) census data of UK households 31,32 and UK population demographics 33. Since the UK census for household types distinguishes children (kids, K, <18 years old), young adults such as students or other dependent residents (Y, assumed 18-25 years old in JUNE-UK), adults (A, assumed 26-65 years old), and older adults (O, assumed >65 years old), we aggregated the granular contact matrix derived from the survey into a significantly coarser 4 × 4 matrix mapping the census categories. We also corrected for different household types to better incorporate the details of the venue-specific heterogeneities in their demographic composition. For more details on this procedure, see specifically Section 4 and Appendix C of the original description of the JUNE-UK modelling setup 6. The results in Figure 2 show the 4 × 4 input matrix derived from the aggregation process described above and a comparison of the output of the PNCM_R $\Gamma$ from the JUNE-UK model virtual contact survey with the results of the matrix $C$ from the traditional survey. Corresponding results for workplace and school settings can be found in Appendix B.
This provides a closure test ensuring that JUNE-UK returns realistic contact matrices from coarse aggregate matrices. Clearly, our mixed-method approach is able to reproduce the broad structure of the real-world data, especially capturing the patterns of contacts between children and their parents represented in the off-diagonal structures. The original survey did not contain information on the contacts of younger children due to constraints on the data collection methodology; our method is able to fill this gap. To further validate our approach, we compare the $Q$, $I^2_s$ and $D_C$ metrics of the two matrices. Table 2 shows that the first two metrics are in close agreement, with the overall Canberra distance being close to 0, thereby confirming the similarity of the matrices. Indeed, the differences between the measures of assortativity are comparable to or better than those found in similar studies which do not make use of the guiding input aggregate matrix as we do here 17. Given these strong findings, together with the visual and structural similarities of the matrices, we consider our mixed-method approach to be reasonably validated for application to settings in which intensive survey-based approaches to deriving contact patterns are not feasible. For real-world applications, we note that our methodology is clearly not exactly reproducing the original surveys; however, users will have to decide whether these errors are acceptable in comparison to having little or no knowledge about contact patterns, or making necessary assumptions about these patterns. It is also worth noting that the virtual agent behaviours in JUNE-UK are much better informed than those in JUNE-COX. This will become clear in the disparity between NCM, NCM_R and NCM_V type contact matrices. The PNCM_V matrices presented in Figure 14 and the PNCM_R matrices in Figures ??, 13 have the same general shape and scaling of features. In the case of the JUNE-COX derived matrices, the NCM, NCM_R and NCM_V types are less similar.

Contact Matrices in Cox's Bazar Refugee Settlement
The lightweight survey in the camp was conducted across the following venues: "community centres", "distribution centres", "e-voucher outlets" and "formal education centres". For the remaining two venues, "play groups" and "shelters", we assume that everyone generally mixes with everyone else in that location given the assumed small groups of children who play together, as well as the dense shelter environments. Since certain shelters are shared between multiple families, we differentiate intra- and inter-family mixing, with the former being represented by the diagonal elements of the aggregate matrix (i.e. setting these to the number of contacts within each of the two families in the shelter, and with the off-diagonal elements set to the number of contacts between the families). As discussed in previous work 5, we set the number of contacts within the families or play groups to the average size of these respective groups, assuming homogeneous mixing in these settings. In the case of the play groups we disaggregate the population into three age groups, 3-6, 7-11 and 12-17, which mix homogeneously to emulate children typically interacting with children of similar age. We report the results for the UNCM_R $\gamma_{ij}$ of the prior information and of the survey in Figs 3 and 4. We also perform a closure test by comparing them to the UNCM $\hat{\mu}_{ij}$ results from performing a similar survey in JUNE-COX with the same coarse population categories. In the two figures we use the shorthand "T" and "S" for teachers and students in the learning centres, and "H_x" for household x in a shared shelter. Once we have determined the UNCM and confirmed that their stochastic uncertainties are within the uncertainties of the input interaction matrices, we can perform any custom binning for arbitrary group characteristics. Fig. 5 shows the final fully [...] patterns are defined differently for adults and children leading to attendance differences at 18.
Secondly, at years of age men are permitted to attend the religious centres. Due to the high rate of attendance observed at the religious centres, there is a drop in attendance at other non-religious centre venues of this age group relative to other age groups. The corresponding UNCM and UNCM_V can be found in Appendix A. In Fig. 5, we can clearly see the effects of the different age groups and guiding contact rates. For example, we observe large differences in the number of contacts between all age groups with adults in the community centres relative to the distribution centres, with substructures based on the age profile of children attending these locations shown through the higher number of contacts in younger age brackets. In addition, the learning centre matrices show a clear mix of contacts between children in their mixed classes and their teachers; this matrix also encodes information on the enrollment rate of children in the education system, with lower enrollment rates as the age of children increases. Finally, the detailed information available on household and shelter composition appears in the shelter contact matrix, which contains a number of interesting features. We reconstruct a strong leading diagonal which represents persons of similar ages living together (siblings, parents and grandparents of similar ages); the width of the band reflects spousal age gaps and minimal age gaps between consecutive siblings. Using more detailed information about the average age of parents at the birth of their first child we also develop off-diagonal structure in the upper left and lower right quadrants. There exists an almost linear structure corresponding to children and parents interacting and aging together. This structure then tapers off, indicating interactions in multi-generational households before many children would leave home at around 18. The details of the household construction and the statistics that define it can be found in Appendix E.
A simpler approach is to just assume that everyone contacts everyone else in these dense settings in the absence of other information; we also present the results for the corresponding UNCM$_V$ in Appendix A, Figure 9. However, there is clearly a significant loss of information in doing this in comparison to the mixed-method approach, as can be seen in the absence of structural detail in many of the UNCM$_V$ matrices.
Population normalised matrices can be calculated from the user normalised matrices with a simple re-scaling as described above. We present these in Appendix A, Figures 10, 11 and 12, for completeness and because each normalisation has different utility in different model types.
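As an illustration of this re-scaling, the following is a minimal sketch rather than the JUNE implementation. It assumes that a user normalised entry is the time-normalised number of contacts divided by the number of venue users in group i, and that the population normalised entry divides instead by the total population of group i, so the two differ by the factor users_i / population_i. The function name and all numbers are hypothetical.

```python
def uncm_to_pncm(uncm, venue_users, population):
    """Rescale a user normalised matrix to a population normalised one.

    Assumption: UNCM[i][j] = contacts_ij / users_i and
    PNCM[i][j] = contacts_ij / population_i, hence
    PNCM[i][j] = UNCM[i][j] * users_i / population_i.
    """
    pncm = []
    for i, row in enumerate(uncm):
        if population[i] == 0:
            # No one in this group: define the row as zero.
            pncm.append([0.0 for _ in row])
            continue
        scale = venue_users[i] / population[i]
        pncm.append([value * scale for value in row])
    return pncm


# Hypothetical example with two groups (children, adults).
uncm = [[4.0, 1.0], [2.0, 3.0]]
venue_users = [50, 200]   # people in each group who attend the venue
population = [100, 400]   # total people in each group
pncm = uncm_to_pncm(uncm, venue_users, population)
```

Here half of each group attends the venue, so every population normalised entry is half of the corresponding user normalised entry.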

Discussion
To the best of our knowledge, the matrices presented in this paper are the first contact matrices derived for a refugee settlement. While not collected using traditional survey methods, we use a mixed-method approach for their calculation, which presents a new way to collect contact data. This is particularly useful in settings, such as refugee settlements, in which data collection can present many challenges and therefore needs to be lightweight and integrated into existing data collection regimes and programming.
We are able to perform closure tests on the contact matrices we derive and show that combining a lightweight survey with an agent-based model has great potential to provide deeper insights into social environments. The survey and JUNE-COX derived contact matrices are initially validated by a comparison of their Canberra distances over the survey subgroups $i$, $j$. These Canberra distances are found to be very close to zero, with the exception of the e-voucher outlets, in which child-child contacts are higher in JUNE-COX than in reality. This discrepancy can be explained by considering that the survey has a high uncertainty in the expected child-child contacts, an uncertainty which JUNE-COX incorporates into the contact tracking algorithm.
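For reference, the Canberra distance between two matrices sums |a - b| / (|a| + |b|) over all entries. The sketch below is a minimal illustration with hypothetical numbers; treating entries where both matrices are zero as contributing nothing is a common convention that we assume here, and any further normalisation used for the reported values is not reproduced.

```python
def canberra_distance(a, b):
    """Canberra distance between two equally shaped matrices (lists of rows)."""
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            denominator = abs(x) + abs(y)
            if denominator == 0:
                # Assumed convention: a 0/0 term contributes nothing.
                continue
            total += abs(x - y) / denominator
    return total


# Hypothetical 2x2 matrices over the coarse subgroups (children, adults).
survey = [[2.0, 1.0], [1.0, 4.0]]
simulated = [[2.0, 3.0], [1.0, 4.0]]
distance = canberra_distance(survey, simulated)
```

Only the child-adult entry differs in this example, so the distance equals |1 - 3| / (1 + 3) = 0.5; identical matrices give exactly zero.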
Further validation is performed with JUNE-UK derived matrices on age-disaggregated contact matrices, for which we are able to use other statistics such as $I^2_S$ and $Q$. These matrices were found to be in good agreement with other, more intensive contact surveys. This validation ensures that the combination of coarse input contact matrices and the attendance rates responsible for agent dynamics yields representative contact patterns over all ages.
In the case of refugee settlements, the derived contact matrices can be used to understand the social contact patterns using data already collected regularly by international organisations such as UNHCR, while being supplemented by data which can easily be collected by enumerators in a resource-efficient way. The highly detailed matrices derived for the Cox's Bazar settlement demonstrate clear inter-age mixing patterns, which are crucial inputs to other epidemic models to represent realistic social mixing. In particular, clear features are present in the matrices due to differing attendance rates and household compositions.
From the technical perspective, there are several further considerations and limitations to this methodology that become apparent when analysing the full age-disaggregated contact matrices (Fig. 5 and Appendix A). These pertain to the way in which the data is collected and the model is constructed, and can be used as ways to diagnose the performance of the method:

Subgroup classification:
Subgroup classification refers to the broad definition of subgroups defined in the model. Throughout JUNE-COX and JUNE-UK we define "Adults", "Children", "Teachers", "Workers" etc., which all have unique parameters and rules governing their behaviour. Subgroups defined by age can lead to strong banding artifacts in the contact matrices. These effects can be mitigated by blurring the age cut-off with some finite probability, e.g. that a child of 17 may behave like an adult. This mitigation should only be implemented in situations in which we are certain that there should not be a discontinuity in behaviours in the real world. For example, only men over 11 years old are permitted to attend the religious centres and hence we expect a cut-off in the contact matrices, whereas in many other venues we expect a gradual shift in behaviour as children move into adolescence and then adulthood. This can be a positive feature of the model, i.e. that the model represents the behavioural and movement patterns correctly and forces agents to make a choice between activities they perform as they would in real life; however, this relies on reasonable behavioural data, insights and assumptions. This is demonstrated most clearly in the shelters contact matrices, in which the household clustering places adults and children differently based on fixed rules derived from survey and census data (see Appendix E).
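One way such a blurred cut-off could be realised is sketched below. This is an illustrative assumption, not the rule used in JUNE: the linear ramp, the `width` parameter and the function names are all hypothetical. Setting `width=0` recovers a hard cut-off, as would be appropriate for a venue with a strict age rule.

```python
import random


def adult_probability(age, cutoff=18, width=2):
    """Probability that an agent of this age follows adult behaviour rules.

    Assumed form: a linear ramp from 0 at (cutoff - width) to 1 at
    (cutoff + width); width=0 gives a hard threshold at the cut-off.
    """
    if width <= 0:
        return 1.0 if age >= cutoff else 0.0
    p = (age - (cutoff - width)) / (2 * width)
    return min(1.0, max(0.0, p))


def behaves_as_adult(age, rng, cutoff=18, width=2):
    """Draw once whether this agent is treated as an adult."""
    return rng.random() < adult_probability(age, cutoff, width)


rng = random.Random(0)
# With the default ramp, a 17 year old behaves as an adult 25% of the time.
example = behaves_as_adult(17, rng)
```

With the default values a 16 year old is always treated as a child and a 20 year old always as an adult, so the discontinuity at 18 is replaced by a gradual transition over four years.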

Virtual venue demography:
The dynamics of virtual spaces in the simulation are dictated by the probabilistic attendance rates (see Figure 1) and age cut-offs. The attendance rates are a function of age, sex, time and venue, which leads to different demographies across the virtual spaces and therefore different social mixing behaviours. Again, due to the nature of the simulation, in which we have strict probabilistic rules which determine the attendance of different subgroups (children, adults, age or sex etc.), we can get strong divisions between groupings. This is shown by the discontinuities in the heat-map representation of the contact matrices. In particular, only men over the age of 11 are permitted to attend the religious centres, leading to discontinuities in the religious centre contact matrices. In JUNE-UK, there is no simulation of parent-teacher interactions at school that might occur during pick-up or drop-off times, and the virtual school setting strictly models student-teacher interactions, where any teacher-teacher interactions would be restricted to the classroom setting. Further, no children attend any work place settings, and agents can only be employed or attend a work place venue between the ages of 18-65. The contact matrices produced from JUNE-UK therefore lack certain features shown by the BBC Pandemic project. However, this is a problem that all approaches relying on an imperfect virtual representation of reality can experience.

Virtual world rules and behaviour patterns:
The combination of the above points leads to complex, inter-connected behaviours across the simulation. Considering the behaviour of coarser subgroups across all venues, we see more general behaviours emerge. For instance, children are less likely to attend any virtual venue than adults, and men are more likely to attend any venue than women: attendance at religious centres increases the overall rate at which men are away from the shelters compared to women, leading to an asymmetry in the shelter contact matrix. Similarly, an 11-18 year old is more likely to see a 6-11 year old than the converse. A 6-11 year old is more likely to be at home than an 11-18 year old; therefore, on average, in any timestep a 6-11 year old will not contact an 11-18 year old in the shelters, but when the 11-18 year old is at home they will likely contact the 6-11 year old. The normalisation of contacts by users (or population) and contact duration (as done throughout) makes this effect visible. There are other instances, such as the community centres, in which we see a banding effect which is an artifact induced by the movement criteria of the agents in the model (see Figure 1). The high attendance rate expected of 11+ men at the religious centres leads to a reduction in attendance of this group across all other venues, and many of the contact matrices show a banding effect between 11 and 18 due to this behaviour.
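The shelter asymmetry can be reproduced with a toy calculation. The sketch below is a deliberately simplified assumption, not the JUNE-COX model: one younger and one older child share a shelter, each is at home independently with a fixed, hypothetical probability per timestep, and a contact occurs whenever both are at home. Contacts are then normalised by the number of timesteps in which each child is present, mimicking the user normalisation.

```python
import random


def simulate_shelter(p_home_young, p_home_old, steps=100_000, seed=0):
    """Return user normalised contacts (young -> old, old -> young)."""
    rng = random.Random(seed)
    young_present = old_present = contacts = 0
    for _ in range(steps):
        young = rng.random() < p_home_young
        old = rng.random() < p_home_old
        young_present += young
        old_present += old
        # A contact is recorded only when both are at home together.
        contacts += young and old
    return contacts / young_present, contacts / old_present


# Hypothetical at-home probabilities: the younger child is home more often.
young_to_old, old_to_young = simulate_shelter(0.8, 0.4)
```

Under these assumptions the older child meets the younger one in roughly 80% of the timesteps they spend at home, while the younger child meets the older one in only about 40% of theirs, which is the asymmetry seen in the normalised matrix.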
Given the level of detail contained within the model-derived contact matrices, they have the ability to reveal potential shortcomings in both the survey setup and the modelling of the virtual world, as they reflect how sophisticated and well understood each venue type is. This means that the resources needed to collect more data on certain locations in order to improve certain matrices can be estimated, and traded off against the resources available and the relative expected gain from their expenditure. In this work, we validated our contact tracker in two very different models, JUNE-UK and JUNE-COX. In the former, we demonstrated that the NCM$_V$ and NCM$_R$ agree well with data collected using traditional methods, cf. Tables 2 and 3. In the latter, NCM$_V$ type contact patterns are not available, as our extracted contact matrices used coarse survey information on venue attendance to inform the simulation of contact patterns there, with the notable exception of the shelters, which are relatively precisely captured by the census data. Our mixed-method approach allows us to partially compensate for the gaps in detailed understanding of demographic structures at the lesser-known venues.

Conclusion
In this work we demonstrate the complementary power of a lightweight contact survey, approximate details about venues and their attendance rates by different demographic groups, and an agent-based model to generate detailed social contact matrices. In the case of the Cox's Bazar refugee settlement, we use an existing model of the settlement developed using the JUNE framework to perform a virtual contact survey, which is informed by the highly aggregated real-world survey, to produce more granular contact matrices which can be further interrogated. Our constructed contact matrices will provide an important input to future disease spread modelling or social dynamic studies in the settlement, and provide a baseline which can be translated to other settlements as well. Further, our method can easily be adapted to other settings for which detailed contact matrices are not available, thereby enabling the use of disease models in contexts where large assumptions about contact patterns would previously have had to be made. Contact matrices form the backbone of many disease models, and so calculating them at a global scale, with the specific inclusion of those groups who are often most vulnerable to disease spread, is essential 3 .

Code and Availability
• JUNE and JUNE-UK: The current public release of the JUNE simulation framework, and by extension the latest version of the JUNE-UK model, can be found at https://github.com/IDAS-Durham/JUNE
• JUNE-COX: The current public release of the JUNE-COX epidemic model can be found at https://github.com/UNGlobalPulse/UNGP-settlement-modelling

A Contact Matrices
Here we present the remaining contact matrices derived from JUNE-COX 3.
The derived input interaction matrix, UNCM$_R$, for "Companies" and "Schools", where the labels "W", "S" and "T" refer to workers, students and teachers, respectively. Center: The simulated age-binned PNCM$_R$ matrix with entries $\hat{C}_{ij}$ from JUNE-UK. Right: The BBC Pandemic project "all home" contact matrix, $C$, with entries $c_{ij}$.

A.1 UNCM V Interaction
Company School

C Survey
The survey was conducted between October and November 2020 by enumerators from the UNHCR Community Based Protection team, who regularly conduct surveys within the settlement following standard UNHCR practices 22,23 . Data was collected from 22 camps in the Kutupalong-Balukhali Expansion Site (part of the Cox's Bazar refugee settlement), with 2 men and 2 women surveyed per camp in each of the following age categories: < 18 years; ≥ 18 and < 60 years; ≥ 60 years. In addition, 2 persons with disabilities were surveyed per camp, to make a total of 308 respondents. Anonymised results, and additional metadata, can be accessed through UNHCR 24 .
The survey was conducted by enumerators randomly sampling households in each camp and visiting them in person. Only one respondent per household was permitted, and responses were collected using the Kobo Toolbox 34 , based on the Open Data Kit 35 . The survey was formatted as follows (italicised text is spoken): This questionnaire has been designed by teams from United Nations Global Pulse and UNHCR to inform efforts to better understand how people move around in the camp and interact with others, in order to better understand how COVID-19 might spread in the camp and to inform future COVID-19 protection measures.
Good day, my name is ____ from UNHCR and I am here to conduct a survey. This study is part of a scientific research project from United Nations Global Pulse and UNHCR. In this study, we will ask questions to better understand how people move around in the camp and interact with others. Your decision to complete this study is completely voluntary, and you may decline to answer at any time. Your answers will be completely anonymous. The results of the research may be presented at scientific meetings or published in scientific journals. For any questions or comments please contact: ____. The survey should not take longer than 30 minutes.

D Questions for the CBP team
To supplement our analysis, a series of informal interviews were conducted with members of the Cox's Bazar refugee settlement UNHCR Community Based Protection (CBP) team. In each of these interviews, a set of general enquiries into behaviour and attendance rates was put to members of the protection team who worked closely with those venue types.
This questionnaire has been designed by teams from United Nations Global Pulse and UNHCR to inform efforts to better understand how people engage with each venue in the camp and the demography of the venues.
For the following venues: Community centres, Female friendly spaces, Food distribution centres, E-voucher outlets, Non-food distribution centres (including LPG and blanket centres), Religious centres, and Learning centres. Where you are able and suitably informed, please could you answer the following questions:

1. Can you describe what a day looks like at venue ____?
   - How many people do you expect at minimum and peak times?
   - How do these days and numbers of people vary by day, week, month/season?
   - Why do you think there are these variations?

2. What is the makeup of multigenerational households: are there generally three generations or more?
   - Do these households include extended family?
   - Is this a cultural issue or a space constraint?

3. What age do children typically move through the camp independently?
   - Move out from their parents' shelter?
   - Go to venues on their own? (e.g. collect items from the distribution centres for their shelter)
   - How many hours do they spend moving around in the camp independently?
   - Who do they mostly have contact with when they move around? (e.g. more children, teachers at school, other adults, all?)

4. What time do venues close?

3. The mean age of mother at birth of first child.
These properties are all known at the super-area level unless specified otherwise. The resulting household demographic structures can be seen in Figure 16 and shelter sizes in Figure 17. The age brackets for each demographic are inferred from survey and data from the settlement: Children [0-18], 18 being the age at which marriage is legal for women (21 for men); Adults [18-49] […]

F Algorithm for the virtual survey

Algorithm 1: The virtual survey. Loop over all venues and people and simulate $P_{\mathrm{contacts}}$ between $i$ and $j$ subgroups from survey. The contacts can then be clustered into arbitrary subgroups $k$, $l$. We allow for multiple contacts between the same people at venue $L$.

Data: $t^L_{ij} = [0]_{kl}$
for $L \in$ Venues do
    for $P_x \in$ People @ $L$, $P_L$ do
        $i$ = subgroup($P_x$)

Figure 1. The mean likelihood to attend certain venues in any weekday 2 hour timestep interval by age for Left: Men, Right: Women.

Figure 2. Left: The derived input interaction matrix, UNCM$_R$, for "Households". Center: The simulated age-binned PNCM$_R$ matrix with entries $\hat{C}_{ij}$ from JUNE-UK. Right: The BBC Pandemic project 25 "all home" contact matrix, $C$, with entries $c_{ij}$.

Figure 3. Two pairs of UNCM$_R$ for the virtual venues determined prior to the lightweight survey (Left) and the JUNE-COX virtual survey in the same coarse population bins (Right), including the Canberra distance between them.

Figure 4. The UNCM$_R$ from the contact survey data (Left) and JUNE-COX virtual survey UNCM (Right), with the relative Canberra distances. We set "Community centres" and "Distribution centres" identical to "Female friendly spaces" and "Non-food distribution centres", respectively.

Figure 14. The normalised venue contact matrices (PNCM$_V$) by age as simulated in JUNE-UK.

Figure 16. Proportion of household types. Note that not all of these groups are mutually exclusive. Green represents the reconstruction in JUNE and Blue the reported data (if available). Those groups where data was unavailable are reported in the figure for completeness.

Table 1. The CM are time normalised, the various UNCM are further normalised by population at the venues, while the PNCM are instead normalised by the total population.

Table 2. Contact matrix statistics calculated for a range of contact matrix types from JUNE-UK and the BBC Pandemic project.

• Details and calculation at https://github.com/UNGlobalPulse/UNGP-contact-survey
• JUNE-UK Household contact matrix: Details and calculation at https://github.com/IDAS-Durham/june_household_matrix_calculation
• Contact Matrix Results: Our contact matrices are reported at https://github.com/IDAS-Durham/june_mixed_method_CM_results, formatted as Excel documents for convenience.
• Data: The data from the survey is available by application at https://microdata.unhcr.org/index.php/catalog/587

Table 3. Contact matrix statistics calculated for JUNE-UK and the BBC Pandemic project, reported for company and school mixing. These statistics are calculated for the UK demography reported by ONS in 2011 33 .
1. • If adult: Do you declare that you are at least 18 years of age and that you agree to complete this survey voluntarily?
   • If child:
     - To parent or guardian: Do you declare that you are at least 18 years of age, that you are the parent or guardian of this child and that you give consent for your child to complete this survey voluntarily?
     - To child: Do you declare that this is your parent or guardian and that you give consent to complete this survey voluntarily?
2. Sex: Female, Male, Other, Do not want to answer
3. Location at the time: ____ (camp)
4. Age: under 18, over 18 but under 60, over 60
5. Disability: Y/N
6. Do you have access to a face mask? Y/N
7. When the learning centres were open, did you attend any formal education? Y/N

(a) When you go to a food distribution center, how much time do you spend there? 30 minutes, 1 hour, 1 hour and 30 minutes, 2 hours, other (please specify)
(b) When you go to a food distribution center, approximately how many children do you come into contact with at the center (for example, talk to)?
(c) When you go to a food distribution center, approximately how many adults do you come into contact with at the center (for example, talk to)?
(d) When you go to the food distribution center, do you wear a mask in the center?

(a) When you go to a community center, how much time do you spend there? 30 minutes, 1 hour, 1 hour and 30 minutes, 2 hours, other (please specify)
(b) When you go to a community center, approximately how many children do you come into contact with at the center (for example, talk to)?
(c) When you go to a community center, approximately how many adults do you come into contact with at the center (for example, talk to)?
(d) When you go to a community center, do you wear a mask in the center?

15. Do you ever go to a religious meeting? Y/N
16. • If yes:
(a) When you go to a religious meeting, how much time do you spend there? 30 minutes, 1 hour, 1 hour and 30 minutes, 2 hours, other (please specify)
(b) When you go to a religious meeting, approximately how many children do you come into contact with at the meeting (for example, talk to)?
(c) When you go to a religious meeting, approximately how many adults do you come into contact with at the meeting (for example, talk to)?
(d) When you go to a religious meeting, do you wear a mask in the meeting?
17. (a) When you go to a water pump or latrine, how much time do you spend there? 30 minutes, 1 hour, 1 hour and 30 minutes, 2 hours, other (please specify)
(b) When you go to a water pump or latrine, approximately how many children do you come into contact with (for example, talk to)?
(c) When you go to a water pump or latrine, approximately how many adults do you come into contact with (for example, talk to)?
(d) When you go to a hand pump or latrine, do you wear a mask?

(49 being chosen to provide a realistic age gap for potential grandparents: twice the average mother-child age gap, 22.43 years, plus the average spousal age gap, 4.73 years). [49-100] for old adults, the remaining ages in the camp. JUNE-COX over-clusters children into single-parent households; this is due to any remaining children being randomly clustered into households with adults after the households of children with couples are constructed. The microscopic properties of the clustered households are summarised in Figures 16 to 18.

        […] 1
    end
    for $j \in L$ subgroups do
        Generate $\gamma^L_{ij}$
        if $\gamma^L_{ij} = 0$ then
            continue
        else
            Generate randomly a list of $P_{\mathrm{contacts}}$ of $\gamma^L_{ij}$ people at $L$ in subgroup $j$ not including $P_x$
        end
    end
    for $P_c \in P_{\mathrm{contacts}}$ do
        $k$ = subgroup($P_x$)
        $l$ = subgroup($P_c$)
        $t^L_{kl} / T_L$
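The virtual survey loop can be sketched in Python for a single venue as follows. This is an illustrative reading of Algorithm 1 under stated assumptions and is not the JUNE contact tracker: the function name, the tuple layout for people, the dictionary form of the interaction matrix and the use of stochastic rounding to "generate" the number of contacts from its mean are all hypothetical choices. Contacts are drawn with replacement, so the same pair may be counted more than once, as the algorithm allows.

```python
import random
from collections import defaultdict


def virtual_survey(people, gamma, duration, rng):
    """Tally contacts at one venue and time-normalise them.

    people:   list of (person_id, coarse_group, fine_group) at the venue.
    gamma:    dict mapping (i, j) to the mean number of contacts a person
              in coarse group i has with coarse group j.
    duration: time spent at the venue (T_L).
    Returns a dict mapping (k, l) fine-group pairs to contacts per unit time.
    """
    tally = defaultdict(float)
    by_group = defaultdict(list)
    for person in people:
        by_group[person[1]].append(person)

    for person_id, i, k_bin in people:
        for j, members in by_group.items():
            mean = gamma.get((i, j), 0.0)
            # Assumption: stochastic rounding of the mean to an integer.
            n_contacts = int(mean) + (rng.random() < mean - int(mean))
            candidates = [p for p in members if p[0] != person_id]
            if n_contacts == 0 or not candidates:
                continue
            # With replacement: repeated contacts with one person are allowed.
            for _, _, l_bin in rng.choices(candidates, k=n_contacts):
                tally[(k_bin, l_bin)] += 1
    return {pair: count / duration for pair, count in tally.items()}


# Hypothetical learning centre: one teacher "T" and two students "S".
people = [(0, "T", "adult"), (1, "S", "child"), (2, "S", "child")]
gamma = {("T", "S"): 2, ("S", "T"): 1, ("S", "S"): 1, ("T", "T"): 0}
result = virtual_survey(people, gamma, duration=2.0, rng=random.Random(0))
```

The coarse groups "T" and "S" play the role of the survey subgroups $i$, $j$, while the fine groups stand in for the arbitrary bins $k$, $l$ (for example single years of age) into which the tallied contacts are clustered.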