Using data from ‘visible’ populations to estimate the size and importance of ‘hidden’ populations in an epidemic: A modelling technique

We used reported behavioural data from cisgender men who have sex with men and transgender women (MSM/TGW) in Bangalore, mainly collected from ‘hot-spot’ locations that attract MSM/TGW, to illustrate a technique to deal with potential issues with the representativeness of this sample. A deterministic dynamic model of HIV transmission was developed, incorporating three subgroups of MSM/TGW, grouped according to their reported predominant sexual role (insertive, receptive or versatile). Using mathematical modelling and data triangulation for ‘balancing’ numbers of partners and role preferences, we compared three different approaches to determine if our technique could be useful for inferring characteristics of a more ‘hidden’ insertive MSM subpopulation, and explored their potential importance for the HIV epidemic. Projections for 2009 across all three approaches suggest that HIV prevalence among insertive MSM was likely to be less than half that recorded in the surveys (4.5–6.5% versus 13.1%), but that the relative size of this subgroup was over four times larger (61–69% of all MSM/TGW versus 15%). We infer that the insertive MSM accounted for 10–20% of all prevalent HIV infections among urban males aged 15–49. Mathematical modelling can be used with data on ‘visible’ MSM/TGW to provide insights into the characteristics of ‘hidden’ MSM. A greater understanding of the sexual behaviour of all MSM/TGW is important for effective HIV programming. More broadly, a hidden subgroup with a lower infectious disease prevalence than more visible subgroups, has the potential to contain more infections, if the hidden subgroup is considerably larger in size.


Introduction
HIV prevalence has declined in southern India in recent years (National AIDS Control Organisation, 2017b), thought to be attributed in part to reductions in heterosexual transmission during commercial sex (Arora, Kumar, Bhattacharya, Nagelkerke, & Jha, 2008). However, HIV prevalence remains high among gay, bisexual and other men who have sex with men (MSM) and people who inject drugs (Solomon et al., 2019). In fact, in three northern cities of India there is the suggestion of emerging epidemics among MSM as HIV incidence is high in that population, despite low or moderate HIV prevalence in the general population. In India, concepts of gender and sexual role have historically defined different subgroups (Asthana & Oostvogels, 2001), with identity linked to whether they predominantly take the receptive (passive) or insertive (active) role in anal sex (Phillips et al., 2008). We use the acronym MSM to refer to cisgender men who have sex with other cisgender men or transgender women, and the acronym TGW to refer to transgender women who have sex with MSM. Examples of these MSM/TGW subgroups include the following: Hijras are transgender women, predominantly take the receptive role during anal sex and have exclusively male partners. Kothis are cisgender men and predominantly take the receptive role during anal sex. Double-deckers are cisgender men who are more likely to have sex with other men than women (Phillips, Bradley, Lowndes, Anthony, & Alary, 2009) and are versatile (have no preference between insertive and receptive roles) in anal sex. Bisexuals are cisgender men who are predominantly insertive and are thought to prefer sex with men but also engage in sex with women (Phillips et al., 2009). Panthis are cisgender men who engage in sexual activity with other men but do not identify as MSM (although are included within our use of the term MSM). They are thought, more typically, to take the insertive role and to have women as their main sexual partners.
As part of Avahan, the India AIDS Initiative of the Bill & Melinda Gates Foundation, two rounds of integrated biological and behavioural assessment (IBBA) surveys were conducted among MSM/TGW in the city of Bangalore, within Karnataka state in southern India, the first in 2006 and the second in 2009 (Bill & Melinda Gates Foundation, 2008; Brahmam et al., 2008; Karnataka Health Promotion Trust, 2013; Saidel et al., 2008). The majority of the data were collected from MSM/TGW recruited at city 'hot-spot' cruising sites, with over half of those surveyed reporting sex involving payment (Brahmam et al., 2008; Karnataka Health Promotion Trust, 2013; Saidel et al., 2008). Of those surveyed in 2009, 85% reported their identity as being either hijras, kothis or double deckers and 78% of the total sample reported their main sexual partner as being either a panthi or bisexual (Karnataka Health Promotion Trust, 2013). Consequently, panthis and bisexuals are likely to be significantly under-represented in the surveys.
A mapping and enumeration exercise was undertaken prior to the surveys which provided MSM population size estimates (Bill & Melinda Gates Foundation, 2008; Lowndes et al., 2012), but these were considered to be underestimates and predominantly identified hijras and kothis. It is, therefore, unclear how large the population of panthis and bisexuals may be, and how representative the panthis and bisexuals included in the surveys are of their wider population. Such information is important for understanding optimal HIV prevention strategies and whether these should focus predominantly on the 'visible' MSM/TGW (i.e. MSM/TGW captured in the surveys at hot spots), or expand outreach to also include the MSM who are currently more 'hidden' (i.e. panthis and bisexuals that may not have been adequately captured in the surveys). It is thought that a large proportion of MSM are likely to be 'hidden' from society, and do not participate in intervention services (Phillips et al., 2010).
The objectives of this paper are to investigate whether our mathematical modelling and data triangulation techniques can be used to determine if the epidemiological attributes of the panthis and bisexuals captured in the survey are likely to be representative of the broader population of panthi and bisexual MSM (including those 'hidden'), and explore their potential importance for the HIV epidemic.

Model structure
A deterministic dynamic compartmental model of HIV and STI transmission was developed (details provided in Appendix A). In brief, the MSM/TGW were grouped into three subgroups according to their sexual identity and predominant role behaviour (Fig. 1): kothis and hijras (mainly receptive); double deckers (versatile); and panthis and bisexuals (mainly insertive).
HIV infection was divided into four stages, with increased infectivity for the first and last stages. MSM/TGW were assumed to cease sexual activity following the last stage of HIV when they developed AIDS (Wawer et al., 2005). MSM/TGW also leave the model as they cease MSM/TGW sexual activity at rate 1/(duration sexually active as MSM/TGW). Antiretroviral therapy (ART) was modelled as a separate compartment from when it was initially distributed in 2006 (Beattie et al., 2012). Owing to a lack of data on ART coverage for MSM/TGW in Bangalore, we assumed a recruitment rate similar to that used by others (Degenhardt et al., 2010), with 10–50% of eligible MSM/TGW per year being recruited onto ART. Although general population recruitment rates from 2006 to 2009 were lower (3–20%), we expected MSM/TGW to be recruited at a higher rate since a large proportion of the visible MSM/TGW were already linked into HIV care services. Data from Bangalore from that time and soon after also support this range of ART coverage among MSM/TGW (Pickles et al., 2013), although there has been scale-up of ART coverage across India in more recent years (National AIDS Control Organisation, 2017a; Pickles et al., 2013). ART was assumed to reduce HIV transmissibility (Cohen et al., 2011) and increase survival (Wandel et al., 2008) in the model.
Two other STIs (HSV-2 and syphilis), and their interaction with HIV, were incorporated into the model. HSV-2 transmission was modelled dynamically in parallel to HIV, assuming a susceptible-infected structure for this lifelong infection (Becker, 2002). HSV-2 reached endemic equilibrium before HIV was seeded. Syphilis prevalence was assumed to remain constant over time. MSM/TGW infected with either syphilis, HSV-2 or both had an increased probability of acquiring or transmitting HIV (Freeman et al., 2006).
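As a rough illustration of how a susceptible-infected structure with constant population turnover settles to an endemic equilibrium, the following single-group sketch may help. It is not the model described in Appendix A: the single homogeneous group, the parameter names `beta` (effective transmission rate per year) and `mu` (exit rate per year), and all numeric values are assumptions made purely for illustration.

```python
def hsv2_equilibrium_prevalence(beta, mu, i0=0.01, dt=0.01, years=200.0):
    """Euler-integrate di/dt = beta*i*(1 - i) - mu*i for the infected fraction i.

    Everyone who leaves (rate mu) is replaced by a susceptible, so the
    population size stays constant and only the infected fraction changes.
    """
    i = i0
    for _ in range(int(years / dt)):
        i += dt * (beta * i * (1.0 - i) - mu * i)
    return i


# Hypothetical values, not taken from the paper.
beta, mu = 0.5, 0.1
simulated = hsv2_equilibrium_prevalence(beta, mu)
analytic = 1.0 - mu / beta  # non-zero fixed point of the equation above
print(f"simulated {simulated:.4f}, analytic {analytic:.4f}")
```

Under these assumptions the simulated prevalence converges to the analytic value 1 - mu/beta, which is the sense in which an infection can be run to equilibrium before a second infection is seeded.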
As in our prior modelling analysis (Mitchell, Foss, Prudden, et al., 2014), the size of the insertive MSM subgroup was set to balance the total number of insertive and receptive sex acts in the whole MSM/TGW population. The number of receptive sex acts reported by each subgroup were distributed among the three subgroups (as their partners) in proportion to the number of insertive acts reported by each (partner) subgroup. Additional information about how mixing between subgroups was modelled can be found in Appendix B.
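The balancing idea can be sketched as follows. This is a simplified reading of the description above, not the published implementation (see Appendix B for that): it assumes each subgroup has a size, an annual number of sex acts per person and a fraction of acts that are insertive, and it solves for the insertive subgroup size at which total insertive acts equal total receptive acts. The function names, dictionary keys and all numeric values are hypothetical.

```python
def insertive_subgroup_size(n_receptive, n_versatile, acts, frac_insertive):
    """Insertive subgroup size that balances insertive and receptive acts."""
    # Net receptive acts per year (receptive minus insertive) from the
    # receptive and versatile subgroups.
    unmet = 0.0
    for group, n in (("receptive", n_receptive), ("versatile", n_versatile)):
        unmet += n * acts[group] * (1.0 - 2.0 * frac_insertive[group])
    # Net insertive acts per year supplied by one insertive individual.
    surplus = acts["insertive"] * (2.0 * frac_insertive["insertive"] - 1.0)
    if unmet < 0 or surplus <= 0:
        raise ValueError("no non-negative insertive subgroup size balances the acts")
    return unmet / surplus


def partner_mixing(sizes, acts, frac_insertive):
    """Share of receptive acts assigned to each partner subgroup, in
    proportion to the insertive acts that subgroup reports."""
    insertive_acts = {g: sizes[g] * acts[g] * frac_insertive[g] for g in sizes}
    total = sum(insertive_acts.values())
    return {g: a / total for g, a in insertive_acts.items()}


# Hypothetical inputs for illustration only.
acts = {"receptive": 100.0, "versatile": 80.0, "insertive": 50.0}
frac = {"receptive": 0.1, "versatile": 0.5, "insertive": 0.9}
n_ins = insertive_subgroup_size(1000, 500, acts, frac)
sizes = {"receptive": 1000, "versatile": 500, "insertive": n_ins}
mixing = partner_mixing(sizes, acts, frac)
print(round(n_ins), {g: round(v, 3) for g, v in mixing.items()})
```

In this toy example a small insertive surplus per person forces a large insertive subgroup, which mirrors the qualitative mechanism by which the model infers a large 'hidden' population.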
The MSM/TGW population size was assumed to remain constant over time, with those MSM/TGW leaving the model being replaced by new susceptible individuals of the same subgroup. The robustness of the model predictions to population growth was explored in the sensitivity analysis (see subsection 2.4.1).

Data for parameterising and fitting the model
HIV, HSV-2 and syphilis prevalence data were obtained from two IBBAs conducted in urban Bangalore (Bill & Melinda Gates Foundation, 2008). The first survey was conducted in 2006 (n = 307) (Brahmam et al., 2008; Saidel et al., 2008) and the second in 2009 (n = 403) (Karnataka Health Promotion Trust, 2013). Data for behavioural parameters were obtained from the IBBA face-to-face interview questionnaires (2006 and 2009) and a special behavioural survey (SBS) conducted among MSM/TGW in Bangalore in 2006 (Phillips et al., 2008). Table 1 provides the behavioural parameters, and their sources and derivation, with further details in Appendix C. Biological parameter values were obtained from the scientific literature and are given in Table 2.
Estimates of the total MSM/TGW population size for urban Bangalore were based on anonymous polling booth interview data from Karnataka (on "ever" having anal sex with another man), in which participants indicated their response by placing tokens in containers which were not traceable to individuals (Lowndes et al., 2012). The highest estimate (6.6%), from Mysore (the city closest geographically to Bangalore), was used to calculate an upper bound estimate for the total number of men 'currently' engaging in sex with men (Lowndes et al., 2012). No lower bound constraint was assumed; this was simply set to zero. This large range created through data extrapolation from 'ever' to 'currently' MSM also then allowed for the additional subpopulation of TGW. Increases in condom use over time were modelled using retrospectively estimated condom use trends for the years 1998–2006 detailed in Appendix D.

Three approaches to model fitting and parametric uncertainty analysis
Given the uncertainty ranges on the model parameter estimates (Tables 1 and 2), Latin Hypercube Sampling was used to generate five million combinations of parameter input sets, using uniform distributions for all parameters.
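A minimal sketch of Latin Hypercube Sampling with uniform marginals is given below, using only NumPy. The three example ranges are taken from Table 2, but the parameter names, the function name, the seed and the small sample size are assumptions for illustration; the study itself drew five million parameter sets across all parameters in Tables 1 and 2, and its sampling code is not reproduced here.

```python
import numpy as np


def latin_hypercube_uniform(ranges, n_samples, seed=0):
    """Draw n_samples parameter sets with one draw per equal-width stratum
    of each parameter's uniform range."""
    rng = np.random.default_rng(seed)
    samples = {}
    for name, (low, high) in ranges.items():
        # One point inside each of n_samples strata of the unit interval.
        unit = (np.arange(n_samples) + rng.random(n_samples)) / n_samples
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(unit)
        samples[name] = low + unit * (high - low)
    return samples


# Ranges from Table 2; the dictionary keys are hypothetical labels.
ranges = {
    "hiv_transmission_per_act": (0.002, 0.014),
    "condom_efficacy_hiv": (0.74, 0.94),
    "sti_cofactor": (1.2, 5.3),
}
parameter_sets = latin_hypercube_uniform(ranges, n_samples=1000)
print({name: values[:3] for name, values in parameter_sets.items()})
```

Compared with independent random draws, this stratification guarantees that every part of each parameter range is represented, which matters when only a few hundred of several million runs end up fitting the data.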
The reports of the number of partners and role preferences among MSM/TGW sampled in the survey suggest that there were likely to be a greater proportion of MSM/TGW who predominantly took the insertive role in anal sex than suggested by the survey (to 'balance' the high demand for insertive sex from the typically receptive MSM/TGW in the survey sample). We investigated whether a more rigorous insight into the characteristics of these 'hidden' MSM could be gained through comparing three different 'approaches', i.e. hypotheses that motivate different prior assumptions or posterior requirements around parameter values.
Approach 1 assumed that all behavioural and epidemiological data from the IBBAs and SBS for the insertive subgroup were representative of the insertive MSM in the wider population (including the 'hidden' MSM), and so the model was both parameterised and fit to the available data from the surveys. For this approach, model runs were classified as model fits if the HIV and HSV-2 prevalence in all three MSM/TGW subgroups were within the 95% confidence intervals (CIs) for the available IBBA data (Table 2).
Approach 2 assumed that the insertive MSM in the survey were not representative of the 'hidden' insertive MSM in the population overall. To explore the uncertainty in the sexual behaviour of the insertive MSM, the relevant behavioural parameter ranges were expanded, namely the total number of sex acts among insertive MSM and the percentage of their sex acts that were insertive (see Table 1 for details). In addition, no fitting constraints were applied to the insertive MSM subgroup, meaning that model fits were determined as parameter sets which gave HIV and HSV-2 prevalence estimates within the 95% CI of the IBBA data for receptive and versatile MSM/TGW only.
Approach 3 used the same model fits as Approach 2, but added an extra criterion to reject fits if the insertive subgroup had a greater number of sex acts than the receptive subgroup, based on suggestions from previous studies that hijras and kothis have a higher frequency of sex (Brahmam et al., 2008; Phillips et al., 2008).
In all three approaches, no fitting restriction was applied to the proportion of MSM/TGW that were in the insertive subgroup. Instead, the proportion who were insertive was simply determined by the proportions in the other two subgroups and the cap on the total number of MSM/TGW across all three subgroups based on available data as explained in Section 2.2.
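The acceptance rules for the three approaches can be summarised in a short sketch. The data structures, function names and every numeric value below (including the confidence intervals) are hypothetical placeholders rather than the IBBA data, and the sketch reflects only the criteria as described in the text above.

```python
def within_ci(value, ci):
    """True if value lies inside the closed interval ci = (low, high)."""
    low, high = ci
    return low <= value <= high


def is_model_fit(run, ci_data, approach):
    """Apply the fitting criteria of Approach 1, 2 or 3 to one model run."""
    # Approach 1 constrains all three subgroups; Approaches 2 and 3 leave
    # the insertive subgroup unconstrained.
    subgroups = ["receptive", "versatile"]
    if approach == 1:
        subgroups.append("insertive")
    for infection in ("hiv", "hsv2"):
        for group in subgroups:
            if not within_ci(run[infection][group], ci_data[infection][group]):
                return False
    # Approach 3 additionally rejects runs in which the insertive subgroup
    # has more sex acts than the receptive subgroup.
    if approach == 3 and run["acts"]["insertive"] > run["acts"]["receptive"]:
        return False
    return True


# Hypothetical 95% CIs and one hypothetical model run.
ci_data = {
    "hiv": {"receptive": (0.15, 0.30), "versatile": (0.10, 0.25), "insertive": (0.08, 0.19)},
    "hsv2": {"receptive": (0.30, 0.50), "versatile": (0.20, 0.40), "insertive": (0.10, 0.30)},
}
run = {
    "hiv": {"receptive": 0.20, "versatile": 0.15, "insertive": 0.05},
    "hsv2": {"receptive": 0.40, "versatile": 0.30, "insertive": 0.12},
    "acts": {"receptive": 120.0, "insertive": 60.0},
}
print([is_model_fit(run, ci_data, a) for a in (1, 2, 3)])
```

In this example the run is rejected under Approach 1 because its insertive HIV prevalence falls below the (hypothetical) survey interval, but it is retained under Approaches 2 and 3.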

Model analyses
The model fits from each approach were used to estimate the HIV prevalence and incidence among all MSM/TGW subgroups, and their population sizes. The contribution of insertive MSM to the broader epidemic among men was estimated by calculating the ratio of the number of HIV-positive insertive MSM to the number of HIV-positive urban males. The number of HIV-positive urban males is simply the product of the population size of urban males and the HIV prevalence in Karnataka for males aged 15–54, estimated by the National Family Health Survey (NFHS) data to be 0.82% in 2006 (National Family Health Survey, 2007).
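The ratio described above amounts to a one-line calculation, sketched here. Only the 0.82% male prevalence comes from the text; the function name and the example population sizes and insertive prevalence are hypothetical inputs chosen to show the arithmetic, not study estimates.

```python
def insertive_share_of_male_infections(n_insertive, prev_insertive,
                                       n_urban_males, prev_males=0.0082):
    """Ratio of HIV-positive insertive MSM to HIV-positive urban males."""
    hiv_positive_insertive = n_insertive * prev_insertive
    hiv_positive_males = n_urban_males * prev_males
    return hiv_positive_insertive / hiv_positive_males


# Hypothetical inputs: 30,000 insertive MSM at 5% prevalence and
# 1,500,000 urban males at the NFHS prevalence of 0.82%.
share = insertive_share_of_male_infections(30_000, 0.05, 1_500_000)
print(f"{share:.1%}")
```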

Sensitivity analysis
Uncertainty in the rate at which eligible MSM/TGW were recruited onto ART may be a limitation to our study, so we conducted a sensitivity analysis in which this term was set to zero (i.e. no ART) to assess the extent to which the model results were affected by the ART assumptions.
In addition, to explore the robustness of the key findings to changes in model structural assumptions, an independent 'alternative' model was built by a second modeller and compared to the 'baseline' model used by the first modeller. The alternative model did not include STI transmission (as was included in the baseline model), but did incorporate the potential protective effect of male circumcision and population growth (aspects that were not in the baseline model). Further technical details about both models are contained in the Appendices.

Results
Of the five million model runs, 283 model runs fit within the 95% CI of the HIV/HSV-2 prevalence data among all MSM/TGW subgroups for Approach 1. For Approaches 2 and 3, 470 and 445 model runs, respectively, fit within the 95% CIs of the data for the receptive and versatile MSM/TGW subgroups.

Estimating the HIV prevalence of the 'hidden' MSM
For all three approaches, over 98% of model fits suggested the HIV prevalence for the insertive subgroup in 2009 to be below the IBBA 2009 survey-estimated mean HIV prevalence for insertive MSM. This implies that HIV prevalence among insertive MSM in urban Bangalore is lower than among the insertive MSM captured in the survey. Therefore, HIV prevalence among the more 'hidden' MSM is likely to be considerably lower than the prevalence among the insertive MSM in the IBBA sample. Fig. 2 compares the HIV prevalence projections from the model fits for each approach with the IBBA data from the same time point (2009). Approach 1 estimates a median HIV prevalence among the insertive MSM of 6.5% (IQR 5.3–8.3%), while Approaches 2 and 3 give lower median estimates of 4.7% (IQR 3.3–7.1%) and 4.5% (IQR 3.1–6.9%), respectively. This is further illustrated in Fig. 3a through the comparison of HIV prevalence estimates (median, IQR and range) projected by the model for all three MSM/TGW subgroups, for the different model fitting approaches.
Approach 1 used IBBA round 1 data, adding sex acts with known and unknown partners. Lower bound estimate assumed 1 sex act with a man/TGW every six months (as our definition of a sexually active MSM/TGW) (Cáceres et al., 2006). Approaches 2 and 3 assumed the same lower bound as Approach 1, but increased the upper bound to match receptive MSM/TGW to allow for uncertainty.

Table 2
Biological parameter ranges for all three modelling approaches, and epidemiological data used for model fitting.

Biological parameter estimates | Parameter range | References
Start year of HIV epidemic | 1981–1990 | The start date for the HIV epidemic in the MSM/TGW population was based on the first identified case in female sex workers (FSWs) in Tamil Nadu in 1986 (Simoes, Babu, Jeyakumari, & John, 1993), but allows for the fact that the epidemic may have started earlier (Gottlieb et al., 1981) (first reported case in the USA) or later (Bollinger, Tripathy, Quinn, & Thomas, 1995) in Bangalore (high rates of HIV reported in STI clinic patients in 1993/1994 including MSM/TGW).
HIV transmission probability per-anal-sex-act for insertive to receptive MSM/TGW partners | 0.002–0.014 | Estimates of the probability of HIV transmission from an insertive to receptive partner (0.008 per sex act, 95% CI 0.002–0.014) were generated from a systematic review (Boily et al., 2009).
HSV-2 transmission probability per-anal-sex-act for insertive to receptive MSM/TGW partners | 0.00018–0.00074 | Brown et al. (2006)
Efficacy of condoms in protecting against HIV | 74–94% | (Weller & Davis, 2003) multiplied lower bound by IBBA round 2 data on "condom breakage at last sex act" (7.8%) to obtain low efficiency estimate
Efficacy of condoms in protecting against herpes simplex virus type 2 (HSV-2) | … | …
… | 4–14 months | Used (Hollingsworth et al., 2008) in (Wawer et al., 2005)
Multiplicative increase in the probability of acquiring HIV if either partner is infected with an STI (HSV-2 or TP) | 1.2–5.3 | Used 1.2 as lower bound reported in systematic review among MSM/TGW (Freeman et al., 2006). For upper bound, used 5.3 (mean) from systematic review (Boily et al., 2009

Estimating the size of the 'hidden' and total MSM/TGW populations
Approach 1 estimates a median total MSM/TGW population of 44,700 (IQR 37,700–58,600), with the insertive MSM group accounting for 61% (IQR 52–70%) of the total MSM/TGW population. In comparison, the median estimates were 55,600 (IQR 39,100–83,800) for Approach 2 and 58,600 (IQR 41,800–85,600) for Approach 3, with the insertive MSM subgroup comprising 68% (IQR 55–78%) and 69% (IQR 56–79%) of all MSM/TGW, respectively (Fig. 3b). The flexible Approaches 2 and 3 (which allowed a wider range of uncertainty) suggest that there are moderately more 'hidden' insertive MSM than might be inferred from simply parameterising and fitting the model to the survey data on insertive MSM (as was done in Approach 1). Across all three approaches, the model suggests a much larger percentage of MSM/TGW are in the insertive subgroup (61–69%) compared to the IBBA 2009 sample which contained just 15% in this subgroup (Karnataka Health Promotion Trust, 2013).

Comparing the percentage distribution of HIV infections across each MSM/TGW subgroup and the relative magnitude of incidence rates in MSM/TGW subgroups
Modelled prevalence projections for 2009 were used to estimate the overall percentage of prevalent infections occurring among the insertive MSM compared with the receptive and versatile subgroups (Fig. 3c). Approach 1 estimates that 41.4% of prevalent infections were among insertive MSM, the highest among all the MSM/TGW subgroups. However, Approaches 2 and 3, respectively, provide slightly lower estimates of 32.9% and 33.8%, with the highest proportion of infections for these scenarios being amongst the receptive subgroups (49.1% and 48.5%).
The relative magnitude of incidence rates in MSM/TGW subgroups (relative to the insertive group) was also estimated for 2009 through comparing the total annual number of incident infections that year (Fig. 3d). For Approach 1, we project that 3.9 times as many infections occurred in receptive MSM/TGW compared to the insertive group, while the versatile subgroup had 2.2 times more infections. In comparison to the insertive group, Approaches 2 and 3 estimated the receptive MSM/TGW subgroup to have, respectively, 5.5 and 5.6 times more annual infections, while the versatile subgroup had 1.9 times more (by both approaches).

Projecting the contribution of insertive MSM to the broader HIV epidemic in Bangalore
The model was used to estimate the overall prevalence of HIV among MSM/TGW in 2009. The median estimate for Approach 1 was 9.8% (IQR 8.5–11.5%), for Approach 2 it was 8.4% (IQR 6.2–10.5%) and for Approach 3 it was 8.2% (IQR 6.1–10.2%). All three estimates were substantially lower than the overall estimate from the IBBA 2009 data of 17.0% (95% CI 13.4–21.0%). Our modelling results suggest that insertive MSM account for 10–20% of all prevalent HIV infections among urban males aged 15–49.

Sensitivity analysis
Removing ART in our sensitivity analysis had minimal effect on the model findings (increasing the median HIV prevalence by no more than 0.3% for the insertive MSM in all three model approaches).
The key findings, that the insertive MSM subgroup was larger in size but had lower HIV prevalence than suggested by the IBBA sample, were robust across both models, insensitive to assumptions about STIs, circumcision and population growth. In fact, the alternative model projected a larger bias in the empirical estimates, suggesting that the insertive MSM subgroup population size may be even larger and their HIV prevalence even lower than was suggested by the baseline model.

Discussion
We have shown that our mathematical modelling technique can be used with triangulation of survey data to infer characteristics of more 'hidden' MSM and explore their potential importance for the HIV epidemic. Our model projections across all three different fitting approaches suggest that HIV prevalence among insertive MSM (including those 'hidden') was less than half of that recorded in the IBBA 2009 survey (4.5–6.5% versus 13.1%), while the size of the insertive subgroup was over four times larger (61–69% of all MSM/TGW versus 15%). The three approaches also consistently indicated that insertive MSM accounted for 10–20% of all prevalent HIV infections among urban males aged 15–49.

Our findings in context
The survey-estimated HIV prevalence among panthi and bisexual (insertive) MSM found at 'hot-spot' cruising sites was not representative of insertive MSM across Bangalore: our modelling suggests that the survey over-estimated HIV prevalence in this subgroup.
Mapping studies previously conducted in Bangalore for MSM/TGW have tended to only capture receptive and versatile individuals (Karnataka State AIDS Prevention Society, 2012), and no study has attempted to enumerate the total MSM/TGW population size including the mostly 'hidden' insertive group. Social media data from several other countries suggest that official size estimates of MSM/TGW populations are typically underestimates (Baral et al., 2018). Earlier evidence indicated that, in southern Asia, 6–12% of men report ever having had sex with another man in their lifetime and about half as many reported sex with a man in the past year (Cáceres et al., 2006). Using this conversion factor of a half, with polling-booth survey data from other regions in Karnataka estimating that 4–6.6% of males have ever engaged in anal sex with another man (Lowndes et al., 2012), implies that 2–3.3% had sex with a man in the past year: our model estimates closely agree.
Another study, across several different states in India, estimates that about 1 in 10 rural men have had unprotected anal sex with another man in the past year, and that these men also report high numbers of female partners with whom they engage in anal sex as well (Verma & Collumbien, 2004). This suggests a high risk of bridging infections between MSM and the heterosexual population (Kumar et al., 2014; Ramakrishnan et al., 2015; Solomon et al., 2015; Verma & Collumbien, 2004). That said, another study concluded that if MSM/TGW were exposed to HIV intervention programmes, in particular condom distribution and condom demonstrations, then this resulted in safer sexual practices (Mitchell, Foss, Ramesh, et al., 2014). Similarly, a pre-planned, causal-pathway-based modelling analysis indicated that behavioural interventions for female sex workers and MSM/TGW across 24 southern Indian districts averted substantial numbers of HIV infections (Pickles et al., 2013).
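For clarity, the conversion from 'ever' to 'past-year' sex with a man in the paragraph above is simply the stated factor of one half applied to each end of the polling-booth range:

```latex
\[
0.5 \times 4\% = 2\%, \qquad 0.5 \times 6.6\% = 3.3\%
\]
```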

Strengths and limitations
The main limitation to our study is that much of the data and analysis focuses on 2009, meaning that the findings are now less relevant to that local context. Over the past decade there has been increased tolerance and acceptance in India, especially in large cities, although legal reforms have oscillated from decriminalising homosexuality in 2009, making it illegal again in 2013, then legalising in 2018 (Beyrer et al., 2016; Harris, 2013; Subramanian, 2018; Timmons & Kumar, 2009). There has also been much more extensive use of social media to connect for sex (Das et al., 2019).
However, the insights gained here could have broader methodological implications. We have demonstrated a technique in which survey data from a more 'visible' population was used with mathematical modelling and triangulation techniques (for 'balancing' reports of the number of partners and role preferences of different MSM/TGW subgroups), to infer characteristics of a more 'hidden' or less sampled MSM/TGW subgroup.
Modelling techniques such as these could be used in other situations, when there may be a large degree of uncertainty in some sections of a network of disease spread, for example due to missing data or concerns about the representativeness of the data. These techniques add further to the work of others in using imperfect data from multiple sources to infer unknowns indirectly by applying Bayesian evidence synthesis methods (Birrell et al., 2011; Rosinska, Gwiazda, De Angelis, & Presanis, 2016).
The analysis is also somewhat limited by the lack of data about sexual mixing patterns, although previous modelling has shown that, for this setting, HIV prevalence projections are largely unaffected by the underlying mixing assumptions (Mitchell, Foss, Prudden, et al., 2014). Insufficient data on other factors or inconsistent data due to different data collection methods (Phillips et al., 2013) also presented a challenge, leading us to make necessary simplifying assumptions in the model to match data availability (Garnett & Anderson, 1996) or having to obtain some model parameter estimates from nearby districts or state-level data. However, the sensitivity analyses showed that removal of ART or STIs, or incorporation of male circumcision or population growth, made little difference to our main findings.

Recommendations and remaining questions
The findings illustrate the need for further and updated research into the population of 'hidden' MSM. An improved understanding of the population size and sexual behaviour of all MSM/TGW is vitally important for implementing effective HIV interventions to reduce HIV risk among MSM/TGW and onward transmission to their female partners. In fact, a recent paper concluded that underestimations of the size of the MSM population likely limited the impact of their integrated HIV services across 27 Indian sites (Solomon et al., 2019). With most intervention programmes focusing on 'visible' MSM/TGW, the 'hidden' MSM could be considered as a potentially marginalised group that should be carefully considered rather than overlooked.
Could early HIV treatment as prevention or pre-exposure prophylaxis among the 'visible' MSM/TGW be a cost-effective strategy to reduce the risk of transmission to 'hidden' MSM and onward spread of infection to their female partners and the general population, especially since the 'visible' MSM/TGW are more easily reached, at higher risk of acquiring HIV as predominantly receptive, and may be more sexually active (Mitchell et al., 2016)? Alternatively, might such a strategy have limited impact on the broader epidemic if the large population of 'hidden' MSM are not reached directly by these initiatives? It is possible that strategies to reach the 'hidden' MSM may also inadvertently reach lower-risk men since the 'hidden' MSM may not be reachable by MSM/TGW programming, so reducing the efficiency of these strategies (Mitchell et al., 2016).
Others have successfully recruited more panthis (45% of MSM/TGW) through respondent-driven sampling and have advocated to use such methods to increase the representation of these 'hidden' MSM in survey data and intervention reach, although there remain challenges with the implementation of either respondent-driven sampling or time-location sampling and both are still seen as the most appropriate for such populations (MacCarthy et al., 2016). However, a major challenge has been the high levels of stigma and discrimination experienced by individuals engaging in same-sex relationships, limiting their contact with services (Chakrapani, Newman, Shunmugam, McLuckie, & Melwin, 2007). Offering some hope, a recent study suggests that peer mobilisation via social media can reach more 'hidden' MSM for HIV services (Das et al., 2019).

Conclusions
Our analysis shows that mathematical modelling can be combined with data from 'visible' MSM/TGW to provide insights into the characteristics of under-sampled 'hidden' MSM, thus providing a better characterisation of the overall MSM/TGW population. A greater understanding of the sexual behaviour of all MSM/TGW is important for effective HIV programming and for estimating the impact that these programmes may have. There are also important broader public health implications from this work: a hidden subgroup with a lower infectious disease prevalence than more visible subgroups has the potential to contain more infections if the hidden subgroup is considerably larger in size.
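The final point can be stated as a simple inequality. The symbols below are introduced here only for illustration and do not appear in the model: N_h and p_h are the size and prevalence of a hidden subgroup, and N_v and p_v those of a visible subgroup, with p_h < p_v.

```latex
\[
N_h \, p_h > N_v \, p_v
\quad \Longleftrightarrow \quad
\frac{N_h}{N_v} > \frac{p_v}{p_h}
\]
```

As a purely hypothetical example, a hidden subgroup with one third of the prevalence of a visible subgroup would contain more infections whenever it is more than three times as large.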

Declaration of competing interest
The authors declare that they have no conflict of interests.
MSM/TGW also leave all compartments at a rate m_i, as they cease MSM/TGW sexual activity at rate 1/(duration sexually active as MSM/TGW). In the baseline model, all MSM/TGW leaving the model due to cessation of MSM/TGW sexual activity after a period of time (m_i), or through progression to AIDS (g_4 or s), are replaced by d_i MSM/TGW susceptible to both HIV and HSV-2, to maintain a constant population size for each subgroup. In the alternative model, this replacement is also made but there is an additional constant recruitment rate (proportional to the initial total population size) incorporated into d_i to sustain a slowly growing total population in line with data (World Population Review, 2019).
STIs are included in the baseline model only (i.e. there are no STIs in the alternative model). In the baseline model, herpes simplex virus type 2 (HSV-2) infection is modelled simply as MSM/TGW being either susceptible or infected, with the per capita rate of HSV-2 infection being denoted by l_vi. Since HSV-2 is a lifelong infection, once MSM/TGW are infected they remain in this compartment until they cease MSM/TGW sexual activity and exit the model. HSV-2 treatment is not explicitly accounted for in the model; however, the estimate for the HSV-2 transmission probability in the model is low to reflect the likelihood of MSM/TGW receiving treatment which reduces shedding. In the baseline model, the probability of either partner having one or both of HSV-2 or syphilis (TP) is calculated in order to obtain a single 'STI prevalence' value (s_ji). This s_ji is multiplied by an STI cofactor, k, facilitating HIV transmission (through increased susceptibility and/or infectivity). The cofactor therefore describes how the risk of HIV infection is increased if either partner has an STI.
where (with time-dependencies suppressed to simplify notation):
$x_i$ denotes the number of MSM/TGW in subgroup $i = 0$, 1 or 2 (based on their predominant role) who are susceptible to HIV.
$y_{i1}$ denotes the number of MSM/TGW in subgroup $i$ who are in phase 1 (initial high viraemia phase) following initial HIV infection.
$y_{i2}$ denotes the number of MSM/TGW in subgroup $i$ who are in phase 2 (early asymptomatic phase of HIV) with CD4 cell count >200 cells/mm³.
$y_{i3}$ denotes the number of MSM/TGW in subgroup $i$ who are in phase 3 (late asymptomatic phase of HIV) with CD4 cell count <200 cells/mm³.
$y_{i4}$ denotes the number of MSM/TGW in subgroup $i$ who are in phase 4 (late high viraemia phase) prior to AIDS.
$y_{ia}$ denotes the number of MSM/TGW in subgroup $i$ who are on ART.
$d_i$ denotes the number of men entering the MSM/TGW subgroup $i$. In the baseline model, this is set to equal the sum of the number of MSM/TGW leaving that subgroup to maintain a constant population size. In the alternative model, this replacement of those leaving is also made but there is an additional constant recruitment rate incorporated into this term to maintain a slowly growing total population.
$U$ denotes the proportion of MSM/TGW with CD4 cell count <200 cells/mm³ who are initiated on ART per year. (This term was set to zero as part of the sensitivity analysis to explore the extent to which the model results were affected by the ART assumptions).
$s$ denotes the rate at which MSM/TGW cease sexual activity (as they progress to AIDS) when on ART.
$l_i(t)$ denotes the annual per capita force of infection for HIV transmission for MSM/TGW in subgroup $i$, and is given by the following formula for the baseline model:
where:
$c_i$ denotes the number of sex acts per year for a given MSM/TGW subgroup $i$.
$p_i$ denotes the proportion of sex acts that are insertive for a given MSM/TGW subgroup $i$.
$e$ denotes the per sex act efficacy of condoms.
$f$ denotes the proportion of sex acts in which condoms are used.
$r_{ji}$ denotes the proportion of receptive sex acts that MSM/TGW in subgroup $i$ have with subgroup $j$.
$B_{ji}(t)$ denotes the HIV transmission probability from an insertive partner multiplied by the HIV prevalence in that partner population (factoring in different multiplicative cofactors for the increased risk of transmission during the high viraemia stages $y_{i1}$ and $y_{i4}$). This equates to the proportion of partnerships with an insertive partner that will result in HIV transmission (where $j = 0$, 1 or 2 represents the three different MSM/TGW subgroups in the partner population of MSM/TGW in subgroup $i = 0$, 1 or 2).
$b_{ji}$ denotes the HIV transmission probability from a receptive partner multiplied by the HIV prevalence in that partner population (factoring in different multiplicative cofactors for the increased risk of transmission during the high viraemia stages $y_{i1}$ and $y_{i4}$). This equates to the proportion of partnerships with a receptive partner $j$ that will result in HIV transmission.
$s_{ji} = 1 - (1 - u_v)(1 - u_t)$ gives the probability that, within a sexual partnership, at least one individual has an STI, where $u_v$ is the prevalence of HSV-2 and $u_t$ is the prevalence of syphilis. (This term was only included in the baseline model, not the alternative model).
$k$ denotes the multiplicative increase in the per sex act probability of HIV transmission in the presence of either STI (HSV-2 or syphilis). (This term was only included in the baseline model, not the alternative model).
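As a minimal sketch of the combined 'STI prevalence' term defined above, the following Python function evaluates 1 − (1 − u_v)(1 − u_t). The function name and the prevalence values in the usage example are hypothetical and are not taken from the model inputs; the sketch does not attempt to show how the cofactor k enters the force of infection.

```python
def sti_prevalence(u_v: float, u_t: float) -> float:
    """Combined STI prevalence term: 1 - (1 - u_v)(1 - u_t).

    u_v: prevalence of HSV-2 (a proportion between 0 and 1)
    u_t: prevalence of syphilis (a proportion between 0 and 1)
    """
    for name, value in (("u_v", u_v), ("u_t", u_t)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    # Complement of the probability that neither STI is present.
    return 1.0 - (1.0 - u_v) * (1.0 - u_t)


# Hypothetical illustrative prevalences (not values from the paper).
example = sti_prevalence(0.2, 0.1)
```

With these illustrative inputs the term evaluates to approximately 0.28, which is larger than either prevalence alone but smaller than their sum.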
In the alternative model, $l_i(t)$ is given by:
where:
$j$ denotes the per sex act efficacy of circumcision in protecting the insertive partner, set at 60% (Friedman et al., 2016; Sanchez et al., 2011). (This term was only included in the alternative model, not the baseline model).
$u$ denotes the proportion of MSM/TGW who are circumcised, which was set to 0.135 based on Indian data (Morris et al., 2016). (This term was only included in the alternative model, not the baseline model).

Appendix B. Mixing matrix equations
The mixing among MSM/TGW was calculated as per the method described in earlier work (Mitchell, Foss, Prudden, et al., 2014). Using this method, the overall numbers of insertive ($\mathrm{Ins}_i$) and receptive ($\mathrm{Rec}_i$) acts for each subgroup $i$ were calculated as:

$$\mathrm{Ins}_i = N_i c_i p_i \tag{B.1}$$

$$\mathrm{Rec}_i = N_i c_i (1 - p_i) \tag{B.2}$$

where for each identity subgroup ($i = 0$, 1 or 2), $N_i$ denotes population size, and the other parameters are defined above. Balancing was achieved by altering the subgroup size of the predominantly insertive MSM ($N_1$) to give an equal number of insertive and receptive acts across the whole population. If the total numbers of insertive and receptive acts in the whole population balance, then:

$$\mathrm{Ins}_0 + \mathrm{Ins}_1 + \mathrm{Ins}_2 = \mathrm{Rec}_0 + \mathrm{Rec}_1 + \mathrm{Rec}_2 \tag{B.3}$$

Substituting (B.1) and (B.2) into (B.3) and re-arranging gives:

$$N_1 = \frac{N_0 c_0 (1 - 2p_0) + N_2 c_2 (1 - 2p_2)}{c_1 (2p_1 - 1)} \tag{B.4}$$

Mixing probabilities between the different groups were calculated using proportionate mixing (Gupta, Anderson, & May, 1989). The probability that an individual has a receptive act with an individual in group $j$, given that they have a receptive act, is calculated as the number of insertive acts offered by group $j$ divided by the number of insertive acts offered by the whole population:

$$R_{r,j} = \frac{\mathrm{Ins}_j}{\mathrm{Ins}_0 + \mathrm{Ins}_1 + \mathrm{Ins}_2} \tag{B.5}$$

The insertive mixing probability is calculated in a similar way:

$$I_{r,y} = \frac{\mathrm{Rec}_j}{\mathrm{Rec}_0 + \mathrm{Rec}_1 + \mathrm{Rec}_2} \tag{B.6}$$

Appendix C. Additional calculations for the number of sex acts and the proportion of these that are insertive
To estimate the number of sex acts per year, data from the IBBA round 2 (2009) were used, based on responses to questions on the number of sex acts with known and unknown male sexual partners in the past week, summed together and multiplied by the number of weeks per year. Because the data were highly skewed, the median and inter-quartile ranges were used rather than the mean estimate and 95% CIs. In this instance the lower bound for the predominantly insertive population was fixed at two (i.e. one sex act with a man/TGW in a six-month period), assuming that MSM/TGW having sex with other men less frequently than this have negligible effect on the HIV transmission dynamics among MSM/TGW overall.
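The annualisation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code used for the IBBA data: the function names, the use of 52 weeks per year, the 'inclusive' quartile method and the example responses are all hypothetical choices.

```python
import statistics

WEEKS_PER_YEAR = 52  # assumption: the text says only "the number of weeks per year"


def annual_acts(known_weekly, unknown_weekly):
    """Sum weekly acts with known and unknown partners and scale to a year."""
    return [(k + u) * WEEKS_PER_YEAR for k, u in zip(known_weekly, unknown_weekly)]


def median_and_iqr(values, lower_floor=None):
    """Return (median, (lower quartile, upper quartile)) for skewed data.

    lower_floor optionally fixes a minimum for the lower bound, as done in
    the text for the predominantly insertive subgroup (two acts per year).
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if lower_floor is not None:
        q1 = max(q1, lower_floor)
    return med, (q1, q3)


# Hypothetical weekly responses from five respondents (not survey data).
acts = annual_acts([0, 1, 0, 2, 1], [0, 0, 1, 1, 4])
summary = median_and_iqr(acts, lower_floor=2)
```

Using the median and quartiles keeps a few very high reporters from dominating the estimate, which is the motivation given above for avoiding the mean.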
The proportion of all sex acts which were insertive for each MSM/TGW subgroup was estimated using data from the Bangalore SBS. Data on the number of MSM/TGW within a subgroup who have sex with a certain partner type (commercial/non-commercial, known or unknown) were used, provided the sample size of MSM/TGW was greater than ten, along with the number of insertive sex acts, out of the last ten, that were reported with those partner types.
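With the number of sex acts per year and the proportion insertive for each subgroup in hand, the balancing step of Appendix B (equation B.4) and the proportionate mixing of insertive acts (equation B.5) can be sketched as follows. The sketch assumes that a subgroup's insertive acts are the product of its size, its acts per year and its proportion insertive, with receptive acts as the complement, which is the form consistent with (B.4). All function names and numerical values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def balance_insertive_size(n0, n2, c, p):
    """Size of the predominantly insertive subgroup (N_1) from equation (B.4).

    c and p are sequences indexed by subgroup (0, 1, 2): acts per year and
    proportion of acts that are insertive.
    """
    denominator = c[1] * (2 * p[1] - 1)
    if denominator <= 0:
        raise ValueError("subgroup 1 must be predominantly insertive (p[1] > 0.5)")
    numerator = n0 * c[0] * (1 - 2 * p[0]) + n2 * c[2] * (1 - 2 * p[2])
    return numerator / denominator


def act_totals(n, c, p):
    """Insertive and receptive acts per subgroup (assumed form, see lead-in)."""
    ins = [ni * ci * pi for ni, ci, pi in zip(n, c, p)]
    rec = [ni * ci * (1 - pi) for ni, ci, pi in zip(n, c, p)]
    return ins, rec


def proportionate_mixing(acts_offered):
    """Share of acts offered by each subgroup, as in equation (B.5)."""
    total = sum(acts_offered)
    return [a / total for a in acts_offered]


# Hypothetical inputs (not estimates from the paper).
c = [100, 20, 80]    # acts per year for subgroups 0, 1, 2
p = [0.1, 0.9, 0.5]  # proportion of acts that are insertive
n1 = balance_insertive_size(1000, 500, c, p)
ins, rec = act_totals([1000, n1, 500], c, p)
receptive_mixing = proportionate_mixing(ins)
```

In this illustration the insertive subgroup must be five times the size of subgroup 0 for the acts to balance, because its members report far fewer acts per year; this is the mechanism by which the balancing can imply a large 'hidden' insertive population.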

Appendix D. Condom reconstruction
Survey data from 2006 to 2009 (IBBA rounds 1 and 2) on the percentage of MSM/TGW (all subgroups) reporting condom use in their last sex act were sampled to estimate upper and lower bounds for condom use in 2006, which was then applied to all years thereafter. Since condom use was reported to be higher in 2009, we used the upper 95% CI bound as an upper estimate and the lower 95% CI bound from the 2006 survey data as a low estimate. Changes in condom use between 1998 and 2006 were estimated using reconstructed condom use slopes based on previous analysis. Specifically, using the estimated slope, condom use was projected backwards in time until it reached the base condom rate, which was sampled from a range between 0% and the value calculated for 1998 using the condom reconstruction (Fig. D1).
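The backward projection can be illustrated with a short Python sketch. It assumes a linear slope (percentage points of condom use per year, expressed here as a proportion), which is a simplification made for illustration; the function name, parameter names and numerical values are hypothetical and are not the reconstructed estimates.

```python
def condom_use(year, use_2006, slope, base_rate, anchor_year=2006):
    """Proportion of sex acts protected by condoms in a given year.

    use_2006:  sampled condom use in the anchor year, held constant afterwards
    slope:     assumed linear yearly increase in condom use before the anchor year
    base_rate: sampled floor that the backward projection cannot go below
    """
    if year >= anchor_year:
        return use_2006
    # Project backwards along the slope, then stop at the base rate.
    projected = use_2006 - slope * (anchor_year - year)
    return max(projected, base_rate)


# Hypothetical values for illustration only.
trajectory = {y: condom_use(y, use_2006=0.6, slope=0.05, base_rate=0.1)
              for y in (1990, 1998, 2006, 2009)}
```

In this example condom use sits at the base rate in 1990, rises along the slope through 1998, and stays at the 2006 value from 2006 onwards.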