Who is sexually active? Using a multi-component sexual activity profile (MSAP) to explore, identify and describe sexually-active high-school students in rural KwaZulu-Natal, South Africa

Background Understanding sexual activity is necessary to prevent sexually transmitted infections. Evidence from Sub-Saharan Africa suggests that 10–20% of youth aged 15–24 are sexually active before reaching 15 years, yet estimating sexual activity remains challenging. This study explored the use of multiple sexual health outcomes to identify sexually-active young women in rural KwaZulu-Natal, South Africa. Methods Using a multi-component sexual activity profile (MSAP), we aimed to identify sexually active students. Based on data from 2675 grade 9 and 10 students attending 14 high schools in rural KwaZulu-Natal, we constructed a descriptive diagram identifying students who were sexually active by self-report vs MSAP profile. T-tests for two independent samples were performed to compare, by sex and ecological variables, the characteristics of students newly identified as sexually active. Results Using self-report only, 40.3% of students were identified as sexually active, whilst the MSAP identified 48.7% (223 additional students). More females were identified than males. Younger adolescents were more likely to underreport sexual activity but were identified using MSAP. Students newly identified as sexually active were more likely to be female (p < 0.001), 15 years old or younger (p = 0.008), less likely to perceive being at risk (p = 0.037) or have ever used alcohol (p < 0.001). At a relational level, they were less likely to report having ever had a boyfriend/girlfriend (p < 0.001) or to have felt pressured to have sex by their peers (p < 0.001) or partners (p = 0.008). At a familial level they were more likely to be of medium socioeconomic status (SES) (p = 0.037), whilst at a school and community level they were less likely to have repeated a grade (p = 0.024) and were more likely to be engaged in social activities (p = 0.032).
Conclusions The MSAP profile identified more potentially sexually active students, and gave insight into the characteristics of students who may be unwilling to self-report sexual activity. Future work should investigate how this approach could enhance the identification and description of sexually-active adolescents for research and healthcare provision.


Background
Unprotected sexual encounters are the key cause of sexually transmitted infections. Evidence from Sub-Saharan Africa (SSA) suggests that 10-20% of young people aged 15-24 years are sexually active before the age of 15 years [1]. Accurately estimating sexual activity in young people remains challenging: self-report measures tend to underreport sexual activity due to social desirability bias, experimenter effects, and a reluctance to admit early sexual engagement [2][3][4]. However, identifying sexually active adolescents is critical, considering that sexually transmitted infections (STIs) account for a major health burden in this group [1,5,6]. Globally, over 40 million young people are infected with herpes simplex virus (HSV-2) [7] and four million are infected with HIV [6]. Young women in SSA remain the most vulnerable group [6,8,9]; studies from South Africa indicate that school-going women aged 13-24 years already have a 6% HIV prevalence, 10.7% HSV-2 prevalence and 3.6% pregnancy prevalence [10].
Sexual activity is an important and widely used proxy indicator for increased risk of possible HIV, STI infection or pregnancy, especially considering the low levels of condom use amongst young people [9,11]. Research on obtaining better approximations of sexual activity is limited [12]. The use of multiple data-points to get better estimations on sensitive topics has been used in the field of adolescent pregnancy. With similar limitations to self-reported sexual activity data, researchers in the field of adolescent pregnancy have had to use multi-data variable approaches to improve their estimates [3,13]. The researchers used multiple data-point profiles which include reported sexual behaviour, contraceptive coverage from large surveys, and abortion rates in combination with adolescent birth rates to obtain better estimates of actual adolescent pregnancy rates [13,14]. A similar approach of combining several data points could assist in identifying sexually active adolescents.
Using self-reported sexual activity, as well as other markers of possible sexual activity (HIV, HSV-2, pregnancy, STI symptoms and previous pregnancies), could improve the estimation of sexual activity, and in particular of unprotected sexual encounters. Better estimates of sexual activity would make it possible to compare adolescents who, despite living in similar high-risk settings, are sexually active but never experience a negative health outcome with those who do. This has the potential to provide the basis for additional analyses into how individual [9,11,15,16], school [11,17,18,19], partner and peer [20][21][22], familial [23][24][25][26][27], and the broader cultural and geographic level factors [20,21,28] affect risk outcomes. More accurate estimates of sexual activity could also assist in assessing whether adolescents correctly understand their risk of negative health outcomes or underestimate it relative to their actual risk [29], and provide a basis for discussions about how they define being sexually active. Furthermore, the importance of increasing adolescent involvement in clinical trials means that better indicators of sexual activity and of high-risk adolescents are essential.
This paper uses baseline data from a large school-based cluster randomised control trial to describe how a multi-component sexual activity profile (MSAP) can be used to estimate the proportion of sexually-active young people. Additionally, it aimed to understand the differences between those self-reporting sexual activity and those newly-identified using MSAP.

Study setting and population
The analysis uses baseline data collected during a cluster-randomised control trial (CAPRISA 007) that was conducted in a rural part of the uMgungundlovu district of KwaZulu-Natal, South Africa. A secondary data analysis from students attending 14 secondary schools in the district was included in this analysis. Data were analysed from all students in grades 9 and 10 who successfully enrolled. Details of the cohort selection and inclusion criteria are described elsewhere [10,30].

Study design and methods
The CAPRISA 007 study was a two-arm, matched pair, cash incentivised cluster randomised controlled trial that took place in the study schools between 2010 and 2013. The study has been presented and explained in more detail elsewhere [10,30].
All participants provided informed consent prior to being enrolled. Students ≥18 years provided first-person consent following a literacy and comprehension assessment, whilst students < 18 years provided assent, with consent obtained from the parent/guardian. In the event that a parent/guardian was not available, proxy parental consent was obtained from a member of the School Research Support Group (SRSG). Behavioural and demographic data were collected using privately self-completed, structured questionnaires available in isiZulu and English. Biological measures included students' HIV test results, HSV-2 results, and urine pregnancy test results. The details of the measures used in the parent study are discussed elsewhere [10]. All participants received referrals as required. All students consented that their data may be used in additional analyses. This study explores how we estimate sexual activity, and students are not personally identified with regard to incongruencies in reported sexual activity. All ethical approvals were granted by the University of KwaZulu-Natal Biomedical Ethics Committee (BF105/ 010 and BE 523/14).

Measures
We aimed to understand how many students were sexually active using self-reported sexual activity versus a combination of variables, including biological and self-reported behavioural markers, that together form a multi-component sexual activity profile (MSAP). For the purposes of this study, self-reported sexual activity was defined as any learner who reported ever having had vaginal, anal or oral sex. For the MSAP, we included the following biological variables: HIV test results, HSV-2 test results and pregnancy test results. The MSAP also includes self-reported behavioural variables: reported STI symptoms (the symptomatic screening questions used in the South African public health care system, which ask about experience of a vaginal/penile/urethral discharge, a genital sore or ulcer, or pain on urination), previous pregnancy, and having made someone pregnant before. MSAP variables all suggest possible previous unprotected sexual encounters. The purpose was to identify the maximum number of young people who may have had a sexual encounter. However, it is possible that some students were HIV positive through vertical transmission or other modes of transmission (e.g. injecting drug use), or reported not being sexually active if they had engaged in sex through force. We include HIV positive status in the definition because an individual may still pose a risk of infecting others if not virally suppressed. The MSAP variables identified students who self-reported as not sexually active, but whose profile suggested possible sexual activity. The MSAP variables can be easily collected and measured within a research setting [31].
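The classification rule described above can be sketched in a few lines. This is a minimal illustration and not the study's actual code: the record layout, field names and category labels are hypothetical, and the rule assumed here is simply that a student who does not self-report sexual activity but is positive on at least one MSAP variable counts as newly identified.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """One student's responses and test results (all field names are hypothetical)."""
    self_reported_sex: bool                       # ever had vaginal, anal or oral sex
    hiv_positive: bool = False                    # biological marker
    hsv2_positive: bool = False                   # biological marker
    pregnancy_test_positive: bool = False         # biological marker
    sti_symptom: bool = False                     # self-reported behavioural marker
    ever_pregnant_or_made_pregnant: bool = False  # self-reported behavioural marker


# The five MSAP variables named in the text, as attribute names on StudentRecord.
MSAP_MARKERS = (
    "hiv_positive",
    "hsv2_positive",
    "pregnancy_test_positive",
    "sti_symptom",
    "ever_pregnant_or_made_pregnant",
)


def positive_markers(record: StudentRecord) -> list[str]:
    """Return the names of the MSAP variables on which this student is positive."""
    return [name for name in MSAP_MARKERS if getattr(record, name)]


def classify(record: StudentRecord) -> str:
    """Assign a student to one of three groups (labels are illustrative)."""
    if record.self_reported_sex:
        return "self-reported sexually active"
    if positive_markers(record):
        return "newly identified by MSAP"
    return "not identified as sexually active"
```

As the text notes, a positive marker only suggests possible sexual activity: an HIV-positive result may reflect vertical transmission, so the "newly identified" label in this sketch should be read as "potentially sexually active".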

Statistical analysis
Using a staged analysis of the data collected, we constructed a descriptive flow diagram that identified those students who disclosed as sexually-active and those that reported being not sexually active. In order to construct the flow diagram, several steps were followed to ensure we could identify the prevalence of key variables included in the MSAP (See Fig. 1).
The first step was to identify the individuals in the total cohort who had self-reported as either sexually active or not sexually active. The two groups were then split by sex, as we expected to see variation in the MSAP by sex, with young women expected to bear a higher burden of each outcome. Descriptive summaries were generated to obtain the overall prevalence of each variable of the MSAP within each group. HIV, HSV-2 and pregnancy prevalences were obtained using the outcomes of biological tests for each group, while prevalence data for experiencing an STI symptom and previous pregnancy/previously making someone pregnant were generated from self-report data collected through self-administered questionnaires. The prevalence of the biological markers (HIV, HSV-2, pregnancy) and the self-reported behavioural markers (previous pregnancy/previously making someone pregnant and STI symptoms) were then calculated for each sub-group and reported as percentages. The presence of key MSAP variables within the group identifying as not sexually active was used to indicate sexual activity. The overall flow diagram was used to examine the differences in MSAP variables within each group, as well as to highlight differences between male and female students.
Summary statistics of the basic demographic data were generated for the overall cohort of students enrolled in the study as well as the groups identifying as sexually active or not. These variables were reported using medians for continuous variables and as frequency distributions for categorical data. As the data was collected from a cluster randomised control trial it was necessary to adjust for any cluster effects that may arise from the school-based sampling. The unadjusted analysis did not account for the clustering, calculating prevalence by combining prevalence of all schools (clusters). Using the adjusted prevalence estimates from each of the 14 clusters, a t-test for two independent samples was performed to compare by sex, reported sexual activity, and ecological variable [21,32] differences between those newly identified vs self-reported as sexually active. As there were few clusters (< 20) in the main study, regression methods were not completed due to concerns over robustness [33].
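The cluster-level comparison described above can be sketched as follows. The source does not say which variant of the two-sample t-test was used, so this sketch assumes the classical pooled-variance (Student) form; the function names and the input layout are hypothetical. It first reduces individual records to one proportion per school, then compares two sets of school-level proportions.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def cluster_proportions(records):
    """Collapse (school_id, has_outcome) pairs into one proportion per school."""
    counts = defaultdict(lambda: [0, 0])  # school_id -> [positives, total]
    for school_id, has_outcome in records:
        counts[school_id][0] += int(has_outcome)
        counts[school_id][1] += 1
    return {school: pos / total for school, (pos, total) in counts.items()}


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance (an assumed variant).

    group_a and group_b are lists of cluster-level summaries, for example
    one prevalence per school. Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

Working on school-level summaries keeps the unit of analysis equal to the unit of sampling, which is the reason the text gives for adjusting for clustering. The sketch stops at the test statistic; converting it to a p-value requires the t distribution, which a statistics package would normally supply.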

Results
The total grade 9 and 10 school student population in the 14 schools sampled was 3781. The analysis includes a total of 2675 (70.7%) students who provided consent or assent. Of these, 1423 (53.2%) were females and 1252 (46.8%) were males. The flow diagram of identifying the prevalence of variables included in the MSAP for those self-identifying as sexually active versus not sexually active is presented in Fig. 1. In total, 40.3% self-reported as being sexually active, whilst 59.7% of the students reported having never had any type of sex before. A total of 30 students did not complete the question on whether they were sexually active. These 30 students were negative for all variables constituting the MSAP and were excluded.
The basic demographics for the whole cohort, as well as the two groups with different self-reported sexually active status, are reported in Table 1. The overall median age of all students was 16 years (IQR 15-18), with a median age of 17 years (IQR 16-18) for males and 16 years (IQR 15-17) for females. The overall age range of students was 12-28 years for males, and 13-24 years for females. For those students who self-reported as sexually active, the median age of students was 17 years (IQR 16-19) for males and 17 years (IQR 16-19) for females. The age range of self-reported sexually active students was 12-28 years for males, and 13-23 years for females. For those students who self-reported as not sexually active, the median age of students was 16 years (IQR 15-17) for males and 15 years (IQR 15-16) for females. The age range of self-reported not sexually active students was 13-25 years for males, and 13-24 years for females. Overall, young men were more likely to report having been sexually active (50.0%) than their female counterparts (31.8%) (p = 0.001).
Fig. 1 Flow diagram of biological and behavioural variables suggesting sexual activity in adolescent students self-identifying as either sexually active or not sexually active. # Note: Percentage for ever pregnant/ever made someone pregnant in those reporting sexually active was calculated with regard to available data. It should be interpreted with caution as the denominator changed to 73 and 140 for males and females amongst those self-reporting as sexually active.
As reported in Table 1, young women had a significantly higher prevalence in all the key MSAP variables than young men, other than the experience of STI symptoms, where young men experienced a higher self-reported prevalence than women.
When we looked at the perceived risk of getting infected with HIV as a proxy for perceived risk of a negative health outcome, those participants reporting to be sexually active were significantly more likely to think that they were at higher risk (p = 0.025), whilst those who reported as not sexually active were more likely to report being at no risk of future HIV infection (p = 0.001) (Table 2).
Table footnotes: a Adjusted measures were calculated based on the cluster (school)-level summaries appropriate for school-based sampling. b Percentage was calculated with regard to available data. It should be interpreted with caution as the denominator changed to 73 and 140 for males and females that self-reported as sexually active.
Prevalence of key MSAP variables amongst students reporting to be sexually active
Overall, more male students reported having had some type of sex before (p = 0.001): amongst those self-reporting as sexually active, 58.1% were male and 41.9% were female. The overall prevalence of key variables in those reporting to be sexually active was: HIV 5.7%, HSV-2 11.1%, positive pregnancy test 6.3%, self-reported ever made someone pregnant or previous pregnancy 79.8%, and self-reported STI symptoms 14%. The gender difference in the prevalence of MSAP variables persisted, consistent with the gender disparity seen nationally and regionally [6]. HIV prevalence amongst sexually active young women was approximately seven times greater than amongst their male counterparts (11.4% vs 1.6%, p < 0.001). HSV-2 prevalence amongst self-reporting sexually active young women was also significantly higher than amongst young men (21.0% vs 3.9%, p < 0.001). Amongst the key self-reported behavioural markers of sexual activity, the gender disparity continued: significantly more females reported having had a previous pregnancy than young men reported having made someone pregnant before (p < 0.001). Previous pregnancy prevalence for those reporting sexually active should be interpreted cautiously, as a high proportion of these data were missing. For STI symptoms, the difference between the genders was not significant: 13.5% of females reporting to have been sexually active had a symptom of an STI, as opposed to 14.5% of young men. For young women, it appeared that they were engaging in unprotected sex, as indicated by the high pregnancy prevalence (6.3%).
Prevalence of key MSAP variables amongst students reporting to not be sexually active
Overall, fewer young men reported that they had never had sex before (p = 0.001). Of those self-reporting never having engaged in any type of sex before, 39.3% were male and 60.7% were female. For those reporting to have never been sexually active, the overall prevalence of MSAP variables was: HIV 2.8%, HSV-2 4.3%, positive pregnancy test 2.2%, ever having made someone pregnant or previous pregnancy 1%, and self-reported STI symptom 6.5%. The gender disparity in the prevalence of the key indicators of sexual activity remained skewed towards females (Fig. 1) in almost all the variables. HIV prevalence amongst females reporting to have never been sexually active was significantly higher (3.9%) than amongst males (1.3%) (p = 0.001), so that, on these figures, young women had approximately three times the HIV prevalence of their male counterparts. Amongst those reporting not being sexually active, the difference in HSV-2 prevalence was less pronounced between the genders but still significant. In females, the prevalence of HSV-2 was 5.6% compared to 2.3% in males (p = 0.001), with females having about two times greater prevalence than males. Evidence of unprotected sexual encounters was again apparent amongst young females reporting no sexual activity, where there was a 2.2% prevalence of pregnancy. For the key self-reported behavioural markers that could indicate sexual activity, there was no significant difference between the genders. Firstly, the prevalence of ever been/ever made someone pregnant (1.2% of females and 0.5% of males, p = 0.068) suggests possible evidence of unsafe sexual practices, despite reporting to have never had sex. Secondly, amongst those reporting never having had sex, 5.3% of females reported having a positive STI screening symptom. Here the gender disparity was reversed: young men reported more positive STI screening symptoms than females, with 9.3% of young men experiencing at least one STI symptom by self-report. However, the difference was not significant (p = 0.125).

Differences between reported sexual activity and newly identified as sexually active
In total, 59.7% of those students that enrolled self-reported as having never had sex before. We identified 14.1% of these students who appeared to have been sexually active because of the presence of at least one sexual health outcome. Of the 223 sexually active individuals newly identified using the MSAP variables, female students were more likely to under-report sexual activity (p < 0.001) (Table 3). Of the 223 individuals, 20.2% were identified by looking at HIV in those reporting as not sexually active, an additional 27.8% were identified by adding HSV-2, 6.7% by adding pregnancy results, 3.6% by adding ever pregnant/ever made someone pregnant, and the final 41.7% by adding self-reported STI symptoms. Some variables overlapped, and the breakdown of the variables for those newly identified sexually active individuals is presented in Table 3.
In total, MSAP increased the number identified as sexually active from 40.3 to 48.7%. The basic age demographic for the 223 newly identified sexually active students is presented in Table 3.
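The headline percentages can be cross-checked with simple arithmetic. The sketch below assumes that the denominator is the 2645 students who answered the sexual activity question (2675 enrolled minus the 30 excluded); the text does not state the denominator explicitly, so this is an inference from the reported figures and not a confirmed detail of the analysis.

```python
# Figures taken from the text.
N_ENROLLED = 2675           # students who provided consent or assent
N_UNANSWERED = 30           # did not answer the sexual activity question; excluded
PCT_SELF_ACTIVE = 40.3      # % self-reporting as sexually active
PCT_SELF_NOT_ACTIVE = 59.7  # % self-reporting as never having had sex
N_NEWLY_IDENTIFIED = 223    # additional students identified by MSAP

# Assumed denominator: students who answered the question.
n_answered = N_ENROLLED - N_UNANSWERED

# Approximate count of students who self-reported as not sexually active.
n_not_active = round(n_answered * PCT_SELF_NOT_ACTIVE / 100)

# Share of the "not sexually active" group that MSAP flags.
pct_of_not_active = 100 * N_NEWLY_IDENTIFIED / n_not_active

# Overall share identified as sexually active once MSAP is added.
pct_msap_total = PCT_SELF_ACTIVE + 100 * N_NEWLY_IDENTIFIED / n_answered

print(f"{pct_of_not_active:.1f}% of self-reported non-active students flagged")
print(f"{pct_msap_total:.1f}% identified as sexually active with MSAP")
```

Under this assumption the two derived values round to the 14.1% and 48.7% reported in the text, which suggests that the 30 students with a missing answer were left out of the denominator.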
Characterising those newly identified as sexually-active compared to those who self-reported as sexually active on various ecological level factors highlighted some important differences (Table 4 in the appendix section). At an individual level, those newly-identified as sexually active were more likely to be female (p < 0.001), 15 years old or younger (p = 0.008), or 20 years old or older (p = 0.040), and less likely to perceive being at risk (p = 0.037) or to have ever used alcohol (p < 0.001). At a relational level, those newly-identified as sexually active were less likely to report having ever had a boyfriend/girlfriend (p < 0.001) or ever having had more than one boyfriend/girlfriend at the same time (p < 0.001). They were also less likely to have felt pressured to have sex by their peers (p < 0.001) or partners (p = 0.008), or to have friends that were mostly boys (p < 0.001). At a familial level, those newly identified were more likely to be of medium socioeconomic status (SES) (p = 0.037) and less likely to be of low SES (p = 0.029). At a school level, those newly-identified as sexually active were less likely to have repeated a grade (p = 0.024), and at a community level they were more likely to be engaged in social activities (p = 0.032).
To further understand whether those newly identified as sexually active perceived that they were at risk of negative health outcomes, we used their self-reported perception of future HIV infection as a proxy marker. Amongst those reporting that they felt they were at 'some risk' of future HIV infection, significantly more were male than female (Table 5). This subset of students, while being part of a group that perceived themselves to be at lower risk (Table 4), appeared to correctly assess that they may still be at some risk. When comparing risk perception between those who experienced a negative health outcome in the self-reported sexually active group and in the newly identified group (Table 6), the data showed that amongst those newly identified, those with HSV-2, those who had been pregnant before and those who self-reported an STI symptom were more likely to perceive themselves as being at lower risk.

Discussion
Using the MSAP approach we identified more sexually active high-school students than we would have using only a self-reported measure. Our findings are consistent with previous research which suggests a trend in under-reporting sexual activity amongst young people [2,4,13]. This highlights that continued reliance on self-reported measures alone is susceptible to underestimating the proportion of sexually active adolescents [2,4]. Considering that sexual activity is highly related to increased risk of negative health outcomes [1,9,34,35,36], using MSAP as a basis to understand the predictors of sexual health outcomes, and then to prevent future negative health outcomes, is important.
Separating by MSAP variables, HIV positive status identified an additional 1.7% of students who should have been included as sexually active, HSV-2 positive status identified an additional 2.3%, while positive pregnancy tests identified an additional 0.6%. Using self-report variables, ever pregnant/made someone pregnant identified an additional 0.3%, and self-report STI symptoms an additional 3.5% of students who should have been included as sexually active. Indicative of high risk, 11.2% of the newly identified students were positive on multiple MSAP variables. In addition to identifying a larger proportion of sexually active students, MSAP identified high-risk students experiencing negative health outcomes that increase their risk of future HIV infection.
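These per-variable increments follow from the staged breakdown of the 223 newly identified students reported in the Results (20.2% HIV, 27.8% HSV-2, 6.7% pregnancy test, 3.6% ever pregnant/made someone pregnant, 41.7% STI symptoms). The sketch below reproduces the conversion; as before, the denominator of 2645 students who answered the sexual activity question is an assumption inferred from the reported figures, and the dictionary keys are illustrative labels.

```python
N_NEWLY_IDENTIFIED = 223
N_ANSWERED = 2645  # assumed denominator: 2675 enrolled minus 30 excluded

# Staged share of the 223 newly identified students added by each variable (%).
staged_share = {
    "HIV": 20.2,
    "HSV-2": 27.8,
    "pregnancy test": 6.7,
    "ever pregnant/made someone pregnant": 3.6,
    "STI symptom": 41.7,
}

# Convert each share into an approximate head count.
staged_count = {
    marker: round(N_NEWLY_IDENTIFIED * pct / 100)
    for marker, pct in staged_share.items()
}

# Express each head count as a percentage of all students who answered.
added_pct = {
    marker: 100 * count / N_ANSWERED
    for marker, count in staged_count.items()
}

for marker, pct in added_pct.items():
    print(f"{marker}: {staged_count[marker]} students, +{pct:.1f}% of cohort")
```

The rounded head counts sum to 223, and the resulting cohort-level percentages round to the 1.7%, 2.3%, 0.6%, 0.3% and 3.5% quoted above. Because the stages are sequential, each student is counted under the first variable that identifies them, which is why the shares sum to 100% even though some students were positive on several variables.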
Previous research suggests particular difficulty in identifying sexually active young women [2,12]. Using MSAP variables we were able to identify an additional 223 potentially sexually active students, of whom almost twice as many were young women as young men. The disparity between young women and men reporting sexual activity could be a result of social desirability, female vs male gender norm expectations, or how young people understand the term sexually active [2,3,13,21]. Failing to identify a proportion of sexually active young people (in particular young women) is problematic considering the association between sexual activity and the risk of HIV, STI and early pregnancy. The MSAP approach highlighted the importance of STIs, HSV-2, pregnancy and HIV as indicators of unprotected sex [8,10,34]. Reducing HIV endpoints remains, understandably, the most important outcome for most research studies. However, low seroconversion rates in adolescent and young cohorts mean that assessing the efficacy of prevention trials can be difficult [35,37]. One problem with focusing only on HIV as an endpoint is failing to recognise the importance of young people who are sexually active and having high-risk, unprotected sexual encounters but are not yet infected with HIV. In the 223 newly identified sexually active individuals, using only HIV endpoints to define engagement in unprotected sexual activity, we would have excluded 80% of those whose risk profile suggested high-risk sexual encounters and a future risk of HIV infection. Using an MSAP approach we can create nuanced risk categories, identifying those who are 1) sexually active and already experiencing negative health outcomes, 2) sexually active and HIV negative but experiencing negative health outcomes that put them at high risk of future seroconversion, and 3) sexually active with no negative health outcomes, in addition to those who are not sexually active. This highlights the considerable potential for identifying these particularly at-risk students and focusing HIV prevention efforts.
Comparing those who reported being sexually-active with those newly-identified raised some interesting insights into those adolescents who do not disclose sexual activity. When looking at age-stratifications, the findings suggested that the willingness to disclose sexual activity may change over time. Amongst all those students included, a higher proportion of those that self-disclosed sexual activity were older than 16, and male. Amongst the 223 newly identified sexually active students, this trend was observed again: the majority of these students were under 17 years old. This finding is congruent with other studies which have shown that young women aged 15-19 years were less willing to report marriages and first births before age 15 than were women from the same group when asked again five years later [12]. This unwillingness to report sexual activity is likely due to broader cultural and social factors which govern the acceptability of sexual activity amongst students [11,12,38].
Other differences between those newly identified as sexually-active and those who self-reported raise some interesting hypotheses. The relational, familial, school and community level descriptions suggest that those who do not self-report as sexually-active may be those who perceive negative consequences of admitting sexual activity. Besides being more likely to be female, a higher proportion of newly-identified students appear to successfully manage school, come from higher-income households, and participate in extramural activities; they are also less likely to drink alcohol or admit relationships, and feel less pressure to have sex from peers or partners. It appears the newly-identified students may be those that fall outside what people usually define as vulnerable or at-risk. It is possible that, for HIV infection, vertical transmission may be the cause of infection for some adolescents, but for the other MSAP variables sexual activity is the most likely route. Therefore, the newly-identified may be young people who 1) come from households where being sexually-active could have negative social implications, 2) come from social groups where being sexually-active is less acceptable, 3) choose to be sexually active but are unwilling to admit it because of perceived social or personal repercussions, or 4) come from households/social groups with more conservative values. These factors could make this sub-set of adolescents particularly sensitive to possible confidentiality breaches by peers, school staff, health-care workers, researchers or other social connections [1]. This may have implications for health-seeking behaviour, because this group is most likely less inclined to seek out health-care services that require disclosure of sexual activity. The MSAP approach could provide an innovative conceptualisation for identifying a larger proportion of these younger sexually-active individuals.
While MSAP identified a greater proportion of sexually active students, it was also able to identify a particularly at-risk group who experienced at least one negative health outcome. The presence of these negative health outcomes is important when thinking about future HIV risk [29]. Using MSAP variables we noted a disconnect between risk perception and actual risk when investigating the difference in risk perception between those who were sexually active and had a negative health outcome and those who reported being not sexually active but had a negative health outcome. Those who self-reported as not sexually active and had HSV-2 infection, a self-reported previous pregnancy/having made someone pregnant, or an STI symptom were more likely to report that they were at no risk of HIV infection than students with the same outcomes who reported being sexually active. This suggests that those students reporting as not sexually active may be giving socially desirable responses, may trust their partners if they have them, or may have optimism bias, believing that negative outcomes will not happen to them and so underestimating their perceived risk in comparison to their real risk [29]. It also suggests that students may not treat all negative health outcomes with equal importance, not realising that other high-risk outcomes (pregnancy/HSV-2/STI infection) increase future risk of HIV. Considering that 79.8% of newly identified students had experienced a negative health outcome that increased their risk of HIV but were HIV negative, using MSAP has great potential for building contextualised understandings of risk perception and for targeting these students with prevention interventions.
Considering the need to include adolescents, students and young people in HIV prevention and clinical trials [39,40] being able to identify at-risk youth is imperative for the field. Most studies and pre-screening protocols rely on questions such as "have you had sex in the last 30 days?" or "have you ever had sex?" to identify at-risk populations. Using MSAP in addition to innovative methodologies such as cognitive interviewing [41,42] during formative stages of study design may assist in constructing more reliable tools, and allow us to improve how we measure health outcomes. Creating pre-screening protocols that assess multiple variables associated with sexual activity, as shown in this analysis, provides a strong theoretical and medically-informed profile for assessing sexual activity and increased risk as well as linking HIV positive young people to treatment.
MSAP provides a methodological and theoretical platform from which to improve how we define sexual activity and risk profiles in order to improve our interpretation of risk amongst young people. A key limitation of the current study is that we had high missing data on certain variables, in particular the self-reported variable of ever pregnant/ever made someone pregnant. Therefore further investigation is required regarding these data, and results on this variable should be interpreted with caution. Additionally, self-reported measures may under-estimate or over-estimate the number of potentially sexually active students. In particular, there may have been misreporting on some of the behavioural variables, and therefore future work on the MSAP approach is required. We note that the inclusion of self-reported STI symptoms increased the number of students we identified as potentially sexually active. The main study included two male students over 24 years of age, but both already reported being sexually active, so they did not affect the MSAP profile. However, future work investigating the usefulness of MSAP in identifying sexually active individuals in different age groups is required. The design of the parent study (a cluster randomised control trial) limited the analyses for the current study, highlighting the need for future work. This study also took place with high school students in rural KwaZulu-Natal, so future work in other populations is required and generalisability to other settings may be limited. Further research to refine the components of the profile and to examine differences between those that self-identify as sexually-active and those identified using MSAP variables is required. The next steps for this research will include looking at how the MSAP-informed risk profiles (sexually active with negative health outcome/sexually active and no negative health outcome) are linked to increased risk of negative health outcomes and how these change across time.

Conclusion
We know that high-school students are sexually active and are having sexual encounters that put them at risk of negative sexual health outcomes. Our analysis suggests that current methods of assessing sexual activity fail to identify an important portion of sexually active students. If further research suggests that using and adapting the MSAP variables will assist in identifying a greater proportion of sexually active youth, then it could become a useful tool for identifying adolescents for inclusion in research and for characterising adolescents who avoid disclosing sexual activity. Future research into the predictors of MSAP profiles could strengthen its use as a prevention tool and aid in the development of screening tools for use in public health care and research settings. The MSAP provides a framework for designing more nuanced analyses of risk in sexually active young people, comparing those who are seemingly more resilient and have not experienced negative sexual health outcomes with those who have, despite exposure to similar settings of risk. An MSAP approach offers a novel way of defining actual sexual activity in adolescents and a platform for improving our understanding of adolescent sexual health outcomes.

Funding
This work presents independent research using data from the CAPRISA 007 Trial, whose funding was provided by MiET Africa. CAPRISA and MiET Africa were involved in the implementation and evaluation of the CAPRISA 007 trial. The current work represents independent research using data from the main study, and expresses the views of the authors. CAPRISA is part of the Comprehensive International Program of Research on AIDS (CIPRA) and is supported by the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH) and the US Department of Health and Human Services (DHHS) (grant# 1 U19 AI51794); these funding sources provide support for the organisation. The career development of HH was supported by the Columbia University-Southern African Fogarty AIDS International Training and Research Programme (AITRP) funded by the Fogarty International Center, National Institutes of Health (grant #D43TW00231).

Availability of data and materials
The data that support the findings of this study are available from CAPRISA but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of CAPRISA.

Authors' contributions
HH and QAK conceptualised and designed the current analysis. HH completed the primary draft paper and completed all additional drafts after feedback from additional authors. HH was involved in data collection. HH, FO, LK and QAK contributed to data analysis and interpretation. All authors contributed to either preparation or edits of the final manuscript and approved revisions.

Ethics approval and consent to participate
The protocol, self-administered questionnaires, informed consent forms and study-related materials were reviewed and approved by the University of KwaZulu-Natal Biomedical Research Ethics Committee (reference numbers BF105/010 and BE 523/14). All participants provided written informed consent prior to being enrolled. Students ≥ 18 years provided first-person consent following a literacy and comprehension assessment, whilst students < 18 years provided assent and written consent was obtained from the parent/guardian. In the event that a parent/guardian was not available, proxy parental written consent was obtained from a member of the School Research Support Group (SRSG). This process was approved by the University of KwaZulu-Natal Biomedical Research Ethics Committee (BF105/010).

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.