Contraceptive implant failures among women using antiretroviral therapy in western Kenya: a retrospective cohort study

Background: Women living with HIV have the right to choose whether, when and how many children to have. Access to antiretroviral therapy (ART) and contraceptives, including implants, continues to increase in Kenya. Studies have reported drug-drug interactions leading to contraceptive failures among implant users on ART. This retrospective record review aimed to determine unintended pregnancy rates among women 15-49 years of age living with HIV and concurrently using implants and ART in western Kenya between 2011 and 2015.
Methods: We reviewed charts of women with at least three months of concurrent implant and ART use. Implant failure was defined as implant removal due to pregnancy, or birth after implant placement but before the scheduled removal date. The incidence of contraceptive failure was calculated per 100 woman-years at risk, assuming a constant rate.
Results: After exclusions, data from 1,152 charts were analyzed, yielding 1,190 implant and ART combinations. We identified 115 pregnancies, giving a pregnancy incidence rate of 6.32 per 100 woman-years (95% CI: 5.27–7.59): 9.26 among etonogestrel (ETG) and 4.74 among levonorgestrel (LNG) implant users. Pregnancy incidence rates did not differ between efavirenz (EFV)- and nevirapine (NVP)-based regimens (IRR = 1.00, 95% CI: 0.71-1.43), at 6.41 (4.70–8.73) and 6.44 (5.13–8.07), respectively. No pregnancies were recorded among women on protease inhibitor (PI)-based regimens. Pregnancy rates differed significantly by implant type, with LNG implant users half as likely to become pregnant as ETG implant users (IRR: 0.51, 95% CI: 0.33-0.79, p<0.01).
Conclusions: Our findings highlight the implications of drug-drug interactions for women's contraceptive choices.

1 The nine facilities were: Awendo Sub-County Hospital, Busia County Referral Hospital, …

Introduction
All women, regardless of HIV status, have the right to choose whether, when and how many children to have. For women living with HIV (WLHIV), exercising this right affects maternal and infant morbidity and mortality, and preventing unintended pregnancies is also a pillar of prevention of mother-to-child transmission (PMTCT).
Long-acting reversible contraceptives (LARCs), including progestin-only implants, are efficacious and cost-effective, and they have high continuation rates. Implants are 99% effective in preventing pregnancy and are increasing in popularity worldwide [1][2][3]. With UNAIDS pursuing its ambitious goal of having 90% of people diagnosed with HIV receiving antiretroviral therapy (ART) by 2020 4, a growing number of women are accessing contraceptives and ART concurrently. The WHO Medical Eligibility Criteria for Contraceptive Use include guidance on contraceptive options for women using ART 5.
Contraceptive prevalence, including implant use, has increased rapidly in Kenya 6. Among all women of reproductive age, 15% use implants 7. One study found that Kenyan WLHIV were more likely than their HIV-negative peers to desire no more children, and that slightly more WLHIV used contraception 8. Another study found high overall contraceptive use (91%) among women attending HIV care clinics, but low concurrent use of condoms 9.
In Kenya, 5.9% of adults 15-49 years of age were living with HIV in 2015. Western counties had the highest HIV prevalence rates, ranging from 6.7% in Busia to 26.0% in Homa Bay 10 . HIV treatment became widely available in the public sector in 2005. Until July 2018, WHO-recommended first-line drug regimens consisted of a combination of two nucleoside reverse transcriptase inhibitors (NRTIs) plus a non-nucleoside reverse transcriptase inhibitor (NNRTI). Kenyan guidelines changed over time, in line with global evidence and recommendations 11,12 . In 2014, country guidelines recommended the NRTIs tenofovir (TDF) and lamivudine (3TC) plus the NNRTI efavirenz (EFV) as first-line 13 . In 2017, approximately 54% of all people living with HIV (PLHIV) on ART in Kenya were taking an EFV-based regimen 14 .
Nevirapine (NVP) and EFV are metabolized in the liver by cytochrome P450 (CYP) enzymes, as are the progestins used in hormonal contraception. Levonorgestrel (LNG; brand names Jadelle® and Sino-implant/Levoplant®) and etonogestrel (ETG; brand names Implanon® and Nexplanon®) are metabolized by CYP3A4 along this pathway 15. Because EFV and NVP induce CYP3A4, co-administration can accelerate progestin metabolism and lower circulating hormone concentrations. Pharmacokinetic and retrospective clinical studies have described concomitant use of EFV with implants, although several include small samples of women [16][17][18][19][20][21]. Additional prospective studies have examined the pharmacokinetic effects of ART on hormone levels among implant users and found varying degrees of impact on ETG and LNG bioavailability or on adverse events, including pregnancy [22][23][24][25]. A large retrospective review of clinical records by Patel et al. in Kenya found unadjusted pregnancy rates of 5.5 (ETG) and 7.1 (LNG) per 100 woman-years among 24,560 women using EFV 26.
Clinicians serving WLHIV need guidance to counsel their clients appropriately, as misinformation is creating uncertainty about how to describe contraceptive choices to these women 16,27. This study aimed to add clinical data to a body of evidence on contraceptive failures among implant users on ART that is largely informed by smaller-scale pharmacological studies. With the ultimate goal of improving counseling for WLHIV, the primary aim of this retrospective record review was to determine unintended pregnancy rates among WLHIV (15-49 years old) concurrently using contraceptive implants and ART in nine facilities in western Kenya between January 2011 and December 2015. The secondary aim was to describe the characteristics of concurrent implant and ART users with and without implant failure and to explore other correlates of method failure.

Methods
We reviewed charts of all women of reproductive age (15-49 years) who had at least three months of concurrent use of any ART and a contraceptive implant, and who accessed services at a high-volume health facility 1 offering comprehensive care for PLHIV. To be included in the analysis, implant use had to occur at some point between January 2011 and December 2015 while the woman was also receiving ART. Before developing a protocol for the chart review, we conducted a feasibility assessment to pretest data abstraction tools and processes, determine the degree of integration between HIV and family planning (FP) services in high-volume facilities, establish whether HIV and FP client data could be linked, and verify that there were cases of implant failure among ART clients. The investigators then prioritized nine health facilities 1 in western Kenya based on completeness of medical records, data management processes, local HIV prevalence, results from past programs and the absence of fees for family planning services. Facilities in which Family AIDS Care and Education Services (FACES) was conducting a similar study were excluded. To mitigate potential bias, we trained research assistants (RAs) before study initiation using standard operating procedures, including a data abstraction form, and conducted a pilot test during this training. The investigators sought to identify possible pregnancies due to contraceptive implant failure from medical records: specifically, records listing pregnancy as the reason for implant removal, or records showing that the client gave birth after implant insertion and before the implant's scheduled removal date.

Data collection
Each client of the comprehensive care clinic (CCC) receives a unique identifier (an eleven-character alphanumeric code) at enrollment in HIV care and treatment services. The investigators used this number only to locate client records for data verification. Client names were tallied but not included in data abstraction. De-identified data were entered into REDCap v6.14.0 (Research Electronic Data Capture), a web-based data management application, with limited, password-protected access. RAs were strictly instructed to store all paper and electronic copies of records containing client-identifying information securely. Portable electronic devices did not contain identifiable information.

Data analysis
We estimated contraceptive failure rates for LNG and ETG implants with concurrent ART regimens containing EFV, NVP, or a protease inhibitor (PI), assuming a constant rate of implant failure. We expressed the incidence of pregnancy per 100 woman-years at risk. Woman-years at risk were counted from the start of concurrent implant and ART use (the start date of whichever was introduced second) to pregnancy, implant removal, end of approved implant effectiveness, end of ART use, or end of the study period, whichever came first.
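The time-at-risk rule and the rate per 100 woman-years described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Stata code: the helper names, the 365.25-day year and the log-scale Wald confidence interval (one common choice for a Poisson rate; the paper does not state which interval method was used) are assumptions.

```python
import math
from datetime import date
from typing import Optional

STUDY_END = date(2015, 12, 31)  # end of the observation period
DAYS_PER_YEAR = 365.25          # assumption: day-to-year conversion factor


def woman_years_at_risk(implant_start: date, art_start: date, implant_expiry: date,
                        pregnancy: Optional[date] = None,
                        implant_removal: Optional[date] = None,
                        art_end: Optional[date] = None) -> float:
    """Years from the start of concurrent use to the first exit event (hypothetical helper)."""
    # Concurrent use begins on the start date of whichever drug was introduced second.
    start = max(implant_start, art_start)
    # Follow-up ends at the earliest recorded exit event; unrecorded events are skipped.
    exits = [d for d in (pregnancy, implant_removal, implant_expiry, art_end, STUDY_END)
             if d is not None]
    end = min(exits)
    return max((end - start).days, 0) / DAYS_PER_YEAR


def incidence_per_100(events: int, woman_years: float, z: float = 1.96):
    """Rate per 100 woman-years with a log-scale Wald interval (requires events > 0)."""
    rate = events / woman_years * 100
    se_log = 1 / math.sqrt(events)  # SE of log(rate) under a Poisson model
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)


if __name__ == "__main__":
    # Hypothetical woman: ART since Jan 2012, ETG implant inserted Mar 2013 (3-year approval).
    wy = woman_years_at_risk(date(2013, 3, 1), date(2012, 1, 15), date(2016, 3, 1),
                             pregnancy=date(2014, 3, 1))
    print(f"{wy:.3f} woman-years at risk")
    print(incidence_per_100(10, 200.0))
```

For example, 10 pregnancies over 200 woman-years give 5.0 per 100 woman-years. The log-scale interval stays positive and is asymmetric around the rate, which is consistent with the shape of the intervals reported in the abstract.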
We used Poisson regression to calculate incidence rate ratios (IRRs) of pregnancy by age, CD4 count, BMI, ART regimen, and implant type, and repeated this for all six possible combinations of concurrent ART and implant use. Data on tuberculosis (TB) treatment and viral load were missing for over 96% of women, precluding analysis of these variables; however, we separately reviewed records indicating TB treatment to verify the timing of TB medication relative to ART regimen, implant use and any pregnancies. Differences were deemed statistically significant at p<0.05 (two-sided test). Confidence intervals and p-values accounted for potential clustering within HIV clinics through cluster-adjusted standard errors 28,29. To test the robustness of results, we reran the analyses excluding women aged 35-49. Two investigators conducted the analyses separately and cross-checked results, using Stata versions 14 and 15.
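For a single two-group comparison, such as EFV- versus NVP-based regimens, the Poisson-regression IRR has a closed form. The sketch below (hypothetical helper name `crude_irr`) computes that rate ratio with a Wald 95% CI and a two-sided p-value. It matches the point estimate and model-based standard error of an unadjusted Poisson model with one binary covariate and a log(time-at-risk) offset, but it does not apply the cluster-adjusted standard errors used in the study and cannot adjust for covariates such as age or BMI; the full analysis needs a regression package.

```python
import math


def crude_irr(events_a: int, years_a: float, events_b: int, years_b: float,
              z: float = 1.96):
    """Rate ratio of group A versus reference group B, with Wald CI and two-sided p.

    Uses SE(log IRR) = sqrt(1/a + 1/b), the model-based SE of a Poisson
    regression with one binary covariate; no clustering adjustment is applied.
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("log-scale Wald interval is undefined when a group has zero events")
    irr = (events_a / years_a) / (events_b / years_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(math.log(irr)) / se_log / math.sqrt(2))
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log), p_two_sided


if __name__ == "__main__":
    # Hypothetical counts: 30 pregnancies in 500 woman-years vs 15 in 500 woman-years.
    print(crude_irr(30, 500.0, 15, 500.0))
```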
Kaplan-Meier failure curves were created to visualize time to implant failure for the different ART and implant combinations, using the date of unintended pregnancy as the failure date. We also present the duration of concurrent ART-implant use to aid interpretation of the Kaplan-Meier results.
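A Kaplan-Meier failure curve is the complement of the survival estimate, 1 - S(t). The following is a minimal pure-Python sketch (hypothetical function `km_failure_curve`) in which each observation contributes its days of concurrent use and a flag for whether follow-up ended in pregnancy; observations censored at a given time are treated as still at risk at that time, the usual convention.

```python
from collections import Counter
from typing import List, Tuple


def km_failure_curve(durations: List[float], failed: List[bool]) -> List[Tuple[float, float]]:
    """Cumulative failure 1 - S(t) at each distinct failure (pregnancy) time.

    durations: days of concurrent implant-ART use until pregnancy or censoring.
    failed: True if follow-up ended in pregnancy, False if censored.
    """
    failures = Counter(t for t, f in zip(durations, failed) if f)
    survival = 1.0
    curve = []
    for t in sorted(failures):
        # Number at risk: observations still followed at time t (censored at t included).
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1 - failures[t] / at_risk
        curve.append((t, 1 - survival))
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up for five observations.
    print(km_failure_curve([100, 200, 200, 300, 400], [True, False, True, False, True]))
```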

Data cleaning
RAs abstracted 1,612 records of women concurrently using ART and an implant, under close supervision by two investigators. Investigators excluded all 208 records from Busia County Referral Hospital because of inconsistencies in RAs' adherence to standard operating procedures. Investigators then removed 36 suspected duplicate records, 50 records with invalid ART or missing implant data, 81 records indicating fewer than three consecutive months of concurrent implant and ART use, and 85 records of women who did not meet other inclusion criteria.
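The exclusion counts above reconcile with the 1,152 women analyzed. A short Python check of the record flow, using only the figures stated in this section:

```python
# Record-flow check using the counts reported in the Data cleaning section.
abstracted = 1612
exclusions = {
    "Busia County Referral Hospital (SOP adherence)": 208,
    "suspected duplicates": 36,
    "invalid ART or missing implant data": 50,
    "fewer than 3 consecutive months of concurrent use": 81,
    "other inclusion criteria not met": 85,
}

remaining = abstracted
for reason, n in exclusions.items():
    remaining -= n
    print(f"after excluding {reason} ({n}): {remaining} records")
# The final count should equal the 1,152 women reported in the text.
```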
Observations without an end date for concurrent use were censored at December 31, 2015, or earlier if the end of approved implant effectiveness (3 years for Implanon®/Nexplanon® and Levoplant® and 5 years for Jadelle®, per WHO prequalification 30) preceded the end of the observation period. Regimens listing emtricitabine (FTC) were grouped with those containing 3TC because of the pharmacokinetic similarity of the two drugs 31.
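The censoring rule and the FTC/3TC regrouping can be expressed as small helpers. This is a sketch under stated assumptions: the brand-to-duration mapping simply restates the durations given in the text, the function names are hypothetical, and a 29 February insertion date is mapped to 28 February when the anniversary falls in a non-leap year (a convention the paper does not specify).

```python
from datetime import date
from typing import List, Tuple

# Approved durations of effectiveness (years), as stated in the text.
APPROVED_YEARS = {"Implanon": 3, "Nexplanon": 3, "Levoplant": 3, "Jadelle": 5}
STUDY_END = date(2015, 12, 31)


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 Feb becomes 28 Feb in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def censor_date(brand: str, insertion: date) -> date:
    """Earlier of the end of approved implant effectiveness and the study end."""
    return min(add_years(insertion, APPROVED_YEARS[brand]), STUDY_END)


def normalize_regimen(drugs: List[str]) -> Tuple[str, ...]:
    """Regroup emtricitabine (FTC) with lamivudine (3TC); return a sorted drug tuple."""
    return tuple(sorted({"3TC" if d == "FTC" else d for d in drugs}))


if __name__ == "__main__":
    print(censor_date("Jadelle", date(2012, 6, 1)))    # capped at the study end
    print(normalize_regimen(["TDF", "FTC", "EFV"]))
```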
After exclusions, the dataset included 1,152 individual women (Table 1). The dataset was then expanded so that women who switched to a new ART regimen and/or implant during the study period were recoded as separate observations for each unique ART/implant co-administration, giving a final dataset of 1,190 observations. Table 2 presents the age and clinical status of women during each instance of concurrent ART and implant use, so individuals who changed their regimen and/or implant during the study period can be counted as more than one observation. The most common combination of implant and ART use was LNG-EFV (39.4%), followed by ETG-EFV (26.7%), LNG-NVP (16.7%), ETG-NVP (14.8%), LNG-PI (1.3%), and ETG-PI (1.0%).
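Expanding each woman's record into one observation per implant/ART combination amounts to intersecting her implant and ART episodes. A minimal sketch, assuming each episode is a labeled date interval; the `Episode` type and `concurrent_observations` function are hypothetical, and approximating "at least three months" as 90 days of overlap is an assumption, since the paper does not give the threshold in days.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Episode(NamedTuple):
    label: str   # implant type ("ETG"/"LNG") or ART group ("EFV"/"NVP"/"PI")
    start: date
    end: date


MIN_DAYS = 90  # assumption: "three months" of concurrent use approximated as 90 days


def concurrent_observations(implants: List[Episode],
                            regimens: List[Episode]) -> List[Tuple[str, date, date]]:
    """One observation per overlapping implant/ART pair lasting at least MIN_DAYS."""
    obs = []
    for imp in implants:
        for art in regimens:
            # The overlap of two intervals runs from the later start to the earlier end.
            start, end = max(imp.start, art.start), min(imp.end, art.end)
            if (end - start).days >= MIN_DAYS:
                obs.append((f"{imp.label}-{art.label}", start, end))
    return obs


if __name__ == "__main__":
    # Hypothetical woman who switched from EFV to NVP while using an ETG implant.
    implants = [Episode("ETG", date(2012, 1, 1), date(2014, 12, 31))]
    regimens = [Episode("EFV", date(2011, 6, 1), date(2013, 6, 30)),
                Episode("NVP", date(2013, 7, 1), date(2014, 11, 30))]
    for o in concurrent_observations(implants, regimens):
        print(o)
```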

Demographic and ART information
In total, 32 of the 1,152 women included in the dataset switched ART regimens while using an implant during the study period; 30 women were exposed to two regimens and two women were exposed to three regimens. Four women used implants twice. Three women were exposed to three different combinations of implant and ART. Our sample skews towards shorter periods of co-administration, consistent with the recent rise in implant use in Kenya (not shown). The median duration of implant and ART co-administration in our dataset was 1.33 years (not shown). In total, 38 women had multiple observations included in the analysis. The mean duration of first observations was 560 days (median: 488 days), of second observations 501 days (median: 481) and of third observations 431 days (median: 403). Tables 1 and 2 do not show the number of women whose records indicated TB co-infection and treatment during periods of co-administration of implants and ART. We excluded this variable from the regression analyses because of the high proportion of records with no TB data. However, we reviewed the subset of women with records of TB treatment and describe the findings in Box 1.
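Because durations of co-administration are skewed, a median with interquartile range (IQR) summarizes them better than a mean alone. A sketch using Python's standard `statistics` module (the function name is hypothetical); quantile definitions differ across software, so IQR values from this default ("exclusive") method may differ slightly from those produced by other statistical packages.

```python
import statistics
from typing import List, Tuple


def duration_summary(days: List[float]) -> Tuple[float, float, float, float]:
    """Return (mean, median, Q1, Q3) for durations of concurrent use in days."""
    # statistics.quantiles uses method="exclusive" by default; needs at least 2 values.
    q1, _, q3 = statistics.quantiles(days, n=4)
    return statistics.mean(days), statistics.median(days), q1, q3


if __name__ == "__main__":
    # Hypothetical durations (days) for five observations.
    print(duration_summary([100, 200, 300, 400, 500]))
```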

Box 1. Exploration of records of women who received treatment for tuberculosis
As noted, very few records had any notation regarding TB status. Clinical records of 45 women (52 co-administrations) indicated TB treatment. Among these, five women had pregnancies. However, for two of them, documentation indicated that pregnancy occurred more than 2 years after completion of TB treatment.
For the remaining three women, the possibility of additional drug interactions with rifampicin cannot be ruled out; their documented pregnancies occurred 3 to 9 months after the end of TB treatment. However, all three were using an EFV-based regimen, so this exploration does not explain why so many women on NVP-based regimens experienced contraceptive failures.

Analysis of incidence by ART regimen and implant type
Pregnancy incidence rates did not differ between EFV- and NVP-based regimens (IRR = 1.00; 95% CI: 0.71-1.43). No pregnancies were recorded among ETG or LNG implant users on PI-based regimens; this difference was statistically significant. Pregnancy rates differed significantly by implant type, with LNG implant users half as likely to become pregnant as ETG implant users (IRR: 0.51, 95% CI: 0.33-0.79, p<0.01). Table 3 shows pregnancy incidence rates and pregnancy IRRs for all six implant-ART combinations against the ETG-NVP combination, which was chosen as the reference group based on previously cited evidence of minimal drug-drug interactions. All combinations except ETG-EFV differed significantly from the reference. In a separate analysis of LNG-ART combinations, the adjusted pregnancy IRR for LNG-NVP did not differ significantly from the LNG-EFV reference (IRR = 1.015). The IRR for LNG-PI remained significant compared with LNG-EFV.
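Because no pregnancies occurred in the PI groups, a log-scale Wald IRR cannot be computed for them (the logarithm of zero is undefined). One standard way to express uncertainty around a zero count is the exact Poisson upper confidence limit, -ln(alpha/2) events, divided by time at risk. The sketch below assumes a two-sided 95% interval; the paper does not report how significance was assessed for the PI comparison or the woman-years contributed by PI users, so the example input is hypothetical.

```python
import math


def zero_event_upper_rate(woman_years: float, alpha: float = 0.05, per: float = 100.0) -> float:
    """Exact upper confidence limit for a Poisson rate when zero events were observed.

    With 0 events, P(X = 0 | mu) = exp(-mu); setting this equal to alpha / 2 gives
    the two-sided upper limit mu = -ln(alpha / 2), about 3.69 events at alpha = 0.05.
    """
    return -math.log(alpha / 2) / woman_years * per


if __name__ == "__main__":
    # Hypothetical: 50 woman-years of PI co-administration with no pregnancies.
    print(f"upper 95% limit: {zero_event_upper_rate(50.0):.2f} per 100 woman-years")
```

A one-sided 95% bound would instead use -ln(0.05), about 3.0 events (the "rule of three").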
Time-to-failure curves (Figure 1) show that pregnancies began occurring within months of concurrent use and accumulated steadily thereafter. The shortest time to pregnancy was 94 days; 16.5% and 48.7% of pregnancies occurred within 6 and 12 months of concurrent use, respectively. ETG implant users had more early pregnancies than LNG implant users, but this is likely related to their higher overall pregnancy incidence and the shorter approved duration of ETG implant use.
The sensitivity analysis excluding women aged 35-49 (see footnote in Table 3) suggests that differences in contraceptive failure rates by implant type across drug-drug combinations are not robust, likely an artifact of higher ETG implant use among younger women.

Discussion
An increased incidence of pregnancy among implant users on ART may undermine the acceptability of, and trust in, implants as a contraceptive choice for all women and their partners, regardless of HIV status. Our findings confirm earlier reports of implant failures among women taking EFV-based ART, with incidence rates of 4.78 per 100 woman-years for LNG and 10.00 per 100 woman-years for ETG implants. However, these pregnancy incidence rates were considerably higher than those reported in the general population or than we had hypothesized at the start of the study 32,33. Our results differ from previous studies in that NVP use was also associated with a high incidence of pregnancy 24,34,35. Among NVP users, pregnancy incidence rates were similar to those among EFV users: 4.85 and 8.68 per 100 woman-years with concurrent LNG and ETG implants, respectively. To our knowledge, this is the first report to present evidence that NVP-based ART regimens may also affect the effectiveness of both ETG and LNG implants.
We explored the data in multiple ways, including visually reviewing the records of women who became pregnant while using NVP and an implant, without finding any pattern that would explain this finding other than potential pharmacokinetic interactions between NVP and other drug classes metabolized by the cytochrome P450 system … co-administration of LNG and NVP impacted pharmacokinetic efficacy outcomes. Future advances in pharmacogenomics could potentially uncover why subsets of women experience unexpected drug-drug interactions (https://www.statnews.com/2018/10/24/precision-medicine-contraceptivechoice/). Co-administration with other medications that induce P450 enzymes may also contribute to this finding; however, we could not assess this in detail beyond reviewing documentation of TB treatment in clinical records.
The nearly equivalent rates of implant failure among NVP and EFV users may indicate a higher 'baseline' failure rate; however, we cannot determine this, because we reviewed implant effectiveness only among WLHIV on ART.
Excluding pregnancies detected within three months of initiating co-administration minimizes the inclusion of cases in which an implant was inserted after conception had already occurred, a common problem in the Australian post-marketing study that could also explain pregnancies in other studies 36. Including all observations linked with any co-administration of ART and implants might have yielded even higher pregnancy incidence rates. In our study, among women using EFV, pregnancy incidence rates were higher with ETG implants than with LNG implants, but this finding was not robust among younger women. The same pattern was seen among NVP users.

Limitations
Retrospective studies can be limited by the reliability and completeness of medical charts, as well as by possible bias and confounding in how data are abstracted. Prospective studies with more frequent and standardized follow-up could address some of these limitations. Prospective data capture could also allow further analysis of drug-drug interactions with either ART or implants, such as rifampicin for TB or artemether-lumefantrine for malaria. However, the necessary financial investment and length of time would be substantial.
Our retrospective chart review had intended to measure multiple factors, including stage of HIV disease and other drug-drug interactions; however, incomplete medical records did not allow adequate analysis of these questions. The absence of documented TB status in the great majority of records was disappointing, given the high prevalence of TB co-infection among PLHIV. Only 45 records documented TB treatment, and some of these showed treatment completed in 2010, before the study period, or after a pregnancy. However, three pregnancies in our dataset were possibly due to a triple drug interaction; all involved an EFV-based regimen, consistent with guidelines. While negative TB test results are potentially omitted from records more often than active TB or TB treatment, we cannot rule out additional drug interactions with rifampicin or other medications. The definition of pregnancy in our data was also unclear because medical records lacked detail. We had also hoped to draw conclusions about the timing of failure over the course of implant use, in case earlier removal and replacement of the implant could address the problem of drug-drug interactions. However, because implant use has expanded only recently in western Kenya, most implant users in our dataset had a relatively short duration of concurrent use with ART.
Biases may also have been introduced by variation in how RAs assessed records for abstraction, owing to the differing record management systems in the study facilities. How pregnancy was recorded was also not consistent across charts or sites, with some uncertainty as to whether recorded (and thus abstracted) pregnancy dates referred to the date a pregnancy was confirmed or to the estimated date of delivery (EDD). While we sought to abstract EDDs, we did not require RAs to calculate an EDD if the only information was that pregnancy was confirmed at a single point in time. Given the incompleteness of charts in the CCCs, the RAs diligently sought evidence of pregnancies in other units of the facilities, but this is difficult to do systematically across diverse facilities.

Implications for practice
Women should be afforded choices when selecting FP and ART.
In 2016, WHO included dolutegravir (DTG) as an alternative drug for first-line regimens; DTG became available in Kenya in 2017 37 and has been recommended by WHO as first-line treatment in combination with tenofovir and lamivudine since December 2018. NVP is no longer recommended, and EFV will remain an alternative to DTG in the first-line regimen, consistent with recent WHO guidelines 38, particularly for women of reproductive age intending to become pregnant or at risk of pregnancy, given concerns about potential neural tube defects 12. Co-administration of DTG with implants has not been studied, although an interaction is unlikely, as DTG does not inhibit or induce CYP450 enzymes 16.
It remains challenging to incorporate the results of this study into service delivery for WLHIV. It is premature to discourage women from adopting implants. However, women on NNRTIs need information, in simple yet accurate language, about the drug-drug interaction and the possibility of method failure. With the scale-up of the transition to DTG and nearly 4 million people already on this drug, this concern becomes less pressing (https://medicinespatentpool.org/mpp-media-post/five-years-on-3-9-million-people-inthe-developing-world-have-access-to-hiv-treatment-dolutegravir-thanks-to-access-oriented-voluntary-licensing-agreements/). Experiences in South Africa have demonstrated the difficulty of ensuring that providers understand these drug-drug interactions and are comfortable sharing information with clients 27. One option is to refer to the tiers of effectiveness 43, but explain that for women on certain ART regimens, the implant falls somewhere in the middle tier of effectiveness: slightly better than pills or injectables, but less effective than IUDs or sterilization.
Regardless of regimen, women should retain the right to make fully informed decisions about which contraceptive option works best for them and what level of method-failure risk is acceptable, given that, under current HIV guidelines, they cannot switch ART regimens based on their fertility preferences. Ideally, a client-centered culture would allow WLHIV who want control over their fertility to be offered DTG concurrently with an implant or another LARC.
At minimum, programs should encourage better documentation of all medications in medical charts. Health systems should establish mechanisms to track adverse events (pregnancies) among WLHIV using ART. Study facilities had ongoing efforts to integrate FP within HIV care and treatment services; fully integrating FP services by including LARCs within services for WLHIV may improve care. Health systems should also ensure that IUDs are as accessible as implants and remove barriers to sterilization services. Research into the benefits and costs of alternative service delivery models could inform national policies.
In conclusion, implants are highly effective; however, clients using an NNRTI-containing ART regimen need additional information about the higher incidence of pregnancy when implants are used with these regimens, so that they can make informed decisions about contraceptive options.

Underlying data
The dataset analyzed for this study was generated from client medical records owned by the Kenyan Ministry of Health. The authors' permission to study these data does not extend to publicly sharing the full dataset without prior permission from the Kenyan Ministry of Health. Access to the de-identified dataset may be obtained by submitting a request to the Kenyan Medical Research Institute (KEMRI) (seru@kemri.org) and the Jhpiego Open Data Help team (OpenDataHelp@jhpiego.org), copied to anne.pfitzer@jhpiego.org, with a detailed description of the intended use and an IRB-approved protocol for secondary data analysis. Data will be provided on the condition that researchers have obtained the required permissions from the Kenyan National AIDS Control Program and KEMRI.
The authors present an important analysis related to possible higher contraceptive failures with specific concomitant antiretroviral therapies (ART). They undertook a retrospective chart review in their affiliated health facilities in western Kenya to help determine if contraceptive implants and efavirenz (EFV)-containing ART are implicated in possible higher failures, as has been shown by some prior data. Overall, the work is adequately presented though there are several clarifications, some more major than minor, that they could provide, especially in the methods and results sections (e.g. justification for choice of reference categories), to aid others in better replicating or interpreting their results. Another major concern is that the authors provide a very cursory interpretation of their findings, and given that they differ from the already published data to date, this is one of the key questions-why are your study findings different?-that should have been elucidated in their discussion. Another concern is that the authors make several statements, in the introduction and discussion sections (also these sections require some better organization and flow, more accurate citations, etc.), that are difficult to justify based on the leading PK data (my sense is that the authors would have benefited from including even a pharmacology expert in their authorship team, which could be considered even now). Detailed feedback is provided below.

Abstract
Background line: technically these are considered drug-drug interactions, as the term "drug interactions" is non-specific.

Introduction
The last line in the 2nd paragraph is extremely vague; I don't understand what the authors are trying to convey. It would be better to more explicitly elucidate their thinking. The first line in the 3rd paragraph should have a citation, and if it is the same citation as #6, then it should be cited with the first sentence with the assumption that the 2nd sentence is supported by the same citation (unless the citation standards for this journal indicate otherwise). Same paragraph: the 3rd sentence needs a bit of fine tuning, in that this combination is generally recommended as a first-line ART regimen. Also, the next line appears inaccurate on the year; if I am not mistaken, a single-pill combination containing EFV was the recommended first-line ART regimen before 2015 in Kenya (maybe even before 2012? Please provide your citation for this assertion if you disagree). Also, the next paragraph doesn't make a lot of logical sense as written, and if the authors want to name the leading first-line ART regimen, they should name all 3 components of it. The "n" in nevirapine should not be capitalized. For the last line, if they can cite a specific number for how many women of reproductive potential were on an EFV-containing regimen, the sentence would be more to the point. It seems like the paragraphs in the introduction could be organized better. For example, the first line in the 2nd paragraph discusses LARCs and implants and then the discussion switches to ART for the rest of the paragraph. The 3rd paragraph then discusses more on ART and the 4th comes back to contraceptives. The 5th paragraph then goes into describing the potential pathway for any drug-drug interactions at play here, and the data to date on this topic. Strongly reconsider the organization of the intro section for more logical flow.
5th paragraph, 2nd line: technically 4 of the 6 citations are for case reports (which are not the same as retrospective clinical studies). The line discussing the Uganda PK study starts with "Similarly", which does not make logical sense. It would greatly help readers if the authors used either the ETG/LNG acronyms or the brand names, but not both. Last sentence: technically this was not a record review, as is the current study, but rather a retrospective study of electronic medical record data; it would help to indicate the denominator for the rates, i.e. "per 100 women-years of implant use," and it appears that citation #19 is mistakenly included here. 6th paragraph: it appears the first line mistakenly cites #13 (as that citation is not applicable to their statement). Arguably, the goals of the 2nd and 3rd lines are redundant; the authors might consider combining them by saying something along the lines of: "With the ultimate goal of improving counseling for WLHIV, the primary aim of this…"

Methods
Data collection, 1st paragraph: how did you know who was seen between Jan 2011 and Dec 2015? When listing all the variables you extracted, it might make sense to name them all and indicate that the applicable dates were also extracted (instead of mentioning it twice; presumably this applies not only to viral load results but also to CD4 count results). It may help to discuss how pregnancies are generally detected in these facilities, i.e. was pregnancy detected through clinical presentation while gravid or via routine biochemical testing (e.g. urine pregnancy tests)? 2nd paragraph: might consider saying "facilities" throughout the manuscript, unless you do literally mean "hospitals", in which case it would help to explain which hospitals you selected and why.
Data cleaning paragraphs: There is a lot of detail here not usually included in publications, but the details do help. It appears only 38 women had a 2nd or 3rd observation in the dataset; it might help to state it along these lines. Given the organization of Table 1, you could consider including, here or in the text, the median duration of observation time for the 1st observation, 2nd observation, etc. Could you provide the justification for censoring observation time after the recommended durations of implant use, given that maintaining that time arguably provides an even stronger evaluation of real-world effectiveness (if the woman is still using the implant past its recommended duration)?
Data analysis: the only clarification is on the last line of the first paragraph, as it is not clear whether "x100" applies to the numerator or the denominator. Perhaps consider saying "per 100 women-years." A couple of questions: How did you arrive at your sample size? Was there ever a determination of the required sample size to achieve power for detecting a difference between certain combinations? Or did you simply go with a convenience sample of what was feasible for the study duration? Could you also describe how you assigned each incident pregnancy to a combination category? That is, was it the combination category at the time of pregnancy detection, or did you base it on the date of likely conception (and if the latter, please describe how you calculated the date of likely conception)? Did you consider double entry for cases where a pregnancy was detected (i.e. some cross-check or validation of your initial data entry), as data entry errors by RAs could also have occurred?

Results
Overall, the various comparisons are confusing and it is difficult for me to follow which comparison is being made when. It may help to use additional subheadings to clearly separate the findings, and it may also help to prioritize the main findings first.
Small note on Table 3: the 3rd category of ART is "non-NNRTI", which does not necessarily equate to PI-containing ART (which is how the 3rd category is defined in the manuscript), so it would be better to clarify whether it is PI-only. Of note, I would not recommend a non-specific non-NNRTI category, as the drug-drug interaction implications vary markedly not only by ART family but also by individual antiretroviral.
Paragraph after Table 3 that starts with "Analysis of implant…": Could you provide justification for why the ETG-EFV combination was used as the referent group? It seems like picking a category where you don't expect a drug-drug interaction, such as with NVP, is a better "control" group or the counterfactual. (Of note, when looking at Table 3, I see that you used ETG-NVP as your reference category; please rectify the difference.) More importantly, please justify why you would use ETG-EFV as the referent group for the LNG combinations. Wouldn't you want to compare each implant type against itself, given the possible differences in hormone concentrations, affinity to receptors, recommended duration of use, number of rods, etc.? As a possible secondary analysis, you could compare the ETG to the LNG implants. Also, the conjunctions in "EFV and NVP or using a PI" are confusing; please consider rephrasing for clarity.
Figure 1: note the use of PI here as opposed to NNRTI. 2nd to last paragraph: the added value of Figure 2 is not clear; however, if you wish to keep it, please consider dividing it into 3 separate figures, for example, one per ART type with the breakdown by implant type within it.
Also, when reporting median duration of co-administration time, please also report standard IQR or range, and consider doing this for your 3 or 6 combination categories. Several of the results paragraphs end in interpretation of the findings, which would be better discussed in the discussion section.

Discussion
The biggest missed opportunity in the discussion section is in better elucidating why no significant differences were found between EFV and NVP and respective implant combination groups. What are the possible reasons you did not detect a difference when several PK studies and other retrospective studies (namely Perry and Patel) have found differences? What are possible biases that arise from your chosen methodological approach? NVP's DDI profile is markedly different from EFV's, so grouping them together and suggesting that all NNRTIs have similar effects is not well supported by existing PK and clinical studies. To this point, etravirine and rilpivirine, other NNRTIs, also have very different DDI and PK profiles from EFV. Given the topic is deeply rooted in the PK world, greater clarity from the PK perspective on the study findings is a must. First paragraph: the use of "ranged" is likely not appropriate, as those are specific point estimates for certain combination categories, correct? So you might as well state that they are for the x and y combinations, respectively. Also, generalizing to all NNRTIs is not appropriate, as each NNRTI has a very different drug-drug interaction profile. Also, the first paragraph could use some rewording. It seems that in the middle of it, you are trying to highlight that, surprisingly, you found a higher rate of failures with NVP than you had hypothesized at the start of your study. Finally, in the latter half of the first paragraph, the authors' interpretation of this finding comes across as too definitive; isn't it also possible that the nearly equivalent rate of implant failures you found among NVP and EFV users may signal a "baseline" rate of failures among implant users?
Of course, that does not explain why the rate is higher than the "baseline" rate described in other settings. First half of second paragraph: indeed, it is robust of you to censor pregnancies within the first 3 months of implant placement in cases where the implant may have been placed with a pregnancy already in place. However, shouldn't that result in a bias towards a lower pregnancy rate compared to the post-marketing surveillance study from Australia that you are comparing to? It would be prudent to spend some space in the discussion section offering thoughts on why the overall pregnancy rates are higher in your study than in other, predominantly non-African, settings. Second half of second paragraph: this is a very interesting finding, and it deserves discussion in a separate paragraph. Why do you think this difference might exist? Why does it appear to diminish among younger women? Implications for practice: 1st paragraph: cite your basis for not anticipating DDIs with DTG and implants (i.e., the OC PK study by Song et al.). 2nd paragraph: shouldn't women also be offered the choice to pick an ART regimen that meets their values and preferences (e.g., DTG, knowing that there is currently an unknown and possible risk of adverse birth outcomes)? Given the great women-centered approach that the introduction and the first half of this paragraph take, it seems like a missed opportunity not to also advocate for WLHIV to be informed with the best information available at the moment, so that they can make their own decisions.

Throughout
Consider saying either "failure rates" or "unintended pregnancy" throughout the manuscript, as switching between the two, often in back-to-back sentences, is confusing for readers who may not be familiar with the "contraception world", where the two are equivalent.


Is the study design appropriate and is the work technically sound?
Partly

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: Yes, I have some minor competing interests in that I was asked to provide guidance to some of the authors on study design initially, prior to data collection (though I was not involved in study implementation at all), and was aware of their initial preliminary analyses. I have conducted very related analyses, and this group and mine held a joint study dissemination and stakeholders' meeting in Kisumu, Kenya in July of 2017 (some of our related analyses have been published to date, but not the primary results).

Reviewer Expertise:
HIV and women's health; cohort studies; contraception; pharmacokinetic studies; resource-limited settings
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 21 Dec 2019
Anne Pfitzer, Jhpiego, 1776 Massachusetts Ave, NW Suite 300, Washington, USA
Thank you for taking the time to summarize and provide an overall assessment of the paper.
We note your reservations and have attempted to address them in the paper and in the responses below.
Regarding the choice of reference group, we have kept our presentation as it was, but added to the text the results of modeling within LNG combinations, using an LNG reference category. The results remain unchanged for comparisons within ETG-ART combinations.
We agree that this paper is unable to provide a plausible explanation for some of the findings of drug-drug interactions with NVP-based regimens. We argue that we are merely sharing observations from medical chart data and that it is beyond our scope to assess the underlying reasons. We hope that by publishing these data, we encourage further investigations. If this study is a complete outlier, then reviewers will come to their own conclusions, including about the difficulties of using retrospective, poorly documented medical charts for investigations of this sort.
We have taken your feedback about involving a pharmacologist in the review of our paper and attempted to do this, but could not find anyone able or willing to assist us.
Abstract: we have corrected the terminology used for drug-drug interactions and throughout the rest of the manuscript.
Introduction: We accepted your feedback and have made a number of edits to this section. In particular, we edited the text to reflect the first-line ART recommendations (two NRTIs and one NNRTI) prior to the WHO recommendation in Dec 2018 transitioning to the integrase inhibitor, dolutegravir. The 2011 guidelines mentioned EFV or NVP in three first-line regimens (along with either TDF, AZT or d4T as the first agent in the combination therapy). In 2014, TDF/3TC/EFV was the preferred first-line regimen for adults, and this has been stated. Relevant references have been included.
We disagree that the Nanda reference (citation numbers have been changed) is not applicable to the statement you indicated. The final conclusion in the Nanda paper is that "More well-designed prospective studies are needed to examine potential drug interactions between ARVs and all contraceptive methods, to better inform guidelines and counseling for the more than sixteen million women living with HIV", and a shorter version of this statement appears in the abstract.
Methods: with respect to knowing about visits, we know who was seen in CCCs during the study period from the dates in their medical charts and from the presence of their charts in the record rooms. We have clarified in the revised version where dates of medical tests, etc. were also extracted. We also added a short description of how pregnancy was typically noted in records and changed "hospital" to "facility" throughout the manuscript. We moved the Data Cleaning section to the results. The number of women who switched ART, implants or both during the observation period is described in the second paragraph of demographic and ART information. We added mean and median durations of observation.
We censored the observation time after the recommended duration of implant use so as not to confuse the interpretation of results. If women concurrently using "expired" implants and ART experience a pregnancy, one could rightly assume that the failure is due to the loss in implant efficacy. We wanted to ensure that contraceptive failures could only be due to drug-drug interactions and not linked to the devices' efficacy. While understanding the real world effectiveness issues of women using implants past their labeled effectiveness is interesting, it merits a different study.
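The censoring rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' abstraction or analysis code: the record fields, the helper names, and the labeled durations (3 years for ETG, 5 years for LNG) are hypothetical placeholders, and the study's minimum of three months of concurrent use is not applied here. Time at risk starts once both the implant and the ART regimen are in use and ends at the earliest of implant removal (which, under the study's failure definition, includes removal due to pregnancy), regimen change, the end of the study window, or the labeled implant duration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical labeled implant durations in years (placeholders, not study values).
LABELED_YEARS = {"ETG": 3, "LNG": 5}
DAYS_PER_YEAR = 365.25


@dataclass
class Combination:
    """One implant + ART regimen pairing abstracted from a chart (hypothetical fields)."""
    implant_type: str            # "ETG" or "LNG"
    insertion: date              # implant insertion date
    art_start: date              # start date of this ART regimen
    art_end: Optional[date]      # regimen switch/stop date; None if ongoing
    removal: Optional[date]      # implant removal date; None if not recorded


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def woman_years_at_risk(c: Combination, study_end: date) -> float:
    """Concurrent implant + ART time in years, censored at the labeled implant duration."""
    start = max(c.insertion, c.art_start)
    # Earliest of: study end, labeled implant end, regimen end, implant removal.
    end_candidates = [study_end, add_years(c.insertion, LABELED_YEARS[c.implant_type])]
    if c.art_end is not None:
        end_candidates.append(c.art_end)
    if c.removal is not None:
        end_candidates.append(c.removal)
    end = min(end_candidates)
    # Non-overlapping implant and ART periods contribute no time at risk.
    return max((end - start).days, 0) / DAYS_PER_YEAR


# Hypothetical chart: ETG inserted 1 Mar 2011 while already on ART since 2010.
c = Combination("ETG", date(2011, 3, 1), date(2010, 6, 1), None, None)
print(round(woman_years_at_risk(c, date(2015, 12, 31)), 2))  # 3.0
```

Because time after the labeled end date never enters the denominator, a pregnancy recorded after that date would likewise need to be left out of the numerator for the rate to stay consistent.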
On the sample size: in our protocol, we had determined that we would need at least 177 concurrent users of LNG and EFV and 116 concurrent users of ETG and EFV to reject the null hypothesis. However, we were concerned that selectively abstracting medical records from the facilities would lead to bias, so we instead abstracted all records of concurrent users, yielding a much larger sample. This is why we omit discussion of sample sizes in the manuscript.
As discussed in the methods, we divided concurrent use for each combination of drugs, instead of assessing use by each woman… We based the original analysis on the regimen on date of pregnancy noted in records. Based on this comment, we reviewed the data and found two women that had a regimen switch within 9 months of the pregnancy date. Both were analyzed as EFV-users, but had switched from a NVP-based regimen within 4-5 months of the date pregnancy was noted. Thus, it is possible these were in fact pregnancies that occurred during NVP use. We have added a note about this in the results.
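The attribution rule described above (assigning each pregnancy to the regimen recorded on the date the pregnancy was noted, then flagging recent regimen switches) can be sketched as below. This is a hypothetical illustration, not the authors' code: the episode structure and function names are assumptions, and the 9-month window is approximated as 274 days.

```python
from datetime import date, timedelta
from typing import List, Tuple

# A regimen episode: (start date, regimen label). Hypothetical structure.
Episode = Tuple[date, str]


def regimen_on(episodes: List[Episode], when: date) -> str:
    """Return the regimen in effect on `when`: the latest episode starting on or before it."""
    current = None
    for start, label in sorted(episodes):
        if start > when:
            break
        current = label
    if current is None:
        raise ValueError("no regimen recorded on or before this date")
    return current


def switched_recently(episodes: List[Episode], when: date, window_days: int = 274) -> bool:
    """True if a regimen switch began within `window_days` before `when` (274 days ~ 9 months)."""
    starts = sorted(start for start, _ in episodes)
    cutoff = when - timedelta(days=window_days)
    # The earliest episode is ART initiation, not a switch, so skip it.
    return any(cutoff <= s <= when for s in starts[1:])


# Hypothetical history: NVP-based regimen, switched to EFV-based before the pregnancy was noted.
history = [(date(2012, 1, 1), "NVP-based"), (date(2013, 6, 1), "EFV-based")]
noted = date(2013, 10, 15)
print(regimen_on(history, noted), switched_recently(history, noted))  # EFV-based True
```

Using the regimen on the date the pregnancy was noted, rather than an estimated conception date, is the simpler rule; the flag is one way to list pregnancies, like the two EFV-attributed cases mentioned above, whose conception may have occurred under an earlier regimen.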
We did not use double entry; however, field supervisors reviewed case files of women where a pregnancy was detected, as well as random samples of other case files.
Results: The results section is rather short (though it has expanded in this version). There are already 4 subheadings. We hope our rewriting has helped clarify any confusion. Table 3: we edited the table to refer to PI-only, not non-NNRTI. Specifically, these included LPV/r or ATV/r. Our raw data set included over a dozen combinations of antiretroviral agents (some of which were not consistent with HIV guidelines). A table listing all the possible combinations would be unwieldy and contain small numbers for many categories. Furthermore, the "PI" category is made up of a small number of observations. We don't feel breaking this category down further is warranted, given that we are primarily interested in NVP and EFV. However, researchers interested in other combinations can contact us for the data. The discrepancy between Table 3 and the text regarding the reference category has been corrected.
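For readers comparing combination categories against a single reference, as in Table 3, a crude incidence rate ratio with a Wald confidence interval on the log scale can be sketched as follows. This is an illustration only: the manuscript may have used Poisson regression or another method, and the counts in the usage line are made up. For a single categorical exposure, this crude IRR and its log-scale standard error coincide with those from an unadjusted Poisson model with a log person-time offset. The sketch refuses zero-event groups, which matters for the small PI category: with no pregnancies, the log IRR is undefined and an exact method would be needed instead.

```python
import math
from typing import Tuple


def incidence_rate_ratio(events: int, time: float, ref_events: int, ref_time: float,
                         z: float = 1.96) -> Tuple[float, float, float]:
    """Crude IRR of a group vs. a reference group, with a Wald CI on the log scale.

    `time` and `ref_time` are person-time at risk in the same units (e.g. woman-years).
    """
    if events == 0 or ref_events == 0:
        raise ValueError("Wald CI is undefined with zero events; use an exact method")
    irr = (events / time) / (ref_events / ref_time)
    se_log = math.sqrt(1 / events + 1 / ref_events)  # SE of log(IRR)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Made-up counts: 20 pregnancies in 400 woman-years vs 10 in 400 in the reference group.
print(incidence_rate_ratio(20, 400.0, 10, 400.0)[0])  # 2.0
```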
The referent combination was ETG-NVP, not ETG-EFV; the reason for this selection was presumed drug-drug interaction based on previous evidence. We felt comparing all combinations was most efficient. The pregnancy incidence rates remain the same even if the analysis separates by implant type. However, we redid an analysis of pregnancy IRRs for LNG combinations and inserted the results in the text to address your comment. We didn't repeat the ETG analysis, as the results are the same as shown. The text has been modified to clarify the comparisons. Also, we made Table 3
Discussion: You asked us to elucidate why we found no significant difference in the incidence of pregnancies among women using nevirapine- and efavirenz-based regimens. We acknowledge the possibility that this finding is an anomaly. In terms of biases, we listed extensive limitations to our approach, with numerous potential errors. However, we doubt that all 40 pregnancies could be ascribed to data errors. Only a little over 30% of the observations included nevirapine, so the comparisons are not balanced. We have spent time exploring our data set to identify possible alternative explanations without seeing any patterns that would provide a plausible explanation.
Our authorship team is composed of primarily program implementers with an implementation science lens. Identifying pharmacologists to consult with about our results is therefore difficult. We attempted to do so through contacts at Johns Hopkins University without success. With our already prolonged timeline for posting a new version of the paper, we abandoned the effort. We suggest that the findings be included in the literature as an impetus to further study the issue.
First paragraph: We had condensed this section in one of our versions, but have edited it to be explicit about the results for each combination category. We refined our description of the finding related to NVP.
Second paragraph: We added a sentence acknowledging that a different approach might have yielded even higher numbers of pregnancies. We feel we would be over-reaching if we attempted to comment on pregnancy rates in this context versus in non-African settings, as the number of studies is still very small.
Regarding the change in the sensitivity analysis removing older women: in general, we found a lack of robustness between the preliminary analysis and subsequent cleaning of the data. These pregnancies remain relatively rare events. However, we may be able to hypothesize that younger women are more fertile than women aged 35-49, and thus their higher fertility negates any differences by type of implant.
Implications for practice: We have revised the discussion section to reflect the transition to DTG, citing relevant literature. We believe the last sentence of the implications paragraph is already advocating for person-centered care and ability to pick a regimen that fits their fertility intentions.
Choice of "unintended pregnancy" vs "failure" terminology: We have reviewed our manuscript with this comment in mind and replaced the term "unintended pregnancy" in some places, as we mostly discuss implant contraceptive failures. However, the measurement of that failure is a pregnancy while using an implant. Therefore, there remain a few instances where the term "unintended pregnancy" cannot be replaced.
Competing Interests: We have no competing interest with respect to this review. This reviewer did email me separately to suggest one of the new citations (the protocol in the NIH project report), as she had forgotten to include that comment in her publicly posted review. We collaborated in the past given our common efforts in this region of Kenya.

Kavita Nanda
Contraceptive Technology Innovation, Global Health, Population and Nutrition, FHI360, Durham, NC, USA
This is an important study that adds to the literature on the issue of contraceptive effectiveness for implants among women taking antiretroviral therapies. However, the study is limited by its retrospective nature, and by the existing limitations of databases designed for other purposes. Of most concern is the outcome ascertainment, as detailed in the specific comments below.

Introduction:
The introduction is very long. Some of it could be moved to the discussion.
Methods: How many research assistants abstracted data for each participant? How was data accuracy monitored?
The study inclusion criteria are not clear. Did they have to be currently using an implant to be included? Or were women included if they ever used an implant and ART concurrently?
For what proportion of women were the data incomplete? Did RAs consult other records for all
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Thank you for your thoughtful review and feedback.
Introduction: Your comment is consistent with other reviewers and we have reduced the length of the introduction.
Methods: The first line of the data collection section clearly states that eleven RAs abstracted data (however, the data in the final analysis set exclude data abstracted by two of the RAs). Accuracy was monitored during regular supervision visits by two co-investigators. We moved the sentence at the end of the first section of the Methods to the data collection section in hopes it flows more logically. We added a sentence about the study inclusion criteria in hopes of making them clearer. The women did not have to be using an implant during data collection, but must have used one at some point between Jan 2011 and Dec 2015 while receiving ART at the same time.
Regarding incomplete data: most CCC records were missing data about co-morbidities and viral load, and many were missing the date of removal, reason for removal, etc. It would be difficult to give a proportion, as we did not calculate one across all variables. And yes, the RAs consulted all medical records of women meeting the study eligibility criterion of a CCC visit during the study period. However, in the Data Cleaning section we described excluding records that lacked data vital for our analysis. The most common reason a record was dropped due to incomplete data was a missing implant insertion date. RAs focused their search of other records on women whose CCC charts were incomplete. It is possible they searched even harder when they suspected there was a contraceptive failure. We have modified the statement about MSI data to clarify that this was an aid in searching records but did not dramatically change the process. RAs still confirmed data from client records within the facility, but may have benefited from retrieval of FP client record numbers to do so. The RAs used a few key personal identifiers to match records from CCC with those of the same clients in other units (such as ANC). These identifiers were name, age or DOB, and home sub-location. RAs were instructed to maintain the confidentiality of those details and to destroy them after completing medical record abstraction and having the abstraction reviewed by a supervisor.
Data on reasons for removal were abstracted from case files with the rest of the information recorded in a woman's record. The abstraction form included check boxes for likely reasons: pregnancy/method failure, menstrual/bleeding issues, desire for pregnancy, partner/spouse request, and expiration. Another field allowed RAs to add comments or provide an alternative reason for removal if one was found in the records. Almost all women with a recorded date of removal also had an indicated reason. Due to the nature of the study design, RAs could not verify pregnancies. However, where missing or inconclusive data were found in the client's records in the Comprehensive Care Clinics, RAs referred to other sources (e.g., FP registers, maternity ward registers, provider notes) to verify and complete the abstraction details.
Three of the investigators interacted closely with providers. We have never heard of a case of a client asking for a removal out of concern about a possible pregnancy that later turned out not to be a pregnancy. Providers are aware of amenorrhea as a possible side effect of implants. They also routinely carry out pregnancy tests.
Thank you for pointing out an inconsistency in our writing about the role of RAs as opposed to investigators. We have corrected it. Research assistants abstracted the data under the investigators' supervision (including close supervision by two of the co-authors of this paper).
Regarding concurrent use: no, some of the women whose data were abstracted did not use an implant and ART concurrently. Some used both, but not at the same time, and were dropped during data cleaning (they are included in the group that had fewer than three months of concurrent use). We did not estimate the fertilization date. We only documented pregnancy status from clinical records. However, we also added content in the limitations section to explain that records were not always clear.
About the quality of data on implant types: we have added detail on the process RAs undertook to clarify type of implant. We excluded any records where the RAs were unable to determine the type of implant. We considered whether implant type was missing in data cleaning. After other cleaning, 1 person was dropped for not having implant type recorded. However, records with other missing information may have been cleaned prior to that step.
Thank you for the affirmation about excluding pregnancies within three months of implant insertion.
Discussion and limitations: We are limited by the way health providers ascertain a pregnancy and record pregnancy in the charts. We have added this in the limitations section of the discussion. In terms of the concern related to other medications, we agree with you, as this was our secondary research objective. However, record-keeping on other medications was very poor and we were not able to analyze this. But you will note that, in response to reviewer comments, we expanded the results and discussion around potential drug-drug interactions further compounded by TB treatment. We already noted this in the second paragraph of our limitations section, but have expanded on it in this version. Regarding pregnancy in women not on ART: we do not include a comparison of women not on ART.
Competing Interests: We have no competing interests with respect to this review.
01 July 2019 Reviewer Report
https://doi.org/10.21956/gatesopenres.14082.r27364
© 2019 Chappell C. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
of reproductive age (15-49, like our study), 2500 currently using a method of FP. Of those, 813 used an implant.
Introduction, paragraph 4, sentence 4: You are correct that the distinction is just between "HAART with PI" and "HAART NNRTI". However, this is no longer relevant, as we've removed this portion of the text. We also appreciate your pointing us in the direction of the Chappell paper in AIDS. We added it to the discussion section and have also updated the Scarsi reference.
Methods: We moved the data cleaning section to the results per your recommendation.
Results, section "Analysis of incidence by ART and contraceptive use": we appreciate the suggestions and corrections and have incorporated them. We also hope we have clarified what was previously the last paragraph. Regarding that paragraph, you made a comment referring to the sensitivity analysis (which has now been incorporated into Table 3 per another reviewer's suggestion). We have tried to reword the description of the analysis and its interpretation to make it clearer. The main purpose of such an analysis is to describe the robustness of the results. Our analysis showed a lack of robustness in the difference in failures by type of implant. We don't feel showing the detailed data adds value.
Discussion: The multiple reviewer comments about the role of TB medications inspired us to take a closer look at the subset of records with indications of TB. We added a paragraph on TB results and a new box summarizing our exploration of a subset of records with TB treatment that describes the number or percentage of records with documentation of TB treatment in clinical records. We have added more commentary on the availability of TB-related data in the medical charts in the discussion. With respect to ascertaining that "women on NVP were actually on NVP and had not been switched to EFV for some times depending on ART availability": our methods for data abstraction relied upon complete and timely documentation within the facility-based comprehensive care clinic records. If there was a separate set of TB clinic client records, we did not review them. As for regimen, we can only attest that providers recorded the regimen of clients on their "Blue card" at every visit. It seems unlikely that an error of regimen would recur over multiple visits. We also did not hear of any historical concerns about ART availability.
We agree with you about the contradiction of our NVP finding with the literature. We stated this at the end of the first paragraph of the discussion. We added an acknowledgement at that point that this may be an artifact of the data quality and limitations of chart reviews. We have included 3 paragraphs of limitations in the paper and believe we are transparent on the problems with the chart review.
Competing Interests: We have no competing interest with respect to this review.