Contraceptive implant failures among women using antiretroviral therapy in western Kenya: a retrospective cohort study [version 1; peer review: 3 approved with reservations]

Background: Women living with HIV have the right to choose whether, when and how many children to have. Access to antiretroviral therapy (ART) and contraceptives, including implants, continues to increase due to a multitude of efforts. In Kenya, 4.8% of adults are living with HIV, and in 2017, 54% were receiving an efavirenz-based ART regimen. Meanwhile, 16.1% of all Kenyan married (and 10.4% of unmarried) women used implants. Studies have reported drug interactions leading to contraceptive failures among implant users on ART. This retrospective record review aimed to determine unintentional pregnancy rates among women 15-49 years of age, living with HIV and concurrently using implants and ART in western Kenya between 2011 and 2015. Methods: We reviewed charts of women with more than three months of concurrent implant and ART use. Implant failure was defined as implant removal due to pregnancy or birth after implant placement, but prior to scheduled removal date. The incidence of unintended pregnancy was calculated by woman-years at risk, assuming a constant rate. Results: Data from 1,152 charts were abstracted, resulting in 1,190 implant and ART combinations. We identified 115 pregnancies, yielding a pregnancy incidence rate of 6.32 (5.27–7.59), with 9.26


Introduction
All women, regardless of HIV status, have the right to choose whether, when and how many children to have. For women living with HIV (WLHIV), the right to decide not only impacts maternal and infant morbidity and mortality but is a pillar of prevention of mother-to-child transmission (PMTCT) by decreasing unintended pregnancies.
Long-acting reversible contraceptives (LARCs), including progestin-only implants, are efficacious, cost-effective, and have high continuation rates. Implants are 99% effective in preventing pregnancy and are increasing in popularity worldwide [1][2][3] . Coupled with UNAIDS's ambitious goal of reaching 90% of people living with HIV with antiretroviral therapy (ART) by 2020 4 , a growing number of women are accessing contraceptives and ART concurrently. The WHO provides guidelines for this population 5 .
In Kenya, 5.9% of adults 15-49 years of age were living with HIV in 2015. Western counties had the highest HIV prevalence rates, ranging from 6.7% in Busia to 26.0% in Homa Bay 6 . HIV treatment became widely available in the public sector in 2005, consisting of a combination of two nucleoside reverse transcriptase inhibitors (NRTIs) plus a non-nucleoside reverse transcriptase inhibitor (NNRTI). In 2015, tenofovir (TDF) became the preferred NRTI and efavirenz (EFV) replaced Nevirapine (NVP) as preferred NNRTI. As of 2017, approximately 54% of all people living with HIV (PLHIV) on ART in Kenya were taking an EFV-based regimen 7 .
Contraceptive prevalence, including implant use, has rapidly increased in Kenya 8 . An estimated 16.1% of married women and 10.4% of unmarried women use implants 9 . A study found that Kenyan WLHIV were more likely than their non-HIV affected peers to desire no more children and slightly more WLHIV used contraception 10 . Another study found high overall contraceptive use (91%) among women attending HIV care clinics, but low concurrent use of condoms 11 . NVP and EFV are metabolized in the liver via cytochrome P450, as is hormonal contraception. Levonorgestrel (LNG, brand names: Jadelle® and Sino-implant/Levoplant®) and Etonogestrel (ETG, brand names: Implanon®, Nexplanon®) are metabolized by the CYP3A4 enzyme along this pathway 12 . Pharmacokinetic and retrospective clinical studies have described concomitant use of EFV with implants, although with small samples of women [13][14][15][16][17][18] . Three prospective studies have examined the pharmacokinetic effects of ART on hormone levels among implant users. A prospective nonrandomized pharmacokinetic study in Brazil found reduced ETG bioavailability among women on EFV 19 . Another prospective pharmacokinetic study in Brazil reported no pregnancies for the duration of the three years, six months study period among Implanon users on various ART regimens including EFV 20 . Similarly, a prospective pharmacokinetic study conducted in Uganda found that LNG levels were 32-39% higher among NVP users and 40-54% lower among EFV users compared to women not on ART. Three of the 20 women on EFV who were enrolled in the study had drug levels below the minimum recommended concentration for contraceptive efficacy and three pregnancies occurred within 48 weeks 21 . Secondary analyses of Swaziland clinical trial data by Perry et al. found that, in a sample of 570 women, 15 of 121 (12.4%) women using LNG implants and EFV became pregnant, with no pregnancies in women using NVP 22 . 
A large retrospective clinical record review by Patel et al. in Kenya revealed unadjusted pregnancy rates of 5.5 (ETG) and 7.1 (LNG) among 24,560 women using EFV 19,23 .
Clinicians serving WLHIV need guidance to appropriately counsel their clients, as misinformation is creating uncertainty about how to describe contraceptive choices to these women 13,24 . This study aimed to contribute to evidence related to contraceptive failures among women who are on ART and use implants and ultimately improve counseling for WLHIV. The primary aim of this retrospective record review is to determine unintentional pregnancy rates among WLHIV (15-49 years old) concurrently using contraceptive implants and ART in nine facilities in Western Kenya between January 2011 and December 2015. The secondary aim is to describe the characteristics of concurrent implant and ART users with and without implant failures and to explore alternative correlates of method failure.

Methods
We reviewed charts of all women of reproductive age (15-49 years) who had at least three months of concurrent use of any ART and a contraceptive implant, and who accessed services at a high-volume health facility 1 offering comprehensive care for PLHIV. Prior to developing a protocol for the chart review, a feasibility assessment was conducted to: pretest data abstraction tools and processes; determine the degree of integration between HIV and FP services in high-volume facilities; establish whether linking HIV and FP client data was possible; and verify that there were cases of implant failure among ART clients. The investigators then prioritized nine health facilities 1 in Western Kenya based on completeness of medical records, data management processes, local HIV prevalence, results from past programs and the lack of fees for family planning services. The investigators also excluded facilities in which a similar study was being conducted by Family AIDS Care and Education Services (FACES). To mitigate potential bias, we trained research assistants prior to the initiation of the study using standard operating procedures including a data abstraction form and we conducted a pilot test during this training. Data collection was monitored through supervision visits and calls throughout study implementation. The investigators sought to identify possible pregnancies due to contraceptive implant failures from medical records: specifically, records of the client's reason for implant removal as pregnancy or that the client gave birth after receiving an implant and before its removal date.

Data collection
Each CCC client receives a unique identifier (eleven-digit, alpha-numeric code) at enrollment in HIV care and treatment services. The investigators only used this number to locate client records for data verification purposes. Client names were tallied but not included in data abstraction. De-identified data were entered into REDCap v6.14.0 (Research Electronic Data Capture), a web-based data management application, with limited, password-protected access. RAs were strictly instructed to properly store all paper and electronic copies of records with client-identifying information. Portable electronic devices did not contain identifiable information.

Data cleaning
The investigators abstracted 1,612 records from women concurrently using ART and an implant. Investigators excluded all 208 records from Busia County Referral Hospital due to inconsistencies in RAs' adherence to standard operating procedures. Subsequently, investigators removed: 36 suspected duplicate records, 50 records with invalid ART or missing implant data, 81 records which indicated that there were fewer than three consecutive months of concurrent implant and ART use, and 85 records where the woman did not meet other inclusion criteria.
All observations without concurrence end dates were censored at December 31, 2015 or earlier if the end of approved implant effectiveness (3 years for Implanon/Nexplanon® and Levoplant® and 5 years for Jadelle®, per WHO prequalifications 25 ) preceded the end of the observation period. Records which listed emtricitabine (FTC) within the ART regimen were regrouped with 3TC due to pharmacokinetic similarity 26 .
With exclusions, the dataset included 1,152 individual women (Table 1). The dataset was then expanded so that women who switched to a new ART and/or implant over the course of the study period were recoded as separate observations for each unique ART/implant co-administration, giving a final dataset of 1,190 observations.
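The record counts reported in this section can be tallied as a quick consistency check. The following is a minimal sketch that uses only the figures stated above; the variable names and dictionary labels are illustrative and are not taken from the study's own data files.

```python
# Record counts as reported in the data cleaning description.
abstracted = 1612

# Exclusions, in the order they are described (labels are illustrative).
exclusions = {
    "Busia County Referral Hospital (SOP adherence)": 208,
    "suspected duplicate records": 36,
    "invalid ART or missing implant data": 50,
    "fewer than three months of concurrent use": 81,
    "other inclusion criteria not met": 85,
}

# Women remaining after all exclusions.
remaining = abstracted - sum(exclusions.values())

# Observations after recoding each unique ART/implant combination separately.
final_observations = 1190
extra_observations = final_observations - remaining

print(remaining, extra_observations)
```

The tally reproduces the 1,152 women reported after exclusions, and the difference from the 1,190 final observations is the number of additional observations created by regimen or implant switches.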

Data analysis
We estimated contraceptive failure rates for LNG and ETG implants with concurrent ART regimens containing EFV, NVP, or a protease inhibitor (PI). We estimated the incidence of unintended pregnancy per 100 woman-years at risk. We defined woman-years at risk as the time from the start of concurrent implant and ART use (beginning from the start date of whichever was introduced second) to pregnancy, implant removal, end of approved implant effectiveness, end of ART use, or end of the study period, whichever came first. Incidence of unintended pregnancy was calculated as (number of pregnancies / number of woman-years at risk) x 100, assuming a constant rate of implant failure.
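The time-at-risk definition and the incidence calculation above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's analysis code: the function names, the 365.25-day year, and the log-scale approximate 95% confidence interval are assumptions, since the text does not say how intervals were derived. The dates and counts in the example are hypothetical.

```python
import math
from datetime import date


def years_at_risk(implant_start, art_start, end_candidates):
    """Woman-years from the later start date to the earliest applicable end date.

    end_candidates holds the possible end dates (pregnancy, implant removal,
    end of approved implant effectiveness, end of ART use, end of study);
    entries that do not apply are passed as None.
    """
    start = max(implant_start, art_start)  # whichever was introduced second
    end = min(d for d in end_candidates if d is not None)
    return max((end - start).days, 0) / 365.25  # assumed year length


def incidence_per_100(events, woman_years):
    """Incidence per 100 woman-years with an approximate 95% CI.

    The interval uses a normal approximation on the log rate
    (an assumption; the paper does not state its CI method).
    """
    rate = events / woman_years * 100
    half_width = 1.96 / math.sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical woman: ART started first, implant second, no pregnancy or removal.
wy = years_at_risk(
    implant_start=date(2013, 3, 1),
    art_start=date(2012, 6, 15),
    end_candidates=[None, None, date(2016, 3, 1), None, date(2015, 12, 31)],
)

# Hypothetical aggregate: 10 pregnancies over 200 woman-years.
rate, lower, upper = incidence_per_100(events=10, woman_years=200.0)
print(round(wy, 2), round(rate, 2), round(lower, 2), round(upper, 2))
```

In the example, observation starts at implant insertion (the later of the two start dates) and ends at the close of the study period, because that date precedes the end of approved implant effectiveness.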
We used Poisson regression to calculate the incidence rate ratios (IRRs) of pregnancy by age, CD4 count, BMI, ART regimen, and implant type. We repeated this for all six possible combinations of concurrent ART and implant use. Data on TB symptom screening, TB diagnosis and viral load were missing for over 96% of women, precluding analysis of these variables. Differences were deemed statistically significant at the p<0.05 level (two-sided test of significance). IRRs were calculated after adjusting for potential clustering within HIV clinics, using cluster-adjusted standard errors 27,28 . To test for robustness of results, we reran analyses excluding women aged 35-49. Two separate investigators conducted analyses and cross-checked results; both used Stata, in slightly different versions (14 and 15).
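For readers less familiar with IRRs, the sketch below computes a crude (unadjusted) incidence rate ratio for two groups. It is a simplification and not the model described above: it ignores clustering within clinics and uses a simple Wald interval on the log scale. For a single two-level covariate, a Poisson regression with a log person-time offset gives the same point estimate. The function name and the example counts are hypothetical.

```python
import math


def crude_irr(events_a, years_a, events_ref, years_ref):
    """Unadjusted IRR of group A versus a reference group, with a Wald 95% CI.

    No adjustment for clustering or covariates is made here.
    """
    irr = (events_a / years_a) / (events_ref / years_ref)
    se_log = math.sqrt(1 / events_a + 1 / events_ref)  # SE of log(IRR)
    lower = math.exp(math.log(irr) - 1.96 * se_log)
    upper = math.exp(math.log(irr) + 1.96 * se_log)
    return irr, lower, upper


# Hypothetical counts: 20 pregnancies in 400 woman-years vs. 30 in 300.
irr, irr_lower, irr_upper = crude_irr(20, 400.0, 30, 300.0)
print(round(irr, 2), round(irr_lower, 2), round(irr_upper, 2))
```

An interval that includes 1 corresponds to no statistically significant difference at the two-sided 0.05 level under this approximation.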
Kaplan-Meier failure curves were created to visualize time-to-implant failure for the different ART and implant combinations by using the date of unintended pregnancy as the failure date. We present duration of concurrent ART-implant use for interpretation of Kaplan-Meier results.
Table 2 provides age and clinical status of women during each instance of concurrent use of ART and an implant, which allows individuals who changed their regimen and/or implant during the study period to be counted as more than one observation. The most common combination of implant and ART use was LNG-EFV (39.4%), followed by ETG-EFV (26.7%), LNG-NVP (16.7%), ETG-NVP (14.8%), LNG-PI (1.3%), and ETG-PI (1.0%).
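The Kaplan-Meier failure curves mentioned in the data analysis plot one minus the product-limit survival estimate over time. The sketch below shows that calculation in plain Python, assuming one record per observation with a follow-up duration and a flag for whether it ended in pregnancy. The function name and the toy data are hypothetical, and the study itself used Stata.

```python
def kaplan_meier_failure(durations, events):
    """Return [(time, cumulative failure probability)] at each failure time.

    durations: follow-up time for each observation.
    events: True if follow-up ended in pregnancy, False if censored.
    Failures at a given time are counted before censorings at that time.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        failures = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if failures:
            survival *= 1 - failures / at_risk  # product-limit step
            curve.append((t, 1 - survival))
        at_risk -= leaving  # failures and censored both leave the risk set
    return curve


# Toy data in months: three pregnancies and two censored observations.
curve = kaplan_meier_failure(
    durations=[3, 5, 5, 8, 12],
    events=[True, False, True, False, True],
)
for time, failure in curve:
    print(time, round(failure, 3))
```

Censored observations, such as women still using the implant at the end of the study period, reduce the number at risk without adding a step to the curve.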

Demographic and ART information
In total, 32 of the 1,152 women included in the dataset switched ART regimens while using an implant during the study period; 30 women were exposed to two regimens and two women were exposed to three regimens. Four women used implants twice. Three women were exposed to three different combinations of implant and ART.
Table 3 presents pregnancy IRRs and incidence in person-years of observations broken down by clinical status of women at each observation of co-administration. There were 115 pregnancies in the 1,190 instances of co-administration, yielding a pregnancy incidence rate of 6.32 (95% CI: 5.27-7.59), with 9.26 (95% CI: 7.18-11.96) among ETG and 4.74 (95% CI: 3.65-6.16) among LNG implant users, respectively.

Analysis of incidence by ART and contraceptive use
The pregnancy IRRs did not differ between EFV- and NVP-based regimens (IRR = 1.00; 95% CI: 0.71-1.43). No pregnancies were recorded among women on non-NNRTI-based regimens among both ETG and LNG implant users, which was
Analysis of implant and ART combination showed that the pregnancy incidence rates among women using LNG implants with EFV and NVP or using a PI with either implant were significantly different from the reference population of ETG-EFV users.
Duration to failure curves (Figure 1) show that pregnancies began occurring within months of concurrent use, steadily accumulating thereafter. The shortest time to pregnancy was 94 days (16.5% and 48.7% within 6 and 12 months, respectively). ETG implant users had more early pregnancies than LNG implant users, but this is likely correlated with higher overall pregnancy incidence and shorter approved duration of use.
As shown in Figure 2, the distribution does show skewing of our sample towards shorter periods of co-administration, given that the rise in implant use in Kenya is a recent phenomenon. The median duration of implant and ART co-administration in our dataset is 1.33 years (not shown).
The sensitivity analysis among women aged 15-34 resulted in slightly higher pregnancy incidence (6.89 per 100 person-years [CI: 5.69-8.35]). Other results remained similar, except IRRs for LNG-NVP and LNG-EFV combinations compared to ETG-NVP were no longer statistically significant. This suggests a lack of robustness in differences across drug-drug combinations, a likely artifact of higher use of ETG implants by younger women.

Discussion
An increase in the incidence of pregnancy among implant users on ART may negatively impact acceptability and trust in implants as a contraceptive choice for all women and their partners, regardless of HIV status. Our findings confirm earlier reports of implant failures among women taking EFV-based ART. Pregnancy incidence ranged from 4.78 to 9.84 per 100 person-years for women using an implant with an NNRTI, a rate considerably higher than found in the general population 29,30 . Our results differ from previous studies in that NVP use was linked with higher incidence of pregnancies 21,23 . Among NVP users, pregnancy incidence ranged from 4.85 to 8.68 per 100 person-years with concurrent LNG and ETG users, respectively. This is the first report, to our knowledge, to present evidence that NVP-based ART regimens may also influence effectiveness of both ETG and LNG. While surprising, the higher incidence of pregnancies in this population is biologically plausible given the metabolic pathways in the liver for both NNRTIs and progestins 31 .
Our exclusion of any pregnancy detected within three months of initiating co-administration minimizes accidental insertions of implants when conception has already occurred, which was a common problem in the Australia post-marketing study and could explain pregnancies in other studies 32 . In our study, pregnancy incidence rates in women using EFV were higher for women using ETG implants than those using LNG implants, but this finding is not robust among younger women. This pattern was repeated among NVP users.

Limitations
Retrospective studies can be limited due to issues of reliability and completeness of medical charts. Prospective studies with more frequent and standardized timing of follow-up could address some of these limitations. Prospectively capturing data could also allow for further analysis of drug-drug interactions with either ART or implants, such as rifampicin for TB or artemether-lumefantrine for malaria. However, the necessary financial investment and length of time would be substantial.
Our retrospective chart review had intended to measure multiple factors, including stage of HIV disease and other drug interactions; however, incompleteness of medical records did not allow for adequate analysis to answer these questions. We had also hoped to draw conclusions about timing of failure over time. However, because of recent expansion in use of implants in Western Kenya, most of the implant users in our data set had a relatively short duration of concurrent use with ART.
Biases may have also been introduced to the results because of variations in how RAs assessed abstracted records, due to the differing record management systems in the study facilities. Given the limitations in completeness of charts in the CCCs, the RAs diligently sought evidence of pregnancies in other units of the facilities, but it is difficult to do this systematically across diverse facilities.

Implications for practice
Women should be afforded choices when selecting FP and ART. However, women are generally unable to choose their ART. EFV is a safe, effective drug and will not be phased out soon as a first-line HIV treatment. Additionally, many countries still have a number of WLHIV using NVP, despite its continued phase out.
In 2016, WHO included dolutegravir (DTG) as an alternative drug for first-line regimens, which became available in Kenya in 2017 33 . Co-administration with implants has not been studied, although an interaction is unlikely, as DTG is not expected to significantly inhibit or induce CYP450 enzymes. Though DTG shows promise for women who wish to use contraceptive implants, concerns about potential birth defects have led to recommending that women of reproductive age be offered EFV instead. Results from a study in Botswana by Westhoff to assess change in ETG plasma levels among ETG implant users taking DTG are expected in late 2019 34,35 .
It remains challenging to incorporate the results of this study into service delivery for WLHIV. It is premature to discourage women from adopting implants. However, women on NNRTIs need information, in simple, yet accurate language, about the drug interaction and possibility of method failure. Experiences in South Africa have demonstrated the difficulties in ensuring providers understand these drug interactions and are comfortable sharing information with clients 24 . One option is to refer to the tier of effectiveness 36 , but explain that for women on certain ART regimens, the implant falls somewhere in the middle tier of effectiveness: slightly better than pills or injectables, but less effective than IUDs or sterilization. Women should retain the right to make fully informed decisions about which contraceptive option works best for them and what level of method failure risk is acceptable, given that they cannot switch ART regimen based on their fertility preferences. Ideally, building a client-centered culture would allow WLHIV desiring control over their fertility to be offered DTG concurrently with an implant or other LARC.
At minimum, programs should encourage better documentation of all medications in medical charts. Health systems should establish mechanisms to track adverse events (pregnancies) in WLHIV using ART. Study facilities had ongoing efforts to integrate FP within HIV care and treatments services. Fully integrating FP services by including LARCs within services for WLHIV may improve care. Health systems should also ensure that IUDs are as accessible as implants and remove barriers to sterilization services. Research into the benefits and costs of alternative service delivery models could inform national policies.
In conclusion, implants are highly effective; however, clients using an NNRTI-containing ART regimen need additional information about the higher incidence of pregnancies when implants are used in combination with ARVs, to allow them to make informed decisions about contraceptive options.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Introduction
○ The last line in the 2nd paragraph is extremely vague; I don't understand what the authors are trying to convey. It would be better to more explicitly elucidate their thinking.
○ The first line in the 3rd paragraph should have a citation, and if it is the same citation as #6, then it should be cited with the first sentence with the assumption that the 2nd sentence is supported by the same citation (unless the citation standards for this journal indicate otherwise).
○ Same paragraph: The 3rd sentence needs a bit of fine tuning, in that this combination is generally recommended as 1st line ART regimen. Also, the next line appears inaccurate on the year; if I am not mistaken, single pill combination containing EFV was the recommended 1st line ART regimen before 2015 in Kenya (maybe even before 2012? Please provide your citation for this assertion if you disagree). Also, the next paragraph doesn't make a lot of logical sense as written, and if the authors want to name the leading 1st line ART regimen, they should name all 3 components of it. The "n" in nevirapine should not be capitalized. For the last line, if they can cite specific number on how many women of reproductive potential were on EFV-containing regimen, the sentence would be more to the point.
○ It seems like the paragraphs in the introduction could be organized better. For example, the first line in the 2nd paragraph discusses LARCs and implants and then discussion switches to ART for the rest of the paragraph. The 3rd paragraph then discusses more on ART and the 4th comes back to contraceptives. The 5th paragraph then goes into describing the potential pathway for any drug-drug interactions at play here, and the data to date on this topic. Strongly reconsider the organization of the intro section for more logical flow.
○ 5th paragraph: 2nd line, technically 4 of the 6 citations are for case reports (which are not the same as retrospective clinical studies). The line discussing the Uganda PK study starts with "Similarly", which does not make any logical sense. It would greatly help readers if the authors only used either ETG/LNG acronyms or the brand names, but not both. Last sentence: technically this was not a record review, as is the current study, but rather a retrospective study of electronic medical record data. It would help to indicate the denominator for the rates, i.e. "per 100 women-years of implant use," and it appears that citation #19 is mistakenly included here.
○ 6th paragraph: it appears the first line mistakenly cites #13 (as that citation is not applicable to their statement). Arguably, the goals of the 2nd and 3rd lines are redundant; might consider combining by saying something along the lines of: "With the ultimate goal of improving counseling for WLHIV, the primary aim of this…"

Methods
○ Data collection, 1st paragraph: how did you know who was seen between Jan 2011 and Dec 2015? When listing all the variables you extracted, it might make sense to name them all and indicate that the applicable dates were also extracted (instead of mentioning it twice, and presumably it is applicable to not only viral load results but also CD4 count results). It may help to discuss how pregnancies are detected generally in these facilities, i.e. was it detected with clinical presentation while being gravid or via routine biochemical testing (e.g. urine pregnancy tests)?
○ 2nd paragraph: might consider saying "facilities" throughout the manuscript, unless you do literally mean "hospitals", in which case it would help to explain which hospitals you selected and why.
○ Data cleaning paragraphs: There is a lot of detail here not usually included in publications, but the details do help. It appears only 38 women had a 2nd or 3rd observation in the dataset; it might help to state it as something along these lines. Given the organization of Table 1, you could consider including here or in the text, the median duration of observation time for 1st observation, 2nd observation, etc. Could you provide the justifications to censor observation time after the recommended use of implant durations, given that maintaining those times arguably provides even stronger evaluation for real-world effectiveness (if the woman is still using the implant past its recommended duration)?
○ Data analysis: only clarification is on last line of first paragraph, as "x100" is not clear if it applies to numerator or denominator. Perhaps consider saying "per 100 women-years."
○ A couple questions: How did you arrive at your sample size? Was there ever a determination of the required sample size to achieve power for detecting a difference between certain combinations? Or did you simply go with a convenience sample for what was feasible for the study duration? Could you also describe how you determined ascertainment of the incident pregnancy to a combination category? i.e. was it the combination category at the time of the pregnancy detection or did you base it on date of likely conception (and if latter, please describe how you calculated the date of likely conception)? Did you consider double entry for cases where a pregnancy was detected (i.e. some cross-check or validation of your initial data entry), as data entry errors by RAs could have also occurred?

Results
○ Overall, the various comparisons are confusing and it is difficult for me to follow which comparison is being made when. It may help to use additional subheadings to clearly separate the findings, and it may also help to prioritize the main findings first.
○ Small note on Table 3, that the 3rd category of ART is "non-NNRTI" which does not necessarily equate to PI-containing ART (which is how the 3rd category is defined in the manuscript), so it would be better to clarify if PI-only. Of note, I would not recommend a non-specific non-NNRTI category as the drug-drug interaction implications vary markedly by not only ART family but individual antiretrovirals.
○ Paragraph after Table 3 that starts with "Analysis of implant…": Could you provide justification for why ETG-EFV combination was used as the referent group? It seems like picking a category where you don't expect a drug-drug interaction, such as with NVP, is a better "control" group or the counterfactual. (Of note, when looking at Table 3, I see that you used ETG-NVP as your reference category; please rectify the difference.) More importantly, please justify why you would use ETG-EFV as the referent group for the LNG combinations? Wouldn't you want to compare each implant type against itself, given the possible differences in hormone concentrations, affinity to receptors, recommended duration of use, number of rods, etc.? As a possible secondary analysis, you can compare the ETG to LNG implants. Also, the conjunctions of "EFV and NVP or using a PI" are confusing, please consider rephrasing for clarity.
○ Figure 1: note use of PI here as opposed to NNRTI.
○ 2nd to last paragraph: the utility of Figure 2 is not clear; however, if you wish to keep it, please consider dividing it or creating 3 separate figures, for example by ART type with the breakdown of implant type within it. Also, when reporting median duration of co-administration time, please also report standard IQR or range, and consider doing this for your 3 or 6 combination categories.
○ Several of the results paragraphs end in interpretation of the findings, which would be better discussed in the discussion section.

Discussion
○ The biggest missed opportunity in the discussion section is in better elucidating why no significant differences were found between EFV and NVP and respective implant combination groups. What are the possible reasons you did not detect a difference when several PK studies and other retrospective studies (namely Perry and Patel) have found differences? What are possible biases that arise from your chosen methodological approach? NVP's DDI profile is markedly different from EFV, so grouping them together and suggesting that all NNRTIs have similar effects is not well supported by existing PK and clinical studies. To this point, etravirine and rilpivirine, other NNRTIs, also have very different DDI and PK profiles than EFV. Given the topic is deeply rooted in the PK world, greater clarity from the PK perspective on the study findings is a must.
○ First paragraph: the use of "ranged" is likely not appropriate, as those are specific point estimates for certain combination categories, correct? So you might as well state that they are for x and y combinations, respectively. Also, generalizing to all NNRTIs is not appropriate as each NNRTI has a very different drug-drug interactions profile. Also, the first paragraph could use some rewording. It seems like in the middle of it, you are trying to highlight that, surprisingly, you found a higher rate of failures with NVP than you had hypothesized at the start of your study. Finally, in the latter half of the first paragraph, the interpretation of the authors for this finding comes across as too definitive. Isn't it also possible that the nearly equivalent rate of implant failures among NVP and EFV users you found may signal possibly a "baseline" rate of failures among implant users? Of course, it does not explain the higher than the "baseline" rate that has been described in other settings.
○ First half of second paragraph: indeed, it is robust of you to censor pregnancies within the first 3 months of implant placement in cases where the implant may have been placed with a pregnancy already in place. However, shouldn't that result in a bias towards a lower pregnancy rate compared to the post-marketing surveillance study from Australia that you are comparing to? It would be prudent to spend some space in the discussion section to offer thoughts on why the overall pregnancy rates are higher in your study than in other, predominantly non-African, settings.
○ The second half of second paragraph: this is a very interesting finding, and deserves discussion in a separate paragraph. Why do you think this difference might exist? Why does it appear to diminish among younger women?
○ Implications for practice: 1st paragraph, cite your basis for not anticipating DDIs with DTG and implants (i.e. the OC PK study by Song et al. 1 ). 2nd paragraph, shouldn't women also be offered the choice to pick an ART regimen that meets their values and preferences (e.g. DTG, knowing that there is currently an unknown and possible risk with adverse birth outcomes)? Given the great women-centered approach the intro and the first half of this paragraph takes, it seems like a missed opportunity to also not advocate for WLHIV to also be informed with best information available at the moment to make their own decisions.

Throughout
Consider saying either "failure rates" or "unintended pregnancy" throughout the manuscript, as switching between the two, often in back-to-back sentences, is confusing for readers who may not be familiar with the "contraception world" where the two equate to each other.

Is the study design appropriate and is the work technically sound? Partly
Are sufficient details of methods and analysis provided to allow replication by others?

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: Yes, I have some minor competing interests in that I was asked to provide guidance to some of the authors in study design initially prior to data collection (though was not involved in study implementation at all) and was aware of their initial preliminary analyses. I have conducted very related analyses, and this group and mine held a joint study dissemination and stakeholder's meeting in Kisumu, Kenya in July of 2017 (some of our related analyses have been published to date, but not the primary results).

Reviewer Expertise: HIV and women's health; cohort studies; contraception; pharmacokinetic studies; resource-limited settings

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 21 Dec 2019
Anne Pfitzer, Jhpiego, 1776 Massachusetts Ave, NW Suite 300, Washington, USA
Thank you for taking the time to summarize and provide an overall assessment of the paper.
We note your reservations and have attempted to address them in the paper and in the responses below.
Regarding the choice of reference group, we have kept our presentation as it was, but added results of modeling within LNG reference in the text. The results remain unchanged for comparisons within ETG-ART combinations.
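As an illustration of what a comparison against a reference combination involves, the following is a minimal sketch of a crude incidence rate ratio (IRR) with a Wald confidence interval on the log scale. It is not the model used in the study, whose specification is not given here. The function name and the counts in the usage note are hypothetical and chosen only for illustration.

```python
import math


def incidence_rate_ratio(events, years, ref_events, ref_years, z=1.96):
    """Crude IRR of one group versus a reference group.

    events / years         : pregnancies and woman-years in the comparison group
    ref_events / ref_years : the same quantities in the reference group
    z                      : normal quantile (1.96 gives a 95% interval)

    Assumes a constant rate within each group and at least one event in each.
    """
    rate = events / years
    ref_rate = ref_events / ref_years
    irr = rate / ref_rate
    # Standard error of log(IRR) for two independent Poisson counts.
    se_log = math.sqrt(1.0 / events + 1.0 / ref_events)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper
```

For example, `incidence_rate_ratio(10, 100.0, 20, 100.0)` compares a hypothetical group with 10 pregnancies in 100 woman-years against a reference group with 20 pregnancies in 100 woman-years. Changing the reference group inverts or rescales the ratios but does not change the underlying group-level rates.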
We agree that this paper is unable to provide a plausible explanation for some of the findings of drug-drug interactions with NVP-based regimens. We argue that we are merely sharing observations from medical chart data and that it is beyond our scope to assess underlying reasons. We hope that by publishing these data, we encourage further investigations. If this study is a complete outlier, then reviewers will come to their own conclusions, including about the difficulties of using retrospective, poorly documented medical charts for investigations of this sort.
We have taken your feedback about involving a pharmacologist in the review of our paper and attempted to do this, but could not find anyone able or willing to assist us.
Abstract: we have corrected the terminology used for drug-drug interactions and throughout the rest of the manuscript.
Introduction: We accepted your feedback and have made a number of edits to this section.
In particular, we edited the text to reflect the first-line ART recommendations (two NRTIs and one NNRTI) prior to the WHO recommendation in Dec 2018 transitioning to the integrase inhibitor, dolutegravir. The 2011 guidelines mentioned EFV or NVP in 3 first line regimens (along with either TDF or AZT or d4T as the first agent in the combination therapy). In 2014, TDF/3TC/EFV was the preferred first-line regimen for adults, and this has been stated. Relevant references have been included.
We disagree that the Nanda reference (citation numbers have been changed) is not applicable to the statement you indicated. The final conclusion in the Nanda paper is that "More well-designed prospective studies are needed to examine potential drug interactions between ARVs and all contraceptive methods, to better inform guidelines and counseling for the more than sixteen million women living with HIV" and a shorter version of this statement appears in the abstract.
Methods: with respect to knowing about visits, we know who was seen in CCCs within the dates in their medical charts and the presence of their charts in the record rooms. We have clarified where dates of medical tests etc. were also extracted in the revised version. We also added a short description about how pregnancy was typically noted in records and changed "hospital" to "facility" throughout the manuscript. We moved the Data Cleaning section to the results. The number of women who switched either ART or implants or both during the observation period is described in the second paragraph of demographic and ART information. We added mean and median durations of observation.
We censored the observation time after the recommended duration of implant use so as not to confuse the interpretation of results. If women concurrently using "expired" implants and ART experience a pregnancy, one could rightly assume that the failure is due to the loss in implant efficacy. We wanted to ensure that contraceptive failures could only be due to drug-drug interactions and not linked to the devices' efficacy. While understanding the real world effectiveness issues of women using implants past their labeled effectiveness is interesting, it merits a different study.
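The censoring rule described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the labeled durations (3 years for ETG, 5 years for LNG), the record layout, and the function names are assumptions made for the example, and the sketch ignores the requirement of concurrent ART use and the three-month minimum.

```python
from datetime import date

# Assumed labeled durations of use in years; illustrative values only.
LABELED_YEARS = {"ETG": 3, "LNG": 5}


def add_years(d, years):
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def years_at_risk(insertion, implant_type, last_seen, event_date=None):
    """Return (woman-years observed, whether the pregnancy is counted).

    Observation ends at the earliest of the pregnancy date, the last
    recorded visit, and the labeled expiry date of the implant. A pregnancy
    after expiry or after the last visit is not counted as a failure.
    """
    expiry = add_years(insertion, LABELED_YEARS[implant_type])
    end = min(d for d in (event_date, last_seen, expiry) if d is not None)
    counted = event_date is not None and event_date <= end
    return max((end - insertion).days, 0) / 365.25, counted


def incidence_per_100(records):
    """Pregnancies per 100 woman-years, assuming a constant rate."""
    total_years = 0.0
    events = 0
    for r in records:
        years, counted = years_at_risk(**r)
        total_years += years
        events += counted
    return 100 * events / total_years
```

Under this rule, a woman who keeps an "expired" implant contributes time only up to the labeled expiry date, so a later pregnancy adds neither an event nor extra woman-years to the rate.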
On the sample size: we had determined in our protocol that we would need at least 177 concurrent users of LNG and EFV and 116 concurrent users of ETG and EFV to reject the null hypothesis. However, we were concerned that selectively abstracting medical records from the facilities would lead to bias, so we instead abstracted all records of concurrent users, yielding a much larger sample. This is why we omit discussion of sample sizes in the manuscript.
As discussed in the methods, we divided concurrent use for each combination of drugs, instead of assessing use by each woman… We based the original analysis on the regimen on date of pregnancy noted in records. Based on this comment, we reviewed the data and found two women that had a regimen switch within 9 months of the pregnancy date. Both were analyzed as EFV-users, but had switched from a NVP-based regimen within 4-5 months of the date pregnancy was noted. Thus, it is possible these were in fact pregnancies that occurred during NVP use. We have added a note about this in the results.
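A check of this kind, flagging pregnancies preceded by a recent regimen switch, could be sketched as below. The data layout (a date-sorted list of regimen start dates per woman), the function name, and the conversion of 9 months to 274 days are assumptions for illustration; this is not the procedure actually used on the study data.

```python
from datetime import date, timedelta


def switches_before(pregnancy_date, regimen_history, window_days=274):
    """Return regimen changes that occurred within window_days before
    the date pregnancy was noted.

    regimen_history: list of (start_date, regimen) tuples sorted by start_date.
    Each result is (switch_date, previous_regimen, new_regimen).
    """
    window_start = pregnancy_date - timedelta(days=window_days)
    flagged = []
    # Walk consecutive pairs of regimen entries and keep real changes
    # whose start date falls inside the look-back window.
    for (_, prev), (start, curr) in zip(regimen_history, regimen_history[1:]):
        if prev != curr and window_start <= start <= pregnancy_date:
            flagged.append((start, prev, curr))
    return flagged
```

A record flagged this way would be one where the regimen on the date pregnancy was noted may differ from the regimen in use at conception, as in the two cases described above.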
We did not use double entry; however, field supervisors reviewed case files of women where a pregnancy was detected, as well as random samples of other case files.
Results: The results section is rather short (though it has expanded in this version). There are already 4 subheadings. We hope our rewriting has helped clarify any confusion. Table 3: we edited the table to refer to PI-only, not non-NNRTI. Specifically, these included LPV/r or ATV/r. Our raw data set included over a dozen combinations of antiretroviral agents (some of which were not consistent with HIV guidelines). A table listing all the possible combinations would be unwieldy and contain small numbers for many categories. Furthermore, the "PI" category is made up of a small number of observations. We don't feel breaking this category down further is warranted given that we are primarily interested in NVP and EFV. However, researchers interested in other combinations can contact us for the data. The discrepancy between Table 3 and the text when it comes to the reference category has been corrected.
The referent combination was ETG-NVP, not ETG-EFV; the reason for this selection was presumed drug-drug interaction based on previous evidence. We felt comparing all combinations was most efficient. The pregnancy incidence rates remain the same even if the analysis separates by implant types. However, we redid an analysis of pregnancy IRRs for LNG-combinations and inserted the results in the text to address your comment. We didn't repeat the ETG analysis as the results are the same as shown. The text has been modified to clarify the comparisons. Also, we made Table 3
Discussion: You asked us to elucidate why we found no significant difference in the incidence of pregnancies among women using nevirapine- and efavirenz-based regimens. We acknowledge the possibility that this finding is an anomaly. In terms of biases, we listed extensive limitations to our approach, with numerous potential errors. However, we doubt that all 40 pregnancies could be ascribed to data errors. Only a little over 30% of the observations included nevirapine, so the comparisons are not balanced. We have spent time exploring our data set to identify possible alternative explanations without seeing any patterns that would provide a plausible explanation.
Our authorship team is composed of primarily program implementers with an implementation science lens. Identifying pharmacologists to consult with about our results is therefore difficult. We attempted to do so through contacts at Johns Hopkins University without success. With our already prolonged timeline for posting a new version of the paper, we abandoned the effort. We suggest that the findings be included in the literature as an impetus to further study the issue.
First paragraph: We had constricted this section in one of our versions, but have edited to be explicit about the results for each combination category. We refined our description of the finding related to NVP.
Second paragraph: We added a sentence acknowledging that a different approach might have yielded even higher numbers of pregnancies. We feel we would be over-reaching if we attempted to comment on pregnancy rates in this context versus in non-African settings, as the number of studies is still very small.
Regarding the change in the sensitivity analysis removing older women: in general, we found a lack of robustness between preliminary analysis and subsequent cleaning of the data. These pregnancies remain relatively rare events. However, we may be able to hypothesize that younger women are more fertile than women aged 35-49, and thus the higher fertility negates any differences in type of implants.
Implications for practice: We have revised the discussion section to reflect the transition to DTG, citing relevant literature. We believe the last sentence of the implications paragraph is already advocating for person-centered care and ability to pick a regimen that fits their fertility intentions.
Choice of "unintended pregnancy" vs "failure" terminology: We have reviewed our manuscript with this comment in mind and replaced the unintended pregnancies term in some places, as we mostly discuss implant contraceptive failures. However, the measurement of that failure is a pregnancy while using an implant. Therefore there remain a few instances where the term "unintended pregnancy" cannot be replaced.
for implants among women taking antiretroviral therapies. However, the study is limited by its retrospective nature, and by the existing limitations of databases designed for other purposes. Of most concern is the outcome ascertainment, as detailed in the specific comments below.

Introduction:
The introduction is very long. Some of it could be moved to the discussion.

Methods:
1. How many research assistants abstracted data for each participant? How was data accuracy monitored?
2. The study inclusion criteria are not clear. Did they have to be currently using an implant to be included? Or were women included if they ever used an implant and ART concurrently?
3. For what proportion of women were the data incomplete? Did RAs consult other records for all women? Or only those on certain regimens? How were the MSI data matched to the ART data? How was the CCC number matched to other records?
4. How were data collected on reasons for removal? Were actual pregnancies verified if a woman said her implant was removed due to pregnancy? My concern is that a woman could be concerned about the risk of pregnancy and ask for her implant to be removed without actually being pregnant. Also, how good were data on implant removal (for other reasons).
5. You state that the investigators abstracted 1,612 records from women concurrently using ART and an implant. However, earlier you stated that the RAs abstracted data. Please clarify. Also, are these all women who at any time used ART and an implant concurrently?
6. Please provide details on how you estimated the estimated fertilization date.
7. How good were data on type of implant? Was implant type always noted?
8. The exclusion of pregnancies within three months of implant insertion is important and appropriate.
9. In the discussion of limitations, the author should spend more time discussing the lack of information on other medications, as well as any uncertainties in outcome ascertainment. I am concerned that the way pregnancy was determined may not be accurate, as pregnancies were not clinically verified (just a desire for removal due to pregnancy). I'm also concerned that other co-administered medications are not adequately captured. Retrospective studies such as this, though hypothesis generating, are subject to bias and confounding. Were RAs as diligent in seeking out possible pregnancy in women not using ART?

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Partly

even harder when they suspected there was a contraceptive failure. We have modified the statement about MSI data to clarify that this was an aid in searching records but did not dramatically change the process. RAs still confirmed data from client records within the facility, but may have benefited from retrieval of FP client record numbers to do so. The RAs used a few key personal identifiers to match records from CCC with those of the same clients in other units (such as ANC). These identifiers were name, age or DOB, home sub-location. RAs were instructed to maintain confidentiality of those details and destroy them after completing medical record abstraction and having those reviewed by a supervisor.
Data on reasons for removal were abstracted from case files with the rest of the information recorded in a woman's record. The abstraction form included check boxes for likely reasons: pregnancy/method failure, menstrual/bleeding issues, desire for pregnancy, partner/spouse request, and expiration. Another field allowed RAs to add comments or provide an alternative reason for removal if one was found in the records. Almost all women with a recorded date of removal also had an indicated reason. Due to the nature of the study design, RAs could not verify pregnancies. However, when missing or inconclusive data were found in the client's records in the Comprehensive Care Clinics, RAs referred to other sources (e.g., FP registers, maternity ward registers, provider notes) to verify and complete the abstraction details.
Three of the investigators interacted closely with providers. We have never heard of a client asking for a removal out of concern about a possible pregnancy that later turned out not to be one. Providers are aware of amenorrhea as a possible side effect of implants. They also routinely carry out pregnancy tests.
Thank you for pointing out an inconsistency in our writing about the role of RAs as opposed to investigators. We have corrected this. Research assistants abstracted the data under the investigators' supervision (including close supervision by two of the co-authors of this paper).
Regarding concurrent use: no, some of the women whose data were abstracted did not use an implant and ART concurrently. Some used both, but not at the same time, and were dropped during data cleaning (they are included in the group that had fewer than three months of concurrent use).
We did not estimate the fertilization date. We only documented pregnancy status from clinical records. However, we also added content in the limitations section to explain that records were not always clear.
About the quality of data on implant types: we have added detail on the process RAs undertook to clarify type of implant. We excluded any records where the RAs were unable to determine the type of implant. We considered whether implant type was missing in data cleaning. After other cleaning, 1 person was dropped for not having implant type recorded. However, records with other missing information may have been cleaned prior to that step.
Thank you for the affirmation about excluding pregnancies within three months of implant insertion.
Discussion and limitations: We are limited by the way health providers ascertain a pregnancy and record pregnancy in the charts. We have added this in the limitations section of the discussion. In terms of the concern related to other medications, we agree with you, as this was our secondary research objective. However, record-keeping on other medications was very poor and we were not able to analyze this. But you will note that we expanded the results and discussion around potential drug-drug interactions further compounded with TB treatment, in response to reviewer comments. We already noted this in the second paragraph of our limitations section, but have expanded on it in this version. Regarding pregnancy in women not on ART: we do not include a comparison of women not on ART.
Results, section "Analysis of incidence by ART and contraceptive use": we appreciate the suggestions and corrections and have incorporated them. We also hope we have clarified what was previously the last paragraph. Regarding that paragraph, you made a comment referring to the sensitivity analysis (which has now been incorporated into Table 3 per another reviewer's suggestion). We have tried to reword the description of the analysis and its interpretation to make it clearer. The main purpose of such an analysis is to describe the robustness of the results. Our analysis showed a lack of robustness in the difference in failures by type of implant. We don't feel showing the detailed data adds value.
Discussion: The multiple reviewer comments about the role of TB medications inspired us to take a closer look at the subset of records with indications of TB. We added a paragraph on TB results and a new box summarizing our exploration of a subset of records with TB treatment that describes the number or percentage of records with documentation of TB treatment in clinical records. We have added more commentary on the availability of TB-related data in the medical charts in the discussion. With respect to ascertaining that "women on NVP were actually on NVP and had not been switched to EFV for some times depending on ART availability": our methods for data abstraction relied upon complete and timely documentation within the facility-based comprehensive care clinic records. If there was a separate set of TB clinic client records, we did not review them. As for regimen, we can only attest that providers recorded the regimen of clients on their "Blue card" at every visit. It seems unlikely that an error of regimen would recur over multiple visits. We also did not hear of any historical concerns about ART availability.