1 Background

The human immunodeficiency virus (HIV) care continuum spans from diagnosis to viral suppression [1]. Thirty-nine million people globally are living with HIV, and 630,000 die from Acquired Immunodeficiency Syndrome (AIDS)-related illnesses annually [2]. The current goal is to end the epidemic by 2030 [3]. Electronic medical record (EMR)-based predictive approaches and proactive interventions are key platforms for achieving this goal [4,5,6]. EMRs are computerized platforms for storing and processing patients’ medical data [7]. They offer rich data for developing risk prediction models that can help identify individuals at high risk along the HIV care continuum [8,9,10].

EMR-based models have been used as clinical decision support (CDS) and have been shown to improve HIV care outcomes [11]. However, several methodological concerns arise during model development and validation [12, 13]. Previous systematic reviews indicated that almost all diagnostic and prognostic risk prediction models were rated at high or unclear risk of bias, mostly because of non-representative samples, improper handling of missing data, and inadequate reporting of model performance [14,15,16,17]. Prediction modeling studies often did not fully address bias related to missing data, which is common in EMR data [10], and external validation was rare [14].

On the other hand, the deployment of EMR-based models has improved HIV care continuum outcomes [18]: increased HIV screening [19,20,21,22], identification of high-risk patients and facilitation of re-linkage to care [23, 24], improved retention in care [25, 26], and improved viral suppression [27]. However, knowledge about their validity and usefulness is limited. Therefore, this systematic review aimed to assess methodological issues and the early-stage clinical impact of EMR-based prediction models in the HIV care continuum.

2 Methods

This systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [28]. We also followed the guidance for conducting a systematic review of prognosis studies [29]. The review protocol was registered on PROSPERO (CRD42023454765).

2.1 Eligibility criteria

Studies were included if they: (1) were original articles published in English between January 1st, 2010 and January 17th, 2022; (2) used structured EMR systems for prediction model development, validation, or updating; (3) aimed at predicting any risk related to HIV care conditions (rather than other diseases); or (4) were deployed as clinical decision-support tools in HIV care settings, which allowed us to assess early-stage clinical impact.

We excluded studies if they: (1) used unstructured EMR data or electronic clinical notes; (2) targeted care for diseases other than HIV; (3) were population-level EMR-based risk prediction models, such as models predicting the incidence of HIV infection; (4) were not original research (e.g., systematic reviews); or (5) had no full text available.

2.2 Literature search

We searched the PubMed database and Google Scholar. In addition, we conducted a backward citation search, checking the references of identified papers, and a forward citation search using Google Scholar to find papers citing the identified papers. We formulated the review question following the guidance of the checklist for critical appraisal and data extraction for systematic reviews of prediction modeling studies (CHARMS) [30]. Key items for the search strategy and study eligibility criteria followed the PICO (population, intervention, comparison, and outcome) guidance used in therapeutic studies [30, 31]. To simplify the search strategy, we organized the formulated PICO question into three concepts: (a) electronic medical records; (b) risk prediction model; and (c) HIV care continuum outcomes. Finally, these three concepts were combined with the “AND” logical operator. The detailed search query used in PubMed is available [see Additional File 1].
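The structure of this strategy (synonym terms OR-ed within each concept, concepts AND-ed together) can be sketched as follows; the synonym terms below are illustrative assumptions, not the actual query, which is given in Additional File 1:

```python
# Illustrative sketch of the three-concept search strategy described
# above. The terms inside each concept block are hypothetical examples.
concepts = {
    "emr": ['"electronic medical record"', '"electronic health record"', "EMR"],
    "model": ['"risk prediction"', '"prediction model"', '"predictive model"'],
    "hiv": ["HIV", '"HIV care continuum"', '"viral suppression"'],
}

def or_block(terms):
    """Join the synonym terms for one concept with OR."""
    return "(" + " OR ".join(terms) + ")"

# Combine the three concept blocks with the AND logical operator.
query = " AND ".join(or_block(t) for t in concepts.values())
print(query)
```

Each additional OR term widens a concept's recall, while the two AND operators keep the result at the intersection of all three concepts.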

2.3 Study selection

After removing duplicates, two independent reviewers (TE and AD) conducted the study selection in three stages. In the first stage, titles and abstracts were screened to identify papers for full-text screening. In the second stage, the selected papers were assessed against the inclusion and exclusion criteria, and disagreements between the two reviewers were resolved by consensus to produce the list of papers for full-text review. In the third stage, we reviewed the full texts for eligibility and listed the final set of articles for data extraction and qualitative analysis.

2.4 Data extraction process

Two authors (TE and AD) independently extracted the data. For articles aimed at model development and validation, we extracted the data into a spreadsheet using the CHARMS checklist [30]. For studies aimed at model development, we extracted information on the first author, year of publication, clinical context of the model, study setting, modeling methods, EMR data source, sample size including number of events, number of predictors, missing data management, outcome predicted, validation type including technique used, and evaluation metrics. Further, we recorded whether the authors evaluated the models with clinical utility metrics such as decision curve analysis (DCA) and net benefit [32]. For articles aimed at model deployment, we extracted information on the first author, year of publication, methods of model presentation to end users, model application, and early-stage clinical impact, independently of the study design chosen [33].

2.5 Risk of bias assessment

We appraised the risk of bias in model development and validation using the “Prediction model Risk Of Bias ASsessment Tool (PROBAST)” [34]. Two investigators (TE and AD) independently assessed the risk of bias (ROB). PROBAST contains 20 signaling questions in four domains: 2 questions on participants, 3 on predictors, 6 on outcomes, and 9 on statistical analysis. Each question was answered as “yes,” “probably yes,” “probably no,” “no,” or “no information.” ROB was rated as low, high, or some concerns. If a domain contained at least one question answered “no” or “probably no,” it was rated as high risk. If all questions in a domain were answered “yes” or “probably yes,” the domain was rated as low risk. When all domains were at low risk, the overall risk of bias was considered low; when at least one domain was at high risk, the overall risk of bias was rated high. If some concern for bias was noted in at least one domain and risk was low in all others, the study was rated as having some concerns. The summary ROB judgments were visualized using a web application for visualizing risk-of-bias assessments built on the robvis R package [35].
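The rating rules above are deterministic, so they can be expressed compactly. The sketch below is our simplified reading of those rules (answers not covered by the high- and low-risk conditions, e.g. “no information,” fall through to “some concerns”):

```python
# Minimal sketch of the PROBAST rating rules as described in the text.
def rate_domain(answers):
    """Rate one domain from its signaling-question answers."""
    if any(a in ("no", "probably no") for a in answers):
        return "high"
    if all(a in ("yes", "probably yes") for a in answers):
        return "low"
    return "some concerns"  # e.g., one or more "no information" answers

def rate_overall(domain_ratings):
    """Combine the four domain ratings into an overall ROB judgment."""
    if any(r == "high" for r in domain_ratings):
        return "high"
    if all(r == "low" for r in domain_ratings):
        return "low"
    return "some concerns"
```

For example, a study whose four domains rate low, low, some concerns, low would be judged overall as having some concerns, while a single high-risk domain makes the overall judgment high risk.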

2.6 Synthesis and analysis

We performed a qualitative synthesis informed by the “Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)” statement [36], the CHARMS checklist [30], and PROBAST [34]. In addition, the reporting guideline for the early-stage clinical evaluation of decision support systems was used for the clinical impact evaluations [33]. Study characteristics were counted and tabulated, and descriptive statistics, including proportions, medians, and interquartile ranges, were used. The findings are presented in tables, figures, and the text.
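The descriptive summaries used in the synthesis (medians and interquartile ranges) can be computed directly from the standard library; the sample sizes below are illustrative values, not data from the included studies:

```python
# Sketch of the median/IQR summaries used in the qualitative synthesis.
# The values in sample_sizes are hypothetical, for illustration only.
from statistics import median, quantiles

sample_sizes = [795, 1200, 2633, 54000, 138806]

# quantiles(..., n=4) returns the three quartile cut points (Q1, Q2, Q3)
# using Python's default exclusive method.
q1, q2, q3 = quantiles(sample_sizes, n=4)
print(f"median = {median(sample_sizes)}, IQR = {q1}-{q3}")
```

Reporting the median with the interquartile range, as done here, is robust to the highly skewed sample sizes typical of EMR-based studies.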

2.7 Patient and public involvement

In this study, patients were not involved in formulating the review questions, defining the outcome measures, or designing and implementing the study. Patients did not provide input on recording or interpreting the results. There are no plans to disseminate the study results to patients or relevant communities.

3 Results

Of the 7760 identified articles, 7684 were excluded after title and abstract screening according to the aforementioned criteria, mostly because their titles indicated modeling outside the HIV care continuum. Upon consensus of the independent reviewers (TE and AD), 76 were considered potentially relevant. These 76 papers were screened in full text, and another 41 were excluded: 13 were unstructured EMR-based modeling studies, 14 were not focused on HIV care, 11 were systematic reviews, and 3 had no full text available. Finally, 35 articles were included for data extraction and qualitative analysis, of which 24 (68.6%) were aimed at model development and 11 (31.4%) at model deployment (Fig. 1). Thirty-one (88.6%) were from high-income countries, of which 26 (74.3%) were from the United States of America, and 4 (11.4%) were from middle-income and low-income countries. In terms of research trends, there has been a gradual increase in the number of studies using EMR-based prediction modeling and deploying it as a clinical decision support tool in primary HIV care. Thirteen (37.1%) of the thirty-five full-text reviewed articles were published between 2014 and 2017, ten (28.6%) between 2018 and 2021, and four (11.4%) between 2010 and 2013 (see Additional File 2).

Fig. 1
figure 1

PRISMA flow chart displays the process of study identification, screening, and inclusion

3.1 Outcomes predicted in the HIV care continuum

Twelve (34.3%) studies predicted individual risk of carrying HIV [19,20,21,22, 37,38,39,40,41,42,43,44]; 9 (25.7%) predicted risk of interrupting HIV care [23,24,25,26, 45,46,47,48,49]; 7 (20.0%) predicted risk of virological failure [27, 50,51,52,53,54,55]; 2 (5.7%) predicted phenotype of HIV-1/2 infection [56, 57]; others predicted patients newly diagnosed with HIV infection [58], clinical status of HIV patients [59], CD4 count change [60], comorbidity burden [61], and risk of readmission and death [62] (Fig. 2).

Fig. 2
figure 2

Outcomes predicted in the HIV care continuum by all the included studies (n = 35)

3.2 Model development

We conducted appraisals for the 24 studies aimed at model development and validation. Of these, 15 (62.5%) were diagnostic models aimed at predicting the current risk of having the outcome of interest [37,38,39,40,41,42,43, 51, 54, 56,57,58,59, 61, 62], and 9 (37.5%) were prognostic models aimed at predicting the future risk of event occurrence in the HIV care continuum [45,46,47,48, 50, 52, 53, 55, 60]. Regarding the participants used in modeling, 14 (58.3%) studies used EMRs from multi-center sites for model development and validation, while 10 (41.7%) used EMR data from a single site. Regarding modeling methods, 11 (45.8%) models were derived using machine learning algorithms, 9 (37.5%) used generalized linear models such as logistic regression and Cox regression, and 5 (20.8%) used chart review-based classification algorithms [Table 1]. The median sample size for model development was 2633 (interquartile range: 795–138,806). Seven studies (29.2%) used a sample size of less than 1000. The median number of predictors was 20 (interquartile range: 12–94), with only 7 studies using 50 or more predictors and 4 studies using fewer than 10 (see Additional File 2).

Table 1 Model development methods and model validation techniques (n = 24)

3.3 Model validation and performance metrics

A vital stage in any predictive modeling process is model validation. The majority of the included modeling studies carried out internal validation: 12 (50%) employed cross-validation, followed by sample splitting (4; 16.7%) and bootstrapping (3; 12.5%). Four studies (16.7%) used more than one validation technique (Table 1). As for model performance measures, 14 (58.3%) used the c-statistic, or area under the receiver operating characteristic curve (AUC), to evaluate model discrimination. Other measures, such as sensitivity and positive predictive value (PPV), were reported in addition to c-statistics: sensitivity in 11 (45.8%) and PPV in 8 (33.3%) of the prediction modeling studies. Six (25%) papers reported model calibration, and one reported an F1 score along with an AUC. Decision curve analysis (DCA), which evaluates clinical value by taking clinical implications into account, was not reported in any of the included studies (see Additional File 2).
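The discrimination metrics named above have simple definitions that can be sketched from first principles; the labels and predicted risks below are hypothetical illustrative data, not values from any included study:

```python
# Minimal sketch of the performance metrics discussed above, computed
# from hypothetical outcome labels and predicted risks.
def c_statistic(y_true, y_score):
    """C-statistic / AUC via the rank (Mann-Whitney) formulation:
    the probability that a random case scores higher than a random
    non-case, with ties counting one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_ppv(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tp / (tp + fp)
```

Note that the c-statistic is threshold-free, whereas sensitivity and PPV depend on the risk cutoff chosen to binarize predictions, which is one reason the reviewed studies report them alongside, not instead of, the AUC.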

3.4 Risk of bias assessment

Figure 3 shows the risk of bias assessment according to PROBAST for the 24 model development and validation studies (excluding the eleven studies aimed at model deployment). Six (25%) were rated at high risk of bias, 14 (58.3%) at some concern for risk of bias, and 4 (16.7%) at low risk of bias. The majority, 20 (83.3%) studies, raised some concern about the risk of bias in the statistical analysis domain (see Additional File 3). For instance, 14 (58.3%) studies did not report the number of events used in model development, 12 (50%) did not consider or report missing data management, 4 (16.7%) did not report validation techniques, and 21 (87.5%) did not conduct external validation. Similarly, 10 (41.7%) studies did not report model discrimination performance, 18 (75%) did not report model calibration, and none measured clinical performance using decision analytics (see Additional File 2).

Fig. 3
figure 3

Summary of risk of bias assessment according to PROBAST for studies aimed at model development and validation (n = 24)

3.5 Early-stage clinical impact

In the second part of this systematic review, we assessed the early-stage clinical impact. Of the 35 included studies, 11 (31.4%) deployed EMR-based prediction tools in the HIV care continuum [19,20,21,22,23,24,25,26,27, 44, 49]. These tools were used as clinical decision support, generating EMR-based alerts when triggered by a predefined risk level or criteria and/or preparing a work list that assigns patients to their risk category. Five (45.5%) studies deployed EMR-based models to identify individuals at high risk of carrying HIV who were eligible for screening [19,20,21,22, 44]. According to the preliminary reports, four of these models increased screening rates in HIV care practices [19,20,21,22], while one did not affect the test rate [44]. Similarly, 5 (45.5%) were deployed to identify HIV patients at high risk of interrupting or dropping out of care [23,24,25,26, 49], and one was deployed to predict HIV patients at high risk of treatment failure [27]. According to the preliminary reports, all of these models lowered the rate of care interruption [23,24,25,26, 49] and improved viral load suppression [27] [Table 2].

Table 2 Early-stage clinical impact assessment for studies aimed at model deployment (n = 11)

4 Discussion

This systematic review showed that within the past decade, at least 35 studies were published on EMR-based modeling in HIV care settings. The majority of these models predicted the individual risk of carrying HIV, the risk of interrupting HIV care, and the risk of virological failure. This finding is consistent with the work of Ridgway and colleagues, who reported that accurate EMR-based models have been developed to predict individual diagnoses of HIV, care attendance, and viral suppression [11]. This could reflect the increased interest in prediction model development and its potential applications as clinical decision support, following the rise of EMRs in healthcare settings [16]. Another explanation is the need for risk stratification: because resources are scarce and a classical, “one-size-fits-all” intervention strategy is clinically inefficient, researchers seek to predict and stratify patients into risk levels and tailor interventions to those who may benefit most [9, 63].

We appraised the 24 studies aimed at model development and identified common methodological pitfalls, mostly the lack of a reported number of events for the outcome of interest and failure to consider or report missing data. In addition, several modeling studies did not report important statistical performance measures, such as discrimination, calibration, and clinical performance. Other systematic reviews have reported similar problems in prediction modeling studies [10, 17, 64]. These methodological issues can limit the generalizability of the prediction models or reduce their performance when applied to new patient populations or other care settings [65]. Although several pitfalls can be encountered during prediction model development and validation, measures of discrimination (e.g., area under the curve) and calibration (e.g., a calibration curve) are key aspects of statistical performance [13, 66]. Furthermore, before the actual use of EMR-based predictive models as clinical decision support, it is important to assess clinical consequences using decision curve analysis and the net benefit [13, 32].
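The net benefit at a given risk threshold weighs true positives against false positives according to the harm-to-benefit ratio implied by that threshold (the Vickers-Elkin formulation underlying decision curve analysis); the counts in the example are hypothetical:

```python
# Sketch of the net-benefit calculation behind decision curve analysis.
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - FP/n * (pt / (1 - pt)) at risk threshold pt.
    The odds term pt/(1-pt) converts false positives into the number of
    true positives they are "worth" at that threshold."""
    return tp / n - fp / n * (threshold / (1 - threshold))

# Hypothetical example: 30 true positives and 20 false positives among
# 100 patients, evaluated at a 20% risk threshold.
print(net_benefit(30, 20, 100, 0.2))
```

Evaluating net benefit across a range of clinically plausible thresholds, and comparing it against the "treat all" and "treat none" strategies, produces the decision curve that none of the included studies reported.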

Another key focus of our systematic review was the early-stage clinical impact of deployed EMR-based prediction models, independently of methodological concerns. The early-stage clinical impacts of these models were increased HIV screening rates, increased linkage to care, a lowered rate of care interruption, and improved engagement in HIV care and treatment. A previous study by Ridgway JP and colleagues also reported that EMR-based clinical decision support tools have been effectively utilized to improve HIV care continuum outcomes [11]. For instance, some of the included studies reported that using an EMR-driven clinical decision support tool increased the monthly average number of HIV screens from 7 to 550 [19] and increased linkage to care from 15 to 100% [21]. Similarly, it lowered loss to follow-up in HIV care by 10% [49], improved clinical appointment attendance by 3.8% [67], and increased the likelihood of achieving viral suppression by 15% [27]. However, reporting of early-stage clinical impact remains inadequate to establish the impact of EMR-based models in HIV care settings. Thus, further investigation may be needed to evaluate the potential and actual clinical impact on improving decisions and patient outcomes [13].

This systematic review has some limitations. Firstly, it includes only peer-reviewed journal articles published in English, which may introduce language and publication bias. Secondly, searching only the PubMed database and Google Scholar might increase publication bias; to minimize this, we searched backward and forward citations in Google Scholar. Thirdly, due to variations in the definitions of the predicted HIV care continuum outcomes and the lack of uniform model performance measurements across studies, which make reasonable comparisons challenging, we were unable to quantitatively assess effects on specific clinical domains or conduct a meta-analysis.

5 Conclusions

EMR-based prediction models have been developed, and some are practically deployed as clinical decision support in the HIV care continuum. The majority of these models predicted the patient’s risk of carrying HIV, the risk of interrupting HIV care, and the risk of treatment or virological failure. However, the methodological assessment shows that most studies were at high risk of bias or raised some concerns: several did not report the number of events or missing data management, reported statistical performance inadequately, and rarely conducted external validation. On the other hand, the early-stage clinical impact findings show that almost all deployed models improved HIV care continuum outcomes.

5.1 Recommendations and policy implications

Firstly, methodological concerns should be taken into account during model development to improve predictive ability. Secondly, models should be externally validated in other care settings or in new patient populations to test their predictive performance. Lastly, although early-stage clinical impact has been observed with EMR-based predictive models in HIV care settings, it is important to assess clinical consequences before widespread implementation.