Improving identification of symptomatic cancer at primary care clinics: A predictive modeling analysis in Botswana

Abstract In resource‐limited settings, augmenting primary care provider (PCP)‐based referrals with data‐derived algorithms could direct scarce resources towards those patients most likely to have a cancer diagnosis and benefit from early treatment. Using data from Botswana, we compared accuracy of predictions of probable cancer using different approaches for identifying symptomatic cancer at primary clinics. We followed cancer suspects until they entered specialized care for cancer treatment (following pathologically confirmed diagnosis), exited from the study following noncancer diagnosis, or died. Routine symptom and demographic data included baseline cancer probability assessed by the primary care provider (low, intermediate, high), age, sex, performance status, baseline cancer probability by study physician, predominant symptom (lump, bleeding, pain or other) and HIV status. Logistic regression with 10‐fold cross‐validation was used to evaluate classification by different sets of predictors: (1) PCPs, (2) Algorithm‐only, (3) External specialist physician review and (4) Primary clinician augmented by algorithm. Classification accuracy was assessed using c‐statistics, sensitivity and specificity. Six hundred and twenty‐three adult cancer suspects with complete data were retained, of whom 166 (27%) were diagnosed with cancer. Models using PCP augmented by algorithm (c‐statistic: 77.2%, 95% CI: 73.4%, 81.0%) and external study physician assessment (77.6%, 95% CI: 73.6%, 81.7%) performed better than algorithm‐only (74.9%, 95% CI: 71.0%, 78.9%) and PCP initial assessment (62.8%, 95% CI: 57.9%, 67.7%) in correctly classifying suspected cancer patients. Sensitivity and specificity statistics from models combining PCP classifications and routine data were comparable to physicians, suggesting that incorporating data‐driven algorithms into referral systems could improve efficiency.

In Africa, progress in the delivery of cancer care remains overshadowed by the persistence of barriers in the referral process from primary care to specialized oncology care. As a consequence, many Africans continue to suffer delays in cancer diagnosis and treatment. The present study assessed the possibility of using data-driven algorithms to augment primary care provider (PCP)-based referrals for suspected cancer cases. Models show that initial PCP assessment augmented by routine data facilitates correct classification of suspected cancer cases. The findings suggest that risk scores derived from data-driven models could improve PCP assessments of probable cancer, particularly in low-resource settings.

| INTRODUCTION
Cancer patients in sub-Saharan Africa experience some of the highest case fatality rates globally. 1 The annual burden of cancer in the region is expected to increase to 1.27 million new cases and 1 million deaths from cancer by 2030. 2 African governments have made important progress in strengthening cancer care delivery through task-shifting, lower medicine costs, novel equipment and diagnostics and investments in radiotherapy and surgical centers. [3][4][5][6][7][8][9][10] Despite these achievements, many Africans still experience geographic and financial barriers that prevent them from accessing timely cancer treatment. [11][12][13] Cancer programs often lack the appropriate infrastructure, medicines, equipment and trained staff needed to deliver high quality cancer care. 9,[14][15][16] Improving the process of referral from primary to specialized oncology care centers through systems-level interventions is critical for ensuring timely access to treatment. 13,14,17,18
Botswana, a middle-income country in sub-Saharan Africa, has a national cancer control program that focuses on integrating cancer diagnosis and treatment services into existing health care programs. [18][19][20] Following successes in providing universal antiretroviral therapy for patients infected with human immunodeficiency virus (HIV), [21][22][23][24][25] Botswana implemented a national cervical cancer screening program starting in 2012. Free treatment is available to all citizens with cancer. 26 However, delays in initiating treatment, sometimes as long as 7 months (IQR: 3.6 to 13.9) from initial presentation with a cancer symptom, 27 contribute to over half of patients being diagnosed with advanced disease. 27 A key barrier to timely treatment is low cancer literacy among health providers, whose training and experience have historically focused on maternal-child health and infectious disease. 19
We hypothesized that data-driven algorithms could improve classification of patients at increased risk of having cancer among those identified by a primary care provider (PCP) as a cancer suspect.
Potentially, diagnostic evaluation in high-risk cancer suspects could be prioritized with improved access to specialist clinicians, scarce services (eg, cross-sectional imaging, invasive biopsies and endoscopy) and expedited pathology review. Augmenting PCP referrals of patients suspected to have cancer ("cancer suspects") with data-driven algorithms could shorten time to treatment initiation and alleviate burden on limited numbers of trained oncologists. 28,29 Given that labor contributes 20% to oncology programmatic costs in African settings, efficiency gains accrued by task-shifting could lead to substantial cost savings. 30 Algorithmic predictions could enable more accurate PCP referrals of high-risk patients for further examination by oncology care providers, and direct noncancer suspects to more appropriate services. Similar approaches have improved targeting of treatment to HIV-positive individuals in African settings. [31][32][33]
Our study aimed to assess the potential effectiveness of data-driven clinical decision-making aids to support PCP referrals of cancer suspects in low-resource settings. We compared classification of cancer suspects for whom ultimate cancer diagnosis was available across models using (1) PCP, (2) algorithm-only, (3) external specialist physician review and (4) PCP augmented by algorithm. Figure 1A.
Male and female cancer suspects presenting to primary care facilities in the rural Kweneng East District (estimated catchment population: 200 000) between April 2016 and June 2020 were enrolled in the study. To be eligible, participants had to be 18 years or older, a resident of Kweneng East, not incarcerated and not currently pregnant. PCPs, many of whom had received specialized training in detection of cancer symptoms, documented cancer suspects in a clinic-based register and communicated these to the study team either by phone or in person. Patients were followed from the index facility visit until cancer diagnosis or exit from the program due to exclusion of cancer, death prior to diagnosis or request to be removed. There were 952 patients with suspected cancer who met eligibility criteria, completed the study and were assigned an exit outcome. We excluded participants who were missing demographic or clinical information at baseline. Cancer probability was assessed independently by study physicians and PCPs. PCPs received standardized training in detection of presenting symptoms of common cancers in this population. 19 Study physicians reviewed suspected cancer cases' demographics and clinical laboratory and pathology assessments to assign cancer probability.

| Measures
Using their clinical judgment and available information (including the PCP's assessment of cancer likelihood), external specialist physicians then assigned their own assessment of probability of cancer: low (noncancer causes of the presenting symptoms are more likely), medium (cancer is among the leading causes of presenting symptoms) and high (cancer is the leading cause of presenting symptoms). The same two physicians (NT and SDP) performed assessments throughout to ensure consistency.
We compared four sets of predictor measures commonly used by PCPs and physicians to stratify suspected cancer cases based on risk.
Different predictor sets represent potential approaches to augment a PCP's decision to prioritize a patient for further cancer evaluation.
Predictor set (1) included the PCP's own initial classification of cancer probability modeled as a categorical variable (low, medium or high).
Predictor set (2) included routinely collected demographic characteristics and presenting symptoms (patient age and sex, presence or absence of mass/lump, ECOG score and HIV status), which we refer to hereafter as "algorithm only." We chose to include these factors because they are, or could reasonably be, routinely collected as part of clinical care.
In the Potlako study, external specialist physicians (Authors: SDP, NMT) also reviewed summarized information from cancer suspects at the time of initial presentation and provided their own assessment to direct patient navigators towards those cancer suspects deemed to have a high risk of cancer. External specialist physicians also assigned probability of cancer as a categorical variable (high, medium and low). While this physician review may not be uniformly available in primary care settings in low-resource contexts, we felt it would represent an "ideal" clinical workflow, and so it was included as predictor set (3).
Finally, in predictor set (4) we chose to assess a model including predictor sets (1) and (2), representing augmentation of the PCP's assessment with routinely collected demographic characteristics and presenting symptoms in the absence of a specialist physician. We refer to predictor set (4) as "PCP augmented by algorithm". The adopted Potlako program is outlined in Figure 1B.
To assess discrimination of our regression models containing each set of predictors, we reported c-statistics and plotted receiver operating characteristic (ROC) curves. Since costs of both false positive and false negative classifications should be considered when selecting a decision-making algorithm, we reported sensitivity, specificity, positive predictive value and negative predictive value.
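As an illustration of the discrimination measure named above, the c-statistic can be read as the proportion of (cancer, noncancer) patient pairs in which the patient with cancer received the higher predicted probability, with ties counted as one half. The following is a minimal sketch under that definition, not the code used in the study; the function name `c_statistic` and the toy outcome and probability lists are hypothetical.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (c) statistic for a binary outcome.

    Counts, over all (case, non-case) pairs, how often the case has the
    higher score; tied scores contribute 0.5. Equivalent to the area
    under the ROC curve.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    noncases = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score, noncase_score in product(cases, noncases):
        if case_score > noncase_score:
            concordant += 1.0
        elif case_score == noncase_score:
            concordant += 0.5
    return concordant / (len(cases) * len(noncases))


# Hypothetical toy data: 1 = cancer diagnosed, 0 = cancer excluded.
y = [1, 1, 0, 0, 0, 1]
p = [0.8, 0.4, 0.3, 0.4, 0.1, 0.6]
auc = c_statistic(y, p)  # 8.5 concordant pairs out of 9
```

The pairwise form is quadratic in sample size, which is acceptable for a cohort of a few hundred patients; a rank-based formula would be preferable for much larger data sets.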
Since the predicted probabilities estimated from logistic regression models provide a continuous rather than binary set of values, we chose three thresholds for classifying a patient as having cancer based on whether their modeled predicted probability was ≥0.3, 0.4 or 0.5. Lower classification thresholds favor higher sensitivity (correct classification of true cancer cases), and higher thresholds favor higher specificity (correct classification of noncancer cases).

| Statistical analysis
As a sensitivity analysis, we compared predictors identified above to those derived from a data-driven covariate selection approach using least absolute shrinkage and selection operator (LASSO). 35 Results from LASSO, which also account for possible collinearity between predictors, did not differ from those obtained from our main analysis and so we report results from our main models.
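The threshold-based classification described above can be sketched as follows. This is an illustrative example rather than the study's analysis code: the function name `classification_metrics` and the toy outcomes and predicted probabilities are hypothetical, and only the three thresholds (0.3, 0.4, 0.5) come from the text.

```python
def classification_metrics(y_true, y_prob, threshold):
    """Classify as cancer when predicted probability >= threshold and
    return sensitivity, specificity, PPV and NPV."""
    tp = fp = tn = fn = 0
    for outcome, prob in zip(y_true, y_prob):
        predicted_cancer = prob >= threshold
        if predicted_cancer and outcome == 1:
            tp += 1
        elif predicted_cancer and outcome == 0:
            fp += 1
        elif not predicted_cancer and outcome == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data: 1 = cancer diagnosed, 0 = cancer excluded.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.55, 0.42, 0.31, 0.45, 0.35, 0.20, 0.10, 0.05]

results = {t: classification_metrics(y_true, y_prob, t) for t in (0.3, 0.4, 0.5)}
for threshold, metrics in results.items():
    print(threshold, metrics)
```

In this toy data, raising the threshold from 0.3 to 0.5 lowers sensitivity while raising specificity, which is the trade-off the thresholds are meant to expose.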
We then estimated the total number of participants who would be classified as having low, moderate or high probability under each predictor set to determine how many total patients would be impacted by adopting the data-driven algorithm. To generate low, moderate and high probabilities from our data-driven models, we generated tertiles based on the distribution of predicted probabilities for predictor sets (2)

| Classification of cancer suspects by predictor sets
Results of classifications under the predictive probability thresholds and different predictor sets revealed optimal trade-offs between sensitivity and specificity when using a 0.4 threshold (Table 2). Predictor set (3) yielded the highest negative predictive value (83%), followed by predictor set (4) (82%), predictor set (2) (81%) and predictor set (1) (78%).
Comparisons of classification indicators across other cut points (30%, 50%) demonstrate that sensitivity is optimized at lower thresholds, whereas specificity is optimized at higher thresholds. Losses in sensitivity with increasing thresholds occur more quickly than gains for other indicators.
ROC curves showed similar discrimination for predictor sets (3) and (4) (Figure 2). The area under the curve (AUC) for predictor set (

| DISCUSSION
In this study comparing different sets of predictors to estimate probability of cancer among cancer suspects in Botswana, we found that predictions derived from models with PCP assessment augmented by routine data improved correct classification relative to the PCP's initial assessment alone. Predictions from models relying on PCP assessment augmented by routine data led to discrimination and sensitivity metrics comparable to those from models relying on the physician's initial assessment. In general, combining information from predictor sets yielded improved classification of noncancer cases, reflected by higher specificity and negative predictive values. These findings imply that PCPs could improve their assessments of probable cancer by using predictions from data-driven algorithms that incorporate routine demographic and symptom data.
Empowering PCPs to conduct these assessments would alleviate burden on physicians, who could then focus on patients most likely to have cancer.

| CONCLUSION
We found that data-driven algorithms relying on the PCP's initial assessment augmented by routine data led to improvements in classifying cancer suspects who had cancer in a peri-urban population of

writing-review and editing. Siamisang Balosang: writing-review and editing, data curation. Neo M. Tapela: supervision, project administration, writing-review and editing. Scott L. Dryden-Peterson: conceptualization, supervision, project administration, writing-review and editing. The work reported in the paper has been performed by the authors, unless clearly specified in the text.