Generalizability of PGS313 for breast cancer risk in a Los Angeles biobank

Summary
Polygenic scores (PGSs) summarize the combined effect of common risk variants and are associated with breast cancer risk in patients without identifiable monogenic risk factors. One of the most well-validated PGSs in breast cancer to date is PGS313, which was developed from a Northern European biobank but has shown attenuated performance in non-European ancestries. We further investigate the generalizability of PGS313 for American women of European (EA), African (AFR), Asian (EAA), and Latinx (HL) ancestry within one institution with a single electronic health record (EHR) system, genotyping platform, and quality control process. We found that PGS313 achieved overlapping areas under the receiver operating characteristic (ROC) curve (AUCs) in females of HL (AUC = 0.68, 95% confidence interval [CI] = 0.65–0.71) and EA ancestry (AUC = 0.70, 95% CI = 0.69–0.71) but lower AUCs for the AFR and EAA populations (AFR: AUC = 0.61, 95% CI = 0.56–0.65; EAA: AUC = 0.64, 95% CI = 0.60–0.68). While PGS313 is associated with hormone-receptor-positive (HR+) disease in EA Americans (odds ratio [OR] = 1.42, 95% CI = 1.16–1.64), this association is lost in African, Latinx, and Asian Americans. In summary, we found that PGS313 was significantly associated with breast cancer but with attenuated accuracy in women of AFR and EAA descent within a single health system in Los Angeles. Our work further highlights the need for additional validation in diverse cohorts prior to the clinical implementation of PGSs.


Supplemental List
To evaluate the impact of PGS313 on overall survival (OS), we compared survival times by Kaplan-Meier analysis for European patients above (N = 280) and below (N = 651) the 70th percentile of the PGS. We found no difference in survival time between the two groups by log-rank test (p-value = 0.38). There was also no difference in survival time when comparing patients above and below the 50th percentile, or when comparing patients above the 90th percentile with those below the 10th percentile.
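The two-group comparison described above can be illustrated with a minimal log-rank test. This is a sketch using only the Python standard library, not the analysis code used in the study; the function name `logrank_test` and its argument layout are assumptions made for illustration.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    times_*  : follow-up time for each patient in the group
    events_* : 1 if death was observed at that time, 0 if censored
    """
    # Distinct times at which at least one event was observed in either group.
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    stat = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

In this sketch, group A would hold patients above the chosen PGS percentile and group B those below it; a large p-value, such as the 0.38 reported above, indicates no detectable difference in survival between the groups.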

Patients
For European patients, we initially found by Cox proportional hazards regression that the normalized PGS313 was inversely predictive of OS, suggesting that a lower PGS313 score translates to longer survival time (hazard ratio [HR] = 0.80; 95% CI, 0.64-0.99; p-value = 0.04).
However, after adjusting for other variables, namely whether a patient had received chemotherapy, cancer subtype (HR+ and/or HER2+), and age at diagnosis, the normalized PGS313 score was no longer predictive of OS (p-value = 0.06).

Figure S4: Genetic Admixture across GIAs
We found that genetic admixture was present to varying degrees. The EAA population had the least overlap with the EA population, whereas the HL population had the most overlap with the EA population. These results suggest that genetic admixture may explain the overlapping performance of the PGS in our HL and EA cohorts.

Subtyping
We first queried the EHR for the 50 most commonly prescribed cancer-related medications in our cohort. These were manually reviewed and grouped based on class and subtype relevance. Supplemental Table 1 shows our grouped list of HR+ relevant medications and Supplemental Table 2 shows our grouped list of HER2+ relevant medications. Patients were subtyped as HR+ and/or HER2+ if they had been ordered at least one relevant medication from the corresponding list in the EHR.
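The medication-based subtyping rule can be sketched as a simple set lookup. The medication names below are hypothetical placeholders rather than the contents of Supplemental Tables 1 and 2, and the sketch assumes that each subtype is assigned independently whenever at least one medication from its list was ordered.

```python
# Hypothetical example lists; the lists actually used are given in
# Supplemental Tables 1 and 2 and may differ from these placeholders.
HR_POSITIVE_MEDS = {"tamoxifen", "letrozole", "anastrozole"}
HER2_POSITIVE_MEDS = {"trastuzumab", "pertuzumab"}


def subtype_patient(ordered_medications):
    """Flag a patient as HR+ and/or HER2+ from their ordered medications."""
    # Normalize names so that case and stray whitespace do not block a match.
    ordered = {name.strip().lower() for name in ordered_medications}
    return {
        "HR+": bool(ordered & HR_POSITIVE_MEDS),
        "HER2+": bool(ordered & HER2_POSITIVE_MEDS),
    }
```

A patient with orders for both a hormone therapy and a HER2-targeted therapy is flagged for both subtypes, which matches the "and/or" wording of the rule.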

Genetic Admixture
Our methodology for the genetic admixture analysis is described in our prior publication (ref. 1). We used k = 4 subpopulations, representing the number of genetically inferred ancestries (GIAs) in our study.

Ancestries (GIA)
Univariate logistic regression was used to evaluate the association between PGS313 and observed rates of all breast cancers in American females of African (AA), European

Downsampling experiment
To confirm that differences in ORs were not due to sample size imbalance across the GIAs, we conducted an ensemble downsampling of all groups. Each GIA was downsampled into 500 batches of 124 randomly selected cases and 124 randomly selected controls; 124 is the number of cases in the AA cohort, which had the fewest cases of any GIA. We then calculated the average OR and 95% CI across all 500 batches (see Batched OR and Batched CI) and compared these with the Raw OR for each GIA.
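The downsampling procedure can be sketched as follows. This is an illustrative reconstruction, not the study's code: the function names are hypothetical, the OR is estimated by a univariate logistic regression fitted with Newton-Raphson, and each batch is assumed to yield a Wald 95% CI whose bounds are then averaged across batches. The sketch also assumes the cases and controls overlap in score, since the fit does not converge on perfectly separable data.

```python
import math
import random


def fit_odds_ratio(scores, labels, iterations=25):
    """Univariate logistic regression (intercept + slope) by Newton-Raphson.

    Returns (OR per unit of score, lower bound, upper bound) with a Wald 95% CI.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # Fisher information entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Slope variance from the information matrix of the final iteration.
    se = math.sqrt(h00 / det)
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


def batched_odds_ratio(case_scores, control_scores,
                       n_per_group=124, n_batches=500, seed=0):
    """Average the OR and CI bounds over balanced random subsamples."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_batches):
        cases = rng.sample(case_scores, n_per_group)
        controls = rng.sample(control_scores, n_per_group)
        labels = [1] * n_per_group + [0] * n_per_group
        results.append(fit_odds_ratio(cases + controls, labels))
    # Column-wise means: (batched OR, batched CI lower, batched CI upper).
    return tuple(sum(column) / n_batches for column in zip(*results))
```

Fixing every GIA to the same 124 cases and 124 controls per batch means that any remaining difference in the batched ORs cannot be attributed to unequal cohort sizes.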

Figure S1: Kaplan-Meier Curve for Overall Survival in European Patients with or without

Figure S2: Kaplan-Meier Curve for Overall Survival in European Patients above or

Table S1: Medications used for identifying HR+ breast cancer cases

Table S2: Medications used for identifying HER2+ breast cancer cases

Table S3: Association between PGS and Breast Cancer Risk in GIAs

Table S5: Downsampling Experiment
To evaluate whether differences in ORs were due to sample size imbalance across the GIAs, we conducted an ensemble downsampling of all groups in 500 batches. The averaged OR and 95% CI (see Batched OR and Batched CI) overlapped with the Raw OR for each GIA, as determined by overlapping 95% confidence intervals, suggesting that the differences in OR observed in our AA cohort, relative to our EA cohort, are less likely to be due to differences in sample size. However, for the EA cohort, the batched 95% CI was wider than that observed for the raw data, suggesting that part of the wider 95% CI for the AA cohort may be due to sample size, as expected.

Univariate logistic regression was used to evaluate the association between PGS313 and observed rates of all breast cancers in American females of African (AA), European (EA), East Asian American (EAA), and Hispanic (HL) genetically inferred ancestry. As with prior studies testing the generalizability of PGS313, we normalized the raw PGS of non-European GIAs using the mean and standard deviation of European samples. In comparison, Mavaddat et al. reported an OR of 1.61 (95% CI 1.57-1.65), which overlaps with all GIAs. Logistic regression was performed to predict breast cancer status using the normalized PGS, age at diagnosis, and PCs 1-9 as covariates, consistent with Mavaddat et al. AUCs were calculated for each of the four GIAs separately.
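The normalization and per-GIA AUC steps can be sketched as below. This is a simplified illustration rather than the study's pipeline: the function names are hypothetical, the sample standard deviation is an assumption (the text does not say whether the sample or population form was used), and the AUC here is computed on the score alone, whereas the reported AUCs come from a logistic model that also includes age at diagnosis and PCs 1-9.

```python
import statistics


def normalize_to_reference(raw_scores, reference_scores):
    """Z-score raw PGS values using the mean and SD of a reference group.

    In the setting described above, reference_scores would be the raw PGS
    values of the European samples.
    """
    mean = statistics.fmean(reference_scores)
    sd = statistics.stdev(reference_scores)  # sample SD (assumption)
    return [(score - mean) / sd for score in raw_scores]


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case has a higher score
    than a randomly chosen control, counting ties as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Because the AUC depends only on the ranking of scores within a group, the European-based normalization leaves each GIA's score-only AUC unchanged; it matters for the ORs, which are expressed per standard deviation of the European distribution.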