Using Polygenic Risk Scores to Aid Diagnosis of Patients With Early Inflammatory Arthritis: Results From the Norfolk Arthritis Register

There is growing evidence that genetic data can benefit the rheumatology outpatient setting by aiding early diagnosis. A genetic probability tool (G-PROB) has been developed to aid diagnosis but has not yet been tested in a real-world setting. Our aim was to assess whether G-PROB could aid diagnosis in the rheumatology outpatient setting using data from the Norfolk Arthritis Register (NOAR), a prospective observational cohort of patients presenting with early inflammatory arthritis.

The views expressed are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, or the UK Department of Health.
Supported by Arthritis Research UK (core program grant 20385) and by the NIHR Manchester Biomedical Research Centre. Supported by the Innovative Medicines Initiative 2 Joint Undertaking (grant agreement 101007757, HIPPOCRATES). The Innovative Medicines Initiative 2 receives support from the European Union's Horizon 2020 research and innovation program and the European Federation of Pharmaceutical Industries and Associations. Drs. Hum and Sharma's work was supported by the NIHR as holders of Academic Clinical Fellowships funded by the Integrated Academic Training program. Dr. Viatte was supported by the Swiss Foundation for Medical-Biological Scholarships, managed by the Swiss National Science Foundation (research grant PASMP3_134380).

INTRODUCTION
Given the substantial and continuing investment in the study of human genetics and genomics, 1 it is important that we fully harness the potential value of genetic data in the clinical setting. With initiatives such as the "UK Newborn Genomes Programme," which will sequence the whole genomes of up to 200,000 newborn babies, 2 and the popularity of direct-to-consumer genotyping companies, 3,4 an ever-increasing proportion of the population will be genotyped, meaning that these data will be increasingly available at no added cost. Coupled with the falling cost of whole-genome genotyping, it is now feasible to have genetic data available for patients during clinic appointments. 3,4 Misdiagnosis of patients first presenting to rheumatology clinic with suspected early inflammatory arthritis can delay treatment and increase the risk of irreversible disability, comorbidity, and death. 5,6 The majority of patients seen in rheumatology outpatient clinics with early inflammatory arthritis will be diagnosed with rheumatoid arthritis (RA), 7 psoriatic arthritis (PsA), 8 systemic lupus erythematosus (SLE), 9 ankylosing spondylitis (SpA), 10 or gout. 11 [...] 19-21 Polygenic risk scores (PRSs) are an estimate of an individual's genetic predisposition to a disease or trait and are typically calculated as the sum of risk alleles carried by the individual, each weighted by the effect estimate of that allele. 22 Although genetic prediction has limited value at the population level given the low prevalence of rheumatic diseases, in a rheumatology clinic all patients have symptoms and therefore an increased pretest probability of disease, which may make probabilistic predictions based on genetics more effective. 23,24 Based on this premise, existing knowledge of genetic risk loci was used to develop a diagnostic aid, the genetic probability tool (G-PROB). 25
G-PROB converts PRSs derived from known risk variants into easy-to-interpret conditional probabilities (G-probabilities) for multiple diseases, assuming that one of the diseases is present. 25 In principle, patients would be genotyped before their first rheumatology clinic visit, and the clinician would be provided with G-probabilities adding up to 100%, indicating which diagnoses are most likely based on the presence of known disease-associated variants. These G-probabilities would then be interpreted, with low G-probabilities (eg, <5% or <20%) suggesting unlikely diagnoses and high G-probabilities (eg, >20% or >50%) suggesting likely diagnoses. This information could complement existing investigations and tools, such as clinical history and physical examination, to aid and improve the accuracy of the first provisional diagnosis.
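To make the PRS and G-probability definitions above concrete, the following is a minimal Python sketch. It assumes that each disease's score is the sum of risk-allele dosages weighted by ln(OR), and that scores are turned into G-probabilities with a softmax over diseases using one intercept per disease; this is a simplified reading for illustration, not the published G-PROB code (the original study describes the exact method). The function names, variants, odds ratios, and intercepts are hypothetical.

```python
import math


def polygenic_risk_scores(dosages, log_odds_ratios):
    """Per-disease PRS: sum over variants of risk-allele dosage x ln(OR).

    dosages: risk-allele counts per variant (0, 1 or 2; imputed dosages may be fractional)
    log_odds_ratios: dict mapping disease -> list of ln(OR), aligned with `dosages`
    """
    return {
        disease: sum(x * beta for x, beta in zip(dosages, betas))
        for disease, betas in log_odds_ratios.items()
    }


def g_probabilities(prs, intercepts):
    """Softmax over diseases of (intercept + PRS), so the outputs sum to 1."""
    scores = {d: intercepts[d] + s for d, s in prs.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {d: math.exp(v - top) for d, v in scores.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


# Hypothetical example: two variants, two disease categories.
log_or = {
    "RA": [math.log(1.5), 0.0],  # variant 2 has OR 1.0 for RA (no association)
    "Other": [0.0, 0.0],         # "Other disease": every OR fixed at 1.0
}
prs = polygenic_risk_scores([2, 1], log_or)
probs = g_probabilities(prs, {"RA": math.log(0.5), "Other": math.log(0.5)})
print({d: round(p, 3) for d, p in probs.items()})  # {'RA': 0.692, 'Other': 0.308}
```

Because every OR for "Other disease" is 1.0, its score is always zero and its probability is driven only by its intercept; carrying RA risk alleles shifts probability toward RA. In practice the intercepts would be tuned to the assumed clinic prevalence rather than set by hand.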
The authors who developed G-PROB tested the tool in several settings. They first applied G-PROB to simulated cohorts of varying size and disease prevalence and finally to a constructed cohort of 243 patients from a biobank who had presented to a rheumatology outpatient clinic with suspected synovitis, using classification criteria to define cases. They concluded that G-probabilities created by G-PROB were well calibrated with diagnosis based on classification criteria, with high negative predictive value (NPV) and modest positive predictive value (PPV). 25 However, G-PROB has not been tested on a real-world cohort of patients from a rheumatology outpatient clinic with real-world disease prevalence, nor has its performance against clinician diagnosis been assessed. Before conducting prospective studies of the G-PROB tool, it is necessary to assess its performance in its intended setting by comparing it with clinician diagnosis in patients presenting to a rheumatology outpatient clinic with early inflammatory arthritis. The aim of our study was to assess G-PROB's ability to aid clinician diagnosis of patients who present with early inflammatory arthritis, using prospective data from a large observational cohort in which the clinical heterogeneity and disease prevalence reflect those seen in a typical rheumatology clinic.

METHODS
The Norfolk Arthritis Register. The Norfolk Arthritis Register (NOAR) is a primary care-based inception cohort of patients with early inflammatory arthritis, defined as a minimum of two swollen joints for a period of at least 4 weeks. 26
Clinical diagnosis. Case note review was undertaken for all patients in the NOAR cohort, and the clinical diagnosis made by a consultant rheumatologist at the last available follow-up was recorded. Patients with a documented clinical diagnosis were included in the analysis. Patients with missing case notes were excluded (Figure 1).
The G-PROB tool and calculating G-probabilities. The full method by which G-PROB operates can be found in the original study. 25 Briefly, G-PROB generates G-probabilities, each corresponding to a disease probability based on a weighted genetic risk score. This risk score considers the presence of known genetic variants associated with disease susceptibility and the estimated disease prevalence, and it assumes that the patient has one of the diseases in the model. In theory, G-PROB can be used to discriminate any set of diseases with known genetic risk variants. G-PROB requires three inputs to generate G-probabilities: 1) estimated disease prevalence, 2) odds ratios (ORs) of susceptibility for known genetic variants, and 3) genotypes, including the presence of known genetic variants.
Genotyping and imputation. All samples with sufficient DNA available were genotyped using the Illumina Infinium CoreExome genotyping array. This was performed in accordance with the manufacturer's instructions, with genotype calling performed using the GenCall algorithm in the GenomeStudio Data Analysis software platform (Genotyping Module version 1.8.4). Non-HLA imputation was performed using the Michigan Imputation Server, with phasing performed by SHAPEIT2 and imputation performed against the Haplotype Reference Consortium panel. After imputation, single-nucleotide polymorphisms (SNPs) with a minor allele frequency of <0.01 or an imputation accuracy of r² < 0.5 were excluded.
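The post-imputation filter described above amounts to a simple predicate. The sketch below assumes each imputed SNP is summarized by its alternate-allele frequency and its imputation r²; the `ImputedSNP` record, its field names, and the SNP identifiers are hypothetical and do not reflect the Michigan Imputation Server's output format. A SNP is kept only if both thresholds are met.

```python
from dataclasses import dataclass


@dataclass
class ImputedSNP:
    snp_id: str
    alt_allele_freq: float  # alternate-allele frequency in the cohort
    imputation_r2: float    # estimated imputation accuracy (r^2)


def passes_qc(snp: ImputedSNP, min_maf: float = 0.01, min_r2: float = 0.5) -> bool:
    """Keep SNPs with minor allele frequency >= 0.01 and imputation r^2 >= 0.5."""
    maf = min(snp.alt_allele_freq, 1.0 - snp.alt_allele_freq)  # fold to the minor allele
    return maf >= min_maf and snp.imputation_r2 >= min_r2


snps = [
    ImputedSNP("snp_1", 0.30, 0.92),   # kept
    ImputedSNP("snp_2", 0.995, 0.95),  # dropped: MAF = 0.005
    ImputedSNP("snp_3", 0.20, 0.41),   # dropped: r^2 < 0.5
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)  # ['snp_1']
```

Folding the frequency to the minor allele matters because an alternate allele with frequency 0.995 is just as rare as one with frequency 0.005.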

HLA-DRB1 typing. A semiautomated, reverse dot-blot method was used for HLA typing. 27,28 For HLA-DRB1, amino acids at positions 11, 71, and 74 were determined, and a four-digit HLA type corresponding to the full and unambiguous amino acid sequence of the HLA protein was assigned. 28 The estimated association with RA for each of the 16 possible HLA-DRB1 haplotypes has been described previously (Supplementary Table 1). 18,29
Disease-associated genetic variants. [...] 19-21,25 Disease-specific ORs for each variant were determined for each of the diseases except gout; a variant that contributes to susceptibility to several diseases has a separate OR for each, and an OR of 1.0 denotes a variant that is not associated with a particular disease (Supplementary Table 2). 25 For gout, known genetic variants associated with serum urate concentrations were used to calculate ORs for susceptibility to gout as described previously. 21,25 A total of 208 SNPs outside the HLA region and 42 HLA variants were included in the G-PROB settings for this analysis (Supplementary Table 2).
"Other disease" category. To account for patients with suspected inflammatory arthritis who have a condition other than the five most common rheumatologic diseases listed above, a sixth category entitled "Other disease" was devised by the creators of the G-PROB tool. 25 The "Other disease" category includes the proportion of patients presenting to rheumatology clinic who do not have one of these five conditions but have a variety of other conditions, including chronic pain syndromes (such as fibromyalgia) and simple musculoskeletal injuries, for which few genetic risk variants are known. The ORs for the genetic risk scores of the "Other disease" group were set to 1.0 to reflect that no data on the genetic risk profile are available for these patients.
Prevalence settings. G-PROB combines the weighted genetic risk score of each disease with an intercept to ensure that the mean probability is equal to the predefined disease prevalence. To enable comparison with the initial study that developed and tested G-PROB, the same estimates of outpatient clinic disease prevalence were used (Supplementary Table 3). 25,30,31 The disease prevalence within the NOAR cohort was also calculated, and G-PROB was additionally tested using these prevalence settings for comparison (Supplementary Table 3).
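One way to obtain intercepts that make each disease's mean G-probability equal its assumed prevalence is a fixed-point iteration: repeatedly shift each intercept by ln(target prevalence / current mean G-probability) and recompute the per-patient softmax. This alternates column scaling with row normalization (a Sinkhorn-style procedure). It is a sketch under our own assumptions, not necessarily how G-PROB solves for its intercepts; the function names and the toy scores and prevalences are hypothetical, and the prevalences must sum to 1 because each patient's G-probabilities sum to 1.

```python
import math


def softmax_rows(scores, intercepts):
    """Per-patient G-probabilities: softmax over diseases of intercept + score."""
    probs = []
    for row in scores:
        vals = [a + s for a, s in zip(intercepts, row)]
        top = max(vals)
        w = [math.exp(v - top) for v in vals]
        total = sum(w)
        probs.append([x / total for x in w])
    return probs


def mean_probs(probs):
    """Mean G-probability of each disease across patients."""
    n = len(probs)
    return [sum(row[d] for row in probs) / n for d in range(len(probs[0]))]


def calibrate_intercepts(scores, prevalence, tol=1e-12, max_iter=10_000):
    """Adjust intercepts until each disease's mean G-probability equals its prevalence."""
    intercepts = [math.log(p) for p in prevalence]  # starting guess
    for _ in range(max_iter):
        means = mean_probs(softmax_rows(scores, intercepts))
        if max(abs(m - p) for m, p in zip(means, prevalence)) < tol:
            break
        intercepts = [a + math.log(p / m) for a, p, m in zip(intercepts, prevalence, means)]
    return intercepts


# Hypothetical PRSs for 3 patients x 3 diseases (last column: "Other", score 0).
scores = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.1, 0.0]]
prevalence = [0.5, 0.3, 0.2]
alpha = calibrate_intercepts(scores, prevalence)
print([round(m, 6) for m in mean_probs(softmax_rows(scores, alpha))])  # [0.5, 0.3, 0.2]
```

This also shows why the prevalence setting matters: the same genotypes yield different G-probabilities when calibrated to a different clinic population.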
Statistical analysis. All analyses were performed in R version 4.2. 32 Six G-probabilities were generated for each patient, with one G-probability corresponding to the clinician diagnosis (denoted as a match) and the other five not (denoted as nonmatches). All G-probabilities for all patients were combined into one vector, and a corresponding binary vector indicating clinician diagnosis match or nonmatch was created, so that one in six G-probabilities was "matched" and five in six were "nonmatched." G-PROB performance was assessed in four ways, as described previously. 25 First, density plots were created comparing the distribution of matched G-probabilities with that of nonmatched G-probabilities; good performance would be indicated by matched G-probabilities being higher, with a distribution shifted to the right relative to nonmatched G-probabilities. Mean matched and nonmatched G-probabilities were compared using an independent samples t-test. Second, calibration with clinician diagnosis match was assessed by performing a linear regression without intercept, with G-probabilities as the independent variable and binary disease match as the dependent variable. A regression coefficient (β) was estimated, for which ideal calibration would give β = 1.0. 33 Third, the ability of G-probabilities to correctly classify clinician diagnosis was assessed using the area under the receiver operating characteristic curve (ROC-AUC) for multiclass classification, in which higher AUCs (from 0.5 to 1.0) indicate better classification. 34 As described previously, AUCs were determined using the two-vector summary of G-probabilities and clinician diagnosis match/nonmatch (known as the micro-AUC); macro-AUCs were also determined for comparison. 25,34,35 Fourth, the performance of G-probabilities in suggesting likely and unlikely diagnoses was assessed by determining the NPVs and PPVs at several G-probability thresholds.
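The flattening, calibration, and micro-AUC steps can be sketched in Python as follows. The authors worked in R, so this is an illustrative re-expression rather than their code; the function names and toy data are hypothetical. The calibration slope is the closed-form least-squares estimate for regression through the origin, and the micro-AUC is computed from the pooled match/nonmatch vectors using the Mann-Whitney formulation of the AUC.

```python
def flatten(g_probs, diagnoses, diseases):
    """One entry per (patient, disease): the G-probability, and 1 if that disease
    is the clinician diagnosis (match) or 0 otherwise (nonmatch)."""
    x, y = [], []
    for probs, dx in zip(g_probs, diagnoses):
        for d in diseases:
            x.append(probs[d])
            y.append(1 if d == dx else 0)
    return x, y


def calibration_slope(x, y):
    """Least-squares slope of y on x with the intercept fixed at zero: sum(xy) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def micro_auc(x, y):
    """Pooled AUC: probability that a random matched G-probability exceeds a random
    nonmatched one, counting ties as 1/2 (Mann-Whitney form; O(n_pos * n_neg))."""
    pos = [a for a, b in zip(x, y) if b == 1]
    neg = [a for a, b in zip(x, y) if b == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 patients, 3 diseases.
diseases = ["RA", "PsA", "Other"]
g_probs = [
    {"RA": 0.7, "PsA": 0.2, "Other": 0.1},
    {"RA": 0.3, "PsA": 0.5, "Other": 0.2},
]
diagnoses = ["RA", "Other"]
x, y = flatten(g_probs, diagnoses, diseases)
print(round(calibration_slope(x, y), 3), micro_auc(x, y))  # 0.978 0.6875
```

The pairwise comparison in `micro_auc` is quadratic in the number of values; for a vector of several thousand G-probabilities a rank-based formula computes the same quantity faster.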
G-probabilities can range from 0% to 100%; if all six G-probabilities for a patient were 16.7% (one-sixth), all diseases would be equally probable. Therefore, to further assess the discriminative ability at different G-probability thresholds, the arbitrary cutoffs defined in the study in which G-PROB was developed were assessed: <5% and <20% for NPV and >20% and >50% for PPV. 25 The number of patients for whom at least one, two, or three diseases could be considered less likely based on a G-probability threshold of <5% was determined, as was the number of patients for whom the highest G-probability corresponded to the clinician diagnosis. Finally, scatterplots were created to investigate the PPV and NPV at different G-probability thresholds.
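The threshold-based summaries above can be sketched as follows, assuming that NPV is the proportion of below-threshold G-probabilities that are nonmatches and PPV is the proportion of above-threshold G-probabilities that are matches. The function names and the three-patient toy data are hypothetical.

```python
def predictive_values(x, y, low=0.05, high=0.50):
    """NPV among G-probabilities below `low`; PPV among those above `high`."""
    below = [b for a, b in zip(x, y) if a < low]
    above = [b for a, b in zip(x, y) if a > high]
    npv = below.count(0) / len(below) if below else float("nan")
    ppv = above.count(1) / len(above) if above else float("nan")
    return npv, ppv


def patients_with_unlikely(g_probs, low=0.05, ks=(1, 2, 3)):
    """Number of patients with at least k diseases whose G-probability is below `low`."""
    n_low = [sum(1 for p in probs.values() if p < low) for probs in g_probs]
    return {k: sum(1 for n in n_low if n >= k) for k in ks}


def top_matches(g_probs, diagnoses):
    """Number of patients whose highest G-probability is the clinician diagnosis."""
    return sum(1 for probs, dx in zip(g_probs, diagnoses) if max(probs, key=probs.get) == dx)


# Hypothetical toy data: 3 patients, 3 diseases.
g_probs = [
    {"RA": 0.90, "PsA": 0.04, "Other": 0.06},
    {"RA": 0.03, "PsA": 0.02, "Other": 0.95},
    {"RA": 0.60, "PsA": 0.38, "Other": 0.02},
]
diagnoses = ["RA", "Other", "PsA"]
x = [p for probs in g_probs for p in probs.values()]
y = [int(d == dx) for probs, dx in zip(g_probs, diagnoses) for d in probs]
print(predictive_values(x, y))          # (1.0, 0.6666666666666666)
print(patients_with_unlikely(g_probs))  # {1: 3, 2: 1, 3: 0}
print(top_matches(g_probs, diagnoses))  # 2
```

Sweeping `low` and `high` over a grid of values reproduces the kind of NPV/PPV-versus-threshold curve described for the scatterplots.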
Patient and public involvement. Patients and the public were not specifically involved in this study. Deidentified data are available from the corresponding author upon reasonable request, subject to ethical approval.

RESULTS
Patient cohort characteristics and case assignment. From NOAR, 1,047 patients with genotype data and a clinician diagnosis following case note review were included in the study (Figure 1). Clinical characteristics of patients were summarized according to clinician diagnosis, as well as for patients with and without retrievable clinical case notes (Supplementary Table 4).
G-probabilities by disease and distribution for correct and incorrect diagnoses. Six G-probabilities were created for each patient, resulting in 6,282 G-probabilities, of which 1,047 corresponded to the clinician diagnosis and 5,235 did not (Table 1). The mean of G-probabilities matching the clinician diagnosis was significantly higher than that of G-probabilities not matching the clinician diagnosis (39.1% vs 12.2%, P < 2.2 × 10⁻¹⁶). Furthermore, the distribution of G-probabilities matching the clinician diagnosis was clearly shifted to the right, toward higher probabilities, compared with that of G-probabilities not matching the clinician diagnosis (Figure 2). Distributions of G-probabilities for each disease group were also determined (Supplementary Figure 1).
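Because each patient's six G-probabilities sum to 100%, the mean matched and mean nonmatched G-probabilities are linked. The following arithmetic, using the counts above (1,047 matched and 5,235 nonmatched values), shows that a matched mean of 39.1% implies a nonmatched mean of about 12.2%.

```latex
\begin{aligned}
1047\,\bar{g}_{\mathrm{match}} + 5235\,\bar{g}_{\mathrm{nonmatch}} &= 1047 \times 100\% \\
\Rightarrow\quad \bar{g}_{\mathrm{nonmatch}} &= \frac{100\% - \bar{g}_{\mathrm{match}}}{5}
  = \frac{100\% - 39.1\%}{5} \approx 12.2\%
\end{aligned}
```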

Concordance of G-probabilities with clinician-defined disease. Estimation of the regression line between G-probabilities and clinician diagnosis match using linear regression, constraining the intercept to zero, demonstrated that the magnitude of G-probabilities was well calibrated with clinician diagnosis match status (β = 1.047, where β = 1.00 would indicate perfect calibration). It also showed that G-PROB is well calibrated for its intended use, that is, to produce G-probabilities that increase as the clinician diagnosis becomes more likely (Figure 3).
ROC analysis. Micro-AUC analysis found that the ability of G-PROB to discriminate among different diseases was high, with an AUC of 0.849 (95% confidence interval 0.835-0.863) (Figure 4). ROC analyses at the individual disease level and macro-AUC analysis showed that G-probabilities had low to moderate discriminative capabilities (Supplementary Figure 2).
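The difference between the pooled micro-AUC and the per-disease macro-AUC can be illustrated with a short sketch. Here the macro-AUC is taken as the unweighted mean of one-vs-rest AUCs, one per disease, which is a common definition but an assumption about the exact computation used; the function names and toy data are hypothetical.

```python
def auc(scores, labels):
    """Mann-Whitney AUC: P(random positive score > random negative score), ties = 1/2."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(g_probs, diagnoses, diseases):
    """Unweighted mean of one-vs-rest AUCs, skipping diseases without both cases and non-cases."""
    per_disease = {}
    for d in diseases:
        scores = [probs[d] for probs in g_probs]
        labels = [dx == d for dx in diagnoses]
        if any(labels) and not all(labels):
            per_disease[d] = auc(scores, labels)
    return sum(per_disease.values()) / len(per_disease), per_disease


# Hypothetical toy data: 3 patients, 3 diseases.
diseases = ["RA", "PsA", "Other"]
g_probs = [
    {"RA": 0.6, "PsA": 0.3, "Other": 0.1},
    {"RA": 0.7, "PsA": 0.2, "Other": 0.1},
    {"RA": 0.2, "PsA": 0.3, "Other": 0.5},
]
diagnoses = ["RA", "PsA", "Other"]
macro, per_disease = macro_auc(g_probs, diagnoses, diseases)
print(macro)        # 0.5
print(per_disease)  # {'RA': 0.5, 'PsA': 0.0, 'Other': 1.0}
```

Averaging one-vs-rest AUCs gives every disease equal weight regardless of how many patients it contains, so small diagnostic groups can pull the macro-AUC away from the pooled micro-AUC.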
NPV and PPV of G-probabilities. We found that 41.3% of the G-probabilities (n = 2,597 of 6,282) were <5%, corresponding to an NPV of 96.0% (Table 1). At the <5% threshold, it was possible to suggest that at least one disease was unlikely for 100% of patients (n = 1,047 of 1,047), at least two diseases were unlikely for 94.0% of patients (n = 984 of 1,047), and three or more diseases were unlikely for 53.7% of patients (n = 562 of 1,047).
We found that 48.8% of patients (n = 511 of 1,047) had a single G-probability >50%, corresponding to a PPV of 70.3%, whereas a G-probability threshold of >20% resulted in a lower PPV of 40.7% (Table 1). In 70.3% of these patients (n = 359 of 511), the disease with the highest G-probability corresponded to the clinician diagnosis. Scatterplots were created to investigate changes in NPV and PPV across G-probability thresholds; NPV and PPV were inversely correlated, with higher NPV and lower PPV at lower G-probability thresholds (Supplementary Figure 3).
Table 1 footnote: * Two lower G-probability thresholds (<5% and <20%) for suggesting unlikely diagnoses and two upper G-probability thresholds (>20% and >50%) for suggesting likely diagnoses were evaluated. Six G-probabilities (one for each disease) were generated for each patient. The number of patients with at least one G-probability at the given threshold is displayed out of 1,047 patients, as is the number of G-probabilities overall at the given threshold out of 6,282 G-probabilities generated (6 per patient). NPV, negative predictive value; PPV, positive predictive value.

DISCUSSION
In the first real-world validation of the G-PROB tool, we found, first, that G-PROB produces meaningful, easily interpretable outputs; second, that G-PROB is especially helpful at suggesting unlikely diagnoses; and finally, that G-PROB showed good, although not perfect, agreement with likely clinician diagnoses. Our results demonstrate that the G-PROB tool can be easily set up and used to generate G-probabilities. These G-probabilities are easy to interpret, and we demonstrate that they are meaningful: G-probabilities matching clinician diagnosis were significantly higher than those that did not match (Figure 2), and G-probabilities were very well calibrated with clinician diagnosis, indicating that increases in G-probabilities are concordant with increases in the proportion of clinician diagnosis matches, as would be expected (Figure 3). Furthermore, ROC-AUC analysis showed that G-probabilities performed well at discriminating clinician diagnoses. Overall, these assessments suggest that the G-probabilities generated by G-PROB are meaningful and that they rise and fall in the expected direction.
We found that NPV was high at G-probability thresholds of <5% and <20%. This suggests that, when interpreting G-PROB results in clinic, we can be especially confident that diseases with G-probabilities below these thresholds are unlikely. [...] 13-16 It is important to note that the tool itself, and therefore its current performance, is directly linked to how much is known about the genetic risks of the diseases assessed; as more susceptibility loci are discovered for each disease, performance may improve further. Furthermore, integrating clinical risk factors such as demographics, serology, and comorbidities with PRSs has the potential to improve predictive performance, including PPV, as has been demonstrated in studies of atherosclerotic cardiovascular disease. 36
There are several strengths of our study, including the use of data from the NOAR cohort, a large, real-world observational cohort of patients who present with early inflammatory arthritis and are suspected to have a rheumatologic disease. Use of this cohort addresses several limitations of the original study, which assessed the performance of G-PROB in a smaller sample and used patients from a biobank. 25 Furthermore, we compared G-PROB's performance with clinician diagnosis rather than classification criteria, which is more appropriate in the real-world setting because patients may not satisfy classification criteria but will still receive diagnoses based on expert clinical opinion. Overall, the fact that G-PROB performs well in the NOAR cohort against clinician diagnosis strengthens the evidence that the tool may be helpful in real-world clinical practice.
We acknowledge several limitations of our study. First, the NOAR cohort only included patients who identified as White; therefore, we cannot extrapolate performance to patients of other ethnicities. As methods to improve the generalizability of known genetic risk variants using transancestry data are developed, this issue may be resolved in the future, but testing of the tool in patients from other ancestries is required. 37 Future genetic studies, such as functional genomics studies that discover new disease-associated variants and functionally annotate known genetic risk variants with target genes and regulatory elements, can be incorporated into further iterations of the G-PROB tool, which may further improve its cross-ancestry performance and utility. 38,39 A second limitation relates to power; despite including a sample size nearly four times larger than the original study, 25 we still had relatively low numbers of patients with diagnoses less commonly made in patients with suspected early inflammatory arthritis (including gout) (Figure 1). In the context of NOAR and UK clinical practice, this is likely because patients with gout are diagnosed by clinicians outside of the rheumatology outpatient setting (eg, in primary care or in hospital). Low numbers of cases in certain categories may lead to overestimation of the tool's performance (especially the NPV) by overemphasizing its ability to rule out uncommon diagnoses. Furthermore, because the selection of disease prevalence settings is crucial in the set-up and generation of G-probabilities, care must be taken when implementing the G-PROB tool in clinical practice to ensure that the selected disease prevalence reflects that of the given clinic population.
In the initial study describing G-PROB, in a cohort of 197 patients, 35% of whom were initially misdiagnosed on first presentation to a rheumatology clinic, the G-probability of the correct disease was higher than the G-probability of the rheumatologist's initial diagnosis for 65% of patients. 25 Unfortunately, NOAR was not set up to record misdiagnoses; therefore, a third limitation of our study is that we could not validate these findings, because "initial diagnosis" and subsequent "follow-up/final diagnosis" were not recorded.
Finally, our study has demonstrated that there is a strong correlation between the G-probabilities generated from known genetic risk alleles and the final clinician diagnosis; however, in both our study and the initial study describing G-PROB, there is a heterogeneous group of patients in the "Other disease" category. In the "Other disease" category, there are patients with inflammatory conditions (such as polymyalgia rheumatica) as well as those with noninflammatory conditions (such as chronic pain syndromes). Therefore, a fourth limitation that neither our study nor the initial G-PROB study could address is whether G-PROB can help improve the accuracy of diagnosis for patients in the heterogeneous "Other disease" category.
In conclusion, using real-world observational data and clinician diagnoses, we have shown that the G-probabilities generated by the G-PROB tool are meaningful. Further work is needed in the form of prospective studies to assess the added value of genetic tools such as G-PROB in addition to nongenetic factors such as serology and other clinical information. Additionally, studies evaluating the views of clinicians and patients with regard to the acceptability and usefulness of genetic information in clinical practice, as well as studies assessing the cost-benefit of genotyping patients and using genetic information in the clinical setting, are needed before this technology is integrated into the modern outpatient clinic.

Figure 2. Distribution of G-probabilities that matched clinician diagnosis (blue) and G-probabilities that did not match clinician diagnosis (red). Ideally, G-probabilities for the correct diagnosis should be higher, with a distribution shifted to the right, whereas G-probabilities for incorrect diagnoses should be lower, with a distribution shifted to the left.

Figure 3. Linear regression model without intercept of concordance of G-probabilities with clinician diagnosis match, with the x-axis showing G-probabilities and the y-axis showing a binary outcome of concordance (y = 1) or nonconcordance (y = 0) with clinician diagnosis, with a β (regression coefficient) of 1.047. Ideally, the higher the inferred G-probability for a disease, the more likely that it is the actual diagnosis. The linear regression of disease match (yes/no) on G-probabilities, constraining the intercept to zero, is shown as a solid blue line. A β of 1, indicating exact calibration, is shown as a dashed black line. For visualization, G-probabilities were placed into five equally sized bins, and the proportion of instances in which the predicted disease is concordant with clinician diagnosis was plotted. Color figure can be viewed in the online issue, which is available at http://onlinelibrary.wiley.com/doi/10.1002/art.42760/abstract.

Figure 4. Receiver operating characteristic curve analysis assessing the ability of G-probabilities to discriminate clinician diagnoses. The AUC shown is the micro-AUC of the pooled data for all diseases. AUC, area under the curve.

Table 1. Performance of G-probabilities in suggesting likely and unlikely diagnoses at different thresholds*