Performance of gout definitions for genetic epidemiological studies: analysis of UK Biobank

Background: Many different combinations of available data have been used to identify gout cases in large genetic studies. The aim of this study was to determine the performance of case definitions of gout using the limited items available in multipurpose cohorts for population-based genetic studies.
Methods: This research was conducted using the UK Biobank Resource. Data, including genome-wide genotypes, were available for 105,421 European participants aged 40–69 years without kidney disease. Gout definitions and combinations of these definitions were identified from previous epidemiological studies. These definitions were tested for association with 30 urate-associated single-nucleotide polymorphisms (SNPs) by logistic regression, adjusted for age, sex, waist circumference, and ratio of waist circumference to height. Heritability estimates under an additive model were generated using GCTA version 1.26.0 and PLINK version 1.90b3.32 by partitioning the genome.
Results: There were 2066 (1.96%) cases defined by self-report of gout, 1652 (1.57%) defined by urate-lowering therapy (ULT) use, 382 (0.36%) defined by hospital diagnosis, 1861 (1.76%) defined by hospital diagnosis or gout-specific medications and 2295 (2.18%) defined by self-report of gout or ULT use. Association with gout at experiment-wide significance (P < 0.0017) was observed for 13 SNPs using the self-report of gout or ULT use definition, 12 SNPs using the self-report of gout definition, 11 SNPs using the hospital diagnosis or gout-specific medication definition, 10 SNPs using the ULT use definition and 3 SNPs using the hospital diagnosis definition. Heritability estimates ranged from 0.282 to 0.308 for all definitions except hospital diagnosis (0.236).
Conclusions: Of the limited items available in multipurpose cohorts, the case definition of self-report of gout or ULT use has high sensitivity and precision for detecting association in genetic epidemiological studies of gout.


Background
Accurate case definition is important for epidemiological studies. However, in multipurpose cohort studies frequently used for genetic epidemiological studies of gout, limited information is usually available for case definition. Many different combinations of available data have been used to identify gout cases in large genetic studies. For example, in the Global Urate Genetics Consortium study, the largest genome-wide association study (GWAS) of hyperuricaemia and gout reported to date, 15 different definitions of gout were used [1].
Population genetic studies frequently require large numbers of participants to achieve adequate statistical power, because common variants typically exert small effects on risk of disease. Within a study population, accurate case definition improves study power by maximising the number of true cases and minimising the number of falsely attributed disease-free control participants [2]. Consistent case definition is important for analyses that pool genetic data from different studies, as well as for those analyses that aim to replicate reported genetic associations.
Authors of a recent analysis of the Study for Updated Gout Classification Criteria (SUGAR), using synovial fluid confirmation of monosodium urate crystals as the gold standard for gout definition, reported that the definition of self-report of gout or urate-lowering therapy (ULT) use had the best test performance characteristics of existing definitions used in epidemiological studies [3]. The aim of the present study was to determine the performance of case definitions of gout using the limited items available in multipurpose cohorts, including self-report of gout or ULT use, for population-based genetic studies.

Methods
This research was conducted using the UK Biobank Resource (approval number 12611) [4]. Data from the first tranche of UK Biobank genotyping and imputation data were used for this analysis (made publicly available in May 2015). Inclusion criteria were European ethnicity, age 40–69 years and genome-wide genotypes available. Exclusion criteria were mismatch between self-reported sex and genetic sex, genotyping quality control failure, relatedness to another participant, and either a primary or secondary hospital diagnosis of kidney disease (International Classification of Diseases, Tenth Revision (ICD-10), codes I12, I13, N00-N05, N07, N11, N14, N17-N19, Q61, N25.0, Z49, Z94.0, Z99.2). Participants aged 70 years and over and those with kidney disease were excluded because these are risk factors for secondary gout.
Gout definitions and combinations of these definitions were identified from previous epidemiological studies [1,3,5]. Self-report of gout was defined by reporting of gout by the participant at the time of the study interview. Hospital diagnosis of gout was defined by either primary or secondary hospital discharge coding for gout (ICD-10 code M10, including sub-codes). Use of ULT required self-report of being on any of allopurinol, febuxostat or sulphinpyrazone and not having a hospital diagnosis of leukaemia or lymphoma (ICD-10 codes C81-C96). Winnard-defined gout was hospital diagnosis of gout or gout-specific medication (ULT or colchicine) as reported by Winnard et al. [5]. For participants who did not meet any gout definitions, further exclusion criteria were corticosteroid use, non-steroidal anti-inflammatory drug use or probenecid use.
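The case definitions above reduce to simple Boolean rules over per-participant flags. The following is a minimal sketch of that logic, not the code used in this study: the `Participant` class and all field and function names are hypothetical, and it assumes that the ULT component of the Winnard definition carries the same leukaemia/lymphoma exclusion as the ULT use definition.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical per-participant flags; names are illustrative only."""
    self_report_gout: bool = False       # gout reported at study interview
    hospital_gout: bool = False          # ICD-10 M10, primary or secondary
    on_ult: bool = False                 # allopurinol, febuxostat or sulphinpyrazone
    leukaemia_or_lymphoma: bool = False  # ICD-10 C81-C96 hospital diagnosis
    on_colchicine: bool = False
    on_corticosteroid: bool = False
    on_nsaid: bool = False
    on_probenecid: bool = False


def ult_use(p: Participant) -> bool:
    # ULT use counts only in the absence of leukaemia or lymphoma.
    return p.on_ult and not p.leukaemia_or_lymphoma


def winnard(p: Participant) -> bool:
    # Hospital diagnosis or gout-specific medication (ULT or colchicine).
    # Assumption: the ULT component reuses ult_use() and its exclusion.
    return p.hospital_gout or ult_use(p) or p.on_colchicine


def self_report_or_ult(p: Participant) -> bool:
    return p.self_report_gout or ult_use(p)


def case_definitions(p: Participant) -> dict:
    """Return which of the five definitions a participant meets."""
    return {
        "self_report": p.self_report_gout,
        "ult_use": ult_use(p),
        "hospital_diagnosis": p.hospital_gout,
        "winnard": winnard(p),
        "self_report_or_ult": self_report_or_ult(p),
    }


def is_eligible_control(p: Participant) -> bool:
    # Controls meet no gout definition and use none of the excluded drugs.
    meets_any = any(case_definitions(p).values())
    excluded_drug = p.on_corticosteroid or p.on_nsaid or p.on_probenecid
    return not meets_any and not excluded_drug
```

Under these rules, a participant taking colchicine alone meets only the Winnard definition, and a participant taking a non-steroidal anti-inflammatory drug without meeting any gout definition is dropped from the control group rather than counted as a case.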
UK Biobank samples had been genotyped using an Axiom array (820,967 markers; Affymetrix, Santa Clara, CA, USA) and imputed to approximately 73.3 million single-nucleotide polymorphisms (SNPs) using SHAPEIT3 and IMPUTE2 with a combined UK10K and 1000 Genomes reference panel. Logistic regression of SNPs against gout as the outcome was performed, adjusting for age, sex, waist circumference, and ratio of waist circumference to height. We analysed 30 urate-associated SNPs reported by Köttgen et al. in the large (>140,000 European participants) Global Urate Genetics Consortium GWAS [1]. Results were reported as the number of SNPs detected at both genome-wide significance (P < 5 × 10⁻⁸) and experiment-wide significance (P < 0.0017). CIs for proportions were calculated using the Wilson score method and www.openepi.com [6]. Heritability estimates were compared using the difference h1 − h2, with standard error se = sqrt(se1^2 + se2^2).
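For readers who wish to reproduce the interval and threshold arithmetic, the following is a minimal sketch of the Wilson score interval for a proportion. It is not the OpenEpi implementation, and the function name `wilson_interval` is hypothetical. The sketch also shows that the experiment-wide threshold of 0.0017 is consistent with a Bonferroni correction of 0.05 across the 30 SNPs tested; this derivation is an assumption, because the text states only the threshold itself.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion (default 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Assumption: experiment-wide threshold as a Bonferroni correction over 30 SNPs.
N_SNPS = 30
experiment_wide_threshold = 0.05 / N_SNPS

# Prevalence of the self-report of gout or ULT use definition (2295 of 105,421).
lower, upper = wilson_interval(2295, 105421)
```

With the counts reported in this study, the interval for the self-report of gout or ULT use definition is roughly 2.09% to 2.27%.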
Heritability estimates under an additive model were generated using GCTA version 1.26.0 [7] and PLINK version 1.90b3.32 [8] by partitioning the genome. To reduce computational time, a smaller control cohort of 10,000 individuals was randomly sampled from the UK Biobank and used for each set of cases. SNPs were retained if they showed no deviation from Hardy-Weinberg equilibrium (P > 1 × 10⁻⁶) and had a minor allele frequency >0.01. A genetic relationship matrix was created for each chromosome, which was then used to calculate heritability assuming a prevalence of gout of 2% in the general population.
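An assumed population prevalence is typically needed in case-control heritability analyses to convert an estimate from the observed scale to the liability scale, correcting for the over-representation of cases in the sample. The following is a minimal sketch of the standard liability-threshold transformation. It is an assumption that this is the conversion applied here (the text states only that a prevalence of 2% was assumed), and the function name and arguments are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed: float,
                          prevalence: float,
                          case_fraction: float) -> float:
    """Convert observed-scale h2 from a case-control sample to the liability scale.

    prevalence    -- assumed population prevalence K (e.g. 0.02)
    case_fraction -- proportion of cases P in the analysed sample
    """
    if not (0 < prevalence < 1 and 0 < case_fraction < 1):
        raise ValueError("prevalence and case_fraction must lie in (0, 1)")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability threshold
    density = std_normal.pdf(threshold)             # normal density at threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1 - k)) ** 2 / (density ** 2 * p * (1 - p))


# Illustrative only: 2295 cases analysed against 10,000 sampled controls.
case_fraction = 2295 / (2295 + 10000)
scale_factor = observed_to_liability(1.0, 0.02, case_fraction)
```

When the sample case fraction equals the population prevalence, the expression reduces to K(1 − K) divided by the squared density, which is the form used for unselected population samples.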

Results
Data including genome-wide genotypes were available for 105,421 participants. Demographic and clinical data for the entire study group are shown in Table 1. Mean age was 56.87 years; 49.18% of participants were male; and mean body mass index was 27.36 kg/m². Figure 1 shows the number of cases identified by each gout definition. There was substantial overlap between most definitions. However, of the 382 participants who met the hospital diagnosis criteria, 126 (33.0%) did not meet the self-report of gout or ULT use definition.
Table 2 shows the prevalence of gout identified by each gout definition in the entire study population and in men and women. The hospital diagnosis definition detected the fewest cases (n = 382, study population prevalence 0.36%). Definitions including self-report of gout detected significantly more cases than other definitions, with the definition of self-report of gout or ULT use detecting the highest number of cases (n = 2295, study population prevalence 2.18%).
Analysis of the urate-associated SNPs described by Köttgen et al. [1] showed similar ORs for all gout definitions (Fig. 2, Table 3). However, the number of SNPs associated with gout at genome-wide or experiment-wide significance differed depending on gout case definition. Association with gout at genome-wide significance (P < 5 × 10⁻⁸) was observed for five SNPs (ABCG2, SLC2A9, GCKR, SLC17A3 and SLC22A12) with gout defined by self-report of gout or ULT use, five SNPs (ABCG2, SLC2A9, GCKR, SLC17A3 and SLC22A12) with gout defined by self-report of gout, four SNPs (ABCG2, SLC2A9, GCKR and SLC17A3) with gout defined by the Winnard definition [5], three SNPs (ABCG2, SLC2A9 and GCKR) with gout defined by ULT use and two SNPs (ABCG2 and SLC2A9) with gout defined by hospital diagnosis.
Association with gout at experiment-wide significance (P < 0.0017) was observed for 13 SNPs with gout defined by self-report of gout or ULT use, for 12 SNPs with gout defined by self-report, for 11 SNPs with gout defined by the Winnard definition, for 10 SNPs with gout defined by ULT use, and for 3 SNPs with gout defined by hospital diagnosis (Table 3). The heritability estimates (i.e., proportion of variance in gout explained by common inherited genetic variants under an additive model of inheritance) were 0.289 (0.034) for the self-report of gout or ULT use definition, 0.283 (0.036) for the self-report of gout definition, 0.282 (0.040) for the Winnard definition, 0.308 (0.044) for the ULT use definition and 0.236 (0.160) for the hospital diagnosis definition. There were no significant differences between the heritability estimates.
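The comparison of heritability estimates can be illustrated with the largest observed difference, between the ULT use and hospital diagnosis definitions. The following is a minimal sketch using the difference and standard error formula given in the Methods. Two points are assumptions: that the values in parentheses are standard errors, and that the difference is converted to a two-sided P value using a normal approximation. The function name `compare_heritability` is hypothetical.

```python
import math
from statistics import NormalDist


def compare_heritability(h1: float, se1: float, h2: float, se2: float):
    """Difference between two estimates with se = sqrt(se1^2 + se2^2).

    Returns (difference, se of difference, z statistic, two-sided P value).
    The normal-approximation P value is an assumption of this sketch.
    """
    difference = h1 - h2
    se_difference = math.sqrt(se1 ** 2 + se2 ** 2)
    z = difference / se_difference
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return difference, se_difference, z, p_value


# ULT use definition (0.308, 0.044) vs hospital diagnosis definition (0.236, 0.160).
diff, se_diff, z_stat, p_val = compare_heritability(0.308, 0.044, 0.236, 0.160)
```

For this pair the difference is 0.072 with a standard error of about 0.166, giving a z statistic of about 0.43, which is well short of conventional significance and is driven largely by the wide standard error of the hospital diagnosis estimate.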

Discussion
Accurate and consistent phenotyping of cases and disease-free control participants is important to maximise study power and reduce the risk of misclassification bias in genetic association studies. Consistent definitions of disease phenotypes are also important for replication of genetic associations in different cohorts [9]. In this analysis of UK Biobank data, the definition of self-report of gout or ULT use detected the highest number of gout cases and had the greatest precision for genetic association analysis.
Our findings are consistent with a recent analysis of the SUGAR cohort that used synovial fluid confirmation of monosodium urate crystals as the gold standard for gout definition [3]. The SUGAR analysis reported that the definition of self-report of gout or ULT use had the best test performance characteristics of existing definitions, with sensitivity of 82% and specificity of 72%. Collectively, these data support the use of the self-report of gout or ULT use definition in epidemiological studies when more detailed gout-specific clinical data are not available.
The different definitions of gout used in this study may reflect different disease presentations or patient populations. Although no single definition captured all patients, there was substantial overlap between most definitions. The definition of hospital diagnosis is very restrictive and is unlikely to capture most people with gout. Of note, 126 (33.0%) of the 382 participants who met the hospital diagnosis criteria did not meet the self-report of gout or ULT use definition. There may be several reasons for this. First, the hospitalised population may have a different disease presentation from that of those identified in the community through self-report or ULT use. Furthermore, a diagnosis of gout made during a hospital admission may subsequently be revised to a different diagnosis, and the ascertainment methodology does not take this into account.
Compared with the case definition of self-report of gout or ULT use, the Winnard definition led to a lower estimated prevalence of gout and also had lower precision for genetic association analysis. Therefore, when self-report information is available, we recommend the definition of self-report of gout or ULT use.
For all definitions tested, ABCG2 and SLC2A9 were associated with gout at genome-wide significance. These genes encode proteins that regulate uric acid transport within the gut and proximal renal tubule, respectively. The large effect sizes observed in this study are reminiscent of their dominant effect sizes in GWAS of control of serum urate levels [1], consistent with the central role of these two genes in regulating serum urate and gout risk.
As part of evaluating the various definitions, we also calculated heritability estimates of gout, with the proportion of age-, sex- and body composition-adjusted variance explained by all common SNPs estimated to be 0.282–0.308 (excluding the hospital definition). Previously, Köttgen et al. [1], also using GCTA software, had estimated a range of genome-wide heritability estimates of 0.27–0.41 for age- and sex-adjusted serum urate levels, depending on the individual sample sets analysed. The estimates of variance explained in serum urate and gout by common genetic variants in the European sample sets are comparable, suggesting that the common genetic variant-mediated heritabilities of serum urate levels and gout are similar. Clearly, environmental factors, such as dietary exposures and medications, also contribute to the risk of gout. The heritability estimates use information from common SNPs under the assumption of additive contributions. Therefore, the estimates will not include the contribution of non-additive gene-by-gene and gene-by-environment interactions, rare genetic variants and copy number variations.
We acknowledge that our study has limitations. The analysis was restricted to European participants, and our genetic association results may not be generalisable to non-European populations. Furthermore, a definition that includes ULT may be less specific for gout if the study population is recruited from countries in which ULT is recommended for treatment of asymptomatic hyperuricaemia. A diagnostic gold standard was not available in this study, and therefore it is not possible to determine the false-positive or false-negative rates using this dataset. Disease validation was based on the genotype data available in this cohort, and gout was inferred on the basis of known genetic associations with hyperuricaemia and gout. The strength of association observed in this study population may not reflect findings in the general UK population; risk factors for secondary gout (age ≥70 years and kidney disease) were exclusion criteria. The study findings also are not applicable to studies in which researchers do not collect information about self-report of gout or gout medication use.
Our study's strengths include the large sample size with consistent data collection. The comprehensive data collection, including patient interviews, hospitalisation records and medication information, allowed us to compare a number of different case definitions within a single study.
Table note: ULT, urate-lowering therapy. Data are adjusted for age, sex, waist circumference and waist-to-height ratio. Experiment-wide significance is defined as P < 0.0017. Hyperuricaemia single-nucleotide polymorphisms are as described by Köttgen et al. [1].

Conclusions
The case definition of self-report of gout or ULT use has high precision for detecting association in genetic epidemiological studies of gout. When these variables are available within multipurpose cohorts, the consistent use of this case definition should reduce the risk of misclassification bias and improve study power.