High-resolution HLA genotyping in inclusion body myositis refines 8.1 ancestral haplotype association to DRB1*03:01:01 and highlights pathogenic role of arginine-74 of DRβ1 chain

Objectives: Inclusion body myositis (IBM) is a progressive inflammatory-degenerative muscle disease of older individuals, with some patients producing anti-cytosolic 5′-nucleotidase 1A (NT5C1A, aka cN1A) antibodies. The human leukocyte antigen (HLA) region is the strongest genetic risk factor for developing IBM. In this study, we aimed to further define the contribution of HLA alleles to IBM and to the production of anti-cN1A antibodies. Methods: We performed HLA genotyping of a Western Australian cohort of 113 Caucasian IBM patients and 112 ethnically matched controls using Illumina next-generation sequencing. Allele frequency analysis and amino acid alignments were performed using the Genentech/MiDAS bioinformatics package. Allele frequencies were compared using Fisher's exact test. Age at onset analysis was performed using the ggstatsplot package. All analyses were carried out in RStudio version 1.4.1717. Results: Our findings validated the independent association of HLA-DRB1*03:01:01 with IBM and attributed the risk to an arginine residue at position 74 of the DRβ1 protein. Conversely, DRB4*01:01:01 and DQA1*01:02:01 were found to have protective effects; carriers of DRB1*03:01:01 who did not possess these alleles had a fourteen-fold increased risk of developing IBM over the general Caucasian population. Furthermore, patients with this genotype developed symptoms on average five years earlier than patients without it. We did not find any HLA associations with anti-cN1A antibody production. Conclusions: High-resolution HLA sequencing more precisely characterised the alleles associated with IBM and defined a haplotype linked to earlier disease onset. Identification of the critical amino acid residue by advanced biostatistical analysis of immunogenetics data offers mechanistic insights and future directions for uncovering IBM aetiopathogenesis.


Introduction
Inclusion body myositis (IBM) is a chronic inflammatory disease of skeletal muscles affecting middle-aged and older adults. It has a distinctive pattern of muscle involvement, primarily impacting anterior limb muscles, including quadriceps femoris, tibialis anterior and finger flexors, leading to a gradual decline in strength and mobility. More than two-thirds of patients experience impaired swallowing due to pharyngeal muscle involvement, which when severe may result in nutritional deficiencies and recurring aspiration pneumonia [1]. A 2017 meta-analysis documented a worldwide prevalence of 24.8 cases per million individuals with a growing trend over time, possibly attributed to increasing disease awareness and improved diagnosis [2]. Despite the technological advances in diagnostic methods, IBM patients still experience, on average, a five-year delay between symptom onset and diagnosis [3,4]. Furthermore, there are currently no effective disease-modifying treatments for IBM, and misdiagnosis may result in the prescription of inadequate and potentially deleterious immunosuppressive therapies [1,5].
IBM pathology is complex and encompasses a combination of myofibre degeneration with disrupted proteostasis, mitochondrial abnormalities, and autoimmune manifestations. While the immune response is dominated by CD8+ T cells invading non-necrotic myofibres [6], infiltrates of CD4+ T cells and B cells have also been reported [7]. Additionally, the humoral autoimmune response features the production of antibodies against a muscle enzyme, cytosolic 5′-nucleotidase 1A (NT5C1A, aka cN1A), reported in 33-72 % of patients [8-12]. The interaction between T lymphocytes and antigenic peptides presented by major histocompatibility complex (MHC) molecules underpins adaptive cellular immune responses, including those against self. In humans, the MHC gene cluster is located on the short arm of chromosome 6 (6p21.3). It includes highly polymorphic genes encoding Class I and Class II human leukocyte antigen (HLA) molecules that mediate the presentation of antigenic peptides to CD8+ and CD4+ T cells, respectively. HLA alleles are found at variable prevalence in different ethnicities. Indeed, some combinations of alleles are preferentially co-inherited and occur at non-random frequencies due to linkage disequilibrium (LD) [13]. The human MHC cluster exhibits a high degree of LD between HLA gene loci, resulting in extended lengths of conserved DNA sequences denoted as ancestral haplotypes (AH) [14].
Since the 1990s, genetic association studies have consistently reported the association of HLA-DRB1*03:01 (originally reported as the HLA-DR3 serotype [15]) and the 8.1 ancestral haplotype (8.1 AH) with IBM in Caucasian individuals [16-20]. There have also been reports linking HLA-DRB1*01:01 [20,21], A*03 [22], and DQB1*05 [22] to the disease. Notably, prior studies were constrained by their use of serological or genotypic data with limited resolution, which provided only a broad characterisation of allele groups. More recently, Rothwell and colleagues applied GWAS analysis to Caucasian IBM patients sourced from 11 countries through the MYOGEN consortium; HLA allele imputation from SNPs showed independent associations with three alleles: HLA-DRB1*03:01, *01:01 and *13:01 [23]. However, the identity of secondary risk alleles, and of protective alleles, has yet to be confirmed in other populations using higher resolution genotyping methods.
Our case-control study aimed to further refine IBM genetic associations within the 8.1 AH by applying high-resolution direct sequencing of HLA gene loci. We employed 3-field-resolution HLA genotyping enabled by Illumina next-generation sequencing, focusing specifically on individuals of Caucasian ethnic ancestry, who constitute over 98 % of people with IBM in Western Australia.

Participant recruitment
Study participants were recruited through the myositis clinics at Murdoch University and Perron Institute for Neurological and Translational Science in Perth, Western Australia. A total of 113 Caucasian IBM patients and 112 ethnically matched controls donated blood or saliva samples for the study after providing informed consent. Participants' ethnicity as Caucasian was self-reported (N = 167) or reported by the treating neurologist or nurse (N = 58). The term Caucasian relates to white individuals of European descent; however, the sub-populations that make up Caucasians vary significantly between geographical locations. We therefore matched our Australian cohort of Caucasian IBM patients with a Caucasian control cohort from the same region, to limit genetic variation between the two groups arising from differences in ancestral composition. To our knowledge, none of the study participants were related by birth. All the patients were diagnosed by an experienced neuromuscular neurologist as having IBM based on history, clinical examination, and, where available, serological findings. Many patients underwent a confirmatory biopsy (62 % could be traced), with the diagnosis confirmed in accordance with the ENMC criteria [23]. Some patients were unable or unwilling to have biopsies, and some were not available for review. Age at disease onset was self-reported; patients were asked to recollect their initial symptoms and the year in which they first became aware of them.
All participants provided informed consent prior to enrolment in the study and sample donation. The research protocol was evaluated and approved by the Murdoch University Human Research Ethics Committee (projects 2015/111 and 2020/188).

DNA extraction
DNA was extracted from either blood or saliva. Venous blood was collected into lithium heparin-coated tubes (BD Vacutainer, Becton Dickinson, Australia). The buffy coat was separated by centrifugation and stored at −80 °C until processing. DNA was extracted from the buffy coat using a QIAamp DNA Blood Kit (QIAGEN, Hilden, Germany) according to the manufacturer's protocol.
Saliva samples were self-collected using Oragene DNA collection kits (DNA Genotek, Ontario, Canada) and stored at ambient temperature until processing. DNA was extracted from saliva using the PrepIT-L2P kit (DNA Genotek, Ontario, Canada) according to the manufacturer's protocol. The concentration and purity of the extracted DNA were measured using a Nanodrop spectrophotometer at 260/280 nm and by resolving DNA aliquots on a 1 % agarose gel.

HLA genotyping
HLA genotyping was conducted by Illumina next-generation sequencing at the Institute for Immunology and Infectious Diseases, Murdoch University, which is accredited by the American Society for Histocompatibility and Immunogenetics (ASHI) and the National Association of Testing Authorities (NATA). The validated sequencing protocol using locus-specific PCR amplification of genomic DNA has been described in Currenti et al. [24]. Briefly, polymorphic regions of HLA Class I A, B, C (exons 2, 3) and Class II DRB1, 3, 4, 5 (exons 2, 3), DQB1 (exons 2, 3), DQA1 (exon 2), and DPB1 (exon 2) were PCR amplified using sample-specific Molecular IndexeD (MID) primers. Amplicons were quantified, normalised, and pooled in equimolar ratios. Sequencing libraries were created and quantified using the Jetseq qPCR Library Quantification Kit (Meridian Bioscience Inc., OH, USA). Samples were sequenced on an Illumina MiSeq platform using the MiSeq V3 600-cycle kit (2 × 300 base pair reads) (Illumina Inc., CA, USA). Reads were quality-filtered, separated by MID tags and passed through an in-house accredited HLA allele caller software. Alleles were called according to the latest IPD-IMGT/HLA nomenclature [25]. Allele identity assignation was based on G groups, which combine sub-alleles that have identical nucleotide sequences across the sequenced exons but are polymorphic in the other exons. Additionally, some DQB1 and DRB1, 3, 4, 5 alleles were assigned an i group. These alleles were identical across the sequenced exons 2 and 3 but did not have reference data in the IPD-IMGT database for exon 3. For ease of analysis and in-text reporting, the G or i group designations have been removed; however, the full names of the alleles identified in this study featuring the G and i designations are listed in Supplementary Table 1, and the corresponding sub-allele list can be reconstructed using the HLA ambiguity tables located at https://www.iiid.com.au/laboratory-testing (v.3.32.0 to v.3.51.0). The HLA allele sequence data presented in this study are deposited in the NCBI Sequence Read Archive (SRA) repository under accession number PRJNA1042617 (http://www.ncbi.nlm.nih.gov/bioproject/1042617).
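The removal of G and i group designations for in-text reporting can be illustrated with a minimal Python sketch. This is not the accredited in-house allele caller; the function names and the regular expression are hypothetical, and the sketch assumes that a group designation appears as a single trailing letter after the last colon-separated field (as in DRB1*03:01:01i).

```python
import re

# Hypothetical pattern: optional "HLA-" prefix, locus name, "*",
# colon-separated numeric fields, and an optional single-letter suffix.
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+)*)([A-Za-z]?)$")


def parse_allele(name):
    """Split an allele name into (locus, list of fields, suffix)."""
    match = ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised allele name: {name!r}")
    locus, fields, suffix = match.groups()
    return locus, fields.split(":"), suffix


def strip_group(name, n_fields=3):
    """Drop the trailing group letter and keep at most n_fields fields.

    Note: this also drops any other single trailing letter, which is a
    simplification made for this illustration.
    """
    locus, fields, _suffix = parse_allele(name)
    return f"{locus}*{':'.join(fields[:n_fields])}"


if __name__ == "__main__":
    for raw in ["HLA-DRB1*03:01:01G", "DRB1*03:01:01i", "DQA1*01:02:01"]:
        print(raw, "->", strip_group(raw))
```

Reducing `n_fields` to 2 gives the two-field names used in earlier, lower-resolution studies (for example DRB1*03:01), which makes comparisons across resolutions easier to script.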

Anti-cN1A ELISA
IBM patients' blood was collected by venepuncture into polymer-gel SST tubes (BD Vacutainer). The serum phase was separated by centrifugation and frozen at −80 °C until further processing. Serum anti-cN1A antibodies were analysed by a semi-quantitative ELISA adapted from Bundell et al. [26]. 96-well plates (Maxisorp, Nunc, Roskilde, Denmark) were coated with 50 μL/well of full-length cN1A protein (GenScript, NJ, USA) diluted to 10 μg/mL in carbonate-bicarbonate buffer (pH 9.6) for 2 h at ambient temperature. Wells were washed with PBS/0.1 % Tween (PBST) and saturated with blocking buffer (PBST/5 % skim milk powder) overnight at 4 °C. After washing with PBST, 100 μL of patient serum diluted 1:1000 in blocking buffer was added in duplicate and incubated for 2 h at ambient temperature. Wells were washed with PBST before adding horseradish peroxidase (HRP)-conjugated anti-human secondary antibodies directed against pan IgG/M/A or IgG or IgM or IgA (Invitrogen, Rockford, IL, USA) and incubating for 1 h at ambient temperature. Wells were washed again before incubation with 50 μL per well of TMB solution (ThermoFisher Scientific, Waltham, MA, USA) for 10 min. The reaction was stopped with 50 μL of 2 M H₂SO₄ solution. The absorbance at 450 nm was read using the FLUOstar Omega (BMG Labtech, Mornington, VIC, Australia) microplate reader. Each plate included a positive control of anti-cN1A antibodies purified from a seropositive patient's serum. Pooled sera of forty-three healthy volunteers diluted 1:1000 in blocking buffer were used as the baseline control. Blank duplicates were obtained by performing all the steps in the absence of a sample. All absorbance values were adjusted by the average blank value. Absorbance obtained for patients' samples was recorded as a fold change relative to the baseline control. Reference cut-off values corresponded to the 99th percentile of the healthy samples (Busselton Population Health Study participants, n = 190). Patients with an increased value for at least one antibody isotype were considered seropositive.
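The read-out arithmetic described above (blank adjustment, fold change relative to the baseline control, a 99th-percentile cut-off, and seropositivity on any isotype) can be sketched in Python as follows. This is an illustrative sketch only: the function names and all numbers are hypothetical, and the nearest-rank percentile is an assumption, since the text does not state which percentile method was used.

```python
import math
from statistics import mean


def fold_change(sample_wells, baseline_wells, blank_wells):
    """Blank-adjust the duplicate readings, then express the sample
    relative to the pooled healthy baseline control."""
    blank = mean(blank_wells)
    sample = mean(sample_wells) - blank
    baseline = mean(baseline_wells) - blank
    if baseline <= 0:
        raise ValueError("baseline must exceed the blank")
    return sample / baseline


def percentile_cutoff(healthy_values, pct=99.0):
    """Nearest-rank percentile (assumed method) of healthy fold changes."""
    ordered = sorted(healthy_values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_seropositive(fold_changes, cutoffs):
    """Positive if at least one isotype exceeds its own cut-off."""
    return any(fold_changes[iso] > cutoffs[iso] for iso in fold_changes)


if __name__ == "__main__":
    # Made-up absorbance readings for one isotype.
    fc = fold_change([1.05, 1.15], [0.30, 0.30], [0.10, 0.10])
    print(f"fold change: {fc:.2f}")
    print(is_seropositive({"IgG": fc, "IgM": 1.1}, {"IgG": 2.0, "IgM": 2.5}))
```

Keeping one cut-off per isotype mirrors the description that a raised value for any single isotype is enough to call a patient seropositive.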

Statistical analysis
HLA allele frequencies and amino acid position differences were analysed using the Genentech/MiDAS package v.1.1.0 [27]. Genotype frequencies were compared using Fisher's exact test. P values were adjusted using the Bonferroni method (denoted p_a) and considered statistically significant at p_a < 0.05; odds ratios and 95 % confidence intervals (CI) were calculated. Independent associations were computed using stepwise conditional regression in MiDAS. Age at onset comparison was performed using the ggstatsplot package v.0.8.0 [28]. Analysis was carried out in RStudio v.1.4.1717 [29]. The MiDAS package supports HLA allele frequency distribution analysis under multiple inheritance models. Within the "additive" model, each instance of an allele is counted as an individual event. Conversely, the "dominant" model treats both heterozygous and homozygous carriers as a single event. Since it is yet to be established whether the effects of HLA alleles in IBM are dose-dependent, meaning that the impact of an allele is more pronounced in homozygous than in heterozygous carriers, the "additive" inheritance model was applied for our initial analysis. Subsequently, the analysis was repeated using the "dominant" inheritance model, and the outcomes were compared.
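The difference between the two counting rules, together with Fisher's exact test, the odds ratio and the Bonferroni adjustment, can be illustrated with a minimal Python sketch. It is not the MiDAS implementation (MiDAS is an R package with its own model fitting); the function names, the toy genotypes, and the decision to tabulate counts into a 2 × 2 table assuming two copies of the locus per individual are all assumptions made for illustration.

```python
from math import comb


def allele_counts(genotypes, allele, model="additive"):
    """Return (with, without) counts for one allele.

    additive: every copy counts, so a homozygote contributes two events.
    dominant: every carrier counts once, regardless of copy number.
    """
    if model == "additive":
        hits = sum(g.count(allele) for g in genotypes)
        total = 2 * len(genotypes)  # assumes two copies of the locus each
    elif model == "dominant":
        hits = sum(1 for g in genotypes if allele in g)
        total = len(genotypes)
    else:
        raise ValueError(f"unknown model: {model}")
    return hits, total - hits


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    tail = sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


def odds_ratio(a, b, c, d):
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def bonferroni(p, n_tests):
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Toy genotypes with placeholder allele names "X" and "Y".
    patients = [("X", "X"), ("X", "Y"), ("Y", "Y"), ("X", "Y")]
    controls = [("Y", "Y"), ("X", "Y"), ("Y", "Y"), ("Y", "Y")]
    for model in ("additive", "dominant"):
        a, b = allele_counts(patients, "X", model)
        c, d = allele_counts(controls, "X", model)
        p = fisher_exact_two_sided(a, b, c, d)
        print(model, (a, b, c, d), round(p, 4), odds_ratio(a, b, c, d))
```

If both models give similar effect sizes for an allele, as reported later in the text, there is little evidence that a second copy adds risk beyond the first.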

HLA-DRB1
Comparison of the allele frequency distribution between IBM patients and controls under the "additive" inheritance model confirmed a strong positive association with the alleles of the 8.1 AH. To account for the strong LD within the MHC region, we repeated the analysis while stepwise conditioning upon the most strongly associated variable or variables, added to the model as covariates, until the model fit could no longer be improved. As a result, only HLA-DRB1*03:01:01 (p_a = 1.516 × 10⁻⁸, OR = 6.187) of the 8.1 AH retained its significant positive association with the disease, while the association of the remaining alleles was no longer statistically significant. However, the independent negative associations of DRB4*01:01:01 (p_a = 2.371 × 10⁻⁴, OR = 0.225) and DQA1*01:02:01 (p_a = 3.463 × 10⁻³, OR = 0.246) were confirmed (Table 1).
Allele frequency comparison was then carried out using the MiDAS "dominant" inheritance model combined with the conditional analysis. The results were comparable to those obtained under the "additive" model (Supplementary Tables 2 and 3).

Arginine-74 within the DRβ1 protein confers the associated risk, while glutamine at this position is protective
Next, we sought to locate the risk attributed to HLA-DRB1*03:01:01 at the level of a specific amino acid residue. To achieve this, we employed the MiDAS "hla_aa" function, which imports allele amino acid sequences from the IPD-IMGT/HLA database and aligns them, enabling the identification of variable residues. The effect of each residue is then estimated using the likelihood ratio test. Supplementary Table 4 contains the complete list of the variable amino acid residues identified from our data set. As before, we subjected the findings to stepwise conditional analysis, identifying the amino acid at position 74 encoded within the DRB1 locus as highly significant (p_a = 1.08 × 10⁻¹¹) with a likelihood ratio of 69.40 (Table 2).
Given the significance of DRB1_74, we looked at the effect sizes of all residues at that position (Table 3). We found that the frequency of arginine (R) was significantly increased in IBM patients (38.05 % vs 13.84 %, p_a = 1.547 × 10⁻⁹, OR = 6.159), whereas the frequency of glutamine (Q) was significantly decreased (1.77 % vs 13.84 %, p_a = 1.050 × 10⁻⁴, OR = 0.096) compared to the control group.
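The idea of finding variable residues in aligned allele sequences can be sketched in a few lines of Python. This does not reproduce the MiDAS "hla_aa" function or use IPD-IMGT/HLA data: the function name, the allele labels and the five-residue toy sequences below are hypothetical and exist only to show the column-wise comparison.

```python
def variable_positions(aligned, start=1):
    """Return {position: {allele: residue}} for every alignment column
    in which the alleles do not all share the same residue.

    aligned: dict mapping allele label -> aligned amino acid string.
    start:   protein position assigned to the first alignment column.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    variable = {}
    for offset, column in enumerate(zip(*aligned.values())):
        if len(set(column)) > 1:
            variable[start + offset] = {
                allele: seq[offset] for allele, seq in aligned.items()
            }
    return variable


if __name__ == "__main__":
    # Toy fragments (not real HLA sequences), numbered from position 72.
    toy = {"toy_1": "QKRGR", "toy_2": "QKQGR", "toy_3": "QKAGR"}
    for position, residues in variable_positions(toy, start=72).items():
        print(position, residues)
```

Once variable positions are known, each individual can be recoded by the residue carried at a given position, and the association test is then run per position rather than per allele.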

No HLA association with anti-cN1A serostatus
Our cohort comprised ninety-one IBM patients with known anti-cN1A serostatus, twenty-six (28.6 %) of whom were seropositive. Additionally, we tested thirty control donors, among whom one individual (3.3 %) was determined to be seropositive.
The comparison of HLA allele frequencies between the anti-cN1A seropositive and seronegative subgroups utilising both the "additive" and the "dominant" inheritance models did not identify statistically significant differences after P-value adjustment (Supplementary Tables 5 and 6).

Discussion
Herein we present what we believe to be the first case-control study utilising high-resolution next-generation sequencing (NGS) to identify HLA alleles associated with IBM in Caucasians. NGS directly sequences the HLA genes, reducing the reliance on imputation methods, which have been used in recent GWAS [23]; it provides higher genotyping resolution and prevents possible inaccurate calls for low-frequency and rare alleles [31]. In agreement with previous genomic analyses, we observed higher frequencies of the alleles linked within the 8.1 ancestral haplotype among IBM patients compared to the reference group of the same ethnicity. However, conditional statistical analysis revealed that only HLA-DRB1*03:01:01 independently conferred an increased risk of developing the disease. This finding corroborates conclusions from the previous multi-national GWAS by Rothwell and colleagues, who noted the loss of significance within the 8.1 AH when conditioned upon the strong disease association of DRB1*03:01 [23]. However, we failed to confirm the association with DRB1*13:01 and *01:01 reported in their study, possibly because of the variable population mix of Caucasians from different European countries and the USA in their heterogeneous control group, compared with our Australian Caucasian control cohort. Applying the "additive" and "dominant" inheritance models to our analysis led to comparable outcomes, indicating the absence of a discernible dose-dependent impact of the reported alleles on IBM pathology.
It is important to note that the Illumina next-generation sequencing technology utilised in our study detects alleles based on their highly polymorphic regions that encode the extracellular antigen-presenting domains of MHC proteins and not on the complete genetic sequence. Thus, there is potential for ambiguity among alleles sharing identical sequences in exons 2 and 3 (exon 2 only for DQA1 and DPB1) while differing in the more conserved exons. For instance, the HLA-DRB1*03:01:01i group encompasses over fifty distinct alleles (for a comprehensive allele list, refer to https://www.iiid.com.au/laboratory-testing). However, in our Caucasian cohort, the true HLA-DRB1*03:01:01 allele is the most probable due to its notably higher prevalence in this population. On this account, we investigated which amino acid residues contributed to the risk associated with the DRB1*03:01:01i alleles. Our analysis pinpointed the risk association to the presence of arginine at position 74. In contrast, glutamine at this position appears to protect against the disease, as this residue is underrepresented in IBM patients. Curiously, arginine-74 has also been implicated in the pathogenesis of myopathies other than IBM [19]. For instance, the GWAS on the MYOGEN cohort identified arginine-74, as well as asparagine-77, both located in the binding groove of the DRβ1 protein, as highly associated with idiopathic inflammatory myopathies (IIM) as a combined group of diseases [23]. Analysis of IIM subsets, however, produced more heterogeneous findings. In the same study, residue 74 of HLA-DRB1*03 was found to be associated with polymyositis but not with dermatomyositis (combined juvenile [JDM] and adult-onset [DM]), where an independent association of residue 57 in HLA-DQB1 was found [23]. In contrast, a more recent study focused on a cohort of DM patients reported an association of residue 74 with adult-onset DM, but a stronger association of residue 37 and only a weak association of residue 74 with JDM [32]. The authors suggested that this difference may be driven by the association of this residue with anti-Jo-1, anti-PM/Scl and anti-cN1A antibodies, which are more prevalent in adult patients [33]. Of note, we could not confirm the genetic association with anti-cN1A antibodies within our cohort, possibly due to the smaller size of our cohort resulting in diminished analysis sensitivity.
Abbreviations: n, number of alleles; P, p value; p_a, adjusted p value (method = "Bonferroni"); OR, odds ratio; CI, confidence interval.
N. Slater et al.
Additionally, arginine-74 has been reported in association with various other autoimmune conditions, including rheumatoid arthritis [34], Graves' disease [35,36], and type 1 diabetes [36]. Extensive efforts have been dedicated to unravelling the mechanism by which it contributes to the risk of autoimmunity. Structural modelling of the DRβ1 chain by Miglioranza Scavuzzi and colleagues indicated that amino acid 74 resides within the peptide-binding pocket, and the substitution of a neutrally charged alanine or glycine with a positively charged arginine significantly altered the three-dimensional structure of the protein [37]. This alteration facilitated more efficient binding of self-peptides and their presentation to CD4+ T cells [38]. Building on those findings, Lee and colleagues screened a library of 150,000 small molecules to identify candidates that may disrupt the presentation by HLA-DRβ1-Arg74 of a self-antigen derived from thyroid stimulating hormone receptor (TSHR) that is responsible for autoimmune thyroid disease [38]. Intriguingly, the group identified one such molecule, Cepharanthine, which specifically out-competed the binding of a TSHR-derived self-peptide to DRβ1-Arg74. In an experimental mouse model of Graves' disease induced upon presentation of a TSHR-derived peptide, Cepharanthine demonstrated therapeutic efficacy by reducing T cell activation and downstream production of pro-inflammatory cytokines [38].
Furthermore, an antigen-independent mechanism has been proposed in Systemic Lupus Erythematosus (SLE), where DRB1*03:01:01 is also a known risk driver [39]. The authors reported that a peptide that includes residues 65-79 of HLA-DRB1*03:01:01 could directly bind non-MHC receptors induced in the presence of IFN-γ in both human and mouse macrophages, initiating a signalling cascade that resulted in mitochondrial perturbations, cell necrosis and protein unfolding, and led to anti-dsDNA antibody production.
It is puzzling that HLA class II alleles are uncovered in association with IBM, a disease characterised by increased circulating and muscle-infiltrating cytotoxic CD8+ T cells [6,40,41]. While the current paradigm places the role of HLA class II on presenting exogenous antigens to CD4+ T cells [42], there is emerging evidence that class II molecules can play alternative roles in the inflammatory response, such as presenting viral antigens to CD8+ T cells [43,44] and activating NK cells via the NKp44 receptor [45].
Abbreviations: DF, degrees of freedom; LR, likelihood ratio; P, p value; p_a, adjusted p value (method = "Bonferroni").
Abbreviations: n, number of alleles; P, p value; p_a, adjusted p value; OR, odds ratio; CI, confidence interval.
Abbreviations: n, number of carriers; P, p value; OR, odds ratio; CI, confidence interval.
Our study also identified DRB4*01:01:01 and DQA1*01:02:01 as independently under-represented in the patient group and, thereby, presumably protective in IBM. Previous studies have reported an inverse correlation between the occurrence of the disease and the carriage of DRB4 alleles [46]. Our study provides a more precise understanding of this association by demonstrating that individuals carrying DRB1*03:01:01 but lacking the DRB4*01:01:01 and DQA1*01:02:01 alleles have a fourteen-fold higher risk of developing IBM compared to the general population. Furthermore, in this cohort, IBM patients carrying the abovementioned genotype developed disease symptoms on average five years earlier than non-carriers. In contrast, the presence of a single copy of either the DRB4*01:01:01 or the DQA1*01:02:01 allele entirely negated the risk driven by DRB1*03:01:01. Although the mechanism by which these alleles exert their protective effect is yet to be determined, we can propose some potential avenues. For instance, HLA class II molecules have been found to interact with lymphocyte activation gene-3 (LAG-3; CD223), an inhibitory receptor that is induced on activated T cells (both CD4+ and CD8+) and NK cells [47]. By competing with self-reactive TCR for binding to HLA class II loaded with high-affinity self-antigens, LAG-3 may hinder the activity of autoreactive T cells that have escaped thymic negative selection, thereby controlling autoimmunity [48]. Additionally, polymorphism within HLA non-coding gene sequences can impact the level of cellular expression of the HLA molecule and, consequently, the extent of the resulting inflammatory response, as in the case of specific HLA-DP variants associated with the recovery from chronic HBV infection [49].
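The genotype definition used above (carriage of DRB1*03:01:01 in the absence of both protective alleles) can be written as a small Python predicate. This is a sketch for illustration only: the function name and the example individuals are hypothetical, and the other alleles listed for those individuals are placeholders rather than data from the study.

```python
RISK_ALLELE = "DRB1*03:01:01"
PROTECTIVE_ALLELES = {"DRB4*01:01:01", "DQA1*01:02:01"}


def has_risk_genotype(alleles):
    """True if the risk allele is carried and neither protective allele
    is present in any copy number (a single copy is enough to exclude)."""
    carried = set(alleles)
    return RISK_ALLELE in carried and not (carried & PROTECTIVE_ALLELES)


if __name__ == "__main__":
    # Hypothetical individuals; allele lists are illustrative only.
    examples = {
        "risk, no protective": ["DRB1*03:01:01", "DRB1*15:01:01",
                                "DQA1*05:01:01"],
        "risk plus DQA1*01:02:01": ["DRB1*03:01:01", "DQA1*01:02:01"],
        "no risk allele": ["DRB1*15:01:01", "DQA1*01:02:01"],
    }
    for label, alleles in examples.items():
        print(label, "->", has_risk_genotype(alleles))
```

A flag of this kind could then serve as the grouping variable when comparing carrier frequencies between patients and controls or age at onset between carriers and non-carriers.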
It is essential to interpret our findings within our study limitations. Specifically, the statistical power of our study is limited by the number of available participants, which is inherent to research in rare conditions. This constraint became particularly detrimental when assessing HLA alleles associated with anti-cN1A antibodies, as the smaller comparison groups led to diminished sensitivity in the analysis. This point is exemplified by a previous study that reported a strong correlation between anti-cN1A antibodies and HLA-DRB1*03:01 in a larger cohort of 313 Caucasian adult- and juvenile-onset myositis patients [33], a finding we could not replicate in our smaller cohort. Finally, our observation of the correlation between the presence of the IBM-associated HLA genotype and the age at symptom onset requires replication in a different patient group, because capturing symptom onset data relies on patients self-reporting, which introduces a subjective bias to the analysis. Currently, no objective approach is available to accurately determine the timing of initial disease symptoms, especially in patients with slow disease progression.
In conclusion, this study utilised a high-resolution next-generation sequencing workflow to gain deeper insights into the role of HLA in IBM in a Caucasian patient cohort. It determined that carriage of HLA-DRB1*03:01:01 of the 8.1 AH cluster was the principal allelic component responsible for the increased disease risk, and identified the combination of alleles that confers the highest risk profile for IBM as well as its impact on the age of symptom onset. We anticipate that, in the future, international endeavours focused on gathering and sharing high-resolution genetic data of this nature will enhance the precision and reliability of our understanding of the genomic landscape in rare diseases such as IBM, ultimately facilitating advancements in diagnosis and treatment. Additionally, studies investigating HLA-antigen-T cell receptor interactions and alternate mechanisms through which HLA molecules can facilitate IBM pathological processes could identify novel targets for therapeutic intervention.

Table 1
HLA alleles independently associated with IBM after stepwise conditional analysis.

Table 2
Amino acid positions within the HLA molecules found independently associated with IBM after stepwise conditional analysis.

Table 3
Comparison of amino acid frequencies at the DRB1_74 residue between IBM patients and control cases.