Genetic interrogation for sequence and copy number variants in systemic lupus erythematosus

Early-onset systemic lupus erythematosus presents with more severe disease and is associated with a greater genetic burden, especially in patients of Black, Asian or Hispanic ancestry. Next-generation sequencing techniques, notably whole exome sequencing, have been used extensively in genomic interrogation studies to identify causal disease variants that are increasingly implicated in the development of autoimmunity. This Review discusses the known causal variants of polygenic and monogenic systemic lupus erythematosus and their implications under certain genetic disparities, while suggesting an age-based sequencing strategy to aid clinical diagnostics and patient management for improved patient care.


Introduction
Systemic lupus erythematosus (SLE, or lupus) is an autoimmune disease characterized by the formation of autoantibodies targeting nucleic acid components such as double-stranded DNA (dsDNA) and RNA (Caielli et al., 2023). The vast spectrum of clinical manifestations ranges from mild skin rashes to widespread, destructive multi-organ inflammation, which in some cases can result in death. The pathogenesis of SLE is complex and multi-factorial (Tsokos, 2011), with both genetic and environmental contributions to the disease. Various autoimmune diseases are more common in women (Vinuesa et al., 2023), and in SLE, individuals of Black, Asian or Hispanic ethnicity have an increased disease burden, with patients presenting with a more severe phenotype (Lewis and Jawad, 2017).
SLE can be grouped according to the age of disease onset into adult-onset and childhood-onset SLE (cSLE). The latter refers to patients diagnosed before the age of 18 years and generally presents with greater severity, especially in children under 5 years old (Bundhun et al., 2017; Alperin et al., 2018). This early onset of SLE has been associated with an increased genetic burden, highlighting the contribution of one or several risk alleles to disease (Webb et al., 2011). Within this patient group, around 3%-10% of patients carry a single disease-causing variant (Almlof et al., 2019; Belot et al., 2020; Charras et al., 2023), a form of disease that is increasingly recognized and termed monogenic SLE (Harley and Sawalha, 2022; Vinuesa et al., 2023). Pinpointing the disease-causing variant will contribute greatly to our current knowledge of lupus pathogenesis, and this can be achieved through the use of next-generation sequencing (NGS) techniques (Sanger et al., 1977; Slatko et al., 2018; You et al., 2018; Yaung et al., 2023). As such, a focused strategy is needed, together with the prioritization of NGS and research efforts towards cSLE patients (Mina and Brunner, 2013).
SLE has a strong genetic component (Lewis and Jawad, 2017), and multiple susceptibility loci have been identified following the advent of genome-wide association studies (GWAS) (Deng and Tsao, 2014). Further genomic studies of SLE using NGS techniques have brought to light the utility of whole exome sequencing (WES) and whole genome sequencing (WGS). Our colleagues have also reviewed various technologies that could be employed to elucidate disease mechanisms (Yaung et al., 2023), such as Sanger sequencing (Sanger et al., 1977), single nucleotide polymorphism (SNP) arrays (You et al., 2018), WES and WGS (Slatko et al., 2018). In this Review, we expand on the use of NGS techniques, notably WES, across the current genomic landscape of polygenic and monogenic SLE, discussing their potential in reconciling disease risk variants and copy number variations (CNVs) and evaluating the identification of such variants.

Next-generation sequencing in SLE
Sequencing technologies have been fundamental for researchers due to their high-throughput capabilities and, more recently, their cost-effectiveness (Goodwin et al., 2016). This has allowed for comprehensive genomic studies (e.g., of point mutations, small indels and CNVs) and paved the way for multi-omics studies (Levy and Myers, 2016; Lee et al., 2022; Satam et al., 2023; Yaung et al., 2023). In the context of systemic autoimmune diseases like SLE, multiple susceptibility loci identified by GWAS cumulatively contribute risk towards disease development, but each carries a relatively low disease risk individually (Sestak et al., 2011; Wahren-Herlenius and Dorner, 2013).
Several methods have been employed in SLE genomics, including WGS, WES and targeted sequencing (Table 1). Briefly, WGS allows for comprehensive interrogation of the entire human genome and has contributed significantly to the genomic landscape via the 1000 Genomes Project since 2010 (Genomes Project et al., 2010; Genomes Project et al., 2015; Sudmant et al., 2015). However, around 85% of disease-related mutations are concentrated in the exome, which constitutes about 2% of the whole genome (Majewski et al., 2011). WES involves the selection of protein-coding regions (exons) in the genome for sequencing to identify any changes that could impact protein sequences (Ng et al., 2009). Its use has increased because it significantly reduces the starting material, cost and data management required (Petersen et al., 2017). In addition, mutations in exonic regions have been shown to be a major contributor to the development of monogenic diseases (Kuhlenbaumer et al., 2011). With the knowledge obtained from the above-mentioned methods, sequencing panels can be generated to target regions of interest that harbor pathogenic mutations, hence the utility of targeted sequencing for potential clinical care (Gulilat et al., 2019).
It has recently been suggested that polygenic risk scores (PRS) could be used to identify and stratify potential SLE patients for early intervention, if needed (Khunsriraksakul et al., 2023). Briefly, GWAS-identified risk variants are statistically compiled to predict disease incidence in a population and the risk of developing SLE in individuals (Khunsriraksakul et al., 2022). An association between a high PRS and poorer prognosis in SLE has been observed (Chen et al., 2020; Reid et al., 2020; Sandling et al., 2021), with one study going further to delineate T cell differentiation and innate immunity as the two key axes of SLE association, mediated by HLA and interferons (IFNs) respectively (Sandling et al., 2021). The strong involvement of HLA and IFNs has also been described for SLE pathogenesis (Chen et al., 2017; Villarino et al., 2017; Alunno et al., 2019; Crow and Ronnblom, 2019). Despite its utility, PRS is not yet generalizable beyond the specific population being studied, which further emphasizes the need for larger, diverse and well-represented datasets in order to draw meaningful conclusions (Torkamani et al., 2018). In addition, data generated from GWAS are primarily based on SNP arrays, which are limited by their inability to identify causal variants and ultra-rare mutations, particularly in ethnically underrepresented populations (Tam et al., 2019). NGS techniques thus provide a means of interrogating such variants, which might enrich our knowledge of SLE pathogenesis and aid the clinical diagnosis and management of polygenic SLE, together with the potential use of PRS.
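The rationale for WES can be made concrete with the two figures quoted above (about 85% of disease-related mutations within about 2% of the genome). The following is a minimal arithmetic sketch, not a calculation taken from the cited studies; the function name is hypothetical, and the result only restates the two percentages as a per-base density ratio.

```python
def per_base_enrichment(fraction_of_mutations, fraction_of_genome):
    """Ratio of mutation density inside a region to the density outside it.

    Both arguments are fractions strictly between 0 and 1.
    """
    if not (0 < fraction_of_genome < 1 and 0 < fraction_of_mutations < 1):
        raise ValueError("fractions must lie strictly between 0 and 1")
    inside_density = fraction_of_mutations / fraction_of_genome
    outside_density = (1 - fraction_of_mutations) / (1 - fraction_of_genome)
    return inside_density / outside_density


# Figures quoted in the text: ~85% of disease-related mutations in ~2% of the genome.
exome_enrichment = per_base_enrichment(0.85, 0.02)
print(f"Per-base enrichment in the exome: about {exome_enrichment:.0f}-fold")
```

Under these two approximate figures, a base in the exome is a few hundred times more likely to harbor a known disease-related mutation than a base elsewhere, which is the practical argument for sequencing the exome first.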
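To illustrate how GWAS-identified risk variants can be compiled into a single score, a PRS is commonly computed as a weighted sum of risk-allele dosages, with each weight taken as the natural logarithm of the variant's odds ratio. The sketch below assumes this additive form; the variant identifiers, odds ratios and genotypes are hypothetical and do not come from the cited studies, which may use more elaborate weighting schemes.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Weighted sum of risk-allele dosages (0, 1 or 2) using log(odds ratio) weights."""
    score = 0.0
    for variant, dosage in dosages.items():
        if variant not in odds_ratios:
            continue  # variants without an effect estimate do not contribute
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2")
        score += dosage * math.log(odds_ratios[variant])
    return score


# Hypothetical effect sizes (odds ratios) and one individual's genotypes.
example_odds_ratios = {"variant_A": 1.5, "variant_B": 1.2, "variant_C": 0.8}
example_dosages = {"variant_A": 2, "variant_B": 0, "variant_C": 1}

example_score = polygenic_risk_score(example_dosages, example_odds_ratios)
print(f"PRS = {example_score:.3f}")
```

Because the weights are estimated in a particular GWAS cohort, a score computed this way inherits the ancestry composition of that cohort, which is one reason PRS transfer poorly between populations.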
Across autoimmune diseases, a hallmark of their development is the loss of tolerance to self-antigens, with AIRE and CTLA-4 being implicated in SLE (Pullmann et al., 1999; Ahmed et al., 2001; Hudson et al., 2002; Lee et al., 2005; Cunninghame Graham et al., 2006; Lovewell et al., 2015; Montufar-Robles et al., 2019; Alghamdi et al., 2021). AIRE, or autoimmune regulator, is essential for maintaining central immune tolerance by controlling the negative thymic selection of hyper-reactive T lymphocytes against self-antigens (Yang et al., 2015). Mutations in this gene region have been observed in Norwegian patients with autoimmune polyendocrine syndrome type I (APS-1; (Oftedal et al., 2023)) and Japanese patients with rheumatoid arthritis (RA; (Terao  (Bentham et al., 2015). Next, cytotoxic T-lymphocyte associated protein 4 (CTLA-4, or CD152) is an important checkpoint inhibitor in peripheral immune tolerance that regulates autoreactive T cells via negative signaling (Liu and Zhang, 2013; Van Coillie et al., 2020). Though several reports have identified certain polymorphisms contributing to SLE development (Pullmann et al., 1999; Ahmed et al., 2001; Hudson et al., 2002; Lee et al., 2005; Cunninghame Graham et al., 2006; Jury et al., 2010), meta-analyses have found no association of said variants with lupus (Liu and Zhang, 2013; Alghamdi et al., 2021). In some cases, specific CTLA-4 variants could even contribute to protection against SLE (Barreto et al., 2004), suggesting that only certain variants within the CTLA-4 gene region are associated with SLE development.
Recent studies have described several novel genes associated with SLE following WES analysis in Asian populations, such as the decreased expression of cell division cycle 27 (CDC27) in patients (Shang et al., 2022), and novel variants in genes encoding complement receptor 2 (CR2) (Tang and Luo, 2022), C1R (Demirkaya et al., 2017), NRAS, TNFAIP3 and PIK3CD (Li et al., 2020), WNT16 and ERVW-1 (Chen et al., 2022), and ACP5 and SAMHD1 (Hong et al., 2022). This list of genes contributing to monogenic SLE continues to grow with the increased usage of WES over the past few years, further enriching our knowledge of the genetic contribution to SLE.
As previously mentioned, defects in complement genes have been observed to be a monogenic cause of SLE. Of note, C4, or complement component 4, is usually present in most individuals as two copies each of C4A and C4B. In some cases, SLE patients may carry anywhere from zero to five copies of C4A and zero to four copies of C4B (Yang et al., 2007; Pereira et al., 2019). A recent study has described an association between a low C4A copy number and an increased risk of developing SLE (Kamitaki et al., 2020). Though C4 genes are highly homologous and are usually excluded from variant calling analysis, Lundtoft et al. performed a focused analysis of C4 CNVs via targeted sequencing and found that Scandinavian SLE patients with a low C4A copy number who also carried a common loss-of-function (LoF) variant presented with lowered plasma C4 levels (Lundtoft et al., 2022). Whether this phenomenon extends to other ancestral populations remains unknown and warrants further investigation.
Other genes like FCGR3A and FCGR3B encode low-affinity Fc gamma (Fcγ) receptors for IgG and are crucial in the binding and clearing of immune complexes (Willcocks et al., 2008; Niederer et al., 2010), while CCL3L1 (C-C chemokine ligand 3 like-1) encodes a ligand that binds to C-C chemokine receptor 5 (CCR5) (Gonzalez et al., 2005). Healthy individuals carry two copies of each respective gene, but SLE risk increases when the copy number of these genes is either lower or higher (Willcocks et al., 2008). Increased SLE susceptibility was also observed with low RABGAP1L and high TLR7 copy numbers. RABGAP1L encodes a Rab GTPase-activating protein (Kim et al., 2013), while TLR7 is a key receptor in innate immunity that recognizes single-stranded RNA (Lund et al., 2004; Takeda and Akira, 2005). Lastly, abnormal CNVs in heat shock protein 90 (HSP90), especially in its AB1 isoform, were found to correlate with SLE in the Han Chinese (Zhang et al., 2019). This highlights the importance of CNVs in SLE and autoimmunity, and thus the need for more traction toward implementing a pipeline that includes them in future genetic screens (Zhao et al., 2020).
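The copy number states discussed above can be illustrated with a simple read-depth calculation, one common basis for CNV calling from sequencing data. The sketch below assumes that depth has already been normalized and that a reference depth corresponding to two copies is available; the function names, gene labels and depth values are hypothetical, and real callers must also handle noise, homologous regions and batch effects.

```python
import math


def estimate_copy_number(sample_depth, reference_depth, reference_copies=2):
    """Estimate integer copy number from normalized depth relative to a reference."""
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    ratio = sample_depth / reference_depth
    # Round half up to the nearest whole number of copies.
    return int(math.floor(ratio * reference_copies + 0.5))


def classify_copy_number(copies, expected=2):
    """Label a copy number as a loss, a gain or the expected state."""
    if copies < expected:
        return "loss"
    if copies > expected:
        return "gain"
    return "expected"


# Hypothetical normalized depths; 60.0 is assumed to represent two copies.
hypothetical_depths = {"gene_low": 31.0, "gene_normal": 58.0, "gene_high": 92.0}
example_calls = {
    gene: classify_copy_number(estimate_copy_number(depth, 60.0))
    for gene, depth in hypothetical_depths.items()
}
print(example_calls)
```

Reporting both losses and gains matters here because, for genes such as FCGR3B, deviations in either direction from two copies have been associated with increased risk.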

Identification of potential disease-causing variants
Genetic testing using NGS techniques has identified potential disease-causing variants and led to better preventative risk management of diseases (Shaw et al., 2023). However, given its complexity, the labeling of variants as potentially pathogenic should be done with caution to prevent misdiagnoses. Misclassifying a variant as pathogenic can result in unnecessary medical interventions and cause undue psychological distress to both patients and their families (Manrai et al., 2016; Shaw et al., 2023). Such detrimental consequences have occurred in diseases like hypertrophic cardiomyopathy and cancers, where variants that were thought to be pathogenic were subsequently found to be benign; the initial misclassification arose from the under-representation of certain ancestries in reference control groups (Manrai et al., 2016; Shaw et al., 2023).
To prevent such genetic misclassifications, the American College of Medical Genetics and Genomics has introduced a standardized framework for variant interpretation (Richards et al., 2015). In the case of SLE and other autoimmune diseases, this framework allows pathogenic variants to be better identified prior to further functional validation, thus reducing the occurrence of false positives as the number of sequencing studies continues to rise (Vinuesa et al., 2023). In addition, various consortia such as the Clinical Genome Resource (ClinGen; (Rehm et al., 2015)), the Rheumatologic Autoimmune Clinical Domain Working Group under ClinGen, and Lupus in Minority Populations, Nature versus Nurture (LUMINA; (Alarcon et al., 2001)) have been established to aggregate all available genomic data and concentrate global research efforts. Crucially, the consolidation of genomic data overcomes a major limitation of genome-wide studies: large sample sizes are required because a stringent significance threshold must be adopted to account for multiple testing (Tam et al., 2019).
This framework for variant interpretation, together with genomic data from the various consortia, could potentially be applied to the dysmorphic syndromes associated with SLE, specifically to genes of the Ras/mitogen-activated protein kinase (Ras/MAPK) pathway, to identify with greater certainty the potentially pathogenic genetic variants within this pathway that contribute to SLE (Amoroso et al., 2003; Lisbona et al., 2009; Leventopoulos et al., 2010; Hanaya et al., 2017; Uehara et al., 2018). However, because these associations are currently described only in case reports and case series, functional studies of the different genes in the Ras/MAPK pathway would be needed to delineate the underlying mechanism.
Certain ancestral groups have an increased predilection towards developing SLE (Lewis and Jawad, 2017). Future sequencing studies need to control for this to prevent the misclassification of disease-causing variants that can arise when an adequate ancestry-specific reference genome is unavailable. Past research has largely focused on European ancestry (Yang et al., 2007; Lewis and Jawad, 2017; Hanscombe et al., 2018), leaving too little data from other ancestries to draw meaningful generalizations about the disease. This can be addressed by drawing on several biobanks that have been consolidated over the years to provide greater depth and insight into the genetic differences within and across various ancestries. These include, but are not limited to, the Tohoku Biobank (150,000 participants; (Minegishi et al., 2019)), Mexican Biobank (6,057 participants; (Sohail et al., 2023)), Biobank Japan (BBJ, 260,000 participants; (Kanai et al., 2018)), China Kadoorie Biobank (500,000 participants; (Chen et al., 2011)), H3Africa (70,000 participants; (Consortium et al., 2014; Mulder et al., 2018)), UK Biobank (500,000 participants; (Bycroft et al., 2018; Van Hout et al., 2020; Gaynor et al., 2023)), Michigan Genomic Initiative (MGI, 91,000 participants; (Zawistowski et al., 2023)), Vanderbilt University Biobank (BioVU, 300,000 participants; (Khunsriraksakul et al., 2023)), and SG10K (9,051 participants; (Chan et al., 2022)). It should be noted that SG10K has since been expanded to SG100K, whereby data from 70,000 participants across four national cohort studies will be pooled together with the additional recruitment of 30,000 individuals (Begum, 2022).
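The link between multiple testing and sample size can be made concrete with a Bonferroni correction, in which the family-wise error rate is divided by the number of tests. The sketch below assumes roughly one million independent tests, which is the conventional assumption behind the widely used genome-wide threshold of 5 × 10⁻⁸ and not a figure taken from this Review; the function names are hypothetical.

```python
def bonferroni_threshold(family_wise_alpha, number_of_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if number_of_tests < 1:
        raise ValueError("number_of_tests must be at least 1")
    return family_wise_alpha / number_of_tests


def is_significant(p_value, family_wise_alpha, number_of_tests):
    """True if a single association passes the corrected threshold."""
    return p_value < bonferroni_threshold(family_wise_alpha, number_of_tests)


# Assumption: about one million independent common variants tested genome-wide.
genome_wide_threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"Genome-wide threshold: {genome_wide_threshold:.1e}")
```

Because a variant with a modest effect only reaches such a small p-value in a very large cohort, pooling data across consortia directly increases the number of loci that can be detected.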
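One way such ancestry-diverse reference data can guard against misclassification is by checking whether a candidate variant is too common in any well-sampled ancestry group to plausibly cause a rare disease. The sketch below is illustrative only: the ancestry labels and allele counts are hypothetical, the 5% default is an assumption loosely modeled on the stand-alone benign frequency criterion in the Richards et al. (2015) framework, and the minimum allele count is an arbitrary placeholder.

```python
def allele_frequency(alt_allele_count, total_alleles):
    """Fraction of sampled alleles that carry the variant."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_allele_count / total_alleles


def too_common_in_any_ancestry(counts_by_ancestry, max_frequency=0.05, min_alleles=2000):
    """Return the ancestry groups in which the variant exceeds max_frequency.

    Groups with fewer than min_alleles sampled alleles are skipped, because
    their frequency estimates are too noisy to support a conclusion.
    """
    flagged = []
    for ancestry, (alt_count, total) in counts_by_ancestry.items():
        if total < min_alleles:
            continue
        if allele_frequency(alt_count, total) > max_frequency:
            flagged.append(ancestry)
    return sorted(flagged)


# Hypothetical (alternate allele count, total alleles) per reference group.
hypothetical_counts = {
    "ancestry_1": (12, 100_000),  # rare in a large, well-sampled group
    "ancestry_2": (900, 10_000),  # common (9%) in a second group
    "ancestry_3": (3, 40),        # too few alleles sampled to judge
}
flagged_groups = too_common_in_any_ancestry(hypothetical_counts)
print(flagged_groups)
```

In this example the variant looks rare if only the first group is consulted, yet it is common in the second, which mirrors how under-representation of an ancestry in reference data can let a benign variant be labeled pathogenic.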

Discussion
In this Review, we have provided an overview of various susceptibility genes, identified via NGS techniques, that contribute to the development of SLE through either a polygenic or a monogenic route. We have also highlighted the involvement and importance of CNVs, and urged the use of inclusive control groups that account for ancestral differences to prevent variant misclassification.
Though WES has been well established over the years, notable limitations persist in WES-based CNV analyses. The technique primarily targets coding regions, leading to a restricted view of the genome and potentially missing important regulatory components within non-coding regions such as intergenic or intronic regions (Mandelker et al., 2016; Royer-Bertrand et al., 2021). This significantly reduces the sensitivity of CNV detection. In addition, WES is susceptible to biases, such as GC content bias, which can affect the reliability of CNV calls (Lelieveld et al., 2015). Furthermore, a relatively high false positive rate and the difficulty of achieving homogeneous coverage of sequencing reads prevent its adoption as a gold-standard method for CNV detection (Marchuk et al., 2018; Burdick et al., 2020). These limitations emphasize the necessity of integrating WES with other omics approaches for more accurate CNV detection (Gabrielaite et al., 2021). Nonetheless, with ongoing improvements to sequencing libraries, capture kits and bioinformatics pipelines, it is anticipated that the existing limitations will be alleviated (Zhou et al., 2021). Future applications of third-generation sequencing (TGS) techniques such as long-read sequencing hold promise in addressing these constraints and provide additional possibilities for detecting structural variations (SVs) (Xiao and Zhou, 2020).
Elucidating the pathogenesis of autoimmune diseases like SLE remains complex, and studies have called for a multi-omics approach to advance our current understanding of the disease (Fang et al., 2016; Hedrich, 2017; Kwon et al., 2019; Yaung et al., 2023). Thus far, transcriptomic signatures obtained from blood and tissues have shown an enrichment of genes involved in the IFN response (Banchereau et al., 2016; Der et al., 2019), which corroborates previous genetic data (Baechler et al., 2003; Reynier et al., 2011). Epigenetic modifications such as DNA methylation (Ballestar, 2011; Hedrich, 2017), non-coding RNAs (Taheri et al., 2020) and post-translational histone modifications (e.g., methylation, acetylation; (Hu et al., 2008)) have also been associated with the development of SLE. Isolating biomarkers for diagnosis, management and monitoring through proteomic studies has proven difficult due to the heterogeneity of the disease and its involvement across multiple organs (ref), but current efforts continue to show some promise (Huang et al., 2022; Fasano et al., 2023). Indeed, more needs to be done to reconcile multi-omics and genetic data of SLE in the future.
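GC content bias, mentioned above, is often handled by normalizing read depth within bins of similar GC content before CNV calling. The following is a minimal sketch of one such median-based normalization; it is not the method of any cited tool, the function name, bin width and example values are hypothetical, and production pipelines typically use smoother regression-based corrections.

```python
from collections import defaultdict
from statistics import median


def gc_correct(depths, gc_fractions, bin_width=0.05):
    """Rescale per-target depth so that every GC bin shares the global median depth."""
    if len(depths) != len(gc_fractions):
        raise ValueError("depths and gc_fractions must have the same length")
    # Group target depths by GC-content bin.
    bins = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        bins[int(gc / bin_width)].append(depth)
    global_median = median(depths)
    bin_medians = {index: median(values) for index, values in bins.items()}
    corrected = []
    for depth, gc in zip(depths, gc_fractions):
        bin_median = bin_medians[int(gc / bin_width)]
        # A bin with zero median depth carries no usable signal.
        corrected.append(depth * global_median / bin_median if bin_median > 0 else 0.0)
    return corrected


# Hypothetical targets: the two GC-rich targets are captured at about half depth.
example_depths = [100.0, 110.0, 50.0, 55.0]
example_gc = [0.42, 0.43, 0.71, 0.72]
example_corrected = gc_correct(example_depths, example_gc)
print([round(value, 2) for value in example_corrected])
```

After correction, the GC-rich targets no longer appear as apparent single-copy losses, which is the kind of false positive that uncorrected GC bias can produce.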

Conclusion
Up to 10% of patients below the age of 18 years carry a significant disease-causing variant that manifests as severe SLE, pointing to a monogenic etiology and highlighting the value of performing NGS in children with a very early onset of disease (Alperin et al., 2018; Charras et al., 2021). Previous studies have shown the utility of WES in unraveling novel rare variants and determining their respective contribution(s) to disease (Pullabhatla et al., 2018; Almlof et al., 2019; Tirosh et al., 2019; Almlof et al., 2021). However, genetic variation across ancestries should not be overlooked, so as to prevent variant misclassification and downstream misdiagnoses. This can be controlled for by including genetic datasets from various biobanks, consortia and databases. With that, establishing a pipeline in which WES and CNV detection are coupled will allow for a timely and precise clinical diagnosis of SLE, and in turn better clinical management and intervention.

Search strategy and selection criteria
We searched PubMed between 30 August 2023 and 7 February 2024, using the terms "systemic lupus erythematosus (SLE)", "next-generation sequencing (NGS)", "genomics" and "copy number variation", for articles published from 1 January 2013 until 7 February 2024. Additional articles were identified from the reference lists of articles retrieved through the search. Only papers published in English were reviewed, and the final reference list was generated based on relevance to the scope of this Review.

TABLE 1
NGS techniques and respective applications in SLE studies.
