Gene hunting in autoinflammation

Steady progress in our understanding of the genetic basis of autoinflammatory diseases has been made over the past 16 years. Since the discovery of the familial Mediterranean fever gene MEFV (also known as marenostrin) in 1997, 18 other genes responsible for monogenic autoinflammatory diseases have been identified. These genes were discovered using a range of genetic mapping techniques, including next-generation sequencing platforms. This review article describes the gene hunting approaches, methods of data analysis and technological platforms used, which are relevant to all those working within the field of gene discovery for Mendelian disorders.


Introduction
The concept of autoinflammatory diseases was first proposed by McDermott and colleagues 15 years ago upon the elucidation of the genetic cause of TNF Receptor Associated Periodic Syndrome (TRAPS), the second of the prototypic periodic fevers, just two years after the identification of MEFV, the Familial Mediterranean Fever (FMF) gene. They used the term 'autoinflammatory' to reflect the fact that autoantibodies were conspicuously absent and the innate immune response seemed to predominate in the pathogenesis [1]. Two factors indicate autoinflammatory disease: abnormally increased inflammation, mediated predominantly by the cells and molecules of the innate immune system, and significant host predisposition (monogenic or polygenic) [2].
Infevers, an online database of known mutations [3], currently recognises 19 monogenic autoinflammatory diseases for which genes have been identified and mutations are curated. Currently in the UK, six genes can be screened routinely (MEFV, NLRP3, NLRP12, MVK, TNFRSF1A and NOD2); others are sent to reference and research labs around the world. However, as many as 60% of patients with autoinflammatory diseases do not fit the known syndromes and/or screen negative for the known genes [2]. New syndromes are still being reported: in the last two years, genes have been identified for six monogenic autoinflammatory diseases. In addition, certain polygenic diseases previously considered autoimmune are now being considered for reclassification as autoinflammatory, for example Behçet's disease, systemic juvenile idiopathic arthritis (sJIA), ankylosing spondylitis, Crohn's disease, psoriasis and more [4].
With the exception of the cryopyrinopathies, where IL-1 blockade has revolutionised treatment, and FMF, where colchicine is a safe and effective treatment for most, there is no ideal consensus treatment for many emerging new autoinflammatory diseases. As the aetiopathogenesis remains unknown for many of these conditions, treatment relies on non-specific immunosuppression with corticosteroids and/or other empiric trials of immunosuppressants including biologics, sometimes with significant side effects and cost. In extremely severe conditions allogeneic stem cell transplantation may be offered, but this is risky, and if pathogenesis is unknown there is no guarantee that it will be effective, particularly for conditions associated with mutations not restricted to the haematopoietic system. Undoubtedly, identifying causative genes and pathways will lead to better targeted treatment. It will also improve diagnostic screening for other affected family members or unrelated patients with unclassified autoinflammatory diseases. Additionally, identification of novel mutations causing autoinflammation will enable genetic counselling for the family. Thus searching for novel genetic causes of autoinflammatory disease both benefits patients and advances our knowledge of these diseases.

Review
The decision of which genes to screen for mutations can be based on: the function of the gene (candidate gene approach), location in the genome following mapping (linkage analysis and/or homozygosity mapping), or a combination of both. New next-generation sequencing technologies now permit the screening of every gene at once by sequencing whole exomes and even whole genomes. The history of the discovery of the 19 known monogenic causes of autoinflammation provides examples of the various strategies for the discovery of monogenic autoinflammatory diseases, as well as telling the story of the continued evolution in genetic techniques. Genes identified thus far, and the methods used for their discovery are summarised in Table 1. Gene discovery in polygenic disease is more complex, as each genetic variant confers only a small increase in susceptibility. Whilst this review predominantly focuses on monogenic disease, association studies in polygenic disease will also be touched upon.

Candidate gene approach
Sometimes a whole genome candidate gene approach is taken without prior mapping. NLRP12 was identified from a whole genome candidate gene approach; Jeru and colleagues decided to screen NLRP genes in periodic fever patients, as NLRP proteins are involved in the recognition of microbial molecules and the activation of immune responses [25]. Similarly, after observing a positive response to anakinra (the IL-1 receptor antagonist) in a family with a neonatal onset pyogenic disorder, Aksentijevich and colleagues decided to screen genes in the IL-1 pathway and found mutations in IL1RN in 6 families [26]. Sometimes a candidate gene approach is applied to a smaller region after genetic mapping, see below. What is evident from these examples is that insight into the innate immune system and the pathogenesis of known inflammatory disease is useful to inform candidate gene identification in novel diseases.
Table 1 Summary of the genes identified to be mutated in monogenic autoinflammatory diseases, the year they were first recognised and mapping techniques which contributed to this

Genetic mapping
In human genetics, linkage analysis has traditionally been conducted as the first step in mapping a trait. The majority of studies have used a combination of two-point linkage analysis, multipoint linkage analysis, haplotype analysis and, in some cases, homozygosity mapping.
Two-point/pairwise linkage: considers each marker independently calculating the log of odds (LOD) score that each is linked to the disease locus. This method is particularly useful for identifying which of a number of highly polymorphic markers are closest to the mutated gene, and thus is very useful in restriction fragment length polymorphism (RFLP) and microsatellite-based mapping (see below).
Multipoint linkage: makes use of a genetic map, where the order of markers and the distance between them is specified. The probability that the disease locus lies at each point between these markers is then calculated sequentially.
Haplotype Analysis: involves analysing each pedigree in turn; considering the order of the markers along each chromosome and reconstructing the transmission of each allele across each generation and thus inferring the recombination events between markers to narrow down the disease interval. The explicit genotype at each marker which has the correct heritage can additionally be compared between unrelated samples, as if they are identical this may indicate they have the same origin (founder effect).
Homozygosity mapping: is applied to consanguineous families and, given the presumed autosomal recessive inheritance of the disease, identifies regions of the genome which exhibit this inheritance pattern.
A requirement of all these techniques is that uniquely identifiable markers are spaced throughout the genetic region of interest (be it the whole genome, a chromosome, or a smaller candidate region); these are then tested for proximity to the disease locus according to the method being applied. These markers have evolved over the last 15 years, which is reflected in the gene-finding approaches taken in autoinflammatory disease studies as described below.
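The LOD score used in two-point linkage can be illustrated with a minimal sketch. It assumes the simplest textbook case of phase-known, fully informative meioses, in which the likelihood at recombination fraction θ is θ^k (1-θ)^(n-k) for k recombinants in n meioses, compared against the unlinked value θ = 0.5. The function names and the grid search are illustrative choices, not taken from any of the studies cited here; real pedigrees require dedicated linkage software.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**k * (1 - theta)**(n - k).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    log_l_null = meioses * math.log10(0.5)
    if theta == 0.0:
        # A single observed recombinant excludes complete linkage.
        if recombinants > 0:
            return float("-inf")
        return -log_l_null
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Scan theta over 0.00, 0.01, ..., 0.50 and return (best theta, LOD)."""
    thetas = [i / 100 for i in range(51)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


if __name__ == "__main__":
    # Toy input: 1 recombinant observed in 10 informative meioses.
    theta_hat, lod = max_lod(1, 10)
    print(f"theta = {theta_hat:.2f}, LOD = {lod:.2f}")
```

By convention, a LOD score of 3 or more (odds of 1000:1 in favour of linkage) is usually taken as significant evidence of linkage.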

Restriction fragment length polymorphisms (RFLPs)
These are genetic markers where DNA is digested using restriction endonucleases before separation by gel electrophoresis and detection with a probe via Southern blot. A number of linkage studies in medium sized cohorts of families with Familial Mediterranean Fever (FMF) in the early 1990s were based on RFLP markers, identifying the short arm of chromosome 16 as harbouring the gene for FMF in several populations [39-42] (and two studies based on microsatellites, below [5,6]), before the gene was subsequently identified [6,43].

Microsatellites
The development of microsatellite markers allowed for the faster typing of a greater number of markers whilst consuming less DNA. Microsatellites are tandem repeats where the repeating unit consists of 2-6 bases and the repeat array covers between 10 and 1000 bp. They can be amplified by PCR with primers in the unique flanking regions and detected by gel electrophoresis or, with fluorescent primers, on an ABI PRISM genetic analyser, which allows more markers to be detected at once. This type of genotyping has been at the core of mapping for a number of autoinflammatory diseases, including the four most common hereditary recurrent fevers.
McDermott et al. conducted multipoint and two-point linkage analysis using microsatellite markers in a large Irish-Scottish kindred and two additional Irish kindreds suffering from dominant periodic fevers, and identified an 8 cM region on chromosome 12p13. Having observed a lower level of TNF receptor in the blood of patients, they screened TNFRSF1A as a candidate gene within that region [44]. This same region had also just been identified through multipoint linkage, two-point linkage, and haplotype analysis in an Australian kindred of Scottish descent [7].
In 1999, pairwise linkage analysis of microsatellite markers in 13 families was also central to identifying the region harbouring the gene for Hyper IgD syndrome; a candidate gene approach within that region then identified the gene for mevalonate kinase (MVK) [8].
Two-point linkage, multipoint linkage and haplotype analysis on microsatellites were employed in the discovery of NLRP3, the gene mutated in the cryopyrinopathies. The fact that linkage in familial cold autoinflammatory syndrome (FCAS) and Muckle-Wells Syndrome (MWS) identified the same region, 1q44, highlighted that these were two diseases within a spectrum [9,10]. Neonatal-onset multisystem inflammatory disease (NOMID), also known as the chronic infantile neurological cutaneous and articular syndrome (CINCA), was later identified as the third condition to be caused by mutations in this gene after linkage studies prompted Feldman et al. to consider NLRP3 as a candidate gene [11].

Single nucleotide polymorphisms (SNPs)
SNP arrays have further increased the number of markers which can be measured. However, because SNPs are biallelic, heterozygosity at each marker is low, so a multipoint approach to linkage that considers a greater number of markers at once has become critical. Genomic DNA is fragmented and hybridised to the array, which carries DNA oligomers complementary to the sequence adjacent to the SNP. The alternate SNP alleles affect either the binding to the oligomer and attached probe (Affymetrix arrays) or the binding to a subsequent probe after single base pair (bp) extension (Illumina arrays). This results in differential fluorescence of the probe, which is detected when the array is scanned and allows the allele to be called.
PSMB8 has been identified by three different groups looking at patients who were classified as having three differently named diseases. Agarwal et al studied patients with joint contractures, muscle atrophy, microcytic anaemia, and panniculitis-induced lipodystrophy syndrome (JMP) using homozygosity mapping and parametric multipoint linkage based on SNP genotypes from the Illumina HumanOmni1-Quad Beadchip [29]. Arima et al performed homozygosity mapping in five unrelated patients and three unaffected siblings of one of the patients with Nakajo-Nishimura Syndrome (NNS) using Affymetrix GeneChip Human Mapping 500K array set [30]; whilst Kitamura et al used homozygosity mapping and multipoint linkage to study Japanese autoinflammatory syndrome with lipodystrophy (JASL) patients and unaffected siblings from 2 consanguineous families genotyped on the Illumina Human 610 Quad combined with exome sequencing [31]. Both NNS and JASL were found to be caused by the Gly201Val substitution (though Kitamura et al. refer to this as Gly197Val based on a different transcript of the gene), suggesting a founder effect. Meanwhile, Liu et al used a combination of homozygosity mapping with SNP genotypes from Affymetrix GeneChip Human Mapping 250K SNP Array and candidate gene sequencing, with exome sequencing in one patient, to identify PSMB8 in Chronic Atypical Neutrophilic Dermatosis with Lipodystrophy and Elevated Temperature Syndrome (CANDLE) patients. This is a further example of how mapping can support whole exome sequencing [32].
SNP arrays have also enabled genome-wide association studies (GWAS) in polygenic diseases. Association studies employ a case-control experimental design, where DNA markers in a sample of unrelated individuals are measured to search for correlations with phenotype. For each marker the analysis is a simple 2×2 χ² test. GWAS use high-throughput platforms such as SNP arrays to detect whether any of the markers are statistically associated with the disease. However, as each marker counts as an additional independent test, the significance threshold must be corrected for multiple testing, and to surmount this threshold the number of cases and controls needs to be very large. Furthermore, the analysis may be confounded by underlying population structure and ethnicity, owing to differences in allele frequencies. This must be carefully controlled or factored into the analysis. Once an association has been identified, further rounds of testing are needed to confirm the putative association. Many of these studies use a coarse map of markers and then try to replicate the result in a further sample with more markers in the region. A number of GWAS have been undertaken in both Behçet's disease (reviewed in [45]) and in Crohn's disease [46].
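Homozygosity mapping on SNP genotypes, as used in the studies above, essentially amounts to scanning an affected individual's ordered genotype calls for long uninterrupted runs of homozygous markers. The sketch below illustrates that idea only. The input format, the `min_snps` threshold and the treatment of no-calls are assumptions made for the example; published analyses rely on dedicated software and also compare affected with unaffected relatives.

```python
from typing import Optional


def runs_of_homozygosity(
    genotypes: list[tuple[int, Optional[str]]],
    min_snps: int = 3,
) -> list[tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    `genotypes` is a position-sorted list of (position_bp, call) for one
    chromosome, where call is e.g. "AA", "AG", or None for a no-call.
    Assumption: a no-call neither extends nor breaks a run.
    """
    runs = []
    start = end = 0
    count = 0
    for position, call in genotypes:
        if call is None:
            continue
        if len(call) == 2 and call[0] == call[1]:
            if count == 0:
                start = position
            end = position
            count += 1
        else:
            # A heterozygous call terminates the current run.
            if count >= min_snps:
                runs.append((start, end, count))
            count = 0
    if count >= min_snps:
        runs.append((start, end, count))
    return runs


if __name__ == "__main__":
    # Toy genotypes, invented for illustration.
    toy = [(100, "AA"), (200, "GG"), (300, None), (400, "CC"),
           (500, "AG"), (600, "TT"), (700, "TT")]
    print(runs_of_homozygosity(toy))
```

In a consanguineous family, candidate regions would be those runs shared by all affected members and absent in unaffected siblings.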
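The per-marker test and the multiple-testing burden described above can be sketched as follows. The example assumes the 2×2 table holds allele counts (cases versus controls, allele A versus allele B) and uses a Bonferroni correction, which is only one common way of compensating for multiple testing and is named here as an assumption rather than as the method of any cited study. The counts in the usage example are invented.

```python
import math


def allelic_chi2(case_a: int, case_b: int,
                 ctrl_a: int, ctrl_b: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table.

    Returns (chi2, p_value). No continuity correction is applied.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column total must be non-zero")
    chi2 = (n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
            / (row_case * row_ctrl * col_a * col_b))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha: float, n_markers: int) -> float:
    """Per-marker significance threshold after Bonferroni correction."""
    return alpha / n_markers


if __name__ == "__main__":
    chi2, p = allelic_chi2(60, 40, 40, 60)      # toy allele counts
    threshold = bonferroni_threshold(0.05, 500_000)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}, threshold = {threshold:.1e}")
    print("significant" if p < threshold else "not significant")
```

With these toy counts the marker would pass a nominal 0.05 threshold but fall far short of the corrected one, which illustrates why very large numbers of cases and controls are needed.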

Deep sequencing based techniques - whole genome, whole exome and targeted capture sequencing
Whole genome sequencing takes the total genomic DNA from an individual and, after preparing a DNA 'library', sequences the entire genome. In targeted capture, including whole exome sequencing, only the genomic regions of interest (e.g. all exons) are 'captured' from the DNA library. This is done by hybridising the library to complementary DNA or RNA oligos and washing the unbound DNA away before eluting the bound DNA for sequencing. Exome capture kits can be bought 'off the shelf'; three of the biggest retailers of exome capture kits are Nimblegen, Agilent and Illumina, and the exact constitution of the exome depends on the proprietary composition of their capture oligos. Sequencing is then performed on one of the high throughput next generation sequencing (NGS) platforms, including Illumina's Solexa sequencing by synthesis technologies, SOLiD sequencing by ligation, 454 pyrosequencing and Ion Torrent semiconductor technology.
Sequencing is followed by several bioinformatic processes, including: aligning sequencing reads to the reference genome; variant calling, which identifies deviations from the reference sequence; and annotation of variants. Annotation is the process of marking any variants found with information such as nearest gene, predicted consequences at protein level, and frequency in variant databases such as the 1000 Genomes Project and the SNP database (dbSNP), to help eliminate common polymorphisms (occurring in >1% of the healthy population). Based on these factors, certain variants may be selected as 'candidates' for follow-up study with further insight into how the gene functions and whether the suspected variant is likely to interfere with this. Bioinformatic programs which may help with predicting consequences at protein level include PolyPhen [47] and MutationTaster [48].
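The filtering step described above can be sketched as a simple pass over annotated variants. The `Variant` record, its field names and the set of consequence labels are hypothetical, chosen for illustration rather than taken from any particular annotation tool. The 1% frequency cut-off follows the definition of a common polymorphism given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record (field names are assumptions)."""
    gene: str
    consequence: str                       # e.g. "missense", "synonymous"
    population_frequency: Optional[float]  # None if absent from databases


# Assumed set of consequence labels treated as protein-altering.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def select_candidates(variants: list[Variant],
                      max_frequency: float = 0.01) -> list[Variant]:
    """Keep rare (<= max_frequency or unreported), protein-altering variants."""
    candidates = []
    for variant in variants:
        frequency = variant.population_frequency
        is_rare = frequency is None or frequency <= max_frequency
        if is_rare and variant.consequence in PROTEIN_ALTERING:
            candidates.append(variant)
    return candidates


if __name__ == "__main__":
    # Toy variants with placeholder gene names.
    toy = [
        Variant("GENE_A", "missense", 0.0002),
        Variant("GENE_B", "missense", 0.15),    # common polymorphism
        Variant("GENE_C", "synonymous", None),  # no protein change
        Variant("GENE_D", "frameshift", None),  # novel
    ]
    for variant in select_candidates(toy):
        print(variant.gene, variant.consequence)
```

A filter of this kind only narrows the list; the surviving variants still need the functional and genetic follow-up described in the text.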
For whole exome and whole genome sequencing this process can identify thousands of non-pathogenic variants per person. Sequencing multiple unrelated individuals is one solution to filter out non-pathogenic variants. Onoufriadis et al sequenced the exomes of 5 unrelated Europeans with generalised pustular psoriasis. In two of them they found the same homozygous mutation and a third was found to be a compound heterozygote in IL36RN [34], the gene now known to be associated with the autoinflammatory disease called deficiency of the interleukin 36 receptor antagonist (DITRA). Concurrent studies by another group combined homozygosity mapping with next generation sequencing, again leading to the identification of IL36RN, and emphasising the utility of combining these approaches for gene discovery [33].
Identifying the causal variant can be an enormous task if multiple unrelated patients are unavailable. For each variant the inheritance model can be considered: for example, if the disease is recessive, is the variant homozygous? Runs of homozygous variants can also be identified effectively for homozygosity mapping [49]. The heterozygous variants in a sequenced exome which fit autosomal dominant inheritance are more numerous than the homozygous or compound heterozygous variants which fit a recessive disease. Sequencing other family members can be informative to reconstruct the inheritance of variants. Using Agilent SureSelect 50Mb exome capture and SOLiD sequencing in a trio (an affected father and daughter and unaffected mother) enabled the identification of PLCG2 as the gene mutated in Autoinflammatory PLCG2-associated antibody deficiency and immune dysregulation (APLAID) syndrome, an autosomal dominant disease [35].
Very rare SNP variants can be determined from deep sequencing, which may be an asset for genome-wide association studies of polygenic complex diseases [50]. This approach could be beneficial in polygenic autoinflammatory diseases such as periodic fever, aphthous ulceration, pharyngitis, and adenitis syndrome (PFAPA) and Behçet's disease, and has been applied in a small cohort in Crohn's disease [51]. However, this is computationally complex and requires the typing of many samples from large patient cohorts to be sufficiently powered to detect genetic associations.
Increased sensitivity of the NGS technologies is also enabling detection of mutations in known genes present in a minority of cells, i.e. the detection of somatic mosaicism. Approximately 50% of CINCA patients who have the classic features have no detected mutation in NLRP3, so-called "mutation negative CAPS". A significant proportion of these patients may have somatic mosaicism in their leucocytes causing their disease, but at a percentage of affected cells which is too low to be detected by conventional Sanger sequencing. Sensitivity of detection can be increased by the process of cloning into vectors, as these cells make up a small percentage (4.2-38.5%) of the total leucocyte population [52]. Alternatively, next generation sequencing technologies may detect low levels of somatic mosaicism and could become part of routine genetic screening in the future for CAPS, since NLRP3 and other related genes can be covered with sufficient read depth [Omoyinmi et al., manuscript submitted].
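The logic of sequencing several unrelated patients can be sketched as counting, for each gene, how many patients carry at least one candidate variant in it. The function name, the input layout and the placeholder gene names are assumptions made for this illustration, not the pipeline of the studies cited above.

```python
from collections import Counter


def genes_shared_across_patients(
    candidates_by_patient: dict[str, list[str]],
    min_patients: int,
) -> list[str]:
    """Return genes with candidate variants in at least `min_patients` patients.

    `candidates_by_patient` maps a patient ID to the genes in which that
    patient carries a filtered candidate variant (duplicates are allowed and
    count once per patient).
    """
    counts: Counter[str] = Counter()
    for genes in candidates_by_patient.values():
        counts.update(set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_patients)


if __name__ == "__main__":
    # Toy data with placeholder gene names.
    toy = {
        "patient_1": ["GENE_A", "GENE_B", "GENE_C"],
        "patient_2": ["GENE_A", "GENE_D"],
        "patient_3": ["GENE_A", "GENE_A", "GENE_E"],
        "patient_4": ["GENE_B", "GENE_F"],
        "patient_5": ["GENE_G"],
    }
    print(genes_shared_across_patients(toy, min_patients=3))
```

Requiring a gene to recur in only a subset of patients, rather than in all of them, allows for genetic heterogeneity within the cohort.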
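Filtering by inheritance model in a small family can be sketched as below for the autosomal dominant case, where a candidate must be heterozygous in every affected member and absent from every unaffected member. Genotypes are encoded as the number of alternate alleles (0, 1 or 2). The data layout, sample names and variant identifiers are hypothetical, and the sketch assumes full penetrance and no genotyping error.

```python
def dominant_candidates(
    calls_by_variant: dict[str, dict[str, int]],
    affected: list[str],
    unaffected: list[str],
) -> list[str]:
    """Return IDs of variants consistent with autosomal dominant inheritance.

    A variant is kept if it is heterozygous (1 alternate allele) in every
    affected sample and homozygous reference (0) in every unaffected sample.
    Variants with a missing call in any listed sample are skipped.
    """
    candidates = []
    for variant_id, calls in calls_by_variant.items():
        carried = all(calls.get(sample) == 1 for sample in affected)
        absent = all(calls.get(sample) == 0 for sample in unaffected)
        if carried and absent:
            candidates.append(variant_id)
    return candidates


if __name__ == "__main__":
    # Toy trio: affected father and daughter, unaffected mother.
    toy_calls = {
        "var_1": {"father": 1, "daughter": 1, "mother": 0},  # fits the model
        "var_2": {"father": 1, "daughter": 1, "mother": 1},  # also in mother
        "var_3": {"father": 1, "daughter": 0, "mother": 0},  # not transmitted
        "var_4": {"father": 1, "daughter": 1},               # mother uncalled
    }
    print(dominant_candidates(toy_calls, ["father", "daughter"], ["mother"]))
```

Each additional sequenced relative removes further variants from the list, which is why family members are informative even when unrelated patients are unavailable.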
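Why read depth matters for detecting mosaicism can be illustrated with a small calculation. If a heterozygous mutation is carried by a fraction f of cells, roughly f/2 of the reads covering that position would be expected to carry it. The sketch below asks how likely the observed number of variant reads would be from sequencing error alone, using a binomial model. The per-base error rate of 1% is an assumed placeholder, not a figure from the source, and the read counts are invented.

```python
import math


def variant_allele_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def prob_from_error_alone(alt_reads: int, depth: int,
                          error_rate: float = 0.01) -> float:
    """P(at least `alt_reads` variant reads) if every one were an error.

    Upper tail of Binomial(depth, error_rate), summed in log space so that
    large depths do not overflow.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    log_error = math.log(error_rate)
    log_correct = math.log1p(-error_rate)
    total = 0.0
    for i in range(alt_reads, depth + 1):
        log_term = (math.lgamma(depth + 1) - math.lgamma(i + 1)
                    - math.lgamma(depth - i + 1)
                    + i * log_error + (depth - i) * log_correct)
        total += math.exp(log_term)
    return min(total, 1.0)


if __name__ == "__main__":
    # Toy example: 50 variant reads out of 1000 (allele fraction 5%).
    print(variant_allele_fraction(50, 1000))
    print(prob_from_error_alone(50, 1000))
    # The same allele fraction at low depth is much harder to distinguish
    # from error: 1 variant read out of 20.
    print(prob_from_error_alone(1, 20))
```

The comparison between the two depths shows that the same low allele fraction becomes distinguishable from background error only when coverage is deep.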

Conclusion: clinical application and ethical considerations
We are in the midst of a revolution in diagnostic genetics in the field of autoinflammation. NGS technologies now allow rapid detection of novel genes causing autoinflammation in selected patients. In addition, techniques such as whole exome sequencing are increasingly being used in routine clinical practice as diagnostic tools to rapidly screen many genes, particularly for diseases where there is a phenotypic overlap and many possible genetic causes. In particular, carbohydrate deficient glycoprotein syndromes and Charcot-Marie-Tooth disease are specific examples where whole exome sequencing allows rapid screening of many genes with greater efficiency and less cost than conventional Sanger sequencing [53]. Clearly, autoinflammatory diseases would lend themselves well to this sort of diagnostic approach since it is likely that in the future tens or even hundreds of genes will be implicated either singly or in combination as the cause of autoinflammation in certain patients. NGS technologies also bring new scientific challenges and ethical dilemmas. Our ability to discover new candidate genes now mandates that we develop robust functional experimental approaches to confirm that these genes are truly pathogenic, often the most challenging aspect of gene discovery in practice. At the same time, we are faced with new ethical considerations: the consequence of screening entire exomes and genomes means that unrelated/coincidental genetic disease may be identified. This raises complex ethical questions in relation to whether these findings should be communicated to the patient/family. This important issue in relation to "return of results" is reviewed elsewhere [54].
In the field of autoinflammation, however, irrespective of these important ethical dilemmas, these new technologies have undoubtedly propelled us into an era of intense gene discovery and greater understanding of the immune system and its regulation that will rapidly be translated into better diagnostics and therapeutics for these patients.