Application of whole genome and RNA sequencing to investigate the genomic landscape of common variable immunodeficiency disorders

Common Variable Immunodeficiency Disorders (CVIDs) are the most prevalent cause of primary antibody failure. CVIDs are highly variable and a genetic causes have been identified in <5% of patients. Here, we performed whole genome sequencing (WGS) of 34 CVID patients (94% sporadic) and combined them with transcriptomic profiling (RNA-sequencing of B cells) from three patients and three healthy controls. We identified variants in CVID disease genes TNFRSF13B, TNFRSF13C, LRBA and NLRP12 and enrichment of variants in known and novel disease pathways. The pathways identified include B-cell receptor signalling, non-homologous end-joining, regulation of apoptosis, T cell regulation and ICOS signalling. Our data confirm the polygenic nature of CVID and suggest individual-specific aetiologies in many cases. Together our data show that WGS in combination with RNA-sequencing allows for a better understanding of CVIDs and the identification of novel disease associated pathways.


Introduction
Common Variable Immunodeficiency Disorders (CVIDs) are the most clinically prevalent primary antibody deficiencies, present in about 1 in 25,000 people [1]. CVID is characterised by recurrent infections, low serum levels of IgG, low IgA and/or IgM and poor specific antibody responses [2]. CVID is currently a diagnosis of exclusion leaving a highly heterogeneous patient group in terms of clinical features and complications. Some patients have a relatively mild phenotype, comprising of recurrent bacterial infections and infectionrelated complications while others suffer from disease related complications such as autoimmune cytopenias, polyclonal lymphoproliferation, enteropathy and lymphoid malignancy, indicating underlying immune dysregulation [3].
In 2011 Orange et al. published the first genome-wide association study (GWAS) of CVID to identify genomic regions associated with CVID development [30]. Analysis of 363 patients and 3031 controls led to the conclusion that CVID is likely to be a polygenic disease with multiple novel susceptibility loci implicated. However, as of yet this has not resulted in further identification and elucidation of genes or variants that cause or predispose for sporadic CVID emphasizing the difficulties in studying this highly variable disease.
The development of next generation sequencing techniques has transformed the identification of the genetic basis of Mendelian diseases. In contrast, identification of the genetic basis remains challenging in polygenic conditions. Here, we present the first whole genome sequencing (WGS) data for a cohort of CVID patients to investigate novel underlying aetiologies. We further leveraged the potential of WGS by combining the results with global transcriptomic profiling through RNA-sequencing (RNA-seq). Because of the complex and probable polygenic nature of CVID, we combine the identification of genes of interest with pathway-based analysis and focus on combining these results to identify pathways dysregulated in CVID.

Samples
Patients were recruited into the study through the Clinical Immunology Department at the Oxford University Hospital, Oxford. All patients gave informed written consent and the studies were performed according to the Declaration of Helsinki. All 34 patients were of Caucasian origin and met the ESID diagnostic criteria at the time of enrollment [2]. The majority of patients were regularly followed in the Clinical Immunology clinic at 6 monthly intervals over a period of up to 30 years with detailed clinical information entered into the local database that enabled accurate clinical phenotyping. A summary of the clinical phenotype and laboratory characteristics of the patient cohort can be found in Table 1 and a more complete overview can be found in Table S2.

WGS500
A cohort of 34 CVID patients was selected for WGS as part of the WGS500 project [31]. This is a collaborative project between the University of Oxford and Illumina, which aims to sequence the genomes of 500 individuals with a range of diseases including rare inherited diseases, immunological disorders and cancer. For all variants the frequency of the variant in the non-cancer, non-CVID samples of the WGS500 project (n = 239) is listed in Table 2 and  Tables S3-S7 and S15.

Whole genome sequencing
Genomic DNA was extracted from peripheral blood mononuclear cells (PBMCs) using the FlexiGene DNA kit (Qiagen) according to the manufacturer's instructions. The DNA was quantified by fluorescence using the Quibit Fluorometer (Invitrogen) and the quality assessed by running <500 ng on a 1% TAE agarose gel for 1 h at 70 V. Whole genome sequencing was performed on 3.5-7.5 ng DNA on either the Illumina HiSeq2000 or the HiSeq2500 run in standard mode using v2.5 or v3 sequencing chemistry (Core Genomics, WTCHG). Briefly, the genomic DNA was fragmented, end-paired, A-tailed and adapterligated before size selection and amplification for a multiplexed library preparation. The libraries were then paired end sequenced and sequenced to an average coverage between 27 and 40 quality reads per base (100 base pair reads).
confidence filter was applied to exclude poor quality calls (Table 3) [33]. Frequency (1000 Genomes and WGS500) and predicted deleterious filters (SIFT and PolyPhen) were applied to prioritise high quality calls for non-synonymous and likely pathogenic variants (Table 3). IVA allowed the use of custom gene lists (Notes S1 and S3) as biological filters with options for '1 hop' analysis both up and downstream of these genes [34]. The '1 hop' function identifies interactions between genes from literature. Here, this function was used to create the list of variants in the genes interacting with described CVID genes.
Genomic regions described in GWAS were transferred from hg18 into hg19 using http:// www.genome.ucsc.edu/cgi-bin/hgLiftOver. WGS variants were filtered as described in Table 3 and restricted to those in the identified genomic regions [30].
All variants reported (except the results of the pathway analysis) were manually inspected using the Integrative Genome Viewer (IGV) [35]. Heterozygous variants were included if the variant was present in >25% of all reads with >15% in both forward and reverse reads. Homozygous variants were included if present in >80% of all reads and a minimum of 60% of reads in both directions. Variants found in regions with a high density of insertions/ deletions were excluded to reduce false positive calls resulting from mapping errors.

Sanger sequencing
Primers were designed using primer3 software (http://bioinfo.ut.ee/primer3/) (Table S14). 25 ng DNA was amplified using a 25 μl PCR reaction containing 2.5 μl 10× Buffer, 1.5 mM MgCl 2 , 0.2 mM dNTP, 250 nM primer (each direction) and 0.5 U Platinum Taq DNA polymerase (Invitrogen). Cycling conditions were an initial denaturation step at 94 °C for 2-5 min followed by 30 cycles of 94 °C for 30 s, 56-63 °C (Table S14) for 30 s and 72 °C for 40 s followed by a final elongation step of 5 min at 72 °C. The PCR product was cleaned using either a QIAquick PCR purification kit (Qiagen) or by incubating 0.1 μl ExoI and 0.1 μl CIP with 0.4 μl NEB buffer 3 in an 8 μl reaction at 37 °C for 30 min before inactivation at 95 °C for 5 min. 3.5 μl cleaned-up template DNA was sequenced using the Big Dye Terminator 3.1 according to the manufacturer's instructions (Applied Biosystems).

RNA-sequencing
RNA-seq data were generated for enriched B cells from three CVID patients and three healthy controls. PBMCs were isolated using Lymphoprep (Stemcell Technologies) and Bcells were enriched using CD19 microbeads (Miltenyi Biotec) according to the manufacturer's instructions. After isolation, purity of the B-cells was 61.9-96.1% as determined by FACS analysis. RNA was extracted using a RNeasy Mini Kit (Qiagen) following the manufacturer's instructions. Spectrophotometry (Nanodrop 2000; Thermo Scientific) was used to quantitate the RNA yield and the quality was determined by on-chip electrophoresis (Biorad Bioanalyzer; Agilent). RNA-seq was performed on 600-1000 ng RNA on the Illumina HiSeq2500 run in standard mode using v1 sequencing chemistry (Core Genomics, WTCHG). Briefly, the mRNA fraction was selected from the total RNA before conversion to cDNA. dUTP was incorporated from the second strand cDNA synthesis. The cDNA was end-repaired, A-tailed and adapter-ligated. Prior to amplification the samples underwent uridine digestion. The prepared libraries were size selected, multiplexed and QC'ed before paired end sequencing over one lane of a flow cell. Data were aligned to the reference and reads mapped to genes using HTSeq (Core Bioinformatics, WTCHG) [36]. QC to determine the proportion of reads mapping to genes and variation within the dataset was carried out using R (R Core Team, 2012; http://www.R-project.org) (see Fig. S4). The genes with mapped reads were filtered so only those with at least 10 reads in at least three samples were retained prior to normalisation. This resulted in a reduction from 32,426 to 16,546 transcripts. The data were normalised using the TMM method and the edgeR R package [37]. Biological pathway enrichment was generated through the use of QIAGEN'S Ingenuity Pathway Analysis (IPA®, www.qiagen.com/ingenuity, QIAGEN Redwood City).

Whole genome-sequencing and CVID
As part of the WGS500 project aiming to sequence the genomes of 500 individuals with a range of diseases, we performed WGS for a cohort of 32 sporadic CVID patients and one grandmother-grandson pair [31]. The cohort had an equal sex distribution and the median age of disease onset was 23.5 years. All patients met the European Society for Immunodeficiencies (ESID) diagnostic criteria for CVIDs [2]. We excluded patients with lymphomas, though patient C003 developed a B cell lymphoma four years into the study. Information on the clinical phenotypes and immunotypes of the patients can be found in Table 1 and Table S2. WGS using the Illumina platform with an average coverage between 27 and 40 quality reads per base identified 14,819,871 variants (single nucleotide variants (SNVs) and short InDels) across the 34 patients. Initial analysis confirmed a familial relationship between the grandmother (D232) and grandson (C018) pair and revealed long runs of homozygosity in one of the sporadic patients (D269).
To validate our data we first looked for variants previously described to cause or predispose for CVID-like diseases in our patients (variants are listed in Table S1, results are summarised in Table S3). This analysis revealed five previously reported variants in TNFRSF13B, tumour necrosis factor receptor superfamily member 13B (also known as TACI). We identified one patient heterozygous for p.C104R, previously reported to abolish ligand binding in the heterozygous state through haploinsufficiency [24,26,40]. Other TNFRSF13B variants (p.P251L, p.V220A, p.R202H and p.R72H) have originally been linked to CVID, but a later study found equal frequencies in CVID patients and healthy controls as we found here (as compared to the frequency in the 1000 genomes project and the non-CVID WGS500 samples) [24][25][26]41]. Our analysis also identified one patient heterozygous for the p.H304Y variant in the inflammatory modulator gene NLRP12 [23]. We identified two patients with the p.P21R variant (one homozygous) in tumour necrosis factor receptor superfamily member 13C gene TNFRSF13C (BAFFR) [27] which is reported to have functional consequences through effects on multimerisation [42]. We noted poor coverage of TNFRSF13C in our WGS data and subsequent Sanger sequencing of all 34 patients revealed an additional 4 patients heterozygous for this variant (frequency of 0.176 compared to 0.059 in the 1000 Genomes Project) [41]. Details of variants highlighted throughout this report can be found in Table 2.
To identify novel variants of interest we selected high quality, and likely pathogenic variants. Because of the expected polygenic nature of CVID we used the relatively mild selection criteria (1000 Genomes and WGS500 frequency <0.05 and classified as pathogenic or likely pathogenic by Ingenuity Variant Analysis (IVA)), and included both heterozygous and homozygous variants resulting in 4768 variants in 3737 genes (details on the filtering can be found in Table 3). The majority of variants were SNVs (Fig. 1a) in exonic regions together with a number in annotated regulatory regions including promoters, 5′ and 3′UTRs, microRNA binding sites, splice sites and noncoding RNAs (Fig. 1b). In our analysis strategy ( Fig. 1c) we focused on variants present in regions implicated in GWAS [30], genes involved in described primary immunodeficiency disorders (PIDs) or genes with a direct link with previously identified CVID genes. Moreover, we also looked at pathways in which variants were overrepresented.

GWAS and PID genes
The CVID GWAS conducted by Orange and colleagues [30] included 29 of the patients described here. Within the regions associated with CVID development we identified 43 potentially pathogenic rare variants (Table S4) of which 39 are located in the HLA region of chromosome 6. We highlight the variants of interest based on a combination of gene function, frequency (1000 Genomes Project and WGS500 compared to our cohort) and predicted deleteriousness (PolyPhen, SIFT and mutation taster prediction score) ( Table 2) [41,[43][44][45]. Two variants were identified in SKIV2L, a gene associated with trichohepatoenteric syndrome 2 and syndromic diarrhoea, two rare diseases associated with low immunoglobulin levels and the absence of vaccine responses [46]. Four variants (p.R1883Q, p.P1745R, p.R1299Q, p.S1112F) were observed in MDC1, a gene involved in DNA damage response [47]. We also identified six patients (17.6%) with predicted deleterious variants in PRRC2A (p.R804C, p.R1556W, p.R1397W, p.F2083S) a gene associated with type 1 diabetes and rheumatoid arthritis, suggesting a function in immune regulation [48,49]. We did not find any predicted deleterious variants in CLEC16A, a gene recently associated with CVID via GWAS [50].
Next, we hypothesised that a combination of heterozygous variants in genes where homozygous variants cause PIDs (Note S1) may contribute to the pathogenicity of CVID as previously described for rheumatoid arthritis [51]. Analysis resolved 55 variants in described PID genes (Table S5). Again the most likely disease causing variants were identified based on frequency, gene function and predicted deleteriousness ( Table 2). Variants of interest include a novel, predicted damaging p.S2221P/p.S2210P variant in LRBA and heterozygous predicted deleterious variants in other genes in which loss of function leads to low or absent immunoglobulin levels (IKBKB), combined with low or absent B-cell subtypes (TCF3, BTK, CD79A, PMS2, POLE), and/or T-cell dysregulation (IL21R, STIM1, TAP2, DKC1) [52].
We identified five CVID patients with heterozygous variants in genes which are involved in non-homologous end joining (NHEJ) or V(D)J recombination and associated with T-B-SCID (DCLRE1C, PRKDC, RAG2) [52,53]. We found an additional four variants in other PID genes involved in NHEJ (NHEJ1, MRE11A, ATM, NLRP2) [53]. We also identified variants in PID genes involved in B-cell receptor (BCR) signalling (IKBKB, CD79a, BTK, KRAS) including a novel hemizygous variant in BTK (p.N350fs11/p.N526fs11/p.N560fs11) that leads to a frameshift and a premature stop codon. This male patient was previously screened for known BTK mutations and clinically fits the phenotype of X-linked agammaglobulinaemia (XLA) caused by BTK deficiency [52]. Functional follow-up is currently being conducted on this finding.

Genes in sporadic CVID
Next, we further investigated the 31 sporadic CVID patients in our cohort (excluding the grandmother:grandson pair (Note S2) and the possible XLA patient). Filtering as described in Table 3 identified 4422 variants of which 12 were present in five or more patients (Table  S6). Of special interest is a likely gain of function variant in LILRB5, an inhibitory member of the leukocyte Ig-like receptor (LIR) family, of which little is known though other members of this protein family inhibit the activation of immune cells (Table 2) [54].
Likely pathogenic variants in genes for which there is evidence from the literature of direct interaction with genes involved in CVID-like antibody failure were investigated using the '1 hop' function in IVA (Note S3). We identified 112 variants (Table S7) of which 38 were novel, including predicted damaging variants in PSMB9, a gene involved in the production of peptides for MHC class I presentation, and TNFAIP3 which plays a role in TNFα induced apoptosis and at the protein level is known to interact with TNIP1 in which we identified two different novel variants [55]. In addition we found variants in seven genes involved in B-cell receptor signalling (ARID3A, INPP5D, SH3BP2, BANK1, BLK, and GAB2), a variant in CAMLG encoding a protein involved in both TACI and TCR signalling, and a variant in FCGR3B encoding FCγR3B [56][57][58][59][60][61][62]. One patient (C032) was found to have a variant in the 3′UTR regions of two genes (BCL2L11 and EBF1) involved in IL7Rα signalling [63,64]. Dysregulation of this pathway leads to low T-cell numbers and low antibody production [52]. Moreover, EBF1 is a master regulator of early B-cell development in the bone marrow [63,65].
Overall, our analysis strategy to identify variants associated with CVID, variants in GWAS regions, PID genes, the most common variants and variants in genes interacting with described CVID genes (Fig. 1c), identified on average 9.4 variants (range 5-15) in each of the 31 sporadic CVID patients. Two patients do not share any variants with the other patients, while in others, the majority (up to 87%) of variants are also found in one or more other patients ( Fig. S1 and S2). A maximum of five variants are shared between two patients. Fig. 2 shows the overlap of variant-containing genes between patients. Visualising this shows that all patients share at least one variant-containing gene with another patient (Fig. 3). Patients did not cluster together based on their clinical phenotype (IgM+/− and complex vs infections only phenotypes) (Fig S3). We did not have sufficient power to do a comparison analysis between different patient groups.

Integration with RNA sequencing analysis
As a pilot study, RNA-seq data were generated for B-cell enriched samples from three sporadic CVID patients (C078, C085 and C044, selected on B-cell numbers and sample availability) and three healthy controls (Figs. 4a-b; Fig. S4). We performed differential gene expression analysis between CVID patients and healthy controls for 16,546 transcripts revealing 262 transcripts that were differentially expressed (false discovery rate <0.05 and fold change >2) ( Fig. 4c; Table S8). Down-regulated genes included those encoding immunoglobulin heavy and light chains as would be expected. This, together with our ability to reproduce a previous study showing increased FAS (a member of the TNF receptor superfamily encoding a receptor inducing programmed cell death) expression in CVID patients suggests that this small pilot study is able to pick up important differences in gene expression (Fig. 4d) [66]. We also find significantly increased expression of four additional PID-associated genes in the CVID patients ( Fig. 4d and Fig. S) including IL10RA, encoding the receptor for interleukin 10 that mediates immunosuppressive signals. None of the genes located in regions implicated by GWAS or genes highlighted in previous paragraphs ( Table  2) were significantly differentially expressed.
The WGS analysis we performed enables the prediction of variants deleterious to protein function. Combining WGS with RNA-seq allowed us to identify non-coding variants with evidence of altered gene expression. For each of the three patients with RNA-seq data, we applied filters to their WGS data as described in Table 3, with an alteration to the predicted deleterious filter to include variants which modify promoter and enhancer regions in addition to those with predicted deleterious effects to protein function. To determine gain of function variants we identified variants linked to genes with higher expression in CVID patients compared to healthy controls (fold change >2 from median expression) (n = 106, chi squared p value < 0.0001, Table S9). Fig. S6 shows the increased expression of ITGB2, a described PID gene, and ICAM1, which interacts with three CVID related genes (NLRP12, CD81, PLCG2). We also identified variants linked to reduced gene expression (n = 116, chi squared p value < 0.0001, Table S10). These include SH3BP2 which interacts with and activates other CVID and PID genes and VPREB3 which is involved in B-cell development (Fig. S6) [67,68]. No variants linked to altered gene expression are found in the regions associated with CVID identified by GWAS [30].

Pathway analysis using WGS and RNA-seq
Due to the heterogeneous and probable polygenic nature of CVID we hypothesised that different variants or combinations of variants in the same pathway could lead to similar clinical phenotypes. Therefore we focused on the identification of pathways that are possibly dysregulated in CVID. We identified the 100 pathways most enriched for genes with predicted deleterious variants in the 31 sporadic CVID patients based on WGS data (Table  S11) and compared this with the 52 pathways enriched for differentially expressed genes in the three CVID patients compared to healthy controls from Ingenuity Pathway Analysis (IPA) (Table S12). We found that 24 of these pathways overlap ( significantly greater than expected by chance (Fisher's exact test p < 0.0001; Fig. 4e). The presence of 'iCOS-iCOSL signalling' in both analyses confirms previous findings [10,11]. Other pathways of interest include those involved in the development of the immune system ('haematopoiesis from pluripotent stem cells') and regulation of apoptosis ('death receptor signalling'); and also pathways involved in cytokine signalling ('IL-6 signalling', 'IL-10 signalling') and differentiation or signalling of T helper cells ('CD28 signalling', 'PKCθ signalling' and 'T helper cell differentiation').
CVID may result from dysregulation of different pathways in different patients and analysing all 31 patients as a group may not reveal this. Therefore we repeated the WGS pathway analysis on a patient by patient basis to identify potentially important pathways at the individual level. This revealed an average of 5.8 (range 1-18) significantly enriched pathways per patient (Table S13). Pathways of particular interest include 'PI3K/AKT signalling' (three patients) and 'the role of BRCA1 in DNA damage response' (three patients) 'Actin Cytoskeleton signalling' (two patients) and 'DNA Double-strand Break repair by non-homologous end joining' (one patient).

Structural perspective on predicted deleterious variants
To further understand the mechanism by which the predicted deleterious variants (Table 2) could modulate protein structure and function, we mapped them onto available structures. Three selected variants highlighting changes of protein surface properties and interactions between secondary structure elements (helices) are shown in Fig. 5. Changes of protein surface properties could alter protein-protein interactions whereas disruption of bonding within the protein may alter folding, conformation and thus function. All three variants were mapped onto available crystal or solution NMR structures [69,70]. The novel variant p.S2210P that we identified in LRBA was found to affect an evolutionarily conserved, partially buried Ser2210 of LRBA located within the BEACH domain that is crucial for protein function [69]. The side chain of Ser2210 makes multiple hydrogen bonds with Glu2213 and Arg2397 (Fig. 5a). The Ser2210Pro mutation would abolish these interactions and could thus alter conformation and/or stability of LRBA. In contrast, with the variant p.R534W we identified in the B cell receptor signalling protein SH3BP2, the side chain of Arg534 does not interact with any residues within the SH2 domain (Fig. 5b). As Arg534 is exposed on the SH2 surface, the Arg534Trp mutation would make SH3BP2 more hydrophobic and could therefore alter SH3BP2 interactions with its potential binding partners recognising the surface of SH3BP2 centred on Arg534. The pR280W mutation of the deubiquitinase TNFAIP3 (also known as A20) involves Arg280 which is located within the catalytic domain and stabilizes conformation of the domain by binding to Asp279 (Fig.  5c) [70]. The Arg280Trp mutation would abolish these interactions and thus might affect the overall conformation, catalytic activity and/or TNFAIP3-protein interactions. We further assessed whether these three variants could affect protein stability using a programme mCSM [39]. All three mutations are predicted to destabilize the proteins (change in Gibbs free energy, ΔΔG, kcal/mol: LRBA Ser2210Pro, −0.236; SH3BP2 Arg534Trp, −0.461; TNFAIP3 Arg280Trp, −0.138).

Discussion
Here, we present the first WGS analysis of sporadic CVID patients and show that this does not reveal a unifying genetic aetiology in terms of highly penetrant mutations involving a single gene or pathway but rather that CVID is likely to include a number of different disease aetiologies matching a polygenic model. We applied an analysis strategy considering previously implicated candidate genes, loci implicated by CVID GWAS, together with known primary immunodeficiency disorder genes. Further investigation of genes with experimental evidence of interaction with those involved in CVID-like antibody failure was informative. We adopted an integrated analysis that considered WGS data in the context of differentially expressed genes in CVID patients, allowing interpretation in a biological context. We recognise that significantly larger sample sizes would allow further progress in resolving the heritable basis of CVID and resolving the current classification of disease aetiology. Our finding of a novel BTK mutation illustrates the diagnostic power of the WGS approach.
Among the CVID patients analysed, we found variants in TNFRSF13B, TNFRSF13C and NLRP12, consistent with previous findings [23][24][25][26][27]. We also found a novel heterozygous variant in LRBA, which is predicted to influence protein stability. Using our combined analysis strategy, we are able to identify a number of important pathways where we find evidence of pathogenic mutations which together with biological evidence that we and others have generated reveal likely significant pathways in CVID. For example we found significant enrichment in WGS and differential gene expression analysis of ICOS-ICOS ligand signalling confirming the previously found importance of this pathway in antibody failure [10,11].
In one patient we found reduced expression and two variants (one coding and one noncoding) in SH3BP2. The coding variant might alter interactions with potential binding partners. Moreover, in our RNA-seq analysis BCR signalling was one of the most significantly enriched pathways in CVID patients.
Previous studies showed that variants in BAFF-R, a receptor important for B-cell survival, can cause or predispose to CVID [12,27,44]. We, and others, show increased expression of FAS, an apoptosis inducing receptor, in CVID patients [66]. Moreover, we found a variant which might alter the overall conformation of TNFAIP3, a gene inhibiting TNFα induced apoptosis, and novel variants in TNIP1 which is a known TNFAIP3 interaction partner. Death receptor signalling was significant in pathway analysis of both the RNA-seq and WGS analyses together suggesting that apoptosis might be dysregulated in a subgroup of CVID patients. As NHEJ is essential in V(D)J recombination and class-switch recombination, inefficient NHEJ could lead to CVID like symptoms. We find variants in DCLRE1C, PRKDC, RAG2, NHEJ1, MRE11A, ATM and NLRP2 in our patients, all genes important in the initiation of V(D)J recombination or DSB repair by NHEJ. The hypothesis that NHEJ defects could lead to CVID in a subgroup of patients is supported by the recent finding of a leaky block in Bcell development in a study comparing the bone marrow of CVID patients and healthy controls [71].
Our analysis also highlighted the role of T-cells in sporadic, non familial CVID. We found variants in genes associated with T-cell regulation (IL21R, STIM1, TAP2) with WGS. Pathway analysis of both the RNA-seq and WGS data revealed enrichment of a number of T-cell related pathways (T helper cell differentiation; CD28 signalling; ICOS-ICOSL signalling). Dysregulation of T cells could also be the result of pathways found to be significantly enriched in the WGS data related to dendritic cell maturation, interaction between innate and adaptive immune cells and cytokine signalling (IL-18, IL6 and IL10) which would lead to B cell dysfunction through altered B-T cell interactions in CVID patients.
Interestingly, we found a relatively high number of T-cell related pathways in our RNA-seq analysis, having investigated and compared the expression profiles of samples enriched for B-cells. One possible explanation is the presence of residual T-cells in our samples (61.9-96.1% B-cell purity). Alternatively, genes that are important in T-cell signalling pathways could also be expressed in B-cells and therefore be found in our analysis.
In one of our CVID patients we identified variants in the 3′UTR region of two genes involved in IL-7R signalling (BCL2L11 and EBF1) [63,64]. 3′ UTRs are important for the determination of translation efficiency, localization and stability of mRNA. In mice IL-7R signalling and EBF-1 expression have been described as central regulators in B cell development where their interplay positively regulates B lineage commitment, survival and proliferation [64]. In humans, EBF-1 is important in B-cell lineage commitment [63]. In contrast, the role of IL-7R signalling in B-cell development seems less important as patients with IL-7Rα deficiency have normal B-cell numbers.
Together our data illustrate the strength of a combined approach using WGS and RNA-seq and highlight that CVID is a highly heterogeneous and polygenic disease. Our results corroborate with previous findings in TNFRSF13B, TNFRSF13C and NLRP12. Moreover, our data highlight variants in genes involved in ICOS-ICOSL signalling, regulation of apoptosis, and BCR signalling. We were also able to identify novel pathways which may be important for CVID, including NHEJ, T cell regulation and cytokine signalling. However, functional follow-up studies in individual patients are needed to confirm the role of variable pathways involved in the pathogenesis in CVIDs.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.  Overview of analysis strategy and distribution of variant type and genetic location of variants. (a) Distribution of variant type for variants that passed the filtering steps as described in Table 3. (b) Number of variants in different genetic locations in the 4768 variants that passed the basic filtering described in Table 3. (c) Overview of the analysis strategies applied to the WGS data.     points highlight genes with an adjusted p value < 0.05 and fold change >2. A positive fold change indicates upregulation in the CVID patients. Some of the most differentially expressed genes are named. (d) Differential expression of FAS and IL10RA between CVID patients and healthy controls. (e) The 100 most significantly enriched pathways for WGS predicted deleterious variants were reported by IVA. Fifty-two pathways were enriched for differentially expressed genes from the RNA-seq analysis. The overlap of enriched pathways between these two datasets is greater than expected by chance (Fisher's exact test p value < 0.0001).  fragment of human LRBA encompassing the pleckstrin homology (PH, blue) and the beige and Chediak-Higashi syndrome (BEACH, green-to-red) domains (PDB code 1T77) [69]. The side chain of Ser2210 makes hydrogen bonds to Glu2213 (both main chain and side chain) and the side chain of Arg2397. Amino acid numbering corresponds to UniProt entry P50851, isoform 2. (b) Solution NMR structure of the SH2 domain of SH3BP2 (PDB 2CR4; UniProt P78314, isoform 4). The side chain of Arg534 is exposed on the protein surface and does not make any bonds with other residues. (c) The crystal structure of the catalytic domain of TNFAIP3 (PDB 3ZJE; UniProt P21580) [70]. The side chain of Arg280 is exposed on the surface and makes hydrogen bonds to the side chain of Asp279. Overview of clinical information on the 34 CVID patients.  Table 2 Additional details on variants highlighted in the text.  Table 3 Overview of basic filtering applied in the majority of the analyses. Details on the settings used and the rationale behind using these filters and the number of variants left after filtering in either 31 or 34 patients are indicated.