Massive parallel IGHV gene sequencing reveals a germinal center pathway in origins of human multiple myeloma.

Human multiple myeloma (MM) is characterized by accumulation of malignant terminally differentiated plasma cells (PCs) in the bone marrow (BM), raising the question of when during maturation neoplastic transformation begins. Immunoglobulin IGHV genes carry imprints of clonal tumor history, delineating somatic hypermutation (SHM) events that generally occur in the germinal center (GC). Here, we examine MM-derived IGHV genes using massive parallel deep sequencing, comparing them with profiles in normal BM PCs. In 4/4 presentation IgG MM, monoclonal tumor-derived IGHV sequences revealed significant evidence for intraclonal variation (ICV) in mutation patterns. IGHV sequences of 2/2 normal PC IgG populations revealed dominant oligoclonal expansions, each expansion also displaying mutational ICV. Clonal expansions in MM and in normal BM PCs reveal common IGHV features. In such MM, the data fit a model of tumor origins in which neoplastic transformation is initiated in a GC B-cell committed to terminal differentiation but still targeted by on-going SHM. Strikingly, the data parallel IGHV clonal sequences in some monoclonal gammopathy of undetermined significance (MGUS) known to display on-going SHM imprints. Since MGUS generally precedes MM, these data suggest origins of MGUS and MM with IGHV gene mutational ICV from the same GC B-cell, arising via a distinctive pathway.


INTRODUCTION
In multiple myeloma (MM), malignant plasma cells expressing CD138, a marker of terminal B-cell differentiation, accrue in the bone marrow (BM). An early question in MM has been the nature of the cell of origin, or normal B-cell counterpart, in which neoplastic transformation begins. Addressing this question, immunoglobulin variable gene (IGV) analyses in MM have provided pivotal insights.
IGV genes encode the antigen-binding domain of the immunoglobulin molecules in the B-cell receptor (BCR), a receptor that is essential to survival and maturation of normal B-cells [1]. Pathways of B-cell maturation are determined by the context in which antigen is seen via the BCR. In the presence of cognate T-cell help, a distinctive phase of differentiation is initiated in the germinal center (GC), in secondary follicles of lymphoid organs, that leads to the mainstay of memory, generating memory B-cells or long-lived plasma cells (LLPCs) located in the BM [2,3]. The GC can be morphologically compartmentalized into two areas, a dark zone (DZ) of proliferating B-cells or centroblasts (CBs) that downmodulate BCR and express CXCR4, and a light zone (LZ) of small B-cells or centrocytes (CCs) expressing BCR and CXCR5 [4]. Affinity maturation to improve BCR fit for antigen occurs by targeted somatic hypermutation (SHM) of IGV genes, initiated by activation-induced cytidine deaminase (AID) that is expressed at high levels in CBs in the DZ [2-6]. Following proliferation, selection of B-cells by antigen occurs after migration to the LZ in CCs, and is dependent on affinity for antigen complexes presented by follicular dendritic cells and interaction with follicular TH (TFH) cells [7]. Precise imaging experiments have shown that CBs and CCs shuttle bi-directionally between the DZ and LZ [8], to permit continued induction of SHM. Isotype class switch recombination (CSR) also occurs in the GC, and can be initiated in cells on the cusp of proliferating clonally [9]. CSR is an irrevocable process of deletional DNA recombination events dependent on AID activity [6]. B-cells selected by antigen exit the GC to two fates, circulating as CD27+ human memory B-cells or as cells committed to LLPC maturation that home and reside in the BM [3-5, 10, 11].
www.impactjournals.com/oncotarget
IGV gene analysis in B-cell neoplasms can delineate transit via the GC, and the precise pattern of SHM in clonally-related transcripts provides clear insight into when neoplastic transformation is likely to have occurred [12]. In MM, early data from heavy chain IGHV gene analyses from our group and others revealed an extensive SHM load in tumor-derived sequences, and sequencing of tumor transcripts segregated by standard cloning strategies and Sanger sequencing revealed intraclonally homogeneous IGHV sequences, consistent with transformation occurring at a post-follicular stage with SHM silenced [12]. However, when we examined IGHV gene sequences in monoclonal gammopathy of undetermined significance (MGUS), we observed that tumor-derived IGHV sequences revealed a marked intraclonal variation (ICV) in mutation patterns in some cases, implicating neoplastic arrest in MGUS with ICV at an earlier stage of maturation, consistent with exposure of the cell of origin to on-going SHM in the GC [13]. Furthermore, when we examined progression of MGUS to MM in paired cases by IGHV analysis, we could locate variant MGUS-like IGHV mutations in MM clonal sequences in one MGUS-MM pair [14]. At that point in our studies, we suggested that MM evolves by cloning out of MGUS-like cells [14]. Given the technical limitations of the cloning and sequencing strategies used earlier to analyze the MM clone, and our initial observations that clonal variants could persist in MM from an MGUS stage, we reasoned that if IGHV sequencing depth were increased, as is now feasible with next-generation sequencing, it might be possible to observe ICV more commonly in MM, with new implications for tumor origins.
To do this, we have carried out massive deep parallel sequencing of IGHV genes in 4 of 4 symptomatic IgG MM cases at presentation, comparing findings with 2 of 2 cases of normal PCs purified from BM and 1 control IgA MM case in which non-tumor IgG transcripts were amplified. Our data substantiate our hypothesis, and we find clear evidence for ICV among clonal MM-derived IGHV sequences. This suggests as the most likely model that MM can originate in a GC B-cell committed to terminal maturation but still targeted by SHM.

Deep sequencing IGHV gene IgG transcripts and error rate
IGHV gene IgG transcripts from malignant and normal BM PCs were analyzed by next-generation deep sequencing. Following read-pair alignment and sequence quality filtering, the total number of high quality reads for both technical replicates in each donor ranged from 97,698 to 2,009,338. The percentage of reads for each unique IGHV sequence was congruent between the two technical replicates (Figure S1). To increase confidence that reads represented true sequence variants rather than errors introduced by base mis-incorporation during PCR or sequencing mis-reads, sequence variants were only analyzed where they were present at high frequency in both technical replicates. No clonotype sequences with high read counts were exclusive to a single technical replicate.
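The replicate-concordance rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `concordant_variants` and the threshold `min_fraction` are hypothetical, since the text specifies only that variants had to be present "at high frequency" in both technical replicates.

```python
from collections import Counter


def concordant_variants(rep1_reads, rep2_reads, min_fraction=0.001):
    """Keep sequences whose read fraction reaches min_fraction in BOTH replicates.

    rep1_reads / rep2_reads: iterables of read sequences (strings).
    min_fraction: hypothetical threshold; the source does not state a value.
    Returns {sequence: (fraction_in_rep1, fraction_in_rep2)}.
    """
    counts1, counts2 = Counter(rep1_reads), Counter(rep2_reads)
    total1, total2 = sum(counts1.values()), sum(counts2.values())
    if total1 == 0 or total2 == 0:
        return {}
    kept = {}
    # Only sequences observed in both replicates can pass the filter.
    for seq in counts1.keys() & counts2.keys():
        f1 = counts1[seq] / total1
        f2 = counts2[seq] / total2
        if f1 >= min_fraction and f2 >= min_fraction:
            kept[seq] = (f1, f2)
    return kept


if __name__ == "__main__":
    # Toy reads, not data from the study.
    rep1 = ["AAA"] * 90 + ["AAT"] * 9 + ["ACA"] * 1
    rep2 = ["AAA"] * 85 + ["AAT"] * 15
    print(sorted(concordant_variants(rep1, rep2, min_fraction=0.05)))
```

A variant seen in only one replicate (here `ACA`) is dropped regardless of threshold, which matches the rationale of excluding PCR or sequencing errors that are unlikely to recur independently.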

IGHV gene use in MM
Single tumor-derived clonal IGHV sequences were readily identified in 4 of 4 MM cases with identical CDR3 nucleotide motifs to establish monoclonality, and utilized germline genes IGHV2-70D*04, IGHV3-30*18, IGHV4-39*01, and IGHV4-59*01 respectively, each displaying marked imprint of SHM with % homology to germline varying between 89.3 and 94.6 (Figure 1). In these samples, the same CDR3 sequence was found in 74.0-87.3% of reads, surpassing cut-offs of >5% used to identify clonal IGHV gene use by deep sequencing [15]. Total reads from 2 replicates with identical CDR3 in respective cases were: MM1 (1,660,375); MM2 (1,232,656); MM3 (1,218,020); and MM4 (2,009,338) (Table S1). No dominant clonal IgG sequence was identifiable in control MM5 case, as expected.
Figure 1: IGHV sequences identified from deep sequencing were aligned by computational analysis with donor germline IGHV genes and CDR3 identity used to delineate tumor-derived clonal sequences. Only selected informative codons are shown, focusing on revealing replacement nucleotides that arise by SHM for purposes of clarity. The % homology to germline IGHV gene of dominant clonal sequence in MM1-4 is stated in each case, and the dominant sequence is shown aligned to the germline gene. Data on the top 10 clonally-derived MM tumor IGHV sequences are shown aligned to the dominant sequence or top read, and not the germline gene, to reveal the significant intraclonal variation observed in mutation patterns (dots denote sequence identity to the top read). For MM1 and MM2 aligned sequence matches start at position 43, for MM3 at position 65, and for MM4 at position 68. Translation to deduced amino acid (aa) sequences are shown aligning germline aa with replacement aa resulting from mutations in tumor IGHV gene sequences (denoted with *, and no change as -). No dominant clonal IGHV gene sequence was identified in the control MM5.

Pattern of SHM in MM-derived IGHV gene sequences
Tumor-derived subclonal sequences as shown were identical, e.g. in MM1 the dominant subclonal sequence gave a count of 2862 and these were all homogeneous in sequence (Figure 1). In MM1, the dominant subclonal homogeneous sequence and the second-most prevailing variant sequence were comparable in number. In 3 of 4 MM cases (MM2-4), the dominant subclonal sequence far surpassed in number the second-most predominant variant sequence. These data indicate that the level of ICV differed between the 4 MM cases and that in cases MM2-4 the tumor clones are dominated by a homogeneous clonal outgrowth.
Nevertheless, variant sub-clonal sequence reads, over and above the dominant sequence or top read, were substantial, comprising 99.8% of the entire tumor clone in MM1, 50.7% in MM2, 53.0% in MM3 and 30.7% in MM4 (calculated from reads in Table S1).
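As a worked check of how such a proportion can be derived, the short Python sketch below computes the variant fraction of a clone as one minus the share of reads belonging to the top read. It assumes that the MM1 top-read count of 2862 and the MM1 total of 1,660,375 reads with identical CDR3, both quoted in the text, are the relevant numerator and denominator; the function name `variant_fraction` is illustrative only.

```python
def variant_fraction(top_read_count, clone_read_count):
    """Fraction of clonal reads that differ from the dominant (top) sequence."""
    if clone_read_count <= 0:
        raise ValueError("clone_read_count must be positive")
    if not 0 <= top_read_count <= clone_read_count:
        raise ValueError("top_read_count must lie between 0 and clone_read_count")
    return 1.0 - top_read_count / clone_read_count


if __name__ == "__main__":
    # MM1 values as quoted in the text (assumed to be the matching pair).
    mm1 = variant_fraction(2862, 1_660_375)
    print(f"MM1 variant reads: {100 * mm1:.1f}% of the clone")  # 99.8%
```

Under that assumption the result reproduces the 99.8% reported for MM1; the top-read counts for MM2-4 are not given in the text, so they are not recomputed here.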
Strikingly, in each of the 4 MM cases, there is extensive evidence of ICV as revealed by multiple variant IGHV sequences derived from a single tumor clone (Figure 1). These data derive from 4 'biological replicates' (MM1-4) in experimental terms and yield reproducible results.

Profile of non-tumor PCs in MM samples
IgG transcript profiles in MM1-4 indicated that profiles of non-tumor IgG PCs were markedly depressed (Figure 2).

IGHV gene use in normal PCs
A diverse repertoire of IGHV gene use was observed in IgG transcripts in normal PCs in samples NPC1-2; however, most of these were expressed at low frequencies (data not shown). IgG transcripts were grouped by IGHV gene germline use and expression levels mapped on a linear scale based on % frequencies (Figure 2). The overall use of IGHV2-5 germline gene was the largest and common to NPC1-2 and MM5, followed by frequency of usage of the IGHV1-69 gene in these 3 samples (Figure 2). In the control IgA MM5 case, usage of germline IGHV gene elements resembled that seen in NPC1-2 (Figure 2).
Oligoclonal expansions were, however, extensively observed in NPC1-2 based on identical CDR3, and the largest expansion accounted for 2.4% of all IgG transcripts in NPC1 and for 7.8% in NPC2. Our data also indicate that in deep sequencing of IgG transcripts in MM, the cut-off required to identify the tumor clone is likely to be higher than the >5% previously used, and should exceed the largest normal expansion observed here (>7.8%).
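The effect of the cut-off on calling a dominant clone can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code of the study: the function name `dominant_cdr3`, the CDR3 strings, and the toy read counts are hypothetical, while the 5% and 7.8% figures come from the text.

```python
from collections import Counter


def dominant_cdr3(cdr3_per_read, cutoff=0.05):
    """Return (cdr3, fraction) for the most frequent CDR3 if it exceeds cutoff.

    cdr3_per_read: one CDR3 string per read.
    cutoff: minimum fraction of reads; 0.05 mirrors the >5% cut-off cited in
    the text. Returns None when no CDR3 exceeds the cutoff.
    """
    counts = Counter(cdr3_per_read)
    total = sum(counts.values())
    if total == 0:
        return None
    cdr3, n = counts.most_common(1)[0]
    fraction = n / total
    return (cdr3, fraction) if fraction > cutoff else None


if __name__ == "__main__":
    # Toy repertoire: one expansion at 7.8% of reads, the rest unique CDR3s.
    reads = ["CARDYW"] * 78 + [f"C{i}" for i in range(922)]
    print(dominant_cdr3(reads, cutoff=0.05))  # flagged at the 5% cut-off
    print(dominant_cdr3(reads, cutoff=0.10))  # not flagged at a stricter cut-off
```

With the toy input, an expansion the size of the largest normal PC expansion would be flagged as clonal at 5% but not at a stricter threshold, which is the point made in the paragraph above.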

Pattern of SHM in clonally related IGHV gene sequences in normal PCs
The largest oligoclonal expansions of IgG transcripts in NPC1-2 were evaluated, and subclonal variants related by identical CDR3 in each expansion were aligned at the nucleotide level, using the top 10 most prevalent variant sequences (Figure 3). In 2 of 2 normal BM PC populations, oligoclonal expansions revealed marked ICV (Figure 3), comparable in mutational load and variation to that observed in MM1-4. Analysis of phylogenetic trees indicated comparable degrees of branching, and therefore exposure to comparable rounds of SHM, in the evolution of clonally-related normal PCs and of clonally-related malignant PCs (Figure 4).

DISCUSSION
On-going SHM occurs in B-cells during extensive proliferation within the DZ of a GC, whereas in the LZ key selection events dictate B-cell fate [1,4,5]. Improved affinity for antigen by SHM allows preferential access to TFH cells in the LZ for extended stimuli, which appear to dictate either re-entry to the DZ or differentiation [7,16]. The precise nature of signals and immunological synapses that determine fate as memory B-cell or PC in the GC however are as yet not fully defined, but are orchestrated by specific transcription factors. Of these, IRF4 is required for induction of AID and CSR and for derivation of PCs [17]. Specifically, graded expression of IRF4 regulates AID levels and induction of Blimp-1, an obligate requirement for terminal differentiation of PCs [18]. Interestingly, a 'pre-PC' stage has been described that delineates GC B-cell commitment to PC fate and precedes Blimp-1 expression and is able to undergo SHM, also expressing Flt3 and Embigin that are repressed by Pax5 at an earlier stage [19]. Normal B-cells that undergo SHM but fail to improve fit for antigen undergo apoptosis by a default mechanism [2,20,21]. This check-point however may be by-passed by transformation, as suggested by GC B-cells that carry the t(14;18) translocation in follicular lymphoma [22].
The clonal nature of PCs derived from selection events in the GC can be gauged from IGHV gene analysis in data from the T-dependent response to a (4-hydroxy-3-nitrophenyl) acetyl-protein conjugate antigen in murine B-cells, seeking to assess a role for affinity in antibody secreting cell (ASC) selection and migration to the BM [23]. These experiments clearly show clonal waves of isotype switched antigen specific ASCs migrating to the BM, displaying clonally-related VH186.2 gene sequences with marked ICV. These observations also indicate that clonal variants that are antigen-specific are selected in the GC to mature sequentially or synchronously to ASCs for homing to the BM, with late GC emigres displaying further evidence of accrual of SHM imprints in VH186.2 sequences [23], consistent with re-entry to DZ of antigen-selected B-cells.
Data from our analysis of IGHV genes in normal human PCs from the BM reflect these features of selection in antigen-specific murine ASCs [23]. We observed oligoclonal expansions of normal PCs with marked ICV in each. Comparable observations were also reported in a study of normal BM PCs from a single donor by deep sequencing, with the largest IGHV3-7 derived clonal expansion providing evidence of ICV to a level comparable to that seen in our data on normal PCs [24]. These findings delineate that LLPCs in the BM comprise multiple variant subclones within oligoclonal expansions to specific antigens, which persist over time at this site. Normal LLPCs in the BM are known to derive mainly from GC B-cells [11,25].
In our earlier study of MGUS, IGHV gene analysis carried out by conventional cloning and sequencing revealed monoclonal tumor-derived sequences with marked ICV in 3 of 7 cases [13], mirroring characteristics of oligoclonal expansions of normal LLPCs in the BM. The IGHV gene data on MGUS suggested origins from a GC B-cell that is exposed to on-going SHM and has acquired a neoplastic genomic Event 1 to by-pass the default apoptotic fate dependent on antigen selection pressure [13,14]. Such a cell of origin in MGUS would be predicted to be committed transcriptionally to a PC fate, but able to undergo further rounds of SHM in the DZ. The inference from the MGUS findings is that a comparable normal GC B-cell may exist, committed to terminal maturation but able to re-enter the DZ and undergo SHM, possibly a 'pre-PC' [19].
In contrast to MGUS, in our and others' initial studies of MM-derived IGHV sequences, clonally homogeneous patterns of mutations were identified in individual MM cases, including in plateau phase of disease, but these data were based on analysis of limited numbers of clonally-related sequences [12]. The data suggested that neoplastic transformation occurs at a post-follicular stage in MM, with the cell of origin no longer exposed to SHM [12]. However, in our IGHV gene study of 2 matched pairs of MGUS that evolve to MM, we identified that a single clone evolves in each case, and observed that in 1 of 2 matched pairs variant IGHV tumor sequences persisted to the MM stage of disease [14]. These appeared to be derived from 'MGUS-like' cells in the MM clone [14].
Based on these initial observations, we hypothesized that clonal tumor-derived variants may be more common in symptomatic MM at presentation, and that they could be traced in IGHV genes with the enhanced sensitivity of deep next-generation sequencing. In the present study, we indeed readily identified clonal IGHV gene sequences with ICV in 4 of 4 typical MM cases, substantiating our hypothesis that this may be a frequent feature of MM disease. This finding has only emerged from massive parallel sequencing of IGHV genes, and was not readily apparent from lower depth of analysis by conventional methods. These data are consistent with the concept that the cell of origin in these MM arises during the GC reaction, from a B-cell continually targeted by SHM but committed to terminal maturation, generating the ICV we observe in clonal IGHV sequences and mirroring origins of MGUS (Figure 5). Given the seminal observations that most if not all MM originate directly from MGUS [26,27], our data suggest that MM with ICV in IGHV gene mutations arises from the same GC cell of origin as in MGUS with ICV in IGHV gene mutations, with disease presenting first as MGUS.
Although in 3 of 4 MM cases, clonal outgrowth is dominated by a single sequence subclone, the remaining fractions of variant sub-clonal sequences with ICV are nevertheless highly significant. This emerges when we examine the known MM tumor mass. In IgG MM the total tumor cells per patient has been calculated as 0.5-3.1 × 10¹² based on data of in-vivo incorporation of ¹²⁵I into IgG [28]. To illustrate this, in MM2 50.7% of the clonal reads are sub-clonal variant sequences with ICV over and above the dominant sequence. Taking the lower estimate of tumor mass, 50.7% of 0.5 × 10¹² equates to 0.25 × 10¹² transformed cells. Given that normal PCs only account for ~0.5-1.0% of mononucleated cells in the BM, and a total number of ~10⁹ comprising the complete polyclonal pool [29], the 'variant' tumor mass in MM1-4 can only be generated by proliferating transformed cells that persist.
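The arithmetic in this paragraph can be restated as a short Python sketch. All numerical inputs are taken from the text (the lower tumor-mass estimate of 0.5 × 10¹² cells, the MM2 variant fraction of 50.7%, and a normal polyclonal PC pool of ~10⁹ cells); the function and variable names are illustrative, and the sketch assumes, as the paragraph does, that read fractions translate directly into cell fractions.

```python
def variant_cell_estimate(total_tumor_cells, variant_read_fraction):
    """Estimated number of tumor cells carrying variant (non-dominant) IGHV sequences.

    Assumes the fraction of variant reads reflects the fraction of variant cells.
    """
    if not 0.0 <= variant_read_fraction <= 1.0:
        raise ValueError("variant_read_fraction must be between 0 and 1")
    return total_tumor_cells * variant_read_fraction


if __name__ == "__main__":
    lower_tumor_mass = 0.5e12   # lower estimate of total tumor cells [28]
    mm2_variant_fraction = 0.507
    normal_pc_pool = 1e9        # approximate polyclonal normal PC pool [29]

    variant_cells = variant_cell_estimate(lower_tumor_mass, mm2_variant_fraction)
    print(f"Variant tumor cells in MM2: {variant_cells:.3e}")
    print(f"Fold excess over normal PC pool: {variant_cells / normal_pc_pool:.1f}")
```

On these inputs the variant population alone is roughly 250-fold larger than the entire normal polyclonal PC pool, which is the basis for the conclusion that it must arise from proliferating transformed cells.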
Substantiating our observations, a recent study of a single MM case by massively parallel pyrosequencing also reported a dominant single clonal IGHV sequence with ICV [24]. Furthermore, in an initial report using the Lympho-SIGHT™ platform of high throughput sequencing of IGHV genes, in 401 MM cases, 71 (17.7%) showed evidence of clones related through SHM to the index clone at diagnosis [30]. Significantly, however, in the latter preliminary report [30], no conceptual framework was addressed to describe a possible pathway of clonal origins in MM in relation to the complexity of events in the GC, nor any association made with SHM patterns that we previously reported in MGUS [13]. Nevertheless, this initial report as an Abstract [30] appears to firmly substantiate our data that intraclonal variation in tumor-derived IGHV sequences occurs in MM. Additional high-throughput sequencing of IGHV genes in MM with the Lympho-SIGHT™ platform has also been reported directed at investigating minimal residual disease [15].
SHM levels in both MGUS and MM clonal IGHV genes are comparable in mutational load to normal LLPCs in the BM that largely derive from GC B-cells [11,25], supporting the concept of GC origins of transformed PCs. MGUS cells retain expression of CD27 in most cases, whereas it is lost more frequently by aberrant MM PCs [31]. CD27 expression is acquired via GC trafficking [10].
With regard to a GC cell of origin that undergoes SHM and is committed to terminal maturation, previous investigations of IgD+ GC B-cells are noteworthy. In early studies, a subset of sIgM−IgD+CD38+ GC centroblasts were identified that have undergone class switch by Cμ-Cδ deletion, exhibit a marked λ light chain restriction and are supramutated in IGHV genes, but only differentiate to PCs in-vivo and do not circulate as memory B-cells [32]. Of high relevance, these IgD+ GC B-cells have also been proposed as the cell of origin in rare IgD+ MM disease, which reveals comparable genetic features of supramutated IGHV tumor-derived genes, a λ light chain bias and switch by Cμ deletion [32,33].
Deletional CSR events occur in T-dependent antigen GC responses in B-cells [2,4,7]. Most MM are derived from isotype switched cells, and in MM that switch to IgG on the functional allele, a single switch event occurs that is retained and can be tracked as a signature nucleotide motif in relapse disease [34]. This is an important observation, as it shows clonal origins in MM from a single switch event that correctly ligates CSR on the functional allele. Such a switched cell must then re-enter the DZ to be subjected to further SHM to generate the ICV we observe in IGHV gene sequences in MM cells (schematically represented in Figure 5). The non-functional allele however provides important clues that the stage of deletional CSR events associates with the impact of Event 1 in clonal origins of MM. Dysregulated CSR activity on this allele generates chromosomal translocations that map to the IgH 14q32 locus, mainly to SH switch sites, and this has been widely recognized as an early pathogenetic event in MM [35,36]. Aberrant deletional CSR events most likely occur in a background of genomic instability that may have resulted from Event 1 (Figure 5), possibly of an epigenetic nature, but these as yet remain unidentified. Event 1 is also predicted to allow escape from the normal antigen affinity associated GC check-point to by-pass apoptosis. AID activity underpins aberrant CSR events, as evident in B-cell tumors characterized by IgH locus translocations [37]. Several 14q32 translocations have recurrent partner chromosomes, restricted to subsets, but none are universal to MM disease [35]. It is conceivable that in MM lacking aberrant 14q32 chromosomal translocations deletional CSR progresses without being dysregulated, and that the nature of Event 1 may differ. In MGUS, a comparable spectrum of 14q32 chromosomal translocations also occurs [35], and may also associate with timing of Event 1 in the GC-derived cell of origin.
It is generally accepted that secondary event(s) or lesion(s) underlie malignant transformation of MGUS to MM. At present, it is unclear from our data whether the variant sequences identified in monoclonal MM IGHV transcripts are derived from residual 'MGUS-like' cells or whether these cells persist as subclonal expansions because they have independently acquired separate secondary hit(s) as Events 2-N (Figure 5). These events may include mutations in key genes that when estimated as % of the tumor population emerge as subclonal events in different MM patients: KRAS (20-72%), NRAS (32-96%), BRAF (36-92%) or DIS3 genes (29-81%) [38]. Given that a long time-lag is usual in MGUS transforming to MM, subclonal neoplastic events could occur independently in cells with variant IGHV sequence. Data from mapping the genome in MM with aCGH and at the exome and whole-genome levels substantiate clonal evolution, where SNP abnormalities or acquired somatic mutations in individual tumors are shared only by some cells not others in individual cases, and on-going subclonal competition appears to promote tumor survival and progression [38-42]. Events 2-N in progression may also include epigenetic genome modifications or dysregulation at the miRNome level and microenvironment dependent modulation of tumor cells.
Figure 5: Data in MM suggest a model in which a germinal center (GC) B cell that has undergone neoplastic Event 1, possibly at the stage of deletional class-switch recombination, re-enters the GC dark zone and is subjected to on-going somatic hypermutation (SHM). It is postulated that such a cell may be a 'pre-PC' committed to terminal maturation but able to undergo SHM, which exits the GC to home to the bone marrow to accumulate as MGUS. Subsequent sum of neoplastic Events 2-N in MGUS cells generate malignant MM, which retain the imprint of ICV resulting from the GC cell of origin being exposed to on-going SHM in tumor-derived IGHV genes.
Many mutations identified in MM cells already exist at the MGUS stage of disease [43,44], and while many acquired somatic mutations are common to disease subsets none are universal, further closely associating the molecular pathways of origins in MGUS and MM.
The GC emerges as a crucible in genesis of MGUS and MM via a distinctive pathway in which the cell of origin is targeted by on-going SHM. The question now is what the nature(s) of Event 1 or additional lesions might be in GC origins of MGUS and MM. This pathway may also lead to MM in which clonal IGHV sequences are homogeneous even by deep sequencing, through a 'cloning out' phenomenon; alternatively, in clonally homogeneous MM the cell of origin may arise later with SHM silenced, as we had proposed earlier [13,14]. Fully elucidating these pathways will crystallize origins of MM, and inform models of disease [45,46].

Patients and samples
Bone marrow samples of MM patients and normal donor (NPC1) were obtained for study with written informed consent in accordance with the Declaration of Helsinki and as approved by institutional ethics review committees. NPC1 was a donor undergoing orthopedic corrective surgery with no underlying disease. NPC2 cells were purchased from DV Biologics (Costa Mesa, CA, USA) as Bone Marrow Mononuclear Cells.
MM1-4 cases were previously untreated with symptomatic disease and CRAB features requiring active therapy (MM4 was plasma cell leukemia, with tumor cells purified from blood): MM1 IgGκ, MM2 IgGκ, MM3 IgGκ, and MM4 IgGλ. MM5 was an IgAκ tumor, previously treated and in 2nd relapse, and used as a control to assess any bias in deep sequencing IgG transcripts. MM1-4 samples were at diagnosis prior to therapy.

RNA
Total RNA was extracted from MM tumor cells and NPC using the Allprep or RNeasy kits (Qiagen, Crawley, UK) in accordance with the manufacturer's protocol.

IGHV gene sequencing
Each RNA sample was separated into two and treated thereafter as two technical replicates.
IGHV region amplicons were generated from cDNA by PCR using individual pools of forward primers within framework region 1 (FR1) that were designed to amplify all known IGHV region alleles, and a reverse primer within the IgG constant region. Both primer sets incorporated Illumina P5 or P7 adaptor sequences at their 5' ends to facilitate sequencing. We used Phusion Flash High-fidelity Taq polymerase (Life Technologies, UK) for PCR, which has a reported error rate of 4.4 × 10⁻⁷/bp/PCR cycle [47].
Amplicons were purified using an eGel Size-Select electrophoresis system (Life Technologies, UK) to select products within the anticipated size range of approximately 400-450 bp. 250 bp paired-end sequencing was performed on an Illumina MiSeq sequencer using a pool of read1 sequencing primers matching the pool of FR1 primers but omitting the Illumina adaptor sequence, an indexing primer to provide indexing information and a read2 primer matching the IgG constant region amplification primer that also lacked the Illumina adaptor sequence.
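To put the polymerase error rate quoted above in context, the following Python sketch gives a rough estimate of the expected number of mis-incorporated bases per amplicon molecule. The error rate (4.4 × 10⁻⁷ per bp per cycle) and the amplicon size range (approximately 400-450 bp) are from the text; the cycle number is an assumption, as it is not stated here, and the linear approximation (rate × length × cycles) is a simplification that ignores the branching structure of PCR.

```python
def expected_pcr_errors(error_rate_per_bp_per_cycle, amplicon_length_bp, cycles):
    """Approximate expected polymerase errors per final amplicon molecule.

    Linear approximation: each cycle is treated as one copying event of the
    full amplicon. Sequencing errors are not included.
    """
    if error_rate_per_bp_per_cycle < 0 or amplicon_length_bp < 0 or cycles < 0:
        raise ValueError("inputs must be non-negative")
    return error_rate_per_bp_per_cycle * amplicon_length_bp * cycles


if __name__ == "__main__":
    rate = 4.4e-7          # per bp per PCR cycle, as reported in the text [47]
    length = 450           # upper end of the selected amplicon size range
    assumed_cycles = 30    # ASSUMPTION: cycle number is not given in the text
    print(f"{expected_pcr_errors(rate, length, assumed_cycles):.5f}")
```

Under these assumptions well under 1% of molecules would be expected to carry a polymerase-derived error, which is consistent with the decision to treat high-frequency variants present in both technical replicates as genuine.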

Sequence quality control, paired end joining and filtering
Sequence read-pairs were combined using the Flash utility [48]. Sequence pairs which did not meet the quality criteria or which were shorter than 300 bp once combined were excluded from further analysis.
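The length criterion described above can be sketched in Python as follows. This covers only the stated 300 bp minimum for combined reads; the quality criteria are not specified in the text and are therefore not implemented. The function name `filter_merged_reads` and the (name, sequence) tuple representation are hypothetical and do not describe the output format of the Flash utility.

```python
def filter_merged_reads(reads, min_length=300):
    """Keep merged reads whose combined length is at least min_length bases.

    reads: iterable of (read_name, sequence) tuples for already-merged pairs.
    min_length: 300 follows the text (reads shorter than 300 bp were excluded).
    """
    return [(name, seq) for name, seq in reads if len(seq) >= min_length]


if __name__ == "__main__":
    # Toy merged reads, not data from the study.
    toy_reads = [
        ("read_a", "A" * 420),  # within the expected amplicon size range
        ("read_b", "C" * 299),  # one base too short, excluded
        ("read_c", "G" * 300),  # exactly at the minimum, retained
    ]
    for name, seq in filter_merged_reads(toy_reads):
        print(name, len(seq))
```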

Sequence analysis
All assembled variable region nucleotide sequences were processed using the VDJFasta utility [49]. VDJFasta uses a Hidden Markov Model to statistically analyze sequences upstream and downstream of putative CDR3s and outputs V, D and J germline sequences, CDR3 sequences and translated protein sequences derived from each read. To increase processing speed, sequence processing using this utility was parallelized using custom Perl and Python scripts and run on the parallel computing facility provided by the University of Edinburgh Compute and Data Facility (ECDF, http://www.ecdf.ed.ac.uk/).

Neighbor-joining trees
For each donor, the 10 most prevalent variable region sequences derived from a clonal expansion with identical CDR3 were aligned using Clustal Omega, phylogenetic trees were calculated using the Neighbor-Joining algorithm [50] and trees rendered using

Authorship contributions
GC designed the IGHV gene deep sequencing strategy and bioinformatics analyses. GC, NWB and DB carried out the experiments and analyzed the data. AS and DH provided MM cases and analyzed the data. NZ analyzed the data and provided critique. SSS conceived and supervised the study, analyzed data and wrote the paper with GC and NWB, with input from AS, DH and NZ.