B cell clonal expansion and convergent antibody responses to SARS-CoV-2

During virus infection B cells are critical for the production of antibodies and protective immunity. Establishment of a diverse antibody repertoire occurs by rearrangement of germline DNA at the immunoglobulin heavy and light chain loci to encode the membrane-bound form of antibodies, the B cell antigen receptor. Little is known about the B cells and antigen receptors stimulated by the novel human coronavirus SARS-CoV-2. Here we show that the human B cell compartment in patients with diagnostically confirmed SARS-CoV-2 and clinical COVID-19 is rapidly altered with the early recruitment of B cells expressing a limited subset of V genes, and extensive activation of IgG and IgA subclasses without significant somatic mutation. We detect expansion of B cell clones as well as convergent antibodies with highly similar sequences across SARS-CoV-2 patients, highlighting stereotyped naïve responses to this virus. A shared convergent B cell clonotype in SARS-CoV-2 infected patients was previously seen in patients with SARS. These findings offer molecular insights into shared features of human B cell responses to SARS-CoV-2 and other zoonotic spillover coronaviruses.

B cells expressing a limited subset of V genes, and extensive activation of IgG and IgA subclasses without signi cant somatic mutation. We detect expansion of B cell clones as well as convergent antibodies with highly similar sequences across SARS-CoV-2 patients, highlighting stereotyped naïve responses to this virus. A shared convergent B cell clonotype in SARS-CoV-2 infected patients was previously seen in patients with SARS. These ndings offer molecular insights into shared features of human B cell responses to SARS-CoV-2 and other zoonotic spillover coronaviruses.

Background
The novel human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiological agent of the coronavirus disease 2019 (COVID-19)1,2 pandemic. Prior to the emergence of SARS-CoV-2, six human coronaviruses (hCoVs) were known; four seasonal hCoVs (hCoV-229E, -NL63, -HKU1, and -OC43)3 causing usually mild upper respiratory illness, and the two more recently discovered SARS-CoV4 and MERS-CoV5 viruses that arose from spillover events of virus from animals into humans. It is expected that humans are naïve to SARS-CoV-2 and will display a primary immune response to infection. Humoral immune responses will likely be critical for the development of protective immunity to SARS-CoV-2, but little is known about the B cells and antigen receptors stimulated by infection.

Sars-cov-2 Infection Causes Global Changes In The Antibody Repertoire
High-throughput DNA sequencing of B cell receptor (BCR) heavy chain genes de nes clonal B cell lineages based on their unique receptor sequences, and captures the hallmarks of clonal evolution, such as somatic hypermutation (SHM) and class switch recombination, during the evolving humoral response6. To study the development of SARS-CoV-2-speci c humoral responses, we collected peripheral blood samples from six patients admitted to Stanford Hospital with signs and symptoms of COVID-19.
All patients were con rmed positive for SARS-CoV-2 RNA as determined by quantitative reverse transcription PCR (RT-qPCR). Serology of SARS-CoV-2 -speci c IgA, IgG, and IgM antibodies showed early detection of these antibody ( Fig. 1a and Extended Data Fig. 1). Immunoglobulin heavy chain (IGH) repertoires were sequenced and compared to a healthy human control (HHC) dataset7 matched by age and number of B cell clones (Fig. 1b, top panel). In healthy subjects at baseline, IgM and IgD sequences are primarily derived from naïve B cells with unmutated IGHV genes, and typically have small clone sizes, whereas isotype switched cells (IgA and IgG compartments) have elevated SHM with varying clone sizes. SARS-CoV-2 seroconverted patients (7450, 7452, and 7454), and patients 7453-D2 and 7455, in contrast, display a highly polyclonal burst of IgG-expressing clones in the blood with little to no SHM. Seronegative samples (7451 and 7453-D0) show IGH repertoires similar to uninfected controls, suggesting an earlier stage in the infection at the time of sample collection. Fig. 1. COVID-19 patient IGH repertoires show early and extensive class-switching to IgG and IgA subclasses without signi cant somatic mutation. a, Blood was collected from six COVID-19 patients at admission to hospital (D0) and for one patient again two days later (D2). Colored bars indicate assays and experiments performed on samples (RT-qPCR: green; Serology: orange; high-throughput sequencing (HTS): blue). b, Points indicate B cell clonal lineages, with the position denoting the clone's isotype (panel column), human healthy control (HHC) or patient ID (panel row), IGHV gene (x-axis, with IGHV gene in the same order and position in panels, but not listed by name due to space constraints), and CDR-H3 length (y-axis within each panel). The point color indicates the mean IGHV SHM frequency for each clone and the size indicates the number of unique reads grouped into the clone. Points are jittered to decrease overplotting of clones with same IGHV gene and CDR-H3 length. Patient label colors indicate seroconversion (blue), seronegative (red), and no serology performed (black). See Extended Data Fig. 2 for an overview of the IgG4 and IgE subclasses.
As suggested by Fig. 1b, COVID-19 patients have a signi cantly increased fraction of unmutated and low mutation (<1% SHM in IGHV gene) clonal lineages among the antigen-experienced, class switched IgG subclasses (IgG1: p-value = 0.00112; IgG2: p-value = 0.000667; IgG3: p-value = 0.000882; IgG4: p-value = 0.00667) (Fig. 2a). Among some seroconverted subjects (7452 and 7454), 50% or more of the IgG3, IgG1, and IgG2 compartments are in this low mutational state, compared to <1% of lineages from corresponding HHC compartments (Fig. 2a). A similar in ux of low mutation lineages into the IgG compartment has previously been observed in response to Ebola virus (EBOV) infection8. In contrast to EBOV, we observe that COVID-19 primary infection stimulates polyclonal B cell responses with both IgG and, in some patients, IgA subclasses, rather than predominantly IgG alone (Fig. 1b). The SARS-CoV-2 response features a preponderance of IgG1-expression among IgG subclasses (Fig 2b), with median usage of IgG1 being almost double the proportion seen in HHC B cells (p-value = 2.065e-05). IgA1 is also signi cantly increased (p-value = 0.0123) relative to IgA2. This suggests an in ux of recently recruited naïve B cells in the response. The virus-induced response is polyclonal with diverse clonal lineages using a wide range of IGHV genes (Fig. 2c). However, we observed skewing of the responding IGH repertoires away from the usually most frequently utilized IGHV genes such as IGHV3-23, towards IGHV3-30 and IGHV3-9 usage, with particular enrichment of IGHV3-9 in IgG-expressing B cells.
The selection of particular IGHVs in response to an antigen has been observed in other antiviral responses, such as the preference for IGHV1-69 in response to some in uenza virus antigens9. Highly utilized IGHV genes display SHM patterns consistent with the observed frequencies of unmutated or low mutation clonal lineages among switched isotype compartments (Fig. 2a), with low median IgG1 SHM that ranges from 2.2-5.4%, and higher median IgA1 SHM that ranges from 6.2-8.6% (Fig. 2d). clones are signi cantly lower in COVID-19 patients compared to healthy controls in sequences expressing all IgG and IgA subclasses (p-value < 0.01 for IgG3, IgG4, IgA1, and IgA2; p-value < 0.001 for IgG1 and IgG2), and signi cantly higher in IgM (p-value < 0.001) and IgD (p-value < 0.01). There is a strong correlation between the fraction of serum anti-SARS-CoV-2 receptor binding domain (RBD) IgG levels and the fraction of IgG-expressing clonally-expanded lineages in patient (R = 0.95, p-value = 0.014) (Fig 3c). Points are jittered on the x-axis to decrease over-plotting of samples with the same value (y-axis). p-values were calculated by two-sided Wilcoxon-Mann-Whitney tests: ***p-value < 0.001; **p-value < 0.01; *p-value ≤ 0.05; NS: p-value > 0.05. c, Correlation between the serum anti-SARS-CoV-2-RBD levels measured by ELISA (x-axis) and the fraction of expanded clones with SHM ≥ 1% (y-axis). Clones expressing IgG were correlated with levels of anti-SARS-CoV-2-RBD-IgG antibodies; clones expressing IgM were correlated with levels of anti-SARS-CoV-2-RBD-IgM antibodies. Each point represents a patient sample. Shaded regions indicate 95% con dence intervals. R2 and p were measured by Pearson's correlation. Only individuals for whom serology was performed are shown (7450, 7451, 7452, 7453, and 7454).

Convergent Antibody Rearrangements Are Elicited In Covid-19 Patients
Although antigen-driven antibody responses are diverse between individuals, we and others have previously identi ed patterns of highly similar, "convergent" antibodies shared by individuals in response to Ebola virus8, and different convergent antibodies stimulated by Dengue virus12.
These convergent antibodies usually make up a small proportion of the total virus-speci c B cell response in each individual8. To identify putative SARS-CoV-2-speci c antibody signatures, we rst grouped heavy chain sequences together that utilized the same IGHV and IGHJ gene family, and CDR-H3 regions with the same length. We then clustered these groups by single-linkage with 85% identity in the CDR-H3 amino acid sequence and identi ed sequences that were found in at least two COVID-19 patients but were absent from the 114 HHC repertoires. We identi ed 124 such convergent clusters involving clonal lineages from all six patients. The number of convergent clusters that each patient contributed to varied from 12 clusters in patient 7451 to 65 clusters in patient 7455. 106 clusters were shared pairwise between two patients, 16 clusters spanned three patients, and two clusters spanned four patients (Fig.  4a). Clusters that spanned four subjects included a cluster of lineages using IGHV3-30-3 and IGHJ6 with a 14 amino-acid CDR-H3, which was expanded across different isotypes in 7453-D2 and 7454 with SHM frequencies generally below 2% and was detected as IgD in 7451 and IgG1 in 7455. We further detected one to three convergent clones in samples 7450, 7452, and 7453-D2 that were expanded as per the gDNA replicate analysis. Samples with expanded clones by this de nition expressed IgM, IgG1-3, IgA1, and IgA2 (Fig. 4b). We next sought to identify SARS-CoV-2-speci c sequences homologous to known sequences that recognize SARS-CoV-2 or the related betacoronavirus, SARS-CoV. Sequences convergent to known SARS-CoV-speci c antibody IGH sequences13 were identi ed in one COVID-19 patient (7453-D2) but were not detected in the HHC samples (Fig. 4c). We also identi ed one IGH (IGHV3-13, IGHJ4, and CDR-H3 length of 14), recently reported to cross-react to SARS-CoV and SARS-CoV-214, which clustered with IGH sequences identi ed in patient 7455 (Fig. 4c). patients (columns, four-patient clusters are highlighted in pink) are plotted by the expressed isotype (xaxis). Fill color indicates the average SHM and point size shows the number of unique reads. b, Expanded convergent clone counts in COVID-19 patients. Each point represents the number of expanded convergent clones (y-axis) detected in a patient sample that express the speci ed isotype (x-axis). Point colors indicate serology not tested (black), seronegative (red), and seropositive (blue). c, Sequence alignment of CDR-H1, CDR-H2, and CDR-H3 amino acid residues of anti-SARS-CoV convergent IGH (rows). Sequences were aligned against a SARS-CoV and SARS-CoV-2 cross-reactive IGH (top) or SARS-CoV-speci c IGH (bottom) sequences (labels in bold). Light blue highlighting indicates sequence differences. Dots indicate homology to the germline sequence (labels are underlined).
Our results indicate that the IGH repertoires of patients with diagnostically con rmed SARS-CoV-2 RNA and COVID-19 infection are rapidly remodeled in response to this novel virus. A burst of expanded clones with low SHM, shifts in the utilization of IGHV genes and subclasses, and longer and more hydrophobic CDR-H3 regions dominate the molecular changes observed in IGH. We observe convergent antibody responses shared with those seen in patients infected with SARS-CoV, highlighting common modes of human antibody response shared between SARS-CoV-2 and other spillover CoVs. Efforts to detect individuals who have been exposed to SARS-CoV-2 and have recovered after mild or asymptomatic infection have revealed delayed serum antibody responses and low levels of speci c antibodies in some individuals15. Convergent SARS-CoV-2-speci c antibody sequences in the memory B cell pools of recovered individuals could offer an independent kind of evidence of prior exposure and potential for more rapid secondary responses upon re-exposure to the virus. Longitudinal tracking of IGH repertoires in larger patient cohorts, correlation with clinical outcomes, and further investigation into the binding properties and functional activity of convergent responding clones, are required to better understand the human antibody responses to this novel human SARS-CoV-2.

Data Availability
All data is available in the main text or the extended materials. The IGH repertoire data for this study have been deposited to SRA with accession number SUB7246339.

Study Design
Patients admitted to Stanford Hospital with signs and symptoms of COVID-19 and con rmed SARS-CoV infection by RT-qPCR of nasopharyngeal swabs were recruited. Venipuncture blood samples were collected in K2EDTA-or sodium heparin-coated vacutainers for peripheral blood mononuclear cell (PBMC) isolation or serology on plasma, respectively. Recruitment of COVID-19 patients, documentation of informed consent, collections of blood samples, and experimental measurements were carried out with Institutional Review Board approval (IRB-55689).

Healthy adult controls (HHCs)
The data set containing control immunoglobulin receptor repertoires has been described previously7. In summary, healthy adults with no signs or symptoms of acute illness or disease were recruited as volunteer blood donors at the Stanford Blood Center. Pathogen diagnostics were performed for CMV, HIV, HCV, HBV, West Nile virus, HTLV, TPPA (Syphilis), and T. cruzi. Volunteer age range was 17-87 with median and mean of 52 and 49, respectively.

Molecular and serological testing on COVID-19 patients
SARS-CoV-2 infection in patients was con rmed by reverse-transcription polymerase chain reaction testing of nasopharyngeal swab specimens, using the protocols described in16,17. Plasma antibody testing for IgG and IgM speci c for SARS-CoV-2 spike protein receptor binding domain (RBD) was carried out with an enzyme-linked immunosorbent assay based on the protocol and antigen protein production described in18.

HTS of immunoglobulin heavy chain (IGH) libraries prepared from genomic DNA and cDNA
The AllPrep DNA/RNA kit (Qiagen) was used to extract genomic DNA (gDNA) and total RNA from PBMCs.
For each blood sample, six independent gDNA library PCRs were set up using 100 ng template/library (25ng/library for 7453-D0). Multiplexed primers to IGHJ and the FR1 or FR2 framework regions (3 FR1 and 3 FR2 libraries), per the BIOMED-2 design were used19 with additional sequence representing the rst part of the Illumina linkers. In addition, for each sample, total RNA was reverse-transcribed to cDNA using Superscript III RT (Invitrogen) with random hexamer primers (Promega). Total RNA yield varied between patients and between 6 ng-100 ng was used for each of the isotype PCRs using IGHV FR1 primers based on the BIOMED-2 design19 and isotype speci c primers located in the rst exon of the constant region for each isotype category (IgM, IgD, IgE, IgA, IgG). Primers contain additional sequence representing the rst part of the Illumina linkers. The different isotypes were ampli ed in separate reaction tubes. Eightnucleotide barcode sequences were included in the primers to indicate sample (isotype and gDNA libraries) and replicate identity (gDNA libraries). Four randomized bases were included upstream of the barcodes on the IGHJ primer (gDNA libraries) and constant region primer (isotype libraries) for Illumina clustering. PCR was carried out with AmpliTaq Gold (Applied Biosystems) following the manufacturer's instructions, and used a program of: 95°C 7 min; 35 cycles of 94°C 30 sec, 58°C 45 sec, 72°C 60 sec; and nal extension at 72°C for 10 min. A second round of PCR using Qiagen's Multiplex PCR Kit was performed to complete the Illumina sequencing adapters at the 5' and 3' ends of amplicons; cycling conditions were: 95°C 15 min; 12 cycles of 95°C 30 sec, 60°C 45 sec, 72°C 60 sec; and nal extension at 72°C for 10 min. Products were subsequently pooled, gel puri ed (Qiagen), and quanti ed with the Qubit uorometer (Invitrogen). Samples were sequenced on the Illumina MiSeq (PE300) using 600 cycle kits.
Sequence quality assessment, ltering, and analysis Paired-end reads were merged using FLASH20, demultiplexed (100% barcode match), and primer trimmed. The V, D, and J gene segments and V-D (N1), and D-J (N2) junctions were identi ed using the IgBLAST alignment program21. Quality ltering of sequences included keeping only productive reads with a CDR-H3 region, and minimum V-gene alignment score of 200. For cDNA-templated IGH reads, isotypes and subclasses were called by exact matching to the constant region gene sequence upstream from the primer. Clonal identities were inferred using single-linkage clustering and the following de nition: same IGHV and IGHJ usage (disregarding allele call), equal CDR-H3 length, and minimum 90% CDR-H3 For each clone, the median somatic mutation frequency of reads was calculated. Mean mutation frequencies for all clonal lineages from a subject for each isotype were calculated from the median mutation frequency within each clone, and so represent the mean of the median values. Clones with <1% mutation were de ned as unmutated and clones with ≥ 1% were de ned as being mutated. Subclass fractions were determined for each subject by dividing the number of clones for a given subclass by the total number of clones for that isotype category. Expanded clones were de ned as a clone found in one subject which is present in two or more of the gDNA replicate libraries. Clonal expansion in the isotype data was inferred from the gDNA data. Analyses were conducted in R22 using base packages for statistical analysis and the ggplot2 package for graphics23.
To determine convergent rearranged IGH among patients with SARS-CoV-2 infection, we clustered heavychain sequences annotated with the same IGHV and IGHJ segment (not considering alleles) and the same CDR-H3 length were clustered based on 85% CDR-H3 amino acid sequence similarity using cd-hit24. To exclude IGH that are generally shared between humans and to enrich the SARS-CoV-2-speci c IGH that are likely shared among the patients, clusters were selected as informative if (1)

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.