Memory persistence and differentiation into antibody-secreting cells accompanied by positive selection in longitudinal BCR repertoires

The stability and plasticity of B cell-mediated immune memory ensures the ability to respond to the repeated challenges. We have analyzed the longitudinal dynamics of immunoglobulin heavy chain repertoires from memory B cells, plasmablasts, and plasma cells from the peripheral blood of generally healthy volunteers. We reveal a high degree of clonal persistence in individual memory B cell subsets, with inter-individual convergence in memory and antibody-secreting cells (ASCs). ASC clonotypes demonstrate clonal relatedness to memory B cells, and are transient in peripheral blood. We identify two clusters of expanded clonal lineages with differing prevalence of memory B cells, isotypes, and persistence. Phylogenetic analysis revealed signs of reactivation of persisting memory B cell-enriched clonal lineages, accompanied by new rounds of affinity maturation during proliferation and differentiation into ASCs. Negative selection contributes to both persisting and reactivated lineages, preserving the functionality and specificity of B cell receptors (BCRs) to protect against current and future pathogens.


Introduction
B cells play a crucial role in protection from various pathogens and cancer cells as well as regulation of the immune response. The structural diversity of B cell receptors (BCRs) is responsible for the B cell-mediated immune system's capacity to recognize a wide variety of different antigens, and every individual harbors a large pool of naive B cell clones, each with a unique BCR. Antigenic challenge triggers the proliferation and maturation of naive B cells with cognate BCRs, and the resulting progeny comprise a number of cell subsets with differing functions and lifespans. During the affinity maturation process, the initial structure of a given BCR can change at the genomic level as a result of somatic hypermutation (SHM), a process that accompanies B cell proliferation after antigen-specific activation. Cells bearing BCRs with higher affinity to the antigen are favored during the affinity maturation process, and produce signals that stimulate further differentiation and expansion (De Silva and Klein 2015). Another process called class-switch recombination further increases the dimensionality of the BCR space. The five main classes, or isotypes, of antibodies (i.e., IgA, IgD, IgE, IgG, and IgM) have different functions in the immune response (Stavnezer, Guikema, and Schrader 2008;Vidarsson, Dekkers, and Rispens 2014), and isotype switching during clonal proliferation can thereby change the functional capabilities of B cells and the antibodies they produce. As a consequence, antigen challenge yields a population of clonally related cells with different BCRs and functionalities. the degree of repertoire convergence and individuality (Briney et al. 2019;Soto et al. 2019;Shah et al. 2019;Mandric et al. 2020;Yang et al. 2021). Studies of BCR repertoires of patients with different diseases have made an important contribution to the understanding of mechanisms of pathology and B-cell-mediated immunity (Bashford-Rogers et al. 2019; S. C. A. Nielsen et al. 2020;Gaebler et al. 2021;Sakharkar et al. 2021). Longitudinal analysis of repertoires at different timepoints has made it possible to study the dynamics of B cell response following antigenic challenge or therapy (Laserson et al. 2014;Davydov et al. 2018;Horns et al. 2019;Nourmohammad et al. 2019;Hoehn et al. 2021). Reconstruction of BCR evolution in B cell clonal lineages and phylogenetic analysis can reveal which evolutionary forces predominate at different stages of clonal lineage development. De Bourcy et al. recently reported on age-related differences in the structure of clonal lineages, somatic hypermutagenesis and affinity maturation processes, and differences in recall response of persisting lineages upon vaccination depending on CMV seropositivity status (de Bourcy et al. 2017). Other studies have described in detail the action of positive selection in the evolution of clonal lineages in vaccination and chronic HIV infection (Bonsignori et al. 2017;Horns et al. 2019;Nourmohammad et al. 2019). Reports have also described persisting clonal lineages which are predominantly represented by cells with IgM/IgD isotypes, and which demonstrate signs of neutral evolution (Horns et al. 2019). Comparison of BCR repertoires between different cell subsets also makes it possible to investigate factors governing the functional assignment of B cells during proliferation, and thereby to understand fundamental aspects of B cell immunity. For example, recent studies have described differences in BCR repertoires of IgM and switched memory B cells as well as the complex interplay between CD27 high and CD27 low B-cell memory subsets, showing the complex nature of B cell immune memory (Wu et al. 2010;Grimsholm et al. 2020).
BCR repertoires of antigen-experienced B cell subsets and their dynamics are usually studied in the context of pathologic conditions and vaccination, and there is a lack of equivalent data in the absence of acute or chronic immune response. We have therefore investigated immunoglobulin heavy chain repertoires from memory B cells, plasmablasts, and plasma cells from peripheral blood collected from generally healthy volunteers at three time-points over the course of a year. In order to obtain detailed and unbiased repertoire data, we used advanced IgH repertoire profiling technology that provides full-length IgH sequences with isotype annotation. Based on comparative and phylogenetic analysis of the resulting data, we are able to describe the structure, distinctive features, clonal relations, isotype distribution and temporal dynamics of B cell subset repertoires, as well as the phylogenetic history of large clonal lineages.

IGH repertoire sequencing statistics and analysis depth
We collected peripheral blood from six healthy donors at three time-points, where the second sample was collected one month after the first, and the third was collected 11 months after that ( Fig. 1A; Supplementary Table S1). These samples were subjected to fluorescence-activated cell sorting to isolate memory B cells (Bmem: CD19 + CD20 + CD27 + ), plasmablasts (PBL: CD19 low/+ CD20 -CD27 high CD138 -) and plasma cells (PL: CD19 low/+ CD20 -CD27 high CD138 + ) ( Supplementary Fig. S1A). Most of the cell samples were collected and processed in two independent replicates (Supplementary Data SD1). For each cell sample, we obtained fulllength IGH clonal repertoires were obtained using a 5'-RACE-based protocol, which allows unbiased amplification of full-length IGH variable domain cDNA while preserving isotype information, with subsequent unique molecular identifier (UMI)-based sequencing data normalization and error correction (Turchaninova et al. 2016;Shugay et al. 2014). From a total of 83 cell samples, we obtained 1.06 x 10 7 unique IGH cDNA molecules, each covered by at least three sequencing reads, representing 8.4 x 10 5 unique IGH clonotypes (Supplementary Data SD1). An IGH clonotype was defined as a unique nucleotide sequence spanning from the beginning of IGH V gene framework 1 to the 5' end of the C segment, sufficient to determine isotype. The number of unique clonotypes (i.e., species richness) depended on the number of cells per sample (Supplementary Fig. S1B), even after data normalization by sampling an equal number of unique IGH cDNA sequences. To characterize the number of distinct IGH clonotypes in each cell subset, we selected the samples with the most common number of sorted cells for each sample set. The median number of clonotypes was 20,072 (14,572-32,806, n = 14) per 5 x 10 4 memory B cells, 628 (528-981, n = 8) per 1 x 10 3 plasmablasts, and 800 (623-1,183, n = 9) per 1 x 10 3 plasma cells. Rarefaction analysis in the Bmem subpopulation ( Supplementary Fig. S1B, left) revealed an asymptotic increase of species richness that did not reach a plateau, indicating that the averaged species richness can only serve as a lower limit of sample diversity estimation. For all samples of PBL and PL subpopulations, species richness curves plateaued, meaning that we had reached sufficient sequencing depth to evaluate the clonal diversity of the sorted cell samples ( Supplementary  Fig. S1B center and right). B cell subsets display both divergent and similar characteristics in their IGH repertoires First, we aimed to characterize features of the IGH repertoires of the Bmem, PBL, and PL subset based on several key properties: usage of germline encoded IGHV segments, clonal distribution by isotypes, rate of SHM in CDR1-2 and FWR1-3, and features of the hypervariable CDR3 region. The proportion of overall clonal diversity occupied by the five major IGH isotypes was strikingly different between Bmem cells and antibody-secreting cells (ASCs; i.e., PBL and PL). IgM represented more than half of the repertoire in Bmem, while IgA was dominant in PBL and PL (Fig. 1B, Supplementary Data SD2). The second most prevalent isotype in ASCs was IgG, which was also less abundant in Bmem compared to IgA. IgD represented a substantial part of the Bmem clonal repertoire, while < 1% clonotypes of ASCs expressed IgD. The proportion of each isotype varied between donors and time points, but IgM and IgA or IgA and IgG consistently remained the most abundant isotypes in Bmem cells or ASCs, respectively ( Supplementary Fig. S2A, Supplementary Data SD2). In all studied subsets, the isotype distribution in terms of number of unique clonotypes roughly mirrored the isotype distribution based on the number of IGH cDNA molecules, indicating absence of large clonal expansions or differences in IGH expression level distorting abundance of isotypes. This could not be determined by sequencing of bulk PBMCs, as higher levels of IGH expression by ASCs can change the isotype proportions and thereby bias the quantitation of clonotype abundance (Supplementary Fig. S2B). The obtained IGH isotype distributions based on unique clonotypes roughly correspond to the distribution of IGH isotypes typically detected by flow cytometry of the same subsets (Perez-Andres et al. 2010). The level of SHM was on average significantly higher in ASC subsets, reflecting that PBLs and PLs are enriched for clones that have undergone affinity maturation (Fig. 1C). The average number of SHMs for IgE clonotypes did not differ significantly between cell subsets, but was significantly higher compared to the level of SHM detected for IgM and IgD clonotypes in Bmem. Of note, the rate of SHM in PBLs was higher than that in PLs in clonotypes from the three most abundant isotypes (i.e., IgM, IgA and IgG). We also compared the distributions of the lengths of the hypervariable CDR3 region between IGH clonotypes in different cell subsets. PBLs had significantly longer CDR3 regions compared to Bmem cells on average in every isotype except for IgE (Fig. 1D). Of note, the average CDR3 length in PL clonotypes was significantly higher compared to Bmem for IgA and IgD, but not for the other isotypes.
IGHV gene segment usage was roughly similar between Bmem, PBL, and PL cells from all donors, indicating generally equal probabilities of memory-to-ASC conversion for B cells carrying BCRs encoded by distinct gene segments (Fig. 1E, Supplementary Fig. S3A). This distribution differed significantly between the studied cell subsets and naive B cells (based on data from Gidoni et al. 2019). The repertoire of total B cells (Btot; CD19 + CD20 + ), which contained a large fraction of naive B cells, demonstrated similar IGH V gene segment usage to the naive B-cell repertoire (Supplementary Fig. S3A). We observed statistically significant Pearson correlations in terms of IGHV gene frequencies for all pairs between Bmem, PBL or PL (> 0.95 correlation, p < 0.01), and for naive vs Btot (0.79 correlation, p < 0 .01). We observed high concordance in terms of under-or overrepresentation of specific IGHV gene segments in repertoires of all antigen-experienced B-cell subsets compared to naive B-cells; Pearson correlation coefficients for the fold-change of IGHV gene segment usage frequencies were 0.95 for Bmem and PBL, 0.96 for Bmem and PL, and 0.98 for PBL and PL (p < 0.01 for all pairs). Moreover, IGHV gene segment under-or overrepresentation clearly depended on the given gene sequence. We clustered IGHV genes based on their sequence similarity, and observed that most IGHV segments in each of the four major clusters behaved concordantly with other segments in that cluster (Fig. 1E). This effect was also observed at the level of individual repertoires (Supplementary Fig. S3B) with discrepancies that could probably be attributed to genetic polymorphism of the IGH loci of particular donors.  Study design. Peripheral blood from six donors was sampled at three time points: T1 -initial time point, T2 -1 month and T3 -12 months after the start of the study. At each time point, we isolated PBMCs and sorted memory B cells (Bmem: CD19 + CD20 + CD27 + ), plasmablasts (PBL: CD19 low/+ CD20 -CD27 high CD138 -) and plasma cells (PL: CD19 low/+ CD20 -CD27 high CD138 + ) in two replicates using FACS. For each cell sample, we obtained clonal repertoires of full-length IGH by sequencing respective cDNA libraries. B: Proportion of isotypes in studied cell subsets averaged across all obtained repertoires. Left, frequency of unique IGH clonotypes with each particular isotype. Right, frequency of each isotype based on IGH cDNA molecules detected in a sample. C: Distribution of the number of somatic hypermutations identified per 100 bp length of IGHV segment for clonotypes within each particular isotype. D: Distribution of CDR3 length of clonotypes in each cell subset by isotype; E: Distributions of average IGHV gene frequencies based on number of clonotypes in naive B cells (data from Gidoni et al. 2019), Bmem, PBL, and PL repertoires are shown at the top. Colored squares on heatmap indicate significantly different (false discovery rate, FDR < 0.01) frequencies for IGHV gene segments in corresponding B cell subsets compared to naive B cell repertoires. Color intensity reflects the magnitude of the difference (FC = fold change). Only V genes represented by more than two clonotypes on average are shown. IGHV gene segments are clustered based on the similarity of their amino acid sequence, as indicated by the dendrogram at the bottom. In C and D, the numbers at the bottom of the plots represent the number of clonotypes in the corresponding group, pooled from all donors, and the median measurements from each cell type. Comparisons between subsets were performed with two-sided Mann-Whitney U test. * = p ≤ 0.05, ** = p ≤ 0.01, *** = p ≤ 10 -3 , **** = p ≤ 10 -4 .

Memory B cell repertoires are stable over time and contain a large number of public clonotypes
We further studied the similarity of IGH clonal repertoires of B cell subsets across time-points and between individuals, evaluating repertoire stability (i.e., distance between different timepoints) and degree of individuality (i.e., distance between repertoires from different donors). We evaluated repertoire similarity at two levels of IGH sequence identity: frequency of clonotypes with identical nucleotide sequence-defined variable regions (FR1-4), and number of clonotypes with identical CDR3 amino acid sequences, IGHV gene segments, and isotypes. Both metrics showed significantly higher inter-individual differences compared to the divergence of repertoires derived from the same donor, reflecting the fact that IGH repertoires of Bmem, PBL, and PL subsets are private to a large degree ( Fig. 2A,B). We observed identical clonotypes in the repertoires of PBL and PL collected at different time-points, whereas the repertoire similarity was much lower compared to that between replicate samples, reflecting the transient nature of PBL and PL populations in peripheral blood. Notably, we observed lower clonal overlap in PBL and PL for more distant time-points (separated by 11 or 12 months) than those that are closer together (1 month) (Supplementary Fig. S4A). The dissimilarity between samples collected on the same day versus 1 month or even 1 year later was much lower for Bmem, demonstrating the high stability of the clonal repertoire and long-term persistence of IGH clonotypes in these cells (Fig. 2B, Supplementary Fig. S4A).
To better describe the inter-individual IGH repertoire convergence, we analyzed the number of IGH amino acid clonotypes shared between different donors (i.e., public clonotypes) among 5000 most expanded clonotypes in each Bmem repertoire, assuming that functional convergence could be detected amongst the most abundant clonotypes due to clonal expansions in response to common pathogens. Indeed, the average number of shared clonotypes in Bmem was significantly higher between fractions of the most abundant clonotypes compared to randomly-sampled clonotypes (Fig. 2C), as well as when compared to the most abundant clonotypes shared by two naive repertoires (from Gidoni et al. 2019) or to pre-immune IGH repertoires obtained by in silico generation using OLGA software (Sethna et al. 2019) (Fig.  2C). We also noted that there were no shared clonotypes defined by their full-length nucleotide sequence (Supplementary Fig. S4B). Public clonotypes were also hypermutated, although the rate of SHM was slightly lower compared to that in clonotypes specific to one donor (private) (Supplementary Fig. S4C). These observations indicate functional convergence in Bmem repertoires, which is presumably driven by exposure to common pathogens. Of note, the extent of clonal overlap was significantly higher between naive repertoires than for in silico-generated repertoires, indicating functional convergence even in pre-immune repertoires. Furthermore, the distance between V segment usage distributions in Bmem repertoires was not significantly different compared to that in naive B cells repertoires. That indicates that the higher clonotype sharing seen in Bmem cells cannot be attributed to lower diversity in IGHV germline usage (Fig.  2D). The same analysis in PBL and PL subpopulations for the 600 and 200 most abundant clonotypes respectively yielded no shared clonotypes between repertoires of different donors, demonstrating no detectable convergence at this sampling depth. Finally, we found that public clonotypes were more likely to be detected than private ones in samples collected at different time-points ( Fig. 2E), again suggesting persistent memory to common antigens. A: Distance between repertoires obtained at different time-points from the same or different donors as calculated by Jensen-Shannon divergence index for IGHV gene frequency distribution. B: Number of shared clonotypes between pairs of repertoires from the same or different donors and time-points. For data normalization, we assessed the most abundant 14,000 Bmem, 600 PBL, and 300 PL clonotypes. C: The average number of shared clonotypes between repertoires from pairs of unrelated donors for the 5,000 most abundant Bmem clonotypes, randomly-selected Bmem clonotypes, most abundant clonotypes from naive repertoires of unrelated donors (from Gidoni et al. 2019), or from synthetic repertoires generated with OLGA software. We calculated the number of shared clonotypes between these repertoires, each of which contained 5,000 clonotypes. D: Inter-individual distance between distributions of V genes in repertoires, calculated as Jensen-Shannon divergence indices for the pairs of repertoires depicted in C. E: Fraction of persistent clonotypes detected at more than one time-point among clonotypes detected in repertoires from only one donor (private) or in at least two donors (public). Each dot represents the fraction of persistent clonotypes from one donor. In all plots, clonotypes are defined as having identical CDR3 amino acid sequences and the same IGHV gene segment and isotype. For A-D, each dot represents a pair of repertoires of the corresponding type; N indicates the number of pairs of repertoires in the group. Comparisons in all panels were performed with two-sided Mann-Whitney U test. * = p ≤ 0.05, ** = p ≤ 0.01, *** = p ≤ 10 -3 , **** = p ≤ 10 -4 .
Temporal dynamics of clonal lineages are associated with cell subset composition SHM during BCR affinity maturation leads to the formation of clonal lineages -i.e., BCR clonotypes evolved from a single ancestor after B cell activation. To study the structure and dynamics of clonal lineages originating from a single BCR ancestor, we grouped clonotypes from each individual based on their sequence similarity (see Materials and Methods for details). We focused on larger clonal lineages consisting of at least 20 unique clonotypes from the corresponding donor. On average, these clonal lineages covered 3.4% of a given donor's repertoire, and we identified 190 such lineages across the four donors from whom samples were collected at each of the three time-points (Supplementary Fig. S5A).
After performing principal component analysis (PCA), we determined that these clonal lineages could be divided into two large clusters, HBmem and LBmem, based on the proportion of cell subsets and BCR isotypes represented (Fig. 3A, Supplementary Fig. S5B). The more abundant HBmem cluster included 138 clonal lineages, and was mostly composed of memory B-cell clonotypes of non-switched isotype IgM. Conversely, the smaller LBmem cluster (52 clonal lineages) was more diverse and largely composed of ASCs, and enriched in IgG and IgA clonotypes (Fig. 3B). The average size of clonal lineages (i.e., the number of unique clonotypes per lineage) did not differ between the two clusters ( Supplementary Fig. S5C). Clonal lineages belonging to both clusters were observed in repertoires of all donors, and the HBmem cluster was more prevalent in each donor (Supplementary Fig. S5D).
To better understand the differences between HBmem and LBmem clonal lineages, we tracked the abundance of each clonal lineage in the repertoire across each time-point. The two clusters of lineages demonstrated different temporal behavior; while HBmem groups were quite stable over time, LBmem lineages had a burst of increased frequency at one of the time points (Fig.  3C). To compare the temporal stability of clonal lineages, we defined the lineage persistence metric, which equals 1 when a clonal lineage was equally frequent at all three time-points and is close to 0 when it was detected at just one time-point (Fig. 3D) was strongly associated with its composition (Fig. 3E, F). Clonal lineages enriched with clonotypes or with the IgM isotype -including all HBmem lineages -were more likely to persist through time. Conversely, lineages with larger proportions of ASCs or IgG/IgA isotypes, including most LBmem lineages, tended to have lower persistence, with a burst of increased frequency at one particular time-point. The persistence of a clonal lineage was not associated with its size, and the frequencies of clonal lineages were highly correlated among replicate samples (Supplementary Fig. S5E, F), indicating that differences in persistence cannot be attributed to clonotype sampling noise.
Besides their higher persistence, the HBmem lineages were enriched in clonotypes detected at multiple time-points (Supplementary Fig. S5G), indicating that persistent clonal lineages are supported by persistent clonotypes. Furthermore, 29.7% of the HBmem cluster was represented by public clonal lineages shared between at least two donors, compared to 3.8% for the LBmem cluster. The only two shared LBmem lineages had atypically high persistence, which made them more similar to HBmem (Fig. 3G)  persistence. ❑ ❑ f max is the maximum clonal lineage frequency among the three time-points, and ❑ ❑ f i , j are the frequencies at the remaining two time-points. E: Spearman's correlation between persistence of a clonal lineage and proportions of its clonotypes associated with a given B cell subset or isotype. F: Comparison of persistence between HBmem and LBmem. G: Fraction of public clonal lineages in the two clusters. Statistical significance for B, F, and G is calculated by the two-sided Mann-Whitney test. * = p ≤ 0.05, ** = p ≤ 0.01, *** = p ≤ 10 -3 , **** = p ≤ 10 -4 .

LBmem clonal lineages could arise from HBmem clonal lineages
The evolutionary past of a clonal lineage can be described by inferring the history of accumulation of SHMs leading to individual clonotypes-i.e., by reconstructing the phylogenetic tree of the clonal lineage. The initial germline sequence of each clonal lineage partially matches the germline VDJ segments, and can be reconstructed in a manner corresponding to the root of the phylogenetic tree of this lineage (see Methods). However, the first node of the phylogenetic tree (green diamond in Fig. 4A), the most recent common ancestor (MRCA) of the sampled part of the lineage, can be different from the inferred germline sequence. These differences, referred to as the G-MRCA distance, correspond to SHMs accumulated during the evolution of the clonal lineage prior to divergence of the observed clonotypes. The G-MRCA distance depends on how clonotypes of the tree were sampled. Sampling of clonotypes regardless of their position on the tree results in a low G-MRCA distance (Fig. 4A, top panel), while sampling just those clonotypes belonging to a particular clade can conceal early stages of lineage evolution and thus result in a large G-MRCA distance (Fig. 4A, bottom panel). The G-MRCA distance was on average five-fold higher in LBmem clonal lineages compared to HBmem (median = 0.044 vs. 0.008, Fig. 4B). This means that even though nearly all the evolution of an HBmem clonal lineage leaves a trace in the observed diversity of that lineage (Fig. 4D,G), the sequence variants of an LBmem lineage typically result from divergence of an already-hypermutated clonotype (Fig. 4E,H). In most (38 out of 52) LBmem lineages, some Bmem clonotypes were observed at the time-point preceding expansion. Moreover, clonotypes of LBmem lineages are typically characterized by lower pairwise divergence compared to that in HBmem lineages (median = 0.11 vs 0.13, Fig. 4C). Together with the burst-like dynamics characteristic of LBmem lineages (Fig. 3F), this implies that LBmem lineages may represent recent, rapid clonal expansion of preexisting memory.
Based on these results and the compositional features of the two clusters, we further hypothesized that LBmem clonal lineages may arise from reactivation of pre-existing memory cells belonging to the HBmem cluster. In search of examples of such a transition, we examined all clonal lineages that were persistent but included ASC clonotypes. We found one clear example of a transition from HBmem to LBmem state in the evolutionary history of a clonal lineage (Fig. 4F, I). While the MRCA of this lineage nearly matched the germline sequence, all ASC clonotypes were grouped in a single monophyletic clade (sublineage), such that its ancestral node was remote from the MRCA. The ASC sublineage demonstrated all features characteristic of LBmem, including predominance of IgG and IgA isotypes, low persistence, and low clonotype divergence. Conversely, the remainder of the clonal lineage had features of HBmem: predominance of IgM, high persistence, and high levels of clonotype divergence.
14 339 340 341 342 343 Comparison of G-MRCA p-distance (i.e., the fraction of differing nucleotides) for HBmem and LBmem lineages. C: Mean pairwise phylogenetic distance (i.e., the distance along the tree) between clonotypes of the same lineage for HBmem and LBmem clusters. D-F: Representative phylogenetic trees for clonal lineages belonging to HBmem (D), LBmem (E), and an example of HBmem-LBmem transition (F). The LBmem sublineage in F is nested deep in the phylogeny of the memory clonotypes, and is not characterized by a particularly long ancestral branch, indicating that it is not an artifact of clonal lineage assignment. Circles correspond to individual clonotypes, with the cellular subset indicated by color, and the isotype by label. Tables at right indicate the presence or absence of the corresponding clonotype at each time-point. The G-MRCA distance is indicated with a thick line. G-I: Schematic representation of the hypothetical dynamics of relative size for clonal lineages represented in D, E, and F, respectively. Significance for B and C was obtained by the two-sided Mann-Whitney test. * = p ≤ 0.05, ** = p ≤ 0.01, *** = p ≤ 10 -3 , **** = p ≤ 10 -4 .

Reactivation of LBmem clonal lineages is driven by positive selection
Having shown that the LBmem lineages likely originate from clonal expansion of pre-existing memory, we further compared the contribution of positive (favoring new beneficial SHMs) and negative (preserving the current variant) selection between the LBmem and HBmem clusters.
Since we observed only one clear example of an HBmem-LBmem transition (Fig. 4F), we could not claim with certainty that LBmem lineages always emerge from preexisting HBmem lineages rather than from some other memory type. Still, we were able to study LBmem reactivation by comparing differences in substitution patterns at the origin of HBmem and LBmem clusters. We reasoned that the G-MRCA distance of an HBmem lineage contains mutations fixed by primary affinity maturation after the initial lineage activation. In contrast, the G-MRCA distance of an LBmem lineage contains both mutations arising during primary affinity maturation and subsequent changes occurring later in the evolution of the lineage. Differences in the characteristics of the G-MRCA mutations between clusters are therefore informative of the process prior to the observed expansion of LBmem lineages.
To assess selection at the origin of the HBmem and LBmem lineages, we measured the divergence of nonsynonymous sites relative to synonymous sites (i.e., the DnDs ratio). Classically, DnDs > 1 is interpreted as evidence for positive selection. However, DnDs > 1 is rare, because the signal of positive selection is usually swamped by that of negative selection. In the McDonald-Kreitman (MK) framework, positive selection is instead revealed by excessive nonsynonymous divergence relative to nonsynonymous polymorphism (i.e., DnDs > PnPs; see Methods and Supplementary Table S2 for examples), under the logic that advantageous changes contribute more to divergence than to polymorphism (McDonald and Kreitman 1991). The fraction of adaptive nonsynonymous substitutions (α) can then be estimated from this excess. We designed an MK-like analysis, comparing the relative frequencies of nonsynonymous and synonymous SHMs at the G-MRCA branch (equivalent to divergence in the MK test) to those in subsequent evolution of clonal lineages (equivalent to polymorphism in the MK test; Fig. 5A In both the HBmem and LBmem clonal lineages, we observed a higher ratio of nonsynonymous to synonymous SHMs in the G-MRCA branches compared to subsequent tree branches, meaning that a fraction of SHMs acquired by MRCA was further fixed by positive selection. However, this fraction was higher in LBmem lineages (Fisher's exact test: α = 0.58 and 0.65, p < 10 -6 and < 10 -15 in HBmem and LBmem, respectively). α of distinct clonal lineages was also generally higher in LBmem than in HBmem (median α = 0.57 vs α = 0.18, Fig. 5B), showing that positive selection more frequently preceded expansion of LBmem than HBmem lineages. The observation of excess α in the LBmem cluster compared to HBmem was robust to the peculiarities of the MK analysis (Supplementary Table S3). The higher α for LBmem compared to HBmem implies that a fraction of SHMs was positively selected in LBmem clonal lineages after primary affinity maturation.

Subsequent evolution of LBmem clonal lineages is affected by negative and positive selection
Next, we considered the effects of selection on HBmem and LBmem clusters following their divergence from their MRCAs -i.e., in the subsequent evolution of a clonal lineage leading to the diversity of the observed clonotypes. We calculated the per-site ratio of nonsynonymous and synonymous SHMs (πNπS) among those that originated after the MRCA. The πNπS of both clusters was < 1 (Fig. 5С). This deficit of nonsynonymous SHMs indicates negative selection in the observed part of clonal lineage evolution. The πNπS ratio was lower in the LBmem cluster, indicating stronger negative selection.
To examine the selection affecting these post-MRCA SHMs in more detail, we studied the frequency distribution of SHMs in individual lineages, or their site frequency spectra (SFS) (R.  (Fig. 5A). SFS reflects the effect of selection on these SHMs. Deleterious SHMs are held back by negative selection, so that their frequency in the lineage remains low. By contrast, positive selection favors the spread of adaptive SHMs, increasing their frequency. Therefore, negative selection biases the SFS towards low frequencies, and positive selection, towards high frequencies.For each clonal lineage, we reconstructed the SFS of the SHMs accumulated since divergence from MRCA (Fig. 5A), and then averaged these SFSs within the HBmem and LBmem clusters. A larger proportion of the LBmem SFS corresponds to high frequencies compared to HBmem (Fig. 5D), indicating weaker negative and/or stronger positive selection in LBmem SFS.
To distinguish between these selection types, we calculated the proportion of the SFS distribution falling into each frequency bin for nonsynonymous SHMs, and divided it by the same value for synonymous SHMs (normalized πNπS; see Methods, Fig. 5E). The inter-cluster differences in the normalized πNπS in low-frequency bins were generally reflective of negative selection, while the differences in the high-frequency bins were reflective of positive selection. Normalized πNπS was significantly higher in the high-frequency (>60%) bins of SHMs in LBmem clonal lineages. This indicates that for LBmem, those nonsynonymous changes that were not removed by negative selection reached high frequencies more often than in HBmem. In total, these data indicate that a fraction of nonsynonymous mutations accumulated by LBmem lineages were adaptive. We thus observed that reactivation of LBmem lineages is coupled with strengthening of both types of selection: positive on the G-MRCA branch, and both positive and negative during subsequent clonal lineage expansion. This pattern is most likely evidence of new rounds of affinity maturation, which result in the acquisition of new advantageous changes and preserve the resulting BCRs from deleterious ones. HBmem, in contrast, evolved more neutrally under weaker negative selection, suggesting absence of antigen challenge during the observation period (Fig. 5F).

Discussion
Using advanced library preparation technology, we performed a longitudinal study of full-length BCR repertoires of the three main antigen-experienced B cell subsets -memory B cells, plasmablasts, and plasma cells -from peripheral blood of six donors, sampled three times over the course of a year. We analyzed these repertoires from two conceptually different but complementary points of view. First, we compared various repertoire features between the cell subsets, including clonotype stability in time and convergence between individuals. Second, we tracked the most abundant B cell clonal lineages in time and analyzed their cell subset and isotype composition, phylogenetic history, and mode of selection.
Comparative analysis of the cell subsets revealed significant differences in IGH isotype distribution, rate of SHM, and CDR3 length. IgM clonotypes predominated in the Bmem subset, whereas in ASCs the switched isotypes IgA and IgG together represented >80% of repertoire diversity on average. As expected, classical switched isotypes have higher rates of SHM, and the rate of SHM in ASCs is in general higher than in Bmem. The IgD isotype in Bmem cells showed similarities to IgM, where most IgD clonotypes had low levels of SHM, although there was a fraction of heavily-mutated clonotypes. On average, IgD-switched PL and PBL had a comparable level of SHM with IgG-and IgA-expressing ASC clonotypes. Notably, the level of SHM and CDR3 length in PBL on average exceeded that of PL in IgM, IgA, and IgG isotypes. We hypothesize that such PBLs with heavily hypermutated BCRs could be the subset of B cell progeny that continue to acquire mutations after optimal affinity has been achieved, while another part of the clonal progeny is committed to a long-lived PL fate and acquires the CD138 marker characteristic of this cell subset (Garimilla et al. 2019).
While different in many aspects, immune-experienced B cell subsets are similar -and concordantly distinct from naive B cells -in terms of IGHV gene segment usage. Moreover, we observed that the correlated enrichment or depletion in V segment usage frequency generally coincides with the level of sequence similarity of the V segments. Most IGHV-3 family members were observed more frequently in antigen-experienced B cells compared to naive subsets in all donors and time-points, while most of the other V genes that are well-represented in the naive subset decreased in frequency. These differences in V usage frequencies between naive and antigen-experienced B cell subsets have also been reported in several previous studies, even though different FACS gating strategies were used (Mitsunaga and Snyder 2020;Ghraichy et al. 2021). Our findings further support the idea that initial recruitment of B cells to the immune response is in many cases determined by the germline-encoded parts of the BCR, presumably CDR1 and CDR2. Previous studies have shown high levels of convergence in IGHV usage between B cell clonotypes specific for particular pathogens or self-antigens (Peng et al. 2019;Galson et al. 2015;Bashford-Rogers et al. 2019).
We further analyzed the repertoire similarity of cell subsets over time and between individuals. Intuitively, the Bmem subset is the most stable over time, showing less repertoire divergence and a greater number of shared clonotypes between sampling time-points in the same individuals. Compared to between-individual sharing, we detected a very small number of common clonotypes in Bmem cells. Those clonotypes have comparable levels of SHM to private ones, assuming germinal center-dependent origin. Two recent studies on extra-deep repertoires of bulk peripheral blood B cells reported 1-6% (Soto et al. 2019) or ~1% (Briney et al. 2019) shared V-CDR3aa-J clonotypes between pairs of unrelated donors, with lower repertoire convergence for class-switched clonotypes shown in the latter study. Using the same method, we similarly measured 0.06% repertoire overlap in the Bmem subset (Supplementary Figure S4D). Complementing the model proposed by Briney et al. -wherein IGH repertoires are initially dissimilar and then homogenize during B cell development before finally becoming highly individualized after immunological exposure -we found a significantly higher number of shared clonotypes between IGH repertoires among the most abundant Bmem clonotypes, indicating functional convergence presumably due to exposure to common environmental antigens. The latter is further supported by the higher number of persisting Bmem clonotypes observed among public clonotypes compared to private ones.
Next, we focused on the most abundant B cell clonal lineages, which are large enough to study the interconnection between cell subsets and phylogenetic features of lineages. In all individuals, the observed clonal lineages clearly fell into two clusters. HBmem represents persistent memory with a predominant IgM isotype; such clonal lineages were equally sampled from all time-points and rarely included ASC clonotypes. The MRCA of observed clonotypes in HBmem lineages almost matched the predicted germline sequence -and in 14.5% of the lineages, matched completely -indicating that the probability of observing a clonotype from these lineages has no association with the position in that lineage's phylogeny. Horns and colleagues observed lineages with very similar features to HBmem, which also possessed persistent dynamics against a background of vaccine-responsive lineages and were predominantly composed of the IgM isotype (Horns et al. 2019). However, their study was performed on bulk B cells, so there was no possibility to track their relatedness to the Bmem subset. In contrast, the LBmem cluster demonstrates completely different features, with lineages largely composed of ASC clonotypes with switched IgA or IgG isotypes, showing active involvement in ongoing immune response. The MRCA of LBmem lineages differed from the germline sequence by some number of SHMs, and only 1.9% of LBmem lineages had a complete match between the MRCA and the germline sequence. A large G-MRCA distance implies that the observed clonotypes originated from an already-hypermutated ancestor, and that we had therefore sampled clonotypes from a single clade of the lineage phylogeny. Such an effect can be caused both by rapid expansion of the clade and migration of the clade's clonotypes within the peripheral blood. We also observed that most LBmem lineages expanded at T2 or T3 (38 out of 45, > 80%) had at least one clonotype detected in the Bmem subset at the previous time-point, leading us to conclude that LBmems represent the progeny of reactivated persistent B cells. We found one lineage that possesses all features of the HBmem cluster except for one monophyletic clade, typical for LBmem lineage; thus, at least some LBmem clonal lineages represent the progeny of reactivated persistent Bmem cells that are represented among HBmem lineages. This example of HBmem-LBmem transition is very similar to reactivated persistent memory, as observed by Hoehn et al. in response to seasonal flu vaccination (Hoehn et al. 2021).
Our analysis of the selection mode in HBmem and LBmem lineages supported our assumptions. We showed that both lineages experienced positive selection from the germline sequence to the MRCA of the observed clonotypes -as expected, assuming that primary B cell activation is followed by affinity maturation associated with clonal lineage expansion. However, the pressure of positive selection was stronger in LBmem lineages than in HBmem. In addition, we detected an excess of sites under positive selection in LBmem lineages that underwent evolution after the MRCA as well. This leads us to the conclusion that LBmem cells underwent additional rounds of affinity maturation after reactivation. Hoehn et al. did not study the mode of selection in their reactivated lineages, but some clonotypes were sampled from germinal centers, supporting the involvement of affinity maturation. In subsequent evolution after the MRCA, we detected negative selection in both lineages -but again, stronger in LBmem. We consider this excess negative selection in LBmem as an additional signature of affinity maturation, purifying the lineage from deleterious BCR variants.
In this work, we performed a detailed longitudinal analysis of BCR repertoires from immuneexperienced B cell subsets from donors without severe pathologies, and from these data, we have produced a framework for the comprehensive analysis of selection in BCR clonal lineages. Our results demonstrate the interconnection of B cell subsets at a clonal level, B cell memory convergence in unrelated donors, and the long-term persistence of memory-enriched clonal lineages in peripheral blood. Signs of positive selection were detected in both memory-and ASC-dominated B cell lineages. Together, the results of our evolutionary analysis of B cell clonal lineages coupled with B cell subset annotation suggest that the reactivation of preexisting memory B cells is accompanied by new rounds of affinity maturation.
Full-length IGH cDNA libraries and sequencing IGH cDNA libraries were prepared as described previously (Turchaninova et al. 2016) with several modifications. Briefly, we used a rapid amplification of cDNA ends (RACE) approach with a template-switch effect to introduce 5' adaptors during cDNA synthesis. These adaptors contained both unique molecular identifiers (UMIs), allowing error-correction, and sample barcodes (described in Zvyagin et al. 2017), allowing us to rule out potential cross-sample contaminations. In addition to a universal sequence for annealing the forward PCR primer, we also introduced a 5' adaptor during the reverse transcription (RT) reaction, which allowed us to avoid using multiplexed forward primers specific for V segments, thereby reducing PCR amplification biases. Multiplexed C-segment-specific primers were used for RT and PCR, allowing us to preserve isotype information. Prepared libraries were then sequenced with an Illumina HiSeq 2000/2500, (paired-end, 2 x 310 bp).
Sequencing data pre-processing and repertoire reconstruction Sample demultiplexing by sample-barcodes introduced in the 5' adapter and UMI-based errorcorrection were performed using MIGEC v1.2.7 software (Shugay et al. 2014). For further analysis, we used sequences covered by at least two sequencing reads. Alignment of sequences, V-, D-, J-, and C-segment annotation, and reconstruction of clonal repertoires were accomplished using MiXCR v3.0.10 (Bolotin et al. 2015) with prior removal of the primeroriginated component of the C-segment. We defined clonotypes as a unique IGH nucleotide sequence starting from the framework 1 region of the V segment to the end of the J segment, and taking into account isotype. Using TIgGER  software, we derived an individual database of V gene alleles for each donor and realigned all sequences for precise detection of hypermutations. For analysis of general repertoire characteristics (isotype frequencies, SHM levels, CDR3 length, IGHV gene usage, and repertoire similarity metrics) we used samples covered by at least 0.1 cDNA molecules per cell for Bmem, and at least 5 cDNA per cell for PBL and PL.

Repertoire characteristics analysis
Isotype frequencies, rate of SHM, and CDR3 lengths were determined using MiXCR v3.0.10 (Bolotin et al. 2015). For calculation of background IGHV gene segment usage and number of 23   562  563  564  565  566  567  568  569  570   571   572   573  574  575  576  577  578  579  580  581  582  583   584   585  586  587  588  589  590  591  592  593  594  595  596  597   598   599 shared clonotypes, we utilized data derived from Gidoni et al. 2019 (European Nucleotide Archive accession number ERP108501) representing naive B cell IGH repertoires. We used repertoires containing more than 5,000 clonotypes and processed them in the same way as our data. IGHV gene frequencies were calculated as the number of unique clonotypes to which a particular IGHV gene was annotated by MiXCR divided by the total number of clonotypes identified in the sample. To assess IGHV gene segments over-and under-represented in studied subsets, we utilized edgeR package v0.4.4 (Robinson et al., 2010) with the 'trended' dispersion model. To evaluate pairwise similarity between repertoires based on IGHV gene segment frequency distributions, we utilized Jensen-Shannon divergence, calculated using the following formula: where P and Q represent distributions of IGHV gene segment in two repertoires, and pi and qi represent frequencies of individual member i (IGHV gene segment). In silico repertoires used for the calculation of background clonal overlap were generated with OLGA software v1.0.2 (Sethna et al. 2019) under standard settings utilizing the built-in model. For clonal overlap calculation, we downsized repertoires to a fixed number of clonotypes. For Fig. 1B, the 14,000 most abundant clonotypes were considered in Bmem, 600 in PBL, and 300 in PL. For Fig. 1C, we considered 5,000 clonotypes for all cell subsets. Clonotypes with identical CDR3 amino acid sequence and the same IGHV gene segment detected in both analyzed samples were considered shared. Clonotypes shared between repertoires of at least two individuals were termed as public.
Assignment of clonal lineages Change-O v0.4.4 (Gupta et al. 2015) was utilized to assign clonal groups, defined as groups of clonotypes with the same V segment, CDR3 length, and at least 85% similarity in CDR3 nucleotide sequence. Before clonal group assignment, we excluded all clonotypes with counts equal to 1. Clonal groups represent observed subsets of clonal lineages originating from a single BCR ancestor, so for simplicity, we use the term 'clonal lineages'. To study evolutionary dynamics of clonal lineages, we joined all replicas, three time-points (T1, T2, and T3), and cell subsets for each patient into a single dataset and excluded clonotypes that were presented by a single UMI. Phylogenetic analysis was performed on four patients for whom we had samples at all time-points, and on clonal lineages containing at least 20 unique clonotypes as in (Nourmohammad et al. 2019).

Clusterization of clonal lineages in HBmem and LBmem clusters
We performed principal component analysis on six scaled variables of clonal lineage composition: fractions of Bmem, PBL, and PL, and fractions of IgM, IgG, and IgA. The IgE isotype was not detected in clonal lineages involved in phylogenetic analysis, so we did not include it as a variable. HBmem and LBmem clusters were defined using the K-means clustering algorithm.

Metric of persistence of clonal lineages
We estimated the frequency of a clonal lineage in the repertoire at a given time-point as the ratio of the number of unique clonotypes in the clonal lineage detected at this time-point to the overall number of unique clonotypes detected at this time-point. If the clonal lineage was not detected at some time-point, we assigned its frequency to pseudocount, as it would be a single clonotype detected from this time-point. To estimate persistence of clonal lineage frequency in the repertoire over time we defined the persistence metric: where f max is the maximum frequency of the clonal lineage in the three time-points and f i , j are its frequencies in the other two (Fig. 3D). Persistence is equal to 1 if the frequency remains consistent at all three time points. If a clonal lineage was detected just once in the experiment and frequencies at other two time points were assigned to pseudocounts, the persistence approaches zero.

Reconstruction of clonal lineage germline sequence
We used MiXCR-derived reference V, D, and J segment sequences to reconstruct IGH germline sequences for each clonal lineage, concatenating only those sequence fragments which were present at CDR3 junctions of original MiXCR-defined clonotypes. Thus, random nucleotide insertions were disregarded, making them appear as gaps in the alignment of lineage clonotypes with the germline sequence. We excluded them from all parts of the phylogenetic analysis where germline sequence was required.

Reconstruction of clonal lineage phylogeny and MRCA
For phylogenetic analysis of clonal lineages, we aligned clonotypes with reconstructed germline sequences using MUSCLE version 3.8.31 with 400 gap open penalty (Edgar 2004). Next, we reconstructed the clonal lineage's phylogeny with RAxML version 8.2.11, using the GTRGAMMA evolutionary model and germline sequence as an outgroup, and computed marginal ancestral states (Stamatakis 2014). The ancestral sequence of the node closest to the root of the tree, represented by the germline sequence, is the MRCA of the sampled clonotypes. It can match the germline sequence or differ by some amount due to SHM, reflecting the starting point of subsequent evolution of observed clonotypes. This allowed us to distinguish between SHMs fixed in the clonal lineage on the way from the germline sequence to the MRCA (G-MRCA SHMs) versus polymorphisms within the observed part of lineage. The G-MRCA pdistance in Fig. 4B was measured as a fraction of diverged positions between germline and MRCA sequences.

McDonald-Kreitman (MK) test
The (MK) test is designed to detect the effects of positive or negative selection on population divergence from another species or its ancestral state (McDonald and Kreitman 1991). It is based on the comparison of ratios of nonsynonymous to synonymous substitutions observed in diverged and polymorphic sites, and estimates the fraction of diverged amino acid substitutions fixed by positive selection: where P n and P s respectively represent nonsynonymous and synonymous polymorphisms, and D n and D s respectively represent nonsynonymous and synonymous divergences fixed in the , resulting in α > 0. Negative selection has the opposite effect and produces α < 0.
To detect selection in the origin of clonal lineages, we considered G-MRCA SHM as divergent changes, and the remaining SHM in a clonal lineage after the MRCA as polymorphic ones (Fig.  5A). If we observed different nucleotides in the germline sequence and MRCA at a site that was also polymorphic, we considered it as divergent only if the germline variant was not among the polymorphisms (Supplementary Table S2, examples of codons q and r). Codons with unknown germline state were excluded from the MK test (Supplementary Table S2, example of codon j).
To perform the MK test on joined HBmem or LBmem cluster variation, we summed variation of all clonal lineages of the same cluster in each category (D n ,D s , P n , P s ). Calculations of α of distinct clonal lineages for comparison of its distributions between two clusters were complicated by zero G-MRCA distance in some clonal lineages, mostly belonging to the HBmem cluster. We dealt with this using three approaches, presented in Supplementary Table   S3. In the first, we added pseudocounts to D n andD s in each clonal lineage, so that for clonal lineages with zero G-MRCA distance, D n D s =1 . In the second, we excluded clonal lineages with zero G-MRCA distance from the analysis, still adding pseudocounts to D n andD s in each clonal lineage in cases where the G-MRCA distance consists of just one nonsynonymous or synonymous substitution. In the third, we compared only those clonal lineages that had at least one nonsynonymous and at least one synonymous substitution on the G-MRCA branch. We also calculated the MK test on joined variation for all types of exclusion criteria to check its robustness; however, there is no need to exclude clonal lineages in the case of the joined test (Supplementary Table S3). In the first approach clonal lineages with zero G-MRCA distance always produced negative α and biased median α to negative values as well. Medians of α in the second and third approaches were more consistent with results of the test on joined variation. However, in the third approach, the filter excluded most of the HBmem cluster, and so in the main test we presented results of the second approach (Fig. 5B). To check the significance of deviation of α from neutral expectations, we used an exact Fisher test as in the original MK pipeline (McDonald and Kreitman 1991).

πNπS
To calculate πNπS we identified SHMs in each clonal lineage relative to the reconstructed MRCA sequence. In multiallelic sites (with multiple SHMs observed, see codon i in Supplementary Table S2 as an example) we considered each variant as an independent SHM event. πN and πS were calculated as the number of nonsynonymous and synonymous SHMs in a clonal lineage, normalized to the number of nonsynonymous and synonymous sites in the MRCA sequence respectively. The resulting πNπS value is the ratio between πN and πS: where N and S are the number of nonsynonymous or synonymous SHMs, respectively, observed in the clonal lineage and N S and S S are the number of nonsynonymous or synonymous sites, respectively, in the MRCA sequence of the clonal lineage, calculated as in (Gojobori 1986).

Site frequency spectrum
Site frequency spectrum (SFS) reflects the distribution of SHM frequencies in the clonal lineage. We calculated the frequency of each SHM as a number of unique clonotypes carrying the SHM relative to the overall number of unique clonotypes in the lineage. To visualize SFS, we binned SHM frequencies into 20 equal intervals from 0 to 1 with a step size of 0.05, and counted SHM density in each bin as the number of SHMs in a given frequency bin normalized to the overall number of SHMs detected in the lineage. To obtain the cluster average SFS, we took the mean of clonal lineages of the same cluster in each frequency bin.

Normalized πNπS in bins of SHM frequencies
To compare ratios of nonsynonymous and synonymous SHMs of different frequencies between two clusters, we calculated normalized πNπS in bins of SHM frequency. For this purpose we used a smaller number of frequency bins (0; 0.2; 0.4; 0.6; 0.8; 1) to reduce the probability of bins without observed SHMs. To deal with the remaining empty bins, we added pseudocounts to nonsynonymous and synonymous SHMs in each frequency bin. Thus, normalized πNπS in the i-th SHM frequency bin was calculated as following: where N i and S i are the number of nonsynonymous and synonymous SHMs, respectively, in the i-th frequency bin, ∑  (Gojobori 1986