Longitudinal high-throughput TCR repertoire profiling reveals the dynamics of T cell memory formation after mild COVID-19 infection

COVID-19 is a global pandemic caused by the SARS-CoV-2 coronavirus. The T cell response is a critical part of both individual and herd immunity to SARS-CoV-2 and the efficacy of developed vaccines. However neither the dynamics and cross-reactivity of the SARS-CoV-2-specific T cell response nor the diversity of resulting immune memory are well understood. In this study we use longitudinal high-throughput T cell receptor sequencing to track changes in the T cell repertoire following two mild cases of COVID-19 infection. In both donors we identified SARS-CoV-2-responding CD4+ and CD8+ T cell clones. We describe characteristic motifs in TCR sequences of COVID-19-reactive clones, suggesting the existence of immunodominant epitopes. We show that in both donors the majority of infection-reactive clonotypes acquire memory phenotypes. Certain CD4+ clones were detected in the memory fraction at the pre-infection timepoint, suggesting participation of pre-existing cross-reactive memory T cells in the immune response to SARS-CoV-2.


INTRODUCTION
COVID-19 is a global pandemic caused by the novel SARS-CoV-2 betacoronavirus [1]. T cells are crucial for clearing respiratory viral infections and providing longterm immune memory [2,3]. Two major subsets of T cells participate in the immune response to viral infection in different ways: activated CD8+ T cells directly kill infected cells, while subpopulations of CD4+ T cells produce signaling molecules that regulate myeloid cell behaviour, drive and support CD8 response and the formation of long-term CD8 memory, and participate in the selection and affinity maturation of antigen specific Bcells, ultimately leading to the generation of neutralizing antibodies. In SARS-1 survivors, antigen-specific memory T cells were detected up to 11 years after the initial infection, when viral-specific antibodies were undetectable [4,5]. The T cell response was shown to be critical for protection in SARS-1-infected mice [6]. Patients with X-linked agammaglobulinemia, a genetic disorder associated with lack of B cells, have been reported to recover from symptomatic COVID-19 [7,8], suggesting that in some cases T cells are sufficient for viral clearance. Theravajan et al. showed that activated CD8+HLA-DR+CD38+ T cells in a mild case of COVID-19 significantly expand following symptom onset, reaching their peak frequency of 12% of CD8+ T cells on day 9 after symptom onset, and contract thereafter [9]. Given the average time of 5 days from infection to the onset of symptoms [10], the dynamics and magnitude of T cell response to SARS-CoV-2 is similar to that observed after immunization with live vaccines [11]. The exact immunodominant CD8+ and CD4+ SARS-CoV-2 epitopes are yet unknown. However, SARS-CoV-2-specific T cells were detected in COVID-19 survivors by activation following stimulation with SARS-CoV-2 proteins [12], or by viral protein-derived peptide pools [13,14]. Some of the T cells activated by peptide stimulation were shown to have a memory phenotype [13], and some potentially cross-reactive CD4+ T cells were found in healthy donors [14,15].
T cells recognise short pathogen-derived peptides presented on the cell surface of the Major Histocompatibility Complex (MHC) using hypervariable T cell receptors (TCR). TCR repertoire sequencing allows for the quantitative tracking of T cell clones in time, as they go through the expansion and contraction phases of the response. It was previously shown that quantitative longitudinal TCR sequencing is able to identify antigen-specific expanding and contracting T cells in response to yellow fever vaccination with high sensitivity and specificity [16][17][18]. Not only clonal expansion but also significant contraction from the peak of the response are distinctive traits of T cell clones specific to the virus [17].
In this study we use longitudinal TCRalpha and TCRbeta repertoire sequencing to quantitatively track T cell clones that significantly expand and contract after a mild COVID-19, and determine their phenotype. We reveal the dynamics and the phenotype of the memory cells formed after infection, identify pre-existing T cell memory clones participating in the response, and describe public TCR sequence motifs of SARS-nCoV-2-reactive clones, suggesting a response to immunodominant epitopes.

RESULTS
On March 14th (day 0) donor W, 27 year old female and donor M, 28 year old male, returned to Russia from the Rhône-Alpes region of France, one of the centers of the COVID-19 outbreak in France at the time. Upon arrival, according to local regulations, they were put into strict self-quarantine for 14 days. On day 3 of self-isolation both developed low grade fever, fatigue and myalgia, which lasted 4 days and was followed by a temporary loss of smell for donor M. On days 15, 30, 37 and 45 we collected peripheral blood samples from both donors. The presence of SARS-CoV-2 specific antibodies in the plasma was measured at all timepoints using SARS-CoV-2 S-RBD domain specific ELISA (Fig. S1). From each blood sample we isolated PBMCs (peripheral blood mononuclear cells, in two biological replicates), CD4+, and CD8+ T cells. Additionally, on days 30 and 45 we isolated four T cell memory subpopulations (Fig. S2): Effector Memory (EM: CCR7-CD45RA-), Effector Memory with CD45RA reexpression (EMRA: CCR7-CD45RA+), Central Memory (CM: CCR7+CD45RA-), and Stem Cell-like Memory (SCM: CCR7+CD45RA+CD95+). From all samples we isolated RNA and performed TCRalpha and TCRbeta repertoire sequencing as previously described [19]. For both donors, TCRalpha and TCRbeta repertoires were obtained for other projects one and two years prior to infection. Additionally, TCR repertoires of multiple samples for donor M -including sorted memory subpopulations -are available from a published longitudinal TCR sequencing study after yellow fever vaccination (donor M1 samples in [16]).
From previously described activated T cell dynamics for SARS-CoV-2 [9], and immunization with live vaccines [11], the peak of the T cell expansion is expected around day 15 post-infection, and responding T cells significantly contract by day 45. However, Weiskopf et al. [13] reports an increase of SARS-CoV-2-reactive T cells at later timepoints, peaking in some donors after 30 days following symptom onset. To identify groups of T cell clones with similar dynamics in an unbiased way, we used Principal Component Analysis (PCA) in the space of T cell clonal trajectories (Fig. 1b and c). In both donors, and in both TCRalpha and TCRbeta repertoires, we identified three clusters of clones with distinct dynamics. The first cluster (Fig. 1bc, purple) corresponded to abundant TCR clonotypes which had constant concentrations across timepoints, the second cluster (Fig. 1bc, green) showed contraction dynamics from day 15 to day 45, and the third cluster (Fig. 1bc, yellow), showed an unexpected clonal expansion from day 15 with a peak on day 37 followed by contraction. The clustering and dynamics are similar in both donors and are reproduced in TCRbeta (Fig. 1bc) and TCRalpha ( Fig. S3) repertoires. We next used edgeR, a software for differential gene expression analysis [20], to specifically detect changes in clonotype concentration between pairs of timepoints in a statistically reliable way. EdgeR uses biological replicate samples collected at each timepoint to train a negative-binomial noise model. We identified 374 TCRalpha and 373 TCRbeta clonotypes in donor W, and 797 TCRalpha and 672 TCRbeta in donor M significantly contracted from day 15 to day 45 (largely overlapping with cluster 2 of clonal trajectories). 438 TCRalpha and 533 TCRbeta for donor W, and 531 TCRalpha and 599 TCRbeta clonotypes for donor M were significantly expanded from day 15 to 37 (corresponding to cluster 3 of clonal trajectories). Reproducing the analysis using NoiseET, a Bayesian differential expansion model [21], yielded similar results (see Fig. S4), suggesting that our results are robust to the choice of statistical model and parameter inference procedure. Note that, to identify putatively SARS-CoV-2 reactive clones, we only used post-infection timepoints, so that our analysis can be reproduced in other patients and studies where pre-infection timepoints are unavailable. However, tracking the identified responding clones back to pre-infection timepoints reveals strong clonal expansions from pre-to post-infection (Fig. 1de, Fig. S3cd). For brevity, we further refer to clonotypes significantly contracted from day 15 to 45 as contracting clones and clonotypes significantly expanding from day 15 to 37 as expanding clones. For each contracting and expanding clone we determined their CD4/CD8 phenotype using separately sequenced repertoires of CD4+ and CD8+ subpopulations (see Methods). Both CD4+ and CD8+ subsets participated actively in the response (Fig. 1de). Interestingly, clonotypes expanding after day 15 were significantly biased towards the CD4+ phenotype, while contracting clones had balanced CD4/CD8 phenotype fractions in both donors (Fisher exact test, p < 0.0001 for both donors).
On days 30 and 45 we identified both contracting ( 2a,b) and expanding (SI Fig. 5a-c) T cell clones in the memory subpopulations of peripheral blood. Both CD4+ and CD8+ responding clones were found in the CM and EM subsets, however CD4+ were more biased towards CM, and CD8+ clones much more represented in the EMRA subset. A small number of both CD4+ and CD8+ responding clonotypes were also identified in the SCM subpopulation, which was previously shown to be a longlived T cell memory subset [22]. Intriguingly, a number of responding CD4+ clones, but very few CD8+ clones, were also represented in the repertoires of both donors 1 and 2 years before the infection. Pre-existing clones were expanded after infection, and contracted afterwards for both donors (SI Fig. 6  CoV-2 or any other coronavirus-specific TCR sequences in VDJdb, so the absence of matches suggests that contracting and expanding clones are unlikely to be specific for immunodominant epitopes of common pathogens covered in VDJdb. It was previously shown that TCRs recognising the same antigens frequently have highly similar TCR sequences [24,25]. To identify motifs in TCR amino acid sequences, we plotted similarity networks for significantly contracted (Fig. 2d-g) and expanded ( Fig.  S5d-g) clonotypes. In both donors we found clusters of highly similar clones in both CD4+ and CD8+ subsets for expanding and contracting clonotypes. Clusters were largely donor-specific, as expected, since our donors have dissimilar HLA alleles (SI Table 1) and thus each is likely to present a non-overlapping set of T cell antigens. The largest cluster, described by the motif TRAV35-CAGXNYGGSQGNLIF-TRAJ42, was identified in donor M's CD4+ contracting alpha chains. Clones from this cluster constituted 15.3% of all of donor M's CD4+ responding cells on day 15, suggesting a response to an immunodominant CD4+ epitope in the SARS-CoV-2 proteome. The high similarity of the TCR sequences of responding clones in this cluster allowed us to independently identify motifs from donor M's CD4 alpha contracting clones using the ALICE algorithm [26] (Fig.  S8). While the time dependent methods (Fig. 1) identify abundant clones, the ALICE approach is complementary to both edgeR and NoisET as it identifies clusters of T cells with similar sequences independently of their individual abundances. Finding highly similar responding TCR sequences in both donors, despite their different HLA types, suggests that some of the identified responding clones could be public, and are likely to be found in other SARS-CoV-2 infected individuals. Using a mechanistic model of TCR recombination [27,28] we predicted the probability of occurrence of the reactive clones (see Methods). For TCRbeta we also looked for the identified contracting and expanding clones in a large public dataset of TCRbeta repertoires from [29]. The TCRbeta occurrence probability correlated well with the frequency of contracting and expanding clones in this public dataset (Fig. S7), suggesting that at least a fraction of the SARS-CoV-2 responding clones are public and are likely to be shared between patients.

DISCUSSION
Using longitudinal repertoire sequencing, we identified a group of CD4+ and CD8+ T clones that contract after recovery from a coronavirus infection. Our response timelines agree with T cell dynamics reported by Theravajan et al. [9] for mild COVID-19, as well as with dynamics of T cell response to live vaccines [11]. Surprisingly, in both donors we also identified a group of predominantly CD4+ clonotypes which expanded from day 15 to day 37 after the infection. One possible explanation for this second wave of expansion is the priming of CD4+ T cells by antigen-specific B-cells, but there might be other mechanisms such as the migration of SARS-CoV-2 specific T cells from lymphoid organs. It is also possible that later expanding T cells are triggered by another infection, simultaneously and asymptomatically occurring in both donors around day 30. The major limitation of our study is that we do not know the specificity of contracted and expanded clones observed after infection. However, accumulation of TCR sequences from SARS-CoV-2-specific T cells will allow us to connect the TCR motifs we described to particular epitopes of the virus and estimate their relative contribution to the immune response. Databases of SARS-CoV-2-associated TCR sequences combined with sequencing of patients' TCR repertoires will allow for the tracking of the adaptive immune response in the early stages of the disease, and possibly identify features correlating with clinical outcomes. We showed that a large fraction of putatively SARS-CoV-2 reactive T cell clones are later found in memory subpopulations, and a subset of CD4+ clones were identified in pre-infection central memory subsets. The presence of SARS-CoV-2 crossreactive CD4 T cells in healthy individuals was recently demonstrated [14,15]. Our data further suggests that cross-reactive CD4+ T cells can participate in the response in vivo. It is interesting to ask if the presence of cross-reactive T cells before infection is linked to the mildness of the disease. Larger studies with cohorts of severe and mild cases with pre-infection timepoints are needed to address this question.

Donors and blood samples
Peripheral blood samples from two healthy volunteers, donor W (female, 27 years old) and donor M (male, 28 years old) were collected with written informed consent in a certified diagnostics laboratory. Both donors gave written informed consent to participate in the study under the declaration of Helsinki. HLA alleles of both donors (SI Table 1) were determined by an in-house cDNA highthroughput sequencing method.

SARS-CoV-2 S-RBD domain specific ELISA
An ELISA assay kit developed by the National Research Centre for Hematology was used for detection of anti-S-RBD IgG according to the manufacturer's protocol. The relative IgG level was calculated by dividing the OD (optical density) values by the mean OD value of the cut-off positive control serum supplied with the Kit. OD values of 2 samples (d37 and d45) for donor M exceeded the limit of linearity for the Kit. In order to properly compare the relative IgG levels between d30, d37 and d45, these samples were diluted 1:400 instead of 1:100, the ratios d37:d30 and d45:d30 were calculated and used to calculate the relative IgG level of d37 and d45.

TCR library preparation and sequencing
TCRalpha and TCRbeta cDNA libraries preparation was performed as previously described in [19]. RNA was isolated from each sample using Trizol reagent according to the manufacturer's instructions. A universal primer binding site, sample barcode and unique molecular identifier (UMI) sequences were introduced using the 5'RACE technology with TCRalpha and TCRbeta constant segment specific primers for cDNA synthesis. cDNA libraries were amplified in two PCR steps, with introduction of the second sample barcode and Illumina TruSeq adapter sequences at the second PCR step. Libraries were sequenced using the Illumina NovaSeq platform (2x150bp read length).

TCR repertoire data analysis
Raw data preprocessing. Raw sequencing data was demultiplexed and UMI guided consensuses were built using migec v.1.2.7 [30]. Resulting UMI consensuses were aligned to V and J genomic templates of the TRA and TRB locus and assembled into clonotypes with mixcr v.2.1.11 [31]. See SI Table 2 for the number of cells, UMIs and unique clonotypes for each sample.
Identification of clonotypes with active dynamics. Principal component analysis (PCA) of clonal trajectories was performed as described before [16]. First we selected clones which were present among the top 1000 abundant in any of post-infection PBMC repertoires. Next, for each clone we calculated its frequency at each post-infection timepoint and divided this frequency by the maximum frequency of this clone for normalization. Then we performed PCA on the resulting normalized clonal trajectory matrix and identified three clusters of trajectories with hierarchical clustering with complete linkage, using Euclidean distances between trajectories. We identify statistically significant contractions and expansions with edgeR as previously described [17], using FDR adjusted p < 0.01 and log 2 fold change threshold of 1. NoisET implements the Bayesian detection method described in [21]. Briefly, a two-step noise model accounting for cell sampling and expression noise is inferred from replicates, and a second model of expansion is learned from the two timepoints to be compared. The procedure outputs the posterior probability of expansion or contraction, and the median estimated log 2 fold change, whose thresholds are set to 0.05 and 1 respectively.
TCR generative probability estimation. Models of V(D)J-recombination for the TCRalpha and TCRbeta loci were inferred for each individual using IGoR [32]. Selection models were then inferred and evaluated using SONIA [28] to estimate TCR occurrence probabilities.

Data availability
Raw sequencing data are deposited to the Short Read Archive (SRA) accession: PRJNA633317. Processed TCRalpha and TCRbeta repertoire datasets, resulting repertoires of SARS-CoV-2-reactive clones, and raw data preprocessing instructions can be accessed from: https: //github.com/pogorely/Minervina_COVID.  current state of the science. Immunity p S1074761320301837.    . TCR occcurence probability predicts public responding clonotypes. a Probability of TCR generation and selection (Ppost) of TCRbeta clones with active dynamics (both contracted and expanded from both donors) is strongly correlated with the occurrence of these clones in the large public database of TCRbeta repertoires from Emerson et al. [29]. (b)Distribution for TCR occurency probability of SARS-CoV-2 responding TCRbeta clonotypes. Dashed line shows TCR occurrence probability distribution for all clones in total repertoire. Some identified clones are public and have large recombination probabilities, and thus are expected to be shared with additional SARS-CoV-2 patients.  15. Similarity network shows ALICE hits (clones in repertoire with more neighbours than expected by chance), which differ by 2 mismatches or less in TCRalpha amino acid sequence. Darker colors indicate larger frequency of clone in the repertoire, vertex size indicates degree. The majority (54%, 99/183) of hits identified by the algorithm correspond to a single large TRAV35/TRAJ42 cluster of CD4+ contracting clones also seen on Fig. 1d.