Generation of self-reactive, shared T-cell receptor α chains in the human thymus

The T-cell receptor (TCR) repertoire is generated in a semistochastic process of gene recombination and pairing of TCR α to TCR β chains, with an estimated total TCR diversity of >10^8. Despite this high diversity, similar or identical TCR chains recur in immune responses. Here, we analyzed the thymic generation of TCR sequences previously associated with the recognition of self- and nonself-antigens, represented by sequences associated with autoimmune diabetes and HIV, respectively. Unexpectedly, in the CD4+ compartment, TCR α chains associated with the recognition of self-antigens were generated in significantly higher numbers than TCR α chains associated with the recognition of nonself-antigens. Analysis of the circulating repertoire further showed that these chains are neither lost in negative selection nor predominantly converted to the regulatory T-cell lineage. The high abundance of self-reactive TCR α chains in multiple individuals suggests that the human thymus has a predilection to generate self-reactive TCR α chains independently of HLA type, and that the individual risk of autoimmunity may be modulated by the TCR β repertoire paired with these chains.


Introduction
T-cell functionality depends on antigen recognition by a specific T-cell receptor (TCR), a heterodimeric cell surface receptor that binds to a complex formed by a peptide antigen in the groove of an HLA molecule. The genes encoding the two TCR chains, TCRα and TCRβ, are assembled from gene segments in somatic recombination events during T-cell development, producing a highly diverse repertoire of surface TCR heterodimers [1]. The theoretical upper limit of TCR diversity has been calculated to range from 10^15 to 10^20, greatly exceeding the total number of T cells in the human body, while the actual human TCR repertoire has been estimated to exceed 10^8 [2][3][4][5]. Most of this diversity is concentrated in the complementarity determining region 3 (CDR3), the junctional region generated by imprecise joining of the gene segments. Structurally, the hypervariable loops encoded by the CDR3 sequences are also mainly responsible for binding the antigenic peptide, and they are thus likely to be the most important determinants of specificity [6].
Given this high diversity, the default expectation has been that the TCR repertoire of each individual would be largely unique, or private. However, a large and growing body of research has shown that different individuals often use similar or identical TCRs in antigen recognition, at least for one chain of the heterodimer, whether the target is a viral, self- or tumor protein [7,8]. This phenomenon of shared, or public, clones has been especially prominent in chronic or latent viral infections, such as cytomegalovirus or herpes simplex virus infection [9]. Several studies also suggest that public responses may be qualitatively different from private responses. For example, in some murine models of autoimmunity the destructive autoreactive responses are dominated by public clones [10,11], and studies of long-term HIV-positive non-progressors have identified highly cross-reactive public responses controlling HIV replication [12].
Underlying such public responses is the realization that a surprisingly large fraction of the naive T-cell repertoire can be shared by unrelated individuals. In peripheral blood, this overlap has recently been reported to be up to 10% for the TCRβ chain and 26.5% for the TCRα chain [13], and in the thymus 6.1% and 46.7% for TCRβ and TCRα chains, respectively [14]. Much of this sharing has been suggested to result from convergent recombination in the thymus, i.e., the biased generation of TCR sequences converging on a number of chains that arise repeatedly, often because they contain few or no junctional modifications [7]. Here, we studied the thymic generation of TCR chains previously identified to be associated with pathological conditions and show that many of these can also be found in the thymus of unrelated individuals. Surprisingly, our results show that the human thymus is predisposed to generating self-reactive TCRα chains that persist in the periphery within the population of conventional T cells.

Samples
The study and all sample collections were approved by the Ethics Committee of the Hospital District of Helsinki and Uusimaa, and written informed consent was obtained from the subjects or their parents. Six thymic tissue samples were obtained from immunologically healthy infants undergoing corrective cardiac surgery for a congenital heart defect (aged 4-8 months; 2/6 female). Two of the thymus donors (samples A and B) were monozygotic twins, and the genetic impact on their TCR repertoires has been analyzed previously [15]. Thymocytes were mechanically extracted from the resected tissue, and aliquots of 10 million thymocytes were stored as dry pellets at −70 °C.
The peripheral blood samples were derived from the Finnish Pediatric Diabetes Register [16]. The register collects data and biological samples from children with newly diagnosed type 1 diabetes (T1D) and their first-degree relatives. The participants are screened for HLA-conferred susceptibility to T1D and tested for five T1D-associated autoantibodies: islet cell antibodies (ICA), insulin autoantibodies (IAA), and antibodies to glutamic acid decarboxylase 65 (GADA), islet antigen 2 (IA2A), and zinc transporter 8 (ZnT8A). The current study includes ten siblings (aged 3-14 years; 5/10 female) with a heterozygous HLA-DR3/DR4 genotype. The children tested negative for the T1D-associated autoantibodies and were free of clinical symptoms of diabetes at the time of sampling and for 2-4 years thereafter. Peripheral blood mononuclear cells (PBMC) were isolated by Ficoll-Paque Plus (GE Healthcare, Chicago, Illinois, USA) gradient centrifugation, and an aliquot of five million cells was frozen at −140 °C in FCS-DMSO freezing medium.

Sequencing
DNA was extracted from frozen thymocytes and PBMCs using DNeasy or QIAsymphony kits (both from Qiagen, Hilden, Germany) according to the manufacturer's instructions. TCRα and TCRβ sequencing was performed with the immunoSEQ sequencing service (Adaptive Biotechnologies, Seattle, Washington, USA) as previously described [5]. The assay uses a standardized quantity of quality-controlled DNA and consists of a multiplex PCR that amplifies rearranged TCRα and TCRβ genes over a length sufficient to cover the entire CDR3 region and to identify the V and J gene segments of TCRα and the V, D, and J gene segments of TCRβ. Amplicons were sequenced on the Illumina platform. TCRα and TCRβ gene definitions were based on the IMGT database (www.imgt.org). Primer bias was corrected using a synthetic repertoire of TCRs and barcoded, spiked-in synthetic templates. All sequence datasets analyzed in the current study are available at the immuneACCESS data repository (clients.adaptivebiotech.com/immuneaccess).

Database
Epitope-specific TCRs were collected from the literature, principally from the previously published databases McPAS-TCR [17] and VDJdb [18]. Some manually collected TCRs were also included [19][20][21][22]. We only accepted TCRs whose specificity had been established by HLA multimer staining or verified by peptide stimulation assays. The entire list of epitope-specific TCRs is available in Supplement 1.

TCR analysis
TCR sequences were exported from immunoSEQ as sample files using Export Sample (v2) (Adaptive Biotechnologies, Seattle, Washington, USA). The scripts used to analyze sequence overlap and to search the sequenced repertoires for the epitope-specific TCRs were written in R (www.r-project.org) and Python (www.python.org). All scripts are available on request. For random subsampling of the epitope-specific TCRs to a desired sample size, we used the random.sample function of Python's random module.
Clonal overlap was assessed with the Jaccard index, defined as the size of the intersection of two data sets (A and B) divided by the size of their union: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$. The average detection rates and abundances of self- and nonself-associated TCRs were compared using the two-sided Wilcoxon signed-rank test. Statistical significance was defined as p < 0.05. Statistics were calculated using SPSS Statistics software version 24.
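As a minimal sketch of the overlap measure defined above, assuming each repertoire has already been reduced to clonotype keys (here a hypothetical (V gene, CDR3 amino acid sequence, J gene) tuple), the Jaccard index follows directly from Python set operations. The helper `top_fraction` is likewise hypothetical: it shows one plausible way to restrict the comparison to the most abundant clonotypes, as in Fig. 1C, and is not taken from the authors' scripts.

```python
from typing import Hashable, Iterable, Mapping


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard index J(A, B) = |A ∩ B| / |A ∪ B| of two collections of clonotype keys."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:  # convention chosen here: two empty repertoires have index 0
        return 0.0
    return len(set_a & set_b) / len(union)


def top_fraction(clone_sizes: Mapping[Hashable, int], fraction: float) -> set:
    """Hypothetical helper: keys of the most abundant `fraction` of clonotypes.

    Ties at the cut-off are broken by insertion order; at least one key is kept.
    """
    ranked = sorted(clone_sizes, key=lambda key: clone_sizes[key], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])


# Toy clonotype keys (V gene, CDR3 amino acids, J gene) mapped to clone sizes.
sample_a = {("TRAV1-2", "CAVRDSNYQLIW", "TRAJ33"): 50, ("TRAV12-2", "CAVNFGNEKLTF", "TRAJ48"): 3}
sample_b = {("TRAV1-2", "CAVRDSNYQLIW", "TRAJ33"): 41, ("TRAV8-4", "CAVSDGQKLLF", "TRAJ16"): 1}
print(jaccard(sample_a, sample_b))  # one shared key out of three distinct keys
```

For a top-1% comparison, one would call `jaccard(top_fraction(sample_a, 0.01), top_fraction(sample_b, 0.01))`; whether the top fractions were defined per sample in exactly this way is an assumption.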

TCR clone distribution in thymus samples
Thymus samples were obtained from six immunologically healthy infants (A-F) during corrective cardiac surgery. From each donor, an aliquot of 10 million mechanically extracted thymocytes was sequenced for TCRα and TCRβ chains. Two of the donors (A and B) were monozygotic twins. As previously reported, their TCR repertoires showed a genetic impact on V and J gene usage and on the generation of junctional sequences, but not on thymic selection [15]. On average, we obtained 5.9 and 1.1 million unique TCRα and TCRβ nucleotide sequences, respectively. Of the unique TCRα chains, 31.2%, and of the unique TCRβ chains, 78.8%, were in-frame (Supplement 2). For unknown reasons, sequencing of sample E produced clearly fewer TCR sequences than the other samples.
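As an illustrative sketch of what "in-frame" means for a CDR3 junction, assuming the nucleotide string starts at the first base of the conserved cysteine codon, a junction can be classified as in-frame when its length is a multiple of three, and as productive when it additionally contains no stop codon in that frame. The function names are hypothetical and do not reproduce the sequencing service's own annotation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(cdr3_nt: str) -> bool:
    """A junction is in-frame if its nucleotide length is a multiple of three."""
    return len(cdr3_nt) % 3 == 0


def is_productive(cdr3_nt: str) -> bool:
    """In-frame and free of stop codons, reading codons from the first base."""
    seq = cdr3_nt.upper()
    if not is_in_frame(seq):
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def in_frame_fraction(unique_cdr3_nt) -> float:
    """Fraction of unique nucleotide sequences that are in-frame."""
    unique = set(unique_cdr3_nt)
    if not unique:
        return 0.0
    return sum(is_in_frame(s) for s in unique) / len(unique)
```

For example, `in_frame_fraction(["TGTGCCGTGAACTTC", "TGTGCCGTGAACTT"])` returns 0.5, because only the first (15-nt) toy sequence has a length divisible by three.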
N. Heikkilä et al.
In peripheral repertoires, TCR clone sizes display broad distributions and are thought to follow a power law, in which the frequency of clones of a given size varies as a power of that size [23]. This phenomenon was particularly striking within the thymic TCRα repertoires, where the probabilities of clone sizes fall on a straight line on a double logarithmic scale, a hallmark of a power-law distribution (Fig. 1A). Consequently, the thymic TCR repertoire contains some remarkably high-abundance TCRα clones. The most abundant clones were represented on average by 872 genomes (range 106-1249), while the majority of clones had a clone size of only one. The number of non-templated insertions between the TRAV and TRAJ segments was lower in the high-abundance clones than in the entire repertoire, suggesting that they are closer to the germline and more easily generated (Fig. 1B). Clonotype sharing, measured with the Jaccard index, was also more common among the highly abundant clonotypes (Fig. 1C). Indeed, the most abundant clonotypes were typically shared among all individuals and displayed a large clone size in all of them (Fig. 1D). A broad distribution of clone sizes and a lower number of non-templated insertions among the most abundant clones were also observed in the TCRβ repertoire, although less clearly than in the TCRα repertoire (Fig. 1A, Supplement 3).
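To make the "straight line on a double logarithmic scale" concrete, the sketch below fits an ordinary least-squares line to log10 P(s) versus log10 s, where P(s) is the fraction of clones of size s; the Fig. 1 caption reports a slope of roughly −3 for the TCRα repertoires. This is only a crude estimator chosen for illustration (maximum-likelihood methods are generally preferred for formal power-law fits), the function name is hypothetical, and it is not necessarily how Fig. 1A was produced.

```python
import math
from collections import Counter


def loglog_slope(clone_sizes):
    """Least-squares slope of log10 P(s) against log10 s, where P(s) is the
    fraction of clones with size s (one positive integer per clone)."""
    counts = Counter(clone_sizes)  # clone size -> number of clones of that size
    if len(counts) < 2:
        raise ValueError("need at least two distinct clone sizes")
    total = sum(counts.values())
    points = [(math.log10(s), math.log10(c / total)) for s, c in counts.items()]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Toy repertoire built so that P(s) is exactly proportional to s**-3.
toy = [1] * 512 + [2] * 64 + [4] * 8 + [8] * 1
print(round(loglog_slope(toy), 6))  # -3.0
```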

Database of TCRs associated with self- and nonself-epitopes
We then performed a literature search to create a database of TCR sequences of known specificity, concentrating on two categories of antigens: first, TCRs identified as specific for self-epitopes, and second, TCRs specific for nonself-epitopes. The majority of TCRs were derived from two previously published databases, McPAS-TCR [17] and VDJdb [18], but manually searched sequences specific to islet antigens were also included [19][20][21][22]. In the final database, the self-reactive TCR chains were limited to those recognizing islet antigens, since TCRα chains with other self-specificities were particularly scarce in the literature. As a representative nonself-antigen we selected HIV because, unlike herpesviruses or influenza, its antigens had not been encountered by our HIV-negative donors. Furthermore, we only included TCRs whose specificity had been determined with sufficient rigor. Our final reference database consisted of 546 unique TCRα clones and 2407 unique TCRβ clones specific for T1D- or HIV-related epitopes (Supplement 1). We further divided the TCRs into CD4+ and CD8+ subsets according to the CD4/CD8 phenotype and HLA class I/II restriction of the original T cells from which each TCR was derived (hereafter CD4+ and CD8+; Table 1). The majority of the T1D-associated TCRs were derived from CD4+ T cells restricted by the HLA-DR4 or HLA-DR3 haplotypes associated with a high T1D risk [24].

Preferential thymic generation of TCRα clonotypes used by CD4+ T cells associated with T1D
To compare the TCR sequences in our reference database with the thymus samples, we required an exact match of the CDR3 amino acid sequence together with matching TCR V and J gene segments. Using these criteria, on average 24.1% of the 184 HIV-associated CD4+ TCRα chains in the database were found in the thymus samples. Unexpectedly, a significantly higher fraction of the T1D-associated CD4+ TCRα chains was detected in the thymus samples (32.8% of the 130 sequences in the database; Wilcoxon signed-rank test, p = 0.028; Fig. 2A). On average, each CD4+ T1D-associated amino acid chain was encoded by 2.5 unique nucleotide sequences, whereas each CD4+ HIV-associated amino acid chain was encoded by 2.1 unique nucleotide sequences (p = 0.028; Fig. 2B). Moreover, since the sequencing assay is based on genomic DNA, it allows a reasonable estimate of clonal copy numbers. The average clone size of the T1D-associated TCRα amino acid chains was significantly larger than that of the HIV-associated chains (12.4 vs. 9.9, p = 0.028; Fig. 2C).
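The matching rule and the three per-sample summaries described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Clone` record and its field names are hypothetical (they are not the immunoSEQ export columns), V and J genes are compared as exact strings, and the clone size of an amino acid chain is taken as the sum of genomic template counts over all nucleotide sequences encoding it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """Hypothetical record for one unique TCRα nucleotide sequence in a sample."""
    v_gene: str
    j_gene: str
    cdr3_aa: str
    cdr3_nt: str
    templates: int  # genomic template count, used as the clone size


def match_reference(sample, reference):
    """Exact matching of (v_gene, cdr3_aa, j_gene) reference keys against a sample."""
    by_chain = defaultdict(list)  # amino acid chain key -> its nucleotide clones
    for clone in sample:
        by_chain[(clone.v_gene, clone.cdr3_aa, clone.j_gene)].append(clone)
    ref = set(reference)
    detected = [key for key in ref if key in by_chain]
    if not detected:
        return {"detection_rate": 0.0, "mean_unique_nt": 0.0,
                "mean_clone_size": 0.0, "combined_clone_size": 0}
    n_unique_nt = [len({c.cdr3_nt for c in by_chain[k]}) for k in detected]
    sizes = [sum(c.templates for c in by_chain[k]) for k in detected]
    return {
        "detection_rate": len(detected) / len(ref),        # fraction of reference found
        "mean_unique_nt": sum(n_unique_nt) / len(detected),  # nt variants per found chain
        "mean_clone_size": sum(sizes) / len(detected),       # templates per found chain
        "combined_clone_size": sum(sizes),                   # all found chains together
    }
```

In this sketch, a reference chain encoded by two distinct nucleotide sequences with 3 and 2 templates counts as one detected chain with two unique nucleotide sequences and a clone size of 5.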
For TCRα chains derived from CD8+ T cells, amino acid sequences associated with T1D were found less frequently in the thymus samples than those associated with HIV (25.2% of 80 sequences in the database vs. 37.2% of 152 sequences in the database, p = 0.028), and there was no difference in the average number of nucleotide sequences encoding them or in their average clone size (Fig. 2B,C).
To ascertain that the results were not skewed by the different numbers of T1D- and HIV-associated sequences in our database, we drew five random subsets of HIV-associated sequences matched to the size of the T1D sequence sets and used these to search for matching sequences in the thymus samples. Again, CD4+ T1D-associated TCRαs were detected more frequently than HIV-associated TCRαs (32.8% vs. a mean of 24.2% for the five resampled HIV-TCR sets, p = 0.028; Fig. 2D), whereas CD8+ HIV-associated TCRαs were detected more frequently than T1D-associated TCRαs (a mean of 40.3% for the five resampled HIV-TCR sets vs. 25.2%, p = 0.028; data not shown). The number of unique nucleotide sequences was again higher for CD4+ T1D- than HIV-associated chains (2.5 vs. a mean of 2.1 for the five HIV resamplings, p = 0.028; Fig. 2E), as was the average clone size (12.4 vs. a mean of 9.6, p = 0.028; Fig. 2F).
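The size-matched resampling can be sketched as below, for instance drawing 130 of the 184 CD4+ HIV-associated keys five times. As assumptions for illustration, the function name is hypothetical, a seeded `random.Random` instance is used (its `sample` method is the same routine as the module-level `random.sample`, but reproducible), and only the detection rate is averaged; the other summaries would be averaged over the same draws.

```python
import random
from statistics import mean


def resampled_detection_rate(reference, detected, target_size, n_draws=5, seed=None):
    """Mean detection rate over `n_draws` random subsets of `reference`,
    each of size `target_size`, drawn without replacement."""
    # sample() needs a sequence; sorting also makes seeded runs reproducible.
    population = sorted(set(reference))
    if not 0 < target_size <= len(population):
        raise ValueError("target_size must be between 1 and the reference size")
    detected = set(detected)
    rng = random.Random(seed)
    rates = []
    for _ in range(n_draws):
        subset = rng.sample(population, target_size)
        rates.append(sum(key in detected for key in subset) / target_size)
    return mean(rates)
```

Here `reference` would hold the HIV reference keys and `detected` the subset of those keys found in one sample; drawing subsets of the larger set avoids comparing detection fractions computed from denominators of different sizes.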
Overall, the average combined clone size of the CD4+ T1D-associated TCRα chains identified in the thymus samples was 577. The average combined clone size of the HIV-associated TCRα chains, measured from the size-matched resamplings of the database, was 328 (p = 0.028).
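The p value of 0.028 recurs throughout the thymus comparisons (n = 6), and 0.005 recurs in the peripheral blood comparisons (n = 10). Assuming these are asymptotic (normal-approximation) Wilcoxon signed-rank p values without continuity correction, the sketch below shows that they are the smallest values attainable when all paired differences share the same sign and have distinct magnitudes: W+ then equals n(n+1)/2, giving z = (21 − 10.5)/√22.75 ≈ 2.20 for n = 6. This is consistent with the Discussion's statement that the differences were found in every thymus sample. The exact test would instead give 2/2^6 ≈ 0.031 and 2/2^10 ≈ 0.002. The function name and the paired input values are illustrative only, not study data.

```python
import math


def wilcoxon_signed_rank_normal(x, y):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences are dropped, tied |differences| get average ranks, the
    variance is tie-corrected, and no continuity correction is applied.
    Returns (z, p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Illustrative pairs in which every difference is positive (differences 1..n).
for n in (6, 10):
    z, p = wilcoxon_signed_rank_normal(range(2, n + 2), [1] * n)
    print(n, round(z, 3), round(p, 3))  # prints "6 2.201 0.028", then "10 2.803 0.005"
```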
Despite the relatively high number of TCRβ chains in the reference database, there were very few matches with the thymus samples (Supplement 4). For both CD4+ and CD8+ sequences, only 1-2 matches of T1D-associated sequences were found in 4/6 thymus samples, each encoded by a single nucleotide sequence and with a clone size of one. CD4+ HIV-associated sequences were similarly rare in the thymus samples, with 1-3 matches in 4/6 samples, encoded on average by 1.3 nucleotide sequences and with an average clone size of 1.4. CD8+ HIV-associated sequences were found slightly more often: on average, 1.8% of the 1819 HIV-associated sequences were found in the thymus samples, encoded by 1.4 nucleotide sequences and with a clone size of 1.8. Because of these low numbers, the TCRβ sequences were not analyzed further.

T1D-associated TCRα chains persist in the periphery
Since the increased frequency of T1D-associated TCRα chains was detected in unselected thymocyte samples, it was possible that these sequences were obtained from immature thymocytes before negative selection, so that the clones might be deleted before thymic egress. We therefore studied the presence of the T1D- and HIV-associated TCRα chains in peripheral blood samples from ten healthy children aged 3-14 years. All donors carried the high-risk HLA-DR3 and DR4 haplotypes for T1D but had no autoantibodies to islet antigens (GADA, ICA, IA2A, IAA, ZnT8A) and no signs of clinical T1D during a follow-up of 2-4 years after sampling. From these samples we obtained an average of 825 000 total TCRα reads, corresponding to 587 000 unique nucleotide sequences (Supplement 2).
Fig. 2. Epitope-specific TCRα sequences in thymus samples. The frequency of CD4+ and CD8+ T1D- and HIV-associated TCRα reference database sequences found in the six thymus samples (A), the average number of unique nucleotide sequences encoding each identified amino acid chain (B), and their average clone sizes (C). The T1D- and HIV-associated sequences are indicated with circles and squares, respectively. The Wilcoxon signed-rank test was applied to compare the means between the two groups. Comparison of the average frequency of CD4+ T1D-associated sequences (D), the average number of unique nucleotide sequences encoding each identified T1D-associated amino acid chain (E), and their average clone sizes (F) with the HIV-associated reference database resampled five times to the size of the T1D database. The bars show the average and standard deviation. In (A-C) the six individual thymus samples are shown; in (D-F) n = 6.
Again, both T1D- and HIV-associated TCRαs were readily detected in the peripheral blood samples. As in the thymus, CD4+ T1D-associated TCRαs were detected more often than HIV-associated TCRαs (T1D 15.7% of the 130 sequences in the database vs. HIV 8.4% of the 184 sequences in the database, p = 0.005), while the opposite was observed for CD8+ reference sequences (T1D 10.8% of the 80 sequences in the database vs. HIV 18.2% of the 152 sequences in the database, p = 0.005; Fig. 3A). The number of unique nucleotide sequences encoding each detected amino acid chain was higher for CD4+ T1D- than HIV-associated TCRαs (1.5 vs. 1.3, p = 0.013; Fig. 3B), but the average clone size was similar (Fig. 3C). For the reference sequences obtained from CD8+ T cells, the number of unique nucleotide sequences encoding the detected amino acid chains was similar, but some of the HIV-associated clones appeared to be expanded, and the average clone size was therefore higher for HIV-associated than for T1D-associated sequences (17.7 vs. 3.4, p = 0.005). The children participating in our study were all HIV-seronegative.
Finally, we repeated the random sampling of the HIV reference database to the number of sequences contained in the T1D database, as described for the thymus samples. The HIV-related sequences were again sampled five times and the average values compared with the T1D data, with results essentially similar to those obtained with the full reference database (Fig. 3D-F). In the peripheral samples, the average combined clone size of CD4+ T1D-associated TCRα chains was 72.9, and that of HIV-associated sequences, measured after matched resampling, was 32.9 (p = 0.005).

T1D-associated TCRα clonotypes are not enriched within peripheral blood regulatory T cells
Another possibility was that, although originally identified in effector T cells, some of the T1D-associated sequences might be diverted to the regulatory T-cell (Treg) lineage. We sorted conventional T (Tconv) cells (CD3+CD4+CD25−) and Tregs (CD3+CD4+CD25high) from peripheral blood samples of three immunologically healthy adults aged 21-30 years (Supplement 5). TCRα sequencing of the two cell subsets produced on average 308 000 and 53 000 total nucleotide reads, corresponding to 247 000 and 39 000 unique sequences, for the Tconv and Treg populations, respectively (Supplement 2).
Since the TCRα sequences were derived from sorted CD4+ T cells, we only searched the samples for the CD4+ sequences in the reference database. In the Treg samples, only 3-4 T1D-associated reference sequences were detected (2.6% of the 130 sequences in the database; Fig. 4A), most likely because of the smaller number of unique sequences in these samples. These TCRα chains were encoded by few unique nucleotide sequences (average 1.3; Fig. 4B), and their clone sizes were small (average 1.2 genomes; Fig. 4C). In the Tconv samples, the average detection rate of T1D-associated chains was 11.8% of the 130 sequences, well within the range of detection rates in the unsorted peripheral blood samples (10.0-20.8%). Likewise, the number of unique nucleotide sequences encoding the T1D-associated amino acid chains and the average clone size were similar in sorted Tconv and unsorted peripheral blood samples (Fig. 4). Thus, the depletion of Treg cells did not result in the disappearance of the T1D-associated sequences.
Fig. 3. Epitope-specific TCRα sequences in peripheral blood samples. The frequency of CD4+ and CD8+ T1D- and HIV-associated TCRα reference database sequences found in the ten peripheral blood samples (A), the average number of unique nucleotide sequences encoding each identified amino acid chain (B), and their average clone sizes (C). The T1D- and HIV-associated sequences are indicated with circles and squares, respectively. The Wilcoxon signed-rank test was applied to compare the means between the two groups. Comparison of the average frequency of CD4+ T1D-associated sequences (D), the average number of unique nucleotide sequences encoding each identified T1D-associated amino acid chain (E), and their average clone sizes (F) with the HIV-associated reference database resampled five times to the size of the T1D database. In (A-C) the ten individual donors are shown; in (D-F) n = 10.
Most of the T1D-associated chains detected in Tregs were also present in Tconv cells. Altogether, eight and 26 unique T1D-associated TCRα amino acid chains were detected in the Treg and Tconv samples, respectively, and six of these TCRαs were found in both the Treg and Tconv populations.
Of the CD4+ HIV-associated TCRα chains, only one was detected in the Treg samples. It was encoded by a single unique nucleotide sequence in each sample, and its average clone size was 3.7. In the Tconv samples, the frequency of CD4+ HIV-associated TCRα chains, the number of nucleotide sequences encoding them, and their average clone sizes were all within the range of values observed for unsorted peripheral blood samples (Supplement 6).

Generation of GAD65- and insulin/proinsulin-associated TCRα chains
For a self-antigen to affect thymocyte selection, it has to be present in the thymus [25]. Of the islet antigens, insulin has been conclusively demonstrated to be expressed as a tissue-specific antigen in thymic antigen-presenting cells, whereas conflicting reports have been published for GAD65 [26][27][28]. Our T1D reference database contained 46 CD4+ TCRα chains associated with insulin/proinsulin and 83 associated with GAD65. One further chain was derived from a T cell specific for islet-specific glucose-6-phosphatase catalytic subunit-related protein (IGRP) and was not analyzed separately.
Of the CD4+ TCRα chains associated with GAD65, on average 29.5% were detected in the thymus samples, and of those associated with insulin/proinsulin, 37.7%; however, variation was considerable and the difference was not statistically significant (p = 0.12; Fig. 5A). The average number of unique nucleotide sequences encoding the GAD65-associated TCRα amino acid chains was slightly higher than that for insulin/proinsulin (2.8 vs. 2.1, p = 0.028; Fig. 5B), but the average clone sizes were similar (11.9 vs. 14.1, p = 0.17; Fig. 5C). Resampling the GAD65 reference database five times to match the size of the insulin/proinsulin database produced similar results (Supplement 7).
In the peripheral blood samples, no differences were detected between TCRαs associated with GAD65 and those associated with insulin/proinsulin (Fig. 5D-F), and sequences associated with both antigens were found at a similar frequency in sorted Tconv cells.

Discussion
With the exception of rare inherited forms of autoimmunity, autoimmune diseases require environmental influences [29]. Infections are well established as triggers of autoimmunity, as shown most clearly by rheumatic fever after streptococcal pharyngitis and by spondyloarthropathies after bacterial enteritis in individuals carrying HLA-B27 [30][31][32]. Increasing evidence also links viral infections to autoimmunity. Epstein-Barr virus infection seems to be required, though not sufficient on its own, for the development of multiple sclerosis, and it has also been linked to other autoimmune diseases, including systemic lupus erythematosus, rheumatoid arthritis, and Sjögren's syndrome [33,34]. More recently emerged pathogens have also been implicated. Human immunodeficiency virus increases the risk of systemic autoimmunity even when viral replication is suppressed [35,36], and the current COVID-19 outbreak is also associated with autoimmune manifestations [37,38]. Infections have also been linked to type 1 diabetes and to the pancreatic autoantigens analyzed in the present study. In particular, Coxsackie B enterovirus can trigger pancreatic autoimmunity in animal models, and in humans it has been linked to cases in which autoreactivity first targets GAD65 rather than insulin/proinsulin [39][40][41]. The evidence supporting a role for infections in the development of autoimmunity is thus strong, and the suggested pathogenetic mechanisms, although still poorly understood, include molecular mimicry, bystander activation, and epitope spreading [42].
However, the individual risk of developing autoimmunity is clearly also modified by a complex interplay of genetic susceptibility and other environmental factors. The general increase in the prevalence of autoimmune diseases in industrialized countries has been explained by changes in early microbial exposure. The hygiene hypothesis postulates that in the absence of previously prevalent microbial pathogens, the immune system is prone to react inappropriately to self-antigens [43]. Against this background, our results add a potential new risk factor for autoimmunity: the thymic generation of shared, autoantigen-associated TCRα chains and their semistochastic pairing with TCRβ partners.
Methods to define the epitope specificity of TCRs have recently evolved from the experimental characterization of HLA tetramer-binding T cells to computational predictions in large T-cell repertoires. TCR repertoires with predicted epitope specificity typically contain public receptor chains with high generation probabilities and form clusters according to shared amino acid motifs [44]. Here, we have used experimentally well-validated epitope-specific TCR chains to investigate the thymic generation and selection of public TCR clonotypes. We have previously reported that a remarkably large fraction of the preimmune thymic TCRα repertoire is shared between unrelated individuals. These public TCRαs had fewer non-templated nucleotides and higher generation probabilities than the non-shared repertoire, and their clone sizes were often large [14]. In the TCRβ chain, interindividual sharing was much lower, probably because the presence of two gene junctions decreases the likelihood of convergent recombination. These findings likely also explain why, in the present study, the epitope-specific public repertoire is so conspicuous in the TCRα but not in the TCRβ repertoire.
In our data, a substantial fraction of the T1D-associated TCRα amino acid sequences from the reference database were found in the thymus, and for CD4+ reference sequences they were more common than HIV-associated sequences, were encoded by a higher number of unique nucleotide sequences, and displayed a larger average clone size. Although the differences were not large, it is notable that they were highly consistent and found in every one of the thymus samples. This consistency also indicates that the phenomenon is independent of HLA type, since, with the exception of one pair of twins, the samples were obtained from unrelated individuals. This may be because HLA binding is mainly mediated by hypervariable loops encoded by germline sequences in the V genes and is thus likely to affect mostly V gene usage patterns [6]. Indeed, the aspect of the TCR repertoire shown to be most strongly affected by heritable factors is TCR gene segment usage [13,15,45].
Our data also clearly showed that the higher detection rate of CD4+ T1D-associated sequences persisted in the periphery and in Treg-depleted cells, indicating that the chains were neither lost in negative selection nor diverted to the regulatory lineage. This was also true for TCRα chains associated with insulin/proinsulin, a self-antigen expressed by thymic epithelial cells for the purposes of thymic selection [26,27,46]. Thus, at the very least, thymic tolerance mechanisms appear to have no negative impact on the public component of islet-reactive TCRα chains. Indeed, thymic repertoire generation even seems to favor them, in every thymus sample we analyzed. Because well-defined autoreactive TCRα chains are scarce in the literature, we cannot say whether these findings are unique to islet antigens or a more general feature of TCRα chains specific to other self-antigens as well. Furthermore, our cohort of pediatric donors was screened only for islet-associated autoantibodies, so we cannot address their putative susceptibility to other forms of autoimmunity. Nevertheless, all donors were clinically healthy.
An obvious open question concerns the TCRβ partners of the public TCRα repertoire, which remain unknown in our data. Despite advances in determining the sequences of TCRα/β pairs, the clone sizes of our public chains are too small for experimental resolution. In the thymus, the average frequency of all T1D-associated TCRα chains combined was 6/10^5 and in the periphery 9/10^5, well below the detection threshold of current techniques. The studies published so far indicate that public heterodimers are rare and that repertoire sharing typically concerns only one of the chains [47]. Since antigen recognition, as well as thymic selection, mostly operates at the heterodimer level, the impact of either partner chain alone is limited. However, substantial experimental evidence shows that in some cases the specificity of the TCR may be determined predominantly by one of the chains, or that a public TCRα or TCRβ chain can promiscuously pair with multiple different chains and still preserve its antigen specificity. This was already apparent in our database of epitope-specific TCRs, in which identical TCRα chains were paired with multiple different TCRβ chains and vice versa (Supplement 1). Furthermore, Nagatsugawa and colleagues showed directly that a given TCRα chain can be sufficient to confer specificity, while the TCRβ partner modifies the TCR affinity over a range of two orders of magnitude [48]. In another interesting example, TCRs isolated from patients with beryllium disease expressed either of two conserved TCRβ chains differing by a single amino acid. One of these TCRβ chains was sufficient to confer high affinity, allowing promiscuous pairing with a variety of TCRα chains, while the other required an exact TCRα partner [49]. Finally, some TCRs have been shown to possess considerable structural plasticity, enabling them to adjust to and bind a range of slightly different antigens [50,51], which suggests that they may be amenable to promiscuous pairing without loss of specificity.
Together with these earlier reports, our data suggest a scenario in which the human thymus readily and repeatedly generates public TCRα chains associated with the recognition of islet antigens. These chains are not deleted in negative selection, but their final specificity and affinity are modulated by their TCRβ partner. Since TCR α to β pairing is essentially random, we suggest that the actual risk posed by these autoantigen-associated TCRα chains, and part of the individual susceptibility to T1D, is determined by the TCRβ microrepertoire paired with them. The threshold for developing clinical autoimmunity is further modified by the genetic risk profile, the early microbial environment, and the lifelong history of encounters with infectious agents.
The existing data on public T-cell clones from T1D patients, although few studies have so far addressed TCRα, are consistent with the scenario outlined above. A specific role for public TCRα clonotypes in T1D was suggested by studies identifying GAD65-specific CD4+ TCRα clonotypes and IGRP-specific CD8+ TCRα clonotypes present at high frequency across different patients with T1D [52,53]. Another study identified circulating islet-reactive CD8+ T cells with public TCRα chains in most individuals, but their homing to the pancreas was mostly limited to T1D patients [54]. More powerful techniques will be required to characterize the TCRβ repertoire paired with these public TCRα chains. It is nevertheless interesting to speculate whether the public TCRα chains in T1D are recurrent enough to provide a target for intervention.

Fig. 1. The features of highly abundant thymic TCRα clonotypes. The distribution of clone sizes in TCRα (open symbols) and TCRβ (black symbols) repertoires; the TCRα clone size distributions show a regular slope of approximately −3 on the log-log scale for n = 5, with the less deeply sequenced sample E excluded (A). The number of non-templated insertions (B) and the interindividual clonotype sharing measured with the Jaccard index (C) for the top 1%, 2%, 5%, 10%, and 20% most abundant TCRα clonotypes and for the entire (100%) repertoire, n = 6. TCRα clone size correlations between two individuals, with the same highly abundant TCRα clonotypes indicated; representative plots are shown (D).

Table 1. A summary of T1D- and HIV-specific TCRα and TCRβ sequences in the reference database. The sequences are divided into CD4 and CD8 categories according to the phenotype of the cell from which the TCR clonotype was originally derived.

Fig. 4. Analysis of the CD4+ T1D-associated TCRα sequences in peripheral blood Treg and Tconv populations. The frequency of T1D-associated reference database sequences found in Treg, Tconv, and unsorted cells (A), the average number of unique nucleotide sequences encoding each identified amino acid chain (B), and their average clone sizes (C). Median values are indicated by the horizontal line, the box indicates the 25th-75th percentiles, and the whiskers the minimum and maximum values. For Treg and Tconv n = 3; for unsorted cells n = 10.

Fig. 5. Comparison of CD4+ insulin/proinsulin- with GAD65-associated TCRα chains. The average frequency of insulin/proinsulin- and GAD65-associated TCRα reference database sequences found in the six thymus samples (A), the average number of unique nucleotide sequences encoding each identified amino acid chain (B), and their average clone sizes (C). The Wilcoxon signed-rank test was applied to compare the means between the two groups. The average frequency of insulin/proinsulin- and GAD65-associated TCRα reference database sequences found in the ten peripheral blood samples (D), the average number of unique nucleotide sequences encoding each identified amino acid chain (E), and their average clone sizes (F). In (A-C) n = 6; in (D-F) n = 10.