Genetic signature detected in T cell receptors from patients with severe COVID-19

Summary Characterization of host genetic factors contributing to COVID-19 severity promises advances on drug discovery to fight the disease. Most genetic analyses to date have identified genome-wide significant associations involving loss-of-function variants for immune response pathways. Despite accumulating evidence supporting a role for T cells in COVID-19 severity, no definitive genetic markers have been found to support an involvement of T cell responses. We analyzed 205 whole exomes from both a well-characterized cohort of hospitalized severe COVID-19 patients and controls. Significantly enriched high impact alleles were found for 25 variants within the T cell receptor beta (TRB) locus on chromosome 7. Although most of these alleles were found in heterozygosis, at least three or more in TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, and TRBV10-1 suggested a possible TRB loss of function via compound heterozygosis. This loss-of-function in TRB genes supports suboptimal or dysfunctional T cell responses as a major contributor to severe COVID-19 pathogenesis.

A wide number of genetic variants have been associated with severe COVID-19, frequently pointing at immune response dysfunction. 19,20mong the most relevant genetic factors for immune dysfunction are genes located in the major histocompatibility complex (MHC).The human leukocyte antigen (HLA), MHC in humans, consists of a group of genes located on the short arm of chromosome 6.They encode surface glycoproteins of two types, HLA class I or II molecules, with different tissue distribution and molecular characteristics.This system is extremely polymorphic, with multiple genes and allelic variants, although a given individual possesses only two alleles inherited in a Mendelian fashion.The main function of the HLA molecules is to distinguish foreign invaders such as viruses and bacteria from the body's own cells.Pathogenderived peptides (anchored in the HLA molecule) are presented to T lymphocytes which, in turn, are activated, exerting their immune function.T lymphocytes engage with HLA through a surface receptor (T cell receptor, TCR).These receptors are generated in the thymus by a random rearrangement mechanism.Here genes from a group of TCRA and TCRB segments (in the case of abTCR) or TCRG and TCRD segments (in the case of gdTCR) stochastically mobilize segments to generate TCR receptors.These rearrangements facilitate the development of a large repertoire of diverse cells, enabling them to protect against distinct infections.Prior to exerting their function in the periphery, these newly arranged TCR must be assayed against the HLA molecules present in the thymus.Only those able to establish cognate interactions with the molecules will survive and exit to the periphery.Given the polymorphism of the HLA system and the stochastic rearrangement of TCR segments, the final TCR repertoire available to confront pathogens differs between individuals.
TCR gene alleles have long been considered immune response genes, but evidence has been lacking for diseases involving complex antigens like whole microorganisms and broad tissue autoantigens.Obvious relationships have been found at the level of individual pathogen peptides or autoantigen peptides.The HLA alleles are the prototypical immune response genes.However, they rarely impact on immune responses against pathogens, although exceptions exist like HIV.The HLA protective alleles HLA-B*27, HLA-B*57, and HLA-B*58:01 present immunodominant peptides such as Gag protein-restricted by HLA-B*27.TCR is a disulfide-linked membrane-anchored complex consisting of the highly variable alpha and beta chains bound to the invariant CD3 chain.The variable domain of TCR alpha-chain and beta-chain have three complementarity-determining regions.The complementarity-determining region of the beta-chain is encoded in locus q34 of chromosome 7 and has been shown to interact with antigens with a high degree of specificity. 21The role of HLA and TCR on COVID-19, however, remains unclear.
Herein, we report a genetic study performed on highly selected patients with severe COVID-19 hospitalized during the first wave in Madrid, Spain, before the introduction of vaccines.Our patient cohort was compiled following strict clinical inclusion criteria, including age younger than 60 years, no comorbidities, and hypoxemic bilateral pneumonia.Controls were ancestry matched and bioinformatically processed in an identical manner to avoid batch effect biases.Our study yielded 25 high impact variants (21 frameshifts and 4 stop codons) at genome-wide significance (p value >5.0E-8) within the TRB locus of the q34 band in chromosome 7. Genes TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, and TRBV10-1 contained at least 3 high impact alleles in heterozygosis from different variants, suggesting a possible mechanism of TRB loss of function via compound heterozygosis.Our results support a role of T cell receptors via loss of function in the exacerbation of COVID-19 symptoms, potentially leading to suboptimal and/or dysfunctional immune responses to SARS-CoV-2 infection as a major determinant of disease severity.Table 1 describes the set of 28 clinical terms split into the 4 overarching COVID-19 major clinical phenotypes along with the specific number of patients affected (right column).By decreasing order, the phenotypes affecting more than 10 patients were pneumonia (N = 72), acute respiratory syndrome disease (ARDS; N = 42), persistent fever (N = 30), ARDS and intensive care unit (ICU; N = 19), fatigue, malaise, headache and arthromyalgia (N = 13 each), and hepatitis (N = 11).None of our patients developed stroke, peripheral arterial thrombosis, arthritis, seizures, or myelitis.

Description of clinical phenotypes
Patients with severe COVID-19 were additionally sorted by the number of major COVID-19 phenotypes, which somewhat acted as a proxy for a greater number of symptoms.Interestingly, the top 19 patients with the greatest number of COVID phenotypes were all males.

Analysis of high impact variants
Figure 2 illustrates samples origin, filtering, and data analysis.Figure 3 describes the initial break down of country of origin from samples before filtering those that did not cluster within our controls' Iberian Spanish (IBS) genetic distance.A total of 851,386 variants were identified in the joint cohort of 167 severely affected cases (N = 74) and controls (N = 93).Overall 32,366 (3.80%) were novel variants.The total number of high impact variants was 5,589, averaging 322 per exome.Of note, 1,477 high impact variants were rare (i.e., not present in gnomAD 22 ), averaging 53 per exome.Table 2 summarizes these numbers.

Case-control high impact differentially affected genes
We identified high impact variants predicted by Variant Effect Predictor (VEP). 23Genes containing high impact variants were further selected for analysis as long as both case and control samples were affected in the same gene (to avoid case-control batch effects due to different coverages).This yielded a total of 1,119 genes containing at least 1 high impact mutation in both cases and controls.We identified significantly different case/control genes with high impact mutations (Bonferroni-corrected p value = 0.05/1,118 = 4.47E-05).Table 3 records genes with p values below this significance threshold.Overall, 12 out of the resulting 60 genes (20%) are T cell receptor genes.We carried over this list of 60 genes for functional enrichment analysis.

Functional enrichment analysis
The resulting differently affected 60 genes were analyzed for functional enrichment using DAVID. 24,25We used DAVID's functional annotation chart output to cluster groups of genes according to their enriched term score.In Table S1, we provide the non-redundant set of terms below the significance threshold and their associated genes, ordered by the strength of their statistical significance.
A cluster of genes related to T cell receptors dominated the DAVID's output table.The T cell receptor cluster enrichment score (Enrichment Score: 5.21) is the top scoring functional cluster, followed by epidermal growth factor (1.00) and ANK repeat (0.99). Figure 4 shows DAVID's enrichment scores for term clusters resulting from analyzing those 60 genes.

Analysis of TCR gene cluster variants
Next, we focused on the analysis of high impact variants within genes of the top functionally enriched cluster.This analysis yielded 25 variants with case-control allele frequency differences below a threshold of genome-wide significance (p value <5.0E-08;Table 4).These variants were distributed among 8 of the 12 T cell receptor gene cluster and included TRBV7-8, TRBV7-7, TRBV7-6, TRBV5-5, TRBV6-5, TRBV10-1, TRBV7-3, and TRBV30.All variants are relatively common (> 0.01 within the European population; NCBI's ALFA Allele Frequency Aggregator 26 ).These 25 variants include 4 stop gains (single nucleotide variant substitution) and 21 frameshifts (indels), all of them highly deleterious according to CADD 27 and localized within the TRB locus on chromosome 7, at band 34 within the long arm (7q34).We did not filter these variants by linkage disequilibrium, given that they are all functional (consequence either frameshift variant or stop gained).

TCR loss of function via compound heterozygosis
Except for a few exceptions, most of the 25 variants of concern were heterozygous for the alternative allele in our patients.In order to test whether loss of function might occur in both alleles in the remaining 8 T cell receptor genes, we counted high impact alleles within each of them (TRBV5-5, TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, TRBV7-8, TRBV10-1, TRBV30).If a gene harbors more than 1 high impact allele, the chances for compound heterozygosis are greater and therefore the chances for gene inactivation.Because of the limitations of short read sequencing, which does not distinguish phase between alleles in different variants, it still may be possible for two heterozygous high impact variants to affect the same allele.As a consequence, we decided to classify as likely compound heterozygous loss of function the presence of at least three high impact alleles in a TCR.Table 5 shows counts of high impact alleles per patient for each of the TCR.For each patient we therefore counted the number of high impact mutations within variants that already have been identified with significant allele frequencies in cases and controls.From a total of 25 variants of concern spanning 8 T cell receptor beta variable (TRBV) genes, we found these three groups.1. TRBV genes with likely inactivation through compound heterozygosis: mutations in all patients ranging from 3 to 8 (TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, TRBV10-1).There is an exception in TRBV7-3, with 1 patient having only two alleles.2. TRBV genes with some patients showing potential for inactivation via compound heterozygosis (TRBV7-8, TRBV30).High impact alleles are either absent or up to 3. 3. TRBV genes unlikely to be inactivated through compound heterozygosis (TRBV5-5).High impact alleles never make it to 2 for any patient.
In a preliminary analysis, we could not find any exome sequencing reads flanking the TRBV genes that could suggest a recombination with D genes (data are not shown).

Comparison with genetic markers previously reported as determinants of severe COVID-19
To date a number of genetic variants have been associated to COVID-19 disease severity.None of them, however, have implicated T cell receptor genes.Table 6 shows the latest list of variants from the Genetics of Mortality in Critical Care (GenOMICC) study that are present in our patient cohort. 8We chose only variants from the GenOMICC study because it involved severe COVID-19 patients and were produced using whole genome sequencing data.For 23 lead variants from the GenOMICC study, our exome patient data covered 11 of them with sufficient quality (STAR Methods).We calculated allele risk frequencies in our filtered case cohort (N = 74) and compared them to risk allele frequencies in NCBI's ALFA Europeans.We found no significant differences between allele risk frequencies in our filtered case cohort compared to the European population (r = 0.9708; p value = 0.7327).
We looked into the Host Genetics Initiative for COVID-19 as a public independent dataset. 28We chose the subset of patients with very severe COVID-19.This dataset included sequencing and microarray variant data from 21 studies across distinct European populations.We downloaded this dataset in GRCh37 (A2_ALL_eur_leave_23andme).From our total of our 25 variants of concern recorded in Table 6, we identified two that were included in the study.These two variants were rs17249 (7:142400325:G:T; Reverse Complemented Alleles) and rs17267 (7:142812761:G:A).Both variants are stop codons and have allele frequencies of 0.4526 and 0.239, respectively, which is similar to the European allele frequencies in the NCBI ALFA controls in Table 6.We found that both alternative alleles for these two variants are present in all our cases (n = 74) in heterozygosis, which suggests a greater frequency than the one noticed by the HGI consortium.No other variants from our dataset were present in this subset of Europeans with severe COVID-19.The lack of presence of most of our variants could be due to differences in methodology and study population.Our methodology used exome sequencing from highly selected Iberian Spanish (IBS) cases and controls, bioinformatically processed in the same way.We did not use a meta analysis of microarray, exome and whole genome studies, which implies different filtering and quality controls.For instance, we did not apply a linkage disequilibrium filter, given that we only considered coding variants with high impact, being either frameshifts of stop codons.We included both rare and common variants in our analysis, with most of our variants being multinucleotide insertions or deletions, and not SNPs as it is the case within the HGI dataset.

DISCUSSION
After performing whole exome sequencing of a selected sample of IBS patients with severe COVID-19, we found a group of TCR chain encoding genes more likely to be inactivated in our patient cohort.Our study identified 25 high impact heterozygous variants at the T cell receptor beta variable (TRBV) locus on human chromosome 7. Twenty of these variants were present in most patients suggesting likely TRBV inactivation via compound heterozygosis of the following 5 genes: TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, TRBV10-1.These genes are all part of the TCR beta complex, participating in highly specific antigen recognition.Altogether our findings support that a genetic predisposition may account for suboptimal and dysfunctional T cell responses in SARS-CoV-2 infection might favor the development of severe COVID-19.
A striking feature of SARS-CoV-2 infection is that it may produce a wide range of symptoms, from asymptomatic infections to acute respiratory distress syndrome.Other complications include thromboembolic phenomena and clinical manifestations due to specific organ involvement (hepatitis, renal failure, cardiovascular events, neurological dysfunction, etc.). 29Although distinct inoculum sizes 30 and different coronavirus variants may determine differences in transmission and pathogenicity, 31 host factors seem to largely explain the wide range of clinical outcomes seen following SARS-CoV-2 infection.Among others, older age, male sex and the presence of comorbidities (obesity, diabetes, prior lung disease, immunosuppression, etc.) are well-established predictors of severe COVID-19. 34Our data corroborate a predominance of male individuals among those with severe COVID-19 consecutively attended during the first wave of COVID-19 in Madrid, Spain.For facilitating the search of host genetic determinants, older individuals and those with comorbidities were excluded from our study cohort.
Our analysis used a set of matched ancestry case-control individuals whose exome data were processed and filtered using the same protocol.Variant data were analyzed using VEP, in order to discover high impact mutation (likely loss of function) count differences in cases and controls.
In order to minimize biases and artifacts for observed differences in severe COVID-19 patient-affected genes, we followed a strict set of filters, both at the level of variant and sample selection.Our genetic study targeted specifically a subset of apparently healthy individuals younger than 60 years-old that developed severe COVID-19 and required hospitalization.We found a significant enrichment in loss of function at the TRB locus on the long arm of chromosome 7, at band 7q34.Recent reports from the GenOMICC uncovered seven risk genes associated with severe COVID-19 infection located on chromosomes 6 (nearby where the HLA system lies in humans), 12, 19, and 21. 8 Other studies have investigated genetic determinants of severe COVID-19 in a much broader clinical population.][7][8][9][10][11][12][13] However, heterogeneity in ancestry study populations, clinical definition criteria, and methodological issues have resulted in lack of uniform findings and recognition of overall impact of genetic markers on COVID-19 disease severity. 32ur results show the power of highly selective inclusion clinical criteria, together with the importance of selecting for high impact variants and clustering of variants according to their annotated functions.Our method for variant selection and gene clustering allowed us to find enrichment for loss of function in TCR genes.These are a class of T cell surface molecules that recognize the antigen-derived peptides presented by the MHC and are able to trigger a series of immune responses.Variants identified in our study suggest a mechanism for T cell dysfunction/extenuation that could lead to severe COVID-19.4][35][36][37][38] Nevertheless, expression of these receptors could also reflect T cell activation.Our data provide evidence for a suboptimal or otherwise inappropriate T cell response associated with severe COVID-19. 39From a total of 851,386 common variants in cases and controls 5,589 were of high impact, an average of 322 per exome.We expect our findings might help prioritize patients more likely to suffer from severe COVID-19 as carriers of these genetic determinants.Our results contribute to the much wider debate on the importance of analyses of diverse human ethnic groups, since we have only used Iberian Spanish (IBS) patients to draw these results.Similar analyses in different populations will be therefore needed with a greater number of patients and controls.In summary, we propose a crucial role of T cell receptor genes as determinants of severe COVID-19.Our findings deserve further consideration by better powered studies and in distinct ethnic groups.

TCR functional gene cluster significantly enriched
A total of 60 genes were identified as having significantly different counts of high impact mutations.Overall, 12 out of these 60 genes were TCR genes.Functional clustering analysis within these 60 genes confirmed the TCR beta gene cluster to be far more enriched than any other.Apart from TCR, some of the remaining genes are evolutionarily related and already known to influence COVID-19 severity, such as genes of mucin secretion (MUC5B) 40 or the GOLGA6L2 family. 41

New variant associations for T cell receptor beta variable (TRBV) genes
Functional enrichment analysis led us to analyze variants within the TRBV gene cluster.Overall, 25 high impact variants of concern spanned eight TRBV beta encoding genes (TRBV5-5, TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, TRBV7-8, TRBV10-1, TRBV30) displaying genome wide   A total of 25 variants in GRCh37 were identified in 8 T cell receptor beta variable genes below genome-wide threshold of significance (p value <5.0E-08) for allele frequency differences in cases and controls.All variants cluster within a region 0.5M nucleotides long in chromosome 7 at band 7q34.21 of them are indels causing frameshift mutations, the rest single nucleotide stop gain variants.All of them are highly deleterious according to CADD.A CASE_MAF = 0.500 means all cases are heterozygous for the alternative allele.Case control p value relates to chi squared statistical significance difference between allele frequencies in our population of cases and controls.Case EUR p value relates to chi squared test differences between frequencies in the general European (EUR Sample Size) from NCBI's ALFA allele frequencies and the case population

ll
OPEN ACCESS significant (p < 5.0E-08) allele frequency differences.All of these variants are common (R0.236 frequency) in our case cohort, and also common (R 0.018) in the European population, according to the NCBI's ALFA.Our relatively common variants of concern are compatible with the 10% adult population who contract severe COVID-19 disease.Allele frequencies of these variants appear significantly different in our case cohort compared to the European population (p value = 5.29E-06; ANOVA Single Factor), which suggests an enrichment of their frequency for our cases with respect to Europeans.All of these variants are part of the hypervariant TCR V region of beta chains, yet they all are relatively common and highly deleterious (CADD Phred score =>12.0).Almost all observed high impact alleles from these 25 variants were heterozygous.If a T cell uses a non-productive TCR, it would therefore be free to rearrange another TRBV.If a particular TRBV is not available as it is the case in half of the alleles that have one of the variants, the T cell may arrange the normal allele on the other chromosome.If we consider heterozygous high impact mutations in isolation, it is therefore likely that the repertoire will not be affected unless the other chromosome gene copy contains another high impact mutation For each gene we highlight the total number of variants of concern as well as the number of high impact alleles each patient has in each gene for those variants.In red we highlight those genes containing more than 3 high impact alleles, indicating a potential loss of function in both alleles for the gene.
from a different variant.The presence of two different mutated alleles at a particular locus can inactivate a gene, a process which is known as compound heterozygosis.To ascertain whether compound heterozygosis could be present for each of these 8 TCRBV genes, we counted high impact alleles within the same patient.We analyzed how many high impact alleles patients had within the 25 variants of concern.We identified three groups of genes: (a) 5 TRBV genes where compound heterozygosis was likely for all patients because they harbored more than 3 high impact alleles (TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, TRBV10-1); (b) two where compound heterozygosis was possible for some patients with 0-3 mutant alleles (TRBV7-8, TRBV30), and (c) one gene where no compound heterozygosis was possible (TRBV5-5).
Because sequencing is unable to determine the phase of high impact alleles, we expect higher chances for high impact mutations affecting both chromosomal copies with greater high impact alleles.More than 3 high impact alleles were counted for almost all patients in five genes (TRBV6-5, TRBV7-3, TRBV7-6, TRBV7-7, TRBV10-1), suggesting their possible inactivation.Such inactivation of TRBV could lead to reduced repertoire or poorer specific T cell activation for our 74 severe patients.A less specific immune response would result in a dysfunctional activation with a much broader cytokine and inflammatory systemic response.

The role of TCR in patients with severe COVID-19
To date, the role of TCR in COVID-19 severity has remained unclear.Some studies [42][43][44] have shown that T cells play a prominent role in COVID-19 susceptibility and severity.However, they have not been able to establish whether T cell responses are helpful or harmful.Prominent lymphopenia has been observed in patients with severe disease, with abnormal T cell differentiation. 45,46Moreover, reduction of T cells in the periphery is a prominent feature of many individuals with severe COVID-19.Given the high impact of the identified variants, our results support that loss of function and inactivation of T cell receptors affecting the variable region in charge of binding to the peptide/MHC complex as a genetic signature for severe COVID-19.TCR may therefore play a crucial role in the recognition of SARS-CoV-2 antigens by T cells, accounting for a dysfunctional response for an exacerbation of symptoms in SARS-CoV-2 infection.

Risk allele frequencies do not differ from the general European population
We compared existing published genome-wide association variants from the GenOMICC consortium, which were unveiled in a large population of mostly Northern European individuals.The GenOMICC study provides a state-of-the-art analysis of host genomics associated with disease severity, yielding 23 genome wide significant variants.From this list, 11 were covered with sufficient quality in our 74-case cohort.We then compared risk allele frequencies observed in our case cohort against those of the general European population, yielding no significant differences.The lack of significant differences in our case/control cohorts for GenOMICC risk alleles may be due to their small sample, the peculiar characteristics of IBS ancestry or the different methodology we used.Our frequency concordance with the general European population, however, supports the validity of our variant frequency data, which, although small, follow the expected patterns observed for the European population in an independent cohort (NCBI's ALFA).

Limitations of the study
We acknowledge the modest size of our study population.We prioritized the use of strict clinical criteria to define severe COVID-19 in addition to checking a restricted ancestry-matched population.We also note that individuals used as controls in our study were from the general population rather than confirmed SARS-CoV-2 infected individuals with no symptoms.This means that a very small proportion of our controls could also be liable to suffer severe COVID-19 following coronavirus infection.This is also reflected by design, where all high impact variants in T cell receptors are present in cases and controls, albeit with significantly different frequencies.Despite this, our general population controls allow us sufficient discriminatory power to statistically identify differences in affected genes when comparing cases and controls.

Figure 1
Figure1shows the four major clinical phenotypes of severe COVID-19 and the number of patients that exhibited conditions within each group.Briefly, from the 74 cases, pulmonary manifestations were recorded in 72, extra-pulmonary conditions in 35, coagulation disorders in 14, and systemic manifestations in 35.

Figure 1 .
Figure 1.Number of patients affected with symptoms in our highly selected population Determinants of severe COVID-19 and major clinical phenotypes with numbers referring to patients with severe COVID-19 in our study population within each group.

Figure 2 .
Figure 2. Flow chart illustrating samples origin, filtering, and data analysis (Bioinformatics workflow) Here, we describe the different steps taken to analyze the patient data and come up with our variants of concern.

Figure 3 .
Figure 3. Country of birth for the 98 cases with severe COVID-19 enrolled in the study (Case Cohort) Overall 19 individuals were born outside of Spain.

Figure 4 .
Figure 4. Gene cluster enrichment annotations Results from functional enrichment analysis of top affected genes.The T cell receptor cluster enrichment score (DAVID Enrichment Score: 5.21) is the top scoring functional cluster, followed by epidermal growth factor (1.00) and ANK repeat (0.99).

(
Continued on next page)

Table 1 .
List of medical terms for COVID-19 clinical manifestations

Table 2 .
Summary statistics of all genetic variants analyzed in the study populationCases and controls (N = 167)

Table 3 .
Number of cases and controls with high impact variants within a gene (as identified by Ensembl's Variant Effect Predictor) (Continued on next page)

Table 3 .
ContinuedWe included only genes where the difference between affected cases and controls have a p value <4.47E-05.P values are Bonferroni-corrected significantly different affected genes.From a total of 1,119 genes with high impact variants in both cases and controls, 60 official gene names were identified as harboring high impact mutations in cases and controls.Within this list of differentially affected genes 12/60 (20%) are T cell receptors, shown ordered alphabetically for easier interpretation.

Table 4 .
Variants of concern in T cell receptor beta variable genes

Table 5 .
T cell receptor beta variable (TRBV) genes with significantly different allele frequency variants between cases and controls

Table 6 .
Lead variants from the GenOMICC study and their frequencies in our filtered case cohort of 74 IBS cases (IBS COV AF) with severe COVID-19From a total of 23 variants in GenOMICC, our exome data covered 11 variants (shown here).We calculated risk allele frequencies in our case cohort and compared them to risk allele frequencies in Europeans from NCBI's ALPHA Allele Frequencies.MAF counts can be inferred from EUR sample sizes and their respective EUR MAFs.