Narcolepsy risk loci outline role of T cell autoimmunity and infectious triggers in narcolepsy

Narcolepsy type 1 (NT1) is caused by a loss of hypocretin/orexin transmission. Risk factors include pandemic 2009 H1N1 influenza A infection and immunization with Pandemrix®. Here, we dissect disease mechanisms and interactions with environmental triggers in a multi-ethnic sample of 6,073 cases and 84,856 controls. We fine-mapped GWAS signals within HLA (DQ0602, DQB1*03:01 and DPB1*04:02) and discovered seven novel associations (CD207, NAB1, IKZF4-ERBB3, CTSC, DENND1B, SIRPG, PRF1). Significant signals at TRA and DQB1*06:02 loci were found in 245 vaccination-related cases, who also shared polygenic risk. T cell receptor associations in NT1 modulated TRAJ*24, TRAJ*28 and TRBV*4-2 chain-usage. Partitioned heritability and immune cell enrichment analyses found genetic signals to be driven by dendritic and helper T cells. Lastly comorbidity analysis using data from FinnGen, suggests shared effects between NT1 and other autoimmune diseases. NT1 genetic variants shape autoimmunity and response to environmental triggers, including influenza A infection and immunization with Pandemrix®.

Triggers of NT1 autoimmunity point to Influenza-A 9,16,17 and, secondarily, Streptococcus pyogenes infections 18,19 . Onset in children is abrupt and seasonal peaking between spring and summer 16 , presumably following a winter infection. Further, multiple countries have reported increased incidence of NT1 4-6 months following the 2009 H1N1 (pH1N1) "swine flu" pandemic 16,17,20 . Finally, immunization with Pandemrix®, a pH1N1 vaccine created to prevent the 2009 pandemic, is an established trigger for NT1 17,20,21 . Increased incidence following Pandemrix® was first seen in Northern Europe, with incidence in children increasing from 0.79/100,000 to 6.3/100,000 21 . Specificity is striking, as increased NT1 was later detected in all European countries where Pandemrix® was used, whereas countries using other vaccine brands did not display vaccination-associated increases in incidence 17,20 . The reason for the vaccine brand specificity may involve differences in flu antigen preparations and/or timing of vaccination when infections peaked in some countries 17,20 . Frequency of other autoimmune diseases did not increase following Pandemrix® vaccination 20 .
In this study, we characterize novel genetic factors for NT1 across multiple ethnic groups, performing computational and functional fine mapping. Our findings establish a compelling pathophysiological mechanism for the disease that implicate antigen presentation by DQ0602 to specific CD4 + T cells and subsequent CD8 + T-cell activation, with applications in the autoimmune disease and vaccination fields.

Results
GWAS discovers novel risk loci for NT1 5,848 cases and 61,153 controls derived from ten cohorts were used as the initial discovery GWAS sample (Table S1). We found associations in HLA (P < 10 −216 ), confirmed previously identified loci (TRA, TRB, CTSH, IFNAR1, ZNF365, TNFSF4) and found 7 novel loci near CD207, NAB1, IKZF4-ERBB3, CTSC, DENND1B, SIRPG and PRF1 (Figs. 1A and 2A, B,  Table 1; Supplementary Figs. 1 and 2). We observed that most associations were shared across all ethnic groups. Significance betweencohort heterogeneity was observed with TRA, SIRPG and DENND1B (Table S2). Finally, as both influenza infections and, in rare cases, immunization with Pandemrix®, associates with NT1 20 , 245 vaccination induced NT1 cases identified in four countries were also studied. In this sub-sample, we found GWAS significant signals with HLA-DQB1*06:02 and TRA rs1154155, as well as shared polygenetic risk (Table 1 and Supplementary Fig. 3). The lack of association of other loci is likely due to the small number of individuals with vaccination-related narcolepsy.
Interestingly, GWAS results are unusually rich in missense variants. In addition to HLA polymorphisms, these include a TRAJ24 (F8V) substitution, polymorphisms in langerin (CD207 N288D and K313I), as well as variants in CTSC (I453L) and PRF1 (A91V); the last two are known hypomorphs involved in autosomal recessive conditions with abnormal sensitivity to viral infections. Finally, we found that the effects of some of these variants colocalized between NT1 and type 1 diabetes (CTSH G11R and SIRPG S286L, posterior probability = 1.0). Functional effects of these missense variants are detailed in Supplementary Data 1 and Supplementary Fig. 2.

NT1 shares variants with other autoimmune diseases
Heritability in NT1 is similar to other pediatric autoimmune diseases 22   . The x-axis shows genomic location by chromosome and the y-axis shows -log 10 P-values. The red horizontal line indicates the genome-wide significant P-value threshold of 5 × 10− 8 . P-values larger than 10 −75 were set to 10 −75 (HLA locus has many SNPs with P-value < 10 −216 ). Variants are shared at individual level with known autoimmune traits, with notable exception at variants within the TRA and TRB loci and variants within CD207 and INFAR1 (see Supplementary Data 1). Raw P-values are reported using two-sided fixed-effects meta-analysis. Multiple testing correction has been done at genome-wide level so that variants with nominal P-value under 5e-8 were considered statistically significant. B Associated variants are located on chromosome positions that have active eQTLs in blood samples, as evidenced by an analysis using GARFIELD. Enrichment analysis has been binned by P-value threshold and raw two-sided P-values are reported. C When using stratified LD score regression, association within individual blood cell types implicate NK cells, CD4+ T and CD8+ T cells. Statistically significant enrichment is marked with a line corresponding to enrichment P-value = 0.05 (dashed line) and FDR corrected P-value = 0.05 (dotted line). Raw two-sided P-values are reported and we have show significance also by false-discovery rate of 0.05 (dotted line). D Global enrichment is seen with autoimmune traits in general using variants that were genome-wide significant. Raw twosided P-values from hypergeometric test are reported. Fig. 2A, B, E: Raw P-values are reported using two-sided fixed-effects meta-analysis. E Overall enrichment was seen with MS and SLE using LDSC. F PheWAS with narcolepsy risk variants showed association across different autoimmune traits. positive beta is depicted with square and negative with triangle. For details, see methods.
traits including psoriasis, multiple sclerosis and systemic lupus erythematosus in particular (P < 0.05, Table S3, Fig. 1E). We computed enrichment of NT1-associated genes with publicly available GWAS data at gene and variant level, identifying overlap with immune and infectious traits such as asthma, type 1 diabetes, primary biliary cholangitis, plantar warts and hepatitis B (P < 0.001, Fig. 1D-F and Table S4). Almost all variants identified (ZNF365, TNFSF4, NAB1, IKZF4-ERBB3, CTSC, DENND1B, SIRPG, and PRF1) are the same or strongly linked with markers associated with other autoimmune diseases (Supplementary Data 1).
In case/control studies, NT1 has been associated with various autoimmune diseases in some 24 . Of note, since DQ0602 is extremely (OR = 0.03) protective against type 1 diabetes 28 and strongly protective (OR = 0.64) against primary biliary cholangitis 29 ; no narcolepsy cases had these dual pathologies. Taken together, these findings suggest shared effects between narcolepsy and other autoimmune diseases at both the epidemiological level and at multiple genetic loci, modulated by HLA genotypes.

Variants involved in antigenic stimulation and infections
The specific polymorphisms in langerin (CD207) we found associated with NT1 have previously been linked to interferon stimulus and influenza uptake by dendritic cells (DC) (Supplementary Data 1). Langerin is a type II transmembrane C-type lectin receptor expressed in Langerhans cells, a specialized type of dendritic cells located exclusively in the respiratory tract and the epidermis, and it recognizes mannose-rich sugars expressed by bacterial, fungal or viral pathogens, including HIV-1 and Influenza-A (Supplementary Data 1).
Our leading variant, rs3815556G, a rare allele, is in complete linkage disequilibrium (r 2 = 1) with two coding variants, rs13383830C (N288D) and rs57302492A (K313I) that modulate recognition of bacterial versus viral antigens. The rare Asp-288/Ile-313 haplotype has langerin molecules with enhanced affinity for GlcNAc, present in influenza and other viruses, whereas the other haplotype has higher affinity for high-mannose structures and fucosylated glycans, as well as 6SO4-Gal binding activity. This may potentially allow for protection against a wider range of microorganisms, notably bacterial ones as well (Supplementary Data 1). The NT1-associated variants may thus affect disease predisposition by increasing influenza viral (as opposed to bacterial) uptake and antigen presentation to CD4 + T cells.
In addition to langerin, we identified a regulatory variant near IL10RB-IFNAR1, rs2096464T. The SNP is a strong eQTL for IFNAR1 expression in various tissues in GTEx. IFNAR1 controls dendritic cell responses to viral infections, notably Influenza-A. We therefore examined IFNAR1 expression in DC following H1N1 infection (PR8 delta NS1), finding that the exact NT1 predisposing SNP (rs2096464) is the major eQTL for this effect (P = 1.92 × 10 −25 , beta = 0.140), as well as for interferon stimulation (P = 10 −33 , beta = 0.215), as it is in perfect LD with the leading variant for the signal (rs6517159, r 2 = 0.93, Supplementary Fig. 4). Taken together, these findings suggest that NT1associated variants may affect disease predisposition by increasing influenza viral (as opposed to bacterial) uptake and antigen presentation to CD4 + T cells, although additional mechanisms could be involved.

Antigen Presentation and T-cell involvement in NT1
We next examined whether associations with NT1 were enriched genome-wide on specific enhancers using stratified LD score regression (LDSC) on Epigenome Roadmap cell type and ENTEX tissuespecific annotations (n = 491 cell and tissue types) 34 . Partitioned heritability by cell type categories was enriched in hematopoietic cell lines (observed h 2 at hematopoietic cells = 0.24[0.11], P = 0.018) and after partitioning the signal into specific cell subsets, ten cell types showed enrichment with P < 0.005. These were either helper or cytotoxic T cells or NK cells ( Fig. 1C and Supplementary Data 2). As LDSC does not keep information on the HLA region due to ambiguous linkage disequilibrium, we next examined the contribution of different immune cell types using enrichment analysis with genes close to GWAS significant variants. This analysis further supported the enrichment with CD4+ T cells, but also implicated antigen-presenting cells such as monocytes and dendritic cells (P-enrichment < 0.01, Table S6), reflecting, in addition to langerin and IFNAR1, expression of GWA significant, independently associated HLA-associated genes DQB1 and DPB1 (Supplementary Data 1). Together, these results indicate involvement of antigen presentation to CD4 + and CD8 + T cells in NT1.

Risk variants in T-cell receptor loci modulate αβ TCR repertoire
NT1 is the only autoimmune disease with known associations in both HLA and T-cell receptor (TCR) loci (TRA and TRB) ( Fig. 2A, B). TCRα and β chains heterodimerize to form biologically functional molecules that recognize peptides presented by HLA. We therefore examined the function of these leading variants by examining effects on T-cell receptor V-or J-gene chain usage using RNA sequencing in 895 individuals 35 , as well as in 130 individuals sequenced specifically in both memory CD4 + and CD8 + T cells.
As mentioned above, rs1154155 within TRA is entirely linked with multiple SNPs across ethnic groups (Fig. 2C), one of which, rs1483979, substitutes a leucine to a phenylalanine in the CDR3 area of J24, which is predicted to interact with peptides presented by HLA ( Fig. 2D; Supplementary Data 5 and 6). This substitution makes it a prime candidate for a functional effect, should an F allele J24*01 CDR3 sequence interact more favorably with an autoantigen than in the presence of  Article https://doi.org/10.1038/s41467-023-36120-z J24*02. Making the matter more complex, however, J24 usage is also modulated by rs1154155, such that the F-associated allele is associated with decreased usage in CD4 + cells, as shown by the eQTL plot in Fig. 2F (beta = 0.33, P < 0.001). We also computed association in memory CD4+T cells and in CD8+ T cells and observed a consistent effect on TRA-J28 expression specifically (P = 2.29 × 10 −10 and P = 4.08 × 10 −10 , in CD4+ and in CD8+ T cells, respectively). Finally, we confirmed that these effects are cis mediated, as the ratio of J24*01 (F) over J24*02 (L) was only 0.4 in heterozygotes, indicating lower allele expression of Falleles, with similar effects in other T-cell subpopulations (Supplementary Fig. 5).
In addition to J24-specific effects, rs1154155 is also strongly associated with TRA-J28 expression in total RNA sequencing from blood (P = 1.36 × 10 −10 , beta = −0.212, Fig. 2E) with posterior probability for shared variant pp = 0.958 (see Supplementary Data 3 for all rs1154155 effects) and the findings were also consistent when testing across CD4+ memory cells (P = 2.29 × 10). Interestingly, rs1154155 is entirely linked with rs3764159, a polymorphism located 14 bp upstream of TRAJ-28 within the 12-base pair recombination signal sequence spacer, possibly explaining the J28 usage effect (Fig. 2C). Controlling for the NT1 TCRA association by lead QTL SNPs for J24 and J28 usage abolishes all effects, except for a minor peak at rs72638479, itself a minor eQTL (r 2 = 0.95 with TRAV8-6, Supplementary Fig. 6).
Within the TRB region, rs7458379 is an eQTL for increased expression of TRBV4-2. However, based on our functional analysis, rs1108955 has the strongest evidence for increasing TRBV4-2 usage (pp = 0.99, Table S7, Supplementary Data 4). Furthermore, rs1008955 is in partial LD with rs7458379, but tags independent haplotypes at the TRB locus ( Supplementary Fig. 7). Both variants are eQTLs for TRBV4-2 expression but reflect independent signals with NT1, such that analysis conditioning for rs7458379 shows remaining association with rs1108955 (P = 0.00019), whereas conditioning the association for rs1108955 removed all association at both the TRB locus and with rs7458379 (P = 0.72) (Supplementary Fig. 8). Taken together, this indicates that higher expression of TRBV4-2 is related to NT1 and mediated by rs7458379 and rs1108955, with the latter as the potentially causal variant at this locus.

Discussion
In this study, we explored genetic risk for NT1 and potential disease mechanisms of identified genetic risk factors. The strongest associations were seen within the HLA region. In addition, we confirmed six previously described risk loci (TRA, TRB, CTSH, IFNAR1, ZNF365 and TNFSF4) and discovered seven novel associations in CD207, IKZF4-ERBB3, NAB1, CTSC, DENND1B, SIRPG, and PRF1.
Individual associations and partitioned heritability enrichment analysis indicate a primary role of the immune system for all loci identified. Most of these loci, often to the exact same SNP, have also been involved in other autoimmune diseases (Supplementary Data 1). These findings, together with the fact that narcolepsy is associated with increased risk of other autoimmune diseases in FinnGen, suggest that NT1 is an autoimmune disease, even if it does not meet all (1) Peripheral response: Influenza virions or vaccine protein debris are ingested by DCs facilitated by CD207; flu proteins are processed by cathepsins CTSH and CTSC for presentation by HLA molecules to specific TCRα-bearing CD4+ cells, initiating an immunological synapse and responses to influenza. Presentation by DC is modulated by IFNAR1 in the context of influenza infection and the type 1 INF response. Cross presentation of influenza antigens processed via the MHC class I pathway in concert with TNFSF4 in DCs is necessary to activate CD8+ cells that mature into cytotoxic lymphocytes (CTLs), initiating cell killing of viron infected cells. Activated Th1 CD4+ cells produce cytokines such as IFNγ and IL-2, which augment cytotoxic activity of CTLs via perforin (PRF1). SIRPG on activated T cells may also promote cell-to-cell adhesion and proliferation in this response. (2) CNS autoimmunity: Activated and primed specific CD4+ cells migrate to the CNS, where they interact with microglia and resident DCs via DQ0602 bound to an influenzamimic autoimmune-epitope (derived from hypocretin cells), initiating a secondary memory response. Hypocretin cell proteins are processed by cathepsins CTSH and CTSC for presentation by DQ0602 to specific TCRα-bearing CD4+ cells, initiating an immunological synapse and autoimmune response. Chain usage for TRAJ24-2, TRAJ28 and TRBV4-2 is associated with NT1 risk and may be crucial for autoantigen recognition. Further, cross presentation by resident DCs and microglial cells activates specific CD8+ cells via MHC class I binding of another HCRT neuron-derived peptide. These primed cytotoxic CD8+ cells then kill HCRT neurons after recognizing MHC class I (such as A*11:01, associated with NT1 independently of DQ0602) bound, cognate HCRT neuron-derived peptide, may be derived from RFX4 or LHX9, on hypocretin neurons. SIRPG1 on DCs, microglia or activated T cells may also promote cell-to-cell adhesion and proliferation in this response. accepted criteria 36 . Further, most variants identified have effects in antigen-presenting cells (HLA, CTSH, TNFSF4), e.g., dendritic cells (IFNAR1, CD207), T cells (TRA, TRB, SIRPG), T helper cells (HLA-DQ, HLA-DP) or cytotoxic T cells/NK cells (HLA-A, PRF1, NAB1), sketching a remarkably narrow potential disease pathway (Fig. 3). In addition to epidemiological data, examining genetic factors overcomes diseaseassociated ascertainment bias in the recruitment of constitutive cohorts. Finally, two loci were implicated at GWA significant level in 245 vaccination-associated NT1 (TRA and HLA), while five other loci replicated nominally at P < 0.05 (CD207, NAB1, TRB, IKZF4-ERBB3, and CTSH), with overall strong genetic correlation between sporadic and vaccination-associated cases. This indicates that vaccination-triggered narcolepsy is essentially identical to sporadic narcolepsy.
Unlike what is reported in other autoimmune diseases, however, narcolepsy is strongly associated with TCA and TCB genetic polymorphism that modulate the TCR repertoire in very specific ways. A logical explanation for this observation could be that a TCR-mediated reactivity that involves receptors containing TRAJ24F, TRAJ28 and TRBV4-2 is an important step in the development of narcolepsy, perhaps through TCR recognition of a viral trigger or an autoantigen by CD4 + or CD8 + cells. This hypothesis is supported by recent studies suggesting usage of TRAJ24 and TRBV4-2 in the DQ0602 recognition of amidated HCRT, a likely autoantigen, as well as to specific influenza peptides with increased reactivity in narcolepsy 14 .
Based on these observations, we propose that NT1 is an autoimmune process where influenza A contributes to risk in the presence of HLA-DQA1*01:02~DQB1*06:02 (DQ0602). The involvement of influenza-A may explain why genetic associations found are shared across ethnic groups, as influenza is one of few viruses that act worldwide on a seasonal basis. It also relates to IFNAR1 and that both affect and respond to influenza infections, although other infectious triggers cannot be excluded. Notably, the literature suggests that langerin with Asp-288 and Ile-313 shows no binding to 6SO4-Galterminated glycans, increased binding to GlcNAc-terminated structures and overall decreased binding to glycans. This would make langerin more restricted in its ability to bind complex carbohydrates and more able to bind GlcNAc-terminated structures, which overall would favor influenza, but also many other organisms. Similarly, IFNAR1 has been linked to antiviral immunity more generally as well, hence specificity to flu infections cannot be concluded with complete certainty.
The universal genetic association is especially clear for HLA-DQ0602, as it is found with different nearby located HLA-DRB1 alleles: DRB1*15:01 in individuals of primary European (Europe and USA) and Asian (China, Korea, Japan and India) descent, but DRB1*15:03 or DRB1*11:01 in individuals of primary Africa descent 5,6 . The primacy of DQ0602 over DRB1*15:01 (and thereby DRB5, as LD is complete) is also demonstrated by the fact that the DRB1*15:01D QA1*01:02~DQB1*06:01 haplotype is not associated with narcolepsy in China and by the fact additional DQ effects are mostly mediated by DQA1 alleles that interact in trans with DQB1*06:02 (i.e., DQA1*01:01 and DQB1*01:03) 33 . Consequently, the association with DRB1*15:01 was slightly less significant than association with DQ0602. In contrast to NT1, other autoimmune diseases, such as multiple sclerosis and type 1 diabetes, commonly have different HLA associations or disease presentations across countries, resulting in more complex HLA associations. Type 1 diabetes, for example, is well known to be mostly associated with HLA-DRB1*03:01 and DRB1*04 and associated DQ alleles in Europe, whereas DRB1*04:05-specific effects are evident in Japan, where the disease and these DRB1 alleles are rare [37][38][39] .
In this study, a hypomorph of perforin, a gene of critical importance to NK and T-cell cytotoxicity, was protective of NT1. Supporting this, we saw enrichment through tissue-specific partitioned heritability in cytotoxic NK and T cells. Although it is conceivable that NK cells or cytotoxic CD4 + T cell could be involved, the most likely explanation is involvement of CD8 + T cell in hypocretin cell killing. Indeed, neurons never express HLA class II, so expression of HLA class I and recognition of hypocretin neuronal antigens would be needed for hypocretin cell pathology to occur. This is also supported by the CTSC association, an enzyme of critical importance to cytotoxic CD8 + activation of progranzymes 40 . Further, Bernard-Valnet et al. 41 used transgenic mice with expression of a neoantigen in HCRT neurons and found that infusion of CD8 + T-cell targeting the neoantigen were able to cause hypocretin cell destruction, while infusion of neoantigen-specific CD4 + T-cell alone was insufficient, although CD4 + T cells migrated closely to the target neurons. Pedersen et al. (2019) also found NT1-associated CD8 + T-cell targeting intracellular transcription factors such as RFX4 and LXH9, known to be enriched in HCRT cells 15 . Finally, CD8 + mediation of cell killing has also been suggested by observation of a CD8 + T-cell infiltrate in a paraneoplastic anti-Ma2 encephalitis case with symptomatic hypocretin cell destruction, although these cases are complex and not associated with DQ0602 42 .
In summary, genetic data indicates T-cell autoimmunity in NT1 with genetic overlap of autoimmune traits. A particularity of the disease is involvement of polymorphisms such as in IFNAR1 and CD207 that regulate antigen-presenting cell responses to infection. Another peculiarity is strong association with TCR polymorphisms, possibly reflecting oligoclonality of T-cell responses. With epidemiological studies indicating seasonality of disease onset 16 , and increased incidence following vaccination with Pandemrix® in Europe 17,20 , a role of influenza or other infections is likely. CTSH implicates dendritic processing of antigens, perhaps of post-translationally modified HCRT itself [11][12][13][14][15] . Presentation by DQ0602 to CD4 + T cells ensue, with likely involvement of very few autoantigen epitopes and a restricted number of T-cell receptors, explaining the large effect of these loci. Subsequent cell killing of hypocretin neurons by CD8 + cells, may be through involvement of other, intracellular autoantigens 43 , complete the process. Altogether, this work illustrates how GWAS can identify the involvement of different cell types in a specific condition, thus ultimately providing further insight to the possible pathophysiological mechanisms underlying disease onset.

Study approval
This study has been reviewed and approved by the Stanford University Institutional Review Board (IRB) on Medical Human Subjects (Protocol # 14325, genetic and blood markers in narcolepsy and hypersomnia, Registration # IRB00005136) and by respective IRB panels in each country providing samples for the study. Informed consent was obtained from each participant in accordance with governing institutions. Patients and control subjects in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected prior the Finnish Biobank Act came into effect (in September 2013) and start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea, the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by . We confirm that all methods were carried out in accordance with relevant guidelines and regulations.

Study subjects
Six-thousand seventy-three unrelated individuals with NT1, some included in prior studies 8,9 , and 84,856 ancestry-matched controls were included in the study. In addition, 245 individuals with vaccination-related NT1 and 18,862 controls were recruited in Finland (N = 76 cases and 2796 controls), Sweden (N = 39 cases and 4894 controls), Norway (N = 82 cases and 429 controls) and United Kingdom and Ireland (N = 48 cases and 10,743 controls) [44][45][46][47] . All cases had documented immunization with Pandemrix®. All cases had narcolepsy with clear-cut cataplexy and were DQB1*06:02 positive or had narcolepsy with documented low hypocretin-1 in the cerebrospinal fluid. From FinnGen, we used ICD-10 code G47.4 and ICD-9 code 347, thus including individuals who have narcolepsy with or without cataplexy (for details, see Supplementary Materials 1.7 FinnGen).

Genotyping
Subjects were genotyped with Affymetrix Affy 5.0, Affy 6.0, Affymetrix Axiom CHB1, Affymetrix Axiom EUR, Axiom EAS, Axiom LAT, Axiom AFR, Axiom PMRA and Human Core Exome chip platforms (Table S1). Genotypes were called with Affypipe 48 , Affymetrix genotyping console or Genome Studio. Markers with genotyping quality (call rate < 0.95) or deviation from Hardy-Weinberg equilibrium (P-value < 10 −6 ) were discarded. Samples were checked for relatedness by filtering based on proportion of identity-by-descent using cut off >0.2 in PLINK 1.9 PI_HAT score. One pair of related individuals was removed. If related individuals were a case and a control, cases were retained in the analysis. Three first principal components within each cohort were visualized and outliers removed.

Imputation
We imputed samples by pre-phasing cases and controls together using SHAPEIT v2.2 and imputed with IMPUTE2 v2.3.2 49,50 and 1000 Genomes phase 1v3 build37 (hg19) in 5 Mb chunks across autosomes. Haplotype reference consortium data was used for the second Stanford collection. For variants having both imputed and genotyped values, the genotyped values were kept, whereas for those individuals where genotype was missing, imputed values were kept.

Genetic analyses and statistics
Analyses for all data sets were performed at Stanford University except for the Finnish and Swedish vaccination-related cases and European Narcolepsy Network samples, which were analyzed by respective study teams using the same analytical methods. GWAS was first performed in each case control group separately using SNPTEST v.2.5.2 51 . To adjust for cohort-specific population stratification issues, we used linear regression implemented in SNPTEST method score adjusting for the first ten principal components. Standard post imputation quality control was done. Variants with info score <0.5 and minor allele frequency (MAF) < 0.01 were removed from the analysis. Signals specific for one genotyping platform only and variants in each locus with heterogeneity P-value < 10 −20 were removed. We used a fixed-effects model implemented in METAv1.7 with an inverse-variance method based on a fixedeffects model for combining association results 52 . In total, 12,600,187 markers across studies were included in the final case/control metaanalysis. The significance level for statistically significant association was set to genome-wide significance (P-value < 5 × 10 −8 ), controlling for multiple testing. Overall, test statistics showed no genomic inflation (lambda < 1.05). GCTA was used for heritability and gene-based tests 53 . Coloc analysis was done using coloc package in R version 3.4.2 (2017-09-28) 54 and Manhattan and QQ-plots were created with QQman or FUMA. Shared and partitioned heritability was estimated using LD score regression. To compare the main characteristics of the participants, we used a multivariate logistic regression model as implemented in the R glm package. Lastly, we used FUMA and the curated list of GWAS as provided by FUMA to compute gene enrichment analysis for loci that were associated with narcolepsy at genome-wide significant level and to examine the association between narcolepsy loci and previous GWAS 55 . We also computed genetic correlation between autoimmune traits and narcolepsy using LD score regression and estimated PheWAS associations per lead variant in each locus using the Open targets resource https://genetics.opentargets.org/.

Comparison of vaccination and non-vaccination cases
Vaccination samples were studied separately for GWAS. To compare genetic architecture of narcolepsy cases following vaccination versus other cases, we first examined association of each GWAS significant SNPs of the primary (non-vaccinated) sample in the vaccination sample. In addition, we computed polygenic risk score in non-vaccination narcolepsy cases and vaccination-related narcolepsy using PRSice 56 and looked at overlap.

Typing and imputation of HLA variants
High-resolution HLA imputation in 4-digit resolution (2-field, amino acid level) for HLA A, B, C, DRB1, DQA1, DQB1, DPA1 and DPB1 was performed using HLA*IMP:02 as implemented in Affymetrix HLA or the HIBAG package in R version 3.1.2 (2014-10-31). HIBAG is an HLA imputation tool that uses attribute bootstrap aggregation of several classifiers (SNPs) to select groups of SNPs that predict HLA type and allows for the use of own HLA reference panels 30 . Reference HLA types were used from published imputation models and for individuals of primary Asian and African descent obtained with Sirona sequencing 57 in ethnic-specific populations (N = 500 individuals of African descent, N = 2000 individuals of European descent and N = 368 individuals of Asian descent). Imputation accuracy was further verified by Luminex HLA typing in a subset of samples and accuracy was over 95% for all ethnic groups and common alleles with >5% frequency in population. For all alleles, the accuracies for individuals of European descent were 98% for HLA-A, 97% for HLA-B, 98% for HLA-C, 96% for HLA-DRB1, 100% for HLA-DQA1, 100% for HLA-DQB1, 100% for HLA-DPA1 and 92% for HLA-DPB1. The accuracies for individuals of Asian descent, where allele typing was also available, were 95% for HLA-DRB1, 94% for HLA-DQA1 and 98% for HLA-DQB1.

Analysis of HLA variants
HLA effects in NT1 were analyzed as described before 6 in 23,410 individuals, including 9789 individuals of primary Asian descent and 13,621 individuals of primary European descent as ancestry-matched cases and controls. Within each ancestry group, HLA alleles were analyzed using additive models and logistic regressions after adjusting for the first 10 population-specific principal components. We identify independent associations using conditional analysis (stepwise forward regression in each cohort). Fixed-effects metaanalysis was used to combine associations using Plink 1.9 58 and R version 3.2.2. We considered alleles sustaining Bonferroni correction for correction of number of alleles with minor allele frequency over 2% (N = 110 HLA alleles), thus significance resulting in Bonferroni cut-off P = 0.00045.

Analysis of expression quantitative trait loci (eQTL)
We used tissue-specific summary statistics from the GTEx consortium and from 59 to examine total blood-specific effects of associating variants on gene expression 59,60 . We used Garfield to compute enrichments using enhancer annotation data from ENCODE provided by the Garfield software 61 and stratified LD score regression to compute tissue-specific enrichment using ENCODE data as provided by the LCSC package 34 .

Expression assessment in monocyte-derived dendritic cells
We examined how the genetic variants modulated T cell and antigen-presenting (dendritic cell and monocyte) gene expression by RNA sequencing and RNA expression. To examine environmentspecific triggers for eQTLs, we challenged dendritic cells with an influenza-A infection or stimulated them with interferon. We recruited individuals free from earlier inflammatory disease, autoimmune disease, chronic metabolic disorders or chronic infectious disorders between 18 and 56 years of age (average 29.9), extracted blood mononuclear cells and differentiated into mononuclear dendritic cells, as previously described 62 . We then extracted RNA from the samples using the RNeasy 96 kit (Qiagen, CAT#74182), according to the manufacturer's protocols and sequenced the samples under baseline, influenza infected and interferon beta 1 (IFNB1) stimulation (99 baseline, 250 influenza infected, and 227 IFNB1 stimulated). Five hundred fifty-two pass-filter samples (94 baseline, 243 influenza, and 215 interferon) were sequenced to an average depth of 38 million 76-bp paired-end reads using the Illumina TruSeq kit. We aligned reads to hg19 genome with TopHat, assembled transcriptomes for each sample using StringTie 63 and computed transcript quantities using Kallisto 64 . We merged transcriptomes across the same condition and then across all three conditions and removed redundant isoforms using cuffcompare 65 . We performed QTL mapping using the Matrix eQTL 66 package using an empirically determined number of principal components (PCs) as covariates in each analysis. We tested 0 to 44 PCs (local eQTLs) in increments of two and the number of PCs was chosen to maximize the number of local eQTLs detected. We computed empirical Pvalues by comparing the nominal P-values with null P-values determined by permuting each gene 1000 times. False-discovery rates were calculated using the qvalue package (https://github. com/StoreyLab/qvalue), as previously described 67 .

T-cell receptor eQTL analysis
For this analysis, we used data from 895 individuals that were originally genotyped and sequenced as part of the Depression Genes and Networks Project reported by 68 , identifying short range (cis) SNPs and trans HLA alleles association with TCR V and J usage as described before 35 . Briefly, expression/usage of each T-cell receptor alpha and beta V-and J-gene was calculated relative to total chain expression from peripheral blood RNA-sequencing. We mapped sequencing reads as in Battle et al. (Bowtie254 with Tophat55 default parameters) and counted the number of unique reads that mapped to each V/J/C-TCR/Ig gene with a modified version of HTSeq56, which allows reads to map to a sequence of more than one V/D/J/Cgene. We then removed individuals and genes with low read counts, normalized the reads using log transformation and regressed on technical and biological covariates as described in Battle et al. as well. Finally, we quantile-normalized the residuals to a normal distribution. Pearson correlations were used to test associations between genotypes and V-or J-gene expression.
Targeted TCR sequencing in NT1 cases and DQ0602-positive controls In addition to using the data from Battle et al., (2014) we also conducted TCR sequencing in T cells in 60 individuals with NT1 and 60 healthy DQ0602 individuals using RNA from total CD4+ T cells, CD4+ T memory and CD8+ T-cell populations. We used fastqc to infer quality and trimmed low-quality reads. We then performed barcode demultiplexing, after which local blast was used to align and extract CDR3s. Linear regression was fit for TRA usage per genotype dosage adjusting for age and gender, RNA-sequencing lane and case/control status as covariates. We also separately analyzed coding consequences for each TRAJ24 containing productive CDR3 fragment, as one of the most significantly associated SNPs was a coding SNP (rs1483979) changing an amino acid Leucine to Phenylalanine. These "LQF" and "FQF" were extracted, and their frequencies were computed. Ratio of FQF/(LQF + FQF) was further computed across all samples.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The sumstats generated in this study have been deposited in the Dryad database under https://doi.org/10.5061/dryad.kd51c5b9b.