Genome-Wide Association Study of Cryptosporidiosis in Infants Implicates PRKCA

Globally, diarrhea remains one of the major causes of pediatric morbidity and mortality. The initial symptoms of diarrhea can often lead to long-term consequences for the health of young children, such as malnutrition and neurocognitive developmental deficits. Despite many children having similar exposures to infectious causes of diarrhea, not all develop symptomatic disease, indicating a possible role for human genetic variation. Here, we conducted a genetic study of susceptibility to symptomatic disease associated with Cryptosporidium infection (a leading cause of diarrhea) in three independent cohorts of infants from Dhaka, Bangladesh. We identified a genetic variant within protein kinase C alpha (PRKCA) associated with higher risk of cryptosporidiosis in the first year of life. These results indicate a role for human genetics in susceptibility to cryptosporidiosis and warrant further research to elucidate the mechanism.

LAZ 12 ) versus controls within PROVIDE (P ϭ 0.007) and CBC (P ϭ 0.02), while no differences were observed in stunting between cases and controls in DBC (P ϭ 0.97). Additionally, there was no statistically significant evidence of heterogeneity in LAZ 12 , number of days exclusively breastfed, or sex between the three studies (heterogeneity P [P het ], Ͼ0.05).
GWAS of cryptosporidiosis within the first year of life. We tested the association between 6.5 million SNPs across the human genome and symptomatic Cryptosporidium infection in the first year of life. Effects were estimated separately for the three birth cohorts and subsequently combined using a fixed-effects meta-analysis, filtered for heterogeneity (P het ), minor allele frequency (MAF) (Ͼ5%), and imputation quality (INFO; score, Ͼ0.6) ( Fig. 1; see also Fig. S1 in the supplemental material). A total of 6 SNPs in an intron of PRKCA (protein kinase c, alpha) were significantly associated with Cryptosporidium infection (P Ͻ 5 ϫ 10 Ϫ8 ) ( Fig. 2A). For the SNP most highly associated with Cryptosporidium infection (rs58296998), each copy of the risk allele (T) conferred 2.4 times the odds of cryptosporidiosis within the first year of life (P ϭ 3.73 ϫ 10 Ϫ8 ). The effect size and risk allele were consistent across all three studies (P het value of 0.11) (Fig. 2B). After conditioning performed on the basis of rs58296998 (by including this SNP in the logistic regression model as a covariate), the evidence for association with the remaining SNPs in the region was no longer statistically significant, suggesting that the observed association in PRKCA is explained by a single SNP (rs58296998) or by one  The y axis is the -log10 P value for the SNP association in the meta-analysis of study-specific logistic regressions adjusting for length-for-age Z-score at 12 months, the first two study-specific principal components, and the genotyping batch for the Dhaka Birth Cohort (DBC). Genome-wide significance (5 ϫ 10 Ϫ8 ) is denoted by the dashed line. This plot is limited to associations with a P value below 0.01.
Genetic Susceptibility to Cryptosporidiosis ® highly correlated with this SNP (Fig. S2A). Among the 26 children homozygous for the risk allele (TT) at rs58296998, 46% developed symptomatic cryptosporidiosis during the first year of life. This proportion decreased to 24% for children heterozygous (CT) for this risk allele (n ϭ 272), compared to 13% of children homozygous (CC) for the risk allele (n ϭ 745). The rs58296998 T allele frequencies (15.0% to 16.7%) for all three cohorts in this region are consistent with the Bangladeshi reference population (1000 Genomes phase 3) frequency of 18% and the overall South Asian frequency of 15% (17). Globally, the highest frequencies of rs58296998 T allele are found in East Asian populations, with the highest T allele frequency of 34% of the Chinese Dai in Xishuangbanna, China. The rs58296998 T allele is at lower frequencies within Africa, at 9% within the Luhya in Kenya, and is even less frequent in West Africa (3.5% to 5.5%) (Fig. 3).
Cases had their first diarrheal episode positive for Cryptosporidia at a mean of 242 days of age. We confirmed the GWAS results with respect to the dosage of rs58296998 risk alleles significantly associated with time to first diarrheal sample positive for Cryptosporidia among cases versus right-censored controls (up to the child's first birthday) (P ϭ 6.37 ϫ 10 Ϫ8 ). All children homozygous for the risk allele (TT) had their first episode in the first year of life (Fig. 2C). Among cases, however, there was no statistically significant association between rs58296998 genotype and time to infection (P ϭ 0.095) (Fig. S2B). In PROVIDE, the rs58296998 genotype was associated with severity of diarrhea as determined by the Ruuska score (P ϭ 0.028) (Fig. S2C).
Suggestive SNP associations with Cryptosporidium (P Ͻ 10 Ϫ6 ) were also identified on chromosome 11 (chr11) and chr16. The strongest association on chromosome 11 (rs4758351) was found within an intergenic region of a cluster of olfactory receptor genes. Each copy of the rs4758351 A allele (MAF of 14%) conferred 2.39 times the odds of Cryptosporidium within the first year of life (P ϭ 3.78 ϫ 10 Ϫ7 ) (Fig. S3A). Multiple SNPs in this region of chr11 (position 6015194 to position 6024551) had similar magnitudes and strengths of association with Cryptosporidium (OR, 2.13 to 2.39). The strongest association on chromosome 16 was with the rs9937140 SNP, located upstream of apolipoprotein O pseudogene 5 (APOOP5). Each copy of the rs9937140 G allele (MAF, 23%) conferred 1.99 times the odds of cryptosporidiosis (P ϭ 7.75 ϫ 10 Ϫ7 ) (Fig. S3B).

Expression and PrediXcan.
We used a publicly available resource, the Genotype-Tissue Expression (GTEx) Project, to estimate the influence of human genetic variation on human gene expression in multiple tissues (18,19). The associated rs58296998 SNP, located in the PRKCA gene, is also associated with PRKCA expression. This expression quantitative trait locus (eQTL), or a genetic variant previously shown to influence the expression of a gene, showed decreasing expression of PRKCA with each T allele in the esophageal muscularis (P ϭ 3.12 ϫ 10 Ϫ5 ), the sigmoid colon (P ϭ 4.61 ϫ 10 Ϫ4 ), and the esophageal mucosa (P ϭ 7.50 ϫ 10 Ϫ4 ) (19). These expression data, coupled with the GWAS result, suggested that decreased expression of PRKCA is correlated with increased risk of symptomatic Cryptosporidium infection within the first year of life.
Additional genome-wide expression and gene set analyses. In the absence of direct gene expression measurement, we relied on previously estimated tissue-specific associations between genome-wide SNPs and gene expression, which quantify the genetic component of gene expression. We estimated predicted patterns of genomewide differential gene expression between cases and controls by weighting the summary statistics from our GWAS of cryptosporidiosis in the first year of life by the use of tissue-specific PredictDB weights. These SNP-level estimates were then combined for each gene to infer association between imputed gene expression and cryptosporidiosis (20,21). No association of predicted gene expression with cryptosporidiosis reached statistical significance. A total of 13 genes showed a nominally significant (P Ͻ 0.001) association in more than one tissue-specific model (see Table S1 in the supplemental material; see also Fig. S4). Variants in the gene OTUD3 (OTU deubiquitinase 3) (chr1; position 20208356 to position 20239438) were associated with cryptosporidiosis in 18 different tissue-specific models (P Ͻ 0.001). In all tissue-specific models, individuals with predicted increased expression of OTUD3 had an increased risk of cryptosporidiosis within the first year of life (OR, 1.68 to 6.63; P ϭ 8.46 ϫ 10 Ϫ5 to 8.97 ϫ 10 Ϫ4 ) (Fig. 4).
We also performed gene set enrichment analysis using MSigDB hallmark gene sets (n ϭ 50), KEGG (n ϭ 186) and BioCarta (n ϭ 217) by combining gene-level summary statistics to examine aggregate signals within biological pathways. No pathways reached statistical significance after adjusting for multiple comparisons; however, data from several gene sets were suggestive (Table S2). The two top-ranked gene sets are among the hedgehog signaling pathways, namely, the hallmark hedgehog signaling pathway (empirical P value [P emp ] ϭ 5.04 ϫ 10 4 ) (Bayes factor [BF] ϭ 515.65) and KEGG hedgehog signaling pathway (P emp ϭ 1.47 ϫ 10 Ϫ3 ) (BF ϭ 235.59).

FIG 3
Allele frequencies for allele T at top signal rs58296998 as determined by analysis of 1000 Genomes phase 3 data, as well as by analysis of case/control status in the three cohorts combined. Each pie chart on the map shows the frequency of the T allele with the black wedge. The remainder of each pie chart is colored in accordance with that T allele frequency. The inset provides the T allele frequency for children without any symptomatic cryptosporidiosis in the first year of life (controls; MAF ϭ 13.6%) and for those with at least one diarrheal episode (cases; MAF ϭ 25.0%).
Genetic Susceptibility to Cryptosporidiosis ®

DISCUSSION
Here, we present the results of the first genome-wide association study of symptomatic Cryptosporidium infection. Specifically, we tested the role of host genetics in susceptibility to Cryptosporidium infection associated with diarrhea within the first year of life. A region on chromosome 17 was identified, with each additional T allele of rs58296998, an intronic SNP in PRKCA, conferring 2.4 times the odds of cryptosporidiosis within the first year of life. Additionally, this SNP was previously identified as an eQTL of PRKCA, with decreased expression of PRKCA associated with the T allele. This suggests that this SNP may influence Cryptosporidium infection through decreased expression of PRKCA.
The protein kinase C alpha gene (PRKCA) is an isotype of the protein kinase C (PKC) family, whose members are serine and threonine specific and are known to be involved in diverse cellular signaling pathways. Specifically, PKCs have numerous roles in the development and function of the gastrointestinal tract (22) and in the immune response (23). This relationship was confirmed with knockout experiments, where PKC␣ was shown to be a positive regulator of Th17 cell effector functions. PKC␣-deficient [Prkca(Ϫ/Ϫ)] cells failed to produce the appropriate levels of interleukin-17A (IL-17A) in vitro (23). An analysis of Cryptosporidium parvum-infected mice demonstrated the importance of the Th17 response to infection, showing increased levels of IL-17 mRNA and Th17 cell-related cytokines in gut tissue after infection (24). Additionally, both pharmacological inhibition and genetic PKC␣ inhibition have been shown to prevent NHE3 internalization, Na ϩ malabsorption, and tumor necrosis factor (TNF)-mediated diarrhea, despite continued barrier dysfunction (25), supporting the idea of a role for PRKCA in symptomatic cryptosporidiosis. This link between PRKCA and Th17 may be critical to gut infections and, specifically, to infection of Cryptosporidium in the developing infant gut. We identified a SNP that was associated with decreased expression of PRKCA and thus was less able to mediate the IL-17 immune response during Cryptosporidium infection. PRKCA has also been shown to be associated with numerous other infections, including infections by Staphylococcus aureus (26); with progression of sepsis (27) and toxoplasmosis (28); with Burkholderia cenocepacia infections in cystic fibrosis patients (29); and with hepatitis E virus replication (30).
As an obligate intracellular parasite, Cryptosporidium relies on host cells to complete its life cycle in the human host; thus, it is also plausible that PRKCA directly mediates susceptibility via impacts on parasite invasion. Sporozoites invade brush border intestinal epithelial cells by inducing volume increases (31) and cytoskeletal remodeling at the site of host cell attachment (32), leading to engulfment via host membrane protrusions. Studies have shown that inhibition of host factors, including actin remodeling proteins and PKC enzymes, is sufficient to inhibit sporozoite invasion in vitro (32). Interestingly, PKC␣ has been shown to play an important role in Escherichia coli pathogenesis (33). Like Cryptosporidium, E. coli induces host actin condensation at the site of host cell invasion, and immunocytochemical studies indicate that activated PKC␣ colocalized with actin condensation at the bacterial entry site (34).
While our top SNP within PRKCA has previously been shown to influence the expression of PRKCA in GTEx, our imputed gene expression analysis using PrediXcan did not reveal a significant difference in predicted levels of PRKCA expression between cases and controls. This was likely due to the difference between a single SNP being examined in GTEx and the combined effects of multiple eQTLs estimated from a European descent reference population in PrediXcan. A major limitation of predicted gene expression analyses is the lack of population specificity for non-European groups (35). The PrediXcan models were derived from individuals of European descent, as were the covariance structures used to infer correlations between eQTLs. We saw a direct relationship between population differences in allele frequencies for the weighted SNPs and impaired performance. Specifically, we observed the lowest predictive performance in tissues for which the informative SNPs had large differences in allele frequencies between European and South Asian populations in the 1000 Genomes Project phase 3 data (17) (see Fig. S5 in the supplemental material). These included two tissues, namely, esophageal mucosa and the colon sigmoid tissue, in which rs58296998 was identified as an eQTL for PRKCA. These trends highlight the importance of reference populations representative of global populations to ensure that tools are useful in non-European populations, such as ours. We also identified an association of increased expression of OTUD3 with increased odds of cryptosporidiosis within the first year of life. This gene is associated with ulcerative colitis (36)(37)(38)(39)(40)(41)(42) and inflammatory bowel disease (43,44). This finding is consistent with the hypothesis of a pathway shared between enteric infection and autoimmune intestinal disease, as indicated in a previous genetic analysis of Entamoeba histolytica infection in the same study population (45).
Collapsing the predicted patterns of differentially expressed genes into gene sets, we found enrichment in the hedgehog signaling pathway. A previous study examined the gene expression profiles of long noncoding RNA (lncRNA) and mRNA in HCT-8 cells infected with C. parvum subtype IId (46). Of note, PRKCA was the most significantly differentially expressed gene in infected HCT8 cells 24 h postinfection (2.24-fold decreased expression in infected cells; P ϭ 3.82 ϫ 10 Ϫ5 ). Pathway analysis of the differentially expressed mRNAs found that genes in the hedgehog signaling pathway were significantly enriched during Cryptosporidium infection. This finding, in combination with our identification of hedgehog signaling in imputed gene expression profiles, is suggestive of a potential link between decreased PRKCA expression and hedgehog signaling; however, further research to confirm these findings and to elucidate the role of PRKCA genetic variation in gene expression and hedgehog pathway perturbation is needed.
A potential limitation of our study was that, due to the use of sensitive molecular diagnostics, multiple enteropathogens were frequently detected in each diarrheal sample. However, we did not detect the same genetic signatures as that seen in our previous study of Entamoeba histolytica in this same study population for Cryptosporidium (45). Further, coinfection with multiple pathogens would dilute the statistical signal for any one pathogen, and yet we found a statistically significant result for Cryptosporidium. Therefore, we are confident that our results are specific to cryptosporidiosis, despite cooccurrence with other enteric pathogens.
Through a GWAS meta-analysis of three separate birth cohorts, we identified a region in PRKCA on chromosome 17 as being associated with increased risk of symptomatic cryptosporidiosis in the first year of life among Bangladeshi infants. This gene has previously been implicated in other infectious outcomes, indicating pleiotropy with the immune system's reaction to numerous pathogens. Publicly available data support Genetic Susceptibility to Cryptosporidiosis ® a link between our top SNP and expression of PRKCA, suggesting a mechanism operating via Th17 inflammatory control. Clinical trials are currently proposed for PKC isotypes, including PKC-alpha, for treatment of autoimmune disease (47). These treatments may also be important for cryptosporidiosis, which lacks treatment for young children, due to an underlying shared pathway identified in this study. Identifying host genetic variations associated with cryptosporidiosis, such as those in PRKCA, can help us identify viable drug targets to improve treatment and prevention of this major cause of morbidity and mortality. Further research is needed to elucidate the mechanism underlying this relationship and to better understand the complex interplay of genetic susceptibility and environmental influences in the development of intestinal disease. Dhaka Birth Cohort study design. Designed to study the influence of malnutrition in child development, the Dhaka Birth Cohort (DBC) is a subset of a larger birth cohort recruited from the urban slum in the Mirpur Thana in Dhaka, Bangladesh. Children were enrolled within the first week after birth and followed up biweekly with household visits by trained field research assistants (FRAs) for the first year of life. Anthropometric measurements were collected at the time of enrollment and every 3 months thereafter. Length-for-age adjusted Z-scores (LAZ) were calculated by comparing the lengths and weights of study subjects with those of the World Health Organization (WHO) reference population, adjusting for age and sex, using WHO Anthro software, version 3.0.1. Field research assistants (FRAs) collected diarrheal stool samples from the home or study field clinic every time that the mother of the child reported diarrhea. To maintain a cold chain, the samples were transported to the Centre for Diarrheal Disease Research, Bangladesh (ICDDR,B) parasitology laboratory. The presence of Cryptosporidium was determined using enzyme-linked immunosorbent assay (ELISA). More details can be found in previously published reports by Steiner et al. (4) and Korpe et al. (9). We used a nested case-control design, where children with at least one diarrheal sample positive for Cryptosporidium within the first year were defined as "cases." Children with diarrheal samples that were not positive for Cryptosporidium were defined as "controls." PROVIDE study design. The "Performance of Rotavirus and Oral Polio Vaccines in Developing Countries" (PROVIDE) Study consists of a randomized controlled clinical trial and birth cohort from the same urban slum in the Mirpur Thana in Dhaka, Bangladesh, as the DBC and Cryprosporidiosis Birth Cohort (CBC) (see below). PROVIDE was specifically designed to assess the influence of various factors on oral vaccine efficacy among children in areas with high poverty, urban overcrowding, and poor sanitation. The 2-by-2 factorial design looked specifically at the efficacy of the 2-dose Rotarix oral rotavirus vaccine and oral polio vaccine (OPV) with an inactivated polio vaccine (IPV) boost over the first 2 years of life. All participants were from the Mirpur area of Dhaka, Bangladesh, with pregnant mothers recruited from the community by female Bangladeshi FRAs. Each participant had 15 scheduled follow-up clinic visits, as well as biweekly diarrhea surveillance through home visits by FRAs. The presence of Cryptosporidium in diarrheal samples was determined by ELISA. Consistently with the DBC phenotype definition, cases had at least one diarrheal sample positive for Cryptosporidium within the first year of life. Controls had at least one diarrheal sample available for testing, but none were positive for Cryptosporidium. Severity of diarrhea was determined with the Ruuska score, which assesses severity as a function of diarrhea length, clinical symptoms, and other clinical features (48).

MATERIALS AND METHODS
Cryptosporidiosis Birth Cohort study design. The Cryptosporidiosis Birth Cohort ("Cryptosporidiosis and Enteropathogens in Bangladesh"; ClinicalTrials.gov registration no. NCT02764918) is a prospective longitudinal birth cohort study in two sites in Bangladesh. The first site is in an urban, economically depressed neighborhood of Mirpur, and the second is in Mirzapur, a rural subdistrict 60 km northwest of Dhaka. The two birth cohorts were established in parallel, with the objective of understanding the incidence of cryptosporidiosis, the acquired immune response, and host genetic susceptibility to cryptosporidiosis in Bangladeshi children. Pregnant women were recruited and screened, and infants were enrolled at birth. Participants were followed twice-weekly with in-home visits to monitor for child morbidity and diarrhea for 24 months. Infant length and weight were measured every 3 months, and weight-for-age and length-for-age adjusted Z-scores were determined using World Health Organization Anthro software (version 3.2.2). Stool samples were collected during diarrheal illness and once per month for surveillance. Stool was tested for Cryptosporidium by quantitative PCR (qPCR) assay modified from a method reported previously by Liu et al. (49). A cycle threshold value of 40 was used. The pan-Cryptosporidium primers and probes target the 18S gene in multiple species known to infect humans (4).
Genotype data. DNA for all three cohorts was extracted from blood samples collected in the first few months of follow-up. The Dhaka Birth Cohort (DBC) and PROVIDE Study data were generated and cleaned as described previously (45). A summary of quality control (QC) procedures is provided in Fig. S1 in the supplemental material. Briefly, a total of 396 children in the DBC were genotyped on three different Illumina arrays. Imputation to 1000Genomes phase 3 data was performed for all individuals. After postimputation QC, which included additional filtering for relatedness and for poorly imputed variants, a total of 396 individuals and 10.2 million SNPs were included in the DBC data freeze. For PROVIDE, a total of 541 individuals were genotyped on a Multi-Ethnic Genotyping Array (MEGA) (Illumina). After standard quality control measures (including the use of minor allele frequency values of Ͼ0.5% and missingness values of Ͻ5%) were applied and first-degree-related individuals removed, a total of 499 individuals remained. After imputation to 1000Genomes and subsequent postimputation QC, a total of 499 individuals and 10.8 million genetic variants remained. For CBC, a total of 630 individuals were genotyped on a Multi-Ethnic Global Array (MEGA) (Illumina). One individual was removed for first-degree relatedness (PI_HAT Ͼ 0.2), 31 individuals were removed as PCA outliers, and 3 individuals were removed for heterozygosity. No individuals or SNPs were removed for missingness (Ͼ5%). Additional SNP-level filters included the use of minor allele frequency (MAF) values of Ͻ0.5% (M ϭ 751,869) and Hardy-Weinberg equilibrium P values of Ͻ10 Ϫ5 (M ϭ 85). After all QC steps, CryptoCohort genotype data included 594 individuals and 826,228 SNPs. Phasing in of SHAPEIT2 (50) was followed by imputation to 1000 Genomes phase 3 data (1000Genomes) (17) performed with IMPUTE2 (51,52). All three studies were separately imputed to 1000Genomes.
Cross-study genetic data harmonization. After imputation, all three data sets (DBC, PROVIDE, and CBC) were double-checked for relatedness (both within each study and between studies) to ensure independence. One individual from each pair of related individuals was dropped in a manner consistent with the first or second degree of relatedness (PI_HAT Ͼ 0.2). Individual outliers for heterozygosity (F ϭ Ͼ5 standard deviations from the mean) were also excluded from further analysis. A total of 85 individuals were dropped from DBC, 9 from PROVIDE, and 34 from CBC. Only the top principal component from the combined data set was found to be significantly associated with outcome (Fig. S6).
Statistical analysis. All three studies (DBC, PROVIDE, and CBC) were analyzed separately using logistic regression with an additive model accounting for imputed genotype weights in SNPTEST (51,53,54). All three analyses were adjusted for length-for-age Z-score (LAZ) at 1 year of age, for sex, and for the first two principal components. The Dhaka Birth Cohort was additionally conditioned on the genotyping array to account for batch effects. We combined the three analyses in a fixed-effects meta-analysis within META. Results were filtered for P het values of Ͼ0.05, minor allele frequency (MAF) of Ͼ5%, and INFO score of Ͼ0.6 in all three studies, resulting in 6,504,706 SNPs. The conditional analyses were run separately by cohort for the PRKCA region, with each analysis being conditioned on rs58296998 in addition to the original covariates with SNPTEST. Results were again filtered for heterogeneity or P het values of Ͼ0.05, MAF of Ͼ5%, and INFO score of Ͼ0.6 in all three studies.
Allele frequencies. The allele frequencies were derived from the 1000 Genomes Project phase 3 data, v5a (17). Individuals were stratified by their denoted population with first degree related individuals removed.
GTEx and eQTL overlap GWAS results. Expression quantitative trait loci (eQTLs) were identified through the use of the GTEx Portal (https://www.gtexportal.org/home/) on 6 August 2018 (19). The top SNP was identified as an eQTL for PRKCA with P values of Ͻ0.001 for multiple tissues. PrediXcan measured gene expression in 48 tissues and subsequently mapped genetic variation across the human genome to tissue-specific gene expression levels. Therefore, eQTLs are identified in a tissue-specific manner and annotated as such on the GTEx Portal.
MetaXcan imputation and association analysis. To impute gene expression and association with outcome from our GWAS summary statistics, we applied MetaXcan (S-PrediXcan and packaged best practices) (21). Weights were previously derived with GTEx v7 data in a population of subjects of European descent, with accompanying European-descent linkage disequilibrium metrics for the SNP covariance matrices (PredictDB Data Respository; http://predictdb.org/). MetaXcan was used instead of the original PrediXcan to ensure consistency in models with our GWAS. All 48 tissues were run separately for the meta-analysis results previously described. Following imputation and estimation of gene expression with outcome, we calculated weights for each gene-tissue pair as the ratio between the number of SNPs used in the model and the total number that were prespecified in the model multiplied by predicted expression performance. To determine associations across many tissues, a P value threshold of 0.001 was utilized. A strict Bonferroni correction performed for the 242,686 comparisons resulted in a P value threshold of 0.05/242,686 ϭ 2.06 ϫ 10 Ϫ7 , according to which no comparison yielded a statistically significant result. The relationships of allele frequencies in European and South Asian populations with PrediXcan weights were examined to assess prediction capacity ( Fig. S5 and S7).
Gene set enrichment analysis. Gene set enrichment analysis was conducted on the described previously imputed gene expression data summary statistics from MetaXcan. For each gene, we selected the tissue corresponding to the smallest P value. Using the program GIGSEA (Genotype Imputed Gene Set Enrichment Analysis [55]), we tested for associations of 453 curated gene sets defined by MSigDB hallmark gene sets (56), as well as KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg .jp) and BioCarta (57) gene sets (58). To account for redundancy with overlapping gene sets, we utilized the weighted multiple linear regression model, using the matrix operation to increase speed, with a total of 1,000 permutations. A false-discovery rate of 0.05 was calculated on the ranked results.
Data availability. Data are publicly available from the NIH, via dbGAP, phs001478.v1.p1 (Exploration of the Biologic Basis for Underperformance of Oral Polio and Rotavirus Vaccines in Bangladesh), or by request from us. All analysis programs used are detailed above, but the actual code in R for each analysis is also available by request from us.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.