Data characterizing the ZMIZ1 molecular phenotype of multiple sclerosis

The data presented in this article are related to the research article entitled “The autoimmune risk gene ZMIZ1 is a vitamin D responsive marker of a molecular phenotype of multiple sclerosis” Fewings et al. (2017) [1]. Here we identify the set of genes correlated with ZMIZ1 in multiple cohorts, provide phenotypic details on those cohorts, and identify the genes negatively correlated with ZMIZ1 and the cells predominantly expressing those genes. We identify the metabolic pathways in which the molecular phenotype genes are over-represented. Finally, we present the flow cytometry gating strategy we used to identify the blood immune cells that produce ZMIZ1 and RPS6.

Value of the data
1. The detailed list of genes defining the ZMIZ1 molecular phenotype in multiple cohorts is provided (Supplementary Table 1).
2. The cohorts used to identify the genes in the ZMIZ1 molecular phenotype are described (Table 1).
3. The genes negatively correlated with ZMIZ1 expression in whole blood, and the immune cells in which they are expressed, are identified (Fig. 1).
4. The gene pathways over-represented among the genes of the ZMIZ1 molecular phenotype are identified (Fig. 2).
5. The flow cytometry gating strategy used to identify the immune cells most highly expressing ZMIZ1 and RPS6 is diagrammed (Fig. 3).
6. A list of multiple sclerosis (MS) risk SNPs tested for association with gene expression in whole blood in the study published in [1] is provided (Table 2).

Data
We have recently described a gene, ZMIZ1, whose expression is dysregulated in the blood of people with MS [1]. Transcriptomic data show that the expression of this gene is tightly correlated with that of many others. The set of genes whose expression is most correlated with ZMIZ1 across cohorts from Australia and the United States is defined in Supplementary Table 1, and the cohort details are described in Table 1. The genes whose expression is most positively correlated with that of ZMIZ1 were described previously [1]. The genes most negatively correlated with ZMIZ1, and the immune cell subsets in which they are predominantly expressed, are shown in Fig. 1. Gene pathway analysis was used to identify the types of molecular pathways most over-represented in the gene lists of the ZMIZ1 molecular phenotype (Fig. 2). Specifically, the over-represented GeneGo Maps, GeneGo Map folders, GeneGo Networks, Gene Ontology processes and Gene Ontology molecular functions are shown in Figs. 2A-E, respectively. The flow cytometry gating strategies used to identify the immune cell subsets expressing ZMIZ1 and RPS6 are shown in Figs. 3A and 3B, respectively. A list of MS risk single nucleotide polymorphisms (SNPs) tested for association with gene expression in whole blood in this study is presented in Table 2.

Cohorts
Untreated MS patients, who had not received immunomodulatory therapy for at least three months, and age-matched healthy controls were recruited for the following cohorts. The molecular phenotypes were determined from previously described transcriptomic cohorts (all PAXgene whole blood samples): ANZgene (2010) [2]; clinically isolated syndrome (CIS, the first clinical diagnosis of central nervous system demyelination) [3]; and RNAseq [4,5]. All MS patients were diagnosed using the revised McDonald criteria [6]. All blood was collected from people with MS and healthy controls with informed consent, after the nature and possible consequences of the studies had been explained, and with approval from Human Research Ethics Committees.

Molecular phenotypes
The genes whose expression was most correlated with ZMIZ1, ZFP36L2 and RPS6 expression in PAXgene whole blood were determined by Pearson's correlation using an RNAseq dataset of MS patients and healthy controls [4]. To visualise the relative expression levels of the most correlated genes across immune cell populations, a heatmap was generated using an RNAseq dataset of ex vivo and in vitro differentiated immune cell subsets, as previously described [7]. These genes were assessed for immune cell transcription factor roles and involvement in molecular pathways using GeneGo MetaCore, and genes associated with MS were also noted. We then tested the correlations in previously described cohorts used for transcriptomic studies: the ANZgene microarray cohort [2] and the CIS cohort [3].

Table 1
Treated MS cohort (Figure 10): total n = 78; glatiramer acetate n = 10, fingolimod n = 18, interferon beta n = 20, natalizumab n = 23, dimethyl fumarate n = 7 [8].
ANZgene (microarray) cohort (Figures 1B, 6, 7): n = 99 untreated MS, n = 45 healthy controls [2].
Sydney RNASeq cohort (Figures 1, 3): …

Fig. 1. Relative expression in immune cell subsets of ZMIZ1 and the 50 genes whose expression is most negatively correlated with ZMIZ1 expression in PAXgene whole blood from people with multiple sclerosis and healthy controls. These genes were determined in the RNASeq cohort (n = 32 MS, n = 40 healthy controls [4]) and are mostly expressed in lymphocytes. Expression was measured by RNASeq; colour on the heatmap indicates relative expression level (orange, high; blue, low). Cell subsets were generated ex vivo or in vitro as previously described [7]. Pearson's correlation (R) of expression with ZFP36L2 and RPS6 is also shown for each module gene (red, positive correlation; green, negative correlation). ZFP36L2 correlations are all less than r = -0.27 (p = 0.02); RPS6 correlations are all greater than r = 0.54 (p = 8.9E-07).
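The Pearson correlation screen described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`pearson_r`, `rank_by_correlation`, `approx_p_value`), the dictionary-of-lists input format and the toy gene names are hypothetical, and the p-value uses the Fisher z approximation rather than the exact t-test a statistics package would normally report. Assuming the Fig. 1 correlations were computed over all 72 RNASeq samples (32 MS, 40 controls), this approximation gives p of about 0.02 for r = -0.27, in line with the value quoted in the legend.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two vectors of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector is constant
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(
    expression: Dict[str, Sequence[float]], target: str, top_n: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the top_n most positively and most negatively correlated genes.

    `expression` maps a (hypothetical) gene name to its per-sample values;
    every gene must be measured in the same samples, in the same order.
    """
    ref = expression[target]
    scores = []
    for gene, values in expression.items():
        if gene == target:
            continue
        r = pearson_r(ref, values)
        if not math.isnan(r):  # skip genes with zero variance
            scores.append((gene, r))
    positive = sorted((s for s in scores if s[1] > 0), key=lambda s: -s[1])
    negative = sorted((s for s in scores if s[1] < 0), key=lambda s: s[1])
    return positive[:top_n], negative[:top_n]


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 using the Fisher z approximation.

    Requires |r| < 1 and n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical toy data: five samples per gene.
    toy = {
        "ZMIZ1": [1, 2, 3, 4, 5],
        "GENE_A": [2, 4, 6, 8, 10],
        "GENE_B": [5, 4, 3, 2, 1],
        "GENE_C": [1, 3, 2, 5, 4],
    }
    pos, neg = rank_by_correlation(toy, "ZMIZ1", top_n=2)
    print("positive:", pos)
    print("negative:", neg)
    print("p for r=-0.27, n=72:", approx_p_value(-0.27, 72))
```

Separating the positive and negative lists mirrors the text, which reports the positively correlated genes in [1] and the 50 most negatively correlated genes in Fig. 1.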

Flow cytometry
Venous blood was collected in EDTA, and peripheral blood mononuclear cells (PBMCs) were isolated on Ficoll-Paque Plus (VWR International), washed in phosphate-buffered saline (PBS) and cryopreserved in RPMI 1640 Medium (Life Technologies) containing 2 mM glutamine, 10% heat-inactivated fetal bovine serum (FBS, Fisher Biotec), 10% DMSO, 50 units/ml penicillin and 50 µg/ml streptomycin. PBMCs were thawed, washed in RPMI with 2% FBS, and incubated for 30 min in RPMI with 2% FBS, 10 mM HEPES, 1 mM magnesium chloride and 100 units/ml DNase I (Roche). Cells were stained with Live/Dead Aqua viability stain (Molecular Probes) in PBS on ice for 30 min, washed in PBS, and blocked with 33 µg/ml mouse IgG (Life Technologies). Antibodies were as described in [1].

Fig. 2. … Processes; E. Molecular functions. Enrichment analysis using Gene Ontology (GO) software (http://geneontology.org/), with items listed in order of significance; p value shows the probability that the module is not over-represented; FDR, false discovery rate; ratio, the number of genes from the ZMIZ1 module in the pathway compared with the total number of genes in the pathway.

Table 2. List of MS risk SNPs tested for association with gene expression in whole blood in this study.
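The enrichment statistics summarised in the Fig. 2 legend (p value, FDR and ratio) can be illustrated with a generic over-representation test. The sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) with Benjamini-Hochberg FDR correction; GeneGo MetaCore and the GO enrichment tools may use different background gene sets and implementation details, so this illustrates the concepts rather than reproducing Fig. 2. Here the p value is the probability of observing at least the given number of module genes in a pathway by chance. The function names, toy gene sets and background size are hypothetical; the ratio follows the legend's definition (module genes in the pathway over total genes in the pathway).

```python
import math
from typing import Dict, List, Set, Tuple


def hypergeom_upper_tail(k: int, background: int, in_pathway: int, module_size: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: module_size genes drawn from
    `background` genes, of which `in_pathway` belong to the pathway."""
    total = math.comb(background, module_size)
    hits = sum(
        math.comb(in_pathway, i) * math.comb(background - in_pathway, module_size - i)
        for i in range(k, min(in_pathway, module_size) + 1)
    )
    return hits / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(
    module: Set[str], pathways: Dict[str, Set[str]], background: int
) -> List[Tuple[str, float, float, str]]:
    """Return (pathway, p value, FDR, ratio) tuples sorted by p value.

    Assumes every module and pathway gene is part of the background set.
    """
    rows = []
    for name, genes in pathways.items():
        k = len(module & genes)  # module genes that fall in this pathway
        p = hypergeom_upper_tail(k, background, len(genes), len(module))
        rows.append((name, p, f"{k}/{len(genes)}"))
    fdr = benjamini_hochberg([p for _, p, _ in rows])
    results = [(name, p, q, ratio) for (name, p, ratio), q in zip(rows, fdr)]
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Hypothetical toy inputs: a 3-gene module and a 10-gene background.
    module = {"G1", "G2", "G3"}
    pathways = {"PATHWAY_X": {"G1", "G2", "G3"}, "PATHWAY_Y": {"G7", "G8", "G9"}}
    for row in enrichment(module, pathways, background=10):
        print(row)
```

Sorting by p value matches the legend's note that items are listed in order of significance.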

Transparency document. Supporting information
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2017.02.040.