Meta-analysis of transcriptional regulatory networks for lipid metabolism in neural cells from schizophrenia patients based on an open-source intelligence approach

There have been a number of reports about the transcriptional regulatory networks in schizophrenia. However, most of these studies were based on a specific transcription factor or a single dataset, an approach that is inadequate to understand the diverse etiology and underlying common characteristics of schizophrenia. Here we reconstructed and compared the transcriptional regulatory network for lipid metabolism enzymes using 15 public transcriptome datasets of neural cells from schizophrenia patients. Since many of the well-known schizophrenia-related SNPs are in enhancers, we reconstructed a network including enhancer-dependent regulation and found that 53.3 % of the total number of edges (7,577 pairs) involved regulation via enhancers. By examining multiple datasets, we found common and unique transcriptional modes of regulation. Furthermore, enrichment analysis of SNPs that were connected with genes in the transcriptional regulatory networks by eQTL suggested an association with hematological cell counts and some other traits/diseases, whose relationship to schizophrenia was either not or insufficiently reported in previous studies. Based on these results, we suggest that in future studies on schizophrenia, information on genotype, comorbidities and hematological cell counts should be included, along with the transcriptome, for a more detailed genetic stratification and mechanistic exploration of schizophrenia.


Introduction
Schizophrenia is known to have a high concordance rate between monozygotic twins, with an estimated heritability of 80 %, indicating that genetic factors may play a major role in its onset (Imamura et al., 2020;Sullivan et al., 2003). Recently, a large-scale genome-wide association study (GWAS) (n = 36989 schizophrenia patients, n = 113075 controls) conducted by the Schizophrenia Working Group of the Psychiatric Genomics Consortium reported evidence for 108 single nucleotide polymorphisms (SNPs) associated with the onset of schizophrenia (Ripke et al., 2014). These SNPs are found in various locations in the genome and have been linked to a variety of genes, suggesting that schizophrenia is a multifactorial disease.
Schizophrenia has clinical features such as neurological dysfunction Roberts, 2021), inflammatory reactions (Müller, 2018;Upthegrove and Khandaker, 2020) and abnormal lipid metabolism (Cao et al., 2018;Ghosh et al., 2017;Liu et al., 2021;Tessier et al., 2016), and it has been hypothesized that mitochondrial dysfunction and the kynurenic pathway, which are responsible for tryptophan metabolism, are responsible for these traits.
Lipid metabolism exhibits various characteristics among individuals with schizophrenia. Examples of such abnormalities include structural alterations of lipids in the brain and skin and changes in blood plasma and serum lipids (Shimamoto-Mitsuyama et al., 2021). For instance, multiple cases of derangements of membrane phospholipids and polyunsaturated fatty acids have been reported in the brains of schizophrenia patients (Ghosh et al., 2017;Tessier et al., 2016;Wood et al., 2014;Yao et al., 2000). Other studies have shown that the composition of skin ceramides can also be affected (Smesny et al., 2013). Furthermore, significant differences in phospholipids, such as phosphatidylethanolamine and phosphatidylcholine, and amino acids related to lipid metabolism, such as tryptophan and carnitine, have been found in plasma and serum of schizophrenia patients, making them biomarker candidates (Cao et al., , 2018Liu et al., 2021;Wang et al., 2019).
Since schizophrenia is considered a multifactorial disease with such a wide variety of symptoms, it is important to elucidate how the genes linked to SNPs are functionally linked together as a system. TCF4, a transcription factor (TF), has been suggested to function as a master regulator of gene network disruption during the early stages of neurodevelopment in schizophrenia (Torshizi et al., 2019). The TCF4 gene network in neural progenitor cells was found to be enriched in pathways involved in neural activity and in schizophrenia-related risk genes. In a different study that integrated data from ChIP-seq to identify the binding sites of TCF4 in neuro-derived SH-SY5Y cells and findings on its association with schizophrenia, the TCF4 gene set was found to be enriched in Gene Ontology categories such as axon and neurogenesis, genes preferentially expressed in pyramidal cells of the brain and genes whose expression is reduced in brain tissue of schizophrenia patients . Also, direct regulation by TCF4 was suggested for 13 genes associated with the 108 SNPs. TCF4 was recognized as an important regulator of neural genes and the relationship between TCF4 and schizophrenia is being widely discussed.
Although informative, previous studies have mainly focused on promoter-mediated regulation and did not consider enhancer regulation. It has become clear in recent years that enhancer regulation plays an important role in psychiatric and neurological disorders. A representative previous study showed that the 108 SNPs that were proposed to be associated with schizophrenia were significantly enriched in enhancers activated in the brain and immune system (Ripke et al., 2014). In enhancers, RNAs known as enhancer RNAs (eRNAs) are transcribed bidirectionally and it has been reported that knocking down these eRNAs decreases the expression of target mRNAs (Kim et al., 2010;Murakawa et al., 2016). It is also speculated that eRNAs promote liquid-liquid phase separation of transcriptional complexes, which results in the up-regulation of gene transcription (Henninger et al., 2021). Other studies by Arner et al. (Arner et al., 2015) and Hirabayashi et al. (Hirabayashi et al., 2019) have also reported the relationship between eRNA and mRNA expression. Transcriptome analysis of 537 postmortem brain samples (schizophrenia patients n = 258, controls n = 279) identified 118 eRNAs with different transcription patterns in the schizophrenia group and 927 genetic variants affecting the expression of enhancers (Hauberg et al., 2019). Therefore, it is essential to focus on enhancer-mediated transcriptional regulation to elucidate the underlying mechanisms of gene dysregulation relevant to the pathogenesis of schizophrenia.
Furthermore, the fact that only certain TFs such as TCF4 were analyzed in these studies raises the concern that important factors may be ignored. This limitation can be prevented by considering transcriptional regulation via enhancers as a network to avoid missing important factors. In addition, most large-scale studies about schizophrenia discuss the system based on only one set of data and do not sufficiently compare the system in terms of similarities or differences among schizophrenia populations that display various characteristics such as neurological dysfunction, inflammatory response and lipid metabolism disorders (Myint, 2012;Suzuki et al., 2019).
In this study, we investigated all TFs, both promoters and enhancers, using existing information from GeneHancer and GeneALaCart (Fishilevich et al., 2017;Safran et al., 2010;Stelzer et al., 2016) and performed a meta-analysis on multiple public datasets. For the target genes, we focused on lipid metabolism, which seems to play a common and important role in a wide range of schizophrenia phenotypes, such as psychiatric and neurological disorders and inflammatory responses (Tufano and Pinna, 2020;Varga et al., 2011;Zolezzi et al., 2017). We did not examine networks of specific types of schizophrenia data but rather found networks shared between multiple samples (Barnes et al., 2011;Brennand et al., 2015Brennand et al., , 2011Hoffman et al., 2017;Lin et al., 2016;Narla et al., 2017;Ni et al., 2020a, b;Park et al., 2020;Pietersen et al., 2014;Shao et al., 2019;Topol et al., 2015Topol et al., , 2016. To do so, we followed the open-source intelligence (OSINT) approach (Benes, 2013;Carr, 2016;Kraus, 2008) and used a variety of publicly available information that could potentially provide clues to the missing links between transcriptional regulatory networks and phenotypes of schizophrenia. What we mean here by OSINT is to collect publicly available large-scale data, interpret the data using public databases and eventually characterize molecular mechanisms that underlie biological functions and diseases (Kraus, 2008). In this study, we used as many public datasets as possible to reconstruct differentially regulated networks (DRNs), which are transcriptional regulatory networks consisting of differentially expressed genes (DEGs), taking both enhancers and promoters into account. After the reconstruction of the DRNs, the genes of the DRNs were connected with SNPs by using GTEx eQTL data to address what types of genetic traits are associated with the schizophrenia DRNs. Then, we conducted a meta-analysis of the network structure to compare the underlying transcriptional networks in various schizophrenia populations.
As a result, we found common DRNs and differences among multiple datasets. The SNP enrichment analysis, which included enhancers, suggested an association of the schizophrenia DRNs with terms in hematological cell counts, lung disease, eyes, etc. Thus, we suggest that when collecting samples from schizophrenia patients in the future, information about genotype, comorbidities, and hematological cell counts should be included, along with the transcriptome, for genetic stratification and mechanistic exploration of the disease.

Step (i): downloading the schizophrenia transcriptome data
The data processing pipeline for this study is summarized in Fig. 1a, b. We obtained all public datasets used in this study from NCBI GEO (Gene Expression Omnibus). The search terms were set to Query: "schizophrenia"[MeSH Terms] OR "Schizophrenia"[All Fields]) AND "Homo sapiens"[porgn] AND "gse" [Filter], resulting in 213 hits. We manually excluded datasets whose study type was non-coding or DNA methylation and 65 datasets remained. From these, we manually selected 27 datasets of case/control studies using neuronal cells derived from iPS cells of patients or postmortem brains that appeared to fit the purpose of this study. Of these 27 datasets, we excluded datasets that could not be processed, contained non-coding RNA (ncRNA), had redundant contents, lacked measurement of expression levels, had no comparisons between schizophrenia patients and healthy subjects, or included diseases other than schizophrenia.

Step (ii): identification of DEGs
To identify DEGs from the obtained public datasets, we first checked whether DEGs were already defined in each dataset. For the public datasets that had already set the criteria for DEGs in the related publications, we followed the original criteria in the corresponding publications. As a result, a list of DEGs was obtained from datasets 1-3, [6][7][8]12,13,and 15. For datasets 4,5,[9][10][11]and 14, where no DEG lists were available or the number of DEGs was very small, we used DEGseq (Wang All publicly available transcriptome data were obtained from NCBI GEO [step (i)]. The DEGs in each dataset were identified with the criteria as described in step (ii). The list of lipid metabolism genes was obtained from KEGG, and their TFs (layer 1 TFs) were estimated using GeneHancer via GeneALaCart query [step (iii) and (iv)]. Layer 2 TFs, which are TFs of the layer 1 TF genes were estimated in the same way. The 'List of GHIDs and TFBSs corresponding to each DEG' of layer 2, shown at the bottom of the figure, follows to the top left of (b). Pathway enrichment analysis was performed for all DEGs [step (v)].  et al., 2010) to determine the DEGs (see Data S1 for details). A multiple testing correction was conducted to adjust p-values for false discovery rate (FDR) using the method proposed by Storey and Tibshirani (FDR < 0.05, fold change > 3/2 or < 2/3) (Storey and Tibshirani, 2003).

Step (iii): conversion to KEGG IDs/identification of lipid metabolism genes
Pathway map information from the KEGG database was used to extract the lipid metabolism genes among the identified DEGs. The names and IDs of all lipid metabolism pathways were obtained from KEGG REST, along with the KEGG IDs of the genes in the pathways to create a lipid enzyme list. We also compiled a table of all gene symbols and their corresponding KEGG IDs from KEGG REST.

Step (iv): estimating TFs of DEGs related to lipid metabolism
GeneHancer, a database of human enhancers and their inferred target genes, provided in the GeneCards framework was used to identify TFs that potentially regulate each lipid metabolism gene (Fishilevich et al., 2017). The TFs were assessed in the following steps. First, arbitrary genes (lipid metabolism genes identified above) were entered in the Input field of GeneALaCart, a batch querying system for GeneCards, with 'GeneHancer' selected as the Requested Data per Gene. Next, the GeneHancer ID (GHID) and Transcription Factor Binding Site (TFBS) corresponding to each gene were extracted from the acquired data. Then, the TFs corresponding to these TFBSs, which are also DEGs, were extracted. After deleting the duplicates of the extracted TFBSs, the gene identifiers for the TFs were submitted to GeneALaCart Query and the same process was repeated again. This procedure was repeated multiple times for each public dataset. The purpose of this step was to determine how many layers of the schizophrenia transcriptional regulatory network should be examined. We expected that if we traced the network from the lipid metabolism genes to chains of upstream TFs involved in the regulation, the appearance of new TF species would saturate at some point. As a result of this procedure, we found that the number of TFBSs was saturated at the second iteration in all datasets, and thus we concluded that we could identify not only TFs that regulate lipid metabolism genes but also genes involved in the regulation of these TFs by repeating the TF estimation procedure twice. Thereafter, the TF estimation procedure was repeated twice for the remaining data, and each TF estimation result was referred to as layer 1 and layer 2, i.e., the layer 1 TFs directly regulate lipid metabolism gene expression, and the layer 2 TFs directly regulate the layer 1 genes. In the end, using the tissue information provided by GeneHancer for each GeneHancer ID, we filtered the TFBSs in layer 1 and layer 2 to include only the TFBSs associated with the brain and nerve tissues (see Data S1 for the list of tissues) for further analysis [step (vi)-(viii)].

Step (v): KEGG pathway enrichment analysis
We conducted an analysis to identify the KEGG pathways that are significantly enriched in the respective public datasets. First, we obtained non-disease pathway information (KEGG pathway IDs other than the 05000 s, 04900 s, 01500s, and 07000 s) of Homo sapiens from KEGG REST. Next, we determined the total number of DEGs in the pathways of interest. Subsequently, the probability of the number of genes with variable expression levels appearing by chance in the pathway of interest (p-value) was determined using Fisher's exact test (p < 0.05). A multiple testing correction was performed to adjust the p-values for FDR using the Benjamini-Hochberg method (FDR < 0.1).

Step (vi): comparing the differentially regulated networks
Frequencies of the TF-target gene pairs across all datasets were visualized in a heatmap and then subjected to hierarchical clustering based on Euclidean distance and Ward's minimum variance method ("ward.D2" in R language). Classification of each TF was obtained from PANTHER Protein Class (http://www.pantherdb.org/). All genes in each cluster were subjected to enrichment analysis in terms of KEGG pathways or TF classes.

Step (vii): identification and enrichment analysis of eQTL SNPs
We next identified eQTL SNPs associated with the DRNs for lipid metabolism enzymes reconstructed in the previous section. The subsequent SNP identification and enrichment analyses were performed on the TFs identified in layers 1 and 2 but not on the target genes because it was expected that inclusion of the target genes might provide nothing other than an overrepresentation of lipid metabolism-related traits. The eQTLs were identified using the GTEx v8 cis-eQTL in GTEx Portal (https://gtexportal.org/home/). The SNPs were identified in 13 tissues beginning with 'Brain' (Brain Amygdala, Brain Anterior cingulate cortex BA24, Brain Caudate basal ganglia, Brain Cerebellar Hemisphere, Brain Cerebellum, Brain Cortex, Brain Frontal Cortex BA9, Brain Hippocampus, Brain Hypothalamus, Brain Nucleus accumbens basal ganglia, Brain Putamen basal ganglia, Brain Spinal cord cervical c-1, and Brain Substantia nigra). We used statistically significant eQTLs detected in the 13 tissues. Enrichment analysis of the trait/disease annotations for eQTL SNPs associated with the DRN was performed using XGR (Xploring Genomic Relations)  for each dataset. Conditions of XGR were set as follows: [ontology : Experimental Factor Ontology; LD r 2 > = 0.8; the size of members of each term in consideration : min 3, max 2000; the minimum number of overlaps : 2; test : hypergeometric; the tail used to calculate p-values : one tail; adjust p-values : BH (Benjamini-Hochberg)]. The XGR uses EFO (Experimental Factor Ontology) developed by EMBL-EBI (Malone et al., 2010) as a controlled vocabulary for trait/disease names to handle notation variability. Subsequent functional annotations obtained from enrichment analysis of trait and disease names are based on EFO terms of FDR < 0.05.

Step (viii): bibliographic analysis
We systematically searched the PubMed database to determine if overrepresented traits/diseases were studied in association with schizophrenia. EFO terms that appeared in the public datasets were replaced with MeSH terms by searching the EMBL-EBI Ontology Lookup Service and the NCBI MeSH database. When the original EFO term did not appear, it was paraphrased to an analogical term and then replaced with a MeSH term. The MeSH terms were searched in PubMed on 10th

Source code availability
Source codes for the data processing pipeline are available online (https://github.com/YugiLab/Publications/tree/main/Okamoto2021).

Statistics of the schizophrenia transcriptome data
We selected 15 publicly available transcriptome datasets that match the subjects of this study from the downloaded data. See Table 1 for the number designations for each dataset included in this study (e.g. 'dataset 1'), GEO accession numbers, references, and phenotypes of the selected datasets. The other downloaded datasets are presented in Table 2, along with reasons for their exclusion from this study. Genes without missing data points were considered in subsequent data analysis. First, we identified 9,062 DEGs (counted by gene symbols) that exhibited significant changes in schizophrenia patients (see Data S1 for details) out of 491,917 genes (counted by gene symbols) without missing data points in total (Table 3). Lipid metabolism genes were then extracted from these DEGs (see Table 3 for KEGG IDs of these enzymes). Lists of all identified DEGs are available in Data S1.

Estimating TFs of DEGs related to lipid metabolism
The numbers of target genes (i.e. lipid metabolism genes) in the DRNs differed across the datasets. The fraction of the number of lipid metabolism genes (= target genes) in all the DEGs was 1.2 % on average (Fig. 2a). Reconstruction of the DRNs provided TFs and their modes of regulation (Promoter, Enhancer or Promoter/Enhancer) of the target genes. For all the datasets, any form of transcriptional regulation via enhancers was in the majority (Fig. 2b, c). Note that the number of TF was zero in datasets 8, 13, and 15 since no lipid metabolism genes were identified in their DEGs.

KEGG pathway enrichment analysis
Functional enrichment analysis of non-disease pathways in each public dataset detected enrichments in pathways associated with the nervous system, cell-cell adhesion, protein and amino acid, immunity and inflammation, lipid metabolism, and hematology (Fig. S1).
Neurological pathways were significantly enriched in datasets 2, 3, 5, 7, 9-11, and 14 at p < 0.05 and in datasets 2, 3, 11, and 14 at FDR < 0.1. In these data sets, at least one of the following pathways was found  Out of the 27 datasets obtained from NCBI GEO, the following 12 datasets were excluded from the analysis. The reasons for these exclusions were, inability to process the files, inclusion of ncRNAs, overlap of the contents, absence of expression assays, lack of comparison between schizophrenia patients and healthy subjects, and inclusion of diseases other than schizophrenia in the study. The corresponding accession No., reference and reason for exclusion of each dataset are shown in the table.
Lipid metabolism-related pathways appeared significantly more frequently in datasets 2 and 12 at p < 0.05, but not in any datasets at FDR < 0.1. In dataset 2, 'Steroid biosynthesis', and in dataset 12, 'Steroid hormone biosynthesis' were found to be enriched. For Immune responses and inflammation-related pathways, 'Th1 and Th2 cell differentiation', 'antigen processing and presentation', 'TGF-β signaling pathway' and 'JAK-STAT signaling pathway' were found to be enriched in datasets 5-7, 11, 12 at p < 0.05 and datasets 7 and 11 at FDR < 0.1. Other blood-related pathways such as, 'Porphyrin metabolism', 'Hematopoietic cell lineage', 'vascular smooth muscle contraction', and 'Complement and coagulation cascades' also stood out. Pathways related to protein and amino acid metabolism were also found enriched at a significantly higher frequency, including 'Tyrosine metabolism', 'Protein digestion and absorption', 'Tryptophan metabolism', 'Alanine, aspartate and glutamate metabolism', 'Arginine and proline metabolism', and 'D-Glutamine and D-glutamate metabolism'.

Comparing the differentially regulated networks
In all datasets, the number of TF species in the DRNs were saturated at layer 2, which is the layer of TFs directly regulating layer 1 genes, i.e. the TFs that directly regulate the lipid metabolism enzymes (Fig. 3a). Based on this result, we reconstructed the DRNs up to layer 2 in the subsequent analysis because the inclusion of TFs that belong to layers over layer 3 elaborates the DRNs just by repeating TF autoregulatory loops. Connectivity of the DRNs corresponding to each dataset is available in Fig. S2 and Data S2.
There was a total of 14,215 edges representing physical interactions between TFs and cis-elements of target genes, of which 7,577 were interactions via enhancers and 6,638 were via promoters. Of the total number of edges, 53.3 % were found to be interactions via enhancers.
Hierarchical clustering based on frequencies of the TF-target gene pairs across all datasets provided six clusters for the TFs (a-f, Fig. 3b) and seven clusters for the target genes (1-7, Fig. 3b). Each cluster had one or more unique datasets that characterized the cluster itself by providing TFs and target genes that were not observed in other clusters. The intersection of these clusters revealed DRNs that commonly appeared in multiple datasets. The DRNs of blocks a1, a2, b1, and b2 in Fig. 3b consisted mainly of dataset 2. Block a1 was overlapping with dataset 11. TFs of block b2 were seen partly in datasets 1, 6, and 12. Target genes of block b2 were also seen partly in dataset 7. These findings imply the existence of a transcriptional regulatory module composed of common TFs and target genes that are shared in datasets 1, 2, 6, 7, 11, and 12. Block a3, c1, and c3 were mainly composed of dataset 11. Block c3 was partly overlapping with those of dataset 10. Block d4 was mainly composed of dataset 1. TFs of block d4 were partly seen in dataset 6. Block e6 was mainly composed of dataset 12. Block f5 includes TFs and target genes from dataset 6. Block f7 includes TFs and target genes from datasets 3, 7, 10, and 14. Dataset 2 included more connections between TFs than the other datasets, such as dataset 11, and occupied a larger area in the heat map.
Similarly, there were no clusters with significantly overrepresented target gene pathways, however, some pathways had raw p-values of less than 0.05 in the enrichment analysis; four pathways in Cluster 1, 10 pathways in Cluster 2, four cancer-related pathways in Cluster 3, one pathway in Cluster 4, two pathways in Cluster 5, two pathways in Cluster 6, and three pathways in Cluster 7. These included pathways related to spinocerebellar ataxia, cancer, infectious disease, amino acid, and central carbon metabolism (Data S3). We identified 9,062 DEGs that exhibited significant changes in schizophrenia patients out of 491,917 genes without any missing data (see Data S1 for the statistical criteria in determining the DEGs). A total of 157 genes for lipid metabolism enzymes were extracted from the DEGs and converted into KEGG IDs. Information on the number of genes, missing values, etc. of each dataset is shown in the table. Since multiple Ensembl IDs can be assigned to a single gene symbol, the number of DEGs may change when the Ensembl ID is converted to the gene symbol. (datasets 4-6, 9-11).

(b) Number of TFBSs in layer 1.
The numbers and mode of transcriptional regulation of the TFBSs in the DRNs in layer 1. The numbers on the vertical axis correspond to the numbering of the public datasets shown in Table 1. The TFBSs are color-coded according to the presumed TFBS Type (mode of regulation): Green: Promoter/Enhancer, Blue: Enhancer, Red: Promoter. The criterion for defining TFBS types follows the 'Ghtype' field of GeneHancer. As shown in (a), datasets 8, 13, and 15 did not contain any lipid metabolism genes, so the TF estimation was not performed and the value was set to zero. For all datasets, enhancer-mediated transcriptional regulation occupied the majority of the data.

(c) Number of TFBSs in layer 2.
The numbers and form of transcriptional regulation of the TFBSs in the DRNs in layer 2. The numbers on the vertical axis correspond to the numbering of the public datasets shown in Table 1. The TFBSs are color-coded according to the presumed TFBS Type (mode of regulation): Green: Promoter/Enhancer, Blue: Enhancer, Red: Promoter. This criterion for defining TFBS types follows the 'Ghtype' field of GeneHancer. As shown in (a), datasets 8, 13, and 15 did not contain any lipid metabolism genes, so the TF estimation was not performed and the value was set to zero. As in layer 1, enhancer-mediated forms of transcriptional regulation dominate in all datasets, while at the same time the proportion of Promoter-derived TFBSs is increased compared to layer 1.
(caption on next page) L. Okamoto et al. 10), and lipid-related EFO terms (dataset 11) such as 'triglyceride measurement', 'lipoprotein measurement', and 'lipid or lipoprotein measurement'. See Data S5 for lists of the DEGs associated with the EFO terms.

Bibliographic analysis
The PubMed searches with schizophrenia and MeSH terms, which correspond to the overrepresented EFO terms, provided small numbers (<100) of original research articles related to blood cell counts such as eosinophil and basophil counts, whereas these terms were frequently associated with the SNPs connected with the DRNs (Fig. 4c). Also, diseases such as coronary artery disease, systemic lupus erythematosus (SLE) and Sjogren syndrome appeared in most datasets with fewer hits by MeSH term in PubMed. Terms related to the clinical picture of schizophrenia were associated with most of the datasets, and more than hundreds of papers were found: smoking-related terms, 1878, 'druginduced agranulocytosis', 341, and 'lung disease', 346. Also, terms related to brain disorders were associated with most of the datasets, and hundreds of papers were found: 'autism spectrum disorder', 1517 and 'multiple sclerosis', 313. The ocular terms appeared in most of the datasets but the numbers of papers related to schizophrenia were variable, 'eye disease', 433, 'eye measurement', 27 and 'intraocular pressure measurement', five.

Discussion
We conducted a meta-analysis of 15 publicly available transcriptome datasets to characterize diverse etiologies and underlying common mechanisms of schizophrenia that cannot be revealed by studies based on a single dataset. Reconstruction of the DRNs for lipid metabolism enzymes revealed that promoters and enhancers mediate distinct TFtarget gene pairs and that enhancer-mediated regulation covers 7,577 edges of the DRNs out of 14,215 (53.3 %). Comparison of the DRNs provided networks that are commonly utilized among the datasets, as well as differences among them. Clustering the frequencies of the TFtarget pairs elucidated biases in the common DRNs, which might be caused by sample preparation, measurement methods and environment.

KEGG pathway enrichment analysis
Enrichment analysis using the KEGG pathway map showed that pathways related to the nervous system, cell-cell adhesion, protein and amino-acid, immunology and inflammation, lipid metabolism, and hematology were significantly overrepresented. Out of the 15 public datasets analyzed in this study, two of them showed enrichment of lipid metabolism-related pathways. This implies that abnormal lipid metabolism is not a common feature, at least at the level of gene expression in brain tissue and iPS-derived neurons.
The 'Gap junction', which showed enrichment in datasets 2, 5, 9, 10, and 11, is a pathway relevant to the blood-brain barrier (BBB), which has recently been suggested to be involved with psychiatric disorders. Studies showing a relationship between schizophrenia and the BBB include research using the 22q11.2 deletion mouse model (Crockett et al., 2021). In this model, the BBB was disrupted and there was hyper inflammation and microglia/astrocyte activation. The 'ECM-receptor interaction' pathway, which is enriched at a significantly high frequency in datasets 2-5, 7, 9-11, and 14, is also expected to affect the BBB. ECM proteins and receptors are important molecules that support cell adaptation to environmental changes and regulate the growth and motility of cells (Lukashev and Werb, 1998). The composition of the ECM alters during disruption of the BBB and has been shown to directly influence the progression of neurological diseases (Baeten and Akassoglou, 2011). Furthermore, 'Adherens junction' has been suggested to be a factor related to both neurological disorders and intellectual disability. According to a previous study, human iPSC-derived neural progenitor cells with a 15q11.2 microdeletion, which is a risk factor for schizophrenia and autism, show deficiencies in the adherens junction and apical polarity (Yoon et al., 2014). Since inhibiting the function of cadherin and catenin in the developing nervous system causes disruption of the adherens junction and impairs the structural formation of the cerebral cortex (Kadowaki et al., 2007), it is possible that abnormal cell adhesion can lead to the decline of brain functions in schizophrenia patients.
The metabolic pathway map for tryptophan was observed among the amino acid metabolic pathways for which enrichment was observed in the public datasets. Tryptophan is a precursor of serotonin that is degraded by the kynurenine metabolic pathway. Significant increases in kynurenine aminotransferase mRNA and kynurenine pathway metabolites have been reported in the brain and plasma of schizophrenic disease groups, particularly those with higher cytokine levels (Kindler et al., 2020). Also, metabolic pathway maps related to candidate biomarkers of schizophrenia, such as D-glutamate, were observed . In particular, the degradation of D-type amino acids has been suggested to contribute to the degradation of NMDA receptor function in schizophrenia (Verrall et al., 2010). Research on D-amino acid oxidase (DAAO) has shown that the DAAO activity is doubled in patients with schizophrenia compared to controls (Madeira et al., 2008). Increased D-amino acid degradation due to DAAO activity leads to reduced D-serine levels, which serves as an intrinsic co-agonist of NMDA receptors in the nervous system (Madeira et al., 2008). Based on this evidence, we can speculate that the abnormalities in the nervous system associated with schizophrenia are reflected as alterations in amino acid metabolism. Enrichment of the 'Protein digestion and absorption' pathway was also found in nine datasets out of 15 (datasets 1-5, 7, 9-11). There is a report that protein fermentation in the gut correlates with the severity of psychiatric symptoms in schizophrenia . There have been other studies that found changes in the gut microbiota in schizophrenia patients, which could indicate that the state of proteins in the intestinal tract may affect the gut microbiota . Accordingly, the enrichment of the 'Protein digestion and absorption' pathway observed in many datasets suggests the involvement of brain-gut correlation in schizophrenia. Transition in the number of TFBSs when TF estimation was repeated multiple times. The horizontal axis shows the number of the TF layers (e.g., layer 1, layer 2, …), and the vertical axis shows the number of TFBSs that were estimated. The dotted line drawn in the y-axis direction indicates the point where the numbers of TFBSs are saturated. The number of TFBSs and TF species in the DRNs were saturated in layer 2 in all datasets. Considering this result, we decided to examine the DRNs up to layer 2 in the subsequent analysis. (b) DRNs reconstructed from all public datasets. Heatmap representing the presence or absence of binding of each TF and target gene combination. Hierarchical clustering based on the frequency of TF-target gene pairs in all datasets provided seven clusters for target genes and six clusters for TFs. The areas delimited by black lines represent each cluster. Red labels for the target genes indicate lipid metabolism genes. The points where the clusters were divided are indicated by the green lines. Six clusters (a-f) of TFs were formed and labeled on the x-axis in black from left to right. Seven clusters (1-7) of target genes were formed and numbered from the bottom right y-axis. (c) Classes of the TFs. Functional classes of the TF-target gene pairs in each cluster are presented in different colors according to the class to which the TF belongs. The symbol of each cluster (a-f) corresponds to the six clusters (a-f) for the TFs in (b). C2H2 zinc finger transcription factors were dominant in most of the clusters. The number of DEGs identified from public datasets and the number of SNPs associated with the DEGs are shown. The numbers on the vertical axis correspond to the public dataset numbering shown in Table 1. Cyan: DEG and red: SNP. There were on average approximately three times more SNPs than DEGs, although the numbers differed greatly among each dataset. (b) Results of the SNP Enrichment analysis. The results of the SNP enrichment analysis identified from DEGs that belong to the DRNs are shown in the heat map. The vertical axis shows the representative upper level EFO terms enriched as the result of the analysis, and the numbers on the horizontal axis correspond to the numbering of the public datasets shown in Table 1. Enrichment of EFO terms related to psychiatric and neurological diseases was observed for datasets 1, 4-6, and 10. In addition, terms related to immunity, respiratory, eye, and skin were overrepresented in multiple datasets. (c) Overrepresented EFO terms and their appearance in the literature. The number of times the EFO terms appeared in the public datasets (horizontal axis) and the number of original papers (vertical axis) that were found in PubMed when the EFO terms or their related terms were used for MeSH terms and searched with schizophrenia. Blue dots indicate EFO terms that appeared in three or more databases and had more than 100 original papers. Blue dots include EFO terms related to brain disorders such as 'autism spectrum disorder' and 'multiple sclerosis'. In addition, inflammation-related terms such as 'inflammatory biomarker measurement' and 'abnormality of the immune system' were also included. Red dots indicate EFO terms that appeared in three or more datasets with fewer than 100 original papers. The red dots included terms related to blood cells, such as 'basophil count' and 'eosinophil count', and diseases such as 'coronary artery disease', 'systemic lupus erythematosus' and 'Sjogren syndrome'.
In dataset 12, we found enrichment of the 'Th1 and Th2 cell differentiation pathway'. The levels of proinflammatory markers, such as cytokines, are found to increase in both blood and the brain of patients with schizophrenia, while some individuals display mild systemic inflammation (Müller, 2018;Upthegrove and Khandaker, 2020). Moreover, several studies on the Th1/Th2 cytokine balance in psychiatric disorders have been conducted over the years. Based on this, it has been hypothesized that schizophrenia may be associated with an imbalance in these cytokines, resulting in a relative dominance of Th2 immunity in schizophrenia (Avgustin et al., 2005;Schwarz et al., 2001aSchwarz et al., , 2001bTeixeira et al., 2008). Also, the TGF-β signaling pathway, which inhibits Th1 differentiation and activity (Gorelik et al., 2002), showed enrichment in datasets 5, 7, and 11. TGF-β is known to be increased in the blood and in blood cells of schizophrenia patients at both protein and mRNA levels, respectively, which may also be linked to disruption of the Th1/Th2 cytokine balance (Amoli et al., 2019;Borovcanin et al., 2012;Kim et al., 2004). Note, however that there have been results that do not support this hypothesis, and no consistent conclusion has yet been reached (Potvin et al., 2008).
The Hippo signaling pathway was also found enriched in multiple datasets. In recent years, this signaling pathway has been shown to play an important role in development of the nervous system, such as proliferation of neural stem cells, differentiation of NSCs into neurons, synapse formation, and brain development, as well as controlling immunity (Cheng et al., 2020;Sahu and Mondal, 2021). It has been found that the Hippo pathway is involved in neurodegenerative disorders and it has been suggested that it may also contribute to other neurodegenerative diseases and neuropsychiatric disorders (Cheng et al., 2020;Sahu and Mondal, 2021;Stepan et al., 2018). A study reported that genes in the Hippo signaling pathway were included amongst the genes that appeared commonly linked to schizophrenia, autism, bipolar disorder, and obsessive-compulsive disorder (O'Connell et al., 2018).
To summarize, pathway analysis based only on DEGs of schizophrenia patients, without including transcription factors, showed significant enrichment of pathways (e.g. nervous system, cell-cell adhesion, protein and amino-acid, etc.) that support the characteristics of schizophrenia. In addition, several pathways were found to be associated with the BBB. This result provides a rationale to explore the involvement of the BBB in understanding the pathogenesis of schizophrenia.

Comparing the differentially regulated networks
Counting the frequency of TF-target gene pairs across all datasets revealed that there are DRNs that commonly appear in multiple datasets and, at the same time, there are DRNs specifically preferred by particular datasets (Fig. 3b). We also found that more than half of the edges in the DRNs (53.3%, 7,577 out of 14,215) involve regulation via enhancers (Fig. S2, Data S2). The prevalence of transcriptional regulation via enhancers implies that the reconstructed networks in previous studies, which were mainly based on the regulation via promoters, may possibly have missed more than half the transcriptional regulatory networks.
Our purely data-driven analysis suggested the relevance of some TFs to schizophrenia. Among the DRNs, one of the most frequently appearing TF-target gene pairs was the autoregulation of KLF6. The KLF6-KLF6 pair was significantly overrepresented in four datasets out of 15, an observation supported by a recent GWAS that reported association of schizophrenia and the rs17731 SNP that is in an exon of KLF6 (Peyrot and Price, 2021). KLF6 belongs to the C2H2 zinc finger transcription factor class that was enriched in Clusters 3 and 8. Other autoregulated TFs frequently found in the reconstructed transcriptional regulatory networks, such as PBX2, ZIC2, and TCF7L2, were also reported to have risk alleles associated with schizophrenia (Hatayama et al., 2011;Liu et al., 2017;Saito et al., 2016), whereas LEF1, whose autoregulation was the most frequently appearing TF-target gene pair in the reconstructed transcriptional regulatory networks, was reported to have a genetic association with leukemia (Berndt et al., 2013;Law et al., 2017) and autoimmune diseases (Okada et al., 2012;Sakaue et al., 2021;Yin et al., 2021). Autoregulation by these TFs, all of which proceeds through both promoters and enhancers, implies some relevance of positive feedback loops as one of the systemic characteristics of schizophrenia DRNs.
Clustering of the TF-target gene pairs based on their frequencies of occurrence revealed biases in frequently utilized DRNs among the datasets (Fig. 3c). The bias could be caused by the variation of numbers of DEGs among the datasets that are possibly rooted in differences in sample preparation protocols, measurement methods, environmental factors around the individuals, etc. For example, the DRNs based on datasets from similar sample origins (e.g. datasets 3, 10, and 14 in block f7) tended to be gathered in the same blocks or overlapped with each other. To explore the origin of such dataset-specific biases in future studies, measuring the genotypes of patients together with their transcriptomes would be helpful to discern whether the biases are genetic or environmental.

Identification and enrichment analysis of eQTL SNPs/bibliographic analysis
In datasets 4, 5, 10-12, the term 'hematological measurement' was significantly enriched, including 'erythrocyte measurement' and its subterms, 'eosinophil count', 'basophil count', and 'agranulocytosis'. There is a report of a patient with schizophrenia who developed acute myelogenous leukemia (AML) and was treated by bone marrow transplantation, resulting in remission of the schizophrenia as well (Miyaoka et al., 2017). Furthermore, several GWAS SNPs that are significantly associated with leukemia and autoimmune diseases are mapped on LEF1 (Berndt et al., 2013;Law et al., 2017;Okada et al., 2012;Sakaue et al., 2021;Yin et al., 2021), the most frequently appearing TF in the DRNs. In addition, patients with AML, chronic myelogenous leukemia, and myelodysplastic syndromes, a precursor of these diseases, also present with eosinophilia and basophilia (Dobrowolski et al., 2019;Gotlib, 2017;Karasuyama et al., 2018;Matsushima et al., 2003). Fluctuations of blood cell counts have been observed in schizophrenia as well as in leukemia. It has been reported that total leukocytes, neutrophils, and monocytes were significantly increased in the blood of schizophrenia patients compared to controls (Jackson and Miller, 2020;Steiner et al., 2020a). However, no consistent pattern has yet been identified for the changes in eosinophils and basophils. Since eosinophils release Th1-type pro-inflammatory cytokines (IL-6, IFN-gamma, TNF-alpha, etc.), this may suggest a relationship to the inflammation observed in schizophrenia (Spencer et al., 2009). High levels of eosinophils are also detected in autoimmune diseases such as atopic dermatitis and asthma, and it is known that the heart, lungs, skin, and nervous system can be damaged by the increased eosinophils (Diny et al., 2017). Asthma due to increased eosinophils  and pneumonia (eosinophilic pneumonia) are examples of such diseases.
Based on this evidence, we can hypothesize that schizophrenia may be a form of autoimmune disease. As a matter of fact, several terms indicating autoimmune diseases appeared in the enrichment analysis of SNPs; i.e. 'systemic lupus erythematosus' (datasets 4, 5, 10, and 11) and 'ulcerative colitis' (datasets 2 and 4). It has been argued that exposure to autoimmune diseases raises the risk of developing schizophrenia (Benros et al., 2011). It has also been reported that patients with autoimmune encephalitis exhibit negative symptoms and cognitive dysfunction similar to what is seen in schizophrenia (Hao et al., 2017;Steiner et al., 2020b). These observations support the possibility that inflammation, such as that caused by autoimmune diseases, may be one of the causes of schizophrenia.
In datasets 1, 4, and 10, the terms 'lung disease' and 'smoking behaviour (status) measurement' were enriched, respectively. It is known that people with schizophrenia have a higher risk of lung cancer morbidity and mortality than those without the disease Pettersson et al., 2020). In a GWAS involving more than 220,000 individuals, three loci were found to be shared by schizophrenia and lung cancer (Zuber et al., 2018). The locus with the strongest association included the nicotinic acetylcholine receptor, which has been established as a multifaceted locus shared between lung cancer and smoking behavior. The other two remaining loci showed no genetic association with smoking. Moreover, studies have shown that impaired cardio-respiratory fitness is associated with cognitive dysfunction (Filik et al., 2006;Holmen et al., 2018) and, in fact, patients with schizophrenia have a higher prevalence of angina and respiratory symptoms, and often have impaired lung function. It is also believed that there are genetic factors that increase the risk of both becoming a smoker and developing schizophrenia (Hartz et al., 2018).
Interestingly, the enrichment of terms related to the eyes were conspicuous in the results, including 'eye disease' in datasets 4, 5, 10, and 11, and 'eye measurement' in datasets 4-6, 10, and 11. Studies have shown that there are multiple structural and physiological disorders in the eyes of individuals with schizophrenia. More specifically, patients can exhibit retinopathy, cataracts, decreased visual acuity, strabismus, increased retinal vein width, loss of ganglion cell axons, abnormal dopaminergic activity in photoreceptors, and decreased GABA-and glycine-related lateral inhibition (Jurišić et al., 2020;Silverstein and Rosen, 2015). These symptoms could be related to schizophrenia or they may be caused by comorbidities such as diabetes (e.g. diabetic retinopathy) or psychotropic medications. Other follow-up studies of ocular imaging (e.g. Optical Coherence Tomography and Optical Coherence Tomography Angiography) in patients with type 1 diabetes reported that retinal nerve and vascular structures were associated with a decline in cognitive function (Fickweiler et al., 2021).
Collectively, we found that terms related to immunity and inflammation tend to appear more frequently in the SNP enrichment analysis, including transcription factors, compared to the pathway enrichment analysis of DEGs obtained from the public datasets. Thus, the immune system appears to serve a major role in the pathogenesis of schizophrenia, as pointed out in previous studies. The results also suggested an association with blood cell counts, such as eosinophils and basophils, and the ocular system. However, there are very few publications that have investigated the relationship between schizophrenia and these traits (Fig. 4c). Considering their frequent occurrence in the SNP enrichment analysis, we suggest that further studies are needed on the relationship between schizophrenia and these traits in the future.

Extending the research strategy to other target genes and other omic layers
Although this meta-analysis examined only transcriptome and SNP data, it is possible to explore far more mechanistic details if other omics data are measured. Gene expression regulation by ncRNA is another possible future addition to this work. In addition to miRNAs, which were mentioned in the Introduction, ncRNAs that have been reported to be altered in schizophrenia are long noncoding RNA (lncRNA) and Piwiinteracting RNA (piRNA) (Bundo et al., 2014;Mishra and Kumar, 2021;Ni et al., 2020a, b). piRNA exerts epigenetic functions such as silencing retrotransposons in germ cells and post-transcriptional regulation that contribute to gene expression in nerve cells. The copy number of LINE-1, a piRNA, has been confirmed to increase in neurons differentiated from patient-derived iPS cells including those with the 22q11 deletion (Bundo et al., 2014).
ncRNA data were not measured in all the public datasets used in this study, so it was not possible to compare the DRNs that we reconstructed with the network that includes regulation by miRNAs (Guo et al., 2010). TargetScan (http://www.targetscan.org/vert_72/) facilitates predicting of miRNA target genes. For analysis of lncRNAs, NONCODE (htt p://www.noncode.org) and LNCipedia (https://lncipedia.org/) are available. piRBase provides manually curated piRNA data (http://www. regulatoryrna.org/database/piRNA/). In our next work, we plan to reconstruct gene expression regulatory networks in which ncRNA is also taken into consideration by using these resources based on the OSINT approach.
In the publicly available transcriptome datasets used in this study, only three (datasets 6, 8, and 14) out of 15 were published with the genotype of the patients. There were two more datasets (datasets 2 and 7) that showed family histories but did not show specific genotypes. As mentioned in the discussion corresponding to the comparison of the DRNs, if the genotype of the patient is known, differences among the DRNs could be interpreted for each genotype. In addition to the limited genotype information, it was not possible to access other data, such as descriptions of comorbidities of the patients. Based on the results of this study, we propose to collect information such as gene deletions, descriptions of comorbidities and blood cell counts in comprehensive studies of schizophrenia in the future.

Conclusion
We conducted a meta-analysis of 15 publicly available transcriptome datasets of neural cells from schizophrenia patients to reconstruct and compare transcriptional regulatory networks for lipid metabolism enzymes. Comparison of the DRNs revealed commonly utilized network modules among the datasets, as well as differences between them. Since promoters and enhancers may cover different networks, it is essential to include not only TF-target gene pairs via promoters but also those via enhancers. In fact, 53.3 % of the total 14,215 edge pairs were involved in enhancer-mediated regulation in our study. However, by clustering the TF-target gene pairs based on their frequency of occurrence, we also found that there was a bias in the frequent DRNs among the datasets. Possible reasons for this bias are differences in sample preparation protocols, measurement methods and environmental factors surrounding individuals among the datasets. Moreover, the enrichment analysis of the SNPs associated with the genes in the DRNs led to the suggestion that terms involved with hematological and eye systems were associated with schizophrenia. Although these terms appeared in many datasets, the number of publications studying their association with schizophrenia was small. Based on these results, we suggest that when collecting samples from schizophrenia patients in the future, information such as genotype, description of comorbidities and hematological cell counts should be included, as well as the transcriptome, for more detailed genetic stratification and mechanistic exploration of schizophrenia.