Updating genome annotation for the microbial cell factory Aspergillus niger using gene co-expression networks

ABSTRACT
A significant challenge in our understanding of biological systems is the high number of genes with unknown function in many genomes. The fungal genus Aspergillus contains important pathogens of humans, model organisms, and microbial cell factories. Aspergillus niger is used to produce organic acids and proteins, and is a promising source of new bioactive secondary metabolites. Out of the 14,165 open reading frames predicted in the A. niger genome only 2% have been experimentally verified and over 6,000 are hypothetical. Here, we show that gene co-expression network analysis can be used to overcome this limitation. A meta-analysis of 155 transcriptomics experiments generated co-expression networks for 9,579 genes (∼65%) of the A. niger genome. By populating this dataset with over 1,200 gene functional experiments from the genus Aspergillus and performing gene ontology enrichment, we could infer biological processes for 9,263 of A. niger genes, including 2,970 hypothetical genes. Experimental validation of selected co-expression sub-networks uncovered four transcription factors involved in secondary metabolite synthesis, which were used to activate production of multiple natural products. This study constitutes a significant step towards systems-level understanding of A. niger, and the datasets can be used to fuel discoveries of model systems, fungal pathogens, and biotechnology.


INTRODUCTION
The genus Aspergillus (phylum Ascomycota) comprises nearly 350 species of saprophytic and ubiquitous fungi, and includes important pathogens of humans (Aspergillus fumigatus), model organisms (Aspergillus nidulans) and microbial cell factories (Aspergillus oryzae, Aspergillus niger). Aspergillus niger has been exploited for over a century by biotechnologists for the production of organic acids, proteins and enzymes (1). It is the major worldwide producer of citric acid with an estimated value of $2.6 billion in 2014, which is predicted to rise to $3.6 billion by 2020 (2). As a prolific secretor of proteins, A. niger is used to produce various enzymes at a bulk scale (1). The first A. niger genome was sequenced in 2007 and contained an estimated 14,165 coding genes, ∼6,000 of which were hypothetical (3). More recent sequencing of additional A. niger strains and other genomes from the genus Aspergillus (4–9), combined with refinement of online genome analysis portals (10–12), and comparative genomic studies amongst the Aspergilli (5,13–15), have not significantly increased the percentage of A. niger genes that have functional predictions. While the exact number of 'hypothetical' genes varies between databases and A. niger genomes, recent estimates suggest that between 40 and 50% of the genes still remain hypothetical (1,16). Furthermore, only 2% of its genes (n = 247) have a verified function in the Aspergillus Genome Database (AspGD (17)). Even for the gold-standard model organism Saccharomyces cerevisiae, 21% of its predicted genes have dubious functional predictions (18), despite its high genetic tractability and a research community with >1,800 research labs worldwide. Indeed, gene functional predictions for A. nidulans, A. fumigatus, A. oryzae and other Aspergilli typically cover 40-50% of the genome (16,17).
Such a high frequency of unknown and hypothetical genes severely limits the power of systems-level analyses. One approach to overcome this limitation involves the generation and interrogation of gene expression networks based on transcriptomic datasets (19–21). The hypothesis underlying this approach is that genes which are robustly co-expressed under diverse conditions are likely to function in the same or closely related biological processes or pathways (22).
As one example, accurate delineation of fungal secondary metabolite biosynthetic gene clusters has been achieved by robustly defining contiguous gene co-expression during both in vitro (23) and infectious (24) growth, with co-expression analysis pipelines now publicly available to non-coders (25).
In this study, we conducted a meta-analysis of 155 publicly available transcriptomics analyses for A. niger, and used these data to generate a genome-level co-expression network and sub-networks for >9,500 genes. To aid user interpretations of gene biological process, gene sub-networks were analysed for enriched gene ontology (GO) terms, and integrated with information gleaned from 1,200 validated genes from the genus Aspergillus. Interrogation of selected co-expression sub-networks for verified genes and randomly selected hypothetical genes confirmed high quality datasets that enable rapid and facile predictions of biological processes. This co-expression resource has been integrated in the functional genomic database FungiDB (10) for use by the research community.
In order to validate that novel predictions of gene biological function were accurate, we functionally characterized genes which we hypothesized played a role in A. niger natural product synthesis based on co-expression datasets. Experimental validation included generation of null and overexpression mutants of transcription factors present in these sub-networks, controlled bioreactor cultivations, and activation of secondary metabolite gene expression and metabolite biosynthesis. In addition to demonstrating that this study enables novel predictions of A. niger gene function, these data suggest novel mechanisms for activating cryptic secondary metabolism can be used in natural product discovery programs, which is urgent due to the emergence of multi-resistant bacteria and fungi (26,27). As A. niger has been shown to be a superior expression host for medicinal drugs in g/l scale (28), such discoveries have significant translational potential. Taken together, the coexpression resources and experimental validation developed in this study enable high quality gene functional predictions in A. niger.

Strains and molecular techniques
Aspergillus niger strains used in this study are summarized in Supplementary Table S1. Media compositions, transformation of A. niger, strain purification and fungal chromosomal DNA isolation were described earlier (29). Standard PCR and cloning procedures were used for the generation of all constructs (30) and all cloned fragments were confirmed by DNA sequencing. Correct integrations of constructs in A. niger were verified by Southern analysis (30). For overexpression of TF1, TF2 and HD, the respective open reading frames were cloned into the Tet-on vector pVG2.2 (31) and the resulting plasmids integrated as single or multiple copies at the pyrG locus (for details see Supplementary Table S1). Deletion constructs were made by PCR amplification of the 5′- and 3′-flanks of the respective open reading frames (at least 0.9 kb long). N402 genomic DNA served as template DNA. The histidine selection marker (32) was used for selecting single deletion strains, whereas the pyrG marker was used for the establishment of the strain deleted in both TF1 and TF2. Details on cloning protocols, primers used and Southern blot results can be requested from the authors.

Gene network analysis and quality control
As of 2 March 2016, 283 microarray datasets (platform: GPL6758) of A. niger covering 155 different cultivation conditions were publicly available at the GEO database (33); the processing and normalization of these arrays have been published (34). In brief, array data in the form of CEL-files (35) were processed using the Affymetrix analysis package (35) (version 1.42.1) from Bioconductor (36) and expression data were calculated for genes under each condition with MAS5 background correction. Pairwise correlations of gene expression between all A. niger genes were generated by calculating the Spearman's rank correlation coefficient (37) using R. To assess a cut-off indicating biological relevance, Spearman correlations were first calculated using a pseudo-random data set in which normalized transcript values for each individual gene were randomized amongst the 283 arrays and 155 experimental conditions. Using this pseudo-random data set, the Spearman's rank coefficient was calculated pairwise for all predicted A. niger genes, giving a total of 104,958,315 comparisons/calculated Spearman's rank coefficients, of which 52,476,536 were positively correlated. Of these, only two were greater than |0.4| and none were above |0.5|. Consequently, |0.5| was taken as the threshold for co-expression. Sub-networks were calculated at an individual gene level using Python. All genes that were co-expressed with individual query ORFs are reported at FungiDB (10) and summarized in Supplementary File 1 using Spearman cut-offs of ≥ |0.5| and ≥ |0.7|.
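The randomization control described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names, the toy expression values and the fixed random seed are all assumptions made for the example, and it uses only the standard library rather than R.

```python
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation, i.e. the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / (vx * vy) ** 0.5


def null_exceedances(expression, cutoffs=(0.4, 0.5), seed=0):
    """Shuffle each gene's values across arrays, then count gene pairs whose
    absolute Spearman coefficient exceeds each cut-off."""
    rng = random.Random(seed)
    shuffled = {}
    for gene, values in expression.items():
        values = list(values)
        rng.shuffle(values)  # breaks any real association between genes
        shuffled[gene] = values
    genes = sorted(shuffled)
    counts = {c: 0 for c in cutoffs}
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            rho = abs(spearman(shuffled[genes[i]], shuffled[genes[j]]))
            for c in cutoffs:
                if rho > c:
                    counts[c] += 1
    n_pairs = len(genes) * (len(genes) - 1) // 2
    return counts, n_pairs


# Hypothetical toy data: four genes measured on six arrays.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.7, 12.5],
    "geneC": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [3.0, 1.0, 4.0, 1.5, 5.0, 9.0],
}
counts, n_pairs = null_exceedances(toy_expression)
print(n_pairs, counts)
```

With only six arrays, shuffled profiles will often exceed the cut-offs by chance; the null distribution narrows as the number of arrays grows, which is why a control of this kind is informative for a compendium of 283 arrays.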
To expedite investigation of the sub-networks and their common biological process, gene ontology enrichment (GOE) was implemented using Python version 2.7.13. GO terms and their hierarchical structure were downloaded from AspGD (17). Enriched GO Biological Process terms for all genes residing in a query sub-network were calculated relative to the A. niger genome and statistical significance was defined using Fisher's exact test (P-value < 0.05). For all datasets now available at FungiDB, GO enrichment is conducted using Fisher's exact test and Bonferroni corrections (10). For informant ORFs, experimentally verified A. niger genes were retrieved from AspGD (17). Additionally, A. niger orthologs for any gene with wet lab verification in A. fumigatus, A. nidulans or A. oryzae were identified using the ENSEMBL BLAST tool with default settings (12). Finally, 81 secondary metabolite core enzymes (39,40) were also defined as informant ORFs. We generated informant ORF ('prioritized ORF') sub-networks, which report significant co-expression of query genes exclusively with one or more informant ORFs; these are reported in Supplementary Table S2.
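As an illustration of the enrichment test named above, the sketch below computes a one-sided Fisher's exact P-value for over-representation of a GO term in a sub-network, followed by a Bonferroni correction. The one-sided form, the function names and the example counts are assumptions for this sketch; the text does not specify which alternative hypothesis was used.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P-value for over-representation.

    k: sub-network genes annotated with the GO term
    n: genes in the sub-network
    K: genome genes annotated with the GO term
    N: genes in the genome background
    Returns P(X >= k) for X following a hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical counts: 8 of 40 sub-network genes carry a term that
# annotates 100 of 14,165 background genes.
p = fisher_enrichment_p(k=8, n=40, K=100, N=14165)
print(p < 0.05)
```

The tail sum is evaluated with exact integer arithmetic before the final division, so it stays numerically stable even for genome-sized backgrounds.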

Reporter gene expression
Protocols for luciferase-based measurement of gene expression in microtiter format based on Tet-on (31) or anafp (34) promoter systems have been published. Strain BBA17.6, which is unable to form spores, was inoculated on complete medium and allowed to grow for 7 days at 30 °C. Biomass was harvested using physiological salt solution and used for inoculation. All data shown are derived from biological duplicates, each measured in technical quadruplicates, unless otherwise indicated. Raw datasets can be requested from the authors.

Bioreactor cultivation
Medium composition and the protocol for glucose-limited batch cultivation of A. niger in 5 l bioreactors have been described (31). In the case of strains overexpressing TF1 (strain MJK10.22) and TF2 (strain MJK11.17), the Tet-on system was induced with a final concentration of 10 µg/ml doxycycline when the culture reached 1 g/kg dry biomass. Samples for transcriptional and metabolome profiling were taken ∼6 h (∼72 h) after induction, i.e. during exponential (post-exponential) growth phase. For control (MJK17.25) and deletion strains (MJK14.7, MJK16.5, MJK18.1), doxycycline was added twice: once after the culture reached 1 g/kg dry biomass and once 24 h before samples were taken from post-exponential growth phase for transcriptomics and metabolomics analyses. To avoid modifications in gene expression due to degradation of doxycycline in growth media, 10 µg/ml doxycycline was added every 10-12 h (five times in total) during growth of conditional expression mutants. Growth and physiology profiles for all seven strains cultivated in biological duplicates are summarized in Supplementary File 4.

Transcriptional profiling
Total RNA extraction, RNA quality control, and RNA sequencing were performed at GenomeScan (Leiden, the Netherlands). Quality analysis of raw data was done as previously described (41). In brief, ∼13 million reads of 150 bp were obtained in paired-end mode for each sample. Read data were trimmed and quality controlled with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). STAR (42) was used to map the reads to the A. niger CBS 513.88 genome (http://fungi.ensembl.org/). On average, the unique alignment rate was ∼95%. Data normalization was performed with DESeq2 (43). Differential gene expression was evaluated in DESeq2 with the Wald test, using a Benjamini and Hochberg False Discovery Rate (FDR) threshold of 0.05 (44). Raw and processed data are summarized in Supplementary Table S3 and have been deposited at the GEO database (33) under the accession number GSE119311.

Metabolome profiling
Metabolites were extracted from biomass corresponding to 2.5 mg biomass dry weight by Metabolomic Discoveries GmbH (Potsdam, Germany) and identified based on Metabolomic Discoveries' database entries of authentic standards. Liquid chromatography (LC) separation was performed using hydrophilic interaction chromatography with an iHILIC®-Fusion, 150 × 2.1 mm, 5 µm, 200 Å column (HILICON, Umeå, Sweden), operated by an Agilent 1290 UPLC system (Agilent, Santa Clara, USA). The LC mobile phase was (i) 10 mM ammonium acetate (Sigma-Aldrich, USA) in water (Thermo, USA) with 5% and 95% acetonitrile (Thermo, USA) (pH 6) and (ii) acetonitrile with 5% 10 mM ammonium acetate in 95% water. The LC gradient ran linearly from 95% to 65% acetonitrile over 8.5 min, followed by a linear gradient from 65% to 5% acetonitrile over 1 min, a 2.5 min wash with 5% and a 3 min re-equilibration with 95% acetonitrile. The flow rate was 400 µl/min and the injection volume was 1 µl. Mass spectrometry was performed using a high-resolution 6540 QTOF/MS Detector (Agilent, Santa Clara, USA) with a mass accuracy of <2 ppm. Spectra were recorded in a mass range from 50 m/z to 1700 m/z at 2 GHz in extended dynamic range in both positive and negative ionization mode. The measured metabolite concentrations were normalized to the internal standard. Significant concentration changes of metabolites in different samples were analyzed by appropriate statistical test procedures (ANOVA, paired t-test) using R. When the adjusted P value based on Benjamini and Hochberg FDR (44) was lower than 0.05 and the absolute log2 fold change exceeded 1, the abundance of the metabolite was considered significantly different.
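The significance rule stated above (Benjamini and Hochberg adjusted P < 0.05 combined with an absolute log2 fold change above 1) can be sketched as follows. The analysis in the study was run in R; this standard-library Python version, with its function names and example P-values, is only an assumed illustration of the same procedure.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_significant(adj_p, log2_fc, alpha=0.05, min_abs_log2_fc=1.0):
    """Apply both criteria: FDR below alpha and |log2 fold change| above 1."""
    return adj_p < alpha and abs(log2_fc) > min_abs_log2_fc


# Hypothetical raw P-values and log2 fold changes for four metabolites.
raw_p = [0.01, 0.04, 0.03, 0.005]
log2_fc = [1.8, 0.4, -2.1, -0.9]
adj_p = benjamini_hochberg(raw_p)
calls = [is_significant(p, fc) for p, fc in zip(adj_p, log2_fc)]
print(calls)
```

Requiring both criteria means that a metabolite with a small adjusted P-value but a modest fold change is not reported as changed.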

The transcriptomic landscape of A. niger inferred from a gene expression meta-analysis
We normalized and interrogated gene expression across 155 published transcriptomic analyses for the A. niger laboratory wildtype strain N402 (ATCC 64974) and its descendants, comprising 283 Affymetrix microarray experiments in total (34). Experimental parameters include a diverse range of cultivation conditions (agar plate, bioreactor, shake flask), developmental and morphological stages (germination, mycelial growth, sporulation), deletion and disruption mutants, stress conditions (antifungals, secretion stress, pH), different carbon and nitrogen sources, starvation, and co-cultivation with bacteria. These experimental conditions represent diverse niches inhabited by A. niger as well as industrial cultivation conditions, in addition to (a)biotic and genetic perturbations that result in global changes in gene expression.
In order to demonstrate that accurate values of transcript abundance were derived from this meta-analysis, we plotted average gene expression values for each gene throughout the 155 conditions as a function of chromosomal locus (Figure 1A). From these data, we categorized low, medium, and highly expressed loci. Subsequently, we generated a DNA cassette expressing a luciferase reporter gene under control of the inducible Tet-on promoter (31) and targeted it to the 5′-upstream region of two low and one high expression locus (Figure 1B). The pyrG locus present on chromosome III, routinely used for gene-targeted integration in A. niger, served as locus control for Tet-on driven medium expression of luciferase (31). Luciferase levels measured at these loci were confirmed to be low, medium and high in relative terms in microtiter cultivations of the different A. niger strains (Figure 1C) and in controlled batch cultivation at bioreactor scale (Figure 1D). These data demonstrate that low, medium, or high expression at these loci is observed in both high-throughput assays (microtiter) and more labour and resource intensive bioreactor cultivations. We conclude that the microarray data accurately reflect A. niger gene expression values. Note that this transcriptomic landscape is a significant addition to the A. niger molecular toolkit, as it facilitates rational control of gene dosage (time of induction and absolute expression level) by targeted locus-specific integration of a gene of interest.

Construction of high quality co-expression networks for A. niger
Experimentally validated gene expression data from the A. niger transcriptional meta-analysis were used to generate a gene co-expression network based on Spearman's rank correlation coefficient (37). In order to define a minimum Spearman's rank correlation coefficient (ρ) for which we could be confident in extracting biologically meaningful co-expression, we conducted a preliminary quality control experiment, where transcript values for each individual gene were randomized amongst the 155 experimental conditions. This gave a dataset with identically distributed but randomized expression patterns. Next, we calculated every possible transcriptional correlation between genes on the A. niger genome, resulting in over 100 million ρ-values. This identified 52 million positive and 48 million negative correlations (Figure 2). From this dataset, only two ρ-values were above |0.4|, and none were above |0.5|. Consequently, we took ρ ≥ |0.5| as a minimum cut-off for biologically meaningful co-expression relationships. Calculations of Spearman correlations using the non-randomized microarray data resulted in over 4.5 million correlations which passed the minimum ρ ≥ |0.5| cut-off. From these datasets, co-expression sub-networks for every gene in the global network were generated for both positively and negatively correlated genes (Figure 2, Supplementary File 1). We classified them into two groups: 'stringent' (ρ ≥ |0.5|, 9,579 gene networks) and 'highly stringent' (ρ ≥ |0.7|, 6,305 gene networks) and calculated enriched GO terms for each gene sub-network relative to the A. niger genome.
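To make the sub-network construction concrete, the sketch below derives per-gene sub-networks of positively and negatively correlated partners from a table of pairwise Spearman coefficients at the two cut-offs. The data structure, the function name and the toy coefficients are assumptions for illustration; the actual Python implementation used in the study is not shown in the text.

```python
def build_subnetworks(correlations, cutoff=0.5):
    """Group co-expressed partners per query gene.

    correlations: dict mapping (gene_a, gene_b) to a Spearman coefficient.
    Returns {gene: {"positive": set, "negative": set}} containing only pairs
    whose absolute coefficient is at least the cut-off.
    """
    networks = {}
    for (gene_a, gene_b), rho in correlations.items():
        if abs(rho) < cutoff:
            continue
        sign = "positive" if rho > 0 else "negative"
        # Each passing pair contributes to the sub-network of both genes.
        for query, partner in ((gene_a, gene_b), (gene_b, gene_a)):
            entry = networks.setdefault(
                query, {"positive": set(), "negative": set()}
            )
            entry[sign].add(partner)
    return networks


# Hypothetical coefficients for three genes.
toy_correlations = {
    ("geneA", "geneB"): 0.87,
    ("geneA", "geneC"): -0.55,
    ("geneB", "geneC"): 0.20,
}
stringent = build_subnetworks(toy_correlations, cutoff=0.5)
highly_stringent = build_subnetworks(toy_correlations, cutoff=0.7)
print(sorted(stringent), sorted(highly_stringent))
```

A gene appears in the output only if at least one pair passes the cut-off, which is why fewer genes have a sub-network at the higher cut-off than at the lower one.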

Integration of co-expression networks with community-wide experimental evidence of gene function
[Figure 2 caption, beginning not recovered: ... (3), because we have omitted truncated ORFs reported there from our analysis), sub-networks were calculated which report all significant correlations with the query gene of ≥ |0.5| (stringent dataset) and of ≥ |0.7| (very stringent dataset). Both sub-networks were analyzed for significantly enriched GO terms (biological processes). Additionally, we highlight co-expression between query genes and 'informant ORFs', which have either been experimentally validated in the Aspergilli, or are predicted to encode key secondary metabolite biosynthetic enzymes (C).]
The Aspergillus community has functionally characterized over a thousand genes in different species of the genus Aspergillus, which we reasoned can be used to aid a priori predictions of hypothetical genes or not yet verified genes in A. niger. In order to integrate such experimental data with the co-expression network, we mined the Aspergillus genome database AspGD (17) to generate a near-complete list of ORFs that have been functionally characterized in Aspergilli. All experimentally validated ORFs for A. niger (n = 247), A. nidulans (n = 639), A. fumigatus (n = 218) and A. oryzae (n = 81) were included in this dataset. Given the strong potential of A. niger as a platform for discovery and production of new bioactive molecules, we also included 81 putative polyketide synthase (PKS) or nonribosomal peptide synthetase (NRPS) encoding genes of A. niger that reside in 78 predicted secondary metabolite clusters (39,40), giving in total 1,266 prioritized ORFs. For every gene in the A. niger genome, we calculated co-expression interactions specifically with these 1,266 rationally prioritized ORFs. A total of 9,263 (ρ ≥ |0.5|) and 5,178 (ρ ≥ |0.7|) candidate genes had one or more correlations with prioritized ORFs (Supplementary Table S2). These datasets thus constitute the most comprehensive co-expression resource for a filamentous fungus and are accessible at FungiDB (10).
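The tally of prioritized ORFs and the restriction of sub-networks to them can be checked with a short sketch. The per-source counts are those given in the text; the function name, the dictionary layout and the toy gene identifiers are hypothetical.

```python
# Counts of informant ORFs per source, as stated in the text.
INFORMANT_SOURCES = {
    "A. niger (verified)": 247,
    "A. nidulans (verified)": 639,
    "A. fumigatus (verified)": 218,
    "A. oryzae (verified)": 81,
    "A. niger PKS/NRPS core genes": 81,
}
TOTAL_PRIORITIZED = sum(INFORMANT_SOURCES.values())


def genes_with_informant_links(subnetworks, informant_orfs):
    """Keep, for each query gene, only partners that are informant ORFs.

    Query genes without any informant partner are dropped from the result.
    """
    informants = set(informant_orfs)
    filtered = {}
    for gene, partners in subnetworks.items():
        hits = set(partners) & informants
        if hits:
            filtered[gene] = hits
    return filtered


# Hypothetical sub-networks and informant list.
toy_subnetworks = {"query1": {"orfX", "informant1"}, "query2": {"orfY"}}
print(TOTAL_PRIORITIZED)
print(genes_with_informant_links(toy_subnetworks, ["informant1"]))
```

Counting the keys of the filtered dictionary gives the number of candidate genes with at least one correlation to a prioritized ORF at a given cut-off.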

Co-expression resources enable facile predictions of gene biological function
In order to test whether biologically meaningful interpretations of gene and network function can be extracted from these resources, we interrogated both stringent and highly stringent datasets for genes where the biological processes, molecular function, and subcellular localization of encoded proteins have been studied in fungi and which represent the broad range of utilities and challenges posed by fungi. From the perspective of industrial biotechnology, we interrogated networks for the gene encoding the ATPase BipA, which is required for high secretion yield of industrially useful enzymes by acting as chaperone to mediate protein folding in the endoplasmic reticulum (45). With regard to potential drug target discovery, we analyzed gene expression networks for Erg11 (Cyp51), which is the molecular target for azoles (46). For assessment of virulence in both plant and human infecting fungi, we interrogated networks for the NRPS SidD, which is necessary for the biosynthesis of the siderophore triacetyl fusarinine C, and ultimately iron acquisition during infection (47). Assessment of all control sub-networks at GO and individual gene-level revealed striking co-expression of genes encoding proteins involved in respective metabolic pathways, associated biological processes, subcellular organelles, protein complexes, known regulatory transcription factors/GTPases/chaperones, and cognate transporters, amongst others (Figure 3).
The lowest Spearman correlation coefficient of |0.5| clearly results in biologically meaningful gene co-expression as exemplified by the delineation of diverse yet related processes, including: (i) orchestration of retrograde/anterograde vesicle trafficking via COPI/COPII/secretion associated proteins (BipA) (48); (ii) coordination of ergosterol biosynthesis by sterol regulatory binding element regulators SrbA/SrbB and association of this pathway with respiration at the mitochondrial membrane (Erg11) (49); and (iii) the linking of respective ergosterol and ornithine primary and secondary metabolic pathways during siderophore biosynthesis via the interdependent metabolite mevalonate (SidD) (50). With regard to co-expressed genes as a function of chromosomal location, a common feature of filamentous fungal genomes is that genes necessary for the biosynthesis of secondary metabolite products occur in physically linked contiguous clusters. SidD resides in a six-gene cluster with SidJ, SidF, SidH, SitT and MirD, all of which were represented in the high stringency network that contained a total of only 13 genes (Figure 3).
Figure 3. Schematic representation of co-expression networks for BipA, SidD, and Erg11 encoding genes. Genes are represented by circles, with positive and negative correlations depicted by grey and red lines, respectively. For simplicity, protein names are given in the centre of each circle. Where names were not available in A. niger, we used the name for either A. nidulans or S. cerevisiae ortholog. Query genes encoding BipA, SidD, or Erg11 are given in black diamond boxes. Co-expression sub-networks are given for gene pairs passing |0.7| (A) and |0.5| (B) Spearman correlation coefficient cut-offs. Both sub-networks were assessed for enriched GO terms, and genes were manually filtered into functional categories based on these analyses and interrogation of research literature (C). Note that in each instance, hypothetical proteins were co-expressed with each query gene, indicating the encoded products are somehow associated with these biological processes.
Based on enriched GO terms from gene sub-networks and co-expression with experimentally verified ORFs, we could further rapidly infer biological processes for a total of 2,970 (ρ ≥ |0.5|) and 1,016 (ρ ≥ |0.7|) hypothetical genes that were positively and/or negatively associated using this analysis, as exemplified for eight hypothetical genes in Supplementary Table S4. Additionally, we interrogated entire families of functionally related genes that have been well characterized in the Aspergilli, including phosphatases, chromatin remodelers, and transcription factors, and were able to assign novel biological processes for all these predicted genes, as exemplified for nine genes in Supplementary Table S5.
Additionally, in order to confirm that hypotheses regarding gene function and co-expression networks were independent of the array platform and/or strain utilised, we compared data derived from the microarray used in our study (specifically the Affymetrix technology for A. niger isolate CBS 513.88), with a separate tri-species platform for strain ATCC 1015 (51). This latter microarray has been used to concomitantly compare batch cultivations of A. niger, A. oryzae and A. nidulans on glucose and xylose media, which identified 23 genes as a conserved response to xylose utilisation, including the xylose transcriptional regulator XlnR (An15g05810, (51)). Despite differences in strain, microarray platform, and experimental design, interrogation of the XlnR sub-network from our study revealed strong concordance with the conserved xylose response genes reported by Andersen et al. (51), including those encoding an L-arabitol dehydrogenase (An01g10920), aldose 1-epimerase (An02g09090), glycoside hydrolase (An12g01850), xylitol dehydrogenase (An12g00030), sugar transporter (An03g01620) and short-chain dehydrogenase (An04g03530). We therefore conclude that sub-networks generated in our study can be used to define A. niger co-expression relationships and infer gene function across strain backgrounds.
Taken together, these quality control experiments strongly suggest that the co-expression resources developed in this study can be used for high confidence hypothesis generation at a variety of conceptual levels, including biological process, metabolic pathway, protein complexes, and individual genes.

Co-expression resources accurately predict transcription factors of the ribosomally synthesized natural product AnAFP of A. niger
In order to provide experimental confirmation of predictions of biological processes gleaned from this co-expression resource, we interrogated all datasets associated with the gene encoding the A. niger antifungal peptide AnAFP. Ribosomally synthesized antifungal peptides of the AFP family are promising molecules for use in medical or agricultural applications to combat human- and plant-pathogenic fungi (52). We and others could show that expression of their cognate genes is under tight temporal and spatial regulation in their native hosts and precedes asexual sporulation (34,53–55).
The gene encoding AnAFP (An07g01320) is co-expressed with 986 genes (ρ ≥ |0.5|; 605 positively correlated / 381 negatively correlated (34)). GO enrichment analyses of positively correlated sub-networks uncovered that anafp gene expression parallels fungal secondary metabolism, carbon limitation and autophagy (34). In total, 23 predicted transcription factors are co-expressed at a stringent level, among which were the transcription factors VelC (An04g07320) and StuA (An05g00480; Figure 4), both of which are key regulators of asexual development and secondary metabolism in Aspergilli, whereby VelC is known as an activator, and StuA can act as either an activator or a repressor (56–58). In order to confirm a regulatory function of these transcription factors on anafp expression, we used a reporter strain in which the anafp ORF has been replaced with a luciferase gene. Deletion of stuA or velC in this background revealed a strong increase or decreased/delayed activation of the anafp promoter, respectively (Figure 4). Interestingly, the transcription factor binding site for VelC is unknown, whereas the binding sequence for StuA is absent from the predicted promoter region of anafp. These data thus indicate that the resources generated in this study enable accurate predictions of (in)direct regulatory proteins even in the absence of DNA binding sites.
[Figure 4 caption, beginning not recovered: ... Analysis of anafp promoter activity in a strain deleted for velC (strain BBA17.9) and for stuA (strain BBA13.2), respectively, which was compared to the progenitor strain PK2.9 abbreviated as 'wt'. Note that the scale bar is different between B and C, and that StuA is thus likely a very strong repressor of anafp expression. Data are from liquid microtiter plate cultures, where a defined amount of biomass was used for inoculation and luciferase expression monitored online during cultivation. Expression levels are depicted for one representative example from three independent experiments. Each experiment was performed in quintuplicate.]

Co-expression resources accurately predict transcription factors of non-ribosomally synthesized natural products of A. niger
The transcriptional activation of secondary metabolite (SM) gene clusters in different filamentous fungi is one current focus of the fungal research community (16) as >60% of currently approved clinical drugs are derived from natural products (59). A. niger stands out due to its exceptionally high number of predicted SM gene clusters in its genome (n = 78), harboring 81 core enzymes in total, such as NRPS and PKS (39). However, only a dozen SMs have been identified from A. niger so far (60). Our survey of the expression data of all gene clusters under the 155 cultivation conditions uncovered that the majority of SM core genes (53) are expressed in at least one condition (Supplementary File 2). The majority of expressed core genes are also co-expressed with their cluster members (Supplementary File 3). Notably, not all cluster members are co-expressed with contiguous transcription factors. Indeed, only ∼30% of the gene clusters display co-expression with contiguous transcription factors (Supplementary File 3). We thus questioned which transcription factors are regulating these SM gene clusters, and used the co-expression dataset to assign biological processes to genes predicted to encode transcription factors. Given the important role of chromatin remodelers in activation and silencing of secondary metabolite clusters, we also interrogated genes predicted to encode histone deacetylases (61). This identified two ORFs encoding putative transcription factors: An07g07370 (TF1) and An12g07690 (TF2), and a histone deacetylase (An09g06520, HD) that are positively and negatively (TF1, TF2) or only negatively (HD) co-expressed with numerous core SM genes (Supplementary Table S5). Notably, none of the three genes resides in a contiguous SM gene cluster; instead, they belong to a large SM sub-network consisting of 152 genes including 26 SM core genes (Supplementary Table S6), whereby gene expression of TF1 and TF2 correlates very strongly (ρ = 0.87).
Interrogation of enriched GO terms for both TF1 and TF2 gene sub-networks revealed enrichment of fatty acid metabolism, autophagy, mitochondria degradation (positively correlated) and maturation of rRNA and tRNA, ribosomal assembly and amino acid metabolism (negatively correlated). This analysis thus allowed us to select genes for in vivo functional studies based on a non-intuitive selection procedure. Neither TF1, TF2 nor HD have been experimentally characterized in fungi so far. In order to confirm a regulatory function of these putative regulators on SM core gene expression, we generated (i) single deletion strains for TF1, TF2 and HD, respectively, (ii) a double deletion strain for TF1 and TF2, and (iii) individual conditional overexpression mutants for TF1, TF2 and HD using the Tet-on system (31). The strongest effect on the metabolome profile of A. niger was observed during overexpression of TF1 and TF2 (Figure 5).
[Figure 5 caption, beginning not recovered: ... Table S3) during exponential and post-exponential growth phase. Note that the maximum growth rate was for all strains 0.24 h −1 except for TF1OE (0.19 h −1 ). (A) Differential gene expression for 81 predicted secondary metabolite core genes in mutant isolates relative to the progenitor control as determined by RNA-Seq analysis. (B) Assessment of metabolite abundance demonstrated differential abundance of hundreds of A. niger metabolites following deletion or over-expression of HD, TF1 and TF2. In total, 3,410 primary and secondary metabolites were detected in all runs, 1,260 of which were known compounds and 478 thereof were differentially abundant upon deletion or overexpression of these putative regulators (Supplementary File 4). (C) Relative abundance of the SM aurasperone B is shown in progenitor control and mutant isolates throughout bioreactor cultivation as an exemplar.]
SMs upregulated under these conditions included, but were not limited to, aurasperones, citreoviridin D, terrein, aspernigrin A, nigerazine A and B, pyranonigrin A and D, flavasperone, fonsecin, O-demethylfonsecin, flaviolin and funalenone; among the SMs downregulated under these conditions were asperpyrone, L-agaridoxin and nummularine F (

DISCUSSION
In this study, we performed a transcriptomic meta-analysis to generate a high-quality gene co-expression network, and used this to predict biological processes for 9,579 (∼65%) of all A. niger genes, including 2,970 hypothetical genes (|ρ| ≥ 0.5). The compendium of resources developed in this work consists of (i) gene-specific sub-networks with two stringent Spearman cut-offs that ensure high confidence in biologically meaningful interpretations; (ii) statistically enriched GO terms for each co-expression network; and (iii) a refined list of co-expression relationships that incorporates over 1,200 experimentally characterized ORFs to aid predictions of gene biological process based on experimental evidence.
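As an illustration of what statistical enrichment of GO terms in a co-expression sub-network involves, the sketch below applies a one-sided hypergeometric test with a Bonferroni correction. This is one common approach and is an assumption here, not a description of the enrichment procedure used in this study; the gene identifiers, term memberships and the 0.05 threshold are likewise hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enriched_terms(sub_network, annotations, background, alpha=0.05):
    """Return (term, adjusted p) pairs passing a Bonferroni-corrected test.

    annotations maps each term to the set of genes annotated with it.
    Only terms with at least one gene in the sub-network are tested.
    """
    bg = set(background)
    sub = set(sub_network) & bg
    tested = [t for t, genes in annotations.items() if genes & sub]
    results = []
    for term in tested:
        genes = annotations[term] & bg
        k = len(genes & sub)
        p = hypergeom_sf(k, len(bg), len(genes), len(sub))
        p_adj = min(1.0, p * len(tested))  # Bonferroni correction
        if p_adj <= alpha:
            results.append((term, p_adj))
    return sorted(results, key=lambda r: r[1])


# Hypothetical background of 20 genes and three toy annotation sets.
background = [f"g{i}" for i in range(20)]
annotations = {
    "fatty acid metabolism": {"g0", "g1", "g2", "g3", "g4"},
    "rRNA maturation": {"g10", "g11", "g12", "g13", "g14"},
    "transport": {"g5", "g6", "g7", "g15", "g16", "g17", "g18", "g19"},
}
sub_network = ["g0", "g1", "g2", "g3", "g15"]

for term, p_adj in enriched_terms(sub_network, annotations, background):
    print(f"{term}: adjusted p = {p_adj:.4f}")
```

With these toy inputs, four of the five sub-network genes carry the "fatty acid metabolism" annotation, so that term passes the corrected threshold, whereas "transport" (one shared gene) does not and "rRNA maturation" (no shared genes) is not tested. For genome-scale analyses a less conservative correction, such as a false discovery rate procedure, is often preferred.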
In order to demonstrate the utility of these resources, we first interrogated these datasets at the gene level, demonstrating that the transcription factors VelC and StuA, which are critical components of ascomycete development and secondary metabolism, also regulate expression of the A. niger antifungal peptide AnAFP. Our data provide further evidence of the coupling between development, biosynthesis of secondary metabolites, and secreted antifungal peptides (34). The datasets generated in this study can also be used to identify global regulators at the level of biological processes, as demonstrated for fungal secondary metabolism and the two transcription factors TF1 and TF2 (which we name MjkA and MjkB, respectively). Previously, co-expression of contiguous genes has been used to determine the boundaries of secondary metabolite clusters in A. nidulans and other fungi (23)(24)(25). Our study can be viewed as complementary to such analyses, as MjkA and MjkB are physically located outside the boundaries of any putative secondary metabolite cluster, and as such would not be identified using previous approaches (23,25). Generation of loss- and gain-of-function mutants demonstrated that they likely (in)directly regulate dozens of secondary metabolite loci at the transcript and metabolite level. Interestingly, MjkA is a Myb-like transcription factor highly conserved in Aspergilli, with orthologues present in several plant genomes. Myb transcription factors have recently been demonstrated to regulate plant natural product biosynthesis (62), and our co-expression data and wet lab experiments suggest that titratable control of MjkA is a promising strategy for the activation of ascomycete secondary metabolism during drug discovery programs.
With regard to the application of our co-expression approach to predict gene biological processes in other fungi, interrogation of the GEO database (33) demonstrates that several hundred global gene expression experiments are available for industrial cell factories (e.g. A. oryzae, Trichoderma reesei) and human- or plant-infecting fungi (e.g. A. fumigatus, Cryptococcus neoformans, Candida albicans, Magnaporthe oryzae), indicating that our approach can be broadly applied to industrially, medically and agriculturally relevant fungi. As the financial costs of gene expression profiling continue to decline, this study paves the way for prediction of gene biological function using co-expression network analyses throughout the fungal kingdom.

DATA AVAILABILITY
All datasets generated and/or analyzed during this study are available at FungiDB (http://fungidb.org/fungidb/). Spearman's correlation coefficients for gene co-expression sub-networks will also be made available by the corresponding author upon request. RNA-Seq data have been deposited at the Gene Expression Omnibus under accession number GSE119311.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.