Multi-omics Analyses Provide Insight into the Biosynthesis Pathways of Fucoxanthin in Isochrysis galbana

Isochrysis galbana is considered an ideal bait for functional foods and nutraceuticals of humans because of its high fucoxanthin (Fx) content. However, multi-omics analysis of the regulatory networks for Fx biosynthesis in I. galbana has not been reported. In this study, we report a high-quality genome assembly of I. galbana LG007, which has a genome size of 92.73 Mb, with a contig N50 of 6.99 Mb and 14,900 protein-coding genes. Phylogenetic analysis confirmed the monophyly of Haptophyta, with I. galbana sister to Emiliania huxleyi and Chrysochromulina tobinii. Evolutionary analysis revealed an estimated divergence time between I. galbana and E. huxleyi of ∼ 133 million years ago. Gene family analysis indicated that lipid metabolism-related genes exhibited significant expansion, including IgPLMT, IgOAR1, and IgDEGS1. Metabolome analysis showed that the content of carotenoids in I. galbana cultured under green light for 7 days was higher than that under white light, and β-carotene was the main carotenoid, accounting for 79.09% of the total carotenoids. Comprehensive multi-omics analysis revealed that the content of β-carotene, antheraxanthin, zeaxanthin, and Fx was increased by green light induction, which was significantly correlated with the expression of IgMYB98, IgZDS, IgPDS, IgLHCX2, IgZEP, IgLCYb, and IgNSY. These findings contribute to the understanding of Fx biosynthesis and its regulation, providing a valuable reference for food and pharmaceutical applications.


Introduction
Fucoxanthin (Fx) is widely distributed in algae and some invertebrate cells, including Phaeodactylum tricornutum, Ectocarpus siliculosus, Thalassiosira pseudonana, Isochrysis galbana, and Nannochloropsis gaditana [1,2]. Fx can be assembled with chlorophyll into Fx-chlorophyll protein (FCP) with certain proteins, and FCP exists in thylakoid membranes of algae and acts as a light-capturing antenna [3]. Fx has excellent capabilities of blue-green light harvesting and photoprotection, which could help algae make full use of solar energy in different bands at different depths of seawater [3,4]. Fx, an oxygenated carotenoid, exhibits potential advantages with various pharmacological activities, including antiinflammatory, anti-tumor, anti-obesity, anti-oxidant, antidiabetic, anti-malarial, and anti-lipid effects [5]. However, the utilization of Fx as a nutraceutical in food and nutrient supplements is limited because of its low production level and poor stability. Microalgae are regarded as the most promising alternative Fx production algae with multiple biotechnological advantages, such as short growth cycle, easy handling, and large-scale artificial cultivation [6]. Compared with other microalgae, such as P. tricornutum and N. gaditana, I. galbana lacks cell walls and is much easier to be digested and handled, making it a good initial food source for the larvae of aquatic animals [7,8]. Moreover, I. galbana is a marine singlecell microalga rich in Fx (more than 10% of dry weight biomass) and lipid (7.0%-20% of dry weight biomass), which is considered an ideal material for the development of functional foods for humans [9]. Additionally, we found that the Fx content of I. galbana LG007 was the highest in different strains or species, which can be used as an ideal material for follow-up research ( Figure S1).
Although the process of Fx synthesis has not been fully elucidated, several studies have attempted to reveal genes or proteins involved in Fx biosynthesis [10,11]. Genes involved in Fx biosynthesis have been identified, including b-carotene, phytoene synthase (PYS), phytoene desaturase (PDS), 15-cis-fcarotene isomerase (ZISO), f-carotene desaturase (ZDS), carotenoid isomerase (CRTISO), and lycopene b-cyclase (LCYb) [10,11]. However, some enzymes participating in the final step of Fx biosynthesis have not been discovered [11]. There are two generally accepted hypotheses for the final step of Fx biosynthesis from violaxanthin to Fx in Fx-producing algae: 1) violaxanthin is a precursor of Fx, which is converted by phycoxanthin; and 2) neoxanthin is the precursor of Fx, which is formed by the ketonization of the neoxanthin and the acetylation of an intermediate [12,13]. Studies have focused on the key genes related to the Fx biosynthesis pathway, mainly involved in the expression of some genes in the Fx biosynthesis pathway by external inducing factors (light intensity, methyl jasmonate, and arachidonic acid). Zhang et al. revealed the change in Fx content and gene expression pattern of the Fx biosynthesis pathway under conditions of high irradiance stress in the diatom P. tricornutum, showing an evident linear relationship between Fx content and the expression levels of PYS and zeaxanthin epoxidase (ZEP) [14]. Yu et al. reported that the expression level of LCYb could be significantly increased by treatment with methyl jasmonate and arachidonic acid, and the content of Fx in P. tricornutum was significantly higher than that in the control group, which showed that LCYb played an important role in the Fx biosynthesis [15]. However, the aforementioned studies are mainly reflected in P. tricornutum, and few studies are on enzyme genes related to the Fx biosynthesis pathway of I. galbana. Because of limited genome information, how I. galbana regulates Fx biosynthesis at the DNA and RNA levels remains unclear. Draft genome sequences of I. galbana were generated in 2014 based on next generation sequencing. However, incomplete genome assemblies produced short contigs and scaffolds, causing problems for the follow-up research of I. galbana [16]. Additionally, high-quality genomic resources can enable the breeding of novel I. galbana strains with higher Fx content in industrial practice for commercial use. But up to now, a systematic analysis of the regulatory networks for Fx biosynthesis in I. galbana has not been performed using genome, transcriptome, and metabolome data according to our review of the literature.
In this study, we generated a high-quality genome assembly and annotation of I. galbana LG007 by using the thirdgeneration sequencing (PacBio Sequel platform). The highquality I. galbana LG007 genome provides a valuable resource for studying the evolutionary events and genomic characteristics of aquatic algae. A previous study revealed the influence of spectral intensity and quality of blue-green light on Fx content in Chlorophyceae, but it did not involve the role of green light as a single source for Fx biosynthesis [17]. Our previous results suggested that Fx content could be increased under green light conditions, which is a special simulating factor that occurs during the cultivation of I. galbana ( Figure S2). Transcriptomic and metabolomic analyses were performed on algae cells at different stages of cultivation (3,5,7, and 9 days) under different light-quality conditions (white and green) to reveal key genes or metabolic products that are potentially related to the accumulation and regulation of Fx biosynthesis.

Results
Assembly of a high-quality genome of I. galbana LG007 About 15.5 Gb of PacBio long reads and 8.92 Gb of Illumina clean reads were generated (Table S1). The total length of all reads assembled from the I. galbana LG007 genome contained 353 contigs was 92.59 Mb, with a contig N50 of 666.7 kb, GC content of 58.44%, and the longest contig length of 2.93 Mb ( Figure 1A and B, Figure S3; Table S2). The size of the assembled genome was close to that estimated by flow cytometry and 17-mer ( Figures S4 and S5). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis of our present assembly showed that $ 83.8% of the plant orthologs were included in the assembled sequences (Table S3). Likewise, $ 98.4% of Illumina clean reads and $ 99.78% of PacBio long reads could be mapped to the genome, respectively (Tables S4 and S5). These metrics implied that the assembled genome is credible and can be used for subsequent analysis. Using the 3D-DNA and LACHESIS workflow, 98.22% (90.95 Mb) of the genome was successfully anchored onto 15 superscaffolds (Figures S6  and S7; Table S6). The scaffold N50 of the I. galbana LG007 genome after high-throughput chromatin conformation capture (Hi-C) assisted assembly reached 6.99 Mb (Table S7), generating a high-quality genome assembly for I. galbana.

Gene family expansion associated with lipid metabolism
A total of 179 single-copy homologous genes were identified among 15 genomes by using OrthoFinder (version 2.3.12) and were used to reconstruct a phylogenetic tree ( Figure 1C). Phylogenetic analysis showed that I. galbana diverged into the Haptophyta branch $ 133 million years ago (MYA) after the divergence of the Rhodophyta (1333 MYA), Chlorophytae (1460 MYA), and Bacillariophyta (1150 MYA). These results support the view that Emiliania huxleyi, I. galbana, and Chrysochromulina tobinii as monophyletic groups share a common Haptophyta ancestor [18][19][20]. Although E. huxleyi and I. galbana genomes appear to share only limited collinearity, this observation may stem from the lower quality of the E. huxleyi genome assembly ( Figure 1D). The results showed that I. galbana and E. huxleyi may have a close relationship and are sisters in coccolithophores, which is consistent with the findings of phylogenetic analysis. Comparative genomic analysis showed that gene family expansions outnumbered contractions in Raphidocelis subcapitata, Thalassiosira oceanica, and E. huxleyi. We also discovered 356 expanded and 2823 contracted gene families in I. galbana LG007. KEGG pathway analysis showed that the expanded gene families were specifically enriched in signal transduction, purine metabolism, lipid metabolism, and ABC transporters (Table S13). GO analysis showed that these expanded genes were related to signaling, metabolic processes, stimulus response, and catalytic activity (Table S14). Interestingly, lipid metabolism-related genes (IgPLMT, IgOAR1, and IgDEGS1) exhibited significant expansion, indicating that the expansion of these genes in I. galbana could enhance the regulation and biosynthesis of carotenoid, resulting in a high content of Fx in I. galbana. A total of 2823 contracted gene families highlighted the functions pertaining to signal transduction, starch metabolism, and biosynthesis of other secondary metabolites (Table S15). GO terms of the contracted genes were associated with binding, catalytic activity, transporter activity, transcription regulator activity, stimulus response, signaling, and biosynthesis of secondary metabolites (Table S16). Thus, it is likely that the contractions of secondary metabolite biosynthesis-and stimulus-related genes could affect the accumulation of other secondary metabolites and resistance in I. galbana, resulting in a relatively good photoprotection capabilities only in the blue-green light of seawater. A comparison of E. huxleyi, C. tobinii, P. tricornutum, Chlamydomonas reinhardtii, and I. galbana LG007 revealed that 2027 (31.37%) of the 12,387 I. galbana LG007 gene families were common to other four species, whereas 3135 gene families were specific to I. galbana LG007 (Figure 2A). GO enrichment analysis showed that the functions of these specific genes mainly included metabolic process, catalytic activity, biological regulation, stimulus response, developmental process, binding, pigmentation, molecular transducer activity, and transcription regulator activity (Table S17). KEGG enrichment analysis showed enrichment of the calcium signaling pathway, the cGMP-PKG signaling pathway, fatty acid biosynthesis, signal transduction, energy metabolism, and terpenoid metabolism (Table S18).
To infer the whole-genome duplication (WGD) event in I. galbana, we calculated the synonymous substitution rate (Ks) by a mixture model implemented in the R package. 2149, 1516, 7038, 1460, and 1623 genes were used to calculate the Ks value for C. reinhardtii, C. tobinii, E. huxleyi, IgFH as a candidate gene could play a key role in Fx biosynthesis of I. galbana The metabolic processes of violaxanthin, neoxanthin, diadinoxanthin, and Fx may be very complicated, which involved oxidation, isomerization, acetylation, deepoxidation, hydrogenation, and hydroxylation chemical reactions according to the structure of substrates or products. To efficiently identify candidate genes involved in Fx biosynthesis, we identified 39 hydroxylase genes by a combination of direct screening of the genome assembly annotations and conserved domain BLAST searches. One of the 39 hydroxylase genes catalyzed the hydroxylation of hydrophobic substrates, which has a similar chemical reaction according to the structure of diadinoxanthin, neoxanthin, and Fx, suggesting that the Fx hydroxylase gene (IgFH, IZ011859) might have a function similar to that of the characterized neoxanthin-Fx or diadinoxanthin-Fx as candidate genes ( Figure 2D and E, Figure S8). We could not detect Fx or new products after incubating recombinant IgFH proteins with diadinoxanthin by enzyme activity assay. However, overexpression and enzyme activity assays confirmed that neoxanthin could be further catalyzed by the IgFH proteins, implying that IgFH could play a key role in Fx biosynthesis ( Figure S9).
Two distinct interconversion cycles of zeaxanthin to violaxanthin [violaxanthin de-epoxidase (VDE) and ZEP] and diatoxanthin to diadinoxanthin [diadinoxanthin deepoxidase (DDE) and diatoxanthin epoxidase (DEP)] containing epoxidase and de-epoxidase were involved in the same type of catalytic reaction. We identified 6, 16, 4, and 1 epoxidase family proteins in I. galbana, Arabidopsis thaliana, P. tricornutum, and C. reinhardtii, respectively. The number of epoxidase gene proteins from I. galbana (6) was lower than that of A. thaliana (16), but was close to that of P. tricornutum (4) (Figures S10-S12). Additionally, we constructed a phylogenetic tree based on the identified deepoxidase proteins from 15 amino acid sequences of I. galbana (5), A. thaliana (4), P. tricornutum (5), and C. reinhardtii (1). The number of de-epoxidase gene families in I. galbana (5) was significantly higher than that in C. reinhardtii (1). We speculated that the presence of most epoxidase and de-epoxidase gene proteins could play a role in the Fx metabolic synthesis of I. galbana (Figures S13-S15). . The number of up-or down-regulated TF genes at different cultivation time points in the treated group is also given in the parenthesis. B. Percentage of up-and down-regulated genes in different TF gene families in the treated group on 7d. C. GO analysis of DEGs (biological process) at different cultivation time points in the control and treated groups. D. Metabolic pathways with differential expression profiles in treated group as compared with control group on 7d. DEGs between the treated group and control group on 7d were loaded into the MapMan software. Red and blue colors indicate high and low expression, respectively. TF, transcription factor; GO, Gene Ontology ; DEG, differentially expressed gene; C3d, control group on day 3; T3d, treated group on day 3; C5d, control group on day 5; T5d, treated group on day 5; C7d, control group on day 7; T7d, treated group on day 7; C9d, control group on day 9; T9d, treated groupon day 9; CHO, carbohydrate; TCA, tricarboxylic acid cycle; OPP, pentose phosphate pathway.
Fx-related genes of I. galbana showed high expression levels under green light condition Among the annotated genes, 12,093 (96.74%) genes were expressed in 24 samples. Some genes were highly expressed on day 7 (7d) with green light ( Figure S16), which has a similar trend to the phenotype of Fx content ( Figure S17). To explore the differentially expressed genes (DEGs) involved in the Fx biosynthesis, DEGs of pairwise comparisons between control and treatment were analyzed (e.g., control 3d vs. treated 3d, control 5d vs. treated 5d). The number of stage-specific genes [fragments per kilobase of exon per million fragments mapped (FPKM) ! 8000] varied from 646 to 807 for the control group and 448 to 802 for the treated group ( Figure S18). The number of stage-specific genes in the treatment group changed little on 7d, but decreased significantly on 9d. The number of stage-specific genes in the control group fluctuated on 7d and increased sharply on 9d ( Figure S18). These results indicate that 7 days would be an important period for Fx biosynthesis. GO enrichment analyses of these stage-specific genes between the control and treated groups showed a representation of genes associated with various biological regulations, carboxylic acid biosynthetic process, fatty acid metabolic process, stimulus response, and catalytic process ( Figure S19).
In total, 3730 genes exhibited significantly higher expression, and 4089 genes exhibited significantly lower expression at different stages in the treated group than in the control group ( Figure 3A). Among these comparisons, the downregulated and up-regulated expression between the control and treated groups was greatest on 5d and 7d, indicating a difference in the transcription levels at the 5d and 7d stages with green light irradiation. Some TFs also exhibited a significant difference between the control and treated groups. For example, the members of the MYB, homeobox (HB), and homeobox-leucine zipper (HD-ZIP) families involved in pigment accumulation and resistance stress showed significantly higher expression ( Figure 3B). Fatty acid elongation, steroid biosynthesis, signaling, stimulus response, and cell development were enriched in the DEGs, particularly at the 7d stage with green light irradiation ( Figure 3C). To explore the metabolic pathways responsible for the differences between the control and treated groups, we analyzed the expression profiles of DEGs using the MapMan tool. We found that the genes involved in lipid metabolism, light reactions, and pyruvate oxidation were more active in the treated group at the 7d stage, suggesting higher energy and more synthetic substrates for the metabolism of terpenoids and the b-carotene pathway ( Figure 3D). With regard to MYB proteins, 52 and 146 MYB proteins were identified in I. galbana and A. thaliana, respectively ( Figure 4A, Figure S20). R2R3-MYB TFs are related to the biosynthesis of pigment, suggesting a close relationship with the accumulation of Fx in I. galbana under light-induced conditions. Among the 114 TF genes with a relatively high expression, we found that IgMYB98 (IZ007092) encodes an R2R3-MYB TF, which is significantly down-regulated (P = 1.99EÀ9) in the synthesis of Fx and may be a key gene for negative regulation of Fx biosynthesis in I. galbana under light-induced conditions ( Figure 4B; Table S19).
Green light had a significant effect on the metabolism of carotenoids in I. galbana In order to understand the biosynthesis pathways of the Fx accumulation under different light qualities (white and green), we compared the carotenoids beween the control and treated groups on 7d (C7d vs. T7d). Fifteen carotenoids were identified in the comparison of C7d vs. T7d groups, including three types of carotenes, carotenoid esters, and xanthophylls ( Figure 4C; Table S20). Phytoene, f-carotene, neurosporene, lycopene, ccarotene, violaxanthin, and neoxanthin involved in carotenoid biosynthesis were not detected between the T7d and C7d groups, indicating that these types of carotenoids may be prone to degradation or rapid conversion in I. galbana. The main carotenoids that accumulated in the T7d group were bcarotene, echinenone, violaxanthin myristate, zeaxanthin, and b-cryptoxanthin, among which b-carotene was the main carotenoid, accounting for 79.09% of the total carotenoid. Echinenone had the second-highest content in the T7d and C7d groups, accounting for 8.65% and 7.17% of the total carotenoid content, respectively. Heatmap analysis revealed that the content of carotenoids in the T7d group was significantly higher than that in the C7d group, including b-carotene (1.64-fold increase), lutein-myristate (1.56-fold increase), bcryptoxanthin (1.29-fold increase), capsanthin (2.43-fold increase), and zeaxanthin (2.67-fold increase) ( Figure 4D, Figure S17B). We identified eight differentially accumulated carotenoids (DACs) in C7d vs. T7d, including seven up-regulated DACs and one down-regulated DACs. The number of upregulated DACs was much higher than that of downregulated DACs in the comparison of T7d vs. C7d, suggesting the abundant diversity of carotenoids present under green light. Notable increases in carotenoids from the C7d to T7d samples included those in e-carotene (2.28-fold increase, P = 0.008), violaxanthin-laurate (3.50-fold increase, P = 0.003), violaxanthin-myristate (2.58-fold increase, P = 0.008), antheraxanthin (2.86-fold increase, P = 0.02), capsanthin (2.56-fold increase, P = 0.001), zeaxanthin (2.67-fold increase, P = 0.002), and Fx (2.14-fold increase, P = 0.009) (Table S21). These results suggest that green light has a significant effect on the metabolism of carotenoids in I. galbana.

Gene co-expression network involved in Fx accumulation
To identify the hub genes, we performed weighted gene co-expression network analysis (WGCNA) for the control and treated groups on 7d, separately. A total of 25 modules (comprising 31-2830 genes) were identified in the control group, and 24 modules (comprising 30-3333 genes) were recognized in the treated group ( Figure 5A and B, Figure S21). Notably, the red co-expression module of the control group and turquoise co-expression module of the treated group showed a relatively high correlation (r ! 0.60) with Fx content ( Figure 5C and D). GO and KEGG pathway enrichment analyses of relatively higher correlation modules highlighted key DEGs and biological processes with Fx content (Figure 5E and F). For example, the GO and KEGG analyses showed that the red module of the control group included most of the genes involved in metabolic processes, stimulus response, biological regulation, biosynthetic process, biosynthesis of secondary metabolites, fatty acid biosynthesis, and carotenoid biosynthesis ( Figures S22 and S23). The turquoise module associated with green light irradiation for Fx content showed enrichment of GO terms and KEGG pathways related to biological processes, metabolic processes, biosynthetic processes, catabolism processes, metabolic pathways, biosynthesis of secondary metabolites, and carbon metabolism (Figures S24 and  S25). Next, we studied the preservation of co-expression modules between the control and treated groups ( Figure S26). We identified a midnight-blue module (35 genes) between the control and treated groups, and the genes of this module were enriched in metabolic processes, negative regulation of biological process, and stimulus response ( Figure S26C and D). Taken together, hub gene analysis identified f-carotene desat-urase (IgZDS, IZ006629), phytoene desaturase (IgPDS, IZ009969), and Fx-chlorophyll a (IgLHCX2, IZ013244) in the red, turquoise, and midnight-blue modules, which are involved in the biosynthetic pathway of b-carotene (Fx synthesis).
Accumulation of b-carotene, antheraxanthin, zeaxanthin, and Fx showed a strong correlation with the expression of genes related to carotenoid biosynthesis To explore the relationship between genes and metabolites involved in Fx synthesis under different light qualities (white   and green), the pathway of DEGs and DACs related to Fx was constructed ( Figure 6). Genes involved in the Fx biosynthesis pathway exhibited very high expression levels in the treated group on 7d according to the transcriptome data. Of these, nine DEGs were up-regulated in the T7d group compared to the C7d group, including IgPSY (IZ005980), IgPDS (IZ009969), IgZDS (IZ006629), IgLCYb (IZ013964), IgZEP (IZ006381), IgNSY (IZ013964), IgDDE (IZ004535), IgDEP (IZ003702), and IgFH (IZ011859), while IgCRTISO (IZ001261), IgCHYB (IZ010839), and IgVDE (IZ007819) were down-regulated. There were both band e-branches of carotenoid biosynthesis in the comparison of C7d vs. T7d, and an abundance of b-carotene as well as small amounts of antheraxanthin, zeaxanthin, and Fx. The content of one carotene (ecarotene), two carotenoid esters (violaxanthin-laurate and violaxanthin-myristate), and four xanthophylls (antheraxanthin, capsanthin, zeaxanthin, and Fx) increased, and the content of zeaxanthin-palmitate decreased with green light induction. Taken together, b-carotene, antheraxanthin, zeaxanthin, and Fx involved in Fx biosynthesis were found to be accumulated and up-regulated by green light induction, which showed a trend similar to that of IgPSY, IgLCYb, IgNSY, IgDDE, IgDEP, IgFH, IgMYB98, IgZDS, IgPDS, and IgLHCX2 ( Figure 6). The results showed that the upregulation of these genes in the T7d group led to the enhancement of the Fx biosynthesis pathway. Therefore, we hypothesized that green light can enhance Fx and b-carotene in the carotenoid pathway. Four unigenes (IgMYB98, IgZDS, IgPDS, and IgLHCX2) were selected for expression analysis ( Figure S27).

Discussion
We reported a 92.59-Mb high-quality genome assembly of I. galbana with the contig N50 and scaffold N50 both of 6.99 Mb. Moreover, the contig N50 of the assembled genome was 15. 68 folds higher than that of prior short-read assemblies (6.99 Mb vs. 419 kb) [16]. These results provide the foundation for studying the regulation of Fx accumulation in Haptophyta and potential applications for other Fx-producing algae. Phylogenetic and collinearity analyses showed that E. huxleyi, I. galbana and C. tobinii as monophyletic groups share a common Haptophyta ancestor. Comparative analysis provided evidence of the WGD event, which was dated at $ 245 MYA and earlier than the divergence time of E. huxleyi and I. galbana ($133 MYA). In I. galbana, most genes involved in metabolic regulation have a relatively conserved structure or function, and a few TF gene families have even expanded a subset of duplicates. For example, overexpression of MYB7 could activate the promoter of the lycopene-b-cyclase (AdLCY-b) gene in the carotenoid biosynthesis pathway of kiwifruit, altering the content of carotenoid and chlorophyll [21]. The MYB gene family (IgMYB98) maintained a higher number or expansion in Haptophyta and Bacillariophyta, and exhibited higher transcriptional activity, indicating that the latest algae rich in carotenoid linked to responses to water stress environmental stimuli exhibit lineage-specific gene expansions in environmental adaptation and metabolic regulation. Notably, lipid metabolism-related genes exhibit significant expansion, including IgPLMT, IgOAR1, and IgDEGS1. These results indicate that the expansion of lipid metabolism-related genes in I. galbana could enhance the regulation and biosynthesis of Fx.
Although Fx plays an important role as a photosensor of blue-green light and an effector of carotenoid-dependent, the metabolic pathway of Fx remains unclear, and there are many unknown steps in the process of violaxanthin to Fx [3,22]. We identified one domain from I. galbana genome by comparing with the hydroxylase function domain and predicted the IgFH gene, which is closely related to the chemical reaction according to the structure of diadinoxanthin and Fx, suggesting that it might function similarly to the characterized diadinoxanthin and Fx as a candidate gene. The increase in Fx content between the control and treated groups indicated that 7d with green light would be an important period for Fx biosynthesis. GO enrichment showed that these stage-specific genes in the control and treated groups were related to various biological regulation and fatty acid metabolic processes. Several TFs have been implicated in carotenoid accumulation; however, the members of MYB, bHLH, and HB families involved in pigment accumulation and resistance stress showed differential regulation response to the different light qualities [23][24][25]. We performed WGCNA analysis to identify gene modules. DEGs and TFs were significantly correlated with Fx synthesis. These results suggested that the identified TFs may be related to the accumulation and regulation of Fx production in I. galbana by the induction of green light. Transcrptiomic data suggest that transcriptional profiling and phenotypic data methods can be beneficial to identify the most promising candidate genes involved in the Fx biosynthesis.
Studies have shown that multiple key genes are related to the Fx biosynthesis pathway, including PYS, PDS, ZISO, ZDS, CRTISO, and LCYb [10,11,14]. Comprehensive analysis of multi-omic data helps reveal the underlying accumulation of carotenoids [26][27][28][29]. For example, Jia et al. revealed the molecular mechanism of white petal color in Brassica napus by metabolomic and transcriptomic analyses, mining several candidate genes involved in carotenoid biosynthesis (BnWRKY22 and BnNCED4b) [27]. Xia et al. found that DEGs and DACs involved in carotenoid biosynthesis were significantly upregulated and accumulated more in yellow flower petals than in the green bud petals and white flower petals, indicating a predominantly promotion function for color transition in Lonicera japonica [29]. Thirteen genes (PSY1, PSY2, PDS1, PDS2, ZDS, CYCB, LCYB1, LCYB2, LCYE, CHYB, LUT1, VDE, and ZEP) were related to the carotenoid biosynthesis, which were strongly correlated with the changes in lycopene, bcarotene, and b-cryptoxanthin, providing an insight into controlling fruit color in papaya fruit [28]. Up-or downregulated DEGs involved in the carotenoid biosynthetic pathway greatly affect the content of trans-b-carotene, trans-bcryptoxanthin, and 5,8-epoxy-b-carotene, resulting in a striking difference between peel and flesh tissue during on-tree loquat development [26]. Although most studies on the carotenoid biosynthesis have focused on the color transition of fruits and flower petals, combined metabolomic and transcriptomic analyses of carotenoid biosynthesis in I. galbana have not been reported yet. In this study, we identified 12 DEGs and 4 DACs involved in Fx biosynthesis by metabolomic and transcriptomic analyses. Notable increases in carotenoids involved in Fx biosynthesis in the T7d group compared to the C7d group included b-carotene, antheraxanthin, zeaxanthin and Fx, suggesting the abundant diversity of carotenoids present under green light. LCYb catalyzes the formation of b-carotene and its oxides from lycopene, which is a key step in the synthesis of bcarotene [28]. ZEP plays a key function in the xanthophyll cycle of plants, catalyzing the conversion of zeaxanthin to antheraxanthin and violaxanthin [30]. NXS catalyzes the conversion of the double-epoxidation precursor violaxanthin into lutein with equilibrated double bonds, representing the classic end of the formation of plant xanthophyll [31]. Therefore, we hypothesized that green light can accumulate b-carotene, zeaxanthin, and Fx by activating the xanthophyll cycle process in the Fx pathway. The results of genome and transcriptome indicate how the genome of I. galbana provides a useful model for studying the evolution of Fx-producing algae and the mechanism of Fx biosynthesis.

Conclusion
In summary, we report a high-quality genome of I. galbana LG007 by using the PacBio Sequel platform and Hi-C technology. Domain identification of a novel gene that encodes neoxanthin-Fx hydroxylase was analyzed. Fx content could be increased under green light condition, which is a special simulating factor that occurs during the cultivation of I. galbana. Metabolomic analysis indicated that the T7d group accumulated a higher content of carotenoids than that of the C7d group, and b-carotene was the main carotenoid, accounting for 79.09% of the total carotenoids. Multi-omic analysis revealed several DEGs (including TF genes) significantly correlated with the accumulation and regulation of Fx synthesis, including IgMYB98, IgZDS, IgPDS, and IgLHCX2. Therefore, our findings advance the understanding of Fx biosynthesis and its regulation, providing an important resource for food and pharmaceutical applications.

Materials and methods
Sample materials and genome sequencing I. galbana LG007 was separated from the near sea area of Chuanshi Island in Fujian and deposited with the Southern Institute of Oceanography, Fujian Normal University, China. The seawater used for culture was collected from the near sea area of Chuanshi Island in Fujian, with a salinity of 28‰. The algae were cultured in 100 ml f/2 medium and incubated at 23°C ± 1°C by shaking the bottle manually 6 times per day under continuous light of 100 lmol photonsÁm À2 Ás À1 with fluorescent lamps [32]. Genomic DNA was prepared using the cetyltrimethylammonium bromide (CTAB) method to construct the Pacbio and Illumina libraries. Concentrated DNA was applied to select size with BluePippin; they were repaired, tailed, adaptor-ligated, and used for library construction in accordance with the protocol released by PacBio. Next, $ 15.5 Gb of clean data were obtained from the PacBio sequencing, and used to estimate genome size.

Genomic size estimation
The BD FACSCalibur cytometer (Becton Dickinson, San Jose, CA) was used to estimate the genome size of I. galbana LG007, which was calculated as a ratio of the average fluorescence. We further used Illumina short reads and a K-mer-based method to estimate the genome size, heterozygosity, and repeat content of I. galbana LG007. Approximately 8.92 Gb of Illumina data were generated and used to calculate the abundance of 17-mers by GenomeScope software (version 2.0) (Table S1).

Genome assembly and completeness assessment
After removing the low-quality short reads and sequencing adaptors, the clean data were corrected, trimmed, and assembled by using Canu software with default parameters [33]. For improving the accuracy of base-pair correction, preliminary assembled contigs were polished by the BWA and Pilon software using $ 8.92 Gb Illumina data [34]. Summary statistics of the assembled genome are presented in Table S1. Assessment of the completeness of I. galbana LG007 genome was evaluated through BUSCO using eukaryotic models [35]. Illumina short reads and PacBio long reads were properly mapped to the genome via Bowtie2 [36] and Minimap2 [37], respectively.

Superscaffold construction using Hi-C technology
The nuclear integrity of samples was examined by 6diamidino-2-phenylindole (DAPI) staining to guarantee the quality of the Hi-C procedure [38,39]. After filtering adapter sequences and low-quality pair-end reads, $ 12.35 Gb of clean data were generated (Table S1). The Hi-C clean data were properly mapped to the I. galbana LG007 by BWA (version 0.7), and then erroneous mappings and duplicates were filtered by the Juicer pipeline [40]. The output of the Juicer pipeline was used for 3D-DNA analysis with default parameters, including misjoin correction, ordering, and orientation [41]. To ensure the accuracy of assembly, the assembled contigs combined with Hi-C data were ordered and clustered into the superscaffolds by using LACHESIS based on the relationships among valid reads [39], and then the invalid read pairs were filtered by HiC-Pro (version 2.7.8) [42].
Gene and repetitive sequence annotation LTR_FINDER, Tandem Repeats Finder, and RepeatMasker were used to identify the repeat sequences in the I. galbana LG007 genome, as previously described [43,44]. We then performed annotation of the I. galbana LG007 genome assembly by combining the homology-based search, de novo prediction, and transcriptome evidence. E. huxleyi, C. tobinii, P. tricornutum, E. siliculosus, and C. reinhardtii were selected to perform the homology annotation. We predicted the coding genes with MAKER pipeline (version 2.31.9) by using transcript sequences from RNA sequencing (RNA-seq) [45]. The protein-coding genes were compared to the content of egg-NOG, GO, COG, and KEGG by using BLASTP with an Evalue cutoff of 1EÀ5 [46][47][48]. Non-coding RNAs (ncRNAs) and small RNAs were identified by searching from the Rfam and microRNA (miRNA) databases, respectively [49]. In addition, other types of ncRNAs, including miRNA and snRNA, were predicted by alignment to the Pfam database using Infernal software.

Genome evolution analysis
Single-copy genes were identified among 15 genomes by using OrthoFinder and downloaded from the National Center for Biotechnology Information (NCBI) database, including E. huxleyi, C. tobinii, P. tricornutum, E. siliculosus, C. reinhardtii, Pennisetum purpureum, Chloropicon primus, Bathycoccus prasinos, Porphyra umbilicalis, Gracilariopsis chorda, Cyanidioschyzon merolae, R. subcapitata, T. pseudonana, and T. oceanica [50]. Based on the identified single-copy protein sequences, a phylogenetic tree was constructed by using RAxML software with P. purpureum, P. umbilicalis, G. chorda and C. merolae as the outgroup [50]. The divergence time of each tree node was calculated using the TimeTree database and the MCMCtree software. We used CAFÉ software (version 3.1) to identify the expansion and contraction of gene families with the criterion of P < 0.05 [51]. GO terms for genes were obtained from the corresponding InterPro or Pfam entries. KEGG terms were assigned to the KEGG pathway database (https://www.genome.jp/kegg). Enrichment analyses of KEGG pathways and GO terms were performed using the OmicShare tools (https://www.omicshare.com/tools).

Identification of candidate genes related to Fx pathway in I. galbana
The identification of orthologs of previously known functional genes in the Fx pathway was performed by combining the results of the genome assembly annotations, transcriptional expression levels, and conserved domain BLAST searches. The orthologs and previously known functional genes in the Fx pathway exhibited identity scores > 85%, suggesting that these genes are functionally similar and can be used as candidate genes involved in Fx biosynthesis. For gene family analysis related to Fx pathway in I. galbana, BLASTP and HMMER were used to search for homologous proteins of related gene families in I. galbana LG007 (E-value < 1EÀ10), and then homologous proteins were further confirmed using the NCBI conserved domain database tool [52]. The final deduced homologous protein sequences were aligned by using the ClustalW software [53]. RAxML software was used to construct a phylogenetic tree via the maximum-likelihood method with 1000 bootstrap iterations [54].

Analysis of WGD and gene synteny
For detecting the polyploidization events in the I. galbana LG007 genome, the protein sequences from I. galbana LG007 were intercompared to identify conserved paralogs by using BLASTP with an E-value 1EÀ5. E. huxleyi, C. tobinii, P. tricornutum, and C. reinhardtii were also analyzed and used for comparison. We identified the collinear blocks by using MCScanX and calculated the nonsynonymous substitution rate (Ka), Ks, and Ka/Ks values for syntenic gene pairs by using KaKs_Calculator software (version 2.0) [55,56]. Syntenic blocks between I. galbana LG007, E. huxleyi and C. tobinii were identified by using MCScanX [55].

RNA-seq
Our previous results suggested that the green light could promote Fx synthesis at the 7d stage (P < 0.05, 14.06% higher) ( Figure S2). To investigate the transcriptome dynamics and response of Fx accumulation under different light qualities in I.galbana LG007, we performed transcriptomic analysis of the simulated cells under white and green light conditions at different stages of cultivation. I. galbana LG007 was cultured in 100 ml f/2 medium and incubated at 23°C ± 1°C by shaking the bottle manually 4-6 times per day under continuous light of 100 lmol photonsÁm À2 Ás À1 with a 12 h:12 h light:dark cycle [33]. The culture (1 Â 10 6 cells/ml) was evenly divided into eight groups and cultured in a spectrum-adjustable plant growth box (Catalog No. AKF-KYG04-600DZ, Anhui Ancorgreen Optoelectronics Technology, Hefei, China) for 3 days, 5 days, 7 days, and 9 days, respectively ( Figure S16A). Four treated groups (T3d, T5d, T7d, and T9d) were treated with green light irradiation 100 lmol photonsÁm À2 Ás À1 [green light source: LED circular lamp beads (Catalog No. SZG05A0A, Seoul Semiconductor, Siheung-si, Korea); spectrum: 525 nm], and four control groups (C3d, C5d, C7d, and C9d) with a white light of 100 lmol photonsÁm À2 Ás À1 [white light source: LED circular lamp beads (Catalog No. LH351H-D, Samsung LED, Tianjin, China); spectral range: 400-700 nm] were used as controls. Total RNA from each sample was extracted using a TransZol Up Plus RNA Kit (Catalog No. ER501-01, TransGen Biotech, Beijing, China), and the corresponding cDNA library was constructed for RNA-seq.

Gene expression analysis
Approximately 307.77 Gb of high-quality transcript data were produced and processed by Trimmomatic (version 0.36). The high-quality filtered reads were mapped onto the genome by using HISAT2 with the default parameters [57,58]. FPKM values were calculated using Stringtie and Ballgown [59,60]. DEGs between the control and treated groups were analyzed using DESeq2 based on criteria of fold change ! 1 and false discovery rate 0.05 [61], followed by KEGG and GO enrichment analyses. The ratio of each sample to the genome is more than 90%, and the number of reads per sample was estimated to range from 32,792,958 to 59,111,858 (Tables S22 and S23).

Metabolite profiling and statistical analysis
To explore the metabolites of I. galbana LG007 under white and green light conditions, we collected C7d and T7d samples with three biological replicates. Stock solution of Fx was prepared by dissolving 0.5 mg Fx in 50 ml methanol solution. Stock solutions of Fx standard were gradient diluted as follows: 5 lg/ml, 10 lg/ml, 20 lg/ml, 50 lg/ml, and 100 lg/ml. Fx production was detected according to the linear relationship between the peak areas of the samples and a standard curve (R 2 = 0.999). According to the aforementioned method of I.galbana LG007 fermentation, 10 ml of the fermentation mixture was centrifuged at 8000 r/min for 20 min, followed by removal of the supernatant and washing with distilled water for three times. After vacuum freeze-drying, 1 ml of acetone was added to the freeze-dried algae to extract the total carotenoids. The supernatant was harvested by centrifugation (8000 r/min, 15 min) and filtered (0.25-lm filter membrane). The supernatant was analyzed by high-performance liquid chromatography (HPLC) using a Waters e2695 Liquid