Identification of 74 cytochrome P450 genes and co-localized cytochrome P450 genes of the CYP2K, CYP5A, and CYP46A subfamilies in the mangrove killifish Kryptolebias marmoratus

The mangrove killifish Kryptolebias marmoratus is the only vertebrate that reproduces by self-fertilization and is an important model species in genetics and marine ecotoxicology. Using whole-genome and transcriptome sequences, we identified all members of the cytochrome P450 (CYP) family in this model teleost and compared them with those of other teleosts. We identified a total of 74 CYP genes and one pseudogene in K. marmoratus. Phylogenetic analysis indicated that the clan 2 CYP genes were the most expanded. Synteny analysis with other species showed orthologous relationships of CYP subfamilies among teleosts. In addition to the CYP2K expansion, we observed five tandemly duplicated copies of CYP5A. These features were unique to K. marmoratus. These results shed light on CYP gene evolution in fish, particularly in the co-localized CYP2K, CYP5A, and CYP46A subfamilies. Future studies of CYP expression may identify the specific endogenous and exogenous environmental factors that drove tandem CYP duplication in K. marmoratus.


Background
Cytochrome P450 (CYP) enzymes are heme-containing proteins. They play critical roles in the metabolism of endogenous substrates (e.g., hormones and vitamins) and in the detoxification of xenobiotics (e.g., drugs and environmental pollutants) [1][2][3][4][5]. Together, the CYPs constitute one of the most diverse gene families, and different species, even closely related ones, can have different numbers of CYP genes [6,7]. CYP genes are classified hierarchically into subfamilies, families, and clans based on amino acid sequence similarity, phylogenetic relationships, and syntenic relationships [6][7][8]. Molecular phylogenetic studies have identified ten CYP clans and 19 families in vertebrates [6,7,9]. CYP genes in families 1 to 4 are mainly involved in xenobiotic metabolism. They are more diverse and less conserved in sequence than the other CYPs [10,11], whereas CYP genes in families 5 to 51 mainly have endogenous functions. Many studies of CYP families 1 to 4 have focused on ecotoxicological model species, including teleosts [1,3,12]. Zebrafish and Japanese medaka are the teleosts most commonly used to study the mechanisms by which CYPs respond to chemical compounds. Studies in these species have shown that CYP responses can signal the presence of carcinogenic and endocrine-disrupting substances in aquatic ecosystems [13].
Kryptolebias marmoratus is the only vertebrate that reproduces by self-fertilization. It is a useful laboratory fish for molecular ecotoxicology because it is only 3–5 cm long, has a life cycle of just 12–16 weeks, and is easily maintained in aquaria [20]. Its entire genome has been sequenced [21][22][23], so as an ecotoxicological model species it provides a platform for assessing the impact of various chemicals on the marine environment. A previous study identified nine CYP genes co-localized on a single scaffold [24]. It also analyzed their spatio-temporal expression patterns in response to various endocrine-disrupting chemicals (EDCs; e.g., benzo[a]pyrene, bisphenol A, octylphenol, and nonylphenol) [24]. In this study, we identified and annotated the full complement of 74 CYP genes in K. marmoratus. We also analyzed the co-localized CYP2K, CYP5A, and CYP46A subfamilies and characterized their structural features.

Homology of CYP genes in other fish
We used molecular phylogenetic analysis of the inferred amino acid sequences to characterize how K. marmoratus CYP genes relate to those of other intensively studied fish species. These included zebrafish (D. rerio), Japanese medaka (Oryzias latipes), and fugu (F. rubripes) (Fig. 2). The phylogenetic tree indicated that the clan structure was robust among these fish species, and the clan 2 CYP genes showed the greatest expansion in K. marmoratus (Fig. 2). The K. marmoratus CYP genes fell into subfamilies similar to those of zebrafish, except that CYP39, CYP2AA, and CYP2AE were absent from K. marmoratus (Fig. 3). For the CYP1, CYP17, CYP19, CYP20, CYP21, and CYP46 families, the gene members and their structures in K. marmoratus were similar to those in zebrafish, although the degree of sequence similarity varied. The CYP2R and CYP2U subfamilies each contain a single gene of five exons (CYP2R1 and CYP2U1, respectively). These genes can be considered orthologs of CYP2R1 and CYP2U1 in humans and other fish [12,15,25]. CYP1A, CYP1B, CYP2U, and CYP2R appear to be evolutionarily conserved across species. In K. marmoratus, as in zebrafish, the CYP26 family consists of CYP26A1, CYP26B1, and CYP26C1. CYP26A1 and CYP26C1 have similar gene structures in both species. However, zebrafish CYP26B1 has six exons and K. marmoratus CYP26B1 has seven. This is because the sequence corresponding to the 3rd zebrafish exon is split into the 3rd and 4th exons in K. marmoratus. The CYP2 family is the largest in K. marmoratus, with 32 genes in nine subfamilies. Nine of these genes belong to three CYP2 subfamilies: CYP2N22, CYP2N23, CYP2AD12, CYP2AD-iso, CYP2P16, CYP2P17, CYP2P18, CYP2P19, and CYP2P20. They are homologous to human CYP2J2, because phylogenetic analysis grouped them into a clade with the zebrafish CYP2 subfamilies CYP2N, CYP2P, CYP2V, CYP2AD, and CYP2AE (Additional file 2: Figure S2).
All nine genes have been reported to lie in tandem on a single scaffold (NW_016094248) and to share synteny with 11 zebrafish genes [24]. Four CYP2X genes are present on two separate scaffolds. Apart from CYP2R1 and CYP2U1, the other CYP2 members in this species share one gene structure, and the CYP2X subfamily differs from it. CYP2X genes have 11 exons instead of 9 (Table 1) because the 5th and 7th exons are each split into two. CYP2X25 is located on scaffold NW_016096522. The other three CYP2X genes (CYP2X27, CYP2X24, and CYP2X26) lie in tandem on scaffold NW_016094701 (Fig. 1). Based on their sequence identity and the phylogenetic analysis, we expected all four genes to lie on the same scaffold. The best mapping position of CYP2X25 was on scaffold NW_016096522, but its second-best position was the region containing CYP2X26. This is likely because the two proteins share 86% amino acid sequence similarity and the genes share 90% nucleotide sequence identity. The gaps in the region spanning the CYP2X26 gene on scaffolds NW_016094701 and NW_016096522 were relatively short, so we suspected an assembly error in this region. To test this, we mapped the four CYP2X genes onto the published genome scaffolds of another killifish strain with a higher number of contigs [22]. Unfortunately, only two CYP2X genes (CYP2X24 and CYP2X25) mapped together onto one scaffold. Still, CYP2X25, which sits on a separate scaffold in our assembly, mapped to the same scaffold as one of the other CYP2X genes, and that scaffold was mapped back onto the … Based on this analysis, the separate location of CYP2X25 more likely reflects an assembly error than a translocation.
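The 86% and 90% values above are pairwise percentages derived from sequence alignments. The function below is a minimal sketch of computing percent identity. It assumes a gapped pairwise alignment is already available. The choice of denominator is also an assumption, since tools differ; here only columns where neither sequence has a gap are counted. Amino acid similarity additionally counts conservative substitutions, which this identity-only function does not. The function name and toy sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Gap characters ('-') are skipped, so the denominator is the number of
    columns where both sequences have a residue (one of several conventions).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments (invented, not CYP2X sequences)
print(percent_identity("MKL-AVLLG", "MKLQAVILG"))  # 87.5
```

Dividing by the full alignment length instead would give a lower value whenever gaps are present. Reported identities are therefore comparable only when the same convention is used.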

Tandem duplicated CYP genes
As in other animals, several CYP genes in the K. marmoratus genome are tandemly duplicated. To characterize the duplication pattern, we examined the genomic regions containing tandemly duplicated genes among the 74 K. marmoratus CYP genes. Eight scaffolds contained more than two copies of tandemly duplicated CYP genes, and five of these had more than four copies (Fig. 1). In the CYP2K subfamily, ten CYP2K genes (CYP2K39, …) are clustered on scaffold NW_016094323 (Figs. 1 and 4). Synteny analysis revealed that zebrafish have eight CYP2K genes clustered in a homologous 116-kb region. T. rubripes and O. latipes have only two CYP2K copies each, in 9-kb and 10-kb regions, respectively (Fig. 4a). Four CYP2K genes form another cluster on scaffold NW_016094341 (Fig. 4). Phylogenetic analysis of fish CYP2K genes used human genes as the outgroup. It showed that these four CYP2K genes are similar to medaka CYP2KP29 and CYP2K30, which are located on chromosome 24 (Fig. 5). Synteny analysis of this region did not identify homologous genes outside the clusters in any species (Fig. 4).
The CYP5A and CYP46A tandem genes are clustered on scaffolds NW_016094285 and NW_016094252, respectively (Fig. 1). Zebrafish has only one gene in the CYP5A subfamily. K. marmoratus has five (CYP5A1, CYP5A2, CYP5A3, CYP5A4, and CYP5A6), arrayed in tandem on scaffold NW_016094285 (Figs. 1 and 6a). Synteny analysis showed homology with zebrafish chromosome 18 (Fig. 4b). In the CYP46A subfamily, CYP46A1, CYP46A2, CYP46A4, and CYP46A5 are also tandemly duplicated, on scaffold NW_016094252 (Fig. 6b). This region appears to share synteny with D. rerio chromosome 20, Japanese medaka chromosome 24, and fugu chromosome 16 (Fig. 6b). However, the gene order in both K. marmoratus and D. rerio differs in places from that in pufferfish and Japanese medaka.
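Calling a set of genes a tandem cluster amounts to grouping same-subfamily genes that lie next to each other on a scaffold. The sketch below illustrates that grouping under stated assumptions. Gene coordinates are supplied as simple records. Only the supplied (CYP) genes are considered when judging adjacency. A hypothetical 50-kb maximum intergenic gap is used. None of these parameters come from the study, and the gene names and coordinates are invented.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Gene:
    name: str
    subfamily: str
    scaffold: str
    start: int
    end: int


def tandem_clusters(genes, max_gap=50_000, min_size=2):
    """Group same-subfamily genes that are neighbors on the same scaffold.

    Neighbors are judged only among the genes supplied (assumption: a
    CYP-only annotation, so intervening non-CYP genes are ignored).
    max_gap is a hypothetical intergenic distance limit in bp.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g.scaffold, g.start))
    for _, on_scaffold in groupby(ordered, key=lambda g: g.scaffold):
        current = []
        for gene in on_scaffold:
            extends_run = (
                bool(current)
                and gene.subfamily == current[-1].subfamily
                and gene.start - current[-1].end <= max_gap
            )
            if extends_run:
                current.append(gene)
            else:
                if len(current) >= min_size:
                    clusters.append(current)
                current = [gene]
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


# Invented coordinates for illustration only.
genes = [
    Gene("cyp5a_a", "CYP5A", "scaffold_1", 1_000, 3_000),
    Gene("cyp5a_b", "CYP5A", "scaffold_1", 5_000, 7_000),
    Gene("cyp5a_c", "CYP5A", "scaffold_1", 9_000, 11_000),
    Gene("cyp46a_a", "CYP46A", "scaffold_2", 100, 2_000),
    Gene("cyp2x_a", "CYP2X", "scaffold_2", 3_000, 5_000),
]
for cluster in tandem_clusters(genes):
    print(cluster[0].scaffold, [g.name for g in cluster])
```

A stricter definition would also require that no non-CYP gene lies between neighboring copies. That check needs the full gene annotation, not just the CYP list.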
Given the large gap (~170 kb) between bcl-11 and CYP46A1 in K. marmoratus, we also suspected an assembly error in this region. However, the gene order here was consistent between our assembly and that of Kelley et al. [22]. In pufferfish and Japanese medaka, two tandem CYP46A-like genes are flanked by ccdc85cb and ism2b in the syntenic region. In this region, the CYP46As and neighboring genes, including ccdc85cb, CCNK, and bcl-11, appear to have been inverted, together with an additional duplication of CYP46A copies. These observations alone do not show whether the tandem duplication occurred before or after the inversion. Based on synteny with zebrafish, the duplication probably occurred before the inversion, although the zebrafish region differs slightly in its gene repertoire.
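Whether a syntenic block is collinear or inverted between two genomes can be judged from the relative order of the genes the two regions share. The sketch below classifies shared genes as collinear, inverted, or otherwise rearranged. It uses hypothetical placeholder gene names, not the actual K. marmoratus or zebrafish gene orders. It assumes each gene name occurs once per list, so tandem copies would need distinct identifiers.

```python
def compare_order(reference, query):
    """Classify the order of genes shared by two ordered gene lists.

    Returns "collinear" if the shared genes appear in the same order,
    "inverted" if they appear in exactly reversed order, "rearranged"
    otherwise, and "undetermined" if fewer than two genes are shared.
    Assumes gene names are unique within each list.
    """
    ref_pos = {gene: i for i, gene in enumerate(reference)}
    positions = [ref_pos[gene] for gene in query if gene in ref_pos]
    if len(positions) < 2:
        return "undetermined"
    steps = list(zip(positions, positions[1:]))
    if all(a < b for a, b in steps):
        return "collinear"
    if all(a > b for a, b in steps):
        return "inverted"
    return "rearranged"


# Placeholder gene names, not real synteny data.
species_1 = ["geneA", "geneB", "geneC", "geneD", "geneE"]
species_2 = ["geneA", "geneD", "geneC", "geneB", "geneE"]
block = ["geneB", "geneC", "geneD"]
print(compare_order(species_1, species_2))  # whole region: rearranged
print(compare_order(block, species_2))      # internal block: inverted
```

The example shows why the comparison is made block by block. The region as a whole is not collinear, but the middle block is a clean inversion flanked by conserved genes.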

Comparison of CYP subfamilies in teleosts
Using whole-genome sequences and RNA-seq data, we identified the full complement of CYP genes in the K. marmoratus genome: 74 CYP genes in 17 families within 10 clans. Ten clans and 19 families have been reported in vertebrates [6,7,9]. Of these 19 vertebrate CYP families, we did not identify CYP39 or CYP16 in K. marmoratus. The CYP39 family has recently been identified in teleost fish. Before this discovery, it was thought to be unique to mammals or to have arisen in the tetrapod lineage after it diverged from fish [8]. Goldstone et al. [12] reported CYP39 genes in zebrafish, but CYP39 genes were not found in other published fish genomes, including that of fugu. The CYP16 family was lost in mammals and is also absent from zebrafish. Among published fish genomes, CYP16 has been reported only in fugu [15].

Gene expansion by lineage-specific duplication
CYP genes are commonly expanded by tandem duplication [6,15,26–28], but the mechanisms by which particular genes are selected for such duplication remain unclear. We focused mainly on comparing K. marmoratus CYP genes with zebrafish CYP genes for two reasons. The two species have similar total numbers of CYP genes, and the homology of their CYP genes with all human CYP genes is known (Fig. 3). Phylogenetic and synteny analyses revealed lineage-specific duplication of many CYP genes, which was apparent in several tandem duplications. The K. marmoratus genome contains eight genomic regions with tandemly duplicated CYP genes. Four of them showed lineage-specific duplication, involving the CYP2P, CYP2AD, CYP2K, CYP5A, CYP8B, and CYP46A subfamilies (Figs. 1 and 2). The subfamily members were duplicated in a lineage-specific manner with different copy numbers. Even so, the two species shared the same syntenic regions, including the tandemly duplicated genes (Fig. 3) [24]. Compared with other subfamilies with the same syntenies, the CYP46As of K. marmoratus and zebrafish showed especially strong homology in gene membership and gene structure, although their sequence similarity differed. However, the gene order in the K. marmoratus CYP46A syntenic region differs, suggesting that the two species evolved independently after the tandem duplication of CYP46A. CYP46A1 has been identified in many species, including teleosts, and plays an important role in cholesterol turnover in the vertebrate central nervous system [29]. In humans, CYP46A1 functions as a cholesterol 24(S)-hydroxylase and a 24-hydroxycholesterol hydroxylase [29][30][31].
Fig. 3 Comparison of cytochrome P450 subfamily member homologies among humans, zebrafish, and K. marmoratus. Image is modified from Nelson (2003).
Mutations in CYP46A1 have been associated with neurodegenerative diseases such as Alzheimer's and Huntington's disease in humans [32][33][34][35]. However, the function of CYP46A1 in teleosts has not been studied. The ten CYP2Ks on scaffold NW_016094323 represent the highest level of lineage-specific tandem duplication of any subfamily in K. marmoratus. In contrast, the four CYP2Ks on another scaffold do not appear to have been duplicated in a lineage-specific manner and share synteny with those of zebrafish (Figs. 1 and 4).

Kryptolebias marmoratus-specific gene expansion
Cytochrome P450 enzymes have two main functions: metabolism of endogenous molecules and detoxification of xenobiotic compounds. Phylogenetic studies suggest that CYP genes with endogenous functions are stable across animal species and rarely expand in copy number [11]. In contrast, CYP genes involved in xenobiotic metabolism have been shown to be phylogenetically unstable, with a relatively high rate of birth-death evolution [11,36,37]. In this context, the most apparent lineage-specific tandem duplications in K. marmoratus occurred in two CYP subfamilies, CYP2K and CYP5A. As in other teleost species, CYP2K was the most expanded subfamily in K. marmoratus (Fig. 4). Because CYP2Ks are highly expanded in teleosts and their membership varies across species, the functions of CYP2K genes have received comparatively little attention. CYP2Ks share synteny with human CYP2W1, a tumor-specific CYP that oxidizes indole and chlorzoxazone [38][39][40]. Rainbow trout CYP2K1 and zebrafish CYP2K6 are orthologous, and both metabolize aflatoxin B1 (AFB1) to the carcinogenic exo-8,9-AFB1 epoxide. However, their metabolic features differ somewhat, as only rainbow trout CYP2K1 can metabolize lauric acid [13,41]. CYP2K belongs to clan 2 and to the mainly xenobiotic-metabolizing CYP2 family. Its expansion by extensive tandem duplication may therefore have been driven by the diversity of xenobiotic substrates. Selection may thus have favored tandem duplication as a means of coping with xenobiotic stress.
Kryptolebias marmoratus has five CYP5A subfamily members (CYP5A1, CYP5A2, CYP5A3, CYP5A4, and CYP5A6). Other teleosts, including zebrafish, pufferfish, and channel catfish, have a single copy [8,12,15]. CYP5A1 (thromboxane A2 synthase) catalyzes the conversion of prostaglandin H2 into thromboxane A2 and has been associated with human cardiovascular disease related to platelet aggregation [42]. Rather than metabolizing xenobiotics, CYP5A1 seems to be primarily involved in endogenous functions. Because genes with conserved endogenous functions are rarely expanded, the K. marmoratus-specific expansion of CYP5A is an interesting finding. Gene duplication and subsequent divergence of the duplicated copies are basic mechanisms by which gene subfamilies form. They are considered essential sources of genetic complexity and evolutionary change [43][44][45]. Tandem duplication producing gene clusters appears to be an important route to such diversification of cytochrome P450 genes in various species. Analyzing the expression profiles of the CYP genes expanded specifically in K. marmoratus could provide insight into the endogenous and exogenous environmental factors driving CYP evolution.

Fish rearing
Kryptolebias marmoratus mangrove killifish were reared at the aquarium facility of Sungkyunkwan University (Suwon, South Korea). They were kept in an automated flow-through system with constant water quality (pH 8.0 and 15 practical salinity units [psu]) at 25°C under a 12-h/12-h light/dark cycle. Fish were kept in 20-L glass aquaria, each holding 40 larvae (length ≈ 1.0 ± 0.2 cm, approximately 7 days post-hatching [dph]). They were fed Artemia spp. brine shrimp (<24 h after hatching) once per day.

Genome-wide identification of CYP genes
The assembled K. marmoratus whole-genome (ASM164957v1) and transcriptome (SRX1765072) sequences have been published [23]. We searched the K. marmoratus genome for putative CYP sequences using CYP gene sequences from other teleosts, including zebrafish (D. rerio), Japanese medaka (O. latipes), and pufferfish (F. rubripes) (Additional file 3: Table S1). BLAST analysis of coding sequences was performed to confirm sequence similarities. All CYP gene sequences were obtained by BLASTp searches of the fully assembled transcripts against the NCBI non-redundant (NR) database. A significant hit was defined as one with an E-value ≤ 10⁻⁵. The putative K. marmoratus CYP coding sequences were translated into amino acid sequences. Further annotation was carried out by Prof. David R. Nelson (University of Tennessee Health Science Center) and Dr. Jared V. Goldstone (Woods Hole Oceanographic Institution). Gene structures were determined by comparing the genome scaffolds with the transcriptome sequences. For synteny analysis, we compared the CYP gene clusters in K. marmoratus with those of Japanese medaka (O. latipes), pufferfish (T. rubripes), and zebrafish (D. rerio). Data were collected from the published chromosome assembly information in Ensembl (https://www.ensembl.org/index.html) with further identification.
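The E-value cutoff can be applied directly to BLAST+ tabular output (-outfmt 6), whose 12 default columns end with evalue and bitscore. The sketch below keeps, for each query, the single best hit with E-value ≤ 10⁻⁵, ranking hits by E-value and then by bit score. It is an illustration under assumptions, not the pipeline used in this study. The function name and example identifiers are hypothetical.

```python
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(handle, max_evalue=1e-5):
    """Map each query to its best hit with E-value <= max_evalue.

    Hits are ranked by lowest E-value, then highest bit score.
    Comment lines (starting with '#') and blank lines are skipped.
    """
    best = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue  # not significant under the cutoff
        rank = (evalue, -float(row["bitscore"]))
        query = row["qseqid"]
        if query not in best or rank < best[query][0]:
            best[query] = (rank, row["sseqid"])
    return {query: subject for query, (_, subject) in best.items()}


# Hypothetical hits; identifiers are placeholders.
example = io.StringIO(
    "tx1\thitA\t62.5\t500\t180\t4\t1\t500\t1\t498\t1e-120\t450\n"
    "tx1\thitB\t40.0\t300\t170\t8\t10\t300\t5\t290\t3e-40\t150\n"
    "tx2\thitC\t25.0\t80\t60\t2\t1\t80\t1\t80\t0.01\t35\n"
)
print(significant_hits(example))  # {'tx1': 'hitA'}
```

Query tx2 is dropped because its only hit (E = 0.01) fails the cutoff. Keeping only the best hit per query is a simplifying choice. Annotation workflows often inspect several top hits.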

Phylogenetic analysis
The complete amino acid sequences encoded by the CYP genes of zebrafish (D. rerio; Dr-CYPs) and Japanese medaka (O. latipes; Ol-CYPs) were retrieved from GenBank (Additional file 3: Table S1). Multiple alignments of amino acid sequences from K. marmoratus, Japanese medaka, and zebrafish were performed with the Clustal algorithm [46]. We used maximum likelihood (ML) analysis to select the best-fit substitution model for phylogenetic analysis. The chosen model had the lowest scores under the Bayesian information criterion (BIC) [47] and the corrected Akaike information criterion (AICc) [48,49]. Based on the model test, we chose the LG + Γ + I model to generate a phylogenetic tree in MEGA6 (Center for Evolutionary Medicine and Informatics, Tempe, AZ, USA) [50]. For phylogenetic analysis, full-length protein sequences were aligned and a tree was constructed as described above, with an additional bootstrap test (1000 replicates) [51]. Phylogeny data were deposited in the TreeBASE repository under accession number 22004.
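Model selection by BIC and AICc compares candidate substitution models using three quantities: the maximized log-likelihood ln L, the number of free parameters k, and the sample size n. The scores are BIC = −2 ln L + k ln n and AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1), and the lowest score is preferred. The sketch below applies these formulas. The log-likelihoods and parameter counts are invented for illustration and are not the values from this analysis, which was run in MEGA6. The value n = 600 is also invented and is taken here as the number of alignment sites, which is an assumption about how n is counted.

```python
import math


def aicc(log_likelihood, k, n):
    """Small-sample corrected Akaike information criterion."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_likelihood + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return -2.0 * log_likelihood + k * math.log(n)


def best_models(candidates, n):
    """Return (lowest-BIC model name, lowest-AICc model name)."""
    by_bic = min(candidates, key=lambda m: bic(m[1], m[2], n))
    by_aicc = min(candidates, key=lambda m: aicc(m[1], m[2], n))
    return by_bic[0], by_aicc[0]


# Hypothetical fits: (model name, maximized log-likelihood, free parameters).
# Parameter counts here loosely include branch lengths; values are invented.
models = [
    ("LG", -52000.0, 300),
    ("LG+G", -50500.0, 301),
    ("LG+G+I", -50480.0, 302),
]
print(best_models(models, n=600))  # ('LG+G+I', 'LG+G+I')
```

The two criteria can disagree because BIC penalizes each parameter by ln n, while AICc's penalty grows sharply as k approaches n. A model favored by both, as in this toy example, is the least ambiguous choice.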

Conclusions
In this study, we identified and annotated the full complement of 74 CYP genes in K. marmoratus. We also analyzed the co-localized CYP2K, CYP5A, and CYP46A subfamilies and characterized their structural features.

Additional files
Additional file 1: Figure S1. Diagram of the process of identification of the CYP2K38 pseudogene. (DOC 4051 kb)
Additional file 2: Figure S2. Phylogenetic analysis of CYP genes in various fish species (marine medaka, pufferfish, stickleback, mangrove killifish) and human. (DOC 573 kb)
Additional file 3: Table S1. Accession numbers of genes used for synteny and phylogenetic analysis. (DOCX 19 kb)

Abbreviations
CYP: Cytochrome P450