Comparative transcriptomics of anal fin pigmentation patterns in cichlid fishes

Understanding the genetic basis of novel traits is a central topic in evolutionary biology. Two novel pigmentation phenotypes, egg-spots and blotches, emerged during the rapid diversification of East African cichlid fishes. Egg-spots are circular pigmentation markings on the anal fins of hundreds of derived haplochromine cichlid species, whereas blotches are patches of conspicuous anal fin pigmentation with ill-defined boundaries that occur in a few species belonging to basal cichlid lineages. Both traits play an important role in the breeding behavior of this group of fishes. Knowledge about the origin, homology and underlying genetics of these pigmentation traits is sparse. Here, we present a comparative transcriptomic and differential gene expression analysis of egg-spots and blotches. We first conducted an RNA sequencing experiment in which we compared egg-spot tissue with the remaining, egg-spot-free portion of the fin tissue using six individuals of Astatotilapia burtoni. We identified 1229 differentially expressed genes between the two tissue types. We then showed that the rates of evolution of these genes are higher than the average estimated from whole-transcriptome data. Using quantitative real-time PCR, we found that 29 out of a subset of 46 differentially expressed genes showed an analogous expression pattern in the egg-spots of another haplochromine species, Cynotilapia pulpican, strongly suggesting that these genes are involved in the egg-spot phenotype. Among these are the previously identified egg-spot gene fhl2a, two known patterning genes (hoxC12a and bmp3) as well as other pigmentation-related genes such as asip. Finally, we analyzed the expression patterns of the same gene subset in two species that feature blotches instead of egg-spots, one haplochromine species (Pseudocrenilabrus philander) and one ectodine species (Callochromis macrops), revealing that the expression patterns in blotches and egg-spots are rather distinct.
We identified several candidate genes that will serve as an important and useful resource for future research on the emergence and diversification of cichlid fishes’ egg-spots. Only a limited degree of conservation of gene expression patterns was detected between the egg-spots of the derived haplochromines and blotches from ancestral haplochromines, as well as between the two types of blotches, suggesting an independent origin of these traits.


Background
Animal pigmentation patterns are highly variable phenotypes both at the intra- and inter-specific level, and represent prominent traits to study the genetics of species diversification and adaptation (reviewed in [1][2][3]). The functionality of color patterns can readily be assessed in most cases, given that these traits often evolve in response to adaptation to the environment via natural selection (e.g. inter- and intra-specific communication, camouflage and mimicry), or co-vary with female choice via sexual selection [4][5][6]. The outcome of these two types of selection regimes can be different, with the former often producing cryptic phenotypes, where coloration mimics the environment, while the latter generates conspicuous phenotypes, where males typically display bright colors driving female choice or male-male competition [4][5][6]. Despite the high evolutionary significance of color patterns, the genetic mechanisms underlying their formation and diversification often remain elusive [1][2][3].
Recent work in fish model systems, especially in zebrafish, has started to uncover the genes and cellular processes involved in pigmentation pattern formation [7][8][9]. Pigmentation patterns are determined by the specification of different types of neural crest derived pigment cells, the chromatophores [10], that contain different light absorbing pigments: melanophores contain black eumelanin pigments; erythrophores and xanthophores contain yellow-red carotenoid and pteridine pigments; cyanophores contain a blue pigment of unknown composition; and finally, iridophores contain purine crystals that produce metallic iridescence [11]. Differences in the arrangement, position, and density of these cells lead to the diversity of color patterns present in nature. These differences depend on a variety of factors including neural crest cell migration, specification, proliferation, and survival [7][8][9][11].
In this study, we address the molecular basis of two novel and conspicuous pigmentation traits found in the anal fin of male cichlid fishes: egg-spots and blotches (Fig. 1). Egg-spots represent an evolutionary novelty that emerged only once in the haplochromine lineage, the most species-rich group of East African cichlids [12,13]. These circular markings consist of a central circular area containing xanthophores and iridophores, surrounded by an outer transparent ring [14,15]. They are primarily found in males and show an extreme inter- and intraspecific variability in number, color, and position on the fin [13][14][15][16]. Egg-spots have been the subject of intense studies suggesting a signaling function in the peculiar mating behavior of the mouth-brooding haplochromines. They are likely sexually selected via female choice in some species [17,18] and via male-male competition in others [19][20][21]. Blotches, on the other hand, are patches of conspicuous anal fin pigmentation with ill-defined boundaries and occur only in a handful of cichlid species, including some basal haplochromines [13][14][15] and ectodine cichlids from Lake Tanganyika (Fig. 1). As with egg-spots, they are mostly found in males and their function might also be linked to courtship behavior, although this has been less extensively studied [12]. The origin and evolutionary trajectory of these anal fin patterns remain unclear. Because the species showing blotches are the sister group to the egg-spot bearing haplochromines [13][14][15], it might be speculated that egg-spots are derived from the blotch pattern, which would make the two phenotypes homologous.
Convergent evolution is widespread in East African cichlid adaptive radiations, not only between lakes [22,23], but also within a single lake [24]. For example, haplochromine anal fin blotches are phenotypically similar to the ones found in the genus Callochromis (Fig. 1). However, the phylogenetic position of Callochromis, which is nested within the Ectodini [25], suggests that these two types of blotches evolved independently. Overall, we envision two possible scenarios for the origin of egg-spots: in one case they represent a derived state of blotches found in haplochromines, whereas blotches found in ectodines evolved independently (two origins); alternatively egg-spots have evolved independently from the blotches of both basal haplochromines and ectodines (three origins).
Understanding the genetic pathways underlying these pigmentation phenotypes can help us to distinguish between these scenarios. While several studies have addressed pigmentation diversity in East African cichlids, little is known about the genetics underlying their coloration and pigmentation patterning, and only a handful of genes have been studied in detail. Among these genes is hagoromo, which shows a greater diversity of alternatively spliced variants and accelerated protein evolution in the haplochromines compared to other cichlids [26,27]; paired box 7 (pax7), on the other hand, was shown to be linked to a haplochromine female biased pigmentation phenotype [28]. Three genes have so far been associated with the egg-spot phenotype: the xanthophore marker colony stimulating factor 1 receptor A (csf1ra), and the two four and a half lim domain 2 proteins (fhl2a and fhl2b). csf1ra is expressed in haplochromine egg-spots and in the characteristic "Perlfleckmuster" (pearly spotted) pattern present in cichlid fins. This gene underwent adaptive sequence evolution in the ancestral lineage of the haplochromines coinciding with the emergence of egg-spots [14]. However, csf1ra is downstream in the pathway of egg-spot morphogenesis. More recently, we have shown that fhl2a and fhl2b are more causally related to egg-spot development and that an alteration in the cis-regulatory region of fhl2b could have contributed to the emergence of this trait in haplochromines in the first place [15].
Fig. 1 Representative males from the four species analyzed: two haplochromine species displaying egg-spots in their anal fins (A. burtoni and C. pulpican), a basal haplochromine species (P. philander) and an ectodine species (C. macrops), both showing orange blotches in their anal fin
In this study, we first addressed the question of the genetic basis of the egg-spots. We then went on to use comparative transcriptomics across species carrying egg-spots and blotches to shed light on the origin of this novel trait. Specifically, we identified a total of 1229 genes that were differentially expressed (DE) between egg-spot and non-egg-spot fin tissues in the haplochromine cichlid Astatotilapia burtoni. These genes are evolving at a higher rate than average, making this a valuable dataset to study the emergence and rapid diversification of this trait. For a subset of 46 DE genes we measured expression levels in three other species: the egg-spot bearing haplochromine Cynotilapia pulpican, carrying egg-spots on a different region of the anal fin than A. burtoni, and two blotch-bearing species, the basal haplochromine Pseudocrenilabrus philander and the ectodine Callochromis macrops. The rationale is that if egg-spots and blotches in haplochromines are controlled by the same genetic components they might show similar expression profiles.
A total of 29 out of 46 genes were found to be DE in C. pulpican. By comparing the expression in two haplochromine species with different egg-spot arrangements, we confirmed that the expression of these genes is correlated with the presence of egg-spots (irrespective of their position on the anal fin), whilst excluding potential positional genes, which supports their involvement in egg-spot formation. Both types of blotches showed very distinct expression profiles from the egg-spots, and substantial differences in gene expression were also found between the two types of blotches. A similar gene expression profile between the egg-spots of derived haplochromines and the blotch pattern in the basal haplochromine P. philander would be indicative of a common origin for both traits, whereas similar expression profiles between the haplochromine egg-spots and the blotch of C. macrops would suggest that convergent evolution of this trait involved the same genetic pathways. Our study reveals the opposite for the genes under investigation, i.e. egg-spots and blotches show different expression profiles and the two types of blotches also differ in their gene expression profiles, suggesting that egg-spots and blotches do not share a genetic basis and that convergent phenotypic evolution does not correspond to parallelism at the genetic level.

Results and discussion
Transcript profile in anal fin and egg-spot tissue
In order to identify genes involved in egg-spot morphogenesis we quantified differences in gene expression patterns between egg-spots and the surrounding non-pigmented anal fin of six Astatotilapia burtoni males (Fig. 1). Illumina RNAseq (RNA sequencing) provided a total of 193,054,988 high quality reads from the six egg-spot tissue samples and 194,099,061 reads from anal fin tissue samples of the same individuals. The replicates for each tissue were sequenced separately and the average number of reads per sample was 32,262,837.42 (2,750,960.2-32,262,837.42). We mapped the reads from each replicate to a reference A. burtoni embryonic library, which is a transcript collection from several different embryonic and larval developmental stages, and therefore probably the most comprehensive available representation of the entire gene set from A. burtoni [29]. In total we identified 1229 genes that were DE between the two types of tissues, with 620 genes being over-expressed in the egg-spot tissue, whilst 609 were under-expressed (Table 1). The DE transcripts, their identification using tBLASTx and BLASTx searches (against the NCBI non-redundant database [30]), together with the respective expression levels, are provided in Additional file 1. A first inspection of the DE genes between egg-spot and non-egg-spot tissue revealed that our experiment retrieved many genes with a known function in pigment formation and patterning in different model organisms, including paired box 7 (pax7), endothelin receptor b1 (ednrb1), microphthalmia-associated transcription factor a (mitfa), Agouti signaling protein 1 (asip1), sex determining region Y box 10 (sox10) and anaplastic lymphoma receptor tyrosine kinase (alk) [31], suggesting that our strategy is a valid approach to identify candidate genes for egg-spot morphogenesis.

Functional annotation of the DE genes
The reference A. burtoni transcriptome was annotated by performing a BLASTx search against NCBI's Danio rerio protein database [30]. From the 1229 DE genes, 58.6 % (720) had significant BLAST hits against the database (annotated datasets can be found in Additional file 2), while 41.4 % (509) of the DE contigs were non-identified. From the 720 contigs with a BLAST hit we could functionally annotate 495 using BLAST2GO [32]. We further described the Gene Ontology (GO) term composition for egg-spot over-expression and egg-spot under-expression in comparison to the reference transcriptome GO representation (Fig. 2). Overall, the GO term representation was similar between the two tissues. However, there were several GO terms for "Molecular function" and "Cellular component" that differed significantly between the two datasets, suggesting, as expected, that the two tissues are functionally different (Fig. 2).
To narrow down the list of relevant GO terms, and to use them as a tool to find candidates, we used a two-sided Fisher's exact test (false discovery rate (FDR) <0.05) to determine which functional GO categories were enriched in the genes over-expressed in the egg-spot in comparison to the total embryonic transcriptome. Five categories were significantly enriched in our over-expression gene dataset: 'Pigmentation' (GO:0043473), 'Developmental pigmentation' (GO:0048066), 'G-protein coupled peptide receptor activity' (GO:0008528), 'Peptide receptor activity' (GO:0001653) and 'Cell adhesion molecule binding' (GO:0050839) (Fig. 3). These are GO functional categories known to play a role in the development of pigmentation patterns. Neural crest cells are precursors of pigment cells and migrate from their original location to the anal fin where they will form the egg-spots [33][34][35]; therefore genes playing a role in cell migration, cell adhesion and pigmentation development are relevant to the formation of this trait. Egg-spot formation relies on pigment production, which in turn is often activated via membrane receptor activity [36][37][38]. In Table 2 we present the list of genes belonging to these enriched functional categories that are potentially good candidates for egg-spot morphogenesis. The genes belonging to the GO term 'Developmental pigmentation' overlapped with the ones included in the 'Pigmentation' category, and the same is true for the two receptor GO term categories; therefore we only show three of the five enriched functional GO categories. This method of functional description of a gene dataset to extract candidates represents a supervised search, meaning that we might bias our findings towards what is already known. We note, however, that there are many other non-described genes, or known genes with incomplete GO term annotations, which could play a role in egg-spot morphogenesis.
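As an illustration of the enrichment test described above, the following is a minimal Python sketch of a two-sided Fisher's exact test on a 2x2 table of gene counts, followed by a false discovery rate adjustment. It is not the BLAST2GO implementation used in the study. The Benjamini-Hochberg procedure is an assumption (the text only states FDR < 0.05), and the function names and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: over-expressed genes annotated with the GO term
    b: over-expressed genes without the GO term
    c: remaining transcriptome genes with the GO term
    d: remaining transcriptome genes without the GO term
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for three GO terms (not taken from the study)
tables = {
    "term_A": (12, 300, 40, 9000),
    "term_B": (3, 309, 90, 8950),
    "term_C": (8, 304, 60, 8980),
}
pvals = [fisher_exact_two_sided(*t) for t in tables.values()]
for term, q in zip(tables, benjamini_hochberg(pvals)):
    print(term, "enriched" if q < 0.05 else "not significant", q)
```

A term would be reported as enriched when its adjusted p-value falls below 0.05. Because the test is two-sided, a significant result alone does not indicate the direction of the difference, which has to be read from the counts.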

Potential lineage specific genes are DE in the egg-spot
How novel traits emerge and are modified is one of the many unresolved problems in evolutionary biology [39][40][41]. It has long been advocated that new traits can emerge via the co-option of conserved regulators [42]. More recently, however, evidence is accumulating that new, i.e. lineage specific, genes can also play an important role in the development of novel traits [43][44][45]. Around 41 % of our candidate contigs did not have a BLAST hit against the D. rerio protein database. This could be due to the incompleteness of this database or to the lack of homologs in this species. To control for these factors we performed BLASTx and tBLASTx searches against the NCBI non-redundant (nr) protein and nucleotide databases [30]. Around 15.5 % (191/1229) of the DE contigs could not be assigned to a specific gene present in either nr database (Additional file 1). The contigs without positive BLAST hits could represent non-coding RNAs, partial sequences of known genes that could not be identified, or lineage specific genes (new or fast evolving genes) [46]. These results add to previous work on comparative transcriptomics of East African cichlids reporting that only 51 % of the total transcriptomes of the species studied (A. burtoni and Ophthalmotilapia ventralis) have hits in the NCBI nr nucleotide database [46]. In our case, the reduction in the percentage of non-identified contigs is most probably due to the recent availability of five cichlid genomes [29].
It has previously been shown that lineage specific genes might play a role in the emergence and development of novel traits. In cnidarians 15 % of the transcripts expressed in a phylum specific cell type are lineage-specific, though the functional role of these transcripts was not tested [45]. The relative contribution of novel genes to the evolution of new morphologies, when compared to the co-option of conserved genes, is still under debate and further studies are needed to clarify their role on the evolution of such traits. Therefore, it would be interesting to identify the unknown DE transcripts and assess their role in the development and evolution of egg-spots.

Rates of evolution of the egg-spot DE genes
Changes in gene function can result either from modification in a cis-regulatory element that changes gene expression pattern and timing, and/or from a modification in the protein sequence that alters its function [47][48][49][50]. To test for protein sequence evolution in the egg-spot DE genes we calculated the rates of evolution in the form of dN/dS (ratio of non-synonymous substitutions over synonymous substitutions) of this gene dataset and compared the values obtained with a previously published dataset that estimated transcriptome-wide dN/dS values between cichlid species [46]. We were able to estimate dN/dS values (averages across species pairwise dN/dS) for 196 out of the 1229 contigs (see Additional file 1). As expected, the majority of the genes were under purifying selection (dN/dS < 1) and there was no significant difference in the rates of evolution between the over- and under-expressed genes (Fig. 4). However, for both the over- and under-expressed genes, the average dN/dS values were significantly higher than those of the entire transcriptome (Fisher's exact test, p-value <0.05), which means that, on average, the genes that are DE between the egg-spot and the anal fin are evolving at a faster rate. The haplochromine egg-spot is a male ornamental trait and, hence, most likely under sexual selection, either directly via female choice or via male-male competition [17][18][19][20][21]. Our results thus provide support to the general finding that genes underlying sexually selected traits evolve more rapidly [51][52][53][54].
Fig. 3 Enrichment of functional GO terms in the egg-spot over-expressed genes (yellow bar) when compared to the total transcriptome of A. burtoni (blue bar). Those were calculated with a two-tailed Fisher exact test (FDR < 0.05)
We found seven genes to be under positive selection (dN/dS > 1), four of which were over-expressed in the egg-spot tissue (Table 3). Among them there are genes that play a role in neural crest differentiation (tenascin) and in cell migration (tenascin, mucin and family with sequence similarity 110c (fam110c)), which are important processes in pigmentation development [55][56][57][58]. The other genes have no a priori functional link with egg-spot formation. Nonetheless, due to their difference in expression and their signature of adaptive sequence evolution, they should be considered as good candidates and their functional roles in egg-spot development should be tested in the future.
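The dN/dS thresholds used in this section can be illustrated with a short Python sketch. It averages hypothetical pairwise dN/dS estimates per contig, as the text describes, and labels each contig by the conventional thresholds (dN/dS < 1 purifying, dN/dS > 1 positive selection). The contig names and values are invented for illustration, and the sketch does not estimate dN/dS from sequence alignments.

```python
from statistics import mean


def classify_dnds(pairwise_dnds):
    """Average pairwise dN/dS values for one contig and label the regime."""
    omega = mean(pairwise_dnds)
    if omega > 1:
        regime = "positive selection"
    elif omega < 1:
        regime = "purifying selection"
    else:
        regime = "neutral"
    return omega, regime


# Hypothetical pairwise dN/dS estimates per contig (not from the study)
contigs = {
    "contig_0001": [0.12, 0.20, 0.15],
    "contig_0002": [1.40, 1.10, 1.25],
    "contig_0003": [0.60, 0.85],
}

for name, values in contigs.items():
    omega, regime = classify_dnds(values)
    print(f"{name}: mean dN/dS = {omega:.2f} ({regime})")
```

Averaging over a whole gene is a conservative criterion, since a few positively selected sites can be masked by purifying selection acting on the rest of the sequence.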

Comparative gene expression via quantitative real time PCR
To confirm the results obtained via RNAseq, we examined a subset of 46 of the 1229 DE genes and tested their expression in egg-spot versus non-egg-spot tissue via quantitative real-time PCR (qPCR) in a second haplochromine species with a different egg-spot arrangement on the anal fin, Cynotilapia pulpican from Lake Malawi (Fig. 1). Half of these genes were over-expressed and half under-expressed in the egg-spot (Tables 4 and 5, respectively). These candidate genes were chosen randomly across the spectrum of the different levels of expression (from 1.3 to 5 fold differences in gene expression). Under-expressed genes were included as they might be acting as pigmentation inhibitors, thus preventing the appearance of egg-spots in other regions of the anal fin when over-expressed. Overall, there was no obvious trend with respect to functional GO categories associated with the top DE genes (see Additional file 2).
Note that six out of the 46 candidates remained unidentified after tBLASTx searches against a non-redundant NCBI database. While the egg-spots of A. burtoni are located in the proximal region of the anal fin, C. pulpican has its egg-spots in the distal region of the anal fin. By measuring the expression of these genes in this species, we effectively control for positional effects in gene expression along the proximal-distal axis.
We also aimed to determine whether egg-spots and blotches share a conserved gene expression profile, which would indicate a common origin of these two traits. We thus tested if the candidate genes identified in A. burtoni had similar expression levels in the blotches of a basal haplochromine species (Pseudocrenilabrus philander) and in the blotches of a member of a distinct cichlid tribe, an ectodine species (Callochromis macrops), where this trait has likely evolved independently.

Comparative gene expression in haplochromine egg-spots
The qPCR gene expression analysis in the second haplochromine species revealed that 14 of the 23 genes that were over-expressed in the egg-spots of A. burtoni showed a similar expression pattern in C. pulpican (Fig. 5a), suggesting they are egg-spot specific and not simply involved in fin patterning. Among them are the previously identified egg-spot gene fhl2a [15], two transcription factors well known for their involvement in patterning and cell fate specification (homeobox C12a (hoxC12a) and heart and neural crest derivatives expressed 2 (hand2)), and an important growth morphogen (bone morphogenetic protein 3b (bmp3b)) [59][60][61]. The detection of fhl2a, in particular, suggests that our results are robust, since the gene was recently shown to be over-expressed across egg-spot development [15]. Included in the list are five of the unidentified contigs.
The remaining nine genes that were over-expressed in the egg-spots of A. burtoni either showed no difference in expression (4) or were under-expressed (5) in the egg-spots of C. pulpican (Fig. 5a). These genes are most likely involved in fin rather than egg-spot patterning, as suggested by the fact that three of these genes are known to participate in fin development (retinol binding protein 7 (rbp7), retinol binding protein 4 (rbp4) and insulin-like growth factor 1 (igf1)) [62][63][64]. Overall, we confirmed the over-expression of 14 genes in the adult egg-spots from both A. burtoni and C. pulpican, making them strong candidate genes for egg-spot formation that deserve further investigation. Among the 23 under-expressed genes in A. burtoni, 15 were also consistently under-expressed in the egg-spots of C. pulpican (Fig. 5b), including one unidentified contig. Again, this suggests that these genes are egg-spot related. Among them is aristaless 3 (Axl3), a gene belonging to the homeobox gene family, known for its patterning effects [65]. Axl3 displays the highest expression differences among all genes (under- and over-expression included) and might putatively represent an inhibitor of the pigmentation/egg-spot pattern, although no role in pigmentation has been reported yet. The remaining eight genes showed no differences in gene expression between egg-spot and anal fin tissue in C. pulpican, and could therefore be involved in fin patterning. Thus far, none of these eight genes have been related to a function in pigmentation.
We cannot rule out that the genes that did not show the same pattern in both species nevertheless have a function in egg-spots. Although egg-spots in A. burtoni and C. pulpican are homologous, they do not necessarily have to share the exact same genetic network. It is thus possible that these DE genes are responsible for interspecific differences in the egg-spot phenotype, acting in a lineage-specific manner as has been shown in other taxa. For instance, the eyespots (concentric wing pigmentation patterns) of nymphalid butterflies, which are arranged along the distal half of the wing, are considered homologous [43,66]. Nevertheless, there is great flexibility in the expression patterns of four genes involved in the development of these structures in the different species studied: antennapedia was the only gene where there was a gain of expression associated with the origin of the eyespot phenotype, whereas there were many gain or loss events for notch, distalless and spalt in the different species [67]. Overall, the genetic network underlying the nymphalid eyespot pattern appears to be highly variable, suggesting that homologous structures are not necessarily controlled by the same set of genes. Perhaps the same is true for cichlid egg-spots, which might initially have been under the control of the same set of genes followed by diversification in the recruitment of different genes. A broader phylogenetic sampling of egg-spot phenotypes would be necessary to clarify this question. The 29 genes that were consistently over- or under-expressed in the adult egg-spots in both haplochromine species are nevertheless strong candidate genes for egg-spot development and merit further investigation to understand their role in the origin and diversification of this trait.
These genes should be studied in detail throughout development and their function should be tested, not only in one species but also across several species of egg-spot bearing haplochromines with variable egg-spot phenotypes. With this approach we will be able to distinguish between a functional role in the evolution of the trait and merely a function in the development and/or physiology of the trait.

Comparative gene expression between egg-spots and haplochromine blotches
We then measured gene expression of our set of 46 candidate genes in a basal haplochromine species, Pseudocrenilabrus philander, which displays a blotch rather than an egg-spot on its anal fin (Fig. 1). It is not known whether the blotches found in basal haplochromines are ancestral to the egg-spots found in 'modern haplochromines' [13,25]. Homology inferences are typically made according to shared phenotypic criteria between traits and also according to parallelism at the developmental and genetic level [68]. Therefore, if egg-spots and blotches are homologous we might expect that the gene expression patterns in both traits are, at least, partially conserved.
According to our results, haplochromine blotches and egg-spots differ substantially in their expression profiles (Fig. 5a, b). None of the 14 genes that were over-expressed in both A. burtoni and C. pulpican egg-spots were over-expressed in the blotch of P. philander (Fig. 5a), and only four of the 15 genes under-expressed in the two modern haplochromines were also under-expressed in P. philander (Fig. 5b). Although not conclusive, the poorly conserved expression pattern between the two traits suggests that the haplochromines' egg-spots and the blotches have emerged independently within the Haplochromini lineage. These results have to be taken with caution, though, as haplochromine egg-spots could have evolved from blotches by up-regulation of different effector genes within the same genetic network. This has been observed in Drosophila, where the phenotypically diverse wing pigmentation patterns are controlled by the key regulator distalless (dll) [49]. The emergence of this wing spot phenotype was brought about by the evolution of regulatory links between dll and multiple downstream pigmentation genes, which resulted in their up-regulation in the wing [49].
Fig. 5 Gene expression results for 46 DE genes as measured by qPCR. qPCR was performed for C. pulpican, P. philander and C. macrops (the relative position of the egg-spot/blotch on the fin is shown on top of each panel). Expression of these genes was quantified in the egg-spots and blotches relative to the anal fin tissue. Blue box denotes over-expression, red denotes under-expression and grey denotes no significant difference. Instances where it was not possible to measure gene expression are colored white with NA. ***: p < 0.001, **: p < 0.01, *: p < 0.05, •: p < 0.1 (for more details please see Additional files 4, 5 and 6). a Results for egg-spot over-expression dataset (Table 4). In the first column are the RNAseq results for A. burtoni. In the second, third and fourth column are the results for C. pulpican, P. philander and C. macrops respectively. b Results for egg-spot under-expression dataset (Table 5). Details of the statistical analyses used are found in Additional file 4 (C. pulpican), Additional file 5 (P. philander) and Additional file 6 (C. macrops). c Distance tree calculated using the gene expression results (over-expression, under-expression and no difference of expression) as characters

Comparative gene expression between egg-spots, haplochromine and ectodine blotches
The blotch phenotype evolved more than once and is also found in some ectodine cichlids from Lake Tanganyika [12]. Ectodine anal fin blotches are similar to the ones found in basal haplochromines (Fig. 1), but apparently have an independent origin [25]. Although non-homologous, ectodine blotches might still share the same genetic network with both haplochromine egg-spots and blotches, as has previously been shown for other convergent traits [69].
In this study, we measured gene expression of our set of 46 candidate genes in the blotch of Callochromis macrops. Our gene expression assays revealed that only four of the genes that were over-expressed in A. burtoni and C. pulpican egg-spots were also over-expressed in the blotch of C. macrops (Fig. 5a). They encode transcription factors (cat eye syndrome critical region 5 (cecr5)), co-factors (fhl2a) [70], cytoskeleton components and kinases (a-kinase anchoring protein 2 (akap2)) and a non-identified transcript. These genes could be related to the pigmentation patterning or production of pigment in all three species. Furthermore, C. macrops also shares with A. burtoni and C. pulpican four genes that are consistently under-expressed in both species (Fig. 5b). One gene (vitronectin [71]) was over-expressed in the C. macrops blotch and in A. burtoni egg-spots, but not in C. pulpican egg-spots. These two species (A. burtoni and C. macrops) have in common that their egg-spots and blotches, respectively, contain orange pigments, while the egg-spot of C. pulpican is yellow. This gene might therefore correlate with patterning or production of orange pigment, although no such role has been previously described.
The comparison of expression profiles between the blotch-bearing P. philander and C. macrops revealed that the underlying gene expression patterns are different, indicating that the phenotypic resemblance of the blotches is probably not the result of parallel evolution at the genetic level. Curiously, six genes that are under-expressed in A. burtoni egg-spots and show no difference in expression in C. pulpican are over-expressed in the blotches of both P. philander and C. macrops. The expression pattern of those six genes could be correlated with the blotch phenotype, but the most probable explanation is that they are involved in fin morphogenesis, since the non-pigmented region of the A. burtoni fin matches the pigmented region in the two species with blotches.

Gene expression clustering
To determine the relationship between the pigmented anal fin tissues (egg-spots and blotches), we coded the gene expression results of the 46 genes in the four different species into a matrix of discrete data points (0 = under-expression, 1 = no difference, 2 = over-expression) and constructed a distance genealogy (Fig. 5c). The resulting tree diagram shows a clear separation between the egg-spot and blotch phenotypes. The species cluster by gene expression phenotype (bootstrap support of 100 %), and the observed similarities do not correspond to the species phylogeny (Fig. 5c, Table 6). The character distance matrix also shows that, of the two blotches, the C. macrops blotch is the more similar to the haplochromine egg-spots in terms of gene expression (Fig. 5c, Table 6). Our results suggest that egg-spots, haplochromine blotches and ectodine blotches are not regulated by the same genetic components.
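The coding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which distance measure was used, so the proportion of mismatching characters is assumed here, and the tissue names and four-gene profiles are hypothetical toy data rather than the values behind Fig. 5c and Table 6.

```python
from itertools import combinations

# Character states as given in the text: 0 = under-expression,
# 1 = no difference, 2 = over-expression.
CODES = {"under": 0, "none": 1, "over": 2}


def encode(calls):
    """Map per-gene expression calls to the 0/1/2 character states."""
    return [CODES[call] for call in calls]


def character_distance(a, b):
    """Proportion of genes whose state differs (assumed distance measure)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genes")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(profiles):
    """Pairwise distances between all tissues, keyed by (name, name)."""
    names = sorted(profiles)
    dist = {(name, name): 0.0 for name in names}
    for p, q in combinations(names, 2):
        d = character_distance(profiles[p], profiles[q])
        dist[(p, q)] = dist[(q, p)] = d
    return dist


# Hypothetical toy profiles for four genes (not the study's data).
toy = {
    "A_burtoni_eggspot": encode(["over", "over", "under", "under"]),
    "C_pulpican_eggspot": encode(["over", "over", "under", "none"]),
    "P_philander_blotch": encode(["none", "under", "over", "over"]),
}

if __name__ == "__main__":
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, d)
```

A tree-building method (for example neighbour joining) with bootstrap resampling of the gene columns would then be applied to such a matrix; that step is left out of the sketch.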
Overall, our results suggest that haplochromine egg-spots, haplochromine blotches and ectodine blotches are novel pigmentation traits that evolved independently by re-using a limited number of common genes (Fig. 5 and Table 6). The genes in common seem to be related to the cellular composition of the trait, which is re-used every time a new pigmentation pattern emerges, and not to the pigmentation pattern per se. Therefore, a thorough comparison of the different fin phenotypes should be carried out to establish the cellular components of each pigmentation phenotype, so that the underlying gene expression can be better understood and interpreted.
These homology inferences have to be taken with caution, as we have only studied a subset of candidate genes (46/1229) derived from the egg-spot versus non-egg-spot tissue transcriptomic comparison in A. burtoni. An in-depth comparison of the blotch tissue will certainly require comparative transcriptomics in the blotched species.

Conclusions and future perspectives
Understanding the genetic and molecular basis of both evolutionary innovation and phenotypic variation is a major challenge in evolutionary biology. Using next-generation sequencing, we here present a transcriptional survey of egg-spot tissue in the haplochromine cichlid Astatotilapia burtoni. This collection of DE transcripts represents the largest set of egg-spot candidate genes available and will greatly contribute to the understanding of the genetics underlying this trait. We provide a list of 1229 genes that are DE between egg-spot and non-egg-spot fin tissues, many of which are fast-evolving genes that might be involved in the genetic network determining the egg-spot phenotype.
A closer look at the expression profiles of 46 of the DE genes shows that the expression profiles are not conserved between egg-spots and blotches, which suggests that haplochromine egg-spots, haplochromine blotches and ectodine blotches do not share the same genetic basis. This result indicates that these traits emerged independently in the evolution of this group of fishes. It has been hypothesized that egg-spots are modifications of the "Perlfleckmuster" (pearly spot) pattern that is present in fins of many cichlid species [12,14]. In the future it will be interesting to determine if the same genes that underlie the egg-spots of haplochromines are also expressed in the "Perlfleckmuster".
With our current approach, we identified 29 genes whose expression patterns are egg-spot specific in two distinct cichlid species, strongly pointing to a role in the formation of this trait. These genes definitely deserve further investigation; in particular, their expression dynamics should be examined during egg-spot development and their function should be assessed with transgenic experiments, now available for cichlids [72]. The functional characterization of these genes during egg-spot development and in a broader phylogenetic context will inform us about the origin and diversification of this innovation in the most species-rich vertebrate lineage, the haplochromine cichlid fishes, thus leading to major advances in the understanding of the emergence and diversification of novel traits.

Samples
Laboratory-bred strains of Astatotilapia burtoni and Cynotilapia pulpican were kept at the University of Basel (Switzerland) under standard conditions (12 h light/12 h dark; 26 °C, pH 7). All individuals were euthanized with MS222 (Sigma-Aldrich, USA) before tissue dissection, following approved procedures (permit number 2317 issued by the Basel cantonal veterinary office). Callochromis macrops individuals were captured in Lake Tanganyika at Mpulungu (Zambia), and P. philander individuals were captured in a river near Mpulungu (both under a research permit issued by the Department of Fisheries, Republic of Zambia). Dissections were carried out in situ, and tissues were stored in RNAlater (Ambion, USA) and shipped to the University of Basel.

RNA extractions
Isolation of RNA was performed using TRIzol® (Invitrogen, USA). All dissected tissues were incubated in 750 μl of TRIzol and left at 4 °C overnight (or 8-16 hours). The tissues were homogenized with a BeadBeater (FastPrep-24; MP Biomedicals, USA). Extractions proceeded according to the manufacturer's instructions, and DNase treatment was performed with DNA-Free™ (Ambion, USA). RNA quantity and quality were determined with a NanoDrop 1000 spectrophotometer (Thermo Scientific, USA). cDNA was synthesized using the High Capacity RNA-to-cDNA kit (Applied Biosystems, USA).

Differential gene expression analysis using RNAseq (Illumina)
The anal fins of six Astatotilapia burtoni male juveniles were dissected and RNA was extracted from egg-spot and anal fin tissue for each individual. One microgram of RNA per sample was sent for library construction and Illumina sequencing at the Department of Biosystems Science and Engineering (D-BSSE), University of Basel and ETH Zurich. Samples were run in two lanes of an Illumina Genome Analyzer IIx (maximum read length was 50 base pairs (bp)).
The reads from each library were mapped against a reference A. burtoni embryonic transcriptome containing 171,136 reference transcripts, using Bowtie2 as the aligner [73] and RSEM (RNA-Seq by Expectation-Maximization) [74] to estimate gene abundance. The individual RSEM files were concatenated into one single dataset and analyzed using the Bioconductor R package edgeR [75]. Transcripts that had fewer than one count per million in one of the samples were discarded. We tested for differential expression between egg-spot and anal fin samples, using anal fin as the reference. Since the samples were paired (each pair of egg-spot and anal fin replicates belongs to one individual fish), we included the individual information in the statistical model. For that we used a negative binomial generalized linear model (GLM) based on common dispersion with the individual as the blocking factor, i.e. we tested for consistent differences in expression between egg-spot and anal fin within individuals. Transcripts were considered DE if, after correction for multiple testing, the false discovery rate (FDR) was lower than 0.05 [76].
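The two thresholds in this paragraph, the count-per-million filter and the FDR cut-off, can be illustrated with a short Python sketch. It does not reproduce the edgeR GLM itself. Two assumptions are made: the filter is read as "discard a transcript if any sample falls below 1 CPM", and the multiple-testing correction is taken to be the Benjamini-Hochberg procedure, which the text does not name explicitly. All function names are illustrative.

```python
def counts_per_million(counts):
    """Scale one library's raw counts to counts per million."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def keep_transcript(cpm_values, min_cpm=1.0):
    """Keep a transcript only if every sample reaches min_cpm.

    Assumption: this is one possible reading of the filter in the text.
    """
    return all(v >= min_cpm for v in cpm_values)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(pvalues, fdr=0.05):
    """Flag tests whose adjusted p-value is below the FDR threshold."""
    return [q < fdr for q in benjamini_hochberg(pvalues)]
```

For example, `differentially_expressed([0.01, 0.04, 0.03, 0.005])` flags all four hypothetical tests, because their adjusted values (0.02, 0.04, 0.04, 0.02) all fall below 0.05.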

Functional annotation of differentially expressed transcripts
Gene ontology (GO) [77] annotation of the differentially expressed transcripts was conducted with Blast2GO version 2.5.0 [32]. BLASTx searches were done against the Danio rerio protein database using an E-value threshold of 1e-5 and a maximum of 20 hits. The resulting GO terms were used to estimate transcript function. A table listing the differentially expressed transcripts, their respective expression values, and their GO terms is provided in Additional file 1. Between-dataset differences in the proportion of genes for individual level 2 GO terms were tested by means of chi-squared tests, with p-values adjusted for multiple tests using Bonferroni corrections [78]. The enrichment of functional GO terms in the egg-spot over-expressed gene dataset was calculated with a two-sided Fisher's exact test with an FDR of 0.05.
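As an illustration of the enrichment test named above, the following Python sketch computes a two-sided Fisher's exact test for one GO term from a 2x2 table, plus a simple Bonferroni adjustment. It is a minimal stand-alone version, not the Blast2GO implementation; the layout of the table (term present or absent, test set versus reference set) and the function names are assumptions made for the example.

```python
from math import comb


def hypergeom_pmf(k, row1, row2, col1):
    """Probability of k in the top-left cell given fixed table margins."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a/b = genes with/without the GO term in the test set,
    c/d = genes with/without the GO term in the reference set.
    """
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = hypergeom_pmf(a, row1, row2, col1)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = 0.0
    # Sum the probabilities of all tables that are at most as likely as
    # the observed one (small tolerance for floating-point ties).
    for k in range(lowest, highest + 1):
        p = hypergeom_pmf(k, row1, row2, col1)
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return min(1.0, p_value)


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]
```

With a hypothetical table of 3 annotated and 1 unannotated genes in the test set against 1 and 3 in the reference set, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70, roughly 0.486.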

Rates of evolution for the differentially expressed transcripts
Transcriptome data from the five available cichlid species (Pundamilia nyererei, Neolamprologus brichardi, Oreochromis niloticus, Maylandia zebra, and Astatotilapia burtoni) were downloaded from the Broad Institute [29]. Each species' transcriptome consisted of multiple libraries that were concatenated. The 1229 DE genes from A. burtoni were compared using a BLASTn search (E-value threshold: 1e-50) against each species' transcriptome, and DE genes with a hit in all cichlid species were retained (599). The 599 DE genes were then compared using BLASTx (E-value threshold: 1e-20) against the tilapia (Oreochromis niloticus) proteome from the ENSEMBL database, and the corresponding coding sequences (cds) were retrieved (378). Finally, the database of 378 tilapia cds was queried against the individual cichlid transcriptomes using BLASTn (E-value threshold: 1e-35). BLAST outputs were parsed and filtered to retain hits with identity >90 %, length >200 bp and bit score >200. We obtained 298 tilapia cds that have at least one hit in all cichlid transcriptomes. A concatenated fasta file was built to include the ten top hits from each cichlid transcriptome and the 298 tilapia cds. Sequences were then aligned using MAFFT v7.245 [79] with the einsi algorithm and the --adjustdirection option (einsi is suitable for sequences containing large unalignable regions, as expected with the presence of UTRs (untranslated regions) and splicing variants in our transcriptome data). Alignments were trimmed using the tilapia cds as a reference and visually inspected. Alignments with paralogous sequences resulting from recent duplications were discarded. Within each individual alignment, a consensus was built across the transcripts from each cichlid species with 'cons' from EMBOSS [80] (-plurality 1.5, indicating the cut-off for the number of positive matches below which there is no consensus). Alignments were then translated to proteins and checked to confirm that all sequences were in the corresponding tilapia reading frame (no stop codons).
The whole pipeline was run with customized R and Unix scripts. We obtained 196 good alignments, 74 % of which comprised sequences from all five cichlid species, while the remaining alignments included at least three species each. Average alignment length was 1716 bp, ranging from 270 to 7794 bp. Alignments are available from the author upon request. dN/dS estimates were calculated using the script kaks.pl in BioPerl [81], which computes dN/dS for all sequence pairs using the Nei-Gojobori method [82].
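The hit-filtering step of this pipeline (identity >90 %, length >200 bp, bit score >200) could look roughly like the Python sketch below. The original scripts were written in R and Unix shell and are not shown in the text, so this is an illustrative stand-in. It assumes BLAST tabular output with the default twelve columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the sequence identifiers in the example are hypothetical.

```python
def parse_hit(line):
    """Parse one line of BLAST tabular output (default 12 columns assumed)."""
    fields = line.rstrip("\n").split("\t")
    return {
        "query": fields[0],
        "subject": fields[1],
        "identity": float(fields[2]),
        "length": int(fields[3]),
        "evalue": float(fields[10]),
        "bitscore": float(fields[11]),
    }


def passes_filter(hit, min_identity=90.0, min_length=200, min_bitscore=200.0):
    """Apply the three thresholds stated in the text (all strict)."""
    return (
        hit["identity"] > min_identity
        and hit["length"] > min_length
        and hit["bitscore"] > min_bitscore
    )


def filter_hits(lines):
    """Return parsed hits that pass the filter, skipping blanks and comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = parse_hit(line)
        if passes_filter(hit):
            hits.append(hit)
    return hits


# Hypothetical example lines; identifiers and values are made up.
example = [
    "cds_001\ttranscript_A\t97.5\t850\t21\t0\t1\t850\t10\t859\t0.0\t1450",
    "cds_001\ttranscript_B\t88.0\t900\t108\t0\t1\t900\t5\t904\t0.0\t1100",
    "cds_002\ttranscript_C\t99.0\t150\t1\t0\t1\t150\t1\t150\t1e-70\t270",
]

if __name__ == "__main__":
    for hit in filter_hits(example):
        print(hit["query"], hit["subject"], hit["identity"], hit["bitscore"])
```

In the example only the first line survives: the second fails the identity threshold and the third fails the length threshold.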

Gene expression analysis using qPCR
The expression of 46 genes (23 over-expressed and 23 under-expressed in the egg-spot region) was further studied in three other species: Cynotilapia pulpican, Pseudocrenilabrus philander and Callochromis macrops. Primers were designed with the GenScript Real-time PCR (TaqMan) Primer Design software available at https://www.genscript.com/ssl-bin/app/primer. Where possible, primers were designed in exon-spanning regions to avoid effects of gDNA contamination. Primers were tested in all species, and in cases where primer pairs did not work we designed new species-specific primers. The genes studied and the primer sequences are available in Additional file 3.
Three qPCR experiments were carried out. qPCR experiment 1: gene expression was compared between the non-egg-spot anal fin tissue and the egg-spot tissue of C. pulpican (Fig. 1, n = 4-5); in this species the egg-spot is in a different position on the fin than in A. burtoni. qPCR experiment 2: gene expression was compared between the non-blotch anal fin tissue and the blotch tissue of P. philander (Fig. 1, n = 6). qPCR experiment 3: gene expression was compared between the non-blotch anal fin tissue and the blotch tissue of C. macrops (Fig. 1, n = 4-7). In all experiments each individual was an independent replicate, meaning that samples were not pooled.
The reactions were run on the StepOnePlus™ Real-Time PCR system (Applied Biosystems, USA) with FastStart Universal SYBR Green Master mix (Roche, Switzerland), following the manufacturer's protocols. All reactions were performed with an annealing temperature of 58 °C, a final cDNA concentration of 1 ng/μl and a final primer concentration of 200 ng/μl. The comparative threshold cycle (CT) method [83] was used to calculate the relative concentrations between tissues, with anal fin taken as the reference tissue and the Ribosomal protein L7 (rpl7) or Ribosomal protein SA3 (rpsa3) genes as endogenous controls. Primer efficiencies were calculated using standard curves. Efficiency values of the test primers were comparable to those of the endogenous control primers (rpl7, rpsa3) and are available in Additional file 3.
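The comparative CT calculation and the standard-curve efficiency estimate can be sketched as follows. This is a generic illustration of the 2^(-ΔΔCT) approach, which assumes roughly 100 % amplification efficiency for target and control primers; the function names, CT values and dilution series are hypothetical and do not come from the study.

```python
def relative_expression(ct_target_test, ct_control_test,
                        ct_target_ref, ct_control_ref):
    """Fold change of a target gene in test tissue relative to reference tissue.

    Each CT is first normalized to the endogenous control (e.g. rpl7),
    then the reference tissue (anal fin) is subtracted: 2 ** -(ddCT).
    """
    delta_test = ct_target_test - ct_control_test
    delta_ref = ct_target_ref - ct_control_ref
    return 2 ** -(delta_test - delta_ref)


def standard_curve_slope(log10_concentrations, ct_values):
    """Least-squares slope of CT against log10 template concentration."""
    n = len(ct_values)
    mean_x = sum(log10_concentrations) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_concentrations, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_concentrations)
    return sxy / sxx


def primer_efficiency(slope):
    """Amplification efficiency from the standard-curve slope (1.0 = 100 %)."""
    return 10 ** (-1 / slope) - 1


if __name__ == "__main__":
    # Hypothetical CT values: target gene and control in egg-spot vs anal fin.
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
    # Hypothetical ten-fold dilution series.
    slope = standard_curve_slope([0, -1, -2, -3], [20.0, 23.32, 26.64, 29.96])
    print(slope, primer_efficiency(slope))
```

A fold change above 1 corresponds to over-expression in the egg-spot or blotch relative to the surrounding anal fin tissue, and a value below 1 to under-expression.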