Comparative Genomic Analyses Provide New Insights into the Evolutionary Dynamics of Heterochromatin in Drosophila

The term heterochromatin has been long considered synonymous with gene silencing, but it is now clear that the presence of transcribed genes embedded in pericentromeric heterochromatin is a conserved feature in the evolution of eukaryotic genomes. Several studies have addressed the epigenetic changes that enable the expression of genes in pericentric heterochromatin, yet little is known about the evolutionary processes through which this has occurred. By combining genome annotation analysis and high-resolution cytology, we have identified and mapped 53 orthologs of D. melanogaster heterochromatic genes in the genomes of two evolutionarily distant species, D. pseudoobscura and D. virilis. Our results show that the orthologs of the D. melanogaster heterochromatic genes are clustered at three main genomic regions in D. virilis and D. pseudoobscura. In D. virilis, the clusters lie in the middle of euchromatin, while those in D. pseudoobscura are located in the proximal portion of the chromosome arms. Some orthologs map to the corresponding Muller C element in D. pseudoobscura and D. virilis, while others localize on the Muller B element, suggesting that chromosomal rearrangements that have been instrumental in the fusion of two separate elements involved the progenitors of genes currently located in D. melanogaster heterochromatin. These results demonstrate an evolutionary repositioning of gene clusters from ancestral locations in euchromatin to the pericentromeric heterochromatin of descendent D. melanogaster chromosomes. Remarkably, in both D. virilis and D. pseudoobscura the gene clusters show a conserved association with the HP1a protein, one of the most highly evolutionarily conserved epigenetic marks. In light of these results, we suggest a new scenario whereby ancestral HP1-like proteins (and possibly other epigenetic marks) may have contributed to the evolutionary repositioning of gene clusters into heterochromatin.


Introduction
The organization of eukaryotic genomes into euchromatin and heterochromatin represents one of the most important and still unsolved aspects of genome evolution. Heterochromatin was originally identified from the early observation that specific regions of interphase nuclei possess distinctive staining properties [1]; it remains an elusive component of the eukaryotic genome. Our understanding of its nature and properties has progressively expanded through decades of intensive research, initially in cytological and classical genetic studies and later by sophisticated molecular and in silico techniques. Key steps have been the identification of two types of heterochromatin, constitutive and facultative [2]; the discovery of the unique genetic properties of heterochromatin [3,4,5,6,7]; the mapping of satellite DNAs, transposable elements and other repeated sequences in the heterochromatin [8,9,10,11,12]; the recent advances in whole genome sequencing [13,14,15]; and the systematic analysis of heterochromatin-binding proteins [16,17], and of histone modifications in heterochromatin [18,19].
However, our knowledge of heterochromatin is far from being exhaustive or satisfactory. For example, dozens of essential genes and hundreds of putative genes have been identified in the heterochromatin of D. melanogaster [12,13,14,15], but their existence is quite paradoxical [6,7]. In fact, a signature feature of heterochromatin is its ability to silence euchromatic genes that are brought within a heterochromatic environment following a chromosome rearrangement or a transposition event, a well-known phenomenon called position effect variegation (PEV) which provides an important model for studying the mechanisms regulating gene repression by chromatin modifications [20,21,22,23,24].Yet, the single-copy genes embedded in the heterochromatin of D. melanogaster are bound by specific proteins [25,26,27], show a pattern of modified histones [18] and their proper expression depends on their heterochromatic location [28,29], despite the fact that they are transcribed from promoter regions sharing basic similarities with those of euchromatic genes [30,31].
To our knowledge, the evolutionary history of the D. melanogaster heterochromatic genes has only been addressed in two studies, that have investigated the chromosomal location in D. pseudoobscura and D. virilis of: i) a small cluster of genes, including the light gene, located in the heterochromatin of chromosome 2 [30], and ii) the two adjacent RPL15 and Dbp80 genes located in the heterochromatin of chromosome 3 [31]. In both cases, the counterparts of D. melanogaster genes were shown to map in euchromatic regions in D. pseudoobscura and in D.
virilis, suggesting a repositioning of these genes during genome evolution in the Drosophilidae lineage.
Comparative studies of genes located on the dot chromosomes of D. melanogaster and D. virilis, both of which show heterochromatic properties, have shown that most of the genes maintain an overall synteny and only rare instances of "wanderer" genes (present in a euchromatic chromosome arm in one species and on the dot chromosome in the other) were found [32]. However, although the D. melanogaster dot chromosome 4 is considered heterochromatic, it shares only certain properties with the pericentric regions of the large autosomes, and genes located there are likely to be under different gene expression constraints [33].
To investigate the evolutionary dynamics underlying the emergence of heterochromatic genes in D. melanogaster, we identified and mapped their orthologs in D. pseudoobscura and D. virilis genomes. We took advantage of the extensive whole genome sequencing data currently available for at least 12 Drosophilidae species and of recent comparative studies confirming the overall conservation of chromosomal synteny in Drosophilidae [34,35,36]. In particular, we focused our analyses on 53 single-copy genes mapped to the heterochromatin of chromosome 2 of D. melanogaster (Fig 1). This genomic region corresponds to about 18.3 Mb and has been extensively characterized in the last decades at genetic, cytological and molecular levels for the presence of essential and putative genes and other genetic loci [37,38,39,40,41].
Our comparative studies show that the orthologs of chromosome 2 heterochromatic genes of D. melanogaster tend to be clustered at specific regions of separate chromosomal elements of D. pseudoobscura and D. virilis. These results suggest that the repositioning of genes to pericentric heterochromatin and their interspersion with numerous heterochromatic sequences, as In mitotic chromosome preparations of D. melanogaster, the pericentric heterocromatin has been differentiated into cytogically defined bands using specific chromosome banding techniques [3]. The heterochromatic regions of the left arm of chromosome 2 (2Lh) are numbered from h35 to h37, whereas those of the right arm of chromosome 2 (2Rh) are numbered from h39 to h46. Region h38 indicates the centromere (C). While most heterochromatin genes of the 2L arm appears to be restricted to region h35 (polytenic region 40), genes from the right arm are more scattered being localized in h41, h44 and h46 [52] and thus are separated by megabases of satellite DNA blocks and other repeated DNA sequences [12,13,15]. Only the mapping of the heterochromatic genes studied in this work is reported (see Table 1). For a more detailed map see Dimitri et al. [52]. The mapping of a group of Drosophila heterochromatin gene orthologs in D. virilis and D. pseudoobscura was retrieved from the work of Schaeffer et al, (supplemental Tables 21 and 24, [35]; these genes are shown in black, while the orthologs mapped in this work are shown in red. The heterochromatin of chromosome 2Rh was estimated to contain at least 12.1 Mb DNA [14], while according to the genomic coordinates of Release 6.03 it encompasses 5.2 Mb. This apparent discrepancy can be explained considering that heterochromatin scaffolds (named 2R-Het or 2R-U in the previous Release 5.1 [48] have been incorporated as an uninterrupted sequence in Release 6.03, without taking into account that those scaffolds are separated by megabases of satellite DNA stretches that are not included in the sequence [12,13,15]. seen in D. melanogaster, may have originated concomitantly with the rearrangements leading to the centric fusion of two separate chromosomal elements during the evolution of the Drosophilidae karyotype [35,42,43,44]. We also have data suggesting that ancestral HP1-like proteins may have contributed to the evolutionary repositioning of gene clusters to pericentromeric heterochromatin.

Results
Searching for chromosome 2 heterochromatic gene orthologs in Drosophila species The phylogenetic relationships among the syntenic chromosomes as defined by Muller in D. melanogaster, D. pseudoobscura, its sibling species D. persimilis and D. virilis [45] are shown in Fig 2. These species share a common ancestor dated around 40-50 million years ago, their genomic assembly is almost complete, and thus they are very useful for comparative genomic studies. Since D. pseudoobscura and D. persimilis genomes show a very low level of divergence [46], D. persimilis was included in this study as a useful source of information to compensate for the absence or partial lack of data from D. pseudoobscura.
The cytogenetic location of D. melanogaster heterochromatic genes studied in this paper is shown in the mitotic heterochromatin map of chromosome 2 (Fig 1, Table 1) [3]. The most distal mitotic bands, h35 (2L, left arm) and h46 (2R, right arm), are the only heterochromatic regions that can be resolved on Bridges' polytene map and contain the transition zone between heterochromatin and euchromatin [39,40,47]. The euchromatin-heterochromatin cytogenomic borders, as defined by Hoskins et al [13,14] and by Riddle et al [19], fall within these regions, and all the genes reported in Fig 1 should  Heterochromatic genes, as well as numerous euchromatic genes, have been annotated as having orthologs in several Drosophila species [36,48]. A database, Ortho DB, is easily accessible and represents an important source of information [49]. However, at present a large number of orthologs of D. melanogaster heterochromatic genes have been only partially annotated or not yet identified in other Drosophila species. The scheme shows the phylogenetic relationships between D. melanogaster, D. pseudoobscura, its sibling species D. persimilis and D. virilis, as defined by Muller [45] Using tBLASTn (see Materials and Methods), we were able to retrieve ortholog candidates of 9 D. melanogaster heterochromatic genes in D. pseudoobscura, D. persimilis and D. virilis. Some were already annotated, while others were not. All 9 retrieved genes appear to be true orthologs based on the following features: 1) strong sequence similarities in coding regions and exon-intron structure among the analyzed species; 2) high level of identity of deduced proteins; 3) no evidence of related duplicated sequences within the genomes found by FISH or BLAST analyses.
The identified D. melanogaster heterochromatic gene orthologs with their genomic coordinates are listed in Table 2 and structural comparisons between species are shown in S1 Fig. Mapping D. melanogaster heterochromatic gene orthologs in D. pseudoobscura and D. virilis Schaeffer et al. [35] produced an integrated physical and cytogenetic map of 11 species of the Drosophila genus, anchoring the genome assembly scaffolds to the polytene chromosome map. We first analyzed in detail the genomic coordinates of the assembled scaffolds containing specific marker genes [Tables S21 and S24 in ref. 35], to retrieve the polytene chromosome maps of a group of D. melanogaster heterochromatic gene orthologs in both D. pseudoobscura and D. virilis (Table 1). To confirm these data and to extend mapping to a larger number of orthologs, we performed fluorescent in situ hybridization (FISH) on polytene chromosomes.
For each gene, we cloned PCR fragments to be used as specific probes (see Mat and Met). Examples of FISH mapping are shown in Fig 3 and the results are summarized in Table 1.
It is clear that the D. melanogaster heterochromatic orthologs tend to be clustered at specific regions of the D. pseudoobscura and D. virilis genomes ( Table 2 and Fig 4). In D. virilis three main clusters were found at polytene regions 47C, 42F-43A and 55D, that we will call Dvir_47C, Dvir_42F-43A and Dvir_55D, respectively. Dvir_47C and Dvir_42F-43A are on chromosome 4 (Muller B element), while Dvir_55D is on chromosome 5 (Muller C element). The Dvir_47C harbors 16 orthologs of D. melanogaster heterochromatin genes located in 2Lh, together with at least 3 other orthologs from 2Rh. Dvir_42F-43A contains 18 orthologs from 2Rh, while Dvir_55D harbors 11 orthologs from 2Rh (mainly localized in region h46 of the mitotic map).
In D. pseudoobscura three main clusters were also found at polytene regions 63A, 83A and 82AB, that we will call Dpse_63A, Dpse_83A and Dpse_82AB. The Dpse_83A and Dpse_82AB main clusters on the Muller B element share at least 15 and 11 orthologs with the Dvir_47C and Dvir_42F-43A clusters on Muller B, respectively. The Dpse_63A Muller C shares 9 orthologs with the Dvir_55D main cluster and 2 with the Dvir_53D minor site on the Muller C element.
In addition, 12 D. melanogaster heterochromatic gene orthologs were not found in the above mentioned clusters and map to different locations in both the D. virilis (CG40285 at Dvir_42C; CG40129 at Dvir_59F; CG17678 at Dvir_49A; CG40218 and CG11066 at Dvir_53D) and D. pseudoobscura (CG17493 at Dpse_96B; CG41265 at Dpse_94A; CG17684 at Dpse_93D; CG42596 at Dpse_93A; CG17715, CG18140 and CG40191 at Dpse_89B). With the exception of the CG17715, CG18140 and CG40191 orthologs, found together at Dpse_89B, and CG40218 and CG11066 found together at Dvir_53D, the other 7 orthologs represent evidence of individual gene movements.
Our analysis of D. melanogaster heterochromatic gene orthologs in D. virilis and D. pseudoobscura shows that i) a cluster of orthologs retained its location on Muller B and Muller C elements in the three species analyzed; ii) a group of 14 D. melanogaster Muller C heterochromatic genes were originally located in the Muller B element (Table 1 and Fig 4) of D. virilis and D. pseudoobscura

Analysis of syntenic relationships among Drosophila species
To explore in more detail the gene content and syntenic relationships between the clusters identified by FISH, we compared annotated genes of D. pseudoobscura with their orthologs in D. virilis and D. melanogaster, looking at the relative direction of transcription. The genes we examined are shown in Tables 3, 4   As shown in Fig 5A and Table 3, 194 Kb of Dpse_63A harbors 17 genes; 13 are colinear with those located at Dvir_55D, 2 with those located at Dvir_53D, while 2 genes (numbered 15 and 16 in Table 3) are not present at either Dvir location. Together, 15 genes maintain the same transcription direction in both Dpse_63A and Dvir_55D-53D. This suggests that Dpse_63A results from a recombination event that occurred between Dvir_55D and Dvir_53D in an ancestral chromosome like that in D. virilis. Notably, the same gene arrangement expected to be produced by the reciprocal event of such a hypothetical recombination is found at Dpse_78B of D. pseudoobscura. This arrangement in turn matches perfectly (and hence is  Table 3. List of D. pseudoobscura genes at Dpse_63A orthologous to genes of D. virilis and D. melanogaster.

D_mel
Dpse_63A D_vir Name Chr. position Name Chr. position  Table 4). Furthermore, 12 out 15 genes in these two syntenic blocks of the Muller C elements of D. pseudoobscura and D. virilis are found scattered on the Muller C element in D. melanogaster heterochromatin (Table 3 and Fig 4). We compared the gene content of the Dpse_83A block of the Muller B element with that of the Dvir_47C syntenic block of the Muller B element (Fig 5B and Table 5). Within the 123 Kb region analyzed of Dpse_83A, 23 genes are found that are colinear with genes located within the 112 Kb of Dvir_47C, sharing same transcription direction. Among those genes, 13 are orthologs of D. melanogaster 2Lh genes (Muller B), whereas 5 are orthologs of D. melanogaster 2Rh genes (Muller C); two remaining genes are orthologs of still unmapped D. melanogaster heterochromatin genes (CG12423 and CG17878). Thus, 20 out of 23 genes in Dvir_47C and Dpse_83A syntenic blocks are embedded in pericentric heterochromatin in D. melanogaster.
As shown in Table 1, among the D. melanogaster 2Rh gene orthologs found, 11 map to Dpse_82AB and 18 map to Dvir_42F-43A. A detailed comparative analysis of Dvir_42F-43A and Dpse_82AB cannot be performed, because Dpse_82AB is not assembled in a continuous scaffold and the orthologs are either not annotated or found in different scaffolds, most of which are unmapped (Table 6). Thus, we used the assembled D. virilis genome sequence to further analyze 400 Kb of the Dvir_42F-43A region. 50 genes (from CG42595 to CG5277) are located in this region, (Dvir:scaffold_12963:13,968,303..14,371,342; Table 6). As previously shown (Table 1), 18 genes of Dvir_42F-43A (Muller B) were confirmed to be orthologous to genes of D. melanogaster 2Rh (Muller C). Our FISH mapping has also shown that 11 of these genes map to Dpse_82AB (Muller B), while 5 of them were previously not annotated (Table 1). It is therefore conceivable that Dvir_42F-43A is syntenic with Dpse_82AB, both regions sharing clustered D. melanogaster 2Rh gene orthologs.
The analysis of the Dvir_42F-43A region also revealed an intriguing result. Adjacent to the cluster of Dmel_2Rh genes orthologs, we found another gene cluster which contains orthologs of genes located in polytene chromosome region 31 (Muller B) of D. melanogaster (Table 6). Interestingly, in D. pseudoobscura most of the orthologs are found in unmapped scaffold, that usually contains genomic regions derived from heterochromatic regions [34,36]. Table 4. List of D. pseudoobscura genes at Dpse_78B orthologous to genes of D. virilis and D. melanogaster.

D_mel
Dpse_78B D_vir Name Chr. position Name Chr. position The nature of Dpse_63A, Dpse_83A and Dpse_82AB regions of D. pseudoobscura The Dpse_63A, Dpse_83A and Dpse_82AB blocks are proximal to chromocentric heterochromatin. Interestingly, hybridization signals produced by most gene probes mapping to Dpse_63A, Dpse_83A and Dpse_82AB tend to exhibit a dispersed and grainy morphology (Fig  3), compared to the sharp euchromatic signals. This morphology is distinctive of sequences derived from partially polytenized heterochromatin found in the so-called β-heterochromatin of the chromocenter [50,51]. Other peculiar characteristics of pericentromeric heterochromatin in D. melanogaster are the enrichment of repeated sequences and the increased size of gene introns, compared to euchromatin [52]. We then asked whether the repositioning of gene clusters in regions located proximal to the chromocenter in D. pseudoobscura (Fig 4) Both mitotic and polytene chromosome map of D. melanogaster heterochromatic genes are indicated.
The gene marked by the asterisk is intronless doi:10.1371/journal.pgen.1006212.t005 The block Dpse_83A contains 9.0% repeats, while 1.7% was observed in the syntenic Dvir_47C. A precise repeat content of block Dpse_82AB cannot be estimated because this region is found fragmented in several unmapped scaffolds (see Table 6) which are likely to correspond to repeat-rich heterochromatin regions [34,36]. The repeat content of the inferred syntenic block Dvir_42F-43A was estimated to be low, around 3.0%. Thus, it appears that the repositioning of gene clusters from D. virilis to D. pseudoobscura pericentric regions has occurred concurrently with an increase of repeats, which mainly correspond to transposable element-homologous sequences (S2 Fig).
The intron size estimates of assuredly homologous introns found in the coding regions of genes present in the syntenic blocks are reported in S2 Table. Intronless genes and genes with unknown intron size have been not considered; genes with partially sequenced introns have been included in the analysis, which may give rise to understimated intron size. Genes from the Dpse_78B/Dvir_53D-55D euchromatic gene cluster have been used as control.
This analysis revealed that intron size of genes in the syntenic clusters Dpse_63A/Dvir_55D and Dpse_83A/Dvir_47C remains stable, with a mean value ranging from 427 bp at Dpse_63A to 597 bp at Dvir_55D, and from 362 bp at Dpse_83A to 418 bp at Dvir_47C. An increase in intron size was observed in genes of the proximal syntenic blocks Dpse_82AB/Dvir_42F-43A; in this case the mean value of Dpse_82AB cluster is about 2 fold that of Dvir_42F-43A cluster and is mainly due to the intron increase of 2 orthologs (CG12559 and CG15848) among the 10 genes present in the clusters. A consistent intron size increase is indeed observed in the D. melanogaster orthologs of the three above mentioned clusters. The mean values being 10 fold higher for orthologs of Dpse_63A cluster and 7-8 fold higher for those of Dpse_83A and Dpse_82AB.
Taken together, it appears that the repositioning of the analysed gene clusters from D. virilis to D. pseudoobscura has occurred concurrently with an accumulation of repeats, while the increase in intron size has been limited. By contrast, a significant increase in intron size of D. melanogaster heterochromatic genes is apparent.

Immunolocalization of HP1a on polytene chromosomes of D. virilis and D. pseudoobscura
The genomic organization of region Dvir_42F-43A in D. virilis opens an interesting issue. Although the polytene region 31 lies in the euchromatin of D. melanogaster, it shows some heterochromatic features; it has a poorly visible banding pattern with a "gooseneck" chromosomal morphology and shows abundant association with the heterochromatin protein HP1a, an evolutionarily conserved epigenetic mark very abundant in heterochromatin [53,54,55]. To test whether this association was evolutionarily conserved, we performed immunofluorescence (IF) experiments on D. virilis and D. pseudoobscura polytene chromosomes using anti-HP1a antibodies. The results clearly show that HP1a is present at Dvir_42F-43A (Fig 6) and remarkably also at Dvir_47C and Dvir_55D, where the other gene clusters are located. Similarly, in D.
arrows delimit the analyzed scaffold regions. The D. melanogaster gene name is shown for three genes used as markers (RYa, Yeti and scarface). Big arrows at the Dvir_55D and Dvir_53D in D. virilis chromosomes show the breakpoints of to the hypothetical recombination sites leading to the gene arrangement at the region Dpse_63A and Dpse_78B in D. pseudoobscura. The Dpse_78B cluster is syntenic with the Dmel_47C1-3 in D. melanogaster (lower row). The list of the genes numbered in the arrowheads is reported in Table 3 and Table 4. (B) Syntenic relationships between Dpse_83A and Dmel_47C regions. Genes are depicted as arrowheads showing the transcription orientation and coloured according to their localization in D. melanogaster (box). Small arrows delimit the scaffold region analized. The D. melanogaster gene name is shown for three genes used as markers (light, RpL5 and RpL21). The complete list of genes numbered in the arrowheads is shown in Table 5.  pseudoobscura HP1a is present at polytene regions Dpse_63A, Dpse_82AB and Dpse_83A where it colocalizes with the gene clusters (Fig 6). To further investigate the association of HP1a with the gene clusters, we performed sequential IF/FISH experiments where IF with anti-HP1a antibodies was followed by FISH using specific probes. We focused our attention on  the Dvir_42F-43A cluster, using PCR probes of the GJ22088, GJ17515, GJ21982 and GJ21862 genes which are orthologous to the CG42595, CG17665, CG41265 and CG42596 heterochromatic genes of D. melanogaster, respectively ( Table 6). As shown in Fig 7A, the FISH signals of CG41265 (GJ21982) and CG42596 (GJ21862) clearly colocalize with the strong HP1a signal found at division Dvir_42F. In particular, CG41265 (GJ21982) and CG42596 (GJ21862) map to the proximal and distal end of the HP1 region, respectively. The FISH signals of CG42595 (GJ22088) and CG17665 (GJ17515), on the other hand, map to Dvir_43A and therefore are linked to the HP1a signal, but not included in it. Interestingly, in D. pseudoobscura the CG42595 ortholog is included in the large HP1 signal mapped to Dpse_83A, while the CG42596 ortholog maps to Dpse_93A, a region located in the middle of euchromatin, which is also strongly stained by the anti-HP1 antibody (Fig 7B). Together, these results reveal an unexpected, evolutionarily conserved association between HP1a and the clusters of heterochromatic gene orthologs, even though not all genes in the cluster are included in the region decorated by the anti-HP1a antibody.

Discussion
To investigate the evolutionary dynamics of heterochromatin, we have studied 53 heterochromatic genes of D. melanogaster chromosome 2, asking whether their current location in the genome originated from random events or arose as a consequence of specific rearrangements occurring during chromosome evolution. Among the genes, 17 map to the mitotic heterochromatin region h35, while 36 are spread out along a large portion of 2R mitotic heterochromatin spanning bands h41-h46 (Fig 1) [41,52].
The results of this analysis provide evidence that the orthologs of current D. melanogaster heterochromatic genes are clustered at three main syntenic regions in D. virilis and D. pseudoobscura (Table 1, Fig 4 and Fig 5). Interestingly, in D. virilis the genes are present across a few hundred kb, while in D. melanogaster the genes are found scattered over several Mb of DNA. Moreover, in D. virilis the gene clusters are located in the middle of euchromatin, in regions with low repeat content, while in D. pseudoobscura the clusters are associated with the distal portions of pericentric heterochromatin and are enriched in repeated sequences. Notably, the accumulation of repeated sequences in pericentric heterochromatin during Drosophila evolution it is a well known phenomenon [56] which is unlikely to be reversible.
Based on the Drosophila phylogeny (Fig 2; and ref 57,58) and on trasposable element enrichment of D. pseudoobscura clusters compared to those of D. virilis (S2 Fig), we propose that the ancestral clusters were located in euchromatin and underwent repositioning into pericentromeric regions in the lineage that gave rise to the melanogaster subgroup species [57,58]. A similar trend has been previously suggested for the evolution of a cluster of seven 2Lh genes of D. melanogaster (30). However, further analyses are needed to fully test this hypothesis.
Remarkably, we have also found that in both D. virilis and D. pseudoobscura the clusters show a striking association with the HP1a protein (Figs 6 and 7), one of the most highly evolutionarily conserved epigenetic marks of heterochromatin [59,60,61].
The only exception to the clustering is given by seven genes (CG17684 CG41265, CG42596, CG17678, CG17493, CG40285 and CG40129) that appear to have moved individually (Table 1 and Fig 4). In mammalian genome evolution, transposition events can drive the movement of individual genes from sex chromosomes to autosomes [62]. Similarly, in Drosophila phylogeny, movements of individual genes have occurred by retrotransposition and DNA-based transposition [63]. The mechanism of movement of these seven genes is unlikely to be due to retrotransposition or recombination, as suggested by their conserved exon-intron structure (S3 Fig), In fact, CG17684 CG41265, CG42596, CG17678 show an identical intron-exon structure in the three species. CG17493 is intronless in all the species, CG40285 is intronless in D. virilis and D. pseudoobscura, but carries a single intron in D. melanogaster, while CG40129 has a similar complex intron structure in both D. melanogaster and D. virilis and it is not assembled in D. pseudoobscura. It is then possible that the above mentioned seven genes may have moved by DNA-based transposition, although the molecular basis underlying their chromosomal evolution needs further investigations.

The genes at the pericentric regions of Muller C and Muller B elements
In the Muller C element of D. pseudoobscura we found a cluster of 13 orthologs of D. melanogaster 2R heterochromatin, located at ± 200 Kb DNA from the proximal region Dpse_63A. In the Muller C element of D. virilis, 10 out of these 13 orthologs map to the euchromatic regions Dvir_55D, 2 map to Dvir_53D, while one (CG30438) was not detected in these regions.
Our analysis suggests that the arrangement of genes within Dpse_63A arose by recombination between Dvir_55D and Dvir_53D in an ancestral Muller C element like that of D. virilis (Fig 5A). Evidence for this recombination event relies on the proximity of GJ20406 and GJ22228 at Dpse_63A and reciprocally by the proximity of GJ20124 to GJ21920 found at Dpse_78B (Tables 3 and 4). The junction GJ20406-GJ22228 is present in the obscura group species, in D. ananassae and in D. willistoni, but not in the melanogaster subgroup species (D. erecta, D. yakuba, D. sechellia, D. simulans, D. melanogaster), whereas the junction GJ20124-GJ21920, representing the reciprocal event, is found in all the species analyzed except D. willistoni. Moreover, the GJ20406-GJ22228 association is almost proximal to the pericentromeric heterochromatin as seen in D. pseudoobscura, while the GJ20124-GJ21920 association maps to the euchromatin in all the analyzed species.
Taken together, these observations suggest that: i) the hypothetical recombination event occurred in the ancestor of the lineage that led to the subgenera Drosophila and Sophophora and ii) the two reciprocal events have had different evolutionary fates. Indeed, in the melanogaster subgroup species, gene order was lost in the pericentromeric region, yet was maintained in euchromatic regions. Furthermore, when we compare the gene order in the syntenic blocks Dpse_63A/Dvir_55D-53D with that found in the D. melanogaster 2R heterochromatin (Table 3), a scrambled arrangement of the 13 orthologous heterochromatin genes is apparent, suggesting that recurrent internal rearrangements occurred during chromosome evolution of this pericentric region.
Dpse_83A and Dpse_82AB adjacent regions are located proximally to the chromocenter in the D. pseudoobscura Muller B element and harbor at least 29 orthologs of D. melanogaster heterochromatic genes. Dpse_83A contains 13 orthologs of Dmel_2Lh and at least 5 orthologs of Dmel_2Rh ( Fig 5B, Table 5), while the Dpse_82AB carries 11 orthologs of Dmel_2Rh (Table 1). Our data indicate that Dpse_83A and Dpse_82AB are syntenic with Dvir_47C and Dvir_42F-43A of the Muller B chromosome (Tables 5 and 6). The novelty of this observation is that during chromosome evolution, further rearrangement must have occurred to separate the two contiguous regions Dpse_82AB and Dpse_83A, giving rise to the relocation of a group of genes from these two Muller B regions to a pericentromeric Muller C region in the descendent lineage, that is the D. melanogaster 2R heterochromatin (Fig 4).
The mechanistic details responsible for this event are unknown, because extensive sequence data are not available for the Dpse_82AB-Dpse_83A regions in the D. pseudoobscura genome assembly. Some observations can, however, help to reconstruct the molecular events which occurred at the time of the fusion of two separated Muller B and Muller C arms into a single chromosome.
The first observation is the finding of duplicated sequences (CG3057) at Dpse_83A (Fig 5B  and Table 5). Although these duplications cannot be associated with specific breakpoints, they might have arisen by staggered single-strand breaks, as postulated in the isochromatid model by Ranz et al. [64]. The second observation is the scrambled arrangement of the D. melanogaster heterochromatic genes compared to the gene order seen in the syntenic blocks Dpse_83A/ Dvir_47C (Table 5) and in Dvir_42F-43A (Table 6), which would suggest numerous internal rearrangements during chromosome evolution. In addition, these rearrangements may have contributed to the expansion of DNA sequences, since the location of Dmel_2Rh genes spans megabases of DNA (from mitotic bands h41 to h46).
Notably, in Dvir_42F-43A the cluster of Dmel_2Rh gene orthologs is flanked by a cluster of D. melanogaster gene orthologs mapping to polytene region 31, in the middle of euchromatin (Table 6), a location known to be enriched in HP1a protein. We indeed found that the HP1a protein is associated with the gene cluster at Dvir_42F-43A (Fig 6) and also with the other clusters of both D. virilis and D. pseudoobscura (Fig 7). If such association represents an ancestral condition, then the D. virilis clusters may represent proto-heterochromatic sites whose repositioning and expansion led to the origin of pericentric regions of D. melanogaster chromosome 2.
Together, our results are consistent with a scenario where the D. melanogaster heterochromatin genes of chromosome 2 arose through an evolutionary repositioning from a euchromatic location (D. virilis) to heterochromatin, passing through an intermediate location in regions associated with distal heterochromatin (D. pseudoobscura). A similar trend was described by a study on the evolution of the heterochromatic gene cluster containing the light gene [30]. Recently, we have reported that the heterochromatic Yeti gene underwent a similar repositioning [65]. Here we showed that Yeti was part of a euchromatic gene cluster that was relocated by recombination to the heterochromatin of chromosome 2 (Fig 5A).

HP1 and repositioning of gene clusters to pericentric heterochromatin
Previous studies have investigated the epigenetic states that enable the expression of heterochromatic genes and render their promoters heterochromatin-dependent [18,33], yet little is known about the evolutionary process(es) through which this has occurred. The results of the present study indicate that, in the Drosophilidae evolution, movement of genes from euchromatin to heterochromatin preferentially occurred by evolutionary repositioning of gene clusters which show close association with the HP1a protein (Figs 6 and 7). The molecular mechanisms responsible for repositioning of the large clusters are presently unknown, because of our poor knowledge of the breakpoints involved in the rearrangements and arrangements of genes flanking the regions under analysis. However, comparative studies carried out in Drosophila species have shown that during chromosomal evolution, transposable elements and other repetitive sequences have been implicated in the generation of rearrangements [34,35,40,42,64].
The association of D. virilis and D. pseudoobscura gene clusters with HP1a is indeed intriguing. Spellman and Rubin [66] have suggested that block conservation is needed to regulate a coordinated expression of the genes present in a block. There is evidence showing that HP1a, among its versatile functions as an element of the chromatin architecture and in gene silencing, also has a stimulating effect on the expression of both heterochromatic genes and region 31 genes [67,68,69].
Although not all genes within the clusters cytologically map to the HP1a-enriched portion (Fig 7), it is unlikely that such association arose by chance. It is worth noting that the association with HP1a is conserved even by genes that have moved away from the cluster (see the example of the Dmel_CG42596 ortholog found at Dpse_93A (Fig 7B). In addition, while the ortholog of Dmel_CG42595 lies at the very proximity of the HP1a signal at Dvir_43A, but is not included in it, in D. pseudoobscura it has become included in the HP1 signal at Dpse_83A.
Could we interpret this association with HP1a (and possibly with other epigenetic regulators) in an evolutionary perspective? If so, might the presence of HP1 in the ancestral domains have contributed to the success of gene repositioning into pericentric heterochromatin? Although the results of the present work cannot provide a final answer to these questions, some speculations about possible scenarios are possible.
We could envisage an evolutionary scenario where ancestral HP1-like association may have favored the expression of relocated gene clusters. It is well known that euchromatic genes moved into pericentric heterochromatin are subjected to position effect variegation (PEV) [21]. However, heterochromatic genes of D. melanogaster such as light and rolled are bound by HP1a which positively contributes to their expression [25,26,70]. We can speculate that genes in ancestral "proto-heterochromatic" clusters were already used to being active in an HP1a environment, and were hence less likely to exhibit deleterious position effects when moved to a pericentric region. Thus, at least in certain instances, the ancestral association with HP1 would have acted as an "epigenetic shield", protecting the relocated gene clusters against silencig effect, and favoring their expression in pericentromeric regions.
In addition, ancestral HP1-like may have contributed to the formation of rearrangements that eventually led to the gene cluster repositioning. For instance, HP1 may have mediated long-distance interactions between different HP1-chromatin domains (i.e., located in euchromatin and in pericentric heterochromatin). If DNA breaks occurred in physically close domains, then rearrangements may eventually be generated following end-joining repair events, thus promoting the repositioning of HP1-associated chromatin domains.
Although further experiments will be required to investigate the evolutionary significance of the association of HP1 with the D. virilis and D. pseudoobscura gene clusters, our work provides previously unanticipated results that will contribute to the understanding of the molecular evolution of genes embedded in pericentric heterochromatin.

Drosophila species and gene nomenclature
The following species were used for the cytological preparations: D. pseudoobscura (14011-0121.94), and D. virilis (15010-1051.00), from Tucson Drosophila Stock Center, University of Arizona, USA. Polytene chromosomes were prepared according to Pardue [71]. Gene names reflect the species-specific nomenclature, or use the D. mel gene symbol.

Molecular probes
Species-specific probes were generated by PCR on genomic DNAs. Opposite primers were selected on the basis of published sequenced genomes and chosen to avoid the inclusion of intronic sequences. The list and the size of the probes generated are in S1 Table. Primers obtained from the D. persimilis sequenced genome were used to amplify the orthologs CG12559, CG12547, CG42595, and CG41265 with D. persimilis (14011-0111.01) and/or D. pseudoobscura DNAs. Amplified DNA fragments were eluted from agarose gels and cloned in pGEM-T vector (Promega). In every case the plasmid probes were verified by DNA sequencing.

Fluorescence in situ hybridization (FISH)
Fluorescence in situ hybridization (FISH) was performed according to Pimpinelli et al. [72].
Squashed preparations of polytene chromosomes from salivary glands dissected from third instar larvae of D. virilis were denatured and hybridized with Cy3-dCTP or FluorX-dCTP (GE Healthcare) labeled probes. CG41265 and CG17665 were simultaneously localized using mixed probes. Polytene chromosomes were stained with DAPI, 4', 6'-diamidine-2'phenylindole-dihydrochloride. Chromosome preparations were analyzed using a computer controlled Nikon E1000 epifluorescence microscope equipped with a cooled CCD camera (Coolsnap). Digital images were obtained using an Olympus epifluorescence microscope equipped with a cooled CCD camera. Gray scale images, obtained by separately recording Cy3, FluorX and DAPI fluorescence with specific filters, were pseudo colored and merged for the final image using Adobe Photoshop. Labelled sites were identified on the basis of published polytene maps [35].

Immunofluorescence
Polytene chromosomes of D. virilis and D. pseudoobscura were HP1 immunostained according to James et al. [53] using the C1A9 anti-HP1 antibody. In brief, salivary glands were rapidly dissected in Cohen and Gotchell medium G containing 0.5% Nonidet P-40 and incubated in 2% formaldehyde fixative solution for 25 min. The preparations were incubated with monoclonal anti-HP1 C1A9 antibody (1:50) overnight at 4°C in a humid chamber. The slides were washed in TBS/0.05% Tween 20 three times for 5 min and incubated with secondary antibody 1:40 dilution of FluoroLink Cy2-labeled goat anti-mouse (Amersham Biosciences) for 1 h at room temperature in a humid chamber. Finally, the slides were washed three times in TBST at 4°C, stained with 4,6-diamidino-2-phenylindole (DAPI) at 0.01 μg/ml, and mounted in antifading medium.

In silico search
Genomic Databases on the Flybase website (http://flybase.org; version FB2014_06) were searched by TBLASTN using amino acids from individual translated exons of D. melanogaster heterochromatic genes. Only those genes lacking a clear orthologous member in the OrthoDB database were searched. To retrieve orthologs we did a tblastn search using the amino acids of single or of two contiguous exons of D. melanogaster over the genomes of D. virilis and D. pseudoobscura (or D. persimilis when the query produced partial information in D. pseudoobscura).
When the results showed high scores (E-value < e-80) and most importantly, all the subject results were in the same plus or minus frame in adjacent sequences, we then assembled a complete coding region that was compared with the D. melanogaster gene structure. Splice junctions were analysed by visual inspection comparing exons in the pairwise alignment of two DNA sequences. By these criteria we were able to reconstruct a bona fide structure of the gene. Multiple sequence alignments were performed with ClustalW procedure available at EMBL-EBI (http://www.ebi.ac.uk) or with the multialin interface at http://multialin.toulouse. inra.fr. Assembly of exon-intron was obtained by manual inspection of high-scoring alignments obtained using the Blast tools with translated ORF of D. melanogaster proteins and supported by EST annotated sequences. The changes proposed to the orthologous heterochromatin genes for the bestfit alignments are reported in S3 Table. To estimate the repeat content within syntenic blocks, DNA sequences were retrieved from Flybase and repeats were identified using Censor [73], implemented at Repbase (www.girinst. org/censor/index.php) and the Arthropoda dataset. The masked sequences correspond to transposable elements and no satellite/simple repeats were masked in the final output. The masked sequences were filtered out retaining only alignments longer than 100 bp and with similarity greater than 80%, roughly corresponding to a score greater than 800. These values were used as threshold in our analysis which allowed elimination of spurious matches that might inflate the repeat content and prevented as much as possible false positive cross-species matches. Bed files containing the masked positions were uploaded and visualized as custom tracks in Gbrowse (www.flybase.org).