Comparative Mapping of the Macrochromosomes of Eight Avian Species Provides Further Insight into Their Phylogenetic Relationships and Avian Karyotype Evolution

Avian genomes typically consist of ~10 pairs of macro- and ~30 pairs of microchromosomes. While inter-chromosomally, a pattern emerges of very little change (with notable exceptions) throughout evolution, intrachromosomal changes remain relatively poorly studied. To rectify this, here we use a pan-avian universally hybridising set of 74 chicken bacterial artificial chromosome (BAC) probes on the macrochromosomes of eight bird species: common blackbird, Atlantic canary, Eurasian woodcock, helmeted guinea fowl, houbara bustard, mallard duck, and rock dove. A combination of molecular cytogenetic, bioinformatics, and mathematical analyses allowed the building of comparative cytogenetic maps, reconstruction of a putative Neognathae ancestor, and assessment of chromosome rearrangement patterns and phylogenetic relationships in the studied neognath lineages. We observe that, as with our previous studies, chicken appears to have the karyotype most similar to the ancestor; however, previous reports of an increased rate of intrachromosomal change in Passeriformes (songbirds) appear not to be the case in our dataset. The use of this universally hybridizing probe set is applicable not only for the re-tracing of avian karyotype evolution but, potentially, for reconstructing genome assemblies.


Species
For exploring further an association between chromosomal rearrangement patterns, on the one hand, and overall karyotype/genome organisation and divergence time, on the other, for the eight avian species studied, we produced simple correlation graphs using Microsoft Excel (Supplementary Figure  SN1a-d). Herewith, instead of particular interspecies FISH success rate (Table 1), percentage values of failed chicken BACs in these avian species were used to reflect relative genome divergence in general and plotted on the x-axis. In other words, percentage of failed chicken BACs served as a kind of genome similarity degree calculated in reverse order (by subtracting FISH success rate from 100%) so that the chicken had 0% (instead of 100%) and the other species their respective values that can be considered as a peculiar distance these birds are at from the 'zero' species, i.e., the chicken. As values on the y-axis, individual numbers of intra-, inter-or total chromosomal changes or inversions alone (Table 2) were assigned.
As a result, the graphs contained points for the eight bird species, which were approximated by linear correlation functions. Thus, by means of the graphs plotted in this way, we were determining the presence of possible relationships (regularities) between the rearrangement values (Table 2) and the degree of 'kinship' of various bird species relative to the chicken expressed as a percentage of failed chicken BACs. In theory, knowing the degree of 'kinship' with the original species, one may assume what value of total rearrangements a given species may have.

W W A
Along with the correlation analysis for the success rate, similar correlation estimates were obtained for the ratio 2n/80. The respective pairwise coefficients, R, for correlation between 2n/80 and other characteristics ranged between 0.186 and 0.906 as can be seen in Supplementary Table SN1. Similarly to the success rate charts (Supplementary Figure SN1a-d), we plotted the graphs describing the dependencies produced for 2n/80 (Supplementary Figure SN2a-d).
Alternatively, we performed a transformative ranking for the following species-specific characteristics: R1, success rate of interspecies FISH hybridization; R2, divergence time (as estimated between the chicken and any other studied bird using TimeTree [39]; Table 1); and R3, ratio of the diploid number of chromosomes of a species (Table 1 and Supplementary Table SN1) to the typical avian karyotype taken as 80 chromosomes, i.e., 2n/80.
As can be seen from Table 1 and Supplementary Table SN1, each of the above three characteristics (factors) has its own specific variability and nature (i.e., magnitude and range) of values. For example, divergence time (R2 factor) for the chicken should be interpreted as 0, while its value for the pigeon, houbara bustard, blackbird, canary and woodcock was equal to 98 million years. If we consider the diploid set of chromosomes (2n) used to calculate the R3 index, we have here, as a rule, only two available variables (78 and 80), and only the woodcock (96) gets out of this row, and so on. In other words, it is rather difficult to integrally generalize such diverse and variable factors into any one indicator. Therefore, as a solution to this problem, it was proposed to transform (normalise) these very different discrete data for the three factors by ranking them. For example, instead of the following available values for 2n/80, which were used as R3: Similar transformations (rankings) were performed for the other two factors. Then, the transformed values for three factors were multiplied for each species, which gave a new integrative indicator. Thus, this integrative genome/divergence index (IGDI) was computed as a product of three single factors using the following formula: IGRI = R1 × R2 × R3.
Next, a simple linear correlation graph shown in Figure 6a was built using Microsoft Excel for the eight species, where the total number of rearrangements was plotted along the y-axis, and IGRI along the x-axis. The respective coefficient of determination was R 2 = 0.8427, meaning that a simple correlation for this studied dependence (R = 0.9180) was higher than the respective Pearson's correlation coefficients in Supplementary Table SN1

Multiple correlation
We also tested a different approach by searching for multiple correlation dependence of the total number of rearrangements on the hybridization success rate and 2n/80. Graphically, this can be visualised as a 3D diagram plotted using STATISTICA 5.5 (StatSoft, Inc./TIBCO, Palo Alto, CA, USA) and shown in Figure 6b, that had the following axes: x = s (success rate changing from 1.00 to 0.73), or VAR2; y = k (2n/80 ranging between 0.98 and 1.20), or VAR3; z = TR (total rearrangements), or VAR1.
Approximation of values of the success rate (see Supplementary Table SN1 where R is the coefficient of correlation between actual data and those obtained as a result of calculation by formula (1) or (2).
Formula (2) is more accurate and can be taken as a basis for describing values of total rearrangements depending on the success rate and 2n/80.

Supplementary Note 2: PCA, FAC and ALC clustering analyses
To examine relationships between the eight studied avian species as influenced by their genome/karyotype and rearrangement features, the respective charts were plotted using the methods of principal component analysis (PCA), fuzzy analysis clustering (FAC), and average linkage clustering (ALC) based on Euclidean distances. Input dataset for these analyses included four or five characteristics (factors) for interspecific FISH hybridization, karyotype and rearrangements as follows: success rate, diploid number of chromosomes (2n), and numbers of intra-, interchromosomal and total rearrangements (see their appropriate values for each species in Supplementary Table SN1). For data analysing and plotting, R language and libraries for R environment were used. Prior to running the PCA and FAC analyses, the data were transformed (normalized) by scaling [69]. Further, depending on datasets, the appropriate metric was chosen, which was mostly the Euclidean distance, and the clustering method was the average linkage method.

PCA analysis
As a result of performing PCA on the base of four karyotype/rearrangement characteristics and producing the respective score plot, it was found that the success rate and number of total rearrangements contributed to PC1, while the numbers of intra-and interchromosomal rearrangements to PC2 (Figure 7). The change in the success rate factor was inversely proportional to the change in the total rearrangements factor (as can also be suggested from the Supplementary Table SN1 data).
The used four factors had approximately equal degree of influence on differentiation of the eight compared birds. In particular, there were the following two distinct groups (Figure 7): chicken-guinea fowl (Galliformes) and duck-houbara-pigeon-canary-blackbird, the latter being divided into two subgroups, duck-houbara-pigeon (mixed) and canary-blackbird (Passeriformes), while woodcock remarkably differed from the others due to the least hybridization success rate and a large number of inter-and intrachromosomal rearrangements (eight of each type). As can be seen from position of the eight birds on the PCA score plot (Figure 7), the pair chicken-guinea fowl had the greatest possible success rate, and the other birds had respectively lower values. On the other hand, duck, houbara, pigeon, canary, and blackbird had the greatest number of intrachromosomal rearrangements as reflected by their position on the score plot.
If the fifth characteristics, i.e., 2n, was included in the PCA analysis, there was some more pronounced correlation between this factor and intrachromosomal rearrangements, as shown on the PCA score plot in Supplementary Figure S2.1. Overall, this resulted however in somewhat distorted arrangement of the eight species relative to PC1 and PC2, although it retained main features of relationships within this set of birds as observed for the four characteristics (Figure 7).

FAC analysis
Using the same dataset, we applied the FAC method by employing the function fanny() from the package cluster [70][71][72] and Dunn's partition coefficient (Fk) according to the following formulae: / is Dunn coefficient for each observation in a matrix of objects, μ is membership coefficient, μir is ratio of the membership coefficient of observation i to cluster r, k is the number of clusters, r is a selected cluster, and i is an object of observation. Figure S2.1. PCA score plot generated for the eight bird species studied using five characteristics: BAC hybridization success rate (Rate), diploid number of chromosomes (2n), and numbers of total (Total), intra-(Intra rearrangements) and interchromosomal (Inter rearrangements) rearrangements.

Supplementary
The above procedure enabled to obtain the respective chart shown in Supplementary Figure S2 Figure S2.2) were clearly defined, and those were consistent with the known phylogeny ( Figure 1). As far as duck and woodcock are concerned, they had lower Dunn's partition coefficient values, at Fk < 0.17, meaning that they were to a smaller degree similar to the three observed distinct clusters. At the same time, they did not have any obvious pair on the score plot, resembling, to some extent and at least for woodcock, their positions in the phylogenetic tree in Figure 1. Figure S2.2. FAC score plot generated for the eight bird species studied using four characteristics: BAC hybridization success rate, and numbers of total, intra-and interchromosomal rearrangements. Left: a matrix sorted in descending order (by the degree of fuzziness of three clusters). Dunn's partition coefficient was used to estimate the fuzziness degree. Right: a PCA ordination diagram resulted from applying the fuzzy clustering method.

ALC analyses
To build a plausible phylogenetic tree for the investigated eight avian species and their five characteristics, we performed the ALC clustering based on Euclidean distance metric and average linkage method (UPGMA). This was followed up by bootstrapping validation generating Approximately Unbiased (AU) p-values and Bootstrap Probability (BP) values and using the pvclust software package for R [73]. If AU values were more than 95%, clusters were deemed significant. Interpretation of optimal cluster number was done using the Elbow method [74]. The produced phylogenetic three (Supplementary Figure S2.3) was concordant with the observed clustering patterns using the other mathematical approaches. Figure S2.3. ALC clustering for the eight bird species studied using a matrix of Euclidean distances between objects and based on five characteristics: BAC hybridization success rate, diploid number of chromosomes, and numbers of total, intra-and interchromosomal rearrangements. Bootstrapping validation resulted in AU (Approximately Unbiased) p-values (%) and BP (Bootstrap Probability) values (%) presented as red and green estimates, respectively. Red rectangles contain clusters with AU p-values ≥95%.

Supplementary
Next, we compared the resulting ALC tree (Supplementary Figure S2.3) with the 'reference' one ( Figure 1) using the Robinson-Foulds distances between pairs of phylogenetic trees [75][76][77]. As calculation of the Robinson-Foulds distances showed (Supplementary Figure S2.4), the right ALCassisted tree did not entirely match the left known phylogeny but was still similar to the clustering patterns obtained with the other approaches.