Global Repeat Map Method for Higher Order Repeat Alpha Satellites in Human and Chimpanzee Genomes (Build 37.2 Assembly)

Alpha satellites are tandemly repeated sequences found in all human centromeres. In addition to their functional and structural role within the centromere, they are a suitable model for evolutionary studies because they are subject to concerted evolution. The Global Repeat Map (GRM) algorithm is a convenient computational tool for determining consensus repeat units, of both monomeric and higher-order repeat (HOR) type, and their exact size within a given genomic sequence. Using GRM, we identify fifteen different alpha satellite HORs in the Build 37.2 assembly, three of them novel. In the next step we compute the suprachromosomal family classification and CENP-B box / pJ distributions for these HORs. All human alpha satellite sequences originate from one pra-ancestral alpha satellite monomer. For the first time, we perform GRM analysis to compare human and chimpanzee alpha satellite HORs for chromosome 4, and we provide evidence that the human and chimpanzee alpha satellites originate from a common ancestor that predated the human-chimpanzee separation. We also compare the codon-like trinucleotide (CLT) extensions of human and chimpanzee chromosome 4. Our results are consistent with the expectation that the alpha satellite HORs in human and chimpanzee were created after the human-chimpanzee separation. (doi: 10.5562/cca1987)


INTRODUCTION
Centromeres in all eukaryotes play an essential role in many chromosome functions, such as segregation in mitosis and meiosis, recognition and pairing of homologous chromosomes, sister chromatid attachment, and formation of kinetochore structures. 1 They are characterized by highly repetitive DNA regions and bound kinetochore proteins, which are required for the attachment of microtubules to chromosomes during mitosis.
Every human centromere consists of arrays of tandemly repeated 171-bp units, known as alpha satellite DNA, that can be several megabases in size; 2 however, among reported chromosome assemblies, the amount and type of alpha satellite varies. These massive arrays are embedded between blocks of pericentromeric heterochromatin containing highly repetitive DNA. 3,4 In situ hybridization with alpha satellite and immunolabeling using antibodies against kinetochore proteins also confirm that centromeres are located in these regions. 5−8 Figure 1 schematically shows the overall concept of HORs for the illustrative case of an 11mer HOR. Higher-order alpha satellite DNA consists of ≈171-bp monomers organized into second-order arrays of repeat units that are highly homogeneous. After a specific higher-order alpha satellite DNA has been created by amplification, its copies do not just passively accumulate mutations. There is a mechanism that works only within an array to maintain its homogeneity. Owing to that process, called "homogenization", 6 all chromosome-specific arrays have the same typical percentage of divergence between HOR copies (1−5 %). 8 By contrast, monomeric alpha satellite DNA lacks detectable higher-order periodicity, and its constituent monomers are far less homogeneous (individual alphoid monomers diverge by 20−40 % from each other). 9−15 Every chromosome has its own unique family of higher-order alpha satellite. At least 33 different alphoid subfamilies have been identified so far. Some of these subfamilies are specific for a single chromosome, whereas others are common to a few chromosomes. Certain chromosomes seem to have a single HOR within their centromeres, whereas others contain several different HORs. A type of polymorphism found in alphoid arrays involves higher-order units that differ by an integral number of monomers (monomer insertion or deletion) but are nonetheless closely related in sequence. 7,14 Highly homogeneous arrays of higher-order alpha satellite monomers are relatively recent additions to the human genome. 7,8−25 In addition to their different sequence organization, higher-order and monomeric alpha satellite DNA also differ in their functionality. −30 Because of this direct connection with centromere function, the aforesaid recent evolution of higher-order alpha satellite DNA raises some intriguing questions.
An explanation for the generation of higher-order alpha satellite DNA involves unequal crossing over between misaligned HOR units aligned on the register of homologous monomers. −33 Through unequal crossing over, higher-order alpha satellite DNA enables rapid evolutionary development.
A possible functional role of noncoding sequences, and in particular of repeats, has been much discussed. Recent studies have indicated a relatively sharp transition between the euchromatin of chromosome arms and the region containing alpha satellites near the centromere, 20,25 raising the possibility that some genes are located close to alpha satellites. 34 Higher-order repeats are of particular interest since, as mentioned above, they evolved more recently and, through unequal crossing over, enable a rapid evolutionary process. −38 It was postulated that gene-poor chromosomal regions in humans harbor gene regulatory elements that can modulate gene expression even over very long distances. 39 A functional importance of repetitive elements has been considered, suggesting that repetitive components play a major architectonic role in higher-order physical structuring. It was argued that a fruitful interpretation of sequence data may result from thinking about genomes as information storage systems with parallels to electronic information storage systems. From this informatics perspective, repetitive DNA is an essential component of genomes: it is required for formatting coding information so that it can be accurately expressed and for formatting DNA molecules for transmission to new generations of cells, and the cooperative nature of protein-DNA interactions provides another fundamental reason why repeated sequence elements are essential to format genomic DNA. 40 This was accompanied by the observation that tandem arrays are often the regions that vary most between related taxa. 40 Considering these facts, it is reasonable to assume, in addition to the already known functions, a possible role of different alpha satellite structures as components of a multi-layered regulatory network of gene expression.
Alpha satellite DNA in great apes was previously studied, for example in Refs. 41−46. Higher-order and monomeric alpha satellites have recently been studied by computational analysis of the most recent builds of the human genome assembly. Despite their obvious functional significance, centromeric regions and their constituent alpha satellite sequences were largely omitted by the Human Genome Project because of their repetitive nature and the expected paucity of genes. 22,25,47 In fact, owing to the centromere gap, located at the edges of the p and q arms, 20 many higher-order alpha satellite DNA regions are missing from the NCBI human genome assembly. Nevertheless, although recent genome assemblies mostly provide alpha satellite content near the centromeric gaps, the genomic assemblies of some chromosomes have reached a centromere region, and in these cases detailed information on higher-order alpha satellite structure, dynamics and possible new functions can be obtained. Various computational tools have been developed for the analysis of repeats in a given genomic sequence, with the goal of achieving a compromise between efficiency and sensitivity requirements. However, challenges remain in the case of large-scale and/or significantly distorted repeats. In particular, for higher-order alpha satellites the difficulties are largely due to imperfect patterns containing substitutions, insertions and deletions.
Analysis of the NCBI assembly was performed recently using two different computational approaches. Rudd and Willard 22 used standard computational tools. Monomers of alpha satellites were extracted using RepeatMasker and characterized as monomeric or higher-order using the dot matrix program DOTTER. Percent identity among monomeric alpha satellite monomers and among higher-order alpha satellites was examined using CLUSTALW. BLAST alignments of all known HORs reported in the literature versus all alpha satellites in the July 2003 assembly were performed in Ref. 22, revealing that many of the higher-order alpha satellites reported in the literature were missing from the genome assembly.
Bearing in mind the potentially important information on the evolutionary and functional role of human higher-order alpha satellite DNA, and the demanding task of studying these higher-order units bioinformatically, we perform here an extensive study applying a novel, robust bioinformatics tool, the Global Repeat Map (GRM) 48−54 (see Methods). We investigate the major alpha satellite higher-order repeats from the Build 37.2 assembly of all human chromosomes and determine detailed monomer schemes and consensus sequences, finding three novel higher-order alpha satellite structures not reported previously. Furthermore, we identify and analyze the alpha satellite HOR from the chimpanzee chromosome 4 centromere and analyze higher-order, monomer, and base-to-base divergences in the homologous human and chimpanzee chromosomes. We find that the human and chimpanzee HORs are widely different, both in the size and composition of HOR units and in the constituent monomer structure. To analyze differences in possible regulatory elements in human and chimpanzee higher-order alpha satellite consensus sequences, we apply our new method of stop/start codon-like trinucleotide extensions. 55

Key String Algorithm (KSA)
In spite of powerful standard computational tools in bioinformatics, it remains difficult to identify and analyze long repeat units. For example, Tandem Repeats Finder can identify tandem repeat units up to 2 kb. 56,57 Here we use a new approach that is particularly useful for investigating very long and/or complex repeats.
The KSA framework 48,49,52,53 is based on the use of a short sequence of nucleotides, referred to as a key string, which cuts a given genomic sequence at each location where the key string appears within the sequence. Going along the genomic sequence, the lengths of the ensuing KSA fragments form the KSA length array. The length array can be compared to an array of lengths of restriction fragments resulting from a hypothetical complete digestion that cuts the genomic sequence at recognition sites corresponding to the KSA key string. While restriction enzymes cleave double-stranded DNA selectively at specific palindromic sequences, in KSA there are no limitations on the choice of the computational key string that cuts a given genomic sequence. Periodicities appearing in the KSA length array enable identification and location of repeats in the genomic sequence. Analysis of the repeat sequences at positions of any periodicity in the KSA length array provides the consensus repeat unit and the divergence of repeat copies with respect to the consensus. A higher-order periodicity in the KSA length array reveals the presence of a HOR and enables determination of the consensus HOR repeat unit (secondary repeat unit) and the divergence of HOR copies with respect to the consensus.
Similarly, with a proper choice of key string, KSA fragments a given tandem repeat into monomers, cuts Alu sequences at two identical positions, enabling their identification, cuts a palindrome, enabling identification of large palindromic sequences and their substructure, and so on. KSA provides a straightforward ordering of KSA fragments, regardless of their size (from small fragments of a few bp to fragments as large as tens of kilobase pairs). KSA is highly robust and requires only a modest amount of computation on a PC. Owing to its robustness, KSA is effective even in cases of significant deletions, insertions and substitutions, providing detailed HOR annotation and structure, the consensus sequence and the exact consensus length in a given genomic sequence even if it is highly distorted, intertwined and riddled (segmentally fuzzy repeats). Using the HOR consensus sequence, in the next step KSA computes finer characteristics, for example the suprachromosomal family (SF) classification and CENP-B box / pJα distributions.
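The segmentation idea described above can be illustrated with a minimal Python sketch. It is not the authors' KSA code: the function names (`ksa_segment`, `ksa_length_array`, `most_frequent_length`) and the toy sequence are assumptions made for illustration, and the sketch only shows how cutting at a key string turns a tandem repeat into a periodic length array.

```python
from collections import Counter


def ksa_segment(sequence, key):
    """Cut `sequence` in front of every occurrence of `key` (hypothetical helper)."""
    positions = []
    start = sequence.find(key)
    while start != -1:
        positions.append(start)
        start = sequence.find(key, start + 1)
    if not positions:
        return [sequence]  # key string absent: the sequence stays in one piece
    # Keep any leading piece in front of the first key string occurrence.
    cuts = positions if positions[0] == 0 else [0] + positions
    cuts = cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:])]


def ksa_length_array(sequence, key):
    """Lengths of consecutive fragments, in order along the sequence."""
    return [len(fragment) for fragment in ksa_segment(sequence, key)]


def most_frequent_length(lengths):
    """Most common fragment length, a crude indicator of the repeat unit size."""
    return Counter(lengths).most_common(1)[0][0]


# Toy example: three copies of a 6-bp unit preceded by two unrelated bases.
toy = "GG" + "ACGTTT" * 3
print(ksa_length_array(toy, "ACG"))                        # [2, 6, 6, 6]
print(most_frequent_length(ksa_length_array(toy, "ACG")))  # 6
```

In a real alpha satellite array the lengths would cluster around ≈171 bp rather than being identical, and a higher-order periodicity would appear as a recurring pattern of those lengths.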

Global Repeat Map (GRM)
The GRM program is an extension of the KSA framework, executed as follows.
Step 1. GRM-Total module: Computes the frequency vs. fragment length distribution for a given genomic sequence by superposing the results of consecutive KSA segmentations computed for the ensemble of all 8-bp key strings (4^8 = 65536 key strings). 49 Figures 2a and 2b show GRM diagrams for the genomic sequence of human chromosome 4 (NCBI Build 37.2). In a GRM diagram each pronounced peak corresponds to one or more repeats at that length, tandem or dispersed.
Step 2. GRM-Dom module: Determines the dominant key string corresponding to the fragment length for each peak in the GRM diagram from step 1. An 8-bp key string (or a group of 8-bp key strings) that gives the largest frequency for the fragment length under consideration is referred to as a dominant key string.
Step 3. GRM-Seg module: Performs segmentation of a given genomic sequence into KSA fragments using the dominant key string from step 2. Any periodic segment within the KSA length array reveals the location of repeats and provides the genomic sequences of the corresponding repeat copies.
Step 4. GRM-Cons module: Aligns all sequences of repeat copies from step 3 and constructs consensus sequence.
Step 5. NW module: Computes the divergence between each repeat copy from step 3 and the consensus sequence from step 4 using the Needleman-Wunsch 58 algorithm. Code for GRM modules is available upon request to the authors.
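As an illustration of step 5, the following Python sketch computes a percent divergence from a Needleman-Wunsch global alignment. The scoring scheme (match +1, mismatch −1, gap −1) and the definition of divergence as the fraction of mismatched or gapped alignment columns are assumptions made for this example; the text does not state which parameters the NW module uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns the two gapped strings.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_divergence(a, b):
    """Percentage of alignment columns that are mismatches or gaps."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    differing = sum(x != y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * differing / len(aligned_a)


print(percent_divergence("ACGT", "AGGT"))  # 25.0 (one substitution in four columns)
print(percent_divergence("ACGT", "ACT"))   # 25.0 (one deletion in four columns)
```

The quadratic table is adequate for ≈171-bp monomers; for HOR units of several kilobases a banded or library implementation would be a more practical choice.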
Regarding the choice of an 8-bp key string size: using an ensemble of all r-bp key strings, the average length of KSA fragments is ≈4^r. With increasing key string length, the overall frequency of large fragment lengths increases. For an ensemble of all 8-bp key strings, we can identify from the computed GRM diagrams primary and secondary repeat units as large as a hundred kilobases.
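Steps 1 and 2 can be sketched in Python as follows. This is a hedged reformulation, not the GRM-Total or GRM-Dom code: instead of running 4^r separate segmentations, a single pass records the distance between consecutive occurrences of every r-bp key string, which yields the same superposed fragment-length histogram for interior fragments. The names `grm_total` and `dominant_key`, and the choice to skip windows containing characters other than A, C, G, T, are assumptions.

```python
from collections import Counter, defaultdict


def grm_total(sequence, r=8):
    """Superpose fragment-length histograms over all r-bp key strings.

    Returns (total, per_key): total[length] is the summed frequency of that
    fragment length, per_key[key][length] is the contribution of one key string.
    """
    last_seen = {}
    total = Counter()
    per_key = defaultdict(Counter)
    for i in range(len(sequence) - r + 1):
        key = sequence[i:i + r]
        if set(key) - set("ACGT"):
            continue  # assumption: ignore windows with N or other symbols
        if key in last_seen:
            length = i - last_seen[key]  # fragment between consecutive cuts
            total[length] += 1
            per_key[key][length] += 1
        last_seen[key] = i
    return total, per_key


def dominant_key(per_key, length):
    """Key string giving the largest frequency at the chosen fragment length."""
    return max(per_key, key=lambda k: per_key[k][length])


# Toy example with short key strings (r=3) on five copies of an 8-bp unit.
toy = "ACGTTGCA" * 5
total, per_key = grm_total(toy, r=3)
peak = max(total, key=total.get)
print(peak, total[peak])             # 8 30
print(dominant_key(per_key, peak))   # one of the key strings with the top count
```

With r = 8 on a real chromosome, the peaks of `total` plotted against fragment length would play the role of the GRM diagram, and `dominant_key` would supply the key string for the subsequent segmentation step.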
The GRM method is a straightforward way to provide a global repeat map in a single diagram, identifying all pronounced repeats in a given sequence without any prior knowledge of the sequence structure. Once the size of a repeat is determined, GRM provides in a straightforward way the location of the corresponding repeat arrays and their precise analysis. GRM is particularly useful for precise sequence analysis since the method does not involve any averaging procedure. The method is also robust with respect to sizeable substitutions and indels. Once the consensus repeat unit has been determined using GRM, it can be combined in the next step with BLAST to search for dispersed units or their fragments. Tandem Repeats Finder has limitations for very large repeat units, while GRM has no such size limitations. On the other hand, Tandem Repeats Finder may be more effective for short sequences.

Global Repeat Map for Human Chromosome 4
As an illustration of a GRM study of higher-order and monomeric alpha satellite arrays in a genomic sequence, we compute here the GRM diagram for the genomic sequence of chromosome 4 (Figures 2a and 2b). The most pronounced peaks in this diagram correspond to the following tandem repeats in chromosome 4: alpha satellite repeats (GRM peaks at multiples of the ≈171-bp repeat unit); GRM peaks at 135 bp, 166 bp, and ≈310 bp, which are a signature of Alu sequences; a GRM peak at 1409 bp; and a GRM peak at 2211 bp (also a multiple of the ≈171-bp repeat unit and a possible higher-order alpha satellite). In addition, there are eight pronounced GRM peaks at repeat lengths above 2500 bp.

Higher-order Alpha Satellite Repeats in Human Chromosome 4
In the next step we perform a detailed study of alpha satellite HORs. Analyzing the partial contributions of individual contigs to the GRM diagram of chromosome 4, we find that the largest frequencies contributing to the alpha satellite peaks arise from contig NT_022853.15. The relevant GRM interval of fragment lengths for the genomic sequence NT_022853.15 is shown in Figure 2c. Peaks at approximate multiples of the basic alpha satellite repeat length, ≈171 bp, decrease with increasing multiple order. That is a natural trend for tandem repeats. On top of that pattern of multiples there is a strong peak at 2211 bp, corresponding to the consensus HOR length. This peak reveals the higher-order structure of the alpha satellite organization: thirteen (2211 bp / 171 bp ≈ 13) tandemly arranged alpha satellite monomers, which mutually diverge by 20−40 %, are arranged into more homogeneous second-order units. The high homogeneity of the second-order units, as well as the relatively high heterogeneity of the primary repeat units, is reflected in the GRM diagram (Figure 2c) by a characteristic HOR-signature pattern: at the end of the decreasing array of primary repeat peaks there is a pronounced peak which corresponds to the higher periodicity. Furthermore, in the GRM diagram of contig NT_022853.15 there is one pronounced peak at the fragment length 1553 bp (9α) which disturbs the HOR-signature pattern. This peak arises from the deletion of four monomers (m07, m08, m09, and m10) in HOR copy No. 8 (Tables 1 and 2).
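The arithmetic behind these two peaks can be checked with a short Python sketch. The helper name `monomer_count` is hypothetical, and rounding the peak length to the nearest multiple of 171 bp is a simplification, since real monomers deviate slightly from 171 bp.

```python
MONOMER_BP = 171  # approximate alpha satellite monomer length used in the text


def monomer_count(peak_bp, monomer_bp=MONOMER_BP):
    """Nearest whole number of monomers that fits a GRM peak length."""
    return round(peak_bp / monomer_bp)


hor_monomers = monomer_count(2211)      # consensus HOR peak
variant_monomers = monomer_count(1553)  # peak that disturbs the HOR signature
print(hor_monomers, variant_monomers, hor_monomers - variant_monomers)  # 13 9 4
```

The difference of four monomers agrees with the deletion of m07 to m10 reported for HOR copy No. 8.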
Next, we computationally determine a dominant key string, TTTG, which maximally segments the NT_022853.15 sequence into ≈171-bp fragments. Performing KSA segmentation with this dominant key string, we obtain an array of ≈171-bp fragments. Mutual alignment of all ≈171-bp monomers obtained in this way (see heatmap, Figure 3) confirms the above deduction: thirteen different monomers are the constituent blocks of the higher-order structure (13mer alpha satellite HOR). The corresponding basic consensus monomers are denoted m01, …, m13 (consensus sequences in Table 3).
We also compute the GRM diagram for each of the thirteen basic alpha satellite monomers and find no pronounced peak. This reflects their monomeric structure, i.e., the absence of internal repeat structure. In the next step, divergences between each repeat copy and the HOR consensus monomers are computed, revealing the internal structure of each alpha satellite HOR copy (Table 2). The detailed monomer structure of the alpha satellite HOR copies in contig NT_022853.15 is summarized in Table 1.
The position of the alpha satellite HOR copies in chromosome 4 (see ideogram in Figure 4) and the heatmap in Figure 3 reveal that the 13mer HOR in fact seems to be a truncated tail of a major HOR block positioned in the unsequenced domain in front of contig NT_022853.15.
From results in Tables 1−3 and Figure 3 it is obvious that homogenization works better near the center of higher-order repeat arrays, and less well at the array of HOR edges, bordering some non-related sequences. 8ur results (Table 1) are in accordance with publica-

Global Repeat Maps and Higher-order Alpha Satellite Repeats for All Human Chromosomes
Using the GRM algorithm we identify and analyze higher-order and/or monomeric alpha satellite units in all human chromosomes (Build 37.2 assembly). In the first step, we compute GRM diagrams for all human chromosomes for two relevant intervals of fragment lengths (Figures 5−8). (The same diagrams, in a form that can be magnified, are presented in Supplementary Figures 1 and 2.)
We perform a detailed study of alpha satellite HORs in every human chromosome in the same way as for chromosome 4 in the previous section. A summary of all human alpha satellite HORs corresponding to the Build 37.2 assembly and the positions of the HOR blocks are given in Table 1 and Figure 4.
We have determined consensus HORs for chromosomes 1, 2, 4, 5, 7, 8, 9, 10, 11, 17, 19, X, and Y (Build 37.2 assembly). The aligned monomers in a consensus nmer HOR are denoted m01, m02, .... The array m01, m02, ... corresponds directly to the consensus HOR if the monomer sequences follow the convention of Ref. 61 (referred to as direct (D) monomers). This is the case for the 10mer in chromosome 2, 16mer in chromosome 7, 11mer in chromosome 8, 11mer in chromosome 9, 14mer in chromosome 17, and 17mer in chromosome 19. If the consensus HOR contains alpha monomers which are the reverse complement of the convention of Ref. 61 (referred to as reverse-complement (RC) monomers), then the array m01, m02, ... is the reverse complement of the consensus HOR; this is the case for the 11mer in chromosome 1, 13mer in chromosome 4, 13mer in chromosome 5, 7mer in chromosome 9, 18mer in chromosome 10, 12mer in chromosome 11, 13mer in chromosome 19, 12mer in chromosome X and 45mer in chromosome Y. HOR consensus sequences are presented in Table 3. Only two chromosome assemblies (8 and Y) have arrays of highly homogeneous higher-order alpha satellite DNA on both the p and q arms (Figure 4). Because all chromosomes are known to contain higher-order alpha satellites at centromeres, 7,8 the fact that only chromosomes 8 and Y reach this level of success indicates that most current assemblies probably terminate at some distance from the functional centromere. In the two cases with higher-order alpha satellite DNA on both the p and q arms, the alpha satellite tandem repeats are oriented in the same direction on both arms (see Table 4: D-D for chromosome 8 and RC-RC for chromosome Y), consistent with both being part of the same homogeneous tandem array. By contrast, within the heterogeneous monomeric arrays, the orientation of alpha satellite DNA typically switches several times within each arm contig. 8,20 On the other hand, we have found 51 two 30mer HOR arrays in chimpanzee chromosome Y, positioned one after the other (with a gap of 599 bp in between). The first HOR array, truncated at the start of the contig, was referred to as direct. The second HOR array, which is the reverse complement of the first and highly identical to it, was referred to as reverse complement. We conclude that the direct and reverse-complement HOR arrays are positioned on the opposite arms of a palindrome and are also part of the same homogeneous tandem array.
The SF classification of alpha monomers or HORs is used as a basis for the discussion of CENP-B box and pJ motif distributions in alpha monomers.

CENP-B box and pJ Motif Distribution
The consensus alpha satellite monomers of the basic types A and B have only seven differences, five of which are concentrated in a 16-bp region of the alpha satellite monomer. Such clustering indicates that these mutations are not random but are subject to selection. 8,61 Indeed, the alternative A and B configurations match the binding sites of two alpha satellite-binding proteins, pJ (5'-TTCCTTTTPyCACCPuTAG-3') and CENP-B (5'-PyTTCGTTGGAAPuCGGGA-3'). 61,62 Ohzeki et al. 63 and Warburton 64 have shown that only a combination of both the CENP-B box and the HOR pattern provides successful centromere binding to the kinetochore complex during mitosis. The CENP-B box appears only in alpha satellite HORs, 8,65,66 while no CENP-B boxes have been detected in monomeric alpha satellites. 67,68 The pJ motif reflects some of the nucleotides derived from the alpha satellite monomer which were shown to be effective in binding experiments. A shorter pJ core sequence, CCTTTTPyC, 61 which is an essential part of the pJ motif, was effective when dimerized, while a number of mutations outside this core did not abolish binding.
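A simple way to scan a monomer for the two motifs quoted above is to translate Py and Pu into character classes and search both strands. The Python sketch below does this with exact matching only; the function names and the decision to allow no mismatches are assumptions for illustration, and a real analysis may tolerate deviations from the 17-bp motifs.

```python
import re

# Motifs as quoted in the text; Py = pyrimidine (C or T), Pu = purine (A or G).
PJ_MOTIF = "TTCCTTTTPyCACCPuTAG"
CENPB_BOX = "PyTTCGTTGGAAPuCGGGA"
DEGENERATE = {"Py": "[CT]", "Pu": "[AG]"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_to_regex(motif):
    """Turn a motif with Py/Pu placeholders into a compiled regular expression."""
    pattern = motif
    for symbol, char_class in DEGENERATE.items():
        pattern = pattern.replace(symbol, char_class)
    return re.compile(pattern)


def reverse_complement(sequence):
    return sequence.translate(_COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Return sorted (start, strand) pairs of exact motif matches.

    `start` is the leftmost position of the match on the given sequence,
    strand "+" for the sequence itself and "-" for its reverse complement.
    """
    regex = motif_to_regex(motif)
    seq = sequence.upper()
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.end(), "-") for m in regex.finditer(rc)]
    return sorted(hits)


# Toy sequence containing one CENP-B box instance (Py = C, Pu = A).
toy = "GGG" + "CTTCGTTGGAAACGGGA" + "T"
print(find_motif(toy, CENPB_BOX))                      # [(3, '+')]
print(find_motif(reverse_complement(toy), CENPB_BOX))  # [(1, '-')]
```

Searching both strands matters here because several consensus HORs are reported as reverse complements of the monomer convention.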
After determining the SF classification of the monomers in the consensus HORs, we investigate the occurrence of the CENP-B box and pJ motif in these monomers. We find that only the monomers of the 13mer HOR in chromosome 5 and of the 13mer HOR in chromosome 19 lack any CENP-B box or pJ motif (Table 4). This is an exception to the general pattern found for human chromosomes. 69 In the next section we will see that these two HORs are highly homologous, which could be a consequence of interchromosomal transition or orchestrated interchromosomal homogenization. Another consensus HOR from chromosome 19, a 17mer, has one CENP-B box and one pJ motif. The consensus 18mer HOR in chromosome 10 has eight CENP-B boxes, located in every other monomer except one. In chromosome 2, a new 10mer consensus HOR has four CENP-B boxes, in every other monomer except one. In chromosome 4, a 13mer consensus HOR has a CENP-B box in three consecutive monomers. In chromosome 9, a new 7mer consensus HOR has the pJ motif in four consecutive monomers. Moreover, we find in chromosome Y the first reported case of a HOR with only the pJ motif and no CENP-B box.
Since the CENP-B box and pJ motif are essential for protein binding, an interesting question is whether monomers with and without the CENP-B box and pJ motif have different sequence divergences. In this respect, we find that the pairwise divergence among monomers shows no dependence on the presence or absence of the CENP-B box or pJ motif.

Homogenization within Consensus Higher-Order Alpha Satellite Monomers
To explore the evolutionary relationships of higher-order alpha satellite monomers in the human genome, we compared all consensus higher-order alpha satellite monomers from Table 3 and Supplementary Table 1 to each other. We performed Needleman-Wunsch alignments 58 between all possible pairwise combinations of monomers (223 monomers, 223 × 223 = 49729 alignments). The relationship between the monomers in the consensus HORs is presented graphically in a heatmap (Figure 9), where each divergence is depicted according to a given color scale.
There is a difference between the weaker homogeneity among the consensus monomers within an alpha satellite HOR and the previously mentioned stronger homogeneity within an array of alpha satellite HOR copies, which have a typical divergence between copies of 1−5 % (for the example of chromosome 4, see Table 1) as a consequence of concerted evolutionary processes. The similarity between the various consensus monomers within one higher-order repeat unit derives from a common ancestral alpha satellite monomer. A large tandemly repeated alpha satellite array may be created through abrupt amplification of an ancestral alpha satellite monomer or through a step-by-step series of unequal crossovers and/or gene conversion events that initially create a duplication and then expand it. 8 The hypothesis of one pra-ancestral alpha satellite monomer is additionally supported by calculating the consensus sequence of all monomers within each consensus HOR and mutually aligning these consensus sequences (Table 8). The very low mutual divergences between the consensus monomers derived from the consensus HORs in Table 8 reveal that calculating consensus sequences is like traveling back in time: all higher-order monomers in every chromosome are evidently descendants of one ancestral alpha satellite. After a specific alpha satellite array has been created by amplification, there are no mechanisms, like the "homogenization" processes in the case of HOR copies, to maintain its homogeneity, and the copies just passively accumulate mutations. In a more realistic picture, there are homogenization mechanisms in both monomeric and HOR sequences, but homogenization within HOR copies occurs more frequently than between monomeric units. 25 In both cases, the mean divergences indicate the time of creation of the initial alpha satellite arrays and, in the case of isotropic random mutations, the standard deviations indicate the rate of the creation processes alone. On the other hand, the mean divergence could also be a good measure for estimating the age of HOR creation, because once a specific HOR unit has been created, effective "homogenization" begins and the mutations present become fixed.
Analyses of interchromosomal (or inter-HOR, if there are two different higher-order alpha satellite units in the same chromosome) mean divergences reveal (Table 6) a few possibly similar higher-order repeat units, e.g., the 13mer in chromosome 5 and 13mer in chromosome 19, 13mer in chromosome 5 and 17mer in chromosome 19, 16mer in chromosome 7 and 13mer in chromosome 19, 16mer in chromosome 7 and 17mer in chromosome 19, 11mer in chromosome 9 and 45mer in chromosome Y, and so on. It is important to note that the real information on mean divergences between two different higher-order repeat units is masked in Table 6 because of the monomeric heterogeneity within one higher-order repeat unit.
To overcome the above-mentioned problem we performed a modified estimation of the mean divergence: to each consensus monomer in one higher-order repeat unit we assigned the monomer with the lowest alignment divergence from the other higher-order repeat unit, and then calculated the mean divergence of the monomer pairs obtained in this way (Table 7). We call this modified mean percent divergence the minimal divergence.
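The difference between the ordinary mean divergence and the minimal divergence can be made concrete with a small Python sketch. It assumes a precomputed matrix in which entry [i][j] is the percent divergence between monomer i of one HOR and monomer j of the other; the function names and the toy values are hypothetical and are not taken from Tables 6 or 7.

```python
def mean_divergence(matrix):
    """Mean over all monomer pairs between two HORs."""
    values = [d for row in matrix for d in row]
    return sum(values) / len(values)


def minimal_divergence(matrix):
    """Mean of the best-match divergence for each monomer of the first HOR.

    matrix[i][j] is the divergence (in percent) between monomer i of the first
    HOR and monomer j of the second HOR. Note that the result can differ if
    the roles of the two HORs are swapped.
    """
    best_matches = [min(row) for row in matrix]
    return sum(best_matches) / len(best_matches)


# Toy values: a 2mer compared with a 3mer that shares two closely related monomers.
toy = [
    [2.0, 30.0, 25.0],
    [28.0, 4.0, 27.0],
]
print(minimal_divergence(toy))  # 3.0
print(mean_divergence(toy))     # about 19.3, dominated by unrelated monomer pairs
```

In the toy case the two HORs share near-identical monomers, yet the plain mean is large because most pairs compare non-corresponding monomers, which is the masking effect described above.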
The 13mer and 17mer HORs in chromosome 19 have the lowest mutual minimal divergence; we proposed that one higher-order unit is derived from the other, although a more complex explanation, with both higher-order units derived from a third, unknown higher-order unit, is also possible. 50 It is very unlikely that the 17mer unit arose from the 13mer unit by the addition of four monomers, because the monomer alignment excludes the possibility that the four additional monomers in the 17mer unit are duplications of any monomers from the 13mer unit (see the chromosome 19 13mer and 17mer alignment in the heatmap in Figure 9). Therefore, we hypothesized that the shorter, 13mer higher-order repeat unit arose from the longer 17mer higher-order unit by deletion of four alpha satellite monomers which are all distinct from the monomers in the 13mer. This is consistent with the general view 7 that a type of polymorphism found in alphoid arrays can be related to HOR units that differ by an integral number of alphoid monomers. It should be noted that, in addition to the chromosome 19 case of two similar higher-order units in the same centromere, there is an example of two completely different higher-order alpha satellite units in the centromere of chromosome 9. Figure 9 and Table 7 show that these higher-order alpha satellite structures have completely different building units (monomers), and Table 6 indicates that they were created at different times and at different rates.
Moreover, the 13mer in chromosome 5 is similar to both the 13mer and the 17mer in chromosome 19, which could be a consequence of two possible processes: (1) these sequences were subject to interchromosomal homogenization mechanisms or (2) blocks of higher-order 13mer alpha satellite may have undergone exchanges via transposition mechanisms. 25 In addition to this group of three very similar alpha satellite HORs there are a few groups of HORs with somewhat greater mutual diversity: the group of the 11mer in chromosome 1, 12mer in chromosome 11, 14mer in chromosome 17, and 12mer in chromosome X, or the group of the 10mer in chromosome 2, 11mer in chromosome 8, and 7mer in chromosome 9, and so on, with mutual divergence of about 13 %. If we assume that the transposition mechanisms are more probable to be responsible for these similarities it is very easy (from

Table 8. Mean divergences among alpha satellite monomers from consensus HOR sequences in human chromosomes. For example, to a small square at the intersection of the first horizontal band corresponding to chromosome 1 and the fifth vertical band corresponding to the chromosome 7, divergence between consensus sequence of all 11 monomers from consensus 11mer HOR in chromosome 1 and from consensus sequence of all 16 monomers from consensus 16mer HOR in chromosome 7

Figure 1. Schematic presentation of seven copies of an 11mer alpha satellite HOR. The i-th HOR copy is denoted by h_i and the k-th constituent alpha satellite monomer by m_k.

Figure 2. GRM diagram for the Build 37.2 genomic assembly of human chromosome 4 for intervals of fragment lengths: (a) 0−80000 bp. Pronounced peaks above 2 kb are denoted by the corresponding fragment lengths. The most pronounced peaks above 2 kb are at approximately 3293, 4745, 5157, 6155, 19258, 29185, 52801 and 57546 bp. (b) 0−2500 bp. The most pronounced peaks are at approximately 135, 171, 310, 1409, and 2211 bp. (c) Contig NT_022853.15 containing the alphoid HOR in chromosome 4. There is a pronounced tandem array with alphoid repeat units of 171 bp. The peaks at multiples of the alphoid monomer repeat unit of 171 bp, i.e., n·171 bp, are denoted by nα. For a description of the peaks see the text.

Figure 3. Percent divergence scores for base to base comparison of alpha satellite monomers from contig NT_022853.15.Percent divergence scores are colored according to the color scale shown on the right.

Figure 4. Human chromosomes ideogram with denoted positions of alpha satellite HORs investigated in this paper.

Figure 5. GRM diagrams for the Build 37.2 assembly of human chromosomes 1 to 12 in the interval of fragment lengths 0−2000 bp. Pronounced peaks are denoted by fragment lengths (repeat unit lengths).

Figure 6. GRM diagrams for the Build 37.2 assembly of human chromosomes 13 to Y in the interval of fragment lengths 0−2000 bp.

Figure 7. GRM diagrams for the Build 37.2 assembly of human chromosomes 1 to 12 in the interval of fragment lengths 2−25 kb.

Figure 8. GRM diagrams for the Build 37.2 assembly of human chromosomes 13 to Y in the interval of fragment lengths 2−25 kb.

Figure 9. Graphical presentation ("heatmap") of divergence between monomers in consensus HORs. Monomers in consensus HORs are displayed both horizontally and vertically. The color of the intersection of the horizontal band corresponding to the n-th monomer in the i-th chromosome and the vertical band corresponding to the m-th monomer in the j-th chromosome represents the divergence (in percent) between these two monomers (vertical scale on the right side).

Figure 10. Mean divergence and standard deviation among human alpha satellite monomers in consensus HORs.
tions where it was shown that structural variants of HORs usually differ in length as a result of the presence or absence of an integral number of monomers.Warburton et al. in Ref.59have already described duplications of one monomer, as happens here for instance in

Table 5. Suprachromosomal family (SF) classification of HOR sequences from Table 3 and Ref. 55. To each monomer from a HOR we assign the SF classification of the closest SF consensus monomer defined in Ref. 8 and Ref. 61.