Mutational and phylogenetic status of West Siberian strains of BLV

The study is devoted to full-genome BLV sequences circulating in cattle populations of the Novosibirsk region, Russia. The phylogenetic tree shows that the West Siberian isolates are quite closely related to previously isolated strains such as AF399704 (Brazil), AP018007, AP018016, AP018019, LC007988, LC007991 (Japan) and EF065638 (Belgium). Our calculations show that the number of mutations that could have occurred independently in parallel-evolving BLV strains significantly exceeds the number expected from the probability of the corresponding substitutions. The studied isolates also carry some mutations whose presence, at first glance, is possible only through divergent development in different, independently evolving branches. However, calculations show that the probability of an independent origin of an identical mutation is extremely small, which indicates the possibility of an exchange of RNA sites between isolates circulating in West Siberian cattle populations.


Introduction
Viruses are the most genetically diverse category of organisms; their total number of genes exceeds that of cellular life forms [1]. Bovine leukemia, a very common disease of cattle, causes millions of euros' worth of damage to the global economy [2]. The causative agent of bovine leukemia, bovine leukemia virus (BLV), is an RNA virus of the Retroviridae family [3,4]. The molecular structure of the virus has been thoroughly studied by many researchers [5,6], and descriptions of specific genes and of the nucleotide composition of individual strains are deposited in various publicly available databases [7]. Although BLV has a relatively conserved genome [8], divergence within the global population still takes place, which makes phylogenetic classification rather difficult. There is an opinion that the global BLV gene pool can be adequately divided into three or four sets [9]. On the other hand, at least 10 BLV genotypes have been described, within which separate strains have been isolated [10,11,12], although they have been studied unevenly.

Materials and methods
The object of research was black-and-white Holstein cattle from farms around the Novosibirsk region (n=3954). Blood samples were drawn from the subcutaneous vein of the animals into sterile disposable IV tubes using EDTA as a preservative. The blood (15-25 ml per sample) was analyzed using an automatic hematological analyzer (RCE-90 Vetin) in accordance with GOST 25382 [16]. Hematological studies and DNA isolation (using the "DNA-Sorb-B" kit; FSUN Central Research Institute of Epidemiology, Russia) were performed on the day the material was received in the laboratory. According to the cumulative results, 780 samples showed the presence of BLV proviral DNA [14]. Of these, 15 samples meeting the selection requirements were chosen for sequencing. Sequencing was performed by the Sanger method according to the recommendations [17] on a 3730xl DNA Analyzer (Applied Biosystems, 850 Lincoln Center Drive, Foster City, United States). In addition to the 15 studied BLV sequences, sequences of eight genotypes available in the DDBJ database (http://getentry.ddbj.nig.ac.jp/) were included in the alignment. The degree of evolutionary divergence was calculated pairwise, and a matrix of evolutionary distances was constructed from the results. Phylogenetic analysis was carried out using the Bayesian method [18] with the GTR substitution model, and using the neighbor-joining (NJ) method with the Tamura-Nei nucleotide substitution model, with cluster reliability estimated by bootstrapping with 1000 replicates (Fig. 1). Average nucleotide distances were estimated with the Tamura-Nei model in the Ugene program.
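The pairwise step of building a distance matrix can be illustrated with a minimal sketch. It does not reproduce the Tamura-Nei distance computed in Ugene; it swaps in the simpler uncorrected p-distance (the share of differing sites among compared sites), and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in "ACGT" and base_b in "ACGT":
            compared += 1
            if base_a != base_b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances as a nested dict."""
    names = list(aligned)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = p_distance(aligned[a], aligned[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Toy alignment (hypothetical data, not BLV sequences).
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TACGA"}
toy_matrix = distance_matrix(toy)
```

A corrected model such as Tamura-Nei would replace `p_distance` while leaving the matrix construction unchanged.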
Because no specially developed algorithms exist [19], the isolation of strains was assessed using standard statistical methods. The degree of randomness of the differences was determined by comparing the actual genetic distances, expressed as a percentage, with the critical value for randomness of 1% [1] using Student's t-test [20]. The reliability of pairwise differences in genetic distances was evaluated with the χ² test [20]. The isolation factor was assessed by comparing the variances of the genetic distances among the selected samples and between each local isolate and the control strain FLK, using one-factor analysis of variance (the One-way function of the ANOVA module in the STATISTICA 8 software package).
To further confirm the uniqueness of the sequenced samples, the probability of repeating combinations of single-nucleotide substitutions was calculated. Since probability is in fact a qualitative feature, its error was calculated according to the corresponding formula [20]. In the absence of generally accepted constants for mutation frequency [21,22], the probability was calculated using two models developed by the authors' team. The first is simplified: because DNA contains only four types of nucleotides (A, T, G, C), the probability of replacement by a particular nucleotide is taken a priori as 0.33 [23]. In the formula for the simplified probability (ωs) of repeating the mutations of a particular isolate, n is the number of mutations of that isolate relative to the most closely related strain, and 8401 is the length of the nucleotide sequence of the control BLV strain (according to NCBI: BioProject: PRJNA485481).
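The two constants of the simplified model can be put into numbers. The sketch below is not the authors' ωs formula; it only assumes, for illustration, that each of the 8401 positions is equally likely to mutate and that a mutated site receives a given replacement base with probability 0.33. The variable names are hypothetical.

```python
GENOME_LENGTH_BP = 8401   # length of the control strain sequence, from the text
P_SAME_BASE = 0.33        # a priori chance of the same replacement base, from the text

# Assumption: a single mutation falls on any given position with equal chance.
p_same_site = 1.0 / GENOME_LENGTH_BP

# Chance that an independent mutation hits the same site AND yields the same base.
p_same_site_same_base = p_same_site * P_SAME_BASE

print(f"same site:          {100 * p_same_site:.4f} %")
print(f"same site and base: {100 * p_same_site_same_base:.4f} %")
```

Both values lie far below the 1% random-error threshold used throughout the paper.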
Since, according to some data, the probability of each type of single-nucleotide substitution is a gene-specific value within a particular biological species [23], a more complex model took into account the frequency of each type of single-nucleotide substitution relative to the total number. In its formula, Ki is the total probability coefficient of nucleotide substitutions, which takes into account the probability of each type of single-nucleotide substitution and their number. In the definition of Ki, k is the probability coefficient of each type of nucleotide substitution, equal to the ratio of the number of substitutions of that type to the total number of SNPs in the sample, and ni is the number of SNPs of that type in the isolate. The proportion of mutations with a probability of repetition (pi, %) was calculated from ni, the number of SNPs with a probability of repetition, and Ni, the total number of mutations. The expected number of repeated mutations was calculated from ni, the observed number of SNPs with a probability of repetition, and 0.33, the simplified probability of replacement with the same nucleotide. Taking into account the different probabilities of transitions and transversions [1], the expected number of repeated mutations was additionally calculated for each type of substitution. Repeated mutations were counted if they were detected in different strains that did not share a common branch point on the dendrogram. Reverse mutations were counted if they were present in the studied isolate and in the control strain FLK, provided that the isolate had split off from the branch of strains in which the direct mutation was detected. The proportion of probable reverse substitutions was calculated from n′, the number of reverse mutations, and 8401, the length of the sequence of the control strain FLK (bp).
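The simpler quantities described above can be sketched in code. The formulas themselves are not reproduced in the text, so the forms below are assumed readings of the verbal definitions (a percentage share, a count scaled by 0.33, and a count divided by the genome length); the function names and the example counts are hypothetical.

```python
GENOME_LENGTH_BP = 8401   # control strain FLK, from the text
P_SAME_BASE = 0.33        # simplified replacement probability, from the text


def repeat_share_percent(n_repeat, n_total):
    """Assumed form of p_i: share of possibly repeated SNPs among all mutations, in %."""
    if n_total <= 0:
        raise ValueError("total number of mutations must be positive")
    return 100.0 * n_repeat / n_total


def expected_repeats(n_repeat, p_same=P_SAME_BASE):
    """Assumed form of the expected count: observed repeats scaled by 0.33."""
    return n_repeat * p_same


def reverse_share_percent(n_reverse, genome_length=GENOME_LENGTH_BP):
    """Assumed form of the reverse-substitution share: count per genome length, in %."""
    return 100.0 * n_reverse / genome_length


# Hypothetical counts for one isolate, for illustration only.
example_share = repeat_share_percent(n_repeat=30, n_total=120)
example_expected = expected_repeats(n_repeat=30)
```

The coefficient Ki of the more complex model is left out because its definition cannot be recovered from the text.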

Results
Based on the results of the phylogenetic analysis (Fig. 1), it can be concluded that the sample contains several clusters related to strains EF065638 (Belgium), AF399704 (Brazil), and AP018007, AP018016, AP018019, LC007988 and LC007991 (Japan). Interestingly, sample No. 2 is located separately from the other isolates of the first genotype. To determine the phylogenetic status of the BLV samples isolated from the blood of cattle in the Novosibirsk region, the genetic distances between individual isolates were evaluated using data on the molecular structure of strains stored in the DDBJ database (http://getentry.ddbj.nig.ac.jp/) and of the control strain FLK (Table 1, Fig. 1). It is noteworthy that the average evolutionary divergence among the local isolates (0.0229±0.0013) was slightly larger (t=1.25; not significant) than the average distance between each of the selected samples and the control strain FLK (0.0185±0.0032). Testing the distance factor (isolate-isolate versus isolate-FLK) did not confirm the isolation of the whole set of local isolates relative to the control strain (F=1.488; p=0.225), which also indicates the heterogeneous origin of the BLV isolates circulating in cattle of the Novosibirsk region.
Under isolated mutagenesis, the average evolutionary divergence among the isolates of the Novosibirsk region would much more likely be significantly smaller than that between each of them and the control strain FLK. The χ² statistic, which evaluates the reliability of the differences in the minimum distances between the isolates identified in the Novosibirsk region, as well as between the Siberian isolates and strains from the DDBJ database, did not exceed 2.7, which indicates that the differences are not reliable. The minimum differences between the samples (Table 1) do not differ significantly from the random error (1%), although the largest genetic distances indicate the accumulation of non-random mutations. At the same time, despite the apparent proximity, none of the isolates has accumulated enough mutations, relative to either the samples we isolated or the strains from the DDBJ database (Table 1), to be considered an independent entity without correction for random discrepancies. It is curious that the differences of individual samples (Table 1) from the control strain FLK still exceeded 1%, even though 60% (9 out of 15) of the samples had not accumulated a critical (1%) number of nucleotide substitutions. However, each sequenced isolate can be considered a unique new formation. Although the number of identified mutations was in some cases too small to produce distances large enough for an isolate to be considered an independent formation, each single-nucleotide replacement is, in fact, a unique event whose repetition is almost impossible [24]. Given that the sequence of the control BLV strain (FLK) is 8401 bp long (www.ncbi.nlm.nih.gov/nuccore/9626225), the probability that the same nucleotide mutates in two independently evolving strains is 1/8401, or 0.012%, which is practically indistinguishable from zero (χ² = 0.0001).
Of these cases, only about one third would give an identical replacement, provided that the probability of replacing a nucleotide with any particular other nucleotide is 0.33 [23]. Taking into account the different frequencies of transitions and transversions observed in our study (Fig. 2), the probability of uniqueness of mutations was expected to be even lower than under random mutagenesis. In both cases, the probability of repeating the mutations of any isolate relative to the control strain FLK did not differ significantly from 1% (Table 2); that is, it corresponds to the value of the random error [1]. It would seem that each BLV isolate from the Novosibirsk region should represent a unique new formation in its own right. In practice, however, some of the samples have no, or practically no, mutations peculiar to that sample alone. Fewer than half of the samples had a frequency of unique mutations exceeding the random error (Fig. 3). The type of substitution (purine-purine, pyrimidine-purine, purine-pyrimidine or pyrimidine-pyrimidine) has practically no effect on the occurrence of identical SNPs in independently evolving strains (F=0.200; p=0.637), which, given the unequal probability of transitions and transversions (Fig. 2), points not to the independent occurrence of the same SNPs but rather to the "embedding" of the corresponding fragment of the genome. Surprisingly, there was no stable correlation between the total number of mutations and the number of probable repeats (r=0.487; not significant). Thus, the calculations indicate either the presence of repeated mutations as such or the presence of recombination in the selected samples. The latter is a fairly likely scenario when livestock are kept together, and it suggests that recombinations, insertions and deletions could have introduced some "corrections" into the phylogenetic analysis. For example, the mutation T>C at position 2766 is present in samples No. 1, 2, 3, 4, 7-11, 13 and 14 and absent in samples No. 5, 6, 12 and 15. Interestingly, the dendrogram (Fig. 1) implies an early split of the evolutionary branch of sample No. 2, which carries a substitution that is absent in the later-diverging samples No. 5, 6 and 15. This means either that an identical mutation arose independently in the evolutionary branch of sample No. 2 and in that of samples No. 1, 3, 4, 7-11, 13 and 14, or that a reverse substitution C>T occurred in the phylogenetic branch of samples No. 5, 6 and 15. Or, what seems much more plausible, the corresponding nucleic acid sites were exchanged directly within the livestock population in Western Siberia. Earlier studies of influenza viruses [25] and HIV [26] recorded the exchange of nucleic acid sites between viral particles, often accompanied by the formation of new strains, which indirectly supports this assumption.
https://doi.org/10.1051/bioconf/20213606025 FSRAABA 2021
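The "not significant" verdict for r=0.487 can be checked with the standard t transform of a Pearson correlation coefficient. The sketch assumes the correlation was computed over the n=15 sequenced isolates, which the text does not state explicitly, and uses the two-sided 5% critical value of Student's t for 13 degrees of freedom (2.160) from standard tables. The names are hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


R_REPORTED = 0.487        # from the text
N_ISOLATES = 15           # assumption: one data point per sequenced isolate
T_CRIT_05_DF13 = 2.160    # two-sided 5% critical value, df = 13 (standard table)

t_value = correlation_t(R_REPORTED, N_ISOLATES)
is_significant = abs(t_value) > T_CRIT_05_DF13
print(f"t = {t_value:.2f}, significant at 5%: {is_significant}")
```

Under these assumptions the statistic stays just below the critical value, in line with the reported absence of a stable correlation.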

Fig. 3. Frequency of unique SNPs in the BLV isolates of the Novosibirsk region
Reverse nucleotide substitutions apparently also took place during the formation of the BLV isolates of the Novosibirsk region, which undoubtedly introduced certain errors into the phylogenetic analysis (Table 3). As an example, the replacement T>C at position 3256 is present in all samples except No. 5, which suggests a reverse replacement in the phylogenetic branch of this isolate; the probability of a reverse replacement under a random model is only 1/6, but it is still not zero. It should be understood that such coincidences can be observed in independent evolutionary branches of different strains. However, this is still not enough to account for the independent random occurrence of several dozen identical mutations in independent evolutionary branches. In general, the number of reverse substitutions in 13 of the 15 samples differed considerably from the theoretically expected number, with deviations in both directions (Table 3). Moreover, the samples that did not belong to the 1st BLV genotype, namely No. 10, 11, 12 and 13, had more reverse substitutions than expected. It is noteworthy that the proportion of reverse mutations in sample No. 12 is 34.2937±0.5179%, which differs significantly from the random error (P<0.001). At the same time, the proportion of expected reverse substitutions in the other isolates does not differ significantly from 1% and therefore falls within the range of random error.
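The error attached to the 34.2937% figure for sample No. 12 is consistent with the usual standard error of a proportion, sqrt(p(1-p)/N), when N is taken as the 8401 bp genome length. That choice of N is inferred from the reported numbers rather than stated in the text, so the sketch below should be read as a plausibility check; the names are hypothetical.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a proportion p estimated from n trials."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return math.sqrt(p * (1.0 - p) / n)


GENOME_LENGTH_BP = 8401    # assumed denominator, the FLK sequence length
P_REVERSE_NO12 = 0.342937  # reported proportion for sample No. 12

se_percent = 100.0 * proportion_standard_error(P_REVERSE_NO12, GENOME_LENGTH_BP)
print(f"standard error: {se_percent:.4f} %")
```

The result agrees with the reported ±0.5179% to four decimal places, which supports the reading that proportions in the paper are expressed per genome length.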

Discussion
The similarity of the studied isolates to strains geographically associated with Belgium, Japan, the USA, Brazil and Argentina (Fig. 3) is of considerable interest, since this feature is not typical of most viruses. For example, rabbit hemorrhagic disease virus (RHDV) strains tend to cluster geographically [27], which was not observed in this case. These results, like those of a number of previous studies [13], highlight the circulation of several strains in the same geographical location. A fairly logical explanation is the constant movement of livestock required by breeding, in particular to improve animal productivity. Even a single introduction of BLV strains into Western Siberia during the improvement of local black-and-white cattle with the Holstein Friesian gene pool may well have caused the heterogeneity of circulating BLV strains. However, the question of direct introduction of Brazilian, Argentine and Japanese strains remains open. A previous study [14] showed a low probability of direct introduction of BLV strains from South America into Western Siberia, because transporting livestock along such a long route lacks both economic and zootechnical expediency. Also, no information about the import of cattle or bull semen from Japan was found in publicly available sources. Therefore, the similarity of Siberian isolates to Asian and South American strains can be explained by two hypotheses. Some assume that the similarity of viral strains results from the accidental coincidence of stochastic mutations that occurred independently at the same loci in geographically isolated virus populations, so that the same nucleotides are present at the same positions [19]. R. Dawkins [28] believes that the probability of two nucleotide sequences being similar because of random mutations is inversely dependent on the length of the compared fragments. Others, however, believe that the identity of nucleotide sequences can be attributed to the coincidence of random mutations only if the possibility of direct gene flow is completely excluded [1,24]. Studies of vertebrate satellite DNA [29] have shown that allopatric isolation almost always leads to the accumulation of unique mutations in isolated populations. Even where phenotypes are similar because of convergence driven by similarly directed selection, the observed similarity is often the result of mutations at completely different loci [24]. In this case, it is well known that the gene pool of local livestock was repeatedly enriched, so the isolation of West Siberian BLV strains cannot, by definition, be 100%. Given that any gene flow invariably introduces error into phylogenetic analysis [30], the similarity of some BLV isolates from the Novosibirsk region to Japanese and South American strains can be explained by divergence within the global BLV gene pool before their geographical separation, followed by independent evolution. The identified similarities can be explained by the independent penetration of ancestral forms of AF399704 (Brazil) and LC007988 (Japan) into Novosibirsk and the corresponding regions, and the differences found are a consequence of the accumulation of random mutations [14].
This observation gives rise to an interesting hypothesis. It is known that any viral genome can be divided into conserved and variable components. The latter can in turn be divided into conditionally variable and truly variable parts. Presumably, the differences between the Novosibirsk isolate (sample 2) and AF399704 (Brazil) and LC007988 (Japan) accumulated through mutagenesis of the truly variable fragment of the BLV genome. The conditionally variable part has in both cases been preserved in the form of the most ancient "root" and least virulent sequences, which is quite logical considering the evolutionary tendency of viruses toward accelerated reproduction [31]. The assumption looks quite plausible because it is supported by the existence of asymptomatic BLV carriers, which results in hidden transmission of the virus (30-70% of cases) in cattle populations [32]. The more virulent the strain, the more likely it is to provoke physiological reactions in the host and therefore to be detected at the next veterinary blood test and culled together with the carrier animal. This artificial selection eradicates highly virulent strains; consequently, strains with virulence below a certain critical level are able to maintain a symbiotic relationship with cattle and probably account for the same 30-70% of cases of latent BLV presence in the animal body.
It is known that bacterial and viral genomes contain so-called mutator genes [32], which increase the intensity of mutagenesis. In the case of BLV, the function of mutator genes is attributed to the LTR region [33]. Apparently, mutations of the LTR region occurred during the evolution of BLV and led to a split into two fundamentally different directions, as indicated by some earlier studies [31]. The first is conservative and characteristic of the 1st genotype, which has preserved the greatest similarity to the original strain. The second branch is mutable and includes all other genotypes, with the accumulation of corresponding mutations in the LTR region. To test the hypothesis, the average number of accumulated mutations was calculated from the data in Table 5 for isolates of the 1st genotype (samples No. 2, 5-9, 14, 15, 16) and of the 2nd and 4th genotypes (samples No. 3, 4, 10-13). The average number of mutations in samples of the first genotype (174.11±13.57) was then compared with that of isolates not belonging to the first genotype (336.00±37.91). The resulting value of Student's t (4.02) at k=13 corresponds to a significance of P<0.01. The number of reverse mutations in samples 3 and 4 and in samples 10-13 differed from 1% by a wide margin, whereas in the samples of the 1st genotype it was smaller or did not differ significantly from the random error. It turns out that the number of mutations in general, and of reverse mutations in particular, was lower in the evolutionary line of the 1st genotype than in the others. Apparently, active SNP mutagenesis took place in the split-off evolutionary branch that gave rise to all other genotypes, which led to variability of the conditionally conservative part of the BLV genome, since most of it still remains practically unchanged.
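The reported Student's t of 4.02 can be reproduced from the two group means, assuming the values after ± are standard errors of the mean (the text does not say which kind of error they are). The group sizes of 9 and 6 are counted from the sample numbers listed above and give the stated k=13 degrees of freedom; the critical values are two-sided values for 13 degrees of freedom from standard tables. The names are hypothetical.

```python
import math


def t_from_means(mean_a, se_a, mean_b, se_b):
    """t statistic for the difference of two means given their standard errors."""
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Values from the text; the +/- terms are assumed to be standard errors.
t_value = t_from_means(174.11, 13.57, 336.00, 37.91)

# Group sizes counted from the listed sample numbers: 9 and 6 isolates.
degrees_of_freedom = 9 + 6 - 2

T_CRIT_01_DF13 = 3.012    # two-sided 1% critical value, df = 13
T_CRIT_001_DF13 = 4.221   # two-sided 0.1% critical value, df = 13

print(f"t = {t_value:.2f}, df = {degrees_of_freedom}")
print(f"P < 0.01:  {t_value > T_CRIT_01_DF13}")
print(f"P < 0.001: {t_value > T_CRIT_001_DF13}")
```

The statistic falls between the 1% and 0.1% critical values, which matches the stated significance level.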

Conclusion
The results of the study showed that the BLV isolates from the Novosibirsk region form a single set carrying mutations characteristic of some other strains. Among these are Japanese and South American strains, even though the probability of direct introduction is almost zero. Most likely, such similarities are explained by mutations inherited from the original form of the virus, which were subsequently found in strains from different parts of the world. The hypothesis that identical mutations arose randomly in divergently evolving strains seems unlikely, since the probability does not differ significantly from the random value of 1%, despite a fairly small number of unique mutations.