Diversity and recombination analysis of Cotton leaf curl Multan virus: a highly emerging begomovirus in northern India

Background Cotton leaf curl disease (CLCuD), caused by begomoviruses in association with satellite molecules, is a major threat to cotton production causing enormous losses to cotton crop in most of the cotton growing countries including Indian subcontinent. In this study, isolates of begomovirus and satellite molecules associated with CLCuD were collected from North India (Haryana, New Delhi). They were amplified employing rolling circle replication mechanism, cloned, sequenced and, their phylogenetic and recombination analysis was performed. Results The five Cotton leaf curl Multan virus (CLCuMuV) isolates investigated in this study showed monopartite organization of the genome typical of Old World begomoviruses. Nucleotide sequence analyses assigned them as the strains of CLCuMuV and were designated as CLCuMuV-SR13, CLCuMuV-SR14, CLCuMuV-ND14, CLCuMuV-ND15 and CLCuMuV-SR15. The genome of CLCuMuV-SR13 shared a highest level of nucleotide sequence identity (98%) with CLCuMuV (JN678804), CLCuMuV-SR14 and CLCuMuV-SR15 exhibited 96% with CLCuMuV (KM096471), while isolates CLCuMuV-ND15 and CLCuMuV-SR15 revealed 96% sequence identity with CLCuMuV (AY765253). The four betasatellite molecules investigated in this study shared 95–99% nucleotide sequence identity with Cotton leaf curl Multan betasatellite (CLCuMB) from India. The betasatellite molecules were designated as CLCuMB-SR13, CLCuMB-SR14, CLCuMB-ND14 and CLCuMB-ND15. Alphasatellite molecules in this study, designated as GLCuA-SR14, GLCuA-ND14 and GLCuA-SR15, revealed 98% identity with Guar leaf curl alphasatellite (GLCuA) reported from Pakistan. Conclusion The phylogenetic and recombination studies concluded that the isolates of CLCuMuV genomes undertaken in this study have a potential recombinant origin. Remarkably, significant recombination was detected in almost all the genes with contribution of Cotton leaf curl Kokhran Virus (CLCuKoV) in IR, V1, V2, C1, C4 and C5 regions and of CLCuMuV in C2 region of CLCuMuV-SR14. CLCuKoV also donated in C2, C3 regions of CLCuMuV-ND14; V1, V2, C2 and C3 regions of CLCuMuV-ND15 and C1 of CLCuMuV-SR15. Altogether, these observations signify the uniqueness in Indian CLCuMuV isolates showing contribution of CLCuKoV in all the genes. An interesting observation was frequent identification of GLCuA in CLCuD leaf samples. Electronic supplementary material The online version of this article (10.1186/s12864-019-5640-2) contains supplementary material, which is available to authorized users.


Background
India is the largest cotton producing country in the world [1]. Cotton (Gossypium hirsutum, family Malvaceae) is a shrub cultivated in tropical and subtropical countries for its soft, fluffy and staple fiber. In Mexico, wild cotton species show greatest diversity followed by Australia and Africa. G. hirsutum, an important cash crop, is prone to infection by several pathogens and insects. Among them begomoviruses associated with cotton leaf curl disease (CLCuD) cause severe losses to cotton crop [2][3][4].
In 1967, CLCuD was reported for the first time in Multan, Pakistan [12]. Later in 1980, it unexpectedly decreased cotton productivity causing great concern to cotton growers and agricultural scientists. In 1992-1997, it again reduced the yield by 29% [13]. In India CLCuD was first reported from Sri Ganganagar in 1993 [14]. Recently, several variants of CLCuKoV-Bu strain have been reported from India and Pakistan, thus posing an alarming situation which may lead to another CLCuD outbreak in the subcontinent [15]. The recombinant origin of begomoviruses significantly contributes to their increasing pathogenicity [16][17][18][19][20].The genome of begomoviruses associated with CLCuD (henceforth referred to as BAC) is monopartite consisting of a circular, single stranded (ss) DNA molecule (~2.7 kb). It is organized into 7 ORFs viz. C1, C2, C3, C4 and C5 in complementary sense and V1 and V2 in virion sense. The ORFs participate in replication (C1 and C3), transcription activation (C2) and packaging (V1 and V2). Though the exact function of C4 and C5 genes is still not known, C4 protein participates in suppression of RNA silencing mechanisms and C5 protein, which is not common, may be involved in replication of DNA but remains insignificant in viral infection [21]. The opposing complementary sense and virion sense genes are separated by a conserved non-coding intergenic region (IR) which is called as the common region (CR), consisting of about 200 nts having a highly conserved nonanucleotide (TAATATTAC) sequence containing an origin of replication (ori) [22].
Monopartite begomoviruses have also been found to be associated with circular, ss satellite molecules-termed as betasatellites are~1.35 kb (approximately half the size of the monopartite begomovirus DNA genome). They are required for induction of disease symptoms in their host plants [23,24]. The replication, movement and encapsidation of betasatellites relies on helper virus. The function of betasatellites is governed by the βC1 protein, which functions as a pathogenicity determinant [25,26].
Some begomovirus-betasatellites are also associated with alphasatellites [11]. The size of alphasatellite is nearly half of the begomoviral genome (1.3-1.4 kb). They are able to replicate autonomously. The alphasatellite encodes a replication associated protein (Rep), which depends on helper begomovirus for its movement and whitefly-mediated transmission in plants [27]. Alphasatellite regulates the virulence of begomovirus-betasatellite complex [28,29]. Another distinct class of non-coding DNA satellite has been found to be associated with some begomoviruses termed as deltasatellite. The size of deltasatellite is about one quarter of the begomovirus genome (~700 nt) and its function is still not known [30].
The genome of begomovirus replicates via rolling circle replication mechanism. Geminiviruses can also replicate their dsDNA via recombination-dependent replication mechanism [31][32][33]. It is well established that recombination plays an important role in the establishment, emergence and evolution of new species of geminiviruses. The exchange of genetic material via recombination causes plant virus evolution [34]. The network of relationships among various begomoviruses in the Indian subcontinent could also be explained through recombination [35].
In this study, complete genome of several isolates of CLCuMuV, associated betasatellites and alphasatellites were cloned and sequenced. Since the biodiversity of begomoviruses in North India is very rich, the recombination events in the viruses were investigated and analysed for their possible role in the evolution of virus. Infectious clones of CLCuMuV as well as CLCuMB were prepared and agroinoculated in Nicotiana benthamiana plants. Furthermore, phylogenetic analysis and potential recombination of isolates of CLCuMuV with reference to other begomoviruses were studied to understand the CLCuD-begomovirus complexity and evolution in the Indian subcontinent.

Confirmation of BAC complex
The samples taken in this study from Sirsa during the years 2013, 2014 and 2015 were designated as CLCuMuV-SR13, CLCuMuV-SR14, and CLCuMuV-SR15, respectively and from New Delhi during 2014 and 2015 were abbreviated as CLCuMuV-ND14 and CLCuMuV-ND15. PCR-based amplification of the CP gene confirmed the presence of beomoviral genome in CLCuD cotton samples (Additional file 1: Figure S1). Additionally, 1 μg of Rolling Circle Amplification (RCA) product obtained from CLCuD-infected leaves following digestion with restriction enzyme KpnI yielded a DNA fragment (~2.7 kb), thus confirming the presence of the begomoviral DNA in CLCuD-symptomatic leaf samples.
Dendrograms were drawn to analyse the phylogenetic relationship of the CLCuMuV isolates from the present study which formed clusters and showed a close relationship with the CLCuMuV isolates from different regions of India and Pakistan. Sequences, derived from Sirsa (SR14-KX951460 and SR-15KY888163) and New Delhi isolates (ND14-KX951461 and ND15-KY561820) along with previously reported CLCuKoV sequences (HF549182) from Pakistan, formed a new clade with CLCuMuV sequences (AY765253, AY765256 and DQ191160) from India; that further diverged into 2 subclades differentiating New Delhi isolates from that of Sirsa isolates, thus signifying their divergence from a common progenitor. However, sequence KJ868820 (SR13) obtained from the Sirsa samples collected from the same site (in the year 2013) clustered separately with CLCuMuV sequences from Pakistan (EU384573, AJ496461, AJ496287 and AJ002447) and India (single isolate JN678804) (Fig. 1). Hence, it shows emergence and variability of CLCuMuV sequences in the Indian subcontinent.
The heat map representing the pairwise sequence alignment generated by Sequence Demarcation Tool version 1.2 (SDT, 58) showed that the genome of CLCuMuV isolates shared more than 91% of sequence identity with other relevant BAC sequences used in this analysis (Additional file 3: Figure S2 a-e) which is more than the cutoff value (91%) for demarcation of begomovirus species [36].
The amplification and sequencing of full-length (~2.7 kb) genome obtained using oligo primers (viz. Cot For./Cot Rev.) further confirmed the presence of CLCuMuV in all the RCA derived samples used in the present study.
SDT analysis showed that in the present study CLCuMB shared > 90% identity with other relevant betasatellites used in SDT analysis (Additional file 6: Figure S3 a-d), while alphasatellites displayed > 92% identity with other alphasatellite sequences used in the analysis (Additional file 7: Figure S4 a-c).

Recombination analysis
Since recombination is the leading aspect in the evolution and emergence of geminiviruses [14,38], we investigated the evidence of recombination in the nucleotide sequences of newly identified CLCuMuV isolates and associated satellites using RDP4 (Recombination detection program). The analysis was performed by comparing the present sequences of BAC with various sequences of begomoviruses retrieved from GenBank database. It displayed potential recombination events with distinct patterns in several full-length genomes of newly-identified CLCuMuV and associated betasatelllite and alphasatellite sequences as listed below:
Nucleotide sequences analysed in this study, including putative parental sequences of these isolates have shown significant recombination breakpoints in IR, V1, C1 and C2 regions which further supports the previous studies [34,37,39]. The RDP analysis has depicted that the putative parents of the present virus isolates, and also other viruses included during analysis, have evolved via recombination at

Alphasatellites
Among sequences of different alphasatellite molecules, recombination events were detected in GLCuA-ND14 (KX987150) at two sites, one within the Rep region (nt position 365-477) with GLCuA (HE599396) as minor parent and GLCuA (HG417077) as major parent. The other recombination event covered nt position 933-1076 involving Rep and A-rich region with Okra leaf curl alphasatellite (OkLCuA, Acc. KF471055) as minor parent and GLCuA (KT390423) as major parent. It was considered likely as the event was detected by > 3 methods (Fig. 6).

Infectivity of cloned DNA
Infectious clone of CLCuMuV (pGR5.4) upon agroinfiltration in leaves of N. benthamiana, developed mild symptoms. However, in combination with betasatellite (pCAMβ), symptoms such as vein thickening, yellowing and curling of leaves were observed at 14 days post infiltration in N. benthamiana (Table 1; Fig. 7). Thus, it demonstrated that CLCuMuV is a betasatellite-dependent begomovirus for symptom induction. The CLCuMuV, when alone, could infect N. benthamiana, turning yellowing of leaves. However, additional betasatellite was required for induction of leaf curl symptoms.

Discussion
The present study was taken up as it deemed desirable to understand the complexity of this disease. Cotton leaves showing CLCuD symptoms were taken from Haryana and New Delhi, India. RCA was performed for characterization and phylogenetic analysis of begomoviruses,, betasatellite and alphasatellite molecules associated with CLCuD from North India. Nucleotide sequences of all the BAC showed the typical size and organization of typical monopartite begomovirus [40][41][42][43]. The isolates of the BAC in the present study exhibited a maximum nucleotide sequence identity with CLCuMuV sequences (JN678804, AY765253 and DQ191160) reported earlier from different parts of India [27]. The sequence similarity was further supported by phylogenetic analysis. The clustering of the begomoviral isolates with CLCuMuV is consistent with nucleotide sequence alignments. Further, phylogenetic studies demonstrated that isolates from the same region (Sirsa, Haryana) collected in different years fell under two separate groups. The formation of separate clusters might be due to difference in the length of V2 region of CLCuMuV-SR13 as compared to other isolates identified in this study. Also, the nucleotide sequence of CLCuMuV-SR13 (KJ868820) showed highest similarity with CLCuMuV sequences from Pakistan (EU384573, AJ496461, AJ496287 and AJ002447), thus formed a group with Pakistani isolates. Other four newly identified sequenes of CLCuMuV-SR14 (KX951460), CLCuMuV-ND14 (KX951461), CLCuMuV-ND15 (KY56 1820) and CLCuMuV-SR15 (KY888163) shared maximum identity with CLCuMuV sequences from India (AY765253, AY765256 and DQ191160) and forming a group with Indian isolates. The CLCuMuV isolates in this study also exhibited highest sequence identity amongst themselves indicating the presence of same strain according to recent begomoviral demarcation threshold [36]. This further demonstrates that CLCuD in northern Indian regions is predominantly induced by CLCuMuV [44]. Interestingly, despite high sequence similarity, there exists diversity among genome organization of these isolates. For instance, the genomes of CLCuMuV-SR14 and CLCuMuV-SR15 possess 7 ORFs, whereas those of CLCuMuV-SR13, CLCuMuV-ND14 and CLCuMuV-ND15 contain 6 ORFs. Remarkably, C5 gene rarely reported in CLCuMuV genome, was identified in CLCuMuV-SR14 and CLCuMuV-SR15.
Betasatellite sequences analysed in this study showed typical genome organization having a single ORF on complementary-sense strand, SCR (conserved amongst all the betasatellites) and a sequence rich in adenine (A-rich region) [40,45]. These sequences shared a maximum identity with CLCuMB isolates from India and were supported by their clustering pattern in phylogenetic analysis. However, CLCuMB sequences, obtained from two different regions (Sirsa and New Delhi) appeared to form separate clusters. The clustering pattern of CLCuMB sequences from Sirsa seemed to have same ancestral origin but have probably evolved separately. These results depicted the monophyletic nature of CLCuMB which is consistent with the previously reported (pre-2013) CLCuMB sequences from India and Pakistan, contrasting the recently reported (post-2013) CLCuMB sequences from India and Pakistan that form groups regardless of their geographical origin [37,44,46].
Alphasatellite sequences obtained in this study also showed a typical pattern consisting of a single virion sense Rep gene and an A-rich region [40]. Notably, none of the alphasatellite reported in the current study shared significant sequence similarity with cotton leaf curl alphasatellite, rather they exhibited a maximum similarity to GLCuA (HE599397, HE599396, HG934789,…etc) from Pakistan. This encouraged us to conclude that the alphasatellite associated with CLCuD isolates of the present study is actually GLCuA. The frequent identification of GLCuA  in the present study from CLCuD-infected cotton plants, however, contrasts with the recent detection of CLCuD associated alphasatellites (ToLCA and OkLCuA) by Datta et al., 2017 [44]. Thus, our results revealed a mixed infection and different natural combinations of CLCuMuV and its associated satellites CLCuMB and GLCuA in cotton plants from India. Phylogenetic analysis supported these results by placing the GLCuA sequences on a separate branch closely related to GLCuA from Pakistan (HE599397, HE599396, HG934789,…etc). The occurrence of GLCuA might be due to migration of whitefly vector from the neighboring country Pakistan as suggested by Sahu et al., 2018 [38].
The identification of CLCuMuV in CLCuD cotton plants in North India shows a correlation with the recent literature which documents the rebound of CLCu-MuV in Punjab (India) and Vehari in Pakistan [44,46]. Thus, our observations from North India together with the recent studies from the Punjab regions of India and Pakistan, suggest the recurrence of CLCuMuV in the Indian subcontinent and shows its predominance and frequent emergence in North Indian regions which might cause shift in previous epedemics of the recombination-prone CLCuKoV to this subcontinent [47].
Various factors such as mixed infections, high level of viral replication and increased vector host range could contribute significantly to recombination [48]. The evolution and biodiversity of begomoviruses and also other viruses is mainly attributed to recombination [17]. Our analysis revealed a potential recombination origin of CLCuMuV isolates studied in this article. In previous studies IR, V1, V2 and C1 regions were shown to be the recombination hotspots in begomovirus genome [24,34,36,39,46,[49][50][51]. In the present study, recombination events were however identified in all the CLCuMuV genes and correlate well with the recent analysis [44]. In spite of similarities, earlier findings differ from this study in several aspects [44]. For instance, recombinant fragments of CLCuMuV were contributed by CLCuKoV and CLCuAlV (as major parents) and CLCuMuV (as minor parent). Inaddition, almost all of the recombinant fragments of CLCuMuV were contributed by CLCuMuV (as major parent) and CLCuKoV (as minor parent) and no evidence of recombination from CLCuAlV was observed in the present study. Further, in previous study CLCuKoV was shown to contribute only the Rep, CR and ori to CLCu-MuV [10]. However, in the present study CLCuKoV showed contribution in genes of both complementary sense and virion sense of the CLCuMuV genome. For instance, in CLCuMuV-SR14, CLCuKoV devoted the IR, C1, C4, C5, V1 and V2 regions. Inspite of that, C1, C4, V1, V2 and C5 regions of this isolate (SR14) were also shared by CLCuMuV. Further, C3 and C2 regions were contributed by isolates CLCuMuV (JN678804) and CLCuMuV (EU384574), respectively. In CLCuMuV-ND14, CLCuKoV donated only in the V1 and V2 regions and not the C1Rep. In CLCuMuV-ND15, CLCuKoV contributed the V1, V2, C2 and C3 genes. Though CLCuMuV isolates from Pakistan having short stretches of C1, C2 and C3 were reported to be derived from CLCuKoV, in the Indian isolates contribution of CLCuKoV was previously detected only in the IR and Rep regions [10]. These observations suggest that the emergence of several new BAC in north Indian regions occurred because of recombination between two distinct species viz. CLCuMuV and CLCuKoV.
In our study nucleotide sequences of CLCuMB showed recombination in the βC1, SCR and A-rich regions, thus supporting the previous reports from Pakistan and India [24,37,44,48]. And, in GLCuA, recombination was detected in Rep and A-rich regions which is compatible with earlier studies suggesting these regions as the hotspots of recombination [52].

Conclusions
In conclusion, this study revealed that CLCuMuV has been evolving continuously and the evolution of recombinant begomovirus isolates reported here might be derived via exchange of genetic material from related begomoviruses viz., CLCuMuV, CLCuKoV and CLCuRaV.
Our results clearly demonstrated the complex pattern of inter-species and intra-species recombination leading to significant structural changes in DNA components of CLCuMuV. It thus resulted in the genetic variability and emergence of new variants. This genetic diversification of CLCuMuV may lead to diverse host range (enhancing threat to other crops), transmission ability and capability of breaking resistance, hence posing a big challenge for disease management. This study encourages further analysis about molecular epidemiology and genetic evolution of geminivirus. Timely diagnosis and application of control strategies need to be done before this disease complex spreads over a large scale and causes severe threat to economy. This molecular information may provide insights about the genome complexities, sequence identity and genetic variability parameters viz., recombination analysis of CLCuMuV genome, its associated satellite DNAs (CLCuMB and GLCuA) which are indispensable for developing virus-based disease management strategies. This may accelerate the development of CLCuD-resistant plants. Pathogen-derived resistance manipulating CLCu-MuV genes has already shown a great potential to generate virus-resistant plants.

Methods
Sample collection and detection of begomoviruses associated with CLCuD (BAC) Cloning and nucleotide sequencing of satellite molecules (betasatellites/alphasatellites) associated with CLCuD Amplification of satellite molecules was done using RCA product as the template employing oligo primers specific for betasatellite [54] and alphasatellite [55]. The PCR products were resolved on agarose gel, purified with the help of Gel extraction Kit (QIAGEN, Germany) and cloned into pDrive (QIAGEN, Germany) and pGEM-T easy vector (Promega, USA). Nucleotide sequences of the clones representing betasatellite/alphasatellite were determined using automated sequencer (Xcelris Genomics, Ahmedabad, India).

Sequence comparison and phylogenetic analysis
Nucleotide sequences derived from different clones representing BAC isolates were assembled to construct complete genome. Sequences of the complete BAC genomes and their associated satellite molecules (this study) were compared with those available in NCBI GenBank database following BLASTn (https://blast.ncbi. nlm.nih.gov) program [56]. A total of 54 full-length sequences of genomes of BAC, 44 betasatellite and 53 alphasatellite sequences originating from the Indian subcontinent and depicting high homology with the sequences determined under study, were selected and retrieved from GenBank (details of sequences used in this analysis are provided in Additional file 8: Table S4; Additional file 9: Table S5; Additional file 10: Table S6). Phylogenetic analysis following multiple sequence alignments was performed using MEGA-7.0 [57]. Default parameter of one character-based and two distance-based algorithms were included. A consensus dendrogram was finally drawn for these algorithms employing bootstrap value of 1000 replicates. Pairwise sequence alignment was performed using SDT [58].

Recombination analysis
To investigate the extent of recombination, putative breakpoints and parents of CLCuMuV isolates, RDP-4 was employed [59]. The full-length sequences of BAC, betasatellite and alphasatellite were used for recombination analysis (Additional file 11: Table S7 a-c). Nucleotide sequences were aligned using CLUSTAL-W in MEGA 7 [57]. The resultant alignments were exported to the RDP4 and subjected to recombination analysis. Recombination analysis having cutoff of 0.05 was used as a P-value and the default settings was performed in RDP4 programme employing RDP, BOOTSCAN, GENCONV, SISCAN, MAXCHI, CHIMAERA and 3SEQ to estimate the recombination events. Recombination analysis were considered true, if they were supported by at least 3 methods.

Development of CLCuMuV/CLCuMB-derived infectious clones and agroinoculation into Nicotiana benthamiana plants
Infectious clones of the CLCuMuV genome (isolate CLCuMuV-SR13) and associated CLCuMB (CLCuMB-SR13) were cloned in pGreen0029 (pGR5.4) and pCAMBIA1302 (pCAMβ) binary vectors respectively, in accordance with Pratap et al., 2011 [60]. The KpnI based partially digested RCA product generated a DNA fragment of~2.7 kb which was cloned into pGreen0029 vector (pGR2.7I) following its nucleotide sequence determination (Xcelris Genomics, Ahmedabad, India). The RCA product obtained from CLCuD-infected cotton leaves was subjected to PCR amplification using oligo primers (viz. Cot For/Cot Rev. containing KpnI and HindIII restriction sites in Forward and Reverse primers, respectively) as described in above section. PCR amplicon (~2.7 kb), thus obtained, was cloned into pGEM-T Easy vector (pGEM-T-2.7), digested with KpnI and HindIII restriction enzymes (Fermentas, USA) and cloned in pGreen0029 vector (pGR2.7II). The DNA insert from the pGR2.7I vector was released following digestion with KpnI restriction enzyme and subsequently cloned in binary vector pGR2.7II. The complete head to tail dimer of CLCuMuV genome was cloned in pGreen0029 vector pGR5.4. The integration and orientation of dimeric clone of CLCuMuV genome was verified after restriction digestion with NcoI. Similarly, full-length CLCuMB sequence (CLCuMB-SR13), which was previously cloned in pDrive (pDriveβ) vector, was subjected to restriction digestion by EcoRI and BglII. The digested DNA fragment (~500 bp) having hairpin loop, thus generated, was cloned into pCAMBIA1302 binary vector (pCAM0.5β). The complete CLCuMB sequence cloned into pDriveβ was released following digestion with EcoRI and cloned into pCAM0.5β vector (pCAMβ). Cloning and orientation of the partial dimer was confirmed by restriction digestion. The infectious clone viz. pGR5.4 (along with helper plasmid pSoup) and binary vector (pCAMβ) were mobilized in Agrobacterium tumefaciens strain LBA4404 by freeze thaw method [61]. The mobilization of pGR5.4 and pCAMβ was confirmed by colony PCR employing primers specific for BAC (viz. Cot For/Cot Rev) and betasatellite (β01/β02) respectively, as described in above section. The agroinfiltration in N. benthamiana plants was done as described by Khan et al., 2015 [8].