Various short autonomously replicating sequences from the yeast Kluyveromyces marxianus seemingly without canonical consensus

Eukaryotic autonomously replicating sequences (ARSs) are composed of three domains, A, B, and C. Domain A comprises an ARS consensus sequence (ACS), the B domain contains the DNA unwinding element, and the C domain is important for DNA-protein interactions. In Saccharomyces cerevisiae and Kluyveromyces lactis ARS101, the ACS is commonly composed of 11 bp, 5′-(A/T)AAA(C/T)ATAAA(A/T)-3′. This core sequence is essential for S. cerevisiae and K. lactis ARS activity. In this study, we identified ARS-containing sequences from genomic libraries of the yeast Kluyveromyces marxianus DMKU3-1042 and validated their replication activities. The identified K. marxianus DMKU3-1042 ARSs (KmARSs) replicate very effectively, but their sequences are divergent and share no common consensus. We carried out point mutations, deletions, and base-pair substitutions within the sequences of some of the KmARSs to identify the sequence(s) that influence replication activity. A consensus sequence matching the 11 bp ACS of S. cerevisiae and K. lactis was not found in any of the minimum functional KmARSs reported here except KmARS7. Moreover, partial sequences from different KmARSs can be interchanged while retaining ARS activity. We have also identified the essential nucleotides, which are indispensable for replication, within some of the KmARSs. Our deletion analysis revealed that as few as 21 bp of KmARS18 could retain ARS activity. The KmARSs identified in this study are unique compared with other yeasts' ARSs, do not share a common ACS, and are interchangeable.


Introduction
Duplication of genomes requires precise initiation of DNA replication at replication origins. Eukaryotic replication origins are divergent but generally encompass binding sites for the origin recognition complex (ORC), regulatory sequences, and transcription units (Gilbert, 2001). An essential component of the replication origins is the cis-acting autonomously replicating sequence (ARS). ARSs have been shown to allow stable maintenance of episomal plasmids within the yeast cell (Liachko and Dunham, 2014). Generally, intergenic sequences that contain more than 75% A-T are potential initiation sites for DNA replication in yeasts (Liachko et al., 2010). In Saccharomyces cerevisiae, short sequences of less than 100 bp are defined as ARSs; they contain an 11-17 bp ARS consensus sequence (ACS) in addition to fairly well-defined flanking sequences (Liachko and Dunham, 2014; Méchali et al., 2013). However, Méchali (2010) reported that the presence of an ACS is not sufficient to predict a functional DNA replication origin because, among the 12,000 ACS sequences discovered in the S. cerevisiae genome, only 400 are active replicators (Nieduszynski et al., 2006). On the other hand, different groups within the genus Saccharomyces have varying ARS elements as components of the replication origin (Dhar et al., 2012). Most Kluyveromyces lactis ARSs utilize a 50-bp ACS motif, which is completely divergent from the canonical S. cerevisiae ACS (Liachko et al., 2010), except for ARS101 of K. lactis, which shares the common ACS of S. cerevisiae (Irene et al., 2004). The ARSs of the yeast Lachancea kluyveri require a sequence that is similar to but much longer than the well-defined ARS consensus sequence of S. cerevisiae (Liachko et al., 2011). ARS elements in Schizosaccharomyces pombe are more than 1 kb in size and rich in AT residues, but lack a common sequence motif. High-affinity binding of S. pombe ARS to SpORC requires no specific sequence (Clyne and Kelly, 1995; Kelly and Callegari, 2019; Reeves and Nissen, 1990). An ARS of 60 bp was reported as indispensable and adequate to confer ARS function to shuttle plasmids and linear DNAs in the yeast Candida guilliermondii (Foureau et al., 2013).
The yeast Kluyveromyces marxianus DMKU3-1042 is a thermotolerant, cost-effective, high-temperature ethanol-fermenting yeast that grows fast on various carbon sources (Abdel-Banat et al., 2010a; Limtong et al., 2007). It tends to integrate linear DNA fragments randomly and effectively into its chromosomes (Nonklang et al., 2008) via its highly active non-homologous end-joining (NHEJ) pathway (Abdel-Banat et al., 2010b), and it does not need homologous sequences at the fragments' ends for effective recombination unless its NHEJ pathway is disrupted. To utilize the advantages of the strain, we developed a simple one-step method for NHEJ-based cloning and constructed several K. marxianus circular plasmids with different selection markers for recombinant DNA (Hoshida et al., 2014). Using this method, 36 promoters were cloned to express RFP, and the promoters' activities and expression profiles were analyzed in a real-time manner. Notably, transformation of a mixture of two PCR-amplified DNA fragments could generate correct recombinant DNA in K. marxianus, and the replication of plasmids within the yeast cells was driven by the 60-bp sequence of KmARS7 (Hoshida et al., 2014).
In this study, we report the isolation and analysis of additional KmARSs from the yeast K. marxianus DMKU3-1042. Following a simple functional validation approach based on post-transformation cellular events, we identified several robust KmARSs. In addition, the impact of site-specific mutations and deletions on the activity of some KmARSs was determined. We also demonstrate the influence of short interchanged sequences of KmARSs on the replication activity. The KmARSs reported here indicate that the strain DMKU3-1042 uses various autonomously replicating sequences that have no obvious canonical consensus.

Strains, media, and transformation procedures
Yeast strains (Table 1) were regularly maintained at 28 °C in YPD medium [1% yeast extract, 2% peptone, 2% glucose] or SD medium [0.17% yeast nitrogen base without amino acids and ammonium sulphate (US Biological, MA, USA), 0.5% ammonium sulphate, 2% glucose and required nutrients]. SD(-U) was an SD medium with necessary nutrients but lacking uracil (Ausubel et al., 1999). 5-Fluoroorotic acid (5-FOA) medium was prepared according to the protocol described by Akada et al. (2006). Luria-Bertani (LB) medium containing 100 μg/ml ampicillin (Sigma-Aldrich, MO, USA) was used for the selection of E. coli strain DH5α cells transformed with plasmids bearing the AmpR marker gene. Solid media contained 2% agar. Yeast strains were grown on fresh YPD plates at 28 °C for 1-2 days before being used for transformation experiments. Yeast competent cells were prepared as previously described (Abdel-Banat et al., 2010b). Briefly, a mixture containing final concentrations of 40% w/v polyethylene glycol 3350 (PEG), 200 mM lithium acetate (LiAc), and 100 mM dithiothreitol (DTT) was dissolved in sterilized distilled water. This mixture was referred to as the transformation mixture (TM). Aliquots of auxotrophic mutant K. marxianus cell suspension prepared in the TM retain their competence for up to 14 months when stored at −80 °C (Abdel-Banat et al., 2010b). Transformation was accomplished by thawing the yeast competent cells at room temperature, adding PCR-amplified linear or plasmid DNA, applying heat shock for 15 min at 47 °C, and then plating on SD(-U) medium for selection.

Screening and isolation of autonomously replicating sequences from K. marxianus (KmARSs)
The chromosomal DNA of the yeast K. marxianus DMKU3-1042 and the S. cerevisiae shuttle vector pRS316 (Sikorski and Hieter, 1989) were digested with the EcoRI and XhoI restriction enzymes as instructed by the manufacturer (New England Biolabs, MA, USA). The recovered K. marxianus DNA was ligated into the digested vector using the T4 DNA ligase kit (New England Biolabs, MA, USA) and the reaction was terminated by heating for 10 min at 65 °C. The ligation product was transformed into competent cells of E. coli. Approximately 14,959 E. coli colonies carrying plasmids with K. marxianus chromosomal DNA fragments were pooled from the LB selection plates and cultured overnight in liquid LB medium at 37 °C, and the recombinant plasmids were extracted and purified from the E. coli cells using the QIAprep® spin miniprep kit (Qiagen). The purified plasmids were then transformed into the K. marxianus strain RAK3605 (ura3-1) as described previously (Abdel-Banat et al., 2010b). RAK3605 cells that were transformed with the genomic library were cultured in MM(−U) medium to identify the cells that harbor recombinant pRS316 with potential autonomously replicating sequences of K. marxianus (KmARSs). The recovered cells were spread on YPD plates to produce colonies and subsequently, at least six transformants from each construct were inoculated on 5-FOA plates (Boeke et al., 1987) to detect whether these plasmids can replicate autonomously.

Sequence identification of KmARSs
To identify the sequences of KmARS-containing plasmids confirmed to replicate autonomously within K. marxianus cells, yeast transformants were cultured individually in MM(−U) liquid medium and grown overnight at 28 °C. Plasmids were then extracted using a Zymoprep™ Yeast Plasmid Miniprep Kit II (Zymo Research, Orange, CA, USA) and Zymolyase 100T (Seikagaku Biobusiness, Tokyo, Japan), as previously reported (Nonklang et al., 2008). The isolated plasmids were again cloned in E. coli DH5α competent cells and purified as stated in section 2.2. Throughout the empirical work in this study, the concentration of all kinds of DNA was quantified with a Qubit® fluorometer (Thermo Fisher Scientific Inc.) using the Quant-iT™ dsDNA assay kit. The sequences of KmARSs were determined by the cycle sequencing protocols used for the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems™) according to the supplier's instructions. Recombinant pRS316 plasmids with inserted KmARSs are listed in Table 2.

DNA manipulation
PCR was performed using KOD plus DNA polymerase (Toyobo, Osaka, Japan) according to the manufacturer's instructions. The primers used are listed in Table 3. The S. cerevisiae URA3 gene (ScURA3), including its promoter and terminator, was amplified by PCR from BY4704 chromosomal DNA with the following primer pairs: URA3-223 and URA3-300c; URA3-300 and URA3-300c; 9C-URA3-223 and URA3-300c; and 9C-URA3-223 and 3CG9-URA3+880c. The 9C and 3CG9 sequences flanking the URA3 gene were utilized subsequently in two discrete PCR reactions (Cha-aim et al., 2009, 2012; Hoshida et al., 2014) to anneal the KmARSs at either or both ends for further analysis. The minimum active sequences of KmARSs (Table 2) were determined empirically by PCR-directed deletion of the KmARS sequences from both sides and rejoining the amplified fragments together with the URA3 gene as described before (Hoshida et al., 2014).

Functional validation of K. marxianus ARSs (KmARSs) by linear KmARS transformation
To determine the minimum active sequences of KmARSs, three steps were followed (Fig. S1A). First, the ScURA3 gene was amplified by PCR with the primers 9C-URA3-223 and 3CG9-URA3+880c. Second, a linker of 9C (5′-ccccccccc-3′) or 3CG9 (5′-cccgggccc-3′) was designed at the 3′ end of the KmARS primers to anneal the truncated KmARS sequences to the ScURA3 gene prepared in the first step. Third, short truncated sequences of some KmARSs were divided into two parts to design primers. One part was flanked with 9C and the other with 3CG9, with the intention of leaving the central joining sequence of the KmARS free after running the PCR with both primers (Fig. S1A). These steps were used to identify the minimum active sequences for KmARS7, KmARS11, KmARS16, KmARS18, KmARS22, KmARS36, and KmARS51 by transforming the ScURA3+KmARS constructs into K. marxianus strain RAK3606, followed by selection on MM-U and replica-plating on 5-FOA. To examine whether segments of minimum KmARSs can be exchanged with each other while retaining ARS activity, combinations of primer pairs representing discrete KmARSs were used to anneal them by PCR at the ends of the ScURA3 gene as described in the third step above, followed by the routine selection and replica-plating procedures (Fig. S1A).

Analysis of K. marxianus ARS consensus sequence (ACS)
To detect the ACS within KmARSs, deletion and/or substitution experiments were performed on the minimum active sequences of KmARS7, KmARS11, KmARS18, KmARS22, and KmARS36. Deletion primers were designed from the minimum active sequences of KmARS7 (201-250) and KmARS36 (291-340) by deleting three nucleotides at a time, while for KmARS18 (111-138), single-base deletions were carried out in addition to single-base substitutions for all bases. In the case of KmARS11 (46-105), five nucleotides were deleted at a time from the 3′ end and ten nucleotides were deleted at a time from the 5′ end. For KmARS22 (991-1060), ten nucleotides were deleted at a time from either the 5′ or 3′ side.
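The coordinate bookkeeping behind such stepwise deletions can be sketched in a few lines of Python. This is only an illustrative sketch and not the primer-design procedure used in this study; the function names `region_length` and `deletion_series` and the side labels `"3prime"`/`"5prime"` are hypothetical. It assumes inclusive, 1-based coordinates such as KmARS36 (291-340).

```python
def region_length(start, end):
    """Length in bp of an inclusive, 1-based region such as (291-340)."""
    return end - start + 1


def deletion_series(start, end, step, side, n_steps):
    """Return the (start, end) regions left after removing `step` nt per round.

    `side` is "3prime" (trim the end coordinate) or "5prime" (trim the start
    coordinate). Both labels are hypothetical names used only in this sketch.
    """
    regions = []
    for i in range(1, n_steps + 1):
        if side == "3prime":
            s, e = start, end - i * step
        elif side == "5prime":
            s, e = start + i * step, end
        else:
            raise ValueError("side must be '3prime' or '5prime'")
        if e < s:  # nothing left to delete
            break
        regions.append((s, e))
    return regions


# Triple-nucleotide deletions from the 3' end of KmARS36 (291-340)
for s, e in deletion_series(291, 340, 3, "3prime", 5):
    print(f"KmARS36 ({s}-{e}): {region_length(s, e)} bp")
```

With a step of 5 from the 3′ end or 10 from the 5′ end, the same helper yields the KmARS11 regions (46-100), (46-95), (56-105), and (66-105).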

Functional validation of KmARSs
We have previously shown that the circular plasmid pRS316 did not replicate in K. marxianus DMKU3-1042 but its linear DNA efficiently integrated into the chromosomes of this strain (Abdel-Banat et al., 2010b; Nonklang et al., 2008; Hoshida et al., 2014). In this study, a simple approach based on a linear transformation protocol was adopted to provide proof of concept for the activities of KmARSs (Fig. S1A). After a series of sequence alignments (Fig. S3) with known ARSs from S. cerevisiae (Deshpande and Newlon, 1992) and Kluyveromyces lactis (Iborra and Ball, 1994), KmARS sequences ranging from 21 to 70 bp were identified for replication in K. marxianus DMKU3-1042. To analyze the sequences more precisely, these KmARSs were fused to the ScURA3 marker gene and subjected to transformation. Upon transformation, the yeast K. marxianus uses its NHEJ pathway to join the ends of these linear constructs, forming circular DNA and giving rise to transformants. However, some transformants may have been produced by chromosomal integration of the introduced DNA. To confirm plasmid formation, transformants were inoculated on 5-FOA plates. Yeast cells with autonomously replicating DNA successfully grow on 5-FOA, while cells with a chromosomally integrated ScURA3 gene fail to grow on 5-FOA (Fig. S1A & B). Using this easy functional validation based on post-transformation cellular events, truncated but functional sequences of seven KmARSs were verified (Fig. 1A). The functional sequences of KmARS7 (50 bp [...] gives more than 80% rescued colonies from 5-FOA toxicity, an indication of intracellular replication as plasmids. It is noteworthy that the alignment of these short functional sequences showed no prominent common consensus, although AT stretches predominate in the sequences (Fig. 1B).
*pRS316 is a S. cerevisiae CEN6/ARSH4 shuttle vector (Sikorski and Hieter, 1989). **Sequence coordinates represent the chromosomes of K. marxianus DMKU3-1042 (Lertwattanasakul et al., 2015).

Impact of truncations and triple nucleotide deletions on the activity of the region 201-250 of KmARS7
We have previously demonstrated that 60 nucleotides of KmARS7 (201-260) effectively drove the replication of the ScURA3 gene (Hoshida et al., 2014). However, KmARS7 retains its potent activity even after further truncations of this region. The region 201-250 gave an average of 40×10⁵ colony-forming units (CFU) μg⁻¹ transforming DNA, but the number of transformants dropped drastically when the regions 216-250 or 226-250 were transformed in conjunction with the ScURA3 gene (Fig. 2A). Truncations of the region KmARS7 (201-250) were also investigated by triple nucleotide deletions. Two separate primers for each construct were used to amplify the ScURA3 marker. One primer was KmARS7 (201-225)c-3CG9 and the other set of primers were KmARS7 (226-250)9C and its triple nucleotide truncations (Fig. 2B). Deletion of three nucleotides from the 3′ end of KmARS7

Functional characteristics of KmARS11
The whole insert sequence of KmARS11 is 154 bp. The regions KmARS11 (46-105) and (46-100) gave comparably high transformability, while the transformability of the regions KmARS11 (46-95) and (56-105) declined but still produced significant levels of transformants compared with transformation without any ARS. The transformability of the region KmARS11 (66-105) was not distinct from that of the ScURA3 gene alone (Fig. 3). As a result, 50 bp of KmARS11 (56-105) retains the transformability.

Functional characteristics of KmARS18
Deletion of seven nucleotides from the 3′ end of KmARS18 (111-160) slightly decreased the transformation efficiency of KmARS18. Further triple-nucleotide deletions reduced the transformation efficiency of KmARS18 on average to levels as low as 34%. Surprisingly, the region KmARS18 (111-138), which is 28 bp long, showed elevated transformation efficiency (Fig. 4A) and was thoroughly investigated by single nucleotide deletion from both sides (Fig. 4B). The deletion of seven nucleotides from the 5′ end (TCCATAA) resulted in the generation of fewer transformants. Moreover, an additional single nucleotide deletion from this region completely abolished its transformability. On the other hand, the deletion of four nucleotides from the 3′ end (135-CTTT-138) eliminated transformability. A region of KmARS18 as short as 21 bp, covering the nucleotides (116-136), was capable of driving efficient transformation (Fig. 4B). Replacement of the three nucleotides 131-GTC-133 with CCA, the addition of A at position 131, deletion of G at position 122, and replacement of the region KmARS18 (111-TCCATAATT-119) with nine nucleotides of KmARS7 (201-CAAGACTTC-209) at the same site all negatively affected the transformation efficiency of the region KmARS18 (111-138) (Fig. 4B). Furthermore, as shown in Fig. 5A, a single nucleotide substitution in the region KmARS18 (111-138) induces a moderate to weak effect or complete loss of transformability. However, substitutions at some sites did not affect the transformability, and the mutants gave transformants similar to the original sequence. The substitutions T118G, T118C, T119A, T119C, G121C, A128C, A129G, A129T, and A129C each made KmARS18 (111-138) lose the ability to produce transformants (Fig. 5A).
In other cases, only a few small transformants developed upon base substitution at the sites T111C, A117G, A117C, T118A, G121T, G122T, T125G, T126A, T126G, G127A, A128T, A130C, G131T, G131C, or T132A (Fig. 5A). Additionally, the 21-nucleotide region [KmARS18 (116-136)] that showed highly efficient transformation (Fig. 4B) was capped by adding five nucleotides, "CGCGC", at its free end after joining it to the marker gene. Transformation of this construct and of a similarly capped region KmARS18 (111-159) as a control revealed that the region KmARS18 (116-136) is very sensitive to additional bases at its 3′ end (Fig. 5B), whereas 5′-capping with "CGCGC" did not interfere with the efficient transformability of the region KmARS18 (111-159) [49 bp].
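The substitution names used above (for example T118G) follow the pattern reference base, position, replacement base. As a minimal illustrative sketch, and not the primer-design tool used in this study, the following Python function enumerates every single-base substitution of a fragment together with its name. The function name `single_substitutions` is hypothetical, and only the nine-nucleotide fragment 111-TCCATAATT-119 quoted in the text is used as input, because the remainder of the KmARS18 sequence is not reproduced here.

```python
def single_substitutions(fragment, first_pos):
    """Map names such as 'T118G' to the fragment carrying that substitution.

    `fragment` is a DNA string and `first_pos` is the coordinate of its
    first base (1-based, as in the region labels of the text).
    """
    seq = fragment.upper()
    variants = {}
    for offset, ref in enumerate(seq):
        for alt in "ACGT":
            if alt == ref:
                continue  # not a substitution
            name = f"{ref}{first_pos + offset}{alt}"
            variants[name] = seq[:offset] + alt + seq[offset + 1:]
    return variants


# Fragment KmARS18 (111-119) as quoted in the text
variants = single_substitutions("TCCATAATT", 111)
print(len(variants))      # three substitutions per base
print(variants["T118G"])  # fragment with T at position 118 replaced by G
```

Each position yields three variants, so a full scan of a 28-bp region such as KmARS18 (111-138) corresponds to 84 single-base substitution constructs.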
The region (291-340) of KmARS36 [50 bp] gave an average of 3.9×10⁵ CFU μg⁻¹ DNA. Contrary to the other KmARSs, the transformability increased gradually upon deletion of three nucleotides at a time and reached up to 7.73×10⁵ CFU μg⁻¹ DNA when nine nucleotides were deleted from the 3′ end, leaving a region of 41 bp [KmARS36 (291-331)]. When twelve nucleotides were deleted, leaving a region of 38 bp [KmARS36 (291-328)], the transformability declined slightly compared with KmARS36 (291-331) but remained higher than that of KmARS36 (291-340), indicating that the 38 bp-long region is still capable of driving autonomous replication. Further deletions from the 3′ end, leaving the regions 291-328 or 291-325, caused the loss of transformability (Fig. 6B). The nucleotides covering the region 326-ATAAAA-331 are indispensable for the activity of KmARS36.
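Transformation efficiencies such as those reported for KmARS36 are expressed as CFU per μg of transforming DNA, and truncations can be compared as fold changes. The following Python sketch shows this arithmetic under stated assumptions: the helper names `cfu_per_ug` and `fold_change`, the `plated_fraction` parameter, and the colony count in the first call are hypothetical, and only the two KmARS36 averages are taken from the text.

```python
def cfu_per_ug(colonies, dna_ug, plated_fraction=1.0):
    """Colony-forming units per microgram of transforming DNA.

    `plated_fraction` is the fraction of the transformation mix that was
    plated (hypothetical parameter for this sketch).
    """
    if dna_ug <= 0 or not 0 < plated_fraction <= 1:
        raise ValueError("dna_ug must be > 0 and 0 < plated_fraction <= 1")
    return colonies / (dna_ug * plated_fraction)


def fold_change(efficiency, reference):
    """Ratio of one transformation efficiency to a reference efficiency."""
    return efficiency / reference


# Hypothetical plate count: 200 colonies from 1 ng DNA, half of the mix plated
print(f"{cfu_per_ug(200, 0.001, 0.5):.2e} CFU per ug")

# Reported averages: KmARS36 (291-331) versus KmARS36 (291-340)
print(f"{fold_change(7.73e5, 3.9e5):.2f}-fold")
```

On the reported averages, the 41-bp region KmARS36 (291-331) gives roughly twice the efficiency of the 50-bp region KmARS36 (291-340).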

Impact of KmARSs interchanged sequences on transformability
There is clear variation among the identified core sequences of KmARSs (Fig. 1B), and these KmARSs have no sequence identity with the optimized KlARS (Liachko and Dunham, 2014) (Fig. S4). Although the regions KmARS18 (1181-1240) and KmARS7 (123-182) contain sites with fairly high identity to other yeast ARS consensus sequences (ACS), these regions did not drive efficient transformability relative to the corresponding regions KmARS18 (111-159) and KmARS7 (201-250) (Fig. S5). Because of the disparities in the consensus and lengths of the identified KmARSs, short sequences of these ARSs were interchanged with each other to judge whether or not they could induce efficient transformability. As shown in Table 4, most of the interchanged KmARS regions generated transformants, in some instances even more than the corresponding regions of the individual KmARSs. The most prominent results were the highly efficient transformability of KmARS18 (Table 4). Meanwhile, these interchanged sequences showed fewer consensus identities, and the similarities were mainly skewed towards the 3′ and 5′ ends without a clear consensus in the middle (Fig. S6). It is noticeable that transformants from the interchanged constructs gave between 81 and 100% colony growth on 5-FOA.

Discussion
Autonomously replicating sequences (ARSs) are the replicator elements to which the initiator protein binds; the initiator unwinds the DNA double helix and recruits additional factors to initiate the process of DNA replication. The proteins that regulate replication are highly conserved, including the origin recognition complex (ORC), which binds directly to replication origin sequences, but Gilbert (2001) stated, "In several eukaryotic replication systems, it appears that any DNA sequence can function as a replicator". However, many studies on yeast ARSs have helped to define specific sequences that function as origin replicators in S. cerevisiae, S. pombe, K. lactis, and C. guilliermondii (Stinchcomb et al., 1980; Clyne and Kelly, 1995; Irene et al., 2004; Liachko et al., 2010; Liachko and Dunham, 2014; Foureau et al., 2013). Here we report the identification of twelve functional KmARSs from the strain DMKU3-1042 that are capable of replicating plasmid DNA but have no common consensus sequence.
Previously, Iborra and Ball (1994) reported the isolation of three small DNA fragments from K. marxianus strain ATCC12424 [ARS1 (1267 bp), ARS2 (1206 bp), and ARS3 (1200 bp)]. ARS1 and ARS2 contain both ARS and centromeric elements, while ARS3 contains an ARS core sequence only, and all three function in K. lactis. Only two of the KmARSs identified in the current study share identity with ARS1 and ARS2 from the strain ATCC12424. One of them is KmARS3 (Table 2), which shares 89.53% identity with ARS1, and the other is a portion of KmARS20F consisting of 128 nucleotides that shares 100% identity with ARS2. However, none of the KmARSs reported here shares significant identity with ARS3 from the strain ATCC12424, which indicates that ten KmARSs are identified in this study for the first time from K. marxianus. This might be because either the ARS3 replicator is not functional in K. marxianus DMKU3-1042 or its counterpart was missed during the screening of our libraries. It has been reported that a very similar nonanucleotide ACS (5′-TTTATTGTT-3′) is common to K. marxianus and K. lactis (Iborra and Ball, 1994). However, this ACS is not found in any of the currently investigated KmARSs.
In this study, we also identified minimal sequences that function as ARSs. These sequences indicated again that the ACS found in S. cerevisiae and K. lactis does not exist in the K. marxianus ARSs. In addition, within the 50-bp KmARS sequences, as few as 21 bp can function as an ARS for plasmid replication. No clear consensus sequence was found among the identified minimal functional sequences, indicating that the essential sequences of ARSs in K. marxianus are divergent.
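Whether a candidate sequence carries the canonical 11-bp ACS, 5′-(A/T)AAA(C/T)ATAAA(A/T)-3′, can be checked with a simple motif scan on both strands. The Python sketch below is offered only as an illustration under stated assumptions: it is not the alignment procedure used for Figs. 1B and S3-S6, the function names `find_acs`, `reverse_complement`, and `at_fraction` are hypothetical, and the test strings are constructed examples rather than KmARS sequences.

```python
import re

# 11-bp ACS as written in this article; the lookahead reports overlapping hits
ACS_PATTERN = re.compile(r"(?=([AT]AAA[CT]ATAAA[AT]))")
ACS_LENGTH = 11
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_acs(seq):
    """Return sorted (start, strand, match) tuples; start is 1-based on `seq`."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in ACS_PATTERN.finditer(seq)]
    rc = reverse_complement(seq)
    for m in ACS_PATTERN.finditer(rc):
        # Convert the start on the reverse strand to forward-strand coordinates
        start = len(seq) - (m.start() + ACS_LENGTH) + 1
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


def at_fraction(seq):
    """Fraction of A and T bases in a non-empty DNA string."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


print(find_acs("GGTAAACATAAAAGG"))  # constructed example containing one ACS
print(find_acs("TCCATAATT"))        # KmARS18 (111-119) fragment from the text
print(round(at_fraction("TCCATAATT"), 2))
```

A scan of this kind only reports exact matches to the stated consensus; near matches, such as those discussed for KmARS18 (1181-1240) and KmARS7 (123-182), would require an alignment or a mismatch-tolerant search.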

Conclusion
We identified short sequences that function as K. marxianus autonomous replication origins using a novel and simple approach for validating ARS function. The ACSs of K. marxianus DMKU3-1042 differ among the KmARSs as well as from those of K. lactis, indicating that eukaryotic replication systems do not necessarily share a common ACS. This is consistent with the fact that no site-specificity was detected in early embryos of frogs, flies, and fish (DePamphilis, 2003). Mammals, however, contain genetically required sequences that convey origin activity when translocated to other chromosomal sites, but they lack identifiable, genetically required consensus sequences such as the ACS in budding yeast replicators (Prioleau et al., 2003). A single nucleotide mutagenesis approach helps to identify specifically the essential nucleotides within the span of the active KmARSs. The KmARS18 ACS termini are very sensitive to nucleotide substitution. All defined minimum active KmARSs, except KmARS22 and KmARS16, are located in intergenic sequences of the genome. Overall, the minimum KmARSs reported here are capable of inducing the formation of circular DNA and replicating effectively within the yeast cells. The KmARSs described in this study will provide additional versatile and more effective options for developing large sets of molecular tools for better engineering of this strain.
[Figure caption, partial] Sequences with the symbol (+++) give a highly efficient transformation, those with the symbol (++) give moderate transformation, those with the symbol (+) give weak transformation, and those with the symbol (-) completely lost the activity. The sequences with (+/-) give variable transformability (mainly small colonies). (B) Sensitivity of KmARS18 (116-136) to capping. The addition of cap sequences at the end of KmARS18 (116-136) adversely affects the ARS function of this region. The addition of the cap "cgcgc" to the region KmARS18 (111-159) enhanced the transformability, while the transformability of the capped KmARS18 (116-136) declined greatly relative to the same uncapped region.

Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.

Declaration of Competing Interest
None
*Transformation efficiencies of the interchanged ARS sequences are tabulated as CFU (×10⁵) μg⁻¹ DNA. Using the same lot of yeast competent cells (RAK3605), the marker gene alone gave approximately 1.26×10⁵ CFU μg⁻¹ DNA. **Tested colonies from transformants of the KmARS18 (111-138) in combination with all other regions of KmARSs shown in this table gave 81 to 100 percent growth on 5-FOA.