The CRISPR-Cas System and its Relationship with MGEs in Klebsiella

Microorganisms have evolved many strategies for defending against external attack, one of which is the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) bacterial immune system. In this paper, the whole genomes of 300 Klebsiella strains were collected, the CRISPR-Cas systems in these strains were analyzed statistically, and the types and structures of the CRISPR systems in Klebsiella were explored, along with the correlation between CRISPR and mobile genetic elements (MGEs). Principal component analysis (PCA) showed that Cas genes, plasmids, integrons, IS1, IS609 and DNA-related enzymes were closely related to CRISPR. A comparison of plasmid structural characteristics showed DinG family helicases, Cas6, Csf2 and IS5 near the CRISPR loci on plasmids, and the PCA results likewise indicate that these may be important factors affecting plasmids that carry CRISPR.


Introduction
Klebsiella is a gram-negative bacterium that Bergey's Manual of Determinative Bacteriology divides into several species: Klebsiella pneumoniae, Klebsiella acidophilus, Klebsiella native and Klebsiella phytophthora. Transmissible antibiotic resistance in this organism is a major problem worldwide. K. pneumoniae has a large pool of antibiotic resistance genes, which it shares with other Enterobacteriaceae, mainly through self-transmissible plasmids (Navon-Venezia, et al., 2017). In these organisms, almost all modern antibiotic resistance (to carbapenems, cephalosporins, aminoglycosides, and now even colistin) is encoded on large (40-200 kb), low-copy (16 per cell) conjugative plasmids (Moineau, 2015).
Over the past decades, researchers have studied several CRISPR-Cas systems and Cas proteins in detail. CRISPR-Cas is an adaptive immune system that stores memories of encounters with foreign DNA, mostly mobile genetic elements (MGEs), as unique spacer sequences that are extracted from the MGE and inserted into the CRISPR array (Moineau, 2015). When a familiar MGE is encountered again, transcripts of the CRISPR array are used to recognize homologous sequences and guide Cas nucleases to their targets, resulting in inactivation of the MGE (Barrangou, et al., 2017; Garcia-Martinez, et al., 2018). Like all defense mechanisms, the CRISPR-Cas system evolved in a long arms race with MGEs, which has led to rapid evolution of some Cas gene sequences, primarily those of the effector components, as well as significant diversity in the gene composition and organization of CRISPR-Cas loci. Like eukaryotic RNA interference and the Argonaute-centered defense mechanisms of prokaryotes, CRISPR-Cas belongs to the nucleic acid-guided defense systems (Koonin, 2017; Swarts, et al., 2014; Takeuchi, et al., 2012). Among these mechanisms, however, only CRISPR-Cas has the full ability to create immune memory, representing true adaptive immunity.
The complexity and diversity of the CRISPR-Cas system implies a complex evolutionary history.
CRISPR-Cas systems are divided into two classes (class 1, with multisubunit effector complexes, and class 2, with single-protein effector modules) and six types (types I, III and IV in class 1; types II, V and VI in class 2), of which types I and III are the most studied (Lino, et al., 2018; Makarova, et al., 2020; Ostria-Hernandez, et al., 2015). In the type I system, the CRISPR RNA (crRNA) complex recognizes the target DNA, which is then cleaved by Cas3. In the type III system, the Cas10 protein is assembled into a complex that recognizes and cuts targets. The genomic CRISPR locus consists of three parts: trans-activating CRISPR RNA (tracrRNA) genes, Cas genes, and the CRISPR repeats and spacers.
Some studies have identified type I-E and I-F CRISPR-Cas systems in many gram-negative Enterobacteriaceae, and others have found that Klebsiella pneumoniae carries plasmid-borne CRISPR-Cas systems (Medina-Aparicio, et al., 2018; Ostria-Hernandez, et al., 2015; Qu, et al., 2019). So far, there have been few studies of the type IV system, whose genes are located on mobile elements. The relationship between CRISPR-Cas and MGEs is very complex: some MGEs contribute to the origin and evolution of CRISPR-Cas and, conversely, the CRISPR-Cas system and its components are recruited by some MGEs (Faure, et al., 2019).
This study examined the diversity of CRISPR-Cas systems in Klebsiella strains and analyzed the relationship between MGEs and CRISPR-Cas, especially plasmid-borne CRISPR-Cas. Plasmid CRISPR-Cas is directed against other plasmids and may provide another level of incompatibility in plasmid communities. Both plasmid and chromosomal CRISPR-Cas are evidently important determinants of the epidemiology of large antibiotic resistance plasmids in Klebsiella.

Strain collection
We randomly chose 300 Klebsiella strains from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/). Each whole genome was downloaded and saved in FASTA format. The whole-genome sequences were uploaded to the CRISPR-Cas++ website (https://crisprcas.i2bc.paris-saclay.fr/CrisprCasFinder/Index) to obtain the CRISPR-Cas information for each strain (including the CRISPR loci, Cas genes, repeat sequences and spacers).

Identification and analysis of CRISPR
Typical CRISPR repeats were sorted and stored in FASTA format, and ClustalX was used for multiple sequence alignment. The confirmed CRISPR loci were divided into 7 categories according to their repeat sequences and named CRISPR1-7 (Makarova, et al., 2011). WebLogo (http://weblogo.berkeley.edu/logo.cgi) was used to visualize the identified CRISPR loci. These repeats are thought to be specific genetic markers for CRISPR. The secondary structures of single-stranded RNA or DNA sequences were predicted with the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi), whose current limits are 7,500 nt for partition function calculations and 10,000 nt for minimum free energy only predictions. MEGA 7.0 software was used to construct phylogenetic trees of the repeat sequences in the Klebsiella CRISPR-Cas system for evolutionary analysis.
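
As a rough illustration of what a sequence logo of aligned repeats conveys, the following minimal sketch computes the information content (in bits) of each alignment column. It assumes gap-free DNA sequences of equal length and omits the small-sample correction that logo tools usually apply. The function name `column_information` and the three toy repeats are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter


def column_information(aligned_repeats):
    """Return the information content (bits) of each column of aligned DNA repeats.

    Assumes gap-free sequences of equal length; no small-sample correction.
    """
    if len({len(seq) for seq in aligned_repeats}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned_repeats)
    info = []
    for column in zip(*aligned_repeats):
        counts = Counter(base.upper() for base in column)
        # Shannon entropy of the observed base frequencies in this column
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # A DNA column carries at most log2(4) = 2 bits
        info.append(math.log2(4) - entropy)
    return info


# Hypothetical toy repeats, for illustration only
toy_repeats = ["GTTCACTGCC", "GTTCACTGCC", "GTTCGCTGCA"]
for position, bits in enumerate(column_information(toy_repeats), start=1):
    print(position, round(bits, 2))
```

Fully conserved columns score 2 bits, and the score falls toward 0 as the bases in a column become more evenly mixed, which is why conserved repeat positions appear as tall letters in a logo.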

Spacer sequence analysis
To identify spacers that match mobile elements, the spacer sequences in the CRISPR loci were sorted and saved in FASTA format. The spacers were searched against GenBank with a standard BLASTN search; hits with an E-value < 10^-5 and no more than a 10% difference in sequence length were treated as homologous sequences and used to identify the mobile genetic element.
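
The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes the BLASTN results were exported in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), and it interprets the "10% difference in sequence length" as the alignment length differing from the spacer length by at most 10%. The function name `filter_hits` and the example hits are hypothetical.

```python
def filter_hits(tabular_lines, spacer_lengths, max_evalue=1e-5, max_length_diff=0.10):
    """Keep hits that pass the E-value cutoff and the length-difference cutoff.

    tabular_lines: lines of 12-column, tab-separated BLAST output (assumed layout).
    spacer_lengths: dict mapping each query (spacer) id to its length in nt.
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        align_len = int(fields[3])
        evalue = float(fields[10])
        spacer_len = spacer_lengths[query_id]
        # Assumed reading of the criterion: relative difference between
        # the spacer length and the aligned length
        length_diff = abs(spacer_len - align_len) / spacer_len
        if evalue < max_evalue and length_diff <= max_length_diff:
            kept.append((query_id, subject_id, align_len, evalue))
    return kept


# Hypothetical hits, for illustration only
example_lengths = {"sp1": 32, "sp2": 32}
example_hits = [
    "sp1\tplasmidA\t100.0\t32\t0\t0\t1\t32\t501\t532\t2e-10\t60.0",
    "sp1\tphageB\t100.0\t20\t0\t0\t1\t20\t10\t29\t1e-8\t38.0",
    "sp2\tIS5\t96.7\t30\t1\t0\t1\t30\t77\t106\t0.5\t20.0",
]
print(filter_hits(example_hits, example_lengths))
```

In this toy input only the first hit passes both cutoffs: the second aligns over too little of the spacer, and the third has too high an E-value.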

Phylogenetic tree of Cas1 gene in Klebsiella
Most Cas genes in Klebsiella belong to type I-E and type I-F. Cas gene loci were obtained from the CRISPR-Cas++ website (https://crisprcas.i2bc.paris-saclay.fr/), and the corresponding Cas1 gene sequences were extracted from the whole-genome sequences with BioEdit and stored as FASTA files. MEGA 7.0 was used to estimate nucleotide diversity and evolutionary distance and to construct phylogenetic trees by the neighbor-joining method with Jukes-Cantor distances.
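
For readers unfamiliar with the Jukes-Cantor distance, the sketch below shows the calculation for one pair of aligned sequences: with p the proportion of differing sites, d = -(3/4) ln(1 - 4p/3). It only illustrates the distance itself, not MEGA's implementation or the neighbor-joining step. The function name `jukes_cantor_distance` and the two toy sequences are hypothetical, and skipping any column that contains a gap or ambiguous base is an assumption.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore columns with gaps or ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical toy sequences differing at 1 of 10 sites (p = 0.1)
print(round(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"), 4))
```

The corrected distance is slightly larger than the raw proportion of differences because it allows for multiple substitutions at the same site.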

The distribution of mobile genetic elements and regulators in strains
The FASTA file of each whole-genome sequence obtained from NCBI was submitted to the RAST website for gene annotation, and the results were saved as tables. The NCBI gene annotations of each strain were used to count insertion sequences, transposons, integrons and DNA-related enzymes. After all the data were integrated, the statistical correlation between these data and CRISPR was analyzed by principal component analysis.
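
To make the principal component step concrete, here is a minimal, dependency-free sketch. It builds the correlation matrix of a strain-by-feature table and estimates the first principal component by power iteration. This is only an illustration under stated assumptions: the source does not say which software or settings were used for PCA, the feature names and counts below are hypothetical, and a real analysis would normally use a statistics package and inspect several components.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n = len(rows)
    scaled_columns = []
    for column in zip(*rows):
        mean = sum(column) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        if sd == 0:
            raise ValueError("constant column cannot be standardized")
        scaled_columns.append([(x - mean) / sd for x in column])
    return [list(row) for row in zip(*scaled_columns)]


def correlation_matrix(rows):
    """Correlation matrix of the columns of a numeric table."""
    z = standardize(rows)
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(k)]
            for i in range(k)]


def first_component(matrix, iterations=200):
    """Leading eigenvalue and unit eigenvector (loadings) by power iteration."""
    k = len(matrix)
    v = [float(i + 1) for i in range(k)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector lies in the null space")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(k) for j in range(k))
    return eigenvalue, v


# Hypothetical per-strain counts, for illustration only
features = ["crispr", "cas_genes", "IS5", "integron"]
table = [
    [1, 8, 3, 0],
    [1, 7, 2, 1],
    [0, 0, 0, 2],
    [0, 1, 1, 1],
    [1, 8, 2, 0],
    [0, 0, 1, 2],
]
eigenvalue, loadings = first_component(correlation_matrix(table))
for name, loading in zip(features, loadings):
    print(f"{name}: {loading:+.2f}")
print(f"variance explained by PC1: {eigenvalue / len(features):.1%}")
```

Features whose loadings on the first component have the same sign and a large magnitude as the CRISPR column vary together with CRISPR presence in the table, which is the kind of association the analysis looks for.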

Plasmid analysis
To analyze the characteristic structure of the CRISPR-Cas system, the plasmids containing CRISPR-Cas were analyzed further. Two CRISPR-Cas plasmid genome sequences were downloaded from the NCBI database and uploaded to RAST for gene annotation. Based on the RAST results, the sequences were then submitted to ISfinder, INTEGRALL, CRISPRTarget and similar tools to supplement the information on mobile elements and drug resistance genes and to draw the plasmid structure diagrams (Ge, et al., 2016). The correlation between plasmid CRISPR-Cas and mobile genetic elements was also analyzed by principal component analysis, following the steps described above (Qu, et al., 2019).

Geographical comparison of CRISPR alleles
A total of 702 Klebsiella strains were selected from all available strains in the NCBI database, 300 of which were then chosen at random. CRISPR loci were identified with CRISPR-Cas++, and each was required to contain at least two distinct spacers. The number of CRISPR loci varied from one to three, depending on the strain. Of the Klebsiella strains collected, only 95 had a CRISPR locus, and 12 had a CRISPR locus on their plasmids. Statistical analysis showed that the number of direct repeats in Klebsiella ranged from 3 to 66 and the number of spacers from 2 to 65.

The profiles of Klebsiella CRISPRs
The CRISPR loci were divided into 7 groups according to repeat sequence similarity. Multiple sequence alignment showed that the direct repeats in each locus were of similar length. CRISPR2, CRISPR3 and CRISPR6 were the most common confirmed loci across all strains, with 180, 147 and 60 repeats, respectively (Table 1).
The description of the CRISPR groups suggests that CRISPR2 and CRISPR3 have fewer mutations and higher frequencies (Fig 1). The analysis of base mutation diversity indicates that the CRISPR structure is relatively stable in the seven groups: the fewer base mutations in the CRISPR repeat sequence, the more stable the CRISPR and the more complete its evolution.
CRISPR repeats may form stable hairpin-like secondary structures (classical stem-loops), with each CRISPR repeat containing a large loop and a small loop at its ends. In all 7 CRISPR groups, the RNA secondary structure has two loops at the ends and a stem in the middle, which is 5-7 in length and highly conserved (Fig 2). The free energies of the thermodynamic ensemble are -11.60, -11.70, -15.20, -11.80, -12.70, -14.20 and -8.60 kcal/mol. Although the presence and number of CRISPRs are relatively constant, the number of repeats at each locus varies from strain to strain. The results show that CRISPR2 and CRISPR3 have the lowest minimum free energy (MFE), meaning they have the most stable RNA secondary structure.

The effect of spacer structure on CRISPR loci
In total, the Klebsiella strains contained 2549 spacers. Spacer length has been shown to affect the activity of the CRISPR locus. Our data show a negative correlation between the size of the repeat and that of the spacer block (Fig 3 h). According to the above results, the most stable groups are CRISPR1 and CRISPR3.
In terms of base matching with exogenous gene sequences, CRISPR1-7 had 51, 197, 111, 176, 100, 68 and 8 distinct spacer sequences, respectively, and 262, 1512, 1617, 1040, 751, 1197 and 73 matching exogenous sequences, respectively. Most of these foreign sequences come from insertion sequences (IS), transposons, plasmids and phages, which supports the proposed mechanism of spacer formation. Some spacers matched elements associated with the mobilization of antibiotic resistance genes (e.g., IS5, Tn3). Taken together, the current findings indicate that repeat size is negatively correlated with the size of the spacer block and that this alters the activity of the CRISPR locus, although further research is needed.
One important finding is that the Cas1 gene is present in almost every type, so it is reasonable to assume that Cas1 is ubiquitous in the CRISPR-Cas systems of Klebsiella. The Cas1 genes of different strains were compared by constructing a phylogenetic tree, in order to analyze further the role of Cas1 in Klebsiella evolution. Cas1 can thus be used to classify bacteria roughly among species according to nucleotide similarity. Compared with other Cas genes, Cas1 is more representative, because almost all bacteria with a CRISPR-Cas system contain Cas1.

The relationship between CRISPR and mobile genetic elements, regulators, and DNA-related enzymes
CRISPR-Cas systems are known to resist invasion by MGEs, such as plasmids and phages, which often carry antibiotic resistance genes (ARGs). The relationship between the CRISPR-Cas system and MGEs is complex and diverse: MGEs can promote high variation of the CRISPR loci in bacteria, and CRISPR can defend against MGE attacks.

The effect of CRISPR-Cas system on plasmids
To examine the structural characteristics of the CRISPR-Cas system, we analyzed a plasmid containing CRISPR (P15WZ-82_Vir) and a plasmid without CRISPR (pKpvST101_5) and compared the distribution of mobile genetic elements and regulatory factors on the two plasmids. The plasmid maps showed that both plasmids contained multiple replicons, as well as comparable numbers of Tra family genes, IS elements, transposons and integrons, indicating similar levels of diversity. Interestingly, the CRISPR locus on the P15WZ-82_Vir plasmid contained a DinG family helicase, the type I-E CRISPR-associated protein Cas6/Cse3/CasE, the CRISPR-associated protein Csf2 and IS5. These genes and mobile elements were not found in pKpvST101_5, the plasmid without a CRISPR locus. We therefore consider that these genes and elements may be important factors influencing the emergence or evolution of CRISPR on plasmids.

The effect of MGEs on CRISPR-Cas system of plasmids
Based on these results, we think that the DinG family helicase, Cas6/Cse3/CasE, Csf2 and IS5 are related to the emergence of CRISPR. To analyze comprehensively which MGEs and genes can affect the emergence and evolution of plasmid-borne CRISPR, we collected the complete sequences of 10 plasmids that contained CRISPR and 10 plasmids that did not, annotated them, and performed principal component analysis. The PCA results showed that the DinG family helicase, Cas6/Cse3/CasE, Csf2 and IS5 had relatively high coefficients with plasmid CRISPR (81.2%, 53.3%, 73.8% and 47.1%, respectively).

Discussion
In this paper, the distribution, types and spacer sequences of the CRISPR-Cas system in Klebsiella were studied. About one third of the collected strains contained a CRISPR-Cas system, and most of these systems belonged to type I-E. When the CRISPR loci were analyzed, nine strains were found to have CRISPR loci on plasmids. Statistical analysis of the factors influencing the CRISPR-Cas system showed that plasmids have a great influence on its stability.
The CRISPR-Cas system provides bacteria with adaptive immunity against plasmids and other MGEs. In Klebsiella, plasmid-specific spacers in the chromosomal CRISPR array can give the strain immunity to the plasmid. In a study of multidrug-resistant Klebsiella pneumoniae, Huang et al. proposed that drug resistance genes could be integrated from the plasmid into the chromosome by means of the CRISPR-Cas system (Huang, et al., 2017). Kamruzzaman and Iredell also concluded that acquiring new spacer sequences in the CRISPR-Cas array could induce degradation of the targeted plasmids in the host, prompting the transfer of plasmid-borne drug resistance genes to chromosomes or other related mobile genetic elements under antimicrobial selection pressure (Kamruzzaman, et al., 2019). Many strains carry multiple plasmids, and in the absence of selection for plasmid genes, acquiring new plasmids reduces the growth rate and competitiveness of the plasmid-carrying host, thus placing a burden on it [18]. Acquiring plasmid-mediated CRISPR spacers that target other plasmids and host chromosomes may facilitate the cooperative integration of plasmids with each other or into host chromosomes, thereby improving plasmid stability and compatibility (Kamruzzaman, et al., 2019).
The type IV CRISPR-Cas system is equivalent to a simplified version of the type I CRISPR-Cas system, with a similar genetic makeup. However, the Cas protein sequences of type IV systems are quite different from those of type I systems, so they are classified as different systems (Makarova, et al., 2015). [...]nition in the type IV system is unclear (Newire, et al., 2019). The type IV system has two variants (subtype IV-A and subtype IV-B), both of which contain highly diverged effector module genes for Cas5 (Csf3), Cas7 (Csf2) and a Cas8-like large subunit (Csf1), but subtype IV-A also encodes the DinG family helicase.
In all of the complete genomes where it has been characterized, the type IV CRISPR-Cas system is encoded by bacterial plasmids, bacteriophages, or other uncharacterized integrated elements (Faure, et al., 2019). In addition, some type IV CRISPR-Cas loci encode predicted enzymes of the ADP-ribosyltransferase (ART) family, which includes bacterial toxins. Together with the Cas proteins of the type IV system, these enzymes may help suppress the host CRISPR-Cas or other defense systems, ensuring the stability of plasmids and prophages (Shabbir, et al., 2016).
The type IV CRISPR-Cas system on plasmids lacks a target-cleaving enzyme (Cas3 or Cas10). Kamruzzaman and Iredell reported that type IV CRISPR-Cas-positive plasmids of Klebsiella pneumoniae were found only in bacteria carrying a chromosomal type I-E CRISPR-Cas system, which would compensate for the missing target cleavage function of the plasmid CRISPR and suggests possible crosstalk between the plasmid and chromosomal CRISPR systems (Kamruzzaman, et al., 2019). In this study, however, we found that some strains contained only the type IV CRISPR-Cas system, which may be the result of the continuous evolution of CRISPR under environmental pressure and the influence of some MGEs. MGEs have affected the CRISPR-Cas system on a number of independent occasions, including through loss of the interference ability. The actual effect of the derived CRISPR-Cas system on plasmids remains to be discovered.
The CRISPR-Cas system has been found not only on plasmids but also on other MGEs, including phages, Tn7 transposable elements and integrative and conjugative elements (ICEs) (Koonin, et al., 2020). Recruitment of CRISPR-Cas defense systems by different MGEs may have contributed to the evolution of both the MGEs and the defense systems. Some CRISPR adaptation modules (e.g., Cas1, Cas2 and Cas4) are thought to have evolved from different transposons. Transposons are a widespread class of MGEs that are propagated by recombinases, which insert the element into new locations in the host genome. Transposition involves DNA replication, DNA repair, and sometimes reverse transcription (Faure, et al., 2019). Most CRISPR-Cas systems carried by MGEs have retained only some of their original functions, and they were preserved during the evolution of MGEs by inhibiting the host defense to gain an advantage in the conflict with MGEs. There is a complex functional and evolutionary relationship between CRISPR-Cas and MGEs, including the similarity between CRISPR-Cas function and the various nuclease reactions in the life cycle of MGEs (Faure, et al., 2019). Much of the biology involved needs further exploration.

Conclusions
This study focuses on the CRISPR-Cas system in Klebsiella, exploring the factors that affect CRISPR and the relationship between CRISPR and mobile genetic elements. The analysis shows that CRISPR interferes with and protects against foreign mobile elements, while some genes and mobile genetic elements may in turn have a significant influence on the emergence and evolution of CRISPR. Klebsiella is prevalent worldwide, and exploring the types of CRISPR-Cas systems it carries is of great significance for future research on plasmid-mediated transmission of resistance in Klebsiella.

The secondary structure of the repeats of CRISPR1~CRISPR7. Secondary structure prediction of the most frequent sequence of the first and terminal repeats of each CRISPR was performed with RNAfold. The free energy of the thermodynamic ensemble was -11.60, -11.70, -15.20, -11.80, -12.70, -14.20 and -8.60 kcal/mol.

Figure 2
Seven groups of CRISPR spacer size variability. The relationship between the size of repeat and spacer among the seven groups: (a) Group 1 spacers; (b) Group 2 spacers; (c) Group 3 spacers; (d) Group 4 spacers; (e) Group 5 spacers; (f) Group 6 spacers; (g) Group 7 spacers; (h) The x-axis represents the size of the CRISPR spacers and the y-axis represents the number of CRISPR spacers. The sizes of repeat and spacer were inversely correlated.

Figure 3
The evolutionary tree of Cas1 of all strains. Cas1 was found in 57 strains. The Cas1 gene sequences were obtained by searching the complete genome sequences in GenBank. Strains on one branch are the most similar evolutionarily; the branches show how the sequences can be divided into groups, the percentage on each branch shows the sequence similarity, and the evolutionary distance scale of Cas1 is 0.10.