RFLP-kenzy: a new bioinformatics tool for in silico detection of key restriction enzyme in RFLP technique

Background Today, several bioinformatics tools are available for analyzing restriction fragment length data. RFLP-kenzy is a new bioinformatics tool for identifying key restriction enzymes, defined as enzymes that cut at least one and at most n-1 of the n input sequences. Results This tool helps researchers select enzymes that yield different RFLP patterns, especially for highly similar sequences that differ by a single-nucleotide mutation or other small variations. With RFLP-kenzy, multiple DNA sequences can be analyzed simultaneously, and a list of key enzymes is returned. The present paper also demonstrates the ability of RFLP-kenzy to identify key enzymes through the analysis of 16S rRNA sequences and complete genomes from various genera of microorganisms. Conclusion Several key enzymes were identified in each case, indicating the value of this new tool for selecting appropriate restriction enzymes.

2 Materials and methods

Script construction
The key enzyme tool algorithm was written in Python 3 (version 3.11) using the Visual Studio IDE. For the script, the file "enzyme_list_file.txt" was generated from the Bio.Restriction library. In general, the script processes eukaryotic, bacterial, or viral sequences in FASTA format, which are first aligned with the Clustal Omega package 1.2.2-win64 (an external tool that must be installed in the given path c:\clustal-omega) and saved in Clustal file format (Fig. 1). The second part checks the sequences for restriction sites. In this step, all restriction sites of the enzymes available in "enzyme_list_file.txt" are determined and stored for easy access in a list of dictionaries, recorded separately as "forward_site.txt" (Fig. 1). The aligned sequences are then read using the AlignIO module from the Biopython library. Each sequence is searched for restriction sites corresponding to the enzymes listed in the enzyme file, and the positions of the cut sites are identified. Results, including the sequence, the enzyme, and the cut site information, are provided. Finally, the script selects, stores, and displays the key enzymes in "out_file_final_forward.txt," with special consideration for enzymes that cut only one sequence or all sequences minus one.
Fig. 1 Algorithm representation of the process of RFLP-kenzy
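The site search and key enzyme selection described in this section can be illustrated with a minimal sketch. It is not the RFLP-kenzy script itself (which is provided in S5 and relies on Biopython and Clustal Omega): the enzyme table, the function names, and the toy sequences below are assumptions made for illustration. Three well-known recognition sites are hard-coded so that the sketch runs with the standard library only; alignment gaps are simply stripped before searching, and the reported positions are the 1-based starts of the recognition sites, not the exact cleavage positions.

```python
import re

# Hypothetical stand-in for "enzyme_list_file.txt": enzyme name -> recognition site.
# The real tool takes its enzymes from Bio.Restriction; these three palindromic
# sites are hard-coded only so the sketch runs without Biopython.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(sequence, site):
    """Return 1-based start positions of every (possibly overlapping) site match."""
    ungapped = sequence.replace("-", "").upper()  # drop alignment gaps
    return [m.start() + 1 for m in re.finditer(f"(?={re.escape(site)})", ungapped)]


def scan(sequences, enzymes=ENZYMES):
    """Map each enzyme to {sequence id: positions}, keeping only sequences it cuts."""
    hits = {}
    for name, site in enzymes.items():
        per_seq = {sid: find_sites(seq, site) for sid, seq in sequences.items()}
        hits[name] = {sid: pos for sid, pos in per_seq.items() if pos}
    return hits


def key_enzymes(hits, n_sequences):
    """Keep enzymes that cut at least 1 and at most n-1 of the n sequences."""
    return sorted(e for e, cut in hits.items() if 1 <= len(cut) <= n_sequences - 1)


# Toy aligned sequences (illustrative only, not taken from Table 1).
toy = {
    "seq_A": "TTGAATTCAAGGATCCTT",    # EcoRI and BamHI sites
    "seq_B": "TTGAGTTCAAGGATCCTT",    # a single mismatch removes the EcoRI site
    "seq_C": "TTGAGTTCAAGG--ATCCTT",  # gapped copy of seq_B
}
hits = scan(toy)
print(key_enzymes(hits, len(toy)))  # ['EcoRI']
```

In this toy case BamHI cuts all three sequences and HindIII cuts none, so neither can separate them; only EcoRI, which cuts a single sequence, is retained as a key enzyme.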

Microorganisms used for the script assay
To evaluate the performance and reliability of this virtual bioinformatic tool, validation studies were carried out using partial or complete genomes of diverse microorganisms collected from GenBank https:// www.1).

Analytical validation of RFLP-kenzy
For analytical validation of the script, sequences ranging from 1263 to 30,146 bp (Table 1) were analyzed with all restriction enzymes in the library. After analysis, the results are provided as a list of key enzymes.

RFLP-kenzy and output results description
RFLP-kenzy is a new bioinformatic tool developed to facilitate the selection of appropriate restriction enzymes that allow rapid distinction between sequences. With RFLP-kenzy, key enzymes that cut only one sequence or all sequences minus one are shown on an individual tab as _file_final_forward.txt. Once the analysis is complete, the results, including the analyzed sequence, the enzymes, and the cut site information, are also provided in an output file as previously described.
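The two special cases highlighted in the output, enzymes that cut only one sequence and enzymes that cut all sequences minus one, can be separated with a few lines of Python. The sketch below is an illustration under stated assumptions, not the code of RFLP-kenzy: the function name, the input layout (enzyme name mapped to the set of sequence identifiers it cuts), and the placeholder names EnzA to EnzD and s1 to s4 are all hypothetical.

```python
def classify_key_enzymes(hits, sequence_ids):
    """Group key enzymes by the single sequence they single out.

    hits: enzyme name -> set of sequence ids that the enzyme cuts.
    Returns (cuts_only, spares_only). Each maps a sequence id to the sorted
    enzymes that cut only that sequence, or that cut every sequence except it.
    """
    all_ids = set(sequence_ids)
    n = len(all_ids)
    cuts_only, spares_only = {}, {}
    if n < 2:  # nothing to discriminate with fewer than two sequences
        return cuts_only, spares_only
    for enzyme in sorted(hits):
        cut = set(hits[enzyme]) & all_ids
        if len(cut) == 1:  # cuts exactly one sequence
            (sid,) = cut
            cuts_only.setdefault(sid, []).append(enzyme)
        if len(cut) == n - 1:  # cuts all sequences minus one
            (sid,) = all_ids - cut
            spares_only.setdefault(sid, []).append(enzyme)
    return cuts_only, spares_only


# Placeholder enzymes and sequences (hypothetical, for illustration only).
ids = ["s1", "s2", "s3", "s4"]
hits = {
    "EnzA": {"s1"},                    # cuts only s1
    "EnzB": {"s1", "s2", "s3"},        # cuts all sequences except s4
    "EnzC": {"s1", "s2"},              # key enzyme, but singles out no sequence
    "EnzD": {"s1", "s2", "s3", "s4"},  # cuts everything: not a key enzyme
}
cuts_only, spares_only = classify_key_enzymes(hits, ids)
for sid, enzymes in cuts_only.items():
    print(f"cut only {sid}: {', '.join(enzymes)}")
for sid, enzymes in spares_only.items():
    print(f"cut all except {sid}: {', '.join(enzymes)}")
```

Both groups identify one sequence directly: in the first it is the only sequence that is digested, in the second it is the only one left uncut.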

Analytical validation of RFLP-kenzy
The initial validation of the RFLP-kenzy tool revealed different restriction key enzymes able to differentiate For this same case study, other key enzymes that cut more than one sequence and could be used to separate these species are also provided by the RFLP-kenzy tool (S1). For example, some key enzymes cut all sequences at specific sites except for one. In such cases, the species carrying this particular sequence can be quickly separated from the rest of the studied Escherichia group. This is the case for Escherichia blattae DSM 4481, which contains no restriction site for Sma325I, Hpy300XI, CstMI, SalI, or Pac19842II, and for Escherichia albertii LMG 20976T, which contains no restriction site for XmaJI, AvrII, AspA2I, or BlnI (S1). The results also showed that key enzymes such as Hca13221V, HbaII, Sbo46I, or Hca13221V could be used to differentiate Escherichia coli NBRC 102203, Escherichia hermannii strain CIP 103176, Escherichia fergusonii ATCC 35469, and Leclercia adecarboxylata CIP 82.92, respectively (S1).
In the initial validation of the RFLP-kenzy tool, the restriction patterns of some Salmonella typhimurium variants were also analyzed. Here too, the results reveal several key enzymes that cut just one sequence, such as Rtr1953I, Spe19205IV, Cko11077IV, MaqI, NgoAVII, and Bsp3004IV (S2). With these key enzymes, Salmonella typhimurium variant C85, variant C170, and variant C9 could each be rapidly discriminated from one another and from the rest of the Salmonella species. On the other hand, Salmonella typhimurium variant C170 or variant C11 could also be rapidly identified from the Sba460II, BtgZI, or Pcr308II restriction patterns, respectively, because these two variants do not contain any restriction sites for these enzymes (S2).
The validation of the RFLP-kenzy tool also included the analysis of 11 complete genomes of SARS-related Coronavirus. As shown in the supplementary file (S3), restriction endonucleases able to cut from one to 10 sequences are selected as key enzymes. In addition, other key enzymes able to cut from two to 10 sequences are also provided in the supplementary file (S3), demonstrating the usability of this new bioinformatic tool for selecting key enzymes that differentiate these closely related Coronavirus isolates.
In the last example, the restriction profiles of 7 complete Trichoderma mitochondrial genomes were analyzed for the validation of the RFLP-kenzy tool. The different key enzymes identified are listed in the supplementary file (S4). Among these endonucleases, numerous key enzymes cut only one sequence, as is the case for XmaIII, BstZI, BseX3I, RspPBTS2III, SstE37I, Sth20745III, EclXI, RdeGBI, Eco52I, EagI, GdiII, and UbaF13I, which make it easy to distinguish the Trichoderma atroviride ATCC 26799 strain from the other Trichoderma species. Enzymes such as Van91I, Eco31I, Pae10662III, AccB7I, Bso31I, FspAI, SacII, PflMI, KspI, Sfr303I, BsaI, Cfr42I, BspTNI, and SgrBI cleave only Trichoderma gamsii strain KUC1747, whereas HdeNY26I, RpaB5I, and AquII cleave only Trichoderma virens strain Gv29-8, allowing the specific differentiation of these two strains (S4). In addition, enzymes such as TaqII, Nbr128II, BseYI, PspFI, Bpu10I, and GsaI, which cut Trichoderma koningiopsis strain POS7, and SpoDI, which cuts Trichoderma simmonsii strain GH-Sj1, provide in silico restriction patterns that efficiently separate these two strains (S4). As mentioned earlier, the RFLP-kenzy tool also reveals restriction enzymes that cut more than one sequence, thereby allowing the separation of the uncut sequences (S4).

Comparison with other tools
To highlight the advantages of the proposed tool, RFLP-kenzy was compared with other existing virtual programs, including Pdraw 32, REDiges, NEBcutter, and CisSERS [7,8]. Unlike these tools, RFLP-kenzy allows the user to input multiple genes or complete genomes at the same time for comparative studies. In addition, RFLP-kenzy uses the entire list of restriction enzymes from the REBASE database for the analysis, which makes it possible to evaluate all restriction enzymes. Furthermore, RFLP-kenzy is not a web server-based program, which is an advantage for high-throughput analysis: web server tools depend on a fast internet connection and on server availability. Another advantage of RFLP-kenzy is that details of the cut site positions, the analyzed sequence, and the restriction enzymes are provided in addition to the key enzyme lists. With the RFLP-kenzy tool, any restriction enzyme can be added to enzyme_list_file.txt, allowing assessment of tagged endonucleases. In addition, the script runs locally in an IDE and could also run in a notebook environment such as Colab or Jupyter. Finally, RFLP-kenzy does not require the installation of the Java Virtual Machine (JVM), as is the case for many other tools.
The script of RFLP-kenzy is available for free in the Supplementary file (S5).

Discussion
RFLP analysis is a method that examines DNA sequence variations by comparing the patterns of DNA fragments generated through restriction enzyme cleavage. However, one of the main challenges of RFLP techniques is to identify appropriate restriction enzymes with specific recognition sites, in particular to differentiate between closely similar sequences with minor variations [9,10]. This is why several in silico tools have been developed for the virtual analysis of RFLP patterns, but each method has its own advantages and limitations, as described above [8]. In a previous study, we showed that very closely related species from the lactobacilli group, with more than 99% 16S rRNA gene sequence similarity, could be separated solely on the basis of their 16S rRNA RFLP patterns by using key enzymes that cut at least 1 sequence and at most n-1 sequences [11]. Building on that study, we propose in the present work a free virtual RFLP tool that allows the selection of all key enzymes that cut at least 1 sequence and at most n-1 sequences. This tool makes it possible to analyze the RFLP of multiple sequences simultaneously using all restriction endonucleases available in the Bio.Restriction library. After the analysis, all key restriction enzymes are listed in a text file.
Moreover, the results of the RFLP-kenzy validation demonstrated the effectiveness of this new tool in selecting appropriate restriction enzymes that allow rapid distinction between closely related isolates, such as Salmonella Typhi variants, which are usually reported as difficult to discern by using restriction enzymes [12]. In such cases, this tool could be very useful, for example, in short-term epidemiological surveillance of typhoid fever. The results also showed that RFLP-kenzy selects several restriction enzymes as key enzymes from the whole genomes of SARS Coronavirus strains. These enzymes could rapidly distinguish between new variants in this ribovirus group, which accumulates mutations without any correction system [13,14]. With the rapid emergence of Coronavirus variants, the development of simple and accurate tools is very important, in particular to detect and track mutations [15]. Furthermore, the analysis of the complete mitochondrial genomes of Trichoderma species by RFLP-kenzy also provides many key enzymes that could be very useful for the rapid identification and selection of species in this fungal group, which has biocontrol traits for preventing plant diseases [16].

Conclusion
In this work, we provide a free, simple, and new virtual restriction tool called RFLP-kenzy. Through the in silico analysis of the restriction patterns of four examples, we demonstrated that RFLP-kenzy is a useful and easy-to-use tool for identifying key enzymes that allow fast separation of partial or complete genomes of eukaryotic, prokaryotic, or viral organisms. With this new virtual tool, restriction enzyme digestion patterns are simulated and the appropriate enzymes are selected without traditional laboratory experiments, which helps researchers save time and resources. In future work, we would like to increase the capacity of the RFLP-kenzy tool to analyze large sequences of more than 20 Mbp.

Table 1
Species used for the analytical validation of RFLP-kenzy