n-cutting site of DNA splicing language for single string and palindromic rule

The Yusof-Goode (Y-G) splicing system, a new symbolization introduced by Yusof in 2012, is inspired by the splicing system that Head introduced in 1987 within the framework of Formal Language Theory. The Y-G splicing system is intended to present the biological process of DNA splicing in a transparent way. In this paper, starting with some relevant preliminaries, one theorem is proposed via the Y-G approach using one initial string and one rule, with different characteristics of the restriction enzyme. The theorem shows the behavior of the splicing languages generated at single stage splicing. Two cases are considered, both using a palindromic rule and a palindromic recognition site: the left and right contexts are the same in Case I and different in Case II. Two molecular examples are then discussed to validate the two cases, which shows the biological meaning of the theorem. From the proposed theorem, the type of the generated splicing language can be determined: the generated languages are found to be limit and transient languages.


Introduction
Recombinant DNA technology is used in animal and plant breeding and to produce antibiotics, hormones and other medically important agents [1]. Such laboratory experiments cost a large amount of money and time, with no guarantee of a promising result. Thus, with Formal Language Theory [2], a new strategy in the biomolecular field was founded. The introduction of Formal Language Theory has sparked interest among researchers in exploring the DNA splicing system with various kinds of mathematical modelling, for instance probability and automata [3][4][5]. By adopting a probabilistic approach, some properties of probabilistic semi-simple splicing systems were investigated, and it was proven that the generative power of the splicing languages increases [3]. Besides the probabilistic approach, automata have also been chosen by many researchers for solving DNA splicing problems. Fong et al. [4] studied the relation between automata and subgroups. By using modified automata, the conditions for the recognition of subgroups were established. Subsequently, in 2019, Fong et al. [5] applied automata diagrams to visualize the splicing languages generated from a DNA splicing system. The automata diagrams are presented as transition graphs that represent the languages of the respective DNA splicing systems. These studies show that the DNA splicing system can be better explored via mathematical modelling. In 2012, a modification of the rule notation of the splicing system initiated by Head, proposed by Yusof [6], encouraged more researchers to explore this field via the Yusof-Goode (Y-G) splicing system [7][8][9]. Most recent research has focused on the language generated from a splicing system, called the splicing language, namely the second order limit language [7], the single stage splicing language [8] and the two stages splicing language [9].
Besides employing the Y-G splicing system, the authors of these studies applied a variety of mathematical methods to prove and validate the proposed languages, namely automata, the limit adjacency matrix and the de Bruijn graph, respectively.
In 2004, splicing languages were classified into three types: limit, adult or inert, and transient languages [10]. Eight years later, Yusof redefined the inert language and renamed it the inert persistent language, and also introduced a new type of splicing language called the active persistent language [6]. In this paper, one theorem is presented, which employs the new definition introduced in 2015 [8]. The theorem is modelled via the Y-G splicing system and aims to investigate the effect of n-cutting on the generated languages. Two molecular examples are then given for the two cases covered by the theorem.
This paper comprises four segments. The first segment is the introduction, followed by the second segment, which states some important definitions used in this paper. The main results and discussion are presented in the third segment, with the proposed theorem and the molecular examples; the generalized splicing languages from Case I and Case II are obtained and discussed. Finally, the results are summarized in the final segment.

Methodology
This segment lists some fundamental definitions used in this paper. Since this paper is based on the Yusof-Goode approach, the definition of the Y-G splicing system and splicing language is introduced first.

Definition 1 [6]
A splicing system S = (A, I, R) consists of a set of alphabets A, a set of initial strings I in A* and a set of rules r ∈ R, where r = (u; x, v : y; x, z) for the left pattern, r = (u, x; v : y, x; z) for the right pattern, or r = (u, x, v : y, x, z) for both patterns of rules, applied on a DNA string. For s1 = αuxvβ and s2 = γyxzδ elements of I, splicing s1 and s2 using r produces the initial string I together with αuxzδ and γyxvβ, presented in either order, where α, β, γ, δ, u, x, v, y, z ∈ A*, the free monoid generated by A with the concatenation operation and 1 as the identity element. A language L is a splicing language if there exists a splicing system S for which L = L(S).

Then, the definition of the single stage splicing language is stated.

Definition 2 [11]
Let S be a Y-G splicing system. The single stage splicing language of S models the set of all molecule types which appear when all restriction enzymes, double stranded deoxyribonucleic acid (dsDNA) strings and ligases act simultaneously in a single buffer.
Next, the definition of a limit language is provided.

Definition 3 [10]
A limit language is the set of words that are predicted to appear if some amount of each initial molecule is present, and sufficient time has passed for the reaction to reach equilibrium state, regardless of the balance of the reactants in a particular experimental run of the reaction.
Lastly, the definition of a transient language is given as follows.

Definition 4 [6]
A transient splicing language is a set of strings that will ultimately be used up and disappear.

Results and Discussion
In this segment, one theorem is proposed to investigate the effect of n-cutting on the splicing language. The theorem, which is modelled via the Y-G splicing system, shows that different types of languages are generated. This is significant for the behavior of DNA splicing with one rule on a single string [8]. Theorem 1 predicts the single stage splicing languages for a palindromic rule and a palindromic crossing site. Both Case I and Case II lead to the same outcome, producing splicing languages in the same three sequence patterns. Hence, the theorem is proved.

Some molecular examples of single string with one palindromic rule in Y-G splicing system
A splicing model proposed in [6] has shown that a single string with one palindromic rule generates more than 3 infinitely long splicing languages in the three sequence patterns, in line with Theorem 1 above. The patterns produced in both cases are consistent with the theorem proposed by Lim, which states that, for one recognition site in an initial string with a palindromic crossing site, 3 patterns of splicing languages are generated [8]. However, from the theorem above, if an initial string contains more than one recognition site, the generated languages follow the same patterns but comprise more than 3 infinitely long molecules. This result shows the effect of n-cutting on the languages generated in a splicing system. To validate Case I and Case II respectively, two molecular examples are provided.

Example 1:
With the enzyme present, the initial string is cleaved at its cutting sites and can consequently split into its rotation. When splicing occurs at two cutting sites, the resulting molecules can religate, under the action of ligase, to form new molecules, which constitute the generated splicing languages. By induction, the number of generated splicing languages is greater than 3.

Table 1 lists the conceivable strings. The restriction enzyme PvuI, taken from the NEB catalogue, cleaves the DNA molecules; then, in the presence of ligase, the conceivable molecules are produced. For each of the three sequence patterns, the value of the repetition index (the number of copies of the middle segment) is substituted into the generalised splicing language obtained in the example. This shows that the splicing languages produced with 2 cutting sites in a splicing system number more than 3 and contain increasingly long molecules as the index increases.
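The growth of the molecules in an example of this kind can be illustrated with a simplified Python sketch. It rests on several assumptions that are not taken from the paper: PvuI is encoded as the rule (cg, at, cg : cg, at, cg), based on its recognition site CGATCG; the flanking segments "aa", "tt" and "gg" are hypothetical; only one strand is modelled, so recombinants that involve the reverse-complement reading of the molecule are not produced; and splicing is iterated over rounds purely to show how the middle segment accumulates.

```python
def find_sites(s, site):
    """Return every index at which `site` occurs in `s` (overlaps allowed)."""
    return [i for i in range(len(s) - len(site) + 1) if s.startswith(site, i)]


def splice_pair(s1, s2, u, x, v):
    """All recombinants of s1 and s2 under the rule (u, x, v : u, x, v)."""
    site = u + x + v
    cut = len(u + x)  # keep the prefix up to the end of the crossing x
    out = set()
    for i in find_sites(s1, site):
        for j in find_sites(s2, site):
            out.add(s1[:i + cut] + s2[j + cut:])
            out.add(s2[:j + cut] + s1[i + cut:])
    return out


def splice_rounds(initial, u, x, v, rounds):
    """Repeatedly splice every pair of molecules produced so far."""
    language = {initial}
    for _ in range(rounds):
        new = set(language)
        for a in language:
            for b in language:
                new |= splice_pair(a, b, u, x, v)
        language = new
    return language


# Hypothetical initial string with two PvuI sites and made-up flanks.
site = "cgatcg"
initial = "aa" + site + "tt" + site + "gg"
one_round = splice_rounds(initial, "cg", "at", "cg", rounds=1)
for molecule in sorted(one_round, key=len):
    print(molecule)
```

After one round the sketch yields the initial string, a molecule with the middle segment deleted and a molecule with the middle segment duplicated; further rounds add further copies of the middle segment, which is the behavior the generalised splicing language describes.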

Example 2:
Consider a Y-G splicing system with one initial string and one rule derived from the palindromic restriction enzyme CviQI, which has a left cutting pattern with a 5' overhang and a palindromic recognition site. With the restriction enzyme present, the string is cleaved at its cutting sites and can consequently split into its rotation. When splicing occurs at two cutting sites, the resulting molecules can religate, under the action of ligase, to form new molecules of the splicing languages.

Table 2 shows the conceivable strings. The 5' overhang restriction enzyme CviQI, taken from the NEB catalogue, cleaves the DNA molecules; then, in the presence of ligase, the conceivable fragments are considered. By substituting the value of the repetition index into each of the three sequence patterns, it is observed that the middle segment is duplicated according to the value of the index. This shows that the splicing languages produced with 2 cutting sites in a splicing system number more than 3 and contain increasingly long molecules as the index increases.
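The palindromic properties that distinguish the two cases can be checked with a short Python sketch. The encoding below is an assumption rather than the paper's notation: each enzyme is written as a hypothetical triple (left context, crossing, right context), with PvuI taken as (cg, at, cg) from its site CGATCG and CviQI as (g, ta, c) from its site GTAC, and a site is called palindromic when it equals its own reverse complement.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A DNA string is palindromic if it equals its reverse complement."""
    return seq == reverse_complement(seq)


def describe(rule):
    """Summarise a rule given as (left context, crossing, right context)."""
    u, x, v = rule
    return {
        "site": u + x + v,
        "site_palindromic": is_palindromic(u + x + v),
        "crossing_palindromic": is_palindromic(x),
        "same_context": u == v,
    }


# Assumed encodings of the two enzymes named in the examples.
PVUI = ("cg", "at", "cg")
CVIQI = ("g", "ta", "c")

for name, rule in (("PvuI", PVUI), ("CviQI", CVIQI)):
    print(name, describe(rule))
```

Under this encoding both enzymes have a palindromic site and a palindromic crossing, while only PvuI has identical left and right contexts, which is how the sketch separates a Case I style rule from a Case II style rule.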
According to a previous study [7], two approaches can be applied to generate a language while preserving the biological characteristics: laboratory experiments and splicing systems. In this segment, two molecular examples are given to show the existence of splicing languages of the limit and transient types. This is supported by the findings in [12], which suggest that the languages resulting from a splicing system, such as limit, adult or inert persistent, transient and active persistent languages, are proven to exist experimentally [6,12].

Conclusions
In this paper, one theorem for a single string with a palindromic rule and a palindromic crossing site is presented, where the number of cutting sites is more than one. Theorem 1 proves that, with either the same or different left and right contexts, the number of splicing languages in the three sequence patterns is more than 3. As the repetition index increases, the molecules become longer without bound, due to the duplication of the middle segment in the generalized splicing language. It can be concluded that an n-crossing site in a string produces more and longer languages, in the form of limit and transient languages.
The patterns generated in the theorem correlate with the theorem proposed by Lim in 2015 [8]. However, according to Goode, the three patterns that generate infinitely long molecules will vanish, resulting in an infinite set of transient languages, while the pattern in the form of the initial string remains as a limit language [10]. The conclusion on the single stage splicing language for a single string with a palindromic rule, for the two cases of palindromic crossing site, is summarized in Table 3, which summarises the cutting and pasting process on certain DNA molecules. When a molecule is cut at two cutting sites by a palindromic restriction enzyme with a palindromic crossing site, with either the same or different left and right contexts, more than 3 limit or transient languages are produced. This finding supports the aim of this research, which is to examine the effect of the number of cutting sites on the type of splicing languages. In this research, the chosen restriction enzymes are limited to those with palindromic characteristics. For future work, other characteristics of restriction enzymes can be considered. Additionally, graph theory could be applied to present the generated splicing languages.