DNA CYCLIC CODES OVER RINGS

Abstract. In this paper we construct new DNA cyclic codes over rings. First, we introduce a new family of DNA cyclic codes over the ring $R = \mathbb{F}_2[u]/(u^6)$. A direct link is established between the elements of this ring and the 64 codons that encode the amino acids of living organisms. Using this correspondence, we study the reverse-complement properties of our codes. We use the edit distance between codewords, which is an important combinatorial notion for DNA strands. Next, we define the Lee weight and the Gray map over the ring R, as well as the binary image of the DNA cyclic codes, which allows the study of DNA codes to be transferred to the study of binary codes. Second, we introduce another new family of DNA skew cyclic codes constructed over the ring $\tilde{R} = \mathbb{F}_2 + v\mathbb{F}_2 = \{0, 1, v, v+1\}$, where $v^2 = v$. The codes obtained are cyclic reverse-complement codes over the ring $\tilde{R}$. Further, we find their binary images and construct some explicit examples of such codes.


Introduction
DNA computing combines genetic data analysis with computational science in order to tackle computationally difficult problems. The field was initiated by Leonard Adleman [3], who solved a hard (NP-complete) computational problem using DNA molecules in a test tube. Deoxyribonucleic acid (DNA) contains the genetic program for the biological development of life. DNA is formed by two strands linked together and twisted into the shape of a double helix. Each strand is a sequence of four possible nucleotides: two purines, adenine (A) and guanine (G), and two pyrimidines, thymine (T) and cytosine (C). Hybridization, also known as base pairing, occurs when a strand binds to another strand, forming a double strand of DNA. The strands are linked following the Watson-Crick model: every A is linked with a T, and every C with a G, and vice versa. We denote by $\hat{X}$ the complement of X, defined by $\hat{A} = T$, $\hat{T} = A$, $\hat{G} = C$ and $\hat{C} = G$ (for instance, if x = (AGCTAC), then its complement is $\hat{x}$ = (TCGATG)). The pairing is done in the opposite direction, so a strand hybridizes with the reverse of its complement. Several authors have provided constructions of cyclic DNA codes over fixed rings. In [2,17], the authors gave DNA cyclic codes over the finite field with four elements. Further, Siap et al. [19] studied cyclic DNA codes over the ring $\mathbb{F}_2[u]/(u^2 - 1)$ using the deletion distance. More recently, Guenda et al. [10] studied cyclic DNA codes of arbitrary length over the ring $\mathbb{F}_2[u]/(u^4 - 1)$.
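The complement and the reverse-complement (the strand that actually hybridizes with a given strand) can be computed with a short Python sketch; the function names are ours and only illustrate the definitions above.

```python
# Watson-Crick pairing: A <-> T and G <-> C.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Replace every base by its Watson-Crick partner, keeping the order."""
    return "".join(WATSON_CRICK[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complement the strand, then read it in the opposite direction."""
    return complement(strand)[::-1]


print(complement("AGCTAC"))          # TCGATG
print(reverse_complement("AGCTAC"))  # GTAGCT
```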
In this paper, we consider the design of DNA codes of length n over the ring $R = \mathbb{F}_2[u]/(u^6)$. The ring R is a finite chain ring with 64 elements. With four possible bases, a triple of nucleotides can take $4^3 = 64$ different values, called codons. These codons specify the 20 different amino acids used in living organisms [4]. To exploit this, we construct a one-to-one correspondence $\varphi$ between the elements of R and the 64 codons over the alphabet $\{A, G, C, T\}^3$; such a correspondence is useful for error protection (see [11]). We give the structure of the cyclic reverse-complement DNA codes over the ring R with designed edit distance D, together with some upper and lower bounds on D.
Our codes have the properties most required for DNA computing: they are reverse-complement and also cyclic. Cyclicity is a very important property, since it can reduce the complexity of the dynamic programming algorithm used to test strands for unwanted secondary structures [14]. The edit distance is an important combinatorial notion for DNA strands: it can be used to correct insertion, deletion and substitution errors between codewords, which is not the case for the Hamming, deletion and additive stem distances. We also define a Lee weight and a Gray map over R. The images of our DNA codes under this Gray map are binary quasi-cyclic codes of index 6 and length 6n. There are several advantages in using codes over the ring R; we list some of them below:
1. There is a one-to-one correspondence between the codons and the elements of the ring R. This is presented in Section 2.
2. A code over R contains more codewords than codes of the same length over fields.
3. The factorization of $x^n - 1$ over R is the same as over the field $\mathbb{F}_2$, which in general is not the case over other rings. This fact simplifies the construction of cyclic codes over R.
4. The structure of cyclic codes of any length over R is well known [9], whereas little is known about the structure of cyclic codes of arbitrary length over other rings.
5. The cyclic character of DNA strands is desirable because the genetic code should represent an equilibrium status [18]. Another advantage of cyclic codes, as indicated by Milenkovic and Kashyap [14], is that the dynamic programming algorithm for testing DNA codes for secondary structure has lower complexity for cyclic codes.
6. The binary images of cyclic codes over R under our Gray map are linear quasi-cyclic codes.
In the second part of the paper, we study DNA skew cyclic codes over the ring $\tilde{R} = \mathbb{F}_2 + v\mathbb{F}_2$, where $v^2 = v$. The codes obtained are reverse-complement. Further, we give the binary images of the DNA skew cyclic codes and provide some examples. The advantage of studying reversible DNA codes in skew polynomial rings is that $x^n - 1$ admits several factorizations there. Therefore, many reverse-complement DNA codes can be obtained in a skew polynomial ring, which is not the case in a commutative ring.
This paper is organized as follows. In Section 2, we present some preliminary results as well as the one-to-one correspondence between the elements of the ring $R = \mathbb{F}_2[u]/(u^6)$ and the codons. Next, we give the algebraic structure of cyclic codes over R and study DNA cyclic codes and their reverse-complement properties. Moreover, we define the Lee weight related to such codes and give the binary image of the DNA cyclic codes. Some explicit examples of such codes are presented. In Section 3, we describe DNA skew cyclic codes over the ring $\tilde{R} = \mathbb{F}_2 + v\mathbb{F}_2$.

The ring $R = \mathbb{F}_2[u]/(u^6)$ is a commutative ring with 64 elements. It is a local principal ideal ring with maximal ideal $\langle u \rangle$. The ideals of R satisfy the following inclusions:
$$\{0\} = \langle u^6 \rangle \subset \langle u^5 \rangle \subset \langle u^4 \rangle \subset \langle u^3 \rangle \subset \langle u^2 \rangle \subset \langle u \rangle \subset \langle 1 \rangle = R.$$
Since R has cardinality 64, we can construct a one-to-one correspondence $\varphi$ between the elements of R and the 64 codons over the alphabet $\{A, G, C, T\}^3$; it is given in Table 1. Denoting by $\hat{x}$ the element of R whose codon is the complement of the codon of x, a simple verification shows that, for all $x \in R$,
$$x + \hat{x} = u^5 + u^4 + u^3 + u^2 + u + 1. \qquad (1)$$
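A minimal sketch of the arithmetic of R, under an encoding chosen here for illustration: the element $a_0 + a_1u + \cdots + a_5u^5$ is stored as the 6-bit integer whose bit i is $a_i$. Addition is then XOR and multiplication is a carry-less product truncated at $u^6$. The checks recover the chain of ideals $\langle u^i \rangle$, of sizes $2^{6-i}$, and confirm that the non-units are exactly the elements of the maximal ideal $\langle u \rangle$.

```python
# Elements of R = F_2[u]/(u^6) are stored as 6-bit integers:
# bit i holds the coefficient a_i of u^i in a_0 + a_1 u + ... + a_5 u^5.
N = 6
MASK = (1 << N) - 1


def mul(a: int, b: int) -> int:
    """Product in R: carry-less (F_2) polynomial product, truncated since u^6 = 0."""
    result = 0
    for i in range(N):
        if (b >> i) & 1:
            result ^= a << i
    return result & MASK


def ideal(g: int) -> frozenset:
    """The principal ideal <g> = {r g : r in R}."""
    return frozenset(mul(r, g) for r in range(1 << N))


U = 0b000010
u_powers = [1 << i for i in range(N)] + [0]      # u^0, ..., u^5 and u^6 = 0
chain = [ideal(p) for p in u_powers]

# |<u^i>| = 2^(6 - i), and each ideal contains the next one.
print([len(I) for I in chain])                           # [64, 32, 16, 8, 4, 2, 1]
print(all(chain[i + 1] <= chain[i] for i in range(N)))   # True

# R is local: the non-units are exactly the elements of the maximal ideal <u>.
units = {x for x in range(1 << N) if any(mul(x, y) == 1 for y in range(1 << N))}
print(len(units), set(range(1 << N)) - units == set(ideal(U)))  # 32 True
```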
Now, since $R^n$ is an R-module, a linear code over R of length n is an R-submodule C of $R^n$.

Table 1. Identifying codons with the elements of the ring R.

An (n, k) linear block code of length n = ml is called quasi-cyclic of index l if every cyclic shift of a codeword by l symbols yields another codeword. For $x \in R^n$, denote by $n_a(x)$ the number of components of x equal to a. The Hamming weight of x is $w_H(x) = \sum_{a \in R \setminus \{0\}} n_a(x)$, that is, the number of nonzero components of x. The Hamming distance $d_H(x, y)$ between vectors x and y is $w_H(x - y)$. Let $x = x_0 x_1 \ldots x_{n-1}$ be a vector in $R^n$. The reverse of x is $x^r = x_{n-1} x_{n-2} \ldots x_1 x_0$; the complement of x, also called the Watson-Crick complement (WCC), is $x^c = \hat{x}_0 \hat{x}_1 \ldots \hat{x}_{n-1}$; and the reverse-complement of x is $x^{rc} = \hat{x}_{n-1} \hat{x}_{n-2} \ldots \hat{x}_1 \hat{x}_0$. A code C is said to be reversible if $x^r \in C$ for every $x \in C$. Moreover, C is said to be reverse-complement if $x^{rc} \in C$ for every $x \in C$.
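These definitions translate directly into code. The sketch below stores an element of R as a 6-bit integer (bit i is the coefficient of $u^i$, an encoding chosen for illustration) and takes the complement of a ring element from Eq. (1), which in characteristic 2 reads $\hat{x} = x + (1 + u + \cdots + u^5)$, i.e. flipping all six coefficients. The function names and the toy code of constant vectors are illustrative only.

```python
# Elements of R = F_2[u]/(u^6) as 6-bit integers (bit i = coefficient of u^i).
ALL_ONES = 0b111111  # 1 + u + u^2 + u^3 + u^4 + u^5


def hat(x: int) -> int:
    """Complement in R, assuming Eq. (1): x + hat(x) = 1 + u + ... + u^5 (char 2)."""
    return x ^ ALL_ONES


def hamming_weight(v) -> int:
    return sum(1 for c in v if c != 0)


def hamming_distance(x, y) -> int:
    # x - y equals x + y in characteristic 2, i.e. a coefficient-wise XOR.
    return hamming_weight([a ^ b for a, b in zip(x, y)])


def reverse(v):
    return tuple(reversed(v))


def complement(v):
    return tuple(hat(c) for c in v)


def reverse_complement(v):
    return complement(reverse(v))


def is_reversible(code) -> bool:
    return all(reverse(c) in code for c in code)


def is_reverse_complement(code) -> bool:
    return all(reverse_complement(c) in code for c in code)


# Toy example: the constant vectors (a, a, a), a in R (the R-span of (1, 1, 1)).
C = {(a, a, a) for a in range(64)}
print(is_reversible(C), is_reverse_complement(C))  # True True
print(hamming_distance((1, 0, 3), (1, 2, 3)))      # 1
print(is_reverse_complement({(0, 1, 2)}))          # False
```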
The edit distance is the minimum number of operations (insertions, deletions and substitutions) required to transform one string into another. Formally, the edit distance can be defined as in [16].
Let A and B be finite sets of distinct symbols, and let $x^t \in A^t$ denote an arbitrary string of length t over A (similarly $y^v \in B^v$). Then $x_i^j$ denotes the substring of $x^t$ that begins at position i and ends at position j. The edit distance is characterized by a triple $\langle A, B, c \rangle$ consisting of the finite sets A and B and the primitive cost function $c : E \to \mathbb{R}_+$, where $\mathbb{R}_+$ is the set of nonnegative reals, $E = E_s \cup E_d \cup E_i$ is the set of primitive edit operations, $E_s = A \times B$ is the set of substitutions, $E_d = A \times \{\epsilon\}$ is the set of deletions and $E_i = \{\epsilon\} \times B$ is the set of insertions. Each triple $\langle A, B, c \rangle$ induces a distance function $d_c : A^* \times B^* \to \mathbb{R}_+$, which maps a pair of strings $(x^t, y^v)$ to a nonnegative value, defined as follows:
$$d_c(x^t, y^v) = \min\{\, c(x_t, y_v) + d_c(x^{t-1}, y^{v-1}),\; c(x_t, \epsilon) + d_c(x^{t-1}, y^v),\; c(\epsilon, y_v) + d_c(x^t, y^{v-1}) \,\},$$
where $d_c(\epsilon, \epsilon) = 0$ and $\epsilon$ denotes the empty string (of length 0).
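The distance $d_c$ can be computed by the usual dynamic programme over prefixes, in which the last symbols are either substituted, deleted or inserted. The sketch below assumes that recurrence, represents the empty string $\epsilon$ by `None` in the cost function, and uses unit (Levenshtein) costs as a default; any other nonnegative cost function c can be passed in. The second example shows why the edit distance suits DNA strands: a one-position shift makes all 8 positions differ (Hamming distance 8), while the edit distance is only 2.

```python
from typing import Callable, Optional, Sequence

# A cost function c(a, b); None stands for the empty string epsilon, so
# c(a, None) is a deletion, c(None, b) an insertion and c(a, b) a substitution.
Cost = Callable[[Optional[str], Optional[str]], float]


def unit_cost(a: Optional[str], b: Optional[str]) -> float:
    """Levenshtein costs: 0 for a match, 1 for any other primitive operation."""
    return 0.0 if a == b else 1.0


def edit_distance(x: Sequence[str], y: Sequence[str], c: Cost = unit_cost) -> float:
    """d_c(x, y) by dynamic programming over prefixes x^i and y^j."""
    t, v = len(x), len(y)
    d = [[0.0] * (v + 1) for _ in range(t + 1)]   # d[0][0] = d_c(eps, eps) = 0
    for i in range(1, t + 1):
        d[i][0] = d[i - 1][0] + c(x[i - 1], None)
    for j in range(1, v + 1):
        d[0][j] = d[0][j - 1] + c(None, y[j - 1])
    for i in range(1, t + 1):
        for j in range(1, v + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + c(x[i - 1], y[j - 1]),  # substitute x_i by y_j
                d[i - 1][j] + c(x[i - 1], None),          # delete x_i
                d[i][j - 1] + c(None, y[j - 1]),          # insert y_j
            )
    return d[t][v]


print(edit_distance("ACGT", "AGT"))            # 1.0
print(edit_distance("ACGTACGT", "CGTACGTA"))   # 2.0
```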
It is easy to check the following bounds on the edit distance d c .
Proposition 1. Assume that X and Y are two strings in $R^n$. Then the following holds:

2.2. Cyclic codes over $R = \mathbb{F}_2[u]/(u^6)$. In this subsection we give the algebraic structure of cyclic codes of arbitrary length over R. We start by giving the definition of a cyclic code over this ring. Let C be a code of length n over R. A codeword $(c_0, c_1, \ldots, c_{n-1})$ of C is viewed as the polynomial $c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$ in R[x]. Let τ be the cyclic shift acting on the codewords of C as follows:
$$\tau(c_0, c_1, \ldots, c_{n-1}) = (c_{n-1}, c_0, \ldots, c_{n-2}).$$
The code C is called cyclic if $\tau(C) = C$. The following theorem is a special case of results in [6,9] and gives the structure of cyclic codes of arbitrary length.
Theorem 2.2. Let C be a cyclic code of arbitrary length n over the ring R.
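(ii) Assume $n = m2^s$ with $\gcd(m, 2) = 1$. Then the cyclic codes of length n over R are the ideals generated by $f_0, uf_1, \ldots$

Let K denote the residue field $R/(u) \cong \mathbb{F}_2$. We have the canonical ring morphism $R \to K$, $a \mapsto \bar{a}$ (reduction modulo u), which extends coordinate-wise to $R^n$. The rank of C is defined as … . For $i = 0, \ldots, 5$, let $(C : u^i) = \{x \in R^n : u^i x \in C\}$. Thus we have the following tower of linear codes over R:
$$C = (C : u^0) \subseteq (C : u) \subseteq (C : u^2) \subseteq (C : u^3) \subseteq (C : u^4) \subseteq (C : u^5). \qquad (2)$$
For $i = 0, \ldots, 5$, the projections of $(C : u^i)$ onto the field K are denoted by $Tor_i(C) = \overline{(C : u^i)}$ and are called the torsion codes associated with the code C (see [15]).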
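The cyclic shift τ and the residue map $R \to K$ can be sketched as follows, storing $a_0 + a_1u + \cdots + a_5u^5$ as the 6-bit integer whose bit i is $a_i$ (an encoding chosen for this sketch). The checks confirm that τ agrees with multiplication by x modulo $x^n - 1$, and that reduction modulo u commutes with τ, which is one ingredient in seeing that the torsion codes of a cyclic code are again cyclic.

```python
# Elements of R = F_2[u]/(u^6) as 6-bit integers (bit i = coefficient of u^i).
U, U5 = 0b000010, 0b100000


def tau(c):
    """Cyclic shift (c_0, ..., c_{n-1}) -> (c_{n-1}, c_0, ..., c_{n-2})."""
    return (c[-1],) + tuple(c[:-1])


def times_x(c):
    """Coefficients of x * c(x) in R[x]/(x^n - 1), using x^n = 1."""
    n = len(c)
    out = [0] * n
    for i, ci in enumerate(c):
        out[(i + 1) % n] ^= ci
    return tuple(out)


def residue(c):
    """Coordinate-wise reduction modulo u, R -> K = R/(u) = F_2 (keeps a_0)."""
    return tuple(ci & 1 for ci in c)


c = (1 ^ U, U, U5, 0)        # (1 + u) + u x + u^5 x^2
print(tau(c), times_x(c))    # (0, 3, 2, 32) (0, 3, 2, 32)
print(residue(c))            # (1, 0, 0, 0)
print(residue(tau(c)) == tau(residue(c)))  # True
```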
The following theorem presents some bounds on the edit distance for the cyclic codes defined above.
Proof. … Assertion (ii) follows from the fact that $Tor_i(C)$ and $Tor_0(C)$ are binary cyclic codes satisfying $\langle f_i \rangle \subset C$. The dimension of $Tor_i(C)$ is $n - \deg(f_i)$. By the well-known Singleton bound, we have $d_c(C) \le \min_i\{\deg(f_i)\} + 1$. Assertion (iii) follows from Proposition 1, again using the Singleton bound.
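The proof uses two facts about a binary cyclic code generated by a divisor g(x) of $x^n - 1$: its dimension is $n - \deg g$, and the Singleton bound caps its minimum distance at $\deg g + 1$. The sketch below checks both on the binary cyclic [7, 4] Hamming code generated by $g(x) = 1 + x + x^3$, a textbook example chosen here for illustration (it is not one of the codes of this paper); the distance computed is the Hamming distance.

```python
from itertools import product


def mul_mod(a, b, n):
    """Product of binary polynomials (coefficient lists, lowest degree first) mod x^n - 1."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] ^= ai & bj
    return tuple(out)


def cyclic_code(g, n):
    """All codewords m(x) g(x) with deg m < n - deg g (g must divide x^n - 1)."""
    k = n - (len(g) - 1)
    return {mul_mod(m, g, n) for m in product((0, 1), repeat=k)}


n, g = 7, [1, 1, 0, 1]            # g(x) = 1 + x + x^3 divides x^7 - 1 over F_2
code = cyclic_code(g, n)
k = n - (len(g) - 1)              # dimension n - deg g = 4
d = min(sum(c) for c in code if any(c))
print(len(code), d, n - k + 1)    # 16 3 4: d = 3 <= deg g + 1 = 4
```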
Definition 2.4. Let $1 \le D \le 3n - 1$ be a positive real number. A cyclic code C of length n over R is called an [n, D] DNA cyclic code if the following conditions hold: …
Condition (ii) in Definition 2.4 shows that the DNA cyclic codes so defined are reverse-complement cyclic codes.
The following statement can be obtained straightforwardly.
… Then the following conditions hold: …
Theorem 2.7. Let C be a cyclic code of odd length n over R, and assume that C is reverse-complement. Then: (i) C contains all the codewords of the form $\alpha(u)I(x)$; …
Proof. … This implies that … Multiplying the last polynomial by $x^{m+1}$ in $R[x]/(x^n - 1)$, we obtain … By Equation (5), we see that $\hat{a} + \alpha(u) = a$. Therefore, we obtain … Multiplying both sides of this equality by $u^5$ gives … Since the two polynomials have the same degree, leading coefficient and constant term, we necessarily have $k_0(x) = 1$. Consequently, $f_0(x)$ is self-reciprocal. The same argument applies to $f_1$, $f_2$, $f_3$, $f_4$ and $f_5$.
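Self-reciprocity is easy to test mechanically. Assuming the usual commutative definition $f^*(x) = x^{\deg f} f(1/x)$ (the paper's own statement of the reciprocal is not reproduced in this section), $f^*$ is the coefficient list read backwards, and f is self-reciprocal exactly when that list is a palindrome. The sketch checks this on the binary factors of $x^7 - 1$; the helper names are ours.

```python
def trim(f):
    """Drop zero leading coefficients (lists store the lowest degree first)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def reciprocal(f):
    """f*(x) = x^deg(f) f(1/x): the coefficient list read backwards."""
    return trim(f)[::-1]


def is_self_reciprocal(f):
    return reciprocal(f) == trim(f)


def mul_f2(a, b):
    """Product of two binary polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out


# Factors of x^7 - 1 over F_2: (1 + x)(1 + x + x^3)(1 + x^2 + x^3).
print(is_self_reciprocal([1, 1]))            # True
print(reciprocal([1, 1, 0, 1]))              # [1, 0, 1, 1], i.e. 1 + x^2 + x^3
print(is_self_reciprocal([1, 1, 0, 1]))      # False
print(is_self_reciprocal(mul_f2([1, 1, 0, 1], [1, 0, 1, 1])))  # True
```

The last line illustrates that the product of a polynomial and its reciprocal is self-reciprocal.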
In the following, we are interested in providing sufficient conditions for a given code C to be reverse-complement.
Proof. Let c(x) be a codeword in C; we have to prove that $c(x)^{rc} \in C$. Since … Applying the reciprocal and using first Lemma 2.6, and then the fact that $f_0(x)$, … It was also assumed that $\alpha(u) + \alpha(u)x + \cdots + \alpha(u)x^{n-1} \in C$, which leads to … This is equal to …
Using arguments similar to those in Theorem 2.8, one can prove the following statement. … (ii) If $C = \langle f_0, uf_1, u^2f_2, u^3f_3, u^4f_4, u^5f_5 \rangle$ is a cyclic code of odd length n over R, then $C_{u^2} = \langle u^2 f_5 \rangle$ and $\varphi(C_{u^2})$ is a code over the alphabet $\varphi(u^2 R)$.
Remark 1. The DNA cyclic codes obtained in Lemma 2.11 are robust to errors in the DNA strands owing to the use of codons; see [5]. Every codeword of the subcode $\varphi(C_{u^2})$ over $\varphi(u^2 R)$ contains the nucleotides C and G. This is a desirable thermodynamic property of DNA strands; for its importance, we refer the reader to [10].
Example 2.12. Let us consider the following polynomial in … In Table 2, we present the associated DNA cyclic codes of length 7, together with their sizes and minimum Hamming distances. Table 3 and Table 4 present all codewords of the DNA cyclic codes associated with $C = \langle u^4 f_0 f_1 \rangle$ and $C = \langle f_1 f_2 \rangle$, respectively.

Table 2. DNA cyclic codes of length 7

Table 3. A DNA cyclic code associated with $C = \langle u^4 f_0 f_1 \rangle$ given in (4)
GGGGGGGGGGGGGGGGGGGGG  CCCCCCCCCCCCCCCCCCCCC
CTCGGGCTCCTCCTCGGGGGG  GAGCCCGAGGAGGAGCCCCCC
GGGCTCGGGCTCTGTTGTTGT  CCCGAGCCCGAGACAATAACA
TGTGGGCTCGGGCTCTGTTGT  ACACCCGAGCCCGAGACAACA
TGTTGTGGGCTCGGGCTCTGT  ACAACACCCGAGCCCGAGACA
TGTTGTTGTGGGCTCGGGCTC  ACAACAACACCCGAGCCCGAG
CTCTGTTGTTGTGGGCTCGGG  GAGACAACAACACCCGAGCCC
GGGCTCTGTTGTTGTGGGCTC  CCCGAGACAACAACACCCGAG
TATGGGTATTATTATGGGGGG  ATACCCATAATAATACCCCCC
GGGTATGGGTATTATTATGGG  CCCATACCCATAATAATACCC
GGGGGGTATGGGTATTATTAT  CCCCCCATACCCATAATAATA
TATGGGGGGTATGGGTATTAT  ATACCCCCCATACCCATAATA
TATTATGGGGGGTATGGGTAT  ATAATACCCCCCATACCCATA
TATTATTATGGGGGGTATGGG  ATAATAATACCCCCCATACCC
GGGTATTATTATGGGGGGTAT  CCCATAATAATACCCCCCATA
TGTGGGTGTTGTTGTGGGGGG  ACACCCACAACAACACCCCCC
GGGTGTGGGTGTTGTTGTGGG  CCCACACCCACAACAACACCC
GGGGGGTGTGGGTGTTGTTGT  CCCCCCACACCCACAACAACA
TGTGGGGGGTGTGGGTGTTGT  ACACCCCCCACACCCACAACA
TGTTGTGGGGGGTGTGGGTGT  ACAACACCCCCCACACCCACA
TGTTGTTGTGGGGGGTGTGGG  ACAACAACACCCCCCACACCC
GGGTGTTGTTGTGGGGGGTGT  CCCACAACAACACCCCCCACA
CTCGGGCTCTGTTGTTGTGGG  GAGCCCGAGACAACAACACCC
GGGCTCGGGCTCTGTTGTTGT  CCCGAGCCCGAGACAACAACA
TGTGGGCTCGGGCTCTGTTGT  ACACCCGAGCCCGAGACAACA
TGTTGTGGGCTCGGGCTCTGT  ACAACACCCGAGCCCGAGACA
TGTTGTTGTGGGCTCGGGCTC  ACAACAACACCCGAGCCCGAG
CTCTGTTGTTGTGGGCTCGGG  GAGACAACAACACCCGAGCCC
GGGCTCTGTTGTTGTGGGCTC  CCCGAGACAACAACACCCGAG
GGGGGGCTCGGGCTCCTCCTC  CCCCCCGAGCCCGAGGAGGAG
CTCGGGGGGCTCGGGCTCCTC  GAGCCCCCCGAGCCCGAGGAG
CTCCTCGGGGGGCTCGGGCTC  GAGGAGCCCCCCGAGCCCGAG

Example 2.13. We have that … .
In Table 5 we present the DNA cyclic codes associated with $C = \langle f_0, uf_1, u^2f_2, u^3f_3, u^4f_4, u^5f_5 \rangle$.

Table 5. DNA cyclic codes associated with …
GGAGGAGGAGGAGGAGGAGGA  CCTCCTCCTCCTCCTCCTCCT
GGCGGCGGCGGCGGCGGCGGC  CCGCCGCCGCCGCCGCCGCCG
GGTGGTGGTGGTGGTGGTGGT  CCACCACCACCACCACCACCA
AGGAGGAGGAGGAGGAGGAGG  TCCTCCTCCTCCTCCTCCTCC
AGAAGAAGAAGAAGAAGAAGA  TCTTCTTCTTCTTCTTCTTCT
AGCAGCAGCAGCAGCAGCAGC  TCGTCGTCGTCGTCGTCGTCG
AGTAGTAGTAGTAGTAGTAGT  TCATCATCATCATCATCATCA
CGGCGGCGGCGGCGGCGGCGG  GCCGCCGCCGCCGCCGCCGCC
CGACGACGACGACGACGACGA  GCTGCTGCTGCTGCTGCTGCT
CGCCGCCGCCGCCGCCGCCGC  GCGGCGGCGGCGGCGGCGGCG
CGTCGTCGTCGTCGTCGTCGT  GCAGCAGCAGCAGCAGCAGCA
TGGTGGTGGTGGTGGTGGTGG  ACCACCACCACCACCACCACC
TGATGATGATGATGATGATGA  ACTACTACTACTACTACTACT
TGCTGCTGCTGCTGCTGCTGC  ACGACGACGACGACGACGACG
TGTTGTTGTTGTTGTTGTTGT  ACAACAACAACAACAACAACA
GAGGAGGAGGAGGAGGAGGAG  CTCCTCCTCCTCCTCCTCCTC
GAAGAAGAAGAAGAAGAAGAA  CTTCTTCTTCTTCTTCTTCTT
GACGACGACGACGACGACGAC  CTGCTGCTGCTGCTGCTGCTG
GATGATGATGATGATGATGAT  CTACTACTACTACTACTACTA
AGGAGGAGGAGGAGGAGGAGG  TCCTCCTCCTCCTCCTCCTCC
AAAAAAAAAAAAAAAAAAAAA  TTTTTTTTTTTTTTTTTTTTT
AACAACAACAACAACAACAAC  TTGTTGTTGTTGTTGTTGTTG
AATAATAATAATAATAATAAT  TTATTATTATTATTATTATTA
CAGCAGCAGCAGCAGCAGCAG  GTCGTCGTCGTCGTCGTCGTC
CAACAACAACAACAACAACAA  GTTGTTGTTGTTGTTGTTGTT
CACCACCACCACCACCACCAC  GTGGTGGTGGTGGTGGTGGTG
CATCATCATCATCATCATCAT  GTAGTAGTAGTAGTAGTAGTA
TAGTAGTAGTAGTAGTAGTAG  ATCATCATCATCATCATCATC
TAATAATAATAATAATAATAA  ATTATTATTATTATTATTATT
TACTACTACTACTACTACTAC  ATGATGATGATGATGATGATG
TATTATTATTATTATTATTAT  ATAATAATAATAATAATAATA

2.4. Binary image of DNA codes. In this section we define a Gray map which allows us to translate the properties of DNA codes suitable for DNA computing to the binary case. Table 7 gives a binary image of the DNA cyclic codes of length 7 given in Table 2. Any element $c \in R$ can be expressed as $c = a_0 + a_1 u + a_2 u^2 + a_3 u^3 + a_4 u^4 + a_5 u^5$, where $a_i \in \mathbb{F}_2$, $0 \le i \le 5$.
The Gray map $\phi$ from R to $\mathbb{F}_2^6$ is defined by
$$\phi(a_0 + a_1 u + a_2 u^2 + a_3 u^3 + a_4 u^4 + a_5 u^5) = (a_0, a_1, a_2, a_3, a_4, a_5),$$
where $a_i \in \mathbb{F}_2$, $0 \le i \le 5$. For example, $\phi(1 + u) = (1, 1, 0, 0, 0, 0)$. We define the Lee weight over the ring R by $w_L(c) = w_H(\phi(c))$, extended additively to vectors of $R^n$. The Lee distance $d_L(x, y)$ between vectors x and y is $w_L(x - y)$. By the definition of the Gray map, it is easy to check that the image under $\phi$ of a linear code over R is a binary linear code. We can obtain the binary image of a DNA code by combining the maps $\phi$ and $\varphi$. In Table 6 we give the binary images of the codons. Binary images of DNA codes are useful for constructing DNA codes with prescribed properties; see [14]. The following property of the binary images of DNA codes follows from the definition.
… Further, if C is a DNA cyclic code of length n over R, then $\phi(C)$ is a binary DNA quasi-cyclic code of length 6n and index 6.
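Proof. Let C be a DNA cyclic code of length n over R. Then $\phi(C)$ is a set of binary vectors of length 6n. Since $\phi$ is applied coordinate-wise, a cyclic shift of a codeword of C corresponds to a cyclic shift of its image by 6 positions, so $\phi(C)$ is a quasi-cyclic code of index 6. It is easy to verify that the Gray map is linear and weight preserving.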
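The Gray map, the Lee weight and the quasi-cyclic structure of the image can be checked on small vectors. This sketch stores an element of R as a 6-bit integer (bit i is $a_i$, our encoding) and assumes that the Lee weight of a vector is the Hamming weight of its Gray image, which is the definition consistent with the statement that $\phi$ is weight preserving.

```python
# Elements of R = F_2[u]/(u^6) as 6-bit integers (bit i = coefficient a_i of u^i).
U, U5 = 0b000010, 0b100000


def gray(c: int):
    """phi(a_0 + a_1 u + ... + a_5 u^5) = (a_0, a_1, ..., a_5)."""
    return [(c >> i) & 1 for i in range(6)]


def gray_vector(v):
    """Apply phi coordinate-wise: R^n -> F_2^(6n)."""
    return [bit for c in v for bit in gray(c)]


def lee_weight(v) -> int:
    """Lee weight, taken here to be the Hamming weight of the Gray image."""
    return sum(gray_vector(v))


def tau(v):
    return (v[-1],) + tuple(v[:-1])


def rotate(bits, k):
    """Cyclic shift of a binary vector by k positions to the right."""
    return bits[-k:] + bits[:-k]


x = (1 ^ U, 0, U5)                         # (1 + u, 0, u^5)
y = (U, U5, 1)
print(gray(1 ^ U))                         # [1, 1, 0, 0, 0, 0]
print(lee_weight(x))                       # 3
# Linearity: phi(x + y) = phi(x) + phi(y), so Lee distance = Hamming distance of images.
print(gray_vector(tuple(a ^ b for a, b in zip(x, y)))
      == [p ^ q for p, q in zip(gray_vector(x), gray_vector(y))])   # True
# Quasi-cyclicity: shifting by one R-symbol shifts the image by 6 bits.
print(gray_vector(tau(x)) == rotate(gray_vector(x), 6))             # True
```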
Remark 2. The usual Gray map from the ring $\mathbb{F}_2 + u\mathbb{F}_2 = \{0, 1, u, u + 1\}$ to $\mathbb{F}_2^2$ has the same isometric properties.

Table 7. A binary image of the DNA cyclic codes of length 7 given in Table 2
The code C | Length of $\phi(C)$ | $d_H(\phi(C))$ | Size of $\phi(C)$
$\langle u^2 f_0 \rangle$ | 42 | 12 | 4096

Remark 3. The codes in rows 2 and 3 of Table 7 are optimal according to [20].
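There is a one-to-one map ψ between the elements of $\tilde{R}$ and the DNA nucleotide bases {A, T, C, G}, given by 0 → G, v → C, v + 1 → T and 1 → A. Let $\hat{x}$ denote the element of $\tilde{R}$ whose base is the complement of ψ(x), and let θ be the automorphism of $\tilde{R}$ defined by θ(v) = v + 1. A simple verification shows that, for all $x \in \tilde{R}$,
$$\theta(x) + \theta(\hat{x}) = v + 1. \qquad (5)$$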
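A sketch of the arithmetic of $\tilde{R}$ that checks the relation above, assuming θ is the automorphism of $\tilde{R}$ exchanging v and v + 1 (the nontrivial automorphism of this ring) and encoding a + bv as the 2-bit integer a + 2b (our choice). It confirms that $x + \hat{x} = v$ for every x, hence $\theta(x) + \theta(\hat{x}) = \theta(v) = v + 1$, and encodes the sequence ACTC used later in this section.

```python
# Elements a + b v of R~ = F_2 + v F_2 (v^2 = v) stored as a | (b << 1):
# 0 -> 0, 1 -> 1, v -> 2, v + 1 -> 3.
ZERO, ONE, V, V1 = 0, 1, 2, 3
ELEMENTS = range(4)


def parts(x):
    return x & 1, (x >> 1) & 1


def mul(x, y):
    """(a + b v)(c + d v) = ac + (ad + bc + bd) v, since v^2 = v."""
    (a, b), (c, d) = parts(x), parts(y)
    return (a & c) | (((a & d) ^ (b & c) ^ (b & d)) << 1)


def theta(x):
    """Assumed automorphism with theta(v) = v + 1: a + b v -> (a + b) + b v."""
    a, b = parts(x)
    return (a ^ b) | (b << 1)


PSI = {ZERO: "G", V: "C", V1: "T", ONE: "A"}      # the bijection psi
PSI_INV = {base: x for x, base in PSI.items()}
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def hat(x):
    """Complement in R~: the element whose base pairs with psi(x)."""
    return PSI_INV[WATSON_CRICK[PSI[x]]]


# theta respects addition (XOR) and multiplication, and v^2 = v.
assert mul(V, V) == V
assert all(theta(x ^ y) == theta(x) ^ theta(y) and theta(mul(x, y)) == mul(theta(x), theta(y))
           for x in ELEMENTS for y in ELEMENTS)
print([x ^ hat(x) for x in ELEMENTS])                 # [2, 2, 2, 2]: x + hat(x) = v
print([theta(x) ^ theta(hat(x)) for x in ELEMENTS])   # [3, 3, 3, 3]: Eq. (5)
print([PSI_INV[b] for b in "ACTC"])                   # [1, 2, 3, 2] = (1, v, v + 1, v)
```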
In the following, we only consider codes with even lengths.
The ring $\tilde{R}_n = \tilde{R}[x; \theta]/(x^n - 1)$ denotes the quotient ring of $\tilde{R}[x; \theta]$ by the (left) ideal $(x^n - 1)$. For $f(x) \in \tilde{R}_n$ and $r(x) \in \tilde{R}[x; \theta]$, we define the multiplication from the left as follows:
$$r(x) * (f(x) + (x^n - 1)) = r(x) * f(x) + (x^n - 1)$$
for any $r(x) \in \tilde{R}[x; \theta]$. Define the map $\xi : \tilde{R}^n \to \tilde{R}_n$ by $\xi(c_0, c_1, \ldots, c_{n-1}) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$. Clearly, ξ is an $\tilde{R}$-module isomorphism, so each element $(c_0, c_1, \ldots, c_{n-1})$ of $\tilde{R}^n$ can be identified with the polynomial $c(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_{n-1} x^{n-1}$ of $\tilde{R}_n$.
Lemma 3.2 ([1, Lemma 1]). If n is even and $x^n - 1 = g(x) * f(x)$ in $\tilde{R}[x; \theta]$, then we have: …
The following proposition gives the structure of skew cyclic codes over $\tilde{R}_n$.
Proposition 2 ([1, Corollary 5]). Let $\tilde{C}$ be a skew cyclic code in $\tilde{R}_n$. Then:
1. If a polynomial g(x) of least degree in $\tilde{C}$ is monic, then $\tilde{C} = (g(x))$, where g(x) is a (skew) right divisor of $x^n - 1$.
2. If $\tilde{C}$ contains some monic polynomials but no polynomial f(x) of least degree in $\tilde{C}$ is monic, then $\tilde{C} = (f(x), g(x))$, where g(x) is a polynomial of least degree among the monic polynomials in $\tilde{C}$ and $f(x) = vf_1(x)$ or $f(x) = (v + 1)f_1(x)$ for some binary polynomial $f_1(x)$.

3. If $\tilde{C}$ does not contain any monic polynomial, then $\tilde{C} = (vf_1(x))$ or $\tilde{C} = ((v + 1)f_1(x))$, where $f_1(x)$ is a binary polynomial that divides $x^n - 1$.
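Multiplication in $\tilde{R}[x; \theta]$ follows the usual skew rule $xa = \theta(a)x$, i.e. $(ax^i)(bx^j) = a\,\theta^i(b)\,x^{i+j}$. The sketch below implements it modulo $x^n - 1$ for even n (so that $\theta^n$ is the identity and $x^n - 1$ is central), encoding a + bv as a + 2b and assuming θ(v) = v + 1. Left multiplication by x then acts as the skew cyclic shift $(c_0, \ldots, c_{n-1}) \mapsto (\theta(c_{n-1}), \theta(c_0), \ldots, \theta(c_{n-2}))$, and the last line shows that v and x do not commute.

```python
# R~ = F_2 + v F_2 encoded as 0, 1, v -> 2, v + 1 -> 3 (a + b v -> a | b << 1).
V, V1 = 2, 3


def mul(x, y):
    """(a + b v)(c + d v) = ac + (ad + bc + bd) v, since v^2 = v."""
    a, b, c, d = x & 1, x >> 1, y & 1, y >> 1
    return (a & c) | (((a & d) ^ (b & c) ^ (b & d)) << 1)


def theta(x):
    """Assumed automorphism theta(v) = v + 1, i.e. a + b v -> (a + b) + b v."""
    a, b = x & 1, x >> 1
    return (a ^ b) | (b << 1)


def theta_pow(x, i):
    return theta(x) if i % 2 else x   # theta has order 2


def skew_mul(f, g, n):
    """f * g in R~[x; theta]/(x^n - 1), using (a x^i)(b x^j) = a theta^i(b) x^(i+j).
    For even n, theta^n = id, so x^n - 1 is central and x^n can be reduced to 1."""
    out = [0] * n
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[(i + j) % n] ^= mul(a, theta_pow(b, i))
    return out


n = 4
x = [0, 1, 0, 0]
c = [V, 1, 0, V1]                          # v + x + (v + 1) x^3
print(skew_mul(x, c, n))                   # [2, 3, 1, 0] = (theta(c3), theta(c0), theta(c1), theta(c2))
print(skew_mul(x, [V], n), skew_mul([V], x, n))  # [0, 3, 0, 0] [0, 2, 0, 0]: x v != v x
```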

3.2. The reverse-complement DNA skew cyclic codes over $\tilde{R}$. In this subsection, we give conditions for the existence of reverse-complement skew cyclic codes of even length n over the ring $\tilde{R}$. In Table 8 we present all codewords of the DNA skew cyclic code of length 10 with minimum Hamming distance 2.
The polynomial f represents the DNA sequence X = (ACTC). We obtain the reverse of the sequence X via $f^*(x)$ … The reverse of the DNA sequence X is (CTCA).
Notice that the definition of the reciprocal polynomial over $\tilde{R}[x; \theta]$ differs from the one over a commutative ring. Indeed, in the non-commutative ring $\tilde{R}[x; \theta]$, we use the right multiplication with respect to the automorphism θ together with the multiplication of $\tilde{R}[x; \theta]$.
Lemma 3.6. Let f(x) and g(x) be polynomials in $\tilde{R}[x; \theta]$ with $\deg(f(x)) \ge \deg(g(x))$. Then the following assertions hold: …
Proof. Let us prove Assertion (i).
The result follows. Now let us prove Assertion (ii). From Definition 3.2, we have … , which completes the proof.
Theorem 3.9. Let $\tilde{C} = (f(x), g(x))$ be a skew cyclic code in $\tilde{R}_n$, where f(x) is a polynomial of minimal degree in $\tilde{C}$ and is not monic, and g(x) is a polynomial of least degree among the monic polynomials in $\tilde{C}$. If $\tilde{C}$ is a reverse-complement code, then f(x) and g(x) are self-reciprocal.
Proof. The proof is similar to the proofs of Theorem 3.7 and Theorem 3.8.
Using arguments similar to those in the proof of Theorem 3.10, we can prove the following two statements.
Theorem 3.11. Let $\tilde{C} = (vf_1(x))$ be a skew cyclic code in $\tilde{R}_n$, where $f_1(x)$ is a monic binary polynomial of lowest degree with $f_1(x) \mid (x^n - 1)$. If $v(x^n - 1)/(x - 1) \in \tilde{C}$ and $f_1(x)$ is self-reciprocal, then $\tilde{C}$ is reverse-complement.
Theorem 3.12. Let $\tilde{C} = (f(x), g(x))$ be a skew cyclic code in $\tilde{R}_n$, where f(x) is a polynomial of minimal degree in $\tilde{C}$ and is not monic, and let g(x) be a polynomial of least degree among the monic polynomials in $\tilde{C}$. If $v(x^n - 1)/(x - 1) \in \tilde{C}$ and f(x) and g(x) are self-reciprocal, then $\tilde{C}$ is reverse-complement.
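Theorem 3.11 can be checked by brute force on a small case. The sketch below takes n = 4 and $f_1(x) = 1 + x$, which is self-reciprocal and divides $x^4 - 1$, builds the left ideal $(vf_1(x)) = \{r(x) * vf_1(x)\}$ of $\tilde{R}_4$, and verifies that it contains $v(x^4 - 1)/(x - 1) = v(1 + x + x^2 + x^3)$ and is closed under reverse-complement. It encodes a + bv as a + 2b and assumes θ(v) = v + 1; this illustrates one instance and is not a proof.

```python
from itertools import product

# R~ = F_2 + v F_2 encoded as 0, 1, v -> 2, v + 1 -> 3 (a + b v -> a | b << 1).
V = 2


def mul(x, y):
    """(a + b v)(c + d v) = ac + (ad + bc + bd) v, since v^2 = v."""
    a, b, c, d = x & 1, x >> 1, y & 1, y >> 1
    return (a & c) | (((a & d) ^ (b & c) ^ (b & d)) << 1)


def theta(x):
    """Assumed automorphism theta(v) = v + 1."""
    a, b = x & 1, x >> 1
    return (a ^ b) | (b << 1)


def skew_mul(f, g, n):
    """Left multiplication f * g in R~[x; theta]/(x^n - 1), n even."""
    out = [0] * n
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[(i + j) % n] ^= mul(a, theta(b) if i % 2 else b)
    return tuple(out)


# Complement from psi (0 -> G, v -> C, v + 1 -> T, 1 -> A): G<->C, A<->T.
HAT = {0: V, V: 0, 1: 3, 3: 1}


def reverse_complement(c):
    return tuple(HAT[ci] for ci in reversed(c))


n = 4
g = [V, V, 0, 0]                    # v f_1(x) with f_1(x) = 1 + x
code = {skew_mul(r, g, n) for r in product(range(4), repeat=n)}

print(len(code))                                         # 16
print((V,) * n in code)                                  # True: v (x^4 - 1)/(x - 1) is in the code
print(all(reverse_complement(c) in code for c in code))  # True
```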
We can obtain the binary image of a DNA code from the maps $\phi$ and ψ, together with the map from the DNA alphabet onto the set of binary words of length 2 given by G → (0, 0), A → (1, 1), T → (0, 1), C → (1, 0). We have the following property of the binary images of DNA skew cyclic codes.
Corollary 2. The map $\tilde{R}^n \to \mathbb{F}_2^{2n}$ is a distance-preserving linear isometry; hence, if $\tilde{C}$ is a DNA skew cyclic code over $\tilde{R}$, then $\phi(\tilde{C})$ is a DNA skew quasi-cyclic code of length 2n and index 2.
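Composing ψ with the 2-bit labels G → (0, 0), A → (1, 1), T → (0, 1), C → (1, 0) gives a map $\tilde{R} \to \mathbb{F}_2^2$. The sketch checks that this map is additive, which is what makes its coordinate-wise extension $\tilde{R}^n \to \mathbb{F}_2^{2n}$ linear and distance preserving once the weight of $x \in \tilde{R}$ is taken to be the Hamming weight of its label (an assumption, since the weight on $\tilde{R}$ is not stated in this section). Encoding a + bv as a + 2b is our choice.

```python
# R~ = F_2 + v F_2 encoded as 0, 1, v -> 2, v + 1 -> 3 (a + b v -> a | b << 1).
PSI = {0: "G", 2: "C", 3: "T", 1: "A"}
BITS = {"G": (0, 0), "A": (1, 1), "T": (0, 1), "C": (1, 0)}


def to_bits(x):
    """Binary label of x in R~: x -> psi(x) -> 2-bit word."""
    return BITS[PSI[x]]


def binary_image(word):
    """Concatenate the 2-bit labels: R~^n -> F_2^(2n)."""
    return tuple(bit for x in word for bit in to_bits(x))


# The composite map is additive: label(x + y) = label(x) XOR label(y).
additive = all(
    to_bits(x ^ y) == tuple(p ^ q for p, q in zip(to_bits(x), to_bits(y)))
    for x in range(4) for y in range(4)
)
print(additive)                                     # True
print(binary_image((1, 2, 3, 2)))                   # ACTC -> (1, 1, 1, 0, 0, 1, 1, 0)
print(sorted(sum(to_bits(x)) for x in range(4)))    # [0, 1, 1, 2]: weights of 0, v, v + 1, 1
```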