Secure DNA data compression using algebraic curves

A system that achieves compression using artificial DNA packaging with the support of two algebraic curves is presented, whereby the Hermitian channel code algorithm provides coding gain and security. Additionally, performance results are presented with a gain of 7 dB against uncoded quadrature phase shift keying and 1 dB against McEliece, for a bit error rate of $10^{-3}$. The results of the security levels compared with the McEliece system are also presented.

Introduction: In current wireless communication systems, the large amount of information transmitted makes it necessary to optimise the bandwidth and increase the data rate through message compression and channel coding. However, high-speed transmission in wireless communication systems is problematic because such systems experience a high degree of interference [1]. Therefore, methods and technologies need to be more efficient and stronger in terms of security to be resilient to attacks.
In [2], Vardy showed that one of the strongest one-way functions, in terms of intractability, is obtained from the distances of codes constructed over a field, as used in the McEliece cryptosystem [3]. In [4], Atito et al. introduced artificial DNA packing with a chaotic map to apply steganography and/or encryption to information, with the aim of improving the security level, privacy and authentication. However, synchronisation can be lost by the use of different substrates with different computer clocks [4].
For that reason, the use of chaotic maps over Galois fields (GFs) with DNA packing has been proposed [5], generating artificial DNA sequences with a chaotic map over GFs to hide information. The generation of a very large carrier sequence ($S_c$) still persists, however, and the resulting loss of spectral efficiency is underestimated [5]. Moreover, as promoters ($P$) and terminators ($T$) can be unique, they can be found in the DNA strand by a basic side-channel attack, which analyses the power pattern [6]. Finally, the threads containing chaotic sequences between these markers can, in theory, eventually be broken by techniques that identify the Lyapunov exponent and other parameters of the chaotic map [7].
This Letter presents a communication system that increases the transmitted data rate over a wide bandwidth with strong encryption. It employs an artificial DNA algorithm in which the information is compressed by a hyperelliptic curve, plus a security mechanism based on a Hermitian curve that provides encryption, robustness and efficiency. It should be noted that in [8, 9] the Hermitian codes were used only for channel coding or data storage. In [10], a cryptographic system is constructed using a combination of a hyperelliptic and a Reed-Solomon code; in [11], those results were extended to an LDPC (low-density parity check) code. In contrast, in this Letter the Hermitian code is used for the $S_c$ channel encoding and security at the same time.
Fig. 1 System description of artificial DNA over GF($2^p$) and GF($(2^q)^2$)
System description: Fig. 1 shows the system description of an artificial DNA communications system over GF($2^p$) and GF($(2^q)^2$), where Tx is the transmitter side and Rx is the receiver side. It is assumed that the key $(G, Z, t)$ is public, where $Z$ is a random error vector and $t$ is the number of correctable errors. Moreover, $T$ and $P$ were negotiated in advance, or can be modified after the private key $(S, G, P)$ is computed. The transmitted data is a binary data sequence represented by $d = d_{\delta_0-1}, \ldots, d_1 d_0$. The language symbols are encoded by the DNA codes in Table 1, whereby each DNA base is represented by a pair of bits or, in polynomial notation, adenine (A) = 0, thymine (T) = 1, cytosine (C) = $\alpha$ and guanine (G) = $\alpha^2$. The sender first generates a random DNA carrier $S_c = s_{\delta_1-1}, \ldots, s_1 s_0$ with an arbitrary length $\delta_1 = \rho_1 \times p$, with $\rho_1 > \rho_0$, $\rho_1 \in \mathbb{N}$. For example, the short word 'for' takes 24 bits in ASCII code, but by Table 1 it is reduced to 18 bits. Assume that the carrier $S_c$ has $\delta_1 = 5 \times 5 = 25$ bits, 18 bits of which are data $d$, i.e. $\delta_0 = 18$, and the remaining 7 bits are $T_1$ and $P_1$; longer word sequences are concatenated in the same way, with $T_1$ and $P_1$ and so on.
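The bit counts in the 'for' example can be checked with a short sketch. Table 1 is not reproduced here, so the figure of 6 bits (three bases) per character is an inference from the 18 bits quoted for a three-letter word, and the pairing of the bit pattern 11 with guanine assumes the usual polynomial basis of GF($2^2$), in which $\alpha^2 = \alpha + 1$. The function names are hypothetical.

```python
# Illustrative sketch only; names and the 6-bits-per-character figure are
# assumptions, since Table 1 is not available in this text.

ASCII_BITS_PER_CHAR = 8
DNA_BITS_PER_CHAR = 6  # assumed: 18 bits / 3 characters in the 'for' example

# Bit pairs in GF(4) polynomial notation: 0, 1, alpha, alpha^2 = alpha + 1.
BASE_OF_PAIR = {"00": "A", "01": "T", "10": "C", "11": "G"}


def bit_budget(word, p, rho1):
    """Return the bit counts for sending `word` in a carrier of rho1 * p bits."""
    data_bits = len(word) * DNA_BITS_PER_CHAR      # delta_0
    carrier_bits = rho1 * p                        # delta_1
    return {
        "ascii_bits": len(word) * ASCII_BITS_PER_CHAR,
        "data_bits": data_bits,
        "carrier_bits": carrier_bits,
        "marker_bits": carrier_bits - data_bits,   # left for T and P
    }


def pairs_to_bases(bits):
    """Map a bit string of even length to DNA bases, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    return "".join(BASE_OF_PAIR[bits[i:i + 2]] for i in range(0, len(bits), 2))


budget = bit_budget("for", p=5, rho1=5)
bases = pairs_to_bases("0010100101")
```

With $p = 5$ and $\rho_1 = 5$ the budget reproduces the 24, 18, 25 and 7 bits given in the text.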
Simulation results: Fig. 2 shows the artificial DNA packing performance against McEliece and uncoded QPSK over the Rayleigh fading channel. The code used is (512, 314, 17) from [8]. The concatenation of an artificial DNA with a hyperelliptic and a Hermitian curve, with $p = 10$ and $q = 3$, generates a gain over a Rayleigh fading channel of 7 dB compared with the uncoded QPSK system, with GF($(2^q)^2$) and $q = 3$, for a BER of $1 \times 10^{-3}$. This shows that for each $\delta_0$ bits at the input, the system saves $\beta(\delta_0 - \delta_2)$ bits in the communication channel, with $\beta > 0$. Hence only $\delta_2$ bits are transmitted to recover $d = d_{\delta_0-1}, \ldots, d_1 d_0$.
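To give a feel for the size of these gains, the sketch below converts a gain in dB to a linear ratio. It assumes the gains are power (SNR) ratios, which is the usual convention for BER curves but is not stated explicitly in the text. The function name is hypothetical.

```python
def db_to_power_ratio(gain_db):
    """Convert a gain in decibels to a linear power ratio."""
    return 10 ** (gain_db / 10)


# 7 dB against uncoded QPSK and 1 dB against McEliece (abstract figures).
ratio_vs_qpsk = db_to_power_ratio(7.0)
ratio_vs_mceliece = db_to_power_ratio(1.0)
```

Under that assumption, 7 dB corresponds to roughly a fivefold reduction in the required signal-to-noise ratio at the same BER, and 1 dB to a reduction of about 26%.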

Conclusion: In this Letter, we have presented an algorithm that combines artificial DNA packing with two algebraic curves and can be used in future wireless communication systems. Moreover, for a BER of $1 \times 10^{-3}$, the concatenated code in the proposed DNA communication system achieves a gain of about 1 dB over the McEliece communication system with GF($(2^q)^2$) and $q = 3$, since $S_c$ has no gaps, which means that more information is transmitted in a shorter time and with better spectral efficiency. The security of the new scheme is NP (non-deterministic polynomial time) complete: it is based on the minimum distance problem of the combination of two algebraic curves.

For HyC, a reduced divisor is $D_{165} = (\alpha^0 u^2 + \alpha^{17} u + \alpha^{30},\ \alpha^{23} u + \alpha^{12})$, which has the cursor number $\Psi(D_{165}) = 165$. All of the coefficients of the polynomial representation $a(u) = \alpha^0 u^2 + \alpha^{17} u + \alpha^{30}$ and $b(u) = \alpha^{23} u + \alpha^{12}$ belong to the Galois field GF($2^5$). Then, the carrier array $S_c$ = 00001 10011 10010 01111 01110 is embedded in the coefficients of the polynomial representation for $D_{165}$, with $\delta_1 = 25$ bits and $\delta_0 = 18$ bits. From left to right, the first block is the coefficient of $u^2$, the second block is the coefficient of $u$, and so on. The cursor number for $D_{165}$ is represented as a binary number with $k = \lceil \log_2 K \rceil = \lceil 10.94 \rceil = 11$ bits. In this case, a binary array with $k = 10$ bits is used for the cursor number. Therefore, 165 = 00 10 10 01 01, which represents the message $V = (0\ \alpha\ \alpha\ 1\ 1)$ in the field GF($2^2$).
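The embedding in this example can be reproduced numerically. The sketch below builds the power table of GF($2^5$) and maps the five coefficient exponents to 5-bit blocks, then splits the 10-bit cursor number into GF($2^2$) symbols. The primitive polynomial $x^5 + x^2 + 1$ is an assumption: the text does not state which polynomial defines the field, but this choice yields the blocks quoted above. All function names are hypothetical.

```python
# Assumed primitive polynomial x^5 + x^2 + 1 (not stated in the text).
PRIMITIVE_POLY = 0b100101


def gf_powers(poly=PRIMITIVE_POLY, m=5):
    """Return the bit patterns of alpha^0 .. alpha^(2^m - 2) in GF(2^m)."""
    table, x = [], 1
    for _ in range((1 << m) - 1):
        table.append(x)
        x <<= 1                 # multiply by alpha
        if x & (1 << m):        # reduce modulo the primitive polynomial
            x ^= poly
    return table


def carrier_from_exponents(exponents, m=5):
    """Write each coefficient alpha^e as an m-bit block, left to right."""
    table = gf_powers(m=m)
    return " ".join(format(table[e], f"0{m}b") for e in exponents)


def cursor_to_gf4(cursor, k=10):
    """Split a k-bit cursor number into GF(4) symbols, two bits per symbol."""
    symbols = {"00": "0", "01": "1", "10": "alpha", "11": "alpha^2"}
    bits = format(cursor, f"0{k}b")
    return [symbols[bits[i:i + 2]] for i in range(0, k, 2)]


# Coefficients of a(u) and b(u) for D_165: alpha^0, alpha^17, alpha^30,
# alpha^23, alpha^12.
S_C = carrier_from_exponents([0, 17, 30, 23, 12])
V = cursor_to_gf4(165)
```

Here `S_C` matches the carrier array given in the text and `V` matches the message $(0\ \alpha\ \alpha\ 1\ 1)$.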

Fig. 2 Performance of artificial DNA communication system against McEliece and uncoded QPSK over Rayleigh fading channel

Fig. 3 shows a comparison of the attack work factor ($\log_2$) for the McEliece algorithm over GF($2^m$) [3] against that for the proposed DNA system; the work factor of attacks on the proposed DNA system is higher than that of attacks on the traditional McEliece algorithm. The proposed DNA system outperforms the security level of the McEliece algorithm by 1000% for a field exponent of $m = 13$.