Steganography in color images with random order of pixel selection and encrypted text message embedding

Information security is a major concern in the modern digital age, and outdated algorithms need to be replaced or improved. This article presents a new approach for hiding a secret text message in color images that combines steganography and cryptography. The location and the order of the image pixels chosen for information embedding are selected randomly using a chaotic pseudo-random generator. Encrypting the secret message before embedding adds another level of security, designed to mislead attackers who analyze the image for traces of steganography. The proposed stegoalgorithm is evaluated with standard statistical and empirical tests: randomness tests, key-space analysis, key-sensitivity analysis, visual analysis, histogram analysis, peak signal-to-noise ratio analysis, chi-square analysis, etc. The obtained results are presented and explained in the article.


INTRODUCTION
Steganology is an ancient science that is becoming more and more widely used with the development of digital information.
It consists of two main areas: steganography and steganalysis. Steganography is an interdisciplinary applied science field (Cox et al., 2007; Stanev & Szczypiorski, 2016), a set of technical skills and an art of hiding the very fact that information is being transmitted (or is present). It is one of the most effective approaches to protecting important information by hiding it (data hiding). High-tech steganography covers the areas of message hiding that use communication and computer technology, nanotechnology and modern advances in sciences such as biology, chemistry and others (Wu et al., 2020; Koptyra & Ogiela, 2020; Abd-El-Atty et al., 2020).
Steganalysis has exactly the opposite task. It combines methods and technologies for detecting secret steganographic communications. Research in the field of steganalysis (Johnson & Jajodia, 1998; Provos & Honeyman, 2001; Fridrich & Goljan, 2002) began at the end of the 20th century, alongside the development of modern ways of hiding information. Steganalysis is divided into two main categories: blind and targeted (Zhelezov, 2016). Targeted steganalysis methods have been developed to detect data embedded with specific stegoalgorithms, and they are very accurate against those stegomethods. The blind analysis methods are based on algorithms that require prior "training" with a series of empty and filled containers. What both types of analysis have in common is that their methods are based on statistical dependencies in the analyzed subjects (Nissar & Mir, 2002). One such method is PoV (Pairs of Values), part of the chi-square analysis (Westfeld & Pfitzmann, 1999; Fridrich, Goljan & Du, 2001).

Related work
A recent study that presents a successful Optical Character Recognition (OCR) steganography technique with good results in steganalysis is (Chatterjee, Ghosal & Sarkar, 2020). Another example of recent steganographic research is described in (Pak et al., 2020), where the authors use a chaotic map to construct a steganographic algorithm. Popular methods for image steganography are analyzed in Table 1.
The main task of the steganographic algorithm is to reduce the efficiency of such methods and thus to increase its reliability. For this purpose, it is necessary to choose a method of embedding that does not violate the statistical dependencies.
For this reason, a Spread Spectrum Steganography approach (Marvel, Boncelet & Retter, 1999; Satish et al., 2004) based on a pseudo-random sequence generator has been chosen in this article. Additional text encryption transforms the secret message into an unreadable character sequence, which increases the level of security of the proposed steganographic algorithm. In this approach, the resulting pseudo-random sequences are used to determine the message embedding positions, which preserves the statistical dependencies in the container. Another advantage of this approach is related to some types of targeted steganalysis that extract the values of the least significant bits of the file sequentially and analyze them for repetitive sequences. With the embedding method proposed in the article, this type of steganalysis is ineffective, because the embedded bits are not stored in sequential order.

Motivation and justification
Text messages and digital images are the most used information carriers in the data flow on the Internet and in mobile communications. There are thousands of chat applications designed for short text message correspondence, which use different ways to secure the communication between the users. A variety of cryptographic algorithms are implemented in order to protect the transferred information. Unfortunately, some of the encryption methods have become outdated, and new ones are being invented to improve secure communication. Information security will always be a major concern that motivates the development of new secret methods for data distribution and real-time communication. Such a method is proposed in this work by combining two general scientific areas: steganography and cryptography.

An outline of the proposed work
Our main focus is to present a different approach to classic LSB steganography in images, using a random order of pixel selection and embedding an encrypted text message. To achieve this goal, the proposed technique requires the following steps: constructing a novel pseudo-random generator; securely encrypting the text using the constructed pseudo-random generator; choosing random pixels (in chaotic order) from an image using the constructed pseudo-random generator; and embedding the information by traditional LSB modification of the pixel's colors, leaving no traces of steganography.

PSEUDO-RANDOM GENERATOR BASED ON DUFFING MAP AND CIRCLE MAP
For random pixel selection we use the pseudo-random generator (PRG) described in this section. Pseudo-random generators (also called pseudo-random number generators, PRNGs) are implemented in software and, unlike true-random generators (TRGs), are easy to implement with significantly lower cost and time consumption. This is why they are often used in cryptographic and steganographic systems. Examples of PRGs can be found in (Kordov, 2014, 2015b). The requirements for the resistance of PRGs to different types of attacks are discussed in this section.

Duffing map description
The Duffing map is a well-known two-dimensional non-linear discrete-time dynamical system with chaotic behavior (Holmes & Moon, 1983), which is a discrete version of the Duffing oscillator (Van Dooren, 1988). The Duffing map is given by:

$$x_{t+1} = y_t, \qquad y_{t+1} = -b\,x_t + a\,y_t - y_t^3, \tag{1}$$

where $x_t$ and $y_t$ are double variables, calculated on every iteration, and $a$ and $b$ are fixed parameters of the Duffing map. For chaotic behavior the parameters are set to $a = 2.75$ and $b = 0.2$ (Hasan et al., 2017; Riaz et al., 2018). The initial test values we used for the variables are $x_0 = -0.63825$ and $y_0 = 0.37713$, and Fig. 1 is a graphical representation of the Duffing map with the described values.

Circle map description
The Circle map is explored for chaotic behavior in Shenker (1982) and DeGuzman & Kelso (1991). It has random-like properties and is suitable for constructing PRGs (Kordov, 2015a). The standard Circle map equation is:

$$\theta_{t+1} = \theta_t + \Omega - \frac{K}{2\pi}\sin(2\pi\theta_t), \tag{2}$$

where $\theta$ is a double variable and $\Omega$ and $K$ are the controlling parameters with values $\Omega = 0.7128281828459045$, $K = 0.5$. The initial value for the variable we used for experiments in this article is $\theta_0 = -0.329054$. Figure 2 is a graphical representation of the Circle map with the described values.

Random bit extraction process
The proposed bit extraction process uses Eqs. (1) and (2) and contains the following steps:
1. The initial values and the constant parameters of Eqs. (1) and (2) are set.
2. With every iteration of Eq. (1), $x_t$ and $y_t$ are used to calculate an additional variable $p_t$ from the intermediate bits

$$temp1_t = |\mathrm{integer}(x_t \times 10^9)| \bmod 2, \qquad temp2_t = |\mathrm{integer}(y_t \times 10^9)| \bmod 2, \tag{3}$$

and $\theta_t$ from Eq. (2) is used to calculate the variable $q_t$:

$$q_t = |\mathrm{integer}(\theta_t \times 10^9)| \bmod 2. \tag{4}$$

3. The produced random bit is obtained by performing an XOR operation between the variables $p_t$ and $q_t$.
4. The previous two steps are repeated until a random binary sequence of the necessary length is obtained.
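The bit extraction process can be illustrated with a short sketch. It is not the authors' C++ implementation: it assumes the standard forms of the Duffing map and the Circle map for Eqs. (1) and (2), and it assumes that $p_t$ is the XOR of the two intermediate bits $temp1_t$ and $temp2_t$, a detail the text does not state explicitly. The names `parity_bit` and `prg_bits` are illustrative.

```python
import math
from itertools import islice

A, B = 2.75, 0.2                      # Duffing map parameters from the text
OMEGA, K = 0.7128281828459045, 0.5    # Circle map parameters from the text


def parity_bit(value):
    """Return |integer(value * 10^9)| mod 2, the form used in Eqs. (3) and (4)."""
    return abs(int(value * 1e9)) % 2


def prg_bits(x0, y0, theta0):
    """Yield pseudo-random bits from the Duffing and Circle maps."""
    x, y, theta = x0, y0, theta0
    while True:
        # Eq. (1), assumed standard Duffing map; the right side uses the old x, y.
        x, y = y, -B * x + A * y - y ** 3
        # Eq. (2), assumed standard Circle map (no reduction modulo 1 applied).
        theta = theta + OMEGA - K / (2 * math.pi) * math.sin(2 * math.pi * theta)
        # ASSUMPTION: p_t is the XOR of temp1_t and temp2_t.
        p = parity_bit(x) ^ parity_bit(y)
        q = parity_bit(theta)          # Eq. (4)
        yield p ^ q                    # output bit: p_t XOR q_t


key = (-0.63825, 0.37713, -0.329054)   # initial test values from the text
first_bits = list(islice(prg_bits(*key), 32))
```

Because the generator is deterministic, calling `prg_bits` again with the same three initial values reproduces the same bit sequence, which is what allows the receiver to regenerate it from the secret key.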

Key-sensitivity analysis
This test determines how the proposed PRG behaves when the secret key used to produce binary sequences is changed. To test the key sensitivity, very similar secret keys are used (described in Table 2), obtained by changing a single digit in one of the initial double variables. The results of the experiment are graphically presented in Fig. 3 and clearly show that the final binary sequence is different every time, even if the secret keys are very similar. This means the proposed PRG is very sensitive to any changes in the initial conditions.

Table 2: Secret key values (columns: secret key, variable values).

Key-space analysis
The key-space includes the variety of possible values of the variables used in random bit generation. Equation (1) has two initial variables that can take different values ($x_0$ and $y_0$), and Eq. (2) has one variable, $\theta_0$. The parameters of the equations are constant, so they cannot be part of the secret key. In addition to the secret key, the integer values of N and M can also take different values. According to the IEEE floating-point standard for double variables (IEEE Computer Society, 2008), every double variable has a precision of about $10^{-15}$. Combining the three variables gives $(10^{15})^3 \approx 2^{149}$, multiplied by $(2^{32})^2 = 2^{64}$ for the two integer variables, for a final key-space of about $2^{213}$. The required key-space for resisting brute-force attacks is $2^{100}$ (Alvarez & Li, 2006), which means that the proposed PRG is secure enough. A key-space comparison is presented in Table 3.
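The key-space arithmetic can be checked in a few lines. This is only a worked calculation under the stated assumptions (about $10^{15}$ distinguishable values per double variable and 32 bits per integer variable); the variable names are illustrative.

```python
import math

double_states = 10 ** 15        # distinguishable values per double variable
integer_states = 2 ** 32        # values per 32-bit integer variable (N and M)

# Three double variables (x0, y0, theta0) and two integer variables.
key_space = double_states ** 3 * integer_states ** 2
key_bits = math.log2(key_space)

# Brute-force resistance threshold cited in the text (Alvarez & Li, 2006).
meets_requirement = key_space > 2 ** 100
```

The three doubles alone contribute about 149 bits, and the two integers add 64 more, so `key_bits` lands slightly above 213.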

Randomness evaluation
The most important property of a PRG is to produce random binary numbers. To evaluate the randomness 1 billion bits are generated and the sequence is tested with the most popular statistical test software packages.

NIST-random test
The first software package used for randomness evaluation is the NIST Statistical Test Suite (Bassham et al., 2010), which includes 17 base tests. The testing process is performed by dividing the tested sequence into 1,000 subsequences with a length of 1,000,000 bits each. All the NIST tests need to have P-values in the range [0,1) to be considered passed. The results for all tests are summarized in Table 4.

DIEHARD-random test
The second test package is the DIEHARD software (Marsaglia, 1995), which contains 19 tests for randomness. The tests are applied to the same bitstream of 1 billion bits generated by our PRG. The acceptable range for the calculated P-values is again [0,1) for passing the individual tests. The results for all tests are summarized in Table 5. All the tests in Table 5 have P-values in the range [0,1), indicating that all the tests for randomness evaluation are passed.

ENT-random test
The ENT statistical test software (Walker, 2008) is the last package we used for randomness evaluation. The ENT software tests are: Entropy test, Optimum compression test, χ² distribution test, Arithmetic mean value test, Monte Carlo π estimation test, and Serial correlation coefficient test. The results for all tests from the ENT software are presented in Table 6.

STEGANOGRAPHY IN COLOR IMAGES WITH RANDOM PIXEL SELECTION
The proposed method combines the classical least-significant bit (LSB) value replacement with a random choice of the hiding positions in the image. Both the random order and the message encryption are performed with the proposed PRG.

Message embedding algorithm
The process is performed by the following steps:
1. The text information is transformed into a vector V of bits using the ASCII table values of the characters.
2. A control character sequence marking the end of the secret message is also converted into bits and appended to the vector V (in our case we used "*#").
3. The binary sequence in vector V is encrypted using an XOR operation with a random sequence produced by the PRG with Secret Key 1. The resulting vector is V′.
4. The proposed PRG is used with Secret Key 2 to produce two 24-bit values for selecting a random position in the image with the following rule:

$$x_{position} = \mathrm{integer}(24\ \mathrm{bits}) \bmod image\ width; \qquad y_{position} = \mathrm{integer}(24\ \mathrm{bits}) \bmod image\ height \tag{5}$$

5. If the pixel at position (x, y) has already been used, the previous step is repeated until an unused pixel position is found.
6. The LSB technique is used for embedding three bits from vector V′ into the RED, GREEN and BLUE color values of the selected pixel.
7. Steps 4-6 are repeated until the whole sequence from vector V′ is embedded into the final stego image.

Message extraction algorithm
The process is performed by the following steps:
1. The proposed PRG is used with Secret Key 2 to produce two 24-bit values for selecting a random position in the image with the following rule:

$$x_{position} = \mathrm{integer}(24\ \mathrm{bits}) \bmod image\ width; \qquad y_{position} = \mathrm{integer}(24\ \mathrm{bits}) \bmod image\ height \tag{6}$$

2. If the pixel at position (x, y) has already been used, the previous step is repeated until an unused pixel position is found.
3. The LSB values from the RED, GREEN and BLUE colors are copied into vector V′.
4. Every 8 bits are transformed into a char value, and the last two obtained characters are compared with the control sequence that marks the end of the message ("*#").
5. Steps 1-4 are repeated until the control sequence is reached.
6. The binary sequence in vector V′ is decrypted using an XOR operation with a random sequence produced by the PRG with Secret Key 1. The resulting vector is V.
7. Vector V is transformed from a binary sequence into its ASCII character equivalent, forming the original hidden text message.
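The embedding and extraction steps can be sketched end to end as follows. This is a minimal illustration, not the authors' C++ implementation, and it makes several assumptions: Python's `random.Random` stands in for the chaotic PRG as the keyed bit source, the image is a nested list `image[y][x] = [r, g, b]`, the encrypted sequence is zero-padded to a whole number of pixels, and the extractor decrypts bit by bit so that the "*#" marker can be recognized as soon as it arrives. All function names are hypothetical.

```python
import random

END_MARK = "*#"                      # control sequence from the text


def bit_stream(key):
    """ASSUMPTION: stand-in keyed bit source replacing the chaotic PRG."""
    rng = random.Random(key)
    while True:
        yield rng.getrandbits(1)


def take_int(bits, count=24):
    """Assemble `count` bits into one integer."""
    value = 0
    for _ in range(count):
        value = (value << 1) | next(bits)
    return value


def fresh_positions(key, width, height):
    """Yield unused (x, y) positions: two 24-bit values reduced mod width/height."""
    bits, used = bit_stream(key), set()
    while True:
        pos = (take_int(bits) % width, take_int(bits) % height)
        if pos not in used:          # retry when the pixel was already used
            used.add(pos)
            yield pos


def embed(image, message, key1, key2):
    width, height = len(image[0]), len(image)
    plain = [(ord(ch) >> s) & 1 for ch in message + END_MARK
             for s in range(7, -1, -1)]
    keystream = bit_stream(key1)
    cipher = [b ^ next(keystream) for b in plain]       # XOR encryption
    cipher += [0] * (-len(cipher) % 3)                  # ASSUMPTION: zero padding
    if len(cipher) // 3 > width * height:
        raise ValueError("message too long for this image")
    stego = [[list(px) for px in row] for row in image]  # leave the cover intact
    positions = fresh_positions(key2, width, height)
    for i in range(0, len(cipher), 3):
        x, y = next(positions)
        for c in range(3):                               # R, G, B least significant bits
            stego[y][x][c] = (stego[y][x][c] & ~1) | cipher[i + c]
    return stego


def extract(stego, key1, key2):
    width, height = len(stego[0]), len(stego)
    keystream = bit_stream(key1)
    positions = fresh_positions(key2, width, height)
    chars, byte, nbits = [], 0, 0
    for _ in range(width * height):
        x, y = next(positions)
        for channel in stego[y][x]:
            byte = (byte << 1) | ((channel & 1) ^ next(keystream))
            nbits += 1
            if nbits == 8:
                chars.append(chr(byte))
                byte, nbits = 0, 0
                if "".join(chars[-2:]) == END_MARK:
                    return "".join(chars[:-2])
    raise ValueError("end marker not found")
```

For example, `extract(embed(cover, "secret", 1, 2), 1, 2)` returns `"secret"` for any cover image with enough pixels. In this sketch a message that itself contains "*#" would be truncated at that point, so the control sequence must not occur in the hidden text.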

EXPERIMENTAL SETUP
For our empirical experiments we used a 2.40 GHz Intel Core i7-3630QM Dell Inspiron laptop with 8 GB RAM and the x64 Windows 10 Pro operating system. The proposed method is implemented in the C++ programming language, and the test images are personal photos taken within our university region. Sixteen color images are selected: 8 with dimensions 256 × 256 and 8 with dimensions 512 × 512. MATLAB R2016a software is used for histogram plotting and for image analysis and processing. The initial values used for the PRG are $x_0 = -0.63825$, $y_0 = 0.37713$, $\theta_0 = -0.329054$, and $N = M = 541$.

STEGANOGRAPHIC ANALYSIS
In this section, the most used tests for steganographic analysis are applied to the proposed stego algorithm. The color images are tested by embedding secret messages of different lengths. The messages used in the experiments are random and contain 100 letters (800 bits), 200 letters (1,600 bits), 300 letters (2,400 bits), 400 letters (3,200 bits), 500 letters (4,000 bits), 1,000 letters (8,000 bits), and 2,000 letters (16,000 bits). All the test results are available in Supplemental Files 1 and 2.

Visual analysis
This is a mandatory test for any steganographic algorithm. A necessary requirement for any stego algorithm is to leave no visual traces of embedded secret messages or of changes to the message container. Figure 4 shows one of the test images with its corresponding stego images containing different lengths of embedded information. Figure 4 demonstrates that there are no visual differences between the images and no traces of hidden messages. More examples are presented in Fig. 5 to confirm that there are no visual traces of steganography in the corresponding stego images.

Histogram analysis
The image histograms are used for graphical representation of the tonal distribution of the red, green and blue colors. This experiment is designed to analyze if there are any changes in color distribution when the proposed steganographic method is applied.  Figure 6 shows the histograms of a test image with its corresponding stego images and Table 7 shows average pixel intensity values.
The histogram attack method (Fridrich & Goljan, 2002) is historically the first statistical attack described in the literature. It is based on the fact that with LSB embedding, the even pixel values either remain unchanged or are increased by 1, while the odd pixel values either remain unchanged or are decreased by 1. Thus, the values $(2i, 2i + 1)$ form a pair of values (PoV), which are exchanged during embedding. This asymmetry in the embedding function can be exploited, and a statistical test can be applied to confirm or deny the presence of a hidden message: the difference between the observed and expected occurrence frequencies for each pair is sought. In our case, Fig. 6 shows that the tonal distribution is not changed when the secret messages are hidden in the plain image.
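The pairs-of-values idea can be made concrete with a simplified sketch of the chi-square statistic. It is only an illustration in the spirit of Westfeld & Pfitzmann (1999), not the tool used in this article: for each pair $(2i, 2i + 1)$ the expected frequency is taken as the mean of the two observed frequencies, and categories with low counts are not merged, as a full implementation would do. The function name is hypothetical.

```python
def pov_chi_square(values):
    """Return (statistic, degrees of freedom) for 8-bit sample values."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1

    statistic, categories = 0.0, 0
    for i in range(128):
        even, odd = hist[2 * i], hist[2 * i + 1]
        expected = (even + odd) / 2       # frequency if the pair were equalized
        if expected > 0:
            statistic += (even - expected) ** 2 / expected
            categories += 1
    return statistic, max(categories - 1, 0)
```

A statistic close to zero means that the two frequencies in each pair are nearly equal, which is the signature of sequential LSB replacement over the tested samples; a large statistic indicates that the pairs are not equalized.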

Peak signal-to-noise ratio and structural similarity analysis
The Peak Signal-to-Noise Ratio (PSNR) measures the maximum possible power of the clean signal against the power of the noise signal. Careless changes to the pixel values of an image can degrade the image quality, which may reveal the use of steganography. PSNR is calculated using the following equation:

$$PSNR = 10 \log_{10} \frac{MAX^2}{MSE}\ \mathrm{(dB)},$$

where MAX is the maximum possible value of the pixel color. Considering that every pixel has 8 bits for each of the red, green and blue colors, we use the average value of the three values, meaning $MAX = 2^8 - 1 = 255$. MSE is the mean square error between the plain and stego images, defined as:

$$MSE = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} (p_{x,y} - s_{x,y})^2,$$

where $W$ and $H$ are the image width and height, and $p_{x,y}$ and $s_{x,y}$ are the corresponding pixel values from the plain and stego images, respectively. Considering that the color images have red, green and blue values for every pixel, $(p_{x,y} - s_{x,y})^2$ is calculated by:

$$(p_{x,y} - s_{x,y})^2 = \frac{(p^{R}_{x,y} - s^{R}_{x,y})^2 + (p^{G}_{x,y} - s^{G}_{x,y})^2 + (p^{B}_{x,y} - s^{B}_{x,y})^2}{3}.$$

The Structural Similarity (SSIM) is another method used in steganographic analysis, proposed and described in Wang et al. (2004). The test is designed to determine the similarity between two images, in our case the similarity between the plain and the corresponding stego images. Values close to 1 indicate the best possible structural similarity between the compared images.
Part of the obtained values for MSE, PSNR and SSIM are shown in Table 8. The MSE and PSNR values are calculated for images with 100 chars (800 bits), 1,000 chars (8,000 bits), and 2,000 chars (16,000 bits) embedded. All the results are available in Supplemental File 3. Table 8 shows high values for PSNR (over 60 dB), meaning that the stego algorithm does not destroy the image quality, given that 20-30 dB is considered the minimum requirement (low quality). The obtained values for SSIM are close to the best possible value, 1.
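The MSE and PSNR definitions can be sketched for RGB images as follows. The sketch assumes that the squared difference of a pixel is the average of the squared differences of its three color channels and that images are nested lists `image[y][x] = [r, g, b]`; the function names are illustrative and this is not the MATLAB code used for the reported results.

```python
import math


def mse_rgb(plain, stego):
    """Mean square error, averaging the three color channels of each pixel."""
    height, width = len(plain), len(plain[0])
    total = 0.0
    for row_p, row_s in zip(plain, stego):
        for px_p, px_s in zip(row_p, row_s):
            total += sum((a - b) ** 2 for a, b in zip(px_p, px_s)) / 3
    return total / (width * height)


def psnr(plain, stego, max_value=255):
    """PSNR in dB; infinite when the two images are identical."""
    error = mse_rgb(plain, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)
```

Since LSB replacement changes a channel value by at most 1, and only in the pixels that carry message bits, the MSE stays very small and the PSNR stays high, which is consistent with the values reported in Table 8.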

Additional metrics analysis
Some researchers use different metrics for the steganographic analysis of their methods. To evaluate the proposed algorithm we performed additional experiments with the most used indicators: Average Difference (AD), Structural Content (SC), Normalized Cross-Correlation (NCC), Maximum Difference (MD), Laplacian Mean Squared Error (LMSE), Normalized Absolute Error (NAE), and Image Quality Index (IQI). The best possible value for SC, NCC, MD and IQI is 1, and for AD, LMSE and NAE it is 0. The results for our method are presented in Table 9. All the results are available in Supplemental File 3.
The values in Table 9 are close to the perfect values, demonstrating the stability and efficiency of the proposed stegoalgorithm.

Comparison
In order to compare the proposed method with other image steganographic algorithms, we use the metrics presented in the related articles (where available). The main metrics for defining the security and the reliability of stegomethods are related to preserving the quality of the cover images and keeping their similarity with the stego images. For the image quality estimation the PSNR and MSE metrics are applied, and for the similarity of the cover and corresponding stego images, the SSIM metric. Table 10 contains the most used metrics. The test results in Table 10 show that the presented algorithm has satisfactory statistical properties and provides a better security level than the compared methods.

Chi-square analysis
In this article, steganalytic software based on the Chi-square method is used (available at http://www.guillermito2.net/stegano/tools/index.html). The software graphically shows the positions of the pixel values according to the Chi-square value of the tested image. The red curve indicates the Chi-square values of the tested images, and the green values represent the average value of the LSBs. If the green values are below the red curve, the test is not passed. Otherwise, it is assumed that the test is passed, that is, there are no indications of a hidden message. For visual comparison we constructed a single screenshot image with six diagrams containing the results of the software. Figure 7 demonstrates the results of our tests. The first is a diagram of the container, and below it are the corresponding stego files. The red curve is constantly at zero, leaving no green point under it. The Chi-square tests show that there is no trace of steganography in the stego files, indicating that the proposed algorithm can withstand Chi-square attacks.

COMPUTATIONAL AND COMPLEXITY ANALYSIS
The proposed algorithm is tested under the conditions described in the Experimental Setup section. The complexity of our method is defined by the computations and iterations needed for the encryption and embedding operations. Since every operation (random number generation, LSB modification, etc.) is computed in linear time and does not affect the overall complexity, the theoretical complexity of the proposed scheme is $\Theta(8n)$, which is equivalent to $\Theta(n)$, where $n$ is the size of the input data of the algorithm. The input data of the algorithm is the secret message for embedding, which is processed as a bit sequence (8 bits per character). The image parameters (width and height) do not increase the time consumption, because the number of randomly selected pixels depends only on the length of the embedded secret message. However, the image size is related to the memory consumption.
Table 11 summarizes the results of our empirical experiment for embedding secret text messages of different sizes.
The results in Table 11 show that the proposed method is very fast and that the computational complexity depends entirely on the secret text length.

CONCLUSIONS
In this manuscript a new method for steganography is presented. The base of the proposed algorithm is a PRG used for secret message encryption and for random pixel selection for data embedding. To demonstrate its level of security, the PRG is statistically tested for randomness and key-sensitivity, and the key-space analysis shows the necessary level of resistance to brute-force attacks, with a minimum requirement of $2^{100}$ for the key-space.
The steganographic algorithm is evaluated with visual analysis, file size comparison, histogram analysis and chi-square analysis, and the results show that there are no traces of steganography when a secret message is hidden in the tested color images. The PSNR analysis indicates that the quality of the signal in the stego images remains high, considering that good signal quality is above 20 dB. Additional tests indicate high similarity between the cover and the corresponding stego images, supporting the security and the reliability of the proposed scheme. The presented method can be improved for real-time video communication with embedded data.