1 Introduction

Low-rate speech coding is widely used in mobile communications, voice over IP (VoIP) and some instant messaging tools, and has become the main data traffic on the Internet. Due to its wide range of applications, dynamic generation and interactive transmission, low-rate speech coding is a good carrier for information concealment [1,2,3]. In recent years, there have been many studies exploring steganography methods for low-rate speech coding. For example, Yuan et al. [4] proposed a quantization index-modulated steganography based on the multidimensional vector concealment space, Huang et al. [5] proposed an information concealment method by replacing certain bits of mute frames, and Zhou et al. [6] proposed an extended Least Significant Bit (LSB) method based on hidden states.

However, current steganography methods suffer from low embedding capacity and poor concealment, which makes them difficult to apply in practice. Improving concealment while increasing steganographic capacity is therefore the main challenge in low-rate speech coding steganography.

There have been efforts to utilize Sudoku matrices to enhance concealment under the conditions of large hidden capacity for image steganography [7, 8]. These approaches allow good concealment with high hidden capacity. However, there have been no reports of low-rate speech coding strategies based on the use of Sudoku matrices to achieve information steganography.

In this paper, we describe a steganographic method based on three-dimensional Sudoku matrices, designed by analyzing the characteristics of low-rate speech coding. The widely used G.723.1 codec was used to verify the method. The results show that the proposed method achieves a steganographic capacity of 200 bit/s and provides better concealment in the G.723.1 carrier than other methods.

2 Related Work

2.1 Steganography Methods for Low-Rate Speech Coding

Low-rate speech coding compresses speech efficiently, which reduces the redundant information in the encoded bitstream. Performing steganography on low-rate speech coding is therefore a challenging task. In [9], the Least Significant Bit (LSB) method was applied by exploiting the noise tolerance of G.729. A coding strategy was proposed in [10], but using the LSB method in the pitch period of the low-rate G.723.1 speech encoder results in significant speech distortion. In [11], the authors found that in the LSB-embedding algorithm, better speech quality can be obtained by adjusting the perceptual weighting filter parameter values. There are also methods based on Quantization Index Modulation (QIM). For example, the basic concept of the method described in [12] is to group the quantized codebooks and then to search the quantized codebook groups according to the concealment information. This method is suitable for vector-quantized digital audio signals and yields a small quantization error but has low hidden capacity. In [13], a QIM information concealment method was proposed for pitch period prediction in low-rate G.723.1 speech. This method has a maximum embedding capacity of only 4 bits per frame, so its hidden capacity is small. In [14], a more secure algorithm based on QIM was proposed that uses a key to control the inversion state of each subtree in the QIM algorithm. In [15], matrix coding was introduced into information steganography, which reduces the number of modified bits. Additional details about current approaches in speech steganography are summarized in [16].

The Sudoku matrix has been widely used in steganography, encryption authentication, digital watermarking and other fields. The number of matrices satisfying the properties of the Sudoku matrix is very large, nearly \(5.525\times 10^{27}\), so applying the Sudoku matrix to information steganography greatly improves security [7, 8]. The hidden capacity of a steganographic method based on the Sudoku matrix can be varied by changing the size of the Sudoku matrix. Additionally, the quantization noise is small and the anti-detection ability is relatively strong. Image steganography based on the Sudoku matrix is of growing interest, and many kinds of Sudoku matrices and their variants have been tested in information steganography research [17, 18]. There are also steganographic variants of 3D chaotic maps [19]. Overall, image steganography methods that utilize a Sudoku matrix can improve image quality and enhance the hidden capacity. However, there are no reported methods for low-rate speech coding that utilize Sudoku matrices for information steganography.

The main idea of this study was to exploit the advantages of the Sudoku matrix for low-rate speech quantization coding with high concealment capacity.

2.2 Analysis of the Pitch Period in Low Rate Speech Coding

The pitch period is the period of the vocal cord vibration that occurs during voiced sound, and it is a very important parameter in speech signal processing. However, the pitch period is difficult to predict and detect. Existing pitch detection methods have limitations, and it is hard to obtain the exact value by signal processing. The main difficulties of pitch detection are as follows:

  (1) The speech signal changes in a very complex way, and the pitch is not strictly periodic.

  (2) The resonant peaks (formants) of the vocal tract sometimes affect the harmonic structure of the excitation signal, making it difficult to extract complete information about the vibration of the vocal cords.

Fig. 1. The pitch period of low-rate speech sub-frame coding.

Since existing pitch detection techniques cannot obtain the true value of the pitch period, a slight modification of the predicted pitch period value to embed secret information is tolerable. Such a modification does not seriously affect the quality of the restored voice, which suggests that the pitch period is a feasible embedding point for hidden information. In addition, the pitch period is a common coding parameter in low-rate speech coding, which makes the approach broadly applicable. Low-rate speech coding processes the signal frame by frame; in the G.723.1 encoder, for example, each speech frame consists of four subframes and each subframe has its own pitch period, as shown in Fig. 1. As the predicted values of the pitch period are consecutive integers, the influence of the pitch period on the quality of the recovered speech is monotonic and continuous. Thus, the physical modification is positively correlated with the logical variation.

3 Methodology

3.1 The Overall Framework for Low-Rate Speech Coding Steganography

Pitch period prediction is one component of low-rate speech coding. During prediction, steganography can be achieved with a Sudoku matrix by replacing the optimal pitch period value with an adjacent, second-best index value. To introduce the principle, we use G.723.1 coding as an example.

Fig. 2. The overall framework of steganography based on G.723.1 encoding.

The G.723.1 coding process is shown in Fig. 2. The PCM speech signal is divided into frames, then high-pass filtered and LPC-filtered to obtain the residual signal. The open-loop pitch period is then estimated by the short-time averaging method, and the four best closed-loop pitch periods (one per subframe) are obtained by the closed-loop pitch search. The first closed-loop pitch period, the third closed-loop pitch period, and the sum of the second and fourth closed-loop pitch periods form a three-dimensional numeric space, which is mapped onto the three-dimensional Sudoku matrix. Each of the four search patterns around the coordinate of the best index value contains the secret value. By selecting the occurrence of the secret value that lies closest to the original coordinate and then replacing the three best index values with the coordinates of that occurrence, the quantization error is kept small and concealment is enhanced. In this way, the final transmitted signal is the modified closed-loop pitch period information. The overall scheme of low-rate speech steganography is shown in the portion of Fig. 2 within the dashed line.

As shown in Fig. 2, the most critical issues are how to construct a three-dimensional Sudoku matrix for low-rate speech coding steganography and how to design embedding and extraction algorithms that use the matrix. A previous paper [20] focused on generating the three-dimensional Sudoku matrix by circulated motions. In this paper, we directly apply this three-dimensional Sudoku matrix to design a steganography method that hides information in the pitch features of low-rate speech coding.

3.2 Steganography Embedding Algorithm Based on Sudoku Matrix Steganography

In the G.723.1 encoding process, the 3D Sudoku matrix is first initialized using the circulated motion construction method. This method yields a matrix of size \(8\times 8\times 8\), whereas the pitch period ranges over 18–142 and is allocated 7 bits, giving a space of size 128. The \(8\times 8\times 8\) Sudoku matrix is therefore expanded periodically to fill the entire three-dimensional numeric space (0–127). Then, the pitch period values of the four sub-frames output by the pitch predictor are combined into three temporary coefficients, denoted \(index_{x}, index_{y}\) and \(index_{z}\). These values are mapped into the three-dimensional Sudoku matrix, and the four search patterns are searched for the candidate closest to the original index, which gives the final pitch period value.

Let the original \(8\times 8\times 8\) matrix be represented by the coordinate function \(g = Magic(x, y, z)\), where x, y and z are the three coordinates of the 3D Sudoku matrix, \(x, y, z \in [0,7]\), and g denotes the value stored at that position, \(g \in [0,63]\). The coordinate function after the periodic expansion is represented by \(q = Magic\_Expand(x', y', z')\), where \(x', y', z' \in [0,127], q \in [0,63]\). The function obtained after the periodic expansion is shown in Eq. 1:

$$\begin{aligned} q=Magic\_Expand(x', y', z')=Magic(x, y, z), \left\{ \begin{array}{ll} x = x'\quad mod\quad 8\\ y = y'\quad mod\quad 8\\ z = z'\quad mod\quad 8 \end{array} \right. \end{aligned}$$
(1)
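The periodic expansion of Eq. 1 can be sketched in Python as follows. The circulated motion construction of [20] is not reproduced here, so `magic` is a hypothetical stand-in (a bitwise XOR layout, chosen only because every axis-aligned 8×8 plane and every aligned 4×4×4 block of it holds the values 0–63 exactly once). Only `magic_expand` corresponds directly to Eq. 1.

```python
def magic(x, y, z):
    """Hypothetical stand-in 8x8x8 matrix with values 0..63 (NOT the matrix of [20])."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    y0, y1, y2 = y & 1, (y >> 1) & 1, (y >> 2) & 1
    z0, z1, z2 = z & 1, (z >> 1) & 1, (z >> 2) & 1
    bits = (x0 ^ y2, x1 ^ z2, y0 ^ x2, y1 ^ z2, z0 ^ x2, z1 ^ y2)
    return sum(bit << k for k, bit in enumerate(bits))


def magic_expand(xp, yp, zp):
    """Eq. 1: tile the 8x8x8 matrix periodically along all three axes."""
    return magic(xp % 8, yp % 8, zp % 8)


# One 8x8 plane of the stand-in contains each value 0..63 exactly once.
plane = {magic_expand(5, y, z) for y in range(8) for z in range(8)}
print(len(plane))                                  # 64
print(magic_expand(9, 18, 27) == magic(1, 2, 3))   # True
```

Because the expansion is a pure modulo lookup, the expanded matrix never has to be stored; only the 512 entries of the base matrix are needed.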

During pitch period detection, it is necessary to search for the index value of the optimal pitch period and then perform the index value replacement based on the 3D Sudoku matrix. Therefore, when performing steganographic embedding for each frame of speech, the following algorithm is performed.

Step 1: Two pitch estimates are calculated for each frame, one for the first two subframes and one for the remaining two subframes. The open-loop pitch period estimate \(L_{OL}\) is calculated using the perceptually weighted speech f[n]. The pitch period is determined by maximizing the cross-correlation criterion \(C_{OL}(j)\), and the resulting open-loop pitch periods are \(L_{OL}[0]\) and \(L_{OL}[1]\), according to the following expression:

$$\begin{aligned} C_{OL}(j) = \frac{(\sum _{n=0}^{119}{f[n] \times f[n-j]})^2}{\sum _{n=0}^{119}{f[n-j] \times f[n-j]}}, 18\leqslant j\leqslant 142. \end{aligned}$$
(2)
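As an illustration of the maximization in Eq. 2, the following Python sketch evaluates the criterion for every candidate lag and keeps the best one. It is a simplified reading of the formula only: the function name, the `start` offset and the `window` parameter (the number of terms in the sums) are assumptions made for this example, and any additional decision rules of the G.723.1 encoder are omitted.

```python
def open_loop_pitch(f, start, window, min_lag=18, max_lag=142):
    """Return the lag j in [min_lag, max_lag] that maximises C_OL(j) of Eq. 2.

    f      -- perceptually weighted speech samples
    start  -- position of sample n = 0 in f; requires start >= max_lag
    window -- number of samples in each sum of Eq. 2
    """
    best_lag, best_score = min_lag, -1.0
    for j in range(min_lag, max_lag + 1):
        cross = sum(f[start + n] * f[start + n - j] for n in range(window))
        energy = sum(f[start + n - j] ** 2 for n in range(window))
        if energy == 0:
            continue  # criterion undefined for an all-zero delayed segment
        score = cross * cross / energy
        if score > best_score:  # strict '>' keeps the smallest lag on ties
            best_lag, best_score = j, score
    return best_lag


# Synthetic check: a pulse train with a period of 40 samples.
signal = [1.0 if n % 40 == 0 else 0.0 for n in range(400)]
print(open_loop_pitch(signal, start=160, window=120))  # 40
```

Multiples of the true period (80 and 120 here) reach the same score, which is why the sketch breaks ties in favour of the smallest lag.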

Step 2: For subframes 0 and 2, the closed-loop pitch lag is selected from around the corresponding open-loop pitch lag, within a range of \(\pm 1\), and encoded with 7 bits (the transmitted value is the closed-loop pitch period itself). For subframes 1 and 3, the closed-loop pitch lag is differentially encoded using 2 bits (the transmitted value is the difference) and may differ from the lag of the previous subframe only by \(-1\), 0, +1, or +2. The quantized and decoded pitch lag values are referred to as \(L_{i}\) from this point on, where

$$\begin{aligned} \left\{ \begin{array}{l} L_{0}\in \{L_{OL}[0]-1,L_{OL}[0],L_{OL}[0]+1\}, 18\leqslant L_{0}\leqslant 142, \\ L_{1}\in \{L_{0}-1,L_{0},L_{0}+1,L_{0}+2\},\\ L_{2}\in \{L_{OL}[1]-1,L_{OL}[1],L_{OL}[1]+1\}, 18\leqslant L_{2}\leqslant 142,\\ L_{3}\in \{L_{2}-1,L_{2},L_{2}+1,L_{2}+2\}. \end{array} \right. \end{aligned}$$
(3)
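The constraints of Eq. 3 can be checked with a short Python sketch that maps the four lags to two 7-bit and two 2-bit field values and back. The offsets used here (subtracting 18 from the absolute lags and adding 1 to the differences) are assumptions chosen so that the values fit the stated bit widths; the actual bitstream layout of G.723.1 is not reproduced.

```python
def encode_lags(l0, l1, l2, l3):
    """Map four closed-loop lags to (7-bit, 2-bit, 7-bit, 2-bit) field values."""
    for lag in (l0, l2):
        if not 18 <= lag <= 142:
            raise ValueError("absolute lag outside 18..142")
    d1, d3 = l1 - l0, l3 - l2
    for d in (d1, d3):
        if d not in (-1, 0, 1, 2):
            raise ValueError("differential lag must be -1, 0, +1 or +2")
    return l0 - 18, d1 + 1, l2 - 18, d3 + 1


def decode_lags(a0, a1, a2, a3):
    """Inverse of encode_lags: recover L0..L3 from the field values."""
    l0, l2 = a0 + 18, a2 + 18
    return l0, l0 + a1 - 1, l2, l2 + a3 - 1


fields = encode_lags(60, 61, 70, 69)
print(fields)                 # (42, 2, 52, 0)
print(decode_lags(*fields))   # (60, 61, 70, 69)
```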

Step 3: The values of the four closed-loop pitch periods are combined to produce the coefficients \(index_{x}\), \(index_{y}\) and \(index_{z}\) as follows:

$$\begin{aligned} \left\{ \begin{array}{l} index_{x} = L_{0}-18\\ index_{y} = L_{2}-18\\ index_{z} = L_{1}-L_{0}+L_{3}-L_{2}+2 \end{array} \right. \end{aligned}$$
(4)

The ranges of \(L_{1}-L_{0}+1\) and \(L_{3}-L_{2}+1\) are both 0–3. Adding the two widens the range of \(index_{z}\) to 0–6, which enlarges the scope of the search patterns below. Based on the above analysis, we then map these values into the 3D Sudoku matrix and perform the following embedding operation.
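A minimal Python sketch of Eq. 4 makes the range of the third coefficient explicit. The function name `lags_to_index` and the sample lag values are chosen for illustration only.

```python
from itertools import product


def lags_to_index(l0, l1, l2, l3):
    """Eq. 4: combine four closed-loop lags into (index_x, index_y, index_z)."""
    return l0 - 18, l2 - 18, (l1 - l0) + (l3 - l2) + 2


print(lags_to_index(60, 61, 70, 69))   # (42, 52, 2)

# Enumerate every allowed pair of differential offsets from Eq. 3.
offsets = (-1, 0, 1, 2)
z_values = sorted({lags_to_index(60, 60 + d1, 70, 70 + d3)[2]
                   for d1, d3 in product(offsets, repeat=2)})
print(z_values)                        # [0, 1, 2, 3, 4, 5, 6]
```

Several offset pairs give the same \(index_{z}\), so the mapping from \(index_{z}\) back to the two differences is not unique; how a modified \(index_{z}\) is split between subframes 1 and 3 is not specified in the text.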

Step 4: Preprocess the binary secret information stream by converting every six consecutive bits into a decimal number Key, \(Key\in [0,63]\).
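The preprocessing of Step 4 might look as follows in Python. The bit order (most significant bit first) and the zero-padding of an incomplete final group are assumptions, since neither is specified in the text.

```python
def bits_to_keys(bits):
    """Split a string of '0'/'1' characters into 6-bit groups and return their values."""
    if len(bits) % 6:
        bits += "0" * (6 - len(bits) % 6)   # assumption: pad the last group with zeros
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]


print(bits_to_keys("000001111111"))  # [1, 63]
```

Each resulting Key is embedded in one frame, so a stream of n bits occupies about n/6 frames.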

Step 5: Search for the Key value in the 3D Sudoku matrix function \(Magic\_Expand(x, y, z)\) according to the following four patterns. Each search pattern defines a search range in the matrix that contains the numbers 0–63 without repetition, and the search consists of locating the Key value among these elements. This yields four coordinates of the Key value, \((indexi_{x}, indexi_{y}, indexi_{z})\), \(1\leqslant i\leqslant 4\). The search can be accelerated with an index lookup table, which is not described in detail here due to space limitations.

$$\begin{aligned} \left\{ \begin{array}{l} x = index_{x}\\ index_{y} -3 \leqslant y \leqslant index_{y} + 4\\ index_{z} -3 \leqslant z \leqslant index_{z} + 4\\ \end{array} \right. \end{aligned}$$
(5)
$$\begin{aligned} \left\{ \begin{array}{l} index_{x} - 3 \leqslant x \leqslant index_{x} + 4\\ y = index_{y}\\ index_{z} -3 \leqslant z \leqslant index_{z} + 4\\ \end{array} \right. \end{aligned}$$
(6)
$$\begin{aligned} \left\{ \begin{array}{l} index_{x} -3 \leqslant x \leqslant index_{x} + 4\\ index_{y} -3 \leqslant y \leqslant index_{y} + 4\\ z = index_{z}\\ \end{array} \right. \end{aligned}$$
(7)
$$\begin{aligned} \left\{ \begin{array}{l} \lfloor index_{x}/4\rfloor \times 4 \leqslant x \leqslant \lfloor index_{x}/4\rfloor \times 4 + 3\\ \lfloor index_{y}/4\rfloor \times 4 \leqslant y \leqslant \lfloor index_{y}/4\rfloor \times 4 + 3\\ \lfloor index_{z}/4\rfloor \times 4 \leqslant z \leqslant \lfloor index_{z}/4\rfloor \times 4 + 3\\ \end{array} \right. \end{aligned}$$
(8)

Step 6: Compare the Euclidean distance between the four coordinates and the original coordinates. Since the smaller the Euclidean distance, the smaller the quantization error, the coordinates with the smallest Euclidean distance are selected as the best pitch period after the information concealment. The index coordinates of the final Sudoku matrix are \((bestindex_{x}, bestindex_{y}, bestindex_{z})\). The information embedding is completed.

$$\begin{aligned}&(bestindex_{x}, bestindex_{y}, bestindex_{z}) \\&= \mathop {\arg \min }_{1\leqslant i\leqslant 4}\left[ (index_{x}-indexi_{x})^2+(index_{y}-indexi_{y})^2 + (index_{z}-indexi_{z})^2\right] \end{aligned}$$
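Steps 5 and 6 can be sketched in Python as a brute-force search over the four patterns of Eqs. 5–8 followed by the minimum-distance selection. This is a sketch under stated assumptions, not the authors' implementation: `magic` is a hypothetical stand-in for the matrix of [20], the index lookup table mentioned in Step 5 is replaced by exhaustive search, and keeping the selected coordinates inside the valid lag ranges is not handled.

```python
from itertools import product


def magic(x, y, z):
    """Hypothetical stand-in 8x8x8 matrix with values 0..63 (NOT the matrix of [20])."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    y0, y1, y2 = y & 1, (y >> 1) & 1, (y >> 2) & 1
    z0, z1, z2 = z & 1, (z >> 1) & 1, (z >> 2) & 1
    bits = (x0 ^ y2, x1 ^ z2, y0 ^ x2, y1 ^ z2, z0 ^ x2, z1 ^ y2)
    return sum(bit << k for k, bit in enumerate(bits))


def magic_expand(x, y, z):
    return magic(x % 8, y % 8, z % 8)  # periodic expansion, Eq. 1


def window(c):
    return range(c - 3, c + 5)         # c-3 .. c+4, as in Eqs. 5-7


def block(c):
    base = (c // 4) * 4
    return range(base, base + 4)       # aligned 4-wide range, as in Eq. 8


def embed(index, secret):
    """Return the coordinate holding `secret` that lies closest to `index`."""
    ix, iy, iz = index
    patterns = [
        ([ix], window(iy), window(iz)),        # Eq. 5
        (window(ix), [iy], window(iz)),        # Eq. 6
        (window(ix), window(iy), [iz]),        # Eq. 7
        (block(ix), block(iy), block(iz)),     # Eq. 8
    ]
    candidates = [
        point
        for xs, ys, zs in patterns
        for point in product(xs, ys, zs)
        if magic_expand(*point) == secret
    ]

    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, index))

    return min(candidates, key=squared_distance)   # Step 6


best = embed((42, 52, 2), secret=37)
print(best, magic_expand(*best) == 37)
```

Minimizing the squared distance selects the same candidate as minimizing the Euclidean distance, so the square root is not needed.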

3.3 Steganography Extraction Algorithm Based on Sudoku Matrix Steganography

Extracting the secret information is simpler than embedding it. As in embedding, the \(8\times 8\times 8\) Sudoku matrix is periodically expanded into a three-dimensional Sudoku matrix of \(128\times 128\times 128\). When the pitch period values of the four subframes are decoded, the four pitch period index values are extracted and converted to three-dimensional coordinates: the values of subframes 0 and 2 give the first two coordinates, and the combined values of subframes 1 and 3 give the third. The value of the three-dimensional Sudoku matrix at these coordinates is the Key, i.e., the secret information, which is then converted back into a binary data stream. The detailed secret information extraction algorithm is as follows.

  • Step 1: Extract the pitch period value of four sub-frames and assign them to \(L_{0}, L_{1}, L_{2}\), and \(L_{3}\).

  • Step 2: Perform the following operations on the four pitch period values and assign the results to \((bestindex_{x}, bestindex_{y}, bestindex_{z})\).

    $$\begin{aligned} \left\{ \begin{array}{l} bestindex_{x}=L_{0}-18\\ bestindex_{y}=L_{2}-18\\ bestindex_{z}=L_{1}-L_{0}+L_{3}-L_{2}+2 \end{array} \right. \end{aligned}$$
    (9)
  • Step 3: Build and initialize the three-dimensional Sudoku matrix. The receiver shares the sender’s three-dimensional Sudoku matrix. This three-dimensional Sudoku matrix is written as \(Magic\_Expand(x, y, z)\).

  • Step 4: Look up the coordinate point in the three-dimensional Sudoku matrix to obtain the secret information Key, as shown in Eq. 10. The decoding is then complete.

    $$\begin{aligned} Key = Magic\_Expand(bestindex_{x},bestindex_{y},bestindex_{z}) \end{aligned}$$
    (10)
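The extraction steps above reduce to one coordinate computation and one table lookup, as the following Python sketch shows. As before, `magic` is a hypothetical stand-in for the shared matrix of [20], and the function name `extract_key` is chosen for this example.

```python
def magic(x, y, z):
    """Hypothetical stand-in 8x8x8 matrix with values 0..63 (NOT the matrix of [20])."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    y0, y1, y2 = y & 1, (y >> 1) & 1, (y >> 2) & 1
    z0, z1, z2 = z & 1, (z >> 1) & 1, (z >> 2) & 1
    bits = (x0 ^ y2, x1 ^ z2, y0 ^ x2, y1 ^ z2, z0 ^ x2, z1 ^ y2)
    return sum(bit << k for k, bit in enumerate(bits))


def magic_expand(x, y, z):
    return magic(x % 8, y % 8, z % 8)       # periodic expansion, Eq. 1


def extract_key(l0, l1, l2, l3):
    """Steps 1-4: decoded lags -> coordinates (Eq. 9) -> Key (Eq. 10)."""
    best_x = l0 - 18
    best_y = l2 - 18
    best_z = (l1 - l0) + (l3 - l2) + 2
    return magic_expand(best_x, best_y, best_z)


key = extract_key(60, 61, 70, 69)
print(format(key, "06b"))   # the six recovered secret bits, MSB first (assumed order)
```

No search is needed on the receiving side, which is why extraction is cheaper than embedding.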

4 Results and Analysis

4.1 Concealment Capacity Analysis

With the information concealment method based on the 3D Sudoku matrix applied to the G.723.1 coded speech stream, the secret value carried by the pitch periods of each frame lies in [0, 63], which means that 6 bits can be hidden in every frame. The length of each G.723.1 frame is 30 ms, so the concealment capacity of the algorithm based on the three-dimensional Sudoku matrix is 6 bit/0.03 s = 200 bit/s in the pitch period. The method in [10], which modifies the two least significant bits, has 3 sub-vectors for each frame, and each sub-vector can hide 2 bits. Therefore, each frame can hide \(3 \times 2\,bit = 6\,bit\), and the overall concealment capacity of that method is 6 bit/0.03 s = 200 bit/s. In the QIM information concealment algorithm based on the pitch period proposed in [13], the codebook is divided into odd and even parts. Each subframe can hide 1 bit; hence, each frame can hide \(4 \times 1\,bit = 4\,bit\), and the overall concealment capacity of the method in [13] is 4 bit/0.03 s = 133.3 bit/s. This analysis shows that the concealment capacity of the proposed model based on the three-dimensional Sudoku matrix is equal to that of the method in [10] and 1.5 times that of the method in [13].
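The capacity figures above follow directly from the number of hidden bits per 30 ms frame, as this small Python check illustrates. The helper name `capacity_bps` is chosen for this example.

```python
FRAME_MS = 30  # duration of one G.723.1 frame in milliseconds


def capacity_bps(bits_per_frame, frame_ms=FRAME_MS):
    """Hidden bits per second for a given number of hidden bits per frame."""
    return bits_per_frame * 1000 / frame_ms


proposed = capacity_bps(6)        # one 6-bit Key per frame
lsb_10 = capacity_bps(3 * 2)      # method of [10]: 3 sub-vectors x 2 bits
qim_13 = capacity_bps(4 * 1)      # method of [13]: 4 subframes x 1 bit
print(proposed, lsb_10, round(qim_13, 1), round(proposed / qim_13, 2))
# 200.0 200.0 133.3 1.5
```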

4.2 Concealment Analysis

The concealment of speech carrier steganography can be evaluated by the quality of the speech. Speech quality is evaluated both subjectively and objectively. The most popular measure of subjective speech quality is the Perceptual Evaluation of Speech Quality (PESQ), which estimates the score a listener would assign, and the most popular measure of objective speech quality is the signal-to-noise ratio (SNR).

Subjective Speech Quality. Multiple speech clips from different speakers were selected to form the speech sample dataset. Speech clips from 4 categories were used, i.e., English male voice (EM), English female voice (EW), Chinese male voice (CM) and Chinese female voice (CW). Each category contains 10 long samples of 10 ms. Each voice clip has a sampling rate of 8 kHz and is in 16-bit quantized PCM format. Subjective speech quality was assessed with the PESQ score. The values of this test range from 0 to 5, and the higher the value, the better the voice quality.

Table 1. PESQ values and loss ratios for different hidden algorithms
Fig. 3. Comparison of SNR values of three kinds of steganography methods.

Table 1 compares the PESQ values after information concealment for the method of [10], the method of [13] and our method. The proposed steganography method degrades the PESQ of speech less than the method of [10], which indicates that it conceals information better than the method presented in [10]. The method of [13] degrades the PESQ less than the proposed method, but both degradations are small, and the hidden capacity of our method is 1.5 times that of [13]. The average deterioration rates of PESQ for the four sample categories were 9.84%, 8.04%, 4.01%, and 10.03%, respectively. The overall mean rate of deterioration was 7.98%, which is within an acceptable range. Thus, the algorithm meets the requirements of good concealment.
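The overall figure quoted above is consistent with an unweighted mean of the four reported deterioration rates, which can be checked as follows. That the mean is unweighted is an assumption, since the text does not state how it was computed.

```python
# The four PESQ deterioration rates (in percent) reported in the text.
rates = [9.84, 8.04, 4.01, 10.03]

mean_rate = sum(rates) / len(rates)   # assumption: plain arithmetic mean
print(round(mean_rate, 2))            # 7.98
```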

Objective Speech Quality. The objective speech quality evaluation uses the SNR, and the test results are shown in Fig. 3. Comparing the SNRs of the three steganography methods for the different sample sets shows that, in general, the higher the embedding rate of the secret information, the lower the SNR and the higher the quantization noise. For the different sample sets, the SNR of the steganography method in this paper was larger than that in [10] at equal concealment capacity, indicating that the quantization noise of our method is smaller than that of [10]. The SNR of the proposed steganography method was slightly smaller than the SNR of [13], but it should be taken into consideration that the concealment capacity of the proposed method was 1.5 times that of [13].
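The text does not state which SNR variant was used, so the following Python sketch assumes the conventional waveform SNR, with speech decoded without embedding as the reference and speech decoded after embedding as the degraded signal. Both the function name and the toy sample values are illustrative only.

```python
import math


def snr_db(reference, degraded):
    """Waveform SNR in dB: 10*log10(signal energy / error energy)."""
    signal = sum(s * s for s in reference)
    noise = sum((s - d) ** 2 for s, d in zip(reference, degraded))
    if noise == 0:
        return math.inf       # identical signals: no quantization noise
    return 10 * math.log10(signal / noise)


reference = [100, -100, 100, -100]   # toy samples without embedding
degraded = [99, -101, 101, -99]      # toy samples after embedding
print(round(snr_db(reference, degraded), 1))   # 40.0
```

A larger value means that embedding changed the decoded waveform less, which is how the comparison in Fig. 3 is read.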

5 Conclusions

This paper introduced the Sudoku matrix into a speech-carrier-based information concealment algorithm for the first time. To match the specific distribution characteristics of the speech coding coefficients, the two-dimensional Sudoku matrix was extended to a three-dimensional Sudoku matrix, which further enhances the concealment capacity while maintaining good perceptual speech quality.

The G.723.1 speech coding protocol was chosen as the basic method of information concealment. When the pitch period was estimated, the best index was mapped by using \(8\times 8\times 8\) Sudoku matrices. Using this method, the secret information was successfully embedded with an embedding capacity of 200 bit/s. Finally, a large number of speech samples were used to test and evaluate the proposed method and compare it to other methods found in the literature. The speech quality loss was evaluated based on the subjective speech quality perception criterion and the quantization error was evaluated by an objective SNR method. Results show that the proposed method can provide considerable concealment capacity with better imperceptibility.