Resistance of the double random phase encryption against various attacks.

Several attacks are proposed against the double random phase encryption scheme. These attacks are demonstrated on computer-generated ciphered images. The scheme is shown to be resistant against brute force attacks but susceptible to chosen and known plaintext attacks. In particular, we describe a technique to recover the exact keys with only two known plain images. We compare this technique to other attacks proposed in the literature.


Introduction
Over the past few decades, the inherent parallelism of optics has prompted a great deal of research on optical information processing. Among all the possible applications, a very promising direction is optical encryption [1]- [24]. With their pioneering work on double random phase (DRP) encoding, Réfrégier and Javidi [1] paved the way for many following proposals of optical security and encryption systems. This very DRP scheme has spawned several variation techniques and has been applied in many situations [11]- [24]. However only recently has the security of DRP started to be thoroughly analyzed and a few weaknesses have started to appear [25]- [29]. Carnicer et al. presented a chosen ciphertext attack where an attacker can retrieve the key by inducing a legitimate user in deciphering many specially crafted ciphered images [25]. Gopinathan et al. [27] and Peng et al. [28] described two techniques to recover the key knowing a single plain image and the corresponding ciphered image. In the first case the key is obtained through a simulated annealing process, while in the second case a phase retrieval algorithm is used. A variant of the DRP encryption is also attacked in [29]. In this paper, we comment on several possible attacks on the DRP encryption scheme and in particular we present a known plaintext attack that requires two known plain images [26], but that is able to retrieve the exact key rather than an approximation of it as in Refs. [27] and [28]. We demonstrate the feasibility of the proposed attacks using computer-generated images.
In section 2, we recall the principle of the double random phase encryption. In section 3 we consider brute force attacks to the system. In sections 4 and 5 we describe chosen plaintext attacks and known plaintext attacks respectively. Section 6 gives a discussion of the various attacks presented in this and other papers. Conclusions are presented in Section 7.

Overview of the double random phase encryption scheme
The original double random phase encryption technique was proposed in [1]. It is described in Fig. 1. The image to be encrypted, or plain image P, is immediately followed by a first random phase mask, which is the first key X. Both the image and the mask are located in the object focal plane of a first lens. In the image focal plane of this lens is therefore obtained the Fourier transform (FT) of the product P · X. This product is then multiplied by another random phase mask that is the second key Y . Lastly, another FT is performed by a second lens to return to the spatial domain. Since the last FT does not add anything to the security of the system, we will perform all our analyses in the Fourier plane. The ciphered image C is then where F stands for the Fourier transform operation. In most of the paper, we will assume that P is a gray-level image -the so-called amplitude encoding. However, the last attacks we will present also work with the phase encoding variant [12] where the image is pure phase (see Section 6).

Exact decryption
The first obvious attack against any encryption system consists of trying every possible key until finding the correct one. Assuming that both phase masks have a size of √ N × √ N pixels and that each pixel has L possible phase values, the number of attempts required to retrieve both keys is of the order of L 2N . For any practical values of L and N, this number is huge and it increases exponentially with the number of pixels N which makes this attack computationally intractable. For instance, let us fix L = 16 phase levels and N = 100 × 100 pixels. Then the number of keys to try would be 16 20000 ∼ 10 24000 , which is inconceivable. In practice, however, it is not always necessary to consider all of these combinations.
Thus, in the most common case of amplitude encoding [1], we do not need to know the first key X to decipher an image C. Indeed, if we know the second key Y , we can find where F −1 denotes the inverse Fourier transform and ÷ stands for the point-by-point division. Now because P is a pure magnitude function and X is a pure phase function, the plain image is easily recovered by where | | is the modulus. This property leads to the first reduced brute-force attack: Attack 1: Try every possible instance of the second key Y , while ignoring the first key X.
The number of trials required by this attack is L N instead of L 2N . This represents a great simplification since the number of combinations is reduced by a factor L N . In the case of the numerical example used above, the number of trials is 16 10000 ∼ 10 12000 .

Approximate decryption
In addition to attack 1, it is possible to look for an approximate version of the key Y instead of its exact value. The first possible simplification consists of reducing the number of phase levels, at the cost of a loss of quality in the decrypted image. For instance, we can look for keys with only two phase levels.

Attack 2:
Try only binary phase keys for the second key Y ; ignore the first key X.
The number of possible combinations is then 2 N instead of L N . For the example of section 3.1, the number of combinations to try would be 2 10000 ∼ 10 3000 . Figure 2 shows the result of decrypting two 100 × 100 images with various reductions of phase levels of the key Y . The original key has 16 phase levels but the original image is recognizable even with a binarized phase key. However, the fewer phase levels, the more noise is introduced in the reconstruction.
Another possible simplification is to retrieve a partial window of the key Y instead of the full key [25,26,29]. Since the pixels outside the window are then ignored, deciphering with the partial window amounts to applying a low-pass filter to the deciphered image. The details of the original image are thus lost but large features can be recognized.

Attack 3:
Brute force search of a partial window of the second key Y ; ignore the first key X.
If the window has a total of N r < N pixels, then the number of combinations to try is L N r instead of L N . Figure 3 presents decryptions of two 100 × 100 images using partial windows of the ciphered image with various sizes. In this case, retrieving a 30 × 30 window of the second key is sufficient to recognize the plain image, although the details are lost. The number of combinations is thus reduced to 16 900 ∼ 10 1080 .
Additionally, it is possible to combine Attacks 2 and 3 as follows: Attack 4: Brute force search of a partial window of the second key Y trying a small number of phase levels; ignore the first key X.
For instance, if we use only three phase levels then the number of combinations to try is 3 N r instead of the original L N . Figure 4 presents decrypted images using ternary phase levels for  the second key and various window sizes. It can be seen that a gray-level image with details deteriorates faster than a binary image with a low-level of detail. In the case of Fig. 4(e), the number of trials would be 3 900 ∼ 10 429 , while the image is still recognizable. Although that number represents a dramatic reduction from the original 10 24000 attempts, the number of trials remains huge. Therefore we can conclude that brute force attacks are intractable. However it should be kept in mind that the nominal security of the scheme is lower than one may naively expect.

Chosen plaintext attacks
We now assume that the attacker has the ability to trick a legitimate user of the system into encrypting particular images. In this case, as mentioned in [25] and [26], a very simple and yet very effective attack can be mounted:

Attack 5:
Obtain the ciphered image corresponding to a Dirac delta function as plain image.
In other terms, the plain image is entirely black except for a single pixel. If the non-black pixel is centered, it is straightforward to see from Eq. (1) that the resulting ciphered image is -to a constant phase factor -the second key Y [26]. If the non-black pixel is not centered, then the ciphered image is the second key Y multiplied by a linearly varying phase. Since this linear phase is known (it only depends on the position of the non-black pixel), it is possible to extract the key Y . Thus, the system can be easily broken by encrypting a single chosen plain image. It can be argued that such a plain image can look suspicious to the authorized user that is to encrypt it. A variant can therefore be used: Attack 6: Obtain and subtract the ciphered images corresponding to two plain images whose difference is a delta function. This attack is made possible because the encryption scheme is linear. Indeed, If P 1 , P 2 are two plain images with corresponding ciphered images C 1 , C 2 , and α, β are two scalar coefficients, it is easy to see from Eq. (1) that αP 1 + β P 2 is encrypted to αC 1 + βC 2 . In the case of Attack 6, if P 1 − P 2 = δ then C 1 − C 2 is the ciphered image of a delta function -that is the second key Y -as in Attack 5. In order to disguise the attack even further, it is possible to use an arbitrary number of plain images (instead of just two) for which a particular linear combination gives a delta function. It is then sufficient to compute the corresponding linear combination of the ciphered images to obtain the key.
Note that Attack 5 is not applicable in the case of the phase-encoded variant of the encryption scheme [12]. This is because it is impossible to obtain zero-valued pixels in this case, thus forbidding the formation of a delta function. However, Attack 6 remains possible. For instance, let us assume that amplitude images with values between 0 and 1 are encoded to phases between 0 and 2π. Now we choose amplitude Image 1 to be uniformly zero and amplitude Image 2 to also be uniformly zero except in the center (0,0) where it takes a value of 0.5. Then the corresponding phase encoded plain images will be P 1 that is uniformly 1 (phase 0) and P 2 that will be uniformly 1 except in the center where it takes a value of −1 (phase π). The difference P 1 − P 2 is therefore 0 at every pixel except in the center where it take the value 1 − (−1) = 2. Therefore we can see that P 1 − P 2 = 2δ and Attack 6 is applicable. It has to be noted that, since the plain images are phase coded, the knowledge of the first key X is necessary to complete the decryption process. Fortunately, this key can be recovered with a single additional plainciphered image pair.
Attack 7: Employ Attack 5 or 6, and further obtain the ciphered image corresponding to a uniform image as plain image; deduce the first key X.
Indeed, if P 3 is a uniform (spatially constant) image, with its corresponding ciphered image C 3 , then Eq. (2) gives and since P 3 is constant, the first key is directly recovered as To sum up, with at most three chosen plain-ciphered image pairs, it is possible to recover the two encryption keys and break the system.

Known plaintext attacks
We now assume that the attacker knows several plain images with their corresponding ciphered images. The plain images do not have to have very specific forms so they do not have to be chosen by the attacker.

Vectorial notation
For the following analysis, we consider that P, C, X and Y are discrete (pixelized) and spatially bounded. This is always the case in practice, given that even continuous images and phase keys can be adequately sampled. We therefore represent these four two-dimensional functions by four arrays. The total number of pixels of each array is N. We re-arrange each array into a vector with N components; this vector is formed by sequentially appending each column of the original array. The four vectors are called P, C, X and Y. The linearity of the encryption scheme (see Section 4) means that the transformation between the plain image vector P and the ciphered image vector C can be represented by a matrix-vector product as follows where T is the N × N transformation matrix that characterizes the encryption process.

Using a base of the vector space
As a linear operation in a vector space of dimension N, the encryption is fully defined by its operation on a base of the vector space (that is, a set of N linearly independent vectors).

Attack 8:
Obtain the ciphered images corresponding to N known linearly independent plain images; retrieve the key by matrix inversion.
An N × N matrix P can be formed, using as columns the vectors P 1 ,... P N that correspond to the N plain images. Similarly, an N × N matrix C can be formed with the vectors corresponding to the ciphered images. The process of encrypting all the plain images can be written as where T is the same encryption matrix as in Eq. (6). Now, P and C are known to the attacker, and P is invertible because its columns are linearly independent. So, the encryption matrix can be computed by with P −1 the inverse of P. At this stage the system is compromised because knowing the encryption matrix T is equivalent to knowing the keys. Indeed, from Eq. (6), we see that any unknown plain image P can be obtained from the ciphered image C by where T −1 is the inverse of T.
Note that the attack presented in this subsection is quite general and would work for any linear encryption technique [30,31]. In practice, this attack is not very useful because it requires one to know a huge number of plain images, for instance 10000 images if each image has 100 × 100 pixels.

Using two known images
Here we assume that the attacker knows two plain images P and P with their corresponding ciphered images C and C .

Attack 9:
Obtain the ciphered images corresponding to 2 known plain images and deduce the keys by solving a linear system of equations.
Let us express the encryption operation in terms of p i , p i , c i , c i , x i and y i , the elements of the vectors P, P , C, C , X and Y, respectively. For the first image, we can write where F i j is the matrix that characterizes the FT operation. Note that, although it operates on vectors, it does not represent a one-dimensional FT but rather a two-dimensional FT on the images before they are re-arranged in vector form. For the second image, we can write similarly Now, it is possible to eliminate the elements of the second key y i by cross-multiplying Eqs. (10) and (11). We get In Eq. (13), the only unknown variables are the components of the first key x j . We thus rewrite this equation as where S i j = F i j (c i p j − c i p j ) is known. Equation (14) represents a system of N linear equations with N unknown variables. This system has a trivial solution x i = 0 for every i. This solution is not the one we are looking for because we know that the first key is not zero. Therefore there has to be another solution to this system. This is only possible if the determinant of the matrix S is zero, that is if the N equations are not linearly independent. The interpretation of this property is that the keys are defined up to a constant phase factor. We can thus arbitrarily choose the value of one pixel of the key and this will fix the values of the rest of the pixels. We decide to fix x N = 1. We can also eliminate the last equation of the system, because it is a linear combination of the other equations. With these modifications, Eq. (14) is transformed to This new system of N − 1 equations with N − 1 variables can be solved by classical system solving techniques such as Gauss elimination or LU decomposition [32]. Of course, the resolution is only possible if the determinant of this new system is not zero. In practice, this condition is not very restrictive and it is easy to find two images for which it holds. Once the first key is known, the second key is easily retrieved using for instance Eq. (10) by An example of this attack is presented in Fig. 5. The known plain images P and P are shown in Figs. 5(a) and 5(b) respectively. From these two images and their corresponding ciphered images, the keys X and Y are found. These recovered keys are then used to decrypt the unknown ciphered image shown in Fig. 5(c). The result of the decryption is given in Fig. 5(d). As can be seen, the plain image is successfully decrypted. Note that all the variables involved in the attack can be complex-valued. The only restrictions are that all x i , all y i and a sufficient number of p i and p i be different from 0. This situation means in particular that p i can take complex values as happens in the phase-encoded encryption. The attack therefore works without any modification in this latter case. It would even be applicable to a mixed amplitude-phase encoding. Also, the encryption keys need not be pure phase to be recovered.

Effect of noise
In all the attacks described above, we have assumed that the ciphered images are known without error. This might not be the case in practice. In this section, we assume that the ciphered images are known up to an additive gaussian noise. We have already said that Attacks 1-4 are not practical. For Attack 5-7, it is not difficult to see that the error on the retrieved key is proportional to the noise on the ciphered images. Attacks 8 and 9 are more heavily affected by noise since the linear system resolutions involved in the key recovery process amplify the errors. Let us focus on the most interesting case of Attack 9. Figure 6(a) presents six plain images, of which the two leftmost are supposed to be known. The corresponding ciphered images are degraded by a 5% additive gaussian noise. We apply Eqs. (15) and (16) to recover the keys from the two known images, as in Section 5.3. Figure 6(b) shows the images decrypted with the recovered keys. It can be seen that, even with this low noise, the images are not recognizable, except maybe the ones that were used to compute the keys.
In order to more accurately retrieve the encryption keys in spite of the noise, we need to use more than two plain-ciphered image pairs. We now assume that we know four such pairs, although the process can be generalized to any number of pairs. The four known plain images P 1 , P 2 , P 3 , P 4 are the four leftmost images in Fig. 6(a). For any two pairs of known plainciphered images (P α ,C α ) and (P β ,C β ), we can obtain a system of equations identical to Eq. (14): where S respectively. Equation (17) can also be written in matrix form as where S αβ is the N × N matrix that contains the S αβ i j , X is a N × 1 vector that contains the unknown pixels of the first key, and the right-hand term is a N × 1 vector of zeros. Now we can associate the four known images in six different ways in order to obtain six different versions of Eq. (18) with varying values of α and β . These six equations, together with the condition x N = 1 that avoids the trivial solution X = 0 (see Section 5.3), can be combined into a single equation: ⎡ where the left-hand term is the product of a (6N + 1) × N matrix by a N × 1 vector, and the right-hand term is a (6N + 1) × 1 vector. Equation (19) defines an overdetermined system of equations. A least-square solution of this system can be found by classical techniques such a QR decomposition or singular value decomposition [32]. The obtained key X is still noisy; in particular it is not pure phase as it is supposed to be. We can enforce this property by normalizing the amplitude of each pixel: We now need to recover the second key Y from the first (normalized) keyX. Rather than a direct derivation from a single image using Eq. (16), it is more accurate to find again a least-square solution using the four known images. Since thex i are now known, Eq. (10) for image α can be rewritten as where Equation (21) can be written in the following matrix form where Λ α is a diagonal matrix given by The four known images provide four different versions of Eq. (23) that can be combined into a single equation: Again, Eq. (25) can be solved in a least-square sense by classical algorithms and the obtained Y can be normalized byŷ Figure 6(c) presents the images decrypted using the retrieved keysX andŶ . The results are now very satisfactory, including for the two rightmost images that were not used in the key recovery process. These results demonstrate that a good approximation of the keys was found. If the noise were stronger, it would be necessary to incorporate more known images in the key estimation process.

Discussion
This paper presents an analysis of the theoretical -rather than practical -security of the DRP technique. The various presented attacks assume an abstract version of the technique where all data is available in digital form. In a real optical implementation, the difficulty to manipulate and measure the involved physical quantities adds some challenges to the attacker.
We have shown that Attacks 1 through 4 are intractable, which makes the DRP encryption resistant to brute force attacks. Attack 8 requires a very large number of known plain images and is therefore quite impractical. On the other hand, Attacks 5 to 7 are surprisingly easy and effective if the attacker is able to choose images to be encrypted. If this possibility does not exist, Attack 9 is still very powerful as a known plain image attack which only requires two known images. The DRP technique must therefore be considered potentially insecure against chosen and known plaintext attacks. Its main weakness lies in its linearity.
The attacks presented in this paper have some advantages over other attacks proposed recently. For instance, the attack described in [29] requires access to an encryption machine and uses a large number of chosen plain images, which makes it impractical. Similarly, the technique in [25] requires access to a decryption machine and a large number of chosen ciphered images. In addition, the latter paper mentions an attack with a centered delta function as input image. In this paper we have shown that a non-centered delta function will also suffice, as will any set of innocuous images whose linear combination is a shifted delta function. This latter attack defeats the defense proposed by Carnicer et al. [25]. Further, we have shown that with the possibility of a second chosen image (an image with constant complex amplitude) the first phase mask can also be found, allowing complex-valued inputs to be decrypted.
The attacks presented in [27] and [28] use heuristic approaches and result in an approximated recovery of the keys and the decrypted images are quite noisy. Actually, Ref. [28] does not show the decryption of an unknown image. On the other hand, our Attacks 5 to 9 are based on a mathematical analysis of the DRP technique and they are able to recover the exact keys and therefore provide a perfect decryption of unknown images. Additionally, it is important to note that the attacks proposed in [25,27,28,29] only work for the case of amplitude-coded plain images. In our proposal, Attacks 6 through 9 are perfectly applicable to phase-encoded images so they can be used against the full phase DRP encryption [12].
A variant of the DRP encoding that uses keys in the Fresnel domain has been proposed in [14]. Optical encoding using keys in the Fresnel domain will be substantially more secure as any attack requires a search in 3D domain to locate the precise Fresnel location of the keys. Therefore, various attacks presented in the literature which use the linearity property will not be very practical. Also, other encryption techniques based on fractional Fourier transform have been proposed [8,16,18,23]. The most effective attacks presented in this paper are not directly applicable to these variant schemes using Fresnel domain keys and fractional Fourier transform. These variant schemes should therefore be preferred to the original DRP encryption.

Conclusion
In this paper we have described several ways to attack the double random phase encryption technique, some of which are impractical and other are very effective. We have shown that an exhaustive search of the key is generally intractable, even when applying some simplifications to reduce the number of combinations. However, we have presented chosen and known plaintext attacks that are able to efficiently recover the keys of the system. The most dangerous attack only requires two known plain images. We have studied the effects of noise on this attack and proposed a way to reduce them. We have also provided a comparison of the proposed attacks to other techniques described in the literature. Given the risks involved with the presented attacks, it is recommendable to be extremely cautious when using the double random phase encryption. Large keys of at least 1000 × 1000 pixels should be used and, if possible, it should be avoided to re-use the same keys for different images, as in a one-time pad approach. Also, a safer alternative to the original double random phase encryption is to use variant schemes such as keys in the Fresnel domain or fractional Fourier transform encryption.