Achieving Privacy-Preserving Iris Identification Via El Gamal

: Currently, many biometric systems maintain the user’s biometrics and templates in plaintext format, which brings great privacy risk to uses’ biometric information. Biometrics are unique and almost unchangeable, which means it is a great concern for users on whether their biometric information would be leaked. To address this issue, this paper proposes a confidential comparison algorithm for iris feature vectors with masks, and develops a privacy-preserving iris verification scheme based on the El Gamal encryption scheme. In our scheme, the multiplicative homomorphism of encrypted features is used to compare of iris features and their mask information. Also, this paper improves the Hamming distance of iris features, which makes the similarity matching work better than existing ones. Experimental results confirm the practicality of our proposed schemes in real world applications, that is, for the iris feature vectors and masks of 2048 bits, nearly 12 comparisons can be performed per second.


Introduction
With the rapid development of the Internet and e-commerce, more and more scenarios in daily life require identity authentication for users, for example, services provided in airports and banks, and applications on mobile phones, ATMs and online financial services. With traditional identity authentication methods, passwords are easy to be forgotten or lost, and 'mobile phone number + verification code' in the identity authentication approach can be easily intercepted. Therefore, some biometric identification technologies have been developed [Cimato, Gamassi and Piuri (2009)]. However, most of existing biometric identification technologies perform feature comparisons in the plaintext format. Once the biometric information is stolen by hackers or internal personals, it may cause serious consequences of large-scale user private information leakage. In fact, the risk of user's biometric information leakage is no less than losing traditional identification media, such as ID cards. Although some studies have been focused on addressing the security and privacy issues of biometric, existing biometric systems may confront more and more difficulties in protecting biometric information against hardware attacks and system security cracking technologies [Liu, Wang, Chaudhry et al. (2018)]. In existing works, it is difficult to give accurate definitions on the security and privacy protection on biometric information in biometric systems. Ratha et al. summarized eight attack points on the inter-module channel in biometric systems [Ratha, Connell, Bolle et al. (2006)]. These attacks can be divided into four types according to the locations of attacks [Parbhakar, Pankanti and Jain (2003); Menaria and Jain (2017)], that is, physical layer attacks, transport attacks against feature and template databases, storage attacks against features and template databases, and attacks on software modules. There are some typical patterns for the above mentioned attacks: (1) Forged biometric attack: The attacker may use fake biometric features, for example, plastic fingerprints, to access the system; (2) Replay attack: The attacker may steal biological information from the transmission channel, and use them in completing identification; (3) Coverage feature extractor: The attacker may replace the processing results of the feature extractor with custom templates; (4) Forge the feature vector: The transmitted data between the feature extractor and matcher may be replaced by fake features; (5) Rewrite the matcher: Make the matcher output the results according to the attacker's expectation; (6) Change the template: Some templates may be modified, deleted or added to the database; (7) Attack channel: The transmission information between the database and matcher may be changed using different templates; (8) Rewrite the final decision: The attacker may modify or disturb traditional binary biometric with success rate of more than 50%. Although the first type of attacks can be resisted by employing the living body detection technology, this technology is still imperfect. Also, some methods can resist the attacks in identifying the interior of system, for example, enhance the security of the channel, encrypt the data to prevent from theft, add time-stamps to the data to prevent from replay attacks, and install anti-trojan and virus security software for the template database to ensure that the database cannot be tampered with. Unauthorized access to biometric templates is generally considered to be the greatest threat to the user's data security. Therefore, the protection of biometric templates becomes a key issue, where biometric template protection schemes should satisfy the following properties: (1) Renewability: It should be possible to revoke the leaked biometric templates and generate new biometric templates based on the same biometric data; (2) Diversity: The regenerated biometric templates should not match with the revoked template from the same biometric; (3) Security: It should be impossible to obtain raw biometric data from biometric templates, or at least the computation should be difficult; (4) The template storage scheme should not cause significant decrease in the biometric recognition rate, such as the error rejection rate and the error acceptance rate. Due to some factors such as the collection environment, the biometric characteristics of the same person will be slightly different in each acquisition. Thus, researchers have proposed a variety of methods for biometric template protection, which can be divided into the following categories: (1) Template protection method based on feature transformation [Nanni and Lumini (2008)]. This method requires a function to transform the biometric or template and match it in the transform domain. The transformation function should be invertible or irreversible. Note that a method based on reversible transformation will result in a lower error reception rate, however, when the parameters of transformation function are leaked, user templates would not safe due to the reversibility of the transformation. On the contrary, for irreversible transformation, even if the transformed parameters are leaked, it is difficult to deduce too much information about biometric templates.
(2) Template protection method based on biometric encryption. This method stores the encrypted biometric data as templates. According to different key sources, this category of template protection methods can be further divided into two types, that is, key generation methods and key binding methods.
(3) Template protection method based on secret sharing. The idea of secret sharing is to split the secrets in an appropriate way, where shares are managed by different participants, and only participants in the authorized sets can collaborate to recover the secret message. Thus, if the collected biometrics (plaintext or ciphertext) can be split and stored in multiple servers, the data leakage issue in the single server scenario can be effectively avoided, and the system can be disaster-tolerant.
(4) Template protection method based on homomorphic encryption [Brakerski and Vaikuntanathan (2014)]. With this method, the obtained iris feature template is protected in ciphertext format by a homomorphic encryption scheme.

Our contributions
To address the above mentioned issues, this paper proposes a privacy-preserving iris identification scheme. The iris feature template is processed as an image, then the output is treated as a binary vector for random number substitution to obtain a set of twodimensional vectors. These vectors are further dealt with the El Gamal encryption scheme to obtain the replaced iris information. The experiment uses the iris feature templates in CASIA database (https://download.csdn.net). An improved non-secure alignment algorithm was used to encrypt the iris feature template, which matched through similarity calculation. The simulation of iris recognition confirms that the performance of our scheme outperforms the traditional method. Therefore, it is practical to be deployed in small-scale iris recognition applications.

Homomorphic encryption
Homomorphic encryption was proposed by Rivest et al. [Rivest, Adleman and Dertouzos (1978)]. Suppose the domain of a function is set , the value field is , and the corresponding law is . If for 1 , 2 ∈ { } , ( 1 ⊕ 2 ) = ( 1 ) ⊕ ( 2 ) holds, then the law satisfies the homomorphism. In cryptography, it can be expressed as follows: If the encryption algorithm , then it is called multiplicative homomorphism. A semi-homomorphic encryption scheme only satisfies either addition homomorphism or multiplication homomorphism, whereas both are satisfied in a fully encryption scheme. Note that the El Gamal scheme [Xie (2014)] only satisfies the multiplicative homomorphism, which means it is a semi-homomorphic scheme.

Template matching
Template matching is usually performed according to their similarity [Zhang (2001)].
There are three commonly used methods for calculating similarity. In the following, suppose and represent two-dimensional vectors.

=1
. Although Euclidean distance calculation is simple and efficient, it has obvious disadvantages, for example, it treats different attributes in the processing object equally and may sometimes fails to meet actual needs. Block distance: The matching formula is defined as . This method is generally applied to the distance between two points in the plane, which is simpler than the Euclidean distance's calculation. However, it cannot be applied to binary strings. Hamming distance: The matching formula is defined as . Hamming distance is generally applied to the fields of information theory, coding theory, cryptography, etc., and it is applicable to binary strings. Due to its preferable properties, this method is employed in the template matching algorithm in this paper. Evaluation indicators: The evaluation index in the commonly used iris recognition system [Ma, Tan and Wang (2004)] reflects the rejection rate. The false rejection rate, also known as the rejection rate, represents the probability that a legitimate user is considered as an illegal user. The false acceptance rate, also known as the false positive rate, indicates the probability that an illegal user is considered as a legitimate user. The rejection rate can be divided into error acceptance rate (FAR), error rejection rate (FRR), error rate (EER), and overall error rate (TER), where TER is the sum of FAR and FRR. The equal error rate means that FRR and FAR are the same after the wait threshold is determined. By taking different thresholds, FRR will be different from FAR, and usually their variation is inversely proportional.

Iris recognition general process
The iris recognition process mainly includes image collection, image processing, feature encoding, feature matching, and conclusion. In this paper, we mainly focus on the feature coding and the feature matching procedures. To guarantee the privacy, the El Gamal scheme is used to encrypt the encoded iris information, and then the improved Hamming distance is used for feature matching in feature matching [Lv, Lv and Zhao (2017)]. Note that a threshold should be set at the initiation phase. If the Hamming distance is less than the pre-defined threshold, the recognition passes, otherwise it fails. The general process of iris recognition is shown in Fig. 1. (1) Extracting iris region: For the input image, first convert into a grayscale one, then employ the Canny operator to perform edge detection to obtain the inner edge of Iris, and followed by a Hough transform for detecting the outer edge of iris.
(2) Coordinate transformation: Note that the iris area extracted in Step (1) is ring-shaped, whereas the general image is rectangular. Thus, to facilitate the subsequent procedures, the obtained iris image is subjected to polar coordinate transformation and satisfies r = � 2 + 2 and = . In this way, the ring-shaped iris area is converted into a rectangle one.
(3) Normalization: Since the texture information of iris is affected by eyelids, eyelashes, etc., the closer to the outer edge, the more susceptible it is to interference. Therefore, the part close to the inner edge of iris is selected as the image containing the iris texture information. The obtained image contains both rich iris texture information and the amount of calculation. Finally, the rectangular area is normalized to 250×20 pixels.

Iris feature extraction
The iris image obtained above is subjected to feature extraction using a Gabor filter [Panchal and Samanta (2016); Li, Sun and Tan (2016)]. The iris feature is in fact the texture information. The difference between the iris signals is caused by different texture information. Therefore, the iris signal is the description of texture information, and the Gabor filter is the core algorithm proposed by Duangrna [Daugman (2003)] to extract the iris feature. In this paper, this classical algorithm is employed to extract iris features using Gabor filtering.

Privacy-preserving comparison
Since the iris feature is a binary vector after encoding, the Hamming distance is employed in performing template matching. The template matching method described above is directly operated on the iris texture information, which would result in information leakage. We propose an improved method of template matching. The extracted iris feature template is denoted by ( , ), and the template to be matched is ( , ), where , represent the row and column of the template matrix, respectively. In the iris feature extraction process, there are interference factors such as eyelids and eyelashes. These points are used as interference points when extracting features and encoding. When the iris is segmented, the pixels of these interference points are marked and generated. Iris feature template has the same size of interference template, they are written as ( , ) and ( , ). The mask is generated to remove interference points and improve the matching degree. If the extracted feature points are interference points, the feature points of ( , ) and ( , ) are recorded as 1 at the corresponding positions, otherwise they are recorded as 0. In this paper, an improved Hamming distance formula is defined as follows: (1)

Our method
Since the randomness of information encryption will affect the quality of encryption, our approach can be divided into the following three schemes with different randomness.

Scheme 1
(1) Mapping Randomly take two fixed prime numbers , , and use them to transform the obtained iris feature templates ( , ) and ( , ) as follows: (2) Encryption The vector information obtained in Step (1) is encrypted by using the encryption algorithm described in Section 2.2, where represents and in Step (1), and the encrypted , , and are represented as ( ), ( ), ( ) and ( ), respectively. (3) Matching Perform similarity analysis using the secret comparison algorithm described in Section 3.4. Replace , , and in the formula by ( ) , ( ) , ( ) and ( ) , respectively. That is, the original formula is converted to the following: (4) By calculating the relationship between the Hamming distance value and the pre-defined threshold τ, if HD < τ, it succeeds to pass, otherwise it fails.

Scheme 2
The difference between Scheme 2 and Scheme 1 is that the substitution formula is changed in Step (1). Specifically, for each element ( , ), two prime numbers , are randomly chosen, which are used to transform the obtained iris feature templates ( , ) and ( , ) as follows: The other two steps are the same as in Scheme 1. In fact, by adding more randomness on the basis of Scheme 1, it could provide more security for the iris feature template.

Scheme 3
This scheme introduces the randomness to the replacement, in this way to improve the security of Scheme 2. The following changes are made to Scheme 2. For each element ( , ), two prime numbers , are randomly chosen, a prime number is picked, and compute the iris feature template ( , ) and ( , ) as follows: The other steps are exactly the same as in Scheme 1.
Similarly, we have the following correctness analysis of Scheme 2 and Scheme 3.

Experimental analysis
Our experiments are conducted on WIN7 Ultimate operating system using the C programming language and MATLAB, where the CASIA iris database is used. Since Scheme 2 and Scheme 3 are developed by introducing more randomness to Scheme 1, they can enhance the security of iris information with Scheme 1. The more randomness, the lower computational efficiency. Therefore, we focus on analyzing the practicality of Scheme 3 with the best efficiency. As shown in Tab. 1 and Tab. 2, the iris identification achieves optimal effect when the Hamming distance HD=0.3. Thus, in the experiment, the threshold is taken as 0.3. Tab. 1 and Tab. 2 show the relationship between the threshold and FRR, FAR for the plaintext state and the ciphertext state, respectively.  According to the experiments, the selected threshold value directly affects the running timings of our Schemes. Fig. 2 and Fig. 3 demonstrate the running timing of our scheme under different thresholds for the plaintext state and the ciphertext state, respectively. From these figures, it is easy to see that when the threshold is 0.3, the proposed scheme takes nearly the most computing time. However, this threshold implies nearly the optimal recognition effect. When the length of iris feature vectors and masks is 2048 bits, the proposed privacy-preserving comparison algorithm can complete 12 iris comparisons per second. Thus, our solution is practical and secure when applied in small-scale iris recognition scenarios.

Conclusion
In this paper, a privacy-preserving iris feature vector comparison algorithm with mask is designed, where the iris feature information and the mask are encrypted by the El Gamal scheme. This paper also proposes an improved Hamming distance formula. With the multiplicative homomorphism of El Gamal scheme, the comparison of iris feature and Time\ms threshold mask information can be conducted in ciphertext format without leaking their privacy. Extensive experimental results demonstrate that the proposed scheme achieves preferable recognition effects and enjoys high efficiency.