Iris image acquisition and real-time detection system using convolutional neural network

The aim is to further improve the efficiency of iris detection and to ensure real-time iris data acquisition. Building on an existing iris data acquisition and detection system, a light field refocusing algorithm is used to collect data in real time, and a DL (Deep Learning) CNN (Convolutional Neural Network) is introduced. On this basis, an iris image acquisition and real-time detection system based on CNN is proposed, and the system for image acquisition, processing, and display is constructed on an FPGA (Field Programmable Gate Array). A spatial filtering algorithm is used to compare the performance of the proposed bilateral filters with that of common filters. The results indicate that the proposed bilateral filters can pick out qualified iris images in real time, greatly improving the accuracy of the iris image recognition system. The average time for real-time quality assessment of each frame is less than 0.05 seconds. The classification accuracy of the DL-based iris image quality assessment algorithm is 96.38%, higher than that of the other two algorithms, and its average classification error rate is 3.69%, lower than the average error rate of the other algorithms. The results can provide a reference for real-time iris image detection and data acquisition.

In the traditional image acquisition method, the analog video signal of the CCD (Charge Coupled Device) camera is converted into a digital signal by an A/D (Analog to Digital) converter, stored on the image acquisition card, and then sent to the computer. This method is the most common and mature, but it also has shortcomings [6]. Firstly, the output of the CCD camera has been converted to the analog NTSC (National Television System Committee) or PAL (Phase Alternating Line) format and is output through S-Video (Separate Video) or a composite video signal. As a result, the acquisition card cannot sample every pixel of the camera, causing loss of digital information and low resolution [7]. Secondly, the HW (Hardware) circuit is complex and costly, hindering its popularization. With the development of IT, more and more algorithms can be implemented in HW. At the same time, owing to the specialization of production lines, images generated or transmitted in the same environment contain similar and stable noise types, which promotes the use of ASICs (Application Specific Integrated Circuits) [8]. However, because IPAs (Image Processing Algorithms) are complex and diverse, only one structure can be adopted in a given system, limiting the application of IPAs. Given the characteristics of its operation structure, an FPGA (Field Programmable Gate Array) system is the optimal choice for image preprocessing. Complex IPAs can be implemented on an FPGA system with a small number of chips and simple peripheral circuits. Moreover, images of different sizes and gradations can be processed on the same FPGA chips through parameter adjustment, so the FPGA system is very flexible [9]. At present, FPGAs are widespread in IP (Image Processing). Consequently, a system for iris image acquisition and real-time detection is proposed based on the FPGA system. The iris images are acquired in real time through external equipment and the light field refocusing algorithm, and then they are detected in real time through the DL (Deep Learning) CNN (Convolutional Neural Network) and the light field refocusing algorithm based on the FT (Fourier Transform). The effectiveness of the proposed model is verified through comparison with related algorithms. The results can promote the development of the iris recognition industry.

Research on iris detection and recognition
The analysis of related works on iris detection and recognition shows that both traditional methods and current DL methods mostly focus on feature extraction and recognition of iris images. Meanwhile, high-quality images are required in this research; that is, the images are often filtered and labeled, similar to the ImageNet dataset commonly used in image classification tasks. However, compared with natural sample image sets such as ImageNet, large open-source datasets are rare in the biological field, especially in iris image research. The available open-source iris datasets are too small to learn iris features and build identity-iris feature recognition libraries. Ahmadi et al. (2017) pointed out that in iris image research, to obtain voluminous iris image features, independent image acquisition and filtering steps should be accomplished before laboratory experiments, and sufficient high-quality iris images should be prepared for the analysis of recognition methods [10]. The quality of self-collected iris images should be filtered, and their authenticity should be judged, ensuring the effectiveness and efficiency of follow-up work and preventing information theft at the source as far as possible. Biswas et al. (2017) proposed an iris recognition and detection model for measurement. The analysis of the results showed that the accuracy of the model, as high as 92%, was better than that of the traditional iris detection system [11]. A general detection framework with an enhanced algorithm was also proposed, in which the iris data processing ability was improved through the enhanced learning algorithm. The case analysis proved that the method had good practicability and achieved optimal performance [12]. Jayanthi et al. (2020) proposed an integrated model based on effective DL for accurate iris detection, segmentation, and recognition. The CNN was adopted for data acquisition, and the final monitoring accuracy reached 99.14% [13]. Agarwal et al. (2020) proposed a novel and proficient feature descriptor based on local binary patterns for pseudo iris detection, and the proposed model showed high performance across different data dimensions [14].

Research on real-time image detection
Voluminous image data have restricted the performance of IPAs, which cost much time to process. Thus, an IPA with a better cost-effectiveness ratio is badly needed. Sitzmann et al. (2018) optimized the IPA to enhance the speed of image preprocessing. Although this approach is feasible, preprocessing algorithms are already quite mature, and their complexity and speed can hardly be improved further while maintaining high accuracy [15]. Bresilla et al. (2019) constructed a real-time image system for data acquisition based on a DL CNN with single-stage detectors. The results indicated that the processing speed of the system was higher than 20 FPS, fast enough for any grasping manipulator or other real-time implementations [16]. Shah et al. (2020) suggested exploiting the parallel characteristics of preprocessing algorithms on the processed objects and performing the same operations on the relevant pixels in the image. To increase the processing speed and process data in real time, a high-speed processor should be introduced for digital image processing with massive, simple, and repetitive operations [17]. Thus, the implementation of the algorithm can be changed: the massive, simple, and repetitive operations can be mapped to the HW circuit for parallel calculation, which is what HW is good at. Besides, parallel calculation is substantially faster than serial calculation, thereby increasing the performance of the computer system.
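The idea of mapping identical per-pixel operations onto parallel computation can be illustrated in software. The sketch below is a hypothetical example, not part of the paper's system: it computes a 3 × 3 mean filter once with serial per-pixel loops and once with whole-array operations, the latter mirroring how parallel hardware applies the same operation to many pixels at once.

```python
import numpy as np

def mean3x3_loop(img):
    """3x3 mean filter via serial per-pixel loops (the slow, serial style)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y - 1, x - 1] = img[y - 1:y + 2, x - 1:x + 2].mean()
    return out

def mean3x3_vec(img):
    """Same filter via whole-array operations: sum nine shifted views, divide once.
    Each of the nine additions touches every output pixel simultaneously,
    analogous to parallel hardware."""
    h, w = img.shape
    s = sum(img[dy:h - 2 + dy, dx:w - 2 + dx]
            for dy in range(3) for dx in range(3))
    return s / 9.0
```

Both functions produce identical results; only the execution structure differs, which is exactly the transformation described above when repetitive operations are moved into parallel HW.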

Summary of related works
Many studies have been conducted and some achievements have been made in iris recognition, but balancing recognition speed and accuracy remains hard; concretely, either speed or accuracy has to be sacrificed to improve the other. Speed and accuracy have become the bottleneck of iris recognition, which can be addressed through the quality evaluation of iris images and the improvement of the speed and accuracy of preprocessing in automatic iris recognition. Moreover, this can improve the automation of the iris acquisition system.

Overall acquisition and detection system
The real-time acquisition of iris data is based on the FPGA system. FPGA is a semi-custom circuit within ASIC technology; it is a PLA (Programmable Logic Array)-style device that can be applied in devices with few gate circuits. The basic structure of an FPGA includes programmable input and output units, configurable logic blocks, a digital clock management module, embedded block RAM, wiring resources, embedded special hard cores, and underlying embedded functional units. FPGA use has expanded in digital circuit design because of its rich wiring resources, high repeatability, high integration, and low investment [18]. The system structure is designed based on the actual situation, once the performance of the main control FPGA chip, the image sensor, and the DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory) is understood. Here, the Altera Cyclone IV series FPGA EP4CE15F17C8N chip controls the whole system, the CMOS (Complementary Metal-Oxide-Semiconductor) digital image sensor OV7725 acquires the data, and SDRAM stores the data [19]. The structure of the system is shown in figure 1. (1) Interface design: Here, the OmniVision OV7725 CMOS camera is the image acquisition sensor. The sensor can switch between various working modes to output different frame rates and pixel frequencies and can be configured to output three digital image signal formats. The image data are acquired through the image sensor, and digital image data in RGB565 format are obtained by the FPGA configuring the timing parameters of the data transmission bus.

(2) Image acquisition: To output images with 24Mhz pixel frequency, 640 * 480 resolution, and RGB565 format, the image sensor should be initialized.Firstly, the SCCB (Serial Camera Control Bus) protocol based on FPGA is realized.According to the SCL (System Clock Line) clock, the registered address is written through the bus interface SDA (Serial Data), and the parameters of the corresponding working modes are written.Finally, the above image data D [9:0] is acquired.The horizontal synchronized signal HREF (Horizontal Reference) and the vertical synchronization signal VSYNC (Vertical Sync) of the image are obtained while the image data are output.The DL CNN can acquire the iris data in real-time.
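The register-write traffic on the SCCB bus can be sketched in software. The following is a minimal, hypothetical model (not the paper's FPGA implementation): an SCCB 3-phase write transmission carries the device write address, the register sub-address, and the data byte, each as 8 data bits MSB-first followed by a ninth don't-care bit. The specific register address and value below are illustrative only.

```python
def sccb_write_frame(dev_id, reg_addr, value):
    """Serialize one SCCB 3-phase write (device ID, sub-address, data) into a
    flat bit list. Each phase is 8 data bits MSB-first plus one don't-care bit
    (modeled here as 0). Start/stop conditions live outside this frame."""
    bits = []
    for byte in (dev_id, reg_addr, value):
        bits += [(byte >> i) & 1 for i in range(7, -1, -1)]  # 8 bits, MSB first
        bits.append(0)  # 9th (don't-care) bit of the phase
    return bits

# Illustrative: write a hypothetical register 0x12 with value 0x46 to a sensor
# whose write address is assumed to be 0x42.
frame = sccb_write_frame(0x42, 0x12, 0x46)
```

In an FPGA design, a small state machine would shift these bits out on SDA on the appropriate SCL edges; the model above only captures the payload ordering.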
(3) Data read and write: Every system contains memory devices, without which large amounts of data cannot be handled. SDRAM should be connected to the other modules through a control module to avoid complex timing control and to facilitate SDRAM usage. Consequently, an SDRAM control module is introduced, which can read and write data to SDRAM through simple control signals.

Real-time iris image acquisition
To acquire real-time iris images, the camera is connected to the PC through the USB interface, and real-time acquisition SF (Software) captures the iris images. First, the candidates are required to fixate on the lens at a distance of about eight centimeters. Then, when the shoot button is pressed, the real-time image is transmitted to the PC and displayed. Concurrently, a green frame appears in the middle of the lens view, which zooms as the candidate approaches or retreats, and the system automatically evaluates the quality of each captured frame. If a detected frame meets the requirements of the image quality assessment, the frame image is cached and the acquisition is interrupted for subsequent processing. If the frame is not qualified, image acquisition continues until an iris image meets the requirements. The process is shown in figure 2. The depth of the camera focus affects the change of square projection information in ray space. The x-plane in traditional cameras is fixed, but the imaging plane of the camera can be moved; changing the distance between the imaging plane and the lens plane brings the camera into focus at different depths [20]. At the same time, when the depth position of the camera focus changes, the rays in ray space take on different slopes. Generally speaking, when the imaging plane moves away from the lens, the focal plane approaches the camera's main lens plane and the resulting slope is positive; when the imaging plane approaches the lens, the focal plane moves away from the camera's main lens plane, resulting in a negative slope. After the imaging plane is moved, the slope of the band in ray space represents the moving direction of the focal point relative to the x imaging plane [21]. The ray space diagram shows that the relative direction of the convergence point's movement and the slope value in the corresponding ray space depend on the distance between the convergence point and the x-plane. The depth of the camera focus affects the distance between the lens plane and the imaging plane, so the integration trace for calculating the focused image in ray space also shifts. According to the similar triangle theorem, the imaging light field $L_F$ parameterized on the lens plane $u$ and the x-plane and the light field $L_{F_1}$ after moving the imaging plane have the following relationship:

$$L_{F_1}(x, u) = L_F\left(u + \frac{x - u}{\alpha},\; u\right) \qquad (1)$$

The pixel value $E_{(\alpha \cdot F)}(x, y)$ of the image focused on the plane $F_1 = \alpha \cdot F$ is calculated as follows:

$$E_{(\alpha \cdot F)}(x, y) = \frac{1}{\alpha^2 F^2} \iint L_F\left(u + \frac{x - u}{\alpha},\; v + \frac{y - v}{\alpha},\; u,\; v\right) \mathrm{d}u\, \mathrm{d}v \qquad (2)$$
α in the above equation refers to the depth scale factor of the virtual imaging plane.
In the light field camera, the change of the camera focus position will not affect the spatial map of the rays; however, it will cause the vertical shift of the integration grid of the calculated image pixels.
When calculating the final generated image of the light field, image synthesis needs to be considered as a process of imaging simulation by a virtual camera. The camera pixel value can be obtained by integration along the oblique line. Ray tracing technology is used to track the radiance of each ray in the beam: geometric optics is used to trace the ray first through the main lens, then through the microlens array, until the beam reaches the surface of the image sensor, yielding the ideal rays. The image of the sub-aperture system in the camera is represented by $L_F(u, v)$, and the pixel at position $(x, y)$ can be recalculated from the sub-aperture image value $L_F(u, v)(x, y)$ as follows:

$$E(x, y) = \frac{1}{F^2} \iint L_F(u, v)(x, y)\, \mathrm{d}u\, \mathrm{d}v \qquad (3)$$

The above equation can be used to complete the refocusing process of the light field.
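The spatial-domain refocusing integral amounts to shifting each sub-aperture image in proportion to its aperture coordinate and then averaging. The following minimal numpy sketch illustrates this shift-and-add scheme on a toy four-dimensional light field `L[u, v, x, y]`; the array layout, the integer-pixel rounding, and the centered aperture coordinates are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def refocus_shift_and_add(L, alpha):
    """Refocus a 4-D light field L[u, v, x, y] for depth scale factor alpha.

    Each sub-aperture image L[u, v] is shifted by (1 - 1/alpha) times its
    (centered) aperture coordinate, following the spatial-domain refocusing
    integral, then all shifted images are averaged. Shifts are rounded to
    whole pixels in this sketch; np.roll wraps at the borders.
    """
    nu, nv, nx, ny = L.shape
    out = np.zeros((nx, ny))
    for u in range(nu):
        for v in range(nv):
            du = int(round((1 - 1 / alpha) * (u - nu // 2)))
            dv = int(round((1 - 1 / alpha) * (v - nv // 2)))
            out += np.roll(np.roll(L[u, v], du, axis=0), dv, axis=1)
    return out / (nu * nv)
```

With alpha = 1 the shifts vanish and the result reduces to the plain average of all sub-aperture images, i.e. the image focused at the original plane.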

Real-time iris image detection
Here, the DL NN (Neural Network) can detect and acquire iris images in real time. DL can learn the inherent laws of data through deep structures, enabling the model to understand and interpret text, image, and acoustic data. As a result, a model based on a DL NN can analyze and learn like the human brain, helping people recognize, understand, and judge text, image, and acoustic data. The CNN algorithm is one of the main methods of DL and is currently the most widely used method in IP. Meanwhile, CNN is also one of the prominent network applications and a key branch of the NN family. Various applications on mobile intelligent devices, from everyday life to production, are based on CNNs, showing good performance and facilitating people's lives.
The original iris acquisition data are unlabeled. Here, the iris images are classified according to availability and authenticity. An unsupervised clustering method is added as an auxiliary module to classify the unlabeled iris images. The purpose is to determine the availability of unlabeled iris images through the clustering method after network training and learning. The real-time acquisition and analysis structure of iris images is shown in figure 3. The proposed model analyzes image data from two aspects: first, the availability of the iris images is analyzed; second, their authenticity is analyzed. The structure and parameters of the first analysis are the input of the second analysis and are the core of the whole network.
Figure 4 shows that the preprocessed iris images are fed into the VGG16 (Visual Geometry Group) network to analyze their availability, and the features extracted by different convolution layers are input into the unsupervised k-means clustering method. The extracted features are clustered according to a pre-set number of clusters. The clustering results serve as pseudo labels for the corresponding iris images and are compared with the pre-set dataset labels. The degree of difference is the target loss that the network learns to reduce, and the weights of the network are continually updated through BP (Backpropagation) training. Finally, the availability detection model is trained. Then, the authenticity of the iris images is detected. Using the trained VGG16 network structure obtained from the availability detection, the preprocessed original iris images and the simulated pseudo iris images are introduced. First, the image features are extracted through the optimal structure layer of the trained VGG16 network; then, the extracted features are fed into an RNN (Recurrent Neural Network) LSTM (Long Short-Term Memory) structure for training and learning. Consequently, the LSTM network parameters are optimized, and the authenticity detection model is trained.

  FIGURE 4 Real-time iris image detection structure based on CNN
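The pseudo-labeling step described above clusters extracted feature vectors and uses the cluster indices as labels. A minimal numpy-only k-means sketch is shown below; it is an illustration of the clustering module, with random initialization and a fixed iteration count as simplifying assumptions, not the paper's exact configuration.

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=20, seed=0):
    """Cluster feature vectors with plain k-means and return the cluster
    indices, which serve as pseudo labels for unlabeled iris images."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct random samples.
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Distance of every sample to every center, then nearest-center labels.
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels
```

In the full pipeline the `features` would come from an intermediate VGG16 layer, and the disagreement between these pseudo labels and the pre-set dataset labels would drive the backpropagation loss.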
In the process of image generation and digital imaging, Fourier theory provides a different processing angle, and the problem of light field refocusing can be handled according to Fourier transform theory. Compared with the spatial-domain refocusing algorithm, the Fourier-based refocusing algorithm is faster [22]. The core of the frequency-domain refocusing algorithm is the Fourier slice theorem [23]. The Fourier slice theorem is usually applied to two-dimensional functions. To adapt it to the multidimensional data of light field cameras, the theorem must be extended so that the four-dimensional light field data can be reduced in dimension and processed. The two-dimensional Fourier slice theorem states that the Fourier transform of the integral projection of a two-dimensional function onto one dimension equals a one-dimensional slice of the two-dimensional function's Fourier transform. If the two-dimensional Fourier transform of the function $f(x_1, x_2)$ is $F(\omega_1, \omega_2)$, the projection in the $x_2$ direction onto the $x_1$ axis is the integral of the two-dimensional function over $x_2$. If the Fourier transform of this projection is $P(\omega)$, the Fourier slice theorem gives:

$$P(\omega) = F(\omega, 0) \qquad (4)$$

The two-dimensional Fourier slice theorem can then be generalized. If $f$ is an N-dimensional function, it is first coordinate-transformed; the transformed function is integrally projected down to M dimensions and Fourier transformed; then the coordinates are inversely transformed, and the final M-dimensional result is a slice of the N-dimensional Fourier transform, which can be expressed by equation (5). Due to the limited size of the light field camera's sensor, combined with the characteristics of discrete sampling, the light field signal is represented from the angle of a periodically continuous signal, and discrete sampling is performed. Therefore, a refocusing method based on the fractional Fourier transform is studied.
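The two-dimensional slice relation can be checked numerically with the discrete Fourier transform. The sketch below is an illustration of the theorem itself, not part of the paper's system: projecting a 2-D array along $x_2$ and taking a 1-D FFT reproduces the $\omega_2 = 0$ slice of the 2-D FFT.

```python
import numpy as np

# 2-D Fourier slice theorem, checked with the DFT: the 1-D FFT of the
# integral projection along x2 equals the omega2 = 0 slice of the 2-D FFT.
rng = np.random.default_rng(1)
f = rng.standard_normal((64, 64))

projection = f.sum(axis=1)          # integral projection along x2
slice_1d = np.fft.fft(projection)   # 1-D Fourier transform of the projection
slice_2d = np.fft.fft2(f)[:, 0]     # omega2 = 0 slice of the 2-D transform

assert np.allclose(slice_1d, slice_2d)
```

This identity is what makes frequency-domain refocusing cheap: instead of integrating the 4-D light field for every refocus depth, one 4-D Fourier transform is computed once and each refocused image is obtained by extracting a 2-D slice and inverse transforming it.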
Refocusing algorithm for the aperiodic continuous light field: the light field is regarded as an aperiodic continuous signal, and the biplane parameterization is used to represent the four-dimensional light field data. The plane coordinates of the primary lens are expressed as u = (u, v), the plane coordinates of the microlens array as s = (s, t), and the refocused imaging plane as x = (x, y). F′ and F are the distance from the primary lens to the refocused image plane and the distance from the primary lens to the microlens array plane, respectively. With $\alpha = F'/F$, the refocusing equation in the spatial domain can be expressed as follows:

$$E_{F'}(\mathbf{x}) = \frac{1}{\alpha^2 F^2} \iint L_F\left(\mathbf{u} + \frac{\mathbf{x} - \mathbf{u}}{\alpha},\; \mathbf{u}\right) \mathrm{d}\mathbf{u} \qquad (6)$$


The light field signal is now regarded as a periodically continuous signal, so the light field and the refocusing process can be expressed in terms of periodic continuous signals. The periods of s, x, and u are expressed as Ts, Tx, and Tu. Within one period, the value ranges of s, x, and u are [-Ts/2, Ts/2], [-Tx/2, Tx/2], and [-Tu/2, Tu/2], respectively, the relevant angles being those between the primary lens and the refocused imaging plane and between the primary lens and the microlens array plane. The imaging equation of the camera can then be expressed as equation (8). Light field refocusing algorithm based on the fractional Fourier transform: each dimension of the light field acquisition function is discretized. Then x = x1·∆x, where x1 is an integer with value range [-nx, nx]. The interval ∆x equals the ratio of Tx to Nx, where Tx is the period and Nx is the number of points in a period; the positions of y, u, and v in the four-dimensional space are expressed similarly. When |α∆s| ≥ |(1-α)∆u|, then ∆s = ∆t and ∆u = ∆v; when the two sets of values are not equal, equation (9) is obtained. According to the convolutional representation of the fractional Fourier transform, the discrete Fourier transform equation based on periodic continuous signals can then be expressed accordingly. In the resulting equation, Ωx0 refers to the fundamental angular frequency, whose value equals 2π/Tx; similarly, the value of Ωu0 is 2π/Tu.

Data processing and analysis
In research and experiments using neural network methods, the first and most important problem is the training data. For a neural network structure in particular, it is necessary not only to provide the data labels required by supervised learning but also to ensure the quantity and diversity of the data [24]. The purpose of the data labels is to guide the parameter changes during network learning and training and to improve the fit between the network structure and the data, so as to realize the final classification and detection of the unlabeled original iris images. Therefore, the structure in this exploration needs labeled data during training and learning. For the initial labeling, to avoid interference from human factors and to increase the credibility and persuasiveness of the labels, an improved YOLO iris location detection structure is used: the confidence thresholds of the YOLO network for image data under different states serve as the partition for the initial labeling of the original iris data. To avoid overfitting, the data enhancement methods commonly used in neural network data processing are applied to the original data [25].
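Data enhancement of the kind mentioned above can be sketched as follows. This is a generic, minimal augmentation set (flips, rotations, mild additive noise) offered as an illustration; the paper does not specify which transformations its enhancement method uses, and the noise level here is an arbitrary choice.

```python
import numpy as np

def augment(img):
    """Generate simple augmented variants of one iris image: the original,
    a horizontal flip, three 90-degree rotations, and a mildly noised copy
    (values assumed to lie in [0, 1])."""
    rng = np.random.default_rng(0)
    variants = [img, np.fliplr(img)]
    variants += [np.rot90(img, k) for k in (1, 2, 3)]
    variants.append(np.clip(img + rng.normal(0.0, 0.02, img.shape), 0.0, 1.0))
    return variants
```

Each original image thus yields several training samples, which increases data quantity and diversity and helps counteract overfitting during network training.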
Since human iris images involve personal privacy, there is no suitable public iris image database. Therefore, to complete the experiment and verify the effectiveness of the iris detection and recognition algorithm, establishing an iris image database is a necessary step. All the experiments are carried out in the laboratory, and the collected human iris images are enhanced by near-infrared light conditions. The laboratory is equipped with an LED light source and two sets of photography light sources to ensure a sufficient and uniform indoor light source. A Lytro Illum light field camera and a Nikon long lens are used to collect iris texture images at different focal lengths. During collection, the position of the light field camera is kept unchanged, and iris texture images at different distances are collected by changing the position of the subjects. The iris images of the left and right eyes are collected separately from 10 subjects (this study has informed the respondents and obtained their consent). To ensure the accuracy of the experiment, at least 3 images are collected at each position under the same conditions. 180 human iris images are obtained by collecting true iris images at camera focus positions of 1.5 m, 1.6 m, and 1.7 m. The collected images are filtered, and 150 clear human iris images are selected. The true iris data mainly come from the camera collection, while the false iris data mainly come from the network and databases. Similarly, three iris images are collected under the same conditions, and 200 pairs of prosthesis iris images are selected. To ensure image quality, the image resolution is 2048 × 1536. Through light field refocusing technology, 3600 iris images are obtained, of which 3000 are clear. The collected human and prosthesis iris images are processed and classified as the iris database of this study and analyzed. Table 2 shows the specific data.

RESULTS AND ANALYSIS
All experiments are carried out under laboratory conditions. The collected iris images are enhanced through near-infrared light conditions. At the same time, the laboratory is equipped with an LED light source and two sets of photography light sources to ensure sufficient and uniform indoor light. The Lytro Illum light field camera and a Nikon long lens are used to collect iris texture images at different focal lengths. During collection, the position of the light field camera remains unchanged, and iris texture images at different distances are collected by changing the position of the subject. A total of 10 people are collected this time, and the iris images of the left and right eyes are collected separately. For the accuracy of the experiment, at least 3 images are collected at each position under the same conditions. The collected images are screened to select about 150 true human iris images. The collected prosthesis iris images come from the Internet; high-quality iris images are selected and captured with the light field camera through an iPad's screen display. Similarly, 3 iris images are collected under the same conditions, and 150 prosthetic iris images are selected. About 30,000 iris images can be obtained through light field refocusing technology and analyzed.

Normalization analysis of iris image three-dimensional structure features
Since the sharpness of iris images collected at different defocus amounts differs, the focus stack sequence is first obtained by refocusing technology. Different focus stack images are obtained when collecting at different defocus amounts. Focus stack images are collected within a refocusing interval of [0.5, 1.7] meters and numbered to obtain a series of focus stack sequences (sequence numbers). Under the same sequence number, true and false iris images are collected at camera focusing positions of 1.5 m, 1.6 m, and 1.7 m. The true irises are acquired by collecting the iris images of the subjects in the laboratory. As shown in the left part of Figure 5, the clarity scores of iris images collected at 1.5 m, 1.6 m, and 1.7 m are calculated respectively. The results show that the iris image at 1.6 m has the highest score and is the clearest. The score of the iris image with 10 cm of myopic defocus at 1.7 m is slightly higher than that with hyperopic defocus at 1.5 m, and the clarity of the iris image at 1.5 m is the lowest, which accords with the actual imaging principle. For the medium-distance iris database, the 70 cm iris image collected normally is the clearest, while the 70 cm focused image collected in a bright-light environment is slightly worse: part of the image is overexposed, resulting in the loss of edge information. The clarity of the motion-blurred and defocus-blurred images is slightly poorer. The comparison between the two databases shows that the image clarity of the 70 cm medium-distance light field iris database is much higher than that of the long-distance light field iris database, indicating that the improvement strategy proposed for the problems in the long-distance database effectively improves the clarity and quality of the collected images. Figure 5 also shows that the peak value of the normalized three-dimensional structure feature curve of the true iris is lower than that of the false iris, and the true-iris curve fluctuates more, not being smooth, whereas the normalized three-dimensional structure curve of the false iris is relatively smooth, with relatively few fluctuations.
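The clarity scores discussed above rank images by focus quality. The paper does not state its scoring function, so the sketch below uses a common stand-in, the variance-of-Laplacian focus measure: blur attenuates the high spatial frequencies that the Laplacian emphasizes, so sharper images score higher.

```python
import numpy as np

def sharpness_score(img):
    """Variance-of-Laplacian focus measure (a common clarity proxy, assumed
    here, not the paper's exact metric). Uses a 4-neighbour discrete
    Laplacian over the image interior; higher values mean a sharper image."""
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()
```

Applied to a focus stack, this kind of score would peak at the best-focused frame, mirroring the ranking of the 1.5 m, 1.6 m, and 1.7 m captures reported above.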

Analysis of the relationship between refocusing range and accuracy
The range of refocusing and the number of refocusing sequence images affect the accuracy of the expression of three-dimensional structure features. Therefore, the number of refocusing sequence images is fixed, and the relationship between the refocusing value and the accuracy of the expression of three-dimensional structure features is analyzed. The results are shown in Figure 6. Figure 6 shows a correlation between the refocusing value and the accuracy of iris detection. When the refocusing range is between 0.1 and 0.4, the accuracy of iris detection improves as the refocusing range increases. When the refocusing value is 0.4, the detection accuracy reaches its maximum, after which the accuracy fluctuates slowly as the refocusing value increases. Therefore, the selected refocusing value is 0.4. The experimental results show a correlation between Ss and the accuracy of liveness detection: the accuracy increases with the refocusing range within [0.1, 0.4], and when Ss = 0.4, the maximum accuracy is 94.4%; after that, the accuracy fluctuates slowly as Ss increases. There is a linear relationship between the execution time of the program and the number of focal stack images: the larger Ss is, the longer the focal stack rendering time. Therefore, the optimal value of Ss is 0.4, the refocusing interval used to construct stereoscopic structure features is [a-0.2, a+0.2], and the optimal value is 145. Similar results were obtained by Xin et al. (2015).

Performance comparison and analysis of different iris detection methods
To verify the performance of the proposed algorithm, a comparative test is carried out. Two existing iris detection algorithms are selected, one based on image quality evaluation and one based on feature extraction, to detect the iris images and judge the classification and detection performance of the methods on true and false iris images under the same experimental conditions.
Image-quality-evaluation-based algorithm: distortion identification-based image verity and integrity evaluation (DIIVINE) is selected. The algorithm extracts statistical features consistent with human vision from wavelet transform sub-bands at multiple scales and orientations, classifies the distortion type using an SVM, and predicts the objective image quality using support vector regression.
Feature-extraction-based algorithm: local phase quantization (LPQ) is selected. The algorithm applies a discrete Fourier transform to the image and extracts texture features from the phase information in the frequency domain; such texture features are invariant under centrally symmetric blur. These two methods are selected because they overlap with some of the techniques in the proposed algorithm, which makes the comparison more informative for evaluating its performance.
The classification accuracies and classification error rates of the image-quality-evaluation-based algorithm, the feature-extraction-based algorithm, and the proposed iris detection algorithm (Feature Extraction-Texture Extraction-SVM) are compared. Figure 7 shows the comparison results.

FIGURE 3 Principle of light field refocusing

$$\mathcal{F}^M \circ \mathcal{I}^N_M \circ \mathcal{B} = \mathcal{S}^N_M \circ \mathcal{B}^{-T} \circ \mathcal{F}^N \qquad (5)$$

$\mathcal{F}^M$ and $\mathcal{F}^N$ in the above equation represent the Fourier transform operators in M and N dimensions; the integral projection operator $\mathcal{I}^N_M$ reduces an N-dimensional function to M dimensions by integrating out its last N-M dimensions; and the slice operator $\mathcal{S}^N_M$ reduces an N-dimensional function to M dimensions by setting its last N-M coordinates to zero. $\mathcal{B}$ represents any coordinate transformation of the N-dimensional function, and $\mathcal{B}^{-T}$ represents the transpose of the inverse of the matrix $\mathcal{B}$.

FIGURE 5
FIGURE 5 Normalized processing graph of the three-dimensional structure of true (upper) and false (lower) iris

FIGURE 6
FIGURE 6 The relationship between the refocusing value and the expression accuracy of three-dimensional structure features

Basic principle flow chart of iris recognition technology
FIGURE 2 Iris image real-time acquisition structure