Demonstration of shift, scale, and rotation invariant target recognition using the hybrid opto-electronic correlator

Previously, we had proposed a hybrid opto-electronic correlator (HOC), which can achieve the same functionality as that of a holographic optical correlator but without using any holographic medium. Here, we demonstrate experimentally that the HOC is capable of detecting objects in a scale, rotation, and shift invariant manner. First, the polar Mellin transformed (PMT) versions of two images are produced, using a combination of optical and electronic signal processing. The PMT images are then used as the reference and the query inputs for the HOC. The observed correlation signal is used to infer, with high accuracy, the relative scale and angular orientation of the original images. We also discuss practical constraints in reaching a high-speed implementation of such a system. In addition, we describe how these challenges may be overcome for producing an automated version of such a correlator. © 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Target recognition and tracking have a wide range of applications in the modern world. Optical image recognition systems offer a fast alternative to traditional electronics-based systems. The simplest such optical system is the Vander Lugt correlator [1][2][3], which is able to compare two images using holographic filters. However, a key limitation of this technology is the use of a slow recording process for the filters. Other correlators have been designed to circumvent the recording process, such as the Joint Transform Correlator (JTC) [4][5][6][7][8][9], which uses dynamic materials to record and correlate at the same time. However, the material needed for such a correlator suffers from many practical problems, such as requiring a high applied voltage and being easily damaged [10,11]. We recently proposed and demonstrated a new hybrid opto-electronic correlator (HOC) [12,13] that overcomes some of these limitations and replaces the JTC's nonlinear material with detectors. The advantage of such a correlator is discussed in more detail in [12]. Yet two key limitations inherent to optical target recognition remain in our originally proposed HOC architecture: the system is intolerant to changes in scale and rotation. There have been many proposals to overcome these limitations, many of which detail the implementation of coordinate transforms [14][15][16][17][18]. We recently proposed that the incorporation of the polar Mellin Transform (PMT) into the existing HOC architecture would result in a shift, scale, and rotation invariant correlator [19]. In this paper, we show the results of such an incorporation using commercially available instruments. In addition, we show that the output of a positive match can be analyzed to determine the rotation angle of the query image.
Today, computers are able to detect matched images with great accuracy thanks to advances in neural networks and image recognition algorithms. However, even state-of-the-art systems take upwards of 26 ms to detect matched features [20]. This time quickly adds up when scanning large databases or processing real-time camera feeds. Our system, as proposed using specialized circuits for the electronic components, is capable of reaching correlation times on the order of a few microseconds [12]. The HOC is not meant to replace computers, as they are capable of detecting much finer details and performing more complex algorithms. Instead, it is expected to work as a pre-processor that would filter out obvious matches and mismatches, and produce a vastly reduced set of images that may require further processing. Of course, in principle, this pre-processing could also be performed using electronic circuits, entirely removing the need for optical components. However, the current best 2D Fourier Transform (FT) electronic integrated circuits have execution times of over 6 ms per image [21], highlighting the need for optical techniques.
To exemplify the usefulness of the HOC, consider a database with 1 million images, 100 of which are potential matches to a query. A computer using state-of-the-art algorithms would take $0.026 \times 10^{6} = 26{,}000$ seconds $\approx 7.2$ hours to compare each database image to the query image by using neural networks. If instead one uses electronic FT's for correlation preprocessing (requiring at least two FT's per correlation), it would take $0.006 \times 2 \times 10^{6} = 12{,}000$ seconds $\approx 3.3$ hours to filter out the 100 potential matches, which then require a subsequent $100 \times 0.026 = 2.6$ seconds to process with neural networks for more detailed results. Assuming a correlation time of 5 μs, the HOC requires $5 \times 10^{-6} \times 10^{6} = 5$ seconds to perform the filtering, and then 2.6 seconds for the neural network processing. It is this kind of large-database image processing that would benefit most from the HOC. While electronic components are generally cheaper and more robust, the difference in performance between an all-electronic and the hybrid opto-electronic approach is large enough to outweigh the disadvantages.
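The arithmetic of this example can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the timing constants are the figures quoted above, while the variable and function names are ours and purely illustrative.

```python
# Back-of-the-envelope timing for the database example in the text.
N_IMAGES = 1_000_000      # database size
N_CANDIDATES = 100        # potential matches left after filtering
T_NN = 26e-3              # seconds per neural-network comparison [20]
T_FT = 6e-3               # seconds per electronic 2D FT [21]
FTS_PER_CORRELATION = 2   # at least two FTs per correlation
T_HOC = 5e-6              # seconds per HOC correlation (value assumed in the text)


def total_time(filter_time_per_image):
    """Time to pre-filter every image, then run the neural network on the survivors."""
    return filter_time_per_image * N_IMAGES + N_CANDIDATES * T_NN


nn_only = T_NN * N_IMAGES
electronic = total_time(T_FT * FTS_PER_CORRELATION)
hoc = total_time(T_HOC)

for label, seconds in [("NN only", nn_only),
                       ("electronic FT + NN", electronic),
                       ("HOC + NN", hoc)]:
    print(f"{label:>20}: {seconds:10.1f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions the total drops from hours to seconds once the filtering step no longer dominates.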
The rest of the paper is organized as follows. Section 2 details the experimental setup and theory of operation of the system. An overview of the steps required to implement the PMT in the HOC is given in section 3. The results are presented and examined in section 4, where we show how the use of the PMT conforms to the theory. We conclude with a summary and outlook in section 5.

Experimental setup and working principle of the HOC
The details of the basic HOC architecture can be found in [12] and [13], while the augmentation thereof via incorporation of the PMT can be found in [19]. If commercially available components are used, the operating speed of the HOC is severely limited by the serial communication between the devices. For this reason we proposed a system called the Integrated Graphic Processing Unit (IGPU), which may allow the HOC to perform a correlation in a time scale as short as a few microseconds. Much work remains to be done before the IGPU can be realized. As such, we have shown the working principle of the HOC using existing technology, without optimizing the speed of operation.

Overview of PMT augmented HOC
Like other optical correlators, the HOC takes advantage of the FT property of lenses. However, unlike traditional holographic correlators, it does not require a writing step where the information of the FT of the reference image is stored prior to its operation. Instead, the HOC captures the FT of the reference and query images, at the same time, on two separate arms. A Focal Plane Array (FPA) on each arm captures three intensity signals: the FT of the image, an auxiliary plane wave, and the interference between these two. The amplitude and phase information for the FT of the image is thus captured for each arm. We then subtract the intensities of the FT'd image beam and of the auxiliary plane wave from the interference pattern for each arm. This yields two electronic FT-domain signals that are then multiplied together pixel-by-pixel, resulting in a single output signal. By then transferring this signal back to the optical domain using an SLM, we can pass it through another lens and obtain its FT, which will correspond to the space-domain convolution and correlation of the two original images. This is further explained in section 2.3.
The amplitudes of the cross-correlation and convolution produced this way depend on the relative phase of the two auxiliary plane waves. Thus, for a practical implementation of this scheme we employ a Phase Stabilizer and Scanner (PSS), which is described in more detail later on.
The process as described above is able to recognize a match between a reference image and a query image in a shift invariant manner. However, it is not rotation and scale invariant. This limitation is eliminated by employing the PMT process. This involves the following additional steps in each arm before the interference with the auxiliary beams occurs. First, the FT of each image is detected with an FPA; then the amplitude of the FT is determined by taking the square root of the signal for each pixel. The resulting numbers are then converted from the rectilinear coordinates $\{x, y\}$ to the log-polar coordinates $\{\rho, \theta\}$, producing the PMT image. It is the FT of this PMT image that is interfered with the auxiliary beam in each arm. More details of this process can be found in [19].

Experimental setup
For this demonstration we have used a simplified version of the architecture proposed in [12]. This is illustrated schematically in Fig. 1. A continuous-wave diode-pumped solid-state laser (Verdi V2) at 532 nm is used as the light source. The laser beam starts with a diameter of 1 mm, and is spatially filtered and expanded to 1" (25.4 mm). This beam is passed through a 50/50 Beam Splitter (BS) into two arms: the Image Arm and the PSS Arm. The latter leads to a mirror mounted on a Piezo-electric Transducer (PZT-1a) which redirects the beam through a shutter (S1) to a Mach-Zehnder Interferometer (MZI). The MZI, along with PZT-2, a pair of photo-detectors (MZI PD) that are separated to detect two different fringes in the MZI interference pattern, and a Proportional-Integral-Derivative (PID) controller, forms a phase-stabilization system. This MZI has two BS's inserted in one path. These redirect two plane waves $(C_1, C_2)$ towards the image arms, with $C_1$ passing through PZT-1b. The phase-stabilization system allows us to lock the phase difference between $C_1$ and $C_2$ according to a bias voltage applied to the output of the PID controller. This is discussed in greater detail in section 2.4. The image arm also passes through a shutter (S2) and is then split into the reference and query arms. Each of these two beams reflects off an amplitude modulated (AM) SLM to produce the image beams $(H_1, H_2)$, each of which is then directed towards a biconvex lens. The lens produces the two-dimensional FT of the image at its focal plane. Each of the FT'd image beams $(M_1, M_2)$ then interferes with the corresponding plane wave prior to being detected by an FPA placed at the focal distance of the biconvex lens. For this setup we used the Thorlabs USB2.0 CMOS camera (DCC1545M), which has a resolution of 1280x1024 pixels, to perform the function of the FPA.
The use of shutters allows us to choose what we detect. We can detect just the FT'd image by closing S1, just the plane wave by closing S2, or the interference between the two by leaving both shutters open.
The SLM's used for this demonstration are custom-made using Texas Instruments' DLP3000 modules. These work using Digital Micro-mirror Devices (DMD's) which rapidly move to reflect light towards and then away from a target, effectively functioning as AM SLM's. The DLP3000 modules have a physical resolution of 684 x 608 pixels, but operate in a wide aspect ratio of 854 x 480. The active area of the SLM is 0.3" (7.62 mm) and each individual micro-mirror measures 7.6 μm across.

Mathematical Model of the HOC
In this version of the HOC, each set of measurements ($A_j$, $B_j$, $|C_j|^2$) is taken by opening and closing the shutters as described in the previous section, using the subscript '1' to denote the reference image, and the subscript '2' for the query. The FT of each image and each plane wave can be expressed as follows:

$$M_j = |M_j| e^{i\phi_j}, \qquad C_j = |C_j| e^{i\Psi_j} \tag{1}$$

where $\phi_j$ is the phase of the FT'd image beam at the FPA plane, and $\Psi_j$ is the phase of the interfering plane wave at the same point. Here, $|M_j|$ and $\phi_j$ are functions of $(x, y)$, but $|C_j|$ and $\Psi_j$ are assumed to be constant on the FPA plane. The detected interference pattern between the FT of the image ($M_j$) and the plane wave ($C_j$) is given by:

$$A_j = |M_j + C_j|^2 = |M_j|^2 + |C_j|^2 + 2|M_j||C_j|\cos(\phi_j - \Psi_j) \tag{2}$$

This digital signal array can be stored on an FPGA along with the signal arrays $B_j = |M_j|^2$ and $|C_j|^2$.

The FPGA can then perform a subtraction to obtain:

$$S_j = A_j - B_j - |C_j|^2 = M_j C_j^* + M_j^* C_j = 2|M_j||C_j|\cos(\phi_j - \Psi_j) \tag{3}$$

This signal can be stored for both the reference ($S_1$) and the query ($S_2$) image in the same FPGA and later multiplied together using four-quadrant multiplication to find the signal array $S$:

$$S = S_1 \cdot S_2 = \alpha^* M_1 M_2 + \alpha M_1^* M_2^* + \beta^* M_1 M_2^* + \beta M_1^* M_2 \tag{4}$$

The resulting signal can be sent to an SLM to be transferred into the optical domain using a laser. Here, the signal beam can be FT'd by passing through a biconvex lens, presenting the final output signal $S_f$ at the focal plane:

$$S_f = \mathcal{F}(S) = \alpha^* \mathcal{F}(M_1 M_2) + \alpha \mathcal{F}(M_1^* M_2^*) + \beta^* \mathcal{F}(M_1 M_2^*) + \beta \mathcal{F}(M_1^* M_2) \tag{5}$$

Here $\mathcal{F}$ stands for the FT. Because $M_j$ is the FT of an image $H_j$, we can now use the well-known relationship between the FT of products of functions and convolutions and cross-correlations to express more explicitly the four terms in $S_f$:

$$\begin{aligned}
T_1 &= \alpha^* \, H_1(x, y) \otimes H_2(x, y) \\
T_2 &= \alpha \, H_1(-x, -y) \otimes H_2(-x, -y) \\
T_3 &= \beta^* \, H_1(x, y) \odot H_2(x, y) \\
T_4 &= \beta \, H_2(x, y) \odot H_1(x, y)
\end{aligned} \tag{6}$$

where $\otimes$ indicates two-dimensional convolution, and $\odot$ indicates two-dimensional cross-correlation. This shows that using three intensity signals ($A$, $B$, and $|C|^2$) from each arm we can find the correlation between the two images. In Eqs. (4) to (6) we have grouped together the factors corresponding to the plane waves $C_1$ and $C_2$ into constants ($\alpha$ and $\beta$). A more explicit expression of these terms reveals the following:

$$\alpha = C_1 C_2 = |C_1||C_2| e^{i(\Psi_1 + \Psi_2)} \tag{7}$$

$$\beta = C_1 C_2^* = |C_1||C_2| e^{i(\Psi_1 - \Psi_2)} \tag{8}$$

It is clear that the output of the HOC depends nontrivially on the phases of the plane waves at their respective FPA's. We are also only interested in the cross-correlation terms of our output signal ($T_3$ and $T_4$); as such it is our goal to maximize $\beta$ and minimize $\alpha$ while maintaining both values stable. For this we have designed and implemented a PSS that is explained in the next section.
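The signal chain described in this section can be sketched numerically. The following is a minimal illustration under stated assumptions, not the experimental implementation: NumPy FFTs stand in for the FT lenses, the plane waves have unit amplitude, and the inputs are zero-mean random test images. The function names `arm_signal` and `hoc_output` and all parameters are hypothetical.

```python
import numpy as np


def arm_signal(m, c):
    """Per-arm signal: interference intensity minus the two individual intensities."""
    a = np.abs(m + c) ** 2      # interference between FT'd image and plane wave
    b = np.abs(m) ** 2          # FT'd image alone
    c_int = np.abs(c) ** 2      # plane wave alone
    return a - b - c_int        # equals 2|M||C|cos(phi - Psi)


def hoc_output(h1, h2, psi1=0.0, psi2=0.0, c_amp=1.0):
    """Simulated HOC output intensity for two real images of the same shape."""
    m1, m2 = np.fft.fft2(h1), np.fft.fft2(h2)      # FT lens in each arm
    c1 = c_amp * np.exp(1j * psi1)                 # plane waves, constant over the FPA
    c2 = c_amp * np.exp(1j * psi2)
    s = arm_signal(m1, c1) * arm_signal(m2, c2)    # pixel-by-pixel multiplication
    return np.abs(np.fft.fft2(s)) ** 2             # final FT lens, detected as intensity


rng = np.random.default_rng(0)
reference = rng.standard_normal((64, 64))
reference -= reference.mean()                       # remove the DC component
shifted = np.roll(reference, (5, 9), axis=(0, 1))   # same image, circularly shifted
unrelated = rng.standard_normal((64, 64))
unrelated -= unrelated.mean()

match = hoc_output(reference, shifted)
mismatch = hoc_output(reference, unrelated)
print("peak ratio (match / mismatch):", match.max() / mismatch.max())
```

In this toy model the shifted copy still produces sharp correlation peaks well above the mismatch background, which is the shift-invariant behavior described above. The convolution terms are present as well, so a real measurement would still need to distinguish them from the correlation peaks.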

Phase Stabilization and Scanning
The PSS can be considered to be a specific type of optical phase-locked loop (OPLL) with the added phase scan. Currently there are very few ways to implement a stable OPLL [22][23][24], and integrated circuits that perform this task are still at the research stage. To overcome this problem, we designed a discrete OPLL that can maintain lock for some time, along with a method of quickly reestablishing optimum lock values. The HOC requires us to control the phase difference between our Reference and Query auxiliary plane waves. From Eq. (8) it is clear that $\beta$ will reach its maximum value when $\Psi_1 - \Psi_2 = 2m\pi$, where $m$ is an integer. In order to achieve such a value, the HOC architecture incorporates an MZI with an adjustable mirror (PZT-2) and two coupled detectors (MZI PD), as shown in Fig. 2, which is a subset of the complete apparatus shown in Fig. 1. These detectors are separated by a short distance on the plane normal to the direction of propagation of the laser, which allows them to detect different fringes of the interference pattern generated in the MZI. An electronic circuit finds the difference in intensity between these detectors and converts it into a voltage that is then fed into a low noise pre-amp and then a PID controller. The output of the PID is then added to a bias voltage that allows us to control the locking point before being connected to PZT-2. This system operates under the assumption that the mirrors and the optical path lengths are very stable. For this reason, the optical table is floated and the experiment is enclosed so as to minimize air turbulence. The first plane wave ($C_1$) is extracted from the MZI prior to the PZT, having travelled a distance $L_{c1}$ from the first BS to FPA-1a, given by:

$$L_{c1} = l_{1,2} + l_{2,3} + l_{3,4} + l_{4,5} + l_{5,6} + l_{6,7} \tag{9}$$

where $l_{m,n}$ indicates the path from element $m$ to element $n$. The second plane wave ($C_2$) is extracted after PZT-2.
The total path length for this plane wave from the first BS to FPA-2 is $L_{c2}$, given by the corresponding sum of path segments. Without considering the effects of the optical components (BS's and mirrors), which produce constant phase shifts, the phase of each plane wave can be written as:

$$\Psi_j = \frac{2\pi}{\lambda} L_{cj}$$

where $\lambda$ is the laser wavelength. Using this expression we can now find the phase difference to be:

$$\Delta\Psi = \Psi_1 - \Psi_2 = \frac{2\pi}{\lambda}\left(L_{c1} - L_{c2}\right) + \Delta\phi_{OE}$$

where $\Delta\phi_{OE}$ represents the constant difference in phase shift produced by the optical elements in each path. The sum of the phases can be found in the same way, with $l'_{n,m}$ representing the path length when the relevant PZT is at its static point. Mechanically, the resulting condition means that PZT-1b has to be programmed to move the exact same distance as PZT-2, but in the opposite direction. This can be achieved using a feed-forward system where an inverted version of the bias signal applied to PZT-2 is sent to PZT-1b.
The PID system that controls PZT-2 receives its feedback from MZI_PD. The phase difference between the two paths in the MZI, $\Delta\Psi_{MZI}$, depends on the displacement $\Delta l_{pzt}$ of PZT-2. This means that to lock the PID to a specific phase at MZI_PD ($\Delta\Psi_{MZI}$) we will have a set value of $\Delta l_{pzt}$, which will also lock $\Delta\Psi$. We can vary this value by use of a bias voltage that is added to the output of the PID controller [25]. As was previously shown, PZT-1a allows us to adjust the value of $\Psi_1$ and $\Psi_2$ simultaneously without changing $\Delta\Psi$. By continuously running a ramp signal at some frequency $\omega_s$ on this PZT, we can scan over a wide range of phases. By applying a Low Pass Filter to the resulting output, the $\alpha$ term can be washed out. This is the ideal way to operate the HOC. However, because the phase scan operates in the time domain, this method requires that all six signals ($A_j$, $B_j$, and $|C_j|^2$) be detected simultaneously with six FPA's, and without shutters, which greatly increases the complexity of the system. As such, we did not implement the scanning segment of the PSS for the demonstration reported here. It should be noted that it is still possible to see the results of a correlation without washing out the $\alpha$ term, but one must be careful to distinguish between the correlation and convolution terms.
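The effect of the phase scan can be illustrated at a single pixel. The sketch below assumes unit amplitudes and models each arm's signal as $2\cos(\phi_j - \Psi_j)$, with a common scan phase added to both plane waves; a simple average over one ramp period stands in for the low-pass filter. All names and numerical values are illustrative assumptions.

```python
import numpy as np


def product_signal(phi1, phi2, psi1, psi2, scan):
    """S1*S2 at one pixel (unit amplitudes) with a common scan phase on both plane waves."""
    s1 = 2 * np.cos(phi1 - (psi1 + scan))
    s2 = 2 * np.cos(phi2 - (psi2 + scan))
    return s1 * s2


phi1, phi2 = 0.7, -1.1     # arbitrary FT phases at one pixel
psi1, psi2 = 0.4, 0.4      # locked plane-wave phases (zero phase difference)
scan = np.linspace(0.0, 2 * np.pi, 1000, endpoint=False)   # one full ramp period

# Averaging over the scan acts as a crude low-pass filter.
averaged = product_signal(phi1, phi2, psi1, psi2, scan).mean()

# Term that depends only on the phase difference (the beta-type contribution).
difference_term = 2 * np.cos((phi1 - phi2) - (psi1 - psi2))

print(averaged, difference_term)
```

The product contains one term in the sum of the plane-wave phases, which oscillates with the scan and averages to zero, and one term in their difference, which is unaffected by the scan and survives the averaging.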
One way to reach the maximum value of β for an unknown α is to run a series of known matched images through the HOC at varying bias voltages. This works as follows.
One image is set as both the Reference and Query inputs. The HOC then runs a correlation, for a particular bias voltage. This will yield a match at the output of the HOC. The bias voltage is then changed within the range of operation of the PZT, repeating the correlation. The result will again be a match, but the overall output intensity will have either increased or decreased. The bias voltage is changed so as to look for the maximum intensity. This process is repeated, changing the bias in progressively smaller steps until the maximum output intensity is found.
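This calibration amounts to a coarse-to-fine search over the bias voltage. A minimal sketch follows, in which `measure` is a hypothetical stand-in for running a matched correlation at a given bias and reading the peak output intensity. The simulated response used here (a single cosine-squared lobe within the scanned range) is an assumption made only so that the example is runnable.

```python
import math


def find_optimum_bias(measure, v_min, v_max, n_points=11, n_rounds=5):
    """Coarse-to-fine scan: sample the range, zoom in around the best sample, repeat."""
    best_v = v_min
    for _ in range(n_rounds):
        step = (v_max - v_min) / (n_points - 1)
        samples = [v_min + i * step for i in range(n_points)]
        best_v = max(samples, key=measure)
        # Shrink the window to one step on either side of the best sample.
        v_min, v_max = max(v_min, best_v - step), min(v_max, best_v + step)
    return best_v


def simulated_peak(bias_voltage):
    """Hypothetical output intensity versus bias; the true maximum is at 1.25 V."""
    return math.cos(0.8 * bias_voltage - 1.0) ** 2


best = find_optimum_bias(simulated_peak, 0.0, 3.0)
print(f"optimum bias ~ {best:.4f} V")
```

The search assumes a single maximum within the PZT's operating range. If several fringes fit in that range, the first coarse scan must be fine enough to resolve them.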

Polar Mellin transform in the HOC
Due to the properties of the FT and lenses, the detection of a FT'd optical signal will be shift invariant. However, changes to the scale and rotation of the images will alter the scale and rotation of the FT, thus preventing the HOC from achieving a match. To counteract this we can instead compare images that have been pre-processed via the use of the Polar Mellin Transform (PMT). Because the PMT is, by definition, in log-polar coordinates, two identical images with different rotations will present the same PMT with a shift in the $\theta$ coordinate corresponding to the relative rotation angle between them. Similarly, any change in scale will manifest as a shift in the log-radial coordinate $\rho$.
By performing the PMT we are essentially converting any rotation and scale changes into translational shifts. Given that the established HOC architecture is inherently shift invariant and that the PMT is very closely related to the FT, the PMT is well suited for adding rotation and scale invariance into the HOC architecture, as explained in detail in [19]. The steps to obtain the PMT in an optoelectronic system are as follows:
1-Find the FT of the image.
2-Determine the amplitude of the FT. (2a-Determine the intensity of the FT. 2b-Find the square root of the intensity.)
3-Perform circular DC blocking.
4-Map polar coordinates into a rectilinear plane where x and y correspond to the r and θ axes.
5-Transform the radial coordinate to the logarithm of the ratio of the radial coordinate and a reference length.
Steps 1 and 2a can be performed using a laser, an SLM, a FT lens, and an FPA. In this setup we used a single arm of our existing HOC architecture with the PSS shutter (S1) closed.
Steps 2b-5 are then performed by a computer. The resulting PMT image is then used as an input to the HOC.
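The computer-side processing can be sketched as follows. This is a purely numerical illustration and not the code used in the experiment: an FFT replaces the optical FT of steps 1 and 2a, the resampling uses nearest-neighbour lookup, and the function name, grid sizes, and DC-block radius `r0` are all assumptions.

```python
import numpy as np


def polar_mellin_transform(image, n_rho=128, n_theta=128, r0=2.0):
    """Numerical sketch of the PMT steps: |FT| -> DC block -> log-polar resampling."""
    # Steps 1-2: FT amplitude, with DC shifted to the array centre.
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    cy, cx = amplitude.shape[0] // 2, amplitude.shape[1] // 2
    r_max = min(cy, cx) - 1

    # Steps 3-5: sample on a grid that is uniform in rho = ln(r / r0) and in theta.
    # Starting at r = r0 leaves out the disc r < r0, i.e. the circular DC block.
    rho = np.linspace(0.0, np.log(r_max / r0), n_rho)
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    r = r0 * np.exp(rho)
    x = cx + r[None, :] * np.cos(theta[:, None])
    y = cy + r[None, :] * np.sin(theta[:, None])

    # Nearest-neighbour lookup (a practical system might interpolate instead).
    xi = np.clip(np.rint(x).astype(int), 0, amplitude.shape[1] - 1)
    yi = np.clip(np.rint(y).astype(int), 0, amplitude.shape[0] - 1)
    return amplitude[yi, xi]      # rows: theta, columns: rho


image = np.random.default_rng(1).random((64, 64))
pmt = polar_mellin_transform(image)
pmt_shifted = polar_mellin_transform(np.roll(image, (3, 7), axis=(0, 1)))
print(pmt.shape, np.allclose(pmt, pmt_shifted))
```

Because only the FT amplitude is used, a (circular) shift of the input leaves the result unchanged, which is the shift invariance the PMT inherits from the FT.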
By using a PMT image as a reference and converting a query image into its PMT, the HOC is able to find the correlation of the two original images in a shift, scale, and rotation invariant manner.
Given that all real digital images are composed of positive integer values, their FT will always contain a high value at the center (DC). The transformation from $\{x, y\}$ to $\{\rho, \theta\}$ of such an image will produce an output that has a non-zero value for $\rho = 0$. It is impossible to transform this point to the log-polar domain. To avoid this, we cut a small hole in the intensity profile of the FT at DC prior to performing the polar coordinate transformation. This is called circular DC blocking [19]. It is important that the hole be small enough not to erase important information from the non-DC area of the FT. However, making the hole very small requires high pixel density. A convenient compromise is to use a small hole of a constant size for all images. If a constant-size circular DC block is chosen, the PMT conversion process can be achieved without any complex computations. The final three steps of the PMT process are independent of the detected image and can be achieved by physically connecting an $\{x, y\}$ coordinate input to a rectilinear-mapped $\{\rho, \theta\}$ coordinate output (neglecting the connections corresponding to the circular DC block hole). In this way a single Application Specific Integrated Circuit (ASIC) could perform the PMT with the help of a FT lens. If an FPA and an SLM are built into this ASIC, the HOC would be able to achieve shift, scale, and rotation invariance using regular non-PMT images by inserting the ASIC at each image arm as shown in Fig. 3. Ideally we would expect the external SLM to be connected to either a camera or a computer to provide the non-PMT images. It would also be beneficial to incorporate such a system only at the query arm as shown in Fig. 3, with the reference arm using a holographic memory disk instead of an SLM to store a large database of PMT reference images.
For this experiment, a grayscale image of an F-22 Raptor fighter jet was chosen for its excellent contrast, unique shape, and real-world value.
Prior to running the experiment, the HOC was calibrated to its optimum bias voltage by using the method described in section 2.4 of this document.

Experimental results
The original reference image is shown in Fig. 4(a). The query image shown in Fig. 4(b) has been shifted and is scaled by a factor of 0.5 with a rotation of 48.25° counterclockwise with respect to the reference. The detected FT's of these two images are shown in Figs. 4(c) and 4(d) respectively. Because the query image is scaled down, its FT is larger than that of the reference while also presenting a rotation. Because of these two factors, the HOC was unable to detect a match, producing an almost flat output signal $|S_f|^2$ in Fig. 4(e). The corresponding results obtained with the PMT versions of the two images are shown in Fig. 5, with the reference and query PMT's in Figs. 5(a) and 5(b), and the HOC output in Fig. 5(e). The output signal in this case is ~15 times larger than that of Fig. 4(e), indicating a successful correlation. In Fig. 5(b) we have added a red horizontal line that marks the value of $\theta$ that corresponds to $\theta = 0$ in Fig. 5(a). This line shows the translational shift of the PMT caused by the rotation of the original query image. The section of the PMT that corresponds to the top of Fig. 5(a) has looped around to be under this red line. To complement these results, a simulation using the same input images was run. This is shown in Fig. 6, corresponding to the ideal reference PMT, ideal query PMT, their ideal FT's, and the simulated HOC output $|S_f|^2$. In Fig. 6(b) we have added a similar red line to the one in Fig. 5(b), this time corresponding to $\theta = 0$ in Fig. 6(a). By measuring the distance in pixels between the bottom of the PMT and the red line, recalling that the full vertical axis represents 360°, we can estimate the rotation of the query image to be ≈ 48°, which is close to the real rotation of 48.25°.
Similarly, the distance between the central peak of the output signal and the two lateral peaks in Figs. 5(e) and 6(e) has been marked with a red line. This is located at $\theta = 2.3$ rad, which is equivalent to a rotation of 48.22°.
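The conversion from a measured shift in the PMT plane to a rotation angle or scale factor is simple enough to state in code. The sketch below assumes that the $\theta$ axis spans 360° over `n_theta_pixels` and that the $\rho$ axis spans `rho_span` (in natural-log units) over `n_rho_pixels`. The function names, the sign convention for the scale, and the pixel counts in the example are hypothetical and are not the values measured in the experiment.

```python
import math


def rotation_from_shift(shift_pixels, n_theta_pixels):
    """Rotation angle in degrees from a shift along theta (full axis = 360 degrees)."""
    return 360.0 * shift_pixels / n_theta_pixels


def scale_from_shift(shift_pixels, n_rho_pixels, rho_span):
    """Scale factor from a shift along rho, for a rho axis covering rho_span = ln(r_max / r0)."""
    return math.exp(shift_pixels * rho_span / n_rho_pixels)


# Hypothetical numbers: a 137-pixel shift on a 1024-pixel theta axis.
angle = rotation_from_shift(137, 1024)

# Hypothetical numbers: a -64 pixel shift on a 128-pixel rho axis spanning ln(16).
scale = scale_from_shift(-64, 128, math.log(16))

print(f"rotation ~ {angle:.2f} deg, scale ~ {scale:.2f}")
```

The angular resolution of such an estimate is set by the number of pixels along $\theta$: with 1024 pixels, one pixel corresponds to about 0.35°.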

Conclusions and outlook
We have demonstrated that an HOC built using commercially available components and incorporating the PMT is able to find a match in a shift, scale, and rotation invariant manner, yielding an output that is ~15 times larger when a match is found vs when it is not found (without the PMT). Furthermore, the relative rotation of the query image with respect to the reference image in a match can be found in the output signal by measuring the distance from the central peak to one of the two lateral peaks. We have also shown that the behavior of the PMT-augmented HOC aligns with the theory by presenting simulated results that correspond to our experiment.
The development of the PMT-HOC can be categorized into three stages. In stage 1, we have demonstrated the functionality of the system by manually using a computer to perform the electronic processing. In stage 2, the PMT's of images and the mathematical processes required can be performed by an FPGA, thus fully automating the system. In stage 3, all of the signal processing can be done by using specially designed integrated circuits that can be incorporated into the FPA's and SLM's, forming an IGPU. This stage would allow for high-speed automation of the system, performing correlations in a time scale as short as a few microseconds.