Quick Techniques for Template Matching by Normalized Cross-Correlation Method

Abstract
Object recognition is one of the fundamental challenges in signal processing, image processing and computer vision, where the goal is to identify and localize the extent of object instances within an image. This paper introduces a novel approach for performing template matching by the normalized cross-correlation method in minimal time. Template matching by correlation is performed between a template w and an image f, where the template's position in the image is to be determined. The computation of the correlation coefficient is analyzed and decomposed into small parts, or units. These units are computed only once, stored in sum tables, and then combined into larger blocks. The larger blocks are computed recursively from the sum tables, by adding and/or subtracting units from a previously computed block instead of computing each block from scratch. Moreover, the technique is further developed by performing the cross-correlation on only the odd or only the even samples of the signal. In its final form, the new approach reduces the cross-correlation calculation time by 90%-94%, depending on the image and template sizes.


Original Research Article

Introduction
Normalized cross-correlation (a normalized form of the cross-covariance) between two input signals is a kind of template matching. It is generally considered to be the gold standard in many applications [1][2][3]. However, its high computational cost is a significant drawback in real-time applications, especially when highly sampled RF signals and an exhaustive search are used [4]. Normalized cross-correlation can be computed in any number of dimensions. The one-dimensional normalized cross-correlation between two input signals x and y can be defined as:

$$r = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^{2}\,\sum_{i}(y_i-\bar{y})^{2}}} \qquad (1)$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y. The coefficient r measures the strength and direction of the linear relationship between the variables x and y.
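As a concrete illustration of the one-dimensional coefficient r (the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations), the following minimal Python sketch computes r for two equal-length sequences. The function name ncc_1d is hypothetical, and returning 0 for a constant (zero-variance) input is a convention chosen here, not something the paper specifies.

```python
import math

def ncc_1d(x, y):
    """Correlation coefficient r between two equal-length sequences x and y."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Numerator: sum of products of deviations from the means
    num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two energies
    den = math.sqrt(sum((xi - x_mean) ** 2 for xi in x) *
                    sum((yi - y_mean) ** 2 for yi in y))
    if den == 0:
        return 0.0  # constant signal: r is undefined, 0 is an assumed convention
    return num / den

print(ncc_1d([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(ncc_1d([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

A perfectly proportional pair gives r = 1 and a perfectly reversed pair gives r = -1, matching the [-1, 1] range of the coefficient.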
Given an image f(x,y), the correlation problem is to find all places in the image that match a given subimage w(x,y) (called a mask or template). The position of the given pattern is determined by a pixelwise comparison of the image with the template, which contains the desired pattern. For this, the template is shifted u discrete steps in the x direction and v steps in the y direction of the image, and the comparison is calculated over the template area for each position (u,v). Normalized cross-correlation is a reasonable choice for this comparison in many cases [4][5][6][7]. The method of choice for matching by correlation is to use the correlation coefficient:

$$\gamma(x,y) = \frac{\sum_{s,t}\left[w(s,t)-\bar{w}\right]\left[f(x+s,y+t)-\bar{f}_{xy}\right]}{\left\{\sum_{s,t}\left[w(s,t)-\bar{w}\right]^{2}\,\sum_{s,t}\left[f(x+s,y+t)-\bar{f}_{xy}\right]^{2}\right\}^{1/2}} \qquad (2)$$

where w is the template, $\bar{w}$ is the average value of the elements of the template (computed only once), f is the image, and $\bar{f}_{xy}$ is the average value of the image in the region where f and w overlap. The summation is taken over the values of s and t such that the image and the template overlap. The denominator normalizes the result with respect to variations in intensity. The values of $\gamma(x,y)$ lie in the range [-1, 1], and a high value of $|\gamma(x,y)|$ generally indicates a good match between the template and the image.
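The sketch below evaluates the correlation coefficient γ(x,y) directly (deviation cross-product over the overlap, divided by the square root of the product of the template and window energies) at every position where the template fits entirely inside the image. It is the brute-force baseline, not the accelerated method of this paper. Images are assumed to be Python lists of rows, and the function names are hypothetical.

```python
import math

def ncc_map_bruteforce(f, w):
    """gamma[x][y] for every position where the m x n template w fits inside image f."""
    M, N = len(f), len(f[0])
    m, n = len(w), len(w[0])
    k = m * n
    w_mean = sum(sum(row) for row in w) / k  # template mean, computed once
    w_energy = sum((w[s][t] - w_mean) ** 2 for s in range(m) for t in range(n))
    gamma = [[0.0] * (N - n + 1) for _ in range(M - m + 1)]
    for x in range(M - m + 1):
        for y in range(N - n + 1):
            # Mean of the image region currently overlapped by the template
            f_mean = sum(f[x + s][y + t] for s in range(m) for t in range(n)) / k
            num = 0.0
            f_energy = 0.0
            for s in range(m):
                for t in range(n):
                    df = f[x + s][y + t] - f_mean
                    num += (w[s][t] - w_mean) * df
                    f_energy += df * df
            den = math.sqrt(w_energy * f_energy)
            gamma[x][y] = num / den if den > 0 else 0.0  # flat window: assumed 0
    return gamma

def best_match(gamma):
    """Position (x, y) with the largest gamma value."""
    return max(((x, y) for x in range(len(gamma)) for y in range(len(gamma[0]))),
               key=lambda p: gamma[p[0]][p[1]])

image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
print(best_match(ncc_map_bruteforce(image, template)))  # (1, 1)
```

Every position recomputes the window mean, the window energy and the cross-product from scratch, which is exactly the redundancy that the sum tables are meant to remove.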
Many approaches exist for implementing cross-correlation. Most of them are based on moving a classifier (or object) over all possible scales and positions, scanning the image and searching for maximal detection responses; this is commonly called the Sliding Windows (SW) approach.
Pixel-by-pixel template matching is very time-consuming. For a scene image of size M × N and a template of size m × n, the computational complexity is O(m × n × M × N).
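To put the O(m × n × M × N) cost in perspective, the sketch below counts the multiply-accumulate operations needed for the numerator alone when the template is tried at every position where it fits inside the image, i.e. (M − m + 1)(N − n + 1) positions. The function name is hypothetical, and pairing 956 with 30 and 428 with 35 in the example is an assumption about image and template orientation.

```python
def brute_force_multiply_adds(M, N, m, n):
    """Multiply-accumulates for the correlation numerator when an m x n template
    is evaluated at every position where it fits inside an M x N image."""
    if m > M or n > N:
        return 0  # the template does not fit anywhere
    positions = (M - m + 1) * (N - n + 1)
    return positions * m * n

# Smallest template of the first experiment (orientation assumed)
print(brute_force_multiply_adds(956, 428, 30, 35))  # 383499900
```

Even the smallest template therefore needs roughly 3.8 × 10^8 multiply-accumulates for the numerator, before the two energy terms in the denominator are counted.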
Because successive reference windows usually overlap, the entire calculation of the numerator in (2,3) is also redundant.
The basic idea behind this work is to identify a collection of building units (vectors) whose element sums are computed only once. These sums are then used to compute the bigger blocks, which consist of collections of these vectors. Accordingly, the time required to compute the bigger blocks is reduced, resulting in a speed-up of an order of magnitude over the brute-force matching method [7].
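One standard way to compute block sums from precomputed sums is a summed-area table (integral image): each entry holds the sum of all pixels above and to the left, so the sum of any rectangular block needs only four lookups. The Python sketch below assumes that the paper's sum tables behave like such a table; their exact layout in the paper may differ, and the function names are hypothetical.

```python
def sum_table(f):
    """Summed-area table S with an extra zero row and column:
    S[i][j] = sum of f[a][b] for all a < i and b < j."""
    M, N = len(f), len(f[0])
    S = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(M):
        row_sum = 0
        for j in range(N):
            row_sum += f[i][j]  # running sum of row i up to column j
            S[i + 1][j + 1] = S[i][j + 1] + row_sum
    return S

def window_sum(S, x, y, m, n):
    """Sum of the m x n block whose top-left pixel is (x, y): four lookups."""
    return S[x + m][y + n] - S[x][y + n] - S[x + m][y] + S[x][y]

f = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
S = sum_table(f)
print(window_sum(S, 1, 1, 2, 2))  # 28, i.e. 5 + 6 + 8 + 9
```

The table costs one pass over the image; afterwards every window sum is O(1) regardless of template size.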
Assume that a two-dimensional template w(s,t) is to be matched with a two-dimensional image f(x,y). From (2), the normalized cross-correlation consists of three terms: the energy of the template window, $\sum_{s,t}[w(s,t)-\bar{w}]^{2}$, in the denominator; the energy of the comparison window, $\sum_{s,t}[f(x+s,y+t)-\bar{f}_{xy}]^{2}$, in the denominator; and the standard cross-correlation between these two, $\sum_{s,t}[w(s,t)-\bar{w}][f(x+s,y+t)-\bar{f}_{xy}]$, in the numerator. Apart from the template energy, which is fixed, these terms need to be calculated for each pixel of the image f(x,y), and the calculation is repeated for each template window position across the entire signal. Therefore, template matching based on normalized cross-correlation is extremely time-consuming.
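The comparison-window energy can be rewritten so that it needs only the window sums of f and of f squared: the sum over the window of (f − mean)² equals Σf² − (Σf)²/k for a window of k pixels. Both raw sums could then be read from sum tables of f and f², so the energy costs O(1) per position. This is a standard algebraic identity; whether the paper organizes its tables exactly this way is an assumption. The Python sketch compares it with the literal computation; the names are hypothetical.

```python
def window_energy_direct(values):
    """Sum of squared deviations from the window mean, computed literally."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def window_energy_from_sums(sum_f, sum_f2, k):
    """Same energy from two raw sums over k pixels: sum(f^2) - (sum f)^2 / k.
    sum_f and sum_f2 could come from sum tables of f and of f squared."""
    return sum_f2 - sum_f * sum_f / k

window = [3, 1, 4, 1, 5, 9, 2, 6]  # pixels of one comparison window, flattened
direct = window_energy_direct(window)
fast = window_energy_from_sums(sum(window), sum(v * v for v in window), len(window))
print(abs(direct - fast) < 1e-9)  # True
```

With integer pixel values the two sums can be accumulated exactly as integers; in floating point, the subtraction can lose precision for windows with large values and small variance.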

Related Work
The efficient normalized cross-correlation (NCC) calculation method based on sum tables relies on the fact that most calculations are redundant, because of the exhaustive search over comparison windows and the high overlap between reference windows [4]. Peter Nillius [8] proposed speeding up NCC by first transforming each sub-block of the image into the Walsh basis. The Walsh transform expansion can be computed very efficiently through a binary tree of filters, and calculating the NCC from the Walsh components requires 2N − 1 operations instead of the 4N + 1 of a straightforward implementation. A highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), scalable to hundreds of independent inputs and suitable for processing signals from "Large-N" arrays of many radio antennas, is presented in [9]. The computational part of that algorithm, the X-engine, is implemented efficiently on Nvidia's Fermi architecture, sustaining up to 79% of the peak single-precision floating-point throughput. M.I. Khalil [10] introduced a parallel implementation of cross-correlation over a local network, or in some cases over a Wide Area Network (WAN), which helps reduce the processing time.

Proposed Approach
The proposed method uses pre-calculated sum tables to avoid repeating redundant computations in the definition of the normalized cross-correlation given by (1) and (2). Moreover, it introduces a technique for reducing the time required to compute the sum tables.
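The abstract describes computing larger blocks recursively, by adding and subtracting small units from a previously computed block. One plausible reading, assumed here, is that the units are column sums: when the window slides one pixel to the right, the sum of the entering column is added and the sum of the leaving column is subtracted, so each new window costs two operations instead of m × n. The Python sketch is illustrative only, and the function name is hypothetical.

```python
def sliding_window_sums_row(f, x, m, n):
    """Sums of all m x n windows whose top row is x, scanning left to right.
    The first window is summed from scratch; each later one is derived
    recursively from its predecessor."""
    N = len(f[0])
    # Column sums over rows x .. x+m-1 (the small units), each computed once
    col = [sum(f[x + s][j] for s in range(m)) for j in range(N)]
    current = sum(col[:n])
    sums = [current]
    for y in range(1, N - n + 1):
        # Add the entering column, subtract the leaving column
        current += col[y + n - 1] - col[y - 1]
        sums.append(current)
    return sums

f = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
print(sliding_window_sums_row(f, 0, 2, 2))  # [14, 18, 22]
```

The same add-and-subtract idea can be applied vertically to update the column sums when moving down one row.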
The term $\bar{w}$ is computed only once, because the contents of the template do not change while it slides over the image:

    // compute the template mean w̄ once (template of size m x n)
    double sum = 0.0;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            sum += w[i][j];
    double wbar = sum / (m * n);

The term $[w(s,t)-\bar{w}]$ is computed only once for each pixel in the template window; this procedure is the same for both the ordinary method and the new approach:

    // store the deviation w(i,j) - w̄ of each template pixel in table-1
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            table1[i][j] = w[i][j] - wbar;

… should be recomputed in the same manner.
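Once table-1 holds the template deviations, the numerator of the correlation coefficient simplifies: the deviations sum to zero, so the term involving the window mean cancels, and the image pixels can be multiplied in directly without first subtracting their mean. This is a standard algebraic identity; the paper does not state that it uses it, so treat the Python sketch below as an assumption. The function names are hypothetical.

```python
def template_deviation_table(w):
    """Template mean and table-1 holding w(s, t) - mean for every template pixel."""
    m, n = len(w), len(w[0])
    w_mean = sum(sum(row) for row in w) / (m * n)  # computed once
    table1 = [[w[s][t] - w_mean for t in range(n)] for s in range(m)]
    return w_mean, table1

def numerator_at(f, table1, x, y):
    """Correlation numerator at (x, y). Because the entries of table1 sum to
    zero, subtracting the window mean from f would not change the result."""
    return sum(table1[s][t] * f[x + s][y + t]
               for s in range(len(table1)) for t in range(len(table1[0])))

w = [[1, 2], [3, 4]]
f = [[0, 0, 0],
     [0, 1, 2],
     [0, 3, 4]]
w_mean, table1 = template_deviation_table(w)
print(w_mean, table1)                 # 2.5 [[-1.5, -0.5], [0.5, 1.5]]
print(numerator_at(f, table1, 1, 1))  # 5.0
```

This removes the per-position mean subtraction from the numerator; only the window energy in the denominator still needs the window mean, which the sum tables supply.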

Experimental Results
Three versions of the template matching by normalized cross-correlation algorithm have been implemented on the Microsoft Visual Studio C# platform. The first version, "the ordinary" one, is based on the sum tables. The second version, "the modified" one, is also based on the sum tables but additionally uses recursive calculation of some of the terms in Eq. (2). The third version is similar to the second, except that the outer and inner loops of the correlation procedure are modified to operate only on either the odd or the even pixels of both the image and the template window. Each of the three versions has been tested on several images of different sizes, with many sub-images of different sizes used as templates. Two of these evaluation experiments follow. In the first case, a 956x428 image (Fig. 3) and 16 sub-images with sizes from 30x35 to 183x329 were used to test the three programs; the results are listed in Table 4 and plotted in Fig. 4. In the second case, a 655x598 image (Fig. 5) and 16 sub-images with sizes from 30x50 to 200x330 were used; the results are listed in Table 5 and plotted in Fig. 6.
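The third version restricts both loops to odd or even pixels. The Python sketch below assumes one interpretation: only even-indexed template positions (x, y) are searched (outer loop), and only even-indexed template offsets (s, t) enter the correlation (inner loop), which cuts the work per position and the number of positions by roughly four each. The paper's exact loop structure may differ, and the function name is hypothetical.

```python
import math

def ncc_even_pixels(f, w):
    """Approximate correlation coefficient using only even-indexed positions
    (x, y) and even-indexed template offsets (s, t); returns {(x, y): gamma}."""
    M, N = len(f), len(f[0])
    m, n = len(w), len(w[0])
    offsets = [(s, t) for s in range(0, m, 2) for t in range(0, n, 2)]
    k = len(offsets)
    w_mean = sum(w[s][t] for s, t in offsets) / k
    w_dev = {(s, t): w[s][t] - w_mean for s, t in offsets}
    w_energy = sum(d * d for d in w_dev.values())
    result = {}
    for x in range(0, M - m + 1, 2):        # outer loop: even rows only
        for y in range(0, N - n + 1, 2):    # even columns only
            vals = [f[x + s][y + t] for s, t in offsets]  # inner loop: even offsets
            f_mean = sum(vals) / k
            num = sum(w_dev[o] * (v - f_mean) for o, v in zip(offsets, vals))
            f_energy = sum((v - f_mean) ** 2 for v in vals)
            den = math.sqrt(w_energy * f_energy)
            result[(x, y)] = num / den if den > 0 else 0.0
    return result

f = [[(7 * i + 3 * j * j) % 13 for j in range(6)] for i in range(6)]
w = [row[2:5] for row in f[2:5]]  # 3x3 template cut out at (2, 2)
print(ncc_even_pixels(f, w)[(2, 2)])  # 1.0
```

A template that starts at an odd position would be missed by the even-only search; the odd variant (starting both ranges at 1) covers that case, which is one reason the paper offers both.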

Conclusions
Template matching by the normalized cross-correlation method has applications in many fields, such as object recognition, signal processing, image processing and computer vision, where the goal is to identify and localize the extent of object instances within an image. However, its high computational cost is a significant drawback in real-time applications, especially when highly sampled RF signals and an exhaustive search are used. In this paper, a new fast algorithm for computing the normalized cross-correlation is presented. It is based on sum tables and recursive calculations. The processing time is reduced compared with traditional approaches while the same high accuracy is maintained (correlation coefficient ≈ 0.998). Moreover, when speed is the priority, the developed approach can be applied only to the even or only to the odd pixels of both the image and the template window, yielding a large further reduction in processing time.