Multi-Layer Feature Based Shoeprint Verification Algorithm for Camera Sensor Images

As a kind of forensic evidence, shoeprints have been treated as important as fingerprint and DNA evidence in forensic investigations. Shoeprint verification is used to determine whether two shoeprints could, or could not, have been made by the same shoe. Successful shoeprint verification has tremendous evidentiary value, and the result can link a suspect to a crime, or even link crime scenes to each other. In forensic practice, shoeprint verification is manually performed by forensic experts; however, it is too dependent on experts’ experience. This is a meaningful and challenging problem, and there are few attempts to tackle it in the literatures. In this paper, we propose a multi-layer feature-based method to conduct shoeprint verification automatically. Firstly, we extracted multi-layer features; and then conducted multi-layer feature matching and calculated the total similarity score. Finally, we drew a verification conclusion according to the total similarity score. We conducted extensive experiments to evaluate the effectiveness of the proposed method on two shoeprint datasets. Experimental results showed that the proposed method achieved good performance with an equal error rate (EER) of 3.2% on the MUES-SV1KR2R dataset and an EER of 10.9% on the MUES-SV2HS2S dataset.


Introduction
With the popularity of crime investigation dramas and novels, more and more people know what investigators are looking for while solving a case. This kind of knowledge has affected the investigation and solving of cases, because some criminals may conceal their crime traces after they commit crimes. In fact, if there are no other forms of criminal evidence, as the only remaining evidence at a crime scene, shoeprint may very well reveal some available clues to a particular case.
Shoeprint conveys many important human characteristics, such as walking habits and identity, and it may play a vital role in crime investigation. The typical uses of shoeprints are to link cases, to testify whether the shoeprints are left by the same shoe, then identify the suspect. It is clear that some typical forensic evidence (fingerprints and palmprints) can be successful in linking evidence to a suspect for the evidence's uniqueness. Similar with the evidence stated previously, shoeprint evidence has its own unique identifying characteristics. The identifying characteristics can be classified into three categories: class characteristics, wear characteristics, and individual identifying characteristics [1]. Class characteristics are shoe patterns that are usually created in the manufacturing process, and different shoes may have different shoe patterns. Class characteristics are usually used to recognize or retrieve shoeprints with similar shoe patterns. Wear characteristics are those changes on the experts first manually label the individual identifying characteristics, then compare them in their own opinions; and finally draw a conclusion as to whether the two shoeprints could have come from the same shoe according to their own opinions. This kind of comparison method is too dependent on experts' experience, and the conclusions of different experts are not always similar.
Yekutieli et al. [23] proposed a system to compare individual identifying characteristics with a physical matching method. The individual identifying characteristics used in their method are first marked manually, and then the contour of each individual identifying characteristic is extracted for comparison. This method cannot perform shoeprint verification in an automatic manner. To the best of our knowledge, there are no works focusing on automatic shoeprint verification. Therefore, in this paper, we focus on an automatic shoeprint verification algorithm.
The main contributions of the proposed method are summarized as follows: (i) we propose a multi-layer feature-based shoeprint verification algorithm that can be used in forensic practice; (ii) we introduce a shoeprint partition model (SPM) to analyze shoeprints, which considers foot anatomy and the relationship between shoe and foot, and facilitates analyzing shoeprints accurately in practice; (iii) we propose an individual identifying characteristics detection method to perform characteristics detection automatically; (iv) we propose a shoeprint image matching strategy. The shoeprint is divided into nineteen sections.
Similarities of each section are computed respectively, and the total similarity between the two images is a weighted sum.
The rest of this paper is organized as follows: Section 2 describes the shoeprint acquisitions and datasets, Section 3 details the proposed method, Section 4 provides the performance experiments and the detailed evaluation of the proposed method and Section 5 concludes the paper.

Shoeprint Acquisitions and Datasets
To testify the effectiveness of the proposed methods, we build two kinds of shoeprint verification datasets named MUES-SV1KR2R and MUES-SV2HS2S. Samples from the two shoeprint image datasets are shown in Figure 1.

Shoeprint Acquisitions and Datasets
To testify the effectiveness of the proposed methods, we build two kinds of shoeprint verification datasets named MUES-SV1KR2R and MUES-SV2HS2S. Samples from the two shoeprint image datasets are shown in Figure 1. Images in MUES-SV1KR2R dataset are pressure images acquired by using the shoeprint scanners, and the imaging process is shown in Figure 2. When one steps on the reflective tape (the height of the reflective tape is about 1 mm), different height downward bulges are formed on the surface of the reflective tape because of the uneven pressures. The higher the pressure is, the higher the downward bulges are. The lights of the light source shine through the transparent glass and on the surface of these downward bulges. The lights then produce shadowing between the different heights area of those downward bulges and the lights are reflected to the camera by the mirror. Different amounts of light are captured by the camera for the different heights of bulges, and a shoeprint image can be scanned at 300 dpi. Then the obtained shoeprint images are stored in the MUES-SV1KR2R dataset. Some samples in the MUES-SV1KR2R dataset are shown in Figure 1a. The MUES-SV1KR2R dataset consists of more than 1200 pairs of shoeprint images, and each pair of shoeprints has the same shoe patterns. Of 1200 shoeprint pairs, 600 pairs of shoeprints are made by the same shoes. We refer this kind of shoeprint as a reference or suspect shoeprint. Images in the dataset are registered and have the same spatial resolution. Most images in the database are clear and full. Shoeprint images are available as 3600 by 1800 pixels. Images in MUES-SV1KR2R dataset are pressure images acquired by using the shoeprint scanners, and the imaging process is shown in Figure 2. When one steps on the reflective tape (the height of the reflective tape is about 1 mm), different height downward bulges are formed on the surface of the reflective tape because of the uneven pressures. The higher the pressure is, the higher the downward bulges are. The lights of the light source shine through the transparent glass and on the surface of these downward bulges. The lights then produce shadowing between the different heights area of those downward bulges and the lights are reflected to the camera by the mirror. Different amounts of light are captured by the camera for the different heights of bulges, and a shoeprint image can be scanned at 300 dpi. Then the obtained shoeprint images are stored in the MUES-SV1KR2R dataset. Some samples in the MUES-SV1KR2R dataset are shown in Figure 1a. The MUES-SV1KR2R dataset consists of more than 1200 pairs of shoeprint images, and each pair of shoeprints has the same shoe patterns. Of 1200 shoeprint pairs, 600 pairs of shoeprints are made by the same shoes. We refer this kind of shoeprint as a reference or suspect shoeprint. Images in the dataset are registered and have the same spatial resolution. Most images in the database are clear and full. Shoeprint images are available as 3600 by 1800 pixels. Images in MUES-SV2HS2S are crime scene shoeprints collected by using the crime scene shoeprint scanners, and the imaging process is shown in Figure 3. For an impression left on the ground, the oblique lights from the light source shine to create shadows between the low and high areas of the impression and provide a great amount of contrast between those areas. Then the contours and contents within the impression can be revealed and captured by the camera. For impressions in sand, soil, or snow, the oblique light from the light source 1 can provide the best contrast, and for those impressions in dust or residue, the oblique light from light source 2 is most effective at a very low angle. The distance from light source 1 to the shoeprint is about 5 cm. A shoeprint image is scanned at 300 dpi according to the above process, and the shoeprint images are stored in the MUES-SV2HS2S dataset. Some samples in the MUES-SV2HS2S dataset are shown in Figure 1b. The MUES-SV2HS2S dataset consists of more than 256 pairs of shoeprints with same shoe patterns. Of 256 shoeprint pairs, 128 pairs of shoeprints are made by the same shoes. We refer this kind of shoeprint as a crime scene shoeprint. Images in this dataset are registered and have the same spatial resolution. Shoeprint images are available as 3600 by 1800 pixels.

Methods
The proposed shoeprint verification scheme consists of four basic modules: shoeprint preprocessing module, multi-layer feature extraction module, multi-layer feature matching module, and decision-making module, as shown in Figure 4. Each of these modules plays a vital role in the shoeprint verification system. The functionalities of these modules are stated as that: (1) preprocessing module aims at separating the valid shoeprints from the complex backgrounds, and conducting shoeprint registration and partition; (2) multi-layer feature extraction module aims at representing the characteristics of shoeprints on global layer, partial layer and individual identifying layer; (3) multi-layer feature matching module compares the multi-layer features; and (4) Images in MUES-SV2HS2S are crime scene shoeprints collected by using the crime scene shoeprint scanners, and the imaging process is shown in Figure 3. For an impression left on the ground, the oblique lights from the light source shine to create shadows between the low and high areas of the impression and provide a great amount of contrast between those areas. Then the contours and contents within the impression can be revealed and captured by the camera. For impressions in sand, soil, or snow, the oblique light from the light source 1 can provide the best contrast, and for those impressions in dust or residue, the oblique light from light source 2 is most effective at a very low angle. The distance from light source 1 to the shoeprint is about 5 cm. A shoeprint image is scanned at 300 dpi according to the above process, and the shoeprint images are stored in the MUES-SV2HS2S dataset. Some samples in the MUES-SV2HS2S dataset are shown in Figure 1b. The MUES-SV2HS2S dataset consists of more than 256 pairs of shoeprints with same shoe patterns. Of 256 shoeprint pairs, 128 pairs of shoeprints are made by the same shoes. We refer this kind of shoeprint as a crime scene shoeprint. Images in this dataset are registered and have the same spatial resolution. Shoeprint images are available as 3600 by 1800 pixels. Images in MUES-SV2HS2S are crime scene shoeprints collected by using the crime scene shoeprint scanners, and the imaging process is shown in Figure 3. For an impression left on the ground, the oblique lights from the light source shine to create shadows between the low and high areas of the impression and provide a great amount of contrast between those areas. Then the contours and contents within the impression can be revealed and captured by the camera. For impressions in sand, soil, or snow, the oblique light from the light source 1 can provide the best contrast, and for those impressions in dust or residue, the oblique light from light source 2 is most effective at a very low angle. The distance from light source 1 to the shoeprint is about 5 cm. A shoeprint image is scanned at 300 dpi according to the above process, and the shoeprint images are stored in the MUES-SV2HS2S dataset. Some samples in the MUES-SV2HS2S dataset are shown in Figure 1b. The MUES-SV2HS2S dataset consists of more than 256 pairs of shoeprints with same shoe patterns. Of 256 shoeprint pairs, 128 pairs of shoeprints are made by the same shoes. We refer this kind of shoeprint as a crime scene shoeprint. Images in this dataset are registered and have the same spatial resolution. Shoeprint images are available as 3600 by 1800 pixels.

Methods
The proposed shoeprint verification scheme consists of four basic modules: shoeprint preprocessing module, multi-layer feature extraction module, multi-layer feature matching module, and decision-making module, as shown in Figure 4. Each of these modules plays a vital role in the shoeprint verification system. The functionalities of these modules are stated as that: (1) preprocessing module aims at separating the valid shoeprints from the complex backgrounds, and conducting shoeprint registration and partition; (2) multi-layer feature extraction module aims at representing the characteristics of shoeprints on global layer, partial layer and individual identifying layer; (3) multi-layer feature matching module compares the multi-layer features; and (4)

Methods
The proposed shoeprint verification scheme consists of four basic modules: shoeprint preprocessing module, multi-layer feature extraction module, multi-layer feature matching module, and decision-making module, as shown in Figure 4. Each of these modules plays a vital role in the shoeprint verification system. The functionalities of these modules are stated as that: (1) preprocessing module aims at separating the valid shoeprints from the complex backgrounds, and conducting shoeprint registration and partition; (2) multi-layer feature extraction module aims at representing the characteristics of shoeprints on global layer, partial layer and individual identifying layer; (3) multi-layer feature matching module compares the multi-layer features; and (4) decision-making module calculates the total similarity score, and then makes the decision that the two shoeprints are left by the same shoe or not. decision-making module calculates the total similarity score, and then makes the decision that the two shoeprints are left by the same shoe or not.

Shoeprint Preprocessing
This stage aimed to separate the valid shoeprints from the complex backgrounds and conducting shoeprint registration and partition. Image preprocessing has the following steps: (1) Shoeprint extraction: An image segmentation technique [24] is used to extract the shoeprint images from the complex backgrounds. (2) Image registration: In shoeprint verification applications, accurate shoeprint alignment has a determinative effect, and a FFT-based registration algorithm [25] is used to align the shoeprint images.
(3) Shoeprint partition: A shoe partition model (SPM) is proposed to divide shoeprint image into different sections according to the structure of the foot and the relationship between shoe and foot. The SPM is to divide a shoeprint into different sections with a set of landmarks. Firstly, the contour of a shoeprint is represented by an average shape that is trained by using enough shoeprint images with various shapes. Secondly, three points (e.g., the front most point, rearmost point, and leftmost point) are marked interactively. Thirdly the other points of the contour, which are denoted with green dots as shown in Figure 5b, are estimated by using the interpolation method with the average shape and three points. Finally, the subsections are divided according to the predefined model. Each shoeprint is divided into toe section, sole section, instep section, heel section and back of heel section, and each section is further divided into several non-overlapped subsections for further analysis. The total number of subsections is 19.

Multi-Layer Feature Extraction
Most forensic experts use class, wear, and individual identifying characteristics to analyze shoeprints [1]. Class characteristics are usually caused through the manufacturing process and differ on shoe patterns. Wear characteristics are usually caused by natural erosion, and they reflect personal walking habits. Wear characteristics of shoes worn by different persons are different, even though the shoes are of the same patterns. Individual identifying characteristics are usually

Shoeprint Preprocessing
This stage aimed to separate the valid shoeprints from the complex backgrounds and conducting shoeprint registration and partition. Image preprocessing has the following steps: (1) Shoeprint extraction: An image segmentation technique [24] is used to extract the shoeprint images from the complex backgrounds. (2) Image registration: In shoeprint verification applications, accurate shoeprint alignment has a determinative effect, and a FFT-based registration algorithm [25] is used to align the shoeprint images.
(3) Shoeprint partition: A shoe partition model (SPM) is proposed to divide shoeprint image into different sections according to the structure of the foot and the relationship between shoe and foot. The SPM is to divide a shoeprint into different sections with a set of landmarks. Firstly, the contour of a shoeprint is represented by an average shape that is trained by using enough shoeprint images with various shapes. Secondly, three points (e.g., the front most point, rearmost point, and leftmost point) are marked interactively. Thirdly the other points of the contour, which are denoted with green dots as shown in Figure 5b, are estimated by using the interpolation method with the average shape and three points. Finally, the subsections are divided according to the predefined model. Each shoeprint is divided into toe section, sole section, instep section, heel section and back of heel section, and each section is further divided into several non-overlapped subsections for further analysis. The total number of subsections is 19. decision-making module calculates the total similarity score, and then makes the decision that the two shoeprints are left by the same shoe or not.

Shoeprint Preprocessing
This stage aimed to separate the valid shoeprints from the complex backgrounds and conducting shoeprint registration and partition. Image preprocessing has the following steps: (1) Shoeprint extraction: An image segmentation technique [24] is used to extract the shoeprint images from the complex backgrounds. (2) Image registration: In shoeprint verification applications, accurate shoeprint alignment has a determinative effect, and a FFT-based registration algorithm [25] is used to align the shoeprint images.
(3) Shoeprint partition: A shoe partition model (SPM) is proposed to divide shoeprint image into different sections according to the structure of the foot and the relationship between shoe and foot. The SPM is to divide a shoeprint into different sections with a set of landmarks. Firstly, the contour of a shoeprint is represented by an average shape that is trained by using enough shoeprint images with various shapes. Secondly, three points (e.g., the front most point, rearmost point, and leftmost point) are marked interactively. Thirdly the other points of the contour, which are denoted with green dots as shown in Figure 5b, are estimated by using the interpolation method with the average shape and three points. Finally, the subsections are divided according to the predefined model. Each shoeprint is divided into toe section, sole section, instep section, heel section and back of heel section, and each section is further divided into several non-overlapped subsections for further analysis. The total number of subsections is 19.

Multi-Layer Feature Extraction
Most forensic experts use class, wear, and individual identifying characteristics to analyze shoeprints [1]. Class characteristics are usually caused through the manufacturing process and differ on shoe patterns. Wear characteristics are usually caused by natural erosion, and they reflect personal walking habits. Wear characteristics of shoes worn by different persons are different, even though the shoes are of the same patterns. Individual identifying characteristics are usually

Multi-Layer Feature Extraction
Most forensic experts use class, wear, and individual identifying characteristics to analyze shoeprints [1]. Class characteristics are usually caused through the manufacturing process and differ on shoe patterns. Wear characteristics are usually caused by natural erosion, and they reflect personal walking habits. Wear characteristics of shoes worn by different persons are different, even though the shoes are of the same patterns. Individual identifying characteristics are usually attributed to something randomly added to or taken away from the outsole, and they make the shoeprints unique. We propose a multi-layer feature extraction method to represent these characteristics of shoeprints on global layer, partial layer and individual identifying layer.

Global Layer Feature Extraction
The global layer features that represent class characteristics are used to determine whether two shoeprints are of the same shoe patterns. Based on thousands of crime scene shoeprints, we found that patterns of the sole sections and the heel sections may be collected frequently in crime scenes, and they may determine the retrieval results in most cases. Thus, firstly, the shoeprint is divided into the bottom region and top region as shown in Figure 5a. Here, the ratio of the top region height to the bottom region height is set as 3 to 2, which is learned from training examples. Then the wavelet-Fourier-Mellin (WFM) transform coefficients of both the top and bottom regions are used to describe the class characteristics of the shoeprint [5]. For a given shoeprint image u i , its global layer feature extraction process has two main steps as follows.
Step 1: The two corresponding regions S top (i) and S bottom (i) are decomposed by using Haar wavelet at a specified number of levels. We can acquire one approximation subband and three details at each level. The coefficients can be represented as: where L denotes the maximum level. To avoid merging the useful neighbor patterns, L should meet the criterion: 2 L−1 ≤ D min , where D min represents the minimum distance between two neighbor patterns which can be specified interactively.
Step 2: Perform Fourier-Mellin transform on each band of wavelet coefficients, and then a band passed filter proposed in [25] is used to weaken the effect of connections between patterns and noises such as small holes and broken patterns. We use FMW S top (i) and FMW(S bottom (i)) to denote the global layer feature of the shoeprint u i .

Partial Layer Feature Extraction
The partial layer features represent wear characteristics are used to determine whether two shoeprints have the same natural erosion caused by personal walking habits. Partial layer features are extracted in different subsections. The process of partial layer feature extraction are as follows, firstly the shoeprint is divided into nineteen subsections; secondly, the minimum enclosing rectangle region of each subsection is extracted and divided into non-overlapping patches; finally, the intensity distribution of each patch is used as the partial layer feature, which is represented as F P . Figure 6 shows the process of partial layer feature extraction. attributed to something randomly added to or taken away from the outsole, and they make the shoeprints unique. We propose a multi-layer feature extraction method to represent these characteristics of shoeprints on global layer, partial layer and individual identifying layer.

Global Layer Feature Extraction
The global layer features that represent class characteristics are used to determine whether two shoeprints are of the same shoe patterns. Based on thousands of crime scene shoeprints, we found that patterns of the sole sections and the heel sections may be collected frequently in crime scenes, and they may determine the retrieval results in most cases. Thus, firstly, the shoeprint is divided into the bottom region and top region as shown in Figure 5a. Here, the ratio of the top region height to the bottom region height is set as 3 to 2, which is learned from training examples. Then the wavelet-Fourier-Mellin (WFM) transform coefficients of both the top and bottom regions are used to describe the class characteristics of the shoeprint [5]. For a given shoeprint image i u , its global layer feature extraction process has two main steps as follows. Step where L denotes the maximum level. To avoid merging the useful neighbor patterns, L should meet the criterion: D represents the minimum distance between two neighbor patterns which can be specified interactively.
Step 2: Perform Fourier-Mellin transform on each band of wavelet coefficients, and then a band passed filter proposed in [25] is used to weaken the effect of connections between patterns and noises such as small holes and broken patterns. We use

Partial Layer Feature Extraction
The partial layer features represent wear characteristics are used to determine whether two shoeprints have the same natural erosion caused by personal walking habits. Partial layer features are extracted in different subsections. The process of partial layer feature extraction are as follows, firstly the shoeprint is divided into nineteen subsections; secondly, the minimum enclosing rectangle region of each subsection is extracted and divided into non-overlapping patches; finally, the intensity distribution of each patch is used as the partial layer feature, which is represented as P F . Figure 6 shows the process of partial layer feature extraction.  According to human walking habits, the importance of subsections are different, so we defined subsection weights as shown in Table 1, and subsections that likely occur wear characteristics have larger weights. Individual identifying layer feature mainly detects the individual identifying characteristics caused by the material removed from the outsole or objects stuck on the outsole. An individual identifying layer feature may be represented as an interest point, line, or region on a shoeprint image. The word 'individual' here means that an individual feature should possess some very desirable criteria including nonrepeatability, that is, the individual feature cannot be detected in a shoeprint twice or two shoeprints left by different shoes. The individual identifying characteristics on a shoeprint mainly have two kinds of structures: the corner structure and the blob-like structure. Some samples of the two structures are shown in Figure 7. According to human walking habits, the importance of subsections are different, so we defined subsection weights as shown in Table 1, and subsections that likely occur wear characteristics have larger weights. Individual identifying layer feature mainly detects the individual identifying characteristics caused by the material removed from the outsole or objects stuck on the outsole. An individual identifying layer feature may be represented as an interest point, line, or region on a shoeprint image. The word 'individual' here means that an individual feature should possess some very desirable criteria including nonrepeatability, that is, the individual feature cannot be detected in a shoeprint twice or two shoeprints left by different shoes. The individual identifying characteristics on a shoeprint mainly have two kinds of structures: the corner structure and the blob-like structure. Some samples of the two structures are shown in Figure 7. To extract individual identifying layer features, the proposed method is based on two classes of interest point detectors. The two classes of feature points are described as follows: The first class of feature points used in the proposed method is based on Harris corners. This class of feature points is used to detect the individual identifying layer features that have a corner structure. The individual identifying characteristic with a corner structure consists of corner points that can be the intersections of two edges, and these corners usually have conventional structures, such as L-corners, T-junctions, and Y-junctions. The Harris corner detector has a strong response to the intersections of two edges, and it detects corner points by using first order derivations of an image f as follows: x x y x y x y y x y xy xy w x y x y x y x y where x f and y f denote the first order derivative of f with respect to the x and y direction.
Let ( , ) w x y represent the weighting function, and it is usually replaced by a Gaussian function To extract individual identifying layer features, the proposed method is based on two classes of interest point detectors. The two classes of feature points are described as follows: The first class of feature points used in the proposed method is based on Harris corners. This class of feature points is used to detect the individual identifying layer features that have a corner structure. The individual identifying characteristic with a corner structure consists of corner points that can be the intersections of two edges, and these corners usually have conventional structures, such as L-corners, T-junctions, and Y-junctions. The Harris corner detector has a strong response to the intersections of two edges, and it detects corner points by using first order derivations of an image f as follows: where f x and f y denote the first order derivative of f with respect to the x and y direction. Let w(x, y) represent the weighting function, and it is usually replaced by a Gaussian function g(x, y, σ) of a The multiple scales Harris detector can be performed based on: where σ i and σ d are the integration and differentiation scale, respectively [26,27]. We can compute the corner response of a point according to the advices in [28] as follows: where λ 1 and λ 2 represent the eigenvalues of A(x, y, σ i , σ d ). k is a parameter that can be empirically set as 0.04. As shown in Figure 7, individual identifying characteristics with the corner structure are always described by more than one corner point. We cluster these feature points in the position space, and choose center points of each cluster as the first class of candidate individual identifying characteristic points, and these points can be denoted as N 1 where (x i , y i ) denotes the position and N 1 is the number of the first class of candidate individual identifying characteristic points.
The second class of feature points in the proposed method is based on Hessian detector, and a Hessian detector can be performed by using the Hessian Matrix [27]. We use the Hessian detector to detect the individual identifying layer features with the blob-like structure, because the Hessian detector has a strong response to blobs. The Hessian Matrix can be represented as follows: We can compute the Hessian corner response of a point by using the following formula: The second class of candidate individual identifying characteristic points are denoted as B 2 = (x i , y i ) i = 1, . . . , N 2 , where (x i , y i ) denotes the position and N 2 is the number of the second class of candidate individual identifying characteristic points.
Uniqueness is the most visible trait that is used to distinguish individual identifying characteristics from shoe patterns. For a candidate individual identifying characteristic (x i , y i ) ∈ B and the patch centered on it, where B = B 1 ∪ B 2 = (x i , y i ) i = 1, 2, . . . , N , and N = N 1 + N 2 . We use the intensity distribution of each patch centered on each point of B to describe these points. Individual identifying characteristics are usually produced on the outsole randomly, which means that the individual identifying characteristics must be unique in a local region. In the process of individual identifying characteristic detection, periodic patterns are usually mistaken for individual identifying characteristic, as shown in Figure 8. We tell whether a point (x i , y i ) has the same pattern as point (x j , y j ) by calculating the normalized cross-correlation of the Fourier feature of the regions centering at these points:

Multi-Layer Feature Matching
Two matching techniques are introduced in the work presented in this paper, which are the wavelet-Fourier-Mellin transform and similarity estimation (WFSE) method [5] and Ciratefi [29]. Matching using WFSE method computes the wavelet-Fourier-Mellin feature distances between the probe shoeprint and questioned shoeprint. The second matching technique matches a patch in the probe shoeprint against patches in the questioned shoeprint and computes the matching score with the best matching patch. . Please refer to [5] for details.
With regard to the partial layer feature and individual identifying layer feature, we used a template matching method named Ciratefi [29] to find its best matching patch within a searching window of the questioned shoeprint. Figure 9 shows an example of the template matching method. An individual identifying layer feature F is extracted and encircled by the white dotted box as shown in Figure 9a, and patch A is an extension patch and locates at the same position in the questioned shoeprint as shown in Figure 9b, we aimed at finding the most similar patch with feature F in patch A , and assigned the maximum similarity value as the matching score   S F , which represents the similarity value of two shoeprints on feature F . The method compares feature F with every patch in a sliding window manner from patch A . The template matching method integrates three cascaded filters to yield a satisfactory result. The three filters are circular sampling filter, radial sampling filter, and template matching filter. The former two filters are used to eliminate candidate patches that have no opportunity to match with the probe template F , and the last filter is utilized to find the best matching patch and compute the matching score.

Multi-Layer Feature Matching
Two matching techniques are introduced in the work presented in this paper, which are the wavelet-Fourier-Mellin transform and similarity estimation (WFSE) method [5] and Ciratefi [29]. Matching using WFSE method computes the wavelet-Fourier-Mellin feature distances between the probe shoeprint and questioned shoeprint. The second matching technique matches a patch in the probe shoeprint against patches in the questioned shoeprint and computes the matching score with the best matching patch.
For the global layer feature of a pair of shoeprints U, similarities of the top region and the bottom region are computed according to the 2D correlation coefficient, respectively, and the similarity value of class characteristics P G (U) is computed by their correlation coefficient of global layer features. It is a weighted sum of correlation coefficients of both FMW S top (i) and FMW(S bottom (i)). Please refer to [5] for details.
With regard to the partial layer feature and individual identifying layer feature, we used a template matching method named Ciratefi [29] to find its best matching patch within a searching window of the questioned shoeprint. Figure 9 shows an example of the template matching method. An individual identifying layer feature F is extracted and encircled by the white dotted box as shown in Figure 9a, and patch A is an extension patch and locates at the same position in the questioned shoeprint as shown in Figure 9b, we aimed at finding the most similar patch with feature F in patch A, and assigned the maximum similarity value as the matching score S(F), which represents the similarity value of two shoeprints on feature F. The method compares feature F with every patch in a sliding window manner from patch A. The template matching method integrates three cascaded filters to yield a satisfactory result. The three filters are circular sampling filter, radial sampling filter, and template matching filter. The former two filters are used to eliminate candidate patches that have no opportunity to match with the probe template F, and the last filter is utilized to find the best matching patch and compute the matching score. The process of the template matching method is detailed as follows. A circular sampling filter projects the images A and F on a set of circular rings, and uses these projections to detect the first level candidate patches and their corresponding matching scales. Given the template F , a scaled x y : With regard to the image A , the circular projections for each 0 0 ( , ) x y in A can be denoted as: The correlation coefficient of F C and A C is used to detect the best matching scale for each pixel 0 0 ( , ) x y in A , which can be represented as: A candidate pixel x y can be obtained as follows: A radial sampling filter projects the images A and i F on a set of radial lines, and uses these projections to eliminate the first level candidate that have no opportunity to match the probe template i F . Radial sampling filter can obtain the second-grade candidate pixels by eliminating some of the first grade candidate pixels; meanwhile, it can also obtain their corresponding matching orientations. Given a first-grade candidate pixel The process of the template matching method is detailed as follows. A circular sampling filter projects the images A and F on a set of circular rings, and uses these projections to detect the first level candidate patches and their corresponding matching scales. Given the template F, a scaled image F i with the scale parameter σ i (i = 1, 2, . . . , n) can be represented by using Equation (4). Then, each scaled image F i is circularly sampled in a set of circle rings situated at radius r k (k = 1, 2, . . . , l) from the center pixel (x, y): With regard to the image A, the circular projections for each (x 0 , y 0 ) in A can be denoted as: A(x 0 + r k cos θ, y 0 + r k sin θ), (x 0 , y 0 ) ∈ A (11) The correlation coefficient of C F and C A is used to detect the best matching scale for each pixel (x 0 , y 0 ) in A, which can be represented as: A candidate pixel (x 0 , y 0 ) in A can be detected as the first grade candidate pixel if X(x 0 , y 0 ) > t 1 , where t 1 denotes the threshold. The corresponding best matching scale of the first-grade candidate pixel (x 0 , y 0 ) can be obtained as follows: A radial sampling filter projects the images A and F i on a set of radial lines, and uses these projections to eliminate the first level candidate that have no opportunity to match the probe template F i . Radial sampling filter can obtain the second-grade candidate pixels by eliminating some of the first grade candidate pixels; meanwhile, it can also obtain their corresponding matching orientations. Given a first-grade candidate pixel (x 0 , y 0 ) and its corresponding matching scale σ i (i = G 1 (x 0 , y 0 )), the scaled template image F i with the scale parameter s i can be represented by using Equation (4). The scaled image F i is radially sampled on the radial line situated at radius r l from the center pixel (x, y) and inclination θ j : For each first-grade candidate pixel (x 0 , y 0 ), the radial projections can be denoted as: The correlation coefficient of R F and R A is used to detect the best matching scale for each pixel (x 0 , y 0 ) in A, which can be represented as: where "shift" means circular shifting j positions of the argument vector. The first grade candidate pixel (x 0 , y 0 ) in A can be reserved and updated as the second grade candidate pixel if Y(x 0 , y 0 ) > t 2 , and t 2 denotes the threshold. The corresponding best matching orientation of the second-grade candidate pixel (x 0 , y 0 ) can be obtained as follows: The template matching filter computes the similarity between the patch X centered at each second grade candidate pixel (x 0 , y 0 ) and the scaled and rotated template F i that use the best matching scale σ i (i = G 1 (x 0 , y 0 )) and the best matching angle θ j ( j = G 2 (x 0 , y 0 )), and the structural similarity (SSIM) method proposed in [30] was used to evaluate the similarity as follows: where µ X denotes the mean of X, σ X denotes the standard deviation of X, and σ XF denotes the covariance of X and F i . The constants C 1 ,C 2 , and C 3 are used to avoid the denominators being close to zeroes. α, β, and γ are parameters which are used to adjust the importance of the corresponding components. The second grade candidate pixel (x 0 , y 0 ) with the highest structural similarity is chosen, and the template is considered to be found at pixel (x 0 , y 0 ), and the highest structural similarity is set as the feature similarity of this patch.
The similarity values of partial and individual identifying layer features between two shoeprints are calculated by using the Ciratefi method. The similarity values of two-layer features in subsection s i are computed as follows: where F P,m and F I,n denote the mth partial layer patch and nth individual identifying layer patch in the subsection s i , respectively, and S(F P,m ) and S(F I,n ) represent the similarity value of two shoeprints on feature F P,m and F I,n . N P (s i ) and N I (s i ) denote the number of partial layer features and individual identifying layer features in s i , respectively. P P (U) = P P (s i ) i = 1, 2, · · · , 19 and P I (U) = P I (s i ) i = 1, 2, · · · , 19 represent partial and individual identifying characteristics similarity value set of two shoeprints in nineteen subsections.

Decision-Making
For fusing the three layer features at the score level, the total similarity score p ∈ R + is used to denote the possibility that two shoeprints could have been made by the same shoe. For a pair of shoeprints U, their total similarity score p is defined as a function of the similarity values of global, partial, and individual identifying layer features, which can be expressed as p = f (P G , P P , P I , P W ) = sgn(P G − T G )P PI (21) where sgn(•) is a sign function. T G denotes a threshold that is used to determine whether the two shoeprints have the same class characteristics. P PI represents the combined feature score of two shoeprints on both partial and individual identifying layer features, which is mathematically defined as follows: where P W (s i ) is the weight value of subsection s i , and P W ( represents the weight order, which are listed in Table 1.
For a pair of shoeprints, if p ≤ 0, then we can make a conclusion that the two shoeprints may have different patterns; if p is greater than a positive predefined threshold T T , then we conclude that the two shoeprints could be left by the same shoe, otherwise they could not be left by the same shoe. The decision rule can be described as follows: where V(p) = 1 indicates the two shoeprints are from the same shoe, and V(p) = 0 denotes the two shoeprints are from different shoes.
Accord to Equations (21) and (23), two thresholds T G and T T are needed to be pre-trained to tell whether two shoeprint are from the same shoe. We used the decision cost function (DCF) [31] to choose these two thresholds, that is to say where FAR(T G , T T ) and FRR(T G , T T ) denote the false acceptance rate and the false rejection rate at the threshold T G and T T , which can be calculated with Equations (25) and (26). The proposed multi-layer-based shoeprint verification method is summarized in Algorithm 1.

Experiments
In this section, we evaluate the proposed method by conducting shoeprint verification experiments on the real crime scene shoeprint image database and the suspect shoeprint image database. The false rejection rate (FRR), the false acceptance rate (FAR), and the equal error rate (EER) are used to evaluate the proposed method. The following subsections detail the experiment configuration, performance evaluation, analysis and discussion.

Experiment Configuration
In the experiments, we automatically aligned each pair of shoeprints. Thus, the main differences between the shoeprints are of a structural nature (i.e., the class characteristics, wear characteristics, and individual identifying characteristics). The registration make it possible to compare our method with some robust methods that are not invariant to rigid transformations, e.g., the histogram of oriented gradients (HOG) method [32] and the normalized cross correlation method (NCC). We also compared some RST invariant methods, e.g., the SURF algorithm [33] with the proposed method.

Dataset
The dataset consisted of one testing set and one training set. The definitions and descriptions of the two kinds of datasets are detailed as follows.
(i) Testing set: A testing set is a collection of shoeprint images that need to be verified. The testing set contained 1200 pairs of reference shoeprints in the MUES-SV1KR2R dataset and 256 pairs of crime scene shoeprints in the MUES-SV2HS2S dataset. (ii) Training set: The training set is a collection of shoeprint images used to train the thresholds. The training set consisted of 300 pairs of reference shoeprint images and 100 pairs of crime scene shoeprint images. One hundred pairs of reference shoeprint images were from the same shoes. Twenty-five pairs of crime scene shoeprints are from the same shoes, and fifty pairs of shoe prints were not of the same class characteristics. Then accord to Equation (24), the optimal T G and T T can be achieved by operating shoeprint verification on the training set.

Evaluation Metric
In our experiments, we used the false acceptance rate (FAR), the false rejection rate (FRR), and the equal error rate (EER) to evaluate and characterize the proposed shoeprint verification system. These evaluation methods are always used to evaluate the performance of biometric verification, such as the finger vein verification [34,35] and palm vein verification [36].
The false acceptance rate means that deciding a claimed identity is a legitimate one while in reality it is an imposter. The FAR can be expressed as follows: where N FA and N IA denote the number of false acceptances and the number of imposter attempts, respectively. The false rejection rate means that deciding that a claimed identity is a not legitimate one while in reality it is the genuine one. The FRR can be expressed as follows: where N FR and N TA denote the number of failed rejections and the number of target access, respectively. The equal error rate (EER) represents the point where the false acceptance rate and the false rejection rate are equal, and a good system should keep this value as small as possible.

Performance Evaluation
In this section, we conducted two kinds of experiments to evaluate the performance of the proposed method. The first kind of experiment was to verify the performance of the proposed shoeprint verification method. The second kind of experiments was to compare the proposed method with some conventional methods.

Performance Evaluation of the Proposed Method
We evaluated the performance of the proposed method on two shoeprint datasets, and the datasets consisted of genuine and imposter pairs that included intra-class and inter-class matches. In this work, intra-class matching was defined as matching between a pair of shoeprint samples that were made by the same shoe, and inter-class matching was defined as matching between a pair of shoeprint samples that were left by different shoes. The experimental results are shown as follows, and Figure 10 shows the histograms of the similarity value for genuine and imposter shoeprint samples in MUES-SV1KR2R and MUES-SV2HS2S dataset, respectively. The left blue lines show the frequency distribution for inter-class similarity value, while the right red lines show the frequency distribution for intra-class similarity value. The results show that the frequency distribution of similarity value of inter-class are significantly lower than that of intra-class, which implies that the similarity value of the multiple-layer features for shoeprint samples left by the same shoe is significantly higher than the similarity value for shoeprint samples left by different shoe. The results also show that it is possible to tell whether two shoeprints are left by the same shoe.

Performance Evaluation
In this section, we conducted two kinds of experiments to evaluate the performance of the proposed method. The first kind of experiment was to verify the performance of the proposed shoeprint verification method. The second kind of experiments was to compare the proposed method with some conventional methods.

Performance Evaluation of the Proposed Method
We evaluated the performance of the proposed method on two shoeprint datasets, and the datasets consisted of genuine and imposter pairs that included intra-class and inter-class matches. In this work, intra-class matching was defined as matching between a pair of shoeprint samples that were made by the same shoe, and inter-class matching was defined as matching between a pair of shoeprint samples that were left by different shoes. The experimental results are shown as follows, and Figure 10 shows the histograms of the similarity value for genuine and imposter shoeprint samples in MUES-SV1KR2R and MUES-SV2HS2S dataset, respectively. The left blue lines show the frequency distribution for inter-class similarity value, while the right red lines show the frequency distribution for intra-class similarity value. The results show that the frequency distribution of similarity value of inter-class are significantly lower than that of intra-class, which implies that the similarity value of the multiple-layer features for shoeprint samples left by the same shoe is significantly higher than the similarity value for shoeprint samples left by different shoe. The results also show that it is possible to tell whether two shoeprints are left by the same shoe.

Comparison and Discussion
We also used the equal error rate to evaluate the performance of the methods on two datasets by making trade-offs between the false acceptance rate and the false rejection rate. The performance

Comparison and Discussion
We also used the equal error rate to evaluate the performance of the methods on two datasets by making trade-offs between the false acceptance rate and the false rejection rate. The performance of the methods is shown in the Figure 11, where the equal error rates of the proposed method were 3.2% and 10.9% on the MUES-SV1KR2R and MUES-SV2HS2S, respectively. To our best knowledge, there is no automatic shoeprint verification algorithm reported up to now, so we could not compare the proposed algorithm with the state-of-the-art methods. To further test the effectiveness of the proposed method, we conducted comparisons with some existing conventional interest point-based methods. To conduct these comparisons, we implemented the conventional methods based on the corresponding literatures. In the experiments, we mainly compared our method with some interest point detections, such as the Harris [28] and Shi-Tomasi [37] corner detectors. We integrated the normalized cross correlation method (NCC) and the histograms of oriented gradients method (HOG) with the interest point detections. The patch centered on each interest point was used to describe the point, and HOG and NCC methods were used to describe the patch and match shoeprints. We also compared some RST invariant methods with the proposed method, e.g., the SURF algorithm [33].
The compared methods were all implemented with MATLAB codes. The patch size was fixed in all experiments. To ensure a fair comparison, all of the shoeprints in the datasets were aligned. The receiver operating characteristics (ROC) curve on the MUES-SV1KR2R dataset are illustrated in Figure 11a, and the corresponding results are listed in Table 2. The results show that the NCC method and HOG method perform nearly equally for all interest points, respectively, and the NCC method can perform better than the HOG method on average. The reason may be that shoeprint registration is performed by using a FFT-based registration algorithm [25], and there are variations on transformation between shoeprints. NCC methods can be robust to transformational deviations. If there are not large variations on transformation between shoeprint, the HOG method may surpass the NCC method, because HOG methods take into consideration local structural information and can be robust to small transformational deviations by pooling local structural information into histograms. The results in general show that all of the methods perform well on MUES-SV1KR2R dataset. Our approach performs significantly better compared to the compared methods.  To our best knowledge, there is no automatic shoeprint verification algorithm reported up to now, so we could not compare the proposed algorithm with the state-of-the-art methods. To further test the effectiveness of the proposed method, we conducted comparisons with some existing conventional interest point-based methods. To conduct these comparisons, we implemented the conventional methods based on the corresponding literatures. In the experiments, we mainly compared our method with some interest point detections, such as the Harris [28] and Shi-Tomasi [37] corner detectors. We integrated the normalized cross correlation method (NCC) and the histograms of oriented gradients method (HOG) with the interest point detections. The patch centered on each interest point was used to describe the point, and HOG and NCC methods were used to describe the patch and match shoeprints. We also compared some RST invariant methods with the proposed method, e.g., the SURF algorithm [33].
The compared methods were all implemented with MATLAB codes. The patch size was fixed in all experiments. To ensure a fair comparison, all of the shoeprints in the datasets were aligned. The receiver operating characteristics (ROC) curve on the MUES-SV1KR2R dataset are illustrated in Figure 11a, and the corresponding results are listed in Table 2. The results show that the NCC method and HOG method perform nearly equally for all interest points, respectively, and the NCC method can perform better than the HOG method on average. The reason may be that shoeprint registration is performed by using a FFT-based registration algorithm [25], and there are variations on transformation between shoeprints. NCC methods can be robust to transformational deviations. If there are not large variations on transformation between shoeprint, the HOG method may surpass the NCC method, because HOG methods take into consideration local structural information and can be robust to small transformational deviations by pooling local structural information into histograms. The results in general show that all of the methods perform well on MUES-SV1KR2R dataset. Our approach performs significantly better compared to the compared methods. The receiver operating characteristics (ROC) curve on the MUES-SV2HS2S dataset are illustrated in Figure 11b, and the corresponding results are listed in Table 3. The results show that HOG method can perform better than the NCC method. The reason may be that shoeprints in the MUES-SV2HS2S dataset are always of low quality and shoeprints made by the same shoe are not exactly the same due to the varying imaging conditions. HOG methods take into consideration local structural information and can be robust to small transformational deviations by pooling local structural information into histograms. The results also show that the EER of all of the compared methods increase a lot, compared with shoeprint verification on MUES-SV1KR2R dataset. The main reason may be that shoeprint images in the MUES-SV2HS2S dataset are of low quality and with complicated backgrounds, and it results in that the interest points always lying on the shoeprint impression boundary and background. Our approach performes significantly better compared to the other feature detector and descriptor combinations. The results are summarized in Table 1. Our method also performes better than on the Harris, Shi-Tomasi, and SURF interest point method. That is mainly because (i) the multi-layer features used in our method use global and local information to analyze shoeprints and integrate their strengths to yield more satisfactory verification results; and (ii) we analyze and estimate the similarity of the local structural information by using the SSIM method.  Figure 12 shows three samples of shoeprint in the MUES-SV2HS2S dataset. The shoeprints in Figure 12 have the same shoe pattern, and the shoeprint shown in Figure 12b is left by a different shoe from Figure 12a,c. The main individual identifying characteristics are labeled with red circles as shown in Figure 12a. Figures 13 and 14 provide a visual illustration of the shoeprint verification results of the proposed method and the compared methods. We extract individual identifying characteristics on Figure 12a and compare these extracted characteristics at the same position in Figure 12b,c. The results between Figure 12a,b are shown in Figure 13, and the results between Figure 12a,c are shown in Figure 14. In Figure 13, the unmatched points are labeled with blue '*' s. The unmatched points denote that the two shoeprints in the point location were different. The matched points are labeled with blue '*'s in Figure 14. The results show that our approach can tell not only the differences between two shoeprint images but also the characteristics in common. The results show that the proposed method outperformed the compared methods.  Figure 13, and the results between Figure 12a,c are shown in Figure 14. In Figure 13, the unmatched points are labeled with blue '*' s. The unmatched points denote that the two shoeprints in the point location were different. The matched points are labeled with blue '*'s in Figure 14. The results show that our approach can tell not only the differences between two shoeprint images but also the characteristics in common. The results show that the proposed method outperformed the compared methods.   Figure 12a,b are shown in Figure 13, and the results between Figure 12a,c are shown in Figure 14. In Figure 13, the unmatched points are labeled with blue '*' s. The unmatched points denote that the two shoeprints in the point location were different. The matched points are labeled with blue '*'s in Figure 14. The results show that our approach can tell not only the differences between two shoeprint images but also the characteristics in common. The results show that the proposed method outperformed the compared methods.  Tables 2 and 3. Experimental results show that the proposed method required 280.6 s and 293.3 s of computation time for shoeprint verification on the two datasets. Although the proposed method requires more time to verify shoeprints than the compared methods, the performance of the proposed method surpasses the compared methods. Compared with the methods stated above, the proposed method matches not only the individual identifying layer feature, but also the global layer and partial layer feature, so it requires more computation. It is also for that more time is spent on the calling the Ciratefi method, because the Ciratefi method needs a lot of computations. In future works, we would like to reduce computation time by introducing parallel computing technology and optimizing the program code.  Finally, we compare the time-consumption of the methods stated previously. This kind of experiments is conducted on a Windows 64-bit system with a 3.3-GHz CPU and 8-GB RAM. The

Conclusions
In this work, we have proposed a multi-layer feature-based method to conduct shoeprint verification automatically. Multi-layer features were extracted to compute the similarity values of two shoeprints on each layer, and then a total similarity score was computed based on these similarity values and a conclusion was made by a piecewise function of total similarity score. Comparative experiments with state-of-the-art methods were conducted, and the experimental results showed that: (1) our method had a better performance than the SURF method as well as some other state-of-the-art methods; and (2) the higher the quality of the shoeprint images were, the lower the EER of the methods can be achieved. In conclusion, the EERs on the two shoeprint image databases, which were 3.2% and 10.9%, demonstrate that our method is an effective shoeprint verification method, especially for images derived by using the shoeprint scanners. Our next work is to design a much more effective classifier to improve the verification algorithm, and we will also focus on how to improve the performance on complex backgrounds.

Conflicts of Interest:
The authors declare no conflict of interest.