Inference of a compact representation of sensor fingerprint for source camera identification

Sensor pattern noise (SPN) is an inherent fingerprint of imaging devices, which provides an effective way for source camera identification (SCI). Although SPNs extracted from large image blocks usually yield high identification accuracy, their high dimensionality incurs a high computational cost in the matching stage, consequently hindering many applications that require efficient camera matching. In this work, we employ and evaluate the concept of principal component analysis (PCA) de-noising in SCI tasks. Based on this concept, we present a framework that formulates a compact SPN representation. To enhance the de-noising effect, we introduce a training set construction procedure that minimizes the impact of various interfering artifacts, which is especially useful in some challenging cases, e.g., when only textured reference images are available. To further boost the SCI performance, a novel approach based on linear discriminant analysis (LDA) is adopted to extract more discriminant SPN features. To evaluate our methods, extensive experiments are conducted on the Dresden image database. The results indicate that the proposed framework can serve as an effective post-processing procedure, which not only boosts the performance, but also greatly reduces the computational cost in the matching phase.


Introduction
Nowadays, the use of digital images or videos as evidence in the fight against physical crime and cybercrime is the norm, which makes multimedia forensics crucial. Typically, multimedia forensics includes source camera verification and identification, source-oriented image classification, integrity verification, forgery detection, authentication, etc. Source camera identification, as an important branch of multimedia forensics, is about answering the question: Which one of the many cameras has taken the image in question? This is actually a task of matching the camera fingerprint of an image in question against a set of reference fingerprints, each representing a different camera. The size of the reference fingerprint set can be in the order of millions. How to deal with such a task more accurately and efficiently is the focus of this paper.
In order to link digital images to their source cameras, many techniques have been proposed in the last two decades, which can be broadly divided into three categories. The most widely used cue is the SPN, which is mainly caused by the non-homogeneity of silicon wafers and manifests itself as slight variations in the intensity of individual pixels. For instance, even if a sensor takes an image of an evenly lit scene, the resulting image will still exhibit slight changes in intensity between individual pixels [3]. Every image taken by the same sensor exhibits the same SPN pattern, while two sensors, even made from the same silicon wafer, exhibit uncorrelated patterns [3].
The dimensionality of an SPN is as large as that of the original image. As a result, not only does each SPN need a fairly large amount of storage space, but memory access also takes a considerable amount of time. Moreover, SPN matching involves vector operations whose complexity is proportional to the size of the SPNs. Thus, with a large number of reference SPNs in the database to be matched, the complexity of the matching process becomes a critical concern.
In order to address the high complexity issue, many efforts [12][13][14][15][16][17][18] have been made in recent years. In [12], Bayram et al. embedded reference SPNs in a binary search tree, where each leaf/internal node represents a reference/composite SPN. Based on this structure, the total number of SPN matchings to be performed is substantially reduced. However, errors tend to increase significantly when a large number of reference SPNs are stored in a single binary tree. On the other hand, more methods reduce the computational complexity by compressing the SPN. In [13,14], the authors introduced an SPN digest technique for dimensionality reduction, which preserves the largest elements and their corresponding locations. In [15], Bayram et al. binarized the SPN, which considerably reduces the storage requirements and speeds up loading of SPNs into memory. However, the binarization process inevitably degrades the matching accuracy due to information loss. In [16,17], Valsesia et al. reduced the dimensionality of SPN using random projection. However, since the subspace is randomly selected, the obtained representation is unlikely to be optimal and tends to compromise the matching accuracy.
To alleviate the common limitation (i.e., reduced accuracy) of the afore-mentioned SPN compression methods [13][14][15][16][17], in our previous work [19,20], we presented a feature extraction algorithm based on the concept of PCA de-noising [21,22], and promising results were achieved on a small dataset. However, this method is based on the assumption that the training set is well representative of the population so that an effective SPN feature extractor can be learned. Unfortunately, the noise residuals in the training set can be contaminated by many sources of interference, making the training set less representative. To learn a robust SPN feature extractor from the noisy training data, in this work we further propose a training set construction procedure and provide its theoretical basis. We also provide a more detailed discussion of the SPN feature extractors and treat them as a general post-processing framework for other SPN methods. It is evaluated in terms of effectiveness and efficiency on a much larger dataset. We also test this framework on some challenging cases, e.g., when all the reference SPNs are extracted from images with significant scene details (a form of distortion to the SPN), a scenario barely considered by previous works.
The rest of this paper is organized as follows. Section 2 provides a brief review of the three main steps of an SPN-based SCI system. In Section 3, we present the proposed training set construction procedure and the feature extraction method in detail. In Section 4, the proposed source camera identification method is summarized, which is then followed by extensive experimental evaluations in Section 5. Section 6 concludes the work. Note that, in this manuscript, we use bold upper-case letters to represent matrices, and bold lower-case letters to denote vectors.

Background
In order to decide whether a query image is taken by one of the cameras in a large dataset, three main steps are required, i.e., SPN extraction, reference SPN estimation and SPN matching. In this section, techniques for these three steps are briefly reviewed.

SPN extraction
The most important step of the SPN-based SCI framework is to extract the SPNs from digital images. In [4], Chen et al. modeled the output of an imaging sensor as

  I = I^(0) + I^(0) K + Θ.   (1)

In Eq. (1), I is the observed image, I^(0) is the noiseless sensor output, and I^(0) K represents the discriminative part of the SPN, i.e., the PRNU noise, which is a multiplicative noise and the signal of our interest. The matrix K is the PRNU multiplicative factor, where all the elements in it are typically close to 0. Θ is a combination of random noise, such as shot noise, read-out noise, and quantization noise.
In order to extract the signal of interest I^(0) K from the observation I, the host signal I^(0) should be removed. Generally, the noiseless image I^(0) is unknown, but it can be estimated by de-noising the observation I, i.e., Î^(0) = F(I), where F denotes a de-noising algorithm and Î^(0) is an estimate of the noiseless image I^(0). The signal of interest can then be roughly extracted by subtracting this estimate from the observation I:

  X = I − F(I) = I K + Ξ,   (2)

where X is the noise residual in which the true SPN is present, and Ξ is the sum of Θ and two additional noise terms introduced by the de-noising filter.
From Eq. (2), one can see that the better the de-noising algorithm F is, the closer the de-noised version Î^(0) is to the noiseless image I^(0), and thus the less noise is introduced by the de-noising filter and left in the final output X. Therefore, the performance of an SPN extractor is primarily determined by the choice of the de-noising algorithm F. In [3], Lukas et al. proposed to transform the noisy image I into the wavelet domain and apply the Mihcak filter [23] to extract the SPN components from the high-frequency wavelet coefficients of I. In [24], Chierchia et al. proposed to replace the Mihcak filter with a more recent technique, namely sparse 3D transform-domain collaborative filtering (BM3D) [25]. In [26], Kang et al. proposed an SPN predictor based on context adaptive interpolation (PCAI), which applies the context adaptive interpolator [27] as the de-noising function F to predict the noiseless image I^(0) and extract the SPN in the spatial domain.
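As a concrete illustration of the extraction step X = I − F(I), the sketch below uses SciPy's adaptive Wiener filter as a simple stand-in for the wavelet-domain Mihcak filter of [3]; the function name is ours.

```python
import numpy as np
from scipy.signal import wiener

def extract_noise_residual(image: np.ndarray) -> np.ndarray:
    """Estimate the noise residual X = I - F(I), with a 3x3 adaptive
    Wiener filter standing in for the de-noising function F."""
    image = image.astype(np.float64)
    denoised = wiener(image, mysize=3)  # F(I): rough estimate of I^(0)
    return image - denoised             # X: residual carrying the SPN
```

In practice the residual is further processed (e.g., enhanced and averaged) before matching; this sketch only shows the subtraction step.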
Also demonstrated in Eq. (2) is the fact that the noise residual X contains not only the SPN term IK but also the noise term Ξ. This leaves room for further enhancement. In [5], Li demonstrated that the noise residual contains the traces of scene details. Therefore, Li proposed five enhancing models to attenuate the impact of scene details. In [28], Li and Li proposed a color-decoupled SPN extraction method to prevent the color interpolation errors from propagating into the noise residual. In [29], Chen et al. proposed to suppress the JPEG blocky artifacts by transforming the noise residual into the discrete Fourier transform domain and suppressing the Fourier coefficients with extremely large magnitudes.

Reference SPN estimation
This step aims at estimating the reference SPN for a camera. Typically, the reference SPN R for a camera is estimated by averaging N (e.g., N ≥ 20) noise residuals extracted from flat-field/low-variation images (e.g., blue-sky images) taken by that camera:

  R = (1/N) Σ_{i=1}^{N} X_i.   (3)

The random noise present in different images differs, while the true SPN components remain the same as long as these images are taken by the same camera. Therefore, the random noise components are averaged out in R while the true SPN components accumulate. In [4], Chen et al. proposed a maximum likelihood estimation (MLE) method to estimate the reference SPN. They also proposed two enhancing operations, namely zero-mean (ZM) and Wiener filtering (WF) in the discrete Fourier transform (DFT) domain, to remove the artifacts caused by camera processing operations from the reference SPN. In [30], Lin and Li argued that the true SPN is unlikely to be periodic and should have a flat spectrum. Therefore, they proposed another reference enhancing method, namely the spectrum equalization algorithm (SEA), to detect and suppress the peaks appearing in the DFT spectrum of the reference SPN so as to remove the periodic artifacts.
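A minimal sketch of the averaging step in Eq. (3); the function name is illustrative, and the MLE/ZM/WF/SEA refinements discussed above are not included.

```python
import numpy as np

def estimate_reference_spn(residuals: list) -> np.ndarray:
    """Average N noise residuals from the same camera. The shared SPN
    accumulates, while zero-mean random noise is averaged out."""
    return np.mean(np.stack(residuals), axis=0)
```

With N residuals of i.i.d. noise, the noise variance in the average drops by a factor of N, which is why flat-field reference sets of 20 or more images are preferred.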

SPN matching
Once both the query SPN and the reference SPNs are obtained, the matching step can be performed. This task can be treated as a binary hypothesis test:

  H0: X does not contain R_i (the query image is not taken by the i-th camera),
  H1: X contains R_i (the query image is taken by the i-th camera).
Here a correlation-based detector decides between H0 and H1 by comparing the correlation ρ(X, R_i) to a pre-calculated threshold τ. The detector accepts H1 when ρ ≥ τ, and H0 when ρ < τ. The normalized cross-correlation (NCC) is usually used to measure the similarity between the query noise residual X ∈ R^{M×M} and the reference SPN R ∈ R^{M×M}, defined as

  ρ(X, R) = ⟨X − X̄, R − R̄⟩ / (‖X − X̄‖ · ‖R − R̄‖),   (4)

where X̄ and R̄ are the mean values of X and R, and ‖·‖ is the L2 norm. Given an upper bound on the false positive rate (FPR), the threshold τ for the detector can be calculated via the Neyman-Pearson approach [31]. In [32], Goljan pointed out that NCC is sensitive to the influence of periodic noise, and proposed the peak-to-correlation energy (PCE) ratio as a replacement to measure the similarity between two SPNs. More recently, Kang et al. [6] proposed another measurement, namely correlation over circular correlation norm, to reduce the FPR of an SCI system. The aforementioned methods can be combined to further boost performance. For example, forensic investigators can apply the Mihcak filter to extract the noise residuals from both query and reference images, enhance the query noise residuals with Li's enhancing models [5], improve the reference SPNs with either the ZM+WF operations [4] or the SEA algorithm [30], and finally apply NCC or PCE as the similarity measurement for SPN matching. Moreover, in many applications, such as source-oriented image clustering and SCI over a large-scale reference SPN database, taking the full-sized image into account is not computationally feasible and a block smaller than the full-sized image is used. Due to the vignetting effect on the peripherals of images [33], it is suggested that such a block be cropped from the center of the full-sized image. The noise residuals extracted from larger blocks usually yield higher identification accuracy, but they also have higher dimensionality.
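The NCC detector of Eq. (4) can be sketched as follows (function names are illustrative):

```python
import numpy as np

def ncc(x: np.ndarray, r: np.ndarray) -> float:
    """Normalized cross-correlation between a query residual x and a
    reference SPN r, as in Eq. (4)."""
    x = x - x.mean()
    r = r - r.mean()
    return float(np.dot(x.ravel(), r.ravel())
                 / (np.linalg.norm(x) * np.linalg.norm(r)))

def decide(x: np.ndarray, r: np.ndarray, tau: float) -> bool:
    """Accept H1 (same camera) when the correlation reaches the threshold."""
    return ncc(x, r) >= tau
```

The threshold `tau` would in practice be set from a target false positive rate via the Neyman-Pearson approach, which this sketch leaves to the caller.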
The complexity of matching a query image against the cameras in the database is O(mc), where m is the dimensionality of each noise residual and c is the number of cameras in the database. Considering that there may be tens of thousands of reference SPNs (each representing a camera) in the database, matching the high-dimensional noise residuals may incur excessive computational costs. To address this problem, we propose a new approach in the next section.

Proposed SPN feature extraction and enhancement
Generally speaking, high-dimensional SPNs not only incur high computational costs but also tend to contain more redundancy and interfering components. For simplicity, we write Eq. (2) as the sum of the true SPN and unwanted noise, i.e.,

  x = x^(0) + ξ,   (5)

where x^(0) is the true SPN, and ξ represents an additive mixture of unwanted interferences, which may include scene details and the artifacts introduced by color interpolation, JPEG compression and other camera processing operations [4]. The former can be scene-specific, while the latter can be shared among cameras of the same model or sensor design. Therefore they are non-unique, less discriminant and redundant. One intuitive way to improve the performance of SCI systems is to suppress these artifacts.
PCA [34] is a well-known unsupervised learning method, which minimizes the reconstruction error under a linear transformation, and can be used to learn compact representations of high-dimensional data. This method has been widely used for the purposes of de-noising [21,22], dimensionality reduction [35], feature extraction [36], etc. Compared with data-independent dimensionality reduction methods, such as random projection, the PCA projection matrix is learned from training data, and it generally yields higher performance in classification tasks [37]. In this work, we attempt to find a PCA-transformed domain in which the true SPN is well represented. Ideally, by projecting the extracted noise residuals onto this domain, a small set of coefficients that contain most of the representative information of the true SPN can be extracted.

Training set construction
In order to identify such a transformed domain, a representative training set needs to be established in advance. PCA finds an optimal transformed domain that best represents the primary signal shared among the training samples. So if the SPN appears as the most representative signal among the training samples, it will be well represented in the obtained domain. However, some contamination (e.g., scene details) can be more dominant than the SPN in the noise residual (as shown in Fig. 1(b)). Without removing these strong contaminations from the training set, the obtained domain is more likely to represent these noisy components rather than the true SPN. To avoid this situation, we propose the following strategies to minimize the impact of the unwanted noise in the training set:

1. Training sample selection: To build the training set, if we have access to the cameras in the database, we give priority to the noise residuals extracted from flat-field images (e.g., blue sky). Such images are closer to an evenly lit scene and contain fewer scene details, so they better exhibit the changes caused by the SPN. However, in many real-world scenarios, the cameras in question may not be in the investigator's possession, making it impossible for the investigator to use the cameras to take flat-field images; only images with varying scene details taken by those cameras are available (e.g., from someone's Facebook account). In this case, our strategy is to suppress the impact of scene details through averaging. Considering that the scene details present in different images are normally different, we can generate a smoother sample by averaging several noise residuals of images taken by the same camera. By repeating this process several times, we can finally generate a set of more representative training samples.
We also model the afore-mentioned contamination-removal process based on Eq. (5). In this context, we write the noise residual of the i-th reference image as x_i = x̂ + θ_i, where θ_i represents the scene details and x̂ is the sum of the SPN and some non-unique artifacts (e.g., CFA pattern and JPEG blocky artifacts), which are not suppressed by averaging at this stage. Given that, for a camera with N reference images, each pixel's mean and variance over the reference noise residuals can be expressed as μ_x = x̂ + μ_θ and σ²_x = σ²_θ, where μ_θ and σ²_θ are the mean and variance of the scene-detail term. For a camera, if we average the noise residuals of a random subset of T out of the N reference images, and repeat this L times, then according to Eq. (5) each training sample becomes

  x̄ = (1/T) Σ_{i=1}^{T} x_i = x̂ + (1/T) Σ_{i=1}^{T} θ_i.   (6)

The new mean and variance for each pixel can then be expressed as

  μ_x̄ = x̂ + μ_θ,   (7)
  σ²_x̄ = σ²_θ / T.   (8)

That is, averaging leaves the SPN term intact while reducing the variance contributed by the scene details by a factor of T.

2. Training sample enhancement: Non-unique artifacts such as CFA patterns and JPEG blocky artifacts may also lead to unsatisfactory training. Since these artifacts in the images taken by cameras of the same model or brand are similar (with small variation), they survive the averaging operation. Nevertheless, as we have shown in [30], these artifacts cause peaks in the DFT magnitude spectrum, while the SPN exhibits a flat spectrum without salient peaks. Therefore, by suppressing the peaks present in the DFT spectrum, these artifacts can be effectively suppressed and the quality of the true SPN in the noise residual thereby enhanced.
Assume there are n reference images taken by c cameras, each camera responsible for N images such that n = cN. According to the two afore-mentioned strategies for training sample selection and enhancement, the proposed training set construction can be summarized as follows:

(1) Extract the noise residuals from the blocks of W × W pixels cropped from the center of the n reference images.
(2) For each camera C_j, randomly select T noise residuals from those of C_j and average them.
(3) Detect and suppress the peaks of the averaged noise residual in the DFT magnitude spectrum with SEA [30]. Then concatenate the 2D output into a column vector as a training sample x_ij. Note that we use X_ij to represent 2D noise residuals and x_ij to represent their 1-D versions.
(4) Repeat Steps (2) and (3) L times for each camera to generate L training samples per camera.

In Step (2), we randomly select T noise residuals from each camera for averaging. As discussed above, it is preferable to set T to a larger value so as to better attenuate the impact of scene details and random noise. Since the CFA pattern and JPEG blocky artifacts are shared among the images taken by the same camera, the averaging operation inevitably enhances these two artifacts in each training sample. However, the peaks caused by these artifacts then become more distinct in the DFT spectrum and can be more easily and accurately detected. Given that, setting T to a large value also helps SEA achieve a more accurate peak detection in Step (3), which consequently increases the effect of enhancement. More details about how the setting of T affects the performance are discussed in Section 5.2.
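The construction steps above can be sketched as follows; the SEA enhancement of Step (3) is omitted for brevity, and the function name is ours.

```python
import numpy as np

def build_training_set(residuals_per_camera, T, L, seed=None):
    """For each camera, average a random subset of T noise residuals and
    repeat L times, yielding c*L training samples (one per row). Averaging
    suppresses scene details, which differ from image to image."""
    rng = np.random.default_rng(seed)
    samples = []
    for residuals in residuals_per_camera:     # one list of residuals per camera
        stack = np.stack(residuals)            # (N, ...) residuals of this camera
        for _ in range(L):
            idx = rng.choice(len(stack), size=T, replace=False)
            samples.append(stack[idx].mean(axis=0).ravel())  # Step (2) + flatten
    return np.stack(samples)                   # (c*L, m) training matrix
```

Each row would then be spectrum-equalized with SEA before being used to learn the PCA subspace.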

SPN feature extraction through PCA
PCA is performed to seek a set of orthonormal eigenvectors of the scatter matrix of the training samples. Let A = [x_1, x_2, ..., x_n] ∈ R^{m×n} be the matrix whose columns are the n mean-removed training samples, and let S = A A^T ∈ R^{m×m}. Since the dimensionality m of the noise residuals is very large, we apply a fast method [38] instead of computing the eigenvectors of S directly when m ≫ n.

Assume v_k is a unit eigenvector of A^T A ∈ R^{n×n} with eigenvalue λ_k, i.e., A^T A v_k = λ_k v_k. By multiplying both sides by A, we get

  A A^T (A v_k) = λ_k (A v_k),   (9)

which shows that u_k = A v_k is an eigenvector of A A^T = S with the same eigenvalue λ_k.

Thus, instead of decomposing the matrix S directly, we can calculate the eigenvectors v_k by decomposing the much smaller matrix A^T A ∈ R^{n×n} and obtain u_k via u_k = A v_k. Computing the eigenvectors in this manner incurs a complexity of O(n³). Considering that the number of training samples tends to be much smaller than the size of the SPNs (i.e., n ≪ m), computing the eigenvectors in this manner is much more efficient than the traditional one. The obtained {u_k}_{k=1}^{n} are normalized and sorted in descending order according to their corresponding eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n. Subsequently, a transformed domain can be built as

  M_pca = [u_1, u_2, ..., u_n]^T.   (10)

After that, we can apply M_pca to a noise residual x (defined in Eq. (5)) through

  y = M_pca x = M_pca x^(0) + M_pca ξ = y^(0) + ξ′,   (11)

where y^(0) and ξ′ are the transformed versions of the SPN term and the noise term, respectively. Now the problem is recast as estimating y^(0) from the noisy y. Generally speaking, in a PCA-transformed vector y, most of the energy of the primary signal shared among the training set concentrates on the first several elements, while the energy of the noise is distributed much more evenly. Therefore, retaining only the first several elements of y while discarding the rest preserves the energy of the signal of interest and suppresses the energy of the noise. Following this concept, the eigenvectors with the d largest eigenvalues are selected to form an SPN feature extractor

  M_pca^d = [u_1, u_2, ..., u_d]^T,  where d is the smallest integer such that (Σ_{k=1}^{d} λ_k) / (Σ_{k=1}^{n} λ_k) ≥ 98%.   (12)

With this SPN feature extractor M_pca^d, we can obtain a new feature of much lower dimensionality by

  y_d = M_pca^d x,   (13)

where y_d is the compact representation of x. With the feature vector y_d and the SPN feature extractor M_pca^d, it is reasonable to assume that we can obtain a reconstructed SPN in the spatial domain via the inverse PCA transform as

  x′ = (M_pca^d)^T y_d,   (14)

where x′ is an approximation of the original x. If our assumption is correct, the noise ξ′ should be suppressed by the PCA-based SPN feature extractor. As a consequence, the reconstructed x′ should contain less noise and have a higher signal-to-noise ratio (SNR) than the original noise residual x.
To validate our assumption, we demonstrate the behavior of our SPN feature extractor with a simple example. As shown in Fig. 1(b), the scene details in Fig. 1(a) propagate through the Wiener filter into the noise residual. After performing the proposed SPN feature extraction and inverting the PCA transformation, the artifacts caused by the scene details are significantly suppressed in the reconstructed SPN, as shown in Fig. 1(c). The effect of the proposed method can also be quantitatively evaluated by comparing the signal-to-noise ratio (SNR), with respect to the true SPN, of the contaminated version (Fig. 1(b)) and of the reconstructed SPN (Fig. 1(c)). First, the true SPN x^(0) is estimated by averaging 50 noise residuals extracted from blue-sky images.
According to Eq. (5), the noise in the noise residual (Fig. 1(b)) and in the reconstructed SPN (Fig. 1(c)) can be estimated by subtracting the true SPN x^(0) from the observed data, respectively.
Then, the SNR can be calculated as 10 log10(var(x^(0)) / var(ξ)). As expected, the reconstructed SPN has a much higher average SNR (4.3 dB) than the original noise residual (−15.5 dB), which further validates our assumption.
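The SNR computation just described can be reproduced schematically as follows (the helper name is ours):

```python
import numpy as np

def snr_db(true_spn: np.ndarray, observed: np.ndarray) -> float:
    """SNR of an observed residual relative to the estimated true SPN:
    10*log10( var(x0) / var(observed - x0) )."""
    noise = observed - true_spn          # estimate the noise term
    return 10.0 * np.log10(np.var(true_spn) / np.var(noise))
```

In the paper's experiment, `true_spn` is the average of 50 blue-sky residuals, and `observed` is either the raw residual or its PCA-reconstructed version.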

SPN feature enhancement through LDA
In the task of SCI, the source cameras of the images in the database are usually known, which means the class label of each image is known. If this is the case, by taking advantage of this prior knowledge, we can further extract a more discriminant feature by using a supervised learning method, i.e., linear discriminant analysis (LDA) [39]. The purpose of using LDA in this work is to build an enhancer M_lda to enhance the SPN feature extractor M_pca^d so as to extract a more compact representation from the original noise residual x. This enhancer can be obtained by maximizing the ratio of the determinant of the between-class scatter matrix S_b to the determinant of the within-class scatter matrix S_w:

  M_lda = argmax_W |W^T S_b W| / |W^T S_w W|,   (15)

where S_w is defined as

  S_w = Σ_{j=1}^{c} Σ_{i=1}^{L} (y_i^j − μ_j)(y_i^j − μ_j)^T,   (16)

with y_i^j the i-th sample of class j, μ_j the mean of class j, c the number of classes, and L the number of samples in each class. The between-class scatter matrix S_b is defined as

  S_b = Σ_{j=1}^{c} L (μ_j − μ)(μ_j − μ)^T,   (17)

where μ is the mean of all training samples. M_lda consists of the generalized eigenvectors of S_b and S_w associated with the c − 1 largest generalized eigenvalues [39]. The final compact feature is obtained by

  z = M_lda y_d = M_e x,   (18)

where z is another compact version of the noise residual x, and M_e = M_lda M_pca^d is the refined SPN extractor, which can be used for extracting z directly from the original x. In most cases, c − 1 is much smaller than d, so z is more compact than y_d.
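A minimal sketch of the LDA enhancer, solving the generalized eigenproblem for the scatter matrices S_b and S_w; the small regularization term is an implementation detail we add for numerical stability, and the function name is ours.

```python
import numpy as np
from scipy.linalg import eigh

def lda_enhancer(Y: np.ndarray, labels: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Learn a projection maximizing between-class over within-class scatter
    on the PCA features Y (one sample per row). Keeps at most c-1 directions."""
    classes = np.unique(labels)
    d = Y.shape[1]
    mu = Y.mean(axis=0)                         # global mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for cl in classes:
        Yc = Y[labels == cl]
        mc = Yc.mean(axis=0)                    # class mean
        Sw += (Yc - mc).T @ (Yc - mc)           # within-class scatter
        Sb += len(Yc) * np.outer(mc - mu, mc - mu)  # between-class scatter
    Sw += eps * np.eye(d)                       # regularize for invertibility
    lam, W = eigh(Sb, Sw)                       # generalized eigenproblem
    W = W[:, np.argsort(lam)[::-1]]             # sort by descending eigenvalue
    return W[:, :len(classes) - 1]              # d x (c-1) enhancer
```

Projecting a PCA feature `y_d` as `z = y_d @ M` yields the compact (c−1)-dimensional LDA-SPN.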

Source camera identification
The camera identification process using the proposed compact features is summarized in Algorithms 1 and 2. We refer to the feature vectors y_d and z produced by Algorithms 1 and 2 as "PCA-SPN" and "LDA-SPN", respectively, throughout the rest of this paper. As mentioned earlier, the complexity of calculating the correlation is proportional to the feature size. Considering that the sizes of PCA-SPN (y_d ∈ R^d) and LDA-SPN (z ∈ R^{c−1}) are both much smaller than that of the original noise residual (x ∈ R^m), using either y_d or z in place of the original x leads to approximately an m/d or m/(c−1) times gain in speed in the matching phase.
In addition, given a required false positive rate, the detection thresholds τ_y and τ_z for the PCA-SPN (y_d) and LDA-SPN (z) can be determined by using the Neyman-Pearson approach [31].

Experiments
In this section, we carry out experiments on the Dresden image database [40] to validate the feasibility of the proposed methods. First we evaluate and discuss some main parameters, which play key roles in the proposed methods. Significant performance gains are achieved by using the proposed training set construction process, which suppresses the unwanted noise. After that we plot
the histograms of intra-class and inter-class correlations to demonstrate the effectiveness of the PCA/LDA features. Based on several popular SPN algorithms, we also use our methods as a post-processing framework, and we compare the dimensionality of different features under the same conditions so as to evaluate the compactness of the different types of features. Finally, the computational efficiency of the proposed methods is reported.

Experimental setup
In this work, images taken by 36 cameras from the Dresden image database [40] are used. As listed in Table 1, these 36 cameras are of 12 different models, each model having 1-5 devices.
A total of 7200 images from these 36 cameras are involved in our experiments. Each camera contributes 200 images, including 150 images with varying scene details (i.e., textured images) and 50 flat-field images. We consider two scenarios with different types of reference images (i.e., flat-field and textured), as shown in Table 2. For each image, a block of 512 × 512 pixels cropped from the center is used in the experiments so as to avoid the vignetting effect [33].
For each image block, we extract the noise residuals from the three color channels (i.e., the red, green and blue channels) and combine them into a grayscale version via the linear combination

  x = 0.299 x_R + 0.587 x_G + 0.114 x_B,

where x_R, x_G and x_B are the noise residuals extracted from the red, green and blue channels, respectively. In our experiments, the noise residuals extracted with the methods in [3] (Basic), [4] (MLE), [24] (BM3D) and [26] (PCAI8) serve as the original SPNs. SEA [30] is applied to enhance the reference SPNs and the training samples for PCA-SPN and LDA-SPN. The results are compared against the SPN digest of [14].
NCC defined in Eq. (4) is used to measure the similarity in the SCI tasks.

Parameter settings and discussions
In this work, one of the most important parameters is the number of noise residuals (T in Eq. (6)) used to estimate a training sample (also referred to as the random subset size). As discussed in Section 3.1, we set T to a relatively large number (i.e., T → N, and N = 50 in this paper) so as to minimize the impact of scene details and random noise. Fig. 2 depicts the performance sensitivity (i.e., True Positive Rate (TPR) with the False Positive Rate (FPR) fixed at 10^−3) to T in the two SCI scenarios described in Table 2.

Table 3. The dimensionality d of PCA-SPNs obtained from different SPN methods w.r.t. different settings of T and different reference types.

Method | Flat-field (T=20) | Flat-field (T=48) | Textured (T=20) | Textured (T=48)
Basic  | 1042 | 609 | 1159 | 867
MLE    | 1013 | 605 | 1138 | 863
BM3D   | 1029 | 598 | 1148 | 848
PCAI8  | 1066 | 663 | 1148 | 860

We can see that, generally, the performance based on the PCA-SPN of BM3D features is not very sensitive to the setting of T (i.e., its performance is stable over a wide range of T, [20, 48]). It improves slightly with an increasing value of T, reaching its peak for both scenarios (i.e., with flat-field/textured references) when T = 48. It is worth noting that the result with T = 1 corresponds to the case without applying the proposed training set construction process, and the large performance gap (e.g., when compared with T = 48) indicates the effectiveness of our proposed training set construction process, especially for textured references. It is also interesting to see that the TPR drops dramatically when T ≥ 49: when T → N (N = 50), all the training samples from the same camera become similar. In particular, when T = N all the training samples from the same camera become exactly the same. In this case, we effectively have only one training sample per camera, and the training set is not large enough to learn an effective feature representation [41]. Therefore, we set T to 48 throughout the rest of the paper. It is also interesting to discuss d, the dimensionality of the PCA-SPN in different cases. Clearly, we prefer d to be as small as possible without compromising the identification accuracy. d is determined by two main factors, namely the percentage of the total variance retained in Eq. (12) and the quality of the training set. As shown in Eq. (12), the value of d is affected by the percentage of the total variance that we aim to preserve (i.e., 98% in this paper): the smaller the retained percentage, the smaller the value of d.
Table 3 shows the dimensionality d of the PCA-SPNs obtained from different SPN extraction methods with respect to different settings of T (i.e., T = 20 and T = 48), for the two types of reference images. For both flat-field and textured training sets, we can see that the dimensionality d of the PCA-SPNs decreases when T is larger. One reason is that, with a larger T, according to Eq. (8) the quality of the training set tends to be better (i.e., lower variance of the unwanted noise), so the energy of the true SPN is more concentrated in the transformed domain. As a result, the SPN feature extractor requires fewer leading eigenvectors to cover 98% of the total energy. Similarly, flat-field reference images (with training samples of higher quality) also tend to yield a more compact representation than their textured counterparts, as shown in Table 3. It is worth mentioning that d is insensitive to the size of the original SPN: according to our experimental results, the PCA-SPN derived from large image blocks has a similar size to the one from small image blocks. This observation indicates that the PCA-SPN compresses more effectively when the original SPN is extracted from larger image blocks.

Distributions of intra-class and inter-class correlations
We evaluate the effectiveness of different features in terms of the distribution of their inter/intra-class correlations. A large separation between the intra-class and inter-class distributions of a feature suggests the feature's high discriminative power. Experiments are conducted using 3 different types of SPNs (i.e., original SPN, PCA-SPN, and LDA-SPN) in the 2 SCI scenarios (with flat-field/textured references as listed in Table 2). Results are reported in Fig. 3, from which we can see that the means of the intra-class correlations are significantly increased by using PCA-SPN and LDA-SPN, when compared with the results based on the original SPNs. Specifically, for the two SCI scenarios, the application of PCA increases the mean of the intra-class correlations from 0.046 to 0.564 for the flat-field references, and from 0.033 to 0.412 when only the textured images are given as references. The means of the intra-class correlations can be further boosted by using LDA-SPN owing to its supervised-learning nature, to 0.883 and 0.838, respectively, in the two scenarios.
The increase in the mean of the intra-class correlations shifts the intra-class distribution rightward, which widens the separation between the intra-class and inter-class similarity distributions. However, the variance of the inter-class correlations is also increased when PCA-SPNs and LDA-SPNs are used. For example, in the case with flat-field references, the inter-class variances for PCA-SPN and LDA-SPN are 7.8 × 10^−4 and 6.8 × 10^−3, respectively, both higher than that of the original SPNs, 5.4 × 10^−6. However, this increase in variance is trivial compared to the displacements of the means of the intra-class correlations (i.e., 0.564 − 0.046 = 0.518 and 0.883 − 0.046 = 0.837) away from the inter-class mean. This suggests the benefit of applying PCA-SPN and LDA-SPN in SCI tasks. This is clearly reconfirmed in Fig. 3, where the overlapping areas between the intra-class and inter-class distributions of PCA-SPN and LDA-SPN are much smaller, making the two distributions more separated (especially with LDA-SPN).
In addition, when using the original SPN (as shown in the first column of Fig. 3), the intra-class distribution has small peaks in the overlapping area, which is mainly due to the small negative correlations exhibited by some matching SPN pairs. These small correlations are probably caused by strong distortions due to scene details in some query images. Nevertheless, when using PCA-SPN and LDA-SPN (as shown in the last two columns), the number of small negative intra-class correlations is significantly reduced. As a result, the overlapping area decreases substantially, again reconfirming the merit of PCA-SPN and LDA-SPN. Moreover, since the separation is mainly caused by the rightward shift of the intra-class distribution, which has a major influence on the False Rejection Rate (FRR), PCA-SPNs and LDA-SPNs have a particular advantage in situations where a low FRR is preferred.

Performance comparison -accuracy
We can use the aforementioned methods (i.e., training set construction and PCA/LDA-based SPN feature extraction) as a post-processing step for existing SPN extraction methods. For evaluation purposes, we report here the performance (in terms of ROC curves) of four popular methods, namely Basic [3], MLE [5], BM3D [25] and PCAI8 [26], with and without the proposed post-processing. Moreover, since our method aims to compress the size of SPNs, we also include another SPN compression method (i.e., SPN Digest [14]) for comparison. An SPN digest is formed by retaining the top k largest elements of an m-dimensional SPN (k < m). Therefore, the size of an SPN digest is k, which is lower than that of the original SPN. A reference SPN digest contains not only the top k elements but also the corresponding locations of these k elements. This location information is used to extract the digests from the query SPNs, ensuring that the reference and query digests are extracted from the same locations. In this experiment, we set k/m to 10% and 20%.
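The digest construction just described can be sketched as follows. Ranking elements by magnitude and the function names are our illustrative assumptions; the point is that the reference side stores both values and locations, and the locations act as the query-side "feature extractor":

```python
import numpy as np

def make_digest(reference_spn, ratio=0.2):
    """Keep the top-k elements (here: by magnitude) of an m-dimensional
    reference SPN, plus their locations, which act as the 'extractor'."""
    k = int(ratio * reference_spn.size)
    locations = np.argsort(np.abs(reference_spn))[::-1][:k]
    return reference_spn[locations], locations

def query_digest(query_spn, locations):
    """Extract a query digest at the reference digest's locations."""
    return query_spn[locations]

ref = np.arange(10.0)                 # toy "SPN"
digest, locs = make_digest(ref)       # k = 2 here: the elements 9.0 and 8.0
q = query_digest(2.0 * ref, locs)     # same locations in the query SPN
assert digest.tolist() == [9.0, 8.0] and q.tolist() == [18.0, 16.0]
```

Note the doubled footprint: each reference must store k values plus k locations, which matters for the storage comparison later.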
In this experiment, the overall ROC curve is used for performance comparison. The numbers of true positive and false positive decisions are first recorded for each camera. A true positive decision is made when hypothesis H_1 is true and H_1 is accepted, while a false positive decision is made when H_0 is true but H_1 is accepted. The total numbers of true and false decisions are then used to calculate the true positive rate P_tp and the false positive rate P_fp, respectively. Since the same number of images from each camera is used in our experiment, we can simply calculate P_tp and P_fp for a given threshold as follows:

P_tp = (Σ_{i=1}^{c} D_tp^i) / T,   P_fp = (Σ_{i=1}^{c} D_fp^i) / ((c − 1) T),

where c is the number of cameras; T is the number of query images from all cameras; and D_tp^i and D_fp^i are the numbers of true positive and false positive decisions made for camera C_i (each query is tested once against its own camera and against the other c − 1 cameras). By varying the detection threshold from the minimum to the maximum value, we obtain the overall ROC curve. In real-world forensic applications, it is often necessary to ensure a sufficiently low FPR. Therefore, we plot the horizontal axis of the overall ROC curve on a logarithmic scale.

Fig. 4 shows the overall ROC curves of different features based on the Basic SPN extraction method [3] in the two SCI scenarios described in Table 2, i.e., with flat-field/textured reference images. The black, green, yellow, red and blue curves indicate the performance of the original SPN (i.e., Basic), SPN Digest-10%, SPN Digest-20%, PCA-SPN and LDA-SPN, respectively. In both SCI scenarios, SPN Digest performs very closely to the original SPN when 20% of the top largest elements are retained, but its performance degrades rapidly when the proportion of retained elements is reduced to 10%. On the other hand, LDA-SPN (blue line) achieves the best ROC performance regardless of the type of reference images, while PCA-SPN (red line) takes second place. The same observation can be made from Figs. 5 to 7 when the other SPN extraction methods (i.e., MLE, BM3D and PCAI) are used, respectively.
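Under this protocol (each query tested once against every reference camera, one genuine test and c − 1 impostor tests per query), the threshold sweep can be sketched as below; the function interface is a hypothetical illustration, not the paper's code:

```python
import numpy as np

def overall_roc(sims, labels, thresholds):
    """sims[i, j]: similarity of query i to reference camera j.
    labels[i]: index of query i's true source camera.
    Each query yields 1 genuine test and c - 1 impostor tests."""
    T, c = sims.shape
    genuine = np.zeros_like(sims, dtype=bool)
    genuine[np.arange(T), labels] = True
    ptp, pfp = [], []
    for t in thresholds:
        accept = sims >= t
        ptp.append((accept & genuine).sum() / T)
        pfp.append((accept & ~genuine).sum() / ((c - 1) * T))
    return np.array(ptp), np.array(pfp)

# Toy check: a perfectly separable similarity matrix.
sims = np.array([[0.9, 0.1], [0.2, 0.8]])
ptp, pfp = overall_roc(sims, np.array([0, 1]), thresholds=[0.5])
assert ptp[0] == 1.0 and pfp[0] == 0.0
```

Sweeping the threshold from min to max similarity traces out the full ROC curve; plotting P_fp on a log axis then emphasizes the low-FPR operating region of interest here.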

Performance comparison -compactness
In this section, we compare the compactness of the different types of features. The dimensionality of SPN Digest is determined by k/m. Therefore, for image blocks of 512 × 512 pixels, the sizes of SPN Digest-10% and -20% are 26,215 and 52,429, respectively. As listed in Table 3, with flat-field references, the dimensionalities of the PCA-SPNs based on Basic, MLE, BM3D and PCAI are 609, 605, 598 and 663, respectively, and they increase to 867, 863, 848 and 860 when the references are textured images. The size of LDA-SPN is always equal to c − 1, which is 35 in this experiment. This comparison shows that the dimensionality of SPN Digest is much higher than that of PCA-SPN and LDA-SPN. Considering the results obtained in Section 5.4, we can conclude that both PCA-SPN and LDA-SPN are superior to SPN Digest in terms of compactness and identification accuracy. The experimental results also validate that the proposed SPN feature extraction method can be used as a general post-processing method applied after various SPN extraction methods in the SCI task.

Performance comparison -computational complexity
An efficient SCI system plays an important role when (i) the database contains a large number of references and (ii) thousands of query SPNs need to be identified. To test the proposed framework on a large database, we perform this experiment on a synthetic database containing 180 cameras derived from the 36 cameras in Table 1, based on the fact that SPNs are location-dependent (i.e., SPN blocks cropped from different locations of the same full-sized SPN are not correlated). To build this synthetic database, we first estimate the full-sized reference SPN for each camera in Table 1. We then crop 5 SPN blocks of 512 × 512 pixels from different locations of each full-sized reference SPN and treat them as references for 5 different cameras, eventually obtaining 180 reference SPNs in total. We also generate 18,000 query SPNs in the same manner. Table 4 shows the running time for matching the 18,000 query samples against the simulated 180 cameras w.r.t. different types of features. In this case, the sizes of the original SPN, SPN Digest, PCA-SPN and LDA-SPN are m = 262,144, k = 52,429, d = 2,484 and c − 1 = 179, respectively. This experiment is conducted on the same PC with an Intel Core i5 3.20 GHz processor and 16 GB of RAM. To reduce the storage requirement, all the data in this experiment is stored in the uint8 data type: we first project the data onto the range [0, 255] and then convert it from double-precision floating point to uint8 before storing.
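The synthetic-database construction (disjoint 512 × 512 crops from one full-sized SPN, each treated as a distinct camera) can be sketched as follows; the row-major block placement is our assumption for illustration:

```python
import numpy as np

def crop_blocks(full_spn, block=512, n=5):
    """Crop n disjoint block×block regions (row-major order) from a
    full-sized SPN; each flattened block serves as one synthetic camera."""
    H, W = full_spn.shape
    per_row = W // block
    if n > per_row * (H // block):
        raise ValueError("not enough disjoint blocks available")
    refs = []
    for i in range(n):
        r, col = divmod(i, per_row)
        refs.append(full_spn[r * block:(r + 1) * block,
                             col * block:(col + 1) * block].ravel())
    return refs

rng = np.random.default_rng(0)
full = rng.standard_normal((1024, 3072))  # stand-in for a full-sized SPN
refs = crop_blocks(full)
assert len(refs) == 5 and refs[0].size == 512 * 512
```

Because the crops do not overlap, the location dependence of SPNs makes the resulting references behave as if they came from independent cameras.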
To quantify the efficiency of an identification system, three factors are considered in this experiment. The first factor is "I/O operations", which includes the cost of loading the references and the SPN feature extractors into memory for processing. The second is "Feature Extraction", the time spent on producing SPN Digests, PCA-SPNs or LDA-SPNs from the 18,000 query noise residuals. The third factor is the computational cost of calculating the similarity between the 18,000 query samples and the 180 references, referred to as "Matching". The overall computational cost is presented as "Total".
As shown in Table 4, PCA-SPN incurs the highest computational cost in terms of I/O operations. This is because the data to be loaded into memory includes not only the 180 m-dimensional reference vectors but also an m × d-dimensional feature extractor (M_pca^d). As shown in Table 5, PCA-SPN needs very little space to store its 180 reference vectors (0.43 MB) but a relatively huge space for the feature extractor (621.21 MB). With such a large amount of data in total, it is not surprising that PCA-SPN incurs the highest I/O cost. LDA-SPN also needs to load a feature extractor (M_lda), but its size is only m × (c − 1), so the space it occupies (44.80 MB) is much smaller than that of PCA-SPN. Moreover, since the size of LDA-SPN is only c − 1, its storage overhead for the 180 reference vectors (0.03 MB) is the lightest among all the features. In this experimental setting, the total storage requirement of LDA-SPN (44.83 MB) is just slightly lower than that of the original SPN (45.05 MB), but this margin will grow linearly with the number of cameras.
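These storage figures are consistent with simple byte counts (uint8, one byte per element); a quick back-of-the-envelope check using the dimensions reported above, with small residual differences from Table 5 presumably due to container overhead:

```python
# Approximate uint8 storage footprints implied by the stated dimensions.
MB = 1024 ** 2
m, d, c = 262_144, 2_484, 180

pca_extractor_mb = m * d / MB        # M_pca^d matrix: ~621 MB
lda_extractor_mb = m * (c - 1) / MB  # M_lda matrix:   ~44.8 MB
orig_refs_mb     = c * m / MB        # 180 full-sized reference SPNs: ~45 MB
```

The order-of-magnitude gap between the PCA and LDA extractors (d = 2,484 vs. c − 1 = 179 columns) accounts for the I/O ranking in Table 4.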
SPN Digest requires the smallest storage among the 4 types of features. As mentioned earlier, the digest of a normal-sized reference SPN consists of not only the top k largest elements but also the corresponding location information of these k elements. Since this location information is used to extract query digests from the query SPNs, the location information of each reference digest can also be treated as a feature extractor. Therefore, when using SPN Digest, the data to be loaded includes not only 180 k-dimensional reference digests but also the 180 corresponding k-dimensional SPN feature extractors, which take up 18.02 MB in total. As a result, SPN Digest incurs the lowest computational cost in I/O operations (as shown in Table 4).
As mentioned in [42], the complexity of matching a query feature against all the references in the database is proportional to the product of the number of references and the feature size. For example, when using the original SPN, the complexity of the matching phase is O(cm). Since the numbers of query samples and references in the database are fixed in this case, LDA-SPN, which has the lowest dimensionality, incurs the least computational cost in the matching phase. PCA-SPN takes second place, followed by SPN Digest and the original SPN. Although LDA-SPN, PCA-SPN and SPN Digest incur an extra computational cost in the feature extraction process, with all aspects taken into account, we can see from Table 4 that replacing the original SPN with LDA-SPN, PCA-SPN or SPN Digest significantly reduces the overall computational cost.
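The O(c × feature size) matching cost translates into the following per-query multiply counts for the dimensions used in this experiment; this is a back-of-the-envelope comparison, not a measured timing:

```python
c = 180  # number of references in the synthetic database
dims = {"Original SPN": 262_144, "SPN Digest-20%": 52_429,
        "PCA-SPN": 2_484, "LDA-SPN": c - 1}

# Per-query inner-product cost is proportional to c × feature length.
cost = {name: c * dim for name, dim in dims.items()}
for name, ops in cost.items():
    print(f"{name:>14}: {ops:,} multiplications per query")
```

The roughly three-orders-of-magnitude gap between the original SPN and LDA-SPN explains the matching-time ranking observed in Table 4.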
Bear in mind that the above-mentioned post-processing methods also incur an extra computational cost in the training process or in estimating the optimal SPN Digest. However, unlike the processes that have to be conducted on-line (i.e., those listed in Table 4), PCA/LDA training and SPN Digest estimation can be performed off-line, and there is no need to re-run them as long as the population of the database does not change. Moreover, the efficiency of the off-line operations of an SCI system is generally less important than the identification accuracy or the on-line matching efficiency. Therefore, the computational cost of the off-line operations, i.e., PCA/LDA training and SPN Digest estimation, is not counted in this experiment.

Conclusion
In this paper, we introduced and evaluated the concept of PCA de-noising in the SCI task. Based on this concept, an effective framework for de-noising and compressing full-sized SPNs was proposed. We also proposed a training set construction method that minimizes the impact of interfering artifacts, which plays an important role in learning an SPN feature extractor that is insensitive to various unwanted noises. Both theoretical derivations and experimental results suggest that our methods can be used as a general post-processing framework for effective and efficient source camera identification. It is worth mentioning that the proposed framework also achieves very competitive performance in the challenging task where only textured references are available, which is often the case in real-world applications. However, so far we have focused on the case where the reference SPNs of all the cameras in question are included in the training set, while in real-world forensic applications, reference SPNs of new cameras will continuously be added to the database. In this case, the proposed system needs to re-run the training process with the new cameras (or the reference SPNs of the cameras involved) to maintain identification accuracy. A promising line of future research is to develop a methodology that progressively updates the previously trained SPN feature extractor to accommodate newly received reference SPNs without re-training on the entire expanded set.