CNN-based first quantization estimation of double compressed JPEG images

Multiple JPEG compressions leave artifacts in digital images: residual traces that can be exploited in forensic investigations to recover information about the acquisition device or the image editing software employed. In this paper, a novel First Quantization Estimation (FQE) algorithm based on convolutional neural networks (CNNs) is proposed. In particular, a solution based on an ensemble of CNNs was developed in conjunction with specific regularization strategies that exploit assumptions about neighboring element values of the quantization matrix to be inferred. Mainly designed to work in the aligned case, the solution was tested in challenging scenarios involving different input patch sizes, quantization matrices (both standard and custom) and datasets (i.e., the RAISE and UCID collections). Comparisons with state-of-the-art solutions confirmed the effectiveness of the presented approach, which is the first to cover such a wide range of double JPEG compression parameter combinations.


Introduction
JPEG is the most commonly used file format for digital images. Indeed, JPEG is by far the most common compression engine and has been widely investigated for digital forensic purposes. When facing a JPEG image, the first problem that could be addressed is image history reconstruction [1,2]: this could provide information about image authenticity and the acquisition device that generated it. After the acquisition (and likely a first JPEG compression), the image usually goes through a social network or an instant messaging platform [3] that, in most cases, applies a further JPEG compression. To recover information about the acquisition device [4-6], the forensic analysis of JPEG images specializes in several different tasks: (i) Double Quantization Detection (DQD), to detect whether an image has been JPEG compressed at least twice, and (ii) Quantization Step Estimation (QSE) [7,8]. The forensics community has devoted considerable effort to DQD, facing the problem in different contexts [9-12] and employing different strategies [13-15]. In a digital forensics investigation, DQD is often followed by First Quantization Estimation (FQE), whose goal is to estimate the quantization matrix employed in the first JPEG compression.
Many approaches have recently been proposed to address this problem in various scenarios. Initially, several methods were designed to estimate the first quantization matrix in the presence of file format changes (e.g., from JPEG to Bitmap), when information about the compression matrix is no longer available in embedded metadata (Fan and De Queiroz [2], Li et al. [16]). Later, scenarios involving multiple JPEG compressions (usually two, with some exceptions such as [17]) were investigated. Bianchi et al. [18-20] designed an FQE method based on the Expectation Maximization algorithm, while Galvan et al. [21] introduced a technique based on the analysis and filtering of Discrete Cosine Transform (DCT) histograms. Similar ideas have also been exploited in [22-24]. Recently, Thai et al. [7,25] exploited insights usually employed in steganography to design a robust FQE approach.
Due to the large amount of data that can be properly collected and analyzed, machine learning algorithms have also been introduced in recent FQE methods. Lukáš and Fridrich proposed a first approach based on a neural network in [26]. Recently, Convolutional Neural Networks (CNNs) have also been exploited to design FQE solutions. In particular, state-of-the-art CNN results have been achieved by Niu et al. [27] and Tondi et al. [28] for both aligned and non-aligned FQE scenarios. These deep learning solutions have been designed to work directly on input patches; although end-to-end learning can be exploited in the FQE scenario, this design choice limits the usability of the provided models to the specific patch size used during the training phase. Moreover, in almost all cases the training phase involved only data processed with standard quantization tables, producing output models that suffer heavily from overfitting. To cope with this limit, Battiato et al. [29] proposed a solution based on both statistical analysis and a machine learning approach (i.e., a nearest neighbor classifier) in the aligned scenario, obtaining effective results also in the presence of custom quantization matrices while avoiding overfitting.
In this paper, we considerably improve state-of-the-art results by employing a CNN-based solution coupled with an a-priori investigation in terms of data exploitation and collection. It is worth noting that the proposed solution is, to the best of our knowledge, the first FQE technique based on deep learning specifically designed to work without any limits in terms of the quantization matrices employed in the first and second compression steps, and able to manage a wide range of input patch sizes without requiring any modification of the related network architecture. Moreover, a novel regularization approach was designed to cope with challenging conditions (i.e., homogeneous patches, specific relations among quantization factors) that are complex to deal with considering the information contained in a single DCT histogram. Experiments in different scenarios, involving several datasets, confirmed that the proposed strategy outperforms previous studies by a large margin. The remainder of this paper is organized as follows: Section 2 reports the JPEG notation employed in the paper, Section 3 describes the proposed method, with preliminary analysis and related parameter settings in Section 4. Experimental results and comparisons are reported in Section 5, whereas Section 6 concludes the paper.

JPEG notation
Given a raw image I, JPEG compression [30] can be defined as a function f_Q such that I′ = f_Q(I), where I′ is the JPEG compressed image and Q is the 8 × 8 quantization matrix containing the quantization factors q_i ∈ N with i ∈ {1, 2, …, 64}. As a first step, f_Q(I) converts I from the RGB to the YCbCr color space and then divides the input image into 8 × 8 non-overlapping blocks, applying the integer DCT (Discrete Cosine Transform) to each block. Finally, each 8 × 8 block of coefficients is divided, element by element, by Q, rounded and then encoded by a classic entropy-based engine.
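For concreteness, the blockwise quantization step described above can be sketched as follows. This is a minimal illustration built on an orthonormal DCT matrix; the function names are ours, not the paper's code, and the level shift/rounding follow the standard JPEG pipeline.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows are the 1D cosine basis)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
    M[0] /= np.sqrt(2)
    return M

def jpeg_quantize_block(block, Q, D=dct_matrix()):
    """One step of f_Q on an 8x8 pixel block: level shift, 2D DCT,
    element-wise division by Q and rounding."""
    coeffs = D @ (block.astype(np.float64) - 128.0) @ D.T
    return np.round(coeffs / Q).astype(np.int64)

def jpeg_dequantize_block(qcoeffs, Q, D=dct_matrix()):
    """Approximate inverse: multiply back by Q and invert the DCT."""
    return D.T @ (qcoeffs * Q) @ D + 128.0
```

With Q equal to the all-ones matrix the round trip loses at most the rounding error, which is the degenerate case of lossless quantization.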
In this paper, only the luminance (i.e., the Y channel) was considered. Let us also define I″ = f_Q2(f_Q1(I)) as a JPEG double compressed image, where Q1 and Q2 denote the quantization matrices employed for the first and the second compression respectively. Moreover, we refer to Q_QF as the standard quantization matrix associated to a specific quality factor QF [30], while Q_QFj further specifies the j-th JPEG compression (e.g., j = 1, 2, …) in which the matrix was employed. We denote by h_i the empirical distribution built from the i-th DCT coefficients extracted from the 8 × 8 blocks of I″. Finally, we define the k quantization factors, in zig-zag order, of Q1 as q1_1, q1_2, …, q1_k, while we denote by q1 and q2 the quantization factors employed in the first and in the second compression respectively.

Proposed method
The proposed method for the estimation of the first quantization matrix can be summarized as an ensemble of CNNs specifically designed for the task. Being a machine learning approach, in the following subsections we describe in detail: (i) the features, (ii) the employed datasets, (iii) the neural network architecture with all the design information needed for reproducibility, and finally the developed regularization strategies.

Features
The main aim of the proposed solution is to exploit the information contained in the Discrete Cosine Transform (DCT) distributions computed from a double compressed image to estimate the first k quantization factors employed in the first compression. Preliminary results achieved by Battiato et al. [29] by working on DCT histograms suggested further investigation, allowing us to better manage and exploit the considerable amount of involved data. In particular, many state-of-the-art approaches based on machine learning [27,28] usually train their models on a dataset built with a fixed quantization matrix in the last compression (often a standard matrix related to a specific QF). However, this design choice strongly limits the applicability of the provided models. In real applications, it is very likely to find double compressed images with a Q2 different from the one used in those studies. This would force the investigator to build a new dataset and to perform a new training phase on it, finally obtaining a new model for the desired Q2 (which, moreover, would still have to be demonstrated to work effectively).
To better exploit the information contained in the input data (DCT histograms), while also limiting the number of models to be trained, for each q2 value only two sets of empirical distributions h_i, related to DC and AC coefficients respectively, were considered in the proposed approach. All histograms related to the AC terms, due to the similarity of their distributions [31], were collected together. In our tests, q1_max denotes the maximum quantization factor value to be predicted. For example, in the matrices reported in Figs. 1(c) and 1(d), considering i = 15 in zig-zag order, the quantization factor values are 5 and 21 respectively (highlighted in the figures).
To sum up, 2 · q1_max models allowed us to deal with double compressed images having a generic second compression matrix Q2 whose quantization factors are lower than or equal to q1_max in the first k positions (zig-zag order). Consider that strategies involving all possible k quantization factors in a unified way would have to take into account q1_max^k combinations. With the parameter values usually employed in state-of-the-art solutions, k = 15 and q1_max = 22, 22^15 models would have to be trained to cover the set of second quantization matrices, whereas our strategy reduces this amount to just 2 · 22 = 44. The proposed strategy thus deals with a large number of double quantization parameters while maintaining a feasible computational workload.
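A sketch of the feature extraction described in this subsection, pooling blockwise DCT coefficients into one DC histogram and one shared AC histogram, follows. The 1025-bin range matches the histogram size later fed to the networks; the function name, the zig-zag table and the bin layout are our assumptions for illustration.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
    M[0] /= np.sqrt(2)
    return M

# zig-zag positions of the first 15 coefficients in an 8x8 block
ZIGZAG15 = [(0,0),(0,1),(1,0),(2,0),(1,1),(0,2),(0,3),(1,2),
            (2,1),(3,0),(4,0),(3,1),(2,2),(1,3),(0,4)]

def dct_histograms(patch, k=15):
    """Empirical DCT distributions of a patch: one histogram for the DC
    term and one pooled histogram for the first k-1 AC terms (zig-zag)."""
    D = dct_matrix()
    dc, ac = [], []
    h, w = patch.shape
    for r in range(0, h - h % 8, 8):
        for c in range(0, w - w % 8, 8):
            block = D @ (patch[r:r+8, c:c+8] - 128.0) @ D.T
            dc.append(block[0, 0])
            ac.extend(block[p] for p in ZIGZAG15[1:k])
    edges = np.arange(-512.5, 513.5)   # 1025 integer-centered bins
    h_dc, _ = np.histogram(dc, bins=edges)
    h_ac, _ = np.histogram(ac, bins=edges)
    return h_dc / max(h_dc.sum(), 1), h_ac / max(h_ac.sum(), 1)
```

Pooling all AC terms into one distribution is what keeps the number of models at 2 per q2 value rather than k per q2 value.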

Employed dataset
A well known critical aspect in the design of a machine learning algorithm is the choice of a proper dataset to be employed in the training phase. Although several datasets are available in the literature, in this paper RAISE [32] was considered. The choice of RAISE allowed us to obtain heterogeneous images with different resolutions. RAISE is composed of 8156 high-resolution uncompressed images captured in different scenes (indoor, outdoor, etc.) employing different cameras.
The double compression phase was carried out by extracting N × N central patches from the raw images and compressing them with proper combinations of constant matrices Q_c (Figs. 1(a) and 1(b)) with c ∈ {1, 2, …, q1_max}. In our tests, the value of q1_max was set to 22, and 4 different patch sizes N ∈ {64, 128, 256, 512} were considered. Finally, to better organize this huge amount of data (i.e., 8156 × 22 × 22 × 4 × 15 histograms), all the distributions were clustered according to the parameters N, q2 and DCT coefficient type (DC, AC), generating 4 × 22 × 2 = 176 different sets. It has to be noted that other datasets (BOSSBASE [33] and UCID [34]) were also employed in this paper, to avoid overfitting in the parameter setting (Section 4.1) and to test the performance of the proposed solution considering different configurations in terms of image resolution (Section 5). BOSSBASE [33] is a dataset composed of 1000 512 × 512 grayscale images, created in 2010 for a scientific challenge whose goal was to figure out which images contained a hidden message, while UCID [34] is a dataset of over 1300 medium resolution uncompressed images often employed for forensic purposes.

Network architecture
The aforementioned choices about the employed dataset and the histogram generation and organization are an important starting point for the design of a machine learning approach able to work in real scenarios with custom quantization matrices. Deep learning techniques can considerably improve the overall accuracy whilst maintaining robustness and generalization properties.
Given a double JPEG compressed image I″, the main aim of the proposed solution is the estimation of the first k quantization factors employed in the first compression. It is worth noting that Q2 can be read directly from the JPEG file: the second quantization factors q2_i with i ∈ {1, 2, …, k} are already available. Furthermore, as already pointed out in Section 3.2, for each q2 two different DCT coefficient types (DC or AC) have to be taken into account, due to their difference in terms of statistical distribution. We then trained one DC-CNN and one AC-CNN for each possible value of q2 ∈ {1, 2, …, q1_max}. Each CNN has the architecture synthetically sketched in Fig. 2. The input of these networks is the normalized DCT histogram with h_q2 = ⌈1025/q2⌉ bins, and the sizes of the following layers are functions of h_q2. As summarized graphically in Fig. 2, the layers consist of 2D convolutions carried out with 1 × 3 filters, batch normalization and a ReLU activation, whereas the last two layers are fully connected, with a standard softmax over q1_max = 22 values as output layer. As far as the training phase is concerned, we employed Stochastic Gradient Descent (SGD) as optimizer with a starting learning rate of 10^-3 and momentum 0.9, while categorical cross-entropy was the loss function employed during a 15-epoch training run with batches of 512 histograms. Moreover, a step decay of the learning rate was applied, as described in Eq. (1).
lr_e = lr_0 · d^⌊e/s⌋    (1)

where e is the epoch, lr_e is the learning rate at epoch e, lr_0 is the starting learning rate, d = 0.2 is the drop value and s = 3 is the number of epochs for every drop.
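Eq. (1) corresponds to a standard step decay schedule, which can be sketched as follows (`step_decay_lr` is an illustrative name; the constants are those reported in the text):

```python
def step_decay_lr(epoch, lr0=1e-3, drop=0.2, epochs_per_drop=3):
    """Step decay: the learning rate is multiplied by `drop`
    every `epochs_per_drop` epochs."""
    return lr0 * drop ** (epoch // epochs_per_drop)
```

With these values the learning rate is 10^-3 for epochs 0-2, 2·10^-4 for epochs 3-5, and so on over the 15 training epochs.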

Regularization
Sometimes the amount of information contained in an input histogram is not enough to estimate the related first quantization factor. This lack of information can depend both on the input data (e.g., homogeneous regions) and on specific q1, q2 combinations (e.g., multiples). For example, considering q2 = 5, first quantization factor values equal to 1 and 5 are difficult to discriminate. To limit these issues, assumptions about neighboring element values in the quantization matrix can be exploited. To verify these assumptions empirically, an analysis of a dataset of actual quantization matrices (Park et al. [35]) was performed. Specifically, the dataset consists of 1170 different matrices: 1070 custom and 100 standard JPEG quantization tables. Considering only the matrices with q1_i ≤ q1_max = 22 and i ∈ {1, 2, …, 15}, 919 tables (both custom and standard) were selected, and the empirical distribution of the differences between consecutive quantization factors in zig-zag order was built. As shown in Fig. 3, neighboring elements in the quantization matrices (zig-zag order) are usually associated with similar values (i.e., their difference is close to zero).
Considering then a set of n consecutive first quantization factors to be estimated, a cost function C can be designed as the weighted average of a data term D and a regularization term R:

C = λ · D + (1 − λ) · R    (2)

where λ ∈ [0, 1], D is a cost term related to the goodness of the estimation of the first quantization factors under analysis, and R is a regularization term that tries to minimize the differences among neighboring q1 values. An equation similar to (2) has already been proposed in [29], where the data used to compute the D term were obtained employing an algorithm based on nearest neighbors. However, the main limit of the regularization approach proposed in [29] was the strategy adopted to compute the cost function C: that solution actually considered all the combinations of the n consecutive quantization factors. For example, if q1_max is 22, 22^n combinations, and thus evaluations of the cost function C, have to be performed (n = 3 in [29]).
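To make the role of Eq. (2) concrete, the following sketch evaluates the weighted cost for one tuple of candidates. The specific D and R expressions used here (one minus the mean softmax score, and the mean absolute difference between consecutive factors) are simplified stand-ins, not the paper's Eqs. (3)-(6):

```python
def regularized_cost(candidates, probs, lam=0.43):
    """Weighted cost C = lam * D + (1 - lam) * R for a tuple of n
    consecutive q1 candidates; `probs` holds their softmax scores.
    D rewards confident estimates, R penalizes jumpy neighbors."""
    n = len(candidates)
    D = 1.0 - sum(probs) / n
    R = sum(abs(candidates[i + 1] - candidates[i])
            for i in range(n - 1)) / max(n - 1, 1)
    return lam * D + (1 - lam) * R
```

Under this cost, a tuple such as (5, 1, 5) is penalized relative to (5, 5, 5) even when the per-position scores are identical, which is exactly the prior suggested by Fig. 3.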
To increase the number of consecutive elements to be considered, an analysis of the output of the proposed CNN was performed. Also in the presence of challenging conditions (e.g., multiples), the softmax output can provide useful information that can be exploited to limit the set of q1 values to be taken into account. Specifically, the softmax output is a vector of q1_max values that can be interpreted as probabilities (they are all positive and sum to one). Fig. 4 reports the softmax output computed by the proposed CNN for a histogram obtained from a double compressed 128 × 128 patch with q1 = 5 and q2 = 5. Although the estimation provided by the network is wrong (i.e., the first quantization factor with the maximum score is q1 = 1), the score associated with q1 = 5 (i.e., the correct value) is comparable with the best one. It is worth noting that the probability associated with the event q1 = 1 is 0.484, whereas the joint probability related to q1 = 1 or q1 = 5 is 0.959. A set of first quantization factors can then be selected to achieve a satisfactory probability. This behavior can be exploited in different ways depending on the specific scenario. The softmax output, interpreted as a probability, can be used as a simple index of reliability of the performed estimate. For example, given a threshold value th = 0.9, one or multiple elements can be selected in the estimation of the quantization matrix.
Another way to exploit this knowledge is to reduce the number of first quantization factors to be considered as candidates for the final estimation. Fixing a threshold th in the range [0, 1], and denoting by p_v the output provided by the softmax function with respect to the event q1 = v, the smallest set of first quantization factors whose summed p_v is higher than th is selected. The quantization factors belonging to this set can be easily collected by sorting the probabilities p_v in decreasing order and computing the cumulative sum. For instance, considering the softmax outputs depicted in Fig. 4 and th = 0.95, only two q1 values (1 and 5) are selected (see Fig. 5).
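The candidate selection just described (sort, accumulate, threshold) can be sketched as follows; `select_candidates` is an illustrative name, and the mapping of class indices to q1 ∈ {1, …, 22} is our assumption:

```python
import numpy as np

def select_candidates(softmax_out, th=0.95):
    """Smallest set of q1 candidates whose cumulative softmax
    probability exceeds th, collected in decreasing-score order."""
    order = np.argsort(softmax_out)[::-1]      # indices by decreasing score
    csum = np.cumsum(softmax_out[order])
    n = int(np.searchsorted(csum, th)) + 1     # first prefix exceeding th
    return sorted(int(v) + 1 for v in order[:n])   # classes are q1 in {1..22}
```

With the scores of Fig. 4 (0.484 for q1 = 1, roughly 0.475 for q1 = 5) and th = 0.95, the returned set is {1, 5}, matching the example in the text.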

Overall analysis
As already pointed out in Section 3, the proposed solution, differently from previous works, was specifically designed to work with a wide set of Q2 matrices. For each q2, two CNNs related to DC and AC terms were trained by employing the parametric architecture depicted in Fig. 2. Although the designed CNN, taking DCT histograms as input, is not strictly limited to a specific patch size, the accuracy with respect to different input parameters was also evaluated. More specifically, for each patch size N ∈ {64, 128, 256, 512}, 2 · 22 CNNs were trained with empirical histograms from double compressed JPEG images (see Section 3.2). Each dataset, one per patch size, was split into 80% training, 10% validation and 10% test, and exploited to train 4 sets of 2 · 22 CNNs. Each group of CNNs was thus trained with histograms obtained from input patches of the same size. It has to be noted that histograms containing no information [36] were removed and not considered in our tests. Results of the proposed CNNs at varying patch size and q1, q2 combinations are reported in Fig. 6,
where average values are computed with respect to the first 15 DCT coefficients. The obtained accuracies strictly depend on the amount of information contained in the input histogram (higher as the patch size increases) and on the combination of q1 and q2 values (e.g., multiples). It is worth noting that the reported results were computed considering training and test sets related to input patches of the same size. To further study the performance of the proposed solution, additional tests were performed considering a scenario with a mismatch between training and test patch size. Table 1 shows the average accuracy, computed with respect to q1 ∈ {1, 2, …, 22}, q2 ∈ {1, 2, …, 22} and the first 15 DCT terms, achieved by each dataset/CNN couple. As expected, for each dataset the best result corresponds to the CNN trained with images of the same size. Note that scenarios involving both standard and custom quantization matrices actually select a subset of the possible q1, q2 combinations; the reported average accuracies therefore differ from the ones shown in Section 5.

Table 1. Accuracies of the CNNs trained with a specific patch size with respect to all four generated test datasets (N ∈ {64, 128, 256, 512}). Average values were computed with respect to q1 ∈ {1, 2, …, 22}, q2 ∈ {1, 2, …, 22} and the first 15 DCT terms. Each test set is a subset (10%) of the related one described in Section 3.2, built employing constant matrices for first and second compressions and images from RAISE [32].
As already pointed out in Section 3.1, the proposed solution, taking DCT histograms as input, does not strictly depend on a specific patch size. In order to improve the overall effectiveness and usability of the method, a single set of networks can then be trained with multiple patch sizes, avoiding the selection of a specific group of CNNs for each patch size. This improvement can also be really useful whenever the input patch size differs from the ones employed to train the models (e.g., 96 × 96, 384 × 512, etc.).
A novel dataset was then built by simply merging the collections employed before with N ∈ {64, 128, 256, 512}. However, the data to be handled requires a large amount of memory resources. In order to train the proposed method at varying patch size, while exploiting all available data, a solution based on an ensemble of CNNs was considered. Specifically, the merged dataset was split into 10 sub-datasets: 8 employed for training, 1 for validation and 1 for test. Three CNN ensembles were considered with the following strategy: E_m represents an ensemble of m CNNs, each trained with 8/m training sub-datasets, m ∈ {2, 4, 8}. As reported in Table 2, all the proposed ensembles, differently from the CNNs trained with a fixed patch size, achieve satisfactory accuracy on all the considered test sets N ∈ {64, 128, 256, 512}. Although all the considered models achieve comparable accuracy (E_2 slightly better than the remaining ones), the solution with m = 2 does not considerably increase the execution time with respect to the networks trained with a fixed patch size.
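The ensemble prediction can be sketched as follows, assuming hypothetical model objects exposing a `predict` method that returns a 22-value softmax vector (the averaging rule is a common ensembling choice, stated here as our assumption rather than the paper's exact combination rule):

```python
import numpy as np

def ensemble_predict(models, histogram):
    """Average the softmax outputs of the ensemble members and
    return the predicted first quantization factor q1 in {1..22}."""
    probs = np.mean([m.predict(histogram) for m in models], axis=0)
    return int(np.argmax(probs)) + 1
```

Averaging the member outputs is what lets each CNN be trained on a fraction (8/m) of the merged multi-patch-size data while the ensemble still benefits from all of it.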

Regularization (parameter settings)
To better justify the design choices related to the regularization approach described in Section 3.4, several tests were performed. Specifically, four double compressed datasets, obtained by cropping central patches with N ∈ {64, 128, 256, 512} from 1000 images of the BOSSBASE collection [33], were built. This collection was considered in the parameter setting to limit overfitting with respect to the dataset employed to train the CNNs (i.e., RAISE [32]). To cope with real scenarios, double compression was performed employing custom tables from [35], considering only the matrices with q1_i ≤ q1_max = 22 and i ∈ {1, 2, …, 15}. Two different D terms were considered (Eqs. (3) and (4)), where i indicates the position (zig-zag order) of the DCT term under analysis, n the number of considered neighbors and p_i the probability (i.e., softmax output) provided by the proposed CNN at position i for the candidate q1_i. Moreover, two different R terms were investigated (Eqs. (5) and (6)), where q1_i is the first quantization candidate at position i (zig-zag order) under analysis. Fig. 7 reports the average accuracies obtained employing (2) and considering all the possible combinations of D ((3), (4)) and R ((5), (6)) with n = 3. For each weighting factor λ, the average accuracy is computed taking into account the four aforementioned datasets (N ∈ {64, 128, 256, 512}) and the DCT coefficients. The best performances are obtained with D_2 and R_1. An additional test was carried out with D_2 and R_1 at varying number of neighbors n ∈ {3, 5, 7} and weighting factor λ. Moreover, to make results comparable across different numbers of neighbors n, only positions i = 4, …, k were considered in the parameter setting tests. Note that, although the regularization strategy described in Section 3.4 considerably reduces the average number of combinations, the worst case scenario has to be avoided: to this aim, the maximum number of allowed combinations per estimation was set to 10^6. As can be easily seen from Fig. 8, λ = 0.43 provides the best results.

Experimental results
In order to demonstrate the effectiveness of the proposed method, a series of comparisons with state-of-the-art solutions was carried out. The experiments were run on a machine equipped with an NVIDIA TESLA K80 GPU. Both statistical [19,21,22,37] and machine learning approaches [27-29] were selected for comparison. The original code provided by the authors of the aforementioned state-of-the-art solutions was employed. Moreover, to cope with real scenarios, both standard and custom matrices have been considered in our tests. Finally, as already described in the state-of-the-art, high frequencies are usually killed by JPEG compression and, after a certain position, the coefficient values are zero (the so-called 'dead-zone'). An acceptable trade-off employed by the community is the usage of k = 15. For the sake of comparison, the tests described in the following sections were then performed with k = 15.

Comparison test
Some recent state-of-the-art solutions have been designed to work with specific patch sizes [27,28]. To properly compare the proposed method with the aforementioned approaches, a first series of tests was then performed considering several scenarios involving 64 × 64 patches as input. Moreover, the effectiveness of the proposed solution has been demonstrated considering both statistical [19,21,22,37] and machine learning methods [27-29]. Specifically, four double compressed datasets were generated starting from random 64 × 64 patches cropped from the RAISE collection [32] (one patch for each RAISE image), where the quality values in {5, 6, 7, 8, 9, 10, 11, 12} of datasets (3) and (4) refer to Photoshop's quantization matrices (version 20.0.4).
Datasets (1) and (2) are related to a classical scenario involving only standard quantization matrices whereas datasets (3) and (4), employing also Photoshop's quantization tables, can be considered a more challenging test to verify the robustness of the considered methods with respect to real conditions (i.e., custom quantization matrices). Note that Dalmia et al. [22] was not taken into account in the comparisons involving datasets (3) and (4), due to the assumptions about standard matrices in the first compression made in the provided implementation.
As reported in Table 3, where a classical scenario involving standard quantization matrices is considered, the proposed approach outperforms state-of-the-art solutions in almost all combinations. These results are also confirmed in Figs. 9(a) and 9(b), with performances analyzed at varying DCT coefficients. Moreover, the robustness of the proposed solution at varying of the employed quantization matrices has been tested with custom tables. As reported in Table 4 and Figs. 9(c) and 9(d), the gain in accuracy of the proposed approach with respect to the other CNN-based methods [27,28] increases considerably. Differently from the proposed approach, the end-to-end CNN solutions employed in [27,28] suffer in the presence of quantization matrices that have not been considered in their training process.

Generalizing test
Most state-of-the-art methods have been designed and tested considering standard quantization tables. However, as reported in [35], custom tables are frequently employed in real scenarios. The custom tables from [35] were then divided into three sets (Low, Mid and High), and 9 different combinations (every ordered couple among Low, Mid and High) were considered for double compression. Moreover, to study the performance in wild conditions, 8 different input datasets were created: 4 different patch sizes (64 × 64, 128 × 128, 256 × 256, 512 × 512) cropped from RAISE [32] and UCID [34]. For each dataset, the quantization tables employed for double compression were randomly selected from the 291 available in the corresponding set (Low, Mid, High). It is worth noting that the UCID dataset, due to the different resolution of the original images used to extract patches with respect to the collection employed to train the CNNs (i.e., RAISE [32]), allows us to verify the robustness of the proposed solution with respect to dataset variability.
As can be seen from Figs. 10 and 11, the proposed approach achieves satisfactory accuracy even in this challenging scenario. In addition, the results are closely related to the amount of information contained in the input histogram: a higher accuracy is therefore obtained as the patch size increases and with the UCID dataset [34].

Tampering localization
FQE can also be employed to perform tampering localization. The classical scenario is the following: a copy-paste of a JPEG image (foreground image) is applied on a JPEG image (background image) in order to add or hide some information. To localize the tampered areas, a sliding window approach can be applied, producing a map for each DCT coefficient; every window estimation represents a pixel of the map, and values related to tampered areas should differ from the other ones. We conducted a test with the following parameters. Starting from a JPEG image compressed with QF = 60 (the background image, Fig. 12), two tamperings were applied: the first one is a copy-move of a patch extracted from the same background image, compressed with QF = 90 and moved to the top-left corner of the image, while the second one is an external JPEG image compressed with QF = 80 and applied in the bottom-right corner. The image was then compressed again with QF = 90, obtaining the tampered image depicted in Fig. 13. To localize tampered regions, FQE was performed with our method on every 64 × 64 patch of a sliding window moved by 8 pixels each time (in both directions). For each DCT coefficient, starting from the values provided by the FQE algorithm, a mask can be generated. As can be seen from Fig. 14, tampered regions can be easily detected from the generated masks. Future steps of this work will include the analysis of FQE in the presence of different types of manipulations [38], but also in the presence of artifacts introduced by the normal life cycle of an image (e.g., the manipulation introduced by the upload on social platforms [39]).
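The sliding-window localization described above can be sketched as follows; the `estimate_q1` callback is a placeholder for the proposed FQE estimator applied to a single patch, and the window/stride values are those used in the test:

```python
import numpy as np

def tampering_map(image, estimate_q1, win=64, stride=8):
    """Slide a win x win window with the given stride and store the q1
    estimate at each position. Regions whose estimates differ from the
    dominant value of the map are candidate tampered areas."""
    h, w = image.shape
    rows = (h - win) // stride + 1
    cols = (w - win) // stride + 1
    qmap = np.zeros((rows, cols), dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            patch = image[r * stride:r * stride + win,
                          c * stride:c * stride + win]
            qmap[r, c] = estimate_q1(patch)
    return qmap
```

Thresholding or clustering such a map (one per DCT coefficient) yields the binary masks of Fig. 14.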

Conclusion
The estimation of the first quantization matrix is useful to recover information about the history of the image under analysis, mainly for forensic purposes. In this paper, a novel CNN-based estimation solution for the aligned scenario has been proposed. By properly collecting 1D histograms of DCT values (AC and DC terms) and training a dedicated neural architecture, the proposed method outperforms state-of-the-art solutions by a large margin. Moreover, experimental results carried out in challenging scenarios confirmed the robustness of the designed solution with respect to input patch size, quantization matrices (both standard and custom) and employed datasets (RAISE and UCID). A regularization strategy devoted to improving the overall results in challenging conditions (i.e., quite homogeneous patches, specific q1, q2 combinations) was also introduced.

Fig. 1 .
Fig. 1. Example of the constant matrix Q_c with c = 4 (a) and c = 22 (b), standard quantization matrix with QF = 90 (c), and custom quantization matrix extracted from Photoshop with quality 5 (d).

Fig. 2 .
Fig. 2. Parametric architecture representing the trained CNNs. The first layer represents the distribution of the i-th DCT coefficient, compressed the second time with q2. The input distribution is then reduced to h_q2 bins. The following four layers are 2D convolutions with 1 × 3 filters, batch normalization and ReLU activation. The last two layers consist of a fully connected layer and a softmax layer with 22 elements.

Fig. 4 .
Fig. 4. An example of softmax output provided by the proposed CNN considering an AC histogram obtained with q1 = 5 and q2 = 5.

Fig. 5 .
Fig. 5. An example of q1 candidate selection from the softmax output values provided by the proposed CNN considering the same input of Fig. 4. Softmax outputs p_v are sorted in descending order, the cumulative sum is computed (in green) and compared with a threshold th (in red). The smallest set of quantization factors whose cumulative sum is higher than th is considered as candidates (1 and 5).

Fig. 6.
Fig. 6. Accuracies of the trained CNNs at varying patch size and q1, q2 combinations. Average values were computed with respect to the first 15 DCT coefficients.

Fig. 7 .
Fig. 7. Average accuracy of the proposed regularization solution considering n = 3 neighbors and all the combinations of the D and R formulas.

S. Battiato et al.

Fig. 8 .
Fig. 8. Average accuracy computed considering D_2 and R_1 (best combination) at varying n (i.e., number of neighbors). Note that n = 1 actually corresponds to the results achieved without employing any regularization.

Fig. 9 .
Fig. 9. Accuracies of the same methods described in Tables 3 and 4 at varying of the quantization factor q1_i to be predicted. The values are averaged over all the first compression quality values.

Fig. 10 .
Fig. 10. Comparison between the proposed solution, Battiato et al. [37] and Battiato et al. [29] considering custom tables from [35] and patches from the RAISE dataset at varying of N ∈ {64, 128, 256, 512}. L, M and H represent respectively the sets of matrices Low, Mid and High described in Section 5.2.

Fig. 13 .
Fig. 13. Tampered image employed in the test. Several patches have been added to the original image (Fig. 12) and a second compression with QF = 90 is performed.