Fast pornographic image recognition using compact holistic features and multi-layer neural network



Introduction
The spread of pornographic content (pictures, cartoons, animation, video, and text) is a serious problem for a country like Indonesia because of its many adverse effects, particularly on children and youths. A survey report in 2005 showed that 8% of emails, 12% of homepage browsing, and 25% of search engine queries were related to pornographic content [1]. Among the adverse effects, the most serious is addiction to accessing pornographic content, which means that the addict becomes dependent on it. Additionally, early pregnancy or deviant sexual behavior can also be triggered by addiction to pornographic content, especially in children and teenagers who do not sufficiently understand its undesirable consequences. Consequently, a blocking scheme that can prevent the browsing, accessing, and making of pornographic content is required to minimize these adverse effects. To build a reliable blocking system, a proper recognition or classification algorithm is needed. In this case, the function of the recognition or classification algorithm is to determine the correlation between the training set features and the input features. If the algorithm returns a high similarity score (above a given threshold), the input is recognized as pornographic content and is then blocked from being seen, saved, and shared. In order to contribute to the rejection of pornographic content, an alternative pornographic image recognition method using compact holistic features and a neural network is presented. The compact holistic features, which are invariant to pose and scale, are extracted by shape and frequency analysis of pornographic images under skin regions of interest (ROIs). The skin ROI has been confirmed to handle the large variability of pornographic images due to background variations [2], [3]. Classification based on a neural network is employed to reduce the recognition time. The main aim of the paper is to design a new scheme of pornographic image recognition using compact holistic features representing shape and dominant skin information, which can improve the performance of existing methods (i.e., methods based on skin probability, eigenporn, Multilayer-Perceptron and Neuro-Fuzzy, etc.). In this research, the performance indicators for examining the proposed method are accuracy, false negative rate (FNR), false positive rate (FPR), and computational time.
Regarding previous works, there are three main groups of adult/pornographic image recognition schemes [4], namely approaches based on color, shape, and local descriptors. Other researchers [1] have grouped approaches to pornographic filtering into three major classes: 1) based on text content, 2) based on lists of adult website addresses that are blocked by an internet firewall, and 3) based on image content analysis [1], [2], [5]-[7]. The text content-based method classifies material as pornographic using the probability/entropy of pornography-related text found in the material (i.e., websites). However, this method fails to block material containing many pornographic images and videos. Next, the website URL-based method rejects access to adult images using an internet firewall such as squid, which has rules to block a list of website addresses (URLs) belonging to pornographic content. However, adult websites grow quickly, and the owner can easily rename a website address. Finally, the method based on image analysis rejects access to a website based on the images or videos that exist in the media. It can overcome the weaknesses of the two methods above. However, this last type of filtering method faces many obstacles because of the huge image variability due to skin, pose, lighting, and background.
Additionally, algorithms for detecting pornographic images can also be categorized as based on region, contour [8], human skin probability [2], [6], [9], and scalable color, edge histogram, and shape descriptors [10]. Both the region and the contour of the pornographic image were obtained by using skin color. Those schemes were proposed to solve the difficulties of pornographic image recognition. However, they still show limited performance, especially high false negative and false positive rates, due to the huge variability of the input images. Commonly, skin segmentation was performed with a skin region scheme based on a threshold model in the YCbCr, RGB, and HSV color spaces [2], [9] or on Gaussian mixture models [8]. Feature extraction procedures can be classified into local (eyes, nose, mouth, and genitals), holistic, and shape feature extraction techniques. All of these procedures were commonly applied because they can work quickly and properly. Examples of holistic feature extraction are content-based feature extraction using frequency analysis (FFT [11], DCT [12], and Wavelet [13], [14]), feature point descriptors using scale-invariant feature transforms (SIFT) [15]-[17], eigenporn of HSV-ROI feature extraction [18], and descriptors of color, edge, and shape of pornographic images [10]. The most recent strategy for pornographic recognition is a modification of the Convolutional Neural Network (CNN) called the Deep Multicontext Network (DMN) [19]. In the DMN framework, a deep CNN is applied to model fused features of sensitive objects in images. The DMN algorithm appears to require a complicated process, which affects the computational cost.
The diversity of pornographic recognition schemes shows that pornographic image recognition is a demanding task due to the huge variability of the images. It also means that pornographic image recognition remains a challenging research topic. Furthermore, those approaches have not yet considered the computational time of the recognition process. Therefore, an alternative solution to the pornographic recognition problem, using compact holistic features representing shape and dominant skin information and a multi-layer neural network (MNN), is proposed to improve on the established pornographic recognition methods, and it can potentially be implemented on mobile devices. In order to confirm the robustness of the proposed method, the experimental data are compared to those of well-known pornographic image recognition methods (i.e., skin probability, skin region [2], [3], [8], [9], eigenporn of HSV ROI [18], fusion descriptor of YCbCr ROI (FD) [20], skin probability and eigenporn on YCbCr (SEP) [17], and Multilayer-Perceptron and Neuro-Fuzzy (MP+NF) [1]).

Method
The block diagram of the proposed pornographic image recognition is presented in Fig. 1. The main concern of this research is to design compact holistic features that consist of shape and dominant skin information of pornographic images. The compact holistic features are extracted by moments and the Discrete Cosine Transform on pornographic images under the skin ROI. The classification process is performed using an MNN. The main difference between this method and the eigenporn [17] and MP+NF [1] methods lies in the features and the classification algorithms. The eigenporn-based method used the eigenporn extracted by PCA [21], [22] with k-nearest neighbor [23], [24] for classification, whereas the MP+NF-based method used multiple features with Multilayer-Perceptron and Neuro-Fuzzy for classification.

Features extraction
The compact holistic features, consisting of shape and dominant skin information, are vectors representing the global information of pornographic or non-pornographic images. In this case, the compact holistic features, which are designed to require limited memory space, are extracted from the chrominance components (Cb and Cr) of the input images. The intensity component (Y) is not included in the features because it is susceptible to lighting variations. Briefly, the feature extraction consists of pre-processing, ROI extraction, and shape and frequency analysis.

Pre-Processing and ROI extraction
The pre-processing consists of image resizing and lighting normalization. The input image is scaled to a size of 256 pixels while keeping the height-to-width ratio, to decrease the computational time of the subsequent processes. In order to decrease the lighting effect on images, histogram equalization is employed for normalization. Without lighting normalization, skin pixels in images with strong lighting effects fail to be classified as skin. In other words, histogram equalization is applied to handle the large lighting variability of the input images.
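The two pre-processing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "256 pixels" refers to the longer image side, uses nearest-neighbour sampling for the resize, and equalizes a single 8-bit channel. The function names `resize_keep_ratio` and `equalize_histogram` are hypothetical.

```python
import numpy as np

def resize_keep_ratio(image, target=256):
    """Nearest-neighbour resize so the longer side equals `target` pixels
    (assumption: the text's "256 pixels" refers to the longer side)."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map every output row/column back to its nearest source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

def equalize_histogram(channel):
    """Histogram equalization of one 8-bit channel via its cumulative histogram."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant image: nothing to equalize
        return channel.copy()
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[channel]
```

Applying `equalize_histogram` to each channel separately, or only to a luminance channel, is a design choice the text does not specify.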
Next, the ROI extraction starts with pixel-based skin classification to obtain the skin tone image. Several methods for pixel-based skin classification have been presented [1], [9], [25], [26]. Among the existing methods, the best performance was provided by pixel-based skin classification in the YCbCr color space [9], [17]. Therefore, pixel-based skin classification in the YCbCr color space is employed for extracting the skin tone of images, as presented in Algorithm 1 (Fig. 2).
In this case, the Cb and Cr values are in the range of 16-240. These values are the output of the RGB to YCbCr transformation algorithm [27]. Additionally, a morphological operation is also included in this process to remove false positive skin classifications. This model has been shown to provide better performance for recognizing pornographic images [17]. An example output of the skin tone extraction algorithm is given in Fig. 3. From the skin tone image, the ROI is extracted by using vertical and horizontal projection [17]. First, the vertical (row) and horizontal (column) integral projections of the skin tone image are computed to find the coordinates with large skin and non-skin regions. Secondly, the parts of the vertical and horizontal projection probability that are below a defined threshold are removed. In this case, by trial and error, the best threshold was found to be 0.25 of the maximum vertical and horizontal projection probability. Thirdly, the skin tone image is cropped using the x and y coordinates at which the vertical and horizontal projection probabilities are thresholded. Finally, the cropped skin tone is mapped to the original image to get the skin ROI image. The skin ROI is used to eliminate the non-skin information of pornographic images and to reduce the huge variability of pornographic images caused by background variations.
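The skin classification and projection-based cropping described above can be sketched as follows. This is an illustration under stated assumptions, not Algorithm 1 itself: the Cb/Cr thresholds are a commonly used skin range chosen here as placeholders (the thresholds of Algorithm 1 are not given in the text), the RGB to CbCr conversion uses the ITU-R BT.601 coefficients (which yield the 16-240 range mentioned above), the morphological clean-up step is omitted, and "mapping to the original image" is interpreted as keeping original pixels only where the mask is set. All function and constant names are hypothetical.

```python
import numpy as np

# Hypothetical chrominance thresholds (a commonly used skin range);
# the thresholds of Algorithm 1 are not given in the text.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)

def rgb_to_cbcr(rgb):
    """ITU-R BT.601 conversion of 8-bit RGB to Cb and Cr (both in 16..240)."""
    r, g, b = [rgb[..., i].astype(np.float64) / 255.0 for i in range(3)]
    cb = 128.0 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128.0 + 112.0 * r - 93.786 * g - 18.214 * b
    return cb, cr

def skin_mask(rgb):
    """Boolean mask of pixels whose (Cb, Cr) fall inside the assumed skin range."""
    cb, cr = rgb_to_cbcr(rgb)
    return ((cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]) &
            (cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1]))

def skin_roi(rgb, ratio=0.25):
    """Crop to the rows/columns whose skin projection reaches `ratio` times
    the maximum projection. Returns None when no skin pixel is found."""
    mask = skin_mask(rgb)
    if not mask.any():
        return None
    row_proj = mask.sum(axis=1)  # skin pixels per row
    col_proj = mask.sum(axis=0)  # skin pixels per column
    rows = np.flatnonzero(row_proj >= ratio * row_proj.max())
    cols = np.flatnonzero(col_proj >= ratio * col_proj.max())
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    # Assumption: keep original pixels only where they were classified as skin.
    roi = rgb[top:bottom, left:right].copy()
    roi[~mask[top:bottom, left:right]] = 0
    return roi
```

The default `ratio=0.25` follows the threshold reported in the text; everything else in the sketch should be tuned against real data.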

Shape and Frequency Analysis
Frequency analysis, such as the Fast Fourier Transform (FFT) [11], [28], the Discrete Cosine Transform (DCT) [1], [16], [17], and the Discrete Wavelet Transform (DWT) [13], [14], has been used to extract holistic features of an image. In this research, we propose a scheme that differs from the mentioned methods in that it combines shape information and frequency features of the skin ROI images. The shape information is extracted using an invariant moment algorithm to cope with the huge variability of pornographic images due to changes of pose, and the DCT is used to get the holistic skin information of ROI images, which is robust to rotation and scale variations. The DCT algorithm used to extract the predominant frequency content of an ROI image (I) of size N x M is given by Eq. (1). From the two-dimensional DCT coefficients, the holistic skin information is selected from the Cb-Cr color space in two steps: convert the DCT coefficients to a one-dimensional vector using the zigzag rule as implemented in JPEG compression, and select the first 30 elements from each of the Cb and Cr vectors. The shape information is extracted only from the intensity component of the image because most of the shape information is available in this component. Finally, from both the shape and the skin information, the compact holistic features (HF) are composed by concatenating them into a vector, as presented in Algorithm 2 (Fig. 4).
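The frequency part of the feature extraction (2-D DCT, zigzag scan, first 30 coefficients per chrominance channel) can be sketched as follows. The sketch assumes the orthonormal DCT-II as the transform, which is the form used in JPEG up to scaling; the exact normalization of Eq. (1) may differ. The shape (invariant moment) part is not covered, and all function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(block):
    """Separable 2-D DCT-II of an N x M array."""
    n, m = block.shape
    return dct_matrix(n) @ block @ dct_matrix(m).T

def zigzag_indices(n, m):
    """(row, col) pairs of an n x m array in JPEG zigzag order."""
    order = []
    for s in range(n + m - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < m]
        # Even anti-diagonals run from bottom-left to top-right.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order

def dominant_coefficients(channel, count=30):
    """First `count` DCT coefficients of one channel in zigzag order."""
    coeffs = dct2(np.asarray(channel, dtype=np.float64))
    idx = zigzag_indices(*coeffs.shape)[:count]
    return np.array([coeffs[i, j] for i, j in idx])

def skin_features(cb, cr, count=30):
    """Concatenate the dominant Cb and Cr coefficients into one vector."""
    return np.concatenate([dominant_coefficients(cb, count),
                           dominant_coefficients(cr, count)])
```

Because the zigzag scan visits low frequencies first, truncating to the first 30 entries keeps the coarse chrominance structure of the ROI and discards fine detail.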

Multi-Layers Neural Network (MNN) Model
There are many types of neural networks that can be used for pattern recognition, which are distinguished by the neuron architecture, the kind of training, the number of layers, etc. Generally, the MNN architecture shown in Fig. 6 consists of an input vector (p), biases (b), weight matrices (W), and transfer functions (f) [29], [30]. The weight matrix connected to the input vector is called the input weight (IW), while a weight matrix connected to the output of a layer is called a layer weight (LW). In addition, the superscripts of the weights indicate the source/origin (second index) and the destination/target (first index) of the connection. From Fig. 6, the output of the neural network is defined by Eq. (2). In this paper, an MNN model is employed for classification because it can work powerfully and quickly. For example, a two-layer neural network, in which the first and second layers are sigmoid and linear, respectively, can be trained to approximate any function/task (with a limited number of discontinuities) adequately. However, the best configuration of layers and transfer functions (how many layers and which transfer functions) has to be investigated by performing several experiments.
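The layer-by-layer computation of such a network can be sketched as a forward pass in which each layer applies its transfer function to a weighted sum plus bias. This is an illustrative sketch only: the function names follow common toolbox naming but are defined here, and the layer sizes and random weights in the usage lines are hypothetical, not the trained model.

```python
import numpy as np

def tansig(n):
    """Tan-sigmoid transfer function."""
    return np.tanh(n)

def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-n))

def purelin(n):
    """Linear transfer function."""
    return n

def mnn_forward(p, weights, biases, transfers):
    """Propagate the input vector p through the layers in order:
    a = f(W a_prev + b) for every layer."""
    a = np.asarray(p, dtype=np.float64)
    for w, b, f in zip(weights, biases, transfers):
        a = f(w @ a + b)
    return a

# Usage with hypothetical sizes: 40 inputs, two hidden layers, 8 outputs.
rng = np.random.default_rng(0)
sizes = [40, 10, 10, 8]
weights = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]
biases = [np.zeros(sizes[i + 1]) for i in range(3)]
output = mnn_forward(rng.standard_normal(40), weights, biases,
                     [tansig, logsig, purelin])
```

The first matrix in `weights` plays the role of the input weight (IW) and the remaining ones the layer weights (LW).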

Recognition Process
The recognition process has two main stages: training and matching. The training needs a dataset consisting of both non-pornographic and pornographic images. The compact HFs of the input data are extracted by using Algorithm 2 (Fig. 4). Next, feature selection is carried out to remove the similar features of the extracted compact HFs by the intersection operation (3).
In Eq. (3), HFP,N is the final trained HF, and HFN and HFP are the compact HFs of non-pornographic and pornographic images, respectively. From these sets, the global mean and standard deviation vectors of HFP and HFN, denoted μP, σP and μN, σN, respectively, are determined. Next, in order to find the features with the minimum difference in mean and standard deviation between the two training sets, the distance between (μP, σP) and (μN, σN) is calculated. The HFs with the smallest scores are regarded as shared information and are removed to retain the most discriminant information.
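One way to read this selection step is sketched below. It is an interpretation, not the authors' procedure: the per-feature score is assumed to be the Euclidean distance between the (mean, standard deviation) pairs of the two classes, and the number of features to keep (`keep`) is a hypothetical parameter.

```python
import numpy as np

def select_discriminant_features(hf_p, hf_n, keep):
    """Return the indices of the `keep` features whose class statistics
    differ most. hf_p, hf_n: arrays of shape (num_images, num_features)."""
    mu_p, sigma_p = hf_p.mean(axis=0), hf_p.std(axis=0)
    mu_n, sigma_n = hf_n.mean(axis=0), hf_n.std(axis=0)
    # Assumed score: distance between the (mu, sigma) pairs of each feature.
    score = np.sqrt((mu_p - mu_n) ** 2 + (sigma_p - sigma_n) ** 2)
    ranked = np.argsort(score)[::-1]  # largest distance first
    return np.sort(ranked[:keep])
```

Features with a near-zero score behave the same way in both classes, so dropping them shortens the vector without removing discriminant information.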
Next, the MNN model is trained using HFP,N, supervised by two target vectors. The first target, for pornographic HFP, is [1 1 1 1 1 1 1 1] and for non-pornographic [...] For instance, a two-layer neural network with linear, log-sigmoid, and tan-sigmoid transfer functions could reach the preset error goal when it was trained using the HF (size 64 elements) and the defined target vectors.
Finally, the classification is performed by simulating the query HF with the trained MNN. If the simulated output vector is close to the first target vector, the query HF is classified as a pornographic image; otherwise, it is classified as a non-pornographic image. This classification process is expected to work very fast, which is the main benefit of the method, because the query HF does not need to be compared to all trained HF vectors.
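The decision rule can be sketched as a closeness test against the first target vector. The closeness measure (mean absolute deviation) and the `tolerance` value are assumptions made for illustration; the text only states that the output should be "close to" the target.

```python
import numpy as np

PORN_TARGET = np.ones(8)  # first target vector given in the text

def is_pornographic(output, tolerance=0.5):
    """True when the network output is close to the pornographic target.
    `tolerance` is a hypothetical bound on the mean absolute deviation."""
    diff = np.asarray(output, dtype=np.float64) - PORN_TARGET
    return bool(np.mean(np.abs(diff)) < tolerance)
```

A single forward pass followed by this comparison is what makes the query cost independent of the training set size.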

Results and Discussion
In order to evaluate the proposed method, several experiments were carried out using two datasets, the UNRAM dataset [17] and the Kia dataset [1]. The UNRAM dataset consists of 687 pornographic and 712 non-pornographic images, while the Kia dataset has 18354 images, of which 9295 are non-pornographic and 9059 are pornographic. The images of both datasets were downloaded from the Internet using downloader tools. The pornographic images of both datasets have large variability in terms of people, pose, and skin, while the non-pornographic images contain objects that are similar to human skin, such as flower, wood, tiger, desert, etc. The data treatment for the experiments was as follows: a) for each dataset, 50% of the non-pornographic and pornographic images were randomly taken as the training set, and the remaining images were used for testing; b) the accuracy, false negative rate (FNR), and false positive rate (FPR) were used as performance indicators (using Eqs. (4), (5), and (6)); and c) the evaluation was performed on a PC with an Intel Core i3-2370M, 2.4 GHz, and 8 GB RAM.
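The random 50% split in step a) can be sketched as below, applied separately to each class. The function name and the fixed seed are illustrative assumptions; the text does not say how the random selection was implemented.

```python
import random

def split_half(items, seed=0):
    """Randomly split a sequence into training and testing halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Usage: split the pornographic and non-pornographic image lists separately
# so that both classes contribute half of their images to training.
porn_train, porn_test = split_half(range(687))
nonporn_train, nonporn_test = split_half(range(712))
```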
The performance indicators were determined by formulas derived from the confusion table [19], [31] (Table 1).
In these formulas, NP is the total number of pornographic testing images, and NN is the total number of non-pornographic testing images.
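Under the standard definitions, and assuming that Eqs. (4), (5), and (6) correspond to accuracy, FNR, and FPR in that order, the indicators can be written in terms of the confusion-table counts (TP, TN, FP, FN of Table 1) and the totals NP and NN as:

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{N_P + N_N} \times 100\% \tag{4} \\
\text{FNR} &= \frac{FN}{N_P} \times 100\% \tag{5} \\
\text{FPR} &= \frac{FP}{N_N} \times 100\% \tag{6}
\end{align}
```

Here NP = TP + FN and NN = TN + FP, so the accuracy equals 100% minus the error rates weighted by the class sizes.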

Table 1. Confusion table

Classified as Porn
True Positive (TP): porn images that were correctly classified as porn
False Positive (FP): non-porn images that were incorrectly labeled as porn

Classified as Non-Porn
False Negative (FN): porn images that were incorrectly marked as non-porn
True Negative (TN): non-porn images that were correctly classified as non-porn

The first test was executed on the UNRAM dataset to demonstrate that the proposed compact HF can be used to discriminate between non-pornographic and pornographic images. Furthermore, this experiment also investigated what HF size is sufficient for pornographic image recognition. The matching process in the first experiment was performed by Euclidean distance, where the smallest distance indicates the best match. The test results show that the proposed compact HF, which consists of shape and dominant skin information, gives sufficiently high accuracy, as shown in Fig. 7. These experimental results show that the compact HF can be used to discriminate between non-pornographic and pornographic images. This is achieved because the compact HF has good enough discriminant information, as shown in Fig. 5. Additionally, the best HF size for pornographic recognition is 40 elements, as indicated by the highest accuracy and sufficiently small FPR and FNR (see Fig. 7). In detail, the best accuracy is about 88.17%, and the FNR and FPR are about 5.51% and 17.79%, respectively. This experimental result also shows that the HF requires little memory to represent a pornographic image, which reduces the computational cost of the recognition process. For further evaluation, the next experiments were performed with the best HF size (40 elements).
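The Euclidean-distance matching used in this first experiment can be sketched as a nearest-neighbour lookup. The function name and the toy labels are illustrative assumptions.

```python
import numpy as np

def nearest_neighbor_label(query, train_features, train_labels):
    """Return the label of the training HF closest to the query HF."""
    distances = np.linalg.norm(np.asarray(train_features) - np.asarray(query), axis=1)
    return train_labels[int(np.argmin(distances))]

# Usage with toy two-element feature vectors.
train = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = ["non-porn", "porn"]
predicted = nearest_neighbor_label([9.0, 8.0], train, labels)
```

Unlike the MNN used later, this matching compares the query against every training vector, so its cost grows with the size of the training set.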
The second and third tests were conducted to find the best MNN model parameters (hidden layers and transfer functions) for compact HF classification. In these experiments, different numbers of hidden layers and different transfer functions were investigated to obtain the best MNN configuration for pornographic image classification. The results of the second test show that the best performance is given by the MNN model with two hidden layers (2 HLs), as indicated by the highest accuracy and the smallest FNR and FPR presented in Fig. 8. These results agree with the theory that an MNN can approximate any function/task/problem (with a limited number of discontinuities) arbitrarily well. It means the MNN can provide a crisp classification hyperplane for pornographic image recognition. Next, in order to find the best combination of transfer functions for the MNN model with 2 HLs, the third experiment was performed using the same dataset as in the second experiment. The transfer functions evaluated in this experiment were linear (L), log-sigmoid (S), and tan-sigmoid (T). The experimental results show that the best combination of transfer functions is T, S, and L, as shown in Fig. 9. It means that the best parameters of the MNN model for classifying the compact HF are two HLs and TSL transfer functions. Furthermore, the third experimental results also support the second experimental results in terms of the power of the MNN for classifying the compact HF of non-pornographic and pornographic images.
In order to compare the combination of compact HF and MNN (HF+MNN) for pornographic image recognition with the previous methods (skin probability (SP), skin region (SR), fusion descriptor (FD on YCbCr), eigenporn on HSV ROI (Ep on HSV), and SEP methods [2], [3], [8], [9], [17], [18], [20]), the fourth experiment was done on the UNRAM dataset using the best HF size and the best MNN model from the prior experiments. The results show that HF+MNN gives greater accuracy and smaller FNR than the established systems, as shown in Fig. 10. In detail, compared with the best existing method (SEP [17]), the HF+MNN method increases the accuracy by 0.30% and decreases the FNR by 6.60%; however, its FPR increases by about 5.90%. Even though the FPR of HF+MNN is higher than that of the SEP method, HF+MNN takes a very short computational time (0.21 seconds) compared with the existing methods. Therefore, it can be concluded that HF+MNN outperforms the existing methods. Overall, these experimental results are in line with all previous results, which show that the compact HF of skin ROI images can be used to discriminate between non-pornographic and pornographic images. This performance is achieved because the compact HF consists of the most significant shape and dominant skin information of the skin ROI images. The next experiment was performed on the large dataset (Kia dataset [1]) to examine the robustness of the HF+MNN method over pornographic images with large variability. In this case, the HF+MNN method is compared to the latest existing method (MP+NF [1]). The experimental result shows that HF+MNN provides similar performance to the latest existing method (almost 90% accuracy, and 10% and 7% FNR and FPR, respectively), as presented in Fig. 11. It confirms again that the combination of compact HF and MNN gives good enough performance for recognizing pornographic images. The large variability of the images due to skin-like and clothing variations, such as images with a skin-like background, people dressed in transparent and mini clothes, partly pornographic images, and images with a dominant background, causes false classification. Additionally, false recognition also happened due to skin-like clothes. Fig. 12 shows examples of images that were misclassified by the proposed recognition algorithm.
The last experiment was carried out to determine whether the proposed recognition system can work fast. The experimental result shows that the HF+MNN method requires a much shorter recognition time, about 0.021 seconds per image, than the existing methods, which makes it the fastest recognition process (Fig. 13). This is achieved because the compact HF has a small size (40 elements) for each image. However, the weakness of HF+MNN is its training time of about 194.52 seconds for the UNRAM dataset. The long training time is caused by the MNN and is well known as the main weakness of neural networks. In practice, this problem can be handled by separating the training and recognition processes. From these results, the HF+MNN method has the potential to be implemented in a pornographic rejection system for a mobile platform.

Fig. 1. Block diagram of the proposed pornographic image recognition

Fig. 4. The process of holistic features extraction

When the compact holistic features are evaluated on 1000 pornographic and 1000 non-pornographic images, they show good enough discriminant information, as seen from their distribution in two-dimensional space (Fig. 5). Fig. 5 indicates that the features of non-pornographic and pornographic images are separated from each other. It means the compact holistic features have the potential to be used for recognizing pornographic images.

Fig. 5. The distribution of HF of 1000 pornographic and 1000 non-pornographic images

Fig. 7. The effect of HF size on accuracy of the proposed pornographic image recognition methods

Fig. 8. The performance of pornographic recognition in some MNN models

Fig. 11. The performance comparison to the recent methods for Kia dataset

Fig. 12. Examples of images that were misclassified by the proposed recognition algorithm

Fig. 13. The proposed method computational time compared to that of existing methods