ASM-Based Objectionable Image Detection in Social Network Services

This paper presents a method for detecting harmful images using an active shape model (ASM) in social network services (SNS). For this purpose, our method first learns the shape of a woman's breast lines through principal component analysis and alignment, as well as the distribution of the intensity values of the corresponding control points. This method then finds actual breast lines with a learned shape and the pixel distribution. In this paper, to accurately select the initial positions of the ASM, we attempt to extract its parameter values for the scale, rotation, and translation. To obtain this information, we search for the location of the nipple areas and extract the location of the candidate breast lines by radiating in all directions from each nipple position. We then locate the mean shape of the ASM by finding the scale and rotation values with the extracted breast lines. Subsequently, we repeat the matching process of the ASM until saturation is reached. Finally, we determine objectionable images by calculating the average distance between each control point in a converged shape and a candidate breast line.


Introduction
We are currently living in the Internet age, in which the rapid development of computer and communication technologies gives us access to a seemingly unlimited amount of information. Anyone can easily obtain the information they want, but the same ease of access also makes it easy to encounter harmful images. As a result, exposure to such images is emerging as a social issue.
The routes by which harmful images are accessed on the Internet are diversifying and include P2P sites, blogs, bulletin boards, social network services (SNS), and more. Such images can also be delivered through spam sent by e-mail or multimedia messaging services (MMS). To block this variety of harmful information, we must devise a way to determine whether texts and images are harmful and to prevent their delivery. However, judging the harmfulness of text is not effective because it depends on the language used.
Therefore, as a way of blocking harmful images, the necessity for an image-based scheme is increasing [1][2][3].
In recent years, several methods have been used to address indiscriminate access to harmful images. For example, access control based on URLs and harmful words has been applied. However, because this approach restricts only the listed sites and words, it is not a fundamental solution to the problem [4]. Many studies have therefore been conducted, and most of them rely on skin color features [5][6][7][8]. However, although skin color is used to detect adult images, human skin color differs among races, and skin-like colors can also be observed in areas that contain no human skin.
These disadvantages make the determination of objectionable images ambiguous, and this ambiguity leads to poor detection performance. Therefore, this paper presents an effective algorithm that detects harmful images by applying more distinctive features found in a woman's breast areas. Figure 1 shows the overall flow of the suggested algorithm. Our algorithm consists of two major phases: a learning phase and a search phase. In the learning phase, we receive an image for learning and set control points for the active shape model (ASM). We then extract the positions and pixel intensity distributions of the selected control points. Once all of the learning images have been processed, we align the shapes, which are expressed in the form of vectors, perform principal component analysis, and compute the covariance matrix for the pixel intensity distribution of each control point.
In the search phase, we extract edges from an input image by using the Canny edge detector and perform a dilation operation on the extracted edges, followed by labeling. We then locate the positions of the nipples by computing the compactness and the ratio of the width to the height of each labeled region. On the basis of the extracted nipples, we generate a data structure that stores the first edges whose distance from the nipple exceeds a predefined threshold, and we select the candidate breast line from among them. This process is based on the fact that a breast line is curved and has a shape similar to a circle [9].
For this purpose, we first look for the center of a circle by analyzing the generated data structure and then select and expand a candidate breast line by using the distribution of the distance values. After choosing the candidate breast line, the search process of the ASM is performed with the information on the scale, rotation, and initial position. We locate the mean shape of the ASM by using the acquired position parameters, and we find a new form of the model through the pixel intensity distribution and the edge information. If the extracted shape is considered to be acceptable, the search process is repeated. Otherwise, we look for an acceptable shape by investigating the next candidates. Finally, we decide that the obtained form is the real breast line when a saturation condition is reached.
The remainder of this paper is organized as follows. Section 2 explains the learning scheme that uses the ASM. Section 3 explains how to choose the initial position of the ASM. Section 4 describes the method of searching with the model and detecting the breast areas accurately. In Section 5, we present several experimental results to show that the suggested approach can detect harmful images effectively, and conclusions are presented in Section 6.

Learning ASM
2.1. Definition and Alignment of the Learning Shapes. The ASM was first proposed by Cootes et al. [10]. This approach consists of a learning phase and a search phase. In the learning process, we extract the averages, eigenvectors, and eigenvalues for the ASM. In addition, we obtain the covariance matrix and the average of the intensity distributions for the control points. The average shape extracted in the learning phase is used as the initial shape in the search process, and the eigenvectors are used as a basis for determining whether a shape is acceptable. The covariance matrix for the distribution of the pixel values is used to obtain the Mahalanobis distances, which serve as the criterion when the current shape model transitions to a new one [11].
In the learning process, we must select the learning data and define a shape model for the selected data. Figure 2 shows how to learn the shape of the model, where the control points are located at uniform intervals on the edges along the breast line. Once the shape of the model is defined through learning, each learning pattern is expressed in the form of a vector as follows:
$$\mathbf{x}_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, \ldots, x_{in}, y_{in})^T, \quad 1 \le i \le N. \tag{1}$$
In (1), $i$ is the index that represents the learned shapes and $N$ denotes the number of shapes. $j$ is the index that represents the control points ($1 \le j \le n$), and $x_{ij}$ and $y_{ij}$ are the positions of each control point. For example, the control points in Figure 2 are represented as a vector of 18 elements ($n = 9$).
After all of the learning data have been represented as vectors, they are aligned. In this paper, we align all of the other data to the first entered data. Each learning shape has its own size and position. Therefore, we move the center of gravity of each shape to the origin and normalize its norm to 1. We then rotate and align the normalized vectors by using the singular value decomposition (SVD) method [12][13][14].
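The alignment steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the rotation is found with the closed-form angle for two-dimensional point sets rather than with the SVD used in the paper (for 2D shapes the closed form yields the least-squares rotation directly).

```python
import math

def center_and_normalize(shape):
    """Move the centroid of a list of (x, y) points to the origin, then scale to unit norm."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    centered = [(x - cx, y - cy) for x, y in shape]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]

def rotate_onto(shape, reference):
    """Rotate `shape` by the angle that minimizes the squared distance to `reference`.

    Both inputs are assumed to be centered and to list corresponding points
    in the same order.
    """
    a = sum(x * rx + y * ry for (x, y), (rx, ry) in zip(shape, reference))
    b = sum(x * ry - y * rx for (x, y), (rx, ry) in zip(shape, reference))
    theta = math.atan2(b, a)  # least-squares rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in shape]
```

In this sketch, the first learning shape would be passed through `center_and_normalize` and used as `reference` for every other shape.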
Figure 3 shows the aligned learning data and the mean. The red squares represent the mean shape, and the yellow crosses indicate the aligned learning shapes.

Principal Component Analysis.
To use an ASM, an acceptable form must be defined, because the search should be performed while maintaining the form and the final result should fall within an acceptable range. For this purpose, principal component analysis (PCA) is used [15,16]. We define the variance of the learned data that is obtained through PCA as the criterion of an acceptable form.
For the PCA, we obtain an average form as in (2), and the covariance matrix is computed as in (3). We then calculate the eigenvalues and eigenvectors of the covariance matrix as follows:
$$\bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i, \tag{2}$$
$$\mathbf{S} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T, \qquad \mathbf{S}\mathbf{p}_k = \lambda_k \mathbf{p}_k. \tag{3}$$
Equation (4) represents a shape vector by using the eigenvectors and the mean vector. Because we obtained the eigenvectors by using the covariance matrix of the learning data, we can express a form with the eigenvector matrix $\mathbf{P}$ and the weight vector $\mathbf{b}$. The reason why = is written as ≈ in (4) is that we reduce the dimension of the eigenvector matrix to speed up the computation:
$$\mathbf{x} \approx \bar{\mathbf{x}} + \mathbf{P}\mathbf{b}. \tag{4}$$
Figure 4 is a graphical representation of the meaning of PCA. In this paper, PCA is performed to define an acceptable shape by acquiring the main axes and the distribution, namely, the eigenvalues, of all of the learning data. In Figure 4, $\mathbf{x}$ is represented by using $\bar{\mathbf{x}}$ and $\mathbf{P}\mathbf{b}$, which represents the distance from $\bar{\mathbf{x}}$, where $b_1$ and $b_2$ are the weighting factors of the eigenvectors. Thus, to be included in the distribution of Figure 4, $b_1$ and $b_2$ must be within some range [17,18]. Figure 5 shows how the shape changes for the two largest eigenvalues $\lambda_1$ and $\lambda_2$ after sorting the extracted eigenvalues in descending order. In Figure 5, the left and right shapes are produced by applying the weights $-3\sqrt{\lambda_k}$ and $3\sqrt{\lambda_k}$ to the corresponding eigenvectors, respectively. The middle shapes represent the mean shapes. In the upper shapes, the weight changes of the eigenvector indicate how high the breast lines are. The lower shapes represent the rotation of the breast lines. The limits obtained here from the eigenvalues are used as thresholds in the search phase when acceptable shapes are determined.
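As a rough illustration of the learning step described above, the following sketch computes the mean shape, the covariance matrix, and the dominant mode of variation. It is an assumption-laden example rather than the paper's implementation: the function names are hypothetical, the covariance is divided by the number of shapes, and power iteration is swapped in as a dependency-free way to obtain only the largest eigenvalue and its eigenvector (a full eigendecomposition would be used in practice).

```python
import math

def mean_vector(shapes):
    """Element-wise mean of equally long shape vectors."""
    n = len(shapes)
    return [sum(s[k] for s in shapes) / n for k in range(len(shapes[0]))]

def covariance(shapes, mean):
    """Covariance matrix of the shape vectors (divided by the number of shapes)."""
    n, d = len(shapes), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for s in shapes:
        diff = [s[k] - mean[k] for k in range(d)]
        for r in range(d):
            for c in range(d):
                cov[r][c] += diff[r] * diff[c] / n
    return cov

def dominant_mode(cov, iterations=200):
    """Power iteration: returns (largest eigenvalue, unit eigenvector)."""
    d = len(cov)
    # Non-uniform start vector; it must not be orthogonal to the dominant mode.
    v = [float(k + 1) for k in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[r][c] * v[c] for c in range(d)) for r in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(v[r] * sum(cov[r][c] * v[c] for c in range(d)) for r in range(d))
    return eigenvalue, v
```

With the eigenvalue `lam` returned by `dominant_mode`, the weight limits for that mode would be `-3 * math.sqrt(lam)` and `3 * math.sqrt(lam)`.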

Learning the Distribution of the Pixel Values.
In this paper, selection criteria for the control points are required to transition to a new shape. For this purpose, we use the distribution of the pixel values around each control point. Figure 6 shows how to extract the pixel distribution. In Figure 6, we set the line perpendicular to the tangent line at each control point (blue line) as the area for extracting the pixel distribution values.
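Equation (5) expresses the pixel distribution of each control point as a vector. In (5), $i$ and $j$ are the indexes of the learning data and control points, respectively, and $k$ is set to 5:
$$\mathbf{g}_{ij} = (g_{ij,0}, g_{ij,1}, \ldots, g_{ij,2k})^T. \tag{5}$$
Equation (6) is the derivative of (5). Thus, $d\mathbf{g}_{ij}$ is a vector of dimension $2k$. Equation (7) normalizes $d\mathbf{g}_{ij}$. By differentiating and normalizing in this way, we can minimize the sensitivity to image changes:
$$d\mathbf{g}_{ij} = (g_{ij,1} - g_{ij,0}, \ldots, g_{ij,2k} - g_{ij,2k-1})^T, \tag{6}$$
$$\mathbf{y}_{ij} = \frac{d\mathbf{g}_{ij}}{\sum_{m=1}^{2k} |dg_{ij,m}|}. \tag{7}$$
Equation (8) gives the covariance matrix of the pixel value distribution for each control point, which is used to compute the Mahalanobis distance to new control points when a transition to a new shape is made in the search process. Here, $\bar{\mathbf{y}}_j$ is the average vector of the $j$th control point, $i$ and $j$ denote the indexes of the learning data and control points, respectively, and $N$ indicates the number of learning data:
$$\mathbf{S}_j = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{y}_{ij} - \bar{\mathbf{y}}_j)(\mathbf{y}_{ij} - \bar{\mathbf{y}}_j)^T. \tag{8}$$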
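The differentiation and normalization of a sampled profile can be sketched as follows. The function name is hypothetical, and dividing by the sum of absolute derivative values is an assumption taken from the usual ASM formulation, since the paper's exact normalization is not recoverable here.

```python
def normalized_derivative_profile(samples):
    """Turn 2k+1 intensities sampled along the normal into 2k normalized differences."""
    derivative = [samples[m + 1] - samples[m] for m in range(len(samples) - 1)]
    total = sum(abs(d) for d in derivative)
    if total == 0:
        # A flat profile carries no edge information.
        return [0.0] * len(derivative)
    return [d / total for d in derivative]
```

Because only differences are kept and then rescaled, adding a constant to all samples or multiplying them by a positive factor leaves the result unchanged, which is the insensitivity to image changes mentioned above.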

Selecting the Initial Position of the ASM
After learning the shapes and computing the distribution of the pixel values, we conduct a search process in which the initial shape and position of the ASM must be set up.
Particularly, the initial position of the ASM is very significant because a search area is defined not globally but locally and, thus, a malfunction occurs if there is no solution in the defined area.In this paper, we use the location of a woman's nipple as the initial position of the ASM.
To effectively extract the nipple areas, we first apply the Canny edge algorithm [19] to an input image and then apply the morphological dilation operation [20] to the extracted edge image. Figure 7(a) is an image obtained through Canny edge detection, where there are not many edges near the nipple areas. Figure 7(b) shows an image produced by applying the dilation operation to the edge image. Figure 7(c) is the labeled image of the dilated image, where we can see that the labeled nipple areas have discriminative features. In this paper, to detect real nipple areas among the labeled areas, we use two main features of each region: the density and the elongatedness. Equation (9) denotes the density of each region, where $w(\mathrm{MER}_i)$ and $h(\mathrm{MER}_i)$ are the width and height of its minimum enclosing rectangle (MER), respectively. $n_i$ is the number of pixels contained in the $i$th labeled region, and $\mathrm{MER}_i$ denotes its MER. Equation (10) represents the elongatedness of each region. Usually, a nipple area has a high density, and its elongatedness is close to 1:
$$\text{density}_i = \frac{n_i}{w(\mathrm{MER}_i) \times h(\mathrm{MER}_i)}, \tag{9}$$
$$\text{elongatedness}_i = \frac{w(\mathrm{MER}_i)}{h(\mathrm{MER}_i)}. \tag{10}$$
After obtaining the nipple positions, we extract the candidate breast lines according to the following process.
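The two region features can be computed as in the sketch below. It assumes that the MER is the axis-aligned bounding box of the region and that the elongatedness is the width-to-height ratio; the function name is hypothetical.

```python
def region_features(pixels):
    """Density and elongatedness of one labeled region given as (row, col) pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1   # h(MER), axis-aligned assumption
    width = max(cols) - min(cols) + 1    # w(MER), axis-aligned assumption
    density = len(pixels) / (width * height)
    elongatedness = width / height
    return density, elongatedness
```

A compact, roughly round blob yields a density near 1 and an elongatedness near 1, whereas a thin edge fragment yields an elongatedness far from 1.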

Algorithm of Extracting Candidate Breast Line
(1) Search radially at intervals of 10 degrees from the position of a given nipple area.
(2) Store the locations of the first edges whose distance from the center of the nipple is higher than a predefined threshold.
(3) Obtain the longest distance among the distances between each edge point and its neighboring point in the clockwise direction.
(4) If the relative distance between each edge point and its neighboring point is 0.3 times less than the longest distance, [...]
Figure 8 shows the results of the above algorithm. Figure 8(a) shows the result of performing a radial search based on the location of a nipple area. Figure 8(b) illustrates a detected breast line.
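Steps (1) and (2) of the algorithm above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the edge image is a row-major grid of 0/1 values, rays are walked in one-pixel increments, and the function and parameter names are hypothetical. Steps (3) and (4) are not covered.

```python
import math

def radial_first_edges(edges, center, min_distance, step_deg=10):
    """For each ray from `center` (row, col), return the first edge pixel farther
    away than `min_distance`, or None if the ray leaves the image first."""
    height, width = len(edges), len(edges[0])
    hits = []
    for angle in range(0, 360, step_deg):
        theta = math.radians(angle)
        r = min_distance + 1  # start just beyond the threshold distance
        found = None
        while True:
            row = int(round(center[0] + r * math.sin(theta)))
            col = int(round(center[1] + r * math.cos(theta)))
            if not (0 <= row < height and 0 <= col < width):
                break
            if edges[row][col]:
                found = (row, col)
                break
            r += 1
        hits.append(found)
    return hits
```

The returned list has one entry per direction, so neighboring entries correspond to neighboring rays, which is the ordering that the later steps compare.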
We then determine the initial position of the ASM. For this purpose, its scale and rotation parameters are required. For the scale value, we utilize the distance between the starting and ending positions of a breast line. Figure 9 shows how to obtain the rotation parameter. As seen in Figure 9, the rotation parameter of the ASM is calculated based on the midpoint between the starting and ending points of the breast line.
Equation (11) gives the rotation value, where $dx$ and $dy$ are the $x$ and $y$ displacements from the midpoint to the starting point, respectively:
$$\theta = \tan^{-1}\left(\frac{dy}{dx}\right). \tag{11}$$
The initial position of the ASM is obtained by using the center of gravity of all the points that are contained in the candidate breast line. The average shape obtained in the learning phase is used as the initial shape of the ASM.
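The three positional parameters described above can be derived from a candidate breast line as in this sketch. The function name is hypothetical, and `math.atan2` is used instead of a plain arctangent of the ratio so that the quadrant is preserved and a zero horizontal displacement does not cause a division by zero; that choice is an assumption of this example.

```python
import math

def initial_pose(line_points):
    """Scale, rotation, and center for an ordered list of (x, y) breast-line points."""
    (x0, y0), (x1, y1) = line_points[0], line_points[-1]
    # Scale: distance between the starting and ending points.
    scale = math.hypot(x1 - x0, y1 - y0)
    # Rotation: direction from the midpoint to the starting point.
    mid_x, mid_y = (x0 + x1) / 2, (y0 + y1) / 2
    rotation = math.atan2(y0 - mid_y, x0 - mid_x)
    # Position: center of gravity of all points on the line.
    n = len(line_points)
    center = (sum(x for x, _ in line_points) / n,
              sum(y for _, y in line_points) / n)
    return scale, rotation, center
```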

Searching a Target Object with ASM
Figure 11 shows the result of placing the average shape of the ASM on an image by using the positional parameters. There are two main ways of transitioning to a new shape in the search process. One method uses the magnitude of the edges, and the other uses the distribution of the pixel values [21]. In this paper, we use both methods.
To perform the search process, the search area of each control point is first defined. We then compute the distribution of the pixel values for all of the candidate locations in the area and extract the Mahalanobis distance between the computed distribution and the previously learned distribution. The search area extends 11 pixels in both the inward and outward directions along the line perpendicular to the tangent line at a control point. In general, the size of the search area has a significant impact on the search performance. If the search area is too large, malfunctions in the search process tend to occur because there are many possible transitions to new shapes. In contrast, if the search area is too small, good performance is achieved only when the initial position is very accurate. Otherwise, many iterations are required to find the final shape accurately, or the target shape might not be found at all.
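The Mahalanobis distance is defined as in (12), where $\mathbf{S}_j^{-1}$ is the inverse of the covariance matrix obtained in Section 2, $\mathbf{y}_j$ is a vector that represents the distribution of the pixel values for the $j$th control point, and $\bar{\mathbf{y}}_j$ is the average of $\mathbf{y}_j$:
$$d(\mathbf{y}_j) = (\mathbf{y}_j - \bar{\mathbf{y}}_j)^T \mathbf{S}_j^{-1} (\mathbf{y}_j - \bar{\mathbf{y}}_j). \tag{12}$$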
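A dependency-free sketch of this distance is given below. The helper names are hypothetical; instead of forming the inverse covariance matrix explicitly, the example solves a linear system by Gauss-Jordan elimination, which gives the same value. It returns the quadratic form without a square root, which does not change the ordering of the candidates.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def mahalanobis(profile, mean, cov):
    """Quadratic form (profile - mean)^T cov^{-1} (profile - mean)."""
    diff = [p - m for p, m in zip(profile, mean)]
    weighted = solve(cov, diff)  # equals cov^{-1} * diff
    return sum(d * w for d, w in zip(diff, weighted))
```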
After extracting the Mahalanobis distance at the candidate locations for all of the control points, we record the presence or absence of an edge at each candidate location.
We then sort the candidate locations in the search area in ascending order of Mahalanobis distance. In this paper, a control point transitions to the candidate location that contains an edge and has the smallest Mahalanobis distance among such locations. If no edge exists at any of the candidate locations, the transition is made to the candidate location with the smallest Mahalanobis distance. Figure 12 illustrates the search process of the ASM. The blue line in the right image indicates the search area, and the black square is the control point at the current time. The red square shows the candidate location found by the search process, which has the smallest Mahalanobis distance within the search area and contains an edge:
$$\mathbf{b} = \mathbf{P}^T(\mathbf{x}' - \bar{\mathbf{x}}). \tag{13}$$
Equation (13) checks whether a new shape is acceptable. $\mathbf{x}'$ is the new shape obtained by using the distribution of the pixel values, and $\bar{\mathbf{x}}$ is the normalized average shape of the learning shapes $\mathbf{x}_i$. $\mathbf{P}^T$ is the transpose of the eigenvector matrix. This equation is obtained by rearranging (4) for $\mathbf{b}$, which tells us how far the new shape is from the average shape.
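The transition rule for a single control point can be sketched as follows. The tuple layout and the function name are assumptions made for this example.

```python
def choose_candidate(candidates):
    """Pick the next location for one control point.

    candidates: list of (position, mahalanobis_distance, has_edge) tuples
    for every location in the search area.
    """
    ranked = sorted(candidates, key=lambda c: c[1])  # ascending distance
    for position, _, has_edge in ranked:
        if has_edge:
            return position  # closest candidate that lies on an edge
    return ranked[0][0]      # no edge anywhere: fall back to the closest one
```

Keeping the sorted list around would also make it straightforward to move on to the next candidate when a resulting shape is rejected.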
In this paper, when each weight $b_k$ is within the range of $-3\sqrt{\lambda_k}$ to $3\sqrt{\lambda_k}$, which are the threshold values computed in Section 2, we decide that the new shape is acceptable. If the new shape is not acceptable, a new shape is reorganized by using the next candidate. When we obtain an acceptable shape by repeating the above process, we define a search area for the obtained shape again and search for a new shape. This process is repeated until the shape no longer changes.
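The acceptability test can be illustrated with the sketch below, which projects a shape onto the retained eigenvectors and compares each weight with three standard deviations of its mode. The function names are hypothetical, and the eigenvectors are assumed to be given as a list of unit vectors, one per retained mode.

```python
import math

def shape_weights(shape, mean, eigenvectors):
    """Project (shape - mean) onto each eigenvector to obtain the weights b_k."""
    diff = [s - m for s, m in zip(shape, mean)]
    return [sum(p * d for p, d in zip(vec, diff)) for vec in eigenvectors]

def is_acceptable(weights, eigenvalues, limit=3.0):
    """True when every weight lies within +/- limit * sqrt(eigenvalue)."""
    return all(abs(b) <= limit * math.sqrt(lam)
               for b, lam in zip(weights, eigenvalues))
```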
To determine a real breast area, we use the distance between the candidate breast line and the final converged shape. Figure 13 shows an example of detecting breasts, where the blue lines represent the search direction when obtaining the distance between a candidate breast line and each control point. Figure 13(a) shows an image that is determined to contain a breast, where the average distance between the candidate breast line and the converged control points is small. In contrast, the distance is relatively large in Figure 13(b). Thus, the decision is made that the image does not contain a breast. Consider
$$x = x_j + r\cos\theta, \qquad y = y_j + r\sin\theta. \tag{14}$$
Equation (14) represents how far the candidate line lies in the direction of the search area, as seen in Figure 13. $\theta$ is the direction of the search space, and $r$ is the relative distance from a control point. Here, $x_j$ and $y_j$ are the coordinates of the $j$th control point. In other words, we compute the $x$ and $y$ coordinates by increasing or decreasing $r$ and determine whether the candidate breast line exists in the stored data. We then calculate the distances between the control points and the breast line and their average. If the candidate breast line does not exist in the search area, the control point is excluded from the calculation of the average. In addition, if the breast line exists in both the inner and outer directions, the shorter distance is used in computing the average. Finally, if the average distance is within a threshold value, we decide that a breast is detected.
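The final decision rule can be sketched as follows. The input layout, the function names, and the handling of the case in which no control point finds the line are assumptions of this example; the threshold value is not given in the text and is left as a parameter.

```python
def average_line_distance(distances):
    """Average distance from control points to the candidate breast line.

    distances: one (inner, outer) pair per control point, with None in a
    direction where the line was not found inside the search area.
    """
    kept = []
    for inner, outer in distances:
        found = [d for d in (inner, outer) if d is not None]
        if found:
            kept.append(min(found))  # use the shorter of the two directions
        # control points with no line in either direction are excluded
    if not kept:
        return None
    return sum(kept) / len(kept)

def is_breast(distances, threshold):
    """Accept the converged shape when the average distance is within the threshold."""
    average = average_line_distance(distances)
    return average is not None and average <= threshold
```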

Experimental Results
The proposed harmful image detection algorithm was implemented in Microsoft Visual C++ 2008 and tested in Windows 7 on an Intel Core2 Quad 2.66 GHz processor with 4 GB of memory. Our ASM is generated by using a total of 60 learning data.
Figure 14 shows examples of detecting breast regions. The left images are regions in which we detect candidate breast lines and then place our ASM at the initial position. The right images show the results that were obtained after searching for the breast areas repeatedly. We see that the initial location of our ASM changes depending on the accuracy of a candidate breast line.
Figure 15 shows the results of detecting breast areas inaccurately. The upper-left image has many edges around the nipple, and thus the edges for the candidate breast line are inaccurately selected. For the upper-right image, the edges are not extracted near the breast line. The lower images show the results after the search process of the ASM has finished. As observed in the figures, the suggested algorithm determines that the detected area is not a breast because the distance between the candidate breast line and the detected line is beyond the predefined threshold.
Figure 16 shows some of the results of applying the ASM to areas other than a breast, such as a navel. As seen in Figure 16, the distance between the candidate breast line and the control points is relatively large compared to Figure 14. Therefore, we decide that it is not a breast area.
Figure 17 shows the accuracy of the suggested detection algorithm on a total of 120 adult images obtained from the Internet. As shown in Figure 17, the input adult images are categorized into two types: whole-body images and upper-body images. The nonadult images are images captured in natural environments where no particular constraints are imposed. Equation (15) shows how the accuracy of the suggested algorithm is calculated:
$$\text{accuracy} = \frac{\text{number of correctly detected images}}{\text{number of whole input images}} \times 100\ (\%). \tag{15}$$
For the upper-body images, the distribution of the pixel values of breast lines changes slowly because breast areas are relatively large compared to the size of the image. Therefore, the detection accuracy for upper-body images is lower than that for the whole-body images.

Conclusions
In this paper, we have proposed a new method for effectively detecting a woman's breasts using an ASM. Our method first learns the shape of a breast line through principal component analysis and alignment, as well as the distribution of the

Figure 1: Overall flow of the proposed method.

Figure 2: Setting the control points.

Figure 5: Shape change depending on the eigenvalues.

Figure 10 shows the overall flow of the search process of the ASM. In this paper, we can initially locate the ASM because all of the positional parameters have been obtained.

Figure 9: Setting the initial location of ASM.

International Journal of Distributed Sensor Networks