Skew correction for Chinese character using Hough transform

Chinese handwritten character recognition is an emerging field in computer vision and pattern recognition. Documents acquired through scanners, mobile devices, or cameras are often skewed, and correcting that skew is a major task and an important factor in optical character recognition. The goal of this work is to correct the skew of such documents. In this paper we propose a novel method for skew correction using the Hough transform. The proposed approach can detect skew over a large range of angles (-90 to +90 degrees) with high precision, and the experimental results indicate that it is efficient compared with well-known existing methods.


I. INTRODUCTION
Off-line Chinese character recognition is a very important branch of pattern recognition. Chinese is the most widely spoken language. With the fast development of China, most families now have a computer and an internet connection, and online libraries and data searching have become a common part of daily life. Importing documents and data has therefore become a major problem. Manual data input takes too much time, so improving the accuracy and speed of Chinese OCR makes this work far more efficient. Recognition of Chinese characters is considered a hard problem because of the large number of categories, the complex structure, and the widely variable and often similar shapes of the characters [1]. Generally, a Chinese OCR system consists of preprocessing, feature extraction, recognition, and post-processing (see Fig. 1). An effective preprocessing method is an important factor in ensuring the recognition rate of the whole OCR system. Preprocessing includes text layout segmentation, line segmentation, and character segmentation. A skew angle introduced while scanning or writing makes the characters in the input image skewed, and skewed characters seriously affect the recognition results. Therefore, character skew detection and correction should be done before recognition (Fig. 2). Frequently used skew detection methods include Hough, Robust, MST clustering, KNN, and others [1]-[10].

In this paper, experiments are performed using the Hough transform to detect and correct skewed character images. The Hough technique is particularly useful for computing a global description of a feature given local measurements. The motivating idea behind the Hough technique for line detection is that each input measurement indicates its contribution to a globally consistent solution. When the parameter space has no more than two dimensions, this kind of transformation works well. Because of its excellent characteristics, such as insensitivity to local defects, robustness to random noise, and suitability for parallel processing, the Hough transform is widely used in image processing, pattern recognition, and computer vision.

II. SKEW ANGLE DETECTION
The Hough transform (HT) is a technique that can be used to isolate features of a particular shape within an image. The classical Hough transform is most commonly used for the detection of regular curves such as lines, circles, and ellipses. At present, HT is widely used in image analysis, computer vision, and pattern recognition [2], where it has become a standard tool. Peak detection in the HT parameter space is a cluster detection problem, and the selection of the threshold value is the key to success. There are two approaches to peak detection. One weights the image space to change the peak distribution of the parameter space. The other searches directly for the maximum value in the parameter space.

A. Preprocessing
The Hough transform algorithm is mainly applied to binary images (i.e. edge images). Therefore gray image preprocessing, such as image filtering and edge detection, should be done before the Hough transformation. Preprocessing is the most important preliminary work for the Hough transformation, and its result directly influences the skew result. Gaussian noise and impulse noise are two widely known, common kinds of image noise. In this paper, a method based on multiple median extractions bilateral filtering is used. This method takes into account both spatial neighborhood correlation and pixel intensity similarity. By using a reference pixel value selected by a pseudo-median filter, it effectively preserves the edges of a test image that contains Gaussian noise and impulse noise. Compared with earlier noise filtering methods, which target only one kind of noise, multiple median extractions bilateral filtering can process images with mixed noise and gives a good filtering result. Using iterative bilateral filtering in place of the Gaussian filtering and adaptive filtering stages of the Canny operator avoids the edge blurring caused by filtering and gives a better edge detection result.
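As a rough illustration of combining a median reference value with bilateral weighting, the sketch below filters a gray image stored as a list of rows. It is not the authors' multiple median extractions filter: it is a simplified single-pass variant in pure Python, and the function names (`median3`, `median_bilateral`) and parameters (`radius`, `sigma_s`, `sigma_r`) are assumptions chosen for illustration.

```python
import math


def median3(img, x, y):
    """Median of the 3x3 window around (x, y); borders are clamped."""
    h, w = len(img), len(img[0])
    window = sorted(
        img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    )
    return window[4]


def median_bilateral(img, radius=1, sigma_s=1.0, sigma_r=25.0):
    """Bilateral filter whose range weight is measured against the local
    median (assumed stand-in for the pseudo-median reference value)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ref = median3(img, x, y)
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    v = img[yy][xx]
                    # Spatial closeness times intensity similarity to ref.
                    spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    similar = math.exp(-((v - ref) ** 2) / (2 * sigma_r ** 2))
                    num += spatial * similar * v
                    den += spatial * similar
            out[y][x] = num / den
    return out


# A flat image with one impulse, and an image with a vertical step edge.
impulse = [[10] * 5 for _ in range(5)]
impulse[2][2] = 255
step = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
smoothed = median_bilateral(impulse)
edge = median_bilateral(step)
```

Because the range weight is measured against the median rather than the centre pixel, an isolated impulse receives almost no weight, while pixels on the far side of a step edge are also suppressed, so the edge stays sharp.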

B. Hough transform algorithm
The standard HT detects straight lines by voting. Take an N × N image as an example, with a straight line written in the normal form

$$\rho = x\cos\theta + y\sin\theta \qquad (1)$$

First, the $\theta$ space is discretized into M parameter sectors, and every non-zero point (x, y) of the image space is assumed to possibly belong to each $\theta$ sector; in other words, it may belong to any straight line. For each $\theta$ sector, the value of $\rho$ is calculated according to equation (1) and falls into the corresponding $\rho$ sector. Finally, the number of votes in each $\rho$ sector (the length of the straight line) is checked; if it exceeds the set threshold, the sector is considered to be a straight line.
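The voting procedure can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hough_lines`, the one-pixel $\rho$ sectors, and the default of 180 $\theta$ sectors are all choices made for this example.

```python
import math


def hough_lines(points, size, n_theta=180, threshold=5):
    """Vote in (rho, theta) space for the non-zero points of a
    size x size binary image and return sectors above the threshold."""
    max_rho = int(math.ceil(math.hypot(size, size)))
    # rho can be negative, so row indices are offset by max_rho.
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)  # equation (1)
            acc[int(round(rho)) + max_rho][k] += 1
    lines = []
    for r, row in enumerate(acc):
        for k, votes in enumerate(row):
            if votes > threshold:
                # (rho in pixels, theta in degrees, number of votes)
                lines.append((r - max_rho, 180.0 * k / n_theta, votes))
    return lines


# Ten points on the horizontal line y = 5.
horizontal = [(x, 5) for x in range(10)]
detected = hough_lines(horizontal, size=10, threshold=8)
best_votes = max(v for _, _, v in detected)
```

For a short segment, several neighbouring $\theta$ sectors collect the same number of votes, which is one reason peak detection in the parameter space is treated as a clustering problem.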

C. Algorithm implementation
The concrete steps of the algorithm are as follows:
a) Search the image in ascending order until a non-zero point P is found, and take this point as the seed point.
b) Taking P as the origin, select an m × m zone $A_p$ and search for the non-zero points in $A_p$; then, according to equation (1), calculate the straight-line parameter pair $(\rho_i, \theta_i)$ of every non-zero point $p_i$ and P.
c) Let the deviation range of $\rho$ be $\Delta\rho$ and the deviation range of $\theta$ be $\Delta\theta$. Vote in $A_p$, counting the number $n_i$ of parameter pairs that fall into every parameter interval. Find the parameter interval with the maximum vote $n_{\max}$ in $A_p$ and average its $(\rho_i, \theta_i)$ pairs, to reduce the influence of quantization error, obtaining $(\bar{\rho}, \bar{\theta})$; set this as the parameter of the straight line that passes through P in $A_p$.
d) Let $T_1$ be a length threshold for $A_p$. If $n_{\max} > T_1$, go to step e) and search for the other points that belong to the line. Otherwise, no line passes through P: set the gray value of P to zero, return to step a), and restart the search.
e) Extend the search zone to the entire image. According to equation (1), calculate the value $\rho$ of each non-zero point that has been found; if $\rho$ lies within the deviation range $\Delta\rho$ of $\bar{\rho}$, the point belongs to this line, so increment the vote count and set the gray value of the point to zero.
f) After the entire image has been searched, if the vote count of the line is larger than the preset threshold $T_2$, the line exists: take its parameters and put them into the set line(n), where n is the number of lines detected. Otherwise the line does not exist.
g) Set the gray value of P to zero, return to step a), and restart the search, until there is no non-zero point left in the image.
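Steps b) to e) above can be sketched as follows. This is an illustrative reading of the steps rather than the authors' code: the helper names (`pair_params`, `local_line`, `collect_line`), the fixed-width binning used for voting, and the membership test $|\rho - \bar{\rho}| \le \Delta\rho$ in `collect_line` are assumptions, since the source does not spell these details out.

```python
import math
from collections import defaultdict


def pair_params(p, q):
    """(rho, theta) of the line through p and q, with theta in [0, pi)."""
    (x0, y0), (x1, y1) = p, q
    # The line normal is perpendicular to the direction (x1-x0, y1-y0).
    theta = math.atan2(x1 - x0, -(y1 - y0)) % math.pi
    rho = x0 * math.cos(theta) + y0 * math.sin(theta)  # equation (1)
    return rho, theta


def local_line(seed, points, m=7, d_rho=1.0, d_theta=math.radians(2), t1=3):
    """Steps b) to d): vote inside the m x m zone around the seed and
    return (rho_bar, theta_bar, n_max), or None if n_max <= t1."""
    half = m // 2
    bins = defaultdict(list)
    for q in points:
        inside = abs(q[0] - seed[0]) <= half and abs(q[1] - seed[1]) <= half
        if q != seed and inside:
            rho, theta = pair_params(seed, q)
            key = (int(rho // d_rho), int(theta // d_theta))
            bins[key].append((rho, theta))
    if not bins:
        return None
    best = max(bins.values(), key=len)
    if len(best) <= t1:
        return None
    # Averaging the winning interval reduces quantization error.
    rho_bar = sum(r for r, _ in best) / len(best)
    theta_bar = sum(t for _, t in best) / len(best)
    return rho_bar, theta_bar, len(best)


def collect_line(points, rho_bar, theta_bar, d_rho=1.0):
    """Step e): points of the whole image lying within d_rho of the line."""
    c, s = math.cos(theta_bar), math.sin(theta_bar)
    return [(x, y) for x, y in points if abs(x * c + y * s - rho_bar) <= d_rho]


# A horizontal line y = 5 plus one noise point.
pts = [(x, 5) for x in range(10)] + [(3, 1)]
rho_bar, theta_bar, n_max = local_line((4, 5), pts)
members = collect_line(pts, rho_bar, theta_bar)
```

Fixed-width bins can split votes that straddle a bin boundary; an implementation meant for real images would likely merge neighbouring intervals before picking the maximum.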

III. CHARACTER IMAGE HOUGH TRANSFORMATION
Skew image correction is a coordinate transformation. Let (x, y) be the coordinates of a point P; rotating the image by θ degrees gives the image P', in which the coordinates (x, y) become (x', y'). The value of θ is taken as positive for clockwise rotation and negative for counterclockwise rotation, and (x', y') is obtained from (x, y) by a rotation through θ.

The image is formed of pixels, which are discrete, so images are transformed as follows. For horizontally scanned or captured images, the angle of P is defined in terms of rectwidth, the selected width, both when the pixel steps run upward and when they run downward. For vertically scanned or captured images, the angle of P is defined in terms of rectheight, the selected height, both when the pixel steps run rightward and when they run leftward. For skew correction, clockwise-skewed images are taken as the example. When the skew angle is determined, the lines are represented by sub-pixel steps; the skew correction can therefore be implemented by decreasing or increasing the corresponding offset of each pixel, using a horizontal and a vertical skew correction function.

In the horizontal function, rectwidth stands for the width of the rectangle (rect), and INT() is a function that returns the closest integer value. The skew angle is positive when the step is upward and negative when it is downward. In the vertical function, rectheight stands for the height of the rectangle, and INT() again returns the closest integer value. The skew angle is positive when the step is rightward and negative when it is leftward.

Generally, the new image after the transformation is larger than the original image. When the skew angle is positive, the original image pixel at column i and row j corresponds to column (i + Y_shift × j) and row (j - X_shift × i + X_shift × old_height) of the new image. When the skew angle is negative, the original image pixel at column i and row j corresponds to column (i - Y_shift × j + Y_shift × old_width) and row (j + X_shift × i) of the new image.

In summary, only a few works have been reported on skew correction for scanner-captured and camera-captured documents. In our work we have corrected skew using the Hough transform. Experiments have been performed on documents without graphics and also on noisy images. The experimental results indicate that the method works well for documents both with and without graphics, and that the proposed method is therefore efficient, novel, and accurate for skewed documents. In the case of noisy images, however, the performance of the proposed method degrades as the noise density increases.
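The coordinate transformation of Section III can be illustrated with a plain rotation. The sketch below is not the paper's pixel-shift functions (their exact form is not reproduced in the text): it swaps in a standard rotation about the image centre with nearest-neighbour sampling, assuming image coordinates with y pointing down, so that a positive angle rotates clockwise as in the convention above. The names `rotated_size` and `rotate_image` are hypothetical.

```python
import math


def rotated_size(width, height, theta_deg):
    """Bounding-box size of a width x height image rotated by theta_deg."""
    t = math.radians(theta_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    # round() guards against float noise such as cos(90 deg) = 6e-17.
    new_w = math.ceil(round(width * c + height * s, 9))
    new_h = math.ceil(round(width * s + height * c, 9))
    return new_w, new_h


def rotate_image(img, theta_deg, background=0):
    """Rotate img (a list of rows) clockwise by theta_deg about its centre."""
    h, w = len(img), len(img[0])
    new_w, new_h = rotated_size(w, h, theta_deg)
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    ncx, ncy = (new_w - 1) / 2.0, (new_h - 1) / 2.0
    out = [[background] * new_w for _ in range(new_h)]
    for yn in range(new_h):
        for xn in range(new_w):
            dx, dy = xn - ncx, yn - ncy
            # Inverse rotation: find the source pixel for each output pixel,
            # so that the output has no holes.
            xs = c * dx + s * dy + cx
            ys = -s * dx + c * dy + cy
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:
                out[yn][xn] = img[yi][xi]
    return out


small = [[1, 2, 3],
         [4, 5, 6]]
unchanged = rotate_image(small, 0)
quarter_turn = rotate_image(small, 90)
```

To deskew a document whose detected skew angle is θ, the image would be rotated by -θ. The bounding-box computation also shows why the corrected image is generally larger than the original.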

Figure 2: Preprocessing of Chinese OCR

Figure 6: Camera Captured Handwritten Chinese Numeral

IV. EXPERIMENTAL RESULT AND ANALYSIS
To evaluate our proposed method, we compare scanned and camera-captured handwritten Chinese characters and numerals. Both datasets consist of documents without tables, shown in Fig. 3 to Fig. 6. The proposed approach can detect skew over a large range of angles (-90 to +90 degrees) with high precision. Table 1 gives the experimental results obtained with our proposed method; they indicate that it is efficient compared with well-known existing methods. The results show that the Hough transform performs well for skew correction of noisy images that do not include tables. Fig. 3 and Fig. 5 are scanned images that were skewed during scanning; according to the experimental results, using the Hough transform to correct the skewed documents is efficient and exact. Fig. 4 and Fig. 6 are camera-captured images whose skew angle is caused by the capturing angle of the camera, and the experiment gives a good result for these as well. Therefore, after document image preprocessing, the proposed method improves the efficiency of image skew correction. The Hough transform offers high noise immunity and adaptability.

Table 1: Four skew angles experiments