Character Recognition from Virtual Scenes and Vehicle License Plates Using Genetic Algorithms and Neural Networks

Character recognition remains a vital research area, mainly because of its applications to human-machine and machine-machine communication. One example application that needs this technology is vehicle number plate recognition. With millions of vehicles on the roads today, human resources alone are insufficient for recognizing, tracking or controlling their movements. Another area of character recognition is virtual scenes. In such scenes, the characters are written in the air by hand and captured using a low-cost USB camera placed in front of a subject. Such characters are termed "Air Characters" in this work.


Introduction
Character recognition remains a vital research area, mainly because of its applications to human-machine and machine-machine communication. One example application that needs this technology is vehicle number plate recognition. With millions of vehicles on the roads today, human resources alone are insufficient for recognizing, tracking or controlling their movements. Another area of character recognition is virtual scenes. In such scenes, the characters are written in the air by hand and captured using a low-cost USB camera placed in front of a subject. Such characters are termed "Air Characters" in this work.
In this chapter, we present a character recognition method for virtual scenes (air characters) and vehicle number plate recognition using neural networks and evolutionary computation. We combine neural network learning, image processing and template matching to create a novel character recognition system. To speed up the system and deal with size and orientation issues, we employ a genetic algorithm. Furthermore, to control the size of both the neural network inputs and the template, we also apply a genetic algorithm to guide the search.
Fortunately, many useful technologies for the automatic detection and recognition of characters have already been proposed. Vehicle license plate detection and recognition is widely researched in many countries because of the many applications that benefit from it, ranging from traffic control and crime prevention to automatic parking authentication systems. Recognition of air characters will open new areas in human-machine interfaces, especially in replacing TV remote control devices and enabling non-verbal communication. Three steps are necessary in such systems: the size- and orientation-invariant segmentation of the characters; the normalization of other factors such as brightness, contrast and illumination; and the recognition of the characters themselves.
In [1] we proposed a robust license plate recognition method which recognized characters using a combination of neural networks, template matching and genetic algorithms. In this work, we improve the system by introducing the bilateral filter for noise reduction. Today there are many OCR devices in use based on a plethora of different algorithms [21]. Examples include a wavelet transform based method for extracting license plates from cluttered images, achieving 92.4% accuracy [2], and a morphology-based method for detecting license plates from cluttered images with a detection accuracy of 98% [3]. The Hough transform combined with other preprocessing methods is used by [4,5]. In [6] an efficient object detection method is proposed. More recently, license plate recognition from low-quality videos using morphological operations and the AdaBoost algorithm was proposed by [7]. It uses the Haar-like features proposed by [8] for face detection. Furthermore, in our earlier work [9], we proposed a license plate detection algorithm using genetic algorithms. All of the popular algorithms achieve high accuracy, and most achieve high speed, but many still suffer from a fairly simple flaw: misrecognition that looks very unnatural from a human point of view, such as mistaking a "5" for an "S" or a "B" for an "8".
In this work, we extend and largely improve the work in [1] to also recognize the characters using more features and a hybrid system consisting of neural networks and template matching. The genetic algorithm used in [1] has been redesigned to improve its detection accuracy. We extract bifurcation, end and corner points from candidate characters and use a hybrid system made of neural networks, template matching and genetic algorithms to solve such misrecognition problems. We also test this method on air characters, which shows that the method is effective for different types of characters. However, the results are affected by character segmentation, whose results are not yet perfect.
The effectiveness of various types of neural networks in solving a variety of problems has recently been shown in [10] for partially connected neural networks (PCNN), [11] for recurrent neural networks (RNN) and [12] for perceptron neural networks. This adds confidence to the use of neural networks in learning problems. Character recognition using neural networks to determine a threshold is proposed by [13]. However, since character shearing is not handled exhaustively, the accuracy is not high enough. Although a lot of work has been done in this area, as far as we can tell, our proposed use of genetic algorithms and artificial templates has not been used anywhere else.

License plates
This work uses license plate images of vehicles taken near a parking lot, with the target vehicle coming towards the camera and then turning towards the right or left [1].
The license plates are divided into several categories based on the colors and the arrangement of the characters on the plates. For private vehicles, the plates have a black background and white characters; for taxis it is the exact opposite, a white background with black characters; and there are a variety of other kinds based on special regions, etc. Moreover, there are single and double row plates. The characters on the plates are all alphanumeric (all upper-case). However, the letters I, O and Z are not used in the license plates.
In the database used in our experiments, there are 6444 images of 46 cars, each captured in a different number of frames. Each of the images is 320×240 pixels. In some of the images, there are no vehicles and hence no license plates to detect. Although there are single and double row license plates, the background color of the plates is either black or white. Moreover, the number of characters on the plates may differ. Therefore, the length of the license plates also differs. A sample of these plates is shown in Fig. 1.
The details of the extraction of the license plate locations from the database and of the character segmentation methods adopted in this work can be found in [1].

Air characters
The database for the air characters is created using images captured in our laboratory. A USB camera is placed on top of a computer display in front of a subject. In this experiment, the subject is asked to use the right hand's pointing finger to write characters in the air in front of the camera. The preprocessing steps required include the detection of the face region, to make sure that it does not interfere with the character segmentation, because both regions are characterised by a similar skin colour. The tip of the finger is extracted and then tracked to trace the character. The process is as follows:
• An open palm in front of the camera indicates the home position. This is important to disable unnecessary detections at the beginning.
• Extract the face region to make sure the skin pixels inside it don't interfere with the tracking.
• If only the pointing finger is visible, start tracing the movement.
• To end a character, pause for about 1 second.
The detection of the face can be accomplished using several methods. In this work, we use the method based on neural networks and genetic algorithms proposed in [20].
The segmentation of the palm region uses colour, based on the HSV colour space and dynamic thresholds. Colour-based extraction is chosen because it is fast to process and invariant to rotation. The conversion from the RGB colour space to the HSV colour space can be performed using the following expressions.
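The conversion expressions themselves are not reproduced in this text. As a rough illustration only, the widely used textbook RGB-to-HSV conversion can be sketched as follows. The function name `rgb_to_hsv` and the output ranges (hue in degrees, saturation and value in [0, 1]) are assumptions made for this sketch and may differ from the scaling used in the original expressions.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (H in degrees [0, 360), S in [0, 1], V in [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    v = max(r, g, b)                 # value: the brightest channel
    delta = v - min(r, g, b)         # chroma
    s = 0.0 if v == 0 else delta / v
    if delta == 0:
        h = 0.0                      # hue is undefined for greys; 0 by convention
    elif v == r:
        h = 60.0 * (((g - b) / delta) % 6)
    elif v == g:
        h = 60.0 * ((b - r) / delta + 2)
    else:
        h = 60.0 * ((r - g) / delta + 4)
    return h, s, v
```

Skin pixels could then be selected by thresholding `h` and `s`; the threshold values would have to be set dynamically, as described above, and are not specified here.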
An example air character capture scene is shown in Fig. 2.

Character region segmentation
Candidate character regions should be extracted from the target images to improve the search accuracy and speed up the process. Raster scanning images for a character is both slow and inefficient. Therefore, a character segmentation method is vital.
Character segmentation is a process in which the areas that are likely to contain characters (search candidates) are extracted from the image. Thereafter, character recognition methods are applied only inside the segmented candidate regions, which speeds up the search. However, character segmentation is not a trivial problem. Although it is assumed that only one character will be included in each segmented area, this is not usually the case. Any method assuming one character per region will thus fail, because such recognition methods (for example, template matching) rely on the number of characters and the size of the extracted region to set their parameters. Therefore, character segmentation must be accurate, otherwise the results of recognition will be adversely affected.

Noise reduction
Once the character candidate regions have been determined, noise reduction is carried out before image binarization by smoothing the region using the bilateral filter. Smoothing reduces the noise and averages the brightness of an image. The bilateral filter is defined by Eq. 4, where $\sigma_1$ and $\sigma_2$ are the geometric spreads, chosen based on the desired amount of low-pass filtering.
However, with the bilateral filter, we must decide the optimal window size for smoothing. The best results were obtained with filter window sizes of 5×5 and 7×7. In addition, the parameters $\sigma_1$ and $\sigma_2$ are both set to 5.
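The filter equation is not reproduced in this text, so the following is only a minimal sketch of the common Gaussian form of the bilateral filter, written in plain Python for clarity rather than speed. The function name `bilateral_filter` is hypothetical, and mapping the two spreads to a spatial term (`sigma_d`) and an intensity term (`sigma_r`) is an assumption; the defaults follow the 5×5 window and the value 5 quoted above.

```python
import math

def bilateral_filter(img, window=5, sigma_d=5.0, sigma_r=5.0):
    """Smooth a greyscale image (list of rows) with a brute-force bilateral filter."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            # Visit the window around (y, x), clipped at the image border.
            for j in range(max(0, y - r), min(h, y + r + 1)):
                for i in range(max(0, x - r), min(w, x + r + 1)):
                    d2 = (i - x) ** 2 + (j - y) ** 2        # squared spatial distance
                    g2 = (img[j][i] - img[y][x]) ** 2       # squared intensity difference
                    wgt = math.exp(-d2 / (2 * sigma_d ** 2) - g2 / (2 * sigma_r ** 2))
                    num += wgt * img[j][i]
                    den += wgt
            out[y][x] = num / den    # den >= 1 because the centre pixel has weight 1
    return out
```

Because the intensity term suppresses neighbours whose brightness differs strongly from the centre pixel, edges between character strokes and background are largely preserved while flat regions are smoothed.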

Image binarization
The most basic image binarization technique is the selection of a single threshold value for the whole image. All the grey levels below this value are classified as black, and those above as white. However, it is rarely practical to segment an image cleanly into objects and background with a single threshold value because of noise and illumination effects.
Therefore, in this work, the Variable Threshold Method (VTM) is used to binarize the image. It sets a domain in an image and uses the average brightness in the domain to set a two-level threshold. The processing windows are set in the image beginning at the top left corner. Inside a window, the extraction thresholds are automatically set using the average region brightness. However, it is not easy to decide the optimal window size for the best extraction accuracy and a short processing time. Therefore, we experimented with several window sizes between 5 and 20.
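To improve the performance of the conventional VTM, we employ the Discriminant Analysis (DA) method to automatically determine the threshold in the windows. The DA method classifies data into two classes by solving for the eigenvectors that maximize the between-class variance and minimize the within-class variance. The algorithm proceeds as follows. For a given image, where $\omega_i$, $M_i$, $\sigma_i^2$ and $M_T$ represent the total number of pixels in class $i$, the class average brightness, the class variance and the overall average brightness respectively, the within-class variance ($\sigma_W^2$) is given by:

$$\sigma_W^2 = \frac{\omega_1 \sigma_1^2 + \omega_2 \sigma_2^2}{\omega_1 + \omega_2} \tag{5}$$

and the between-class variance ($\sigma_B^2$) can be calculated using:

$$\sigma_B^2 = \frac{\omega_1 (M_1 - M_T)^2 + \omega_2 (M_2 - M_T)^2}{\omega_1 + \omega_2} \tag{6}$$

The total variance is given by:

$$\sigma_T^2 = \sigma_W^2 + \sigma_B^2 \tag{7}$$

The threshold can be determined by maximizing $\sigma_B^2$ in the following equation.

$$\frac{\sigma_B^2}{\sigma_W^2} = \frac{\sigma_B^2}{\sigma_T^2 - \sigma_B^2} \tag{8}$$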
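The threshold search for one window can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `da_threshold` is hypothetical, the pixels are assumed to be 8-bit integers, and the between-class variance is computed in the equivalent two-class form $\omega_1 \omega_2 (M_1 - M_2)^2 / (\omega_1 + \omega_2)^2$. Since the total variance of a window is fixed, maximizing the between-class variance also maximizes its ratio to the within-class variance.

```python
def da_threshold(pixels):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value < t form class 1; the remaining pixels form class 2.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * n for level, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w1 = sum1 = 0                      # running pixel count and brightness sum of class 1
    for t in range(1, 256):
        w1 += hist[t - 1]
        sum1 += (t - 1) * hist[t - 1]
        w2 = total - w1
        if w1 == 0 or w2 == 0:
            continue                   # one class is empty: no valid split here
        m1, m2 = sum1 / w1, (total_sum - sum1) / w2
        var_b = w1 * w2 * (m1 - m2) ** 2 / total ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t
```

In a windowed scheme, the function would be called once per processing window on the flattened window pixels, and each pixel in the window would then be compared against the returned level.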
After image binarization, labeling is carried out to extract the blobs. We expect one blob per segment. Therefore, we select the largest blob as the candidate and delete the others as noise. The major advantage of noise deletion is the reduction in the number of blobs. This improves the system's speed by eliminating unnecessary computation. The results of this process are shown in Fig. 3.
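The labeling and largest-blob selection described above can be sketched with a simple flood fill. The function name `largest_blob` is hypothetical and the use of 8-connectivity is an assumption, since the connectivity used in the experiments is not stated.

```python
from collections import deque

def largest_blob(binary):
    """Return the set of (row, col) pixels in the largest 8-connected blob."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            blob, queue = set(), deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.add((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    return best
```

All foreground pixels outside the returned set would then be cleared as noise.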

Feature extraction
Two types of features are used to recognize the characters: brightness and shape. Brightness features can be captured from the image data directly. For shape features, after image thinning, linear regression is used to extract the straight lines and circular regions common to several letters and numerals.

Image thinning
An image skeleton is useful because it represents the shape of the object in a relatively small number of pixels [16]. This reduction of information speeds up the analysis or recognition processes performed afterwards. Thinning is an iterative technique that extracts the skeleton of an object. In every iteration, the edge pixels having more than one adjacent background pixel are eroded if their removal does not change the topology of the character.
We employ a skeletonization method called the Zhang-Suen Thinning Algorithm [17] because it is fast and easy to process. This skeletonization algorithm is a parallel method in which each new value depends only on the values from the previous iteration. The algorithm can be implemented using two iterations. In the first iteration, a pixel is deleted (in order) if
• it has at least two and at most six neighbours
• left, top and bottom neighbours exist
• right, top and bottom neighbours exist
The second iteration phase is similar to the first, except that the order of the last two processes is reversed. The algorithm terminates if no more pixels can be deleted after the two iterations. Figure 4 shows some example results achieved using the Zhang-Suen Thinning Algorithm. However, the results were not perfect for all characters. A closer observation of the "M" in Fig. 4 shows that some areas are still more than one pixel wide. A pruning algorithm is therefore necessary to remove such noise. The results of pruning are shown in Fig. 5.
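For reference, a minimal sketch of Zhang-Suen thinning is given below. It follows the commonly published formulation of the algorithm (neighbour count between 2 and 6, exactly one 0-to-1 transition around the pixel, and two directional tests that differ between the sub-iterations), which may differ in detail from the summary above and from the implementation used in the experiments. The function name `zhang_suen` is hypothetical, and the input is assumed to be a 0/1 image with a one-pixel background border.

```python
def zhang_suen(img):
    """Thin a binary image (list of rows of 0/1 with a 1-pixel background border)."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not img[y][x]:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of foreground neighbours
                    # number of 0 -> 1 transitions in the circular sequence P2..P9
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            # Parallel update: delete only after the whole image has been scanned.
            for y, x in to_delete:
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img
```

Collecting the pixels in `to_delete` before clearing them is what makes each sub-iteration depend only on the previous state of the image. Pruning of residual spurs is not included in this sketch.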

Bifurcation and end points extraction
To define the shapes of the characters, it is important to extract the bifurcation and end points from the results of the thinning operation. We define a bifurcation point as one with three neighbours and an end point as one with only one neighbour. In this work, we deal with license plate characters, which are printed, and virtual characters, which are handwritten. Table 1 shows the number of bifurcation and end points for each character, extracted manually by visual observation.
Extraction of the bifurcation and end points from virtual characters produces results similar to those shown in Table 1, because they are based on our intuition about what characters should look like.
However, printed characters like the ones on license plates produce different, rather surprising results. For such characters, the binarization and thinning results are somewhat different. For example, Fig. 6 shows the thinning results of the character "V" extracted from a license plate. Whereas the manual extraction gives no bifurcation points and 2 end points, the automatic extraction gives 1 bifurcation point and 3 end points. In fact, "V" and "Y" have the same number of bifurcation and end points. Therefore, there are significant differences in the number of bifurcation and end points for the virtual and printed characters. In Table 1, the values in brackets show the points extracted for a character when the value differed between the virtual and printed extractions. As shown in the table, characters "4", "8", "B", "G", "J", "K", "M", "N", "Q", "R", "V", "W" and "Y" are affected.
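The neighbour-count definition above translates directly into code. The sketch below assumes a one-pixel-wide skeleton stored as rows of 0/1 values and counts neighbours in the 8-neighbourhood; the function name `classify_skeleton_points` is hypothetical. Plain neighbour counting can over-report bifurcations around right-angle junctions, so a practical system may need extra filtering.

```python
def classify_skeleton_points(skel):
    """Return (end_points, bifurcation_points) of a one-pixel-wide skeleton."""
    h, w = len(skel), len(skel[0])
    ends, bifurcations = [], []
    for y in range(h):
        for x in range(w):
            if not skel[y][x]:
                continue
            # Count foreground pixels in the 3x3 window, excluding the centre.
            n = sum(skel[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))) - 1
            if n == 1:
                ends.append((y, x))
            elif n == 3:
                bifurcations.append((y, x))
    return ends, bifurcations
```

For a small "Y"-shaped skeleton this yields one bifurcation point and three end points:

```python
y_shape = [[1, 0, 0, 0, 1],
           [0, 1, 0, 1, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0]]
ends, bifurcations = classify_skeleton_points(y_shape)
# ends == [(0, 0), (0, 4), (4, 2)], bifurcations == [(2, 2)]
```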
Figure 7 shows a set of bifurcation and end points extracted from license plate characters.

Corner points extraction
Bifurcation and end points provide information about the general shape of the characters and can help divide a character into segments for further processing. These segments are assumed to contain either straight lines or curves. However, to help differentiate characters like "S" and "5" or "2" and "Z", which have two end points each and no bifurcation points, or "B" and "8", which have two bifurcation points and no end points, it is important to extract the corner points.
Corner points help further classify segments with both straight and curved sections.

Shape features extraction
To extract the shape features, the bifurcation and end points are used. We check for straight lines and circles between combinations of these points.

Number of segments and length
Based on the number of bifurcation and end points, we can determine the number of segments making up the character. Each segment is created between bifurcation points, end points or a combination of both types of points. Moreover, the length of each segment in pixels can then be calculated. This length is normalized by dividing it by the height of the character. The normalization should offer some size invariance during training and testing.

Lines: Linear regression
We need to determine which of the segments are lines and which are circles. Initially, we assume that the segment is a line and use a line extractor to find it. The Hough transform [18] is a widely used line extractor. However, in this work, we cannot use it because we need to determine how well the extracted line fits the data. Therefore, we use linear regression to fit the line, because it lets us calculate an error that tells us how well the line fits.
Linear regression [19] is the least squares estimator of a linear regression model with a single explanatory variable. It fits a straight line through the set of points by minimizing the vertical distances between the points of the data set and the fitted line. The fitted line has a slope equal to the correlation between y and x corrected by the ratio of the standard deviations of these variables.
After fitting a segment, we set a threshold for the error to decide if the segment is a line or should be passed to the curve (circle) extractor.
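The fit-then-threshold decision can be sketched as follows. The root-mean-square residual is used here as the fitting error and `max_rms=1.0` is a purely illustrative threshold; neither the exact error measure nor the threshold value is given in the text, and the function names are hypothetical. Because the fit minimizes vertical distances, a vertical segment has no finite slope, so this sketch raises an error in that case (swapping the coordinates is one possible workaround).

```python
import math

def fit_line(points):
    """Least-squares fit y = a*x + b over (x, y) points; returns (a, b, rms_error)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0:
        raise ValueError("vertical segment: swap x and y before fitting")
    a = sxy / sxx                      # slope
    b = my - a * mx                    # intercept
    rms = math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in points) / n)
    return a, b, rms

def is_line(points, max_rms=1.0):
    """Treat the segment as a straight line if the fit error is small enough."""
    return fit_line(points)[2] <= max_rms
```

Segments for which `is_line` returns `False` would be handed to the curve extractor.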

Curves
Most characters contain some form of circle-like curve. Therefore, we can use circularity as a character feature. This value lies between 0 and 1; as the circularity approaches 1, the extracted curve approaches a perfect circle. It is calculated from the area and the circumference using Eq. 9:

$$\text{Circularity} = \frac{4 \pi S}{L^2} \tag{9}$$

where $S$ is the enclosed area and $L$ is the circumference.
The area and circumference can be easily calculated from the segment information.
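As a quick check of the circularity measure $4\pi S / L^2$, the short sketch below evaluates it for an ideal circle and for a square; the function name `circularity` is hypothetical.

```python
import math

def circularity(area, perimeter):
    """Circularity 4*pi*S / L^2: 1 for a perfect circle, smaller for other shapes."""
    return 4.0 * math.pi * area / perimeter ** 2

circle = circularity(math.pi * 5 ** 2, 2 * math.pi * 5)   # radius 5: equals 1 up to rounding
square = circularity(1.0, 4.0)                            # unit square: pi / 4
```

On a pixel grid the measured perimeter depends on how boundary steps are counted, so values for real segments only approximate these ideal figures.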

Character recognition
In this work, we chose Neural Networks (NN) as the main classifiers because of their proven effectiveness in learning multi-dimensional and non-linear data [10][11][12]. There are 26 letters and 10 numerals that must be recognized.

Neural network
Although neural networks can learn from large non-linear data, the process requires thousands of training examples and a huge computation time. Therefore, to use neural networks effectively in real time, their structure should be simple and the number of classes to be learned should be minimized. In this work, instead of creating one neural network to learn the 36 characters, we use a "divide and conquer" method to train highly specialized compact networks.
The subdivisions are based on the features extracted from the characters. Table 2 shows a list of all these features.
Using this information, we can divide the characters into seven neural networks as shown in Table 3.
The neural networks are three-layered and trained using the back-propagation algorithm [14]. The size of each training sample is 15×20 pixels. There are also nodes to represent the presence of lines (1), angles (3) and circles (1), and their numbers (2). Therefore, the number of units in the input layer is 300 + 1 + 3 + 1 + 2 = 307.
The system is trained to produce an output of 0.95 for the node representing the character or numeral being learned and 0.05 for all the other output nodes. To further reduce the size of these neural networks, improving the training and test speeds, structural learning with knowledge [15] is used to supplement the error back-propagation method.
Table 2. All extracted character features (columns: character, bifurcation points, end points, corner points, straight lines, angles, curves).

Genetic algorithm
The neural network training and the character template creation data is extracted using a genetic algorithm for better normalization.The genetic algorithms can extract character

Template matching
Although template matching is computationally expensive, especially for large images, it is used in this work as a preprocessing step to help divide the characters into different higher-level classes. To reduce the computational cost, we have selected a relatively small character region.
The templates used in this work for each of the characters are constructed from the same data used to train the neural network. Therefore, the initial size of the template is 15×20 pixels. Each template is the average of 10 images selected at random from the minimum of 20 images extracted for neural network training. Note that the height and the width of the template are fixed.
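The template construction and matching can be sketched as below. The averaging step follows the description above; the similarity measure is not specified in the text, so zero-mean normalized cross-correlation is used here purely as an assumption. All function names are hypothetical, and images are assumed to be equally sized lists of rows of grey levels.

```python
import math

def make_template(images):
    """Pixel-wise average of equally sized greyscale images (lists of rows)."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)]
            for y in range(h)]

def match_score(region, template):
    """Zero-mean normalised cross-correlation in [-1, 1] (assumed similarity measure)."""
    a = [v for row in region for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den else 0.0

def best_match(region, templates):
    """templates: dict mapping a character label to its template image."""
    return max(templates, key=lambda label: match_score(region, templates[label]))
```

In use, `random.sample(training_images, 10)` could supply the ten images passed to `make_template` for each character, where `training_images` is a hypothetical list of at least 20 normalized 15×20 samples.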

Experiments
For license plate character recognition, this work uses license plate images of vehicles taken near a parking lot, with the target vehicle coming towards the camera and then turning towards the right or left [1]. For air character recognition, it uses air characters captured in our laboratory, as explained in Sec. 2.2. We use 3 subjects, each writing the characters a total of 4 times. However, the characters "I", "O", "T" and "Z" are missing from the license plate database. This work is carried out using a computer with an Intel Core i7 CPU operating at 3.47 GHz and 4 GB of installed memory.

Procedure
The overall system procedure is as follows.

Results
There are two experiments in this work: license plate recognition and air character recognition. The experiments are still ongoing, but we can report that we achieved better results than those in [1] for license plate recognition. The air characters have so far produced an accuracy of over 94%.
For each character region, the neural network and the template matching method were initially tested individually. To use the two methods in a hybrid system, we make an initial guess using the template matching method and confirm the result using a neural network.
The results of these computer simulations for character recognition are shown in Table 4. These are the average results for all the 30 characters learned. The total number of characters used for testing was 4268. For comparison, the method described in [1] achieved an accuracy of 94% using a neural network and template matching for license plate detection only. The results of this work show a 3% accuracy improvement because of the different features used.

Discussion
Initially, we hoped to use the same neural network to recognize all characters, whether printed or virtual. However, although they look similar to a human observer, the thinning process produced completely different, rather surprising results. Therefore, two systems are required.
Neural network training depends on the number of training samples available. In this work, the number of samples used per character varies between a minimum of 10 for the letters U and X and a maximum of 200 for the letter W and the numerals 7 and 9. Generally, neural networks require a lot of data to train. This is also observed here, where the accuracy for characters with more training samples is better.
The air character data collection needs some improvement to reduce the processing time.
Although face detection is useful for position extraction, etc., it takes valuable time that could be used to improve the character segmentation process.

Conclusion
In this chapter, we presented a character recognition method for virtual scenes (air characters) and vehicle number plate recognition using neural networks, template matching and evolutionary computation. We combined neural network learning, image processing and template matching to create a novel character recognition system. To speed up the system and deal with size and orientation issues, we employed a genetic algorithm. In addition, to control the size of both the neural network inputs and the template, we applied a genetic algorithm to guide the search. Average accuracies of about 97% and 94% were achieved for the license plate and virtual characters respectively.
In future, we must find ways to combine the different recognition systems to universally recognize all characters, and expand the work to include the recognition of lower-case characters as well. The computation time, especially for air character recognition, must also be improved.

Figure 1. Example of plates in the database.

Figure 2. Air character capture scene. (a) Home position (b) Start tracing position

Figure 3. Binarization using the VTM and DA methods.


Figure 7. Bifurcation (red) and end (blue) points extracted from license plate characters.

Features used to subdivide the characters:
• Number of bifurcation points
• Number of end points
• Number of segments
• Number of straight segments
• Straight line angles
• Number of curved segments
• Circularity (0 to 1 value) of each curved segment
• Length of segment

8.1.1. Training
1. Train the neural networks separately for the license plates and the air characters.
2. For the license plates, there are about 10 examples per character.
3. For the air characters, use 3 characters from each subject, for a total of 9 examples each.

8.1.2. Testing
1. Extract the character features and use them to select the neural network to use.
2. Use template matching to make an initial guess.
3. Run the neural network to confirm the results from step 2.
4. If the results of steps 2 and 3 are the same, end.
5. Otherwise, use the result of the neural network as the final result.
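The testing steps can be summarized in a short sketch. Everything passed in here is a hypothetical stand-in: `select_network` is assumed to map the extracted features to one of the specialized networks, `template_guess` to return the best-matching template label, and each network to be a callable returning a character label.

```python
def hybrid_recognise(region, features, select_network, template_guess):
    """Return (label, agreed) for one character region, following the testing steps."""
    network = select_network(features)       # step 1: pick the specialised network
    guess = template_guess(region)           # step 2: initial guess by template matching
    nn_label = network(region, features)     # step 3: confirm with the neural network
    agreed = guess == nn_label               # step 4: both methods agree
    return nn_label, agreed                  # step 5: the network result is final otherwise
```

As the steps are written, the final label always coincides with the neural network output; the `agreed` flag is added in this sketch only to expose whether template matching supported it.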

Table 1. Manual bifurcation and end-point extraction.

Table 4 shows the results of character recognition for license plates.

Table 4. Character recognition results for license plates.

Table 5 shows the results of character recognition for air characters.

Table 5. Character recognition results for air characters.