Optimal combination of document binarization techniques using a self-organizing map neural network

https://doi.org/10.1016/j.engappai.2006.04.003

Abstract

This paper proposes an integrated system for the binarization of normal and degraded printed documents, for the purpose of visualization and recognition of text characters. In degraded documents, where considerable background noise or variation in contrast and illumination exists, many pixels cannot be easily classified as foreground or background. For this reason, it is necessary to perform document binarization by combining the results of a set of binarization techniques, especially for document pixels whose classification is highly ambiguous. The proposed binarization technique exploits the benefits of a set of selected binarization algorithms by combining their results using a Kohonen self-organizing map neural network. Specifically, in the first stage the best parameter values for each independent binarization technique are estimated. In the second stage, in order to exploit the binarization information given by the independent techniques, the neural network is fed with the binarization results obtained by those techniques using their estimated best parameter values. This procedure is adaptive because the estimation of the best parameter values depends on the content of the images. The proposed binarization technique was extensively tested on a variety of degraded document images; experimental and comparative results demonstrating its performance are presented.

Introduction

In general, digital documents include text, line-drawing and graphics regions, and can therefore be considered mixed-type documents. In many practical applications there is a need to recognize or enhance mainly the textual content of documents. To achieve this, a powerful document binarization technique is usually applied. Documents in binary form can be recognized, stored, retrieved and transmitted more efficiently than the original gray-scale ones. For many years, the binarization of gray-scale documents was based on standard bilevel techniques, also called global thresholding algorithms (Otsu, 1979; Kittler and Illingworth, 1986; Reddi et al., 1984; Kapur et al., 1985; Papamarkos and Gatos, 1994; Chi et al., 1996). These techniques, which can be considered clustering approaches, are suitable for converting any gray-scale image into binary form, but they are inappropriate for complex documents and even more so for degraded documents. The binarization of degraded document images is not an easy procedure because such documents contain noise, shadows and other types of degradation, so it is important to take into account the natural form and the spatial structure of the document image. For these reasons, specialized binarization techniques have been developed for degraded document images.
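As an illustration of the global approach, Otsu's method selects the single threshold that maximizes the between-class variance of the resulting foreground/background split. The following is a minimal numpy sketch of this criterion, written for this discussion under the assumption of an 8-bit gray-scale input; it is not code from the original publication:

```python
import numpy as np

def otsu_threshold(gray):
    """Return Otsu's threshold for an 8-bit gray-scale image by
    exhaustively maximizing the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    bins = np.arange(256)
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (bins[:t] * prob[:t]).sum() / w0    # class means
        mu1 = (bins[t:] * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Usage: pixels at or above the threshold become background (white).
# binary = gray >= otsu_threshold(gray)
```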

In one category, local thresholding techniques have been proposed for document binarization. These techniques estimate a different threshold for each pixel according to the gray-scale information of the neighboring pixels. To this category belong the techniques of Bernsen (1986), Chow and Kaneko (1972), Eikvil (1991), Mardia and Hainsworth (1988), Niblack (1986), Taxt et al. (1989), Yanowitz and Bruckstein (1989), Sauvola et al. (1997), and Sauvola and Pietikainen (2000). A second category comprises hybrid techniques, which combine global and local threshold information; the best-known techniques in this category are the methods of Gorman (1994) and Liu and Srihari (1997).
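For example, Niblack's method computes a per-pixel threshold T(x, y) = m(x, y) + k·s(x, y) from the mean m and standard deviation s of a sliding window, while Sauvola and Pietikainen modify this to T(x, y) = m(x, y)·(1 + k·(s(x, y)/R − 1)). The sketch below is our own minimal illustration of both; the window size and k are typical, not prescribed, values:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def _local_stats(gray, window):
    """Sliding-window mean and standard deviation via uniform filters."""
    g = gray.astype(np.float64)
    mean = uniform_filter(g, window)
    sq_mean = uniform_filter(g * g, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return g, mean, std

def niblack(gray, window=15, k=-0.2):
    """Niblack local threshold: T = m + k*s. True = background."""
    g, mean, std = _local_stats(gray, window)
    return g > mean + k * std

def sauvola(gray, window=15, k=0.5, R=128.0):
    """Sauvola local threshold: T = m * (1 + k*(s/R - 1))."""
    g, mean, std = _local_stats(gray, window)
    return g > mean * (1 + k * (std / R - 1))
```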

For document binarization, the most powerful techniques are probably those that take into account not only the image gray-scale values but also the structural characteristics of the characters. To this category belong binarization techniques based on stroke analysis, such as the stroke width (SW) and character geometry properties. The most powerful techniques in this category are the logical level technique (LLT) (Kamel and Zhao, 1993) and its improved adaptive logical level technique (ALLT) (Yang and Yan, 2000), as well as the integrated function algorithm (IFA) (White and Rohrer, 1983) and its advanced “improvement of integrated function algorithm” (IIFA) (Trier and Taxt, 1995b). Further improvements of these techniques were proposed by Badekas and Papamarkos (2003). Recently, Papamarkos (2003) proposed a neuro-fuzzy technique for binarization and gray-level (or color) reduction of mixed-type documents. According to this technique, a neuro-fuzzy classifier is fed not only with the pixel values but also with additional spatial information extracted from the neighborhood of each pixel.

Despite the existence of all these binarization techniques, the evaluations carried out so far (Trier and Taxt, 1995a; Trier and Jain, 1995; Leedham et al., 2003; Sezgin and Sankur, 2004) show that no single technique can be applied effectively to all types of digital documents. Each technique has its own advantages and disadvantages.

The proposed binarization system takes advantage of the binarization results obtained by a set of the most powerful binarization techniques from all categories. These techniques are incorporated into one system and considered its components. We have included document binarization techniques that achieved the highest scores in the evaluation tests performed so far. Trier and Taxt (1995a) found that Niblack's and Bernsen's techniques are the fastest of the best-performing binarization techniques. Furthermore, Trier and Jain's (1995) evaluation tests identify Niblack's technique, with a post-processing step, as the best. Kamel and Zhao (1993) use six evaluation aspects, including subjective evaluation, memory, speed, stroke-width restriction and the number of parameters of each technique, to evaluate and analyze seven character/graphics extraction based binarization techniques; the best method in this test is the LLT. More recently, Sezgin and Sankur (2004) compared 40 binarization techniques and concluded that the local technique of Sauvola and Pietikainen (2000), as well as the White and Rohrer based IIFA (Trier and Taxt, 1995b), are the best-performing document binarization techniques. In addition to the above techniques, we include in the proposed system the powerful global thresholding technique of Otsu (1979) and the fuzzy C-means (FCM) technique (Chi et al., 1996).

The main goal of the proposed document binarization technique is to build a system that takes advantage of the benefits of a set of selected binarization techniques by combining their results using the Kohonen self-organizing map (KSOM) neural network. This is important especially for the fuzzy pixels of a document, i.e. the pixels that cannot be easily classified. The techniques incorporated in the proposed system are the following: Otsu (1979), FCM (Chi et al., 1996), Bernsen (1986), Niblack (1986), Sauvola and Pietikainen (2000), and improved versions of ALLT (Yang and Yan, 2000; Badekas and Papamarkos, 2003) and IIFA (Trier and Taxt, 1995b; Badekas and Papamarkos, 2003). Most of these techniques, especially those coming from the category of local thresholding algorithms, have parameter values that must be defined before their application to a document image. Different values of the parameter set (PS) of a technique lead to different binarization results, which means that no single set of best PS values exists for all types of document images. Therefore, in order for each technique to achieve its best binarization result, its best PS values must first be estimated.

Specifically, in the first stage of the proposed binarization technique, a parameter estimation algorithm (PEA) is used to detect the best PS values of every document binarization technique. The estimation is based on the analysis of the correspondence between the different binarization results obtained by applying a specific binarization technique to a document image using different PS values. The method is based on the work of Yitzhaky and Peli (2003), which was originally proposed for edge detection evaluation. In their approach, a specific range and a specific step for each parameter are initially defined, and the best PS values are then estimated by comparing the results obtained from all possible combinations of the PS values, using receiver operating characteristics (ROC) analysis and a Chi-square test. We improve this algorithm by starting from a wide initial range for every parameter and applying an adaptive convergence procedure to estimate the best parameter values. Specifically, in each iteration of the adaptive procedure, the parameter ranges are redefined according to the best binarization result obtained so far. The procedure terminates when the parameter ranges cannot be reduced further, and the best PS values are those obtained in the last iteration.
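The correspondence measure can be illustrated with a standard 2×2 contingency statistic: each candidate binarization is compared pixel-wise against an estimated ground truth, and the candidate with the strongest agreement is selected. The sketch below is our own generic instantiation; the paper's exact ROC/Chi-square formulas may differ in detail:

```python
import numpy as np

def chi_square_agreement(candidate, egt):
    """Generic 2x2 chi-square statistic between a candidate binary
    result and the estimated ground truth (EGT). Both inputs are
    boolean arrays with True = foreground; higher = better agreement."""
    tp = float(np.sum(candidate & egt))    # foreground in both
    fp = float(np.sum(candidate & ~egt))   # spurious foreground
    fn = float(np.sum(~candidate & egt))   # missed foreground
    tn = float(np.sum(~candidate & ~egt))  # background in both
    n = tp + fp + fn + tn
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    return 0.0 if denom == 0 else n * (tp * tn - fp * fn) ** 2 / denom

def best_parameter_set(results, egt):
    """results: dict mapping each PS value tuple to the binary image
    it produced. Returns the PS tuple with the best agreement score."""
    return max(results, key=lambda ps: chi_square_agreement(results[ps], egt))
```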

In order to combine the best binarization results obtained by the independent binarization techniques (IBT) using their best PS values, the KSOM neural network (Strouthopoulos et al., 2002; Haykin, 1994) is used as the final stage of the proposed method. Specifically, the neural network classifier is fed with the binarization results obtained by applying the IBT, together with a weighting value calculated for each technique. After the training stage, the output neurons specify the resulting classes, and through a mapping procedure these classes are categorized as foreground or background classes.
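A simplified sketch of this combination stage is given below: each pixel is represented by the vector of weighted votes of the N techniques, a small Kohonen map clusters these vectors, and each output neuron is then mapped to foreground or background. The topology, training schedule and mapping rule here are our own assumptions, not the paper's exact configuration:

```python
import numpy as np

def combine_with_ksom(binary_stack, weights, n_neurons=4,
                      epochs=5, lr=0.5, sample=5000, seed=0):
    """Combine N binarization results with a small 1-D Kohonen SOM.

    binary_stack: (N, H, W) array of {0, 1} results from the IBT.
    weights:      (N,) weighting value of each technique.
    Each pixel becomes an N-dimensional weighted-vote vector; the SOM
    clusters these vectors and every output neuron is then mapped to
    foreground (1) or background (0). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n, h, wd = binary_stack.shape
    weights = np.asarray(weights, dtype=np.float64)
    x = binary_stack.reshape(n, -1).T * weights          # (H*W, N) votes
    neurons = rng.random((n_neurons, n)) * weights       # initial codebook
    for epoch in range(epochs):
        sigma = max(n_neurons / 2.0 * (1 - epoch / epochs), 0.5)
        alpha = lr * (1 - epoch / epochs) + 0.01
        idx = rng.choice(len(x), size=min(len(x), sample), replace=False)
        for v in x[idx]:
            bmu = int(np.argmin(((neurons - v) ** 2).sum(axis=1)))
            dist = np.abs(np.arange(n_neurons) - bmu)    # 1-D grid distance
            hfun = np.exp(-(dist ** 2) / (2 * sigma ** 2))
            neurons += alpha * hfun[:, None] * (v - neurons)
    # assign every pixel to its best-matching neuron
    d2 = np.stack([((x - nb) ** 2).sum(axis=1) for nb in neurons], axis=1)
    labels = np.argmin(d2, axis=1)
    # a neuron represents foreground if its codebook vector amounts to a
    # majority of the total weighted vote (our own mapping rule)
    fg = (neurons.sum(axis=1) > weights.sum() / 2).astype(np.uint8)
    return fg[labels].reshape(h, wd)
```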

The proposed binarization technique was extensively tested by using a variety of document images most of which come from the old Greek Parliamentary Proceedings and from the Mediateam Oulu Document Database (Sauvola and Kauniskangas, 1999). Characteristic examples and comparative results are presented to confirm the effectiveness of the proposed method. The entire system has been implemented in a visual environment.

The main stages of the proposed binarization system are analyzed in Section 2. Section 3 describes in detail the method used to detect the best binarization result obtained by the application of the IBT; the same section also presents the algorithm used to estimate the weighting value of each IBT. Section 4 analyzes the method used for the detection of the best parameter values. Section 5 gives a brief description of the IBT included in the proposed binarization system. Finally, Section 6 presents the experimental results and Section 7 the conclusions.

Section snippets

Description of the proposed binarization system

The proposed binarization system performs document binarization by combining the best results of a set of IBT, most of which were developed specifically for document binarization. That is, a number of powerful IBT are included in the proposed binarization system as its components. Specifically, the IBT implemented and included in the proposed system are:

  • Otsu (1979),

  • FCM (Chi et al., 1996),

  • Bernsen (1986),

  • Niblack (1986),

  • Sauvola and Pietikainen (2000),

  • An improved version of ALLT (Yang and Yan, 2000; Badekas and Papamarkos, 2003), and

  • An improved version of IIFA (Trier and Taxt, 1995b; Badekas and Papamarkos, 2003).

Obtaining the best binarization result

When a document is binarized, the optimal result is obviously not known in advance. This is a major problem in comparative evaluation tests. In order to obtain comparative results, it is important to estimate a ground truth image; with such an image, the different binarization results can be compared and the best one chosen. This ground truth image, known as the estimated ground truth (EGT) image, can be selected from a list of
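Although the snippet above is truncated, in correspondence-based frameworks such as Yitzhaky and Peli's the EGT is commonly built from the candidate results themselves. The sketch below assumes a simple pixel-wise majority vote, which is only one possible instantiation rather than the paper's exact selection procedure:

```python
import numpy as np

def estimated_ground_truth(candidates, min_votes=None):
    """Build an estimated ground truth (EGT) image by pixel-wise voting
    over a stack of candidate binary results (True = foreground).
    A simplified majority-vote instantiation; the paper selects the EGT
    from a list of correspondence-based candidates."""
    stack = np.asarray(candidates, dtype=bool)   # (N, H, W)
    if min_votes is None:
        min_votes = stack.shape[0] // 2 + 1      # strict majority
    return stack.sum(axis=0) >= min_votes
```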

Parameter estimation algorithm

In the first stage of the proposed binarization technique it is necessary to estimate the best PS values for each of the IBT. This estimation is based on the method of Yitzhaky and Peli (2003), originally proposed for edge detection evaluation. However, in order to increase the accuracy of the estimated best PS values, we improve this algorithm by using a wide initial range for every parameter and an adaptive convergence procedure. That is, the ranges of the parameters are redefined according to the
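The adaptive convergence idea can be sketched for a single scalar parameter as follows: score a coarse grid over the current range, re-centre the range on the best value, shrink it, and repeat until the range cannot be reduced further. The shrink factor and grid size below are our own choices, not the paper's exact schedule:

```python
import numpy as np

def adaptive_parameter_search(binarize, evaluate, lo, hi,
                              steps=5, min_range=1e-3):
    """Adaptive convergence for one scalar parameter.

    binarize(p): binary image produced with parameter value p.
    evaluate(img): agreement score, e.g. chi-square against the EGT.
    """
    best = lo
    while hi - lo > min_range:
        grid = np.linspace(lo, hi, steps)
        scores = [evaluate(binarize(p)) for p in grid]
        best = float(grid[int(np.argmax(scores))])
        half = (hi - lo) / 4.0           # halve the range each iteration
        lo, hi = best - half, best + half
    return best
```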

The binarization techniques included in the proposed system

The seven binarization techniques implemented and included in the proposed binarization system have already been listed in Section 2. Otsu's technique (1979) is a global binarization method, while FCM (Chi et al., 1996) performs global binarization using fuzzy logic. We use FCM with a fuzzifier value m equal to 1.5.
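For illustration, two-class FCM on the gray-level histogram with fuzzifier m = 1.5 can be sketched as follows. This is a minimal implementation of standard FCM, assuming an 8-bit integer input and arbitrary initial centres; it is not the authors' code:

```python
import numpy as np

def fcm_binarize(gray, m=1.5, iters=50, tol=1e-4):
    """Two-class fuzzy C-means on the gray-level histogram.
    Returns a boolean image, True = background (brighter cluster)."""
    levels = np.arange(256, dtype=np.float64)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    centers = np.array([64.0, 192.0])            # arbitrary initial centres
    for _ in range(iters):
        d = np.abs(levels[:, None] - centers[None, :]) + 1e-9   # (256, 2)
        u = d ** (-2.0 / (m - 1.0))              # unnormalized memberships
        u /= u.sum(axis=1, keepdims=True)
        um = (u ** m) * hist[:, None]            # histogram-weighted u^m
        new_centers = (um * levels[:, None]).sum(axis=0) / um.sum(axis=0)
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break
    brighter = int(np.argmax(centers))
    return u[gray, brighter] > 0.5               # per-pixel membership lookup
```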

In the category of local binarization techniques belong the techniques of Bernsen (1986), Niblack (1986) and Sauvola and Pietikainen (2000). Each one of these

Experimental results

The proposed system was tested with a variety of document images most of which come from the old Greek Parliamentary Proceedings and from the Mediateam Oulu Document Database (Sauvola and Kauniskangas, 1999). In this section, four characteristic experiments that include comparative results are presented.

Experiment 1: In this experiment the proposed binarization technique is used to binarize the degraded document image shown in Fig. 5, which comes from the old Greek Parliamentary

Conclusions

This paper proposes a new document binarization technique suitable for normal and degraded digital documents. The main idea of the proposed technique is to build a system that takes advantage of the benefits of a set of selected binarization techniques by combining their results using a Kohonen self-organizing map neural network. To further improve the binarization results, the best parameter values of each document binarization technique are first estimated. Using these

References (32)

  • Badekas, E., Papamarkos, N., 2003. A system for document binarization. Third International Symposium on Image and...

  • Bernsen, J., 1986. Dynamic thresholding of grey-level images.

  • Chi, Z., et al., 1996. Fuzzy Algorithms: with Applications to Image Processing and Pattern Recognition.

  • Eikvil, L., Taxt, T., Moen, K., 1991. A fast adaptive method for binarization of document images. Proceedings of...

  • Gorman, L.O., 1994. Binarization and multithresholding of document images using connectivity. Graphical Models Image Processing (CVGIP).

  • Haykin, S., 1994. Neural Networks: A Comprehensive Foundation.