Image-to-MIDI mapping based on dynamic fuzzy color segmentation for visually impaired people

https://doi.org/10.1016/j.patrec.2010.11.019

Abstract

In this paper, the RGB ratio is defined according to a reference color so that an image can be transformed from a conventional color space to the RGB ratio space. Unlike traditional distance measurement, the road color model is determined by an elliptical area in the RGB ratio space enclosed by the estimated boundaries. The proposed dynamic fuzzy logic, in which fuzzy membership functions are defined according to the estimated boundaries, is introduced to implement the clustering rules, so that each pixel has its own fuzzy membership function corresponding to its intensity. A basic neural network is trained and used for parameter optimization. Experimental results for road detection demonstrate the robustness of the proposed approach to variations in intensity. To provide obstacle information, especially for visually impaired people, the Musical Instrument Digital Interface (MIDI) is introduced as the sound generator, and an image-to-MIDI mapping algorithm is proposed. Experimental results show that the proposed method can adapt to various road types, and that the resulting audio information successfully indicates the position and size of obstacles.

Research highlights

► Intensity and color information are effectively described in the RGB ratio space.
► Color segmentation with dynamic fuzzy logic is robust to different road types.
► The fuzzy c-means line-clustering technique determines the number of clusters autonomously.
► Image-to-MIDI mapping makes audio instructions more effective for visually impaired people.

Introduction

Color segmentation is an essential issue in vision applications such as object detection and navigation (Bosch et al., 2007, Lin, 2007). The process of color segmentation consists of color representation, color feature extraction, similarity measurement and classification. In color representation, the RGB (Red, Green and Blue) model, which expresses a color as a mixture of red, green and blue components, is often used to depict the color information of an image (Bascle et al., 2007, Weng et al., 2007). By using a transformation, the secondary colors, which are CMY (Cyan, Magenta and Yellow) or RG–GB–BR, can be obtained and used as an alternative color model (Wang et al., 2007). The HSI model, which transforms RGB into Hue, Saturation and Intensity, is also a popular color model at present, and its good performance has been shown in many works (Kim et al., 2007, Kim et al., 2008, Wangenheim et al., 2007). HSV (Value) and HSL (Luminance) are very similar to the HSI model because of the transformation formulas applied. Using the HSI color model, a specific color can be recognized regardless of variations in saturation and intensity. CIE Luv, CIE Lab and YCbCr (Wang and Huang, 2006, He et al., 2007) are color spaces that represent a color by its lightness (L) or luminance (Y) and its chromaticity (uv, ab and CbCr). The idea of a color ratio was first introduced by Barnard and Finlayson in 2000 to identify “shadow” and “non-shadow” regions in a way that is robust to changes in luminance. In 2002, the RGB ratio of the pixel value to the local sum (R/Rsum, G/Gsum, B/Bsum) was proposed by Finlayson et al. to deal with the influence of shadows on roads produced by variations in illumination. In addition, Finlayson et al. (2005) presented an alternative RGB ratio definition, the ratio of the intensity of a pixel to the local average (R/Rave, G/Gave, B/Bave), which is used because of its invariance to luminance and device changes.

In this paper, we propose a new RGB ratio model, which is based on the fact that a change in the intensity of a reference color leads to a change in its RGB color components, while their ratios to the reference color (R/Rref, G/Gref, B/Bref) vary linearly with the intensity change (Benedek and Sziranyi, 2007, Mikic et al., 2000). With this property, a specific color, such as the road reference color, can be described by a linear road color model that is invariant to intensity variation. Moreover, information about all three color components (RGB) is used to describe chromaticity in the proposed RGB ratio space. Therefore, while inheriting the characteristics of the HSI and RGB models, the RGB ratio has several advantages with regard to object recognition under variations in intensity.
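
The ratio definition (R/Rref, G/Gref, B/Bref) can be illustrated with a minimal sketch. The function name `rgb_ratio`, the reference color and the sample pixels are hypothetical, and the example assumes the simplest case in which an intensity change scales all three channels by the same factor; real images will only approximate this.

```python
def rgb_ratio(pixel, reference):
    """Return (R/Rref, G/Gref, B/Bref) for one pixel.

    `pixel` and `reference` are (R, G, B) tuples. A small epsilon guards
    against division by zero for a black reference (an assumption of
    this sketch, not something stated in the paper).
    """
    eps = 1e-9
    return tuple(c / max(r, eps) for c, r in zip(pixel, reference))


# Hypothetical road reference color and two pixels of the same color
# seen at different intensities (all channels scaled by one factor).
reference = (120.0, 110.0, 100.0)
brighter = (144.0, 132.0, 120.0)   # reference scaled by 1.2
darker = (60.0, 55.0, 50.0)        # reference scaled by 0.5

for name, pixel in (("brighter", brighter), ("darker", darker)):
    ratios = rgb_ratio(pixel, reference)
    # The three ratios stay equal to each other and follow the intensity
    # scale factor, so the color lies on a line in ratio space.
    print(name, [round(v, 3) for v in ratios])
```

Under this assumption a pixel of the reference color moves along a straight line in the ratio space as its intensity changes, which is the property the linear road color model relies on.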

There exist many complex and state-of-the-art techniques for color segmentation which are excellent at partitioning an input image. For example, the global color statistics can be represented by a set of overlapping regions and modeled by a mixture of Gaussians (GMM), and a local mixture model is described by Markov Random Fields (Kato, 2008). By optimizing parameters of the global and local models, the maximum likelihood is estimated and then a pixel can be classified. Although this approach has good segmentation results, a large number of iterations are necessary to determine the optimal parameters. As a result, 16 s of computation time is needed for an image with a 256 × 256 resolution (Tai, 2007).

Hill manipulation of the color histogram is another widely used approach to color segmentation. A three-dimensional histogram can be obtained by accumulating the three color components of the pixels. Dominant hill detection and minor hill dismantling are then used to estimate the clustering index (Al Aghbari and Al-Haj, 2006). The idea of a ‘histon’, which is an encrustation of a histogram such that the elements in the histon are the set of all the pixels that can be classified as possibly belonging to the same segment, was introduced for color segmentation by Mushrif and Ray (2008), and the total computation time this approach requires for a 179 × 122 image is 2.41 s.

Neural networks (Bascle et al., 2007) have recently been used as a clustering kernel for color segmentation, where components of the RGB space and the intensity are used as inputs and three calibrated color components are considered as outputs of the modified multi-layer perceptron (MLP). After the training procedure, good segmentation performance is achieved. Furthermore, the look-up tables (LUT) of the modified MLP can be applied for real-time applications, so that the execution time for a 320 × 240 image is only 0.00375 s. However, a huge database needs to be created for this system to work, and if an input image is very different from those in the database, the network should be re-trained to improve the results. The well-known K-means method (Lloyd, 1982) is one of the most commonly used techniques in the clustering-based segmentation field for industrial applications and machine learning (Berkhin, 2002, Mignotte, 2008). The fuzzy c-means theory (the fuzzy version of K-means) is applied as the clustering method (Kuo et al., 2008), and similarity measurement is based on Euclidean distance (Luis-Garcia et al., 2008). Bosch et al. (2007) presented an approach that can recognize grass, sky, snow and road using fuzzy logic with predefined classes, for which the average processing time for an image size of 180 × 120 to 250 × 250 is 60 s. Efficient fuzzy c-means clustering (qFCM) is also applied to speed up the clustering process by splitting a target image into several small sub-images (Chen et al., 2005). The computation time that qFCM requires for a 128 × 128 gray-level image is 0.1–1.2 s.
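
For readers unfamiliar with fuzzy c-means, the following is a minimal sketch of the standard algorithm on scalar data with Euclidean distance. It is not qFCM and not the line-clustering variant used in this paper; the function name `fuzzy_c_means`, the fuzzifier `m = 2` and the sample values are assumptions made for illustration.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=20):
    """Standard fuzzy c-means on 1-D data.

    Returns the final cluster centers and the membership matrix, where
    memberships[k][i] is the degree to which values[k] belongs to
    cluster i. Each row sums to 1.
    """
    centers = list(centers)
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by u^m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            if total > 0.0:
                centers[i] = sum(w * x for w, x in zip(weights, values)) / total
    return centers, memberships


# Two well-separated groups of hypothetical feature values.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
final_centers, u = fuzzy_c_means(data, centers=[0.0, 6.0])
print([round(c, 2) for c in final_centers])
```

Unlike K-means, every sample keeps a graded membership in every cluster, which is what allows a later fuzzy decision stage to weigh borderline pixels instead of assigning them outright.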

The use of a template image is another fast segmentation method. For instance, an image database of eyes can be established, and a skin color database can be obtained from a color conversion matrix with the color of the sclera. Consequently, fixed thresholds in the HSV space are introduced to detect the skin area in an input image (Do et al., 2007). However, the use of template images is restricted to specific objects, and may require a large image database.

In this paper, a dynamic fuzzy variable range is proposed to achieve a high quality segmentation result. Firstly, the linearity between the RGB ratio and intensity is estimated by a linear progressive method and parameter estimation. Secondly, upper and lower boundaries are obtained statistically for each color ratio. These boundaries are used to define the fuzzy membership functions of color ratio clusters, which dynamically vary corresponding to intensity changes. The proposed fuzzy system’s parameter optimization, undertaken using a backpropagation neural network, makes the fuzzy decision more adaptive and more effective.
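
The idea of intensity-dependent membership functions can be sketched as follows. Everything specific here is an assumption made for illustration: the line is fitted by ordinary least squares, the boundaries are placed k standard deviations of the residuals above and below the fitted line, and the "lower", "road" and "upper" sets use a piecewise-linear shape that crosses 0.5 at the boundaries. The paper's actual estimation method and membership shapes are not given in this excerpt.

```python
import statistics


def fit_boundaries(intensities, ratios, k=2.0):
    """Fit ratio = slope * intensity + intercept and a boundary half-width.

    The half-width is k times the population standard deviation of the
    residuals (an assumed statistical rule).
    """
    mean_i = statistics.mean(intensities)
    mean_r = statistics.mean(ratios)
    sxx = sum((i - mean_i) ** 2 for i in intensities)
    sxy = sum((i - mean_i) * (r - mean_r) for i, r in zip(intensities, ratios))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_i
    residuals = [r - (slope * i + intercept) for i, r in zip(intensities, ratios)]
    spread = k * statistics.pstdev(residuals)
    if spread <= 0.0:
        raise ValueError("samples lie exactly on a line; no boundary width")
    return slope, intercept, spread


def memberships(ratio, intensity, slope, intercept, spread):
    """Membership of one color ratio in the 'lower', 'road' and 'upper' sets.

    The center of the 'road' set moves with the pixel intensity, so each
    pixel is judged against boundaries evaluated at its own intensity.
    """
    center = slope * intensity + intercept
    offset = ratio - center
    road = max(0.0, 1.0 - abs(offset) / (2.0 * spread))
    lower = 1.0 - road if offset < 0 else 0.0
    upper = 1.0 - road if offset > 0 else 0.0
    return {"lower": lower, "road": road, "upper": upper}


# Hypothetical (intensity, ratio) samples taken from a road region.
slope, intercept, spread = fit_boundaries(
    [50, 100, 150, 200], [0.52, 0.98, 1.52, 1.98])
print(memberships(1.20, 120, slope, intercept, spread))
```

In a full system one such function would be evaluated per color ratio, and the resulting degrees would feed the fuzzy rules together with the intensity input.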

Early warning and real-time obstacle information are important for a successful intelligent vehicle system and for a guidance system for visually impaired people. Meijer (1992) used sine-wave sounds to transform image information without any image pre-processing, while a multi-resolution approach to image-to-sound mapping was introduced by Capelle et al. (1998). As is well known, the human ear has a far lower capacity for handling data than the eye. Furthermore, a captured image contains a lot of redundant and unnecessary information, which means that it is necessary to enhance information related to obstacles and suppress that related to unnecessary elements (Nagarajan et al., 2005, Nagarajan and Yaacob, 2007). These works used sine-wave sounds for sound generation, even though such sounds are not familiar to humans. Because stereo earphones may insulate a user from other environmental sounds, which could be dangerous in any situation, synthetic speech was used with this approach to provide guidance for visually impaired people (Loomis et al., 1998). Although speech provides the clearest instructions for users, it requires more processing time. In this paper, MIDI is introduced as the sound generator, so that a mono earphone is sufficient to provide obstacle information using various musical instruments. In this way, a user can receive both sounds from the environment and instructions generated by MIDI at the same time.

This paper is organized as follows: Section 2 introduces the RGB ratio space and constructs the road color model. The proposed dynamic fuzzy color segmentation method is described in Section 3. Obstacle extraction and the image-to-MIDI mapping algorithm to generate guidance instructions are presented in Section 4. Examples of various types of road detection using the proposed approach are illustrated in Section 5, while Section 6 presents the conclusions.

Road color model construction

Road detection is a typical application of color segmentation. In this study, the central bottom area of an image defined by (2N + 1) × (2N + 1) pixels should represent the road, as demonstrated in Fig. 1. From projective geometry, this area is closest to the camera system, and if this area is not part of the road then the navigation system should give a stop or turn command instead of evaluating the reference road color value. By calculating the average values of RGB components of the (2N + 1) × (2N + 1)
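
A minimal sketch of how a reference color could be averaged over the (2N + 1) × (2N + 1) window is given below. It assumes that the window touches the bottom edge of the image and is centered horizontally; the exact placement in Fig. 1 is not recoverable from this excerpt, and the function name `reference_color` and the image layout (a list of rows of (R, G, B) tuples) are hypothetical.

```python
def reference_color(image, n):
    """Average (R, G, B) over the (2n+1) x (2n+1) bottom-center window.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    height = len(image)
    width = len(image[0])
    size = 2 * n + 1
    if size > height or size > width:
        raise ValueError("window does not fit inside the image")
    center_x = width // 2
    totals = [0.0, 0.0, 0.0]
    # Last `size` rows, and `size` columns around the horizontal center.
    for y in range(height - size, height):
        for x in range(center_x - n, center_x + n + 1):
            for channel in range(3):
                totals[channel] += image[y][x][channel]
    count = size * size
    return tuple(t / count for t in totals)


# Hypothetical 5 x 5 image: a background color with a distinct
# 3 x 3 patch at the bottom center standing in for the road.
background, road = (10, 20, 30), (100, 110, 120)
image = [[background] * 5 for _ in range(5)]
for y in range(2, 5):
    for x in range(1, 4):
        image[y][x] = road
print(reference_color(image, n=1))
```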

Dynamic fuzzy road detection

For robustness and flexibility, fuzzy logic is used as the decision maker of the proposed segmentation method, where the membership functions are dynamically defined according to the road set obtained in Section 2. In this paper, three RGB ratios and the corresponding intensity are used as the four inputs of the fuzzy decision system. Fig. 6(a)–(c) shows three membership functions for input ratios, where lower/upper represent the sets in which the color ratio is smaller/greater than the

Image-to-MIDI mapping

To develop an intelligent guidance system for elderly or visually impaired people, a camera is mounted on a scooter with an 18° downward pitch. The hot zone, which indicates the on-going region, is shown in Fig. 7. The region has a width of 80 cm, corresponding to the width of the path. The upper edge of the hot zone, at Y = 30, corresponds to a distance of 650 cm from the camera.

By the definition of the hot zone, obstacles are objects that block the path and are located inside this zone. In other words,

Case study and discussion

In this case study, an uncalibrated, low-cost camera was mounted on a scooter at a height of 105 cm with an 18° downward pitch. Table 5 shows that an umbrella was detected and classified as a small obstacle according to its normalized image size. AN is used to determine the musical instrument (MI) and pitch. Because the X coordinate of the obstacle’s center shifted to the right as the obstacle approached, the playing index (IP) of the obstacle increased, and the volume value (Vol) was also
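
To make the roles of AN, MI, IP and Vol concrete, the sketch below maps one detected obstacle to raw MIDI channel messages. The message layout (Program Change, Note On, Note Off status bytes) follows the MIDI standard, but every mapping rule is an assumption made for illustration: the size thresholds, the instrument and note table, the number of playing slots, the image coordinates of the hot zone and the rule that nearer obstacles are louder are not taken from the paper.

```python
# Hypothetical lookup: size class -> (General MIDI program, 0-based; note number).
SOUND_BY_SIZE = {
    "small": (73, 72),    # Flute
    "medium": (0, 60),    # Acoustic Grand Piano, middle C
    "large": (56, 48),    # Trumpet
}


def size_class(area_norm):
    """Classify the normalized image size AN (thresholds are assumed)."""
    if area_norm < 0.1:
        return "small"
    if area_norm < 0.4:
        return "medium"
    return "large"


def clamp01(value):
    return min(max(value, 0.0), 1.0)


def obstacle_to_midi(area_norm, x_center, y_center,
                     zone_left, zone_right, y_far, y_near,
                     slots=8, channel=0):
    """Return the playing index and MIDI messages for one obstacle."""
    program, note = SOUND_BY_SIZE[size_class(area_norm)]
    # Playing index IP: which of `slots` positions across the hot zone
    # the obstacle center falls into; it grows as X moves to the right.
    x_frac = clamp01((x_center - zone_left) / (zone_right - zone_left))
    index = min(int(x_frac * slots), slots - 1)
    # Volume: nearer obstacles (closer to the bottom edge) are louder.
    # Velocity is kept at 1 or more because a Note On with velocity 0
    # is treated as a Note Off.
    y_frac = clamp01((y_center - y_far) / (y_near - y_far))
    velocity = max(1, int(y_frac * 127))
    messages = [
        bytes([0xC0 | channel, program]),         # Program Change
        bytes([0x90 | channel, note, velocity]),  # Note On
        bytes([0x80 | channel, note, 0]),         # Note Off
    ]
    return {"index": index, "messages": messages}


# A small obstacle right of center at the near edge of the hot zone.
# Y = 30 is the upper edge of the hot zone; the other coordinates are
# hypothetical.
result = obstacle_to_midi(area_norm=0.05, x_center=200, y_center=240,
                          zone_left=80, zone_right=240, y_far=30, y_near=240)
print(result["index"], [m.hex() for m in result["messages"]])
```

Because each obstacle is reduced to an instrument, a pitch, a position slot and a loudness, the output can be played through a single mono channel, in line with the stated goal of leaving the user able to hear the environment.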

Conclusions

In this paper, a dynamic fuzzy color segmentation approach for road detection is proposed using the RGB ratio space in association with the dynamic fuzzy classification. The road model described in the RGB color ratios and the intensity space is estimated by a detected road reference value, and is updated subsequently during the process. Dynamic fuzzy logic is also introduced as the clustering method to detect the road in a more effective manner. The use of dynamic membership functions based on

Acknowledgement

Part of the work was supported by the National Science Council under the Grant No. NSC96-2221-E006-052.

References (35)

  • C. Benedek et al. Study on color space selection for detecting cast shadows in video surveillance. Internat. J. Imaging Systems Technol. (2007)
  • P. Berkhin. Survey of Clustering Data Mining Techniques. (2002)
  • C. Capelle et al. A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution. IEEE Trans. Biomed. Eng. (1998)
  • Y.S. Chen et al. Efficient fuzzy c-means clustering for image data. J. Electron. Imaging (2005)
  • H.C. Do et al. Skin color detection through estimation and conversion of illuminant color under various illuminations. IEEE Trans. Consum. Electron. (2007)
  • Finlayson, G., Hordley, S., Drew, M., 2002. Removing shadows from images using retinex. In: Proc. 10 Imaging Conf., pp....
  • J.A. Freeman et al. Neural Networks: Algorithms, Applications, and Programming Techniques. (1991)