Image-to-MIDI mapping based on dynamic fuzzy color segmentation for visually impaired people
Research highlights
► Intensity and color information are effectively described in the RGB ratio space.
► Color segmentation with dynamic fuzzy logic is robust to different road types.
► The fuzzy c-means line-clustering technique determines the number of clusters autonomously.
► Image-to-MIDI mapping makes audio instructions more effective for visually impaired people.
Introduction
Color segmentation is an essential issue in vision applications such as object detection and navigation (Bosch et al., 2007, Lin, 2007). The process of color segmentation consists of color representation, color feature extraction, similarity measurement and classification. In color representation, the RGB (Red, Green and Blue) model, which expresses a color as a mixture of red, green and blue components, is often used to depict the color information of an image (Bascle et al., 2007, Weng et al., 2007). By using a transformation, the secondary colors, which are CMY (Cyan, Magenta and Yellow) or RG--GB--BR, can be obtained and used as an alternative color model (Wang et al., 2007). The HSI model, which transforms RGB into Hue, Saturation and Intensity, is also a popular color model at present, and its good performance has been shown in many works (Kim et al., 2007, Kim et al., 2008, Wangenheim et al., 2007). HSV (Value) and HSL (Luminance) are very similar to the HSI model because of the transformation formulas applied. Using the HSI color model, a specific color can be recognized regardless of variations in saturation and intensity. CIE Luv, CIE Lab and YCbCr (Wang and Huang, 2006, He et al., 2007) are color spaces which represent a color by its lightness (L) or luminance (Y) together with its chromaticity (uv, ab or CbCr). The idea of a color ratio was first introduced by Barnard and Finlayson in 2000 to identify “shadow” and “non-shadow” regions in a way that is robust to changes in luminance. In 2002, the RGB ratio of the pixel value to the local sum (R/Rsum, G/Gsum, B/Bsum) was proposed by Finlayson et al. to deal with the influence of shadows on roads produced by variations in illumination. In addition, Finlayson et al. (2005) presented an alternative RGB ratio definition, which is the ratio of the intensity of a pixel to the local average (R/Rave, G/Gave, B/Bave), and this formula is used because of its invariance to luminance and device changes.
In this paper, we propose a new RGB ratio model, which is based on the fact that a change in the intensity of a reference color will lead to a change in the RGB color components, but their ratios to the reference color (R/Rref, G/Gref, B/Bref) will vary linearly with the intensity change (Benedek and Sziranyi, 2007, Mikic et al., 2000). With this property, a specific color, such as the road reference color, can be described as a linear road color model, so that it is invariant to intensity variation. Moreover, information about the three color components (RGB) is used to describe the chromaticity in the proposed RGB ratio space. Therefore, while inheriting the characteristics of the HSI and RGB models, the RGB ratio has several advantages with regard to object recognition under variations in intensity.
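As a rough illustration of the ratio to a reference color (R/Rref, G/Gref, B/Bref), the sketch below computes the three ratios for one pixel. The function names, the sample colors and the use of the component mean as the intensity measure are assumptions made for this example, not definitions taken from the paper.

```python
def rgb_ratio(pixel, reference):
    """Return (R/Rref, G/Gref, B/Bref) for one pixel.

    `pixel` and `reference` are (R, G, B) tuples. The reference is assumed
    to have no zero component (an assumption of this sketch).
    """
    return tuple(p / r for p, r in zip(pixel, reference))


def intensity(pixel):
    """Mean of the three components, used here as a simple intensity measure."""
    return sum(pixel) / 3.0


# Hypothetical road reference color and the same surface at half brightness.
road_ref = (120, 110, 100)
shadowed = (60, 55, 50)

ratios = rgb_ratio(shadowed, road_ref)   # all three ratios are equal
level = intensity(shadowed)
```

When only the brightness of the reference surface changes, the three ratios stay equal to each other and scale together with the intensity. A surface of a different color produces unequal ratios.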
There exist many complex and state-of-the-art techniques for color segmentation which are excellent at partitioning an input image. For example, the global color statistics can be represented by a set of overlapping regions and modeled by a mixture of Gaussians (GMM), and a local mixture model is described by Markov Random Fields (Kato, 2008). By optimizing parameters of the global and local models, the maximum likelihood is estimated and then a pixel can be classified. Although this approach has good segmentation results, a large number of iterations are necessary to determine the optimal parameters. As a result, 16 s of computation time is needed for an image with a 256 × 256 resolution (Tai, 2007).
Hill manipulation of the color histogram is another widely used approach to achieve color segmentation. A three-dimensional histogram can be obtained by accumulating three color components of pixels. Dominant hill detection and minor hill dismantling are then used to estimate the clustering index (Al Aghbari and Al-Haj, 2006). The idea of a ‘histon’, which is an encrustation of a histogram such that the elements in the histon are the set of all the pixels that can be classified as possibly belonging to the same segment, was introduced for color segmentation by Murshrif and Ray (2008), and the total computation time this approach requires for a 179 × 122 image is 2.41 s.
Neural networks (Bascle et al., 2007) have recently been used as a clustering kernel for color segmentation, where components of the RGB space and the intensity are used as inputs and three calibrated color components are considered as outputs of the modified multi-layer perceptron (MLP). After the training procedure, good segmentation performance is achieved. Furthermore, the look-up tables (LUT) of the modified MLP can be applied for real-time applications, so that the execution time for a 320 × 240 image is only 0.00375 s. However, a huge database needs to be created for this system to work, and if an input image is very different from those in the database, the network should be re-trained to improve the results. The well-known K-means method (Lloyd, 1982) is one of the most commonly used techniques in the clustering-based segmentation field for industrial applications and machine learning (Berkhin, 2002, Mignotte, 2008). The fuzzy c-means theory (the fuzzy version of K-means) is applied as the clustering method (Kuo et al., 2008), and similarity measurement is based on Euclidean distance (Luis-Garcia et al., 2008). Bosch et al. (2007) presented an approach that can recognize grass, sky, snow and road using fuzzy logic with predefined classes, for which the average processing time for an image size of 180 × 120 to 250 × 250 is 60 s. Efficient fuzzy c-means clustering (qFCM) is also applied to speed up the clustering process by splitting a target image into several small sub-images (Chen et al., 2005). The computation time that qFCM requires for a 128 × 128 gray-level image is 0.1–1.2 s.
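For readers unfamiliar with fuzzy c-means, a minimal textbook version on one-dimensional data can be sketched as follows. This is the standard algorithm with a fixed number of clusters and caller-supplied initial centers, not the line-clustering variant mentioned in the highlights. The function name, parameters and sample data are assumptions of this sketch.

```python
def fuzzy_c_means(data, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means on 1-D data with caller-supplied initial centers.

    m is the fuzzifier (m > 1); eps avoids division by zero when a point
    coincides with a center.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: closer centers receive higher membership,
        # and each row sums to 1.
        memberships = []
        for x in data:
            dists = [abs(x - c) + eps for c in centers]
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                   for d_i in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by membership ** m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return centers, memberships


# Hypothetical sample: two well-separated groups of values.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers, memberships = fuzzy_c_means(data, centers=[0.0, 6.0])
```

Unlike K-means, every sample keeps a graded membership in every cluster, so samples near a boundary are not forced into a single class.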
The use of a template image is another fast segmentation method. For instance, an image database of eyes can be established, and a skin color database can be obtained from a color conversion matrix with color of the sclera. Consequently, fixed thresholds of the HSV space are introduced to detect the skin area in an input image (Do et al., 2007). However, the use of template images is restricted to specific objects, and may require a large image database.
In this paper, a dynamic fuzzy variable range is proposed to achieve a high quality segmentation result. Firstly, the linearity between the RGB ratio and intensity is estimated by a linear progressive method and parameter estimation. Secondly, upper and lower boundaries are obtained statistically for each color ratio. These boundaries are used to define the fuzzy membership functions of color ratio clusters, which dynamically vary corresponding to intensity changes. The proposed fuzzy system’s parameter optimization, undertaken using a backpropagation neural network, makes the fuzzy decision more adaptive and more effective.
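The two steps above (a linear relation between ratio and intensity, then statistical upper and lower boundaries) might be sketched as follows. Ordinary least squares and a band of k standard deviations of the residuals are swapped in here as assumptions; the paper's own estimation procedure and boundary statistics are not given in this excerpt. All names and sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def ratio_bounds(intensities, ratios, k=2.0):
    """Fit ratio vs. intensity and return (a, b, sigma, bounds).

    bounds(intensity) gives (lower, upper) as the fitted value minus/plus
    k standard deviations of the residuals (an assumed choice).
    """
    a, b = fit_line(intensities, ratios)
    residuals = [r - (a * i + b) for i, r in zip(intensities, ratios)]
    sigma = (sum(e * e for e in residuals) / len(residuals)) ** 0.5

    def bounds(intensity):
        center = a * intensity + b
        return center - k * sigma, center + k * sigma

    return a, b, sigma, bounds


# Hypothetical road samples: intensity and one color ratio per sample.
intensities = [50, 100, 150, 200]
ratios = [0.52, 0.98, 1.52, 1.98]

a, b, sigma, bounds = ratio_bounds(intensities, ratios)
lower, upper = bounds(125)
```

Because the boundaries are functions of intensity, the interval that counts as road color moves with the brightness of each pixel instead of being a fixed threshold.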
Early warning and real-time obstacle information are important for a successful intelligent vehicle system and for a guidance system for visually impaired people. Meijer (1992) used sine-wave sounds to transform image information without any image pre-processing, while a multi-resolution approach was introduced to image-to-sound mapping by Capelle et al. (1998). As is well known, the human ear has a far lower capacity for handling data than the eye. Furthermore, a captured image contains a lot of redundant and unnecessary information, which means that it is necessary to enhance information related to obstacles and suppress that related to unnecessary elements (Nagarajan et al., 2005, Nagarajan and Yaacob, 2007). These works used sine-wave sounds for sound generation, even though these sounds are not familiar to humans. Because stereo earphones may insulate a user from other environmental sounds, which could be dangerous in any situation, synthetic speech was used with this approach to provide guidance for visually impaired people (Loomis et al., 1998). Although speech provides the clearest instructions for users, it requires more processing time. In this paper, MIDI is introduced as the sound generator, so that a mono earphone is sufficient to provide obstacle information using various musical instruments. In this way, a user can receive both sounds from the environment and instructions generated by MIDI at the same time.
This paper is organized as follows: Section 2 introduces the RGB ratio space and constructs the road color model. The proposed dynamic fuzzy color segmentation method is described in Section 3. Obstacle extraction and the image-to-MIDI mapping algorithm to generate guidance instructions are presented in Section 4. Examples of various types of road detection using the proposed approach are illustrated in Section 5, while Section 6 presents the conclusions.
Section snippets
Road color model construction
Road detection is a typical application of color segmentation. In this study, the central bottom area of an image defined by (2N + 1) × (2N + 1) pixels should represent the road, as demonstrated in Fig. 1. From projective geometry, this area is closest to the camera system, and if this area is not part of the road then the navigation system should give a stop or turn command instead of evaluating the reference road color value. By calculating the average values of RGB components of the (2N + 1) × (2N + 1)
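A minimal sketch of averaging the RGB components over the (2N + 1) × (2N + 1) central bottom window is shown below. The image is represented as a nested list of (R, G, B) tuples, and the window is assumed to touch the bottom edge and to be centered horizontally. The exact placement shown in Fig. 1 is not available here, so both choices are assumptions, as are the function name and the sample image.

```python
def road_reference_color(image, n):
    """Average (R, G, B) over the (2n+1) x (2n+1) central bottom window.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    The window touches the bottom edge and is centered horizontally
    (an assumed placement).
    """
    height = len(image)
    width = len(image[0])
    size = 2 * n + 1
    if size > height or size > width:
        raise ValueError("window does not fit inside the image")
    center_col = width // 2
    totals = [0, 0, 0]
    for row in range(height - size, height):
        for col in range(center_col - n, center_col + n + 1):
            for channel in range(3):
                totals[channel] += image[row][col][channel]
    count = size * size
    return tuple(t / count for t in totals)


# Hypothetical 5 x 5 image: gray road in the lower middle, blue elsewhere.
image = [[(100, 100, 100) if r >= 2 and 1 <= c <= 3 else (0, 0, 255)
          for c in range(5)]
         for r in range(5)]

reference = road_reference_color(image, n=1)
```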
Dynamic fuzzy road detection
For robustness and flexibility, fuzzy logic is used as the decision maker of the proposed segmentation method, where the membership functions are dynamically defined according to the road set obtained in Section 2. In this paper, three RGB ratios and the corresponding intensity are used as the four inputs of the fuzzy decision system. Fig. 6(a)–(c) shows three membership functions for input ratios, where lower/upper represent the sets in which the color ratio is smaller/greater than the
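One way to picture membership functions that depend on a lower and an upper boundary is sketched below. The linear ramp shape, the transition width and the name of the middle set ("road") are assumptions of this example; the actual shapes are those of Fig. 6, which is not reproduced in this excerpt.

```python
def ramp(x, x0, x1):
    """0 at or below x0, 1 at or above x1, linear in between (x0 < x1)."""
    if x <= x0:
        return 0.0
    if x >= x1:
        return 1.0
    return (x - x0) / (x1 - x0)


def ratio_memberships(ratio, lo, hi, width):
    """Degrees of membership in the 'lower', 'road' and 'upper' sets.

    lo and hi are the boundaries for the current intensity (lo <= hi);
    width is the length of the transition zone outside each boundary.
    The three degrees always sum to 1.
    """
    upper = ramp(ratio, hi, hi + width)
    lower = 1.0 - ramp(ratio, lo - width, lo)
    road = 1.0 - lower - upper
    return {"lower": lower, "road": road, "upper": upper}


# Hypothetical boundaries for one intensity level.
inside = ratio_memberships(1.0, lo=0.9, hi=1.1, width=0.1)
edge = ratio_memberships(0.85, lo=0.9, hi=1.1, width=0.1)
above = ratio_memberships(1.3, lo=0.9, hi=1.1, width=0.1)
```

In a dynamic scheme, `lo` and `hi` would be recomputed for each pixel from its intensity before the memberships are evaluated.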
Image-to-MIDI mapping
To develop an intelligent guidance system for elderly or visually impaired people, a camera is mounted on a scooter with 18° pitch down. The hot zone, which indicates the on-going region, is shown in Fig. 7. The region has an 80 cm width, corresponding to the width of the path. The upper edge of the hot zone with Y = 30 corresponds to a distance 650 cm away from the camera.
By the definition of the hot zone, obstacles are objects that block the path and are located inside this zone. In other words,
Case study and discussion
In this case study, an uncalibrated and low-cost camera was utilized and mounted on a scooter at 105 cm height with 18° pitch down. Table 5 shows that an umbrella was detected and classified as a small obstacle according to its normalized image size. AN is used to determine the musical instrument (MI) and pitch. Because the X coordinate of the obstacle’s center was shifting to the right while approaching, the playing index (IP) of the obstacle was increasing, and the volume value (Vol) was also
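To make the idea of turning obstacle attributes into MIDI events concrete, the sketch below builds raw MIDI messages for one obstacle. Only the message layout follows the MIDI standard (program change: status 0xC0 plus channel; note on: status 0x90 plus channel). The size thresholds, the chosen instruments and notes, and the use of proximity as note velocity are hypothetical and do not reproduce the paper's mapping tables. The playing index (IP) is not modeled.

```python
def obstacle_to_midi(normalized_size, proximity, channel=0):
    """Return (program_change, note_on) as raw MIDI bytes for one obstacle.

    normalized_size and proximity are expected in [0, 1] and are clamped.
    All thresholds and instrument choices are hypothetical.
    """
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be in 0..15")
    size = min(max(normalized_size, 0.0), 1.0)
    prox = min(max(proximity, 0.0), 1.0)

    # Hypothetical size classes: smaller obstacles get a higher note.
    if size < 0.1:
        program, note = 0, 72    # General MIDI piano, one octave above middle C
    elif size < 0.4:
        program, note = 40, 60   # General MIDI violin, middle C
    else:
        program, note = 56, 48   # General MIDI trumpet, one octave below middle C

    # Closer obstacles sound louder; velocity stays in 1..127.
    velocity = 1 + round(prox * 126)

    program_change = bytes([0xC0 | channel, program])
    note_on = bytes([0x90 | channel, note, velocity])
    return program_change, note_on


# Hypothetical small obstacle right in front of the camera.
program_change, note_on = obstacle_to_midi(0.05, 1.0)
```

Each message is only two or three bytes long, which is part of why MIDI events are cheaper to generate than synthetic speech.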
Conclusions
In this paper, a dynamic fuzzy color segmentation approach for road detection is proposed using the RGB ratio space in association with the dynamic fuzzy classification. The road model described in the RGB color ratios and the intensity space is estimated by a detected road reference value, and is updated subsequently during the process. Dynamic fuzzy logic is also introduced as the clustering method to detect the road in a more effective manner. The use of dynamic membership functions based on
Acknowledgement
Part of the work was supported by the National Science Council under the Grant No. NSC96-2221-E006-052.
References (35)
- et al. Segmentation and description of natural outdoor scenes. Image Vision Comput. (2007)
- et al. Illuminant and device invariant color using histogram equalization. Pattern Recognition (2005)
- Segmentation of color images via reversible jump MCMC sampling. Image Vision Comput. (2008)
- et al. Color segmentation robust to brightness variations by using B-spline curve modeling. Pattern Recognition (2008)
- Face detection in complicated backgrounds and different illumination conditions by using YCbCr color space and neural network. Pattern Recognition Lett. (2007)
- et al. Region partition and feature matching based color recognition of tongue image. Pattern Recognition Lett. (2007)
- et al. Color image segmentation guided by a color gradient network. Pattern Recognition Lett. (2007)
- et al. Hill-manipulation: An effective algorithm for color image segmentation. Image Vision Comput. (2006)
- Barnard, K., Finlayson, G., 2000. Shadow identification using color ratios. In: Proc. 8th Imaging Conf., pp....
- et al. Learning invariants to illumination changes typical of indoor environments: Application to image color correction. Internat. J. Imaging Systems Technol. (2007)
- Study on color space selection for detecting cast shadows in video surveillance. Internat. J. Imaging Systems Technol.
- Survey of Clustering Data Mining Techniques
- A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution. IEEE Trans. Biomed. Eng.
- Efficient fuzzy c-means clustering for image data. J. Electron. Imaging
- Skin color detection through estimation and conversion of illuminant color under various illuminations. IEEE Trans. Consum. Electron.
- Neural Networks: Algorithms, Applications, and Programming Techniques