Segmentation and Location Computation of Bin Objects

In this paper we present a stereo vision based system for segmentation and location computation of partially occluded objects in bin picking environments. Algorithms to segment partially occluded objects and to find the object location (the x, y and z co-ordinates of the object midpoint) with respect to the bin area are proposed. The z co-ordinate is computed using stereo images and neural networks. The proposed algorithms are tested using two neural network architectures, namely Radial Basis Function nets and simple feedforward nets. The training results of the feedforward nets are found to be more suitable for the current application. The proposed stereo vision system is interfaced with an Adept SCARA Robot to perform bin picking operations. The vision system is found to be effective for partially occluded objects in the absence of albedo effects. The results are validated through real-time bin picking experiments on the Adept Robot.


Introduction
A bin picking robot requires information about the object to be picked and its exact location with respect to the bin area. It is generally assumed that the topmost object is the desired object to be picked from a bin with scattered or piled objects. In this paper, emphasis is given to solving the segmentation problem of occluded objects in the bin. The proposed stereo vision system considers two aspects: the segmentation of the bin image to identify the topmost object, and the location of the object midpoint (x, y and z co-ordinates). In bin picking environments, object occlusions in the bin image pose a challenge to the segmentation process. Several papers have been published on bin picking algorithms, of which very few have considered occluded objects. Most research on bin picking uses vision only for object recognition and pose determination (Krisnawan Rahardja & Akio Kosaka, 1996), (Ayako Takenouchi et al., 1998), (Ezzet Al-Hujazi & Arun Sood, 1990), (Harry Wechsler & George Lee Zimmerman, 1989), (Kohtaro Ohba & Katsushi Ikeuchi, 1996), while others use a model based approach which compares the object image with a model database for pose determination (Yoshikatsu Kimura et al., 1995), (Martin Berger et al., 2000), (Sarah Wang et al., 1994). Other approaches combine sensors with a model database to solve the bin picking problem; for example, Martin Berger et al. (2000) use stereo and CAD models to determine the pose of objects.
In contrast to these approaches, a stereo based vision system is proposed which helps the robot to automatically identify and pick the topmost object. The stereo vision system consists of a hardware component and a software component. The hardware comprises two machine vision cameras in a stereo rig. PULNIX progressive scan cameras are used in this experiment. A stereo base length of 70 mm is found to be suitable for the current application; a vacuum gripper is used for grasping the objects. Objects of similar geometrical shape with partial occlusions are used in the experimental process. The software component comprises a segmentation algorithm and an object location algorithm. The proposed segmentation algorithm uses binary thresholding techniques and the image histogram to identify the topmost object in the bin, as detailed in Section 2. Section 3 elaborates on the object feature extraction process, while Sections 4 and 5 describe the neural network architectures and experimental results respectively. Section 6 comprises the observations, conclusion and future research aspects of the paper.

Image Acquisition and Preprocessing
The stereo images of the occluded bin objects are captured using the stereo cameras. Direct lighting of the bin is avoided to reduce brightness and albedo effects. Objects in the bin partially occlude each other and have different intensity levels due to their location. The captured images are monochromatic, with a size of 640 x 480 pixels. The acquired images are pre-processed to improve the efficiency of the segmentation process. The preprocessing involves two stages: i) image resizing and ii) noise filtering. Images are resized to minimize the processing time and to improve the efficiency of the system without significant loss of information about the objects. An image size of 128 x 96 pixels is found to be suitable for the current application. Since the topmost object is the one without occlusions and the one with higher intensity levels in comparison to the rest of the objects, filtering techniques are applied to smooth out the intensity of the objects and to enhance the edges. The images are filtered using a regional filter and a mask. This masked filter processes the image data with a 2-D linear Gaussian filter and a mask of the same size as the original image. It returns an image that consists of filtered values for pixels in locations where the mask contains ones and unfiltered values for pixels in locations where the mask contains zeroes. This process smooths the intensity of the image around the objects. The resulting filtered image is then subjected to the segmentation techniques detailed in the following section.
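The two preprocessing stages can be sketched in NumPy as follows. This is only an illustrative sketch: the paper does not state the resizing method, the Gaussian kernel size, or its standard deviation, so the block-averaging downsample, the 5 x 5 kernel, `sigma=1.0`, and all function names are assumptions.

```python
import numpy as np

def block_downsample(image, factor=5):
    """Shrink an image by averaging factor x factor blocks (assumed method);
    a 480 x 640 array becomes 96 x 128 with factor=5."""
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2-D Gaussian kernel (size and sigma are assumed values)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def filter2d(image, kernel):
    """Linear filtering with edge replication at the image borders."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0],
                                           dx:dx + image.shape[1]]
    return out

def masked_gaussian_filter(image, mask, size=5, sigma=1.0):
    """Filtered values where mask is one, original values where mask is zero."""
    smoothed = filter2d(image, gaussian_kernel(size, sigma))
    return np.where(mask.astype(bool), smoothed, image.astype(float))
```

A call such as `masked_gaussian_filter(block_downsample(raw), mask)` would then yield the smoothed 128 x 96 image, where `raw` is a 480-row by 640-column array and `mask` is a hypothetical 96 x 128 array of ones and zeroes.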

Bin Image Segmentation
Bin image segmentation involves identifying the topmost object from the cluster of objects in the bin for pick up. Segmentation of occluded objects has been dealt with by Rahardja and Kosaka (Krisnawan Rahardja & Akio Kosaka, 1996), who developed a stereo based algorithm to find simple visual clues of complex objects; this algorithm depends on human assistance. In contrast, we propose an automatic segmentation algorithm which can segment partially occluded bin objects. Since all the objects are partially occluded except the topmost object, the topmost object can be separated using its grey value. A histogram of the bin stereo images displays the grey levels of the image. Segmentation using binary thresholding is possible by identifying the pixels with grey levels higher than a threshold value; these pixels are assumed to belong to the topmost object, as the topmost object has a higher brightness level than the other objects in the bin. A suitable threshold that segments only the topmost object has to be computed. For real-time bin picking, automatic determination of the threshold value is an essential criterion. To determine this threshold value, an algorithm is proposed which uses the grey levels from the histograms of both stereo images. The proposed algorithm is as follows:

Segmentation Algorithm:
Step 1: The histogram is computed from the left and right grey scale images for bin values of 0 to 255. Counts a(i), i = 1, 2, 3, …, 256, contains the number of pixels with a grey scale value of (i-1) for the left image. Counts b(i), i = 1, 2, 3, …, 256, contains the number of pixels with a grey scale value of (i-1) for the right image.
Step 2: Compute the logarithmic weighted grey scale value of the left and right image as […] where i = 1, 2, 3, …, 255.
Step 3: Compute the logarithmic weighted grey scale […]
Step 4: The threshold T is the minimum value of 'tam' and 'tbm'.
The thresholds of the two stereo images are computed separately, and the minimum of the two is applied as the threshold to both images. The topmost object is segmented and converted into a binary image. The performance of the proposed algorithm was compared with the more popular global thresholding algorithm proposed by Otsu. Fig. 1 shows the results of the comparison of both methods.
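The histogram step and the rule of applying the smaller of the two per-image thresholds to both images can be sketched as below. The logarithmic weighting formulas of the proposed algorithm are not reproduced here; as a stand-in, the sketch uses Otsu's method (the baseline the paper compares against) as the per-image threshold function. The function names and the `threshold_fn` parameter are assumptions made for illustration, and the arrays are zero-indexed, so index i holds the count for grey level i rather than i-1.

```python
import numpy as np

def histogram256(image):
    """Pixel counts for grey levels 0..255 of an 8-bit image."""
    return np.bincount(image.ravel(), minlength=256)[:256]

def otsu_threshold(image):
    """Otsu's global threshold: maximize the between-class variance."""
    counts = histogram256(image).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(counts)                 # pixels at or below each level
    w1 = counts.sum() - w0                 # pixels above each level
    sum0 = np.cumsum(counts * levels)
    mu0 = sum0 / np.maximum(w0, 1)         # mean of the darker class
    mu1 = (sum0[-1] - sum0) / np.maximum(w1, 1)  # mean of the brighter class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between))

def segment_stereo_pair(left, right, threshold_fn=otsu_threshold):
    """Threshold each image separately, then apply the minimum to both."""
    t = min(threshold_fn(left), threshold_fn(right))
    return t, left > t, right > t
```

Passing a different `threshold_fn` would allow the proposed logarithmic weighted threshold to be substituted without changing the minimum rule.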

Fig. 1. Comparison of segmentation results: bin image, segmentation using Otsu method, and segmentation using proposed method

Object Location
The next phase is to compute the location of the object with respect to the bin area. To pick up the topmost object, the robot must be provided with the location co-ordinates of the object midpoint (x, y and z co-ordinates). The x and y co-ordinates are determined from the 2D segmented image of the object, whereas the z co-ordinate, which is the depth or distance co-ordinate, requires a 3D representation. To find the z co-ordinate, an approach combining stereo images and a neural network is proposed, in which a neural network is trained to compute the distance (z co-ordinate) of the object from its stereo images. Two neural nets, namely radial basis function nets and feedforward nets, are compared in computing the z co-ordinate. The object features of the stereo images are used as the input data, and the distance of the object from the gripper is used as the output data to train the neural network. The object features are extracted using singular value decomposition (SVD). The following sections elaborate on SVD and the feature extraction process.
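As a minimal sketch of how the x and y co-ordinates could be obtained from the 2D segmented image, the centroid of the non-zero pixels of a binary image can be computed as below. The function name is an assumption, and the result is in pixel units; the mapping from pixels to robot co-ordinates is not covered here.

```python
import numpy as np

def object_centroid(binary_image):
    """Return the (x, y) centroid, in pixels, of the non-zero region.

    x is the mean column index and y is the mean row index.
    """
    rows, cols = np.nonzero(binary_image)
    if rows.size == 0:
        raise ValueError("segmented image contains no object pixels")
    return float(cols.mean()), float(rows.mean())
```

For a segmented image containing a single rectangular object, the function returns the centre of that rectangle.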

Singular Value Decomposition
The

Feature Extraction
To pick up the topmost object, the location of the object is to be computed. Since only one object is present in the segmented image, finding the centroid of the object provides the x and y coordinates of the object midpoint; however, it is also important to know the z coordinate […] σ_j is the bias to the j-th hidden unit, also known as the spread factor. In the case of classification problems, an RBF network finds the centroids of data clusters and uses these centroids as the centers of the Gaussian density function.

Simple Feedforward Network Architecture
The simple feedforward neural network architecture consists of three layers. Ten singular values are fed to the network as input data. The hidden layer is chosen to have 5 neurons and the output layer consists of 1 neuron, which represents the object distance. The hidden and input neurons have a bias value of 1.0 and are activated by the bipolar sigmoid activation function. The initial weights for the network are randomized between -0.5 and 0.5 and normalized. The initial weights connected to any hidden or output neuron are normalized in such a way that the sum of the squared weight values connected to a neuron is one. This normalization is carried out using equation (7), which is used to implement the weight updating (S.N. Sivanandam & M. Paulraj, 2003).
where n is the number of input units and p is the number of hidden units. A sum squared error criterion, as defined by equation (8), is used as the stopping criterion while training the network. The sum-squared tolerance defined in equation (8) is fixed at 0.01. The network is trained by the conventional back propagation procedure (Laurene Fausett, 1990). The cumulative error is the sum squared error for each epoch and is given by: Sum squared error = $\sum_{p} \sum_{k=1}^{m} (t_k - y_k)^2$ (8), where $t_k$ is the expected output value for the k-th neuron, $y_k$ is the actual output value for the k-th neuron, m is the total number of output neurons, and p is the total number of input neurons.
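The initialization and stopping criterion described above can be sketched as follows. This is a hedged illustration rather than the exact form of equations (7) and (8): it assumes that the bias weight is included in the per-neuron normalization, that both the hidden and output layers apply the bipolar sigmoid, and that target distances are scaled into the range (-1, 1). All function names are hypothetical.

```python
import numpy as np

def bipolar_sigmoid(x):
    """Bipolar sigmoid activation with outputs in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def init_normalized_weights(n_in, n_out, rng):
    """Random weights in [-0.5, 0.5], one extra row for the bias input.

    Each column (the weights feeding one neuron) is scaled so that the
    sum of its squared values is one.
    """
    w = rng.uniform(-0.5, 0.5, size=(n_in + 1, n_out))
    return w / np.sqrt((w ** 2).sum(axis=0, keepdims=True))

def forward(x, w_hidden, w_output):
    """Forward pass of a 10-5-1 style network with bias inputs of 1.0."""
    h = bipolar_sigmoid(np.append(x, 1.0) @ w_hidden)
    return bipolar_sigmoid(np.append(h, 1.0) @ w_output)

def sum_squared_error(targets, outputs):
    """Sum of squared differences between expected and actual outputs."""
    diff = np.asarray(targets, dtype=float) - np.asarray(outputs, dtype=float)
    return float(np.sum(diff ** 2))
```

Training would then repeat back propagation epochs until the value returned by `sum_squared_error` over all training samples falls below the tolerance of 0.01.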

Experimental Results
In the experimental phase, 53 images were acquired. The bin images are pre-processed to smooth the intensity level of the object. The pre-processed images are segmented to extract the topmost object in the bin. The segmented left and right images are added, and the edge of the added image is extracted. The 'x' and 'y' coordinates are computed from the left segmented object image, which is considered as the reference image, by finding the centroid of the reference image. To compute the 'z' co-ordinate, the singular value features are extracted from the added edge image using SVD. Ten singular values of an image are fed as input and the object distance is fed as output to the neural networks. The feature data is used to compare the performance of the two networks. The RBF network is trained with 40 data samples, and the spread factor σ_j is chosen as 3.2. In the real-time experimentation phase, the developed stereo vision system is interfaced with an Adept SCARA 600 Robot. A vacuum gripper is used for the pick and place operation. A bin with seven partially occluded objects is used for testing. The Adept SCARA Robot is tested for real-time bin picking using the object location coordinates ('x', 'y' and 'z' co-ordinates) computed by the stereo vision system. The bin picking system is designed to systematically pick the topmost object one at a time.
The process is repeated until all the objects in the bin are picked. On average, six out of the seven objects were picked and placed successfully, giving a bin pick and place performance of 85.7%.

Conclusion
A bin picking system using stereo vision sensors for object localization is presented. Algorithms for segmentation of partially occluded bin objects and location of the topmost object are proposed. The results of two networks, RBF and feedforward nets, are compared in evaluating the z co-ordinate of the object midpoint. The feedforward net is found to be more suitable for the current application. The system is experimentally verified and real-time bin picking results are presented. Real-time tests validate the applicability of the proposed algorithms for bin picking. The major constraint of the proposed system was poor segmentation in the presence of albedo effects and uneven brightness in certain parts of the bin. Optimal lighting conditions are essential to obtain satisfactory results. Future work will include optimising lighting conditions and improving the network performance.
Singular Value Decomposition (SVD) is a widely used technique to decompose a matrix into several component matrices, exposing many of the useful and interesting properties of the original matrix (Zi-Quan Hong, 1991). Any 'm x n' matrix A (m >= n) can be written as the product of an 'm x m' column-orthogonal matrix U, an 'm x n' diagonal matrix W with positive or zero elements, and the transpose of an 'n x n' orthogonal matrix V. The diagonal elements of matrix W are the singular values of matrix A, which are non-negative numbers. The singular values obtained by the SVD of an image matrix give algebraic features of the image, which represent its intrinsic attributes (Zi-Quan Hong, 1991).
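A minimal sketch of extracting singular value features from an image matrix with NumPy is given below. The function name and the zero-padding used for images with fewer than `k` singular values are assumptions; `k=10` follows the ten singular values used as network inputs in this paper.

```python
import numpy as np

def singular_value_features(image, k=10):
    """Return the k largest singular values of an image matrix.

    numpy.linalg.svd returns singular values sorted in descending order;
    the result is zero-padded if the image has fewer than k of them.
    """
    s = np.linalg.svd(np.asarray(image, dtype=float), compute_uv=False)
    features = np.zeros(k)
    n = min(k, s.size)
    features[:n] = s[:n]
    return features
```

The returned vector of length 10 can serve directly as the input pattern for either neural network.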
Fig. 2. Flow Diagram for Object Feature Extraction

Fig. 3. Radial Basis Function Network Architecture

The network trained in 1.407 seconds. Subsequently, the network is tested for 53 samples. The radial basis function network successfully computed the distance for 47 samples out of 53 samples, resulting in a computation efficiency of 88.6%. The training parameters and results are tabulated in Table 1.

Table 2. Training Parameters of the Simple Feedforward Network