STEREOSCOPIC ANALYSIS AND DEPTH MAP CREATION

This contribution focuses on the use of stereoscopic images for depth map creation and discusses methods for camera calibration. A stereoscopic head was constructed from two Basler acA1600-20uc industrial cameras with Computar M2514-MP2 lenses in order to acquire the stereoscopic image. The algorithm for obtaining the depth map is then described. It was implemented in the C# programming language with the EmguCV library and consists of 4 parts. Image acquisition and camera calibration are addressed first; the calibration is based on detecting intersections on a chessboard. Methods for obtaining the depth map are described next. Finally, the implemented algorithm is tested.


INTRODUCTION
Today, robotic assistants are at the forefront of scientific research. They are used not only in industry but increasingly also in the home. These assistants must therefore be able to manipulate everyday objects, which requires detecting different objects in the scene and subsequently manipulating them.
Many approaches currently exist for detecting an object in an image and subsequently manipulating it. These approaches differ mainly in the technology used for image acquisition and in the data storage format. In practice, approaches based on a stereoscopic image or on data stored in RGB-D format are mainly used. Each approach brings its own problems, e.g. image quality and sensor calibration, which have to be addressed before the stereoscopic image is created. The resulting depth map can be used to detect objects, the manipulation sequence and obstacles in the scene, all of which are necessary for manipulating objects.
Image quality and camera calibration are the main potential problems when a stereoscopic image is used. Image quality is affected mostly by lighting conditions: digital noise arises especially in low light, and direct sunlight can cause extreme oversaturation. It is therefore necessary to use a polarizing filter or subtle image smoothing algorithms. Camera calibration is the most important step when working with a stereoscopic image, because a perfectly parallel stereoscopic head is almost impossible to achieve. Deformation of the depth map is a further possible problem; it can be detected once the depth map has been obtained.
The main goal of this contribution is the creation of a configurable stereoscopic head, the subsequent acquisition of the stereoscopic image, obtaining the depth map and finding the right values of the key parameters of the depth map.

THEORETICAL FRAMEWORK/CALCULATION
Two basic approaches exist for obtaining a stereoscopic image. They differ mainly in the orientation of the cameras and are shown in Fig. 1. In the ideal approach (Fig. 1a), the cameras are mounted perfectly parallel and the image is not distorted; this is almost unattainable in practice. The conventional approach (Fig. 1b) is therefore the one usually found in practice: the image is distorted by the lens and the cameras are not mounted parallel. The images must first be calibrated to compensate for this distortion. The OpenCV library and Matlab both include an integrated calibration tool. Many experts are currently focused on the process of camera calibration. A 3D image can be created after the calibration process (National Instruments, 2016).
Sun et al. (2011) deal with the calibration of cameras with a large field of view. The conventional camera model was used to obtain the stereoscopic image. Each camera is modelled as a pinhole, and the lenses used in this model exhibit radial distortion. All calculations are performed by using rotation or transformation matrices. The distortion must be obtained first. Next, a linear model is obtained by using a non-linear optimization method. This method minimizes linearity between key points in the image. The image is then converted to a 1D space by using vanishing points (three key points in 1D space are obtained from one key point in 2D space). The intrinsic and extrinsic parameters of the cameras are detected by this conversion. The image obtained in 1D space is affected by digital noise and is therefore smoothed by using the Levenberg-Marquardt algorithm. Once the intrinsic and extrinsic parameters are detected, the rotation and transformation matrices are calculated, and the distortion of the cameras is removed by using these matrices. Cao and Foroosh (2006) also use camera calibration in 1D space.
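The pinhole model with radial distortion mentioned above can be illustrated with a minimal Python sketch. It assumes the common two-coefficient polynomial distortion model; the reviewed papers may use a different parameterization, and the class name, field names and coefficients (`k1`, `k2`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Pinhole camera with a two-term radial distortion model (assumed)."""
    fx: float        # focal length in pixels along x
    fy: float        # focal length in pixels along y
    cx: float        # principal point, x coordinate in pixels
    cy: float        # principal point, y coordinate in pixels
    k1: float = 0.0  # first radial distortion coefficient (hypothetical)
    k2: float = 0.0  # second radial distortion coefficient (hypothetical)

    def project(self, point):
        """Project a 3D point given in camera coordinates to pixel coordinates."""
        x, y, z = point
        if z <= 0:
            raise ValueError("the point must lie in front of the camera")
        # Perspective division gives normalized image coordinates.
        xn, yn = x / z, y / z
        # Radial distortion scales the normalized point by a polynomial in r^2.
        r2 = xn * xn + yn * yn
        factor = 1.0 + self.k1 * r2 + self.k2 * r2 * r2
        # The intrinsic parameters map the distorted point to pixels.
        return (self.fx * xn * factor + self.cx,
                self.fy * yn * factor + self.cy)
```

With `k1 = k2 = 0` the sketch reduces to the ideal pinhole projection; a negative `k1` pulls points towards the image centre, which corresponds to barrel distortion.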
An alternative approach to camera calibration also uses a pinhole as the camera model. The rotation and transformation matrices are used to convert 3D coordinates from the camera coordinate system to the general coordinate system. The Levenberg-Marquardt, Extended Kalman Filter and Bard-Deming algorithms were combined with these matrices and then used to obtain the camera parameters. The Bard-Deming method achieved the best result; however, it uses a lot of memory and is time-consuming. Global nonlinear minimization was performed in a multiview context. The Sorensen method was used for the first estimation of the intrinsic parameters. Sub-pixels were extracted from the image, and an interpolation of these sub-pixels is used to optimize the calibration accuracy. A grid is used for the calibration, and intersections are detected on the grid. The calibration is performed on greyscale pictures (Devy et al., 1997). Chen et al. (2012a) address this problem in a similar way. Lindner et al. (2010) develop an alternative approach, which uses chessboard detection for camera calibration. Chessboard detection is also used by Chen et al. (2012b), de la Escalera and Armingol (2010), Fathi and Brilakis (2016), Prokos et al. (2012) and Bennett and Lasenby (2014). Placht et al. (2014) use the OpenCV library for image processing. Chu et al. (2013) also deal with camera calibration using a chessboard pattern and use Matlab for image processing. Laureano et al. (2015) likewise focus on camera calibration using a chessboard pattern and use the Matlab image database for testing. Kolomazník et al. (2013) deal with an approach to obtaining a depth map. This approach uses the BoofCV library for image processing, together with the SURF fast Hessian and Canny edge methods.
Kamencay et al. (2012) develop an alternative approach, which uses a modified Sum of Absolute Differences (SAD) algorithm to obtain the depth map. George and George (2014) and Gu et al. (2014) use the OpenCV library for camera calibration and for obtaining the depth map. Wang et al. (2017) developed a similar approach; these authors use the Matlab toolbox for camera calibration and the OpenCV library for obtaining the depth map. The use of the OpenCV library for calibration and for obtaining the depth map is described in Dröppelmann et al. (2010). Revuelta Sanz et al. (2011) use a PCI (pseudo-color image) for the matching process and for the segmentation of objects in the scene.
If the cameras are properly calibrated, it is possible to obtain the stereoscopic image or the depth map. Many toolboxes exist for obtaining the stereoscopic image, e.g. OpenCV, BoofCV, EmguCV or MatlabCV. Each toolbox is tied to a programming language: OpenCV to C++, BoofCV to Java and EmguCV to C#. Various methods can be used to obtain the depth map. They differ in the type of algorithm used to compare the images. The Block-Matching (BM) method uses the block-matching algorithm for computing stereo correspondence, the Semi-Global Block-Matching (SGBM) method uses the semi-global block-matching algorithm, and the Graph-Cut (GC) method uses the graph-cut algorithm. Whichever method is used, its key parameters must then be set up. These parameters affect the quality of the depth map.
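The idea behind block matching can be illustrated with a naive Python sketch. It is not the BM implementation of OpenCV or EmguCV: it assumes rectified greyscale images stored as lists of rows, uses a plain sum of absolute differences (SAD) as the cost, and omits all pre- and post-filtering. The function names are hypothetical.

```python
def sad_cost(left, right, row, col_left, col_right, half):
    """Sum of absolute differences between two square blocks."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_left + dx]
                         - right[row + dy][col_right + dx])
    return total


def block_match(left, right, block_size=3, num_disparities=4):
    """Return a disparity map; -1 marks pixels that could not be matched."""
    if block_size < 1 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 1")
    half = block_size // 2
    height, width = len(left), len(left[0])
    disparity = [[-1] * width for _ in range(height)]
    for row in range(half, height - half):
        for col in range(half, width - half):
            best_cost, best_d = None, -1
            # In a rectified pair, the match in the right image lies on the
            # same row, shifted to the left by the disparity d.
            for d in range(num_disparities):
                if col - d < half:
                    break  # the block would leave the right image
                cost = sad_cost(left, right, row, col, col - d, half)
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[row][col] = best_d
    return disparity
```

Calling `block_match(left, right)` on two small greyscale images returns one disparity per pixel. Even in this sketch, the block size and the disparity range directly determine both the result and the running time.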
The key parameters are published in Rambhia (2013) and are as follows:
• minDisparity - minimum possible disparity value;
• numDisparities - maximum disparity minus minimum disparity (this parameter must be divisible by 16);
• SADWindowSize - matched block size (it must be an odd number ≥ 1);
• disp12MaxDiff - maximum allowed difference (in integer pixel units) in the left-right disparity check;
• preFilterCap - truncation value for the pre-filtered image pixels;
• uniquenessRatio - margin in percentage by which the best (minimum) computed cost function value should "win" the second best value to consider the found match correct (normally, a value within the 5-15 range is good enough);
• speckleWindowSize - maximum size of smooth disparity regions to consider their noise speckles and invalidate;
• speckleRange - maximum disparity variation within each connected component.
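The constraints quoted in the list above can be checked before a parameter set is passed to the matcher. The following Python sketch is only an illustration: the class name and all default values are hypothetical and are not the values used in this work.

```python
from dataclasses import dataclass


@dataclass
class StereoParams:
    """Container for the key depth map parameters (defaults are hypothetical)."""
    min_disparity: int = 0
    num_disparities: int = 64
    sad_window_size: int = 9
    disp12_max_diff: int = 1
    pre_filter_cap: int = 31
    uniqueness_ratio: int = 10
    speckle_window_size: int = 100
    speckle_range: int = 32

    def problems(self):
        """Return a list of messages for values that break the stated rules."""
        issues = []
        if self.num_disparities <= 0 or self.num_disparities % 16 != 0:
            issues.append("numDisparities must be a positive multiple of 16")
        if self.sad_window_size < 1 or self.sad_window_size % 2 == 0:
            issues.append("SADWindowSize must be an odd number >= 1")
        if not 5 <= self.uniqueness_ratio <= 15:
            # Not an error, only a hint based on the recommended range.
            issues.append("uniquenessRatio is outside the usual 5-15 range")
        return issues
```

A GUI that lets the user change these parameters interactively could run such a check after every change and refuse to recompute the depth map while the list of problems is not empty.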

METHODOLOGY AND DATA
As the first step, a stereoscopic head was constructed for the purpose of creating a stereoscopic image. It was built from two Basler acA1600-20uc industrial cameras with Computar M2514-MP2 lenses and is shown in Fig. 2. The key parameters of these cameras are a frame rate of 20 fps, a resolution of 1624×1234 and a chip size of 1/1.8". The key parameters of the lenses are a focal length of 25 mm and an optical size of 2/3". The solution was developed in the C# programming language, and the EmguCV library was used for image processing. The designed algorithm is shown in Fig. 3 and consists of 4 parts. The first part is the acquisition of the image. The key camera properties, namely digital noise, brightness and shutter mode, are set up during the initialization of the camera in order to obtain a noiseless image. Only a single image is acquired from each camera, which means that the cameras do not operate as a stream. The obtained data is converted to PNG format during image acquisition. The images are then slightly smoothed by using the bilateral filter and converted to greyscale.
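The reason a bilateral filter suits this step is that it suppresses noise while keeping edges sharp. A minimal Python sketch on a one-dimensional signal shows the principle; it is not the EmguCV implementation, and the function name and default parameters are hypothetical.

```python
import math


def bilateral_filter_1d(signal, radius=2, sigma_space=1.0, sigma_range=10.0):
    """Smooth a 1D signal while preserving large intensity steps."""
    result = []
    for i, center in enumerate(signal):
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # The spatial weight falls off with the distance between samples.
            spatial = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            # The range weight falls off with the intensity difference, so
            # samples on the other side of an edge contribute almost nothing.
            tonal = math.exp(-((signal[j] - center) ** 2)
                             / (2.0 * sigma_range ** 2))
            weight = spatial * tonal
            weight_sum += weight
            value_sum += weight * signal[j]
        result.append(value_sum / weight_sum)
    return result
```

Applied to a noisy step such as `[10, 12, 9, 11, 10, 200, 202, 199, 201, 200]`, the small fluctuations on both sides are averaged out, whereas the jump between the two levels is not blurred.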
Camera calibration is the next part of this solution. Chessboard detection in the acquired image is used for camera calibration. The detection is performed 10 times with the chessboard at different positions. Intersections are detected on the chessboard and the resulting lines are subsequently drawn. The detection is performed by using the EmguCV functions FindChessboardCorners and FindCornerSubPix.
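Chessboard-based calibration routines typically pair the detected intersections with the known positions of the same intersections on the physical board. The source does not describe this step, so the following Python sketch is an assumption about how such reference points could be generated; the function names, the pattern size and the square size are hypothetical.

```python
def chessboard_object_points(cols, rows, square_size):
    """Return 3D coordinates of the inner corners, row by row, on the Z = 0 plane."""
    return [(c * square_size, r * square_size, 0.0)
            for r in range(rows)
            for c in range(cols)]


def object_points_for_views(cols, rows, square_size, views=10):
    """Repeat the same board geometry once per chessboard position."""
    grid = chessboard_object_points(cols, rows, square_size)
    return [list(grid) for _ in range(views)]
```

For a board with 9 × 6 inner corners and 25 mm squares, `object_points_for_views(9, 6, 25.0)` yields 10 identical lists of 54 points, one for each of the 10 detections.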
The intrinsic and extrinsic camera parameters are determined in the next part, again by using EmguCV functions. The functions StereoCalibrate and cvStereoRectify were used for this purpose.
The depth map is obtained in the last part. The key properties of the depth map, which

RESULTS
The implementation process was described in the previous chapter. Several problems were encountered in the implementation of the depth map. The first problem was the format of the data acquired by the Basler camera. The camera captures the scene as raw data; the conversion to PNG format was therefore made using a storage sequence. The resolution of the image was another problem: calculating the depth map at the full resolution of 1624 × 1234 takes too long. The obtained image was therefore scaled to a resolution of 800 × 600. A GUI was developed for testing purposes. It renders the obtained image and the depth map and allows the parameters of the depth map, which were discussed in Chapter 2, to be set.
Objects of various shapes, e.g. cups or cones, were used for testing, which was carried out in several lighting conditions. In the first step, the methods for obtaining the depth map were tested, namely the Block-Matching method (Fig. 4a) and the Semi-Global Block-Matching method (Fig. 4b). The SGBM method was evaluated as the better of the two. The quality of the depth map was tested in the next step, and the results are shown in Fig. 5. It was first tested in a special cage with LED lighting, located in the laboratory of intelligent systems, both with a special photo background (Fig. 5a-5c) and without it (Fig. 5d-5f). It was also tested in normal lighting conditions (Fig. 5g-5i). The quality of the depth map was evaluated as acceptable.
Some shortcomings were revealed in the testing process. The deformation of the depth map is the biggest shortcoming of this solution. It takes the form of deformed edges and digital noise in the depth map. The lighting conditions probably affect these deformations, because they may cause oversaturation of some parts of the image or produce digital noise. Further research will therefore focus on removing these deformations. The lighting conditions bring a further problem: no global setup of the key parameters exists for different lighting conditions, so the key parameters must be reset whenever the lighting conditions change. The final depth map, the result of the author's work, is shown in Fig. 6.

DISCUSSION AND CONCLUSIONS
This article deals with the use of stereoscopic images for creating a depth map and with the associated problems. Image quality and camera calibration need to be addressed when working with stereoscopic images; these problems are examined in Section 1. A stereoscopic head was constructed to obtain the stereoscopic image, and an algorithm for camera calibration and the creation of the depth map was implemented. The algorithm was developed in the C# programming language, which also allows control of the Katana 300s robotic arm. The problem with image quality was solved by using a bilateral filter and by setting key properties of the camera. Finally, the implemented solution was tested.
The result of the developed solution was also compared with another solution, the one in the EmguCV tutorial (Emgu CV, 2012). The depth map in the EmguCV tutorial shows a human head, whereas the author's depth map shows some everyday objects. In this comparison, the author's solution was evaluated as more usable in normal conditions, while the depth map of the EmguCV tutorial is better in an ideal setting and conditions. The solution has some shortcomings, which are discussed above in the Results chapter and will be removed in the future. The use of the depth map brings several benefits: the detection of obstacles in a scene, the detection of the object manipulation sequence and the detection of the distance of each object.
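The distance of an object can be estimated from its disparity with the standard relation Z = f · B / d, which holds for a rectified, parallel camera pair. The source does not give this formula, so the following Python sketch is an illustration under that assumption; the function names are hypothetical, and the pixel size and baseline must be taken from the sensor datasheet and the actual stereoscopic head.

```python
def focal_length_pixels(focal_mm, pixel_size_um, scale=1.0):
    """Convert a focal length in millimetres to pixels.

    scale is the ratio between the processed and the native image width,
    because resizing an image changes the focal length expressed in pixels.
    """
    return focal_mm * 1000.0 / pixel_size_um * scale


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Return the distance in metres, or None for an invalid disparity."""
    if disparity_px <= 0:
        return None  # no match, or a point at infinity
    return focal_px * baseline_m / disparity_px
```

Because the distance is inversely proportional to the disparity, a one-pixel disparity error matters far more for distant objects than for near ones.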

ACKNOWLEDGEMENTS
The published results were obtained with the financial support of the Ministry of Education, Youth and Sports of the Czech Republic, research plan IGA MENDELU MP PEF_DP_2016002 "Stereoscopic analysis and objects recognition".