Depth camera based dataset of hand gestures

The dataset contains RGB and depth video frames of various hand movements captured with the Intel RealSense Depth Camera D435. The camera has two channels for collecting RGB and depth frames simultaneously. The dataset was created for the accurate classification of hand gestures under complex backgrounds. It comprises 29,718 RGB and depth frames corresponding to various hand gestures performed by different people at different time instances against complex backgrounds. Hand movements corresponding to scroll-right, scroll-left, scroll-up, scroll-down, zoom-in, and zoom-out are included. Each sequence contains 40 frames, and there are a total of 662 sequences corresponding to each gesture in the dataset. To capture all the variations, the hand was oriented in various ways during capture.


Specifications Table
This section lists the details of the hardware and the procedure used for collecting the data, followed by the format of the data.

Value of the Data
• The dataset is useful for developing novel machine learning algorithms that efficiently classify and recognise different hand gestures in video.
• The dataset is useful for researchers working on computer vision who need to develop machine learning algorithms for the proper classification and recognition of hand gestures.
• The data are useful for developing and testing novel algorithms for video hand gesture recognition.
• The data were collected for different hand movements at different time instances to integrate all possible variations into the dataset.

Objective
Most of the datasets available in the literature are captured using an RGB camera. However, these cameras are not robust to varying lighting conditions. Thus, this dataset, which was created with a depth camera, is more robust and reliable. This dataset has been used in [1] to classify hand gestures.

Data Description
The dataset contains the video frames captured with an RGB-D camera. The frames were captured from three different individuals performing different hand movements: scroll-right, scroll-left, scroll-up, scroll-down, zoom-in, and zoom-out. The dataset is divided into three sections: training, validation, and testing, with 80% of the data allocated for training, 10% for validation, and the remaining 10% for testing.
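The 80/10/10 split described above can be sketched as follows. The helper and the sequence IDs are illustrative only, not the authors' actual partitioning code; the key point is that whole 40-frame sequences, rather than individual frames, are assigned to a split:

```python
import random

def split_sequences(seq_ids, train=0.8, val=0.1, seed=0):
    """Shuffle sequence IDs and split them 80/10/10 into
    train/validation/test, keeping whole sequences together."""
    ids = list(seq_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

# Example: 100 hypothetical sequences of one gesture -> 80/10/10 split.
train, val, test = split_sequences(range(100))
```

Splitting at the sequence level avoids leaking near-identical neighbouring frames of the same gesture clip between the training and test sets.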

Experimental Design, Materials and Methods
We used an RGB-D camera module (Intel RealSense Depth Camera D435), shown in Fig. 4, to capture the hand gestures of an individual. The camera has a maximum range of 10 meters and supports two channels, one for capturing the RGB stream and the other for capturing the depth stream. It provides a field of view (FOV) of 87° × 58° for the depth version and 69° × 42° for the RGB version. The Intel RealSense D435 camera is self-calibrated and supports a hardware sync signal for multi-camera configurations [2]; we took all measurements with the default calibration settings. Normally, the depth stream has a dimension of 480 × 860 [2]. In our setup, we lowered the dimension of the depth stream to 480 × 640 to match that of the RGB stream; in this way, we tuned all the parameters to sync the RGB and depth frames. Fig. 5 depicts the complete setup for capturing the RGB and depth stream data and saving it to the computer. To maintain stability, the camera is placed on a tripod stand [3], as shown in Fig. 4, while collecting the data. The camera is connected to a Lenovo ThinkPad X1 Carbon Gen 9 computer [4] via a USB-C to USB 3.0 port to capture the video sequences. Finally, a Python script was developed to save the recorded video frames from both the RGB and depth streams to the computer.
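A capture loop of the kind described above could be sketched as follows, assuming the Intel `pyrealsense2` Python bindings. This is not the authors' actual script: the file-naming helper is hypothetical, and `capture_sequence` requires a connected D435, so its camera import is deferred:

```python
import numpy as np

def frame_path(gesture, sequence, index, stream):
    """Hypothetical naming scheme for saved frames (illustrative only)."""
    return f"{gesture}/seq{sequence:03d}/{stream}_{index:02d}.npy"

def capture_sequence(gesture, sequence, n_frames=40, fps=30):
    """Grab synced 480 x 640 RGB and depth frames from a D435 and save them."""
    import pyrealsense2 as rs  # deferred: needs the RealSense SDK and a camera
    pipe, cfg = rs.pipeline(), rs.config()
    # Request matching 640 x 480 resolutions so RGB and depth frames align.
    cfg.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, fps)
    cfg.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, fps)
    pipe.start(cfg)
    try:
        for i in range(n_frames):
            frames = pipe.wait_for_frames()  # blocks until a synced pair arrives
            depth = np.asanyarray(frames.get_depth_frame().get_data())
            color = np.asanyarray(frames.get_color_frame().get_data())
            np.save(frame_path(gesture, sequence, i, "depth"), depth)
            np.save(frame_path(gesture, sequence, i, "rgb"), color)
    finally:
        pipe.stop()
```

Saving both streams frame-by-frame, as here, keeps the 40-frame RGB and depth sequences index-aligned for later classification.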

Fig. 2. A complete set of the RGB version of the gestures.

Fig. 3. A complete set of the depth version of the gestures.

Fig. 5. The RGB-D camera setup for the collection of the hand movement frame dataset.