Published December 16, 2018 | Version v1
Dataset Open

A comprehensive video dataset for Multi-Modal Recognition Systems

  • 1. Dr. APJ Abdul Kalam Technical University, Lucknow
  • 2. University Institute of Engineering and Technology, Kanpur
  • 3. Harcourt Butler Technical University, Kanpur

Description

This fully labelled video dataset is intended as a resource for researchers and analysts in fields such as machine learning, computer vision and deep learning. Each video shows one of 67 different subjects reciting the same text, the digits from 1 to 20, within the same experimental setup.

Notes

The dataset folder contains the HD videos of the 67 subjects. Samples for one video have been uploaded together with Python scripts, which can be customized to process every video in the dataset and produce the frames, the frames with bounding box detection, the audio of the entire video, the split audio for the text being recited, and the waveforms for both the full-video audio and the split audio.

Uncompress the Video_Dataset_uploaded.zip archive. It contains two folders:

1. Main_video_dataset: all the HD videos of the 67 subjects.
2. Pre-Processed dataset and scripts: samples for a single video (frames, audio .wav files, split audio .wav files, and waveforms for both), together with the Python scripts that can be used to extract the same information for all the videos of the dataset.
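The split-audio step mentioned in these notes can be illustrated with a minimal sketch. It is not the script shipped with the dataset, whose contents are not shown here. It assumes the audio track has already been exported as an uncompressed PCM .wav file and that the start and end time of each recited digit is known; the function name `split_wav`, the output naming scheme, and the example boundaries are all hypothetical.

```python
import wave


def split_wav(src_path, boundaries_s, out_prefix):
    """Cut a PCM WAV file into one file per (start, end) pair given in seconds.

    Hypothetical helper: the boundaries must come from the user, for
    example from manual annotation. Returns the list of written paths.
    """
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for index, (start_s, end_s) in enumerate(boundaries_s, start=1):
            # Convert seconds to frame indices and clamp to the file length.
            start = min(total, max(0, int(start_s * rate)))
            end = min(total, max(start, int(end_s * rate)))
            src.setpos(start)
            frames = src.readframes(end - start)
            out_path = f"{out_prefix}_{index:02d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Reuse the source format; the frame count in the header
                # is corrected automatically when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths


# Example call with made-up file names and boundaries:
# split_wav("subject_01.wav", [(0.0, 1.2), (1.2, 2.5)], "subject_01_digit")
```

Only the standard-library `wave` module is used, so the sketch handles uncompressed PCM audio only. Extracting frames or detecting bounding boxes would need additional libraries and is left to the scripts provided in the dataset.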

Files

Video_Dataset_uploaded.zip (3.9 GB)
md5:9fef2bc782154628bfca2faf3ddc8d1f
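Because the archive is large, it can be worth checking the download against the MD5 checksum listed above before uncompressing it. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `archive_is_intact` are illustrative, and the file is assumed to have been saved locally under its listed name.

```python
import hashlib

# Checksum as published in the Files listing of this record.
EXPECTED_MD5 = "9fef2bc782154628bfca2faf3ddc8d1f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


# Example call, assuming the archive sits in the current directory:
# print(archive_is_intact("Video_Dataset_uploaded.zip"))
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.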