A comprehensive video dataset for Multi-Modal Recognition Systems

doi:10.5281/zenodo.1492227

Published December 16, 2018 | Version v1

Dataset Open

1. Dr. APJ Abdul Kalam Technical University, Lucknow
2. University Institute of Engineering and Technology, Kanpur
3. Harcourt Butler Technical University, Kanpur

This fully labelled video dataset is intended as a resource for researchers and analysts in fields such as machine learning, computer vision, and deep learning. The videos show 67 different subjects reciting similar text, the digits from 1 to 20, within the same experimental setup.

Notes

The dataset folder contains HD videos of 67 subjects. Pre-processed samples for one video are included, together with Python scripts that can be customized to process every video in the dataset. The scripts extract the frames, the frames with bounding-box detection, the audio of the entire video, the split audio for the text being recited, and the waveforms for both the full audio and the split audio.

Uncompress the Video_Dataset_uploaded folder. It contains two folders:

1. Main_video_dataset: all the HD videos of the 67 subjects.
2. Pre-Processed dataset and scripts: samples for a single video (frames, audio .wav files, split audio .wav files, and waveforms for both), plus the Python scripts that can be used to extract the same information for all the videos of the dataset.
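As an illustration of the split-audio step mentioned in these notes, the sketch below cuts one .wav file into segments using only the Python standard library. It is not the script shipped with the dataset: the function name split_wav, the output naming scheme, and the idea of passing segment boundaries in seconds are all assumptions made for this example, and the real boundaries for each recited digit would have to come from the dataset's labels or from your own segmentation.

```python
import wave
from pathlib import Path


def split_wav(src, boundaries_sec, out_dir):
    """Write one WAV file per (start, end) pair in seconds; return the new paths.

    Hypothetical helper for illustration only; it is not part of the dataset scripts.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        total = reader.getnframes()
        for index, (start, end) in enumerate(boundaries_sec, start=1):
            # Convert seconds to frame positions and clamp to the file length.
            first = min(int(start * rate), total)
            last = min(int(end * rate), total)
            reader.setpos(first)
            frames = reader.readframes(max(last - first, 0))
            target = out_dir / f"{Path(src).stem}_part{index:02d}.wav"
            with wave.open(str(target), "wb") as writer:
                # Reuse the source format; the frame count in the header is
                # corrected automatically when the writer is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(target)
    return written
```

A call such as split_wav("subject01.wav", [(0.0, 1.2), (1.2, 2.5)], "split_audio") would write two segment files, where the file name and the boundaries are placeholders rather than values taken from the dataset.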

Files

Name	Size	MD5
Video_Dataset_uploaded.zip	3.9 GB	9fef2bc782154628bfca2faf3ddc8d1f
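Because the archive is several gigabytes, it may be worth checking the download against the MD5 value listed for Video_Dataset_uploaded.zip before uncompressing it. The following is a minimal sketch using Python's hashlib; the helper names md5_of_file and verify_download are assumptions introduced for this example, and the local path passed in is whatever location you saved the archive to.

```python
import hashlib

# Checksum as listed for Video_Dataset_uploaded.zip in the file table.
EXPECTED_MD5 = "9fef2bc782154628bfca2faf3ddc8d1f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use flat regardless of file size. A False result from verify_download most likely indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.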

	All versions	This version
Views	1,051	1,050
Downloads	89	89
Data volume	379.1 GB	379.1 GB
