NSL23 dataset for alphabets of Nepali sign language

Nepali Sign Language (NSL) is used by the Nepali-speaking community in Nepal and in Indian states such as Sikkim, the hilly region of North Bengal, some parts of Uttarakhand, Meghalaya, and Assam. It consists of the International Manual Alphabet (A-Z), Nepali consonants, vowels, conjunct letters, and numbers represented in the form of one-handed fingerspelling or Nepali manual alphabet. The standard gestures for NSL have been published by the Nepal National Federation of the Deaf & Hard of Hearing (NFDH). To learn Nepali Sign Language, the first step is to understand its alphabet set. The use of technology can help ease the learning process. One of the application areas of computer vision is translating sign language gestures to either text or audio to facilitate communication. This is an open research area. However, NSL translation is one of the less explored research areas because there is no dataset available to work on for NSL. This paper introduces the Nepali Sign Language Dataset (NSL23), which is the first of its kind and includes vowels and consonants of the Nepali Sign Language alphabet. The dataset consists of .mov videos performed by 14 volunteers who have demonstrated 36 consonant signs and 13 vowel signs either in one full video or character by character. The dataset has been prepared under various conditions, including normal lighting, dark lighting conditions, prepared environments, unprepared environments, and real-world environments. The volunteers who performed the NSL gesture have been classified as 9 beginners who are using NSL for the first time and 5 experts who have been using NSL for 5 to 25 years. NSL23 contains 630 total videos representing 1205 gestures. The dataset can be used to train machine learning models to classify the alphabet set of NSL and further develop a sign language translator.

Video specifications: 4K@30fps, 1080p@30/60/120fps, 720p@240fps.

Data Acquisition Process
• Data were acquired in the form of video (.mov).
• Each volunteer was asked to perform hand gestures facing the camera; gestures were performed using only the right hand.
• To ease cropping and extraction operations during pre-processing, the hand performing the gesture was placed beside the face.
• The duration of videos captured character by character varied from 1 to 5 s, while videos comprising all characters ranged from 14.45 s to 4.5 min.
• Environment (prepared, unprepared), lighting (bright, dark, normal), and location (indoor, outdoor) were considered while capturing the data.
• Each volunteer performed 36 consonant and 13 vowel signs, either as one full video or one character per video.

Value of the data
• The users of Nepali Sign Language will benefit greatly from this dataset, as it is the first of its kind and will be used by researchers to develop automatic Nepali Sign Language recognition applications. There are many existing datasets for other languages [2-5, 7, 8, 10-13, 15, 17, 19-23].
• This dataset can be used to develop a Nepali Sign Language translator as a means of communication between the deaf and non-deaf communities.
• This dataset will help researchers and developers train and test machine learning and deep learning models [3, 4, 18] created specifically for automated detection and recognition of Nepali Sign Language hand signs.
• The Nepali Sign Language dataset opens up a new avenue for future study and development of real-world gesture recognition, as it comprises real-world conditions [3, 6, 8] with varying lighting conditions, backgrounds, and hand positioning.
• It will benefit Nepali speakers who want to learn Nepali Sign Language, as well as those developing translation applications that serve as a communication tool between NSL users and non-users [1, 14, 19].
• It can be used to create new, similar datasets or to expand the dataset by replicating the samples or adding new samples of Nepali Sign Language in different background conditions, lighting conditions, and orientations to further improve the NSL23 dataset.

Background
Several hand gesture datasets are available for public use [2-5, 7, 8, 10-13, 15, 17, 19-23]. However, most of them lack variation in the gestures they provide. While some publicly available sign language datasets offer a collection of frames to represent a gesture, others provide videos that were collected under the same environmental and lighting conditions. This paper aims to create a comprehensive dataset that includes various lighting conditions (indoor, outdoor, bright, dark, and natural lighting), considers both prepared and unprepared backgrounds, and involves volunteers performing gestures at different positions and heights in a real-world setting. The objectives of the dataset are:
• To create a dataset that provides the Nepali Sign Language alphabet set (consonants and vowels) and make it publicly available.
• To provide a dataset with variations in environments and lighting conditions.
• To provide the dataset in raw form so that users can preprocess the data as per their requirements and use it to train or test supervised, semi-supervised, and unsupervised machine learning and deep learning models [6, 9, 10, 16, 18].
• To encourage researchers to start working on Nepali Sign Language translation.

Data Description
The NSL23 dataset contains two folders: one for consonants and one for vowels. Each folder contains subfolders which in turn contain videos. The details of both folders are provided in Tables 1 and 2, respectively. Table 1 shows the organization of the subfolders containing the 36 consonants of Nepali Sign Language.
Table 2 shows the organization of the subfolders containing the 13 vowels of Nepali Sign Language.
The nomenclature in Tables 3 and 4 is used to identify each video alphabetically. The direction of the arrow in a hand gesture represents the direction of movement of the hand while performing a dynamic gesture. Volunteers were given the tables to understand and learn the gestures before capturing their videos, as shown in Fig. 3.
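The two-folder layout described above can be indexed with a short script. The following is a minimal sketch in Python; the top-level folder names (NSL_Consonant, NSL_Vowel) follow Tables 1 and 2, but the function name and any example paths are our own, not part of the dataset:

```python
from pathlib import Path

def index_dataset(root):
    """Map each condition subfolder to the .mov files it contains.

    Assumes the NSL23 layout described above: two top-level folders
    (NSL_Consonant, NSL_Vowel), each holding per-volunteer/condition
    subfolders of videos. Missing folders are skipped.
    """
    index = {}
    for top in ("NSL_Consonant", "NSL_Vowel"):
        top_dir = Path(root) / top
        if not top_dir.is_dir():
            continue
        for sub in sorted(top_dir.iterdir()):
            if sub.is_dir():
                index[sub.name] = sorted(v.name for v in sub.glob("*.mov"))
    return index
```

Calling `index_dataset` on the extracted dataset root returns a dictionary keyed by subfolder name, which can then be filtered by volunteer or condition.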

NSL_Consonant
Each folder inside NSL_Consonant and NSL_Vowel has been labeled using the following naming convention: volunteer number (S1 to S14) _NSL_Consonant (or Vowel) _Bright (or Dark, Prepared, Unprepared, RealWorld) _Cropped (if the video is cropped). Each folder consists of videos in which volunteers demonstrate the hand gestures for the consonant and vowel alphabets of NSL, respectively. If a video displays one character of the NSL consonants/vowels at a time, it is named using the convention specified in Tables 3 and 4; otherwise, the word "all" signifies that the video contains all the alphabets of the NSL consonants/vowels in a single video. Tables 1 and 2 provide the detailed contents of NSL_Consonant and NSL_Vowel, respectively. Users can follow the naming convention above to access the corresponding data as required.
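The naming convention above is regular enough to parse programmatically. The following is a minimal sketch; the regular expression and the field names in the returned dictionary are our own, not part of the dataset:

```python
import re

# Matches folder names such as "S3_NSL_Consonant_Dark_Cropped",
# following the convention described above: volunteer (S1-S14),
# alphabet set, capture condition, and an optional "Cropped" suffix.
FOLDER_RE = re.compile(
    r"(?P<volunteer>S(?:[1-9]|1[0-4]))_NSL_"
    r"(?P<alphabet>Consonant|Vowel)_"
    r"(?P<condition>Bright|Dark|Prepared|Unprepared|RealWorld)"
    r"(?:_(?P<cropped>Cropped))?"
)

def parse_folder_name(name):
    """Return the components of an NSL23 folder name, or None if the
    name does not follow the convention."""
    m = FOLDER_RE.fullmatch(name)
    if m is None:
        return None
    return {
        "volunteer": m.group("volunteer"),
        "alphabet": m.group("alphabet"),
        "condition": m.group("condition"),
        "cropped": m.group("cropped") is not None,
    }
```

This lets users select subsets of the data (for example, only uncropped RealWorld consonant videos) without hand-maintaining file lists.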

Experimental Design, Materials, and Methods
To learn any sign language [2-4, 7, 20, 22], one must start with its alphabet. This paper focuses on the construction of a Nepali Sign Language alphabet (consonant and vowel) dataset, named NSL23. To prepare a dataset that machine learning models can use to recognize real-world NSL gestures, training data needs to be collected. A faculty member of a special school was approached to assist with the data collection process. He teaches Nepali Sign Language to differently abled students of classes 1 to 5. With his support, it was possible to collect raw data in the form of videos in prepared and unprepared environments. The prepared environment was created by pasting black chart paper on the background while the volunteer performed the gesture facing the camera placed in front of them. Fig. 1 shows some volunteers and the environments used during data collection. In total there are 14 volunteers, of whom 5 are native users of Nepali Sign Language and are either faculty using NSL or students who have been using it for more than 5 years.
Data has been captured in first-person view [8], meaning the camera is placed in front of the volunteer performing the gesture. To increase the scope of the dataset, various environmental and illumination conditions have been considered. The videos were taken indoors in prepared (as shown in Fig. 1) and unprepared settings, as well as outdoors, to make the dataset more dynamic. Fig. 2 shows the different real-world environments considered while acquiring the data. The initial phase of collecting the NSL23 dataset comprises 630 total videos and 1205 gestures performed by 14 volunteers, labeled S1 to S14 to identify each volunteer. Volunteers S3, S4, S5, S6, and S14 are experts in NSL, so they provided the consonant and vowel gestures in one take. The remaining volunteers were new to NSL, so their signs had to be captured character by character, as shown in Fig. 3. A prepared environment was created by putting black chart paper in the background to ease the segmentation of foreground from background. Outdoor videos were shot in natural light, while indoor videos were shot both with natural light and with the room lights turned on. To indicate the illumination condition, all videos have been labeled as dark or bright depending on the lighting environment used during the shoot. To increase variation in the dataset, some videos were preprocessed using Adobe Express tools to crop and keep only the hand portion; these videos are labeled "cropped". The dataset was captured using the process shown in Fig. 3 and structured using the methodology illustrated in Fig. 4.
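The dark/bright labels were assigned manually during collection. Users who extend the dataset could sanity-check such labels automatically from mean pixel intensity. The sketch below is our own illustration; the threshold of 80 is a hypothetical cut-off, not a value used in preparing NSL23:

```python
def label_lighting(gray_frames, threshold=80):
    """Label a clip "bright" or "dark" from its mean grayscale intensity.

    gray_frames: iterable of frames, each a 2-D list of values in 0-255
    (e.g. obtained by decoding a video and converting to grayscale).
    The threshold is a hypothetical cut-off for illustration only;
    NSL23's own dark/bright labels were assigned manually.
    """
    total, count = 0, 0
    for frame in gray_frames:
        for row in frame:
            total += sum(row)
            count += len(row)
    if count == 0:
        raise ValueError("no pixels in input")
    return "bright" if total / count >= threshold else "dark"
```

A real pipeline would decode frames with a video library and possibly average over sampled frames only, but the decision rule stays the same.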
To curate the dataset, we had to identify which videos to include or discard, label each included video, and structure the dataset, all of which required human intervention. This process was time-consuming. To label each video, the nomenclature given in Tables 3 and 4 was used.
Figs. 5 and 6 show the different lighting conditions used in the dataset. Videos containing incomplete or wrong gestures were manually discarded. For videos labeled as cropped, the videos were trimmed and cropped using Adobe Express (an online video editing tool) [source: https://www.adobe.com/express], as shown in Fig. 7. No other preprocessing has been performed on NSL23, so that users can apply preprocessing techniques as per their needs and make the dataset suitable for their purpose.
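Because the dataset ships as raw video, a typical first preprocessing step for users is converting each clip into a fixed-length frame sequence for a classifier. The helper below, our own sketch and not part of NSL23, computes which frame indices to keep for uniform subsampling; the frames themselves can then be read with a video library such as OpenCV:

```python
def sample_indices(total_frames, n_samples):
    """Pick n_samples frame indices spread uniformly across a clip.

    Returns an empty list for empty clips or non-positive sample
    counts; otherwise returns n_samples indices in [0, total_frames).
    Indices may repeat when the clip is shorter than n_samples.
    """
    if total_frames <= 0 or n_samples <= 0:
        return []
    step = total_frames / n_samples
    return [min(total_frames - 1, int(i * step)) for i in range(n_samples)]
```

For example, a 5 s clip at 30 fps has 150 frames, and `sample_indices(150, 16)` selects 16 roughly evenly spaced frames from it.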

Ethics Statements
This data collection work involved human subjects. Written consent was obtained from all subjects/participants, and scanned copies are attached as supplementary material with this submission.

Data Availability
Nepali Sign Language -Consonant and Vowel (Original data) (zenodo.org)

Fig. 2. Outdoor and Indoor Environment, Unprepared Environment to Showcase Real World Scenario.

Table 1
Description of NSL23 Consonant Dataset.

Table 3
Nomenclature used for consonant videos (All Static Gesture).

Table 4
Nomenclature used for vowel videos (Static and Dynamic gesture).