Deep Learning Approaches for Age-based Gesture Classification in South Indian Sign Language

This study focuses on recognizing and categorizing South Indian Sign Language gestures based on different age groups through transfer learning models. Sign language serves as a natural and expressive communication method for individuals with hearing impairments. The intention of this study is to develop deep transfer learning models, namely Inception-V3, VGG-16, and ResNet-50, to accurately identify and classify double-handed gestures in South Indian languages, like Kannada, Tamil


I. INTRODUCTION
The rapid evolution of AI technologies has opened up significant opportunities to improve the quality of life of individuals with disabilities. According to the World Health Organization, approximately 1.2 billion people, or 15% of the worldwide population, live with some type of disability. Recognizing the need for inclusivity, researchers have leveraged AI solutions to assist the disabled community in daily activities. Consequently, AI, accessibility, and disability have become closely intertwined, with extensive research in AI and machine learning directed towards supporting the daily lives of disabled individuals. AI has made significant strides in facilitating non-verbal communication for persons with impairments, including the recognition and understanding of sign language and gestures. Sign language, which relies on hand gestures and body motions, serves as a vital means of communication for people with hearing impairments. Advancements in computer vision and pattern recognition have been instrumental in enabling AI systems to interpret and respond to sign language effectively. Despite this recent progress, fast and accurate recognition of hand gestures remains a challenging task [1]. Researchers are continuously working to improve the performance of AI models in recognizing and categorizing these gestures. Gestures may vary in speed, orientation, and hand alignment. Age category divisions for hand gesture performance are not rigidly defined, as there is considerable overlap and individual variation in hand gesture
Adolescence (8-25 years). As children advance through middle childhood to adolescence, their gesture vocabulary expands further, and they become more adept at using gestures for social interaction.
Adulthood (25+ years). Adults typically possess a fully developed gesture repertoire. They use gestures to enhance verbal communication, convey complex ideas, express emotions, and adhere to cultural norms and social cues.
Gesture accuracy gradually changes as individuals grow up.
The literature on pattern recognition applied to sign language gesture identification reveals the following key findings. Authors in [2] employed a time-distributed Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) to extract skin features from hand images. The models achieved an accuracy of 96.5% in categorizing hand images into 17 age groups ranging from 18 to 75 years old. Authors in [3] utilized MRELBP to extract features from right and left hand images. The images were classified by person, age, and gender, with reported accuracies of 91.4%, 85.9%, and 92.6%, respectively. Authors in [4] presented age detection on an image dataset using combinations of deep learning and image processing techniques, achieving an accuracy of 91%. The survey shows that noteworthy research has been conducted on sign language and hand gesture identification, with particular emphasis on deep learning techniques for recognizing human poses and scenes from images [5][6][7][8][9]. However, there is a noticeable gap in the identification of South Indian Sign Language (SISL) gesture images with respect to different age groups [10]. This article aims to address this gap and provide a novel method for recognizing SISL gestures across various age groups.

II. PROPOSED METHODOLOGY
The proposed method consists of two primary stages. The first organizes the image dataset into age groups, and the second employs deep learning methods to classify the gestures. Figure 1 shows a block diagram of these stages.

A. Data Preparation
The dataset preparation process involves selecting 10 categories of double-hand gestures for each age group. High-resolution images were captured using a Nikon D3300 camera with a resolution of 24.2 megapixels. The images were taken against black or green backgrounds under natural lighting conditions. Each image has an original size of 1080 × 2400 pixels. For the first age group (ages 1 to 7), the dataset consists of 5000 images, with 500 images per gesture category. Image augmentation methods were applied to expand the dataset to 10,000 images, enhancing its diversity. To ensure efficient processing and storage, all images were resized to 300 × 300 pixels. The dataset was divided into three subsets: training (70%), validation (15%), and testing (15%). The training subset contains 7000 images, while the validation and testing subsets contain 1500 images each. This division provides enough data for training and for a reliable evaluation of the models. The same methodology was applied to the other age groups. Following this systematic approach yields a comprehensive dataset of double-hand gestures that covers the considered age groups and supports both the development of recognition models and the estimation of their accuracy. Figure 2 displays a sample of the dataset.
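The 70/15/15 division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say whether the split was stratified by gesture class, and the function name `stratified_split`, the file names, and the fixed seed are hypothetical. The experiments themselves were run in MATLAB.

```python
import random

def stratified_split(paths_by_class, train_pct=70, val_pct=15, seed=0):
    """Split each class into train/val/test so every class keeps the same ratio.

    Stratification is an assumption of this sketch; the source only gives the
    overall 70/15/15 proportions.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, paths in sorted(paths_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_train = len(shuffled) * train_pct // 100
        n_val = len(shuffled) * val_pct // 100
        splits["train"] += [(p, label) for p in shuffled[:n_train]]
        splits["val"] += [(p, label) for p in shuffled[n_train:n_train + n_val]]
        # Whatever remains after train and validation goes to the test subset.
        splits["test"] += [(p, label) for p in shuffled[n_train + n_val:]]
    return splits

# Hypothetical file names: 10 gesture classes with 1000 augmented images each,
# matching the 10,000 images reported for one age group.
paths_by_class = {
    f"gesture_{c}": [f"gesture_{c}/img_{i:04d}.jpg" for i in range(1000)]
    for c in range(10)
}
splits = stratified_split(paths_by_class)
sizes = {name: len(items) for name, items in splits.items()}
print(sizes)  # {'train': 7000, 'val': 1500, 'test': 1500}
```

With 1000 images per class, each class contributes 700 training, 150 validation, and 150 test images, which reproduces the subset sizes reported for one age group.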

B. CNN Classifiers
In this study, three prominent transfer learning CNN [11] models are utilized for the recognition and classification of double-handed gesture images. These models are Inception-V3 [12], VGG-16 [13], and ResNet-50 [14], which are acknowledged for their effectiveness in image recognition.

III. EXPERIMENTAL RESULTS AND DISCUSSION
The experiments on double-handed gesture classification for different age categories were conducted using the Deep Learning Toolbox of the MATLAB R2022b platform. The pre-trained CNN models were imported and prepared for transfer learning by modifying the properties of the relevant layers in the Deep Network Designer application. Specifically, the last learnable layer and the output (classification) layer were replaced to match the classes in the newly constructed double-handed gesture image dataset. To control the training process, specific options were set for the CNN models. The initial learning rate, validation frequency, number of epochs, and mini-batch size were set to 0.0001, 10, 30, and 35, respectively. These values were chosen based on experimentation and empirical knowledge to achieve optimal training performance. All hidden layers in the CNN models used the Rectified Linear Unit (ReLU) activation function, which is widely used in deep learning because it introduces non-linearity and helps mitigate vanishing gradients. The output layer, responsible for classification, used the softmax function, which produces a probability distribution over the classes. The Stochastic Gradient Descent (SGD) algorithm was employed to fine-tune the networks. The customized CNN models were trained and validated on the prepared augmented image dataset.

The training progress of each CNN model, along with the corresponding validation accuracy and loss, is shown in Figures 3 to 5. These figures give a graphical view of the training process and help in assessing each model's performance and convergence. The confusion matrices provide insight into the performance of the models by showing the distribution of predicted versus actual class labels. They allow misclassifications and systematic patterns in the predictions to be identified, which helps in measuring how well a model recognizes double-handed gestures. Together, the training progress and the confusion matrices indicate the performance and reliability of the CNN models in classifying double-handed gestures for different age categories.

Table II presents the average scores of the evaluation metrics for all the considered CNN models across the classes of double-handed gesture images, which include different age categories. According to Table II, the Inception-V3 model achieves the highest performance among the three models, followed by ResNet-50 and VGG-16. The Inception-V3 model obtains the best evaluation metric scores and the highest validation and testing accuracies of 95.20%, 92.50%, and 90.20% for age groups 1 to 7, 8 to 25, and 25 and above, respectively. Figure 7 visually compares the performance of all the considered pre-trained CNN models.

According to the experimental results, the class labeled "2M", representing the word "ನವಿಲು / నెమలి / மயில் / Peacock" in the age group from 8 to 25, achieved the maximum validation accuracy among all the double-hand gestures, while the class labeled "2F", representing the word "ಅ / అడుగు / அடி / Foot" in the same age group, obtained the lowest validation accuracy. These results indicate that the performance of the CNN models varies across double-hand gestures depending on their age groups and letters. The high accuracy values of the Inception-V3 model indicate its efficacy in classifying double-hand gestures by age group and suggest that it is well suited to recognizing and categorizing South Indian Sign Language gestures across different age groups.
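Two parts of this section lend themselves to a short worked example: the training schedule implied by the reported options, and the way a confusion matrix is turned into evaluation scores. The Python sketch below illustrates both and is not the MATLAB code used in the study. It assumes that the validation frequency is counted in iterations, that partial mini-batches are dropped, and that the metrics are accuracy, precision, recall, and F1-score (the section does not name them). The function names and the 3-class matrix `toy_cm` are hypothetical and are not results from the paper.

```python
def training_schedule(n_train, batch_size, epochs, val_every):
    """Return (iterations per epoch, total iterations, approx. validation passes)."""
    iters_per_epoch = n_train // batch_size  # full mini-batches only (assumption)
    total_iters = iters_per_epoch * epochs
    return iters_per_epoch, total_iters, total_iters // val_every

def confusion_metrics(cm):
    """Compute accuracy and per-class (precision, recall, F1).

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k samples that were missed
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return correct / total, per_class

# Reported options: 7000 training images, mini-batch 35, 30 epochs,
# validation every 10 iterations.
schedule = training_schedule(7000, 35, 30, 10)
print(schedule)  # (200, 6000, 600)

# Hypothetical 3-class confusion matrix, for illustration only.
toy_cm = [[48, 2, 0],
          [3, 45, 2],
          [0, 5, 45]]
accuracy, per_class = confusion_metrics(toy_cm)
print(round(accuracy, 4))  # 0.92
```

Under these assumptions, the reported options correspond to 200 iterations per epoch and 6000 iterations in total for one age group. Averaging the per-class tuples returned by `confusion_metrics` would give macro-averaged scores of the kind a summary table typically reports.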

www.etasr.com Badiger et al.: Deep Learning Approaches for Age-based Gesture Classification in South Indian Sign …
IV. CONCLUSION
The current study utilized popular pre-trained CNN models, namely Inception-V3, ResNet-50, and VGG-16, to classify double-hand gesture images across 30 different classes, covering 10 signs in each of three age categories (1-7, 8-25, and 25 and above). These models were customized and fine-tuned to accommodate the specific image classes and improve classification performance. All three models performed well. However, Inception-V3 emerged as the top-performing model, achieving an average classification accuracy of 95.20%, which indicates its effectiveness in classifying double-hand gestures by age category. The outcomes of this work have potential applications in constructing automated systems that can identify South Indian sign language gestures from both still images and streaming videos [15][16][17]. Such systems can reduce communication barriers for sign language users and allow easier and more effective communication with the outside world. Future research could explore more recent CNN models and incorporate publicly available image datasets to extend the image dataset employed in this study.

Fig. 3. Validation accuracy and loss graph while training the Inception-V3 CNN model.

Figure 6 displays the confusion matrix of the trained Inception-V3 CNN model on the test dataset.

Fig. 4. Validation accuracy and loss graph while training the ResNet-50 CNN model.

Fig. 6. Confusion matrix of the test dataset for the trained Inception-V3 CNN model.

Fig. 7. Performance evaluation results of all the pre-trained CNN models.

Table I lists the double-handed gestures considered in this work, along with their respective letters in various South Indian languages, for the different age groups.