The Optimization of Numerical Algorithm Parameters with a Genetic Algorithm to Animate Letters of the Sign Alphabet

Abstract: At present, the development of animation-based works for human–computer interaction applications has increased. To generate animations, actions are pre-recorded and animation flows are configured. In this research, from two images of letters of the sign language alphabet, intermediate frames were generated using a numerical traced algorithm based on homotopy. The parameters of a homotopy curve were optimized with a genetic algorithm to generate intermediate frames. In the experiments performed, sequences in which a person executes pairs of letters in sign language were recorded, and animations of the same pairs of letters were generated with the proposed method. Subsequently, the similarity of the real sequences to the animations was measured using Dynamic Time Warping. The results show that the generated images are consistent with their execution by a person. Animation files between sign pairs were created from sign images, with each file weighing an average of 18.3 KB. By having sequences between pairs of letters, it is possible to animate words and sentences. The animations generated by this homotopy-based animation method optimized with a genetic algorithm can be used in various applications that interact with and assist deaf people. From several pairs of letters, a file base was generated using the animations between pairs of letters; with these files, animations of words and sentences can be created.


Introduction
Deaf people communicate through sign language, which consists of a series of gestural signs articulated with the hands and accompanied by facial expressions, intentional gaze and body movement, endowed with linguistic function [1]. According to data from the World Health Organization (WHO), 1.5 billion people live with some degree of hearing loss. According to the National Institute of Statistics and Geography (INEGI), in Mexico, 1.3% of the population aged three years or older cannot hear. Mexican Sign Language (MSL) is officially recognized as a national language and is part of the linguistic heritage of the Mexican nation [2,3]. However, this form of communication has not yet been disseminated throughout the entire population of Mexico, since fewer than 500,000 people communicate with this language [4]. Therefore, it is important to develop tools that help deaf people communicate; otherwise, their development, access to information, social inclusion and participation in everyday life are limited [5].
In recent years, several works have been conducted on hand gesture recognition based on visual information [6][7][8] in order to develop human-computer applications. In these works [9], machine learning algorithms are used to recognize static and moving signs in various applications of human-computer interaction, such as controlling a robot. However, there are fewer works in which an animation or avatar is developed to communicate with deaf people. The authors of [10] implemented an avatar that performs signs; motion capture (MoCap) was employed to capture body, limb and head movements in 3D space. These movements then had to be corrected during the post-production process, and additional animations had to be made by moving each finger bone to the required sign position.
In [11], facial animations are created from two images with a numerical traced algorithm. Their methodology uses the homotopy curve path to generate intermediate frames for different λ values. The intermediate frames are the deformations from the initial image to the final image. A hyperspherical tracking method establishes deformations with visually consistent and smooth changes. In the experiments, the radius of the hypersphere is constant. This method showed good results in the examples presented.
In this research, we are interested in creating animations between pairs of sign language letters using the method proposed in [11]. The original contribution of this research is to use a genetic algorithm [12] to optimize the radius and its increment to plot the homotopy curve of the numerical traced algorithm to calculate the animation between pairs of letters of the sign language alphabet. In this way, from a base of images of letters of the alphabet (https://acortar.link/1KWigu, accessed on 8 February 2024), animations can be generated to spell words in sign language. One of the advantages of generating animations with this algorithm is that once an animation is generated for a pair of letters, such as (a,b), the same animation can be used for the pair of gestures (b,a), which is executed in the reverse order. The files containing the animations between pairs of letters weigh on average 18.3 KB, and they are executed in Matlab (R2024a).
The manuscript is organized as follows: Section 2 describes the hand gesture animation system. In Section 3, the homotopy-based animation method is introduced. In Section 4, optimization with a genetic algorithm is explained. After that, in Section 5, the experimental design and the obtained results are presented. In Section 6, a brief discussion is presented. Finally, Section 7 summarizes the findings of this research and sets up future work.

The Hand Gesture Animation System
The hand gesture animation system proposed in this research consists of the following stages (Figure 1):

• Hand joint position detection: Google's Mediapipe library is used to recognize the positions of the hand joints in the starting and the ending frame. The Mediapipe library has been used in hand-tracking work [13]; however, the MediaPipe Hands library provides only a 2.5D pose estimation. In [14], a simple calibration and the concept of perspective projection were proposed to obtain the 3D position of the hands relative to a smartphone. Figure 2 shows the 21 joints that are detected in a hand with Google's Mediapipe library. Each joint has three coordinates (j_x, j_y, j_z). The library was configured to detect the hands and their joints with a confidence of 0.5.

• Calculate transitions: the joint positions of the initial and end images are used with the method proposed in [11], optimized with a genetic algorithm, to calculate the transition images. The lower part of Figure 1 shows the 15 transitions calculated from a given pair of images. These transitions are the animation between pairs of signing gestures. Matlab was used to implement the numerical traced algorithm and its parameter optimization with a genetic algorithm.

Homotopy-Based Animation Method
Homotopy continuation methods [15] are based on the insertion of a homotopy parameter λ into non-linear algebraic equations in order to obtain a continuous deformation from a trivial state to a non-linear state. In (1), n is the number of variables and x is the set of variables of the system of equations.

Transitions between the starting and ending hand gestures are calculated by applying the homotopy-based animation method (HAM) explained in [11]. Following the same notation, the initial hand gesture is named G1 and the final hand gesture is named G2. At the stage of hand joint position detection (Figure 1), each gesture is stored in a matrix of 21 rows and 3 columns, since 21 joints with 3 components are detected in each hand (see the first column of Table 1). In (2), the initial hand gesture G1 and the ending hand gesture G2 are introduced into the system of equations, where λ represents the homotopy parameter and G1 and G2 are the systems of equations Ax = B and Cy = D, respectively. Each joint coordinate corresponds to one of the variables x_1, x_2, x_3, ..., x_63 for the initial hand gesture (second column in Table 1) and to one of y_1, y_2, y_3, ..., y_63 for the end hand gesture (third column in Table 1). According to Ax = B and Cy = D, the starting and ending hand gestures are established in (3) and (4).

The systems of equations shown in (3) and (4) are substituted into (2) to obtain a global system of equations that combines the systems of the starting and ending hand gestures in order to create the animation. To achieve the deformations or transitions from the initial hand gesture G1, when λ = 0, to the end hand gesture G2, when λ = 1, it is necessary to track the homotopic curve using a numerical traced algorithm. For this purpose, the hypersphere equation (5) [16] is introduced.

In (5), x_64 is the value of λ in each transition. To start the tracing of the curve [17], the value of λ is 0; x_1, x_2, x_3, ..., x_63 are the dimensions of the hypersphere; C_1, C_2, C_3, ..., C_64 are the coordinates of the center of the hypersphere; and r is the radius of the hypersphere. Therefore, the system of equations to be solved to calculate the transitions from a starting hand gesture to a final hand gesture contains (2) and (5).
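Equations (2) to (5) are referenced above but do not appear in this text. The block below is a reconstruction under assumptions, not the authors' typeset equations: it assumes the usual convex homotopy, which matches the stated behavior (G1 at λ = 0, G2 at λ = 1), and the standard hypersphere form implied by the definitions of x_64, C_1, ..., C_64 and r. The residual names G_1, G_2 and the function name S are notation introduced here.

```latex
\begin{align}
H(\lambda) &= (1-\lambda)\,G_1 + \lambda\,G_2 = 0, \tag{2}\\
G_1 &:\; Ax - B = 0, \tag{3}\\
G_2 &:\; Cy - D = 0, \tag{4}\\
S(x_1,\dots,x_{64}) &= \sum_{i=1}^{64} (x_i - C_i)^2 - r^2 = 0,
\qquad x_{64} = \lambda. \tag{5}
\end{align}
```

Note that the matrix C in (4) and the center coordinates C_i in (5) are different objects that share a letter in the source notation.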
The numerical traced algorithm calculates the transitions between the hand gestures G1 and G2 as follows:

1. Matrices A and C are created with random values; in this research, A and C are equal to simplify the calculations.
2. Matrix B is calculated using the values of the joints of the initial hand gesture G1 and matrix A. Matrix D is calculated using the values of the joints of the end hand gesture G2 and matrix A. B and D are kept constant during the execution of the algorithm.

3. Since G1 is Ax = B and G2 is Ay = D, and thus G1 is Ax − B and G2 is Ay − D, these equations are substituted into (2) to obtain (6).
4. In (6), x and y correspond to the joint positions in G1 and G2, respectively, and both sets of variables correspond to the same joint positions in the intermediate gestures of the hand. Therefore, x and y correspond to the same joint, and thus the variable y is changed to x. Then, to calculate the intermediate transitions, (6) is rewritten with x in place of y.
5. Thus, by solving the system of Equations (8), x contains the transitions needed to obtain animations between pairs of hand gestures.
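The substitution described in steps 3 to 5 can be written out as follows. This is a hedged reconstruction that assumes the convex homotopy form H = (1 − λ)G1 + λG2 for (2) and A = C as stated in step 1; the numbering (6) to (8) follows the references in the text, but the exact typeset form in the original may differ.

```latex
\begin{align}
(1-\lambda)\,(Ax - B) + \lambda\,(Ay - D) &= 0, \tag{6}\\
(1-\lambda)\,(Ax - B) + \lambda\,(Ax - D) &= 0
\;\;\Longleftrightarrow\;\; Ax - (1-\lambda)B - \lambda D = 0, \tag{7}\\
\begin{cases}
Ax - (1-\lambda)B - \lambda D = 0,\\[2pt]
\displaystyle\sum_{i=1}^{63} (x_i - C_i)^2 + (\lambda - C_{64})^2 - r^2 = 0.
\end{cases} \tag{8}
\end{align}
```

Under these assumptions, (8) has 64 equations in the 64 unknowns x_1, ..., x_63 and λ, which is what allows it to be solved with the Newton-Raphson method.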
The centers C_1, C_2, C_3, ..., C_63 of the hypersphere are set to the values of G1, and C_64 is set to the initial value of λ, which is 0. A value is assigned to r. The system of Equations (8) is solved iteratively with the Newton-Raphson method [18].

The transitions between the pairs of hand gestures (a,b), (b,c), (d,c) and (b,d) were calculated. For each pair of gestures, the numerical traced algorithm was run 7 times, and the value of the radius r was set as shown in Table 2. Each graph in Figure 4 shows the 7 runs of the numerical traced algorithm, with each line corresponding to an animation. The x-axis shows the iterations executed to solve the system of Equations (8) and the y-axis shows the calculated values of λ corresponding to each iteration. In each graph, the lines start at λ equal to 0, which corresponds to G1. As the iterations are executed to solve the system of equations, the value of λ should increase; according to (2), when λ reaches 1, the algorithm has successfully calculated the transition from G1 to G2.
Circles on each transition line indicate the transition gestures that were calculated for each value of r. In each graph, the line that reached a value of λ close to 1 is highlighted in black. For λ in the interval [0,1], the transition line in black has 5 circles for (a,b), 7 circles for (b,c), 2 circles for (d,c) and 5 circles for (b,d). No intermediate transitions were created for (d,c): the solutions of the system of equations are the initial gesture G1 when λ is equal to 0 and the final gesture G2 when λ is equal to 1, and in the next iteration the value of λ decreases. Figure 5 shows the animation for each pair of letters; in each transition, the λ value corresponding to the calculated transition is shown.
According to [19], if the radius of the hypersphere varies, more transitions can be calculated. To test this, the letters (d,c) were chosen, because only two transitions were obtained with a fixed radius value in each run. In each run, the value of the radius was increased when there was no change in the value of λ between the current and previous iteration. Several tests were performed, and one of the best results is shown in Figure 6. Five transition lines are shown with initial radius values of 0.05, 0.1, 0.15, 0.20 and 0.25, respectively; in each run the radius increment was set to a value of 0.05, and all runs reached the value of λ equal to 1. The line highlighted in black corresponds to an initial radius of 0.1 and has 20 circles, which means that the animation has 20 transitions. In order to automate the generation of transitions in this research, a genetic algorithm (GA) was used to optimize the radius and its increment in the numerical traced algorithm to obtain transitions between pairs of letters of the sign alphabet.
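To make the role of the radius and its increment concrete, the following is a reduced sketch of hypersphere tracking in Python, not the authors' Matlab implementation. It assumes that the substituted system has the linear form Ax − (1 − λ)B − λD = 0 with B = A·G1, D = A·G2 and an invertible A, so that its solution is x = G1 + λ(G2 − G1) and only the scalar hypersphere equation in λ has to be solved with Newton-Raphson. The function name `trace_homotopy`, the rule of growing the radius after every iteration, and the forward seed are assumptions made for illustration.

```python
import math


def trace_homotopy(g1, g2, radius, increment=0.0, iterations=30, tol=1e-12):
    """Return [(lam, frame), ...] found by hypersphere tracking.

    Assumes the homotopy curve is x(lam) = g1 + lam * (g2 - g1), i.e. the
    linear system A x - (1 - lam) B - lam D = 0 with B = A g1, D = A g2
    and an invertible A (an assumption, see the lead-in).
    """
    if radius <= 0 or increment < 0:
        raise ValueError("radius must be positive and increment non-negative")
    diff = [b - a for a, b in zip(g1, g2)]
    # Squared length of the curve direction (dx/dlam, 1) in (x, lam) space.
    speed2 = sum(d * d for d in diff) + 1.0
    lam_c = 0.0  # lambda coordinate of the hypersphere centre, starts at G1
    r = radius
    frames = []
    for _ in range(iterations):
        # Newton-Raphson on f(lam) = speed2 * (lam - lam_c)^2 - r^2.
        # The seed lies ahead of the centre, so the intersection with
        # increasing lam is selected.
        lam = lam_c + r
        for _ in range(100):
            f = speed2 * (lam - lam_c) ** 2 - r * r
            step = f / (2.0 * speed2 * (lam - lam_c))
            lam -= step
            if abs(step) < tol:
                break
        frames.append((lam, [a + lam * d for a, d in zip(g1, diff)]))
        lam_c = lam     # the new solution becomes the next centre
        r += increment  # assumption: the radius grows after every iteration
    return frames
```

In this reduced model each iteration advances λ by r / sqrt(‖G2 − G1‖² + 1), so a smaller radius yields more frames between λ = 0 and λ = 1. A seed placed behind the centre would select the other intersection, in which λ decreases.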

Optimization with a Genetic Algorithm
One of the most used bio-inspired algorithms is the genetic algorithm (GA) [20]. The GA consists of an adaptive heuristic search which simulates the processes of natural selection. The competition among individuals for resources results in the fittest individuals dominating over weaker ones. As in nature, individuals use selection mechanisms for mating and the recombination and mutation of genetic material to evolve solutions to a given problem. More details about the implementation of genetic algorithms can be found in [21,22]. This technique is useful when the search space is large and traditional methods fail to provide competitive solutions. The GA implemented in this research is executed as follows:

1. Create a random population of ten binary individuals. Each individual has 32 alleles: 16 represent the radius (r) and 16 represent the increment (inc) of the radius. Each parameter is encoded as a 16-bit value in [0, 65535]; the radius value is divided by 1,000,000 and the radius increment is divided by 100,000; thus, the radius takes values in the interval [0,0.655] and the radius increment in the interval [0,6.553]. Table 3 shows an example of an individual; the second row shows its value in binary and the third row shows its real value, which is calculated by converting from binary to decimal and dividing by the corresponding value. Each individual encodes the initial radius of the hypersphere and its increment to create a 30-frame animation with the numerical traced algorithm. For each individual, once its radius and increment have been decoded, the numerical traced algorithm is run for 30 iterations to solve the system of Equations (8).

2. Calculate the fitness of each individual in the population. An individual is better than another if, at the end of 30 iterations, its λ is closer to 1. Figure 8 shows the execution of three individuals to compute animations between the letters (b,a). In Figure 8, the left column shows the value of λ for each individual as the 30 iterations are executed and the right column shows the final gesture that was calculated with each individual. For the first individual (first row), the final gesture is the letter b, and its final λ value is 0.314146; in the second row, the individual calculated a gesture that is more similar to the letter a, and its final λ value is 0.904299; finally, in the third row, the individual calculated a transition with the fingers more closed and therefore more similar to the letter a, and its final λ value is 1.05106. If λ has a value greater than 1.1, the individual is penalized by multiplying its fitness by −1.

3. Select the best individual for elitism.

4. Select individuals to create a new population using two-point crossover and simple mutation.
5. Apply elitism: transfer the best individual to the population of the next generation.

6. Go to step 2 and repeat for 30 generations.
Table 3. Solution encoding for the GA.
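The steps above can be sketched in Python as follows; this is an illustration under stated assumptions, not the authors' Matlab code. The stated divisors (1,000,000 and 100,000) and the stated intervals ([0,0.655] and [0,6.553]) differ by a factor of ten, so both divisors are left as arguments, with defaults that follow the stated divisors. The fitness is read as the final λ with a sign flip above 1.1. Binary tournament selection and a mutation rate of 1/32 are assumptions, since the text does not name them, and `toy_final_lambda` is a placeholder that stands in for running the numerical traced algorithm.

```python
import random

BITS = 16  # alleles per parameter; an individual has 2 * BITS = 32 alleles


def decode(individual, r_div=1_000_000, inc_div=100_000):
    """Split a 32-allele individual into (radius, increment) real values."""
    r_int = int("".join(map(str, individual[:BITS])), 2)
    inc_int = int("".join(map(str, individual[BITS:])), 2)
    return r_int / r_div, inc_int / inc_div


def fitness(final_lam):
    """Final lambda as fitness; values above 1.1 are penalised by a sign flip."""
    return -final_lam if final_lam > 1.1 else final_lam


def two_point_crossover(p1, p2, rng):
    """Swap the segment between two distinct cut points."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]


def mutate(individual, rng, rate=1 / 32):
    """Simple mutation: flip each allele with probability `rate` (assumed)."""
    return [1 - b if rng.random() < rate else b for b in individual]


def tournament(pop, scores, rng):
    """Binary tournament selection (assumed; the text does not name a method)."""
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if scores[a] >= scores[b] else pop[b]


def toy_final_lambda(radius, increment, iterations=30, scale=100.0):
    """Placeholder for the numerical traced algorithm; NOT the paper's solver."""
    return sum(radius + k * increment for k in range(iterations)) / scale


def run_ga(final_lambda, pop_size=10, generations=30, seed=0):
    """Return ((radius, increment) of the best individual, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(2 * BITS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(final_lambda(*decode(ind))) for ind in pop]
        best = max(range(pop_size), key=scores.__getitem__)
        elite = pop[best]
        history.append(scores[best])
        children = []
        while len(children) < pop_size - 1:
            c1, c2 = two_point_crossover(
                tournament(pop, scores, rng), tournament(pop, scores, rng), rng
            )
            children += [mutate(c1, rng), mutate(c2, rng)]
        # Elitism: the best individual is copied unchanged into the next population.
        pop = [elite] + children[: pop_size - 1]
    return decode(elite), history
```

A call such as `run_ga(toy_final_lambda)` returns the decoded radius and increment of the best individual. Because of elitism, the best fitness per generation never decreases when the evaluation is deterministic.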

Experiments and Results
The experiments designed to evaluate the animations created with the numerical traced algorithm optimized with a GA are as follows:

• Three videos were recorded in which a person spells the following pairs of letters: (h,o), (o,l) and (l,a).

• With the Mediapipe library and Python, the positions of the joints were obtained and recorded in a .txt file. The file structure is as follows: the first column corresponds to the x-coordinate, the second to the y-coordinate and the third to the z-coordinate of the joints; the first 21 rows correspond to the 21 joints in the first frame, the next 21 rows to the second frame and so on. The videos, text files and Matlab program that show the animation have been uploaded to the following link: https://acortar.link/YI1ajV (accessed on 8 February 2024).

• The positions of the joints in the first and last frame of each .txt file were used to create the animation with the numerical traced algorithm optimized with a GA. The GA was run 30 times (10 for each pair of letters). Table 4 shows the statistical results. Figure 9 shows the execution of the three best individuals, one for calculating the animation of (h,o), another for (o,l) and the last one for (l,a).

• The similarity between the animations created with the numerical traced algorithm and the recorded sequences was measured using Dynamic Time Warping (DTW) [23]. DTW [24,25] is useful because it lets us compare time series with different numbers of frames. Table 5 shows the similarity value between the recordings made and the animations created. The diagonal corresponds to the similarity when comparing the recording of a pair of letters to the corresponding animation. In each row, the cell on the diagonal has the smallest value, so each recording is more similar to its corresponding animation than to the others. Table 6 shows the similarity value between the recordings and Table 7 shows the similarity value between the animations. In both the real sequences and the animations, the greatest difference is between the sequences (h,o) and (l,a), followed by (o,l) and (l,a), and then by (h,o) and (o,l); thus, the recordings and the animations maintain the same order of similarity.

For the last experiment, for each of the images that have been uploaded to the following link, https://acortar.link/1KWigu (accessed on 8 February 2024), the positions (x,y,z) of 21 joints were obtained and stored in a .txt file. Subsequently, 156 animations, calculated using the numerical algorithm optimized by a GA from pairs of gestures taken from the 20 .txt files containing the positions (x,y,z) of the 21 hand joints, were loaded into the animation folder. When creating the animations, we realized that having the animation of, for example, (b,c) means that it is not necessary to create a file for the animation (c,b); the animation (b,c) can simply be run in reverse order. In this way, it is not necessary to record animations between all pairs of letters. A file that uses 30 frames to create an animation weighs on average 18.3 KB.

Metric
For the 156 animations created, it was measured whether the last frame corresponds to the letter indicated in the sequence. For example, in the sequence "lt" we want to know whether the last frame corresponds to the letter t. For each of the 156 sequences, we compared, using Euclidean distances, the positions of the joints in the last frame of the sequence with the positions of the joints of the 20 gestures a, b, c, d, e, f, g, h, i, l, m, n, o, p, r, s, t, u, v and w. The gesture is identified as the one that is closest in distance. Table 8 shows the result of this classification. The first column shows the final letter of an animation, the second column shows the number of sequences that end with that letter and are correctly classified, and the third column shows the number of misclassified sequences: only two sequences that end with the letter v were confused with the letter u. The accuracy of creating animations between pairs of letters in which the final letter is the desired letter is 98.8%.
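The nearest-gesture check described above can be sketched in Python as follows. The data layout (a dictionary of reference gestures keyed by letter, and animations keyed by their two-letter name) and the function names are assumptions made for illustration.

```python
import math


def gesture_distance(g1, g2):
    """Euclidean distance between two gestures given as lists of (x, y, z) joints."""
    return math.sqrt(
        sum((a - b) ** 2 for j1, j2 in zip(g1, g2) for a, b in zip(j1, j2))
    )


def classify_last_frame(frames, references):
    """Return the letter whose reference gesture is closest to the last frame."""
    last = frames[-1]
    return min(references, key=lambda letter: gesture_distance(last, references[letter]))


def accuracy(animations, references):
    """Fraction of animations whose last frame is classified as their final letter.

    `animations` maps a sequence name such as "lt" to its list of frames;
    the expected label is the last letter of the name.
    """
    hits = sum(
        classify_last_frame(frames, references) == name[-1]
        for name, frames in animations.items()
    )
    return hits / len(animations)
```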
Finally, in the animation folder, two files, phrase1.m and phrase2.m, were uploaded to show the animation of the phrases "We eat some bananas" and "The table is big". In these files, to spell the word banana, the animation of "an" was used to spell both "an" and "na"; in the program, the order of execution of this sequence is indicated.
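The reuse of one stored animation for both directions can be sketched as follows. This is a Python illustration rather than the content of phrase1.m or phrase2.m; the dictionary layout, the function names and the handling of repeated letters are assumptions.

```python
def pair_frames(a, b, animations):
    """Return the frames for the pair (a, b), reversing a stored (b, a) if needed."""
    if (a, b) in animations:
        return list(animations[(a, b)])
    if (b, a) in animations:
        return list(reversed(animations[(b, a)]))
    raise KeyError(f"no animation stored for {a!r},{b!r} in either direction")


def spell(word, animations):
    """Concatenate the pair animations needed to spell `word`."""
    frames = []
    for a, b in zip(word, word[1:]):
        if a == b:
            continue  # assumption: a repeated letter needs no transition
        frames.extend(pair_frames(a, b, animations))
    return frames
```

For example, with only the pairs (b,a) and (a,n) stored, `spell("banana", animations)` plays (b,a), then (a,n) forward and in reverse twice, which mirrors how "an" serves for both "an" and "na".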

Table 8. Column headers: Final Sign, Correctly Classified, Misclassified.

Discussion
Using the numerical traced algorithm proposed in [11], gesture transitions between pairs of letters of the sign alphabet were calculated. The transitions have an associated λ value in the interval [0,1]. The calculated transitions are smooth changes from an initial gesture G1 to a final gesture G2. Calculating the final gesture and the number of transitions depends on assigning values to the radius of a hypersphere and increasing that radius. Figure 5c shows that, with a fixed radius value in all iterations, two transitions were calculated for the letter pair (d,c) that had different values of λ in [0,1]; subsequently, by increasing the radius, 20 transitions were created (Figure 7).
In this research, we proposed the use of a GA to set the value of the radius and its increment to calculate the animation between pairs of letters. For each individual, the numerical algorithm was executed for 30 iterations, and the best individual had a fitness of λ equal to 1 at the end of the iterations. In this manner, the best individual must contain at least 30 different gestures. To decrease the number of gestures in the animation, fewer iterations must be performed, and to increase the number of gestures, the number of iterations must be increased. DTW was used to measure the similarity between the animations created for three pairs of letters and the sequences recorded by a person; the distance between the recordings and the animations corresponding to the same pairs of letters is smaller, so the animations created can be used as patterns for dynamic sign recognition.

Conclusions
In this research, the parameters of the radius and its increment were optimized to obtain animations between pairs of letters of the sign alphabet using the numerical traced algorithm presented in [11]. We performed experiments with the proposed method and observed the following: a value is assigned to the radius of the hypersphere, and the number of intermediate images calculated depends on this radius value. There is no guarantee that a final image will be calculated [19]; better results are obtained if the value of this radius is increased to plot homotopy curves while calculating the deformations between the initial and the final image. The number of transitions can be changed by changing the number of iterations that the numerical algorithm executes for each individual.
The animations created with the proposed optimization were compared with the real recordings and were most similar to their corresponding recordings. With the proposal made in this research, it is concluded that animations can be generated between pairs of sign language letters to implement applications that communicate with and provide assistance to deaf people.
From 20 gestures of the sign alphabet, animations between pairs of letters were created to form sentences. The advantage is that each animation between a pair of letters weighs on average 18.3 KB and that the same animation, such as (a,b), can be used to execute its inverse animation, (b,a); only the direction in which the animation is executed is changed. This can be seen in the files uploaded to the link https://acortar.link/1KWigu (accessed on 8 February 2024), which makes it easy to create animations of words and phrases.
Future work based on this research should focus on implementing these animations in avatars that are controlled by joint movement. Additionally, in [26], actions were recognized from a set of key poses and using DTW. The animations calculated by the method proposed in this research can be used as key poses to recognize dynamic sign language.

Figure 1. First, the position of the hand joints in each image is obtained. Then, the transitions between the initial and the final hand sign are calculated. In the end, 15 calculated transitions are shown, which form the animation between the signs given to the hand gesture animation system.

Figure 2. The black dots correspond to the joints detected in a hand using Google's Mediapipe library.

Figure 3 shows the hand gestures corresponding to the letters a, b, c and d. The transitions between pairs of hand gestures (a,b), (b,c), (d,c) and (b,d) were calculated. For each pair of gestures, the numerical traced algorithm was run 7 times, and the value of the radius r was set as shown in Table 2.

Figure 3. Images of the hand gestures corresponding to the letters (a-d) are shown. The detected joints are also shown.

Figure 4 shows a graph for each pair of the following hand gestures: (a,b), (b,c), (d,c) and (b,d).

Figure 4. Each graph shows the execution of the numerical traced algorithm for 7 different values of the hypersphere radius, one per line, to calculate the transitions between pairs of letters. The best execution is the one shown in black, in which λ starts at a value of zero and reaches a value close to 1. The circles show the calculated transitions.

Figure 5. The animation for a pair of letters is shown in each row: (a) animation of the letters (a,b), (b) animation of the letters (b,c), (c) animation of the letters (d,c) and (d) animation of the letters (b,d). In each transition the λ value corresponding to the calculated transition is shown.

Figure 6. The graph shows the execution of the numerical traced algorithm when using 5 different values of the hypersphere radius to calculate the transitions between the pair of letters (d,c). In each run the radius increment was set to a value of 0.05; all runs reached a value of λ equal to 1.

Figure 7 shows the 20 transitions calculated for the letters (d,c) using an initial radius of 0.1 that increased by a value of 0.05.

Figure 7. The 20 transitions calculated for the letters (d,c) are shown. The initial value of the radius is 0.1 and it increased by a value of 0.05. Each image shows the λ value corresponding to the calculated transition.

Figure 8. The execution of the numerical traced algorithm for 3 individuals is shown. The left column shows the value of λ for each individual as the 30 iterations are run and the right column shows the final gesture that was calculated for each individual.

Figure 9. Each row shows the performance of one of the 3 best individuals: (a) one for calculating the animation of (h,o), (b) another for (o,l) and (c) the last one for (l,a).

Table 1. Mapping of joint positions to x and y variables.

Table 2. Radius value set for each run.

Table 5. Similarity between animations and real sequences.

Table 6. Similarity between real sequences.

Table 7. Similarity between animations.