Derivation of simple rules for complex flow vector fields on the lower part of the human face for robot face design

It is quite difficult for android robots to replicate the numerous and varied human facial expressions owing to limitations in terms of space, mechanisms, and materials. This situation could be improved with greater knowledge of these expressions and their deformation rules, i.e. by using a biomimetic approach. In a previous study, we investigated 16 facial deformation patterns and found that each facial point moves almost exclusively in its own principal direction and that different deformation patterns are created by different combinations of movement lengths. However, the replication errors caused by moving each control point of a face only in its principal direction were not evaluated for each deformation pattern at that time. Therefore, in this study we calculated the replication errors using the second principal component scores of the 16 sets of flow vectors at each point on the face. More than 60% of the errors were within 1 mm, and approximately 90% of them were within 3 mm. The average error was 1.1 mm. These results indicate that robots can replicate the 16 investigated facial expressions with errors within 3 mm and 1 mm for approximately 90% and 60% of the vectors, respectively, even if each point on the robot face moves only in its own principal direction. This finding is promising for the development of robots capable of showing various facial expressions because significantly fewer types of movements are necessary than previously predicted.


Introduction
Facial expressions are important in human communication, and several studies on the design of communication robots have focused on attempts to replicate human facial expressions on the faces of android robots [5,6,10,11,12,14,15,16,19,25,27,29,31]. Android robot faces are mainly composed of a flexible, elastic skin and internal deformation mechanisms. The moving parts of the deformation mechanisms are connected to points or areas on the inner side of the skin and are designed to move their own control point or area along a desired trajectory. One of the important design issues is the determination of the positions of the control points or areas and their trajectories. Because the space inside robot heads is limited, it is desirable that the number of control points or areas be small and that the trajectories be simple. In the most common deformation mechanisms, which include the wire-pulling system and the linkage-driving system (see the summary written by Cheng et al [6]), the trajectories of the control points are required to be unidirectional. Determining the starting and end points of these unidirectional trajectories is crucial in the design of android robot faces because the quality and variety of the facial deformations decrease if the directions are set inappropriately. Therefore, a thorough prior understanding of the actual direction and displacement of every control point on a face is important for an efficient design that can generate realistic facial expressions [6].
Traditionally, facial mechanisms have been designed using a qualitative guideline, namely Ekman's action units (AUs) [6,10,12,14,19,24,29,31]. AUs are components of the various facial deformation patterns listed in the facial action coding system (FACS) [7]. The FACS explains that one AU, or a combination of multiple AUs, creates each of several facial deformations corresponding to both emotional and non-emotional expressions. However, AUs do not provide sufficient information for designing an actuation mechanism for robot faces because the FACS is not intended for that purpose; it is only intended to describe deformations verbally and qualitatively. Therefore, android robot faces have been designed and tested using a time-consuming trial-and-error method [5].
In order to improve the design efficiency of android robot faces, quantitative guidelines for realistic and expressive facial deformations are necessary. Recognition tests and questionnaire-based impression evaluations of robot facial deformations are commonly used methods. In recognition tests, human participants identify the type of facial deformation, e.g. happiness or sadness, that a robot face is showing, and the quality of the facial deformation is quantified by the correct recognition rate [3,6,21]. In impression evaluations, human participants assign numerical scores to the facial deformations of robots based on their impressions, e.g. realistic or eerie [2,8,9,26]. Although these methods provide robot designers with quantitative feedback on the overall quality of the robot expressions, they are not sufficient as design guidelines because such feedback does not tell the designers which parts of the deformations should be improved, or how.
In order to overcome this problem, human facial deformations have been measured as references for robot facial deformations. Takanishi et al [30] recorded human facial deformations for four emotional expressions, i.e. surprise, smile, anger, and sadness, using a charge-coupled device (CCD) camera and measured the trajectories of 40 facial points from the camera images. Kwon [20] compared the ranges of human and robot facial deformations by calculating the difference in gray-level intensity at every pixel between images of the neutral and deformed faces. Such an image-based approach, which relies on optical-flow estimation algorithms such as facial feature point tracking and high-gradient component detection [23], is unsuitable for the mechanical design of a human face robot because an optical flow vector in an image and the actual displacement of a facial point are not always consistent, as Cheng et al [6] also pointed out.
Cheng et al [6] captured facial deformations with 58 facial markers attached to the facial surfaces of both their android robot and a human and compared the deformations in order to locate the facial areas that should be improved in a robot face for four emotional expressions: happiness, sadness, anger, and shock. They redesigned the deformation mechanism of the robot based on this comparison and found that the correct recognition rates and realism scores were significantly improved. Bickel et al [5] captured three-dimensional shapes of a human face for eight facial deformations and showed that the optimal shape of a robot skin that replicates the deformations with minimum replication error can be calculated when an arbitrary deformation mechanism is provided. Their cloning approach is promising for the efficient design of robot faces. However, the optimal design of the deformation mechanisms for various facial deformations has not yet been realized. An advanced approach in which designers determine the optimal locations and trajectories of each deformation mechanism based on a deeper understanding of facial deformation rules, i.e. a biomimetic approach, is required. Therefore, we investigated how human facial expressions can be replicated by identifying simple surface deformation rules. In our previous study [28], we measured 16 lower-face AUs as two-dimensional flow vector fields and conducted a principal component analysis (PCA) on the vectors at each facial point, which revealed that the average contribution of the first principal component (PC) was 86%. In other words, almost all of the measured movement occurred in only one direction at each facial point. This simple rule suggested that a unidirectional deformation mechanism at each point on the lower face might be sufficient for expressing the 16 measured AUs.
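The per-point PCA described above can be sketched as follows. This is a minimal illustration with synthetic flows, not the authors' code; in particular, taking the PCA about the neutral-face origin (rather than the mean of the 16 vectors) is our assumption, motivated by the fact that all 16 flow vectors at a point share the neutral marker position as their starting point.

```python
import numpy as np

def principal_direction(vectors):
    """PCA on the 16 flow vectors (one per AU) measured at a single
    facial point.  Returns the first-PC direction (unit vector) and
    the contribution ratio var(PC1) / (var(PC1) + var(PC2))."""
    V = np.asarray(vectors, dtype=float)        # shape (16, 2): (dx, dy) per AU
    # Uncentered PCA: the vectors share a common origin (the neutral
    # marker position), so the decomposition is taken about the origin.
    _, s, vt = np.linalg.svd(V, full_matrices=False)
    explained = s**2 / np.sum(s**2)             # variance fractions per PC
    return vt[0], explained[0]

# Toy example: mostly-vertical flows with small horizontal scatter.
rng = np.random.default_rng(0)
flows = np.column_stack([rng.normal(0, 0.5, 16),   # x components (small)
                         rng.normal(0, 3.0, 16)])  # y components (dominant)
direction, ratio = principal_direction(flows)
print(direction, ratio)   # direction is near (0, +/-1); ratio is close to 1
```

Applying this at each of the 96 markers and averaging the contribution ratios would reproduce the kind of per-point statistic reported above (86% on the measured data).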
However, the replication errors in the deformations made using the unidirectional deformation mechanism were not evaluated for each AU at that time. Furthermore, the differences between the flow distributions of the measured AUs were not analyzed.
In this study, we analyzed the differences between the flow vector fields of the AUs, calculated the variations in the lengths and ranges of the measured deformations, and evaluated the replication errors in the deformations made using the unidirectional deformation mechanism.

Deformation patterns
The FACS includes 16 lower face deformations, the names and AU numbers of which are summarized in figure 1. Each name indicates the part of the face responsible for the deformation (e.g. lip corner or chin) and the deformation pattern (e.g. puller or raiser). A Japanese adult male (hereafter referred to as the demonstrator) demonstrated each of these AUs based on the FACS manual [7], which describes how to form each AU.

Measurement
In order to measure the movements of the entire skin surface of the lower face, we identified 96 measurement points on the face of the demonstrator. As shown in figure 2, blue ink markers were applied to these points to facilitate their identification in the images. Each marker was approximately 2-5 mm in diameter. The markers were placed such that they covered the entire surface of the lower face, as in the study conducted by Cheng et al [6]. We used more markers (96) than the 58 used in that previous study.
The marker movements were recorded for each AU by using a video camera (SONY Handycam HDR-CX420) at 60 fps. The recording process was as follows: the demonstrator first maintained a neutral face as shown in figure 2, and then performed one of the AUs and maintained it for a couple of seconds.
Consequently, 16 video clips containing the neutral and deformed faces were recorded. The marker locations, i.e. their pixel coordinates, were obtained by calculating the centers of mass of the blue regions in the video images. Although the motion trajectory and speed of each marker are nonlinear, these dynamic properties of the deformations were excluded from the analysis in this study. Instead, we focused on the difference in the location of each facial point between the neutral and deformed faces in order to facilitate a principal component analysis of the facial deformations.
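The center-of-mass marker localization can be sketched as follows; the color-segmentation step that produces the boolean "blue" mask is hypothetical, since the paper does not describe how the blue regions were thresholded.

```python
import numpy as np

def marker_centroid(mask):
    """Sub-pixel marker location as the center of mass of a boolean
    mask of 'blue' pixels.  How the mask is obtained (color
    thresholding) is an assumption not specified in the text."""
    ys, xs = np.nonzero(mask)          # row/column indices of marker pixels
    return xs.mean(), ys.mean()        # (x, y) centroid in pixel coordinates

# Toy 2x2-pixel marker blob in a 10x10 frame.
mask = np.zeros((10, 10), dtype=bool)
mask[4:6, 3:5] = True
cx, cy = marker_centroid(mask)
print(cx, cy)   # -> 3.5 4.5
```

Averaging pixel indices in this way gives sub-pixel precision, which is why irregular marker shapes matter little (see the discussion of marker-shape effects later in the paper).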

Image corrections
Two images of the demonstrator's face, one showing the neutral expression and one showing the deformed expression, were selected from each AU video clip in order to measure the flow vectors of the markers. Although the two-dimensional flow vectors could have been approximated directly from the marker pixel coordinates in the two images, we performed two types of image corrections before calculating the vectors in order to extract more accurate flow vectors and to match the starting points of the vectors of each marker across the different AUs. The first correction compensated for head movements that occurred while the demonstrator performed each AU, and the second compensated for the differences between the neutral faces preceding the different AUs. The correction processes are described in our previous work [28].
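The head-movement compensation could, for example, be realized as a least-squares rigid alignment of the reference points (the red circles in figure 2). The actual procedure follows [28]; this Kabsch/Procrustes-style fit is only one plausible realization of such a correction.

```python
import numpy as np

def align_rigid(ref_neutral, ref_deformed):
    """Estimate the rigid transform (rotation R, translation t) mapping
    the reference points in the deformed-face image back onto those in
    the neutral-face image (Kabsch method, 2D)."""
    A = np.asarray(ref_deformed, float)
    B = np.asarray(ref_neutral, float)
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)                  # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t

# Toy check: rotate and shift three reference points, then recover them.
ref = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
theta = np.deg2rad(5)
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = ref @ Rot.T + np.array([2.0, -1.0])
R, t = align_rigid(ref, moved)
restored = moved @ R.T + t
print(np.allclose(restored, ref))   # -> True
```

Applying the recovered (R, t) to every marker coordinate in the deformed image removes the rigid head motion before the flow vectors are computed.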

Evaluation of replication errors after PCA
PCA was applied to the 16 superimposed flow vectors at each measurement point to identify the first main flow direction (the first PC) for each marker. We evaluated how well the measured facial deformations could be replicated when each point moved only in the direction of its first PC in order to verify that a simple unidirectional deformation mechanism is sufficient for replicating the facial expressions. Figure 3 illustrates the relationship between the measured flow vectors and the errors in the flow vectors replicated using only the first PC. The second PC score (the length of each red line, i.e. the minimum distance between the end point of a flow vector and the first PC line) was calculated for each measurement point (P1 and P2) and each AU and was used as its replication error. Figure 4 presents the measured flow vector fields and flow length contour maps for the 16 investigated AUs. The arrows represent the flow vectors, and the circles, which are the starting points of the arrows, represent the marker locations on the neutral face. These flow vectors were obtained after applying the two image corrections. The dark red and pale yellow areas indicate the regions with longer and shorter flow vectors, respectively. The contour maps were drawn using a linear interpolation of the measured flow vector fields. The maximum measured vector length is 22 mm, which occurs at the lip corner in AU13. Figure 4 demonstrates how extensive and continuous the facial deformations are and how they differ among the 16 analyzed AUs. For example, almost the entire surface moves in some cases, such as AU9, AU12, and AU13. In contrast, the deformations are localized in the case of AU15, AU16, and AU17. AU12, AU13, and AU15 have flow vectors longer than 18 mm, whereas AU11, AU18, AU22, and AU25 only have vectors shorter than 8 mm. The flow vectors near the lip corner are directed toward the upper part of the face in AU12, AU13, and AU14.
However, in AU18 and AU20, the flow vectors near the lip corner are directed toward the side of the face. Figure 5 shows a contour map of the maximum flow length for the 16 analyzed AUs. This map was drawn using a linear interpolation of the maximum flow lengths of each marker. This figure indicates the difference in the maximum flow lengths between the facial regions. For example, the lip corner moves by over 20 mm while the mid-cheek and the side of the nose move by approximately 12 mm. Figure 6 shows a scatter plot for the AU deformation patterns. The horizontal axis indicates the number of points that moved by more than 3 mm, which is approximately proportional to the skin area that moved, and the vertical axis indicates the maximum flow length for each AU.
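The replication error defined in figure 3, i.e. the magnitude of the second PC score as the perpendicular distance from a flow vector's end point to the first-PC line, can be sketched as:

```python
import numpy as np

def replication_error(flow, pc1_dir):
    """Perpendicular distance from the tip of a measured flow vector to
    the line spanned by the first-PC direction: the error left over
    when a point may move only along its principal direction."""
    u = np.asarray(pc1_dir, float)
    u = u / np.linalg.norm(u)            # unit vector along the first PC
    v = np.asarray(flow, float)
    residual = v - (v @ u) * u           # component orthogonal to the PC line
    return np.linalg.norm(residual)

# A 5 mm flow at 30 degrees to a vertical principal direction leaves a
# 5*sin(30 deg) = 2.5 mm replication error.
flow = [5 * np.sin(np.deg2rad(30)), 5 * np.cos(np.deg2rad(30))]
err = replication_error(flow, [0.0, 1.0])
print(round(err, 2))   # -> 2.5
```

Collecting this quantity over all 96 markers and all 16 AUs yields the error distributions reported in the results.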

Deformation variations
The graph in figure 6 exhibits three important features. The first is a significant positive correlation between the maximum flow length and the number of points that moved (r = 0.74, p < 0.01). This correlation indicates that a large (or small) area of skin moves simultaneously when at least one point on the skin moves substantially (or slightly).
Second, large areas of skin moved and significant displacements were produced in AU12 and AU13 to pull up the lip corner and in AU9 to deform the nose. The greatest maximum flow length is approximately 22 mm in AU13, while the maximum number of moved points is 80 in AU9.
Third, the flow distributions are unimodal, and the peak flow locations differ among the 16 analyzed AUs. For example, the peak occurs at the lower part of the lip corner in AU12 and AU13 and at the top center of the lip in AU28.

(Figure 2 caption: ink marker locations on the right half of the demonstrator's face (blue dots) [28]. The red circles are reference points placed on a facial image for head movement compensation.)

Figure 7 depicts the superimposed vector field obtained after the image corrections were applied. The colors indicate the vector directions. It is evident that
the points on the face moved by different amounts and in different directions. Although the flow vectors appear too complicated to replicate on a robot face, we analyzed them to find simple rules, as described in section 3.3. Figures 8(a) and (b) depict the first and second PCs, respectively, of the flow vectors of each marker, as well as contour maps of the standard deviations of the corresponding PC scores. The direction of the solid line for each marker represents the direction of the PC, and the midpoint of each line corresponds to the marker position on the neutral face. The dark red and pale yellow areas indicate the regions with larger and smaller standard deviations, respectively. These contour maps were drawn using a linear interpolation of the distributions of the standard deviations of the PC scores. Figure 8(a) reveals that the first PCs tend to align vertically; thus, the deformations of the lower face are primarily vertical. The standard deviations of the first PCs are greatest at the lower corner of the mouth and the chin (7.7 mm), which means that the flow lengths in those areas vary more across the AUs than in other areas. The extent to which these PCs capture the variation in the flow vectors is represented by their contribution ratios. The contribution ratio of the first PC at each facial point is calculated by dividing the variance of the first PC scores by the sum of the variances of the first and second PC scores. The average contribution ratio of the first PCs over all of the markers was 86% [28].
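The linear-interpolation step used to draw the contour maps can be sketched as follows. The use of scipy.interpolate.griddata and the synthetic marker data are our assumptions; the paper states only that linear interpolation of the scattered per-marker values was used.

```python
import numpy as np
from scipy.interpolate import griddata

# Scattered per-marker scalars (flow lengths or PC-score standard
# deviations) linearly interpolated onto a regular grid for contouring.
rng = np.random.default_rng(1)
markers = rng.uniform(0, 100, size=(96, 2))   # 96 marker (x, y) positions
values = rng.uniform(0, 22, size=96)          # e.g. flow length in mm

gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
grid = griddata(markers, values, (gx, gy), method='linear')
print(grid.shape)   # -> (50, 50)
```

Grid cells outside the convex hull of the markers come back as NaN, and interpolated values never exceed the measured range, which matches the behavior expected of a piecewise-linear contour map.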

PCA
Meanwhile, figure 8(b) indicates that the second PCs tend to align horizontally. The standard deviations of the second PCs are the largest at the lower corner of the mouth (3.3 mm). Figure 9 shows the distribution of the replication errors for all of the measured flow vectors. Their average was found to be 1.1 mm. More than 60% of the errors are within 1 mm and about 90% of them are within 3 mm. The replication error distributions for the individual AUs are compared in figure 10. The errors are different for each AU: every error is within 4 mm for AU25, AU10, AU11, AU17, AU28, and AU22, whereas there are errors greater than 7 mm for AU12, AU9, AU16, and AU20. Figure 11 shows the maximum replication error of each facial point. The maximum error in the lower face was 7.8 mm at the lip corner, which occurs when replicating AU20. This figure indicates that the error is greater in the regions where the second PC scores are greater (see figure 8(b)). This correspondence is not surprising because the unidirectional mechanism we have assumed for calculating the replication error cannot move facial points along the second PC directions. Figure 11 also indicates that the error is smaller in the regions where the maximum flow lengths are smaller (see figure 5).

Discussion
From the 16 investigated rich and complex lower-face deformation patterns, we identified a simple rule: each point on the face moves almost exclusively in its own principal direction for the 16 patterns of lower-face movement [28]. In this study, we evaluated the extent of the replication errors that result from adopting a simple unidirectional actuation mechanism that moves each facial surface point in one direction, instead of a large and complex multidirectional mechanism that moves the points in several directions. Here, we assume an ideal deformation mechanism that is sufficiently small to be installed at 96 points on the inside surface of the skin and an ideal robot facial skin in which the movement of the deformation mechanisms is accurately reflected at the facial surface points. Under these assumptions, the result obtained from the human face analysis can be translated to the design of the deformation mechanism of a robot face: robots can replicate the 16 studied facial expressions with replication errors within 3 mm and 1 mm for approximately 90% and 60% of the flow vectors, respectively, using unidirectional deformation mechanisms.
Although the sizes and shapes of the markers were not exactly the same, these differences were not considered to have affected the accuracy of the analysis because each marker location was measured as the center of mass of the marker's color region. This is supported by the fact that the magnitudes of the measured flow vectors were almost zero in regions far from the deformation peak, as shown in figure 4.
The number of markers used was sufficient for capturing the peaks and gradients of the deformations shown in figure 4. However, the sufficiency of the number of markers should be evaluated quantitatively with respect to how much information can be obtained with them. Thus, facial deformations should be measured using a larger number of markers in order to investigate the relationship between the number of markers and the loss of deformation information.
Several other problems remain to be solved in order to verify the effectiveness of the aforementioned simple rule in actual robots. Firstly, 96 control points are too many for a robot face. In order to reduce the number of control points while affecting the replication error as little as possible, neighboring skin points that move similarly should be identified and grouped into areas whose skin points move together. Secondly, the relationship between the movements of the robot facial surface and the deformation mechanisms should be investigated to determine the mechanism movements that produce the desired movements of the facial surface. This is necessary because the original movement of a deformation mechanism installed on the inner side of a thick and soft robot skin is not directly reflected in the surface deformation, owing to the skin's deformability. Thirdly, a flow vector weighting process is necessary in the PCA in order to obtain a more effective design guideline, because each flow in each facial expression likely influences human impressions differently. If a flow in a facial expression strongly affects the impression on a human, that flow should be prioritized when deriving the PC so that its replication error is reduced. Human sensitivity to each flow should be examined in future research.
In addition, we made several observations that may contribute to effective robot face design:
1. The longest flow length in the lower face was 22 mm.
2. A large (or small) area of skin moved simultaneously when at least one point on the skin moved substantially (or slightly). The deformation distributions were unimodal and covered almost the entire surface in the case of several AUs.
3. The flow length variation with respect to the principal direction was greatest around the mouth and jaw (the standard deviation was 7.7 mm).
These observations provide several guidelines for designers. Observation 1 suggests that designers should use elastic skin materials that can accommodate deformations of up to 22 mm without breaking and without exerting excessive resisting forces on the facial actuators. Furthermore, designers can estimate the required deformation power at each facial point from the longest flow length at each point shown in figure 5; this can prevent deformation mechanisms from being over-sized and over-powered. Observation 2 indicates that designers should choose the elastic modulus distribution of the skin such that large local deformations produced by an actuation mechanism cause realistic wide-range deformations, as shown in figure 4. Observation 3 implies that the necessary position control accuracy is not the same at all positions on the face. The highest position control accuracy is necessary around the mouth and jaw, while the accuracy can be lower at other locations such as the cheeks. This suggests that designers can install low-cost deformation mechanisms with low control accuracy at several parts of the robot face instead of high-cost mechanisms with high control accuracy.
The measured flow fields also facilitate the evaluation of how well developed robot faces replicate human facial expressions. Traditionally, the performance of android facial robots has been assessed mainly based on the number of degrees of freedom, the rate at which humans correctly identify the robot's facial expressions, or the correspondence between the movements of only a few points on the human and robot faces [1,4,12,13,17,18,22,24,25,31]. However, these conventional evaluation methods cannot indicate which parts should be corrected, or how, in order to enhance the performance effectively. In contrast, by comparing high-density flow vector fields for human and robot facial expressions using our data, designers should be able to obtain rich quantitative data for correction. Although the deformations around the eyes and brows are obviously important in a face, we excluded the upper face from the measurement and analysis in this study. This is because our preliminary investigations indicated that it is difficult to measure deformations around the eyes using ink markers, as several markers become buried in wrinkles. In future research, we will measure and analyze the deformations of the upper face with other types of markers and with algorithms that compensate for markers hidden in wrinkles.
One of the major limitations of this study is the number of human participants. We recruited only one participant because our main focus was to verify whether effective guidelines can be derived by measuring the flow fields in detail and conducting a PCA on them. The generality of the rules thus obtained can be evaluated by applying our method to several participants. Furthermore, flow field analysis with several participants may provide a deeper understanding of human facial expressions by allowing comparison of deformation properties across sexes, ages, and cultures.
Another major limitation of this study is that our measured flow vectors were two-dimensional and lacked depth. In order to analyze facial expressions more accurately, three-dimensional flow vectors should be measured. Furthermore, by tracking the trajectories of markers with a three-dimensional motion capture system, the analysis of the dynamic properties of facial deformations, e.g. the change in speed and difference in motion start timings between facial regions, will become possible, which would aid in improving the actual expressions of robot faces.

Conclusion
We found that each facial point of the human lower face moves almost exclusively in its own principal direction. This simple rule suggests that a small and simple unidirectional deformation mechanism at each point on a robot lower face might be sufficient. In this case, the estimated replication errors for the 16 investigated facial expressions on a robot face were within 3 mm and 1 mm for approximately 90% and 60% of the flow vectors, respectively.
Although further analyses are required, our approach to replicating several facial expressions with a single robot face is promising. The relationship between emotional expressions and AUs, based on our flow field analysis of AUs, is a potential area of study. The FACS explains that emotional expressions are realized anatomically through combinations of AUs [7]. A method for representing these combinations in the flow field should be investigated in order to replicate such complex expressions on a robot face.