The Possibilities of Classification of Emotional States Based on User Behavioral Characteristics

Classifying users' emotions from their behavioral characteristics, namely their keyboard typing and mouse usage patterns, is an effective and non-invasive way of gathering user data without limiting their ability to perform tasks. To gather data for the classifier we used Emotnizer, an application we developed for this purpose. The output of the classification falls into 4 emotional categories from Russell's circumplex model: happiness, anger, sadness and the state of relaxation. The reference database was built from a sample of 50 students. Multiple regression analyses gave us a model that allowed us to predict the valence and arousal of the subject from keyboard and mouse input. Upon re-testing with another group of 50 students and processing the data, we found that Emotnizer can classify emotional states with an average success rate of 82.31%.


I. Introduction
The classification of a user's emotional state is among the most debated topics in pedagogy, psychology and computer science. In computer science, the classification problem arises mostly in cognitive systems and human-computer interaction (HCI). In recent years, multiple methods have emerged to describe and recognise the emotional states of human subjects [1]. These methods are based not only on visual cues such as facial expressions or gestures captured by a webcam [2], auditory cues such as pitch and tone of voice [3], and physiological measurements (skin resistance, heart rate, body temperature), but also on behavioural characteristics. Standard methods usually gather data invasively: the user is aware of how the data is gathered, which can distort the results. One example of this influence is facial expression recognition, where the region of interest is extracted and the emotional state is then classified from instructed expressions. Some authors have already pointed out this phenomenon [4], [5].
Methods of classifying the user's emotional state based on behavioural characteristics are a relatively new and still very topical area of scientific research. Students of applied informatics focus their expertise mainly on programming. A particular feature of teaching programming is that students must not only be able to program a specific application from the ground up but must also be able to read code, look for errors and correct them. Based on pedagogical research (interviews with students and questionnaires), we found that reading and correcting faulty code is a stress factor for some students and evokes different emotions (varying degrees of anger, joy, frustration, etc.). To help them in the educational process and to set a suitable learning style for them, we programmed the Emotnizer application and introduced it into the educational process.
We have divided the paper into several sections. The Related Work section presents the theoretical bases of emotional state classification, basic models of emotions and their authors, as well as an analysis of the current state of research. We examined how researchers have studied emotions through behavioural characteristics, namely the use of a computer keyboard and mouse. The Material and Methods section describes the methods used in Emotnizer, an application that collects raw data by capturing real-time keyboard and mouse events using the JNativeHook library. The application groups this data and stores it for further use in SQLite, a relational database system contained in a small library. In the Experiment section, we describe the creation of a reference database for classifying the emotional state based on the user's behavioural characteristics. In the Results section, we present the results of the experiments and describe how the overall success of the emotional state classification was calculated. The results are explained and compared with those published by other authors in the Discussion and Conclusion section.

II. Related Work
Emotions are complex psychological processes caused by chemical changes in the brain associated with subjective experiences (the way we experience emotions), physiological reactions (the way our bodies react to emotions) and behavioural or expressive reactions (the way we respond to emotions) [6].
The classification of emotions is how we distinguish one emotion from another. There are two distinct models for expressing emotions:
• categorical model - the subject must usually select one emotion (or emotional category) from a set of emotions, the one that best represents the feeling being conveyed. In these models, the emotions are precisely labelled (categorised). The most significant categorical model is the Ekman model of classification [7], [8], which is currently referred to by many authors [2], [9], [10].
• dimensional model - human emotions are usually defined by two or three emotional dimensions. Most of these models include a degree of valence (pleasant-unpleasant) and a degree of arousal (arousing-subduing). The most influential model currently in use is Russell's circumplex model [11]. Dimensional emotional models suggest that all affective states are caused by the neurophysiological system [12].
An interesting issue in emotional classification is the use of currently available technology [13]. The technology most commonly used to collect emotional data is facial expression recognition with widely available webcams. With this approach, recognition proceeds in three basic phases: 1. face detection, 2. extraction of the region of interest, 3. classification of the emotional state. This procedure was established by Kanade in 1973 and is still in use today [14]-[17].
Facial expression recognition technology is based on the knowledge that classifying emotional state from visible facial expressions predicts people's emotional reactions to presented imagery. Psychological studies show that experienced emotions can be influenced by various visual stimuli [18]. These induced emotions are placed on a gradient of emotional valence (low level: negative emotion, high level: positive emotion). By visual stimuli we mean the level of contrast, colour, texture, the position of an object in the scene, etc. [19], [20].
Other technologies for emotion recognition acquire physiological data using sensors, such as the electrocardiograph (ECG), blood volume pulse (BVP), electroencephalograph (EEG) and other electrical signals, e.g. galvanic skin response (GSR), to detect and recognise emotional states. EEG and GSR are reported to be the most reliable physiological signal sources for emotion recognition and processing [18], [21], [22].
The main hurdle in current attempts to classify emotional state from a subject's speech patterns is not only the shortage of reference databases (such as RAVDESS, SAVEE, EMO-DB) but mainly the variability of pitch and intensity, which can differ greatly across cultures [23], [24]. Standard emotion classification from speech typically employs one of two approaches: discrete categories or emotional dimensions. Morgan found that gender also affects the ratings of emotional speech and must be considered alongside stimulus factors in the design of future studies of emotion [25].
Nonverbal communication is a manifestation of human behaviour and involves various ways of transferring information, such as human activity, hand movements, facial expressions, etc. It is often used as an essential means of expressing attitudes, feelings and emotions. It plays an important role in multimodal interactions between the human and computer systems [26].
The classification of emotional state based on behavioural characteristics is still sparsely covered in the professional literature. Most research builds on Epp, Lippold and Mandryk [27], who tried to recognise emotional states from keyboard typing dynamics by logging the duration of keypresses and typing latency. Based on the user's computer activity level, the program prompted the user to self-evaluate throughout the day. The emotional state questionnaire contained 5-point Likert-scale questions about the user's current emotional state for each of the 15 recognised emotional states. The best results of this experiment include two-stage classifiers for self-confidence, hesitancy, nervousness, relaxation, sadness and fatigue with an approximate accuracy of 77.4%. The results also show a degree of recognition of anger and excitement with an accuracy of 84% [27].
A similar experiment was carried out in 2015 by Lee, building on the knowledge gained from Epp, Lippold and Mandryk [27]. The experiment was carried out on a group of 52 people, aged 20 to 26 years (44 men and 8 women). During the experiment subjects wore headphones and were instructed to type "748596132" immediately after hearing each recording from the International Affective Digitized Sounds, 2nd edition (IADS-2). The experiment was based on a simple dimensional view of emotions, which assumes that an emotion can be defined by values on two strategically chosen dimensions: valence and arousal. The Self-Assessment Manikin (SAM) method was used to assess these two dimensions of emotional space. The results of this fixed-target-text experiment support the hypothesis that keystroke duration and latency are affected by arousal. The keypress duration is shorter when arousal is high (106.70 ms ± 23.80) than when arousal is low (108.76 ms ± 24.52). This result indicates an increase in the duration of keypresses when people are tired, sad or bored [28].
In his work, Pentel stated that several studies on mouse movement and emotions suggest a link between the two. He carried out 2 experiments to test this hypothesis. In his first experiment, he created a simple online computer game that collected data from a user's mouse. The screen was filled with randomly placed buttons labelled with numbers from 1 to 24. The user's task was to click all the buttons in the correct order as quickly as possible. As it turned out, some buttons were harder to find and caused confusion in users, which allowed the identification of mouse patterns associated with this mental state. Users who reported a confusion level of 5 to 7 on the self-assessment scale (high confusion) located buttons more slowly than those who reported values of 1 to 3 on the Likert scale. A model built with the C4.5 machine learning algorithm predicted confusion from this data with an accuracy of 84.47%. A further finding of this experiment was that about half of the users were unable to specify exactly what they were feeling [29].
For this reason, in his second experiment Pentel used the same data collection procedure as in the first and collected new data from 400 more game playthroughs. In total, the experimental group consisted of 282 individual users between the ages of 12 and 52. As many as 21984 records were collected, each representing mouse movement between two button clicks. In this experiment, three types of data were stored (coordinates, button sizes, and colours). Users reported several kinds of emotions: confusion, frustration, shame and satisfaction. The strongest emotions were confusion and frustration, and these were associated with tasks during which the user was unable to find the next button for a long time. Pentel tested four popular machine learning algorithms: Logistic Regression, Support Vector Machine, Random Forest and C4.5. The most successful algorithm was Random Forest with a 93% success rate [29].

III. Material and Methods
The main purpose of this paper is to describe the development of software that automatically recognizes the user's emotional state from keyboard and mouse input. The Emotnizer application can recognize 4 basic types of emotions according to the dimensions of arousal and valence. The application was programmed in Java. For data collection, storage and presentation it takes advantage of:
• JNativeHook - a Java library that allows the program to read all keyboard and mouse events globally.
• SQLite - a small, fast, standalone, highly reliable and full-featured relational database system contained in a relatively small library written in C.
• JFreeChart - a graphing library for Java. It supports the creation of pie charts (2D and 3D), bar charts (horizontal and vertical, regular and stacked), line charts, point charts, etc.
Had we collected and saved all available data from every mouse event in an unsynchronized way, we would have gathered an enormous amount of data with no informative value. Such raw data needs to be grouped by action type (scrolling, mouse click, etc.) and unit of time (data collection period).

A. Mouse Events
Mouse click frequency - we use the nativeMousePressed method (from the JNativeHook library), which calls our thread method at the moment a mouse button is pressed and increments a counter variable by one. The thread sleeps for 5,000 milliseconds (5 seconds), and during this time only the number of mouse clicks is counted. When the thread wakes up, it divides the number of mouse clicks by the time the thread slept (5 seconds).
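The windowed counting described here can be sketched in a few lines. This is an illustrative Python sketch, not the Emotnizer code (which is written in Java on top of JNativeHook); the class name `ClickCounter` and its methods `on_press` and `flush` are hypothetical, and the lock is our assumption about how the listener and the timer thread would share the counter.

```python
import threading


class ClickCounter:
    """Counts button presses and reports clicks per second for each window."""

    def __init__(self, window_seconds: float = 5.0):
        self.window_seconds = window_seconds
        self._count = 0
        # Assumption: the event listener and the timer run on different threads.
        self._lock = threading.Lock()

    def on_press(self) -> None:
        # Hypothetical hook: would be called from the native mouse-pressed callback.
        with self._lock:
            self._count += 1

    def flush(self) -> float:
        # Called when the collection window ends: report the rate, reset the counter.
        with self._lock:
            frequency = self._count / self.window_seconds
            self._count = 0
        return frequency
```

Ten presses within one 5-second window would give a frequency of 2 clicks per second, and the counter then starts again from zero for the next window.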
Mouse button hold duration - the time from the moment a mouse button is pressed until it is released. Two methods are used: nativeMousePressed and nativeMouseReleased. When a mouse button is pressed, the nativeMousePressed method calls our thread method and passes it the button code (left, right or scroll-wheel button) and the current time in milliseconds. When the button is released, nativeMouseReleased is invoked, which calls the thread method to calculate the button press duration. It also takes 2 parameters: the button code and the current time in milliseconds. The duration is then the difference of the 2 values (release time minus press time). However, since the thread sleeps for 5 seconds, the value it reports when it wakes up is the sum of all button hold times in that window divided by the number of holds, i.e. the arithmetic mean according to (1). This result is stored in the database.

$$\bar{t} = \frac{1}{n} \sum_{i=1}^{n} \left( t_i^{\text{release}} - t_i^{\text{press}} \right) \qquad (1)$$

Here n is the number of button holds completed in the window, and t_i^press and t_i^release are the press and release timestamps of the i-th hold.

To capture scroll-wheel activity we use the nativeMouseWheelMoved method. The principle is the same as in the previous methods: we create a thread with a predefined sleep time of 5000 milliseconds (5 s). When the method detects scroll wheel movement, it calls the thread method and passes it the rotation as a positive or negative value (scroll up or down). When the thread wakes up, it holds the counts of positive and negative scrolls. The following pair of values is taken as output:
• the total amount of scrolling - the sum of the absolute values of these counts divided by the sleep time (5 seconds in our case). For example, with a positive rotation of 15 and a negative rotation of −5 the result is (abs(15) + abs(−5)) / 5 = 4,
• the sum of counts - the number of positive rotations added to the number of negative rotations, e.g. 15 + (−5) = 10.
The further this number is from zero, the more decisively the user chose to move up or down the screen. The closer it is to zero, the more indecisive the user was (moving chaotically).
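The two scroll-wheel outputs can be reproduced with a short calculation. The following Python sketch is illustrative only and is not taken from the Java application; the function name `scroll_metrics` and the representation of each wheel event as a signed integer are assumptions.

```python
def scroll_metrics(rotations, window_seconds=5.0):
    """Return (total scrolling per second, signed sum of rotations) for one window.

    `rotations` holds one signed value per wheel event:
    positive for one direction, negative for the other.
    """
    positive = sum(r for r in rotations if r > 0)
    negative = sum(r for r in rotations if r < 0)
    total_rate = (abs(positive) + abs(negative)) / window_seconds
    signed_sum = positive + negative
    return total_rate, signed_sum


# The example from the text: 15 positive and 5 negative rotations in 5 seconds.
events = [1] * 15 + [-1] * 5
print(scroll_metrics(events))  # (4.0, 10)
```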
The speed of the cursor is the number of pixels per second that the cursor travels from point A to point B. We take advantage of two methods:
• nativeMouseMoved - outputs the current coordinates of the cursor when it moves,
• nativeMouseDragged - outputs the current cursor coordinates, like nativeMouseMoved, but while the cursor is dragged rather than just moved. This occurs when the user presses and holds a mouse button while moving the cursor.
The first movement of the mouse cursor sets the starting coordinates of the first point through a further thread method. The thread sleeps for 100 milliseconds (0.1 seconds), wakes up, records the coordinates of the new point, calculates the distance between the two points according to (2) using the Pythagorean theorem, and overwrites the coordinates of the first point with the new coordinates.

$$|AB| = \sqrt{(x_B - x_A)^2 + (y_B - y_A)^2} \qquad (2)$$

Each distance is added to a local variable, and after the thread has repeated this 50 times, the sum of all distances is divided by the data collection time (5 seconds). The distance between two points on the screen, A [xA; yA] and B [xB; yB], where the screen is a 2D plane, is the length of the hypotenuse of the right triangle ACB (Fig. 1).
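As a minimal sketch of this calculation, the Python function below sums the distances between consecutive cursor samples and divides by the collection time. It is not the application's Java code; the name `cursor_speed` and the list of (x, y) samples taken every 0.1 s are illustrative assumptions.

```python
import math


def cursor_speed(samples, window_seconds=5.0):
    """Average cursor speed in pixels per second over one collection window.

    `samples` is a list of (x, y) cursor positions taken at regular intervals.
    """
    total_distance = 0.0
    for (xa, ya), (xb, yb) in zip(samples, samples[1:]):
        # Length of the hypotenuse between two consecutive positions.
        total_distance += math.hypot(xb - xa, yb - ya)
    return total_distance / window_seconds


# Two moves of 5 pixels each and one pause within a 5-second window.
print(cursor_speed([(0, 0), (3, 4), (3, 4), (6, 8)]))  # 2.0
```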

B. Keyboard Events
Similarly to the raw mouse events data, raw keyboard events data have to be grouped by the type of action and unit of time.
To determine the frequency of keystrokes (the number of keystrokes per second) we use the nativeKeyPressed method, which calls a specific thread method and increments a variable by one, as when recording the mouse click frequency. The thread sleeps for 5 seconds, wakes up and divides the number of keystrokes by the sleep time (5 seconds). As a keyboard has far more keys than a mouse has buttons, we decided to divide the keys into three basic categories:
• modifier keys - Shift (character capitalisation), Ctrl and Alt (keyboard shortcuts), AltGr, Win (opening the Start menu and keyboard shortcuts in Windows), Fn (volume, brightness control, etc.),
• erasing keys - Delete (deletes the character after the cursor) and Backspace (deletes the character before the cursor),
• all other keys - alphanumeric keys and any other key apart from modifier and erasing keys.
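The grouping of keys can be sketched as a simple lookup. The Python example below is illustrative and not part of the Java application: the key names are plain strings chosen for readability (the application works with native key codes), and reporting a separate frequency for each category is our assumption.

```python
from collections import Counter

# Hypothetical key names standing in for native key codes.
MODIFIER_KEYS = {"Shift", "Ctrl", "Alt", "AltGr", "Win", "Fn"}
ERASING_KEYS = {"Delete", "Backspace"}
CATEGORIES = ("modifier", "erasing", "other")


def key_category(key_name):
    """Map a key to one of the three categories described in the text."""
    if key_name in MODIFIER_KEYS:
        return "modifier"
    if key_name in ERASING_KEYS:
        return "erasing"
    return "other"


def keystroke_frequencies(key_names, window_seconds=5.0):
    """Keystrokes per second for each category over one collection window."""
    counts = Counter(key_category(name) for name in key_names)
    return {category: counts[category] / window_seconds for category in CATEGORIES}
```

For the five keystrokes `["a", "Shift", "b", "Backspace", "c"]` in a 5-second window, this would report 0.2 keystrokes per second for modifier keys, 0.2 for erasing keys and 0.6 for all other keys.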
We use two methods from the JNativeHook library to determine the key holding period (the time that starts when a key is pressed and ends when it is released):
• nativeKeyPressed - called at the moment a key is pressed; it calls the thread method with the key code and the current time in milliseconds as parameters.
• nativeKeyReleased - executed when a pressed key is released; it passes the key code and the current time in milliseconds as parameters to the thread method.
Since multiple keys can be pressed at the same time, the key code parameter is essential for the calculation. When a key is released, its code is matched in the list of held keys, the timestamp of the keypress is found and subtracted from the key release timestamp, and the result is converted from milliseconds to seconds. Partial results accumulate while the thread sleeps. When the thread wakes up, the total result is obtained by dividing the sum of all key holding periods by the number of key holds. Since all keys are divided into three basic categories (modifier, erasing and all other keys), the hold time results are divided into these categories as well.
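Matching press and release events by key code can be sketched as follows. This Python class is an illustration rather than the application's Java implementation; the name `KeyHoldTracker`, the handling of repeated press events for a key that is already held, and the value 0.0 returned for a window without holds are all assumptions. The split into the three key categories is omitted for brevity.

```python
class KeyHoldTracker:
    """Pairs press and release events by key code and averages hold times."""

    def __init__(self):
        self._pressed = {}          # key code -> press timestamp in milliseconds
        self._total_seconds = 0.0   # sum of hold times in the current window
        self._holds = 0             # number of completed holds in the window

    def on_press(self, key_code, time_ms):
        # Assumption: repeated press events for a held key keep the first timestamp.
        self._pressed.setdefault(key_code, time_ms)

    def on_release(self, key_code, time_ms):
        pressed_at = self._pressed.pop(key_code, None)
        if pressed_at is None:
            return  # release without a recorded press; ignored in this sketch
        self._total_seconds += (time_ms - pressed_at) / 1000.0
        self._holds += 1

    def flush(self):
        # Mean hold time in seconds for the window, then reset the accumulators.
        mean = self._total_seconds / self._holds if self._holds else 0.0
        self._total_seconds = 0.0
        self._holds = 0
        return mean
```

With two overlapping keys held for 100 ms and 200 ms, `flush()` would return a mean hold time of about 0.15 seconds.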
The pause between keystrokes is the period that starts when the last key is released and lasts until the next key is pressed. Fig. 2 shows the key holding period as well as the pause between keystrokes. To determine the pause we reused the thread that calculates the frequency of keystrokes and the key holding period. After a key was released, we checked whether the list of held keys was empty, and if so, we saved the current time in milliseconds in a local variable. When the next keystroke occurred, we checked whether it happened during a pause, i.e. whether a key release timestamp had been recorded before it. If so, the difference between the two values in milliseconds was sent to the thread, which accumulated the sum of all such differences and their count. After the thread woke up, the average pause was calculated from the sum of the pauses and their number.

IV. Experiment
We asked 52 people (47 men and 5 women) with an average age of 21.46 years to undergo profiling by our Emotnizer application. The experiment was supervised by a psychologist. Not all participants met all the requirements (for example, some did not transcribe the entire text), and we therefore excluded their data from the overall testing. The number of people who met all requirements fell to 50 (45 men and 5 women) with an average age of 21.52 years. Most of them are university students of applied informatics and therefore work with computers very well. Of the total, 38% said they could touch-type. In addition, 20% reported using a computer for less than 5 hours a week, 44% for 6 to 20 hours a week, and 36% for more than 20 hours a week. As many as 96% said they prefer the mouse wheel to the Home, End and similar keys when scrolling vertically. In this step we created the source data for the reference database (user profile). Based on this database it is possible to determine the user's behavioural characteristics, i.e. how they work with text (reading it, rewriting it), and to compare them with the data obtained in the experiment.
To obtain the data, we used three tests prepared for us by a psychologist: 1. a multiple-choice questionnaire, 2. rewriting a text, 3. classification of the user's emotional state by self-assessment.
The multiple-choice questionnaire first required the users to enter their username, age, gender, dominant hand (right or left), hours of computer use per week, etc. While users were solving this task, their work with the computer mouse was recorded. The users' second task was to transcribe a predefined text (Fig. 3) that was intended to evoke anger or fear. The text consisted of over 800 characters, including spaces. The users had to read the text and rewrite it correctly in a dialogue box. All events were measured and recorded during the transcription. Finally, we calculated the number of errors the users made when rewriting the text. The number of errors was measured with the matrix method for the Levenshtein distance between two strings (the original text and the text rewritten by the user). The result was the number of characters that differed from the predefined text. The users' third task was to evaluate their emotional state with the Self-Assessment Manikin, rating the valence (SAM1) and arousal (SAM2) they felt during the transcription on a scale of 1 to 5.
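For readers unfamiliar with the error measure, the following is a standard dynamic-programming computation of the Levenshtein distance. It is a generic textbook sketch in Python, not the code used in Emotnizer; it keeps only two rows of the matrix to save memory.

```python
def levenshtein(original, rewritten):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `original` into `rewritten`."""
    # previous[j] is the distance between the first i-1 characters of
    # `original` and the first j characters of `rewritten`.
    previous = list(range(len(rewritten) + 1))
    for i, char_a in enumerate(original, start=1):
        current = [i]
        for j, char_b in enumerate(rewritten, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```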

V. Results of Experiment
After the data collection phase we had the results from the test group. Table I shows the descriptive characteristics recorded in the database, using one of the participants for demonstration. In the database, the descriptive characteristics are stored in a single row for each user; for clarity, we list them below one another. From these descriptive characteristics we computed average descriptive statistics for the 50 users and continued with multiple regression. We could not use simple linear regression because of the number of parameters and their independence. In the multiple regression we modelled SAM1 (representing valence) and SAM2 (representing arousal) as functions of all recorded events. The multiple regression was calculated in MS Excel (Data Analysis tool). We reduced valence from the original 5-point scale to two states, positive and negative (arousal was reduced similarly), because of the neutral impact of the obtained data on the statistical evaluation.
According to Russell's model, the emotional state of happiness occurs when the user reaches a high value of both valence and arousal. The state of relaxation is defined by low arousal and high valence. The emotional state of anger is characterized by high arousal and low valence, while sadness is characterized by low arousal and low valence. In the reduction we evaluated happiness and the state of relaxation as positive valence, and anger and sadness as negative valence. We reduced arousal similarly.
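The reduction to two binary dimensions amounts to a four-way lookup over the quadrants of the valence-arousal plane. A minimal Python sketch follows; the function name `classify_emotion` and the boolean inputs are illustrative assumptions, and the application itself reports percentage shares rather than a single label.

```python
def classify_emotion(valence_positive: bool, arousal_high: bool) -> str:
    """Map reduced valence and arousal to one of the four emotional states."""
    if valence_positive:
        return "happiness" if arousal_high else "relaxation"
    return "anger" if arousal_high else "sadness"


print(classify_emotion(valence_positive=False, arousal_high=True))  # anger
```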
When comparing the input parameters we obtained descriptive statistics for the individual coefficients. Table II shows the results of the multiple regression for SAM1 and SAM2. These data were multiplied according to (3), where β are the coefficients calculated by multiple regression and X are the parameters from the keyboard and mouse. The results are the predicted values (predicates) of SAM1 and SAM2.
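Applying regression coefficients to the measured parameters is a weighted sum. The Python sketch below assumes the usual multiple regression form with an intercept term, which the text does not state explicitly; the function name `predict_sam` and the numbers in the example are made up for illustration and are not values from Table II.

```python
def predict_sam(coefficients, features, intercept=0.0):
    """Predicted SAM value: intercept + sum of beta_i * X_i.

    Assumption: a standard linear model; whether the original
    formula includes an intercept is not stated in the text.
    """
    if len(coefficients) != len(features):
        raise ValueError("one coefficient is needed per feature")
    return intercept + sum(beta * x for beta, x in zip(coefficients, features))


# Made-up coefficients and feature values, for illustration only.
print(predict_sam([0.5, -0.25], [2.0, 4.0], intercept=1.0))  # 1.0
```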
To determine the overall success of the classification we applied the SAM1 and SAM2 predicates in the Emotnizer application we had designed. We verified the overall success of the classification under the guidance of a psychologist by giving another 50 users (average age of 21.15) a PDF file with instructions for using the application. The file gave detailed instructions for creating a profile and explained how the application recognizes emotions. Users were told in advance to read the PDF file first. After reading and understanding it, they were to click the Start button and begin the tasks. After the Start button was pressed, the program randomly chose 5 words from a set of 20. The users' task was to open a web browser and look up the first word generated. They were to open the first search result and copy at least 4 consecutive sentences from it into any text editor, and then to transcribe the copied text below itself. As a result, there should be two identical texts one below the other in the text editor (the reference database was created in the same way: users transcribed the predefined text into the dialogue box). In this case the choice of text was up to the user, who could search for any text, unlike in the profile creation task. The same applied to the second word chosen by the application. For each of the last three words, users had to think of a short (one-sentence) definition of the word and write it in the text editor. After completing all tasks, users were to click the Stop button.
Upon task completion, a program dialogue box appeared showing the users a graph of the emotion recognition results produced by the program. The SAM1 and SAM2 values, on a scale of -1 to 1, were converted to percentage shares of all emotion categories according to formula (4). As an example, we chose the formula for the share of the emotion of happiness.
Finally, users were asked to objectively evaluate the success rate of the detected emotions for all categories (happiness, anger, sadness and relaxation) on a scale of 1 to 9 (9 is best) and to submit the results (Fig. 4). Table III shows the descriptive characteristics obtained after testing, using one of the participants for demonstration. From the predicted emotions and their ratings we calculated the percentage success rate of each emotion for each user. The partial success rates were converted to percentage equivalents of the prediction accuracy reported by the respondents, as represented by (5).
By summing all the partial success rates of the emotional state rating we obtained the overall success rate for that user. By summing the overall success rates of the individual users and dividing by the number of users we obtained an overall average success rate of Emotnizer classification of 82.31% (Table IV).
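As a consistency check, the overall figure can be reproduced from the per-emotion averages quoted for Table IV in the Discussion and Conclusion section (anger 85.25%, relaxation 85.00%, sadness 80.25%, happiness 78.75%). The Python sketch below assumes the four categories are weighted equally; the function name `mean_success_rate` is illustrative.

```python
def mean_success_rate(rates):
    """Unweighted mean of per-emotion success rates, in percent."""
    return sum(rates.values()) / len(rates)


table_iv_averages = {
    "anger": 85.25,
    "relaxation": 85.00,
    "sadness": 80.25,
    "happiness": 78.75,
}
print(round(mean_success_rate(table_iv_averages), 2))  # 82.31
```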

VI. Discussion and Conclusion
Recognizing the emotional state of a user from their behavioural characteristics, in particular while they work with a mouse and a keyboard, is a relatively new and insufficiently explored topic. Every person has a different typing speed, and not everyone is used to working with computers. Therefore, when creating the Emotnizer application we had to work with a so-called user profile. Using the user profile we obtained the user's behavioural characteristics and, by multiple regression, calculated the SAM1 (valence) and SAM2 (arousal) predicates. From these we designed an algorithm for classifying emotional states. Emotnizer is currently able to recognize 4 types of emotional states (from Russell's circumplex model): happiness, anger, sadness, and a state of relaxation.
A classifier is generally considered successful if it has an average classification success rate of at least 80%. According to the valence and arousal predicates, Emotnizer is currently able to classify emotional states with a success rate of 82.31%. Epp, Lee and Pentel achieved similar values in their research [27]-[29]. In his research, Epp recorded only key holding periods and latency and did not examine how respondents work with a computer mouse. In the resulting classification he found that his classifiers could better identify anger and excitement (84%) but had difficulty correctly determining self-confidence, hesitation, nervousness, relaxation, sadness or fatigue (77.4%). Based on the data in Table IV (average values for each emotional state), our classifier determined anger with a probability of 85.25%, the state of relaxation with 85.00%, sadness with 80.25% and happiness with 78.75%. Thus, we achieved classification values very similar to Epp's. Pentel also addressed the state of confusion in his classification. His C4.5 model predicted confusion with an accuracy of 84.47%; in his second experiment, which compared Logistic Regression, Support Vector Machine, Random Forest and C4.5, he reached a success rate of 93% with Random Forest. However, according to Rosenberg and Ekman [30], confusion, frustration or shame cannot be regarded as emotions but rather as manifestations (effects) of a particular emotional state.
We currently cover 4 basic emotions. These emotions are found in both Ekman's classification and Russell's circumplex model of affect. We developed a classification algorithm for 4 basic emotions because the larger the number of emotions, the lower the overall success of the classification algorithm. Epp, Lee and Pentel also encountered this problem. We want to expand our solution in the future. We assume that it is possible to increase the success of the classification and at the same time recognize (and classify) a greater number of emotional states. For example, with a sufficiently long text it would be possible to observe the course of individual emotions over a certain time interval. Emotions would again be classified on the basis of predicates. The role of the psychologist would be to tag (evaluate) individual words and sentences of the text in terms of sentiment, which would allow us to clearly identify the degree of success in classifying individual emotional states.
Our next goal is to integrate Emotnizer into the learning process. The whole application was developed to determine why students of applied informatics are unsuccessful in solving certain programming tasks. We want to continue to improve the application and use it to determine how students behave when programming:
1. They must first read the task assignment (induction of emotion).
2. They propose a solution and start programming (change of emotional state).
3. The application will detect programming errors (number of corrections in the code) and monitor the emotional state of the students (for example, whether frustration affects their programming skills and the end result).
4. Based on the behavioural characteristics obtained, we will work with a psychologist to propose a suitable solution to any problem identified.

M. Magdin
He works as an assistant professor at the Department of Informatics. He deals with the theory of teaching informatics subjects, mainly the implementation of interactive elements in e-learning courses, face detection and emotion recognition using a webcam, and speech recognition. He participates in projects aimed at the use of new competencies in teaching and in projects dealing with learning in virtual environments using e-learning courses.

D. Držík
He is a student at the Department of Informatics. He deals with programming in Java, Python and C. He specialises in the use and implementation of various algorithms for face detection, the acquisition of user behavioural characteristics and emotion classification.