High-wearable EEG-Based Detection of Emotional Valence for Scientific Measurement of Emotions

An emotional-valence detection method for a highly wearable EEG-based system is proposed. Valence detection occurs along the interval scale theorized by the circumplex model of emotions. The binary choice, positive valence vs negative valence, represents a first step towards the adoption of a metric scale with a finer resolution. Wearability is guaranteed by a wireless cap with conductive-rubber dry electrodes and 8 data acquisition channels. Experimental validation was carried out on 25 volunteers without depressive disorders. The metrological reference was built by combining the ratings of a standardized set of pictures from the OASIS dataset with the results of the Self-Assessment Manikin questionnaire. Two different strategies for feature extraction were compared: (i) based on a-priori knowledge (i.e., Hemispheric Asymmetry Theories) and (ii) automated. A pipeline of a custom 12-band Filter Bank and a Common Spatial Pattern algorithm is the proposed method for automated feature extraction. Four machine learning classifiers were tested to validate the proposed method in discriminating two classes (high or low emotional valence). An intra-individual average accuracy of 96.2 % was obtained by a shallow artificial neural network, while K-Nearest Neighbors yielded an inter-individual accuracy of 80.3 %. Considering the wearability and the well-founded metrological reference, these results represent the state of the art in emotional valence detection.


Introduction
The word emotion derives from the Latin "emotus", which means to bring out. Technically, emotion is the response to imaginary or real stimuli characterised by changes in an individual's thinking, physiological responses, and behaviour 1 . In the circumplex model 2 of emotion, valence denotes how much an emotion is positive or negative along an interval scale 3 . Discrimination of emotional valence is a broad issue widely addressed in recent decades, affecting the most varied sectors and finding application in multiple domains. Application fields include, for example, car driving 4,5 , working 6 , medicine 7,8 , and entertainment 9 .
A further approach to the study of emotions is provided by the discrete emotions model (anger, fear, joy, ...). The latter allows a classification of emotional states through a nominal scale 3 . The circumplex model, on the contrary, allows the measurability of emotions by additive quantities, since it is based on an interval scale 10 .
Several biosignals have been studied over the years for emotion recognition: cerebral blood flow 11 , electrooculographic (EOG) signals 12 , electrocardiogram, blood volume pulse, galvanic skin response, respiration, and phalanx temperature 13 . In recent years, several studies have focused on brain signals. There are many invasive and non-invasive techniques for acquiring them, such as: PET (Positron Emission Tomography), MEG (Magnetoencephalography), NIRS (Near-Infrared Spectroscopy), fMRI (Functional Magnetic Resonance Imaging), EROS (Event-Related Optical Signal), and EEG (Electroencephalography). Among these, electroencephalography offers better temporal resolution. Some portable EEG solutions are already on the market. Currently, a scientific challenge is to use dry electrodes 14,15 and to progressively reduce the number of channels so as to maximize wearability while maintaining high performance.
A metrologically correct EEG-based measurement of emotions poses a reproducibility problem. Often, the same stimulus or environmental condition does not induce the same emotion in different subjects (inter-individual reproducibility loss). Furthermore, the same individual, exposed to the same stimulus after a certain period of time, reacts in a different way (intra-individual reproducibility loss). In psychology research, suitable sets of stimuli were validated experimentally on significant samples and are widely used by clinicians and researchers 16 . However, the problem of standardizing the induced response remains open, also considering, for example, the issue of the cross-cultural generality of perceptions. The effectiveness of the emotion induction can be verified by means of self-assessment questionnaires or scales. The combined use of the validated stimulus rating and the subject's self-assessment can represent an effective strategy towards the construction of a metrological reference for the EEG-based scientific measurement of emotions 17 .
As concerns the measurement model, older approaches predominantly made use of a-priori knowledge. Emotion studies based on the spatial distribution analysis of the electroencephalographic signal were principally focused on the asymmetric behaviour of the two cerebral hemispheres 18 . Two theories, in particular, model the relationship between emotions and asymmetry in different ways. The right-hemisphere theory posits that the right hemisphere is dominant over the left hemisphere for all forms of emotional expression and perception. The valence theory, instead, states that the right hemisphere is dominant (in terms of signal amplitude) for negative emotions and the left hemisphere is dominant for positive emotions. In particular, the valence theory focuses on what happens in the two areas of the prefrontal cortex. The prefrontal cortex plays an important role in the control of cognitive functions and in the regulation of the affective system 19 . The EEG asymmetry allows evaluation of the subject's emotional changes and responses and, therefore, it can serve as an individual feature to predict emotional states 20 .
The most common frequency index for emotion recognition is the so-called frontal alpha asymmetry (α_asim) 21 , commonly defined as α_asim = ln(α_PSD,R) − ln(α_PSD,L), where the parameters α_PSD,L and α_PSD,R are the power spectral densities of the left and right hemispheres in the alpha band. Frontal alpha asymmetry could also predict emotion regulation difficulties from resting-state electroencephalogram (EEG) recordings. Frontal EEG asymmetry effects are quite robust to individual differences 22 .
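Since the original formula is not reproduced here, the sketch below assumes the common log-difference definition of frontal alpha asymmetry; the Welch settings and band integration are illustrative choices, not details reported by the paper.

```python
import numpy as np
from scipy.signal import welch

def frontal_alpha_asymmetry(eeg_left, eeg_right, fs=512, band=(8.0, 13.0)):
    """Log-ratio frontal alpha asymmetry from two EEG channels.

    eeg_left / eeg_right: 1-D arrays (e.g., Fp1 and Fp2 signals).
    Returns ln(alpha_PSD_right) - ln(alpha_PSD_left).
    """
    def alpha_power(x):
        f, psd = welch(x, fs=fs, nperseg=fs)        # 1 s windows -> 1 Hz resolution
        mask = (f >= band[0]) & (f <= band[1])
        return np.sum(psd[mask]) * (f[1] - f[0])    # integrate PSD over the alpha band
    return np.log(alpha_power(eeg_right)) - np.log(alpha_power(eeg_left))
```

A positive index indicates relatively higher right-hemisphere alpha power; identical left and right signals give an index of zero.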
Several modern machine learning systems automatically carry out the feature extraction procedure. Therefore, a very large number of data from different domains, i.e. spatial, spectral or temporal, can be used as input to the classifier without an explicit hand-crafted feature extraction procedure.
Spatial filters usually enhance sensitivity to particular brain sources, to improve source localization, and/or to suppress muscular or ocular artifacts 23 . Two different categories of spatial filters exist: those dependent on data and those not dependent on data. Spatial filters independent of data (Common Average Reference, Surface Laplacian spatial filters) generally use fixed geometric relationships to determine the weights of the transformation matrix. The data-dependent filters, although more complex, allow better results for specific applications because they are derived directly from user's data. They are particularly useful when little is known about specific brain activity or when there are conflicting theories (i.e. theory of valence and theory of the right hemisphere).
The aim of this research is to provide valence emotion detection as a first step towards a scientific measurement of emotional valence based on EEG wearable solutions. In this paper, an instrument for EEG-based valence detection is proposed, with good, reproducible accuracy (80.3 %) in detecting positive or negative valence from 2 s EEG epochs. The architecture exploits a low number of data acquisition channels (8) and dry electrodes to enhance wearability in everyday applications. In Section 2, the state of the art of emotional valence detection is reported. In Section 3, the basic ideas, the architecture, and the data analysis of the proposed method are highlighted. Then, in Section 4, the laboratory test procedure, a statistical comparison between stimuli scores and participants' perceptions, and the experimental validation are reported, by detailing and discussing the results of the compared methods.
State of the art
A multichannel EEG emotion recognition method based on a Dynamical Graph Convolutional Neural Network (DGCNN) was proposed by Song et al. 29 . Experiments were conducted on the 62-channel dataset SEED 41 and on the 14-channel dataset DREAMER 42 . Average accuracies of 90.4 % and 79.95 % were achieved on the SEED dataset in the subject-dependent and subject-independent settings, respectively, for three-class emotion recognition. An average accuracy of 86.23 % was obtained on the valence dimension (positive or negative) of the DREAMER dataset in the subject-dependent configuration. A Multi-Level Features guided Capsule Network (MLF-CapsNet) was employed by Liu et al. for multi-channel EEG-based emotion recognition 37 . Valence (positive or negative) was classified with an average accuracy of 97.97 % on the 32-channel DEAP 43 dataset and of 94.59 % on the 14-channel DREAMER dataset. Subject-dependent experiments were performed. Comparable results were obtained by applying an end-to-end Regional-Asymmetric Convolutional Neural Network (RACNN) on the same datasets in a subject-dependent setup 38 . EEG signals acquired through ad hoc experimental activities were employed in further studies 31 . A 70 % inter-subjective accuracy was reached using an SVM classifier. Self-assessment tests were not administered to the subjects. More recently, several studies have focused on channel reduction to improve the wearability of emotion detection systems 46-55 . Marín-Morales et al. designed virtual environments to elicit positive or negative valence 54 . Images from the IAPS dataset were used as stimuli. The emotional impact of the stimulus was evaluated using a SAM questionnaire. A set of features extracted from electroencephalography (EEG) and electrocardiography (ECG) signals was input into a Support Vector Machine classifier, obtaining a model accuracy of 71.21 % along the valence dimension (binary classification problem).
A 10-channel device was used to record the EEG signal from 15 subjects. Sensors' foams were filled with Synapse Conductive Electrode Cream.
The EEG signals of 11 subjects were used to classify valence (positive and negative) in another study 48 . Pictures from the Geneva Affective Picture Database (GAPED) 56 dataset were used as elicitative stimuli. The accuracy rates of an SVM classifier were 85.41 % and 84.18 % using the whole set of 14 channels and a subset of 10 channels, respectively, in the subject-independent setting. EEG signals were acquired through a wet 14-channel device and no self-evaluation questionnaires were used.
Wei et al. proposed a real-time valence emotion detection system based on EEG measurement, realized by means of a headband coupled with printed dry electrodes 55 . Twelve participants undertook the experiment. Pictures selected from GAPED were used to elicit positive or negative valence. Self-evaluation questionnaires were employed. Two different combinations of 4 channels were tested. In both cases, the subject-independent accuracy was 64.73 %. The highest subject-dependent accuracy increased from 86.83 % to 91.75 % switching from one configuration to the other. The latter two works 48,55 both proposed the use of standardized stimuli. However, in the first one 48 , the concomitant use of self-assessment questionnaires was missing. Moreover, in the second one 55 , self-assessment questionnaires were employed but the results were not compared with the scores of the stimuli used. Failure to compare individual reactions with the standardized stimulus scores negatively impacted the results of the experiment.
Happy or sad emotions were elicited through images provided by the International Affective Picture System (IAPS) by Ang et al. 52 . The EEG signals were acquired through FP1 and FP2 dry electrodes. An Artificial Neural Network (ANN) classifier was fed with discrete wavelet transform coefficients. The best detection accuracy was 81.8 % on 22 subjects. Beyond the use of standardized stimuli, the subjects were also administered self-assessment scales. However, it is unclear how the SAM scores were used and whether the approach is intra- or inter-subjective. Ogino et al. developed a model to estimate valence by using a single-channel EEG device 47 . Fast Fourier Transform, robust scaling, and support vector regression were implemented. EEG signals from 30 subjects were acquired and an average classification accuracy of 72.40 % was reached in the subject-dependent configuration. Movie clips were used to elicit emotional states and SAMs were administered to the participants for rating the valence score of the stimuli. A subject-independent emotion recognition system based on a Multilayer Perceptron Neural Network was proposed by Pandey et al. 51 . An accuracy of 58.5 % was achieved in the recognition of positive or negative valence on the DEAP dataset using the F4 channel.
A reduced number of channels implies a low spatial resolution. Traditional strategies for EEG feature extraction, combined with a-priori knowledge of the spatial and frequency phenomena related to emotions, can be unusable when few electrodes are available. In a previous work by the authors, on a single-channel stress detection instrument, a-priori spatial knowledge drove the electrode positioning 57 . However, signal processing was based on innovative and not yet well-settled strategies. Although proper psychometric tools were adopted for the construction of the experimental sample, the reproducibility of the experiment was adversely affected by the use of non-standardized stimuli. Several of the reported papers pursued an emotion classification task following a discrete emotions model; thus, they cannot be taken into account for the emotion measurement goal. Other papers process EEG signals elicited by non-standardized stimuli, thus penalizing the generalizability of the results. This is the case for the public EEG datasets SEED, DEAP, and DREAMER. The use of audio-visual stimuli guarantees higher valence intensity (positive or negative) with respect to visual stimuli (pictures) 58 . Therefore, the sensitivity of the measurement system increases and the accuracy in emotion detection can be higher. However, the main standardized stimulus datasets (IAPS, GAPED, OASIS, etc.) contain static visual stimuli. Personal memories are a further example of non-standardized stimuli. For instance, the study 53 presents a very interesting data fusion approach for emotion classification based on electroencephalogram (EEG), electrocardiogram (ECG), and photoplethysmogram (PPG) signals. The EEG signals were acquired through an 8-channel device.
A Convolutional Neural Network (CNN) was used to classify three emotions, reaching an average accuracy of 76.94 % in the subject-independent case. However, personal memories of the volunteers were used as stimuli, compromising the reproducibility of the experimental results. Moreover, due to the adoption of the discrete emotion model, the study cannot be taken into account for the emotion measurement goal. Finally, among all the reported studies, only 52,54 combined SAM and standardized stimuli ratings for the construction of the metrological reference.

Methods
The aim of this study is to create a wearable emotional valence detection system starting from the electroencephalographic signal. Prior informed consent for publication of identifying information and images was obtained from all the participants. In this Section, the proposed approach and its experimental evaluation are presented. The ideas behind the proposed method are:
• An EEG-based method for emotional valence detection: Emotional functions are mediated by specific brain circuits and electrical waveforms. Therefore, the EEG signal varies according to the emotional state of the subject and, using suitable algorithms, such a state can be recognized.
• Low number of channels, dry electrodes, and wireless connection for high wearability: An 8-channel dry-electrode device does not require a burdensome installation, and the absence of electrolytic gel enhances user comfort. The wearability of the instrument is also guaranteed by the absence of connection cables, i.e., by the wireless transmission of the acquired signals. Both features simplify the operator's job.
• Multifactorial metrological reference: A multifactorial metrological reference was implemented. Images belonging to a statistically validated dataset were used as stimuli for eliciting emotions. Therefore, each image is scored according to the corresponding valence value. The metrological reference of the emotional valence is obtained by combining the scores of the stimuli (statistically founded) with the score of the self-assessment questionnaires (subjective response to the standardized stimulus).
Bland-Altman and Spearman analyses were carried out to compare the Self-Assessment Manikin (SAM) questionnaire scores with the OASIS dataset scores.
• 12-band Filter Bank: Traditional filtering, employed to extract the information content from the EEG signals, is improved by a 12-band Filter Bank. Compared with the five typical bands of EEG analysis (delta, theta, alpha, beta, gamma), narrowing the frequency intervals increases the feature resolution.
• Beyond a priori knowledge: A supervised spatial filter (namely CSP) guarantees automated feature extraction from spatial and time domains.

Architecture
The architecture of the proposed method is shown in Fig. 1. A Filter Bank and a Common Spatial Pattern (CSP) algorithm carry out the feature extraction. The Classifier receives the feature arrays and detects the emotional valence.

Data processing
In this section, the feature selection and extraction and the classification procedures are presented.

Feature selection and extraction
Finer-resolution partitions of the traditional EEG bands were proposed for emotion recognition 59,60 . In the present work, a novel Filter Bank version, recently adopted in distraction detection 61 , is employed. The acquired EEG signal is filtered through 12 Chebyshev type-II IIR band-pass filters, each 4 Hz wide, equally spaced from 0.5 to 48.5 Hz. In this way, the traditional five EEG bands (delta, theta, alpha, beta, and gamma) are divided into 12 sub-bands. Therefore, the feature resolution is increased by the narrowing of the bands. The EEG tracks are acquired at a sampling frequency of 512 Sa/s and divided into 2 s time windows with 1 s overlap. Each record is composed of 96 EEG tracks (obtained by applying the 12 filters of the Filter Bank to each of the 8 channels), each one of 1024 samples.
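A minimal sketch of the 12-band filter bank described above, assuming a 4th-order design and 30 dB stop-band attenuation (neither parameter is stated in the text):

```python
import numpy as np
from scipy.signal import cheby2, sosfiltfilt

FS = 512  # Sa/s, as in the described setup

def make_filter_bank(n_bands=12, width=4.0, f_start=0.5, order=4, rs=30):
    """Chebyshev type-II band-pass filters covering 0.5-48.5 Hz in 4 Hz steps.

    `order` and `rs` (stop-band attenuation, dB) are illustrative assumptions.
    """
    bank = []
    for i in range(n_bands):
        lo = f_start + i * width
        hi = lo + width
        sos = cheby2(order, rs, [lo, hi], btype='bandpass', fs=FS, output='sos')
        bank.append(sos)
    return bank

def apply_filter_bank(eeg, bank):
    """eeg: (n_channels, n_samples) -> (n_bands * n_channels, n_samples) tracks."""
    return np.vstack([sosfiltfilt(sos, eeg, axis=-1) for sos in bank])
```

Applying the bank to an 8-channel, 1024-sample epoch yields the 96 EEG tracks mentioned above.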
Spatial and frequency filtering is applied to the output data of the filter bank. The well-established Common Spatial Pattern (CSP) algorithm is used as a spatial filter. CSP sorts the transformed-domain components so as to guarantee decreasing data variances for the first class and increasing data variances for the second one. In this way, according to the variance of each component, the data become more easily separable.
CSP is mostly used in EEG-based motor imagery classification and gives excellent results 62 . Motor imagery is a strongly spatially and spectrally characterized phenomenon.
A previous study 63 showed that the CSP spatial filtering method captures the relationship between EEG bands, EEG channels, neural efficiency, and emotional stimulus types. It demonstrated that CSP spatial filtering yields significant values for band-channel combinations (p < 0.004). Spatial characteristics may provide more relevant information to distinguish different emotional states. A feasibility study demonstrated the capability of CSP to apply spatial features to EEG-based emotion recognition, reaching average accuracies of 85.85 % and 94.13 % on a self-collected dataset and on MAHNOB-HCI, respectively. Three emotion tasks were detected with 32 EEG channels 64 . In a binary problem, CSP computes the covariance matrices of the two classes. By means of a whitening matrix, the input data are transformed so as to have an identity covariance matrix (namely, all dimensions become uncorrelated). The resultant components are sorted on the basis of variance in: (i) decreasing order, if the projection matrix is applied to inputs belonging to class 1, and (ii) ascending order, in the case of inputs belonging to class 2. The CSP receives as input 3D tensors with dimensions given by the number of channels, filters, and samples.
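The whitening-plus-diagonalization steps described above can be sketched as follows; the eigendecomposition route and the choice of keeping filters from both ends of the eigenvalue spectrum are standard CSP practice, assumed here rather than taken from the paper:

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_components=8):
    """Common Spatial Pattern projection for a binary problem.

    trials_a / trials_b: arrays of shape (n_trials, n_channels, n_samples).
    Returns an (n_components, n_channels) projection matrix whose first rows
    maximize variance for class A and whose last rows maximize it for class B.
    """
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)  # per-trial spatial covariance

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Whitening of the composite covariance: P (Ca + Cb) P^T = I
    evals, evecs = np.linalg.eigh(Ca + Cb)
    P = np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    # Diagonalize the whitened class-A covariance; eigenvalues lie in [0, 1]
    d, B = np.linalg.eigh(P @ Ca @ P.T)
    order = np.argsort(d)[::-1]          # descending variance for class A
    W = B[:, order].T @ P
    # Keep filters from both ends of the spectrum (most discriminative)
    half = n_components // 2
    picks = np.concatenate([np.arange(half),
                            np.arange(W.shape[0] - half, W.shape[0])])
    return W[picks]
```

Projecting a trial with the first filter yields high variance for class-A epochs and low variance for class-B epochs, which is the separability property exploited downstream.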

Classification
In this study, the emotional valence is classified using a k-Nearest Neighbors (k-NN) classifier 65 . One of the main advantages of k-NN is that, being non-parametric, it does not require a training phase, unlike other machine learning methods. In a nutshell, given a set of unlabelled points P to classify, a positive integer k, a distance measure d (e.g., Euclidean), and a set D of already-labelled points, k-NN assigns to each point p ∈ P the most frequent class among its k nearest neighbours in D according to the measure d. The number of neighbours k and the distance measure d were set using a cross-validation procedure.
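A minimal sketch of this setup with scikit-learn; the candidate values for k and for the distance metric are illustrative assumptions, since the text only states that they were set by cross-validation:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def tune_knn(X, y, ks=(1, 3, 5, 7), metrics=('euclidean', 'manhattan'), n_folds=3):
    """Select k and the distance metric by stratified cross-validation.

    X: (n_epochs, n_features) feature arrays; y: class labels.
    The hyperparameter grids are illustrative, not the paper's.
    """
    grid = GridSearchCV(
        KNeighborsClassifier(),
        {'n_neighbors': list(ks), 'metric': list(metrics)},
        cv=n_folds, scoring='accuracy',
    )
    grid.fit(X, y)
    return grid.best_estimator_, grid.best_params_
```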

Data acquisition setup
The experimental protocol was approved by the ethical committee of the University Federico II. Written informed consent was obtained from the subjects before the experiment. All methods were carried out in accordance with relevant guidelines and regulations. Prior informed consent for publication of identifying information and images was obtained from all the participants. Thirty-one volunteers, free from physical and mental pathologies, were screened by means of the Patient Health Questionnaire (PHQ) to exclude depressive disorders 66 . Six participants were excluded from the experiment owing to their PHQ scores, resulting in twenty-five healthy subjects (52 % male, 48 % female, aged 38 ± 14). The experiments were conducted in a dark and soundproofed environment to prevent disturbing elements. The subjects were instructed on the purpose of the experiment. A Mood Induction Procedure (MIP), based on the presentation of emotion-inducing material, was used to elicit suitable emotions. Emotional stimuli were presented without explicitly instructing subjects to get into the suggested mood state. The volunteers were emotionally elicited through passive viewing of pictures and were asked to assess the experienced valence by two classes: negative and positive. Images were chosen from the reference database OASIS 67 , which attributes a valence level to each image on a scale from 1.00 to 7.00.
Only Italian volunteers participated in the experiment, thus a pre-test on the trans-cultural robustness of the selected images was administered to a different group of 12 subjects. Specifically, suitable pictures were shown and the subjects were asked to rate each image using the Self-Assessment Manikin (SAM) scale. Images with a neutral rating from at least 50 % of the subjects were excluded from the experiment. In fact, a stimulus strongly connoted in a specific cultural framework loses its strength out of that context. An emblematic example is given by symbols related to the Ku Klux Klan: these have a different connotative richness for a citizen of the United States of America compared to a European. The same pre-test also revealed very low performance in detecting the valence level when the stimulus score was around the midpoint of the valence scale. The sensitivity of the system was therefore improved by selecting a suitably polarised subset of OASIS images. First, the images with the highest and lowest valence scores were identified: 6.28 and 1.32, respectively. Then, a span of 1.00 was chosen to guarantee a trade-off between maximum image polarization and an adequate number of images for the experiment (>100). Therefore, [1.32-2.32] and [5.28-6.28] were adopted as the scoring intervals for negative and positive stimulus valence, respectively.
Bland-Altman and Spearman analyses were carried out to compare the experimental sample with the OASIS experimental sample. The agreement between the measurements expressed by the two samples is verified, as evidenced by the qualitative analysis in Fig. 2 and by the Spearman correlation index ρ = 0.799.
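The two analyses can be sketched as follows; the function names and the 1.96 SD limits of agreement follow the conventional Bland-Altman formulation, assumed here rather than taken from the paper:

```python
import numpy as np
from scipy.stats import spearmanr

def bland_altman(scores_a, scores_b):
    """Bland-Altman statistics for two paired score series (e.g., OASIS vs SAM).

    Returns the mean difference (bias) and the 95 % limits of agreement
    (bias +/- 1.96 * SD of the differences).
    """
    a, b = np.asarray(scores_a, float), np.asarray(scores_b, float)
    diff = a - b
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)
    return bias, (bias - loa, bias + loa)

def rank_correlation(scores_a, scores_b):
    """Spearman rank correlation between the two rating series."""
    rho, p_value = spearmanr(scores_a, scores_b)
    return rho, p_value
```

Good agreement corresponds to a bias close to zero, narrow limits of agreement, and a high ρ (0.799 in the reported comparison).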

Figure 2. Bland-Altman analysis on the agreement between stimuli (OASIS) and volunteers perception (SAM)
26 photos were shown in as many trials: 13 pictures for eliciting negative valence and 13 for eliciting positive valence. The selected stimuli related to different themes and were shown to the participants in random order so as not to create expectations in the tested subject. Each trial lasted 30 s. Before the image projection, a 5 s white screen and a countdown frame were shown to relax the subject and to mutually separate the emotional states. After the image projection, the subject had 15 s to fill in a self-assessment scale on the experienced valence: the Self-Assessment Manikin (SAM). The subject was required to express a judgement on the positivity/negativity of his/her valence on a scale from 1 to 5.

Hardware
In this study, EEG data were acquired by the ab medica Helmate, Class IIA certified according to the Medical Device Regulation (EU) 2017/745 (Fig. 3 A). The Helmate is provided with a rechargeable battery and transmits the acquired data via Bluetooth, without connection cables. This ultra-light foam helmet is equipped with 10 dry electrodes, 8 of which serve as acquisition channels (unipolar configuration), and with disposable accessories (under-helmet and under-throat). The electrodes are made of conductive rubber and their endings are coated with Ag/AgCl. They have different shapes to pass through the hair and reach the skin (Fig. 3 B). The electrodes, arranged on the scalp according to the International Positioning System 10/20, were placed on: Fp1, Fp2, Fz, Cz, C3, C4, O1, and O2. The resulting signals are recorded differentially vs ground (Fpz) and then referenced with respect to AFz, both placed in the frontal region. A dedicated software measures the contact impedance between the electrodes and the scalp. The acquired EEG signal, sampled at 512 Sa/s, is sent to the Helm8 Software Manager, which allows both displaying the signal on a PC in real time and applying a large variety of pre-processing filters. The device has an internal µSD card for backup purposes.
Within the a-priori spatial knowledge framework, the frontal asymmetry feature was chosen, computed by subtracting the left frontal channel (Fp1) from the right one (Fp2). Moreover, the whole hemispherical asymmetry was also considered, and the differences of the three symmetric channel pairs were input to the classifiers. The analysis considered either only spatial features or both spatial and frequency features, according to the different neurophysiological theories. In both cases, artifacts were removed from the EEG signals using Independent Component Analysis (ICA) by means of the EEGLAB Matlab toolbox, version 2019.
A-priori frequential knowledge led to the use of an 8-13 Hz (alpha band) band-pass filter (zero-phase 4th-order digital Butterworth filter). Without a-priori knowledge, features were extracted via the PCA and CSP algorithms. Also in this case, both the spatial information alone and the combination of spatial and frequency information were analyzed. Input features were 8192 (8 channels × 1024 samples) when PCA and CSP were fed only with spatial information. In the case of the spatial-frequency combination, the features increased to 98304 (12 frequency bands × 8 channels × 1024 samples), because the frequency processing was based on the custom 12-band filter bank. The features were then reduced from 98304 to 96 using the CSP algorithm.
Subsequently, in the classification stage, two types of investigation were carried out: intra-subjective and inter-subjective. In the first case, the data of a single subject were employed for the training and classification phases, while the second one was based on the dataset as a whole. In both cases, the proposed method was validated through a stratified 3-fold validation procedure. Namely, given a combination of classifier hyperparameter values, a partition of the data into K subsets (folds) is made, preserving the ratio between the samples of the different classes. A set T consisting of K − 1 folds is then used to train the model, and the remaining fold E to measure the model performance using a metric score (e.g., accuracy). The whole process is repeated for all the possible combinations of the K folds, and the average scores on all the test sets are reported. Therefore, statistically more relevant results are expected, even at the cost of lower classifier performance. Furthermore, the training and test sets are built by keeping the epochs of each trial (4 epochs per trial) together in the same set, both in the inter-subject and in the intra-subject approach. In this way, the training and test sets do not include parts of the same trial.
k-NN was compared with three other classifiers: Linear Discriminant Analysis (LDA) 69 , Support Vector Machine (SVM) 70 , and Artificial Neural Networks (ANN) 71 . Furthermore, to prevent possible over-fitting, regularization terms were used in the training procedures: the soft-margin formulation for SVM learning 70 , and weight decay 72 for neural network learning. For all the classifiers, the hyperparameters used during the CV procedure are reported in Table 1. Accuracy, precision, and recall are reported to assess the quality of the classification output. Precision measures the relevancy of the results, while recall measures how many truly relevant results are returned. The F1 score, combining precision and recall, was computed to assess the classification performance in minimizing false negatives for the first class (negative valence). In many use cases, minimizing the failure to recognize negative valence is the main issue.
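The negative-valence-oriented metrics can be sketched as follows, assuming label 0 encodes negative valence (an assumption consistent with, but not stated by, the text):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

def valence_scores(y_true, y_pred):
    """Precision, recall, and F1 for the negative-valence class (label 0),
    i.e., the class whose false negatives the study aims to minimize."""
    return (
        precision_score(y_true, y_pred, pos_label=0),
        recall_score(y_true, y_pred, pos_label=0),
        f1_score(y_true, y_pred, pos_label=0),
    )
```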

Experimental results
Accuracy was related to the model's ability to correctly differentiate between two valence states. EEG tracks relating to the negative and positive image tasks were associated to the first and the second class, respectively.
The mean of the individual accuracies and standard deviations computed on each subject (intra-subjective case), and the accuracies and standard deviations computed on the data of all subjects as a whole (inter-subjective case), are shown for the configurations with (Table 2) and without (Table 3) a-priori spatial-frequency knowledge. Results are shown for each of the adopted classifiers.
Better performances are obtained without a-priori knowledge, when features are extracted by combining the Filter Bank and CSP, in both the intra-subjective and inter-subjective cases. In the intra-individual analysis, the data subsets are more uniform and all the classifiers provide very high accuracy. In the inter-individual analysis, when data from all subjects are merged, variability increases and not all the classifiers give good results. Interestingly, in the inter-subjective approach the best performances are achieved using the k-NN classifier, while the scores degrade with the other classification setups. This behaviour suggests that the data of similar classes are close together across different subjects, but that in general they are not easily separable using classical machine learning methods.
In conclusion, the proposed solution based on the 12-band Filter Bank provides the best performances, reaching 96.2 % accuracy with the ANN in the intra-individual analysis and 80.3 % with k-NN in the inter-individual analysis. Precision, recall, and F1-score metrics are reported in Fig. 4.

Discussion
In the previous Sections, the measurability foundation of emotion was discussed. The adopted reference theory and the combined use of validated stimulus ratings and the subjects' self-assessments emerged as pillars for the scientific measurement of emotions. The challenge posed by the accuracy-wearability trade-off was also highlighted. The novelty of this research is based on compliance with different quality parameters. In Table 4, this study is compared with the works examined in Section 2, taking into account the following criteria: (i) classification vs measurement, (ii) standardized stimuli, (iii) self-assessment questionnaires, (iv) wearability (number of channels ≤ 10), (v) inter-individual accuracy > 80 %, and (vi) intra-individual accuracy > 90 %. Among all the examined works, the proposed study is the only one that matches all the aforementioned criteria.

Conclusion
An emotional-valence detection method for a highly wearable EEG-based system was proposed, experimentally achieving accuracies of 96.2 % and 80.3 % in intra-individual and inter-individual analysis, respectively. Valence detection occurs along the interval scale theorized by the circumplex model of emotions. The binary choice, positive valence vs negative valence, represents a first step towards the adoption of a metric scale with a finer resolution. A-priori information is not needed, as the adopted algorithms extract features from the data through an appropriate spatial and frequency filtering. A metrological reference was built by combining the statistical strength of the OASIS dataset with the collected data on the subjects' perception. The dataset was also subjected to a cross-cultural validity check. Classification is carried out with a time window of 2 s. The achieved performances are due to the combined use of a custom 12-band Filter Bank with the CSP spatial filtering algorithm. This approach is widely used in the motor imagery field and was proven to be valid also for emotion recognition. The wearability of the system is guaranteed by the few channels, the dry electrodes, and the wireless data transmission. The high wearability and accuracy are compatible with the principal applications of valence emotion recognition. Considering the high wearability and the well-founded metrological reference, the results are the state of the art in emotional valence detection. Future developments of the research will be: (i) a resolution improvement of the valence metric scale; (ii) the combined use of different biosignals (besides EEG); (iii) a deeper analysis of the interactions among the number of electrodes, the classifiers, and the accuracy; and (iv) experiments on different processing strategies: in this study, the binary nature of the problem enhanced the classification performances of the k-NN.
In future works aimed at increasing the resolution of the metric scale, other methods (SVM, fully connected neural networks, Convolutional Neural Networks 53 , etc.) may prove more effective, for example in a regression-based perspective.

Table 4. Studies on emotion recognition classified according to metrological approach, wearability, and accuracy (n.a. = "not available", ✔ = "the property is verified"; in the first row only, ✔ = "Measurement").