AUTOMOTIVE: A case study on AUTOmatic multiMOdal drowsiness detecTIon for smart VEhicles

As technology and artificial intelligence gain prominence in the automotive world, driver drowsiness monitoring systems have sparked much interest as a way to increase safety and avoid sleepiness-related accidents. Such technologies, however, face the fact that each driver presents a distinct set of behavioral and physiological manifestations of drowsiness, which makes its objective assessment a non-trivial process. The AUTOMOTIVE project studied the application of signal processing and machine learning techniques for driver-specific drowsiness detection in smart vehicles, enabled by immersive driving simulators. More broadly, comprehensive research on biometrics using the electrocardiogram (ECG) and the face enables the continuous learning of subject-specific models of drowsiness for more efficient monitoring. This paper offers a holistic and comprehensive view of the research and development work conducted in the AUTOMOTIVE project across its various topics and of how this work brings us closer to the goal of improved driver drowsiness monitoring.


I. INTRODUCTION
Driver status monitoring (DSM) systems have emerged as an innovative solution to prevent drowsiness and fatigue-related traffic accidents. Some major automotive manufacturers, such as Ford, Toyota, BMW, and Nissan, have been developing DSM systems since the 2000s [1], making use of visual, vehicle, and physiological measurements to monitor and evaluate the state of the driver and triggering automatic alerts if drowsiness is detected. By using systems with a half-second warning time, one can expect an estimated 60% decrease in the number of accidents, whereas an extra second can prevent up to 90% of collisions [2].
Because real on-road drowsiness acquisitions are generally unsafe, researchers developing such systems often rely on simulated environments. However, drivers' behavior in such scenarios tends to be unrealistic, since they do not perceive risk as they would on the road, resulting in discrepancies between the observed behavioral patterns and the expected real-world observations.
Emulating on-road experiences more faithfully during simulation-based acquisitions has resulted in improved data, and the collection of biological signals has had a similar effect [3], [4]. Several data collection projects have also been conducted recently, including naturalistic driving studies and field operational tests [5]. However, despite these advances, the disparities between naturalistic data and simulator-based acquisitions remain a relevant challenge.
Another significant hurdle to the reliability of these products in real-life conditions is the high variability of behavioral patterns across drivers. A drowsiness monitoring system can present acceptable average accuracy for a given set of drivers and, simultaneously, be inadequate at recognizing the specific sleepiness patterns of other drivers. This could be improved by taking advantage of biometric recognition solutions which, by delivering continuous identity predictions, would enable the continuous learning of user-specific drowsiness models using the data acquired from each driver. Although major automotive brands have been introducing biometric recognition technology, the use of biometrics for continuous learning of driver-specific models is still an open topic. For both drowsiness monitoring and biometric recognition in driving scenarios (as well as several related research topics, e.g., emotion recognition), two human characteristics stand out: the face and the electrocardiogram (ECG). The face is the focus of most research, since it can be acquired with inexpensive cameras, carries drowsiness information, and is universally accepted as one of the most effective and intuitive approaches for biometric recognition. However, its performance is known to suffer significantly when the quality of the acquisition is compromised by varying illumination, occlusions, or movement, among other factors [6], all of which should be expected when monitoring a driver inside a moving vehicle.
VOLUME 4, 2016

On the other hand, the ECG is generally robust against the illumination and visibility factors that affect face acquisitions. Research on ECG biometrics has been quickly gaining traction due to the signal's universality, ease of acquisition and processing, difficult counterfeiting, and reliable distinctiveness [7]. As a physiological signal, the ECG also varies significantly with psychophysiological states, thus carrying information on well-being factors such as sleepiness, stress, or emotions. The need for continuous contact is a considerable inconvenience, but it can be effectively minimized through modern off-the-person acquisition techniques (such as the CardioWheel [8] or the Nymi Band [9]).
With this, the complementary nature of ECG and face data for driving scenarios becomes evident. A multimodal solution would combine the recent advances of deep learning approaches for face recognition in highly unconstrained scenarios [10], [11] with the meaningful strides made in ECG biometrics [12]. Integrating such a solution into a reliable, robust, and personalized alert system that combines multimodal biometric recognition and drowsiness detection represents a sizable challenge, the main hurdle being the processing and classification of real data by algorithms developed on simulated data. Nevertheless, this remains the most promising path towards the next generation of intelligent driver drowsiness recognition. This was the central goal of the AUTOMOTIVE project: to advance state-of-the-art algorithms regarding behavioral variability patterns and acquisition quality, while dealing with the domain mismatch between simulated scenarios and real-world setups. The AUTOMOTIVE project was conducted in Portugal and led by the Institute for Systems and Computer Engineering, Technology and Science (INESC TEC), with the participation of the company CardioID Technologies, the Instituto Superior de Engenharia de Lisboa (ISEL), and the Universidade Lusófona de Humanidades e Tecnologias (ULHT).
This paper presents the main achievements and highlights of the AUTOMOTIVE project, describing the strides made in its diverse research topics (see Fig. 1) and how they contribute to the central goal of advancing the state of the art in driver drowsiness monitoring. Beyond this introduction, this paper covers the project's work on driving simulators and data collection, in section II; the pattern recognition methodologies for biometrics, emotion recognition, and drowsiness monitoring, in section III; and the general conclusions drawn from the AUTOMOTIVE project, in section IV.
FIGURE 2: Summary of the diverse measurement approaches used for driver drowsiness detection (adapted from [13]).

II. DATA ACQUISITION
Driving research is a broad field, with numerous projects typically carried out in either simulated or naturalistic environments. Virtual driving environments stand out due to the minimal risk to volunteers during experiments and the possibility of fully controlling vehicle physics and traffic complexity. As such, whether by collecting large-scale data and training deep neural networks without logistical constraints or by emulating exact traffic circumstances to validate trained models in extreme scenarios, simulator-based development has been a key contributor to the great recent strides in autonomous driving.
This section presents some driving simulation environments that are currently publicly available or provided upon request for research purposes:
• Microsoft AirSim [14] was specifically designed as a testing platform for artificial intelligence (AI) experiments. This platform offers an application programming interface (API) for acquiring sensor, telemetry, and image data from large-scale urban and natural environments. Its cameras offer photo-realistic lighting, real-time depth view, and object segmentation. Vehicle control is based on the Unreal physics engine;
• CARLA [15] enables complete programmatic control over the simulation. This platform includes an autonomous driving sensor suite with configurable sensors and ground-truth data, emulating a virtual city with dynamic pedestrians and vehicles. The simulator exposes a C++/Python API and manages remote procedure call (RPC) communication through an event-based server-client model;
• Deepdrive Voyage offers a TensorFlow engine for deep reinforcement learning, while providing extensive hours of prerecorded training data from a self-driving competition between users performing complex maneuvers;
• NVIDIA's platform recreates actual locations with very high levels of detail, realism, and complexity, thus massively extending the development and validation of deep learning models [16]. It uses two servers: the first renders the environment and generates simulated sensor data, and the second runs the prototype self-driving hardware, which processes the data as if deployed in-vehicle.
Regarding the data that can be acquired in driving research, either through simulated or naturalistic environments, the most commonly used in-vehicle measurement sensors for detecting driver drowsiness can be categorized into vehicle behavior, driver behavior, and physiological signals (according to [17]). Fig. 2 organizes the types of information acquired through these three forms of sensing.
Some data is publicly available (or provided upon request for research purposes) as part of the databases presented in Table 1, where those used in the current project are highlighted in bold.
One of AUTOMOTIVE's objectives was to develop a platform to simulate a real driving experience and effectively collect multimodal data in real time. Hence, two simulator platforms were developed, the main observable difference between them lying in the immersiveness of the environment. The first, the AUTOMOTIVE DD (Desk-Driver) Simulator, runs on a simple desktop monitor with the driver seated on a chair. The second, the AUTOMOTIVE IHC (In-Half-Car) Simulator, includes a more realistic virtual driving environment projected onto wide screens surrounding the driver. For added realism, the driver sits behind the wheel of a real (parked) car cut in half.
Both simulator platforms integrate sensory information from different sources, namely:
• The electrocardiogram (ECG), acquired with the CardioWheel, a machine learning solution with sensors embedded in the steering wheel that recognizes the driver's identity based on the ECG [8]. Several heart-rate metrics are automatically extracted as features for the current work;
• Face video, acquired with the Intel RealSense™ SR300 camera, which simultaneously captures RGB, depth, and infrared video channels. This camera's software development kit automatically detects 78 face landmarks [28];
• Driving style events, i.e., dynamic signals such as acceleration and braking, obtained directly from the simulator [28];
• Performance telemetry in simulated urban and highway environments, i.e., information such as speed control, lane following, acceleration, and braking inputs, obtained using the Mobileye™ collision-avoidance system [28].
The designed experimental protocol for the acquisition is the same for both simulators and is presented below.

A. EXPERIMENTAL PROTOCOL
The data acquisition protocol was influenced by the work of Naurois et al. [29]. Its design addresses the need for robust and complete data. Hence, we collect data regarding vehicle movement (steering wheel angle, vehicle speed, acceleration, and braking), the driver's behavior (facial expressions and features), physiology (their electrocardiogram), and their sleepiness state, as well as additional information that can be useful to further understand the subjects and their driving experience.
Before the acquisition, we check that the volunteer has a valid driver's license, has not been diagnosed with epilepsy, and is not susceptible to kinetosis (motion sickness), for which the short form of the Motion Sickness Susceptibility Questionnaire [30] is applied. Furthermore, the volunteer's susceptibility to sleepiness is assessed with the Epworth Sleepiness Scale © [31], and the Horne and Ostberg morningness-eveningness questionnaire is completed to determine how their circadian rhythm relates to the different times of the day. Other information is collected, including the subject's age, sex, self-assessed sleep quality, coffee consumption, comfort with the driving task, yearly driving frequency, general daily schedule, diagnosed sleep conditions, and use of medication or devices that may influence their natural heart rate.
On the day of the acquisition, the subject is asked not to drink caffeinated or alcoholic beverages and to have slept six to nine hours the previous night. If possible, the acquisition is scheduled after lunchtime, since the probability of falling asleep in this period increases three-fold [32]. Just before data collection starts, the volunteer takes a short test drive, for as long as they need, to acclimate to the particular driving experience of the simulator. This preparation period aims to minimize the influence of adaptation to the simulator on the collected data.
After this initial period, the subject drives a car out of an urban area and onto a highway, which takes approximately two minutes. They then drive on the highway for approximately sixty minutes before arriving at their destination, another urban area where the drive ends. Throughout this process, the driver is periodically asked (every 5 to 15 minutes) for their self-assessed score on the Karolinska Sleepiness Scale (KSS), ranging from 1 (the highest level of alertness) to 9 (the highest level of sleepiness, where the subject is fighting off sleep). After the acquisition, the general well-being of the subject is assessed (looking for signs of kinetosis), and they are asked to fill in a final questionnaire, including the System Usability Scale.

1) Setup
The AUTOMOTIVE DD Simulator's graphics and physics are supported by Unity 3D, a rendering and physics engine that has gained widespread adoption in gaming and research and is featured in several state-of-the-art driving simulators. It is a general-purpose platform that enables fast development and prototyping of simulated environments while allowing flexible scripting control of all virtual environment elements with C# or JavaScript.
The designed virtual environment is composed of two urban areas connected by a highway. The urban areas provide a set of complex road geometries and opportunities for interaction with a variety of agents and elements (e.g., pedestrians, traffic signs, crosswalks, intersections, buildings). Meanwhile, the highway gives the environment a flexible component, enabling experiments of customizable length and duration, as well as varying road monotony. This flexibility comes from the procedural generation of the highway, implemented by adapting the MicroGSD RoadArchitect module, which generates a static road geometry from a spline defined by user-placed nodes. The adaptation eliminated direct user inputs on road geometry and instead parametrized node locations as a random distribution of 3D points, with the following parameters:
• Segment size, the longitudinal distance between consecutive nodes;
• Number of nodes, which, together with the segment size, defines the total highway length;
• Number of lanes, ranging from one to three;
• Transverse position and height ranges, from which coordinates are randomly chosen, hence defining curve and elevation variations along the highway.
By defining these parameters, a new highway is generated for every session, which allows us to control the experiment duration and the complexity of the driving task while ensuring that an unlimited number of road geometries of equivalent difficulty is available. Furthermore, the spline definition of the road trajectory makes it possible to define a set of non-playable character (NPC) vehicles that travel the highway alongside the user. Generating cars that follow the defined splines, with speeds drawn from random distributions, makes it possible to simulate and adjust traffic dynamics for a more realistic experience.
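To make the parametrization above concrete, the following Python sketch samples the spline nodes of one highway. It is not the project's RoadArchitect adaptation: the `HighwayParams` fields, the uniform distributions, and the coordinate convention (x lateral, y height, z along the road, as in Unity) are assumptions for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class HighwayParams:
    # Hypothetical names for the four parameters described in the text.
    segment_size: float      # longitudinal distance between consecutive nodes (m)
    num_nodes: int           # together with segment_size, sets the highway length
    num_lanes: int           # 1 to 3; consumed by the road-mesh builder, not used here
    transverse_range: float  # max lateral offset of a node (m): controls curvature
    height_range: float      # max elevation of a node (m): controls slopes


def generate_highway_nodes(params, seed=None):
    """Return (x, y, z) spline nodes: x lateral, y height, z along the road."""
    if not 1 <= params.num_lanes <= 3:
        raise ValueError("num_lanes must be between 1 and 3")
    rng = random.Random(seed)  # a new seed gives a new geometry per session
    nodes = []
    for i in range(params.num_nodes):
        x = rng.uniform(-params.transverse_range, params.transverse_range)
        y = rng.uniform(0.0, params.height_range)
        z = i * params.segment_size
        nodes.append((x, y, z))
    return nodes


def highway_length(params):
    """Longitudinal extent covered by the nodes (ignores extra arc length of curves)."""
    return (params.num_nodes - 1) * params.segment_size
```

Re-running with a different seed but the same parameters yields a new geometry drawn from the same distribution, which is one way to read the claim that every session gets a road of equivalent difficulty.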
An example of the running simulation can be seen in Fig. 3. The multimodal nature of this setup requires communication with several sensors, each collecting data at a different rate. The setup incorporates state-of-the-art sensor devices such as the CardioWheel, acquiring ECG at 1 kHz [8]; the PulseOn wristband, acquiring optical heart rate signals at 25 Hz [33], [34]; the MoveSense chest band, acquiring clinical-grade heart rate at 512 Hz; the Mobileye Connect collision-avoidance system; and the Intel RealSense camera, performing RGB-depth facial analysis at 50 Hz. Furthermore, the simulated environment processes the data provided by these sensors and also produces performance telemetry, such as speed-limit control, lane following angle, obstacles, collision events, traffic violations, and acceleration/braking inputs.
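Since each sensor streams at its own rate (1 kHz, 512 Hz, 50 Hz, 25 Hz), any analysis that combines modalities needs a common time base. The sketch below shows one common approach, linear interpolation onto a shared uniform grid restricted to the interval covered by all streams. It assumes the timestamps are already on a shared clock, and `align_streams` is a hypothetical helper, not part of the AUTOMOTIVE code base.

```python
import numpy as np


def align_streams(streams, rate_hz):
    """Resample several streams onto one uniform time grid.

    streams: dict mapping a stream name to (timestamps_s, values), with
    timestamps increasing and expressed on a shared clock.
    Returns (grid, aligned) where aligned maps each name to values on grid.
    """
    # Keep only the interval covered by every stream, to avoid extrapolation.
    start = max(t[0] for t, _ in streams.values())
    stop = min(t[-1] for t, _ in streams.values())
    # Integer sample count (small tolerance guards against float round-off).
    n = int(np.floor((stop - start) * rate_hz + 1e-9)) + 1
    grid = start + np.arange(n) / rate_hz
    # Linear interpolation; heavy downsampling would call for low-pass filtering first.
    aligned = {name: np.interp(grid, t, x) for name, (t, x) in streams.items()}
    return grid, aligned
```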
However, managing multiple high-throughput, low-latency data sources with precise sampling rates requires a performance-focused architecture. This type of system is usually built around a central queuing structure where inputs and outputs are integrated, a classic example of the producer/consumer model, typically implemented using Message Queuing Telemetry Transport (MQTT) or the Advanced Message Queuing Protocol (AMQP).
Apache Kafka, specifically designed for handling multiple real-time, data-intensive streams, is the proposed integration middleware platform. Its core simplicity favors performance and allows for massive scalability: producers publish data to a distributed transaction log that is partitioned across multiple nodes (brokers), from which consumers read it, while a coordination service (ZooKeeper) manages the cluster metadata. By replicating messages across brokers, Apache Kafka provides both performance and redundancy: if a node fails, a broker holding a replica takes over and consumption resumes. Typical use cases are streaming services and telemetry analytics.
In terms of performance, changing the batch queue size makes it possible to balance transmission latency against overall throughput, since higher throughput also implies higher latency between production and consumption due to batch buffering (see Table 2). With the Unity driving simulator acting as the producing/consuming client and latency reduced to 20 ms, the platform still handles approximately five thousand messages per second, i.e., sample rates of 5 kHz in simultaneous parallel streams directly feeding the rendering simulation. When larger messages are transmitted, both average latency and throughput degrade significantly (see Table 3), although the average bit rate can reach 1 Gbit/s depending on the batch queue size.
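The latency/throughput trade-off can be illustrated with a toy model of producer-side batching. This is a back-of-the-envelope sketch with hypothetical parameters (a fixed per-batch overhead and a per-message cost), not a model of Kafka internals or of the measurements in Tables 2 and 3.

```python
def batch_tradeoff(batch_size, arrival_interval_s, overhead_s, per_msg_s):
    """Toy model of producer-side batching (hypothetical costs, not Kafka internals).

    Returns (average latency in s, maximum throughput in messages/s).
    """
    # The k-th message of a batch (k = 0..B-1) waits for the B-1-k later
    # arrivals before the batch is sent; averaged over the batch: (B-1)*dt/2.
    avg_wait = (batch_size - 1) * arrival_interval_s / 2
    # A fixed per-batch overhead is amortized over all messages in the batch.
    send_time = overhead_s + batch_size * per_msg_s
    avg_latency = avg_wait + send_time
    max_throughput = batch_size / send_time
    return avg_latency, max_throughput
```

With messages arriving every 1 ms, a 2 ms per-batch overhead, and 0.1 ms per message, going from batches of 1 to batches of 100 messages raises the maximum throughput from about 476 to about 8,333 messages per second, while the average latency grows from 2.1 ms to about 61.5 ms.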
To test the capabilities of this middleware platform, a benchmark was run on localhost on a machine with an i7-8565U CPU, 16 GB of DDR4 SDRAM, and a 256 GB PCIe NVMe SSD. Two sample messages, of 1 kB and 1 MB, were used to evaluate performance at different batch sizes on a two-broker configuration, sending three thousand identical messages in each run. Timestamps were registered at production and consumption for each message and averaged at test completion to obtain the average latency. The elapsed time of each test yields the average throughput (in messages per second), and the average bit rate converts this throughput from messages per second to bits per second. The Apache Kafka linger time was set to 1 ms.
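The three metrics described above can be computed from the per-message timestamps as follows. This is a sketch under stated assumptions: production and consumption timestamps come from the same clock (reasonable on localhost), the elapsed time runs from the first production to the last consumption, and 1 kB is taken as 1000 bytes; the function name is hypothetical.

```python
def benchmark_metrics(produced_ts, consumed_ts, message_bytes):
    """Average latency (s), throughput (msg/s) and bit rate (bit/s) of one run.

    produced_ts[i] and consumed_ts[i] are the production and consumption
    timestamps (s, same clock) of message i.
    """
    if not produced_ts or len(produced_ts) != len(consumed_ts):
        raise ValueError("need one consumption timestamp per produced message")
    latencies = [c - p for p, c in zip(produced_ts, consumed_ts)]
    avg_latency = sum(latencies) / len(latencies)
    # Elapsed time of the run: first production to last consumption.
    elapsed = max(consumed_ts) - min(produced_ts)
    throughput = len(latencies) / elapsed          # messages per second
    bit_rate = throughput * message_bytes * 8      # bits per second
    return avg_latency, throughput, bit_rate
```

For example, a throughput of 5,000 messages per second with 1 kB messages corresponds to 40 Mbit/s under the 1000-byte assumption.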
Apache Kafka's low latency is paramount for a real-time streaming platform, specifically for biometric systems or obstacle detection alerts, both of which depend on millisecond-level response times to be effective. The benchmark results (Table 2 and Table 3) confirm that this platform can process several thousand messages per second at millisecond-level transmission latency. Overall, these results are significantly faster than regular socket communication and hence fulfill the requirements of a driving environment handling real-time analysis of multiple sources with variable sampling intervals.
As a robust but loosely coupled integration architecture, this platform can also integrate several different producing and consuming modules and can be adapted to an actual driving environment, which could replace this simulator with no further changes required.
The resulting simulation is displayed on an ASUS VS197DE monitor (18.5" TN panel, 16:9, 60 Hz, FWXGA) placed in front of the driver. The steering wheel is a Logitech G29 Driving Force racing wheel with dual-motor force feedback and responsive pedals. A photograph of the platform in use is shown in Fig. 4.

2) Collected Data
An experimental protocol different from the one presented above was designed to compare the heart rate variability (HRV) features derived from the cardiac data collected with the CardioWheel, PulseOn, and MoveSense sensors. Results confirming the equivalence of such features during driving tasks would promote the introduction of physiological insight into drowsiness detection systems, as they would imply that off-the-person in-vehicle systems (CardioWheel) [8] are effectively interchangeable with on-the-person systems (MoveSense and PulseOn).
This experiment involved thirteen volunteers, each participating in two half-hour drives on the AUTOMOTIVE DD Simulator. Participants rated their drowsiness level every five minutes using the KSS. Sessions were scheduled to cover a variety of alertness states, with each participant having one session in the middle of the morning and another in the late afternoon. For each session, the circadian settings of the simulator were tuned to promote alertness (daylight in morning sessions) or drowsiness (nightlight in late afternoon sessions).

3) Data Applications
Using the collected data, the correlation between inter-beat intervals (IBIs) from each source was computed, as well as the similarity of the resulting HRV, evaluating time, frequency, and non-linear domain features. The cardiac signals retrieved from each sensor were synchronized in each session. From R-peak locations detected using the Pan-Tompkins algorithm, series of IBIs were computed and afterward corrected using the algorithm described in [35].
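The IBI and HRV computations described above can be sketched in NumPy as follows, starting from R-peak indices that are assumed to have already been detected (e.g., with Pan-Tompkins) and corrected. The feature set shown (SDNN, RMSSD, pNN50, and the Poincaré descriptors SD1/SD2) is a common selection assumed here for illustration; the exact features used in the study are not listed in this section, and frequency-domain features are omitted for brevity.

```python
import numpy as np


def ibis_from_rpeaks(rpeak_idx, fs):
    """Inter-beat intervals (ms) from R-peak sample indices at sampling rate fs (Hz)."""
    return np.diff(np.asarray(rpeak_idx, dtype=float)) / fs * 1000.0


def hrv_features(ibi_ms):
    """A few time-domain and Poincare (non-linear) HRV features from IBIs in ms."""
    ibi = np.asarray(ibi_ms, dtype=float)
    d = np.diff(ibi)                              # successive IBI differences
    sdnn = ibi.std(ddof=1)                        # overall variability
    rmssd = np.sqrt(np.mean(d ** 2))              # short-term variability
    pnn50 = 100.0 * np.mean(np.abs(d) > 50.0)     # % of successive diffs > 50 ms
    sd1_sq = 0.5 * d.var(ddof=1)                  # Poincare width
    sd2_sq = max(2.0 * sdnn ** 2 - sd1_sq, 0.0)   # Poincare length
    return {"mean_ibi": ibi.mean(), "sdnn": sdnn, "rmssd": rmssd,
            "pnn50": pnn50, "sd1": np.sqrt(sd1_sq), "sd2": np.sqrt(sd2_sq)}


def ibi_correlation(ibi_a, ibi_b):
    """Pearson correlation between two beat-aligned IBI series of equal length."""
    return float(np.corrcoef(ibi_a, ibi_b)[0, 1])
```

Comparing sensors then amounts to computing `ibi_correlation` on beat-aligned IBI series and comparing the `hrv_features` dictionaries obtained from each source.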
Direct comparison of the IBI sequences showed a satisfactory level of equivalence between all data sources, although the IBIs derived from the photoplethysmography (PPG) signal of the PulseOn wristband deviate more consistently from the other sources. This disparity is mitigated when the IBI sequences are transformed into sets of HRV features. This convergence can be explained by the fact that compressing a sequence of IBIs into a single HRV feature discards the vascular modulation of the rhythm measured by the wrist PPG sensor, while retaining the dominant contribution of cardiac contraction measured by the cardiac sensors.
These results suggest the feasibility of flexible driver drowsiness detection systems, using any type of cardiac rhythm sensor to assess the driver's state. The possibility of such a flexible system can accelerate the implementation of driver monitoring systems based on cardiac features and increase the functionality of advanced driver-assistance systems (ADAS) by endowing them with prevention and adjustment capabilities.

1) Setup
Building on the Apache Kafka platform described for the AUTOMOTIVE DD Simulator, the acquisition and storage of physiological data, as well as biometric enrollment, user identification, and authentication, are available for integration with multiple simulation environments. As such, a simulator developed outside the current project by the ULHT partner was integrated to create a more immersive simulation.
The ULHT simulator offers several driving scenarios, including highways, suburban roads, and urban areas, for greater diversity during driving sessions. Moreover, it includes the procedural generation of traffic events for analyzing the driver's response. By querying the database, the event manager can provide event timestamps, allowing the extraction of the physiological data that describe the driver's reaction to the corresponding events and, ultimately, their driving performance.
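Given the event timestamps provided by the event manager, extracting the physiological response to each event amounts to cutting fixed windows around those instants. The following is a minimal sketch, assuming a uniformly sampled signal whose first sample is at time `t0` on the same clock as the events; `event_windows` and its pre/post window lengths are hypothetical.

```python
import numpy as np


def event_windows(signal, fs, t0, event_times, pre_s, post_s):
    """Cut [event - pre_s, event + post_s) windows from a signal sampled at fs Hz.

    t0 is the time (s) of the first sample. Events whose window would fall
    outside the recording are skipped. Returns an array (n_events, n_samples).
    """
    signal = np.asarray(signal)
    n_pre, n_post = int(round(pre_s * fs)), int(round(post_s * fs))
    windows = []
    for t in event_times:
        center = int(round((t - t0) * fs))   # sample index of the event
        lo, hi = center - n_pre, center + n_post
        if lo >= 0 and hi <= len(signal):
            windows.append(signal[lo:hi])
    return np.stack(windows) if windows else np.empty((0, n_pre + n_post))
```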
The order of event occurrences must ensure some novelty across different sessions for the same user. As the route mandates a logical sequence in the highway and suburban road environments, these driving sessions are expected to include all planned events. On the other hand, to ensure a consistent experience across all users, the vehicle has a GPS available in the driving interface, indicating the route to follow in the urban environment. This environment also stores several invisible checkpoints in particular spots, such as road intersections or avenues, which trigger events according to the drivers' proximity.
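The proximity-triggered checkpoints can be sketched as a per-frame distance test that fires each checkpoint only once. The `Checkpoint` fields, the trigger radius, and the use of the horizontal (x, z) plane are assumptions for illustration; the actual simulator logic is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Checkpoint:
    name: str          # hypothetical event identifier, e.g. "crosswalk"
    x: float           # position on the horizontal plane (m)
    z: float
    radius: float      # trigger distance (m)
    fired: bool = False


def update_checkpoints(checkpoints, car_x, car_z, t):
    """Called once per frame: fire each checkpoint the first time the car
    comes within its radius, and return the (timestamp, name) events fired."""
    events = []
    for cp in checkpoints:
        if not cp.fired and math.hypot(cp.x - car_x, cp.z - car_z) <= cp.radius:
            cp.fired = True
            events.append((t, cp.name))
    return events
```

Logging the returned timestamps is what would later allow the event-locked extraction of physiological data described above.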
Some of the implemented suburban event scenarios are:
• Moving emergency vehicles;
• Stopped vehicles (including after accidents);
• Vehicles breaking road laws;
• Adverse weather, such as rain and fog.
Some of the urban event scenarios available in the simulator are:
• Pedestrians crossing the road (on crosswalks or not);
• Cyclists on the road;
• Traffic light intersections.
In summary, with the integration of the ULHT simulator, an automated event generator enables the simulation of more realistic driving sessions (for example, commuting between urban and suburban environments with regular traffic events). An example of the running simulation can be seen in Fig. 5. The resulting simulation was designed to be projected onto an angled display surrounding the driver, who sits in a real (parked) car with an integrated steering wheel and pedals. An illustration of the planned setup can be seen in Fig. 6.

III. DEVELOPED ALGORITHMS
The following subsections describe several methods and comparative studies carried out over the course of the AUTOMOTIVE project. Some of the challenges and related state-of-the-art work were already discussed in [12], which served as the guideline for our published research on electrocardiogram (ECG) biometrics presented below.

A. ECG BIOMETRICS
A biometric system aims to either identify or verify the identity of a person based on a measurement of one or multiple biometric traits [12]. Usually, it is composed of the following modules: acquisition, quality assessment, feature extraction, storage, and decision (see Fig. 7).
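To make the enrollment and recognition stages concrete, the following is a minimal sketch of the storage and decision modules, assuming feature vectors have already been extracted. Averaging enrollment vectors into one template per identity and scoring with cosine similarity are illustrative choices; `TemplateStore` is hypothetical and is not the decision rule of any specific system in this paper.

```python
import numpy as np


class TemplateStore:
    """Storage + decision modules: one L2-normalized mean template per identity."""

    def __init__(self):
        self.templates = {}

    def enroll(self, identity, feature_vectors):
        # Enrollment: average the subject's feature vectors into a template.
        t = np.mean(np.asarray(feature_vectors, dtype=float), axis=0)
        self.templates[identity] = t / np.linalg.norm(t)

    def _score(self, feature_vector, identity):
        # Cosine similarity between a probe and a stored template.
        v = np.asarray(feature_vector, dtype=float)
        return float(np.dot(v / np.linalg.norm(v), self.templates[identity]))

    def identify(self, feature_vector):
        """Identification (1:N): the enrolled identity with the highest score."""
        return max(self.templates, key=lambda k: self._score(feature_vector, k))

    def verify(self, feature_vector, claimed_identity, threshold):
        """Verification (1:1): accept the claim if the score reaches the threshold."""
        return self._score(feature_vector, claimed_identity) >= threshold
```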
The biometric algorithm uses the data from the acquisition and storage modules and performs quality assessment, feature extraction, and decision. These data are commonly fingerprint, iris, palmprint, or face images, but can also be medical biometric measurements, such as the electroencephalogram (EEG) or the electrocardiogram (ECG). The use of the ECG as a biometric trait is relatively recent but has been gaining momentum, as researchers explore deep learning methodologies applied to this signal and novel ECG acquisition setups enable comfortable signal collection from everyday objects [12], [36].
FIGURE 5 (c): Running simulation on the Urban scenario in the AUTOMOTIVE IHC Simulator. Traffic lights, a pedestrian on the crosswalk, and a GPS indicating the route to follow can be seen.
In normal conditions, an ECG signal is a cyclic repetition of five easily recognizable deflections: the P, Q, R, S, and T waves (see Fig. 8). The ECG can be used to discriminate between subjects in biometric recognition since it varies according to the following inter-subject factors:
• Heart geometry, such as heart size or cardiac muscle thickness, which affects the depolarization of the heart;
• Individual attributes, such as age, weight, or pregnancy, which shift the orientation of the electrical current conduction vectors across the heart.
Nevertheless, intra-subject variability factors, such as physical exercise or meditation, cardiac conditions, posture, emotions, fatigue, and electrode characteristics and placement, should not be overlooked since they may undermine the process of biometric recognition [12].
The literature survey in [12] contributed to the AUTOMOTIVE project as follows:
• History of ECG biometrics: an in-depth survey of the evolution and current landscape of ECG-based biometric recognition, based on the review of ninety-three state-of-the-art publications;
• Fundamental background knowledge: a solid overview of fundamental concepts (such as anatomy, physiology, and intra-subject and inter-subject variability), providing a comprehensive guide for new and current researchers;
• Future research paths: a discussion of the most relevant challenges and the most promising future possibilities for research and development in each part of ECG biometric systems, from acquisition to decision.
For the research topic of drowsiness monitoring, knowledge from ECG biometrics is extremely relevant. The ECG is one of the major data sources for driver drowsiness recognition, and research on ECG biometrics has been producing deeper insights that could benefit AUTOMOTIVE's target task. Beyond this, ECG biometric algorithms can offer the identity information required to continuously tune drowsiness monitoring algorithms to each user and, ultimately, improve their performance.

FIGURE 7: General structure of a biometric system, showing the enrollment and recognition stages (adapted from [12]).

FIGURE 8: The heartbeat, its characteristic waveforms, and their relationship with cardiac cycle polarization events on the heart's atria and ventricles, i.e., atrial and ventricular depolarization and repolarization (adapted from [12]).
The following sections delve deeper into the research work conducted on ECG biometrics for the AUTOMOTIVE project, inspired by the conclusions drawn by the literature survey presented in [12].

1) End-to-end deep learning models
Addressing the lack of end-to-end deep learning approaches for ECG biometrics, a simple method was proposed for both identification [37] and identity verification [38] tasks. The goal was to fully take advantage of the potential and flexibility of deep learning to integrate the whole ECG biometrics pipeline into a single model. Such a model receives all the information carried by the raw signals and is optimized as a whole to freely choose which data is most useful for accurate and robust decisions.
Proposed model: Inspired by the typical structure of a convolutional neural network, the model is composed of two parts: one for feature extraction, followed by one for decision (see Fig. 9). The feature extraction part is composed of four convolutional layers (with 24, 24, 36, and 36 filters of size 1 × 5, respectively), interleaved with 1 × 5 max-pooling layers. The decision part of the model is composed of a fully connected layer. For identification (in [37]), this last layer has one neuron per identity and softmax activation, and it is trained to output the probability of each identity. For identity verification (in [38]), the n-dimensional embeddings output by this layer (with ReLU activation) serve as processed representations of the input samples, and identity matches are found through their similarity to other embeddings.
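The forward pass of such an architecture can be sketched in NumPy with random weights. Only the filter counts and the 1 × 5 kernel and pooling sizes come from the text; the 1000-sample input length, unpadded convolutions, non-overlapping pooling placed only between the convolutional layers, ReLU activations, and the initialization are assumptions, so this is a shape-level illustration rather than the published model.

```python
import numpy as np


def conv1d_valid(x, w, b):
    """x: (C_in, L), w: (C_out, C_in, K), b: (C_out,) -> (C_out, L - K + 1)."""
    k = w.shape[2]
    win = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)  # (C_in, L-K+1, K)
    return np.einsum("clk,ock->ol", win, w) + b[:, None]


def maxpool1d(x, size=5):
    """Non-overlapping max-pooling; trailing samples that do not fill a window are dropped."""
    n = x.shape[1] // size
    return x[:, : n * size].reshape(x.shape[0], n, size).max(axis=2)


def init_params(n_identities, input_len, rng):
    """Random weights; filter counts and 1x5 kernels follow the text."""
    channels = [1, 24, 24, 36, 36]
    convs = [(rng.normal(0.0, 0.1, size=(c_out, c_in, 5)), np.zeros(c_out))
             for c_in, c_out in zip(channels[:-1], channels[1:])]
    length = input_len
    for i in range(4):
        length -= 4              # an unpadded 1x5 convolution removes 4 samples
        if i < 3:
            length //= 5         # pooling only between convolutional layers
    fc = (rng.normal(0.0, 0.1, size=(n_identities, channels[-1] * length)),
          np.zeros(n_identities))
    return convs, fc


def forward(x, convs, fc):
    """Raw ECG segment (1, L) -> identity probabilities (softmax)."""
    h = x
    for i, (w, b) in enumerate(convs):
        h = np.maximum(conv1d_valid(h, w, b), 0.0)   # ReLU
        if i < 3:
            h = maxpool1d(h, 5)
    logits = fc[0] @ h.ravel() + fc[1]
    e = np.exp(logits - logits.max())                # numerically stable softmax
    return e / e.sum()
```

With a 1000-sample input, the time axis shrinks 996 → 199 → 195 → 39 → 35 → 7 → 3, so the fully connected layer sees 36 × 3 = 108 features. For verification, the softmax would be dropped and the ReLU-activated output of the fully connected layer used as the embedding.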
Identification Experiments and Results: The work conducted in [37] and [38] mainly used data from the University of Toronto ECG Database (UofTDB) [22]. As the current major off-the-person database, with recordings from 1019 subjects over multiple sessions and postures, it offers a well-suited setup to develop and evaluate robust ECG biometric algorithms.
First, for identification [37], a study was conducted on the successive integration of processing stages into the deep learning model. Consider the traditional ECG biometric pipeline divided into four stages: (1) denoising; (2) preparation; (3) feature extraction; and (4) decision. The study encompassed four experimental setups: (A) stages 1 + 2 + 3 + deep model; (B) stages 1 + 2 + deep model; (C) stage 1 + deep model; and (D) deep model only. This amounted to a progressive evolution from the traditional ECG pipeline to a fully end-to-end model and enabled the assessment of the benefits of the latter.
The results (see Table 4) show that the traditional pipeline (setup A) yields the poorest accuracy. Setup B, with the deep model performing only feature extraction and decision, achieves the best results, followed closely by the fully end-to-end model. Nevertheless, the performance of setups C and D can be improved with data augmentation. Data augmentation is generally essential to avoid overfitting in deep learning models [39]. However, current strategies are highly specific to image-based tasks. As such, data augmentation strategies were specifically designed for ECG biometrics, aiming to mimic the common noise and distortions observed in realistic ECG signals. Out of seven types of data augmentation, four improved performance: random permutations, magnitude scaling, baseline wander, and flip. Random permutations offered the largest improvements, raising the accuracy of setups C and D to 94.2% and 96.1%, respectively.
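The four augmentations that improved performance can be sketched as simple operations on a list of samples. This is a minimal, hypothetical sketch, not the authors' implementation. The following values are all assumptions:

• the number of permutation segments;
• the scaling spread `sigma`;
• the baseline-wander amplitude and frequency ranges;
• the meaning of "flip", interpreted here as amplitude negation (time reversal would be another plausible reading).

```python
import math
import random


def random_permutation(signal, n_segments, rng):
    """Cut the signal into n_segments contiguous chunks and shuffle their order."""
    n = len(signal)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    chunks = [signal[bounds[i]:bounds[i + 1]] for i in range(n_segments)]
    rng.shuffle(chunks)
    return [x for chunk in chunks for x in chunk]


def magnitude_scaling(signal, rng, sigma=0.1):
    """Multiply the whole signal by a random gain drawn from N(1, sigma)."""
    gain = rng.gauss(1.0, sigma)
    return [gain * x for x in signal]


def baseline_wander(signal, fs, rng, max_amplitude=0.1, max_freq=0.5):
    """Add a slow sinusoidal drift (assumed below max_freq Hz) to mimic baseline wander."""
    amplitude = rng.uniform(0.0, max_amplitude)
    freq = rng.uniform(0.05, max_freq)
    phase = rng.uniform(0.0, 2 * math.pi)
    return [x + amplitude * math.sin(2 * math.pi * freq * i / fs + phase)
            for i, x in enumerate(signal)]


def flip(signal):
    """Invert the signal amplitude (assumed meaning of 'flip')."""
    return [-x for x in signal]


if __name__ == "__main__":
    rng = random.Random(0)
    ecg = [math.sin(2 * math.pi * 1.2 * i / 250) for i in range(500)]  # toy 2 s signal at 250 Hz
    augmented = baseline_wander(magnitude_scaling(random_permutation(ecg, 5, rng), rng), 250, rng)
    print(len(augmented))
```

The augmentations are applied independently and can be chained, as in the usage lines. Each one preserves the signal length, so augmented samples can be fed to the same model input.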
Identity Verification Experiments and Results: For identity verification [38], two training strategies were explored: the aforementioned identification training and the triplet loss [40]. Several experiments were conducted for a thorough and realistic evaluation of the methods: separate sets of subjects for training and testing, shorter enrollment durations, and cross-database tests on the PTB [23] and CYBHi [24] databases (with less noisy and noisier signals, respectively). The verification results (see Table 5) show that identification training surpasses triplet loss training, especially when enrollment data is scarce. The proposed method achieved equal error rates (EER) as low as 7.86% with 30 seconds of enrollment. Evaluated under the same conditions, it performed significantly better than alternative state-of-the-art approaches based on autocorrelation (AC/LDA) [41], autoencoders [42], and discrete cosine transforms (DCT) [37], [43]. Moreover, the results on PTB and CYBHi (see [38]) suggest the proposed method is especially promising for more realistic off-the-person data. Overall, combining all typical pipeline processes into a single end-to-end model seems to be the key to robust ECG biometric models in real applications.
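For reference, the EER reported above is the operating point at which the false match rate (FMR) equals the false non-match rate (FNMR). The sketch below is a simplified illustration, not the evaluation code of [38]. It scores embedding pairs with cosine similarity, an assumed choice, since the text only states that matches are found through embedding similarity. It then approximates the EER by sweeping the decision threshold over all observed scores.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping the threshold over all observed scores.

    Scores are similarities: a comparison is accepted when score >= threshold.
    Returns the mean of FMR and FNMR where the two are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fmr = sum(s >= t for s in impostor) / len(impostor)
        fnmr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(fmr - fnmr)
        if best is None or gap < best[0]:
            best = (gap, (fmr + fnmr) / 2)
    return best[1]


if __name__ == "__main__":
    # Hypothetical similarity scores for same-subject and different-subject pairs.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.5, 0.3, 0.2, 0.1]
    print("EER:", equal_error_rate(genuine, impostor))
```

With finite score sets, FMR and FNMR rarely cross exactly, so averaging them at the closest point is a common approximation; interpolating between thresholds is an alternative.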
The main contributions of this work for the AUTOMOTIVE project can be summarized as follows:
• End-to-end model: An end-to-end architecture was developed to perform biometric identification with ECG signals. Despite being based on deep learning, its relatively simple and lightweight structure, requiring no pre-processing stages beyond signal normalization, paves the way towards future deployment on embedded systems in real scenarios, as foreseen by the AUTOMOTIVE project;
• Data augmentation for ECG signals: The proposed data augmentation strategies, specifically tailored for ECG signals, make it possible to take full advantage of the available data for more accurate and robust algorithms;
• Realistic performance evaluation setup: The evaluation with off-the-person data and the careful division of training, enrollment, and testing data result in a challenging and realistic evaluation setup. It not only shows the improvements over the state-of-the-art but also faithfully illustrates how the proposed method would behave in real AUTOMOTIVE applications.
The end-to-end, robust, and lightweight model developed in this work also provides a strong scaffolding for the future development of personalized drowsiness monitoring. The availability of identity labels for new ECG data is essential to teach general drowsiness monitoring models to recognize subject-specific markers of drowsiness.

2) Long-term performance
The performance of biometric systems is known to decay over time, eventually rendering them ineffective. Based on previous studies on long-term ECG permanence [44] and the prior knowledge of ECG variability [45], it is expected that long-term signal variations will have a large effect on real ECG biometric applications. The work described in this section, presented in further detail in [46], aimed to study ECG variability over time and its real impact on the performance of state-of-the-art algorithms. Then, template and model update strategies were implemented to bridge the observed performance gap.
Identification Algorithms: Four state-of-the-art ECG-based identification methods were implemented as the foundation for the long-term performance tests: (1) the approach based on autocorrelation and discrete cosine transform (DCT) proposed by Plataniotis et al. [47]; (2) the methodology proposed by Tawfik et al. [48] using DCT coefficients from the average QRS complex; (3) an approach based on the discrete wavelet transform (DWT) proposed by Belgacem et al. [49]; and (4) the deep autoencoder proposed by Eduardo et al. [42].
Template Update Strategies: Two template update strategies were implemented to keep the models up to date on the identity information of the enrolled subjects. The first is FIFO (first-in-first-out): the database is updated with new samples whose similarity to the current templates is above a defined threshold (obtained empirically using the training data), each replacing the oldest template of the same identity. The second is Fixation, where certain templates are fixed and only the remaining stored samples can be updated; this ensures that some initial labeled identity information remains in the system over time.
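A minimal sketch of both update rules follows, under several assumptions not stated in the text:

• templates are feature vectors compared with cosine similarity;
• a new sample is checked only against the templates of the identity it was assigned to;
• acceptance uses the maximum similarity to that identity's templates.

The class name `TemplateStore` and the `n_fixed` parameter are hypothetical. With `n_fixed=0` the store behaves as FIFO; with `n_fixed > 0` the first `n_fixed` (initial enrollment) templates are pinned, as in Fixation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TemplateStore:
    """Per-identity template lists, ordered from oldest to newest (hypothetical helper)."""

    def __init__(self, templates, threshold, n_fixed=0):
        self.templates = {k: list(v) for k, v in templates.items()}
        self.threshold = threshold  # in the study, chosen empirically on training data
        self.n_fixed = n_fixed      # 0 -> FIFO; >0 -> Fixation

    def update(self, identity, sample):
        """Store `sample` for `identity` if it is similar enough; return True if stored."""
        stored = self.templates[identity]
        if max(cosine(sample, t) for t in stored) < self.threshold:
            return False  # too dissimilar: do not risk corrupting the templates
        fixed, updatable = stored[:self.n_fixed], stored[self.n_fixed:]
        if not updatable:
            return False  # every template is pinned
        # FIFO on the updatable part: drop the oldest, append the newest.
        self.templates[identity] = fixed + updatable[1:] + [sample]
        return True


if __name__ == "__main__":
    enrolled = {"subject_1": [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]}
    store = TemplateStore(enrolled, threshold=0.9, n_fixed=1)  # Fixation variant
    store.update("subject_1", [1.0, 0.05])
    print(store.templates["subject_1"])
```

Pinning enrollment templates trades some adaptability for protection against drift: if accepted samples slowly move away from the true identity, the pinned templates still anchor the original identity information.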
Experiments: Experiments used the E-HOL-03-0202-003 database (commonly called E-HOL 24h), which consists of three-lead Holter recordings of 202 healthy subjects. The training set consisted of the last 30 seconds of the first 60 minutes, to mimic short enrollment times and avoid the initial resting period. To study performance over time, testing was performed at seven time points: one immediately after enrollment, another after one hour, and others at regular intervals until the end of the recordings.
Results: The performance results at each test hour (see Fig. 10) show that performance decays significantly, even over relatively short periods. The template update techniques successfully reduced the performance decay over time: the FIFO technique improved accuracy by 8 to 9%, on average, whereas the Fixation strategy yielded an average accuracy increase of 10%. Overall, the results show that long-term identification performance in ECG biometrics is generally weak, and template update techniques should be studied further to enhance the long-term performance of state-of-the-art methods.
FIGURE 10: Evolution of the state-of-the-art ECG-based identification accuracy over time [46].
The main contributions of this work for the AUTOMOTIVE project can be summarized as follows:
• Deeper knowledge of long-term performance decay: through this comprehensive study of the impact of ECG variability over time on the performance of state-of-the-art algorithms, the AUTOMOTIVE project now has a deeper understanding of the hurdles that must be overcome to achieve suitable real applications;
• Comparison of update strategies: template update is shown to be a must for real ECG biometric applications, and the comparison of update strategies performed in this work paves the way for more robust algorithms in realistic scenarios.
For AUTOMOTIVE's central topic of drowsiness detection, the knowledge and methods built in this work are essential for robust long-term ECG biometrics. In turn, these are paramount for the aforementioned continuous user-tuning of drowsiness monitoring models.

B. xAI IN BIOMETRICS
In the absence of a mathematical definition, interpretability is the degree to which a human can "understand the cause of a decision" [50] or, in a machine learning context, "consistently predict the model's result" [51]. Thus, a model is more interpretable when it is easier for a person to identify why it made a certain decision, and one model is more interpretable than another if its decisions are easier to understand [50]. In biometrics, this topic has only recently started to be explored, with researchers now beginning to use interpretations to improve their biometric models [52], [53].
Our work in [54] was the first to use interpretability to understand how ECG signals carry identity information and how that information changes with more realistic acquisition settings. It also discussed the possibility of using such insights during training for improved robustness to signal noise and variability. In addition, our work in [55] pioneered the incorporation of interpretability into face Presentation Attack Detection (PAD) methods covering a wide range of attacks. We believe that research on both of these problems, within the AUTOMOTIVE project, can substantially impact the next generation of biometric systems: more accurate, more robust, and more transparent.

1) xAI for ECG Biometrics
Research in ECG biometrics has been steadily evolving from smaller databases of high-quality signals acquired in medical settings (on-the-person settings) towards larger databases of noisier signals acquired in more realistic scenarios (off-the-person settings). One defining characteristic of this evolution is the movement away from fiducial-based methodologies in favor of holistic approaches. Initially, several approaches successfully used the QRS complex (the key defining feature of the ECG), or its amplitude and time measures, for identification, as it was considered more stable over time and across variable conditions. However, using only QRS complexes is increasingly uncommon as research evolves towards off-the-person signals and larger databases. This suggests that the QRS complex alone may not be enough for identity recognition in large populations and more realistic scenarios.
This subsection describes a study [54] conducted to understand how a state-of-the-art end-to-end model uses signal information in diverse scenarios, leveraging recent tools developed for model interpretability. The proposed methodology had the following main steps:
Biometric Identification Model: The model followed the architecture proposed in [37], [38] (see section III-A1): an end-to-end 1D convolutional neural network (CNN) with four convolutional layers. Neighboring convolutional layers were separated by max-pooling layers. The last convolutional layer was followed by two fully connected layers: the first with 100 neurons and ReLU activation, and the last with N neurons and softmax activation (N being the number of considered identities). This model was trained with either the PTB [23] or the UofTDB [22] database, on increasingly larger subsets of identities.
Interpretability Method: Interpretability is a quickly growing topic that is contributing to a more complete understanding of the often elusive behavior of deep learning models [56]. To better understand the trained ECG biometric models, this work used the interpretability tools Occlusion [57], Saliency [58], Gradient-SHAP [59], and DeepLIFT [60], as implemented in the Captum [61] library for PyTorch.
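Of the four tools, Occlusion is the simplest to reason about. It slides a window over the input, replaces the covered samples with a baseline, and attributes the resulting drop in the model's output to those samples. Captum provides this as `captum.attr.Occlusion`; the dependency-free sketch below only mimics the idea for a 1D signal. The toy scoring function stands in, hypothetically, for the CNN's class score, and the window, stride, and baseline values are assumptions.

```python
def occlusion_1d(score_fn, signal, window, stride, baseline=0.0):
    """Average drop in score_fn when each sample is covered by an occluding window.

    score_fn: callable mapping a list of samples to a scalar score (e.g., the
    probability of the true identity). Samples never covered get relevance 0.
    """
    n = len(signal)
    reference = score_fn(signal)
    total = [0.0] * n
    count = [0] * n
    for start in range(0, n - window + 1, stride):
        occluded = list(signal)
        for i in range(start, start + window):
            occluded[i] = baseline
        drop = reference - score_fn(occluded)
        for i in range(start, start + window):
            total[i] += drop
            count[i] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


def toy_score(x):
    """Hypothetical model score that only depends on samples 40-59 (a QRS-like region)."""
    return sum(x[40:60])


if __name__ == "__main__":
    relevance = occlusion_1d(toy_score, [1.0] * 100, window=10, stride=10)
    print([round(r, 1) for r in relevance[30:70:5]])
```

Relevance concentrates on the region the score depends on. With a real model, this is the kind of map that would reveal whether the network relies on the QRS complex or also on other waveforms.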
Results: Analysing the explanations obtained (see an example overview in Fig. 11), one can observe a trend, from smaller to larger identity subsets, in which the relevance of the QRS complex, initially dominant, is increasingly shared with other parts of the signal. A similar dynamic is observed when comparing explanations for on-the-person vs. off-the-person signals: the focus is mostly on the QRS complex for the former, but relevance is more evenly shared with other waveforms for the latter. Overall, while the QRS complex seems to be the most important part of the ECG signal for biometrics, it can only be reliably used alone in on-the-person scenarios with smaller populations. For larger sets of identities and more realistically noisy signals, the information carried by other ECG waveforms is important for robust and accurate decisions.
The main contribution of this work for the AUTOMOTIVE project can be summarized as follows:
• Better understanding of ECG biometrics: the AUTOMOTIVE project moves forward with new evidence of the relative relevance of the ECG waveforms under diverse conditions of signal quality and population size. This new knowledge steers research towards more robust and accurate ECG biometric algorithms;
• Application impact: this work was the first on interpretability for ECG biometrics. It establishes a guide for researchers in this field, with several results, suggestions for future work, and an intuitive way to visualize interpretations for unidimensional signals.
For the topic of ECG biometrics, exploring transparency through interpretability is important as research quickly moves towards deep learning models. This leads to improved ECG biometrics which, by itself, already benefits drowsiness monitoring, as discussed before. Moreover, exploring interpretability and explainability for 1D signals opens new doors to understanding what information is indeed useful for drowsiness monitoring, where it is located in the physiological signals, and how to best capture it.

2) xAI for Face PAD
In this subsection, a study on interpretability tools applied to deep neural networks trained for PAD in face biometrics is reported [55]. The proposed methodology had the following main steps:
Presentation Attack Detection Network: The deep neural network was a simple end-to-end convolutional neural network composed of four convolutional layers, with three max-pooling layers interposed between them, and three fully connected layers. The four convolutional layers had 32, 32, 64, and 64 filters, respectively, of size 3 × 3, with unit stride and padding. Max-pooling was performed over 2 × 2 regions with stride 2. The dense layers had 100, 100, and 2 neurons, respectively. All convolutional and fully connected layers were followed by rectified linear unit (ReLU) activations, except for the last dense layer, which was followed by softmax activation.
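The sketch below lists the output shape and parameter count of each layer of the PAD network described above. It is an illustration under assumptions, not the code of [55]. The input size (a 64 × 64 RGB face crop) is hypothetical. "Padding" is read as one pixel of zero padding, so the 3 × 3 convolutions preserve spatial size. One pooling layer is placed between each pair of consecutive convolutions.

```python
def pad_net_summary(height=64, width=64, channels=3,
                    conv_filters=(32, 32, 64, 64), dense=(100, 100, 2)):
    """List (name, output shape, parameter count) per layer and the total parameters.

    Assumptions: 'same' padding (1 pixel for 3x3 kernels, unit stride), so
    convolutions keep the spatial size; 2x2/stride-2 pooling halves it.
    """
    layers, total = [], 0
    h, w, c = height, width, channels
    for i, n_filters in enumerate(conv_filters):
        params = 3 * 3 * c * n_filters + n_filters  # 3x3 kernels + one bias per filter
        c = n_filters
        layers.append((f"conv{i + 1}", (h, w, c), params))
        total += params
        if i < len(conv_filters) - 1:  # three pooling layers between four convolutions
            h, w = h // 2, w // 2
            layers.append((f"pool{i + 1}", (h, w, c), 0))
    n_in = h * w * c  # flattened feature map
    for j, units in enumerate(dense):
        params = n_in * units + units  # weights + biases
        layers.append((f"dense{j + 1}", (units,), params))
        total += params
        n_in = units
    return layers, total


if __name__ == "__main__":
    layers, total = pad_net_summary()  # hypothetical 64 x 64 RGB face crop
    for name, shape, params in layers:
        print(f"{name:7s} {str(shape):14s} {params}")
    print("total parameters:", total)
```

Under these assumptions the network has 485,570 parameters. Roughly 84% of them (409,700) sit in the first dense layer, which is consistent with the text's description of a "simple" network.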
Interpretability Method: The Gradient-weighted Class Activation Mapping (GradCAM) tool was applied to the last convolutional layer of the model. GradCAM allows a) assigning different importance values to each neuron for a particular decision of interest, b) explaining any layer of the network, and c) analyzing the model predictions at the class level.
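The core GradCAM computation takes only a few lines. Each feature map A^k of the chosen layer receives a weight α_k, equal to the spatial average of the gradient of the class score y^c with respect to that map. The explanation is then ReLU(Σ_k α_k A^k). In this illustrative sketch, the activations and gradients are passed in as nested lists. They stand in for values that a deep learning framework would obtain with a forward and a backward pass. For visualization, the coarse map would usually be upsampled to the input resolution.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from one layer's activations and class-score gradients.

    activations, gradients: nested lists indexed [k][i][j] (feature map, row, column).
    Returns (weights, cam), where cam[i][j] = ReLU(sum_k weights[k] * A^k[i][j]).
    """
    n_maps = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    # Global-average-pool the gradients of each feature map.
    weights = [sum(sum(row) for row in gradients[k]) / (height * width) for k in range(n_maps)]
    cam = [[max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_maps)))
            for j in range(width)]
           for i in range(height)]
    return weights, cam


if __name__ == "__main__":
    # Toy layer with two 2x2 feature maps: the class score increases with map 0
    # and decreases with map 1.
    acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))
```

The ReLU keeps only regions that push the score of the class of interest up. This is why GradCAM maps are class-discriminative, which matches property c) above.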
Evaluation Frameworks:
• Mix-Attack: the model was trained and tested with bona fide samples and all the available varieties of attacks;
• One-Attack:
-- The model was trained and tested with bona fide samples and one single type of attack, which was already seen by the network during the training step;
-- The bona fide samples of one random subject were present in the training set in one evaluation iteration, and were then swapped to the test set for a second evaluation run;
• Unseen-Attack: the model was trained with all but one type of attack and tested with only this type, with bona fide samples used in both the training and test steps.
The main contribution of this work for the AUTOMOTIVE project can be summarized as follows:
• Identification of desirable properties for the generalization of deep neural models to unseen data and attacks: the AUTOMOTIVE project moves forward by using interpretability tools to state the following properties of the models: 1) explanations for the same sample should be similar whether or not it is seen during training; 2) explanations for the same sample should be similar whether or not the model is trained to detect that specific attack; 3) explanations should be similar for different samples with the same predicted label; and 4) explanations should be meaningful;
• Application Impact: this work was the first on interpretability for face biometrics, establishing a guide for researchers in this field with several results and suggestions for future work.
By exploring interpretability for face images, this work also paves the way for more transparent driver drowsiness monitoring based on face video. The knowledge acquired with this study not only leads to improved biometric recognition but also illustrates new frameworks to understand which facial features are the most informative for recognizing drowsiness in vehicle drivers.
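The Unseen-Attack framework listed among the evaluation frameworks is a leave-one-attack-out scheme. The sketch below enumerates its folds. It is a hypothetical helper resting on two assumptions the text does not specify: bona fide samples are already split into training and test subsets, and every sample of the held-out attack type goes to the test set.

```python
def unseen_attack_splits(bona_fide_train, bona_fide_test, attacks):
    """Yield (held_out_attack, train_ids, test_ids) for a leave-one-attack-out protocol.

    attacks: dict mapping an attack type (e.g. "print", "replay") to its sample ids.
    """
    for held_out in sorted(attacks):
        train = list(bona_fide_train)
        for attack_type in sorted(attacks):
            if attack_type != held_out:
                train.extend(attacks[attack_type])
        test = list(bona_fide_test) + list(attacks[held_out])
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical sample ids grouped by attack type.
    attacks = {"print": ["p1", "p2"], "replay": ["r1"], "mask": ["m1"]}
    for held_out, train, test in unseen_attack_splits(["b1"], ["b2"], attacks):
        print(held_out, train, test)
```

Each fold trains the network without ever seeing the held-out attack. Comparing explanations across folds is what allows checking property 2) above.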

C. GENERAL BIOMETRIC APPLICATIONS
1) Template security on end-to-end models
The task of recognizing identity requires storing highly sensitive personal information. As such, any good biometric system should verify three template security properties: cancelability, unlinkability, and irreversibility. These are typically achieved through tailored feature extraction, encryption, or biohashing schemes applied before storage and matching. The most prominent of such schemes are Bloom Filters (BF) [62] and Homomorphic Encryption (HE) [63]. However, the state of the art in biometric recognition is increasingly dominated by deep learning approaches, and adding separate protection and matching processes is sub-optimal and creates additional hurdles that may limit the achievable performance. All of this calls for integrating template protection within deep learning models: since these models are so flexible and have learned so many sophisticated tasks, why not have them learn template protection as well?
Proposed Methodology: The Secure Triplet Loss (STL) was proposed [64], [65] to achieve the aforementioned goals. Consider a model that receives a biometric sample and a cancelability key and outputs a template. Its training objective includes a component for identity and cancelability and a component for linkability (further details in the original publication [65]). The first component adapts the original triplet loss to push away not only templates with different identities but also templates with different keys. It clusters only templates that agree on both identity and key, thus promoting cancelability. The second component measures the linkability of the templates in a batch, either through the Kullback-Leibler divergence (STL w/KLD) or using the differences in mean and standard deviation (STL w/SL).
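The key idea of the first STL component is to treat a template as a positive only if it shares both identity and key with the anchor. The sketch below illustrates that idea. It is a simplified illustration, not the STL of [65]: it uses a standard hinge triplet loss with Euclidean distance and an assumed margin of 1.0, averaged over all valid triplets in a batch. The exact STL formulation and the linkability term (KLD or mean/standard-deviation differences) are given in [65] and are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two templates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def identity_key_triplet_loss(templates, meta, margin=1.0):
    """Mean triplet loss where meta[i] = (identity, key).

    Positives must match the anchor in BOTH identity and key; negatives differ in
    identity, key, or both, so templates made with another key are pushed away too.
    """
    losses = []
    n = len(templates)
    for a in range(n):
        for p in range(n):
            if p == a or meta[p] != meta[a]:
                continue
            for neg in range(n):
                if meta[neg] == meta[a]:
                    continue
                losses.append(triplet_loss(templates[a], templates[p], templates[neg], margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Hypothetical batch: subject A with key 1 (twice), subject A with key 2, subject B with key 1.
    meta = [("A", 1), ("A", 1), ("A", 2), ("B", 1)]
    well_separated = [[0.0, 0.0], [0.0, 0.1], [5.0, 0.0], [0.0, 5.0]]
    key_not_separated = [[0.0, 0.0], [0.0, 0.1], [0.0, 0.2], [0.0, 5.0]]
    print(identity_key_triplet_loss(well_separated, meta))
    print(identity_key_triplet_loss(key_not_separated, meta))
```

The second batch is penalized even though all three "A" templates belong to the same subject. That penalty is the mechanism that makes templates issued with a different key non-matching, i.e., cancelable.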
Experiments: The proposed approach was evaluated for identity verification in two scenarios: (A) training a network "from scratch", and (B) adapting and fine-tuning an existing model to make it output protected templates. Scenario A was explored for ECG biometrics, using the end-to-end approach in [37], [38], [54], [66] and the UofTDB database [22]. Scenario B was explored for face biometrics, using the Inception-ResNet-V1 [67], pretrained on VGGFace2, with the YouTube Faces database [26].
Results: The results for scenarios A and B are presented in Table 6 and Table 7, respectively. They cover three aspects: performance (equal error rate, EER; and false non-match rate at 0.1% false match rate, FMR@FNMR = 0.1), cancelability (false cancelability match rate at the EER point, FMR_C), and linkability (D_sys^↔, as proposed in [62]). In scenario A, the results show that the proposed method largely avoids performance losses relative to the unprotected triplet loss, offering better performance than BF. The STL also offers the best cancelability results and unlinkability levels near those offered by HE. In scenario B, the STL shows some performance gap relative to the baseline, but is nevertheless on par with the state-of-the-art HE and considerably better than BF. STL once again offered the best cancelability and acceptable linkability results.
The main contributions of this work for the AUTOMOTIVE project can be summarized as follows:
• A novel method for biometric data protection: STL is a simple and competitive alternative to state-of-the-art template protection schemes. Through a tailored objective function, it teaches cancelability and unlinkability to the biometric model, avoiding any separate protection process. Thus, STL can be applied to any biometric trait and most trainable architectures. It can train new models and also transform existing ones so that they deliver protected templates;
• Security without performance losses: The experiments show that STL matches the state-of-the-art in template protection (HE) in unlinkability. Most importantly, STL attains the best template cancelability results while avoiding considerable performance decay;
• Lightweight and flexible: As it only requires minor architecture changes and no additional processes, STL allows models to retain their original size and average inference time while offering good security levels. Moreover, STL can be applied to new or existing models of varying complexity, paving the way for the lightweight in-vehicle biometric systems foreseen by AUTOMOTIVE.
The lightweight, flexible strategy for biometric template protection proposed in [64], [65] is also of paramount relevance for the new generation of driver drowsiness monitoring systems. To continuously learn subject-specific patterns of drowsiness, such models require biometric systems, which will need to securely store driver identity information. With the Secure Triplet Loss, this can be done with little effort and minimal processing requirements, in a way that is suitable for embedded in-vehicle systems.

D. EMOTION RECOGNITION THROUGH ECG AND FACE ANALYSIS
Automatic facial expression recognition (FER) has been one of the key problems in the human-computer interaction field, with growing application areas including neuromarketing, crowd analytics, biometrics, and clinical monitoring [68]. Expression recognition is a task that humans perform daily and effortlessly, but it is not yet easily performed by computers. Recent methods, particularly those using deep learning, have demonstrated remarkable performance in highly controlled environments. Nevertheless, automatic FER in real-world scenarios is still a very challenging task [68]. In addition, deep models still fall short of their full potential, as training high-capacity models on small datasets, such as those available in the FER field, usually results in overfitting.
To work around the problem of training high-capacity classifiers on small datasets, previous FER works have mainly resorted to two techniques. The first is (i) transfer learning [69], where a CNN is typically pre-trained on a domain-related dataset before being fine-tuned on the target dataset. The second is (ii) classifier ensembles [70], in which the decisions of several CNNs are combined to reduce the model's variance. However, their benefits are tightly coupled with the similarity between the source and target domains.
In terms of motivation, the work of Liu et al. [71] is probably the most related to the proposed methodology in the AUTOMOTIVE project, as they also explore the psychological theory that facial expressions are the result of the motions of facial muscles.
However, some remain skeptical about emotion recognition based on facial expressions, as people are capable of faking these expressions to convey emotions they do not feel [72]. It is known that emotional states influence the autonomic nervous system and, consequently, the morphology of the physiological signals. This explains the current efforts towards affective computing based on physiological signals (such as the ECG), which cannot be voluntarily altered to fake emotions.
Despite several studies addressing emotion recognition using physiological signals, it is still uncertain how emotion variations translate into actual pattern alterations in each physiological signal. Moreover, for the specific case of ECG-based emotion recognition, the results reported in the literature are generally encouraging [73]. Even so, many problems remain to be addressed, mostly related to the scarcity of data, their limited variety, and the subjectivity of the corresponding emotion labels.
These challenges in ECG-based emotion recognition are reflected in the performance of the algorithms. These often offer severely inferior accuracy when evaluated under realistic scenarios, failing to live up to the expectations set by the results reported in the literature. For example, the methodologies proposed in [74]–[77] all offer relatively high accuracy, but only when evaluated on random data splits. One should expect performance to decay sharply once the methods are evaluated on disjoint sets of recordings (signal-independent) and subjects (subject-independent), as they would be in real applications.

FIGURE 12: Schema of the proposed methodology for ECG-based emotion recognition as proposed in [73]. MIL aggregates individual predictions for consecutive segments into a single recording prediction.

1) ECG Analysis
The work described in this section (detailed further in [73]) focused on developing a methodology for emotion recognition that takes advantage of the continuous nature of the ECG signal for improved accuracy. Special efforts were devoted to ensuring the evaluation settings adequately mimic realistic signal-independent and subject-independent settings. This offers accurate performance estimates that are more likely to be verified in real applications.
Methodology: The proposed methodology builds upon the approach proposed in [77], taking advantage of its self-learning pre-training, which offered higher robustness despite data scarcity. The approach (see Fig. 12) is based on a convolutional neural network that receives short ECG segments. Initially, the model is trained to recognize a set of eight transformations applied to the input signals (self-learning): noise addition, scaling, negation, temporal inversion, permutation, time-warping, baseline wander, and magnitude warping. After this pre-training stage, the convolutional layers are frozen and the top of the model is adapted to provide valence and arousal predictions. Individual predictions from consecutive segments are combined into recording-level predictions using multiple instance learning (MIL). The aggregation uses either heuristic methods (maximum, mean, and median), a multilayer perceptron (MLP), a long short-term memory (LSTM) network, or a bidirectional LSTM (BiLSTM).
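The heuristic MIL aggregators mentioned above reduce a recording's per-segment outputs to one value. The sketch below shows them together with the segmentation step; the experiments use 10-second segments. It is a minimal sketch under assumptions, not the code of [73]. Segment outputs are assumed to be probabilities of a positive class (e.g., high arousal), and recordings are labeled with an assumed 0.5 threshold. The learned aggregators (MLP, LSTM, BiLSTM) are not sketched.

```python
import statistics


def split_segments(signal, fs, seconds=10.0):
    """Cut a recording into consecutive non-overlapping segments of `seconds` each."""
    step = int(fs * seconds)
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]


def aggregate(segment_scores, method="mean"):
    """Heuristic MIL pooling of per-segment scores into one recording-level score."""
    if method == "max":
        return max(segment_scores)
    if method == "mean":
        return statistics.mean(segment_scores)
    if method == "median":
        return statistics.median(segment_scores)
    raise ValueError(f"unknown aggregation method: {method}")


def predict_recording(segment_scores, method="mean", threshold=0.5):
    """Binary recording-level label (e.g., high vs. low arousal) from segment scores."""
    return int(aggregate(segment_scores, method) >= threshold)


if __name__ == "__main__":
    # Hypothetical per-segment probabilities of high arousal for one recording.
    scores = [0.2, 0.9, 0.4]
    for method in ("max", "mean", "median"):
        print(method, aggregate(scores, method), predict_recording(scores, method))
```

The choice of aggregator matters. "max" flags a recording if any segment looks aroused, while "median" ignores isolated outlier segments. This is one way MIL can exploit the continuity of the ECG.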
Experiments: Experiments used ECG signals from the DREAMER [78], AMIGOS [79], and MAHNOB-HCI [80] databases. For the individual predictions, 10-second segments were considered. All segments from each recording were used for the corresponding aggregated MIL predictions. Data were divided in one of four ways: randomly between training and testing (random division), by recording (signal-independent setting), by subject (subject-independent setting), or by database, with training and testing on different databases (cross-database setting).
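The first three data-division settings differ only in which unit must stay on one side of the split. The sketch below implements them with a single group-wise split. It is a hypothetical helper, not taken from [73]. Grouping by recording gives the signal-independent setting, and grouping by subject gives the subject-independent one. Grouping by individual segment reproduces the random division, in which segments of the same recording can leak into both sets. The 25% test fraction and the toy data are assumptions.

```python
import random


def group_split(samples, group_of, test_fraction=0.25, seed=0):
    """Split samples so that no group (e.g., a recording or a subject) is in both sets."""
    groups = sorted({group_of(s) for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_of(s) not in test_groups]
    test = [s for s in samples if group_of(s) in test_groups]
    return train, test


# Toy data: 4 subjects x 2 recordings x 3 ten-second segments (hypothetical).
segments = [{"subject": s, "recording": (s, r), "segment": k}
            for s in range(4) for r in range(2) for k in range(3)]

random_division = group_split(segments, lambda d: (d["recording"], d["segment"]))
signal_independent = group_split(segments, lambda d: d["recording"])
subject_independent = group_split(segments, lambda d: d["subject"])

if __name__ == "__main__":
    for name, (train, test) in [("random", random_division),
                                ("signal-independent", signal_independent),
                                ("subject-independent", subject_independent)]:
        print(name, len(train), len(test))
```

Only the group key changes between settings. This makes it easy to see why random division overestimates performance: the model may be tested on segments from recordings, and subjects, it has already seen.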
Results: As expected, results show a sharp decay in performance when moving from random data division settings (the most common in the literature) to signal-independent settings (more realistic). For the former, the methodology achieves 75–79% accuracy on individual predictions vs. 54–56% for the latter. In cross-database experiments, when trained on the DREAMER database, the method achieves 46–53% accuracy on the AMIGOS database and 55–61% on the MAHNOB-HCI database. However, results appear to improve once MIL is applied, especially when using heuristic methods, which illustrates the benefit of considering the continuity of the ECG for emotion recognition.
The main contributions of this work for the AUTOMOTIVE project can be summarized as follows:
• A realistic perspective on the state-of-the-art: This work identified critical flaws in the way ECG-based emotion recognition algorithms are evaluated. With this, it was possible to restructure the evaluation settings into more challenging and realistic scenarios. These yield more accurate results that are more likely to be verified in real applications;
• An improved methodology for ECG-based affective computing: Taking advantage of the continuity of the ECG signal, the base methodology was successfully adapted with MIL techniques for improved performance. This brings us closer to the real applications targeted by the AUTOMOTIVE project.

2) Face Analysis
This subsection presents an end-to-end deep learning approach for emotion recognition using prior knowledge on facial expressions [81]. The novel network architecture, along with a well-designed loss function, explicitly models both informative local facial regions and expression recognition. The intuitive idea is to learn the most relevant facial regions for expression recognition, such as facial components (i.e., eyes, eyebrows, nose, mouth) and expression wrinkles. To accomplish this purpose, the proposed neural network is composed of three main components, namely (i) the facial-parts component, (ii) the representation component, and (iii) the classification component (see Fig. 13).

I - Facial-Parts Component:
The facial-parts component learns an encoding-decoding function E(x) that maps an input image x to a relevance map x̂, representing the probability of each pixel being relevant for recognition. The loss function builds on the physiological knowledge that facial expressions can be decomposed into several action units of facial muscles. It is defined to enforce sparsity and spatial contiguity on the activations of x̂. To this end, three regularization strategies for the regression of x̂ were proposed.
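The three regularization strategies of [81] are not detailed here. The sketch below therefore uses two common, generic penalties to show what "sparsity" and "spatial contiguity" mean for a relevance map x̂: an L1 term and a total-variation (TV) term. It is an assumption-based illustration, not the loss of [81]. The weighting parameters `lam_sparse` and `lam_tv` are hypothetical.

```python
def l1_sparsity(relevance):
    """Mean absolute value of a 2D relevance map: lower means sparser."""
    values = [abs(v) for row in relevance for v in row]
    return sum(values) / len(values)


def total_variation(relevance):
    """Sum of absolute differences between adjacent pixels: lower means more contiguous."""
    h, w = len(relevance), len(relevance[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(relevance[i + 1][j] - relevance[i][j])
            if j + 1 < w:
                tv += abs(relevance[i][j + 1] - relevance[i][j])
    return tv


def relevance_regularizer(relevance, lam_sparse=1.0, lam_tv=1.0):
    """Hypothetical combined penalty that would be added to the expression-recognition loss."""
    return lam_sparse * l1_sparsity(relevance) + lam_tv * total_variation(relevance)


if __name__ == "__main__":
    # Same number of active pixels: one compact blob vs. four isolated pixels.
    blob = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    scattered = [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
    print(relevance_regularizer(blob), relevance_regularizer(scattered))
```

Both maps are equally sparse, but the scattered one has three times the total variation. A contiguity term of this kind therefore favors compact regions such as an eye or the mouth over isolated pixels.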

II - Representation Component:
The representation component learns an embedding function F(x, x̂) that maps an input image x and its relevance map x̂ to a hidden representation h. The relevance map x̂ from the facial-parts component is used to filter the learned representation h, so that it only activates strongly for the most relevant facial parts.
FIGURE 13: Architecture of the proposed model for emotion recognition based on facial expressions (adapted from [81]).

III - Classification Component:
The classification component consists of a sequence of fully connected layers followed by a softmax output layer. The largest probability output is chosen as the class prediction.
The main contribution of this work for the AUTOMOTIVE project can be summarized as follows:
• Facial Expression Recognition Model: the AUTOMOTIVE project moves forward with an end-to-end deep neural network and a loss function that regularizes the entire learning process, so that the proposed method can explicitly learn expression-specific features. The approach is based on the strong prior knowledge that facial expressions are the result of the motions of some facial muscles and components;
• Application Impact: facial expressions are an important component of emotion recognition and person identification, with strong applications in autonomous driving and mobility solutions. The model can be explored for emotion categorization, action recognition, or well-being monitoring of vehicle occupants.
Emotion recognition can complement and extend drowsiness detection and, hence, help pave the way for an all-in-one and more robust well-being monitoring system. Furthermore, these techniques can help promote road safety, particularly if used to detect negative emotional states in the driver, such as anger or stress [82].

E. DROWSINESS DETECTION THROUGH ECG, PPG AND EOG ANALYSIS
Normal human sleep is composed of a Rapid-Eye-Movement (REM) stage and four non-REM stages. These stages have well-defined characteristics and alternate cyclically, with a standard adult human cycle lasting approximately 90 minutes. Each cycle starts with a non-REM stage and ends in REM sleep [83]. According to Keenan et al. [84], "sleep is a reversible behavioral state of perceptual disengagement from and unresponsiveness to the environment." The term drowsiness, often used interchangeably with fatigue and sleepiness in the literature, relates to a physiological need to sleep, as it is the intermediate state between wakefulness and sleep [85]. This state comes with impaired visual perception and higher cognitive functions and an inability to maintain visually focused attention, among other consequences undesirable for safe driving [86].
Since physiological needs cannot be suppressed indefinitely, a drowsy state always precedes a sleep episode [85]. As such, by the time a drowsy state is detected, the body is already struggling to remain awake, possibly lapsing into microsleeps in the process [87]. Microsleeps have been associated with poor driving performance [88] and, during these episodes, attention gaps can impair the driver's ability to respond to events [89].
Studies report great inter-subject variability in how drowsiness affects drivers' performance [90]. Even for a given self-declared drowsiness level, indicators such as eye blink duration vary considerably [29] and, thus, objectively and non-intrusively measuring drowsiness has been a constant challenge in research.
Several subjective scales score drowsiness based on the subjects' responses concerning standardized sleep symptoms, for example, the Epworth Sleepiness Scale (ESS) [31], the Stanford Sleepiness Scale (SSS) [91], a Visual Analog Scale (VAS) [92], and the Karolinska Sleepiness Scale (KSS) [93]. The Objective Sleepiness Scale (OSS), which combines features of the electroencephalogram (EEG) and eye movements, has some disadvantages, including an insufficient number of sleep stage categories [13] and its intrusiveness.
The KSS is a nine-point scale that provides verbal references along its range (1: very alert; 3: alert; 5: neither alert nor sleepy; 7: sleepy but not fighting sleep; 9: very sleepy and fighting sleep [93]). The ground truth for the drowsiness level is obtained by having the subject periodically report their perceived score. Although it is not an objective measurement and may itself influence the driver's state [13], [29], the KSS is easy to apply, non-invasive, and widely used in research [93].
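Because the studies below train binary alert/drowsy classifiers from periodic KSS reports, the ratings must be mapped to class labels. The following minimal sketch assumes hypothetical cut-offs (KSS at most 5 is alert, at least 7 is drowsy, and 6 is treated as ambiguous and excluded); the text does not fix these thresholds, and each study may use its own.

```python
from typing import Optional

# Hypothetical cut-offs: the source does not specify how KSS ratings are binarized.
ALERT_MAX = 5   # KSS 1-5 -> alert
DROWSY_MIN = 7  # KSS 7-9 -> drowsy; KSS 6 is left out as ambiguous


def kss_to_label(kss: int) -> Optional[int]:
    """Map a 1-9 KSS rating to 0 (alert), 1 (drowsy), or None (excluded)."""
    if not 1 <= kss <= 9:
        raise ValueError(f"KSS must be in 1..9, got {kss}")
    if kss <= ALERT_MAX:
        return 0
    if kss >= DROWSY_MIN:
        return 1
    return None


print([kss_to_label(r) for r in [2, 3, 5, 6, 7, 9]])  # [0, 0, 0, None, 1, 1]
```

Excluding the middle rating trades some data for cleaner class boundaries; keeping it in either class is an equally valid design choice.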
While in a state of drowsiness, the cardiac system significantly alters its behavior; as such, studying these modifications can reveal information about a driver's alertness level. When a subject is drowsy, their heart rate becomes slower and more irregular, and blood pressure drops [94]. In this state, the parasympathetic nervous system becomes more active and sympathetic activity decreases, while the opposite holds when the subject is awake [94]-[98]. A reliable measure of autonomic nervous system activity is the heart rate variability (HRV), which captures the variation of the cardiac cycle duration and is usually obtained through R-peak detection on the ECG followed by a frequency-domain analysis [95].
Following this, the high-frequency (HF) band of the HRV relates to parasympathetic activity, while the low-frequency (LF) band provides information about sympathetic activity. Hence, the state of an individual can be characterized by the LF/HF ratio [99]. A finer division of frequency bands is presented in Table 8.
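The LF/HF computation can be sketched as follows, assuming the conventional short-term bands (LF 0.04-0.15 Hz, HF 0.15-0.40 Hz), resampling of the RR series at a hypothetical 4 Hz, and a Hann-windowed periodogram. This is one common approach, not necessarily the one used in the cited works (Welch or autoregressive estimates are frequent alternatives), and the function name is illustrative.

```python
import numpy as np

# Conventional short-term HRV bands in Hz (Table 8 may use a finer division).
LF_BAND = (0.04, 0.15)
HF_BAND = (0.15, 0.40)


def lf_hf_ratio(rr_s, fs_interp=4.0):
    """Estimate the LF/HF ratio from successive RR intervals given in seconds."""
    rr_s = np.asarray(rr_s, dtype=float)
    beat_times = np.cumsum(rr_s)
    # The RR series is sampled once per beat (unevenly): resample it on a uniform grid.
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs_interp)
    rr_uniform = np.interp(grid, beat_times, rr_s)
    rr_uniform = rr_uniform - rr_uniform.mean()  # remove the DC component
    # Hann-windowed periodogram of the resampled tachogram.
    power = np.abs(np.fft.rfft(rr_uniform * np.hanning(rr_uniform.size))) ** 2
    freqs = np.fft.rfftfreq(rr_uniform.size, d=1.0 / fs_interp)

    def band_power(lo, hi):
        return power[(freqs >= lo) & (freqs < hi)].sum()

    return float(band_power(*LF_BAND) / band_power(*HF_BAND))
```

For instance, an RR series modulated at 0.10 Hz yields a ratio above 1, whereas a 0.25 Hz (respiratory-like) modulation yields a ratio below 1.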
Classification techniques based on the HRV, the ECG, the electrooculogram (EOG), face video, or other data types are … [94], [100], [103]-[106], k-Nearest Neighbors (kNN) [106], [107], Support Vector Machines (SVM) [102], [106], [108], and Artificial Neural Networks (ANN) [109], [110] being harnessed for this subject. Reflecting the seriousness of the problem addressed by the AUTOMOTIVE project, automotive manufacturers and other companies have developed solutions targeting the detection of drowsiness and attention during driving tasks. Some commercial solutions (for a complete present-day review, see [13]) include:
• Steer Device by STEER Inc.⁸, a bracelet that detects the level of drowsiness through the heart rate and the electrodermal activity (EDA);
• IR-LED by Siemens⁹, a camera system equipped with an infrared light-emitting diode and an infrared light sensor to detect microsleep events;
• FaceLAB Driver Safety by Seeing Machines [13], which detects driver drowsiness in real time through the analysis of eye blinks and percentage of eye closure (PERCLOS), recording face video to track the head, eyes, eyelids, and gaze;
• Driver Attention Warning by Saab¹⁰, which uses an infrared camera to monitor eye blinks and track gaze and head orientation;
• Driver Alert Control by Volvo¹¹, which uses lane departure as a detection feature, monitoring the car's movements relative to the road markings;
• Active Safety by Volkswagen¹², which uses the Steering Wheel Angle (SWA) and lane departure as detection features and continuously evaluates traffic signals;
• Attention Assist by Mercedes-Benz¹³, which uses the SWA as a detection feature (it learns the driver's steering pattern at the beginning of a driving session), as well as braking and acceleration events, the duration of the session, and road conditions;
• Driver Attention Monitor by Lexus¹⁴, which tracks eye and head movements to detect whether the driver is looking forward, and includes an obstacle detection feature.
After a comprehensive analysis of both current research and commercial solutions for driver drowsiness detection, Doudou et al. [13] discussed the open technological issues in this field regarding three types of information that can be acquired from the driver, as presented in Table 9.

8 Kickstarter: STEER: Wearable Device That Will Not Let You Fall Asleep by Creative Mode. Available on https://www.kickstarter.com/projects/creativemode/steer-you-will-never-fall-asleep-while-driving?lang=fr.
9 Photonics.com: IR-LED Detects Drivers in Microsleep. Available on https://www.photonics.com/Articles/IR-LED_Detects_Drivers_in_Microsleep/a44727.
10 Saab Driver Attention Warning System. Available on https://www.saabnet.com/tsn/press/071102.html.
11 Volvo Support: Driver Alert Control (DAC). Available on https://www.volvocars.com/en-th/support/manuals/v40/2017w17/driver-support/driver-alert-system/driver-alert-control-dac.
12 Volkswagen UK: Driver alert system. Available on https://www.volkswagen.co.uk/technology/car-safety/driver-alert-system.
13 No Doze: Mercedes E-Class alerts drowsy drivers. Available on https://www.autoweek.com/news/a2032716/no-doze-mercedes-e-class-alerts-drowsy-drivers/.
VOLUME 4, 2016
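PERCLOS, used by camera-based systems such as FaceLAB, is commonly defined as the proportion of time within a window during which the eyelids are at least 80% closed (the P80 criterion). The sketch below assumes a hypothetical per-frame eyelid opening already normalized to the subject's fully open eye; it illustrates the measure itself and is not any vendor's implementation.

```python
import numpy as np


def perclos(eye_openness, closed_threshold=0.2):
    """Fraction of frames in a window with the eye at least 80% closed (P80 criterion).

    eye_openness: per-frame eyelid opening normalized so that 1.0 is the subject's
    fully open eye and 0.0 is fully closed (a hypothetical normalization).
    """
    openness = np.asarray(eye_openness, dtype=float)
    closed = openness <= closed_threshold  # 20% or less of full opening counts as closed
    return float(closed.mean())
```

For a one-minute window recorded at 30 frames per second, `eye_openness` would hold 1800 values, and a rising PERCLOS over consecutive windows is read as increasing drowsiness.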
In addition to these limitations, one frequent conclusion in this field of research is that generalizing to different individuals is extremely difficult, as models perform significantly worse on data from new subjects [111]. Simultaneously, large amounts of high-quality data are needed to train and validate drowsiness detection methods, which is still a challenge due to the cost of deploying real-traffic monitoring systems and the concerns regarding safety and data privacy.
The following subsections present the work conducted for the AUTOMOTIVE project on the topic of drowsiness detection. These endeavors aimed to study the aforementioned limitations of the current state-of-the-art and advance the field by proposing new and improved algorithms.

1) Intrusive and non-intrusive signal acquisition study
This subsection reports a study that combines different intrusive and non-intrusive signal acquisition methods to achieve the best possible performance in driver drowsiness detection [112].
The method follows a standard workflow for a supervised machine learning classification problem: 1) extracting features, 2) randomly splitting data into training and testing datasets, 3) choosing the best hyperparameters with ten-fold cross-validation, and 4) evaluating classifiers using accuracy and F1-score.
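Steps 2 to 4 of this workflow can be sketched in plain NumPy, assuming feature extraction (step 1) has already produced a matrix `X` and binary labels `y`. The sketch uses kNN (one of the five classifiers compared) with its `k` tuned by ten-fold cross-validation; the 30% test fraction, the `k` grid, and all function names are hypothetical choices, not the study's settings.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Binary 0/1 labels and odd k, so the vote never ties.
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def cv_accuracy(X, y, k, folds):
    """Mean validation accuracy of kNN over the given folds."""
    scores = []
    for i, val in enumerate(folds):
        trn = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(np.mean(knn_predict(X[trn], y[trn], X[val], k) == y[val]))
    return float(np.mean(scores))


def run_pipeline(X, y, k_grid=(1, 3, 5, 7), test_frac=0.3, n_folds=10, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: random split into training and testing sets.
    idx = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    X_tr, y_tr = X[train], y[train]
    # Step 3: choose k by ten-fold cross-validation on the training set only.
    folds = np.array_split(rng.permutation(len(y_tr)), n_folds)
    best_k = max(k_grid, key=lambda k: cv_accuracy(X_tr, y_tr, k, folds))
    # Step 4: evaluate once on the held-out test set (accuracy and F1-score).
    y_hat = knn_predict(X_tr, y_tr, X[test], best_k)
    return best_k, float(np.mean(y_hat == y[test])), f1(y[test], y_hat)
```

Keeping the test set out of the cross-validation loop is what makes the final accuracy and F1-score an unbiased estimate for the chosen hyperparameter.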
Features from the ECG, the EOG, and video measures such as the eyelid distance, gaze angles, head pose, and pupil diameter, computed over non-overlapping two-minute windows, were used. The available KSS ratings were used to categorize sleepiness. For the comparison study, five machine learning classifiers were explored: Support Vector Machine (SVM), Random Forest (RF), Artificial Neural Networks (ANN), Gradient Boosting Tree (GBT), and k-Nearest Neighbors (kNN).
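Segmenting a recording into non-overlapping two-minute windows and attaching a KSS rating to each window might look like the sketch below. The labeling rule (each window takes the latest rating reported at or before its end, or -1 if none exists yet) and the function names are hypothetical; the text does not state how ratings were aligned with windows.

```python
import numpy as np

WINDOW_S = 120.0  # two-minute windows, as in the text


def segment(signal, fs, window_s=WINDOW_S):
    """Split a 1-D signal into non-overlapping windows, dropping a trailing partial window."""
    signal = np.asarray(signal)
    n = int(round(window_s * fs))
    n_windows = signal.size // n
    return signal[: n_windows * n].reshape(n_windows, n)


def label_windows(n_windows, kss_times_s, kss_values, window_s=WINDOW_S):
    """Give each window the latest KSS rating at or before its end (-1 if none).

    kss_times_s must be sorted in ascending order (seconds from recording start).
    """
    ends = (np.arange(n_windows) + 1) * window_s
    pos = np.searchsorted(np.asarray(kss_times_s), ends, side="right") - 1
    values = np.asarray(kss_values)
    return np.where(pos >= 0, values[np.clip(pos, 0, None)], -1)
```

For example, a 10-minute signal sampled at 10 Hz gives five windows of 1200 samples; with ratings 3, 5, and 7 reported at 0, 300, and 600 s, the windows are labeled 3, 3, 5, 5, 7.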
In summary, the results indicate that a hybrid approach, combining ECG features with EOG or video features, is preferable. Moreover, no classifier appears significantly superior to the others, and the imbalance between the alert and drowsy classes has a substantial impact on classification accuracy.
The main contribution of this work for the AUTOMOTIVE project can be summarized as follows:
• A reference scheme for researchers in driver handover strategies: the AUTOMOTIVE project moves forward with a baseline reference scheme that compares intrusive and non-intrusive signal acquisition methods to detect driver drowsiness;
• Application Impact: as presented in section II, the main approaches used for detecting driver drowsiness in real time can be divided into three categories: vehicle-based measures, behavioral measures, and physiological measures. Each method has its strengths and weaknesses, so it is important to assess combinations of different measures to achieve the best possible performance; nevertheless, real-world drowsiness detection solutions need to use non-intrusive acquisition methods. With this in mind, one approach to drowsiness detection is to combine a non-intrusive camera-based method with a physiological method. However, which combination is best? How much performance can one gain by fusing multiple data sources? This study discussed these questions, whose answers have wide applicability to other topics related to well-being monitoring.

2) Subject-dependent driver fatigue classification
This subsection presents the first in-depth study on the use of ECG and EOG for subject-dependent classification in driver sleepiness/fatigue under realistic driving conditions [113]. The proposed methodology starts with a preprocessing stage, specific to each signal, followed by a feature extraction step. The classification procedure was composed of three different tests: multimodality, subject-independent classification, and imbalanced class distributions. KSS ratings collected every fifth minute during the experiments were used as labels.
ANNs, RF, SVM, and GBT were used and, based on accuracy values and ten-fold cross-validation, the best combination was selected. Next, the proposed methodology is summarized.
Preprocessing and feature extraction: The ECG was preprocessed with a bandpass Butterworth filter with cut-off frequencies of 4 and 50 Hz, and ECG feature extraction consisted of the following steps: a) for R-peak detection, the signal was divided into two-minute windows, and the selected R-peaks were those that yield typical heartbeat values and the lowest RR standard deviation (RRSD); b) 8 time-domain statistical features were calculated from the HRV time series; c) 8 frequency-domain features were calculated from the power spectral density (PSD) of the HRV signal. The EOG was preprocessed with a bandpass Butterworth filter with cut-off frequencies of 0.1 and 30 Hz, followed by convolution with a Hamming window. EOG feature extraction consisted of several procedures to extract blink events and eye saccades.
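The two preprocessing chains can be sketched with SciPy. The cut-off frequencies come from the text; the filter order (4), the zero-phase (forward-backward) filtering, and the Hamming window length (`smooth_len`) are hypothetical choices, since the text does not state them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(x, fs, lo_hz, hi_hz, order=4):
    """Zero-phase Butterworth band-pass (filter order and zero-phase are assumptions)."""
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(x, dtype=float))


def preprocess_ecg(ecg, fs):
    # 4-50 Hz band, as stated in the text; requires fs above 100 Hz.
    return bandpass(ecg, fs, 4.0, 50.0)


def preprocess_eog(eog, fs, smooth_len=51):
    # 0.1-30 Hz band, then smoothing by convolution with a normalized Hamming window.
    # smooth_len is hypothetical: the text does not give the window length.
    filtered = bandpass(eog, fs, 0.1, 30.0)
    window = np.hamming(smooth_len)
    return np.convolve(filtered, window / window.sum(), mode="same")
```

Second-order sections (`output="sos"`) are used because a band edge as low as 0.1 Hz makes the equivalent transfer-function coefficients numerically fragile.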
Classification: Multimodality scenario: the data were randomly split into a training set (70%) and a test set (30%). Subject-dependent scenario: a) training and testing sets were split randomly; b) training set: data from n − 1 subjects and 30% of the data from the n-th subject; testing set: the remaining 70% of the data from the n-th subject; c) training set: data from n − 1 subjects and 10% of the data from the n-th subject; testing set: the remaining 90% of the data from the n-th subject. Subject-independent scenario: training set: data from n − 1 subjects and 0% of the data from the n-th subject; testing set: 100% of the data from the n-th subject. Imbalanced class distributions: the ratio between the number of samples in each class is used to balance misclassification costs and, therefore, to balance the data of each class.
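The subject-dependent and subject-independent splits differ only in how much of the n-th subject's data enters training, so a single helper can express all of them. In this sketch the target subject's samples are assigned at random; that choice, the function name, and the seed are assumptions (a chronological split would be a stricter alternative).

```python
import numpy as np


def subject_split(subject_ids, target, target_train_frac, seed=0):
    """Train on all other subjects plus a fraction of the target subject's samples.

    target_train_frac = 0.3, 0.1 and 0.0 give the 30%/70%, 10%/90% and
    subject-independent (0%/100%) scenarios. Returns sorted index arrays.
    """
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    target_idx = np.flatnonzero(subject_ids == target)
    others_idx = np.flatnonzero(subject_ids != target)
    shuffled = rng.permutation(target_idx)  # random assignment (an assumption)
    n_train_target = int(round(target_train_frac * target_idx.size))
    train = np.concatenate([others_idx, shuffled[:n_train_target]])
    test = shuffled[n_train_target:]
    return np.sort(train), np.sort(test)
```

Looping `target` over every subject turns the subject-independent case into leave-one-subject-out evaluation.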
Based on the obtained results, this work finds that the accuracy improves when a combination of ECG and EOG features is used. Moreover, results show significantly worse performance in subject-independent classification, especially for the sleepy class. In conclusion, applying methods for imbalanced distributions can be a promising approach.
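One common way to balance misclassification costs from the class ratio is to weight each class inversely to its frequency, w_c = n_samples / (n_classes · n_c), the same heuristic behind scikit-learn's `class_weight="balanced"`. The sketch below shows that heuristic; it is not necessarily the exact weighting used in the study.

```python
import numpy as np


def balanced_class_weights(y):
    """Weight each class inversely to its frequency: w_c = n_samples / (n_classes * n_c)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```

With 8 alert and 2 drowsy windows, the weights are 0.625 and 2.5, so each class contributes the same total weight (5.0) to the training loss.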
The main contribution of this work for the AUTOMOTIVE project can be summarized as follows:
• A reference scheme for researchers in driver handover strategies: the AUTOMOTIVE project moves forward with a baseline study on the use of ECG and EOG for subject-dependent classification of driver sleepiness/fatigue under realistic driving conditions. This work drew several conclusions that can guide further developments. For instance, individual differences are present in both the physiological signals and the labels; the reliability of the sleepiness ground truth affects the design and optimization of the classifiers; and combinations of different measures (e.g., lane deviations, mathematical models of sleepiness) can improve the results in real driving conditions;
• Application Impact: new lines of research arose from this work, such as the use of biometrics to develop subject-personalized models for drowsiness detection with lifelong learning.

3) Peripheral cardiac signal acquisition and use in drowsiness classification
This subsection presents a study of the feasibility of a drowsiness detection system based on peripheral cardiac signals [35]. This study encompassed three main stages:
Signal collection and conversion into streams of interbeat intervals (IBIs): To collect cardiac rhythm information, the Movesense chest strap, the CardioWheel capacitive steering wheel, and the PulseOn wrist photoplethysmograph (PPG) sensor were used. The CardioWheel (ECG-based) directly provided the IBIs, and the Pan-Tompkins algorithm was used to detect the R-peaks in the Movesense electrocardiographic signals. For the PPG sensor, an online filter that mimics recursive moving average removal was applied to one-second windows, and an adaptive threshold peak detection algorithm was implemented to locate peaks in the PPG signal.
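The PPG branch can be illustrated with a simplified, offline sketch: subtract a centered one-second moving average (a stand-in for the online recursive filter), then accept local maxima that exceed a fraction of a running peak-amplitude estimate. The parameters `alpha`, `decay`, and `refractory_s` and all function names are hypothetical; this is not the implementation used in the study.

```python
import numpy as np


def remove_baseline(ppg, fs, win_s=1.0):
    """Subtract a centered 1-s moving average (offline stand-in for the online filter)."""
    ppg = np.asarray(ppg, dtype=float)
    n = max(1, int(round(win_s * fs)))
    baseline = np.convolve(ppg, np.ones(n) / n, mode="same")
    return ppg - baseline


def detect_peaks(x, fs, alpha=0.5, decay=0.9, refractory_s=0.3):
    """Local maxima above alpha times a running estimate of the peak amplitude."""
    amp = float(np.max(x[: int(2 * fs)]))  # initial amplitude from the first 2 s
    refractory = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(x) - 1):
        if not (x[i] > x[i - 1] and x[i] >= x[i + 1]):
            continue  # not a local maximum
        if peaks and i - peaks[-1] < refractory:
            continue  # too close to the previously accepted beat
        if x[i] >= alpha * amp:
            peaks.append(i)
            amp = decay * amp + (1 - decay) * x[i]  # adapt the threshold to recent peaks
    return np.asarray(peaks, dtype=int)


def ppg_to_ibis_ms(ppg, fs):
    """Interbeat intervals in milliseconds from a raw PPG trace."""
    peaks = detect_peaks(remove_baseline(ppg, fs), fs)
    return np.diff(peaks) * 1000.0 / fs
```

The refractory period suppresses the dicrotic notch that follows each systolic peak, while the adaptive amplitude estimate lets the threshold track slow changes in pulse strength.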
An IBI corrector system was then created to handle the outliers resulting from moments of poor electrical contact and from artifacts. This system was shown to be capable of reconstructing the sequence of IBIs from signals corrupted with 10% missed detections and an additional 10% false peaks, with a mean absolute deviation of less than 7.5 ms from the true signal. This correction ensures that all collected information can be used to calculate the HRV.
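The text does not detail how the corrector works. The following is a hypothetical, deliberately simple rule-based sketch (not the authors' method) of the two corrections involved: an interval much longer than a running median is split as if beats were missed, and a fragment much shorter than the median, typical of a false peak, is merged with whichever neighbor brings the sum closest to the median. The ratios `short_ratio` and `long_ratio` and the `history` length are illustrative values.

```python
import numpy as np


def correct_ibis(ibis_ms, short_ratio=0.7, long_ratio=1.5, history=9):
    """Hypothetical IBI corrector: merges false-peak fragments, splits missed beats."""
    ibis = [float(v) for v in ibis_ms]
    global_ref = float(np.median(ibis))  # robust fallback reference
    out = []
    i = 0
    while i < len(ibis):
        # Reference interval: running median of recently accepted IBIs.
        ref = float(np.median(out[-history:])) if len(out) >= 3 else global_ref
        x = ibis[i]
        if x < short_ratio * ref:
            # Likely a false peak: merge with the neighbor giving the sum closest to ref.
            cost_prev = abs(out[-1] + x - ref) if out else np.inf
            cost_next = abs(x + ibis[i + 1] - ref) if i + 1 < len(ibis) else np.inf
            if np.isfinite(cost_next) and cost_next <= cost_prev:
                out.append(x + ibis[i + 1])
                i += 2
                continue
            if np.isfinite(cost_prev):
                out[-1] += x
                i += 1
                continue
            # No neighbor available: fall through and keep the value as is.
        elif x > long_ratio * ref:
            # Likely missed beat(s): split into k roughly equal intervals.
            k = max(2, int(round(x / ref)))
            out.extend([x / k] * k)
            i += 1
            continue
        out.append(x)
        i += 1
    return np.array(out)
```

Splitting a merged interval evenly cannot recover the exact beat position, so some residual error remains at corrected beats; a stronger corrector could interpolate using neighboring intervals.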
Development of a model based on IBI values to detect drowsiness: For this stage, only data collected through a chest ECG were used. An initial set of time- and frequency-domain features was used to compare four decision models (SVM, one-class SVM, GBT, and ANN), and it became clear that, regardless of the architecture, the models performed poorly on arbitrary individuals. Personalized models were then investigated, with great improvements for the twelve selected individuals of the Swedish National Road and Transport Research Institute dataset who had trustworthy self-reports and a balanced experience of alert and drowsy states. The SVM model was selected as the best fit for the binary classification of each individual's drowsiness state and, after revising the features and tuning the hyperparameters (resulting in an SVM with a linear kernel and a C parameter of 0.3), a mean performance of 0.63 ± 0.03 MCC (Matthews Correlation Coefficient) was attained.
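For reference, the MCC reported above is computed from the binary confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 to 1 and, unlike accuracy, is not inflated by a dominant class, which suits the imbalanced alert/drowsy data discussed here. A minimal sketch (the function name is illustrative):

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews Correlation Coefficient for binary labels (1 = drowsy, 0 = alert)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero (e.g. a constant predictor).
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A perfect classifier scores 1.0, a perfectly inverted one scores -1.0, and random guessing scores about 0.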
Model comparison with IBIs measured from peripheral signals: Following the previous stages, the capability of the model to detect drowsiness from IBIs measured on a peripheral signal was tested. Per the experiment described in sections II-B2 and II-B3, chest ECG, hands ECG, and wrist PPG were collected and converted into IBIs, and the HRV features were calculated for every 2-minute window of each signal. Although the data of only 2 subjects met the selection criteria used for the Swedish National Road and Transport Research Institute dataset, the model trained with chest ECG still performed fairly well when applied to wrist PPG, with scores ranging from 0.34 to 0.61 MCC.
In summary, this work found that, although the proposed system based on peripheral cardiac signals is feasible, future efforts should be devoted to tackling the limited size of the analyzed population. Furthermore, since it is hypothesized that a limited set of individual models can represent the possible ranges of HRV in the general population, gathering and combining this set in a voting scheme or another ensemble classification framework might be a promising approach to finally developing a generalized, subject-independent system to detect driver drowsiness.
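The suggested voting scheme could take the form of a hard majority vote over personalized binary models, as in the hypothetical sketch below. Each model is assumed to be a callable returning 0/1 predictions per window; resolving ties as drowsy is a deliberately conservative design choice made here for illustration, not something the text specifies.

```python
from typing import Callable, Sequence

import numpy as np

Model = Callable[[np.ndarray], np.ndarray]  # maps features (n, d) to 0/1 labels (n,)


def vote(models: Sequence[Model], X: np.ndarray) -> np.ndarray:
    """Hard majority vote of personalized binary models (ties resolved as drowsy = 1)."""
    preds = np.stack([m(X) for m in models])  # shape (n_models, n_samples)
    return (preds.mean(axis=0) >= 0.5).astype(int)
```

Soft voting on predicted probabilities, or weighting each personalized model by its similarity to the new driver's HRV profile, are natural refinements of the same idea.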
The main contribution of this work for the AUTOMOTIVE project can be summarized as follows:
• A reference scheme for researchers in driver handover strategies: the AUTOMOTIVE project moves forward with a baseline study on the subject-dependent binary classification of driver drowsiness based on the peripheral cardiac signal, which is acquired non-intrusively. This study concurs with the subject-dependent study presented in 2) regarding the reliability of the sleepiness ground truth. Further analysis of the data on which the models consistently performed poorly revealed a class imbalance: not all participants provided enough ratings associated with being sleepy for the model to properly learn the separation boundary between the two classes, indicating that a balanced class distribution is important for the topic at hand;
• Application Impact: this work showed that, on a restricted dataset, driver drowsiness detection systems can potentially use any type of cardiac rhythm sensor to assess the driver's state. Further research on a wider variety of sensors can contribute to the development of more flexible driver monitoring systems and advanced driver-assistance systems (ADAS).

IV. CONCLUSION AND FUTURE WORK
This paper presented the breakthroughs and achievements towards driver drowsiness monitoring throughout the AUTOMOTIVE project. Despite the diversity of the research topics explored, AUTOMOTIVE has kept its single central target: to usher in the new generation of driver drowsiness monitoring systems. AUTOMOTIVE's efforts towards immersive driving simulators with realistic procedural road generation and state-of-the-art labeling strategies brought us the ability to collect more and better data. The extensive research on ECG-based biometric recognition opened the door to user-tuned, continuously learning drowsiness algorithms. The developed emotion and drowsiness methodologies have addressed the main problems currently besieging the research community and delaying the real deployment of reliable commercial applications. Finally, the innovative studies on interpretability lead us toward more transparent and trustworthy monitoring systems.
Despite these contributions, many hurdles remain. Further effort should be devoted to the full integration of these algorithms into a final, robust drowsiness monitoring system. Other relevant challenges include the development of even more realistic simulator prototypes and larger initiatives for the acquisition of naturalistic data. Furthermore, new approaches are needed for model training that rely less on subjective labels, strategies to learn from continuous unlabeled data sources, and models that maintain high performance in hyper-realistic long-term usage scenarios.

ACKNOWLEDGMENTS
The presented research work was financed by the European Regional Development Fund (ERDF) through the Operational Programme for Competitiveness and Internationalization (COMPETE 2020 Programme), and by national funds through the Portuguese funding agency, Fundação para a Ciência e a Tecnologia (FCT), within the "POCI-01-0145-FEDER-030707" project and the "SFRH/BD/137720/2018" Ph.D. grant. The authors also wish to acknowledge all other collaborators in the AUTOMOTIVE project, especially Licínio Oliveira and Cláudia Sofia Silveira, for their valuable contributions.
TELMA ESTEVES is enrolled in the master's program in biomedical engineering at the NOVA School of Science and Technology. During her academic education, she took courses on electrophysiology, signal analysis and processing, medical imaging, medical information systems, and electronics, among others. She is currently developing her M.Sc. thesis in collaboration with CardioID Technologies LDA and the INESC TEC's Visual Computing and Machine Intelligence (VCMI) research group. Her ongoing thesis work addresses the issue of drowsy driving, which led her to connect with the AUTOMOTIVE project. She aims to mitigate this problem by analyzing current detection methods and proposing new solutions to recognize sleepy states from the driver's facial expressions, overcoming limitations such as inter-subject variability. Her main research interests include computer vision, machine learning, and image processing.
… where he is currently pursuing the subsequent M.Sc. studies in Multimedia Software Engineering, combining software engineering, computer vision, and artificial intelligence. In 2019, he became a Researcher for AUTOMOTIVE: AUTOmatic multiMOdal drowsiness detecTIon for smart Vehicles, where he had the opportunity to develop a virtual simulation prototype platform for driver monitoring and driving assistance, in close collaboration with partner CardioID Technologies. The system includes multimodal sensory acquisition for driver state classification and an automated driving module (L1) as a safety-critical measure for lane-path recovery, as part of a collaborative vehicle control model between human and artificial agents. He currently works as a Virtual Testing and Simulation Engineer at Continental, developing solutions based on state-of-the-art simulation environments for the efficient development of Advanced Driving Assistance Systems and Autonomous Driving.
LOURENÇO ABRUNHOSA RODRIGUES is a machine learning engineer at CardioID Technologies Lda. He completed his MSc Degree in Biomedical Engineering at Instituto Superior Técnico (IST), Universidade de Lisboa, Portugal, in 2021. During his academic training, he devoted himself to signal processing, machine learning, and programming. Since joining CardioID, his work has focused on the development of algorithms for heart rate variability, namely from wearables and off-the-person devices, as well as drowsiness detection in driving settings.

INÊS ANTUNES received the M.Sc. degree in Bioengineering from the University of Porto in 2021. Her academic education focused on a variety of themes and areas related to Bioengineering. During the Biomedical Engineering specialty, she studied and developed subjects such as signal and image processing, machine and deep learning, bionics, sensors, and software engineering in greater depth. Since 2018, Inês has been part of the National Biomedical Engineering Students Association (ANEEB), serving as a board member and department director since 2020. In the same year, she did a research internship at the Biomechanical Department of the University of Twente, working with raw EMG data and developing controllers for exoskeletons to help paraplegic patients. Furthermore, she developed her master's thesis at INESC TEC, addressing the field of emotion recognition using the electrocardiogram and deep learning architectures.
GABRIEL LOPES received the M.Sc. degree in bioengineering from the University of Porto in 2019. His academic education focused on a wide array of subjects, such as signal and image processing, machine learning, computer-aided diagnosis, sensors, electronics, and software engineering. He has addressed the topic of long-term performance in ECG biometrics, including template update techniques with realistic data. He is currently a consultant at Deloitte, where he belongs to the Center of Excellence (CoE) of Low Code/No Code.
ARNALDO J. ABRANTES received the Licenciatura in Electronic Engineering and Telecommunications from the University of Aveiro in 1984, and the M.Sc. and Ph.D. degrees, both in Electrotechnical and Computer Engineering, from the Technical University of Lisbon, in 1992 and 1998, respectively. He is currently a Coordinating Professor at Instituto Superior de Engenharia de Lisboa (ISEL), where he has been teaching (since 1985) courses in the areas of signal and image processing, machine learning, system dynamics, agent-based modeling, and intelligent virtual environments. He has co-authored about sixty articles in the area of signal and image processing, participated in R&D projects, and co-supervised two doctoral theses in the areas of video surveillance and multimedia applications.
PEDRO M. JORGE received the Licenciatura (five years), M.Sc., and Ph.D. degrees in electrical and computer engineering from the Technical University of Lisbon in 1992, 1998, and 2007, respectively. He is currently a Coordinating Professor at the Instituto Superior de Engenharia de Lisboa (ISEL) of the Politécnico de Lisboa (IPL), teaching courses in computer vision, machine learning, and mixed reality. He has supervised several B.Sc. final projects and M.Sc. dissertations at ISEL/IPL and other institutions.
He is the author or co-author of one European patent, one book chapter, journal papers, and several communications at conferences and other scientific meetings, both in Portugal and abroad. He has participated in several R&D projects, both with industry and funded by the National Science Foundation. He is currently the coordinator of the Computer Science and Multimedia Engineering B.Sc. Degree and of the Multimedia and Machine Learning R&D group at ISEL.

PEDRO GAMITO
ANDRÉ LOURENÇO received the M.Sc. and Ph.D. degrees in electrical and computer engineering from the Instituto Superior Técnico (IST), Universidade de Lisboa, Portugal. He is the Co-Founder and the CEO of CardioID Technologies LDA. Since 2005, he has been a Professor at the Electronics, Telecommunications and Computers Department, ISEL, where he lectures on information and signal processing and programming and has supervised over 40 final-year projects and 13 master's theses. Apart from his technical qualities, he has key business and entrepreneurial experience, having participated in several startups (Albatroz Engineering, Lusospace, and Minalytics) and R&D projects (including H2020). His scientific contributions include 12 book chapters, 11 peer-reviewed papers, 77 international conference papers, and a family of patents "Device and Method for Continuous Biometric Recognition Based on Electrocardiographic Signals".
ANA F. SEQUEIRA collaborates as Assistant Researcher at INESC TEC in the field of computer vision and machine learning. Ana has a Ph.D. in Electrical and Computers Engineering and a Licenciatura Degree (5-year) and a Master in Mathematics. Ana's research comprises liveness detection techniques (for iris, face, and fingerprint); biometrics for border control; as well as facial analysis topics, such as emotion recognition, image compliance with standardization requirements, among others; and more recently interpretability/explainability of ML for biometrics. In the past, Ana worked at the University of Reading, UK, collaborating in EU projects related to the application of biometric recognition in Border Control (FASTPASS and PROTECT projects). Moreover, Ana had a short-term collaboration with the company Iris Guard UK researching the vulnerabilities and developing a proof-of-concept of a robust iris anti-spoofing measure for their EyePay® Technology. Ana led the construction of several biometric databases, managed biometric competitions, and has co-authored several research publications recognized by her peers with over 300 citations.