1 Introduction

Mobility in urban areas can be a challenging and emotionally stressful task for visually impaired people (VIP), especially when navigating in unfamiliar environments. Despite an increasing number of assistive technologies that help individuals with sight loss to augment their spatial awareness and wayfinding abilities when on the move, very few systems provide a degree of independence beyond known environments that would allow VIP to achieve meaningful mobility and integrate into everyday active life [14, 17]. Placing the visually impaired at the center of attention and exploiting recent developments in physiological computing and wearable wireless sensor devices, an extensive study was designed to better understand how people with sight loss perceive and interact with the urban space, as manifested in their management of cognitive load and stress.

Orientation and mobility (O&M) in humans heavily relies on sight, which provides instantaneous, effortless access to anticipatory (e.g., stairs, turns, signs) and proactive (e.g., moving people, poles) information at various distances simultaneously [20]. Visually impaired pedestrians learn to obtain critical environmental information primarily through touch (sensing the ground surface with a white cane) and hearing (identifying and localising events and landmarks through sound). Mobility challenges can be summarized in four main problems: avoiding objects or obstacles (e.g., pedestrians, tree branches, improperly parked cars); detecting ground level changes (e.g., stairs, pavement edge or incline); negotiating street crossings (e.g., lack of curbs, traffic lights or sound signalling); and adapting to light variation (e.g., abrupt changes between different environments) [13, 24]. Although these problems generally diminish with increased experience of an environment, they still make travelling in unfamiliar settings particularly challenging, often preventing VIP from going outdoors altogether.

Despite a significant amount of research on understanding the perceptual and neurocognitive mechanisms by which people with sight loss access and process wayfinding information [8], there is still little practical knowledge of how the management of mental load and stress relates to the wayfinding process itself. This is a critical aspect of designing mobility technologies that has only recently been considered essential in developing an understanding of how environmental factors affect the cognitive-emotional states of the VIP [27]. Two studies in the early 1970s suggested that some form of psychological rather than physical stress is responsible for increased heart rate in visually impaired versus sighted pedestrians [21, 28]. More recently, examination of electrodermal activity [18] and electroencephalography [19] signals recorded from VIP during outdoor travel has shown that they experience psychological stress when walking on busy shopping streets, passing through large open areas, and crossing junctions.

Electrodermal activity (EDA) and heart rate (HR) are well-known indicators of physiological arousal and stress activation in affective computing and human-computer interfaces [5, 25]. EDA is more sensitive to emotion-related variations in arousal than to physical stressors, which are better reflected in the HR signal. Measurements of blood volume pulse (BVP), originally used to monitor HR, can also reflect transient processes in arousal and cognition [22]. Electroencephalography (EEG), on the other hand, can provide neurophysiological markers of cognitive-emotional processes induced by stress and indicated by changes in rhythmic patterns of brain activity [15, 16].

Taking advantage of the inherent and complementary properties of the EEG, EDA and BVP signals, this paper presents a multimodal approach to automatic inference of environmental conditions affecting VIP when navigating outdoors using a random forest classifier and features extracted from the three signals. The goal of the study was to discover biomarkers that can be used to detect shifts in emotional stress and cognitive load between different urban environments and situations. Aligning this information with GPS coordinates, we further studied the relationship of specific biomarkers with the environmental/situational factors that evoked them.

2 Design and Materials

A route was charted in the city centre of Reykjavik in Iceland (see Fig. 1) with the assistance of caretakers and O&M instructors to take the VIP through situations where different levels of stress were likely to occur. Accordingly, the route comprised eight distinct urban environments representative of a variety of mobility challenges, which can be grouped into three higher-level categories (see Table 1). The route was approximately 1 km long and took on average 13 min 44 s to walk (range = 9–19 min).

Fig. 1. A map of the charted route in the city centre of Reykjavik in Iceland using the OpenStreetMap (OSM) collaborative project (https://www.openstreetmap.org/). Letters depict the different urban environments reported in Table 1; black bars indicate where they start/end; the red-black dot shows the starting point of the walk. (Color figure online)

Table 1. Descriptions and mobility challenges of the different urban environments along the charted route.

Eight VIP with different degrees of sight loss participated in the study (5 female; average age = 39 yrs, range = 22–51 yrs; relevant demographic characteristics are reported in Table 2). To help make them feel comfortable and safe, they were encouraged to walk as usual using their white canes and were accompanied by their familiar O&M instructor. Participants reported having no general health issues. They were instructed to avoid smoking normal or e-cigarettes and consuming caffeine or sugar (e.g., coffee, coke, chocolate) approximately 1 h prior to the walk. Recruitment was based on volunteering and all VIP were capable of giving free and informed consent. The study was approved by the National Bioethics Committee of Iceland. All data was anonymized before analysis.

EEG was recorded using the Emotiv EPOC+, a mobile headset with 16 passive electrodes registering over the 10–20 system locations AF3, F7, F3, FC5, T7, P3 (CMS), P7, O1, O2, P8, P4 (DRL), T8, FC6, F4, F8, and AF4 (sampling rate \(f_s = 128\) Hz). Given the practical constraints involved in an outdoor mobility study, EPOC+ was chosen because it provides a good compromise between performance (i.e., number of channels and scientific validity of the acquired EEG signals) and usability (i.e., outdoor portability, preparation time and user comfort) with respect to other commercial wireless EEG systems [1, 9–11].

Along with the Emotiv headset, participants were asked to wear the Empatica E4 wristband [12]. E4 measures the EDA signal through 2 ventral (inner) wrist electrodes (\(f_s = 4\) Hz) and the BVP through a dorsal (outer) wrist photoplethysmography (PPG) sensor (\(f_s = 64\) Hz). The wristband also includes an infrared thermopile sensor and a 3-axis accelerometer. E4 is currently the only commercial multi-sensor device developed based on extended scientific research in the areas of psychophysiology and physiological computing. Additionally, it has a cable-free, watch-like design, which makes it easier and more aesthetically pleasant to wear, and thus better fitted to use in outdoor measurements as compared to other wearable devices. Participants were asked to wear the wristband on the non-dominant hand to minimize motion artifacts related to handling the white cane [5].

Table 2. Demographic characteristics of participants and their everyday mobility patterns.

Participants walked the charted route twice for training purposes. Directions were only provided during the first walk to help the VIP familiarize themselves with the route. They were instructed to avoid unnecessary head movements and hand gestures as well as talking to their O&M instructor unless there was an emergency. Video and audio were recorded by means of a smartphone camera to facilitate data annotation (observing behaviours across the different urban environments) and synchronization (start/end of walk, urban environments and obstacles). In addition, GPS coordinates were logged via a Garmin GPSMAP-64s unit at a rate of 1 registration per second. At the end of the second walk, participants were asked to describe stressful moments along the route.

3 Data Analysis and Experiments

The goal of the data analysis was to explore features and markers from the collected brain and body signals which can be used to detect cognitive load and stress in humans during outdoor physical activity. While the relationship between unimodal physiological signals and psychological arousal has been studied extensively, the detection of stress from fusing multimodal biosignal streams has not been comparatively investigated. Specifically, the analysis focused on EEG (all 14 channels), EDA, and BVP data.

3.1 Signal Processing and Feature Extraction

The Emotiv EPOC+ system involves a number of internal signal conditioning steps. Analogue signals are first high-pass filtered with a 0.16 Hz cut-off, pre-amplified, low-pass filtered with an 83 Hz cut-off, and sampled at 2048 Hz. Digital signals are then notch-filtered at 50/60 Hz and down-sampled to 128 Hz prior to transmission. In this study, the EEG data obtained from the headset was time-domain interpolated using the Fast Fourier Transform (FFT) to account for missing samples due to connectivity issues. Interpolated signals were then normalized to decrease inter-individual variance. For each of the 14 channels, the power spectral intensity (PSI) [23] in each of the \(\delta \) (0.5–4 Hz), \(\theta \) (4–7 Hz), \(\alpha \) (7–12 Hz), and \(\beta \) (12–30 Hz) bands was computed using the PyEEG open source Python module [2]. The PSI of the kth band is defined as

$$ \text {PSI}_k = \displaystyle \sum _{i = \lfloor N(f_k / f_s) \rfloor }^{\lfloor N(f_{k+1} / f_s) \rfloor }{|X_i|},\; k = 1, 2, \ldots , K-1 $$

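where \(f_s\) is the sampling rate, N is the time series length, \([X_1, X_2, \ldots , X_N]\) is the FFT of the series, \(f_k\) and \(f_{k+1}\) are the lower and upper boundaries of the kth band, and K is the total number of band boundaries (giving \(K-1\) bands). In total, 56 EEG features were computed (14 channels × 4 bands).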
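The PSI definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the PyEEG code used in the study: the function name `power_spectral_intensity` and the constant `BAND_EDGES` are hypothetical, and the upper index of each band is treated as exclusive (an assumption) so that adjacent bands do not count the same FFT bin twice.

```python
import numpy as np

# Band boundaries in Hz taken from the text: delta, theta, alpha, beta.
BAND_EDGES = [0.5, 4, 7, 12, 30]

def power_spectral_intensity(x, band_edges, fs):
    """Sum FFT magnitudes |X_i| between consecutive band boundaries."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.abs(np.fft.fft(x))
    psi = np.zeros(len(band_edges) - 1)
    for k in range(len(band_edges) - 1):
        lo = int(np.floor(band_edges[k] / fs * n))
        hi = int(np.floor(band_edges[k + 1] / fs * n))
        # Assumption: upper bound exclusive, to avoid double counting bins.
        psi[k] = spectrum[lo:hi].sum()
    return psi
```

Applied to each of the 14 channels with `fs=128`, this yields four values per channel, which matches the 56 EEG features reported above.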

A measurement of skin conductance (SC) is characterized by two types of behaviour: short-lasting phasic responses (SCRs; can be thought of as rapidly changing peaks) and a long-term tonic level (SCL; can be thought of as the underlying slow-changing level in the absence of phasic activity). Another characteristic is the superposition of subsequent SCRs (i.e., one SCR emerges on top of the preceding one), typically observed in states of high arousal [5]. Skin conductance data obtained from the E4 was first low-pass filtered (1st order Butterworth, \(f_c = 0.6\) Hz) to remove steep peaks stemming from artifacts and subsequently min-max normalized to reduce inter-individual variance [7]. Conditioned SC signals were then decomposed into continuous components of phasic and tonic EDA using a deconvolution-based method implemented in Ledalab, a Matlab based toolbox [4]. Six features were extracted: number of SCRs (hereinafter SCRs), sum of their amplitudes (AS), average phasic EDA (PA), maximum phasic EDA (PM), time-integrated phasic EDA (ISCR), and mean tonic EDA (TonicMean).
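The conditioning step that precedes the decomposition (low-pass filtering followed by min-max normalization) might look like the sketch below, using SciPy. The function name `condition_skin_conductance` is hypothetical, and the use of zero-phase filtering via `filtfilt` is an assumption, since the text does not say whether the filter was applied causally. The Ledalab deconvolution itself is not reproduced here.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def condition_skin_conductance(sc, fs=4.0, cutoff=0.6):
    """Low-pass filter (1st-order Butterworth) and min-max normalize an SC trace."""
    b, a = butter(1, cutoff, btype="low", fs=fs)
    # Assumption: forward-backward filtering to avoid phase lag.
    smoothed = filtfilt(b, a, np.asarray(sc, dtype=float))
    span = smoothed.max() - smoothed.min()
    if span == 0:
        # A flat trace carries no information; return zeros instead of dividing by zero.
        return np.zeros_like(smoothed)
    return (smoothed - smoothed.min()) / span
```

The defaults follow the values given in the text (\(f_s = 4\) Hz, \(f_c = 0.6\) Hz); the output lies in [0, 1] for each participant.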

The BVP signal recorded by the E4 PPG sensor is preprocessed on board using a proprietary motion artifact removal technique [12]. No further conditioning was implemented and the reported data (i.e., BVP amplitude) was used directly as a feature of cardiovascular activity.

3.2 Classification Design

In order to automatically identify the affective meaning of an urban space based on biosignals recorded from VIP walking through it, we formulated the study as a supervised classification problem. A widely used ensemble learning method for classification was employed, namely the Random Forest (RF) classifier [6], selected for its ability to deal with possibly correlated predictor variables and because it provides a straightforward assessment of variable importance. For each of the distinct environments described in Table 1, each time point of the corresponding biosignal data was annotated based on a binary schema per second, where “1” signalled the presence of the participant in the given environment at the given time point and “0” otherwise.
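The per-second binary annotation can be illustrated as follows. This is only a sketch: the helper `one_vs_rest_labels` and the letter codes are hypothetical stand-ins for the environment annotations derived from the video recordings.

```python
import numpy as np

def one_vs_rest_labels(env_per_second, environments):
    """Return, per environment, a 0/1 vector marking the seconds spent in it."""
    env = np.asarray(env_per_second)
    return {e: (env == e).astype(int) for e in environments}

# Example: five seconds of a walk passing through environments A, B and C.
labels = one_vs_rest_labels(list("AABBC"), ["A", "B", "C"])
```

Each vector serves as the target of one single-class model, while the original letter sequence can be used directly as the target in the multi-class scenario.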

A series of experiments was designed to assess and compare the predictive power of each modality (EEG, EDA or BVP) as well as of their fusion on a feature-level basis, in both single-class and multi-class scenarios (see Table 3). The two most important parameters of RF were tuned by means of grid search with 5-fold cross-validation. We explored the effect of the number of estimators [150, 300, 600] as well as the effect of the maximum number of features \([ .5, 1, 2] *\sqrt{\text {NumberOfFeatures}}\). Overall, the optimum number of estimators was 300 and the maximum number of features was set equal to the total number of features for each experiment.
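A grid search of this kind could be set up with scikit-learn roughly as shown below for a single-class (binary) experiment. The function name `tune_random_forest` is hypothetical, and the use of stratified, shuffled folds and of the plain `roc_auc` scorer are assumptions; the text only specifies the parameter grid and 5-fold cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def tune_random_forest(X, y, n_estimators=(150, 300, 600),
                       factors=(0.5, 1, 2), seed=0):
    """Grid-search the number of trees and max_features for a binary target."""
    n_features = X.shape[1]
    # Candidate max_features values: factor * sqrt(number of features),
    # rounded and clipped to the valid range [1, n_features].
    max_features = sorted({
        int(min(n_features, max(1, round(f * np.sqrt(n_features)))))
        for f in factors
    })
    grid = {"n_estimators": list(n_estimators), "max_features": max_features}
    # Assumption: stratified folds keep both labels present in every fold.
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(RandomForestClassifier(random_state=seed),
                          grid, cv=folds, scoring="roc_auc")
    search.fit(X, y)
    return search
```

After fitting, `search.best_params_` holds the selected combination and `search.best_score_` the mean cross-validated AUROC.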

Table 3. Definitions of the classification models assessed for the prediction of each environment independently (single-class scenario) or all environments at the same time (multi-class scenario).

For each experiment we estimated the relative rank (i.e. depth) of each feature, as derived from the “Gini” impurity function, in order to assess the relative importance of that feature to the predictability of the target variable [6]. We trained one model for each of the single-class cases and one for the multi-class experiment following a 5-fold cross-validation scheme, where 80 % of the data points were used for training and 20 % for testing, with data shuffling in order to avoid dependencies between consecutive data points. The best model was chosen as the one that maximised the weighted area under the receiver operating characteristic curve (AUROC), taking into account the lack of balance between the labels.
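For the multi-class scenario, the evaluation loop described above might be sketched as follows with scikit-learn. The function name `cross_validated_auroc` is hypothetical, stratified folds are an assumption, and `feature_importances_` reports the mean decrease in Gini impurity, which is used here as a stand-in for the rank-based importance mentioned in the text. The sketch expects three or more classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def cross_validated_auroc(X, y, n_estimators=300, seed=0):
    """Return mean and std of weighted one-vs-rest AUROC plus mean importances."""
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores, importances = [], []
    for train_idx, test_idx in folds.split(X, y):
        model = RandomForestClassifier(n_estimators=n_estimators,
                                       criterion="gini", random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])
        # Weighted averaging accounts for the imbalance between classes.
        scores.append(roc_auc_score(y[test_idx], proba,
                                    multi_class="ovr", average="weighted"))
        importances.append(model.feature_importances_)
    return (float(np.mean(scores)), float(np.std(scores)),
            np.mean(importances, axis=0))
```

Sorting the returned importance vector in descending order gives a feature ranking comparable in spirit to the one reported for the experiments.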

3.3 Results

Table 4 summarises the AUROC weighted metric for all the experiments. Both modalities (Exps. I–III) are predictive of the distinct environments; however, the fusion of the two modalities gave particularly high results, not only in the one-versus-all scenario (Exp. IV) but also in the multi-class classification (Exp. V). Figure 2a depicts the weighted ROC curves of the latter in a one-against-all binary scenario, assessing the qualitative performance of each class. Interestingly, the model performs equally well for all classes, which indicates its stability.

Table 4. Classification AUROC weighted metric for all the environments across the various experiments. Exp. IV, with feature-level fusion, outperforms all other models in almost all environments, closely followed by Exp. II. The reported numbers refer to the mean AUROC over all folds in percent, with the standard deviation in parentheses.

Figure 2b depicts the ten most predictive features of Exp. V. The feature importances were estimated also for all experiments and the most predictive ones appear always with the highest ranks. Interestingly, we note that the features related to skin conductance are the most predictive, with spectral power of the \(\beta \) brainwaves further dominating predictions. Although real-time EEG acquisition may be subject to very noisy signals, this finding is in line with the neuroscientific literature. A recent study on cognition and cortical activity after mental stress demonstrated that low amplitude beta waves with multiple and varying frequencies are often associated with active, busy, or anxious thinking and active concentration [3]. Another study confirmed that in subjects with high stress both baseline EEG (low frequency wave) and EEG during a stressful task (high frequency wave) were beta waves [16]. Theta waves were also observed during the stressful task and attributed to frustration and disappointment. This finding is in line with the fourth most important feature in the multi-class classification, which is a \(\theta \) wave.

3.4 Visualising Biomarker Density Distributions

To better understand the properties of the most predictive features that emerged from the classification experiments as well as the intensity of the cognitive-emotional responses they express, we assigned feature values to pairs of latitude and longitude coordinates based on recorded timestamps and assessed their geographical distributions by means of weighted kernel density estimation.

The recorded GPS traces were subject to noise due to our request for high sampling rate (1 Hz), therefore each trace was corrected by its Euclidean projection onto a reference route. The high sampling rate allowed us to immediately observe increased concentrations of GPS points when the VIP had to cross a main road (environment F, see Table 1), pass along parked cars in a narrow alley after the urban park (C), walk up and down stairs (E), or pass through a narrow area between construction works (H). In fact, these are the same situations reported as stressful by the participants themselves at the end of the study. Geographic information methods offer great promise in objectively measuring and studying the relationship of biomarkers to human behaviour in terms of physical and transport-related activity.
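The correction of noisy GPS points by projection onto a reference route can be sketched as a nearest-point search over the segments of a polyline. The function name `project_onto_route` is hypothetical, and the sketch treats longitude and latitude as planar coordinates, which is an assumption that seems reasonable over a route of about 1 km. It also assumes that consecutive route vertices are distinct.

```python
import numpy as np

def project_onto_route(points, route):
    """Snap each point to the nearest location on a polyline (planar approximation)."""
    points = np.asarray(points, dtype=float)
    route = np.asarray(route, dtype=float)
    starts, ends = route[:-1], route[1:]
    seg = ends - starts                            # segment vectors, shape (S, 2)
    seg_len2 = np.einsum("ij,ij->i", seg, seg)     # squared segment lengths
    snapped = np.empty_like(points)
    for i, p in enumerate(points):
        # Position of the orthogonal projection along each segment, clipped to [0, 1].
        t = np.clip(np.einsum("ij,ij->i", p - starts, seg) / seg_len2, 0.0, 1.0)
        candidates = starts + t[:, None] * seg
        d2 = np.einsum("ij,ij->i", candidates - p, candidates - p)
        snapped[i] = candidates[np.argmin(d2)]
    return snapped
```

Because every sample is kept and only moved onto the route, slow passages still show up as dense clusters of points after the correction.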

Let \( \{ \mathbf {x}_1, \mathbf {x}_2, \ldots , \mathbf {x}_n \} \) be an independent random sample drawn from some distribution with density function \(f(\mathbf {x})\) defined on \(\mathbb {R}^d\). The (multivariate) weighted kernel density estimate of f is defined in [26] as:

$$ \hat{f}_H(\mathbf {x}) = \frac{1}{n} \displaystyle \sum _{i=1}^{n}{ w(\mathbf {x}_i,\mathbf {w})\, K_H (\mathbf {x}-\mathbf {x}_i)} $$

where K is a kernel function, \(H > 0\) is a symmetric, positive definite \( d \times d \) matrix which controls the bandwidth (or smoothing) of the estimate, \(K_H(\mathbf {x}) = |H|^{-1/2} K(H^{-1/2}\mathbf {x})\) is the correspondingly scaled kernel, and w is a function weighting each data point in the sample with a value from \(\mathbf {w} \in \mathbb {R}^m,\, m \le d \). A popular choice for K is the Gaussian (or normal) kernel, which was also applied here.

The three most predictive features were mean tonic EDA (TonicMean), number of SCRs (SCRs) and the sum of their amplitudes (AS). For each of them, the values were used as weights (w with \(m = 1\)) for the GPS coordinates (\(\mathbf {x}\) with \(d = 2\)), with a bandwidth of \(H(\mathbf {x}) = 0.0008\), to estimate the feature-weighted density of GPS points on a \(500 \times 500\) grid and, based on this, to generate a contour plot for each participant. Figure 3 shows the resulting contours aggregated for all participants and plotted on top of an OSM map (the darker the colour, the higher the density of the distribution). Locations of increased stress-elicited arousal along the different urban settings of the route are clearly illustrated.
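The weighted estimate described above can be sketched with an isotropic Gaussian kernel evaluated on a regular grid. The function name `weighted_kde_grid` is hypothetical, and interpreting the value 0.0008 as the kernel standard deviation in degrees (i.e., \(H = h^2 I\) with \(h = 0.0008\)) is an assumption, since the text does not state whether it refers to \(h\) or \(h^2\).

```python
import numpy as np

def weighted_kde_grid(coords, weights, bandwidth=0.0008, grid_size=500):
    """Evaluate (1/n) * sum_i w_i * K_H(x - x_i) on a grid spanning the data."""
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    xs = np.linspace(coords[:, 0].min(), coords[:, 0].max(), grid_size)
    ys = np.linspace(coords[:, 1].min(), coords[:, 1].max(), grid_size)
    gx, gy = np.meshgrid(xs, ys)
    density = np.zeros_like(gx)
    # Normalizing constant of a 2-D Gaussian with covariance bandwidth^2 * I.
    norm = 1.0 / (2.0 * np.pi * bandwidth ** 2)
    for (x, y), w in zip(coords, weights):
        d2 = (gx - x) ** 2 + (gy - y) ** 2
        density += w * norm * np.exp(-0.5 * d2 / bandwidth ** 2)
    return xs, ys, density / len(coords)
```

The returned grid can be passed to a contour plotting routine; with feature values as weights, locations where high values cluster appear as denser contours.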

Fig. 2. (a) One against all ROC curves for each one of the classes in Exp. V. The overall AUROC weighted metric for the multi-class classification of environments is 93(0.5) and, importantly, the trained model seems able to learn equally well all the different environments. (b) Feature importances in Exp. V. Mean tonic EDA (TonicMean), number of SCRs (SCRs) and the sum of their amplitudes (AS) emerged as indicative features also in Exps. II–IV.

Fig. 3. Contour plot of the kernel density distribution along the charted route in the city centre of Reykjavik in Iceland. GPS coordinates were weighted according to the three most predictive features: mean tonic EDA (TonicMean), number of SCRs (SCRs) and the sum of their amplitudes (AS). The darker the colour is, the higher the density of the distribution is. The lower right figure describes the types of obstacles and situations that evoked increased stress.

4 Conclusions

This study presents a framework for assessing the emotional experience of people with sight loss while navigating unfamiliar outdoor environments, based on ambulatory monitoring and fusion of multimodal biosignal data. Different urban scenarios were compared, aiming to address the robustness of the model as well as emerging differences in the perception and interaction of the VIP with their surroundings. The high prediction rate (93 % AUROC weighted) is highly encouraging for this approach and, interestingly, the stressful “hotspots” (Fig. 3) indicated by the most predictive features of stress and cognitive load coincide with the stressful situations reported by the participants themselves.

Among the limitations of the study are the recording precision of the mobile EEG headset and the limited number of participants, which does not allow for an in-depth analysis of specific stressors in each category of sight loss. Moreover, even though the city of Reykjavik does not present the complexity of big metropolitan areas, the charted route was designed to combine some of the busiest streets and most challenging settings reported by the VIP.

Future steps of this research include refining the predictive model, extending the categories according to Table 1, and expanding to indoor navigation scenarios. Such findings hopefully pave the way to mobile technologies that take the concept of navigation one step further, accounting not only for the shortest path in an urban route but also for the least stressful and safest one.