Machine learning approaches to understand the influence of urban environments on human's physiological response

This research proposes a framework for signal processing and information fusion of spatial-temporal multi-sensor data for understanding patterns of human physiological changes in an urban environment. The framework includes signal frequency unification, signal pairing, signal filtering, signal quantification, and data labeling. Furthermore, this paper contributes to human-environment interaction research by proposing a field study to understand the influence of environmental features such as varying sound level, illuminance, field-of-view, or environmental conditions on humans' perception. In the study, participants of various demographic backgrounds walked through an urban environment in Zurich, Switzerland, while wearing physiological and environmental sensors. Apart from signal processing, four machine learning techniques (classification, fuzzy rule-based inference, feature selection, and clustering) were applied to discover relevant patterns and relationships between the participants' physiological responses and environmental conditions. The predictive models with high accuracies indicate that a change in the field-of-view corresponds to increased participant arousal. Among all features, the participants' physiological responses were primarily affected by changes in environmental conditions and field-of-view.


Introduction
Understanding the influence of environmental conditions on human perception is complex. Various environmental features, e.g., sound level, temperature, and illuminance, affect our senses. Therefore, we adopted enhanced measurement and analysis techniques to define and measure what influences citizens in dynamic urban environments. The environmental features measured in this research include sound level, dust, temperature, humidity, illuminance, and the field-of-view, since they influence a person's perception. The features were recorded through devices and sensors at varying frequencies and had both temporal and spatial properties: temporal because they were recorded continuously, and spatial because each recording was associated with a changing location given by the global positioning system (GPS). Hence, in this research, we proposed a framework that performs signal preprocessing, signal filtering, signal quantification, data fusion, and data labeling to answer the defined research questions.
Machine learning techniques have been successfully applied to knowledge mining and pattern recognition in various real-world situations [32,39] since they are useful for identifying the underlying patterns within data [1,25]. Thus, we formulated the processed data such that four state-of-the-art machine learning techniques, classification, fuzzy rule-based inference, feature selection, and clustering, could be applied to discover patterns in the participants' physiological responses related to the urban environmental conditions. The first step in this research was to assess the predictability of participants' perception (physiological responses) of the urban environment. Thus, a ten-fold cross-validation was performed on a reduced-error pruning tree (REP-Tree) classification model [29]. Following the classification approach, a fuzzy rule-based inferential model was built using the fuzzy unordered rule induction algorithm (FURIA) [17] to investigate the relationship between the urban environmental features and the physiological response measures. Subsequently, the importance of the urban environmental features was analyzed by applying a backward feature elimination filter (BFE) [22]. Furthermore, a self-organizing map (SOM) [18] was applied to visualize the impact of urban environmental features on participants' physiological responses. In the final step, a method for referencing GPS locations (geo-locations) to compute the mean physiological response across all participants was developed. Since various methods were involved in data processing, additional graphics and multimedia can be found on the project website [12].
In summary, this research makes three essential contributions: (a) a field study design for understanding human perception of the urban environment; (b) a framework comprising signal processing, signal quantification, and data fusion methods that introduces a novel approach to physiological data quantification; and (c) a comprehensive analysis using four machine learning methods to discover patterns that are crucial to our understanding of human perception in urban settings.
We organized this paper into seven sections. Section 2 places this research in the context of the literature and describes the experimental procedure. Section 3 describes signal preprocessing, multi-sensor information fusion, and machine learning techniques in detail. Section 4 explains the obtained results, followed by a comprehensive discussion in Section 5. The challenges and opportunities of the research are presented in Section 6, and Section 7 concludes the findings of this research.
2 Human perception of the urban environment

Literature review
Measuring physiological data as an indicator of human perception is complex, particularly in real-world applications, since perception can be influenced by various factors [2]. However, physiological pattern recognition can provide significant evidence about human perception [27]. Similar to our research, Picard et al. [27] focused on physiological sensor data, specifically skin conductance, and related high and low arousal to positive and negative biological reactions. Picard et al. [27] also focused on collecting and filtering physiological data to construct good-quality data free of failures and corrupt signals. They formulated the physiological data so that a k-nearest-neighbor classifier could predict humans' physiological arousal-based perception. Krause et al. [19,20], on the other hand, used wearable device data, including physiological sensor data (galvanic skin response), to identify a user's state in terms of physiological and activity context using SOM-based clustering. Specifically, they performed unsupervised learning to classify sensor data and determine the context in which the signals were generated.
Wang et al. [38] performed pattern recognition and classification of physiological sensor signals by first decomposing the signals into their constituent features and then applying a support vector machine to classify negative and positive emotion labels. Here, the labels associated with the signals were predefined during the experiment by exposing the participants to negative and positive environments while the signals were recorded. Rani et al. [31] performed an empirical study of four machine learning techniques, k-nearest neighbor, regression tree, Bayesian network, and support vector machine, for recognizing the emotional state from physiological response data. They performed signal processing to extract features from the physiological data and labeled them with the emotional state reported by the participants.
Since we investigate "cause and effect" between environmental conditions and human perception, unlike Wang et al. [38] and Rani et al. [31], we performed signal processing on the physiological data to evaluate skin conductance response (SCR) arousals [40]. Subsequently, we assigned labels to signal fragments based on the degree of arousal within a specified time. In doing so, we treated the physiological data as the output of the classification model and the signals from the environment as the inputs. In contrast, Wang et al. [38] and Rani et al. [31] used features of the processed data as the inputs and the reported environment as the output. We adopted this approach of first determining the arousal level because of the complexity of the urban environment and because we cannot accurately consider an urban environment to be positive or negative towards a participant's perceptual quality. Thus, we labeled environmental conditions as positive or negative by using the physiological data as the target in the classifier's training.
Ragot et al. [30] found that the physiological response signals from the Empatica E4 wearable device were closely comparable to those of laboratory-based measurement devices. They also found that data from such wearable devices could be used to train a support vector machine classifier to recognize the participants' emotional state. Similarly, Poh et al. [28] confirmed that electrodermal activity (EDA) data from wearable devices are comparable to those from laboratory devices and are a valid physiological measure. Hence, we employed the Empatica E4 for physiological measurement in this study.

Study design and measurements
We designed a study to understand the general pattern(s) of human perception related to events that occur in a dynamic urban environment. An event denotes a change in the environmental conditions and, equally, a sample of the measured environmental data. As a case study, we selected a neighborhood in Zürich, Switzerland (Fig. 1a), and invited participants to take a leisure walk along a predetermined path (Fig. 1b). The participants were equipped with a "sensor backpack" [14] and an Empatica E4 wearable device [11]. The 1.3 km walking path was carefully selected to cover diverse urban scenarios [15], e.g., spacious and narrow streets, green and urban areas, and loud and quiet locations.
Our sensor kit [14] measured changes in sound level (decibels, dB), the amount of dust (mg/m³), temperature (°C), relative humidity (%), and illuminance (lx). We also calculated the field-of-view based on the GPS information and the spatial configuration of the neighborhood. The field-of-view is formally described by the Isovist descriptor, which refers to the open space a person can view from a single vantage point [4]. Since participants were walking forward, we considered a 180° field-of-view with a distance of 100 m. The Isovist descriptor for each participant's walk was then measured by drawing a polygon around the participant's 180° field-of-view at their specific GPS locations. From this, the following measures of the Isovist polygons were calculated: Area, the polygon's surface area; Perimeter, the polygon's perimeter length; Compactness, the ratio of area to perimeter (relative to an ideal circle); and Occlusivity, the length of occluding edges.
The EDA measures an individual's physiological state [6] and was recorded using the Empatica E4 wearable device, as in [11,12,13]. We placed the wearable device on each participant's non-dominant hand and let it adjust for 10 minutes according to the Empatica guidelines [11]. The data were recorded on the Empatica website and corrected for motion artifacts [11]. The EDA measure (physiological response) is a time-series signal with temporal dependencies. The sensor backpack, on the other hand, was designed to capture contextual events that occur in an urban environment. In the context of this study, an event is non-temporal, since an event depends only on the instant of its observation. Therefore, the continuous signals recorded for environmental features and those recorded for participants' physiological responses were quantified in two different manners (Section 3.2). Moreover, since the recorded signals were associated with geographical locations, they also had spatial properties. The primary infrastructure of the urban environment and the season (April 2016) were uniform. However, inherent diversity arose from different experiment days, times of day, and participants' demographic backgrounds. The data for both the environmental measures and the corresponding participants' physiological response measures are summarized in Table 1.
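To make the four Isovist measures concrete, the sketch below computes them for a polygon given as an ordered list of (x, y) vertices in metres. It is a minimal illustration under stated assumptions, not the authors' implementation: compactness is taken as 4πA/P² (which equals 1 for an ideal circle), and whether each polygon edge is occluding is supplied by the caller as a boolean flag, since deciding that requires the building geometry. The function names `isovist_measures` and `half_disk` are hypothetical.

```python
import math

def isovist_measures(vertices, occluding):
    """Area, perimeter, compactness and occlusivity of a simple polygon.

    vertices:  list of (x, y) points in metres, ordered around the polygon.
    occluding: one boolean per edge i -> i+1 (wrapping around); True if that
               edge is an occluding edge (assumed known from the map data).
    """
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    occlusivity = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace formula term
        edge = math.hypot(x2 - x1, y2 - y1)
        perimeter += edge
        if occluding[i]:
            occlusivity += edge                   # total length of occluding edges
    area = abs(twice_area) / 2.0
    # Assumed definition: 4*pi*A / P**2, equal to 1.0 for an ideal circle.
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return {"area": area, "perimeter": perimeter,
            "compactness": compactness, "occlusivity": occlusivity}

def half_disk(radius=100.0, heading=0.0, segments=64):
    """Unobstructed 180-degree field-of-view polygon with the walker at (0, 0)."""
    points = [(0.0, 0.0)]
    for k in range(segments + 1):
        angle = heading - math.pi / 2 + math.pi * k / segments
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

# An unobstructed 180-degree, 100 m view: area should be close to pi * 100**2 / 2.
fov = half_disk()
m = isovist_measures(fov, [False] * len(fov))
print(round(m["area"]), round(m["compactness"], 3))
```

In the real pipeline, the polygon would be clipped by the surrounding buildings at each GPS location, which is what makes area, perimeter, and occlusivity vary along the walk.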

Methodologies
A comprehensive signal processing and data preprocessing framework was proposed in order to apply the selected machine learning methods. Fig. 2 illustrates the framework and how it was used for information fusion and knowledge mining. Here, $e_i$ and $r_i$ denote the $i$-th quantified event (a sample in the quantified environmental data) and response (a sample in the quantified physiological response data), respectively. The variable $m_j$, for $j \in \{1, 2, \ldots, N\}$, denotes the total number of samples belonging to the $j$-th participant $p_j$. The information was therefore fused in three stages:
(a) Each participant's event-based data ($e$) were collected from five sensors, resampled to a common frequency, and aligned by timestamp (Fig. 2, mark "A").
(b) The environment and response data from each participant were independently cleaned, filtered, and quantified. Each participant's quantified event and response data were then fused (paired) by assigning a quantified response $r_i$ to event $e_i$ (Fig. 2, mark "B").
(c) The paired participants' data were then stacked (Fig. 2, mark "C").
This three-stage information fusion approach produced the compiled dataset, which was fed to the selected machine learning techniques. For each technique, the compiled dataset (Fig. 2, mark "C") was arranged and configured according to the technique's requirements and objectives.
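Stages (b) and (c) amount to a simple data layout. The minimal sketch below assumes each participant's quantified events are already an array of shape (m_j, number of features) and their responses an array of length m_j; it pairs them row by row and stacks all participants into one compiled dataset. The function name and the extra participant-ID column are illustrative choices, not part of the original framework.

```python
import numpy as np

def pair_and_stack(participants):
    """Pair each participant's events with responses, then stack all participants.

    participants: dict participant_id -> (events, responses), where events has
    shape (m_j, n_features) and responses has length m_j (one per time window).
    Returns (X, y, ids): compiled inputs, targets, and the participant of each row.
    """
    X_parts, y_parts, id_parts = [], [], []
    for pid, (events, responses) in participants.items():
        events = np.asarray(events, dtype=float)
        responses = np.asarray(responses)
        if len(events) != len(responses):   # stage (b): pairing is one-to-one
            raise ValueError(f"participant {pid}: {len(events)} events, {len(responses)} responses")
        X_parts.append(events)
        y_parts.append(responses)
        id_parts.append(np.full(len(responses), pid))
    # Stage (c): stack the paired data of all participants row-wise.
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(id_parts)

X, y, ids = pair_and_stack({
    1: (np.zeros((3, 9)), [0, 2, 1]),   # participant 1: three 9-feature events
    2: (np.ones((2, 9)), [0, 0]),       # participant 2: two events
})
print(X.shape, y.tolist(), ids.tolist())   # (5, 9) [0, 2, 1, 0, 0] [1, 1, 1, 2, 2]
```

Keeping the participant ID alongside each row is what later allows the SOM label matrix to show which participants fall into a cluster.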

Frequency unification
The environmental features sound and dust were collected at 0.4 Hz, while GPS position, temperature, humidity, and illuminance were collected at 1 Hz (Table 1). Therefore, up-sampling with linear interpolation was applied to the sound and dust data [5] to unify the frequencies of the gathered data. All features were then aligned to the same timestamps, which was crucial to ensure that all sensor values belonged to the same event during the study.
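A minimal sketch of this step with NumPy's `np.interp` (piecewise-linear interpolation), assuming each stream is given as ascending timestamps in seconds plus values. The common 1 Hz grid is restricted to the time span covered by every stream so that no value is extrapolated. The stream names and toy values are illustrative only.

```python
import numpy as np

def unify_frequencies(streams, rate_hz=1.0):
    """Resample every stream onto one common time grid by linear interpolation.

    streams: dict name -> (timestamps_s, values); timestamps must be ascending.
    Returns a dict with the shared grid under "t" and one array per stream.
    """
    # Use only the interval covered by all streams, so np.interp never has to
    # extrapolate (it would otherwise repeat the edge values).
    start = max(t[0] for t, _ in streams.values())
    end = min(t[-1] for t, _ in streams.values())
    grid = np.arange(start, end + 1e-9, 1.0 / rate_hz)
    table = {"t": grid}
    for name, (t, x) in streams.items():
        table[name] = np.interp(grid, np.asarray(t, float), np.asarray(x, float))
    return table

# Toy data: sound at 0.4 Hz (one sample every 2.5 s), temperature at 1 Hz.
streams = {
    "sound_db": (np.array([0.0, 2.5, 5.0, 7.5]), np.array([60.0, 65.0, 70.0, 75.0])),
    "temp_c": (np.arange(0.0, 8.0), np.full(8, 20.0)),
}
table = unify_frequencies(streams)
print(table["sound_db"][:3])  # [60. 62. 64.] at t = 0, 1, 2 s
```

Because every feature is then indexed by the same timestamps, one row of the table corresponds to one event.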

Signal filtering and smoothing
The physiological response data (EDA signals) were kept at their original 4 Hz frequency to retain the information required for arousal detection. On close inspection, we found that some participants' EDA signals were unusable, and these were discarded. The remaining (accepted) EDA signals were first smoothed and then filtered to remove artifacts, as recommended in the EDA literature [6,8]. The authors of [8] suggested an adaptive method for stationary wavelet transform (SWT) based smoothing of EDA signals recorded over long periods (30 hours). In our study, EDA signals were recorded for 25-29 minutes. Therefore, we applied a one-level SWT and an inverse SWT for smoothing. Each EDA signal was transformed using "Haar" as the mother wavelet in the SWT [24]. A one-level SWT was performed on each signal, and a threshold of ±0.001 was applied to the obtained wavelet coefficients to eliminate larger fluctuations in the signal. That is, the values of wavelet coefficients above +0.001 and below −0.001 were cut off (Fig. 4a). Finally, an inverse SWT was applied to the transformed signal to produce a smoothed signal (Fig. 4b).

Physiological data selection
Truncation of the unwanted signal fragments. SWT-based treatment of the EDA signals eliminated the large fluctuations from the signal. However, some sharp drops in the signal (corrupt fragments) caused by artifacts were not filtered out completely. Thus, the corrupt fragments and the participants' waiting-time fragments of each EDA signal were truncated from both the original (raw) and the smoothed EDA signals.
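The one-level Haar SWT smoothing described above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the authors' code: the undecimated Haar transform is written by hand with periodic boundary handling (so no wavelet library is needed), "cut off" is read as clipping the detail coefficients to ±0.001, and the approximation coefficients are left untouched. Zeroing the coefficients that exceed the threshold is another plausible reading of the text, and a library SWT (for example in PyWavelets) may use different boundary and normalization conventions.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_swt_level1(x):
    """One-level undecimated (stationary) Haar transform, periodic boundary."""
    x_next = np.roll(x, -1)                 # x[n + 1], wrapping at the end
    approx = (x + x_next) / SQRT2
    detail = (x - x_next) / SQRT2
    return approx, detail

def haar_iswt_level1(approx, detail):
    """Inverse of haar_swt_level1: average the two shifted reconstructions."""
    from_here = (approx + detail) / SQRT2                          # recovers x[n]
    from_prev = (np.roll(approx, 1) - np.roll(detail, 1)) / SQRT2  # also recovers x[n]
    return (from_here + from_prev) / 2.0

def smooth_eda(signal, threshold=0.001):
    """Clip level-1 detail coefficients to [-threshold, +threshold], then invert."""
    x = np.asarray(signal, dtype=float)
    approx, detail = haar_swt_level1(x)
    detail = np.clip(detail, -threshold, threshold)
    return haar_iswt_level1(approx, detail)

# A unit step: the sharp jump is spread over neighboring samples after smoothing.
step = np.array([0.0, 0, 0, 0, 1, 1, 1, 1])
print(np.round(smooth_eda(step), 2))
```

Slow drifts produce detail coefficients well inside ±0.001 and pass through unchanged, while abrupt jumps are attenuated. With periodic boundaries the first and last samples interact, so in practice one might pad the signal before smoothing.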

Signal quantification and labeling
Signal quantification involved three steps: time-window marking, arousal detection, and data labeling.
These are the critical steps in fusing the environmental data with the physiological response data. As shown in Fig. 2, the physiological data were quantified first, and their timestamp information was then passed to the environmental data for its quantification.

Time window marking
Each EDA signal's timestamp information was compared with the timestamps recorded at various stages during a participant's walk. Based on the filtered signal shown in Fig. 4b and the available timestamp information, the signal fragments belonging to the walking duration (indicated by Start and End in Fig. 5a) were marked at a regular interval of time-window size t seconds. Such time-window marking was crucial to our data analysis, as it allowed us to observe participants' physiological states in relation to the events they experienced at regular intervals of t seconds (Fig. 5a).
Therefore, for each time window, the event $e^{p_j}_i$, for $i = 1, \ldots, m_j$, experienced by participant $p_j$ is a vector of the environmental features, computed by averaging the values of the signal fragment (environmental measurements) within the $i$-th corresponding time window. On the other hand, the participant's physiological response $r^{p_j}_i$ for each time window was computed using the arousal detection method described in Section 3.2.2. Additionally, the participants' field-of-view (Isovist descriptors: area, perimeter, occlusivity, and compactness) was computed at the start of each time window. Thus, the quantified data of participant $p_j$ consisted of one independent sample per time window: a vector of environmental conditions (event $e^{p_j}_i$) and the corresponding physiological state (response $r^{p_j}_i$).
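Averaging the environmental measurements per time window can be sketched as follows, assuming timestamps in seconds and a walk bounded by Start and End. The source does not say how a trailing partial window was handled, so this sketch simply drops it; windows without any samples come out as NaN. The function name is hypothetical.

```python
import numpy as np

def window_means(timestamps, values, start, end, window_s):
    """Average samples within consecutive, non-overlapping windows of window_s seconds.

    Only complete windows between start and end are kept; a window with no
    samples yields NaN.
    """
    ts = np.asarray(timestamps, dtype=float)
    vals = np.asarray(values, dtype=float)
    if vals.ndim == 1:
        vals = vals[:, None]                     # treat a single feature as one column
    n_windows = int((end - start) // window_s)
    inside = (ts >= start) & (ts < start + n_windows * window_s)
    idx = ((ts[inside] - start) // window_s).astype(int)   # window index per sample
    counts = np.bincount(idx, minlength=n_windows)
    sums = np.stack([np.bincount(idx, weights=vals[inside, k], minlength=n_windows)
                     for k in range(vals.shape[1])], axis=1)
    with np.errstate(invalid="ignore"):          # 0/0 for empty windows gives NaN
        return sums / counts[:, None]            # shape (n_windows, n_features)

# Two features sampled at 1 Hz for 10 s, averaged over 5 s windows (t = 5).
ts = np.arange(10.0)
features = np.column_stack([ts, 2 * ts])
print(window_means(ts, features, start=0.0, end=10.0, window_s=5.0))
```

Each row of the result is one quantified event vector for one time window.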

Arousal detection (EDA)
The level of arousal $r^{p_j}_i$ in an EDA signal depends on identifying a specific signature (pattern) called the skin conductance response (SCR), or arousal [3,6,9,33,35]. A state of arousal in an EDA signal is typically defined as a peak with a specific signature [6]. We processed the EDA signals using Ledalab [3], a skin conductance processing tool that offers a continuous decomposition analysis (CDA) method for analyzing an EDA signal. In CDA, an EDA signal is decomposed into a tonic component, the skin conductance level (SCL), and a phasic driver, the SCR.
We performed CDA on each participant's EDA signal using the recommended settings in Ledalab [3]. That is, the optimization procedure was performed twice, which automatically determined the optimization parameters for evaluating the number of significant SCRs (nSCR) above a defined threshold of 0.01 µS within a time window. We used nSCR because we could not, in a theory-driven manner, define which stimulus (event) caused a change in a participant's "physiological arousal state." Thus, we relied on a data-driven approach by analyzing the phasic SCR, a non-specific, fast-changing EDA measure; i.e., nSCR counts the peaks in the phasic skin conductance response to any kind of event within the given time window. Therefore, nSCR gave us the measure $r^{p_j}_i$ shown in Fig. 5b.
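Ledalab's CDA is a MATLAB tool and is not reproduced here. As a rough, clearly simplified stand-in, the sketch below counts peaks in an already-separated phasic signal whose rise from the preceding trough is at least 0.01 µS, and reports that count per non-overlapping t-second window. It is a peak-counting heuristic assumed for illustration, not Ledalab's decomposition, and its counts will generally differ from Ledalab's nSCR.

```python
import numpy as np

def count_scrs(phasic, min_amplitude=0.01):
    """Count peaks that rise at least min_amplitude (µS) above the preceding trough."""
    x = np.asarray(phasic, dtype=float)
    count = 0
    trough = x[0] if len(x) else 0.0     # lowest value since the last counted peak
    for i in range(1, len(x) - 1):
        trough = min(trough, x[i])
        is_peak = x[i] > x[i - 1] and x[i] >= x[i + 1]
        if is_peak and x[i] - trough >= min_amplitude:
            count += 1
            trough = x[i]                # the next response must rise from a new trough
    return count

def nscr_per_window(phasic, fs, window_s, min_amplitude=0.01):
    """Simplified nSCR per non-overlapping window of window_s seconds at fs Hz."""
    n = int(round(fs * window_s))        # e.g. 4 Hz * 5 s = 20 samples
    return [count_scrs(phasic[k:k + n], min_amplitude)
            for k in range(0, len(phasic) - n + 1, n)]

# Two responses of 0.05 µS and one sub-threshold ripple of 0.005 µS, then silence.
phasic = np.array([0, 0.05, 0, 0.005, 0, 0.05, 0] + [0.0] * 7)
print(nscr_per_window(phasic, fs=1, window_s=7))   # [2, 0]
```

A response that straddles a window boundary may be missed or attributed to only one window in this sketch.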

Data labeling
When aggregating all participants' data (Fig. 2, mark "C"), we observed that nSCR values per time window range from 0 to 12. An nSCR value of 0 indicates that, in that time window, a participant had a normal physiological condition. On the other hand, an nSCR value greater than 0 indicates that a participant experienced a state of arousal at least once in that time window. Thus, each time window of each participant's data can be labeled with a binary class indicating the binary state of the phasic nSCR $r^{p_j}_i$, where (a) class 0 is a "normal" physiological response ("N"), i.e., nSCR = 0; and (b) class 1 is an "aroused" physiological response ("A"), i.e., nSCR > 0.
A multiclass classification was also used, in which the aroused physiological response "A" has two categories: class "LA," indicating a low arousal response, i.e., 0 < nSCR < 6, and class "HA," indicating a high arousal response, i.e., nSCR ≥ 6. A total of 6,057 samples and 9 input features were available in the compiled dataset for a time-window size t (quantification rate) of 5 seconds. In the compiled data, 3,491 samples belonged to category "N" and 2,566 samples to category "A," i.e., approximately 58% and 42% of the samples, respectively. Furthermore, in the multiclass classification, 2,079 samples were labeled "LA" and 487 samples were labeled "HA."
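The labeling rule is a direct mapping from the nSCR count of a time window to a binary and a multiclass label. The sketch below encodes the thresholds stated above (0 for "N", 6 for the "LA"/"HA" boundary) and recomputes the class shares implied by the reported sample counts; the function name is hypothetical.

```python
def label_nscr(nscr, high_threshold=6):
    """Map an nSCR count to (binary label, multiclass label).

    Binary:     "N" if nSCR == 0, otherwise "A".
    Multiclass: "N", "LA" (0 < nSCR < high_threshold) or "HA" (nSCR >= high_threshold).
    """
    if nscr < 0:
        raise ValueError("nSCR is a count and cannot be negative")
    if nscr == 0:
        return "N", "N"
    return "A", ("HA" if nscr >= high_threshold else "LA")

# Class shares implied by the sample counts reported above (5 s windows).
counts = {"N": 3491, "LA": 2079, "HA": 487}
total = sum(counts.values())                       # 6057 samples
shares = {k: round(100 * v / total, 1) for k, v in counts.items()}
print(total, shares)   # 6057 {'N': 57.6, 'LA': 34.3, 'HA': 8.0}
```

The roughly 58% share of class "N" is also the accuracy of a classifier that always predicts "N", a useful reference point when reading the classification accuracies.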

Non-inferential modeling
Using REP-Tree, a decision tree learner [29], we built a predictive model with the environmental features as inputs and the binary (or multiclass) quantified arousal level as output.
A decision tree is a tree-like predictive model, where the leaves represent the target (e.g., the class labels "N" or "A") and the branches represent an observation of a feature (e.g., sound level) at a node. REP-Tree reduces the size of a decision tree by repeatedly replacing subtrees with leaves (class labels) as long as the error on a held-out pruning set does not increase (i.e., the accuracy of the model does not decrease).
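To illustrate the pruning rule itself, here is a minimal sketch of reduced-error pruning on a small, hand-built binary decision tree. It assumes each node stores the majority training label it would predict as a leaf; nodes are visited bottom-up, and a node is turned into a leaf whenever that does not increase the number of errors on a separate pruning set. The `Node` class and the toy tree are hypothetical. Weka's REPTree, which the study used, also grows the tree and holds out its own pruning set, which is not shown here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str                        # majority training class if used as a leaf
    feature: Optional[int] = None     # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None     # branch for x[feature] <= threshold
    right: Optional["Node"] = None    # branch for x[feature] > threshold

def predict_one(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def count_errors(tree, X, y):
    return sum(predict_one(tree, x) != target for x, target in zip(X, y))

def reduced_error_prune(tree, X_prune, y_prune):
    """Bottom-up: make a node a leaf unless that increases pruning-set errors."""
    def visit(node):
        if node.feature is None:
            return
        visit(node.left)
        visit(node.right)
        before = count_errors(tree, X_prune, y_prune)
        saved = (node.feature, node.left, node.right)
        node.feature, node.left, node.right = None, None, None   # try it as a leaf
        if count_errors(tree, X_prune, y_prune) > before:
            node.feature, node.left, node.right = saved          # pruning hurt: undo
    visit(tree)
    return tree

# Toy tree over (sound_db, temp_c); the right subtree over-fits on temperature.
tree = Node("N", feature=0, threshold=66.0,
            left=Node("N"),
            right=Node("A", feature=1, threshold=20.0,
                       left=Node("A"), right=Node("N")))
X_prune = [(60.0, 10.0), (70.0, 10.0), (70.0, 25.0)]
y_prune = ["N", "A", "A"]
reduced_error_prune(tree, X_prune, y_prune)
print(tree.right.feature, tree.right.label)   # None A  (subtree replaced by a leaf)
```

The root split is kept because collapsing it to a single "N" leaf would raise the pruning-set errors from 0 to 2.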
We chose REP-Tree to build a predictive model because the algorithm constructs a decision tree in which each internal node tests a feature against a specific value, and each path from the root to a leaf yields a particular class label. While building the tree, REP-Tree selects split features by information gain, so features that contribute little to separating the classes tend to be left out; this is advantageous for this problem because it is uncertain which environmental features influence physiological responses. To validate the model's predictive performance, we chose ten-fold cross-validation (10-fold CV). Section 4 describes the test accuracies of the 10-fold CV-based REP-Tree training.
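The 10-fold CV protocol is independent of the learner. The sketch below shuffles the samples once, splits them into ten folds, trains on nine and tests on the held-out one, and averages the ten test accuracies. For illustration it uses a majority-class predictor on labels with the reported class counts (3,491 "N" vs 2,566 "A") and a dummy all-zero feature matrix; the function names are assumptions, and in practice `fit` and `predict` would wrap the actual classifier.

```python
import numpy as np

def k_fold_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean test accuracy over k folds; fit(X, y) -> model, predict(model, X) -> labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # shuffle once before splitting
    folds = np.array_split(idx, k)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accuracies.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(accuracies))

def fit_majority(X, y):
    """Baseline 'model': the most frequent training label."""
    values, counts = np.unique(y, return_counts=True)
    return values[np.argmax(counts)]

def predict_majority(model, X):
    return np.full(len(X), model)

y = np.array(["N"] * 3491 + ["A"] * 2566)
X = np.zeros((len(y), 9))                    # placeholder for the 9 features
acc = k_fold_accuracy(X, y, fit_majority, predict_majority)
print(round(acc, 3))                         # close to 3491 / 6057
```

Comparing a model's CV accuracy against this majority baseline shows how much the environmental features add beyond the class imbalance.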

Inferential modeling
Contrary to non-inferential modeling, inferential modeling explains the relationships between the input features and the output feature.A fuzzy rule-based inference system is capable of describing how independent environmental features are related to the dependent physiological response (phasic nSCR) feature.For this, we applied FURIA, which is a fuzzy rule-based classifier [17].
Unlike conventional rule-based classifiers, which produce crisp rules, FURIA learns fuzzy rules [17]. FURIA's rules use the operators ≤, =, and ≥, which define clear conditions for a feature's association with a class label (e.g., "N" or "A"). FURIA also provides a range (e.g., x → y) indicating fuzziness in a feature's condition, which may be considered a soft boundary when associating a feature with a class label [17]. This ability was particularly useful in this study because we wanted to observe the specific value ranges of the environmental features that corresponded to a participant's state of arousal. For instance, we needed to determine the sound level range in which a participant experienced a state of arousal. Since FURIA fulfills this requirement, it was selected as the technique for inferential analysis. The interpretation of the obtained rules is described in Section 4.
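To show how a soft boundary such as x → y can be read, the sketch below evaluates trapezoidal membership functions, the kind of fuzzy interval FURIA uses, and combines a rule's antecedents by multiplication (we assume the product t-norm). The rule bounds are illustrative values, not rules learned in this study, and FURIA's rule induction, certainty factors, and rule stretching are not shown.

```python
import math

INF = math.inf

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 below a, rising to 1 at b, 1 up to c, 0 from d.

    One-sided conditions use infinite bounds: "x >= (a -> b)" is
    trapezoid(x, a, b, INF, INF) and "x <= (c -> d)" is
    trapezoid(x, -INF, -INF, c, d).
    """
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)      # rising edge of the soft lower bound
    if x <= c:
        return 1.0                    # core: condition fully satisfied
    return (d - x) / (d - c)          # falling edge of the soft upper bound

def rule_degree(sample, antecedents):
    """Degree to which a sample fires a rule (product of antecedent memberships)."""
    degree = 1.0
    for feature, bounds in antecedents.items():
        degree *= trapezoid(sample[feature], *bounds)
    return degree

# Hypothetical rule: IF sound_db >= (64 -> 66) AND illuminance_lx <= (580 -> 620) THEN "A"
rule_a = {"sound_db": (64.0, 66.0, INF, INF),
          "illuminance_lx": (-INF, -INF, 580.0, 620.0)}
print(rule_degree({"sound_db": 65.0, "illuminance_lx": 300.0}, rule_a))  # 0.5
```

A sample at 65 dB lies halfway along the soft lower bound, so it satisfies the sound condition to degree 0.5 rather than being forced into a crisp yes or no.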

Feature selection
Feature selection is a process to determine the ability of each input feature to predict the output.
Moreover, feature selection involves building a model using a subset of features and testing its predictive accuracy. We applied the backward feature elimination (BFE) method [22] in this research because it systematically evaluates feature subsets of decreasing size. BFE starts with all features (in this case, 9 features) to build and test the model. Subsequently, BFE iteratively eliminates features one by one while propagating the highest-accuracy feature subset to the next iteration. Finally, BFE gives a list of subsets with their corresponding accuracies, from which a subset can be selected depending on the accuracy or the number of features required. In addition to REP-Tree, a multilayer perceptron (MLP) [16] and a support vector machine (SVM) [7] were used for a more comprehensive analysis in BFE. Therefore, the feature selection result was an assessment of three different predictors. During feature selection, at each iteration, BFE used 60% randomly selected samples to train the model and the remaining 40% to test it.
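The greedy elimination loop can be sketched independently of the predictor. In this minimal version, `score(subset)` is any function returning a test accuracy for a feature subset (in the study, an MLP, REP-Tree, or SVM trained on 60% and tested on 40% of the samples); at each step, the feature whose removal leaves the best-scoring subset is dropped. The toy scoring function stands in for a trained model, and its weights are invented for illustration.

```python
def backward_elimination(features, score):
    """Greedy backward feature elimination.

    Returns [(subset, score), ...] from the full set down to one feature,
    where each subset is the best found by removing one feature from the
    previous subset.
    """
    current = list(features)
    history = [(tuple(current), score(current))]
    while len(current) > 1:
        # Try dropping each remaining feature and keep the best-scoring result.
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=score)
        history.append((tuple(current), score(current)))
    return history

# Toy stand-in for "train a model on this subset and return its test accuracy".
weights = {"temperature": 0.30, "humidity": 0.20, "illuminance": 0.25,
           "isovist_area": 0.15, "dust": 0.01}

def toy_score(subset):
    return sum(weights[f] for f in subset)

hist = backward_elimination(list(weights), toy_score)
for subset, acc in hist:
    print(len(subset), subset, round(acc, 2))
```

Reading the history from top to bottom gives the accuracy-versus-size trade-off that the significance hierarchy in Section 4 summarizes.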

Pattern discovery
In general, the primary aim of a self-organizing map (SOM) is to map m-dimensional data onto a 2-dimensional (2D) plane. The 2D plane of a SOM consists of a network of neurons (nodes), which acquire the underlying properties of the input data samples (e.g., events in the environmental data). A SOM projects each data sample to the node (cluster center) most similar to it in terms of Euclidean distance [18,37].
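The SOM mapping described above can be sketched compactly in NumPy. This minimal version assumes a rectangular grid (the study used hexagonal nodes), random initialization within the data range, a Gaussian neighborhood, and a linearly decaying learning rate and radius. These schedule choices are ours, not the study's settings, which would normally come from a SOM toolbox.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid index of the node whose weight vector is closest (Euclidean) to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows=20, cols=20, epochs=20, lr0=0.5, seed=0):
    """Train a rectangular self-organizing map; returns weights (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, dim = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)
    weights = lo + rng.random((rows, cols, dim)) * (hi - lo)
    # Grid coordinates of every node, used by the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    total_steps, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighborhood
            bmu = best_matching_unit(weights, x)
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

# Two well-separated toy clusters in 9 dimensions, mapped onto a small 5 x 5 grid.
rng = np.random.default_rng(1)
samples = np.vstack([rng.normal(0.0, 0.05, (50, 9)), rng.normal(1.0, 0.05, (50, 9))])
w = train_som(samples, rows=5, cols=5, epochs=10)
print(best_matching_unit(w, np.zeros(9)), best_matching_unit(w, np.ones(9)))
```

Samples from the two clusters end up on different nodes, which is the property the U-matrix and label matrix later make visible.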
SOM is an appropriate choice for this problem because it is difficult to define the number of clusters in advance, especially when the features have complex relationships. The SOM produced clusters automatically (see Section 4.4). Additionally, to analyze patterns related to geo-locations, the geo-location referenced mean physiological response across all participants, $r_{\mathrm{mean}_i} = (r^{p_1}_{x_i,y_i} + r^{p_2}_{x_i,y_i} + \cdots + r^{p_N}_{x_i,y_i})/N$, was computed by matching GPS location information ($x_i$: latitude, $y_i$: longitude) and aggregating the samples. The geo-location referenced mean physiological responses $r_{\mathrm{mean}_i}$ were computed to visually relate patterns in participants' physiological responses to the actual map of the neighborhood, as described in Section 4.4.
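A minimal sketch of the geo-location referenced mean, assuming that "matching GPS locations" means rounding latitude and longitude to a fixed number of decimals (4 decimals is roughly 11 m in latitude) and averaging all responses that fall on the same rounded point. When every one of the N participants contributes one sample per point, this mean equals the formula above; otherwise it averages over the participants present. The min-max step mirrors the normalization to [0, 1] used for the map in Section 4.4. All names and coordinates are hypothetical.

```python
from collections import defaultdict

def geo_mean_response(records, decimals=4):
    """records: iterable of (latitude, longitude, response) from all participants."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, response in records:
        key = (round(lat, decimals), round(lon, decimals))   # matched location
        sums[key] += response
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def min_max_normalize(values):
    """Rescale a dict of numbers to [0, 1] (all zeros if every value is equal)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

records = [(47.37700, 8.54000, 2), (47.37701, 8.54001, 4),   # same spot, two walkers
           (47.38000, 8.55000, 0)]
means = geo_mean_response(records)
print(means)                      # {(47.377, 8.54): 3.0, (47.38, 8.55): 0.0}
print(min_max_normalize(means))   # {(47.377, 8.54): 1.0, (47.38, 8.55): 0.0}
```

The rounding resolution controls how strictly locations must coincide; a coarser grid merges nearby samples and smooths the resulting map.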

Sensitivity analysis (non-inferential modeling)
First, a classifier (REP-Tree described in Section 3. curve plot [26] in Fig. 6. The model's performance improved as the quantification rate (time-window size) decreased (Fig. 6). The model's high predictability for smaller quantification rates indicates the participants' strong sensitivity to changes in the urban environment. The model's performance for smoothed EDA data (red squares) was better than for raw EDA signals (circles). Thus, the smoothed EDA data capture the association between changes in environmental features and participants' physiological states of arousal more accurately.
The results of the 10-fold CV training of the REP-Tree classifier for both binary and multiclass classification, on the dataset in which smoothed EDA data were quantified with a 5-second time window, are shown in Table 2. The classifier's predictive accuracy was 87% for the binary classification and 80% for the multiclass classification.

Sensitivity range analysis (inferential modeling)
The non-inferential model indicates that the participants' physiological responses are sensitive to environmental changes. Therefore, we built an inferential model to understand how environmental features influence participants' physiological responses. A fuzzy rule-based inferential model was built using FURIA, whose parameter settings are given in Table A.1. We adopted a binary classification of nSCR, in which nSCRs were categorized into two classes: normal physiological response, "N," and aroused physiological response, "A." FURIA achieved an average test accuracy of 70.23% in 10-fold CV training. This accuracy is notable given the complexity of understanding humans' perception of their urban environmental conditions.
We analyzed the set of fuzzy rules generated by FURIA by separating the rules for classes "N" and "A." Fig. 7 is a visual interpretation of the obtained fuzzy rules for both classes. We interpreted and represented the FURIA rules in Fig. 7 to find the values (or ranges of values) of the environmental features that (a) were linked to class "A," which indicates the participants' aroused physiological state; and (b) did not significantly influence the participants' aroused physiological state.
To validate the knowledge obtained from the visual interpretation of the fuzzy rules, the distributions of the environmental features were examined through histograms in Figs. 7b, 7d, 7f, 7h, 7j, and 7l. The visual interpretation and summary of the rules for sound level in Fig. 7a and the corresponding distribution in Fig. 7b indicate that the participants' normal physiological responses match a particular sound level distribution. For example, sound levels from around 60 dB to 66 dB (Fig. 7b) correspond to a normal physiological state (Fig. 7a). Furthermore, the participants tended to exhibit an aroused physiological state when they experienced sound levels above 66 dB. This result indicates that loud sound levels correspond to increased participant arousal.
The result was similar for temperature, where temperatures greater than 21-22 °C were associated with an aroused physiological state (Fig. 7e). However, the dataset contains fewer samples for temperatures above 22 °C than for temperatures below 22 °C (Fig. 7f), so we cannot conclude that heat alone caused the participants' physiological arousal. In Fig. 7i, the participants exhibited physiological arousal at darker locations (illuminance below 580 lx).

Simultaneous impact of environmental features
Inferential modeling provided the values of the environmental features that were associated with normal and aroused physiological states. However, it is also essential to discover which environmental feature(s) have the strongest influence on the participants' physiological responses. Hence, we constructed a backward feature elimination (BFE) based feature selection framework and analyzed the obtained results to build a significance hierarchy of feature subsets (Fig. 8). A feature subset's significance was estimated by its ability to predict the "N" and "A" classes with high accuracy.
Fig. 8 is a significance hierarchy triangle of the feature subsets, in which a subset's predictability decreases as the number of features in the subset decreases. The three predictors provided three feature selection result sets, and Fig. 8 compiles the result sets from all three. MLP, REP-Tree, and SVM agreed on the feature subset of temperature, humidity, illuminance, and Isovist area, for which REP-Tree had the highest accuracy, followed by SVM and MLP. Therefore, temperature, humidity, illuminance, and Isovist area were noted as the most significant feature set, although the choice is a trade-off between accuracy and the number of features, as indicated by the hierarchy triangle (Fig. 8).

Patterns of perceptual variations
The predictive modeling confirmed the sensitivity of participants' physiological responses to dynamic environmental conditions. The fuzzy rule-based analysis described the relationship between the environmental features and the physiological response, and feature selection indicated the most significant environmental features. Pattern discovery, in turn, reveals: (a) which participants experienced similar environmental conditions and how they responded; (b) whether the participants' physiological responses to certain environmental conditions were similar; and (c) the patterns of the environmental features that influence the participants' physiological arousal.
The compiled data (see Fig. 2) were analyzed using SOM. Fig. 9 shows the result of automatic clustering from a trained SOM, in which the 9-dimensional input data were mapped onto a 20 × 20 2D plane of hexagonal nodes. Each node in the map acquired the properties of a set of samples.
Fig. 9a shows the maps of the environmental features on feature matrices (F-matrices). On the F-matrix of an environmental feature (e.g., sound level), the feature values assigned to the F-matrix nodes correspond to the nodes on the SOM's unified distance matrix (U-matrix) in Fig. 9b and the label matrix (L-matrix) in Fig. 9c. Hence, the positions and values of the nodes in all the maps (matrices) in Fig. 9 are comparable to each other. More specifically, the U-matrix is derived from the F-matrices of all the environmental features jointly, and the L-matrix shows the dominant label associated with each node. Therefore, to make sense of the pattern, all matrices need to be compared with one another.
The U-matrix in Fig. 9b shows the clusters of similar data points. Nodes with small differences (in terms of Euclidean distance) from their neighbors are shown in dark blue, and nodes with large differences are shown in bright yellow. Patches of similarly colored nodes separated by lighter colors indicate clusters of data samples. The data samples corresponding to a cluster in the U-matrix share a commonality, and dissimilar data samples lie further apart. It therefore follows that participants whose ID labels belong to the same cluster experienced similar environmental conditions. ganization of the dataset. This can, with caution, be interpreted as the "cause" (Fig. 9a) and "effect" (compare Fig. 9b and Fig. 9c) of the dynamic and simultaneous environmental features on the participants' physiological responses.
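The U-matrix values can be computed directly from the trained node weights. The sketch below assumes a rectangular grid in which each node's U-value is the mean Euclidean distance between its weight vector and those of its four direct neighbors; on the hexagonal grid used in Fig. 9, each node would instead have up to six neighbors. Large values mark boundaries between clusters, and small values mark cluster interiors.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each node's weight vector to its 4-connected neighbors.

    weights: array of shape (rows, cols, dim) from a trained SOM.
    """
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:   # skip neighbors off the grid
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            u[r, c] = np.mean(dists)
    return u

# One row of three nodes: two identical nodes followed by a distant one.
toy = np.array([[[0.0], [0.0], [10.0]]])
print(u_matrix(toy))   # [[ 0.  5. 10.]]
```

The middle node sits on the boundary between the two groups, so its U-value lies between the cluster-interior value of 0 and the isolated node's value of 10.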
On the U-matrix (Fig. 9b), a bright yellow patch separates itself from all the other node clusters. This distinct yellow spot results from a high concentration of a set of similar input samples, which in this case is due to a concentration of high illuminance values, as is evident from the F-matrix for illuminance (Fig. 9a). Fig. 9c shows that at exactly the same spot the participants were in an aroused physiological state (most of the nodes are colored blue), and the nodes were labeled with participant IDs 8, 13, 23, and 29. This indicates that all the participants exposed to extremely high illuminance experienced a similarly aroused physiological state.

In the pattern analysis, the mean physiological response across all participants was mapped onto the geographic locations along the path. The geo-location referenced mean physiological response was computed and normalized between 0 and 1. The geo-location referenced physiological responses highlighted specific locations on the neighborhood's map where participants experienced an aroused physiological state (Fig. 10). Locations where, on average, all participants exhibited a high physiological arousal response are indicated in red, while low physiological arousal is indicated in yellow. The size of each dot on the map in Fig. 10 is proportional to the degree of the participants' physiological arousal.

Fig. 10: Geo-location referenced mean physiological responses across all participants. An animation of this graphic indicating real-time simulation is available at [12].
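The geo-location referenced mean response, normalized between 0 and 1, can be sketched as a group-by-location average followed by min-max scaling. The paper does not specify how GPS fixes were grouped into locations or which response value was averaged, so this sketch assumes that each quantified event carries a numeric response (for example, its arousal label coded as 0 or 1) and that locations are formed by rounding coordinates to a hypothetical number of decimals. The helper names are also hypothetical.

```python
from collections import defaultdict


def geo_mean_response(events, decimals=4):
    """Normalized mean response per location.

    events: iterable of (lat, lon, response) tuples, one per quantified event.
    Locations are formed by rounding coordinates (a hypothetical binning choice).
    Returns {(lat, lon): mean response min-max scaled to [0, 1]}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for lat, lon, response in events:
        key = (round(lat, decimals), round(lon, decimals))
        sums[key] += response
        counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    lo, hi = min(means.values()), max(means.values())
    span = hi - lo
    # If every location has the same mean, all of them map to 0.
    return {key: (m - lo) / span if span else 0.0 for key, m in means.items()}


def dot_size(normalized, max_size=100.0):
    """Marker size proportional to the normalized arousal, as in the Fig. 10 map."""
    return normalized * max_size
```

Averaging per location, rather than summing, keeps locations where more events happened to be recorded (for example, because participants walked more slowly there) from dominating the map.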

Discussion
Through this research, we extracted patterns from the data gathered during a controlled study in which participants were asked to walk through an urban environment (Section 2.2). Our data analysis had three dimensions: signal processing, multi-sensor information fusion, and knowledge mining using machine learning techniques. Sensor frequency unification and quantification produced independent and identically distributed data samples of events and the corresponding physiological responses. During data processing, we categorized the physiological response data (EDA signals) into clean and erroneous signals (Section 3.1). EDA signal recording is susceptible to artifacts, and the suggested definition identifies erroneous EDA signals. Finally, the quantification method segmented the continuous temporal data into regular time intervals of t seconds (the time-window size), and the […] physiological arousal state) and were expected to fall into the same cluster or node on the map. For example, one cluster formed due to extremely high illuminance and another due to low illuminance conditions (Fig. 9).
This indicates that a particular environmental condition influenced most of the participants equally, and that the majority of participants responded with a similar physiological state when experiencing similar conditions. Furthermore, because the participants walked at different speeds, the number of quantified events per participant varied slightly. Therefore, the normalized mean of the events at each geo-location, rather than their sum, was used to show the geo-location of the participants' average physiological responses on the map (Fig. 10). This map can be used to visually inspect how urban features, such as street width, street type, traffic, and type of area (residential or industrial), potentially affect the participants' physiological responses.
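Sensor frequency unification and t-second quantification, as described in this discussion, can be sketched in two steps. First, every signal is resampled onto one common time grid. Second, the unified signals are cut into consecutive windows, and each window is fused into a single sample of mean environmental values. The paper names these steps but not their exact implementation, so the following are assumptions: linear interpolation for resampling, non-overlapping windows with any partial tail dropped, and the hypothetical helper names `unify`, `quantify`, and `window_means`.

```python
from bisect import bisect_right


def interp(t, ts, vs):
    """Linearly interpolate samples (ts ascending, vs) at time t; clamp at the ends."""
    if t <= ts[0]:
        return vs[0]
    if t >= ts[-1]:
        return vs[-1]
    i = bisect_right(ts, t)  # ts[i - 1] <= t < ts[i]
    t0, t1, v0, v1 = ts[i - 1], ts[i], vs[i - 1], vs[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def unify(signals, start, end, hz):
    """Resample every (timestamps, values) signal onto the grid start + k/hz."""
    n = int(round((end - start) * hz))
    grid = [start + k / hz for k in range(n)]
    return {name: [interp(t, ts, vs) for t in grid]
            for name, (ts, vs) in signals.items()}


def quantify(unified, hz, t_window):
    """Cut unified signals into consecutive t_window-second windows; drop a partial tail."""
    size = int(round(t_window * hz))
    n = min(len(vals) for vals in unified.values())
    return [{name: vals[i:i + size] for name, vals in unified.items()}
            for i in range(0, n - size + 1, size)]


def window_means(window, features):
    """Fuse one window into a sample: the mean of each listed environmental feature."""
    return {f: sum(window[f]) / len(window[f]) for f in features}
```

Each window's EDA segment would then be labeled "N" or "A" and paired with that window's environmental means to form one event. Changing `t_window` changes both the number of events and their labels, which is the quantification-rate sensitivity discussed under Challenges and opportunities.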

Challenges and opportunities
The methods developed for this investigation help reveal patterns in complex human-environment interactions. The analysis focused predominantly on improved quantification methods for detecting the physiological arousal level and on a means of correlating arousal level with environmental stimuli. This approach allows us to observe an increase in physiological arousal in response to specific environmental conditions (Section 5). The primary challenge of this study was selecting appropriate tuning parameters to quantify and evaluate the arousal label. For example, the accuracy of the methods (Fig. 6) varied with the quantification rate. Similarly, the accuracy depends on the procedure and threshold adopted for detecting the number of skin conductance responses (nSCRs) [6]. Moreover, we captured only 9 features of a dynamic real-world situation; an increased number of features may therefore further improve the predictive model's accuracy.
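The threshold sensitivity mentioned above can be demonstrated with a simplified stand-in for nSCR detection. The study's detection procedure follows [6] and is not reproduced here. This sketch instead counts trough-to-peak rises whose amplitude reaches a threshold, and labels a window "A" (aroused) if nSCR > 0 and "N" (normal) otherwise, matching the labels used in Figs. 7 and 9. The rule, the default of 0.01 µS, and the helper names are illustrative assumptions, not the values or code used in the study.

```python
def count_scrs(eda, threshold=0.01):
    """Count trough-to-peak rises of at least `threshold` (µS) in one EDA window."""
    count = 0
    trough = peak = eda[0]
    for x in eda[1:]:
        if x >= peak:
            peak = x  # still rising (or flat): extend the current peak
        else:
            if peak - trough >= threshold:
                count += 1  # the rise that just ended qualifies as an SCR
            trough = peak = x  # start tracking a new trough
    if peak - trough >= threshold:  # a qualifying rise still in progress at window end
        count += 1
    return count


def arousal_label(eda, threshold=0.01):
    """'A' (aroused) if the window contains at least one SCR (nSCR > 0), else 'N'."""
    return "A" if count_scrs(eda, threshold) > 0 else "N"


# The same window changes label when only the threshold changes.
window = [0.0, 0.03, 0.02]
low_threshold_label = arousal_label(window, threshold=0.01)   # rise of 0.03 counts
high_threshold_label = arousal_label(window, threshold=0.05)  # rise of 0.03 ignored
```

Because the output feature of the predictive models is this label, any shift in the threshold propagates directly into the reported accuracies.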
Future studies can build on the presented experimental design and quantification methodology. For instance, they can extend it to capture citizens' public transport commuting experience (physiological responses while walking, waiting, and riding). For traffic safety, the method could potentially be applied to understand the physiological arousal patterns of vehicle riders as they ride through cities [10,34]. Moreover, when combined with isovist values and environmental data measured beyond the selected path, the developed predictive model could be used to extrapolate citizens' potential arousal levels to a larger geographic area.
In this research, we identified factors influencing human perception. To address the challenges referred to above, our findings suggest that a virtual reality set-up could help reduce noise induced by unknown factors. Additionally, a subject-specific skin conductance threshold could be employed to mitigate these challenges.
Moreover, in the field of urban studies, it is crucial to understand how the built environment influences human behavior and perception. This question has long been central to practice and research. It poses a fundamental methodological problem because it is especially difficult to a) objectively measure perception and b) deal with the multitude of dynamic environmental factors that prevent identifying the effect of pure urban form on human perception. As an answer to this problem, this research provides a major contribution by presenting and empirically testing a novel research framework for predicting and inferring the effects of planning decisions on human perception. In essence, the framework provides insights into how and why architecture and urban design influence human perception, which is particularly helpful for evaluating planning proposals and guiding design decisions. For this purpose, we adopt state-of-the-art mobile sensing technologies as well as machine learning methods specifically chosen and adapted for the needs of architecture and urban design research.

Conclusions
This research presented a specific methodology to evaluate a complex dataset from an experiment linking the physiological responses of 30 participants to environmental conditions. The measurements in the dataset came from seven sensors with differing frequencies and four additional geometric features. The proposed data quantification and multi-sensor information fusion methods linked the participants' physiological state of arousal to environmental conditions. Four categories of machine learning techniques (non-inferential modeling, inferential modeling, feature selection, and clustering) revealed patterns in the dataset. The high accuracy of the non-inferential predictive model was evidence that the participants' physiological state is sensitive to changes in environmental conditions. The fuzzy rule-based inferential modeling results indicate that the occurrence of "normal" and "aroused" physiological conditions corresponds to specific values (and ranges of values) of each environmental feature. This suggests that changes in the participants' physiological arousal state primarily occurred due to fluctuations in the environmental conditions. Feature selection showed that some environmental features, such as temperature, humidity, illuminance, and the field-of-view, were more dominant in their influence on participants' physiological responses than sound level and dust. Pattern analysis with the self-organizing map indicated that, primarily, participants who experienced similar environmental conditions responded with a similar physiological arousal state. Finally, geo-location referencing of the average physiological response across all participants produced a means to visually inspect how participants responded during the actual walk in relation to permanent urban features. The proposed data analysis framework revealed patterns in complex spatial-temporal environmental and physiological data that inform our understanding of urban settings.
EDA signals with the profiles shown in Figs. 3a, 3b, 3c, and 3d were considered for the data analysis. EDA signals belonging to the two erroneous profile types illustrated in Figs. 3e and 3f were discarded; in total, 10 EDA signals were discarded. The erroneous EDA signal types were classified as: (a) Type-1 error, when the EDA signal values only fluctuate between two values, i.e., the EDA signal behaves like a step function, possibly also with a significant amount of sensor loss (no sensor response recorded); (b) Type-2 error, when the majority of the sample values are zero (significant sensor response loss), despite otherwise normal fluctuations (correct sensor response) in the EDA signal.
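The two error definitions translate directly into a screening rule, sketched below. The exact cut-offs are one reading of the textual definitions rather than the authors' code. A signal with at most two distinct recorded values is treated as Type-1, and a signal in which more than half of all samples are zero is treated as Type-2. Representing lost samples as `None` and the name `classify_eda` are assumptions.

```python
def classify_eda(signal):
    """Classify an EDA recording as 'type-1', 'type-2' or 'clean'.

    `None` marks a lost sample (no sensor response recorded); this encoding is an assumption.
    """
    recorded = [x for x in signal if x is not None]
    if len(set(recorded)) <= 2:
        return "type-1"  # step-function-like: values switch between at most two levels
    zeros = sum(1 for x in signal if x == 0)
    if zeros > len(signal) / 2:
        return "type-2"  # most samples are zero despite otherwise normal fluctuations
    return "clean"


# Keep only recordings that pass the screen (placeholder recordings, not study data).
recordings = [
    [1.0, 1.0, 1.2, 1.2, None, 1.0],
    [0, 0, 0, 0, 1.1, 1.2, 1.3],
    [1.0, 1.05, 1.1, 1.08, 1.2],
]
kept = [s for s in recordings if classify_eda(s) == "clean"]
```

Running the screen before smoothing and quantification keeps artifact-dominated recordings from producing spurious arousal labels.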

Fig. 3: Signals in (a), (b), (c), and (d) show the most commonly found EDA signal profiles, which were considered for the analysis. The most commonly found erroneous signals are shown in (e) and (f).

Fig. 4: Stationary Wavelet Transform based smoothing. (a) Wavelet transform of an original EDA signal using the Haar wavelet, and smoothing by applying a threshold to the wavelet coefficients. (b) Original and smoothed EDA signal with filtering of corrupt and unnecessary fragments.

Fig. 5: (a) Timestamps indicating the start and end of a participant's walk during the study, illustrating the approach used to quantify a participant's physiological response and environmental experience data. (b) Timestamp and time-window marking of an EDA signal (physiological response) every t seconds for the detection of arousal $r_i^{p_j}$ for $i = 1, \ldots, m_j$.

Fig. 6: ROC graph of classification models on two categories of datasets, represented by two different shapes: squares and circles. Squares represent datasets prepared with the quantified smoothed EDA data as the output feature; circles represent datasets prepared with the quantified original EDA data as the output feature.

Fig. 7: Visual interpretation of the fuzzy rules. Red indicates the range for which the fuzzy rules find nSCR > 0, i.e., an indicator of an aroused physiological state. Blue indicates the range for which the fuzzy rules find nSCR = 0, i.e., an indicator of a normal physiological state. White indicates a range of fuzziness. Gray indicates the range for which the rules do not provide any conclusive information.

Fig. 8: Hierarchy of feature importance. The symbol I* appears only in the REP-Tree based feature selection. The feature set {T, R, A, I} appears in the results of all three predictors.

Fig. 9: Trained SOM results; node values in the maps are indicated by color: the lowest value is shown in dark blue and the highest value in bright yellow. (a) F-matrices: maps for the environmental features, which were linearly scaled to a variance of 1.0 so that they have equal importance in clustering. (b) U-matrix: SOM clustering map. (c) L-matrix: map of participant IDs and physiological response state labels ("N" and "A").

Fig. 9c is an L-matrix in which each node is labeled with a participant ID and the state of physiological response. White nodes indicate a normal physiological response, and blue nodes indicate an aroused physiological response. By comparing these matrices, one can discover relevant patterns in the organization of the dataset. Comparing the U-matrix with the F-matrices in Fig. 9a, we find that the clusters at the bottom-left and the top-left in Fig. 9b result from high values of sound and temperature and extremely low values of illuminance. When compared to the L-matrix in Fig. 9c, these clusters indicate that the majority of participants responded with an aroused physiological state. Similarly, the cluster on the top-right is due to a combination of low values of dust and temperature; in the corresponding region of the L-matrix in Fig. 9c, the majority of nodes indicate a normal physiological state. Further, the F-matrix for Isovist area in Fig. 9a shows that high Isovist area values coincided with an aroused physiological state, as is also evident from the L-matrix in Fig. 9c. The L-matrix also indicates that participants with IDs 16, 23, 24, 29, 32, and 35 experienced such a high Isovist area and responded with a similar physiological state.

Table 1: Measured features in the study.