11 The Human as the Mind in the Machine: Addressing Big Data



'Implicitly' Processing Complex and Rich Information
Over the past decades, researchers have become increasingly interested in how implicit cognitive processing works and how it influences our knowledge and behaviour. It has been claimed that a key element in the definition of implicit processing is the absence of conscious awareness of stimuli, or of patterns within stimuli. In contrast to explicit processing, which concerns those aspects of cognition and perception that are available to an individual's conscious awareness, "implicit processing refers to the acquisition of information expressed [...] in the absence of subjective awareness of the information acquired" (Morton & Barker, 2010, p. 1090; Faulkner & Foster, 2002; Reber, 2013).
Much evidence supports the distinction between implicit and explicit processing on a number of distinct axes. In the cognitive domain, Zimmerman and Pretz (2012) refer to explicit processing as rule-based, analytical and deliberate, and as dependent on awareness, attention, and effort, whilst describing implicit processing as associative, holistic, unaware, and effortless. Following this definition of implicit processing, these authors hypothesised that individuals can respond appropriately to complex relations without conscious effort and, more controversially, that the quality of scientific problem solving can sometimes benefit from operating in the absence of the goal to find a rule. Along these lines, Dijksterhuis famously described this hypothesis as the 'deliberation-without-attention effect' (DWAE), which proposes that, under certain specific circumstances, as the complexity of a decision increases, the quality of a decision made at the conscious level decreases. In contrast, decisions made at the unconscious level can be less affected by task complexity, so that unconscious thought (i.e. deliberation without attention) may actually lead to better choices when encountering complex issues. Supporting this hypothesis, Dijksterhuis conducted four studies on consumer choice. In one study (Dijksterhuis, 2004), participants were asked to choose one of four apartments based on their attributes: one apartment was made to be attractive; another was made to be less appealing; the remaining two were average, with the same number of positive and negative attributes. In the experiment, participants were either asked to choose their favourite immediately, or to think about the apartments for three minutes and then state their preferences. Participants in a third condition were asked for their preferences after being distracted for the same three minutes, thereby engaging in what is called unconscious thought.
As predicted by DWAE, those who engaged in unconscious thought made better decisions than those who were given three minutes to think. Furthermore, those who were asked to choose immediately chose more poorly than either of the thought groups. The results across the four studies were taken as support for the idea that conscious deliberation works very well for simple tasks involving a small amount of information, presumably because conscious deliberation, despite its limited capacity, can weigh a small number of attributes in a rational and efficient manner. However, when one is presented with large amounts of complex information, unconscious deliberation can be more effective and more likely to lead to a satisfying choice than conscious deliberation alone (Dijksterhuis & Van Olden, 2006).
The DWAE proposed by Dijksterhuis is only one of the empirically testable hypotheses that were used to describe a broader theory called unconscious thought theory (UTT), in an attempt to locate unconscious cognition within the broader decision-making literature. According to UTT, conscious thought (CT) and unconscious thought (UT) have different characteristics and can be applied in different circumstances. The key difference between these two modes of thought is the presence (for CT) and absence (for UT) of attention. During conscious thought, attention is directed toward the task at hand, and the problem is thoroughly considered before making a decision. In contrast, unconscious thought is characterised by attention that is directed away from the problem, and the problem is considered through processes outside of conscious awareness, resulting in deliberation-without-attention. The empirical support for UTT is still evolving, with some authors claiming that the early findings in support of DWAE can be explained by other mechanisms (e.g., Lassiter, Lindberg, González-Vallejo, Bellezza, & Phillips, 2009). However, the basic idea remains powerful: there is simply no reason to suppose that explicit conscious processing should always be more effective than implicit unconscious processing.
According to the work of Gigerenzer and his group (1999), simple heuristics often perform better in complex decision situations than more elaborate strategies, perhaps because they can implicitly select the most relevant and robust aspects of the situations instead of explicitly taking all details into account.
The distinction between implicit and explicit processing is an important issue in consciousness research because it can provide for a better understanding of the processes underlying human behaviour. A fundamental question here concerns the extent to which brains can implicitly process information that is subliminally presented. Subliminal perception refers to a person's ability to perceive and respond to stimuli that are presented below the threshold for conscious perception.
Numerous studies and reviews of subliminal perception support the contention that some stimuli that are not consciously identified can influence the affective reactions of individuals, their brain activity and behavioural performance (Tsushima, Sasaki & Watanabe, 2006; Dupoux, De Gardelle & Kouider, 2008). Subliminal priming studies indicate that masked words can activate cognitive processes associated with the meanings of the words and thus have an influence at both perceptual and semantic levels (Allport, 1977; Kouider & Dehaene, 2007). Masking experiments with Chinese characters demonstrate such effects already at the first stages of processing of visual information (Elze, Song, Stollhoff, & Jost, 2011). Priming from masked stimuli has been shown not only with words, but also with auditory stimuli, pictures and videos. In a study on the effects of subliminal smiles on evaluative judgements (Tamir, Robinson, Clore, Martin & Whitaker, 2004), participants were led to believe that they were playing a competitive game in alternating runs and were then asked to passively watch videos of their own performances and those of their competitor. While they watched the videos, subliminal smiles or frowns were flashed (for 32 milliseconds) and immediately masked. The results revealed that participants rated their performances as better when their own performances were paired with smiling faces and their opponent's performances with frowning ones, which created the belief that they were doing well and their opponent was doing poorly.
There is also plentiful support for the notion that people's performance on learning tasks is affected by subliminal stimuli (Cleeremans & Sarrazin, 2007). This nonconscious acquisition of information allows for the development of procedural knowledge that may be important for a wide variety of cognitive functions (e.g., drawing inferences, and triggering emotional reactions) and evident in performance rather than in conscious recall. Lewicki, Hill and Czyzewska (1992) suggest that the procedural knowledge about complex information involves an advanced and structurally more complex organization that exceeds the one used in any consciously controlled inferences. These authors concluded that "our nonconscious information-processing system appears to be incomparably more able to process formally complex knowledge structures, faster, and 'smarter' overall than our ability to think and identify meanings of stimuli in a consciously controlled manner" [p. 801].
More recently, many other researchers have investigated the superiority of unconscious thought in processing, integrating, and searching through massive quantities of complex information (Payne, Samper, Bettman, & Luce, 2008; Smith, Dijksterhuis & Wigboldus, 2008). Apart from optimising complex decisions, implicit processing has also been found to facilitate divergent thinking, i.e. accessing a wide range of prestored information to produce more novel ideas, which improves the creative quality of unconscious thought (Zhong, Dijksterhuis & Galinsky, 2008).
Given the considerable evidence on the important role played by unconscious cognition and perception in the presence of rich information, it is not surprising that this concept is gaining interest in a broad range of domains where effective responses to large amounts of information and complex decision making are required. For example, in many professional occupations, such as air traffic control and stock market trading, a model of rapid decision making called 'recognition-primed decision' (RPD) is often employed to interpret and facilitate the flow of decision making. The RPD model stipulates that people integrate vast numbers of experiences within a given domain into patterns and recognise situational patterns within that domain. When they are faced with complex information or situations, this situational pattern recognition process should allow them to make fast decisions without invoking full awareness and conscious deliberation (Klein, 2008).

CEEDs Project
Recent years have seen the physical environment becoming more and more saturated with sensing technologies that interact with users. A growing number of computing and communication entities are capable of sensing information from both the environment and users and of responding to appropriate stimuli. A major problem raised by this technology-rich scenario is the rapid increase in the amount of information available and the difficulty of comprehending it (i.e., the so-called big data challenge). Experts in many specialist areas have to deal with large-scale, diverse and complex data sets which require efficient data-intensive decision making, and they mostly rely on computational, statistical and data mining tools and techniques to find meaning in these large data sets. The big data phenomenon emphasises a need for research driving a new category of interfaces that considers human implicit responses as an additional source of information to support human experience, understanding, and effective decision making in the context of large data sets.
Based on this accumulating evidence for the influence of implicit forms of information processing in shaping explicit conscious experience, and in guiding our action and decision making, a European Commission Framework 7 funded project called Collective Experience of Empathic Data Systems (CEEDs) has, over the last four years, researched and developed new tools for interacting with rich and complex information that will assist our everyday decision making and information foraging (Wagner et al., 2013; Lessiter, Miotto, Freeman, Verschure, & Bernardet, 2011; Verschure, 2011). To address this goal, the CEEDs technology monitors users' implicit responses when they are experiencing innovative visualisations of large data sets. Using this information, CEEDs technology automatically infers the extent to which users are surprised, satisfied, interested or engaged by a part of the data, even if they are not aware of being so. These implicit responses then guide users to discover patterns and meaning in the data sets.
The gearing mechanism at the core of this technology is called the CEEDs 'Sentient Agent' (CSA), a computational model of the implicit and explicit factors underlying consciousness. The CSA derives indices of unconscious processing from signals such as heart rate, skin conductance (arousal), pupil dilation and brain activity, and defines how these feedback cues are presented to users to guide them through a complex data space. The CSA is modelled after the Distributed Adaptive Control theory of mind and brain (see Verschure, 2012, for a review).
CEEDs thus enhances the overall user experience of interacting with large data sets by merging the initial data representations with the changes made interactively in response to the user's implicit and explicit cues (e.g., gesture, verbal responses/speech, motion), as well as the intentional actions of the CSA based on predictions about the user's behaviour. Additionally, the behaviour of multiple users can be combined to collectively control the data presentation.
Among the wide range of data-rich domains that could benefit from a more human-experience-centred solution for exploring massive volumes of information, the CEEDs project identified a number of real world applications that are being used as test-beds for validating its approach and effectiveness at both scientific and technological levels. Below is a brief description of how implicit user reactions are combined with explicit reactions, and used to guide discovery of patterns and meaning in selected CEEDs-relevant domains.
-Archaeology: for archaeologists, the ability to sift automatically through pottery samples recovered from ancient cities is becoming a necessity. Classification is done manually by pottery experts and by archaeology students under their supervision, who methodically analyse the samples piece by piece and assign them to the established type/variety classification system. Not surprisingly, a huge effort is put into this process, which can take from weeks to years to complete for a typical site. This also leads to the fundamental challenge of how to integrate these residues of ancient cultures into an understanding of their patterns of organization and interaction (such as functional zoning within ancient cities). Currently this remains intuitive and problematic, and solutions beyond standard technological ones are needed. In order to address these challenges, the CEEDs project proposes a two-step process based on sherd classification and statistical analysis of pottery datasets (Piccoli et al., 2013). Firstly, eye and gaze tracking are used to identify the parts of the sherd which gained most of the user's attention when attempting to classify it. These parts are matched by CEEDs with, for example, pots that share similar parts and shapes, and as a result, the most similar pots are retrieved and returned one by one to the user. CEEDs then assesses user satisfaction with the presented pots and filters out those with which the user is not satisfied, thus reducing the time needed for pottery classification. Secondly, collections of artifacts gathered from across the whole surface of a city can be data mined using statistical packages to reveal clusters of associations of artifact types with particular areas of the town. The clusters revealed statistically can be compared with groupings based on experts' intuitive functional classes, and this could confirm or challenge the traditional expert system approach to mapping the infrastructure of the ancient site.
-Neuroscience: a goal for current neuroscience is to describe the "connectome" (Sporns, 2011), a comprehensive map of neural connections in the brain. Current versions of the connectome represent complex networks composed of hundreds of brain regions and their white-matter connection pathways, which are impossible to understand without the aid of models and data analysis techniques. The CEEDs neuroscience application supports the visualisation of large connectome data sets in virtual, synthetic reality environments. This allows scientists to virtually explore the intricate brain connectivity (both at the structural and functional level) and disclose different layers of complexity, test dynamical models and manipulate them in real time. These manipulations are displayed in an immersive real-time environment to support new discoveries, in particular, patterns of activation in the brain (see Figure 11.1).
Figure 11.1: CEEDs eXperience Induction Machine, an immersive space equipped with a number of sensors and effectors (retrieved with permission from "BrainX3: embodied exploration of neural data" by Betella et al., 2014)
For example, immersed in the CEEDs eXperience Induction Machine (CXIM), a neuroscientist can explore the connectome to better understand brain structures and dynamics. She might do this by navigating in the CXIM, moving her own body towards the desired point in the dataset representation. Her behaviour will be constantly analysed by the CSA, which interprets the user's implicit (e.g., galvanic skin response, heart rate, pupil dilation) and explicit (e.g., CXIM tracking data, gaze) signals obtained using special sensing technology worn by the user (e.g., sensing glove, shirt, eye tracking) or present in the CXIM environment (e.g., full body tracking system). The CSA uses these cues to build a model of the user's arousal and cognitive load and changes the data presentation accordingly.
For instance, when the user's arousal increases, the system records the position of the user in the dataset (location, gaze, options enabled, and path) and presents other sub-datasets with similar features. In another example, high levels of arousal or eye movement patterns signalling attentional overload would be used to 'dial down' the overall complexity of the dataset presentation, for example by thresholding the level at which connections are shown. First pilots have shown that naive users understand a connectome better in the CXIM than in the state-of-the-art desktop tool.
-History: understanding the significance of the Holocaust remains of great importance today as it can help people engage in a critical reflection about the root causes of genocide, and in turn prevent the recurrence of such atrocities. However, the presentation of this event is problematic because of the vast and diverse amount of information available and, in certain circumstances, the lack of original structures remaining on memorial sites. For example, in concentration camps such as Bergen-Belsen in Germany, visitors have to rely upon their own imagination and interpretation of available information in order to picture the camp and events in their mind, as the site is now largely an empty field. This induces a dissonance which subverts the visitor's intention to re-imagine the events of the past. To address this problem, the CEEDs project developed a 'history' application that supports the acquisition and presentation of data representing key aspects of both the Bergen-Belsen concentration camp and the Holocaust. A virtual reconstruction of the space was developed to support public awareness and understanding of the significance of Bergen-Belsen. A mobile application (Figure 11.2) was also developed that allows visitors to navigate through a virtual reconstruction of the concentration camp displayed on a handheld device (iPad) while moving through the memorial site. Visitors can decide to follow different subject perspectives (e.g., survivors, victims, liberators of the camp, or guards) and discover factual information from these recollections. Implicit information regarding visitor locations can be used by CEEDs to provide visitors with relevant information when approaching locations significant to the selected subject perspective.
Figure 11.2: CEEDs 'history' application (retrieved with permission from "Spatializing experience: a framework for the geolocalization, visualization and exploration of historical data using VR/AR technologies" by Pacheco et al., 2014)
-Product design and commercial retail: the ongoing embrace of virtual, interactive media technology in product design and manufacturing opens new possibilities for testing ergonomic design by providing visual representations and exploration of products and prototypes. As part of this process, the CEEDs approach can be applied to support designers' understanding of their customers' preferences. In a hypothetical research session, potential customers can be located in front of a large interactive screen in one space within the design house and interactively explore the range of virtual products, e.g., white goods. A design team in a second room looks at the same display and sees what customers are looking at, how long they inspect areas, where they point, and where something looks odd or inconsistent with expectations. When appropriate, the design team varies the force required to turn the control (e.g., of what fridges or elements of fridges customers view) in real time and observes how this affects customers' implicit responses (e.g., interest, valence). The CEEDs system may use real-time information about customers to personalise the interactive experience and also induce the user to follow particular paths/journeys through the experience of the product, guided by subliminal stimuli. Indeed, our preliminary research suggests that subliminal priming is effective in guiding people through data in the CXIM (Cetnarski, Betella, Prins, Kouider & Verschure, 2014). This approach can be applied not only to design concept feedback, but also to other scenarios including marketing and sales training, and end consumer information delivery.
For example, one objective is to explore the use of CEEDs technologies to allow customers to develop CEEDs Universal Personal Preferences (CUPP) in retail environments. CUPPs will contain the explicit and implicit preferences of users with respect to their interests and behaviours. With the user's permission, CUPP files may be accessed by customers through their personal smartphones to receive services that are more tailored to their historical behaviours and interests.
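The adaptive behaviour described for the neuroscience application above (recording the user's position when arousal rises, and 'dialling down' complexity by thresholding connections under overload) can be illustrated with a minimal Python sketch. This is not the CSA implementation: the names `Connection`, `DisplayState` and `adapt_display`, the assumption that arousal and connection weights are normalised to [0, 1], and the numeric cut-offs are all hypothetical choices made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    source: str
    target: str
    weight: float  # connection strength, ASSUMED normalised to [0, 1]


@dataclass
class DisplayState:
    threshold: float = 0.2  # connections weaker than this are hidden
    bookmarks: list = field(default_factory=list)  # positions of interest


def visible_connections(connections, threshold):
    """Keep only connections at or above the display threshold."""
    return [c for c in connections if c.weight >= threshold]


def adapt_display(state, arousal, position, *,
                  interest_level=0.6, overload_level=0.85, step=0.1):
    """One update step driven by an arousal estimate in [0, 1].

    The two cut-offs and the step size are illustrative assumptions,
    not values reported by the CEEDs project.
    """
    if arousal >= overload_level:
        # Overload: raise the threshold so fewer connections are shown.
        state.threshold = min(1.0, round(state.threshold + step, 3))
    elif arousal >= interest_level:
        # Raised arousal: record where the user is in the dataset so that
        # similar sub-datasets could be presented later.
        state.bookmarks.append(position)
    return state


if __name__ == "__main__":
    conns = [Connection("A", "B", 0.1),
             Connection("B", "C", 0.25),
             Connection("C", "D", 0.9)]
    state = DisplayState()
    adapt_display(state, arousal=0.7, position="region-X")
    adapt_display(state, arousal=0.9, position="region-Y")
    print(state.threshold, state.bookmarks)
    print(len(visible_connections(conns, state.threshold)))
```

In this sketch a single scalar stands in for the CSA's model of arousal and cognitive load; a fuller model would fuse several signals (e.g., skin conductance, pupil dilation, gaze) before deciding how to change the presentation.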

A Unified High Level Conceptualisation of CEEDs
With such a diverse range of applications, a unified framework for CEEDs was needed to provide a better understanding of the commonalities across all real world applications, and to support the development of in-scope application scenarios, use cases and goals. The process, driven by open discussions with stakeholders and with CEEDs members, and by critical and creative thinking, led to a unified high level conceptualisation of CEEDs uses and a more precise formalisation of what CEEDs 'is'. The resulting framework consisted of two core use cases which specify how CEEDs may be used by potential users from different domains. These two core use cases are:
1. To communicate known meaning (e.g., transferring knowledge to students): with sensitivity to the user's state, including to components of that state of which the user is not aware;
2. To enable discovery (e.g., supporting experts in the discovery of new patterns in datasets): exploiting recognition of user responses (implicit and explicit) according to the user model.
The CEEDs use cases are oriented towards, and may be implemented in, a wide range of domains that deal with complex and very large datasets, and that are in need of better analysis tools. Whether for archaeology, neuroscience, commercial or historical data, the datasets are also characterised as having structures which, without augmentation, are hard to conceptualise. The use cases break down into five primary interdependent components of CEEDs experiences (identified as CEEDs 'core features') and two associated databases. The core features (CFs) are outlined below along with examples of how such features could be applied to the neuroscience scenario described above.
-Associated Database 1: Raw Data Database (RDDB). Exploring large datasets is fundamental to the primary objective of CEEDs. Any CEEDs experience requires an existing raw database from which data are represented (e.g., visualised) and displayed to the user.
-Core Feature 1: The display of a CSA-independent filtered view, perspective or flow of RDDB. CF1 defines the treatment of the raw data as it is displayed to users. It relates to the rules governing cue sequences including how the data is presented, the starting point and route taken. Importantly, CF1 is defined by its independence from the CSA, i.e., how the data is displayed does not require a user model derived from the CSA. Thus, passive sequences of data (akin to a 'fly-through') can be determined by outcomes of non-CSA variables such as 'sort' or 'match' (e.g., typologies), a directorial/producer preference, or a random sequence, and the data can be contextualised using a developer-designed virtual reconstruction. Active interactions between the user and the data displayed would be possible based on (reactive) rules specifying interaction paradigms (e.g., hand flick gesture to browse through object sequences). The way in which the data are presented provides the problem/data space. Example: a complex map of neural connections in the brain is visualised to a neuroscientist inside the CXIM; the neuroscientist can now interact naturally with the visualised data with gestures and body movements.
-Core Feature 2: The collection and storage of users' explicit and/or implicit responses to a dataset. In a CEEDs experience, users respond to datasets based on the output of CF1 or the output of CF4 (tagged dataset), and CF2 reflects the collection and storage of these responses. Raw user responses (e.g., GSR, ECG) are essential prerequisites for: (a) inferring how the user unconsciously interprets the data (i.e., CF3); (b) the CSA to build a user model (defined in CF4); and (c) user response 'overlays' (review) (defined in CF5). Example: while the neuroscientist actively engages with the data by changing the presentation properties (e.g., rotating the brain left and right, zooming in and out), her cognitive workload and arousal levels are collected and stored.
-Core Feature 3: The interpretation and storage of the output of CF2. Raw user responses require interpretation in order to establish whether the display has provoked in the user the desired response, which varies across application scenarios/goals, and to understand what type of response the stimuli elicit (e.g., do the user's responses indicate implicit satisfaction?). In CF3, meaning is inferred through analysis of the pattern of user response data inputs from multiple sources (e.g., EEG, GSR). This information is used to drive CF4 and CF5. Example: the combination of the neuroscientist's physiological reactions to the abstract representation of data suggests that she is not focused or is overloaded with information.
-Associated Database 2: User Response Database (URDB). The URDB stores the outputs of CF2 and CF3; this information, related to the raw data (the RDDB input to CF1), is the input to CF4.
-Core Feature 4: The display based on a user model of a CSA-dependent view, perspective or flow of a raw dataset. The autonomous CSA is a real-time goal driven agent which can control the data displayed and guide data exploration. The CSA coordinates the interaction between the user and the problem/data space. It does this by constructing a user model based on the outputs of CF2 and CF3 and, together with its own interests and intentions, modifies the display to guide users in their data exploration. CF4 defines the presentation of this real time CSA-influenced dynamic perspective of the raw dataset. As with CF1, cue sequences are rule based, but unlike CF1, in CF4 the rules are dependent on the CSA, which may use subliminal and supraliminal influence to guide users through the data. This could be based on, for instance, sort, match or typology functions (e.g., if the goal is to maintain a threshold level of interest or empathy). CF4 is analogous to car satnav systems by which a route is plotted and specified to the driver ("turn left") based on metadata of which the user is unaware (e.g., traffic congestion); that is, the metadata influences the presentation of raw data. Example: the system adapts the visualization and the interaction according to a predefined set of rules to avoid information overload (e.g., increasing the saliency of objects, reducing the field of view of the data, reducing the complexity of the dataset) and boost the exploration process.
-Core Feature 5: The display of users' responses and/or the data on which the CSA is making decisions as an overlay to the output of CF1 or CF4. CF5 defines an alternative representation of the raw data (CF1) or tagged data (CF4) by overlaying the outputs of CF2 and CF3 to allow the user to access an overview perspective. This could be used in contexts where a user wishes to see which user data (responses) have influenced the display, for instance, a professional examining the responses of a group of users, learning how experts classify stimuli, debugging, and general data exploration. In contrast to CF4, in which metadata influences the display without the user being consciously aware of the relationships between their inputs and the output of the display, in CF5 the metadata is displayed. In the car satnav analogy, CF5 is where the driver can see the traffic congestion data in addition to, or instead of, being provided with instructions based on those data. Example: both the information used by the system and the neuroscientist's implicit responses are made available to the user in order to provide transparency and insight into subconscious processing and decision making.
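The data flow between the five core features and the two associated databases listed above can be summarised in a short Python sketch. It is a deliberately simplified illustration under stated assumptions, not the CEEDs architecture itself: the function names (`cf1_display` to `cf5_overlay`), the `UserResponseDB` class, the use of a mean GSR value as the only interpreted signal, and the overload cut-off of 0.8 are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class UserResponseDB:
    """Stand-in for the User Response Database (outputs of CF2 and CF3)."""
    raw: list = field(default_factory=list)
    interpreted: list = field(default_factory=list)


def cf1_display(raw_data, sort_key=None):
    """CF1: CSA-independent view of the raw data, here a plain sort."""
    return sorted(raw_data, key=sort_key)


def cf2_collect(db, item, gsr_samples):
    """CF2: collect and store raw user responses to a displayed item."""
    db.raw.append({"item": item, "gsr": list(gsr_samples)})


def cf3_interpret(db, overload_gsr=0.8):
    """CF3: interpret raw responses (ASSUMED rule: high mean GSR = overload)."""
    db.interpreted = [
        {"item": r["item"], "overloaded": mean(r["gsr"]) > overload_gsr}
        for r in db.raw
    ]


def cf4_display(view, db):
    """CF4: CSA-dependent view; here, items flagged as overloading are hidden."""
    overloaded = {r["item"] for r in db.interpreted if r["overloaded"]}
    return [item for item in view if item not in overloaded]


def cf5_overlay(view, db):
    """CF5: pair every item with the interpreted response that drives CF4."""
    labels = {r["item"]: r["overloaded"] for r in db.interpreted}
    return [(item, labels.get(item)) for item in view]


if __name__ == "__main__":
    db = UserResponseDB()
    view = cf1_display(["region_b", "region_a", "region_c"])
    cf2_collect(db, "region_a", [0.2, 0.3])
    cf2_collect(db, "region_b", [0.9, 0.95])
    cf3_interpret(db)
    print(cf4_display(view, db))
    print(cf5_overlay(view, db))
```

The sketch keeps the separation that the framework stresses: `cf1_display` never consults the response database, whereas `cf4_display` depends on it, and `cf5_overlay` exposes the same metadata to the user instead of applying it silently.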
Across the implementation domains chosen by the project, the CEEDs research project has also uncovered evidence of consistency, for instance, among similar broad classes of users. These were identified and labelled as primary end users (or interactors) and beneficiaries. The former are those users who directly use and interact with the system. For instance, customers are supported in their product choices by CEEDs offering a personalised service based on their own (stored and/or real time) unconscious desires and preferences. As an alternative example, consider a team of neuroscientists attempting to validate/refute models to explain patterns of data. They are supported in this discovery process by CEEDs technology because it harnesses their unconscious responses to different visualisations of those models with the data. Primary CEEDs end users could be both expert/professional users as well as novices.
On the other hand, other stakeholder goals suggested that some CEEDs users could be more correctly classified as CEEDs beneficiaries, as they are (secondary) users of others' data. These are characterised as CEEDs system/database owners who can analyse end user responses to data in a variety of ways. Beneficiaries could use CEEDs user data to optimise displays for different goals (e.g., learning, empathy, sales), or to predict and influence a user's behaviour by understanding their states/plans/intentions in a given context. For instance, design teams may be beneficiaries if they explore their customers' implicit reactions to products to improve product design.

Conclusion
This chapter has provided an overview of the new scenarios enabled by the seamless integration between humans and intelligent technology. This synergistic interaction allows for novel solutions in the fields of data mining and knowledge discovery, and in particular for complex situations and environments requiring difficult decisions and rapid responses. Research has shown that unconscious cognition and perception play an important role in human understanding of rich, complex information. The CEEDs project has adopted an innovative approach to this topic, which relies on two steps: a) monitoring signals of discovery or surprise in unconscious processes when people are experiencing visualisations of large data sets, and b) using such signals to direct users to areas of potential interest and guide the discovery of meaning within the dataset, in real time. The CEEDs approach can be applied in a wide range of scenarios where performance and 'discovery' are hampered by a deluge of data. The data deluge is a source of increasing difficulty in analysis and sense making of data in fields as diverse as astronomy, neuroscience, archaeology, history and economics.
To support identification of use cases and scenarios that are not only possible but also in-scope for CEEDs, a framework of core features that apply across all applications was developed, which captures discrete components of any CEEDs experience. A shared understanding of the commonalities across all CEEDs applications is also important for identifying what can broadly be achieved with CEEDs-like technology independent of any implementation domain.
The broad consistency in the goals that CEEDs technology could support (i.e., supporting insight and adaptability to users' responses to data), in the core elements shaping the CEEDs experience, and in the characteristics of its users, makes CEEDs a unique system, and provides a high level framework that may be used as conceptual inspiration and the basis for future research and development in the new field of human-computer confluence and symbiotic interaction. CEEDs contributes to addressing the question of how people will interface with complex data and the systems that generate and analyse it, by placing human experience at the centre of the solution, thus breaking new ground in the shaping of a synergy between human and machine.