Workflows and individual differences during visually guided routine tasks in a road traffic management control room

Road traffic control rooms rely on human operators to monitor and interact with information presented on multiple displays. Past studies have found inconsistent use of available visual information sources in such settings across different domains. In this study, we aimed to broaden the understanding of observer behaviour in control rooms by analysing a case study in road traffic control. We conducted a field study in a live road traffic control room where five operators responded to incidents while wearing a mobile eye tracker. Using qualitative and quantitative approaches, we investigated the operators' workflow using ergonomics methods and quantified visual information sampling. We found that individuals showed differing preferences for viewing modalities and weighting of task components, with a strong coupling between eye and head movement. For the quantitative analysis of the eye tracking data, we propose a number of metrics which may prove useful to compare visual sampling behaviour across domains in future. © 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Control room design: fundamentals and challenges
Ergonomic control room design requires consideration of many factors such as personnel, systems design or equipment layout (Wood, 2004), is subject to standards such as ISO 11064 (parts 1–7) or BS EN ISO 6385, and is discussed in a substantial literature (Noyes and Bransby, 2001; Ivergard and Hunt, 2008). An important ergonomic design consideration for control rooms is that the available technology has to be aligned with human behaviour, requirements and limitations (Hughes and Kornowa-Weichel, 2004); this helps to maximize the efficiency and effectiveness of operators. Human information processing in control rooms has received scientific interest in domains as varied as air traffic control (Stein, 1992; Endsley and Rodgers, 1996), airplane cockpit design (Steelman et al., 2011), nuclear power plant control (Chang Hoon et al., 2006; Kim et al., 2013) or monitoring of CCTV (Howard et al., 2011; Stainer et al., 2013). Human error as a consequence of insufficient ergonomic workspace design has arguably led to catastrophic accidents such as Three Mile Island, Chernobyl and Bhopal (Meshkati, 1991). The causes of human error are manifold (Kirwan, 1992; Reason, 2000; Dekker, 2014). A possible contributing factor to human error is the incomplete use or false interpretation of visual information; this has, for example, received substantial interest in medical image analysis (Krupinski, 2010), but also in air traffic control (Stein, 1992) and cockpit design (Hanson, 2004). Furthermore, people might not use available visual resources as expected by designers; examples are the discrepancy between expected and observed behaviour in CCTV control rooms (Smith, 2004) and the selective use of available information sources by air traffic controllers (Stein, 1992; Seok et al., 2006). Analysis of operators' visual workflows and preferences allows such discrepancies to be understood and helps in designing towards reliable resource usage.
Most control rooms are built around the presentation of visual information through a multitude of display screens (Hénique et al., 2008; Ivergard and Hunt, 2008; Stanton et al., 2009). In air traffic control, visual information perception has been described as most crucial next to voice communication (Meyer et al., 2013), and this extrapolates to many other control room domains. Activities of operators in control rooms range from making predictions about criminal activity in CCTV footage (Troscianko et al., 2004), to monitoring and control of process control plants (Kim et al., 2013) or evaluating risks of collision in air traffic control (Landry, 2011). In each domain, a core activity of the operator involves interpreting the visual information that is displayed to them in order to infer the state of the system. To understand how this information is used, studies in psychology and ergonomics can apply a methodology called 'eye tracking', which makes it possible to measure where operators are looking, what information sources they use and how they combine information.

Understanding user behaviour and cognitive processes through eye tracking
Humans have to move their eyes to attend to relevant information sources because only central vision provides a high resolution and sharp rendering of a scene (Henderson, 2003;Land, 2006;Borji and Itti, 2014). This is due to the distribution of photoreceptors within the retina and the non-linear representation of this information in the visual cortex (Snowden et al., 2012). Eye tracking allows recording where someone is looking with central vision, which is called the 'point of gaze'; it has been used as part of study design in thousands, if not tens of thousands, of research papers (Tien et al., 2014) and has long been used to inform user interface design (Jacob and Karn, 2003;Poole and Ball, 2006). For example, scan patterns (the sequence of attended regions of interest) and fixation duration can be used to identify sub-optimal layout of interfaces or the perceived importance of individual user interface elements (Pohl et al., 2009;Burch et al., 2011).
Eye tracking has been employed to study operator activity and decision making in a range of control room domains (Moray and Rotenberg, 1989; Lin et al., 2003; Shepley et al., 2009; Moore and Gugerty, 2010). Eye tracking has also been used with a view to detecting operator fatigue and impairment of visual vigilance (McIntire et al., 2014a,b). Moray and Rotenberg (1989), for example, demonstrated that, when dealing with incidents, operators tend to increase the frequency of fixations on the failed system component, rather than increasing the duration of fixations, and that information processing becomes restricted to one information source at the expense of attending to subsequent or parallel incidents. This suggests that operators might not optimally use available resources (Smith, 2004). Analysis of operators' workflows and preferences could aid the understanding of such discrepancies. To date, reports of visual scanning behaviour (the act of moving the point of gaze across a scene) across a wide range of domains are lacking and hence conclusions are often drawn on a case-by-case basis.
Gaze shifts can be executed by eye movement alone or accompanied by head movement (Wollaston, 1824; Bizzi et al., 1972; Morasso et al., 1977; Zangemeister and Stark, 1982; Guitton and Volle, 1987; Goossens and Opstal, 1997; Oommen and Stahl, 2005). Gaze shifts larger than 45–50° visual angle have to be executed by head movement simply because the eyes do not rotate further within the head (Proudlock and Gottlob, 2007; Freedman, 2008). Gaze shifts larger than 75–90° additionally need rotation of the upper body due to the functional limits of head/neck rotation (Proudlock and Gottlob, 2007). For example, in a study of visual search in a mail room, 80% of search time was spent moving head and body rather than only the eyes (Foulsham et al., 2014). Recent work has highlighted that aligning eyes and head, rather than diverting gaze laterally, results in better performance during visual search tasks (Nakashima and Shioiri, 2014). Hence, aligning eye and head orientation is likely beneficial for cognitive information processing. The interaction between eye, head and arm movements has previously been investigated in air traffic control (Boyer, 1995), and so it would be interesting to determine the relationship between eye and head movement in other control room environments. For this paper, the control room environment under consideration concerns road traffic management.

Road traffic control rooms: purpose and goals
Road traffic management involves the monitoring of traffic, responding to incidents and influencing road user behaviour. Given that incidents can contribute to some 25% of the overall congestion levels on major roads (UK Highways Agency, 2009), it is important that any incident is resolved as quickly as possible. Regional Control Centres, such as the 'Direction Interdépartementale des Routes Centre-Est' (DIR-CE) in Grenoble, France, are the central focus of communications regarding major roads. They monitor traffic flow (through CCTV, through verbal reports or through sensor data from the roads or vehicles) and control the Variable Message Signs on these roads. In broad terms, the goals of such a centre can be summarized as follows (Folds et al., 1993): i) maximise the available capacity of the roadway system; ii) minimise the impact of incidents (accidents, debris, etc.); iii) contribute to demand regulation; iv) assist in the provision of emergency services; and v) maintain public confidence in the control centre operations and information provision.

Aims and scope of this study
The present study is a case study, constrained by the availability of staff and environmental factors. The aim of the study was to i) present insight into operator behaviour in a road traffic control room, both from the perspective of qualitative work analysis and quantitative visual sampling analysis, and ii) present a number of eye tracking metrics which we deem useful to compare visual sampling behaviour across domains in future. We use a Hierarchical Task Analysis (HTA) and eye tracking to study information sampling behaviour and workflow in a road traffic management control room. Visual scanning via eye and head movement forms the sensory foundation for decision making and actions in control rooms and would benefit from further exploration, especially in the context of user interface and control room design. The present study provides a rare reference dataset on visual scanning behaviour in a fully operational road traffic management facility.

Control room layout
Data for this study were collected at the road traffic management facility at DIR Centre Est, Grenoble, France. The control room under investigation for this study consisted of multiple displays (Fig. 1). In front of the operator, at arm's length, is an arced arrangement of five monitors, containing the following information sources and components:
D1 ("Display 1"): generic display with access to internet and software packages.
D2 ("Display 2"): user interface (UI) for incident logs. The workflow of operators is systematically guided by this UI, which contains for example dropdown menus and text entry fields to document incident details.
D3 ("Display 3"): interactive schematic map of the road traffic network.
D4 ("Display 4"): live CCTV feed, which the operator can select from a number of available feeds. The selected camera can be controlled by the operator through zooming, panning and rotating.
D5 ("Display 5"): auxiliary schematic interface.
In the far field are two panels displaying CCTV information:
SS ("Small Screens"): bank of eight CCTV feeds arranged in two columns, presenting colour CCTV footage which can be interacted with from the operator's desk (D4). This display is located approximately 5 m away from the operator.
BS ("Big Screens"): large projection of 16 CCTV feeds, arranged as a 4 × 4 array. This display is not interactive and camera views are fixed. This display is located at the same distance as SS.
On the table (TA) in front of the operator are standard PC peripherals and phones/radios (PH) for communication with stakeholders from outside the facility such as traffic patrol staff and emergency services personnel.

Scenario
DIR Centre Est is a fully operational, live traffic control centre, which ruled out interrupting incoming data with a prerecorded scenario. Instead, we captured the resolution of two live incidents (one broken down lorry and one broken down motorbike) by two different observers and asked another three operators to engage in a simulated task ('object in the road') during a period of less intense activity. The live incident concerning the lorry presented a situation where a lorry had broken down near an exit on a main road, causing congestion (Fig. 2a). This incident took 15 min to resolve and included liaising with authorities, managing traffic and status updates. The live incident concerning the broken down motorbike presented a situation where a motorbike had presumably been in an accident and was on a quiet road close to a slip road (Fig. 2b). This incident took 6 min to resolve and included communication with the police, situation monitoring and status updates. In the simulated task of 'object in road', operators were asked, after a pretend call, to resolve a situation where an object was found in a road. The rationale behind this simulated approach was that operators were assumed to navigate this task in a manner representative of their 'average' behaviour, while visual search would be required independent of the presence or absence of an object in the road.

Approach
To understand visual sampling behaviour in context of a person's goal structure, a Hierarchical Task Analysis (HTA) was performed, which is a common technique for goal decomposition in the UK (Annett et al., 1971;Shepherd, 2001). Through a combination of observing operators and discussion with subject matter experts (i.e., the operators, their managers and staff in other control centres), we identified those subgoals that are important in the activity of Road Traffic Management.

Identified subgoals and plans
For the operators' main goal of 'respond to incident' (goal 0), the following seven subgoals were identified (the full HTA is shown in Fig. 3): Subgoal 1: Receive Notification. The operator typically responds to an incident notification at the start of an engagement. This could arrive through different media (phone, radio, auto-detect from CCTV etc.) and the operator would make sure that the notification was from a credible source. If the incident is felt to be sufficient to require a response, an incident log is created.
Subgoal 2: Determine incident type. The operator needs to classify the incident in terms of its type. The operators spoke of {accidents, congestion, obstacles and incidents, road works} as examples of incident type. The initial notification would have provided some information about the type of incident but the operator might seek further clarification from CCTV or from colleagues at the scene over the phone. Having classified the incident, the operator updates the incident log.
Subgoal 3: Determine incident location. In this control room, there are two distinct strategies for determining incident location. The operator could, on the basis of information in the initial incident report, select a CCTV camera close to the incident and then use this to look for the incident. If the CCTV shows the incident then the location of the CCTV and the nearest exit are recorded. Alternatively, the operator might refer to the schematic map screen to define the most likely location and then use this information to select a CCTV to confirm this location. While the tasks are not complicated, this shows how operators could develop different strategies for achieving the same subgoal.
Subgoal 4: Determine incident impact. Having defined a specific incident at a specific location, the next subgoal is to decide what impact this incident will have on the performance of the road network. This can involve the operator relating the type of incident to particular consequences. The operators spoke of {risk, safety, journey time, average speed, traffic density and congestion, changes in demands on the road}. An understanding of the factors which contribute to the current situation, such as weather, road conditions and traffic conditions, and of how these factors are likely to change during the course of the incident, influences the operators' ability to develop a coherent mental representation of the situation, i.e. situation awareness (Endsley and Rodgers, 1996; Moore and Gugerty, 2010).
Subgoal 5: Initiate response. As the operator completes the incident log, the software system used by the centre shows the options available regarding the type of incident. The operator could accept the system prompt or could select another option. The response could involve changing the content of overhead signs on the road (either in terms of changing speed limits, indicating lane closures or providing advisory signs). The content of the signs is pre-defined and the operator selects from this set. If the incident cannot be dealt with by a sign from the set, the operator is not empowered to improvise or create new content; rather, new content needs to be approved by the relevant agencies. As well as modifying the signs, the operator can limit access to the road network (through control of junctions). Finally, the operator could call for road-side assistance to attend the incident. This assistance could help the vehicle or could be specialised support (paramedic or technical).
Subgoal 6: Monitor road user compliance. When a response has been initiated, the operator will check that the incident has been resolved and also that the road users affected by the response are complying with it.
Subgoal 7: Close incident log. Once the incident has been resolved, the operator closes the incident log. We noted that operators (and the control room) would have some incidents that remained open. This might be because it was taking longer to resolve than expected due to unforeseen circumstances or because the incident was open for a scheduled reason, such as road works.
Fig. 4 shows the correlation between smoothed gaze data and the different HTA stages and actions for three example participants. Gaze data were interpolated to 0–100% task time. For bins of 1% increments, the ROI is shown which was attended to most within the bin. Example plans for the identified subgoals are shown in Table 1.
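The binning described for Fig. 4 can be sketched as follows. This is a minimal illustration, assuming gaze has already been mapped to one ROI label per frame (`None` where no ROI was assigned); the function name and the tie-breaking rule are assumptions, and the smoothing step mentioned in the text is not reproduced.

```python
from collections import Counter

def dominant_roi_per_bin(roi_per_frame, n_bins=100):
    """Return the most-attended ROI in each of n_bins equal slices of task time.

    roi_per_frame: list of ROI labels, one per frame (None = no ROI assigned).
    Ties are resolved in favour of the ROI seen first within the bin
    (an arbitrary choice made for this sketch).
    """
    n = len(roi_per_frame)
    result = []
    for b in range(n_bins):
        start = b * n // n_bins          # first frame of this bin
        end = (b + 1) * n // n_bins      # one past the last frame of this bin
        window = [r for r in roi_per_frame[start:end] if r is not None]
        result.append(Counter(window).most_common(1)[0][0] if window else None)
    return result

# Hypothetical recording: 300 frames on the incident log, then 300 on the CCTV bank.
frames = ["D2"] * 300 + ["SS"] * 300
timeline = dominant_roi_per_bin(frames)  # 100 bins, each covering 1% of task time
```

Normalising to percentage task time in this way allows recordings of different length (6 min versus 15 min here) to be plotted on a common axis.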

Participants
The DIR-CE control room is staffed by six trained personnel (plus two managers), working two shifts (7am to 3pm and 2.30pm to 9.30pm). Over the course of a single day, the study recruited five out of six staff members. Participants ranged in age from 35 to 40 years; four participants were male and one was female. Personnel had been working for DIR-CE for at least 5 years.
Personnel had been recruited for the role by competition: the criteria for selection were rigorous, favouring people who have good analytical skills and excellent responsiveness.

Collection of eye tracking data
This study was approved by the University of Birmingham Ethics Panel (Reference Number ERN_13-0997). All participants received an explanation of the task. Each operator was equipped with a monocular mobile eye tracker (Tobii Glasses v.1, Tobii, Sweden). The sampling frequency of the system was 30 Hz. Data were collected according to the system's standard operating procedures, including a 9-point calibration and placement of infrared reference markers around the regions of interest (ROIs) within the control room. Data collection time was around 6 min with the exception of one real incident (participant 1) that lasted almost three times as long.

Data mapping
The recorded raw gaze data were automatically mapped to global space using the subset of infrared (IR) reference markers visible in each frame through custom-written code and a set of conditional pre-processing steps (please refer to Supplementary Material 1 for technical details). Mapping had to be executed outside the proprietary software, since due to the room layout only 2 IR markers were captured in a substantial number of frames, while the proprietary software requires 3 visible IR markers. Head orientation was approximated by mapping the ROI in which the centre of the CMOS was located to each frame. The analysed data range started with an incoming call reporting an incident and finished with the operator indicating task completion and finishing entries on the incident log (on screen D2). Times during which operators engaged in unrelated activity, such as talking to a colleague about unrelated matters, were cropped out.

Data analysis
The scan pattern of each participant, i.e. the sequence of attended regions of interest (ROIs), was analysed to quantify how long individual operators attended to specific information sources, how often they switched between sources and how information sources were combined.

Viewing behaviour
We calculated several metrics for analysis of the gaze data, some of which are conventional eye tracking metrics (Holmqvist et al., 2011) and some of which were borrowed from network analysis (Scott, 2013). Network analysis metrics were chosen to quantify how information sources were combined: network analysis facilitates the quantification of connections, rather than sequential or cumulative properties of a scan pattern.
Conventional metrics:
1. Cumulative percentage viewing time per ROI. Sum of frames for which a participant attended to an individual ROI divided by the sum of all frames for which eye tracking data was available. This metric shows how, in total, participants split their time between the available information sources.
2. Number of visits per ROI. Count of each dwell (continuous attention) per ROI for ≥ 5 consecutive frames (167 ms; based on the minimum viewing time of 150 ms required to process and understand a complex pattern (Thorpe et al., 1996)). This metric shows how often a participant returned to an information source.
3. Dwell time. Time rested per visit, here calculated across all ROIs. This metric gives an indication of how long participants rest on an information source before attending to the next one.
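The three conventional metrics can be sketched in a few lines. This is an illustrative sketch only, assuming one ROI label per frame with `None` for untracked frames; the function names are hypothetical, and how the original analysis treated runs shorter than five frames (dropped here) is an assumption.

```python
from itertools import groupby
from statistics import median

FPS = 30        # sampling frequency reported for the eye tracker
MIN_FRAMES = 5  # minimum dwell length: 5 frames, about 167 ms at 30 Hz

def dwells(roi_per_frame, min_frames=MIN_FRAMES):
    """Collapse per-frame labels into (roi, n_frames) dwells of at least min_frames."""
    runs = [(roi, sum(1 for _ in grp)) for roi, grp in groupby(roi_per_frame)]
    # Assumption: shorter runs and untracked runs are simply discarded.
    return [(roi, n) for roi, n in runs if roi is not None and n >= min_frames]

def viewing_time_percent(roi_per_frame):
    """Metric 1: percentage of tracked frames spent on each ROI."""
    tracked = [r for r in roi_per_frame if r is not None]
    return {roi: 100.0 * tracked.count(roi) / len(tracked) for roi in set(tracked)}

def visit_counts(roi_per_frame):
    """Metric 2: number of dwells (visits) per ROI."""
    counts = {}
    for roi, _ in dwells(roi_per_frame):
        counts[roi] = counts.get(roi, 0) + 1
    return counts

def median_dwell_seconds(roi_per_frame, fps=FPS):
    """Metric 3: median dwell duration in seconds, pooled across all ROIs."""
    return median(n / fps for _, n in dwells(roi_per_frame))

# Hypothetical recording; the 3-frame glance at D2 is too short to count as a visit.
frames = ["D2"] * 30 + ["SS"] * 60 + ["D2"] * 3 + [None] * 10 + ["D2"] * 30
```

The median is used for dwell time because dwell durations are typically skewed, with a few very long dwells.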
Network analysis metrics were calculated using the freely available 'Matlab toolbox for network analysis' released by MIT or self-written code. The terminology in network analysis is as follows:
node: point in network, equivalent to a region of interest (ROI);
edge: connection between two nodes, equivalent to a gaze shift;
undirected network: independent of gaze shift direction, e.g. a transition of D1 to D2 or D2 to D1 is treated as the same;
directed network: the gaze shift direction does matter.
The network analysis based metrics were:
4. Number of edges/link density and inclusiveness. Link density is a measure of the number of edges within the network as a fraction of the total number of possible edges. Inclusiveness counts the number of nodes which are connected within a network, i.e. attended to. Link density quantifies how an observer sequentially combines information sources: a low link density indicates that an observer attends to sources in a very ordered and cyclical way. In contrast, a high link density suggests a random scan pattern, as all sources are combined in a non-systematic way and are hence connected to many other sources. Inclusiveness is a simple measure of the number of ROIs which an observer attended to; if gaze was not directed at a ROI, it would not be included in the network.
5. Nodal degree. Number of edges connected to a node. Nodal degree allows exploring the function of different ROIs within the observer's work pattern: a ROI with a high nodal degree could for example be central to the work process as it may require the assimilation of information from several different ROIs, or it could indicate that the ROI holds information relevant to several other ROIs. A ROI with a low nodal degree would in contrast indicate that the information source is only attended to for very specific purposes.
6. Leaf nodes. Count of nodes that are only connected to one other node. This metric identifies ROIs that serve a very specific function, such as a CCTV monitor attended to only to check entries on a schematic diagram.
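A minimal Python sketch of the network metrics is given below. It is not the Matlab toolbox used in the study. The function name, the normalisation over all ROIs listed in `all_rois`, and averaging nodal degree over connected nodes only are assumptions made for illustration. Every ROI in the visit sequence must appear in `all_rois`.

```python
def network_metrics(visit_sequence, all_rois, directed=False):
    """Compute simple network metrics from an ordered sequence of ROI visits."""
    edges = set()
    for a, b in zip(visit_sequence, visit_sequence[1:]):
        if a != b:  # a gaze shift between two different ROIs
            edges.add((a, b) if directed else frozenset((a, b)))

    n = len(all_rois)
    # Number of possible edges: ordered pairs if directed, unordered pairs otherwise.
    possible = n * (n - 1) if directed else n * (n - 1) // 2

    degree = {roi: 0 for roi in all_rois}
    neighbours = {roi: set() for roi in all_rois}
    for edge in edges:
        a, b = tuple(edge)
        degree[a] += 1  # in a directed network this counts incoming plus outgoing edges
        degree[b] += 1
        neighbours[a].add(b)
        neighbours[b].add(a)

    connected = [roi for roi in all_rois if degree[roi] > 0]
    return {
        "n_edges": len(edges),
        "link_density": len(edges) / possible,
        "inclusiveness": len(connected) / n,
        "nodal_degree": degree,
        # Assumption: the mean is taken over connected nodes only.
        "mean_degree": sum(degree[r] for r in connected) / len(connected) if connected else 0.0,
        "leaf_nodes": sorted(r for r in all_rois if len(neighbours[r]) == 1),
    }

# ROI labels as named in the control room layout; the visit sequence is hypothetical.
ROIS = ["D1", "D2", "D3", "D4", "D5", "SS", "BS", "TA", "PH"]
visits = ["D2", "SS", "D2", "D3", "D2", "TA"]
metrics = network_metrics(visits, ROIS)
```

In this hypothetical sequence D2 acts as a hub with three leaf nodes attached, which is the pattern the text associates with a ROI that aggregates information from several sources.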
Further to the quantitative metrics, we constructed what we term 'viewing networks', a schematic visualization of the frequency of connections between any two ROIs; this type of visualization is often used in network analysis and gives an intuitive insight into which nodes are combined how frequently.
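The data behind such a viewing network are pairwise transition counts, which can be sketched as follows. The function name and the example sequence are hypothetical; drawing the network (for example with edge width proportional to count) is left out.

```python
from collections import Counter

def transition_counts(visit_sequence, directed=False):
    """Count how often gaze moved between each pair of ROIs."""
    counts = Counter()
    for a, b in zip(visit_sequence, visit_sequence[1:]):
        if a == b:
            continue  # not a gaze shift between ROIs
        # Undirected: D2->SS and SS->D2 are pooled under one sorted key.
        key = (a, b) if directed else tuple(sorted((a, b)))
        counts[key] += 1
    return counts

# Hypothetical visit sequence for one operator.
visits = ["D2", "SS", "D2", "D3", "D2", "SS"]
weights = transition_counts(visits)
```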

Eye and head movement
In order to quantify the movement frequency for eyes and head when switching between different ROIs as well as the coupling between eye and head orientation, we calculated the following metrics:
1. ROI switch frequency. Frequency of gaze shifts between the available ROIs, calculated as average values across the whole trial.
2. Agreement between ROI attended to by eyes and head. Number of all frames in which eye and head orientation were assigned to the same ROI divided by the number of all tracked frames. Further, we quantified the proportion of switches between ROIs executed by both eye and head movement: a) by approximating head movement velocity from IR marker displacement across frames and tagging frames above a set velocity threshold as 'head moving'. If a gaze shift was made while the head was tagged as moving, this was classed as concurrent eye and head movement; b) by checking for the match in the new ROI attended to by gaze and head orientation.
3. Amount of head movement. From those frames that were tagged as 'head moving' in 2 a), the percentage time that operators spent moving their head above the set threshold was calculated as the fraction of task time.
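These eye and head metrics can be sketched as below, assuming three per-frame inputs have already been derived: the gaze ROI, the head ROI and a boolean 'head moving' tag. The function name and the `lag_frames` tolerance for method b) are hypothetical additions. The velocity thresholding that produces the tag is not shown. At least one frame with both ROIs assigned is assumed.

```python
def eye_head_metrics(gaze_roi, head_roi, head_moving, fps=30, lag_frames=0):
    """Summarise eye-head coupling from per-frame ROI labels and a head-motion tag."""
    n = len(gaze_roi)
    both = [i for i in range(n) if gaze_roi[i] is not None and head_roi[i] is not None]
    agreement = sum(gaze_roi[i] == head_roi[i] for i in both) / len(both)

    # Frames at which gaze arrives on a different ROI than in the previous frame.
    switches = [i for i in range(1, n)
                if gaze_roi[i] is not None and gaze_roi[i - 1] is not None
                and gaze_roi[i] != gaze_roi[i - 1]]

    # Method a): the head was tagged as moving at the time of the gaze switch.
    by_motion = sum(bool(head_moving[i]) for i in switches)
    # Method b): the head ROI matches the new gaze ROI within lag_frames frames.
    by_roi = sum(
        any(head_roi[j] == gaze_roi[i] for j in range(i, min(n, i + lag_frames + 1)))
        for i in switches)

    n_sw = len(switches)
    return {
        "switch_frequency_hz": n_sw * fps / n,
        "eye_head_agreement": agreement,
        "concurrent_by_motion": by_motion / n_sw if n_sw else float("nan"),
        "concurrent_by_roi": by_roi / n_sw if n_sw else float("nan"),
        "head_moving_fraction": sum(bool(m) for m in head_moving) / n,
    }

# Hypothetical 8-frame excerpt in which the head follows gaze one frame later.
gaze = ["D2", "D2", "SS", "SS", "SS", "D2", "D2", "D2"]
head = ["D2", "D2", "D2", "SS", "SS", "SS", "D2", "D2"]
moving = [False, False, True, True, False, False, False, False]
result = eye_head_metrics(gaze, head, moving, lag_frames=1)
```

The two methods can disagree, as in this excerpt, because method a) depends on the motion tag at the instant of the switch while method b) depends on where the head ends up.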

Viewing behaviour
The selection of ROIs and the cumulative percentage viewing time varied across participants (Figs. 5 and 6). For example, when examining the traffic situation, participant 5 attended to D4 for the majority of time with hardly any time allocated to SS, whereas the other participants attended mainly to SS and D3. The time spent viewing D3 was 7% of viewing time except for participant 4, who spent 24% of viewing time on this ROI. All participants attended to the incident log (D2). In line with the attention allocation above, the visit count for each ROI also varied across participants (Table 2); the number of visits to the different ROIs was substantially higher for participant 1 due to the almost 3-fold task time.
Dwell times showed a skewed distribution as expected, with median values ranging from 1.2 to 4.2 s across participants, individual dwells often lasting up to 10 s and in rare cases even 20–60 s.
A summary of the outcomes for the calculated network metrics is presented in Table 3. While values for participant 1 stood out due to the longer and more involved task, they spanned similar ranges for the remaining four participants. Number of edges and link density showed that around 20–30% of possible paths were taken while operators attended to 60–100% of ROIs. The average nodal degree was around 2–3 for undirected and 4 for directed networks and the number of leaf nodes ranged from 0 to 2. Constructed viewing networks highlighted the heterogeneity between participants, who followed different search patterns when redirecting gaze between the different displays (Fig. 6).
Table 1. Example plans for the seven identified subgoals. In the notation for plans, '>' signifies "followed by" to indicate sequence, numbers indicate subgoals (i.e., nodes in the HTA), text indicates 'conditions' and each plan terminates with 'exit' (which indicates that the process moves to the next top-level subgoal). Provided are three examples for each subgoal for illustrative purposes; the set of plans is not exhaustive but intended to illustrate that operators can achieve these subgoals in a variety of ways, depending on operating conditions and strategy employed by the operator.
The median (IQR) agreement between ROIs attended to by eyes and head on a per-frame basis was 91 (6) %. The proportion of concurrent switches between ROIs was 87 (9) % based on tagging the head as moving (method 2a) and 77 (31) % based on agreement of the new ROI attended to after the switch (method 2b). The agreement between the two methods was 70 (23) %.
The prevalence of head movement ranged from 21 to 43% task time across participants, with a median (IQR) of 30 (12) %.

Operator workflow
In a situation akin to a road traffic control room, perception of the current situation primarily involves two processes. The first is the ability to collate sufficient information to define the situation. From the HTA, this information can be considered in terms of location, type and impact of an event. In routine (and planned) incidents, the definition of an 'event' can be straightforward: there is a specific and discrete incident (e.g., a collision, a broken down vehicle, an object in the road) which can be unambiguously defined in terms of the information. In this case, the ability to define the event in terms of the information is supported by the categorisation scheme applied in the options in the incident log. The second process which is relevant to perception of an event is the ability to recall previous, related examples. In our observations, this ability was supported mainly through discussion with colleagues (although there is also the likelihood that the operator would simply remember similar events). The operators are monitoring the situation and choosing an appropriate response to make. We believe that this is not a two-stage process of comprehend and then respond, but rather an interleaving process in which both activities are performed in parallel, with one influencing the other.

Visual scanning
In this study we observed heterogeneous visual scanning behaviour, both with respect to the attended regions of interest and combination of information sources. Previous work in other domains showed comparable preferences for sources: CCTV operators primarily attended to a selected video feed on a separate monitor rather than looking at all available feeds (Stainer et al., 2013) and air traffic controllers attended to a preferred information source (Stein, 1992). In the present study, the difference in CCTV monitoring between participant 5 and all other participants was rooted in the ability to view a selected CCTV feed on the panel in the far field (SS) or on monitor D4 in the near field. Only participant 4 attended to the schematic road map on D3 for a substantial amount of time (viewing time 24%), while the other participants did not use this source much (viewing time 7%). None of the operators attended to the large bank of sixteen non-interactive CCTV feeds (BS) for large amounts of time. This may be related to information overload from too much visual input; a recent review reported that an increasing number of CCTV feeds results in a drop of both operator accuracy and confidence (Stainer et al., 2013). On the other hand, the CCTV feeds were static and might not contain a view relevant to the task. Operator 2 in fact used BS to see the broken down motorbike. Task constraints are likely to affect the similarity in viewing behaviour across staff: while the completion of subgoals 1, 5 and 7 was highly constrained (requiring the incident log on D2 with access to road sign amendment and/or a phone), subgoals 2, 3, 4 and 6 allowed room for individual preferences in ROI attendance; all goals could be completed by combining a variety of information sources, such as the interactive bank of CCTV on SS, the CCTV monitor on D4 or the schematic road map on D3, or by attending to a single preferred source.
The median dwell times of 1–4 s across participants compared well to previously reported values of, for example, 0.1–3.5 s when assessing swimming patterns (Moreno et al., 2006) or switch frequencies of around 1.1 Hz (equal to dwells of around 0.9 s) when assessing football games (North et al., 2009).
Use of the network analysis metrics revealed the following scanning behaviour: around 20–30% of possible paths were taken, which indicates that the attendance sequence was not completely random. The viewing networks highlighted preferred paths between pairwise ROIs, showing that operators had individual patterns for those information sources which they frequently attended to in a specific order. The average nodal degree of around 3 indicates that an individual ROI was attended to following attention to various other ROIs. This means that there was not one specific sequential pattern, but flexibility in how information was combined and ROIs attended to sequentially. In line with this finding, the number of leaf nodes was low, ranging from 0 to 2. The median nodal degree was highest for D2 (incident log) and TA (table in front of operator), which fits in with the function of these two ROIs: D2 serves to aggregate information from all ROIs, and TA is attended to at various points of the task, for example to take notes on paper. The median nodal degree for the other ROIs was lower, indicating that they had a more specific function during sequential scanning.
The limitations of this study are the small sample size and the number of scenarios. Sample size was limited by the number of staff working at the facility, where 5 out of 6 staff members chose to participate. The facility subsequently changed/updated the control room layout, so it was not possible to go back and record further scenarios or repeats. In this study we present results for two 'real' live scenarios, which differed substantially, and three 'simulated' scenarios, which we hoped would result in reasonably similar behaviour during the workflow of locating an object in the road and making the adequate responses. This case study hence may not generalize, despite giving a first insight into road traffic management behaviour.
With a larger sample size and more scenarios, a future study may be able to identify and classify distinct patterns. With this case study we aimed to provide a starting point when considering the layout of a new facility and to introduce metrics that may in future allow comparison of operator behaviour across domains.

Eye and head movement
The high prevalence of head movements during the task highlights that head movement should be taken into account during control room design in the context of physiological requirements and possible strains. In the present study, there was wide spacing of information sources, which can require head movement per se when switching gaze between certain ROIs: the visual angle between ROI centres was approximately 35°. This value is close to the physiological threshold for head movement necessity (45–50°). A confounding factor is that the high prevalence of head movement throughout the study may have resulted in ROI switches performed by the eyes being classed as a match by chance. On the other hand, matching eye and head switches based on the ROI attended to by the head may suffer from artefacts of the head ROI mapping, as this was performed using the simplifying assumption that the CMOS centre is a reliable indicator of head orientation.
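The comparison between ROI separation and the physiological limits can be made concrete with a small sketch. The coordinates below are hypothetical and not taken from the study. The default limits use the lower bounds of the 45–50° and 75–90° ranges cited earlier, which is an assumption, as are the function names.

```python
import math

def visual_angle_deg(eye, a, b):
    """Angle in degrees subtended at the eye position by points a and b (3D coordinates)."""
    va = [p - e for p, e in zip(a, eye)]
    vb = [p - e for p, e in zip(b, eye)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.hypot(*va) * math.hypot(*vb)
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def required_effectors(angle_deg, eye_limit=45.0, head_limit=75.0):
    """Classify a gaze shift by the body segments it needs, given assumed limits."""
    if angle_deg > head_limit:
        return "eyes + head + upper body"
    if angle_deg > eye_limit:
        return "eyes + head"
    return "eyes alone possible"

# Hypothetical layout in metres: operator at the origin, two monitor centres ahead.
eye = (0.0, 0.0, 0.0)
left_monitor = (-0.25, 0.0, 0.7)
right_monitor = (0.25, 0.0, 0.7)
angle = visual_angle_deg(eye, left_monitor, right_monitor)
label = required_effectors(angle)
```

A shift classed as 'eyes alone possible' may still be accompanied by head movement in practice, which is consistent with the strong eye and head coupling observed here.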

Considerations for control room design
Having to follow a clearly defined procedure through the necessity to complete the incident log ensured that despite heterogeneous sampling approaches, subgoal completion remained structured and directed. The differences in scanning patterns between operators leave two opposing hypotheses for optimal control room or interface design: on the one hand, it may be beneficial to accommodate different individual workflows so that operators can use material which feels most intuitive and insightful for them. On the other hand, in the related domain of air traffic control, trainees are "explicitly instructed to acquire a scan pattern across the radar display and other equipment that allows coverage of the scope without being lured into any particular problem in one part of the airspace" (Durso and Dattel, 2006). This approach to constrain viewing modalities and learn scan paths may be advantageous because it encourages a more structured visual scanning protocol, ensuring that the focus is kept on the task at hand and encouraging a more consistent behaviour between different operators.