What multimodal data can tell us about the students’ regulation of their learning process?

Introduction
Continuing research has provided increased understanding of the complex processes involved in self-regulated learning as a cyclical and metacognitive process involving adaptive thinking, motivation, emotion, and behavior (Zimmerman & Schunk, 2011). In addition, the increased focus on the social and interactive nature of learning has led to research that provides a theoretical understanding of self-regulation as a socio-cognitive phenomenon. Hadwin, Järvelä, and Miller (2018) and Järvelä and Hadwin (2013) define and conceptualize three forms of learning regulation (i.e., self-regulation of learning [SRL], co-regulation of learning [CoRL], and socially shared regulation of learning [SSRL]) as central processes in highly interactive and collaborative learning contexts.
Although there has been significant theoretical and conceptual progress in social aspects of SRL theory, there has been little progress in developing methods that make the invisible mental regulation processes and their accompanying social and contextual reactions visible and, thus, measurable. "Making visible" means that even though SRL is a psychological phenomenon (Winne, 2017), it has physiological indicators such as stress, excitement, enthusiasm, or emotional dynamics in groups (Mønster, Håkonsson, Eskildsen, & Wallot, 2016). Traditional approaches to measuring the regulation of learning are based on self-reports or other subjective measures (e.g., subjective coding of video and/or verbal protocols) that poorly inform the deployment of regulatory processes in social learning contexts. Self-reports are based on students' perceptions of how they would or did enact certain processes, but these perceptions often do not align with what actually occurred during learning (Winne, 2004; Zimmerman, 2008). Subjective coding of observation data is also weak because it depends on the coders' interpretations of observed behaviors. Another problem is that the results of such coding lack generalizability because they are content-specific, situation-specific, and time-dependent (Strijbos & Fischer, 2007). Triangulating multiple sources of data could overcome these methodological weaknesses. Extending data collection to physiological reactions can provide information about a person's physiological reaction to a situation, which may be linked with their thoughts, but which cannot provide direct information about what a person is thinking (Azevedo, 2015). Triangulation involves matching process data resulting from different channels based on the timestamped information related to each data source. For example, video data-coding can be the foundation for contextualizing process data analysis and triangulation (Järvelä, Järvenoja, Malmberg, Isohätälä, & Sobocinski, 2016).
This triangulation enables testing and experimentation with different combinations of data to investigate and eventually validate the appropriateness of different data channels for SRL.
Recent advances in the development of new data-capturing devices allow researchers to go beyond ontologically flat (i.e., non-stratified) data obtained by tracking, coding, and modelling basic actions and observable behaviors, and to move toward multimodal datasets that simultaneously trace a range of cognitive and non-cognitive processes in more nuanced ways, including nearly invisible micro-level environmental interactions and invisible responses of the body and the brain (Reimann, Markauskaite, & Bannert, 2014). These new technologies include eye-movement tracking, brain activation, skin conductance, and other bio-physiological signals. As Reimann et al. (2014) argued, capturing process data across the modalities of the body, brain, actions, and language provides us with resources for exploring learning processes that cross the ontological boundaries between the human body (i.e., neurobiological processes), the environment (i.e., actions), and the mind (i.e., dispositions). This paper explores what multimodal data can reveal about SRL processes in collaborative learning tasks, with an emphasis on how data triangulation can help the study of important features of regulated learning and solve several current methodological limitations.

Strategic adaptation and regulation of collaborative learning
SRL in a social context is considered a cyclical, complex metacognitive and social process (Zimmerman & Schunk, 1989). This approach forces the field to seek out and test alternative ways to show the phenomena underlying this process because using single methodological solutions is not enough for studying the metacognitive cyclical process. Multimodal data are data that originate from different data channels, which can be subjective and/or objective. Subjective data, such as repeated and contextualized self-reports, can help reveal students' intentions to learn and students' beliefs about themselves as learners (Zimmerman, 2008). In particular, well-designed, repeated self-reports embedded within learning tasks can elucidate learner-subjective SRL (e.g., task perceptions, goals, perceived challenges, and intended strategies) at different points during the learning process (Morris et al., 2010). In contrast, objective data, such as log data and physiological measures or objective use of video data (e.g., study choices), can provide direct objective information about certain aspects of students' behavioral and mental processes that coincide with things such as study choices, confusion, and changes in effort or attention in a learning situation that are almost impossible to capture otherwise (Henriques, Paiva, & Antunes, 2013; Winne, 2010).
For example, cardiovascular data can reveal arousal, and heart rate and heart rate variation indices have been found to be measures of experienced cognitive load (e.g., Cranford, Tiettmeyer, Chuprinko, Jordan, & Grove, 2014; Haapalainen, Kim, Forlizzi, & Dey, 2010; Wilson, 2002). Electrodermal activity (EDA) refers to the skin's electrical conductivity properties that reflect sympathetic nervous system activity and arousal. Phasic measures of EDA refer to skin conductance responses (SCRs) that are seen as rapid changes in the EDA signal elicited by specific known stimuli. Skin conductance level (SCL) and nonspecific skin conductance responses (NSSCRs) are tonic measures of EDA that reflect long-term changes elicited by unfolding events. Skin conductance level and nonspecific skin conductance responses can reveal information about cognitive appraisals related to goal relevance (e.g., Kreibig, Gendolla, & Scherer, 2012) and perceived task difficulty leading to emotions (e.g., Pecchinenda & Smith, 1996; Tomaka, Blascovich, Kelsey, & Leitten, 1993). In addition, other modalities, such as eye-movement data and facial recognition data, can provide information about cognitive demand and about when students are feeling bored or confused (D'Mello, 2013; Fairclough, Venables, & Tattersall, 2005). In general, these data modalities refer to directly measured objective data, and they can be complemented with observational data (e.g., videotaping), which must be further evaluated by others. Observational data are indirectly measured objective data: despite being objective, they always include one or more researchers' interpretations of what is going on and, thus, do not reach the level of objectivity of sensor data.
Combining physiological measures such as EDA (Chanel & Mühl, 2015; Pecchinenda & Smith, 1996) to track skin reactivity changes in challenging moments (i.e., emotional arousal) and video observations to reveal SRL's sequential and temporal dynamics can provide a fundamentally new approach using objective and subjective means. In this way it is possible to (a) capture temporal and cyclical processes (i.e., planning, enacting strategies, reflection, adaptation) of regulation to see how previous small-scale situated adaptations and regulation of situated challenges contribute to large-scale adaptation, as during a collaborative learning task, and (b) show different patterns of activation of regulatory processes (i.e., planning, goal setting, enacting strategies, regulating motivation) to see how possible sequences of regulated learning contribute, for example, to learning progress. Such fine-grained objective data and an examination of the relations among different data sources and different variables can help elucidate hidden physiological reactions that are practically invisible and thus nearly impossible to capture.
As we are interested in regulation in collaborative learning, we have conceptualized three types of regulated learning (SRL, CoRL, and SSRL) as central processes in interactive and collaborative learning contexts (Järvelä & Hadwin, 2013). Investigating SSRL in collaboration means capturing the cognitive, metacognitive, social, and socioemotional aspects of interaction (Miyake & Kirschner, 2014). As stated by Bannert, Reimann, and Sonnenberg (2014), subjective measures are inadequate for coherently and reliably capturing the complexity of these processes. Multimodal data can provide new supplementary and complementary methods for capturing important phases of regulated learning as they occur in challenging learning situations (Harley, Bouchet, Hussain, Azevedo, & Calvo, 2015). Triangulating these multichannel data can provide fundamentally new objective and subjective ways to capture the critical phases of the SRL, CoRL, and SSRL processes and find evidence of critical moments of success or failure.

Collecting and triangulating multimodal data for understanding SRL
While there is a long tradition of investigating the relationship between physiological responses and affective parameters in social interaction (e.g., psychosomatic research; Kaplan, 1967), multimodal data collection is just emerging in the field of learning research. An increasing number of studies have collected physiological data, but these were often in small-scale experimental settings. While progress has been made in capturing affective states (D'Mello, Duckworth, & Dieterle, 2017) in complex learning situations, these studies have employed only a single physiological marker targeting affect. A problem with this is that learning regulation is multifaceted, involving taking control of motivation, emotion, affect, behavior, and cognition, all of which influence one another. For example, the joint cognitive goals that are generated during group planning have the potential to create new emotional conditions informing the progress of collaborative work (Malmberg, Järvelä, & Järvenoja, 2017). Triangulating multimodal data channels can signal more about the multifaceted phenomena than a single state and help to follow the cyclical processes of all forms of learning regulation.
Blikstein, Gomes, Akiba, and Schneider (2016) used a multimodal dataset gathered from 21 students to investigate the effect of the type of instruction on students' exploratory behavior and arousal levels. The students completed the task in a physics microworld platform, and arousal was measured with galvanic skin response (i.e., EDA). Although the researchers did not find any statistically significant effect of instruction on task performance, they found that the development of an arousal state was affected by instruction. On the one hand, detailed instruction was related to U-shaped arousal, meaning that arousal was higher at the beginning and at the end of the task. On the other hand, generic instruction seemed to be related to a continuously decreasing arousal slope. Their hypothesis was that heightened arousal at the end of the detailed instruction might reflect stress about the deadline and the need to meet detailed requirements set by someone else. While their chosen channel was relevant for investigating affect in learning regulation, adding more data channels such as videos could have helped contextualize the reasons for the affect peaks. Harley et al. (2015) synchronized three emotional measurement methods (i.e., automatic facial expression recognition, self-report, and EDA) and examined the degree to which these three measures agreed with each other among 67 undergraduates working in a multi-agent computerized learning environment. They used self-reported emotional states as markers to synchronize data from facial expressions and electrodermal sensors and found high levels of agreement between the facial and self-report data but low levels of agreement between the facial, self-report data, and sensor data. This suggests that a tightly coupled relation does not always exist between different emotional response components.
Worsley and Blikstein (2015) compared human annotations, speech, gestures, and EDA data in a study of 20 9th through 12th grade high school students and 8 undergraduate university students. The students participated in two experimental conditions (i.e., example-based reasoning, which used examples from the real world as an entry point to solving a task, and principle-based reasoning, which used engineering fundamentals as the basis for design) in a hands-on engineering design context. The authors identified behavioral practices that differed between the two conditions and provided examples of how to conduct learning analytics research in complex environments and compare how the same algorithm, when used with different forms of data, can provide complementary results. Although the results based on the hand-annotated data (i.e., gestures) provided a consistent picture of how principle-based reasoning may be related to success and learning, the multimodal sensor data provided a much more definitive delineation between the two experimental conditions. The authors concluded that an integrated multimodal analysis can significantly enhance the field's ability to detect and model student learning.
Finally, Antonietti, Colombo, and Di Nuzzo (2014) integrated eye-tracking measures, EDA, and cardiovascular activity with self-reports of students' metacognitive strategies and learning results to investigate the implicit and explicit metacognitive strategies used by 20 undergraduate students in two conditions while the students examined a multimedia presentation that contained either written text with pictures or audio material with the same pictures. The model presented in the results assumed that the electrodermal and cardiovascular indexes (considered measures of cognitive activation) were predictors of eye-movement patterns. The results showed that students were able to discriminate between the written- and audio-text conditions and self-regulate their behavior accordingly. Even though the authors concluded that the use of technologies such as eye tracking and biofeedback to record covert processes as an addition to traditional cognitive measures was effective, contextualizing and triangulating the physiological signal data with, for example, video data episodes or interviews could have given answers about the target of the students' cognition.
Empirical research using and triangulating physiological and multimodal data on the temporal progress of all facets of SRL (cognition, motivation, and affect) is in an early stage of development. These data collection methods are expected to become new channels for making visible and identifying SRL processes, something that has been impossible to achieve with conventional educational psychology research methods (Azevedo, 2015). Hadwin et al. (2018) characterized important features of SRL, such as multifaceted regulation, cyclical adaptation, agentic nature, as well as the individual and socio-historically situated nature of regulated learning. Data collection technologies have opened up new methods of characterizing the core mechanisms of SRL, the individual learner's deliberate and strategic adaptation during the planning, task enactment, and reflection phases. For example, one of the first methods was Winne and Nesbit's (2009) software called gStudy, which aimed to promote self-regulated learning as well as record observable traces of students' uses of strategies over time during complex tasks.

Aim
This article explores what multimodal data can tell us about SRL processes in collaborative learning tasks, discussing and demonstrating how multimodal data can be used for collecting and triangulating data about the regulation of learning. Subjective measures can be used to explore, post hoc, the intent of student learning, whereas objective data provide on-the-fly, real-time information about what students do when they study and can detect periods of challenge, interest, and attention (Henriques et al., 2013). These multimodal data can be used to identify markers that characterize successful SRL and learning progress and help in understanding and increasing the evidence about (a) interactions between different facets of regulation (i.e., cognition, motivation, emotion), such as how affective responses interact with cognitive strategic actions, (b) the occurrence and temporality of different types of regulation (SRL, CoRL, and SSRL), and (c) temporality and cyclical processes (i.e., planning, enacting strategies, reflecting, adapting) of regulation, such as how previous small-scale situated adaptations and regulation of situated challenges contribute to large-scale adaptation during collaborative learning tasks.

Can multichannel data be used for data triangulation on SRL?
In order to demonstrate how multimodal data can be used to collect objective and subjective data about the regulation of learning, five data examples are described to illustrate how and what evidence can be found that supports each claim and demonstrates possible ways of using multimodal data in learning regulation research. The examples, derived from collaborative learning situations, utilized multimodal data and generated an enormous amount of data, which required specific data processing techniques (Boucsein, 2012) and synchronization between the different data channels (Pijeira-Díaz, Drachsler, Järvelä, & Kirschner, 2016). The individual data channels in the context of collaboration (e.g., EDA, heart rate, video, and audio) were broken down and triangulated to understand how different data modalities represent regulated learning.
Four data modalities, observation data, two types of physiological data (EDA and heart rate), and facial expression data processed from the video data, are used in the examples to illustrate how multimodal data can be used to identify markers that characterize successful learning regulation and learning progress. The observation data, focusing on students' collaborative learning, was recorded with the MORE video system (Keskinarkaus et al., 2016) that can simultaneously record 30 speech tracks and 3 video tracks through spherical, 360-degree point-of-view cameras. The observation data serve as the foundation for interpreting and contextualizing the physiological data in each of the five examples. Physiological data were collected by using Empatica E3 multisensor devices that tracked the students' EDA. Specific EDA data were separate NSSCRs that signal the response and amplitude of reactions to external stimuli. In addition, the Empatica E3 device also tracked cardiovascular activity, measuring interbeat intervals. Facial recognition data were processed with an automatic analysis tool created for the MORE system that detects and analyzes visible faces in the video recording. For each face, the tool gives an estimate of valence using one of three classes: positive, neutral, or negative.
The main principle in the use of multimodal data was to contextualize them. Without contextualizing the data, it is impossible to understand or further investigate how they relate to the regulation of learning. In the qualitative content analysis of the video data, the target was to recognize a) a challenge (Examples 1, 2, and 3) and b) the type of interaction (Examples 3, 4, and 5) (see Sobocinski, Malmberg, & Järvelä, 2017). The identification of a challenge was considered important because, to collaborate successfully, group members need to recognize challenges that might hinder their collaboration and then develop appropriate strategies together to overcome these challenges through the group members' interactions. Challenges might trigger the occurrence of learning regulation (Hadwin & Järvelä, 2011; Malmberg et al., 2015).
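The contextualization step described above can be sketched in code. The following is a minimal illustration, not the procedure used in the study: it assumes that coded video episodes are available as non-overlapping time intervals and that each physiological sample carries a timestamp measured from the same session start. The `Episode` type, its field names, and the example codes are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class Episode(NamedTuple):
    start_s: float  # episode start, in seconds from session start
    end_s: float    # episode end (exclusive), in seconds from session start
    code: str       # hypothetical video-coding label, e.g. "challenge"


def label_samples(sample_times_s, episodes):
    """For each sample timestamp, return the code of the episode that
    contains it, or None when the sample falls outside every episode.

    Assumes the episodes do not overlap.
    """
    ordered = sorted(episodes, key=lambda e: e.start_s)
    starts = [e.start_s for e in ordered]
    labels = []
    for t in sample_times_s:
        # Index of the last episode starting at or before t (-1 if none).
        i = bisect_right(starts, t) - 1
        if i >= 0 and t < ordered[i].end_s:
            labels.append(ordered[i].code)
        else:
            labels.append(None)
    return labels


# Hypothetical coding of two episodes in a session.
episodes = [Episode(60, 120, "challenge"), Episode(200, 260, "off_topic")]
labels = label_samples([0, 60, 119.5, 120, 230], episodes)
```

Once every sensor sample carries a video-derived label, signal statistics can be compared between, for instance, challenge episodes and the rest of the session.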

Participants and context
Each of the examples is drawn from a science experiment conducted among high school students (N = 48, 27 females, 21 males, Mean age = 17.4 years; SD = 0.67) from a Finnish teacher training school. The experiment was conducted in LeaForum (http://www.oulu.fi/eudaimonia/node/19394), which is a classroom-like space with modern research equipment. During the study, the students collaborated in groups of three (16 groups in total). The students sampled for the five cases a) represent purposeful samples of the data (Patton, 1990) and b) provide the most illustrative examples to explore what multimodal data can tell us about SRL processes. By describing these five cases, we demonstrate how multichannel data can be collected, used, and contextualized, in order to provide guidelines and analysis techniques for future multimodal data collection.

Collaborative task
The collaborative task, lasting for 75 min, was to "Design a perfect breakfast for a marathon runner." During the collaboration, the students used the weSpot learning environment which guided their collaboration. The learning environment included a case description of a hypothetical person, along with that person's daily energy needs. The problem statement at the beginning of the task also included information in percentage terms of how much fiber, calories, fat, and carbohydrates the breakfast should include. The students' collaborative task was to complete a sheet that included a detailed list of nutrients that a marathon runner should eat for breakfast. The task was complex and there were multiple ways to come up with the task solution.

What can multimodal data reveal about interactions between different facets of regulation?
The first case example (Fig. 1) shows three group members' collaborative work, the challenges observed in the collaboration (via a videotaped episode), and the simultaneous changes in each group member's standardized EDA signal while they worked during a 75-min session. The standardized EDA values were used to make the signals visually more comparable in terms of the temporal changes (Ben-Shakhar, 1985; Dawson, Schnell, & Filion, 2017). Grey areas mark observed challenges and patterned areas mark off-topic discussion, both coded from the video recording. As Fig. 1 shows, at the beginning of the learning session the skin conductance level (SCL) (i.e., the general level presented on the X axis) and the nonspecific skin conductance response (NSSCR) frequency (i.e., the rapid changes seen as > 0.05 μS peaks in the signal, see Boucsein, 2012) were high, which means the students were aroused. This is typical at the beginning of a task because the participants are anticipating the start of work on the task. If there are no challenges in the situation, the EDA values are likely to decline (Boucsein, 2012). However, if the students confront challenges while working, the NSSCRs can occur rapidly (e.g., minutes 65-67 in Fig. 1), and the SCL increases (Fritz, Begel, Müller, Yigit-Elliott, & Züger, 2014). Fig. 1 provides an example of how standardized EDA signal values change during collaborative learning. Besides the high values that appear at the beginning of the learning session, elevated levels also occur steadily during the collaborative learning session. For example, during the observed challenge episodes, the students' NSSCRs occur rapidly, as the bodily response to the situated challenge related to task completion. The implication is that the SCL and NSSCRs during collaborative learning have the potential to provide information about interactions between different facets of regulation, such as how affective responses interact with cognitive strategic actions.
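To make the idea of standardized EDA values concrete, the sketch below rescales one student's signal to zero mean and unit standard deviation. The text does not state which standardization was applied, so the per-student z-score used here is an assumption, and the function name and sample values are illustrative only.

```python
from statistics import mean, pstdev


def standardize(signal):
    """Return the z-scored signal: (x - mean) / standard deviation.

    Assumption: a within-student z-score. The source only says that
    standardized values were used to make the signals comparable.
    """
    m = mean(signal)
    sd = pstdev(signal)  # population SD over the whole recording
    if sd == 0:
        raise ValueError("a constant signal cannot be standardized")
    return [(x - m) / sd for x in signal]


# Hypothetical raw EDA samples (in microsiemens) for one student.
z = standardize([1.0, 2.0, 3.0])
```

After this step, students with very different baseline conductance can be plotted on a common scale, which is what makes the temporal changes visually comparable across group members.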
However, several issues require further elaboration and caution. First, it is not possible to use the raw EDA signal from each student as such, because each student's reactions vary considerably depending on the contextual stimuli that might (or might not) awaken a reaction. This means that not all learners react to the same external stimuli in the same way (Palumbo et al., 2016). For example, the learning situation can include various aspects that are not related to the learning goals but that can also elicit changes in different EDA components, such as the off-topic interactions in Fig. 1 (minutes 38-43). Thus, only further observational analysis can reveal the context of the EDA reactions, which can be caused by unrelated factors, such as artefacts occurring in the data due to the learner's movement. Second, the EDA signal does not show valence, for example, in terms of experienced emotions. It is not clear whether the changes in the SCL and NSSCRs are due to the students' high engagement, increased interest resulting in excitement or distraction, and/or fear due to task difficulty or fear of failure (Boucsein, 2012; Haapalainen et al., 2010; Kreibig, 2010).
The second example (see Fig. 2) is zoomed in from the first example (Fig. 1), and it illustrates in more detail how three group members' SCL and NSSCRs occur when the group members confront a task understanding challenge, a technical challenge, and a motivational challenge. The duration of the episode is 4 min 10 s. In the example, the video data are synchronized with the standardized EDA signal and the challenges have been marked in grey. The episode demonstrates a situation in which the students' SCL and NSSCR frequency increase almost simultaneously. The video analysis shows that when the increase occurs, the group experiences a technical challenge. The screenshot from the video captures the moment when the student in the middle expresses his frustration, which can also be seen from his body language. Soon the other group members also become aware of the challenge, which is preventing their task completion. Instead of using strategies for regulating emotion, the two other group members start to echo his frustration. The increase and decrease in EDA values also happen in a synchronized manner. This phenomenon has been called physiological synchrony, linkage, or coupling, and prior research suggests it is a potential tool for monitoring group processes (Palumbo et al., 2016).
Finally, the group solves the challenge by asking for external help (grey dashed line in Fig. 2). In sum, a technical challenge due to something in the environment leads to an emotional challenge, which is also shown as a simultaneous increase in EDA.
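The synchronized rise and fall of group members' signals can be quantified in several ways, and the text does not say which index, if any, was computed for this example. As one simple and commonly used option, the sketch below computes a Pearson correlation between two students' signals in sliding windows. All function names and sample values are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences.

    Returns NaN when either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def windowed_synchrony(x, y, window, step=1):
    """Correlation of x and y inside each sliding window of `window` samples."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(0, len(x) - window + 1, step)
    ]


# Hypothetical signals: student_b moves exactly with student_a,
# student_c moves in the opposite direction.
student_a = [0, 1, 2, 3, 2, 1, 0, 1]
student_b = [v + 1 for v in student_a]
student_c = [-v for v in student_a]
sync_ab = windowed_synchrony(student_a, student_b, window=4)
```

Values near 1 indicate that two signals change together within a window, values near -1 that they move in opposite directions. Such an index would still need to be read against the coded video episodes, because co-movement alone does not reveal its cause.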
Although the qualitative video analysis indicated a challenge in the students' task understanding, there was no change in the students' SCL or in the occurrence of NSSCRs. However, when the students confronted a technical challenge, there was a distinct change (from M = −0.56 to M = 2.08 and from M = 4.97/min to M = 14.1/min) in the SCL and NSSCR frequency. The implication may be that it is important that learners are aware of challenges that might hinder their learning and that they have set learning goals to perceive the lack of task understanding as a challenge (McCardle & Hadwin, 2015). If this is not the case, no physiological reactions that could be used in signaling regulated learning will occur. If learners do not encounter real learning challenges, then the learners have limited opportunities to activate and refine their regulatory responses (Sobocinski et al., 2017).
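The NSSCR frequencies reported above are counts of rapid conductance rises per minute. The sketch below shows one simplified way to obtain such a rate: a trough-to-peak count using the 0.05 μS amplitude criterion mentioned earlier in the text. It is an illustration under stated assumptions, not the processing pipeline used in the study. Real EDA analysis typically also involves filtering and artefact removal, and the function names and sample values here are hypothetical.

```python
def count_scrs(eda_us, min_amplitude_us=0.05):
    """Count trough-to-peak rises of at least `min_amplitude_us` microsiemens.

    A rise starts at the sample before the signal begins to increase and
    ends at the local peak, i.e. the last sample before the signal falls.
    A rise that is still in progress at the end of the signal is ignored.
    """
    if len(eda_us) < 3:
        return 0
    count = 0
    trough = eda_us[0]
    rising = False
    for prev, cur in zip(eda_us, eda_us[1:]):
        if cur > prev:
            if not rising:
                trough = prev  # remember where the rise started
                rising = True
        elif cur < prev and rising:
            # prev was a local peak: check the rise amplitude.
            if prev - trough >= min_amplitude_us:
                count += 1
            rising = False
    return count


def scr_rate_per_min(eda_us, sampling_hz, min_amplitude_us=0.05):
    """Responses per minute over the whole signal."""
    duration_min = len(eda_us) / sampling_hz / 60.0
    return count_scrs(eda_us, min_amplitude_us) / duration_min


# Hypothetical EDA samples in microsiemens: two rises exceed 0.05,
# one small rise (1.04 -> 1.06) does not.
signal = [1.00, 1.02, 1.10, 1.05, 1.04, 1.06, 1.05, 1.20, 1.00]
n_responses = count_scrs(signal)
```

Applying `scr_rate_per_min` separately to the samples inside and outside a coded challenge episode would give the kind of before-and-after comparison reported in this example.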

How can the occurrence and temporality of different types of regulation (SRL, CoRL, and SSRL) be demonstrated?
The third example is from a situation where the video analysis revealed a task interpretation challenge followed by an instance of socially shared regulation. Fig. 3 presents the three group members' standardized heart rate values for 30 s windows over a period of 4 min, starting at minute 52 of the learning session. At the beginning of the example, the group receives the task, and student 1 (black line) starts reading the task out loud. Student 2 (green line) and student 3 (red line) start wondering what exactly they need to do. During the period coded as socially shared regulation (min 54-55), student 1 and student 3 suggest that they should make a decision about how to proceed, and student 3 sums up their discussion, after which all three start working quietly on filling out a table. An increase in the students' heart rates is observed during the challenging situation (from M = −0.1 to M = 0.03), which then drops until the segment containing regulation (M = −0.01), after which the students' heart rates again increase (M = 0.02). During and after the challenge (min 53-55), all three group members' heart rates increase and decrease in a synchronized manner. The increases and decreases of the signals are not entirely simultaneous, which might be due to individual differences in reaction time to stimuli (Turner, 1989).

Fig. 1. Observed challenges (grey area), off-topic discussion (pattern area) and standardized EDA signals (green, red and black lines) presenting skin conductance levels and non-specific skin conductance responses of three students in a collaborative learning situation. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 2. Challenge episode (grey area) associated with standardized EDA signals of three students presented with black (left student), green (right student) and red (middle student) lines. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 3. Heart rate changes of three students during a challenge, followed by an instance of socially shared regulation. Standardized values of heart rate are presented with black line for student 1, green line for student 2, and red line for student 3. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
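The standardized heart rate values for 30 s windows used in the third example can be illustrated with a short sketch. It assumes a heart rate series sampled once per second and standardized as z-scores over the whole series of one student; the sampling rate, the standardization baseline, and the numbers are assumptions made for illustration and may differ from the preprocessing actually used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores using the series' own mean and SD."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


def window_means(values, samples_per_window):
    """Mean of each consecutive, non-overlapping window.

    A trailing partial window is dropped.
    """
    n_full = len(values) // samples_per_window
    return [
        mean(values[i * samples_per_window:(i + 1) * samples_per_window])
        for i in range(n_full)
    ]


# Hypothetical 1 Hz heart rate in beats per minute: 30 s calm, 30 s elevated.
heart_rate = [70] * 30 + [80] * 30
z_scores = standardize(heart_rate)
print(window_means(z_scores, 30))  # one value per 30 s window
```

Plotting one such windowed series per student on a shared time axis would yield a figure of the same general form as Fig. 3.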
In the fourth example (see Fig. 4), the group is facing a technical challenge. At first, their behaviors are off task (min 39-40); they are talking and laughing together with another team. Then they find out they have a new task (min 40), but face a technical difficulty while accessing it (min 40 to min 41.5), after which they start reading the new task in silence. Student 3 (red line) breaks the silence and suggests that they all should start adding items to the form (min 42.5-43.5), which was coded as co-regulation. They continue adding items in silence. In this example of regulation, student 3 (red line) is the only one verbally participating, and her heart rate fluctuates the most (between −0.15 and −0.05), showing a difference in the level of arousal or cognitive activation (Cranford et al., 2014). This fluctuation, when contrasted with the signals of the two other group members, might be an indication of monitoring that precedes the verbalized act that is coded as co-regulation.

What can multimodal data tell us about the temporality and cyclical process of regulation?
The fifth example (see Fig. 5) illustrates how arousal events recognized from EDA are related to the SRL phase (i.e., task interpretation, planning, and task enactment) and the valence of the detected emotions (i.e., negative, neutral, positive) during a 67-min collaborative learning session for a group of three members. The episodes in which the EDA signals increase synchronously among the three group members are labeled according to the type of interaction that occurs within them: a) low interaction, b) high interaction, or c) confusion. "Low interaction" refers to reading and processing information to acquire knowledge with little exchange among members. That is, group members were either silent while one person was talking or agreed silently without participating in the conversation, and there was no visible regulation of learning. "High interaction" refers to activities related to the construction of meaning, such as generating new ideas, elaborating ideas, critiquing ideas, and connecting them to prior knowledge, which feature high interaction and regulated learning (e.g., Volet, Summers, & Thurman, 2009). In contrast, "confusion" could lead to either high or low levels of interaction, depending on whether it was resolved and regulated (D'Mello & Graesser, 2011). That is, confusion involves markers of metacognitive monitoring and prompting of other group members to regulate learning (Hadwin, Järvelä, & Miller, 2017). The regulation phases, such as planning (green) and task enactment (orange), are located before the episodes in which the EDA signals increase (Zimmerman, 2011). The valence of the emotions detected from the recognized facial expressions (Keskinarkaus et al., 2016) is shown as relative frequency percentages with color-coded lines (green for positive, gray for neutral, red for negative). The relative frequency is the percentage of all expressions identified in the group during each minute; it shows how the identified expressions are distributed among negative, neutral, and positive ones.
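The per-minute relative frequency of facial expression valence can be sketched as follows. The sketch assumes that each recognized expression is available as a (time in seconds, valence label) pair pooled over the group; this data layout and the example observations are hypothetical and are not taken from the facial expression recognition pipeline cited above.

```python
from collections import Counter, defaultdict

VALENCES = ("negative", "neutral", "positive")


def valence_percentages(observations):
    """Return {minute: {valence: percentage}} for pooled group observations.

    observations: iterable of (time_s, valence) pairs.
    """
    per_minute = defaultdict(Counter)
    for time_s, valence in observations:
        if valence not in VALENCES:
            raise ValueError(f"unknown valence: {valence!r}")
        per_minute[int(time_s // 60)][valence] += 1

    result = {}
    for minute, counts in sorted(per_minute.items()):
        total = sum(counts.values())
        result[minute] = {v: 100.0 * counts[v] / total for v in VALENCES}
    return result


# Hypothetical recognized expressions, not study data.
observations = [
    (3, "neutral"), (10, "negative"), (31, "neutral"), (59, "positive"),
    (61, "negative"), (90, "negative"),
]
print(valence_percentages(observations))
```

Because the three percentages for a minute always sum to 100, the lines in a plot of this kind show the distribution of expressions rather than how many expressions were detected.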
Typically, low interaction occurred most often when the EDA increased in all three group members, while high interaction and confused interaction occurred the least. However, it could not be assumed that each increase in EDA was directly connected to regulation of learning. It was possible to link the phase of regulated learning to the type of communication (Malmberg et al., 2019). Types of interaction were associated with SRL phases, such as planning and task enactment, indicating that learners progressed in their collaboration. However, when the interaction was confused, it was not directly associated with any of the SRL phases. Fig. 5 illustrates that at the end of the learning session the type of interaction was confused. Based on the SRL theoretical models (e.g., Hadwin et al., 2018; Winne & Hadwin, 1998), this type of interaction has the capacity to activate regulated learning in the context of collaboration, because the interaction invites students to activate regulation. Although the learners showed negative, positive, and neutral facial expressions throughout the session and within each type of interaction, they showed negative expressions the most when the episodes included markers of confusion (Malmberg et al., 2019). This result indicates that when EDA rises and learners exhibit negative facial expressions, there is a possibility of locating markers of regulated learning following those episodes. In sum, it is possible to capture cyclical processes of SRL by using contextualized physiological signals. However, it is not possible to make the direct link that each time EDA rises among group members their regulation of learning can be located. In our example, the best sign for finding evidence about cyclical processes of SRL is related to episodes that include confused interaction.
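One simple way to flag candidate episodes in which EDA rises in all group members at once is sketched below. This is an illustrative assumption, not the arousal detection procedure used in the study: a rise is defined here as an increase in a windowed EDA value relative to the previous window, and the threshold parameter `min_rise` and the signal values are hypothetical.

```python
def synchronous_rise_windows(signals, min_rise=0.0):
    """Return window indices where every member's value rose by more than min_rise.

    signals: dict mapping a member id to a list of windowed EDA values.
    All lists must have the same length. Index i compares window i with i - 1.
    """
    lengths = {len(series) for series in signals.values()}
    if len(lengths) != 1:
        raise ValueError("signals must be non-empty and of equal length")
    n_windows = lengths.pop()
    return [
        i for i in range(1, n_windows)
        if all(series[i] - series[i - 1] > min_rise for series in signals.values())
    ]


# Hypothetical windowed, standardized EDA values for three students.
signals = {
    "student_1": [0.1, 0.3, 0.2, 0.6],
    "student_2": [0.0, 0.2, 0.3, 0.5],
    "student_3": [0.2, 0.4, 0.1, 0.4],
}
print(synchronous_rise_windows(signals))
```

As the discussion above stresses, windows flagged in this way are only starting points: whether regulation actually occurred still has to be established from the video and interaction data.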
Three issues require further elaboration. First, the observation data did not include visible markers of confusion or task difficulty when the EDA rose among all group members. However, the quality of the interaction was captured. For example, when the interaction was low, it could have meant that each group member was simultaneously either engaged in his or her work or was confused, but the group members did not show confusion to each other, and it was not visible in the observation data. However, the EDA data did show significant changes that were due to arousal. Second, facial recognition data have the capability of tracking subtle emotional changes that are less obvious to the naked eye (i.e., micro-expressions). This means that it is compelling to address whether the physiological sensor data that accompanied the facial recognition data can shed light on the quality of the social interactions during collaborative learning. Third, connecting synchronized moments of arousal to the regulation phases (i.e., task interpretation, planning, and task enactment) can show different phases of activation of the regulatory processes (i.e., goal setting and enacting strategies) and can signal learning progress. However, this analysis requires qualitative, subjective interpretation and is difficult to capture only from objective data sources.

Fig. 4. Standardized heart rate values for three students (black for student 1, green for student 2 and red for student 3) during a segment containing off task behavior, challenge and co-regulation. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Conclusions
Our major claim is that multichannel data have potential for advancing the understanding of regulatory processes in collaboration. With our five empirical case examples, we illustrate how triangulating multiple sources of data has the potential to advance theoretical and conceptual progress in the social aspects of SRL theory. With the first and second case examples, we used video and EDA data to make visible how different facets of regulation, such as cognition, motivation, and emotion, interact with cognitive strategic action. Example one visualizes how EDA responses occur rapidly, as the bodily response to affective stimuli interacts with cognitive strategic actions while learners progress with the task. Example two, in turn, visualizes in more detail how a task understanding challenge, a technical challenge, and a motivational challenge located from the video are also shown as synchronized increases and decreases in the EDA of all the group members.
Examples three and four visualize the occurrence and temporality of different types of regulation (SRL, CoRL, and SSRL). With those examples, we visualize how physiological synchrony measured from the heart rate can reveal or back up the interpretation of socially shared regulation of learning or co-regulation of learning located from the video.
With the fifth example, we use video, EDA, and facial expression recognition data to visualize the temporality and cyclical processes (i.e., planning, enacting strategies, reflecting, adapting) of regulation, such as how previous small-scale adaptations and regulation of situated challenges contribute to large-scale adaptation during collaborative learning tasks. This example also illustrates how combining physiological measures with facial expression data can lead to even more accurate interpretations of the situations where regulation of learning is needed. Thus, it also shows how phases of SRL occur in the context of collaborative learning.
In all five case examples, we focused on SRL processes in collaborative learning tasks, collecting physiological measures that occurred simultaneously among the group members. That is, the unit of analysis was the group, not the individual student. So far, the strongest argument is that simultaneous peaks or rapid occurrences of EDA have the potential to reveal a need for regulated learning, and that measures of synchronicity have the potential to reveal the actual shared regulation of learning and the temporal process of regulation.
New technologies can provide rich data for investigating a range of cognitive and non-cognitive processes. Learning processes that could previously be studied only with subjective data can be verified and triangulated with different types of more objective data. Using ratings by researchers (e.g., of verbal protocols) has usually been time-consuming and expensive. Digitization of protocols along with subsequent automated analysis and the use of reliable sensor data can speed up investigations and produce big data about complex learning processes.
Currently, limited methods exist for making invisible mental regulation processes and the accompanying social and contextual reactions visible. In addition, empirical research on regulated learning in the social context, which is understood as a complex metacognitive and social process that is cyclical and involves adapting thinking, motivation, emotion, and behavior, is scarce. Effort is needed to investigate whether and how these processes can be shown with certain objective data channels. The present limited data examples show how affective responses interact with cognitive strategic actions; these examples provide information about the temporality and cyclical process of regulatory processes and about the occurrence and temporality of situations that can, to some extent, signal the occurrence of different types of regulation (SRL, CoRL, and SSRL). Although the examples presented in this article are snapshots of complex collaborative learning processes, they contribute to the discussion on how multimodal data collection can advance research on regulation of learning. A first limitation is that there are very few empirical studies in which SRL, CoRL, and SSRL processes were measured with physiological data channels. Therefore, the interpretation of the findings is based on previous evidence obtained with traditional methods, on findings obtained in non-authentic (i.e., laboratory) learning contexts, or on cognitive task performance of short duration. A second limitation, or actually a methodological problem, is deciding upon and matching the granularity of the data from each source. Some of the data are very fine grained; for example, heart rate changes rapidly and the changes are of short duration, and various time windows have been used in previous literature for analyzing it in the context of learning (Ahonen et al., 2016). Other data, such as facial expressions, are much less fine grained (i.e., fewer categories and of comparatively long duration). A third limitation is that one datum (e.g., a change in the heart rate) could be due to multiple causes, requiring painstaking, sometimes subjective, human interpretation of the contextual data. A final limitation is the challenge of analyzing physiological data quantitatively. Our current examples are visual and descriptive, but when working with bigger data, more statistical explanatory power is needed. To make progress on this, we have designed a graphical user interface known as SLAM-KIT (Noroozi et al., 2018). SLAM-KIT reveals principal features of complex learning processes by allowing users to travel through the learners' data and its statistical characteristics. This kit will have practical implications, as it simplifies complex information and data while making them available to researchers through visualization and analysis.
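The granularity problem noted among the limitations can be illustrated with a small sketch that places channels with different sampling rates on a common time window. It is a generic illustration, not a description of SLAM-KIT or of the analyses reported here; the channel names, the window length, the numeric valence coding, and the sample values are all hypothetical.

```python
from statistics import mean


def to_windows(samples, window_s):
    """Average timestamped (time_s, value) samples within fixed-length windows."""
    buckets = {}
    for time_s, value in samples:
        buckets.setdefault(int(time_s // window_s), []).append(value)
    return {window: mean(values) for window, values in buckets.items()}


def align(channels, window_s):
    """Return rows of (window_index, {channel: mean or None}) on a shared grid.

    None marks a window in which a channel produced no samples.
    """
    windowed = {name: to_windows(s, window_s) for name, s in channels.items()}
    all_windows = sorted(set().union(*(w.keys() for w in windowed.values())))
    return [
        (w, {name: windowed[name].get(w) for name in channels})
        for w in all_windows
    ]


# Hypothetical channels: heart rate every 10 s, one coded valence per event.
channels = {
    "heart_rate": [(0, 70), (10, 72), (20, 74), (30, 80), (40, 82), (50, 84)],
    "valence": [(0, -1)],
}
for row in align(channels, window_s=30):
    print(row)
```

The choice of window length determines what can be seen: a long window smooths away the rapid heart rate changes, whereas a short window leaves many empty cells for the sparser channel, which is one reason the matching decision requires theoretical justification.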
Many challenges and open questions have emerged when multimodal data in learning regulation are considered, and systematic empirical research is needed. First, physiological and technology-assisted data collection produces a significant amount of data (i.e., big data), which increases the number of challenges researchers face in handling, analyzing, and understanding the data gathered (D'Mello et al., 2017). Effort is needed to understand how we can progress from more data to deep data. Second, multimodal data sets simultaneously trace a range of cognitive and non-cognitive processes, which are parallel and overlap. Strong theory and a researcher's deep conceptual understanding are needed to analyze and make inferences from the data (Wise & Shaffer, 2015). Third, multidisciplinary research teams are a prerequisite for progress with multimodal data sets. Experts in data-driven analytical techniques, such as learning analytics or educational data mining, are critical for the success and progress of the analyses (Gašević, Dawson, & Siemens, 2015).
We propose that researchers continue to integrate interdisciplinary methodologies to capture regulated learning using trace methodologies, such as log-file data, eye-movement data, physiological measures, video data, and self-report measures of learning processes. These methodologies are needed to extend our methodological paradigm in the area of self-regulated learning with new, interdisciplinary work using innovative tools and techniques from educational data mining, machine learning, and affective computing (Baker & Siemens, 2014). Triangulation of multichannel data provides a new approach, combining objective and subjective means, through which to capture important phases of SRL as they occur in challenging learning situations.