Identifying processes underlying the multimedia effect in testing: An eye-movement analysis

Test items become easier when a representational picture visualizes the text item stem; this is referred to as the multimedia effect in testing. To uncover the processes underlying this effect and to understand how pictures affect students' item-solving behavior, we recorded the eye movements of sixty-two schoolchildren solving multiple-choice (MC) science items either with or without a representational picture. Results show that the time students spent fixating the picture was compensated for by less time spent reading the corresponding text. In text-picture items, students also spent less time fixating incorrect answer options, a behavior that was associated with better test scores in general. Detailed gaze likelihood analyses revealed that the picture received particular attention right after item onset and in the later phase of item solving. Hence, comparable to learning, pictures in tests seemingly boost students' performance because they may serve as mental scaffolds, supporting comprehension and decision making. © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Visualizations are frequently integrated into text-based test items in large-scale assessments (LSA; e.g., Programme for International Student Assessment [PISA]; OECD, 2013), yet little is known about how multimedia elements affect cognitive processing in item solving. Recent studies have shown that adding a representational picture (i.e., a picture that displays task-relevant information stated in the text; in the following referred to as picture) to a text-based item stem reduces item difficulty (Hartmann, 2012; Lindner, Ihme, Saß, & Köller, in press; Saß, Wittwer, Senkbeil, & Köller, 2012). Similar to the findings of many studies in the learning context (see, e.g., Butcher, 2006, for a review), items become easier when representational pictures are added to the text. Thus, it seems that the so-called multimedia effect (e.g., Mayer, 2009) can be transferred from learning to testing situations (i.e., multimedia effect in testing; cf. Lindner et al., in press). However, not much is known about the cognitive processes that underlie a decrease in item difficulty due to the addition of a picture to text-based multiple-choice (MC) items. In the following, we refer to multimedia theories of learning to derive informed hypotheses about how pictures may affect item processing in the testing context.

Learning with text and picture
A large number of studies provide evidence that students' performance improves when they learn with text and pictures rather than with text alone (see, e.g., Ainsworth, 2006; Butcher, 2014; Mayer, 2009, for reviews). To explain why a combination of text and picture is effective for learning, the Cognitive Theory of Multimedia Learning (CTML; Mayer, 2009, 2014) postulates that information from text and picture is processed in two separate channels in working memory, each of which is limited in its capacity. When both channels are used simultaneously, for example, by learning with (spoken) text and pictures, the limited working-memory capacity is better exploited. Furthermore, the CTML refers to a more active processing of information when pictures are added to a text, because of the necessity to integrate the information from the two resulting mental models (a verbal and a pictorial model) with the help of prior knowledge that is retrieved from long-term memory. Moreover, the necessity to integrate multiple representations can foster the abstraction of general rules and can, therefore, facilitate conceptual understanding at a higher level of organization (Ainsworth, 2006).
According to the Integrated Model of Text and Picture Comprehension (ITPC; Schnotz & Bannert, 2003; Schnotz et al., 2014), when processing a text, the learner first constructs a mental representation of the text's surface structure, from which a propositional representation of the semantic content is generated (cf. Van Dijk & Kintsch, 1983). A mental model can be constructed from a text only by generating a propositional representation as an intermediate step. This requires students to infer relations that are sometimes only implicitly expressed in the text (cf. Glenberg & Langston, 1992), which is subject to erroneous interpretations, so that the mental model may inadequately reflect the contents or situations described. In contrast, when processing pictures, the perceptual representation of the picture's visuospatial relations can be mapped onto semantic relations to provide the structure of the mental model (analogical structure mapping; Schnotz & Bannert, 2003). This has unique benefits for comprehension, for instance, because the picture as a second representation can disambiguate potentially ambiguous text (i.e., constraining interpretation function; Ainsworth, 2006). Hence, presenting a picture in addition to a text means that the mental model construction (or interpretation) from the text can be facilitated or even replaced by a mental model construction based on the picture. Especially early attention on the picture, which has been found in studies with simultaneous text-picture presentation (e.g., Lenzner, Schnotz, & Müller, 2013), can provide learners with the structure of a mental model so that part of the mental model construction is already completed based on the picture (cf. scaffolding assumption; Eitel, Scheiter, Schüler, Nyström, & Holmqvist, 2013). As a consequence, subsequent processes of mental model construction from the text are facilitated. This assumption is supported by studies that have provided evidence for a link between early picture processing, facilitated text processing, and better learning outcomes (see, e.g., Eitel & Scheiter, 2015, for a review).
However, while learning with text and picture has great potential, it can also be too demanding to yield the desired outcomes. In particular, successful learning with text and picture requires students to integrate information both within and between representations. Therefore, because students often tend to treat representations in isolation, beneficial outcomes do not always occur (see Ainsworth, 2006, for a review). Fortunately, integration behavior can be facilitated and, thus, learning supported when key principles of effective multimedia design are applied (see Mayer, 2014, for an overview); for instance, when split attention is avoided because text and pictures are presented in close spatial and temporal proximity (e.g., Ayres & Sweller, 2005; Brünken, Plass, & Leutner, 2004).
Even though these assumptions were originally proposed for learning with text and picture, they may extend to the case of testing with text and picture. We will discuss this issue in the next section.

Testing with text and picture
Reading to understand is an important process in which pictures can be helpful, not only in learning but also in testing situations. In testing situations, students usually have to understand the problem that is presented in the test item stem to be able to solve the item correctly. Especially when test items require conceptual problems to be solved by transferring declarative domain knowledge to a more applied situation, which is often the case in LSA (cf., e.g., IEA, 2013; Mullis, Martin, Ruddock, O'Sullivan, & Preuschoff, 2009; OECD, 2013), it is crucial to develop a coherent mental model of the problem in order to draw the right inferences and find the correct solution. Thus, it is assumed that theoretical assumptions from learning research about how pictures facilitate text understanding (e.g., Schnotz & Bannert, 2003) can be generalized to item solving (see also Jarodzka, Janssen, Kirschner, & Erkens, 2015; Lindner et al., in press; Saß et al., 2012). If multimedia principles are accounted for (cf. Mayer, 2014), presenting a picture in addition to a text probably facilitates comprehension of the problem situation because it may help students to disambiguate or abstract information from a text and facilitate mental model construction, also in testing situations.
Empirical studies of younger test-takers from the fourth to the ninth grade have so far tentatively supported these assumptions. On the one hand, there is accumulating evidence that pictures reduce item difficulty in the context of multiple-choice assessment (e.g., Hartmann, 2012; Saß et al., 2012), and Lindner et al. (in press) have moreover shown this effect to be highly stable across students and items. On the other hand, response time measures in a computer-based study showed that students did not spend more time on solving items that contained a picture in the item stem compared to corresponding text-only items (Saß et al., 2012). In this initial result on the processing of text and pictures, the comparable response time measures for text-only and text-picture items indicate that the time used to process a picture is compensated for by an acceleration of the item-solving process. However, the question concerning the stage at which this takes place remains open. Potential answers to this question are presented in the following section.

Processing text and picture in MC testing
In MC testing, eye-tracking is a suitable method for revealing cognitive processes because the components of MC items are typically displayed in distinct locations on the screen, with the item stem and question on top and the answer options below (see Fig. 1). Accordingly, Lindner et al. (2014) found that students mainly processed the question at the beginning and the answer options, particularly the subjectively preferred and chosen answer options, towards the end of the decision-making process. According to their findings, the item-solving process can be roughly divided into two phases: (1) an information-acquisition phase, in which students construct a mental representation of the problem or situation described in the item stem, and (2) a decision-making phase, in which students evaluate the answer options with respect to their likelihood of being the correct answer, followed by the actual choice of an answer.
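Because the item components occupy distinct screen regions, fixation time can be attributed to each component by defining rectangular areas of interest (AOIs). The following Python sketch illustrates this idea only; the AOI names, pixel coordinates, and fixation values are hypothetical and do not reproduce the layout or the analysis pipeline of the present study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Rectangular area of interest in screen pixels (hypothetical layout)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass(frozen=True)
class Fixation:
    x: float
    y: float
    onset_ms: float
    duration_ms: float


def dwell_times(fixations, aois):
    """Sum fixation durations per AOI; fixations outside all AOIs are ignored."""
    totals = {aoi.name: 0.0 for aoi in aois}
    for fix in fixations:
        for aoi in aois:
            if aoi.contains(fix.x, fix.y):
                totals[aoi.name] += fix.duration_ms
                break  # AOIs are assumed not to overlap
    return totals


# Assumed layout: stem text and picture on top, answer options below.
AOIS = [
    AOI("stem_text", 0, 0, 600, 300),
    AOI("picture", 600, 0, 1000, 300),
    AOI("answer_options", 0, 300, 1000, 700),
]

# Made-up fixations for one item (not real data).
FIXATIONS = [
    Fixation(700, 100, onset_ms=0, duration_ms=250),    # picture
    Fixation(100, 120, onset_ms=260, duration_ms=300),  # stem text
    Fixation(200, 150, onset_ms=570, duration_ms=200),  # stem text
    Fixation(400, 500, onset_ms=780, duration_ms=400),  # answer options
    Fixation(1500, 900, onset_ms=1190, duration_ms=100),  # off-screen
]

if __name__ == "__main__":
    print(dwell_times(FIXATIONS, AOIS))
```

Summing durations per AOI yields a dwell-time measure for each item component. Restricting the fixations to a time window (via `onset_ms`) would give a comparable measure for the information-acquisition and decision-making phases.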
Referring to multimedia learning theories (e.g., Mayer, 2009; Schnotz & Bannert, 2003), the picture should especially help in the information-acquisition phase; that is, when trying to construct a representation of the problem or situation stated in the item stem. Since the picture represents the problem as the text does, both media can be used to construct a mental model of the situation or problem, but a mental model can be constructed more directly based on the picture (analogical structure mapping; Schnotz & Bannert, 2003). Therefore, the processing of the text item stem should be facilitated by the picture, which can be expected to receive initial attention (cf. Lenzner et al., 2013). In consequence, less time should be required to read the text. A reduction in text-processing time should, however, go hand in hand with early picture processing, allowing students to initially extract the gist of the item and to thereby reduce the time required to encode and understand the corresponding verbal information (i.e., scaffolding; Eitel et al., 2013). Nevertheless, the processing time required for the picture might still outweigh any time saved by a shorter processing of the text, resulting in a longer overall engagement time with the item stem information in text-picture compared to text-only items. This could be desirable, as spending more time and using different