Introduction

The main focus of this research is to determine whether high school students can be taught to construct models of dynamic systems in a feasibly short period of class time and, if so, whether such instruction helps them learn more about the dynamic systems than they would otherwise have learned during that time. The research focuses on modeling because modeling has become a central, valued practice in the emerging national standards of the USA. In both the Common Core State Standards for Mathematics (CCSSO 2011) and the Next Generation Science Standards (NGSS 2013; NRC 2012), modeling is one of eight practices that are considered essential for high school students to master. However, “model” and “modeling” can denote many things. The first subsection below discusses types of models and why we chose to focus on just one type. The second subsection discusses types of educational activities that involve modeling and why we chose to focus on just one type. Subsequent subsections review the relevant literature and finally pose the research question in full detail.

Types of Models

In education, a “model” can denote many things. Although Collins and Ferguson (1993) present an impressively complete list of types of models, their list can be simplified into three basic categories:

  • Models that are expressed in an informal notation, and their implications are derived informally. Examples are compare-and-contrast analyses and taxonomies.

  • Models that are expressed in a formal notation, and their implications are derived manually. Examples are a concept map or a set of mathematical equations written on paper.

  • Models that are expressed in a formal notation, and their implications are derived by a computer. Examples are spreadsheets, NetLogo programs, and equations entered into a graphing calculator.

This research is focused exclusively on the third type of model, which uses computers to calculate the model’s predictions. Computer-based modeling is increasingly common in the professions, so this particular form of modeling is arguably an increasingly important practice to learn. Moreover, the standards place special emphasis on computational modeling. The Next Generation Science Standards’ discussion of modeling often mentions computational models and emphasizes the need for computational modeling tools (p. 59):

  • Curricula will need to stress the role of models explicitly and provide students with modeling tools (e.g., Model-It, agent-based modeling such as NetLogo, spreadsheet models), so that students come to value this core practice and develop a level of facility in constructing and applying appropriate models.

Both the Next Generation Science Standards and the Common Core State Standards for Mathematics distinguish between learning (1) skills and practices, such as computational modeling, and (2) domain concepts, principles and facts. Although the Dragoon project is addressing both instructional objectives, the work on teaching students how to construct models is reported in another publication (VanLehn et al. 2016). This paper covers only the work on using Dragoon to teach science.

Compared to other types of models, there are both advantages and disadvantages to using computational models for teaching science. On the one hand, an advantage of expressing the model in a formal language like mathematics or directed graphs is that the language is designed to make certain kinds of inferences easy and mechanical. Thus, if we follow Chi et al. (1994) in assessing deep understanding by asking students to make inferences that construct information that is not presented in the instruction, then formal models should facilitate such inference-making by the student. For example, telling a student that “the current through a two-terminal device is determined by the device’s resistance and the voltage across it” does not allow the student to make as many inferences as telling the student that “V = I*R, where V is voltage, I is current and R is resistance, all relative to the two-terminal device.” On the other hand, a disadvantage of formal models, regardless of whether they are interpreted computationally or by hand, is that they are expressed in a formal language that may be difficult for students to understand. Although the operators in the language might pose difficulties (e.g., in a concept map, what does the arrow between two concepts mean?), a more insidious factor is the tendency to use short names for components and quantities in the system, which makes it all too easy for students to lose track of what the names denote and leads them to manipulate the model in nonsensical ways (VanLehn 2013, section 7.2). For example, when interpreting V = I*R, students may lose track of which voltage V refers to. Lastly, when comparing the two types of formal models (human vs. computer calculation), a key advantage of the computational models is that much more complex systems can be analyzed feasibly if the computer does the calculations rather than having students do them. These observations suggest that computational models are most advantageous for learning science when the systems to be understood are complex and when the students already have fluency in the formal language of the model or can attain fluency easily.

Perhaps the most common types of computational modeling systems used in education are graphing calculators, spreadsheets, agent-based models and system dynamics models. The latter two require some explanation. An agent-based model is essentially a set of interacting computer programs, one for each type of agent. For instance, if one is modeling wolf-sheep population dynamics, one might have one program for the wolves and one for the sheep. NetLogo is currently the major language for agent-based modeling (ccl.northwestern.edu/netlogo/).

System dynamics models appear in two forms. When used in university engineering courses, a system dynamics model is expressed as coupled differential equations that are solved using MATLAB, Mathematica or similar systems. When used in social science, ecology, business and other university courses, a system dynamics model is expressed in a graphical notation called a stock-and-flow diagram. The notation, which appeared first in 1983 in Stella (http://www.iseesystems.com/), represents quantities as nodes in a directed graph. For instance, a wolf-sheep population model would have one node representing the number of wolves, a second node representing the number of sheep and other nodes representing other quantities.

For high school science instruction, graphical system dynamics languages seem to offer the best tradeoff along a scale of computational abstraction. Agent-based models are the most concrete in that they have visible objects that represent individuals in the system, such as a wolf, a sheep or a patch of grass. Such a model has a subprogram that represents the event of a wolf-sheep meeting that results in the wolf killing the sheep. This concreteness makes it easy to see what a given model represents and to envision or design a model for a given system. On the other hand, agent-based models require learning how to program NetLogo or some other language so that all those concrete details can be expressed. Thus, learning how to create an agent-based model is hard, but seeing what it represents is easy. This is just as one would expect of a more concrete model.

On the other end of the scale, the equations entered into graphing calculators are abstract but also familiar, because algebraic notation is taught in high school algebra classes. Once one has envisioned or designed an equation, entering it into a graphing calculator is easy. However, it is often difficult for students to see what the terms in an equation represent (Corbett et al. 2006). For instance, the frequency of fatal meetings between wolves and sheep would be expressed by a product of the number of wolves, the number of sheep and a parameter. It is not obvious why that product represents the number of predation events. Thus, constructing equations is easy, but it is hard to see what a given equation and its parts represent and to design equations to model a given system.
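To make the example concrete, a standard Lotka-Volterra-style sheep equation (a textbook form, not necessarily the one used in any particular calculator lesson) shows how the predation term appears:

\[
\frac{dS}{dt} = r\,S - k\,W\,S
\]

where \(S\) is the number of sheep, \(W\) the number of wolves, \(r\) the sheep birth rate, and \(k\) a parameter that absorbs how often a random wolf-sheep encounter occurs and ends in a kill. Nothing in the notation itself explains why the product \(k\,W\,S\) counts predation events; that interpretation has to be supplied by the student.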

Graphical system dynamics models are midway between concrete, agent-based models and abstract, equation-based models. For instance, instead of an anonymous subexpression to represent predation, like the product mentioned above, a well-built graphical system dynamics model would have a node with a long name, such as “number of fatal wolf-sheep meetings” or “number of predation events.” These and other practices make it easier for students to see what a given model represents and to design a model for a given system (Löhner et al. 2003). However, the graphical notation is less familiar than the algebraic one, so it may take longer for students to learn how to enter graphical models than equation-based models.

Although we chose to focus on graphical system dynamics models because we believe they offer a good tradeoff between two sources of difficulty, an ideal curriculum would probably include several types of modeling languages because each has its own advantages, and students preparing to enter our computationally rich world would probably benefit from mastering all of them. However, teaching multiple modeling languages addresses computational modeling as a practice, which is a different instructional objective than the one addressed here. This paper is concerned only with using computational modeling to teach scientific principles, systems and facts. For that instructional objective, it may suffice to have students attain fluency in just one modeling language.

Types of Modeling Activities

Model construction and model exploration are the two most common educational activities done with computational models (Alessi 2000b; Stratford 1997). Model construction involves writing out a model in a formal notation, running the model on the computer, checking whether the predictions make sense and match any data on hand, and perhaps repeating this process in order to improve the model or its predictions. Like programming a computer, model construction requires fluency in the formal notation as well as good problem solving skills. Figure 1, from the Common Core State Standards for Mathematics, is a diagram of the model construction process.

Fig. 1 Modeling is the process of formulating and debugging a model. From (CCSSO 2011)

In contrast, model exploration refers to manipulating aspects of a given computational model and observing the changes in its predictions. For instance, computational modeling systems typically let the user manipulate a slider in order to change the value of a model’s parameter, and this causes an instantaneous change in the gauges or plots that display the model’s predictions. For model exploration, it is not necessary for the student to see the model or to understand the modeling language. When the model is hidden from the user, and all the user sees are the controls (e.g., sliders) and prediction displays (e.g., graphs), the model is typically referred to as a simulation. Model exploration is the second major type of model-based learning activity.

According to the ICAP framework (Chi 2009; Chi and Wylie 2014), model construction should be more effective than model exploration, as the former is constructive (the “C” of ICAP) whereas the latter is often merely active (the “A” of ICAP). Although we know of no studies testing this hypothesis with graphical system dynamics models, Hashem and Mioduser (2010, 2011) compared model construction and model exploration with NetLogo. More specifically, they focused on two NetLogo models exemplifying emergence, an important concept in complex systems. They compared a 3 h lesson where students explored the two models to 48 h of instruction in NetLogo programming culminating in students constructing the two models. The model construction group learned more about emergence than the model exploration group. The difference in mean post-test scores was reliable and moderately large. This is consistent with the common-sense hypothesis that model construction takes longer than model exploration, but fosters deeper learning.

We decided to focus on model construction in the belief that it generally affords deeper understanding than model exploration. However, the key is reducing the time required for students to achieve enough model construction skill so that science lessons based on model construction become feasible. In the near future, we would like to run an experiment similar to that of Hashem and Mioduser but using Dragoon instead of NetLogo. We hypothesize that constructing Dragoon models produces larger learning gains than model exploration while requiring only 1 or 2 h of additional instructional time rather than 48 h. However, in our current studies, we compare Dragoon-based model construction only to baseline instruction that does not involve computational model construction. If Dragoon cannot beat baseline instruction, then there is no sense comparing it to model exploration.

However, “constructing a model of a given system” can mean many things depending on how the system is “given” (VanLehn 2013). For easy reference, let us define three points along a continuum of model construction tasks.

The most difficult model construction tasks merely identify the system but do not provide any information about it. Such tasks require the students to seek information empirically or in the literature, so they are a kind of inquiry activity. For example, such a task is: “Create a model of the elephant population of the Serengeti ecosystem covering the next 20 years.” This brief text is all that the students are given, so they must search for the current size of the elephant population, its historical sizes, factors that influence the population, etc. and eventually formulate all this information as a model.

At an intermediate level of difficulty are model construction tasks where the system is described completely and concisely. For instance, an example of such a description is:

  • As of August, 2014, there were 7535 elephants in the Serengeti ecosystem and an annual growth rate of 5 %. Construct a model of the population covering the next 20 years.

Although such problems provide relatively complete information and students are not expected to go to the literature to find out more about the system, even a complete system description may require making key assumptions, such as knowing that “growth rate” includes all sources of change in the elephant population, including births, deaths, emigration and immigration. A concise system description may also include extra, distractor information. Such model construction problems are like the word problems that are widely used in arithmetic and algebra classes.

Perhaps the simplest model construction tasks occur when students are first learning how to use the tools; these tasks provide a complete and concise description of the model as well as the system to be modeled. For instance, if the modeling language is algebraic equations, then such a description would be:

  • Construct a model of the Serengeti elephant population covering 2014 to 2034. Let E(t) be the herd population in year t. Assume E(2014) = 7535 and E(t + 1) = E(t) + 0.05*E(t).

If the students’ task is to construct a model using a graphing calculator or a spreadsheet, then they still have some work to do despite the concise description of the model. Similarly, a graphical system dynamics model can be described in terms of the nodes that need to be defined, and an agent-based model can be described in terms of the procedures each agent should follow.
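For illustration, here is a minimal sketch, in Python rather than in any of the modeling tools discussed above, of the work that remains: turning the given recurrence into something a computer can execute. The function name and structure are ours, not part of the original materials.

```python
def elephant_population(e_start=7535, growth_rate=0.05, start=2014, end=2034):
    """Iterate E(t+1) = E(t) + growth_rate * E(t) from the given initial value."""
    population = {start: float(e_start)}
    for year in range(start, end):
        population[year + 1] = population[year] + growth_rate * population[year]
    return population

print(round(elephant_population()[2034]))  # about 19993 elephants by 2034
```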

In summary, three types of model construction tasks have been defined: One identifies the system but does not describe it; the second describes the system completely and concisely; the third describes both the model and the system completely and concisely. This research focuses on the latter two types of model construction tasks, so let’s adopt some terminology:

  • An analytic model construction task provides a complete and concise description of the system to be modeled. Solving such problems requires analytic skill, and a student who can solve any such problem has acquired analytic mastery.

  • A notational model construction task provides a complete and concise description of both the model and the system. Solving such problems requires notational skill, because students must translate the model description into the notation of the model. A student who can solve any such problem has acquired notational mastery.

Although analytic model construction tasks are often used in instruction (e.g., Mandinach and Cline 1994), notational model construction tasks appear to be less common, and the distinction between them has not been noticed or defined before.

Prior Work on Educational Use of Graphical System Dynamics Modeling

There has already been considerable work on using graphical system dynamics modeling languages in education. It began in 1983 when Stella was first released. Educators were strongly impressed by the potential pedagogical benefits of Stella, and several large projects were conducted wherein teachers worked with researchers to co-design Stella-based instruction in a variety of disciplines (Doerr 1996). However, after a decade of studies, participants reached a discouraging conclusion (Alessi 2000a; Mandinach and Cline 1994, 1996; Zaraza and Fisher 1999). They first noted that there were three major ways to use Stella:

  1. Model exploration

  2. Model construction “word problems,” which are called analytic tasks in the terminology introduced above.

  3. Model construction of a non-trivial natural or engineered system. These are the first kind of model construction task mentioned earlier, which involves some type of inquiry.

Activity 1 (model exploration) was by far the most common activity, even among participants who had tried hard for many years to get their classes to do activity 3 (model-based inquiry). Activity 2 (analytic model construction tasks) was mostly done in math classes, whereas our focus is on science instruction. After several years, researchers and reviewers concluded that using model construction to learn science was rare because it required analytic mastery, which was not taught in math classes so it had to be taught in science classes. Although a few students attained sufficient mastery quickly, many students struggled to construct models even after many hours of instruction.

This suggested that the modeling language should be simplified in the hope that analytic mastery would be more quickly attained. This approach was pursued by two successful projects, Model-It (Crawford and Cullin 2004; Lee et al. 2011; Metcalf et al. 2000) and Betty’s Brain (Biswas et al. 2010; Chin et al. 2013; Leelawong and Biswas 2008; Schwartz et al. 2008, 2009; Segedy et al. 2012a, b). Both used a graphical modeling language that was similar to a concept map, in that nodes represented quantities and links represented how one quantity influenced another. Links were labeled with icons (in Model-It) or symbols (in Betty’s Brain) that indicated whether the influence was positive, negative, strongly positive or strongly negative. A key distinction in system dynamics modeling (between regular functions and integrals, i.e., stocks) was either absent (Betty’s Brain) or turned off for most students (Model-It). Thus, students primarily worked with qualitative descriptions of direct relationships between quantities. Lee et al. (2011) showed that inquiry instruction based on Model-It was more effective than inquiry instruction based on answering questions. Chin et al. (2010) showed that Betty’s Brain was more effective than instruction based on either concept maps or science kits.

While these studies of Model-It and Betty’s Brain showed benefits for model construction compared to instruction without models, they did require many hours of training from the students. When shorter studies compared model construction to baseline instruction, they usually did not show uniformly reliable benefits for model construction (VanLehn 2013). For instance, a particularly well-controlled study (van Borkulo et al. 2012) compared students working on Co-Lab to students reading texts and answering essay questions. There were differences in the expected direction in only 2 of 8 measures. However, the Co-Lab instruction was short: 150 min.

Although these simplified modeling languages make it possible to use model construction to teach science, their representational power is quite limited compared to Stella and other graphical system dynamics languages. Whereas Stella and similar languages are roughly equivalent to differential equations, the augmented concept maps of Model-It and Betty’s Brain are roughly equivalent to algebraic equations. They lack integrals (stocks), so they can only model static systems and not dynamic ones. Perhaps most importantly, they do not allow feedback loops, so they cannot model many important system behaviors such as homeostasis, oscillation or exponential decay. The positive empirical results with the simplified languages are encouraging in that they showed model construction was more effective than baseline instruction. However, the problem remains: How can students use models to learn about dynamic systems?

Meanwhile, early efforts to make object-oriented programming easy enough for children to do (i.e., Logo (Papert 1980), Smalltalk (Goldberg and Tenenbaum 1975)) eventuated in a new computational modeling paradigm, now called agent-based modeling. After several early developments (Boohan 1995; Neumann et al. 1999; Repenning et al. 2000), Uri Wilensky’s NetLogo (ccl.northwestern.edu/netlogo) emerged as the most popular agent-based modeling tool, although competitors such as Molecular Workbench (mw.concord.org) and SimSketch (Bollen and Van Joolingen 2013) are beginning to appear.

Case studies with NetLogo showed that there exist K-12 students who can construct models of non-trivial systems (Centola et al. 2000; Levy and Wilensky 2005; Wilensky 2003; Wilensky and Reisman 2006; Wilensky and Resnick 1999). Unfortunately, it proved to be difficult to replicate the success of case study participants in classrooms of students with variable backgrounds. For instance, Hashem and Mioduser (2010, 2011) found that it took 48 h of instruction in NetLogo programming to prepare students for a science lesson.

Because achieving analytic mastery takes too long for some students with both graphical system dynamics models and agent-based models, model exploration has become the dominant method for incorporating computational modeling into science instruction. Model exploration is the centerpiece of many interventions such as:

  • Biology: (Buckley et al. 2004; Hickey et al. 2003; Horwitz et al. 2010; Wilensky and Novak 2010)

  • Chemistry: (Chiu and Linn 2014; Levy and Wilensky 2009a, b)

  • Earth science: (Gobert and Pallant 2004; Svihla and Linn 2012)

  • Physics: (Sengupta and Wilensky 2009; Shen and Linn 2011)

In summary, the current state of the art in educational modeling uses model exploration almost exclusively. Although the standards are clearly calling for students to practice model construction (see Fig. 1), it is absent from the curriculum because analytic mastery takes too long for some students to acquire. Nonetheless, there are always a few students who master model construction quickly and deftly apply it to construct a deep understanding of systems, which keeps alive the hope that all students can somehow be taught model construction quickly enough that it can be a useful tool for understanding natural and engineered systems. This is the problem addressed by the project reported here. For the reasons discussed above, we are focusing on graphical system dynamics models.

Basic Technical Approach

One approach to making model construction feasible in schools is to reduce the time required to achieve analytic mastery enough that science classes can afford to teach analytic mastery instead of relying on math classes to do so. This project started by pursuing that approach and achieved some success. By simplifying the notation and having students practice solving modeling problems on a step-based tutoring system, it appears that most students who have had high school algebra can achieve analytic mastery with Dragoon in a maximum of 7 h of instruction, with a mean of about 5 h (VanLehn et al. 2016).

However, for most science instruction applications, spending 5 to 7 h on a prerequisite skill isn’t feasible. If teachers can only spend three or four class periods on teaching a particular dynamic system, then they can’t afford to spend most of that time getting students up to speed on analytic mastery. During the course of the studies described here, we discovered a more efficient instructional method: After a brief introduction to the modeling notation and the computer user interface, students are given notational model construction tasks for the target dynamic systems. Thus, they learn the target science concepts while they attain notational mastery. They do not attain analytic mastery, nor do they need it in order to learn the science.

The rest of this document presents first the model construction system, Dragoon, and then the studies where it was used for science instruction.

Dragoon

First, some terminology needs to be established. A system is just a part of the real world, and a dynamic system is a part of the real world that changes over time. A record of the system’s changes is called its behavior. A model is an expression in a modeling language that can be interpreted by a computer. Executing a model generates its predictions. An accurate model will generate predictions that match the system’s behavior.

Dragoon uses a graphical notation similar to the one pioneered by Stella (Doerr 1996). Figure 2 shows a simple model in Dragoon. Each node in the directed graph represents both a quantity and how its value is calculated. For every input to a node’s calculation, there is a link coming into the node from a node representing that input quantity.

Fig. 2 The Dragoon model construction system, showing a simple problem and a correct solution

  • A rectangle represents a quantity whose value is the integral of its inputs over time.

  • A circle represents a quantity whose value is an ordinary function (i.e., no integrals or differentials) of its inputs.

  • A diamond represents a parameter, which is a quantity with a constant value. Parameters have sliders so that the user can easily modify their value and observe the effect on the model’s predictions.

It is important to understand the difference between function nodes and accumulator nodes. Suppose a quantity X is determined by Y + Z; then X is represented by either a function node or an accumulator node with both Y and Z as input links. However, it is often unclear to students which type of node to use. They should use an accumulator if Y + Z represents how much X changes with each unit of time. In mathematical language, dX/dt = Y + Z or X = ∫ (Y + Z) dt. On the other hand, if X’s value at any time tick is equal to the value of Y + Z at that same time tick, then X should be represented by a function node. That is, X = Y + Z at all times. Using the model of Fig. 2 as an example, “moose population” is an accumulator because it is the integral over time of its inputs, “moose births” and “moose deaths.” On the other hand, “moose births” is a function because it is not an integral over time of its inputs, “moose population” and “moose birth rate”, but instead is a function (i.e., the product) of their current values.
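The distinction can also be stated in code. The sketch below gives one possible discrete-time reading of the Fig. 2 model; the parameter values, the death-rate structure and the one-year time step are illustrative assumptions, not the values used in Dragoon’s actual model.

```python
# Illustrative discrete-time reading of the Fig. 2 moose model (assumed values).
moose_birth_rate = 0.20   # parameter (diamond): a constant
moose_death_rate = 0.15   # parameter (diamond): a constant (structure assumed)
moose_population = 100.0  # accumulator (rectangle): integrates its inputs over time

for year in range(10):
    # function nodes (circles): recomputed from the current values at each time tick
    moose_births = moose_birth_rate * moose_population
    moose_deaths = moose_death_rate * moose_population
    # accumulator node: its inputs give the change per unit time, so they are added in
    moose_population += moose_births - moose_deaths
    print(year + 1, round(moose_population, 1))
```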

Authors construct a model by clicking on the “Create Node” button, and then filling in a form (Fig. 3) with a name for the node, its type (parameter, accumulator or function), its inputs and how its value is calculated. When the student clicks on the Graph or Table button, Dragoon pops up plots or tables of the quantities as a function of time (see Fig. 4). The sliders allow the user to temporarily change the value of the parameters and observe the resulting change in the plots and tables.

Fig. 3 The Dragoon node editor is a form. Filling in a field correctly causes it to turn green

Fig. 4 Graphs and table of the model’s predictions

What has been described so far is just the typical model construction system: an editor for constructing a model and displaying its predictions. When Dragoon is in author mode, this is all that the user has available. (Actually, there are a few more features in author mode. See (Wetzel et al. 2016) for details.) On the other hand, when Dragoon is in one of its student modes, it can give helpful feedback. This study used only immediate feedback mode, wherein Dragoon provides feedback on each step in the model-construction process. It colors an entry in the node editor green if its value matches the corresponding value in the author’s model, and red otherwise (Fig. 3). When too many mismatches have been made on an entry, Dragoon fills in the entry with the author’s value and colors it yellow. When the student closes the node editor, the colors of its entries are reflected in the color of its boundary (see Fig. 2).
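The feedback policy just described amounts to a small decision rule. The sketch below is only an illustration of that behavior, not Dragoon’s implementation, and the mismatch threshold is an assumption.

```python
MAX_MISMATCHES = 3  # assumed threshold; the actual limit used by Dragoon is not stated here

def color_entry(student_value, author_value, mismatch_count):
    """Return (color, displayed_value, updated_mismatch_count) for one node-editor field."""
    if student_value == author_value:
        return "green", student_value, mismatch_count
    mismatch_count += 1
    if mismatch_count >= MAX_MISMATCHES:
        # give up: fill in the author's value and mark the field yellow
        return "yellow", author_value, mismatch_count
    return "red", student_value, mismatch_count
```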

Dragoon has other features besides the ones discussed here (see (Wetzel et al. 2016) for descriptions), but they were not used in the studies discussed below. In these studies, students merely accessed Dragoon in order to solve an assigned modeling problem. They logged into Dragoon, creating an account if they had not done so already, found the assigned modeling problem in the menus, solved it, and clicked on Done.

Formative Evaluations

Our initial attempts to use Dragoon to teach science were conducted in late 2013. We first developed a short sequence of model construction problems that were intended to introduce students to Dragoon and modeling and give them sufficient skill in analytic model construction. The sequence involved only familiar systems, such as gaining and losing weight while dieting, so students did not need to learn any science in order to work through the introductory sequence. Pilot tests with 4 high school students indicated that everyone could work through the sequence in less than an hour.

In order to detect changes in students’ skill and understanding of system dynamics modeling, a pre-test and post-test were developed. They consisted of a series of questions about a familiar system: the accumulation and removal of litter on the school grounds.

Next, four science teachers met with researchers for a 2 day workshop. The teachers were all from the same high school. The four teachers taught biology, chemistry, physics and earth science, respectively. They learned how to use Dragoon and then worked with researchers to develop modules for their classes. All the modules began by having students work through the introductory sequence. The modules then split, focusing on topics chosen by the teachers.

When the modules were enacted, the instruction lasted from 1 to 3 h, spread over a small number of days. The pre-test and post-test took about 35 to 45 min each. Teachers did all the teaching; researchers only observed and conducted a few interviews after selected classes.

The formative evaluations succeeded in uncovering a large number of flaws in the Dragoon software, the instruction, the professional development, and many other things. This led to a complete redesign of the notation in order to simplify it. The Dragoon software was completely rewritten in JavaScript to run in a web browser, thus avoiding Java installation problems. We also learned to organize the instructional development differently. Using a single workshop to train teachers in using Dragoon and then expecting them to author problems and instruction was too challenging for most of the teachers. In subsequent work, we first taught teachers how to use Dragoon, and then had them work closely with researchers who did the actual authoring.

Most importantly, we found that students varied considerably in how fast they learned to use Dragoon. Some students required only a few minutes to understand the notation and learn how to use the interface. Others were still struggling after several hours of usage. For instance, students in the AP Physics class learned how to construct Dragoon models much faster than students in the biology class. This led us to develop different instructional approaches for different classes. The subsequent studies, presented next, used the rewritten Dragoon software and began implementing new practices.

Study 1: Physics

The purpose of study 1 was to evaluate Dragoon-based instruction in domain principles and concepts. The context was an AP high school physics course. Students who had already learned a little about the dynamics of falling bodies solved problems using either Dragoon or paper. This study occurred before we understood the distinction between notational mastery and analytic mastery. All the problems were presented using complete, concise descriptions of systems and not models. For instance, the descriptions did not use modeling terminology such as “accumulator” or “parameter.”

Design

The study had two instructional treatments which were different enough that they had to be run in different classes. Thus, all the students in one class used Dragoon-based instruction while all the students in a second class used paper-based instruction. The same instructor taught both classes. The students’ learning gains were measured using a pre-test and post-test with some shared items.

Subjects

Most of the students (75 %) were in 12th grade, most (67 %) were taking calculus, and most (76 %) had received an A grade in their most recent math class. In other words, these students had very strong mathematical backgrounds compared to most high school students.

There were 26 students in the Dragoon group and 29 students in the Control group. Although there were no statistically reliable differences between the groups in their math, computer science and physics background, the Control group had slightly more students (31 %) who programmed outside of class compared to the Dragoon group (19 %). The Control group also self-reported more familiarity with modeling than the Dragoon group.

Materials

The instructor developed the instructional materials, which consisted of five kinematics problems (see Fig. 5). The same kinematics principles and concepts were used in all the problems. The problems did not form a model progression. That is, each could be solved without having solved the earlier ones. The presentations were system descriptions rather than model descriptions. That is, the descriptions did not identify nodes nor indicate their types and inputs. Although the first two problems only asked the student to construct a model that would graph particular quantities, the remaining three problems required first constructing a model then answering questions about its behavior. Nonetheless, all the problems required the same model structure, which is shown in Fig. 6. Only the values of the parameters differed across problems.
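Although Fig. 6 itself is not reproduced here, the kind of model the problems call for can be sketched as a constant-acceleration parameter feeding a velocity accumulator, which in turn feeds a displacement accumulator. The numbers and time step below are illustrative assumptions, not the values from the actual problems.

```python
g = -9.8            # parameter: acceleration of a falling body (m/s^2)
dt = 0.1            # assumed time step (s)
velocity = 0.0      # accumulator: integrates acceleration over time
displacement = 0.0  # accumulator: integrates velocity over time

for _ in range(50):               # simulate 5 seconds of free fall
    velocity += g * dt
    displacement += velocity * dt

print(round(velocity, 1), round(displacement, 1))  # roughly -49 m/s and -125 m
```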

Fig. 5 The 5 problems used in Study 1

Fig. 6 A solution to the first problem used in Study 1

The assessments were composed of four subtests. The first subtest assessed the student’s skill at understanding dynamic systems and constructing tabular models of them. The second subtest consisted of solving traditional quantitative physics questions. The third subtest had students draw acceleration, velocity and displacement of falling objects over time. The fourth subtest asked students to draw a concept map that used a given set of node and link names. Although subtest 3 (drawing graphs) was given only on the post-test, the other three subtests were given during both the pre-test and post-test.

Procedure

The study spanned three consecutive days, with each class period lasting 50 min. Both the Treatment and Control classes were in the late morning. On the first day, the pre-test was given. On the second day, students solved the five instructional problems working as they normally did, wherein some students worked independently and some worked in pairs. The instructor gave the Dragoon class a brief demonstration of how to use Dragoon, and then told the students to solve the 5 problems. The Control students solved the same problems on paper while using their textbook as a resource. Although the instructor did not apply a uniform procedure for giving Control students feedback on their work, he did roam the classroom answering questions and inspecting students’ progress. Moreover, these students tended to ask for feedback if they had any doubts about correctness. On the third day, the post-test was given.

Scoring of the Pre-Tests and Post-Tests

The content of the pre-test and post-test will be described in the results section. Here the methods for scoring are described for items that could not be scored by an exact match to an answer on a coding sheet.

For open response items, researchers constructed rubrics and trained a coder to apply them. Unfortunately, a second coder was not used so we do not have inter-coder reliability measures.

For the concept maps, two experts generated “correct” concept maps. Student concept maps were compared to the two expert maps, to produce four sets of scores per expert:

  • Exact scoring counts propositions in the student map that exactly match propositions in that expert’s map.

  • Ignore direction scoring counts propositions in the student map that match propositions in the expert map when disregarding the arrow direction but taking into account the link label.

  • Ignore label scoring counts propositions in the student map that match propositions in the expert map when disregarding the relationship label but taking into account the arrow direction.

  • Loose scoring counts propositions in the student map that match propositions in the expert map if both the arrow direction and the relationship label are disregarded.

Thus, each student concept map received eight scores. The inter-rater reliability was assessed using generalizability theory. The amount of variability attributed to rater was high, and ranged from 26 % for the exact scoring to 49 % for the loose scoring. This means that scores from one expert map should only be compared to scores from the same expert map. Similarly, scores should not be compared across score types.
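The four rubrics differ only in which parts of a proposition they compare. The sketch below shows one way the counting could be implemented, assuming each proposition is a (source, link label, target) triple; the proposition format and the example terms are assumptions, not the coders’ actual materials.

```python
def score(student_props, expert_props, use_direction=True, use_label=True):
    """Count student propositions that match some expert proposition under one rubric."""
    def key(prop):
        source, label, target = prop
        ends = (source, target) if use_direction else frozenset((source, target))
        return (ends, label) if use_label else ends
    expert_keys = {key(p) for p in expert_props}
    return sum(1 for p in student_props if key(p) in expert_keys)

# Hypothetical example: one student proposition drawn with the arrow reversed.
student = [("acceleration", "increases", "velocity")]
expert  = [("velocity", "increases", "acceleration")]
print(score(student, expert))                                        # exact: 0
print(score(student, expert, use_direction=False))                   # ignore direction: 1
print(score(student, expert, use_label=False))                       # ignore label: 0
print(score(student, expert, use_direction=False, use_label=False))  # loose: 1
```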

Results: Pre-Tests and Post-Tests

The first subtest assessed the students’ skill at modeling system dynamics. On the pre-test, the two conditions hardly differed in their mean scores (Dragoon 17.42 (sd 2.98); Control 17.86 (sd 2.52); p = 0.56). On the post-test, the mean of the Dragoon students (18.24, sd 2.47) and the mean of the Control students (17.79, sd 2.74) were not significantly different (p = 0.54). Thus, it appears that on this measure, none of the students gained much, and the Dragoon students were no better at learning than the Control students.

The second subtest, which measured students’ ability to answer traditional physics questions, turned out to be too difficult for these students. Out of a possible 3 points, one Dragoon student scored 1 point on the pre-test; all other scores, on both pre-test and post-test, were zero. The Control students were only slightly better, with a mean score of 0.21 (sd 0.56) out of 3 on the pre-test and a mean of 0.31 (sd 0.54) on the post-test. None of the differences between means were significant. Thus, on this measure, there was a floor effect.

The third subtest, which measured students’ ability to draw the kinematic values of acceleration, velocity and displacement of a falling object over time, was given only during the post-test. Out of a maximum score of 6, the mean score of the Dragoon students (2.60, sd 1.29) was slightly lower than the mean score of the Control students (2.72, sd 1.16), but the means were not reliably different (p = 0.71).

The fourth subtest had students draw a concept map using a given list of physics terms. As mentioned earlier, there were eight scores per concept map (2 expert maps × 4 methods of matching). On all eight score types, the Control students scored higher than the Dragoon students on both the pre-test and the post-test. The gain scores (post – pre) averaged −0.66 for the Dragoon students and −2.46 for the Control students. This could be regression to the mean, perhaps caused by students putting less effort into the post-test concept maps.

Overall, the results suggest that neither group learned much physics nor much system dynamics modeling during the study. This may have been due to the short duration of the treatment. Moreover, these students had already been studying kinematics for 2 weeks. Clearly, they had much left to learn, as their scores were not high. Nonetheless, adding one more day of instruction onto the preceding 2 weeks appears not to have made much difference in their scores.

Results: Log Data

Because the Control students did not use a computer system for their problem solving, log data were available only for the Dragoon students. Thus, the analyses in this section comprise a brief description of the Dragoon students’ work processes.

Although the observers reported that most students were on-task for most of the time, as one would expect from such academically successful students, the mean time spent solving problems on Dragoon was low. Of the 55 min period, students had a Dragoon problem open for only 21 min on average. Although the initial 15 min (approximately) of the period were spent on classroom logistics and a demonstration of Dragoon, the students were not given any readings or workbooks, so one would expect that they would spend almost all the remaining 40 min working on Dragoon. One possible interpretation is that although every student had a laptop running Dragoon, some worked in pairs. Thus, they may have collaborated on solving a problem using one student’s computer, then opened Dragoon on the second computer only to copy the model from the first computer. Because the observer did not record the arrangement of students into pairs, this conjecture cannot be confirmed with the log data.

Even though students were given minimal instruction in how to use Dragoon, most figured it out. Figure 7 is a histogram of the number of problems solved. Only five students solved no problems at all. Although the mean number of problems solved was also low, 1.6 problems, many of the students (55 %) at least opened all the problems, even though they didn’t always complete them. Overall, these findings suggest that their difficulty was in solving the problems rather than in using Dragoon.

Fig. 7 Study 1, Dragoon students only, histogram of number of problems solved

Interpretation of Study 1

The bad news from this study is that our materials seemed to be somewhat advanced for these students, as neither group appears to have learned much new physics during the single day of instruction that they received. The good news is that some students seemed to have attained at least some proficiency in analytic model construction, in that most (24 of 29) were able to solve at least one Dragoon problem without much instruction on how to use Dragoon. However, we do not know how much help students were giving each other, so there is considerable uncertainty about this interpretation. On the other hand, observers of both Study 1 and the formative evaluation that occurred the preceding year in the same classrooms reported informally that the students seemed to have no trouble using Dragoon. This would be plausible given the strong backgrounds of these AP physics students. However, if the formative evaluation of the other classes is any guide, then rapid skill acquisition is not likely to occur in more typical high school science classes. Thus, the next study used a more typical class, notational tasks instead of analytic ones, and a longer, more structured instructional sequence than the one used in Study 1.

Study 2: Physiology

The purpose of study 2 was to compare Dragoon to baseline instruction over a longer period of instruction using a highly structured workbook of notational tasks. The context was four high school physiology classes in a school in California. This study pioneered the use of concise model descriptions for the target systems.

Design

Two of the four classes used Dragoon instruction and the other two used baseline instruction. Two teachers were involved. One taught a baseline class; the other taught both Dragoon classes and a baseline class. Both conditions used paper workbooks. The workbooks contained the same expository material (text and images) and the same multiple-choice and open response questions. The main difference was that the Dragoon classes constructed models in Dragoon whereas the baseline classes constructed models by filling out tables and drawing graphs.

The classes proceeded in parallel. That is, all took the pre-test on the same day, had instruction on the same days, and took the post-test on the same day.

Participants

The two teacher participants were veteran physiology teachers, but had seldom used technology in their classrooms. They were interested in seeing if Dragoon could help them meet the Next Generation Science Standards for modeling, but were in general neutral about the benefits of technology for students.

There were 95 total participating students: 50 were in control classes and 45 were in Dragoon classes. Most of the students (72 %) were in 10th grade. The remainder were evenly split between 11th and 12th grade. The majority of the students (60 %) were currently taking geometry and had taken algebra I during the preceding year. Only one student was in calculus. The average grade point for Dragoon students in their math class in the preceding year was 3.0 vs. 2.9 for the control students. Few of the students (5 %) had taken programming classes. In short, these students were more representative of the whole high school population than the AP Physics students of Study 1.

Materials

The materials consisted of two instructional workbooks, a pre-test and a post-test. The main workbook taught students about digestive energy balance, whereas the other workbook addressed blood glucose regulation. The second workbook was to be used only by students who had finished the first workbook. The post-tests did not address knowledge of blood glucose regulation.

The Dragoon workbook for digestive energy balance was authored by researchers with input from the teachers. Of its 27 pages, the first 14 pages were heavily illustrated, detailed, step-by-step instructions for logging into Dragoon and constructing a model. The remaining 13 pages were a progression of 8 levels. At each level, the students constructed a model in Dragoon or extended a model that they built in an earlier level, and then answered questions about its behavior. Models or model extensions were described in terms of nodes, because students were expected to have only notational mastery and not analytic mastery. The Appendix shows the Dragoon problems for the first 4 levels. Level 8 was a 30-node model that included the impact of exercise, height, gender, age, digestive activity expenditures and proportion of fat to lean body mass on energy expenditure and storage. Energy was expressed in calories. All parameter values and model relationships were accurate with respect to the physiology literature. The model exhibited and explained the surprising (to the students) fact that once you have gained fatty tissue, reducing your caloric intake to its former value is not sufficient to burn off the excess fat.
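To convey the flavor of the model without reproducing the 30-node Level 8 version, the sketch below couples an intake parameter to a weight accumulator whose expenditure depends on current weight. All structure and numbers here are illustrative assumptions, not the workbook’s nodes or parameter values.

```python
calories_per_kg = 7700    # assumed kcal stored per kg of body fat
intake = 2500             # kcal/day, illustrative parameter
expenditure_per_kg = 32   # kcal/day per kg of body weight, illustrative parameter
weight = 70.0             # kg, accumulator

for day in range(365):
    expenditure = expenditure_per_kg * weight          # function of the current weight
    weight += (intake - expenditure) / calories_per_kg

# Weight drifts toward the equilibrium where intake equals expenditure,
# i.e., intake / expenditure_per_kg ≈ 78 kg in this sketch.
print(round(weight, 1))
```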

The non-Dragoon instructional workbook for digestive energy balance was adapted from the Dragoon one. It had students fill in tables and draw graphs instead of constructing models.

The blood glucose workbooks were similar to the energy balance ones; they culminated in the widely cited “minimal model” of glucose-insulin homeostasis (Bergman et al. 1979).

In short, although these workbooks were simple enough that high school sophomores could use them, they enabled students to construct models similar to those used by professionals. For instance, wrestlers and jockeys who monitor weight closely use versions of the energy balance model built by the students.

The pre-test consisted of five questions about energy balance and homeostasis (and a questionnaire about the students’ background). The five questions required either short essays or mathematical derivations. Two questions involved interpretation of graphs.

The post-test consisted of 11 questions about digestive energy balance (and 7 questions seeking students’ opinions about Dragoon and their instructional experience). Of the 11 physiology questions, 6 asked the student about information that was explicitly presented in both workbooks:

  • One question asked students to draw a concept map using a set of given terms.

  • Another asked students to define or write a description of a system; it listed a few concepts to include and asked students to be as specific as possible and to use examples of systems.

  • Four multiple-choice questions were of the form “what factors directly affect…” and listed some factors.

These questions asked students to recall the model (i.e., energy balance and the variables and relationships affecting it) rather than draw inferences from it.

The remaining questions on the post-test had students apply the energy balance model in different ways:

  • One multi-part question provided part of the model as a set of equations and asked for numerical answers given numerical values.

  • Another multi-part question provided data for some model values and asked students to apply the model to predict weight gain/loss.

  • Two questions asked students to draw qualitative graphs (i.e., the axes did not have numbers on them) showing the energy balance over time under two different conditions.

  • Another question presented graphs of two individuals’ weights over time, and asked the student to compare and contrast them, using the model.

These questions, which asked students to apply the model, could be considered moderately deep in that the answers were not in the instructional material, so students would need to make inferences in order to answer the questions. Moreover, some of the required inferences had been done routinely by the students in the baseline condition as they constructed tables, whereas making such inferences was probably infrequent in the Dragoon condition, because Dragoon did the arithmetic and drew the graphs. Although the post-test could be considered somewhat biased toward the baseline condition, even the questions that asked for application of the model did not strike us or the instructors as particularly deep, using the Chi et al. (1994) definition of “deep.” They were typical of questions normally used for assessment in this class.

Procedure

The implementation window lasted 6 days with the first and last day devoted to pre-test and post-test respectively. The classes met for 54 min each. The Control classes met at roughly the same time of day as the Treatment classes.

For the Dragoon classes, the teacher introduced students to the idea of systems modeling and a researcher introduced students to Dragoon on Day 2. On Day 3, the researcher continued the introduction to Dragoon by working through an example Dragoon problem with the whole class. This took about 10 min. Students then started working through the workbooks. The first half of the Dragoon workbook was devoted to teaching students how to understand the notation and how to use Dragoon. Although lengthy due to the many screenshots, students seemed to proceed relatively quickly through this introductory material, and all finished it by the end of Day 3. On Days 4 and 5, students continued to work through the workbooks and solve energy balance problems with Dragoon.

For the Control groups, the teachers introduced students to the idea of systems modeling on Day 2. Students worked through the workbook problems in groups on Days 3 and 5. Because the Dragoon classes had most of a day devoted to learning Dragoon (Day 3), the Control classes had an open day (Day 4) devoted to topics that were irrelevant to energy balance. Thus, the Control and the Dragoon classes had approximately the same amount of class time, 2 days, for solving problems, and they took the pre-tests and post-tests on the same days.

In both the Dragoon and Control classes, students worked in small, self-selected groups at their own pace. Students were encouraged to collaborate and ask each other for help. The teacher circulated among the students, keeping them on task and providing help on the physiology when asked. Although workbooks were collected at the end of each day and handed back at the beginning of the next day, they were not scored or marked between days. The teachers told students that they would be examining the student’s work and test results, but they were vague about when and whether the tests would contribute to the students’ grades.

Although the Dragoon students received feedback immediately from Dragoon as they solved problems, feedback was delayed for the Control students. Teachers checked the Control students’ work as they roamed the classroom, and ensured that by the time a workbook was handed in, all the work on it was correct. After the experiment was concluded, researchers confirmed that all Control workbooks were filled out correctly with at most minor unintentional errors such as arithmetic mistakes.

In short, the procedure tried to control for some important variables, including time available for problem solving, amount of feedback, time of day, and amount of collaboration. However, this was a small scale, real-world study, so control was weak.

Results: Pre-Test and Post-Tests

Scoring of the pre-tests and post-test was done in the same way as scoring was done in study 1 (see section 4.5). Once again, we collected inter-coder reliability measures for the concept maps but not for the other test items.

On the pre-tests, the mean score of the Dragoon students was 1.69 (SD = 1.58) out of a maximum score of 6. The mean score of the Control students was 1.16 (SD = 1.04), which was reliably lower (p < .01; two-tailed).

On the post-tests, excluding the concept-mapping question, the mean score of the Dragoon students was 4.53 (SD = 1.71) out of a maximum of 10 points possible. The mean score of the Control students was 3.59 (SD = 1.52), which was reliably lower (p = .006) with a moderately large effect size (d = 0.62; where d = difference in post-test scores / standard deviation of control post-test score). However, because the pre-test scores between the two groups were reliably different, an ANCOVA was run with pre-test score as the covariate. This showed that the Dragoon group performed reliably better than the Control group (p = .029) with a medium effect size (d = 0.47, where d = difference in adjusted post-test scores / standard deviation of pooled adjusted post-test scores).
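For readers who want to reproduce this kind of analysis, the sketch below shows one way the ANCOVA and the adjusted effect size could be computed. It is not the authors’ analysis script, and the data-frame column names are assumptions.

```python
import numpy as np
import statsmodels.formula.api as smf

def ancova_effect_size(df):
    """df has columns 'post', 'pre' and 'group' ('dragoon' / 'control') -- assumed names."""
    ancova = smf.ols("post ~ pre + C(group)", data=df).fit()   # pre-test as covariate
    # Adjusted post-test score: remove the part of 'post' predicted by 'pre' alone.
    adjusted = df["post"] - smf.ols("post ~ pre", data=df).fit().fittedvalues
    g1 = adjusted[df["group"] == "dragoon"]
    g0 = adjusted[df["group"] == "control"]
    pooled_sd = np.sqrt(((len(g1) - 1) * g1.var(ddof=1) + (len(g0) - 1) * g0.var(ddof=1))
                        / (len(g1) + len(g0) - 2))
    d = (g1.mean() - g0.mean()) / pooled_sd
    return ancova.pvalues, d
```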

On the post-test question that asked students to draw a concept map using a set of given physiological terms, the results were marred by missing data. Concept maps were not drawn by 9 of the Dragoon students and 7 of the Control students. As in study 1, the maps were scored by comparing them to an expert’s map using four different rubrics. Dragoon students scored higher than the Control students on all four rubrics, but none of the differences were statistically reliable.

One question appeared on both the pre-test and post-test. It displayed two graphs, one showing a linear increase in weight over time and the other showing a linear decrease in weight over time. The question asked, “Compare and contrast the two graphs and explain what you think might be going on with each person’s weight based on your understanding of energy balance including energy storage, energy ingestion, energy expenditure, and the respective individuals’ starting weights.” This open-response question was intended to provide an assessment of students’ overall understanding of the system. The mean scores on the post-test were higher than the mean scores on the pre-test for both groups (Dragoon: 49 → 80 %; Control 42 → 50 %), which suggests that both groups learned from their instruction. The Dragoon group’s gain was reliably larger than the Control group’s gain (p < .01 using an ANCOVA with pre-test score as the covariate).

There was a small positive relationship between the students' composite of math grade and final math grade and both their pre-test score (r = 0.12) and post-test score (r = 0.20). This suggests that students who had previously excelled in math may have performed slightly better on the pre- and post-tests.

Results: Log Data Analyses

Log data were available only for students in the Dragoon group, as the Control students worked only on paper. Because there were 45 Dragoon students and problem solving occurred on days 3, 4 and 5, there would ideally be 3*45 = 135 log files, but 11 (8 %) were missing. It can be assumed that these represent students who were absent from class that day.

Although the class periods were nominally 54 min each, the Dragoon students had to walk to the computer lab, start the computers, and later shut down the computers and walk back to their classroom, which reduced the amount of time they had available for solving problems. In addition, only part of day 3 was devoted to solving problems. Observers estimated that the total available problem solving time was approximately 105 min for the Dragoon students. The log files showed that students spent an average of 91 min with a Dragoon problem open, which seems consistent with the estimated 105 min available, since they also had to read and write in the workbook between Dragoon problems. In contrast, the Control students solved problems on days 3 and 5 (but not day 4), and they spent almost the whole 54 min class period doing so on both days because they did not go to the computer lab. Observers estimated that they had about 96 min of available problem solving time. The observers reported that almost all the students in all classes were on-task almost all the time. These figures suggest, albeit approximately, that time-on-task was about the same for the two groups. However, the Dragoon group had to spend about 45 min learning about Dragoon; this extra "cost" was incurred during day 3.

There were correlations between the students' scores on the post-test and measures of their work in Dragoon. However, these correlations might be partially due to the students' prior knowledge and diligence. In order to at least partially factor out incoming competence, adjusted post-test scores were used. A student's adjusted post-test score is their post-test score minus the score predicted from their pre-test score alone. Figure 8 shows a scatterplot of the Dragoon students' adjusted post-test scores versus the amount of time they spent problem solving over the 3 days. Even with pre-test performance factored out, there is still a moderate correlation (R = 0.47; p < .05). Similarly, students who completed more problems also scored higher on the post-test; Fig. 9 shows a moderate correlation (R = 0.39; p < .05). Although these are correlations and not a demonstration of a causal relationship, they suggest that the more work one does in Dragoon, the more one learns about physiology.
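As an illustration of the adjustment just described, the sketch below computes adjusted post-test scores as residuals from a regression of post-test on pre-test scores and then correlates them with problem-solving time. The data, distributions and variable names are all assumptions invented for illustration.

```python
# Minimal sketch of the adjusted post-test score and its correlation with time.
# All values below are randomly generated placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 45                                          # number of Dragoon students
pre = rng.normal(1.7, 1.6, n).clip(0, 6)        # hypothetical pre-test scores
minutes = rng.normal(91, 20, n).clip(30, 150)   # hypothetical time with a problem open
post = 1.5 + 0.6 * pre + 0.02 * minutes + rng.normal(0, 1, n)  # hypothetical post-test scores

# Adjusted post-test score = post-test score minus the score predicted from the
# pre-test score alone (i.e., the residual from regressing post on pre).
slope, intercept, *_ = stats.linregress(pre, post)
adjusted_post = post - (intercept + slope * pre)

# Correlation between the adjusted post-test scores and problem-solving time.
r, p = stats.pearsonr(adjusted_post, minutes)
print(f"r = {r:.2f}, p = {p:.3f}")
```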

Fig. 8 Study 2, Dragoon students, adjusted post-test versus time

Fig. 9 Study 2, Dragoon students, adjusted post-test versus problems completed

The energy balance instruction in both groups comprised 8 problems. The log files show that the Dragoon students completed a mean of 4.7 problems, and only 8 % of the Dragoon students completed all the problems. In contrast, 100 % of the Control students completed all their problems. Clearly, the Dragoon students were moving more slowly through the curriculum. Without further study, we cannot determine why.

Interpretation of Study 2 Results

As usual in small-scale classroom studies, many important variables were not adequately controlled. In Study 2, these included the initial competence of the two groups (as measured by pre-test scores), the teacher, the amount of feedback given to students and the amount of time available for solving problems. Nonetheless, the experiment was strong enough to eliminate a few possibilities.

The experiment suggests that students do not need to have analytic mastery before they can learn science by constructing models. As our companion paper shows (VanLehn et al. 2016), even with the tutoring and scaffolding of Dragoon, it still takes students many hours to achieve analytic mastery. The Dragoon students in Study 2 were given only notational problems (i.e., the text described both the system and the model). We do not know how many attained analytic mastery, but it would be a welcome surprise if any did. And yet the Dragoon students still managed to learn about the science of physiological energy balance.

The experiment also suggests that the extra cost of using model construction can be measured in minutes instead of hours. Unlike the Hashem and Mioduser (2010, 2011) study, where model construction students received an extra 48 h of instruction, the Dragoon students in this study received only about 45 min of extra instruction on how to construct Dragoon models. This instruction consisted of a demonstration by a researcher and working through the first part of a workbook containing many screen shots and low-level instruction on usage and notation.

The biggest mystery in the Study 2 data is why the Dragoon students completed so few problems and yet learned more physiology than the Control students. While the Control students completed all 8 problems, the Dragoon students completed an average of only 4.7 problems and attempted 6.8. They may have read ahead in the workbook without doing the problems. Further study is needed to explain how they learned as much physiology as they did.

The main result of Study 2 is that instruction that includes solving Dragoon problems appears to be more effective for learning about a complicated physiological system than the same instruction with problems done on paper instead of Dragoon. However, given the lack of control over variables, this result must be considered encouraging but not at all definitive. Thus, we conducted another study that also compared Dragoon to baseline instruction.

Study 3: Population Dynamics

The purpose of study 3 was to see if the positive results of study 2 could be found in a new task domain, population ecology, with new teachers and students. The new instruction also featured a new instructional activity adapted from university studies of Dragoon (Iwaniec et al. 2014).

Design

Two classes used Dragoon instruction and one class used baseline instruction. Both treatments involved students working through paper workbooks in their normal classrooms. However, the Dragoon students also worked on laptops and constructed Dragoon models, while the baseline/control students filled out tables and drew graphs instead of constructing Dragoon models. Both groups took a pre-test, mid-test and post-test.

The study occupied three consecutive class meetings, and all classes started the study within 1 day of each other.

Participants

Two veteran biology teachers participated. They had not participated in Study 2, but enthusiastically volunteered when told about Study 2. Neither teacher routinely used technology in their classrooms.

The students were AP Biology students. A total of 59 students participated: 41 in the two Dragoon classes and 18 in the control class. Most of the students (58 %) were in 10th grade. The remainder were in 11th grade (35 %) and 12th grade (7 %). Some of the students (34 %) had taken programming classes.

The majority of the students (73 %) were currently taking trig/pre-calculus and 27 % were taking calculus. The grade point average in the preceding year's math class was 3.17 for the Dragoon students vs. 3.00 for the control students.

Materials

The materials consisted of two instructional workbooks, a pre-test, a mid-test, a post-test and a background questionnaire. The workbooks will be described first.

The first workbook explained the difference between linear, exponential and logistic growth of populations. It then had students do exercises, one set for each type of growth. Each set of exercises started by presenting information about the population and asking students fill-in-the-blank questions about it. Then students built the model either in Dragoon or by filling in tables and drawing graphs. The exercise set ended with more questions about the population.
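For readers unfamiliar with the three growth patterns, the short sketch below expresses each one as the kind of one-step accumulator update that students encode as nodes in a Dragoon model. All parameter values are arbitrary assumptions chosen for illustration, not values from the workbook.

```python
# Minimal sketch of linear, exponential and logistic population growth as
# one-step accumulator updates; all parameter values are arbitrary assumptions.
def simulate(update, population=100.0, steps=20):
    """Iterate a one-step population update and return the trajectory."""
    trajectory = [population]
    for _ in range(steps):
        population = update(population)
        trajectory.append(population)
    return trajectory

linear      = simulate(lambda p: p + 10)                          # constant increment per step
exponential = simulate(lambda p: p + 0.10 * p)                    # increment proportional to population
logistic    = simulate(lambda p: p + 0.10 * p * (1 - p / 500.0))  # growth slows near carrying capacity 500
```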

The second workbook reviewed exponential growth, then covered predator–prey models. It started with the Lotka-Volterra model, which assumes both predators and prey would have exponential growth if they were not interacting. This model is unstable and always causes the predators to die off and sometimes the prey as well. The second model assumes that the prey would have logistic growth if it were not for the predators. This model always exhibits damped oscillation that converges on steady levels of both populations.
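The contrast between the two predator–prey models can be sketched with the simple fixed-step simulation below; only the prey growth term differs between the two runs. The parameter values, step size and function names are assumptions chosen for illustration, not the workbook's actual values.

```python
# Minimal sketch of the two predator-prey models; parameters are illustrative only.
def predator_prey(prey_growth, steps=2000, dt=0.01, prey0=40.0, pred0=9.0,
                  predation=0.02, conversion=0.01, pred_death=0.3):
    """Fixed-step simulation; prey_growth(prey) is the prey growth term."""
    prey, pred = prey0, pred0
    history = [(prey, pred)]
    for _ in range(steps):
        d_prey = prey_growth(prey) - predation * prey * pred   # prey births minus predation losses
        d_pred = conversion * prey * pred - pred_death * pred  # predation-fed births minus deaths
        prey += dt * d_prey
        pred += dt * d_pred
        history.append((prey, pred))
    return history

# Original Lotka-Volterra: prey grow exponentially in the absence of predators.
original = predator_prey(lambda prey: 0.1 * prey)
# Modified model: prey grow logistically toward a carrying capacity of 100.
modified = predator_prey(lambda prey: 0.1 * prey * (1 - prey / 100.0))
```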

Besides the difference in the way models were constructed, the main difference between the Dragoon workbook and the control workbook was that the Dragoon workbook began with 2 pages devoted to explaining Dragoon's notation. Although the Dragoon workbook in the preceding study had a 14-page tutorial showing students how to manipulate the Dragoon interface, the tutorial material was reduced in this study because the observers felt that students learned the user interface better from the demonstration and from trial-and-error. The two-page explanation of Dragoon consisted of a 1-page explanation of the notation and two simple model-construction exercises done on paper that focused on discriminating between functions and accumulators.

All the model construction activities were presented in exactly the same way to both groups. They described the system with one bullet per node, as in Fig. 10. Because the bullets did not explicitly tell the Dragoon students which type of node to define, this presentation could be considered slightly more difficult than the notational tasks used in Study 2. However, this bulleted presentation still relieves students of a considerable amount of analytic work: it identifies the quantities and the formulas for calculating each quantity. Dragoon students must still supply a node type, but otherwise this bulleted presentation is close to the notational presentations of Study 2.

Fig. 10 Presentation of a model construction problem in Study 3

Because all three classes used a jigsaw activity (described next), there were two versions of the first workbook. One used lions as the population, and the other used zebras as the population. Otherwise, they were identical.

The pre-test and post-test were very similar to each other, differing only in the "cover stories" of the problems (e.g., a horse population on the pre-test was replaced by a fox population on the post-test). Each test consisted of 5 open-response items, with a maximum score of 45 points. The first item asked students to provide a definition of a system, to invent and describe two population growth systems, and to compare them. The second item asked students to sketch graphs of exponential growth and logistic growth, then compare their underlying populations. The third item asked students to complete the construction of a mathematical model of the logistic growth of a horse population using a fill-in-the-blank format. The fourth item presented two graphs with labelled points and asked students to explain what might be going on with each population, especially at the labelled points; both graphs showed the prey population of a predator–prey system, but one was generated by the original Lotka-Volterra model and the other by the modified Lotka-Volterra model. The fifth item presented two graphs with damped oscillations, one labelled "rabbit population" and the other labelled "wolf population," and asked students to explain what was going on in this system.

The mid-test asked students to draw a concept map using a set of concepts (noun phrases such as “predator deaths” and “prey birth probability”) and a set of links such as “affects” or “part of.” The same concept mapping task was also included in the pre-test and post-test. The background questionnaire asked students about their grade level, programming classes, math classes, grades in the most recent math class, and whether they felt it was important to do well on the pre-test.

Procedure

The implementation window lasted 3 class sessions of 100 min each. Each class worked at its own pace through the instructional activities except that 40 min of the first and last day were devoted to pre-testing and post-testing respectively. The background questionnaire was given as part of the pre-test.

Three activities occurred between the pre-test and post-test. First, the students were asked to work independently through the initial workbook; before they began, the experimenter demonstrated Dragoon to the Dragoon students. Second, the students completed the concept-mapping mid-test. Third, the teachers placed students in pairs and had them work on the second workbook.

All the classes used a simple jigsaw activity. When working individually, half of each class worked through the lion workbook, and the other half worked through the zebra workbook. When the students were placed in pairs, one student had studied lions and the other had studied zebras. When the pairs started the second workbook, their first task was to help each other construct a model of the species that they had not studied. The models were structurally isomorphic, but the values of the parameters were different. The second task in the workbook was to collaboratively construct the Lotka-Volterra model. The third task was to collaboratively construct the modified Lotka-Volterra model.

As in Study 2, the teachers roamed both the control and Dragoon classrooms, checking students' work, answering questions and giving feedback. The Dragoon students also got immediate feedback on their work from Dragoon itself. We did not attempt to equate the amount of feedback given to the two groups, because we view Dragoon's feedback as an integral part of the treatment and a possible source of its benefits.

Results

In the control condition, three pairs of students finished early. In the Dragoon condition, two pairs finished early. Unfortunately, the log data were lost, so we cannot provide a more detailed analysis of timing data than that.

Non-Concept Map Results

This subsection reports results of the pre-test and post-test, excluding the concept maps. All test items were scored using rubrics developed in consultation with the instructors. All items on all tests were scored by two raters, and inter-rater reliabilities were measured with Cronbach's alpha. The minimum reliability was 0.660 and the average was 0.822, where 0.7 is considered acceptable for low-stakes testing.
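As a reference for the reliability measure, the sketch below computes Cronbach's alpha across two raters' scores for the same set of tests. The scores shown are invented for illustration only.

```python
# Minimal sketch of Cronbach's alpha for two raters; the scores are invented.
import numpy as np

def cronbach_alpha(ratings):
    """ratings: 2-D array-like, one row per test, one column per rater."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                         # number of raters
    rater_vars = ratings.var(axis=0, ddof=1)     # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of the summed scores
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

# Hypothetical scores given by two raters to the same ten tests.
scores = [[7, 8], [5, 5], [9, 8], [6, 7], [4, 4],
          [8, 9], [3, 4], [7, 7], [6, 5], [9, 9]]
print(round(cronbach_alpha(scores), 3))
```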

Both the pre-test and post-test had 5 questions, but some students did not answer them all. Rather than estimate the missing data, we excluded tests that did not have all 5 questions answered. Thus, the number of students (N) differs across tests.

The pre-test scores of the two groups were not significantly different (two-tailed t-test, equal variances assumed, p = 0.376). The score for the Dragoon group (mean 26.6, standard deviation 5.8, N = 39) was slightly higher than the score for the control group (mean 24.9, standard deviation 7.7, N = 18).

Using an ANCOVA with pre-test score as the covariate, the adjusted post-test scores of the Dragoon group (mean 31.00, standard deviation 6.00, N = 36) were higher than those of the control group (mean 24.00, standard deviation 6.96, N = 18). The difference was reliable (p < .001) and large (d = 1.00).

Concept Map Results

This section reports results from the concept mapping tasks, which were given during the pre-, mid- and post-tests. As in Study 2, concept maps were scored with 4 criteria. Since the scoring procedures were the same as in Study 2, we assumed that a generalizability analysis would again advise against aggregating scores, so we used only one rater and compared only scores from the same criterion. We present here the scores from the Loose criterion; the pattern of results is the same with the other scores.

For concept maps drawn during the pre-test, the control group scored slightly higher (mean 26.3, standard deviation 12.4, N = 21) than the Dragoon group (mean 24.7, standard deviation 12.8, N = 44). The difference was not reliable (p = 0.64).

For concept maps drawn at the mid-test, the control group again scored slightly higher (mean 29.9, standard deviation 14.4, N = 18) than the Dragoon group (mean 27.5, standard deviation 13.1, N = 41). When the mid-test scores were compared in an ANCOVA with pre-test scores as a covariate (control adjusted mean 27.7, Dragoon adjusted mean 28.5), they were not reliably different (p = 0.62).

For concept maps drawn during the post-test, the Dragoon group now scored higher (mean 32.5, standard deviation 12.5, N = 38) than the control group (mean 28.9, standard deviation 16.0, N = 18). In an ANCOVA with pre-test as a covariate, the Dragoon adjusted score was higher (mean 33.0, standard deviation 10.7) than the control adjusted score (mean 27.3, standard deviation 11.7), and the difference was marginally reliable (p = 0.08) and moderately large (d = 0.49).

Aptitude-Treatment Interaction

Although the small number of control subjects makes it impossible to test statistically for an aptitude-treatment interaction, Fig. 11 presents scatter plots that give a qualitative sense of which students benefited the most.

Fig. 11 Scatter plots of pre- vs. post-test scores; concept maps on right

The scores on the non-concept map tests are shown in the scatterplot on the left. There might be a trend for students with higher pre-test scores to benefit more from Dragoon. However, in the concept map scores shown in the scatterplot on the right, it appears that the students with lower pre-test scores may have benefited more from Dragoon. Thus, it appears that future study will be needed before we can tell which students benefit the most from model construction activities using Dragoon.

Interpretation of Study 3

Study 3’s results suggest that the Dragoon students learned more than the control students. That is certainly good news, but what was the time cost? Although we only have rough observations of time-on-task, it appears that both groups of students spent about the same amount of time: they had the same number of class periods, and roughly the same number of students apparently finished all the materials. This is good news, because the Dragoon group in Study 2 needed more time (one extra class period) and solved fewer problems. The greater speed of the Dragoon group in Study 3 may be due to the reduction of the tutorial content from 14 pages in Study 2 to 2 pages in Study 3.

Another interesting difference between Study 2 and Study 3 was the use of a jigsaw activity in Study 3. That is, in both the control and Dragoon groups, students worked individually during the first half of the instruction, modeling the population dynamics of either zebras or lions. During the second half, a zebra student and a lion student worked together to model predator–prey interactions. The major benefit of Dragoon over the non-Dragoon materials may have occurred during the second half, because neither the pre-test nor the mid-test scores differed significantly between the groups.

Conclusions

Computational model construction activities can be used for many pedagogical purposes, such as acquiring skill in model construction, conducting an advanced type of inquiry, understanding the epistemology of science or developing ownership of knowledge (VanLehn 2013). The research presented here focuses exclusively on two instructional objectives: understanding and applying domain principles (Study 1) and understanding the function of a specific system (Studies 2 and 3). In general terms, the research asks: how can model construction be used to teach science concepts, principles and facts?

Although the goal is to teach science using computational model construction, the major impediment has always been this: it takes too long for students to acquire enough competence in computational model construction for model construction to deepen their understanding of the systems and principles they are modeling. Ironically, then, skill acquisition stands in the way of concept/principle acquisition.

This situation, which was reviewed earlier, suggests using step-based tutoring to teach the cognitive skill of model construction, because such systems are often effective at teaching cognitive skills (Kulik and Fletcher 2015; Ma et al. 2014; Steenbergen-Hu and Cooper 2013, 2014; VanLehn 2006, 2011). Thus, we built Dragoon as a step-based tutoring system with the goal of reducing the time to master model construction to a few hours, so that students could then use their skill to learn science. We also used a graphical language for expressing models that was simpler than the traditional stock-and-flow language and yet had more expressive power than the simplified languages of earlier work (Chin et al. 2010; Lee et al. 2011). We then used pilot studies with college students in lab settings to iteratively refine Dragoon and a sequence of training problems.

When the formative classroom evaluation reported here began, we believed that the Dragoon training sequence would take high school students about an hour to complete and would adequately prepare them to learn science via model construction. However, the formative evaluation showed that there was huge variation in students' ability to acquire skill in model construction and hence in their success at using model construction to learn science. Although students in the AP physics classes acquired model construction skill quickly, students in the other three classes did not.

This informal observation was confirmed in Study 1. Students in the AP physics classes did not even need the training sequence: their teacher demonstrated how to use Dragoon, and then the students learned how to use it by themselves while the teacher circulated among them.

This huge variation in the speed of acquiring model construction skill is consistent with the literature. Case studies indicate that selected students can quickly learn how to construct models and then use their skill to gain impressive scientific insights (Blikstein and Wilensky 2010; Centola et al. 2000; Sengupta and Wilensky 2009; Wilensky 2003; Wilensky and Reisman 2006; Wilkerson-Jerde et al. 2014). On the other hand, there are studies with representative samples of students that showed little benefit from model construction and/or little acquisition of model construction skill (Levy and Wilensky 2009b; van Borkulo et al. 2012). Finally, there are studies with longer training times that did show benefits for representative samples of students (Hashem and Mioduser 2011; Lee et al. 2011).

This led us to develop a third simplification. (The modeling language and the tutoring system count as the first two.) In earlier work, including Study 1, model construction problems were presented as typical word problems—a concise description of the system was given. Starting with Study 2, Dragoon students received a concise description of the model as well as a concise description of the system. This was done intentionally, so that the students could concentrate on understanding the system and not on the challenge of translating words into model relationships. They still had to understand the formal notation, so they presumably acquired notational mastery as they translated text descriptions of the model into the formal, graphical language of the model. It seems that notational mastery suffices for learning science via model construction, at least in Studies 2 and 3.

It seems that problems that provide students with a concise description of the model lie midway in difficulty between model exploration and traditional analytic model construction problems. Model exploration is still easier for instructors to use, because students need to understand neither the notation of the models nor the user interface for constructing them. On the other hand, students who must construct a model of the system are required to pay some attention to the quantities and relationships in the model, and some of these quantities and relationships may not be visible in a model exploration exercise. Thus, one would expect model construction students to learn more about the internal workings of the model and hence of the system.

There were hints in the data that Dragoon may be more effective when used by pairs of students than when used by individuals. In the formative evaluations, students worked mostly individually, whereas in Study 2 they worked in pairs. In Study 3, they worked initially alone and then in pairs, and they may have learned more when working in pairs. This hypothesis would need to be tested; these observations are too informal to be trusted. However, if there is high variation in the speed of acquiring model construction skill, as argued earlier, then working in pairs would mean that at least one of the students is likely to have above-average skill in model construction, and that may suffice for both students to learn the science content.

Our main goal was to demonstrate that model construction could become a feasible method for teaching science, and the experiments showed without a doubt that this goal was met. In all three studies, the amount of time required for Dragoon instruction was similar to the amount of time required for the control instruction. Moreover, Studies 2 and 3 suggest that Dragoon may have provided better knowledge of the target system, as one would expect of model construction when compared to instruction without model construction. Although Studies 2 and 3 lacked the controls of rigorous laboratory experiments, and the results on learning gains would therefore need replication before they can be trusted, the prime goal of establishing feasibility requires that such studies be conducted in the messiness of real-world classrooms. In that context, the combination of notational mastery, a step-based tutoring system, a simplified notation for graphical system dynamics models and perhaps working in pairs seems to have enabled teachers to feasibly use model construction, a method that had been abandoned as infeasible by many earlier advocates.

A good next step in this line of research would be to compare model construction to model exploration. In addition, it would be interesting and even necessary to try this form of instruction in more science classes in order to find out where it works best.