Cracking the code: An evidence-based approach to teaching Python in an undergraduate earth science setting

Scientific programming has become increasingly essential for manipulating, visualizing, and interpreting the large volumes of data acquired in earth science research. Yet few domain-specific instructional approaches have been documented and assessed for their effectiveness in equipping geoscience undergraduate students with coding and data literacy skills. Here we report on an evidence-based redesign of an introductory Python programming course, taught fully remotely in 2020 in the School of Oceanography at the University of Washington. Key components included a flipped structure, activities infused with active learning, an individualized final research project, and a focus on creating an accessible learning environment. Cloud-based notebooks were used to teach fundamental Python syntax as well as functions from packages widely used in climate-related disciplines. By analyzing quantitative and qualitative student metrics from online learning platforms, surveys, assignments, and a student focus group, we conclude that the instructional design facilitated student learning and supported self-guided scientific inquiry. Students with less or no prior exposure to coding achieved similar success to peers with more previous experience, an outcome likely mediated by high engagement with course resources. We believe that the constructivist approach to teaching introductory programming and data analysis that we present could be broadly applicable across the earth sciences and in other scientific domains.


Introduction
Motivation

Data programming has become the foundation of research in today's geoscientific disciplines. As the volume and size of earth science data sets have steadily increased, so have the complexity and ubiquity of the computational techniques used for analysis and visualization. Some argue that innovation in earth science research will increasingly be driven by one's competency in translating ideas into computer code (Jacobs et al., 2016).

The field of oceanography is no exception to this "data tsunami," with more hydrographic casts collected in the past two decades than over the previous 100 years (Brett et al., 2020). Unprecedented collaborative initiatives such as the Argo profiling float array (Wong et al., 2020), the National Science Foundation's Ocean Observatories Initiative (OOI; Greengrove et al., 2020), and remote sensing platforms such as satellite altimeters (Scheick et al., 2023) are continuously adding to expansive, publicly available data sets. In addition to these observational programs, hard drives at institutions across the world are being filled with terabytes of data generated by numerical simulations. From highly resolved ocean general circulation models to lower-resolution global climate models assessed in the Intergovernmental Panel on Climate Change (IPCC) reports, the natural ocean is being reproduced with ever-increasing fidelity (Haine et al., 2021). The resulting challenges in accessing and analyzing these data require new computational tools that enable truly open science, further motivated by the notion that "research conducted openly and transparently leads to better science" (National Academies of Sciences, Engineering, and Medicine, 2018). At the same time, the computational methods used to study the ocean, which have traditionally differed between modeling- and observation-focused oceanographers, remain "radically unstandardized," contributing to scientific code that is influenced by unique requirements and social contexts and may deviate from best practices in software engineering, as highlighted by an ethnography of oceanographers' programming practices (Kuksenok et al., 2017).

Domain-specific computational coursework and data literacy are thus a critical part of a modern oceanographic undergraduate curriculum, and we infer that the same applies across many geoscience disciplines. While students can collect and analyze small-scale data sets through hands-on fieldwork and labs that are common elements of undergraduate earth science curricula, working with larger, professionally collected data sets requires familiarity with a programming language (Kastens et al., 2015). Historically, introductory programming education has been the responsibility of computer science departments, with a focus on data structures and algorithms. Geoscience-specific programming instruction will necessarily reflect distinct goals and tools compared to computer science (Grapenthin, 2011) or data science (Anderson et al., 2015; Lasser et al., 2021), namely, the use of coding to derive insight into natural systems through mathematical manipulation, visualization, and interpretation of idiosyncratic data, often in the time and space domains. Yet scientific computing is often absent in earth science curricula, including oceanography (Old, 2019), except for highly scaffolded coding modules in courses where programming is not the focus (e.g., Rowe et al., 2021).
In this void, brief but intensive hands-on workshops like those offered by Software Carpentry (https://software-carpentry.org; Wilson, 2016), Data Carpentry (https://datacarpentry.org/; Irving, 2019), and scientific societies (e.g., Arms et al., 2020) have provided crucial training to young scientists. These short workshops, however, give learners limited opportunities to apply new coding skills to their own research in a supervised setting. In lieu of formalized instruction, many earth science students teach themselves programming during research experiences or in graduate programs, which can lead to the propagation of ad hoc, inefficient, and outdated practices.

Incorporating programming into an earth science curriculum additionally opens the door to a constructivist approach to teaching scientific concepts, one that encourages students to use experimentation and individualized, self-guided inquiry to build on previous learning, construct new knowledge, and engage in critical reflection (Bada, 2015; Hadjerrouit, 2008). The iterative, reflective process of writing and refining scientific code makes it naturally suited to this individualized model of learning. In practice, a constructivist pedagogy, much like programming instruction, often involves active techniques such as project-based investigation, cooperative learning, and inquiry-based activities, which have been shown to improve student competencies in information literacy.

Why teach Python?

In an introductory classroom setting, the choice of programming language matters. Python is an ideal candidate, as it is easy to learn, versatile, and free to use. First released three decades ago, Python is increasingly ubiquitous within earth science (Lin, 2012) and is widely used outside the scientific community, particularly in industry, making it valuable even for students seeking a career outside of academia (Srinath, 2017). The language features concise, easily read, higher-level syntax that allows one to focus on data exploration, enabling more efficient science (Ayer et al., 2014; Jacobs et al., 2016; Lin, 2012). For those learning programming for the first time, a primary challenge is thinking algorithmically, that is, developing structured code to solve a problem. Compared to Python, lower-level programming languages commonly taught in introductory computer science courses (such as Java and C++) require substantial syntactical overhead that can distract from achieving that pedagogical goal (Pears et al., 2007; Srinath, 2017).

Python offers other advantages (Gentemann et al., 2021). Its open-source nature has fostered a large, active developer community, which has contributed to its stability and the dissemination of numerous multipurpose packages that extend its functionality. Python is free to download and use, avoiding reliance on expensive commercial solutions that can render analysis code inaccessible to scientists outside of well-resourced university environments. These characteristics stand in contrast to MATLAB, a scientific programming language also popular in geoscientific research. Despite the clear benefits of teaching Python in an earth science context, we find only one documented example of an instructional approach for a quarter- or semester-long course in the existing literature (Jacobs et al., 2016).
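As a brief illustration of that conciseness (not drawn from the course materials), the sketch below reads a dated time series and plots its monthly means in roughly a dozen lines; the file and column names are hypothetical.

    # A minimal sketch of Python's concise, high-level syntax for data
    # exploration. The CSV file and column names here are hypothetical.
    import pandas as pd
    import matplotlib.pyplot as plt

    # Read a dated sea surface temperature time series into a DataFrame
    data = pd.read_csv("sst_timeseries.csv", parse_dates=["date"], index_col="date")

    # Resample daily observations to monthly means in a single call
    monthly_mean = data["sst"].resample("MS").mean()

    # Plot and label the result
    monthly_mean.plot()
    plt.ylabel("Sea surface temperature (°C)")
    plt.title("Monthly mean SST")
    plt.show()

Equivalent code in Java or C++ would require class definitions, explicit typing, and manual file parsing before any data exploration could begin.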

Our study reports on an evidence-based redesign of an undergraduate oceanography course that teaches introductory Python data analysis techniques. In subsequent sections, we highlight key course elements (summarized schematically in Fig. 1) and assess the efficacy of the redesign from the standpoint of student engagement and learning.

Course history and development

S.C.R. established and previously taught "Methods of oceanographic data analysis" (OCEAN 215) annually in the School of Oceanography at the University of Washington from 2015 to 2019. It was the first introductory Python course offered by the department and met in person twice each week in two-hour sessions that featured a mix of traditional lecturing and dedicated homework time. Over a ten-week quarter, students completed four assignments using programming techniques taught in lectures. The course was well received by students, who rated it as "very good" (4 on a scale from 1 to 5) across a variety of metrics in end-of-quarter evaluations from 2015, 2016, 2017, and 2019 (Fig. 2), and it was perceived as demanding relative to other courses in students' curricula (see Fig. S1 in Supplemental Materials).

However, faculty teaching other courses in the department's curriculum reported that many students who completed OCEAN 215 had difficulty with core Python programming tasks. A review of past senior theses (projects in which students formulate and execute original research) revealed that students often used minimal scientific code and reverted to less versatile, non-coding solutions like Microsoft Excel and Google Earth for data visualizations, to the detriment of their science. Given that students recognized the usefulness of the course content after completing the course (see Fig. S1 in Supplemental Materials), we partially attribute their subsequent hesitancy and lack of confidence in applying Python skills to weaknesses in the course design, some of which are prevalent across undergraduate education:

• An overreliance on non-interactive lectures. This is commonplace: in a survey of almost 200 undergraduate oceanography professors, for example, three-quarters indicated that they use data in their teaching but are most likely to use a lecture teaching strategy, rather than creating opportunities for active inquiry (McDonnell et al., 2015). As detailed above (see Introduction section "Active learning"), traditional lecturing is less effective at promoting student understanding and retention of material than active learning techniques.

• A lack of student-driven inquiry. In assignments, students answered prescribed questions and worked with tidy, unrealistically clean scientific data. Such a controlled environment is valuable for practicing basic skills but offers students few opportunities to pose their own questions and engage in "open inquiry," which Banchi & Bell (2008) associate with deeper, more original scientific thinking.

• A stagnation of curriculum. Since the course's launch in 2015, the scientific computing landscape has rapidly evolved (Gentemann et al., 2021). However, certain course elements not reflective of current scientific Python practices were still taught, resulting in the use of outdated, unsupported, and unnecessarily limiting packages and methods. At the same time, the course did not formally address essential programming practices such as commenting etiquette, formulaic code debugging, and use of external documentation.

The course was restructured (Fig. 1, Table 1) and subsequently co-taught during a 10-week quarter in 2020 by two graduate students (E.C.C. and K.M.C.), both of whom had served as TAs in past years. Twenty-five undergraduate students completed the course, a typical class size (Fig. 2). The plurality were third-year oceanography majors. No prior knowledge of computing or upper-level math was required or assumed.
Elements retained from previous iterations included the basic format of four structured programming assignments as well as twice-weekly classes and office hours; however, the latter were conducted virtually rather than in a physical classroom space.

In 2020, the COVID-19 pandemic forced a swift transition to virtual instruction. The timing of this course in Autumn 2020, however, allowed for careful planning of an online learning framework, rather than the forced adoption of emergency remote instruction necessary in the first half of 2020 (Donham et al., 2022; Hodges et al., 2020). Nonetheless, disruptions outside of the classroom were still present: students dealt with being isolated on campus or sequestered at home with family, research programs had to be reconfigured, mental health declined, and many became sick or had loved ones fall ill or even pass away (Furman & Moldwin, 2021). With these realities in mind, the course redesign also paid special attention to the need for a supportive and accommodating learning environment (Shay & Pohan, 2021).

The updates to the course were guided by past experience as TAs, consultation with previous teaching teams and department faculty, the need for fully virtual instruction during the COVID-19 pandemic, and a desire to infuse the course with active learning strategies. Changes included flipped video lessons delivered on the online platform Panopto, an individually driven final research project, content that reflected the current scientific Python ecosystem (including cloud-based notebooks; see Table 1), discussions on the online question-and-answer (Q&A) forum Piazza, analysis of data from a wider range of earth science domains, encouragement of pair collaboration and use of external resources, and a syllabus with explicit policies, expectations, and the following end-of-quarter student learning outcomes:

• Understand why the Python programming language is ideal for data analysis.
• Write, execute, and debug Python code.
• Access, read, transform, visualize, and interpret oceanographic data with confidence using Python.
• Explore the ever-expanding universe of packages and tools available for creating and sharing code.
• Formulate and investigate scientific research questions using programming and data analysis skills.
• Adopt best practices in programming and data visualization that facilitate collaboration and information-sharing, both within the classroom and the broader scientific community.

All course materials were original, created by the graduate instructors, and are available for free reuse and adaptation under a CC-BY-4.0 license at https://ethan-campbell.github.io/OCEAN_215/.

Methods
We qualitatively assess the effectiveness of instructional approaches in Autumn 2020 using descriptive examples from the quarter. We also quantitatively analyze the data from standardized course evaluations, an end-of-quarter student survey, graded assessments, and engagement/usage metrics provided by the video and Q&A platforms. Various student-specific engagement and performance metrics were collected by the co-instructors (E.C.C. and K.M.C.), as described in sections below. Prior to analysis, all metrics were de-identified and coded by a coauthor (M.N.) who was not directly involved in quantitative analyses; identified versions were not used thereafter. This study was approved as qualifying for exempt status for institutional review by the Human Subjects Division at the University of Washington.

Initial, mid-quarter, and end-of-quarter surveys

To gauge initial exposure to the Python programming language and to coding in general, students were asked to share their prior experience(s) in an introductory survey issued during week 1 (Assignment #0). The instructors translated students' short-answer responses into a numeric rating (1-5) using a subjective analysis of their word choice (see rubric in Table S1 in Supplemental Materials). The factors considered were any previous coding languages learned, the reported efficacy of past learning experiences, and time since last exposure to coding.

Standardized Instructional Assessment System (IAS) course evaluations were administered at mid-quarter and at the end of the quarter. While some questions were consistent across years, others evolved in their wording and thus required mapping or aggregation to enable comparison between years (as shown in Table S2 in Supplemental Materials). Questions that could not be tracked across years were excluded. Students completed surveys either in paper or online format, with the class response rate of around 70% in 2020 being somewhat higher than in past years (Fig. S1 in Supplemental Materials). As IAS summary reports correspond to specific instructors, we averaged the class median responses between the two graduate instructors for each question in 2020.

Students' written responses to open-ended evaluation prompts, listed in Table S3 in the Supplemental Materials, were also analyzed. In addition to excerpting quotes from students' responses, we identified common or unique themes mentioned by students and tabulated the frequency with which each theme was mentioned in either a subjectively positive context (e.g., an appreciative or affirming comment; assigned a value of +1) or subjectively negative context (e.g., an unenthusiastic or critical comment; assigned a value of -1) (Fig. 3).

In addition to the university-managed IAS surveys, a Google Form survey was administered during the week after the final class to measure students' perceived success relative to the main objectives outlined in the syllabus. The response rate was 92%. Submissions were not anonymous, but instructors guaranteed that students' responses would not impact their final course grades. As a final self-assessment of students' Python skills, we use responses to the question, "How proficient do you feel in writing, executing, and debugging Python code?", which were on a 6-point scale from "Least proficient" to "Most proficient."

Flipped video viewership

The Panopto video platform provides viewership statistics to instructors. Student-specific usage metrics, where applicable, were downloaded, and student identities were anonymized as described above. Usage data are presented in Fig. 4, Fig. 5a, and Fig. S2 in the Supplemental Materials. Student-specific Panopto metrics computed for Fig. 6 include total minutes watched, minutes watched before the class for which a video was assigned, and minutes watched after class for the first time (i.e., late views).
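As an illustration of how per-student metrics of this kind can be derived, the sketch below aggregates hypothetical per-session viewing logs into the three quantities named above; the column names and log format are assumptions, not Panopto's actual export schema.

    import pandas as pd

    # Hypothetical per-session viewing log: one row per viewing session,
    # with minutes watched and timestamps (column names are illustrative)
    sessions = pd.read_csv("panopto_sessions.csv",
                           parse_dates=["view_time", "class_time"])

    # Flag sessions that occurred before the class for which the video was assigned
    sessions["before_class"] = sessions["view_time"] < sessions["class_time"]

    # Identify first-time views that occurred after class (i.e., late views)
    first_views = sessions.sort_values("view_time").drop_duplicates(
        subset=["student_id", "video_id"], keep="first")
    late_first_views = first_views[~first_views["before_class"]]

    # Aggregate the three per-student metrics used in Fig. 6
    metrics = pd.DataFrame({
        "total_minutes": sessions.groupby("student_id")["minutes"].sum(),
        "minutes_before_class": sessions[sessions["before_class"]]
            .groupby("student_id")["minutes"].sum(),
        "late_first_view_minutes": late_first_views
            .groupby("student_id")["minutes"].sum(),
    }).fillna(0)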

Final grades and programming skills
To measure learning outcomes, students' final grades and programming skills at the conclusion of the course are presented in Fig. 6. Grades were recalculated to ignore assignments that students did not complete (i.e., dropping grades of 0%), and the following weights were re-applied: 60% for assignments #0-#4 (weighted equally), 15% for Piazza posts, and 25% for final projects. Original and recalculated final grades averaged 95.0% and 95.9%, respectively, with standard deviations of 5.7% and 3.8%. Programming skills were evaluated as the fraction of the Python syntax (functions, operators, and methods) taught in the course that was used at least once in each student's final project code notebook (see Table S4 in the Supplemental Materials). This metric varies widely between students, from 6% to 29% of all syntax keywords taught, and thus offers significant discriminatory power, albeit limited by our exclusion of miscellaneous functions that were not taught in the course but were used by some students at higher skill levels.

Online forum engagement

Piazza, the online Q&A platform, also makes usage statistics available to instructors. The following student-specific metrics (presented in Fig. 6) were downloaded, then anonymized as described above: days online, answers, and total contributions (which include questions, notes, answers, and comments). Additionally, a time series of engagement was constructed (Fig. 5a) based on unique users per day, as provided by Piazza. The time series was supplemented by a manual tabulation of daily Piazza activity within the following categories: student questions and notes related to programming; student scheduling, extension, or logistical requests; student answers and comments; student posts that were required for assignments; and instructor posts, answers, or comments. Where relevant, those categories were further divided by chosen audience into total posts that were public and signed, public and anonymous, or private (i.e., visible to instructors only), as shown in Fig. 5b.
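The sketch below illustrates the two calculations described under "Final grades and programming skills" above: recalculating a weighted grade after dropping uncompleted assignments, and estimating the fraction of taught syntax used in a project notebook. The keyword inventory and file handling are simplified placeholders for the actual procedure and the full list in Table S4.

    import json

    # Placeholder inventory of taught functions, operators, and methods
    # (the full inventory appears in Table S4 of the Supplemental Materials)
    TAUGHT_SYNTAX = ["np.mean", "plt.scatter", ".groupby", "for", "if", "def"]

    def syntax_fraction(notebook_path):
        """Fraction of taught syntax used at least once in a Colab/Jupyter notebook."""
        with open(notebook_path) as f:
            nb = json.load(f)
        code = "\n".join("".join(cell["source"])
                         for cell in nb["cells"] if cell["cell_type"] == "code")
        # Simple substring matching, for illustration only
        used = {kw for kw in TAUGHT_SYNTAX if kw in code}
        return len(used) / len(TAUGHT_SYNTAX)

    def recalculated_grade(assignment_scores, piazza_score, project_score):
        """Weighted grade (60/15/25), ignoring assignments left uncompleted (None)."""
        completed = [s for s in assignment_scores if s is not None]
        assignment_avg = sum(completed) / len(completed)
        return 0.60 * assignment_avg + 0.15 * piazza_score + 0.25 * project_score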

Student focus group
Five students from the course volunteered to participate in a post-course focus group (invitations were issued over a month prior to selecting students). Three focus group sessions were held in the quarter following Autumn 2020, each lasting 1-2 hours. In the sessions, E.C.C. and K.M.C. asked questions designed to provoke open and candid discussion on students' perception of course elements. Insights gleaned from the focus group are clearly denoted in the text. We use them as supporting evidence to depict students' perspectives about the course more holistically and accurately, and to indicate areas where students felt the course could be modified to improve their experience.

Additionally, at the request of E.C.C. and K.M.C., four of the five students shared short testimonials detailing their unique experiences in the course, which are presented in Box 1. The testimonials were assembled from students' responses to their selection of a subset of the guiding questions included as Table S5 in the Supplemental Materials and were edited for style and grammar. As noted below in Author Contributions, the five undergraduate students were offered coauthorship on the basis of their substantive intellectual and written contributions to this study and were full participants in providing input on the final manuscript. The undergraduate student coauthors did not have access to the anonymized student metrics described above and did not participate in analysis of the data.

Course Elements

Course content

The redesigned course taught fundamental Python syntax, as well as data management and research practices (Table 1). Students learned core functions (see Table S4 in the Supplemental Materials) from packages widely used in climate-related disciplines, along with considerations of accuracy and accessibility when choosing colormaps for visualizations (Thyng et al., 2016). These concepts were introduced using examples and data from oceanographic disciplines (physics, chemistry, biology, and marine geology) and other domains (e.g., cryosphere, atmosphere, and climate) using scaffolding to familiarize students with new topics.

That said, the most novel aspect of this course was not its content but rather how it was taught. As we discuss in the following sections, an effective learning environment was created through the use of evidence-based pedagogical elements: a mix of flipped lectures and engaging activities, opportunities for student collaboration, an online discussion forum, a student-designed research project, and efforts to center accessibility and foster classroom community.

Cloud-based notebooks

Course coding took place in Google Colab, a free, cloud-based environment for writing and running Python notebooks. Notebooks can be easily shared with instructors for grading purposes, similar to Google Docs, and built-in edit history can confirm students' compliance with deadlines. While constraints exist, such as a lack of transparent package management, computational limitations, and the need for an internet connection, the advantages of Google Colab outweigh its disadvantages in a classroom setting.

Flipped structure
Blended learning models have been shown in a systematic review to improve the learning experience of novice programmers, as they allow class time to be reserved for active learning and afford students more flexibility to plan and customize their study (Alammary, 2019). In our course, a flipped classroom approach was implemented by assigning 14 recorded lessons of approximately 30 minutes each to be watched before synchronous (Zoom) sessions. Most lessons consisted of lectures using slides that illustrated Python concepts using multiple representations, which has been suggested as a core pedagogical strategy for teaching programming (Hadjerrouit, 2008). For example, slides introducing a new concept would often include three distinct representations: a simplified overview of syntax and function arguments, a minimal example of the function or concept being used (e.g., Fig. 1b), and a schematic or illustrative plot. Consistent fonts, color schemes, and other design elements were used to reliably indicate relationships between concepts and distinguish examples from core syntax. Some lessons used live-coding demonstrations rather than slides. Accompanying Colab notebooks were provided with each lesson to allow students to run code while watching.

The 14 flipped lessons were divided into 41 tightly scripted segments of about 10 minutes each (see Fig. S2c in Supplemental Materials). This was done with the goal of helping students maintain focus, as some evidence suggests the average student has an attention span of 15-20 minutes during traditional lecturing (Middendorf & Kalish, 1996). In addition to segmenting videos, students were reminded to take breaks between segments. Students in the focus group indicated that they indeed used these opportunities to step away and refocus. While one student reported in their final course evaluation that "occasionally the length of the recorded lectures prevented [them] from finishing them entirely," we find no significant correlation between video or lesson duration and fraction watched (see Fig. S2f, Fig. S2h in Supplemental Materials).

In total, students spent 166 hours watching lesson videos on the Panopto platform. Two-thirds of the watch time occurred before the class for which the video was assigned (Fig. 4). Most lessons were released 1.5-3 days before the Zoom class meeting, and students generally watched lessons during the 24 hours prior to class. The remaining one-third of total watch time occurred throughout the month following the relevant class, of which three-quarters were first-time views. This indicates that some students attended class without having watched videos, but did so later, perhaps while completing assignments. Students in the focus group expressed that they appreciated the opportunity to watch videos at a convenient time. Some shared that they would have viewed videos immediately before class regardless of release timing, while others said they would have taken advantage of a longer period of availability. Half of students watched nearly every video, with class-wide average video completion between 80% and 90% in most weeks (Fig. 5a). Completion rates dropped near the end of the course, which student focus group participants suggested was due to high end-of-quarter demands in other courses and because the material covered did not appear in assignments.
The flipped structure appears to have enabled a diversity of strategies for content acquisition. Some students in the focus group re-watched videos to review material or used corresponding slide decks for the same purpose, while another student took notes on the videos and later referenced those notes. In final course evaluations, students noted that having slide decks available benefitted their learning (Fig. 3), with one student sharing, "I was able to surprise myself with how much I could figure out through review when feeling helpless at first." Despite the addition of watching flipped videos (as well as a final project) to the overall course workload, students reported in final evaluations that the amount of time they spent each week was similar to past quarters. Yet students reported that out of the total time spent on the course, a greater fraction than in past quarters (nearly 90%) was valuable in advancing their education, and that their participation was higher (Fig. 2). The flipped structure generally received students' approval in course evaluations (Fig. 3).

Synchronous class sessions

In-class sessions were conducted using the Zoom platform. Each synchronous class started with simple icebreakers and anonymous Poll Everywhere polls to gather feedback about previous video lessons. Following these activities, concepts from the relevant flipped videos were briefly reviewed, with ample time for students to ask lingering questions. In some class sessions, short activities were used to introduce topics not covered in lesson videos. Classes often concluded with discussions of course logistics and upcoming deadlines. One-on-one tutoring was offered in lieu of class sessions for students located in remote time zones, among other accommodations (see Course Elements section "Accessibility and inclusivity").

Interactive tutorials

Much of each class session was devoted to interactive tutorials. The scientific context of the data sets used in these activities was clearly communicated to students to explain why they were relevant. Tutorials were presented in a Google Colab notebook for each class, which students would copy within the Google Drive file structure so that they could edit their notebook individually. In each notebook, copious scaffolding around each problem (e.g., step-by-step instructions, expected intermediate results, and links to documentation websites) was often provided to create an environment of "structured inquiry." In the hierarchy of Banchi & Bell (2008), who propose a four-level continuum of inquiry, structured inquiry represents the second level, followed by the more independent modes of "guided inquiry" and "open inquiry."

A tutorial notebook would often include four or five related but distinct problems that applied different concepts or functions to a real-world data set from oceanographic and related disciplines (e.g., Fig. 1c); data were curated by the instructors for their instructional potential. These exercises created opportunities to divide the classroom into small groups that worked cooperatively within Zoom breakout rooms. A modified "think-pair-share" model was used: students first worked on a problem individually for a few minutes, then teamed up with their group of classmates in a breakout room to discuss challenges encountered and optimal solutions, and lastly returned to the main Zoom room, at which point a designated 'reporter' from each group reviewed their results with the full class. Instructors monitored student discussions by moving between breakout rooms and providing guidance when needed. Groups' progress was tracked by watching a shared Google Doc configured ahead of time with templates in which each group was told to fill in their code after they finished their work. We recommend that instructors consider randomizing groups occasionally so that students get exposure to a variety of coding styles, social dynamics, and levels of confidence with the material.

Student focus group participants shared mixed views on the number of students per group: smaller groups require more individual accountability, but larger groups allow instructors cycling between breakout rooms to provide more efficient guidance. Additional benefits of larger groups include increased opportunities for peer instruction and a higher likelihood of at least one student having the required understanding to assist their group in completing an activity. In course evaluations, students mostly offered criticism on the use of breakout groups, with one noting, "I didn't find the small group coding breakout rooms very helpful for coding, but they were nice for getting to know my classmates." While breakout rooms allow for more individualized attention, instructors must be careful to distribute their finite time across groups. Several students wished for more time and instructor guidance in breakout rooms, which contributed to their overall negative rating (Fig. 3).

On the other hand, interactive tutorials involving live coding demonstrations and individual activities were the most positively reviewed course element in students' mid-quarter and final surveys (Fig. 3). Based on the mid-quarter feedback, the instructors emphasized these tutorials and live coding in the second half of the course.
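As an illustration of this scaffolding, a structured-inquiry tutorial problem might resemble the sketch below; the data set, steps, and expected results are invented for illustration rather than taken from the actual course notebooks.

    # Tutorial problem (illustrative): compute a monthly temperature climatology.
    # Step 1: Import the package you need (Pandas).
    import pandas as pd

    # Step 2: Load the temperature time series (hypothetical file and columns).
    # Documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
    df = pd.read_csv("buoy_temperature.csv", parse_dates=["date"])

    # Step 3: Add a month column using the .dt accessor.
    # Expected intermediate result: integers 1-12 in a new "month" column.
    df["month"] = df["date"].dt.month

    # Step 4: Group by month and average the temperature.
    # Expected intermediate result: a Series with 12 values, warmest in summer.
    climatology = df.groupby("month")["temperature"].mean()

    # Step 5: Print your answer and compare with your group.
    print(climatology)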
Compared to using slides or copying and pasting blocks of existing code, live coding offers several advantages: it forces slower, more digestible instruction, allows instructors to be responsive to student questions in real time, and inevitably allows students to see instructors' mistakes and how they are diagnosed and fixed (Wilson, 2016).
The unique challenges posed by virtual teaching require instructors to explore alternative avenues of assessing student understanding. Opportunities for engagement were provided through breakout rooms and use of the chat function to ask and answer questions; in final course evaluations, students rated their participation as higher relative to other courses (6.0 on a 7-point scale, where 4.0 is "average"; Fig. 2).

Assignments
Students completed four programming assignments at two-week intervals, each consisting of approachable, multi-part problems in a Google Colab notebook that utilized real scientific data (e.g., Fig. 1d). For example, one assignment tasked students with importing data collected by an ocean observing platform (a seaglider), identifying key summary statistics, creating a visualization of the glider's location and temperature measurements, and calculating trends in the data.

Assignments incorporated elements of both "structured inquiry" and "guided inquiry," the second and third levels in the hierarchy of Banchi & Bell (2008). Questions were somewhat less structured than in-class activities, allowing students more flexibility to design their own solutions. This created opportunities to practice both programming skills and data literacy, creating a stepping stone to more sophisticated independent analysis of data sets. Without a midterm exam, assignments were the instructors' main window into student progress prior to the final project. The assignments were designed to be challenging yet were viewed favorably by both the student focus group and the final evaluation respondents (Fig. 3). Both, however, indicated a desire for more short, frequent, low-stakes practice opportunities to help reinforce concepts and check understanding.
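A condensed solution to an assignment of this kind might resemble the following sketch; the file name, column names, and CSV format are assumptions for illustration, not the actual assignment data.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # Import glider observations (hypothetical file with lon, lat, date, temperature)
    glider = pd.read_csv("seaglider_mission.csv", parse_dates=["date"])

    # Key summary statistics
    print(glider["temperature"].describe())

    # Visualize the glider's track, colored by measured temperature
    plt.scatter(glider["lon"], glider["lat"], c=glider["temperature"], cmap="viridis")
    plt.colorbar(label="Temperature (°C)")
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.title("Seaglider track and temperature")
    plt.show()

    # Calculate a linear temperature trend over time (days since first observation)
    days = (glider["date"] - glider["date"].min()).dt.days
    slope, intercept = np.polyfit(days, glider["temperature"], 1)
    print(f"Trend: {slope:.4f} °C per day")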

Pair programming
Students were offered the option to collaborate in pairs on assignments and the final project, which 48% of the class exercised at some point and, on average, 37% of students exercised on any given assignment. The number of times that a student worked collaboratively is presented as the metric "Pair programming experiences" in Fig. 6. When programming as a pair, one student may serve as the "driver," writing code, while the other observes, monitoring the code for defects and helping to problem-solve. Pair programming has long been known to improve student learning, performance, and satisfaction in the computer science classroom, without loss of competency on exams (e.g., McDowell et al., 2002; Williams & Upchurch, 2001). Previous work has found equal benefits to student performance and confidence for students who pair program remotely using screen-sharing and audio connectivity compared to physically collocated students who pair program (Hanks, 2005). In a survey of undergraduates who conducted collaborative research, almost 80% reported that working in teams or pairs enhanced their research experience (Lopatto, 2010).

We found pair programming to be readily adaptable to the virtual classroom using Zoom screen-sharing, with the caveat that Colab notebooks must be refreshed to show updates, and thus edits must be made by one user at a time rather than synchronously. One lesson learned was that some pairs will gravitate towards asynchronous collaboration (i.e., a division of labor, rather than true pair programming) unless it is specified that the coding must be done synchronously. Additionally, collaborations appeared to prove more successful when coding partners had a pre-existing working relationship; naturally, this is less likely to occur in a remotely taught introductory class setting.

Online discussion forum
In the context of a pandemic that saw many undergraduate students isolated from friends and support networks, there was an urgent need to cultivate a classroom community. An online Q&A board, Piazza, was offered as an outlet for students to connect asynchronously with peers and instructors outside of class and office hours (see Fig. 1e; we note that alternative platforms with similar functionality exist, e.g., Ed Discussions). Instructors benefit from receiving fewer individual emails from students and being able to endorse student answers. Students benefit from easier access to help, not only on logistical or clarifying questions, but also when seeking support on their problem-solving processes. A previous study in an undergraduate computer science setting examined how students use such forums; posting on a discussion forum, by definition, constitutes a form of active learning, though posts may vary in their level of reasoning and connectedness.

We find that engagement with Piazza in the form of questions, answers, and comments closely tracked assignment deadlines and peaked while students worked on the final project (Fig. 5a). Many questions from students were simple, for example, diagnosing a coding bug or clarifying the goal of an assignment, while others were more complex, such as seeking strategies to efficiently work with large data sets for one's final project. Four brief check-ins (including Assignment #0) required Piazza submissions, and an additional quota of five substantive posts per student (i.e., those that contribute "further insight" to the discussion, rather than simply writing "Good work" or "I agree") was prescribed in the syllabus. That said, voluntary engagement was unexpectedly robust, with students visiting Piazza once every 1-5 days on average. The forum saw 889 total contributions, of which two-thirds of students' posts were not required by a check-in or Assignment #0 (Fig. 5b). Past work has likewise shown high participation rates on Piazza when students are encouraged to use the platform by teaching staff (Vellukunnel et al., 2017).

In the ideal case, Piazza would be used by students to seek help after they have invested time into trying different solutions and have perhaps consulted online resources, rather than as an option of first resort. The asynchronous nature of the forum also encourages students to look elsewhere first. While prompt instructor engagement is vital for establishing a strong teaching presence in a remotely taught course (Prince et al., 2020), it is important that responses be somewhat delayed so that an expectation of near-instantaneous feedback is not established. Importantly, this also allows peers an opportunity to provide input. Nonetheless, the instructors found that delaying feedback, particularly when a question had a straightforward answer, often ran against their desire to help students, and thus proved challenging.

The platform allowed students to select the audience for their questions (instructors and/or classmates), to post anonymously, and to respond to peers in threaded discussions. Students selected the three audience options (public and signed, public and anonymous, or private posts) with approximately equal frequency, depending on their needs (Fig. 5b).
Student focus group participants shared that the anonymous and private posting options were useful when they were worried that a question would be perceived as obvious or simple, or when they were less sure of their answer. Final course evaluations show that students felt positively about having access to Piazza (Fig. 3). One student shared their appreciation for the ability to post anonymously, stating that it "alleviated some anxiety about asking questions."

Final research project

In the final weeks of the course, students designed and carried out individual research projects, which they presented to the class. Projects were assessed using a rubric with criteria for code, figures, and presentation content and delivery (see Table S6 in Supplemental Materials). A literature review tentatively indicates that rubrics can lead to increased student performance, and in any case, rubrics are recognized as a user-friendly tool for setting guidelines and enabling self-assessment (Brookhart & Chen, 2015).

In contrast to instructor-generated activities, the final project allowed for student-designed questions and procedures. This encouraged "open inquiry," the highest level of the hierarchy presented by Banchi & Bell (2008), an experience that is exceedingly rare in undergraduate oceanography teaching (McDonnell et al., 2015). In general, inquiry-based learning develops cognitive skills on higher levels of Bloom's taxonomy (Bloom et al., 1956; Krathwohl, 2002). Consistent with a constructivist approach to learning (Bada, 2015), the project exposed students to complex or potentially ill-structured questions and 'messy' real-world data sets that were flawed or incomplete.

In courses where undergraduate students conduct research with unknown outcomes, students have reported learning gains similar to those of dedicated summer research programs (Lopatto, 2010). In final course evaluations, most students viewed the final project as beneficial, specifically citing the opportunity to synthesize course knowledge and to collaborate with classmates (Fig. 3). One critical comment related to ambiguity about the rigor of science expected and the open-ended nature of project checkpoints.

The final projects that students produced were impressive and original, and spanned oceanographic, cryosphere, and atmospheric domains (see Fig. S3 in Supplemental Materials). Here we assess students' final project questions and hypotheses based on four higher levels of the cognitive process dimension of the revised Bloom's taxonomy (Bloom et al., 1956; Krathwohl, 2002), namely application, analysis, evaluation, and creation (see rubric in Table 2), similar to the methodology of Kastens et al. (2020). We also evaluate each project's complexity by summing the number of scientific domains, file types, and data sets incorporated. We find that students' project cognitive levels were consistent between the questions and hypotheses they posed. Interestingly, we identify no significant relationship between projects' overall cognitive level and complexity, suggesting that a larger project scope was not necessarily indicative of higher-order (or lower-order) cognition and vice versa (Fig. S3 in Supplemental Materials).

Accessibility and inclusivity
The instructors of the course in 2020 (E.C.C. and K.M.C.) implemented intentional practices to ensure that the course was accessible for all students and that those with varying backgrounds and needs felt welcome and accommodated. Some practices were specific to the remote setting, while others are equally applicable to in-person teaching. Instructional approaches focused on active learning and student engagement can help to combat inequities in the classroom (Theobald et al., 2020), but equally important are strategies that promote a culture of respect and foster a sense of belonging for students (Dewsbury & Brame, 2019).

Virtual teaching, and adaptations such as virtual office hours, offered inherent accessibility benefits for students facing long commutes, disability-related accessibility challenges, and other barriers to attending classes on campus (Pichette et al., 2020). Virtual office hours offered added benefits for students who may perceive office hours as an unfamiliar, unsafe, or inaccessible space, with breakout rooms creating privacy for students with questions on assignments or personal matters. Students shared their enthusiasm for virtual office hours in final course evaluations (Fig. 3). Recorded lessons, the asynchronous Piazza Q&A board, a flexible attendance policy, and an option to submit a recorded final project presentation enabled the participation of students located in remote time zones due to the pandemic.

That said, virtual learning can make it harder to maintain focus and limit distractions. The large amount of screen time was the most frequently mentioned criticism in students' course evaluations (Fig. 3). "Zoom fatigue" is a form of exhaustion that may result from the intensity of continuous, close-up eye contact and seeing oneself, reduced mobility when having to stay in a video frame, and increased cognitive load from having to exaggerate nonverbal cues (Bailenson, 2021). As one student reported in their mid-quarter evaluation, "just being on Zoom for so long takes away my attention span." To mitigate these effects, regular breaks were taken during class, students were encouraged to take breaks during recorded videos, a video-optional policy was instituted on Zoom, and students were allowed to use the chat function to participate. Nonetheless, we acknowledge that teaching online to students with their cameras off can be disorienting. We remind prospective instructors teaching in a virtual setting for the first time to be kind to themselves.

In a survey distributed in the first week of class ("Assignment #0" in Fig. 5a), students were encouraged to introduce themselves to the teaching team by sharing their pronouns and any anticipated accessibility or learning needs. Survey responses helped instructors affirm students' identities and accommodate students' disabilities, and led to instructors making an effort to accurately caption all lesson videos. The survey also asked about comfort with technology and prior exposure to coding, which we analyze in this study (as discussed in Methods). Previous coding experience was not required, and a prerequisite of one quarter of calculus from previous iterations of the course was removed.
Instructors offered one-on-one mentoring as needed, recognizing that some students require additional, intensive help with certain topics or specialized guidance tailored to their specific learning style in order to keep pace with the class. These mentoring sessions also had the benefit of allowing those students to form a personal connection with the instructors, which is otherwise challenging in a large virtual classroom.
A classroom community built on safety and mutual understanding promotes engagement, especially among students with marginalized identities, by creating a supportive space to share ideas and ask questions (Barrett, 2010). The instructors worked to establish such a community by remaining easily accessible for questions, encouraging collaboration, and emphasizing that student physical and mental well-being were priorities throughout the course. In mid-quarter evaluations, one student noted that the "low stress environment" of the course helped them learn.

Course policies and expectations
Setting clear expectations supported by explicit guidance on how to succeed contributes to an accessible learning environment by establishing a safe and productive classroom culture and reducing confusion. The syllabus is the first opportunity to outline expectations. As such, a detailed course syllabus was drafted to include six student learning objectives (see Introduction), course and university policies, logistics, guidelines on Zoom etiquette, and a week-by-week schedule. Each of these components gives students a clear understanding of what they should gain from the course, outlines metrics for success, and creates trust that the instructors have thoughtfully planned the curriculum (Habanek, 2005).

The syllabus also included an integrity policy that encouraged collaboration but prohibited plagiarism. Students were allowed to reference external resources such as online API documentation sites and Stack Overflow. Citations and acknowledgment of collaboration were expected in assignments, and students confirmed their agreement with the integrity policy in the initial survey (Assignment #0). In this way, the syllabus also acted as a contract between students and instructors.

In-class participation and flipped video watching were not graded, partially in recognition of pandemic stressors but also to accommodate individual circumstances without requiring students to disclose possibly sensitive information. The expectation was that assignment grades would be sufficiently impacted if students were not engaged in these activities. For assignments that were graded, instructors offered a one-time, two-week extension to allow flexibility while still requiring students to learn foundational material. While lesson videos had high completion rates (Fig. 5a), implementing low-stakes graded comprehension checks could be useful in a situation of lower engagement (Jacobs et al., 2016).

Conclusions
Student experience

Overall, students perceived the course positively, rating its content, evaluation techniques, organization, and the course as a whole markedly higher than in past quarters (Fig. 2). These evaluations are notable given hardships related to the COVID-19 pandemic, as well as findings that show students often prefer passive lecturing over active learning due to the additional cognitive effort required to engage actively with material (Deslauriers et al., 2019). Students' view of the course content evolved from a critical stance expressed in mid-quarter evaluations, with comments citing its abstract or challenging nature, to an appreciative view of the data skills they had acquired by the end of the course (Fig. 3).

By calculating correlations between a variety of anonymized data sources (see Methods), presented in Fig. 6, we explore the impact of students' varying backgrounds and learning strategies on their course experiences and outcomes. We find that highly engaged students acquired more Python skills and earned higher grades. The correlation observed between three key metrics (Q&A forum days online, total lesson minutes watched, and number of forum answers) and the breadth of Python skills used in final projects suggests that highly skilled students were more engaged with the course, acquired more content knowledge, and frequently shared that knowledge with peers. Variations in students' final Python skills cannot fully explain differences in their final grades, but the two show a positive nonlinear correlation. Students who earned higher grades tended to monitor the Q&A forum more frequently, collaborate more often with classmates, and watch lesson videos before class. A positive relationship between question-asking on a Q&A forum and final grades has been found in past work (Vellukunnel et al., 2017). Exposure to video content before working on related in-class activities may have helped students prepare for assignments, which comprised the majority of final grades. That said, the lack of correlation between Python skills used in final projects and the timing of video lesson views suggests that it was the total amount of time spent viewing lessons, not whether those lessons were watched before or after a class, that mattered most for students' application of course content to an open-ended project.

We find that students' self-assessment of programming skills in a final survey was not correlated with their final grades, consistent with research that found a weak correlation between tutor grades and self-assessments by over 3,000 undergraduate students (Lew et al., 2010). That said, students were asked to self-assess their Python competence, rather than their final grade, and the two metrics may not be entirely comparable. Nonetheless, this result could reflect the Dunning-Kruger effect, a cognitive bias in which those with the least knowledge tend to overestimate their performance or ability because they lack the competencies required for self-assessment (Kruger & Dunning, 1999). Students' final self-assessments were not correlated with any metrics other than prior coding experience, pointing to a persistent confidence from previous Python exposure that contributed to a perception of competence not necessarily reflected in grades or skills.
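Analyses like these can be reproduced with a rank-based correlation matrix, which accommodates the nonlinear relationships noted above; in this minimal sketch, the column names mirror the Fig. 6 metrics, but the input file is hypothetical.

    import pandas as pd

    # Hypothetical table of de-identified per-student metrics (one row per student)
    metrics = pd.read_csv("student_metrics_deidentified.csv")

    # Spearman (rank) correlation captures monotonic but nonlinear relationships
    corr = metrics[["final_grade", "python_skills_used", "forum_days_online",
                    "forum_answers", "total_minutes_watched",
                    "prior_coding_experience"]].corr(method="spearman")
    print(corr.round(2))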
Significantly, neither students' final grades nor their code usage in final projects were correlated with prior coding experience, indicating that previous exposure to Python was not predictive of success in the course. That said, less prior experience was associated with higher engagement with lesson videos and the Q&A forum. This suggests a 'level playing field' in which those who came in with less previous knowledge of programming took full advantage of class resources to ultimately reach the same level of proficiency as their peers.

Recommendations for future teaching

We recommend without reservations adopting the key elements that we describe in this paper, particularly flipped instruction, an online coding platform and discussion board, and strong attention to accessibility. That said, we encourage others to improve on our framework and regularly seek feedback from students, preferably in a format that allows for anonymity. For example, in course evaluations, students encouraged the addition of more frequent, low-stakes practice of basic skills to reinforce fundamental concepts (see Course Elements section "Assignments"). New practice opportunities would ideally be coupled with immediate feedback that guides further practice, which promotes efficient learning and refinement of conceptual understanding (Ambrose et al., 2010). Additionally, data literacy skills could be taught through higher-level exercises asking students to scrutinize the limitations, biases, and provenance of scientific data sets and make predictions and recommendations grounded in their analysis of data (see, e.g., Kastens & Krumhansl, 2017). Instructors may consider expanding this offering into a multi-course sequence to incorporate these elements.

The pandemic likely accelerated existing trends in higher education towards multi-modal instruction and more engaging teaching practices (Lockee, 2021). As universities have transitioned back to in-person teaching, we believe that the framework developed for this course is well-suited to a hybrid approach with in-person tutorial and work sessions but recorded lesson videos, opportunities for regular online engagement, and virtual office hours for accessibility. Alternatively, a fully remote version like that described in this study could still be offered, potentially with minimal penalty in student performance and satisfaction compared to in-person instruction.

Furthermore, the graduate student instructors have benefited from the professional development that teaching this course allowed. Opportunities such as this have been linked with the success of doctoral students earning their degree in a timely manner and attaining future employment in higher education (Bettinger et al., 2016). Our department plans for a rotating cast of two graduate students to continue serving as the primary teaching team, with the guidance and support of a dedicated teaching mentor to develop their pedagogical skills. Graduate students' ownership of the course will promote the teaching of current data science practices.

For many undergraduate students without a deeper interest in data science, however, multiple years may pass after completing OCEAN 215 before their next opportunity to use Python programming. For most, this comes in the form of their senior thesis. Students' demonstrated loss of coding skills during the intervening years (see Introduction section "Course history and development") suggests not only the merits of our improved instructional design but also an urgent need to infuse an oceanographic undergraduate curriculum with regular opportunities to practice and apply programming skills. Barriers to enacting this change include some instructors' lack of familiarity with Python (many, for example, use MATLAB for research) and the need to communicate a standard set of programming skills that students can be expected to know. In addition to infusing curricula with programming, effort could be invested in creating supervised research opportunities for students that involve the use of programming and data analysis skills. More broadly, we see the need for earth science undergraduate curricula to adopt active, student-centered pedagogical practices that more frequently allow students to construct knowledge through hands-on exploration of real-world data. Infusing earth science curricula with current data programming practices will naturally facilitate the achievement of these goals.

Figures

Fig. 1 (caption fragment). [...] (see Table S4 in Supplemental Materials for specific functions, operators, and methods); (b) flipped video lessons, with a slide demonstrating how colors, fonts, design elements, and a minimal working example help to explain Python syntax; [...]

Fig. 3 (caption fragment). [...] end-of-quarter (solid bars) surveys in 2020, ranked according to the net positivity (blue) or negativity (red) of comments regarding those themes (see Methods section "Initial, mid-quarter, and end-of-quarter surveys"). Original survey prompts are listed in Table S3 in the Supplemental Materials.

Fig. 4 (caption fragment). [...] where each viewing session is weighted by its length, expressed as a fraction of the total video time delivered during the course (166.3 hours over n = 41 videos). The median and interquartile range (25%-75%) of video releases by instructors, relative to the corresponding class, is included for reference, indicating that videos were generally released 1.5 to 3 days before they were due. Note that vertical shading corresponds to days; also note the compressed positive x-axis scale.

Fig. 5 (caption fragment). [...] (outer) and further divided by chosen audience (inner). "Required posts" were those requested from every student for Assignment #0 and final project check-ins. "Public posts" were viewable by all users, while "private posts" were visible to instructors only.
"Anonymous posts" refer to those in which the author was hidden from other 952 students, but not from instructors. 953  Table S4 in Supplemental Materials (for "Python skills used in project"; column 2), Course Elements 960 section "Assignments" (for "Pair programming experiences; column 3), Methods section "Online forum 961 engagement" (for Q&A forum-related metrics; columns 4-6), Methods section "Flipped video viewership" (for 962 video-related metrics; columns 7-9), Table S1 in Supplemental Materials (for "Prior coding experience"; column 963 10), and Methods section "Initial, mid-quarter and end-of-quarter surveys" (for "Final self-assessment of Python 964 skills; column 11). 965 Tables   967   Table 1. Core topics and concepts taught in Ocean 215. Topics listed here are not necessarily in chronological 968 order as taught in the course, and class time was not necessarily allocated in equal proportions to each topic. 969

Topic / Main concepts and skills
Why code in Python?
The power of programming is its versatility. Python is open source, stable, popular, free, and ideal for scientific data analysis. Google Colab offers advantages in a classroom setting compared to other programming environments.

Variables and object types
Variables store Python objects, which include numbers, booleans, strings, lists, tuples, dictionaries, and module-specific objects. Objects can be altered, indexed, sliced, iterated over, or used in mathematical operations. Assigning meaningful variable names makes for clearer code.

Logical operations and control flow
Objects can be compared using logical operations (and, or, is/equals, greater/less than, in, not). Loops and if-statements facilitate repetitive and conditional actions.
Packages and functions
Installing and using packages extends the capabilities of Python. Built-in, imported, and user-created functions accomplish common tasks and make for more compact, efficient code. Online documentation can be used to understand functions' arguments and outputs.
Data files
Oceanographic data are often stored in CSV and netCDF files, which can be read into Python, displayed, indexed, sliced, and manipulated using functions in the NumPy, Pandas, and Xarray packages. Real-world data sets can be obtained from public repositories and frequently contain messy or missing data (see the sketches following this table).
Working with data
Data can be stored in multi-dimensional NumPy arrays and in labeled structures specific to the Pandas and Xarray packages. These packages, as well as others like SciPy, have functions that average, sort, group, correlate, resample, smooth, regress, interpolate, and perform other computations on the data. Understanding common error types and tracing errors from their line of origin allows for methodical debugging of code.
Plotting
Line, scatter, bar, contour, pseudocolor, and other types of plots available from the Matplotlib package can be used to visualize data. Geospatial data can be projected onto maps using Cartopy. Appropriately customizing and labeling a plot is essential for interpretability.

Scientific skills
The modern scientific method is driven by data exploration, but it also relies on traditional research skills like formulating hypotheses, interpreting the scientific significance of visualizations, effectively communicating results, and giving and receiving feedback from peers and mentors.
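To make several of the Table 1 skills concrete, the sketch below strings them together: importing packages, reading a CSV file with Pandas, filtering rows with logical operations, computing a grouped average, and producing a labeled Matplotlib plot. This is a minimal illustration rather than course material; the file and column names (ctd_profiles.csv, time, depth_m, temperature_C) are hypothetical.

    # Minimal illustration of several Table 1 skills; file and column names are hypothetical.
    import pandas as pd
    import matplotlib.pyplot as plt

    # Data files: read a CSV of hydrographic observations into a labeled DataFrame
    df = pd.read_csv("ctd_profiles.csv", parse_dates=["time"])

    # Logical operations: keep only valid near-surface samples
    surface = df[(df["depth_m"] <= 10) & df["temperature_C"].notnull()]

    # Working with data: monthly mean temperature via grouping
    monthly = surface.groupby(surface["time"].dt.month)["temperature_C"].mean()

    # Plotting: visualize and label the result
    fig, ax = plt.subplots()
    ax.plot(monthly.index, monthly.values, marker="o")
    ax.set_xlabel("Month")
    ax.set_ylabel("Mean near-surface temperature (°C)")
    ax.set_title("Monthly mean near-surface temperature")
    plt.show()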
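A companion sketch for gridded netCDF data uses Xarray's labeled indexing; again, the file and variable names (argo_gridded.nc, temperature, depth, time) are hypothetical.

    # Minimal netCDF illustration; file and variable names are hypothetical.
    import xarray as xr
    import matplotlib.pyplot as plt

    ds = xr.open_dataset("argo_gridded.nc")                 # open a gridded netCDF data set
    sst = ds["temperature"].sel(depth=0, method="nearest")  # slice along a labeled coordinate
    sst.mean(dim="time").plot()                             # time-mean, quick pseudocolor map
    plt.title("Time-mean near-surface temperature")
    plt.show()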

Box 1. Student testimonials (see Methods section "Student focus group" for more details). Students were encouraged to address one or more of the guiding questions listed in Table S5 in the Supplemental Materials.
In that class, most students were not engaged during the lectures, which left them bewildered when doing real coding. I have also been teaching myself MATLAB for three years, basically learning by doing tasks with the help of the internet. This process has often been time-consuming, and it has been hard to organize my notes in a logical way. In comparison to those experiences, this course provided a logical pathway into Python, especially for oceanography applications. Without this class, it would have taken ten times longer to acquire the same knowledge, which would also have been less clear.

In class, Zoom breakout rooms forced everyone to discuss and practice the coding, which in turn forced us to come well-prepared for class. Though Google Colab has limited storage (RAM) and is unable to process large data sets, it is great for …

… Fraction watched represents the total minutes that a specific video was viewed by a specific student divided by its duration, and thus can exceed 100% due to rewinds and repeat views. (e) Videos per lesson vs. video fraction watched, averaged across all students. Note that the final video lesson (Lesson #16) was excluded as an outlier, indicated with an asterisk (*), due to its lower viewership. (f) Lesson duration vs. fraction watched, averaged across all students. (g) Video duration vs. completion rate, averaged across all students. Completion rate represents the fraction of a video that was viewed at least once, and thus is capped at 100% for a specific student and video (unlike "fraction watched"). (h) Video duration vs. fraction watched, averaged across all students.

… (…, 2002). Each student's questions and hypotheses (up to three of each per student) were assessed using the rubric and weighting described in Table 2, with higher levels of Bloom's taxonomy representing higher-order questioning and prediction.

Table S1. Rubric used to assess students' prior coding experience based on their written responses to the Assignment #0 survey during Week 1 of the course. Students were asked: "Do you have prior coding experience, and if so, with what language?" and "How comfortable do you feel using technology?" Responses to the first question were graded subjectively based on word choice on a scale from 1-5, using the keywords in quotes (e.g., "a little") when present. Additional points were awarded to weight responses in favor of prior exposure to Python or similar high-level and/or interpreted languages (MATLAB, Java, R). Points were subtracted to account for less relevant prior experience. Results are presented as the metric "Prior coding experience" in Fig. 6. A hypothetical code sketch of this scoring scheme follows the rubric.
1 = No experience
2 = Minimal experience (e.g., "a little", "small", "tiny amount")
3 = "Some" or "moderate" experience
4 = Experience
5 = Experience (with full additions)

Additions (maximum total: +1.0): +0.5 for one of MATLAB, Java, or R; +1.0 for Python or multiple languages.
Subtractions (maximum total: -0.5): -0.5 if the response mentions many years since the previous experience; -0.5 if the response mentions that the previous experience was not useful.

Note: If no level of coding proficiency was provided, the base number was taken from the student's "comfort with technology" statement ("Very comfortable": 4; "Fairly comfortable": 2).
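The arithmetic of the rubric (a 1-5 base level plus capped additions and subtractions) can be expressed as a short function. This is a hypothetical, simplified keyword matcher for illustration only; the actual grading described above was subjective, and the keyword patterns here are assumptions.

    # Hypothetical sketch of the Table S1 scoring scheme; the actual grading was subjective.
    import re

    def prior_experience_score(response: str) -> float:
        text = response.lower()
        # Base level (1-5) from keywords describing the amount of prior experience
        if "no experience" in text:
            base = 1
        elif any(kw in text for kw in ("a little", "small", "tiny amount")):
            base = 2
        elif "some" in text or "moderate" in text:
            base = 3
        else:
            base = 4  # unqualified prior experience
        # Additions, capped at +1.0 in total
        languages = re.findall(r"\b(python|matlab|java|r)\b", text)
        if "python" in languages or len(set(languages)) > 1:
            bonus = 1.0
        elif languages:
            bonus = 0.5
        else:
            bonus = 0.0
        # Subtractions, capped at -0.5 in total
        penalty = -0.5 if ("years ago" in text or "not useful" in text) else 0.0
        return base + bonus + penalty

    print(prior_experience_score("A little MATLAB, but that was years ago"))  # 2 + 0.5 - 0.5 = 2.0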

Table S2. Rated items from IAS (university-administered) course evaluations in 2020, scored on a 0-5 scale ("Very poor" to "Excellent") and grouped here by theme; parallel item wordings are separated by slashes.

Participation
Relative to similar courses taught in person, your participation in this course was:

Intellectual challenge relative to other courses
The intellectual challenge presented was:

Course as a whole
The course as a whole was: / The remote learning course as a whole was:

Course content
The course content was:

Usefulness of course content
Relevance and usefulness of course content were: / Average of: "Usefulness of reading assignments in understanding course content was:", "Usefulness of written assignments in understanding course content was:", "Usefulness of online resources in understanding course content was:"

Facilitation of learning
Amount you learned in the course was: / The effectiveness of this remote course in facilitating my learning was:

Evaluation and grading techniques
Evaluative and grading techniques (tests, papers, projects, etc.) were:

Reasonableness of assigned work
Reasonableness of assigned work was:

Organization
Course organization was: / Organization of materials online was:

Clarity of student responsibilities
Clarity of student responsibilities and requirements was:

Instructor's contribution to the course
The instructor's contribution to the course was:

Effectiveness of instructor's teaching
The instructor's effectiveness in teaching the subject matter was:

Quality of instructor answers and feedback
Average of: "Explanations by instructor were:", "Instructor's ability to present alternative explanations when needed was:", "Instructor's interest in whether students learned was:", "Answers to student questions were:" / Quality/helpfulness of instructor feedback was:

Table S3. Open-ended questions asked in IAS (university-administered) mid-quarter and final course evaluations in 2020. Students' anonymous responses are tabulated in Fig. 3 and are excerpted throughout this study.

Evaluation period Question
Mid-quarter What is helping you to learn in this course?
What is hindering your learning in this course?
What can your instructor do to improve your learning in this course?

Final
Was this class intellectually stimulating? Did it stretch your thinking? Why or why not?
What aspects of this class contributed most to your learning?
What aspects of this class detracted from your learning?
What suggestions do you have for improving this class generally?
If this course were offered remotely again, what suggestions do you have to improve the student experience?

Table S4. Functions, operators, and methods taught in the course that were used as search terms to assess the complexity of students' final project code. A Python script was used to count instances of each search term in students' project code notebooks, and the number of search terms used at least once (expressed as a percent of all search terms) is presented as the metric "Python skills used in project" in Fig. 6.
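A minimal sketch of such a counting script is given below, assuming the project notebooks are standard .ipynb JSON files; the notebook path and the abbreviated term list are placeholders, not the full Table S4 list.

    # Minimal sketch of the term-counting approach; the term list here is an
    # abbreviated placeholder, not the full Table S4 list.
    import json
    from pathlib import Path

    SEARCH_TERMS = ["np.mean(", "plt.plot(", "pd.read_csv(", ".groupby(", "def "]

    def notebook_code(path):
        """Concatenate the source of all code cells in a .ipynb notebook."""
        nb = json.loads(Path(path).read_text())
        return "\n".join(
            "".join(cell["source"])
            for cell in nb["cells"] if cell["cell_type"] == "code"
        )

    def skills_used(path, terms=SEARCH_TERMS):
        """Percent of search terms that appear at least once in the notebook."""
        code = notebook_code(path)
        return 100 * sum(term in code for term in terms) / len(terms)

    print(f"{skills_used('final_project.ipynb'):.0f}% of search terms used")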
Table S5. List of guiding questions offered to undergraduate student coauthors for structuring their testimonial submissions, which are presented in Box 1 (see Methods section "Student focus group"). Students were encouraged to address one or more of the questions in their submissions.

Table S6. Grading rubric for students' final research projects. This rubric was provided to students to delineate expectations and evaluation techniques.

Topic Background
Topic background is sufficient, but is missing some details or lacks coherency.
Topic background is clear, complete, and relevant.

Questions / Hypotheses
Questions are not well-defined. Hypotheses are not substantiated.
Questions are well-defined. Hypotheses draw on prior knowledge.
Questions are well-defined and pertinent for the topic. Hypotheses draw on prior knowledge and have clear explanations for why they are expected.

Data Information
Information about the data collection process is missing key details or is inaccurate. The limitations of the data are missing or not realistic.
Information about the data collection process is accurate, but missing some minor details. The limitations of the data are explained.
Information about the data collection process is complete and accurate. Underlying problems and limitations of the data are explained. Use of these data to answer the project questions is justified.

Data Processing
The student has made errors in processing their data. The student is missing steps.
The student has processed the data correctly. Steps for obtaining, loading, cleaning, and analyzing the data are well-defined.
The student has processed the data correctly and taken precautions to ensure that their results are appropriate. Steps for obtaining, loading, cleaning, and analyzing the data are well-defined.

Results
Results of the project do not attempt to answer the scientific questions. The data visualizations are not relevant.
Results of the project somewhat answer the scientific questions. Data visualizations are mostly appropriate for the data.
Results of the project answer, or earnestly attempt to answer, the scientific questions. Data visualizations are entirely appropriate for the data.

Presentation Skills
Organization
The presentation is not in a logical order and the student makes no effort to guide the audience.
The presentation is organized in a logical order and takes some care to guide the audience.
The presentation is organized in a logical order and shows exceptional attention to guiding the audience.

Timing
The student far exceeds their allotted time and/or has not made an effort to practice.
The student completes the presentation somewhat over the allotted 5 minutes.
The student completes the presentation within 5 minutes and it is clear that they have practiced.

Explanation of Ideas / Information
The ideas and information explained in the presentation were not clear and were not relevant.
The ideas and information explained in the presentation were clear and relevant.
The ideas and information explained in the presentation were exceptionally clear, relevant, and coherent.

Presentation: 20 points

Correctness
The student misuses code and does not produce reasonable results.
The student uses some coding techniques/tools learned throughout the quarter. The analysis produces reasonable answers that can be replicated with some effort.
The student properly and efficiently uses the coding techniques/tools learned throughout the quarter. The analysis produces reasonable answers that can be replicated easily.

Functionality
The code does not run and has egregious errors.
The code is mostly able to run, but has some (small) errors.
The code runs efficiently with no errors.

Tidiness
The code breaks proper etiquette and should not be shared with others.
The code mostly follows proper coding etiquette. The organization is somewhat lacking and would need review before sharing.
The code follows proper coding etiquette. It is organized and commented effectively so that it can easily be shared with another person.

Perseverance
The student has made no effort to work through problems and hurdles.
The student has made some effort to work through problems.
The student has made a gallant effort to work through problems and documented in their code their best understanding of the problems they are facing.

Plots
Plot Clarity
The plots are unclear and do not make sense in the context of the project.
The plots are mostly clear and show some thought from the students about ways to present their data.
The plots are extremely clear and are effective tools to help the audience understand the results/analysis.

Colormaps
The colormaps are not appropriate for the data being shown.
The colormaps are appropriate for the data being shown.
The colormaps are appropriate for the data being shown and take into account colorblindness and perceptual accuracy.

Proper Labels
The plots are missing most or all labels, or have improper labels.
The plots are labeled with general accuracy and completeness.
The plots are labeled extremely accurately in a way that guides the audience through the figure.

Creativity
The student made no effort to create original plots.
The student has made some effort to create original plots.
The student has created original plots that show the data/analysis in an extremely effective manner.

Code: 40 points