SCRATCH to R: Toward an Inclusive Pedagogy in Teaching Coding

Abstract
SCRATCH, developed by the Media Lab at MIT, is a kid-friendly visual programming language designed to introduce programming to children and teens in a “more thinkable, more meaningful, and more social” way. Although it was initially intended for K-12 students, educators have also used it in higher education and found it particularly helpful for students who have not had the privilege of learning to code before college. In this article, we propose an interactive and fun SCRATCH introduction project as a gateway to learning R in introductory or intermediate statistics courses. We begin with a literature review on recent K-12 computing education and on how visual coding has been used in college classrooms as an aid for teaching syntax-based coding. We then explain the design of the proposed project and share observations from a pilot study at a liberal arts college with 39 students who had diverse coding experiences. We find that the most disadvantaged students are not those with no coding experience, but those with poor prior coding experience or low coding self-efficacy. This innovative SCRATCH-to-R approach also offers a pathway toward an inclusive pedagogy in teaching coding.


Introduction
The hottest term in higher education these days is DEI (Diversity, Equity, and Inclusion), and there are countless calls for changes in how we teach Statistics and Data Science (SDS) and Computer Science (CS) in colleges. To name a few, a recent editorial in the Journal of Statistics and Data Science Education (JSDSE) invited SDS educators to carefully examine "pedagogical practice, including unexamined assumptions and behaviors" as one way to promote inclusivity in SDS classrooms (Witmer 2021, p. 3). CSforALL, an organization committed to equity in CS education, pledged to "repeatedly speak out against our historical pedagogies and approaches to computer science instruction that are grounded and designed to weed out all but a small prerogative subset of the U.S. population" (CSforALL 2020). The Diversity and Inclusion Committee in the Department of Computer Science at the University of Maryland published a DEI statement in 2020 addressing the urgency of changing the CS curriculum so that "classes are structured such that students starting out with less computing background can succeed, as well as reorienting the department teaching culture toward a growth mindset" (Lin 2020). Similarly, Guzdial (2020a, 2020b) urged educators to rethink how computing is taught so that the long-standing structural inequities between the most and least privileged students could be eliminated, or at least reduced. More specifically, Guzdial recommended that educators "go ahead and bore your best students," arguing that helping students with less computing background succeed would benefit […] project, named "SCRATCH introduction project," followed by results from a pilot study conducted in Spring 2021. The article concludes with a discussion of the proposed project and potential future work.

Why Learn Coding
This generation of young people is very different from most SDS educators in schools. Prensky (2001) called them "digital natives": children and college students today grew up with all kinds of digital technologies, and most of them can fluently speak the "native" language of computers, online games, mobile apps, text messages, social media, etc. But this does not mean that all of them can "write" this language well. Young people still rely on parents and educators to help them develop computational thinking (Resnick 2007).
Wing (2006, p. 33) described computational thinking (CT) as a fundamental skill for everyone, not just for computer scientists, one that "involves solving problems, designing systems, and understanding human behavior, by drawing on the concepts fundamental to computer science," and argued that "we should add computational thinking to every child's analytical ability." While "computer science is not computer programming" (Wing 2006, p. 35), programming does support computational thinking and offers many important benefits to young learners. These include improving their creativity, expanding the range of their learning, advancing their spatial and reasoning skills, and helping them reflect on their own thinking and learning (Resnick et al. 2009; Scherer, Siddiq, and Sánchez Viveros 2019). Moreover, Forsström and Kaufmann (2018) noted that adding programming to mathematics education could improve not only students' motivation for learning mathematics, but also their performance in it. Two studies (Arfé, Vardanega, and Ronconi 2020; Salehi et al. 2020) suggested that first graders and college students who study computer science tend to perform better on problem-solving challenges than those who do not. As Steve Jobs put it in his 1995 interview with journalist Robert Cringely, "everyone in this country should learn to program a computer, because it teaches you to think."

Coding Education in K-12
While teaching kids to code is beneficial, K-12 teachers still have mixed perspectives about how to introduce coding into the K-12 curriculum. For instance, a cross-country study by Wu et al. (2020) examined teachers' perceptions of coding and their readiness to teach it in K-12 classrooms. They found that most teachers in Finland, Singapore, Taiwan, and South Korea recognize the importance of coding and believe that "coding skills are needed also for those who are not aiming to be professional programmers" (p. 25). They also found that K-12 teachers in Singapore, Taiwan, South Korea, and Mainland China are more ready than those in Finland to implement curricular changes that introduce coding in basic education, with teachers in Mainland China holding the most positive attitudes toward technological change. As Dagiene et al. (2014) noted, the lack of coding education in K-12 has become a serious issue in many Western countries, and the 2020 State of Computer Science Education report (p. 2) revealed that fewer than half (47%) of U.S. high schools teach programming.
Why are K-12 teachers hesitant to add coding to their curricula? Learning to code is not always easy (Fitzgerald, Simon, and Thomas 2005; Vainio and Sajaniemi 2007) and can be particularly challenging for novice learners (Gross and Powers 2005; Kelleher and Pausch 2005). Several researchers have further pointed out difficulties in learning to code, such as teachers' lack of content knowledge, sense of purpose, problem-solving skills, and ability to use logical thinking (Lahtinen, Ala-Mutka, and Järvinen 2005; Koscianski and Bini 2009). Many children are simply not ready to "master the syntax of programming," or to debug when things go wrong (Resnick et al. 2009, p. 63).
In the past two decades, many studies have focused on addressing these learning barriers for children and teens, and several new programming languages have been developed with young learners in mind. Some of these languages take a text-based approach built on traditional languages such as C (Saito and Yamaura 2013). Some take a visual approach with new interfaces, such as Alice (Cooper, Dann, and Pausch 2000) or SCRATCH (Resnick et al. 2009). Others combine text and visual methods for problem-based learning (O'Kelly and Gibson 2006) or game-based learning (Jiau, Chen, and Ssu 2009; Vasilateanu, Wyrazic, and Pavaloiu 2016). Several more recent studies have revealed advantages of block-based programming over text-based programming (e.g., Price and Barnes 2015; Weintrop et al. 2019). Saito, Washizaki, and Fukazawa (2017) investigated the learning effects of text-based and visual input methods for first-time learners aged 6 to 15 and found that "a visual input is advantageous in a programming implementation environment for first [-time] learners" (p. 210). Weintrop and Wilensky (2017) compared block-based and text-based methods in two introductory high school programming classes. One class used a block-based interface and the other a text-based interface to work through the same curriculum during the first five weeks of the school year. The students using the block-based modality learned more effectively and became more interested in taking further computing courses. Rather than choosing one or the other, some research has shown that learning a block-based language first can help novice programmers learn a more traditional text-based language.
For instance, Meerbaum-Salant, Armoni, and Ben-Ari (2013) and Armoni, Meerbaum-Salant, and Ben-Ari (2015) investigated the transition from learning to code with SCRATCH in middle school to learning a text-based language such as C or Java in secondary school. They found not only that core CS concepts were successfully learned in the SCRATCH interface, but also that learning SCRATCH first in middle school greatly facilitated students' learning of a more complex language in secondary school. As they noted, "less time was needed to learn new topics, fewer learning difficulties, and they achieved higher cognitive levels of understanding of most concepts" (Armoni, Meerbaum-Salant, and Ben-Ari 2015, p. 1). A recent study by Grover (2021) reported similarly promising results for this block-to-text approach in middle school classrooms.

Text-Based versus Visual Block-Based Coding
Why does block-based coding work so well? The authors of "Learnable Programming: Blocks and Beyond" (Bau et al. 2017) offer a good explanation: the learnability of block-based languages comes from how visual blocks lower the following learning barriers of text-based languages:
• Learning a text-based language is hard because of all the "new vocabularies."
• Text coding is difficult because of the high cognitive load of learning a "new syntax," especially for new learners.
• More critically, writing text code is highly error-prone, and nothing is more discouraging to novice coders than getting an "error message."
They further explained how visual blocks help eliminate possible struggles and frustrations with syntax. For example, blocks rely on "recognition instead of recall," and choosing a block from a palette is easier than memorizing hundreds of "vocabularies." Moreover, block-based coding can reduce cognitive load by "chunking code into a smaller number of meaningful elements" (Bau et al. 2017, p. 74). Most blocks in SCRATCH are displayed as colorful objects with a few empty sockets for inputs, so the "syntax" of the language is automatically visualized for users. Most importantly, the shapes of the blocks make it impossible to connect two incompatible pieces, which prevents (not just reduces) errors. In other words, the "grammar" of the program is visualized too.
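The idea that block shapes make the grammar visible can be illustrated with a small model. The sketch below is a hypothetical simplification, not SCRATCH's actual implementation. Each block has a shape, and each socket lists the shapes it accepts. A drop that does not fit is refused, so the learner never sees a syntax error message. The class names, the two shape labels, and the sample block labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Block:
    """A hypothetical block: a label plus the shape that determines where it fits."""
    label: str
    shape: str  # illustrative shapes only: "reporter" (round) or "boolean" (hexagonal)


@dataclass
class Socket:
    """A hypothetical empty socket that accepts only certain block shapes."""
    accepts: FrozenSet[str]
    contents: Optional[Block] = None

    def drop(self, block: Block) -> bool:
        # Mimic the editor: an incompatible block does not snap in, the socket
        # stays unchanged, and no error message is produced.
        if block.shape not in self.accepts:
            return False
        self.contents = block
        return True


# The condition socket of an "if" block, modeled here as accepting only boolean blocks.
if_condition = Socket(accepts=frozenset({"boolean"}))
print(if_condition.drop(Block("x position", "reporter")))     # refused: shape does not fit
print(if_condition.drop(Block("touching edge?", "boolean")))  # snaps in
```

The design choice mirrors the point above: correctness is enforced when a piece is placed, not reported afterward. This is why block editors can prevent, rather than merely reduce, this class of errors.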

Block-to-Text Approach in Higher Education
This block-to-text approach is not brand new to higher education and can be found in many introductory computer science courses in universities. For instance, according to Bau et al. (2017), Harvard's CS50 students begin with SCRATCH and then move to C; Berkeley's CS10 course moves from visual-based language Snap! to text-based Python; Code.org's Computer Science Principles (CSP) course uses Droplet blocks before moving to JavaScript; and Project Lead The Way's CSP course teaches both SCRATCH and App Inventor (another visual-based coding application maintained at MIT) before transitioning to Python. An earlier study (Moskal, Lurie, and Cooper 2004) conducted at two colleges suggested that using Alice before learning Java greatly improved the retention and performance of their "at risk" introductory CS majors (referring to students with little prior programming experience and weaker math preparation).
Two other studies (Malan and Leitner 2007; Wolz et al. 2009) found that using SCRATCH before Java or C in the same class not only helped students understand important CS concepts, but also improved their engagement. Dann et al. (2012) conducted a two-year study at Carnegie Mellon University (CMU) on the transition from Alice to Java in an introductory programming course. Since Alice was designed in-house at CMU, they developed a new version of the Alice system (Alice 3) in which students can view accurate Java code, with syntax details, directly in the Alice 3 interface. What they found might surprise some people: students enrolled in the sections using both languages (who therefore had less time to practice Java) significantly outperformed those in the sections using only Java on the same Java final exam. Despite its popularity in CS education, this block-to-text approach does not seem to have drawn similar attention in the SDS education community (at least to the best of our knowledge), although the challenges of teaching coding are nothing new to SDS educators (Gomes and Sousa 2018). Some have begun addressing the importance of inclusive practices in SDS (e.g., Dogucu, Johnson, and Ott 2021). Our goal here is to provide an innovative project that uses SCRATCH as a fun means for learning the general programming syntax of R (e.g., objects, functions, arguments, and sequences). We also aim to make SDS students' first R experience as positive as possible, regardless of their prior experience.

Student Self-Efficacy
As Dewsbury and Brame (2019) stated in their article titled Inclusive Teaching, "Pedagogical practices that improve sense of belonging and self-efficacy help reinforce a classroom climate that is inclusive" (p. 4). To ensure a positive and inclusive R experience for all students, we decided to focus first on understanding how student self-efficacy in R coding changed over the course of a semester in this study.
Self-efficacy, according to Bandura (1997, p. vii), is "the exercise of human agency through people's beliefs in their capabilities to produce desired effects by their actions," or, more simply, one's confidence in their ability to achieve a goal or complete a task. Usher and Pajares (2008) conducted a critical review of self-efficacy and summarized its four theorized sources in academic contexts:
1. Mastery experience: how students interpret the results of their previous accomplishments. This is the most powerful of the four sources and is known to have enduring effects on self-efficacy. When students believe that their past attempts were successful, their confidence in completing similar tasks increases.
2. Vicarious experience: this occurs when students observe the performances of their peers or others. For instance, students often compare their exam scores with their classmates' or with the class average to get a sense of how well they performed.
3. Social persuasions: external verbal encouragement and support from parents, friends, or teachers can help boost or sustain students' confidence in their capabilities, especially when they are struggling.
4. Physiological states: internal feelings associated with success or failure, such as satisfaction versus frustration or comfort versus anxiety.
Bandura (1997) suggested that physiological states may relate to self-efficacy curvilinearly, while most researchers generally believe that "increasing students' physical and emotional well-being and reducing negative emotional states strengthens self-efficacy" (Usher and Pajares 2008, p. 90).
Soykan and Kanbul (2018) studied K-12 students' self-efficacy regarding coding and found that K-12 students receiving coding education tend to have higher self-efficacy than those who do not. In another study, Tsai (2019) found a positive effect of App Inventor 2 (a visual coding language) on college students' understanding of basic programming concepts. Tsai further pointed out that this effect was "especially large in students with moderate and low self-efficacy" (p. 224). Moreover, many studies have shown that self-efficacy is a robust and significant factor (in some studies, the only significant factor) in academic achievement across domains; see, for instance, Ramalingam, LaBelle, and Wiedenbeck (2004), Zajacova, Lynch, and Espenshade (2005), and Honicke and Broadbent (2016). Wilson and Shrock (2001), after studying 12 factors that may contribute to student success in an introductory computer science course, concluded that the most important predictor of students' performance was their comfort level, a source of self-efficacy.

Why SCRATCH
According to Wolz et al. (2009), SCRATCH is "both a social computing environment and a rich programming language with a highly supportive interface." Malan and Leitner (2007) proposed using SCRATCH as a first language in introductory courses, for majors and nonmajors alike. They argued that SCRATCH allows less experienced learners to master programmatic constructs and focus on problems of logic before worrying about syntax. Developed by the Lifelong Kindergarten team at MIT's Media Lab, SCRATCH was created with first-time programmers at its core and three goals in mind: to be more thinkable, more meaningful, and more social than other programming interfaces (Resnick et al. 2009). It is creative and playful, lowers the bar to programming, lets learners code with a mouse, and empowers first-time programmers of all ages and backgrounds from the very beginning. But honestly, we had never heard of SCRATCH until Summer 2020, when we were stuck at home during the pandemic and were asked to develop "high-quality" online courses for 2020-2021. Suddenly, building a strong remote learning community and motivating students with very diverse coding backgrounds to learn R became the biggest challenges in our teaching preparation. After watching our school-aged children take an online SCRATCH course, we were so inspired that we created an introduction project in SCRATCH. We initially intended to use it to introduce ourselves to students from a distance, but quickly realized that it would be a perfect activity for our students to do in Week 1.

The Course and the Project
The proposed SCRATCH project was implemented in an Intermediate Statistics course (STAT 230) at a liberal arts college in Spring 2021. Originally designed as a second statistics course for potential statistics majors, STAT 230 has become one of our service courses to the College and often attracts a diverse student body from other disciplines. While it has a prerequisite of one introductory statistics course, students who earned a score of 5 on the AP Statistics exam are recommended to place out of the prerequisite and take STAT 230 directly. After taking this course, students are expected to be able to:
(a) appropriately perform statistical analyses of data and properly interpret and communicate their results;
(b) understand how most R functions listed in Pruim (2021) work and how to run R for data analysis;
(c) understand the basic working environment in RStudio; and
(d) create reproducible reports using R Markdown.
It is our departmental practice to use the R package "mosaic" (Pruim, Kaplan, and Horton 2017) to teach R in all of our introductory statistics courses and in STAT 230. Even so, the proposed SCRATCH project can also be integrated into courses that use other R packages or syntaxes.
We sent the Project Guideline (see Appendix A) as part of our welcome e-mail to pre-registered students about a week before Spring 2021 began. The e-mail invited them to get to know us through our SCRATCH introduction project and to begin thinking about how to introduce themselves to the class in a similar way. One useful feature of SCRATCH is its "remix" functionality, which lets a user duplicate any shared SCRATCH project and turn it into their own within a few minutes. Our students could therefore use our SCRATCH project as a template and remix it to create their own introduction. After remixing our project, they could click the "See Inside" button in the upper right corner to access the SCRATCH coding interface, as shown in Figure 1. The interface has three main sections:
• a block palette on the left;
• a coding area in the middle; and
• a stage area on the right, which displays the project results on top and all the sprites (images, objects, or characters) and backdrops (backgrounds) at the bottom.
SCRATCH was built upon Papert's principles (1980) of good programming languages: low floor (easy access to get started), high ceiling (great potential to expand), and wide walls (inclusive capacity to support different types of projects, interests, styles, etc.). As a result, even coding novices can immediately begin dragging blocks from the block palette on the left, dropping them into the coding area in the middle, and clicking the green flag at the top to see how the project changes. Students could change photos, sprites, backdrops, and content to "personalize" the project. Additional materials, including a "Get Started with SCRATCH" document and a tutorial video, can be easily accessed via the links in the Project Guideline (see Appendix A). Numerous good tutorials are also available within SCRATCH.
After students completed their projects, they clicked the "Share" button on their SCRATCH webpage and posted the web link to their SCRATCH project in a designated discussion forum on the course website, sharing their introduction with the whole class. They were also asked to view and comment on their peers' posts to get to know them, and they could revisit those posts when meeting classmates throughout the semester. Since students were encouraged to share their hobbies and interests, the shared projects also gave students (and us) opportunities to get to know each other at a deeper level. This assignment was graded on completion.

Transition from SCRATCH to R
After all students had played with (or, more precisely, self-explored) SCRATCH for about a week or two and turned in their SCRATCH projects, we transitioned to R in the first lab session at the end of the first week. We used some self-created SCRATCH-like blocks (shown on the left side of the arrows in Figure 2) to explain some basic R functions (shown on the right side of the arrows in Figure 2) in the formula template of the "mosaic" package (Figure 3; Pruim, Kaplan, and Horton 2017). This is the R package used for data analysis throughout the rest of the semester, and its consistent syntax makes the transition from a block-based language to a text-based language easier.
We believe that this analogy between SCRATCH and R plays a critical role in student learning, especially for students new to syntax-based coding. It "visualizes" and gives life to a text-based programming language, and it helps students understand, almost effortlessly, what an R function is and how it works. Take the first R function in Figure 2, msummary(), as an example. Students could imagine dragging an "msummary" block from the block palette to the coding area, typing the name of the dataset of interest into the empty socket, and clicking the green flag icon in SCRATCH. The numerical summaries of all variables in that dataset would then appear in the Stage area. Take the third function, gf_point(), as another example. This is the formula-based function (provided by the ggformula package, which mosaic loads) used to generate scatterplots. Its corresponding "gf_point" block, if it existed, would have three sockets for the user to enter the y-variable, the x-variable, and the dataset. The other parts of the block/function would be unchangeable, including the tilde "~" between the y and x variables. In other words, we used what students had seen in SCRATCH to show that a function in R, like a block in SCRATCH, has two kinds of parts. Some parts we cannot change (the structure of a block vs. the name and syntax of a function). Other parts we have to decide on and fill in (the empty sockets in SCRATCH blocks vs. the arguments of R functions, such as datasets and variables). Furthermore, we used this opportunity to explain various features of R Markdown:
• a code chunk (like a stack of blocks);
• the "Run Current Chunk" button (like the green flag icon in SCRATCH);
• R packages (like the colors/categories in the block palette); and
• an R Markdown document (like a SCRATCH project).
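The block/function analogy can also be written as a small sketch. Here a block is a fixed template (the function name, the tilde, the commas, and the "data =" label) with named empty sockets. Filling every socket yields the text of an R call. This is a hypothetical Python model written for illustration: RBlock and snap() are not part of SCRATCH, R, or mosaic, and the code only builds strings. It does not run R. KidsFeet (from the mosaicData package that mosaic loads) and its length and width variables are used only as sample socket values.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RBlock:
    """Hypothetical SCRATCH-like block standing in for an R function call.

    Everything in `template` is fixed, like the body of a block; each {name}
    placeholder is an empty socket that the learner must fill in.
    """
    template: str

    def sockets(self) -> List[str]:
        # Socket names in the order they appear on the block.
        return re.findall(r"\{(\w+)\}", self.template)

    def snap(self, **inputs: str) -> str:
        """Fill every socket and return the resulting R code as text."""
        names = self.sockets()
        missing = [name for name in names if name not in inputs]
        unknown = sorted(set(inputs) - set(names))
        if missing:
            raise ValueError(f"empty socket(s): {', '.join(missing)}")
        if unknown:
            raise ValueError(f"no such socket(s): {', '.join(unknown)}")
        return self.template.format(**inputs)


# Blocks for two of the functions discussed above; only the sockets are editable.
msummary_block = RBlock("msummary({data})")
gf_point_block = RBlock("gf_point({y} ~ {x}, data = {data})")

print(msummary_block.snap(data="KidsFeet"))
print(gf_point_block.snap(y="width", x="length", data="KidsFeet"))
```

Leaving a socket empty, for example omitting x, raises an error instead of producing a malformed call. Because the tilde and the data = argument name belong to the block, the learner never types them.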
Those connections may seem trivial at first, but pointing out those similarities between these two different programming languages helps the least experienced students understand basic structures of a complex language like R in the most vivid and intuitive way possible. The joy and confidence students gained from completing this SCRATCH project also helped set the right tone for student learning from the very beginning.

Participants and Survey Question
There were 46 students enrolled in two sections of STAT 230 in Spring 2021, 39 of whom granted us permission via IRB consent to use their data for this pilot study. We classified them into three groups based on their Day 1 responses to the survey question (Q1): "If you used R before, on a scale of 0-10 (0 being extremely unpleasant, 10 being extremely pleasant), how comfortable do you feel about using R?"
Group 1: R novices. 11 of the 39 students had never used R before (all of them first-year students); they were instructed to enter 0 for Q1.
Group 2: R progressors. 11 of the 28 students who had used R before said that they did not feel comfortable using R; these students gave a score of 5 or lower on Q1.
Group 3: R proficient. The remaining 17 students who had learned R previously gave a score of 6 or higher on Q1, indicating that they felt comfortable using R.
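The grouping rule above can be written as a short function. One detail is worth making explicit. Because R novices were told to enter 0, a Q1 score of 0 alone cannot separate a novice from a prior R user who felt extremely uncomfortable. The sketch therefore assumes a separate yes/no answer about prior R use. The function name and group labels are illustrative and are not the authors' analysis code.

```python
from typing import Optional


def r_group(used_r_before: bool, comfort: Optional[int]) -> str:
    """Assign a Day 1 respondent to an R group using the Q1 cutoffs in the text.

    Assumes (hypothetically) that a separate yes/no item records prior R use.
    """
    if not used_r_before:
        return "R novice"  # novices were instructed to enter 0 for Q1
    if comfort is None or not 0 <= comfort <= 10:
        raise ValueError("Q1 score must be an integer from 0 to 10")
    # Prior R users: a score of 5 or lower means not yet comfortable with R.
    return "R progressor" if comfort <= 5 else "R proficient"


print(r_group(False, 0))  # never used R
print(r_group(True, 0))   # used R, extremely uncomfortable
print(r_group(True, 8))   # used R, comfortable
```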
It is worth noting that 5 of the 17 students in Group 3 rated their comfort level using R as 8 (out of 10). This group of 39 students is a typical mix of learners with very heterogeneous coding backgrounds in our STAT 230. About a quarter of the class were R novices who had placed out of the introductory statistics prerequisite with an AP Statistics score of 5 or other quantitative training. Another quarter had learned R previously in an introductory statistics course. These students either did not have a positive first experience or did not gain self-efficacy for R coding from that course. We learned this from students' reflections on their prior R experience in the first-day survey and from one-on-one conversations during office hours. The remaining students already felt good about using R, and some of them may have taken more than one statistics course; these were the R proficient students. Teaching R programming to students with such diverse prior experience has always been a major challenge in this 200-level statistics course.
To understand students' self-efficacy in R coding over the course of the semester, we tracked their comfort levels using R by asking Q1 two more times. We asked it once at the end of Week 4 and again at the end of the semester (in the last week of the teaching period, Week 14). All 39 participants (Group 1: 11, Group 2: 11, Group 3: 17) completed the survey on the first day. Two students missed the Week 4 survey (37 participants: Group 1: 11, Group 2: 10, Group 3: 16). One student in Group 1 missed the end-of-semester survey (38 participants: Group 1: 10, Group 2: 11, Group 3: 17).
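The response counts above can be tallied to make each survey wave's total, number of missing participants, and response rate explicit. The counts come directly from the text. The dictionary layout and labels are simply one convenient way to organize them.

```python
# Respondents per group at each administration of Q1, as reported in the text.
responses = {
    "Day 1": {"R novices": 11, "R progressors": 11, "R proficient": 17},
    "Week 4": {"R novices": 11, "R progressors": 10, "R proficient": 16},
    "Week 14": {"R novices": 10, "R progressors": 11, "R proficient": 17},
}
PARTICIPANTS = 39  # students who consented via IRB

totals = {wave: sum(groups.values()) for wave, groups in responses.items()}

for wave, total in totals.items():
    missing = PARTICIPANTS - total
    print(f"{wave}: {total} respondents, {missing} missing, "
          f"response rate {total / PARTICIPANTS:.1%}")
```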

Results and Perspectives
Appendix B includes some exploratory results of the pilot study. They show that the majority of students enjoyed the proposed SCRATCH project, found SCRATCH easy and intuitive to use, and believed that the project helped them understand how basic coding works, regardless of their prior coding background. However, it is not our intention to examine the effectiveness of this SCRATCH-to-R approach here. Instead, the three main goals of this pilot study are:
(a) to provide an innovative and inclusive pedagogical tool for introducing coding to students with diverse prior experience;
(b) to understand how the coding self-efficacy of students with different prior coding experience changes over time; and
(c) to find out which group(s) of students are more disadvantaged than others when it comes to coding self-efficacy.
We also hoped that students could get to know each other and us (the professors) through this fun project. Some students' comments about the project are included in Section 4.4. Figure 4 shows how students' comfort levels using R varied among the three R groups over the semester. Notice how divergent the three groups' comfort levels were in the first week and how much closer they were at the end of the semester. We cannot fully attribute the observed "improvements" to the SCRATCH project presented here, as those effects were highly confounded with other scaffolding course materials and unmeasured factors. Even so, Figure 4 helps visualize the "self-efficacy gaps" among the three groups and how those gaps evolved over time. Although all students learned R from the same material from the beginning, their levels of self-efficacy in R coding were very different.
For instance, some of the R novices caught up with the R progressors (or even some R proficient students) within three weeks, while the majority of R progressors moved upward relatively slowly. A few of them even remained "progressing" through the end of the semester, still rating their comfort level using R around 5.

Disparities in Student Coding Self-Efficacy
One might use Figure 4 to argue that coding educators need not worry much about how to introduce R programming. On this view, students eventually reach similar comfort levels by the end of the semester, however overwhelmed R novices or R progressors might feel at the beginning and however much they might struggle. That is certainly not our point. First, keep in mind that the results shown here were influenced by the proposed SCRATCH project (whose effect will be examined in a separate study), so students' comfort levels using R could look different without such an intervention. Moreover, Figure 4 brings to light not only the changes in students' comfort levels using R, but also the additional cognitive load on R novices and R progressors. Students with no or poor prior coding experience apparently started off disadvantaged. Throughout the term, they had to spend extra time and energy closing their coding gaps while mastering the same statistics material as the R proficient students. In other words, R novices and R progressors are disadvantaged from the beginning to the end of the semester. Compared with R proficient students in the same class, they face substantially more content to learn and much more stress or anxiety to cope with. This is a learning inequity issue that has long been overlooked in college coding education. We hope that more SDS educators will provide equitable support and inclusive tools to R novices and R progressors so that the learning barriers resulting from such "catching up" can be further lowered. This would give these disadvantaged students a more equal chance to thrive in our SDS classrooms.

The Most Disadvantaged Group
From our teaching experience, Group 2 (R progressors) is probably the most disadvantaged of the three groups. Some of these students tend to "hide" at the beginning of the semester because they lack confidence in their coding skills, while others may struggle through the whole semester if adequate and timely support is not provided. After all, students who struggle are often the most silent, especially when they do not feel supported and "included" in class. This pilot study confirmed our anecdotal experience. Many R novices were excited about learning R programming. R progressors, by contrast, had learned R before but did not have a good first (or second) coding experience or did not gain sufficient confidence in R coding from those experiences. Therefore, they may need to "un-learn" what they learned incorrectly and figure out which parts they missed or misunderstood before they can move forward and do well. Alternatively, they may need a different, more inclusive and intuitive pedagogical approach to help them better understand how coding works and gain coding self-efficacy. Either way, this is another hidden layer of stress and anxiety for R progressors that most R novices and R proficient students do not face.
Too often SDS educators (including us) teach too much and too fast at the beginning of the semester and forget that the do-more-coding-exercises-on-your-own approach might not work for all learners, especially those with poor prior experience or low coding self-efficacy. This pilot study shows that the lingering effect of poor prior experiences or low self-efficacy in learning coding may be larger than many of us thought, so it is crucial for SDS educators to make the first few computing assignments in our 100- or 200-level courses as inclusive, accessible, encouraging, and positive as possible. We argue that integrating more inclusive pedagogy into coding education is indispensable if we want to recruit and retain more marginalized and disadvantaged students in statistics and data science and diminish the systemic inequity in the SDS education system.

Community Building in a Remote Learning Environment
According to Murphy et al. (2021a, 2021b), belonging and connection in the classroom play important roles in student success and well-being, especially for marginalized students, but these interpersonal relationships were particularly hard to foster during the pandemic. That is why we set "community building" as our number one goal when preparing for remote teaching last year, which soon sparked the birth of this SCRATCH project. To understand whether this SCRATCH project helped students and us build a learning community remotely, we asked students to recall and reflect on the project at the end of the semester. Student feedback was overwhelmingly positive in this regard. Almost all students (all but one), regardless of their prior R background, agreed that the project helped them get to know us (the professors) and see us as human beings from the first day; they found it easy to learn things beyond our titles and names, such as our journeys in SDS, passions, and hobbies, from our shared sample project. Likewise, we could learn more about our students than the names and class years shown on the course roster from their shared projects. In addition, 83.8% of the class agreed that it was easy to get to know their peers through those shared projects and found this an effective way to connect with their peers in a virtual space. The information students learned from others' SCRATCH projects helped them identify like-minded partners, or partners with similar interests or hobbies, for coursework, and to initiate casual conversations about life beyond academic collaboration. Several students told us that the relationships they built in our class seemed more deeply rooted than those in other classes they took during the pandemic, and many of them became good friends and met regularly even after the semester ended.

Students' Feedback
While this pilot study was conducted in Spring 2021, this SCRATCH project was first introduced in our class in Fall 2020, and it received very positive feedback from students, especially those with poor prior coding experience. Several R progressors told us (in separate conversations) that they hadn't fully understood how R works when taking intro stats and had hesitated to take STAT 230 because of R. However, completing this SCRATCH project in the first week and hearing the analogy between SCRATCH and R in the first lab helped them close some hidden learning gaps, and "suddenly everything clicked" (a direct quote from a student). They felt less worried about coding in general and had more joy and confidence in learning R this time. Here are some other comments from students in Spring 2021:
• I thought that learning how to figure SCRATCH out on my own was really valuable, since it helped me adjust to the learning curve of using R for the first time.
• I liked the fact that we were allowed to be creative with our SCRATCH projects… I had a lot of fun. I also really liked the problem solving aspect of it.
• I thought it was cool to drag the functions over to the coding region to get a sense of the logic required for the coding.
• I liked the easiness to attach blocks together in SCRATCH, it made it very intuitive to code.
• Doing the blocks with code helped to visualize what we are doing in R.
• I thought that making blocks was helpful in understanding coding structure.
• I thought the analogy of SCRATCH vs R was helpful for my understanding.
• It was nice to meet classmates at our own pace.
• Scratch project was helpful for getting to know the rest of the class.

Conclusion and Discussion
D'Ignazio and Klein (2021) stated in their keynote address at the 2021 United States Conference on Teaching Statistics (USCOTS) that "feminist design prioritizes the participation of people who have been most marginalized by the system," and in their book, Data Feminism, they suggest that "we should be creating new knowledge and new designs from the margins" (D'Ignazio and Klein 2020, p. 139). Designed with students who have no or poor prior R experience, or low coding self-efficacy, in mind, the proposed SCRATCH project helped the disadvantaged groups (R novices and R progressors) more than the dominant group (R proficient students, dominant in their coding advantage rather than in size), while everyone in the class, including us, benefited from the project's community-building aspect. The pilot study further revealed distinct changes in self-efficacy over the semester among students with different prior experiences, drawing our attention to the most disadvantaged group: the R progressors, who need to unlearn the past and catch up on the R learning curve, rebuild their confidence in R coding, or both, while mastering the same statistics material as their R proficient peers. This learning inequity is, in a sense, unique to introductory and intermediate SDS courses, because students in those courses, who often have diverse prior coding experiences, must learn not only statistical concepts and models but also a programming language at the same time. Thus, teaching coding effectively and inclusively is an urgent task for all SDS educators.
Traditional coding education relies heavily on lectures and problem sets, assuming that students can "figure out" how coding works simply by working through many tutorial exercises. However, such an approach rewards only students who are independent, clever, and already excelling in higher education (Ko 2020), and overlooks students who are struggling, disadvantaged, or marginalized. To build an inclusive and equitable learning environment for students with different programming backgrounds and abilities, SDS educators have to work together to create innovative pedagogical tools "from the margins," as well as brainstorm new ways to support the least privileged students. We believe that our pilot study helps shed some light on this front.
Briefly, let us address some possible questions about the proposed project and potential future work. First, we are not interested in studying whether students enjoy learning SCRATCH (although many of them do, according to the pilot study). The only programming language we want our students to master in this course is R, and SCRATCH is merely an aid; this SCRATCH project was the only assignment for which students needed to use SCRATCH. Second, we are not suggesting that simply using SCRATCH in the first week would automatically help students learn R during the rest of the semester. What matters more, pedagogically, is probably the analogy between SCRATCH and R, as highlighted in Section 3.3. Instructors need to plan carefully and make an intentional transition from SCRATCH to R, which brings up the third question about this SCRATCH-to-R approach. Some SDS educators may be concerned about possible negative effects on students' learning if more than one programming language is introduced in the same course. Note, though, that this block-to-text approach has been widely studied in higher education (see Section 2.4). Moreover, some recent studies have found positive outcomes from teaching two programming languages in the same semester; see, for instance, Förster et al. (2021) and Ham and Amini (2017).
In our pilot study, we focused only on students who were disadvantaged, and thus marginalized, by inequitable prior preparation or coding experience. Building on this pilot study, we plan to collect more data, including demographic information about participants, and conduct a comparative study. In the near future, we will compare outcomes from classes that implement the SCRATCH project with those that do not, either over multiple semesters or across different levels of SDS courses. This investigation will help us better understand how this SCRATCH-to-R approach affects the R learning of students with different prior backgrounds, as well as students with other marginalized identities (such as women or people of color). We wonder whether the students' demographics, or their perceived understanding of statistical concepts, differ significantly among the three R groups. We are also curious about how this new approach may help close gaps in programming self-efficacy and/or knowledge between students on the margins and those in the majority. Last but not least, we believe that creating a block-based R application would likely be game-changing. It would not only lower the entry barriers embedded in complex syntax-based coding but also make R more accessible and enjoyable for all students, just like SCRATCH.

[Purpose of the project]
The main purpose of this introduction project is for us to get to know each other from the very beginning and to help build a strong community throughout the course! Using SCRATCH (a coding program developed by MIT) to create this project should also help us learn some basic coding concepts *without* writing any code/syntax! Check out our sample project to get some ideas about what this project should look like: Group!), pointing out something you have in common with each of them, or any perspective or experience they shared that surprised/excited you. 3. Your response to each post must include at least two complete sentences. Ideally the second post should be done before your first hangout with your Work Team, or no later than Sunday 2/21, 10:00 p.m.

Appendix B: Exploratory Results
Shortly after the first lab (i.e., after explaining the analogy between SCRATCH and R), students were asked to answer the following survey questions: Overall, over 80% of students found the SCRATCH project enjoyable (80.6%) and agreed that SCRATCH is easy to use (83.3%). Looking at these percentages by group (Figures 5 and 6), R novices appeared to enjoy the project the most (88.9%, compared to 72.7% for R progressors and 81.3% for R proficient students); interestingly, those who didn't find SCRATCH easy and intuitive to use had all learned R previously. The corresponding sample sizes in these figures are 9 R novices (Group 1), 11 R progressors (Group 2), and 16 R proficient students (Group 3). Examining students as a whole, we found that 77.8% believed this project helped them learn basic coding, 13.9% said they were not sure, and 8.3% didn't find it helpful but had no problem using R; all of the latter had used R before. It may be worth noting that no student chose the option "No, and feel confused about how R works" in Q4. This finding echoes the study by Malan and Leitner in 2017 (although they used SCRATCH as a gateway to Java, not R): "Most students felt that SCRATCH was a positive influence [on learning Java], particularly those without prior background; those students who felt that SCRATCH was not an influence, positive or negative, all had prior programming experience." Figure 7 displays the responses to Q4 by group. The SCRATCH project again seemed to benefit students without coding background the most (88.9%), but a large share of R proficient students (81.3%) also believed that the project helped them understand how coding works. Figure 7 also shows that about 27.3% of R progressors were not sure whether SCRATCH helped them learn R coding in the first week, a much larger share than among R novices or R proficient students.
We wonder whether this difference was caused by R progressors' negative prior experience and lack of confidence in coding.
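The group and overall percentages above can be cross-checked with a little arithmetic. The sketch below is written in Python (an assumption; the article itself contains no code), and the per-group counts are back-calculated from the reported percentages and sample sizes (9, 11, 16) rather than taken from raw survey data. It shows that the overall 80.6% is the size-weighted (pooled) percentage, not the simple mean of the three group percentages.

```python
from decimal import Decimal, ROUND_HALF_UP

# Group sizes as reported in Appendix B.
GROUP_SIZES = {"R novices": 9, "R progressors": 11, "R proficient": 16}

# ASSUMPTION: counts of students who found the project enjoyable,
# back-calculated from the reported group percentages (88.9%, 72.7%, 81.3%);
# these are not the raw survey data.
ENJOYED = {"R novices": 8, "R progressors": 8, "R proficient": 13}


def pct(count: int, total: int) -> float:
    """Percentage rounded half-up to one decimal place.

    Decimal is used because Python's round() rounds exact halves to even,
    which would turn 13/16 = 81.25% into 81.2 instead of 81.3.
    """
    value = Decimal(100 * count) / Decimal(total)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def group_and_pooled(counts: dict, sizes: dict) -> tuple:
    """Return per-group percentages and the pooled (size-weighted) percentage."""
    by_group = {g: pct(counts[g], sizes[g]) for g in sizes}
    pooled = pct(sum(counts.values()), sum(sizes.values()))
    return by_group, pooled


def unweighted_mean(percentages: dict) -> float:
    """Simple mean of the group percentages, ignoring group sizes."""
    return sum(percentages.values()) / len(percentages)


if __name__ == "__main__":
    by_group, pooled = group_and_pooled(ENJOYED, GROUP_SIZES)
    print(by_group)  # {'R novices': 88.9, 'R progressors': 72.7, 'R proficient': 81.3}
    print(pooled)  # 80.6
    print(round(unweighted_mean(by_group), 1))  # 81.0
```

With these counts, the unweighted mean of the three group percentages is about 81.0%, slightly above the pooled 80.6%, because equal weighting ignores the unequal group sizes. The same `pct` helper reproduces the whole-class Q4 figures from 28, 5, and 3 of 36 respondents (77.8%, 13.9%, 8.3%), again assuming those counts.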