In search of experimental evidence on Scratch programming and students’ achievements in the first-year college computing class? Consider these datasets

This article presents datasets representing the demographics and achievements of computer science students in their first programming courses (CS1). They were collected from a research project comparing the effects of a constructionist Scratch programming intervention and conventional instruction on the achievements of CS1 students from selected Nigerian public colleges. The project consisted of two consecutive quasi-experiments. In both cases, we adopted a non-equivalent pretest-posttest control group design and multistage sampling. Institutions were selected following purposive sampling, and those selected were randomly assigned to the Scratch programming class (experimental) and the conventional (comparison) class. A questionnaire and pre- and post-introductory programming achievement tests were used to collect data. To strengthen the research design, we used the Coarsened Exact Matching (CEM) algorithm to create matched samples from the unmatched data obtained from both experiments. Future studies can use these data to identify the factors influencing CS1 students' performance, investigate how programming pedagogies or tools affect CS1 students' achievements in higher education, identify important trends using machine learning techniques, and address additional research ideas.


© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )

Subject: Computer Science

Specific subject area: Pedagogies for teaching novice computer science students

Type of data: Table, Figure

How the data were acquired: These data were collected from 4 cohorts of first-year polytechnic computer science students, representing 4 treatment groups. A CS1 student profile questionnaire and pre- and post-Introductory Programming Achievement Tests, all paper based, were used to acquire the data. Then, the Coarsened Exact Matching (CEM) algorithm was employed to generate matched treatment samples from the data. This resulted in 2 pairs of equivalent samples from the 4 treatment groups.

Data format: Raw. Filtered. Analysed.

Description of data collection: These datasets were gathered from 4 cohorts of Nigerian polytechnic CS1 students who participated in 2 successive experiments spanning 2 academic sessions. Institutions were selected using purposive sampling, and those selected were randomly assigned to treatment groups. The data collected include student profiles and pre-post achievement test scores. The participants were administered paper-based questionnaires and achievement tests. A computer science educator marked all the achievement tests, following the rubric presented in [1]. Data were collected from 520 first-year computer science students. We excluded data from subjects who did not complete all 3 instruments, leaving data from 418 participants.

Value of the Data
• These datasets contribute empirical data on the effect of Scratch, a block-based programming language, on students' achievements in a college first-year programming course (CS1).
• Academic researchers and students examining how programming pedagogy affects CS1 achievement, as well as computing instructors planning to use Scratch in a college course, can benefit from using these data.
• With these data, a researcher can estimate the effect size likely to be detected in an experiment comparing the effects of Scratch and conventional programming languages on the achievements of first-year college students. This effect size is a required input when performing a power analysis to determine the sample sizes for the treatment groups.
• They can also be used to test factors that may moderate CS1 achievement, such as previous achievements in English, mathematics, and physics, age, or gender.
• These data can be used to reproduce or replicate experiments comparing the effects of constructionist Scratch and conventional pedagogies on CS1 students' achievements. This can be achieved by employing the same unmatched or matched data, or by using a matching algorithm such as Coarsened Exact Matching (CEM) to generate randomly matched samples from the unmatched data.
• Other research questions or hypotheses can also be tested with these data. For example, the data collected include both the conceptual and algorithmic knowledge that the students provided in their answers to the open-ended questions in the achievement tests. From these, a researcher can explore the knowledge gained by the participants under both pedagogies.

Data Description
The data presented in this article were obtained from a research project that compared the effects of a constructionist Scratch programming intervention and conventional programming instruction on the achievements of first-year college computer science students. Constructionism, a variant of constructivist theory, is an educational philosophy propounded by the South African-born American mathematician and computer scientist Seymour Papert. Defined as a theory of learning and making, constructionism argues that students can engage better with knowledge if teachers provide them with the freedom to express their creative potential as they construct and share artefacts of interest with their peers [2]. Scratch, the most popular block-based programming language, is a product of the constructionist philosophy. While the constructionist class experienced inquiry-based learning, with the teacher presenting Scratch programming demos and students developing Scratch programs, the conventional instruction had lectures and labs (with students employing Visual Basic, a textual programming language). In this section, we present the demographic and achievement data from both treatment groups as provided in the repository [1].

Demographic Data
These datasets contain demographic information such as gender, age, and educational, programming, and artistic backgrounds. These variables provide a means for operationalising and measuring constructs that are sometimes found to moderate CS1 students' achievements. Some variables hold values reported directly by the participants, while others are indices computed from the self-reported data.

Achievement Data
We gathered the achievement data using the open-ended questions from the pretest and posttest. The questions were split into 2 categories: conceptual programming knowledge and computational thinking. By testing for computational thinking, we made the tests language-independent, since the two treatment groups were exposed to different programming languages. In doing this, we assessed activities that resemble constructing, explaining, and tracing program code.
We evaluated the students' answers to the questions in the achievement tests using a combined taxonomy (Bloom and SOLO), as in [3].
The taxonomy used in the grading rubric had 3 categories, from lowest to highest: unistructural, multistructural, and relational.
Each category, in turn, had 3 cognitive levels, from lowest to highest: understanding, applying, and creating.
Unistructural cognition denotes a student's limited knowledge of a body of concepts and a local perspective: the student fails to connect related ideas and misses the other points.
When a student responds with multiple ideas or concepts, this is a sign of multistructural knowledge; however, the student still does not connect these related concepts.
The relational category assumes that the student knows every related idea or concept and can connect them correctly.
Therefore, a student demonstrates the highest ability when their answer indicates relational creation and the lowest when it shows only unistructural understanding.
In the datasets provided in the repository [1], the demographic and achievement data are combined into SPSS and Microsoft Excel files. To simplify the presentation in this article, we divide the contents of a file in the repository into four tables (Tables 1-4).
Table 1 contains data from the CS1 Student Profile Questionnaire (CSPROQ). We performed some pre-processing in Microsoft Excel before moving the data to SPSS. As a result, the table includes self-reported data as well as data generated from the self-reported values using Excel formulas. For instance, EnglishGP, MathsGP, and PhysicsGP were computed from English, Maths, and Physics, respectively. PriorAcademicBackground is an index that indicates a participant's prior academic performance; EnglishGP, MathsGP, and PhysicsGP were used to compute it. PriorProgrammingLearning is also an index: it indicates the level of a participant's prior learning of programming and was calculated from the variables LearntInPrimarySchool, LearntInSecSchool, LearntAtITSchl, LearntAtITPark, OnTheInternet, and FromTextBook. PriorProgramWriting is another index, aimed at measuring the level of students' prior-to-college experience with writing programs. It was derived from answers to questions about participants' prior experience with programming languages such as C/C++/C#, HTML, Java, JavaScript, Basic/VisualBasic, Python, MATLAB, SQL, Scratch, and Others. Using four self-reported Likert-scale variables (PlayingComputerGames, DrawingOnTheComputers, BuildingArtworks, and WorkingWithVideos), the PriorVisualArt index was computed to measure the degree of participants' prior visual artistic experience.
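To illustrate how such indices can be recomputed from the raw columns, the sketch below derives two of them in Python with pandas. The grade-point mapping and the aggregation rules (mean for the academic index, count for the learning index) are assumptions for this example; the authoritative Excel formulas are in the repository [1].

```python
import pandas as pd

# Hypothetical mapping of Nigerian O-level letter grades to grade
# points; the repository's actual Excel formulas may differ.
GRADE_POINTS = {"A1": 8, "B2": 7, "B3": 6, "C4": 5, "C5": 4,
                "C6": 3, "D7": 2, "E8": 1, "F9": 0}

def compute_profile_indices(df: pd.DataFrame) -> pd.DataFrame:
    """Derive two of the CSPROQ composite indices from raw columns."""
    out = df.copy()
    # Grade points from the self-reported O-level grades.
    for subject in ("English", "Maths", "Physics"):
        out[subject + "GP"] = out[subject].map(GRADE_POINTS)
    # Prior academic performance (assumed: mean of the three grade
    # points; the repository may use a sum or weighted form instead).
    out["PriorAcademicBackground"] = out[
        ["EnglishGP", "MathsGP", "PhysicsGP"]].mean(axis=1)
    # Prior programming learning (assumed: count of the contexts,
    # coded 0/1, in which the participant learnt programming).
    learn_cols = ["LearntInPrimarySchool", "LearntInSecSchool",
                  "LearntAtITSchl", "LearntAtITPark",
                  "OnTheInternet", "FromTextBook"]
    out["PriorProgrammingLearning"] = out[learn_cols].sum(axis=1)
    return out
```

The remaining indices (PriorProgramWriting, PriorVisualArt) can be derived analogously from their respective source columns.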
Table 2 lists the data gathered from the participants before they took programming classes in one of the 2 modes. The variables in this pretest instrument required participants to respond to open-ended questions. The questions consist of 2 categories: conceptual programming knowledge and computational/algorithmic thinking. Variables CMU1 to CMU10 refer to the first category, while the remaining variables refer to the second category. CTOTAL20 represents the total score computed from the values of CMU1 to CMU10. CQ-TOTAL represents the total score of the computational/algorithmic thinking questions. The total score for the pretest was 50, as represented by PretestScore50.
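The totals can be recomputed from the item-level columns along these lines. The column spellings, the per-item maxima, and the name of the rescaled variable are assumptions of this sketch and may not match the repository file exactly:

```python
import pandas as pd

def score_pretest(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute the pretest totals from the item-level columns."""
    out = df.copy()
    cmu_cols = [f"CMU{i}" for i in range(1, 11)]
    # Conceptual-knowledge total (maximum assumed to be 20).
    out["CTOTAL20"] = out[cmu_cols].sum(axis=1)
    # Overall pretest score out of 50 (assumed: conceptual total plus
    # the computational/algorithmic thinking total, CQ-TOTAL).
    out["PretestScore50"] = out["CTOTAL20"] + out["CQ-TOTAL"]
    # Rescaled to a base of 100, as in Table 4 (hypothetical name).
    out["PretestOn100"] = out["PretestScore50"] / 50 * 100
    return out
```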
Table 3 presents the data collected from participants after exposing them to programming in the 2 classes. Apart from some reordering of questions, this posttest contains the same variables as the pretest (Table 2). Table 4 shows the variables obtained by rescaling the pretest, posttest, and gain scores (the difference between the posttest and pretest scores) to a base of 100.
Table 5 represents two CS1 cohorts from 2 polytechnics, showing descriptive summaries of the data collected from them. Both samples comprise participants who enrolled in and were instructed in the 2 programming learning modes. The minimum entry age to university in Nigeria is 16, which informed our use of participants from that age upwards in our data collection. However, this raises some ethical questions regarding consent, which we address in the Ethics Statements section of this article.
Table 6 displays the descriptive summaries of the matched samples generated by CEM from dataset1 (Table 5). The samples consist of cases chosen at random from each treatment group in Table 5 and assigned to the corresponding treatment groups in dataset2 (Table 6). Matching ensured equivalent samples in the 2 treatment groups of dataset2. The covariates used to match samples include pretest scores, gender, age, prior academic level, prior program writing, and prior visual artistic abilities of the students in the intact classes. Table 7 presents descriptive summaries of the demographic and achievement data from 2 new cohorts of CS1 students who participated in the following session.
Table 8 gives the summaries of the matched samples (dataset4) obtained by using CEM to match cases from the 2 treatment groups in Table 7 (dataset3). Samples were matched on pretest scores, gender, age, prior academic level, prior program writing, and prior visual artistic abilities of the students in the intact classes.

Research Design
We employed a quasi-experimental, non-equivalent pretest-posttest control group design. To address the weakness arising from the inability to assign participants randomly to treatment classes, the research design was strengthened by pretesting and by using the Coarsened Exact Matching (CEM) algorithm to generate matched treatment groups (Fig. 6). Another advantage of employing CEM is that it removed outliers from the unmatched data, generating equivalent samples for data analysis (see Figs. 1 and 2). Interested users can download CEM freely as an SPSS add-in from https://projects.iq.harvard.edu/cem-spss/pages/installation . Following the installation, CEM appears in the Analyze menu of the SPSS program.
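For readers who prefer to script the matching rather than use the SPSS add-in, the core CEM idea can be sketched in Python: coarsen continuous covariates into bins, exact-match on the resulting strata, and keep only strata containing cases from both treatment groups. This is an illustrative re-implementation under those assumptions, not the SPSS add-in's algorithm verbatim (it omits, for example, CEM's automatic binning and weighting):

```python
import pandas as pd

def cem_match(df: pd.DataFrame, treatment: str, coarsen: dict) -> pd.DataFrame:
    """Minimal coarsened-exact-matching sketch for two treatment levels.

    `coarsen` maps a covariate name to a bin count (None means match
    on the exact category). Rows survive only if their coarsened
    stratum contains cases from both treatment levels.
    """
    strata = pd.DataFrame(index=df.index)
    for cov, bins in coarsen.items():
        if bins is None:
            strata[cov] = df[cov].astype(str)      # categorical: exact match
        else:                                      # continuous: coarsen into bins
            strata[cov] = pd.cut(df[cov], bins=bins, labels=False).astype(str)
    key = strata.agg("|".join, axis=1)             # one stratum label per row
    tmp = df.assign(_stratum=key)
    # Keep only strata represented in both treatment groups.
    both = tmp.groupby("_stratum")[treatment].transform("nunique") == 2
    return tmp[both].drop(columns="_stratum")
```

For example, `cem_match(df, "group", {"Age": 3, "Gender": None})` coarsens age into 3 bins and matches gender exactly, discarding any stratum that lacks either treatment group.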

Setting
Data collection took place in four selected public polytechnics in 2 states of north-central Nigeria (Fig. 3). Niger State Polytechnic Zungeru (NSPZ) has its main campus in Zungeru, a rural town and former capital of the colonial northern protectorate of Nigeria. The NSPZ admits mainly Niger State indigenes, with most inhabitants working as farmers, artisans, traders, and civil servants. Federal Polytechnic Bida (FPB) is in Bida, the second largest town in Niger State. Being a federal institution, the FPB admits a large population of students from neighbouring southwestern and north-central states. Another institution, the Federal Polytechnic Nasarawa (FPN), is in Nasarawa State. FPN is in a rural town but, like FPB and given its proximity to Abuja, Nigeria's capital, it enrols a large student population from various parts of Nigeria. The fourth site is another state-owned institution, the Nasarawa State Polytechnic Lafia (NSPL), now renamed Isa Mustapha Agwai 1 Polytechnic, located in Lafia, the state capital.
The first experiment was conducted during the 2014/2015 session, with FPB and FPN representing the control and experimental sites, respectively. The second experiment was conducted during the 2015/2016 session with new cohorts of students in the NSPZ, FPN, and NSPL. NSPZ represented the experimental group, whereas the other sites were the control groups. However, the datasets presented in this article do not include data from the NSPL.

Sampling
The same sampling procedure was followed to collect data during the 2 experiments. A purposive sampling technique was employed to select institutions, which were then randomly assigned to treatment groups. Using CEM, we generated matched samples from dataset1 (n = 82, with 41 cases randomly assigned to each treatment group). This resulted in dataset2, shown in Table 9. We conducted an ANCOVA of dataset2 using SPSS version 23, which provided one input (i.e., the effect size) required for the power analysis. We obtained a partial eta-squared value of 0.094, indicating a moderate effect. This value agrees with the value obtained from a meta-analysis comparing the effects of block-based and textual programming languages on student achievements [4]. G*Power version 3.1.9.2 software was used to determine the sample size for dataset3. As Fig. 4 suggests, to detect an effect from the treatment at a power of 0.8, a significance level of 0.05, and a moderate effect size of f = 0.3113, we would require a sample of 83. With this input, using CEM, we generated from dataset3 a matched sample (n = 84), shown in Table 9.
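The G*Power computation can be approximated in Python with statsmodels. The sketch below solves for the total two-group ANOVA sample size at the stated power, significance level, and effect size; statsmodels and G*Power may differ slightly in rounding:

```python
from statsmodels.stats.power import FTestAnovaPower

# Cohen's f reported in the article; in general it can be derived
# from partial eta-squared as f = sqrt(eta2 / (1 - eta2)).
f = 0.3113

# Total sample size across both groups for a one-way ANOVA with
# 2 groups, power 0.8, and alpha 0.05.
n_total = FTestAnovaPower().solve_power(effect_size=f, alpha=0.05,
                                        power=0.8, k_groups=2)
print(round(n_total))
```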

Instruments
We employed the Scratch 2.0 environment (Fig. 5) in the experimental class. Developed by the Lifelong Kindergarten Group at the MIT Media Lab, USA, Scratch (current version 3.0) is freely available at https://scratch.mit.edu/download .
Adapting from prior research [3], we developed 2 instruments: the CS1 Student Profile Questionnaire (CSPROQ) and the Introductory Programming Achievement Test (IPAT). The participants provided demographic data with the CSPROQ and achievement data using the IPAT. The IPAT was used as a pretest and then, with its questions reordered, as a posttest. An author of [3] and two researchers validated both the CSPROQ and the IPAT.

Data Collection Procedure
These experimental data were acquired from a research project that spanned 2 academic sessions: 2014/2015 and 2015/2016. As shown in Fig. 6, each experiment started by administering the CS1 student profile questionnaire to the participants. Before programming instruction began, participants in both groups took the introductory programming achievement test (IPAT1) as a pretest. The first author taught both classes in two-hour weekly sessions for six weeks. Fig. 6 highlights the activities and features of both instruction modes. Then, subjects in both groups took the posttest, IPAT2, which contained the same questions as IPAT1 but with some reordering.

Data Validation
To use these data to answer specific research questions or test hypotheses, they need to satisfy the assumptions of the required statistical tests. We provide additional documentation in the repository [1] detailing the specific tests that were conducted to validate the data.
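As an illustration of such validation, the sketch below runs two checks commonly required before an ANOVA/ANCOVA: Shapiro-Wilk normality per group and Levene's test for homogeneity of variances. The repository documentation remains the authoritative record of the tests actually performed on these data:

```python
import numpy as np
from scipy import stats

def check_anova_assumptions(scores_a, scores_b, alpha=0.05):
    """Return True/False for each preliminary ANOVA assumption check.

    A value of True means the test did NOT reject the assumption at
    the given significance level (illustrative checks only).
    """
    return {
        "group_a_normal": stats.shapiro(scores_a).pvalue > alpha,
        "group_b_normal": stats.shapiro(scores_b).pvalue > alpha,
        "equal_variances": stats.levene(scores_a, scores_b).pvalue > alpha,
    }
```

Applied to the pretest or posttest score columns of the two treatment groups, these checks indicate whether a parametric comparison is defensible or a non-parametric alternative should be preferred.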

Ethics Statements
Before the research project commenced, the Institute for Science and Technology Education Sub-Research Ethics Review Committee and the College of Science, Engineering, and Technology Research and Ethics Committee of the University of South Africa scrutinised the project and granted ethical approval (No. 2015_CGS/ISTE_016) for the collection of these data. We then requested and obtained approvals from the managements of the participating polytechnics. Lastly, the research participants, after being duly informed about the nature of the project, signed the informed consent forms. A few participants between the ages of 16 and 17 took part in the study, raising some ethical questions since we did not obtain their parents' approval. However, the nature of the research and its context provide some answers. The research is low risk, involving intact first-year computer science classes, with one group learning to program in the conventional way and the other group learning through a constructionist, inquiry-based pedagogy during a six-week period.
In Nigeria, as in Sweden [5], minors between the ages of 16 and 17 can participate in research without parental approval, as long as they have the capacity to give their informed consent. Nevertheless, as stated earlier, data collection took place only after we had obtained the approvals of the participating institutions. We have provided copies of the ethical clearance certificates, the participating institutions' approvals, and the informed consent form in the supplementary files.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Conventional versus Constructionist-Scratch programming instructions and students' achievements in higher education CS1 classes (Original data) (Mendeley Data).

Fig. 1. Box plot indicating outliers in the dataset before matching.

Fig. 3. The four data collection sites: selected polytechnics in central Nigeria.

Fig. 4. Power analysis to determine the main study sample size.

Table 4. Conversion of scores to 100.

Table 6. Demography and achievements of matched samples (dataset2).

Table 8. Demography and achievements of matched samples (dataset4).