Modeling students’ algorithmic thinking growth trajectories in different programming environments: an experimental test of the Matthew and compensatory hypothesis

In recent years, programming education has gained recognition at various educational levels due to its increasing importance. As the need for problem-solving skills becomes more vital, researchers have emphasized the significance of developing algorithmic thinking (AT) skills to help students in program development and error debugging. Despite the development of various text-based and block-based programming tools aimed at improving students’ AT, emerging evidence in the literature indicates insufficient AT skills among students. This study was conducted to understand the growth trajectory of students’ AT skills in different programming environments. The study utilized a multigroup experiment involving 240 programming students randomly assigned to three groups: a text-and-block-based group, a block-based-only group, and a text-based-only group. Students in the text-and-block-based group were exposed to Alice and Python; those in the block-based-only group were exposed to Alice; and those in the text-based-only group were exposed to Python. We found that participants’ growth trajectory in AT skills is linear, with a significant growth rate. Although between-person variability exists across groups, we observed a compensatory effect in the text-and-block-based and block-based-only groups. Additionally, we found significant differences in AT skills across the groups, with no evidence of a gender effect. Our findings suggest that combining text-based and block-based programming environments can lead to improved and sustained intra-individual problem-solving skills, particularly in the field of programming.


Introduction
In recent years, programming education has gained recognition at various educational levels due to its increasing importance (Altun & Mazman, 2015; Yusuf & Noor, 2023a, 2023b). However, to understand the fundamental concepts of programming, students need a thorough understanding of algorithms. As the need for problem-solving skills becomes more vital, researchers have emphasized the significance of developing algorithmic thinking (AT) skills (Bacelo & Gomez-Chacon, 2023; Erümit, 2020; Kanaki & Kalogiannakis, 2022; Wang & Hwang, 2017). Such skills are expected to help students in program development and error debugging (Lehmann, 2023a).
Several countries have consciously incorporated AT into their school curricula (Durak, 2020; Sari et al., 2022; Stephens, 2018). This aligns with the recognition that computational thinking, which encompasses AT and other essential skills, is a critical competency that students must acquire to meet the demands of the STEM industry now and in the future (Agbo et al., 2023; Angeli, 2022). The ability to think algorithmically helps in creating logical models that solve computational problems (Agbo et al., 2023; Moon et al., 2020).
The field of AT and programming education is currently dominated by research emphasizing the importance of instructional tools in developing useful algorithms to solve computational problems (Lehmann, 2023a, 2023b). Consequently, empirical evidence reporting students' AT skills during programming instruction is currently emerging (Angeli, 2022; Lehmann, 2023a; Moala, 2021; Tupouniua, 2020). While various programming tools have been developed to improve students' AT, available evidence suggests insufficient AT skills among students (Angeli & Valanides, 2020; Sari et al., 2022). However, this evidence emanates from studies focusing on the impact of programming tools on AT development (see e.g., Angeli, 2022; Sari et al., 2022; Tupouniua, 2020). Although these studies have provided insight into the practical importance of these tools, it remains unclear how students' AT develops in relation to various programming tools.
We argue that if researchers and programming educators are to better understand the actual impact of programming tools, they need prior information on students' growth trajectories in computational processes, measured at a substantial number of time points. In this context, we also stress the importance of addressing questions such as: do students' growth trajectories in computational processes follow linear or quadratic patterns; do novice programmers show more progressive growth than expert programmers; and do students exhibit differences in their computational processes in relation to programming tools? A study is therefore needed to track students' AT growth trajectories and uncover patterns and variances in AT development. This is the departure point of this study. We addressed the following research questions:
RQ1 What are students' AT growth trajectories when learning programming across three programming environments?
RQ2 Do low-achieving learners show more progressive growth in AT skills than high-achieving learners when learning programming across three programming environments?
RQ3 Do students exhibit differences in their AT growth trajectories when learning programming across three programming environments?
RQ4 Are there significant differences in AT skills dimensions between students who learn programming in different programming environments?
RQ5 Are there gender differences in AT skills dimensions across different intervention groups?
Considering these varied interpretations, a consolidated definition of AT can be proposed. We refer to AT as a cognitive process that involves the ability to decompose complex problems into manageable components, recognize patterns, generalize solutions, and design systematic steps or algorithms to solve problems effectively. It integrates elements of logical reasoning, mathematical thinking, and computational skills, and provides a comprehensive framework for understanding and applying AT. This consolidated definition captures the essence of AT as described by various scholars, emphasizing its systematic, logical, and organized nature. Our definition acknowledges the procedural aspect highlighted by Futschek (2006) and Ziatdinov and Musa (2012), the logical and structured approach described by Lockwood et al. (2016) and Bacelo and Gomez-Chacon (2023), and the broader cognitive skills emphasized by Stephens (2018) and Blannin and Symons (2019).

Algorithmic thinking framework
Research has shown that the process of developing an algorithm is similar to a problem-solving process (Erümit, 2020; Ritter & Standl, 2023). In this light, several frameworks have been proposed to understand the process of AT. For example, Sari et al. (2022) developed a framework to measure student AT skills. Their three performance indicators include understanding the problem, determining a solution, and creating the algorithm. Futschek and Moschitz (2010) proposed an iterative framework comprising five AT processes: problem analysis, idea formulation, algorithm design, algorithm execution, and algorithm reflection. From the lens of computational thinking, Ritter and Standl (2023) proposed a five-stage AT process, including description, abstraction, decomposition, algorithm design, and testing.
Owing to the lack of a universal framework, Lehmann (2023a, 2023b) proposed a unified framework that describes the algorithmic thinking process in terms of four major cognitive skills: decomposition, abstraction, algorithmization, and debugging (we refer to this as the DAAD framework). Although Lehmann's unified framework has not yet received wide empirical validation, we believe that it is rooted in a valid computational thinking framework (Shute et al., 2017), suggesting potential universal acceptability. We discuss the components of the DAAD framework below to provide more insight into AT processes.

Decomposition
Across the literature, decomposition is perceived differently depending on the discipline. In mathematics, it involves breaking down complex problems into simpler ones that meet initial conditions. The solutions to these simpler problems are then combined to solve the original problem (Stephens & Kadijevich, 2020). In computer science, it is a problem-solving technique where a complex problem is broken down into smaller, more manageable parts (Futschek & Moschitz, 2010). This approach allows programmers to tackle each part of a problem individually, making it easier to understand and solve. By breaking down a problem into smaller parts, programmers can identify potential errors and bugs more easily (Kwon & Cheon, 2019). This view is similar to the perspective of computational thinking, which Shute et al. (2017) view as a process of dissecting complex problems into manageable parts. A compelling example of decomposition is shown by Lehmann (2023a) in their Waffle breakfast project, where students were required to decompose the process of making a Waffle dish.

Abstraction
The second step in the AT process according to Lehmann is abstraction. Within the computational thinking literature, abstraction involves looking for patterns within a decomposed problem and filtering out elements that are not needed to solve the problem (Shute et al., 2017; Wing, 2008). Lehmann highlights that this stage involves the construction of a visual representation in the form of models that illustrate how the problem works by using the important, retained elements. A compelling example of abstraction was shown by Lee et al. (2011), where a group of middle school students were required to use a virtual 3D model to represent the spread of a disease by identifying important information related to school layout, number of students, and disease virulence. Similarly, Nurhasanah et al. (2013) examined seventh-grade students' abstraction process in learning geometry using Geometer Sketchpad (GSP) within the framework of van Hiele's teaching model. Their study revealed that the abstraction process is categorized into conceptual-embodied and proceptual-symbolic, with the latter gaining more significant support from the GSP.

Algorithmization
The third step in the AT process is the development of the final algorithm to solve the problem. This stage largely depends on the previous steps. In developing the algorithm, Lehmann (2023a) recommends the inclusion of specific actions that would transform input into a desired output. In this stage, various algorithmic concepts can be considered, including sequencing, branching and iteration, and functions. An example of algorithmization was proposed by Angeli (2022), where pre-service teachers were requested to write down pseudocode for a suitable algorithm based on a textual description of the behaviors of a robot. Similarly, Sari et al. (2022) exposed college students to STEM activities using the AT process. The participants were able to develop algorithms that enabled them to design traffic light models, develop a radar system, design a fuel gauge, and develop many other prototypes.

Debugging
Debugging involves identifying and resolving errors in an algorithm to ensure that it accurately solves the problem. In addition, it involves exploring alternative approaches and actions to optimize the algorithm's efficiency. The debugging process is similar to processes suggested in mathematics education (Lehmann, 2023a). For example, Maurer (1992) used the terms 'algorithm verification' and 'algorithm analysis' to describe, respectively, the process of confirming that an algorithm solves a problem and the process of evaluating the efficiency of an algorithm. Similarly, Moala et al. (2019) tasked a team of three pre-degree students with creating an algorithm to solve a friendship network optimization problem and testing their algorithm on other friendship networks for validation. The researchers found that the students kept certain elements of their algorithms that worked well for specific networks, but made changes (such as adding or removing instructions) when the algorithm did not work as intended.
Despite their efforts, the students were unable to create a general algorithm that worked for all networks. Venigalla and Chimalakonda (2020) employed a treasure hunt game to expose novice programmers to debugging, in which participants progress through levels by debugging code snippets. Their survey results indicated that the learning platform was of high quality, which underscores its suitability for learning debugging skills. While there seems to be some agreement in the literature on the concept of debugging, Lehmann (2023a) noted that there is a paucity of research evidence exploring how students utilize debugging skills. Table 1 provides a summary of the algorithmic thinking process as proposed by Lehmann (2023a, 2023b).

Issues in teaching AT skills
As highlighted in the preceding section, learning AT is crucial to solving programming and general problems. However, an important but less discussed issue in the literature is which AT concepts should be taught in schools, and which instructional environment should be used, without significantly changing the existing curricula of computing education. Studies have proposed important AT concepts to be taught. For example, Futschek and Moschitz (2011) proposed that the basic AT concepts should include sequencing, iteration, and abstraction. Cooper et al. (2000) proposed that general AT concepts should include decomposition, repetition, data organization, generalization, design, and refinement. These concepts are similar to the DAAD framework proposed by Lehmann (2023a, 2023b). However, to prevent ambiguity, we argue that the DAAD model should be used as a broad framework that guides the development of AT skills, while specific activities can be integrated into each broad category. Figure 1 presents our proposed AT framework.
The outermost layer includes the broad categories of the DAAD framework. Each broad category forms the AT skills to be taught. However, they cannot be taught independently. Therefore, specific activities or sub-concepts need to be integrated for effective problem-solving processes. Although authors have emphasized that algorithmic contexts should be detached from programming and be connected to students' everyday lives (Li et al., 2020; Nijenhuis-Voogt et al., 2021), we strongly emphasize that teaching AT skills requires a careful selection of concepts from a particular discipline. In the context of programming, we propose Brennan's and Resnick's (2012) computational thinking concepts (sequencing, conditionals, looping, functions) as alternative sub-concepts to be taught in a typical AT classroom.
Table 1 The DAAD framework of the AT process proposed by Lehmann (2023a, 2023b)
Cognitive skills | Description/Supporting literature
Decomposition | Involves breaking down complex problems into simpler ones that meet initial conditions (Stephens & Kadijevich, 2020; Futschek & Moschitz, 2010)
Abstraction | Involves looking for patterns within a decomposed problem and filtering out elements that are not needed to solve a problem; construction of a visual representation in the form of models that illustrate how the problem works by using the important, retained elements (Lee et al., 2011; Nurhasanah, 2013; Wetzel et al., 2020)
Algorithmization | Involves the development of the final strategy to solve the problem; inclusion of specific actions that would transform input into a desired output; consideration of various algorithmic concepts such as sequencing, branching and iteration, and functions (Angeli, 2022; Sari et al., 2022)
Debugging | Involves identifying and resolving errors in an algorithm to ensure that it accurately solves the problem (Maurer, 1992; Moala et al., 2019)
Besides the proposed AT concepts to be taught, there are different views on the type of instructional environment to be employed. In a programming context, studies have shown that block-based programming environments (BBPEs) have the potential to enhance AT skills because they contain visual models that are appealing to human senses (e.g., Cooper et al., 2000; Yusuf & Noor, 2023b). According to these studies, students can understand the basic blocks of solving programming problems when exposed to these tools, and, therefore, develop better AT skills. For example, in earlier research, Cooper et al. (2000) used Alice to support the development of AT skills among college students. Their results indicate the flexibility of using Alice to promote students' AT skills. Similarly, findings by Durak (2020) indicate that both Alice and Scratch significantly improve students' AT skills. Furthermore, Angeli (2022) found that LEGO WeDo significantly promotes the participants' AT and debugging skills.
Despite the widely reported positive results of the block-based programming modality (BPM), existing studies suggest that such positive effects are inconsistent across experimental conditions. One recent experimental study indicates that a BBPE such as Scratch did not cause any significant effect on students' AT skills (Jiang & Li, 2021). A plausible explanation from the literature is that BBPEs rely heavily on drag-and-drop (as opposed to text-based input), which is not enough for beginners to develop AT skills (Bai et al., 2021; Deng et al., 2020). In their study, Weintrop and Wilensky (2017) argued that preventing students from writing code and debugging errors through the use of BBPEs could distance them from mastering several computational thinking (CT) competencies. To effectively develop CT competency skills, several authors proposed the sole use of text-based programs (such as Python, e.g., Bai et al., 2021). In contrast, other authors proposed the sole use of BBPEs (such as Alice, e.g., Angeli, 2022; Yusuf et al., 2023), while several others proposed a combination of text- and block-based programming tools (Deng et al., 2020; Jiang & Li, 2021; Kroustalli & Xinogalos, 2021; Pellas, 2023; Saritepeci & Durak, 2017). Given this debate, it remains unclear which programming environment is best for developing AT skills. In this study, we seek to identify the most suitable environment using a multigroup experimental procedure.

Empirical review of related studies
Historically, algorithmic thinking (AT) was primarily associated with mathematics until recent applications in historical studies, text analysis, science activities, and mind games (Sari et al., 2022). With advancements in computing, studies have explored AT skills development in various programming environments. For example, Wong et al. (2024) examined the development of children's AT skills using Scratch. Their findings suggest significantly increased engagement (behavioral, cognitive, and affective) during the learning process. Similarly, Bacelo and Gomez-Chacon (2023) characterized AT in a university mathematics context using unplugged tasks. Their model connected mathematical and algorithmic spaces across three dimensions: semiotic, instrumental, and discursive, showing that these interactions predict better programming performance. Angeli (2022) studied AT and debugging skills in an educational technology course. Using scaffolded programming scripts, pre-service teachers engaged in robotic programming with LEGO WeDo, resulting in significant learning gains.
Furthermore, Sari et al. (2022) investigated the impact of STEM-focused physical computing activities with Arduino on AT skills and STEM awareness among teacher candidates. Their mixed-methods study, involving 24 students from various universities in Turkey, showed significant enhancements in both areas. Hsu and Wang (2018) explored using game mechanics and a student-generated questions strategy to promote AT skills in an online puzzle-based learning system, TGTS (Turtle Graphics Tutorial System). Their quasi-experiment with fourth-grade students demonstrated that game mechanics significantly improved AT skills and puzzle-solving performance. The combined approach of game mechanics and student-generated questions was especially effective in enhancing AT skills.
The above empirical review suggests that studies have been conducted to examine students' AT using different programming environments. While the findings in these studies recommend effective programming tools for promoting AT, they failed to provide information on sustained intra- and inter-individual growth trajectories. Mok et al. (2015) recommend the assessment of how students' learning outcomes progress over time. When examining growth trajectory, one interesting question is whether the growth rate is associated with students' initial skill level (Mok et al., 2015). For example, do higher achieving students during the initial experimental procedure show faster AT growth than the lower achieving ones, or do lower achievers show more progressive growth than the higher achievers? If such an association exists, then one of two opposing effects is likely to occur.
The first is the Matthew effect, which borrows the notion that "the rich get richer and the poor get poorer" (Shin et al., 2013). In the context of this study, the Matthew effect means that students with higher AT skills at the initial stage of an experiment are more likely to develop higher AT skills compared to those whose initially developed AT skills are low. In statistical terms, the intercept (initial AT skills) and the slope (rate of change in AT skills) are positively correlated. Thus, the Matthew effect widens the gap between higher and lower achievers. The second is less common and known as the compensatory effect. This effect refers to the possibility of low-achieving students developing more growth in AT skills than those whose initial AT skills are high. In statistical terms, this means that the intercept and slope are negatively correlated (Mok et al., 2015), thus reducing the achievement gap between the higher and lower achievers (Shin et al., 2013).
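The sign convention described here can be checked on any set of repeated AT scores. The following is a minimal sketch rather than the analysis used in this study: it assumes a score matrix with one row per student and one column per equally spaced time point, fits an ordinary least-squares line to each student separately (a simplification of a growth model), and correlates the resulting intercepts with the slopes. The function names and the illustration data are hypothetical.

```python
import numpy as np


def intercept_slope_correlation(scores):
    """Correlate per-student intercepts with per-student slopes.

    scores: array-like of shape (n_students, n_timepoints); time points are
    assumed to be equally spaced and coded 0, 1, 2, ...
    """
    scores = np.asarray(scores, dtype=float)
    time = np.arange(scores.shape[1])
    # np.polyfit fits one line per column of a 2-D y, so students become columns.
    slopes, intercepts = np.polyfit(time, scores.T, deg=1)
    return float(np.corrcoef(intercepts, slopes)[0, 1])


def label_effect(r):
    """Map the sign of the intercept-slope correlation to the two effects."""
    if r > 0:
        return "Matthew"        # high starters grow faster, gap widens
    if r < 0:
        return "compensatory"   # low starters grow faster, gap narrows
    return "neither"


# Hypothetical illustration data (not from the study).
closing_gap = [[2, 4, 6, 8], [5, 6, 7, 8], [8, 8.5, 9, 9.5]]
widening_gap = [[2, 2.5, 3, 3.5], [5, 6, 7, 8], [8, 10, 12, 14]]
```

Calling `label_effect(intercept_slope_correlation(closing_gap))` classifies a data set in which low starters catch up, while `widening_gap` represents the opposite pattern. A latent growth model would estimate the same correlation while accounting for measurement error, which this per-student fit does not.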
As previously highlighted, no research has tracked students' growth trajectory in AT skills, particularly in the context of programming. This is the departure point of this study. Thus, our research was based on a multi-group experiment that tracked students' AT skills growth across several weeks. We propose that if researchers are interested in understanding students' AT skills development under any condition, information regarding their growth over time is required, with a considerable number of time points. This information would help researchers and educators better understand crucial stages of AT skills development and how to promote sustained intra-individual AT development.

Design
The study employed a true experimental procedure with a pre-test, post-test, multi-group design. This design entails a random assignment of participants as opposed to the use of intact classes in a quasi-experimental procedure. This design has a strong philosophical underpinning and has been widely adopted in studies that are interested in objective reality in the context of cause-and-effect relationships. According to Creswell (2014), the purpose of using a true experimental procedure is to establish a causal relationship between the independent and the outcome variables, while allowing the researcher to control for possible variables that are likely to influence the relationship. This design was used in preference to the quasi-experimental alternative due to its ability to permit random assignment of participants.
Although random assignment is more demanding in terms of time and resources, Creswell (2014) contends that it prevents possible selection bias that may arise from personal characteristics of the participants. In this research, participants were distributed equally across three intervention groups, making it possible for potential errors to be controlled. Several authors have discouraged the use of true experiments in educational settings where participants' formal classroom routines could be affected. However, the implementation of a true experiment in this study did not affect the participants' lecture periods as a suitable time was allocated for the interventions. In addition, the interventions were conducted as supplements to their regular programming instructions.

Participant recruitment
The participants were year-two computer science and computer education students from a research university in Nigeria (Mean age = 19.5, Male = 157, Female = 83). The gender imbalance among the recruited participants reflects a wider disparity in which female students are reported to lag far behind male students in enrollment in computing courses. A recent report indicates that enrollment in computing courses in Nigerian universities is generally skewed in favor of male students (75.87%), thereby creating a gender disparity (Statista, 2021).
Across the literature, there is no numerical standard for sample size adequacy in true experiments. However, several authors suggest a minimum of 15 subjects for experimental and control groups (Cohen et al., 2007; Gall et al., 1996). Other authors suggest a higher sample size, which depends on various factors such as the number of variable levels and whether the participants are treated as between or within groups (Brysbaert, 2019). Due to the lack of a numerical standard, the study recruited 240 students using stratified random sampling. Stratification was done to ensure that the two departments had a representative sample, while randomization was done to ensure that each student stood an equal chance of being selected (Creswell, 2014). Within the spectrum of the random sampling technique, a random generator approach was used. This approach entails inputting students' registration numbers into a column in an Excel worksheet and creating a randomly generated set of digits in another column. We sorted the randomly generated digits in ascending order, which then reshuffled the registration number column. The first 240 registration numbers were selected, indicating the eventual recruitment of the corresponding participants.
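The worksheet procedure described above (attach a random number to each registration number, sort by the random number, keep the first rows) can be sketched in a few lines. This is an illustrative equivalent and not the spreadsheet used in the study; the function name, the roster of 600 identifiers, and the seed are hypothetical, and stratification by department is omitted (the same function could be applied to each department's roster separately).

```python
import random


def select_by_random_key(registration_numbers, n, seed=None):
    """Pick n entries by sorting on a freshly drawn random key per entry."""
    rng = random.Random(seed)
    # Equivalent of the second worksheet column: one random number per row.
    keyed = [(rng.random(), reg) for reg in registration_numbers]
    # Sorting by the random key reshuffles the registration numbers.
    keyed.sort(key=lambda pair: pair[0])
    # Keep the first n rows after the sort.
    return [reg for _key, reg in keyed[:n]]


# Hypothetical roster; the real sampling frame is not reported here.
roster = [f"REG{i:04d}" for i in range(1, 601)]
sample = select_by_random_key(roster, n=240, seed=1)
```

Because every entry receives an independent random key, each student has the same chance of landing in the first `n` rows, which is the property the sampling procedure relies on.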
The research benefit of this sampling technique is that selection bias is controlled since individual characteristics are distributed by chance, compared to other sampling techniques where such control is limited. The participants had no prior knowledge of programming except for an introductory course they received in their first year of study. Participants received an information sheet prior to the study outlining its purpose, with the option to opt out at any time and have their data deleted. After reading and signing a consent agreement, participants were assured of privacy with their data stored on an encrypted hard drive. The study was approved by the institutional review board.

Research setting
The study was conducted in a formal classroom environment where the participants were randomly assigned to three groups and were exposed to eight weeks of programming instruction as a supplement to their formal programming lectures. Information regarding participant distribution across the groups is presented in Table 2. The first group (Experimental Group 1) was exposed to a combination of Alice, a block-based programming environment, and Python, a text-based programming language (i.e., text-and-block-based group). The second group (Experimental Group 2) utilized only Alice as their programming environment (i.e., block-based-only group), while the third group (Control Group) utilized only Python as their regular programming language (i.e., text-based-only group). The text-based-only environment was chosen for the control group because it is the most widely used learning environment in many Nigerian universities.
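Random assignment to the three groups can be illustrated with a short sketch. The actual allocation mechanism is not described in the text, so the shuffle-and-deal approach below is an assumption, as are the function name and the seed; the group labels follow the descriptions above.

```python
import random

GROUPS = ("text-and-block-based", "block-based-only", "text-based-only")


def assign_to_groups(participants, groups=GROUPS, seed=None):
    """Shuffle participants, then deal them round-robin into the groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        # Dealing in turn keeps group sizes equal (or within one of each other).
        assignment[groups[index % len(groups)]].append(participant)
    return assignment


# Hypothetical participant identifiers 0..239.
assignment = assign_to_groups(range(240), seed=0)
```

With 240 participants and three groups, dealing in turn yields groups of equal size, consistent with the equal distribution described in the design.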
Although the National University Commission (NUC) has granted full autonomy to Nigerian universities concerning the programming language they choose to teach (Eteng et al., 2022), many universities have yet to integrate BBPEs into their programming curriculum (Yusuf et al., 2023a). While we acknowledge the importance of the text-based-only option in teaching algorithms and programming problems, recent research argues that a text-based programming language such as Python requires good memorization skills to recall syntax and procedures (Yusuf & Noor, 2023b), which limits students' opportunity to acquire foundations in AT.
Hence, we introduced Alice as an innovative programming tool in this research. Across the three intervention groups, the participants were exposed to the four algorithmic thinking problems (Lehmann, 2023a), along with four core computational thinking concepts (Roman-Gonzalez, 2015) that were used to engage the participants in brainstorming on programming problems. For every intervention across the groups, the participants first received a 40-min programming instruction delivered via an electronic board (e.g., see Fig. 2) and then proceeded to a 1-h practical session via computer workstations (e.g., see Fig. 3). During the practical sessions, each participant was assigned to one computer workstation with Internet connectivity and was encouraged to solve their assigned programming tasks alone. This was done to establish independent observation and prevent possible effects of pair or collaborative programming (Braught et al., 2008; Wei et al., 2021; Wu et al., 2019).
Fig. 2 Participants in experimental group 1 receiving programming instruction on Alice

Instrument
While there are several computational thinking tests, including those that incorporate algorithmic thinking measures (e.g., Korkmaz et al., 2017), to the best of our knowledge, there is no specific competency test that measures students' AT skills in the literature. Therefore, we developed an algorithmic thinking test (ATt) based on the DAAD framework (Lehmann, 2023a, 2023b). The ATt is a competency test that comprises 28 items (see sample in Fig. 4). Each dimension of the DAAD framework has 7 items that cut across the four algorithmic thinking concepts (sequences, conditionals, looping, and functions). The ATt was standardized on 417 computer science students from three universities in Nigeria (excluding the university where the sample was drawn). Initially, 32 items were tested for item difficulty using the Rasch model (Rasch, 1960).
The model was fitted to the data using conditional maximum likelihood (CML), and the item characteristic curve (ICC) was used to identify difficulties. From the ICC plots, it was observed that items 5, 16, and 28 were extremely difficult while item 18 was extremely easy (Fig. 5). To validate these findings, a person-item map was created (Fig. 6) to compare the distribution of item measures and person measures along the scale. It is important to note that items are positioned along the measurement scale to assess the ability of all persons effectively.
[Fig. 5 axes: Latent Dimension, Probability to Solve]
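For readers unfamiliar with the ICC, the dichotomous Rasch model gives the probability of solving an item as exp(θ − b) / (1 + exp(θ − b)), where θ is the person's ability and b is the item's difficulty on the same latent scale. The sketch below only evaluates this curve; it does not perform the CML estimation, and the difficulty values used are hypothetical rather than the estimates obtained for the ATt.

```python
import math


def rasch_probability(theta, b):
    """Probability that a person of ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_characteristic_curve(b, thetas):
    """Evaluate the ICC of one item over a grid of ability values."""
    return [rasch_probability(theta, b) for theta in thetas]


# Hypothetical difficulties: one very hard item, one very easy item.
ability_grid = [-3, -2, -1, 0, 1, 2, 3]
hard_item_curve = item_characteristic_curve(b=4.0, thetas=ability_grid)
easy_item_curve = item_characteristic_curve(b=-4.0, thetas=ability_grid)
```

An extremely difficult item has a curve that stays near zero across the observed ability range, and an extremely easy item has a curve that stays near one, which is why such items contribute little to distinguishing persons.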
After analyzing the person-item map, it was found that items 5, 16, 28, and 18 are positioned at the opposite ends of the map, indicating that they are out of range. Based on the map, it appears that nearly all individuals are able to answer item 18, while only a select few can correctly answer items 5, 16, and 28. Several authors (e.g., Boone, 2017) recommend removing or replacing items that are located at the extremes. To prevent any instrument bias, we decided to eliminate these items and retain the remaining 28 items. A second pilot test was conducted on 79 students to measure the instrument's reliability. A split-half method was used based on even and odd items. The two composite variables were then correlated, yielding a correlation coefficient of 0.756. This supports the suitability of the ATt.
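The odd-even split-half computation can be sketched as follows. This is an illustration under stated assumptions, not the study's script: responses are assumed to be scored 0/1 in a persons-by-items matrix, and the function names are hypothetical. The Spearman-Brown step is shown only as an optional, commonly used adjustment to full test length; the text reports the half-test correlation and does not state that this adjustment was applied.

```python
import numpy as np


def split_half_correlation(responses):
    """Correlate odd-item and even-item composite scores.

    responses: array-like of shape (n_persons, n_items), scored 0/1.
    """
    responses = np.asarray(responses, dtype=float)
    odd_total = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    even_total = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    return float(np.corrcoef(odd_total, even_total)[0, 1])


def spearman_brown(r_half):
    """Optional step: project a half-test correlation to full test length."""
    return 2.0 * r_half / (1.0 + r_half)


# Hypothetical toy responses (3 persons, 4 items), not study data.
toy_responses = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]
```

Splitting on odd and even item positions, rather than first half versus second half, keeps both composites balanced across the test when items are ordered by dimension or difficulty.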

Programming activities
Across the three intervention groups, four programming activities, related to the four AT dimensions (Lehmann, 2023a), namely, decomposition, abstraction, algorithmization, and debugging, were used for developing students' AT skills. For each programming activity, the participants were presented with a visual illustration of a problem and asked to write down pseudocode representing a suitable algorithm. Then, they were instructed to code the algorithm in their relevant programming languages. Although the activities were similar across the groups, the modality of task presentation differed. For example, in experimental group 1, the participants were presented with visual information that illustrates the behavior of a character in Alice and then instructed to write down the corresponding pseudocode. Then they were instructed to implement the code in Alice to mimic the behavior of the character. Thereafter, they were requested to develop a simple Python code that prints each behavior of the character.
Similar procedures were implemented in experimental group 2, but the participants were only required to write down the pseudocode and implement Alice code that mimics the character's behavior. In contrast, a textual description of a programming problem was presented to participants in the control group. Following this information, the participants were required to write down relevant pseudocode and then implement it in the Python programming language. It should be noted that each of the AT stages presents a unique activity that encompasses the algorithmic thinking concepts. A brief description of the tasks in each AT stage is presented in Table 3, while some examples of the activities are illustrated in Figs. 7, 8, and 9.
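To make the Python step of these activities concrete, the sketch below shows what a short program that prints each behavior of a character might look like. The movement list, the function name `describe_hand_movement`, and the wording of each step are hypothetical; the actual tasks are those summarized in Table 3 and the accompanying figures.

```python
# Hypothetical decomposition of a Biped's hand movement into ordered steps.
HAND_MOVEMENT = [
    "raise right arm",
    "wave right hand",
    "lower right arm",
]

def describe_hand_movement(steps, repeats=1):
    """Return one line per step, repeating the whole sequence `repeats` times."""
    lines = []
    for cycle in range(1, repeats + 1):
        for number, step in enumerate(steps, start=1):
            lines.append(f"Cycle {cycle}, step {number}: Biped will {step}")
    return lines

# Print the sequence twice, mirroring a repeated movement.
for line in describe_hand_movement(HAND_MOVEMENT, repeats=2):
    print(line)
```

The list corresponds to decomposition, the loop over cycles to the pattern identified during abstraction, and the function as a whole to the algorithm written first as pseudocode.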

Intervention fidelity
Research evidence has shown that poor intervention fidelity leads to Type III error (i.e., getting the right answer for a wrong reason), which, in turn, provides misleading information (An et al., 2020). To prevent this error and support a reliable and valid examination of the intervention effects, we followed the intervention fidelity guidelines highlighted by Creswell (2014). First, to prevent possible communication among the intervention groups, we conducted each experiment across the groups at an interval of 3 days. Because the participants reported having prior knowledge of problem-solving techniques in their first year, we presented them with a plethora of questions to link their existing knowledge with the subject matter. We ensured that each intervention group received the same instructional content, implemented over similar durations and appropriate for participants' current preparedness level. To ensure that all participants focused on the learning objectives, we engaged them in respectful and interesting tasks. Finally, we assisted participants in developing independent learning skills and provided them with self-assessment rubrics. In this step, block or textual codes of a solved problem containing several syntax, logical, and procedural errors were presented, and the participants were instructed to run the codes and identify and fix the problems. This can be done by either editing the source codes or using other coding alternatives.

Fig. 7 A sample of student tasks in experimental group 1 requiring them to watch and dissect the sequential hand movement of a Biped (decomposition), identify patterns in the movement (abstraction), write a pseudocode that illustrates the hand movement (algorithmization), use the pseudocodes to implement a code in Alice to mimic the exact movement of the Biped, implement simple codes in Python that illustrate the hand movement, and identify and fix errors (debugging)

Procedure
Approval was first sought from the university's review board. Upon approval, we embarked on recruiting the participants as discussed previously. The participants also gave their consent to take part in the study and were randomly assigned to one of the intervention groups in equal proportion. Prior to the intervention, all participants received a pretest consisting of the 28-item ATt. The purpose of the pretest was to measure the prior AT skills of the participants. The use of a pretest in pretest-posttest experimental procedures has attracted much debate, with several authors suggesting the use of some items from the original test while others recommend the use of the entire test. However, Creswell (2014) strongly recommends the latter option to examine the actual intervention effects but warns that the potential effects should be statistically controlled using relevant statistical models. After administering the pretest, we exposed the groups to their corresponding interventions for a duration of eight weeks, including task-based activities that exposed the students to different AT skills. After every two interventions, one dimension of the ATt was administered to the participants online under intensive invigilation. In total, four different posttests were administered to measure students' growth trajectories in the domain of the AT dimensions.

Fig. 8 A sample of student tasks in experimental group 2 requiring them to watch the conditional movement of the adult and child Biped, decompose their movement into constituent parts (decompose), remove less important components (abstraction), write a pseudocode that illustrates the movement (algorithmization), implement a code in Alice to mimic the exact movement of the Bipeds, and run the codes to fix identified errors (debugging)

Fig. 9 A sample of student tasks in experimental group 1 requiring them to watch the sequential and repeated movement of an adult Biped, decompose movement patterns and gestures, identify unimportant movement, develop an algorithm to mimic the movements, implement the code, and debug errors
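Random assignment in equal proportion can be illustrated as follows. The study does not state the mechanism that was used, so this is a sketch under assumptions: the participant identifiers, the seed, and the function name `assign_equally` are hypothetical, and only the sample size (240) and the three group labels come from the paper.

```python
import random

GROUPS = ["experimental group 1", "experimental group 2", "control group"]

def assign_equally(participant_ids, groups, seed=None):
    """Shuffle participants, then deal them round-robin into the groups."""
    rng = random.Random(seed)          # local generator, so the seed is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for index, pid in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(pid)
    return assignment

# Hypothetical IDs 1..240; the seed is arbitrary.
allocation = assign_equally(range(1, 241), GROUPS, seed=2024)
print({g: len(members) for g, members in allocation.items()})
```

Dealing a shuffled list round-robin guarantees equal group sizes whenever the sample size is divisible by the number of groups, which simple per-person random draws would not.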

Data analysis
We estimated the participants' growth trajectories of AT skills development (RQ1) using latent growth modeling (LGM) with the lavaan R package and examined their within- and between-person variability (RQ3) using a multilevel growth model (MGM) with the nlme R package.
LGM is an extension of Structural Equation Modeling (SEM) applied in the analysis of growth trajectories that occur over time (Corcoran & O'Flaherty, 2017). Like SEM, it applies the same criteria for determining how well the model fits the data. The MGM is also a growth curve modeling technique but permits the estimation of inter- and intra-individual change over time by modeling their variances. For the LGM, two primary models have been proposed, including linear and quadratic models. In both models, a factor loading of 1 was assigned to the intercept (i.e., ICEPT). This represents the hypothesized AT skills when the growth curve begins. The factor loadings for the slope (0, 1, 2, 3, 4) assumed that the growth pattern is linear and that variations in AT skills development have a proportional effect that either accelerates or decelerates. In contrast, the factor loadings for the quadratic latent factor (0, 1, 4, 9) assumed that AT skills development occurs in a U shape.
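The role of the fixed factor loadings can be illustrated with a short sketch. It assumes four measurement occasions (the four posttests), so the time scores run from 0 to 3 and their squares give the quadratic loadings 0, 1, 4, 9 quoted above; with a different number of occasions the time scores would change accordingly. The function names are hypothetical, and the code only builds loading matrices and model-implied means; it does not estimate a latent growth model.

```python
def loading_matrix(n_occasions, quadratic=False):
    """Rows are measurement occasions; columns are intercept, slope, (quadratic)."""
    rows = []
    for t in range(n_occasions):
        row = [1, t]            # intercept loading fixed at 1, slope loading = time score
        if quadratic:
            row.append(t ** 2)  # quadratic loading = squared time score
        rows.append(row)
    return rows

def implied_means(loadings, factor_means):
    """Model-implied mean at each occasion: loadings times latent factor means."""
    return [sum(l * m for l, m in zip(row, factor_means)) for row in loadings]

linear = loading_matrix(4)
quadratic = loading_matrix(4, quadratic=True)
print([row[1] for row in linear])      # slope loadings
print([row[2] for row in quadratic])   # quadratic loadings

# Control-group intercept mean (13.63) and slope mean (1.92) as reported in the paper.
print([round(m, 2) for m in implied_means(linear, [13.63, 1.92])])
```

Under the linear specification the implied mean rises by the same amount (the slope mean) between consecutive occasions, which is what distinguishes it from the quadratic specification.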
To choose a robust model between the linear and quadratic models, six indicators were employed as baseline comparisons to test for model fit: the Akaike information criterion (AIC), Bayesian information criterion (BIC), chi-square test, Tucker-Lewis Index (TLI), Comparative Fit Index (CFI), and root-mean-squared error of approximation (RMSEA). As a rule of thumb, a model is considered to fit well if AIC and BIC values are lower, chi-square p-value > 0.05, TLI > 0.85, CFI > 0.9, and RMSEA < 0.05. For the MGM, two models have also been proposed, including the fixed and random effect models. The fixed effect model assumed that the variation in AT skills development across time is constant within and between persons, while the random effects model assumed that such variations occur at random.
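The rule-of-thumb comparison above can be written as a small check. The cutoffs are those quoted in the text; the fit statistics below are hypothetical placeholders and are not the values in Table 4, and the function names are illustrative only.

```python
def meets_cutoffs(chi2_p, tli, cfi, rmsea):
    """Apply the rule-of-thumb cutoffs quoted in the text to one model's fit indices."""
    return {
        "chi-square p > 0.05": chi2_p > 0.05,
        "TLI > 0.85": tli > 0.85,
        "CFI > 0.9": cfi > 0.9,
        "RMSEA < 0.05": rmsea < 0.05,
    }

def prefer_by_information_criteria(models):
    """Pick the model name with the lowest AIC, breaking ties on BIC."""
    return min(models, key=lambda name: (models[name]["aic"], models[name]["bic"]))

# Hypothetical fit statistics for illustration, NOT the study's results.
candidates = {
    "linear": {"aic": 1510.2, "bic": 1538.7, "chi2_p": 0.21,
               "tli": 0.97, "cfi": 0.98, "rmsea": 0.03},
    "quadratic": {"aic": 1522.9, "bic": 1561.4, "chi2_p": 0.04,
                  "tli": 0.88, "cfi": 0.91, "rmsea": 0.07},
}

best = prefer_by_information_criteria(candidates)
absolute_fit = {k: v for k, v in candidates[best].items() if k not in ("aic", "bic")}
print(best, meets_cutoffs(**absolute_fit))
```

AIC and BIC are relative criteria (lower is better across candidate models), whereas the chi-square, TLI, CFI, and RMSEA cutoffs are applied to each model on its own, which is why the sketch separates the two steps.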
To address RQ2, we correlated the intercept and slope from the LGM and examined the direction and magnitude of the relationship. As a rule of thumb, a significant positive relationship (i.e., covariance) indicates the presence of a Matthew effect (i.e., students with high AT skills at the initial stage of the intervention develop more AT growth). In contrast, a significant negative correlation indicates the presence of a compensatory effect (i.e., students with low AT skills at the initial stage of the intervention develop more AT growth). To account for the effect of the pretest, we introduced scores of the pretest as a covariate in both the LGM and MGM. To address RQ4, we employed multivariate analysis of covariance (MANCOVA) in SPSS version 23. The purpose of this test is to examine significant differences in overall scores of the four AT dimensions across the three intervention groups while controlling for scores on the pretest. Finally, a two-way multivariate analysis of variance was used to address RQ5. Prior to the analysis, preliminary assumption tests were conducted to test for the presence of collinearity, outliers, normality, and homogeneity of covariance matrices, with no violations noticed.
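The decision rule for the intercept-slope relationship can be expressed directly. This is a sketch of the rule as stated in the text, not of the estimation itself: the function name `classify_effect`, the default significance level, and the example inputs are assumptions for illustration.

```python
def classify_effect(covariance, p_value, alpha=0.05):
    """Interpret an intercept-slope covariance from a latent growth model.

    A non-significant covariance is read as neither effect; otherwise the
    sign decides between the Matthew and compensatory effects.
    """
    if p_value >= alpha:
        return "neither (stable inter-individual differences)"
    return "Matthew effect" if covariance > 0 else "compensatory effect"

# Hypothetical inputs covering the three possible outcomes.
print(classify_effect(covariance=2.10, p_value=0.01))
print(classify_effect(covariance=-2.10, p_value=0.01))
print(classify_effect(covariance=-0.40, p_value=0.30))
```

In practice the covariance and its p-value would be read from the fitted model's parameter table for each intervention group.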

RQ1 What are students' AT growth trajectories when learning programming across three programming environments?
After estimating the two growth models, the linear growth curve was a better model (see Table 4). The major importance of LGM is to model the mean variance and slope of growth progression across different time intervals. From Table 5, participants' initial average of AT skill development in experimental group 1 is 20.30 with a mean growth [...] pretest across the groups suggests that it does not affect the growth trajectories of the participants. Figures 12, 13, and 14 illustrate the graphical growth trajectories of the three intervention groups.

RQ2 Do low-achieving learners show more progressive growth in AT skills than high-achieving learners when learning programming across three programming environments?
We observed significant variances of the intercepts (initial growth) and slopes (growth rate) across all the intervention groups. However, intercept variance was most pronounced in the control group (intercept variance = 12.97, p-value < 0.001), while slope variance was most pronounced in experimental group 1. These results suggest differences in AT skills development across participants in the intervention groups. We correlated the intercepts and slopes across the intervention groups to examine their covariances (see covariance row of Table 5). As previously discussed, a positive covariance indicates the presence of a Matthew effect, and a negative covariance indicates a compensatory effect. We found a negative and significant covariance in experimental group 1 (estimate = −4.355, p-value < 0.01) and experimental group 2 (estimate = −1.461, p-value < 0.05), suggesting a compensatory effect. Although the covariance in the control group is also negative, it is not significant, suggesting the absence of both the Matthew and compensatory effects (i.e., there are stable interindividual differences in the control group).

RQ3 Do students exhibit within- and between-person differences in their AT growth trajectories when learning programming across three programming environments?
Although significant variances were observed across the intervention groups in the preceding section, our latent growth models could not provide information on sustained intra- and inter-individual variances among the participants in each group. To account for this, we estimated a multilevel growth model; Table 6 shows the unconditional multilevel growth model. Across the intervention groups, participants in the control group demonstrate more significant between-person variability (estimate = 12.468, p-value < 0.001) while experimental group 1 shows more significant within-person variability (estimate = 4.531, p-value < 0.001). To provide more insight into the within- and between-person variability, we calculated the intraclass correlation coefficient (ICC) using the expression below:

$$\mathrm{ICC} = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_\varepsilon^2}$$

where $\sigma_0^2$ is the between-person variation, and $\sigma_\varepsilon^2$ is the within-person variation.
From the intraclass correlation coefficient (Table 6), it is apparent that in the control group between-person differences account for a larger proportion of the total variability (estimate = 0.916) and within-person differences for a smaller proportion (estimate = 0.084). This indicates that about 91.6% of the variation in participants' AT skills development in this group is accounted for by between-person differences, while only 8.4% is accounted for by within-person differences. Supporting evidence can be seen in Fig. 17 in the Appendix, where most of the participants in this group have more varied intercepts (between-persons) with a relatively stationary growth rate (within-persons) over time compared to experimental group 1 (see Fig. 15) and experimental group 2 (see Fig. 16).
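The variance partition behind these proportions can be sketched as follows, taking the ICC as the between-person variance divided by the sum of the between- and within-person variances. The function names are hypothetical, and the sketch assumes that the reported estimates are variance components. The within-person variance used in the example is back-calculated from the reported between-person estimate (12.468) and ICC (0.916) purely for illustration; it is not a value taken from Table 6.

```python
def icc(between_var, within_var):
    """Share of total variance that lies between persons."""
    return between_var / (between_var + within_var)

def implied_within(between_var, icc_value):
    """Within-person variance implied by a between-person variance and an ICC."""
    return between_var * (1 - icc_value) / icc_value

# Control group: reported between-person estimate and ICC.
between = 12.468
within = implied_within(between, 0.916)   # back-calculated, illustration only

between_share = icc(between, within)
within_share = 1 - between_share
print(round(between_share, 3), round(within_share, 3))
```

The two shares always sum to 1, which is why a high between-person proportion in the control group necessarily implies a low within-person proportion.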

RQ4 Are there significant differences in AT skills dimension between students who learn programming in different programming environments?
To examine whether significant differences exist among participants' AT skills across the intervention groups, a multivariate analysis of covariance was conducted. The independent variable was the type of treatment given to each intervention group, the dependent variables were the scores on the four AT dimensions, and the covariate consisted of scores obtained from the pretest. After controlling for the covariate, there was a significant difference in the participants' AT skills development on the combined dependent variable (see Table 7, Wilks' Lambda = 0.287, F = 50.477, p-value < 0.001). The difference was large, as evidenced by a partial eta squared of 0.423 (Cohen, 1988). When the results for the dependent variables were considered separately (Table 8), they also reached statistical significance using a Bonferroni-adjusted alpha level of 0.016 (Decomposition: F = 18.432, Abstraction: F = 45.199, Algorithmization: F = 110.60, Debugging: F = 200.05; p-value < 0.001). An inspection of the mean scores (Table 9) shows that participants in experimental group 1 obtained higher scores in the four AT skills compared to other groups. A post-hoc analysis using Tukey HSD (Table 10) shows that the differences in decomposition, abstraction, and algorithmization skills lie among the three intervention groups, while the differences in debugging skills lie between the two treatment groups and the control group.

RQ5 Are there gender differences in AT skills dimension across different intervention groups?
To analyze gender differences in AT skills across the intervention groups and the interaction between gender and treatment type, we conducted a two-way multivariate analysis of variance (see Table 11). Subjects were classified according to their gender category and their treatment group. There was no significant difference between male and female students in their AT skills development across the intervention groups (Pillai = 0.025, F = 1.521, p-value = 0.196). Additionally, there was no significant interaction effect between gender and the type of treatment (Pillai = 0.009, F = 0.533, p-value = 0.712). This suggests that male and female students did not differ significantly in their AT skills development regardless of the programming environment used.

Discussion
The study examined the growth trajectories of programming students' algorithmic thinking skills development. We adopted an algorithmic thinking framework proposed by Lehmann (2023a) to collect data on students' growth trajectories in eight programming lessons that utilized multigroup experiments. We analyzed the resultant data with statistical approaches that permitted us to model students' growth trajectories, within- and between-person variability, and the statistical difference in algorithmic thinking (AT) skills across different intervention groups. First, we found that the trajectory of students' algorithmic thinking skills development is linear rather than quadratic, and accelerates when students are exposed to continued and sustained intervention. On average, the students demonstrated significant AT skills at the initial stage of the interventions but also showed a significant improvement as the interventions proceeded. However, students taught with a combination of block- and text-based programming and those exposed to only block-based environments demonstrated higher AT skills at the initial stage of the interventions than those taught using only text-based environments. These students also showed more progression in their AT skills development.
A plausible explanation for such differences in growth patterns lies in the power of visual models in block-based environments. Although authors have argued that growth trajectories correlate with task difficulty in subsequent skill levels (Corcoran et al., 2017), several other authors have argued that the sustained growth rate of AT skills is largely associated with visual models that expose students to real-world problem-solving skills (Durak, 2020; Yusuf et al., 2023). These authentic problem-solving skills are further strengthened by the integration of text-based programming that augments the real programming skills missing in block-based environments (Jiang & Li, 2021; Pellas, 2023). In a recent review, Yusuf and Noor (2023b) proposed that the pedagogical effectiveness of block-based environments lies in their incorporation of visual models, which are more authentic. They claimed that such authentic learning promotes problem-solving skills and supports procedural learning. On this premise, we propose that although significant growth of algorithmic thinking skills could be achieved when learning programming in a text-based-only environment, sustained intra-individual growth is more readily achieved when students are exposed to a combination of text- and block-based environments.
As we consider the broad spectrum of growth trajectory, we found evidence of a compensatory effect on students' AT skills when they are exposed to text-and-block-based and block-based-only environments. However, there was no evidence of such an effect when learning with a text-based-only environment. The presence of the compensatory effect supports the hypothesis that low-achieving students show more progressive growth in AT skills development than higher-achieving ones. This further indicates that exposing students to text-and-block-based and block-based-only environments narrows the gap that exists between high- and low-achieving students. Our finding is supported by a large volume of research (Davis-Kean & Jager, 2014; Larsen & Little, 2023; Mok et al., 2015; Ready, 2013; Rescorla & Rosenthal, 2004), suggesting the validity of the compensatory hypothesis in different fields of scientific inquiry.
The compensatory effect has been explained through the expertise reversal effect. This effect describes a phenomenon where instructional strategies that are beneficial for novice learners become less effective, or even detrimental, for more experienced learners. As learners gain expertise, they may find detailed guidance and scaffolding to be redundant or intrusive, as they prefer more autonomy and less structured support. In support of this argument, Kalyuga (2008) explains that when learners are familiar with information presented in multimedia environments, they process its transience easily and ignore the content. Additionally, due to their potential for higher mental objects, they have to reconcile the additional guidance with their schema, which might further induce their problem-solving skills.
While the compensatory effect suggests a narrowing of the achievement gap when learners are exposed to block-based programming, it also indicates that these environments are not inclusive, as they largely benefit a particular expertise category and become less useful to others. The implication of this finding is that block-based programming environments are suitable only for teaching AT concepts to novice programmers.
Consistent with prior studies (e.g., Larsen & Little, 2023), we did not find any evidence of a Matthew effect across the intervention groups. The Matthew effect is largely associated with preferential attachment, whereby personalized growth rate is distributed according to initial achievement. In the absence of this effect, we conclude that the three programming environments do not produce a preferential distribution of growth rate. Nonetheless, exposing students to a text-based-only environment would lead to stable intra-individual differences in AT skills, with a limited growth rate.
Our analysis has shown that despite the significant improvement in students' AT skills across the intervention groups, these improvements tend to be idiosyncratic, as a considerable number of students' AT skills remained relatively stable while others accelerated. We therefore extended our analysis to model differences within and between participants across groups. An inspection of the variability shows that participants across the intervention groups exhibited significant between- and within-person variability in their algorithmic thinking skills development. However, participants exposed to text-based-only programming exhibited more between-person variability, while those exposed to text-and-block-based environments exhibited the highest proportion of within-person variability.
The higher proportion of between-persons variability in the control group is an indication of divergent AT skills among the participants, confirming the absence of a compensatory effect. In contrast, the smaller proportion of between-persons variability in experimental group 1 is an indication of convergent AT skills among the participants, confirming the validity of the compensatory hypothesis. Furthermore, a high proportion of within-person variability in experimental group 1 confirms the significant growth trajectory of AT skills in this group. On this premise, we conclude that exposing students to AT skills using a combined text- and block-based programming environment would enhance sustained intra-individual growth with less variability between students.
To check whether the Alice innovation was meaningful, we extended our analysis to examine significant differences in AT across groups. The analysis revealed significant differences in AT skills development. An inspection of the mean scores shows that students exposed to text-and-block-based programming obtained higher AT skills compared to other groups. Despite the significant differences, we observed that no such difference exists across gender categories in the three treatment groups. The literature on gender differences in thinking competencies presents mixed findings. A substantial portion of the literature reports no gender differences (e.g., Jiang & Wong, 2022; Kanaki & Kalogiannakis, 2022; Relkin et al., 2020).
However, other studies suggest significant gender differences (e.g., Del Olmo-Muñoz et al., 2020; Hu, 2024; Sun et al., 2021, 2022; Wong et al., 2024). These inconsistencies may be attributed to the geographical contexts in which studies were conducted. For instance, Hu's (2024) meta-analysis highlights the geographical variations in gender differences. The findings suggest that gender differences in thinking skills are minimal in the "East Asia and Pacific" regions but more pronounced in "North America," "Europe," and "Central Asia." In conclusion, the insignificant gender difference found in the present study, as opposed to other studies, could be due to differences in geographical context.
In the context of this research, we propose that everyone is capable of developing algorithmic thinking skills regardless of their gender category. As science educators, we understand and appreciate the need for gender inclusivity. For this reason, expanding programming tools to meet the expectations of all gender categories has been our top priority. We, therefore, question the intervention fidelity and overall validity of prior studies that reported gender differences in thinking competencies. We strongly believe that this difference could be weakened when all gender categories are carried along during experimental intervention. This effort is within the capacity of teachers and researchers, as opposed to other individual student characteristics.

Conclusion
Our research has confirmed the possibility of measuring students' algorithmic thinking skills using the DAAD framework (Lehmann, 2023a, 2023b). This framework appears to be relevant in measuring students' AT skills across disciplines. By employing statistical models, we found that students' growth trajectories in AT skills development are linear, with a higher growth rate when they are exposed to AT skills using a combination of text-and-block-based programming environments. We found evidence of a compensatory effect only when text-and-block-based and block-based-only programming tools are integrated, suggesting that these environments narrow the gap that exists between high- and low-achieving students. These tools also have the potential to significantly reduce between-person variability and promote sustained intra-individual growth in algorithmic thinking skills.
We propose that the combination of text- and block-based programming environments would lead to improved and sustained problem-solving skills, particularly in the field of programming. To this end, our findings contribute to the ongoing debate concerning the learning benefits of block- and text-based programming modalities (for a review, see Xu et al., 2019).

Practical and theoretical implications
Our research has several implications, both theoretical and practical. Theoretically, this study contributes to the understanding of how different programming environments influence the development of AT skills. Through the examination of growth trajectory, our research provides an understanding of how AT skills develop in different programming environments. This study also extends the existing literature by suggesting that a mixed approach can enhance learning outcomes more effectively than a singular focus on either environment. Furthermore, our research underscores the importance of considering individual differences in AT development, including demographic factors such as gender. This is particularly significant in the context of computing education in sub-Saharan Africa, where there is a pressing need to address gender disparities and promote inclusive education. The findings suggest that tailored instructional strategies that account for these differences can lead to more equitable and effective learning experiences.
Practically, this study provides actionable insights for educators and policymakers. It highlights the potential benefits of integrating block-based programming environments with conventional text-based tools to enhance students' AT and problem-solving skills. Such integration can be facilitated by strong educational policies, especially in Nigerian institutions where technology integration faces resistance. Programming tutors are encouraged to adopt a mixed approach to teaching programming, which can lead to significant improvements in students' AT skills.

Limitations
The major limitation of this study is the use of a competency test that is restricted to programming problems and ignores daily life events. While general tests could attract wider application and empirical validation, we believe that specific competency tests are more suitable for novel research. Another limitation is the use of a small sample across groups, which means that our findings should not be overgeneralized. However, as previously discussed, there is no numerical standard for sample size adequacy in true experiments. Nevertheless, we recommend that other researchers validate our algorithmic thinking test using large samples.

Fig. 1 Our framework for developing algorithmic thinking

Fig. 3 Participants in experimental group 1 performing programming tasks

Fig. 4 Sample question of ATt: Loops and conditionals

Fig. 5 ICC plot showing the probability of solving an item

Figure 10 represents the linear model while Fig. 11 represents the quadratic model.

Fig. 15 Differences in participants' growth trajectory of AT skills in Experimental Group 1

Table 2 Participant distribution

Table 3 Summary of class activities
Participants were instructed to use the refined components and develop their algorithms. They can do this by writing pseudocodes or drawing flowcharts that illustrate the whole problem and then implement a code in their programming environment that solves the problem.

Table 4 Baseline comparisons

Table 5 Parameter estimates
*p < 0.05, **p < 0.01, ***p < 0.001
[...] average in the control group is 13.63 with a mean growth rate of 1.92 units. These results indicate a significant growth trajectory of students' AT skills development. Despite recording significant growth, the overall growth trajectory of experimental group 1 was more prevalent compared to the other two groups. The insignificance of the [...]

Table 6 Unconditional multilevel growth model

Table 7 Multivariate differences in AT skills

Table 8 Univariate differences in AT skills

Table 9 Mean scores of AT skills development

Table 10 Multiple comparisons

Table 11 Gender difference in AT skills across groups