On the predictors of computational thinking and its growth at the high-school level

Computational thinking (CT) is a key 21st-century skill. This paper contributes to CT research by investigating CT predictors among upper secondary students in a longitudinal and natural classroom setting. The hypothesized predictors are grouped into three areas: student characteristics, home environment, and learning opportunities. CT is measured with the Computational Thinking Test (CTt), an established performance test. The sample comprises N = 202 high-school students assessed at three time points over one school year; latent growth curve modeling serves as the analysis method. CT self-concept, followed by reasoning skills and gender, shows the strongest association with the level of CT. Computer literacy, followed by duration of computer use and formal learning opportunities during the school year, has the strongest association with CT growth. Variables from all three areas seem to be important for predicting either CT level or growth. An explained variance of 70.4% for CT level and 61.2% for CT growth might indicate a good trade-off between the comprehensiveness and parsimony of the conceptual framework. The findings contribute to a better understanding of CT as a construct and have implications for CT instruction, e.g., the role of computer science and motivation in CT learning.

to be interrelated, a study that more comprehensively captures potential predictors may be necessary to identify their unique effects. Second, research on predictors of CT is mainly cross-sectional; longitudinal studies to examine predictors of CT development are missing. The third research desideratum is the investigation of CT learning in naturalistic classroom settings (Ilic, Haseski, & Tugtekin, 2018;Lye & Koh, 2014). Interventions where students voluntarily participate may not accurately represent students in general. Fourth, high-school students are currently rarely studied in comparison to those from elementary and middle schools. However, no rationale suggests that elementary and middle schools represent the only critical stages of CT learning (Tang, Yin, Lin, Hadad, & Zhai, 2020). To address the four research desiderata, the paper at hand aims at answering the research question: What are the predictors of high-school students' CT level and CT growth on the individual level?
To this end, we develop hypotheses on predictors of the level and growth of CT. As we are focusing on the individual level, the classroom or school context is not considered. We test our hypotheses using a sample of 202 high-school students (11th grade) whom we surveyed and tested at three time points over one school year. We use a latent growth curve model because it allows us to make inferences about the initial level of an outcome variable (CT), as well as its rate of change (Pakpahan, Hoffmann, & Kröger, 2017), and thus to answer our research question.

Review of available studies
From the available studies, the work of Durak and Saritepeci (2018) is most closely related to our research. The authors utilized a cross-sectional sample of 156 students from grades 5 to 12. They reported educational level (grade), mathematics and science class performances, and ways of thinking as significant predictors of CT. Building on this study, it could be beneficial to also consider students' CT motivation and family background because those factors may be important when investigating CT (Del Olmo-Muñoz, Cózar-Gutiérrez, & González-Calero, 2020; Fraillon, Ainley, Schulz, Friedman, & Duckworth, 2019; Repenning et al., 2015). Moreover, Durak and Saritepeci relied on a self-assessment instrument (Korkmaz, Çakir, & Özden, 2017). CT performance tests and self-assessment instruments tend to capture different constructs (Guggemos, Seufert, & Román-González, 2019; Román-González, Moreno-León, & Robles, 2019). It may be beneficial to complement the research of Durak and Saritepeci (2018) by using a performance test that captures CT skills.
Werner, Denner, and Campe (2012) relied on a cross-sectional sample of 311 middle-school students aged 10 to 14. They reported a positive relationship between CT and pair programming, parental education, the father or mother working with computers, speaking more English (native language) at home, interest in computer science classes, grades in school, frequency of computer use, confidence with computers, and attitudes towards computers. A predictor missing in this comprehensive study may be reasoning skills.
Román-González, Pérez-González, and Jiménez-Fernández (2017) and Román-González, Pérez-González, Moreno-León, and Robles (2018) carried out a criterion validity study of the Computational Thinking Test (CTt), a CT performance test. Their cross-sectional sample comprises 1251 students from grades 5 to 10. They investigated the relationship of the CTt with various cognitive variables, e.g., reasoning, and non-cognitive ones, e.g., self-efficacy. By nature, the work is correlational and students' family background and learning opportunities are not considered.
In the large-scale International Computer and Information Literacy Study 2018 (ICILS), CT was an option in which eight countries participated (Fraillon et al., 2019). Using 8th graders as the cross-sectional sample, the ICILS found significantly higher CT scores among male students in comparison to females, for the overall sample. Moreover, the ICILS reported a positive association of CT with socioeconomic background, access to computers at home, and experience using computers. A negative association is found with an immigration background. Evidence about the association with motivation and cognitive dispositions, like reasoning or mathematical skills, however, is not provided.
In terms of CT development (growth), quantitative studies are mostly carried out using (quasi-)experimental designs where the causal effect of specific interventions on CT learning is of interest (Cutumisu, Adams, & Lu, 2019). However, these intervention studies mostly do not focus on CT learning in naturalistic classroom settings over a substantial period of time.

Conceptual framework
According to the research question, we only consider the individual level. To structure our hypotheses, we rely on the conceptual framework of the ICILS. It distinguishes Antecedents and Processes. "Antecedents are exogenous factors that condition the ways in which (…) learning takes place" (Fraillon et al., 2019, p. 6). Antecedents comprise student variables, e.g., gender, and home environment variables, e.g., parents' socioeconomic status. "Processes are those factors that directly influence (…) learning" (Fraillon et al., 2019, p. 6). Such CT learning opportunities can be formal or informal in nature (Grover & Pea, 2013; Wing, 2008).

Antecedents - student
Gender differences are an important topic in CT research (Shute et al., 2017). Durak and Saritepeci (2018) argued that reasons for gender differences in CT may be due to differences in self-efficacy and interest, which might be attributed to gender stereotypes (Master, Cheryan, & Meltzoff, 2016). Overall, the empirical evidence is inconclusive. Grover, Pea, and Cooper (2015) found higher CT levels for female middle-school students in comparison to males. Werner et al. (2012) did not find any gender differences among middle-school students. Atmatzidou and Demetriadis (2016), based on a sample of junior high and high vocational students, reported that females need more learning time but reach the same CT levels as their male counterparts. The ICILS, across the eight countries in the sample, reported significantly higher CT scores for males in comparison to females. Román-González et al. (2017) used Spanish secondary students aged 10 to 16 as a sample and found an increasing gender CT gap in favor of males as students become older. Since our study is in the realm of high-school education, we expect: H1. The gender 'female' negatively predicts the level and growth of CT.
CT is conceptualized as a problem-solving methodology across subjects (Barr & Stephenson, 2011;Wing, 2006). In light of this, it may be strongly related with reasoning (Ambrosio, Xavier, & Georges, 2015;Buitrago Flórez et al., 2017;Wüstenberg, Greiff, & Funke, 2012). Reasoning is a core facet of general intelligence. It can be broadly defined as "the process of drawing conclusions" (Leighton, 2009, p. 3). In terms of CT core facets, reasoning may be involved in all of them: abstractions, algorithmic thinking, decomposition, and debugging. The strong conceptual relationship between CT and reasoning is supported by empirical evidence. Román-González et al. (2017) reported a medium correlation of CT with reasoning skills. Hence, we assume: H2. Reasoning skills positively predict the level and growth of CT.
According to Wing (2006, p. 33), CT may be an analytical skill, like "reading, writing, and arithmetic". In terms of mathematical skills, Wing (2008) claimed that CT and mathematical thinking share the same general way of approaching problems. Sneider, Stephenson, Schafer, and Flick (2014) demonstrated a substantial overlap between CT and mathematical thinking, e.g., 'modeling'. In light of this, we might expect a positive association between mathematical skills and CT. Durak and Saritepeci (2018), as well as Grover et al. (2015) and Grover, Pea, and Cooper (2016), reported higher CT levels for students with higher academic success in mathematics. Concerning language skills, Barr and Stephenson (2011) claimed that CT concepts are also present in the language arts. An example is "Write instructions" as a manifestation of an algorithm (Barr & Stephenson, 2011, p. 52). Zhang and Nouri (2019) showed in their literature review that reading is regularly regarded as a part of CT. Against this background, language skills could be positively associated with CT. Román-González et al. (2018) reported a medium positive correlation between CT and verbal ability. In sum, we hypothesize: H3. Mathematical skills positively predict the level and growth of CT.
H4. Language skills positively predict the level and growth of CT.
The relationship between programming and CT is often thematized. In line with Wing (2006), Israel, Pearson, Tapia, Wherfel, and Reese (2015) regard the use of computers to model ideas and programming as an integral part of CT. Concepts of programming, e.g., loops and conditionals, are elements of the Brennan and Resnick (2012) CT framework. Buitrago Flórez et al. (2017), as well as Lye and Koh (2014), argue that by means of programming, several core facets of CT can be addressed. Shute et al. (2017) concluded there is a close relationship between CT and programming skills due to similar underlying cognitive processes. Hsu et al. (2018) reported, based on their review of the literature, that programming is widely used to teach CT. Against this background, the ability to program may be positively associated with CT. Concerning empirical evidence, Grover et al. (2016) highlighted the positive influence of programming experience on CT; Scherer, Siddiq, and Sánchez Viveros (2019) concluded, based on a meta-analysis, that CT can be taught through programming. However, Duncan and Bell (2015) did not find a positive influence of teaching programming on CT levels. In sum, we expect: H5. The ability to program positively predicts the level and growth of CT.
CT can be regarded as part of an overall digital literacy that also includes computer (and information) literacy (Yadav et al., 2018). We define computer literacy as the procedural and declarative knowledge that enables an individual to deal competently with the computer and thus to use information technologies in an individually and socially successful way (Richter, Naumann, & Horz, 2010). Many authors, including Wing (2006), stress that computer literacy is different from CT. However, the question remains whether computer literacy is conducive to CT or not. Since CT aims at representing a problem in such a way that a computer can solve it (Israel et al., 2015; Wing, 2006), knowledge about the capabilities of computers may be beneficial. Moreover, CT is often taught using computers and technology (Hsu et al., 2018). Students with higher computer literacy may benefit more from CT instruction because they are more proficient in using computers as learning tools. Concerning empirical evidence, Yeh et al. (2012) used Microsoft Excel, namely functions, to promote CT and reported positive effects. The ICILS also found a strongly positive correlation between information and computer literacy and CT. Against this background, we hypothesize: H6. Computer literacy positively predicts the level and growth of CT.
According to the expectancy-value model (EVM) (Wigfield & Eccles, 2000), the expectation of success and subjective task value drive the level of achievement motivation. The expectation of success depends on the person's self-concept, which can be broadly defined as the perception of oneself (Shavelson, Hubner, & Stanton, 1976). Drawing from this, CT self-concept could be defined as the perception about oneself in the field of CT. A core element of self-concept is the perceived competence (Bong & Skaalvik, 2003). As such, it may be closely related to self-efficacy. Indeed, domain specific self-concept and self-efficacy are often hard to separate (Bong & Skaalvik, 2003). The main difference might be the time orientation: self-concept is relatively stable whereas self-efficacy is malleable. In both cases, if a person expected to master CT tasks, they would put more effort into CT learning, overcoming obstacles, and CT activities in general. To capture the expectation of success component of the EVM, we rely on self-concept (Retelsdorf, Köller, & Möller, 2011). For our longitudinal study, in comparison to self-efficacy this more time-stable construct might be more suitable. Empirical evidence is available for self-efficacy. Román-González et al. (2018) and Ketenci, Calandra, Margulieux, and Cohen (2019) reported a positive correlation with CT. Overall, we hypothesize: H7. CT self-concept positively predicts the level and growth of CT.
The second component of the EVM addresses the perceived task value. Following Wigfield and Eccles (2002), the individual perception of usefulness plays a central role. Students who regard CT as more important for their academic and personal success are expected to put more effort into CT learning. This is also expected from students who enjoy engaging in CT tasks and are interested in them, regardless of external incentives. The described elements of perceived task value are consistent with the self-determination theory (Ryan & Deci, 2000); they may be manifestations of self-determined motivation, i.e., the person engages in the subject independently from external influences. Self-determined motivation is an important predictor for high-level learning processes (Seidel, Rimmele, & Prenzel, 2005). Repenning et al. (2015) stressed the importance of motivation for CT learning. For primary students, Kong, Chiu, and Lai (2018) showed a strong positive association between interest and 'programming empowerment', a construct closely related to CT. Ketenci et al. (2019), however, did not find a significant correlation between interest and CT. Drawing on the EVM, we hypothesize: H8. 'Self-determined motivation' positively predicts the level and growth of CT.

Antecedents - home environment
The home environment is a regularly used predictor of educational outcome (Rutkowski & Rutkowski, 2013). An important aspect of home environment is 'Socioeconomic and Cultural Status' (SECS), which comprises parental income, parental education, parental occupation, and the availability of cultural goods at home. The rationale is that families with a higher SECS are able and willing to provide more favorable learning environments (Retelsdorf et al., 2011). In terms of empirical evidence, the ICILS consistently reported higher CT scores for students from families with a higher SECS. Werner et al. (2012) reported a positive correlation between parental education and CT performance. We hypothesize: H9. SECS positively predicts the level and growth of CT.
Another important aspect of home environment might be migration (OECD, 2015). Reasons for the lower performances of students from families with migration background could be due to their concentration in disadvantaged schools and language-related issues. The ICILS reported a significantly lower CT score for students from immigrant families in comparison to those from non-immigrant families. Hence, we hypothesize: H10. A migration background is negatively associated with the level and growth of CT.

Processes - learning opportunities
Formal and informal learning opportunities may be necessary to foster CT (Grover & Pea, 2013; Wing, 2008). Formal learning can be regarded as learning that takes place in courses at school. In light of this, the extent of formal CT learning opportunities is difficult to capture because CT, by its nature, can be part of (almost) every subject (Webb et al., 2018). Hence, for formal learning opportunities, we hypothesize on a general level: H11. Formal learning opportunities during the school year positively predict the growth of CT.
Although CT could be part of every subject, it is deeply rooted in computer science education (Grover & Pea, 2013) and draws on basic concepts of computer science (Wing, 2006). Hence, we regard computer science instruction as a formal learning opportunity for CT and hypothesize on a more specific level: H12. Computer science instruction positively predicts the level and growth of CT.
Besides formal learning opportunities, CT could also be fostered in informal settings, i.e., outside of school courses. Durak and Saritepeci (2018) hypothesized that the use of information and communication technology (ICT) and the internet has a positive influence on CT. They argue that interacting with ICT is important for reflecting on CT, and that the use of ICT may be conducive to CT. However, both hypotheses were rejected. A reason for this could be that students use digital devices like smartphones to a great extent for leisure activities (Fraillon et al., 2019), e.g., listening to music or using social media. These activities might not be conducive to CT. In light of this, the use of computers (PC and laptop) may be a better indicator for informal learning opportunities. Indeed, Werner et al. (2012) reported a positive correlation between frequency of computer use and CT. We therefore hypothesize: H13. Duration of computer use positively predicts the level and growth of CT.

Data collection and sample
Our sample comprises N = 202 students from the 11th (second-to-last) grade of three 'Gymnasium Helveticum' (high schools) in German-speaking Switzerland who received 90 min of computer science instruction per week during the school year. A synopsis of the CT-related content in the curriculum can be found in Table S1 in the Supplementary Material; CT is explicitly mentioned as a curricular goal. The students are nested in twelve classes. Data was collected using Questback Unipark at the beginning (t1 = August), the middle (t2 = January), and the end (t3 = June) of the school year 2018/19, each within a two-week time slot. Data about CT performance was collected at all three time points, data about the formal learning opportunities during the school year in t3, and data about all other covariates in t1. The class teachers supervised the students and ensured a suitable test environment. The scheduled time was 90 min in t1 and 45 min in both t2 and t3. Teachers allowed all students to finish their work. On average, the students were aged 17.23 years (SD = 0.85 years) at the beginning of the data collection. Further characteristics of the sample can be found in the last two rows of Table 2.

Outcome variable - CT
To measure CT skills, we utilized the CTt (Román-González et al., 2017), a widely accepted performance test (Cutumisu et al., 2019; Shute et al., 2017; Zhao & Shute, 2019). It aims at measuring "the ability to formulate and solve problems by relying on the fundamental concepts of computing, and using logic-syntax of programming languages: basic sequences, loops, iteration, conditionals, functions and variables" (Román-González et al., 2017, p. 681). The well-established CT framework of Brennan and Resnick (2012) acts as the theoretical background. Moreover, the CTt covers core facets of CT: abstraction, decomposition, algorithms, and debugging. We adapted the CTt to the high-school level, resulting in a performance test with 26 items. To this end, we selected more difficult items in comparison to the initial version of the CTt from the item pool of Román-González (2015). We validated our version of the CTt and demonstrated Rasch-scalability (Guggemos, Seufert, & Román-González, 2019; a summary of this validation can be found in the Supplementary Material). This also implies test fairness, which means that the test "does not advantage or disadvantage some individuals because of characteristics irrelevant to the intended construct" (AERA, APA, & NCME, 2014, p. 50). Moreover, we provided evidence for unidimensionality. Our CTt version in general meets the quality criteria applied in the PISA studies (OECD, 2017, pp. 131-134). The weighted mean square error (wMNSQ) lies between 0.80 and 1.20 with the exception of only one item (1.22). This suggests an overall good fit with the Rasch model. The Wright Map (see Fig. S1 in the Supplementary Material) demonstrates an acceptable fit between student ability and item difficulty. All point-biserial correlations are higher than 0.30, revealing a sufficient discriminatory power of the items. The proportion of correct answers for all items lies between 0.25 and 0.90, indicating the absence of too easy or too difficult items.
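The two classical item checks mentioned above (proportion correct between 0.25 and 0.90, point-biserial correlation above 0.30) can be illustrated with a minimal sketch. The response matrix is hypothetical toy data, and correlating each item with the total score of the remaining items is an assumption; the text does not state whether a corrected or uncorrected item-total correlation was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def item_statistics(responses):
    """responses: one list of 0/1 item scores per student.

    Returns one (proportion_correct, point_biserial) tuple per item.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        p_correct = sum(item) / len(item)
        # Assumption: correlate the item with the score on all *other* items.
        rest = [t - i for t, i in zip(totals, item)]
        stats.append((p_correct, pearson(item, rest)))
    return stats


def flag_items(stats, p_min=0.25, p_max=0.90, r_min=0.30):
    """Indices of items that are too easy, too hard, or discriminate poorly."""
    return [j for j, (p, r) in enumerate(stats)
            if not (p_min <= p <= p_max) or r <= r_min]


# Hypothetical toy data: 6 students, 3 items (not the CTt data).
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]
toy_stats = item_statistics(toy_responses)
flagged = flag_items(toy_stats)  # the third item is solved by only 1 of 6 students
```

In this toy example only the third item is flagged, because its proportion correct falls below 0.25 although it discriminates acceptably.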
A sample item of our CTt version is depicted in Fig. 1. We used WLE estimators to measure student CT performance (Warm, 1989). The WLE reliability equals 0.81 in t1 and 0.85 in t2 and in t3. For a more vivid presentation, CT scores were transformed to a scale of M = 50 and SD = 10 at t1 (Retelsdorf et al., 2011). At this level of CT, the probability of mastering the item provided in Fig. 1 equals 51%.
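The rescaling and the Rasch success probability described above can be sketched as follows. This is an illustration under assumptions, not the scoring procedure actually used: the ability values are hypothetical logits, the population standard deviation at t1 is assumed as the scaling reference, and the function names are introduced here for illustration only.

```python
from math import exp
from statistics import mean, pstdev


def rescale_to_t1_metric(scores, t1_scores, target_mean=50.0, target_sd=10.0):
    """Linearly transform scores so that the t1 distribution has M = 50, SD = 10.

    Scores from t2 and t3 are transformed with the t1 mean and SD, so that
    change over time stays visible on the common scale.
    """
    m, s = mean(t1_scores), pstdev(t1_scores)  # assumption: population SD
    return [target_mean + target_sd * (x - m) / s for x in scores]


def rasch_probability(theta, difficulty):
    """Rasch model: probability of solving an item, both arguments in logits."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))


# Hypothetical ability estimates in logits (not the study data).
t1_logits = [-1.0, 0.0, 1.0, 2.0]
t3_logits = [-1.2, 0.1, 0.8, 1.9]

t1_scaled = rescale_to_t1_metric(t1_logits, t1_logits)
t3_scaled = rescale_to_t1_metric(t3_logits, t1_logits)

# A student whose ability exceeds the item difficulty by about 0.04 logits
# has a success probability of roughly 51%.
p_sample_item = rasch_probability(0.04, 0.0)
```

Under the Rasch model, a success probability of 51% for a student at the t1 mean implies that the item difficulty lies only about 0.04 logits below that mean ability.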

Predictors
This section describes the operationalization of the hypothesized CT predictors. The subsequent section addresses their scaling, reliability assessment, and the taken measures.
Note (Table 1). ω = Revelle's Omega Total, α = Cronbach's Alpha. ✓ = sufficient reliability. Self-report = fact is reported. Self-assessment = (subjective) assessment necessary. Rating scales 1-7 ranging from entire disagreement to entire agreement.
Note (Table 2). Correlations in bold are significant considering the cluster structure of the data: p < .05 (two-tailed). (a) For dummy coded variables the percentage of category '1' is reported: Gender (0 = male, 1 = female); Ability to program (0 = no, 1 = yes); Migration background (0 = both parents born in Switzerland, 1 = at least one parent not born in Switzerland). (b) Standardized to M = 50, SD = 10.
In terms of gender, students' self-reports were utilized (Margulieux, Ketenci, & Decker, 2019). Reasoning skills were measured using the short version of the Hagen Matrices Test (HMT-S) (Heydasch, Haubrich, & Renner, 2017). It comprises six multiple-choice items and covers the intelligence facet of reasoning (Schneider & McGrew, 2012). Students' self-reported grades in mathematics and German (the general language spoken at school and the language of all measurement instruments) for the previous school year act as a proxy for mathematical and language skills. School grades in Switzerland range from 1 to 6, where 6 represents the best possible grade. In general, self-reported grades may have the same usability as actual grades (Gonyea, 2005). To capture ability to program, we asked students 'Are you able to program (e.g., Java or Python)?' (yes/no). Computer literacy was measured with the dimension 'practical computer knowledge' of the revised Computer Literacy Inventory (INCOBI-R) (Richter et al., 2010). It comprises 20 multiple-choice questions (performance test) that address situations that may occur when working with computers. A sample item is: 'You want to prevent other people from following your navigation behavior on the Internet. What measure contributes to this?' CT self-concept was measured with six items taken from Köller, Daniels, Schnabel, and Baumert (2000) that we adapted to the context of our study. A sample item is: 'Generally, I can solve that kind of tasks well'. Self-determined motivation consists of three facets: identified, intrinsic, and interested (Prenzel, Drechsel, & Kramer, 1998). Each facet comprises three items. Sample items are (Prenzel, Drechsel, & Kramer, 1998): 'I get involved because I want to get a little closer to my own goals' (identified), 'Performing such tasks is fun' (intrinsic), and 'I am so fascinated that I fully committed myself' (interested).
In line with Howard, Gagné, and Bureau (2017), we opted for capturing self-determined motivation with one factor.
To form the variable SECS, students' self-reports were used. From indications regarding parents' professions, e.g., 'mechanical engineer', we derived the 'International Socio-economic Index of Occupational Status' (ISEI) using the database of Ganzeboom and Treiman (2012). The ISEI ranges from 16 (e.g., cleaners in offices) to 90 (judges) and reflects income as well as educational level (Ganzeboom, Graaf, & Treiman, 1992). Furthermore, we used the number of books in the home on a six-point scale ranging from '0-10' to '500+' (an illustration was provided) as a proxy for the available cultural goods at home (OECD, 2017). By means of a principal component analysis, we extracted one factor. This approach is consistent with ICILS and PISA (Fraillon et al., 2019; OECD, 2017). However, we used the ISEI of both parents instead of the highest ISEI. This means that a family where both parents are judges is assigned a higher SECS than a family where the mother is a judge and the father is a plumber. If the highest ISEI was utilized to form SECS, both families would be assigned the same SECS. Moreover, we did not include parents' educational level as an individual variable; the ISEI also reflects the educational level (Ganzeboom et al., 1992). As for the CTt, we scaled the SECS to M = 50 and SD = 10. The dichotomous variable 'Migration background' equals zero (0) if students reported that both parents were born in Switzerland and one (1) if at least one parent was born outside Switzerland (Retelsdorf et al., 2011).
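The construction of SECS from three indicators can be sketched as a one-component principal component analysis. The sketch below is illustrative only: it finds the first component by power iteration on the correlation matrix, which is a generic technique and not necessarily the routine the authors used. All family data are hypothetical; only the ISEI value of 90 for judges comes from the text, and the books variable is assumed to be coded 1 to 6.

```python
from math import sqrt
from statistics import mean, pstdev


def zscores(values):
    """Standardize to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def first_component_weights(columns, iterations=200):
    """Leading eigenvector of the correlation matrix via power iteration.

    columns: list of z-standardized indicator columns of equal length.
    """
    n, k = len(columns[0]), len(columns)
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / n
             for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


def secs_scores(father_isei, mother_isei, books):
    """First principal component of the three indicators, scaled to M = 50, SD = 10."""
    cols = [zscores(father_isei), zscores(mother_isei), zscores(books)]
    weights = first_component_weights(cols)
    raw = [sum(w * col[i] for w, col in zip(weights, cols))
           for i in range(len(books))]
    return [50.0 + 10.0 * z for z in zscores(raw)]


# Hypothetical families. Family 0: both parents judges (ISEI 90).
# Family 1: one parent a judge, the other with an assumed ISEI of 34.
father = [90, 90, 34, 51, 70, 25]
mother = [90, 34, 30, 60, 65, 20]
books = [5, 5, 2, 4, 5, 1]  # assumed category index on the six-point scale

secs = secs_scores(father, mother, books)
```

Because both parents' ISEI values enter the component, family 0 receives a higher SECS than family 1 in this toy example, whereas a composite based on the highest ISEI alone would treat them as equal.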
For capturing formal learning opportunities during the school year on a general level in t3, we drew on the 'generic skill' scale of Byrne and Flood (2003). A sample item is: 'The courses taken during the school year have promoted my ability to solve such problems.' The students in our sample attended 90 min of computer science instruction per week. Hence, it is not possible to capture formal learning opportunities during the school year by the number of computer science lessons due to a lack of variance. For past formal learning opportunities, we utilized the self-reported weekly number of past computer science lessons. Concerning informal learning opportunities, the self-reported computer use by students (PC or laptop) was used (OECD, 2017).

Reliability assessment and taken actions
For assessing the internal consistency reliability (reliability) of our measures, we used Revelle's Omega Total as it is superior to Cronbach's Alpha (McNeish, 2018). Since Cronbach's Alpha is widely used, however, we report it along with Omega. A reliability of greater than 0.7 may be sufficient (Hair, Risher, Sarstedt, & Ringle, 2019). Table 1 summarizes the operationalization of the CT predictors, reports their reliability, and shows what actions are taken.
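As an illustration of the internal consistency check against the 0.7 threshold, Cronbach's Alpha can be computed directly from item scores. This minimal sketch covers Alpha only; Revelle's Omega Total requires a factor model and is not reproduced here. The ratings are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's Alpha.

    items: one list of scores per item, all over the same respondents.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical ratings on a 1-7 scale: three items, five respondents.
toy_items = [
    [2, 4, 5, 6, 7],
    [1, 4, 4, 6, 7],
    [2, 3, 5, 5, 7],
]
alpha = cronbach_alpha(toy_items)
sufficient = alpha > 0.7  # threshold named in the text
```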
Ability to program may be problematic because it is a self-assessment based on a single binary item. However, using a latent variable approach, unreliability can be modeled. Following Wang and Wang (2020, pp. 144-150), we restricted the error variance of the predictor Ability to program to Var(A)(1 − ρ_A), where Var(A) is the observed variance of Ability to program and ρ_A its reliability.
To determine the reliability, we drew from research about single item reliability. Postmes, Haslam, and Jans (2013) carried out a meta-analysis and reported a reliability (Cronbach's Alpha) of single item measures between 0.14 and 0.68. As a proxy for ρ_A, we took the midpoint of this range (0.41). Needless to say, this value can only act as a rough approximation. We will therefore come back to this point in the discussion and limitation section.
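The fixed error variance for the single-item predictor follows directly from the formula Var(A)(1 − ρ_A) and the midpoint reliability. A minimal sketch, assuming hypothetical yes/no answers and the population variance (the text does not say which variance estimator was used):

```python
from statistics import pvariance


def fixed_error_variance(observed, reliability):
    """Error variance to fix for a single indicator: Var(A) * (1 - rho_A)."""
    return pvariance(observed) * (1.0 - reliability)


# Midpoint of the reliability range reported for single-item measures.
rho_a = (0.14 + 0.68) / 2  # 0.41

# Hypothetical answers to 'Are you able to program?' (0 = no, 1 = yes).
ability_to_program = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]

error_variance = fixed_error_variance(ability_to_program, rho_a)
```

With 30% 'yes' answers, the observed variance is 0.21, so the error variance would be fixed at 0.21 × 0.59 ≈ 0.124 in this toy example.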

Missing data and outliers
The students who were present during the data collection fully answered the questions. In t1 and t2, absence of data can be attributed to potentially random events, mainly sick leave. In t3, an entire class (n = 23) dropped out due to scarcity of time. Since the dropout was not caused by CT-related reasons, e.g., poor performance in the preceding tests, we regard this data as missing at random. The final sample comprises N = 202, N = 198, and N = 177 students in t1, t2, and t3, respectively. Following the simulation study of Shin, Davison, and Long (2017), we relied on full information maximum likelihood estimation to handle the missing data. We also checked for outliers by means of Mahalanobis distances (Penny, 1996); no student was excluded.

Longitudinal data analysis
To answer the research question, we used latent growth curve modeling by means of the R-package lavaan 0.6-3 (Rosseel, 2012). Since our data is clustered (students nested in classes) and we are interested in the individual level, statistical inferences are based on cluster-robust standard errors (McNeish, Stapleton, & Silverman, 2017). In light of this, we report χ² statistics after considering the Satorra-Bentler correction: SB-χ² (Satorra & Bentler, 2010). First, we created an intercept only model: the intercept as a latent variable is measured by the CTt scores of t1-t3 with all three loadings restricted to one corresponding to the three time points (August, January, and June). Second, we created an unconditional linear growth model by further inserting a slope as a latent variable. To this end, we constrained the loadings on the intercept to one; the loadings of the slope are constrained to 0, 1, and 2, also corresponding to t1-t3. By comparing the intercept only model with the linear growth model, we can gauge if there is significant change in CT over the school year. Third, we assessed if the assumption of a linear growth is justified. In order to do this, we restricted the loadings on the slope to 0 in t1 and 1 in t3; the loading for t2 is freely estimated. An estimated loading of greater (smaller) than 0.5 would imply that the expected growth from t1 to t2 is greater (smaller) than the growth from t2 to t3.
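The role of the fixed and freed loadings can be made concrete with the model-implied means of a latent growth curve model: the expected score at each time point is the intercept mean plus the slope mean times that time point's slope loading. The sketch below is not lavaan code and estimates nothing; all numeric values are hypothetical and the function names are introduced for illustration.

```python
INTERCEPT_LOADINGS = [1.0, 1.0, 1.0]     # intercept loads 1 on t1, t2, t3
LINEAR_SLOPE_LOADINGS = [0.0, 1.0, 2.0]  # linear growth across t1, t2, t3


def implied_means(intercept_mean, slope_mean, slope_loadings):
    """Model-implied mean at each time point."""
    return [li * intercept_mean + ls * slope_mean
            for li, ls in zip(INTERCEPT_LOADINGS, slope_loadings)]


def freed_slope_loadings(lambda_t2):
    """Loadings for the linearity check: 0 at t1, free at t2, 1 at t3."""
    return [0.0, lambda_t2, 1.0]


def share_of_change(lambda_t2):
    """Share of the total t1-t3 change that falls before and after t2."""
    return {"t1_to_t2": lambda_t2, "t2_to_t3": 1.0 - lambda_t2}


# Hypothetical values: intercept mean 50, growth of 2 points per interval.
linear_means = implied_means(50.0, 2.0, LINEAR_SLOPE_LOADINGS)

# Hypothetical freed-loading model: total change of 4 points, 70% before t2.
freed_means = implied_means(50.0, 4.0, freed_slope_loadings(0.7))
shares = share_of_change(0.7)
```

In the freed-loading parameterization the slope mean represents the total change from t1 to t3, so an estimated t2 loading of 0.7 would mean that 70% of the change occurs between t1 and t2; a loading of exactly 0.5 reproduces linear growth.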
To examine the association of the predictors with the level (intercept) and growth (slope) of CT, we utilized a two-step strategy (Retelsdorf et al., 2011). First, each predictor was individually included in a simple conditional model to test its association with level and growth. Fig. 2 depicts the simple conditional model.
Second, we tested all predictors at once in a full conditional model to determine which ones uniquely contribute to explaining level, as well as growth, and to test the hypotheses. Table 2 shows the Pearson correlations (r) of the considered variables as well as their mean or frequency and SD. The ISEI equals on average 57.34 (SD = 25.11) for the father and 51.60 (SD = 24.63) for the mother. In the median household, 101-200 books are available. The median student uses the computer 31-60 min per day. The largest correlation with CT appears for CT self-concept in t1 and t2 and for computer literacy in t3.

Longitudinal data analysis
The intercept-only model yielded an insufficient fit (Hu & Bentler, 1999): SB-χ²(4) = 9.21, p = .056, CFI = 0.900, TLI = 0.925, RMSEA = 0.185, SRMR = 0.084. The unconditional linear growth model yielded a significantly better fit: ΔSB-χ²(3) = 8.85, p = .031. A non-linear growth model did not yield a significant improvement in fit (p = .941); hence, we used a linear growth model for all our analyses. In this model, the slope is negative but not significantly so (b = −1.70, p = .067). Since the mean of CT in t1 (intercept) is standardized to 50 with an SD of 10, this equals a non-significant decrease in CT of about one-sixth of the standard deviation in t1. We will address this finding in the discussion section. The variance of the intercept equals 73.92, and the variance of the slope equals 2.52. Intercept and slope are not significantly correlated (r = 0.08, p = .896). The full conditional model showed a decent fit: SB-χ²(27) = 29.44, p = .340, CFI = 0.992, TLI = 0.984, RMSEA = 0.027, SRMR = 0.039. Table 3 presents both the simple and the full conditional model. The prediction for the initial level can be found at the top and the prediction for growth at the bottom. Table 4 summarizes which hypotheses were accepted or rejected based on the full conditional model. It also indicates the cumulative explained variance grouped by the three areas of the conceptual framework.
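The reported figures can be cross-checked with a few lines of arithmetic. The following minimal sketch is not part of the original analysis (which was run in lavaan); it only recomputes the p-values from the reported test statistics and degrees of freedom, and expresses the slope in units of the t1 standard deviation. The function name `chi2_sf` and the dictionary labels are our own.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    with y = x / 2, starting from Q(1, y) = exp(-y) for even df and
    Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# (test statistic, degrees of freedom) as reported in the text
REPORTED = {
    "intercept-only model": (9.21, 4),
    "linear vs. intercept-only (difference test)": (8.85, 3),
    "full conditional model": (29.44, 27),
}
p_values = {name: chi2_sf(stat, df) for name, (stat, df) in REPORTED.items()}

# CT at t1 is standardized to mean 50 and SD 10; the slope is b = -1.70
INTERCEPT, SLOPE, SD_T1 = 50.0, -1.70, 10.0
slope_in_sd = SLOPE / SD_T1
# Model-implied means at t1-t3 with slope loadings 0, 1, 2
implied_means = [round(INTERCEPT + SLOPE * t, 2) for t in (0, 1, 2)]

if __name__ == "__main__":
    for name, p in p_values.items():
        print(f"{name}: p = {p:.3f}")
    print(f"slope per time step in SD units: {slope_in_sd:.2f}")
    print(f"model-implied means t1-t3: {implied_means}")
```

A slope of −0.17 SD per time step is what the text describes as a decrease of about one-sixth of the t1 standard deviation.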

Overall findings
The students in the sample had 90 min of computer science instruction per week during the school year of the investigation. Surprisingly, we observed a non-significant negative growth of CT. Cohen's d equals 0.16 from t1 to t2 and 0.20 from t2 to t3, which would be a marginal and a small effect, respectively. The reason for this finding may be a decrease in test motivation during the school year. As a robustness check, we formed a proxy for lack of test motivation by means of two items measuring the construct amotivation on a seven-point rating scale, captured at all three time points (Prenzel, Drechsel, & Kramer, 1998): 'When performing the tasks, I disengaged' and 'My mind was elsewhere when I was performing the tasks.' Cronbach's alpha is greater than 0.74 at all three time points. We observed an increase in lack of test motivation from t1 (M = 3.36, SD = 1.66) to t2 (M = 4.18, SD = 1.47) to t3 (M = 4.75,  Computer science was newly introduced as a curriculum subject in the school year of our investigation. Implementing a new curriculum is a complex endeavour that takes time (Altrichter, 2005). Hence, the ideas of the curriculum had potentially not yet been fully implemented at that time. Moreover, a one-year period may not be long enough to observe substantial change; learning might not be a linear process for all students.

Antecedents: student
Females show CT scores that are about one standard deviation lower than those of males (simple conditional model). Since there is evidence for the fairness of the CTt (Guggemos, Seufert, & Román-González, 2019), this finding may not be the result of a lack of test fairness. More than 50% of the gender difference can be attributed to females' lower CT self-concept, lower computer literacy, and lower self-determined motivation (simple vs. full conditional model). However, even when considering all covariates in the full conditional model, a negative association of 'female gender' and CT level remains. Román-González et al. (2017) argued that the CT gender gap may increase with age. This, indeed, may be consistent with the available evidence: the ICILS reported only minor gender differences for 8th graders; our sample of 11th graders revealed a substantial gap. The remaining question is which variables, such as personality traits, might explain these findings. We agree with Shute et al. (2017) that gender differences deserve further research.
Language skills do not show a positive association with the CT level in the simple conditional model or in the full conditional model. This finding is contrary to Román-González et al. (2018). Since the study of those authors also relies on the CTt for measuring CT, one reason could be that the students in our sample show a sufficient level of language skills to perform the CTt items and, above this level, language skills no longer have a positive influence on CTt performance. To support this line of argumentation, we split our sample into a low and a high language skill group using the median (= 5). Since we did not find a (stronger) positive association among the students in the low language skill group in comparison to the high language skill group, all students in our sample may have sufficient language skills to perform the CTt items. On a more general level, however, further research may be necessary to obtain a better understanding of how exactly language skills positively influence CT. Regarding the ability to program, the association with CT level is only significant in the full conditional model and if measurement error is considered (b = 5.58, p = .024). If measurement error is not taken into account, the association is not significant (b = 2.13, p = .068). As a robustness check, we carried out a sensitivity analysis. It indicated that up to an unrealistically high reliability of 0.7, the association remains significant at the 5% level. Overall, however, the relationship between text-based programming and block-based programming, as used in our CT test, may not be as straightforward as often assumed. Programming skills required in text-based languages only partially overlap with those required in block-based languages (Mladenović, Boljat, & Žanko, 2018; Weintrop & Wilensky, 2017). The association between computer literacy and level of CT is statistically significant in the simple conditional model, but not in the full conditional model. The main reason is the inclusion of CT self-concept and reasoning skills in the full conditional model; both are positively correlated with computer literacy.
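To illustrate the sensitivity analysis for the single-item measure of ability to program mentioned above, the following sketch shows one common way to account for measurement error in a single indicator: the error variance of the item is fixed to (1 − reliability) × observed variance, and the model is re-estimated for a range of assumed reliabilities. The text does not describe the exact procedure used in the study, so this approach is an assumption; the share of students answering 'yes' (0.25) and the reliability grid are purely hypothetical values chosen for illustration.

```python
def binary_item_variance(share_yes: float) -> float:
    """Variance of a dichotomous (0/1) item with the given share of 1s."""
    if not 0.0 <= share_yes <= 1.0:
        raise ValueError("share_yes must lie in [0, 1]")
    return share_yes * (1.0 - share_yes)


def fixed_error_variance(observed_variance: float, reliability: float) -> float:
    """Error variance implied by an assumed reliability of a single indicator."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return (1.0 - reliability) * observed_variance


# Hypothetical share of students answering 'yes' (not reported in the text)
SHARE_YES = 0.25
item_variance = binary_item_variance(SHARE_YES)

# Hypothetical grid of assumed reliabilities; 0.7 is the upper bound
# mentioned in the text, the lower values are illustrative only.
reliability_grid = [0.3, 0.4, 0.5, 0.6, 0.7]
error_variances = {
    rel: fixed_error_variance(item_variance, rel) for rel in reliability_grid
}

if __name__ == "__main__":
    for rel, err in error_variances.items():
        print(f"assumed reliability {rel:.1f} -> fixed error variance {err:.4f}")
```

Each fixed error variance would then enter the growth model as a constraint on the residual variance of the single indicator, and the association with the CT level would be re-tested for every assumed reliability.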
With regard to motivation, both components, self-concept and self-determined motivation, are significant predictors of the CT level. CT self-concept has the strongest association with the level of CT in both the simple and full conditional model. In sum, these findings may indicate the usefulness of the EVM for capturing motivation in CT research. Considering motivation might be especially important for the understanding of gender differences. Simple comparisons may overestimate the effect of gender because a substantial part is explained by female students' lower CT self-concept and self-determined motivation.
uniquely predict the CT level, specific circumstances during the school year could be the reason: CT may be instructed in a way that allows students with higher levels of computer literacy to benefit more from instruction. In fact, students worked with computers or digital devices almost all the time during the computer science lessons. A multigroup analysis that compares students with low (0-6 correct answers in the computer literacy test), medium (7-12 correct answers), and high (13-20 correct answers) computer literacy revealed a significant positive slope for the high computer literacy group: b = 1.92, p = .021. The slopes of the medium and the low group are not significantly different (p = .618). For the pooled medium/low group, the slope equals b = −2.14 (p = .014). In both groups, the estimated slopes are higher when amotivation is considered as a time-varying covariate: b = 6.45, p < .001 (high computer literacy group) and b = 1.23, p = .385 (medium/low computer literacy group).
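The grouping used in the multigroup analysis can be sketched as follows. The cut-offs (0-6, 7-12, 13-20 correct answers) are taken from the text; the function names and the example scores are our own and serve illustration only.

```python
def literacy_group(correct_answers: int) -> str:
    """Assign a computer literacy group from the number of correct answers (0-20)."""
    if not 0 <= correct_answers <= 20:
        raise ValueError("the computer literacy test score must lie in 0..20")
    if correct_answers <= 6:
        return "low"
    if correct_answers <= 12:
        return "medium"
    return "high"


def pooled_group(correct_answers: int) -> str:
    """Pool the medium and low groups, whose slopes did not differ significantly."""
    return "high" if literacy_group(correct_answers) == "high" else "medium/low"


if __name__ == "__main__":
    # Hypothetical scores, one per boundary of the three groups
    for score in (0, 6, 7, 12, 13, 20):
        print(score, literacy_group(score), pooled_group(score))
```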

Antecedents: home environment
Home environment comprises SECS and migration background. These variables are (in line with large-scale studies like ICILS and PISA) negatively correlated (r = −0.37, p < .001). Neither predictor shows a significant association with the CT level in the simple conditional model, but both do so in the full conditional model. Concerning SECS, this can mainly be attributed to its negative correlation with self-determined motivation. For migration background, there is no specific reason. The finding of a positive association of SECS with the level of CT is in line with the ICILS. Contrary to our hypothesis, however, a migration background has a positive association with the CT level in our study. The explanation might be that general reasons for migrant students' lower performance, like language problems, are not pertinent in the context of our study. Our sample comprises students from the 'Gymnasium Helveticum', where students are selected by means of a rigorous admission test covering, among others, language and mathematical skills. In terms of reading, the 2018 PISA study showed, for Switzerland, that migrants do not perform worse if control variables like SECS are considered (Konsortium PISA.ch, 2018). In our case, students with a migration background might perform even better than students without such a background because these students, in our specific circumstances, may be very competitive and from families with a high affinity towards education. Overall, consistent with the ICILS, home environment seems to be an important variable when investigating CT. As a robustness check, we also used a different measure for SECS. Instead of considering both parents' ISEI, we used the highest ISEI of father and mother along with the number of books to form SECS. With one exception, the results are identical in terms of sign and significance at the 5% level: with the alternative measure, SECS is significantly positively associated with CT growth (β = 0.33, p = .035). In the specification reported in Table 3, home environment variables showed no significant association with CT growth.

Processes: learning opportunities
Learning opportunities in informal and formal settings have a positive association with growth of CT. The finding of a positive association between duration of computer use and CT might confirm the results of Werner et al. (2012) and complement the work of Durak and Saritepeci (2018), which did not find a positive influence from the general use of ICT. Overall, CT learning seems to be related to computing because duration of computer use and computer literacy are both positively associated with CT growth. Those two predictors are positively correlated (r = 0.24, p < .001). The perceived formal learning opportunities during the school year do not predict CT growth in the simple conditional model, but they do in the full conditional model. The main reason for this increase in the coefficient is the inclusion of computer literacy, which is negatively correlated with perceived learning opportunities during the school year.
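Why a coefficient can increase once a negatively correlated covariate is included may be illustrated with the textbook formula for a standardized regression with two predictors. This is a deliberate simplification and not the latent growth model estimated in the study; the indices are our own labels, with y for CT growth, 1 for perceived learning opportunities, and 2 for computer literacy.

```latex
\begin{equation}
  \beta_{y1\cdot 2} \;=\; \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^{2}}
\end{equation}
```

If computer literacy is positively related to growth (r_{y2} > 0) and negatively related to perceived learning opportunities (r_{12} < 0), the product r_{y2} r_{12} is negative, so the numerator exceeds r_{y1} while the denominator is smaller than one. For a non-negative bivariate association r_{y1}, the partial coefficient is therefore larger than r_{y1}.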

Limitations
Our study is not without limitations. We discuss them following Cachero, Barra, Melia, and Lopez (2020) by addressing internal, external, construct, and conclusion validity.
Compared to randomized field experiments as the gold standard, internal validity is suboptimal. The reported associations cannot be interpreted as causal because we cannot rule out that variables not considered in our conceptual framework cause the associations. We tried to reduce this risk by a comprehensive review on variables that influence CT from a theoretical point of view. Another threat to the internal validity could be the drop out of one school class in t3. This drop out, however, was not due to the results in the preceding tests; the data may be missing at random.
A threat to external validity could be our sample. It is narrow in scope, as it comprises only students from German-speaking Switzerland and from one type of school. This sample is not representative of high-school students in general. Replication studies would be necessary to make general claims about those students. Moreover, computer science was newly introduced as a curriculum subject in the school year of investigation. Since implementing a curriculum is a complex and time-consuming endeavour (Altrichter, 2005), the results on CT growth could be different if the study were replicated in the future. Finally, it is doubtful to what extent our findings on CT can be generalized to text-based programming, given its only partial overlap with block-based programming languages.
In terms of construct validity, CT as the dependent variable is measured using the CTt, a unidimensional performance test that is by nature narrow in scope. Concerning Brennan and Resnick's (2012) framework, the CTt focuses mainly on computational concepts, partly on computational practices, and neglects computational perspectives. It is unlikely that the complex nature of CT can be captured by using a single instrument. Rather, a system of assessment tools may be necessary in order to gain a comprehensive picture of students' CT (Tang, Yin, Lin, Hadad, & Zhai, 2020). In terms of the predictors, ability to program was captured with only one question: 'Are you able to program (e.g., Java or Python)?' This question may be hard to answer because there is no clear-cut definition of ability to program. Moreover, a dichotomous item may be suboptimal because programming ability is probably a continuum. Using a more elaborate self-assessment instrument or a performance test in future research would be beneficial.
To tackle the inherent unreliability of our measure, we can only rely on a very rough approximation for its measurement error. However, up to an unrealistically high reliability of 0.7 of the binary single item, the association with the CT level remains significant at the 5% level. Besides this, self-reported grades could suffer from a self-serving bias (Campbell & Sedikides, 1999) and students may not be able or willing to fairly assess themselves.
Conclusion validity may be impaired as we focused on the individual level. Learning (of CT), however, is embedded in classrooms, schools, and an educational system (Fraillon et al., 2019). These higher-level variables may moderate relationships at the individual level. In particular the consideration of the class and the school level by means of a multi-level analysis could provide further insight into the level of CT and its growth. Nevertheless, our statistical inferences may be valid because we relied on cluster-robust standard errors (McNeish et al., 2017).

Implications
Overall, computer science instruction may be conducive to CT because there is a positive association between the number of computer science lessons attended in the past and CT. However, although the students in the sample attended 90 min of computer science instruction per week, they reported low levels of learning opportunities (M = 2.63, SD = 0.93 on a seven-point scale). Moreover, if decreasing test motivation is not considered, CT growth over the school year was actually (non-significantly) negative. As Cachero et al. (2020) claim, CT may not develop naturally. Rather, a deliberate effort may be necessary for effectively fostering CT (within computer science instruction).
Motivation plays a crucial role in predicting the CT level. Hence, when selecting a method for CT instruction, motivational aspects may be considered. In this vein, Repenning (2017) argues that professional programming languages may demotivate students and it might be preferable to rely on visual programming languages in CT instruction. Professional (text-based) programming languages might be regarded as too difficult (lack of CT self-concept) and not attractive enough (lack of self-determined motivation). As our research might indicate, both components of the EVM should be considered when selecting CT instructional means and carrying out research that compares text-based and block-based programming (Mladenović et al., 2018; Weintrop & Wilensky, 2017).
Initiatives that involve females in CT may be necessary because the gender gap does not seem to decrease with age. Rather, the opposite may be the case. Since CT is an important 21st-century skill prevalent in all areas of study and in the workplace (Yadav et al., 2018), initiatives to tackle this issue may be required. In terms of instruction, game-based approaches could be promising for engaging female interest, as well as students from minority groups (Repenning et al., 2015). Efforts involving female students in CT activities may start at the elementary level (e.g., Luo, Antonenko, & Davis, 2020).

Conclusion and outlook
Our study has provided new insight into predictors of CT because we used a sample of high-school students, an under-researched area. Using a longitudinal design allowed us to investigate predictors of CT level and growth in a naturalistic classroom setting. An explained variance of 70.4% for CT level and 61.2% for CT growth might indicate a good trade-off between the comprehensiveness and parsimony of our framework. Further (qualitative) research could aim at identifying more, not as yet considered, predictors of CT.
In terms of predictors, we showed that gender is important for explaining the level of CT within a high-school context. Language skills seem to have no influence either on the level or growth of CT. Since mathematical skills can predict CT level, CT may be more closely related with mathematical than with language skills. Concerning the ability to program, we find a significant association with CT if measurement error is considered. We suggest using a performance test or an elaborate self-assessment instrument for measuring programming ability in future research. Motivation, in the form of CT self-concept and self-determined motivation, plays an important role in explaining CT level and gender differences. The duration of computer (PC and laptop) use has a positive association with CT level and growth. Further research could investigate what kind of activities carried out during computer use are conducive to CT. Moreover, we showed that perceived learning opportunities during the school year under investigation are positively associated with CT growth. Here, follow-up questions should include the kind of activities and courses responsible for such a perception.

Acknowledgement
The author has received a postdoctoral fellowship for carrying out this research from the Basic Research Fund at the University of St.Gallen (no. 1031542). Our thanks go to the coordinators and administrators at the three schools that participated in this study. We also thank Marcos Román-González who shared with us the Computational Thinking Test (CTt) and was involved in the process of adapting the CTt for this study. Finally, we are very much indebted to Frank Fischer, Michael Sailer, and Sabine Seufert for their help and insightful suggestions during the conduct of this research. Readers who are interested in replicating the study may contact the author.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.compedu.2020.104060.