Predictive and incremental validity of Students’ Learning Approach Test (SLAT-Thinking)

SLAT-Thinking is the only test that evaluates and distinguishes stages of approaches through performance. Although SLAT-Thinking shows evidence of internal validity, its external validity has not yet been examined. In this paper we study the predictive and incremental validity of SLAT-Thinking. Two models were tested. The predictors were inductive reasoning, the SLAT-Thinking approaches and the Learning Approaches Scale (EABAP) approaches. The outcome was the score on the Brazilian large-scale exam that evaluates students who finish secondary education. In both models, the superficial approach of SLAT-Thinking was the main predictor, followed by inductive reasoning. The deep and superficial approaches of SLAT-Thinking were positively associated with academic achievement, while the deep intermediate approach was negatively associated with the outcome. Non-linear relationships (positive and negative associations) with the outcome were found for the two EABAP approaches and for inductive reasoning. This study shows evidence of predictive and incremental validity of SLAT-Thinking.


Introduction
The Learning Approach Test: Identification of Thought Contained in Texts (SLAT-Thinking) is an instrument developed to measure learning approaches. There are two reasons why this instrument is very promising. First, while the other instruments of approaches are self-report tests, SLAT-Thinking is a performance test. Since self-report measures are permeated by biases, such as social desirability and acquiescence, SLAT-Thinking is advantageous for measuring approaches, as it overcomes these biases (Gomes et al., 2020). Second, traditional instruments measure approaches in terms of "all or nothing", that is, without identifying intermediate stages. In contrast, SLAT-Thinking allows the identification of intermediate stages, enabling the measurement of levels of development of the approaches (Gomes et al., 2020).
SLAT-Thinking shows evidence of internal validity. Regarding content validity, judges attested to the content validity of SLAT-Thinking, an expert in Portuguese approved the writing of the test, and 10 people from the target audience certified that both the instructions and the task are easy to understand and execute (Gomes & Linhares, 2018). The structural validity of SLAT-Thinking was investigated in a sample of 622 higher education students from public and private institutions in the areas of biological, exact and human sciences (Gomes et al., 2020). The findings of this study support the validity of a model of approaches with four correlated factors: superficial approach, superficial intermediate approach, deep intermediate approach and deep approach. This model was tested by means of item confirmatory factor analysis, showing acceptable fit (CFI = .946, RMSEA = .037, 95% CI [.037, .042]) and average factor loadings of .66 (superficial approach), .34 (superficial intermediate approach), .41 (deep intermediate approach) and .50 (deep approach). This study also showed that SLAT-Thinking produces reliable scores for superficial (.82), deep intermediate (.66) and deep (.69), according to Cronbach's alpha, and reliable scores for superficial approach (.61), according to McDonald's omega. Evidence of configural, metric and scalar invariance in two different samples was also supported in this study.
Despite the evidence of internal validity, SLAT-Thinking has not yet been studied for external validity. External validity is an important step, as it assesses whether or not the test is capable of producing empirical evidence that supports theoretically established relationships between the construct measured by the test and related variables. The theory of learning approaches postulates that the deep approach provides more effective learning, while the superficial approach provides poor quality learning. Meta-analytic studies show that approaches are associated with academic performance (Richardson et al., 2012; Watkins, 2001). Watkins (2001) found mean correlations of .14 and -.18 between academic performance and the deep and superficial approaches, respectively. Richardson et al. (2012) found similar results, identifying mean correlations of .16 and -.11 between academic performance and the deep and superficial approaches, respectively. Despite the small effect sizes, these studies show that there is a predictive relationship between approaches and academic performance. For this reason, this study examines whether SLAT-Thinking predicts academic performance.
Revista Interamericana de Psicología/Interamerican Journal of Psychology 2023, Vol. 57, No. 3, e1514
Intelligence is one of the variables traditionally studied as a predictor of academic performance. This study examines whether SLAT-Thinking still improves the prediction of student performance when intelligence is included as a control variable. This study also examines whether SLAT-Thinking adds to the prediction of academic performance when a traditional self-report measure of approaches is included as a control variable. In summary, we expect that SLAT-Thinking will predict academic achievement and provide incremental validity over traditional self-report instruments and intelligence.
Regression (Soper, 2023) to this model, using a type I error rate of 5%. We found a type II error rate of 2.8% for the estimated R², which is considerably lower than the standard reference criterion of 20% (Banerjee et al., 2009).

Data collection procedures
The data used in this study come from two independent sampling collections carried out in 2019.
and Z (there is no way to answer). There is only one correct alternative; each hit is recorded as 1 and each error is recorded as 0.
BAFACALO includes 18 tests that measure both the general factor and six broad factors of the Cattell-Horn-Carroll (CHC) model. BAFACALO shows evidence of external and internal validity (Alves et al., 2012; Gomes, 2010a, 2010b, 2011b, 2012; Gomes & Borges, 2009a, 2009b, 2009c; Gomes, de Araújo, et al., 2014; Gomes & Golino, 2012a, 2012b, 2015). TDRI derives from the Inductive Reasoning Test, a BAFACALO test that measures inductive reasoning. The Inductive Reasoning Test items have groups of letters ordered according to a rule. The respondent's task is to discover the rule and point out the alternative that does not follow it. TDRI is structured similarly to the Inductive Reasoning Test. The difference is that, in addition to the CHC model, it takes the hierarchical complexity model as a reference. By combining a psychometric model of intelligence with a hierarchical complexity model, TDRI allows the identification of different stages of intelligence development (Golino, Gomes, Commons, et al., 2014).
TDRI measures seven stages of inductive reasoning: single representation, representational mapping, representational system, single abstraction, abstract mapping, abstract system and metasystematic. Each stage is assessed using eight items. Each item has groups of letters ordered according to a rule. The respondent must discover the rule and indicate the alternative that does not follow it. There is only one correct alternative; each hit is recorded as 1, while each error is recorded as 0. TDRI has evidence of validity and reliability in Brazilian samples (Golino & Gomes, 2019; Golino, Gomes, et al., 2014; Gomes, Golino, et al., 2014). The model tested in this study had a sample of 488 students and is characterized by a general factor of inductive reasoning and four specific factors, all orthogonal to each other. The general factor explains the variance of the 56 items and each specific factor explains 8 items. These specific factors represent the first four stages of TDRI. Stages 5, 6 and 7 were not defined in the model due to the limitations of the sample: few people got the items in these stages right, making their identification unfeasible.

Learning Approach Scale (Escala de Abordagem de Aprendizagem - EABAP)
EABAP is a self-report questionnaire that measures learning approaches in people who have at least incomplete elementary education. The instrument consists of 17 items that represent motivations and strategies related to classroom and study. EABAP includes 8 items that measure the superficial approach and 9 items that measure the deep approach. The respondent must indicate how much each behavior is present in their life, answering on a Likert-type scale from 1 to 5, where 1 represents "not at all" and 5 represents "totally".
EABAP shows evidence of validity and reliability in Brazilian samples of primary and secondary education (Gomes, 2010c, 2011a, 2013; Gomes & Golino, 2012b; Gomes et al., 2011). The EABAP model with the two correlated factors had a sample of 648 students and showed an acceptable fit (CFI = .968 and RMSEA = .073, 95% CI [.066, .079]). The deep approach and the superficial approach correlated at -.60 and had mean factor loadings of .64. The scores were found to be reliable, with Cronbach's alpha of .84 and .86 and McDonald's omega of .83 and .86 for the superficial and deep approaches, respectively.
In turn, the superficial approach of EABAP includes eight items. If the respondent selects the value of 1 on the Likert-type scale for all these items, then their score will be 8/8 = 1.
TDRI scores consist of the sum of correct answers divided by eight (the number of items per stage). For example, if the respondent hits 40 items, then their score is 40/8 = 5. This score indicates the respondent's stage of inductive reasoning. The overall ENEM score can vary from 0 to 1000, so that the higher the score, the better the performance.
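The two scoring rules described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the EABAP rule is assumed to be the mean of the item responses, as implied by the 8/8 = 1 example.

```python
def eabap_scale_score(responses):
    """Mean of the Likert responses (1 to 5) for one EABAP scale (assumed rule)."""
    if not responses or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be a non-empty list of integers from 1 to 5")
    return sum(responses) / len(responses)


def tdri_score(item_hits, items_per_stage=8):
    """Sum of correct answers (1 = hit, 0 = error) divided by the items per stage."""
    if any(h not in (0, 1) for h in item_hits):
        raise ValueError("each item must be scored 0 or 1")
    return sum(item_hits) / items_per_stage


# A respondent who marks 1 on all eight superficial-approach items.
superficial = eabap_scale_score([1] * 8)

# A respondent who hits 40 of the 56 TDRI items.
stage = tdri_score([1] * 40 + [0] * 16)
```

With these inputs the functions reproduce the two worked examples in the text, a score of 1 for the EABAP scale and a score of 5 for TDRI.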
The first stage includes the descriptive statistics of the variables. In this step, the mean, the standard deviation, the minimum and maximum values, the kurtosis, the skewness and the correlation matrix are presented.
The second stage of the analysis examines, by means of multiple linear regression, whether or not SLAT-Thinking predicts academic performance. Versions 0.5.3 and 3.0.9 of the olsrr (Hebbali, 2020) and car (Fox & Weisberg, 2019) variables. The forward stepwise method was used in order to select only the variables that increase the explanation of the dependent variable. The variance inflation factor (VIF) was inspected to examine multicollinearity. The Shapiro-Wilk test, kurtosis and skewness were used to assess the normality of the residuals, the Bonferroni outlier test was used to examine the presence of outliers, and the score test was used to assess homoscedasticity.
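The multicollinearity check can be illustrated for the simplest case. When a model has only two predictors, the VIF of each one reduces to 1 / (1 - r²), where r is the correlation between them. The sketch below is written in Python rather than with the R packages used in the study, the function name is hypothetical, and the input is the correlation of .14 between the superficial approach of SLAT-Thinking and inductive reasoning reported in the results.

```python
def vif_two_predictors(r):
    """VIF shared by both predictors in a two-predictor model with correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


# Correlation between the SLAT-Thinking superficial approach and TDRI (r = .14).
vif = vif_two_predictors(0.14)
print(round(vif, 2))
```

A value this close to 1 would indicate that the two predictors share almost no variance, so their coefficients are not inflated by collinearity.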
In the third step of the analysis a regression tree is fitted, using the rpart package (Therneau & Atkinson, 2019) and involving the same variables as the previous step. The regression tree model does not assume that the relationships between variables are necessarily linear, nor does it rely on assumptions about the data, such as normality of the dependent variable, homoscedasticity and independence of predictors. Although the literature suggests the use of pruning to minimize overfitting (Osei-Bryson, 2008), this procedure underestimates the correct number of leaves in small samples and overestimates it in large samples. For this reason, this procedure was not used. For more details on the CART algorithm, see Gomes and Almeida (2017), Gomes and Jelihovschi (2019, 2020), Gomes, Amantes, et al. (2020), and Gomes, Lemos, et al. (2020).
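To make the idea of a regression tree more concrete, the sketch below shows how a single split can be chosen: every candidate cutoff on one predictor is tried, and the cutoff that minimises the summed squared error of the two resulting groups is kept. This is a simplified illustration in Python, not the rpart implementation, and the four data points are invented for the example.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (cutoff, total_sse) for the best binary split on one predictor."""
    pairs = sorted(zip(x, y))
    best_cutoff, best_total = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cutoff can separate identical predictor values
        cutoff = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [outcome for _, outcome in pairs[:i]]
        right = [outcome for _, outcome in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            best_cutoff, best_total = cutoff, total
    return best_cutoff, best_total


# Invented toy data: a predictor score and an outcome score for four people.
predictor = [0.50, 0.75, 0.92, 1.00]
outcome = [500, 510, 600, 610]
cutoff, total = best_split(predictor, outcome)
```

A full tree repeats this search over all predictors and then recursively inside each resulting group, which is how it can capture non-linear relationships without assuming them in advance.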

Results and discussion
The results of the descriptive analyses can be seen in Table 1. The mean of the deep self-reported approach is higher than the mean of the superficial self-reported approach, since the 95% confidence interval of the deep approach (3.82 ± 0.03 * 1.96 = 3.76, 3.88) does not overlap with the confidence interval of the self-reported superficial approach (2.62 ± 0.03 * 1.96 = 2.56, 2.68). This indicates that the participants in this study perceive themselves as more deep than superficial. In addition, as their average score is 3.82, and this value is very close to point 4 of the scale, it can be said that the participants perceive deep behaviors as frequent in their repertoire. In turn, the participants believe that the superficial approach behaviors are moderately present in their repertoire. The average of the performance in superficial approach is greater (.
The ENEM global score correlates positively and weakly with the superficial approach of SLAT-Thinking and with inductive reasoning (TDRI). Among the dimensions of SLAT-Thinking and EABAP there is only one statistically significant correlation, between the deep approach of EABAP and the superficial approach of SLAT-Thinking (r = .10). This indicates that the dimensions of the two instruments are orthogonal to each other. This orthogonality is not a problem, as EABAP evaluates approaches in a broad context that involves the diversity of study and learning behaviors in the classroom. In turn, SLAT-Thinking assesses the individual's approaches to identifying the author's thinking in a given text. Since the superficial approach of SLAT-Thinking correlates positively with the overall score of ENEM (r = .25) and with inductive reasoning (r = .14), and as we contrast this result with the negative correlation between the superficial approach of EABAP and inductive reasoning (r = -.20), it can be seen that the superficial approach derived from performance is different from the self-reported superficial approach. If the motivation in the self-reported superficial approach involves low engagement, this does not necessarily occur in the superficial approach measured by performance. Unlike in the meta-analyses presented in the introduction to this article, no statistically significant correlations were found between self-reported approaches and academic performance. On the other hand, the size of the correlation between academic performance and the superficial approach of SLAT-Thinking is similar to those found in the meta-analyses.
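The comparison of means by confidence intervals used at the start of this discussion can be reproduced with a short Python sketch. The function names are hypothetical; the means and the standard error of 0.03 are the values quoted for the two EABAP scales.

```python
def ci95(mean, se, z=1.96):
    """95% confidence interval as (lower, upper), rounded to two decimals."""
    half_width = z * se
    return (round(mean - half_width, 2), round(mean + half_width, 2))


def intervals_overlap(a, b):
    """True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


deep_ci = ci95(3.82, 0.03)         # self-reported deep approach
superficial_ci = ci95(2.62, 0.03)  # self-reported superficial approach
separated = not intervals_overlap(deep_ci, superficial_ci)
```

Non-overlapping intervals are a conservative criterion: two means can differ significantly even when their intervals overlap slightly, so a lack of overlap is strong evidence of a difference.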
The forward stepwise multiple regression analysis indicated that only the superficial approach of SLAT-Thinking and inductive reasoning explain the ENEM global score.

The model's intercept was 441.67, 95% CI [364.70, 518.64], indicating that a participant who misses all items in the superficial approach of SLAT-Thinking and in TDRI is predicted to obtain an overall ENEM score of 441.67 points. The slope of 123.80, 95% CI [56.96, 190.65], for the superficial approach of SLAT-Thinking indicates that getting all the items in that domain right would add 123.80 points to the overall ENEM score, producing a score of 565.47 points. The slope of 16.76, 95% CI [2.946, 30.56], for inductive reasoning indicates that each mastered stage of inductive reasoning adds 16.76 points to the overall ENEM score.
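The fitted equation can be written as a small Python function to show how the coefficients combine. The function and argument names are hypothetical; the coefficients are the point estimates reported above, and the predictors are assumed to be on the scales described in the text (proportion of superficial items correct from 0 to 1, TDRI score in stages).

```python
INTERCEPT = 441.67         # predicted ENEM score when both predictors are zero
SLOPE_SUPERFICIAL = 123.80  # change for going from 0% to 100% of superficial items correct
SLOPE_TDRI = 16.76          # change per stage of inductive reasoning


def predict_enem(slat_superficial, tdri_stage):
    """Predicted overall ENEM score from the two-predictor linear model."""
    return INTERCEPT + SLOPE_SUPERFICIAL * slat_superficial + SLOPE_TDRI * tdri_stage


# All superficial-approach items right, no TDRI items right.
print(round(predict_enem(1.0, 0.0), 2))
```

Because the model is additive, the contribution of inductive reasoning is simply stacked on top: each additional stage raises the prediction by the same 16.76 points regardless of the SLAT-Thinking score.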
The superficial approach of SLAT-Thinking explained .0592 (adjusted R² = 5.92%) of the variance of the overall ENEM score, while inductive reasoning increased the explanation of variance by .015 (adjusted R² = 1.5%), so that the model explained .0742 (adjusted R² = 7.42%) of the variance of the overall ENEM score. If we take the Cohen (1988) criterion as a reference, and knowing that the R² of 7.42% is equal to an r of .272, we conclude that the variance explained by the model has a weak to moderate size.
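The arithmetic behind these figures can be written out explicitly. The notation below is ours, not the article's; it only restates the reported values.

```latex
\[
R^2_{\text{adj}} = .0592 + .0150 = .0742,
\qquad
r = \sqrt{R^2_{\text{adj}}} = \sqrt{.0742} \approx .272
\]
```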
These results indicate that SLAT-Thinking shows evidence of predictive and incremental validity, insofar as the superficial approach is the main predictor of the ENEM global score.
The tree produced by the CART algorithm is shown in Figure 1. Each oval element in Figure 1 is a terminal node, in other words, a leaf of the tree. Within these elements there is a number that represents the predicted score of the people who are allocated by the CART algorithm to that leaf. For example, the upper left extreme leaf shows the number 510, indicating that the participants who are part of that leaf have, according to the predictive model, a score of 510 points on the overall ENEM score. In turn, the number at the bottom of the oval element indicates the relative percentage of the sample. In our example, the number 7% indicates that 7% of the participants are part of this leaf.
The tree must be read from top to bottom. Let us take again the example of the upper left extreme leaf. To understand the characteristics of this group of people, it is necessary to observe the initial group at the top of Figure 1 and follow the lines that lead to the leaf. The upper node is composed of all persons in the sample. This node was split into two parts using the .92 score of the superficial approach of SLAT-Thinking.

People who scored less than .92 were allocated to the leaves on the left of Figure 1, while people who scored at least .92 were allocated to the leaves on the right. Note that the upper left extreme leaf is made up of people who scored less than .92 in the superficial approach of SLAT-Thinking. However, this leaf on the extreme left is not only characterized by the score in the superficial approach of SLAT-Thinking. Only people with a score below 2.7 points in inductive reasoning are part of this group. In short, this group is characterized by people who do not master the superficial approach and have not reached the representational system stage of inductive reasoning.
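Reading a path from the root to a leaf amounts to chaining conditions. The Python sketch below encodes only the leaf walked through above, assuming the two stated cutoffs fully define it; the function name is hypothetical, and the remaining leaves of the tree are deliberately not reproduced because their rules are not given in the text.

```python
def upper_left_leaf_prediction(slat_superficial, tdri_score):
    """Return the predicted ENEM score (510) if the person falls in the
    upper left extreme leaf, otherwise None (other leaves are not modelled)."""
    if slat_superficial < 0.92 and tdri_score < 2.7:
        return 510
    return None


# Below both cutoffs: the person belongs to the described leaf.
print(upper_left_leaf_prediction(0.80, 2.0))

# At or above the superficial cutoff: the person goes to the right-hand branch.
print(upper_left_leaf_prediction(0.95, 2.0))
```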
CART produced 19 splits, 20 leaves and a relative error of .65, indicating that the model explained 35% of the variance of the ENEM global score. We simulated 100 regression trees with pruning for three different sample sizes, the first with 239 observations (sample size for this analysis), the second with 10,000 observations and the third with 100,000 observations. The simulations considered as true the means and standard deviations of the variables of the tested model, as well as the cutoff points of the tree nodes. The averages of the number of leaves produced by the 100 simulations for each sample size were calculated.
The simulation with 239 observations produced pruned trees with an average of 12.21 (SD = 3.86) leaves, ranging from 2 to 20 leaves.By taking the original tree as the true tree, this result indicates an evident underestimation of the true number of leaves.
The simulation with 10,000 observations produced pruned trees with an average of 53.62 (SD = 12.11) leaves, ranging from 36 to 85. The simulation with 100,000 observations produced pruned trees with an average of 66.5 (SD = 12.11) leaves, ranging from 62 to 71. These results indicate that, for large samples, the pruning procedure overestimates the correct number of leaves relative to the original tree. In short, the simulations indicate that pruning does not estimate the correct number of leaves and does not eliminate the problem of overfitting. We believe that the best way to assess the consistency of the leaves identified by the tree in this study is to investigate whether the model tested in this study replicates in other samples.
In turn, students with the highest ENEM global score (711) report at least moderate levels of both (1) deep approach (D.A. EABAP > 3.4) and (2) superficial approach (S.A. EABAP > 3.5) behaviors in the context of the classroom and study. In short, the group with the highest ENEM score is the group that reports combining the two approaches, maximizing performance. This combination is evidence favorable to the strategic approach, which is in line with Biggs' (1985) argument that the student highly motivated to get good grades combines strategies from both approaches to achieve better performance.
Since it is the first variable chosen by the CART algorithm to split the data into nodes, the superficial approach of SLAT-Thinking is the most important predictor variable in the model. A higher score in the superficial approach implies a higher overall ENEM score.
The tree shows that having a greater deep intermediate approach to SLAT-
The deep approach of SLAT-Thinking is positively associated with the overall score of ENEM, conditional on the superficial approach of SLAT-Thinking, inductive reasoning, the deep intermediate approach and the superficial approach of EABAP. Getting at least 8.5% of the items in this approach right raises the predicted ENEM score from 573 to 692 points, a difference of 119 points, indicating an increase of more than one standard deviation.
Inductive reasoning has non-linear relationships with the overall score of ENEM. The stages of representational mapping (stage 2), representational system (stage 3), single abstraction (stage 4) and abstract mapping (stage 5) differentiate the overall score of ENEM.
The deep approach of EABAP has non-linear relationships with the overall score of ENEM. There are situations in which reporting a deeper approach in the context of the classroom and study implies a higher ENEM score, while there are situations in which this relationship is reversed. The same is true for the superficial approach of EABAP.

Conclusion
This article examined the predictive and incremental validity of SLAT-Thinking relative to the overall score of ENEM. The superficial approach of SLAT-Thinking was the main predictor and inductive reasoning the second most important predictor, both in the multiple linear regression model and in the regression tree model.

While the multiple linear regression model explained 7.42% of the variance of the overall ENEM score, the tree regression model explained 35% of the outcome variance.
If the regression tree is not substantially overfitted, which can only be known by replicating this study in several other samples, it can be said that the variance explained by the regression tree is much higher than that explained by the multiple linear regression because the predictors have pronounced non-linear relationships with the outcome.
It is important to note that TDRI has 5 multiple-choice options, while SLAT-Thinking has only 3 response options, one of which is very unlikely to be chosen because it is the option in which the respondent claims to have no way to answer. This option is not part of the answer key and has no logical support, and may be selected only by people with a very high superficial approach. Therefore, the respondent has approximately a 50% chance of hitting a SLAT-Thinking item by merely choosing the answer at random, and it is very likely that a great number of participants with high performance in the deep intermediate approach and in the deep approach obtained that performance by chance. This feature of the test should be changed in future versions. A new version of SLAT-Thinking with more response options may reduce this strong chance of hitting at random, making the assessment of deep approach levels more reliable. This would also allow us to analyze whether the lack of correlation between the deep approach and the ENEM scores is related to this characteristic of SLAT-Thinking itself.
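The size of the guessing problem can be gauged with a binomial calculation. The Python sketch below computes the probability of hitting more than half of a set of items purely by chance when each item has a 50% chance of a correct guess, as argued above. The function name is hypothetical, and the choice of six items is only an illustrative assumption, since the number of items per approach is not stated here.

```python
from math import comb


def prob_more_than_half_by_chance(n_items, p_guess=0.5):
    """Probability of getting more than half of n_items right by guessing alone."""
    first_passing = n_items // 2 + 1
    return sum(
        comb(n_items, k) * p_guess ** k * (1 - p_guess) ** (n_items - k)
        for k in range(first_passing, n_items + 1)
    )


# Illustrative case: six items, 50% chance per item.
p_two_options = prob_more_than_half_by_chance(6, 0.5)

# Same six items if a future version offered five equally plausible options.
p_five_options = prob_more_than_half_by_chance(6, 0.2)
```

Under these assumptions roughly a third of pure guessers would exceed the halfway mark with two effective options, and the proportion falls sharply once more options are available, which is the rationale for the proposed revision of the test.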
Despite the aforementioned shortcomings, this study brings evidence of the predictive and incremental validity of SLAT-Thinking. This test increases the prediction of the overall score of ENEM, taking as controls both inductive reasoning and the self-report of approaches.

Figure 1
The results of the regression tree indicated that the superficial and deep approaches of EABAP and inductive reasoning have non-linear relationships with the ENEM scores. This study has a shortcoming, which is the lack of heterogeneity in the sample relative to the performance in SLAT-Thinking approaches and in inductive reasoning. Few participants in the sample of this study answered correctly the items of the deep intermediate and deep approaches and the items of the more advanced stages of TDRI. Only 98 participants hit more than half of the items of the deep intermediate approach, while only 8 participants hit more than half of the items of the deep approach of SLAT-Thinking. Only 61 people reached the abstract mapping stage in inductive reasoning, only one reached the abstract system stage and none reached the metasystematic stage.

with at least incomplete high school and measures four levels of approaches: superficial, superficial intermediate, deep intermediate and deep. The test consists of two texts of similar sizes and 12 items related to each text. Each item consists of a proposition that may or may not represent the author's thinking. The respondent must mark one of three
The model presented an acceptable fit (CFI = .997, RMSEA = .060, 95% CI [.058, .062]) and average factor loadings of .44 (general factor), .90 (stage 1), .78 (stage 2), .73 (stage 3) and .71 (stage 4). TDRI produced reliable scores for the general factor (.94), stage 1 (.99), stage 2 (.97), stage 3 (.97) and stage 4 (.98) according to Cronbach's alpha, and reliable scores for the general factor (.66), stage 1 (.64) and stage 2 (.60) according to McDonald's omega.
80 ± .01 * 1.96 = .78, .82) than the averages of the deep intermediate (.32 ± .01 * 1.96 = .30, .34) and deep (.11 ± .01 * 1.96 = .09, .13). This indicates that the participants in this study predominantly achieve the superficial approach to the ability assessed by SLAT-Thinking. The TDRI average indicates that the sample participants are close to the fourth stage of inductive reasoning, the single abstraction stage. Considering that the ENEM scale is prepared by the National Institute of Educational Studies and Research Anísio Teixeira (INEP) to have an average of 500 and a standard deviation of 100 (BRASIL/INEP, 2011), it appears that the participants in this research had a performance above one standard deviation in relation to the scale average, indicating a performance above the national average.
Statistically significant correlations (p < .05) are highlighted in bold. deep approach from SLAT-Thinking, D.A. SLAT = deep approach from SLAT-Thinking, SD = standard deviation, SE = standard error, min = minimum, max = maximum.
Thinking is associated with a lower overall ENEM score. This result is not what we would expect, since, in theory, the deep intermediate approach should be positively associated with academic performance. It is important to check whether this result is replicated in new studies, or whether it is just a peculiarity or mere randomness of the sample in this study.