Examining the factorial validity of the Individualized Classroom Assessment Scoring System in preschools in Austria

ABSTRACT Reliable information regarding interaction quality in preschools is essential because children’s preschool experiences predict educational success. The Individualized Classroom Assessment Scoring System (inCLASS) is a new tool to assess interaction quality developed in the United States. Research on the validation of the factorial structure of the inCLASS is limited and inconsistent; the structure comprises the factors teacher interactions (positive engagement, teacher communication), peer interactions (peer sociability, peer communication, peer assertiveness), task orientation (task engagement, self-reliance), and conflict interactions (behaviour control, teacher conflict, peer conflict). This study aims to extend knowledge on the factorial validity of the inCLASS in Austrian preschools. The sample consists of 261 children (M = 49.54 months of age) from 81 preschool classes in Tyrol, Austria. Initial confirmatory factor analysis indicated negative residual variances and standardised factor loadings >1 for teacher communication and behaviour control. After modifications, namely fixing the residual variances of teacher communication and behaviour control at zero and allowing the residuals of positive teacher engagement and peer sociability to covary, the model fit was acceptable to good. The findings suggest a revision of the operationalisation of conflictual interactions and a consideration of attachment theory to explain the relationship between teacher interactions and peer interactions.


Introduction
The educational quality of preschools includes structural aspects (e.g. group size, teacher-child-ratio), educational beliefs (e.g. educational goals of preschool teachers, beliefs about the relevance of specific domains), collaboration with parents (e.g. joint activities, parents council), and interactional processes (e.g. interactions between [...] problems in clarifying if and to what extent the previous findings are applicable to other countries with unique preschool systems and educational approaches. This study examines the factorial validity of the inCLASS tool using a sample from Austria; it is the first study on this topic in Austria. After a brief description of the Austrian preschool system, the inCLASS tool will be introduced, and previous research on the factorial validity of the inCLASS will be reported. Thereafter, the study aims and the methodological procedure will be outlined. Finally, the findings are reported and discussed.

The Austrian preschool system
In Austria, preschools have high attendance rates of around 93% for 3- to 5-year-olds and are run by public and private providers. Since 2009, the last year of preschool (i.e. the year before school entrance) has been mandatory (Smidt 2018). A striking difference from many other countries (e.g. United States, several European countries) is that the regular vocational qualification of pedagogues working in Austrian preschools is non-academic (5-year training or a shortened 2-year training for students with university entrance qualification) (Smidt 2018). Only recently have some academic courses in early childhood education (advanced training) been introduced (Hartel et al. 2019). This relatively low level of formal qualification is somewhat in contrast to discussions about the professionalisation of early childhood pedagogues (Smidt et al. 2017) as well as results on the predictive role of preschool teachers' formal level of education for educational quality in preschools (Manning et al. 2019).
In terms of educational practices in preschools, an educational plan with basic pedagogical guidelines (e.g. diversity, empowerment, individualisation) as well as educational domains in which children should be supported (e.g. emotions and social relationships, language and communication, nature and technology) was established in 2009 (Ämter der Landesregierungen der österreichischen Bundesländer, Magistrat der Stadt Wien, and Bundesministerium für Unterricht, Kunst und Kultur 2009). There is a relatively strong separation between preschool and elementary education (Smidt 2018), which may also be reflected in considerations of a 'work-care-reconciliation model' (Scheiwe and Willekens 2009, 13) primarily emphasising non-academic skills.

Previous research on the factorial validity of the inCLASS
The inCLASS was developed in the United States (Downer et al. 2010) and is intended to measure interaction quality at the individual level of the child (Halle, Vick Whittaker, and Anderson 2010). In terms of factorial validity, Downer et al. (2010) used exploratory factor analysis with oblique rotation to establish a four-factor structure comprising nine items: teacher interactions (two items: positive engagement with the teacher and teacher communication), peer interactions (three items: peer sociability, peer communication, and peer assertiveness), task orientation (two items: engagement within tasks and self-reliance), and conflict interactions (two items: teacher conflict and peer conflict). Subsequently, during the revision of the inCLASS, behaviour control, which represents 'a degree of behavioural dysregulation' (Bohlmann et al. 2019, 169), was added as the tenth item and assigned to the factor conflict interactions (Vitiello et al. 2012).
As the inCLASS gained importance in international research to capture interaction quality, attempts have been made to validate the four-factor structure via confirmatory factor analyses. Unfortunately, the findings have been inconsistent and indicate the presence of problems. A German study with 110 children from 38 preschools by von Suchodoletz, Gunzenhauser, and Larsen (2015) replicated the four-factor structure after excluding teacher conflict (conflictual interactions between children and preschool teachers) due to lack of item variance. They presumed that conflictual interactions with preschool teachers are not sufficiently operationalised or are rare in daily preschool life in Germany. More recently, Slot and Bleses (2018) published a study of 184 children from 81 preschool classrooms in Denmark. They replicated the four-factor structure after defining teacher conflict and peer conflict (conflictual interactions between children and peers) as categorical variables due to a lack of variance and after constraining the residual variance of positive engagement with the teacher (children's emotional connection with the preschool teacher, use of preschool teacher as a secure base) to zero. Based on the negative correlations between teacher interactions and peer interactions, the authors concluded that Denmark has 'a different "cultural model" of preschool education' (Slot and Bleses 2018, 75) with less emphasis on teacher-directed interactions and more on interaction with peers. In an American study with 711 children from 220 preschool classrooms, Bohlmann et al. (2019) replicated the four-factor structure after constraining the residual variance of teacher communication (children's initiations and maintenance of conversation with the preschool teacher) to zero and allowing for covariation between the residual variances of behaviour control (children's regulations of movements and vocalisations) and engagement with tasks (children's involvement in activities and tasks). They stated that the latter is based on a conceptual proximity between behaviour control and engagement with tasks.

Study aims
Previous findings indicate that validating the factorial structure of the inCLASS is problematic and that the modifications applied vary in sophistication. Some studies suggest that country-specific and cultural issues may be responsible for the problems that arise in validating the structure (Slot and Bleses 2018; von Suchodoletz, Gunzenhauser, and Larsen 2015). It is therefore important to clarify whether and how far these findings and explanations are transferable to other countries (with specific preschool systems and educational approaches, see Scheiwe and Willekens 2009). This applies particularly to Austria, where there has been no research on this topic so far. This study aims to examine the factorial validity of the inCLASS by using a sample from Austria. It is expected that reliable findings would sensitise researchers to being cautious when applying instruments like the inCLASS in country-specific educational contexts. Further explanations for improper solutions might also be provided.

Participants
This study used data from the longitudinal project 'Quality of Children's Interactions in Preschool' (InKi), which was funded by the Austrian Science Fund (FWF). The statistical analyses refer to the first wave of data collection, which took place from April to June 2019. The sample comprises 261 observed children (131 girls) from 81 preschool classes (from 81 preschools) in the state of Tyrol, Austria. The children were in their first year of preschool and were on average 49.54 months old (SD = 4.25; range = 36.76-58.28). The sample was randomly selected; children with an immigrant background (19 percent had a family language other than German) are slightly overrepresented.

Measures
Interaction quality was measured using the inCLASS (Downer et al. 2010). The inCLASS observation tool was developed to assess children's competence in interactions with teachers, peers, and tasks in preschool classrooms (e.g. the degree to which the child initiates communication with teachers and other children) (Downer et al. 2010; 2012). The inCLASS consists of ten items (dimensions) assigned to four factors (domains): teacher interactions (α = .81, containing positive engagement with the teacher and teacher communication), peer interactions (α = .89, containing peer sociability, peer communication, and peer assertiveness), task orientation (α = .63, containing task engagement and self-reliance), and conflict interactions (α = .57, containing teacher and peer conflict as well as behaviour control [reverse coded]). Domains and dimensions of the inCLASS as well as a brief description of each dimension are presented in Table 1. Internal consistencies of the inCLASS domains are acceptable to good (Nunnally 1978), except for the domain 'conflict interactions'. A possible reason for the lower internal consistency of this domain could be the small variance of the dimension 'teacher conflict' (M = 1.04; SD = 0.14) (see von Suchodoletz, Gunzenhauser, and Larsen 2015 for similar findings).
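The internal consistencies reported above are Cronbach's α values. As a minimal sketch of how such a coefficient is obtained from dimension scores, the standard formula can be written as follows. The function name and the example scores are hypothetical and are not taken from the study data.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (children x items) array of dimension scores."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                                  # number of items in the domain
    item_variances = x.var(axis=0, ddof=1).sum()    # sum of the item variances
    total_variance = x.sum(axis=1).var(ddof=1)      # variance of the sum score
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical scores of five children on a two-item domain (illustration only)
demo_scores = [[2.0, 2.5], [3.0, 3.5], [4.0, 3.5], [5.0, 5.5], [1.5, 2.0]]
print(round(cronbach_alpha(demo_scores), 2))
```

With only two or three items per domain, α depends strongly on the inter-item correlations, which is consistent with the explanation that a near-constant item such as 'teacher conflict' lowers the coefficient of its domain.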
According to the technical manual (inCLASS Technical Manual 2010), up to four children per preschool class can be observed with three or four alternating observation cycles on a regular morning across all activity settings (over approximately four hours). On average, 3.70 observation cycles (SD = 0.53) per child were carried out from morning to noon (typically between 8 am and 12 pm) during one observational visit in each preschool. Each observation cycle took 15 min, with 10 min for observation and coding of the dimensions followed by a five-minute period in which the level of observed interactions was rated on a seven-point scale (1-2 = low level, 3-5 = intermediate level, and 6-7 = high level) guided by multiple behavioural indicators. For statistical analysis, the rating scores of all cycles were averaged in each dimension. Interactions of the children were mainly observed during free choice (59.14% of the observed time); furthermore, children's interactions were observed during planned and led activities (20.86%), transitions (8.94%), mealtime (7.87%), and nursing routines (2.89%) (Smidt and Embacher 2020). Before data collection, 14 observers (students of educational science and psychology) underwent two full days of intensive training by a certified inCLASS trainer. The training included detailed information regarding the domains and dimensions of the inCLASS as well as instructions for the usage of the manual; furthermore, the trainees had to watch and code several training clips. At the end of the training, all observers had to independently code five reliability clips (10 min each). To pass the reliability test, all data collectors had to score within one point of the master code on 80% of their scores (Downer et al. 2010).
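Two of the scoring rules described above lend themselves to a short illustration: averaging the ratings of all cycles within a dimension, and the reliability criterion of scoring within one point of the master code on 80% of scores. The sketch below assumes that the 80% criterion is applied to the pooled codes of an observer; the text does not specify whether it is pooled or applied per clip. All names and numbers are hypothetical.

```python
def dimension_score(cycle_ratings):
    """Average the seven-point ratings of all observation cycles for one dimension."""
    return sum(cycle_ratings) / len(cycle_ratings)


def passes_reliability(observer_codes, master_codes, tolerance=1, required_share=0.80):
    """True if enough observer codes lie within `tolerance` points of the master codes."""
    if len(observer_codes) != len(master_codes) or not master_codes:
        raise ValueError("code lists must be non-empty and of equal length")
    hits = sum(abs(o - m) <= tolerance for o, m in zip(observer_codes, master_codes))
    return hits / len(master_codes) >= required_share


# Hypothetical ratings from three cycles and hypothetical reliability codes
print(dimension_score([3, 4, 5]))
print(passes_reliability([4, 5, 3, 6, 2], [4, 4, 3, 4, 2]))
```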
Overall, 8.81% of the observations (85 observation cycles) were double coded by two trained observers who independently observed and rated the same children. To assess inter-rater reliability, intraclass correlation coefficients (ICCs) were calculated for the single dimensions; they ranged between .75 and .95 (see Table 2). This indicates excellent inter-rater agreement (Cicchetti 1994).
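The interpretation of the ICC range above follows the guidelines of Cicchetti (1994), in which values below .40 are poor, .40 to .59 fair, .60 to .74 good, and .75 and above excellent. A small helper, with a hypothetical name, makes the classification explicit:

```python
def cicchetti_label(icc):
    """Classify an ICC according to the Cicchetti (1994) guidelines."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# The lowest and highest ICC reported for the single dimensions
for value in (0.75, 0.95):
    print(value, cicchetti_label(value))
```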

Data analysis
To examine the factorial validity of the proposed four-factor model (Downer et al. 2010; Vitiello et al. 2012), a confirmatory factor analysis (CFA) with maximum likelihood estimation was conducted in Amos 26 (Arbuckle 2019). Mardia's test of multivariate normality (multivariate kurtosis = 43.65, p < .001) indicated a violation of the normality assumption. Furthermore, skewness and kurtosis of the dimensions 'teacher conflict' and 'peer conflict' (see Table 2) exceeded the univariate normality criteria of ±3 for skewness and ±10 for kurtosis (Kline 2016). Given the non-normality and nested nature of the data, a Bollen-Stine bootstrap with 1000 bootstrap samples was used to adjust the p value of the model test statistic (Nevitt and Hancock 2001). Regarding the goodness of fit, χ²/df ≤ 3 and p ≥ .01 are considered acceptable (Schermelleh-Engel, Moosbrugger, and Müller 2003). Because the χ² statistic is sensitive to sample size and to violations of the multivariate normality assumption, additional absolute and incremental fit indices were examined. Absolute fit indices (e.g. root mean square error of approximation, standardised root mean square residual) measure how well an a priori model fits the sample data; in contrast, incremental fit indices (e.g. comparative fit index, Tucker-Lewis index) compare the tested model with a 'baseline' model and assess the proportionate improvement in fit (Hu and Bentler 1998). A comparative fit index (CFI) ≥ .95, Tucker-Lewis index (TLI) ≥ .95, root mean square error of approximation (RMSEA) ≤ .06, and standardised root mean square residual (SRMR) ≤ .08 were taken to indicate good fit. A CFI ≥ .90, TLI ≥ .90, RMSEA ≤ .08, and SRMR ≤ .10 were taken to indicate acceptable fit (Hu and Bentler 1998; Vandenberg and Lance 2000). There were no missing data.
Table 2 shows descriptive results, including means, standard deviations, ranges, skewness, kurtosis, and ICCs of the ten inCLASS dimensions. Mean scores of the single dimensions range between M = 1.04 (SD = 0.14) and M = 4.87 (SD = 0.89), indicating low to intermediate levels of interaction quality.
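The cut-off criteria named in the data analysis can be summarised as a simple decision rule. The following sketch encodes only the thresholds stated in the text; the function names are hypothetical, and the fit values in the example are invented for illustration and are not the results of this study.

```python
def classify_index(value, good, acceptable, higher_is_better):
    """Label one fit index as good, acceptable, or poor against two cut-offs."""
    if higher_is_better:
        if value >= good:
            return "good"
        if value >= acceptable:
            return "acceptable"
    else:
        if value <= good:
            return "good"
        if value <= acceptable:
            return "acceptable"
    return "poor"


def classify_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Apply the cut-offs cited in the text to a set of model fit statistics."""
    return {
        "chi2/df": "acceptable" if chi2 / df <= 3 else "poor",
        "CFI": classify_index(cfi, 0.95, 0.90, higher_is_better=True),
        "TLI": classify_index(tli, 0.95, 0.90, higher_is_better=True),
        "RMSEA": classify_index(rmsea, 0.06, 0.08, higher_is_better=False),
        "SRMR": classify_index(srmr, 0.08, 0.10, higher_is_better=False),
    }


# Hypothetical fit values for illustration only (not the study's results)
example = classify_fit(chi2=60.0, df=30, cfi=0.96, tli=0.93, rmsea=0.07, srmr=0.05)
print(example)
```

A mixed pattern such as the one in the example corresponds to what is commonly summarised as an 'acceptable to good' fit.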
Modifications were applied because the replication of the initial factor structure resulted in improper solutions (Heywood cases for the dimensions 'teacher communication' and 'behaviour control'). The residual variances of 'teacher communication' and 'behaviour control' were fixed at zero. In addition, the residuals of 'positive teacher engagement' and 'peer sociability' were allowed to covary. This decision was based on attachment theory, according to which securely attached children use preschool teachers as a secure base from which to explore their social environment and engage in peer interactions (Cugmas 2011; Howes, Hamilton, and Matheson 1994).
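The link between the two symptoms of a Heywood case, a standardised loading above 1 and a negative residual variance, can be made explicit. For an indicator that loads on a single factor, the standardised residual variance equals 1 minus the squared standardised loading, so a loading above 1 in absolute value necessarily implies a negative residual variance. The sketch below assumes such a simple structure without cross-loadings; the loadings used are hypothetical and are not the estimates of this study.

```python
def implied_residual_variance(std_loading):
    """Standardised residual variance of an indicator loading on one factor."""
    return 1.0 - std_loading ** 2


def is_heywood_case(std_loading):
    """True if the loading implies a negative (inadmissible) residual variance."""
    return implied_residual_variance(std_loading) < 0


# Hypothetical standardised loadings (illustration only)
for loading in (0.80, 1.05):
    print(loading, round(implied_residual_variance(loading), 4), is_heywood_case(loading))
```

Fixing the residual variance at zero, as done here, corresponds to constraining the standardised loading to 1 in absolute value, which is the boundary of the admissible range.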

Discussion
Considering a sample from Austria, this study investigates the factorial validity of the inCLASS tool via confirmatory factor analyses. The initial findings indicate an improper solution in terms of Heywood cases. However, after considering reasonable

Factorial validity of the inCLASS
As in previous studies (Bohlmann et al. 2019; Slot and Bleses 2018; von Suchodoletz, Gunzenhauser, and Larsen 2015), the initial validation of the four-factor structure of the inCLASS resulted in complications that were addressed through modifications. Heywood cases, as found in this study, can be attributed to distributional problems of the data or misspecifications of the model (Kolenikov and Bollen 2012). If a correctly specified model (that is, four intercorrelated factors) can be assumed, then violation of the assumption of multivariate normality, as indicated by the Mardia test, could pose a problem. In line with the suggestions of Kolenikov and Bollen (2012), these issues were addressed by applying the Bollen-Stine bootstrap and by fixing the residual variances of teacher communication and behaviour control at zero. Further inspection indicated that the univariate skewness and kurtosis were particularly high for conflictual interactions with preschool teachers (Table 2). This is consistent with findings from Denmark (Slot and Bleses 2018), Germany (von Suchodoletz, Gunzenhauser, and Larsen 2015), and the United States (Bohlmann et al. 2019; Downer et al. 2010). Research has also indicated that the differences between demographic groups based on gender, ethnicity, and poverty status (Bohlmann et al. 2019) are small. This may indicate that the low amount of variance is less related to country-specific issues than has been previously suggested (von Suchodoletz, Gunzenhauser, and Larsen 2015). Rather, it could be primarily attributed to the operationalisation of conflictual interactions, which are sometimes subtle and difficult to observe (Downer et al. 2010; von Suchodoletz, Gunzenhauser, and Larsen 2015).
The data suggest that distributional problems of the conflictual items are a major problem of the inCLASS and a challenge for data analysis. Strategies used to deal with these issues include defining conflict items as categorical variables (Slot and Bleses 2018) or even eliminating a conflict item (von Suchodoletz, Gunzenhauser, and Larsen 2015). One conclusion is that a revision of the conflictual items would be advisable to reduce or overcome the distributional problems. This might also be true for other measures where distributional problems in terms of a lack of variance of conflictual interactions have been found (e.g. Smidt 2012; Stuck, Kammermeyer, and Roux 2016).
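The distributional problem discussed above can be illustrated with the univariate criteria cited in the data analysis (values beyond ±3 for skewness and ±10 for kurtosis; Kline 2016). The sketch below uses moment-based estimates and assumes that kurtosis is expressed as excess kurtosis, that is, with 3 subtracted so that a normal distribution yields 0. The data are hypothetical: an item rated 1 for almost every child, loosely resembling a conflict item with very little variance.

```python
import numpy as np


def skewness(values):
    """Moment-based skewness (third standardised moment)."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis (fourth standardised moment minus 3)."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0


def exceeds_kline_criteria(values, skew_limit=3.0, kurtosis_limit=10.0):
    """True if either index lies beyond the cut-offs cited in the text."""
    return abs(skewness(values)) > skew_limit or abs(excess_kurtosis(values)) > kurtosis_limit


# Hypothetical ratings: 99 children rated 1 and one child rated 7
floor_heavy_item = [1.0] * 99 + [7.0]
symmetric_item = [1.0, 2.0, 3.0, 4.0, 5.0]
print(exceeds_kline_criteria(floor_heavy_item), exceeds_kline_criteria(symmetric_item))
```

An item with such a floor effect violates both criteria, which illustrates why low-variance conflict items are difficult to model as continuous, normally distributed indicators.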
We found a small positive correlation between teacher interactions and peer interactions (r = .23). This contrasts with the findings of Slot and Bleses (2018) in Danish preschools but is similar to findings from the United States (Bohlmann et al. 2019). Following the reasoning of Slot and Bleses (2018), it could be assumed that Austrian preschools place more emphasis on teacher-directed interaction than on peer interaction. However, a recent study found only a small proportion of teacher-directed interactions and activities in Austrian preschools (Smidt and Embacher 2020). This points to an alternative explanation.
Here, we evaluated the modification indices provided to improve the model fit by considering theoretically meaningful explanations. For this purpose, we allowed the residuals of positive teacher engagement and peer sociability to covary based on attachment theory, according to which securely attached children use preschool teachers as a secure base to explore their social environment and to become involved in interactions with peers (Cugmas 2011; Howes, Hamilton, and Matheson 1994). Although this modification led to a substantial improvement in model fit, we could not examine the influence of children's attachment on their interactions because this was beyond the scope of the study. This assumption should be investigated in future research by considering children's attachment patterns as a moderating variable.

Study limitations and implications for research
This study has some limitations. First, the study is limited by the relatively small sample size of 261 children from 81 preschools. Second, the study was confined to one federal state of Austria, Tyrol. Austria has a countrywide education plan for preschool education; however, important elements of the preschool system (e.g. class size, teacher-child-ratio) are regulated by law at the federal state level (Smidt 2018). Future studies should address these restrictions by recruiting a representative sample for Austria. Third, the children were in the first year of preschool. Research should also investigate interaction quality in preschool at a later stage when children are older because the pattern of interaction quality is likely to have changed by then (e.g. increased school preparation activities and interaction with preschool teachers). This applies particularly to the final year of preschool (the year before school enrolment), which is obligatory for all children (Smidt 2018).
To summarise, the findings of the present study indicate that applying the inCLASS in Austrian preschools seems feasible. Despite initial improper solutions, there is some evidence for the validity of the factorial structure of the inCLASS model with a sample from Austria. The factorial structure of the inCLASS can be used with some caution in research practice. Nevertheless, greater challenges remain for national and international research. Future research should attempt to revise the operationalisation of conflict interactions. In addition, the importance of attachment theory to explain the relationship between teacher interactions and peer interactions should be considered.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the Austrian Science Fund (FWF) under Grant P 30598.