Introduction

One of the most frequent mental disorders of children and youth is attention-deficit/hyperactivity disorder (ADHD). In Germany, about 4.7% of school children aged between 5 and 14 years are given this diagnosis (Göbel et al., 2018). In school, children with ADHD usually exhibit specific cognitive and behavioral problems (Frölich et al., 2021). Children with ADHD are eight times more likely to be recommended for special education than children without this diagnosis (Jantzer et al., 2012). Stereotypes about ADHD held by teachers have been shown to negatively affect their perception of students’ achievement (Metzger & Hamilton, 2021).

In school systems with hierarchical tracks in secondary education (e.g., in Germany, the Netherlands, Austria, or Luxembourg), teachers decide at the end of primary school about the school track a student is supposed to attend in secondary school. In these countries, successful completion of the highest track allows for university entrance, whereas lower tracks usually do not (European Education & Culture Executive Agency, 2021). These track decisions are primarily based on students’ achievement in school.

However, there is ample evidence (Glock et al., 2013) that teachers’ track decisions are also affected by students’ characteristics that are not or only peripherally related to achievement. Explanations of these effects are predominantly based on stereotypical assumptions of teachers regarding students with different characteristics (e.g., Klapproth et al., 2020). The stereotyped ascriptions of characteristics to students often come along with expectations about their achievements.

Since the expected behavior of students is likely to affect teachers’ decisions, it can be assumed that teachers’ knowledge of behavioral or mental problems of students would also influence their decisions. Previous studies indicate that students who are labeled as suffering from learning disorders elicit lower teacher expectations regarding students’ achievement compared to students without a label (Bianco & Leech, 2010). Research has also shown that teachers tend to attribute the causes of behavioral and emotional difficulties of students to poor parenting and the home context (Miller, 1995). Both parenting and home context are known to largely affect students’ achievement in school (Areepattamannil, 2010). It appears therefore reasonable to assume that labeling a student with an ADHD diagnosis would similarly affect teachers’ track decisions.

The study at hand is the first one that experimentally explored whether the knowledge of a student’s ADHD diagnosis affects teachers’ decisions regarding the track on which the student will be taught in secondary school. It also examined whether track decisions based on the knowledge of students’ ADHD diagnoses were different between pre-service and in-service teachers.

Symptoms of ADHD

Essential features of ADHD are a persistent pattern of inattention or hyperactivity and impulsivity that interferes with functioning and development. According to the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association [APA], 2013), inattention manifests as a lack of persistence, wandering off tasks, having difficulties in focusing on tasks, and being disorganized. Hyperactivity means excessive and often inappropriate motor activity, excessive fidgeting, or talking. Impulsivity entails actions that are not planned and therefore may have the potential for harm to the individual (e.g., running into the street without looking). ADHD often co-occurs with delays in language, motor, and social development, low frustration tolerance, irritability, or mood lability.

ADHD and school

Children with ADHD usually show poor academic performance in school and tend to be involved in educational problems (DeShazo et al., 2002), despite average or above-average IQ estimates (LeFever et al., 2002). Roughly one-third of children with ADHD have a learning disability (DuPaul & Stoner, 2014). Moreover, children with ADHD are more likely to be expelled, suspended, or repeat a grade, compared with controls (Loe & Feldman, 2007). They also have a higher probability of obtaining a lower-level degree and failing to graduate from secondary school (Galera et al., 2009).

Teachers often experience less emotional closeness, less cooperation, and more conflicts in their relations with their students with ADHD than with other students (Ewe, 2019). Attitudes of teachers as well as their behavior toward students with ADHD depend on their knowledge about ADHD (Ohan et al., 2011). Knowledge of ADHD may include an understanding of the symptoms, causes, and treatment for the disorder (Poznanski et al., 2018). However, several studies indicate that teachers’ knowledge about ADHD is only moderate (Jerome et al., 1999; Poznanski et al., 2018).

Stereotypes about ADHD

Numerous studies (e.g., Coleman et al., 2009; Law et al., 2007; Martin et al., 2007; Walker et al., 2008) suggest that individuals labeled as having ADHD face negative stereotypes and social rejection. For instance, Walker et al. (2008) presented children and adolescents with a vignette that displayed a peer with either ADHD, depression, or asthma. The participants responded to items assessing—among other aspects—positive and negative attributions. They found that the participants were more likely to make negative attributions about peers with ADHD and depression versus asthma, particularly regarding the likelihood of antisocial behavior and violence.

Stigmatization of ADHD children has also been found in school. For instance, Metzger and Hamilton (2021) showed that teachers’ evaluation of students’ achievements was affected by whether the students were labeled as having ADHD or not. ADHD students were perceived as performing lower than their peers without ADHD. Moreover, teachers were prone to neglect high achievements of students with ADHD—regardless of demonstrated ability on academic achievement tests. These results suggest that an ADHD diagnosis of students may activate teachers’ negative stereotypes about diagnosed students and hence may affect teachers’ perceptions and evaluations of their students.

Stereotypes can be seen as simplified cognitive representations of reality, which consist of characteristics of a group of individuals that automatically come into mind when thinking about or being presented with that group (Bordalo et al., 2016). Once mentally activated, stereotypes are used to categorize individuals as members of a group (Brewer, 1996). Stereotyping, which is the use of stereotypes to build an impression of a person, occurs almost instantly and without cognitive effort whenever an individual meets another individual (Fiske & Neuberg, 1990). Whereas stereotypes allow for quick assessments of individuals, they may also cause biased judgment because stereotypes highlight differences between groups and tend to neglect the similarities (Bordalo et al., 2016).

Stereotypes about ADHD contain assumptions and beliefs regarding the presumed characteristics of ADHD-diagnosed individuals, which are, for instance, unreliability, unpredictability, disinterest in others, impoliteness, character weakness, and emotional immaturity (Masuch et al., 2019). Teachers’ stereotypes regarding ADHD are likely to affect their selection of a teaching approach (Westwood, 1996), their willingness to implement interventions (Vereb & DiPerna, 2004), and their classroom management strategies (Atkinson et al., 1997).

A possible gender bias in ADHD diagnosis

Generally, ADHD is more frequent in boys than in girls (Gudjonsson et al., 2014). Moreover, girls are more likely than boys to show predominantly attention deficits, whereas in boys, ADHD symptoms are usually stronger (Arnett et al., 2015). Girls exhibit learning disorders, problems in school, or oppositional defiant disorder less frequently than boys (Biederman et al., 2002). Consequently, boys with ADHD are more often suspended from school than girls with ADHD (Bauermeister et al., 2007). Teachers tend to suggest medical or psychological examinations more often for boys than for girls when they suspect ADHD (Sciutto et al., 2004).

Despite the differences in the prevalence of ADHD diagnoses between males and females, there is a debate regarding whether ADHD may be overdiagnosed in boys (Bruchmüller et al., 2012; Fresson et al., 2019). Overdiagnosing ADHD in boys may be caused by stereotypes because boys are often perceived as more impulsive and inattentive than girls (Brown & Stone, 2016). Both characteristics are also typical of children with ADHD. Moreover, information about ADHD in the media is gendered, as boys are more often than girls presented as showing ADHD symptoms, thus making ADHD a predominantly male phenomenon (Horton-Salway, 2013).

In-service versus pre-service teachers

Stereotypes about students with ADHD may be established in teachers as soon as they encounter descriptions of ADHD or students with ADHD in the classroom. Since stereotypes depend on contact with social groups (Hewstone, 1996), the strength of stereotypic thinking and judging may be affected by experiences with individuals showing ADHD symptoms. Research has shown that pre-service teachers, who usually have gained only limited experience in the classroom, know less about ADHD (Kos et al., 2004) and react differently to the labeling of students as having ADHD than in-service teachers, who have classroom experience on a daily basis. For instance, Ohan et al. (2011) demonstrated that undergraduate students enrolled in an elementary teacher study program judged children with ADHD as being more disruptive and their behavior problems as more serious than in-service teachers did. As a possible explanation of this effect, they discussed the lack of experience of pre-service teachers, which may result in their being more overwhelmed by challenging ADHD-related behaviors than in-service teachers. Hence, pre-service teachers might develop more negative expectations about students diagnosed with ADHD and therefore would presumably opt more often for a lower school track than in-service teachers.

Research question, hypotheses, and rationale of the study

When a person encounters an individual member of a social category that matches a stereotype, behavior toward that individual is likely to be consistent with that stereotype (Lord et al., 1984). For instance, teachers may more frequently view students with ADHD as easily distracted and disruptive than students without ADHD (Anderson et al., 2012). The more cues are available that are consistent with the stereotype, the more likely it is to be activated (Casper et al., 2010). For example, the stereotype of an ADHD student is presumably more likely to be activated if this student is male (Brown & Stone, 2016) or known to behave inappropriately in the classroom (Anderson et al., 2012), compared to being female or behaving appropriately. Once the stereotype is activated, it is likely to affect teachers’ decisions about students who may belong to the stereotyped group. Even if stereotype-inconsistent information is available (e.g., an ADHD-diagnosed student who shows appropriate behavior), stereotypical expectations may still be relevant for judgments (Bodenhausen & Lichtenstein, 1987; Fiske & Neuberg, 1990).

The present study aimed at examining whether in-service and pre-service teachers’ track decisions would be affected by their knowledge of the ADHD diagnosis of their students. Due to stereotypes about ADHD, it was assumed that teachers would expect a student with ADHD to be low achieving in the future. Expectations of future low achievement would more likely correspond with teachers’ decisions for the non-academic school track than for the academic school track. It was also hypothesized that the effect of students’ ADHD diagnosis on teachers’ track decisions would be moderated by student gender and the students’ school-related behavior. The effect of teachers’ knowledge about the ADHD diagnosis on their track decisions should be larger if the students were male rather than female, or if they showed inappropriate rather than appropriate behavior. Moreover, it was assumed that students’ achievements, represented by their grade point average (GPA), would affect the teachers’ track decisions. Higher achievements should result in a higher likelihood of decisions favoring the highest track than lower achievements. In addition, student gender and students’ school-related behavior should not only moderate the ADHD diagnosis effect but are also expected to result in main effects. When judging the eligibility of the students for attending the highest track, teachers should favor girls over boys and appropriately behaving students over inappropriately behaving students. Finally, we expected an effect of the occupational status of the participants, with pre-service teachers being more affected by the labeling of the students as having ADHD than in-service teachers. 
In total, four additive effects and three interaction effects were expected to occur: a main effect of GPA, a main effect of student gender, a main effect of students’ behavior, a main effect of ADHD diagnosis, an interaction between ADHD diagnosis and gender, an interaction between ADHD diagnosis and behavior, and an interaction between ADHD diagnosis and occupational status of the participants.

Method

Vignettes were used that mimicked study report cards of 16 primary-school students. The students’ grade point average (low or high), their ADHD diagnosis (absent or present), their school-related working and social behavior (negative or positive), and their gender (male or female) were orthogonally varied, thus creating 16 different combinations of student characteristics. All participants received all 16 student vignettes to judge whether each of them should be recommended for the highest track in secondary education.

Participants

Based on previous studies (e.g., Klapproth et al., 2018; Klapproth et al., 2019), a large effect (odds ratio ≈ 5.0) of students’ GPA and a small-to-medium effect of students’ gender (odds ratio ≈ 1.5) on participants’ decisions were expected to occur. Since evidence regarding the effects of ADHD and school-related behavior on school-placement decisions is less clear, a medium effect (odds ratio ≈ 2.0) was expected for each of these two variables. A power analysis using G*Power 3.1 (Faul et al., 2009) was conducted. When prespecifying the average expected effect to be an odds ratio of 2.50, α = 0.05, 1-β = 0.80, and the estimate of the squared multiple correlation with the covariates to be R2 = 0 (since all covariates were non-correlated), power analysis yielded a total sample size of N = 41, which we deemed to be the minimum sample size.

Social media groups for pre-service and in-service teachers on Facebook and WhatsApp were used to recruit participants from different federal states in Germany. Criteria for including individuals in the sample were being an in-service teacher in a German primary school or being enrolled in a primary-school teacher study program at a German university. Sampling took place from November 3, 2021, to December 8, 2021. A total of N = 46 participants finished the study. Of these, n = 42 (91.3%) were female and n = 4 (8.7%) were male. The distribution of gender in this sample was similar to the distribution of primary-school teachers in Germany (Statistisches Bundesamt, 2021). About half of the participants (n = 24) were in-service primary-school teachers; the remaining n = 22 participants were pre-service teachers enrolled in a primary-school teacher study program. Eighteen (39.1%) participants did not mention their age. Among the remaining participants, the mean age was 37.3 years (SD = 13.1). Pre-service teachers (M = 26.6 years, SD = 5.0) were on average younger than in-service teachers (M = 48.1 years, SD = 9.2), t(26) = 7.69, p < 0.001. The gender distribution was similar between pre-service and in-service teachers, χ2 (1) = 0.01, p = 0.927. The participants were recruited across the entire country. Most of them (65.2%) were located in Berlin or Brandenburg. The place of residence of the other participants was distributed almost evenly across the majority of the remaining federal states. The distribution of the participants’ location was not significantly different between pre-service and in-service teachers, χ2 (10) = 14.14, p = 0.167. Thirty-five of all participants (76.1%) stated that they had been in contact with a child that was suspected of having ADHD, and 82.6% of the in-service teachers disclosed that they already had gained experience with teaching ADHD-diagnosed students.

Materials

Each vignette displayed a report card of a fictitious student consisting of the student’s name, his or her grades, and notes about his or her behavior and if he or she was diagnosed with ADHD.

Students’ gender was indicated by their names which were shown on top of the vignette. The names used were common for German girls and boys. Common names were chosen to prevent the activation of other concepts such as a certain socioeconomic background by rarely used names that are especially frequent in certain social and economic milieus (Gerhards, 2010). Figure 1 displays an example of a student vignette.

Fig. 1 Example of a student vignette

Each vignette contained nine grades, each of which was related to an unspecified school subject. School subjects were not specified because subject-related effects should be avoided. The grades followed the German grading system and varied on a numerical scale between 1 (meaning “very good,” which corresponds to an A on the ECTS grading scale) and 4 (meaning “just sufficient,” which corresponds to an E on the ECTS grading scale). The GPA of the students represented a medium level of achievement, which does not automatically result in a decision in favor of or against the highest school track but instead indicates a range of achievement where highest-track recommendations are possible but not mandatory. Two values of the GPA were realized: For rather high-achieving students, the GPA was 2.33, whereas for rather low-achieving students, the GPA was 2.67.

Each vignette was supplemented by a description of the students’ school-related social and working behavior. This description was adopted from the criteria of social and working behavior of students proposed by Behn and Rohrbach (2018). Two categories of behavior were created: positive behavior, which, according to Behn and Rohrbach (2018), meets teachers’ expectations, and negative behavior, which is supposed to not meet teachers’ expectations. Positive social behavior was indicated if the students behaved empathically, acted reflectively, and obeyed rules. The students’ working behavior was positive if they displayed a high level of achievement motivation, showed cooperation skills in open work times, and motivated others. Negative social behavior was displayed if the students did not reflect on what their actions mean to others and if they did not obey rules and agreements. Negative working behavior was indicated if the students did not show cooperation with other students and if they frequently forgot their homework and work material.

Half of the students had an ADHD diagnosis, which was displayed by the sentence “The student was diagnosed with ADHD.” For the other half of the students, no information about a diagnosis was given.

The four independent variables (GPA, gender, behavior, diagnosis) were varied orthogonally, resulting in a completely crossed 2 (GPA: low vs. high achievement) × 2 (gender: male vs. female) × 2 (behavior: negative vs. positive) × 2 (diagnosis: no ADHD vs. ADHD) within-subjects factorial design.
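The fully crossed design can be enumerated mechanically. The following Python sketch is only an illustration of how the 16 combinations arise; the variable names and level labels are assumptions made for this example and are not taken from the study materials.

```python
from itertools import product

# Factor levels as described in the text; the labels are illustrative assumptions.
factors = {
    "gpa": ["low", "high"],
    "gender": ["male", "female"],
    "behavior": ["negative", "positive"],
    "diagnosis": ["no ADHD", "ADHD"],
}

# Cross all levels to obtain one dictionary per vignette.
vignettes = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(vignettes))
for vignette in vignettes[:3]:
    print(vignette)
```

Because every level of each factor is paired with every level of the other factors, the four predictors are uncorrelated by construction.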

The dependent variable was the decision of the participants concerning the track they would recommend for each student. Currently, most German federal states offer two distinctive tracks in secondary school. A non-academic “lower” track is provided for students with low to average achievement profiles. Students who attend this track usually acquire vocational qualifications. An academic “higher” track (“Gymnasium”) offers students with above-average achievement profiles the qualification for university entrance when the students accomplish this track. The participants were given two options, which were “in favor of the highest track” or “not in favor of the highest track” at the end of each student description.

Additionally, data about the participants’ sociodemographic background were collected. This included information about the participants’ age and gender, their migration background, their occupational status, the location of their study or workplace, and whether they had experience with ADHD-diagnosed children.

A manipulation check was conducted at the end of the study. Four variables were realized to assess whether the participants correctly (or incorrectly) remembered details about the student vignettes presented. This manipulation check was done to ensure that the manipulation of the student characteristics in the vignettes had been recognized by the participants. The participants were asked whether or not the students differed regarding their gender, GPA, diagnosis, and ethnic origin. The participants could respond with either yes or no.

Procedure

The study was conducted online via www.unipark.com, and it was part of the second author’s master’s thesis. The study was online for 5 weeks. Before starting with the instructions, the participants were told that their participation was entirely voluntary and that they could abort participation at any time and without giving any reason. Furthermore, they were informed that all data they would produce would be anonymized for further processing. By pressing a button, the participants gave informed consent to their participation.

The participants were then instructed to imagine that they were the students’ teachers in their last year of primary school and had to decide whether they would recommend them to the highest track in secondary school or not. To reach a decision, they were encouraged to base their recommendation on the students’ grades and their school-related behavior. After the general instruction, an example task followed to help familiarize the participants with the procedure. After the example task, the student vignettes were presented in random order. A new vignette was shown on screen after the preceding vignette was closed by the participants. After giving school-track recommendations for all 16 vignettes, the manipulation check was conducted wherein the participants were asked questions about whether the students presented had differed regarding their gender, diagnosis, migration status, and behavior. At the end of the study, the participants were asked to provide information about their sociodemographic background.

Data analyses

Logistic regression analysis was used for estimating the effects of the independent variables on the binary variable school-track decisions. To estimate the logistic regression parameters, a generalized linear model approach was applied. With this technique, the logit function is used as a link function that maps the probability of the binary outcome onto the linear predictor. Since in the present study each participant judged all students, observations from the same participant were likely to be correlated, and therefore the generalized estimating equation (GEE) approach was used (Liang & Zeger, 1986) to test the hypotheses. In this approach, observations are assumed to be dependent within subjects and independent between subjects. The binary predictors in the regression models were the GPA (0 = low achievement, 1 = high achievement), the ADHD diagnosis (0 = no, 1 = yes), students’ working and social behavior (0 = negative, 1 = positive), and student gender (0 = male, 1 = female). Since in the hypotheses four main effects were predicted, at first, a regression model was estimated that contained only main effects (Model 1). Adding interaction effects to the main effects resulted in Model 2, which contained all two-way, three-way, and four-way interactions of the student characteristics. A third model was estimated which examined whether the occupational status of the participants (pre-service versus in-service teachers) affected their recommendations.
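Written out for the main-effects model, the specification described above takes the following general form. This is a sketch in notation introduced here for illustration: the symbols $p_{ij}$ and $\beta_k$ do not appear in the original analysis, and $p_{ij}$ is assumed to denote the probability that participant $i$ recommends the highest track for vignette $j$.

$$\mathrm{logit}(p_{ij})=\ln\frac{p_{ij}}{1-p_{ij}}=\beta_{0}+\beta_{1}\cdot\mathrm{GPA}_{j}+\beta_{2}\cdot\mathrm{ADHD}_{j}+\beta_{3}\cdot\mathrm{Behavior}_{j}+\beta_{4}\cdot\mathrm{Gender}_{j}$$

Each predictor takes the value 0 or 1 according to the coding given above, so that $\beta_{0}$ is the logit for a low-achieving, negatively behaving boy without an ADHD diagnosis, and $e^{\beta_{k}}$ is the odds ratio associated with predictor $k$.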

Results

Table 1 shows the mean proportions of decisions favoring the highest track and the respective standard deviations for each condition. Table 2 shows the results of the regression analyses.

Table 1 Means and standard deviations of the proportions of teachers’ decisions favoring the highest track as a function of students’ GPA, the presence of an ADHD diagnosis, students’ working and social behavior, and student gender
Table 2 Results of multilevel logistic regression analyses

Three regression models were estimated. Model 1 entailed only main effects. Model 2 entailed both the main effects and all interaction effects. In Model 3, the effects of the occupational status of the participants (pre-service versus in-service teachers) were included. The use of all predictors (GPA, ADHD, student behavior, student gender, and participants’ occupational status) in one single regression model was avoided for two reasons: first, a possible five-way interaction is hard to interpret; second, regression models with a large number of predictors are likely to produce sparse data. Sparseness of data refers to having zero or very small frequencies of the dependent variable in some combinations of the predictors. Sparseness can bias tests of model deviance and Wald tests, and it also decreases the power of statistical tests in logistic regression and might even cause difficulties in the estimation of the parameters (Cohen et al., 2003).

In Model 1, two predictors were found to be significant. The omnibus test indicating the significance of the model was χ2 (4) = 173.72, p < 0.001. The resulting logistic regression equation reads as follows:

$$\mathrm{Predicted\;logit\;of\;track\;decision}=-1.20+1.47\cdot\mathrm{GPA}-0.18\cdot\mathrm{ADHD}+1.70\cdot\mathrm{Behavior}+0.13\cdot\mathrm{Gender}$$

As hypothesized, there was a significant main effect of the GPA: The odds of being recommended for the highest track were 4.36 times as high for students with a high GPA as for students with a low GPA. In addition, there was a significant main effect of behavior. When students were described as behaving positively, the odds of a decision favoring the highest track increased by a factor of 5.46. No other main effect was significant, meaning that neither student gender nor the labeling of the students as having ADHD statistically affected the participants’ decisions.
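The relation between the Model 1 coefficients and the reported effect sizes can be retraced with a few lines of Python. This is a minimal sketch that uses the coefficients as printed in the regression equation, which are rounded to two decimals, so the recomputed odds ratios may deviate slightly in the second decimal from the reported values of 4.36 and 5.46. The function and variable names are chosen for this illustration only.

```python
import math

# Model 1 coefficients as printed in the regression equation (rounded to two decimals).
MODEL1 = {"intercept": -1.20, "gpa": 1.47, "adhd": -0.18, "behavior": 1.70, "gender": 0.13}


def odds_ratio(coefficient):
    """Convert a logit coefficient into an odds ratio."""
    return math.exp(coefficient)


def predicted_probability(gpa, adhd, behavior, gender):
    """Probability of a highest-track decision for one vignette (all predictors coded 0/1)."""
    logit = (
        MODEL1["intercept"]
        + MODEL1["gpa"] * gpa
        + MODEL1["adhd"] * adhd
        + MODEL1["behavior"] * behavior
        + MODEL1["gender"] * gender
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Odds ratios for GPA and behavior.
print(round(odds_ratio(MODEL1["gpa"]), 2), round(odds_ratio(MODEL1["behavior"]), 2))

# Predicted probabilities for the least and a highly favorable vignette without ADHD.
print(round(predicted_probability(gpa=0, adhd=0, behavior=0, gender=0), 2))
print(round(predicted_probability(gpa=1, adhd=0, behavior=1, gender=1), 2))
```

Converting logits to probabilities in this way shows how the two significant predictors together move a vignette from a likely non-recommendation to a likely recommendation.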

A post hoc power analysis was conducted using the software package G*Power 3.1 (Faul et al., 2009). Whereas the observed power for detecting the main effects of GPA and behavior was high (both 0.99), the observed power for detecting effects due to ADHD and gender was comparatively low (0.15 and 0.11, respectively).

In Model 2, 11 interaction terms were added to the main effects, which resulted in the following regression equation:

$$\begin{aligned}\mathrm{Predicted\;logit\;of\;track\;decision}={}&-1.28+1.91\cdot\mathrm{GPA}-0.13\cdot\mathrm{ADHD}+1.91\cdot\mathrm{Behavior}+0.12\cdot\mathrm{Gender}\\&-0.67\cdot(\mathrm{GPA}\times\mathrm{Gender})-0.98\cdot(\mathrm{GPA}\times\mathrm{Behavior})-0.41\cdot(\mathrm{GPA}\times\mathrm{ADHD})\\&-0.03\cdot(\mathrm{Gender}\times\mathrm{Behavior})-0.12\cdot(\mathrm{Gender}\times\mathrm{ADHD})-0.14\cdot(\mathrm{Behavior}\times\mathrm{ADHD})\\&+1.36\cdot(\mathrm{GPA}\times\mathrm{Gender}\times\mathrm{Behavior})+1.02\cdot(\mathrm{GPA}\times\mathrm{Gender}\times\mathrm{ADHD})\\&+0.54\cdot(\mathrm{GPA}\times\mathrm{Behavior}\times\mathrm{ADHD})+0.12\cdot(\mathrm{Gender}\times\mathrm{Behavior}\times\mathrm{ADHD})\\&-1.12\cdot(\mathrm{GPA}\times\mathrm{Gender}\times\mathrm{Behavior}\times\mathrm{ADHD})\end{aligned}$$

The result of the omnibus test of Model 2 was χ2 (15) = 179.75, p < 0.001. The quasi-likelihood under the independence model criterion (QIC), which served as the measure of goodness of fit, was slightly larger for Model 2 than for Model 1, indicating that the fit of Model 1 was slightly better than that of Model 2. This was the case because, compared to Model 1, many more terms were included in Model 2, most of which were nonsignificant and therefore did not contribute to the explanation of the data. The QIC, however, is sensitive to the number of predictors included in the model, and its value increases with the number of predictors (Cohen et al., 2003).

Model 2 revealed that, as in Model 1, two main effects (GPA and behavior) were significant. Moreover, Model 2 yielded a significant GPA × gender × behavior interaction and a close-to-significant (p = 0.052) GPA × behavior interaction. All other interaction terms were not significant. The GPA × behavior interaction means that when all other predictors (ADHD and gender) were set to zero (which corresponds to male students without an ADHD diagnosis), the difference in the logits between positively and negatively behaving students was Diff = 1.91 if the GPA indicated low achievement and Diff = 0.93 if the GPA indicated high achievement. This means that the effect of students’ behavior on the participants’ track recommendations was more pronounced when the students were low achievers rather than high achievers.

The GPA × gender × behavior interaction reflects the dependency of the GPA × behavior interaction on the students’ gender. This interaction could be described as the differences in logits between positively and negatively behaving high achievers on the one hand and positively and negatively behaving low achievers on the other hand, depending on student gender. For boys, the difference was Diff = 0.93 for high achievers and Diff = 1.91 for low achievers, meaning that the participants’ evaluation of low-achieving boys was more related to their behavior than the evaluation of high-achieving boys. For girls, however, the difference for high achievers was Diff = 2.26 and Diff = 1.88 for low achievers. Hence, behavior was more important for the participants’ recommendations when the girls were high instead of low achievers. Figure 2 shows the triple interaction graphically.
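The four differences reported above can be reproduced from the behavior-related coefficients of the Model 2 equation. The sketch below assumes students without an ADHD diagnosis (ADHD = 0), so that all terms involving ADHD drop out; the function name is chosen for this illustration only.

```python
# Model 2 coefficients of all terms that involve behavior, with ADHD fixed at 0.
B_BEHAVIOR = 1.91
B_GPA_BEHAVIOR = -0.98
B_GENDER_BEHAVIOR = -0.03
B_GPA_GENDER_BEHAVIOR = 1.36


def behavior_effect(gpa, gender):
    """Logit difference between positive and negative behavior.

    gpa: 0 = low achievement, 1 = high achievement
    gender: 0 = male, 1 = female
    """
    return (
        B_BEHAVIOR
        + B_GPA_BEHAVIOR * gpa
        + B_GENDER_BEHAVIOR * gender
        + B_GPA_GENDER_BEHAVIOR * gpa * gender
    )


for gender, label in ((0, "boys"), (1, "girls")):
    for gpa, level in ((0, "low achievers"), (1, "high achievers")):
        print(label, level, round(behavior_effect(gpa, gender), 2))
```

The three-way coefficient enters only for high-achieving girls, which is why the pattern of differences reverses between boys and girls.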

Fig. 2 Logits of the probability of high-track recommendations, depending on student gender, level of achievement, and behavior. The lower panel shows logits from students with ADHD diagnosis, the upper panel logits from students without ADHD diagnosis. “Low-ach.-neg.” means low achievement (GPA = 2.67) and negative behavior, “low-ach.-pos.” means low achievement (GPA = 2.67) and positive behavior, “high-ach.-neg.” means high achievement (GPA = 2.33) and negative behavior, and “high-ach.-pos.” means high achievement (GPA = 2.33) and positive behavior. Logits > 0 mean that recommendations for the highest track were more likely than recommendations for a lower track, logits < 0 mean that recommendations for a lower track were more likely than recommendations for the highest track, and logit = 0 means that both recommendations were equally likely

To extract more meaning from the triple-interaction effect, simple slope tests were conducted by choosing the different values for GPA and behavior at which the significance of the simple slope for the regression of the dependent variable on student gender was evaluated (cf. Aiken & West, 1991; Preacher et al., 2006). As the triple-interaction effect was unexpected and the simple slope tests were conducted post hoc and were exploratory in nature, a Bonferroni correction for multiple testing was necessary (Dawson & Richter, 2006), according to which the accepted significance level α was divided by the number of tests carried out (Miller, 1981). Simple slope tests revealed that after Bonferroni correction (α/8), no slope was significantly different from zero; however, slopes were highest for high achieving-negative and high achieving-positive students without ADHD diagnosis.

The results for students without ADHD diagnosis were as follows: low achieving-negative: B = 0.12, Wald-χ² (df = 1) = 0.14, p = 0.705; low achieving-positive: B = 0.10, Wald-χ² (df = 1) = 0.20, p = 0.654; high achieving-negative: B = −0.54, Wald-χ² (df = 1) = 3.81, p = 0.051; high achieving-positive: B = 0.79, Wald-χ² (df = 1) = 4.16, p = 0.041. Similar results were obtained for students with ADHD diagnosis: low achieving-negative: B ≈ 0.00, Wald-χ² (df = 1) = 0.00, p = 0.999; low achieving-positive: B = 0.09, Wald-χ² (df = 1) = 0.09, p = 0.763; high achieving-negative: B = 0.36, Wald-χ² (df = 1) = 2.07, p = 0.150; high achieving-positive: B = 0.69, Wald-χ² (df = 1) = 2.72, p = 0.099.
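The effect of the Bonferroni correction on these eight tests can be checked directly. The sketch below assumes a conventional uncorrected level of α = 0.05, which is not stated in this passage, and uses the p values reported above. The dictionary keys are illustrative labels only.

```python
ALPHA = 0.05   # assumed conventional significance level (not stated in this passage)
N_TESTS = 8    # number of simple slope tests reported in the text

# p values of the simple slope tests as reported in the text.
p_values = {
    ("no ADHD", "low-neg"): 0.705,
    ("no ADHD", "low-pos"): 0.654,
    ("no ADHD", "high-neg"): 0.051,
    ("no ADHD", "high-pos"): 0.041,
    ("ADHD", "low-neg"): 0.999,
    ("ADHD", "low-pos"): 0.763,
    ("ADHD", "high-neg"): 0.150,
    ("ADHD", "high-pos"): 0.099,
}

# Bonferroni: divide the accepted significance level by the number of tests.
threshold = ALPHA / N_TESTS

significant_corrected = [k for k, p in p_values.items() if p < threshold]
significant_uncorrected = [k for k, p in p_values.items() if p < ALPHA]

print(threshold)
print(significant_corrected)
print(significant_uncorrected)
```

Under this assumption, the corrected threshold is 0.00625, so the slope for high achieving-positive students without ADHD diagnosis (p = 0.041) would pass an uncorrected test but not the corrected one.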

In Model 3, main effects, including the main effect of participants’ occupational status (POS), and two-way interaction effects between occupational status and the characteristics of the students were estimated. The model chi-square was χ² (9) = 229.92, p < 0.001. The resulting regression equation reads as follows:

$$\mathrm{Predicted\;logit\;of\;track\;decision}=-0.66+1.75\cdot\mathrm{GPA}-0.47\cdot\mathrm{ADHD}+1.75\cdot\mathrm{Behavior}+0.28\cdot\mathrm{Gender}-1.34\cdot\mathrm{POS}-0.30\cdot\mathrm{POS}\times\mathrm{GPA}+0.53\cdot\mathrm{POS}\times\mathrm{ADHD}+0.17\cdot\mathrm{POS}\times\mathrm{Behavior}-0.28\cdot\mathrm{POS}\times\mathrm{Gender}$$

In addition to significant main effects of GPA and behavior, Model 3 revealed a significant main effect of the participants’ occupational status. The odds of giving a highest-track recommendation were about four times lower for in-service teachers than for pre-service teachers (odds ratio = 0.26). Beyond these main effects, no interaction term was significant. This result means that in-service and pre-service teachers did not differ in how they responded to the characteristics of the students. The goodness of fit of Model 3 was lower than that of Model 1 and Model 2, which is due to the third significant predictor.
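The reported odds ratio follows from the regression coefficient by exponentiation. The sketch below applies this to the main-effect coefficients of the Model 3 equation. It assumes that POS is coded so that the coefficient compares in-service with pre-service teachers, which matches the interpretation given in the text but is not spelled out in this passage. All names are illustrative.

```python
import math

# Main-effect coefficients (logit scale) from the Model 3 regression equation.
model3_main_effects = {
    "GPA": 1.75,
    "ADHD": -0.47,
    "Behavior": 1.75,
    "Gender": 0.28,
    "POS": -1.34,  # assumed coding: in-service versus pre-service teachers
}

# An odds ratio is the exponentiated logistic regression coefficient.
odds_ratios = {name: math.exp(b) for name, b in model3_main_effects.items()}

# "About four times lower odds" is the reciprocal of the odds ratio for POS.
pos_odds_ratio = odds_ratios["POS"]
pos_reciprocal = 1.0 / pos_odds_ratio

print(round(pos_odds_ratio, 2))   # 0.26
print(round(pos_reciprocal, 1))   # 3.8
```

Note that an odds ratio describes a ratio of odds, not of probabilities, so "four times lower" refers to the odds of a highest-track recommendation.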

Finally, a manipulation check was conducted, which examined whether the participants remembered that the descriptions of the students differed across some characteristics. The characteristics considered were the students’ GPA, their ethnic origin, their psychiatric diagnosis, and their gender. It was expected that the participants would recognize the variation of GPA, diagnosis, and gender, but not of ethnic origin, since the latter variable had not been manipulated in the vignettes. Chi-square tests revealed that more participants than expected by chance remembered the variation of GPA, χ² (1) = 43.87, p < 0.001; diagnosis, χ² (1) = 14.70, p < 0.001; and gender, χ² (1) = 10.52, p = 0.001. Correspondingly, fewer participants than expected by chance (falsely) remembered the variation of ethnic origin, χ² (1) = 69.70, p < 0.001.
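The p values of the manipulation check can be reproduced from the reported test statistics. For one degree of freedom, the upper-tail probability of a chi-square statistic x equals erfc(√(x/2)), which the following sketch uses with the Python standard library. The function name is illustrative, and the underlying response counts are not reported in this passage, so only the statistics themselves are used.

```python
import math


def chi2_sf_df1(statistic):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Chi-square statistics (df = 1) as reported for the manipulation check.
reported_statistics = {
    "GPA": 43.87,
    "diagnosis": 14.70,
    "gender": 10.52,
    "ethnic origin": 69.70,
}

check_p_values = {name: chi2_sf_df1(x) for name, x in reported_statistics.items()}

for name, p in check_p_values.items():
    print(f"{name}: p = {p:.3g}")
```

The same helper applies to the Wald statistics of the simple slope tests, which also have one degree of freedom.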

Discussion

The aim of the current study was to find evidence for possible effects of labeling students as having ADHD on teachers’ decisions for a secondary-school track. Theoretical considerations (Casper et al., 2010; Fiske & Neuberg, 1990) and empirical findings (Metzger & Hamilton, 2021; Walker et al., 2008) gave reason to assume that ADHD-diagnosed students would have a lower probability of receiving a recommendation for the highest and most prestigious track in German secondary education than students without an ADHD diagnosis.

As expected, logistic regression analyses produced a large significant effect of students’ achievement, which was even larger when interaction terms were included in the regression model. It was also assumed that students’ behavior would influence teachers’ decisions. This hypothesis was also confirmed by logistic regression analysis, which yielded an effect similar in size to that of the students’ achievement. Furthermore, it was hypothesized that students’ gender would affect teachers’ decisions. Although girls were indeed slightly more likely to get a highest-track recommendation than boys, the hypothesis was rejected, since the effect obtained was not significant. With respect to the labeling of students as having ADHD, the direction of the effect obtained was consistent with the hypothesis: students with ADHD were 17% less likely to be recommended for the highest track than those without ADHD. However, this effect, too, was not statistically significant, meaning that the labeling of students as having ADHD made no significant difference in teachers’ judgments. Although the effect sizes (i.e., the odds ratios) differed between the regression models, no model yielded either a significant main effect of the ADHD diagnosis or a significant interaction with ADHD.

Besides the main effects, three interaction effects were expected to occur. First, an effect of the ADHD diagnosis on teachers’ recommendations should be more pronounced in boys than in girls. Second, it was assumed that the effect of the ADHD diagnosis would be stronger in students showing rather negative social and working habits in school than in students whose school-related behavior was rather positive. Both hypotheses reflected the assumption that stereotypes regarding ADHD would be elicited more easily and would guide teachers’ decision-making more strongly when the characteristics of the students are in line with the contents of the stereotype (Casper et al., 2010). In this context, students behaving inappropriately or being male should fit the contents of the ADHD stereotype well, whereas students with positive school-related habits or girls should provide a poorer fit and should therefore inhibit the activation of the ADHD stereotype. Contrary to the assumptions, both interaction effects failed to become significant. Neither the behavior nor the gender of the students affected the activation of an ADHD-related stereotype.

The absence of both a main effect of ADHD diagnosis and ADHD-related interaction effects on the participants’ judgments suggests that either the participants were free of stereotypes related to ADHD, or that stereotypes were present but could not be activated by the instruction and the material presented to them. Previous studies, however, have obtained similar results. For instance, Cornett-Ruiz and Hendricks (1993) presented teachers with videotapes that depicted the behavior of children either with or without an ADHD diagnosis. Additionally, the children shown were labeled as either having an ADHD diagnosis or not having this diagnosis (independent of their actual diagnosis). The task of the participating teachers was to rate their impression of the presented children and to predict the children’s long-term success in school. The authors found that whereas the actual behavior affected the teachers’ ratings and predictions, the labeling had no effect. Cornett-Ruiz and Hendricks (1993) reasoned that their participants’ exposure to and experiences with ADHD children might have mitigated negative responses and instead facilitated judgments of these children based on individual characteristics rather than stereotypical presumptions. This reasoning is supported by results from studies in which labeling effects regarding ADHD were related to the participants’ experience with ADHD-diagnosed individuals. For example, Ohan et al. (2011) showed that teachers’ experience with teaching ADHD students reduced the effect of ADHD labels on teachers’ willingness to support students with class-based behavioral programs. Similarly, Barr and Bracchitta (2008) demonstrated in a sample of undergraduate students that the frequency of contact with individuals with behavioral disabilities was negatively associated with stereotypical misconceptions about the behavior of disabled individuals. In general, contact seems to have beneficial effects on reducing stereotypes and prejudice (Corrigan & Shapiro, 2010). Indeed, most of the participants of the current study reported having had contact with ADHD children, and this contact might have inhibited stereotypical judgments.

However, and rather unexpectedly, two interaction effects related to students’ behavior were significant or close to significant. First, the behavior × GPA interaction revealed that the behavior of the students mattered more for teachers’ decisions when the students were low rather than high achievers. Second, the behavior × GPA × gender interaction qualified the former interaction, as it showed that the behavior × GPA interaction was dependent on student gender. More specifically, for boys, behavior mattered less when they were high rather than low achievers. For girls, however, the opposite was the case: behavior affected teachers’ decisions more when girls were high rather than low achievers.

This triple interaction might give some insight into the cognitive processes of the participants in this study. The way student gender affected teachers’ decisions might be related to what teachers usually think about and expect from boys and girls in the classroom. As studies have shown, girls show on average a more positive attitude toward school than boys (OECD, 2004), they enjoy going to school more than boys (Segeritz et al., 2010; Van Ophuysen, 2008), and they show more positive approaches to learning such as attentiveness and task persistence than boys (Ready et al., 2005). Teachers are usually well aware of these differences between boys and girls (Åhslund & Boström, 2018) and therefore are likely to hold different expectations with regard to school-related habits for boys and girls (Reyna, 2000; Siegle & Reis, 1994). According to the significant gender-related effects that were found in this study, participants’ stereotypical thinking about boys and girls appeared to be dependent on their achievement and their school-related behavior. More precisely, when the participants were presented with a high-achieving female student who was described as behaving negatively, the stereotype about girls might have been violated, because showing inappropriate social and working habits contradicts the stereotypical image of a girl (Koenig, 2018). The violation of the girl-specific stereotype might have caused teachers to devalue her achievement and hence lowered her chance of getting a highest-track recommendation, compared to a high-achieving but well-behaving female student. However, with high-achieving male students, the difference between appropriately and inappropriately behaving individuals was less pronounced. The results of the present study indicate that boys’ school-related behavior did not violate the participants’ stereotypical expectations to the same extent as that of girls. There is evidence in the literature that violation of gender-related stereotypical expectations leads to lower ratings of both boys and girls, but to a greater extent for stereotype-violating girls than for stereotype-violating boys (Sullivan et al., 2018).

In Model 3, the assumed interaction between the participants’ occupational status and the ADHD diagnosis did not occur. However, in-service teachers were found to be much less prone than pre-service teachers to endorse highest-track recommendations for the students.

Literature about teacher assessment literacy may give some hints about possible reasons. Assessment literacy refers to teachers’ knowledge about basic principles of sound assessment practice (Paterno, 2001). Several studies have compared the assessment literacy of pre-service and in-service teachers. For instance, Mertler and Campbell (2005) found that in-service teachers scored higher than pre-service teachers (meaning they were more assessment literate) on their classroom assessment literacy inventory. The superiority of in-service over pre-service teachers was obtained predominantly on two scales of the inventory, measuring (a) the ability to analyze assessment results to identify students’ strengths and weaknesses, and (b) using assessment results when making decisions about individual students. Hence, with respect to assessment literacy, the in-service teachers of the study at hand were presumably better able to identify the students’ strengths and weaknesses from the grades presented in the school reports and to use these grades to make an appropriate decision that was best suited to the needs of the students, than were the pre-service teachers.

Moreover, studies have shown that pre-service teachers tend to evaluate students’ performance from a student perspective, meaning that they focus on what it is like to be assessed, whereas in-service teachers rather apply a teacher perspective, meaning that they focus on what it is like to assess others (Smith et al., 2014). Since pre-service teachers have had limited field experience, their beliefs about assessment are more likely to arise out of personal and recent experiences as students, not as teachers (Crossman, 2004). Hence, in comparison to in-service teachers, pre-service teachers might have different and perhaps naïve beliefs about assessment (Chen & Cowie, 2016), which might have led them to more lenient track decisions.

The results of Model 3 regarding the absence of significant interaction effects between the participants’ occupational status and the students’ characteristics are in line with some previous results. For instance, Liang and Gao (2016) found no differences between pre-service and in-service teachers concerning their knowledge of ADHD and their attitudes toward students with ADHD. Similarly, Jerome et al. (1999) and Bekle (2004) found that both pre-service and in-service teachers possessed sound information about ADHD and shared similar attitudes. Moreover, knowledge about ADHD was positively correlated with (positive) attitudes toward ADHD in both groups. Additionally, in the Bekle (2004) study, both groups expressed interest in receiving more training to help manage ADHD children in the classroom.

Overall, the absence of the ADHD-related effects may also be due to a social desirability bias. Social desirability bias refers to the tendency of participants to choose responses they believe are more socially acceptable rather than choosing responses that truly reflect their thoughts and emotions (Grimm, 2010).

Limitations

The results of the current study should be interpreted in light of three limitations. First, although the actual number of participants was slightly higher than the recommended number provided by the power analysis, raising the number of participants would likely have produced more significant results. This seems to be particularly true for the main effects obtained regarding students’ gender and ADHD diagnosis.

Second, grades were not associated with specific school subjects. This might have been confusing for some participants, as in practice grades are always related to school subjects. However, the primary aim of the study was to examine whether stereotypical attributes of the students affected the participants’ judgments. If grades had been assigned to specific school subjects, students would likely have been judged differently, based on the grades in distinct school subjects. For instance, good grades in math could have led the participants to favor boys over girls (Cvencek et al., 2015), whereas girls would likely be favored over boys when grades were good in languages (Lorenz et al., 2016). Hence, possible effects related to the weighting of school subjects would have overshadowed or even amplified the effects of the stereotypical attributes of the students.

Third, since the participants were presented with vignettes displaying information about fictitious students, the external validity of the study might be limited. That is, decisions made by teachers in the classroom might differ from decisions made by teachers in the laboratory or in an online experiment. Effects obtained in experiments usually are larger than effects obtained in non-experimental field studies (e.g., Lovakov & Agadullina, 2021) because in natural settings, many more variables interfere and interact with the variables that are controlled for. Hence, the effects obtained in this study might not generalize to real settings, at least with respect to their size.

Conclusions and implications

Stereotypes related to ADHD are deeply rooted in society. They are displayed and experienced by many individuals, particularly by teachers and students. Their impacts, as shown by numerous scientific studies, can be devastating to those affected (Nguyen & Hinshaw, 2020). This study, by contrast, sheds a positive light on teachers. It shows that, at least in this sample, in-service and pre-service teachers were not prone to stereotype students according to a label presenting them as diagnosed with ADHD. Instead, the participants of the current study met their obligations when recommending primary-school students, presented in vignettes, for one of two possible tracks in secondary school. That is, they judged the students first and foremost by their achievement, represented by their grade point average, and by their behavior. Yet, despite official regulations guiding school-placement recommendations in all German federal states, the participants valued achievements as well as the behavior of male and female students differently. In particular, behavior mattered more for girls than for boys when they were high achievers. This result is likely to be affected by gender stereotypes held by the participants. Thus, the recommendations made by the teachers were not bias-free; however, the bias was rather small compared to similar studies (e.g., Author et al., 2019).

The results of the present study are therefore encouraging. Perhaps progress has been made, and teachers are now less influenced by diagnostic labels than they are often supposed to be (Koonce et al., 2004; Ohan et al., 2011). Rather than being biased by a label, the evidence provided by the present study suggests that both achievement and behavior of students have a stronger influence on subsequent judgments than the absence or presence of a mere label. Hence, the authors abstain from doing what is often done at the end of a paper, namely recommending teacher training to overcome social stereotypes. Instead, it is merely recommended to do more research on this topic, both in the laboratory and in the field, to monitor teachers’ (and pre-service teachers’) proneness to stereotype students.