Development and psychometric evaluation of an instrument measuring subjective definitions of inclusion (FEDI)

ABSTRACT Within educational and social discourses, the term ‘inclusion’ has various meanings. In both research and practice, there is no official definition of inclusion. Nevertheless, subjective definitions affect pedagogical acts. We developed the Definitions of Inclusion Questionnaire [Fragebogen zur Erfassung subjektiver Definitionen von Inklusion] (FEDI) as an economical instrument that takes subjective definitions of inclusion into account. This paper describes the construction and first psychometric evaluation of the questionnaire, which 513 participants with various professional backgrounds completed. We used exploratory and confirmatory factor analyses on two different subsamples to identify the measurement model. We found a three-factor structure with acceptable-to-good fit measures and acceptable reliability (CR from .81 to .87). Small to medium correlations of the FEDI scales with attitudes towards inclusion and teachers’ sense of efficacy suggest discriminant validity. Perspectives for further research and implications for practice are discussed.


Introduction
As Grosche (2015) and Piezunka, Schaffus, and Grosche (2017) argue, no distinct, accurate, and overall valid definition of inclusion (or inclusive education) exists, especially in the German context, which leads to problems for practitioners and researchers when planning, coordinating, and discussing inclusion-oriented processes (see also Werning 2010). Furthermore, discussing inclusion from different perspectives of understanding without acknowledging the differences may lead to the depreciation of others' views (Grosche 2017a, 2017b). Although numerous papers have tried to define inclusion systematically (Ainscow, Booth, and Dyson 2006; Farrell et al. 2007; Fuchs and Fuchs 1998; Göransson and Nilholm 2014; Kavale and Forness 2000; Loreman 2014), definitions remain diverse.
Data from inclusion research containing descriptions of, attitudes towards, or beliefs about inclusion may be influenced by implicit or subjective definitions. Lüke and Grosche (2018c) showed that participants' attitudes towards inclusion are highly influenced by the perceived attitudes of the surveying organisation. Although this relates to social desirability, it indicates that anticipated, but not explicitly given, definitions of inclusion may bias findings.
'Subjective' definitions refer to subjective theories that Mandl and Huber (1983, 98) define as a 'system of cognitions that in its structure is seen as comparable to a scientific theory' on a specific topic. Until now, the subjective definitions of inclusion have not been considered sufficiently in research (Przibilla, Linderkamp, and Krämer 2018).
Several studies have used qualitative approaches to gather data about subjective inclusion definitions. Makoelle (2014) identified three ways to define inclusion, and Przibilla, Linderkamp, and Krämer (2018) identified 27 categories for teachers' subjective definitions of inclusion and arranged them into nine dimensions.
Another qualitative study by Piezunka, Schaffus, and Grosche (2017) focussed on definitions from expert interviews with researchers in the field of inclusion. The interviews first asked, 'What does inclusion mean to you?' This was followed by questions covering goals of inclusion, forms of legitimation of inclusion, distinction from integration, inclusion target group(s), inclusion feasibility, inclusion practice, and a one-sentence summary of inclusion.
These dimensions served as categories for qualitative content analysis, from which Piezunka, Schaffus, and Grosche (2017) identified four distinct definitions: (1) 'inclusion as implementing the UN-CRPD', (2) 'pragmatic definition with a focus on supporting students' individual academic progress', (3) 'participation/acknowledgement/well-being', and (4) 'inclusion as a utopia'. While these four definitions are distinct from each other, they share a consensual core (inclusion as overcoming discrimination), 'which makes a hierarchical development within single definitions' (Piezunka, Schaffus, and Grosche 2017, 216; authors' translation). Definition (1) is the least complex, mostly referring to the placement of people with disabilities, and definition (4) is the most complex, describing people as individuals who cannot be divided into separate groups. These findings resemble the four definitions that Göransson and Nilholm (2014) found in their review of international literature (see Table 1), which they later replicated (2017).

Table 1. Comparison of the empirical findings from Piezunka, Schaffus, and Grosche (2017) and the theoretical review from Göransson and Nilholm (2014).

Piezunka, Schaffus, and Grosche (2017) | Göransson and Nilholm (2014)
(1) 'inclusion as implementing the UN-CRPD' | (A) Placement definition: inclusion as placement of pupils with disabilities/in need of special support in general education classrooms
(2) 'pragmatic definition with a focus on supporting students' individual academic progress' | (B) Specified individualised definition: inclusion as meeting the social/academic needs of pupils with disabilities/pupils in need of special support; (C) General individualised definition: inclusion as meeting the social/academic needs of all pupils
(3) 'participation/acknowledgement/well-being' | (D) Community definition: inclusion as creation of communities with specific characteristics (which could vary between proposals)
(4) 'inclusion as a utopia' |
However, detailed examination finds differences between Piezunka, Schaffus, and Grosche (2017) and Göransson and Nilholm (2014). While both studies identify placement as the least complex and least comprehensive definition, the pragmatic definition of the former includes the meaning of both the specified and general individualised definitions of Göransson and Nilholm (2014). In contrast, the community definition of Göransson and Nilholm (2014) includes both the 'participation/acknowledgement/well-being' and 'inclusion as a utopia' definitions from Piezunka, Schaffus, and Grosche (2017).
Additional research by Krischler, Powell, and Pit-Ten Cate (2019) draws on the framework from Göransson and Nilholm: they found that participants' definitions corresponded with their attitudes towards inclusion. An open question asked participants to define inclusion, and the answers were categorised using the first three definitions of Göransson and Nilholm. Furthermore, participants filled out a quantitative questionnaire measuring attitudes towards inclusion. Differences in attitudes across the three groups (placement definition, specified individualised definition, and general individualised definition) were analysed using an ANOVA, which 'indicated differences in attitudes for people providing different definitions of inclusion' (Krischler, Powell, and Pit-Ten Cate 2019, 639):

More specifically, people defining inclusion according to the 'General individualised' definition held significantly more positive attitudes than the people defining inclusion according to the 'Specified individualised' definition […] and to the 'Placement' definition, […]. Additionally, people defining inclusion according to the 'Specified individualised' definition held significantly more positive attitudes than the people according to the 'Placement' definition […]. (Krischler, Powell, and Pit-Ten Cate 2019, 639)

Thus, the findings from Krischler, Powell, and Pit-Ten Cate (2019) imply that we must consider subjective definitions of inclusion when analysing how attitudes towards inclusion work and develop. Scheer (2020) conducted qualitative interviews with school principals about their leadership roles for inclusive education. The results from qualitative content analysis indicated that principals who provided different definitions of inclusion also provided different reasoning for their measures to foster inclusive school development.
Although the framework from Piezunka, Schaffus, and Grosche (2017) was used in this study, the 'pragmatic definition' had to be split into two categories ('focused on various lines of difference' vs. 'focused on special educational needs'). This reflects the 'general individualised' vs. 'specified individualised' definitions from Göransson and Nilholm (2014). Besides the finding that subjective definitions of inclusion seem to be powerful in actual practice, this also indicates that the distinction between a 'wide' understanding of inclusion ('focused on various lines of difference' or 'general individualised') and a 'narrow' understanding of inclusion ('focused on special educational needs' or 'specified individualised') must be considered as a category of its own when developing a questionnaire to measure subjective definitions of inclusion.
In qualitative interviews with school principals, Graham and Spandagou (2011) asked the participants about their approaches to inclusive education. From their analysis, Graham and Spandagou elaborate that 'the more culturally diverse the school, the more expansive the view of inclusive education' (Graham and Spandagou 2011, 227).
One of the main findings was that the participants' conceptualisation of inclusion found its expression in the distinction between 'including them' and 'being inclusive'. Overall, the authors conclude that 'school principals' attitudes towards inclusive education and their success in engineering inclusive practices within their school are significantly affected by their own conception of what "inclusion" and "being inclusive" mean' (Graham and Spandagou 2011, 233). In another study, Salisbury (2006) combined measures of schools' inclusiveness (for example, the percentage of students educated outside the general education classroom) with qualitative interviews conducted with the schools' principals:

In our sample, the views and commitment to inclusive education appeared to affect the decisions rendered by principals as they guided the development of their school's service delivery model. Several principals in our sample chose to view inclusive education as an agenda for reform, whereas others saw it as an exercise in compliance with LRE [least restrictive environment] provisions. (Salisbury 2006, 79-80)

Furthermore, Salisbury concluded that the 'absence of a consensus definition of inclusive education in the field makes comparisons across studies difficult' (Salisbury 2006, 81). From our point of view, this implies the need to integrate measures of participants' conceptualisations of inclusion within empirical studies on inclusive education. Starczewska, Hodkinson, and Adams (2012) concluded from interviews with teachers about their conceptualisation of inclusion and integration that inclusion seems to be 'employed at the level of policy rhetoric […] [and] one of those educational buzzwords that […] says everything but says nothing' (Starczewska, Hodkinson, and Adams 2012, 168). They found that teachers, when asked about inclusion, experienced terminological ambiguity and struggled to provide a substantive definition.
However, when asked about integration, the participants' answers included more concrete measures. Thus, we can conclude that teachers should be provided with example statements so that they can evaluate how these statements reflect their understanding of inclusion. For research on inclusive education, this justifies the use of a rating scale questionnaire like the instrument described in our paper.
While the studies described in this section provide in-depth insights into subjective inclusion definitions and their importance, there is a need for an economical quantitative instrument that can capture differences in subjective definitions. This paper introduces the first version of the Definitions of Inclusion Questionnaire [Fragebogen zur Erfassung subjektiver Definitionen von Inklusion] (FEDI; Egener et al. 2019; Scheer et al. 2020) and its psychometric properties. Initially, this questionnaire was developed to be independent of the personal and professional backgrounds of the respondents and to be usable in a large variety of contexts (not only education). However, we see the FEDI as an important tool within the context of inclusive school development research and in research on teacher education for inclusion. In these two contexts, the questionnaire can help evaluate if and how subjective understandings of inclusion affect teachers', student teachers', and principals' attitudes and self-efficacy with respect to inclusive education, as well as actual measures for developing inclusive schools, which can be assumed according to the findings of Graham and Spandagou (2011), Krischler, Powell, and Pit-Ten Cate (2019), Salisbury (2006), Scheer (2020), and Starczewska, Hodkinson, and Adams (2012).

Research questions
This study provides the first psychometric evaluation of the FEDI, which aims to fill the persistent gap in research regarding subjective definitions of inclusion.
Since subjective definitions of inclusion should comprise several dimensions, we assume that a valid measuring instrument should have a multifactorial structure. Thus, our first research question is:

Q1: Does a multifactorial model of subjective definitions of inclusion show a better fit than a unifactorial model?
Although Varimax rotation forces the factors themselves to be uncorrelated, one might expect small correlations (r < .2) between the final FEDI subscales. Inclusion, especially inclusive education, is regulated via legislation, and legislation often pertains to human rights and ethical reasoning, which leads to interactions between different aspects of subjective definitions. However, since the factorial structure is evaluated during data analysis, we cannot predict the magnitude of the intercorrelations. Therefore, we ask:

Q2: How much do the FEDI subscales intercorrelate?

Furthermore, a valid measurement of subjective definitions of inclusion should capture descriptive cognitions rather than evaluative beliefs and attitudes. However, some interference may occur. For instance, a prior study from Krischler, Powell, and Pit-Ten Cate (2019) showed a medium main effect of participants' definitions of inclusion on their attitudes. Furthermore, attitudes towards inclusion are associated with self-efficacy (Scheer et al. 2015; Urton, Wilbert, and Hennemann 2014; Savolainen, Malinen, and Schwab 2020). Thus, for evaluating discriminant validity, we expect small to medium correlations between FEDI scores and both attitudes towards inclusion and teachers' self-efficacy. Therefore, our third and fourth research questions are:

Q3: How do the FEDI subscales correlate with attitudes towards inclusion?

Q4: How do the FEDI subscales correlate with teachers' self-efficacy?

Sample and procedures
Using internal mailing lists from several German universities, we emailed an invitation to researchers, lecturers, and students from all faculties with a hyperlink to our questionnaire on LimeSurvey. Additionally, practitioners in the field of inclusive and special education recruited participants with a school-related perspective (QR code on handouts and posters). A total of 513 persons (417 female, 93 male, 3 diverse) participated. Many were university students (n = 426), of whom most were pursuing a teaching degree (n = 399). From those, the majority were studying special education (n = 179). See Tables 2 and 3 for further details.

Measures
Subjective definitions of inclusion questionnaire (FEDI)

Item pool. To construct our instrument, we used the three most general and descriptive dimensions of inclusion that Piezunka, Schaffus, and Grosche (2017) used for their interview guidelines and formulated possible manifestations for each. These guidelines also reflect international research (Krischler, Powell, and Pit-Ten Cate 2019; Nilholm and Göransson 2017; Waitoller and Artiles 2013). We formulated the following dimensions:

Goals and aims. These are the goals and aims that a person associates with the term inclusion, such as 'anti-discrimination' (prefix for items: ZA for 'Ziele: Antidiskriminierung'), 'effective (educational) support' (prefix: ZF for 'Ziele: Förderung'), 'access to regular schools' (prefix: ZR for 'Ziele: Regelschule'), and 'social and political participation' (prefix: ZT for 'Ziele: Teilhabe').
Lines of difference. These are the lines of difference addressed by the term inclusion. For example, a person may feel that the term inclusion addresses 'disability' (prefix: DLB for 'Differenzlinie: Behinderung'), 'several lines of difference and/or intersectionality' (prefix: DLU for 'Differenzlinie: Unterschiedliche'), or 'overcoming social construction of categories' (prefix: DLD for 'Differenzlinie: Dekategorisierung').
Using the resulting ten theoretical aspects, we formulated 35 descriptive, objective statements. These statements comprise the pool of Likert-like rating items for the questionnaire. We applied a five-point rating scale with full verbalisation to the items (0 = not true, 1 = somewhat not true, 2 = partially true, 3 = rather true, 4 = true).
Pilot version. We administered the first draft to 10 university students and 20 field experts. They were asked to answer the questionnaire and give written feedback on all and/or on single items.
First revision. We revised the questionnaire based on the pilot feedback and response patterns. This version contained 39 items (see Table 4; the German items and English translation are presented in the electronic supplementary material, ESM-1.docx).
Final version. During analysis, the questionnaire was reduced to the final 15-item version (see subsection 'Data analysis' for details). For this, we first eliminated all items with a relative information content H < .75. Then, after a first run of factor analysis, we incorporated the five items per factor with the highest loadings into the final questionnaire (items listed in Table 5). This version is licensed under CC-BY 4.0 (Egener et al. 2019; for an English version, see Scheer et al. 2020).

Table 4 (excerpt of item wordings):
[…] Disabilities.
LR2 Inclusion is a legal requirement that must be met.
LR3 Inclusion means implementing the legal right to participation and non-discrimination.
LR4 Inclusion means implementing the legal right to education in the general school system.
Goals & aims of inclusion
Anti-discrimination
ZA1 The aim of inclusion is to reduce discrimination.
ZA2 The aim of inclusion is that no one is excluded.
ZA3 The aim of inclusion is that all people are treated equally.
ZA4 The aim of inclusion is that all people have equal rights.
Effective (educational) support
ZF1 The aim of inclusion is that children receive optimal support based on their individual skills.
ZF2 The aim of inclusion is that every child can learn at their own pace.
ZF3 The aim of inclusion is to tailor teaching to the needs of each child.
Access to regular schools
ZR1 In an inclusive school system, every child/adolescent has access to a general school.
ZR2 In an inclusive school system, as few of the children with disabilities as possible attend a special school.
ZR3 There are no special schools in an inclusive school system.
ZR4 In an inclusive school system, all children and adolescents attend school in their neighbourhood.
ZR5 An inclusive school system is an integrated school system.

Teachers' Sense of Efficacy Scale (TSES)
The TSES (Tschannen-Moran and Woolfolk-Hoy 2001) was administered to participants who had a school-related professional background (n = 420) using a German translation by Sung and Melzer (2014).

Data analysis
All data analysis occurred in the R language (R Core Team 2019) using the RStudio environment (RStudio 2019). See the electronic supplementary material ESM-3.docx for the full R input. First, we evaluated the descriptive statistics of all FEDI items. Due to the ordinal level of the raw data, we used category frequencies (0 = not true, 1 = somewhat not true, 2 = partially true, 3 = rather true, 4 = true), the median (Mdn.), and the inter-quartile range (IQR) as descriptive measures. Item statistics were computed using the descript function from the ltm package in R (Rizopoulos 2018). Additionally, we analysed the items' relative information content H as provided by Eid and Schmidt (2014):

H_i = −(Σ_j p_ij · log₂ p_ij) / log₂ k_i

where p_ij is the relative frequency (probability) of category j of item i, and k_i is the number of categories of item i. Items with H < .75 were eliminated from the questionnaire before further analysis.

Note to Table 5: SSB = school system-based perspective, HEB = human rights/ethics-based perspective, OAB = outcomes/achievement-based perspective. 0 = 'not true', 1 = 'somewhat not true', 2 = 'partially true', 3 = 'rather true', 4 = 'true'. Mdn. = median, IQR = interquartile range, λ_CEFA = loadings from 2nd-run CEFA, h²_CEFA = communalities from 2nd-run CEFA, λ_CCFA = loadings from CCFA, r_it = corrected item-total correlation.
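The relative information content described above (an entropy-based measure of how evenly responses spread across an item's categories) can be sketched in a few lines of Python. The function name and the example response counts are illustrative, not taken from the FEDI data:

```python
import math

def relative_information_content(frequencies):
    """Relative information content H of an item (after Eid & Schmidt 2014).

    `frequencies` holds the absolute response counts per category.
    H = -(sum_j p_j * log2(p_j)) / log2(k), where k is the number of
    categories and p_j the relative frequency of category j;
    0 * log2(0) is treated as 0. H ranges from 0 (all answers in one
    category) to 1 (answers spread uniformly over all categories).
    """
    total = sum(frequencies)
    k = len(frequencies)
    entropy = 0.0
    for count in frequencies:
        p = count / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy / math.log2(k)

# A uniformly answered 5-point item reaches the maximum H = 1.0;
# a heavily skewed item (illustrative counts) falls below the .75 cut-off:
print(relative_information_content([10, 10, 10, 10, 10]))  # 1.0
print(relative_information_content([80, 10, 5, 3, 2]))     # < .75
```

Under the authors' rule, the second item would be dropped before factor analysis because most respondents chose the same category, so the item carries little information.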
Next, we split the original sample (see ESM-2.csv electronic supplementary material for the full dataset) to minimise any bias due to repeated testing. To generate two randomised but comparable subsamples, we used stratified randomisation.
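One way such a stratified random split can be implemented is sketched below, using only the Python standard library. This is an illustrative reconstruction, not the authors' actual R procedure; the function name and seed are arbitrary:

```python
import random
from collections import defaultdict

def stratified_split(rows, stratum_of, seed=2019):
    """Randomly split `rows` into two halves, balanced within each stratum.

    Within every stratum (e.g. occupational group), the rows are shuffled
    and then dealt alternately to the two subsamples, so both halves end
    up with roughly the same stratum composition. `stratum_of` maps a row
    to its stratum label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    sample_a, sample_b = [], []
    for members in strata.values():
        rng.shuffle(members)
        for i, row in enumerate(members):
            (sample_a if i % 2 == 0 else sample_b).append(row)
    return sample_a, sample_b

# Illustrative use: 20 participants from two hypothetical groups.
participants = [(i, "special" if i < 12 else "general") for i in range(20)]
half_a, half_b = stratified_split(participants, lambda p: p[1])
```

Because the assignment alternates within each stratum, the two subsamples are comparable on the stratification variable, which is what allows the CEFA and CCFA results to be compared meaningfully.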
In the first subsample, we conducted a categorical exploratory factor analysis (CEFA) with Varimax rotation using the function fa from the psych package in R (Revelle 2019). We checked the Kaiser-Meyer-Olkin (KMO) criterion and the items' measures of sampling adequacy (MSA) based on the polychoric correlation matrix using the KMO and polychoric functions from the psych package. To determine the number of factors to be retained, we considered Cattell's scree test criterion (Cattell 1966) and Kaiser's criterion (Kaiser 1960) as subjective visual methods, and Velicer's minimum average partial (MAP) criterion (Velicer 1976) and parallel analysis (Horn 1965) as objective statistical methods, using the nfactors and fa.parallel functions from the psych package. Loadings > .4 were accepted (Field 2017). We then chose proper names for the factors according to the items' content.
The CEFA factor solution was cross-validated in subsample II using a categorical confirmatory factor analysis (CCFA) based on the polychoric correlation matrix. The model was specified and tested with the cfa function from the package lavaan (Rosseel 2012, 2018) in R; for visualisation, we used the lavaanPlot package (Lishinski 2018). We used DWLS as the estimator with NLMINB as the optimisation method. As fit measures, we evaluated the χ²/df ratio, RMSEA, SRMR, CFI, and TLI against the cut-off values provided by Moosbrugger and Schermelleh-Engel (2012), Schermelleh-Engel, Moosbrugger, and Müller (2003), Hu and Bentler (1999), and Browne and Cudeck (1992). A likelihood-ratio test (LRT) for nested models (Satorra 2000) was applied to compare the one-factor and multi-factor models, using the lavTestLRT function from lavaan.
Measurement invariance (MIV) across university students pursuing different teaching degrees was checked using the measEq.syntax function from the package semTools (Jorgensen et al. 2019), which builds on the lavaan functions. Following Svetina, Rutkowski, and Rutkowski (2020), we used theta parametrisation and the guidelines suggested by Wu and Estabrook (2016). According to Chen (2007), MIV is given if RMSEA increases by less than .015 and CFI simultaneously decreases by less than .01. Differential item functioning (DIF) was evaluated by applying a partial credit model (PCM; Masters 1982). We used the grm function from the R package pairwise to calculate the PCM (Heine 2020).
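Chen's (2007) decision rule compares the fit of two nested invariance models. A minimal Python sketch of that check follows; the dictionary structure and function name are illustrative assumptions, not part of the authors' pipeline (in practice, the indices come from lavaan's fitMeasures output):

```python
def invariance_holds(fit_less_constrained, fit_more_constrained,
                     max_rmsea_increase=0.015, max_cfi_decrease=0.01):
    """Chen's (2007) criteria for measurement invariance.

    Invariance is retained when, moving from the less constrained model
    (e.g. configural) to the more constrained one (e.g. metric or scalar),
    RMSEA increases by less than .015 AND CFI decreases by less than .01.
    Each argument is a dict with 'rmsea' and 'cfi' entries.
    """
    d_rmsea = fit_more_constrained["rmsea"] - fit_less_constrained["rmsea"]
    d_cfi = fit_less_constrained["cfi"] - fit_more_constrained["cfi"]
    return d_rmsea < max_rmsea_increase and d_cfi < max_cfi_decrease

# Illustrative values: a small deterioration in fit keeps invariance.
configural = {"rmsea": 0.068, "cfi": 0.967}
metric = {"rmsea": 0.070, "cfi": 0.965}
print(invariance_holds(configural, metric))  # True
```

Both conditions must hold simultaneously; a large jump in either index is enough to reject the more constrained invariance level.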
Since Cronbach's α underestimates the true reliability if factor loadings are not equal between indicators (Cho 2016), we also used composite reliability (Netemeyer, Bearden, and Sharma 2003):

CR = (Σ_{i=1…p} λ_i)² / [(Σ_{i=1…p} λ_i)² + Σ_{i=1…p} V(δ_i)]

where λ_i is the completely standardised loading for indicator i, V(δ_i) is the variance of the error term for indicator i, and p is the number of indicators.
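The composite reliability formula can be sketched directly from these definitions. The loadings in the example are illustrative, not the FEDI estimates:

```python
def composite_reliability(loadings, error_variances=None):
    """Composite reliability (Netemeyer, Bearden, & Sharma 2003):

        CR = (sum of loadings)^2
             / ((sum of loadings)^2 + sum of error variances)

    With completely standardised loadings, each error variance can be
    derived as 1 - loading^2, which is used as the default here.
    """
    squared_sum = sum(loadings) ** 2
    if error_variances is None:
        error_variances = [1 - l ** 2 for l in loadings]
    return squared_sum / (squared_sum + sum(error_variances))

# Five indicators with illustrative standardised loadings of .7 each:
print(round(composite_reliability([0.7] * 5), 3))  # 0.828
```

Unlike Cronbach's α, this formula weights each indicator by its own loading, so it does not penalise scales whose indicators load unequally on the factor.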

Results
After a first round of analysis of the descriptive item statistics (see electronic supplementary material ESM-4.docx for the descriptive statistics of all original items), items with H < .75 (13 items) were eliminated.
For the following round of analyses, we divided the total sample into two randomised and stratified subsamples as described above (see Table 2).
As Figure 1 shows, the point of inflection occurs at the third factor. According to Field (2017), Cattell (1966) argued to retain the factor at the point of inflection, while Thurstone argued to consider only factors to the left of the point of inflection. In our data, two or three factors could thus be extracted. Kaiser's criterion, however, indicated that two factors were sufficient. Since these criteria are discussed as subjective and unreliable, either parallel analysis or Velicer's MAP is recommended (Field 2017, 790; Bühner 2011, 322; Wood, Tataryn, and Gorsuch 1996; O'Connor 2000; Velicer 1982, 1986). Furthermore, extracting too many factors is preferable to extracting too few (Bühner 2011, 328). However, with ordinal data, MAP might underestimate the true number of factors (Garrido, Abad, and Ponsoda 2011). In our study, the MAP criterion supported three factors while parallel analysis supported five. Since the solutions with four and five factors (see ESM-3.docx) led to results that could not be interpreted meaningfully, we retained three factors as a compromise between the visual and statistical methods.
After running a first round of CEFA (see electronic supplementary material ESM-5.docx for the loadings and communalities of all items included), we took the five items per factor showing the highest loadings and re-ran the CEFA with the resulting 15-item set. Factors 1 and 3 each had one item with a loading < .5 (see Table 5). The communalities of the items ranged from .274 to .658, with seven items having a communality < .4. Factor 1 ('school system-based perspective', SSB) explained 14.32% of the total variance, Factor 2 ('human rights/ethics-based perspective', HEB) 13.89%, and Factor 3 ('outcomes/achievement-based perspective', OAB) 15.82%.
Due to the theoretical assumption that subjective definitions of inclusion are composed of the emphases placed on different possible perspectives, this model should fit better than a unidimensional model. The unidimensional model showed a poor fit with a χ²/df ratio of 9.499, RMSEA = .183 (90%-CI = [.171, .194]), SRMR = .149, CFI = .776, TLI = .739; our three-dimensional model had a superior fit. An LRT for nested models (Satorra 2000) indicated that this difference was significant with Δχ²(df = 3) = 191.55, p < .001.

Measurement invariance (MIV) across groups (explorative)
We used an ad-hoc-sample across different professional backgrounds because the FEDI should measure subjective definitions of inclusion independently from professional background. To ensure this, MIV must be evaluated. Since many professional backgrounds were represented by few participants, we evaluated MIV across university students studying special needs education and those studying general education.
The measurement model for all university students attending a teaching degree programme (n = 329, see Table 3) showed an acceptable fit with a χ²/df ratio of 2.832 (χ²(df = 87) = 246.342), RMSEA = .068 (90%-CI = [.058, .078]), SRMR = .070, CFI = .967, TLI = .960, which is even better than the fit within the total subsample II. According to the cut-off values suggested by Chen (2007), configural, metric, and scalar MIV across the groups of students studying special needs education (n = 150) and general education (n = 179) was given (see Table 6). As Figure 3 indicates, two items from the OAB scale ('Inclusion focuses on the learning development of the students' [LO2] and 'Inclusion focuses on the motor development of the students' [LO4]) show DIF and should therefore be considered for revision in further studies.

Figure 2. The model specified for the CCFA with empirical standardised coefficients and covariances.

Item and scale properties
Overall, the items showed acceptable corrected item-total correlations (see Table 5). SSB and HEB showed slightly skewed distributions, and OAB was slightly peaked (Figure 4), but skewness and kurtosis (Table 7) were acceptable for all three scales (Hair et al. 2017). However, the Shapiro-Wilk test indicated a significant deviation from the normal distribution with W = .98, p < .001 for the scale SSB, W = .98, p < .001 for the scale HEB, and W = .98, p < .001 for the scale OAB. Since this test for normality is sensitive to sample size, the results should not be overinterpreted. Furthermore, we found acceptable results for all scales' internal consistency, ranging from α = .70 to .79 (Table 7). Composite reliability calculated from the CCFA sample was also acceptable (CR from .81 to .87).

Figure 3. DIF-analysis of the FEDI items.
To evaluate convergent and divergent validity, we analysed the intercorrelations of the FEDI scales and their correlations with attitudes towards inclusion (PREIS) and teachers' sense of efficacy (TSES). In our data, SSB and OAB did not correlate. HEB showed small correlations with both other perspectives. Furthermore, the FEDI should measure subjective definitions of inclusion separately from attitudes or other evaluative cognitions. Thus, we assumed small to medium correlations would exist between the FEDI scales, attitudes towards inclusion, and teachers' sense of efficacy, especially for definitions that imply an ethical imperative. Overall, our data showed small correlations between the scales and a medium correlation between attitudes towards inclusion and HEB (see Table 8).

Note: SSB = school system-based perspective, HEB = human rights/ethics-based perspective, OAB = outcomes/achievement-based perspective. LL = lower limit of the 95%-CI, UL = upper limit of the 95%-CI. CR = composite reliability.

Discussion
CEFA and CCFA provide evidence that subjective definitions of inclusion are multidimensional constructs (Q1). This confirms our belief that these subjective definitions are influenced by a variety of individual factors and sub-definitions. Furthermore, we showed MIV across different groups of university students. This is an early indication that the FEDI measures subjective definitions of inclusion independently from professional background. However, this requires further evaluation with structured sampling strategies. An initial DIF-analysis indicated that two items (LO2, LO4) did not measure equivalently between groups. This must be evaluated further with a more sophisticated sampling strategy considering different professional backgrounds.
We found small and medium correlations between the three FEDI scales and small correlations between the FEDI scales and teachers' sense of efficacy. Attitudes towards inclusion were substantially correlated with only the HEB of the FEDI. This medium correlation can be explained by the normative imperative implied within an ethical or human rights-based perspective, because rejecting the idea of inclusion while defining it via ethical principles would mean to question one's own ethical integrity.
Although Cronbach's α was not extraordinarily high for any of the FEDI scales, all scales showed acceptable internal consistency, especially considering the small number of items per scale.
Our study suggests that the FEDI is an economical instrument with acceptable reliability that can be used in inclusion research and in the field. Using this instrument, practitioners can evaluate which definition of inclusion a participant adheres to and whether the teaching staff of a school share a common vision. Hence, the definitions of inclusion from the questionnaire broaden the empirical indicators for an inclusive school climate.
However, some aspects require further research evaluation. First, internal consistency and composite reliability can be used as measures of reliability but should be complemented by other methods, especially the retest method. Second, as research interest in development processes increases, the FEDI's sensitivity to change should be evaluated. Although our findings support construct validity and discriminant validity, further efforts should be undertaken to evaluate the instrument's validity.

Limitations
Some aspects of the study may influence the internal and external validity of our results. First, we disseminated the call for participation primarily via e-mail. Thus, we had an ad-hoc sample instead of a representative sample for a specified population. Although stratified randomisation to build two separate samples for CEFA and CCFA is a good way to control against repeated testing, it is still a workaround and reduces the number of participants per analysis. Therefore, replication studies must cross-validate our findings using appropriate sampling procedures.
This issue also affects the interpretation of MIV across different subgroups. We performed an explorative evaluation of MIV across two groups of university students. Since this sample was also included in the first assessment of the factorial structure, the results must be interpreted with caution and further replicated.
Finally, questionnaires on attitudes towards inclusion show a social desirability bias (Lüke and Grosche 2018a, 2018c). As our questionnaire did not use scales for controlling social desirability, we cannot completely determine whether it measures subjective definitions. Thus, we would argue for analysing this possibility in other research projects.

Conclusions
As there is no clear definition of inclusion or inclusive education, researchers cannot know on what subjective constructs their inquiries are based. With the FEDI, we provide an economical quantitative measure that shows good psychometric properties and works well across different groups. We encourage cross-validation of our findings using qualitative approaches. Furthermore, the effect of subjective definitions of inclusion on other aspects, such as attitudes, self-efficacy, and knowledge, should be replicated and further evaluated.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
David Scheer, is a junior professor for inclusive education at the Annelie-Wellensiek Center for Inclusive Education, Heidelberg University of Education.
Lea Egener, M.Ed., graduated from Paderborn University and is a student teacher at Grundschule Stieghorstschule Bielefeld.
Désirée Laubenstein is a professor at the Institute for Educational Science, Paderborn University.
Conny Melzer is a professor for special needs education in the Faculty of Human Sciences, University of Cologne.

ORCID
David Scheer http://orcid.org/0000-0002-0534-7869