Development and Validation of an Instrument for Measuring Student Sustainability Competencies

The importance of education, and Education for Sustainable Development (ESD) in particular, for achieving sustainable development is highlighted in the formulation of the Sustainable Development Goals (SDGs). Since the Brundtland Report (1987) and the Agenda 21 conference in Rio in 1992, many measures and programs have been launched. However, no widely accepted and validated assessment instruments are currently available to examine the output levels of ESD on the student side as a means to contribute to monitoring the effects of ESD initiatives. Furthermore, connections to the results of empirical educational research are often lacking. Indeed, operationalization is necessary in order to evaluate actions that foster ESD. Taking concepts of empirical educational and other relevant research findings (for example, psychology for sustainability) into account, this study develops a reliable and valid approach to measuring sustainability competencies. In this paper, novel data from a first school assessment are presented. One thousand six hundred and twenty-two students (aged from 9 to 16) participated in the survey. The paper-pencil questionnaire covers general (socio-demographic) as well as cognitive, affective, behavioral, application- and curriculum-orientated aspects of sustainability competencies. The evidence for the validity and reliability of the instrument indicates that the presented assessment tool constitutes a suitable instrument by which to measure sustainability competencies in secondary schools. The gathered insights show a path towards the operationalization of sustainability competencies to clarify the needs and achievements of ESD implementation in schools.


Introduction
Education is considered a crucial element in the shift towards greater sustainability and a more sustainable world from a very early age [1,2]. Educational efforts for this purpose are generally summarized under the term Education for Sustainable Development (ESD). Most recently, ESD was placed at the center of the 2030 Sustainable Development Agenda and has been widely recognized as a key enabler of sustainable development and an integral element of quality education [3] (p. 4). The outstanding role of ESD for sustainable development is largely consensual. However, depending on the different understandings of the ESD concept [4][5][6], there is also criticism of the term itself [7,8].
In the framework of this article, it is unfortunately not possible to go into depth regarding the details and theory of these debates. To give an example, other transformative approaches or transformational concepts of education could be taken into account when dealing with ESD in a broader sense [5,9,10].
The current state of the debate can be seen as follows: Although sustainable development and ESD are widely accepted as theoretical concepts or goal dimensions, they remain without a universally agreed upon definition [11,12].
On a policy level, ESD actions have been taken on international, national and local levels (e.g., under the UN Decade of Education for Sustainable Development (2005-2014) or the Global Action Program on Education for Sustainable Development (2014-2019)). However, the crucial question is whether all these measures and programs launched over the past 30 years were or are successful, or whether they have had the desired effect. Of course, it is not easy to define what is considered successful in education, nor how to measure success. Nevertheless, we should not shy away from approaching these important issues in the field of ESD through research if we want ESD to make a real contribution to the urgently needed changes in society. In particular, looking at the micro level and focusing on how to measure, for example, learning outcomes (see [13], p. 107) seems highly necessary and urgent. On this level, the effects of ESD on personal dispositions need to be assessed. By personal dispositions, we mean properties such as knowledge, skills, attitudes and values. However, before we look at the possibility of exploring the effects of ESD at the micro level, we first need to clarify what we mean by ESD and determine which central issues need to be considered.
In this article, the term ESD is used to describe the totality of all actions by which people seek to promote learners' sustainability competencies, i.e., enabling them to shape a sustainable development [14]. Educators in this context must tackle two outstanding tasks: First, they should determine the goals (for example learning objectives, competencies) of their pedagogical activity. Secondly, they have to decide how they want to achieve and realize these goals. The first task constitutes the prior step. Without knowing the goals, one cannot think wisely and productively about the means by which the goals can be reached.
One would think that more than 25 years after Rio de Janeiro, the goals and questions of how to approach these goals would already have been clarified. However, this is not the case. In the following, we will first draw attention to the findings that many of the objectives and competencies recommended in the field of ESD usually do not have the quality required for measurement. This condition impedes the effective development of ESD. For this reason, we have developed a framework model in which measurable goals of an ESD can be ordered and related to each other. Based on this model, we have developed and tested a measuring instrument for important ESD goals. The purpose of this article is to present a measurement instrument that could be used for analyzing student sustainability competencies with respect to the complex underlying structure of competencies and sustainability-related problems. This measuring instrument, and the first results of its testing, are the focus of this article.

The Frame-Model for Sustainability Competencies
Regarding the goals or objectives recommended for ESD, two major sources were identified. The first comprises very basic and abstract goals formulated in official international agreements; a notable example is Agenda 21. According to chapter 36.3, there is a need to change attitudes among people so that they are able to assess and address their concerns about sustainable development [1]. Further examples are the target settings for the World Decade of ESD and the World Action Program on ESD [15,16]. The second comprises goal recommendations which were mostly developed by researchers from the educational field, and which often achieve a higher degree of differentiation (see for example [17][18][19][20][21][22]). An analysis of existing ESD (learning) goals has revealed a significant deficit in terms of the operationalization of the ESD output [14,18]. On the positive side, the recommended objectives can be considered normatively well-founded, because the authors referred to accepted international agreements in the context of sustainable development.
It can still be observed that there are currently very few precise formulations of objectives and competencies in ESD that are or could be translated into measurement models and tools. This deficiency gives rise to the following problem: Without operationalization, and given the resulting lack of measuring instruments, the needs for ESD and the effects of ESD-related interventions (for example lessons, seminars, projects) cannot be determined empirically (see [18,23,24], p. 54).
How can this shortcoming be addressed? Recourse to existing and empirically proven measuring instruments of related disciplines (for example environmental psychology, psychology for sustainability, environmental sociology, science teaching, and empirical educational sciences) facilitates the integration of already operationalized facets of competencies (for example environmental knowledge, awareness, behavior) into the target setting for ESD. In addition, these operationalizations can be used as a starting point for the development and adaptation of new tools to evaluate ESD actions.
Firstly, ESD (learning) goals need to be structured and related to each other. Taking concepts of empirical educational research into account, Riess et al. [14] suggested a frame-model for the structuring of the relevant competencies and subcompetencies of ESD (see Figure 1). Based on Weinert's concept of competency [25], we determine "sustainability competencies as the overarching goal of ESD. Sustainability competencies comprise the entirety of cognitive abilities and skills as well as related motivational, volitional and social readiness in order to solve sustainability-related problems and to shape sustainable development in private, social and institutional contexts" ([14], p. 299). This is largely consistent with the following understanding of sustainability competencies: "Sustainable development and social cohesion depend critically on the competencies of all of our population - with competencies understood to cover knowledge, skills, attitudes and values", defined by the OECD Education Ministers [26] and other literature on (E)SD competencies (see for example [27,28]). However, this definition does not cover the behavioral field appropriately. Lambrechts et al. state that "competencies for SD dealing with system orientation, future orientation, personal commitment, and action taking are virtually absent" [27]. Within the context of this study, to operationalize sustainability competencies, we conceptually adopt the three domains of the trilogy of cognitive, affective-motivational, and behavioral aspects (see for example [29][30][31]). Additionally, the frame-model maintains the openness that is needed in the field of sustainable development questions to guarantee the adaptivity of the relevant sustainability competencies for each specific context.
This neutral concept of competency, which comprises the internationally widespread and accepted threefold division [32][33][34], serves as the basis of conceptualization and thereby responds to the current criticism in the competency debate, i.e., the dominance of cognitive dispositions (see for example [35]).
The model additionally distinguishes between two basic levels and one elaborated level of sustainability competencies. In the context of school learning, the focus is on cross-curricular (cross-disciplinary) competencies (Level 1) and a basic level of more subject-specific competencies (Level 2). In university contexts, a level of elaborated sustainability competencies (Level 3), including highly domain-specific as well as inter- and transdisciplinary competencies, is relevant. However, in this paper, special attention is paid to Levels 1 and 2, as they constitute the foci of interest for our researched domain, i.e., secondary schools. For further considerations concerning higher education and Level 3 of the frame-model, see for example [27,36,37].
At each level, a distinction is made between cognitive (a), affective-motivational (b), and behavioral (c) aspects, and additional subcompetencies (d). The cognitive goal dimension (Figure 1, section 1a, 2a, and 3a) ranges, for example, from the knowledge of fundamental concepts of sustainable development or the SDGs on Level 1, to basic knowledge of physical and ecological as well as social, cultural, economic, and political systems with connection to sustainability-related questions on Level 2, to knowledge of theories, methods, models, and findings from the natural and environmental sciences (for example modeling of global warming and anthropogenic environmental influences), the social sciences (for example on strategies of public sustainability management) and transdisciplinary research (for example on phases and principles of transdisciplinary research) and ethics (see for example [38][39][40]) of sustainable development on Level 3.
Figure 1 (caption, partial): On the left, the interacting nonspecific competencies (with regard to sustainability). (We gratefully thank GAIA for the permission to use the figure published before in German, see [14], as a theoretical basis for the novel results of the measurement instrument development presented hereinafter.)
In the sense of hot cognitions [41], the affective-motivational facets of sustainability competencies (Figure 1, fields 1b, 2b, 3b) include all affect-, need-, and motivation-related competency features. These include, among other things, values (such as personal acceptance of the intergenerational idea of justice or the personally favored lifestyle), attributions of responsibility or attitudes (for example in questions of consumption or mobility) on Levels 1 and 2, and mature epistemological beliefs about the relativity and situational character of empirical knowledge in important areas of sustainability on Level 3. The underlying affective-motivational traits become progressively more conscious on the way from Level 1 to Level 3. An affective goal commitment or a positively assessed sequence of actions is the core of any motivation, without which an action will not be performed.
Regarding the facets of behavior (sections 1c, 2c, and 3c), they comprise introducing and practicing sustainable behavior or helping learners to translate what they assessed to be right into concrete action. This also requires, for example, the "overwriting" of harmful routines in dealing with natural resources with new sustainable action patterns.
The dimension of subcompetencies (sections 1d, 2d, and 3d) forms the last facet of the competency conglomerate. Subcompetencies are specific cognitive abilities to solve partial aspects of sustainability-relevant problems. These include, for example, system competency (or systems thinking), i.e., the ability to solve complex dynamic problems with the help of a systemic approach, or the evaluation competency for ethical questions, i.e., the cognitive ability to make well-founded decisions in sustainability-relevant contexts and to critically reflect on one's own decisions or decisions made by others (see for example [42][43][44]). Conceptually, the sustainability-related competencies situated on Level 1 are deepened on Levels 2 and 3, i.e., they become more specific and situationally complex.
The facets of sustainability competencies to be promoted in ESD should be differentiated from the more general, non-sustainability-related competencies (see left side of Figure 1).
Nevertheless, there is a correlation between ESD and these non-sustainability-related competencies. For many of these interacting nonspecific competencies, well-developed empirical constructs exist, for example from educational psychology. A few, by no means exhaustive, examples of sustainability-unspecific skills will be given in the following. In this field, reading literacy, fluid intelligence (for example logical thinking/reasoning), general epistemological beliefs, crystallized intelligence (general knowledge), and declarative meta-memory (learning-relevant characteristics, problem solving capabilities, self-regulation and self-regulated learning strategies) can be situated. In order to initiate and sustain a learning process, there must be a corresponding motivation which then also affects the cognitive processing of the content, the epistemological beliefs, moral sagacity (for example [38][45][46][47]) and the self-efficacy expectation (expectation to perform intentional actions, for example [48]). These are competencies that are needed for learning in general, as well as for effective and just social interactions. On the one hand, these person-related prerequisites foster the development of sustainability competencies; on the other, the latter foster the former. In educational psychology, the person-related requirements for learning have been investigated for decades. Further affective-motivational prerequisites are, for example, moral reasoning, delay of gratification [49] (the ability to forgo a smaller immediate gratification in order to receive a larger gratification later), self-efficacy expectations and learning motivation [50] (implementing actions leading to a positively valued goal often requires volitional self-regulation and self-control skills [51]).
Especially for advanced learners, the acquisition of sophisticated epistemological beliefs (beliefs about the applicability, changeability, and usefulness of scientific knowledge, or the assessment of sources and scientific statements [52]) is of great importance.
Regarding social interactions in general, perspective coordination and fair argumentation skills [53], as well as moral reasoning abilities [40], are needed. For all these competencies, appropriate assessment instruments exist. These instruments have been empirically tested and validated, and can therefore also be adapted for the development of sustainability-specific instruments.

Approaches for Measuring Sustainability Competencies
Research offers many possibilities for mapping the goal dimensions of ESD. However, as stated above, there is still a significant need for developing adequate measurement instruments for the various levels of sustainability competencies. Especially for Levels 2 and 3 (see Figure 1), only a few operationalizations exist. In order to be able to estimate ESD effects on the output level, adequate tools and tasks to display, for example, the subject-specific effects of ESD in schools, are still required. At the same time, operationalization attempts and different forms of measuring instruments are already available when other research disciplines and pioneer research projects are taken into account. Some operationalization approaches in the field of ESD exist for specific regions, applying mainly qualitative methods (for example [13]). In the following, some examples where subdimensions of sustainability competencies have already been well operationalized will be listed without making any claim to comprehensiveness. The aim of this section is to give some existing examples and to show how these can be categorized using the frame-model of sustainability competencies with cognitive, affective-motivational, and behavior-related competencies and subcompetencies on each level.
With regard to the cognitive facets of sustainability competencies, precisely described and researched instruments already exist for some of the specific areas of sustainability (see for example [54][55][56][57][58], and for the more subject-specific tasks [55,58,59]). For other areas, operationalizations have yet to be developed. To capture the cognitive facets of basal subject-specific sustainability competencies (Figure 1, field 2a), it is important to look into teaching plans and curricula and to consider them in the construction of new items.
For the second domain, the affective-motivational facets of sustainability competencies (see Figure 1, field b), several operationalization approaches of elements of the attitude and value dimension can be found, for example in the Greenpeace Sustainability Barometer [60,61], the Sustainable Development Values-Scale [62], and the 2-MEV scale (also known as the Preservation and Utilization-Scale) [63][64][65][66]. Items from earlier measurement tools emerging from environmental science or environmental psychology may be helpful in the search, especially for the environmental dimension of sustainability competencies, for example Kaiser et al.'s scale for Environmental Attitude or Connectedness to Nature [57,67,68] or scales on environmental values, beliefs, and concerns, or environmental literacy [69][70][71]; see also the revised New Ecological Paradigm (NEP) Scale [72].
The third scope, the behavioral dimension of sustainability competencies (Figure 1, field c), includes, for example, sustainable practices, habits, environmental activities, behavior, conservation behaviors, and behavioral aspects of lifestyles. In a societal context, it is first and foremost about promoting the ability to act. This is exemplified in the work undertaken mainly by researchers from environmental psychology [73][74][75][76][77], for example, the General Ecological Behavior (GEB)-scale [67,68].
To capture the subcompetencies "system competency" and "evaluation competency" (for ethical questions), researchers evaluating and developing SD solutions can also revert to existing, proven tools (for example [43,44,54][78][79][80][81][82][83][84][85]). All these facets shown in the frame-model are needed to solve real sustainability-relevant problems. An approach to capturing sustainability consciousness that combines the components sustainability knowingness, attitudes, and behavior in the three dimensions of sustainable development (environmental, economic, and social) was presented by a research team from Karlstad University in Sweden [29,56]. Additionally, an important theoretical background for the test construction in the framework of our study, as well as for hypothesis formulation, was provided by studies emerging mainly from environmental psychology that dealt with the interconnections and influence patterns of environmental knowledge, environmental attitudes, and environmental behaviors (see for example [57][64][65][66][67][68][69][70][71][72][73][74][75][76][86][87][88][89][90]). Note that these terms are used for the broader concepts connected to these fields, and that a transfer from the mainly environmental perspective and origin of these research domains towards sustainability and ESD can be made, and was made, in the framework of our project and can be made in future or other projects.
For the development of our item pool, the above-mentioned existing research and theories [64][65][66][67][68][69][70][71][72][73][74][75][76][86][87][88][89][90] and other conceptual aids such as the Guiding Questions for Developing an Environmental Literacy Assessment Framework [59] were consulted. Thus far, in this project we used pre-existing items from the aforementioned instruments for the questionnaire parts capturing mostly Level 1 aspects. For Level 2, on the other hand, it was necessary to develop specific items in line with the valid curriculum and suitable for the sample to be examined (in our case, German students in the state of Baden-Wuerttemberg in the age range from 9 to 16). Linking the concepts and findings of empirical educational research and other relevant fields (e.g., implementation theory, environmental psychology) to ESD, the interdisciplinary research group of the University of Education Freiburg is conducting a quantitative study in order to evaluate the development of sustainability competencies through school assessments at the beginning and the end of the school year 2018/2019. Our specific research questions for this article were therefore: Research Question 1: What is the present state of (lower secondary) student sustainability competencies in Germany (Baden-Wuerttemberg) after the implementation of ESD as a new guiding principle? Research Question 2: What kinds of subscales can be developed and used to capture sustainability competencies?
In order to answer these questions, we resorted, as described above, to existing measuring instruments for the assessment of sustainability-relevant knowledge, attitudes, behavioral readiness, and subcompetencies to solve partial problems of sustainable development. Finally, items on those facets of sustainability competencies for which we could not find any operationalization in the literature were newly formulated. In the following section, the procedure of the construction of the test and novel data from the first assessment period will be reported. In so doing, we will also investigate whether the test adequately meets important quality criteria of a quantitative measuring instrument. In the discussion of the results, we want to explore the opportunities and limits for the further development of ESD arising from the use of appropriate assessment instruments.

Development of the Instrument and Pilot Study I
As described above, an extensive literature study of existing operationalization attempts and measurement instruments preceded the conception of the current instrument for capturing the sustainability competencies of students in secondary schools. Additionally, we conducted a detailed analysis of the new educational plan (implemented in 2016) for Baden-Wuerttemberg and studied the references that were made to ESD in each of the school subjects. Preceding the development of a new instrument for measuring subject-specific sustainability competencies (see Figure 1, field 2a), teaching plans and curricula were studied, and school experts, i.e., teachers and students, were taken into account in the construction of the instrument. The initially compiled questionnaire was given to groups of experts consisting of teachers, university professors and pupils with and without specific ESD background and from different areas of specialization (language, biology, psychology). The experts assessed the different items and questionnaire parts on, for example, their relevance, language use, and age appropriateness. After this expert feedback, the instrument was pre-tested in small groups in three different types of school. Mixed student cohorts from ten different classes covered the full age range from 5th to 8th class (aged 9 to 16). During this first pre-test phase, the research group held individual interviews and small-group discussions using the "thinking aloud" method to make sure that right answers resulted from an understanding of the task (and not from guessing). The instrument was subsequently revised based on the feedback from the students.

Pilot Studies II and III
After these first preliminary tests in small groups, the whole questionnaire underwent standardized testing twice (in February and July 2018) in a larger pilot study with two different school types and 16 classes for scale refinement purposes. The first pilot study (N_t1 = 433) aimed to examine the discriminatory power and reliability of the scales. In order to optimize item batteries for the different subscales, we analyzed ceiling effects, distractor frequencies (wrong response possibilities) and item difficulty for the cognitive items. Affective, behavioral, and intentional items were adapted on the basis of their discriminatory power and the students' feedback on problematic or unknown terms. (After having answered the questionnaire, the participants of the pilot studies had the possibility to list the questions and words they did not understand.) The second assessment of the pilot study took place within the same 16 classes (N_t2 = 412). Again, descriptive and reliability analyses were conducted, and the instrument was slightly modified by making a final selection among the items that caused difficulties and with the aim of further shortening the questionnaire. The spokespersons from the Ministry of Education, Youth and Sports Baden-Wuerttemberg, the Ministry of the Environment, Climate Protection and the Energy Sector Baden-Wuerttemberg and the Foundation for Environmental Protection (Stiftung Naturschutzfonds) approved the final version of the questionnaire.
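The item statistics named above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function names and the tiny response matrix are hypothetical, and the sketch assumes that cognitive items are scored 0/1, that discriminatory power is expressed as a corrected item-total correlation, and that reliability is expressed as Cronbach's alpha.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


def item_difficulty(responses):
    """Proportion of correct answers per item (rows = students, 0/1 scores)."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def corrected_item_total(responses):
    """Correlation of each item with the sum of all *other* items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four students, three items (1 = correct, 0 = wrong).
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(demo))
print(corrected_item_total(demo))
print(cronbach_alpha(demo))
```

Items with a difficulty close to 1 would point to ceiling effects, and items with a low corrected item-total correlation would be candidates for revision or removal; which thresholds to apply is a separate decision that this sketch does not make.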

Dimensions of the Paper-Pencil Questionnaire for Students
The final questionnaire consisted of six main sections: (1) socio-demographic items; (2) subject-specific and basic cross-curricular sustainability knowledge; (3) affective-motivational beliefs towards sustainability; (4) self-reported sustainability behavioral intentions; (5) subject- and age-specific "sustainability knowledge applied", composed of single-choice items with different response formats; and (6) dilemma items. In total, 61 variables were used to capture the relevant aspects in lower secondary classes (in Germany class 5-6, age 9-14) and 69 variables in the higher classes (in Germany class 7-8, age 11-16). Please note that the age overlap is mainly due to different school forms and other school specificities. To guarantee anonymity in the survey, an ID was given to the students, as there will be a second assessment (t2), and the results will be assigned to each individual in order to display learning progress or changes by the end of the school year 2018/2019. The question of whether the students had already heard about sustainability (with its four response options: (1) No. (2) Yes, but I could not define it. (3) Yes, but I would have to think a bit before I could define it. (4) Yes, I know the term and I could explain it to others.) preceded the other scales for the operationalization of ESD competencies. This question was integrated in the socio-demographic part (1). Table 1 displays sample items and subjects of the different dimensions of the questionnaire.
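How anonymous IDs allow results from two assessments to be assigned to the same individual can be sketched as follows. This is only an illustration under stated assumptions: the IDs, the scores and the function name `paired_change` are hypothetical, and the sketch assumes one total score per student and time point.

```python
def paired_change(t1_scores, t2_scores):
    """Return {student_id: t2 - t1} for IDs present at both time points."""
    shared_ids = t1_scores.keys() & t2_scores.keys()
    return {sid: t2_scores[sid] - t1_scores[sid] for sid in sorted(shared_ids)}


# Hypothetical anonymous IDs with made-up scores at t1 and t2.
t1 = {"A17": 9, "B03": 12, "C44": 7}
t2 = {"A17": 11, "B03": 12, "D90": 15}

change = paired_change(t1, t2)
print(change)
```

Students who took part in only one of the two assessments (here "C44" and "D90") are left out of the paired comparison.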
The first sustainability-specific competency dimension, sustainability-related knowledge (2), was operationalized by a mixture of basic cross-curricular and subject-specific items. Cross-curricular items addressed the general understanding of the term sustainability. The subject-specific sustainability knowledge was adapted according to the specific educational plans for the different classes (5-8). The final questionnaire contained 18 single-choice items to capture the knowledge dimension of sustainability. To reduce guessing probability, three distractors were formulated for each knowledge item instead of using a true/false format (see Table 1).
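The effect of the three distractors on guessing can be made concrete with a short calculation. The sketch assumes blind guessing with equal probability for every response option, which is a simplification; the function name is hypothetical.

```python
from fractions import Fraction


def chance_score(n_items, n_options):
    """Expected number of correct answers from blind guessing."""
    return n_items * Fraction(1, n_options)


# 18 knowledge items: one correct answer plus three distractors each.
single_choice = chance_score(18, 4)
# The same 18 items in a true/false format, for comparison.
true_false = chance_score(18, 2)

print(single_choice, true_false)
```

Under this assumption, the expected chance score falls from 9 of 18 items in a true/false format to 4.5 of 18 items with four response options.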
The two dimensions, (3) affective-motivational beliefs towards sustainability and (4) self-reported sustainability-related behavior, were measured using a four-point Likert-type scale (1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree). An odd-numbered scale was avoided, as indecisive persons tend to choose the middle point. An even number of response options was preferred in order to polarize more clearly, since a neutral or middle value is often difficult to interpret (see [91][92][93][94]).
Several items from the original questionnaire were excluded during the process of scale refinement because of low discriminatory power and other psychometric and conceptual considerations. The subject- and age-specific dimension "intentional sustainability knowledge applied" was measured by different items in the style of the Program for International Student Assessment (PISA). This part of the questionnaire consisted of different cognitive-evaluative item formats.
The sample subjects listed under 5. (see Table 1) are aligned with the educational plans for students in the corresponding grades/age groups (across different school forms in Baden-Wuerttemberg). The last dimension, 6. in Table 1, comprises the readiness to act in hypothetical dilemma situations. We constructed four dilemma items scored on an ordinal ranking (0 = no sustainability dimension, 1 = one sustainability dimension, 2 = two sustainability dimensions, 3 = three sustainability dimensions). Three of these four items were used for scale IV, as shown and explained later in Section 4.1 (Descriptive Statistics and Reliabilities of the Scales to Measure Sustainability Competency). In the following section, when presenting the scales for each subdimension, we will give a detailed description of the quality criteria of the instrument for measuring student sustainability competencies.

Dilemma situations
Imagine the following situation: You are the boss of a big company and you can make all decisions for this company. Your company has earned a lot of money this year. You decide: (Please mark only one response option.)
I install a new solar system and an electric power charging station for my employees' e-cars.
I build a new sports area so that my staff will feel well and stay healthy.
I hire a consultant who helps me with future decisions to make the production more environmentally friendly and improve the working conditions for the employees, and I implement this program.
I pay myself a big salary so I have money to go on holidays with my family.
Item format: single-choice items with four response options on an ordinal ranking (see note 2).
Notes: 1 Correct answer is marked. 2 The answer with the highest score covers three sustainability dimensions and is marked.
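The ordinal ranking of the dilemma items (0 to 3 sustainability dimensions covered by the chosen option) can be illustrated with a minimal sketch. The three dimension names, the option labels A to D, the coding key, and the handling of unmarked items are all assumptions made for illustration; they are not the coding key used in the study.

```python
# Sustainability dimensions counted by the ordinal ranking.
# Assumption: a three-pillar reading (environmental, social, economic).
DIMENSIONS = frozenset({"environmental", "social", "economic"})


def dilemma_score(covered_dimensions):
    """Return the ordinal score 0-3: number of distinct dimensions covered."""
    return len(DIMENSIONS.intersection(covered_dimensions))


# Hypothetical coding key for one dilemma item.
# Option labels and dimension tags are illustrative only.
coding_key = {
    "A": {"environmental"},
    "B": {"social"},
    "C": {"environmental", "social", "economic"},
    "D": set(),
}


def score_response(chosen_option, key=coding_key):
    """Score a single-choice response; an unmarked item (None) stays None.

    Assumption: how missing dilemma responses were handled is not
    specified here, so they are simply passed through.
    """
    if chosen_option is None:
        return None
    return dilemma_score(key[chosen_option])


for option in sorted(coding_key):
    print(option, score_response(option))
```

Summing such item scores over the dilemma items of a student would give one possible raw score for the dilemma part of scale IV.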

Descriptive Statistics and Reliabilities of the Scales to Measure Sustainability Competency
Having passed the two pilot phases, the test was administered to 78 school classes in a whole-class assessment procedure in ten different secondary schools in Baden-Wuerttemberg. The different school forms were chosen according to the frequencies of the school forms represented in Baden-Wuerttemberg, and the schools were randomly selected from a list containing all schools accredited by the state. Overall, 1622 students, mainly aged from 10 to 15 (min = 9, max = 16, mean = 11.73, SD = 1.26), responded to the paper-pencil questionnaires that were distributed by the research team at the start of the school year 2018/2019. The sample consisted of n = 796 (49.6 percent) female, n = 777 (48.4 percent) male, and n = 32 (two percent) "no gender indication" participants, with the rest consisting of absentees.
Testing time was 90 min with a short break after about half the time. Following the state's educational research and data guidelines, parental and the school principals' consent were obtained prior to the assessment. Participation was voluntary, i.e., students did not get any credit or monetary reward. Participants were assured of full confidentiality and anonymity.
The various subdimensions of sustainability competencies are presented in the following. For the first subdimension, sustainability knowledge, we can make a conceptual division between the more general or basic sustainability knowledge (situated on Level 1) and the subject- and age-specific items (situated on Level 2, see Figure 1). However, at this early stage of the operationalization attempts, we propose one psychometric scale for the cognitive sustainability dimension. In the following, we will therefore call this subscale, consisting of altogether 16 cognitive items, (I) sustainability related knowledge. As this first part of the test constitutes a knowledge test, missing answers in this part of the questionnaire were treated as wrong answers.
For reliability analysis, Cronbach's α, which is the most common measure of scale reliability [95], was calculated to assess the internal consistency of the subscales. The results can be seen in the following Table 2. Henceforth, the term "self-reported" will not be repeated in the designation of the scales and the following tables. We assume that the fact that we deal with self-reported aspects of sustainability has become clear by the detailed description of our measurement procedure.
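For readers who want to reproduce this kind of reliability check, the following is a minimal sketch of Cronbach's α in its standard form, α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The function names and the toy data are hypothetical; the coding of missing knowledge answers as wrong follows the rule stated above, while the use of sample variances is an assumption.

```python
from statistics import variance


def code_knowledge_item(answer, correct):
    """Code a single-choice knowledge item as 1 (right) or 0 (wrong).

    A missing answer (None) never equals the correct option and is
    therefore coded as wrong.
    """
    return 1 if answer == correct else 0


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): four respondents, three knowledge items.
answer_key = ["b", "a", "d"]
raw_answers = [
    ["b", "a", "d"],
    ["b", None, "c"],
    ["a", "a", "d"],
    [None, "c", "a"],
]
coded = [
    [code_knowledge_item(a, c) for a, c in zip(row, answer_key)]
    for row in raw_answers
]
print(round(cronbach_alpha(coded), 2))
```

The same function can be applied to the Likert-type scales, since it only requires numeric item scores.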

Table 2 (fragment): IV. Intentional sustainability knowledge applied, class 7-8 (age group 11-16): Cronbach's α = 0.55.

The four subscales consisted of seven to 16 items. The reliabilities were satisfactory for the first three scales (I. Sustainability related knowledge, II. Affective-motivational beliefs towards sustainability, III. Sustainability-related behavior), with Cronbach's alpha ≥ 0.70. However, note that relying on a generally accepted cutoff value is shortsighted and should be treated with caution, because complex constructs with meaningful content coverage might have a lower alpha value [95,96]. As Schmitt points out, "[w]hen a measure has other desirable properties, such as meaningful content coverage of some domain [...] this low reliability may not be a major impediment to its use" [96]. This is the case for the scale Intentional sustainability knowledge applied (IV). Similar to Scale I, a conceptual division can be made between the subdimensions IV (a), which are subject-specific, and IV (b), which are general intentional (non-subject-specific) "dilemma items". Psychometrically, however, in the framework of this project, we propose a scale IV Intentional sustainability knowledge applied that combines the subject-specific and non-subject-specific aspects into one scale. Because the questionnaire parts used to construct scale IV differed between classes 5-6 and classes 7-8, a separate combined scale was formed for each of the two groups: (1) classes 5-6 and (2) classes 7-8. For each group, the scale consisted of four subject-specific and three dilemma items. Cronbach's α for this subcompetency domain was 0.67 for classes 5-6 and 0.55 for classes 7-8. As stated above, low reliability may not be a major impediment to the use of scales when dealing with complex and new operationalization constructs. Nevertheless, the shortcomings concerning this fourth scale will be discussed in detail in Section 5.
To sum up, at this stage of the project, 52 items were distilled to analyze the development of sustainability competencies: 16 items for Sustainability related knowledge (I), 16 items for Affective-motivational beliefs towards sustainability (II), 13 items for Sustainability-related behavior (III), and seven items (including four subject-specific items each for fifth to sixth graders and for seventh to eighth graders) for Intentional sustainability knowledge applied (IV). In the following, different approaches to testing the validity of the measurement tool for student sustainability competencies are presented.

Validity of the Measurement Tool
Another measure for determining the quality of an instrument is validity. Validity is understood in our case as the extent to which a measuring instrument is well founded and likely to correspond accurately to the real world based on probability, i.e., it measures accurately what it is supposed to measure. Different forms of validity exist [97]: Convergent validity is the degree to which the scale is related to other instruments that are designed to measure similar attitudes [98]. Since no comparable measures exist that capture the presented dimensions of sustainability competencies, it is difficult to fully determine the convergent validity of the new scale.
Content validity refers to the extent to which the measure represents all facets of the sustainability competencies. As we carefully inspected theoretical literature and educational curricula with regard to sustainability competencies in order to construct the items, this validity criterion may be satisfied. In addition, we asked teachers to revise our instrument for the content and competencies of the curriculum. As described above, additional experts were involved in the evaluation of the measurement instrument.
External validity refers to the relationships between the test scores and other measurements. These relationships should be theoretically and empirically sound. Although previous research primarily focused on environmental knowledge, beliefs, and behavior, the following relationships regarding sustainability related competencies can be expected: (1) Students in higher grades should have higher sustainability related knowledge scores in the test than students in lower grades [99].
(2) Conversely, students in lower grades should have higher values in affective-motivational beliefs and exhibit sustainability-related behavior to a higher degree [40,100]. The results regarding our newly developed instrument are summarized in Table 3.
Older students showed higher scores in sustainability related knowledge. These differences are statistically significant: F (3, 1621) = 88.17, p < 0.001. Regarding affective-motivational beliefs and sustainability-related behavior, the age trend is reversed. For beliefs, the differences are statistically significant (F (3, 1604) = 5.14, p < 0.01). With respect to behavior, age differences are in the assumed direction as well and are statistically significant (F (3, 1604) = 22.57, p < 0.001). Therefore, age trends are in line with previous research, thus supporting the validity of our instrument.
(3) It is likely that girls have higher scores in sustainability-related measures than boys, at least in western societies (gender gap, see [29,[101][102][103]), even though these "gender differences might be explained by differential item functioning rather than reflect genuine differences" [104] (p. 373), and the results vary among the different constructs of sustainability-related aspects. For example, for environmental awareness measured by PISA in OECD countries, males tended to be more aware of and more optimistic about environmental issues than females, whereas females tended to report a higher sense of responsibility towards the environment than males (see [105]).
Nevertheless, t-statistics for our sample (rounded to two decimals) revealed that girls show higher mean scores than boys in all scales. Sex differences are statistically significant (for knowledge: t (1, 1569) = 2.76, p < 0.01; for beliefs: t (1, 1470) = 7.40, p < 0.001; for behavior: t (1, 1568) = 5.35, p < 0.001). The results comparing the respondents' mean scores for each dimension of sustainability competencies by gender are shown in Table 4.
(4) Sustainability related knowledge should correlate with school marks to a medium degree, as knowledge from school is needed to solve sustainability related knowledge items. In addition, both fields of achievement require fundamental learning skills such as intelligence, information processing strategies, attention, and (long-term) memory. Table 5 presents the corresponding correlations.
Note (Table 5): * p < 0.05, ** p < 0.01, *** p < 0.001. The German grading system is scaled conversely (very good = 1, insufficient = 6).
Sustainability-related knowledge correlates with school marks in all three subjects to a medium degree (around 0.30). As school marks are inversely scaled in the German grading system, the direction of all correlations is as anticipated.
In summary, all interrelations with external variables (age, sex, school marks) were as expected. Hence, these results support the validity of our instrument.
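The three kinds of external-validity checks reported above (group differences by grade, mean differences by gender, correlation with school marks) can be sketched with standard test statistics. This is a minimal illustration under assumptions: the function names and toy numbers are hypothetical, a pooled-variance t-test is assumed, and p-values are omitted because they require the F and t distributions (available, for example, in a statistics package).

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def pooled_t(a, b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t_value = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t_value, na + nb - 2


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): knowledge scores and German school marks (1 = best).
knowledge = [14, 12, 9, 7, 5]
marks = [1, 2, 3, 3, 5]
print(round(pearson_r(knowledge, marks), 2))
```

Because German marks are inversely scaled, a positive association between knowledge and school achievement appears as a negative correlation coefficient in such a computation.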

Discussion
Recalling the research questions of the study, we can state that a tool for measuring sustainability competencies (applicable for students in secondary schools in Baden-Wuerttemberg) could be developed and tested using four subdimensions of sustainability competencies: Sustainability related knowledge (I), Affective-motivational beliefs towards sustainability (II), Sustainability-related behavior (III), and Intentional sustainability knowledge applied (IV). When constructing an instrument to measure sustainability competencies, social desirability can be a pitfall, making it difficult to judge whether the responses given correspond to the actual sustainability-related convictions or behaviors. Being aware of this pitfall, we formulated the items carefully to reduce social desirability effects as much as possible. Nevertheless, the difficulty remains that self-reported convictions, affective-motivational beliefs, and behavior related to sustainability may diverge considerably from actual behavior. In this regard, Kagawa states that "[t]here are multiple factors which influence the process of behavioral change and further investigation of dissonance between students' perception of sustainability and their individual actions needs to be explored" [106]. See, for example, research on the attitude-behavior gap [103,107,108] or cognitive dissonance [109,110]. Further implications for future research and an outlook will be given in the concluding section of this article.
In addition, the question arises as to whether the items formulated in this study for measuring basic cross-curricular sustainability-related knowledge could also be used successfully, and would be valid, in other countries. Even if so, we assume that this would be less true for the subject-specific knowledge items, as the curricula of different nations vary in terms of the subjects taught. This project is only a preliminary exploration of student sustainability competencies across four different grades and age groups. Consequently, some shortcomings arise from the fact that the questionnaire was developed for a rather broad sample, including grades five to eight and different school forms of German secondary schools. The subject-specific cognitive and subcompetency domains (see Figure 1, fields 2a and 2d) therefore still constitute a "grey spot" in our measurement construct, since the model was originally theorized to contain subject-specific dimension scales across the two constructs of scales I and IV. Here it might be useful to consider further instruments and items from other studies that measure subject-specific sustainability aspects more specifically, and to adapt them to the curriculum specifications of a measurement unit that consists of, for example, only one grade level, age group, or school form, instead of aiming for a large-scope measurement instrument for several grade levels and school forms at the same time, as was the case in the presented study. Thus, there might be more accurate ways to capture, e.g., subject-specific sustainability-related knowledge when focusing only on a small aspect or age group (see fidelity-bandwidth trade-off [111]). Nevertheless, an analysis across school forms and classes, as conducted in this project, can give valid information on the development of sustainability competencies in a heterogeneous setting with different school forms and age groups (in our case German secondary schools).
There is a need for combined approaches to measure sustainability competencies: those which focus on detailed age- and subject-specific aspects and those which can capture the broad picture of the educational landscape. Therefore, there is a need for more cooperation among the different research fields (e.g., environmental psychology, environmental sociology, science teaching, and empirical educational sciences) and projects, as well as between quantitative and qualitative methods of ESD research. As described above, Figure 1 can help to situate the different existing operationalization possibilities of sustainability competencies within the meta model. A framework for the conceptualization and operationalization of sustainability competencies has thus been presented. The presented method and instrument for the operationalization of sustainability competencies pick up core competencies that enable students to shape a sustainable future. However, when dealing with competency models, this concurrently raises general questions about the possibilities of evaluation and definition and the seemingly antithetical need for openness of the ESD concept in order to stay adaptable to sustainability-related challenges in the future. As Wals et al. conclude, "[t]he main point is that there is no single model of education and learning for environmental sustainability, nor should there be" [112]. The conception of an adaptive and flexible concept of ESD should nevertheless not hinder our duty in the field of empirical research to create evidence via research programs and to verify whether the ESD programs undertaken show (the desired) outcomes. We still argue that a focus on ESD effects and learning outcomes is highly necessary to evaluate and improve the measures taken to enable learners to shape a sustainable future. Only if these further steps are taken can the compatibility of ESD with empirical research programs be guaranteed and, hence, its success be assessed.
The following aspect should be included in the discussion as well. Because, as stated above, the research field of environmental psychology in particular played a pioneering role in the development of measurement tools to capture concepts like environmental values, behaviors, their interaction with knowledge, and other related concepts, there might be a slight preponderance of the environmental domain of sustainability, even though special attention was paid to adding items covering the social, economic, inter- and intragenerational aspects as well. However, we also want to state the epistemological beliefs of the authors: We agree with John Evans, General Secretary, Trade Union Advisory Committee to the OECD, that there are no jobs on a dead planet [113]. Therefore, we state the fundamental importance of ecology and the need to stop the further destruction of our vital resources, favoring the pursuit of a sustainable future.

Conclusions & Outlook
In conclusion, this article has described the development of the measurement instrument to operationalize student sustainability competencies in secondary schools in a German state. Subsequently, novel data from the first assessment period were analyzed. To this end, we investigated whether the test adequately met important quality criteria of a quantitative measuring instrument. In the discussion of the results, we explored the opportunities and limitations for further ESD research arising from the use of appropriate measuring instruments.
According to the presented results, the questionnaire can be approved for practical application to measure different dimensions of sustainability competencies. Hence, the measurement instrument presented in this article provides a valuable starting point for further test extension, e.g., for assessing learning progression, to assess the development of competencies in multilevel analysis or longitudinal studies.
The results presented above revealed some first insights into the interactions of different socio-demographic aspects with the different sustainability competency dimensions. In future studies and analyses, it will be interesting to explore the interactions of the different facets of sustainability competencies in more depth and to compare them to other findings of relevant research in the field of, for example, psychology for sustainability, to further illuminate the interconnections.
Over and above that, other important stakeholders of ESD in schools, such as the teachers and the principals, and variables such as institutional aspects in the sense of the whole institution approach, should be taken into account as well. The question of whether and how a policy has been successfully implemented on the international, nationwide, or local level is an essential domain of (political) science. Neglecting the critical success questions involved in implementing a policy means that the gaps and weaknesses of the implementation process go unnoticed. This is equally true in the field of ESD. On a policy level, the development of further indicators (see for example [114,115]), or the evaluation of ESD programs (see for example [13,116]), seem like helpful supplements to foster future steps and crucial insights in the implementation process of programs that aim to promote learner competencies to build a sustainable future.