The Situational Motivational Scale (SIMS) in physical education: A validation study among Norwegian adolescents

One of the most important variables to consider in physical education (PE) is motivation. The self-determination theory (SDT) represents an essential theoretical perspective to examine and understand adolescents’ learning and motivation in PE. Based on this theory, the Situational Motivational Scale (SIMS) measures students’ situational motivation related to a subject like PE. The aim of the present study is to examine the dimensionality, reliability, and construct validity of the Norwegian version of the SIMS among adolescents in PE. In total, 318 students from six schools completed the SIMS in their PE classes during the spring of 2016. Explorative and confirmatory factor analyses were conducted, suggesting the fourteen-item version of the SIMS to be superior to the sixteen-item version. The SIMS measurement model of adolescents’ situational motivation in PE showed satisfactory reliability and construct validity.

Subjects: Secondary Physical Education; Test Development, Validity & Scaling Methods; Motivation


ABOUT THE AUTHORS
Ove Østerlie is an educator and researcher in physical education. The main research areas are didactics and teaching methods in physical education, with a speciality in flipped learning and physical education as an inclusion arena.
Audhild Løhre has interdisciplinary research interests including wellbeing and health among children, adolescents and young people, the school setting, teacher education, public health and health promotion.
Gørill Haugan is widely published in various fields such as validation and measurement, health promotion, quality of life, spirituality and wellbeing, and nurse-patient interaction among different populations (adolescents, postnatal women, nursing home residents).

PUBLIC INTEREST STATEMENT
Physical Education (PE) is a significant contributor to people developing a lifelong healthy lifestyle. To learn well in PE, adolescents need good motivation. There is a need for validated instruments for measuring motivation, and one such instrument is the Situational Motivational Scale (SIMS), which measures adolescents' situational motivation in PE. The aim of the present study was to examine the dimensionality, reliability and construct validity of the Norwegian version of the SIMS among adolescents in PE. In total, 318 students from six schools completed the SIMS in their PE classes during the spring of 2016. Explorative and confirmatory factor analyses were conducted, showing that the SIMS demonstrated good reliability and construct validity.

Introduction
One of the most important variables to consider in physical education (PE) is motivation, as adaptive types of motivation have been associated with intentions to exercise, step count during PE classes and physical activity outside of school (Lonsdale, Sabiston, Taylor, & Ntoumanis, 2011). As a large proportion of adolescents do not reach the suggested levels of physical activity (Hallal et al., 2012), developing measurement tools to investigate and understand motivation in a PE context is vital. Self-determination theory (SDT) has over the last 40 years become a major theory of human motivation (Gagne & Deci, 2014). The fundamental tenets of SDT suggest that motivation and its determinants, mediators and consequences operate at three levels: global, contextual and situational (Vallerand, 1997, 2001). Motivation at the global level echoes how an individual generally interacts with his/her environment (Vallerand & Rousseau, 2001). Contextual motivation is a motivational disposition towards a particular context, such as work, sports or education (Vallerand, 1997). Situational motivation refers to the "here and now" of motivation, that is, the motivation experienced while engaged in a particular activity (Vallerand, 1997). All three levels can be further refined and described by various constructs, among them the motivational factors proposed by SDT (Deci & Ryan, 1985, 1991): intrinsic motivation (IM), identified regulation (IR), external regulation (ER) and amotivation (AM), constituting a self-determination continuum from self-determined to non-self-determined motivation. IM comes from within, as internal drives that motivate you to behave in certain ways, including your core values, your interests, and your personal sense of morality. IR is a somewhat internal motivation based on conscious values that are personally important to an individual. ER is exclusively external motivation and is regulated by compliance, conformity, and external rewards and punishments. In AM, you are completely non-autonomous, have no drive to speak of, and are likely struggling to have any of your needs met.
To measure a person's situational motivation, the Situational Motivation Scale (SIMS) (Appendix B) was developed by Guay, Vallerand, and Blanchard (2000), assessing IM, IR, ER and AM. The SIMS has demonstrated good reliability and factorial validity among adolescents in both a PE context (Lonsdale et al., 2011; Standage, Treasure, Duda, & Prusak, 2003) and a broader context, including education, interpersonal relationships and leisure (sport) (Guay et al., 2000). Standage et al. (2003) re-specified the original 16-item SIMS to a 14-item scale by excluding two items, thereby creating improved absolute and incremental fit indices without loss of internal consistency of the two affected subscales. Internal consistency analyses, as well as single and multi-group confirmatory factor analyses (CFA), have documented support for the reliability and validity of the 14-item SIMS among UK adolescents (Lonsdale et al., 2011).
Traditionally, several aspects determine the construct validity of a measurement scale. With respect to the SIMS, the study by Guay et al. (2000) found all factors except AM to be somewhat stable across measurement times and invariant across gender. For AM, a small gender difference turned out to be statistically significant. Further, the researchers reported IM and IR to be associated with behavioral intentions of future persistence towards the activity. Correspondingly, another study (Säfvenbom, Buch, & Aandstad, 2017) found a positive relationship between IM, IR and eagerness for physical activity. Ryan and Deci (2000) established a theoretical proposal for ER and AM to work in the opposite direction compared to IM and IR. However, there is still no empirical evidence to support this proposal, either regarding intentions of further persistence towards an activity (Guay et al., 2000) or eagerness for physical activity (Säfvenbom et al., 2017). The self-determination continuum is proposed to have a simplex-like (ordered correlation) structure, whereby adjacent regulations (e.g., intrinsic motivation and identified regulation) should be more strongly and positively related with each other, while more distal regulations (e.g., intrinsic motivation and amotivation) are expected to be unrelated or negatively correlated with each other (Ryan & Connell, 1989).
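The simplex-like pattern described above can be expressed as a simple ordering check on a correlation matrix. The following is a minimal sketch, not the procedure used in this study: the function name `is_simplex_like` and the correlation values are hypothetical and serve only to illustrate the idea that factors closer together on the continuum (IM, IR, ER, AM) should correlate at least as positively as factors further apart.

```python
def is_simplex_like(corr):
    """Return True if, for every factor, nearer factors on the continuum
    correlate at least as positively as more distant ones.

    corr is a square, symmetric matrix (list of lists) whose rows and
    columns follow the order of the self-determination continuum.
    """
    n = len(corr)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i in (j, k):
                    continue
                # j is nearer to i than k is, so corr[i][j] should not be lower.
                if abs(i - j) < abs(i - k) and corr[i][j] < corr[i][k]:
                    return False
    return True


# Hypothetical correlations in the order IM, IR, ER, AM (NOT the study's data).
example = [
    [1.0, 0.8, -0.1, -0.4],
    [0.8, 1.0, 0.0, -0.3],
    [-0.1, 0.0, 1.0, 0.3],
    [-0.4, -0.3, 0.3, 1.0],
]
print(is_simplex_like(example))
```

This strict pairwise rule is one possible operationalisation; in practice, small violations within sampling error would usually be tolerated.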
Several studies (Erdvik, Øverby, & Haugen, 2014; Säfvenbom, Haugen, & Bulie, 2014) as well as Master's theses (Bulie, 2011; Forfot, 2014; Medic, 2012; Olsen, 2011) have applied a Norwegian version of the SIMS, translated by Lemyre and Roberts (2004) (Appendix A). However, these publications do not describe the SIMS translation process; neither do they refer to a validation of this Norwegian version. All the above-mentioned studies refer to validation articles of the English version by Guay et al. (2000) and/or Standage et al. (2003). To the authors' knowledge, the Norwegian version of the SIMS has not been validated among adolescents in a PE setting. Therefore, this study assesses the psychometric properties of the Norwegian version of the SIMS questionnaire. For comparison, the original, English version of the SIMS is included as Appendix B.

Aims
The aim of the present study is twofold: (1) To examine the psychometric properties of the Norwegian version of the SIMS among adolescents in secondary and upper secondary school PE, and (2) to test if the 14-item model is superior to the 16-item model.
The following hypotheses were tested: The SIMS questionnaire comprises four factors (H1). The 14-item version of the SIMS four-factor model is superior to the 16-item four-factor version (H2). The Norwegian version of the SIMS questionnaire shows good reliability and construct validity (H3). The SIMS factor structure is invariant across time (H4). There are significant correlations between further intentions of participation in PE and all four SIMS factors: IM and IR in the positive direction and ER and AM in the negative direction (H5). The four SIMS factors demonstrate a simplex-like structure (H6).

Participants and research context
Twenty schools from three different regions in Norway were invited to participate through an e-mail sent by the researchers to the school authorities. Six schools agreed to participate, with a total of 364 students (Years 8, 9 and 10 in lower secondary school and Year 1 in upper secondary school). Of these, 318 (87.4%) students took part in this study. The schools involved represented both rural and central communities with a normal distribution of immigration and social classes. Gender and age distributions in the present sample were 145 girls with a mean age of 15.31 (SD = 1.31) and 173 boys with a mean age of 15.06 (SD = 1.13). The students' marks awarded for classwork in the second semester of 2016 (girls: 4.45; boys: 4.49) reflected the national average for 10th grade (girls: 4.5; boys: 4.6) for the actual semester (Statistics Norway, 2016). Data were collected twice (T1 and T2), about four weeks apart, during the spring of 2016.

Variables and measures
In accordance with the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999; Goodwin & Leech, 2003), the research questions addressed evidence related to dimensionality, reliability and construct validity, all of which were considered to be interrelated measurement properties.
Dimensionality examines the extent to which the internal components of a test match the defined constructs. Hence, dimensionality is concerned with the homogeneity as well as the internal structure of the included items (Netemeyer, Bearden, & Sharma, 2003). A scale's internal structure (which items are consistent with which other items) reflects the internal consistency and thus reliability. In the present study, we assessed dimensionality by inspecting the factor structure and individual items.
Reliability may be viewed as an instrument's consistency and relative lack of error. One type of reliability is internal consistency, representing the interrelatedness of items or sets of items in a scale. Cronbach's alpha (α) and composite reliability (ρ) are reliability coefficients assessing internal consistency (Netemeyer et al., 2003), and both are used in this study. Reliability was assessed by evaluating the factor loadings and squared multiple correlations (R²), along with the reliability coefficients Cronbach's alpha and composite reliability.
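To make the two reliability coefficients concrete, the sketch below computes Cronbach's alpha from raw item scores and composite reliability from standardized factor loadings. It is an illustration under stated assumptions rather than the analysis performed in this study (which used SPSS and Stata): the function names are hypothetical, alpha uses the standard formula based on item and total-score variances, and composite reliability uses the common form for standardized loadings, (Σλ)² / ((Σλ)² + Σ(1 − λ²)).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])                     # number of items
    columns = list(zip(*item_scores))           # one tuple of scores per item
    item_variances = [variance(col) for col in columns]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def composite_reliability(loadings):
    """Composite reliability from standardized loadings of one factor.

    Assumes each item's error variance equals 1 - loading**2.
    """
    sum_loadings_sq = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_variance)


# Hypothetical loadings for a four-item factor (not the study's estimates).
print(round(composite_reliability([0.7, 0.7, 0.7, 0.7]), 3))
```

With four items all loading at .70, composite reliability already exceeds the .70 threshold, which shows how the coefficient depends on both the size of the loadings and the number of items.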
Construct validity refers to how well a measure actually measures the construct it is intended to measure, and is based among other things on the construct's relationships to other variables (Netemeyer et al., 2003). In assessing discriminant validity in the present study, the correlations among the SIMS four factors were investigated to determine if they displayed a simplex-like structure and possible invariances across genders, as well as the stability across measurement times and future intentions of participation in PE.

Data collection
The participating schools and PE teachers received exact written instructions from the researchers on how to conduct the data collection. The students filled in the SIMS anonymously in paper format at the start of a PE class. There was no time limit. All students had the opportunity to mark their answers without being observed, and to ask questions if something was unclear. To minimize the adolescents' tendency to give socially desirable responses, they were asked to answer as truthfully as they could, and were assured that the teacher would not be able to read their responses and that their grades would not be affected by how they responded. Written consent in accordance with the procedure acknowledged by the Norwegian Centre for Research Data (NSD, Project #47604) was given by their parents. The students normally needed approximately 10 to 15 minutes to complete the SIMS. The data collectors immediately put all the SIMS questionnaires into a sealed package. This material was then sent to the researchers for data entry.

Statistical analysis
The data were analysed by descriptive statistics and exploratory factor analysis (EFA) using IBM SPSS version 24, and by CFA using Stata 14.1 (StataCorp, 2015). When evaluating a measurement scale, researchers face two important questions: (1) the underlying dimensionality of data, and (2) the adequacy of individual items. In these instances, EFA and CFA can provide complementary perspectives on data, giving different pieces of information (Hurley et al., 1997; Netemeyer et al., 2003). The implicit assumption underlying the use of EFA in the present study is the uncertainty with respect to the dimensionality of the SIMS, which has not previously been tested in Norway, nor among adolescents. Therefore, this study intended to gain insight into a potential factor structure of the SIMS and provide a broad perspective on the observed data using EFA, followed by a confirmation procedure by means of CFA.
CFA is a sub-model in structural equation modelling (SEM) that deals specifically with measurement models (Brown, 2006), accounting for random measurement error. Thus, the psychometric properties of the scales used are more accurately derived. A high loading of an item indicates that there is much in common between the factor and the respective item (Sharma, 1996). Loadings below .32 are considered poor, ≥ .45 fair, ≥ .55 good, ≥ .63 very good, and above .71 excellent (Sharma, 1996).
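The loading thresholds above can be written as a small lookup. This is only a sketch: the function name `classify_loading` is hypothetical, and because the text does not label loadings from .32 up to (but not including) .45, the code returns "unlabelled" for that range rather than guessing a category.

```python
def classify_loading(loading):
    """Map a standardized factor loading to the verbal labels quoted in the text."""
    value = abs(loading)  # the sign indicates direction, not strength
    if value > 0.71:
        return "excellent"
    if value >= 0.63:
        return "very good"
    if value >= 0.55:
        return "good"
    if value >= 0.45:
        return "fair"
    if value < 0.32:
        return "poor"
    # Assumption: the source gives no label for .32 <= loading < .45.
    return "unlabelled"


for loading in (0.59, 0.64, 0.67, 0.89):
    print(loading, classify_loading(loading))
```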
The present study assessed model fit adequacy by χ² statistics and various fit indices. In line with the "rules of thumb" given as conventional cut-off criteria (Mehmetoglu & Jakobsen, 2017), the following fit indices were used: the Root Mean Square Error of Approximation (RMSEA) and the Standardized Root Mean Square Residual (SRMR), with values below .05 indicating good fit and values smaller than .10 interpreted as acceptable (Mehmetoglu & Jakobsen, 2017). Further, the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI), with acceptable fit set at .90 (Mehmetoglu & Jakobsen, 2017), were used. The frequency distribution of the measurements was examined to assess deviation from normality: both skewness and kurtosis were significant, thus the Robust Maximum Likelihood (RML) estimation procedure was applied. When analysing continuous but non-normal endogenous variables, the Satorra-Bentler corrected χ² should be reported (Kline, 2011; Satorra & Bentler, 1994).

Table 1 presents the means (M), standard deviations (SD) and the Pearson's r and Kendall's tau-b correlation matrix for the SIMS scale estimated at both assessments (T1 and T2) and for both the 14-item and 16-item versions. Significant correlations in the predicted direction were shown for the SIMS in terms of time, intentions of participation in PE and structure (Table 1).
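The cut-off criteria for the fit indices described in this section can be collected in one helper. This is a sketch under stated assumptions: the function name `assess_fit`, the label "below cut-off" and the input values are hypothetical, and the CFI/TLI threshold is treated as inclusive (≥ .90), which the text does not specify.

```python
def assess_fit(rmsea, srmr, cfi, tli):
    """Label each fit index using the rules of thumb quoted in the text."""

    def residual_index(value):
        # RMSEA and SRMR: smaller is better.
        if value < 0.05:
            return "good"
        if value < 0.10:
            return "acceptable"
        return "below cut-off"

    def incremental_index(value):
        # CFI and TLI: larger is better; assumption: .90 itself counts as acceptable.
        return "acceptable" if value >= 0.90 else "below cut-off"

    return {
        "RMSEA": residual_index(rmsea),
        "SRMR": residual_index(srmr),
        "CFI": incremental_index(cfi),
        "TLI": incremental_index(tli),
    }


# Hypothetical values for illustration only (not results from this study).
print(assess_fit(rmsea=0.04, srmr=0.06, cfi=0.93, tli=0.89))
```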

Descriptive analysis
The alpha levels for the SIMS factors indicated an acceptable inter-item consistency in the measures, with Cronbach's alpha coefficients between .74 and .92 (Tables 2 and 4). However, a substantial body of research has indicated that Cronbach's alpha cannot be generally relied on as an estimator of reliability (Raykov, 2001). Thus, the composite reliability coefficient (ρc) was estimated by means of the formula by Hair, Black, Babin, and Anderson (2010), as noted in Table 3. Composite reliability displayed good values (.78-.92): values ≥ .7 are considered to be good (Bagozzi & Yi, 1988; Hair et al., 2010; Mehmetoglu & Jakobsen, 2017).

Exploratory factor analysis (EFA)
The SIMS was assessed in the same sample twice, approximately four weeks apart. Since previous studies have shown that the SIMS dimensionality is unclear, the SIMS items were subjected to exploratory factor analysis (EFA). The Kaiser-Meyer-Olkin measure of sampling adequacy (Tabachnick & Fidell, 2007) exceeded the recommended value of .60 (T1: .898; T2: .901) and Bartlett's test of sphericity reached statistical significance (p < .0001), supporting the factorability of the correlation matrix for both assessments. The SIMS factors were expected to be correlated (H1). Thus, principal component analysis with an oblique promax rotation was used. Table 2 lists the loadings, factors and variance explained for the factors extracted from both assessments (T1 and T2). EFA revealed three factors with eigenvalues of 1.0 and above. This three-factor solution disclosed factor loadings between .509 and .906, including one cross loading for item 11 ("Because I don't have any choice") at both T1 and T2, explaining 66% and 67% of the total variance, respectively. However, several studies have demonstrated a four-factor solution of the SIMS. Therefore, the data were run once more (both T1 and T2), setting the number of factors to four. This four-factor solution displayed factor loadings between .377 and .957, including five cross loadings, explaining 70.5% of the total variance. Hence, the fourth factor contributed about 4% of the explanation. Looking at the dimensionality of these two models, the EFA suggested a strong first factor, including eight to nine items. Accordingly, the dimensionality seemed uncertain, and we turned to CFA.
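The eigenvalue rule used above (retain components with eigenvalues of 1.0 and above) and the share of total variance they explain can be sketched as follows. The function names and eigenvalues are hypothetical; the only fact relied on is that, for a principal component analysis of a correlation matrix, the eigenvalues sum to the number of items, so the proportion explained is the sum of retained eigenvalues divided by the sum of all eigenvalues.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Return the eigenvalues that meet the retention threshold."""
    return [ev for ev in eigenvalues if ev >= threshold]


def proportion_explained(eigenvalues, retained):
    """Share of total variance captured by the retained components."""
    return sum(retained) / sum(eigenvalues)


# Hypothetical eigenvalues for a four-item example (not the study's values).
eigenvalues = [2.0, 1.0, 0.6, 0.4]
retained = kaiser_retained(eigenvalues)
print(len(retained), proportion_explained(eigenvalues, retained))
```

Forcing an additional factor, as was done for the four-factor solution, simply adds the next eigenvalue to the retained set, which is why the gain in explained variance from the fourth factor is small.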

Table 2 (fragment). Items: SIMS4, "There may be good reasons to do this activity, but personally, I don't see any."; SIMS5, "Because I think that this activity is pleasant." Cronbach's alpha (number of items): .92 (8), .80 (4), .80 (4), .92 (8), .74 (4), .80 (4), .81 (5).

Item 11 ("Because I don't have any choice") appeared troublesome, indicating loadings on all four factors and a significant residual, and explaining only a minor part of the variance (R² = .36). Item 10 ("By personal decision") exposed the same pattern, including a significant residual, and explained only a little of the variance of its factor (R² = .36). The corrected item-total correlation test of the two factors to which these items belonged revealed a better internal consistency when items 10 and 11 were removed, supporting that these items should be considered for exclusion (Table 4). Hence, items 10 and 11 were discarded and the four-factor solution was run once more.

Discussion
The research question of this study was twofold, addressing evidence related to dimensionality, reliability and construct validity of the SIMS questionnaire. The aim was to assess the psychometric properties of the Norwegian version of the SIMS measure among adolescents in Norway. Six hypotheses (H1 to H6) were tested. The observed data demonstrated that the Norwegian version of the SIMS questionnaire consisted of four factors (H1), and that the 14-item version was superior to the 16-item version (H2). Furthermore, the 14-item four-factor solution displayed good construct validity and reliability (H3). The SIMS factor structure was invariant across time (H4). There were correlations in predicted directions between further intentions of participation in PE and the four SIMS factors (H5). The four SIMS factors demonstrated a simplex-like structure (H6).

Dimensionality
In accordance with previous studies (Guay et al., 2000; Lonsdale et al., 2011; Standage et al., 2003), the present results indicated that the four-factor model of the SIMS is psychometrically superior to a possible three-factor construct suggested by the EFA in this study. However, some items seemed troublesome; in particular, this was the case for items 10 ("By personal decision") and 11 ("Because I don't have any choice"). These two items revealed cross loadings on all four factors, blurring the dimensionality of the SIMS scale. While responding to the scale, the adolescents were asked why they were currently engaged in the subject physical education. The content of the two troublesome items relates to having a choice/decision regarding the students' participation in physical education at school. As physical education is a mandatory subject in Norway, these two items will naturally be in conflict with their intended purpose of measurement. Consequently, it is reasonable that they correlate, showing cross loadings. Looking at the wording of the items, the first factor, framed IM, includes items covering positive aspects such as "physical education is interesting, pleasant, fun and good". The second factor, IR, indicates that the students participate in physical education for their own good, because it is good and important for them, while item 10, belonging to this factor, brings in the perspective of personal decision, seeming to cover another aspect than the rest of this factor. Regarding item 11, a similar situation appears: the third factor, framed ER, includes aspects involving that the students feel they are supposed to and have to partake in this activity, while item 11 encompasses the dimension of not having a choice.
Not having a choice, in this context, logically relates to all factors included in the SIMS scale; not having a choice is relevant whether this activity is seen to be interesting, fun, good, for one's own good or not. Regardless of these aspects, the students experience that they do not have any choice; their participation is required anyway. In this perspective, item 11 relates to a dimension other than those assessed by the SIMS. Moreover, since both items 10 and 11 concern the dimension of having a choice/deciding by yourself, it is reasonable that these items significantly correlate, blurring the dimensionality. Accordingly, the four-factor model (Model-2), including only 14 of the original 16 items, is psychometrically superior (H2).

Table 3. Goodness-of-fit measures for measurement models of the SIMS. Confirmatory factor analysis for Model-1 and Model-2 at two points of assessment, framed T1 and T2.

Reliability
Reliability is supported when the items in each factor have highly significant standardized factor loadings, preferably greater than .7 (Brown, 2006; Hair et al., 2010; Kline, 2011). This was the case for eleven out of the fourteen (sixteen) indicators; nevertheless, the loadings under .7 were still good (.59, .64 and .67). Accordingly, all standardized factor loadings showed good to very good values, ranging between .59 and .89. The square of a standardized loading represents how much variation in an item is explained by the latent factor and is termed the variance extracted from the item (Hair et al., 2010). As loadings fall below .7, they can still be considered significant, but more of the variance in the measure is error variance than explained variance. Cronbach's alpha and composite reliability also revealed good values (Tables 2, 3 and 4), indicating good internal consistency: values greater than .7 are good (Acock, 2013; Hair et al., 2010; Mehmetoglu & Jakobsen, 2017). Hence, this study supported the reliability of the SIMS very well.
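The relationship between a standardized loading and the variance it explains can be checked with a few lines of arithmetic. The sketch below squares the loadings mentioned in this section (.59, .64, .67, .70 and .89) and compares explained with error variance; the function name `variance_split` is hypothetical, and error variance is taken as one minus the squared loading, which holds for standardized solutions.

```python
def variance_split(loading):
    """Return (explained, error) variance for a standardized loading."""
    explained = loading ** 2
    return explained, 1 - explained


for loading in (0.59, 0.64, 0.67, 0.70, 0.89):
    explained, error = variance_split(loading)
    dominant = "explained" if explained > error else "error"
    print(f"loading {loading:.2f}: explained {explained:.2f}, "
          f"error {error:.2f}, larger share: {dominant}")
```

The break-even point is a loading of about .707 (the square root of .5), which is why .7 serves as the conventional rule of thumb.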

Construct validity
Constructs are latent variables which researchers cannot observe directly, but by means of indicators. Construct validation is a lengthy and ongoing process of learning more about the construct in focus, making new predictions and then testing them. Each study that supports the theoretical construct serves to strengthen the theory (Netemeyer et al., 2003). Construct validity for the SIMS refers to the assumption that this questionnaire validly measures situational motivation for physical education among adolescents.
The observed data supported that IM, IR, ER and AM regarding participating in PE correlated with each other in the expected directions (Figure 1 and Table 1). Figure 1 shows standardised covariances (φ), while Table 1 shows Pearson's correlation coefficients (ρ), demonstrating a slight but acceptable difference. As expected, IM and IR were highly positively correlated (φ = .87; ρ = .780), while IR and ER did not demonstrate a significant correlation (φ = −.065; ρ = −.027). This is reasonable: IM and IR include positive aspects, such as PE being interesting, pleasant, fun, good and important to the students, while ER contains experiences of being supposed to and having to participate. Interestingly, ER (to be supposed to/have to participate) and AM (don't see any good reasons/what it brings/not sure if it is worth pursuing) revealed a weak, but significant, factor correlation (φ = .27; ρ = .210), indicating these factors to possibly contrast with each other. Probably the experience of being supposed to/having to participate reflects issues other than whether this activity is good for the individual or worth pursuing. Nevertheless, the demonstrated correlations reflect earlier findings (Guay et al., 2000). Moreover, significant correlations in the predicted direction between the selected factors, stability over measurement times, intention of participation, and a simplex-like structure supported convergent and discriminant validity. Amotivation was the factor demonstrating the least stability across measurement times, supporting to some extent the findings of Guay et al. (2000). To investigate a possible invariance across gender, the TLI (boys: .90; girls: .97) and the CFI (boys: .93; girls: .97) were assessed, revealing a small difference (Table 3). Fit is considered adequate if the CFI and TLI values are > .90, and better if they are > .95 (Kline, 2011; Mehmetoglu & Jakobsen, 2017).
Hence, the SIMS factor structure did not show stability across gender, which does not support the findings of Guay et al. (2000), although they reported a small but statistically significant gender difference on the AM factor. The reasons why the SIMS structure differs between girls and boys cannot be discerned from our data, other than the observation that the SIMS demonstrates a better fit among girls than boys. Thus, this should be further investigated.
Taken together, the evidence supports satisfactory construct validity of the SIMS factors among adolescents in Norway. Content validity is a prerequisite for both reliability and construct validity (Mokkink et al., 2010; Potter & Levine-Donnerstein, 1999), and is assessed by judging the relevance and comprehensiveness of the items, both with regard to the construct to be measured and with regard to the study population. In the present adolescent population, the fourteen (sixteen) items appeared to be relevant, signified by the high factor loadings and the high R² values.

Strengths and limitations
The participation of 318 adolescents (response rate 87.4 %) from six schools involving three regions in Norway signifies a strength of this study. The present sample represents a diversity of locations in urban and rural areas, reflecting the general adolescent population in Norway. Next, the students' semester marks corresponded with the national average grade for the actual semester, indicating that the present sample does not differ from the general Norwegian adolescent population in terms of actual age. The PE teacher administered the data collection at the start of a PE lesson, ensuring anonymity and enough time for the students to fill in the questionnaire. This procedure using a well-known teacher in well-known surroundings contributed to making the students feel comfortable in the assessment situation, supporting reliable data. This represents a strength of this study. The fact that the same sample assessed the SIMS twice, allowing analysis of two datasets from the same sample signifies another strength of the present study. These results suggest that the Norwegian translation of the SIMS is a valid and reliable measurement model among adolescents.
Nevertheless, some limitations should be taken into consideration. This study of the Norwegian version of the SIMS included adolescents 13-17 years old. Thus, the present results cannot be generalized to younger children nor to older adolescents.

Conclusion
This study evaluated the psychometric properties of the Norwegian version of the SIMS among adolescents in secondary school by assessing the dimensionality, reliability and construct validity. The SIMS demonstrated satisfactory reliability and construct validity, while the dimensionality seemed somewhat blurred or indistinct. However, dismissing items 10 and 11 resulted in a good-fitting model which included 14 of the original 16 items. Hence, the Norwegian version of the SIMS seemed appropriate and can be used to measure situational motivation among adolescents. The two dismissed items might represent another factor related to having a choice/personal decision. Thus, a further development of the SIMS might include more items which would tap into this possible fifth factor.