Coping behavior versus coping style: characterizing a measure of coping in undergraduate STEM contexts

As technology moves rapidly forward and our world becomes more interconnected, we are seeing increases in the complexity and challenge associated with scientific problems. More than ever before, scientists will need to be resilient and able to cope with challenges and failures en route to success. However, we still understand relatively little about how these skills manifest in STEM contexts broadly, and how they are developed by STEM undergraduate students. While recent studies have begun to explore this area, no measures exist that are specifically designed to assess coping behaviors in STEM undergraduate contexts at scale. Fortunately, multiple measures of coping do exist and have been previously used in more general contexts. Drawing strongly from items used in the COPE and Brief COPE, we gathered a pool of items anticipated to be good measures of undergraduate students’ coping behaviors in STEM. We tested the validity of these items for use with STEM students using exploratory factor analyses, confirmatory factor analyses, and cognitive interviews. In particular, our confirmatory factor analyses and cognitive interviews explored whether the items measured coping for persons excluded due to ethnicity or race (PEERs). Our analyses revealed two versions of what we call the STEM-COPE instrument that accurately measure several dimensions of coping for undergraduate STEM students. One version is more fine-grained. We call this the Coping Behaviors version, since it is more specific in its description of coping actions. The other contains some specific scales and two omnibus scales that describe what we call challenge-engaging and challenge-avoiding coping. This version is designated the Coping Styles version. We confirmed that both versions can be used reliably in PEER and non-PEER populations. The final products of our work are two versions of the STEM-COPE. 
Each version measures several dimensions of coping that can be used in individual classrooms or across contexts to assess STEM undergraduate students’ coping with challenges or failures. Each version can be used as a whole, or individual scales can be adopted and used for more specific studies. This work also highlights the need to either develop or adapt other existing measures for use with undergraduate STEM students, and more specifically, for use with sub-populations within STEM who have been historically marginalized or minoritized.


Introduction
As our world rapidly becomes more interconnected and technologically advanced, the scientific challenges and problems that we face also increase in complexity (e.g., Page 2 of 26 Henry et al. International Journal of STEM Education (2022) 9:17 Cardinale et al., 2012;Daszak et al., 2020;Madhav et al., 2017;Pachauri et al., 2014). More than in the past, the next generation of scientists will need to cope with setbacks and failures on their way to successfully navigating research (Friedman, 2017). They will need to be resilient problem solvers in addition to innovative scientists. Indeed, the abilities to iterate, problem solve, and navigate obstacles and failures are already considered essential skills for expert scientists (Harsh et al., 2011;Laursen et al., 2010;Simpson & Maltese, 2017;Thiry et al., 2012). Furthermore, there is evidence that successfully navigating obstacles may be a key predictor of persistence for early career scientists (Harsh et al., 2011;Lopatto et al., 2020;Simpson & Maltese, 2017). This raises questions about whether and how encountering challenge and failure might affect attrition of certain groups that depart STEM at higher rates, including persons historically excluded on the basis of ethnicity or race (PEERs; Asai, 2020). From a pedagogical perspective, we should ask whether we are supporting all students in learning how to navigate failure and challenge successfully. Thus, it is paramount to be able to assess and understand how experiencing challenge and failure affects our students, especially those at higher risk of leaving, such as PEERs. However, we still understand relatively little about how these skills manifest in STEM contexts broadly, and how STEM undergraduate students, and specifically PEERs, develop these skills (Simpson & Maltese, 2017;Traphagen, 2015). This is slowly changing. Pedagogies that provide students opportunities to cope with failures and iterate in response to challenges are increasing in number. 
For example, course-based and individual undergraduate research experiences can provide opportunities for students to grapple with challenge, uncertainty, and failure (Auchincloss et al., 2014;Cooper et al., 2018a;Corwin et al., 2015;Laursen et al., 2010). Evidence suggests that being exposed to research challenges within undergraduate research contexts has the potential to produce numerous positive outcomes for students, including increasing their resilience and ability to navigate obstacles, increasing their understanding of the nature and culture of science, and increasing their sense of their own work as "authentic" (Gin et al., 2018;Goodwin et al., 2021;Hyman et al., 2019;Jordan et al., 2014;Laursen et al., 2010;Lopatto, 2007;Lopatto et al., 2020). However, especially when students are not sufficiently supported through these experiences, challenge and failure may also result in detrimental outcomes, such as discouragement or disengagement (Cooper et al., 2018a). Thankfully, STEM instructors are beginning to consider more carefully how to support students through challenge and failure in all STEM contexts so as to achieve positive outcomes from these experiences. The recently founded Factors influencing Learning, Attitudes, and Mindsets in Education network (FLAMEnet) supports many groups in preliminary investigations of how students approach challenges and respond to failures and how instructors support these processes.
The founders of FLAMEnet published a translational framework applying psychological concepts to challenge and failure in STEM (Henry et al., 2019). This work draws upon literature in psychology and education to explore the definitions of challenge and failure and to predict how students will respond (i.e., cope) based on different intrapersonal constructs (e.g., mindset). It defines STEM challenges as "achievement contexts that carry with them the risk of failure, that is, they push a student's skills and knowledge to a level at which the student risks a failure by engaging with them." Likewise, it explains that failures result when a student is unable "to meet the demands of an achievement context, with the result of not achieving a goal" (Henry et al., 2019). An achievement context, in turn, is one in which there is a defined task that is evaluated against a standard or expectation and that requires competencies to be carried out (Cacciotti, 2015). Notably, this definition includes failures that are decidedly mild, since achievement contexts can vary from high stakes (if I fail this final, I will likely not get into medical school) to low stakes (if I do not get results on my gel, I can just extract more DNA and try again). This makes the framework versatile: it can be used to understand both "Big F" failures, which may have higher impact for students, and "little f" failures, which may have lower impact and may not even be described as "failures" by students. Based on past work (e.g., Cooper et al., 2018a;Henry et al., 2021), we know that challenges and failures (both big and little f) in STEM contexts can cause stress for STEM students and require them to cope with this stress. We focus our investigation of coping on these STEM contexts.
Coping is defined as individuals' behavioral responses to stressors that typically allow an individual to either tolerate or mitigate the stress. Coping responses are both context-dependent and malleable (Lazarus, 1993). Yet, they also become more stable over time within a given context, so their malleability progressively lessens (Spencer et al., 1997). Thus, it is particularly important to understand coping (a) within the specific context in which it occurs, and (b) at critical times during which coping responses may be more malleable. Similarly, perceptions of relevant and meaningful coping responses may vary among different populations. This motivates our efforts to understand the coping responses of STEM undergraduates, which are likely to still be malleable within specific STEM contexts (i.e., during challenge and failure) given their status as novice scientists. Coping responses have also been categorized into numerous higher order taxonomies, with some of the most common being problem-focused vs. emotion-focused (Lazarus & Folkman, 1984); cognitive vs. behavioral (Latack & Havlovic, 1992); engagement vs. disengagement (Cooper et al., 2018b); and approach vs. avoidance (Roth & Cohen, 1986; reviewed in Skinner et al., 2003, Table 4, p. 226). Despite many efforts to adopt these higher order categories, they have inconsistent empirical support (Skinner et al., 2003). However, Skinner and colleagues' work suggests 13 different "families" of coping responses (Skinner et al., 2003, Table 5, pp. 240-241) with high levels of support across contexts. Skinner and colleagues (2003) extensively reviewed 100 different coping measures for evidence of coping strategies and identified strategies for constructing coping taxonomies and systems. Their table of coping families (Table 5) presents the terminology and level of support for the presence of the different families across the coping measures they reviewed. 
We draw upon these coping families to define coping constructs in our work (see Additional file 1: Table S1).
Coping responses within STEM can also be considered either adaptive or maladaptive. Responses are adaptive when they support both students' well-being and advancement of their STEM goals and maladaptive when they exacerbate threats to well-being and prevent progress toward goals (Carver et al., 1989;Henry et al., 2019;Skinner et al., 2003). The adaptivity of coping responses is not a stable characteristic; that is, the same coping response may in some contexts be adaptive and in others be maladaptive depending on the outcome of coping for the individual. Therefore, rather than being a stable coping taxonomy, adaptivity is a transient property of the combined coping response, context, and individual. This being said, some coping responses tend to be more consistently adaptive than others. Broadly speaking, coping responses that fall into the coping families of problem solving, support seeking, information seeking, cognitive restructuring, and emotional regulation are predicted to be adaptive in academic contexts (Alimoglu et al., 2010;Brdar et al., 2006;Henry et al., 2019;Sevinç & Gizir, 2014;Struthers et al., 2000). These strategies not only support students' well-being, but also their progress toward an academic goal, such as completion of a degree (e.g., Struthers et al., 2000) and persistence in a field of study (e.g., Shin et al., 2014). Conversely, strategies such as escape, rumination, helplessness, and opposition are predicted to be maladaptive in academic contexts based on prior empirical and theoretical work and are related to negative outcomes, such as lower academic performance (Alimoglu et al., 2010;Brdar et al., 2006;Sevinç & Gizir, 2014;Struthers et al., 2000), burnout (Shin et al., 2014), and attrition (Henry et al., 2019). These prior studies and predictions can inform our own views and hypotheses regarding STEM undergraduate coping. However, this work is not specific to STEM undergraduate contexts.
To understand STEM undergraduates' coping responses, we need to be able to rigorously examine coping within STEM contexts when students encounter challenge and failure. The recent proliferation of publications that focus on how we can support STEM students through challenge and failure and help them cope and develop resilience is encouraging (e.g., Brigati et al., 2020;Cooper et al., 2018a;England et al., 2019;Gin et al., 2018;Goodwin et al., 2021;Hyman et al., 2019;Lopatto et al., 2020;Perez et al., 2019;Riegle-Crumb et al., 2019;Shedlosky-Shoemaker & Fautch, 2015). However, work in this area is still in the early stages and is limited by the contexts and measures available. Much of the prior work in this area does not focus on how students approach or cope with STEM challenges and failures per se, but rather originates from studies of the impact of different pedagogies broadly speaking (e.g., Harsh et al., 2011;Laursen et al., 2010;Lopatto et al., 2020). These studies detail general increases in students' ability to cope with obstacles without specifically targeting that outcome as the topic of investigation. As a result, they provide relatively little detail about how students improve their coping skills, which coping responses they employ, and how different coping responses influence research, academic, and well-being outcomes. In addition, past measures that indicate enhanced ability to cope with scientific challenges and failures are limited. For example, the SURE and CURE surveys (Lopatto, 2005), which assess summer and course-based undergraduate research experiences, respectively, are often cited as evidence of this outcome. These surveys include only a single Likert-type question asking students to self-report improvements in their "tolerance for obstacles."
This approach limits measurement validity, since multiple items are typically needed to best assess complex latent constructs, such as coping (Knekta et al., 2019). Recent studies that assess and discuss coping as a construct of interest have uncovered multiple effective coping approaches that students use when tackling STEM challenges and failures (e.g., Gin et al., 2018;Goodwin et al., 2021). However, these studies take qualitative approaches, which are time-intensive and not easily applied to large numbers of students or across contexts. To expand the scope of this work and enable cross-context comparisons of coping patterns, new STEM-specific measures are needed. Specifically, Likert-type survey instruments assessing how students approach challenges and cope with both challenge and failure would enable innovative new studies of STEM students' coping styles across contexts and with larger sample sizes. Fortunately, instruments that aim to assess individuals' mindsets and goals as they approach challenges and respond to failures already exist. Multiple measures for notable constructs such as growth and fixed mindset (e.g., Dweck, 2006), fear of failure (e.g., Conroy et al., 2002), goal orientation (e.g., Dowson & McInerney, 2004), attributions (e.g., Russell et al., 1987), and coping responses (e.g., Carver et al., 1989) have been developed, tested, and employed across many contexts by psychologists and social scientists. However, a limitation of these measures is that they were not developed for or previously tested in undergraduate STEM contexts. Experiencing challenge and failure in STEM contexts may be unique, since failure is seen by expert scientists as an integral and necessary part of the scientific process and may even be lauded as a rite of passage (Simpson & Maltese, 2017). 
The idea of "growth through frustration" (Lopatto et al., 2020) abounds in STEM, and yet, this idea is countered by scientists' own perceptions that they must be "perfect" to be of value (e.g., Buck et al., 2008;Riegle-Crumb et al., 2019). As a result, measures developed for studying responses to challenge and failure in other contexts may not be valid in STEM contexts. Indeed, recent work investigating the use of common psychometric measures within STEM contexts has highlighted their limitations. Measures of growth mindset may be influenced by students' various definitions of intelligence (Limeri et al., 2020), which may be unique for STEM students. Likewise, our prior work confirmed that a published measure of fear of failure (the PFAI, Conroy et al., 2002) did not accurately measure STEM students' fear of failure. A revised version of the instrument, however, was shown to more accurately measure this construct (Henry et al., 2021). While these examples demonstrate efforts to revise and explore the validity of instruments that examine students' approaches when confronting challenges, less work has been done to understand whether existing measures accurately assess STEM students' coping responses after experiencing challenge and failure. This is the subject of our work.

The current study
The availability and choice of appropriate and valid assessment tools is critical for rigorous educational research (Cronbach & Meehl, 1955). In previous work, we established that certain intrapersonal factors (i.e., fear of failure) show unique presentation within the STEM context and, therefore, specialized versions of assessment tools are needed to properly assess levels of intrapersonal factors among undergraduate STEM students (Henry et al., 2021). In addition, prior research has established that PEER students are both more likely than their non-PEER counterparts to leave STEM academic programs (Asai, 2020;NCSES/NSF 2020;Steele 1997;Stinebrickner & Stinebrickner, 2014) and more likely to be positively impacted by interventions targeting intrapersonal factors, such as coping (Aronson et al., 2002;Fink et al. 2018;Yeager et al., 2016). For these reasons, it is important to consider the way a coping assessment will function within PEER (vs. non-PEER) samples. In this work, we aim to:
• Develop a valid, reliable, and versatile measure of coping in STEM contexts,
• Test the validity of the measure for both PEER and non-PEER populations, and
• Reflect on how the measure can be used and adapted by researchers, instructors, and administrators interested in investigating coping in STEM contexts given the results of our work.
To accomplish this, we draw upon existing assessments for coping style, the COPE (Carver et al., 1989), the Brief COPE (Carver, 1997), and the Student-COPE (SCOPE, Struthers et al., 2000) to inform our item selection and elucidate measurement models that may accurately describe STEM and STEM PEER undergraduate coping responses to challenge and failure. First, we adopted and wrote items that we felt would accurately capture relevant coping responses based on Skinner's coping families (Skinner et al., 2003). Next, we used exploratory factor analysis (EFA) to determine the best organizational structure(s) for our selected items based on undergraduate STEM students' responses to our survey. Third, we used confirmatory factor analysis (CFA) to verify the indicated structure(s) in additional samples of STEM undergraduates. We also investigated the suitability of the model(s) in a restricted sample of PEER (persons excluded because of their ethnicity or race; Asai, 2020) STEM undergraduate students. Fourth, we employed structured cognitive interviews among a representative group of undergraduate STEM students to assess the face and content validity of our final, revised version of the Brief COPE (Fig. 1). We describe full details on the methods, results, and conclusions for each of these steps below, and provide a broader discussion on the implications of using our modified coping measure in STEM undergraduate populations.
Step 1: Item selection and a priori model predictions
S1 Methods
Our central aim was to produce a measure of coping that would be valid for assessing undergraduate students' coping behaviors in response to challenges and failures in STEM learning contexts. We also wanted the measure to be useful for instructors seeking to encourage development of adaptive coping behaviors; that is, we wanted a measure that could be used as a pre/post assessment to measure a change in adaptive coping as a result of interventions. Rather than testing a single existing measure, we chose to draw upon multiple existing scales to form a bank of questions from which we developed the STEM-COPE measure. To achieve this goal, we (the authors of this work) worked together as an interdisciplinary expert panel consisting of two psychologists, one STEM education researcher, and two chemistry researchers to evaluate existing questionnaire items for inclusion in the STEM-COPE. This approach was prompted in part by our evaluation that previously developed scales had limitations in face and content validity that we wished to avoid. 
Specifically, prior scales: (a) lacked components we would expect to be present in STEM academic coping contexts (e.g., the Coping Strategies Indicator (Amirkhan, 1990) does not include items for positive reframing of a challenge), (b) were too context-specific or assumed properties of the context that may not always hold true (e.g., the SCOPE (Struthers et al., 2000) includes items that focus on "study guides," which may not always be available or applicable), (c) included scales that would not be useful for instructors seeking to take action in their classroom (e.g., the COPE and Brief COPE include items about drug and alcohol use, which may be inappropriate to intervene upon in STEM classrooms and which could incriminate students or make them uncomfortable when responding to the survey in a STEM context), and finally (d) lacked alignment with our guiding framework (i.e., Table 5 from Skinner et al., 2003).
Despite these limitations, we still felt that many individual items from prior scales were highly appropriate for STEM undergraduate contexts. Therefore, we evaluated questions in four broadly used coping instruments: the COPE (Carver et al., 1989), the Brief COPE (Carver, 1997), the Ways of Coping Questionnaire (Folkman & Lazarus, 1985;Leigh, 1979;Scherer et al., 1988), and the Coping Strategies Indicator (Amirkhan, 1990). We determined that the Brief COPE was the most appropriate scale from which to draw most of our items, since it was dimensionally comprehensive (assessing 14 constructs) and included many items that aligned with the framework supported by Skinner's literature review (Skinner et al., 2003, Table 5, p. 240). We also drew items from the original COPE (Carver et al., 1989) and the SCOPE (which is based on the COPE; Struthers et al., 2000). We selected items that the authors agreed would be relevant and valid within a broad range of STEM undergraduate contexts and that would not limit the use of the scale to a particular learning context. For example, general items such as "I take action to try to make the situation better" were preferred over more specific items such as "I drop out of the class I'm doing poorly in," since the latter would not apply in all STEM learning contexts.
After item selection, we reviewed the list of items generated from the Brief COPE, COPE, and SCOPE to assess the content validity of the questions. We asked, "Does our group of questions fully capture the relevant dimensions of STEM coping?" Based on our own and others' prior work (e.g., Brigati et al., 2020;Gin et al., 2018) and the constructs listed and described in Skinner and colleagues' work (2003, Table 5, p. 240), we felt that the list of constructs was not yet comprehensive. Specifically, we were concerned that the list did not include items reflecting coping via emotional regulation. Emotional regulation was an important coping factor in a recent study on coping within biology and contributes to self-reliance styles of coping (described in Brigati et al., 2020), and it has been recognized as a factor that contributes to "thinking like" and "becoming" a scientist (Hunter et al., 2007;Jaber & Hammer, 2016;Laursen et al., 2010). Therefore, we wrote several items to address the dimensions of emotional regulation. We also recognized that items related to escape were sparse and created additional items to address this dimension. To promote consistency, these items were modeled on items from other instruments (e.g., the item "I try not to let my feelings control me" was based on the Ways of Coping item "I try to keep my feelings from interfering with things too much.").

S1 Results
This process resulted in a total of 36 items that were deemed acceptable for testing using EFA. We predicted that these items would allow assessment of the following dimensions of coping: problem solving (and more specifically direct action and planning), support seeking (both help and comfort seeking), escape, acceptance, distraction, cognitive restructuring, self-blame, helplessness, emotional regulation, information seeking, and opposition. See Additional file 1: Table S2 for the full list of items, their original dimensions presented in the Brief COPE, COPE, or SCOPE, and our a priori dimensional designations according to families described in Skinner et al. (2003).

S2 Methods
EFA is most appropriate when researchers do not have existing hypotheses regarding how individual survey items (e.g., "I act as though it hasn't happened") will group together into factors or subscales, or when researchers wish to allow items to organize into groups without imposing an initial hypothesis. In contrast, CFA is used when there is a hypothesized factor structure that researchers wish to "confirm" as appropriate for a given population. Although prior work suggested factors into which the coping items we wished to evaluate might fall (Additional file 1: Table S1), we chose to perform EFA as a first step for three reasons. First, there is disagreement within the coping literature about appropriate higher level coping "dimensions" (i.e., components of a construct; dimensions typically determine what will constitute a 'factor' in factor analysis), including how ideas should be grouped (e.g., proactive vs. reactive; emotion-focused vs. problem-focused; active vs. passive). Second, our prior work suggested that, for intrapersonal constructs such as coping, factor structures that work for general or broad undergraduate populations may not work for STEM-specific populations (Henry et al., 2021). Third, prior work on many of the items we included in our initial item set suggested that closely related constructs may group together to inform us about broader coping styles. Thus, we proceeded with EFA and hypothesized that it would not only yield a well-fitting model for use among STEM undergraduate students but also reveal patterns in the ways STEM students view coping that are unique to this context.
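Both approaches can be framed with the standard common factor model. The block below is a textbook formulation offered as an illustration, not the specific model estimated in this study. For p survey items and m factors:

```latex
\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Sigma}
  = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}
```

Here x is the p × 1 vector of item responses, Λ is the p × m matrix of loadings λ_ij, f holds the m latent factors with covariance matrix Φ, and ε holds item-specific errors with diagonal covariance Θ. In EFA every loading λ_ij is estimated freely (up to rotation). In CFA, λ_ij is fixed at 0 whenever item i is not hypothesized to measure factor j.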

S2 Participants
Data for this EFA were drawn from a data set of approximately 1800 undergraduate STEM students. Participants were invited to participate in this study during Fall 2018 by STEM instructors who are members of FLAMEnet, an NSF-funded research collaborative that aims to gather a diverse group of STEM instructors, education researchers, and social scientists to conduct research and create resources aimed at fostering innovation and resilience in student scientists in higher education (https://qubeshub.org/community/groups/flamenet/). Instructors provided information about the study to students during class, via the course learning management system, and/or via email. As an incentive for participation, students were informed that they would receive a small amount of extra credit (< 1% of their grade) for participating. All students were enrolled in college-level STEM courses during the study at a range of institution types: private and public, research and liberal arts, and 2-year and 4-year. After removal of outliers (described below), 1250 students remained in the data set. Based on current psychometric recommendations, this sample size should provide sufficient statistical power for a successful EFA (Knekta et al., 2019). Full demographics of participants are presented in Table 1. In general, students were majority female (68.1%), White (56.7%), and non-Latinx (88.8%).

S2 Procedures
The activities carried out as part of this study were approved by the Emory University IRB (Protocol IRB00114138). The 36 items chosen during item selection were used to construct a Qualtrics survey, which was implemented during the Fall 2018 semester (see "Participants" above). Within the first few weeks of the semester, consenting students completed a survey that assessed coping, among other psychosocial variables of interest (see Henry et al., 2021, for examples of other constructs included). Before beginning the survey, students were given the following prompts, which were designed to align with the authors' understanding of challenge and failure (see "Introduction") and to help alleviate discomfort students might feel when thinking about failures in particular: "The following questions ask you to consider the way you feel and act when you face failures and challenges in your STEM (Science, Technology, Engineering, & Mathematics) courses. In this case, a failure is simply any time when the reality of a situation falls short of what you wish had happened. We all fail at things sometimes, and this is a completely normal part of life and of being a scientist. A challenge is any situation that makes the possibility of a failure more likely." After this initial explanatory passage, students were asked to consider a list of coping behaviors based on the following prompt, which appeared as a question at the start of the list: "How often do you do the following when dealing with challenges, struggles, or failures in your STEM course(s)?" This prompt language was designed to specifically ask students to consider coping responses in STEM environments, as prior work demonstrates that non-cognitive responses, such as coping, can be context specific and may manifest differently among college students in STEM environments (Henry et al., 2021;Lazarus, 1993). In addition, this prompt was designed to be broad enough to encompass many stressful contexts within STEM, including contexts that students perceive as challenges and those that they perceive as failures (i.e., "challenges, struggles, or failures"). We chose broad language for two reasons. First, we wanted our instrument to be a versatile measure of coping across stressful contexts ranging from what might be encountered as a challenge to what might be encountered as a failure. Thus, we used multiple terms. Second, prior research has demonstrated that students' own conceptualizations of what constitutes a "failure" or "challenge" vary from student to student (Krishnan, 2021). For example, some students might describe what we call "little f" failures as "challenges" or "struggles," while others describe minor errors that can be fixed in the moment as "total failures" (Krishnan, 2021). Using the above prompt to describe potential STEM stressors, rather than a prompt with only one term, allowed us to build an instrument that is versatile and can be used across various STEM contexts in which students experience challenges and may have to cope with failures.

Table 1 Demographic characteristics of participants [table body not recoverable; surviving fragments: column "Step 2: EFA sample (Fall 2018; n = 1250)"; row "Prefer not to answer": 9 (1.2), 12 (2.8), 1 (0.6)]
1 Percentages are computed from all valid data in a category and do not include missing data. Raw frequencies, therefore, may be less than the total N for each sample.
2 Students were asked the question "With which gender do you most identify?" (Female, Male, Non-binary, Transgender, Other, Prefer not to answer). Responses not represented above were not selected by participants.
3 Data collection methods for "major" information varied across timepoint and collection location. Some instructors constrained students to a 6-option forced-choice selection (including "Other"), while others allowed students to list their current major as they chose. Major data were not collected in fall 2019. This accounts for some of the variation in rates of majors presented here.
4 Respondents could choose multiple categories to express their racial identity; thus, total responses for the category exceed N = 433 (100%).

Students responded to survey items on a four-point response scale: Not at all, Rarely, Occasionally, and A lot. We chose this response scale based on its prior use with the Brief COPE and also in an attempt to avoid social desirability bias. Research comparing four-point and five-point Likert scales has found that a four-point scale results in more responses on the negatively valenced end of the scale. 
This suggests that removing a "neutral" option may reduce respondents' unconscious desire to be helpful to the researcher or to avoid giving socially "unacceptable" responses (Garland, 1991). Furthermore, because the goal of these items was to elicit information rather than opinion, and a true neutral value does not fit logically within our frequency-based range of responses, we judged a four-point scale the most appropriate. After completing the survey items, students answered relevant demographic questions. These were intentionally placed at the end of the survey in an attempt to mitigate stereotype threat.
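To make the response format concrete, the sketch below codes the four response options as integers and averages the answered items on one scale. It is a minimal illustration under stated assumptions: the text does not report how responses were scored, so the 1 to 4 coding, the use of item means as scale scores, and the helper names (`RESPONSE_CODES`, `code_responses`, `scale_mean`) are all hypothetical.

```python
# Hypothetical coding of the four-point response scale (assumption: the
# source does not report the numeric values used for scoring).
RESPONSE_CODES = {"Not at all": 1, "Rarely": 2, "Occasionally": 3, "A lot": 4}


def code_responses(responses):
    """Map verbal responses to 1-4; blank or unrecognized responses become None."""
    return [RESPONSE_CODES.get(r) for r in responses]


def scale_mean(coded):
    """Mean of the answered items on one scale, or None if no item was answered."""
    answered = [c for c in coded if c is not None]
    return sum(answered) / len(answered) if answered else None


# Example: three hypothetical items from one scale, one left blank.
coded = code_responses(["A lot", "Occasionally", None])
print(coded)              # [4, 3, None]
print(scale_mean(coded))  # 3.5
```

Because there is no neutral option, the arithmetic midpoint of this coding (2.5) falls between Rarely and Occasionally, so no single response can be read as neutral.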

S2 Preliminary results
Descriptive analyses. We began by identifying outliers in our data set using the outlier labeling method (Hoaglin & Iglewicz, 1987; Hoaglin et al., 1986; Tukey, 1977). This approach classifies outliers (here defined as values more than three standard deviations from the mean) as missing data, thereby excluding them from analyses without permanently removing them from the data set. In addition, descriptive statistics, including skewness and kurtosis, revealed that these data were not normally distributed. Thus, the robust maximum likelihood (MLR) estimator was used for our main analyses in MPlus. Initial exploration of the data also revealed that students endorsed the full range of responses on the coping items, providing suitable variability within the sample to proceed with EFA.
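The outlier rule as stated above (values more than three standard deviations from the mean recoded as missing) and the skewness/kurtosis check can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names, not the authors' code; the skewness and kurtosis shown are simple moment-based (population) estimators, which may differ slightly from the small-sample-adjusted values reported by statistical software.

```python
from statistics import mean, stdev


def label_outliers(values, k=3.0):
    """Recode values more than k standard deviations from the mean as None.

    The record stays in the data set; only the extreme value is treated as
    missing, so it is excluded from later analyses.
    """
    observed = [v for v in values if v is not None]
    m, s = mean(observed), stdev(observed)
    return [None if v is not None and abs(v - m) > k * s else v for v in values]


def skew_and_excess_kurtosis(values):
    """Moment-based (population) skewness and excess kurtosis, ignoring None."""
    x = [v for v in values if v is not None]
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    # A normal distribution has skewness 0 and excess kurtosis 0.
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Example with made-up scores: one extreme value among otherwise similar ones.
scores = [2] * 19 + [40, None]
cleaned = label_outliers(scores)
print(cleaned[-2:])                         # [None, None]
print(skew_and_excess_kurtosis([1, 2, 3]))  # approximately (0.0, -1.5)
```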

S2 Main results
We determined the number of factors to include in our EFA by analyzing eigenvalues and visually inspecting the Scree plot. An eigenvalue indicates the amount of variance in the items accounted for by an individual factor. Because of this, a factor with a higher eigenvalue is typically assumed to be more useful in model definition, and a general rule is that factors with eigenvalues below 1.0 should not be included in models (Cattell, 1978; Knekta et al., 2019). Scree plots provide a visual demonstration of eigenvalues by plotting them against the number of factors. The number of factors should be limited at the "elbow" of the Scree plot, where the sharp drop in eigenvalues levels off (Cattell, 1966; Kaiser, 1960; Knekta et al., 2019). However, guidelines regarding eigenvalues and Scree plots are subject to interpretation and should be used only as a starting point to help researchers limit the number of factors considered for an EFA. To make final determinations of the number and structure of factors, it is critical that researchers carefully examine the quantitative fit statistics provided for all potential models and consider the theory and constructs underlying the creation of survey items. These considerations and the intended future use of a measure should guide conclusions about the "best" number of factors or overall goodness of fit for any one model (Knekta et al., 2019).
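As an illustration of the eigenvalue rule (the Kaiser criterion), the following sketch computes the eigenvalues of an item correlation matrix, sorted in the order they would appear on a Scree plot, and counts those above 1.0. It assumes NumPy is available and uses a small made-up correlation matrix; the function name `kaiser_count` is hypothetical, and, as the text cautions, this count is only a starting point for choosing among candidate models.

```python
import numpy as np


def kaiser_count(corr):
    """Return eigenvalues in descending (Scree-plot) order and how many exceed 1.0."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))[::-1]
    return eigenvalues, int(np.sum(eigenvalues > 1.0))


# Made-up correlations: items 1-2 correlate 0.6, items 3-4 correlate 0.6,
# and the two pairs are uncorrelated with each other.
corr = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
eigenvalues, n_factors = kaiser_count(corr)
print(np.round(eigenvalues, 3))  # [1.6 1.6 0.4 0.4]
print(n_factors)                 # 2
```

For a correlation matrix the eigenvalues sum to the number of items, so each eigenvalue can be read as the share of total item variance a factor accounts for; here the two values of 1.6 correspond to the two correlated item pairs.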
While benchmarks for "good fit" can vary (Kenny, 2020), we interpret fit statistics using recommendations from Knekta et al. (2019). Specifically, we employ Akaike's Information Criterion (AIC), which compares all proposed models to a theoretical "true" model, estimating how far each model's fit to the data falls from that "ideal" model. AIC also allows comparison between models fit to the same sample; each model's AIC value reflects its distance from the "true" fit for the data. Therefore, lower AIC values are preferred (Akaike, 1998; Kenny, 2020). Root Mean Square Error of Approximation (RMSEA) values describe "badness of fit"; again, a lower number is preferred. In contrast, higher values are preferred for the Comparative Fit Index (CFI), since it assesses incremental improvement in model fit over a baseline model. Finally, the Standardized Root Mean Square Residual (SRMR) represents the standardized difference between the correlations predicted by the model and the actual observed correlations. Smaller differences between these correlation values indicate closer convergence; thus, a smaller SRMR value indicates good fit (Kline, 2010; Taasoobshirazi & Wang, 2016). See Knekta et al. (2019) for more complete descriptions of how each metric is calculated and what it means.
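The quantities behind these indices can be written compactly. The functions below sketch commonly used textbook formulas and are not a reproduction of how MPlus computes its output: RMSEA is shown with N - 1 in the denominator (some programs use N), CFI compares the model's chi-square excess over its degrees of freedom to that of a baseline model, SRMR is computed from residual correlations over the lower triangle including the diagonal, and all example inputs are invented.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """RMSEA point estimate (one common form, using N - 1)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def aic(log_likelihood, n_free_params):
    """Akaike's Information Criterion; lower values are preferred."""
    return -2.0 * log_likelihood + 2.0 * n_free_params


def srmr(observed_corr, implied_corr):
    """Root mean square of observed-minus-implied correlations (lower triangle)."""
    p = len(observed_corr)
    squared = [
        (observed_corr[i][j] - implied_corr[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)
    ]
    return sqrt(sum(squared) / len(squared))


print(round(rmsea(120.0, 50, 401), 4))        # 0.0592
print(round(cfi(120.0, 50, 2050.0, 66), 4))   # 0.9647
print(aic(-1000.0, 20))                       # 2040.0
print(round(srmr([[1, 0.5], [0.5, 1]], [[1, 0.3], [0.3, 1]]), 4))  # 0.1155
```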
For our data, the eigenvalues and the Scree plot (see Fig. 2) indicated that a model with between 3 and 7 factors would provide the best fit. EFA for each of these proposed factor structures was carried out using MPlus v. 8.1 (Muthén & Muthén, 1998). Model fit statistics are displayed in Table 2 (rows 1 and 2). Fit statistics suggest that models with both 6 factors and 7 factors fit our data well. To further investigate fit, we then examined the factor loadings of items within each of the proposed models. Any item that loaded onto a factor with a loading of at least 0.40, and whose loading exceeded any cross-loading by at least 0.20, was retained for that factor (Masaki, 2010). We also considered whether the emergent factors were consistent with past research and theory regarding coping responses. After we analyzed both prospective models with these criteria, both remained good fits for the data, providing different, equally valuable, and theoretically grounded perspectives for assessing coping.
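The item retention rule can be expressed as a small function. This is a sketch under assumptions: `retain_item` is a hypothetical name, loadings are compared in absolute value, and the function follows the wording of the paragraph above (the primary loading must exceed every cross-loading by at least 0.20). The table footnotes phrase the cross-loading check only among loadings above 0.40, which would retain a few items that this stricter version drops.

```python
def retain_item(loadings, min_loading=0.40, min_gap=0.20):
    """Return the index of the factor an item is retained on, or None if dropped.

    An item is kept only if its largest absolute loading is at least
    min_loading and exceeds every other (cross-)loading by at least min_gap.
    """
    magnitudes = [abs(x) for x in loadings]
    best = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    if magnitudes[best] < min_loading:
        return None  # no sufficiently strong loading
    others = [m for i, m in enumerate(magnitudes) if i != best]
    if others and magnitudes[best] - max(others) < min_gap:
        return None  # cross-loading too close to the primary loading
    return best


print(retain_item([0.62, 0.15, 0.05]))  # 0
print(retain_item([0.35, 0.10, 0.02]))  # None
print(retain_item([0.55, 0.41, 0.00]))  # None
```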
The 6-factor model, hereafter termed the Coping Behaviors model (Table 3), drops 18 items, leaving a 22-item measure. The remaining items cluster onto factors that closely echo specific coping behaviors previously proposed for assessing coping (e.g., Skinner et al., 2003). These behaviors include escape ("I refuse to believe this has happened"), disengagement ("I give up trying to deal with it"), support seeking ("I get help and advice from other people"), problem solving ("I think hard about what steps to take"), cognitive restructuring ("I look for something good in what is happening"), and humor ("I make jokes about it").
The 7-factor model, which we call the Coping Styles model (Table 4), drops 11 items, presenting a 29-item measure. In contrast to the Coping Behaviors model, the Coping Styles model sees many items cluster into two broad factors. One of these subsumes items from coping styles typically viewed as "adaptive" (e.g., problem solving (direct action and planning), information seeking, etc.) and can, therefore, be thought of as assessing an individual's overall tendency to engage with and address a stressor in a positive way. Importantly, this factor is distinct from the problem solving factor in the Coping Behaviors model, since it includes the direct-action component of problem solving, implying that students who endorse these items not only try to understand the problem and plan to solve it, but also take action to solve it. We have named this challenge-engaging coping. The other large factor acts as an umbrella for many individual coping behaviors that have historically been viewed as "maladaptive" (e.g., helplessness, escape, distraction, etc.) and may be used to assess an individual's overall inclination to avoid dealing with stressors and challenges in productive ways; we call this challenge-avoiding coping. In addition to these two omnibus factors, the Coping Styles model retains cognitive restructuring, support seeking, and humor, and self-blame emerges as a significant, distinct factor (e.g., "I criticize myself").

Table notes. (a) Factor loadings over 0.50 appear in bold; factor loadings below 0.20 appear in lighter italic font. (b) These items were dropped from the measures after subsequent confirmatory factor analyses. (c) Items were dropped from the scale for one of two reasons: (1) no factor loadings were above 0.40, or (2) the difference among factor loadings above 0.40 was less than 0.2, indicating too high a degree of cross-loading.

S2 Brief discussion
EFA allowed us to elucidate how our selected existing coping items (those from the COPE [Carver et al., 1989], Brief COPE [Carver, 1997], and SCOPE [Struthers et al., 2000]) grouped into factors that describe coping in undergraduate STEM contexts. Notably, we found two models with good fit, the Coping Behaviors and Coping Styles models, both of which substantially reduced the number of items (by 18 and 11 items, respectively). The Coping Behaviors model included factors for escape, disengagement (i.e., helplessness and distraction), support seeking, cognitive restructuring, problem solving (information seeking and planning), and humor (see Additional file 1: […])

Step 3: Confirmatory factor analyses (CFAs) to confirm fit of new models

S3 Methods
Once the Coping Behaviors and Coping Styles models emerged as good fits for the EFA data, it was critical to verify that these models also fit data drawn from more than one sample of STEM students. CFA was, therefore, used to test the fit of these models in a second sample of students. Unlike EFA, CFA is appropriate when a priori hypotheses exist regarding the organization of a conceptual model. That is, CFA tests whether an existing measure of a construct (here, the Coping Behaviors and Coping Styles models) is consistent with the proposed understanding of the construct (i.e., coping). Because research indicates that efforts to improve educational interventions and assessment may be particularly powerful for PEER students, who are at high risk of STEM attrition (NCSES/NSF, 2020; Steel, 1997; Stinebrickner & Stinebrickner, 2014), we ran CFAs for each model separately for PEER and non-PEER students. This was done to ensure that our final recommended model(s) were effective in assessing coping in PEER students.

S3 Participants/procedures
Participants were part of the same research project conducted by FLAMEnet described above; they were recruited in the same ways and underwent the same data collection procedures. All students, both PEER and non-PEER, were enrolled in a STEM course at the time data were collected. All students completed the same 36-item coping measure described above (Step 2, "Materials"), which specifically asks respondents to consider their responses to struggles and challenges in STEM contexts.
Non-PEER students. A total of 433 undergraduate students were recruited from STEM classrooms during the Fall 2019 semester. Students were classified as "non-PEER" based on their response to the demographic question "Which of the following best describes you?" Students who self-identified as "White" or "Asian" on this item were classified as non-PEER. This distinction is based on data from the National Science Foundation (NSF) indicating that Asian students are not typically underrepresented in college-level STEM and health-related sciences in the United States (Asai, 2020). From this Fall 2019 sample, 363 students were classified as non-PEER.
PEER students. To maximize statistical power, PEER students were drawn from both the Fall 2018 data set of 1309 students (Step 2) and the 433-person data set collected in Fall 2019. This combined sample was used for all analyses comparing model fit between PEER and non-PEER STEM students. While this is not ideal (Knekta et al., 2019), combining both years of PEER student samples allowed us to carry out factor analysis to assess the validity of our coping measure specifically for PEER students. Otherwise, due to the continued lower enrollment of PEER students in STEM (Asai, 2020) and the power necessary for factor analyses (Kyriazos, 2018), such work would not have been possible in this study. Once again, classification as PEER was based on responses to one demographic question. Any student who self-identified with a race or ethnicity other than "White" or "Asian" was designated as PEER. This resulted in 280 PEER students who identified as African American or Black; American Indian or Alaskan Native; Arabic or Middle Eastern; Hispanic or Latinx; multiracial; and/or other racial or ethnic groups. Full demographics for both non-PEER and PEER students can be viewed in Table 1.

Table notes. (a) Factor loadings over 0.50 appear in bold; factor loadings below 0.20 appear in lighter italic font. (b) These items were dropped from the measures after subsequent confirmatory factor analyses. (c) Items were dropped from the scale for one of two reasons: (1) no factor loadings were above 0.40, or (2) the difference among factor loadings above 0.40 was less than 0.2, indicating too high a degree of cross-loading. (d) These items had high cross-loading (> 0.50) with a seventh factor during exploratory factor analysis. However, as these were the only two items that loaded on that factor above 0.40, the seventh factor was dropped from the model.
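The PEER classification rule described above can be sketched as a small function. The category labels are taken from the text, but the function name is hypothetical, and the handling of "Prefer not to answer" (left unclassified rather than counted as PEER) is an assumption, since the source does not say how such responses were treated.

```python
# Per the text, students selecting only "White" and/or "Asian" are non-PEER;
# any other race or ethnicity selection makes a student PEER.
NON_PEER_CATEGORIES = {"White", "Asian"}
# Assumption (not stated in the source): non-answers are left unclassified.
NO_RESPONSE = {"Prefer not to answer"}


def classify_peer_status(selected):
    """Return "PEER", "non-PEER", or None for a student's selected categories."""
    chosen = set(selected) - NO_RESPONSE
    if not chosen:
        return None
    return "non-PEER" if chosen <= NON_PEER_CATEGORIES else "PEER"


print(classify_peer_status(["Asian"]))                        # non-PEER
print(classify_peer_status(["White", "Hispanic or Latinx"]))  # PEER
print(classify_peer_status(["Prefer not to answer"]))         # None
```

Because respondents could select multiple categories, the check is a subset test: a student is non-PEER only when every selection falls within the non-PEER set.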

S3 Preliminary results
Once data were cleaned (e.g., outliers truncated, cases with mostly missing data deleted), we performed independent-samples t tests to examine whether our samples came from similar populations (except for the demographic we expected to vary, i.e., PEER status) and to identify other variables that might have influenced our results (e.g., are any differences among samples due purely to PEER vs. non-PEER status, or might they also be influenced by other factors such as SES?). These t tests indicated no significant differences in gender between the Fall 2018 (Step 2) and Fall 2019 samples or between the non-PEER and PEER samples. Students in the Fall 2019 CFA sample reported being slightly older and at a more advanced class level. Finally, PEER students reported significantly lower levels of parental education, especially in Fall 2018. Full details of these demographic differences and their implications can be viewed in Additional file 1: Table S3. Overall, we judged that the observed differences did not substantively change the implications of our study. Again, data were not normally distributed within either the non-PEER or PEER samples, so the MLR estimator was used for the main analyses.
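For readers unfamiliar with the test, the sketch below computes Student's independent-samples t statistic with a pooled variance, using only the Python standard library. It is illustrative only: the text does not say whether pooled or Welch (unequal-variance) t tests were used, the data are made up, and the p-value step is omitted; in practice it would come from the t distribution with the returned degrees of freedom (for example, SciPy's `scipy.stats.ttest_ind`, which also offers Welch's version via `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent-samples t statistic (pooled variance) and its df."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Made-up ages for two small groups.
t, df = pooled_t([18, 19, 20, 21, 22], [20, 21, 22, 23, 24])
print(round(t, 3), df)  # -2.0 8
```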

S3 Main results
Using MPlus v. 8.1 (Muthén & Muthén, 1998), CFAs were run separately for non-PEER and PEER students for both the Coping Behaviors and Coping Styles models. Model fit was assessed using AIC, RMSEA, CFI, and SRMR as described in the "Results" section of Step 2, above (Kline, 2010; Taasoobshirazi & Wang, 2016). When CFAs were conducted to assess the Coping Behaviors model separately for non-PEER and PEER students, the latent variable covariance matrix (PSI) involving the humor factor was "not positive definite" in both groups. This was likely because the estimated correlations between humor and other factors exceeded 1. This suggests that, when PEER and non-PEER students are analyzed separately rather than as a single combined group, students' responses do not distinguish humor as a distinct coping response. Instead, other factors in the Coping Behaviors model provide more complete and parsimonious explanations of the variation in the data. The humor items were, therefore, removed from the model and the CFAs were rerun for both groups. Model fit statistics based on these analyses are displayed in Table 2 (rows 4-7).
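Why a latent correlation above 1 produces this warning can be shown with a small check: a valid covariance or correlation matrix must be positive definite, and a Cholesky factorization succeeds only when it is. The sketch below assumes NumPy, uses made-up 2 x 2 latent correlation matrices, and is not how MPlus performs its own check; the function name is hypothetical.

```python
import numpy as np


def is_positive_definite(matrix):
    """True if a symmetric matrix is positive definite (Cholesky succeeds)."""
    try:
        np.linalg.cholesky(np.asarray(matrix, dtype=float))
        return True
    except np.linalg.LinAlgError:
        return False


ok = [[1.0, 0.5], [0.5, 1.0]]
bad = [[1.0, 1.05], [1.05, 1.0]]  # latent correlation above 1
print(is_positive_definite(ok))   # True
print(is_positive_definite(bad))  # False
```

With a correlation of 1.05, the determinant of the 2 x 2 block is 1 - 1.05^2 < 0, so one eigenvalue is negative; any larger latent matrix containing such a pair is likewise not positive definite.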
For the Coping Styles model, an initial CFA also revealed a "not positive definite" latent covariance matrix involving the humor factor. In this case, however, the problem with humor was observed only among PEER students; humor remained a distinct factor for non-PEER students. Nevertheless, given our goal of generating a coping measure that could be used effectively to assess coping for all students, including PEERs, we decided to drop the humor factor for both PEER and non-PEER groups when evaluating the fit of the Coping Styles model. (Although the covariance matrix raised no issue with humor in the non-PEER group for the Coping Styles model, removing this factor also improved overall model fit for these students: AIC 76,283.655 with humor vs. 68,742.421 without.) With humor removed, both the Coping Behaviors (Table 2, rows 4 and 5) and Coping Styles (Table 2, rows 6 and 7) models demonstrated good fit for both non-PEER and PEER students. This suggests that either version could be used to assess coping among undergraduate STEM students. Which version is ultimately most appropriate will depend on the specific research question or the goals of the instructor. Correlations between latent factors of the Coping Behaviors and Coping Styles models can be viewed in Additional file 1: Tables S4 and S5, and the reliability of coping dimensions across samples and models can be viewed in Additional file 1: Table S6.
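Table S6 reports reliability, but this section does not name the coefficient used. As one common choice, Cronbach's alpha is sketched below (McDonald's omega is another frequently reported option); the function name, the 1 to 4 coding, and the data are all hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are items on one scale."""
    k = len(rows[0])
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses (1-4 coding) from four students to a two-item scale.
rows = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(rows), 3))  # 0.75
```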

S3 Brief discussion
By conducting separate CFAs for PEER and non-PEER students, we demonstrate that slightly modified versions of both models (which drop the humor factor) provide statistically good fits for both populations. The removal of humor as a factor is not surprising given that humor is context dependent and culturally embedded (Rappoport, 2005), making its uses more likely to vary across cultures and contexts. Notably, given past work on the challenges associated with pursuing a STEM career for PEER students and the increased efficacy of interventions for PEERs (e.g., Sisk et al., 2018), we feel that both the Coping Behaviors and Coping Styles versions of this instrument could be especially useful tools for future research. However, other identities and demographic factors may also influence interpretation of coping items, and more work is needed to determine whether these versions of the STEM-COPE instrument are useful across all identities. For example, the influence of mental health on how students interpret these items was not considered here, though this may be an interesting avenue to pursue given recent work (Cooper et al., 2018a). Our decision to focus on model fit for PEERs was based on the wealth of research detailing the continued disparities in STEM achievement between PEER and non-PEER students despite decades of interventions (Asai, 2020). Future studies should examine model fit for other excluded and marginalized subgroups.

S4 Methods
Finally, we conducted a series of cognitive interviews in which STEM students were asked directly how they interpreted the survey items. This allowed us to assess whether their interpretations aligned with the intended purpose of each item. These interviews also reveal what participants are thinking and feeling while responding, which can influence the valence of responses (Willis, 2015). Thus, we used cognitive interviews to (a) check the face validity of our items (i.e., were the items interpreted by STEM undergraduates as we intended?) and (b) help elucidate potential reasons why certain items did not fit well in our EFA and CFA analyses.

S4 Participants/procedures
Eleven students completed interviews of approximately 20 min each via Zoom during the Summer 2020 and Fall 2020 semesters in exchange for a $20 Amazon gift card. Full demographics for these students can be viewed in Table 5; in general, gender and non-PEER/PEER status were roughly evenly distributed. During each session, the purpose of the interview was explained to the student, and each survey prompt and each of the 36 original coping items were reviewed individually. Students were asked to comment on (a) whether the meaning of the question was clear and how they interpreted it, (b) whether the answer choices seemed appropriate, and (c) whether they had any suggestions for improving the question. At the end of the interview, students were asked if they had any final thoughts on the measure as a whole.

S4 Results
Student responses indicated that the question prompts used to introduce the items were clear. Students found the initial passage explaining failure and challenges helpful, since it contextualized what these terms meant. Specifically, students commented that it was especially helpful that the provided definition of failure was broad and "less of a harsh thing," because this helped them come up with examples of failures more easily. They also commented that the less "extreme" definition put them at ease when considering the questions. They expressed that they understood what was meant by challenge, struggle, or failure, but did not elaborate with specific examples. Students were also asked what courses came to mind after considering the prompts, to check that they were responding with STEM contexts in mind. Most students mentioned chemistry or biology, and two mentioned math courses, indicating that the prompt was sufficient to ensure that students responded about coping in STEM contexts. Students reported that the items retained in the two versions of the measure were clear and had appropriate answer choices. Students' thoughts about the items dropped from the Coping Behaviors (18 items) and Coping Styles (11 items) models suggest several reasons why these particular items do not effectively assess coping in undergraduate STEM contexts. In particular, redundancy, vague wording, and an overall sense of ambiguity or failure to consider context emerged as themes among the dropped items.
Among the Coping Behaviors scales, redundancy was frequently mentioned as a concern for dropped items intended to load on the denial or problem solving factors. For example, of the denial items "I pretend it hasn't really happened" and "I act as though it hasn't happened", respondents asked, "How is this different from 'I say to myself "this isn't real"'?"-VH, or "Is this intentionally similar with 'I say to myself "this isn't real"'?"-PW. Similarly, many dropped problem solving items were described as "similar," with students asking how one differed from another. When asked to comment on the item "I take action to try to make the situation better," VH said: "This is similar to 'I take action to try to find out why it occurred' and 'I think about the reason(s) why it occurred'; they're all figuring out how to make it better."
Among the problem solving items, the item "I think about the reason(s) why it occurred" appeared to be preferred by some students, as it was seen as "better than [others]; more clear because thinking about why is the first step to action"-MB.
Another issue raised by our cognitive interviews was the importance of clarity among items. Student responses to several of the items originally intended for the distraction scale of the Coping Behaviors model and for the support seeking scale highlighted vague wording. The one distraction item retained in the Coping Behaviors version (albeit on the disengagement/avoidance factor), "I do something to think about it less; e.g., watching movies, tv, reading, etc.", was lauded by students for its clarity, and they wondered why other items were not modeled after it: "Why are there options (specifics) listed for this, not others where there could be? It would be helpful."-CE.

Students had issues with other distraction items that relied on more open-ended terminology, such as "I turn to work" ("Vague: what kind of work? Other activities can range from eating, talking on the phone, etc."-PW) and "reach my goal" ("Are we talking about long-term or short-term goals?"-MZ). In particular, students found the item "I give up the attempt to cope with the problem" unclear: "A little unclear. Giving up on solving it? Giving up [on] coping, but still trying to solve it? Letting it emotionally destroy me while still working on the problem?"-MZ.
This lack of clarity could have contributed to the majority of distraction items being dropped.
Similar concerns were raised about the support seeking items on both versions of the measure (e.g., "[A past item] talked about 'emotional support'. Is this help and advice different from that?"-MB). However, we find these comments less concerning, given that these items loaded onto factors in both models. Nonetheless, these results underscore the broader point that STEM students seem to prefer greater detail in the items used to assess coping in academic contexts.
Related to this, we also recorded several responses indicating a lack of clarity about the context or framing of items. With the humor items "I make jokes about it" and "I make fun of the situation," for example, students pointed out that the object and tone of the humor could vary and that this would affect their response. These students requested more context: "Narrow it down to whether it's a negative or positive thing? What are they making jokes about?"-MZ; "I do make fun of the situation, to make myself feel better about something. So, add something about why."-PO. This recognition by students of the dual nature of humor, and of the importance of context for its function within a coping paradigm, could help explain why the humor items were ultimately dropped from our factor structures. This aligns with trends in the CFAs indicating that humor does not form a consistent factor and that it may act in a more complex way that depends on both context and student background.
Similarly, rumination/self-blame loaded as a separate factor in the Coping Styles model, but not in the Coping Behaviors model. Based on student responses, there also appear to be similar concerns regarding the context and ambiguity of these items. Of "I criticize myself," students said: […]
Finally, these concerns around the exact context of certain items provide some insight into items that did not load onto either model. Both acceptance items, "I accept the reality of the fact that it happened" and "I learn to live with it," were dropped from both models. In addition to revisiting the issue of redundancy ("[Learn to live with it] is similar to accept the reality"-DQ), some students questioned the appropriateness of this type of item given the established STEM academic coping context:

"This could be difficult to use with the answer choices. If you're having a challenge or something, you'd want to figure out how to fix it. You wouldn't just live with it."-LF
This suggests that STEM undergraduate students may not view acceptance as a viable coping response when faced with challenges or failures in academic contexts.
Similarly, respondents raised many questions about the context of items on the emotion regulation and opposition-venting scales, both of which focus on emotions. About items such as "I say things to let my unpleasant feelings escape" and "I express my negative feelings," students wanted clarity about the types of emotions experienced ("[Could you] define negative feelings?"-MB), by and to whom emotions were expressed ("To yourself or expressed to other people"-PW), and for what purpose ("Some people could internalize to avoid upsetting others; others may vent and rant a lot."-CE). In particular, the emotion regulation items "I try not to let my feelings control me" and "I try to express my feelings about the situation in an appropriate way" prompted a number of concerns: "It's not always necessarily bad to let feelings control you, so this could be contextual."-CE; "What defines an 'appropriate' way? Is it talking to someone? Taking the next step? Learning to live with it? Being angry or sad?"-MB.
Only one emotion regulation item, "I am aware of my feelings about the situation," loaded onto a factor during our factor analyses (problem solving for Coping Behaviors and challenge-engaging for Coping Styles). Students did not comment on this item during cognitive interviews other than to indicate a general sense that it was a fine item. However, given their concerns with other emotion regulation items, it is possible that this item is preferred because it does not require the respondent to make any judgment about the feelings being processed or the way they are communicated; one simply needs to be aware of them.

S4 Brief discussion
Our results from the cognitive interviews support the structure of our survey. For the items retained in both versions of the instrument, interviewees expressed little to no confusion and offered few suggestions for revision. The one exception was that one student felt the support seeking items could be further clarified. Items that were dropped during EFA were generally seen as redundant, vague or unclear, or not relevant to the STEM context. Notably, some of these results, such as students' comments regarding the items we predicted would measure acceptance, indicate that STEM undergraduates may not view all coping strategies as relevant or appropriate in STEM contexts. Other results, such as the questions and ambiguity expressed in response to the emotion regulation items, suggest that some coping behaviors, particularly those relating to emotions, may be more complex in STEM environments. Together, the data from our cognitive interviews provided important insight into the reasons behind our model fit results.

Limitations
While our methods and procedures for evaluating the validity of the new STEM-COPE instrument versions are robust, they are not without limitations. A priori power analysis using GPower 3.0 (Erdfelder et al., 1996) indicated that a sample size of 500 would be ideal for our EFA and CFAs. While we exceeded this number for our initial EFA (N = 1250), we did not reach this threshold for the non-PEER (N = 363) or PEER (N = 280) samples used for CFA. In addition, our PEER sample was aggregated across 2 years of data collection to maximize power. To fully characterize these models of coping among PEER STEM students, future studies should recruit large, independent samples to replicate our findings. It is also possible that our sample size limited our ability to detect small yet meaningful differences (Little, 2013), which the educational community recognizes as important effects (Kraft, 2020). In addition, none of our student samples were randomly selected; participation was voluntary, following announcements of the research opportunity distributed by instructors. We anticipate that both self-selection of students into the study and recruitment specifically by FLAMEnet instructors (who typically value good pedagogy and may be more motivated to include evidence-based practice in their classrooms) may have slightly biased this study. However, our efforts to collect data across multiple disciplines and multiple institutions representing diverse contexts may have largely mitigated these biases.
Despite our efforts to recruit a highly diverse participant pool, we still encountered some limitations. While we were able to conduct a separate CFA for PEER students, we did not have sufficient racial and ethnic diversity in our sample to conduct separate CFAs for other demographic groups, and the sample was not completely representative of national trends (U.S. Department of Education, 2012). While we did not include Asian students in our PEER group based on NSF definitions, certain Asian groups are underserved and underrepresented, and our grouping does not take these subgroups into account (e.g., Vietnamese, Bengali, etc.). Since a goal of measurement should be to assess interventions that target underserved populations in STEM, it will be important for future work to take a more nuanced approach to measuring coping responses in specific underserved populations. Similarly, female-identifying students made up the majority of our sample. This is likely because we recruited heavily from biology classrooms, in which female-identifying students are typically in the majority. Finally, our sample contained mostly Biology and Chemistry students and did not represent as many students enrolled in classes from other disciplines (e.g., Physics, Geoscience, Computer Science, Psychology). Students in our sample also largely reported pursuing a STEM major (e.g., Biology, Chemistry, Engineering, etc.) as opposed to a non-STEM major. Thus, while our intent is for the instrument to be useful for students in STEM courses and contexts regardless of whether they are STEM majors, the results presented here are largely based on the perspectives of STEM majors and may be skewed toward their perspective. These sample characteristics should be considered when interpreting the utility of this instrument.
It is also important to note that, while our results demonstrate good model fit for the STEM-COPE in both the PEER and non-PEER samples, our conclusions are based on interpretation of model fit indices (e.g., RMSEA, CFI) rather than on measurement invariance testing. We did not test measurement invariance because our PEER sample was small (Meade, 2005), but it will be important for future work to establish whether STEM students respond to the STEM-COPE in psychometrically different ways based on PEER vs. non-PEER status. This work serves as a first step by demonstrating that the STEM-COPE shows good fit in both samples, with a focus on assessing the PEER STEM experience; future studies should extend this line of research with more robust and fine-grained comparisons.
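The fit indices mentioned above (RMSEA and CFI) can be computed from a model's chi-square statistic, its degrees of freedom, and the sample size. The sketch below uses the standard textbook formulas and is not the software or code used in this study; the function names and example numbers are hypothetical. Some SEM programs use N rather than N − 1 in the RMSEA denominator, so values may differ slightly across packages.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from a model chi-square.

    Uses the common (N - 1) denominator; some software uses N instead.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Comparative fit index relative to the independence (baseline) model."""
    # Noncentrality estimates, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    if d_base == 0.0:
        return 1.0  # both models fit perfectly; CFI is conventionally 1
    return 1.0 - d_model / d_base
```

For instance, a hypothetical model with chi-square = 150 on 80 degrees of freedom, fitted to a sample the size of the non-PEER CFA sample (N = 363), gives `rmsea(150, 80, 363)` of roughly 0.049.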
Finally, our use of previously written questions from other coping instruments poses a limitation, despite the benefit that these items have been tested before. Several items are double-barreled: they present respondents with two conditions in a single statement (e.g., "I get help AND advice from other people"), which can make responses ambiguous. Because this item language has been used in many other studies that use the Brief COPE and our cognitive interviews did not flag it as problematic, we did not change it. Nonetheless, we cannot determine whether all respondents perceive both conditions as true when responding. Future work could investigate this issue and refine these questions; however, doing so was beyond the scope of this study.

Discussion
Our analyses led to two versions of the STEM-COPE instrument: Coping Behaviors and Coping Styles. We present both so that those considering how to measure coping have options and can tailor their measurement more carefully and intentionally to their research questions. An overview of the final dimensions included in each version and our final definition of each dimension is given in Table 6. Overall, the two versions are similar. For example, both group items into factors according to what researchers might predict would be adaptive, maladaptive, and other (neither adaptive nor maladaptive) coping mechanisms within STEM contexts, although the grouping differs slightly. Both versions also drop similar constructs, with the exception of self-blame. This general consistency of constructs across the two versions suggests that several coping dimensions and their general grouping patterns are stable within STEM undergraduate contexts, which increases our confidence in the measurement of the coping dimensions included in the final versions of the STEM-COPE.
Scales that are not necessarily adaptive or maladaptive stay the same between the two versions of the instrument: cognitive restructuring and support seeking form distinct factors in both models. This is notable, since both factors are likely to have variable effects with respect to adaptivity. The support seeking factor, for example, consists of items that assess both help seeking (i.e., asking for help solving the problem) and comfort seeking (i.e., seeking understanding and emotional support). Although these two types of support seeking group together in one factor, prior literature describes them as having different effects. Comfort seeking is less likely to be effective for solving a problem or fully alleviating a stressor (e.g., Sagar et al., 2010; Skinner et al., 2003), but it is effective at helping people endure what they see as an unchangeable situation (Sagar et al., 2010). Yet these results are not consistent, as comfort seeking may sometimes enable other types of coping, such as cognitive restructuring and planning (e.g., Mortenson, 2006), which in turn enable problem solving. Likewise, asking for help in solving a problem can be double-edged. In many cases, seeking help to understand and solve a problem is viewed as adaptive. However, seeking help in order to have someone else do the work or solve the problem is maladaptive (Martín-Arbós et al., 2021). Therefore, while support seeking forms a distinct factor, it is a moving target: we cannot predict whether this behavior will consistently advance STEM goals or support well-being. More work could examine different forms of help seeking specifically, and an instrument that distinguishes between these forms could further elucidate the effects of this type of coping on student success and well-being.
Page 20 of 26 Henry et al. International Journal of STEM Education (2022) 9:17
Much like support seeking, cognitive restructuring is difficult to classify as contributing, or not, to adaptive STEM coping. Adaptive coping in STEM is specifically defined as coping that both supports well-being and supports advancement toward a STEM goal (Henry et al., 2019). While cognitive restructuring is likely to consistently support well-being, it is less clear whether it always supports advancement toward a STEM goal. For example, students sometimes view research experiences as "career clarifying experiences," in that these experiences clarify what students do not want to pursue as a career (Hunter et al., 2007). Arguably, this conclusion results from cognitive restructuring that moves from the idea that "that was a negative and unpleasant experience" or "that was a waste of time" toward "that allowed me to determine that this path is not for me." While this conclusion might support students' well-being, it moves them away from achievement in STEM contexts; thus, it does not fit the definition of adaptivity within STEM. Alternatively, cognitive restructuring might allow a student to see a failure experience in STEM as an opportunity to learn and to use it to inform how they approach future STEM challenges (Gin et al., 2018; Jordan et al., 2014). This approach would be considered adaptive in STEM because it supports both students' well-being and their advancement in STEM.
While the two scales described above did not differ between the Behaviors and Styles versions of the instrument, several other scales differed slightly but showed the same overall patterns. In the Behaviors version, escape and disengagement formed two distinct scales. This is in line with the work of Skinner and colleagues on coping families, which predicted distinct families called escape, helplessness, and distraction. In this work, helplessness combined with distraction, and we call this factor disengagement.
In the Styles version, these items and several additional items originally predicted to measure escape and helplessness group together to form a comprehensive factor that captures all of these behaviors. We term this factor challenge-avoiding (Henry et al., 2019), given that these behaviors involve the student actively distancing themselves from the problem or stressor and are predicted to be maladaptive. Just as the challenge-avoiding factor encompasses several maladaptive coping dimensions, the challenge-engaging factor in the Styles version is more comprehensive than the problem solving factor in the Behaviors version. Challenge-engaging includes items that measure regulating emotions, planning, and acting to resolve the problem; these items describe students' efforts to engage with the problem or stressor.
Both the challenge-engaging and challenge-avoiding factors combine items that we predicted would group into different coping families (Skinner et al., 2003); thus, they do not adhere to our original predictions. However, they do constitute groupings somewhat similar to higher-order groupings described in previous work. For example, Compas and colleagues (2001) described engagement coping as "responses that are oriented toward either the source of stress or toward one's emotions and thoughts" and disengagement coping as "responses that are oriented away from the stressor [or] one's emotions/thoughts" (p. 92). Likewise, Latack and Havlovic (1992) describe Proactive/Control vs. Escape/Avoidance coping as a "proactive take-charge approach" vs. "staying clear of the person or situation or trying not to get concerned about it" (p. 493). They see these dimensions as encompassing both problem-focused and emotion-focused coping behaviors (Latack & Havlovic, 1992). These constructs are similar to our broader factors, yet they differ in one critical way. In our EFA, coping behaviors oriented toward one's emotions are represented as separate factors (e.g., cognitive restructuring, support seeking). Our groupings more narrowly represent acting to engage directly with the challenge (planning and active coping), not engaging with emotions or thoughts. Approach and avoidance coping, defined by Roth and Cohen (1986) as coping that is oriented "toward or away from threat," does not completely align with our factor structure either, since other factors (e.g., support seeking) are also oriented toward the threat and yet form distinct factors in our models. Our finding that simpler two- or three-dimensional groupings of items did not encompass all forms of coping aligns with the work of Skinner and colleagues (2003), which found that such higher-order groupings cannot adequately describe all types of coping.
The two versions of the STEM-COPE also differed in how the self-blame items behaved. In the Coping Styles version, self-blame formed a clear, distinct factor. In the Coping Behaviors version, however, the self-blame items were dropped because they cross-loaded on problem solving and cognitive restructuring, with both items showing a negative relationship with cognitive restructuring. The lack of a consistent pattern for self-blame is not surprising, as prior research in STEM suggests that self-blame may have a complex relationship with STEM outcomes. Although self-blame is often described as a maladaptive coping mechanism, it has a complex relationship with attribution and control that may lead to some adaptive outcomes. At its most maladaptive, self-blame can lead to inaction and rumination (Legerstee et al., 2010) and increase stress (Straud & McNaughton-Cassill, 2019). Yet if individuals see their own behavior or strategy as the cause of the stressor or problem (as opposed to blaming their character, disposition, or other less controllable causes), self-blame may empower them to act (Shaver & Drown, 1986; Weiner, 1985). This is corroborated by our interviews, in which several students commented on the dual nature of self-criticism, stating that you can have "both positive and negative criticism." Thus, the complex nature of self-blame may explain its inconsistency as a factor and its cross-loadings in the Coping Behaviors version.
Finally, our analyses indicated poor model fit for items related to several dimensions we had predicted would be relevant to STEM students. Specifically, the items excluded from both versions of the instrument were those predicted to measure acceptance, distraction, emotional regulation, and opposition. Overall, we found evidence that acceptance may be a less relevant dimension for STEM students, that distraction may need to be measured with more specific items, and that emotional regulation and opposition may be relevant coping dimensions but more nuanced than the proposed items could capture. A more detailed discussion of evidence from our cognitive interviews and factor analyses is included in Additional file 1: Supplemental Discussion: Dropped Scales. This discussion may be especially useful for psychometricians seeking to better understand this instrument or to develop additional instruments that measure other dimensions of coping.

Implications for researchers and instructors
The work presented above provides a starting point for further investigations of undergraduate STEM students' coping behaviors across multiple STEM contexts. We present two versions of the STEM-COPE instrument (Behaviors and Styles) to allow users to reflect on which measurement model best suits their purposes and to tailor their use of the instrument to their context. We anticipate that the Styles version, with its broader factors describing challenge-engaging and challenge-avoiding coping, might be useful for examining trends in adaptive vs. maladaptive coping patterns. For example, administrators wishing to assess general patterns of coping across a major or a large STEM program may find this version useful. Alternatively, the Behaviors version, with its narrower, more specific scales, will be more useful for those who wish to differentiate between specific behaviors. For example, an instructor who needs to distinguish between students who are attempting to escape a problem and students who are giving up and disengaging from it might use this version to inform interventions tailored to help these students.
Our instrument can also be used to better understand differences in STEM students' coping responses across demographic groups, specifically between PEER and non-PEER students. PEER students remain underrepresented in STEM contexts and are at greater risk of leaving STEM altogether (Asai, 2020; NCSES, 2019). It is with this trend in mind that we specifically investigated the fit of our measurement model for PEER students in our CFAs. Since our analyses provide evidence that the instrument is likely to be valid for both non-PEER and PEER groups, the STEM-COPE can be used to compare coping responses across demographic groups and to tailor learning experiences and interventions to assist PEERs specifically.
For example, failing high-stakes exams in introductory or gateway courses is a significant stressor for STEM students (England et al., 2019; Hsu & Goldsmith, 2021) that has been shown to disproportionately affect underrepresented students (e.g., PEERs and women in STEM; Ballen et al., 2017; England et al., 2019). There is also evidence that PEERs leave introductory and gateway courses at higher rates than their majority counterparts (Riegle-Crumb et al., 2019). Thus, studies that compare coping responses of PEER and non-PEER students may help elucidate how coping strategies differ among groups and identify potential avenues of support for PEERs. For example, if PEERs tend to seek help less than other students, and there is evidence that help seeking improves performance on later exams, interventions might be designed to better facilitate PEERs' help-seeking behavior.
The STEM-COPE can also be used across STEM contexts to understand how coping might differ between disciplines, or for students in their "home" discipline vs. other STEM disciplines in which they may have less comfort, self-efficacy, or intrinsic motivation. Likewise, studies across STEM classrooms with different instructors and/or instructional styles could provide insight into how instructor practices influence coping behaviors and styles. For example, a mixed methods study that examined student coping responses in different course contexts, used coping responses to predict performance, and then interviewed students about how instructors did or did not facilitate different coping strategies could begin to uncover relationships between pedagogy, coping, and achievement. Mixed methods studies of coping across contexts could also shed light on how disciplinary cultures affect coping by asking students to elaborate on how cultural components contributed to their responses. Such work could similarly elucidate how students' own identities contribute to their coping patterns.
Because the STEM-COPE can be used as a pre-post measure, it is also well suited to testing changes in coping over time or the effects of interventions designed to change coping behaviors and styles. Coping responses to common challenges or failures (e.g., getting uninterpretable results from a technical research procedure) could be measured multiple times over the course of a student's academic career to understand whether and how coping patterns change. Pre-post studies that ask students to report what they have done in response to typical STEM challenges (e.g., doing worse than expected on a high-stakes exam), expose them to coping interventions, and then ask them to report a second time could inform how we help students develop adaptive coping skills. Describing specific coping outcomes of these efforts could contribute to the broad literature on evidence-based practices in STEM classrooms and could also help identify elements of the learning environment that influence students' coping in general.
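A pre-post design like the one described above is often summarized by each student's change in scale score. As a minimal sketch, assuming each student has a numeric scale score at both time points (for example, the mean of that scale's item responses), the mean change and a paired standardized effect size (Cohen's d_z, the mean difference divided by the standard deviation of the differences) could be computed as follows. The function name and the example scores are hypothetical, and a full analysis would typically add an inferential test and attention to attrition.

```python
from statistics import mean, stdev


def paired_change(pre: list[float], post: list[float]) -> tuple[float, float]:
    """Return (mean change, Cohen's d_z) for matched pre/post scale scores.

    Hypothetical helper: pre[i] and post[i] must belong to the same student.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    sd_diff = stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all changes are identical; d_z is undefined")
    mean_diff = mean(diffs)
    return mean_diff, mean_diff / sd_diff


if __name__ == "__main__":
    # Hypothetical scale scores for four students before and after an intervention.
    pre_scores = [2.0, 2.5, 3.0, 1.5]
    post_scores = [2.5, 3.0, 3.0, 2.5]
    change, d_z = paired_change(pre_scores, post_scores)
    print(f"mean change = {change:.2f}, d_z = {d_z:.2f}")
```

Using d_z rather than a pooled-SD effect size keeps the pairing explicit, which matches a design in which the same students respond before and after an intervention.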
Finally, studies that elucidate how STEM coping behaviors and styles relate to other variables of interest would help us better understand the mechanisms behind student success. For example, modeling studies (e.g., model fitting and selection) that include data from multiple contexts and demographic groups could begin to reveal consistent relationships between coping behaviors and student success. Such work could examine which coping behaviors lead to increases in student persistence in STEM and could help identify specific targets for intervention studies. Likewise, knowing which specific behaviors lead to maladaptive outcomes, such as lack of progress, depression, high levels of anxiety, and STEM departure, would be equally valuable, since interventions could be developed to help students avoid or overcome these maladaptive coping responses.
As the examples above indicate, we anticipate many possible uses for this instrument. Its versatility (two versions, each with multiple scales) enables instructors to choose the scales that suit their purposes. For example, an instructor who wished to see whether their actions increased help-seeking behavior during STEM challenges could administer only the support seeking scale rather than the full survey. Importantly, however, instructors, researchers, and administrators who use the instrument should carefully consider their purpose in using the measures and let this purpose inform how the instrument is introduced to study participants, as well as the timing and context of its administration. To facilitate development of an instrument that could be used broadly across stressful STEM contexts, we introduced the items with the prompt "How often do you do the following when dealing with challenges, struggles, or failures in your STEM course(s)?" This prompt may be very useful if researchers wish to obtain a broad view of STEM coping responses for a group of students in a program or major. However, different prompts may be appropriate for studies that investigate coping in more specific contexts. For example, if the study specifically addresses coping with exam disappointments, the prompt could be rephrased as "How often do you do the following when your exam score is lower than you expected in your STEM course(s)?" Alternatively, if the topic is coping with challenges during scientific research, the prompt could read "How often do you do the following when your research produces uninterpretable results?"
Given that these adjustments resemble those made by other researchers who study coping in different contexts (e.g., the same coping items, framed differently, have been used by biomedical researchers to ask patients about coping with cancer [Rand et al., 2019] and with genital herpes [Barnack-Tavlaris et al., 2011]), we believe such adjustments are unlikely to threaten the validity of the measure. Nonetheless, we recommend pilot testing any altered introductory prompt with members of the population under study. Beyond this, we urge users of this measure to consider timing and how the survey is introduced. Survey responses can be biased if students are asked to recall events too far in the past (e.g., Tarrant et al., 1993) or if they feel social pressure to respond in a socially desirable way (Krumpal, 2013). Thus, when possible, data should be collected promptly after students experience a challenge or failure and collected anonymously to limit social desirability bias.
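When a single scale is adopted on its own, users will typically compute a scale score for each student and check the scale's internal consistency in their own sample. The sketch below assumes that a scale score is the mean of its item responses and that responses are coded numerically (for example, on a 1 to 4 frequency scale); the scoring rule, response range, example data, and function names are illustrative assumptions, not published STEM-COPE scoring instructions. Cronbach's alpha uses the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import mean, pvariance

# Hypothetical data layout: one row per student, one column per item of a
# single selected scale; numeric coding (e.g., 1-4) is an assumption.
Responses = list[list[float]]


def scale_scores(responses: Responses) -> list[float]:
    """Score each respondent as the mean of their item responses (assumed rule)."""
    return [float(mean(row)) for row in responses]


def cronbach_alpha(responses: Responses) -> float:
    """Cronbach's alpha for the items in one scale."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("alpha needs at least two items and two respondents")
    # Variance of each item (column) across respondents.
    item_var_sum = sum(pvariance(col) for col in zip(*responses))
    # Variance of respondents' summed scores. Population variances are used
    # throughout; the n vs. n - 1 choice cancels in the ratio.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha undefined")
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Four hypothetical students answering a three-item scale.
    data = [[3, 4, 3], [2, 2, 1], [4, 4, 4], [1, 2, 2]]
    print(scale_scores(data))
    print(round(cronbach_alpha(data), 3))
```

Because alpha depends on the sample, a user who adopts a single scale or rewords the introductory prompt could report alpha for their own data rather than relying only on values from the validation samples.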
We hope that the above discussion of implications, the implementation tips, and our validation efforts, which span several disciplines and racial/ethnic identities, will allow this instrument to be taken "off the shelf" and used by instructors in a variety of STEM classes nationwide.

Conclusions
In conclusion, we recommend that educators and researchers interested in assessing students' coping behaviors and styles consider using one of the two versions of the STEM-COPE presented in this paper. The STEM-COPE is based primarily on the Brief COPE (Carver, 1997) and has a factor structure that measures several dimensions of coping relevant to STEM students. The instrument can be used in its entirety to assess multiple dimensions of coping, or specific scales can be used as more targeted measures of particular types of coping. The results of this study also highlight the need for more work to develop, test, and confirm additional measures with other relevant coping dimensions in mind. In particular, displays of emotion as coping mechanisms may be especially distinctive and complex within undergraduate STEM settings. Our ability to understand relationships between intrapersonal factors and students' success and well-being in STEM hinges on our ability to describe students' experiences and accurately measure these factors. This work highlights the need to carefully evaluate available measures of intrapersonal constructs for use with STEM populations and, when necessary, to refine them or create new measures that better reflect STEM students' experiences.