A Comparison of Frequency- and Agreement-Based Response Formats in the Measurement of Burnout and Engagement

The present research compares frequency and agreement response formats, two approaches to measuring job burnout and work engagement. We provide construct-based and measurement-based arguments for the superiority of the frequency response format in measuring burnout/engagement, which imply that frequency-based measurements will explain relatively more variance in outcome variables. Fair comparison, counterbalanced time order, and multiple measurement waves justify the comparison and reduce the common method errors of self-report measures. Sample 1 (N = 242) was composed of employees from multiple organizations, while the participants in Sample 2 (N = 281) were employees from one company. Relative importance analysis showed that the frequency response format outperformed the agreement response format in measuring burnout and engagement in both samples. These findings suggest that the frequency response format is a more valuable method of detecting the dynamic nature of burnout/engagement, which offers methodological guidance for future research involving dynamic constructs and can lead to improvements in the measurement of the dynamic experiences of burnout and engagement. This is one of the first studies to provide evidence on whether the dynamic nature of the constructs has any bearing on the choice of response format.


Introduction
Job burnout and work engagement are both intense personal experiences and research topics in occupational, organizational, and health psychology, affecting a variety of populations such as workers [1], social workers [2], and students [3]. Both variables consist of complex, dynamic states, and our current approach to measurement may not adequately adapt to the dynamic components of burnout symptoms and engagement experiences. Thus, we propose a simple alternative to measuring burnout and engagement: Frequency-based response scales. In the following studies, we test and provide evidence suggesting that frequency-based response scales (e.g., those designed to indicate how often a target behavior or symptom occurs) rather than agreement-based response scales (e.g., those designed to assess how intensely the respondent agrees that a given symptom or experience has occurred) are better-suited for assessing the dynamic elements of burnout and engagement. Consequently, we can achieve a richer understanding of how burnout and engagement are experienced, and how they relate to variables of interest, by improving our approach to measurement.

Measuring Dynamic Constructs
Variables such as affect and behavior are usually measured by a unipolar scale of frequency, ranging from "Never" to "Always" in the extant literature (e.g., UWES [4]), while trait variables such as beliefs, values, and personality are usually measured by a bipolar scale of agreement, ranging from "strongly negative" to "strongly positive" (e.g., Big Five [5]). Agreement scales lend themselves to trait measurement because of the relatively high stability of traits. States, however, can be momentary and variable [6]. Fluctuations can occur from day to day or even moment to moment (e.g., [7,8]).
Burnout was first defined as a stable syndrome [9], yet recent research has shown that both burnout and engagement demonstrate dynamic qualities (e.g., [7,8]). For example, some scholars distinguished a dynamic component (i.e., a task-level view of engagement) from the general construct (i.e., a job-level view) and observed that task-level engagement can "spill over" to subsequent tasks within a job [10]. In addition, the dialectical perspective on burnout and engagement argues that burnout and engagement can occur simultaneously and independently within-person as separate, dynamic states [11]. As such, appropriate measurement tools are needed to assess the fluctuating nature of burnout/engagement. Either the agreement response format (i.e., a bipolar scale, ranging from "strongly disagree" to "strongly agree") or the frequency response format (i.e., a unipolar scale, ranging from "never" to "daily/always") may be employed in measuring dynamic psychological states (e.g., [12]). It is unclear, however, which of the two formats is the superior approach to capturing dynamic psychological states. Because different response formats may lead to different scores for the same item, it is critical for researchers to identify the response format best suited to their instruments and corresponding research questions.
Past research has compared rating scale performance in educational (e.g., [13]) and health-function testing (e.g., [12]). But all previous research has been cross-sectional and focused on the psychometrics of item measurement, based on item response theory, rather than comparing the validity of different rating responses. No research yet focuses on the response format differences of burnout and engagement. Moreover, past research has not considered whether the nature of the constructs (e.g., dynamic states) would have any bearing on whether the frequency or agreement response format better fits the construct. Identifying the best-fit set of response labels and item design should facilitate greater accuracy in comparisons within and between individuals and across studies. Response options that encourage inference and estimation strategies may interfere with such comparisons and encourage judgments that do not accurately represent the respondent's daily life [14]. Thus, we compare the performance of agreement response scales and frequency response scales (vague or precise) in the ensuing studies to provide data in support of identifying the response format best-suited for burnout and engagement research.

Frequency vs. Agreement Scales
Brown [13] notes that participants respond to both frequency and agreement response formats by recalling (ideally) relevant information from memory. With dynamic or fluctuating phenomena, respondents may need to average the fluctuating levels of the construct of interest, and the respondents' averaging strategy differs between frequency and agreement formats. Once the respondents have successfully calculated an average, they must then identify the appropriate, fitting response from the options available. Agreement formats (e.g., slightly agree, agree, strongly agree) can be vague and highly subjective, leading participants to rely on varying strategies to calculate fluctuations in the relevant phenomena [13]. For example, respondents may rely on reporting the number of times a particular phenomenon occurred, or they may rely on the degree of intensity in which they experienced the phenomenon. In other words, a respondent may select "agree" to indicate that they "feel somewhat drained from work" several times per week, while another respondent may also select "agree" to indicate they "feel somewhat drained from work" intensely, yet infrequently or even only once per week. Naturally, these differing response strategies create challenges in interpreting measurements when using agreement scales.
In fact, the response labels may more strongly influence the respondents' calculations to the extent that instances are more difficult to recall, for example, because of a long recall period [14]. Put differently, respondents are more likely to use the response labels as a source of information to determine the appropriate response to the item (instead of the actual frequency of the behavior targeted by the item) when recall of such instances may be difficult. In summary, agreement response scales may be vague and promote simple averaging and estimation strategies over specific calculations so that respondents can more easily and efficiently respond to survey items [14][15][16]. These concerns suggest that frequency responses may be better suited to measuring the dynamic elements of burnout and engagement than agreement responses. Scholars also suggest that frequency-type ratings can adequately reflect respondents' general feelings with respect to burnout and engagement [17]. Thus, we propose the following hypotheses:

Hypothesis 1.
Frequency-based measurements of burnout/engagement will explain significant, incremental variance beyond agreement-based measures of burnout/engagement in the selected outcome variables.

Hypothesis 2.
Frequency-based measurements of burnout/engagement will account for a significantly greater proportion of the variance (i.e., relative importance) than agreement-based measures of burnout/engagement in the selected outcome variables.

Analytical Strategy for Gauging Relative Importance
Traditional measures of relative importance (e.g., simple correlations, r, or squared standardized regression weights, β²) may fail to adequately characterize the relative contribution/importance of predictors when the variables are highly inter-correlated (as with the two groups of predictors in this case [18]). When comparing the relative contributions of two measures, the practical questions are: (1) Does either measure have unique variance in the criterion variable above and beyond that of the other measure in the regression model? (2) What is the contribution of each measure in the presence of the other? Thus, we examine both incremental validity and relative importance in the context of our research foci [19].
On its own, analyzing incremental validity attributes any shared criterion-related validity to the earlier-step measure and none to the later-step measure [19]. Relative importance analysis can complement incremental validity analysis because relative importance analysis provides estimates of importance scaled in the metric of relative effect sizes (i.e., proportion of predictable criterion variance attributed to each predictor/measure) without relying on sample-size-dependent significance tests. Consequently, including information both about a predictor's incremental importance and relative importance will permit people to evaluate the statistical quality of the predictors and to illustrate the overall contribution of the predictor of interest in a more balanced manner [19].
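The order dependence of incremental validity can be illustrated with a minimal sketch for the two-predictor case. The correlations below are hypothetical values chosen for illustration, not results from this study, and the function names are our own. The sketch uses the standard closed-form R² for a regression on two standardized predictors.

```python
def r_squared_two(r_x1y, r_x2y, r_x1x2):
    """R^2 of a regression of y on two standardized predictors."""
    numerator = r_x1y ** 2 + r_x2y ** 2 - 2 * r_x1y * r_x2y * r_x1x2
    return numerator / (1 - r_x1x2 ** 2)


def incremental_r2(r_first, r_second, r_between):
    """Delta R^2 gained by entering the second predictor after the first.

    Variance shared by both predictors is credited to the first step,
    so the result depends on the order of entry.
    """
    return r_squared_two(r_first, r_second, r_between) - r_first ** 2


# Hypothetical correlations with a criterion (illustration only).
r_agree_y, r_freq_y, r_agree_freq = 0.20, 0.30, 0.60

# Frequency entered after agreement, and agreement entered after frequency.
delta_freq = incremental_r2(r_agree_y, r_freq_y, r_agree_freq)
delta_agree = incremental_r2(r_freq_y, r_agree_y, r_agree_freq)
print(f"Delta R^2, frequency beyond agreement: {delta_freq:.4f}")
print(f"Delta R^2, agreement beyond frequency: {delta_agree:.4f}")
```

In this illustration each increment equals the squared semipartial correlation of the later-entered predictor, which is why neither increment reflects the variance the two formats share.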
Relative weights rather than general dominance weights will be used to calculate relative importance in this study, because the former offers an advantage when a large number of predictors are involved (e.g., more than 10 predictors in the regression model [19]).
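As a rough illustration of how relative weights partition the model R², the sketch below works through the special case of exactly two correlated predictors, where the symmetric square root of the predictor correlation matrix has a closed form. This is a simplification for exposition: the analyses reported here involve more predictors and would require a general eigendecomposition. The function name and the input correlations are hypothetical.

```python
import math


def relative_weights_two(r_x1y, r_x2y, r_x1x2):
    """Relative weights for exactly two correlated, standardized predictors.

    The predictors are replaced by their closest orthogonal counterparts;
    lam_diag and lam_off are the entries of the symmetric square root of
    the 2x2 predictor correlation matrix [[1, r], [r, 1]].
    """
    if abs(r_x1x2) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    s_plus = math.sqrt(1 + r_x1x2)
    s_minus = math.sqrt(1 - r_x1x2)
    lam_diag = (s_plus + s_minus) / 2
    lam_off = (s_plus - s_minus) / 2
    det = s_plus * s_minus  # determinant of the square-root matrix

    # Regression weights of y on the orthogonal counterparts.
    beta1 = (lam_diag * r_x1y - lam_off * r_x2y) / det
    beta2 = (lam_diag * r_x2y - lam_off * r_x1y) / det

    # Raw relative weights; they sum to the model R^2.
    rw1 = lam_diag ** 2 * beta1 ** 2 + lam_off ** 2 * beta2 ** 2
    rw2 = lam_off ** 2 * beta1 ** 2 + lam_diag ** 2 * beta2 ** 2
    r2 = rw1 + rw2
    rescaled = (rw1 / r2, rw2 / r2) if r2 > 0 else (0.0, 0.0)
    return {"rw": (rw1, rw2), "rescaled": rescaled, "r2": r2}


# Hypothetical correlations: x1 = frequency score, x2 = agreement score.
result = relative_weights_two(0.30, 0.20, 0.60)
print(result)
```

Unlike the incremental approach, the two weights do not depend on the order of entry, and the rescaled values express each predictor's share of the predictable criterion variance.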

Participants
We sampled employees from organizations in the service industries (i.e., government or government financed institutions, schools) or in business companies (i.e., cross-national companies). For the following study, we recruited two samples. For Sample 1, we requested contact persons to recruit participants randomly chosen from their respective organizations. For Sample 2, we requested our contact person to recruit participants from stores randomly chosen from their organization.
Sample 1 was composed of 242 employees from different organizations in China. Participants ranged from 20 to 53 years old, with a mean age of 32.1 (SD = 5.9) years. In this sample, 33% of participants were male. All participants were general employees (without any management titles). Participants' tenure ranged from 0.1 to 33 years, with a mean of 9.5 (SD = 6.6) years.
Sample 2 was composed of 281 employees from seven stores of a cross-regional company in China. Participants ranged from 18 to 44 years old, with a mean age of 25.35 (SD = 4.75) years. A total of 145 participants (52%) were male; 180 participants (64%) had less than an undergraduate education. Participants' tenure ranged from 1 to 348 months, with a mean of 49.12 (SD = 51.26) months.

Procedure
This research compared the predictive and explanatory power of agreement response formats, vague frequency response formats, and precise frequency response formats of burnout/engagement measures with respect to empirical criteria from both the work and family domains (e.g., job satisfaction, sleep quality). Sample 1 participants were requested to rate their burnout/engagement in "the last two weeks", while Sample 2 participants were asked to consider "the past in general". Averaging the dynamic states of burnout/engagement is not equally difficult in these two cases. Each sample provided data from three time points (separated by two-week intervals).
In both samples, we used response formats with identical score ranges (distributional equivalence) across the exact same items for the same factor structure and same constructs (procedural equivalence [20]). In addition, we counterbalanced the presentation of each response format. We asked each participant to rate their respective items using the frequency format at one time and the agreement format at another time, to avoid any confusion that might arise by asking participants to provide two sets of ratings on the same items simultaneously. Ideally, our counterbalanced design should minimize any common method variance (e.g., [21]) or spillover effects.
Participants were randomly assigned to two groups (n₁ = 122, n₂ = 120 in Sample 1 and n₁ = 145, n₂ = 136 in Sample 2) for counterbalancing. Results showed that the two groups did not differ significantly on any of the burnout or engagement dimensions in either sample (all ps > 0.05). Participants in Group 1 used the unipolar frequency response at Time 1 and the bipolar agreement response at Time 2 (two weeks later), while participants in Group 2 used the bipolar agreement response at Time 1 and the unipolar frequency response at Time 2 to rate burnout/engagement. Another two weeks later, at Time 3, all participants completed scales on outcome variables (e.g., CWB, OCB). The specific measures were administered as indicated in Table 1.
Note to Table 1. Other abbreviations are defined in the text. "f" in the brackets indicates frequency response, while "a" indicates agreement response. Participants were requested to rate items "in the last two weeks" in Sample 1 and "in the past in general" in Sample 2.
Participants completed their surveys anonymously online, using randomly generated substitute codes to link each participant's responses across waves. In Sample 1, our contacts sent out a total of 309 questionnaires, and 242 (Group 1 = 122 and Group 2 = 120) completed and valid questionnaires were returned (response rate of 78% for Time 1); 242 were sent out and 219 (Group 1 = 108 and Group 2 = 111) were returned for Time 2 (91% response rate), and 219 were sent out with 209 (Group 1 = 105 and Group 2 = 104) returned for Time 3 (95% response rate). In Sample 2, the response rates were 81% (349 sent out with 281 valid returned, Group 1 = 145 and Group 2 = 136), 77% (281 sent out with 216 returned, Group 1 = 122 and Group 2 = 94), and 87% (216 sent out with 188 returned, Group 1 = 98 and Group 2 = 90) for the respective survey waves. No cases were discarded due to missing data.
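The counterbalanced two-group design described in this section can be sketched as follows. This is an illustrative sketch only: the function name, the fixed seed, and the even split are assumptions, and the groups in the actual samples were not exactly equal in size.

```python
import random


def assign_counterbalanced(participant_ids, seed=0):
    """Randomly split participants into two groups with reversed format order.

    Group 1 rates burnout/engagement with the frequency format at Time 1
    and the agreement format at Time 2; Group 2 does the reverse.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = (len(ids) + 1) // 2
    schedule = {}
    for position, pid in enumerate(ids):
        if position < half:
            schedule[pid] = {"group": 1, "time1": "frequency", "time2": "agreement"}
        else:
            schedule[pid] = {"group": 2, "time1": "agreement", "time2": "frequency"}
    return schedule


# 242 hypothetical participant codes, matching the Sample 1 size at Time 1.
schedule = assign_counterbalanced(range(1, 243))
group_sizes = {
    g: sum(1 for entry in schedule.values() if entry["group"] == g) for g in (1, 2)
}
print(group_sizes)
```

Because every participant completes both formats in one of the two orders, any order effect is spread evenly across the frequency and agreement ratings.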

Measures
Job Burnout Survey I (unipolar frequency response format). We selected the Maslach Burnout Inventory-General Survey (MBI-GS [22]) to measure two core factors of job burnout: exhaustion (α = 0.91 and 0.88 in Samples 1 and 2) and cynicism (α = 0.86 and 0.87 in Samples 1 and 2). We omitted the inefficacy dimension because the research is unclear regarding its overall fit with the burnout construct (e.g., [23,24]) when measured by the MBI-GS [22] and the Utrecht Work Engagement Scale (UWES [4]). We asked participants to rate the frequency of the listed feelings and behaviors on a frequency scale with relatively vague quantifiers (1 = never, 2 = almost never, 3 = rarely, 4 = moderately/sometimes, 5 = frequently, 6 = very frequently, 7 = always/every day) in Sample 1 and with more precise quantifiers (1 = never, 2 = a few times a year or less, 3 = once a month or less, 4 = a few times a month, 5 = once a week, 6 = a few times a week, 7 = every day) in Sample 2. Example items include "I feel drained from my work" and "I doubt the significance of my work".
Work Engagement Survey I (unipolar frequency response format). We assessed work engagement using the Utrecht Work Engagement Scale (UWES [4]). We selected the core dimensions of vigor (α = 0.85 and 0.91 in Sample 1 and 2) and dedication (α = 0.89 and 0.88 in respective samples), because they correspond to the dimensions assessed with the selected burnout measure (e.g., [23,24]). We repeated the instructions of the MBI-GS for the participants as they completed the UWES. Example items include "At my job, I feel strong and vigorous" and "I find the work that I do full of meaning and purpose".
Job Burnout Survey II (unipolar frequency response format). We chose the Shirom-Melamed Burnout Measure (SMBM [25]) to measure job burnout in Sample 2. The SMBM measured the following dimensions: Physical Fatigue (α = 0.94), Cognitive Weariness (α = 0.95), and Emotional Exhaustion (α = 0.91). We asked participants to follow the same instructions as the MBI-GS for Sample 2. Example items include "I feel physically drained", "My thinking process is slow", and "I feel I am unable to be sensitive to the needs of coworkers and customers".
Work Engagement Survey II (unipolar frequency response format). We used the Shirom-Melamed Vigor Measure (SMVM [26]) to measure work engagement in Sample 2. The SMVM included the following dimensions: Physical Strength (α = 0.96), Cognitive Liveliness (α = 0.92), and Emotional Energy (α = 0.92). Again, participants followed the same instructions as the MBI-GS for Sample 2. Example items include "Feeling vigorous", "I feel I can think rapidly", and "I feel able to be sensitive to the needs of coworkers and customers".
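The reliability coefficients (α) reported for each scale are Cronbach's alpha values. As a reminder of how such a coefficient is obtained from item-level ratings, a minimal sketch follows. The ratings are hypothetical 7-point responses, not data from this study, and the function name is our own.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    responses: list of rows, one row per respondent, one score per item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from five respondents on a three-item scale.
ratings = [
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
    [6, 7, 6],
    [7, 6, 7],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```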

Control and Criterion Variables
We controlled for demographic variables (e.g., age, gender, tenure, education level, which were found to be potentially related to burnout/engagement [9]) in both samples. In Sample 2, we additionally controlled for body mass index (BMI) because BMI relates to health functioning and sleep quality/disorder (e.g., [27]).
Our criterion variables include behaviors, attitudes, and results, covering both work- and family-domain outcomes. Research has previously established a relationship between our chosen criterion variables and burnout/engagement (e.g., job satisfaction and organizational commitment [28][29][30]; contextual performance/organizational citizenship behavior [29,31]; incivility/counterproductive work behavior [32]; sleep quality/disorder and work-family conflict [33]). Because work-family positive spillover is the opposite of work-family conflict, and it is related to mental health [34], it should also be related to burnout/engagement. Participants were requested to rate CWB, OCB, Sleep Quality/Disorder, Work-family Positive Spillover, and WFC items on a frequency scale (following the same quantifiers as the MBI-GS in the respective samples) and to indicate their General Job Satisfaction and Organizational Affective Commitment on an agreement scale (1 = strongly dissatisfied/disagree to 7 = strongly satisfied/agree).
Counterproductive Work Behavior. We measured counterproductive work behavior (CWB) using Fox and Spector's [35] CWB measure. Example items include "Tried to look busy while doing nothing" (from Organizational Deviance, α = 0.61 and 0.81 in Sample 1 and 2) and "Insulted someone about their job performance" (from Interpersonal Deviance, α = 0.71 and 0.96 in Sample 1 and 2).
Organizational Citizenship Behavior. We adapted an 8-item measure of organizational citizenship behavior (OCB, α = 0.92 and 0.83 in Sample 1 and 2) from the organizational citizenship behavior scales of Smith, Organ, and Near [36] (i.e., "Helps other employees with their work when they have been absent", "Makes innovative suggestions to improve the overall quality of the department", "Assists the supervisor with his/her duties"), from Podsakoff, Ahearne, and MacKenzie [37] (i.e., "Willingly share their expertise with colleagues"; "Willingly give of their time to help colleagues who have work-related problems"; "Encourage colleagues and give them positive feedback"; "Pay attention when colleagues describe work-related problems"), and from Farh, Zhong, and Organ [38] (i.e., "Willing to coordinate and communicate with colleagues").
General Job Satisfaction. We used Evers, Frese, and Cooper's [39] 10-item scale (α = 0.89 and 0.82 in Sample 1 and 2) to measure general job satisfaction. An example item is "The style of supervision."
Organizational Affective Commitment. We assessed organizational affective commitment (α = 0.87 and 0.92 in Sample 1 and 2) with Chen and Francesco's [40] six-item measure. An example item is "I really feel as if this organization's problems are my own."
Sleep. We followed the recommendations of the Diagnostic and Statistical Manual of Mental Disorders [41] to measure sleep quality (α = 0.67 in Sample 1). We adapted items from the Pittsburgh Sleep Quality Index [42]. The items were "I easily go to sleep at night", "I wake up naturally in the morning", and "I have good sleep quality at night". We measured symptoms of sleep disorder (α = 0.93 in Sample 2) with seven items from the Karolinska Sleep Questionnaire [43]. Sample items include "difficulties falling asleep" and "not well-rested on awakening."
Work-family Positive Spillover. We measured work-family positive spillover (α = 0.96 in Sample 2) with six items developed by Hanson, Hammer, and Colton [34]. Sample items include "Abilities developed at work help me in my family life."
Work-family Conflict. We measured work-family conflict (WFC, α = 0.87 in Sample 2) with five items developed by Carlson, Kacmar and Williams [44]. Sample items include "My work keeps me from my family activities more than I would like."

Preparing for Hypothesized Model Analysis
Means, standard deviations, and correlations are provided in Tables 2 and 3 for the respective samples. Relative weights and incremental importance statistics are listed in Tables 4 and 5, providing comparisons with cluster variables listed as "total-agreement" and "total-frequency". The integrated tables with detailed variable-level statistics on multiple regressions (e.g., r and β) and relative importance (e.g., rescaled estimates, individual incremental importance) can be provided on request. We omitted comparisons for those models in which the squared semi-partial correlation (i.e., ΔR²) was nonsignificant when the burnout/engagement variables were included (e.g., the models predicting Interpersonal Deviance in Sample 1 and predicting General Job Satisfaction in Sample 2). For the other significant models, usefulness analysis showed that after controlling for demographic variables (i.e., age, gender, tenure) in Sample 1, frequency format burnout/engagement variables explained unique variance beyond agreement format predictors in four out of five significant models (i.e., Organizational Deviance, OCB, General Job Satisfaction, Organizational Affective Commitment), whereas with the agreement format the incremental importance was only significant in two out of five models (i.e., General Job Satisfaction, Organizational Affective Commitment).
After controlling for demographic variables (i.e., age, gender, tenure, education level) and BMI in Sample 2, frequency predictors explained more than agreement variables in all eight relationships for Survey I and all five relationships for significant models measuring with Survey II (i.e., Organizational and Interpersonal Deviance, OCB, Organizational Affective Commitment, Sleep Disorder), whereas agreement variables explained beyond frequency predictors in none of the eight relationships for Survey I and only one out of five significant relationships for Survey II (i.e., Sleep Disorder). Thus, burnout/engagement measured with frequency response explains unique variance in related outcomes beyond agreement response (especially in Sample 2).
Note to Table 2. "f" indicates frequency response, while "a" indicates agreement response. Ns are shown on the diagonal. * p < 0.05. ** p < 0.01.
Note to Table 3. Same as those in Table 2. * p < 0.05. ** p < 0.01.
Note to Table 4. Rescaled importance estimates were calculated by dividing the relative weights (RWⱼ) by model R². Because of rounding error, the values for RWⱼ may not sum to the model R² and the values for rescaled estimates may not sum to unity [19]. The comparison measurement variables include vigor, dedication, exhaustion, and cynicism, with both frequency-based and agreement-based measurements. † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
Note to Table 5. Same as those in Table 4. † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.

Relative Importance Analysis
Relative importance analysis for the five significant models in Sample 1 showed that the frequency format of burnout/engagement measures explained greater variance in organizational deviance (R² = 0.06, 80%), OCB (R² = 0.10, 78%), and sleep quality (R² = 0.05, 58%) than the agreement format. Compared to agreement responses, frequency responses explained similar amounts of variance in general job satisfaction (R² = 0.15, 53%) and organizational affective commitment (R² = 0.17, 52%). Although the agreement format explained more variance in interpersonal deviance, the total explained variance was only marginally significant. Results of this model were therefore omitted from the measurement comparison.
In Sample 2, relative importance analysis showed that frequency-based measurements of burnout/engagement outperformed their agreement counterparts in all eight criteria for Survey I (R² ranging from 0.05 to 0.12), with the importance percentage ranging from 76% to 89% (shown in Table 4). Frequency responses outperformed agreement responses in seven out of eight criteria for Survey II (R² ranging from 0.05 to 0.18), with the importance percentage ranging from 60% to 78% (except for Sleep Disorder: when predicting Sleep Disorder, both formats explained similar amounts of variance, see Table 5).
The above results showed that burnout/engagement measured with frequency response did explain more variance or at least as much variance in related outcomes when compared with agreement response, including both unique (incremental variance) and combined variance (relative importance variance) in the presence of agreement response predictors.
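The importance percentages reported above compare the two formats as clusters of predictors. One way such cluster shares could be obtained is by summing the relative weights of the predictors within each format and dividing by the total, as sketched below. The weights are hypothetical values chosen for illustration (they are not taken from Tables 4 or 5), the sketch ignores any variance attributed to control variables, and the function and variable names are our own.

```python
def cluster_shares(relative_weights, clusters):
    """Sum relative weights within each cluster and express them as percentages."""
    model_r2 = sum(relative_weights.values())
    shares = {}
    for name, members in clusters.items():
        total = sum(relative_weights[m] for m in members)
        shares[name] = {"rw_total": total, "percent": 100 * total / model_r2}
    return shares


# Hypothetical relative weights for one criterion ("_f" = frequency, "_a" = agreement).
weights = {
    "vigor_f": 0.030, "dedication_f": 0.020, "exhaustion_f": 0.015, "cynicism_f": 0.015,
    "vigor_a": 0.008, "dedication_a": 0.006, "exhaustion_a": 0.003, "cynicism_a": 0.003,
}
clusters = {
    "total-frequency": [name for name in weights if name.endswith("_f")],
    "total-agreement": [name for name in weights if name.endswith("_a")],
}
shares = cluster_shares(weights, clusters)
for name, share in shares.items():
    print(f"{name}: RW = {share['rw_total']:.3f}, {share['percent']:.0f}% of R^2")
```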

Main Findings and Implications
Previous research used either the agreement or the frequency response format in burnout/engagement research. This research compares frequency and agreement response formats in measuring burnout and engagement, covering both organization-domain (e.g., OCB, CWB) and family-domain criteria (e.g., WFC). Through incremental and relative importance analysis [19], we observed that frequency-based measurements of burnout/engagement outperformed their agreement counterparts in predicting most criteria (with significant models) and were at least as valuable in predicting the other criteria (e.g., General Job Satisfaction and Organizational Affective Commitment in Sample 1 and Sleep Disorder in Sample 2). Hypotheses 1 and 2 were therefore largely supported. According to the extant research, participants responding to agreement response formats likely rely on varying strategies to calculate fluctuations in the relevant phenomena [13], especially over a long recall period [14], which leads to less variance captured in the outcome.
However, we observed that frequency-based measurements had similar predictive power to the agreement-based measurements for some of our criteria. Those criteria mostly consisted of general attitudes and biological responses, both of which may be influenced by dynamic, momentary changes as well as longer-term, underlying conditions. In accordance with previous research [6][7][8], we suggest that frequency-based measurements are better suited to explaining the variance of dynamic states (e.g., OCB, CWB, WFC) than to explaining the variance of general attitudes and long-term conditions. Our findings are similar to previous literature observing that domain-specific measures are not more important than general measures in predicting general job satisfaction and organizational affective commitment [45].
In our study, frequency-based measurements of burnout/engagement outperformed their agreement counterparts in predicting criterion variables rated with both frequency responses (e.g., OCB) and agreement responses (e.g., Organizational Affective Commitment in Sample 2). In addition, the precise indicators used in Sample 2 (e.g., "once a week") appeared to outperform the vague indicators in Sample 1 (e.g., "sometimes", "frequently"), suggesting that precise frequency indicators may be a better option to measure burnout and engagement and to explain variance in criterion variables. These findings are consistent with past research (e.g., [13,46]) and our own predictions.
This study advances research in rating scale performance comparison (e.g., [7,8]). To date, the literature regarding scale comparisons has emphasized understanding the underlying processes that determine how individuals respond to an item based on its scale. Our paper builds on that literature to demonstrate the practical differences between frequency- and agreement-based response scales while respecting both distributional and procedural equivalence [20]. Specifically, this research adopted a fair comparison method to draw comparisons between frequency- and agreement-based measurements of burnout/engagement, counterbalanced the measurement orders of the compared scales to reduce time frame variation, separated the measures into different waves to reduce possible common method errors, and measured both burnout/engagement and extended outcomes to indicate the effectiveness of the measurements. We further advance the research by identifying a condition in which we expect to observe the greatest performance difference between the response types: when our measurement needs require us to assess burnout and engagement symptoms and experiences over time (e.g., during the past two weeks for a respondent).
Practically, this work will help people find an appropriate and reliable way to measure the dynamic elements of burnout/engagement and to detect individual psychological health functioning. That means we can expand the scope of our burnout and engagement assessments from the mere presence of symptoms to clarify the frequency, duration, and change in symptoms. These data can enable practitioners to identify precipitating burnout events and the effectiveness (both immediately and potentially long-term) of interventions. It may help us distinguish the contribution of severity versus frequency or duration of symptoms in the overall experience of burnout and engagement. In other words, our paper provides insights to burnout and engagement measurement that can inform assessment and intervention design.

Limitations and Future Research
Burnout and engagement exhibit both dynamic and fluctuating experiences and symptoms, making the measurement of such variables easily confounded by time. Sample 1 participants completed their measures in a specific time frame reference (i.e., the past two weeks), and Sample 2 participants were asked to complete their measures with respect to the past in general. In our study, frequency-based scales outperformed agreement scales by a greater margin in Sample 2 than in Sample 1. We cannot parse out the potentially confounding effects of time frame with our current design, though we argue that frequency scales may be better suited to measuring dynamic variables like engagement and burnout, especially as the time frame for participant recall increases. Respondents may more heavily rely on averaging strategies when responding with agreement scales, thereby confounding measurement (e.g., infrequent, severe symptoms may be scored the same as frequent, mild symptoms by a given respondent). The concept of time (such as rating window and duration of recall) should be taken into consideration in the new approach of burnout and engagement research [17].
Future research measuring burnout/engagement needs to take response formats into consideration because of the dynamic nature of the constructs and the cognitive processes undertaken by respondents. Specifically, future research should address the effects of time frame and overall length of recall (e.g., comparing specific time frames to the past in general).

Conclusions
The present research offers insight into the performance of response formats of measures of burnout/engagement at work. Our data suggest that the format we choose for response options can affect the inferences we draw regarding our variables of interest. With respect to burnout and engagement, we conclude that frequency-based response scales are better suited for measurement of the dynamic aspects of these constructs and explaining the variance associated with their covariates, which opens new doors for burnout and engagement research.
Author Contributions: J.T. contributed to the whole process for this research including funding acquisition, project administration, conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation and all stages of writing. R.M.B. contributed to writing-original draft preparation and review and editing. S.G.R. contributed to writing-review and editing and visualization. All authors have read and agreed to the published version of the manuscript.