Capturing Flow Experiences in Everyday Life: A Comparison of Recall and Momentary Measurement

In a real-life study using time-based ambulatory assessment, we investigated how to unobtrusively capture within-subject and between-subject variations in flow in everyday life. We compared two observation approaches, momentary states and coverage, which differed in the reference period of flow reports and in sampling frequency. Depending on the condition, participants (N = 38) answered either ten or five queries per day that referred to the current state or to the last two hours (n = 1442 observations in total). We found no effect of the observation approach on compliance, flow intensity, or flow reports over time. However, the approaches differed with respect to flow probability reports, within-subject variability in flow reports, and perceived burden. In addition, we introduced a reduced three-item version of the Flow Short Scale. Our results indicate acceptable to good reliability as well as concurrent, convergent, and discriminant validity of this scale. Based on our findings, we recommend choosing the observation approach for capturing everyday flow according to the outcome of interest, the targeted comparison (within- or between-subjects), and the expected task variability of the participants. Limitations regarding our sampling procedure and the retrospective assessment of flow experiences are discussed.


Introduction
Flow, the experience of complete absorption and fluency in the current activity (Engeser & Rheinberg, 2008), arises when skills and demands are perfectly balanced (Csikszentmihalyi, 1975; Moneta, 2021; Peifer & Engeser, 2021). Since flow is associated with positive outcomes on an individual, task-related, and social level (e.g., increases in job satisfaction, seeking supporting resources, or improved team cohesion; Peifer & Wolters, 2021), research has already begun to explore strategies for increasing flow in everyday life (e.g., teaching goal setting; Weintraub et al., 2021). Importantly, flow states vary greatly depending on time, situation, and personal characteristics (Ceja & Navarro, 2011; Fullagar & Kelloway, 2009; Nielsen & Cleal, 2010; Tse et al., 2021). Given this high volatility, flow interventions need individualized formats, i.e., they need to become adaptive. Flow-adaptive interventions could be tailored to differences between persons (e.g., only targeting people who do not experience flow at work) and to individual states (e.g., not interrupting a person who is already in flow), as implemented in just-in-time adaptive interventions in general (Nahum-Shani et al., 2018). To build flow-adaptive interventions, researchers first need to be able to capture flow fluctuations in everyday life without disrupting the very flow experiences they aim to measure. To do so, researchers have to make decisions not only about the operationalization of the target concept, i.e., flow, but also about the general measurement method for capturing everyday states in real time. In the following sections, we present theoretical literature and previous empirical research on both aspects before describing the aim and design of the present study in more detail.
It is important to note the ongoing debate about how flow can best be conceptualized (Abuhamdeh, 2020; Peifer et al., 2022). In this article, we conceive of flow as a state of high concentration and sense of control, merging of action and awareness, loss of self-consciousness, and distorted temporal experience (Nakamura & Csikszentmihalyi, 2012), based on its original conceptualization by Csikszentmihalyi (1975). Following Engeser and Rheinberg (2008), we posit that these flow characteristics can be condensed into two flow components, fluency and absorption. These two components should be differentiated from the primary flow precondition, a balance between skills and demands. Although flow can be a gratifying and enjoyable experience (known as an autotelic experience; Abuhamdeh, 2020; Peifer et al., 2022), earlier research suggests that these affective and motivational components of flow do not appear similarly across different domains (Bassi & Delle Fave, 2012; Csikszentmihalyi & LeFevre, 1989; Delle Fave & Massimini, 2005). Experiencing flow in both productive and leisure activities involves cognitive components such as feeling in control (Bassi & Delle Fave, 2012; Delle Fave & Massimini, 2005). In contrast, leisure activities are additionally associated with a stronger autotelic experience (Delle Fave & Massimini, 2005). Therefore, in this study, we operationalize flow by measuring fluency and absorption as the core cognitive components of flow experiences.

Measurement Methods for Everyday States
Even though flow is not conceptualized as an affective state per se (Engeser et al., 2021), it was originally investigated alongside affect using measurement methods for everyday states (Csikszentmihalyi & Larson, 1987). Hence, in the following, we discuss the benefits and limitations of these measurement methods for inferring flow. Researchers have used different terms to describe measurement methods for quantifying individual states in everyday life (e.g., experience sampling method, Csikszentmihalyi & Larson, 1987; ecological momentary assessment, Stone et al., 2002). We adopt ambulatory assessment (Fahrenberg et al., 2007; Wilhelm et al., 2012) as an umbrella term for the now mostly digitized methods for measuring everyday experiences (Trull & Ebner-Priemer, 2014). Even though ambulatory assessment can target objective markers of individual states, e.g., neurophysiological correlates, we focus on the use of self-reports to capture the subjective experience of flow.
When using self-reports, ambulatory assessment can differ in terms of the observation approach. The observation approach defines the reference period that self-reports pertain to (e.g., the last hour), thereby determining how conclusions about the total observation period (e.g., one day) can be reached. There are two options for the observation approach: Participants can provide self-reports about their current state, i.e., momentarily, or about very recent experiences, i.e., by recall (Trull & Ebner-Priemer, 2014). In the assessment of momentary states, participants are prompted repeatedly across the day to report on their current state (Csikszentmihalyi & Larson, 2014). If these prompts are distributed randomly over time, it is possible to generalize to all experiences over the entire time period. This approach is comparable to drawing a random sample from a larger population and then extrapolating the sample-based findings to the entire population. Several empirical flow studies have already used this approach, building on the original work of Csikszentmihalyi and Larson (1987). In these studies, researchers sent queries at random times with a fixed total study duration and number of observations per day (e.g., Engeser & Baumann, 2016; Johnson et al., 2014). Alternatively, researchers collected information about each participant's individual working hours and sent queries during exactly those times (Fullagar & Kelloway, 2009). Importantly, asking about momentary flow does not necessarily involve randomizing observations. For example, Rivkin et al. (2018) asked participants about their current task absorption once per day at noon over the course of ten working days. However, forgoing randomization impedes generalizations to the full observation period.
In general, inquiring about momentary states has strong advantages for assessing flow. First, it reduces retrospective memory biases, thereby capturing the experiential nature of the current state (Lucas et al., 2021; Robinson & Clore, 2002). Second, retrospectively recalling individual fluctuations in flow levels, i.e., recalling whether flow intensity varied across the recall period, is difficult for participants. Indeed, research shows that within-subject variability is higher when momentary states are assessed than when recall is used (Diener & Tay, 2014). Nevertheless, capturing momentary states of flow is not without downsides. For example, flow has been conceptualized as an optimal state (Nakamura & Csikszentmihalyi, 2012) that individuals may experience only occasionally during the day (Ceja & Navarro, 2011). Random momentary observations might miss these occasions. Most importantly, flow is a state of high concentration and immersion (Nakamura & Csikszentmihalyi, 2012). Thus, frequent prompts for self-reports about the current state might disrupt momentary flow. In that case, the assessment of flow would be heavily biased, because the measurement itself would decrease flow.
Ambulatory assessment that covers the observation period by means of recall alleviates some of these challenges. The coverage approach enables conclusions about individual states through retrospective recall, not of the full time period at once (as in traditional trait questionnaires), but of the consecutive blocks into which the full observation period is divided. For example, in a study by Collins et al. (2009), participants reported retrospectively each evening, across ten days in total, whether they had experienced flow on that day. In this way, the authors avoided the problem of missing scarce states. Similarly, in Kahneman et al.'s (2004) Day Reconstruction Method, participants divide the previous day into activity-related episodes and report how they felt during each episode. The major issue with recall is that the experiential nature of a momentary experience can never actually be relived (Robinson & Clore, 2002). This is because recall is influenced not only by current emotions or general traits (Levine & Safer, 2002) but also by memory. First, episodic memory biases our recollections, because significant moments have a greater impact on how we retrospectively evaluate our emotional experiences (Robinson & Clore, 2002). Second, biases linked to semantic memory emerge as episodic details become less available over time. This results in greater reliance on generalized beliefs about emotions, which ultimately distorts recall (Robinson & Clore, 2002). According to Robinson and Clore (2002), these beliefs can be normative (e.g., "one needs to be concentrated at work"), situation-specific (e.g., "performing chores is boring"), or identity-related (e.g., "I am easily stressed"). These beliefs may not be accurate. For example, one might intuitively believe that flow is more prominent during leisure than during work. Based on this belief, one might overestimate one's flow during leisure, especially compared to work tasks. However, research suggests the opposite, i.e., higher levels of flow during work than during leisure (Csikszentmihalyi & LeFevre, 1989; Engeser & Baumann, 2016). In other words, generalized beliefs about common flow-inducing activities may bias the recall of the actual experience. Hence, the choice of observation approach (i.e., assessing momentary states or using coverage) depends not only on the aforementioned general (dis-)advantages of the respective options but also on considerations regarding the specific concept under investigation, here flow.

Operationalizing Flow in Ambulatory Assessment
Initially, the subjective experience of flow was operationalized by asking participants about the challenges and skills in their current activity multiple times per day for a week. The scores for challenges and skills were z-standardized, and the weekly average was calculated for each participant. The researchers assumed that a participant was in a state of flow during each observation in which both scores (i.e., for challenges and skills) exceeded the participant's average and matched each other (i.e., when participants rated challenges and skills similarly) (Csikszentmihalyi & LeFevre, 1989; Csikszentmihalyi & Larson, 1987; see Moneta, 2021, for an overview of this so-called quadrant model and its further development). Some empirical research still builds on this original operationalization, e.g., in studies with nurses (Bringsén et al., 2011) or students (Johnson et al., 2014). However, flow researchers now commonly agree that inferring flow solely from the presence of a skill-demand balance does not holistically capture the state and conflates the precondition of a skill-demand balance with the inherent characteristics of flow (Moneta, 2021). Therefore, later empirical studies built on the approach of assessing momentary states but applied questionnaires that operationalize flow as a multi-componential concept, e.g., the Flow Short Scale (FKS¹; Rheinberg et al., 2019), the Flow State Scale (Jackson & Eklund, 2002; Jackson & Marsh, 1996), or the Work-Related Flow Inventory (Bakker, 2008). These componential measures differ in their domain specificity and their underlying flow concept, i.e., whether they integrate affective and motivational flow components. For example, the Flow State Scale assesses whether flow was perceived as intrinsically motivating, whereas the FKS integrates only the flow components of fluency and absorption. Because these componential operationalizations turn flow into a continuous construct by calculating the flow score as a mean across items, they might impose flow on a person even if that person would not report a flow experience themselves (Abuhamdeh, 2020). For example, high reported task absorption could increase the mean flow score even when other flow components are absent in that particular moment. A measure for capturing flow in everyday activities that allows a dichotomization, i.e., a person is either in flow or not, while incorporating characteristics of flow experiences beyond the skill-demand balance is the Flow Questionnaire (FQ; Csikszentmihalyi & Csikszentmihalyi, 1988). For example, Collins et al. (2009) applied an adapted version of the FQ that showed participants two quotes describing flow at the end of each day and asked whether they had experienced a similar state on that day. However, the FQ has been criticized for addressing mostly the absorption component of flow in its quotes (Moneta, 2021). Importantly, the two operationalizations, i.e., measuring flow continuously or categorically, allow different interpretations. Continuous scores indicate flow intensity, whereas dichotomous results can be used to infer flow presence, thereby indicating the probability of flow states. Thus, treating flow as a yes-or-no continuous phenomenon by combining both operationalizations is a meaningful integration that has been called for in recent theoretical outlines (Peifer & Engeser, 2021). This combination implies that an individual either experiences flow or not and, if they do, the intensity of flow may vary (Peifer & Engeser, 2021). The combined operationalization facilitates the development of flow-adaptive interventions by enabling identification of the appropriate time for intervention (e.g., when an individual is not experiencing flow at all, or when they are but flow intensity is low) and of the anticipated impact of the intervention on flow. The anticipated impact determines which intervention strategies to use: To increase the probability of flow experiences, the flow preconditions must be established; to increase flow intensity, moderating variables could be modulated (Bartholomeyczik et al., 2023).
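The quadrant-model classification described above can be illustrated with a short Python sketch. It is hypothetical: the function names, the use of z > 0 as "above the participant's average", and the tolerance `match_tolerance` for deciding that challenges and skills "match" are our assumptions for illustration, not the exact criteria of the original studies (see Moneta, 2021, for the variants actually used).

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one participant's ratings against their own mean and SD."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("ratings have no variance; z-scores are undefined")
    return [(v - m) / sd for v in values]


def quadrant_flow(challenges, skills, match_tolerance=0.5):
    """Flag each observation of ONE participant as flow (True) or not (False).

    Hypothetical criteria: both z-scores exceed the participant's average
    (z > 0) and differ by no more than `match_tolerance` standard deviations.
    """
    zc, zs = zscores(challenges), zscores(skills)
    return [c > 0 and s > 0 and abs(c - s) <= match_tolerance
            for c, s in zip(zc, zs)]


# Usage: five observations of one participant
print(quadrant_flow([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
# → [False, False, False, True, True]
```

The sketch also makes the criticism above concrete: the classification depends only on challenge and skill ratings, so it cannot distinguish a balanced but unabsorbed activity from a genuine flow experience.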

The Present Research
In the present work, we build on the aforementioned literature on measuring flow with ambulatory assessment, addressing three exploratory research questions. The first aim of our study is to explore how assessing momentary states, compared with using coverage, influences participants' flow reports, compliance, and perceived burden in ambulatory assessment. More specifically, we compare the assessment of momentary states with recall of the time period since the last observation. For both approaches, we use repeated observations per day. Thereby, we reduce the time between the reference period and the self-report in the coverage approach to limit memory biases due to recall (Robinson & Clore, 2002). This is especially important because time is experienced in a distorted way during flow (Nakamura & Csikszentmihalyi, 2014). For example, if a person feels like time has been flying today, they might report high levels of flow for the full day when asked about it only once in the evening, even though they might have experienced high levels of flow only in the morning. To ensure that both observation approaches cover the same observation period, we schedule the repeated observations differently depending on the approach. For the momentary states approach, we randomize the timing of observations to be able to generalize to the entire observation period. In the coverage approach, we schedule the observations at fixed time intervals, assuming that it is easier to recall what one has done since a certain time of day than since the last observation query. It is important to note that the differences in sampling (fixed versus random) do not represent an additional manipulation but are inherent to the observation approaches (momentary states versus coverage) used to draw conclusions about similar time periods.
¹ FKS: abbreviation building on the original German version of the Flow Short Scale (Flow Kurzskala; Rheinberg et al., 2003).
As argued before, one issue with ambulatory assessment is that it inherently interrupts current tasks. Earlier research suggests that increasing the number of items per observation is more detrimental to compliance and more likely to bias reports than increasing the total number of observations (Eisele et al., 2022). Thus, the second aim of our study is to determine whether a flow scale commonly used in ambulatory assessment can be shortened without compromising its psychometric properties. We decided to focus on the FKS because it is a validated flow measure that can be applied across domains (Rheinberg et al., 2019). In contrast to the Flow State Scale, it captures only the flow components of fluency and absorption (without addressing the autotelic component) and does not conflate the precondition of a skill-demand balance with the inherent characteristics of flow (Engeser & Rheinberg, 2008; Moneta, 2021). Even though the FKS was already designed as a short measure, it still consists of ten items. In this study, we condense it to a three-item version and evaluate its within- and between-subjects reliability and validity.
The third aim of our study is to combine categorical and continuous operationalizations of flow to empirically evaluate the proposition of flow as a yes-or-no continuous phenomenon (Peifer & Engeser, 2021). Besides preserving the notion of flow as a state of optimal experience (Abuhamdeh, 2020), this allows conclusions about both flow probability and flow intensity, thereby enabling flow-adaptive interventions directed at a particular outcome of interest. We acknowledge that, as part of the ongoing debate about the composition of flow (Abuhamdeh, 2020; Peifer et al., 2022), both measures have been criticized for neglecting certain flow components (Moneta, 2021). Nevertheless, we combine the FQ (categorical) with the FKS (continuous) to evaluate a novel combination of two commonly used measures. To our knowledge, only one ambulatory assessment study has used a similar operationalization (Collins et al., 2009). However, participants in that study provided only one self-report per day, so information on everyday within-subject fluctuations was lacking. Using repeated observations per day, as in the present work, overcomes this limitation.
In sum, our study contributes to research on ambulatory assessment of flow in at least three major ways. First, we aim to provide recommendations for the choice of observation approach, thereby supporting researchers in capturing within- and between-subjects fluctuations in flow while limiting decreases in compliance and biases in reports. Second, we reduce problematic side effects of interruptions caused by ambulatory assessment by developing a shorter version of a commonly applied flow scale. Lastly, we provide a first empirical examination of flow as a yes-or-no continuous phenomenon, thereby allowing conclusions about flow intensity and probability.

Participants and Procedure
We recruited participants from a pool of individuals who were compensated for participating in online and onsite experimental studies hosted by a university with a technical focus. The local data protection office and ethics committee approved the study. Due to the limited number of study smartphones, participants took part in the study in two waves of two weeks each. Based on power estimates from a Monte Carlo simulation (Arend & Schäfer, 2019)², we aimed for at least 30 participants. The overall sample consisted of N = 38 participants (n female = 15, M age = 23.8, SD age = 2.7). All participants except one full-time employee were students (97.4%), and 63.2% of participants had a side job. To ensure consistency in the tasks completed by participants throughout the study, we asked them to work on mental tasks (e.g., preparing for an exam, writing a final thesis, or programming code) for at least four hours per day. As intended, 59.1% of the observations were reported as work; 18.7% were reported as leisure, 9% as obligations, and 13.2% as other types of tasks. In the first session (on Monday), we informed participants about the study and provided them with an Android smartphone with the pre-installed app movisensXS (version 1.5.23, Movisens GmbH, Karlsruhe, Germany, 2022). After giving informed consent, participants provided demographic information. Smartphone-based ambulatory assessment then took place across the following two weeks in two blocks of three days. The first block started on Tuesday morning after the first session and ended on Thursday evening; the second block took place on the same weekdays in the second week to control for possible differences in daily schedules. On Friday of the second week, participants returned the smartphone, filled out a feedback questionnaire, and provided their payment details. The reward was based on the local minimum wage and was contingent on how often participants had answered the e-diaries. In total, 89.5% of participants received the full payment of 45 EUR (for answering at least 65% of e-diaries). We provided an additional incentive of 10 EUR if more than 80% of e-diary queries were completed; 78.9% of participants received this additional incentive.
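The payment rule above can be made explicit in a small Python sketch. The total of 45 e-diary queries per participant is derived from the design described below (three days with ten queries plus three days with five queries); the function name `payout_eur` is hypothetical, and because the text does not specify the payment for compliance below 65%, the sketch returns `None` in that case instead of guessing an amount.

```python
# Hypothetical helper; thresholds and amounts are taken from the text above.
E_DIARIES_PER_PARTICIPANT = 3 * 10 + 3 * 5  # momentary block + coverage block = 45


def payout_eur(answered, prompted=E_DIARIES_PER_PARTICIPANT):
    """Return the payment in EUR, or None if below the full-payment threshold."""
    # Integer arithmetic avoids floating-point edge cases exactly at the thresholds.
    if answered * 100 < 65 * prompted:
        return None  # below 65%: payment scheme not specified in the text
    bonus = 10 if answered * 100 > 80 * prompted else 0  # strictly more than 80%
    return 45 + bonus


print(payout_eur(36))  # 36/45 = exactly 80%, so no bonus → 45
print(payout_eur(37))  # 37/45 ≈ 82% → 55
```

Comparing integers rather than fractions matters here because the bonus requires strictly more than 80%, so a participant answering exactly 36 of 45 queries sits right at the boundary.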

Ambulatory Assessment Procedure
Since flow involves complete absorption in the current task, we captured participants' flow with a time-based approach. Pending queries were announced via an acoustic notification on the smartphone, and participants could postpone answering an e-diary query for ten minutes. E-diary queries were prompted between 9 am and 7 pm on each day³ (in line with regular local working hours). In the e-diaries, participants answered questions about their task, flow, stress, mind-wandering, skill-demand balance, and autotelic experience. There were two within-subject conditions for the e-diary queries, momentary states and coverage. Each condition was applied in one block of three consecutive study days. We randomized the order of conditions across participants to rule out sequence effects.
The conditions differed in the observation approach, i.e., the reference period of the questions, which was linked to differences in observation frequency (e-diaries to fill out per day) and type of sampling (fixed versus random timing of observations). In the momentary states condition, questions referred to the current activity, i.e., to what participants were doing right before the e-diary query. In the coverage condition, questions were asked retrospectively about the last two hours, i.e., about the time since the last alert. We chose a two-hour interval for the coverage condition so that participants could still accurately recall their activities. When deciding on the reference period (i.e., the time period the recall refers to), it is also necessary to consider the estimated frequency of occurrence of the variable of interest. Earlier research shows that flow consistently occurs during work (Engeser & Baumann, 2016). However, people are often interrupted during work and then need some time to return to the initial activity (Mark et al., 2005). Thus, we opted for five e-diary queries per day in the coverage condition with fixed sampling (queries at 11 am, 1 pm, 3 pm, 5 pm, and 7 pm). Based on the assumption that momentary states allow generalizations about the full time period when frequent observations with random timing are used, we doubled the number of e-diary queries in this condition (ten e-diary queries per day with random sampling and at least 30 min between two e-diary queries) to make inferences about similar time periods (i.e., 9 am to 7 pm) in both conditions. As intended, in the momentary states condition, the mean time between two observations on one day was M = 1.01 h (SD = 0.37), with no significant difference in time of observation between days (p = .875; SI 1). The distribution of e-diary queries throughout the day differed significantly between conditions (p < .001; SI 1), indicating that observations in the coverage condition occurred on average one hour later than in the momentary states condition. On the last day of each block of three days (i.e., after each condition had been completed), a feedback query was added in the evening (7:30 pm) asking about the subjective perception of burden in the preceding days (see SI 2 for a visualization of the overall sampling procedure and differences in e-diary queries between conditions).
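The two sampling schemes can be sketched in Python. This is not the movisensXS configuration used in the study; the function names and the specific algorithm for random sampling with a minimum gap (drawing sorted offsets from a window shortened by the mandatory gaps and then adding the 30-minute gaps back in, which spreads valid schedules uniformly) are our assumptions. Times are minutes since midnight.

```python
import random

WINDOW_START = 9 * 60   # 9 am, in minutes since midnight
WINDOW_END = 19 * 60    # 7 pm


def coverage_schedule():
    """Fixed sampling: one query every two hours, each covering the preceding two hours."""
    return [hour * 60 for hour in (11, 13, 15, 17, 19)]


def momentary_schedule(n=10, min_gap=30, rng=random):
    """Random sampling of n query times in the window, at least min_gap minutes apart."""
    slack = (WINDOW_END - WINDOW_START) - (n - 1) * min_gap
    if slack < 0:
        raise ValueError("window too short for n queries with this minimum gap")
    # Draw n sorted offsets in [0, slack], then re-insert the mandatory gaps.
    offsets = sorted(rng.uniform(0, slack) for _ in range(n))
    return [WINDOW_START + offset + i * min_gap for i, offset in enumerate(offsets)]


schedule = momentary_schedule(rng=random.Random(42))
print([f"{int(t) // 60:02d}:{int(t) % 60:02d}" for t in schedule])
```

Each coverage query at time t refers to the interval from t − 2 h to t, so the five fixed queries tile 9 am to 7 pm without gaps, whereas the ten random momentary queries sample the same window.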
Over all six days, participants answered N = 1508 queries (n = 1442 e-diary queries⁴, n = 66 feedback queries). The mean compliance rate was M = 84.9% (SD = 13.2) across all queries, ranging from 51.1% to 100% depending on the participant. All participants except one answered at least one e-diary query on all days⁵ and at least one of the two feedback queries. Since one of the aims of our study was to assess whether the assessment of flow in everyday life would raise problems with compliance, we did not exclude participants with lower compliance. Overall, participation was highest on the first day of each block and lowest on the last day of each block.

Measures
Please refer to SI 9 for the wording of items, respective answer scales, and sources.

Flow
We operationalized flow as a yes-or-no continuous phenomenon (Peifer & Engeser, 2021) by using a continuous (FKS) and a categorical (adapted FQ) measure.In the momentary states condition, participants were asked to answer the FKS and the FQ with regard to their current activity, whereas in the coverage condition they referred to the last two hours.
The FKS asks participants to indicate their agreement with ten statements on a seven-point Likert scale from "not at all" (1) to "very much" (7) (Engeser & Rheinberg, 2008; Rheinberg, 2015). It has a two-factor structure capturing the flow components absorption and fluency. In the coverage condition, we used the full ten-item version. Because doubling the number of e-diary queries in the momentary states condition increased the time participants spent answering, we used the reduced version (r-FKS) in that condition. The r-FKS consisted of the highest-loading items (loadings derived from Rheinberg et al., 2003) for each of the two FKS factors, absorption and fluency (two for fluency and one for absorption, according to the 2:1 ratio of items in the full version). Specifically, the r-FKS included items 6, 8, and 9 of the original scale (e.g., "I am totally absorbed in what I am doing") (Engeser & Rheinberg, 2008; Rheinberg, 2015).⁶ We computed the mean across items as an indicator of what we call flow intensity, with higher scores indicating higher intensity.
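Scoring of the full and reduced scales can be sketched in Python. Responses are stored as a dictionary from the original FKS item number to the 1-7 rating, so the same function handles a full ten-item administration and a three-item r-FKS administration. The function and constant names are hypothetical.

```python
from statistics import mean

RFKS_ITEMS = (6, 8, 9)  # original FKS item numbers retained in the r-FKS


def flow_intensity(responses, items=None):
    """Mean of the selected FKS items (all answered items if `items` is None).

    `responses` maps original item numbers (1-10) to ratings on the 1-7 scale.
    """
    selected = sorted(responses) if items is None else items
    values = [responses[i] for i in selected]  # KeyError if a required item is missing
    if any(not 1 <= v <= 7 for v in values):
        raise ValueError("FKS ratings must lie between 1 and 7")
    return mean(values)


# Usage: one observation from the coverage condition (full scale answered)
full = {1: 5, 2: 6, 3: 4, 4: 5, 5: 6, 6: 7, 7: 5, 8: 6, 9: 6, 10: 4}
print(flow_intensity(full))              # full FKS score: 5.4
print(flow_intensity(full, RFKS_ITEMS))  # r-FKS score from the same answers: ≈ 6.33
```

Because the coverage condition collects all ten items, both scores can be computed from the same observation, which is what allows the direct comparison of the r-FKS with the full scale.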
The FQ consisted of a single dichotomous item asking participants whether they experienced flow (coded as 1) or not (coded as 0), thereby providing a variable for what we call flow presence. In the first session, we told participants that by flow, we refer to experiences as described in these quotes (Moneta, 2012, p. 494; adapted from Csikszentmihalyi & Csikszentmihalyi, 1988): "My mind isn't wandering. I am totally involved in what I am doing, and I am not thinking of anything else. My body feels good … the world seems to be cut off from me … I am less aware of myself and my problems." "My concentration is like breathing … I never think of it … When I start, I really do shut out the world." "I am so involved in what I am doing … I don't see myself as separate from what I am doing." During ambulatory assessment, these quotes were not presented each time participants answered the FQ. However, they were available on the smartphone via the button "What is Flow?" so that participants could access the correct definition at any time.

Task
In the momentary states condition, participants were asked to indicate their current task, whereas in the coverage condition, they reported their main task within the last two hours. The question was provided as a single-choice item with the possible answers work, obligations (e.g., laundry, grocery shopping), leisure, and other.

Flow-Associated Constructs
To assess the validity of the r-FKS, we also included a set of flow-associated constructs in the e-diaries. For all constructs, we used seven-point Likert scales (similar to the FKS) and different references (current state versus last two hours) depending on condition. First, since the FKS only captures the flow components of fluency and absorption, we used the mean across three items from Abuhamdeh and Csikszentmihalyi (2012) as an indicator of autotelic experience: "I am enjoying myself." "I find my current activity interesting." "I find my current activity exciting." Reliability was good to excellent within- (ω = 0.869) and between-subjects (ω = 0.916). The flow precondition of a skill-demand balance was captured by one item from Engeser and Rheinberg (2008) that asked participants to rate the perceived level of demands for them personally from "too low" (1) to "too high" (7). Lastly, because flow is conceptualized as a state of high absorption with effortless attention (Engeser & Rheinberg, 2008; Hommel, 2010; Peifer et al., 2014), we expected stress and mind-wandering to be discriminant from flow. Since earlier work indicates that stress can be adequately captured by a single item (Elo et al., 2003; Katana et al., 2019), we used the item "I feel stressed" (Linnemann et al., 2018). Mind-wandering was measured with the two items "I was thinking about something other than my current activity" (Killingsworth & Gilbert, 2010; Lambert & Csikszentmihalyi, 2020) and "My mind has wandered to something other than what I am currently doing" (adapted from Kane et al., 2007; McVay et al., 2009). Due to the two-item structure of mind-wandering, reliability could not be assessed with McDonald's ω. Instead, the Spearman-Brown coefficient for the two items (recommended as a reliability measure for two-item scales; Eisinga et al., 2013) was high within- (ρ = .92) and between-subjects (ρ = .96).
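The Spearman-Brown coefficient for a two-item scale follows directly from the inter-item correlation r: ρ = 2r / (1 + r) (Eisinga et al., 2013). The Python sketch below computes it from paired item responses. It is single-level, so obtaining separate within- and between-subjects values would require applying it to level-specific correlations, which we assume rather than reproduce here; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, k=2):
    """Reliability of a k-item composite given the (average) inter-item correlation r."""
    return k * r / (1 + (k - 1) * r)


def two_item_reliability(item_a, item_b):
    """Spearman-Brown coefficient for a scale made of exactly two items."""
    return spearman_brown(pearson_r(item_a, item_b))


print(spearman_brown(0.5))  # two items correlating .5 → about 0.667
```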

Perceived Burden
In the feedback query at the end of each block of three days, participants indicated the perceived observation frequency, as well as the amount of perceived interruption of flow and work on a five-point Likert scale with reference to the past three days with ambulatory assessment.
To test the influence of the condition on flow reports, we computed two-level models to account for the nested data structure with repeated observations (level 1, n = 1442) within participants (level 2, N = 38). For flow intensity as the outcome variable, we employed a linear mixed model, whereas for flow presence we used a generalized linear mixed model with a logit link because the outcome variable was dichotomous. Specifically, we included fixed effects for condition (0 = momentary states, 1 = coverage), condition order (0 = group that completed the momentary states condition first, 1 = coverage condition first), and daytime (centered at the middle of the day, 0 = 2 pm), as well as a random effect for condition. Because the generalized linear mixed model did not converge with this random effect, we excluded it from that model. To assess the influence of the condition on compliance, we computed the compliance rate (in %) for each condition and person. Because compliance rates were not normally distributed (Shapiro-Wilk: p = .04), we compared compliance between conditions with a one-tailed Wilcoxon test for paired samples, assuming that compliance would be lower in the momentary states than in the coverage condition because of the higher frequency of e-diary queries. To evaluate the influence of the condition on burden, we computed one-tailed Wilcoxon tests for paired samples (Shapiro-Wilk: all p < .05), assuming that perceived observation frequency as well as interruption of work and flow would also be higher in the momentary states condition than in the coverage condition.
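The one-tailed paired comparison can be illustrated with a self-contained Python implementation of the Wilcoxon signed-rank test using the normal approximation (with tie and continuity correction; zero differences dropped). This is a sketch for illustration only: in practice, a library routine such as `scipy.stats.wilcoxon(x, y, alternative="less")` would typically be used, and its p-values may differ slightly from this approximation. The function name and the example compliance values are hypothetical.

```python
import math


def wilcoxon_signed_rank_less(x, y):
    """One-tailed paired test of H1: values in x tend to be lower than in y.

    Returns (W+, p) using the normal approximation with tie and continuity
    correction; pairs with zero difference are dropped.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| with average ranks for ties (ranks are 1-based).
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(d[order[end + 1]]) == abs(d[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mu + 0.5) / math.sqrt(var)  # continuity correction, lower tail
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return w_plus, p


# Usage with hypothetical compliance rates (%) per participant
momentary = [80, 85, 90, 70, 95, 88, 76, 92, 84, 79]
coverage = [86, 85, 96, 80, 100, 90, 80, 94, 90, 81]
print(wilcoxon_signed_rank_less(momentary, coverage))
```

Passing the momentary states values first matches the directional hypothesis in the text: a small W+ (few positive differences) supports lower compliance in the momentary states condition.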
We evaluated the psychometric properties of the r-FKS in the coverage condition because only this condition allowed a direct comparison with the full scale. For each observation and person, we aggregated mean scores (1) across all ten FKS items and (2) across only the three items of the r-FKS. We computed McDonald's omega (ω; Geldhof, 2014) to assess reliability. We used multilevel Pearson correlation coefficients to assess the relationship between the r-FKS and the full version (concurrent validity) and between the r-FKS and flow-associated concepts (convergent/discriminant validity).
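For a unidimensional scale, McDonald's ω is the share of total score variance attributable to the common factor: ω = (Σλ)² / ((Σλ)² + Σθ), with factor loadings λ and residual variances θ. In a multilevel setting, loadings and residual variances come from separate within- and between-level factor models, which would be estimated with dedicated SEM software. The sketch below assumes those estimates are already available and uses hypothetical standardized loadings.

```python
import numpy as np

def mcdonalds_omega(loadings, residual_variances):
    """Omega for a one-factor model: (sum of loadings)^2 / ((sum)^2 + sum of residual variances)."""
    lam = np.asarray(loadings, dtype=float)
    theta = np.asarray(residual_variances, dtype=float)
    common = lam.sum() ** 2  # variance due to the common factor
    return common / (common + theta.sum())

# Hypothetical standardized loadings of three items at one level;
# for standardized items, residual variance = 1 - loading^2
lam = np.array([0.60, 0.70, 0.65])
omega = mcdonalds_omega(lam, 1 - lam ** 2)
print(f"omega = {omega:.2f}")
```

Holding loadings constant, adding items raises ω. A three-item form can therefore show lower ω than a ten-item form even when its items are good indicators.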
To test whether flow could be operationalized as a yes-or-no continuous phenomenon by combining categorical and continuous measures, we computed two-level linear mixed models. These were computed separately for momentary states and coverage because of differences in reported flow. The models accounted for the nested data structure with repeated observations (level 1, momentary states: n = 955, coverage: n = 487) within participants (level 2, both: N = 38). Specifically, we investigated whether flow intensity differed depending on the answer to the categorical measure. We therefore used flow intensity as the outcome variable and included fixed effects for the FQ (split into a within- and a between-subjects component), condition order, and daytime. We also included a random effect for the within-subject component of the FQ.
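Splitting a level-1 predictor into within- and between-subjects components is usually done by person-mean centering. The sketch below shows one common way to do this for the dichotomous FQ with pandas. The column names and data are hypothetical, and whether the between component was additionally grand-mean centered in the original analysis is an assumption.

```python
import pandas as pd

# Hypothetical long-format e-diary data: one row per observation
df = pd.DataFrame({
    "person": [1, 1, 1, 1, 2, 2, 2, 2],
    "fq":     [1, 0, 1, 1, 0, 0, 1, 0],   # flow question: 1 = "Yes", 0 = "No"
    "fks":    [5.3, 4.0, 5.7, 5.0, 3.7, 4.3, 5.0, 4.0],
})

# Each person's proportion of "Yes" answers, repeated on every row of that person
person_mean = df.groupby("person")["fq"].transform("mean")

# Between-subjects component: person proportion, centered on the mean of person proportions
df["presence_b"] = person_mean - df.groupby("person")["fq"].mean().mean()

# Within-subjects component: deviation of each answer from the person's own proportion
df["presence_w"] = df["fq"] - person_mean
print(df)
```

In the model, `presence_w` would then carry the within-person fixed and random effect and `presence_b` the between-person effect.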

Results
Descriptive statistics for the variables measured in e-diary and feedback queries are presented in Table 1. Overall, participants experienced moderate levels of flow intensity (M FKS = 4.69). For the majority of observations, they did not report presence of flow (FQ = 0; 57.70% of observations). Of the participants, 94.7% (n = 36) reported presence of flow at least once. The null model (two-level linear mixed model without predictor variables) revealed that less than 30% of the variability in flow presence and intensity was due to between-person differences (ICC FQ = 0.23, ICC FKS = 0.21). Thus, within-person fluctuations made up the major portion of variability in flow reports, which supports the need for ambulatory assessment to capture everyday flow.
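The ICC from a two-level null model is the between-person variance τ00 divided by the total variance τ00 + σ². The sketch below computes it from variance components. The values are hypothetical and would in practice come from the fitted null model.

The text reports a linear null model for both outcomes. For a binary outcome such as the FQ, a common alternative is the latent-variable ICC, which fixes the level-1 variance at π²/3; it is included here only for comparison.

```python
import math

def icc_linear(tau00, sigma2):
    """ICC from a two-level linear null model: between-person share of total variance."""
    return tau00 / (tau00 + sigma2)

def icc_logistic_latent(tau00):
    """Latent-scale ICC for a two-level logistic null model (level-1 variance = pi^2 / 3)."""
    return tau00 / (tau00 + math.pi ** 2 / 3)

# Hypothetical variance components for flow intensity (FKS)
print(f"ICC = {icc_linear(tau00=0.30, sigma2=1.10):.2f}")
```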

Comparison Between Use of Coverage and Momentary States
Figure 1 illustrates within- and between-subject variability over time in both conditions. Within-subject variability in flow intensity was higher in the momentary states (σ² FKS = 1.130) than in the coverage condition (σ² FKS = 0.856). This ties in with the higher frequency and current reference of observations in the momentary states condition.
Visual inspection did not indicate linear decreases in flow intensity over time in either condition. This suggests that the interruption by e-diary queries did not reduce flow experiences over time, even when observation frequency was increased (for additional statistical tests of the influence of time via multilevel modeling, see SI 3 and 4).
Results of the multilevel models estimating the influence of the condition on flow intensity and presence (controlling for daytime and order of conditions) are depicted in Table 2.
In the momentary states condition, the flow intensity of a typical person was 4.85 (possible values between one and seven) when all other predictors were zero (i.e., in the group in which the momentary states condition took place first and at the middle of the day). There was no significant difference in flow intensity between the coverage and the momentary states condition (B Condition = -0.18, p = .145). The order of the conditions neither had a significant direct effect on flow intensity (B Order = -0.14, p = .522) nor interacted significantly with the condition (B ConditionxOrder = -0.13, p = .426). This is consistent with successful randomization of participants to the different orders of conditions. It also implies that participants' perception of the first condition did not bias reported flow intensities in the subsequent condition.

In contrast to flow intensity, the probability of reporting flow was significantly higher in the coverage than in the momentary states condition when all other predictors were zero (66% compared to 46% of observations reported as flow presence; B Condition = 0.86, p < .001, OR = 2.36). Flow presence also differed significantly depending on the order of conditions: flow probability was lower in the group of participants that did the coverage condition first (B Order = -1.09, p = .001, OR = 0.34). However, there was no significant interaction between condition and order (B ConditionxOrder = -0.12, p = .649, OR = 0.87). This indicates that the group that started with the momentary states condition generally reported presence of flow more often, independent of condition.

Table 1 note. Flow, autotelic experience, skill-demand-balance, stress, and mind-wandering (rated on a seven-point Likert scale from one to seven) were reported in the e-diaries (Level 1, n = 1442; momentary states: n = 955, coverage: n = 487) by the participants (Level 2, N = 38). Perceived observation frequency, work interruption, and flow interruption (rated on a five-point Likert scale from one to five) were reported in the feedback queries (Level 1, n = 66; momentary states: n = 31, coverage: n = 35) by the participants (Level 2, N = 37). One participant did not answer the feedback queries at all. ICC = Intraclass correlation coefficient. a Proportion of "Yes" responses reported instead of mean scores and standard deviations because the variable (FQ) is dichotomous.

Since the number of e-diary queries was twice as high in the momentary states as in the coverage condition, we expected increased burden in the momentary states condition. In line with that, means of all indicators of perceived burden were higher in the momentary states than in the coverage condition (Table 1). Wilcoxon tests for paired samples confirmed significantly higher perceived observation frequency (V = 184.0, p < .001), inter-

Independent of these differences in perceived burden, participants responded to similar proportions of e-diary queries in the momentary states (mean of individual compliance rates: 83.9%) and the coverage condition (86.7%). Statistical analyses also did not indicate a significant difference in compliance between conditions (V = 355.5, p = .256). Similarly, visual inspection did not suggest an effect of the order of conditions on compliance (SI 5). A statistical comparison of the difference in compliance (momentary states versus coverage condition) between groups with different orders of conditions did not indicate an order effect either (W = 149.5, p = .372). This suggests that the transition between conditions from the first to the second week did not influence participants' compliance, i.e., it did not matter whether they experienced an increase or a decrease in the number of observations per day.
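The order comparison above contrasts the within-person compliance difference (momentary states minus coverage) between the two order groups. A minimal sketch with hypothetical compliance rates is shown below. The two-sided alternative is an assumption, since the text does not state the direction. In SciPy 1.7 and later, `mannwhitneyu` reports the U statistic of the first sample, which matches the W statistic of R's two-sample `wilcox.test`.

```python
import numpy as np
from scipy import stats

# Hypothetical individual compliance rates (%) per condition, split by order group
momentary_first = {"momentary": np.array([84, 90, 78, 88, 81]),
                   "coverage":  np.array([86, 91, 85, 87, 88])}
coverage_first = {"momentary": np.array([80, 89, 83, 92, 77]),
                  "coverage":  np.array([88, 86, 90, 91, 84])}

# Within-person difference scores (momentary minus coverage) in each order group
diff_mf = momentary_first["momentary"] - momentary_first["coverage"]
diff_cf = coverage_first["momentary"] - coverage_first["coverage"]

# Mann-Whitney U test comparing the difference scores between order groups
result = stats.mannwhitneyu(diff_mf, diff_cf, alternative="two-sided")
print(f"W = {result.statistic}, p = {result.pvalue:.3f}")
```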
In sum, flow presence was higher in the coverage than in the momentary states condition, whereas flow intensity did not differ significantly between conditions. The momentary states condition was associated with increased burden compared to the coverage condition, including perceived interruption of flow. However, this neither influenced actual participation in the study (i.e., compliance) nor led to decreases in flow intensity or presence over time.
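The flow presence results rest on logit coefficients. The sketch below shows how a coefficient such as B Condition = 0.86 translates into an odds ratio and, given a reference probability, into a predicted probability. Treating 46% as the model-implied probability for the momentary states condition (with all other predictors at zero) is an assumption made for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Probability corresponding to log-odds x."""
    return 1 / (1 + math.exp(-x))

b_condition = 0.86                  # logit coefficient: coverage vs. momentary states
odds_ratio = math.exp(b_condition)  # approx. 2.36

# Assumed reference probability for momentary states (all other predictors = 0)
p_momentary = 0.46
p_coverage = inv_logit(logit(p_momentary) + b_condition)  # approx. 0.67

# The same conversion for the order effect (B Order = -1.09)
odds_ratio_order = math.exp(-1.09)  # approx. 0.34
print(f"OR = {odds_ratio:.2f}, p(coverage) = {p_coverage:.2f}, OR(order) = {odds_ratio_order:.2f}")
```

The resulting value of about 0.67 is close to the reported 66%. The small gap is consistent with rounding of the published coefficient and percentages.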

Psychometric Properties of the r-FKS
In the following paragraph, we present results for the coverage condition only, since it allowed us to compare the psychometric properties of the r-FKS with those of the full scale. However, similar analyses in the momentary states condition yielded effects in the same direction (Table 3). Within-subject reliability of the r-FKS was acceptable (ω coverage = 0.67, ω momentary = 0.68), but lower than for the full version of the scale (ω coverage = 0.86). By contrast, between-subjects reliability was good for both the r-FKS (ω coverage = 0.85, ω momentary = 0.87) and the full version (ω coverage = 0.82).

Table 3 note. Correlations above the diagonal are at the within-person level (Level 1, n = 487); correlations below the diagonal are at the between-person level (Level 2, N = 38). The r-FKS consisted of three items, whereas the FKS consisted of ten items. Correlations between the FKS and flow-associated concepts could not be computed for the momentary condition because the full scale was only used in the coverage condition. a Correlation between grand-mean centered squared skill-demand-balance and grand-mean centered flow, due to the inverted U-shaped association between skill-demand-balance and flow. b Correlation between person-mean centered squared skill-demand-balance and person-mean centered flow, due to the inverted U-shaped association between skill-demand-balance and flow.

As reported above, there was no difference in flow intensity between the momentary states and the coverage condition, i.e., between the conditions that used the reduced and the full version of the FKS. In the coverage condition, flow intensity was computed both as a mean across the full version of the scale and as a mean across only the items of the r-FKS. Visual inspection of these two scores also indicated a positive relationship between them (Fig. 2). This correlation was strong within- (r = .86) and between-subjects (r = .90) (Table 3). Hence, scores on the additional items of the full scale did not substantially increase or decrease mean flow intensity, which suggests strong concurrent validity of the r-FKS. Within-subject correlations between the reduced and the full version of the scale were also strongly positive when computed separately for the scale factors (r absorption = 0.81; r fluency = 0.78). This supports our goal of capturing both factors in the r-FKS, as in the full scale.
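Within- and between-person correlations can be approximated by correlating person-mean-centered scores (within) and person means (between). The sketch below uses pandas with hypothetical column names. It ignores the sampling error of person means, so it is a simplification of model-based multilevel correlations and not necessarily the estimator used in the study. The toy data are built so that the two levels diverge, which shows why the decomposition matters.

```python
import pandas as pd

def within_between_corr(df, x, y, group="person"):
    """Approximate within- and between-person Pearson correlations of columns x and y."""
    person_means = df.groupby(group)[[x, y]].transform("mean")
    centered = df[[x, y]] - person_means          # person-mean-centered scores
    r_within = centered[x].corr(centered[y])
    means = df.groupby(group)[[x, y]].mean()      # one row per person
    r_between = means[x].corr(means[y])
    return r_within, r_between

# Toy data: within each person the two scores move in opposite directions,
# but persons with a higher mean on one score also have a higher mean on the other
rows = []
for p in range(1, 6):
    rows.append({"person": p, "r_fks": p - 0.5, "fks": p + 0.5})
    rows.append({"person": p, "r_fks": p + 0.5, "fks": p - 0.5})
data = pd.DataFrame(rows)

r_w, r_b = within_between_corr(data, "r_fks", "fks")
print(f"within r = {r_w:.2f}, between r = {r_b:.2f}")
```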
Since we were interested in evaluating the r-FKS for use in ambulatory assessment, we focused on its validity with regard to within-subject fluctuations. Within-subject associations between flow intensity measured with the r-FKS and flow-associated concepts are visualized in SI 6. On a within-subject level, flow intensity was moderately positively associated with autotelic experience and moderately negatively associated with mind-wandering and stress. Thus, the higher a person's flow intensity, the more they enjoyed their activity and the less they experienced simultaneous mind-wandering or stress. These correlations pointed in the same direction for both scale versions but were slightly lower for the r-FKS than for the FKS (Table 3). For computing the association between flow intensity and skill-demand-balance, we squared the latter variable because of the expected inverted U-shaped relationship between the two (Engeser & Rheinberg, 2008). In line with that expectation, within-subject correlations between flow intensity and the squared skill-demand-balance were moderately negative for both the r-FKS and the FKS (Table 3). In other words, participants reported the highest flow intensity when skills and demands were balanced. In sum, these results indicate convergent and discriminant validity of the r-FKS.
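The quadratic check described above can be illustrated for a single hypothetical participant. After person-mean centering, squaring the demand rating turns an inverted U into a negative linear association. The data below are invented. In the study, such within-person correlations would be pooled across participants (cf. footnote b of Table 3).

```python
import numpy as np

# Hypothetical observations for one person: demand rating (1 = "too low", 7 = "too high")
# and flow intensity that peaks at intermediate demands (inverted U)
demand = np.array([1, 2, 3, 4, 5, 6, 7, 4, 3, 5], dtype=float)
flow = np.array([3.0, 4.2, 5.1, 5.8, 5.0, 4.1, 2.9, 5.6, 5.2, 4.9])

# Person-mean center both variables, then square the centered demand rating,
# so that large values mean "far from this person's typical demand level"
demand_c = demand - demand.mean()
flow_c = flow - flow.mean()
demand_sq = demand_c ** 2

r_linear = np.corrcoef(demand_c, flow_c)[0, 1]   # near zero for a symmetric inverted U
r_squared = np.corrcoef(demand_sq, flow_c)[0, 1]  # strongly negative for an inverted U
print(f"linear r = {r_linear:.2f}, squared r = {r_squared:.2f}")
```

The linear correlation alone would miss this relationship, which is the rationale for squaring.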

Differences in Flow Intensity Depending on Flow Presence
The mean difference in flow intensity between observations reported as presence versus absence of flow was positive in both the momentary states (M Yes = 5.29, M No = 4.62) and the coverage condition (M Yes = 5.04, M No = 4.06). That is, flow intensity was higher when participants reported presence of flow. This was confirmed by our multilevel models (Table 4): when a person reported presence rather than absence of flow, flow intensity was higher by about two-thirds of a unit to one unit (momentary states: B Presence_w = 0.65, p < .001; coverage: B Presence_w = 0.94, p < .001). The models also showed between-subjects effects: persons who reported presence of flow more often than average also reported generally higher flow intensity (momentary states: B Presence_b = 2.05, p < .001; coverage: B Presence_b = 1.23, p < .001). In addition, we computed the 25th percentile of flow intensity separately for each individual as an exploratory cut-off between presence and absence of flow. Indeed, the majority of observations for which flow intensity did not exceed this individual cut-off were reported as absence of flow (momentary states: 85% of observations with intensity below the individual cut-off; coverage: 72%). Hence, a person's particularly low intensity scores seem to provide a marker for absence of flow.
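The exploratory individual cut-off can be sketched with pandas. Each person's own 25th percentile of flow intensity is computed, and then the share of observations at or below it that were reported as absence of flow. The data and column names are hypothetical, and the default linear interpolation of `Series.quantile` is an assumption about how the percentile was computed.

```python
import pandas as pd

# Hypothetical long-format data: flow intensity (FKS, 1-7) and FQ answer per observation
df = pd.DataFrame({
    "person": [1] * 8 + [2] * 8,
    "fks": [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5,
            2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5],
    "fq":  [0, 0, 0, 1, 1, 1, 1, 1,
            0, 1, 0, 0, 1, 1, 1, 1],
})

# Individual cut-off: each person's own 25th percentile of flow intensity
df["cutoff"] = df.groupby("person")["fks"].transform(lambda s: s.quantile(0.25))

# Observations whose intensity does not exceed the individual cut-off
low = df[df["fks"] <= df["cutoff"]]

# Share of these low-intensity observations that were reported as absence of flow
share_absent = (low["fq"] == 0).mean()
print(f"{len(low)} low-intensity observations, {share_absent:.0%} reported as absence of flow")
```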
Because the FQ focuses on the absorption component of flow, we additionally computed the multilevel models with only the FKS factor absorption as the outcome variable. Indeed, these models resulted in stronger direct fixed effects of the within- and the between-subjects component of flow presence on absorption intensity (for model results see SI 7). This indicates that the answer to the FQ ties in more strongly with current absorption in the task than with perceived fluency (for model results for fluency see SI 8).

Table 4 note. The variable for flow presence was split into within- (presence_w) and between-subjects (presence_b) components. Dichotomous variable for order of conditions (0 = group that participated in the momentary states condition first, 1 = coverage condition first). Centered variable for daytime (0 = middle of observation period, 2 pm). CI = Confidence interval, LL = Lower limit, UL = Upper limit. * p < .05, ** p < .01, *** p < .001

Discussion
In a real-life study using time-based ambulatory assessment, we compared two observation approaches, momentary states and coverage, to capture within- and between-subject fluctuations of flow. The approaches differed in reference, which was linked to differences in observation frequency. Nevertheless, we found no effect of approach on compliance, flow intensity reports, or flow reports over time. However, flow probability, within-subject variability in flow reports, and perceived burden differed between the approaches. In addition to finding support for the viability of both approaches, we found that the FKS can be shortened to the three-item r-FKS without substantial detriments to its psychometric properties. In the following, we discuss the contributions of these results as well as the limitations of our study.

Contributions for Research and Practice
Our findings contribute to the literature on flow measurement and on the assessment of everyday states in at least three ways. First, our comparison of two observation approaches allows us to derive tentative recommendations for when to use which approach when measuring flow in everyday life. Based on our findings, we recommend capturing momentary states if researchers need fine-grained analyses of individual fluctuations in flow, e.g., when investigating physiological correlates of flow (e.g., Peifer et al., 2014). Because this approach captures higher within-subject variability (in line with similar findings on current compared to retrospective assessments; Diener & Tay, 2014), it is particularly useful for developing flow-adaptive interventions. By allowing real-time assessment of individual states, it enables researchers to intervene in a targeted and timely manner. Contrary to the concern that frequent prompts for self-reports about the current state might disrupt momentary flow, we found no evidence that the increased observation frequency linked to momentary assessment biases reports of flow intensity. Participants did notice the differences in observation frequency between the two approaches and associated higher frequency with increased burden. However, noticing these differences neither influenced their actual participation in the study nor altered their flow reports over time. This result is in line with an earlier study by Hasselhorn et al. (2022) that identified no differences in compliance due to differences in assessment frequency. Nonetheless, the coverage approach can still be preferable in certain cases. If researchers are mainly interested in interindividual differences, e.g., when testing the effect of an intervention with a control group (e.g., Weintraub et al., 2021), the coverage approach requires fewer observations. It is thus more cost-efficient and less intrusive than assessing momentary states. At the same time, it still captures within-subject variability through repeated observations per day, thereby providing a compromise between extensive momentary assessment and single, trait-oriented observations. Importantly, the coverage approach should also be the method of choice if the total number of flow experiences per day is needed. Randomly distributed momentary assessments only allow generalizations about the entire observation period (i.e., informed guesses about the periods between observations). By contrast, the coverage approach captures the entire observation period without any unobserved periods in between.
Our finding of lower flow probability in the momentary states approach directly ties in with the aforementioned problem of missing scarce states. Coverage increases the likelihood of capturing all flow experiences, whereas momentary, random sampling can miss them by chance. However, the difference in the number of reported flow experiences between the two observation approaches could also be due to normative beliefs prompted by recall (e.g., "I should experience flow during work" or "I have not experienced flow today yet, that can't be true"), which would bias the estimation (Robinson & Clore, 2002). Despite commonly discussed limitations of recall (e.g., Kahneman et al., 2004; Lucas et al., 2021; Robinson & Clore, 2002), its application in the coverage approach does not seem to impair access to the experiential nature of flow, since reported flow intensity did not differ between the momentary states and the coverage approach. Earlier studies found recall biases even for very recent experiences (e.g., see literature on the peak-end rule; Fredrickson, 2000; Kahneman et al., 1993). Even so, we assume that the two-hour reporting intervals and the assessment during everyday life (i.e., typically without exceptionally affective events) help prevent such biases. Hence, we recommend keeping time windows short when opting for the coverage approach.
As a second major contribution, we have shown that in our ambulatory assessment study, the r-FKS largely preserves the reliability and validity of the full scale. Most importantly, we found a strong correspondence between mean scores on the full and the reduced version of the scale. We therefore conclude that the additional seven items of the full scale do not provide enough added value to justify their use in ambulatory assessment. Reducing the length of the flow measure for ambulatory assessment allows more frequent observations per day (e.g., important for obtaining a large number of labels for neurophysiological data). It also allows assessment during a larger variety of tasks (e.g., in work settings with frequent meetings in which it would be inconvenient to fill in longer questionnaires). Also, very short scales such as the r-FKS are especially promising for novel, user-friendly devices for assessing everyday experiences, e.g., smartwatches (Volsa et al., 2022). By prompting short e-diaries on such unobtrusive devices, researchers can increase interest in participating in ambulatory assessment studies, thereby enlarging potential sample sizes. They can also prolong study duration because participants' burden decreases. Since we investigated flow in everyday life, future research needs to evaluate whether our findings on comparable validity and reliability of the full and the reduced scale also translate to laboratory flow research. In addition, we encourage researchers to incorporate measures of intrinsic motivation and skill-demand-balance in e-diaries to capture the main flow preconditions and consequences (Abuhamdeh, 2020). This is especially relevant given the ongoing debate about the need to conceptually separate flow characteristics from preconditions and consequences in flow operationalizations (Abuhamdeh, 2020; Peifer et al., 2022).
As a third major contribution, we empirically evaluated the theoretical proposition of assessing flow as a yes-or-no continuous phenomenon. As suggested by Peifer and Engeser (2021), the combination of categorical and continuous flow measures allowed us to draw conclusions about flow intensity and probability. This approach is important for both evaluating and designing the adaptive mechanism of flow-adaptive interventions (Bartholomeyczik et al., 2023). The combination of a categorical and a continuous flow measure enables a comprehensive assessment of an intervention's impact on both flow probability (based on the categorical measure) and flow intensity (based on the continuous measure). Similarly, this combination allows a flow-adaptive intervention to intervene in situations with flow presence but low flow intensity, or to refrain from intervening in situations with high flow intensity. However, even though we chose the combination of the FQ and the FKS based on the theoretical proposition by Peifer and Engeser (2021), we do not recommend this combination for future studies. Our study confirms that the FQ and the FKS are based on two different flow operationalizations, in that the FQ does not capture the fluency component of flow (Moneta, 2021). Specifically, our findings show that reported presence of flow predicted absorption more strongly than fluency. Thus, if researchers aim for combined conclusions about flow intensity and probability, we recommend combining two measures that are based on the same flow concept. For instance, if researchers apply the flow concept operationalized in the FKS (as we did in this study), we suggest using a dichotomous item asking about flow presence based on this concept in combination with the continuous FKS. Alternatively, if the research objective is solely to examine the absorption component of flow, researchers could apply the FQ in combination with the FKS items that load on the absorption factor.
Importantly, interpreting flow as a yes-or-no continuous phenomenon suggests that reports on the continuous flow measure should only be interpreted if flow presence was reported on the categorical measure (Peifer & Engeser, 2021). Depending on the research objective, researchers could thus also apply the categorical measure alone, such as the FQ. This would be feasible if a flow-adaptive intervention aims to intervene in situations in which participants do not experience flow and the researchers only want to evaluate whether the intervention increases the probability of experiencing flow. By contrast, using the continuous measure alone to deduce flow presence bears the risk of imposing flow on participants (Abuhamdeh, 2020; Moneta, 2021). Indeed, in our study, the majority of observations in the individually lowest range of flow intensity were accompanied by reports of flow absence. In these cases, evaluating the flow intensity scores alone would lead to interpretations of low levels of flow, although the categorical measure suggests a complete absence of flow. However, a relevant portion of observations in the lowest range of reported flow intensity (i.e., with flow intensity within the individually lowest quartile of intensity scores) were simultaneously labeled as presence of flow (momentary states: 15%; coverage: 28%). This observation disagrees with the notion that flow scales generally impose flow on participants.
Interpreting the reports on the continuous flow measure only if flow presence was reported on the categorical measure would also substantially reduce the number of observations for the continuous variable, thereby decreasing the statistical power of analyses. Our findings suggest that absence of flow (indicated by the categorical measure) is associated with low intensity scores. Making the interpretation of the continuous measure conditional on the categorical measure may therefore not be necessary. Rather, our study provides first insights into another approach for reaching conclusions about flow intensity and probability in a single study without two distinct measures. Using the individually lowest intensity scores (e.g., below the individual 25th percentile) as a cut-off between presence and absence of flow is time- and cost-efficient. It requires only one measure while allowing conclusions about both flow probability and intensity. This cut-off might not perfectly distinguish states of flow and no flow. However, it incorporates individual differences in general flow propensity and is conservative, in that it tends to underestimate rather than overestimate flow presence. Nevertheless, this approach does not solve the problem of imposing flow on participants who would not report flow at all (Abuhamdeh, 2020; Moneta, 2012). Therefore, we advocate the use of this cut-off method only when a combination with a categorical measure is not feasible due to time or resource constraints. In such cases, we suggest administering a trait flow questionnaire before commencing ambulatory assessment to differentiate individuals who generally experience flow from those who do not.

Limitations and Future Research
In the following section, we highlight some limitations of our study, as they can provide interesting starting points for future research. Most importantly, we did not use a 2 × 2 design to separate differences in observation frequency (ten versus five per day) from differences in reference (current state versus recall of the last two hours). We decided to use a within-subjects design to control for individual flow propensity. A 2 × 2 design would therefore have considerably increased burden by requiring participants to take part in four conditions, i.e., over the course of four weeks. Also, we argue that the use of different references is directly linked to differences in observation frequency: drawing conclusions about similar time periods requires adapting the sampling accordingly. Thus, we attributed the differences between the conditions to the differences in reference (e.g., when arguing about the cause of higher flow presence in the coverage approach). However, the momentary states and coverage conditions also differed with respect to questionnaire length, i.e., the use of the reduced versus the full version of the FKS (three versus ten items). Previous research on ambulatory assessment has been inconsistent on whether questionnaire length affects compliance, perceived burden, and within-person variability (Eisele et al., 2022; Hasselhorn et al., 2022). Compared with our study, these studies used more items in the shorter questionnaire condition and larger differences in the number of items between the short and the long questionnaires. Nevertheless, we cannot rule out that the increased variability of flow reports in the momentary states compared to the coverage condition is due to the shorter questionnaire length, i.e., the lower number of items, rather than to the measurement of momentary states (Hasselhorn et al., 2022). Because these potential confounds are conflated in the two conditions, future research would benefit greatly from analyzing the most efficient and effective (i.e., unobtrusive but informative) number of observations per day and number of items separately for each observation approach.
In addition, power estimates from Monte Carlo simulations suggest that the Level 1 sample size (i.e., the number of observations) was large enough to detect small Level 1 effects with high statistical power (Arend & Schäfer, 2019). However, the Level 2 sample size (i.e., the number of participants) was only sufficient to detect large between-subjects effects with acceptable statistical power. In analyzing our multilevel data, we were primarily interested in Level 1 effects (within-person differences in the effect of condition on flow reports and within-person associations between flow reports and associated concepts). Thus, we would not expect substantial changes in these results with more participants. However, the effect of condition order on flow reports was a Level 2 effect. The interpretation of the results regarding this potential confound may therefore be affected by the limited number of participants.
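The Level 2 power limitation can be illustrated with a deliberately simplified Monte Carlo simulation. Each participant is reduced to one person-level score, and the 38 participants are assumed to be split evenly into two order groups (19 each, an assumption). The groups are then compared with an independent-samples t-test.

This is not the multilevel simulation procedure of Arend and Schäfer (2019), which models observations nested in persons. It only conveys why a between-person effect needs to be large to be detected reliably with this many participants.

```python
import numpy as np
from scipy import stats

def simulate_power(effect_size_d, n_per_group=19, n_sims=2000, alpha=0.05, seed=1):
    """Monte Carlo power for a between-person (Level 2) group difference.

    Simplification: one person-level score per participant, compared between
    two equally sized groups with an independent-samples t-test.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        group_a = rng.normal(0.0, 1.0, n_per_group)
        group_b = rng.normal(effect_size_d, 1.0, n_per_group)
        if stats.ttest_ind(group_a, group_b).pvalue < alpha:
            hits += 1
    return hits / n_sims

power_large = simulate_power(0.8)  # large standardized effect
power_small = simulate_power(0.3)  # small-to-moderate standardized effect
print(f"power (d = 0.8): {power_large:.2f}; power (d = 0.3): {power_small:.2f}")
```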
Relatedly, one of the two participant groups (differing in order of conditions) generally reported presence of flow more often than the other. Despite the random assignment to these groups, this suggests the presence of relevant covariates. Flow is influenced by a range of internal factors (e.g., level of expertise in the task, interest in the task, general flow proneness; Bricteux et al., 2017; Rheinberg & Engeser, 2018; Wilson & Moneta, 2016) and external factors (e.g., receipt of feedback or not; Hohnemann et al., 2022). We therefore cannot pinpoint the exact reason for these differences. In addition, differences in reported flow presence could also be due to factors that are completely unrelated to flow, e.g., a generally higher tendency to answer "Yes" in one of the groups. Also, two participants never reported presence of flow, but both belonged to the group that reported more flow in total. Hence, these outliers cannot account for the differences either. This group difference calls for further investigation into individual differences in flow. Moreover, our finding of a group effect on flow presence but not on intensity further supports that the two flow measures do not completely coincide.
Consistent with the payment schemes in previous studies that analyzed differences in compliance (e.g., reducing the payout when compliance was below 33%, Eisele et al., 2022; providing an additional reward for answering more than 80% of queries, Hasselhorn et al., 2022), we compensated participants monetarily and offered an additional incentive for high compliance. Meta-analyses conclude that compliance does not depend on whether participants receive a fixed or a compliance-based payout (de Vries et al., 2021; Ottenstein & Werner, 2022). However, they do not agree on whether the amount of the incentive affects compliance (Ottenstein & Werner, 2022; Vachon et al., 2019; Wrzus & Neubauer, 2023). In general, monetary compensation seems to be more effective in increasing compliance than other rewards, such as feedback or vouchers (Harari et al., 2017; Ottenstein & Werner, 2022). Furthermore, studies without rewards seem to achieve lower compliance than studies using incentives (de Vries et al., 2021; Wrzus & Neubauer, 2023). Therefore, the lack of a difference in compliance between the two observation approaches in our study may result from the generally high compliance caused by the monetary incentives. Future research could thus evaluate how differences in compensation affect the influence of the observation approaches on compliance. An interesting avenue for future research may also be to evaluate differences in effects when participants can track their current level of compliance (Trull & Ebner-Priemer, 2020). In conclusion, the results of our study regarding the effect of observation approaches on compliance have limited generalizability to studies that do not provide any incentive.
In addition, qualitative feedback from participants at the end of the study revealed an important limitation of the coverage approach. The majority of participants reported difficulties estimating their flow if they had switched repeatedly between tasks during the two hours covered by the recall. We had also initially included a question in the coverage approach about the number of flow states experienced in the last two hours (asked if the FQ had been answered with "Yes"). However, participants reported strong difficulties in estimating this frequency, so we did not evaluate these data further. Thus, we recommend paying particular attention to the targeted sample and their expected everyday tasks. For participants who complete a very diverse set of tasks, the coverage approach might not be suitable. To mitigate this limitation, future research could focus on samples with homogeneous tasks. An example would be collecting data in a working sample at a specific company instead of with students, who often switch between work and leisure activities during the day.
Lastly, repeatedly asking about flow might prompt participants to eventually answer "Yes" because they feel obliged to do so. This speaks in favor of continuous flow scales, which do not nudge participants in that direction. However, even with such measures, we cannot rule out subjective biases in self-reports. Therefore, flow research calls for more objective markers of flow (Peifer et al., 2022; Peifer & Engeser, 2021). Despite promising advances in recent years, current neurophysiological methods do not yet allow such data to be gathered unobtrusively in the field (Gold & Ciorciari, 2020; Peifer et al., 2022). Until they do, we argue that self-report-based ambulatory assessment will remain the method of choice for evaluating everyday flow. Since our study focused on the absorption and fluency components of flow, our conclusions are limited to this specific flow concept. When measuring flow experiences in everyday leisure activities, future research should evaluate how these findings apply to self-report-based flow operationalizations that treat autotelic experience as a major element of flow.

Conclusion
Despite the aforementioned limitations, our study illuminates how to measure everyday flow in ambulatory assessment. Our findings show that random momentary sampling is not generally superior to fixed recall periods for gathering information on flow fluctuations. Rather, we recommend choosing the observation approach for capturing everyday flow depending on the outcome of interest (i.e., flow intensity or frequency across the day), the targeted comparison (i.e., differences within or between subjects), the concurrent application of neurophysiological measures, and the expected task variability of participants. In addition, the present work reduces the obtrusiveness and burden of ambulatory assessment with self-reports by providing a short flow measure that supports conclusions about both flow intensity and flow probability. Thereby, we contribute to a methodological basis for building flow-adaptive interventions grounded in real-time insights into individual flow without requiring access to neurophysiological data.

Fig. 1 Spaghetti plot of flow intensity (FKS) across time in the momentary states (A) and coverage (B) conditions (thin black lines indicate individual change over time; the thick black line indicates mean change over time; blue lines mark the transitions between days)

Fig. 2 Within-subject associations between flow intensity scores computed from the full and the reduced version of the FKS in the coverage condition (the blue line indicates the mean with confidence intervals)

Table 1
Descriptive statistics of variables assessed in e-diary and feedback queries compared between conditions (S = momentary states, C = coverage)

Table 2
Influence of condition on reports of flow intensity (momentary states condition: r-FKS; coverage condition: FKS) and flow probability (FQ), controlling for order and daytime effects

Table 3
Level-specific bivariate correlations between reports of flow intensity (r-FKS and FKS) and flow-associated concepts in the coverage condition (correlations between r-FKS and associated concepts in the momentary states condition are reported in parentheses)

Table 4
Reported presence of flow (FQ) as a predictor of flow intensity (FKS), controlling for order and daytime effects
Note. Level 1: n = 955 observations (momentary states), n = 487 observations (coverage); Level 2: N = 38 participants.