The growing popularity of mindfulness-based interventions (MBIs) such as Mindfulness-Based Stress Reduction (Kabat-Zinn, 1991) and Dialectical Behavioural Therapy (Dimidjian & Linehan, 2003), together with encouraging evidence of their efficacy in treating psychopathology and increasing well-being, has fostered a burgeoning interest in mindfulness research. In this context, mindfulness is typically operationalised as awareness of one’s experience in the present moment with a nonjudgmental attitude (e.g., Bishop et al., 2004; Kabat-Zinn, 1991), and MBIs aim to benefit individuals by training them to become more mindful in daily life. Hence, valid and accurate assessments of trait mindfulness are essential to understanding the mechanisms through which mindfulness practice leads to beneficial outcomes.

At present, researchers in the field have limited means of measuring mindfulness and rely predominantly on the use of self-report scales. The more established mindfulness instruments, such as the Five Facet Mindfulness Questionnaire (FFMQ; Baer et al., 2006) or the Mindful Attention Awareness Scale (MAAS; Brown & Ryan, 2003), display adequate psychometric properties in terms of internal consistency, test–retest reliability, factor structure, convergent validity, and responsiveness to training (for a review, see Baer, 2019). That said, self-report measures have inherent disadvantages. Value-laden items and demand characteristics may bias responses (Van Dam et al., 2009), and the utility of such measures is questionable when respondents with poor metacognition are made to judge their own metacognitive ability (Grossman, 2011). It is therefore critical for the field to find alternative means of measuring mindfulness and to establish the construct validity of these measurement techniques.

In addressing these issues, Levinson et al. (2014) proposed the Breath Counting Task (BCT) as a non-biased, behavioural measure of mindfulness and an alternative to self-report approaches. This task resembles a breathing meditation and requires participants to devote a small portion of their attention to maintaining a count of their breaths: on Breaths 1–8 they press one button, and on the ninth breath they press another. The authors argued that BCT performance adequately operationalises mindfulness, since accurate counting requires, first, attending to the breath in the present moment and, second, being aware of the experience of attending to the breath (and thus aware of when attention has drifted and needs to be returned to the task at hand). They supported the suitability of the BCT as a measurement tool with two arguments. First, the BCT has face validity, because breath-focused meditation is a well-established component of mindfulness practice. Second, the requirements for good performance, namely attending to the breath in the present moment and noticing when attention has drifted and ought to be returned to the task, both link back to present-moment awareness, a common component of most modern definitions of mindfulness (e.g., Brown & Ryan, 2003; Levinson et al., 2014). The construct validity of the test was supported by their findings, which showed associations between BCT performance and greater meta-awareness, less mind wandering, better mood, and greater non-attachment (Levinson et al., 2014).

Although definitions of mindfulness are still subject to debate, the field generally agrees that mindfulness has attentional and attitudinal aspects, and that both aspects are key to the therapeutic effects of mindful practice. Given its multi-dimensional nature, it may not be possible to design a single test that fully captures this construct, and there may ultimately be value in measuring mindfulness via several complementary methods. On the face of it, the BCT would appear to reflect mindful awareness and attention more specifically than, say, a non-judgmental or non-reactive attitude: accurate breath counting requires awareness of one’s breath in the present moment, as well as meta-awareness of when one has lost focus and ought to return attention to the breath. This view, while commonsensical, lacks empirical support, as very few studies of the relationship between BCT performance and mindfulness facets exist. This research gap motivates the current study.

If only certain aspects of mindfulness are reflected by BCT measures, a straightforward way to investigate this is to examine associations between various BCT performance indices and the scores on the five FFMQ facets (i.e., its subscales). We are aware of only three studies (Isbel et al., 2020; Levinson et al., 2014; Wong et al., 2018) that have explicitly compared BCT performance with self-reported trait mindfulness. However, their findings are mixed, likely because of differences between samples, suggesting that these relationships deserve further scrutiny.

To that end, we carried out a mega-analysis of the relationship between BCT measures and self-reported mindfulness, pooling raw data from five separate studies. The combined dataset included only participants who had not attended a formal mindfulness course and who endorsed the statement that they did not currently maintain a regular mindfulness practice. Participants were recruited from university student and older community populations. We predicted that BCT accuracy would correlate with the FFMQ facets reflecting attentional attributes, namely Acting with Awareness and Observing, after controlling for Age and Gender.

In addition, we expanded upon previous work in Wong et al. (2018) by examining not just BCT accuracy, but also miscount and reset rates. As described earlier, in the BCT the participant makes button presses according to their breath count: one button for Breaths 1 to 8 and another button for Breath 9. A miscount occurs when the participant makes an incorrect response for their position in the breath-counting sequence and then continues to respond with the two buttons as normal (i.e., an “uncaught error”); a reset occurs when the participant interrupts the two-button response cycle with a third button, requesting to restart the count cycle from 1 (i.e., a “caught error”). We hypothesised that miscounts and resets represent different classes of error, and predicted that each would display a different pattern of association with FFMQ total and subscale scores. In the Wong et al. (2018) study, we observed that resets were positively correlated with self-reported mind-wandering (responses to the Mind-Wandering Questionnaire, MWQ; Mrazek et al., 2013), while miscounts correlated with everyday attentional lapses and errors in perception, memory and motor functioning (responses to the Cognitive Failures Questionnaire, CFQ; Broadbent et al., 1982). Given the greater conceptual overlap between the MWQ and FFMQ-Acting with Awareness, both of which enquire about the respondent’s tendency toward distraction or inattention, we predicted a strong association between BCT resets and Acting with Awareness. Meanwhile, only a portion of CFQ items concern distraction and attention, and the CFQ as a whole does not map straightforwardly onto FFMQ-Acting with Awareness or any other FFMQ facet; hence we predicted that a weaker association, or none, might exist between BCT miscounts and FFMQ-Acting with Awareness.

In sum, we predicted associations between attentional aspects of mindfulness, as reflected by FFMQ-Acting with Awareness, FFMQ-Observing, and the MAAS, and specific BCT measures, mainly accuracy and reset rates. If such associations could be demonstrated, they would lend support to the BCT as an objective measure of mindful attention.

Method

Participants

The present analysis comprises six groups of participants recruited for five different studies between 2017 and 2021 (Groups 5 and 6 were recruited for the same study). Sample characteristics for each group and publication details are presented in Table 1. Group 1 comprised older community members recruited for a randomised controlled trial of Mindfulness-Based Treatment for Insomnia (Perini et al., 2021) through various channels: a local meditation centre, word-of-mouth, or newspaper and online advertising. Participants were pre-screened for the following inclusion criteria: (1) aged between 50 and 80; (2) fluent in English; (3) reported that they had not attended a formal mindfulness course, and endorsed the statement that they did not currently maintain a regular mindfulness practice; (4) no cognitive impairment (Mini-Mental State Examination (Folstein et al., 1975) score ≥ 26 and Montreal Cognitive Assessment (Nasreddine et al., 2005) score ≥ 23); and (5) self-reported sleep problems (Pittsburgh Sleep Quality Index (Buysse et al., 1989) score ≥ 5 AND at least one of the following: (i) average reported sleep latency of > 30 min; (ii) average wakefulness after sleep onset of > 30 min; (iii) average total sleep time of < 6.5 hr). In addition, participants were excluded if they had major neurological disorders or psychiatric conditions, contraindications for fMRI scanning, or concurrent use of sleep medications, or if they could not provide independent consent. This study was approved by the SingHealth Clinical Institution Review Board in November 2017 (identifier 2017–2830) and the National University of Singapore Institutional Review Board.

Table 1 Sample characteristics

Groups 2–6 were students recruited from the National University of Singapore, either through online advertisements posted on the university student portal or through word-of-mouth, who participated in various studies for monetary compensation. The studies were approved by the National University of Singapore Institutional Review Board. Group 2 was recruited for a study (Lin et al., 2020) examining whether trait mindfulness would predict psychological and physiological responses to a stress induction. Participation was restricted to males, and participants were screened for fMRI scanning contraindications. Group 3 participated in an unpublished behavioural study examining the influence of trait and state mindfulness on mind-wandering while reading longer and shorter sections of text. Group 4 was recruited for an unpublished behavioural study to develop an objective test of non-attachment using a behavioural economics paradigm. Groups 5 and 6 were recruited for a study examining relationships between trait mindfulness and a number of questionnaires and cognitive tasks (Wong et al., 2018). Group 5 also provided fMRI scans and were screened for contraindications beforehand. Lastly, all included participants reported that they had not attended a formal mindfulness course and endorsed the statement that they did not currently maintain a regular mindfulness practice.

Measures

All groups participated in a breath counting task (see below), and these data were generally collected following the main experimental paradigm. For trait mindfulness measurements, Groups 1–5 provided FFMQ (Baer et al., 2006) ratings and Group 6 provided only MAAS (Brown & Ryan, 2003) ratings. For Group 1, these ratings were provided before commencing the mindfulness intervention. All stimuli and questionnaires were presented using Psychtoolbox (Brainard, 1997).

Breath Counting Task

During the BCT, participants were seated comfortably in front of an LCD monitor with both hands resting on a standard QWERTY keyboard. Participants were instructed to maintain awareness of their breath while breathing normally, with their eyes open and resting on the screen. They were to count their breaths silently from one to nine, without vocalising or using their fingers or other aids, returning to one at the end of each cycle. One inhalation and one exhalation constituted one breath. For Breaths 1–8, participants were instructed to press <Left Arrow> for each breath; for Breath 9, they were to press <Right Arrow> instead. Participants were told that if they lost track of the count, they could press <Space> and restart the count from one on the next breath. Each participant performed this task for 20 min.

For participants in Groups 5 and 6 (reported in Wong et al., 2018), breathing was recorded at 32 Hz using a portable recording device (SOMNOtouch RESP, SOMNOmedics GmbH, Germany) with an effort band applied around the participant’s abdomen. Synchronisation was done manually: recording was started when the participant triggered a 3-s countdown to begin the task. Once the task started, the screen went blank. Each BCT lasted 20 min. Breathing data were extracted using DOMINOlight software (SOMNOmedics GmbH, Germany) and exported for analysis.

For participants in the remaining groups, breathing was not recorded. This decision was informed by our prior experience of administering the BCT in the Wong et al. (2018) study, in which participants’ total breath counts correlated very highly with their total number of button presses (r = 0.96). Similarly high correlations have been reported by Levinson et al. (2014). Given this high correspondence, we omitted breath recording from later studies; instead of physiological confirmation, we relied on examination of the BCT behavioural data to identify participant non-compliance (see Data Analyses).

Self-report Questionnaires

The FFMQ (Baer et al., 2006) is a 39-item questionnaire designed to assess trait mindfulness as a construct consisting of five facets: Observing, Describing, Acting with Awareness, Nonjudging of Experience, and Nonreactivity to Inner Experience. Each item is rated on a 5-point Likert-type scale from 1 (never or very rarely true) to 5 (very often or always true). Each item relates to one of the five facets, e.g., “When I'm walking, I deliberately notice the sensations of my body moving.” (Observing); “I’m good at finding words to describe my feelings.” (Describing); “When I do things, my mind wanders off and I’m easily distracted.” (Acting with Awareness); “I tell myself I shouldn’t be feeling the way I’m feeling.” (Nonjudging); “In difficult situations, I can pause without immediately reacting.” (Nonreactivity). Subscale scores for each facet were computed by summing the ratings of that facet’s items (reversed where appropriate), and total FFMQ scores were the sum of the five subscale scores. The original validation study (Baer et al., 2006) reported adequate to good internal consistency for the five facets (Cronbach’s alpha (Cronbach, 1951) values ranging from 0.75 for Nonreactivity to 0.91 for Describing) and expected correlations with conceptually related constructs (for example, positive correlations with emotional intelligence and self-compassion; negative correlations with alexithymia, absent-mindedness, and dissociation). Correlations between the five facet scales were modest; all but one (Observing with Nonjudging, which was non-significant) were significant, ranging from 0.15 to 0.34.
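The subscale and total scoring described above can be sketched as follows. This is a minimal illustration assuming a 1–5 rating scale, on which a reverse-scored rating r becomes 6 − r. The item-to-facet key below is hypothetical and is not the published FFMQ key; the actual facet assignments and reverse-scored items are given in Baer et al. (2006).

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL toy key: facet -> list of (item_index, is_reverse_scored).
# This is NOT the published FFMQ key; it only illustrates the scoring logic.
TOY_KEY: Dict[str, List[Tuple[int, bool]]] = {
    "Observing": [(0, False), (1, False)],
    "Acting with Awareness": [(2, True), (3, True)],
}


def score_questionnaire(
    ratings: Sequence[int],
    key: Dict[str, List[Tuple[int, bool]]],
    scale_min: int = 1,
    scale_max: int = 5,
) -> Dict[str, int]:
    """Sum item ratings (reversed where required) into facet subscale scores,
    then add a 'Total' equal to the sum of the facet subscale scores."""
    scores: Dict[str, int] = {}
    for facet, items in key.items():
        subtotal = 0
        for idx, reverse in items:
            r = ratings[idx]
            if not scale_min <= r <= scale_max:
                raise ValueError(f"item {idx} rating {r} is outside {scale_min}-{scale_max}")
            # On a 1-5 scale, reversing maps 1<->5 and 2<->4 and leaves 3 unchanged.
            subtotal += (scale_min + scale_max - r) if reverse else r
        scores[facet] = subtotal
    # The right-hand side is evaluated before "Total" is inserted.
    scores["Total"] = sum(scores.values())
    return scores
```

For example, `score_questionnaire([5, 4, 1, 2], TOY_KEY)` gives 9 for Observing (5 + 4), 9 for Acting with Awareness (5 + 4 after reversal) and a Total of 18.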

The MAAS (Brown & Ryan, 2003) is a 15-item questionnaire designed to measure trait mindfulness as a unidimensional construct, which the authors describe as awareness of and attention to present events and experiences. Each item is rated on a 6-point Likert-type scale from 1 (almost always) to 6 (almost never). Items in the MAAS assess mindfulness indirectly by querying lapses in mindfulness rather than mindful states (e.g., “I rush through activities without being really attentive to them.”). MAAS scores were computed as the mean of all 15 items, with higher scores indicating greater mindfulness. The authors reported adequate internal consistency (Cronbach’s alpha of 0.82) and a number of indicators of convergent and divergent validity (for example, positive correlations with openness to experience, emotional intelligence, and well-being; negative correlations with rumination and social anxiety). Because the MAAS corresponds closely with Acting with Awareness (5 of the 8 Acting with Awareness items are MAAS-derived; Baer et al., 2006), MAAS scores from Group 6 were z-scored and combined with z-transformed Acting with Awareness subscores from Groups 1–5 for analysis.
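A minimal sketch of the MAAS scoring and of the z-score pooling described above follows. It assumes that the Acting with Awareness subscores from Groups 1–5 were standardised together as one pool and the Group 6 MAAS scores as another, each with its own mean and sample standard deviation. The text does not specify whether standardisation was done per group or per pool, so this choice, like the function names, is illustrative.

```python
import numpy as np


def maas_score(item_ratings):
    """MAAS score as the mean of the 15 item ratings (1 = almost always ... 6 = almost never)."""
    items = np.asarray(item_ratings, dtype=float)
    if items.shape != (15,):
        raise ValueError("expected exactly 15 MAAS item ratings")
    return items.mean()


def zscore(values):
    """Standardise to mean 0 and sample SD 1 (ddof=1) within the given pool."""
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def pooled_attention_scores(awa_groups_1_to_5, maas_group_6):
    """Concatenate z-scored Acting with Awareness and z-scored MAAS scores.

    ASSUMPTION: each instrument is standardised within its own pool before
    concatenation, so both parts end up on a common, unitless scale.
    """
    return np.concatenate([zscore(awa_groups_1_to_5), zscore(maas_group_6)])
```

Because each part is centred and scaled separately, the combined vector carries only within-instrument rank and spread information, not differences in raw score levels between the two questionnaires.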

Control Variables

Age and gender were entered as control variables in all our data analyses. This choice was motivated not by preexisting evidence or theory that either factor might predict mindfulness skills, but by the aim of excluding influences unrelated to mindfulness from the modelling outcomes.

There is some evidence that self-reported dispositional mindfulness increases over the life span (more mindful FFMQ ratings; Mahlo & Windsor, 2021). However, we also strongly expected age to predict BCT performance for reasons unrelated to mindfulness, given that our older participants (who were recruited from the public) were much less familiar with formal research settings and tasks than our younger participants (who were predominantly university students and staff). Hence, older participants might be expected to be more compliant and more willing to adhere to the fairly effortful BCT, resulting in better performance. Additionally, age-related variance in our samples would not reflect greater experience with actual mindfulness practice, since all participants were non-meditators.

Conversely, we did not expect gender to predict BCT performance, and there are no strong theoretical links between gender and trait mindfulness (e.g., Baer et al., 2008). We noted that female participants made up the majority (about 60%) of all our samples (except Group 2, which recruited only males; see Table 1), but whether this reflects an underlying influence that would lead one gender to perform better on the BCT (e.g., greater interest in research participation, hence more experience or compliance) remains speculative.

To determine whether age and gender ought to be retained as controls when modelling the relationship between trait mindfulness and BCT performance, we carried out initial analyses using partial correlations between age or gender and each BCT measure (accuracy, miscount rate and reset rate). These results, and the reasons they support the inclusion of age and gender as controls in later analyses, are described in the Results subsection BCT Performance and Associations with Age, Gender, and Task-Related Factors.
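The partial correlations used here can be computed by regressing both variables on the control variables and correlating the residuals. The NumPy sketch below is an illustration rather than the R code used for the analyses; it assumes numeric coding of gender (e.g., 0/1) and reports degrees of freedom as n − 2 − (number of controls), which matches the pr(427) reported for n = 430 with one control.

```python
import numpy as np


def residualise(v, controls):
    """Residuals of v after OLS regression on an intercept plus the control columns."""
    X = np.column_stack([np.ones(len(v)), controls])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


def partial_corr(x, y, controls):
    """Partial Pearson correlation of x and y controlling for `controls`.

    Returns (r, df) with df = n - 2 - number_of_controls.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.asarray(controls, dtype=float)
    if C.ndim == 1:
        C = C[:, None]  # a single control variable becomes one column
    r = np.corrcoef(residualise(x, C), residualise(y, C))[0, 1]
    df = len(x) - 2 - C.shape[1]
    return r, df
```

For instance, `partial_corr(age, accuracy, gender)` would give the age–accuracy association controlling for gender, and passing `np.column_stack([age, gender])` as `controls` would control for both. With a single control, the result equals the textbook first-order formula (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)).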

Data Analyses

Breath-counting data (see also Wong et al., 2018) were broken down into count cycles, each terminating with either a <Right Arrow> or a <Space> key press. A correct cycle was defined as a sequence of eight <Left Arrow> key presses followed by one <Right Arrow> key press. A miscount cycle was any count cycle that ended with a <Right Arrow> key press but was not preceded by exactly eight <Left Arrow> key presses. A reset cycle was any count cycle that ended with a <Space> key press.

We computed overall accuracy as the number of correct count cycles divided by the total number of count cycles; overall miscount rate and reset rate were similarly computed as the number of miscount cycles and reset cycles, respectively, divided by the total number of count cycles. We also removed instances of what we term singles, defined as two <Right Arrow> key presses in a row, reasoning that such responses are highly likely to represent mechanical errors rather than genuine miscounts.
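The cycle classification and rate computation described above can be sketched as follows. The key labels and function names are illustrative, and two details are assumptions: a cycle ending in <Right Arrow> counts as a miscount whenever it is not preceded by exactly eight <Left Arrow> presses, and singles are dropped before the rates are computed. Key presses after the last <Right Arrow> or <Space> form no complete cycle and are ignored.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Illustrative key labels standing in for <Left Arrow>, <Right Arrow> and <Space>.
LEFT, RIGHT, SPACE = "left", "right", "space"


def classify_cycles(keys: Sequence[str]) -> List[str]:
    """Split a BCT key-press log into count cycles and label each one.

    A cycle ends at a RIGHT or SPACE press:
      'correct'  - exactly eight LEFT presses, then RIGHT
      'miscount' - RIGHT preceded by any other number of LEFT presses
      'reset'    - cycle ended by SPACE
    A RIGHT press immediately after another RIGHT press (a 'single') is dropped.
    """
    labels: List[str] = []
    n_left = 0
    prev = None
    for key in keys:
        if key == LEFT:
            n_left += 1
        elif key == RIGHT:
            if prev == RIGHT:
                pass  # single: treated as a mechanical error, not a cycle
            elif n_left == 8:
                labels.append("correct")
            else:
                labels.append("miscount")
            n_left = 0
        elif key == SPACE:
            labels.append("reset")
            n_left = 0
        prev = key
    return labels


def bct_rates(keys: Sequence[str]) -> Dict[str, float]:
    """Accuracy, miscount rate and reset rate as proportions of all count cycles."""
    labels = classify_cycles(keys)
    total = len(labels)
    if total == 0:
        raise ValueError("no complete count cycles in this log")
    counts = Counter(labels)
    return {
        "n_cycles": total,
        "accuracy": counts["correct"] / total,
        "miscount_rate": counts["miscount"] / total,
        "reset_rate": counts["reset"] / total,
    }
```

For example, a log consisting of a correct cycle, a five-press miscount, a single, a reset after three presses and another correct cycle yields four counted cycles, with an accuracy of 0.5 and miscount and reset rates of 0.25 each.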

For all analyses, we first excluded data from participants with (1) incomplete demographic data (primarily missing age or gender). Next, we excluded data from participants with (2) accuracy at or below 25%, or (3) fewer than 10 or more than 180 BCT count cycles. These participants were deemed non-compliant with the breath-counting instructions, having presumably spent the session pressing keys at random, either excessively or rarely. For groups without objective breath measurements, distributions of “breath durations” (i.e., durations between key presses) were also examined for non-normality; highly skewed or bimodal distributions were taken to indicate a non-compliant participant. Twenty-three participants were excluded based on these criteria. In addition, we excluded data from three further participants who had (4) a reset rate greater than 45%; their data represented outlier performances that exerted a disproportionate influence on correlations with a number of predictors. We note that these exclusion criteria are more detailed and stringent than those suggested previously by our group (Lim & Doshi, 2022; Wong et al., 2018). The final sample used for analyses consisted of 430 individuals; the numbers excluded per criterion and per sample are indicated in Table 1.
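The numeric exclusion criteria (1) to (4) above can be expressed as a single screening function. This sketch assumes per-participant BCT summaries stored in a dictionary with hypothetical keys (`accuracy`, `reset_rate`, `n_cycles`). The visual inspection of between-press duration distributions is a judgement call and is not modelled here.

```python
from typing import Mapping, Optional


def exclusion_reason(age: Optional[float], gender: Optional[str],
                     bct: Mapping[str, float]) -> Optional[str]:
    """Return the first exclusion criterion a participant meets, or None if retained.

    Criteria are checked in the order described in the text.
    """
    if age is None or gender is None:
        return "(1) incomplete demographic data"
    if bct["accuracy"] <= 0.25:
        return "(2) accuracy at or below 25%"
    if bct["n_cycles"] < 10 or bct["n_cycles"] > 180:
        return "(3) fewer than 10 or more than 180 count cycles"
    if bct["reset_rate"] > 0.45:
        return "(4) reset rate above 45%"
    return None
```

Returning a reason string rather than a Boolean makes it straightforward to tally exclusions per criterion and per sample.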

Multiple linear regression models were used to examine how BCT performance was associated with subscales of the FFMQ while controlling for Age and Gender. Because of concerns about multicollinearity arising from high correlations among the six FFMQ measures, we modelled each measure separately rather than entering all six as predictors in a single model. Hence, for each combination of BCT measure (Accuracy, Miscounts, Resets) and FFMQ measure (Total, Observing, Describing, Acting with Awareness, Nonjudging, Nonreacting), a Main Effects Model and an Interaction Model were generated. The Main Effects Model contains Age, Gender and the given FFMQ measure as predictors; the Interaction Model contains the Main Effects terms as well as an interaction term of Age by the given FFMQ measure. These procedures generated 18 Main Effects Models and 18 Interaction Models in total. The analyses were divided into two families, one for the Main Effects Models and another for the Interaction Models, and a Sidak correction for 18 comparisons was applied within each family, adjusting alpha to 0.00285. For simplicity, the same alpha is adopted for subsequent partial correlation calculations between BCT and FFMQ measures, since these replicate the linear model analyses.
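The model pair and the Sidak adjustment can be sketched in NumPy. The Sidak per-test alpha for m comparisons at a family-wise alpha of 0.05 is 1 − (1 − 0.05)^(1/m); for m = 18 this is about 0.002846, which rounds to the 0.00285 used here. The OLS helper, its argument names, and 0/1 gender coding are illustrative assumptions. The helper returns standard OLS coefficients, classical standard errors and t-values, but it is not the R code used for the reported analyses.

```python
import numpy as np


def sidak_alpha(family_alpha: float, m: int) -> float:
    """Per-comparison alpha keeping the family-wise error rate at family_alpha over m tests."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / m)


def fit_ols(y, predictors):
    """OLS with an intercept; returns (coefficients, standard errors, t-values).

    Coefficient order: intercept, then the predictors in the order given.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df_resid = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / df_resid        # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)    # classical (not heteroskedasticity-robust) covariance
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


def main_and_interaction_models(bct, age, gender, ffmq):
    """Fit the Main Effects Model (Age, Gender, FFMQ) and the Interaction Model
    (the same terms plus Age x FFMQ) for one BCT-FFMQ combination."""
    age = np.asarray(age, dtype=float)
    ffmq = np.asarray(ffmq, dtype=float)
    main = fit_ols(bct, [age, gender, ffmq])
    interaction = fit_ols(bct, [age, gender, ffmq, age * ffmq])
    return main, interaction
```

In both models the FFMQ coefficient sits at index 3, and the Age × FFMQ term is index 4 of the Interaction Model; its p-value would then be compared against `sidak_alpha(0.05, 18)`.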

With regard to interpreting these linear models, a caveat is that the present BCT scores contain a proportion of zeros and ones and exhibit heteroskedasticity (Table S2 in Supplementary Information). For reset rates, zero-inflation is very substantial (35.6% of participants have a 0% reset rate); for accuracy and miscount rates, one- and zero-inflation respectively are less pronounced and suggest a small ceiling and floor effect in task performance (2.6% of participants have 100% accuracy; 8.1% of participants have 0% miscounts). Techniques to manage skewed or zero-inflated data were considered but not used. We did not use log-transformed data because doing so did not change the significance of any of the linear models. We also considered zero-inflated models, but rejected this approach because their assumptions were too demanding for the present dataset (see footnote to Table S2 in Supplementary Information). Ultimately, we felt it more informative to present linear models fitted to untransformed data, since these are widely understood and easier to interpret. All statistical analyses were conducted in R (R Core Team, 2015).

Results

Reliability of Self-Report Measures and Breath-Counting Test

Cronbach’s alphas (α; Cronbach, 1951) and McDonald’s omegas (ω; McDonald, 2013) were used to assess the internal consistency of all self-report scales in each sample; these are presented in Table 2. All FFMQ facets and the MAAS had at least adequate consistency (α, ω > 0.70) in all samples, except for Nonreacting in Group 5 (7 items; α = 0.57; ω = 0.59). For this subscale, the other samples displayed good consistency (α = 0.82–0.89; ω = 0.80–0.82).
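For reference, Cronbach's alpha for a scale with k items is α = k/(k − 1) · (1 − Σ s²_item / s²_total), where s²_total is the variance of respondents' summed scores. A minimal NumPy sketch follows; the use of sample variances (ddof = 1) is an assumption. McDonald's omega additionally requires fitting a factor model and is not shown.

```python
import numpy as np


def cronbach_alpha(item_matrix):
    """Cronbach's alpha for an (n_respondents x k_items) matrix of item scores.

    Reverse-scored items should already be reversed before calling this.
    """
    X = np.asarray(item_matrix, dtype=float)
    n, k = X.shape
    if k < 2:
        raise ValueError("alpha needs at least two items")
    sum_item_var = X.var(axis=0, ddof=1).sum()   # sum of individual item variances
    total_var = X.sum(axis=1).var(ddof=1)        # variance of the summed scale scores
    return k / (k - 1) * (1.0 - sum_item_var / total_var)
```

As a check, identical items give α = 1, while two items that agree only partly, such as the columns [1, 2, 3, 4] and [2, 1, 4, 3], give α = 0.75.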

Table 2 Cronbach’s alpha and McDonald’s omega coefficients for FFMQ and MAAS scales

To assess the reliability of the BCT, split-half correlations (even vs. odd count cycles; two-way average-measures Intraclass Correlation Coefficients with the consistency definition) were computed for each sample; these are presented in Table 3. All three BCT measures (accuracy, miscount rate and reset rate) displayed at least moderate reliability in all groups (ICC > 0.50, p < 0.001), with one exception: miscount rate in Group 2 had poor reliability (ICC = 0.40).
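The split-half reliability described above can be sketched as follows. Each participant's cycles are split into odd- and even-numbered halves, a BCT measure is computed on each half, and the two-way average-measures consistency ICC (often written ICC(3,k) or ICC(C,k)) is computed from a two-way ANOVA decomposition as (MS_subjects − MS_error) / MS_subjects. With two halves, this value equals Cronbach's alpha computed on the two half-scores. The label strings and function names are illustrative assumptions, and no p-value is computed.

```python
import numpy as np


def odd_even_rates(cycle_labels_per_participant, outcome="correct"):
    """Rows of [rate on odd-numbered cycles, rate on even-numbered cycles].

    `cycle_labels_per_participant` is a list of per-participant label lists
    (e.g. "correct", "miscount", "reset"); each participant needs >= 2 cycles.
    """
    rows = []
    for labels in cycle_labels_per_participant:
        odd, even = labels[0::2], labels[1::2]  # cycles 1, 3, 5, ... and 2, 4, 6, ...
        rows.append([odd.count(outcome) / len(odd), even.count(outcome) / len(even)])
    return np.array(rows)


def icc_consistency_average(Y):
    """Two-way, average-measures, consistency ICC for an (n_subjects x k_measures) matrix."""
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()   # between-subject variation
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()   # systematic shift between halves
    ss_total = ((Y - grand) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows
```

Because the column (half) effect is removed before estimating error, a constant offset between odd and even halves does not lower this consistency ICC; only disagreement in how participants are ordered does.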

Table 3 Split half correlations (Even vs. Odd Count Cycles; Two-way average measure Intraclass Correlation Coefficients with Consistency definition) for BCT measures

BCT Performance and Associations with Age, Gender, and Task-Related Factors

Table 4 presents descriptive statistics for the three BCT measures – accuracy, miscount rate, and reset rate – for each Group and the full dataset. We first examined associations between BCT measurements and Age and Gender using partial correlations.

Table 4 Group means and standard deviations for breath counting accuracy, miscount rate and reset rate

Age was a significant predictor of accuracy and reset rates after controlling for gender: older participants had higher accuracies and lower reset rates than younger participants (Accuracy: pr(427) = 0.17, p < 0.001; Miscounts: pr(427) = -0.04, p = 0.46; Resets: pr(427) = -0.29, p < 0.001). This outcome is consistent with our reasoning that age would be associated with better BCT performance because of differences in participant compliance and/or experience, and hence that it ought to be retained as a control. Conversely, gender did not significantly predict BCT performance after controlling for age (Accuracy: pr(427) = 0.06, p = 0.17; Miscounts: pr(427) = -0.09, p = 0.056; Resets: pr(427) = -0.03, p = 0.50). However, since the correlation between gender and miscount rate approached significance, we retained gender as a control to remove this non-negligible source of variance from our models. Our reasoning was that any association between gender and BCT performance, if present, ought to be excluded as an irrelevant influence, given the absence of a strong theoretical link between gender and mindfulness.

To check that accuracy was not influenced by respiration rate, we correlated accuracy with the total number of breath cycles and found no association after controlling for age and gender (pr(426) = 0.06, p = 0.18). There was also no significant correlation between miscount rates and reset rates after controlling for age and gender (pr(426) = 0.02, p = 0.66), suggesting that the two kinds of error arise for different reasons and can be used independently in subsequent analyses.

Relationship Between Breath Counting and Self-Reported Trait Mindfulness

Multiple linear regression models (18 Main Effects Models and 18 Interaction Models) were used to examine how BCT performance was associated with subscales of the FFMQ while controlling for Age and Gender (data from Groups 1 to 5, N = 350). Table S1 (in Supplementary Information) presents descriptive statistics for FFMQ total and subscale scores; Tables 5, 6, and 7 present the regression results as well as change statistics (F-change and R-squared) for selected terms in their respective models. Note that for the Interaction Models, we report statistics only for the by-Age interaction terms and omit the remaining predictors, since these are replicated in and reported for the Main Effects Models.

Table 5 Change statistics and regression outcomes for FFMQ total and subscale scores as predictors of BCT accuracy
Table 6 Change statistics and regression outcomes for FFMQ total and subscale scores as predictors of BCT miscount rate
Table 7 Change statistics and regression outcomes for FFMQ total and subscale scores as predictors of BCT reset rate

The by-Age interaction terms did not reach significance in any of the Interaction Models (all p > 0.1). This outcome is important because the current sample combines older and younger participants recruited through different channels, and was therefore likely to be heterogeneous in socio-demographic factors such as socio-economic status and education level. The absence of significant by-Age interactions suggests that the relationship between trait mindfulness, as indexed by the FFMQ, and BCT performance does not differ substantially between older and younger participants. In turn, this suggests that such socio-demographic factors are unlikely to confound the interpretation of the FFMQ-derived predictors in the Main Effects Models.

Results of the Main Effects Models showed that only Acting with Awareness was a significant predictor of BCT reset rate (estimated β = -0.26, SE = 0.072, t = -3.6, p = 0.00043), over and above the influence of Age and Gender. The equivalent partial correlations were also computed and are presented in Table 8; we adopted the same Sidak-corrected alpha of 0.00285 used for the linear models, since these are equivalent analyses. All other predictors failed to reach significance at this corrected alpha in their respective models. For visualisation purposes, we also present selected scatterplots and zero-order Pearson correlations in Fig. S1 (see Supplementary Information).

Table 8 Partial correlations between BCT measures and FFMQ scores after controlling for age and gender

We incorporated Group 6 into the present analysis by z-score transforming the Acting with Awareness subscores from Groups 1 to 5 and the MAAS scores from Group 6 and combining the datasets. Associations between BCT performance and the z-transformed scores were examined by computing partial correlations while controlling for Age and Gender.

Descriptive statistics for the z-transformed and original scores of the combined Group 1–6 dataset are shown in Table S3 (see Supplementary Information). Within the Group 6 dataset alone, the untransformed MAAS scores (n = 80; M = 55.9; SD = 10.8) were already significantly predictive of accuracy and miscounts after controlling for age and gender (Accuracy: pr(77) = 0.34, p < 0.01; Miscounts: pr(77) = -0.39, p < 0.001; Resets: pr(77) = 0.01, p = 0.88). Z-transformed scores of the full dataset (n = 430) were significantly predictive of accuracy and reset rates at the Sidak corrected alpha (Accuracy: pr(427) = 0.14, p = 0.0015; Miscounts: pr(427) = -0.08, p = 0.090; Resets: pr(427) = -0.15, p = 0.00099). For visualisation purposes, we also present scatterplots and zero-order Pearson correlations in Fig. S2 (see Supplementary Information).

Discussion

In this mega-analysis, we examined combined data from five previous studies for associations between BCT performance and facets of self-reported trait mindfulness, in order to assess their convergent validity while controlling for age and gender. After Sidak corrections, only BCT reset rates correlated with FFMQ Acting with Awareness subscores; BCT accuracy and miscount rates did not correlate significantly with any FFMQ-derived measure. We also carried out an alternative analysis in which z-transformed Acting with Awareness subscores and z-transformed MAAS scores were combined into a single dataset and then tested for associations with BCT measures. These z-scores correlated significantly with both BCT accuracy and reset rates.

Three prior studies have addressed the relationship between BCT performance and self-reported trait mindfulness. Levinson et al. (2014) found a significant correlation between breath counting accuracy and FFMQ total scores in a combined sample of experienced meditators and meditation-naïve student and community participants. Conversely, Wong et al. (2018) did not find a correlation between BCT accuracy and MAAS scores in non-meditators; Isbel et al. (2020) similarly did not observe significant correlations between BCT accuracy and MAAS or FFMQ total scores in students prior to mindfulness training, but did find a correlation with FFMQ total score in the same students post-training. The findings of the three studies largely disagree, possibly because of differences in meditation experience across samples. The Levinson et al. (2014) results themselves displayed some heterogeneity: in the student subset (n = 93), BCT accuracy correlated significantly only with Acting with Awareness scores, while in the meditator and older community subset (n = 39), BCT accuracy correlated with Describing, Nonjudging and Nonreactivity scores (Levinson et al., 2014, supplementary Table S3). This heterogeneity is consistent with evidence that meditators and non-meditators exhibit different factor structures in their responses to the FFMQ (Christopher et al., 2012; Williams et al., 2014).

Our current results lend partial support to our hypothesis that BCT performance would be selectively predicted by the attention-related facets of mindfulness and not by the affect-related facets. Of the five FFMQ facets, only Acting with Awareness—the tendency to attend to one's present activities—was associated with lower reset rate in the BCT. While we hypothesized that the Observe facet might also be correlated with BCT performance, this was not the case. However, it has been posited that scores on this subscale are moderated by meditation experience, and may lack validity in non-meditators such as those in our current sample (Baer et al., 2006; Christopher et al., 2012; Lilja et al., 2012).

More importantly, however, our findings offer only weak support for the BCT as a measure of attentional mindfulness in non-meditators, which is broadly consistent with the three studies mentioned earlier. Taken together, the regression modelling and partial correlation results indicate significant but small correlations (0.12 to 0.18) between BCT accuracy and reset rate and mindful attention, as indexed by FFMQ-Acting with Awareness and MAAS scores. The poor to moderate internal reliabilities, especially for reset rates, suggest that the BCT measures are still subject to considerable noise. This measurement error might explain the small correlations observed here. It might be linked to (1) the limitations of self-report measures with meditation-naïve participants, (2) participants' unfamiliarity with meditation-related practices, which they must learn as new tasks, or (3) floor effects, especially for reset rates, which produce a very high proportion of zero scores (Table S2; further discussion below). Hence, only a small fraction of the variance in the BCT measures can be attributed to mindful attention: correlations of 0.12 to 0.18 correspond to roughly 1.4% to 3.2% shared variance. The utility of the BCT as a mindfulness measure in non-meditators specifically is therefore limited, and it cannot replace more established self-report techniques.
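The following Python sketch (assuming NumPy) makes the effect sizes and the role of measurement error more concrete. It computes three quantities: shared variance as r², an odd-even split-half reliability stepped up with the Spearman–Brown formula, and Spearman's correction for attenuation. Split-half estimation is one common way of assessing internal reliability, but it is not necessarily the method used in this analysis. The participants-by-trials layout and the reliability values in the usage lines are hypothetical.

```python
import numpy as np


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full test length."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(trial_scores) -> float:
    """Odd-even split-half reliability.

    Assumes a participants x trials matrix (e.g., hypothetical per-cycle
    reset indicators) with the same number of trials per participant.
    """
    scores = np.asarray(trial_scores, dtype=float)
    odd_half = scores[:, 0::2].mean(axis=1)
    even_half = scores[:, 1::2].mean(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return float(spearman_brown(r_half))


def disattenuate(r_obs: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimated correlation if both measures were perfectly reliable."""
    return float(r_obs / np.sqrt(rel_x * rel_y))


if __name__ == "__main__":
    for r in (0.12, 0.18):
        print(f"r = {r:.2f}: shared variance = {shared_variance(r):.1%}")
    # Hypothetical reliabilities: 0.5 for a BCT index, 0.8 for a questionnaire.
    print(round(disattenuate(0.15, 0.5, 0.8), 3))
```

Suppose the hypothetical reliabilities were 0.5 and 0.8. An observed correlation of 0.15 would then correspond to roughly 0.24 after correction. This illustrates how noise in the BCT indices could depress observed associations, but it says nothing about the true correlation in these data.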

In light of previous studies, our findings also suggest that the BCT is probably more useful with meditator samples, or in studies that test participants before and after a mindfulness intervention. For example, Isbel et al. (2020) found that the BCT had greater discriminant ability than self-report measures. BCT performance improved only in a mindfulness training condition and not in a computerised attention training control condition, whereas FFMQ and MAAS scores increased similarly in both conditions. Stieger et al. (2021) obtained a similar result: BCT accuracy increased following a Mindfulness-Based Stress Reduction course but did not change significantly in a waitlist control group. These outcomes suggest that BCT scores may be sensitive to specific aspects of mindfulness training, whereas self-report measures more broadly capture both specific and nonspecific training factors.

The results also indicate a dissociation between miscount and reset rates in the BCT. Reset rates were inversely correlated with Acting with Awareness, whereas miscount rates were not associated with any FFMQ-derived measure. This suggests that BCT resets more specifically reflect mindful traits, while BCT miscounts arise from broader, nonspecific causes. This observation partially agrees with our conclusions in Wong et al. (2018), which reported a functional dissociation whereby resets were associated with mind-wandering and miscounts with attentional lapses. However, in contrast with the present analysis, that earlier paper found that neither resets nor miscounts were associated with mindful awareness as indexed by the MAAS. The current analysis thus refines and updates our earlier report. It also underscores the need for further work, using large and diverse samples, to validate the BCT and other behavioural measures of mindfulness.

Are BCT reset rates worth pursuing as an indicator of mindful awareness? The test–retest reliability of reset rates (shown in previous work; Wong et al., 2018) is substantially higher than that of overall accuracy. However, reset-rate data have several characteristics that limit their usefulness as a scale. Reset rates will always be highly skewed and zero-inflated, and are therefore truncated as a measurement scale: high reset rates may reliably indicate low mindfulness, but low reset rates are less informative. Highly skewed data are also less amenable to statistical tests or models that assume normality. That said, the present observations still have implications for how mindful attention might best be measured. Weak associations with trait mindfulness were observed for reset rates but not for accuracy or miscount rates. This suggests that a task based on detecting lapses of attention may measure mindful attention better than performance on a task requiring sustained focus. Further support for this idea comes from the deterministic relationship between accuracy and reset rates. Because every inaccurate count cycle must be classified as either a miscount or a reset, accuracy equals one minus the sum of the miscount and reset rates. This suggests that the variance in FFMQ-Acting with Awareness explained by accuracy is merely a smaller subset of that explained by reset rates. In other words, accuracy might capture only a subset of the aspects of mindful attention that reset rates capture.
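The deterministic relationship and the distributional problems described above can be sketched in Python, assuming NumPy. The sketch assumes that every count cycle is classified as exactly one of correct, miscount or reset, so that the three rates sum to one. The function names are hypothetical. The per-participant counts are invented for illustration and are not data from the present studies.

```python
import numpy as np


def bct_rates(correct, miscount, reset):
    """Per-participant accuracy, miscount and reset rates.

    Assumes each count cycle falls into exactly one category, so the
    three rates sum to 1 and accuracy = 1 - miscount rate - reset rate.
    """
    correct, miscount, reset = (np.asarray(a, dtype=float) for a in (correct, miscount, reset))
    total = correct + miscount + reset
    return correct / total, miscount / total, reset / total


def zero_fraction(x) -> float:
    """Share of participants with a score of exactly zero (zero-inflation)."""
    return float(np.mean(np.asarray(x) == 0))


def sample_skewness(x) -> float:
    """Moment-based (biased) skewness; positive values indicate a long right tail."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


if __name__ == "__main__":
    # Invented counts for six hypothetical participants (20 cycles each).
    correct = [18, 20, 15, 19, 20, 12]
    miscount = [2, 0, 3, 1, 0, 4]
    reset = [0, 0, 2, 0, 0, 4]
    acc, mis, res = bct_rates(correct, miscount, reset)
    print(np.allclose(acc, 1 - mis - res))  # True by construction
    print(zero_fraction(res))  # 4 of 6 participants never reset
    print(round(sample_skewness(res), 2))  # positive: long right tail
```

Even in this toy example, most reset rates sit at zero and a few participants form a long right tail. This is the pattern that makes low reset rates uninformative and complicates models that assume normality.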

Hence, an alternative approach focused on catching attention lapses might hold more potential as a behavioural index of mindful attention. One existing example is the Meditation Breath Attention Score (MBAS; Frewen et al., 2016). In this task, participants performing a breath meditation respond to tones presented at random intervals by indicating whether they are still on task (i.e., attending to the breath) or distracted. The MBAS reflects the number of probe-caught lapses of attention during meditation. In most cases, it has shown a significant but small correlation with FFMQ-Acting with Awareness (e.g., Frewen et al., 2016; but see Frewen et al., 2011) and with MAAS scores (Frewen et al., 2008), in agreement with our findings for BCT reset rates. That said, there is an important qualitative difference between probe-caught lapses in the MBAS and self-caught lapses in the BCT. Further study will be required to determine which is more appropriate for measuring mindful attention.

Limitations and Future Research

Considerable methodological improvements are needed before any of the BCT indices can become a viable measurement tool. Reset rates are highly zero-inflated and also show poor internal reliability. One possible cause is that some participants neglect to reset the count, viewing resetting as auxiliary to the main task of breath counting. That is, they do not reset even when they realise they have lost count, so their errors add to the miscount rate instead. To alleviate this issue, the pre-experiment briefing should highlight the study's focus on uncaught and self-caught attentional lapses, explain that resets are the sole observable indicator of self-caught lapses, and emphasise the importance of resetting appropriately. Another approach would be to make the BCT harder, for example by imposing a dual-task setting (e.g., responding to an infrequent sound on top of breath counting). This should reduce floor effects in miscounts and ceiling effects in accuracy, and encourage reporting of resets, which would hopefully also improve internal reliabilities.

The present study continues the search for an objective and standardised assay of mindful awareness. There has been a long-standing call to develop objective mindfulness measures with strong psychometric properties (Grossman, 2011; Hadash & Bernstein, 2019), in order to reduce over-reliance on self-report questionnaires and mitigate their methodological shortcomings. For example, self-reported mindfulness is subject to demand characteristics: individuals who have undergone mindfulness practice are more likely than those unfamiliar with these concepts to endorse statements describing ideal mindful experience. Another key issue is whether respondents are able to accurately introspect on and rate their own internal states. Such disadvantages would not apply to a behavioural assay based on objective measurements. A behavioural assay would also be useful in contexts where questionnaires are less suitable, such as cross-cultural or cross-linguistic comparisons, or studies in which cognitive or linguistic impairment might be a factor.

More importantly, objective data can address a pertinent issue in the field: how mindfulness ought to be defined and operationalised. We argue that a key step towards understanding mindfulness can be taken by comparing first-person and third-person (i.e., behavioural or neurological) measures. Such comparisons allow each type of measure to inform the other and thereby specify their underlying theoretical constructs (see also Van Dam et al., 2018). One approach to narrowing down the definition of mindful awareness would be to test for correlations between performance on behavioural mindfulness measures (e.g., the BCT and the MBAS) and specific psychological attitudes or behavioural tendencies that are often implicated in theories of mindfulness. Examples include non-judging acceptance of experiences (an attitude) and sustained attention performance or meta-awareness (behavioural tendencies). In this manner, we might identify the specific psychological characteristics that are related to actual mindful behaviour.

The growing popularity of mindfulness interventions, coupled with significant gaps in our understanding of the mechanisms behind their effects, points to an urgent need for the field to agree on and consolidate theories of mindfulness and methods of assessment. Our present findings suggest that a paradigm based on probing lapses of attention holds promise as an objective measure of mindful attention that could complement self-report scales. Standardising measures of the attentional aspect of mindfulness through tests such as the BCT, and accumulating a body of such data, could be key to a firmer reification of mindfulness as a psychological construct.