Planfulness: A Process-Focused Construct of Individual Differences in Goal Achievement

Goal pursuit outcomes are partly caused by the way people think about goals. Specific patterns of thought can increase the likelihood of goal achievement, such as generating heuristics to automate goal-related decision making, orienting present-moment attention to the future to increase the salience of a distal goal, and contrasting the anticipated enjoyment of an achieved goal with the progress required to complete it. However, it is unknown whether there are stable individual differences in the tendency to deploy particular meta-cognitions during goal pursuit. A tool to assess such differences would help to identify and intervene on personal barriers to goal progress. Here, we define a new construct within the conscientiousness domain, planfulness, that captures a person’s proclivity to adopt efficient goal-related cognition in pursuit of their goals. We hypothesize that planfulness consists of three interrelated facets representing distinct mental processes: temporal orientation (TO), cognitive strategies (CS), and mental flexibility (MF). We further hypothesize that planfulness predicts goal achievement on an individual basis. We developed a 30-item Planfulness Scale with three subscales, tested and refined across 5 studies and 10 samples (total unique N = 4,318) using iterative exploratory and confirmatory factor analysis on data collected from both student and online samples. The Planfulness Scale demonstrated both convergent and discriminant validity when compared to other measurements, and scale scores predicted goal progress in a longitudinal study. We find that planfulness is a useful new construct for self-regulation research and that the 30-item Planfulness Scale is a valid and reliable measure of it that predicts real-world goal achievement.

How do people turn their desires into reality? The process of transforming something wanted into something obtained involves goal setting and pursuit. A goal is a desired end state, but many people struggle to translate their initial goal formations to ultimate goal success. Previous experimental research indicates that cognitively linking a goal to a set of means to achieve it provides a framework for coordinating cognitive resources, decision-making, and behavior. In parallel, there has been some work on the trait construct of "planfulness," but there is not yet a clear link between the experimental and trait-based approaches to success in goal pursuit. The aim of the present work is to develop a measure of individual differences in the cognitive processes that lead to successful goal pursuit and to gather initial evidence about its validity. Such a measure would help identify specific ways that individuals can improve their likelihood of achievement success.

Background: Two Streams of Goal Pursuit Research
Our approach brings together insights from two parallel research programs. First, working primarily from an experimental approach, psychology research conducted within laboratory contexts has uncovered a number of specific patterns of thought that are likely to lead to effective goal pursuit. When individuals are experimentally induced to use strategies such as linking specific goals to particular patterns of behavior (Gollwitzer, 1993) and envisioning the distance between future goal achievement and the present reality of one's progress (Oettingen, 2000), these strategies reliably increase rates of goal attainment in controlled lab experiments. This evidence suggests that the way one thinks about a desired goal contributes to its eventual achievement.
Second, researchers from a complementary approach focus on the influence of enduring traits or dispositions associated with successful goal pursuit. Work in this field often starts with phenotypic description, such as personality attributes represented in natural language or questionnaire items based on the descriptive language of applied practitioners (John, Naumann, & Soto, 2008; Saucier & Srivastava, 2015). Such work has established links between certain stable individual differences and achievement of 'real-world' outcomes. Among the Big Five personality traits, conscientiousness is predictive of successful outcomes in a variety of domains, including academics (Noftle & Robbins, 2007), health (Roberts, Walton, & Bogg, 2005), and work performance (Dudley, Orvis, Lebiecki, & Cortina, 2006). Grit, a trait describing one's degree of sustained effort toward long-term pursuits, is also associated with measures of academic success, including undergraduate GPA and National Spelling Bee ranking (Duckworth, Peterson, Matthews, & Kelly, 2007); grit is strongly associated with conscientiousness (Credé, Tynan, & Harms, 2016). These findings, amongst others (Roberts, Kuncel, Shiner, Caspi, & Goldberg, 2007; Shiner, Masten, & Roberts, 2004), suggest that it is critical not to overlook the role of individual differences in considering the likelihood of successfully reaching a goal.
In the present work, we draw on insights from both experimental and individual-differences research. Our overarching goal is to weave these two threads together to account for naturally occurring variation in goal achievement across individuals as a function of metacognitive tendencies. To do this, we conducted a series of studies to build and validate a comprehensive measure of a construct within the conscientiousness domain, planfulness, which refers to individual differences in the tendency to use specific cognitive processes that experimental research has shown to promote goal achievement. As such, planfulness is hypothesized to be useful beyond broader measures for capturing individual differences in goal progress.

Planfulness
We use the term "planfulness" to describe an individual's tendency to engage in three particular patterns of thought with respect to goals. Previous work has described planfulness in terms of a general tendency to plan before acting (Frese, Stewart, & Hannover, 1987) but did not attempt to systematically link the construct to the cognitive habits that relate to successful self-regulation in the laboratory. We conceptualized planfulness as relatively stable in individuals across time, circumstances, and goals. To the extent that individuals vary on a continuum of planfulness, those who are high in this trait are expected to deploy more effective metacognition when they pursue goals and therefore to be more likely to achieve them.
We propose that planfulness consists of three interrelated facets that map on to the broad cognitive strategies that self-regulation research shows to be reliably associated with improved goal outcomes in experimental settings: temporal orientation (TO) to the future implications of present behavior, mental flexibility (MF) in contextualizing one's actions in terms of one's goals, and cognitive strategies (CS) to anticipate and deal with potential obstacles. Importantly, this is not to suggest that these facets cover the entire scope of possible factors that contribute to goal achievement, but rather that the facets represent initial categories of practices that are currently known to reliably improve goal progress. Additional facets would be hypothesized to the extent that additional cognitive strategies are discovered that also reliably improve goal progress and explain additional variance beyond these three.

Relation to Conscientiousness
The trait of planfulness is conceptually situated in the broader personality domain of conscientiousness. Conscientiousness, like the other Big Five personality characteristics, was initially identified via the lexical approach; that is, by using factor analysis on person-descriptive adjectives sampled from natural language to isolate distinguishable patterns of how individuals describe one another. This approach assumes that the coalescence of many words around a single concept indicates that the concept is socially important (Saucier & Srivastava, 2015). Although the exact number and type of conscientiousness facets is contested (Costa, McCrae, & Dye, 1991; Hill & Roberts, 2011; MacCann, Duckworth, & Roberts, 2009), the resulting broad factor assesses individuals on continua of orderliness, self-control, punctuality, and dutifulness. High conscientiousness has been associated with a number of positive life outcomes, including academic success (Chamorro-Premuzic & Furnham, 2003) and longevity (Bogg & Roberts, 2004). There are also many other measures related to self-control and achievement that are not explicitly labeled as facets of conscientiousness (see Duckworth & Kern, 2011, for a meta-analysis) that are expected to overlap with planfulness.
Despite all that is known about conscientiousness, questions remain about the psychological mechanisms and processes that contribute to individual differences in conscientiousness and influence behavior and long-term outcomes (Roberts, Lejuez, Krueger, Richards, & Hill, 2012). A personality trait can be described as the central tendency of a distribution of states, each of which reflects momentary outputs of processing (Fleeson, 2001). In the case of conscientiousness, the constituent processes are only partially understood, in part because models of the lower-order structure of conscientiousness have largely been derived from factor-analytic studies of natural language or descriptive item sets (Roberts et al., 2012). Because of our focus on the specific cognitive processes identified as effective in laboratory experiments, the present research may help clarify some of the specific mechanisms by which highly conscientious individuals achieve goal successes, and in turn may reveal tactics that individuals low in conscientiousness could be taught in order to encourage goal achievement success.

Distinctiveness from Motivation and Impulsivity
Items that specifically tap individuals' level of motivation to pursue a goal or their level of impulse control were deliberately not included in the Planfulness Scale. Planfulness is distinct from motivation and impulsivity for three reasons. First, experimental research has found that effective goal-related cognition increases rates of achievement regardless of an individual's level of motivation (Gollwitzer, 1999; Orbell & Sheeran, 2000) and capability for self-control of emotional impulses (Hofmann, Deutsch, Lancaster, & Banaji, 2010; Webb & Sheeran, 2003). Second, from a trait perspective, the relation between motivation and conscientiousness has long been unclear (Roberts et al., 2012). By avoiding items that refer to motivation, we were able to avoid this conceptual confusion and focus more precisely on cognitive processes that are conceptually distinct from motivation and impulse control. Third, although some types of impulsivity are conceptually highly similar to planfulness (e.g., lack of planning; Whiteside & Lynam, 2001) and are expected to relate negatively to it, impulsivity as it relates to in-the-moment behavior is conceptually orthogonal to planfulness (Frese et al., 1987), a divergence from constructs strictly related to self-control (Friese & Hofmann, 2009). A person who is low in motivation and/or highly impulsive is not definitionally low in planfulness; one can be unmotivated but still plan, or plan carefully then act on impulse in the spur of the moment. Also, planfulness could compensate for low motivation or high impulsivity. For example, the effects of impulsive behavior might be mitigated by adhering to specific cognitive strategies, and low levels of motivation might be overcome by reflecting on how a current action brings goal achievement closer. These relationships can be borne out empirically but are not part of the construct definition of planfulness.

The Present Research
Adopting a future time perspective, developing implementation intentions, and mentally contrasting the feeling of goal actualization with the current reality have been found to be especially beneficial to goal progress. The present research investigates whether there are stable differences across individuals in their tendency to use those processes. To the extent that those differences do exist, people who tend to engage in beneficial patterns of meta-cognition are likely to make progress in their goals. Therefore, the hypothesized three-facet structure of planfulness containing Temporal Orientation (TO), Mental Flexibility (MF), and Cognitive Strategies (CS) elements directed the development of the Planfulness Scale and the selection of items to represent each of these three facets.
We used the model of scale development suggested by Simms (2008) to balance theoretical and psychometric considerations with respect to inclusion of scale items. First, reflecting the substantive validity phase, we used our theory and related literature to define the planfulness construct and develop an initial item pool (Study 1; see Supplemental Materials). Second, in the structural validity phase, we used an iterative process to refine scale items based on psychometric evaluation (Studies 2 through 4). Finally, we present Study 5 as the first step in the external validity phase, in which we test the Planfulness Scale on convergent, discriminant, and criterion validity. The latter is tested through the ability of our finalized scale to longitudinally predict goal achievement.

Studies 2 through 4: Iterative Scale Refinement
For the sake of brevity, we begin by reporting the formal psychometric testing of different versions of the Planfulness Scale. A full accounting of how the hypothesized structure of planfulness drove the initial item selection, and testing of a pilot pool of items in Study 1, can be found in the Supplementary Materials. The results of Study 1 supported the three-component structure of planfulness and provided 42 scale items for refinement in Studies 2 through 4.
The purpose of Studies 2 through 4 was to iteratively test and improve a selection of scale items in a hypothesized model of planfulness until adequate scale performance was reached, as assessed by psychometrics and measures of model fit. For clarity, each study here refers to a unique scale version, and each sample within a study refers to a unique group of people who completed that scale version. Our framework for this process was to begin with item and model testing, then make modifications of the scale based on the previous analyses, and to iterate through the cycle again by testing these modifications with either the same or a different version of the scale in a new, independent sample. For example, if certain items exhibited poor psychometrics in one sample, improvements in model fit with these items removed were confirmed in a new sample. All model tests were conducted using entirely independent samples, often resulting in the collection of multiple samples per study; in this way we ensured that the final scale and model versions would not be unduly influenced by the characteristics of any given sample.
As the methods for data collection and analysis across Studies 2 through 4 were the same, the following sections review these iterative studies collectively.

Participants
A total of five independent samples were collected from Amazon's Mechanical Turk (mTurk) to test the three different scale versions used in Studies 2, 3, and 4. mTurk serves as an online repository for tasks (called "HITs") that participants (called "workers") complete in exchange for market rate-based payment. The workers on mTurk have been shown to be demographically diverse (Paolacci, Chandler, & Ipeirotis, 2010) and as attentive or more attentive during experiments than the typical university student sample (Hauser & Schwartz, 2014). Participants were individuals 18 years or older, native English speakers, and current U.S. residents. We paid for a completed HIT at the rate that we paid for lab-based experiments, between $5-$6 per half hour. Prior to being permitted access to the study, individuals were asked to provide their mTurk worker ID. These IDs were then prescreened to ensure that each sample consisted of unique participants. Workers who had previously participated in a planfulness study were not granted access to the survey, nor was any data recorded from those workers. In total, 813 participants were recruited for Study 2 (407 in A; 406 in B), 814 participants were recruited for Study 3 (416 in A; 398 in B), and 406 participants were recruited for Study 4.

Materials
Each study tested a different version of the Planfulness Scale. Each scale version was similar to the previous, with changes reflecting either the exclusion or slight word alteration of one or more items; these changes were based on the results of previous exploratory and confirmatory analyses as described below. Each version of the scale included items meant to assess the three hypothesized facets of planfulness, and included forward- and reverse-keyed statements. The additional seventeen items generated for Study 3 were primarily developed using the same tactic used to generate the initial pilot pool of items. Several original items were also created for this scale version in an attempt to balance the number of statements per subscale; for example, the item, "When it comes to achieving my goals, I think of any misstep as a failure," was developed for the MF subscale during this process. Participants in each sample responded to scale items using a five-point Likert scale (1 = Strongly disagree; 3 = Neither disagree nor agree; 5 = Strongly agree). Each scale version took between ten and twenty minutes to complete. See Table 1 for a summary of scale version construction and results.

Procedure
All studies were approved by the University of Oregon Research Compliance Services. Each version of the Planfulness Scale was completed online. After electronically consenting to participate in the study, participants in each sample were asked to indicate their agreement with scale items. Data from individuals who failed attention checks during the study were not recorded. The collected data were cleaned of missing observations via listwise deletion prior to analysis. This procedure resulted in final sample sizes of 372 (Study 2A), 377 (Study 2B), 257 (Study 3A), 356 (Study 3B), and 373 (Study 4). The items on each version of the scale were examined for psychometric quality in SPSS 23. All items, including those identified as having poor psychometrics, were tested in at least two independent samples to lessen the influence of any specific sample's characteristics on item inclusion/exclusion decisions. Tests of the planfulness model fit were conducted using confirmatory factor analysis (CFA) with a maximum likelihood estimator in R version 3.1.3 using lavaan, a structural equation modeling package (Rosseel, 2012). These tests used a two-step approach, testing first the measurement model (i.e., item loadings onto theoretical latent constructs) and then the structural model (i.e., including relationships amongst all latent constructs; Anderson & Gerbing, 1988). In the primary model, planfulness was established as a first-order (superordinate) latent variable that explains the three second-order (subordinate) latent variables of Temporal Orientation, Cognitive Strategies, and Mental Flexibility; these latent variables in turn explain each of their indicator scale items. The second-order variables were not allowed to covary directly with each other, so that all of the covariance among them is explained by the first-order variable.
Based on the results from the principal components analysis of the pilot data, which revealed a distinct factor for reverse-keyed items (see Supplemental Materials), an additional measurement factor (ACQ) was included in the model to capture variance potentially generated by participant acquiescence bias. In the primary model, each indicator item loads on this measurement factor, with loadings fixed to either +1 or -1 for forward- and reverse-keyed statements, respectively; additionally, this factor was set to have no correlation with the first-order planfulness factor, as is necessary for specification in models that include measurement factors (Billiet & McClendon, 2000). Figure 1 is a visual depiction of this model. Alternative models, including a model without subscale factors, were also tested to determine the relative fit of the primary model (see Figure 2). Inspected indices of model fit included the comparative fit index (CFI), root mean square error of approximation (RMSEA), the χ² goodness-of-fit statistic, and the Akaike information criterion (AIC); these statistics were compared to determine relative model performance across samples and scale versions.
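To make the fit indices named above concrete, the following is a minimal sketch of how CFI, RMSEA, and AIC are conventionally computed from a model's χ² statistic, degrees of freedom, sample size, and log-likelihood. It is not the authors' analysis code (the reported analyses were run in lavaan), the function names are illustrative, and the input values are hypothetical rather than taken from Table 2. Software packages differ on whether RMSEA uses N or N - 1 in the denominator; N - 1 is assumed here.

```python
import math


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index: improvement over the baseline (independence) model."""
    # Noncentrality estimates, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_baseline


def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation (assumes the N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


def aic(log_likelihood, n_free_parameters):
    """Akaike information criterion; lower values indicate better relative fit."""
    return -2.0 * log_likelihood + 2.0 * n_free_parameters


# Hypothetical values for illustration only (not results from this paper).
example_cfi = cfi(chi2_model=200.0, df_model=100, chi2_baseline=1100.0, df_baseline=120)
example_rmsea = rmsea(chi2_model=200.0, df_model=100, n=401)
print(round(example_cfi, 3), round(example_rmsea, 3))
```

Because AIC is only meaningful relative to another model fit to the same data, it is used for comparing the primary and alternative models rather than as an absolute index.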

Scale Psychometrics
As is shown in Table 1, Cronbach's alpha was high for each scale version, ranging between .92 and .95 across all five independent samples. Inter-item correlations were examined closely to identify statements that poorly (r < .30) or negatively correlated with others either on the same subscale or the whole scale; if statements identified during this process were confirmed to have poor psychometric quality in a subsequent sample, those items were removed in the following scale version. An example item identified during this process is the CS statement, "I do not get caught up in thinking about the barriers to my goals". This statement's corrected item-total correlation, a measure of an individual item's correlation with the whole scale, was low in both Study 3 samples (r = .10 in A, r = -.05 in B), and it negatively correlated with multiple items on the CS subscale in both samples. This process resulted in three items being removed between the Study 2 and Study 3 scale versions, and four items being removed between the Study 3 and Study 4 versions. Finally, scale scores were calculated by averaging all item responses on each version; distributions of these scores were slightly negatively skewed with means close to 4, also shown in Table 1.
Note (Table 1): The number of scale items in this table reflects the total number of items per each study, but not necessarily the total number of items tested in each sample. Often, multiple different items were dropped during exploratory model testing before confirmatory tests were conducted in new, independent samples. Scale scores were calculated by averaging across all items in a given version of the scale, resulting in a possible range of 1-5.
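As an illustration of the two item statistics discussed in this section, the sketch below computes Cronbach's alpha and the corrected item-total correlation (the correlation between one item and the sum of the remaining items). It is a plain-Python sketch under stated assumptions, not the SPSS procedure used in the reported analyses: the function names are illustrative, the responses are hypothetical, and all items are assumed to have been reverse-scored where needed before being passed in.

```python
import math
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(rows, item_index):
    """Correlate one item with the sum of all other items (item itself excluded)."""
    item = [row[item_index] for row in rows]
    rest = [sum(row) - row[item_index] for row in rows]
    return pearson_r(item, rest)


# Hypothetical responses (5 respondents x 4 items); the last item is built to
# run against the others, the pattern that flagged items for removal.
responses = [
    [4, 5, 4, 2],
    [3, 3, 4, 4],
    [5, 5, 5, 1],
    [2, 2, 3, 5],
    [4, 4, 4, 3],
]
print(cronbach_alpha(responses))
print([round(corrected_item_total(responses, j), 2) for j in range(4)])
```

Excluding the item from its own total is what makes the correlation "corrected"; otherwise each item would be partly correlated with itself.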

Confirmatory Factor Analysis
Results of CFAs conducted on each sample's data supported the primary model as comparatively best fitting and revealed incremental improvement in model fit across scale versions. The CFI and RMSEA of the full planfulness model tested in Study 2A were .71 and .081, respectively, and both of these values improved to a CFI of .74 and a RMSEA of .068 for the model tested in Study 4. As expected, in each sample the primary model was determined to be of better fit than the alternative models, as indicated by a reduction in AIC and by a statistically significant chi-square difference test. In all alternative model tests across all samples the reduction in the chi-square statistic in the primary model was significant at p < .001. See Table 2 for a summary of model test results. Standardized factor loadings and variances were inspected for the individual scale item indicators as well as the second-order subscale factors. In each sample, the majority of item indicators loaded on their respective subscale latent factors above .30, indicating unidimensionality of the subscales. For example, in Study 3B all but 5 indicators (out of 60 total) loaded at .30 or greater, with an average loading of .50. Of the five with low loadings, two were previously flagged during psychometric evaluation, including the aforementioned item, "I do not get caught up in thinking about the barriers to my goals," which loaded at .013 on the CS subscale. All item variances were significant at p < .001 in every sample. Review of the subscale standardized factor loadings and variances indicated a high degree of multicollinearity. In every sample, each subscale loaded at .80 or greater onto the first-order planfulness factor in the primary model. Additionally, while the variances of each of the subscales did reach significance at the .05 level in every sample, the reported p values were surprisingly high (i.e., ≥ .001) given the large sample sizes. For example, in Study 4 the reported variance of the TO subscale was 0.14, p = .010, and the reported variance of the MF subscale was 0.22, p = .001. Finally, the variance of the acquiescence measurement factor was significant at p < .001 in every sample.

Discussion
The iterative process of scale development resulted in improved model fit of the Planfulness Scale across five independent samples. Troublesome items were identified during multiple steps of data inspection and were confirmed to be poorly performing in new independent data sets before being removed from the scale. The addition of novel items in Study 3 helped to balance the number of items representing each planfulness facet. Together, these procedures led to improved unidimensionality of each of the subscales and consequently increased the fit of the primary model from Study 2A to Study 4. The mean scale score distribution was slightly negatively skewed. A possible reason for this is that items that were designed to be more socially desirable but indicate low planfulness were disproportionately removed for poor performance across versions. For example, of the four items removed between Studies 3 and 4, three were intended to sound socially desirable, including the aforementioned CS item. We conclude that social desirability bias was present in responses to the tested scale versions, leading us to generate new socially desirable items for Study 5 in an attempt to reduce the observed skewness.
The data produced unclear results as to the unidimensionality of the overall scale. The low (r < .30) mean inter-item correlation across all scale versions suggests that the scale has multidimensional qualities; this is also supported by the previously reviewed PCA conducted on the pilot scale data in Study 1 (see Supplementary Materials). The CFAs conducted on the primary models in each sample reveal the loadings of all of the second-order facet factors on the first-order planfulness factor to be very high (> .80), as were the subscale covariances in the equivalent models. As the observed p-value of the subscale variances in the primary model was also surprisingly high in some samples, these results are possibly indicative of empirical under-identification (Kenny, 1979). Empirical under-identification of the primary model would suggest that the subscales do not contribute enough unique variance to be reliably estimated as distinct parameters, and thus the scale may be unidimensional. In contrast, the results of the series of alternative model tests do not seem to support this conclusion, as the primary model fit the data significantly better than the model that collapsed all subscale factors into one singular planfulness factor. Taking into account the theoretical basis for the hypothesized structure of planfulness, the conclusion to be drawn from these mixed results may be that while there are distinct categories of planfulness behaviors, scores on the Planfulness Scale should be calculated in aggregate rather than by individual facet scores. This conclusion is further tested in Study 5, which includes a new balanced scale version as well as novel model tests.

Study 5: Reliability and Validity of a Balanced Planfulness Scale
The previous studies supported the hypothesized model of planfulness and narrowed down a collection of items that together improved the fit of this model. In Study 5 we sought to generate a balanced scale with equivalent numbers of forward- and reverse-keyed items per subscale. Balancing scales across forward- and reverse-keyed items has been found to be beneficial for reducing acquiescence bias in participant responses, but such scales might also have effects on the structure of the data by causing the reverse-keyed items to load on a different factor from the forward-keyed ones (Kulas, Klahr, & Knights, 2018; Vautier & Pohl, 2009). We found this pattern with earlier versions of the Planfulness Scale, which is why we continue to include a measurement factor accounting for the keying of the items.
The balanced scale version was again inspected for psychometric quality, and tested using CFA against alternative models; additionally, given previous concerns about the significance of second-order subscale factor variances, a parceled model was also included to better examine the fit of the hypothesized structural model. We hypothesized that the primary model would continue to fit the data better than alternative models, as shown in Studies 2-4. We further hypothesized that tests would support the ability of each subscale to capture distinguishable patterns of thought about goals, as indicated by significant variance of the subscale factors. Three independent samples, A, B, and C, were collected for model testing using this balanced Planfulness Scale.
The balanced scale also underwent additional validity tests. In Samples A and B, the Planfulness Scale was compared with other related but theoretically distinct scales to test for convergent and discriminant validity. Planfulness was expected to correlate strongly (r > .50) with measures of conscientiousness or related constructs, such as grit, and to have weaker correlations (r < .30) with unrelated measures, such as extraversion. We additionally wanted to test whether planfulness is separable from the expected related constructs. Prior work has suggested that doing so using multiple regression of observed variables can lead to inflated Type I errors (Westfall & Yarkoni, 2016). Therefore, we designed a structural equation model wherein each construct was included as a latent variable explaining a latent factor with questions asking about specific goal-related outcomes as indicators. As this outcome factor was specific to goal achievement, we expected the Planfulness Scale to explain its latent variance beyond that of the other measures. Six comparison scales were included in Sample 5A, and four of these same scales were tested again in 5B.
Finally, longitudinal data were collected in Sample 5C to test the predictive validity of the balanced scale and estimate its test-retest reliability. In this sample, individuals completed the scale and also reported on three specific goals that they wished to achieve in the coming months; measures of planfulness and goal progress were both collected at two timepoints separated by three months. We hypothesized that the test-retest reliability of planfulness scores would be high, r > .50. We additionally hypothesized a strong positive relationship between planfulness at the first timepoint and goal progress at the second timepoint (i.e., change in goal completion in the interim). If supported, the results would reveal the ability of the Planfulness Scale to predict variance in goal achievement as a function of specific patterns of goal-related thought.

Participants
All participants were native English speakers, and data from individuals who failed survey attention checks were not recorded. Participants for Samples A and B were recruited from mTurk using the same methods as described previously, including the additional measure to ensure that each sample consisted of individuals who had not previously taken any version of the Planfulness Scale, and were again compensated at approximately $5-$6 per half hour. Sample A included 410 individuals, 51% of whom were male; participants primarily identified as non-Hispanic Caucasian (79%; the two next largest groups were 7% Asian or Pacific Islander and 6% non-Hispanic Black) and ranged in age from 18 to 78 (M = 36.41, SD = 11.22). There were 411 participants in Sample B; this sample was primarily male (58%) and non-Hispanic Caucasian (81%; 7% non-Hispanic Black, 6% Asian or Pacific Islander), and included individuals who ranged in age from 19 to 84 (M = 35.46, SD = 11.25). Sample C consisted of 1,192 participants recruited through a nationally representative Qualtrics panel of adults. In this sample, 62% of participants were women, 80% identified as Caucasian non-Hispanic (8% Black non-Hispanic, 6% Hispanic), and participants ranged in age from 19 to 89 years (M = 47.73, SD = 16.64). All 50 U.S. states were represented by participants in this sample; a choropleth map showing response rate by state is available in Appendix B.

Materials
The balanced Planfulness Scale was tested in all three samples. To decrease potential error induced by respondent social desirability bias, we included items that assessed low planfulness while maintaining high face-valid social desirability, such as, "I prefer my days to be spontaneous rather than scheduled." We also structured this version to balance forward- and reverse-coded items across each subscale, and to represent each subscale equivalently across total items. The balanced scale therefore had a total of 30 items; 10 total from each subscale, with 5 of each of these items reverse-coded. As with prior versions of the scale, all subscale items were presented in a fixed, interleaved pattern. This version of the scale took approximately seven minutes to complete. The full text of the scale can be found in Appendix A.
Participants in Samples A and B additionally completed the Big Five Inventory (BFI, 44 items; John & Srivastava, 1999), the Brief Self-Control Scale (BSCS, 13 items; Tangney, Baumeister, & Boone, 2004), the Barratt Impulsiveness Scale (BIS-11, 30 items; Patton, Stanford, & Barratt, 1995), and the 12-item Grit scale (Duckworth, Peterson, Matthews, & Kelly, 2007). Sample A participants also completed the Mindful Attention Awareness Scale (MAAS, 15 items; Brown & Ryan, 2003), and the Need for Cognition scale (18 items; Cacioppo, Petty, & Kao, 1984). Finally, Sample B participants completed an 'Outcomes' questionnaire. Outcome items were designed to be face-valid questions about specific, tangible goal achievement experiences; for example, one item read, "I meet my deadlines on time". Participants responded to questionnaire items using a five-point Likert scale (1 = Strongly disagree; 3 = Neither disagree nor agree; 5 = Strongly agree). There were nine outcome items, three of which were reverse-coded; see Appendix D. Each scale took between ten and twenty minutes to complete.
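As an illustration of how a questionnaire with reverse-coded items can be scored, the following minimal sketch flips the reverse-coded responses on a five-point scale and averages across items. The function name, the example responses, and the positions of the reverse-coded items are hypothetical; the actual reverse-coded items are listed in the appendices and are not reproduced here.

```python
def score_scale(responses, reverse_items, scale_min=1, scale_max=5):
    """Average item responses after flipping reverse-coded items.

    responses: list of integer responses for one participant.
    reverse_items: set of zero-based positions that are reverse-coded
                   (hypothetical positions in the example below).
    """
    scored = []
    for index, value in enumerate(responses):
        if not scale_min <= value <= scale_max:
            raise ValueError(f"response {value} is outside the scale range")
        if index in reverse_items:
            # On a 1-5 scale this maps 1 -> 5, 2 -> 4, 3 -> 3, and so on.
            value = scale_min + scale_max - value
        scored.append(value)
    return sum(scored) / len(scored)


# Nine hypothetical responses with three assumed reverse-coded positions.
example_responses = [5, 4, 1, 4, 5, 2, 4, 5, 1]
example_reverse = {2, 5, 8}
print(score_scale(example_responses, example_reverse))
```

Because the flip is symmetric around the scale midpoint, a participant who answers 3 on every item receives a score of 3 regardless of which items are reverse-coded.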
Finally, participants in Sample C answered questions about goals and goal progress at each of two timepoints (adapted from Palfai & Weafer, 2006, based on the approach described in McGregor & Little, 1998). At the first timepoint, they were prompted to, "[p]lease take a few minutes and write down the top three personal projects that you are currently engaging in or considering for the next three months" (see Appendix E for full instructions). Inspection of the projects that participants listed showed that they varied in size ("plan for retirement," "plant & grow starter plants") and specificity ("saving", "send out 10 job applications"). After recording these projects using three free-response text boxes, they then answered three questions in a fixed order for each goal; these three questions were also asked at timepoint two, after participants viewed and confirmed their verbatim projects, which were ported from timepoint one. The first question, "How committed to this project are you?" and the second question, "How important is this project to you?" were both answered using a five-point Likert scale (1 = Not at all committed/important; 3 = Moderately committed/important; 5 = Extremely committed/important). For the third question, "What percentage of this project is completed right now?" participants responded with percentages ranging from 0-100. The third question allowed us to calculate two dependent measures of goal progress, and the other two questions were included as potential moderators, although we did not include these in the present analyses. If participants did not recognize the three provided projects at the second timepoint as their own, they were asked to re-enter their projects to the best of their ability. They were then asked to rate how confident they felt about their ability to recall each of those projects from memory on a scale of 1 = Not at All Confident to 3 = Very Confident.
Two dependent measures of goal progress were calculated. The first dependent measure represented the raw change in percentage complete of a goal from timepoint one to timepoint two, T2%-T1%. While face valid, this index is an inadequate measurement of progress made towards goal completion; for example, a participant who moved from 10% to 20% complete would obtain the same score as another participant who moved from 90% to 100%. Thus, the a priori decision was made to calculate a second variable to measure the amount of progress made as a function of the amount of progress remaining, given by the formula (T2%-T1%)/(100-T1%). For example, on this measure moving from 10% to 20% yields 11%, whereas moving from 90% to 100% yields 100% of progress towards completion made. Each index was calculated for each goal and then averaged across all of a participant's goals to form a composite score.
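The two indices can be sketched in a few lines of code. This is a minimal illustration of the formulas given above, not the authors' analysis script; the function names and the example goals are hypothetical. Goals already at 100% at timepoint one are rejected because the second index is undefined for them.

```python
def raw_change(t1, t2):
    """Raw change in percentage complete: T2% - T1%."""
    return t2 - t1


def proportional_progress(t1, t2):
    """Progress made relative to progress remaining: (T2% - T1%) / (100 - T1%)."""
    if t1 >= 100:
        # Nothing remains to be completed, so the ratio is undefined.
        raise ValueError("goal was already complete at timepoint one")
    return (t2 - t1) / (100 - t1)


def composite(goals, measure):
    """Average a measure across one participant's goals.

    goals: list of (t1_percent, t2_percent) pairs, one pair per goal.
    """
    values = [measure(t1, t2) for t1, t2 in goals]
    return sum(values) / len(values)


# Hypothetical participant with three goals.
example_goals = [(10, 20), (90, 100), (50, 50)]
print(composite(example_goals, raw_change))
print(composite(example_goals, proportional_progress))
```

The first two example goals receive the same raw change (10 points) but very different proportional scores (about .11 versus 1.0), which is the distinction the second index is meant to capture.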

Procedures
The research conducted in Study 5 was approved by the University of Oregon Research Compliance Services prior to commencement. Participants in all samples completed the study online via a Qualtrics survey and provided electronic consent before proceeding to the survey questions. Participants in Samples A and B were presented with all scale measures in a randomized order, except for the Outcomes Questionnaire, which was presented to Sample B participants immediately following consent in order to avoid any carry-over effects from the other measures. If participants did not provide an answer to a survey item, they were prompted with a request to do so; this process resulted in complete data from every individual in both samples. Participants in Sample C were invited to complete the study at two timepoints separated by a three-month period. At the first timepoint, participants were given a fixed order of the Planfulness Scale followed by input of their three goals, and the three questions about those goals. At the start of the second timepoint participants were shown their goals from the first timepoint verbatim and confirmed that they were accurate. If not, participants entered in their three original goals to the best of their recollection. All participants then completed the Planfulness Scale and goal-related questions in randomized order. Of the 1,192 participants who completed the study at the first timepoint, 501 (42%) participants returned to complete the second session. Lists of goals were matched across timepoints, and cases that did not include matching goals at each timepoint were excluded from regression analysis; individuals who had indicated that their goals were 100% complete at timepoint one were also excluded. Finally, cases with missing values were removed with listwise deletion, resulting in a total N = 457 (91%) of observations included in tests of predictive validity and test-retest reliability. 
Model testing and scale psychometrics were performed on the dataset from the first timepoint, cleaned of missing observations using listwise deletion, for a final N of 1,188.
Scale psychometrics were inspected in the same manner as the previous studies using SPSS 23. Bivariate correlations of scale scores in Samples A and B were also conducted using SPSS. Model testing was again completed in R version 3.1.3 using lavaan, and included the previously described primary and alternate models. A new, parceled model was also included to address concerns about empirical underidentification uncovered during Studies 2-4. Parceled models do not allow inferences about individual items, but the benefits of using parcels include a reduction of sampling error and decreased likelihood of correlated residuals, both of which can clarify the structural relationship among latent factors in the model (Little, Cunningham, & Shahar, 2002). We therefore built three parcels for each subscale using a blend of two techniques discussed in Little, Cunningham, and Shahar (2002) in order to increase parcel unidimensionality and test the hypothesized structure of the scale. The TO parcels were built using a domain-representative approach, by splitting all items into categories of 'past', 'present', and 'future' focus, and then dividing these items evenly across parcels. The CS and MF subscales lacked obvious categories in their respective items, and thus parcels for these subscales were constructed using an item-to-construct balance approach. In this approach, a principal components analysis is run on each subscale and items are equivalently distributed across parcels based on their factor loadings, such that each parcel has approximately the same average loading of items on the constructs; three parcels are recommended as the minimum requirement for model identification. The PCA was first run on the largest dataset, Sample C, to determine parcel components, and then these same components were used in Samples A and B's models (see Table 4 for parcel items).
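One common way to implement an item-to-construct balance is a serpentine assignment: items are ranked by loading, the highest-loading items anchor the parcels, and each subsequent round of items is dealt in the opposite direction. The sketch below illustrates that idea only; it is an assumption that the parcels reported here were assigned in exactly this way, and the item labels and loadings are invented for the example.

```python
def balance_parcels(loadings, n_parcels=3):
    """Distribute items across parcels in serpentine order of loading size.

    loadings: dict mapping an item label to its component loading.
    Returns a list of n_parcels lists of item labels.
    """
    ranked = sorted(loadings, key=lambda item: abs(loadings[item]), reverse=True)
    parcels = [[] for _ in range(n_parcels)]
    for start in range(0, len(ranked), n_parcels):
        block = ranked[start:start + n_parcels]
        round_index = start // n_parcels
        # Even rounds deal first-to-last, odd rounds deal last-to-first.
        if round_index % 2 == 0:
            order = range(n_parcels)
        else:
            order = reversed(range(n_parcels))
        for parcel_index, item in zip(order, block):
            parcels[parcel_index].append(item)
    return parcels


# Hypothetical loadings for a 10-item subscale.
example_loadings = {
    "cs1": 0.80, "cs2": 0.75, "cs3": 0.70, "cs4": 0.65, "cs5": 0.60,
    "cs6": 0.55, "cs7": 0.50, "cs8": 0.45, "cs9": 0.40, "cs10": 0.35,
}
for parcel in balance_parcels(example_loadings):
    average = sum(example_loadings[item] for item in parcel) / len(parcel)
    print(parcel, round(average, 3))
```

With ten items and three parcels, one parcel necessarily holds four items, so the average loadings are only approximately equal across parcels.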
The model to test whether planfulness could be considered separable from related constructs was similarly constructed. Three parcels each were generated using an item-to-construct balance approach for grit, conscientiousness, impulsivity, self-control, and the Outcomes Questionnaire. These parcels were explained by their respective latent construct variables, and the latent variables of planfulness, grit, impulsivity, self-control, and conscientiousness in turn explained the latent outcomes variable (see Figure 3). Finally, tests of Sample C's planfulness test-retest reliability and predictive validity were conducted in R.

Scale Psychometrics
Despite reducing the number of items in the scale, Cronbach's alpha remained high in all samples, α ≥ .89. Compared to previous studies, the average interitem correlation increased while the variance decreased across all samples. Inspection of the corrected item-total correlations in each sample revealed no poorly (r < .30) correlating items in Sample A, but identified the same three items (3, 18, and 28) in Samples B and C, with correlations ranging from .19 to .27. Further investigation of these items' performance on their respective subscales indicated that item 18 correlated well (r ≥ .35) with the rest of the CS items in both samples; however, items 3 and 28 continued to have low correlation (r ≤ .27) with the whole TO subscale in both samples. Finally, planfulness scores were calculated by averaging across all item responses, resulting in a possible score range of 1-5. The mean scale scores did decrease on average as compared to the previous studies (e.g., M = 3.69 in Sample A here vs. M = 3.81 in Study 2A), and mean scores continued to be slightly negatively skewed. A truncated range of scale scores was also observed in each sample, as scores began at approximately 2. See Table 3 for a summary of all scale psychometrics in each sample.
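For readers less familiar with these statistics, the following sketch shows how Cronbach's alpha and corrected item-total correlations are conventionally computed, and how items falling below the r < .30 criterion could be flagged. The analyses reported here were run in SPSS; this code is only an illustration using the standard formulas, and the function names and toy data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (already reverse-coded)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation; assumes both variables have non-zero variance."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cross / (ssx * ssy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the sum of the remaining items."""
    k = len(rows[0])
    correlations = []
    for i in range(k):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]
        correlations.append(pearson(item, rest))
    return correlations


# Toy data: four respondents, three items (invented values).
toy_rows = [[1, 2, 5], [2, 2, 1], [4, 3, 4], [5, 5, 2]]
print(round(cronbach_alpha(toy_rows), 3))
weak_items = [i for i, r in enumerate(corrected_item_total(toy_rows)) if r < 0.30]
print(weak_items)
```

The correction matters because an item correlates trivially with a total that includes itself; removing the item from the total gives a fairer picture of how well it tracks the rest of the scale.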
Results of model tests confirmed two of our hypotheses. First, the balanced Planfulness Scale improved in fit over previous scale versions according to various indices (e.g., Sample C CFI = .82, RMSEA = .066; see Table 4). Parceled models were run to address our concerns about empirical under-identification. The subscale variances reached statistical significance in the parceled models in all three samples, supporting multidimensionality of the subscales; the difference between the parceled and non-parceled models suggests that item variance obscures multidimensionality. In Samples A and B the variance of the CS subscale was significant at p = .011, while all variances were significant in Sample C at p < .001. Correlations amongst the subscales were very high in all samples, ranging from .92 to .96.

Convergent and Discriminant Validity
All scores for the comparison scales were calculated according to their original instructions, resulting in a final range of scores from 1-4 for the BIS-11, 1-6 for the MAAS, and 1-5 for all other included scales. As hypothesized, the correlation of Planfulness scores with other scale scores tracked closely to those of the conscientiousness scores, correlating strongly with related measures and exhibiting weak correlations with unrelated metrics in both Samples A and B (see Table 5). Specifically, scores on the Planfulness Scale correlated the most highly with Grit (r = .76 in both samples) and the BIS-11 (r = -.74 in A, r = -.75 in B), and the lowest with extraversion (r = .22 in A, r = .26 in B) and openness (r = .27 in A, r = .24 in B). Planfulness scores correlated highly with conscientiousness itself, as expected, r = .71 in A, r = .75 in B.

Separability of Constructs
Responses to the Outcomes Questionnaire were gathered in Sample B to test the incremental validity of the Planfulness scale in predicting goal outcomes above and beyond related constructs. Scores on the Outcomes Questionnaire were averaged across all nine items (reverse-coding three), resulting in a range of 1-5, where five indicates a strong self-reported ability to achieve set goals. In Sample B, the mean Outcomes score was 3.76 (SD = 0.62) and the Cronbach's alpha for the nine items was .87. The items were entered into a formative model wherein all items loaded on one general Outcomes construct; this was then tested using confirmatory factor analysis with a maximum likelihood estimator. The items loaded moderately (standardized coefficient >.35) to highly (>.80) on the latent Outcome factor, and all loadings and variances were significant. For a full report of item loadings, see Table 6.
Figure 3: In this model, all latent construct factors explain the variable representing the Outcomes Questionnaire given to Sample 5B. All indicators are parcels constructed using an item-to-construct balance approach, except for the TO parcels, which were constructed with a domain-representative method. All construct latent factors are allowed to covary with each other, although only neighbor covariances are depicted here for clarity of the structure.
Table note: Sample C data is from the first timepoint. Scale scores ranged from 1-4 on the BIS-11, 1-6 on the MAAS, and 1-5 for all other scales.
The scores for the Planfulness scale and the comparison measures, Grit, the BIS-11, the BSCS, and the conscientiousness facet of the BFI, were calculated as previously noted and their descriptives are shown in Table 3.
A principal components analysis conducted for each scale measurement was used to construct three parcels balanced across factor loadings (see Table 7a for parcel details). These parcels were then included in a structural equation model with their respective latent constructs, and all latent constructs were modeled as predictors of the Outcomes variable (Figure 3). Testing this model supported our hypothesis. Latent Planfulness significantly predicted variation in Outcomes while controlling for latent Grit, Impulsivity, Self-control, and Conscientiousness, with the standardized coefficient for latent Planfulness = .51, p < .001. Furthermore, conscientiousness was the only other construct that was significantly related to Outcomes, and this relationship was weaker than that of planfulness (Beta = .37, p = .01). However, comparison of this model with one wherein the paths from Planfulness and Conscientiousness to Outcomes were set to be equivalent suggests that this difference did not reach statistical significance, change in chi-square p = .13. Overall, the variance of each Outcome parcel was ~.30, meaning that roughly 70% of the variance in each Outcome parcel was accounted for by the other latent variables in the model. Finally, a high degree of multicollinearity (>.7) was again observed amongst the related constructs; see Tables 7a and 7b.

Discussion
The balanced Planfulness Scale was the strongest version of the scale tested. Psychometric analyses indicated that even with a reduced number of items, internal consistency remained high, as did other metrics of unidimensionality. The distribution of scale scores appeared more normal for the balanced scale, although negative skew was still evident in the high observed mean and truncated range. However, negative skew was also evident in the distributions of several other measurement scores collected in Samples A and B, which may suggest that the results were reflective of the characteristics of the samples rather than of a weakness of the balanced scale. Analyses also identified two items on the TO facet (3 and 28) that correlated weakly with the full scale as well as other TO items. These results may be a function of the reduced subscale item size enhancing a multifaceted nature of TO. It has been previously shown that time perspective is a multidimensional construct (Zimbardo & Boyd, 1999), and a clear delineation of past, present, and future-focused items was evident when constructing TO parcels for model testing. Therefore, we believe that the low interitem correlations among questions addressing different timeframes in the TO subscale are to be theoretically expected given the distinct temporal reference points. The balanced scale was also found to be the best fitting model yet according to various fit indices. As with previous versions, the primary planfulness model was shown to be of significantly better fit than alternate models. However, at least one subscale variance failed to reach significance across all three samples, making it unclear whether each subscale was contributing unique information to the model. Parceled models were constructed to test the dimensionality of the scale by focusing on the structural relationship amongst latent factors.
Across all samples the variances of the subscales reached statistical significance when parcelation was used, although correlations amongst them remained high. These results further support the previous conclusion; namely, that while there are three distinct facets to planfulness, the Planfulness Scale should be considered in its entirety to represent a single construct, and not scored only according to its subscales.
The balanced Planfulness Scale was compared against other validated measurements and tested for its association with real-world achievement outcomes. Planfulness scores and BFI conscientiousness scores were found to share patterns of correlations with other measurements, including an exhibited inverse relationship with the BIS-11. While it may not be surprising that highly conscientious and planful individuals also self-report as being low in impulsivity, this association does not necessarily mean that the two constructs are simply inversely related. As discussed in the introduction, it may be that being highly planful can moderate or mask the effect of certain types of impulsivity in the context of goal pursuit. For example, part of being planful could be to anticipate contexts in which one is likely to make potentially maladaptive spur-of-the-moment decisions and to take steps to avoid those situations to mitigate the effect of impulsiveness.
The results from Samples A and B are evidence of the convergent and discriminant validity of the balanced Planfulness Scale and support the hypothesis that planfulness falls within the conscientiousness domain. A test of incremental predictive validity further supported the ability of the Planfulness Scale to explain unique variance in self-reported general ability to achieve goals, above and beyond other measures within this domain. This incremental predictive effect was observed even in the presence of a high degree of collinearity with related constructs such as grit and conscientiousness. Given that this test was only conducted in one sample and relied on comparing self-report measures, future tests of construct validity will need to focus on using non-self-report measures to further untangle the relationship between planfulness and other constructs.
The longitudinal component of this study additionally supports the Planfulness Scale as being reliable over time, as scale scores separated by a period of three months were found to be highly correlated. Critically, scale scores from the first timepoint were prospectively predictive of idiographic real-world goal progress. Given that it is not uncommon for people to make progress towards a goal and then either get stuck or abandon it altogether, we calculated a priori a variable that weighted achievement of a goal more heavily than raw change in reported progress toward that goal. Indeed, we found that the Planfulness Scale was not predictive of raw goal completion change, but instead predicted goal progress as a function of the amount left to make until completion, as hypothesized. Importantly, Planfulness scores represent the degree to which individuals tend to engage in specific mental processes that have been previously supported as effective for goal achievement in laboratory research. Our longitudinal results therefore show that individuals who report engaging in these patterns of thought are more likely to achieve their goals in the real world, rather than merely make incremental progress towards them.

General Discussion
We proposed the construct of planfulness to build upon insights from parallel lines of inquiry on traits related to achievement, from the personality field, and mental processes related to goal success, from the self-regulation field. We defined planfulness as the tendency to engage in specific mental processes with respect to goals, which we hypothesized would explain between-person variability in effective goal-related cognition. We articulated the theoretical rationale behind the construct, and developed a scale to measure it in the general population using an iterative process of item testing and guided adjustment across five studies and ten independent samples. This process improved each subsequent scale version as indicated by psychometrics and fit indices, culminating in the balanced scale version tested in Study 5. Further tests of reliability and validity supported the balanced Planfulness Scale as a reliable and valid metric for assessing individuals' thoughts about their goals and predicting real-world outcomes. Planfulness offers a unique and specific perspective on goals as the construct provides a bridge between two broad areas of research. The study of goal setting and achievement in personality psychology is trait-oriented, describing broad phenotypic characteristics of individuals that are stable across time and circumstances (e.g., conscientiousness). In contrast, the laboratory research on goal achievement is process-oriented, focusing on the specific mental strategies that are likely to encourage goal achievement in the general population. These two research traditions investigate the same general topic with divergent methods and assumptions, and as such, tend not to directly interact or inform one another.
Conceptualizing planfulness as a stable individual difference in the tendency to deploy specific mental processes joins these two theoretical worlds, and developing a valid and reliable measure of the construct provides a research instrument that can be beneficial to both. More generally, our approach to studying planfulness might serve as a model for conceptualizing how specific cognitions or behaviors can accumulate across time and manifest as an observable "trait", thereby allowing for longitudinal predictions at an individual level based on brief mental processes.
One seemingly paradoxical implication of this hybrid perspective of goal achievement is that we argue for the existence of stable individual differences in goal-related cognition, and also that the processes can be altered within a person. The present study provides evidence for stability over time in the tendency to use these strategies, and at the same time there is ample evidence that these same patterns of thought about goals can be manipulated in a laboratory setting. Together, these observations suggest that people tend to deploy similar patterns of thought about goals, but that those patterns of thought are not so entrenched that they cannot be influenced by brief manipulations. It might be that people are simply unaware of alternative ways of thinking about goals than the ones that they have used previously.
The Planfulness Scale was constructed to capture individual differences in three specific processes that are known to reliably improve the likelihood of goal achievement. The scale items were selected to tap these processes specifically, so the scope of the scale is deliberately circumscribed around temporal orientation, mental flexibility, and cognitive strategies. As noted in the introduction, we theorize that additional facets could be added to the extent that additional cognitive processes are identified that reliably increase goal progress beyond these initial three facets. However, the ability of the Planfulness Scale to predict goal progress with only the three facets included suggests that the extant scale, on its own, explains meaningful variance in goal-related behaviors.

Limitations and Future Directions
There is an inherent limitation to studying planfulness that may have impacted the results reported here. Being high in planfulness involves a degree of goal meta-cognition (the ability to think about how one thinks about goals), as indicated by items such as, "I can easily identify why I have not achieved goals in the past." It may be that those low in planfulness do not regularly reflect on how they approach goals, or are less accurate in recognizing the specific steps they take that lead to achievement success or failure. In turn, these individuals may then overestimate the degree to which they engage in planful thoughts or behaviors, exhibiting the so-called Dunning-Kruger effect (Kruger & Dunning, 1999). This might introduce heterogeneity to how a group of individuals use the scale in addition to other sources of noise typical of self-report measures. If this effect indeed is occurring in low planfulness individuals, it could help to explain why the distribution of scores from Study 5 is approaching a normal shape, but with a truncated range (histogram in Appendix C).
The fact that the Planfulness Scale is a significant predictor of real-world goal progress somewhat mitigates the aforementioned concern; however, goal progress itself was measured through self-report items in this research. This is a second limitation of the work presented here: we did not collect an objective, behavioral measure of goal progress. It may be then that our results represent a strong correlation between thinking about goals and thinking that progress has been made, regardless of the objective truth. Future research should focus on validation of the scale using real-world goals that can be objectively measured in order to explore this possibility.
Another limitation of this research is the lack of a direct comparison between the Planfulness Scale and measures of conscientiousness in their ability to predict objective measures of goal progress. We introduce planfulness here as a way of shifting the focus of trait-level measurements away from the structure of personality and toward processes that result in habitual thoughts and behaviors. We provide initial evidence that this conceptualization exhibits reliability and validity in concordance with traditional personality measurements, as well as evidence based on self-report measurements that planfulness explains distinct variance in goal outcomes from related measures within the conscientiousness domain. Planfulness and conscientiousness are not theoretical "competitors" because planfulness is theorized as a set of processes in the conscientiousness domain. However, researchers and practitioners may nevertheless want to know as a practical matter which scale provides better or stronger predictive validity or whether the Planfulness Scale has incremental validity over existing Big Five scales. Having moved through the first of two phases in the scale construction model suggested by Simms (2008), an important next step to take following this work in the external validity phase is to thoroughly inspect whether the Planfulness Scale is able to capture information in a unique way from other measures of related traits using objective measurements of behavior. Future work will determine the relative performance of the Planfulness Scale, but we believe we have provided evidence here of its stand-alone value.

Conclusion
Having undergone thorough testing in multiple unique samples, the Planfulness Scale contains items representing three categories of goal cognition that contribute overall to general propensity for goal achievement and success. An initial test of predictive validity suggests that planfulness is positively correlated with progress on real-world goals across time. This initial scale construction and validation process should be followed up with tests of the Planfulness Scale's ability to predict goal outcomes that are not self-report.

Data Accessibility Statement
De-identified data and analysis scripts are available for download on the Planfulness Scale Open Science Framework page: https://osf.io/56ja2/.

Additional Files
The additional files for this article can be found as follows: