Measuring Prospective Imagery: Psychometric Properties of the Chinese Version of the Prospective Imagery Task

Objective: Prospective negative imagery is suggested to play an important role in the development and maintenance of anxiety and depression. The Prospective Imagery Task (PIT) was developed to assess prospective imagery. Given the importance of prospective imagery for mental health in the Chinese cultural context, our objective was to examine the psychometric properties of the PIT in a Chinese sample. Methods: The instrument was validated in a sample of 1,372 Chinese individuals (mean age = 19.98, SD = 4.57; 35.2% male) who completed the PIT together with the Beck Depression Inventory-II (BDI-II) and the State-Trait Anxiety Inventory-Trait version (STAI-T). Results: The two-factor structure of the PIT was in line with the original study, with satisfactory reliability and significant correlations with the BDI-II and STAI-T scores. Latent profile analysis revealed a three-class pattern. Measurement invariance testing indicated that the instrument can be used among different age groups as well as among males and females. Conclusion: The Chinese version of the PIT is a reliable and valid tool to measure prospective imagery, and the positive subscale is meaningful for clinical psychology. Limitations and future research directions are discussed.


INTRODUCTION
Mental imagery differs from real perception; the former is the perceptual experience of individuals without parallel sensory input (Kosslyn et al., 2006) and the simulation or recreation of perceptual experience (Kosslyn et al., 2001). Previous experimental cognitive studies have focused on nonemotional mental imagery. However, in the past 20 years, emotional mental imagery has received increasing attention. Holmes and Mathews (2010) emphasized a "special relationship" between mental imagery and emotions. Mental imagery has been viewed as an emotional amplifier or a potent driver of emotion (Burnett Heyes et al., 2013).
Research in clinical and counseling psychology has focused on the mental imagery of emotional disorders in particular. A growing number of studies have found a link between imagery and emotional disorders (Brewin et al., 2010). Blackwell et al. (2013) found a negative correlation between depression and the vividness of imagining positive future events, and Werner-Seidler and Moulds (2011) suggested that people with depression have impairments in their ability to visualize the future. Compared with healthy people, people with depression and anxiety have lower subjective estimates of the vividness and likelihood of positive future events (Morina et al., 2011), and people with bipolar disorder have more vivid imagery of negative future events. From a clinical perspective, simulating future events through imagery matters especially when the imagined action is negative (Holmes et al., 2007); for example, some people with depression have intrusive mental images of future suicide (Crane et al., 2012).
A tool has been developed to measure the vividness of imagery for prospective events. Initially, Stöber (2000) investigated the vividness of prospective positive and negative mental imagery in a non-clinical population using the 30-item Prospective Imagery Task (PIT, based on MacLeod and Byrne, 1996), rated on a 7-point scale. The scale showed satisfactory reliability and validity. Building on Stöber's (2000) PIT, Holmes et al. (2008) shortened the instrument to 20 items rated on a 5-point scale; they reported that its reliability and validity were in line with Stöber's (2000) findings.
An advantage of the PIT is that it has been widely used in studies of mental imagery (Morina et al., 2011; Blackwell et al., 2013). Results obtained with the PIT can therefore be compared with previous studies and across countries. This matters because Chinese culture is quite different from that of the West, and the generation and content of imagery are closely related to culture. Because culture is a vital element in mental imagery, cross-cultural validation of the PIT is necessary (Yoon et al., 2016). For instance, one study found that culture, as opposed to the language of a message, drove imagery-generation capabilities among participants from China, Singapore, and the United States (Liang et al., 2010). When exposed to abstract advertising messages, East Asians tend to generate more imagery than Westerners (Liang and Kale, 2012). In addition to varying across cultures, mental imagery is related to various emotional disorders. A decrease in the expectation of future positive events is a typical feature of depression, while anxiety is characterized by an increase in the number of perceived negative future events (Rief et al., 2015; Gadassi Polack et al., 2020).
Thus, the present study attempted to establish the criterion validity of the PIT by exploring its correlation with depression as measured by the Beck Depression Inventory-II (BDI-II) (Beck et al., 1996) and with anxiety as measured by the State-Trait Anxiety Inventory-Trait version (STAI-T) (Spielberger et al., 1970).

Participants and Procedure
A total of 1,500 participants were recruited. Of these participants, 128 were excluded because (a) they were younger than 16 or older than 65, (b) they were unwilling to participate/give informed consent, or (c) they had a history of psychiatric illness. Therefore, the effective sample comprised 1,372 participants. Of these, 483 (35.2%) were male, and 889 (64.8%) were female. Students constituted the majority of the participants (83.5%). The mean age of the overall sample was 19.98 years (SD = 4.57).
Questionnaires were distributed online and offline. The study was approved by the institute's ethics committee. Written informed consent was obtained from the participants after they received a description of the study. Informed consent for teenage participants was obtained from their guardians. Each participant received a reward of 5 yuan.
The participants were administered questionnaires twice. At baseline (i.e., the pretest), the sociodemographic and clinical characteristics, including age, gender and career, of the 1,372 participants were collected together with three self-report scales, including the PIT, BDI-II, and STAI-T. At 2 months following the pretest, the PIT was sent to 60 of the participants online to assess the test-retest reliability of the scale.

Prospective Imagery
Prospective imagery was measured with the PIT, which contains 10 positive (e.g., "People you meet will like you") and 10 negative (e.g., "You will be the victim of crime") future scenarios (Holmes et al., 2008). The Chinese translation of the PIT was developed through iterative back-translation by a team of bilingual psychologists and with the help of one of the authors. We then revised the translation by comparing its comprehensibility and accuracy with the original PIT and settled on the final version. Participants were asked to rate the vividness of each image on a 5-point scale ranging from 1 ("no image at all") to 5 ("very vivid"). A higher score on each subscale (positive or negative) indicates more vivid imagery. The PIT has good internal consistency (0.83 < α < 0.90) (Stöber, 2000; Blackwell et al., 2013).
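The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: it assumes that subscale scores are simple sums of the 1-5 ratings, and the assignment of item positions to subscales (`POSITIVE`, `NEGATIVE`) is hypothetical, since the item order is not given in the text.

```python
from typing import Dict, Sequence


def score_pit(responses: Sequence[int],
              positive_items: Sequence[int],
              negative_items: Sequence[int]) -> Dict[str, int]:
    """Sum 1-5 vividness ratings into positive and negative subscale scores."""
    if len(responses) != 20:
        raise ValueError("expected 20 item ratings")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("ratings must lie between 1 and 5")
    return {
        "positive": sum(responses[i] for i in positive_items),
        "negative": sum(responses[i] for i in negative_items),
    }


# Hypothetical layout (assumption): even positions positive, odd positions negative.
POSITIVE = list(range(0, 20, 2))
NEGATIVE = list(range(1, 20, 2))

# One made-up respondent who rates every positive item 3 and every negative item 2.
example_scores = score_pit([3, 2] * 10, POSITIVE, NEGATIVE)
print(example_scores)
```

Under the summing assumption, each subscale ranges from 10 to 50, with higher values indicating more vivid imagery.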

Depression
We assessed depression with the Chinese version of the Beck Depression Inventory-II (BDI-II) (Wang et al., 2011), a 21-item self-report scale. Participants were asked to rate each item on a 4-point scale ranging from 0 ("rarely or none of the time") to 3 ("most or all of the time"). The higher the score, the more severe the level of depression. The Chinese version of the BDI-II was found to have good reliability and validity among Chinese populations (Wang et al., 2011).

Anxiety
The Chinese version of the State-Trait Anxiety Inventory-Trait version (STAI-T) (Shek, 1988) was used to measure trait anxiety. The inventory consists of 20 anxiety-related items. Participants were asked to rate how they "generally feel" on a 4-point scale. Two previous studies tested and validated the Chinese version of the STAI-T for use in the Chinese community with good reliability (Shek, 1988) and validity (Shek, 1993).

Data Analysis
The data set of the participants was randomly divided into two halves to explore factor structure. Exploratory factor analysis (EFA) was conducted with half of the sample (n = 686), and the maximum likelihood robust estimator (MLR) method was used to extract the factor loadings. The other half of the sample (n = 686) was used for confirmatory factor analysis (CFA) with the maximum likelihood (ML) method. Goodness-of-fit indices were reported, including the Tucker-Lewis index (TLI), comparative fit index (CFI), Akaike information criterion (AIC), Bayesian information criterion (BIC), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). TLI and CFI range in value from zero to 1.00, with a value close to 1.00 indicating a better fit (Mulaik et al., 1989). For RMSEA and SRMR, values less than 0.05 are considered good, values less than 0.08 are appropriate, and values greater than 0.10 indicate that there is room for improvement (Finch and West, 1997).
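As a rough illustration of the split-half design and the RMSEA/SRMR guidelines quoted above, the sketch below randomly divides participant identifiers into two halves and labels a fit value with the stated thresholds. The function names and the seed are assumptions made for the example; the factor analyses themselves were run in dedicated software and are not reproduced here.

```python
import random


def split_half(ids, seed=0):
    """Randomly split identifiers into two halves (one for EFA, one for CFA)."""
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def describe_residual_fit(value):
    """Label an RMSEA or SRMR value using the thresholds stated in the text."""
    if value < 0.05:
        return "good"
    if value < 0.08:
        return "appropriate"
    if value > 0.10:
        return "room for improvement"
    # The text does not classify values from 0.08 to 0.10.
    return "not classified by the stated thresholds"


efa_ids, cfa_ids = split_half(range(1372))
print(len(efa_ids), len(cfa_ids))
print(describe_residual_fit(0.04))
```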
We assessed the internal consistency of the PIT using Cronbach's alpha coefficients and test-retest reliability with Pearson's correlation coefficient. According to Cicchetti (1994), values less than 0.60 are poor, values between 0.70 and 0.80 are acceptable, and values greater than 0.80 indicate good reliability.
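For readers unfamiliar with the coefficient, Cronbach's alpha can be computed from item and total-score variances as α = k/(k − 1) · (1 − Σ item variances / variance of total score). The sketch below is a minimal plain-Python version using made-up ratings; it is illustrative only and is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up data: three respondents, two items.
toy = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```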
To examine validity, Pearson's correlation coefficients of the PIT, BDI-II, and STAI-T were examined. Then, regression analyses were conducted to examine whether prospective imagery contributed independently to the prediction of depression and anxiety after adjustment for gender, age, and career. In the first step, the two facets of prospective imagery were entered into the regression model. In the second step, gender, age and career were entered.
All receiver operating characteristic (ROC) curve analyses were conducted using SPSS 23.0. For each ROC curve analysis, we calculated the area under the ROC curve (AUC) (Green, 1989) and used the Youden index to identify the optimum cut-off point, that is, the score that jointly maximizes sensitivity and specificity when distinguishing individuals with and without depression or anxiety. The significance level for the AUC was set at 0.05. Šimundić (2009) suggested that an AUC value greater than 0.80 is excellent, 0.70-0.80 is good, 0.60-0.70 is sufficient, and less than 0.60 is poor.
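The two quantities used in the ROC analysis can be sketched in a few lines. The example below is a hedged illustration on made-up data, not the SPSS procedure: it computes the AUC as the proportion of case/non-case pairs ranked correctly (ties count one half) and scans candidate cut-offs for the largest Youden index. It treats higher scores as indicating the condition; for a measure that falls as symptoms rise, the scores could be negated first.

```python
def auc(scores, labels):
    """AUC as the share of (case, non-case) pairs in which the case scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_youden(scores, labels):
    """Return (cut-off, Youden index, sensitivity, specificity) for the best cut-off."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    for cut in sorted(set(scores)):
        sensitivity = sum(p >= cut for p in pos) / len(pos)  # cases at or above cut
        specificity = sum(n < cut for n in neg) / len(neg)   # non-cases below cut
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:  # keep the first cut-off reaching the maximum
            best = (cut, j, sensitivity, specificity)
    return best


# Made-up scores and case labels (1 = case, 0 = non-case).
toy_scores = [1, 2, 3, 4]
toy_labels = [0, 1, 0, 1]
toy_auc = auc(toy_scores, toy_labels)
toy_best = best_youden(toy_scores, toy_labels)
print(toy_auc, toy_best)
```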
We conducted latent profile analysis (LPA) for the vividness of positive and negative prospective imagery using maximum likelihood estimation with robust standard errors in Mplus 7.11 (Muthén and Muthén, 1998-2015) to identify the latent classes and their distribution. We increased the number of classes step by step; smaller values of the AIC, BIC, and sample-size-adjusted BIC (aBIC) indicate better model fit. Entropy represents classification accuracy, and its general criterion is 0.80 (Clark, 2010). Higher entropy and significant Lo-Mendell-Rubin (LMR) and bootstrapped likelihood ratio tests (BLRT) (p < 0.05) indicate that the model with k classes is better than the model with k − 1 classes.
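The model-comparison indices named above follow standard formulas, which the sketch below reproduces for illustration: AIC = −2LL + 2p, BIC = −2LL + p·ln(n), aBIC = −2LL + p·ln((n + 2)/24), and relative entropy = 1 − Σ(−p·ln p) / (n·ln K). The log-likelihood, parameter count, and posterior probabilities used here are made-up inputs, not values from the study.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """AIC, BIC, and sample-size-adjusted BIC from a model log-likelihood."""
    deviance = -2 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "aBIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def relative_entropy(posteriors):
    """Relative entropy (0 to 1) from each person's class membership probabilities."""
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - uncertainty / (n * math.log(k))


# Made-up inputs (assumptions for illustration only).
criteria = information_criteria(log_lik=-100, n_params=5, n_obs=1372)
clear_assignment = relative_entropy([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
coin_flip = relative_entropy([[0.5, 0.5]])
print(criteria["AIC"], clear_assignment, coin_flip)
```

Values of relative entropy near 1 mean that individuals are assigned to classes with little ambiguity, which is why the 0.80 criterion is read as adequate classification accuracy.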
Mplus 7.11 was also used to examine measurement invariance (MI) across gender and age by means of multigroup CFA. Because MI testing compares a series of nested models, the change in χ² (Δχ²) can be used in addition to commonly used fit indices such as χ², CFI, and RMSEA. In large samples, however, the changes in CFI and RMSEA are superior to Δχ² for evaluating model fit (ΔCFI < 0.01, ΔRMSEA < 0.015) (Finch and French, 2018; Counsell et al., 2020). Given the large sample size, MI in our study was examined mainly through ΔCFI and ΔRMSEA.
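The decision rule for nested invariance models can be expressed as a small check. The sketch below is illustrative: the function name is an assumption, and the fit values are hypothetical numbers, not the values reported in Table 8. It compares each more constrained model with the preceding one and applies the ΔCFI < 0.01 and ΔRMSEA < 0.015 criteria.

```python
def invariance_holds(less_constrained, more_constrained,
                     max_delta_cfi=0.01, max_delta_rmsea=0.015):
    """True if adding constraints changes CFI and RMSEA by less than the cut-offs."""
    delta_cfi = abs(less_constrained["cfi"] - more_constrained["cfi"])
    delta_rmsea = abs(less_constrained["rmsea"] - more_constrained["rmsea"])
    return delta_cfi < max_delta_cfi and delta_rmsea < max_delta_rmsea


# Hypothetical fit values for a sequence of nested models (not from the study).
models = [
    ("configural", {"cfi": 0.905, "rmsea": 0.050}),
    ("metric", {"cfi": 0.902, "rmsea": 0.049}),
    ("scalar", {"cfi": 0.897, "rmsea": 0.050}),
]

# Compare each model with the one before it in the sequence.
results = {
    later[0]: invariance_holds(earlier[1], later[1])
    for earlier, later in zip(models, models[1:])
}
print(results)
```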
Data collected at baseline from the total sample of 1,372 participants were used to estimate internal consistency using Cronbach's α coefficients. Pearson's correlation coefficients between the PIT scores at baseline and the 2-month follow-up were calculated to examine test-retest reliability. To validate the Chinese version, its correlations with the BDI-II and STAI-T were examined. We conducted regression analyses to examine whether prospective imagery independently predicted depression and anxiety. LPA was used to determine the optimal number of latent profiles. MI testing was used to assess the general applicability of the PIT.
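Test-retest reliability here is a Pearson correlation between two administrations of the same scale. A minimal sketch, using made-up baseline and follow-up scores rather than study data, is:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up subscale scores for five people at baseline and at follow-up.
baseline = [30, 25, 41, 18, 36]
follow_up = [32, 24, 39, 20, 35]
retest_r = pearson_r(baseline, follow_up)
print(round(retest_r, 2))
```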

Factor Structure
Through EFA, we examined the potential factor structure of the PIT. Following Kline (2010), we found that the AIC and BIC decreased most sharply from the single-factor model to the two-factor model. Taking the other fit indices into account, the two-factor model was the best (see Table 1, χ²/df = 2.68, TLI = 0.90, CFI = 0.91, RMSEA = 0.05, SRMR = 0.04). The factor loadings for each item are shown in Table 2. All items that loaded on Factor 1 originally belonged to the negative subscale, except for Item 18, which originally belonged to the positive subscale. All items that loaded on the second factor belonged to the positive subscale. The factor loadings of all items were greater than 0.40. The correlation between the two factors was small (r = 0.27).
Based on the two-factor model obtained by EFA, CFA was performed on the other half of the sample data (n = 686, see Table 1). Subsequent analyses of reliability and criterion validity were based on the proposed structure of the correlated two-factor model.

Validity
As shown in Table 3, positive and significant correlations were found among depression, anxiety, and negative prospective imagery, whereas negative and significant correlations were found among depression, anxiety, and positive prospective imagery. Both subscales were related to depression and anxiety even after controlling for differences in gender, age, and career (p < 0.001, R² = 0.03, see Table 4).

ROC Curve Analysis
An optimal cut-off point is valuable for discriminating between clinical and healthy populations. Thus, cut-off points for the PIT were examined by using the recommended cut-offs of 28 on the BDI-II and 48 on the STAI-T. As shown in Tables 5 and 6, when we took the BDI-II and STAI-T scores as the state variables, only the AUC value of the positive subscale of the PIT (PIT-P) for the BDI-II was fair (AUC = 0.80, p < 0.05). In ROC curve analysis, the Youden index (sensitivity + specificity − 1) is often used to identify the cut-off point for maximum discrimination. As shown in Table 6, when the BDI-II score was taken as the state variable, the optimum screening score was 19 (sensitivity = 94.8%, specificity = 75.0%, Youden index = 0.70).
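As a quick arithmetic check, the reported Youden index follows directly from the reported sensitivity and specificity at the cut-off of 19:

```latex
J = \text{sensitivity} + \text{specificity} - 1 = 0.948 + 0.750 - 1 = 0.698 \approx 0.70
```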

Latent Profile Analysis
The PIT model fit indices are shown in Table 7. Following Nylund et al. (2007), LPA was conducted by starting with two classes and gradually increasing the number of classes. The fit indices of the three-class model exhibited the sharpest fall, and this model was simpler than the others. Therefore, the three-class model was retained as the best model. The score distribution of the latent classes of prospective imagery on each item is shown in Figures 1 and 2. Class 1, which accounted for 22.9% and 32.3% of the sample for positive and negative prospective imagery, respectively, was named the "low vividness" group (C1). Class 2 accounted for 46.9% and 47.1%, respectively, and was named the "moderate vividness" group (C2). Class 3 accounted for 30.3% and 20.6%, respectively, and was named the "high vividness" group (C3).

Measurement Invariance Across Gender and Age
According to the multigroup CFA fit indices, the model fit was acceptable (all CFIs were close to 0.90, RMSEA < 0.08, SRMR < 0.08). Specifically, MI across genders was examined by means of multigroup CFA; the ΔCFIs were all < 0.010, and the ΔRMSEAs were all < 0.015 (see Table 8). Therefore, gender does not affect subjective judgments of prospective imagery. Subsequently, MI across age was examined by means of multigroup CFA; the ΔCFIs were all < 0.010, and the ΔRMSEAs were all < 0.015 (see Table 8). In other words, age does not affect subjective judgments of prospective imagery.

DISCUSSION
We aimed to develop a Chinese version of the PIT and to ensure that its psychometric properties were consistent with previous studies. The contributions of this study are threefold: first, we developed a native tool for measuring prospective imagery; second, we verified previous results; and third, we showed that the tool has significance as a reference for the diagnosis of clinical depression. In this paper, the PIT was introduced to the mainland Chinese population. The structure of the revised Chinese version of the PIT was basically in line with the original research (Stöber, 2000). The original scale comprises positive images (10 items) and negative images (10 items). In our EFA, Item 18, which belongs to the positive subscale in the original, loaded on the negative factor. One reason why the item "Your mind will be very alert and 'on the ball'" diverged from the original scale may be that the meaning of "alert" differs across cultural backgrounds, leading domestic participants to misunderstand the item's connotation. Another possibility is that individuals have different degrees of uncomfortable physical and mental experiences under stress (Dickerson et al., 2004). Participants may be more likely to experience the discomfort of the stress state evoked by this mental image and thus tend to attach a negative meaning to it.
EFA showed that the structure of the Chinese version of the PIT included two stable factors, negative images (11 items) and positive images (9 items). Moreover, CFA showed that the goodness-of-fit indices of the two-factor model of the Chinese version of the PIT were acceptable.
The test-retest correlation coefficients in the current sample were good. Although the interval between baseline and follow-up was 2 months, the test-retest correlations were statistically significant and close to the internal consistency coefficients, which supports the temporal stability of the scale. The internal consistency coefficients of the total, positive, and negative scales were 0.84, 0.81, and 0.83, respectively. The test-retest reliability values after 2 months were 0.89, 0.78, and 0.89 (p < 0.01), indicating high stability over time.
Stöber (2000) showed that only anxiety (but not depression) was related to enhanced imagery of future negative events. However, our study showed that positive imagery was negatively correlated with depression (r = −0.30, p < 0.01) and anxiety (r = −0.36, p < 0.01), while negative imagery was positively correlated with depression (r = 0.31, p < 0.01) and anxiety (r = 0.30, p < 0.01). Both findings have potentially important implications for research on anxiety and depression.
According to Šimundić (2009), AUC is generally used in ROC analysis to reflect the diagnostic performance of an evaluation tool. Positive prospective imagery may be significant as a reference for depression. A cut-off point of 19 provided optimum diagnostic accuracy against the BDI-II.
LPA revealed a three-class pattern. Three groups in two subscales were labeled "low vividness" (22.9 and 32.3% of the sample, respectively), "moderate vividness" (46.9 and 47.1% of the sample, respectively), and "high vividness" (30.3 and 20.6% of the sample, respectively) groups.
MI was mainly assessed in this study by analyzing and comparing four models: (1) configural invariance (equality of factor structure), (2) metric invariance (equality of factor structure and loadings), (3) scalar invariance (equality of factor structure, loadings, and intercepts), and (4) strict factorial invariance (equality of factor structure, loadings, intercepts, and unique variances). Furthermore, the MI showed that the PIT can be used among different age and gender groups.

Limitations and Future Research Directions
There were several limitations of this study. First, the high proportion of women in our sample may not be representative of the community and may limit the generalizability of the results. Further research should validate the Chinese PIT by expanding the sample to a wider population. Second, it is uncertain whether participants actually followed the instructions to imagine positive or negative events and, if so, whether the contents of their prospective imagery were truly positive or negative. Finally, the present findings await replication with clinical participants.

CONCLUSION
In summary, the structural validity, Cronbach's α, and criterion validity of the Chinese PIT were verified, indicating that the scale's reliability and validity had suitable adaptability under different sampling methods and different subpopulation conditions and that the scale had good ecological validity. The Chinese PIT is a reliable and valid instrument for assessing prospective imagery and, to some extent, depression.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.