Development of a Chinese and American scale for measuring spirituality

Abstract A consensus has not been reached on the definition of spirituality. Consequently, it is difficult to understand the concept and to develop scales to assess spirituality. To measure this concept in a non-Western culture is even more difficult. Following sound scale development procedures, the current study endeavors to develop a Spirituality Scale for College Students that could apply to both Chinese and American college students. The scale focuses on three core aspects of spirituality. Data were collected from college students both in China and the U.S. to provide validity and reliability evidence. The results showed that a three-factor model fit the American sample, the Chinese sample, and the entire sample. A measurement invariance analysis revealed that the scale achieved partial measurement invariance. Implications and limitations are also discussed.


PUBLIC INTEREST STATEMENT
The definition of spirituality remains an open question among researchers in the field of psychology. Since spirituality has been demonstrated to have a positive influence on people's psychological well-being, it seems important that we have an effective measurement tool available for this concept. To fill this research gap, the current study developed a scale to measure spirituality among college students from both China and the United States, and examined its quality. Results showed that spirituality had three core components in both cultures: search for a sacred, differentiation from religion, and the function of spirituality. The current study demonstrated that the scale was fair and equitable for both Chinese and U.S. college students. Applied researchers could administer this scale to college students from China and the U.S. to understand the cultural difference in the concept of spirituality and its influence on college students.

Defining spirituality
To date, there has been no consensus in defining spirituality. Some researchers have taken a "substantive" approach, which usually centers on ideas of "the sacred" or "divine phenomena". This way of approaching the concept tends to delineate a definitional boundary between psychology and theology. Other definitions tend to focus on the "functional" role of spirituality, and aim to describe what spirituality does, or how individuals or groups are affected, thus blurring those definitional boundaries (Moberg, 2002).
Some researchers have shown their preference for the first approach through their description of spirituality. For example, Hodge (2000) stated that "spirituality is defined as a relationship with a Transcendent Being (or whatever is considered Ultimate), informed by a certain spiritual tradition, which fosters a sense of meaning, purpose, and mission in life" (p. 2). Later, Hodge (2001) specified the term "Transcendent Being" as "God", and spirituality was defined "as a relationship with God, or whatever was held to be the Ultimate, that fostered a sense of meaning, purpose and mission in life" (p. 204). Along similar lines, Hill et al. (2000) defined spirituality as "the feelings, thoughts, experiences, and behaviors that arise from a search for the sacred" (p. 66).
In terms of the second approach to defining spirituality mentioned by Moberg (2002), Tanyi (2002) stated that spirituality is a personal search for meaning and purpose in life, which involves connection to self-chosen and/or religious beliefs, values, and practices that give meaning to life. Frey, Daaleman, and Peyton (2005) define spirituality as self-efficacy and life-scheme. Additionally, meaning and purpose in life, connectedness, inner strength, self-transcendence, and belief are important components of spirituality.

Relation to religion
Many would agree that when referring to spirituality, the idea is usually embedded in a larger religious context. Hill et al. (2000) posited that religion and spirituality are related concepts, and they proposed a set of criteria for defining and measuring spirituality and religiosity. The criterion for spirituality refers to "the feelings, thoughts, experiences, and behaviors that arise from a search for the sacred" (p. 66). Criteria for religion include the same criterion for spirituality, AND/OR "A search for non-sacred goals (such as identity, belongingness, meaning, health, or wellness) in a context that has as its primary goal the facilitation of (A), AND: The meaning and methods (e.g., rituals or prescribed behaviors) of the search that receive validation and support from within an identifiable group of people" (p. 66).
In addition to these definitions of spirituality and religiousness, these two terms have been defined differently as to which one is the broader construct. Zinnbauer defined spirituality "as a personal or group search for the sacred", and religiousness "as a personal search for the sacred that unfolds within a traditional sacred context" (Zinnbauer & Pargament, 2005, p. 35).
Essentially, the way in which spirituality and religiousness are defined depends on the specific psychological inquiry. Defining spirituality as the broader construct follows the trend among believers and psychologists, who tend to view it this way. Defining religiousness as the broader construct, on the other hand, maintains continuity with research in the psychology of religion over the last century (Zinnbauer & Pargament, 2005).
It is clear that researchers have concluded that spirituality cannot be fully distinguished from religion based on their historical association. The current authors conceptualize spirituality as a concept that overlaps with religion in many ways; however, there are also some distinctive features that are solely pertinent to spirituality. It is these distinct features of spirituality that the current study explores.

Existing measures of spirituality
Even though it has been found that "given the multiplicity of religious and spiritual meanings, self-ratings of religiousness and spirituality (e.g., Likert-type ratings) are likely to yield uninformative and ambiguous data" (Zinnbauer, Pargament, & Scott, 1999, p. 914), Likert-type measurements are still utilized quite often in the field to study the concept of spirituality. There are many existing Likert-type scales that measure the concept of spirituality from different perspectives. Examples include the Spiritual Transcendence Index (STI; Seidlitz et al., 2002), Spiritual Well Being Scale (SWBS; Ellison, 1983), Spiritual Index of Well Being (SIWB; Frey et al., 2005), Spiritual Experience Index-Revised (SEI-R; Genia, 1997), Spiritual Assessment Inventory (SAI; Hall & Edwards, 2002), and the Miller Measure of Spirituality (MMS; Miller, 2004).
These existing measures all focus on the concept of spirituality. Because these scales were developed in American society, the concept of spirituality is not only related to Christianity, but it is also largely informed by the Christian religion. Take the Spiritual Transcendence Index, for example. Researchers have claimed that it is the "focus on perceived psychological effects of one's spirituality rather than on specific behaviors or beliefs, and its lack of reference to 'religion'" that differentiate the scale from other scales that "connotes organized religion" (Seidlitz et al., 2002, p. 441). However, taking a further look at the scale, one finds that four of the eight items include the word "God". It was reported that 96% of Americans say they believe in God (Shorto, 1997). Therefore, the scale makes sense to people when it is applied in societies that are highly informed and influenced by Christianity, such as the U.S. However, we must remind ourselves that it is perhaps problematic for people from other cultures where Christianity and the concept of "God" are not as pertinent, such as China. When the measure is introduced to people in these cultures, one would postulate that they are unlikely to score high on spiritual transcendence, and this might be because they do not identify with the concept of "God". However, is it really true that these individuals are less spiritually transcendent?

Research gap
Even though there are numerous measures of spirituality, there is a lack of agreement on the conceptualization of the concept itself (Kapuscinski & Masters, 2010; Piedmont, 2014). In addition, the existing measures of spirituality tend to carry a general connotation of religion that is mainly based on a Western theological framework. Therefore, our aim is to tackle both hurdles by developing a scale that focuses on the essence of the concept of spirituality and that could be utilized in non-Western cultures.

Theoretical framework
In an effort to develop a scale that could clearly define the concept of spirituality, the current paper conceptualizes spirituality along three dimensions (search for a sacred/higher power, differentiate from religiosity, and function of spirituality). The set of criteria for defining and measuring spirituality and religiosity developed by Hill et al. (2000) was utilized to develop the first scale. Questions about the difference between spirituality and religious beliefs provided the items for the second scale, since we believe it is essential to account for the relation between spirituality and religion. Tanyi (2002) presents a definition of spirituality that differs from that of Hill and colleagues. From research in nursing, Tanyi operationalizes spirituality as "a personal search for meaning and purpose in life" (p. 506), which involves connecting to self-chosen and/or religious beliefs, values, and practices that give meaning to life. This framework provided the questions comprising the third scale.

The purpose
Researchers have confirmed the positive influence that spirituality/religiosity has on people's psychological well-being (Kim, Reed, Hayward, Kang, & Koenig, 2011; Rugira, Nienaber, & Wissing, 2013; Unterrainer, Ladenhauf, Moazedi, Wallner-Liebmann, & Fink, 2010). We believe that it is important to develop an understanding of the concept of spirituality in order to better tailor our work to this population.
Researchers in China have stated that in the research area of college student spirituality, there is no well-accepted and clear definition of spirituality (Li & Cai, 2016). The current study is also dedicated to understanding spirituality among college students in the Chinese cultural context, which should be a contribution to cross-cultural studies.

Construct specification
The Spirituality Scale for College Students (SSCS) has three dimensions: search for a sacred/higher power (SFAS), differentiate from religiosity (DFR), and function of spirituality (FOS). Items were written in a Likert-type format, with statements of spirituality followed by a five-point response scale: 1 (strongly disagree), 2 (disagree), 3 (neither agree nor disagree), 4 (agree), 5 (strongly agree). High item scores indicate endorsement of a higher level of spirituality (SFAS), a lower level of differentiation between spirituality and religiosity (DFR), and a stronger belief that spirituality serves a function (FOS).

Item generation
The authors developed a pool of items based on the theoretical framework of spirituality. A simple structure was applied, meaning that each item measures only one dimension. The original pool of items provided 13, 13, and 12 items to measure SFAS, DFR, and FOS, respectively. Sample items of the three subscales are "I believe there is some type of sacred or higher power", "People who are religious are also spiritual", and "Being spiritual enables me to find meaning in my life". See Appendix 1 for the entire scale.

Focus group
Four people, including the developer of the measure, attended the focus group, during which each member was asked a few questions, including "What does spirituality mean to you?", "How do you tell if an individual is spiritual?", and "Do you think there are any aspects that you would want to add to the scale?"
One item was noted as being a double-barreled question, and so the developer split it into two different questions. Two items were removed from scale 1 because they could not be included in any of the three scales. Another item was moved from scale 1 to scale 3 because it fit better in that group. Three additional items were added to scale 3 to incorporate group members' opinions regarding interconnectedness. Following the focus group, scales 1 through 3 had 10, 13, and 16 items, respectively.

Pilot study
A pilot study was conducted with a sample of 25 college students (13 male, 14 female; mean age = 34.4 years) using the original item pool of 39 items to select high-quality items to form the final version of the SSCS. Preliminary results showed that the Cronbach's alphas were .806, .835, and .964 for the three subscales, respectively. After reviewing the item-total correlations and item contents, items with low item-total correlations and items that decreased the Cronbach's alpha were removed from the original item pool. The final version of the SSCS contained 21 items in total, with 6, 5, and 10 items in the three subscales, respectively. Table 1 presents the test specification of the SSCS, where an entry of 1 means that the item belongs to that particular subscale. The Cronbach's alphas for the final versions of the subscales were .905, .916, and .967, respectively.
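The item screening described above rests on two standard statistics: Cronbach's alpha and the corrected item-total correlation. The following is a minimal Python sketch of both, together with "alpha if item deleted". The function names and the toy response matrix are illustrative assumptions; they are not the authors' code or pilot data.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of the remaining items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


def alpha_if_deleted(rows, j):
    """Cronbach's alpha recomputed with item j removed."""
    return cronbach_alpha([r[:j] + r[j + 1:] for r in rows])


# Hypothetical 1-5 Likert responses: 5 respondents x 4 items.
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 3],
    [2, 2, 1, 4],
    [3, 3, 3, 1],
    [1, 2, 2, 5],
]
for j in range(4):
    print(j,
          round(corrected_item_total(responses, j), 3),
          round(alpha_if_deleted(responses, j), 3))
```

Under this kind of screening, an item would be a removal candidate when its corrected item-total correlation is low and alpha rises once it is dropped, subject to a review of item content as the text describes.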

Back translation
Prior to administering the scales, the original English version was translated into Chinese by the primary researcher (bilingual in Chinese and English), then a second bilingual (Chinese-English) graduate student translated the Chinese version back into English. The primary researcher then checked with the original scales to revise the translation of the Chinese version of the scales. The primary researcher checked with the second graduate student to determine if the revised translation was more accurate.

Analytic plan
Data analysis proceeded in three steps. First, a three-factor confirmatory factor analysis (CFA) model was fitted to the American and the Chinese samples separately to examine the construct of spirituality in both samples. The model specified three latent factors, search for a sacred/higher power (SFAS), differentiate from religiosity (DFR), and function of spirituality (FOS), as displayed in Figure 1.
Second, after the same construct was found in both samples, the CFA model obtained from the first step was fitted to the entire sample to examine the construct of spirituality.
Third, after demonstrating that the two samples had the same construct of spirituality, a multiple-sample CFA model was adopted to test the extent to which the three-factor model exhibited measurement invariance between the American and Chinese samples. This study followed the procedures for measurement invariance suggested by Vandenberg and Lance (2000), including configural invariance, metric invariance, scalar invariance, and residual invariance. However, full measurement invariance might be too strict and unrealistic in most social scientific research (Steinmetz, Schmidt, Tina-Booh, Wieczorek, & Schwartz, 2009). Hence, the concept of partial measurement invariance is applied in many social science studies, meaning that only a subset of items is invariant whereas other items are allowed to vary across groups (Byrne, Shavelson, & Muthén, 1989). Several researchers have argued that at least two items with metric and scalar invariance are sufficient to provide meaningful factor comparisons (Baumgartner & Steenkamp, 1998; Byrne et al., 1989). Thus, the current study also applied partial measurement invariance in its testing procedures.
Model testing procedures were carried out using the lavaan package (Rosseel, 2012) in the programming environment R (R Development Core Team, 2017). Robust maximum likelihood (MLR) estimation was used for all analyses. Accordingly, nested model comparisons were conducted via a likelihood ratio test (LRT). Full information maximum likelihood (FIML) estimation was used to handle missing data. Model fit was tested by using the comparative fit index (CFI), the Tucker-Lewis Index (TLI), and the root mean square error of approximation (RMSEA). An acceptable model fit was achieved when the CFI > .90, the TLI > .90, and the RMSEA < .09 (Hu & Bentler, 1999).
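To make the fit criteria concrete, the sketch below computes the textbook (unscaled) versions of the CFI, TLI, and RMSEA from model and baseline chi-square values, and then applies the cutoffs stated above. It is an illustration in Python, not the lavaan computation: the robust MLR variants reported by lavaan apply scaling corrections, and software differs on whether RMSEA uses N or N − 1. The function names and the input numbers are assumptions made for the example.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Textbook CFI, TLI, and RMSEA (no robust scaling applied)."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    cfi = 1.0 - d_model / max(d_model, d_base, 1e-12)
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    # N - 1 is used here; some programs use N instead.
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}


def acceptable_fit(cfi, tli, rmsea, cfi_min=0.90, tli_min=0.90, rmsea_max=0.09):
    """Apply the cutoffs stated in the text: CFI > .90, TLI > .90, RMSEA < .09."""
    return cfi > cfi_min and tli > tli_min and rmsea < rmsea_max


# Hypothetical chi-square values and sample size, not results from the study.
indices = fit_indices(chi2_model=300.0, df_model=183,
                      chi2_base=4000.0, df_base=210, n=400)
print({name: round(value, 3) for name, value in indices.items()})
print(acceptable_fit(**indices))
```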

Three-factor CFA
First, a three-factor CFA model was fit to the American sample, the Chinese sample, and the entire sample, respectively. The model was identified by setting all three latent factor means to 0 and variances to 1, such that all item intercepts, item factor loadings, and item residual variances could be estimated freely. Three pairs of correlations among factors were freely estimated in the model. However, the original models (Single-group CFA0: American Sample and Single-group CFA0: Chinese Sample) did not achieve acceptable model fit. After reviewing modification indices and item contents, three error covariances were added between items I12 and I13, I10 and I11, and I1 and I2. As shown in Table 2, the modified models (Single-group CFA1: American Sample and Single-group CFA1: Chinese Sample) achieved acceptable model fit. Next, a single-group CFA was fit to the entire sample, and the results suggested that acceptable model fit was achieved.
The standardized factor loadings and correlations among factors are presented in Table 3. All factor loadings are statistically significant. Moreover, the standardized factor loadings of SFAS, DFR, and FOS range from 0.70 to 0.89, 0.67 to 0.86, and 0.78 to 0.91, respectively, indicating that the R² values are close to or above 0.5, such that the factor loadings are practically significant as well. See Appendix 2 for other parameter estimates.
Since the item factor loadings were not equal within one factor, the reliability of each factor cannot be estimated using alpha. Thus, the reliability of each factor was calculated using Omega, as described in Brown (2014). As shown in Table 4, the reliability of each factor ranges from .90 to .97, indicating high marginal reliability for all factors.
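As an illustration of the two quantities discussed here, the Python sketch below squares a standardized loading to obtain an item's R² (for example, 0.70² = 0.49 and 0.67² ≈ 0.45, which is why the smallest values fall just under 0.5) and computes a composite reliability of the omega type from standardized loadings. It assumes a fully standardized solution, so each residual variance is 1 − λ², and it adds twice the error covariances to the denominator when correlated errors are present. The function names and the example loadings are hypothetical, and the exact formula used in the study may differ.

```python
def r_squared(loading):
    """Item R-squared implied by a standardized factor loading."""
    return loading ** 2


def omega(loadings, error_covariances=()):
    """Omega-type reliability from standardized loadings.

    Assumes residual variance = 1 - loading**2 for every item;
    error_covariances holds any standardized residual covariances.
    """
    true_var = sum(loadings) ** 2
    residual_var = sum(1.0 - l ** 2 for l in loadings)
    denominator = true_var + residual_var + 2.0 * sum(error_covariances)
    return true_var / denominator


# Hypothetical loadings for a six-item factor, not the Table 3 estimates.
example_loadings = [0.70, 0.75, 0.80, 0.82, 0.85, 0.89]
print([round(r_squared(l), 2) for l in example_loadings])
print(round(omega(example_loadings), 3))
print(round(omega(example_loadings, error_covariances=[0.10]), 3))
```

A positive error covariance lowers the estimate, because shared residual variance inflates the observed composite variance without contributing to the factor.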

Measurement invariance
The measurement invariance of the SSCS was examined across the American and Chinese samples; if measurement invariance held, spirituality could be meaningfully compared between the two cultures. Following the testing procedures of Vandenberg and Lance (2000), a total of five invariance models were tested: the configural invariance model, the metric invariance model, the scalar invariance model, the residual invariance model, and the structural invariance model. An LRT was conducted to compare nested models with different levels of measurement invariance, where a non-significant result indicated that invariance held. When a full invariance model did not hold, the modification indices were reviewed and parameter constraints were released one at a time to examine whether partial invariance held.
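The nested-model comparison can be sketched as follows. The Python example computes a likelihood ratio statistic from two log-likelihoods and refers it to a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. Because MLR estimation was used, the sketch also accepts scaling correction factors and applies the commonly used scaled difference formula; the text does not state whether such a correction was applied, so this is an assumption. The function names and the numbers in the usage lines are illustrative and are not results from the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed-form finite series for even degrees of freedom.
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd degrees of freedom: normal tail plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def lrt(ll_restricted, k_restricted, ll_free, k_free,
        c_restricted=1.0, c_free=1.0):
    """Likelihood ratio test of a restricted model against a freer one.

    k_* are numbers of free parameters; c_* are scaling correction
    factors (1.0 for both gives the ordinary, unscaled LRT).
    """
    ddf = k_free - k_restricted
    cd = (k_restricted * c_restricted - k_free * c_free) / (k_restricted - k_free)
    stat = -2.0 * (ll_restricted - ll_free) / cd
    return stat, ddf, chi2_sf(stat, ddf)


# Illustrative values only.
stat, ddf, p = lrt(ll_restricted=-2606.0, k_restricted=39,
                   ll_free=-2583.0, k_free=47,
                   c_restricted=1.450, c_free=1.546)
print(round(stat, 2), ddf, round(p, 4))
```

A significant result means the added equality constraints worsen fit, which under the procedure described above would prompt releasing one constraint at a time. The sketch does not handle the case in which the difference scaling factor turns out negative.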
First, a configural invariance model was specified in which the three-factor model was estimated in each sample simultaneously, with factor means fixed to 0 and factor variances fixed to 1 in each sample. As shown in Table 2, the configural invariance model achieved acceptable model fit. Parameter constraints were then added to this configural invariance model to test different levels of measurement invariance.
Second, a full metric invariance model was specified with equality of the unstandardized item factor loadings across groups. Additionally, the factor variances of the American sample were fixed to 1, but the factor variances of the Chinese sample were freely estimated. However, Table 5 shows that the full metric invariance model significantly worsened the model fit compared to the configural invariance model. After reviewing the modification indices, a series of item factor loading constraints were released one at a time. A partial metric invariance model was achieved after freely estimating the loadings of 2 items (I5 and I8) (see Table 5).
Third, the scalar invariance model was specified, which constrained item intercepts to be equal across samples except for the items with unequal item factor loadings. Additionally, the factor means of the American sample were fixed to 0, but the factor means of the Chinese sample were freely estimated. However, this scalar invariance model significantly worsened the model fit compared to the partial metric invariance model. Following a procedure similar to that of the previous step, a partial scalar invariance model was achieved by freely estimating 8 item intercepts (I3, I4, I6, I7, I10, I12, I13, and I19), as shown in Table 5.
Fourth, the residual invariance model was tested, which constrained item residual variances to be equal across samples except for the items without equal intercepts. The residual invariance model significantly worsened the model fit compared to the partial scalar invariance model. The residual variances of another 4 items (I1, I10, I16, and I20) were then freely estimated, yielding a model whose fit was not significantly worse than that of the partial scalar invariance model, as shown in Table 5.
Lastly, the structural invariance model was tested by constraining the factor variances to 1 for both samples. The results in Table 5 show that this factor variance invariance model significantly worsened the model fit compared to the partial residual invariance model, indicating that there was a significant difference in factor variance between the American and Chinese samples. Because factor variance invariance did not hold, it was not necessary to test other structural-level invariance. The partial residual invariance model was the final model, which, as shown in Table 2, achieved acceptable model fit.
In summary, after achieving the partial measurement invariance as described, a total of 19 items achieved metric invariance, 8 items achieved scalar invariance, and 4 items achieved residual invariance. According to the above criteria for partial measurement invariance, the SSCS achieved partial measurement invariance, which permits meaningful group comparisons of the factor means, factor variances, and factor covariances. Table 6 presents the structural model parameter estimates from the partial residual invariance model. As the U.S. sample was the reference group, its factor variances and factor means were fixed to 1 and 0, respectively. The factor variances and factor means of the Chinese sample were freely estimated.
Note: ΔLL represents the difference in log-likelihood between the two compared models; Δdf represents the difference in degrees of freedom between the two compared models.
As shown in Table 6, Chinese participants had lower factor means across all factors compared to American participants. Additionally, Chinese participants had smaller factor variances, indicating that there was less within-group difference in spirituality among Chinese participants than among U.S. participants. In terms of factor covariance, U.S. participants had a higher covariance between SFAS and FOS than Chinese participants. However, the covariances between SFAS and DFR and between DFR and FOS were similar in the two samples.
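When reading such structural estimates, it may help to recall how the scaling of the reference group affects interpretation. The relations below are a sketch in notation introduced here for illustration (ψ for factor variances and covariances, κ for factor means, superscripts for group); they are standard identities rather than formulas taken from the study. Since the U.S. factor variances are fixed to 1, a U.S. factor covariance is already a correlation, whereas a Chinese factor covariance must be divided by the corresponding factor standard deviations to be read as a correlation. Likewise, with the U.S. factor means fixed to 0, each Chinese factor mean is a difference expressed in U.S. factor standard deviation units.

```latex
% Factor correlation implied by a factor covariance in group g
\rho_{jk}^{(g)} = \frac{\psi_{jk}^{(g)}}{\sqrt{\psi_{jj}^{(g)}\,\psi_{kk}^{(g)}}},
\qquad
\rho_{jk}^{(\mathrm{US})} = \psi_{jk}^{(\mathrm{US})}
\quad \text{because } \psi_{jj}^{(\mathrm{US})} = \psi_{kk}^{(\mathrm{US})} = 1 .

% Latent mean difference on factor j, in U.S. standard deviation units
\kappa_{j}^{(\mathrm{CN})} - \kappa_{j}^{(\mathrm{US})}
  = \kappa_{j}^{(\mathrm{CN})}
\quad \text{because } \kappa_{j}^{(\mathrm{US})} = 0,\ \psi_{jj}^{(\mathrm{US})} = 1 .
```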

Discussions
As stated earlier, a three-factor CFA model fit the entire sample. Thus, we conclude that the three-factor structure of our spirituality scale is supported. Based on the measurement invariance analysis, it is reasonable to conclude that the current scale has achieved partial measurement invariance. Hence, our group comparisons between the Chinese and American samples at the structural level are justified.
It is interesting to see that Chinese participants had lower factor means across all factors compared to the American participants. This indicates that the Chinese sample had a lower average endorsement of these three factors: SFAS, DFR, and FOS. This is consistent with previous research findings that China is not a religious country where people's understanding of spirituality would be necessarily related to a sacred or a higher power (Turner, 2012). For DFR, because a higher score on the scale means a lower differentiation between spirituality and religiosity, our Chinese participants actually endorsed a higher level of differentiation than their American counterparts. Our finding is consistent with that of another researcher showing that the majority of Americans have a tie with God (Shorto, 1997); hence, it is more likely that American participants do not draw a clear line between these two concepts. In regard to between-group differences on the FOS, Chinese participants still had a lower average endorsement, although the difference is smaller. It is our understanding that spirituality plays a role that is growing in importance in these young Chinese college students' lives, and future research could take this possibility into consideration.
The factor variances of these two samples indicate that Chinese participants have overall lower within-group differences than American participants on all three factors. The Chinese sample's age range is 18 to 28 years, while the American sample's is 19 to 43 years. The wider age range of the American sample allows for more heterogeneous levels of spirituality, in line with an existing developmental model. People have different developmental tasks at different ages; hence their focus on life and their perception of life may vary (Erikson, 1963). Therefore, Chinese participants who are similar in age may have a lower variance in their current life tasks than their American counterparts, who span a larger age range.
It is important to note that American participants have a larger factor covariance between SFAS and FOS than Chinese participants. We postulate that the American college students in this study closely identify one important function of spirituality as being helpful in seeking the sacred, while this is less salient for Chinese college students. This association possibly contributed to the higher factor covariance between SFAS and FOS in the American sample than the Chinese sample.

Implications
The current study attempted to fill the research gap in understanding the concept of spirituality. From the beginning, we conceptualized a spirituality that entails the essence of the concept. Thanks to the validity and reliability evidence we gathered, the newly developed scale could potentially help researchers and educators to further understand the status of spirituality among the college student population in America. As mentioned earlier, the role of spirituality in the lives of Chinese college students remains largely unknown due to limited research within this population. We are able to contribute our own understanding of this population's level of spirituality, and our effort to validate the scale in China has taken us one step closer in understanding the status of spirituality among college students in a non-Western culture. With the globalization process, all areas are essentially interconnected. Therefore, comparing the status of spirituality among college students around the world could prove very meaningful.
Another important contribution of the current study involves psychological well-being. With spirituality being identified as an important variable associated with people's psychological well-being (Brown & Parrish, 2011; Shin & Steger, 2016), this scale could function as an index to measure spirituality so that researchers can further explore its relationship with psychological well-being.

Limitations
Although we attempted to conduct cross-cultural studies, our sampling process could be improved. Specifically, it would be ideal if we could collect data from college students in different areas both in China and the U.S. By doing so, we could increase sample heterogeneity, which could provide stronger validity evidence for the current scale.
We utilized the back translation process to ensure the reliability of the administering process; however, it is possible that an understanding of the concept of spirituality of Chinese college students is different from what it is in the U.S., as researchers have shown that the concept of the "sacred" in English has three meanings, but only one of the three could correspond to a word in an Indian language (Lutzky, 1993;Otto, 1950;Paloutzian & Park, 2013). Therefore, our next step is to provide convergent and concurrent validity evidence for this current scale. It is also our hope to see future research continue to bridge the gap in our understanding of this issue.

Conclusions
Our current research was dedicated to developing empirically validated scales to measure spirituality. It contributes to the current literature on measuring the concept of spirituality that taps into the most salient aspects identified by us. We have also taken the measure to a different culture and collected validity evidence for the scale, which enhanced our understanding of the concepts between different cultures.