Translation and Validation of the BFI-2 in a Croatian Sample

This paper describes the translation and validation of a Croatian-language adaptation of the Big Five Inventory-2 (BFI-2). The translation from English to Croatian was conducted using a forward and backward translation process. The resulting adaptation was then assessed for structural and construct validity, as well as reliability, on a convenience sample of 320 Croatian participants. The results showed good reliability estimates at the domain level and somewhat lower yet satisfactory estimates at the facet level. Confirmatory factor analyses (CFAs) supported the proposed hierarchical structure. The BFI-2 domains and facets showed adequate construct validity, estimated via the within- and between-domain correlations of the facet scales. Overall, the preliminary results of the Croatian adaptation are satisfactory and support efforts for further improvement and assessment in a larger sample, adequate for research in personality psychology.

The Big Five model of personality consists of five broad personality dimensions commonly referred to as Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience (John & Srivastava, 1999). These five dimensions (Big Five; Goldberg, 1992; John et al., 2008; McCrae & Costa, 1987) facilitate an efficient description of stable individual differences in thinking, feeling, and behaving and are widely used in psychological research and application. The dimensions are the result of a lexical rather than theoretical approach, thus deriving from an analysis of language terms people use to describe themselves (John & Srivastava, 1999). The word "big" in the name emphasizes the breadth of abstraction of these dimensions, with each one encompassing many specific personality traits (Costa & McCrae, 1995; John, 1990). The model provides a descriptive taxonomy for personality research, thus allowing cooperative research and integration of findings across different instruments (McCrae & John, 1992), languages, and cultures (Carlo et al., 2014).
Consequently, extensive research has been conducted using many different instruments to assess the Big Five personality traits. One of the most frequently used measures is the Big Five Inventory (BFI; John et al., 1991), a 44-item questionnaire with good psychometric properties that assesses the core features of the five domains in the general population. However, since its development, the understanding of the structure of personality has advanced to a hierarchical view, and this was reflected in the instruments. The revised version of the BFI, the BFI-2 (Soto & John, 2017), therefore has a hierarchical structure as well.
The BFI-2 improved the original BFI along three objectives (for a detailed description, see Soto & John, 2017). First, for every Big Five domain, three facets were defined based on previous research on the hierarchical structure of personality (e.g., DeYoung et al., 2007; Goldberg, 1999). Each facet is measured with four items (i.e., 12 items per domain) that were developed with a focus on clarity, balancing the questionnaire's complexity with its psychometric quality. Second, to control for acquiescence (i.e., the tendency of a participant to agree to an item, regardless of its content; Jackson & Messick, 1958), the items are content-balanced (i.e., there is an equal number of positively and negatively keyed items) for every facet and domain. Finally, new labels for two of the domains were adopted: Negative Emotionality instead of Neuroticism (due to its clinical connotation) and Open-Mindedness instead of Openness (to clarify its reference to mental rather than social experiences and thus its distinction from Extraversion).
Altogether, the BFI-2 consists of 60 items (30 negatively keyed and 30 positively keyed) in the form of short phrases combined with (a) a synonym (e.g., "Is dependable, steady"), (b) a definition (e.g., "Is reliable, can always be counted on"), or (c) context (e.g., "Thinks poetry and plays are boring"), and it can be completed in 10 min. Furthermore, the BFI-2 is freely available for use in research, thereby facilitating personality research around the world. The psychometric evaluation of the BFI-2 revealed good reliability of the scores at both the domain and facet levels, as well as a robust factor structure (Soto & John, 2017). It also supported the convergent and discriminant validity of the scores by associating them with other personality inventories such as NEO PI-R (McCrae & Costa, 2010), BFAS (DeYoung et al., 2007), and Big Five minimarkers (Saucier, 1994). Moreover, compared to its precursor, the BFI-2 showed incremental power in predicting various psychological and behavioral, as well as peer-reported, criteria.
Importantly, the BFI-2 has been developed for research purposes and has been used in cross-cultural research (e.g., Gardiner et al., 2019). Therefore, since its development, various adaptations to different languages (e.g., German: Danner et al., 2019; Dutch: Denissen et al., 2019; Slovak: Halama et al., 2020; Norwegian: Føllesdal & Soto, 2022; Spanish: Gallardo-Pujol et al., 2022) have been developed and have provided further evidence for its good psychometric properties. Many other preliminary translations, including a Croatian version, were created and are available online as a part of the International Situations Project (ISP; Baranski et al., 2020). However, this version has not yet been validated or used with a sample representing the general Croatian population. Therefore, this study aimed to develop a Croatian adaptation of the BFI-2 and assess its psychometric properties. We expect to replicate the findings of the original study (Soto & John, 2017) by demonstrating a robust multidimensional structure at the domain and facet levels, as well as strong internal consistency and between-domain discrimination.

Method
Translation of the BFI-2
We translated the BFI-2 in several steps: First, the items were translated from English by the first author of this study, who is a trilingual (Croatian, English, and German) native Croatian speaker and a former US resident, in cooperation with a Croatian translator. In some instances where the items included phrases rather specific to the English language, the German adaptation (Danner et al., 2019) was used as an aid. For example, the item "Tends to find fault with others" (bfi2_agre_3_rec) can be ambiguously translated, whereas the German version of the item "Ich neige dazu, andere zu kritisieren" ("I tend to criticize others") directly corresponds to the phrase commonly used in the Croatian language "Je sklon kritiziranju drugih," and was therefore adopted as such. Nonetheless, it is important to emphasize that the German version was only used as a support in such cases as described. The goal of the translation process was to find the most literal adaptation possible for individual items while reflecting the content of the domains and facets as accurately as possible.
The second step included the comparison of the preliminary translation with the one created for the ISP (Baranski et al., 2020), with consent from its author. This version was an ad hoc version used on a student sample and had not yet been validated. Given that the target population of the BFI-2 is the general population, and considering the standard practice in BFI-2 adaptations to employ at least two independent translators (see, e.g., Danner et al., 2019; Gallardo-Pujol et al., 2022; Halama et al., 2020), we opted to regard this translation as one of the two independent translations for the current study. The differences between the translations were identified and discussed, and for each item, the more suitable version was chosen or both versions were combined. While deciding on one version or the other, we considered the nuances and connotations of the Croatian language and the potential impact of the choices on psychometric properties. For instance, to avoid using the same Croatian word for "compassion" in the item "Is compassionate, has a soft heart" (bfi2_agre_4_rec) and for "sympathy" in the item "Feels little sympathy for others" (bfi2_agre_1), we chose the word "samilost" for "compassion" and "suosjećanje" for "sympathy," as using the same word might have resulted in correlated residuals between items.
Finally, the back-translation to English was done by a Croatian PhD student in entrepreneurial studies and compared to the original version (Soto & John, 2017). Although there were no major contextual discrepancies between the original English version and the back-translated English version of our adaptation, a few modifications were made to items where a single word could have affected their expected psychometric difficulty. For instance, the preliminary adaptation of the item "Stays optimistic after experiencing a setback" ("Nakon neuspjeha ostaje optimističan") resulted in a back-translation of the word "neuspjeh" as "failure," rather than "setback." A strong word such as "failure" could result in a more positively skewed distribution of the item responses than the one assumed and was therefore replaced with a milder alternative (i.e., "nazadovanje"/"setback"). For a detailed description of the reasoning behind this decision and to better understand our translation process, refer to Supplementary Material 4.
As mentioned above, the aim of the adaptation was to obtain an optimal balance between the closeness of the translation to the original BFI-2 and the intelligibility of the items. Consequently, we wanted to minimize any deviation from the original item formulation. Therefore, we decided to maintain the structure of the items as uncompleted sentences (e.g., "Je otvoren, društven" ["Is outgoing, sociable"]), combined with the stem "Ja sam netko tko. . ." ["I am someone who. . ."] appearing at the beginning of the questionnaire and referring to each item. Accordingly, we decided against the structure of completed sentences, as applied in the German version (Danner et al., 2019). That is also why we decided against the often-used alternative phrase "Ja sam osoba koja. . ." ["I am a person who. . ."]. The assumption behind the usage of the word "osoba" [person] rather than "netko" [someone] could be that osoba may be more inclusive, since it is grammatically feminine, whereas netko is grammatically masculine. However, in everyday language and contemporary literature as well as research, both expressions are used interchangeably and refer to both genders. Furthermore, we applied the same five-point rating scale as in the original BFI-2, with the labels uopće se ne slažem

Participants and Procedure
A convenience sample of Croatian participants was recruited through the personal network of the first author and social media websites. The participants were provided with a link to the online survey. After completion of the survey, participants were asked to forward the link to other potential participants. The participants were informed about the purpose of the study, as well as the data privacy and anonymity policy, to which they had to consent prior to taking part in the study. Additionally, the participants had to be of legal age (i.e., at least 18 years old) and of Croatian nationality. Entries that did not include responses to all BFI-2 items were excluded from further analysis. The final sample consisted of 320 adults, ranging in age from 18 to 77 years (M = 39.86, SD = 13.62), with 61% under 35 years and 30% over 50 years of age. Two hundred nineteen participants identified as female (68%) and 101 as male (32%). Thirty-four percent of the participants graduated high school, 17% had a bachelor's degree, 33% had a master's degree, and 12% had an MD or PhD.

Measures
All participants rated themselves on the Croatian version of the BFI-2 questionnaire consisting of the translated items. Furthermore, all participants reported their age, gender, and level of education. Since we were not able to provide our participants with incentives other than feedback on their personality trait scores, we chose to keep the survey as short as possible. Therefore, no further measures were assessed in the survey.

Analyses
Overall, we conducted all analyses reported by Soto and John (2017), unless we lacked the relevant data (e.g., peer reports). Data and materials to reproduce the results of this article are available at https://osf.io/ha268/. First, we examined the descriptive and distributional properties and bivariate correlations of the Croatian BFI-2 scores at the item, facet, and domain levels. Similarly, we assessed the mean-level gender differences and reliability estimates for facet and domain scales. For the latter, we used Cronbach's α (as it is the most widely used reliability estimate and allows comparison with the estimates for the original study) and McDonald's ω (as a better estimate for hierarchical scales; e.g., Goodboy & Martin, 2020; McNeish, 2018). Most importantly, we expected to find satisfactory reliability estimates consistent with those reported for the original study and evidence for discriminant validity based on the comparison of between- and within-domain correlations.
To assess the structural properties of the Croatian BFI-2 items, we conducted confirmatory factor analysis (CFA) and principal component analysis (PCA). Given the specific assumptions on the structure of the BFI-2, as postulated by the authors of the original version, we examined only theoretically relevant models, either at the facet level or at the domain level with the facets modeled as well; we excluded, for example, models that Soto and John (2017) had considered but that fitted the data poorly, such as a one-factor model. Consequently, we fit a series of three different CFAs. First, the facets were modeled in individual measurement models. Here, if the model fit was not satisfactory, we added correlated residuals between same-keyed items based on modification indices. We then examined two further models that were identical to those tested in the original study (Soto & John, 2017, p. 133): First, all facets were modeled in one structural model, and in the next step, an acquiescence factor was added. Similar to Soto and John, we report CFI, TLI, BIC, RMSEA, and SRMR. However, in line with Kenny et al. (2015) and Shi et al. (2022), RMSEA was not used to evaluate model fit for models with few (i.e., 1 or 2) degrees of freedom. We expected acceptable to good model fit for each model, as well as improvement in fit for models accounting for acquiescence, thereby providing evidence for structural validity, congruent with the findings for the original version.
Finally, we conducted a series of three PCAs to further examine the structural model and test its robustness to acquiescence. The analyses were performed on (1) the 15 facet scores, where we expected to find a clear five-factor solution since averaging item responses cancels out the acquiescence-related variance (see Soto & John, 2017, for a general description); (2) the 60 raw item scores (i.e., without controlling for acquiescence); and (3) the same item scores after within-person centering (i.e., subtracting each person's average score across all 60 items from each of their individual item responses). By doing so, the common variance of all items (i.e., acquiescence) should be removed, resulting in five factors, whereas, for the PCA on the raw item scores, we expected six factors to emerge.
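The within-person centering described above can be illustrated with a minimal sketch. It is not the authors' analysis script; the function name `center_within_person` and the small response matrix are hypothetical and serve only to show the transformation on raw (not reverse-keyed) responses.

```python
from statistics import mean


def center_within_person(responses):
    """Subtract each respondent's mean across all items from each of
    that respondent's raw item responses."""
    centered = []
    for row in responses:
        person_mean = mean(row)
        centered.append([x - person_mean for x in row])
    return centered


# Hypothetical raw responses (rows = respondents, columns = items, 1-5 scale).
raw = [
    [5, 4, 5, 4],  # respondent who tends to agree with everything
    [2, 1, 2, 1],  # respondent who tends to disagree with everything
    [1, 5, 2, 4],
]
centered = center_within_person(raw)
```

In this toy example, the first two respondents differ only in their overall level of agreement, so their centered response profiles become identical. This is the sense in which centering removes variance shared by all items.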

Descriptive Statistics and Correlations at the Item Level
The analysis of item descriptive and visual properties revealed some items to be somewhat easy (i.e., most of the participants rated themselves high on them). Those items included bfi2_agre_2, bfi2_agre_7, bfi2_agre_11, and bfi2_cons_9. These items displayed a limited answer range of three scale points (i.e., 2-5), meaning that no participant responded with totally disagree on these items. However, these items displayed no issues in any of the further analyses. Furthermore, when compared to the results of the descriptive analysis of the original items (Soto, 2022), some items, especially agreeableness items, seem to display extreme scores, making this a common characteristic rather than an issue specific to our adaptation. Therefore, a revision of these items, solely on the basis of high values, is not indicated. Also, considering that these items assess respect and care for others, the aspect of social desirability could explain why precisely these items scored so high.

Descriptive Statistics at the Facet and Domain Levels
Next, we examined the intercorrelations and descriptive statistics of the BFI-2 scales at the facet and domain levels. The results are presented in Tables 1, 2, and 3.
Twelve of 15 facet scales had a relatively high average score (≥ 3.42). The highest mean was that of the Respectfulness facet (M = 4.23), since two of its items (bfi2_agre_2 and bfi2_agre_11), mentioned before, were among the highest-scoring items. Accordingly, the domain mean scores were also high [Extraversion (3.49), Agreeableness (3.89), Conscientiousness (3.90), Negative Emotionality (2.81), Open-Mindedness (3.75)], and variances constrained (SDs ≤ 0.95), but comparable to the descriptive statistics of the original version (Soto, 2022; Soto & John, 2017, p. 128). Table 1 also presents the mean-level gender differences. Comparing differences across groups requires at least scalar invariance across groups (Sass, 2011; Vandenberg & Lance, 2000). We report measurement invariance tests in Supplementary Material 3. Eleven of 18 tested models showed scalar invariance. We, therefore, interpret only the results for which scalar measurement invariance held. The results suggest that women tend to characterize themselves as significantly more conscientious than men. At the facet level, women reported higher scores for Compassion, Respectfulness, Organization, Responsibility, Depression, and Emotional Volatility. For those facets for which scalar measurement invariance held, no effects could be observed for Assertiveness, Trust, and Creative Imagination. These results mostly converge with those reported for the original study (Soto & John, 2017, p. 128). However, due to the size of this sample and the much higher prevalence of women in it, the results may not illustrate the gender differences quite accurately.
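For readers unfamiliar with the effect size used for the gender differences, the following sketch computes Cohen's d with positive values indicating higher scores for women. It assumes the common pooled-standard-deviation variant, which may differ from the exact computation used for Table 1; the function name `cohens_d` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(women, men):
    """Standardized mean difference (women minus men), using the pooled
    standard deviation. Assumption: pooled-SD variant of Cohen's d."""
    n_w, n_m = len(women), len(men)
    pooled_var = (
        (n_w - 1) * variance(women) + (n_m - 1) * variance(men)
    ) / (n_w + n_m - 2)
    return (mean(women) - mean(men)) / sqrt(pooled_var)


# Hypothetical facet scores, not data from the study.
women_scores = [4.0, 3.5, 4.5, 4.0]
men_scores = [3.5, 3.0, 4.0, 3.5]
d = cohens_d(women_scores, men_scores)
```

Swapping the two groups flips the sign of d, which is why the direction of the comparison has to be stated alongside the values.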

Correlations at the Facet and Domain Levels
Tables 2 and 3 present interscale correlations among the BFI-2 facets and domains, respectively.
The absolute within-domain facet correlations (e.g., between Sociability and Assertiveness) ranged from .36 to .66, averaging .49 (an r-to-z transformation was applied before averaging), whereas absolute between-domain correlations (e.g., between Sociability and Intellectual Curiosity) ranged from .01 to .51, averaging only .19 (an r-to-z transformation was applied before averaging). At the domain level, absolute correlations between domains (e.g., Extraversion and Agreeableness) ranged from .16 to .39, averaging .28. Although some of the facets displayed high interdomain correlations, these results indicate overall good discriminant validity between the BFI-2 domains, as well as an appropriate level among the facets. Moreover, the average correlations are similar to those reported for the original study (Soto & John, 2017, pp. 125-126).
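Averaging correlations after an r-to-z transformation can be sketched as follows: each correlation is converted with Fisher's z (the inverse hyperbolic tangent), the z values are averaged, and the mean is transformed back. The function name `average_correlation` is hypothetical, and the two input values are simply the endpoints of the within-domain range reported above, used for illustration rather than as the full set of correlations.

```python
from math import atanh, tanh


def average_correlation(correlations):
    """Average correlations via Fisher's r-to-z transformation."""
    z_values = [atanh(r) for r in correlations]
    return tanh(sum(z_values) / len(z_values))


# Illustration with the smallest and largest within-domain correlations.
avg = average_correlation([0.36, 0.66])
naive = (0.36 + 0.66) / 2
```

For unequal correlations, the z-based average is slightly larger in absolute value than the simple arithmetic mean, because the transformation stretches values far from zero.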

Reliability Estimates
Tables 2 and 3 present reliability scores estimated using the psych package (Revelle, 2022) in R (R Core Team, 2022) for the BFI-2 facets and domains, respectively. As shown in Table 2, Cronbach's α estimates for the 15 facet scales ranged from .53 to .86, averaging .66, whereas ω ranged from .60 to .88, averaging .73. The α values suggest somewhat lower internal consistency on the facet level compared to the original study (Soto & John, 2017, p. 126), but similar to those of other BFI-2 adaptations (e.g., Danner et al., 2019; Gallardo-Pujol et al., 2022).
As shown in Table 3, the five domain scales had Cronbach's α scores ranging between .75 and .89, averaging .81, and ω scores ranging from .66 to .85, averaging .73. Both estimates indicate acceptable to good internal consistency on the domain level, with α scores similar to those reported for the original version (p. 125).
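As a reminder of what Cronbach's α measures, the sketch below implements the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of the sum score), in plain Python. The reported estimates were obtained with the psych package in R, so this is only an illustration; the function name `cronbach_alpha` and the four-item response matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items matrix whose
    negatively keyed items have already been reverse-keyed."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of responses per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical responses of five people to a four-item facet scale.
facet = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(facet)
```

Because α depends on the number of items k, short four-item facet scales tend to yield lower estimates than the 12-item domain scales even when the items are comparably intercorrelated.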

Confirmatory Factor Analyses
Next, we examined the structure of the BFI-2 by conducting a series of three different CFA models using the lavaan package (Rosseel, 2012) in R (R Core Team, 2022). The robust maximum likelihood estimator was used to estimate parameters that are robust to non-normality. Fit statistics for these models are provided in Table 4. To evaluate model fit, we focus on the comparative fit index (CFI) and standardized root-mean-square residual (SRMR) for models with small degrees of freedom and root-mean-square error of approximation (RMSEA) for models with many degrees of freedom.
The first single facet model included one factor representing a single facet scale within each of the Big Five domains. This model allowed four items to load on their corresponding facet factor, respectively, and no residual variances were allowed to correlate. In contrast to the original study, this model was added to explore item and facet properties and identify possible misfits before combining them in a larger model. The single facet model provided a good fit (CFIs ≥ .947, SRMRs ≤ .060) for Assertiveness, Energy Level, Compassion, Trust, Organization, Productiveness, Anxiety, Depression, Emotional Volatility, Aesthetic Sensitivity, and Intellectual Curiosity. In contrast, it did not provide a satisfactory fit for the remaining four facets (Sociability, Respectfulness, Responsibility, and Creative Imagination).
Note. Gender d = Cohen's d for the mean-level difference between men and women, with positive values indicating higher scores for women. Negatively keyed items were reverse-keyed before the computation. All entries were rounded to two decimal places. Differences of 0.31 or larger are significant at p < .01. a: We tested for measurement invariance across gender (Sass, 2011; Vandenberg & Lance, 2000). The results of the measurement invariance tests can be found in the corresponding Supplementary Material 3. Models marked with c passed configural invariance, those marked with l passed metric invariance, and those marked with i passed scalar invariance.
In most cases where the model fit was not satisfactory, modification indices suggested a substantial improvement by allowing residual variances of two same-keyed items to correlate. In the case of Respectfulness, for instance, we, therefore, allowed residual correlation between the negatively keyed items bfi2_agre_5_rec and bfi2_agre_8_rec, which led to improvement of the model fit. Since the residual correlation was positive, it is also possible to model it as an acquiescence factor with loadings fixed to one and uncorrelated with the other latent variables. Thus, in line with the original interpretation (Soto & John, 2017), it is plausible to assume that the residual correlation reflected the acquiescence factor at the facet level. We performed this for all four facets described above and attained a good fit (CFIs ≥ .988; SRMRs ≤ .036).
The second, three facets model, just like in the original study, included three factors representing the three facet scales within a Big Five domain. Each item was only allowed to load on a single facet factor, and the three facet factors were allowed to intercorrelate. No residual variances were allowed to correlate. Compared to the single facet model, this model combines single facet scales of each Big Five domain, respectively. For instance, Respectfulness, which was described before, was combined with Compassion and Trust from the single facet model to build the three-facet structure of the Agreeableness domain. This model provided an acceptable or near-acceptable overall fit for Extraversion (CFI = .880, RMSEA = .078), Negative Emotionality (CFI = .919, RMSEA = .068), and Open-Mindedness (CFI = .909, RMSEA = .070), while the fit for Agreeableness (CFI = .767, RMSEA = .094) and Conscientiousness (CFI = .893, RMSEA = .101) was unacceptable. Nevertheless, the results were similar to those of the original study (p. 133).
The final three facets plus acquiescence model added an acquiescence factor to the three facets model just described. Thus, each item was allowed to load on both its facet factor and an acquiescence method factor. All loadings on the acquiescence factor were constrained to one, and the acquiescence factor was not allowed to correlate with any of the facet factors. Such constraints ensure the distinction between acquiescence and meaningful personality content (Soto & John, 2017). In line with our expectations, compared with the previous model, this model provided an improvement in fit for each Big Five domain (ΔCFIs ≥ .016, ΔRMSEAs ≥ .006), further supporting our assumption made about the influence of acquiescence on the facet level. Moreover, it provided an acceptable overall fit for Extraversion (CFI = .914, RMSEA = .066), Negative Emotionality (CFI = .936, RMSEA = .062), Open-Mindedness (CFI = .925, RMSEA = .064), and Conscientiousness (CFI = .927, RMSEA = .084), as well as a near-acceptable fit for Agreeableness (CFI = .850, RMSEA = .076). The reason for the weaker fit for Agreeableness seems to lie in the items bfi2_agre_3_rec and bfi2_agre_6. Both items belong to the Trust facet but displayed a secondary loading on the Respectfulness facet. Nevertheless, compared with the results from the original study, the model fit for all domains was very similar, with small discrepancies depending on which fit indices one focuses on. Moreover, the same pattern of improvement from the model without the acquiescence factor emerged.
Note. N = 320. df = degrees of freedom; CFI = comparative fit index; TLI = Tucker-Lewis index; BIC = Bayesian information criterion; RMSEA [90% CI] = root-mean-square error of approximation with 90% confidence intervals; SRMR = standardized root-mean-square residual. Table entries correspond to the robust values of the MLR estimator output. CFI and TLI values ≥ .900, and RMSEA ≤ .080 and SRMR ≤ .060, are in bold for models with many or few degrees of freedom, respectively. a: Model in which residual correlations between two same-keyed items were allowed to improve the model fit. b: RMSEA is reported but not interpreted to evaluate models with few (i.e., 1 or 2) degrees of freedom (Kenny et al., 2015; Shi et al., 2022).
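The reported improvement in fit can be checked directly from the CFI and RMSEA values given in the text for the two domain-level models. The sketch below only recomputes the differences from those published values; the dictionary names are hypothetical.

```python
# (CFI, RMSEA) per domain, as reported in the text for each model.
three_facets = {
    "Extraversion": (0.880, 0.078),
    "Agreeableness": (0.767, 0.094),
    "Conscientiousness": (0.893, 0.101),
    "Negative Emotionality": (0.919, 0.068),
    "Open-Mindedness": (0.909, 0.070),
}
with_acquiescence = {
    "Extraversion": (0.914, 0.066),
    "Agreeableness": (0.850, 0.076),
    "Conscientiousness": (0.927, 0.084),
    "Negative Emotionality": (0.936, 0.062),
    "Open-Mindedness": (0.925, 0.064),
}

# Gain in CFI (higher is better) and drop in RMSEA (lower is better).
delta_cfi = {
    d: with_acquiescence[d][0] - three_facets[d][0] for d in three_facets
}
delta_rmsea = {
    d: three_facets[d][1] - with_acquiescence[d][1] for d in three_facets
}

smallest_cfi_gain = round(min(delta_cfi.values()), 3)
smallest_rmsea_drop = round(min(delta_rmsea.values()), 3)
```

The smallest gains across the five domains correspond to the lower bounds stated in the text, and the largest change in both indices occurs for Agreeableness and Conscientiousness, the two domains whose fit was unacceptable without the acquiescence factor.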

Principal Component Analyses
To examine the viability of using reverse-keyed items to deal with acquiescence, a series of three principal component analyses (PCAs) was conducted. The first PCA was performed on the 15 facet scores (calculated by averaging raw item responses) of the Big Five domains. Conforming with the assumption that content-balanced facet scales control for acquiescence (Soto & John, 2017), the parallel analysis suggested five components. Therefore, we extracted five varimax-rotated components from the facet scales, which, as expected, yielded a clear five-factor structure. The loadings from this analysis are presented in Table 5. All facet scales had their primary loadings on the intended component. Also, as reported for the original version, the highest secondary loadings were those of Depression on Extraversion and of Trust on Negative Emotionality, which hold conceptually meaningful associations. The extracted components could account for 57% of the observed variance.
To test for the effects of acquiescence at the item level, a PCA was conducted on the 60 raw items of our BFI-2 adaptation. The results were then compared to those of a further PCA, performed on the same items after within-person centering. As expected, the parallel analysis of the items without within-person centering suggested six components, indicating an additional acquiescence component. In contrast, the same analysis on the centered items proposed five components, thereby implying an effective elimination of the acquiescence component (see Figure 1).
We then extracted five varimax-rotated components from both the raw and centered items. The loadings of the PCA on the within-person-centered items are presented in Table 6.
Both analyses yielded a clear Big Five structure with most of the items (55 in the PCA of the raw and 54 of the centered items) having their primary loadings on the intended component (the factor loadings of the uncentered items are available in the Online Supplement). The polarity of all the loadings was analogous to that reported for the original version (pp. 130-131), as well as the item-keying (i.e., given the positive loadings of the positive-keyed items, the negative-keyed items had a negative loading and vice versa). Items bfi2_open_1_rec and bfi2_extr_2 had very low loadings altogether (λ ≤ .18), which was not surprising given their insufficient MSA coefficients. For the remaining items that did not display the expected pattern of loading, the difference between the primary and secondary loadings (i.e., the loadings on the intended component) was small [Δs (.01-.08)]. The extracted components could account for 42% of the item variance. Taken together, the results support the underlying assumption (Soto & John, 2017) that the within-person centering of the items eliminates some of the acquiescence variance.

Discussion
The present study provides an adaptation of the BFI-2 in the Croatian language and assesses its psychometric properties. The translation process consisted of three separate steps: (1) primary translation of the original BFI-2 (Soto & John, 2017), using the German adaptation (Danner et al., 2019) as an aid; (2) comparison and modification in relation to the Croatian translation created for ISP (Baranski et al., 2020); and (3) back-translation and further modification of the items. Every single item of our adaptation was extensively discussed and deliberated, with valuable input from a professional translator. The resulting adaptation of the BFI-2 was then tested on 320 adult Croatian participants. The collected data were subsequently used to estimate the psychometric quality through different analyses that were for the most part identical to those conducted for the original English version.

Item Characteristics
The descriptive analyses of items yielded no noticeable discrepancies that were relevant for the subsequent analyses. The examination of the item correlations indicated a substantive level of interitem correlation (KMO = .84; MSAs ≥ .65) for all but two items (bfi2_o-pen_1_rec and bfi2_extr_2), which consequently yielded low PCA loadings (λs ≤ .18). The cause of the poor properties of these items could be contextual. Our effort to deliver the most literate translation possible could have come at the expense of the appropriate item wording for the general population and thus unambiguous interpretation across individuals. This may especially apply to the item bfi2_extr_2 "Ima asertivnu osobnost" ["Has an assertive personality"] given it consists of two words (i.e., assertive and personality) that are essentially technical terms rather than words commonly used. It is therefore possible that this item was incomprehensible to some of our participants. If so, the variance of this item would not capture the individual differences in assertiveness but rather something else that is not substantial and could henceforth not covariate with the rest of the items. We, therefore, propose an initial, yet unvalidated, alternative translation that addresses these issues. Since there is no single Croatian word for "assertiveness," other than the anglicism (i.e., "asertivnost") already used in this item, we propose an item consisting of two adjectives that combined capture the essence of this trait: "Je samosvjestan I samouvjeren," which translates to "Is self-conscious and self-confident." On the other hand, the issue with the item bfi2_o-pen_1_rec "Ima malo umjetničkih interesa" ["Has few artistic interests"] is somewhat less apparent. Once again, the focus on the fidelity to the original item may have resulted in a slightly uncommon phrasing of the item and thus possible confusion. 
To preserve the term "artistic interests" rather than "interest in art," we were bound to use the word "malo" ["little"] as the only possible translation of the word "few," thereby possibly failing to convey its intended meaning ("some but not many") and thus creating additional interpretation-related variance. We, therefore, propose an alternative, yet unvalidated, translation that may be more appropriate: "Se ne zanima pretjerano za umjetnost" ["Is not overly interested in art"]. It is important to emphasize that, although the results suggest poor properties of these items and we provide alternative translations, the results may be sample-specific and do not imply that the items are fundamentally inadequate. For instance, it is likely that in a sample of psychology students, the item bfi2_extr_2 would perform better than the alternative, given that such a sample would be familiar with the terminology (i.e., assertive personality). We, therefore, recommend using both versions for future data collection and comparing the results for a more definite appraisal.
We then examined the descriptive statistics and gender differences of the Croatian BFI-2 facet and domain scores. The results for the complete sample showed that most facets and domains had relatively high scores paired with somewhat limited variance. We tested measurement invariance across gender to examine mean differences in the facet and domain scores. Comparing means across groups was possible for 11 of 20 tested models, as for these models, scalar invariance held (Sass, 2011; Vandenberg & Lance, 2000). Overall, we observed gender differences that are in line with previous findings reported for the original BFI-2 (Soto & John, 2017, p. 128).
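The KMO and MSA indices reported above for the item correlations can be illustrated with a minimal sketch. It assumes the standard definition based on squared correlations and squared partial (anti-image) correlations; the function name kmo_msa and the simulated data are hypothetical and are not taken from the study's materials.

```python
import numpy as np

def kmo_msa(data):
    """Return (overall KMO, per-item MSA) for a respondents x items array."""
    r = np.corrcoef(data, rowvar=False)       # item correlation matrix
    inv = np.linalg.inv(r)                    # inverse of the correlation matrix
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial correlations between item pairs
    np.fill_diagonal(r, 0.0)                  # only off-diagonal elements enter the index
    np.fill_diagonal(partial, 0.0)
    r2, p2 = r ** 2, partial ** 2
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    kmo = r2.sum() / (r2.sum() + p2.sum())
    return kmo, msa

# Simulated example (NOT the study data): five items sharing one factor
# plus one item that is pure noise.
rng = np.random.default_rng(0)
n = 320
factor = rng.normal(size=(n, 1))
items = 0.7 * factor + rng.normal(scale=0.7, size=(n, 5))
noise_item = rng.normal(size=(n, 1))
data = np.hstack([items, noise_item])

kmo, msa = kmo_msa(data)
print("KMO:", round(float(kmo), 2))
print("MSA per item:", np.round(msa, 2))
```

An item whose correlations with the remaining items are weak relative to its partial correlations receives a low MSA, which is the pattern described for the two problematic items.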

Reliability
The reliability estimates of the scales were, as already reported, satisfactory. Although some facets had comparatively low internal consistency estimates, the reliability of the domains was good. Note, however, that the facet-level internal consistency estimates are based on four items only and that the test-retest correlations may be higher. Moreover, a similar pattern could be observed for the German (Danner et al., 2019), Norwegian (Føllesdal & Soto, 2022), Slovak (Halama et al., 2020), and Spanish (Gallardo-Pujol et al., 2022) adaptations, making it a common difficulty of cross-cultural adaptation of such questionnaires, rather than an issue specific to this adaptation. The scales may therefore be used for research purposes and the examination and comparison of groups, although they are most likely insufficiently reliable for individual-level analyses (Emons et al., 2007).
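The dependence of internal consistency on scale length can be sketched as follows. The example computes Cronbach's alpha and applies the Spearman-Brown formula under the assumption of parallel items; the function names and the value of .60 are illustrative assumptions, not estimates from this study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()    # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1 - item_var / total_var)

def spearman_brown(reliability, factor):
    """Predicted reliability when a scale is lengthened by `factor`."""
    return factor * reliability / (1 + (factor - 1) * reliability)

# Sanity check: four identical items are perfectly consistent.
x = np.array([1.0, 2.0, 4.0, 5.0, 3.0])
alpha_identical = cronbach_alpha(np.column_stack([x, x, x, x]))

# Hypothetical four-item facet with reliability .60, lengthened to the
# 12 items of a domain (factor 3), assuming parallel items.
predicted_domain = spearman_brown(0.60, 3)
print(round(predicted_domain, 2))
```

Under these assumptions, a modest four-item estimate is compatible with a clearly higher estimate for a 12-item domain, which matches the pattern of lower facet-level and higher domain-level reliability.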

Structural Validity
The results of the series of CFAs provided evidence for the goodness of the Big Five model as the underlying theoretical structure of the data. This means that the translated version of the BFI-2 assesses five factors with three facets each. Although some fit indices were lower compared to the traditional standards by Hu and Bentler (1999), they were nevertheless comparable to the results provided for the original version (Soto & John, 2017, p. 133) and in line with expectations when fitting complex models to personality self-report data (Hopwood & Donnellan, 2010). Two items, bfi2_agre_3_rec "Je sklon kritiziranju drugih" ["Tends to find fault with others"] and bfi2_agre_6 "Je pomirljive naravi, lako oprašta" ["Has a forgiving nature"], had cross-loadings on other factors: Both items belong to the Trust facet but displayed a secondary loading on the Respectfulness facet. Although post hoc explanations are possible, it is essentially unclear why these cross-loadings emerged. Yet, we believe this prevents one from using neither the facet scores (given the good model fit of both facets, Trust and Respectfulness) nor the domain score of Agreeableness, to which both items belong. However, future applications of this adaptation should examine whether this also occurs in other samples, which, in the long run, might call for item revision.

Acquiescence
The results of the principal component analyses provide further evidence for the robust five-factor structure of the Croatian BFI-2 items and the different ways of dealing with acquiescence. For item-level analyses, the within-person centering of the items seems adequate. For facet-level analyses, the items can be averaged to form a composite reflecting the facet scores. Finally, the CFAs also show that including the acquiescence factor increases model fit (as do correlated residuals of the same-keyed items in measurement models). Overall, this series of analyses shows that acquiescence occurs in the data but can effectively be dealt with using the approaches implemented by Soto and John (2017).
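Within-person centering can be sketched as follows. The sketch assumes that the item set is balanced in keying, so that a respondent's mean raw response (before any reverse-scoring) reflects acquiescence rather than trait content; the function name and the toy responses are hypothetical, and the exact procedure used for the present data may differ in detail.

```python
import numpy as np

def center_within_person(raw):
    """Subtract each respondent's mean raw response from all of their items.

    `raw` is a respondents x items array of responses BEFORE reverse-scoring,
    assumed to come from a balanced set of true- and false-keyed items.
    """
    raw = np.asarray(raw, dtype=float)
    acquiescence = raw.mean(axis=1, keepdims=True)   # per-person response level
    return raw - acquiescence, acquiescence.ravel()

# Toy responses on a 1-5 scale (NOT study data).
raw = np.array([
    [5, 5, 5, 5],   # agrees with everything
    [3, 3, 3, 3],   # uses only the scale midpoint
    [1, 5, 2, 4],   # differentiates between items
])
centered, acquiescence = center_within_person(raw)
print(centered)
print(acquiescence)
```

After centering, the first two respondents have identical (all-zero) profiles, because their responses differ only in overall agreement level. Reverse-keyed items would then be reflected by changing the sign of their centered values.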

Limitations and Future Directions
Importantly, this study does not provide any convergent or discriminant validity evidence with respect to other (personality) self-reports or other modes of assessment (e.g., informant reports). However, the facets of the same domain (i.e., within-domain) correlated, on average, more highly with each other than they did with facets of other domains (i.e., between-domain), which provides some evidence of discriminant validity. Furthermore, the factor structure emerged as expected, providing evidence for structural validity. Finally, the gender differences, as far as they could be interpreted, were also in line with previous research, which further corroborates our conclusion that the adaptation was successful. Therefore, it can safely be concluded that this translated version captures the Big Five personality traits reasonably well. Nevertheless, the expansion of the nomological net of these traits by adding criterion measures (e.g., important life outcomes, such as income, well-being, health) or further convergent and discriminant measures (e.g., other personality inventories in Croatian, such as IPIP Big Five markers [Mlačić & Goldberg, 2007] and peer reports) is desirable in future research. This study allows for this by providing a well-translated and validated version of the BFI-2, including open data and open materials for further analysis. As the scale is increasingly used and applied, the evidence regarding its usefulness and practicality will continue to accumulate. We anticipate that this open access version will expedite this process and facilitate wider adoption.
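The comparison of within- and between-domain facet correlations mentioned above can be sketched as follows. The sketch averages raw correlations, whereas averaging absolute or Fisher-transformed values is an equally plausible choice; the function name, the two-domain layout, and the simulated scores are assumptions for illustration only.

```python
import numpy as np

def within_between_means(facet_scores, domains):
    """Mean facet correlation within the same domain and between domains."""
    r = np.corrcoef(facet_scores, rowvar=False)
    domains = np.asarray(domains)
    same = domains[:, None] == domains[None, :]            # same-domain facet pairs
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)   # count each pair once
    within = r[same & upper].mean()
    between = r[~same & upper].mean()
    return within, between

# Simulated example (NOT study data): two independent domains, three facets each.
rng = np.random.default_rng(1)
n = 320
factors = rng.normal(size=(n, 2))
scores = np.repeat(factors, 3, axis=1) + rng.normal(size=(n, 6))
labels = ["E", "E", "E", "A", "A", "A"]

within, between = within_between_means(scores, labels)
print(round(float(within), 2), round(float(between), 2))
```

A within-domain mean that clearly exceeds the between-domain mean is the pattern interpreted above as preliminary evidence of discriminant validity.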

Conclusion
The present study reports the translation of the BFI-2 into Croatian and the evaluation of the psychometric properties of the resulting adaptation. Based on the results of descriptive and correlational analyses, principal component analyses, and CFAs, as well as reliability estimates, this study provides evidence for the structural and construct validity and the internal consistency of the scores, consistent with those reported for the original English version and the Big Five factor model. Thus, the current adaptation of the BFI-2, along with its corresponding open access materials and data, could be useful and suitable for research in the Croatian population and for international, cross-cultural research.
We thank Dubravka Marić for her professional input and support in the translation process. Any errors that remain are solely the authors' responsibility.

Open Science
Open Data: The information needed to reproduce all of the reported results is available at https://osf.io/ha268/ (Hausding & Horstmann, 2023).
Open Materials: The information needed to reproduce all of the reported methodology is available at https://osf.io/ha268/ (Hausding & Horstmann, 2023).
Preregistration and Analysis Plan: This study was not pre-registered.