Introduction

In recent decades, mentalization (or mentalizing), also referred to as reflective functioning (RF), has emerged as a prominent empirical topic attracting steadily growing interest. The term “mentalization” was first used in the 1960s and 1970s as a clinical psychoanalytic concept mostly related to psychosomatic states and conditions (Bion, 1962; Marty, 1991). More recently, it was defined as “the mental process by which an individual implicitly and explicitly interprets the actions of himself and others as meaningful based on intentional mental states such as personal desires, needs, feelings, beliefs, and reasons” (Bateman & Fonagy, 2004, p. XXI). This process is regarded as pivotal in both emotional and cognitive development. It is closely linked to issues such as aggressiveness, delinquency, substance abuse, and various mental disorders (for reviews, see Luyten et al., 2020; Johnson et al., 2022; see also Chevalier et al., 2023).

During the early 1990s, Peter Fonagy and his colleagues developed a tool aimed at assessing individuals’ capacity to reflect on their attachment experiences. This tool, known as the Reflective Functioning Scale (RFS; Fonagy et al., 1998), remains the gold standard in mentalization research, valued for its reliability, validity, and well-established structure (Tauber et al., 2013). However, as some authors have stressed (e.g., Hüwe et al., 2023), the RFS is very demanding to use, as it requires training and certification. The Reflective Functioning Questionnaire (RFQ; Fonagy et al., 2016) was expected to address these concerns, offering promise due to its concise eight-item format and convenient online scoring system.

Recent research by Müller et al. (2022) highlights significant issues with the validity of the RFQ-8. While it aims to assess individuals’ ability to understand both their own and others’ behavior through intentional mental states, it predominantly focuses on self-understanding rather than providing a comprehensive view of both self and others. Consequently, the RFQ-8 fails to capture the multifaceted nature of mentalization adequately. Moreover, its two subscales only address extremes of certainty and uncertainty, representing a limited portion of the broader mentalization spectrum. Further analysis reveals a unifactorial rather than a bifactorial structure. In response, Horváth et al. (2023) introduced the RFQ-7, offering a streamlined questionnaire and innovative scoring system to address these limitations. This unidimensional tool measures a single dimension of hypomentalization, spanning from low to high uncertainty levels.

A large body of studies demonstrated that poor mentalization is associated with several mental disorders, including borderline and antisocial personality disorders (Fonagy et al., 2016; Perroud et al., 2017), depression (Luyten et al., 2012), and eating disorders (Pedersen et al., 2015; Skårderud, 2007a, b; see also Fonagy et al., 2016). Additionally, a deficit in the capacity to “hold mind in mind” is associated, among others, with substance abuse (Allen et al., 2008; Lecointe et al., 2016; Möller et al., 2017; Suchman et al., 2018), gambling disorder (Ciccarelli et al., 2021, 2022a, b; Cosenza et al., 2019; Lindberg et al., 2011; Nigro et al., 2019; Spada & Roarty, 2015), as well as with other forms of out-of-control behaviors, such as sexual (Berry & Berry, 2014) and food addiction (Innamorati et al., 2017).

However, it remains unclear to what degree these associations specifically indicate a deficit in self-understanding rather than a broader deficiency in mentalizing. The RFQ-8, with its focus predominantly on self-awareness, save for one item, leaves ambiguity regarding whether the aforementioned connections primarily signify a lack of self-comprehension or a general impairment in mentalizing abilities (Müller et al., 2022; see also Müller et al., 2023).

Other self-report measures assessing mentalization have emerged concurrently with the Reflective Functioning Questionnaire. The first was the Mentalization Questionnaire (MZQ; Hausberg et al., 2012), a 15-item scale featuring four subscales. However, its reliability ranges between 0.57 and 0.68, and its Italian translation diverged notably from the original version, indicating a potentially unidimensional structure rather than the intended four-dimensional framework (as observed in Ponti et al., 2019). Subsequently, the Mentalization Scale (MentS; Dimitrijević et al., 2018) gained widespread international use and consistently demonstrated robust performance with minimal observed shortcomings.

The MentS consists of 28 self-report items rated on a 5-point Likert scale from completely agree to completely disagree. Higher scores indicate a more advanced capacity for mentalization. Respondents typically take approximately 10 minutes to complete the assessment.

In the large community sample used for validation, the MentS showed good whole-scale reliability (Cronbach’s alpha = 0.84), with subscale reliabilities of 0.76 (MentS-S) and 0.77 (MentS-O and MentS-M). It also demonstrated strong convergent-discriminant validity by exhibiting meaningful correlations with related constructs and fundamental personality traits. Moreover, the scale effectively differentiated between individuals with borderline personality disorder and controls. In the clinical sample, internal consistencies were acceptable for all subscale scores except MentS-M (for specific details, refer to Dimitrijević et al., 2018).

Since its introduction in 2018, the MentS has garnered considerable attention, prompting validation studies across various linguistic contexts. These studies have encompassed translations into Chinese (Wen et al., 2022), Persian (Farsi; Ahmadian & Ghamarani, 2021; Asgarizadeh et al., 2023), Japanese (Matsuba et al., 2022), Korean (Surim & Munhee, 2018), Polish (Jańczak, 2021), and Turkish (Törenli Kaya et al., 2023). Translations into Catalan, German, and Spanish are currently underway.

Furthermore, the scale has been employed in various peer-reviewed research studies conducted across multiple languages, including French (Francoeur et al., 2020), Hindi (Bhola & Mehrotra, 2021), Hungarian (Fekete et al., 2019), Lithuanian (Gervinskaitė-Paulaitienė et al., 2023), Norwegian (Brattland et al., 2022), and Serbian (Berleković & Dimitrijević, 2020).

Several studies suggest that the MentS serves as a valid and reliable self-report tool for assessing mentalization. It allows efficient assessment of sizable community samples and has proved advantageous in clinical research. Across these studies, the three-factor structure of the MentS consistently emerged, with a few individual items occasionally loading on unintended factors. Test-retest reliability coefficients typically ranged from 0.68 to 0.85. Cronbach’s alphas for the overall scale varied between 0.73 and 0.86, except for the Turkish version, where alpha was 0.63. Subscale alphas ranged predictably lower yet remained between 0.74 and 0.80. Notably, the correlation between scores from the Reflective Functioning Scale (RFS) and the MentS was 0.65 (p < 0.01), indicating a significant relationship. Moreover, correlations between individual subscales varied between 0.41 and 0.56, remaining statistically significant in all three cases (Richter et al., 2021).

Considering the substantial evidence supporting its robustness and utility, we opted to validate the MentS in Italian. This paper presents comprehensive details of our validation study.

Overview of studies

The current research aimed to explore the psychometric properties of the Italian adaptation of the MentS through two studies. Initially, the MentS underwent translation into Italian, following the meticulous procedure outlined by Beaton et al. (2000), involving forward and backward translation, as well as pilot testing. Participants were drawn from both adult and adolescent populations.

Study 1 focused on assessing the construct validity of the Italian version of the MentS in adolescents and adults, utilizing exploratory and confirmatory factor analyses. In Study 2, the convergent validity and temporal stability of the Italian MentS were examined. Specifically, Study 2 delved into evaluating the correlation between the MentS and the Reflective Functioning Questionnaire (RFQ-8; Fonagy et al., 2016) in a substantial cohort of high-school students. Additionally, it gauged the 4-week test-retest reliability of the instrument among undergraduates. For both studies, we have reported the actual number of participants. Incomplete questionnaires (approximately 2% for both samples) were excluded from the final samples.

Consistent with the original MentS version, our expectation was to reproduce the scale’s three-dimensional structure and to observe gender differences in both adult and adolescent samples. In addition, we expected significant correlation between MentS and RFQ-8 scores among the adolescent sample.

All studies were carried out in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Department of Psychology of the first author’s university. Before participation, all subjects provided informed consent. For minors, informed consent was obtained from parents.

Study 1

Method

Participants

In Study 1, a total of 1338 participants from both adolescent and adult cohorts were involved. The adolescent subset encompassed 618 high school students (48.5% boys; Mage = 17.67; SD = 0.53) attending various public high schools, including lyceums and technical and trade schools in Southern Italy. These participants were randomly divided into two groups of equal size, with the first group (151 boys and 158 girls; Mage = 17.61; SD = 0.70) used for the exploratory factor analysis (EFA) and the second group (149 boys and 160 girls; Mage = 17.73; SD = 0.65) for the confirmatory factor analysis (CFA).

The adult cohort consisted of 720 volunteers (42.4% men), ranging in age from 20 to 65 years (Mage = 38.28; SD = 14.69), recruited from a community-based population. Like the adolescent group, this sample was randomly split into two equivalent groups. The first adult group (151 males and 209 females; Mage = 39.17; SD = 14.36) underwent exploratory factor analysis (EFA), while the second group (154 males and 206 females; Mage = 37.40 years; SD = 14.98) participated in the confirmatory factor analysis (CFA).

Measures

The Mentalization Scale. The MentS is structured into three distinct subscales, namely: The Self-related Mentalization subscale (MentS-S), the Other-related Mentalization subscale (MentS-O), and the Motivation to Mentalize subscale (MentS-M).

The MentS-S scale comprises eight items that center on the individual’s perception of their ability to comprehend their own mental states (e.g., 18. “I find it difficult to admit to myself that I am sad, hurt, or afraid”; 22. “It is difficult for me to find adequate words to express my feelings”). The MentS-O dimension consists of ten items aimed at gauging the individual’s confidence in understanding the mental states of others (e.g., 10. “I can make good predictions of other people’s behavior when I know their beliefs and feelings”; 20. “I can describe significant traits of people who are close to me with precision and in detail”). Finally, the MentS-M subscale encompasses ten items aimed at assessing the individual’s inclination towards utilizing their capacity for mentalizing and how significant this mentalizing ability is to them (e.g., 7. “When someone annoys me, I try to understand why I react in that way”; 17. “I like reading books and newspaper articles about psychological subjects”).

Statistical analyses

Data analyses were performed using IBM SPSS version 29.0. The significance threshold was set at p < 0.05. Initially, all variables underwent scrutiny for missing data, distribution irregularities, and outlier identification. Univariate analysis of variance (ANOVA) was employed to examine gender differences in the data.

For both the adolescent and adult samples, scores obtained from the Italian version of the MentS underwent a principal components analysis followed by Oblimin rotation with Kaiser normalization. Before conducting the analyses, three key indices, as recommended by Field (2013), were assessed to ensure the data’s suitability for factor analysis. These included the Kaiser-Meyer-Olkin measure (KMO) of sampling adequacy, the determinant of the correlation matrix to detect multicollinearity, and Bartlett’s test of sphericity. Bartlett’s test specifically evaluates the null hypothesis that the original correlation matrix is an identity matrix (Field, 2013, p. 695). Confirmatory factor analysis was carried out using the EQS 6.2 structural equation modeling program (Bentler, 2008).

Results

Initially, to explore potential gender-based differences in MentS scores, univariate ANOVA was employed. As expected, results from both the adolescent and adult samples revealed noteworthy disparities: males attained significantly higher scores in the Self dimension, whereas females exhibited superior performance in the Others and Motivation dimensions, along with the overall MentS score.

Table 1 displays descriptive statistics for the entire samples as well as breakdowns by gender, along with Cronbach’s alpha values and the results of the univariate ANOVA. The reliability of the MentS subscales was confirmed for both the adolescent and adult samples, as evidenced by the Cronbach’s alpha values reported in Table 1.

Table 1 Study 1– Descriptive statistics, reliabilities, and gender differences

We initially calculated Cronbach’s alpha coefficients to facilitate comparisons with the original version of the MentS and its subsequent adaptations. However, to obtain a more accurate measure of reliability, we also computed the omega coefficient (ω; McDonald, 1999) for each subscale and the total score. Unlike Cronbach’s alpha, this coefficient incorporates both item factor loadings and uniquenesses, yielding a more precise estimate of reliability. As Table 1 shows, internal consistency as measured by omega coefficients was good for the full scale and the subscales of the MentS.
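The difference between the two coefficients can be made concrete with a minimal sketch. It assumes the standard textbook formulas: alpha from item and total-score variances, and omega from the standardized loadings and uniquenesses of a one-factor model. The function names and the toy data are hypothetical and are not taken from the study; the analyses reported here were run in SPSS.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def mcdonald_omega(loadings, uniquenesses):
    """Omega for a one-factor model: (sum of loadings)^2 over total modeled variance."""
    if len(loadings) != len(uniquenesses):
        raise ValueError("one uniqueness is required per loading")
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


# Hypothetical responses of five people to a three-item subscale (1-5 Likert).
toy_responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(toy_responses)

# Hypothetical standardized loadings; uniqueness = 1 - loading^2.
toy_loadings = [0.7, 0.6, 0.8]
omega = mcdonald_omega(toy_loadings, [1 - l ** 2 for l in toy_loadings])
```

Alpha weights every item equally, whereas omega lets items with larger loadings contribute more, which is why the two can diverge when loadings are unequal.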

Exploratory factor analysis

In both samples, the Kaiser-Meyer-Olkin (KMO) values were notably high (Adolescents = 0.822; Adults = 0.861). The determinant of the correlation matrix was consistent at 0.001 for both groups, and Bartlett’s test of sphericity returned significant results (Adolescents: χ2(378) = 2016.64; p < 0.001; Adults: χ2 (378) = 2378.76; p < 0.001). These outcomes signified sufficiently large correlations, supporting the suitability of the data for Principal Component Analysis (PCA).
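The degrees of freedom reported for Bartlett’s test can be checked with a short sketch. It assumes the usual form of the statistic, χ2 = -[(n - 1) - (2p + 5)/6] ln|R|, with p(p - 1)/2 degrees of freedom, where n is the sample size, p the number of items, and |R| the determinant of the correlation matrix. The function name is hypothetical. Because the determinant is reported only as a rounded value (0.001), the sketch cannot reproduce the exact χ2 values above; it only shows how the quantities relate.

```python
import math


def bartlett_sphericity(det_r, n, p):
    """Return (chi-square, degrees of freedom) for Bartlett's test of sphericity.

    det_r: determinant of the p x p correlation matrix (must be > 0)
    n:     number of respondents
    p:     number of items
    """
    if det_r <= 0:
        raise ValueError("the determinant must be positive")
    df = p * (p - 1) // 2
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(det_r)
    return chi2, df


# 28 MentS items; 309 adolescents and 360 adults in the EFA subsamples.
# The determinant 0.001 is the rounded value given in the text.
chi2_adolescents, df_adolescents = bartlett_sphericity(0.001, 309, 28)
chi2_adults, df_adults = bartlett_sphericity(0.001, 360, 28)
```

With 28 items the degrees of freedom are 28 × 27 / 2 = 378, matching the value reported for both samples.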

In each case, the determination of retained factors relied on parallel analysis, conducted using the SPSS syntax developed by O’Connor (2000). Parallel analysis consistently indicated a three-component solution as the most appropriate for both samples.

In the adolescent sample, three factors collectively explained 35.95% of the variance. The first factor accounted for 18.29% of the variance, the second factor accounted for 11.55%, while the third factor explained 6.11% of the variance. Table 2 presents individual item loadings on these retained components. Notably, the first factor encompassed the ten items of the MentS Others subscale, the second factor comprised the eight items of the MentS Self scale, and the third factor consisted of the ten items of the MentS Motivation scale.

Table 2 Pattern matrix for the Mentalization Scale (MentS)– Adolescent sample

Regarding the adult sample, the three-factor solution accounted for a cumulative variance of 36.07%. The distribution of variance across these factors was as follows: the first factor explained 21.41%, the second factor accounted for 9.27%, and the third factor elucidated 5.39%. These relationships are detailed in Table 3, which displays the subscale loadings across the three dimensions. Factor loadings distinctly revealed the composition of each factor: the first factor encapsulated the ten items of the MentS Others subscale, the second factor encompassed the eight items of the MentS Self scale, and the third factor included the ten items of the MentS Motivation dimension.

Table 3 Pattern matrix for the MentS– Adult sample

Confirmatory factor analysis

Confirmatory Factor Analysis (CFA) utilizing maximum likelihood estimation was employed to examine the reproducibility of the proposed factor structure outlined by Dimitrijević et al. (2018) in adolescent and adult Italian samples.

In both samples, three models underwent testing. The initial model was a one-factor structure where all items were anticipated to load onto a single factor. The second model consisted of three factors with no correlation among them, while the third model allowed for intercorrelation between the factors.

Each model’s goodness of fit was assessed using the likelihood ratio chi-square test statistic, adjusted for data nonnormality using Satorra and Bentler’s (1994) method (S-B χ2), alongside four descriptive fit indices: the standardized root-mean-square residual (SRMR), the root-mean-square error of approximation (RMSEA) with its 90% confidence interval (90% CI), the goodness of fit index (GFI), and the comparative fit index (CFI). Considering the sensitivity of the χ2 statistic to sample size (MacCallum, 1990; Marsh et al., 1988), interpretations of model fit were guided by a range of fit indices. Adequate model fit was indicated by a non-significant S-B χ2, GFI and CFI values of 0.90 or higher, and an RMSEA below 0.08.
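As an illustration of how these cutoffs combine, the sketch below computes RMSEA from a chi-square value and applies the descriptive thresholds stated above. It assumes the common point-estimate formula RMSEA = sqrt(max(χ2 - df, 0) / (df (N - 1))); some programs use N rather than N - 1, and the exact variant applied to the S-B scaled statistic in EQS may differ. The function names and input values are hypothetical, and the significance test on the S-B χ2 is left out.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_descriptive_cutoffs(gfi, cfi, rmsea_value):
    """Apply the descriptive criteria named in the text: GFI, CFI >= 0.90; RMSEA < 0.08."""
    return gfi >= 0.90 and cfi >= 0.90 and rmsea_value < 0.08


# Hypothetical fit results for one model in a subsample of 309 respondents.
example_rmsea = rmsea(chi2=520.0, df=347, n=309)
example_ok = meets_descriptive_cutoffs(gfi=0.91, cfi=0.92, rmsea_value=example_rmsea)
```

A model whose chi-square does not exceed its degrees of freedom receives an RMSEA of zero under this formula, which is why the index is read together with the other fit measures rather than alone.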

Table 4 presents the model fit statistics for the three models across both groups. The model exhibiting the highest GFI and CFI estimates while displaying the lowest RMSEA and SRMR values was considered the most suitable or best-fitting model.

Table 4 Confirmatory factor analysis fit indexes for alternative models

Discussion

The study aimed to assess the reliability of the MentS and examine its factor structure within substantial cohorts of adolescents and adults. Reliability analysis demonstrated that the MentS subscales exhibit good internal consistency. Moreover, outcomes from exploratory factor analysis notably indicated that the three-factor model appropriately captured a substantial proportion of variance, reflected in strong factor loadings.

In line with Dimitrijević et al. (2018), both exploratory and confirmatory factor analyses consistently supported a three-factor structure for the Italian version of the MentS across adolescent and adult populations. Additionally, as anticipated, gender differences in MentS scores were observed in both samples. Specifically, males attained significantly higher scores in the Self dimension, while females reported higher scores on the Others and Motivation dimensions, as well as on the overall MentS score.

Study 2

Study 2 was undertaken to test the convergent validity and temporal stability of the Italian version of the MentS. A large sample of adolescents was administered the MentS and the Reflective Functioning Questionnaire (RFQ-8; Fonagy et al., 2016; Italian version for adolescents: Cosenza et al., 2019; see also Bizzi et al., 2022). Furthermore, the test-retest reliability of the instrument was evaluated using a 4-week interval between measurements on a sample of undergraduate students.

Method

Participants

Four hundred and seventy-two adolescents (44.1% males), aged between 16 and 19 years (Mean age = 17.63; SD = 0.72), participated in this study. They were administered the Italian versions of the MentS and the RFQ-8. The RFQ-8, an eight-item self-rating questionnaire, is specifically designed to assess reflective functioning. Respondents rate items on a seven-point Likert scale, ranging from 1 (strongly disagree) to 7 (strongly agree). The questionnaire comprises two subscales that tap into distinct mental processes: Certainty about mental states (RFQ_C) and Uncertainty about mental states (RFQ_U). Low agreement on the RFQ_C scale denotes a tendency toward excessive yet inaccurate mentalizing (hypermentalizing), while higher agreement signifies a more authentic mentalizing approach. Similarly, very high scores on the RFQ_U indicate a near absence of knowledge about mental states (hypomentalizing), whereas lower scores reflect recognition of the complexity of one’s own and others’ mental states, indicative of genuine mentalizing.

Zero-order correlations between the three dimensions of the MentS and the two subscales of the RFQ-8 were computed.

Additionally, a new sample of 128 undergraduates (24.2% males), aged between 20 and 29 years (Mean age = 21.22; SD = 1.56), completed the MentS twice to assess the scale’s 4-week test-retest reliability.
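Both the zero-order correlations and the test-retest coefficients can be illustrated with the same computation. The sketch below assumes they are Pearson product-moment correlations, which the reported r values suggest but the text does not state explicitly. The function name and the scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical MentS total scores for six students at test and 4-week retest.
time_1 = [98, 105, 87, 112, 93, 101]
time_2 = [96, 108, 90, 109, 95, 99]
retest_r = pearson_r(time_1, time_2)
```

For convergent validity the same function would take subscale scores from the two instruments, for example MentS-Self and RFQ_C, instead of two administrations of one scale.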

Results

Results showed a strong positive correlation between MentS-Self and RFQ-8 Certainty scale scores (r = 0.44; p < 0.001), as well as a significant negative association between MentS-Self and RFQ-8 Uncertainty scale (r = -0.43; p < 0.001).

As for the temporal stability, the Italian version of MentS demonstrated an acceptable 4-week test-retest reliability for the three dimensions of the instrument, as well as for the full scale (MentS-Self: r = 0.63; p < 0.001; MentS-Others: r = 0.65; p < 0.001; MentS-Motivation: r = 0.63; p < 0.001; MentS full scale: r = 0.83; p < 0.001).

Discussion

Study 2 aimed to assess the convergent validity and temporal stability of the Italian version of the MentS among adolescents and undergraduates, respectively. The results indicated good convergent validity with the Reflective Functioning Questionnaire (RFQ-8) and acceptable 4-week test-retest reliability.

While the RFQ-8 is designed to assess an individual’s capacity to understand intentional mental states within themselves and others (Fonagy et al., 2012; Luyten et al., 2020), our study revealed a strong correlation between RFQ-8 scores and MentS-Self scores. This outcome underscores a distinct connection between reflective functioning and self-awareness, while revealing no such association with other dimensions of the MentS. These findings suggest that the RFQ-8 may particularly emphasize the comprehension of one’s own mental states rather than those of others. This aligns with previous research (e.g., Dimitrijević et al., 2018; Müller et al., 2022, 2023), highlighting the significance of introspective abilities in reflective functioning. Additional support for this notion can be found in the work of Li, Carragher, and Bird (2020).

General discussion

In recent decades, mentalization (also known as mentalizing) has surged as a prominent empirical field, steadily gaining heightened attention and interest. Imbalance in the ability to perceive and interpret both the self and others’ behavior in terms of intentional mental states, such as thoughts, feelings, desires, wishes, goals, and attitudes (Fonagy et al., 2012), has received significant attention over the past years (for a review, see Luyten et al., 2020).

Research exploring the significance of mentalization in psychopathology is rapidly expanding, reflecting an increasing interest in comprehending its implications. The present studies contributed to this ongoing line of research by developing and testing an Italian version of the MentS scale, a 28-item self-report measure of mentalization.

An initial measurement study (Study 1) employing exploratory and confirmatory factor analyses on large samples of adolescents and adults revealed support for the three-correlated factors model postulated by Dimitrijević et al. (2018). Study 2 was devoted to testing the convergent validity and temporal stability of the MentS. The results obtained from a sample of adolescents demonstrated that the MentS shows a good convergent validity with the Reflective Functioning Questionnaire (RFQ-8). In addition, results from a sample of undergraduates showed that the Italian version of MentS demonstrates good test-retest reliability for the three dimensions of the instrument and the full scale.

Notably, in all studies, we observed significant differences in MentS scores due to gender. In both adult and adolescent samples, male participants scored significantly higher on the MentS-Self dimension but significantly lower on the Motivation and Others subscales, as well as on the MentS total score. This outcome aligns with the conclusions drawn by Dimitrijević et al. (2018), which highlighted a superior proficiency in understanding one’s mental states among males. Conversely, females exhibited greater confidence in grasping the mental states of others and expressed a greater need to understand the psychic world of self and others. Our findings suggest that gender affects mentalization, albeit in a differentiated manner depending on the specific dimension under consideration. The use of a multidimensional measurement approach made it possible to capture crucial nuances that might otherwise have been overlooked. As emphasized by Krach et al. (2009), the longstanding hypothesis that women differ from men in their mentalizing abilities underscores the importance of employing measurement tools capable of capturing diverse facets of mentalization when evaluating gender differences.

Limitations and future research

While the current studies advanced work on instruments assessing mentalization, at least two limitations should be considered. Firstly, our studies rely on convenience samples. Secondly, the estimation of test-retest reliability was conducted on an undergraduate sample with a notably higher percentage of females than males, potentially affecting the representation of gender in the results.

Future research evaluating the extent to which the three subfactors differentially predict outcomes in substantive domains is desirable. The use of a multidimensional instrument, such as the MentS, could help in clarifying this relevant issue and test intervention strategies focused on recovering the capacity to understand others and oneself in terms of internal mental states, always bearing in mind the various dimensions of mentalization. Furthermore, future research ought to persist in examining gender differences associated with mentalization across both normative and clinical populations, encompassing not only adolescents and adults but also older individuals, a demographic that has received limited attention in previous studies.