Key insights from studies on the stability of personality disorders in different age groups

While temporal stability has for decades been conceived as a defining feature of personality disorders (PDs), cumulative findings appear to question the stability of PDs and PD symptoms over time. However, stability itself is a complex notion, and findings are highly heterogeneous. Building on the literature search of a systematic review and meta-analysis, this narrative review aims to capture key findings in order to derive critical implications for both clinical practice and future research. Taken together, this narrative review revealed that, contrary to previous assumptions, stability estimates in adolescence are comparable to those in adulthood, and PDs and PD symptoms are less stable than previously assumed. The extent of stability, however, depends on various conceptual, methodological, environmental, and genetic factors. Although findings were thus highly heterogeneous, they all seem to converge on a notable trend towards symptomatic remission, except in high-risk samples. This challenges the current understanding of PDs in terms of disorders and symptoms and argues instead in favor of the Alternative Model of Personality Disorders (AMPD) and the ICD-11, which reintroduce the idea of self and interpersonal functioning as the core feature of PDs.


Introduction
Traditionally conceived as a defining feature of personality disorders (PDs), stability has quickly become a major concern, adding to the ongoing debate about how PDs should be conceptualized and diagnosed. For decades, temporal stability has been a major factor in distinguishing axis I from axis II disorders, with PDs considered to be more stable than other mental disorders. Cumulative findings, however, have gradually challenged the stability of PDs, indicating a notable trend towards improvement over time (1,2). Contrary to previous assumptions, PDs have not been found to be much more stable than other mental disorders (3). Nevertheless, stability is a complex notion that should be assessed in the light of several factors (4,5). For instance, PDs may be conceptualized in multiple ways, including categories, symptom counts, and pathological traits. Similarly, various conceptually and statistically distinct approaches may yield distinct types of stability. These different types, in turn, may depend on various methodological factors, such as sampling procedures (i.e., age range, clinical status, follow-up interval), the assessment modality, and the type of instrument used. As a result, study findings are highly heterogeneous, and misconceptions about the course of PDs persist.
In this narrative review, we capture key findings of the current literature on the stability of PDs across different age groups and critically discuss general implications for both clinical practice and future research. We start by describing different PD constructs and different types of stability, followed by an overview of recent studies in childhood, adolescence, and adulthood. Finally, we emphasize key findings and conclude with general implications.

Personality disorder constructs
PDs can be conceptualized according to different constructs, features, and frameworks. In both the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders [DSM-5 (6)] and the 10th edition of the International Classification of Diseases [ICD-10 (7)], PDs are defined as discrete categories, each with a distinct set of diagnostic criteria (i.e., a PD is either present or absent). Within this categorical system, PDs can also be conceptualized more dimensionally, in terms of symptom counts (e.g., seven out of nine borderline PD symptoms). In recent PD models, such as the Alternative Model of Personality Disorders (AMPD) in Section III of the DSM-5 (6) and the 11th edition of the ICD (7), PDs are, moreover, conceived in terms of core impairments in personality functioning (i.e., self- and interpersonal functioning), specified by a set of pathological traits (i.e., extreme variants of normal personality dimensions, such as emotional lability, attention seeking, or impulsivity). These different constructs and approaches may naturally affect stability estimates. Although a growing number of longitudinal studies investigate dimensional measures of personality pathology [e.g., (8,9)], previous research has focused primarily on PD categories and PD symptom counts, except for child and adolescent studies that focus exclusively on maladaptive personality traits. Therefore, the current review focuses exclusively on DSM- and ICD-based categorical and symptom-based models.

Different types of stability
Apart from the aforementioned constructs, stability over time can be described in multiple ways, and the estimates obtained tend to differ according to the type of stability assessed. In the present review, we focus on the two types of stability that have been studied most frequently, namely, mean-level and rank-order stability.
Mean-level stability refers to the degree to which the average level of a PD or PD symptom changes over time. Categorical mean-level stability, also known as diagnostic stability, refers to the consistency of PD diagnoses and is typically measured as the proportion of cases that persist from baseline to follow-up (e.g., if four out of ten participants diagnosed with borderline PD [BPD] at baseline still meet the criteria at follow-up, categorical mean-level stability is 40%). Dimensional mean-level stability refers to the consistency of PD symptom counts and is usually measured by mean-difference scores (i.e., the difference between the mean symptom count at follow-up and the mean symptom count at baseline).
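The two mean-level indices described above reduce to simple arithmetic. The following Python sketch is purely illustrative: the function names (`diagnostic_stability`, `mean_level_change`) and all data are hypothetical and are not taken from any study discussed here. It computes categorical mean-level stability as the share of baseline cases still diagnosed at follow-up, and dimensional mean-level stability as the follow-up mean minus the baseline mean symptom count.

```python
from statistics import mean

def diagnostic_stability(baseline_dx, follow_up_dx):
    """Share of participants diagnosed at baseline who still meet criteria at follow-up."""
    # Only baseline cases enter the denominator; new cases at follow-up are ignored.
    baseline_cases = [f for b, f in zip(baseline_dx, follow_up_dx) if b]
    if not baseline_cases:
        raise ValueError("no participants diagnosed at baseline")
    return sum(baseline_cases) / len(baseline_cases)

def mean_level_change(baseline_counts, follow_up_counts):
    """Mean symptom count at follow-up minus mean symptom count at baseline."""
    return mean(follow_up_counts) - mean(baseline_counts)

# Hypothetical data mirroring the example in the text: 10 participants with BPD
# at baseline, 4 of whom still meet criteria at follow-up.
baseline_dx = [True] * 10
follow_up_dx = [True] * 4 + [False] * 6
print(diagnostic_stability(baseline_dx, follow_up_dx))  # 0.4

# Hypothetical symptom counts for five participants at two time points.
baseline_sx = [7, 6, 8, 5, 9]
follow_up_sx = [5, 4, 7, 3, 6]
print(mean_level_change(baseline_sx, follow_up_sx))  # -2
```

A negative mean-level change indicates an average decrease in symptoms. In this sketch, participants who newly meet criteria at follow-up do not affect diagnostic stability, which matches the "proportion of enduring cases" definition but ignores incident cases.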
Rank-order stability, in turn, refers to the consistency of individuals' relative ordering compared to others in a given sample, thus indicating the degree to which interindividual differences are preserved over time. Individuals may retain their relative ordering with regard to a specific PD or PD symptom even if the average level of that PD or PD symptom in the sample increases or decreases over time. Consequently, rank-order changes are independent of mean-level changes (10). Categorical rank-order stability refers to the rank-order stability of individuals' PD diagnoses and is typically measured with Cohen's κ. A negative κ indicates no agreement beyond chance, a κ between 0 and 0.20 low agreement, a κ between 0.21 and 0.40 fair agreement, and a κ between 0.41 and 0.60 moderate agreement. A κ between 0.61 and 0.80 indicates substantial agreement, and a κ between 0.81 and 1.0 almost perfect agreement (11). Dimensional rank-order stability refers to the rank-order stability of individuals' PD symptom counts and is commonly measured with a test-retest correlation (e.g., Pearson's r). An r between 0.1 and 0.3 is considered low, an r between 0.3 and 0.5 moderate, and an r between 0.5 and 0.8 high (12). Another powerful way to assess the stability of PDs over time is to use structural equation models. Structural equation models encompass a set of multivariate approaches [e.g., individual growth curve models (13); growth mixture modeling (14)] that allow researchers to distinguish measurement error from true individual differences in change processes.
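As a rough illustration of the two rank-order indices, the Python sketch below computes Cohen's κ for a binary diagnosis assessed twice and a Pearson test-retest correlation for symptom counts, then labels each value using the interpretation bands described in this section. It is a minimal sketch under stated assumptions: the function names and data are hypothetical, κ is computed for a single present/absent diagnosis only, the top κ band is named "almost perfect" as in the widely used Landis and Koch scheme, and r values above 0.8, which the cited bands do not cover, are simply labeled "high" here.

```python
from math import sqrt

def cohens_kappa(baseline, follow_up):
    """Cohen's kappa for a binary diagnosis (present/absent) at two time points."""
    n = len(baseline)
    observed = sum(b == f for b, f in zip(baseline, follow_up)) / n
    p_base = sum(baseline) / n     # proportion diagnosed at baseline
    p_follow = sum(follow_up) / n  # proportion diagnosed at follow-up
    # Agreement expected by chance if the two assessments were independent.
    expected = p_base * p_follow + (1 - p_base) * (1 - p_follow)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)

def kappa_label(k):
    """Interpretation bands for kappa as described in the text."""
    if k < 0:
        return "no agreement beyond chance"
    if k <= 0.20:
        return "low"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"

def pearson_r(x, y):
    """Test-retest (Pearson) correlation between two lists of symptom counts."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def r_label(r):
    """Bands for r as described in the text; values above 0.8 are treated as high (assumption)."""
    if r < 0.1:
        return "below the low band"
    if r < 0.3:
        return "low"
    if r < 0.5:
        return "moderate"
    return "high"

# Hypothetical sample of 10: 4 diagnosed at baseline, 2 of whom still meet criteria.
baseline_dx = [True] * 4 + [False] * 6
follow_up_dx = [True] * 2 + [False] * 8
k = cohens_kappa(baseline_dx, follow_up_dx)
print(round(k, 2), kappa_label(k))  # 0.55 moderate

# Every participant loses exactly two symptoms: the mean drops, the ranking is unchanged.
baseline_sx = [7, 6, 8, 5, 9]
follow_up_sx = [s - 2 for s in baseline_sx]
r = pearson_r(baseline_sx, follow_up_sx)
print(r, r_label(r))  # 1.0 high
```

The second example shows why rank-order and mean-level stability are independent: the mean symptom count falls by two, yet every participant keeps the same relative position, so the test-retest correlation is perfect.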

Overview of the current literature review
The literature search for this narrative review was part of a systematic review and meta-analysis conducted in accordance with the PRISMA standards (15) and the MOOSE guidelines (16). The search was conducted in four electronic databases (EMBASE, PsycInfo, PubMed, and Web of Science) on October 26, 2020, and updated on June 7, 2022 (d'Huart et al., under review). Keywords and Medical Subject Headings (MeSH) terms were used to identify peer-reviewed articles published between 1980 and 2022 that reported on the stability of PDs. In brief, the following search terms were used: "personality disorders," "axis II disorders," "stability," "consistency," "longitudinal," "prospective," "life span," and "life course." Only longitudinal studies assessing the stability of PDs at two time points at least 1 month apart were considered for the current paper. Studies are presented from a developmental perspective, covering childhood, adolescence, and adulthood. A complete overview is given in Tables 1-3.

Childhood
Only two studies to date, namely those by Crick et al. (17) and de Clercq et al. (18), have examined the stability of maladaptive personality traits in childhood. While both studies focused exclusively on borderline PD (BPD) traits among community-based, primary school-aged children, they differed regarding the instrument type and the follow-up period, as described in Table 1. Crick et al. (17) found only moderate dimensional rank-order stability, whereas de Clercq et al. (18) found substantial dimensional rank-order stability over time. de Clercq et al.'s (18) findings on dimensional mean-level stability indicated that children's maladaptive trait scores generally decreased as they grew older, with a smaller decline for children who initially had higher levels of maladaptive personality traits.

Adolescence
Overall, ten studies reported data on stability from adolescence to adulthood (see Table 2). Five studies were from clinical settings (21-23, 26, 27), four from community-based samples (19, 20, 24, 25), and one from a high-risk sample [i.e., young adults with a history of child welfare and juvenile justice placements (10)]. Of the studies conducted in clinical settings, two (21, 23) were conducted among patients with mixed axis I comorbidities, two (22, 27) among previously suicidal youth, and one (26) among depressed adolescent outpatients. Three studies (20, 22, 27) focused exclusively on BPD, while the remaining seven focused on any PD or most of the DSM-5 PDs. The follow-up period ranged from 6 months (27) to 10 years (10, 20). Four studies (10, 19, 21, 26) used the Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II), while the remaining six studies (20, 22-25, 27) each used different instruments, as presented in Table 2. Most studies focused on PD symptom counts; only four studies (10, 19, 21, 22) investigated PD categories. Three studies (10, 21, 27) reported data on categorical rank-order stability, seven (10, 20, 21, 23-26) on dimensional mean-level stability, and five (10, 21, 24-26) on dimensional rank-order stability. Regarding diagnostic stability, two studies (21, 22) suggested substantial stability over time, and two studies (10, 19) suggested only moderate stability. Regarding categorical rank-order stability, two studies (9,14) indicated moderate categorical rank-order stability for any PD and low to high categorical rank-order stability for individual PD diagnoses, and one study (23) suggested low categorical rank-order stability for a BPD diagnosis.
Findings on dimensional mean-level stability, however, consistently indicated significant decreases in most PD symptoms over time (20, 21, 23-26). Only one study (10) revealed significant increases in most PD symptoms over time. The authors concluded that this finding may be explained by the nature of the high-risk sample, as many adolescents in the child welfare and juvenile justice systems have experienced severe childhood adversities (e.g., child abuse and neglect) as well as a range of other critical risk factors (e.g., unfavorable parenting practices, low socioeconomic status, childhood psychopathology, self-harming behavior, and youth delinquency), all of which have been shown to be significantly associated with the stability of PDs over time. Finally, findings on dimensional rank-order stability revealed highly heterogeneous patterns, with estimates ranging from low to moderate in three studies (10, 25, 26), from low to high in one study (21), and from moderate to high in one study (24), depending on PD type.

Insights from the current literature review
Six key findings emerged from the current literature review and warrant more detailed discussion.

Stability estimates in adolescence are comparable to those in adulthood
Although research focusing on adolescence has substantially increased in recent years, the number of studies assessing the stability of PDs in childhood and adolescence still appears to be low compared to studies in adulthood. This may partly be due to a widespread reluctance to diagnose PDs in adolescence, both because of the stigma associated with the disorder (56, 57) and because of the belief that adolescent personality is itself driven by strong emotions and impulsive behavior (58, 59). Yet recent literature clearly indicates that PDs can be validly and reliably diagnosed before the age of 18 years (58-60) and that stability in adolescence is comparable to that in adulthood. Nevertheless, while maladaptive personality traits can be found as early as childhood, it is reasonable to assume that more severe forms of PDs only become clinically apparent in later adolescence, when individuals have acquired the skills to integrate knowledge about themselves and others into a coherent self-identity (61).

Except for high-risk samples, most PD diagnoses and PD symptoms tend to decrease over time, regardless of age
Although the studies differed considerably in terms of methodological and conceptual factors, they all seem to converge on the finding that most PD categories (i.e., diagnostic stability) and PD symptoms (i.e., dimensional mean-level stability) decrease over time, while individuals' rank-ordering (i.e., dimensional rank-order stability) seems to persist. Specifically, studies on diagnostic stability overall revealed that many individuals diagnosed with a PD at baseline are likely to no longer fulfill the diagnostic criteria at follow-up. This is most notable, as it highlights one of the major shortcomings of the categorical PD system for specific PDs: it is based on an arbitrary diagnostic threshold that can easily be met (PD diagnosis) or missed (no PD diagnosis) through an increase or decrease of a single criterion. This favors diagnostic instability, while minor changes in the pathology remain undetected and the subclinical expression of the individual's symptoms may remain high (62). Thus, the diagnostic stability of specific PDs appears to be a rather inappropriate measure of the stability of PDs over time, as categorical scaling leads to a substantial loss of information. This shortcoming could be partly offset by examining the stability of any PD (including PD NOS) rather than the diagnostic stability of specific PDs, since patients may change specific categorical diagnoses but still meet the criteria for a PD in general. Studies on dimensional mean-level stability mostly suggested considerable declines in PD symptoms over time. Although one might assume that this is mainly due to treatment effects (63), significant decreases were also found in community-based samples, which suggests a rather natural improvement.
Frontiers in Psychiatry 08 frontiersin.org
In research on normal personality, mean trait levels in community-based samples tend to change toward greater maturity over time [i.e., decreases in neuroticism and increases in extraversion, agreeableness, and conscientiousness (64)], and the same might be true for PD traits. Indeed, Wright et al. (65) showed that decreases in avoidant PD traits were associated with increases in dominance and warmth and decreases in neuroticism. Studies on dimensional rank-order stability, however, generally indicated moderate to high stability estimates, meaning that individuals who exhibited high levels of a specific PD symptom at one time point also showed relatively high levels of that symptom at a second time point. Taken together, the mean level of PDs and PD symptoms tends to decrease over time, regardless of participants' age, whereas participants' rank-ordering tends to persist.

Stability estimates tend to vary with respect to study-specific factors
The extent of stability, nonetheless, differed considerably across studies, depending on the PD construct (i.e., categorical diagnoses or dimensional symptoms), the type of stability (i.e., diagnostic, mean-level, or rank-order stability), and the specific PD or PD symptom being assessed. In addition, studies differed largely with respect to methodological factors, which, in turn, influenced stability estimates. In particular, at least six findings must be emphasized: (a) stability estimates tend to be considerably higher when PDs are assessed dimensionally (i.e., PD symptom counts or PD traits) than when they are assessed categorically (PD categories). For instance, Durbin and Klein (33) reported poor to fair stability estimates for PD categories, whereas the stability of dimensional PD symptoms was fair to moderate; (b) dimensional rank-order stability estimates seem to be higher than dimensional mean-level stability estimates, meaning that PD symptoms tend to decrease on average while individuals' rank-ordering in a given sample remains almost the same (33, 66); (c) dimensional stability estimates appear to be higher for self-reported PD symptoms than for interview-based PD symptoms (33, 35, 38, 67). For instance, Lenzenweger (38) found lower 4-year dimensional rank-order stability estimates for interview-based PD symptoms (r = 0.61) than for self-reported symptoms (r = 0.70). Consistently, Durbin and Klein's (33) stability estimates were 0.49 for interview-assessed symptoms and 0.69 for self-reported symptoms; (d) shorter sampling intervals generally result in higher stability estimates than longer sampling intervals.
For instance, dimensional mean-level changes in the Collaborative Longitudinal Personality Disorders Study [CLPS (68)] were described as "small" at a 2-year follow-up, "medium" at a 4-year follow-up, and "large" at a 10-year follow-up; (e) in terms of the type of PD being assessed, cluster B PDs seem to be generally more stable than cluster A and C PDs (25); (f) PD patients in clinical settings seem to attain symptomatic remission more quickly than those from community-based samples. According to Morey and Hopwood (4), one possible reason is that participants in clinical samples are often drawn from treatment settings that target clinical remission. Therefore, participants in clinical settings tend to show faster declines (i.e., lower stability) than those in other settings. In sum, the extent of stability differs considerably according to the PD type and construct, the type of stability being assessed, and several methodological factors, such as the assessment modality, sampling interval, and clinical setting.

Stability estimates tend to vary with respect to environmental and genetic factors
In addition to conceptual and methodological factors, stability estimates also seem to vary as a function of environmental and genetic factors. According to behavioral genetics research, individuals may be genetically predisposed to exhibit more or less stable personality traits. In other words, an individual's overall score of PD symptoms, as well as the extent to which this individual exhibits symptomatic change, is strongly heritable (20). Yet individuals develop within specific environments, which may considerably affect stability estimates. For example, Reichborn-Kjennerud and colleagues (45) found that the rank-order stability of antisocial PD (ASPD) and BPD symptoms was largely due to genetic factors, whereas symptomatic change was due to environmental risk factors. Bornovalova and colleagues (20), in contrast, found that stability and change in BPD symptoms were substantially affected by genetic factors and only modestly by environmental factors. However, the authors point out that the strong influence of genetic factors does not mean that environmental factors are unimportant; rather, the environment is likely to influence gene expression. They therefore emphasize the need for interventions that help the individual's family serve as a protective factor against the manifestation of pathological traits.

Symptomatic remission does not equate to full recovery
Although study findings overall suggest that most PD categories and PD symptoms decrease over the lifespan, it should be kept in mind that symptomatic remission is not necessarily accompanied by full recovery. Whereas symptomatic remission is defined as no longer meeting diagnostic criteria for at least 2 years, full recovery is defined as attaining good social and vocational functioning in addition to symptomatic remission. (55). Notably, however, only half of the patients had achieved significant functional improvements over the 16-year follow-up, with some even experiencing relapse or worsened functioning. Accordingly, the authors conclude that good social and vocational functioning is more difficult to attain than symptomatic remission and, therefore, that sustained recovery is much less common than sustained symptomatic remission from BPD. A decrease in PD symptoms is thus not necessarily accompanied by an increase in social and vocational functioning.

Studies in high-risk samples are scarce
Finally, studies investigating the stability of PDs in high-risk samples are surprisingly scarce. Only two studies (10, 47) examined stability estimates in high-risk samples, namely in adolescents placed in the child welfare and juvenile justice systems (10) and in incarcerated adults (47). This is especially striking given that individuals from high-risk samples are at particular risk of developing a PD. Both studies (10, 47) consistently suggested substantial increases in PD diagnoses over time (11,45), whereas clinical and community-based studies overall converged on the finding that most PD diagnoses and symptoms decrease over time.

Implications
Overall, studies suggest that PDs, whether assessed categorically or dimensionally, are not as stable as previously assumed. This highlights the need to overcome the clinical assumption that PDs are "enduring," "pervasive," and "inflexible" over time. It also emphasizes that PDs are treatable and should therefore be assessed and diagnosed before the age of 18 in order to provide the best possible outcome later in life. As a consequence, patients as well as clinicians may be cautiously optimistic about the prognosis of a PD. In addition, if PDs and PD symptoms are not as stable as previously thought, this raises the question of whether it is still appropriate to consider stability a central feature of PDs. In other words, is it still reasonable to refer to a PD or PD symptoms if the concept itself depends on numerous conceptual, methodological, genetic, and environmental factors? Or is it rather the general level of personality functioning (i.e., self and interpersonal functioning), which is conceptually separate from PD categories and symptoms, that actually determines a PD? This issue, in turn, underscores the current shift toward more dimensional conceptualizations, as defined in the AMPD and ICD-11. In fact, both models introduce a radical change in the structure and diagnosis of PDs by conceptualizing PDs as core impairments in self- and interpersonal functioning, complemented by a severity rating and specific trait specifiers related to negative affectivity, detachment, dissociality (antagonism in the AMPD), disinhibition, and anankastia (ICD-11) or psychoticism (AMPD). We suggest that moving away from PD categories and PD symptoms helps clinicians to perceive the patient as a whole by refocusing on the original meaning of personality, that is, the subjective experience of what it means to be human (71). This may help clinicians see not only whether patients suffer, but also how they suffer.
While the classification of severity may help inform clinical prognosis and treatment intensity, the classification of trait specifiers may help identify individual problems, resulting in more individualized, tailor-made treatments (72,73).
To date, the literature lacks data on the stability of the general level of personality functioning. Although there is reason to think that it may be more stable [e.g., (12,13)], this remains to be demonstrated. We therefore suggest that future research focus more intensively on personality functioning and specific trait expressions in order to determine whether the new conceptualizations of the AMPD and ICD-11 clarify the issue of stability over time. Specifically, studies should investigate the course and outcome of personality functioning and pathological personality traits from childhood to late adulthood. In doing so, research should increasingly rely on dimensional assessments and longer follow-up intervals. Future work on the etiological origins of these constructs and the mechanisms by which they evolve over time will be of great importance. Moreover, future research needs to address methodological factors to avoid oversimplified answers to the complex question of stability. In fact, researchers still often use the general term "stability" without specifying the type of stability they are referring to. This is particularly problematic because, as pointed out in the present review, different types of stability can vary substantially. In addition, future studies should incorporate more sophisticated sampling and statistical procedures to overcome possible limitations. In particular, studies should rely on multi-wave designs with multiple measurement points in order to analyze the shape of each person's individual trajectory and distinguish true change from measurement error (74). Furthermore, studies of high-risk samples, especially in childhood and adolescence, may be crucial, as these children and adolescents are at particular risk of developing maladaptive personality traits and PD prevalence rates in these samples are alarmingly high.
Finally, and most importantly, upcoming research should address genetic, contextual, and situational factors that may influence the course of PDs or personality functioning over the lifespan. After all, while the direction of change is known, the causes of change remain unclear.

Conclusion
In recent decades, research on the stability of PDs has considerably increased, yet it remains a much-debated topic, as it is foremost a conceptual and methodological endeavor. This narrative review has nonetheless highlighted key findings from the current literature, suggesting comparable stability estimates in adolescence and adulthood, with considerable improvement over time. Future work may eventually determine whether the new conceptualizations will clarify some of the issues related to the stability of PDs. Nevertheless, it should be acknowledged that symptomatic remission is not necessarily accompanied by full recovery, with most PD patients never managing to fully participate in society despite considerable remission. Understanding the process of change is thus particularly important in order to identify protective factors that might mitigate long-term impairments. Taken together, these findings challenge our current understanding of PDs in terms of disorders and symptoms and argue instead in favor of the AMPD and ICD-11, which reintroduce the idea of self and interpersonal functioning as the core feature of PDs. This might enable clinicians to perceive the patient as a whole by identifying individual problems, which could ultimately contribute to more personalized, tailor-made treatments.

Author contributions
DH and BB contributed to conceiving and designing the present manuscript. DH and SS conducted the literature search. DH wrote the first draft of the manuscript. SS, DB, MB, CB, MS, KS, and BB commented on an earlier draft of this article and supervised the entire process. All authors contributed to the article and approved the submitted version.