Novel approach towards musculoskeletal phenotypes

The multidimensional array of clinical features and prognostic factors makes it difficult to optimize management within the heterogeneity of patients with common musculoskeletal pain. This study aimed to identify phenotypes across prognostic factors and musculoskeletal complaints. Concurrent and external validity were assessed against an established instrument and a new sample, respectively, and treatment outcome was described.


| INTRODUCTION
Chronic musculoskeletal (MSK) pain is highly prevalent in the general population (Briggs et al., 2018) and is among the highest rated causes of years lived with disability and reduced health worldwide (GBD, 2018). Randomized controlled trials show only small to moderate effects across different conservative treatments in patients with common MSK disorders, such as neck, low back and shoulder pain (Babatunde et al., 2017). The large patient heterogeneity might be one explanation for the modest results in trials, where 'one-sizefits-all' management is commonly employed. This is clearly a less than ideal approach in heterogeneous patient populations where a number of prognostic factors may affect treatment outcome differently on the individual level. Gaining further insight about homogeneous patient groups, that is different phenotypes, can be useful in developing targeted interventions to improve treatment outcome.
A noble suggestion to improve care has been to classify patients according to prognostic factors rather than pain diagnosis (Croft et al., 2015). Evidence shows that prognostic factors and treatment outcome are surprisingly similar across traditional MSK diagnostic groups (Artus, Campbell, Mallen, Dunn, & van der Windt, 2017;Green et al., 2018;de Vos Andersen, Kent, Hjort, & Christiansen, 2017). It has thus been argued that research should advance towards clustering of patients who share similar features and prognostic factors across the biopsychosocial domains, and tune management to the characteristics of the different clusters (Hill et al., 2011). Studies using cluster analyses in the MSK patient population have either been confined to specific pain diagnosis (Barons et al., 2014;Carlesso, Raja Rampersaud, & Davis, 2018;de Luca, Parkinson, Downie, Blyth, & Byles, 2017;Murphy, Lyden, Phillips, Clauw, & Williams, 2011;Nielsen, Kent, Hestbaek, Vach, & Kongsted, 2017;Rabey, Smith, Beales, Slater, & O'Sullivan, 2016;Reme et al., 2012;Stynes, Konstantinou, Ogollah, Hay, & Dunn, 2018), a single subgrouping variable (i.e. pain sites; Hartvigsen, Davidsen, Hestbaek, Sogaard, & Roos, 2013;Lacey et al., 2014) or factors related to one specific dimension (i.e. psychological; Rabey et al., 2016). However, no studies have investigated if there exist hidden patterns or underlying subgroups of MSK patients based on a broad set of prognostic factors across the biopsychosocial domains. To address this, we applied a cluster analysis on data from primary care physiotherapy patients with common MSK disorders.
The first aim of this study was to identify musculoskeletal phenotypes based on common prognostic factors across the biopsychosocial domains. A second aim was to validate the phenotypes by (a) investigating the concurrent validity with an established prognostic screening tool, (b) describing the treatment outcome and (c) investigating the external validity in a new sample.

| Study sample
This study used baseline and 3-month follow-up data from a longitudinal observational project in Norwegian physiotherapy practice (FYSIOPRIM; Evensen et al., 2018). Physiotherapists working in primary health care asked patients to participate at their first consultation. Inclusion criteria were patients aged 18-67 years seeking physiotherapy in primary health care for treatment of complaints in the neck, shoulder, low back or multisite/complex pain as their main problem. That is a patient that for example were categorized with shoulder pain as the main complaint could also have complaints of less intensity in other body regions, but according to the treating therapist not as extensive to be classified as multisite/complex. The physiotherapists categorized the patients in the four pain groups during the consultation, and the patients thereafter completed self-report questionnaires. Classification of multisite/complex pain was based on the the patient-reported history, number of pain sites, overall severity of the patient's symptoms and clinical examination, and was entirely a decision by the treating physiotherapist. Exclusion criteria were rheumatoid arthritis, fractures, neurological conditions (i.e. stroke, multiple sclerosis etc.), preor postoperative patients, pregnancy-related disorders and poor comprehension of the Norwegian language.
In this study, we used data from 463 patients registered by 45 physioterapists in the period March 2016 to February 2018. Patients received usual care physiotherapy and content and number of treatments were at the discretion of the physiotherapists.
A different but similar patient sample (n = 107) was recruited for the external validity part of the study. These patients were recruited consecutively from March to September pain and points to a need for more targeted interventions in common musculoskeletal disorders to improve treatment outcome.

| 923
MEISINGSET ET al. 2018 in the FYSIOPRIM project and underwent the same selection criteria as for the development sample.
The FYSIOPRIM project was approved by the Regional committees for Medical andHealth Research Ethics in Norway (REC no. 2013/2030), and the approval covers the assignment in this study.

| Variables included in the latent class modelling
Latent class analysis (LCA) was used to identify underlying distinct phenotypes using 11 indicator variables and four active covariates (Figure 1). The analysis is detailed in the Statistical analysis section below. Latent class modelling aims to identify unobserved heterogeneity and to find meaningful subgroups (phenotypes) that are similar in their responses to measured variables, that is participants with the same phenotype are considered homogeneous based on the combination of indicator variables included in the model.

| Other variables
Smoking was assessed by the question 'Do you smoke?' with the response options 'yes' and 'no'. Self-reported leisure time physical activity was assessed by three questions (Kurtze, Rangul, Hustvedt, & Flanders, 2008) regarding frequency, intensity and duration, where physical inactivity was defined as frequency less than once per week and/or if intensity was reported as 'take it easy' (Nes et al., 2011).
Patients were asked to indicate employment status ('paid work', 'student', 'pensioner', 'disability pension', 'work allowance', 'sick leave' and 'nonpaid work'). We reported the proportion of patients in paid work with no sick leave benefits (i.e. no work allowance or sick leave). The 10-item Örebro screening questionnaire was used to assess risk for long-term work disability (range 0-100, where higher score indicate higher levels of estimated risk for long-term work disability; Linton, Nicholas, & MacDonald, 2011). Use of analgesics was assessed with the question 'Have you used pain medication last week'? with response options yes or no. Patients were also asked whether they had received physiotherapy treatment the last 12 months.

| Treatment outcome at 3-month follow-up
Function and health-related quality of life outcomes were assessed at 3-month follow-up. Function was assessed F I G U R E 1 Overview of the analysis model in the latent class analysis by the Patient-Specific Functional Scale (PSFS; 0-10; Stratford, Gill, Westaway, & Binkley, 1995). The patient defined the most troublesome activity that she/he had to perform and scored the activity on a numerical rating scale from 0 to 10, where 0 indicated not able to perform the activity and 10 indicated no problem to perform the activity. Health-related quality of life was assessed by the EuroQol instrument (EQ-5D-5L; Herdman et al., 2011). The EQ-5D evaluates the following five dimensions: mobility, self-care, usual activities, pain and/or discomfort and anxiety and/or depression. The response options for each dimension were 'no problems', 'slight problems', 'moderate problems', 'severe problems' and 'extreme problems'. The scores were converted to an index value for health status using the UK value set (theorethical range −0.285 to 1, where −0.285 means extreme problems on all dimensions, and 1 means perfect health; Devlin, Shah, Feng, Mulhern, & van Hout, 2018). All patients received reminders once a week, up to three times, by email and sms, if they had not responded at 3-month follow-up.

| Statistical analysis
We excluded patients with missing data on four or more indicator variables at baseline to avoid estimate inaccuracies of the latent class modelling. The data analysis was conducted in two steps. First, the LCA model was used to identify distinct patient phenotypes using the 11 indicator variables and the four covariates. A detailed description of the LCA procedure applied in this study is given in Methods S1 and S2. Second, the phenotypes were validated by (a) investigating the concurrent validity, (b) describing the treatment outcome in the development sample and (c) investigating the external validity in a new sample.

| Validation
We investigated the concurrent validity and described the treatment outcome of the phenotypes derived from the development sample. The different phenotypes were formed based on the scoring patterns across biopsychosocial prognostic factors. Concurrent validity of our phenotypes was tested against an already established prognostic tool used to identify patients at risk for long-term work disability, the Örebro screeening questionnaire. Distribution of the Örebro scores across the phenotypes are presented in a box plot. We described the treatment outcome at 3-month follow-up in the development sample. The median health-related quality of life (EQ-5D) and the proportion of patients reporting good function (PSFS ≥8) was compared across the identified phenotypes.
To assess the external validity of the phenotypes, we used the Step3-Scoring module in Latent Gold 5.1 (Statistical Innovations Inc) to obtain an algorithm and related SPSS syntax for scoring new cases (Vermunt & Magidson, 2016). We classified the patients in the external validation sample using the algorithm defined in the development sample, and to obtain the posterior probabilities for belonging to each phenotype. The posterior probabilities were used to assess the goodness of fit of the external validation, in addition to subjective comparison of the phenotype characteristics in the external versus the development sample.

| Software
Latent class modelling was conducted using Latent Gold 5.1. We used the default settings of the program, except for inclusion of subjects with missing values on the indicators. Descriptive statistics were performed in STATA/IC 15.1. Test of external validity was performed in IBM SPSS Software 24.

| Descriptives
We excluded 28 patients due to missing on four or more indicator variables, therefore, 435 patients were included in the development sample. Table 1 presents the baseline characteristics of the study sample overall and stratified by the five phenotypes. Of the 435 patients included, 116 (26.7%) had complaints or pain in the neck, 118 (27.1%) in the shoulder, 89 (20.5%) in the low back and 112 (25.8%) had multisite/complex pain. Most patients (62%) had complete data for the indicator variables, whereas 17%, 14% and 6% had one, two or three variables missing, respectively. Overall response rates at 3-month follow-up were 70% (n = 305) with no significant differences between responders and nonresponders for baseline characteristics and across phenotypes. The proportions of patients with neck, shoulder or low back pain were distributed similarly across all the phenotypes and all phenotypes consisted of ~70% women. Patients categorized as multisite/complex pain were mainly found in phenotype 4 and 5.

| Derivation of clinical phenotypes
The latent class analysis identified a model with five phenotypes as the optimal fit with the 11 indicator variables, that is the BIC was lowest for this model and the | 925 bootstrapped likelihood ratio test implied no statistical gain by adding an extra phenotype (Methods S2). The average posterior probabilities for phenotype 1 to 5 were 0.88, 0.89, 0.89, 0.92 and 0.92, respectively (see Figure  S2 for graphical display of the posterior probabilities). Furthermore, based on the face validity, the model with five phenotypes provided clinically relevant differences between the phenotypes. Table 2 shows the detailed characteristics of the patients across the indicator variables, while Figure 2 summarizes the characteristics graphically. The phenotypes varied broadly in characteristics at baseline. Phenotype 1 (n = 77, 17.7%) and phenotype 2 (n = 142, 32.6%) were characterized by the lowest scores across all domains ('pain', 'beliefs and thoughts', 'psychological' and 'activity and lifestyle'), but phenotype 2 had somewhat higher levels of symptoms across the domains. Compared to phenotype 1 and 2, phenotype 3 (n = 89, 20.5%) and 4 (n = 78, 17.9%) were more affected in all domains. The main differences between phenotype 3 and phenotype 4 were the opposite scoring pattern within the psychological domain (higher fear avoidance and lower mental distress in phenotype 3 and vice versa for phenotype 4), and worse symptoms in the pain domain (longer pain duration and more pain sites) in phenotype 4. Phenotype 5 (n = 49, 11.3%) were characterized by worse symptoms across all domains, especially in the domains 'pain' and 'activity and lifestyle'. Overall, higher phenotype affiliation tended to be associated with more obesity and lower education (Table 1). Figure 3 shows the concurrent validity of the development sample phenotypes indicated by the Örebro screening questionnaire. The risk of long-term work disability increased from phenotype 1 to phenotype 5, except that phenotype 3 and 4 had similar risk. Figure 4 presents function and health-related quality of life at 3-month follow-up for the development sample. Similar to the baseline patterns presented in Table 1, the order of the severity of function and health-related quality of life remained almost the same across the five phenotypes at follow-up.

| Validation
Tables S2 and S3 present the characteristics of the external validation sample, overall and stratified by the five phenotypes. The baseline characteristics of patients in the external validation sample were similar compared with the development sample, except that almost 50% were classified with multisite/complex pain. The average posterior probabilities were ≥0.90 for all phenotypes, except for phenotype 3 where it was 0.87 (see Figure S2 for more details). Similar to the development sample, we observed that the phenotypes varied broadly in clinical characteristics. However, the differences between phenotype 3, 4 and 5 became somewhat attenuated.

| DISCUSSION
This study identified five distinct musculoskeletal phenotypes based on 11 key prognostic factors over the biopsychosocial domains. Importantly, the phenotypes were

Activity and lifestyle
Work ability (0-10), mean (SD) a 5.7 (3.0) 8.6 (1. identified irrespective of traditional patient groupings based on primary pain location (neck, low back, shoulder and multisite pain). The phenotypes were externally validated in another patient sample, showed good concurrent validity, and higher phenotype affiliation at baseline was characterized by poorer function and health-related quality of life at 3-month follow-up. Thus, our findings suggest a new and generic approach for screening and classification of common MSK complaints in primary care, where MSK patients are phenotyped into distinct subgroups based on prognostic factors rather than merely the location of their primary pain.

F I G U R E 2
Radar graph summarizing indicator variables across the five phenotypes. Higher scores indicate a severe/worse level on the variables. Continuous variables (pain intensity, recovery expectations, pain self-efficacy, fear avoidance, work ability) are transformed to a scale from 0 to 1. Categorical and binominal variables are presented as proportions (%). Pain sites are categorized into the proportion with cut-off set to ≥4 pain sites, pain duration is categorized into the proportion with ≥1-year duration and mental distress is categorized into the proportion scoring ≥1.85, defined as a cut-off to indicate symptoms of mental distress. Reduced daily activity level is defined as the proportion reporting quite to very much reduced, and sleep problems is defined as the proportion reporting moderate to severe sleep problems

| Description of phenotypes and comparison with the literature
Previous subgrouping studies in the MSK patient population have either been confined to specific pain diagnosis (Barons et al., 2014;Carlesso et al., 2018;de Luca et al., 2017;Murphy et al., 2011;Nielsen et al., 2017;Rabey et al., 2016;Reme et al., 2012;Stynes et al., 2018), a single subgrouping variable (e.g. pain sites; Hartvigsen et al., 2013;Lacey et al., 2014), or factors related to one specific dimension (i.e. psychological; Rabey et al., 2016). An important contribution of this study is that, regardless of the main pain location, grouping patients on key biopsychosocial prognostic factors provide strong clustering within separate, clearly different and homogeneous phenotypes that are associated with different treatment outcome. This is supported by the high probabilities for patients being correctly classified within their respective phenotype. Furthermore, the variance in some of the indicator variables within classes were almost as large as the sample mean, but the overall pattern indicates less variance and thus more homogenous phenotypes. Our findings are supported by ongoing work showing that patients with common musculoskeletal disorders can be subgrouped using a modified version of the STarT Back approach (Hill et al., 2016). The multidimensional factors used in our subgrouping approach covers the proposed new framework for diagnosing and describing chronic pain as a composite measure of pain intensity, painrelated distress, interference with daily function, temporal characteristics and the presence of psychological and social factors (Treede et al., 2019). The novelity of this study lies in combining these multidimensional factors to uncover the complex interplay of prognostic factors not uncovered by studying individual prognostic factors in isolation.
Our findings highlight the complex relation between prognostic factors in different domains and support the view that clinicians need to apply a multifaceted approach in their encounter with MSK patients (Synnott et al., 2015). For instance the severity of prognostic factors varied widely across the five phenotypes, showing that higher phenotype affiliation was associated with higher symptom levels overall, and within all domains. Phenotype 5 showed highest severity across all domains, and had the poorest outcome at 3 months, indicating that these patients have a very complex pain condition. Patients in phenotype 3 and 4 were affected similarly in the domain 'activity and lifestyle' and had similar treatment outcomes at 3-month follow-up. The two phenotypes showed disparities in the psychological domain, with markedly higher fear avoidance in phenotype 3, whereas phenotype 4 had clearly higher mental distress. Phenotype 4 was also characterized by more multisite pain patients and long-term pain duration. Although a surprising finding, this current result underscore that it is important to differentiate mentral distress and cognitive appraisals in response to pain (Rabey, Slater, O'Sullivan, Beales, & F I G U R E 3 Data distribution of the Örebro screening questionnaire, assessing risk for long-term work disability, across the five phenotypes. The horizontal reference line indicates cut-off for high risk (>50) for long-term work disability MEISINGSET ET al. Smith, 2015). This differentiation is also in line with the biopsychosocial model of pain (Gatchel, Peng, Peters, Fuchs, & Turk, 2007). More research is needed to evaluate whether this difference is clinically relevant. Phenotype 1 and 2 were quite similar; however, phenotype 2 had somewhat higher levels of symptoms across the biopsychosocial domains and slightly poorer treatment outcome. The clinical relevance of splitting these two classes is therefore questionable, as both potentially reflect patients with low levels of symptoms and low risk of poor treatment outcome. The importance of these phenotypes needs to be explored further, also in treatment studies. To summarize, our findings are in agreement with studies emphasizing the importance of using a broad set of biopsychological factors to uncover distinct patient phenotypes (Beneciuk, Robinson, & George, 2012;Larsson, Gerdle, Bernfort, Levin, & Dragioti, 2017), as differences observed between subgroups call for more targeted treatment approaches .

| CLINICAL IMPLICATIONS
The phenotype approach may have relevance for clinical work. First, the generic prognostic factors and the assigned phenotype can be applied across patients with long-term pain in the neck, shoulder, low back and multisite/complex pain. Second, most of the prognostic factors are modifiable through health care management and may thus act as potential treatment effect modifiers, for example recovery expectations, fear avoidance, mental distress and sleep problems. Third, our data show that the phenotypes are associated with different treatment outcome, which adds information in management decisions. As shown in a recent study (Protheroe et al., 2019), subgrouping of patients who share similar features and prognostic factors across the biopsychosocial domains may add contributions to the work on developing stratified treatment strategies. However, the clinical relevance of stratifying MSK patients in prognostic phenotypes rather than traditional pain diagnoses must be investigated in future studies by evaluating whether matched treatment leads to better treatment outcomes.

| STRENGTHS AND LIMITATIONS
We broadly included patients with common MSK pain disorders seeking physiotherapy care with comprehensive data collections at baseline and at 3-month follow-up. In tune with current guidlines, we a priori included a range of biopsychosocial variables that cover key prognostic factors for treatment outcome in these groups (Steyerberg et al., 2013). Importantly, we performed an external validation of our phenotyping approach in a new sample of the same types of patients with common MSK disorders, which showed very low classification error and comparable phenotype characteristics as with the development sample. Some limitations should be considered in the interpretation of our findings. We used single items for some of the indicators in the LCA model, where the single items are not validated in isolation and may not capture the underlying contruct, for example recovery expectations. Some of the indicator variables were more important in the model development and removing the variables with least influence may have resulted in a more parsiminous model, however, it should be noted that all indicator variables contributed statistically significant. Although a strength of LCA is the use of maximum likelihood estimation, the considerable proportion with missing values on the indicator variables may have influenced the accuracy of the model development. Moreover, we did not have information about other potential important indicator variables, such as movement restriction (Artus et al., 2017;Mallen, Peat, Thomas, Dunn, & Croft, 2007) and social support (Mallen et al., 2007). The sample size for the external validation can be seen as relatively small and larger samples are therefore needed to validate our findings. Several of the physiotherapists recruited few patients, indicating that patients were not included consecutively in the FYSIOPRIM study. However, results from the protocol paper shows that patients in FYSIOPRIM are similar to the whole patient population seeking physiotherapy in Norway in terms of diagnosis, age and gender (Evensen et al., 2018). Finally, only 70% answered relevant questions at 3-month follow-up. However, there were no baseline differences between nonresponders and responders at follow-up, indicating a limited attrition bias.

| CONCLUSION
Independent of primary pain location, patients with common musculoskeletal disorders were classified into clinically meaningful phenotypes, indicated by distinct characteristics across the biopsychosocial domains, good external and concurrent validity and phenotypes with different treatment outcome. However, studies with targeted treatments matched to these phenotypes are needed to test the effectiveness of the subgrouping approach before it can be implemented in clinical practice. Thus, our findings may inform the development and design of targeted interventions aimed at improving the treatment efficiency in patients with common MSK disorders.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
The FYSIOPRIM project is conducted according to the Helsinki declaration and approved by the Regional committee for Medical and Health Research Ethics in Norway (REC no. 2013/2030). Project information and consent form were available in Norwegian and English. All patients have provided written informed consent to participate.

TRANSPARENCY DECLARATION
The lead author affirms that the manuscript is an honest, accurate and transparent account of the study reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.