Journal Pre-proof Data-driven pathway analysis of physical and psychological factors in low back pain

Abstract

• The worse the overall symptoms, the greater the importance of physical and activity factors in directly and indirectly predicting disability in people with low back pain (LBP).
• Psychological factors explained the pain-disability relationship only in the group with worse overall symptoms.

What this adds to what is known?
• Combining data-driven machine learning algorithms with traditional statistical inferential methods provides a powerful method of developing, testing, and refining causal hypotheses.

What is the implication, what should change now?
• Physical factors play an important role in the understanding of pain-related disability, particularly so in the subgroup with worse pain and psychological health.

Introduction

Low back pain (LBP) is the leading cause of years lived with disability globally [1], with high socio-economic cost [2], particularly among individuals with persistent symptoms [3]. Despite an exponential increase in clinical research focused on LBP over recent decades, no treatment has been shown to have large, significant, and consistent benefits for patients.

Causal mediation analysis (CMA) has been applied in attempting to disentangle the mechanisms of LBP [4,5]. Current mediation studies have primarily focused on the role of psychological factors in mediating the relationship between pain and disability [4,6-8]. Results have been mixed, with some studies reporting that fear-avoidance and psychological distress mediated the relationship between pain and disability [4,5]. Also, for some interventions designed to target specific psychological factors like fear, reduced fear mediated the effect of the intervention on disability [9], while in others fear did not mediate the effect of the intervention [8].

A structural model defines the dependent variable(s), independent variable(s), and mediator(s), and is the first step in CMA [10]. Specifying a structural model with many variables can be challenging, and may rely on existing theoretical frameworks such as the fear-avoidance model [11], clinical expertise, and/or the literature. Alternatively, a data-driven structural modelling approach such as Bayesian Networks (BN) [12-14]

The proportion of missing data ranged from 0.96% to 23.93% (Supplementary Figure 1). Multiple imputation was performed on all variables with missing values, regardless of the amount of missing data, using the Multivariate Imputation by Chained Equations method [30].
The random forest method was used for imputation, with a maximum of 30 iterations.

Confirmatory Factor Analysis (CFA)

CFA was used to assess the fit of the proposed measurement model, which defines the relationship between the observed variables and the latent variables of Physical, Pain, Psychology, and Activity (Figure 2

A hierarchical agglomerative cluster analysis (HACA) was used to identify homogeneous LBP subgroups based on all observed variables of the latent variables, together with sex and age. A hierarchical cluster tree was formed using the "complete" linkage method and Gower's distance (see supplementary material). The optimal number of clusters was determined using qualitative visual inspection of the cluster tree and quantitative internal measures of cluster validation. When using internal validation measures, the goal is to achieve the smallest within-cluster average distance and the largest between-cluster average distance (Figure 1). Herein we used two validation measures: Connectivity and Silhouette width. Connectivity has a value between zero and ∞, with a value closer to zero indicating a more optimal clustering solution. The Silhouette width has a value between -1 and 1, and the closer it is to 1, the better the clustering solution. Connectivity and Silhouette width were calculated for

BN and SEM analyses will be conducted on three datasets: the entire cohort, subgroups 1 and
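
The clustering steps described above (Gower's distance, complete linkage, and the Silhouette width) can be illustrated with a minimal Python sketch. It is not the authors' code: the six toy records, the field ranges, and the function names `gower_distance`, `complete_linkage`, and `silhouette_width` are all hypothetical, and Connectivity is not computed here.

```python
def gower_distance(a, b, ranges):
    """Gower distance between two records.

    ranges[k] is the observed range of numeric field k, or None when
    field k is categorical (simple matching is used for those).
    """
    parts = []
    for x, y, r in zip(a, b, ranges):
        if r is None:
            parts.append(0.0 if x == y else 1.0)
        else:
            parts.append(abs(x - y) / r)
    return sum(parts) / len(parts)


def complete_linkage(dist, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest members are closest, until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters.pop(q)  # q > p, so index p is unaffected
        clusters[p].extend(merged)
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette_width(dist, labels):
    """Average Silhouette width; assumes at least two clusters."""
    n = len(labels)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: width is defined as 0
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / n


# Hypothetical toy records: (age, sex, symptom score).
records = [(30, "F", 1.0), (32, "F", 1.5), (31, "M", 1.2),
           (60, "M", 8.0), (62, "F", 8.5), (61, "M", 9.0)]
ranges = [32.0, None, 8.0]  # age range, categorical, score range
dist = [[gower_distance(a, b, ranges) for b in records] for a in records]
labels = complete_linkage(dist, k=2)
width = silhouette_width(dist, labels)
```

In this toy example the first three and last three records fall into separate clusters, and a Silhouette width closer to 1 would indicate a cleaner separation, in line with the interpretation given above.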

BN modelling
All continuous variables were scaled to a mean of zero and standard deviation (SD) of one after subgrouping, but before performing BN modelling. In the BN framework, prior knowledge of known relationships can be included in the model as blacklist and whitelist arcs (Supplementary material). Structural expectation-maximization of the hill-climbing (HC) algorithm was used for structural learning for each dataset with the blacklist and whitelist included [37]. The HC algorithm iteratively adds, deletes, or reverses edges until the Bayesian Information Criterion of the model fit can no longer be improved [37].
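
As a rough illustration of the add/delete/reverse search described above, the following Python sketch runs a plain greedy hill-climb with a BIC score on simulated, z-scored data. It is a simplification under stated assumptions, not the study's procedure: it omits the structural expectation-maximization step (so it needs complete data), assumes linear Gaussian relationships, and uses hypothetical names (`hill_climb`, `node_bic`, `network_bic`, `is_acyclic`, `solve`) and a hypothetical three-variable chain.

```python
import math
import random
from itertools import permutations


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def node_bic(cov, n, child, parents):
    """BIC term for one node of a linear Gaussian network (higher is better)."""
    parents = sorted(parents)
    resid_var = cov[child][child]
    if parents:
        spc = [cov[p][child] for p in parents]
        beta = solve([[cov[p][q] for q in parents] for p in parents], spc)
        resid_var -= sum(b * s for b, s in zip(beta, spc))
    loglik = -0.5 * n * (math.log(2 * math.pi * max(resid_var, 1e-12)) + 1)
    # Parameters: one slope per parent, an intercept, and a residual variance.
    return loglik - 0.5 * (len(parents) + 2) * math.log(n)


def network_bic(cov, n, arcs):
    return sum(node_bic(cov, n, c, {a for a, b in arcs if b == c})
               for c in range(len(cov)))


def is_acyclic(arcs, d):
    """True if the directed graph has no cycle (peel off parentless nodes)."""
    left = set(range(d))
    while left:
        roots = {v for v in left
                 if not any(b == v and a in left for a, b in arcs)}
        if not roots:
            return False
        left -= roots
    return True


def hill_climb(cov, n, blacklist=(), whitelist=()):
    """Apply the best single-arc change until the BIC stops improving."""
    d, arcs = len(cov), set(whitelist)
    best = network_bic(cov, n, arcs)
    while True:
        moves = []
        for a, b in permutations(range(d), 2):
            if (a, b) in arcs and (a, b) not in whitelist:
                moves.append(arcs - {(a, b)})                    # delete
                if (b, a) not in blacklist:
                    moves.append((arcs - {(a, b)}) | {(b, a)})   # reverse
            elif (a, b) not in arcs and (b, a) not in arcs and (a, b) not in blacklist:
                moves.append(arcs | {(a, b)})                    # add
        scored = [(network_bic(cov, n, m), m) for m in moves if is_acyclic(m, d)]
        if not scored:
            return arcs
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score <= best + 1e-9:
            return arcs
        arcs, best = top, top_score


# Simulated chain 0 -> 1 -> 2 (hypothetical data, not the study cohort).
rng = random.Random(0)
rows = []
for _ in range(500):
    x0 = rng.gauss(0, 1)
    x1 = 0.8 * x0 + rng.gauss(0, 1)
    x2 = 0.8 * x1 + rng.gauss(0, 1)
    rows.append((x0, x1, x2))
n, d = len(rows), 3
mean = [sum(r[j] for r in rows) / n for j in range(d)]
sd = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / n) for j in range(d)]
z = [[(r[j] - mean[j]) / sd[j] for j in range(d)] for r in rows]  # mean 0, SD 1
cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
learned = hill_climb(cov, n, blacklist={(2, 0), (2, 1)}, whitelist={(0, 1)})
```

Here the whitelist forces the arc 0 → 1 into the network and the blacklist forbids any arc leaving variable 2, mirroring how prior knowledge constrains the search. Because the BIC of a Gaussian network cannot distinguish between some arc directions, such constraints are what orient the learned arcs in this sketch.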

Structural Equation Modelling (SEM)
The structural paths from the BN models were used for SEM analysis to estimate the parameters, as described in previous paragraphs. The same estimator and model fit indices as for the CFA were used. For the measurement and path models, the standardised coefficients are reported. Significance was defined by P < 0.05.

A total of 3,849 participants were included in the analysis. Table 1 reports the descriptive characteristics of the participants in subgroups 1 (n = 2,358) and 2 (n = 1,491). Participants in subgroup 1 had poorer physical attributes, higher LBP and leg pain intensities, more negative psychological attributes, and higher disability, compared to subgroup 2 (Table 1).

Path coefficients
For the whole cohort, the explained variance of disability, as measured by the Oswestry Disability Index (ODI), was R² = 0.59. The variable most strongly associated with ODI was pain, where a one SD higher pain severity was associated with a 0.417 SD higher ODI (P < 0.001). Psychology was directly associated with ODI (β = 0.310, P < 0.001), and also indirectly via pain (Figure 3, Table 2). A more negative psychological level was associated with higher pain severity (β = 0.734, P < 0.001), whilst higher pain severity was associated with higher ODI (Figure 3, Table 2).

For subgroup 1, the explained variance of ODI was R² = 0.51. The variable most strongly associated with ODI was psychology, where a one SD more negative psychology level was associated with a 0.363 SD higher ODI (P < 0.001) (Table 3). Physical was directly associated with ODI (β = −0.077, P = 0.004), and also indirectly via pain and psychology (Figure 4, Table 3). Activity was directly associated with ODI (β = −0.203, P < 0.001), and also indirectly via the path of psychology, and the serial paths of physical and pain (Figure 4, Table 3).

For subgroup 2, the explained variance of ODI was R² = 0.48. The variable most strongly associated with ODI was pain, where a one SD higher pain severity was associated with a 0.408 SD higher ODI (P < 0.001) (Table 4)

Psychological, physical, activity, pain, and disability factors either worsened or improved together in both subgroups [38,39]. One study which used K-means clustering reported that the "Severe physical-psychological" group had worse self-reported physical impairment, psychological distress, and pain levels than the "Mild" group [38]. Another study that used hierarchical clustering reported that the "Maladaptive" group had low positive affect, atypical trunk muscle activity, and higher pain intensity compared with an "Adaptive" subgroup [39].
An interesting observation in that study was that the link between physical factors and pain was present only in the subgroup with the poorer psychological state. In treatments like cognitive functional therapy [40], the rationale for treating both psychological and physical factors is that negative psychological factors can result in physical impairment [16], which results in greater pain. The present study's findings suggest poor physical health and activity levels are not only a consequence but may also be a predictor of pain and disability that is partially explained by psychological health, even in people with a poorer psychological state.

In subgroup 1, where symptoms and signs were worse than in subgroup 2, the model suggested that the physical factor directly affected the psychological factor, and also indirectly via the pain factor. This implies that an intervention that attempts to improve the average value of the physical factor over a period of time can be expected to result in improvements in the average value of the psychological factor, part of which can be attributed to the intermediary effect of pain (i.e. "between-subject" effect) [28]. Alternatively, if the observed associations reflect a within-person process, an intervention that attempts to improve the physical factor now can expect to find improvements to the psychological factor shortly after (i.e. "within-subject"

has been recommended for "medium risk" individuals [47]. This aligns with our findings in subgroup 2, but given that the model fit in subgroup 2 was inadequate, we are cautious about drawing interpretations from these findings.

This study has several limitations. First, being a cross-sectional study, extrapolating our findings to longitudinal changes over time within a participant should be done with caution. The present findings should be interpreted within an exploratory causal hypothesis generation framework.
To date, it is still uncertain how quickly physical, psychological, activity, and function factors influence each other [7]. For example, kinesiophobia and depression predicted disability when both these variables were measured at the same time, and not when they were measured two days apart [7]. This suggests that kinesiophobia and depression affect disability in ≤48 hours [48]. Second, the relationship between our latent variables of pain, psychological, and physical factors may change depending on the observed variables collected. Presently, the latent variable of physical factors comprised muscle endurance and mobility measures. Hence, it was deemed biologically reasonable for it to both affect and be a result of the latent variable of psychology. A third limitation of the present study was that the influence of potential unmeasured variables, like sleep, on the variables included in the network analysis was not investigated.

Conclusion

In the present study, pain and psychological factors directly predicted disability, regardless of symptom severity, albeit with different paths of action. Negative psychological features were more likely to be a consequence of pain and reduced physical factors in individuals with worse overall symptoms. In contrast, psychological features in individuals with milder overall symptoms were more likely to contribute to pain and negative physical factors. Notwithstanding that within-subject pathways cannot be established from cross-sectional data, data-driven structural learning of subgroup-specific pathways may open the doors toward more optimal individualised treatments to better manage a complex disorder like LBP.

Activity 4(2) 5(2) 4(2) <0.001