Original Article
Exploring conceptual preprocessing for developing prognostic models: a case study in low back pain patients

https://doi.org/10.1016/j.jclinepi.2020.02.005
Under a Creative Commons license
open access

Abstract

Objectives

Conceptually oriented preprocessing of a large number of potential prognostic factors may improve the development of a prognostic model. This study investigated whether various forms of conceptually oriented preprocessing, or the preselection of established factors, were superior to using all factors as input.

Study Design and Setting

We made use of an existing project that developed two conceptually oriented subgroupings of low back pain patients. Based on the prediction of six outcome variables by seven statistical methods, this type of preprocessing was compared with medical experts’ preselection of established factors, as well as using all 112 available baseline factors.

Results

Subgrouping of patients was associated with low prognostic capacity. Applying a Lasso-based variable selection to all factors or to domain-specific principal component scores performed best. The preselection of established factors showed a good compromise between model complexity and prognostic capacity.
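To make the idea of Lasso-based variable selection concrete, the following is a minimal sketch on simulated data, not the study's implementation. The abstract does not state which software or penalty was used, so everything here is an assumption for illustration: the function names, the penalty value of 0.3, the ten simulated factors (instead of the 112 baseline factors), and the outcome, which is generated from factors 0 and 3 only. The Lasso is fitted by plain coordinate descent with soft-thresholding, and factors whose coefficient is shrunk exactly to zero are dropped.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardise_columns(rows):
    """Scale every column to mean 0 and (population) standard deviation 1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def lasso_coordinate_descent(X, y, penalty, sweeps=100):
    """Minimise (1/(2n)) * sum((y - X b)^2) + penalty * sum(|b|).

    Assumes standardised columns in X and a centred y (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(sweeps):
        for j in range(p):
            # Covariance of column j with the partial residual that excludes
            # factor j; the column variance is 1 after standardisation.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + beta[j]
            updated = soft_threshold(rho, penalty)
            change = updated - beta[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * change
                beta[j] = updated
    return beta


def simulate_and_select(seed=0, n_patients=200, n_factors=10, penalty=0.3):
    """Simulate factors and an outcome (hypothetical data), then run the Lasso."""
    rng = random.Random(seed)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)] for _ in range(n_patients)]
    # Assumption for illustration: only factors 0 and 3 carry prognostic signal.
    outcome = [2.0 * r[0] - 1.5 * r[3] + rng.gauss(0.0, 0.5) for r in raw]
    X = standardise_columns(raw)
    mean_y = sum(outcome) / n_patients
    y = [v - mean_y for v in outcome]
    beta = lasso_coordinate_descent(X, y, penalty)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    return selected, beta


if __name__ == "__main__":
    selected, beta = simulate_and_select()
    print("selected factor indices:", selected)
    print("coefficients:", [round(b, 3) for b in beta])
```

In practice the penalty would be chosen by cross-validation rather than fixed. The same routine could be applied to domain-specific principal component scores instead of raw factors by replacing the columns of X with those scores.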

Conclusion

The prognostic capacity is hard to improve by means of a conceptually oriented preprocessing when compared to purely statistical approaches. However, a careful selection of already established factors combined in a simple linear model should be considered as an option when constructing a new prognostic rule based on a large number of potential prognostic factors.

Keywords

Prognostic models
Preprocessing
Subgrouping
Latent class analysis
Low back pain
Lasso
Random forest
Linear model

Funding statement: AMN was partially funded, and the original data collection was fully funded, by the Danish Foundation for Chiropractic Research and Post Graduate Education, Denmark (grant numbers 11/1445 and 01/1624, respectively). SF acknowledges an Eccellenza grant (186932) from the Swiss National Science Foundation (SNSF). The funding bodies had no control over the design, conduct, data, analysis, review, reporting, or interpretation of the research.

Conflict of interest statement: The authors have no financial or non-financial competing interests to declare.