Prediction of segmental motor outcomes in traumatic spinal cord injury: Advances beyond sum scores

Background and objectives: Neurological and functional recovery after traumatic spinal cord injury (SCI) is highly challenged by the level of the lesion and the high heterogeneity in severity (different degrees of in/complete SCI) and spinal cord syndromes (hemi-, ant-, central, and posterior cord). So far outcome predictions in clinical trials are limited in targeting sum motor scores of the upper (UEMS) and lower limb (LEMS) while neglecting that the distribution of motor function is essential for functional outcomes. The development of data-driven prediction models of detailed segmental motor recovery for all spinal segments from the level of lesion towards the lowest motor segments will improve the design of rehabilitation programs and the sensitivity of clinical trials. Methods: This study used acute-phase International Standards for Neurological Classification of SCI exams to forecast 6-month recovery of segmental motor scores as the primary evaluation endpoint. Secondary endpoints included severity grade improvement, independent walking, and self-care ability. Different similarity metrics were explored for k-nearest neighbor (kNN) matching within 1267 patients from the European Multicenter Study about Spinal Cord Injury before validation in 411 patients from the Sygen trial. The kNN performance was compared to linear and logistic regression models. Results: We obtained a population-wide root-mean-squared error (RMSE) in motor score sequence of 0.76(0.14, 2.77) and competitive functional score predictions (AUC walker = 0.92, AUC self-carer = 0.83) for the kNN algorithm, improving beyond the linear regression task (RMSE linear = 0.98(0.22, 2.57)). The validation cohort showed

comparable results (RMSE = 0.75(0.13,2.57), AUC walker = 0.92).We deploy the final historic control model as a web tool for easy user interaction (https://hicsci.ethz.ch/).Discussion: Our approach is the first to provide predictions across all motor segments independent of the level and severity of SCI.We provide a machine learning concept that is highly interpretable, i.e. the prediction formation process is transparent, that has been validated across European and American data sets, and provides reliable and validated algorithms to incorporate external control data to increase sensitivity and feasibility of multinational clinical trials.

Background
Traumatic spinal cord injury (SCI) greatly impacts the quality of life of affected individuals due to impairments including sensorimotor function.(Badhiwala et al., 2019;Post and Noreau, 2005) Patients and their families have important questions regarding their prognosis of recovery and probable long-term disability to plan for their healthrelated, financial, and social future.Personalized outcome predictions could provide a more specific perspective and help guide physicians in their rehabilitation planning.However, accurate prediction of individual sensorimotor recovery after traumatic SCI is challenging as injury and recovery patterns of traumatic SCI are variable (Fawcett et al., 2007;Kramer et al., 2012).
Currently, SCI severity is clinically assessed using the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) exam (Kirshblum, n.d.;Kirshblum and Waring 3rd., 2014) based on scoring a patient's motor and sensory function at the level of individual myotomes and dermatomes, and further summarized by total motor and sensory scores and whether the injury is complete or incomplete using the American Spinal Injury Association (ASIA) impairment scale (AIS) (Rupp et al., 2021).Despite this detailed assessment, current prediction models and evaluation endpoints are reduced to sum scores (upper, lower extremity, and total motor scores (UEMS, LEMS, TMS) that aggregate the segmental data (DeVries et al., 2020;Wilson et al., 2012;Kaminski et al., 2017).This results in oversimplification of the complex underlying patterns of retained motor function that may greatly impact a patient's functional ability.Similarly, prediction rules are restricted to group levels.An improved method is unbiased recursive partitioning with conditional inference trees (URP) (Buri et al., 2022) which assigns a patient to a more specific subgroup based on their sum scores and AIS grade alone.However, personalized predictions that utilize the segmental motor score sequences of individuals are lacking.Similarly, for functional endpoints, the range of investigated approaches leaves room for improvement.The current state-of-the-art in the field is a logistic regression model by van Middendrop et al. (van Middendorp et al., 2011) that provides a prediction of an individual's walking ability.
In the era of growing SCI databases, it is a natural next step to consider data-driven recovery prediction models (Belliveau et al., 2016;Khan et al., 2019;Inoue et al., 2020;McCoy et al., 2019), to create personalized predictions based on the full segmental motor score sequence, and to inform functional evaluation endpoints.While some personalized biomarkers of recovery have previously been developed based on magnetic resonance imaging or injury-derived serum or cerebrospinal fluid (CSF) proteins, these are not available in large databases comprising hundreds to thousands of patients (Kirshblum et al., 2021;Kirshblum et al., 2016;Wilson et al., 2018;Freund et al., 2019;Dalkilic et al., 2018).This study proposes methods to more fully utilize the ISNCSCI scores that are part of routine clinical assessment.
Apart from providing a superior recovery prediction for the individual, the accurate prediction of the segmental motor score sequence would enable the simulation of synthetic controls for SCI trials.Despite numerous clinical trials over the last decades, effective treatments to ameliorate traumatic damage to the spinal cord, other than spinal cord decompression and optimized blood pressure control, are lacking.The heterogeneity of outcomes and the scarcity of reliable outcome predictors have likely contributed to the failure of SCI clinical trials, where, due to its low incidence (8.0 to 246.0 cases per million inhabitants per year), recruitment of participants is challenging (Feigin et al., 2019).Clinical trial planning and design may be improved through the use of historic or simulated control cohorts as extensions to placebo-controlled trial designs allowing a maximum number of patients to receive the investigational therapy.This approach has been adopted by the Nogo Inhibition in Spinal Cord Injury (NISCI) trial (clinicaltrials.gov #NCT03935321), a recently completed phase-II study that followed a phase-I study (Kucher et al., 2018)..The NISCI trial randomized patients at a ratio of 1:2 into placebo and treatment arms while accounting for the smaller randomized control group through historical controls.Integral to the success of such advanced trial designs is a reliable means of identifying suitable historic controls.Currently, a comprehensive comparison of different ways to effectively utilize historical controls in such studies is missing.
In this study, we investigate recovery prediction based on k-nearest neighbors (kNN) (k-nearest neighbor algorithm, 2014).The "historic neighbors" of a given patient form a more accurate prognosis by pooling these known outcomes.The similarity metric quantifying which patients are comparable is key in this context.By benchmarking different ways to quantify this similarity in two international patient databases we also identify an optimal strategy to match historical controls.Our central hypothesis was that kNN regression would be superior in predicting recovery as compared to patient stratification by conventional clinical (e.g.sum scores) and demographic features or popular linear regression and classification models (van Middendorp et al., 2011).Our predictions advance beyond the previous modeling of sum scores to incorporate the full segmental motor score sequence and functional endpoints.We further include a notion of prediction confidence stemming from the assessment uncertainty of the ISNCSCI exam.By making the framework publicly available as a web tool the historic control matching methodology is available for the community to explore.

Included data
We based the analysis on the European Multicenter Study about Spinal Cord Injury (EMSCI, clinicaltrials.gov#NCT01571531, data collected between 2000 and 2021), comprising over 5000 patients as of April 2021.The EMSCI study is performed in accordance with the Declaration of Helsinki and approved by all contributing institutional review boards.We validate our findings against an independent cohort from the Sygen trial (Geisler et al., 2001a;Geisler et al., 2001b;Geisler et al., 2001c) conducted between 1992 and 1998 in the United States.Given no treatment effect was demonstrated by the Sygen trial we pooled control and intervention groups.Table 1 summarizes demographic information and injury characteristics of the EMSCI and Sygen cohorts as used within this study.

Patient inclusion criteria
Included patients with traumatic SCI were required to have complete ISNCSCI (Rupp et al., 2021;Kirshblum et al., 2011) assessments and AIS grades in the acute (within two weeks of injury) and the early recovery phase (~26 weeks, 150-186 days).We excluded patients with a neurological level of injury (NLI) in the sacral segments and those displaying notable deterioration in motor function (> 1 point per scorethis threshold was motivated by the assessment uncertainty, see Fig. S2).Patients from EMSCI and Sygen were included irrespective of the year in which the SCI occurred, i.e. pooling data over several decades.Despite significant changes in patient management (Cornea et al., 2022;Schiller and Mobbs, 2012), we justify this step as no differences in improvements of recovery were demonstrated (Bourguignon et al., 2022).This led to 1267 patients from the EMSCI (Table 1, Fig. S1) being identified as a reference pool and assessment of the kNN by leave-one-out cross-validation.Patients from the Sygen trial were matched to the EMSCI reference pool.
Included variables for neighbor matching (2-week assessments): All ISNCSCI scores, including motor (MS) and sensory scores (SS, i.e. light touch (LTS) and pinprick (PPS)), deep anal pressure (DAP) sensation and voluntary anal contraction (VAC) were used.Optional patient matching was performed based on the NLI (absolute match), patient demographics (age (ten-year interval), sex (absolute match)), and the AIS grade (absolute match).

Evaluation endpoints (26-week assessments)
Figure 1B summarizes the neurological and functional evaluation endpoints used.The primary evaluation endpoint was recovery of motor function quantified as root-mean-squared errors (RMSE) below the NLI (RMSE blNLI ) between the true and predicted segmental MS sequences.Secondary endpoints included differences in LEMS (ΔLEMS), AIS grade conversion by at least one grade, the ability to walk independently (van Middendorp et al., 2011), or to care for oneself.For the latter, the Spinal Cord Independence Measure (SCIM) subscores (#1-4) assessing selfcare were dichotomized into self-carers (SCIM 1-4 ≥ 15, i.e. reaching at least 75% functional ability which was here deemed relevant) and dependent patients (SCIM 1-4 < 15).The EMSCI study used different versions of the SCIM (II (Catz et al., 2001) and III (Itzkovich et al., 2018)) which were pooled and available for 1189 of the 1267 patients.For patients within Sygen, SCIM was not assessed.Instead, Benzel et al.'s modified Japanese Orthopaedic Association Scale (Benzel et al., 1991) with equivalent cutoff of 4 to predict walking ability was used.This stratified patients into those unable to walk (4 and below), and those with limited (5) to normal (7) walking.Self-care potential was not assessed.

kNN implementation
All of the described concepts were implemented in Python (version 3.8) and our code is available on https://github.com/BMDS-ETH/SCIRecoveryPredictionPublic.git.Fig. 1A outlines the three-step procedure followed for neighbor matching and prediction calculation: Reference pool stratification (step 1), MS matching (step 2), and neighbor agglomeration (step 3).Step 1 comprises an optional stratification by variables not representing the motor function pattern (NLI, AIS grade, sex, age +/− 10-year bracket) to account for biases.All retained reference patients matched the patient of interest for the optionally selected variables.In step 2 the sensorimotor function of the patient of interest was matched based on one of four similarity metrics summarized in Table 2 matching either LEMS (+/− 5 points, Type 1), or the mean MS below the NLI (+/− 0.5 points, Type 2), or achieved an RMSE blNLI below 0.5 between the acute-phase MS sequence with that of the patient of interest (Type 3), or was identified as kNN (with k in [1,3,10,20]) based on either all MSs, or both MSs and SSs.The recovery-phase outcome variables of all identified neighbors are agglomerated by either mean or median in the final step to arrive at the prediction.

Alternative prediction models
We compare our prediction of walking ability to a logistic regression (sklearn.linear_model.LogisticRegression) based on five input variables: age dichotomised at 65 years, and the larger of the left/right side motor and light touch scores of the gastrocsoleus (S1) and quadriceps femoris (L3) muscles.We performed leave-one-out cross-validation on the EMSCI data, whereas a model trained on all EMSCI data was used for predictions in the Sygen subcohort.Similarly, we provide predictions of MSs at 6 months from a Ridge regression model (sklearn.linear_model.Ridge) using all acute-phase MS and SS as inputs.Here hyperparameters were tuned via grid search on the validation subset of a 5-fold nested cross-validation, and results are reported for the pooled test sets over all folds.

Motor score assessment uncertainty and bootstrapping
Despite best efforts, the ISNCSCI scores are subject to uncertainty stemming from the examiner's level of experience (systematic) and personal fluctuation (random), as well as the patients' compliance with the examination and natural variation of physical fitness (random variation) (Franz et al., 2022;Schuld et al., 2013).We base our estimation of MS uncertainty distributions on a study by Bye et al. (Bye et al., 2021) to calculate the probability density function for each MS level (Fig. S2).We calculate n = 100 bootstraps for each MS sequence and repeat the matching.In all visualizations, we show median values with 95% confidence intervals.

Prediction performance reporting
We represent the segmental MS distribution as a graph, whose y-axis shows the MS, and the x-axis the segments of the key muscle from C5 to T1 and L2 to S1 in the order from rostral to caudal.For each patient, we quantify the agreement between sequences based on the RMSE blNLI .Models are ranked based on the normalized (to the minimum over all models) sum of the median and 97.5th RMSE blNLI percentiles taken over all patients.The obtained rank is used to identify the best-performing model for neurological recovery prediction.
Functional score prediction performance is quantified as the area under the receiver operator characteristic (ROC-AUC) based on the probability of scores of all matched neighbors for a patient.

Subgroup assignment and dimensionality reduction
Patient subgroups were assigned using k-means clustering (sklearn.cluster.KMeans) over all acute-phase ISNCSCI scores scaled to [0,1].We chose a number of eight clusters as a compromise between the minimum number of patients per cluster (99) and subgroup detail covered.We visualize the identified subgroups of patients in a 2D-dimensional embedding of the high-dimensional feature space using Uniform Manifold Approximation and Projection (u-map) (McInnes, 2018) implemented in the umap package.

Ethics approval and consent to participate
The EMSCI study is performed in accordance with the Declaration of Helsinki and approved by all contributing institutional review boards.EMSCI follows the ethical guidelines of the participating countries and patients gave their written informed consent before being included in the database.The Sygen clinical trial also received ethical approval but was conducted before clinical trials were required to be registered (i.e.,  no clinicaltrial.govidentifier was available).

Results
Figure S1 shows the CONSORT-style flowchart illustrating the selection of 1267 EMSCI (primary cohort; reference pool) and 411 patients from the Sygen trial (validation cohort).Table 1 shows relevant summary statistics.

Optimal matching strategy identification
In total 176 different combinations of reference pool stratification (step 1), MS matching (step 2), and neighbor agglomeration (step 3) were tested.Table 3 summarizes the performance metrics for the best combination for each of the MS similarity metrics.
We observe that a combination of matching by AIS grade with a 20-NN regression-based on MSs alone (no sensory scores) and mean agglomeration provided the best compromise in terms of median and 95%CI RMSE blNLI at 6 months between the true and predicted segmental MS sequences.This implies that identifying multiple rather than a single closest match was superior, and that encoding sensory information through AIS grade was sufficient to include information regarding sensation.Further patient stratification, e.g. by age and sex, or NLI did not improve neurological recovery prediction performance.Matching based on the LEMS rather than the full segmental MS sequence also reduced performance considerably (median RMSE blNL, Type1 = 0.98 (0.25, 2.70)), whereas alternative metrics such as the mean MS below the NLI (RMSE blNL,Type2 = 0.77 (0.00, 3.39)), or the RMSE (RMSE blNL, Type3 = 0.75 (0.03, 3.00)) showed only small differences to the optimal strategy (RMSE blNL,Type4A = 0.76 (0.14, 2.77)).The kNN approach also clearly outperformed a linear regression model for the prediction of the MS sequence (RMSE blNL, Ridge = 0.98 (0.22, 2.57)).We show a representative example of a patient with tetraplegia and AIS grade A in Fig. 2A illustrating the full range of predicted parameters along with the prediction confidence scores.
The kNN also performed well for AIS grade conversion (improvement for inclusion of SSs) and functional score predictions (ROC-AUC best for walking ability prediction, 8th for AIS grade conversion, best model without age/sex information for self-care ability prediction).Generally, AIS grade conversion was less accurately predicted, and selfcare ability prediction benefitted from the inclusion of age and sex information (Table S1).Walking ability prediction within the EMSCI cohort (ROC-AUC kNN = 0.92) was comparable to that achieved by a logistic regression model (ROC-AUC LR (van Middendorp et al., 2011;Hicks et al., 2017) = 0.94, Fig. 2B).

Patient subgroup performance variation
Next we assessed differences in prediction performance between different patient subgroups for the 20-NN approach.For visualization, all patient data are projected into a two-dimensional embedding of the full variable spaces using u-map (McInnes, 2018) (Fig. 3A).By coloring patients in this embedding by selected clinical scores (AIS grade, NLI, LEMS, DAP) subgroups become apparent.Fig. 3B shows the variation in prediction performance (i.e.recovery-phase RMSE blNLI ) for the patient cohort.Predictions for patients with an acute phase AIS A and D were generally more accurate than those for AIS B and C. We quantify patient subgroup performance in more detail by clustering the cohort (Fig. 3C) and assessing the performance metrics for each of the clusters (Table 4)).The identified clusters separate the cohort by the NLI, AIS grade, and retained motor and sensory function as indicated in the accompanying heatmap (Fig. 3C).Clusters 1 to 4 comprised more heterogeneous  4. The heatmap (Fig. 3C) also gives an indication of the approximate NLI within each cluster.Within each cluster, some patients did not achieve an acceptable (i.e.RMSE blNLI < 1.0) prediction as indicated by wide 95% confidence intervals of the reported metrics.We observe that clusters comprising a larger fraction of patients with complete (AIS A) paraplegia (clusters 7, 8) or a large proportion of patients with an AIS D (cluster 3) performed best.Predictions were less accurate for patients with a lumbar injury (cluster 2) or tetraplegic AIS B, and C patients (cluster 4).Representative examples of patients and the 20-NN predictions achieving subgroup median performance scores from each cluster are shown in Fig. 3D.All predictions are provided with 95% confidence bands for the predicted MS sequences and a probability in addition to a binary label for functional endpoints.This illustration gives a notion of the clinical relevance of the achieved median RMSE blNLI given the uncertainty in the ISNCSCI score assessment.

External validation
We tested the proposed pipeline on an independent cohort derived from the Sygen trial (Table 5).Here, the same model identified for the EMSCI cohort (AIS; Type 4 (k = 20); mean) performed best in terms of neurological recovery and achieved a comparable ROC-AUC for walking ability prediction as in the primary cohort and for both the kNN and logistic regression models (Fig. 2B, ROC-AUC LR = ROC-AUC kNN Sygen = 0.92).Results in terms of RMSE blNLI and ΔLEMS were also comparable for patient subgroups and the population as a whole (Table 5) with the worst performance observed in clusters 2 and 4, whereas patients in clusters 7 or 8 had the best prediction performance.In summary, we were able to reproduce our findings in an independent patient cohort.

Web tool deployment
We make the best-performing approach that matches patients based on their AIS grand and MS sequence to 20 nearest historical neighbors publicly available as a web tool (https://hicsci.ethz.ch/).Here users can provide 20 MS and the AIS grade as assessed in the very acute phase of the SCI to be provided with a prediction regarding AIS grade, walking and self-care ability, as well as MS sequence at six months after the injury.As in the presented study, the EMSCI database is used here as a reference cohort.

Discussion
KNN has previously been applied in the context of SCI predictions but not applied to recovery prediction from ISNCSCI scores (Popp et al., 2019;Masood et al., 2022;Kayikcioglu and Aydemir, 2010).Given its ease of calculation and direct interpretability, kNN is particularly promising for highly heterogeneous, small patient cohorts characterized by few clinical variables, such as SCI.As a key finding of this study, we present an optimal strategy to identify patients that are similar in a reference cohort and provide very detailed predictions (full segmental motor score sequence as well as functional endpoints) together with a notion of prediction confidence in light of input data uncertainty.This poses an important advancement beyond the use and prediction of sum scores alone.Validating our results in an external patient cohort emphasizes the generalizability of the proposed method.The 20-NN matching based on all MSs with AIS grade stratification may hence be a suitable strategy for personalized recovery predictions, and identification of historic controls for clinical trials.Importantly, we show that choosing multiple, rather than a single closest match would be superior for this task.
A range of similarity metrics was investigated to account for popular clinical metrics such as the LEMS in addition to metrics scoring the entire sequence of motor (and sensory) scores.This comparison emphasizes the benefit of including information on the full segmental MS sequence but also points out that accounting for retained sensory function through AIS grades was superior to including all sensory scores assessed by the ISNCSCI exam.In agreement with previous work (Bourguignon et al., 2022), age or sex information did not improve neurological outcome predictions but benefitted the prediction of functional endpoints (Wilson et al., 2014;Lee et al., 2016).These variables were not required by the top-performing matching strategies to predict MS sequences, however, we observed that age and sex information benefitted the prediction of self-care ability.We suggest that neurological assessments, such as MSs, are intrinsically reported relative to the expected maximum possible implying a potential assessment bias concerning age and sex.Functional endpoints, on the other hand, assess the ability to perform specific tasks which in turn may be driven by the patient's age.
We further show the superiority of 20-NN over a linear regression model to predict neurological recovery while performance for functional endpoints was comparable to that of a logistic regression (van Middendorp et al., 2011;Hicks et al., 2017).We suggest that an important aspect of this outcome may have been the inclusion of ISNCSCI scores based on the Hamming distance upon nearest neighbor identification.This metric treats the scores as categorical rather than continuous variables which is more representative of their ordinal nature.Also, despite its popularity, a linear regression model using all ISNCSCI scores inherently violates the assumption of statistically independent input variables which may negatively impact the model's performance even in the presence of feature selection through regularization in a Ridge

Table 4
Performance comparison of different kNN similarity metrics evaluated on patient subgroups within the primary (EMSCI) cohort.Neurological recovery prediction performance is scored using the RMSE blNLI , and ΔLEMS.Abbreviations: AIS: American Spinal Injury Association impairment scale, NLI: neurological level of injury, LEMS: lower extremity motor score, RMSE blNLI : rootmean-squared error below the NLI, MS: motor score, ROC-AUC: area under the receiver operator characteristic.Performance comparison of different kNN similarity metrics evaluated on the validation (Sygen) cohort both the cohort as a whole and subgroups are reported.Neurological recovery prediction performance is scored using the RMSE blNLI , and ΔLEMS.Functional recovery prediction performance is assessed by AIS grade conversion (at least one point), and walking ability.Positive class prevalences: AIS A conversion: 0.24, AIS B conversion: 0.63, AIS C conversion: 0.87, walker: 0.24.Note that results in clusters comprising very few patients may be subject to large variation (greyed out).Abbreviations: AIS: American Spinal Injury Association impairment scale, NLI: neurological level of injury, LEMS: lower extremity motor score, RMSE blNLI : root-mean-squared error below the NLI, MS: motor score, ROC-AUC: area under the receiver operator characteristic.regression implementation.Subgroup analysis showed an expected pattern.Patients presenting with either severe (AIS A) or mild injuries (AIS D) achieved better predictive performance, likely due to ceiling and flooring effects in addition to larger patient numbers.Similar to previous studies (Lee et al., 2016;Zörner et al., 2010), we recorded the strongest variation for AIS B and C patients, which represent 30% of the included.Patients with an AIS B/C display large variations in motor and sensory function implying a stronger degree of interpatient heterogeneity.However, the reported predictions still succeeded in predicting the general trends and MS sequence patterns as demonstrated by quantitative metrics (RMSE) and representative examples (Fig. 2A, Fig. 3D).
Several studies have previously addressed the challenge of SCI recovery prediction of various endpoints such as walking scores, AIS grade conversion, or TMS recovery (Belliveau et al., 2016;Hicks et al., 2017;Chay and Kirshblum, 2020;Denis et al., 2018).Despite promising results within small cohorts, including detailed additional data such as CSF biomarkers or imaging data, (Freund et al., 2019;Kim et al., 2012) a more generalizable approach based only on clinically well-established scores allowing for large cohort evaluations was missing.Our analysis fills this gap with a data-driven, personalized, and multi-faceted recovery prediction for the individual.An important advantage of our study was the validation in an independent patient cohort as well as the inclusion of prediction confidence ranges due to ISNCSCI score uncertainty which has not been previously reported.In summary, we improved beyond the state-of-the-art in terms of prediction detail, interpretability of the underlying model, and the accounting for input data uncertainty.
The clinical impact of this study is twofold.Firstly, the presented concept provides a prediction regarding neurological and functional recovery for the individual.The prediction process is inherently interpretable by reviewing the selected neighbors.Secondly, we provide a means to identify historic controls for individual trial patients to increase and improve the pool of controls in clinical trials.This is of particular importance given the large heterogeneity and low incidence of SCI.Our publicly available code enables the direct application of the concept.An in silico evaluation of the statistical power of this way to identify historic control cohorts will be addressed in our follow-up study.
Despite this success, some limitations remain.The ISNCSCI exam data lacked potentially important additional clinical information, e.g. the timing and success of decompression surgery, patient comorbidities, or other treatment and response effects (e.g.blood markers, medication, or imaging information).To the best of our knowledge, such information is not currently available for a sufficiently large patient population in a database.Relying solely on the ISNCSCI exam was a compromise between the number of patients and the assessment detail covered.We suggest repeating the presented evaluation once more detailed data become available.Importantly, given the dynamically evolving field of SCI management, we stress that the proposed approach is flexible to restrict patient reference pools to those of the specific hospital environment to mitigate the possible impact of specific rehabilitation strategies.The presented study is intended as a proof of principle that is flexible to work with different data sets as defined by the used.
We purposefully did not include longitudinal information of later time points as the aim of a historic cohort by our interpretation should be i) a perspective of recovery in the very acute injury phase, and ii) historic control cohort building for trials in which patients are randomized in the acute injury phase.It is known that adverse, diseasemodifying events, e.g.severe pneumonia, may greatly influence recovery (Kopp et al., 2017).Since we restricted this analysis to a single time point for neighbor matching, we ignored longitudinal changes in recovery trajectory due to modifying events and excluded patients that greatly deteriorated.The proposed analysis hence represents predictions based on an expected normal/optimal recovery trajectory only and this limitation should be considered in case a patient experiences a severe disease-modifying event.
Patient numbers and the comparably early time of evaluation (6 months) may be a further limitation.The EMSCI database is one of the largest, longitudinal SCI databases to date assessing patients at 26 and 52 weeks at the latest.Owing to large proportions of missing data, it was not possible to include all enrolled patients and motivated the use of the 6-month time point given larger data availability.Our kNN is strongly based on motor scores which are difficult to impute leading to completecase analysis, i.e. discarding any patients with missing data (EMSCI 75%, Sygen 46%, see Fig. S1).We would like to stress that SCI recovery at 6 months is likely not final and further improvements are possible.However, we presented a proof-of-principle evaluation for a single time point (six months) that is translatable to different evaluation endpoints.
Another limitation of the concept in itself is the lack of any underlying mechanistic modeling or description of potential relations among variables and predictions as is the case for other machine learning and mathematical models.Thus, this approach cannot infer beyond the provided pool of reference patients whereas other algorithms may increase inference strength given a sufficiently large number of training examples.This especially limits performance for patient subgroups where variation in motor and sensory scores is large, but sample numbers are limited (here AIS B and C).It would be essential to increase the pool of historic reference patients to further boost the performance of kNN in these groups.It should also be noted that the definition of AIS C has changed slightly over time with adaptations being recommended in 2003 and 2011.This implyies that this group may not be fully comparable between Sygen and EMSCI.Given that our approach accounts for more than the AIS grade for matching this should be kept in mind but is not detrimental.In a similar direction, also the way the ISNCSCI exam is conducted has evolved since the 1990s, implying possible differences in the data and relevant recorded recovery trajectories over time.While Bourguignon et al. (Bourguignon et al., 2022) assessed these changes in the EMSCI, the Sygen trial was conducted before this period.Despite these limitations, given that the Sygen trial offers the largest SCI trial database to date it is of high relevance to the field and hence motivated its choice as validation set in this study.While other clinical trial data would certainly also have been available for this task, the limitations in patient number here would have demanded pooling of cohorts from several studies which was beyond the scope of offering a first proof of principle external validation.
Finally, despite investigating several similarity metrics, it is clear that this is not a fully exhaustive evaluation and further optimization of the matching criteria is possible.

Conclusions
We presented a systematic analysis of the concept of nearest neighbor matching for recovery prediction of traumatic SCI based on acute injury phase ISNCSCI assessment scores and benchmarked it against linear prediction models.Our model provides competitive, detailed predictions of the full segmental motor score sequence in addition to functional endpoints -each of these with a notion of prediction confidence due to input data uncertainty.The approach was further successfully validated in an independent patient cohort.By making our software pipeline publicly available we provide a path for application in the community by interested patients and medical doctors.

Funding
This study was supported by the Swiss National Science Foundation (Ambizione Grant, #PZ00P3_186101), Wings for Life Research Foundation (#2017_044, #2020_118, #2024_301), and the International Foundation for Research in Paraplegia (IRP, P192).SB was supported by the Botnar Research Centre for Child Health Postdoctoral Excellence Programme (#PEP-2021-1008).The funders did not specify the study

Fig. 1 .
Fig. 1.Study overview.A: overview of the investigated nearest neighbor matching pipeline.Matching starts with an optional stratification of the reference pool (1) to match AIS grade, NLI and/or patient age (within +/− 10 years for each patient, not by age categories), and sex to the patient of interest, followed by the identification of neighbors based on one of four types of motor function pattern quantifications (2) as outlined in Table 1.The final recovery prediction constitutes an agglomeration of the recovery phase MSs of all identified neighbors by mean or median (3) leading to the predicted MS sequences shown as graphs, whose y-axis is the MS value, and the successive myotome location is encoded in the x-direction.Predictions (grey) and the true (red) MS sequence for a tetraplegic patient classified as AIS C are shown.Uncertainty bands (shaded) stem from bootstrapping.B: Evaluation endpoints assessed including scores quantifying neurological and functional recovery.Abbreviations: SCI: spinal cord injury, AIS: American Spinal Injury Association impairment scale, NLI: neurological level of injury, LEMS: lower extremity motor score, RMSE blNLI : root-mean-squared error below the NLI, MS: motor score, meanMS: mean motor score, kNN: k-nearest neighbor, SCIM: spinal cord independence measure.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2 .
Fig. 2. Prediction performance.A: Visualization of a representative patient with SCI from the EMSCI cohort achieving an RMSE bl.NLI of 0.78.We show, separately for the left and right motor scores, the patient matching (acute phase) and prediction (6 months) results for the full segmental MS sequence as the true (red) and predicted (black) sequences.Shaded bands correspond to 95% confidence bounds stemming from the MS assessment uncertainty.In row two we show the corresponding identified neighbors separately.The table summarizes the relevant prediction of additional endpoints with relevant 95% CI (numeric scores) or class probabilities (binary predictions).B: ROC with relevant AUCs for the prediction of walking ability for the EMSCI and Sygen cohorts by kNN or logistic regression.C: ROC with relevant AUC for the prediction of self-care ability for the EMSCI cohort by kNN regression, D: ROC with relevant AUCs for the prediction of AIS grade conversion for the EMSCI and Sygen cohorts by kNN regression.Abbreviations: AIS: American Spinal Injury Association impairment scale, RMSE blNLI : root-meansquared error below the NLI, MS: motor score, ROC: receiver operator characteristic.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3 .
Fig. 3. Patient subgroup performance.A: u-map embedding of the included EMSCI cohort with various clinical attributes highlighted.B: Visualization of the prediction performance (RMSE blNlI ) variation of the best model across the primary patient cohort.C: u-map embedding colored by patient cluster (left, numbered 1 to 8) with the corresponding heatmap visualizing the contribution of each MS LTS, and PPS to the clustering (right).Heatmap colors indicate the level of normalized MS, LTS, or PPS.D: For each cluster, representative examples (achieving the median RMSE blNLI in the relevant cluster) are shown with their true (red) and predicted (black) left-side recovery MSs and functional scores.Abbreviations: AIS: American Spinal Injury Association impairment scale, NLI: neurological level of injury, LEMS: lower extremity motor score, RMSE blNLI : root-mean-squared error below the NLI, u-map: uniform manifold approximation and projection, MS: motor score, LTS: light touch score, PPS: Pinprick score, DAP: deep anal pressure, ROC: receiver operator characteristic, nan: Not a number -missing data for DAP.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1
Characteristics of patients included in this analysis relative to the full EMSCI and Sygen cohorts.*percentages excluding unknowns.+ the Sygen trial was restricted to cervical and thoracic injuries.Abbreviations: EMSCI: European Multicenter Study about Spinal Cord Injury; AIS: American Spinal Injury Association Impairment Scale; NLI: Neurological level of injury.

Table 2
Overview of the four types of motor sequence quantification for matching.Abbreviations and variables: LEMS: lower extremity motor score; MS: motor score; m i : motor score of myotome i; NLI: neurological level of injury; N: number of myotomes below the NLI, RMSE: root-mean-squared error; dist H : Hamming distance; s i : score of dermatome or myotome i including LTS, PPS, and MS; LTS: light touch score, PPS: pinprick score.
Neighbors are the k nearest neighbors by Hamming distance in high dimensional space spanned by all 128 SSs and 20 MSs.All scores are normalized to the interval [0,1].kin[1,3,10,20]

Table 3
Performance comparison of different kNN similarity metrics evaluated on the primary (EMSCI) cohort.Neurological recovery prediction performance is scored in terms of the RMSE blNLI , and ΔLEMS.Functional recovery prediction performance is assessed by AIS grade conversion (by at least one point), walking (SCIM 12 > 4 points), and self-care (SCIM self-care subscore >15 points) ability.The best-performing approach based on a combination of 50th and 97.5th percentile RMSE blNLI is highlighted in bold.Positive class prevalences: EMSCI: AIS A conversion: 0.27, AIS B conversion: 0.66, AIS C conversion: 0.73, walker: 0.35, self-care: 0.57; Validation cohort (Sygen): AIS A conversion: 0.24, AIS B conversion: 0.63, AIS C conversion: 0.87, walker: 0.24.Abbreviations: AIS: American Spinal Injury Association impairment scale, NLI: neurological level of injury, LEMS: lower extremity motor score, RMSE blNLI : root-mean-squared error below the NLI, MS: motor score, ROC-AUC: area under the receiver operator characteristic.