Corrigendum to “Predicting non-response to ketamine for depression: an exploratory symptom-level analysis of real-world data among military veterans”

The authors regrettably identified a calculation mistake affecting the paper "Predicting non-response to ketamine for depression: an exploratory symptom-level analysis of real-world data among military veterans" (Miller et al., 2024). Correcting this mistake does not change any of the conclusions of the paper.
The conclusions of the paper, which remain unchanged, are as follows: (1) linear models were better than exponential models at describing the item-by-item symptom trajectories for individual patients undergoing ketamine sessions for treatment-resistant depression (TRD); (2) analysing the slopes of those models showed that all nine symptoms of depression improved across treatment, but depressed mood improved relatively faster than low energy; (3) principal component analysis (PCA) in the space of symptom trajectories revealed a first principal component (PC) that describes overall treatment response, and a second PC that reflects differential response across affective versus somatic symptom clusters; (4) logistic regression classifiers were able to predict overall treatment response (sign of the first PC) better than chance using patients' baseline PHQ-9 symptoms, as assessed by cross-validation; (5) parametrically adjusting the regression decision threshold identified a set of models that can predict treatment non-response with high negative predictive value while retaining a meaningful specificity, better than pre-defined performance criteria; and (6) analysing the features of the best models revealed that more severe baseline depression predicted response, but also that certain patterns of symptoms predicted non-response. Higher scores on item #9 (self-harm or suicidal ideation), relative to other symptoms such as item #2 (depressed mood), predicted non-response for the best model.
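The trajectory-modelling steps in conclusions (1)–(3) — fitting a linear slope per patient per symptom, then running PCA in the space of those slopes — can be sketched as below. This is a minimal illustration on synthetic data (patient count, score range, and noise level are invented here), not the study's actual analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 patients x 9 PHQ-9 items x 6 ketamine sessions.
# Scores drift downward (improvement) plus noise; purely synthetic.
n_patients, n_items, n_sessions = 50, 9, 6
t = np.arange(n_sessions)
true_slopes = -rng.uniform(0.1, 0.5, size=(n_patients, n_items))
scores = 3 + true_slopes[..., None] * t + rng.normal(0, 0.3, (n_patients, n_items, n_sessions))

# Step 1: fit a linear model per patient per item; keep the slope.
slopes = np.empty((n_patients, n_items))
for p in range(n_patients):
    for i in range(n_items):
        slopes[p, i] = np.polyfit(t, scores[p, i], deg=1)[0]

# Step 2: PCA in the space of symptom slopes (one 9-d vector per patient).
centered = slopes - slopes.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pc_scores = centered @ Vt.T          # projection of each patient onto the PCs
explained = S**2 / np.sum(S**2)      # fraction of variance per component

# In the paper, PC1 captured overall response and PC2 a differential
# affective-vs-somatic response; with synthetic data the PCs are arbitrary.
print(explained[:2])
```

The sign of each patient's projection onto PC1 is what the later classification step treats as the response/non-response label.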
We specifically identified a mistake in the calculation of cross-validated confusion matrices for the threshold-tuned models in Fig. 3c and 3d of the paper. Correcting this mistake results in slightly lower negative predictive value (NPV) and specificity for the best models. We previously wrote that the best model had an NPV of 96 % and a specificity of 22 %. After correction, the best model's NPV is 91 % and its specificity is 20 %. Both are still above our pre-defined performance criteria of > 90 % NPV and > 10 % specificity. Furthermore, we still identify dozens of different models meeting those performance criteria, though somewhat fewer than before, with a range of different NPV/specificity tradeoffs (Fig. 3c). The features of those models contribute in the same relative directions, indicating no change in how baseline symptoms predicted response/non-response (Fig. 3d). In fact, our finding that relatively higher scores on item #9 (self-harm or suicidal ideation) favored non-response was even more consistent after correction. We include here an updated Fig. 3 and list all necessary textual changes that reflect those limited corrections.
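The corrected values follow from the standard confusion-matrix definitions of NPV and specificity, applied to pooled cross-validated predictions. The sketch below uses hypothetical counts chosen only to illustrate the arithmetic; it does not reproduce the paper's actual cross-validated counts:

```python
def npv_specificity(tn, fn, fp, tp):
    """NPV = TN / (TN + FN); specificity = TN / (TN + FP).
    Here 'negative' means predicted non-response, so TN counts
    correctly flagged non-responders."""
    npv = tn / (tn + fn)
    specificity = tn / (tn + fp)
    return npv, specificity

# Hypothetical counts, picked to land near the corrected figures.
npv, spec = npv_specificity(tn=32, fn=3, fp=126, tp=239)
print(f"NPV={npv:.1%}, specificity={spec:.1%}")  # NPV=91.4%, specificity=20.3%

# Pre-defined performance criteria from the paper: NPV > 90 %, specificity > 10 %.
assert npv > 0.90 and spec > 0.10
```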
In summary, the paper presented an exploratory analysis for modelling item-by-item symptom trajectories for patients undergoing repeated ketamine treatments for TRD, and an approach for predicting non-response using patients' baseline symptoms alone. Again, none of the main conclusions have changed as a result of the correction. The authors would like to apologise for any inconvenience caused.

Highlights
"…prediction of non-response with over 96 % predictive value…" was changed to "…prediction of non-response with over 91 % predictive value…"

Abstract

"…negative predictive value of over 96 %, while retaining a specificity of 22 %. Thus, we could identify 22 % of patients…" was changed to "…negative predictive value of over 91 %, while retaining a specificity of 20 %. Thus, we could identify 20 % of patients…"

Results
"We identified hundreds of models…" was changed to "We identified dozens of models…"

"The best of these models (see Methods) had a NPV of 96.4 %, while retaining a specificity of 22.1 %." was changed to "The best of these models (see Methods) had a NPV of 91.4 %, while retaining a specificity of 20.3 %."

"…higher values on item #9 (self-harm or suicidal ideation) favored non-response in 98.7 % (224 of 227) models in which it was a feature (Fig. 3d)." was changed to "…higher values on item #9 (self-harm or suicidal ideation) favored non-response in 100 % (20 of 20) models in which it was a feature (Fig. 3d)."

Fig. 3. Threshold tuning to confidently predict non-response. (a) Diagram depicting the threshold tuning approach. Red and green ellipses represent a hypothetical projection of data for non-responders and responders, respectively. The black dashed line indicates the standard decision threshold for logistic regression, which maximizes accuracy. Green and red dashed lines indicate alternate choices for the decision threshold, which maximize positive predictive value (PPV) for predicting response or negative predictive value (NPV) for predicting non-response, respectively. (b) Diagram depicting the three major steps of the model selection procedure: feature search, threshold tuning, and cross-validation. Grey boxes represent the different sets of baseline PHQ-9 items that can be tried as a feature set. Threshold tuning, as in (a), was conducted for each feature set, resulting in over 10,000 models that then underwent cross-validation. (c) Scatter plot of cross-validated model performance for some of the best models for predicting response (green) or non-response (red). The plot shows the inherent tradeoff between predictive value (PPV or NPV) and coverage of the relevant cases (sensitivity or specificity). Green dots represent the best models, in terms of highest sensitivity (y-axis), among all of the models with at least a minimum PPV (x-axis). Red dots represent the best models, in terms of highest specificity (y-axis), among all of the models with at least a minimum NPV (x-axis). Dashed grey lines show cutoffs for our pre-defined model performance specifications: PPV or NPV > 90 %, sensitivity or specificity > 10 % (upper right quadrant). (d) Violin plots showing the distribution of regression coefficients for each PHQ-9 item, across all models meeting the performance criteria of NPV > 90 % and specificity > 10 % (upper right quadrant in panel c). Positive coefficients favored response, and negative coefficients favored non-response. The dotted horizontal line indicates a coefficient of 0.
"Item #3 (difficulty sleeping) also consistently favored non-response for 100 % (130 of 130) of the models in which it was a feature (Fig. 3d)." was changed to "Item #3 (difficulty sleeping) also consistently favored non-response for the two models in which it was a feature (Fig. 3d)."

"…with very high confidence (over 96 % NPV) for a subset of 22 % of the patients…" was changed to "…with very high confidence (over 91 % NPV) for a subset of 20 % of the patients…"
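The threshold-sweeping step described in the Fig. 3 caption — lowering the logistic-regression decision threshold to trade specificity for NPV under cross-validation — can be sketched as follows. This assumes a standard scikit-learn workflow; the feature distribution, label model, and threshold grid are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)

# Synthetic stand-in for baseline PHQ-9 features (9 items, scored 0-3)
# and response labels; the real study used patients' actual baselines.
X = rng.integers(0, 4, size=(300, 9)).astype(float)
w = rng.normal(size=9)
p = 1 / (1 + np.exp(-(X - 1.5) @ w))
y = (rng.random(300) < p).astype(int)   # 1 = responder, 0 = non-responder

# Cross-validated probabilities, so performance estimates are out-of-fold.
clf = LogisticRegression(max_iter=1000)
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# Sweep the decision threshold: predict non-response when proba < threshold.
# Lower thresholds flag only the most confident cases, trading lower
# specificity (fewer non-responders caught) for higher NPV.
for thr in np.linspace(0.05, 0.5, 10):
    pred_nonresp = proba < thr
    tn = np.sum(pred_nonresp & (y == 0))
    fn = np.sum(pred_nonresp & (y == 1))
    fp = np.sum(~pred_nonresp & (y == 0))
    npv = tn / max(tn + fn, 1)
    spec = tn / max(tn + fp, 1)
    print(f"thr={thr:.2f}  NPV={npv:.2f}  specificity={spec:.2f}")
```

In the paper's procedure this sweep was additionally repeated over feature subsets, and only threshold/feature combinations exceeding the pre-defined NPV and specificity cutoffs under cross-validation were retained.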

Discussion
"…highly unlikely to respond to ketamine with over 96 % predictive value…" was changed to "…highly unlikely to respond to ketamine with over 91 % predictive value…"

"…could predict non-response to ketamine, i.e. treatment failure, with 96 % NPV." was changed to "…could predict non-response to ketamine, i.e. treatment failure, with 91 % NPV."

"…so we detect only a subset of the non-responders (22 %)." was changed to "…so we detect only a subset of the non-responders (20 %)."

Limitations

"Our best model has over 96 % NPV at the expense of identifying a subset of 22 %…" was changed to "Our best model has over 91 % NPV at the expense of identifying a subset of 20 %…"

"…even in our current model, 22 % of individuals who were referred…" was changed to "…even in our current model, 20 % of individuals who were referred…"