Machine learning prediction of malaria vaccine efficacy based on antibody profiles

Immunization through repeated direct venous inoculation of Plasmodium falciparum (Pf) sporozoites (PfSPZ) under chloroquine chemoprophylaxis, using the PfSPZ Chemoprophylaxis Vaccine (PfSPZ-CVac), induces high-level protection against controlled human malaria infection (CHMI). Humoral and cellular immunity contribute to vaccine efficacy, but only limited information about the implicated Pf-specific antigens is available. Here, we examined Pf-specific antibody profiles, measured by protein arrays representing the full Pf proteome, of 40 placebo- and PfSPZ-immunized malaria-naïve volunteers from an earlier published PfSPZ-CVac dose-escalation trial. For this purpose, we both utilized and adapted supervised machine learning methods to identify predictive antibody profiles at two time points: after immunization and before CHMI. We developed an adapted multitask support vector machine (SVM) approach and compared it with standard methods, namely single-task SVM, regularized logistic regression and random forests. Our results show that the multitask SVM approach improved the classification performance for discriminating the protection status based on the underlying antibody profiles by combining time- and dose-dependent data in the prediction model. Additionally, we developed the new fEature diStance exPlainabilitY (ESPY) method to quantify the impact of single antigens on the non-linear multitask SVM model and make it more interpretable. In conclusion, our multitask SVM model outperforms the studied standard approaches in terms of classification performance. Moreover, with our new explanation method ESPY, we were able to interpret the impact of Pf-specific antigen antibody responses that predict sterile protective immunity against CHMI after immunization. The identified Pf-specific antigens may contribute to a better understanding of immunity against human malaria and may foster vaccine development.
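The multitask SVM combines the time point, the PfSPZ dose and the antibody profile in a single model. One common way to do this is with a combined kernel. The sketch below assumes that the combined kernel is the elementwise product of three rbf kernels, in the spirit of the 'RRR' kernel combination named in the supplementary captions. It uses scikit-learn's `SVC` with a precomputed Gram matrix. The function name `multitask_kernel`, the gamma values, the encodings of time and dose, and all data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def multitask_kernel(X_a, t_a, d_a, X_b, t_b, d_b, gamma_time, gamma_dose, gamma_ab):
    """Elementwise product of three rbf kernels: time point, dose, antibody profile.

    Assumption for illustration: the components are combined by a product, which
    keeps the Gram matrix positive semidefinite (Schur product theorem).
    """
    k_time = rbf_kernel(t_a.reshape(-1, 1), t_b.reshape(-1, 1), gamma=gamma_time)
    k_dose = rbf_kernel(d_a.reshape(-1, 1), d_b.reshape(-1, 1), gamma=gamma_dose)
    k_ab = rbf_kernel(X_a, X_b, gamma=gamma_ab)
    return k_time * k_dose * k_ab


# Synthetic stand-ins for the study data (not real measurements).
rng = np.random.default_rng(0)
n, p = 40, 50
X = rng.normal(size=(n, p))                   # antibody signal intensities
t = rng.choice([0.0, 1.0], size=n)            # hypothetical encoding of two time points
d = rng.choice([0.0, 1.0, 2.0, 3.0], size=n)  # hypothetical dose levels
y = np.tile([0, 1], n // 2)                   # 1 = protected, 0 = not protected

K = multitask_kernel(X, t, d, X, t, d, gamma_time=1.0, gamma_dose=0.5, gamma_ab=1.0 / p)
clf = SVC(kernel="precomputed").fit(K, y)

# Scoring requires the kernel between the samples to score and the training
# samples; the first five training samples are reused here for illustration.
K_score = multitask_kernel(X[:5], t[:5], d[:5], X, t, d, 1.0, 0.5, 1.0 / p)
scores = clf.decision_function(K_score)
```

Because each component kernel is positive semidefinite, their product is a valid kernel. This lets the dose and time components be swapped (for example, a polynomial dose kernel) without changing the SVM itself.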


Table B: Performance results of the 10-times repeated 5-fold grid search CV for different kernel combinations and parameter settings of the multitask SVM, based on the whole and the selective set of cell-surface antibody reactivity profiles per time point. To identify the best prediction performance (PR-AUC) of the different kernel combinations over the kernel parameter ranges from S1 Table, a 10-times repeated 5-fold grid search CV was performed on the whole antibody reactivity profile and on the pre-selected cell-surface antigens from immunized and non-immunized individuals. The table shows the kernel combinations and parameter settings that resulted in the highest mean PR-AUC per time point. R0 represents the gamma value of the rbf kernel function for the time series, R1 or P1 the parameter value of either the rbf kernel function or the polynomial kernel function for the PfSPZ dose, and R2 the corresponding rbf kernel parameter for the antibody signal intensity.

Based on the highest mean PR-AUC value of the kernel combinations per time point (III+14 or C-1), the kernel parameter combination 'RRR' was used to evaluate the ESPY value of each feature in the multitask-SVM classification of protected immunized versus non-protected immunized/non-immunized individuals based on the whole and the selective cell-surface antigen antibody reactivity profile. Here, the 12 Pf-specific antigens associated with the protected state and the top 38 Pf-specific antigens associated with the non-protected state, evaluated based on their ESPY (|d_norm|) value, are listed. The effect of each Pf-specific antigen on the classification of protected versus non-protected individuals is indicated by the respective effect symbol, where '+' denotes a positive and '-' a negative effect. The top 12 positively associated and the top 38 negatively associated Pf-specific antigens were selected based on the maximum |d_norm| value.

… from the proteome antibody profile of pre-selected cell-surface antigens at post-immunization. Here, the top 25 Pf-specific antigens associated with the protected state and the top 25 Pf-specific antigens associated with the non-protected state, evaluated based on their ESPY (|d_norm|) value, are listed. The effect of each Pf-specific antigen on the classification of protected versus non-protected individuals is indicated by the respective effect symbol, where '+' denotes a positive and '-' a negative effect.

… from the proteome antibody profile of pre-selected cell-surface antigens at pre-CHMI. Here, seven Pf-specific antigens associated with the protected state and the top 43 Pf-specific antigens associated with the non-protected state, evaluated based on their ESPY (|d_norm|) value, are listed. The effect of each Pf-specific antigen on the classification of protected versus non-protected individuals is indicated by the respective effect symbol, where '+' denotes a positive and '-' a negative effect.
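The captions rank antigens by an ESPY value |d_norm| and label each with a '+' or '-' effect. The exact definition of d_norm is not given in this section. The sketch below is therefore only a perturbation-based feature-distance score in the spirit of ESPY, under these assumptions:

- Start from the feature-wise median sample.
- Move one feature at a time from its lower to its upper quartile.
- Record the change in the SVM's signed decision value.
- Normalize by the largest absolute change.

The helper names `espy_like_scores` and `top_features`, the quartile choice, the normalization and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def espy_like_scores(model, X):
    """Hypothetical ESPY-style score per feature (not the authors' exact definition)."""
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    n_features = X.shape[1]
    idx = np.arange(n_features)
    # Row j equals the median sample with feature j set to its upper / lower quartile.
    upper = np.tile(median, (n_features, 1))
    lower = np.tile(median, (n_features, 1))
    upper[idx, idx] = q3
    lower[idx, idx] = q1
    # Change in the signed decision value when feature j moves from lower to upper quartile.
    d = model.decision_function(upper) - model.decision_function(lower)
    max_abs = np.max(np.abs(d))
    return d / max_abs if max_abs > 0 else d


def top_features(d_norm, k):
    """Split features by effect sign and rank each group by |d_norm|, largest first."""
    order = np.argsort(-np.abs(d_norm))
    positive = [j for j in order if d_norm[j] > 0][:k]
    negative = [j for j in order if d_norm[j] < 0][:k]
    return positive, negative


# Synthetic data: feature 0 pushes towards class 1 ("protected"), feature 1 away from it.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 6))
y = (X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=80) > 0).astype(int)
model = SVC(kernel="rbf", gamma="scale").fit(X, y)

d_norm = espy_like_scores(model, X)
pos, neg = top_features(d_norm, k=2)
effect = ["+" if v > 0 else "-" for v in d_norm]
```

A positive decision value in scikit-learn's `SVC` corresponds to the second entry of `classes_` (here the label 1). A '+' effect therefore means that higher values of the feature push a sample towards the protected class.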

Fig B: Performance of multitask SVM models in predicting the protection status based on the cell-surface antibody reactivity profile over different Pearson correlation coefficients, compared with state-of-the-art approaches. The PR-AUC scores of the RLR, the RF, the single-task SVM (trained either on each single time point or on the combined time points), and the multitask-SVM models (using different combinations of kernel functions) for predicting the protection status based on the cell-surface antibody reactivity profile over a range of Pearson correlation coefficients from 0.1 to 1 were assessed via 10-times repeated nested stratified 5-fold cross-validation. RF, RLR and single-task SVM models trained on each time point separately are labeled with the extension 'singleTime'. The PR-AUC performance of the different models is shown A: at post-immunization (III+14) and B: at pre-CHMI (C-1). The multitask-SVM approach with a kernel combination of either 'RPR' or 'RRR' shows a robust PR-AUC score over the range of Pearson correlation coefficients compared with the state-of-the-art approaches at post-immunization (A) and at pre-CHMI (B). Only multi-time RLR achieved similar prediction performance over the range of Pearson correlation coefficients, but with a higher standard deviation at post-immunization (A) and at pre-CHMI (B). The dashed line represents the mean PR-AUC score over all Pearson correlation coefficients for the multitask-SVM model with the kernel combination 'RRR'. A: The mean PR-AUC score over all Pearson correlation coefficients of the multitask-SVM approach with the kernel combination 'RRR' is 0.898, higher than the mean PR-AUC score of the multi-time RLR approach (0.885). B: The mean PR-AUC score over all Pearson correlation coefficients of the multitask-SVM approach with the kernel combination 'RRR' is 0.905, marginally lower than the mean PR-AUC score of the multi-time RLR approach (0.906).
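The comparison models are evaluated by repeated nested stratified 5-fold cross-validation scored by PR-AUC. A minimal sketch of that evaluation scheme with scikit-learn follows. The inner loop tunes hyperparameters with `GridSearchCV` on each outer training split, and the outer `RepeatedStratifiedKFold` loop scores the tuned model on held-out folds.

Several choices here are assumptions, not the authors' setup:

- scikit-learn's `average_precision` score stands in for PR-AUC. It is a step-wise summary of the precision-recall curve and can differ slightly from a trapezoidal PR-AUC.
- The logistic regression uses scikit-learn's default L2 penalty.
- The hyperparameter grids are placeholders, not the ranges in Table A.
- The data are synthetic.
- `n_repeats=2` is used only to keep the demo fast; the caption uses 10.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     StratifiedKFold, cross_val_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_pr_auc(estimator, param_grid, X, y, n_repeats=10, seed=0):
    """Mean and standard deviation of average precision from repeated nested stratified 5-fold CV."""
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    # Inner loop: hyperparameter search restricted to each outer training split.
    search = GridSearchCV(estimator, param_grid, scoring="average_precision", cv=inner)
    # Outer loop: score the tuned model on the held-out folds.
    scores = cross_val_score(search, X, y, scoring="average_precision", cv=outer)
    return float(scores.mean()), float(scores.std())


# Synthetic data with 40 samples and 200 features (not study data).
X, y = make_classification(n_samples=40, n_features=200, n_informative=5, random_state=0)

rlr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
rlr_result = nested_pr_auc(rlr, {"logisticregression__C": [0.01, 0.1, 1.0]}, X, y, n_repeats=2)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf_result = nested_pr_auc(rf, {"max_features": ["sqrt", 0.2]}, X, y, n_repeats=2)
```

Keeping the hyperparameter search inside each outer training split prevents the held-out folds from influencing model selection. This is what makes the reported mean and standard deviation comparable across methods.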

Table A: Parameter ranges for evaluating the best kernel combinations for the multitask SVM approach.
Parameter ranges for the ten-times repeated nested stratified 5-fold cross-validation of the RLR, RF and single-task SVM (with rbf kernel) methods.
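Selecting the best kernel combination ('RRR' versus 'RPR') and its parameters amounts to a grid search in which each setting is scored by the mean PR-AUC over a 10-times repeated stratified 5-fold CV. A sketch under stated assumptions follows.

- Column 0 of the hypothetical input matrix `Z` holds the time point, column 1 the scaled dose, and the remaining columns the antibody signals.
- The three kernels are multiplied.
- `p1` acts as the rbf gamma for 'R' and as the polynomial degree for 'P'.
- `average_precision` stands in for PR-AUC.

The helper names, grid values and data are placeholders, not the ranges in Table A or S1 Table.

```python
import itertools

import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.svm import SVC


def combined_kernel(Z_a, Z_b, combo, r0, p1, r2):
    """Hypothetical product kernel; the middle letter of combo selects the dose kernel."""
    k_time = rbf_kernel(Z_a[:, :1], Z_b[:, :1], gamma=r0)
    if combo[1] == "R":
        k_dose = rbf_kernel(Z_a[:, 1:2], Z_b[:, 1:2], gamma=p1)
    else:  # 'P': polynomial kernel (1 + d * d')**degree on the scaled dose
        k_dose = polynomial_kernel(Z_a[:, 1:2], Z_b[:, 1:2], degree=int(p1), gamma=1.0, coef0=1.0)
    k_ab = rbf_kernel(Z_a[:, 2:], Z_b[:, 2:], gamma=r2)
    return k_time * k_dose * k_ab


def grid_search(Z, y, grid, n_repeats=10, seed=0):
    """Return (mean score, combo, r0, p1, r2) of the best setting over repeated stratified 5-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    splits = list(cv.split(Z, y))  # identical folds for every setting
    best = None
    for combo, r0, p1, r2 in grid:
        scores = []
        for train, test in splits:
            k_train = combined_kernel(Z[train], Z[train], combo, r0, p1, r2)
            k_test = combined_kernel(Z[test], Z[train], combo, r0, p1, r2)
            clf = SVC(kernel="precomputed").fit(k_train, y[train])
            scores.append(average_precision_score(y[test], clf.decision_function(k_test)))
        mean = float(np.mean(scores))
        if best is None or mean > best[0]:
            best = (mean, combo, r0, p1, r2)
    return best


# Synthetic data (not study data).
rng = np.random.default_rng(0)
n = 40
t = rng.choice([0.0, 1.0], size=n)             # hypothetical time-point encoding
d = rng.choice([0.0, 0.25, 0.5, 1.0], size=n)  # hypothetical scaled dose levels
X = rng.normal(size=(n, 30))                   # synthetic antibody intensities
Z = np.column_stack([t, d, X])
y = np.tile([0, 1], n // 2)

grid = [("RRR", r0, g1, r2) for r0, g1, r2 in itertools.product([0.1, 1.0], [0.1, 1.0], [0.01, 0.1])]
grid += [("RPR", r0, deg, r2) for r0, deg, r2 in itertools.product([0.1, 1.0], [2, 3], [0.01, 0.1])]
best_score, best_combo, *best_params = grid_search(Z, y, grid, n_repeats=10)
```

Reusing the same list of splits for every setting means all kernel combinations are compared on identical folds, so differences in mean PR-AUC reflect the kernels rather than the partitioning.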

Table C: Top 50 Pf-specific antigens identified by ESPY from the whole proteome antibody profile at post-immunization.
Here, the top 25 Pf-specific antigens associated with the protected state and the top 25 Pf-specific antigens associated with the non-protected state, evaluated based on their ESPY (|d_norm|) value, are listed. The effect of each Pf-specific antigen on the classification of protected versus non-protected individuals is indicated by the respective effect symbol, where '+' denotes a positive and '-' a negative effect.