Searching for the Predictors of Response to BoNT-A in Migraine Using Machine Learning Approaches

OnabotulinumtoxinA (BoNT-A) reduces migraine frequency in a considerable proportion of patients with migraine. So far, reliable predictors of response are lacking. Here, we applied machine learning (ML) algorithms to identify clinical characteristics able to predict treatment response. We collected demographic and clinical data of patients with chronic migraine (CM) or high-frequency episodic migraine (HFEM) treated with BoNT-A at our clinic in the last 5 years. Patients received BoNT-A according to the PREEMPT (Phase III Research Evaluating Migraine Prophylaxis Therapy) paradigm and were classified according to the reduction in monthly migraine days in the 12 weeks after the fourth BoNT-A cycle, as compared to baseline. Data were used as input features to run ML algorithms. Of the 212 patients enrolled, 35 qualified as excellent responders to BoNT-A administration and 38 as nonresponders. None of the anamnestic characteristics were able to discriminate responders from nonresponders in the CM group. Nevertheless, a pattern of four features (age at onset of migraine, opioid use, anxiety subscore of the Hospital Anxiety and Depression Scale (HADS-a), and Migraine Disability Assessment (MIDAS) score) correctly predicted response in HFEM. Our findings suggest that routine anamnestic features acquired in real-life settings cannot accurately predict BoNT-A response in migraine and call for a more complex modality of patient profiling.


Introduction
According to the Global Burden of Disease (GBD) study, migraine is one of the most disabling neurological conditions in young adults, affecting 14% of the population worldwide [1]. Migraine is classified as chronic (CM) when a headache is present for at least 15 days/month [2]. When headache days number fewer than 15 per month, two additional subtypes have been identified for clinical and research purposes: low-to-moderate frequency episodic migraine (up to 6-7 headache days/month) and high-frequency episodic migraine (HFEM) (between 8 and 14 monthly headache days) [3]. HFEM is considered a subtype of migraine with a high risk of transformation into CM. Therefore, it appears crucial to focus on preventive treatments that target this condition to reduce migraine frequency and prevent a negative evolution [4].
CM is the most debilitating subtype of migraine [5]. Its treatment currently presents a medical challenge. To date, only onabotulinumtoxinA (BoNT-A), antibodies and small molecules against the Calcitonin Gene-Related Peptide (CGRP) and topiramate are specifically approved for the prophylaxis of CM. BoNT-A inhibits the release of CGRP, substance P and glutamate, as well as the translocation of transient receptor potential ankyrin 1 (TRPA1) and transient receptor potential vanilloid 1 (TRPV1), thus switching off peripheral sensitization and, consequently, central sensitization [6]. It was approved by the FDA in 2010 and it is currently administered according to the PREEMPT (Phase III Research Evaluating Migraine Prophylaxis Therapy) protocol, with additional injections in the "follow-the-pain" extension. In a recent pilot study, BoNT-A also proved effective in reducing the burden of disease in subjects with HFEM when administered according to the PREEMPT paradigm [7]. A sponsored, larger, multicentric, controlled study is now ongoing to confirm the signal detected in this study (ClinicalTrials.gov Identifier: NCT05028569).
Nevertheless, not all patients adequately respond to BoNT-A injections. Several studies have investigated the possible predictors of treatment responsiveness using clinical, anamnestic, molecular, or imaging features, but they brought only limited evidence [8]. A recent large multicentric study evaluated the rate of excellent responders to BoNT-A and explored the predictors of such responses according to different definitions of efficacy [9]. For example, the factors which were independently associated with an excellent response to BoNT-A, based on the percentage of migraine days reduction, included the presence at baseline of medication overuse and a higher excellent response rate already after the first and the second injection. Females were less likely to present fewer than four monthly headache days.
Unfortunately, most features highlighted as predictors were not replicated in different datasets. Ideally, to be successfully applied in clinical practice, a biomarker should be easily available for analysis (e.g., economically and biologically) and should have a clear cause-effect relationship with the investigated outcome. Overall, there are still no strong single factors correlated with the response to preventive treatment in migraine. Therefore, identifying these features with different approaches is crucial in migraine management, as it could allow targeted therapy, improve patients' life quality, and diminish healthcare costs, which are all relevant findings when planning treatment strategies in clinical practice.
A machine learning (ML) algorithm is a type of computer program designed to automatically improve its performance by learning from data. Machine learning algorithms use statistical models to identify patterns and relationships in data and use these patterns to make predictions or decisions [10]. There are three main types of machine learning algorithms: supervised learning (trained on labelled data), unsupervised learning (trained on unlabelled data), and reinforcement learning (which learns by interacting with the environment). In medicine, machine learning algorithms are used for tasks including image and speech recognition, natural language processing, and predictive analytics [11]. Predictive algorithms of this kind are increasingly used to assist with diagnosis, treatment, and prognosis, as already described in the literature in different areas of application, including Alzheimer's disease, psychiatric disorders, multiple sclerosis, and stroke [12]. When it comes to migraine, some studies applied ML to the classification of primary headache disorders [12][13][14][15][16], whereas only a few have focused on predicting treatment response to antibodies against the CGRP [17]. ML could also be applied to extract prognostic information from demographic, clinical and biochemical data to predict the risk of developing medication overuse headache [18].
The primary aim of this single-center, retrospective, real-life study was to detect clinical/anamnestic features able to predict treatment responsiveness in migraine patients undergoing BoNT-A administration for migraine prevention using machine learning algorithms.

Population Analysed
Two hundred thirty-nine subjects started BoNT-A treatment at IRCCS Mondino Foundation between January 2016 and March 2021. The patients' disposition is illustrated in Figure 1. The final analysis was carried out on 91 eligible subjects who completed the four BoNT-A cycles (59 with CM and 32 with HFEM) and on 54 subjects who terminated the treatment after one single cycle (all CM).
A good/excellent response rate (>50% reduction in monthly migraine days (MMD)) after 1 year of treatment was observed in 38% of the subjects who completed the four cycles. Demographic and anamnestic features of the population analyzed, divided according to the response rate to BoNT-A, are listed in Tables 1 and 2.
Table: Groups were labelled as follows: Group 1: early termination (after 1 BoNT-A administration); Group 2: nonresponders (<25% response after the 4th cycle); Group 3: poor responders (25-50% response after the 4th cycle); Group 4: good responders (50-75% response after the 4th cycle); Group 5: excellent responders (>75% response after the 4th cycle). All variables are expressed as mean ± standard deviation. Legend: n = number; MIDAS = Migraine Disability Assessment; HIT6 = Headache Impact Test-6; ASC-12 = Allodynia Symptom Checklist-12.

Machine Learning Analysis
All ML methods applied to the datasets (classified according to the primary or secondary endpoints) failed to discriminate good/excellent responders from nonresponders to BoNT-A administration after the fourth cycle of the treatment using only demographic, anamnestic or clinical features as the input. Table 3 shows the performances using all the different ML approaches applied in the classification comparing groups 2 vs. 4 + 5. The panel of features used for each different classification between groups is listed in Table S1.
When the analysis was extended by including the early terminators (group 1) among the nonresponders (groups 1 + 2), the best performance for the primary endpoint, with a high classification accuracy of 84.27% (area under the curve (AUC) = 89%), was obtained using 24 baseline features and applying the random forest (RF) algorithm. Table S2 shows the performances and the panel of features resulting from the use of all the different ML approaches applied in the classification comparing groups 1 + 2 vs. groups 4 + 5.
In the subgroup of 32 people with HFEM, BoNT-A reduced MMD at the end of the fourth cycle by 3.68 days (−33.1%, p < 0.01), as highlighted in our previous publication [7]. In this subset, RF discriminated responders from nonresponders with a high classification accuracy of 85.71% (AUC = 90.91%) using four baseline features (Table 4). High responsiveness positively correlated with migraine onset age and with the anxiety subscore of the Hospital Anxiety and Depression Scale (HADS-a). On the other hand, high responsiveness negatively correlated with ongoing opioid use and Migraine Disability Assessment (MIDAS) score at baseline (Table 5).
When the ML algorithms were forced to use this four-feature pattern in the cohort of people living with CM, they failed to confirm the validity of the model. The performances of the validation protocols are listed in Table S3.

Discussion
This is the first single-center, real-life retrospective study applying machine learning algorithms to identify factors associated with the efficacy of BoNT-A already at baseline or after the first trimester in a relatively large database of patients.
Clinicians have been seeking biomarkers predictive of BoNT-A effectiveness in migraine for several years, searching in many different biological samples, including blood, urine, tissue, and saliva [19]. Acquiring this information is crucial to make the best use of healthcare resources and to plan treatments. Table S4 provides an updated summary of the available articles reporting the factors associated with the efficacy of BoNT-A in chronic migraine. More specifically, regarding clinical predictors, one of the first features was suggested by Jakubowski et al., who observed that an ocular or imploding phenotype of pain was associated with response to BoNT-A, while exploding pain was not [20]. This finding was later confirmed by some authors [21][22][23], but not by others [24,25]. It appears noteworthy that these studies were conducted with different injection paradigms. Other characteristics were reported as predictive factors of response to BoNT-A in some studies, including unilateral pain, cutaneous allodynia and pericranial muscle tenderness, but the evidence is conflicting, as not all studies succeeded in finding these associations [26][27][28][29][30]. Conflicting evidence is also available for the presence/absence of comorbidities (e.g., the presence of depressive symptoms) [31,32] or disease duration [33,34]. CGRP and PTX3 plasma levels have also been reported as predictors of responsiveness and/or efficacy [35,36]. Brain structural and functional MRI could also be a tool for assessing response. In a retrospective study, Hubbard et al. demonstrated that BoNT-A responders had increased cortical thickness in the right primary somatosensory cortex, anterior insula, left superior temporal gyrus and pars opercularis, compared to nonresponders [37], but it remains unclear whether these findings could be predictive characteristics of the response.
Another proposed hallmark of responsiveness was the level of iron accumulated in periaqueductal grey matter, which seemed to correlate with poor response in one study [38].
Overall, despite the mentioned studies, there is currently not enough data to help clinicians make treatment decisions or predict the drug's effectiveness for a particular patient, because the predictive value of the proposed factors is limited (see Table S4). Moreover, while interesting, testing CGRP or performing an fMRI scan is not yet practical or economically sustainable in everyday practice.
For this reason, considering the limited prediction potential of a single biomarker, we decided to investigate, with several machine learning algorithms, whether the commonly available clinical information could be combined into a novel panel of features able to predict response to BoNT-A. This approach would provide a predictive tool easily transferable to the real-world setting because the ML-based model is based on anamnestic features that are routinely collected in everyday practice.
The findings obtained in the group of CM patients are partially disappointing: all ML models reached a mediocre overall accuracy and lacked specificity. Moreover, the low AUC indicates that the algorithms were not able to correctly distinguish one group from the other. A reliable level of discrimination was reached only when a high number of clinical features were considered simultaneously. Precisely, the best performance for the primary endpoint, with a high classification accuracy, was obtained using 24 features. Because all 24 selected features must be available at the same time, this model appears hardly applicable in the real-life setting.
This output does not seem related to a limitation of the mathematical approach because the constraint of this analysis lies primarily in the relevance of the information used as an input for the investigation. An example was recently published by Gonzalez-Martinez et al. [17], who used data from a multicentric Spanish database to build an ML model that can predict anti-CGRP response at 6, 9 and 12 months.
More exciting was the output of ML in our subanalysis of the group of HFEM patients, where the random forest algorithm discriminated responders from nonresponders with a high classification accuracy of 85.71% using, altogether, four baseline features: migraine onset age, opioid use, hospital anxiety score, and the disability calculated with the MIDAS score. High responsiveness to BoNT-A treatment after 1 year positively correlated with higher migraine onset age and higher HADS-a score before starting the treatment and negatively correlated with ongoing opioid use as an abortive medication and higher MIDAS score at baseline. These four features represent a panel of easy-to-obtain parameters that, when evaluated together, predict BoNT-A therapy responsiveness in patients with HFEM, even if each feature is not sufficient to fully account for the result.
Unfortunately, this four-feature panel does not predict the response in our cohort of patients with CM. This may be explained by the increased complexity of the pathophysiology when the condition progresses to a more severe pattern, which includes persistent peripheral and central sensitization caused by repetitive and prolonged trigeminal nociceptive activation, and the decrease in endogenous brainstem inhibitory control [39]. This mutated condition is more difficult to treat and is associated with a substantially higher burden of disease, number of comorbidities, and social impact [40].
Machine learning is increasingly being used in the healthcare industry to solve various problems, including clinical decision-making support. Initially, machine learning was mainly used to analyze single-mode data. However, to improve the accuracy of predictions and to simulate the multifaceted nature of clinical decision making, researchers in the field of biomedical machine learning are combining data from different sources to create a more comprehensive dataset [41] on which to apply a multimodal machine learning approach. In the application of BoNT-A to migraine prevention, in order to define a set of features associated with its efficacy, a multimodal approach is, therefore, mandatory to overcome the lack of information when considering only the anamnestic features available. By acquiring and combining data from clinical, radiological, and wet biomarkers, it will probably be possible to build an effective algorithm, even if its applicability will be limited by the cost, reproducibility of the analysis, and technology required to collect/interpret this information. This approach of fusing disparate features, bringing together multiple data sources to capitalize on the unique and complementary information in an algorithmic framework, is aimed at replicating the holistic approach used by clinical experts in their decision-making process. In the near future, medicine will not witness ML overcoming clinical practice, but ML will definitely assist clinicians in their decision-making process [42].

Limitations
This was a single-center study in a limited, though relatively large, population. Therefore, we were forced to opt for the cross-validation approach to train and test the ML algorithms, which means that the testing records were taken from the same dataset used for training. A second limitation is represented by the retrospective nature of the study, which prevented the collection of some features that the evolving literature has since indicated as potentially relevant to BoNT-A efficacy. Information such as the characteristics of the pain (for example, whether it was throbbing or explosive, unilateral or bilateral) and the presence of cranial autonomic symptoms or pericranial muscle tenderness was not collected systematically over the years and, therefore, was not included in the analysis. A prospective study would require their inclusion for an integrated and updated analysis. As a third limitation, the decision to also include in the nonresponder group the patients who stopped the treatment after the first cycle may have resulted in a data selection bias, by analyzing as nonresponders patients who might have responded at later times had treatment not been discontinued. To control for this, we also applied unsupervised ML algorithms. The last limitation concerns the sample size, which was unbalanced between the subgroups. The size difference between the early terminator group and the responders might have impacted the statistical power of the ML algorithms. To minimize the potential impact of this, we decided to include both excellent and good responders in the responder group, which therefore presented an overall 50% reduction in the MMD, in line with the available literature.

Conclusions
Overall, ML findings suggest that routine anamnestic features acquired in real-life settings cannot accurately predict the response to BoNT-A treatment in CM patients. A deeper phenotyping of patients' features, possibly combined with multimodal parameters, is probably required to identify features associated with the response to BoNT-A. In the case of HFEM, ML techniques identified an easy-to-obtain panel of four features that are associated with the response to BoNT-A treatment. This finding is very important and paves the way to tailored therapy in a population of migraine patients who are at high risk of chronification and/or medication overuse.

Materials and Methods
This was a single-center, observational, retrospective, real-life study conducted at the Headache Science and Rehabilitation Center of the IRCCS Mondino Foundation of Pavia, Italy.
Data from patients with CM or HFEM on preventive treatment with BoNT-A between January 2016 and March 2021 were evaluated. Data were previously acquired and anonymized within two different protocols, both performed at the IRCCS Mondino Foundation of Pavia (Protocol 1 FM-BOEM, authorized by the Pavia local ethical committee on 22 July 2017, reg. no. NCT04578782 at ClinicalTrials.gov, and Protocol 2, authorized by the Pavia local ethical committee, no. 0097925/21). In Italy, studies using retrospective anonymous data from administrative databases, which do not involve direct access for investigators to identification data, do not require further Ethics Committee approval, notification, or patient-informed consent signing.
We selected patients who were started on BoNT-A treatment according to the PREEMPT injection paradigm (155-195 U in 31-39 injection sites every 12 weeks) for at least one cycle and for whom a prospectively filled headache diary was available for the following 12 months. We excluded patients with insufficient health and/or headache documentation or with significant comorbidities that could have interfered with treatment response. Patients who had already received BoNT-A treatment before the observation period were also excluded. We collected demographic data, general health history and headache-specific history. Concomitant medications were collected and classified into with/without potential migraine preventive effect and prescribed/not prescribed for migraine. Information concerning demographics, headache history (especially disease duration at first BoNT-A administration), comorbidities (such as depression, anxiety, low-back pain, hypertension, epilepsy, and sleep apnoea), and previous preventive treatment was also considered. Treatment efficacy was investigated by using self-assessment scales (Migraine Disability Assessment (MIDAS) [43]; Headache Impact Test-6 (HIT6) [44]; Allodynia Symptom Checklist-12 (ASC-12) [45]) after every treatment, in addition to the mere count of migraine/headache days.

Classification
Patients' response to treatment was used to separate subjects into different groups. They were primarily classified based on their percent-based reduction in monthly migraine days in the 12-week period after the fourth BoNT-A treatment, as compared to a 28-day baseline period (<25%; 25-50%; 50-75%; >75% reduction rate). Patients who terminated treatment early after 1 administration without any effects were profiled as well (early termination (ET)). ET was considered regardless of the clinical indication. The patients terminating the treatment early due to side effects were excluded from the analysis. The patients' disposition is illustrated in Figure 1.
Groups were labelled as follows based on the outcome at 12 weeks after the first administration cycle or 12 weeks after the 4th administration cycle: Group 1: early termination, BoNT-A was administered for only 1 cycle due to inefficacy. Group 2: nonresponders (<25% response after the 4th cycle). Group 3: poor responders (25-50% response after the 4th cycle). Group 4: good responders (50-75% response after the 4th cycle). Group 5: excellent responders (>75% response after the 4th cycle).
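The grouping rule above can be written as a short function. This is a minimal sketch and not the authors' code: the name `assign_group` is hypothetical, and the handling of values falling exactly on a cut-off (25%, 50%, 75%) is an assumption, since the text gives only the ranges.

```python
from typing import Optional


def assign_group(pct_reduction: Optional[float], early_termination: bool = False) -> int:
    """Map the percent MMD reduction after the 4th cycle to groups 1-5.

    Boundary handling is an assumption: 25 goes to group 3, 50 and 75 to group 4.
    """
    if early_termination:
        return 1  # treatment stopped after one cycle due to inefficacy
    if pct_reduction is None:
        raise ValueError("pct_reduction is required for patients who completed four cycles")
    if pct_reduction < 25:
        return 2  # nonresponder
    if pct_reduction < 50:
        return 3  # poor responder
    if pct_reduction <= 75:
        return 4  # good responder
    return 5  # excellent responder


# Hypothetical values, for illustration only.
print([assign_group(p) for p in (10, 30, 60, 90)])  # [2, 3, 4, 5]
```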
The demographic and anamnestic characteristics of the population studied are reported in Tables 1 and 2, considering this primary working classification. All variables are expressed as mean values ± standard deviation. Secondary classifications were calculated considering the % reduction in abortive medication intake, the % reduction in days in which an abortive medication is required and the MIDAS value % reduction.

Analysis Plan
Considering the entire dataset of people living with migraine, the primary outcome was the prediction of monthly migraine days (MMD) reduction in the 12-week period after the last BoNT-A treatment as compared to baseline. In a real-life monocentric prospective study investigating CM patients treated with BoNT-A, the authors observed that the benefit of the administrations could manifest itself beyond the canonical 2 cycles, thus suggesting that treatment be extended to prevent early withdrawal in patients who could become responsive after the 3rd or the 4th cycle [46]. Therefore, we decided to include in the analysis patients with at least 4 subsequent cycles of BoNT-A treatment. The data collected at baseline and after the 1st cycle of BoNT-A were used as input features to run several supervised and unsupervised machine learning algorithms to predict the response to the BoNT-A treatment. The complete ML approaches applied are described in the next sections. Compared to the run-in baseline frequency, a 50% reduction in MMDs after the fourth cycle was considered a good/excellent response, while a <25% reduction in MMDs was considered a lack of efficacy.
MMDs for each evaluation period were calculated as the mean of the three 4-week segments. We initially compared group 2 (nonresponders) vs. groups 4 + 5 (good and excellent responders). The analysis was subsequently carried out also considering the group of the early terminator as part of the nonresponders' group (groups 1 + 2 vs. groups 4 + 5).
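The computation described here is simple arithmetic and can be sketched as follows. The function names and the example patient are hypothetical; the sketch only illustrates how the mean of three 4-week segments, the percent reduction against baseline, and the two binary comparisons (group 2 vs. 4 + 5, or 1 + 2 vs. 4 + 5) fit together.

```python
def mean_mmd(segment_days):
    """Average migraine days across the three 4-week segments of a 12-week period."""
    if len(segment_days) != 3:
        raise ValueError("expected exactly three 4-week segments")
    return sum(segment_days) / 3


def percent_reduction(baseline_mmd, followup_mmd):
    """Percent reduction in MMD relative to baseline (negative if MMD increased)."""
    if baseline_mmd <= 0:
        raise ValueError("baseline MMD must be positive")
    return 100.0 * (baseline_mmd - followup_mmd) / baseline_mmd


def binary_label(group, include_early_terminators=False):
    """1 = responder (groups 4 + 5), 0 = nonresponder, None = not part of the comparison."""
    if group in (4, 5):
        return 1
    if group == 2 or (group == 1 and include_early_terminators):
        return 0
    return None  # group 3 always; group 1 unless early terminators are pooled in


# Hypothetical patient: 12 MMD at baseline; 8, 6 and 4 migraine days in the three segments.
followup = mean_mmd([8, 6, 4])
print(percent_reduction(12, followup))  # 50.0
```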
Secondary endpoints were contemplated as well, and the same analysis plan with the different ML algorithms was carried out, first considering the reduction in abortive medication intake and, subsequently, the reduction in the number of days on which abortive medications were used. Compared to the run-in baseline condition, a reduction of more than 50% in abortive medication intake or days of drug use after the fourth cycle was considered a good/excellent response, while a reduction of less than 25% was labelled as a lack of efficacy. As an exploratory endpoint, the reduction in MIDAS value was also evaluated with the same approaches (classification: responders if >50% reduction after the fourth cycle compared to baseline, nonresponders if <25% reduction).
A final analysis was performed considering the subgroup of HFEM. The primary outcome was the reduction in MMDs in the 12-week period after the last BoNT-A treatment as compared to baseline. Therefore, the collected data were used as input features to run a machine learning algorithm to predict responders (>50% MMD reduction after the fourth administration vs. baseline). The Pearson correlation coefficient was calculated between the resulting features and the percentage of MMD reduction. Finally, the panel of features observed in the HFEM population was applied to the entire CM population to validate its predictive potential.
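For reference, the Pearson correlation coefficient between one selected feature and the percentage of MMD reduction can be computed as in the sketch below. The function name and the toy numbers are hypothetical and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy illustration only: a feature that rises together with the outcome.
feature = [1, 2, 3]
mmd_reduction_pct = [2, 4, 6]
print(pearson_r(feature, mmd_reduction_pct))  # 1.0
```

A positive value corresponds to the "positively correlated" features reported for HFEM, and a negative value to the "negatively correlated" ones.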

Feature Selection
Feature selection is an essential step for extracting from a dataset the features that are most informative for the specific task, discarding those that would add only redundant information. In this study, each patient assessment resulted in a long feature vector and, therefore, a very large dataset, which could have led to overfitting issues in the following steps of the analysis. For these reasons, before running any ML code, the entire dataset underwent the application of the ReliefF feature selection algorithm [47], which outputs a ranking of features according to their relevance in determining the class value of the dataset records. After the feature selection step, the data were passed to the ML algorithms to build the classification models.
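The idea behind Relief-type ranking can be illustrated with a simplified sketch. This is not the ReliefF implementation used in the study: it assumes two classes, purely numeric features, and complete data, and the function name `relief_scores` and the toy dataset are hypothetical. A feature gains weight when it differs between a record and its nearest records of the other class (misses) and loses weight when it differs from its nearest records of the same class (hits).

```python
def relief_scores(X, y, k=1):
    """Simplified Relief-style feature weights for a two-class numeric dataset."""
    n, p = len(X), len(X[0])
    # Range of each feature, used to scale differences to [0, 1].
    spans = []
    for j in range(p):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(p))

    weights = [0.0] * p
    for i in range(n):
        others = [m for m in range(n) if m != i]
        hits = sorted((m for m in others if y[m] == y[i]),
                      key=lambda m: dist(X[i], X[m]))[:k]
        misses = sorted((m for m in others if y[m] != y[i]),
                        key=lambda m: dist(X[i], X[m]))[:k]
        for j in range(p):
            if hits:
                weights[j] -= sum(diff(j, X[i], X[m]) for m in hits) / (len(hits) * n)
            if misses:
                weights[j] += sum(diff(j, X[i], X[m]) for m in misses) / (len(misses) * n)
    return weights


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
scores = relief_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)  # [0, 1]
```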

Machine Learning Analysis
In this study, the automatic classification between groups was achieved by implementing different ML algorithms: artificial neural network (ANN), support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS), random forest (RF) and fuzzy c-means clustering (FCM). All the methods were implemented as part of an in-house tool [12] developed in Matlab (v. R2018b, The Mathworks, Inc., Natick, MA, USA).
Each ML classifier was run separately to perform model construction and assess validation. For model construction purposes, for each ML classifier, a tuning of the relevant parameters was performed to optimize the setup of the algorithm, therefore maximizing its classification performance. In order to minimize any risk of overfitting, a balanced cross-validation approach (see Section 5.3.3) was adopted and used to train and test each constructed model. For each classifier, we identified the best pool of features to separate patients' groups with the best classification performance.
Artificial Neural Network (ANN)
ANNs are ML approaches inspired by biological neural networks [48]. ANN models are based on connected units (i.e., artificial neurons) that simulate the structure and functionality of brain neurons and their synapses. Among ANN models, the radial basis function network (RBFN) is a feed-forward neural network that uses the radial basis (Gaussian) function as an activation function [10]. Compared to other ANN models, the special architecture of the RBFN can grant important advantages such as a simpler structure and a faster learning approach.

Support Vector Machine (SVM)
Support vector machines (SVMs) are fast and robust classification models that tend to perform very well, even when run on a limited amount of data. The objective of SVM algorithms is to use training data to find the hyperplane in an N-dimensional space (where N is the number of features) which best separates data from different groups [49]. For the present study, two SVM architectures with different kernel functions were used: SVM with a linear kernel (SVM_LIN) and SVM with a radial basis function (RBF) kernel (SVM_RBF).
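The two kernels named above differ only in how they measure similarity between two feature vectors. The sketch below shows the standard definitions; the `gamma` width parameter and its value are assumptions for illustration, since the kernel settings used in the study are not reported here.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance).

    gamma is a hypothetical setting, not a value taken from the study.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


print(linear_kernel([1, 2], [3, 4]))  # 11
print(rbf_kernel([1, 2], [1, 2]))     # 1.0 (identical vectors)
```

With the RBF kernel, similarity decays smoothly with distance, which lets the SVM draw a nonlinear boundary in the original feature space, whereas the linear kernel yields a flat separating hyperplane.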

Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS is a classification approach that sits between the ANN methods and the fuzzy logic systems. ANFIS integrates both ANN and fuzzy logic principles and combines the benefits of both methods, offering the adaptability that is characteristic of ANN backpropagation and the smoothness that characterizes fuzzy control interpolation [50]. For this work, we used the ANFIS algorithm included in the fuzzy logic toolbox in Matlab with a Sugeno-type fuzzy inference system (FIS) and Gaussian functions as membership functions to specify the fuzzy set.

Random Forest (RF)
The random forest approach operates on training data by constructing a forest of decision trees. The classification output is the class predicted by most trees [51]. In this work, the random forest algorithm was implemented in Matlab using the TreeBagger function, which uses bootstrap aggregation (i.e., bagging) as an ensemble method to control for overfitting, therefore improving the model generalization.
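Bootstrap aggregation followed by a majority vote can be sketched in a few lines. This is not Matlab's TreeBagger: to stay short, each "tree" here is a single-split decision stump, and the function names, the number of trees, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Best single-feature threshold rule (a one-split tree) on the given sample."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                preds = [left if row[j] <= t else right for row in X]
                n_correct = sum(p == c for p, c in zip(preds, y))
                if best is None or n_correct > best[0]:
                    best = (n_correct, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right


def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict(trees, row):
    """Class predicted by most trees."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy, well-separated data (not study data).
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]
trees = fit_bagged_stumps(X, y)
print(predict(trees, [0]), predict(trees, [20]))
```

Because every tree sees a slightly different resampled dataset, individual errors tend to average out in the vote, which is the mechanism by which bagging limits overfitting.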

Fuzzy c-Means Clustering (FCM)
Fuzzy c-means clustering (FCM) is a clustering method which allows a data point to belong to more than one cluster [52]. Each data point is assigned to a cluster to some degree which depends on its membership grade. In this work, the FCM approach was implemented in Matlab as part of the homemade classification toolbox.
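The membership grades mentioned above follow the standard FCM update rules, sketched below. This is not the study's Matlab toolbox: the function names, the fuzzifier `m = 2`, the fixed number of iterations, and the toy points are assumptions for illustration.

```python
def memberships(points, centers, m=2.0):
    """Membership grade of every point in every cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    U = []
    for x in points:
        d = [sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5 for v in centers]
        if 0.0 in d:
            # A point lying exactly on a centre belongs fully to that centre.
            row = [1.0 if di == 0.0 else 0.0 for di in d]
            total = sum(row)
            U.append([r / total for r in row])
        else:
            U.append([1.0 / sum((di / dk) ** exponent for dk in d) for di in d])
    return U


def update_centers(points, U, m=2.0):
    """Each centre is the mean of all points, weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(U[0])):
        w = [row[i] ** m for row in U]
        total = sum(w)
        centers.append([sum(wk * x[j] for wk, x in zip(w, points)) / total
                        for j in range(dim)])
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=50):
    centers = [list(v) for v in init_centers]
    for _ in range(n_iter):
        U = memberships(points, centers, m)
        centers = update_centers(points, U, m)
    return centers, memberships(points, centers, m)


# Toy one-dimensional data with two obvious clusters (not study data).
points = [[0.0], [0.2], [0.1], [5.0], [5.1], [4.9]]
centers, U = fuzzy_c_means(points, init_centers=[[0.0], [5.0]])
print([round(c[0], 2) for c in centers])
```

Unlike hard clustering, every point keeps a grade between 0 and 1 for each cluster, so borderline patients are not forced into a single group.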

Cross-Validated Accuracy
To improve the classification performance of each ML algorithm, decreasing its variance and, therefore, reducing overfitting issues, a balanced Monte Carlo 10-fold cross-validation (CV) approach using 100 bootstraps was considered [12]. At each run, the CV algorithm splits the original input data into 10 parts, with the dataset classes equally represented, therefore generating 100 new different CV datasets. For each algorithm and each newly generated CV dataset, nine parts (i.e., nine folds) were used to run the ReliefF feature selection (see Section 5.3.1) and then to train the classifier, while the remaining part (i.e., one fold) was used to test it. For each ML algorithm, the best classification performance obtained over the 100 bootstraps, and its related model, were taken as the final result.
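One possible reading of this scheme is sketched below: the data are repeatedly reshuffled into 10 class-balanced folds, and each fold is held out once for testing. The function names are hypothetical, and rotating through all 10 folds in every repetition is an assumption; the study's own tool may instead hold out a single fold per repetition.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, rng):
    """Split record indices into folds with the classes evenly represented."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[position % n_folds].append(i)
    return folds


def repeated_cv_splits(labels, n_folds=10, n_repeats=100, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = stratified_folds(labels, n_folds, rng)
        for k in range(n_folds):
            test = folds[k]
            train = [i for f, fold in enumerate(folds) if f != k for i in fold]
            # Feature selection and model fitting would use `train` only,
            # so that the held-out fold never influences the ranking.
            yield train, test


# Toy labels: 20 nonresponders (0) and 20 responders (1).
labels = [0] * 20 + [1] * 20
splits = list(repeated_cv_splits(labels, n_folds=10, n_repeats=3))
print(len(splits))  # 30
```

Running the feature selection inside the training folds, as the text describes, matters: ranking features on the full dataset first would leak information from the test fold into the model.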

Performance Comparison
In this paper, each ML classifier performed a binary classification to discriminate group A from group B (where A and B are used here generically to denote two different groups of patients). For a performance comparison, we calculated classification accuracy, specificity and sensitivity [10]. For each built model, the receiver operating characteristic (ROC) curve was calculated, and its area under the curve (AUC) was used to compare the performance of the different classifiers [53].
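For reference, these four measures can be computed as in the Python sketch below. The choice of which group is coded as the positive class, the function names, and the toy labels and scores are assumptions made for illustration; the AUC is obtained here through its rank interpretation, i.e., the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: four subjects per group, with hypothetical classifier outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike accuracy, sensitivity and specificity, the AUC does not depend on a single decision threshold, which makes it convenient for comparing classifiers with different output scales.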
Two hundred thirty-nine subjects started BoNT-A treatment at IRCCS Mondino Foundation between January 2016 and March 2021. Twenty-seven patients were excluded because it was not possible to track their medical records (n = 3) or because they had already received BoNT-A (n = 4). Fifteen were lost to follow-up and four presented an adverse event, because of which it was decided not to proceed with the treatment. Of the remaining 212, 32 were affected by HFEM and 180 suffered from CM. The analysis was carried out on 145 eligible subjects, of whom 91 completed the 4 BoNT-A cycles, while 54 terminated the treatment early after a single cycle. Sixty-seven subjects who decided to terminate the treatment after the second or third cycle were not included in the analysis.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/toxins15060364/s1, Table S1: List of the features evaluated with each different ML algorithm. Results of the different machine learning (ML) approaches applied to primary, secondary, and exploratory endpoints. The population is composed of only those patients who completed 1 year of treatment. The comparison is performed between groups 2 vs. 4 + 5. Legend: accuracy (ACC); sensitivity (Sens); specificity (Spec); area under the curve (AUC); the number of features used for classification (N); support vector machine (SVM); RBF kernel; ANFIS (aNN); MLP (aNN); unsupervised (unsup.); * performance failure in predicting the outcome. Data are expressed as percentage (%). Table S2: Results, responders vs. nonresponders or early terminators. Results of the different machine learning (ML) approaches applied to primary, secondary, and exploratory endpoints. The population is composed of those patients who completed 1 year of treatment as well as the early terminators who lacked efficacy after 1 cycle and decided to stop the treatment. The comparison is performed between groups 1 + 2 vs. 4 + 5. Legend: accuracy (ACC); sensitivity (Sens); specificity (Spec); area under the curve (AUC); the number of features used for classification (N); support vector machine (SVM); RBF kernel; ANFIS (aNN); MLP (aNN); unsupervised (unsup.). Table S3: Validation, in people living with chronic migraine, of the 4-feature panel obtained with the population of people living with high-frequency episodic migraine. The comparison is performed between groups 1 + 2 vs. 4 + 5. Legend: accuracy (ACC); sensitivity (Sens); specificity (Spec); area under the curve (AUC); the number of features used for classification (N); support vector machine (SVM); RBF kernel; ANFIS (aNN); MLP (aNN); unsupervised (unsup.). Table S4: Summary of the available literature regarding BoNT-A efficacy and predictors.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki. In Italy, studies using retrospective anonymous data from administrative databases that do not involve direct access by investigators to identification data do not require further Ethics Committee approval, notification, or patient-informed consent signing. Therefore, for this specific evaluation, specific approval by the Institutional Ethics Committee was not required. This study was run using previously acquired and anonymized data within two different protocols. Both were performed at the IRCCS Mondino Foundation of Pavia and conducted in accordance with the Declaration of Helsinki. They were approved by the local ethical committee: [Protocol 1] FM-BOEM, authorized by the Pavia local ethical committee on 22/7/2017, reg. n° NCT04578782 (ClinicalTrials.gov), and [Protocol 2] authorized by the Pavia local ethical committee, n° 0097925/21.

Informed Consent Statement:
Patient consent was waived due to the use of anonymized datasets. Data were initially collected within the framework of two different local protocols for which informed consent was obtained from all subjects involved in the studies.

Data Availability Statement:
Data used for this study are available from the corresponding author upon request.