The association between sample and treatment characteristics and the efficacy of repetitive transcranial magnetic stimulation in depression: A meta-analysis and meta-regression of sham-controlled trials

BACKGROUND
Repetitive transcranial magnetic stimulation (rTMS) is a form of non-invasive neuromodulation that is increasingly used to treat major depressive disorder (MDD). However, treatment with rTMS could be optimized by identifying optimal treatment parameters or characteristics of patients that are most likely to benefit. This meta-analysis and meta-regression aims to identify sample and treatment characteristics that are associated with change in depressive symptom level, treatment response and remission.


METHODS
The databases PubMed, Embase, Web of Science and Cochrane Library were searched for randomized controlled trials (RCTs) reporting on the therapeutic efficacy of high-frequency, low-frequency, or bilateral rTMS for MDD compared to sham. Study and sample characteristics as well as rTMS parameters and outcome variables were extracted. Effect sizes were calculated for change in depression score and risk ratios for response and remission.


RESULTS
Sixty-five RCTs with a total of 2982 subjects were included in this meta-analysis. Active rTMS resulted in a larger depressive symptom reduction than the sham protocol (Hedges' g = -0.791, 95% CI -0.977; -0.605). Risk ratios for response and remission were 2.378 (95% CI 1.882; 3.005) and 2.450 (95% CI 1.779; 3.375), respectively. We found no significant association between sample and treatment parameters and rTMS efficacy.


CONCLUSIONS
rTMS is an efficacious treatment for MDD. No associations between sample or treatment characteristics and efficacy were found, although we caution that publication bias, heterogeneity and a lack of consistency in the definition of remission might have biased these null findings. Our results are clinically relevant and support the use of rTMS as a non-invasive and effective treatment option for depression.

respectively (Olesen et al., 2012; Greenberg et al., 2021). Importantly, in up to 35% of patients with MDD, first-line treatments (pharmaco- or psychotherapy) are ineffective, which is often referred to as treatment-resistant depression (TRD) (Rush et al., 2006). Unfortunately, there is no unified definition of TRD, and it has been operationalized in multiple ways (McAllister-Williams et al., 2020; Sforzini et al., 2021). For example, Conway and colleagues recently proposed a two-stage model of moderate and severe treatment resistance, which is largely based on the inflection point in antidepressant efficacy that is seen after two treatment trials (Conway et al., 2017). However, the Thase and Rush staging method is commonly used in clinical trials and defines five levels of treatment resistance, based on the number and classes of failed antidepressant treatment trials (Thase and Rush, 1997). Patients with TRD show higher levels of chronicity, comorbidity and suicidality, affirming the need for more effective treatment options for this population (Eaton et al., 2008).
Repetitive transcranial magnetic stimulation (rTMS) is a form of noninvasive neurostimulation that is increasingly being used in MDD, most commonly in TRD and with promising effects (Anon, 2016). The therapeutic effect of rTMS is achieved by delivering magnetic pulses through a coil that is positioned above the head. The magnetic field causes an electrical current in the underlying cortex that modulates neuronal activity. Based on the observation of dysfunctional dorsolateral prefrontal cortex (DLPFC) activity in depressed patients in neuroimaging studies, the DLPFC initially was the primary target for MDD (George et al., 1994;Reid et al., 1998). A decrease in activity was observed in the left DLPFC, whereas the opposite was seen in the right DLPFC (Kennedy et al., 1997;Bench et al., 1995). To remedy this imbalance, activating, high-frequency stimulation is usually applied over the left DLPFC, and inhibiting, low-frequency stimulation is applied to the right DLPFC. Whilst targeting the DLPFC with different frequencies has been an established parameter in rTMS treatment for MDD, other parameters have varied substantially in attempts to optimize treatment. Over the years, the "dosing" of rTMS has tended to increase, with a higher intensity of stimulation, a larger number of pulses, and more sessions. This has resulted in a highly variable set of treatment parameters across studies, contributing to difficulties in interpreting the merits of these parameters.
Apart from treatment parameters, outcomes may also be influenced by characteristics of the participating patients. Less severely depressed patients may be more likely to respond to treatment, and older age has been suggested to be negatively associated with rTMS response (Grammer et al., 2015;Fitzgerald et al., 2016;Pallanti et al., 2012;Fregni et al., 2006). Therefore, treatment with rTMS may be optimized by identifying optimal treatment parameters and identification of patients who are most likely to benefit. In 2006, Hermann and colleagues reviewed variables that potentially modified the effectiveness of rTMS, such as treatment resistance, medication effects, and stimulation intensity (Hermann and Ebmeier, 2006). Based on 33 studies, the authors concluded that these predictors could not be found due to a number of factors, including low sample sizes and large heterogeneity in study parameters. Moreover, patient characteristics and treatment parameters are likely to interact and study results might be confounded by indication, e.g. in elderly patients a higher stimulation intensity might be more effective to compensate for potential atrophy and a more intensive treatment protocol with superior efficacy might only be applied in a group of highly treatment-resistant patients.
In general, rTMS studies have tended to have low statistical power due to small sample sizes. Nevertheless, several systematic reviews and meta-analyses have confirmed the efficacy of rTMS in treating depression, showing superior efficacy of rTMS over sham (Berlim et al., 2013a, 2013b, 2014; Mutz et al., 2018; Schutter, 2010; Brunoni et al., 2017). However, most reviews and meta-analyses focused on a specific population, such as adolescents or non-TRD, or a specific type of rTMS, such as bilateral rTMS or rTMS in combination with antidepressants (Magavi et al., 2017; Voigt et al., 2019; Zhang et al., 2015; Wei et al., 2017). As a result, many of these meta-analyses included at most 20-30 studies. For the subset of meta-analyses that additionally aimed to identify rTMS treatment parameters associated with efficacy, this number might still be too low to ensure enough statistical power, especially when one considers the potential interaction between patient groups and treatment parameters (Berlim et al., 2014; Wei et al., 2017; Sehatzadeh et al., 2019). A recent network meta-analysis by Mutz and colleagues included 53 rTMS trials, making it one of the largest rTMS meta-analyses to date. However, no analyses on sample and treatment parameters were included. Therefore, a new meta-analysis with a large number of studies that includes analyses on sample and treatment characteristics could potentially enable us to identify parameters to stratify patients to receive the most adequate form of rTMS. Information on which of these characteristics are associated with higher efficacy would have considerable clinical relevance and could ultimately inform clinical practice.
We therefore aimed to study the efficacy of rTMS treatment in depression compared to sham rTMS in the largest and most inclusive meta-analysis to date. In addition, we aimed to examine the association between sample and treatment characteristics and efficacy with meta-regression. Based on studies that show low efficacy of rTMS in patients with a high level of treatment resistance, we expected the efficacy of rTMS to be higher in patients with a lower level of treatment resistance (van Eijndhoven et al., 2020; Lisanby et al., 2009; Kiebs et al., 2019). We hypothesized that rTMS is similarly effective in unipolar and bipolar depression, but that efficacy is lower in patients with psychotic depression (Nguyen et al., 2021; Lefaucheur et al., 2019). We expected that a higher number of rTMS pulses would increase rTMS efficacy (Kar, 2019). Finally, although this effect might be confounded by level of severity, we hypothesized that efficacy of rTMS would be higher in patients who received concurrent pharmaco- or psychotherapy compared to patients receiving rTMS monotherapy (Donse et al., 2018; Carpenter et al., 2012).

Search strategy, study selection and in-/exclusion criteria
The electronic databases PubMed, Embase, Web of Science and Cochrane Library were searched up to July 11th 2022, supported by an experienced librarian (see appendix p2 for search terms). Reference lists of review articles were searched to find additional articles. In order to include as many studies as possible, no limit on language and publication date was applied. After removal of duplicates, two independent reviewers (ID and PvE) assessed titles and abstracts for study eligibility. Thereafter, full-texts of selected articles were assessed. Discrepancies in eligibility were discussed between the two reviewers to reach consensus.
We included sham-controlled, randomized rTMS studies assessing the effects of rTMS for primary depression in adults, using high-frequency (HF), low-frequency (LF), or bilateral (BL) rTMS. We excluded studies assessing other types of TMS (i.e. accelerated TMS, theta-burst TMS, priming rTMS), studies assessing the effect of rTMS on depression within a specific somatic disease population (e.g. Parkinson's disease), and studies that simultaneously initiated a different treatment, such as antidepressants together with rTMS, because this would confound our estimate of the efficacy of rTMS itself. As our primary outcome measure was a clinician-rated depression scale (e.g. Hamilton Depression Rating Scale, HDRS, or Montgomery-Asberg Depression Rating Scale, MADRS), studies that did not include such a scale were also excluded. No minimum number of participants or treatment sessions was required, as these were parameters of interest for the meta-regression.

Outcomes
Our primary outcome was the standardized mean difference (Hedges' g) of the change between baseline and end of treatment for the depression score measured on a clinician-rated scale. Secondary outcomes were treatment response, defined as a 50% symptom reduction, and remission, defined as a score below a pre-defined cut-off value on the clinician-rated scales mentioned above. This cut-off value differed between studies: < 10 or < 11 for the MADRS, and < 12, < 11, < 9 or < 8 for the versions of the HDRS consisting of 28, 24, 21 or 17 items, respectively.
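As a minimal sketch of how the two secondary outcomes can be derived from clinician-rated scores, the Python snippet below classifies a single patient. The function names and the cut-off table are illustrative assumptions: the values come from the ranges listed above, response is read as a reduction of at least 50%, and each included study applied its own definition.

```python
# Illustrative remission cut-offs (the score must be strictly below the value).
# Assumption: one cut-off per scale version; the included trials differed.
REMISSION_CUTOFF = {
    "HDRS-28": 12,
    "HDRS-24": 11,
    "HDRS-21": 9,
    "HDRS-17": 8,
    "MADRS": 10,  # some trials used < 11 instead
}


def is_responder(baseline: float, endpoint: float) -> bool:
    """Response: a reduction of at least 50% from the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - endpoint) / baseline >= 0.5


def is_remitter(endpoint: float, scale: str) -> bool:
    """Remission: end-of-treatment score below the cut-off for the scale."""
    return endpoint < REMISSION_CUTOFF[scale]
```

Because the cut-off depends on the scale version, the same end-of-treatment score can count as remission in one trial and not in another, which is why the definition of remission is treated as a source of inconsistency across studies.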

Risk of bias assessment
Two reviewers (ID and JB) independently assessed the internal validity of the included studies, using the 'Risk of Bias' assessment tool developed by the Cochrane Collaboration (Higgins et al., 2011). All studies were examined on six criteria: 1) random sequence generation; 2) concealment of allocation; 3) blinding of participants, personnel and outcome assessors; 4) incomplete outcome data; 5) selective outcome reporting; and 6) other sources of bias. Each criterion received a score of high, low or unclear risk of bias. Disagreements were resolved by consensus.

Data extraction
Two reviewers (ID and JB) independently extracted data from the included studies; disagreements were resolved by consensus. Data on study characteristics, sample sizes, patient characteristics, rTMS parameters, and outcome variables were extracted (see appendix p5 for an overview of all variables). Data that could not be retrieved were requested from the corresponding authors. Non-responsive authors were reminded two more times before data analysis started, two weeks and six weeks after the first email was sent. In order to make use of articles that studied multiple arms, the sample size of the control group was split and matched with the experimental groups. Although the mean stayed the same, the standard deviation was recalculated using the following formula (Follmann et al., 1992):

Statistical analysis
Analyses were conducted using Stata 16 (StataCorp, 2019). The difference between rTMS and sham conditions in pre- to post-treatment change of depression severity scores was expressed as Hedges' g, the standardized mean difference adjusted for small-sample bias. This allowed the comparison of different scales measuring depression, including different versions of the HDRS. If pre- to post-treatment change scores were unavailable, the mean change and its SD were calculated from the baseline and end-of-treatment scores, using an assumed correlation coefficient of 0.5 (Ruhé et al., 2007). For response and remission, risk ratios were calculated. We applied a random-effects model, and forest plots were constructed. Heterogeneity of the included studies was assessed by the I² statistic and its 95% confidence interval, as well as τ². I² ≥ 50% was considered to be indicative of substantial heterogeneity.
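The calculations described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Stata code used for the analyses: the function names are hypothetical, the change-score SD uses the standard formula for the difference of two correlated measurements, and the pooling step uses the DerSimonian-Laird estimator, which is only one of several random-effects estimators and may differ from the one applied in Stata.

```python
import math


def change_sd(sd_pre: float, sd_post: float, r: float = 0.5) -> float:
    """SD of the pre-to-post change for an assumed pre/post correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g and its approximate variance for two independent groups.

    m1/sd1/n1: mean change, SD and size of the active arm; m2/sd2/n2: sham arm.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled estimate, its SE, tau^2 and I^2 (in %)."""
    if len(effects) < 2:
        raise ValueError("at least two studies are needed")
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2, i2
```

With equal baseline and end-of-treatment SDs, an assumed correlation of 0.5 leaves the change-score SD equal to the original SD, whereas r = 0.2 inflates it and r = 0.8 shrinks it; this is why the assumed correlation is a natural target for sensitivity analysis.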
Meta-regression analyses were performed to evaluate the effect of several participant characteristics and TMS treatment parameters on our primary outcome. For these analyses, only study-level variables were used and interpreted as such to avoid ecological fallacies (Berlin et al., 2002; Morgenstern, 1982). Univariate meta-regressions were conducted for all study-level sample and treatment parameters. Sample characteristics were presence of co-therapy, inclusion of bipolar patients, inclusion of psychotic patients, minimal baseline HDRS score required for inclusion, and minimal baseline TRD level required for inclusion, based on number of medication trials. Investigated treatment parameters were rTMS type, localization method, coil type, sham type, percentage of motor threshold used for stimulation, number of LF pulses per session, number of HF pulses per session, and number of sessions. For categorical variables, the most common category was chosen as the reference, in order to find out how the others differed from this category. A multivariate meta-regression was performed with co-therapy and minimal level of TRD at baseline, as well as with the variables that had a p-value of < 0.10 in the univariate meta-regressions. This value of < 0.10 was chosen as a selection criterion, not to be confused with an indicator of statistical significance (Razza et al., 2018). Results of the multivariate meta-regression analysis were interpreted with a p-value of < 0.05 as indicator of statistical significance.
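Two bookkeeping steps in this procedure, choosing the reference category and screening variables for the multivariate model, can be illustrated with a short Python sketch. The function and variable names (for example `co_therapy` and `trd_level`) are hypothetical labels introduced for illustration, and the regression fitting itself is not shown.

```python
from collections import Counter


def dummy_code(categories):
    """Dummy-code a categorical moderator with the most common level as reference.

    Returns (reference_level, rows), where each row maps every non-reference
    level to 0 or 1 for the corresponding comparison.
    """
    counts = Counter(categories)
    reference = counts.most_common(1)[0][0]
    levels = sorted(level for level in counts if level != reference)
    rows = [{level: int(value == level) for level in levels} for value in categories]
    return reference, rows


def select_for_multivariate(univariate_p, forced=("co_therapy", "trd_level"), alpha=0.10):
    """Keep the pre-specified moderators plus those with univariate p < alpha."""
    screened = [
        name for name, p in univariate_p.items() if p < alpha and name not in forced
    ]
    return list(forced) + sorted(screened)
```

For instance, `dummy_code(["HF", "HF", "LF", "BL", "HF"])` picks `"HF"` as the reference, so each regression coefficient describes how LF or BL comparisons differ from HF comparisons.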

Sensitivity analyses
To examine the potential impact of the assumed correlation coefficient of 0.5 when change scores were unavailable, sensitivity analyses were performed to assess whether a lower (r = 0.2) or higher (r = 0.8) correlation would change the overall results of the meta-analysis. A second sensitivity analysis was conducted to determine the robustness of the meta-analytic findings by excluding studies that scored 'high' on one of the risk of bias criteria. To assess small-study effects, a third sensitivity analysis examined only large studies (n ≥ 40). Within the field of rTMS, studies including ≥ 40 participants can be considered large (median n = 32 participants for the included trials). A fourth sensitivity analysis repeated the meta-analysis without outliers, defined as studies with a 95% confidence interval (CI) outside the 95% CI of the pooled effect (Viechtbauer and Cheung, 2010). Meta-regression analyses were performed on the secondary outcomes as a sensitivity analysis. Finally, possible publication bias was assessed by inspecting a funnel plot and conducting Egger's test. If the funnel plot and Egger's test indicated funnel plot asymmetry, both the "trim and fill" method and the Copas method were applied to further assess the influence of potential publication bias (Copas and Shi, 2001; Duval and Tweedie, 2000). The trim and fill method provides an adjusted effect size by removing smaller studies causing asymmetry (trim) and by imputing their missing counterparts (fill). As this method is known to perform poorly in the presence of substantial between-study heterogeneity (Peters et al., 2007), a Copas selection model was applied as well, which is less affected by heterogeneity. The Copas selection model provides a corrected estimate of the effect size under the assumptions that the propensity for publication depends on (1) the study's effect size and (2) the study's sample size.
A correlation between the observed effect size and the propensity for publication may then indicate selection bias (Copas and Shi, 2001).
Fig. 1 shows the search results and study selection. A total of 3359 records were identified after searching the databases. After removal of 1573 duplicates, title and abstract screening of the remaining 1786 records excluded 1646 records. Therefore, 140 records were screened in full text for study eligibility. Sixty-seven articles were excluded for several reasons (as indicated in Fig. 1), resulting in 73 articles meeting study criteria. After the data extraction, eight more articles were excluded from the analysis (see Fig. 1). Therefore, we finally included 65 articles. Fourteen studies contained more than two experimental arms, yielding 79 comparisons in total. Four studies scored 'high' on one of the risk of bias criteria (appendix p6-8). For an overview of patient characteristics, see appendix p13-15.

Characteristics of included trials
Tables 1 and 2 show an overview of study characteristics. In total, the studies included 2982 patients, of whom 1659 received rTMS and 1323 were allocated to a sham protocol. Of the 79 comparisons included for analysis, 61 studied the efficacy of HF-rTMS, nine of LF-rTMS and nine of BL-rTMS. Seventy-four comparisons used the HDRS as their primary outcome measure, although different versions were applied (assessing 17, 21, 24, 25 or 28 items), and five comparisons used the MADRS. The articles that contained multiple comparisons studied either LF and HF stimulation, HF and BL stimulation, LF and BL stimulation, or HF with varying rTMS parameters, such as motor threshold and stimulation frequency. The stimulation intensity used in the articles ranged between 80% and 120% of the motor threshold, and the frequency used ranged between 0.3 Hz and 20 Hz. The number of sessions ranged from five to 30. In the sham procedure, 19 and 31 comparisons tilted the coil 45 or 90 degrees, respectively, 26 used a sham coil, two stimulated the vertex, and for one comparison this was unclear. Fifty-nine of the comparisons combined the rTMS treatment with ongoing medication, whereas 19 did not (for one comparison this was unclear). Thirteen comparisons included participants who had been treated with at least one antidepressant, 38 comparisons required at least two antidepressant trials, and two comparisons required three (for 26 comparisons this was unclear).
Separate sensitivity analyses examining alternative correlations between pre- and post-intervention depression severity scores, excluding studies scoring 'high risk' on one of the risk of bias criteria, excluding small studies, and excluding outliers did not change the results (appendix p9). However, the exclusion of outlier studies resulted in a large reduction in I² (74% to 4%).
Sixty-three comparisons were included in the analysis of response rates. The risk ratio for treatment response was 2.378 (95% CI 1.882; 3.005, p < .0001; Fig. 3), with low heterogeneity (I² = 13%, 95% CI 0%; 37%; τ² = 0.186). Forty-six comparisons were included in the analysis of remission. The risk ratio was 2.450 (95% CI 1.779; 3.375, p < .0001; Fig. 4), again with low heterogeneity (I² = 5%, 95% CI 0%; 33%; τ² = 0.223).
Table 3 shows the results of the meta-regression analyses performed to assess the effect of several independent variables on the calculated effect sizes in the main analysis. In the univariate meta-regressions only one of the variables had a p-value below the pre-specified value (p < .10). A second analysis, examining these variables in one model, was therefore not indicated. The multivariate meta-regression analysis including co-therapy and minimal level of TRD at baseline was non-significant (co-therapy: β = 0.335, p = .389; TRD level: β = −0.235, p = .506; a negative regression coefficient is associated with a higher effect size). As a sensitivity analysis, univariate meta-regression analyses were performed on the secondary outcomes (appendix p9-11). In these analyses, a positive regression coefficient is associated with an increase in the effect size. For the analysis with log risk ratio of response rate as outcome, inclusion of psychotic patients met the selection criterion of p < .10 (β = −0.791, p = .056). For remission, localization method

Publication bias
The funnel plot (n = 61) in Fig. 5 shows a substantially asymmetrical distribution of studies around the midline. Egger's test was significant (p < .0001), indicating asymmetry as well. To assess the influence of this suggested publication bias, we first applied the trim and fill method. This resulted in a more conservative treatment estimate (Hedges' g = −0.557, 95% CI −0.639; −0.474). Considering the high between-study heterogeneity, we subsequently applied the Copas method. The Copas model suggested that the original random-effects meta-analysis showed a biased estimate of the treatment effect (due to publication or other selection biases). Adjusting for this bias resulted in a more conservative treatment estimate (Hedges' g = −0.200, 95% CI −0.587; 0.187, p = .310). Funnel plots for the secondary outcomes (response rate, n = 60; remission rate, n = 44) did not indicate asymmetry suggestive of publication bias (appendix p12).
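For readers unfamiliar with Egger's test, the regression behind it can be sketched as follows. This is a minimal Python illustration with a hypothetical function name, not the Stata routine used here: each study's standardized effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. The p-value, which requires a t distribution with n − 2 degrees of freedom, is deliberately left out to keep the sketch dependency-free.

```python
import math


def egger_intercept(effects, standard_errors):
    """Egger's regression: effect/SE regressed on 1/SE by ordinary least squares.

    Returns the intercept and its t statistic (n - 2 degrees of freedom).
    Assumes the points do not lie exactly on a line (non-zero residual variance).
    """
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x = [1 / se for se in standard_errors]  # precision
    y = [g / se for g, se in zip(effects, standard_errors)]  # standardized effect
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1 / n + x_mean**2 / sxx))
    return intercept, intercept / se_intercept
```

When small, imprecise studies report systematically larger benefits (more negative g) than large studies, the intercept moves away from zero in the negative direction. Between-study heterogeneity can produce the same pattern, so a significant result does not by itself establish publication bias.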

Discussion
This meta-analysis and meta-regression investigated the efficacy of rTMS in the treatment of depression compared to sham stimulation, while also taking into account the association between several treatment and sample parameters and rTMS efficacy. Sixty-five articles were identified, yielding a total of 79 comparisons in 2982 patients. We found a significant improvement in depressive symptoms at end of treatment after rTMS compared to sham protocols, with a large effect size (Hedges' g = −0.791), but also considerable heterogeneity between studies. With respect to our secondary outcome parameters, we showed medium to large effect sizes for response and remission, respectively, whilst heterogeneity was low. However, results need to be interpreted with some caution since we identified a risk of publication bias. When correcting for publication bias with the trim and fill method, the effect size was reduced (Hedges' g = −0.557), and when using the Copas method, the effect size of our primary endpoint became non-significant. In contrast, funnel plots for the secondary outcomes did not indicate asymmetry suggestive of publication bias.
First, a critical note on publication bias tests is warranted. When heterogeneity is large, interpretation of funnel plots requires great caution. Tests of publication bias may not be appropriate if the I² statistic is greater than 50% (Ioannidis and Trikalinos, 2007). As we observed substantial heterogeneity, it is possible that asymmetry in the funnel plot of the primary outcome is, at least partly, caused by heterogeneity instead of publication bias. Furthermore, tests for publication bias, such as Egger's test, tend to detect more statistically significant publication bias in larger meta-analyses, e.g. with > 20 studies (Lin et al., 2018). Indeed, it has even been suggested that tests for funnel plot asymmetry are only appropriate in a minority of meta-analyses (Ioannidis and Trikalinos, 2007). Moreover, we found a difference in heterogeneity between our primary and secondary outcomes. A possible explanation could be the difference between standardized change scores on a clinician-rated questionnaire (primary outcome) and response and remission rates, which are based on cut-off scores (secondary outcomes). Although change scores are standardized, a larger raw change on the primary outcome is more easily achieved if baseline scores are higher to begin with. In contrast, a 50% reduction from the baseline score is by definition corrected for higher scores at baseline. Interestingly, the removal of outliers reduced the level of heterogeneity considerably, from high to low (I² from 74% to 4%), whilst there was only a minor change in overall effect size. Based on the valuable suggestion of an anonymous reviewer, we further examined whether correcting for baseline severity would affect the primary outcome by performing a meta-regression analysis (Harrer et al., 2021; Chaimani, 2015). This actually increased the level of heterogeneity (I² from 74% to 82%).
This further strengthens our suggestion that the significant tests for publication bias are the result of heterogeneity rather than actual publication bias. Taken together, the indication of publication bias might be, at least partially, explained by heterogeneity between studies. Therefore, in the rest of the discussion we will consider the effect size of −0.791 as our best estimate of efficacy, while keeping in mind the possibility of publication bias suggested by the more conservative Copas analysis.

Sample characteristics
For our primary outcome, the meta-regression provided no evidence for associations between sample characteristics and efficacy.
We hypothesized that efficacy would be similar in studies that included only patients with unipolar depression and studies that included both unipolar and bipolar patients. Our results are in line with this assumption, as the meta-regression analyses did not indicate that inclusion of bipolar patients influenced rTMS efficacy. A recent meta-analysis found rTMS to be significantly more effective than sham in bipolar patients, with an effect size comparable to that reported here (Nguyen et al., 2021). Our meta-regression analysis is therefore consistent with there being no difference in efficacy of rTMS between unipolar and bipolar depression.
A second hypothesis was in regard to lower efficacy in studies including psychotic patients, as recent guidelines suggest rTMS to be ineffective in case of depression with psychotic features (Lefaucheur et al., 2019). Our results do not support this hypothesis.
Level of treatment resistance was expected to predict efficacy, with less treatment-resistant patients being more likely to respond to rTMS (van Eijndhoven et al., 2020;Lisanby et al., 2009). We found no evidence for this relation, based on the minimum level of treatment resistance required for inclusion in the studies.
For these variables, our conclusions could only be based on inclusion criteria of the studies. For example, instead of the percentage of patients with bipolar depression, we could only use information on whether or not patients with bipolar depression were included. More subtle differences between studies that included a small proportion (e.g. 5%) versus a majority (e.g. 65%) of bipolar patients will therefore be lost, which might have influenced our results. This is also the case for treatment resistance: samples selected with an inclusion criterion of ≥ 1 or ≥ 2 failed antidepressant (AD) trials show substantial overlap, which might have obscured the contrast needed to detect the influence of level of treatment resistance. Therefore, these effects might only be detected when performing a large individual participant data (IPD) mega-analysis.
Finally, the presence of co-therapy did not seem to be associated with rTMS efficacy. Although this has not been investigated directly in a comparative study, it is generally believed that combining rTMS with pharmaco- or psychotherapy increases efficacy. A study that combined rTMS with psychotherapy showed a higher remission rate (55%) than a similar naturalistic study with rTMS as monotherapy (37%) (Donse et al., 2018; Carpenter et al., 2012). A similar advantage of combination treatment over monotherapy has been shown for pharmacotherapy and psychotherapy (Arns et al., 2019). Although it is common practice to combine rTMS with psycho- and/or pharmacotherapy in the clinical setting, increased efficacy of augmentation of rTMS with other therapies is not supported by our results and merits future studies. This could partly be explained by the fact that it is common for patients to continue with the ineffective antidepressant medication whilst starting treatment with rTMS. Because co-therapy might be more common in more severely depressed patients, we additionally investigated the combined effect of co-therapy and level of treatment resistance at inclusion, but an interaction of these two variables on rTMS efficacy could not be confirmed.

Treatment parameters
Based on our meta-regression, the different types of rTMS do not differ in efficacy. This conclusion is not in line with the recommendation of the recently updated evidence-based guideline on the therapeutic use of TMS, which suggests that, based on the number of studies, their quality and their results, HF-rTMS is the best choice (Lefaucheur et al., 2019). Our review does not corroborate this distinction in efficacy or quality: in our meta-regression and risk of bias assessment, HF-rTMS, LF-rTMS and bilateral rTMS were comparable in both efficacy and quality.
Results on the total number of sessions of rTMS remain inconclusive. Although this was a significant variable in the multivariate meta-regression analysis with remission rate as outcome, the coefficient is too low to be of meaningful clinical use. The first clinical rTMS studies treated patients with five to ten sessions, whereas it is now common practice for an rTMS course to consist of twenty to thirty sessions (O'Reardon et al., 2007; Arns et al., 2019; George et al., 1995; Avery et al., 1999). In a meta-analysis, a subgroup analysis on this contrast indeed found that increasing the number of sessions also increases efficacy (Teng et al., 2017); however, we could not replicate this. Notably, in this subgroup analysis only HF-rTMS studies were included, resulting in a smaller, more homogeneous set of studies, which could explain this discrepancy.
We did not find a relationship between efficacy and number of pulses per session, even though other meta-analyses do support a positive relation between efficacy and number of pulses (Schutter, 2009; Gershon et al., 2003). A recent randomized controlled trial compared a standard to a high number of pulses per session of both LF- and HF-rTMS (Fitzgerald et al., 2020). Although there was no significant difference between the groups receiving standard or high doses of pulses, the remission rate was higher in the group that received high-dose HF-rTMS when controlling for duration of illness.
Evidence for a relation between efficacy and percentage of the motor threshold (MT) used for stimulation is similarly inconclusive. Studies have shown that efficacy increases when rTMS is applied at a higher percentage of the MT, which is most notable when comparing subthreshold stimulation to stimulation at or above the threshold (Fitzgerald et al., 2016; Padberg et al., 2002). Specifically, RCTs that randomized between subthreshold and (supra-)threshold stimulation show remission percentages that are twice as high in the groups that received treatment at ≥ 100% of the MT (Padberg et al., 2002; Rossini et al., 2005). However, other studies did not find this relationship (Bakim et al., 2012; Loo et al., 1999). Nevertheless, it is generally recommended to stimulate above threshold, as it might increase efficacy and is still within safety limits (Rossi et al., 2009). In summary, our review neither supports nor refutes the common practice of applying treatment at the higher end of number of pulses and % MT to optimize chances of efficacy.

Strengths and limitations
To our knowledge, this meta-analysis has included the most randomized sham-controlled rTMS trials to date, providing strength and statistical power to our conclusions. Furthermore, three different sets of outcomes were investigated, increasing the clinical relevance of our results. Finally, multiple sensitivity analyses were performed for both the meta-analysis and meta-regression, increasing the robustness of the results.
Nevertheless, some limitations exist. First, in our meta-analysis and meta-regression, we could only examine study-level variables, which inherently limits the ability to detect small effects and/or interactions. To better identify variables that are predictive of (non-)response to rTMS, we propose to perform individual participant data mega-analyses. Second, we observed a lack of consistency in definitions of remission across studies. This, despite low heterogeneity, may have influenced the pooled risk ratio for remission rate. More consistent reporting of and consensus on cut-off scores for remission would increase the value of remission rates as an outcome. Third, it is unclear whether all study participants were receiving rTMS for the first time. In some of the included studies, previous treatment with rTMS is explicitly stated as an exclusion criterion; however, in most it is not mentioned. Especially for older studies, it is unlikely that patients had previously received rTMS, as it has only recently become available as a standard clinical treatment. Although we are unable to draw a definite conclusion on this subject, we would expect that nearly all participants received rTMS for the first time.

Table 3
Results of the univariate meta-regression analyses, with Hedges' g as effect size. Variables with a p-value of < 0.1 were included in a multivariate meta-regression analysis. n refers to the number of studies included in the analysis. HDRS, Hamilton Depression Rating Scale; TRD, treatment-resistant depression; LF, low-frequency; HF, high-frequency; BL, bilateral. rTMS and sham type were represented as dummy variables with the most common group serving as the reference group. * indicates a value below the selection criterion of p < 0.1. A negative regression coefficient is associated with a higher effect size.
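The coding and selection rules described in this caption can be illustrated with a minimal sketch. It assumes nothing about the software actually used for the analyses: the function names `dummy_code` and `screen`, the example rTMS types, and the p-values are all hypothetical and serve only to show dummy coding against the most common group and the p < 0.1 screen.

```python
from collections import Counter


def dummy_code(values):
    """Dummy-code a categorical moderator, using the most common level as reference.

    Returns the reference level and one dict of 0/1 indicators per study,
    with one indicator for every non-reference level.
    """
    counts = Counter(values)
    reference = counts.most_common(1)[0][0]
    levels = sorted(level for level in counts if level != reference)
    rows = [{level: int(value == level) for level in levels} for value in values]
    return reference, rows


def screen(p_values, threshold=0.1):
    """Keep the moderators whose univariate p-value is strictly below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical rTMS type per study (not data from the included trials).
rtms_types = ["HF", "HF", "LF", "BL", "HF", "LF"]
reference, indicators = dummy_code(rtms_types)

# Hypothetical univariate p-values (not results from Table 3).
univariate_p = {"total sessions": 0.04, "pulses per session": 0.35, "% MT": 0.10}
selected = screen(univariate_p)
```

In this sketch, HF is the reference group because it is the most frequent level, so each study carries only a BL and an LF indicator. A moderator with p exactly equal to 0.1 is not carried forward, because the criterion is a strict inequality.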

Recommendations for future studies
If the choice of the outcome measure is indeed responsible for differences in heterogeneity (as observed between primary and secondary outcomes), the reliability of outcomes needs to be improved. For example, an interview version of the Hamilton Depression Rating Scale has been developed, with high inter-rater reliability as well as high internal consistency, even when administered by interviewers without psychiatric background (Williams, 1988; Potts et al., 1990). Such measures should be used consistently not only in research but also in clinical settings, which would increase the external validity of research findings.
Furthermore, trials should report more information on the participants they include. For example, in some studies it was unclear whether treatment resistance was based solely on lack of clinically significant response or also on intolerability. In addition, although some comorbid disorders (e.g. substance use disorders) are often mentioned as an exclusion criterion, information about common comorbid disorders is often insufficient. Since these variables could influence outcomes, it is valuable to report them in detail.
Finally, to avoid ecological fallacies and more reliably identify predictors of (non-)response to rTMS, individual participant data mega-analyses should be applied. As the majority of rTMS studies include at most a few hundred participants, a global database of combined study data should be initiated. For example, in the field of electroconvulsive therapy the Global ECT-MRI Research Collaboration (GEMRIC) was founded to perform mega-analyses to ultimately inform clinical practice (Oltedal et al., 2017). A similar effort in the field of rTMS, as is currently being set up by the Big TMS Data Collaboration, could mean a large step forward, also improving translation from research to clinical practice (Corp, 2022).
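The ecological fallacy mentioned above can be made concrete with a small constructed example. The sketch below uses invented numbers only, and the `slope` helper and the covariate are hypothetical. It shows how a study-level association can have the opposite sign of the association among participants within each study, which is why study-level meta-regression cannot stand in for individual participant data.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented data: each study lists (covariate, improvement) per participant.
studies = [
    [(1, 4), (2, 3), (3, 2)],
    [(4, 7), (5, 6), (6, 5)],
    [(7, 10), (8, 9), (9, 8)],
]

# Participant-level slope within each study.
within_slopes = [
    slope([x for x, _ in study], [y for _, y in study]) for study in studies
]

# Study-level slope, computed on study means as a meta-regression would see them.
mean_xs = [sum(x for x, _ in study) / len(study) for study in studies]
mean_ys = [sum(y for _, y in study) / len(study) for study in studies]
between_slope = slope(mean_xs, mean_ys)

print("within-study slopes:", within_slopes)
print("between-study slope:", between_slope)
```

In this constructed case every within-study slope is negative while the slope across study means is positive, so an analysis restricted to study means would suggest the wrong direction for individual patients.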

Conclusion
In conclusion, this meta-analysis supports the efficacy of rTMS in the treatment of depression, with large effect sizes for change in depression severity and medium to large effects on response and remission rates. Indicators of publication bias were found; however, we argue that these might actually reflect heterogeneity. We found no clear evidence that sample and treatment characteristics are associated with the efficacy of rTMS. No indication was present for a difference in efficacy of rTMS for unipolar and bipolar depression, or for patients with psychotic features. The different types of rTMS (high-frequency, low-frequency and bilateral) seem to be equally effective. In summary, our results are clinically relevant and support the use of rTMS as a non-invasive and effective treatment option for different levels of treatment-resistant depression.

Declaration of Competing Interest
We received no support from any organization for the submitted work. FS has received funding support from the Netherlands Organization for Health Research and Development (ZonMW) and the Dutch Ministry of Health (VWS). JS is Chair of the Dutch committee multidisciplinary guideline depression. IT has received funding support from the Netherlands Organization for Health Research and Development (ZonMW; 852001925). HR has received funding support from the Netherlands Organization for Health Research and Development (ZonMW; 10140021910006), the Dutch Ministry of Health (VWS), and the Radboudumc Nijmegen, has received speaker fees, and is a member of the executive board of the International Society of Affective Disorders. PvE has received funding support from the Netherlands Organization for Health Research and Development (ZonMW; 636310018), has received speaker fees for lectures on treatment-resistant depression, and is vice-director of the Dutch Flemish Brain Stimulation foundation. All other authors declare no competing interests.