Current level of evidence for improvement of antidepressant efficacy and tolerability by pharmacogenomic-guided treatment: A systematic review and meta-analysis of randomized controlled clinical trials

The aim of the study was to assess the clinical utility of currently available pharmacogenomic (PGx) tools compared with treatment as usual (TAU), using a meta-analysis of dichotomous and continuous antidepressant efficacy and tolerability data from previously published clinical trials. MEDLINE, ClinicalTrials.gov, the EU Clinical Trials Register, WHO ICTRP and CENTRAL were systematically searched; of the 962 records reviewed, 15 trials were included. Antidepressant efficacy was quantified by relative and absolute changes in symptom severity after eight weeks of treatment and by response and remission rates, while tolerability was estimated by the rate of study discontinuation for any reason. In PGx-guided patients, symptom severity decreased by an average of 31.0% after eight weeks of treatment, compared with an average reduction of 26.8% in the TAU group. Accordingly, PGx-guided patients experienced a 3.4% (95%CI: 1.6-5.3%) greater reduction in symptom severity, corresponding to a 0.75-point (0.30-1.21) greater reduction in Hamilton Depression Rating Scale (HAM-D) score, a 37% (15-63%) higher remission rate and an 18% (5-33%) higher response rate compared with TAU patients, while no difference in discontinuation rate was observed between groups. Notably, most associations lost statistical significance when the dataset was restricted to studies with a low risk of bias, and certain funnel plots suggested potential publication bias favoring the reporting of statistically significant results. In summary, PGx tools marginally enhance antidepressant efficacy, but not antidepressant tolerability; further research and development of PGx tools are therefore needed to improve the integration of PGx into the clinical pharmacotherapy of depression.


Introduction
Pharmacogenomics (PGx) can be used to personalize, and thereby improve, the pharmacological treatment of major depressive disorder (MDD) by predicting and mitigating potentially harmful drug-gene interactions (Jukić et al., 2022). Research into numerous genetic biomarkers associated with antidepressant metabolism, efficacy and tolerability, including allelic variants of CYP2C19, CYP2D6, SLC6A4, HTR2A and HLA-B (Jukić et al., 2022), has facilitated the development of several PGx-based clinical decision support tools aimed at improving antidepressant treatment outcomes (Bousman and Hopwood, 2016). Over the past decade, several randomized controlled clinical trials have investigated whether the PGx tools developed to date can improve the efficacy and tolerability of antidepressant treatment compared with treatment as usual (TAU) (Minelli et al., 2022, Zeier et al., 2018). However, the results of the individual trials were insufficient to draw a firm conclusion on this question (Bousman and Hopwood, 2016, Minelli et al., 2022, Zeier et al., 2018, Zubenko et al., 2018, Goldberg et al., 2019a).
Recently, four meta-analyses (Brown et al., 2022, Arnone et al., 2023, Bunka et al., 2023, Wang et al., 2023) pooled the data from the available trials to obtain more meaningful results. All of them detected a robust increase in remission and response rates with PGx-guided treatment and advocated the clinical utility of currently available PGx tools for personalized antidepressant treatment of MDD. However, these meta-analyses focused mainly on dichotomous outcomes such as response, remission and discontinuation rates, while continuous outcomes were largely left unanalyzed. This is of major relevance because the percentage change from baseline in symptom severity at eight weeks was the primary endpoint in almost all available PGx trials, and because, within the same trial, some efficacy outcomes frequently demonstrated the superiority of PGx-guided treatment over TAU while others did not (Greden et al., 2019, Han et al., 2018, Perlis et al., 2020, Shan et al., 2019, Tiwari et al., 2022). Analyzing only a subset of outcomes therefore does not provide a complete picture of the clinical impact of PGx-guided treatment, and continuous outcomes should also be investigated. Next, the available PGx trials reported outcome data at different time points after treatment initiation, and the previous meta-analyses (Brown et al., 2022, Arnone et al., 2023, Bunka et al., 2023) did not focus on a single, defined time point for all trials, although this is important for standardizing conditions and minimizing outcome heterogeneity. Finally, the previous reports showed several methodological limitations: 1) omission of several trials, which narrowed the scope of the meta-analyses; 2) inclusion of the original trial data even where a correction to those data had been published; 3) interchangeable use of per-protocol (PP) and intention-to-treat (ITT) data; and 4) no exploration of demographic and methodological heterogeneity in sensitivity analyses, which relied only on the "leave-one-out" method (Table 1). Together, these shortcomings argue for a more comprehensive meta-analysis on this topic.
The aim of this systematic review and meta-analysis was to build on recent work and comprehensively describe all available data, both dichotomous and continuous, from the representative PGx trials.This analysis aims to clarify whether antidepressant therapy for patients with MDD, guided by the currently available PGx tools, leads to improvements in efficacy and tolerability compared to treatment as usual, and whether these effects are statistically and clinically significant.

Materials and methods
This report was written in accordance with the PRISMA 2020 guidelines (Page et al., 2021). The protocol for this meta-analysis series was not pre-registered because the literature search results were originally obtained for another research project (Jukić et al., 2022), and the authors saw no additional benefit in registering a protocol after that original literature search had been published.

Literature search and study selection
Several digital databases were searched: MEDLINE, ClinicalTrials.gov, the EU Clinical Trials Register, WHO ICTRP and CENTRAL. Published and unpublished reports of clinical trials conducted between January 1990 and 19 September 2022 (most recent re-run 24 October 2023) were screened for inclusion. The search terms were: "(personaliz* OR pharmacogen*) AND depress* AND randomized" (MEDLINE, EU Clinical Trials Register and WHO ICTRP), "Pharmacogenomics Depression Randomized Controlled Trial" (CENTRAL) and "Randomized Depression Pharmacogenomics" (ClinicalTrials.gov).
A clinical trial was included if: (i) participants were randomized between TAU and PGx-guided groups; (ii) participants were treated with antidepressants for MDD; and (iii) MDD symptom severity was assessed using a validated clinical assessment scale, specifically the Hamilton Depression Rating Scale (HAM-D), the Montgomery-Asberg Depression Rating Scale (MADRS), the Patient Health Questionnaire 9 (PHQ-9), the Quick Inventory of Depressive Symptomatology (QIDS) or the Children's Depression Rating Scale (CDRS). Trials with serious methodological flaws were excluded post hoc. No restrictions were placed on manuscript language, participant age, type of antidepressant prescribed or type of PGx tool used. Searching, screening, data extraction and transformation were performed by FM. The final selection of trials was decided by consensus between FM and MMJ. Risk of bias was assessed across five domains using the RoB2 tool (Sterne et al., 2019).

Data extraction
All outcome data measured eight weeks after randomization were extracted; if no measurements were taken at exactly eight weeks, the measurement closest to eight weeks within the interval of 6-12 weeks was selected. Extracted outcome data for the TAU and PGx arms included: (i) symptom severity at baseline and at eight weeks; (ii) nominal and relative change in symptom severity at eight weeks; (iii) response rate at eight weeks, where response was defined as a reduction in symptom severity greater than 50%; (iv) remission rate at eight weeks, where remission was defined as HAM-D ≤7, CGI-I ≤2, PHQ ≤5, QIDS-SR16 ≤6, CDRS-R ≤28 or MADRS ≤10; (v) rate of discontinuation for any reason at eight weeks; and (vi) Frequency, Intensity and Burden of Side Effects Rating (FIBSER) frequency score at baseline and at eight weeks. For continuous outcomes, means and standard deviations were extracted for both arms; for dichotomous outcomes, the total numbers of affected and unaffected participants per arm were extracted.
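The response and remission definitions above can be expressed as a small classification helper. This is a minimal sketch for illustration only: the function names and the scale keys in `REMISSION_CUTOFF` are hypothetical, and only the thresholds themselves are taken from the extraction criteria.

```python
# Hypothetical helper illustrating the eight-week outcome definitions.
# Thresholds are copied from the extraction criteria; the dictionary keys
# are illustrative labels, not identifiers from the original analysis.
REMISSION_CUTOFF = {
    "HAM-D": 7,
    "CGI-I": 2,
    "PHQ-9": 5,
    "QIDS-SR16": 6,
    "CDRS-R": 28,
    "MADRS": 10,
}


def is_response(baseline: float, week8: float) -> bool:
    """Response: reduction in symptom severity greater than 50% of baseline."""
    return (baseline - week8) / baseline > 0.5


def is_remission(scale: str, week8: float) -> bool:
    """Remission: week-8 score at or below the scale-specific cutoff."""
    return week8 <= REMISSION_CUTOFF[scale]


# Example: HAM-D 24 at baseline and 10 at week eight is a response, not a remission
print(is_response(24, 10), is_remission("HAM-D", 10))  # True False
```

Note that a reduction of exactly 50% does not count as a response under the "greater than 50%" wording.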
When available, both per-protocol (PP) and intention-to-treat (ITT) data were extracted. ITT data were prioritized, and PP data were included only if ITT data were not available. As most of the included trials published both ITT and PP data for the dichotomous endpoints (response and remission rates), the differences between these approaches were explored in sensitivity analyses and are presented in the supplement. The exact definitions of the endpoints selected for each meta-analysis can be found in the supplement (Table S3). The following data were also extracted from each included trial for additional analyses: (i) patient demographics (male/female ratio, mean age, ethnic background, baseline symptom severity), (ii) patient inclusion/exclusion criteria, (iii) PGx tool characteristics, (iv) antidepressants prescribed, (v) funding sources, (vi) type of allocation blinding, (vii) choice of primary and secondary endpoints, (viii) protocol pre-registration, (ix) total duration of follow-up and (x) data required for the risk of bias analysis, i.e., data on blinding, deviations from the intended intervention, missing data, outcome measurement and data reporting.

Missing data imputation procedures
If the nominal change in symptom severity at eight weeks was not reported, but symptom severity values at baseline and at week eight were available, this outcome was estimated using a previously published method (Higgins et al., 2022). If the percentage change in symptom severity between baseline and week eight was not reported, but baseline symptom severity and the nominal change in symptom severity were reported or estimated for the same time points, the Taylor expansion (delta) method was used to estimate this outcome (Stuart and Ord, 1998). Standard errors and confidence intervals were converted to standard deviations using standard formulae (Higgins et al., 2022), while data shown only graphically were read off the published figures. Nominal changes in symptom severity scores at eight weeks, as measured by the rating scales used in the included trials, were converted to nominal changes on the HAM-D scale for the same time points based on previously published correlation formulae (Sun et al., 2020, Jarrett et al., 2008). When remission and response rates were not reported by the authors but continuous depression severity data were available, response and remission rates were estimated using a previously described method (Furukawa et al., 2005). The validity of these transformations was established in previous papers (Weir et al., 2018, Kambach et al., 2020) and double-checked using data from trials (Han et al., 2018, Oslin et al., 2022, Perez et al., 2017) in which both the input and the output data of the respective transformation had already been reported. If the published trial data were not sufficient for inclusion even after all the listed procedures, the corresponding author was contacted to provide additional data; a trial was excluded from the analysis of a specific outcome if sufficient data had not been obtained one month after the contact attempt.

Table 1
Comparison of inclusion of outcome data and summary results between the current and previous meta-analyses. Differences between the previous meta-analyses and the current report in the selection of input data are highlighted in red. For the meta-analyses of remission and response rates, the current report has the largest scope compared with the four previous meta-analyses, but still excludes a study with critical methodological problems (Bradley et al., 2018). The meta-analysis was performed using the Mantel-Haenszel method and a random effects model.
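The conversions listed under Missing data imputation procedures can be sketched in Python as below. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the baseline-endpoint correlation `r` must be assumed because trials rarely report it, and the delta-method step assumes zero covariance between change and baseline scores, which is a simplification of this sketch. The remission estimate uses the standard library `statistics.NormalDist` and assumes normally distributed endpoint scores, which is the idea behind the Furukawa et al. (2005) approach.

```python
import math
from statistics import NormalDist


def sd_change(sd_base: float, sd_end: float, r: float) -> float:
    """SD of change scores from baseline and endpoint SDs.

    The baseline-endpoint correlation r is rarely reported and has to be
    assumed (an assumption of this sketch, not a value from the paper).
    """
    return math.sqrt(sd_base**2 + sd_end**2 - 2 * r * sd_base * sd_end)


def sd_from_se(se: float, n: int) -> float:
    """SD from the standard error of a group mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int) -> float:
    """SD from a 95% CI of a group mean (large-sample normal approximation)."""
    return math.sqrt(n) * (upper - lower) / 3.92


def percent_change(mean_chg: float, sd_chg: float,
                   mean_base: float, sd_base: float) -> tuple:
    """Mean and SD of the percentage change via a first-order Taylor
    (delta-method) expansion of change / baseline.

    Assumes zero covariance between change and baseline scores, a
    simplification made only for this sketch.
    """
    ratio = mean_chg / mean_base
    rel_var = (sd_chg / mean_chg) ** 2 + (sd_base / mean_base) ** 2
    return 100 * ratio, 100 * abs(ratio) * math.sqrt(rel_var)


def remission_rate(mean_end: float, sd_end: float, cutoff: float) -> float:
    """Share of patients at or below a remission cutoff, assuming normally
    distributed endpoint scores."""
    return NormalDist(mean_end, sd_end).cdf(cutoff)
```

For example, a mean reduction of 10 points (SD 8) from a baseline of 20 points (SD 4) gives a 50% mean relative reduction; the accompanying SD depends strongly on the covariance assumption, so in practice it should be checked against trials that report both quantities, as was done with the validation trials above.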

Meta-analyses
RevMan V5.4.1 software (Cochrane Collaboration, 2020) was used for the meta-analyses. Dichotomous outcomes were analyzed using the risk ratio metric and the Mantel-Haenszel method; the odds ratio was calculated only for comparison with other meta-analyses. Continuous outcomes were compared between groups using the mean difference metric and the inverse variance method. Because of the inherent heterogeneity resulting from the different PGx tools and choices of antidepressants in the included trials, a random effects model was used for all meta-analyses. The percentage change from baseline in symptom severity at eight weeks was the primary meta-analysis of antidepressant efficacy. Three secondary meta-analyses of antidepressant efficacy were conducted: (i) nominal change from baseline in HAM-D score; (ii) response rate; and (iii) remission rate. Owing to the limited availability of data on antidepressant adverse drug reactions, no reliable meta-analysis of antidepressant tolerability could be performed. However, we conducted a meta-analysis of the discontinuation rate for any reason, supplemented by a secondary meta-analysis of the change from baseline in FIBSER frequency scores. Sensitivity analyses examined the robustness of the effect to changes in the trial inclusion criteria; in separate analyses, inclusion was restricted to trials: (1) with a low risk of bias; (2) with more than 100 patients; (3) with adult patients (18-65 years old) only; (4) with outcomes measured exactly eight weeks after treatment initiation; and (5) requiring no missing data imputation. Exploratory analyses additionally excluded the trial with the most pronounced positive result, limited inclusion to Caucasian or East Asian patients alone, or limited inclusion to trials using a particular PGx tool. Small-study effects and potential publication bias were assessed by visual inspection of contour-enhanced funnel plots (Peters et al., 2008; Sterne et al., 2019) and by Egger's test (Egger et al., 1997); the methodology behind these tests is described in full in the supplement.
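The pooling steps described above can be illustrated with a short Python sketch. It is not the RevMan implementation: it shows a fixed-effect Mantel-Haenszel pooled risk ratio and an inverse-variance random-effects pooling of mean differences with the DerSimonian-Laird estimate of between-trial variance (tau²). RevMan's random-effects analysis of Mantel-Haenszel risk ratios combines these ideas in a somewhat different way, so the sketch should be read as an approximation of the general approach. All function names are hypothetical.

```python
import math


def mantel_haenszel_rr(trials):
    """Fixed-effect Mantel-Haenszel pooled risk ratio.

    Each trial is a tuple (events_pgx, n_pgx, events_tau, n_tau).
    """
    num = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in trials)
    den = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in trials)
    return num / den


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference (arm 1 minus arm 2) and its variance."""
    return m1 - m2, sd1**2 / n1 + sd2**2 / n2


def dersimonian_laird(effects, variances):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird
    estimate of tau^2.

    Returns (pooled effect, (lower, upper) 95% CI, tau^2, I^2 in %).
    """
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2
```

Under this approach, tau² inflates each trial's variance, so a random effects model gives relatively more weight to small trials and produces wider confidence intervals than a fixed-effect model when trials disagree.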

Interpretation of clinical meaningfulness
To gauge the clinical meaningfulness of presently available PGx tools based on continuous outcomes, the effect was compared with a previously established criterion, namely that the smallest observable change in symptom severity reduction is a three-point reduction in HAM-D score (Leucht et al., 2013). The clinical meaningfulness of PGx tools based on categorical outcomes was evaluated using the number needed to treat (NNT) and number needed to genotype (NNG) metrics for response and remission rates. In the context of this study, NNT and NNG represent the number of individuals who need to be treated or genotyped, respectively, to produce one additional remission or response. Both metrics were calculated from the event risks among PGx-guided and TAU individuals (Tonk et al., 2017), and values lower than 10 were considered clinically meaningful, in line with previously established criteria (Goldberg et al., 2019a).
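The two clinical-meaningfulness criteria can be sketched as follows. This assumes the simplest NNT definition, the reciprocal of the absolute risk difference; the function names are hypothetical, and the event risks in the example are illustrative rather than taken from the included trials.

```python
import math

# Thresholds as stated in the text (Leucht et al., 2013; Goldberg et al., 2019a).
HAMD_MIN_OBSERVABLE = 3.0  # smallest observable HAM-D difference, in points
NNT_MEANINGFUL_BELOW = 10  # NNT values below this are considered meaningful


def number_needed_to_treat(risk_pgx: float, risk_tau: float) -> float:
    """NNT as the reciprocal of the absolute risk difference.

    Returns infinity when PGx guidance shows no benefit.
    """
    ard = risk_pgx - risk_tau
    return math.inf if ard <= 0 else 1 / ard


def hamd_difference_meaningful(diff_points: float) -> bool:
    return diff_points >= HAMD_MIN_OBSERVABLE


def nnt_meaningful(nnt: float) -> bool:
    return nnt < NNT_MEANINGFUL_BELOW


# Hypothetical event risks: 30% vs 25% remission
nnt = number_needed_to_treat(0.30, 0.25)
print(round(nnt, 1), nnt_meaningful(nnt))  # 20.0 False
print(hamd_difference_meaningful(0.75))    # False
```

A five-percentage-point absolute gain in remission thus corresponds to an NNT of 20, twice the threshold of 10.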

Summary of the trial and participant characteristics
Of the 962 records reviewed, 15 trials met the inclusion criteria (Bradley et al., 2018, Greden et al., 2019, Han et al., 2018, … 2017, Perlis et al., 2020, Shan et al., 2019, Singh, 2015, Tiwari et al., 2022, Vande Voort et al., 2021, van der Schans et al., 2019, Vos et al., 2023, Winner et al., 2013) (Fig. 1). The cohort characteristics (Table S1) and study designs (Table S2) of the included trials are presented in detail in the supplement. Overall, most trials included predominantly female and predominantly Caucasian patients with a mean age of around 45 years. Exceptions were one trial with adolescents (Vande Voort et al., 2021), one trial with older patients (van der Schans et al., 2019), and two trials with East Asian patients (Han et al., 2018, Shan et al., 2019). Six trials included patients with a history of psychiatric drug failure, three included patients taking an antidepressant for the first time, three included both categories, and in three trials this was unclear. All but one study [28] placed no restriction on the choice of antidepressant, and most trials allowed the introduction of a second antidepressant. Two of the included trials (Bradley et al., 2018, Singh, 2015) had critical methodological problems and were therefore excluded from certain analyses. In particular, Bradley et al. (2018) reported discontinuation rates for the entire cohort, but reported changes in HAM-D scores only for patients with baseline HAM-D scores above 20 points. Because the authors selectively reported efficacy data for only a subgroup of the randomized cohort, and because the corresponding author never responded to multiple email inquiries, this trial was removed from all efficacy analyses owing to the high risk of reporting bias. The Singh (2015) trial reported highly imbalanced baseline depression severity scores between PGx and TAU patients. As higher baseline severity leads to larger nominal and relative reductions in HAM-D score during follow-up (Hieronymus et al., 2019), this study was included only in the analyses of remission and discontinuation rates. Additionally, because age is an important factor in antidepressant treatment, the trials in adolescents (Vande Voort et al., 2021) and in older adults (van der Schans et al., 2019) were excluded in a sensitivity analysis. Using the RoB2 tool (Table S9), five trials were rated as having a low risk of bias, seven as having a serious risk of bias, and three as having a high risk of bias. The most common sources of bias were suboptimal handling of missing data, lack of a preregistered study protocol and suboptimal concealment of randomization.

Notes to Table 1: The authors did not analyze remission rates and also included 2 non-randomized studies not included in other meta-analyses (not shown). ¹Data for remission rates are estimated, although original data are available.
RCT: Randomized controlled trial; PP: Per protocol; ITT: Intention-to-treat; HAM-D: Hamilton depression rating scale; PGI-I: Patient's global scale of impression for treatment improvement; 95%CI: 95% Confidence interval; N/A: not applicable.

The effect of pharmacogenomic guidance on antidepressant efficacy and tolerability
The primary efficacy endpoint, the percentage change in MDD symptom severity after eight weeks of antidepressant treatment (Fig. 2A), showed a mean reduction in symptom severity of 31.0% (95%CI: 29.7%-32.3%) in the PGx-guided group and 26.8% (95%CI: 25.6%-28.1%) in the TAU group; comparing these values yielded a statistically significant 3.4% (95%CI: 1.6%-5.2%) greater improvement in the PGx-guided group than in the TAU group. For the nominal reduction in HAM-D score from baseline to week eight (Fig. 2B), the improvement was 0.75 HAM-D points greater in the PGx-guided group than in the TAU group (95%CI: 0.30-1.21). Meta-analyses of dichotomous efficacy outcomes showed a mean response rate of 29.4% (95%CI: 27.6%-31.2%) in the PGx group and 24.8% (95%CI: 23.0%-26.6%) in the TAU group, corresponding to a 1.18-fold (95%CI: 1.05-1.33) higher response rate in the PGx-guided group (Fig. 2C). In addition, the mean remission rate was 19.5% (95%CI: 18.1%-21.0%) in the PGx-guided group and 14.7% (95%CI: 13.3%-16.2%) in the TAU group, corresponding to a 1.37-fold (95%CI: 1.15-1.63) higher probability of remission in the PGx-guided group (Fig. 2D). The meta-analysis of remission rate showed moderate heterogeneity (I²=50%), which can be attributed to Perlis et al. (2020) and Singh (2015), as excluding either of these two trials abolished the heterogeneity within this meta-analysis.
There were no significant differences between the PGx and TAU groups in the rate of treatment discontinuation for any reason after eight weeks (Fig. 3A). The meta-analysis of FIBSER frequency scores showed no difference between the PGx and TAU arms in the change from baseline (Fig. 3B), although this meta-analysis showed significant heterogeneity (I²=57%), likely owing to the small number of included trials. In summary, the use of currently available pharmacogenomic tools statistically significantly improves antidepressant efficacy, with no impact on the discontinuation rate.

Effect robustness and consideration of publication bias
The effect on the relative change in symptom severity from baseline lost significance in the sensitivity analysis restricted to trials with a low risk of bias (Table S4), while the effect on the nominal reduction in HAM-D scores remained significant across all sensitivity analyses (Table S5). Notably, the effect magnitude changed substantially in the sensitivity analysis limited to East Asian patients (Shan et al., 2019, Han et al., 2018): the advantage of the PGx-guided group over the TAU group increased to a 10.6% relative change, or a nominal 2.8-point HAM-D decrease. In the sensitivity analyses of response and remission rates, both effects lost statistical significance when inclusion was restricted to trials with a low risk of bias (Tables S6 and S7). Additionally, the effect on response rate was no longer significant when inclusion was limited to large trials (Table S6). Individual trials had a notable influence on the meta-analysis of remission rates: excluding the Singh (2015) study decreased the risk ratio from 1.37 to 1.27, while excluding the Perlis et al. (2020) study increased it from 1.37 to 1.45 (Table S7). As with the continuous endpoints, the risk ratio for response favouring the PGx group increased from 1.18 to 1.42 when inclusion was limited to East Asian patients, while the difference in remission rates remained consistent. In an exploratory sensitivity analysis of dropout rates, certain changes to the inclusion criteria resulted in a significant advantage in tolerability for the TAU arm over the PGx arm. Regarding individual tools, restricting the analysis to studies using GeneSight® PGx tools yielded results consistent with the principal meta-analyses, while sub-analyses of other PGx tools remained inconclusive owing to limited data availability.
Visual assessment of the contour-enhanced funnel plots suggested asymmetry, consistent with possibly missing low-precision studies with statistically non-significant results, for the relative change in symptom severity (Fig. 4A), the nominal change in symptom severity (Fig. 4B) and the response rate (Fig. 4D). The statistically significant Egger's test for the change in HAM-D score from baseline (Fig. 4B) indicates that publication bias cannot be ruled out for this effect. Egger's test also indicated significant asymmetry for the discontinuation rate (Fig. 4C), but this asymmetry was likely caused by methodological heterogeneity between studies rather than by publication bias, because the funnel plot clearly shows that the asymmetry is unrelated to the significance level of the outcome (Sterne et al., 2019). For the remission rate (Fig. 4E), no significant asymmetry was found either visually or statistically. In summary, the robustness of the observed effect is not unequivocal: the analysis of studies with a low risk of bias does not indicate superiority of PGx-guided treatment over TAU, and publication bias cannot be unequivocally ruled out for most of the analyzed outcomes.
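Egger's test, used above to quantify funnel-plot asymmetry, regresses each trial's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error); an intercept far from zero indicates small-study effects. The following is a minimal, standard-library sketch of that regression with a hypothetical function name, not the implementation used for this analysis. It returns the intercept, its t statistic and the degrees of freedom; the two-sided p-value can then be taken from a t distribution with those degrees of freedom (for instance with `scipy.stats.t.sf` if SciPy is available) and compared with the p<0.10 criterion used in Fig. 4.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses effect/SE on 1/SE by ordinary least squares.
    Returns (intercept, t statistic of the intercept, degrees of freedom).
    Requires at least three trials with non-collinear points.
    """
    y = [e / s for e, s in zip(effects, ses)]  # standardized effects
    x = [1 / s for s in ses]                   # precisions
    k = len(y)
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    df = k - 2
    s2 = sum(r * r for r in resid) / df        # residual variance
    se_int = math.sqrt(s2 * (1 / k + mx**2 / sxx))
    return intercept, intercept / se_int, df
```

With only a handful of trials per outcome, as here, the test has low power, which is one reason why visual inspection of the contour-enhanced funnel plots was used alongside it.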

Clinical meaningfulness of the effect
For continuous outcomes, the difference in HAM-D score reduction between the PGx and TAU arms after eight weeks was 0.75 points (95%CI: 0.30-1.21), which is below the level regarded as clinically meaningful according to previously established criteria (Leucht et al., 2013). For categorical outcomes, the number needed to treat (NNT) to achieve one additional response was 21.1, and the NNT to achieve one additional remission was 21.3; both are substantially higher than 10, the threshold for a clinically meaningful effect according to the previously established criterion (Goldberg et al., 2019a). Insufficient data were reported to calculate the number needed to genotype, as response and remission rates for participants with variant and reference alleles within PGx-guided groups were seldom reported separately. In summary, even setting aside concerns about effect robustness and publication bias, the effect size for the outcomes indicating treatment improvement is below the threshold for clinical meaningfulness.

Discussion
To our knowledge, this manuscript provides the most comprehensive and balanced meta-analysis of both continuous and dichotomous antidepressant efficacy outcomes in clinical trials comparing standard treatment with treatment guided by currently available pharmacogenomic clinical decision support tools in patients with major depression. Although the results suggest a statistically significant increase in antidepressant efficacy, the possible bias towards positive results, the small effect size and the insufficient robustness of the observed increase, together with the lack of impact on antidepressant tolerability, suggest that the clinical utility of currently available pharmacogenomic tools in antidepressant treatment remains questionable.
Four previously published meta-analyses (Brown et al., 2022, Arnone et al., 2023, Bunka et al., 2023, Wang et al., 2023) had a similar aim to the present analysis but focused only on dichotomous outcome measures, which were secondary endpoints in the included trials. A direct comparison between these analyses showed many similarities, such as the fold-change in remission rate between groups, suggesting that one additional remission is expected per twenty patients treated with PGx guidance. However, there are also many inconsistencies due to differences in methodology and interpretation (Table 1). Here, we addressed these inconsistencies through an umbrella review of the previously published reports and a re-analysis of the data; specifically, (i) trials with serious methodological problems were excluded, (ii) the time point at which the results were analyzed was standardized, and (iii) the results were analyzed primarily using the intention-to-treat rather than the per-protocol approach, in line with well-established practice (Samara et al., 2016). The present analysis shows a less pronounced increase in response and remission rates in the PGx group compared with the TAU group and suggests a lack of robustness (Table 1). Furthermore, while response and remission rates are highly relevant, threshold-based binary categorization discards differences in response intensity between patients and limits quantitative assessment of the magnitude of the effect of PGx tool usage on antidepressant efficacy. Here, we analyzed both dichotomous and continuous outcomes and quantified a 3.4% greater relative reduction in symptom severity from baseline eight weeks after treatment initiation, corresponding to an average improvement of 0.75 HAM-D points in the PGx group compared with TAU. The clinical significance of these results can be assessed in two ways: 1) the difference in HAM-D score change was below the 3-point threshold considered clinically meaningful (Leucht et al., 2013); and 2) the number needed to treat (NNT) for both remission and response was about 21, whereas an NNT below 10 is considered clinically meaningful (Goldberg et al., 2019a). In other words, about 21 patients need to be treated with guidance from a currently available PGx tool to produce one additional MDD remission or response. Thus, given the small effect size and the possibility of publication bias, it is difficult to conclude unequivocally that the observed effects of currently available PGx tools are clinically relevant.
Importantly, all included trials were either inconclusive or negative as stand-alone trials; specifically, they either lacked a preregistered primary endpoint or their preregistered primary endpoint did not show superiority of the PGx tool over TAU. Pre-specifying the primary endpoint and adhering to it during analysis and reporting is a very important practice that reduces unconscious and conscious bias; without it, it is difficult to establish causality between the independent and dependent variables (Goldberg, 2019b). Examples are the trials published by Oslin et al. (2022) and Papastergiou et al. (2021), which reported positive results but showed significant discrepancies between the predefined protocol and the published results (Table S2).
The true potential of PGx to improve antidepressant treatment outcomes has also likely been diluted by a substantial proportion of placebo responders and absolute non-responders (Preskorn, 2014) and attenuated by inherent shortcomings of currently available PGx tools. In particular, these tools sometimes include variants whose relevance to antidepressant treatment is not yet clearly established and fully understood (Border et al., 2019), while omitting clinically relevant genetic variants and diplotypes, such as CYP2C19 and CYP2D6 loss-of-function alleles (Jukić et al., 2022). Moreover, these PGx tools usually translate genetic biomarkers into clinical recommendations in a proprietary and non-transparent manner. Further, rather than giving precise treatment recommendations, most available PGx tools alert clinicians to avoid potentially relevant gene-drug interactions, which may lead to unnecessary discontinuation of potentially effective drugs (Goldberg et al., 2019a). In addition, the included trials were heterogeneous in terms of PGx tool, patient cohort and type of antidepressants used (Table S2). Although the sensitivity analyses presented here may provide additional context (Tables S4-S8), our summary results may have limited generalizability because the method we used is inherently unable to account for the existing heterogeneity. Finally, although data imputation yields more accurate results than omitting incomplete datasets (Weir et al., 2018, Kambach et al., 2020), the risk of inaccurate data estimation in our meta-analyses cannot be completely eliminated.

Fig. 3. Forest plots of antidepressant tolerability comparison between PGx and TAU arms. (A) There were no significant differences in discontinuation rates between the PGx-guided and TAU arms (Risk ratio 95%CI: 0.98, 1.17; df=12, p=0.12, n=13, N=5,766). (B) There was no significant difference between PGx-guided and TAU arms in the change in FIBSER frequency score (item 1) from baseline until week eight of treatment (Mean difference: -0.47; 95%CI: -1.02, 0.09; df=4, p=0.10, n=5, N=939). PGx-guided: Pharmacogenomics-guided therapy of depression; TAU: Treatment-as-usual; RCT: Randomized controlled trial; CI: Confidence interval; SD: Standard deviation; HAM-D: Hamilton depression rating scale; FIBSER: Frequency, intensity, burden of side-effects rating scale.
In summary, adequately exploiting the true potential of pharmacogenomic biomarkers requires further genetic research focusing on individual gene-drug interactions, better defined interventions based on individual drug-biomarker interactions, and adequately powered, well-designed clinical trials of such improved and standardized pharmacogenomic clinical decision support tools.

Fig. 1. PRISMA flowchart for the literature search: of 962 screened records from several digital literature databases, 15 RCTs were included across the different meta-analyses. The flowchart reflects the most recent literature search re-run, performed on 24 October 2023.

Fig. 4. Contour-enhanced funnel plots. Visual assessment of the contour-enhanced funnel plots revealed possible asymmetry in the (D) response rate (p=0.67) meta-analysis, while both visual and statistically significant asymmetries were observed for the (A) percentage change from baseline (p=0.061), (B) nominal change from baseline in HAM-D score (p=0.038) and (C) discontinuation rate for any reason (p=0.0002) meta-analyses. No asymmetry was observed for the (E) remission rate (p=0.11) meta-analysis. The significance of funnel plot asymmetry was tested using Egger's test, with p<0.10 considered statistically significant [16]. SE: Standard error; MD: Mean difference; HAM-D: Hamilton depression rating scale; RR: Risk ratio; CI: Confidence interval.