Graphical tools for visualizing the results of network meta‐analysis of multicomponent interventions

Network meta‐analysis (NMA) is an established method for assessing the comparative efficacy and safety of competing interventions. It is often the case that we deal with interventions that consist of multiple, possibly interacting, components. Examples of interventions' components include characteristics of the intervention, mode (face‐to‐face, remotely etc.), location (hospital, home etc.), provider (physician, nurse etc.), time of communication (synchronous, asynchronous etc.) and other context related components. Networks of multicomponent interventions are typically sparse and classical NMA inference is not straightforward and prone to confounding. Ideally, we would like to disentangle the effect of each component to find out what works (or does not work). To this aim, we propose novel ways of visualizing the NMA results, describe their use, and illustrate their application in real‐life examples. We developed an R package viscomp to produce all the suggested figures.

What is already known
• Interventions may consist of multiple interacting components such as the characteristics of the intervention, who delivers it? how? where? etc.
• In sparse networks, network meta-analysis (NMA) summary estimates are mainly informed by direct evidence and are prone to confounding
What is new
• Novel ways of visualizing NMA results of multicomponent interventions and associating presence or absence of components with efficacy/effectiveness
• R package viscomp
Potential impact for Research Synthesis Methods readers outside the authors' field
• The proposed visualization tools can be used for identifying which components work (or do not work)
• The proposed methodology can be easily applied through the R package viscomp

| INTRODUCTION
Network meta-analysis (NMA) is a widely used method for quantitatively synthesizing the results from many trials with different intervention comparisons. NMA yields more precise effect estimates and allows estimating the relative efficacy between interventions that have never been compared head-to-head. The main assumption of NMA is that the distribution of effect modifiers is similar across intervention comparisons (transitivity assumption). This assumption allows us to estimate relative efficacies between interventions indirectly. If we have three interventions A, B, and C, one can learn about the relative efficacy of B versus C indirectly by using the AB and AC trials, as long as these trials have a similar distribution of effect modifiers.
An intervention may consist of several (possibly interacting) components. Such interventions are often encountered in the literature as complex or multicomponent interventions. The characteristics of an intervention, the mode of delivering it (e.g., face-to-face vs remotely, individually or in groups), the provider (e.g., clinician, nurse), the location (e.g., hospital, home) and other intervention characteristics may have an impact on the intervention's effect. Note that by components we refer to integral parts of the interventions. For example, consider a psychological intervention that (1) includes educating patients about health risks, the disease etc. and (2) focuses on changing behavioral patterns (e.g., taking up exercise). This intervention has an educational and a behavioral component. The intervention may also have other constituents, such as being delivered face-to-face, individually, by a psychologist at the patient's home. These are embedded in the intervention and may affect the intervention's effect, but they cannot be seen as covariates outside the intervention that modify the effect of the intervention (effect modifiers). On the other hand, the intensity with which the intervention is applied can be an effect modifier (e.g., number of sessions per week, duration of session).
Typical examples of multicomponent interventions are psychological, behavioral and self-management interventions (SMIs). [1][2][3][4] In these types of interventions, one should think a priori about what constitutes the intervention. [5] Hence, along with the effect modifiers, a knowledgeable expert must define a priori what the intervention consists of by defining well-classified components. For pharmacological interventions, interventions with multiple drugs (e.g., dual or triple therapy) can be seen as multicomponent interventions.
In such cases, the interest often lies in the evaluation of the components' efficacy and the identification of the 'best' component or the best combination of components. The first two items cannot be addressed in a standard NMA, since multicomponent interventions are treated as single nodes. The third item is, ideally, answered by standard NMA. In practice, however, networks of multicomponent interventions are typically sparse, and the estimate for each node is predominantly informed by the few studies that include that node. As a result, the efficacy of interventions is confounded with study characteristics, and consequently the transitivity assumption is challenged.
Multicomponent interventions are increasingly included in empirical studies and networks of trials. [1][2][3][6] For example, a review of 201 networks found that from 2000 up to 2015, 18 (9%) NMAs included multicomponent interventions, while in the last 3 years (2016-2018) this number rose to 12% (24/201). [7] Visualizing NMA results would offer insight into components' efficacy, assist with understanding their behavior, and make it easier to identify the most effective components. Pillay et al. identified the most effective components of behavioral programs for controlling HbA1c by looking at the NMA summary effects and observing which components were included in the most effective interventions. [2] This is not easy to do when multiple components are present, and a visual representation of how components relate to summary effects would be useful. Although there are multiple graphical approaches in meta-analysis and NMA, [8] there are no approaches that display the individual components as obtained from NMA.
The aim of this article is to present ways of visualizing the results of NMAs of interventions with multiple components. To this aim, we developed an easy-to-use R package called viscomp, [9] also available on GitHub (https://github.com/georgiosseitidis/viscomp). The viscomp package takes as input the results of the network meta-analysis model as obtained by the netmeta package. [10] We describe the proposed visualization tools and demonstrate them using real-life data examples.

| EMPIRICAL DATASETS
The data were collected within the COMPAR-EU project, [4] which aims to identify, compare and rank the most effective and cost-effective SMIs for adults in Europe living with one of four high-priority chronic conditions (type-II diabetes, obesity, chronic obstructive pulmonary disease, and heart failure). We included two outcome datasets: (a) 461 randomized controlled trials (RCTs) comparing 97 SMIs for reduction in hemoglobin A1c (HbA1c), and (b) 41 RCTs comparing the effectiveness of 30 SMIs on improving self-management behaviors (SMB) in patients with type-II diabetes. The latter dataset was used only for the construction of the rank-heat plot (see Section 3.8). A description of the datasets and the components' abbreviations are presented in Tables S1 and S2. The SMIs have six characteristics whose names are masked, as these results will be submitted soon to a medical journal and cannot be revealed beforehand. The geometry of the network for the HbA1c and SMB outcomes is presented in Figures 1 and S1, respectively. For both outcomes, usual care (A) was selected as the reference intervention. Note that for HbA1c, small effect sizes are favorable for the outcome, while the opposite holds for SMB. Since each effect is associated with a different amount of uncertainty, we compute the corresponding z score, defined as $z = \dfrac{\text{NMA effect}}{\text{NMA standard error}} \sim N(0, 1)$. In the graphs presented in this paper, we mainly use relative effects, though one can use the z scores instead.
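The z score defined above is a simple ratio, and a minimal sketch may make it concrete. The function name `z_score` and the numbers in `estimates` are hypothetical and serve only as an illustration; they are not taken from the COMPAR-EU data or from the viscomp package.

```python
def z_score(effect, std_error):
    """Return the z score of an NMA relative effect (effect / standard error)."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return effect / std_error


# Hypothetical NMA estimates versus usual care: name -> (effect, standard error).
estimates = {"B+E": (-0.5, 0.25), "E": (-0.125, 0.25)}
z_scores = {name: z_score(eff, se) for name, (eff, se) in estimates.items()}
print(z_scores)
```

With equal standard errors, as in this toy input, the larger effect in absolute value also gives the larger z score in absolute value; with unequal precision the two orderings can differ, which is why both scales are offered in the plots.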
Broadly, there are four different approaches suggested to handle multicomponent interventions in evidence synthesis. These approaches include the grouping of the active interventions in a single group (single-effect model), the NMA model (full interaction model), the additive NMA model (additive main effects model), also termed the component network meta-analysis (CNMA) model, and its extension to include two-way interaction terms (two-way interaction model). [11] The single-effect model is useful when we have many trials with the same or similar control groups (e.g., placebo, usual care); by lumping all intervention groups together and contrasting them with control groups, it helps answer the question of whether an intervention works in general. In Figure 1, using the single-effect model would entail keeping only the trials including "A" (which is the "usual care" node) and would answer the question of whether SMIs are useful as a whole, irrespective of what they consist of. This type of model fails to provide insight into which intervention components work, and heterogeneity is often large due to the lack of consideration for intervention differences across trials.
FIGURE 2 Components cross-table visualizing the number of arms that include a component or any pair of two components, for the reduction of HbA1c. Parentheses in the diagonal elements denote the proportions of study arms that include the component, while in the non-diagonal elements they denote the proportion of study arms that include the corresponding pair of components out of those study arms that include the component in the corresponding row. The intensity of the color is proportional to the relative frequency of the corresponding component (combination).
The CNMA models [11][12][13] are based on the additivity assumption, suggesting that combining components results in summing up their effects. That would result in estimating effects for components A to K (say $d_A$ to $d_K$), and the effect of a combination of components, say "A + B", would be $d_A + d_B$. In practice, this assumption is difficult to test and/or defend. Interaction effects can be added. For example, if we put an interaction between components "A" and "B", then the effect for "A + B" is $d_A + d_B + d_{AB}$. If $d_{AB} > 0$ the components work synergistically, if $d_{AB} < 0$ the components work antagonistically, and if $d_{AB} = 0$ the components are independent. Interactions should be chosen with caution, however, since power reduces substantially as the number of interaction terms increases. [14]
TABLE 1 Components' frequency for the reduction of HbA1c
In this work, we focus on the classical NMA model. In Figure 1, the NMA model would estimate 96 effects (all SMIs vs a reference one). With sparse networks, we typically end up with large and imprecise summary effects, because these are informed mainly by a couple of trials. Apart from these issues, there is no clear association between efficacy and presence of components. For example, we may have intervention $c_1 + c_2$ with a large effect, intervention $c_1 + c_2 + c_3$ with a small effect and intervention $c_1 + c_2 + c_3 + c_4$ with a moderate effect.
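The additivity assumption and the interaction term of the CNMA models discussed in this section can be sketched in a few lines. This is an illustration under stated assumptions rather than an estimation procedure: the function `combination_effect` and all numerical values are hypothetical, and the component effects are taken as given instead of being estimated from data.

```python
def combination_effect(components, main_effects, interactions=None):
    """Effect of a combination under additivity, plus optional two-way interactions.

    main_effects: component -> effect (hypothetical values)
    interactions: (component, component) -> interaction effect
    """
    total = sum(main_effects[c] for c in components)
    for pair, value in (interactions or {}).items():
        # An interaction term applies only if both of its components are present.
        if set(pair) <= set(components):
            total += value
    return total


main_effects = {"A": -0.25, "B": -0.5}  # hypothetical component effects
additive = combination_effect(["A", "B"], main_effects)
with_interaction = combination_effect(["A", "B"], main_effects, {("A", "B"): 0.25})
print(additive, with_interaction)
```

In the purely additive call the effect of "A + B" is the sum of the two main effects; in the second call the interaction term shifts that sum by its own value.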

| GRAPHICAL APPROACHES
We developed an R package [15] (viscomp) to implement the proposed graphical approaches. Details on the installation and usage of the package can be found at https://georgiosseitidis.github.io/viscomp/index.html.

| Components descriptive analysis
In a network of multicomponent interventions, a basic descriptive visualization of how the included components are combined is useful. A cross-table with the components' frequencies helps us to explore which components (or combinations of components) are the most frequent in the study arms. Each cell represents the total number of arms in the eligible studies where the corresponding component (or combination) was observed. Diagonal elements refer to single components, and in parentheses we give the proportion of study arms including that component, while off-diagonal elements refer to the frequency of pairs of components, and in parentheses we give the proportion of study arms with both components out of those study arms that include the component in the row. The intensity of the color is proportional to the relative frequency of the total number of arms where the corresponding component (combination) was observed. Dark red colors suggest large percentages.
FIGURE 4 Leaving one component out scatter plot for the comparisons that differ by one specific component in the HbA1c example, using network meta-analysis relative effects.
In Figure 2, the component cross-table for the reduction of HbA1c is presented. Diagonal elements indicate that the most frequent components are "E" and "B", observed in 59.04% (565/957) and 38.77% (371/957) of intervention arms, respectively, while the least frequent components are "K" and "D", observed in 2.93% (28/957) and 5.43% (52/957) of intervention arms, respectively. Off-diagonal elements suggest that "E", "B" and "F" are the most frequently combined components. Also, note that "A" and "D" were not combined with any other components. This is because these two components refer to two different control groups. Moreover, the off-diagonal elements of column "E" indicate that "E" was almost always part of the intervention when the intervention consists of several components, since all the corresponding percentages are close to 100%, as shown by the dark red color. However, the same does not apply to the remaining components. For example, "E" was always included in interventions that included "G", whereas "G" was observed in only 20.71% of study arms that included "E". The components cross-table was created through the function compdesc.
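The counting behind such a cross-table can be sketched as follows. This is an illustrative sketch, not the compdesc implementation: the helper names `component_cross_table` and `pair_share`, the "+" separator and the toy list of arms are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def component_cross_table(arms, sep="+"):
    """Count arms per component (diagonal) and per pair of components (off-diagonal)."""
    single, pairs = Counter(), Counter()
    for arm in arms:
        comps = sorted({c.strip() for c in arm.split(sep)})
        single.update(comps)
        pairs.update(combinations(comps, 2))
    return single, pairs


def pair_share(single, pairs, row, col):
    """Proportion of the arms containing `row` that also contain `col`."""
    return pairs[tuple(sorted((row, col)))] / single[row]


arms = ["A", "B+E", "B+E+F", "E+G", "E"]  # hypothetical study arms
single, pairs = component_cross_table(arms)
print(single["E"] / len(arms))             # diagonal proportion for "E"
print(pair_share(single, pairs, "G", "E"))  # share of "G" arms that include "E"
print(pair_share(single, pairs, "E", "G"))  # share of "E" arms that include "G"
```

The last two lines show why the off-diagonal percentages are not symmetric: the denominator is always the number of arms that include the row component.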
Although we focus on NMA, it is likely that one will conduct both NMA and CNMA models. CNMA models are mainly based on the additivity assumption, [12][13] under which components that appear in both arms of a comparison cancel out. Exploring the number of studies in which the underlying component is included in all study arms, in at least one arm, or not included in any study arm could provide valuable insight into the understanding of the CNMA estimates. Table 1 summarizes these aspects for the reduction of HbA1c. The table was derived using the function compdesc.
FIGURE 5 Waterfall plot for the comparisons that differ by the component "K" in the HbA1c example, using network meta-analysis relative effects.

| Components network graph
A components network graph (CNG) helps visualize the frequency of components' combinations found in the network. The CNG resembles a network plot where nodes represent the individual components found in the network and edges represent the combination of components found in at least one intervention arm of the trials included in the NMA. Each edge's color represents one of the unique interventions (combination of components) found in the network of interventions. Edges' thickness indicates the frequency with which each intervention (combination of components) was observed in the network (number of arms to which the combination was assigned).
In Figure 3, the most frequent combinations of components for the reduction of HbA1c are presented. The most frequent combination is between the components "E" and "B" (intervention E + B: included in 52 arms, thickest blue edge), followed by "E", "B" and "F" (intervention E + B + F: included in 51 arms, orange edge). The CNG was created through the function compGraph. A figure including all possible combinations would have been impossible to read. We therefore chose to show the seven most common, which in our example translated to those combinations that appear in at least 20 arms.
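Selecting the most frequent combinations, which determine the edges drawn and their thickness, amounts to counting arms per unique combination. The sketch below assumes a hypothetical list of arm labels separated by "+"; the function name `most_frequent_combinations` is illustrative and is not part of viscomp.

```python
from collections import Counter


def most_frequent_combinations(arms, n, sep="+"):
    """Return the n most frequent multi-component combinations as (label, arm count)."""
    counts = Counter()
    for arm in arms:
        comps = frozenset(c.strip() for c in arm.split(sep))
        if len(comps) > 1:  # single components are nodes, not edges
            counts[comps] += 1
    return [(sep.join(sorted(comps)), k) for comps, k in counts.most_common(n)]


# Hypothetical arms; "E+B" and "B+E" denote the same intervention.
arms = ["E+B", "B+E", "B+E+F", "E", "A", "E+G", "B+E"]
top = most_frequent_combinations(arms, n=1)
print(top)
```

Using an unordered set as the key makes the count independent of the order in which components are written in the arm label.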

| Leaving one component out scatter plot
We can explore whether the inclusion or exclusion of a component has a positive or negative impact on the efficacy of an intervention by looking at the NMA relative effect estimates of those interventions that differ by this specific component. We recommend a scatter plot where the x-axis represents the NMA relative effect of the intervention that includes the component, and the y-axis represents the NMA relative effect of the intervention that consists of the same components, with the sole difference that it does not include the component of interest. This scatter plot visualizes the impact of the inclusion/exclusion of a specific component. A point on the line y = x indicates that the inclusion/exclusion of the underlying component does not affect the efficacy of the interventions. For a beneficial outcome, dots above the y = x line indicate that the inclusion of a component hampers the intervention effect, while dots below this line signify a component that increases efficacy. The opposite holds for a harmful outcome.
FIGURE 6 Components Heat Plot for the reduction of HbA1c. Each cell refers to a combination of components and presents the median network meta-analysis relative effect of all SMIs that include that combination of components. Gray boxes are proportional to the estimates' precision. "D" was not combined with any other component in the data.
Such a scatter plot also serves as a visual method to evaluate the additivity assumption. Additivity implies that the effect of an intervention is calculated as the sum of the effects of its components. Thus, the inclusion/exclusion of a component is expected to have the same impact on all interventions that differ by this component. This is expressed visually in the scatter plot by points lying on a line parallel to y = x. The scatter plot can be extended to compare interventions that differ by a specific component combination, or adjusted to use z scores instead of intervention effects.
In our example, Figure 4 indicates that the efficacy of interventions is not affected by the inclusion/exclusion of a specific component. However, by taking into consideration the uncertainty in the NMA estimates, we see that the inclusion of components "K" and "C" reduces the strength of evidence (Figure S2). This is because most of the identified cases are below the line y = x, which translates to a greater z score when the intervention includes the component. Thus, the inclusion of these two components reduces the intervention efficacy, as HbA1c is a harmful outcome. Also, the inclusion of component "E" seems to decrease the intervention efficacy, as the majority of data points are below the line y = x. Moreover, most estimates with or without the underlying component fall below or above the y = x line (Figure 4). This indicates that the additivity assumption might not hold for the CNMA model. Note that for the evaluation of the additivity assumption, we relied solely on Figure 4, where NMA relative effects are used, since z scores in Figure S2 do not reflect the additivity assumption. Scatter plots were constructed using the function loccos.

| Waterfall plot
Figures 5 and S3 present the component combinations that differ by component "K" using the NMA relative effects and the z scores, respectively. The waterfall plot was created using the function watercomp.

| Components heat plot
The components heat plot (CHP) arranges the components and their combinations in a square matrix, where diagonal elements represent the C components, and off-diagonal elements depict the two-by-two pairwise combinations of the components. Each element summarizes the NMA relative effect (this is the default choice, but one can choose the z score instead) of the interventions that include the corresponding component (diagonal) or pair of components (off-diagonal) by using either the median (default) or the mean. More specifically, for each element, we consider the corresponding median (or mean) NMA relative effect (or z score) from the interventions that include the pair of components shown in the rows and columns of the matrix. The number of nodes that include the corresponding component (or combination of components) is also provided.
Z scores quantify the strength of statistical evidence that the corresponding combination performs better or worse than the reference intervention. The magnitude of each z score (or NMA relative effect) is denoted by the color's intensity. In the case where z scores are used, dark green (or red) indicates strong statistical evidence that the corresponding component (or combination of components) performs better (or worse) than the reference intervention, while in the case of NMA relative effects, it indicates a large magnitude of the estimated effect. The letter "X" is used to highlight any combinations of components that are not observed in the network.
In our example, the CHP indicates that self-management interventions reduce the level of HbA1c, since all diagonal elements have a negative value (represented by a green color) (Figure 6). The most effective component seems to be "B", with a median estimate of −0.44, followed by "G" and "H" (median estimates: −0.43 and −0.42, respectively), while the most effective component combination is between "C" and "G" (median estimate −0.91).
FIGURE 8 Violin plots exploring if the number of components included in an intervention affects the effectiveness of the interventions used for the reduction of HbA1c. Dots are proportional to the precision of the network meta-analysis relative effect estimates.
Note that we are only moderately confident about the results because of the uncertainty in the estimates, which is reflected by the small-to-medium size of the gray boxes. Figure S4 suggests that there is significant statistical evidence that "B" and "C + G" perform better than usual care (UC) (median z score: −2.01 and −1.99, respectively). However, the latter estimate should be treated with caution, since it was derived solely from one NMA estimate. The CHP was created through the function heatcomp.
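The cell values of a components heat plot can be computed by pooling, for each component and each pair of components, the NMA effects of all interventions that include it and taking the median. The sketch below assumes hypothetical effects and a "+" separator; `heat_plot_cells` is an illustrative name and is not the viscomp function.

```python
from itertools import combinations
from statistics import median


def heat_plot_cells(effects, sep="+"):
    """Map each cell (c, c) or (c1, c2) to (median effect, number of interventions)."""
    values = {}
    for name, eff in effects.items():
        comps = sorted(set(name.split(sep)))
        for c in comps:
            values.setdefault((c, c), []).append(eff)   # diagonal cell
        for pair in combinations(comps, 2):
            values.setdefault(pair, []).append(eff)     # off-diagonal cell
    return {cell: (median(v), len(v)) for cell, v in values.items()}


# Hypothetical NMA relative effects versus the reference intervention.
effects = {"B": -0.5, "B+E": -0.25, "B+E+G": -0.75, "E": -0.125}
cells = heat_plot_cells(effects)
print(cells[("B", "B")], cells[("B", "E")], cells[("E", "G")])
```

The second element of each cell makes visible how many interventions a median rests on; a cell informed by a single intervention, such as ("E", "G") here, deserves the same caution as the "C + G" estimate in the text. Pairs absent from the result correspond to combinations not observed in the network.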

| Violin plot
The violin plot combines information from a boxplot and a kernel density plot. Each component is represented by its own violin plot; as data points, we used the z scores for interventions that include those components. The violin plots can help us identify which components are associated with large z scores.
In Figure 7, we present the violin plots of the NMA relative effects for all SMIs vs UC that include each of the individual components for the HbA1c outcome. Overall, the self-management interventions are effective, but none of the components appears to be associated with a larger effect. This is due to the large heterogeneity in the data. Some overlap between the violin plots was expected, since an intervention consisting of multiple components contributes the same relative effect to each of its components. Note that the median estimates of the violins are also presented in the diagonal of the CHP. Similar results were obtained with the violin plots of the z scores (Figure S5).
All violin plots were constructed using the function specc. The command is very flexible, and we can use it to compare combinations of components, for example, if we are interested in exploring the distribution of relative effects for interventions including both components A and B. Alternatively, one may be interested in exploring how SMI effects change with the number of components. In Figures 8 and S6, efficacy of the interventions increases with the number of components; yet, for more than 4 components in the intervention, efficacy slightly decreases. The number of components can also be categorized into groups. In our example, we have grouped the number of components as 1-3 components, 4-5 components, and 6 or more components. By grouping the components, we do not observe any substantial difference between the three categories (Figures S7 and S8).
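Grouping interventions by their number of components, as done for the violin plots above, can be sketched as follows. The function `effects_by_size_group`, the effect values and the group boundaries are hypothetical; in particular, the open-ended last group is assumed here to start at six components.

```python
def effects_by_size_group(effects, groups, sep="+"):
    """Collect effects per group; groups maps label -> (min, max), max=None is open-ended."""
    out = {label: [] for label in groups}
    for name, eff in effects.items():
        k = len(set(name.split(sep)))  # number of components in the intervention
        for label, (low, high) in groups.items():
            if k >= low and (high is None or k <= high):
                out[label].append(eff)
                break
    return out


# Hypothetical NMA relative effects versus the reference intervention.
effects = {"E": -0.125, "B+E": -0.25, "B+E+F+G": -0.5, "B+C+E+F+G+H": -0.375}
groups = {"1-3": (1, 3), "4-5": (4, 5), "6+": (6, None)}  # assumed boundaries
grouped = effects_by_size_group(effects, groups)
print(grouped)
```

Each list in the result would feed one violin; the grouping itself is only bookkeeping and carries no statistical adjustment.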
FIGURE 9 Density plot for the component used for the reduction of HbA1c, comparing the efficacy of the interventions when the component is included in the intervention and when it is not, using network meta-analysis (NMA) relative effects. Note that "D" was not used since it was included only in one NMA estimate.

| Density plot
The efficacy of components can also be explored by comparing the corresponding density plots of all NMA relative effects (or z scores) that include each of the components (or a combination of them) to those not including the component(s) of interest.
In Figures 9 and S9, we present the impact of including each individual component on the relative efficacy against usual care. If we focus on "I" and "K" components, we can see that interventions that do not include them show larger efficacy compared to those that include them. This could be an indication that these two components potentially interact antagonistically with some of the remaining components and reduce the efficacy of the interventions. Density plots were constructed using the function denscomp.

| Rank-heat plot
When multiple outcomes are available, we are interested in identifying which interventions perform best in more than one outcome. A Hasse diagram can be used to represent a finite partially ordered set by drawing curves between interventions. [16][17] An alternative to the Hasse diagram is the rank-heat plot, where the ranking of interventions can be presented across multiple outcomes. [18] Advantages of the rank-heat plot are the presentation of the P score values in the plot along with the different coloring scheme, and the fact that it can be produced even when different interventions are included in the studied outcomes.
An extension to the rank-heat plot of interventions is the visual presentation of the component hierarchy across multiple outcomes. Each circle corresponds to a different outcome, and each radius to a different component. Sectors are colored according to the ranking of the relevant components within the underlying outcomes. Ranking is calculated as the median of the P scores of the interventions that include the component of interest in the particular outcome. The color scale ranges between red (P score = 0%) and green (P score = 100%), with yellow representing P score = 50%. Uncolored sectors, if any, suggest that the underlying component was not included in any of the interventions in the NMA for the particular outcome. The rank-heat plot was created using the function rankheatplot.
Figure 10 displays the hierarchy of 11 components for the HbA1c and SMB outcomes using the rank-heat plot. The rank-heat plot of components suggests that "A" and "D" are the worst components for both outcomes. The P scores of the remaining components did not differ importantly. The highest P score in HbA1c was observed for the "B" component (P score = 59%) followed by "G" (P score = 58%), and in SMB for the "B" (P score = 52%) and "K" (P score = 49%) components, respectively.
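The component ranking described above, the median of the intervention P scores per component and outcome, can be sketched as follows. The function `component_ranks`, the outcome labels and the P scores (in %) are hypothetical; `None` stands for an uncolored sector.

```python
from statistics import median


def component_ranks(p_scores_by_outcome, components, sep="+"):
    """Median intervention P score per (outcome, component); None if never observed."""
    table = {}
    for outcome, p_scores in p_scores_by_outcome.items():
        for comp in components:
            vals = [p for name, p in p_scores.items() if comp in name.split(sep)]
            table[(outcome, comp)] = median(vals) if vals else None
    return table


# Hypothetical intervention P scores (%) for two outcomes.
p_scores_by_outcome = {
    "HbA1c": {"A": 10, "B+E": 60, "B+E+G": 70},
    "SMB": {"A": 20, "B+E": 50},
}
ranks = component_ranks(p_scores_by_outcome, ["A", "B", "G"])
print(ranks)
```

Because each outcome is processed separately, the sketch also covers the case where different interventions are included in the studied outcomes.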

| DISCUSSION
Networks of multicomponent interventions typically consist of many nodes, have few head-to-head trials and few trials per comparison. Also, most comparisons involving the reference intervention (e.g., usual care) are informed by the few trials that include this comparison (direct evidence), and not by indirect evidence. As a result, in classical NMA, we may encounter pooled effect sizes that are mainly driven by studies involving the relevant interventions. Moreover, it is difficult to defend the transitivity assumption.
Alternative models that focus on estimating the component effects have been suggested (CNMA models). We argue in favor of doing both an NMA and a CNMA when dealing with sparse networks. The assumptions involved in both models are difficult to justify but, broadly speaking, the sparser the network, the weaker the NMA analysis. To strengthen the NMA analysis, we suggested a series of graphs with the aim of finding out which components or combinations of components work (or do not work). Though the visual representation of components' behavior lacks statistical justification (e.g., p values, confidence intervals), we argue that it is very helpful in identifying patterns in results and in associating components with efficacy.
FIGURE 10 Rank-heat plot of P scores for 11 components included in the interventions of the HbA1c and SMB outcomes. Numbers within each sector correspond to the median P score (%) values as calculated across the interventions including the individual components. Each sector is colored according to the assigned P score of the corresponding component and outcome. The color scale ranges between red (0%), yellow (50%), and green (100%).
First, we should fully explore the geometry of the network. Consider the extreme case where two components always occur together and never separately, or occur together almost always. In the former case, they should be joined to form one component. Let us assume the latter case and that these components were observed in a few interventions with large effects. If we rely solely on the CHP and the violin plot, we conclude that these components are equally important, as they both appear in interventions with large effects. These components will not be shown (correctly) in the leave-one-out scatter plot and the waterfall plot due to missing comparators.
A limitation of these figures is that they can be applied only in connected networks. It is important to mention that there could be cases in which a component appears effective due to the complex nature of the interventions and the other components alongside which it has been observed. This is the case with all other types of analyses (NMA, CNMA). A large estimate in the CHP does not necessarily indicate that the corresponding component (or component combination) is effective. It might be the case that the underlying component appears in many interventions with small, moderate, and large effect estimates. For example, for the reduction of HbA1c, Figure 6 indicates that component "E" actually works, with a median effect estimate of −0.38. However, from an overall look at the figures, we see that this component is included in most interventions with large, moderate, and small effects. Therefore, the interpretation of the results must be handled with caution, using all the available evidence.
AUTHOR CONTRIBUTIONS Georgios Seitidis conducted the statistical analysis, interpreted the results, developed the software code and the R package, wrote a draft manuscript, edited and revised the article. Sofia Tsokani contributed to the software's code, edited and revised the article. Christos Christogiannis and Katerina-Maria Kontouli contributed to the software's code and edited the article. Alexandros Fyraridis contributed to the software's code. Stavros Nikolakopoulos and Areti Angeliki Veroniki contributed to the software's code, edited and revised the article. Dimitris Mavridis conceptualised and designed the study, edited and revised the article. All authors contributed to the development of the methodology, read and approved the final manuscript.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from COMPAR-EU. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the authors or can be retrieved from https://platform.self-management.eu with the permission of COMPAR-EU.