methyvim: Targeted, robust, and model-free differential methylation analysis in R [version 1; peer review: 1 approved with reservations, 2 not approved]

We present methyvim, an R package implementing an algorithm for the nonparametric estimation of the effects of exposures on DNA methylation at CpG sites throughout the genome, complete with straightforward statistical inference for such estimates. The approach leverages variable importance measures derived from statistical parameters arising in causal inference, defined in such a manner that they may be used to obtain targeted estimates of the relative importance of individual CpG sites with respect to a binary treatment assigned at the phenotype level, thereby providing a new approach to identifying differentially methylated positions. The procedure implemented is computationally efficient, incorporating a preliminary screening step to isolate a subset of sites for which there is cursory evidence of differential methylation as well as a unique multiple testing correction to control the False Discovery Rate with the same rigor as would be available if all sites were subjected to testing. This novel technique for analysis of differentially methylated positions provides an avenue for incorporating flexible state-of-the-art data-adaptive regression procedures (i.e., machine learning) into the estimation of differential methylation effects without the loss of interpretable statistical inference for the estimated quantity.


Introduction
DNA methylation is a fundamental epigenetic process known to play an important role in the regulation of gene expression. DNA methylation most commonly occurs at CpG sites and involves the addition of a methyl group (CH3) to the fifth carbon of the cytosine ring structure to form 5-methylcytosine. Numerous biological and medical studies have implicated DNA methylation as playing a role in disease and development 1. Perhaps unsurprisingly then, biotechnologies have been developed to rigorously probe the molecular mechanisms of this epigenetic process. Modern assays, like the Illumina Infinium HumanMethylation BeadChip assay, allow for quantitative interrogation of DNA methylation, at single-nucleotide resolution, across a comprehensive set of CpG sites scattered across the genome; moreover, the computational biology community has invested significant effort in the development of tools for properly removing technological effects that may contaminate biological signatures measured by such assays [2, Dedeurwaerder et al. 3]. Despite these advances in both biological and bioinformatic techniques, most statistical methods available for differential analysis of data produced by such assays rely on over-simplified models that do not readily extend to such high-dimensional data structures without restrictive modeling assumptions and the use of inferentially costly hypothesis testing corrections. When these standard assumptions are violated, estimates of the population-level effect of an exposure or treatment may suffer from large bias. What's more, reliance on restrictive and misspecified statistical models naturally leads to biased effect estimates that are not only misleading in assessing effect sizes but also result in false discoveries as these biased estimates are subjected to testing and inferential procedures.
Such predictably unreliable methods serve only to produce findings that are later invalidated by replication studies and add still further complexity to discovering biological targets for potential therapeutics. Data-adaptive estimation procedures that utilize machine learning provide a way to overcome many of the problems common in classical methods, controlling for potential confounding even in high-dimensional settings; however, interpretable statistical inference (i.e., confidence intervals and hypothesis tests) from such data-adaptive estimates is challenging to obtain 4 .
In this paper, we briefly present an alternative to such statistical analysis approaches in the form of a nonparametric estimation procedure that provides simple and readily interpretable statistical inference, discussing at length a recent implementation of the methodology in the methyvim R package. Inspired by recent advances in statistical causal inference and machine learning, we provide a computationally efficient technique for obtaining targeted estimates of nonparametric variable importance measures (VIMs) 5, estimated at a set of pre-screened CpG sites, controlling for the False Discovery Rate (FDR) as if all sites were tested. Under standard assumptions (e.g., identifiability, strong ignorability) 6, targeted minimum loss-based estimators of regular asymptotically linear estimators have sampling distributions that are asymptotically normal, allowing for reliable point estimation and the construction of Wald-style confidence intervals [7, van der Laan and Rose 8]. In the context of DNA methylation studies, we define the counterfactual outcomes under a binary treatment as the observed methylation (whether Beta- or M-) values a CpG site would have if all subjects were administered the treatment and the methylation values a CpG site would have if treatment were withheld from all subjects. Although these counterfactual outcomes are, of course, impossible to observe, they do have statistical analogs that may be reliably estimated (i.e., identified) from observed data under a small number of untestable assumptions 6. We describe an algorithm that incorporates, in its final step, the use of targeted minimum loss-based estimators (TMLE) 9 of a given VIM of interest, though we defer rigorous and detailed descriptions of this aspect of the statistical methodology to work outside the scope of the present manuscript [9, van der Laan and Rose 7, van der Laan and Rose 8].
The proposed methodology assesses the individual importance of a given CpG site, as a proposed measure of differential methylation, by utilizing state-of-the-art machine learning algorithms in deriving targeted estimates and robust inference of a VIM, as considered more broadly for biomarkers in Bembom et al. 10 and Tuglus and van der Laan 11 . In the present work, we focus on the methyvim software package, available through the Bioconductor project [12, Huber et al. 13 ] for the R language and environment for statistical computing 14 , which implements a particular realization of this methodology specifically tailored for the analysis and identification of differentially methylated positions (DMPs).
For an extended discussion of the general framework of targeted minimum loss-based estimation and detailed accounts of how this approach may be brought to bear in developing answers to complex scientific problems through statistical and causal inference, the interested reader is invited to consult van der Laan and Rose 7 and van der Laan and Rose 8 . For a more general introduction to causal inference, Pearl 6 and Hernan and Robins 15 may be of interest.

Implementation
The core functionality of this package is made available via the eponymous methyvim function, which implements a statistical algorithm designed to compute targeted estimates of VIMs, defined in such a way that the VIMs represent parameters of scientific interest in computational biology experiments; moreover, these VIMs are defined such that they may be estimated in a manner that is very nearly assumption-free, that is, within a fully nonparametric statistical model. The statistical algorithm consists of several major steps summarized below. Additional methodological details on the use of targeted minimum loss-based estimation in this problem setting are provided in Supplementary File 1.
1. Pre-screening of genomic sites is used to isolate a subset of sites for which there is cursory evidence of differential methylation. Currently, the available screening approach adapts core routines from the limma R package. Following the style of the function for performing screening via limma, users may write their own screening functions and are invited to contribute such functions to the core software package by opening pull requests at the GitHub repository: https://github.com/nhejazi/methyvim.
2. Nonparametric estimates of VIMs, for the specified target parameter, are computed at each of the CpG sites passing the screening step. The VIMs are defined in such a way that the estimated effect is that of a binary treatment on the methylation status of a target CpG site, controlling for the observed methylation status of the neighbors of that site. Currently, routines are adapted from the tmle R package.
3. Since pre-screening is performed prior to estimating VIMs, we apply the modified marginal Benjamini and Hochberg step-up False Discovery Rate controlling procedure for multi-stage analyses (FDR-MSA), which is well-suited for avoiding false positive discoveries when testing is only performed on a subset of potential targets.
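The multiple testing correction in the third step can be illustrated with a minimal base-R sketch. It assumes that FDR-MSA amounts to applying the Benjamini-Hochberg adjustment to the p-values of the screened sites after treating every site removed by screening as if it had been tested with a p-value of 1. The function name fdr_msa_sketch and the p-values below are hypothetical and are not taken from the package.

```r
# Hypothetical helper (not the package's own function): Benjamini-Hochberg
# adjustment for p-values from screened sites, padding the untested sites
# with p-values of 1 (an assumption of this sketch).
fdr_msa_sketch <- function(pvals, total_obs) {
  stopifnot(total_obs >= length(pvals))
  pvals_all <- c(pvals, rep(1, total_obs - length(pvals)))
  adj_all <- stats::p.adjust(pvals_all, method = "BH")
  adj_all[seq_along(pvals)]  # keep only the sites that were actually tested
}

screened_p <- c(1e-6, 1e-4, 0.003, 0.02, 0.4)  # made-up p-values for 5 screened sites
adj_msa <- fdr_msa_sketch(screened_p, total_obs = 1000)   # as if 1000 sites were tested
adj_naive <- stats::p.adjust(screened_p, method = "BH")   # ignores the screening step
print(cbind(raw = screened_p, naive = adj_naive, msa = adj_msa))
```

Under this assumption the correction is never more liberal than a naive adjustment restricted to the screened sites, which is the sense in which the FDR is controlled "as if all sites were tested".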

Parameters of Interest
For CpG sites that pass the pre-screening step, a user-specified target parameter of interest is estimated independently at each site. In all cases, an estimator of the parameter of interest is constructed via targeted minimum loss-based estimation.
Two popular target causal parameters for discrete-valued treatments or exposures are the average treatment effect (ATE) and the relative risk (RR).
Estimating the VIM corresponding to the parameters above, for discrete-valued treatments or exposures, requires two separate regression steps: one for the treatment mechanism (propensity score) and one for the outcome regression. Technical details on the nature of these regressions are discussed in Hernan and Robins 15, and details for estimating these regressions in the framework of targeted minimum loss-based estimation are discussed in van der Laan and Rose 7.
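To make the role of the two regressions concrete, the following base-R sketch estimates an average treatment effect on simulated data. It is not TMLE and not the package's implementation: it substitutes a simpler augmented inverse probability weighted (AIPW) estimator, uses parametric regressions, and all variable names and simulated values are hypothetical. It is only meant to show how a propensity score fit and an outcome regression fit combine into a point estimate with a Wald-style confidence interval.

```r
set.seed(1)
n <- 2000
W <- rnorm(n)                        # simulated covariate (e.g., methylation at a neighboring site)
A <- rbinom(n, 1, plogis(0.5 * W))   # binary exposure whose probability depends on W
Y <- 0.5 + 0.1 * A + 0.05 * W + rnorm(n, sd = 0.1)  # outcome at the target site; true ATE is 0.1

# Regression 1: treatment mechanism (propensity score), P(A = 1 | W)
g_fit <- glm(A ~ W, family = binomial())
g1 <- predict(g_fit, type = "response")

# Regression 2: outcome regression, E[Y | A, W]
Q_fit <- lm(Y ~ A + W)
Q1 <- predict(Q_fit, newdata = data.frame(A = 1, W = W))  # predictions with A set to 1
Q0 <- predict(Q_fit, newdata = data.frame(A = 0, W = W))  # predictions with A set to 0
QA <- ifelse(A == 1, Q1, Q0)

# AIPW estimate: the mean of the per-subject (uncentered) influence-curve terms
ic_terms <- (A / g1 - (1 - A) / (1 - g1)) * (Y - QA) + Q1 - Q0
ate_hat <- mean(ic_terms)
se_hat <- sd(ic_terms) / sqrt(n)
ci <- ate_hat + c(-1, 1) * qnorm(0.975) * se_hat  # Wald-style 95% confidence interval
print(c(estimate = ate_hat, lower = ci[1], upper = ci[2]))
```

In the package, the same two regressions feed a targeted minimum loss-based estimator instead, and they may be fit with data-adaptive learners rather than the simple parametric models used here.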
Class methytmle
We have adopted a class methytmle to help organize the functionality within this package. The methytmle class builds upon the GenomicRatioSet class provided by the minfi package, so all of the slots of GenomicRatioSet are contained in a methytmle object. The new class introduced in the methyvim package includes several new slots:
• call - the form of the original call to the methyvim function.
• screen_ind - indices identifying CpG sites that pass the screening process.
• clusters - non-unique IDs corresponding to the manner in which sites are treated as neighbors. These are assigned by genomic distance (bp) and respect chromosome boundaries (produced via a call to bumphunter::clusterMaker).
• var_int - the treatment/exposure status for each subject. Currently, these must be binary, due to the definition of the supported targeted parameters.
• param - the name of the target parameter from which the estimated VIMs are defined.
• vim - a table of statistical results obtained from estimating VIMs for each of the CpG sites that pass the screening procedure.
• ic - the measured array values for each of the CpG sites passing the screening, transformed into influence curve space based on the chosen target parameter.
The show method of the methytmle class summarizes a selection of the above information for the user while masking some of the wealth of information given when calling the same method for GenomicRatioSet. All information contained in GenomicRatioSet objects is preserved in methytmle objects, so as to ease interoperability with other differential methylation software for experienced users. We refer the reader to the package vignette, "methyvim: Targeted Data-Adaptive Estimation and Inference for Differential Methylation Analysis," included in any distribution of the software package, for further details.
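The neighbor assignment stored in the clusters slot can be pictured with a small base-R sketch of distance-based clustering. This is an illustration only and not the bumphunter::clusterMaker code: the function name cluster_by_gap, the 1000 bp gap threshold, and the toy coordinates are all assumptions made for the example.

```r
# Hypothetical helper: assign cluster IDs so that consecutive sites on the same
# chromosome share an ID whenever they lie within max_gap base pairs of each other.
cluster_by_gap <- function(chr, pos, max_gap = 1000) {
  stopifnot(length(chr) == length(pos), length(pos) > 0)
  ord <- order(chr, pos)           # sort by chromosome, then by position
  chr_s <- chr[ord]
  pos_s <- pos[ord]
  # a new cluster starts at the first site, at a chromosome change, or after a large gap
  new_cluster <- c(TRUE, chr_s[-1] != chr_s[-length(chr_s)] | diff(pos_s) > max_gap)
  ids <- integer(length(pos))
  ids[ord] <- cumsum(new_cluster)  # map IDs back to the original site order
  ids
}

chr <- c("chr1", "chr1", "chr1", "chr2", "chr2")  # toy chromosome labels
pos <- c(100, 600, 5000, 650, 700)                # toy genomic positions (bp)
cluster_ids <- cluster_by_gap(chr, pos)
print(cluster_ids)  # 1 1 2 3 3
```

Sites sharing an ID are treated as neighbors, which is why the IDs are described as non-unique, and the chromosome check prevents two sites with nearby coordinates on different chromosomes from being grouped together.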
Operation
A standard computer with the latest version of R and Bioconductor 3.6 installed will handle applications of the methyvim package.

Use cases
To examine the practical applications and the full set of utilities of the methyvim package, we will use a publicly available example data set produced by the Illumina 450K array, from the minfiData R package, accessible via the Bioconductor project at https://doi.org/doi:10.18129/B9.bioc.minfiData.
Preliminaries: Setting up the data
We begin by loading the package and the data set. After loading the data, which comes in the form of a raw MethylSet object, we perform some further processing by mapping to the genome (with mapToGenome) and converting the values from the methylated and unmethylated channels to Beta-values (via ratioConvert). These two steps together produce an object of class GenomicRatioSet, provided by the minfi package.
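The preprocessing described above might look like the following sketch. It assumes that the Bioconductor packages minfi and minfiData are installed and that the raw MethylSet shipped with minfiData is the object named MsetEx; the guard around the code is an addition for this example so that it does nothing when the data package is unavailable.

```r
# Assumes minfi and minfiData (Bioconductor) are installed; skipped otherwise.
if (requireNamespace("minfiData", quietly = TRUE)) {
  suppressMessages(library(minfi))
  suppressMessages(library(minfiData))
  data(MsetEx)                               # example raw MethylSet (assumed object name)
  mset <- mapToGenome(MsetEx)                # map probes to genomic coordinates
  grs <- ratioConvert(mset, what = "beta")   # convert channel intensities to Beta-values
  print(class(grs))                          # expected to be a GenomicRatioSet
}
```

The resulting grs object is the input passed to methyvim through the data_grs argument in the analysis that follows.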

Differential Methylation Analysis
For this example analysis, we'll treat the condition of the patients as the exposure/treatment variable of interest. The methyvim function requires that this variable either be numeric or easily coercible to numeric. To facilitate this, we'll simply convert the covariate (currently a character):
var_int <- (as.numeric(as.factor(colData(grs)$status)) - 1)
n.b., the re-coding process results in "normal" patients being assigned a value of 1 and cancer patients a 0. Now, we are ready to analyze the effects of cancer status on DNA methylation using this data set. We proceed as follows with a targeted minimum loss-based estimate of the Average Treatment Effect.
methyvim_cancer_ate <- methyvim(data_grs = grs, var_int = var_int, vim = "ate", type = "Beta", filter = "limma", filter_cutoff = 0.20, obs_per_covar = 2, parallel = FALSE, sites_comp = 250, tmle_type = "glm")
## Loading required package: nnls
Note that we set the obs_per_covar argument to a relatively low value (just 2, even though the recommended value, and default, is 20) for the purposes of this example as the sample size is only 10. We do this only to exemplify the estimation procedure and it is important to point out that such low values for obs_per_covar will compromise the quality of inference obtained because this setting directly affects the definition of the target parameter.
Further, note that here we apply the glm flavor of the tmle_type argument, which produces faster results by fitting models for the propensity score and outcome regressions using a limited number of parametric models. By contrast, the sl (for "Super Learning") flavor fits these two regressions using highly nonparametric and data-adaptive procedures (i.e., via machine learning). Obtaining the estimates via GLMs results in each of the regression steps being less robust than if nonparametric regressions were used.
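The direction of the re-coding of the status covariate noted earlier follows from how R orders factor levels, and can be checked in isolation. The labels below are illustrative stand-ins for colData(grs)$status, and the variable names are hypothetical.

```r
# Illustrative status labels standing in for colData(grs)$status
status <- c("normal", "cancer", "normal", "cancer", "normal", "cancer")

# Factor levels are sorted alphabetically ("cancer" before "normal"),
# so subtracting 1 from the integer codes maps cancer to 0 and normal to 1.
var_int_demo <- as.numeric(as.factor(status)) - 1
recode_map <- tapply(var_int_demo, status, unique)
print(recode_map)
```

Because "normal" is coded as 1, an estimated effect is expressed relative to the cancer samples, which matters when interpreting the sign of the estimates.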
We can view a table of results by examining the vim slot of the resulting methytmle object. Finally, we may compute FDR-corrected p-values by applying a modified procedure for controlling the False Discovery Rate for multi-stage analyses (FDR-MSA) 16. We do this by simply applying the fdr_msa function.
fdr_p <- fdr_msa(pvals = vim(methyvim_cancer_ate)$pval, total_obs = nrow(methyvim_cancer_ate))
Having explored the results of our analysis numerically, we now proceed to use the visualization tools provided with the methyvim R package to further enhance our understanding of the results.

Visualization of results
While making allowance for users to explore the full set of results produced by the estimation procedure (by way of exposing these directly to the user), the methyvim package also provides three (3) visualization utilities that produce plots commonly used in examining the results of differential methylation analyses.
A simple call to plot produces side-by-side histograms of the raw p-values computed as part of the estimation process and the corrected p-values obtained from using the FDR-MSA procedure.
plot(methyvim_cancer_ate, type = "raw_pvals")
plot(methyvim_cancer_ate, type = "fdr_pvals")
Remark: The plots displayed above may also be generated as side-by-side histograms in a single plot object. This is the default for the plot method and may easily be invoked by specifying no additional arguments to the plot function, unlike in the above.
From the code snippets displayed above, Figure 1 displays a histogram of raw (or uncorrected) p-values from hypothesis testing of the statistical parameter corresponding to the average treatment effect while Figure 2 displays a histogram of p-values from the same set of hypothesis tests, correcting for multiple testing using the FDR-MSA method 16. While histograms of the p-values may be generally useful in inspecting the results of the estimation procedure, a more common plot used in examining the results of differential methylation procedures is the volcano plot, which plots the parameter estimate along the x-axis and −log10(p-value) along the y-axis. We implement such a plot in the methyvolc function:
methyvolc(methyvim_cancer_ate)
Figure 3 above displays a volcano plot of the raw (or unadjusted) p-values against estimates of the effect of interest (by default, the average treatment effect in methyvim). The purpose of such a plot is to ensure that very low (possibly statistically significant) p-values do not arise from cases of low variance. This appears to be the case in the plot above (notice that most parameter estimates are near zero, even in cases where the raw p-values are quite low).
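The construction of a volcano plot, and the check it supports, can be sketched in base R on simulated values. Nothing here comes from the package: the estimates, standard errors, and the cutoffs used to flag sites are all made up for illustration, and the plot is drawn to a null device so that the sketch also runs non-interactively.

```r
set.seed(42)
n_sites <- 250
est <- rnorm(n_sites, mean = 0, sd = 0.02)     # simulated effect estimates
se <- runif(n_sites, min = 0.002, max = 0.02)  # simulated standard errors
pval <- 2 * pnorm(-abs(est / se))              # two-sided Wald p-values
neg_log10_p <- -log10(pval)

# Flag sites whose p-value is small even though the estimated effect is negligible,
# i.e., small p-values driven by low variance (both cutoffs are arbitrary).
suspicious <- pval < 0.01 & abs(est) < 0.01

grDevices::pdf(NULL)  # null graphics device: no file is written
plot(est, neg_log10_p, pch = 19,
     col = ifelse(suspicious, "red", "grey40"),
     xlab = "Parameter estimate", ylab = "-log10(p-value)")
invisible(grDevices::dev.off())
print(sum(suspicious))
```

Points that sit high on the y-axis while remaining close to zero on the x-axis are the ones that deserve scrutiny before being reported as differentially methylated.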
Yet another popular plot for visualizing effects in such settings is the heatmap, which plots estimates of the raw methylation effects (as measured by the assay) across subjects using a heat gradient. We implement this in the methyheat function:
methyheat(methyvim_cancer_ate, smooth.heat = TRUE, left.label = "none")
Remark: Figure 4 displays the result of invoking methyheat in this manner, which produces a plot of the top sites (25, by default) based on the raw p-value, using the raw methylation measures in the plot. This uses the exceptional superheat R package 17, to which we can easily pass additional parameters. In particular, we hide the CpG site labels that would appear by default on the left of the heatmap (by setting left.label = "none") to emphasize that this is only an example and not a scientific discovery.

Summary
Here we introduce the R package methyvim, an implementation of a general algorithm for differential methylation analysis that allows for recent advances in causal inference and machine learning to be leveraged in computational biology settings. The estimation procedure produces straightforward statistical inference and takes great care to ensure computational efficiency of the technique for obtaining targeted estimates of nonparametric variable importance measures. A detailed account of the statistical procedure, including an overview of targeted minimum loss-based estimation, is made available in Supplementary File 1. The software package includes techniques for pre-screening a set of CpG sites, controlling for the False Discovery Rate as if all sites were tested, and for visualizing the results of the analyses in a variety of ways. The anatomy of the software package is dissected and the design described in detail. The methyvim R package is available via the Bioconductor project.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Open Peer Review
The manuscript by Hejazi et al. describes a new R package, methyvim, which utilizes targeted loss-based estimation (TMLE) and causal inference approaches to estimate and conduct inference on the effects of exposures on DNA methylation. Overall, I found the machine learning/causal inference technical details to be sound and the arguments for using this TMLE based algorithm to be compelling; namely that it is a nonparametric, flexible approach that is computationally efficient and has asymptotic properties that allow straightforward point and interval estimation. The visualization tools are also very nice. However, for the reader who is unfamiliar with causal inference, generally, or TMLE, specifically, the manuscript can be a bit confusing. I also suggest that more guidance be provided to the reader on interpretation of results rather than focusing purely on what is outputted by the software. Some of the paper resembles software documentation (a manual on how to implement) rather than a traditional manuscript that describes a new method with intuition. A few comments follow that I hope the authors will find helpful in revising their manuscript:
1. It would be helpful to have more description on variable importance measures (VIMs). VIMs are not commonly used measures by statistical geneticists or other researchers; however, they are integral to understanding the approach underlying methyvim. VIMs are described only very briefly in the Introduction.
2. The Methods section is a bit choppy and could use better organization. Perhaps there could first be a section laying out the causal question and statistical approach, followed by implementation and operation.
3. For those not familiar with causal inference terms, the definition of the ATE could be simplified by simply using expectations rather than psi.
4. Importantly, it would be very helpful to provide comparisons of methyvim to other commonly used methods and software used for DNA methylation studies. This would provide more compelling evidence for why researchers should use methyvim. Providing both a comparison of outputs and interpretation of results across methods would be useful for users.

Review of Software Implementation:
1. The package methyvimData is required to run the code example in the documentation for the methyvim function. If the package is not installed, the user gets an error. Please either make this package a dependency so that it will necessarily be installed, add a test to see whether the package is installed, and if not, install the package before loading, or at the very least add comments to the code example alerting users to the requirement that the package be installed.
2. When trying to use the methyvolc function for a methytmle object with vim = "rr", I get the following error:
Error in param > param_bound : comparison (6) is possible only for atomic and list types
3. Could an example be provided in the documentation for use with a continuous treatment?
4. When I run the code:
methyvim_cancer_ate <- methyvim(data_grs = grs, var_int = var_int, vim = "ate", type = "Beta", filter = "limma", filter_cutoff = 0.20, obs_per_covar = 2, parallel = FALSE, sites_comp = 250, tmle_type = "glm")
as provided in the vignette, I get the following error repeated 13 times:
Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument
5. I get slightly different results from those in the vignette. For example, I get the following row as part of the output:
cg01782097 -9.229141e-03 0.0010232901 0.0112757216 2.736161e-05 8.449024e-01
6. Does the code have a stochastic component? If so, a seed should be set at the start of the vignette to ensure users get the same results as those in the vignette.
7. In the plot produced by methyvolc, what is the "0" color label referring to? Would this ever have multiple values? Could a label for the legend be provided?
Is the rationale for developing the new software tool clearly explained? Yes

Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Partly

The paper "methyvim: Targeted, robust, and model-free differential methylation analysis in R" authored by Hejazi NS, Phillips RV, Hubbard AE and van der Laan MJ, presents a model-free method for determining differential methylation in the R programming environment, leveraging methods used in machine learning research to determine variable importance. The paper topic is very interesting and potentially quite valuable to the epigenetics and bioinformatics communities, especially given the similarity of methods currently utilised for differential methylation analysis.
The manuscript is very good but we had a number of concerns, comments and suggestions regarding how the method is presented or can be improved:

Readability:
The real value of F1000 papers is the freedom that the journal gives the author to extensively explain individual steps in an analysis. The way the manuscript is currently set out is fine for a standard paper, but it would be great if there were direct examples explained where concepts are introduced (i.e. in the "implementation" section), rather than having examples later in the "Operation/Use Cases" section. This would vastly simplify the text and enable each function and parameter to be comprehensively explained in the context of the method presented.
○ Additionally, the authors do a great job at giving background to variable importance measurements, however some of the explanations could be simplified for readability. Introducing background to a concept such as this can be challenging so the authors have done quite well at explaining this. I am definitely not an expert in this approach and even after re-reading the introduction it was difficult to understand without additional materials.

Lack of performance comparisons:
Given the departure from the current consensus (perhaps perceived consensus) of using model-based statistical tests to determine differential methylation, a comparison of methyvim to other algorithms or methods would enable the reader to accurately gauge the differences between the approaches. For example, what are the differences between parametric and non-parametric approaches and are there any additional advantages from using this approach compared to `methylkit`?

Pre-screening:
Overall there needs to be a greater explanation regarding the pre-screening of genomic sites. This would allow the reader to gauge the best parameters for this non-parametric approach. For example, the number of CpG sites was reduced from 485,512 sites to under 500 sites based on Figure 1 (p-values histogram) and Figure 2. What was the number of significant p-values in the first report of these public data (or if you run with any other existing package)? Will there be an issue with too much data trimming for the sake of having less computational demand?

Scalability:
The example described in this manuscript is from a 450K array, which makes for an easy example that is widely used in human epigenetics research. How does this approach scale to larger numbers of sites or samples? For example, if you used >1 million sites, do you get a comparable number of trimmed sites? Can this be implemented in whole genome bisulfite sequencing (WGBS) analyses, which is likely to have significantly increased numbers of sites?
○ Given the size of current DNA methylation studies, in human or other research areas, perhaps a larger dataset would be helpful to include.

Non-CpG methylation contexts:
How would this `methyvim` package treat data for non-CpG contexts? For example, CHH methylation contexts have a methylation ratio which is much lower than the CpG context in humans, and therefore may be difficult for `methyvim` to filter. Illumina EPIC arrays (~850K) have a mixture of these sites on the array, so does that mixture create issues? Some level of work looking at that would be really important in this manuscript.

Other comments:
Some explanation of the results in methyvim_cancer_ate would be nice. For example, what was the maximum distance of two neighbouring CpG sites to be called neighbours? Are these results sorted by their coordinates (it might be easy to understand neighbouring effects if sorted)? Has pre-screening affected these max_cor_neighbors statistics?
○ What would the sample classification look like using this non-parametric approach (e.g. PCA-plot)?
○ In general, a greater explanation of results from each function would be useful.

Is the rationale for developing the new software tool clearly explained? Partly
Is the description of the software tool technically sound? Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? No
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? No

In light of these 3 major concerns, each discussed further below, I find it very difficult to assess whether methyvim is software I would be interested in using or recommending to someone analysing DNA methylation data. Consequently, I cannot approve this article at this time.

Main Concerns:
Lack of a simple explanation and justification for the statistical procedures
I came to this paper being unfamiliar with a statistical technique central to this paper, namely, 'targeted minimum loss estimators' (TMLE). Unfortunately, after carefully reading the paper several times, it's still not clear to me the purported benefits and limitations of this method nor its appropriateness and utility for analysing DNA methylation data.
Although there are several references to books and papers that cover the "general framework of targeted minimum loss-based estimation and [detail] accounts of how this approach may be brought to bear in developing answers to complex scientific problems through statistical and causal inference", there is no simple explanation of TMLE for the reader who might care about it solely in the context of the current paper: how it is being used to identify differentially methylated loci and how this differs from existing methods.

Choice of example dataset and analysis
Please use a dataset that is suitable to demonstrate the utility and appropriateness of methyvim.
The example dataset comes from the minfiData R/Bioconductor package and contains matched tumour-normal samples from 3 donors (n = 6, mistakenly referred to as n = 10 on p6). The authors admit that this is a small sample size for their method and that this "compromises the quality of inference obtained" by their method. Consequently, it seems unlikely that methyvim is going to produce new insights on this dataset nor will it exemplify the purported utility and appropriateness of the methods implemented in methyvim.

Lack of comparison to existing methods
The results obtained by methyvim need to be compared to those obtained by one of the existing tools (that are claimed to have poor performance for the types of problems methyvim seeks to address). In particular, as a reader, I was looking for the types of differentially methylated sites that this method detects that others might not and vice versa.
To satisfy my curiosity, I applied the very simple minfi::dmpFinder() to the example dataset and took the top-250 CpGs (the same number of CpGs as reported by the example code in the paper). I then plotted methyvim's and minfi::dmpFinder()'s top-250 CpGs using minfi::plotCpg() to visually assess the quality of the differential methylation analysis. The top-250 CpGs from minfi::dmpFinder() look like real differentially methylated CpGs: large between-condition mean differences and small within-condition variances. In contrast, many of the top CpGs identified by methyvim do not look like real differentially methylated CpGs: small between-condition mean differences and/or large within-condition variances. This is exemplified by the top CpG called by methyvim (cg15703790, P = 6 x 10^-33, adjusted-P = 3 x 10^-27), which is not called as a differentially methylated CpG by minfi::dmpFinder() (P = 0.11, Q = 0.26) and when plotted does not appear to be a real differentially methylated CpG. The code to run this comparison and the results figures are available in Result 1 (the R file containing code to generate Result 2 and Result 3), Result 2 (the methyvim output from Result 1) and Result 3 (the minfi output from Result 1) (produced using methyvim v1.3.1).

Minor Suggestions:
Some of these suggestions may be difficult to incorporate (even if desirable) while incorporating backwards compatibility.

Design of the methytmle class
○ The clusters slot could perhaps be a metadata column on the rowRanges slot, accessible via the rowData() getter/setter. That way when the object is subsetted the clusters would automatically get properly subsetted (currently the clusters slot doesn't behave when the object is subset with [).
○ The screen_ind slot could also be a metadata column on the rowRanges slot but would need to be a TRUE/FALSE vector (rather than a numeric vector) with the same length as the number of rows of the object.
○ For both of the above, you could use how spike-in genes are handled in the SingleCellExperiment class for inspiration.
○ The var_int slot seems like it should just be part of the colData slot rather than its own slot. Again, this would ensure proper subsetting behaviour when the object is subset with [.
○ The call, param and vim slots could perhaps be elements of the metadata slot.
○ The vim slot could be a DataFrame rather than a data.frame. The main advantage is that then the show,methytmle-method wouldn't print out as much output as it currently does (the obvious alternative would be to alter the show() method to prevent so much output).

Constructor function
.methytmle(): A period at the start of a function name typically indicates that the function is for internal use and not exported; see https://bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/Extensions.html#defining-the-class-and-its-constructor

Plots
The colour scale and legend on the histograms (Figures 1-2) and volcano plots (