Minimal-assumption inference from population-genomic data

  1. Daniel B Weissman (corresponding author)
  2. Oskar Hallatschek (corresponding author)
  1. Emory University, United States
  2. University of California, Berkeley, United States

Decision letter

  1. Magnus Nordborg
    Reviewing Editor; Vienna Biocenter, Austria

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Minimal-assumption inference from population-genomic data" for consideration by eLife. Your article has been favorably evaluated by Diethard Tautz (Senior Editor) and three reviewers, one of whom, Magnus Nordborg (Reviewer #1), is a member of our Board of Reviewing Editors. The following individuals involved in review of your submission have agreed to reveal their identity: Stephan Schiffels (Reviewer #2); Paul Marjoram (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Polymorphism data contains information about the evolutionary history of the population, and can be used for inference about the process that gave rise to the data. In the era of cheap genome sequencing, this is of great interest. However, generally speaking, the polymorphism data reflects an underlying "ancestral recombination graph", which, in itself, is a product of the evolutionary process. If we could infer this graph, we would not need the polymorphism data. This theoretical paper describes a method for extracting information about this graph under a fairly general model – information that can then be used for further dissection of the evolutionary process. As is demonstrated in the paper, this can be massively more efficient than trying to infer the details of this process directly from the polymorphism data.

Essential revisions:

The analysis of real data and its presentation could be improved, which would help readers understand the method better. For example, Figure 4 shows that MAGIC apparently doesn't improve population size estimates in Yoruba over MSMC, even though the MSMC results are based on single individuals while MAGIC analyses all 9 simultaneously. At the same time, when estimating tip branch lengths, Figure 4 (right hand side) shows impressively how MAGIC's estimates contradict the Ne model inferred from pairwise coalescence times by both MSMC and MAGIC, thus perhaps revealing that the model is not good. This advantage should be made clearer, and it may also be useful to point out potential ways forward. For example, might joint analysis of pairwise coalescence times and tip branch lengths suggest better models, or generally improve estimates for certain parameter values?

This would emphasize MAGIC's utility as a flexible analysis tool that can handle large data sets.

https://doi.org/10.7554/eLife.24836.013

Author response

Essential revisions:

The analysis of real data and its presentation could be improved, which would help readers understand the method better.

For the analysis, the switch to the piecewise-exponential form has slightly improved the performance, and the additional curves in Figure 4 hopefully give a somewhat better sense of the levels and forms of noise and bias in the method.

For example, Figure 4 shows that MAGIC apparently doesn't improve population size estimates in Yoruba over MSMC, even though the MSMC results are based on single individuals while MAGIC analyses all 9 simultaneously.

Yes, this is an important point. We have added it in the first paragraph of the subsection “Human data”.

At the same time, when estimating tip branch lengths, Figure 4 (right hand side) shows impressively how MAGIC's estimates contradict the Ne model inferred from pairwise coalescence times by both MSMC and MAGIC, thus perhaps revealing that the model is not good. This advantage should be made clearer, and it may also be useful to point out potential ways forward. For example, might joint analysis of pairwise coalescence times and tip branch lengths suggest better models, or generally improve estimates for certain parameter values?

This would emphasize MAGIC's utility as a flexible analysis tool that can handle large data sets.

We have only added a few lines of text to this section and the Discussion, but we hope that they help make things a little clearer. The key point, made at the end of the subsection “Human data”, is that we think MAGIC can be used with ad hoc ABC to match multiple inferred branch length distributions, or to check the results of multiple stand-alone inference methods. We have also added more on this point in the second paragraph of the subsection “Approach” and in the fourth paragraph of the Discussion.
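As a rough illustration of what "ad hoc ABC" matching of several branch length summaries could look like, the following is a minimal sketch and not the authors' procedure. It assumes a constant-size Kingman coalescent, uses the mean pairwise coalescence time and the mean tip branch length as the two summaries, and generates the "observed" values by simulation rather than from MAGIC output. All function names, the uniform prior, and the tolerance are hypothetical choices made for the example.

```python
import random


def simulate_tree_stats(n, N, rng):
    """Simulate one Kingman coalescent tree for n samples.

    Assumption: constant population size, pairwise coalescence rate 1/N.
    Returns (coalescence time of samples 0 and 1, mean tip branch length).
    """
    lineages = [{i} for i in range(n)]  # each lineage holds its descendant leaves
    t = 0.0
    tip_len = [0.0] * n
    pair_time = None
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence among k lineages.
        t += rng.expovariate(k * (k - 1) / 2 / N)
        a, b = rng.sample(range(k), 2)
        for idx in (a, b):
            if len(lineages[idx]) == 1:
                # A tip branch ends at the first coalescence of its leaf.
                (leaf,) = lineages[idx]
                tip_len[leaf] = t
        merged = lineages[a] | lineages[b]
        if pair_time is None and 0 in merged and 1 in merged:
            pair_time = t
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)]
        lineages.append(merged)
    return pair_time, sum(tip_len) / n


def mean_summaries(N, n, reps, rng):
    """Average both summaries over reps independent trees."""
    pair_sum = tip_sum = 0.0
    for _ in range(reps):
        pair_t, tip = simulate_tree_stats(n, N, rng)
        pair_sum += pair_t
        tip_sum += tip
    return pair_sum / reps, tip_sum / reps


def abc_rejection(observed, n, reps, draws, prior, eps, rng):
    """Keep prior draws of N whose simulated summaries both lie
    within relative distance eps of the observed summaries."""
    accepted = []
    for _ in range(draws):
        N = rng.uniform(*prior)  # hypothetical uniform prior on N
        sim = mean_summaries(N, n, reps, rng)
        dist = max(abs(s - o) / o for s, o in zip(sim, observed))
        if dist < eps:
            accepted.append(N)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for inferred summaries: simulated at a "true" N of 1000.
    observed = mean_summaries(1000.0, 9, 200, rng)
    posterior = abc_rejection(observed, 9, 200, 300, (200.0, 3000.0), 0.2, rng)
    print(len(posterior), sum(posterior) / max(len(posterior), 1))
```

Requiring both summaries to match at once is the point of the sketch: a demographic model that fits pairwise coalescence times but not tip branch lengths would leave few or no accepted draws, which is one way the discrepancy noted by the reviewers would show up.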

https://doi.org/10.7554/eLife.24836.014

Cite this article

Daniel B Weissman, Oskar Hallatschek (2017) Minimal-assumption inference from population-genomic data. eLife 6:e24836. https://doi.org/10.7554/eLife.24836
