Linguistic Phylogenies Support Back-Migration from Beringia to Asia

Recent arguments connecting Na-Dene languages of North America with Yeniseian languages of Siberia have been used to assert proof for the origin of Native Americans in central or western Asia. We apply phylogenetic methods to test support for this hypothesis against an alternative hypothesis that Yeniseian represents a back-migration to Asia from a Beringian ancestral population. We coded a linguistic dataset of typological features and used neighbor-joining network algorithms and Bayesian model comparison based on Bayes factors to test the fit between the data and the linguistic phylogenies modeling two dispersal hypotheses. Our results support that a Dene-Yeniseian connection more likely represents radiation out of Beringia with back-migration into central Asia than a migration from central or western Asia to North America.

Adding dummy characters (unobserved site patterns) for division 1
WARNING: There are 32 characters incompatible with the specified coding bias.
These characters will be excluded.
Setting output file names to "/Users/msicoli/Yeniseian-NaDene-Typlogical_noHAX.nex.run<i>.<p| t>"
Exiting data block
Reached end of file

MrBayes > lset nst=6 rates=gamma
Setting Rates to Gamma
Successfully set likelihood model parameters
Adding dummy characters (unobserved site patterns) for division 1
WARNING: There are 32 characters incompatible with the specified coding bias.
These characters will be excluded.

MrBayes > prset brlenspr=clock:uniform
Setting Brlenspr to Clock:Uniform
Successfully set prior model parameters
Adding dummy characters (unobserved site patterns) for division 1
WARNING: There are 32 characters incompatible with the specified coding bias.
These characters will be excluded.

MrBayes > constraint ingroup = 1-37
Defining constraint called 'ingroup'

MrBayes > prset topologypr = constraints(ingroup)
Setting Topologypr to Constraints
Successfully set prior model parameters
Adding dummy characters (unobserved site patterns) for division 1
WARNING: There are 32 characters incompatible with the specified coding bias.
These characters will be excluded.

In a rooted or clock tree, the tree is rooted using the model and not by reference to an outgroup. Each bipartition therefore corresponds to a clade, that is, a group that includes all the descendants of a particular branch in the tree. Taxa that are included in each clade are denoted using '*', and taxa that are not included are denoted using the '.' symbol.
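The '*' and '.' notation can be read mechanically. The Python sketch below is not part of MrBayes; the function name and the example mask are hypothetical, and the taxon labels are simply the first five entries of the taxon list printed in this log. It returns the taxa that a mask places inside a clade.

```python
def clade_members(mask, taxa):
    """Return the taxa marked '*' in a bipartition mask such as '**...'.

    `mask` holds one character per taxon, in the same order as `taxa`:
    '*' means the taxon is inside the clade, '.' means it is outside.
    """
    if len(mask) != len(taxa):
        raise ValueError("mask and taxon list must have the same length")
    if set(mask) - {"*", "."}:
        raise ValueError("mask may contain only '*' and '.'")
    return [taxon for flag, taxon in zip(mask, taxa) if flag == "*"]


# Hypothetical mask over the first five taxon labels from the log.
taxa = ["gwi", "dgr", "scsh", "xsl", "bea"]
print(clade_members("**...", taxa))  # ['gwi', 'dgr']
```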
The output first includes a key to all the bipartitions with frequency larger than or equal to Minpartfreq in at least one run. Minpartfreq is a parameter of the sumt command and is currently set to 0.10. This is followed by a table with statistics for the informative bipartitions (those including at least two taxa), sorted from highest to lowest probability. For each bipartition, the table gives the number of times the partition or split was observed in all runs (#obs) and the posterior probability of the bipartition (Probab.), which is the same as the split frequency. If several runs are summarized, this is followed by the minimum split frequency (Min(s)), the maximum frequency (Max(s)), and the standard deviation of frequencies (Stddev(s)) across runs. The latter value should approach 0 for all bipartitions as MCMC runs converge. This is followed by a table summarizing branch lengths, node heights (if a clock model was used) and relaxed clock parameters (if a relaxed clock model was used). The mean, variance, and 95% credible interval are given for each of these parameters. If several runs are summarized, the potential scale reduction factor (PSRF) is also given; it should approach 1 as runs converge. Node heights will take calibration points into account, if such points were used in the analysis.
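To make the columns of the bipartition table concrete, the following Python sketch computes the same kinds of quantities from per-run counts. It is an illustration rather than MrBayes code: the function name and the example counts are hypothetical, and it assumes that every run contributes the same number of post-burn-in samples and that Stddev(s) is the sample (n-1) standard deviation, which may not match the program's exact convention.

```python
from statistics import stdev


def split_frequency_summary(counts, samples_per_run):
    """Summarize one bipartition across several independent runs.

    counts          -- times the bipartition was sampled in each run
    samples_per_run -- post-burn-in trees per run (assumed equal for all runs)
    """
    freqs = [c / samples_per_run for c in counts]
    total_obs = sum(counts)
    return {
        "obs": total_obs,                                        # #obs
        "probab": total_obs / (samples_per_run * len(counts)),   # Probab.
        "min": min(freqs),                                       # Min(s)
        "max": max(freqs),                                       # Max(s)
        # Needs at least two runs; otherwise no spread can be measured.
        "stddev": stdev(freqs) if len(freqs) > 1 else None,      # Stddev(s)
    }


# Hypothetical example: two runs of 200 sampled trees each.
summary = split_frequency_summary([150, 160], samples_per_run=200)
print(summary["obs"], summary["probab"])
```

With these made-up counts the two runs give split frequencies of 0.75 and 0.80, so the spread is small but not zero; as the runs converge, the per-run frequencies should agree and the standard deviation should shrink toward 0.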
Note that Stddev may be unreliable if the partition is not present in all runs (the last column indicates the number of runs that sampled the partition if more than one run is summarized). The PSRF is not calculated at all if the partition is not present in all runs. The PSRF is also sensitive to small sample sizes, and it should only be considered a rough guide to convergence since some of the assumptions allowing one to interpret it as a true potential scale reduction factor are violated in MrBayes.
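For readers unfamiliar with the PSRF, the sketch below implements the textbook Gelman-Rubin estimator for one scalar parameter sampled in several runs. It is an assumption that this matches what MrBayes reports closely enough for illustration; the function name and the toy chains are hypothetical.

```python
from statistics import mean, variance


def psrf(chains):
    """Textbook Gelman-Rubin potential scale reduction factor.

    chains -- list of equal-length lists, one list of samples per run.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two runs of equal length (>= 2 samples)")
    within = mean([variance(c) for c in chains])          # W: mean within-run variance
    if within == 0:
        raise ValueError("within-run variance is zero; PSRF is undefined")
    between_over_n = variance([mean(c) for c in chains])  # B/n: variance of run means
    pooled = (n - 1) / n * within + between_over_n        # pooled variance estimate
    return (pooled / within) ** 0.5


# Toy chains: runs that disagree give a PSRF well above 1.
print(psrf([[1, 2, 3, 4], [11, 12, 13, 14]]))
# Identical runs of only four samples give a value below 1, which shows
# how strongly the statistic depends on sample size.
print(psrf([[1, 2, 3, 4], [1, 2, 3, 4]]))
```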

List of taxa in bipartitions:
 1 -- gwi
 2 -- dgr
 3 -- scsh
 4 -- xsl
 5 -- bea
 6 -- crx
 7 -- chp
 8 -- txc
 9 -- haa
10 -- ing
11 -- kuu
12 -- hoi
13 -- koy

Summary statistics for informative taxon bipartitions (clades) (saved to file "DY-25Dec-strictNOHAX.tstat"):

Summary statistics for branch and node parameters (saved to file "DY-25Dec-strictNOHAX.vstat"):

Using relative burnin (a fraction of samples discarded).
Summarizing parameters in file DY-25Dec-strictNOHAX.p
Writing summary statistics to file DY-25Dec-strictNOHAX.pstat
Using relative burnin ('relburnin=yes'), discarding the first 25 % of samples

Below is a rough plot of the generation (x-axis) versus the log probability of observing the data (y-axis). You can use this graph to determine what the burn in for your analysis should be. When the log probability starts to plateau you may be at stationarity. Sample trees and parameters after the log probability plateaus. Of course, this is not a guarantee that you are at stationarity. When possible, run multiple analyses starting from different random trees; if the inferences you make for independent analyses are the same, this is reasonable evidence that the chains have converged. You can use MrBayes to run several independent analyses simultaneously. During such a run, MrBayes will monitor the convergence of topologies. After the run has been completed, the 'sumt' and 'sump' functions will provide additional conver-

MrBayes > ssp ngen-100000 diagnfreq=1000 filename=YND-Typ-NoHax-ss
Could not find parameter "ngen-100000"

MrBayes > ssp ngen=100000 diagnfreq=1000 filename=YND-Typ-NoHax-ss
Starting stepping-stone sampling to estimate marginal likelihood.
50 steps will be used with 1500 generations (3 samples) within each step.
Total of 76500 generations (153 samples) will be collected while first 1500 generations (3 samples) will be discarded as initial burnin.
Additionally at the beginning of each step 0 generations (0 samples) will be discarded as burnin.
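
The stepping-stone bookkeeping reported here, and the way two marginal likelihood estimates feed a Bayes factor, can be checked with a few lines of Python. This is a hedged sketch, not part of the analysis: the function names are hypothetical, the sampling frequency of 500 is inferred from "1500 generations (3 samples)", the schedule assumes the initial burn-in is exactly one step long as in this log, and the two log marginal likelihoods are made-up placeholders rather than results.

```python
def stepping_stone_schedule(n_steps, gens_per_step, sample_freq):
    """Return (total generations, total samples) for a stepping-stone run.

    Assumes the initial burn-in has the same length as one step.
    """
    total_gens = (n_steps + 1) * gens_per_step
    return total_gens, total_gens // sample_freq


def log_bayes_factor(ln_ml_a, ln_ml_b):
    """ln BF of model A over model B from two log marginal likelihoods."""
    return ln_ml_a - ln_ml_b


# Values taken from the log: 50 steps, 1500 generations per step.
print(stepping_stone_schedule(50, 1500, 500))  # (76500, 153)

# Placeholder log marginal likelihoods for two competing tree models.
ln_bf = log_bayes_factor(-1234.5, -1240.0)
print(ln_bf, 2 * ln_bf)  # 5.5 11.0
```

A positive ln BF favors the first model and a negative one favors the second; the comparison is only as reliable as the two marginal likelihood estimates themselves.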
