Unbiased inference of the fitness landscape ruggedness from imprecise fitness estimates

Song, Siliang; Zhang, Jianzhi

Work Description

Title: Unbiased inference of the fitness landscape ruggedness from imprecise fitness estimates (Open Access)

Attribute	Value
Methodology	The code processing the data can be found at https://github.com/song88180/fitness-landscape-error. All .pkl files were generated using pickle 4.0 with Python 3.8, Jupyter Notebook 6.3.0, and Anaconda 4.10.3. Python objects (list, NumPy array, or pandas DataFrame) are written to .pkl files by: with open('./path/to/file','wb') as f: pickle.dump(obj, f). Data are loaded from .pkl files into Python by: with open('./path/to/file','rb') as f: obj = pickle.load(f)
Description	Fitness landscapes map genotypes to their corresponding fitness under given environments and allow explaining and predicting evolutionary trajectories. Of particular interest is the landscape ruggedness or the unevenness of the landscape, because it impacts many aspects of evolution such as the likelihood that a population is trapped in a local fitness peak. Although the ruggedness has been inferred from a number of empirically mapped fitness landscapes, it is unclear to what extent this inference is affected by fitness estimation error, which is inevitable in the experimental determination of fitness landscapes. Here we address this question by simulating fitness landscapes under various theoretical models, with or without fitness estimation error. We find that all eight examined measures of landscape ruggedness are overestimated due to imprecise fitness quantification, but different measures are affected to different degrees. We devise a method to use replicate fitness measures to correct this bias and show that our method performs well under realistic conditions. We conclude that previously reported fitness landscape ruggedness is likely upward biased owing to the negligence of fitness estimation error and advise that future fitness landscape mapping should include at least three biological replicates to permit an unbiased inference of the ruggedness.
Creator	Song, Siliang; Zhang, Jianzhi
Depositor	siliangs@umich.edu
Contact information	siliangs@umich.edu
Discipline	Science
Keyword	adaptation; estimation error; evolution; NK model; Rough Mount Fuji model; polynomial model
Resource type	Dataset
Curation notes	On September 29, 2021, an additional contributor was added to the record metadata, and full citations for references were added to the record's readme file.
Last modified	11/26/2022
Published	09/27/2021
DOI	https://doi.org/10.7302/0kzc-az82
License	http://creativecommons.org/licenses/by-nc/4.0/

To Cite this Work:
Song, S., Zhang, J. (2021). Unbiased inference of the fitness landscape ruggedness from imprecise fitness estimates [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/0kzc-az82

Files (Count: 2; Size: 1.78 GB)

Title	Original Upload	Last Modified	File Size	Access
README.txt	2021-09-24	2021-09-30	7.37 KB	Open Access
Data.zip	2021-09-17	2021-09-18	1.78 GB	Open Access

Date: 16 September 2021

Title of related publication: Unbiased inference of the fitness landscape ruggedness from
imprecise fitness estimates

Authors: Siliang Song and Jianzhi Zhang
Contact: Siliang Song siliangs@umich.edu

Abstract:
Fitness landscapes map genotypes to their corresponding fitness under given environments
and allow explaining and predicting evolutionary trajectories. Of particular interest is
the landscape ruggedness or the unevenness of the landscape, because it impacts many
aspects of evolution such as the likelihood that a population is trapped in a local
fitness peak. Although the ruggedness has been inferred from a number of empirically
mapped fitness landscapes, it is unclear to what extent this inference is affected by
fitness estimation error, which is inevitable in the experimental determination of fitness
landscapes. Here we address this question by simulating fitness landscapes under various
theoretical models, with or without fitness estimation error. We find that all eight
examined measures of landscape ruggedness are overestimated due to imprecise fitness
quantification, but different measures are affected to different degrees. We devise a
method to use replicate fitness measures to correct this bias and show that our method
performs well under realistic conditions. We conclude that previously reported fitness
landscape ruggedness is likely upward biased owing to the negligence of fitness estimation
error and advise that future fitness landscape mapping should include at least three
biological replicates to permit an unbiased inference of the ruggedness.
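
The upward bias described above can be illustrated with a toy simulation. The sketch below is not the authors' code: it assumes a hypothetical additive (single-peak) biallelic landscape, independent Gaussian estimation error, and a plain count of local fitness peaks as a stand-in for a ruggedness measure, which may or may not match how the measures in this dataset are computed. All function names and parameter values are illustrative.

```python
import itertools
import random


def count_local_peaks(fitness):
    """Count genotypes that are fitter than all of their one-mutation neighbors."""
    peaks = 0
    for genotype, w in fitness.items():
        neighbors = (
            genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(len(genotype))
        )
        if all(w > fitness[n] for n in neighbors):
            peaks += 1
    return peaks


def additive_landscape(n_sites, rng):
    """Hypothetical smooth landscape: fitness is a sum of positive per-site effects."""
    effects = [rng.uniform(0.5, 1.5) for _ in range(n_sites)]
    return {
        g: sum(e * allele for e, allele in zip(effects, g))
        for g in itertools.product((0, 1), repeat=n_sites)
    }


def add_noise(fitness, sd, rng):
    """Return a copy of the landscape with independent Gaussian error on each genotype."""
    return {g: w + rng.gauss(0.0, sd) for g, w in fitness.items()}


rng = random.Random(0)
true_landscape = additive_landscape(5, rng)
noisy_landscape = add_noise(true_landscape, sd=1.0, rng=rng)

true_peaks = count_local_peaks(true_landscape)    # additive landscape: exactly one peak
noisy_peaks = count_local_peaks(noisy_landscape)  # estimation error can add spurious peaks
print(true_peaks, noisy_peaks)
```

The error-free additive landscape has a single peak by construction, so any extra peaks counted in the noisy copy come from estimation error alone.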

Overview of data:
The data contain raw and simulated fitness landscape data, as well as summary data used for
plotting all figures in the paper.

Data-specific Description:
./2_Effect_of_Measurement_Error/plot_data/*.pkl
Each .pkl file contains summary data about the effect of measurement error for a
specific model of theoretical landscape (NK, Polynomial, RMF), a number of variable
sites (5, 10, 15), and a ruggedness measure (N_max, epi, r_s, open_ratio, E, gamma,
adaptwalk_probs, adptwalk_steps). Files are used to generate Figs. 2, S1, S2, S3.

./3_Ruggedness_Error_Curve/plot_data/*.pkl
Each .pkl file contains summary data about the ruggedness-error curve for
a specific model of theoretical landscape (NK, Polynomial, RMF), a number of variable
sites (5, 10, 15), and a ruggedness measure (N_max, epi, r_s, open_ratio, E, gamma,
adaptwalk_probs, adptwalk_steps). Files are used to generate Figs. 3, S4.

./4_Extrapolation_Evaluation/raw_data/*_raw.pkl
Each .pkl file contains raw results about the prediction of 8 extrapolation methods
for a specific model of theoretical landscape (NK, Polynomial, RMF), a number of variable
sites (5, 10, 15), and a ruggedness measure (N_max, epi, r_s, open_ratio, E, gamma,
adaptwalk_probs, adptwalk_steps).

./4_Extrapolation_Evaluation/extrapolation_model_selection_result.pkl
The file contains summary results about the performance of 8 extrapolation methods,
and is generated by evaluating the prediction results recorded in .pkl files in
./4_Extrapolation_Evaluation/raw_data/. The file is used to generate Figs. 4, S5-11.

./5_Empirical_Extrapolation/SD_seq/SD_seq_arti_data.csv
The file contains raw genotype-expression data of SD sequences in E. coli
(Kuo et al., 2020).

./5_Empirical_Extrapolation/SD_seq/*_plot.pkl
Each .pkl file contains summary data of the extrapolation results on SD sequence
landscapes. Files are used to generate Figs. 5, S12.

./5_Empirical_Extrapolation/trna_Domingo/trna_Domingo_data.csv
The file contains raw genotype-fitness data of a yeast tRNA (Domingo et al., 2018).

./5_Empirical_Extrapolation/trna_Domingo/*_plot.pkl
Each .pkl file contains summary data of the extrapolation results on Domingo et al.'s
tRNA landscapes. Files are used to generate Figs. 5, S12.

./5_Empirical_Extrapolation/trna_Li/All_data_df.pkl
The file contains raw genotype-fitness data of a yeast tRNA (Li et al., 2016).

./5_Empirical_Extrapolation/trna_Li/*_plot.pkl
Each .pkl file contains summary data of the extrapolation results on Li et al.'s tRNA
landscapes. Files are used to generate Figs. 5, S12.

./6_Model_Parameters_Effect/FL_stratified/*_stratified.pkl
Each .pkl file contains parameter-stratified simulated fitness landscape data for one
of three theoretical fitness landscape models (NK, Polynomial, RMF).

./6_Model_Parameters_Effect/plot_df_data/*_plot_df.pkl
Each .pkl file contains ruggedness data for the simulated landscapes in
./6_Model_Parameters_Effect/FL_stratified/
Four ruggedness measures (N_max, epi, r_s, open_ratio) are considered in separate files.
Files are used to generate Fig. S14.

./FL_data_3X10/*_landscape_3X10.pkl
Each .pkl file contains 3X10 ruggedness-stratified simulated fitness landscapes with three
ruggedness levels (low, middle, high). Fitness landscapes in each file are simulated by a
specific theoretical landscape model (NK, Polynomial, RMF) with a specific number of variable
sites (5, 10, 15), and are stratified by a specific ruggedness measure (N_max, epi, r_s,
open_ratio, E, gamma, adaptwalk_probs, adptwalk_steps). Files are used for drawing
ruggedness-error curves (Figs. 3, S4) and for extrapolation evaluation (Figs. 4, S5-11).

./FL_data_100X10/*_landscape_list_100X10.pkl
Each .pkl file contains 100X10 simulated fitness landscapes. Fitness landscapes in each
file are simulated by a specific theoretical landscape model (NK, Polynomial, RMF) with a
specific number of variable sites (5, 10, 15). Files are used for evaluating the effect of
measurement error on ruggedness inference (Figs. 2, S1-3).

./index_file/*.pkl
Pre-calculated index files that help improve speed of ruggedness calculation.
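
To check an extracted copy of Data.zip against the folders listed above, a small inventory helper can be useful. The following is a minimal sketch using only the Python standard library; the function name inventory and the idea of grouping by top-level folder are illustrative choices, not part of the dataset or the authors' code.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count .pkl and .csv files under each top-level folder of the dataset root."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in {".pkl", ".csv"}:
            # First path component below the root, e.g. "index_file".
            top_level = path.relative_to(root).parts[0]
            counts[(top_level, path.suffix)] += 1
    return counts
```

Calling inventory on the directory where the archive was extracted (the location is up to the user) returns a Counter keyed by (folder, suffix), for example ("index_file", ".pkl"), which can be compared against the descriptions in this section.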

Data-specific Methodology:
The code processing the data can be found at https://github.com/song88180/fitness-landscape-error
All .pkl files were generated using pickle 4.0 with Python 3.8, Jupyter Notebook 6.3.0, and Anaconda 4.10.3.
Python objects (list, NumPy array, or pandas DataFrame) are written to .pkl files by:
    import pickle
    with open('./path/to/file','wb') as f:
        pickle.dump(obj, f)
Data are loaded from .pkl files into Python by:
    with open('./path/to/file','rb') as f:
        obj = pickle.load(f)
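
The read and write pattern above can be wrapped in small helpers and checked with a round trip. This is a sketch under stated assumptions: "pickle 4.0" is taken to mean pickle protocol 4 (the default in Python 3.8), and the helper names save_pickle, load_pickle, and describe are hypothetical. A plain list stands in for the real contents; unpickling NumPy arrays or pandas DataFrames additionally requires those packages to be installed.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Write a Python object to a .pkl file using pickle protocol 4."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=4)


def load_pickle(path):
    """Read a Python object back from a .pkl file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj):
    """Return a short summary (type name and length, if any) of a loaded object."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


# Round trip with a stand-in object; the real files hold lists, arrays, or dataframes.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.pkl"
    save_pickle([0.1, 0.2, 0.3], path)
    restored = load_pickle(path)

print(describe(restored))
```

Because pickle.load can execute arbitrary code, .pkl files should only be loaded when they come from a trusted source such as the original deposit.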

Sources referenced:
Kuo, S. T., Jahn, R. L., Cheng, Y. J., Chen, Y. L., Lee, Y. J., Hollfelder, F., ... & Chou, H. H. D. (2020). Global fitness landscapes of the Shine-Dalgarno sequence. Genome Research, 30(5), 711-723.

Domingo, J., Diss, G., & Lehner, B. (2018). Pairwise and higher-order genetic interactions during the evolution of a tRNA. Nature, 558(7708), 117-121.

Li, C., Qian, W., Maclean, C. J., & Zhang, J. (2016). The fitness landscape of a tRNA gene. Science, 352(6287), 837-840.

Use and Access:
These data are made available under a Creative Commons Attribution Non-Commercial license
(CC BY-NC 4.0).

To cite data:

--For the related publication:

Song, S., and J. Zhang (2021) Unbiased inference of the fitness landscape ruggedness from
imprecise fitness estimates. Evolution, in press.

--For the dataset:

Song, S., and J. Zhang (2021) Unbiased inference of the fitness landscape ruggedness from
imprecise fitness estimates [Data set]. University of Michigan Deep Blue Data Repository. https://doi.org/10.7302/0kzc-az82
