Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Authors

  • Jiahao Qiu, Princeton University
  • Hui Yuan, Princeton University
  • Jinghong Zhang, University of California San Diego
  • Wentao Chen, MLAB Biosciences Inc
  • Huazheng Wang, Oregon State University
  • Mengdi Wang, Princeton University

DOI:

https://doi.org/10.1609/aaai.v38i13.29386

Keywords:

ML: Online Learning & Bandits, APP: Natural Sciences

Abstract

Modern biotechnologies allow new proteins to be synthesized and their functions measured at scale, yet efficiently exploring and engineering a protein remains a daunting task because the sequence space of any given protein is vast. Protein engineering is typically conducted as an iterative process: adding mutations to the wild-type or lead sequences, recombining mutations, and running new rounds of screening. To make this process more efficient, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence under the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide a theoretical analysis and a Bayesian regret bound, demonstrating that the method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experimental results demonstrate that the algorithm is sample-efficient, promotes diversity, and finds top designs using reasonably small mutation counts.
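
The loop described in the abstract (expand a tree of mutants from an initial sequence, let a bandit model choose which candidates to screen, then update the model) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a one-hot featurization, a Bayesian linear model (a Gaussian Process with a linear kernel) in place of the paper's model, Thompson sampling as the bandit rule, and a toy simulated screen; the names `toy_oracle`, `tree_search_bandit`, and all parameters are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Flattened one-hot encoding of a sequence (assumed featurization)."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x.ravel()


def mutate(seq, rng):
    """Child node: the parent with one random single-site substitution."""
    pos = int(rng.integers(len(seq)))
    choices = [a for a in ALPHABET if a != seq[pos]]
    new_aa = choices[int(rng.integers(len(choices)))]
    return seq[:pos] + new_aa + seq[pos + 1:]


def toy_oracle(seq, target="ACDEFGHI"):
    """Hypothetical simulated screen: fraction of sites matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def tree_search_bandit(root, oracle, rounds=5, batch=8, children=4,
                       noise=0.1, seed=0):
    """Return a dict mapping every screened sequence to its measured fitness."""
    rng = np.random.default_rng(seed)
    d = len(root) * len(ALPHABET)
    observed = {root: oracle(root)}
    frontier = [root]  # nodes that will be expanded in the next round
    for _ in range(rounds):
        # Posterior of a Bayesian linear model with a unit Gaussian prior,
        # refit from scratch on all observations (simple, not efficient).
        X = np.array([one_hot(s) for s in observed])
        y = np.array(list(observed.values()))
        precision = np.eye(d) + X.T @ X / noise**2
        mean = np.linalg.solve(precision, X.T @ y / noise**2)
        # Thompson sampling: draw theta ~ N(mean, precision^-1) via Cholesky.
        chol = np.linalg.cholesky(precision)
        theta = mean + np.linalg.solve(chol.T, rng.standard_normal(d))
        # Expand the tree: mutate each frontier node, drop screened sequences.
        proposals = (mutate(p, rng) for p in frontier for _ in range(children))
        candidates = list(dict.fromkeys(c for c in proposals if c not in observed))
        # Screen the candidates that score highest under the sampled model.
        candidates.sort(key=lambda s: float(one_hot(s) @ theta), reverse=True)
        for s in candidates[:batch]:
            observed[s] = oracle(s)
        # The best sequences measured so far become the next frontier.
        frontier = sorted(observed, key=observed.get, reverse=True)[:batch]
    return observed
```

Calling `tree_search_bandit("AAAAAAAA", toy_oracle)` returns every screened sequence with its simulated fitness; `max(result, key=result.get)` then gives the best design found. Refitting the posterior each round keeps the sketch short; an incremental update would be the natural choice at larger scale.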

Published

2024-03-24

How to Cite

Qiu, J., Yuan, H., Zhang, J., Chen, W., Wang, H., & Wang, M. (2024). Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 38(13), 14686-14694. https://doi.org/10.1609/aaai.v38i13.29386

Section

AAAI Technical Track on Machine Learning IV