University of Southampton Institutional Repository

AI3SD Video: Lessons learned from generative models of biological sequences

Zelezniak, Aleksej (2021) AI3SD Video: Lessons learned from generative models of biological sequences. Frey, Jeremy G., Kanza, Samantha and Niranjan, Mahesan (eds.) AI 4 Proteins Seminar Series 2021. 14 Apr - 17 Jun 2021. (doi:10.5258/SOTON/P0101).

Record type: Conference or Workshop Item (Other)

Abstract

De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering because of its broad spectrum of technological, scientific and medical applications. However, mapping protein sequence to protein function is currently neither computationally nor experimentally tangible. Here, I will present the recently developed ProteinGAN approach, a self-attention-based variant of the generative adversarial network that is able to ‘learn’ natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino-acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase (MDH) as a template enzyme, we show that 24% (13 out of 55) of the ProteinGAN-generated and experimentally tested sequences are soluble and display MDH catalytic activity under the tested conditions in vitro, including a highly mutated variant with 106 amino-acid substitutions. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse functional proteins within the allowed biological constraints of the sequence space.

The talk is based on recently published work:
Repecka, D., Jauniskis, V., Karpus, L. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat Mach Intell 3, 324–333 (2021). https://doi.org/10.1038/s42256-021-00310-5

Video
AI4Proteins-Seminar-Series-AleksejZelezniak-170621 - Version of Record
Available under License Creative Commons Attribution.
Download (376MB)

More information

Published date: 17 June 2021
Additional Information: Aleksej Zelezniak is a tenured Associate Professor and SciLifeLab fellow at Chalmers University of Technology, Gothenburg, Sweden. He received an MSc degree in Bioinformatics from the Technical University of Denmark and a PhD from the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany, where he developed network-based omics data integration methods for studying metabolic networks. For his postdoctoral training as an EMBO fellow, he joined the lab of Dr Markus Ralser at the University of Cambridge and the Francis Crick Institute, London, developing applications of machine learning to high-throughput mass spectrometry data. Since 2017 he has led an independent research group developing machine learning approaches for de novo protein and DNA design for biotechnology and synthetic biology applications.
Venue - Dates: AI 4 Proteins Seminar Series 2021, 2021-04-14 - 2021-06-17
Keywords: AI, AI3SD Event, Artificial Intelligence, Machine Intelligence, Machine Learning, ML, Proteins

Identifiers

Local EPrints ID: 450161
URI: http://eprints.soton.ac.uk/id/eprint/450161
PURE UUID: b25917bc-b9d5-44ee-a99e-f0a56a544a7e
ORCID for Jeremy G. Frey: orcid.org/0000-0003-0842-4302
ORCID for Samantha Kanza: orcid.org/0000-0002-4831-9489
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 14 Jul 2021 16:33
Last modified: 17 Mar 2024 03:51

Contributors

Author: Aleksej Zelezniak
Editor: Jeremy G. Frey
Editor: Samantha Kanza
Editor: Mahesan Niranjan
