research-article

The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

Authors:

Valentin Macé, InstaDeep, Paris, France; Sorbonne University, Paris, France (ORCID: https://orcid.org/0009-0002-4185-0936)

Raphaël Boige, InstaDeep, Paris, France (ORCID: https://orcid.org/0009-0008-2571-3978)

Felix Chalumeau, InstaDeep, Paris, France (ORCID: https://orcid.org/0000-0001-9476-2900)

Thomas Pierrot, InstaDeep, Boston, United States of America (ORCID: https://orcid.org/0000-0002-5227-6194)

Guillaume Richard, InstaDeep, Paris, France (ORCID: https://orcid.org/0009-0001-2738-1603)

Nicolas Perrin-Gilbert, Sorbonne University, Paris, France (ORCID: https://orcid.org/0000-0001-8626-1938)

GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference, July 2023, pages 1221–1229. https://doi.org/10.1145/3583131.3590433

Published: 12 July 2023

ABSTRACT

In the context of neuroevolution, Quality-Diversity algorithms have proven effective at generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal once such a repertoire exists is to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability: multiple episodes under slightly different conditions often result in very different behaviors. Second, because the repertoire is discrete, solutions vary discontinuously across the behavior space. Here we present a new approach to behavior-conditioned trajectory generation based on two mechanisms. First, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space. Second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which is trained on a dataset generated by policies from an ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.
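To make the ME-LS selection idea more concrete, here is a minimal Python sketch of a low-spread insertion rule for a MAP-Elites grid. It rests on assumptions that the abstract does not state: each policy is evaluated over several episodes, its descriptor is the mean of the per-episode descriptors, "spread" is measured as the mean pairwise Euclidean distance between those descriptors, the grid is uniform, and a candidate replaces the incumbent of its cell only if it has a higher mean fitness and no greater spread. The names `Elite`, `descriptor_spread`, `cell_index` and `try_insert` are hypothetical, and the paper's exact rule may differ.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Descriptor = Tuple[float, ...]
Cell = Tuple[int, ...]


@dataclass
class Elite:
    params: object          # policy parameters (opaque in this sketch)
    fitness: float          # mean return over the repeated episodes
    descriptor: Descriptor  # mean behavior descriptor over the episodes
    spread: float           # mean pairwise distance between episode descriptors


def mean_descriptor(descs: Sequence[Descriptor]) -> Descriptor:
    """Average the per-episode descriptors coordinate by coordinate."""
    n = len(descs)
    return tuple(sum(coords) / n for coords in zip(*descs))


def descriptor_spread(descs: Sequence[Descriptor]) -> float:
    """Assumed spread measure: mean pairwise Euclidean distance between episodes."""
    pairs = list(itertools.combinations(descs, 2))
    if not pairs:
        return 0.0  # a single episode has no measurable spread
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def cell_index(desc: Descriptor, lower: Descriptor, upper: Descriptor, bins: int) -> Cell:
    """Map a descriptor to a cell of a uniform grid with `bins` bins per dimension."""
    idx = []
    for x, lo, hi in zip(desc, lower, upper):
        b = int((x - lo) / (hi - lo) * bins)
        idx.append(min(bins - 1, max(0, b)))  # clip to the grid edges
    return tuple(idx)


def try_insert(
    archive: Dict[Cell, Elite],
    params: object,
    returns: Sequence[float],
    descs: Sequence[Descriptor],
    lower: Descriptor,
    upper: Descriptor,
    bins: int,
) -> bool:
    """Insert a candidate if its cell is empty, or if it beats the incumbent's
    fitness without having a larger spread (assumed low-spread criterion)."""
    cand = Elite(
        params=params,
        fitness=sum(returns) / len(returns),
        descriptor=mean_descriptor(descs),
        spread=descriptor_spread(descs),
    )
    key = cell_index(cand.descriptor, lower, upper, bins)
    incumbent = archive.get(key)
    if incumbent is None or (
        cand.fitness > incumbent.fitness and cand.spread <= incumbent.spread
    ):
        archive[key] = cand
        return True
    return False
```

Under this rule, a high-return policy whose behavior drifts from episode to episode cannot displace a slightly weaker but more repeatable one. Requiring both conditions is the conservative choice; a weighted trade-off between fitness and spread would be an alternative design.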

Supplemental Material

p1221-mace-suppl.pdf (1.4 MB): supplemental material.


Index Terms

Computing methodologies → Machine learning → Machine learning approaches → Bio-inspired approaches → Evolutionary robotics


Published in
GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
July 2023
1667 pages
ISBN: 9798400701191
DOI: 10.1145/3583131
Chair: Sara Silva
Program Chair: Luís Paquete
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
neuroevolution
quality-diversity
decision transformer