Adaptive decision making using a chaotic semiconductor laser for multi-armed bandit problem with time-varying hit probabilities

Akihiro Oda; Takatomo Mihana; Kazutaka Kanno; Makoto Naruse; Atsushi Uchida

doi:10.1587/nolta.13.112

Special Section on Laser Dynamics and Complex Photonics


Keywords: decision making, reinforcement learning, chaos, semiconductor laser, multi-armed bandit problem, AI photonics

JOURNAL FREE ACCESS

2022 Volume 13 Issue 1 Pages 112-122



Abstract

We numerically demonstrate the principle of adaptive decision making for solving multi-armed bandit problems in dynamically changing reward environments. Decisions are made with the tug-of-war method, which compares a threshold against a chaotic temporal waveform generated by a semiconductor laser and recorded in an experiment. We propose a method for detecting dynamic changes in the hit probabilities by evaluating the short-term standard deviations of the estimated hit probabilities. When a change in the hit probabilities is detected, the threshold is forcibly reset to its initial value. We demonstrate adaptive decision making under time-varying hit probabilities, including cases in which the differences between the hit probabilities are small. The proposed method paves the way for ultrafast photonic decision making in dynamically changing environments for various applications, such as cognitive wireless communications and robot control using reinforcement learning.
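The loop described in the abstract (threshold comparison, tug-of-war update, change detection from a short-term standard deviation, threshold reset) can be illustrated with a minimal two-armed sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a logistic map stands in for the recorded laser waveform, the threshold moves by a fixed step `delta`, and the names `run_bandit`, `window`, `warmup`, and `std_cutoff` and their values are hypothetical and untuned.

```python
import random
from collections import deque
from statistics import pstdev


def logistic_map(n, x0=0.3, r=4.0):
    """Stand-in chaotic sequence rescaled to [-1, 1] (assumption: not laser data)."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(2.0 * x - 1.0)
    return out


def short_term_std(history):
    """Population standard deviation of the most recent estimates."""
    return pstdev(history) if len(history) > 1 else 0.0


def run_bandit(prob_schedule, steps, window=50, warmup=100,
               std_cutoff=0.05, delta=0.05, seed=0):
    """Two-armed tug-of-war sketch with a threshold reset on detected change.

    prob_schedule(t) returns the pair of hit probabilities at step t.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    signal = logistic_map(steps)
    threshold = 0.0                      # initial threshold value
    wins, plays = [0, 0], [0, 0]
    recent = [deque(maxlen=window), deque(maxlen=window)]
    choices, resets = [], []
    for t in range(steps):
        probs = prob_schedule(t)
        # Decision: compare the chaotic sample with the threshold.
        arm = 0 if signal[t] >= threshold else 1
        reward = rng.random() < probs[arm]
        plays[arm] += 1
        wins[arm] += int(reward)
        # Tug-of-war update: a hit pulls the threshold toward favoring the
        # chosen arm, a miss pushes it toward the other arm.
        step = delta if reward else -delta
        threshold += -step if arm == 0 else step
        threshold = max(-1.0, min(1.0, threshold))
        # Track the estimated hit probability once enough plays accumulate.
        if plays[arm] >= warmup:
            recent[arm].append(wins[arm] / plays[arm])
        # Change detection: a large short-term spread triggers a reset.
        if (len(recent[arm]) == window
                and short_term_std(recent[arm]) > std_cutoff):
            threshold = 0.0
            wins, plays = [0, 0], [0, 0]
            for history in recent:
                history.clear()
            resets.append(t)
        choices.append(arm)
    return choices, resets


if __name__ == "__main__":
    # Hit probabilities swap halfway through the run.
    schedule = lambda t: (0.8, 0.2) if t < 500 else (0.2, 0.8)
    choices, resets = run_bandit(schedule, 1000)
    print("fraction of arm-0 picks:", choices.count(0) / len(choices))
    print("reset steps:", resets)
```

Whether and when a reset fires depends strongly on `window`, `warmup`, and `std_cutoff`; the sketch shows only the control flow, not the detection performance reported in the paper.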
