Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback
Published in Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI), 2025
Recommended citation: Shishen Lin. (2025). "Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback." Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada. 7 pages. To appear. https://arxiv.org/pdf/2505.13562
Abstract: Learning in games is a fundamental problem in machine learning and artificial intelligence, with numerous applications (Silver et al., 2016; Schrittwieser et al., 2020). This work focuses on learning in two-player zero-sum matrix games with an unknown payoff matrix and bandit feedback, where players observe their actions and corresponding noisy payoffs. Previous studies have proposed algorithms for this setting (O’Donoghue et al., 2021; Maiti et al., 2023; Cai et al., 2023), with O’Donoghue et al. (2021) highlighting the effectiveness of deterministic optimism (e.g., UCB) in achieving sublinear regret. However, the potential of randomised optimism in matrix games remains theoretically unexplored.
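To make the setting concrete, the minimal sketch below simulates a two-player zero-sum matrix game under bandit feedback with a UCB-style deterministically optimistic row player. The payoff matrix, the Gaussian noise model, and the uniformly random column player are illustrative assumptions only; this is not the protocol or algorithm analysed by O'Donoghue et al. (2021).

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [1.0, 0.5]])   # hypothetical payoff matrix, unknown to the learner
T = 10_000
n_rows, n_cols = A.shape

counts = np.zeros(n_rows)   # number of times each row action was played
means = np.zeros(n_rows)    # empirical mean payoff of each row action

for t in range(1, T + 1):
    # Deterministic optimism in the spirit of UCB: add an exploration bonus to the mean.
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    i = int(np.argmax(means + bonus))         # optimistic row action
    j = rng.integers(n_cols)                  # column player abstracted as uniformly random here
    payoff = A[i, j] + rng.normal(scale=0.1)  # bandit feedback: only a noisy payoff is observed
    counts[i] += 1
    means[i] += (payoff - means[i]) / counts[i]
```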
We introduce Competitive Co-evolutionary Bandit Learning (COEBL), a novel algorithm that incorporates evolutionary algorithms (EAs) into the bandit framework, implementing randomised optimism through the EA variation operator. We prove that COEBL achieves sublinear regret, matching the performance of deterministic optimism-based approaches. To the best of our knowledge, this is the first regret analysis of an evolutionary bandit learning algorithm in matrix games.
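As an illustration of randomised optimism, the hedged sketch below replaces a deterministic exploration bonus with a random perturbation of the empirical payoff estimates, in the spirit of an EA variation (mutation) operator. The Gaussian mutation and its count-dependent scale are assumptions made for illustration; they are not the exact operator or schedule used by COEBL.

```python
import numpy as np

rng = np.random.default_rng(1)

def randomised_optimistic_action(means: np.ndarray, counts: np.ndarray) -> int:
    """Pick a row action after mutating the empirical mean payoffs with
    Gaussian noise whose magnitude shrinks for well-explored actions
    (an assumed schedule, used here only for illustration)."""
    scale = 1.0 / np.sqrt(np.maximum(counts, 1))   # less-explored actions are mutated more
    mutated = means + rng.normal(scale=scale)      # randomised optimism via mutation-style perturbation
    return int(np.argmax(mutated))

# Example: three row actions with identical empirical means; the random
# perturbation, rather than a fixed bonus, breaks ties and drives exploration.
print(randomised_optimistic_action(np.zeros(3), np.array([1.0, 10.0, 100.0])))
```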
Empirical evaluations on various matrix game benchmarks demonstrate that COEBL not only achieves sublinear regret but also outperforms classical bandit algorithms, including EXP3 (Auer et al., 2002), EXP3-IX (Cai et al., 2023), and UCB (O’Donoghue et al., 2021). These findings highlight the potential of evolutionary bandit learning, particularly the effectiveness of randomised optimism via evolutionary algorithms, in game-theoretic settings.