Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback
Published in Proceedings of the 34rd International Joint Conference on Artificial Intelligence (IJCAI), 2025
Abstract: Learning in games is a fundamental problem in machine learning and artificial intelligence, with numerous applications (Silver et al., 2016; Schrittwieser et al., 2020). This work focuses on learning in two-player zero-sum matrix games with an unknown payoff matrix and bandit feedback, where players observe their actions and corresponding noisy payoffs. Previous studies have proposed algorithms for this setting (O’Donoghue et al., 2021; Maiti et al., 2023; Cai et al., 2023), with O’Donoghue et al. (2021) highlighting the effectiveness of deterministic optimism (e.g., UCB) in achieving sublinear regret. However, the potential of randomised optimism in matrix games remains theoretically unexplored.
Recommended citation: Shishen Lin. (2025). " Randomised Optimism via Competitive Co-Evolution for Matrix Games with Bandit Feedback." Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI). 7 pages, Montreal, Canada, 2025. to appear. https://arxiv.org/pdf/2505.13562