Maximizing and Satisficing and Multi-armed Bandits with Graph Information

Abstract

We study pure exploration in multi armed bandits with graph side information. In particular, we consider the best-arm and near best-arm identification problem in the fixed confidence setting under the assumption that the arm rewards are smooth with respect to a given arbitrary graph. This captures a range of real world pure exploration scenarios where one often has information about the similarity of the options or actions under consideration. We propose a novel algorithm GRUB for this problem and provide a theoretical characterization of its performance that elicits the benefit of the graph side information. We complement our theory with experimental results that show that capitalizing on available graph side information yields significant improvements over pure exploration methods that are unable to use this information

Publication
Neural Information Processing Systems, 2022