LIFO Seminars

29/01/2018: Introduction to Multi-Armed Bandits
Marta Soare (LIFO (CA))


A multi-armed bandit model is a simple framework that captures the exploration-exploitation trade-off faced by a learning agent acting in an unknown and uncertain environment. The agent sequentially chooses among the "arms" (options/actions) available in the environment and must infer their associated values from the "rewards" (observations) that the environment returns in response to each choice. Given an objective, typically maximizing the cumulative reward, the agent can either "exploit" the information acquired so far about the environment (keep selecting the arm with the seemingly largest reward) or "explore" the arms whose associated values are more uncertain. This is an active research topic with a wide range of applications, including clinical trials (deciding on the best treatment to give a patient), online advertising and recommender systems, and game playing. In this introduction we will present one of the simplest versions of this problem, the stochastic multi-armed bandit game, and review some algorithms for optimally solving the arm selection problem.
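To make the exploration-exploitation trade-off concrete, here is a minimal sketch of a stochastic bandit game, assuming Bernoulli rewards (each arm returns 1 with a fixed probability unknown to the agent, else 0). It uses the classic UCB1 index policy as one example of an arm selection algorithm; it is not necessarily one of the algorithms covered in the talk. The names `BernoulliBandit`, `ucb1` and `pseudo_regret`, as well as the arm means used in the example, are hypothetical and chosen only for illustration.

```python
import math
import random


class BernoulliBandit:
    """Hypothetical stochastic bandit: arm k returns reward 1 with probability means[k], else 0."""

    def __init__(self, means, seed=None):
        self.means = list(means)        # true arm means, hidden from the agent
        self.rng = random.Random(seed)  # seeded RNG for reproducible runs

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.means[arm] else 0.0


def ucb1(bandit, n_arms, horizon):
    """UCB1: play the arm maximizing empirical mean + exploration bonus.

    The bonus sqrt(2 ln t / N_k) is large for rarely pulled (uncertain) arms,
    which drives exploration; the empirical mean term drives exploitation.
    """
    counts = [0] * n_arms      # N_k: number of pulls of each arm so far
    sums = [0.0] * n_arms      # cumulative reward collected from each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # initialization: pull every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k]),
            )
        reward = bandit.pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward


def pseudo_regret(means, counts):
    """Expected reward lost versus always playing the best arm: sum_k N_k * (mu* - mu_k)."""
    best = max(means)
    return sum(n * (best - mu) for n, mu in zip(counts, means))


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]  # hypothetical arm means
    bandit = BernoulliBandit(means, seed=0)
    counts, total = ucb1(bandit, len(means), horizon=10_000)
    print("pulls per arm:", counts)
    print("total reward:", total)
    print("pseudo-regret:", pseudo_regret(means, counts))
```

Running it, the arm with the largest mean should receive the large majority of the pulls, while the suboptimal arms are still sampled occasionally because their exploration bonus keeps growing (logarithmically in t) whenever they are neglected. Maximizing the cumulative reward is equivalent to minimizing the pseudo-regret computed at the end.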