LIFO - Laboratoire d'Informatique Fondamentale d'Orléans, INSA Centre Val de Loire, Université d'Orléans




LIFO - Bâtiment IIIA
Rue Léonard de Vinci
B.P. 6759
F-45067 ORLEANS Cedex 2

Email: contact.lifo
Tel: +33 (0)2 38 41 70 11
Fax: +33 (0)2 38 41 71 37

LIFO Seminars


Unless otherwise noted, seminars take place on Mondays from 2 pm to 3 pm in Meeting Room 1 (Salle de réunion 1), building IIIA (see the campus map).

29/01/2018: Introduction to Multi-Armed Bandits
Marta Soare (LIFO (CA))

Seminar abstracts

Introduction to Multi-Armed Bandits Marta Soare, LIFO (CA)

A multi-armed bandit model is a simple framework that captures the exploration-exploitation trade-off a learning agent must resolve when facing an unknown and uncertain environment: the agent sequentially chooses among the "arms" (options/actions) available in the environment and has to infer their associated values from the "rewards" (observations) the environment returns in response to each choice. Given an objective, typically that of maximizing the cumulative reward, the agent can decide to "exploit" the information acquired so far about the environment (keep selecting the arm with the seemingly largest reward), or to "explore" the arms whose associated value is more uncertain. This is an active research topic with a wide range of applications, including clinical trials for deciding on the best treatment to give to a patient, online advertising and recommender systems, and game playing. In this introduction we will present one of the simplest versions of this problem, the stochastic multi-armed bandit game, and we will review some algorithms for optimally solving the arm selection problem.
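
The abstract does not say which algorithms the talk covers, so the following is only an illustrative sketch of the stochastic setting it describes. It assumes Bernoulli rewards and uses UCB1, one well-known index policy, chosen here as an example and not taken from the talk. The names `pull` and `ucb1` and the arm means are hypothetical.

```python
import math
import random


def pull(mean, rng):
    """Draw a Bernoulli reward: 1.0 with probability `mean`, otherwise 0.0."""
    return 1.0 if rng.random() < mean else 0.0


def ucb1(means, horizon, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds with the UCB1 rule.

    `means` holds the true success probabilities (an assumption of this
    sketch); the agent never reads them, it only sees sampled rewards.
    Returns (number of pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # how often each arm was pulled
    sums = [0.0] * n_arms   # cumulative reward observed per arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initial pass: pull every arm once so each estimate is defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation)
            #       + confidence bonus that shrinks with pulls (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(means[arm], rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("total reward:", total)
```

With these hypothetical means, most pulls should concentrate on the arm with the highest success probability, while the confidence bonus keeps the other arms sampled occasionally; that is the exploration-exploitation trade-off described above.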