Thesis supervisor: Balázs Csanád Csáji
Location of studies (in Hungarian): SZTAKI: Institute for Computer Science and Control
Abbreviation of location of studies: SZTAKI
Description of the research topic:
Reinforcement learning (RL) is one of the main branches of machine learning. It deals with the problem of learning from sequential interactions with an uncertain, dynamic environment based on feedback (e.g., states and immediate costs). The aim is to find a control policy (decision strategy) which minimizes the expected (discounted or average) cost in the long run. Markov decision processes (MDPs) constitute the main mathematical background of RL; however, unlike in classical MDP studies, in RL the model of the system is typically unavailable, therefore the dynamics and the costs have to be learned (estimated) while the decision maker tries to act efficiently. These two goals (exploring the environment and exploiting the information gathered so far) work against each other, leading to the fundamental problem of exploration vs. exploitation (estimation vs. control).

Theoretical support for classical RL methods, such as Q-learning and TD(lambda), is usually asymptotic and presupposes either a tabular representation of the value function or a linear function approximation. Novel challenges in RL include providing methods with non-asymptotic (and distribution-free) guarantees, handling partial observability and changing environments, studying the connections between deep learning and RL, as well as studying the notorious exploration-exploitation trade-off (even in simplified problems, such as multi-armed or contextual bandits).

Distributed RL methods are another possible research direction, where investigating the effects of local vs. global information and decision making is among the main research goals. Finally, analyzing stochastic approximation methods, such as stochastic gradient descent, and studying their consistency, computational and sample complexity, as well as potential acceleration techniques, are fundamental for RL.
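To make the tabular setting concrete, the following is a minimal sketch of Q-learning with epsilon-greedy exploration, in the cost-minimization formulation used above. The environment interface `step(s, a)`, returning a next state, an immediate cost, and a termination flag, is an assumption introduced here for illustration, not part of any particular library.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` is a hypothetical environment interface returning
    (next_state, cost, done); costs are minimized, matching the
    cost-based formulation of the research topic.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # exploration vs. exploitation: random action with probability eps
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = min(range(n_actions), key=lambda b: Q[s][b])
            s2, cost, done = step(s, a)
            # bootstrapped target: immediate cost plus discounted best cost-to-go
            target = cost + (0.0 if done else gamma * min(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

On a small chain MDP (move left or right, unit cost per step, rightmost state terminal), the learned Q-values favor moving right in every state, illustrating convergence of the tabular method under persistent exploration.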
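The exploration-exploitation trade-off appears in its purest form in the multi-armed bandit problem mentioned above. A minimal epsilon-greedy sketch follows; the list of reward-sampling callables `arms` is an assumed interface chosen for this illustration.

```python
import random

def eps_greedy_bandit(arms, eps=0.1, horizon=2000):
    """Epsilon-greedy on a multi-armed bandit.

    `arms` is a hypothetical list of callables, each sampling a reward
    for one arm. Returns the empirical mean reward estimates and the
    total reward collected over the horizon.
    """
    n = [0] * len(arms)       # pull counts per arm
    mean = [0.0] * len(arms)  # running reward estimates per arm
    total = 0.0
    for _ in range(horizon):
        if random.random() < eps:
            i = random.randrange(len(arms))                    # explore
        else:
            i = max(range(len(arms)), key=lambda j: mean[j])   # exploit
        r = arms[i]()
        n[i] += 1
        mean[i] += (r - mean[i]) / n[i]  # incremental mean update
        total += r
    return mean, total
```

With two Bernoulli arms of success probabilities 0.2 and 0.8, the strategy concentrates its pulls on the better arm while the forced exploration keeps both estimates consistent; more refined strategies (e.g., UCB-type index policies) trade off the two goals adaptively rather than at a fixed rate.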
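The stochastic approximation methods mentioned above can be sketched by the classical Robbins-Monro recursion, of which stochastic gradient descent is a special case. The noisy-gradient oracle `grad(theta)` is an assumed interface for this illustration.

```python
import random

def robbins_monro(grad, theta0, n_iter=10000):
    """Robbins-Monro stochastic approximation in one dimension:
    theta_{k+1} = theta_k - a_k * G(theta_k), with step sizes
    a_k = 1/(k+1), which satisfy the standard conditions
    sum a_k = infinity and sum a_k^2 < infinity.

    `grad(theta)` is a hypothetical oracle returning a noisy,
    unbiased estimate of the gradient at theta.
    """
    theta = theta0
    for k in range(n_iter):
        theta -= grad(theta) / (k + 1)  # decreasing step size a_k
    return theta
```

For instance, with the noisy gradient G(theta) = (theta - 3) + noise of a quadratic loss centered at 3, the iterates converge to the minimizer despite the noise; quantifying such convergence rates and sample complexities is exactly the kind of analysis the topic description refers to.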