Thesis supervisor: Balázs Csanád Csáji
Location of studies (in Hungarian): SZTAKI: Institute for Computer Science and Control
Abbreviation of location of studies: SZTAKI
Description of the research topic:
Reinforcement learning (RL) is one of the main branches of machine learning. It deals with the problem of learning from sequential interactions with an uncertain, dynamic environment based on feedback (e.g., states and immediate costs). The aim is to find a control policy (decision strategy) which minimizes the expected (discounted or average) cost in the long run. Markov decision processes (MDPs) constitute the main mathematical background of RL; however, unlike in classical MDP studies, in RL the model of the system is typically unavailable, therefore the dynamics and the costs have to be learned (estimated) while the decision maker tries to act efficiently. These two goals (exploring the environment and exploiting the information gathered so far) work against each other, leading to the fundamental problem of exploration vs. exploitation (estimation vs. control).

Theoretical support for classical RL methods, such as Q-learning and TD(lambda), is usually asymptotic and presupposes either a tabular representation of the value function or a linear function approximation. Novel challenges in RL include providing methods with non-asymptotic (and distribution-free) guarantees, handling partial observability and changing environments, studying the connections between deep learning and RL, as well as studying the notorious exploration-exploitation trade-off (even in simplified problems, such as multi-armed or contextual bandits).

Distributed RL methods are another possible research direction, where investigating the effects of local vs. global information and decision making is among the main research goals. Finally, analyzing stochastic approximation methods, such as stochastic gradient descent, and studying their consistency, computational and sample complexity, as well as potential acceleration techniques, are fundamental for RL.
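To make the tabular setting concrete, the following is a minimal sketch of Q-learning with epsilon-greedy exploration, in the cost-minimization formulation used above. The environment interface `step(s, a)`, returning a next state, an immediate cost, and a termination flag, is an assumption introduced here for illustration, not part of any particular library.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` is a hypothetical environment interface returning
    (next_state, cost, done); costs are minimized, matching the
    cost-based formulation of the research topic.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # exploration vs. exploitation: random action with probability eps
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = min(range(n_actions), key=lambda b: Q[s][b])
            s2, cost, done = step(s, a)
            # bootstrapped target: immediate cost plus discounted best cost-to-go
            target = cost + (0.0 if done else gamma * min(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

On a small chain MDP (move left or right, unit cost per step, rightmost state terminal), the learned Q-values favor moving right in every state, illustrating convergence of the tabular method under persistent exploration.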
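The exploration-exploitation trade-off appears in its purest form in the multi-armed bandit problem mentioned above. A minimal epsilon-greedy sketch follows; the list of reward-sampling callables `arms` is an assumed interface chosen for this illustration.

```python
import random

def eps_greedy_bandit(arms, eps=0.1, horizon=2000):
    """Epsilon-greedy on a multi-armed bandit.

    `arms` is a hypothetical list of callables, each sampling a reward
    for one arm. Returns the empirical mean reward estimates and the
    total reward collected over the horizon.
    """
    n = [0] * len(arms)       # pull counts per arm
    mean = [0.0] * len(arms)  # running reward estimates per arm
    total = 0.0
    for _ in range(horizon):
        if random.random() < eps:
            i = random.randrange(len(arms))                    # explore
        else:
            i = max(range(len(arms)), key=lambda j: mean[j])   # exploit
        r = arms[i]()
        n[i] += 1
        mean[i] += (r - mean[i]) / n[i]  # incremental mean update
        total += r
    return mean, total
```

With two Bernoulli arms of success probabilities 0.2 and 0.8, the strategy concentrates its pulls on the better arm while the forced exploration keeps both estimates consistent; more refined strategies (e.g., UCB-type index policies) trade off the two goals adaptively rather than at a fixed rate.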
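The stochastic approximation methods mentioned above can be sketched by the classical Robbins-Monro recursion, of which stochastic gradient descent is a special case. The noisy-gradient oracle `grad(theta)` is an assumed interface for this illustration.

```python
import random

def robbins_monro(grad, theta0, n_iter=10000):
    """Robbins-Monro stochastic approximation in one dimension:
    theta_{k+1} = theta_k - a_k * G(theta_k), with step sizes
    a_k = 1/(k+1), which satisfy the standard conditions
    sum a_k = infinity and sum a_k^2 < infinity.

    `grad(theta)` is a hypothetical oracle returning a noisy,
    unbiased estimate of the gradient at theta.
    """
    theta = theta0
    for k in range(n_iter):
        theta -= grad(theta) / (k + 1)  # decreasing step size a_k
    return theta
```

For instance, with the noisy gradient G(theta) = (theta - 3) + noise of a quadratic loss centered at 3, the iterates converge to the minimizer despite the noise; quantifying such convergence rates and sample complexities is exactly the kind of analysis the topic description refers to.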