contact: nmanginas@iit.demokritos.gr
Planning is one of the most foundational problems in AI. It involves finding a sequence of actions such that an agent fulfills a goal in an environment, e.g. a robot safely navigating to a given location. Simple games, like mazes and puzzles, are still among the most popular testbeds for assessing the capabilities of autonomous agents. Planning has a long history. Initially, it boiled down to specifying a formal model of the environment in which the agent is deployed, e.g. the rules of a game, and using search to find a plan that fulfills the agent's goal. To this day, this approach to planning remains prevalent in logistics, scheduling, etc. However, it eventually became clear that environments fully specified by a formal (logical) model are limiting. The effects of the agent's decisions are most often stochastic in nature, e.g. attacking a monster in the game NetHack sometimes deals more damage and sometimes less. This led to the development of decision-theoretic planning, or planning under uncertainty, and the automated techniques for solving deterministic environments were extended to probabilistic settings.
Lately, deep learning has been employed to create fully autonomous agents that don't rely on specifications of the environment and simply learn to plan via interaction or demonstration. This breakthrough has enabled the development of agents in domains where knowledge about the environment is scarce and only historical data exists. However, such fully statistical, neural-network-based agents are limited in their planning capabilities, especially when encountering situations far from their training data.
In this thesis, the student will attempt to combine deep learning, which allows the agent to infer the current state of the environment from image representations of games, with a probabilistic model of the game's environment, which can be used to plan based on the outputs of the neural network. The student will compare this hybrid neuro-symbolic approach, which combines the abilities of symbolic planners with those of deep learning, to a fully neural planner. The proposed research will be carried out in MiniHack, an environment built on the NetHack game, and will draw on current research from the institute on neuro-symbolic planning and temporal reasoning.
If the student has some other favourite game that satisfies certain conditions, e.g. it is not too complex, they can work on that game instead.
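To make the intended pipeline concrete, below is a minimal sketch of the neuro-symbolic loop in Python. All names in it (State, perceive, transitions, plan) are illustrative assumptions rather than an existing API: perceive stands in for a trained neural network that reads the game screen, and the planner is a small finite-horizon expectimax over a hand-written probabilistic transition model, echoing the stochastic attack damage mentioned above.

```python
# A minimal sketch of the proposed neuro-symbolic loop (illustrative only).
# perceive() stands in for a trained neural network that maps pixel
# observations to symbolic facts; the planner is a small finite-horizon
# expectimax over a hand-written probabilistic transition model.
from dataclasses import dataclass
from functools import lru_cache

GAMMA = 0.9  # discount factor: reaching the goal sooner is worth more

@dataclass(frozen=True)
class State:
    distance: int    # tiles between the agent and a monster
    monster_hp: int  # the monster's remaining hit points

def perceive(pixels) -> State:
    # Stand-in for the neural component: a CNN trained on game screens
    # would output these symbolic facts. Fixed here so the sketch runs.
    return State(distance=1, monster_hp=3)

ACTIONS = ("attack", "approach")

def transitions(state: State, action: str):
    # Probabilistic action effects: a list of (probability, next state).
    # An attack deals 1 or 2 damage with equal chance, mirroring the
    # stochastic damage rolls of NetHack.
    if action == "attack" and state.distance == 0:
        return [(0.5, State(0, max(0, state.monster_hp - 1))),
                (0.5, State(0, max(0, state.monster_hp - 2)))]
    if action == "approach" and state.distance > 0:
        return [(1.0, State(state.distance - 1, state.monster_hp))]
    return [(1.0, state)]  # the action has no effect

def reward(state: State) -> float:
    return 1.0 if state.monster_hp == 0 else 0.0

@lru_cache(maxsize=None)
def value(state: State, horizon: int) -> float:
    # Finite-horizon expectimax: the decision-theoretic planning step.
    if horizon == 0 or state.monster_hp == 0:
        return reward(state)
    return max(sum(p * GAMMA * value(nxt, horizon - 1)
                   for p, nxt in transitions(state, action))
               for action in ACTIONS)

def plan(state: State, horizon: int = 5) -> str:
    # Pick the action with the highest expected discounted value.
    return max(ACTIONS, key=lambda a: sum(p * GAMMA * value(nxt, horizon - 1)
                                          for p, nxt in transitions(state, a)))

state = perceive(pixels=None)  # neural perception (stubbed)
print(plan(state))             # -> "approach": close the distance, then attack
```

In the actual thesis, the hand-written transition table would be replaced by a probabilistic model of the MiniHack environment and the stubbed perception by a trained network, with a fully neural planner serving as the comparison baseline.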
References:
[1] M. Samvelyan et al., "MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research", NeurIPS Datasets and Benchmarks Track, 2021.
[2] C. Boutilier, T. Dean and S. Hanks, "Decision-Theoretic Planning: Structural Assumptions and Computational Leverage", Journal of Artificial Intelligence Research, vol. 11, 1999.
[3] B. Liu et al., "LLM+P: Empowering Large Language Models with Optimal Planning Proficiency", arXiv:2304.11477, 2023.