Contact: nmanginas@iit.demokritos.gr
Transformer-based architectures, such as GPT, have shown exemplary results in modelling complex data. They belong to a broad and well-studied family of models in probability theory called autoregressive models. These models learn the distribution of sequences, e.g. language or genes, by factorising it into next-element predictions and generating outputs one at a time. Once an autoregressive model is trained, one can pose several queries to it. The most common one, used for example in ChatGPT, is sampling, i.e. drawing a possible sequence of words from the probability distribution over all continuations. When using these models in other settings, e.g. to generate sequences representing the path of a robot or the status of a patient through time, various other queries become of interest. These include questions of the form “is the patient going to get a fever in the next 10 days?” or “will the robot arrive at its goal within the next minute without passing through unwanted areas?”.
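Concretely, an autoregressive model factorises the joint distribution as p(x_1, ..., x_T) = ∏_t p(x_t | x_1, ..., x_{t-1}), and generation repeatedly samples the next element from this conditional. The snippet below is a minimal sketch of that loop; the vocabulary, the bigram transition table P, and the function name are illustrative stand-ins for a real transformer, which would condition on the full prefix rather than only the last token.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<s>", "a", "b", "</s>"]
# P[i, j] = p(next token = j | current token = i); each row sums to 1.
# A real transformer computes this distribution from the whole prefix;
# this toy bigram table looks only at the last token, for brevity.
P = np.array([
    [0.0, 0.6, 0.4, 0.0],  # after <s>
    [0.0, 0.3, 0.4, 0.3],  # after "a"
    [0.0, 0.4, 0.3, 0.3],  # after "b"
    [0.0, 0.0, 0.0, 1.0],  # "</s>" is absorbing
])

def sample_sequence(max_len=10):
    """Generate one sequence token by token (autoregressive sampling)."""
    seq = [0]  # index of the start token <s>
    for _ in range(max_len):
        nxt = int(rng.choice(len(VOCAB), p=P[seq[-1]]))
        seq.append(nxt)
        if VOCAB[nxt] == "</s>":
            break
    return [VOCAB[i] for i in seq]

print(sample_sequence())  # e.g. ['<s>', 'a', 'b', '</s>']
```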
Answering such questions exactly is, in computational-complexity terms, intractable for autoregressive models: computing the probability of a given query can, in practice, only be done in very limited settings. Focus must therefore shift to approximate techniques. In its simplest form, such a query can be answered approximately by sampling multiple sequences from the model and counting the fraction in which the query is true, i.e. a Monte Carlo estimate, as sketched below.
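The following sketch shows this baseline estimator, reusing sample_sequence from the snippet above. The query function is a hypothetical stand-in for predicates like “fever within the next 10 days”; the estimate p̂ = (1/N) Σ_i 1[query(x⁽ⁱ⁾)] converges to the true query probability as the number of samples N grows.

```python
def query(seq):
    # Hypothetical query, analogous to "fever within the next 10 days":
    # does token "a" appear among the first 5 generated tokens?
    return "a" in seq[1:6]

def estimate_query_probability(n_samples=10_000):
    """Monte Carlo estimate: fraction of sampled sequences satisfying the query."""
    hits = sum(query(sample_sequence()) for _ in range(n_samples))
    return hits / n_samples

print(f"p(query) ~ {estimate_query_probability():.3f}")
```

The standard error of this estimator shrinks only as 1/√N, and rare-event queries may require very many samples; this is precisely what motivates the more sophisticated approximation schemes the student will develop.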
The student will: a) become familiar with the probability theory behind autoregressive models, which powers modern transformers, b) experiment with simple approximate methods for complex queries and develop more sophisticated approximation schemes, c) apply these techniques to controlled domains, such as generating game states in environments like LunarLander and Atari games, and d) measure the quality and efficiency of the approximation schemes and discuss the results.
References:
[1] K. P. Murphy, Probabilistic Machine Learning: Advanced Topics, MIT Press, 2023 (Chapter 22)
Note: The student should not be surprised if this chapter is a bit hard to follow; after all, it is the 22nd chapter of a book called Advanced Topics. A simpler introduction will be provided first, after which following this theory will be much easier.
[2] A. Boyd, S. Showalter, S. Mandt, P. Smyth, Predictive Querying for Autoregressive Neural Sequence Models, NeurIPS 2022 (up to Sec. 3)