Data capture and networking advances have increased the availability of continually arriving real-time information in a variety of settings, ranging from the digital economy to the internet of things. Learning (in the statistical sense of parameter estimation and inference) on data streams has two unique constraints. First, its memory footprint needs to be constant over time, and often very light. Second, real-time phenomena tend to change over time, often in unpredictable ways, so models must assume a dynamic, evolving process.
This talk is in two parts. In Part 1, we will outline some recent theoretical advances that demonstrate close relationships between popular streaming data heuristics (such as windowing and forgetting factors) and Bayesian dynamic modelling that pave the way for ultra-efficient yet principled parameter estimation. In Part 2, we will describe the practical challenges we had to overcome in our startup, Mentat Innovations, that is focused on machine learning for data in motion.
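As a minimal illustration of one such heuristic (and not the speaker's actual method), the sketch below shows a constant-memory running mean with a forgetting factor: past observations are geometrically discounted, so the estimate tracks a drifting process while using only two scalars of state. The class name and parameter `lam` are illustrative choices, not part of the talk.

```python
class ForgettingMean:
    """Exponentially weighted running mean with forgetting factor lam in (0, 1].

    Memory footprint is constant: two floats, regardless of stream length.
    lam = 1.0 recovers the plain cumulative mean; smaller lam forgets faster.
    """

    def __init__(self, lam=0.99):
        self.lam = lam       # forgetting factor: weight applied to the past
        self.total = 0.0     # discounted running sum of observations
        self.weight = 0.0    # discounted count (normaliser)

    def update(self, x):
        # Discount everything seen so far, then fold in the new observation.
        self.total = self.lam * self.total + x
        self.weight = self.lam * self.weight + 1.0
        return self.total / self.weight  # current estimate


est = ForgettingMean(lam=0.9)
for x in [1.0, 1.0, 1.0, 5.0]:
    m = est.update(x)
# After the jump to 5.0, the estimate sits above the plain average of 2.0,
# because recent data carry more weight than old data.
```

Windowing achieves a similar effect by dropping observations older than a fixed horizon; the theoretical results in Part 1 relate both heuristics to discounting in Bayesian dynamic models.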