Learning and Information Theory

Generalization in classical machine learning models is often understood in terms of the bias-variance trade-off: over-parameterized models fit the training data better but are more likely to overfit. However, the push toward ever larger deep network models, driven by their remarkable empirical generalization, challenges this understanding. There are still many seemingly simple but open questions in machine learning: “How can I measure the amount of information in a dataset?”, “Can I determine whether my machine learning model has learned some or all of the information in a dataset?”, “Why do deep learning models show exceptional generalization performance when they are so over-parameterized?”. Progress on these fundamental questions could have a wide impact on the way we think about and design machine learning algorithms, and we believe that information theory can provide important insights here.
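The classical trade-off can be made concrete with a small sketch (a hypothetical illustration, not from the text): fitting polynomials of increasing degree to noisy data drawn from a fixed target function. The specific target, noise level, and degrees below are assumptions chosen for illustration; the qualitative pattern is the classical one, where training error decreases monotonically with model capacity while held-out error eventually worsens.

```python
# Illustrative sketch of the bias-variance trade-off (all choices of target
# function, noise level, and polynomial degrees are assumptions for this demo).
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Sample noisy observations of a fixed nonlinear target."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, n)  # target plus Gaussian noise
    return x, y

x_train, y_train = make_data(30)    # small training set
x_test, y_test = make_data(200)     # held-out set from the same distribution

def mse(x, y, coeffs):
    """Mean squared error of a fitted polynomial on data (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

errors = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    errors[degree] = (mse(x_train, y_train, coeffs),
                      mse(x_test, y_test, coeffs))
    print(f"degree {degree:2d}: train MSE {errors[degree][0]:.3f}, "
          f"test MSE {errors[degree][1]:.3f}")
```

Because higher-degree polynomials nest the lower-degree ones, training error can only decrease with degree; the degree-1 model underfits the nonlinear target, while the high-degree model's test error reflects variance from fitting the noise.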