Selecting a model for a given set of data is at the heart of what data analysts do, whether they are statisticians, machine learners, or data miners. However, as the philosopher Hume already pointed out, the "Problem of Induction" is unsolvable: there are infinitely many functions that pass through any finite set of points. It is therefore not surprising that there are many different principled approaches to guide the search for a good model. Well-known examples are Bayesian Statistics and Statistical Learning Theory.
In the last decade, Information Theoretic methods for selecting the best model have slowly but surely become popular in the data mining community. In this tutorial we present an overview of these methods: starting from the basics, i.e., Information Theory, to how one defines and finds good pattern-based models, to how to use Information Theoretic insights to solve a broad range of data mining tasks.
The first part of the tutorial introduces information theory and basic concepts in this area, such as mutual information and entropy. We discuss their relation with optimal compression, and give examples of how they can be used in data mining. Next, we proceed to the more advanced concept of entropy for continuous-valued data. In particular, we will discuss the quirks of differential entropy and the strengths of cumulative entropy, and show how the latter can be used in practice.
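To make the basic concepts concrete, here is a minimal sketch (not taken from the tutorial material) of estimating Shannon entropy and mutual information from discrete samples, using only the Python standard library. The identity I(X;Y) = H(X) + H(Y) - H(X,Y) is used to compute mutual information from three entropy estimates.

```python
import math
from collections import Counter

def entropy(xs):
    """Empirical Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]                    # independent of xs in this sample
print(entropy(xs))                   # 1.0 bit: a fair coin
print(mutual_information(xs, ys))    # 0.0: knowing ys tells us nothing about xs
print(mutual_information(xs, xs))    # 1.0: a variable fully determines itself
```

Note that these are plug-in estimates from a sample; for small samples they are biased, which is one of the practical issues such methods must deal with.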
The second part of the tutorial discusses algorithmic information theory and its applications to model selection in data mining. The basic idea is that the better a model compresses the data, the better a model of the data it is. We will discuss this in light of Kolmogorov Complexity as well as the Minimum Description Length principle. After discussing the basics of information theoretic modelling, we will explain algorithmic techniques for efficiently discovering small sets of characteristic patterns, as well as give insight into how to construct a good encoding for a summarization problem.
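The "better compression means better model" idea can be sketched with a toy two-part MDL score, L(M) + L(D | M). The encoding below is an illustrative assumption, not the tutorial's: we compare a parameter-free uniform model against a model that first transmits the empirical symbol frequencies (at a common (k-1)/2 * log2 n parameter cost) and then codes the data optimally under them.

```python
import math
from collections import Counter

def L_uniform(xs, alphabet_size):
    # No parameters to transmit: every symbol costs log2 |A| bits.
    return len(xs) * math.log2(alphabet_size)

def L_empirical(xs):
    n, counts = len(xs), Counter(xs)
    # L(M): crude parameter cost of (k - 1)/2 * log2 n bits (a common MDL choice).
    model_bits = (len(counts) - 1) / 2 * math.log2(n)
    # L(D | M): Shannon-optimal code lengths under the empirical distribution.
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    return model_bits + data_bits

skewed   = ['a'] * 90 + ['b'] * 10
balanced = ['a'] * 50 + ['b'] * 50
print(L_empirical(skewed) < L_uniform(skewed, 2))      # True: the skew is worth modelling
print(L_empirical(balanced) < L_uniform(balanced, 2))  # False: paying for parameters buys nothing
```

The same trade-off drives pattern-based summarization: a pattern earns its place in the model only if the bits it saves on the data outweigh the bits needed to describe it.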
ITDM will be a 3-hour tutorial on Saturday, May 2, at SIAM SDM 2015, from 8:30 to 12:00 in Room Port of Singapore (3rd level).
The 2015 tutorial on Information Theoretic Methods in Data Mining is organized in conjunction with SDM.