A Probabilistic Theory of Pattern Recognition (Stochastic Modelling and Applied Probability)
by
Luc Devroye, László Györfi, and Gábor Lugosi
Description: A Probabilistic Theory of Pattern Recognition offers a detailed overview of the probabilistic methods used in pattern recognition, focusing on the mathematical analysis of various approaches within stochastic modelling and applied probability.
ISBN: 0387946187
To make learning tractable, you pick a finite model space, train it on finite data, and use a finite algorithm to find the best choice inside that space. That means you can fail in three ways: you can over-constrain your model space so that the true model lies outside it; you can have too little data to reliably discern the best model within your chosen space; and you can terminate your search early and fail to reach that best model at all. These are often called approximation, estimation, and optimization error.
Almost all error in ML fits neatly into this decomposition. In particular, those who neglect to track validation accuracy often make their model space so large that they overfit: the data is too scarce to power the search within a space of that size.
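A hedged sketch of what tracking validation accuracy buys you: a held-out split selects model-space size (polynomial degree here), whereas training error alone always prefers the biggest space. The problem setup and degree range are illustrative assumptions, not from the book:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of sin(x), split into a training and a validation set.
x = rng.uniform(0, 2 * np.pi, 60)
y = np.sin(x) + rng.normal(0, 0.1, 60)
x_tr, y_tr = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

def val_mse(degree):
    # Fit on the training split, score on the held-out split.
    c = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(c, x_val) - y_val) ** 2)

# Pick the degree that minimizes held-out error, not training error.
best = min(range(1, 11), key=val_mse)
print(best, val_mse(best))
```

Training MSE decreases monotonically with degree, so minimizing it would always pick the largest space; the validation split is what penalizes overfitting and selects a moderate one.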
Devroye, Györfi, and Lugosi (http://www.amazon.com/Probabilistic-Recognition-Stochastic-M...) have a really great picture of this in their book.