A Probabilistic Theory of Pattern Recognition (Stochastic Modelling and Applied Probability)
by
Luc Devroye, László Györfi, and Gábor Lugosi
Description: A Probabilistic Theory of Pattern Recognition offers a detailed overview of the probabilistic methods used in pattern recognition, focusing on the mathematical analysis of various approaches within stochastic modelling and applied probability.
ISBN: 0387946187
To make learning tractable, you pick a finite model space, train it on finite data, and use a finite algorithm to find the best choice inside that space. That means you can fail in three ways: you can over-constrain your model space so that the true model lies outside it; you can have too little data to reliably discern the best model within your chosen space; and you can terminate your search early and fail to reach that best model at all. These are often called approximation, estimation, and optimization error.
Almost all error in ML fits neatly into this decomposition. In particular, those who neglect to track validation accuracy often make their model space so large that they overfit: the data is too scarce to power the search within a space of that size.
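A hedged sketch of what tracking validation accuracy buys you: a held-out split selects model-space size (polynomial degree here), whereas training error alone always prefers the biggest space. The problem setup and degree range are illustrative assumptions, not from the book:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of sin(x), split into a training and a validation set.
x = rng.uniform(0, 2 * np.pi, 60)
y = np.sin(x) + rng.normal(0, 0.1, 60)
x_tr, y_tr = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

def val_mse(degree):
    # Fit on the training split, score on the held-out split.
    c = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(c, x_val) - y_val) ** 2)

# Pick the degree that minimizes held-out error, not training error.
best = min(range(1, 11), key=val_mse)
print(best, val_mse(best))
```

Training MSE decreases monotonically with degree, so minimizing it would always pick the largest space; the validation split is what penalizes overfitting and selects a moderate one.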
Devroye, Györfi, and Lugosi (http://www.amazon.com/Probabilistic-Recognition-Stochastic-M...) have a really great picture of this in their book.