Found in 1 comment on Hacker News
dxbydt · 2012-12-18 · Original thread
In the textbook "multivariate stats" by Izenman (http://www.amazon.com/gp/product/0387781889/), he claims that stats & ML progressed in parallel. So traditional stats techniques like OLS, multiple regression, nonlinear regression, logistic regression and GLMs are generally not covered in ML. Similarly, ML topics like k-means, SVMs, random forests etc. are not taught by the stats dept.

What has been happening over the past decade is a convergence of stats & ML, primarily driven by data scientists working in the domain of big data. The stats folks are slowly incorporating ML techniques into stats & finding rigorous heuristics for when they should be employed. Similarly the ML guys, who are mostly CS folk who unfortunately have taken only 1 undergraduate course on stats & probability, are discovering you can do so much more without resorting to needless large-scale computation, by sampling intelligently & leveraging plain old statistics.

This schism between stats & ML can be leveraged very profitably during interviews :))

When I interview data science folks, I usually ask very simple questions from plain stats. How would you tell if a distribution is skewed? If you have an rv X with mean mu, and say rv Y = X - mu, then what is the mean of Y? If you have an rv A with mean 0 and variance 1, then what are the chances of being at least 3 standard deviations away from the mean if you have no clue about the distribution of A? What if you knew A was unimodal? What if A is now normally distributed?
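For anyone who wants to check their answers, here is a rough sketch in Python (my choice of language, and the function names are made up for illustration). It assumes the intended answers to the three "3 standard deviations" questions are Chebyshev's inequality, the Vysochanskij-Petunin inequality for unimodal distributions, and the exact normal tail; the comment itself does not name them. The mean of Y = X - mu is just E[X] - mu = 0 by linearity, and one common skewness check is the sign of the standardized third moment.

```python
import math

def chebyshev_bound(k):
    """Upper bound on P(|A - mean| >= k*sd) for ANY distribution with finite variance."""
    return min(1.0, 1.0 / k**2)

def vysochanskij_petunin_bound(k):
    """Upper bound on the same probability when the distribution is unimodal.

    This simple form of the bound only holds for k > sqrt(8/3).
    """
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("this form of the bound requires k > sqrt(8/3)")
    return 4.0 / (9.0 * k**2)

def normal_two_sided_tail(k):
    """Exact P(|Z| >= k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))

def sample_skewness(xs):
    """Standardized third central moment: > 0 suggests right skew, < 0 left skew."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

k = 3
print(chebyshev_bound(k))             # 1/9, about 0.111
print(vysochanskij_petunin_bound(k))  # 4/81, about 0.049
print(normal_two_sided_tail(k))       # about 0.0027
```

So each extra piece of information tightens the answer: at most about 11% with no assumptions, at most about 5% if unimodal, and about 0.27% if normal.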

Now if it's a stats guy, I ask very simple ML... what is a perceptron, have you heard of a neural network etc.

Surprisingly, the stats guys do much better on ML than the ML guys on stats!