Modern Multivariate Statistical Techniques: Regression, Classificatio…

dxbydt · 2012-12-18 · Original thread

In the textbook "Modern Multivariate Statistical Techniques" by Izenman (http://www.amazon.com/gp/product/0387781889/), he claims that stats & ML progressed in parallel. So traditional stats techniques like OLS, multiple regression, nonlinear regression, logistic regression, and GLMs are generally not covered in ML. Similarly, ML topics like k-means, SVMs, random forests, etc. are not taught by the stats dept.

What has been happening over the past decade is a convergence of stats & ML, primarily driven by data scientists working in the domain of big data. The stats folks are slowly incorporating ML techniques into stats & finding rigorous heuristics for when they should be employed. Similarly, ML guys, who are mostly CS folk who unfortunately have taken only one undergraduate course on stats & probability, are discovering that you can do so much more without resorting to needless large-scale computation, by sampling intelligently & leveraging plain old statistics.

This schism between stats & ML can be leveraged very profitably during interviews :))

When I interview data science folks, I usually ask very simple questions from plain stats: how would you tell if a distribution is skewed... if you have an rv X with mean mu, and say rv Y = X - mu, then what is the mean of Y... if you have an rv A with mean 0 and variance 1, then what are the chances of being 3 standard deviations away from the mean if you have no clue about the distribution of A? What if you knew A was unimodal? What if A is now normally distributed?
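For reference, the three "3 standard deviations" questions have standard textbook answers. With no knowledge of the distribution, Chebyshev's inequality gives P(|A - mean| >= k sd) <= 1/k^2. If A is known to be unimodal, the Vysochanskij-Petunin inequality tightens that to 4/(9 k^2) for k >= sqrt(8/3). If A is normal, the two-sided tail can be computed exactly. A minimal Python sketch, assuming finite variance throughout; the function names are my own, not from the comment or from Izenman's book:

```python
import math

def chebyshev_tail_bound(k):
    """Upper bound on P(|A - mean| >= k*sd) for ANY distribution
    with finite variance (Chebyshev's inequality)."""
    return min(1.0, 1.0 / k**2)

def unimodal_tail_bound(k):
    """Upper bound on P(|A - mean| >= k*sd) for a UNIMODAL distribution
    with finite variance (Vysochanskij-Petunin inequality)."""
    if k >= math.sqrt(8.0 / 3.0):
        return 4.0 / (9.0 * k**2)
    # For smaller k the inequality takes a different form.
    return min(1.0, 4.0 / (3.0 * k**2) - 1.0 / 3.0)

def normal_two_sided_tail(k):
    """Exact P(|Z| >= k) for a standard normal Z, via the
    complementary error function."""
    return math.erfc(k / math.sqrt(2.0))

if __name__ == "__main__":
    k = 3
    print(f"no assumptions: <= {chebyshev_tail_bound(k):.4f}")
    print(f"unimodal:       <= {unimodal_tail_bound(k):.4f}")
    print(f"normal:          = {normal_two_sided_tail(k):.4f}")
```

At k = 3 this works out to at most 1/9 (about 11.1%), at most 4/81 (about 4.9%), and about 0.27% respectively. The first two are worst-case bounds, not estimates, which is why each extra assumption shrinks the number so sharply.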

Now if it's a stats guy, I ask very simple ML... what is a perceptron, have you heard of a neural network, etc.

Surprisingly, the stats guys do much better on ML than the ML guys do on stats!
