Machine Learning: a Probabilistic Perspective, by Murphy
http://www.cs.ubc.ca/~murphyk/MLbook/
Pattern Classification, by Duda et al.
http://www.amazon.com/Pattern-Classification-Pt-1-Richard-Du...
The Elements of Statistical Learning, by Hastie et al. Available for free from Stanford.
http://www-stat.stanford.edu/~tibs/ElemStatLearn
Mining of Massive Datasets, free from Stanford.
http://infolab.stanford.edu/~ullman/mmds.html
Bayesian Reasoning and Machine Learning, by Barber, freely available online.
http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...
Learning from data, by Abu-Mostafa.
It comes with Caltech video lectures: http://work.caltech.edu/telecourse.html
Pattern Recognition and Machine Learning, by Bishop
http://research.microsoft.com/en-us/um/people/cmbishop/prml/
Also noteworthy:
Information Theory, Inference, and Learning Algorithms, by Mackay, free.
http://www.inference.phy.cam.ac.uk/itprnn/book.html
Classification, Parameter Estimation and State Estimation, by van der Heijden.
Computer Vision: Models, Learning, and Inference, by Prince, available for free
http://www.computervisionmodels.com/
Probabilistic Graphical Models, by Koller. Has an accompanying course on Coursera.
http://www.amazon.com/Pattern-Classification-2nd-Richard-Dud...
It is very pragmatic, covering algorithms for many machine learning and artificial intelligence topics (from fitting functions for classification or regression to search procedures). The authors have a strong industrial background in addition to their academic credentials.
For general machine learning, there are many, many books. A good intro is [1], and a more comprehensive, reference-style book is [2]. Frankly, at this point, even the scikit-learn documentation and user guide give a fairly good mathematical presentation of many algorithms. Another good reference book is [3].
Finally, I would also recommend supplementing that material with Bayesian analysis, which can address many of the same problems, or be intermixed with machine learning algorithms, and which is important for plenty of other reasons too (MCMC sampling, hierarchical regression, small-data problems). For that I would recommend [4] and [5].
Stay away from bootcamps or books or lectures that seem overly branded with “data science.” This usually means more focus on data pipeline tooling, data cleaning, shallow details about a specific software package, and side tasks like wrapping something in a webservice.
That stuff is extremely easy to learn on the job and usually needs to be tailored differently for every project or employer, so it's a relative waste of time to study up front unless it is the only way you can get a job.
[0]: < https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma... >
[1]: < https://www.amazon.com/Pattern-Classification-Pt-1-Richard-D... >
[2]: < https://www.amazon.com/Pattern-Recognition-Learning-Informat... >
[3]: < http://www.web.stanford.edu/~hastie/ElemStatLearn/ >
[4]: < http://www.stat.columbia.edu/~gelman/book/ >
[5]: < http://www.stat.columbia.edu/~gelman/arm/ >