This. Highly recommend Russell & Norvig for high-level intuition and motivation. Then Bishop's "Pattern Recognition and Machine Learning" and Koller's PGM book for the fundamentals.
Avoid MOOCs, but there are useful lecture videos, e.g. Hugo Larochelle on belief propagation.
FWIW this is coming from a mechanical engineer by training, but self-taught programmer and AI researcher. I've been working in industry as an AI research engineer for ~6 years.
-  Pattern Recognition and Machine Learning (Information Science and Statistics)
-  The Elements of Statistical Learning
-  Reinforcement Learning: An Introduction by Sutton and Barto
-  Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
-  Neural Network Methods for Natural Language Processing (Synthesis Lectures on Human Language Technologies) by Yoav Goldberg
Then some math tidbits:
-  Introduction to Linear Algebra by Strang
-  [PDF](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%...)
-  [amz](https://www.amazon.com/Reinforcement-Learning-Introduction-A...)
-  [site](https://www.deeplearningbook.org/)
-  [amz](https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma...)
-  [pdf](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
-  [amz](https://www.amazon.com/Language-Processing-Synthesis-Lecture...)
-  [amz](https://www.amazon.com/Introduction-Linear-Algebra-Gilbert-S...)
For general machine learning, there are many, many books. A good intro is  and a more comprehensive, reference sort of book is . Frankly, by this point, even the documentation and user guide of scikit-learn give a fairly good mathematical presentation of many algorithms. Another good reference book is .
Finally, I would also recommend supplementing some of that stuff with Bayesian analysis, which can address many of the same problems, or be intermixed with machine learning algorithms, but which is important for a lot of other reasons too (MCMC sampling, hierarchical regression, small data problems). For that I would recommend  and .
Stay away from bootcamps or books or lectures that seem overly branded with “data science.” This usually means more focus on data pipeline tooling, data cleaning, shallow details about a specific software package, and side tasks like wrapping something in a webservice.
That stuff is extremely easy to learn on the job and usually needs to be tailored differently for every different project or employer, so it’s a relative waste of time unless it is the only way you can get a job.
: < https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma... >
: < https://www.amazon.com/Pattern-Classification-Pt-1-Richard-D... >
: < https://www.amazon.com/Pattern-Recognition-Learning-Informat... >
: < http://www.web.stanford.edu/~hastie/ElemStatLearn/ >
: < http://www.stat.columbia.edu/~gelman/book/ >
: < http://www.stat.columbia.edu/~gelman/arm/ >
First, you need a strong mathematical base. Otherwise, you can copy-paste an algorithm or use an API, but you will not get any idea of what is happening inside.
The following concepts are essential:
1) Linear Algebra (MIT https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra... )
2) Probability (Harvard https://www.youtube.com/watch?v=KbB0FjPg0mw )
Next, get a basic grasp of machine learning and a good intuition for the basic concepts:
1) Andrew Ng coursera course (https://www.coursera.org/learn/machine-learning)
2) Tom Mitchell book (https://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/00...)
Both the above course and book are super easy to follow. You will get a good idea of the basic concepts, but they lack depth. Now you should move to more intense books and courses.
You can get more in-depth knowledge of machine learning from the following sources:
1) Nando de Freitas' machine learning course ( https://www.youtube.com/watch?v=w2OtwL5T1ow )
2) Bishop's book (https://www.amazon.in/Pattern-Recognition-Learning-Informati...)
Bishop's book in particular is really deep and covers almost all the basic concepts.
Now for recent advances in Deep learning. I will suggest two brilliant courses from Stanford
1) Vision ( https://www.youtube.com/watch?v=NfnWJUyUJYU )
2) NLP ( https://www.youtube.com/watch?v=OQQ-W_63UgQ)
The Vision course by Karpathy can be a very good introduction to deep learning. Also, the mother book for deep learning ( http://www.deeplearningbook.org/ ) is good.
After you've got a grasp of what these things are doing then you can move into the how. For that you will need some math background, with emphasis in calculus and probability.
After that, you can take a look at PRML. https://www.amazon.com/Pattern-Recognition-Learning-Informat...
Some people might prefer seeing things from another approach. http://pgm.stanford.edu/
I haven't read much of this early access book yet, but I'd give the authors a lot of benefit of the doubt. Christopher Bishop wrote one of my favorite machine learning books (I read it after my graduate study in machine learning and it filled in a lot of the gaps): https://www.amazon.com/Pattern-Recognition-Learning-Informat...
If you are just learning programming, plan on taking your time with the algorithms but practice coding every day. Find a fun project to attempt that is within your level of skill.
If you are a strong programmer in one language, find a book of algorithms using that language (some of the suggestions here in these comments are excellent). I list some of the books I like at the end of this comment.
If you are an experienced programmer, one algorithm per day is roughly doable. Especially so, because you are trying to learn one algorithm per day, not produce working, production level code for each algorithm each day.
Some algorithms are really families of algorithms and can take more than a day of study; hash-based lookup tables come to mind. First there are the hash functions themselves. That would be day one. Next there are several alternatives for storing entries in the hash table, e.g. open addressing vs. chaining: days two and three. Then there are methods for handling collisions, linear probing, secondary hashing, etc.; that's day four. Finally there are important variations, perfect hashing, cuckoo hashing, robin hood hashing, and so forth; maybe another 5 days. Some languages are less appropriate for playing around and can make working with algorithms more difficult; instead of a couple of weeks this could easily take twice as long. After learning other methods of implementing fast lookups, it's time to come back to hashing and understand when it's appropriate and when alternatives are better, and to understand how to combine methods for more sophisticated lookup methods.
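To make the "chaining vs. open addressing" distinction concrete, here is a minimal Python sketch of both storage strategies. The class names (`ChainedTable`, `LinearProbingTable`), the starting capacity, and the 1/2 load-factor threshold are illustrative assumptions, not taken from any of the books mentioned here; deletion (which needs tombstones under open addressing) is deliberately left out.

```python
class ChainedTable:
    """Separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        # The hash function picks the bucket; collisions share a list.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


class LinearProbingTable:
    """Open addressing: entries live in the slot array itself, and a
    collision is resolved by stepping to the next slot (linear probing)."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.size = 0

    def _find(self, key):
        # Walk forward until we hit the key or an empty slot. This
        # terminates because put() keeps the table at most half full.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def _grow(self):
        # Double the slot array and reinsert every entry.
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (2 * len(self.slots))
        self.size = 0
        for k, v in old:
            self.put(k, v)

    def put(self, key, value):
        if 2 * (self.size + 1) > len(self.slots):
            self._grow()
        i = self._find(key)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._find(key)]
        return default if slot is None else slot[1]
```

Both tables expose the same `put`/`get` interface, so swapping one for the other in a small benchmark is an easy way to spend "days two and three". Chaining tolerates high load factors at the cost of extra allocations, while linear probing keeps everything in one array but degrades quickly as the table fills.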
I think you will be best served by modifying your goal a bit and saying that you will work on learning about algorithms every day and cover all of the material in a typical undergraduate course on the subject. It really is a fun branch of Computer Science.
A great starting point is Sedgewick's book/course, Algorithms. For more depth and theory try Cormen and Leiserson's excellent Introduction to Algorithms. Alternatively the theory is also covered by another book by Sedgewick, An Introduction to the Analysis of Algorithms. A classic reference that goes far beyond these other books is of course Knuth, suitable for serious students of Computer Science, less so as a book of recipes.
After these basics, there are books useful for special circumstances. If your goal is to be broadly and deeply familiar with Algorithms you will need to cover quite a bit of additional material.
Numerical methods -- Numerical Recipes 3rd Edition: The Art of Scientific Computing by Teukolsky and Vetterling. I love this book.
Randomized algorithms -- Randomized Algorithms by Motwani and Raghavan; Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher.
Hard problems (like NP) -- Approximation Algorithms by Vazirani. How to Solve It: Modern Heuristics by Michalewicz and Fogel.
Data structures -- Advanced Data Structures by Brass.
Functional programming -- Pearls of Functional Algorithm Design by Bird and Purely Functional Data Structures by Okasaki.
Bit twiddling -- Hacker's Delight by Warren.
Distributed and parallel programming -- this material gets very hard so perhaps Distributed Algorithms by Lynch.
Machine learning and AI related algorithms -- Bishop's Pattern Recognition and Machine Learning and Norvig's Artificial Intelligence: A Modern Approach.
These books will cover most of what a Ph.D. in CS might be expected to understand about algorithms. It will take years of study to work through all of them. After that, you will be reading about algorithms in journal publications (ACM and IEEE memberships are useful). For example, a recent, practical, and important development in hashing methods is called cuckoo hashing, and I don't believe that it appears in any of the books I've listed.
 Sedgewick, Algorithms, 2015. https://www.amazon.com/Algorithms-Fourth-Deluxe-24-Part-Lect...
 Cormen, et al., Introduction to Algorithms, 2009. https://www.amazon.com/s/ref=nb_sb_ss_i_1_15?url=search-alia...
 Sedgewick, An Introduction to the Analysis of Algorithms, 2013. https://www.amazon.com/Introduction-Analysis-Algorithms-2nd/...
 Knuth, The Art of Computer Programming, 2011. https://www.amazon.com/Computer-Programming-Volumes-1-4A-Box...
 Teukolsky and Vetterling, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007. https://www.amazon.com/Numerical-Recipes-3rd-Scientific-Comp...
 Vazirani, https://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazi...
 Michalewicz and Fogel, https://www.amazon.com/How-Solve-Heuristics-Zbigniew-Michale...
 Brass, https://www.amazon.com/Advanced-Data-Structures-Peter-Brass/...
 Bird, https://www.amazon.com/Pearls-Functional-Algorithm-Design-Ri...
 Okasaki, https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...
 Warren, https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
 Lynch, https://www.amazon.com/Distributed-Algorithms-Kaufmann-Manag...
 Bishop, https://www.amazon.com/Pattern-Recognition-Learning-Informat...
 Norvig, https://www.amazon.com/Artificial-Intelligence-Modern-Approa...
0. Milewski's "Category Theory for Programmers"
1. Goldblatt's "Topoi"
2. McLarty's "The Uses and Abuses of the History of Topos Theory" (this does not require , it just undoes some historical assumptions made in  and, like everything else by McLarty, is extraordinarily well-written)
3. Goldblatt's "Lectures on the Hyperreals"
4. Nelson's "Radically Elementary Probability Theory"
5. Tao's "Ultraproducts as a Bridge Between Discrete and Continuous Analysis"
6. Some canonical machine learning text, like Murphy or Bishop
7. Koller/Friedman's "Probabilistic Graphical Models"
8. Lawvere's "Taking Categories Seriously"
From there you should see a variety of paths for mapping (things:Uncertainty) <-> (things:Structure). The Giry monad is just one of them, and would probably be understandable after reading Barr/Wells' "Toposes, Triples and Theories".
The above list also assumes some comfort with integration. Particularly good books in line with this pedagogical path might be:
9. Any and all canonical intros to real analysis
10. Malliavin's "Integration and Probability"
11. Segal/Kunze's "Integrals and Operators"
Similarly, some normative focus on probability would be useful:
12. Jaynes' "Probability Theory"
13. Pearl's "Causality"
Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Flach):
Machine Learning: A Probabilistic Perspective (Murphy):
Pattern Recognition and Machine Learning (Bishop):
There are some great resources/books for Bayesian statistics and graphical models. I've listed them in (approximate) order of increasing difficulty/mathematical complexity:
Think Bayes (Downey):
Bayesian Methods for Hackers (Davidson-Pilon et al):
Doing Bayesian Data Analysis (Kruschke), aka "the puppy book":
Bayesian Data Analysis (Gelman):
Bayesian Reasoning and Machine Learning (Barber):
Probabilistic Graphical Models (Koller et al):
If you want a more mathematical/statistical take on Machine Learning, then the two books by Hastie/Tibshirani et al are definitely worth a read (plus, they're free to download from the authors' websites!):
Introduction to Statistical Learning:
The Elements of Statistical Learning:
Obviously there is the whole field of "deep learning" as well! A good place to start is with: http://deeplearning.net/
Also, if you need more information about optimization methods, all of Stephen Boyd's books are really good; just check out his website: http://www.stanford.edu/~boyd/
Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
Which goes nicely with the Weka open source ML toolkit
(although it is a good read without the toolkit)
If you want a bit more math, I really like the recent (Oct 2007) book:
Pattern Recognition and Machine Learning
by Christopher M. Bishop
It is nicely self-contained, going through all the stats you'll need.