I've heard that Google and Baidu essentially started at the same time, with the same algorithm discovery (PageRank). Maybe someone can comment on whether there was idea sharing or whether both teams derived it independently.
This. Highly recommend Russell & Norvig [1] for high-level intuition and motivation. Then Bishop's "Pattern Recognition and Machine Learning" [2] and Koller's PGM book [3] for the fundamentals.
Avoid MOOCs, but there are useful lecture videos, e.g. Hugo Larochelle on belief propagation [4].
FWIW, this is coming from someone who is a mechanical engineer by training but a self-taught programmer and AI researcher. I've been working in industry as an AI research engineer for ~6 years.
[1] https://www.amazon.com/Artificial-Intelligence-Modern-Approa...
[2] https://www.amazon.com/Pattern-Recognition-Learning-Informat...
[3] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
0. Milewski's "Category Theory for Programmers"[0]
1. Goldblatt's "Topoi"[1]
2. McLarty's "The Uses and Abuses of the History of Topos Theory"[2] (this does not require [1], it just undoes some historical assumptions made in [1] and, like everything else by McLarty, is extraordinarily well-written)
3. Goldblatt's "Lectures on the Hyperreals"[3]
4. Nelson's "Radically Elementary Probability Theory"[4]
5. Tao's "Ultraproducts as a Bridge Between Discrete and Continuous Analysis"[5]
6. Some canonical machine learning text, like Murphy[6] or Bishop[7]
7. Koller/Friedman's "Probabilistic Graphical Models"[8]
8. Lawvere's "Taking Categories Seriously"[9]
From there you should see a variety of paths for mapping (things:Uncertainty) <-> (things:Structure). The Giry monad is just one of them, and would probably be understandable after reading Barr/Wells' "Toposes, Triples and Theories"[10].
The above list also assumes some comfort with integration. Particularly good books in line with this pedagogical path might be:
9. Any and all canonical intros to real analysis
10. Malliavin's "Integration and Probability"[11]
11. Segal/Kunze's "Integrals and Operators"[12]
Similarly, some normative focus on probability would be useful:
12. Jaynes' "Probability Theory"[13]
13. Pearl's "Causality"[14]
---
[0] https://bartoszmilewski.com/2014/10/28/category-theory-for-p...
[1] https://www.amazon.com/Topoi-Categorial-Analysis-Logic-Mathe...
[2] http://www.cwru.edu/artsci/phil/UsesandAbuses%20HistoryTopos...
[3] https://www.amazon.com/Lectures-Hyperreals-Introduction-Nons...
[4] https://web.math.princeton.edu/%7Enelson/books/rept.pdf
[5] https://www.youtube.com/watch?v=IS9fsr3yGLE
[6] https://www.amazon.com/Machine-Learning-Probabilistic-Perspe...
[7] https://www.amazon.com/Pattern-Recognition-Learning-Informat...
[8] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
[9] http://www.emis.de/journals/TAC/reprints/articles/8/tr8.pdf
[10] http://www.tac.mta.ca/tac/reprints/articles/12/tr12.pdf
[11] https://www.springer.com/us/book/9780387944098
[12] https://www.amazon.com/Integrals-Operators-Grundlehren-mathe...
[13] http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...
[14] https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...
Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Flach): http://www.amazon.com/Machine-Learning-Science-Algorithms-Se...
Machine Learning: A Probabilistic Perspective (Murphy): http://www.amazon.com/Machine-Learning-Probabilistic-Perspec...
Pattern Recognition and Machine Learning (Bishop): http://www.amazon.com/Pattern-Recognition-Learning-Informati...
There are some great resources/books for Bayesian statistics and graphical models. I've listed them in (approximate) order of increasing difficulty/mathematical complexity:
Think Bayes (Downey): http://www.amazon.com/Think-Bayes-Allen-B-Downey/dp/14493707...
Bayesian Methods for Hackers (Davidson-Pilon et al): https://github.com/CamDavidsonPilon/Probabilistic-Programmin...
Doing Bayesian Data Analysis (Kruschke), aka "the puppy book": http://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/dp...
Bayesian Data Analysis (Gelman): http://www.amazon.com/Bayesian-Analysis-Chapman-Statistical-...
Bayesian Reasoning and Machine Learning (Barber): http://www.amazon.com/Bayesian-Reasoning-Machine-Learning-Ba...
Probabilistic Graphical Models (Koller et al): https://www.coursera.org/course/pgm http://www.amazon.com/Probabilistic-Graphical-Models-Princip...
If you want a more mathematical/statistical take on Machine Learning, then the two books by Hastie/Tibshirani et al are definitely worth a read (plus, they're free to download from the authors' websites!):
Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
The Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/
Obviously there is the whole field of "deep learning" as well! A good place to start is with: http://deeplearning.net/
[1] http://www.amazon.com/dp/B00IOY8XWQ/ref=fs_kv
[2] http://www.amazon.com/Probabilistic-Graphical-Models-Princip...
Probably also canonical are Goodfellow's Deep Learning [2], Koller & Friedman's PGMs [3], the Krizhevsky ImageNet paper [4], the original GAN [5], and arguably also the AlphaGo paper [6] and the Atari DQN paper [7].
[1] https://aima.cs.berkeley.edu/
[2] https://www.deeplearningbook.org/
[3] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
[4] https://proceedings.neurips.cc/paper_files/paper/2012/file/c...
[5] https://arxiv.org/abs/1406.2661
[6] https://www.nature.com/articles/nature16961
[7] https://www.nature.com/articles/nature14236