0. Milewski's "Category Theory for Programmers"[0]
1. Goldblatt's "Topoi"[1]
2. McLarty's "The Uses and Abuses of the History of Topos Theory"[2] (this does not require [1]; it just undoes some of the historical assumptions made in [1] and, like everything else by McLarty, is extraordinarily well written)
3. Goldblatt's "Lectures on the Hyperreals"[3]
4. Nelson's "Radically Elementary Probability Theory"[4]
5. Tao's "Ultraproducts as a Bridge Between Discrete and Continuous Analysis"[5]
6. Some canonical machine learning text, like Murphy[6] or Bishop[7]
7. Koller/Friedman's "Probabilistic Graphical Models"[8]
8. Lawvere's "Taking Categories Seriously"[9]
From there you should see a variety of paths for mapping (things:Uncertainty) <-> (things:Structure). The Giry monad is just one of them, and would probably be understandable after reading Barr/Wells' "Toposes, Triples and Theories"[10].
The above list also assumes some comfort with integration. Particularly good books in line with this pedagogical path might be:
9. Any and all canonical intros to real analysis
10. Malliavin's "Integration and Probability"[11]
11. Segal/Kunze's "Integrals and Operators"[12]
Similarly, some normative focus on probability would be useful:
12. Jaynes' "Probability Theory"[13]
13. Pearl's "Causality"[14]
---
[0] https://bartoszmilewski.com/2014/10/28/category-theory-for-p...
[1] https://www.amazon.com/Topoi-Categorial-Analysis-Logic-Mathe...
[2] http://www.cwru.edu/artsci/phil/UsesandAbuses%20HistoryTopos...
[3] https://www.amazon.com/Lectures-Hyperreals-Introduction-Nons...
[4] https://web.math.princeton.edu/%7Enelson/books/rept.pdf
[5] https://www.youtube.com/watch?v=IS9fsr3yGLE
[6] https://www.amazon.com/Machine-Learning-Probabilistic-Perspe...
[7] https://www.amazon.com/Pattern-Recognition-Learning-Informat...
[8] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
[9] http://www.emis.de/journals/TAC/reprints/articles/8/tr8.pdf
[10] http://www.tac.mta.ca/tac/reprints/articles/12/tr12.pdf
[11] https://www.springer.com/us/book/9780387944098
[12] https://www.amazon.com/Integrals-Operators-Grundlehren-mathe...
[13] http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...
[14] https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...
http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pe...
2. The Boardman Tasker Omnibus (http://www.amazon.co.uk/Boardman-Tasker-Omnibus-Peter/dp/189...)
The broad idea in (a) is to start with a fully connected graph and eliminate the edge between any two nodes that test as independent, either marginally or conditionally on other nodes. This gives you an undirected graph, which can then be oriented by several methods (identifying V-structures, or comparing the residuals of regressing X on Y against those of regressing Y on X).
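The edge-elimination step described above can be sketched roughly as follows. This is a simplified illustration, not a full implementation of any published algorithm: it assumes linear-Gaussian data, uses a Fisher z-test on partial correlations as the independence test, and conditions on subsets of all other variables rather than only on current neighbours. The names `partial_corr`, `independent`, `skeleton` and `simulate_chain` are hypothetical helpers invented for this example, and the orientation step is left out.

```python
import itertools
import math

import numpy as np


def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected columns
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def independent(data, i, j, cond, alpha):
    """Fisher z-test; True means we fail to reject (conditional) independence."""
    n = data.shape[0]
    r = partial_corr(data, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def skeleton(data, alpha=0.01, max_cond=2):
    """Start fully connected, then drop every edge that tests as independent."""
    d = data.shape[1]
    edges = {frozenset(p) for p in itertools.combinations(range(d), 2)}
    for size in range(max_cond + 1):
        for edge in sorted(edges, key=sorted):  # iterate over a copy
            i, j = sorted(edge)
            others = [k for k in range(d) if k not in edge]
            for cond in itertools.combinations(others, size):
                if independent(data, i, j, cond, alpha):
                    edges.discard(edge)
                    break
    return edges


def simulate_chain(n, seed):
    """Toy data from the chain X -> Y -> Z (hypothetical example model)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)
    z = y + rng.normal(size=n)
    return np.column_stack([x, y, z])


if __name__ == "__main__":
    data = simulate_chain(n=2000, seed=0)
    print(skeleton(data, alpha=1e-4))
```

On the toy chain, X and Z are dependent marginally but independent given Y, so the X-Z edge is the one the procedure is meant to remove.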
The theory in (b) actually generalizes instrumental variables: it lays out the graphical configurations in which you can measure the causal effect of one variable on another, and how to compute that effect.
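As one concrete illustration of "a configuration plus a formula", here is a small sketch of the back-door adjustment from Pearl's framework, which is only one of the configurations that theory covers. It assumes a single observed binary confounder Z with Z -> X, Z -> Y and X -> Y, and every probability in the tables is made up for the example. The function names `backdoor` and `naive` are hypothetical.

```python
import numpy as np

# Hypothetical model: Z confounds X and Y (Z -> X, Z -> Y, X -> Y), all binary.
p_z = np.array([0.5, 0.5])               # P(Z=z)
p_x1_given_z = np.array([0.2, 0.8])      # P(X=1 | Z=z)
p_y1_given_xz = np.array([[0.1, 0.5],    # P(Y=1 | X=0, Z=z)
                          [0.3, 0.7]])   # P(Y=1 | X=1, Z=z)


def backdoor(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)."""
    return float(np.sum(p_y1_given_xz[x] * p_z))


def naive(x):
    """P(Y=1 | X=x), which weights z by P(z | x) and so mixes in confounding."""
    p_x_given_z = p_x1_given_z if x == 1 else 1 - p_x1_given_z
    p_z_given_x = p_x_given_z * p_z / np.sum(p_x_given_z * p_z)
    return float(np.sum(p_y1_given_xz[x] * p_z_given_x))


if __name__ == "__main__":
    print("adjusted effect:", backdoor(1) - backdoor(0))
    print("naive contrast: ", naive(1) - naive(0))
```

With these numbers the adjusted effect is 0.2 while the naive conditional contrast is 0.44; the gap is the part of the association that runs through Z rather than from X to Y.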
A great reference: https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...
A nice introduction: https://www.youtube.com/watch?v=RPgvfSeQB8A