For distributed systems I'd read Lynch's book on Distributed Algorithms.
For stats, David Freedman's book, Statistics, is a good simple introduction. Someone mentioned Calculus and Statistics (http://www.amazon.com/Calculus-Statistics-Dover-Books-Mathem...) here on HN a couple of weeks ago. I had looked at it when it came out and re-reviewed it after that thread. I like it a fair bit more than I remembered, and it is a better text for those looking for a more rigorous treatment than Freedman, although it is probably still too simple if you're reading Foundations of Statistical NLP.
There's also Scott's book on Programming Languages, which is worth reading.
This is silly. Every cumulative distribution function is cadlag, so how can you even teach probability without the notion of "right-continuous with left limits"? That means you have to resort to limits & derivatives => Calc.
Actually, the argument for combining Calc & Stats is very compelling, because there is so much synergy. How can you teach a continuous probability distribution like, say, the Gaussian without teaching how to integrate under the curve for the cumulative distribution function, or obtaining the probability density function via the derivative, or obtaining the variance (aka the second central moment) via the moment generating function, which means you now have to teach at least some transform theory (Laplace and Fourier transforms), which again means Calculus? At both UChicago & Stanford, where I learnt all of my probability, calculus was quite intertwined with the teaching of probability. I believe it's the same at most other schools as well.
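To make the integrate/differentiate point concrete, here is a rough Python sketch using only the standard library. The function names are my own, the Simpson's rule integrator and the 10-sigma cutoff for the tails are arbitrary choices, and the variance is computed by integrating (x - mu)^2 against the density directly rather than via the moment generating function.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian probability density function."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form Gaussian CDF via the error function, used as a reference."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def integrate(f, a, b, n=10_000):
    """Composite Simpson's rule on [a, b] with an even number of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def cdf_by_integration(x, mu=0.0, sigma=1.0):
    """CDF as the area under the pdf; the tail below mu - 10*sigma is ignored (assumption)."""
    lower = mu - 10.0 * sigma
    return integrate(lambda t: normal_pdf(t, mu, sigma), lower, x)

def pdf_by_differentiation(x, mu=0.0, sigma=1.0, h=1e-5):
    """pdf as the derivative of the CDF, approximated by a central difference."""
    return (normal_cdf(x + h, mu, sigma) - normal_cdf(x - h, mu, sigma)) / (2.0 * h)

def variance_by_integration(mu=0.0, sigma=1.0):
    """Second central moment: integral of (t - mu)^2 * pdf(t) over +/- 10 sigma."""
    return integrate(lambda t: (t - mu) ** 2 * normal_pdf(t, mu, sigma),
                     mu - 10.0 * sigma, mu + 10.0 * sigma)

if __name__ == "__main__":
    print(cdf_by_integration(1.0), normal_cdf(1.0))
    print(pdf_by_differentiation(0.5), normal_pdf(0.5))
    print(variance_by_integration(0.0, 2.0))
```

The printed pairs should agree closely, which is the whole point: the CDF is the integral of the density, and the density is the derivative of the CDF.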
Without calc in probability, you can do "lame" stuff like discrete distributions (Binomial, Poisson, etc.). But even there, the key insight is to show how the CDFs of the discrete distributions, which generally have terribly complicated formulae with giant factorial expressions, can be very nicely approximated by continuous distributions for large n, small p, etc., using a continuity correction (http://en.wikipedia.org/wiki/Continuity_correction). So for a large number of coin-flip trials, you use a Normal to approximate the CDF, because otherwise the original binomial CDF is too hard to compute on your TI-84 (you have one giant factorial divided by another giant factorial, and the numerical overflows will kill the computation unless you are very careful about how you go about computing the result).
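For what it's worth, here is a small Python sketch of both sides of that comparison. The function names are mine, and it assumes 0 < p < 1. The "very careful" exact computation works in log space with math.lgamma so the giant factorials never get formed, and the approximation is the usual Normal with mean n*p and variance n*p*(1-p), evaluated at k + 0.5 for the continuity correction.

```python
import math

def log_binom_pmf(k, n, p):
    """log of C(n, k) * p^k * (1-p)^(n-k), using lgamma to avoid huge factorials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summed term by term (assumes 0 < p < 1)."""
    return math.fsum(math.exp(log_binom_pmf(i, n, p)) for i in range(0, k + 1))

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf_normal_approx(k, n, p):
    """Normal approximation to P(X <= k) with the +0.5 continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return normal_cdf((k + 0.5 - mu) / sigma)

if __name__ == "__main__":
    n, p, k = 1000, 0.5, 510  # 1000 fair coin flips, at most 510 heads
    print("exact :", binom_cdf(k, n, p))
    print("approx:", binom_cdf_normal_approx(k, n, p))
```

For a fair coin and n in the thousands the two numbers should land very close together. The approximation gets worse when p is near 0 or 1 and n*p is small, which is where a Poisson approximation is the more usual choice.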
My favorite go-to guide remains the excellent Calc & Stat Dover book ( http://www.amazon.com/Calculus-Statistics-Dover-Books-Mathem... ), which combines Calc & Stats from page 1. There is simply no better way to learn stats than via calc.