http://www.amazon.com/Nonlinear-Dynamics-And-Chaos-Applicati...
In fact, everything by that author is great, buy and read it all:)
http://www.amazon.com/Steven-H.-Strogatz/e/B001KHB290/ref=dp...
Just make sure you've read up on your differential equations before you start:) This isn't a topic for someone without a strong background in mathematics!
Unfortunately I have yet to find a good introductory course, video, book, PDF... anything for this topic. I had to jump in with a specific model in mind and keep banging away until I iterated my way to a solution.
Suggestions for introductory material are welcome!!!
A field that does inspire a lot of deep learning folks but never gets mentioned in this sort of thing is the theory of physical dynamical systems. "Attractor" is a term that came from there, for example, and much of the mathematics behind the numerical fuckery inside deep nets is dynamical in nature. RNNs are entirely dynamical systems. The classic there is Strogatz's book (https://www.amazon.com/Nonlinear-Dynamics-Chaos-Applications...).
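To make the "RNNs are dynamical systems" point concrete, here's a toy sketch of my own (not from Strogatz or any DL paper): the recurrent update h ← tanh(W·h) is literally an iterated map, and if you scale W to be a contraction, every initial state gets pulled to the same fixed-point attractor.

```python
import numpy as np

# An RNN state update h_{t+1} = tanh(W @ h_t) viewed as a discrete
# dynamical system. Scaling W so its operator norm is 0.5 makes the map
# a contraction, so the origin is a globally attracting fixed point.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
W *= 0.5 / np.linalg.norm(W, 2)  # force operator norm to 0.5

def iterate(h, steps=200):
    """Run the recurrence for `steps` iterations from initial state h."""
    for _ in range(steps):
        h = np.tanh(W @ h)
    return h

a = iterate(rng.normal(size=3))
b = iterate(rng.normal(size=3))
# Two different random initial states end up at the same attractor.
print(np.allclose(a, b, atol=1e-6))
```

Vanishing/exploding gradients are the same story: whether the Jacobian of that map is contracting or expanding along a trajectory is exactly a question of dynamical stability.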
There is also information theory, of course, which is part of the MacKay source.
Many of the earlier papers in deep learning-land are really nontrivial to read, because the terminology and everybody's worldview have changed so much. So reading original Werbos or Rumelhart is really difficult. This is really not the case for Sutton and Barto, "RL: An Introduction" (http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html). There are two editions; apparently the second edition is basically getting with the program on shoving DL into everything.
Schmidhuber often mentions that Gauss was the original shallow learner. This is a technically correct statement (the best kind of statement), but you should definitely know linear and logistic regression like the back of your hand before diving too deep into DL.