https://www.amazon.com/Networks-Recognition-Advanced-Econome...
Another oldie-but-goodie is
https://www.amazon.com/Neural-Networks-Lecture-Computer-Scie...
which tells the secret of when to stop when you're doing "early stopping", something I've seen many modern deep learners fail to get right.
Off the top of my head, I would say the fundamental math behind deep networks is not really new. Most of the work in that field is pretty ad hoc and not a lot is proven; the people who are proving things are probably using difficult graduate-level math, but you don't need to go there.