PaulHoule · 2021-10-12
This book is a little old, but you should learn whatever math it takes to understand it.

Another oldie-but-goodie is

which explains when to stop training when you're doing "early stopping", something I've seen many modern deep learning practitioners get wrong.
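For readers who haven't implemented early stopping, here is a minimal sketch of one common rule: stop once validation loss has gone `patience` epochs without improving by more than `min_delta`, then roll back to the best checkpoint. This is a widely used heuristic, not necessarily the criterion the book above recommends. The function name `early_stop` and its parameters are hypothetical, chosen for illustration. It replays a list of per-epoch validation losses so that the rule can be inspected on its own, separate from any training loop.

```python
from typing import Sequence, Tuple


def early_stop(val_losses: Sequence[float], patience: int = 3,
               min_delta: float = 0.0) -> Tuple[int, int]:
    """Apply a patience-based early-stopping rule (hypothetical helper).

    Returns (best_epoch, stop_epoch): the epoch whose weights you would
    restore, and the epoch at which training would halt.
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Meaningful improvement: remember this checkpoint, reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # No improvement for `patience` consecutive epochs: halt here.
                return best_epoch, epoch
    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(val_losses) - 1
```

For example, `early_stop([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6], patience=3)` returns `(2, 5)`. Training halts at epoch 5 and restores epoch 2, so it never sees the lower loss at epoch 6. With `patience=5` the same run returns `(6, 6)`. The sensitivity to `patience` shown here is part of why choosing when to stop is harder than it looks.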

Off the top of my head, I'd say the fundamental math behind deep networks is not really new. Most of the work in that field is fairly ad hoc and not much is proven; the people who are proving things are probably using difficult graduate-level math, but you don't need to go there.

