I do not believe "you need to understand all these deep and hard concepts before you start to touch ML." That is a contortion of what I said.
First point: ML is not a young field- term was coined in 1959. Not to mention the ideas are much older. *
Second Point: ML/'AI' relies on a slew of various concepts in maths. Take any 1st year textbook -- i personally like Peter Norvig's. I find the breadth of the field quite astounding.
Third Point: Most PhDs are specialists-- aka, if I am getting a PhD in ML, i specialize in a concrete problem domain/subfield, so I can specialize in all subfields. For example, I work on event detection and action recognition in video models. Before being accepted into a PhD you must pass a Qual, which ensures you understand the foundations of the field. So comparing to this is a straw man argument.
If your definition of ML is taking a TF model and running it, then I believe we have diverging assumptions of what the point of a course in ML is. Imo the point of an undergraduate major is to become acquainted with the field and be able to perform reasonably well in it professionally.
The reason why so many companies (Google,FB,MS etc) are paying for this talent, is that it is not easy to learn and takes time to master. Most people who just touch ML have a surface level understanding.
I have seen people who excel at TF (applied to deep learning) without having an ML background, but even they have issues when it comes to understanding concepts in optimization, convergence, model capacity that have huge bearings on how their models perform.
If no, what are the great resources for starters?
The videos / slides / assignments from here:
Any tips before I get this journey going?
Depending on your maths background, you may need to refresh some math skills, or learn some new ones. The basic maths you need includes calculus (including multi-variable calc / partial derivatives), probability / statistics, and linear algebra. For a much deeper discussion of this topic, see this recent HN thread:
Luckily there are tons of free resources available online for learning various maths topics. Khan Academy isn't a bad place to start if you need that. There are also tons of good videos on Youtube from Gilbert Strang, Professor Leonard, 3blue1brown, etc.
Also, check out Kaggle.com. Doing Kaggle contests can be a good way to get your feet wet.
And the various Wikipedia pages on AI/ML topics can be pretty useful as well.
As you can see in my "tag" on my post - most of what I have learned came from these courses:
1. AI Class / ML Class (Stanford-sponsored, Fall 2011)
2. Udacity CS373 (2012) - https://www.udacity.com/course/artificial-intelligence-for-r...
3. Udacity Self-Driving Car Engineer Nanodegree (currently taking) - https://www.udacity.com/course/self-driving-car-engineer-nan...
For the first two (AI and ML Class) - these two MOOCs kicked off the founding of Udacity and Coursera (respectively). The classes are also available from each:
Udacity: Intro to AI (What was "AI Class"):
Coursera: Machine Learning (What was "ML Class"):
Now - a few notes: For any of these, you'll want a good understanding of linear algebra (mainly matrices/vectors and the math to manipulate them), stats and probabilities, and to a lessor extent, calculus (basic info on derivatives). Khan Academy or other sources can get you there (I think Coursera and Udacity have courses for these, too - plus there are a ton of other MOOCs plus MITs Open Courseware).
Also - and this is something I haven't noted before - but the terms "Artificial Intelligence" and "Machine Learning" don't necessarily mean the same thing. Based on what I have learned, it seems like artificial intelligence mainly revolves around modern understandings of artificial neural networks and deep learning - and is a subset of machine learning. Machine learning, though, also encompasses standard "algorithmic" learning techniques, like logistic and linear regression.
The reason why neural networks is a subset of ML, is because a trained neural network ultimately implements a form of logistic (categorization, true/false, etc) or linear regression (range) - depending on how the network is set up and trained. The power of a neural network comes from not having to find all of the dependencies (iow, the "function"); instead the network learns them from the data. It ends up being a "black box" algorithm, but it allows the ability to work with datasets that are much larger and more complex than what the algorithmic approaches allow for (that said, the algorithmic approaches are useful, in that they use much less processing power and are easier to understand - no use attempting to drive a tack with a sledgehammer).
With that in mind, the sequence to learn this stuff would probably be:
1. Make sure you understand your basics: Linear Algebra, stats and probabilities, and derivatives
2. Take a course or read a book on basic machine learning techniques (linear regression, logistic regression, gradient descent, etc).
3. Delve into simple artificial neural networks (which may be a part of the machine learning curriculum): understand what feed-forward and back-prop are, how a simple network can learn logic (XOR, AND, etc), how a simple network can answer "yes/no" and/or categorical questions (basic MNIST dataset). Understand how they "learn" the various regression algorithms.
4. Jump into artificial intelligence and deep learning - implement a simple neural network library, learn tensorflow and keras, convolutional networks, and so forth...
Now - regarding self-driving vehicles - they necessarily use all of the above, and more - including more than a bit of "mechanical" techniques: Use OpenCV or another machine vision library to pick out details of the road and other objects - which might then be able to be processed by a deep learning CNN - ex: Have a system that picks out "road sign" object from a camera, then categorizes them to "read" them and use the information to make decisions on how to drive the car (come to a stop, or keep at a set speed). In essence, you've just made a portion of Tesla's vehicle assist system (first project we did in the course I am taking now was to "follow lane lines" - the main ingredient behind "lane assist" technology - used nothing but OpenCV and Python). You'll also likely learn stuff about Kalman filters, pathfinding algos, sensor fusion, SLAM, PID controllers, etc.
I can't really recommend any books to you, given my level of knowledge. I've read more than a few, but most of them would be considered "out of date". One that is still being used in university level courses is this:
Note that it is a textbook, with textbook pricing...
Another one that I have heard is good for learning neural networks with is:
There are tons of other resources online - the problem is separating the wheat from the chaff, because some of the stuff is outdated or even considered non-useful. There are many research papers out there that can be bewildering. I would say if you read them, until you know which is what, take them all with a grain of salt - research papers and web-sites alike. There's also the problem of finding diamonds in the rough (for instance, LeNet was created in the 1990s - but that was also in the middle of an AI winter, and some of the stuff written at the time isn't considered as useful today - but LeNet is a foundational work of today's ML/AI practices).
Now - history: You would do yourself good to understand the history of AI and ML, the debates, the arguments, etc. The base foundational work come from McCulloch and Pitts concept of an artificial neuron, and where that led:
Also - Alan Turing anticipated neural networks of the kind that wasn't seen until much later:
...I don't know if he was aware of McCulloch and Pitts work which came prior, as they were coming at the problem from the physiological side of things; a classic case where inter-disciplinary work might have benefitted all (?).
You might want to also look into the philosophical side of things - theory of mind stuff, and some of the "greats" there (Minsky, Searle, etc); also look into the books written and edited by Douglas Hofstadter:
There's also the "lesser known" or "controversial" historical people:
* Hugo De Garis (CAM-Brain Machine)
* Igor Aleksander
* Donald Michie (MENACE)
...among others. It's interesting - De Garis was a very controversial figure, and most of his work, for whatever it is worth - has kinda been swept under the rug. He built a few computers that were FPGA based hardware neural network machines that used cellular automata a-life to "evolve" neural networks. There were only a handful of these machines made; aesthetically, their designs were as "sexy" as the old Cray computers (seriously).
Donald Michie's MENACE - interestingly enough - was a "learning computer" made of matchboxes and beads. It essentially implemented a simple neural network that learned how to play (and win at) naughts and crosses (TIC-TAC-TOE). All in a physically (by hand) manipulated "machine".
Then there is one guy, who is "reviled" in the old-school AI community on the internet (take a look at some of the old comp.ai newsgroup archives, among others). His nom-de-plume is "Mentifex" and he wrote something called "MIND.Forth" (and translated it to a ton of other languages), that he claimed was a real learning system/program/whatever. His real name is "Arthur T. Murray" - and he is widely considered to be one of the earliest "cranks" on the internet:
Heck - just by posting this I might be summoning him here! Seriously - this guy gets around.
Even so - I'm of the opinion that it might be useful for people to know about him, so they don't go to far down his rabbit-hole; at the same time, I have a small feeling that there might be a gem or two hidden inside his system or elsewhere. Maybe not, but I like to keep a somewhat open mind about these kinds of things, and not just dismiss them out of hand (but I still keep in mind the opinions of those more learned and experienced than me).
AI, a modern approach (Norvig & Russel) - For classic AI stuff, although nowadays it might fade a bit with all the deep learning advances.
While it's not strictly CS, Tufte's Visual Display of Quantitative information should probably be on every programmer's shelf.
If you are just learning programming, plan on taking your time with the algorithms but practice coding every day. Find a fun project to attempt that is within your level of skill.
If you are a strong programmer in one language, find a book of algorithms using that language (some of the suggestions here in these comments are excellent). I list some of the books I like at the end of this comment.
If you are an experienced programmer, one algorithm per day is roughly doable. Especially so, because you are trying to learn one algorithm per day, not produce working, production level code for each algorithm each day.
Some algorithms are really families of algorithms and can take more than a day of study, hash based look up tables come to mind. First there are the hash functions themselves. That would be day one. Next there are several alternatives for storing entries in the hash table, e.g. open addressing vs chaining, days two and three. Then there are methods for handling collisions, linear probing, secondary hashing, etc.; that's day four. Finally there are important variations, perfect hashing, cuckoo hashing, robin hood hashing, and so forth; maybe another 5 days. Some languages are less appropriate for playing around and can make working with algorithms more difficult, instead of a couple of weeks this could easily take twice as long. After learning other methods of implementing fast lookups, its time to come back to hashing and understand when its appropriate and when alternatives are better and to understand how to combine methods for more sophisticated lookup methods.
I think you will be best served by modifying your goal a bit and saying that you will work on learning about algorithms every day and cover all of the material in a typical undergraduate course on the subject. It really is a fun branch of Computer Science.
A great starting point is Sedgewick's book/course, Algorithms . For more depth and theory try , Cormen and Leiserson's excellent Introduction to Algorithms. Alternatively the theory is also covered by another book by Sedgewick, An Introduction to the Analysis of Algorithms . A classic reference that goes far beyond these other books is of course Knuth , suitable for serious students of Computer Science less so as a book of recipes.
After these basics, there are books useful for special circumstances. If your goal is to be broadly and deeply familiar with Algorithms you will need to cover quite a bit of additional material.
Numerical methods -- Numerical Recipes 3rd Edition: The Art of Scientific Computing by Tuekolsky and Vetterling. I love this book. 
Randomized algorithms -- Randomized Algorithms by Motwani and Raghavan. , Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher, 
Hard problems (like NP) -- Approximation Algorithms by Vazirani . How to Solve It: Modern Heuristics by Michalewicz and Fogel. 
Data structures -- Advanced Data Structures by Brass. 
Functional programming -- Pearls of Functional Algorithm Design by Bird  and Purely Functional Data Structures by Okasaki .
Bit twiddling -- Hacker's Delight by Warren .
Distributed and parallel programming -- this material gets very hard so perhaps Distributed Algorithms by Lynch .
Machine learning and AI related algorithms -- Bishop's Pattern Recognition and Machine Learning  and Norvig's Artificial Intelligence: A Modern Approach 
These books will cover most of what a Ph.D. in CS might be expected to understand about algorithms. It will take years of study to work though all of them. After that, you will be reading about algorithms in journal publications (ACM and IEEE memberships are useful). For example, a recent, practical, and important development in hashing methods is called cuckoo hashing, and I don't believe that it appears in any of the books I've listed.
 Sedgewick, Algorithms, 2015. https://www.amazon.com/Algorithms-Fourth-Deluxe-24-Part-Lect...
 Cormen, et al., Introduction to Algorithms, 2009. https://www.amazon.com/s/ref=nb_sb_ss_i_1_15?url=search-alia...
 Sedgewick, An Introduction to the Analysis of Algorithms, 2013. https://www.amazon.com/Introduction-Analysis-Algorithms-2nd/...
 Knuth, The Art of Computer Programming, 2011. https://www.amazon.com/Computer-Programming-Volumes-1-4A-Box...
 Tuekolsky and Vetterling, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007. https://www.amazon.com/Numerical-Recipes-3rd-Scientific-Comp...
 Vazirani, https://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazi...
 Michalewicz and Fogel, https://www.amazon.com/How-Solve-Heuristics-Zbigniew-Michale...
 Brass, https://www.amazon.com/Advanced-Data-Structures-Peter-Brass/...
 Bird, https://www.amazon.com/Pearls-Functional-Algorithm-Design-Ri...
 Okasaki, https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...
 Warren, https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...
 Lynch, https://www.amazon.com/Distributed-Algorithms-Kaufmann-Manag...
 Bishop, https://www.amazon.com/Pattern-Recognition-Learning-Informat...
 Norvig, https://www.amazon.com/Artificial-Intelligence-Modern-Approa...
The course I took used the Norvig text as a textbook, which I also recommend.
http://ai.berkeley.edu/project_overview.html. See the "Lectures" link at the top for all the course videos/slides.
http://www.amazon.com/Artificial-Intelligence-Modern-Approac... Note that the poor reviews center on the price, the digital/Kindle edition and the fact that the new editions don't differ greatly from the older ones. If you've never read it and you have the $$, a hardbound copy makes a great learning and reference text, and it's the kind of content that's not going to go out of date.
Has plenty of examples in Python. You can also look at different Udacity courses. They have a couple dealing with ML with Python.
I'm fairly certain that the claim in the introduction – "Unfortunately, current AI texts either fail to mention this
algorithm or refer to it only in the context of two-person game searches" – is no longer true.
From my current textbook (Artificial Intelligence: A Modern Approach ):
"Iterative deepening search (or iterative deepening depth-first search) is a general strategy, often used in combination with depth-first tree search, that finds the best depth limit. It does this by gradually increasing the limit — first 0, then 1, then 2, and so on — until a goal is found... In general, iterative deepening is the preferred uninformed search method when the search space is large and the depth of the solution is not known."
In an intro machine learning course you'd learn about minimax and others, but skip paying any money and just read the basic algorithms here and look up the wiki pages for even more examples: http://www.stanford.edu/~msirota/soco/blind.html (The introduction has some term definitions.)
Edit: Also, the obligatory plug for Artificial Intelligence: A Modern Approach http://www.amazon.com/Artificial-Intelligence-Modern-Approac...
A couple of videos about Kiva: http://www.raffaello.name/KivaSystems.html
A paper on Kiva: http://www.raffaello.name/Assets/Publications/CoordinatingHu...
http://www.cs.utexas.edu/~pstone/Courses/344Mfall10/assignme... has all of the readings for the class (with more on the 'resources' page). If you want to learn something about AI, it's certainly a good place to start!
If you're more into the algorithms side of AI, you should certainly read AI: A Modern Approach (http://www.amazon.com/Artificial-Intelligence-Modern-Approac...). It's a text book, don't get me wrong, but it clearly explains many relevant algorithms in AI today with accompanying pseudocode and theory. If you've got a CS background, it's a great reference/learning tool. I bought mine for a class, and won't be returning it!