Found 5 comments on HN
cschmidt · 2017-12-08
Or you could clone his GitHub repo and work through it in Jupyter:

I was really pleased to notice that the second edition of the "Pandas book" just came out in late October. I'm about halfway through reading it now.

pythonbull · 2016-10-24
Data Science from Scratch

Python for Data Analysis

Web Scraping with Python: Collecting Data from the Modern Web

Python Machine Learning

danso · 2016-09-16
There's a large number of such books, though none that are as authoritative with respect to Python (this is a statement about the size of Python's community vs. R, not necessarily about the authors):

- via Wes McKinney, creator of pandas (which makes Python about as close to R as you can get):


There are a bunch of books specific to machine learning too, though I haven't read them myself.

pvnick · 2015-07-16
Good article for beginners. A couple thoughts, just to build on what the author said:

First off, data science == fancy name for data mining/analysis. I wanted to clear that up given the buzzwordy nature of "data science."

Learn SQL - this is the big one. You must be proficient with SQL to be effective at data science. Whether it's running on an RDBMS or translating to map/reduce (Hive) or DAG (Spark), SQL is invaluable. If you don't know what those acronyms mean yet, don't worry. Just learn SQL.
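To make "just learn SQL" concrete, here is a minimal sketch of the kind of aggregate query most analysis starts with. It assumes a hypothetical `orders` table with made-up rows, and it uses Python's built-in `sqlite3` module only as a convenient stand-in for an RDBMS; the same `GROUP BY` query would look essentially the same on Postgres or MySQL, or in Hive or Spark SQL.

```python
import sqlite3

# Hypothetical example table: one row per order, with a region and an amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 120.0), ("east", 80.0), ("west", 200.0), ("west", 50.0), ("west", 50.0)],
)

# A typical analysis question in plain SQL: order count, total and average per region.
query = """
    SELECT region,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total,
           AVG(amount) AS avg_amount
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, n_orders, total, avg_amount in rows:
    print(f"{region}: {n_orders} orders, total {total:.2f}, average {avg_amount:.2f}")
conn.close()
```

The point is that the question ("which region brings in the most, and how?") lives entirely in the SQL; the surrounding Python is only plumbing.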

Learn to communicate insights - I would add here: try some UI techniques. Highcharts and d3.js are good libraries for telling your data story. You can also do a ton with just Excel, without writing any code beyond what you wrote for the mining portion (usually SQL).

I would also go back to basics with regard to statistical techniques. Start with the simple z-score (how many standard deviations a value lies from the mean); it is such an important tool in your data science toolbox. If you're just looking at raw numbers, try z-normalizing the data and see what happens. You'd be surprised what you can achieve with a high school statistics textbook, Postgres/MySQL (or even Excel!), and a moderate-sized data set. These are powerful enough to answer the majority of your questions, and when they fail, move on to sexier algorithms.
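As a rough illustration of z-normalizing raw numbers, here is a small sketch using only Python's `statistics` module. It computes z = (x - mean) / standard deviation for each value and flags anything more than 2 standard deviations out. The `signups` data and the 2-sigma threshold are made-up assumptions, and it uses the population standard deviation (`pstdev`); some tools default to the sample version instead.

```python
from statistics import mean, pstdev

def z_normalize(values):
    """Return the z-score of each value: (x - mean) / population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in values]

# Hypothetical daily signup counts; the last day looks unusual.
signups = [10, 12, 11, 13, 9, 30]
zs = z_normalize(signups)
for day, z in enumerate(zs, start=1):
    # Arbitrary illustrative threshold: more than 2 std devs from the mean.
    flag = "  <- unusual" if abs(z) > 2 else ""
    print(f"day {day}: {z:+.2f}{flag}")
```

After normalization the values have mean 0 and standard deviation 1, so numbers measured on very different scales become directly comparable.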

Edit: one more thing I forgot to mention. After SQL, learn Python. There are a ton of libraries in the Python ecosystem that are perfect for data science (NumPy, SciPy, scikit-learn, etc.). It's also one of the top languages used in academic settings. My preferred data science workspace involves Python, IPython Notebook, and pandas. This book is quite good:

poof131 · 2012-11-30
I guess the question is what do you mean by advanced topics? What direction do you want to go in? The latter book you mentioned seems to cover a number of topics and is probably a good bet.

If you are interested in the web, both these books were good:

Here are a few books that cover some "advanced?" topics that I'd like to read when I have time (I'd also like to hear other people's recommendations on them):

I'm not sure of your background or the quality of these books, but an understanding of data structures, algorithms, and object-oriented programming could be considered important:

That said, these and other intermediate-to-advanced topics tend to be better covered in non-language-specific books, such as this shotgun blast to the head. Don't worry, it's just an "introduction":
