Hi all, original author here.

Some have asked why I would spend time arguing against using Hadoop for such small data processing tasks, since that's clearly not what it's meant for anyway. Sadly, Big Data (tm) frameworks are recommended, required, or used far more often than they should be. I know it seems crazy to many of us, but it's true. The worst I've seen was Hadoop used for a processing task of less than 1MB. Seriously.

Also, I very much agree with those saying that more effort should go into teaching command line tools. O'Reilly even has a book out on the topic:

Thank you for all the comments and support.

O'Reilly is having a 50% sale on all ebooks through 9 September.

I just bought the early release of that exact book for $13.60, which was 60% off, because you get 60% off if you order $100 worth of ebooks at pre-discount prices.

When the book is finished you get the final version. It's mostly already finished.

"With Early Release ebooks, you get books in their earliest form — the author's raw and unedited content as he or she writes — so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters as they're written, and the final ebook bundle."

