by Vince Buffalo
ISBN: 9781449367480
Found in 3 comments on Hacker News
faizshah · 2021-12-26 · Original thread
The makefile data pipeline is definitely an underrated technique. A couple of great HN comments on it:

- https://news.ycombinator.com/item?id=22283368

- https://news.ycombinator.com/item?id=18896204

I personally learned it from bioinformaticians; there's great coverage of this and other command-line data skills in this book: https://www.oreilly.com/library/view/bioinformatics-data-ski...

The SQLite, pandas, bash, and make stack for quick data science projects is a great, maintainable one that doesn’t require much specialized knowledge.
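A make-style pipeline decides what to rerun by comparing timestamps: a target is rebuilt only when it is missing or older than one of its inputs, so after an edit only the downstream steps are redone. The sketch below mimics that rule in Python, with the standard-library `sqlite3` module standing in for the SQLite stage. The file names (`raw.csv`, `data.db`, `summary.txt`), the rule table, and every helper function are hypothetical illustrations, not taken from the linked comments or the book; in a real project you would write the same dependency graph as an actual Makefile.

```python
from __future__ import annotations

import csv
import os
import sqlite3
import tempfile
import time
from pathlib import Path


def needs_rebuild(target: Path, deps: list[Path]) -> bool:
    """make's core rule: rebuild if the target is missing or older than any dependency."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(dep.stat().st_mtime > target_mtime for dep in deps)


def build(target: Path, rules: dict, ran: list[str]) -> None:
    """Build dependencies first (depth-first), then run this target's recipe if stale."""
    deps, recipe = rules.get(target, ([], None))  # plain source files have no rule
    for dep in deps:
        build(dep, rules, ran)
    if recipe is not None and needs_rebuild(target, deps):
        recipe(target, deps)
        ran.append(target.name)


def load_csv(target: Path, deps: list[Path]) -> None:
    """Hypothetical stage 1: load the raw CSV into a fresh SQLite database."""
    target.unlink(missing_ok=True)  # start clean so reruns don't duplicate rows
    with open(deps[0], newline="") as f:
        rows = [(r["sample"], int(r["count"])) for r in csv.DictReader(f)]
    conn = sqlite3.connect(str(target))
    conn.execute("CREATE TABLE reads (sample TEXT, count INTEGER)")
    conn.executemany("INSERT INTO reads VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def summarize(target: Path, deps: list[Path]) -> None:
    """Hypothetical stage 2: query the database and write a small report."""
    conn = sqlite3.connect(str(deps[0]))
    total = conn.execute("SELECT SUM(count) FROM reads").fetchone()[0]
    conn.close()
    target.write_text(f"total_reads\t{total}\n")


def demo() -> tuple[list[list[str]], str]:
    """Run the pipeline three times and record which targets were rebuilt each time."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        raw, db, summary = d / "raw.csv", d / "data.db", d / "summary.txt"
        raw.write_text("sample,count\nA,10\nB,32\n")
        rules = {
            db: ([raw], load_csv),
            summary: ([db], summarize),
        }
        runs = []
        for step in range(3):
            if step == 2:
                # Simulate editing raw.csv after the outputs were built:
                # push the outputs into the past, then make raw newer than them.
                past = time.time() - 100
                os.utime(db, (past, past))
                os.utime(summary, (past, past))
                os.utime(raw, (past + 50, past + 50))
            ran: list[str] = []
            build(summary, rules, ran)
            runs.append(ran)
        result = summary.read_text()
    return runs, result


if __name__ == "__main__":
    print(demo())
```

Running `demo()` builds both targets on the first pass, does nothing on the second, and rebuilds both once `raw.csv` is newer than the outputs. That incremental behaviour is what makes make attractive for data pipelines: expensive early stages are skipped unless their inputs actually changed.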

endrebak · 2015-09-07 · Original thread
O’Reilly has an excellent bioinformatics data skills book: http://shop.oreilly.com/product/mobile/0636920030157.do
a_bonobo · 2015-04-26 · Original thread
She may also enjoy Vince Buffalo's Bioinformatics Data Skills

http://shop.oreilly.com/product/0636920030157.do

It's more focused on analyzing existing biological data with the shell and R, and on using git.

Personally, I've rarely seen advanced machine learning being used outside of genome-wide association studies, and even there most people just use PLINK's logistic regression without understanding what's being done and call it a day.

Another really good book for understanding statistics is Motulsky's Intuitive Biostatistics. It introduces all the common "tests" and methodologies people working in the life sciences use, but without the formulas (you use R for that anyway). It's more about the caveats of each test, in which situation you'd use it, what can go wrong, how to interpret the results, etc., all written in a very lively style.