Found in 4 comments
throwawaypls · 2018-05-16 · Original thread
I read this book titled "Designing Data Intensive Applications", which covers this and a lot of other stuff about designing applications in general. https://www.amazon.com/Designing-Data-Intensive-Applications...
jpamata · 2018-05-10 · Original thread
Designing Data-Intensive Applications[0] by Martin Kleppmann. There's a previous HN thread about it[1]. Helped me understand a bit more about databases and systems. The book is also very approachable and has the perfect blend of application and theory at a high level that anyone approaching the industry for the first time stands to gain a lot from reading it.

The Architecture of Open Source Applications[2] series is a good one for leaning how to build production applications and you can read it online. The chapter on Scalable Web Architecture[3] is a must-read.

[0] https://www.amazon.com/Designing-Data-Intensive-Applications...

[1] https://news.ycombinator.com/item?id=15428526

[2] http://aosabook.org/en/index.html

[3] http://aosabook.org/en/distsys.html

adamnemecek · 2017-01-17 · Original thread
You should try to understand how databases in general work, it will help you with your query writing.

One thing you have to realize is that once you get a little advanced, you have to get to the details of the single SQL implementations, it's not about SQL but about Postgres.

I've found these books really valuable

# SQL Performance Explained Everything Developers Need to Know about SQL Performance

https://www.amazon.com/Performance-Explained-Everything-Deve...

This book fundamentally talks about how to effectively use and leverage the SQL indices. Talks about all the important implementations (Postgres, MySQL, Oracle, SQL Server).

# Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

https://www.amazon.com/Designing-Data-Intensive-Applications...

This book gets mentioned a bunch around here and for a good reason. There aren't too many concrete resources on making your systems "webscale" and this one is really good.

# PostgreSQL 9.0 High Performance

https://www.amazon.com/PostgreSQL-High-Performance-Gregory-S...

Discusses all the different settings and tweaks you can do in Postgres. It's crazy how much of a perf gain you can get just by twiddling the parameters of the database, i.e. all the tricks you can do when the single instances are bottle necks.

There's a similar book for MySQL https://www.amazon.com/High-Performance-MySQL-Optimization-R...

# PostgreSQL 9 High Availability Cookbook

https://www.amazon.com/PostgreSQL-9-High-Availability-Cookbo...

Discusses how do you go from 1 Postgres instance to 1+ instance. Talks about replication, monitoring, cluster management, avoiding downtime etc i.e. all the tricks you can do to manage multiple instances. Again there's a similar book for MySQL https://www.amazon.com/MySQL-High-Availability-Building-Cent...

Last but not least check out the postgres documentation, people consider it a standard of what good documentation looks like https://www.postgresql.org/docs/9.6/static/index.html

Also last but not least, read up on relational algebra (the foundation of SQL) https://en.wikipedia.org/wiki/Relational_algebra. I've always found SQL to be extremely verbose (the syntax reminds me of idk COBOL or smth) but there's another query language called Datalog, that's for our purposes similar to SQL but the syntax is much more legible.

E.g. check out these snippets from these slides (page 29) (and check out the whole class too)

https://pages.iai.uni-bonn.de/manthey_rainer/IIS_1617/IIS201...

Datalog:

s(X) <- p(X,Y).

s(X) <- r(Y,X).

t(X,Y,Z) <- p(X,Y), r(Y,Z).

w(X) <- s(X), not q(X).

SQL:

CREATE VIEW s AS (SELECT a FROM p)

UNION

(SELECT b FROM r);

CREATE VIEW t AS

SELECT a, b, c

FROM p, r

WHERE p.b = r.a,

CREATE VIEW w AS (TABLE s)

MINUS (TABLE q);

mindcrash · 2016-02-25 · Original thread
You probably might want to read this (for free): http://book.mixu.net/distsys/single-page.html

And pay a little to read this book: http://www.amazon.com/Designing-Data-Intensive-Applications-...

And this one: http://www.amazon.com/Big-Data-Principles-practices-scalable...

Nathan Marz brought Apache Storm to the world, and Martin Kleppmann is pretty well known for his work on Kafka.

Both are very good books on building scalable data processing systems.

View on Amazon
Stay up to date on the latest finds. Delivered to your inbox every Thursday.   Preview