Found in 29 comments on Hacker News
xnorswap · 2025-10-29 · Original thread
It's a tricky problem. I'd recommend reading DDIA; it covers this extensively:

https://www.oreilly.com/library/view/designing-data-intensiv...

You can generate monotonically increasing sequence numbers across distributed nodes with a Lamport clock.

https://en.wikipedia.org/wiki/Lamport_timestamp

The Wikipedia entry doesn't describe it as well as that book does.

It's not the end of the puzzle for distributed systems, but it gets you a long way there.
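
As a rough illustration of the Lamport clock idea mentioned above, here is a minimal sketch. The `LamportClock` class, its method names, and the use of a node ID as a tie-breaker are assumptions made for this example, not code from the book or the Wikipedia article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LamportClock:
    """Logical clock for one node (hypothetical helper for illustration)."""
    node_id: int       # used only to break ties between equal counters
    counter: int = 0

    def tick(self) -> Tuple[int, int]:
        """Advance for a local event or a send; return (counter, node_id)."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, remote_counter: int) -> Tuple[int, int]:
        """Merge the counter carried by an incoming message, then advance past it."""
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)


a = LamportClock(node_id=1)
b = LamportClock(node_id=2)

t1 = a.tick()            # node A does some local work
t2 = a.tick()            # node A sends a message carrying its counter
t3 = b.receive(t2[0])    # node B receives it and jumps past A's counter
t4 = b.tick()            # node B does more work

# Causally related events get strictly increasing timestamps.
print(t1 < t2 < t3 < t4)
```

Comparing the `(counter, node_id)` tuples gives a total order that respects causality, but two timestamps alone do not tell you whether the events were concurrent; that is the gap vector clocks address.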

See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock

Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":

https://ia904606.us.archive.org/32/items/distributed-systems...

softfalcon · 2025-06-06 · Original thread
Seconded, this book goes hand-in-hand with "Designing Data-Intensive Applications" by Martin Kleppmann [0].

[0](https://www.oreilly.com/library/view/designing-data-intensiv...)

mnsc · 2024-10-21 · Original thread
I finished reading Kleppmann's Designing Data-Intensive Applications last night and this looks like it's straight out of the last chapter, which talks about the future. They don't use the term "dataflow" though.

https://www.oreilly.com/library/view/designing-data-intensiv...

larve · 2024-08-15 · Original thread
I actually find the quality of programming books to have starkly increased in the last decade. I find a lot of Manning's and O'Reilly's releases to have a pretty long shelf-life.

For example, I really enjoyed and often go back to:

- https://www.oreilly.com/library/view/building-event-driven-m...

- https://www.oreilly.com/library/view/designing-data-intensiv...

- https://www.manning.com/books/100-go-mistakes-and-how-to-avo...

- https://www.amazon.com/Systems-Performance-Brendan-Gregg/dp/...

And more recently:

- https://www.manning.com/books/build-a-large-language-model-f...

- https://www.manning.com/books/the-creative-programmer

- https://www.manning.com/books/the-programmers-brain

- https://www.amazon.com/Understanding-Software-Addison-Wesley...

I also find books about specific technologies useful, even though they run the risk of becoming outdated after a few years:

- https://www.oreilly.com/library/view/networking-and-kubernet...

- https://www.brendangregg.com/bpf-performance-tools-book.html

Furthermore, nothing keeps you from reading books about topics peripheral to computer science, say to keep up with the general vibes:

- https://www.amazon.com/Probabilistic-Machine-Learning-Introd...

- https://www.amazon.com/Deep-Learning-Foundations-Christopher...

- https://www.amazon.com/Joy-Abstraction-Exploration-Category-...

I find that all of these contribute significantly to my growth as an engineer.

teleforce · 2024-04-15 · Original thread
This book by Martin Kleppmann is really good for learning distributed systems foundations [1]. Couple this with any OS textbook and I think you will be loaded for bear.

[1] Designing Data-Intensive Applications:

https://www.oreilly.com/library/view/designing-data-intensiv...

yonz · 2024-02-20 · Original thread
I'll add to this gratitude thread.

Martin has had a material impact on my career. I don't think I would have gotten my job at LinkedIn if it wasn't for his book, Designing Data-Intensive Applications: https://www.oreilly.com/library/view/designing-data-intensiv...

I learned how to build a web application with FE, BE, DB, and distributed workers for my first job. But it wasn't until I read his book that I understood the enormous gap between building web apps and planet-scale web applications. The book saved me from bombing my interviews.

chuckhend · 2023-08-07 · Original thread
Most of the content in https://www.oreilly.com/library/view/designing-data-intensiv... applies to ML systems.

https://github.com/noahgift also has a lot of content that is worth following.

Honestly, a better resource for this is the following book. It is very easy to read, and it helps you understand what to use where.

https://www.oreilly.com/library/view/designing-data-intensiv...

Row-based databases are optimized for accessing complete rows and joins. Columnar storage is optimized for accessing all, or many, values of a column across rows. This makes aggregates and transformation logic faster with columnar storage than with row-based storage. I.e., they are great for data warehouses and other analytical workloads.
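
To make the row versus column distinction concrete, here is a toy sketch using plain in-memory Python structures. The table contents and the function names (`fetch_row`, `total_amount_rows`, `total_amount_columns`) are made up for illustration; real engines work with disk pages, compression, and vectorized execution, none of which is modeled here.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"id": 1, "region": "eu", "amount": 10.0},
    {"id": 2, "region": "us", "amount": 25.5},
    {"id": 3, "region": "eu", "amount": 4.5},
]

# Column-oriented layout: all values of one column are stored together.
columns = {
    "id": [1, 2, 3],
    "region": ["eu", "us", "eu"],
    "amount": [10.0, 25.5, 4.5],
}


def fetch_row(table, record_id):
    """Point lookup: the row layout returns the complete record in one step."""
    for row in table:
        if row["id"] == record_id:
            return row
    return None


def total_amount_rows(table):
    """Aggregate over rows: every full record is visited to read one field."""
    return sum(row["amount"] for row in table)


def total_amount_columns(table):
    """Aggregate over columns: only the 'amount' values are read."""
    return sum(table["amount"])


print(fetch_row(rows, 2))
print(total_amount_rows(rows), total_amount_columns(columns))
```

Both aggregates return the same total; the difference is how much unrelated data has to be touched to compute it, which is what matters at warehouse scale.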

PS: a great and still highly relevant resource covering all the major database system designs, their advantages and drawbacks: https://www.oreilly.com/library/view/designing-data-intensiv...

photochemsyn · 2022-10-30 · Original thread
Twitter's core operation is running a massively distributed database, as I understand it, where concurrency issues are pretty important to get right. See Designing Data-Intensive Applications (2017), Part II: Distributed Data (Replication, Partitioning, Transactions, etc.) for an overview.

https://www.oreilly.com/library/view/designing-data-intensiv...

Just guessing, you'd think Starlink would be more experienced in that area than either Tesla or SpaceX? I suppose there are things like remote software updates and internal working databases to manage at SpaceX and Tesla, but it seems managing satellite traffic is a closer match.

Who knows, they might be, I really don't trust WaPo reporting anymore.

amdolan · 2022-09-03 · Original thread
Designing Data Intensive Applications seems like a good place to start. Some chapters will be applicable.

https://www.oreilly.com/library/view/designing-data-intensiv...

ptrik · 2022-02-10 · Original thread
Would also recommend reading "Designing Data-Intensive Applications" https://www.oreilly.com/library/view/designing-data-intensiv...

The section "Data Structures That Power Your Database" (in the "Storage and Retrieval" chapter) offers a great overview of the various storage mechanisms used by databases.

lioeters · 2021-10-04 · Original thread
I often see people on HN recommend the book, Designing Data-Intensive Applications (2017). I've personally been chewing on the material for a while now, gaining new insights.

Here's the table of contents: https://www.oreilly.com/library/view/designing-data-intensiv...

It seems to cover roughly the same areas and range as the book you mentioned, Database Systems: The Complete Book (2008). http://infolab.stanford.edu/~ullman/dscb.html

xupybd · 2021-09-06 · Original thread
This is a hard question to answer.

You can do almost anything with the wrong tool. I'd rather ask the question: when is PostgreSQL or MySQL the wrong tool for the job? I'm not sure I'm qualified to answer this, but I can point you in the direction of a book that has given me a much better understanding of the space. https://www.oreilly.com/library/view/designing-data-intensiv...

candu · 2021-05-08 · Original thread
Agree with others here that "web development" isn't quite as neatly bounded as "core computer science principles".

That said, I'll add MDN Web Docs [1] to the pile of links here as a good resource for practical details. If you're interested in the fundamentals of large-scale data-driven distributed systems (into which category many larger web applications fit), Designing Data-Intensive Applications [2] is quite excellent. NNGroup [3] has a lot of great foundational material on basic concepts of UX, usability, interaction, etc. for digital products.

[1] https://developer.mozilla.org/en-US/ [2] https://www.oreilly.com/library/view/designing-data-intensiv... [3] https://www.nngroup.com

LiamPa · 2020-01-05 · Original thread
Recommend ‘Designing data intensive applications’

https://www.oreilly.com/library/view/designing-data-intensiv...

mykowebhn · 2019-06-09 · Original thread
Based on your profile it looks like you have a lot of experience, so I would first rely on your experience.

That said, there are plenty of resources that have been helpful to me:

1) http://highscalability.com/

2) http://shop.oreilly.com/product/0636920032175.do

3) https://github.com/donnemartin/system-design-primer

Hope this helps!

dwater · 2019-02-06 · Original thread
If you want to do a real deep dive into the architectural differences of graph databases, the book "Designing Data-Intensive Applications" by Martin Kleppmann is a great resource. https://www.oreilly.com/library/view/designing-data-intensiv...

basetensucks · 2017-01-11 · Original thread
I've found this book to be quite solid so far: http://shop.oreilly.com/product/0636920032175.do

I'm about a quarter to half of the way through and it's been interesting and quite thorough even though it's still a "beta" book. The content is a little high level so some familiarity with distributed systems principles is useful but the text is very approachable and easy to understand (so far).

I got it after seeing several recommendations in other HN threads so I'm not the only person that has found it useful.

MediumD · 2017-01-03 · Original thread
It's not released yet, but I've been reading the early release version of Designing Data-Intensive Applications by Martin Kleppmann (http://shop.oreilly.com/product/0636920032175.do). I've found it pretty useful and well-written thus far. He does a good job of explaining concepts and then tying them to real-world implementations and examples. It's a good balance of theory and practical knowledge.

olalonde · 2016-11-27 · Original thread
Yes, it's only available as a preview at http://shop.oreilly.com/product/0636920032175.do but it's almost complete.

romanhn · 2016-06-16 · Original thread
Check out "Designing Data-Intensive Applications" by Martin Kleppmann - http://shop.oreilly.com/product/0636920032175.do. It's still a work in progress, but it covers a big chunk of distributed systems material, is up to date, and has good reviews. You can read 10 of the 12 chapters via Safari Books Online.

The downside is that I pre-ordered the book in November, expecting it in April and it now shows November of this year as the release date on Amazon. I'd be surprised to get it this year at all. Haven't found other books of similar scope and recency though, so I guess I'll wait some more.

sciurus · 2016-05-13 · Original thread
I'm looking forward to the publication of Martin Kleppmann's book Designing Data-Intensive Applications.

http://shop.oreilly.com/product/0636920032175.do?sortby=publ...

adamnemecek · 2015-09-22 · Original thread
The author of this paper, Martin Kleppmann, is writing a book "Designing Data-Intensive Applications" (http://shop.oreilly.com/product/0636920032175.do?cmp=af-stra...). I've been reading it via the O'Reilly immediate access and I think that it's the book you are looking for.

cosmolev · 2015-06-22 · Original thread
It's not even published yet, but based on the 7 of 11 chapters available I can say the book is going to be really fundamental.

Designing Data-Intensive Applications

The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

By Martin Kleppmann

http://shop.oreilly.com/product/0636920032175.do

http://dataintensive.net/

The author has a great sense of humor.

The Architecture of Open Source Applications [0]. I also suggest checking out Enterprise Integration Patterns [1], though it is more specific, as well as the books on building big data systems and data-intensive applications [2], [3]. The best way to learn is practice: an open source project or landing an appropriate job.

[0] http://aosabook.org/en/index.html

[1] http://martinfowler.com/books/eip.html

[2] http://www.manning.com/marz/

[3] http://shop.oreilly.com/product/0636920032175.do

gfodor · 2014-11-09 · Original thread
"Designing Data-Intensive Applications" is shaping up to be an excellent treatment of modern databases and their underpinnings. It's at an excellent level of abstraction, deep enough to convey database internals while high level enough (so far at least) to be able to cover a wide variety of database systems. It also has its feet firmly planted in database history, and is NoSQL-koolaid free. Highly recommended.

http://shop.oreilly.com/product/0636920032175.do