Found in 2 comments on Hacker News
ignoramous · 2020-04-14 · Original thread
Lambda architecture for data processing, as popularized by Nathan Marz et al [0], has two components: the Batch layer and the Stream layer. At a high level, Batch trades freshness for quality (accurate but stale results), whilst Stream optimises for freshness at the expense of quality [1].

I believe what GP means by Lambda is that you'd need a system that batch processes the data to be amended or changed (reprocessing older data) but stream processes whatever is required in real time [2].
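To make the split concrete, here's a minimal, hypothetical sketch of the Lambda serving path: queries merge an accurate-but-stale batch view with a fresh-but-approximate real-time view. All names and data are illustrative, not from any particular framework.

```python
# Hypothetical sketch of Lambda's serving layer (illustrative names/data).
batch_view = {"clicks:2020-04-13": 1000}   # periodically recomputed from the master dataset
realtime_view = {"clicks:2020-04-14": 42}  # incrementally updated from the stream

def query(key):
    # Prefer the batch result where one exists; fall back to the
    # real-time view for data the batch layer hasn't processed yet.
    if key in batch_view:
        return batch_view[key]
    return realtime_view.get(key, 0)

print(query("clicks:2020-04-13"))  # answered by the batch layer
print(query("clicks:2020-04-14"))  # answered by the speed layer
```

Reprocessing (the amend/change case) means rebuilding batch_view from the master dataset and letting it supersede the real-time results for those keys.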

An alternative is the Kappa architecture proposed initially by Jay Kreps [3][4], co-creator of Apache Kafka.
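The core of Kreps's argument is that you can drop the batch layer entirely: keep everything in an append-only log, and "reprocess" by replaying the same log from the beginning with a new job, then switching over once it catches up. A rough sketch of that idea, with purely illustrative names:

```python
# Hypothetical sketch of the Kappa idea (illustrative names/data):
# one append-only event log, no separate batch layer.
log = [("clicks", 1), ("clicks", 2), ("clicks", 3)]

def build_view(events):
    # Incrementally aggregate the stream into a queryable view.
    view = {}
    for key, value in events:
        view[key] = view.get(key, 0) + value
    return view

view_v1 = build_view(log)  # the currently-serving view
view_v2 = build_view(log)  # "reprocessing": replay the log from offset 0
# once view_v2 has caught up, queries switch over and view_v1 is discarded
```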

---

[0] https://www.amazon.com/dp/1617290343

[1] https://en.wikipedia.org/wiki/Lambda_architecture

[2] https://speakerdeck.com/druidio/real-time-analytics-with-ope...

[3] https://engineering.linkedin.com/distributed-systems/log-wha...

[4] https://dataintensive.net/

mindcrash · 2016-02-25 · Original thread
You might want to read this (for free): http://book.mixu.net/distsys/single-page.html

And pay a little to read this book: http://www.amazon.com/Designing-Data-Intensive-Applications-...

And this one: http://www.amazon.com/Big-Data-Principles-practices-scalable...

Nathan Marz brought Apache Storm to the world, and Martin Kleppmann is pretty well known for his work on Kafka.

Both are very good books on building scalable data processing systems.
