Elements of Distributed Computing

Found in 2 comments on Hacker News

oinksoft · 2014-12-23 · Original thread

Elements of Distributed by Computing by Vijay Garg is great: http://www.amazon.com/Elements-Distributed-Computing-Vijay-G...

strlen · 2014-01-03 · Original thread

Couple of suggestions:

1) I'd avoid mentioning CAP and FLP impossibility result much further in, this is like starting a discussion of mathematical logic or Computer Science with Godel's Incompleteness Theorem.

I'd definitely cover vector clocks and causality before: you really need this background in order to understand CAP/FLP. You may want to take a look at the approach Vijay Garg takes in his book:

http://www.amazon.com/Elements-Distributed-Computing-Vijay-G...

(E.g., use of vector clocks for proofs)

2) Don't over-focus on "webscale" (using the term only semi-ironically here) projects (I say this having contributed to multiple such projects). NFS, AFS, and CodaFS all raised very interesting questions about distributed state and disconnected operations. You may also note how some of the authors on the Plan9 (9fs, 9p, fossil) papers overlap with authors of some of the Google papers. "Mobile sensor networks" have been essentially an excuse to build distributed systems. Chord, Pastry, and Kademelia (sp? The System that Bittorrent uses) -- are actually far more truly "distributed" than many webscale systems (consistent hashing is a very simple DHT). Finally, don't forget that the Internet itself is a distributed system: the most successful eventually consistent, highly available distributed database is DNS.

I'd actually cover the web-giant infrastructure paper stuff a bit later on. While the Dynamo paper did have some novel contributions (regarding Gossip and anti-entropy), it's not what it's notable for: what's far more interesting in that paper is the way these concepts fit together.

I'd even hold my nose and cover CORBA: it's an interesting example in that it handled everything by the textbook, but failed due to complexity costs it had to incur in order do that. Far simpler RPC protocols and Web Services prevailed at the end by promising less. JINI may also be interesting to cover -- excellent system, far too ahead of its time -- Java VM and ecosystem were immature (anyone else remember Blackdown JDK on Linux?), marketed incorrectly (driver architecture?) to a market that wasn't ready for it (this is when dot-com startups could suffice on buying a couple of E10Ks or high-end Alphas).

3) This is an enormous and highly ambitious project and I commend you from embarking on it (I just re-read http://paulgraham.com/hs.html -- see the part about hard problems and ambition). Nonetheless, if you're serious about this, be sure to get plenty of editors -- there's a lot of room to make subtle mistakes (which I've done all the time). A lot of this can be genuinely confusing -- e.g., partitions aren't just due to networks, but at the same time not every type of a failure or a stop is a partition (I used to site GC pauses an example of a partition, but now I'm not so sure).

I'd help, but I've neither the cycles nor the habit of academic rigor; if you're still in touch with any profs from your uni, they could either help themselves or point you to others.

ISBN: 0471036005