Friday, November 9, 2007

Distributed computing != Parallel computing (obviously!)

While the title of the post appears to state an obvious, no-brainer fact, it's still common to see patterns used to solve parallel computing problems being incorrectly applied to problems in distributed systems. The common justification usually takes the form of "You could consider a distributed system to be a group of parallel - albeit remote - processes communicating over the network, and hence many of the problems in this area should be readily solvable using lessons learned from implementing concurrent systems". Unfortunately, there are a few important differences one cannot ignore:

1) For any set of collaborating processes, replacing shared memory with the network has huge implications: bandwidth, latency, security and failure detection, to name a few.
2) There is no common notion of time across processes. You cannot infer chronology or causality relationships between two events based on their perceived times of occurrence. You need to take recourse to techniques like logical clocks (see the first sketch after this list).
3) One of the implications of #1 above is that you do not have the luxury of using the synchronization primitives provided by the operating system or the underlying hardware (mutexes, barriers, condition variables, test-and-set instructions, latches, etc.) across remote processes. You need to invent coordination techniques of your own, such as lease-based locks (see the second sketch after this list).
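
The ordering problem in #2 has a classic, lightweight answer: logical clocks, as introduced by Lamport. Below is a minimal, illustrative sketch in Python (the processes and message flow are invented for the example); every process keeps a counter, every message carries a timestamp, and receivers jump their counter forward so that a send always gets a smaller timestamp than the corresponding receive:

    class LamportClock:
        """One logical clock per process; no shared physical time assumed."""
        def __init__(self):
            self.time = 0

        def tick(self):
            # A purely local event just advances the counter.
            self.time += 1
            return self.time

        def send(self):
            # Outgoing messages carry the sender's current logical time.
            return self.tick()

        def receive(self, msg_time):
            # On receipt, jump ahead of the sender's timestamp so that
            # "send happens-before receive" is reflected in the numbers.
            self.time = max(self.time, msg_time) + 1
            return self.time

    p1, p2 = LamportClock(), LamportClock()
    t_send = p1.send()            # p1's clock: 1
    p2.tick()                     # unrelated local work on p2: 1
    t_recv = p2.receive(t_send)   # p2 jumps to max(1, 1) + 1 = 2
    assert t_send < t_recv        # causal order shows up in the timestamps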
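
The synchronization problem in #3 is typically attacked with coordination services rather than OS primitives. The second sketch below is a toy, single-process simulation (all names invented) of one such technique: a lease-based lock with fencing tokens, where a holder that stalls past its lease silently loses the lock, and any stale requests it later makes can be rejected by checking the token:

    import time

    class LeaseLockServer:
        """Toy stand-in for a central lock service; in a real system this
        state would live behind the network (and ideally be replicated)."""
        def __init__(self, lease_seconds):
            self.lease_seconds = lease_seconds
            self.holder = None        # (client_id, expiry, fencing token)
            self.next_token = 0

        def acquire(self, client_id):
            now = time.monotonic()
            if self.holder is None or self.holder[1] < now:
                # Free, or the previous lease expired -- the server cannot
                # tell a crashed holder from a merely slow one.
                self.next_token += 1
                self.holder = (client_id, now + self.lease_seconds,
                               self.next_token)
                return self.next_token    # fencing token to present downstream
            return None                   # lock currently held by someone else

        def is_current(self, token):
            # A protected resource can reject requests carrying stale tokens.
            return self.holder is not None and self.holder[2] == token

    server = LeaseLockServer(lease_seconds=0.1)
    t1 = server.acquire("worker-1")   # worker-1 holds the lease
    time.sleep(0.2)                   # worker-1 stalls past its lease...
    t2 = server.acquire("worker-2")   # ...so worker-2 takes over
    print(server.is_current(t1), server.is_current(t2))   # False True

Notice that this is exactly the kind of situation an OS mutex never has to deal with: the lock service cannot distinguish a crashed holder from a slow one, so it must rely on timeouts and tokens rather than certainty.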

But does this imply that we need to invent distributed, network-aware versions of all the primitives and services provided by the hardware or the operating system? Should such primitives keep user programs oblivious to the fact that they are interacting with remote processes? Should the network be abstracted away entirely? Not so fast, as this Sun Labs publication from the RMI/Jini team tells us.
So, what is the approach that successful middleware technologies adopt today? Well, that's a topic for another post.