Thursday, December 13, 2007

BGGA closures: The end of many Java careers

Before you begin, I'd encourage you to read Neal Gafter's latest rebuttal to Josh Bloch's explanation of why BGGA can do more harm than good to Java. My two cents (as a Java developer) after going through both arguments:
  1. First, Neal Gafter states that Josh's examples were carefully chosen from "twisted" compiler test cases written for the BGGA closure implementation. Why did the test cases have to be "twisted" beyond comprehension? Well, it's perhaps because complicated features beget complicated test cases. While I agree that test cases are generally more heavyweight than the feature being tested (primarily due to boilerplate and the need to cover edge cases), you will rarely find them an order of magnitude more complicated than the feature itself. You won't find unrealistic test cases either. You only need gnarled test cases for convoluted, counter-intuitive "features". The fact that Josh didn't have to look very far to come up with a "hard" example says a lot about the BGGA proposal itself.

  2. Let's look at Item 1 of Chapter 2, Effective Java: Programming Language Guide. It says:
    "One advantage of static factory methods is that, unlike constructors, they have names. If the parameters to a constructor do not, in and of themselves, describe the object being
    returned, a static factory with a well-chosen name can make a class easier to use and the resulting client code easier to read. For example, the constructor BigInteger(int, int,
    Random), which returns a BigInteger that is probably prime, would have been better expressed as a static factory method named BigInteger.probablePrime".

    Let's also refer to section 1.3 of one of the fundamental textbooks on programming, Structure and Interpretation of Computer Programs. In the section preceding higher-order procedures (not to be confused with functions), the authors say:
    "Our programs would be able to compute cubes, but our language would lack the ability to express the concept of cubing. One of the things we should demand from a powerful programming language is the ability to build abstractions by assigning names to common patterns and then to work in terms of the abstractions directly"
    (Before you jump to the conclusion that this chapter favors the BGGA approach, let me remind you that it merely explains one programming paradigm and does not advocate introducing functions or procedures on a whim into a strongly typed, mature, object-oriented language.)

    Finally, here are a couple of quotes from Martin Fowler's bliki -
    "...this principle comes with an important consequence - that it's important that programmers put in the effort to make sure that this code is clear and readable. " and
    "...the first step to clear code is to accept that code is documentation, and then put the effort in to make it be clear. I think this comes down to what was taught to most programmers when they began to program... We as a whole industry need to put much more emphasis on valuing the clarity of code."

    So, having read the above excerpts (and assuming that seasoned Java developers get used to the ugly syntax in the near future), what can be said about a very simple BGGA construct like {int, int, Random => BigInteger}?
    Is it readable? Intuitive? Self-explanatory? (A small contrast with plain Java follows at the end of this item.)
    And yes, while BGGA doesn't mandate such a coding style, it certainly provides plenty more scope for it than the Java language does in its current state. As Josh Bloch mentions, it's not the syntax but the semantics (such as those of the ugly non-local returns) that make BGGA complicated.
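
    To make the readability point concrete, here is a small, hypothetical contrast. The interface name PrimeSource and the factory class are made up purely for illustration, and the commented line only approximates how the BGGA draft would spell the same thing as a structural function type:

        // BGGA draft (approximate syntax): the type of "something that takes two ints and a
        // Random and yields a BigInteger" -- the reader is left to infer what it is for.
        //     {int, int, Random => BigInteger} primeSource = ...;

        // Plain Java today: a named abstraction, in the spirit of Effective Java, Item 1.
        import java.math.BigInteger;
        import java.util.Random;

        interface PrimeSource {
            BigInteger probablePrime(int bitLength, int certainty, Random rnd);
        }

        class Primes {
            static PrimeSource defaultSource() {
                return new PrimeSource() {
                    public BigInteger probablePrime(int bitLength, int certainty, Random rnd) {
                        // BigInteger(int, int, Random) yields a probable prime, per the
                        // Effective Java example quoted above.
                        return new BigInteger(bitLength, certainty, rnd);
                    }
                };
            }
        }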

  3. It would be a mistake to evaluate the complexity of BGGA in isolation. Indeed, the 427-page generics FAQ, for instance, doesn't talk about generics in isolation; in reality, it is the interaction of generics with other language features that makes generics so hard to master (a small standalone illustration follows below). It's a curious coincidence that Angelika Langer, in the generics FAQ, thanks three of the four BGGA authors for patiently answering "countless questions posed" regarding generics. You'd expect the BGGA authors' first initiative to be simplifying generics and making its behavior and interaction with other language features more predictable, instead of attempting changes that further convolute the type system and add cognitive load to the language for little benefit. (I'd highly recommend reading Alex Buckley's excellent write-up on the role of complexity in language design and ways to evaluate it.)
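
    To give a flavor of the kind of interaction that fills those 427 pages, here is a small, standalone example of generics colliding with arrays and with subtyping; nothing here is BGGA-specific, it is simply the sort of thing the FAQ has to explain at length:

        import java.util.ArrayList;
        import java.util.List;

        class GenericsInteractions {
            // Generics meet arrays: the following does not compile, because arrays are
            // covariant and reified while generic types are invariant and erased.
            //     List<String>[] lists = new List<String>[10];   // compile-time error

            // Generics meet subtyping: a List<String> is not a List<Object>, so a
            // wildcard is needed to write a method that reads from "a list of anything".
            static void printAll(List<?> items) {
                for (Object item : items) {
                    System.out.println(item);
                }
            }

            public static void main(String[] args) {
                List<String> names = new ArrayList<String>();
                names.add("BGGA");
                printAll(names);   // fine: List<String> is a List<?>
            }
        }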

  4. From Sun's rebuttal to Microsoft's Delegates proposal -

    "Many of the advantages claimed for Visual J++ delegates -- type safety, object orientation, and ease of component interconnection -- are simply consequences of the security or flexibility of the Java object model. These advantages are routinely enjoyed by all programmers using the Java language, without resorting to non-standard language syntax or platform-specific VM extensions.

    Bound method references are simply unnecessary. They are not part of the Java programming language, and are thus not accepted by compliant compilers. Moreover, they detract from the simplicity and unity of the Java language. Bound method references are not the right path for future language evolution."

    Why were delegates or bound method references considered antithetical to the Java language? From the Java language specification,
    "The Java programming language is a general-purpose, concurrent, class-based, object-oriented language. It is designed to be simple enough that many programmers can achieve fluency in the language. The Java programming language is related to C and C++ but is organized rather differently, with a number of aspects of C and C++ omitted and a few ideas from other languages included. It is intended to be a production language, not a research language, and so, as C. A.R. Hoare suggested in his classic paper on language design, the design has avoided including new and untested features."

    Firstly, Java is meant to be object-oriented. Why would you wince if you saw C-style procedural code written in Java? Because Java is not meant to be used in that style of programming. Each style of programming has its place (and has a dedicated set of languages for it). While I'm not against functional programming, I certainly wouldn't be using Java if I needed to program in that style. I'd probably use Erlang (as Amazon Simple DB supposedly does), Lisp or Scheme. It's horses for courses.
    Saying "me too" to every new language and trying to cram multiple programming styles into an evolved language like Java is nothing short of catastrophic.
    If a language theorist wants to try that, he shouldn't attempt to force changes into a language that millions of programmers earn their livelihood from. (Hence the title of this post.) Remember, it's a blue-collar language, and its ubiquity is attributed to the ease with which a newbie can write reasonably reliable, readable Java code with relatively little training. Besides, as Josh mentioned in the video interview, he'd probably have included support for closures in a new language built from the ground up for that purpose. The danger lies in continuously adding seemingly minor, self-contained features to a language that is more than a decade old, has a multi-billion dollar industry relying on it and is deployed in mission-critical environments. If developers still seek functional programming with strong typing and a binding to the JVM, there's an excellent implementation in the form of Scala.

  5. To rephrase the latter half of the previous point, the practical (read: economic) consequences of making sweeping changes to the foundation of a language, or even a runtime, cannot be ignored. For the same reason, Sun has always been (rightly) obsessed with maintaining binary compatibility of the Java platform. Nor do you see every overzealous JUG or committee proposing a VM spec change with every release of Java. But does the same rule not apply to the Java language? Or can the language be changed swiftly, in any manner, subject to the whims of a few experts (or even a few JUGs)?

  6. As James Gosling stated (about GridBagLayout), "with great power comes great complexity". Assuming we need the "power" of BGGA closures, what cost are we paying for it? Were the costs even considered while making this proposal? Speaking of which, what (ideally) differentiates an engineer from a theorist is the former's ability to evaluate tradeoffs with a certain pragmatism. Apart from the use cases mentioned by Bloch - fine-grained concurrency and resource management, which can be solved more elegantly with CICE - there's hardly a compelling use case that can't be addressed by present-day Java. Do you want to use closures for event handling? Tom Ball, one of the initial members of the Swing/AWT engineering team, should know a thing or two about that. Read Charles Nutter's comment on that blog post and you'll see that BGGA closures aren't a significant improvement over anonymous inner classes (a small side-by-side sketch follows below), and in fact perform poorly, as Josh demonstrated. So, are we getting drastically better capabilities in return for the conundrum?
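
    For the event-handling case specifically, here is roughly what the two versions look like side by side. The compilable part is today's idiom; the BGGA form is only an approximation of the draft syntax and is shown as a comment:

        import java.awt.event.ActionEvent;
        import java.awt.event.ActionListener;
        import javax.swing.JButton;

        class EventHandlingToday {
            static void wire(JButton button) {
                // Today's idiom: an anonymous inner class. Verbose, but explicit and familiar.
                button.addActionListener(new ActionListener() {
                    public void actionPerformed(ActionEvent e) {
                        System.out.println("Clicked: " + e.getActionCommand());
                    }
                });

                // BGGA draft (approximate syntax), assuming a suitable conversion to ActionListener:
                //     button.addActionListener({ActionEvent e => System.out.println("Clicked: " + e.getActionCommand());});
                // The saving is a few lines of boilerplate, not a new capability.
            }
        }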

  7. Josh's presentation makes a passing reference to "dialects" being encouraged by BGGA. Why should we worry about dialects, you might ask. Firstly, it is important to realize that code lives on forever. A recent java.net poll found that more than 50% of all running code is 6-10 years old (or older), and it's highly unlikely that the author of that code is still sitting around maintaining it. To give you more evidence, during my short career (working on telecom software, data center management software and the server side of a payments engine), I've had to read and modify existing code far more than write shiny new components. And despite such varied fields of application and diverse use cases, I've always found Java code behaving reasonably predictably with respect to how it reads. (What you read is what you observe in production, barring bugs and deployment goof-ups. No magic there.) And with minimal effort (despite poor documentation in some cases), I was able to get on with my work within a few weeks (or a couple of months at most) of joining the new company. Does any of this sound surprising or astonishing to any seasoned Java developer? No. It is something you come to expect. And imagine how much money the employer saves by having the developer be productive within a few weeks of joining. Now, add BGGA's library- or programmer-defined control structures to this scenario (a sketch of such a construct follows below): at each company that a developer moves to (or each new project he or she works on), the developer will have to learn a new "style" of Java programming (as opposed to merely a new API). Knowing the language alone will no longer be enough, and a new engineer will potentially have to spend months unlearning an old library's constructs (because, you see, it's a new flavor of Java altogether) and then learning a new style of Java programming.
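
    To see why "dialects" are a real concern, consider a library-defined control construct. Under BGGA's control-invocation syntax (paraphrased only approximately in the comments below), a shop can define something like withLock and call it as if it were a keyword; withLock and the field names here are made up for illustration:

        import java.util.concurrent.locks.Lock;
        import java.util.concurrent.locks.ReentrantLock;

        class Dialects {
            // BGGA control abstraction (approximate): a library author writes a method
            // taking a closure, and callers then write what looks like a new keyword:
            //     withLock(lock) {
            //         balance += amount;
            //     }
            // A maintainer now has to know the library to know how "return", "break"
            // and exceptions behave inside that block.

            // Plain Java today: the control flow is exactly what it reads as.
            private final Lock lock = new ReentrantLock();
            private long balance;

            void deposit(long amount) {
                lock.lock();
                try {
                    balance += amount;
                } finally {
                    lock.unlock();
                }
            }
        }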

  8. Looking at the above argument slightly differently, why would anyone choose Java for large-scale (and, more importantly, long-lived) business applications when you can (theoretically) write hand-tuned, platform-specific code efficiently in C/C++/Assembly or what have you? In addition to platform independence, it is primarily the ease of reading and writing Java code: what one engineer writes can be read by another. Assuming (wrongly) that Java is slower than C++ for long-running server applications, why would anyone still use it? Because hardware is cheap. Maintenance isn't.

  9. Speaking of the CICE and ARM proposals, it'd be great to see more meat and concrete illustrations added to them (maybe even a kitchen-sink implementation - anyone willing to give it a shot?) to highlight their merits over other proposals; a rough sketch of what they offer follows below. IMHO, the TODOs must be completed ASAP. While I'm unequivocally in favor of these two proposals, the truth remains that a sizeable chunk of the developer community is more easily convinced by specifics and tangibles.
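
    For readers who haven't looked at the two proposals, here is roughly what they buy you. The proposed syntax is paraphrased from memory in the comments and should be checked against the actual drafts; the compilable parts are what we write today:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;

        class CiceAndArmSketch {
            // Today: the anonymous inner class boilerplate that CICE aims to trim.
            Runnable worker = new Runnable() {
                public void run() {
                    System.out.println("working");
                }
            };
            // CICE (roughly): Runnable worker = Runnable() { System.out.println("working"); };

            // Today: the try/finally dance that ARM aims to automate.
            static String firstLine(String path) throws IOException {
                BufferedReader reader = new BufferedReader(new FileReader(path));
                try {
                    return reader.readLine();
                } finally {
                    reader.close();
                }
            }
            // ARM (roughly):
            //     try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            //         return reader.readLine();
            //     }
        }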

  10. Lastly, I'm relieved that Josh finally spoke out and expressed his viewpoint and the rationale behind his stand. His arguments were based on examples obtained from the BGGA implementation itself and on technical publications from Java's formative days. Most importantly, his statements were steeped in pragmatism. I find it rather amusing that the voting trend at JavaPolis reversed soon after his talk. Until then, for nearly a year, the proponents of BGGA had conveniently interpreted the community's demand for closures (and relief from the boilerplate of inner classes) as support for BGGA closures.
    Having said that, no language feature should be decided on the basis of a poll or vote, nor should we rush into such JSRs. It's taken a long time for Java to reach its current stature in the developer community, and the least we owe the language is a lot of thought, consideration, debate and deliberation before changing it significantly. We should also refrain from portraying the closures debate as a war between two rival camps. Our energies would be better spent simply focusing on what the language needs (and doesn't need) to thrive in the next decade.

    Update: A few readers seem to believe that the title is a bit far-fetched. Here's the explanation:
    You add more complexity to the language and make it harder for newbies or existing programmers to use it -> Programmers look for saner/more intuitive languages -> Java adoption plummets -> "Java specialists" have no option but to choose alternative careers.
    Is that too far-fetched an extrapolation? Am I exaggerating? Trying to spread panic? Think again. Look at the battles that the Java platform itself is waging against Flex and other rival runtime environments, for instance. Would we rather stabilize the language and focus on strengthening the platform, or make the language harder to understand and, as a result, further alienate the developer community?

Friday, November 9, 2007

Distributed computing != Parallel computing (obviously!)

While the title of the post appears to state an obvious, no-brainer fact, it's still common to see patterns used to solve parallel computing problems being incorrectly applied to problems in distributed systems. The common justification usually takes the form of "You could consider a distributed system to be a group of parallel - albeit remote - processes communicating over the network, and hence many of the problems in this area should be readily solvable using learnings from implementing concurrent systems". Unfortunately, there are a few important differences one cannot ignore:

1) For any set of collaborating processes, replacing shared memory with the network has huge implications: bandwidth, latency, security and failure detection, to name a few.
2) There is no common notion of time across processes. You cannot infer chronology or causality relationships between two events based on their perceived occurrence. You need to take recourse to techniques like this (a minimal sketch of one such technique follows this list).
3) One implication of #1 above is that you do not have the luxury of using the synchronization primitives provided by the operating system or the underlying hardware (mutexes, barriers, condition variables, test-and-set fields, latches, etc.) across remote processes. You need to invent techniques like this.
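
As promised above, here is a minimal sketch of one ordering technique that works without a shared clock - a Lamport logical clock. The class and method names are mine, and real systems layer vector clocks or other machinery on top of this basic idea:

    import java.util.concurrent.atomic.AtomicLong;

    /** A minimal Lamport logical clock: orders events without a shared wall clock. */
    final class LamportClock {
        private final AtomicLong counter = new AtomicLong();

        /** Call for every local event; returns the event's logical timestamp. */
        long tick() {
            return counter.incrementAndGet();
        }

        /** Stamp an outgoing message with the sender's current logical time. */
        long timestampForSend() {
            return tick();
        }

        /** On receipt, advance past the sender's timestamp so causal order is preserved. */
        long onReceive(long senderTimestamp) {
            long current;
            long updated;
            do {
                current = counter.get();
                updated = Math.max(current, senderTimestamp) + 1;
            } while (!counter.compareAndSet(current, updated));
            return updated;
        }
    }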

But does this imply that we need to invent distributed/network-aware versions of all the primitives and services provided by the hardware or the operating system? Should such primitives keep user programs oblivious to the fact that they are interacting with remote processes? Should the network be abstracted away? Not so fast, as this Sun Labs publication from the RMI/Jini team tells us.
So, what is the approach that successful middleware technologies adopt today? Well, that's a topic for another post.

Saturday, October 6, 2007

Java (seemingly) on the decline again (for the nth time)?

Yet another (wishful) doomsday prediction for Java surfaces in the form of Pingdom's survey of cherry-picked sites that mostly run on LAMP. Despite the scant respect with which surveys and benchmarks should rightly be treated, I thought it was worth posting a short (though insufficient) response to this one. Doesn't it seem rather amusing that, during more than a decade of Java's existence, innumerable death knells have been sounded for Java, and yet it continues to survive, flourish and reinvent itself? Here's what I had to say in response to Nati Shalom's query -

"The pingdom sample space is misleading, to say the least. Countless companies in the banking space, insurance, manufacturing and logistics rely on Java engines (not necessarily Java EE, but surely a Java SE server at least). Add to that eBay, Amazon, AdWords and the usage of Hadoop and Nutch in Yahoo, and you have every important player that matters. And we haven't even started talking about niche, little known implementations like those at JPL, Nasa. The talk of TCO and expense associated with J2EE is complete B.S, of course. There're arguably more F/OSS tools and technologies in the Java world than there are in the LAMP world. Additionally, the unparalleled security and trust associated with Java implementations, and the confidence they rightly inspire in the large financial institutions and telcos is worth a dedicated post altogether.
All of this makes you realize that Pingdom's survey borders on FUD and ceases to be a respectable publication, given the deliberately chosen sample space."

Wednesday, July 11, 2007

Howdy

Hello there. This is meant to be my dedicated blog for ramblings and random thoughts on distributed systems and Java. (My personal blog has been in existence here for a while now. While at Sun, I used to blog here.) More write-ups in the weeks to come. Stay tuned.