Wednesday, October 29, 2008

Are code generators dumbing down our models?

I suspect that I am about to bring the Wrath of Ed down on me, but here goes...

There are quite a few Java code generators that can take an arbitrary XML Schema and spit out tons of code that a developer doesn't have to write. There are JAXB, Apache XMLBeans, and of course EMF. I am sure there are others. While code generators save us a lot of time, the push-button approach can lead to a dumbing down of our models.

I would argue that there are relatively few core modeling patterns, but the flexibility of XML makes it easy to express these patterns in a variety of ways. The generated code then unintentionally surfaces (rather than hides) these XML serialization details in the model layer. This forces the clients of the model to deal with inconsistent and often difficult-to-use APIs.

Consider the basic example of a boolean property. I have seen at least three different ways of representing that in XML:

<some-flag>true</some-flag>
<some-flag-enabled/>  <!-- absence means false --> 
<some-flag value="true"/>  <!-- the attribute is required -->

The above three cases would generate different model code, even though, from the modeling perspective, they represent the same construct.
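To make that concrete, here is a rough sketch of the kind of accessors a generator might emit for each variant. The names and shapes are illustrative only; no particular generator produces exactly this:

// Variant 1: <some-flag>true</some-flag>
// The element's text content typically maps to a Boolean property.
interface Variant1
{
    Boolean getSomeFlag();              // null when the element is absent
    void setSomeFlag( Boolean value );
}

// Variant 2: <some-flag-enabled/> where absence means false
// The element's presence, not its content, carries the value, so the
// generator often emits an empty type whose existence is the flag.
interface Variant2
{
    interface SomeFlagEnabled {}        // generated empty type

    SomeFlagEnabled getSomeFlagEnabled();              // null means false
    void setSomeFlagEnabled( SomeFlagEnabled value );  // non-null means true
}

// Variant 3: <some-flag value="true"/> with a required attribute
// The flag becomes a boolean attribute on a nested generated type.
interface Variant3
{
    interface SomeFlag
    {
        boolean isValue();
        void setValue( boolean value );
    }

    SomeFlag getSomeFlag();
}

A client that just wants to read the flag has to write different code against each of the three APIs, even though the underlying model concept is a single boolean property.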

A more complex pattern is the selector-details construct, where the selector is a type enumeration and the details provide settings specific to the type. I stopped counting how many different ways I've seen that pattern represented. Here are two of the most common examples:

Example 1: An explicit type element controls which property elements are applicable.

<type>...</type>  <!-- valid values are X and Y -->
<property-1>...</property-1>  <!-- associated with type X -->
<property-2>...</property-2>  <!-- associated with type Y -->
<property-3>...</property-3>  <!-- associated with type X and type Y -->

Example 2: In this case, the elements alternative-x and alternative-y are mutually exclusive. The element names function as type selectors.

<alternative-x>
  <property-1>...</property-1>
  <property-3>...</property-3>
</alternative-x>
<alternative-y>
  <property-2>...</property-2>
  <property-3>...</property-3>
</alternative-y>

I would argue that the above cases are semantically identical and therefore should have the same representation in the Java model. Of course, that doesn't happen. All of the existing code generators that I am aware of will produce drastically different code for these two alternatives.
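In a hand-written model, both serializations can collapse into one API. Here is a minimal sketch of what such a unified interface might look like (the names are hypothetical):

// One model API, regardless of which XML shape is used on disk.
// How the document is read and written becomes a concern of the
// serialization layer, hidden from model consumers.

enum SelectorType { X, Y }

interface SelectorDetails
{
    SelectorType getType();
    void setType( SelectorType type );

    String getProperty1();   // applicable when type is X
    void setProperty1( String value );

    String getProperty2();   // applicable when type is Y
    void setProperty2( String value );

    String getProperty3();   // applicable to both types
    void setProperty3( String value );
}

Whether the document uses an explicit type element or mutually exclusive alternative-x/alternative-y wrappers is then just a persistence detail.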

So why should we care? I would argue that in many cases, while we are saving time by generating model code, the savings come at the expense of complicating the model consumer code. Recently, I took over a project at Oracle that was building form-based editors for several rather complicated WebLogic Server deployment descriptors. The schemas of these descriptors evolved over many server releases and many people had a hand in augmenting them. The result is a complete lack of consistency. You could say that perhaps the schemas should have been more carefully evolved, but I would argue that they are a rather realistic example of what complex real-world schemas look like.

In any case, the first attempt at building these editors was to generate an EMF model from the XSDs and to build UI that would bind to EMF. That worked ok for a while, but eventually the UI code got too complicated and much of the UI binding code had to be hand-written. It ultimately made sense to throw away the generated model code and to hand-code the model. That gave us complete control over how the model surfaces XML constructs and made it possible to reduce the amount of custom UI code by several orders of magnitude.

I am certainly not trying to say that generated model code is a bad idea, but the ease with which it is possible to toss an XSD into a code generator and get a bunch of model code in return plays a part in encouraging developers to pay less attention to the model layer than is really necessary.

Thursday, October 23, 2008

Common Servers View for Eclipse

There is a cool collaboration happening right now between WTP and DTP to build a shared Servers view that will replace the separate WTP Servers view and DTP Data Source Explorer view. The new view is built using the Common Navigator framework and will make it easy for other Eclipse projects to contribute content. One of the important goals is to reduce clutter on the user's workbench by collapsing many individual views into one. If you are interested in this effort, you should check out Bug 252239. There has also been some relevant discussion on Bug 245013 and Bug 247934.

So, I am looking at you... CVS Tooling, Subversive, DSDP Target Management, SOA Tools, etc. You know who you are. Come join the party.

Eclipse Project Declaration : Faceted Project Framework

I am blogging today to raise awareness of a project declaration that might have gone unnoticed in the inboxes of Eclipse committers and other members. The goal of the Faceted Project Framework is to provide a re-usable system that facilitates treating Eclipse projects as composed of units of functionality (called facets) that can be easily added or removed by users. The initial code contribution will come from a mature and rather successful component in the Eclipse Web Tools Platform (WTP), but the ultimate goal of creating this independent project is to encourage broader adoption in contexts beyond WTP. I expect this project to evolve substantially as others bring their use cases to the table.
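To give a flavor of what working with facets looks like in the existing WTP incarnation, here is a rough sketch of programmatically installing a facet. It is based on my recollection of the current org.eclipse.wst.common.project.facet.core API; the exact signatures may differ, and they may well change as the new project evolves:

import org.eclipse.core.resources.IProject;
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.wst.common.project.facet.core.IFacetedProject;
import org.eclipse.wst.common.project.facet.core.IProjectFacet;
import org.eclipse.wst.common.project.facet.core.ProjectFacetsManager;

public class FacetExample
{
    // Installs the Java facet on a project if it isn't present already.
    public static void ensureJavaFacet( final IProject project,
                                        final IProgressMonitor monitor )
        throws CoreException
    {
        // Wraps the project, converting it into a faceted project if necessary.
        final IFacetedProject fproj = ProjectFacetsManager.create( project, true, monitor );

        final IProjectFacet javaFacet = ProjectFacetsManager.getProjectFacet( "jst.java" );

        if( ! fproj.hasProjectFacet( javaFacet ) )
        {
            // The second argument is the facet-specific install configuration;
            // null requests the defaults.
            fproj.installProjectFacet( javaFacet.getDefaultVersion(), null, monitor );
        }
    }
}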

Anyone who has ever wondered if there is a better solution than a multitude of "Enable function X" menu items, or has ever thought that users should be able to add and remove natures without hacking the .project file, should get involved. I am looking for both potential contributors and potential consumers. If you have use cases or just random thoughts on this subject, I encourage you to jump in and get involved at the newly-created newsgroup.

Project Proposal: http://www.eclipse.org/proposals/fproj/

Newsgroup: http://www.eclipse.org/newsportal/thread.php?group=eclipse.fproj

Tuesday, October 21, 2008

Creating API, Lessons Learned

The subject of how to properly create and declare API is frequently debated in various communities within the Eclipse ecosystem. One of the thorniest issues is the disagreement over how to treat so-called "provisional API", basically API that either has not yet received sufficient feedback or has known issues that cannot be addressed prior to the release. There is a lively debate going on right now on this very subject on the mailing list of the Eclipse E4 project (the effort to build the next generation Eclipse platform, sometimes referred to as Eclipse 4.0), so I thought I would jump in and share our experience in this area at the Eclipse Web Tools Platform (WTP) project, in the hope that others don't repeat the mistakes that were made.

It's important to note that the opinions expressed here are mine alone. Other involved parties may not agree. Names and other identifying information are withheld to protect the guilty.

History

The start of the WTP project was pretty rough. Large code contributions had to be rationalized in the context of a platform that is supposed to be extensible by many external adopters. One of the challenges was the belief, held by many of the committers from the company that made the code contribution, that the APIs were good as is because they had existed in that form for a long time inside that company's commercial products and were therefore proven. Aligned against that was the growing feedback from new adopters who were saying that the APIs were insufficient and in some cases just plain wrong. Creating good APIs that are flexible enough to address a variety of adopter use cases takes a very long time, but unfortunately time was running out. Major companies involved in WTP were pressuring the project to make a release so that they could build commercial products with it. After much debate, a compromise was reached: WTP would make a release, but no API would be declared stable. Everything would be labeled as provisional.

Sounds good in theory, right? What went wrong is that the concept of provisional API was not concretely defined as part of the initial agreement. Everyone (from committers to adopters) ended up with their own idea of what the concept meant. The first release happened and the WTP team went to work on the next release, trying to improve the APIs based on the growing feedback. That's when the fireworks really started. Certain adopters were not particularly happy that WTP was continuously breaking them, despite the fact that they were leveraging code clearly placed in internal packages or otherwise marked as provisional. Granted, adopters didn't have much choice if they wanted to build products on top of WTP, but that's what you get when a project is starting out. None of that seemed to matter, and eventually the WTP PMC bowed under the pressure by instituting a very restrictive code change policy. An "adopter usage scan" tool was created that WTP adopters could use to scan their code bases for references to WTP code and send the reports back to WTP, where they would be collected and used as a reference for determining whether a change is allowed or not. This new policy effectively negated the original promise that was made with regard to provisional API. The new contract covered everything, including code previously designated as provisional and purely internal code. Instead of committers promoting API, anything that an adopter touched (as represented by these reports) effectively became API.

Work on improving the API essentially ground to a halt. It simply became too expensive to fix many of the larger problems. Technically, a committer could seek PMC approval to break code referenced in adopter scans, but exceptions were rarely granted. The argument frequently made was that a proposed change would affect many lines of code in adopter products, so it was "cheaper" for committers not to make the change in question, or at least to make it in a way that was completely backwards compatible. I will leave it as an exercise for the reader to see the fallacy of that argument.

The end result is that WTP was left with a large amount of "in progress" API code, scattered across random internal packages, that was effectively frozen because it became too expensive for committers to continue working on it within the imposed constraints. In many cases, providing the requisite backwards compatibility would have effectively doubled the amount of work. Some improvements that were easy to make in an additive fashion continued to be made over the next few releases, but real progress essentially stopped.

Finally, last year a group of committers was convened to try to improve the situation by proposing a new API policy for WTP. The end result formally defined provisional API and started the process of phasing out the flawed adopter usage scans policy. As someone who was involved in drafting the new API policy, I can tell you that I still see many flaws in it, but it is a step in the right direction. Only time will tell for sure.

Thoughts on API Creation

The following is a collection of my somewhat random thoughts on API creation and the related processes.

  • The ability to declare provisional API is an essential step in API creation. You cannot be sure that an API is right until you have received sufficient and diverse feedback, and it is impossible to attain that level of feedback within one release cycle. Most external adopters will not start looking at a release until it's close to being finished, and they will not start building products on it until even later. The best you will get early on is "yeah, that looks about right", which is not good enough.
  • Placing provisional API in an internal package (such as the internal.provisional convention sometimes used by the Eclipse Platform) creates unnecessary churn for adopters and committers. Consider the case where the provisional API turns out to be 90% correct: the package still has to be renamed when the API is declared, forcing every adopter to update references that were already correct (see the sketch after this list). The advantage of the internal.provisional approach is that you don't have to separately define expectations for provisional code (it gets treated as internal by virtue of the package name), but I would argue that it's worth taking the time to define a separate contract for provisional API, since allowing provisional API in non-internal packages results in less work for both adopters and committers.
  • It's important to have a good system for determining whether an API is ready to be declared fully supported (no longer provisional). Leaving the decision completely in the hands of committers, or even project leads, can lead to problems since people are inherently biased towards their own code. Things to check when deciding whether an API is ready to be declared include the level of documentation, unit test coverage, the presence of outstanding API issues in Bugzilla, and the level (as well as the diversity) of adopter feedback. I prefer a system where a committer nominates the API for declaration and there is a process through which other committers and adopters can raise objections.
  • It's important to carefully balance the needs of committers working on the API and adopters consuming the API. It's a mistake to look at the problem only from the perspective of resource expenditure. For any successful platform, there will always be far fewer resources working on the platform than consuming it. Adding too much protection for platform adopters can inhibit innovation in the platform and ultimately hurt those same adopters.
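To illustrate the churn in the 90%-correct case mentioned above: even when the API itself does not change at all, graduating a class out of an internal.provisional package forces an import change and a rebuild on every adopter. The package and class names below are made up for illustration:

// Before graduation, an adopter compiles against the provisional package:
//
//     import org.example.ui.internal.provisional.WidgetFactory;
//
// After the API is declared, the identical class moves out of the internal
// package, so every adopter must update its imports (and any fully qualified
// references) and rebuild, even though no method signature changed:

import org.example.ui.WidgetFactory;

public class AdopterCode
{
    public void createWidgets()
    {
        final WidgetFactory factory = new WidgetFactory();  // client code unchanged
        // ...
    }
}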