Wednesday, October 29, 2008

Are code generators dumbing down our models?

I suspect that I am about to bring the Wrath of Ed down on me, but here goes...

There are quite a few Java code generators that can take an arbitrary XML Schema and spit out tons of code that a developer doesn't have to write: JAXB, Apache XMLBeans and, of course, EMF, to name a few. While code generators save us a lot of time, the push-button approach can lead to a dumbing down of our models.

I would argue that there are relatively few core modeling patterns, but the flexibility of XML makes it easy to express these patterns in a variety of ways. The generated code then unintentionally surfaces (rather than hides) these XML serialization details in the model layer. This forces the clients of the model to deal with inconsistent and often difficult-to-use APIs.

Consider the basic example of a boolean property. I have seen at least three different ways of representing that in XML:

<some-flag>true</some-flag>
<some-flag-enabled/>  <!-- absence means false --> 
<some-flag value="true"/>  <!-- the attribute is required -->

The above three cases would generate different model code even though, from the modeling perspective, they represent the same construct.
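To make this concrete, here is a minimal sketch (the type and method names are hypothetical) of the single hand-written model API that all three serializations could map to; the choice between element content, element presence and an attribute becomes an implementation detail of the persistence layer:

public interface SomeModelElement
{
    // One boolean property, no matter which of the three XML
    // representations is used underneath.
    boolean getSomeFlag();
    void setSomeFlag( boolean someFlag );
}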

A more complex pattern is the selector-details construct where selector is a type enumeration and details provide settings specific to the type. I stopped counting how many different ways I've seen that pattern represented. Here are two of the most common examples:

Example 1: An explicit type element controls which property elements are applicable.

<type>...</type>  <!-- valid values are X and Y -->
<property-1>...</property-1>  <!-- associated with type X -->
<property-2>...</property-2>  <!-- associated with type Y -->
<property-3>...</property-3>  <!-- associated with type X and type Y -->

Example 2: In this case, the elements alternative-x and alternative-y are mutually exclusive. The element names are functioning as type selectors.

<alternative-x>
  <property-1>...</property-1>
  <property-3>...</property-3>
</alternative-x>
<alternative-y>
  <property-2>...</property-2>
  <property-3>...</property-3>
</alternative-y>

I would argue that the above cases are semantically identical and therefore should have the same representation in the Java model. Of course, that doesn't happen. All of the existing code generators that I am aware of will produce drastically different code for these two alternatives.
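As a thought experiment, here is a rough sketch (all names hypothetical) of the one Java model that both examples could surface; the selector becomes an enumeration and the applicability rules become documented constraints rather than structural differences:

public interface Settings
{
    enum Type { X, Y }

    Type getType();
    void setType( Type type );

    String getProperty1();  // applicable when type is X
    String getProperty2();  // applicable when type is Y
    String getProperty3();  // applicable to both types

    void setProperty1( String value );
    void setProperty2( String value );
    void setProperty3( String value );
}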

So why should we care? I would argue that in many cases, while we are saving time by generating model code, the savings come at the expense of complicating the model consumer code. Recently, I took over a project at Oracle that was building form-based editors for several rather complicated WebLogic Server deployment descriptors. The schemas of these descriptors evolved over many server releases, and many people had a hand in augmenting them. The result is a complete lack of consistency. You could say that perhaps the schemas should have been more carefully evolved, but I would argue that they represent a rather realistic example of what complex schemas look like in the real world.

In any case, the first attempt at building these editors was to generate an EMF model from the XSDs and to build UI that would bind to EMF. That worked OK for a while, but eventually the UI code started to get too complicated. Much of the UI binding code had to be hand-written. It ultimately made sense to throw away the generated model code and to hand-code the model. That gave us complete control over how the model surfaces XML constructs and made it possible to reduce the amount of custom UI code by several orders of magnitude.

I am certainly not trying to say that generated model code is a bad idea, but the ease with which it is possible to toss an XSD into a code generator and get a bunch of model code in return plays a part in encouraging developers to pay less attention than is really necessary to the model layer.

Thursday, October 23, 2008

Common Servers View for Eclipse

There is a cool collaboration happening right now between WTP and DTP to build a shared Servers view that will replace the separate WTP Servers view and DTP Data Source Explorer view. The new view is built using the Common Navigator Framework and will make it easy for other Eclipse projects to contribute content. One of the important goals is to reduce clutter on the user's workbench by collapsing many individual views into one. If you are interested in this effort, you should check out Bug 252239. There has also been some relevant discussion on Bug 245013 and Bug 247934.

So, I am looking at you... CVS Tooling, Subversive, DSDP Target Management, SOA Tools, etc. You know who you are. Come join the party.

Eclipse Project Declaration : Faceted Project Framework

I am blogging today to raise awareness of a project declaration that might have gone unnoticed in the inboxes of Eclipse committers and other members. The goal of the Faceted Project Framework is to provide a re-usable system that facilitates treating Eclipse projects as composed of units of functionality (called facets) that can be easily added or removed by users. The initial code contribution will come from a mature and rather successful component in the Eclipse Web Tools Platform (WTP), but the ultimate goal for creating this independent project is to encourage broader adoption in contexts beyond that of WTP. I expect this project will evolve substantially as others bring their use cases to the table.

Anyone who has ever wondered if there was a better solution than a multitude of "Enable function X" menu items, or has ever thought that users should be able to add and remove natures without hacking the .project file, should get involved. I am looking for both potential contributors and potential consumers. If you have use cases or just random thoughts on this subject, I encourage you to jump in and get involved at the newly-created newsgroup.

Project Proposal: http://www.eclipse.org/proposals/fproj/

Newsgroup: http://www.eclipse.org/newsportal/thread.php?group=eclipse.fproj

Tuesday, October 21, 2008

Creating API, Lessons Learned

The subject of how to properly create and declare API is frequently debated in various communities within the Eclipse ecosystem. One of the thorniest issues is the disagreement over how to treat so-called "provisional API": basically, API that either has not yet received sufficient feedback or has known issues that cannot be addressed prior to the release. There is a lively debate going on right now on this very subject on the mailing list of the Eclipse E4 project (the effort to build the next generation Eclipse platform, sometimes referred to as Eclipse 4.0), so I thought I should jump in and share our experience in this area at the Eclipse Web Tools Platform (WTP) project, in hopes that others don't repeat the mistakes that were made.

It's important to note that the opinions expressed here are mine alone. Other involved parties may not agree. Names and other identifying information are withheld to protect the guilty.

History

The start of the WTP project was pretty rough. Large code contributions had to be rationalized in the context of a platform that is supposed to be extensible for many external adopters. One of the challenges was the belief by many of the committers from the company that made the code contribution that the APIs were good as is, because they had existed like that for a long time inside that company's commercial products and were therefore proven. Aligned against that was the growing feedback from new adopters who were saying that the APIs were insufficient and in some cases just plain wrong. Creating good APIs that are flexible enough to address a variety of adopter use cases takes a very long time, but unfortunately time was running out. Major companies involved in WTP were pressuring the project to make a release so that they could build commercial products with it. After much debate, a compromise was reached. WTP was going to make a release, but we were not going to declare any API as stable. Everything would be labeled as provisional.

Sounds good in theory, right? What went wrong is that the concept of provisional API was not concretely defined as part of the initial agreement. Everyone (from committers to adopters) ended up with their own ideas about the meaning of the concept. The first release happened, and the WTP team went to work on the next release, trying to improve the APIs based on the growing feedback. That's when the fireworks really started. Certain adopters were not particularly happy that WTP was continuously breaking them, despite the fact that they were leveraging code clearly placed in internal packages or otherwise marked as provisional. Granted, adopters didn't have much choice if they wanted to build products on top of WTP, but that's what you get when a project is starting out.

None of that seemed to matter, and eventually the WTP PMC bowed under the pressure by instituting a very restrictive code change policy. An "adopter usage scan" tool was created that WTP adopters could use to scan their code base for references to WTP code and send these reports back to WTP, where they would be collected and used as a reference for determining whether a change was allowed or not. This new policy effectively negated the original promise that was made with regard to provisional API. The new contract covered everything, including code previously designated as provisional and purely internal code. Instead of committers promoting API, anything that an adopter touched (as represented by these reports) effectively became API.

Work on improving API essentially ground to a halt. It just became too expensive to fix many of the larger problems. Technically, a committer could seek PMC approval to break code referenced in adopter scans, but exceptions were rarely granted. The argument that was frequently made is that a proposed change would affect many lines of code in adopter products, so it is "cheaper" for committers to not make the change in question, or at least to make it in a way that's completely backwards compatible. I will leave it as an exercise for the reader to see the fallacy of that argument.

The end result is that WTP was left with large amounts of "in progress" API code in random internal packages, effectively frozen because it became too expensive for committers to continue to work on it within the imposed constraints. In many cases, providing the requisite backwards compatibility would have effectively doubled the amount of work. Some improvements that were easy to make in an additive fashion continued to be made over the next few releases, but real progress essentially stopped.

Finally, last year a group of committers was convened to try to improve the situation by proposing a new API policy for WTP. The end result formally defined provisional API and started the process of phasing out the flawed adopter usage scans policy. As someone who was involved in drafting the new API policy, I can tell you that I still see many flaws in it, but it is an effort to take a step in the right direction. Only time will tell for sure.

Thoughts on API Creation

The following is a collection of my somewhat random thoughts on API creation and the related processes.

  • The ability to declare provisional API is an essential step in API creation. You cannot be sure that the API is right until you have received sufficient and diverse feedback, and it is impossible to attain that level of feedback within one release cycle. Most external adopters will not start looking at a release until it's close to being finished. They will not start building products on it until even later. The best you will get early on is "yeah, that looks about right", which is not good enough.
  • Placing provisional API in an internal package (such as the internal.provisional convention sometimes used by the Eclipse Platform) creates unnecessary churn for adopters and committers. Consider the case where provisional API turns out to be 90% correct: when it is promoted, the package rename still forces every adopter to update their references, even though little has actually changed. The advantage of the internal.provisional approach is that you don't have to separately define expectations for provisional code (it gets treated as internal by virtue of the package name), but I would argue that it's worth taking the time to define a separate contract for provisional API, since allowing provisional API in non-internal packages results in less work for both adopters and committers.
  • It's important to have a good system for determining whether API is ready to be declared as fully supported (not provisional any more). Leaving the decision completely in the hands of committers or even project leads can lead to problems, since people are inherently biased towards their own code. Some things to check when deciding whether an API is ready to be declared are the level of documentation, unit test coverage, the presence of outstanding API issues in Bugzilla, and the level (as well as diversity) of adopter feedback. I prefer a system where a committer nominates the API for declaration and there is a process where other committers and adopters can raise objections.
  • It's important to carefully balance the needs of committers working on the API and adopters consuming the API. It's a mistake to only look at the problem from the perspective of resource expenditure. For any successful platform, there will always be far fewer resources working on the platform than consuming the platform. Trying to add too much protection for platform adopters can inhibit innovation in the platform and ultimately hurt those same adopters.

Tuesday, September 30, 2008

Eclipse Nexus Project Proposal

It's an unfortunate fact that many Eclipse projects operate in their own little worlds without knowledge of or collaboration with other projects. This leads to code duplication and visible seams in the finished Eclipse product. Part of the problem is social. It's hard to keep track of all the projects out there and even harder to reach out to discuss collaboration, but a big part of it is also due to the lack of infrastructure that facilitates code sharing efforts. Consider the situation where two sibling projects (no dependency on each other) want to collaborate on some shared code. There is really no effective way for this collaboration to take place. Where would you put the shared code?

In order to try to address this problem, I have been working on a proposal for an Eclipse Nexus Project that would take on facilitating such collaboration. Right now, I am discussing the draft proposal with the Eclipse Technology Project PMC, which would serve as a host for Nexus as it gets off the ground.

Anyone interested in learning more or in joining the effort can read the draft proposal, add themselves to the wiki, contact me directly, etc.

Eclipse Nexus Project Proposal Wiki

Thursday, September 18, 2008

Facets FAQ : Supporting modular runtimes

This is the first entry in a series of posts where I will answer some frequently asked questions about the Faceted Project Framework in Eclipse.

Question: Some application servers such as WebLogic and JBoss are modular in nature. Various components of the server can be selectively present or absent. Further, even if the component is present, a particular configuration of the server might turn that component on or off. So say I have a portal server component and a corresponding facet for enabling tooling related to that portal component. I want the portal facet to only be available for selection if the targeted runtime includes the portal component. How do I set that up?

Answer: The Faceted Project Framework models runtimes as being made up of one or more runtime components. When you create a facet, you get to declare which runtime components it is supported on. In addition to what's specified explicitly, the facet will have implicit constraints on runtimes based on the dependencies that it declares on other facets.

Let's take the portal example described in the question. A Java application server will typically be modeled as composed of at least two runtime components. One will represent the JRE that the server is running on. The other will represent the base application server with all of its core capabilities. In the portal example, we will want to add a third runtime component to the mix to represent the portal module. Once we do that, we can easily map the portal facet to the portal component, and the facet will only be shown to the user if that component is present in the runtime.

The following little snippet declares a portal runtime component and maps the portal facet (assumed to be already declared) to it.

<extension point="org.eclipse.wst.common.project.facet.core.runtimes">
  <runtime-component-type id="sample-portal-component"/>
  <runtime-component-version type="sample-portal-component" version="1.0"/>
  <supported>
    <runtime-component id="sample-portal-component" version="1.0"/>
    <facet id="sample-portal-facet" version="1.0"/>
  </supported>
</extension>

So how do I add runtime components to my runtime? In WTP, there is a bridge that translates runtimes defined using the WTP Server Tools API into the API that's understood by the Faceted Project Framework. The default behavior of the bridge is to create two-component runtimes as described in the previous paragraph, but there is an extension point that lets you add a component provider that will be called when the bridge is converting the runtime. Here is how you declare a component provider:

<extension point="org.eclipse.jst.server.core.internalRuntimeComponentProviders">
  <runtimeComponentProvider
    id="[extension-id]"
    class="[provider-class-name]"
    runtimeTypeIds="[server-tools-runtime-id]"/>
</extension>

Once invoked, the component provider can examine various aspects of the runtime (such as the state on disk at the location pointed to by the runtime, or settings in the workspace). It can then construct and return the appropriate runtime components, which will be merged with the components created by the bridge to form a fully-specified runtime definition. Here is a sketch of what such a component provider might look like:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.eclipse.wst.common.project.facet.core.runtime.IRuntimeComponent;
import org.eclipse.wst.common.project.facet.core.runtime.IRuntimeComponentType;
import org.eclipse.wst.common.project.facet.core.runtime.IRuntimeComponentVersion;
import org.eclipse.wst.common.project.facet.core.runtime.RuntimeManager;
import org.eclipse.wst.server.core.IRuntime;
import org.eclipse.wst.server.core.internal.facets.RuntimeFacetComponentProviderDelegate;

public final class SampleRuntimeComponentProvider extends RuntimeFacetComponentProviderDelegate
{
    // The component type id must match the id declared in the
    // runtime-component-type extension shown earlier.
    private static final IRuntimeComponentType PORTAL_TYPE 
        = RuntimeManager.getRuntimeComponentType( "sample-portal-component" );
    
    private static final IRuntimeComponentVersion PORTAL_VERSION_1 
        = PORTAL_TYPE.getVersion( "1.0" );
    
    public List<IRuntimeComponent> getRuntimeComponents( final IRuntime runtime )
    {
        final File location = runtime.getLocation().toFile();
        final List<IRuntimeComponent> components = new ArrayList<IRuntimeComponent>();
        
        // Only contribute the portal component if the portal module
        // is actually installed at the runtime location.
        if( isPortalPresent( location ) )
        {
            final IRuntimeComponent portalComponent
                = RuntimeManager.createRuntimeComponent( PORTAL_VERSION_1, null );
            
            components.add( portalComponent );
        }
        
        return components;
    }
    
    private static boolean isPortalPresent( final File location )
    {
        return false;  // TODO: Implement the check for the portal module.
    }
}
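For instance, a concrete registration for the sketch above might look like this (the extension id, runtime type id and package-less class name are illustrative only):

<extension point="org.eclipse.jst.server.core.internalRuntimeComponentProviders">
  <runtimeComponentProvider
    id="sample.portal.componentProvider"
    class="SampleRuntimeComponentProvider"
    runtimeTypeIds="sample.server.runtime"/>
</extension>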

Calling all UI experts

So I was working on an Eclipse form-based editor the other day when I came across a rather interesting UI puzzle. In my model, I have an enumeration field. Depending on what the user selects, certain other detail fields become relevant and need to be shown. Previously, I solved this problem by putting the master combo field first, followed by a details frame that contains all the detail fields and is updated when a combo selection is made. That worked relatively well when the master-details block was in a section by itself, but as soon as it got surrounded by other fields, it became difficult to tell at a glance that the combo and the frame belonged together and were separate from the other fields in the section.
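For those curious about the mechanics, here is roughly how the first approach is wired up. This is just a sketch, assuming an enclosing parent composite; the two createDetailsFor* methods stand in for whatever builds the actual detail fields:

// The details area swaps its content via StackLayout (from
// org.eclipse.swt.custom) whenever the master combo selection changes.

final Combo master = new Combo( parent, SWT.READ_ONLY );
master.setItems( new String[] { "Type X", "Type Y" } );

final Composite details = new Composite( parent, SWT.NONE );
final StackLayout stack = new StackLayout();
details.setLayout( stack );

final Composite detailsForX = createDetailsForX( details );  // hypothetical helper
final Composite detailsForY = createDetailsForY( details );  // hypothetical helper

master.addSelectionListener( new SelectionAdapter()
{
    public void widgetSelected( final SelectionEvent event )
    {
        stack.topControl = master.getSelectionIndex() == 0 ? detailsForX : detailsForY;
        details.layout();
    }
} );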

See for yourself...

[screenshot: loggingSection]

Besides dancing around the problem by adding a little white space above the master combo or putting the entire master-details block into a section by itself, one other approach occurred to me. Here, the master and detail fields are next to each other and enclosed together in a frame.

[screenshot: loggingSectionAllInFrame]

Both approaches have their pluses and minuses. The second approach does make it clearer that master and details are tied together, but we lose the ability to identify what is master and what is details. My question for the UI experts out there is: how do you typically render such master-details blocks? Do you use one of the approaches that I described? Something else entirely?

In San Francisco area next week

I will be in the San Francisco area visiting Oracle HQ next week. Send me an e-mail if you are in the area and want to get together for drinks after work and talk about Eclipse, WTP, etc.

Wednesday, September 10, 2008

Are Eclipse committer elections a little too open?

Since I became a committer on the Eclipse WTP project a few years ago, I have witnessed and participated in many committer elections. One aspect of the way elections are conducted has struck me as somewhat problematic. Clearly, having transparent processes is important to an open source community, but I wonder if there is such a thing as too much of a good thing. In all the committer elections that I have observed, I have never seen a single negative vote. That somewhat defeats the point of having an election in the first place.

As you can imagine, I have a theory for why negative votes don't happen, and I am curious whether other people agree with my observations. Consider that a nominated contributor likely knows a few existing committers on the project in other ways than just interactions over past contributions. Perhaps the nominee works for the same employer. These committers might have a vested interest in getting the nominee elected. Now suppose you have another committer on the project who has a real objection to the nominee getting elected. What options does this committer have? He can vote his true opinion and likely face retribution from other committers on the project (thus making it more difficult for him to work on the project). He can bite his tongue and abstain. Or he might actually feel compelled to vote +1. Since the voting record is visible, he might feel that even abstaining would jeopardize his working relationship with other committers on the project.

I wonder if a system where only the EMO knows who voted how and the public record hides the names (comments would still be shown) would create an environment for more effective committer elections?

Friday, September 5, 2008

Designing API for fragile content

Just as with any other profession, programming involves quite a bit of monotonous and repetitive work. Interesting problems do come up, of course, but not so frequently that encountering one always brings a smile to my face. One of my deep interests in the programming profession is API design, so it would be fair to say that when I recently encountered a tricky API design puzzle, I got pretty excited.

I was tasked with building a forms-based editor in Eclipse for an XML file with a certain schema. I started out by extending the XML source editor that's part of WTP. That gave me the source view tab for my editor, and I could access the XML DOM that the source editor exposed. Any changes I made to the DOM would propagate to the source buffer. That's a pretty good start, but I did not want my forms UI working directly with the DOM. I don't know if the DOM API has any fans, but I am certainly not one of them, and I didn't want my UI code getting cluttered with it. OK, easy enough. Just take the DOM and wrap it in an API custom-created for the schema.

Many of the elements in this particular document schema are tightly-typed. There are integers, class names, file paths, etc. My first cut at the API used these types in the getters and setters...
Integer getMinDuration();
void setMinDuration( Integer minDuration );
That works well enough when content is well-formed, but this is an XML file that's edited directly by users. Handling of malformed content is very important. Let's say that the min-duration element is found, but its content cannot be parsed as an integer. The only option that the above API left me was to return null. That might be acceptable in some cases, but it produces a rather poor user experience in the context of an editor. The text field that would be bound to this property would be blank, forcing the user to either type in a new value or revert to the source view in order to fix the existing value. What I wanted to do is show the malformed value in the text field together with a problem decoration so that the user can see and fix it easily. OK, so let's augment the API a bit...
Integer getMinDuration();
String getMinDurationUnparsed();
void setMinDuration( Integer minDuration );
void setMinDuration( String minDuration );
That's better, but min duration has a default value and only positive integers are valid. A bit more API augmentation was in order...
Integer getMinDuration();
String getMinDurationUnparsed();
Integer getMinDurationDefault();
void setMinDuration( Integer minDuration );
void setMinDuration( String minDuration );
IStatus validateMinDuration();
Now I had enough information in the API to build the UI that I needed, but the API was starting to smell a bit. That's six methods for one element, and the schema has dozens of elements. There had to be a better way to structure this API. After some head-scratching, I decided to try returning a surrogate object from the getter method instead of the actual value. The surrogate would handle parsing, default values and validation...
IntegerValue getMinDuration();
void setMinDuration( Integer minDuration );
void setMinDuration( String minDuration );

class IntegerValue
{
    String getString();
    String getString( boolean useDefault );
    Integer getParsedValue();
    Integer getParsedValue( boolean useDefault );
    IStatus validate();
}
The getMinDuration() method would always return a non-null surrogate object. The caller then decides what aspect of the value they are interested in querying. The IntegerValue class supplies default validation logic for handling unparsable content, but additional validation can be added. For instance, in this case only integers greater than zero are valid. Since a range is a pretty common constraint, I made the IntegerValue constructor take the min and max values (in addition to the raw string value of the property and the default value). More complicated validation scenarios can be handled by subclassing the IntegerValue class.

Note that only the getter deals with the surrogate object. I wanted to keep the surrogate objects immutable so that they can be handled in a manner similar to basic value types without worrying about synchronization. When setting a value, you either have a raw value (either it can't be parsed or the code in question doesn't want to deal with parsing it) or you have a tightly-typed value. An overloaded setter method takes care of both of these scenarios.

As you can imagine, it was simple at this point to extend this pattern to other types. I created a base class for all value types, which made it possible for some code to handle a variety of types without knowing what they actually are. A good example of this is text field data binding code. Since any value can be retrieved and set as a string, any value can be bound to a text field.
abstract class Value<T>
{
    String getString();
    String getString( boolean useDefault );
    T getParsedValue();
    T getParsedValue( boolean useDefault );
    IStatus validate();
}

class IntegerValue extends Value<Integer>
{
    ...
}
I actually ended up using the same pattern even for properties that were strings by creating a StringValue class. Even though there is no parsing involved, the benefit of having consistent access to default value handling and validation made it worth it. So what do you think?
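For completeness, here is a rough sketch of what the IntegerValue implementation might look like, assuming the base Value class stores the raw string and the default value and delegates parsing to a parse() method supplied by subclasses; the plug-in id and error messages are placeholders:

import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;

public class IntegerValue extends Value<Integer>
{
    private final Integer min;  // null means unbounded
    private final Integer max;  // null means unbounded

    public IntegerValue( final String raw, final Integer defaultValue,
                         final Integer min, final Integer max )
    {
        super( raw, defaultValue );  // assumed base class constructor
        this.min = min;
        this.max = max;
    }

    protected Integer parse( final String str )
    {
        try
        {
            return Integer.valueOf( str );
        }
        catch( NumberFormatException e )
        {
            return null;  // surrogate keeps the raw string; parsed value is absent
        }
    }

    public IStatus validate()
    {
        final Integer parsed = getParsedValue( false );

        // Raw content is present but could not be parsed.
        if( getString( false ) != null && parsed == null )
        {
            return createErrorStatus( "Value must be an integer." );
        }

        // Range check, applied only when a parsed value exists.
        if( parsed != null && ( ( min != null && parsed.intValue() < min.intValue() ) ||
                                ( max != null && parsed.intValue() > max.intValue() ) ) )
        {
            return createErrorStatus( "Value is outside the allowed range." );
        }

        return Status.OK_STATUS;
    }

    private static IStatus createErrorStatus( final String message )
    {
        return new Status( IStatus.ERROR, "sample.plugin", message );  // plug-in id is a placeholder
    }
}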