Thursday, March 7, 2013

Key unsolved problems with Java jar architecture

Writing correct code and testing it thoroughly is difficult enough without having to worry about the mechanics of how the build puts application code together. Any manual step that stands between a developer and feedback on his code changes will eventually be paid for with fatigue, lowered productivity and bugs. The dependency system, the architecture, and the build system should intuitively do the right thing and be streamlined for the most common development operations. The purpose of this article is to describe a type of Java architecture I am familiar with and its underlying limitations. Throughout, I will interject some possible remedies, although I think most of the value comes simply from articulating the issues and recognizing that they can be solved.

World of JAR graphs

The kind of Java ecosystem I focus on here consists of applications and the modules they depend on. The distinguishing feature of an application is that it either has a Main class or is deployed on a web server. Duplicated code is abstracted into modules packaged as jars, which can be shared among applications. An additional benefit of modules is that they can be precompiled into jars and stored in a repository, so the application can be built much more quickly. To stabilize the application promotion process, jars are versioned, and a version is explicitly specified for every dependency. Jars can depend on other jars, so the whole ecosystem forms one or more directed graphs.

To get an idea of the kind of problems I will be describing, note how the existence of versioned jars introduces intrinsic problems into the build. Two versions of the same jar may not have a single class in common, and their dependency graphs could be completely different. If we wanted to assure correctness, we would treat each version of a module as a separate module in its own right. But then how do we account for the fact that most modules share most of their class names between versions? If this is the kind of thing that keeps you awake at night and frustrated during the day, please read on.

Local build problem

When code lives in the application, the change-test cycle is quick and seamless. Changing the code, running the application and observing the new behavior seems to be the smallest possible set of steps (although even this has been questioned by Bret Victor). This quick cycle allows for good flow and encourages small steps. However, as soon as the code change has to happen in a module, additional steps are introduced into the feedback cycle. Once the change is made, the module has to be published to a local repository, usually with a new version number, and the application has to be updated to that new version number before it is built. Having to perform these steps breaks the flow by making the developer shift focus to the mechanics of the build.

The local build problem is addressed by many techniques, from introducing a project dependency in Eclipse, through Maven snapshot builds, to the git submodule feature. None of these is satisfactory. Ideally, the build system should detect whether a module's source code is present on the developer's machine and build from source if it is, and otherwise use the binary dependency (I first saw this idea mentioned in Adam Murdoch's post).
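The source-or-binary rule can be sketched as a small resolver. This is a minimal illustration, not how any existing build tool implements it: the class SourceOrBinaryResolver, the assumed workspace layout (each checked-out module in <workspace>/<module> with a src directory) and the jar naming are all hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical resolver: build a module from source if it is checked out, else use its jar. */
final class SourceOrBinaryResolver {
    enum Kind { SOURCE, BINARY }

    /** Where a dependency will be taken from. */
    record Resolution(Kind kind, String location) {}

    private final Path workspace;                     // assumed layout: <workspace>/<module>/src
    private final Map<String, String> pinnedVersions; // module name -> version from the build file

    SourceOrBinaryResolver(Path workspace, Map<String, String> pinnedVersions) {
        this.workspace = workspace;
        this.pinnedVersions = pinnedVersions;
    }

    Resolution resolve(String module) {
        Path checkout = workspace.resolve(module);
        if (Files.isDirectory(checkout.resolve("src"))) {
            // Source is on this machine: compile it as part of the application build.
            return new Resolution(Kind.SOURCE, checkout.toString());
        }
        String version = pinnedVersions.get(module);
        if (version == null) {
            throw new IllegalArgumentException("No version declared for " + module);
        }
        // No local checkout: fall back to the published binary.
        return new Resolution(Kind.BINARY, module + "-" + version + ".jar");
    }
}
```

The point of the design is that the developer never edits the build file to switch between the two: checking out a module is the only signal the resolver needs.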

Publish problem

Once a change in the module has been made and tested with the local version of the application, it is time to publish the module to a public repository and start the promotion process for the app. This single logical operation involves multiple steps, for instance: manually change the version, commit the code, build the jar to be published, and change the application to use the new version. In an ideal world, the developer would just commit the code and everything else would happen automatically.

Possible solution to the local build and publish problem

We do not have to accept the status quo. Consider a hypothetical system that supports a low-friction workflow. This system uses version naming conventions to facilitate automation.

We will use the following version numbering scheme: moduleA-X.Y.Z.jar, where X, Y, and Z are version numbers. Position X can be updated manually by a developer when there are breaking changes that need to be signaled to other developers or to the dependency conflict resolution mechanism. Position Z is reserved for local builds; artifacts in the public repository always have 0 in position Z. When a developer makes a change in the module and runs the deploy script, the script automatically increments Z, say by assigning a timestamp to it. The application is always configured to let local workstation versions of the module win resolution over the versions in the public repository, i.e., the dependency is specified as a "plus" type (moduleA-X.Y.+). When the application is built locally, the build system automatically deletes from the local workstation repository all versions from previous days (easy to do because Z is a timestamp). This prevents stale versions from unrelated work in other modules from getting into the local application build. When the application is built for promotion, say on a CI server, the + dependency resolves to the officially published version in the public repository, whose Z is always 0.
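A minimal sketch of these conventions, assuming versions are plain X.Y.Z numbers and Z is a yyyyMMddHHmmss timestamp. The class LocalVersioning and its methods localDeploy, resolvePlus and pruneStale are hypothetical names; a real build tool would apply the same rules through its own dependency resolution and repository APIs.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Sketch of the X.Y.Z convention: Z == 0 for public artifacts, Z == timestamp for local builds. */
final class LocalVersioning {
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    record Version(int x, int y, long z) implements Comparable<Version> {
        static Version parse(String s) {
            String[] p = s.split("\\.");
            return new Version(Integer.parseInt(p[0]), Integer.parseInt(p[1]), Long.parseLong(p[2]));
        }

        boolean isLocal() { return z != 0; }

        @Override public int compareTo(Version o) {
            if (x != o.x) return Integer.compare(x, o.x);
            if (y != o.y) return Integer.compare(y, o.y);
            return Long.compare(z, o.z);
        }

        @Override public String toString() { return x + "." + y + "." + z; }
    }

    /** "local-deploy": keep X.Y, put the current timestamp in Z. */
    static Version localDeploy(Version base, LocalDateTime now) {
        return new Version(base.x(), base.y(), Long.parseLong(now.format(STAMP)));
    }

    /** Resolve a dependency declared as "X.Y.+"; a local build (Z > 0) beats the public one (Z == 0). */
    static Optional<Version> resolvePlus(String spec, Collection<Version> available) {
        String[] p = spec.split("\\.");
        if (p.length != 3 || !p[2].equals("+")) {
            throw new IllegalArgumentException("Expected X.Y.+ but got " + spec);
        }
        int x = Integer.parseInt(p[0]);
        int y = Integer.parseInt(p[1]);
        return available.stream()
                .filter(v -> v.x() == x && v.y() == y)
                .max(Comparator.naturalOrder());
    }

    /** Drop local versions stamped before today so stale work cannot leak into the build. */
    static List<Version> pruneStale(Collection<Version> localRepo, LocalDate today) {
        long startOfToday = Long.parseLong(today.atStartOfDay().format(STAMP));
        return localRepo.stream()
                .filter(v -> !v.isLocal() || v.z() >= startOfToday)
                .sorted()
                .toList();
    }
}
```

Because a local timestamp is always greater than the public 0, a plain "highest version within X.Y wins" rule is enough for local builds to prefer today's work, while a CI server that only sees the public repository gets the Z == 0 artifact.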

The publish problem is then somewhat alleviated by reserving position Y for automatically generated public versions of the jar. Upon commit, the continuous integration server picks up the change and starts a publish job, with a timestamp or other incremental value in position Y (and 0 in position Z). This addresses only part of the publish problem by eliminating the manual version bump; the application's dependencies still need to be updated to the new public version of the jar. Read on to see how this could be solved by introducing a new build primitive.
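The CI side only has to compute the next Y. The sketch below uses a simple counter (highest existing Y plus one) rather than a timestamp; CiPublish and nextPublicVersion are hypothetical names, and the list of published versions is assumed to come from the public repository.

```java
import java.util.Collection;

/** Sketch of the CI publish job: bump Y automatically, always publish with Z == 0. */
final class CiPublish {
    record Version(int x, long y, long z) {
        @Override public String toString() { return x + "." + y + "." + z; }
    }

    /** Next public version on major line X, given what the public repository already holds. */
    static Version nextPublicVersion(int x, Collection<Version> published) {
        long highestY = published.stream()
                .filter(v -> v.x() == x)
                .mapToLong(Version::y)
                .max()
                .orElse(-1L);          // no release on this X line yet: start at Y == 0
        return new Version(x, highestY + 1, 0);
    }
}
```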

Bump downstream problem

When an update is needed in a jar that the application brings in transitively, the publish problem degenerates into what I refer to as the bump downstream problem. Consider a case where AppX depends on moduleA, which depends on moduleB, which depends on moduleC. If the change needs to happen in moduleC, the versions have to be bumped along the hierarchy all the way up to the application level. This is a manual and tedious process. If other applications also depend on moduleC, it is unlikely that the developer will try to integrate his change into those applications, because the manual process is in the way.

What could be useful is a build that supports a primitive operation, "bump-downstream", that takes the terminal node of the path as its parameter. For instance, from within moduleC's build directory we could execute "bump-downstream(AppX)" to have all the necessary modules brought in locally, bumped, unit tested, committed, and deployed all the way through the application level. The process would stop if tests failed at any level before reaching the application.
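The bump-downstream primitive can be sketched as a walk up the reverse dependency graph. In this sketch the graph is a map from each module to the modules that depend on it, and BuildActions is a hypothetical interface standing in for real checkout, version editing, test, commit and publish steps. For simplicity it follows only the shortest path to the target; a real implementation would need to bump every path through which the target reaches the changed module.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Sketch of the hypothetical "bump-downstream" primitive. */
final class BumpDownstream {
    /** Hypothetical hooks into the real build and version control; all names are assumptions. */
    interface BuildActions {
        void checkout(String module);
        void bumpDependency(String module, String dependency, String newVersion);
        boolean runTests(String module);
        String commitAndPublish(String module); // returns the version it published
    }

    /** Shortest chain changed -> ... -> target, following "is depended on by" edges (BFS). */
    static List<String> pathTo(Map<String, List<String>> dependents, String changed, String target) {
        Map<String, String> cameFrom = new HashMap<>();
        cameFrom.put(changed, null);
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = current; n != null; n = cameFrom.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : dependents.getOrDefault(current, List.of())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        throw new IllegalArgumentException(target + " does not depend on " + changed);
    }

    /**
     * Walks up the chain one level at a time. Returns the modules that were bumped,
     * tested and published; stops at the first level whose tests fail.
     */
    static List<String> bumpDownstream(Map<String, List<String>> dependents, String changed,
                                       String changedVersion, String target, BuildActions build) {
        List<String> path = pathTo(dependents, changed, target);
        List<String> published = new ArrayList<>();
        String version = changedVersion;
        for (int i = 1; i < path.size(); i++) {
            String dependency = path.get(i - 1);
            String module = path.get(i);
            build.checkout(module);
            build.bumpDependency(module, dependency, version);
            if (!build.runTests(module)) {
                break; // do not publish; the developer fixes this level first
            }
            version = build.commitAndPublish(module);
            published.add(module);
        }
        return published;
    }
}
```

Each level is bumped to the version just published by the level below it, which is what removes the manual edits from the hierarchy.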

Workflow example

Let us take a look at what a streamlined workflow with a module might look like if we added the "bump-downstream" operation.

The developer checks out the AppX and moduleA source code. He writes a failing test in AppX. He makes a change in the moduleA source, runs the "local-deploy" target, and runs his tests in the app. He continues this cycle until he is satisfied with local development and testing. He then commits the moduleA and AppX code, and runs "bump-downstream(AppX)" from within moduleA.

Note that this workflow is the same no matter how distant a dependency moduleA is. There are no manual modifications to version numbers anywhere. All the complexity of maintaining the network of dependencies and publishing is managed automatically inside the bump-downstream operation. This kind of solution is achievable by any development team with the use of existing Java build tools and a bit of dedication.

Continuous integration problem

The existence of multiple versions of jars in the dependency graph poses a number of problems.

ModuleA and moduleB could depend on different versions of moduleC that are incompatible (for example, a class could have been deleted or renamed). If an application needs both moduleA and moduleB, a choice has to be made about which version of moduleC to bring in. Dependency conflict resolution tools have no good way of dealing with incompatible versions; all they can offer is a "latest wins" or "fail on major conflict" strategy. OSGi offers to solve this problem, but at the price of complicating the build.
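To make the limitation concrete, here is a minimal sketch of the two strategies. ConflictResolution and its types are hypothetical; "major conflict" is assumed to mean requests that differ in the first version number, in line with position X above. Neither strategy can tell whether moduleA still works against the version of moduleC it ends up with.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of the two common conflict resolution strategies for one contested module. */
final class ConflictResolution {
    record Version(int major, int minor, int patch) implements Comparable<Version> {
        static Version parse(String s) {
            String[] p = s.split("\\.");
            return new Version(Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2]));
        }

        @Override public int compareTo(Version o) {
            if (major != o.major) return Integer.compare(major, o.major);
            if (minor != o.minor) return Integer.compare(minor, o.minor);
            return Integer.compare(patch, o.patch);
        }
    }

    enum Strategy { LATEST_WINS, FAIL_ON_MAJOR_CONFLICT }

    /** requestedBy maps each dependent module to the version of `module` it asks for. */
    static Version resolve(String module, Map<String, Version> requestedBy, Strategy strategy) {
        if (requestedBy.isEmpty()) {
            throw new IllegalArgumentException("Nothing requests " + module);
        }
        if (strategy == Strategy.FAIL_ON_MAJOR_CONFLICT) {
            Set<Integer> majors = requestedBy.values().stream()
                    .map(Version::major)
                    .collect(Collectors.toSet());
            if (majors.size() > 1) {
                throw new IllegalStateException("Incompatible versions of " + module + ": " + requestedBy);
            }
        }
        // Both strategies end up picking the highest version; neither verifies it works for every requester.
        return Collections.max(requestedBy.values());
    }
}
```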

Another issue with multiple simultaneously live versions is that if a change in a jar causes a compatibility issue downstream, the developer who makes it gets no immediate feedback. If the developer knew of the problem, he could try another, non-breaking solution, or would at least have everything fresh in his mind to resolve as many downstream problems as possible. The static dependency system that is so essential to application release stability makes it hard to do the right thing.

I refer to this as the continuous integration problem because I am biased toward solving it by increasing integration pressure. Using the "bump-downstream" primitive and the cautiously optimistic algorithm described in the Continuous Delivery book's chapter on component architectures, the system can automatically increment all inter-jar dependencies in the graph while leaving applications alone. This is not a trivial problem, as any automated system at this scale needs good traceability and has to be able to recover automatically from incompatible changes.

Appendix: Other problems

Since this article has a humble title that makes pretenses to completeness, I wanted to list some other problems here, which I do not discuss in depth. These problems result from insufficient checks in the JVM and the compiler, which allow logic errors in dependency organization to go undetected. Feedback is delayed until a problem manifests itself as a hard-to-track bug.

Cycles in the graph

It is possible to have cycles in an application's dependency graph. The problems with dependency cycles have been well described, and even though the remedies are known, not even high-profile projects like logback are immune to them. The Java compiler and virtual machine do nothing to prevent cycles.
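Although neither javac nor the JVM checks for cycles, detecting them in a declared dependency graph is a standard depth-first search. A minimal sketch, assuming the graph is given as a map from each module to its direct dependencies; CycleCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Depth-first search for a cycle in a module dependency graph. */
final class CycleCheck {
    private enum Mark { IN_PROGRESS, DONE }

    /** Returns one cycle as a list whose first and last elements are equal, or an empty list. */
    static List<String> findCycle(Map<String, List<String>> deps) {
        Map<String, Mark> marks = new HashMap<>();
        List<String> path = new ArrayList<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, marks, path);
            if (!cycle.isEmpty()) return cycle;
        }
        return List.of();
    }

    private static List<String> visit(String node, Map<String, List<String>> deps,
                                      Map<String, Mark> marks, List<String> path) {
        Mark mark = marks.get(node);
        if (mark == Mark.DONE) return List.of();
        if (mark == Mark.IN_PROGRESS) {
            // Back edge: the cycle is the part of the current path that starts at this node.
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
            cycle.add(node);
            return cycle;
        }
        marks.put(node, Mark.IN_PROGRESS);
        path.add(node);
        for (String dep : deps.getOrDefault(node, List.of())) {
            List<String> cycle = visit(dep, deps, marks, path);
            if (!cycle.isEmpty()) return cycle;
        }
        path.remove(path.size() - 1);
        marks.put(node, Mark.DONE);
        return List.of();
    }
}
```

Run as part of the build, a check like this turns a cycle into an immediate failure instead of a latent design problem.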

Class name conflict

The Java package system discourages name conflicts but does not prevent them. Class org.foo.Bar can exist in more than one jar, and neither the compiler nor the JVM does anything to prevent this. Which class gets used depends on the runtime behavior of the app, typically the order in which jars appear on the classpath. One common scenario in the open source world is when a module's name or organization changes and the old versions continue to exist in public repositories under the old names. Another is when a framework requires certain classes by name, leaving the implementation of those classes to third parties. Some frameworks will complain at runtime if multiple bindings exist in that case. OSGi can help solve this problem, but at the cost of complicating the build. This problem exists in spring-struts 3.0.6.RELEASE (see my post on another site).
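Class name conflicts can be found by listing the .class entries of every jar on the classpath and grouping them by class name. The sketch below uses java.util.jar.JarFile for the listing; ClassConflicts and its methods are hypothetical names. A well-known instance of the framework case is SLF4J 1.x, which looks up org.slf4j.impl.StaticLoggerBinder by name, so two bindings on the classpath both provide that class.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Sketch of a classpath check for the same class name in more than one jar. */
final class ClassConflicts {
    /** Lists the class names (e.g. "org.foo.Bar") contained in one jar. */
    static List<String> classesIn(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            List<String> names = new ArrayList<>();
            for (Enumeration<JarEntry> e = jarFile.entries(); e.hasMoreElements(); ) {
                String entry = e.nextElement().getName();
                // Skip metadata such as module-info and multi-release copies under META-INF/versions.
                if (entry.endsWith(".class") && !entry.startsWith("META-INF/")
                        && !entry.endsWith("module-info.class")) {
                    names.add(entry.substring(0, entry.length() - ".class".length()).replace('/', '.'));
                }
            }
            return names;
        }
    }

    /** Maps each class name found in more than one jar to the jars that contain it. */
    static Map<String, List<String>> findConflicts(Map<String, ? extends Collection<String>> classesByJar) {
        Map<String, List<String>> jarsByClass = new TreeMap<>();
        classesByJar.forEach((jar, classes) -> {
            for (String className : classes) {
                jarsByClass.computeIfAbsent(className, k -> new ArrayList<>()).add(jar);
            }
        });
        jarsByClass.values().removeIf(jars -> jars.size() < 2);
        return jarsByClass;
    }
}
```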

Stale dependencies problem

Nothing prevents a module from declaring dependencies it does not need. While it is possible to point out which dependencies are not needed to compile the existing code, only a thorough test suite can assure that all needed dependencies are exercised. I am not familiar with a tool that would identify unneeded runtime dependencies. Having unnecessary dependencies in the graph makes it harder to draw correct conclusions from graph analysis.

Dependencies not specified at the correct level

The dependency resolution systems I am familiar with resolve the entire graph and add everything to the classpath before attempting to compile. This means that a dependency needed at compile time may be brought in transitively through a different module. If the next version of that module removes the dependency, the result is a surprising compile error that may be difficult to track down. While this problem is in principle solvable for compile-time dependencies, it is very hard both to solve and to debug for runtime dependencies.
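Both this problem and the stale dependencies problem above reduce to comparing two sets: the dependencies a module declares directly and the modules that own the classes its code actually references. A minimal sketch, assuming the referenced class names and a class-to-module index are already available (extracting them from bytecode is not shown); DependencyLevelCheck is a hypothetical name. As the text notes, anything loaded only at runtime, for example by reflection, is invisible to this kind of check.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of a compile-time dependency check for one module. */
final class DependencyLevelCheck {
    record Report(Set<String> declaredButUnused, Set<String> usedButUndeclared) {}

    /** Maps the classes the module's code references to the modules that own them. */
    static Set<String> modulesUsed(Collection<String> referencedClasses, Map<String, String> ownerByClass) {
        Set<String> used = new TreeSet<>();
        for (String className : referencedClasses) {
            String owner = ownerByClass.get(className);
            if (owner != null) {           // e.g. JDK classes have no owning module in this map
                used.add(owner);
            }
        }
        return used;
    }

    static Report analyze(Set<String> declared, Set<String> used) {
        Set<String> unused = new TreeSet<>(declared);
        unused.removeAll(used);            // stale: declared but never referenced by the code
        Set<String> undeclared = new TreeSet<>(used);
        undeclared.removeAll(declared);    // reachable only transitively through another module
        return new Report(unused, undeclared);
    }
}
```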

Tools

The Maven dependency plugin has an analyze goal (dependency:analyze) that reports unused declared and used undeclared dependencies, which addresses the stale and missing dependency problems. Tattletale is a straightforward tool that I have used to identify class name conflicts and circular dependencies.
