When modularity comes down to OSGi

Last week I attended a tech talk by Christian Schlichtherle, organized by Euro Staff in Berlin. The talk was about modularity and its different forms in software development. Christian talked about packages, libraries, APIs, dependency management and – as always when it comes down to modularity – about OSGi.

Every time somebody talks about OSGi, I get this tiny pain in my stomach. OSGi is a really great idea and a lot of people have been working with it for many years now. That’s cool! But I still have some questions about this…

Who wants to install code at run time on a server?

Let’s start with my biggest point of criticism. OSGi has two main selling points:

  • OSGi (maybe) solves the class path hell (we will come to this later)
  • OSGi introduces modularity at run time for Java software

But really – who builds software with a highly static language such as Java and then installs code at run time in his production environment!? Maybe I just don’t know the use case, but every company I have seen so far is looking for a stable server architecture. They want to deploy something in their test environment, then go to staging and finally do a blue-green deployment. They don’t want to throw in some new software and let it install itself. And if they did, I would wonder how they deal with open user sessions, with buggy updates which break the system, with security and so on. In my opinion there are only very few cases where you really need and want modularity on the server.

OSGi is solving the class path hell!

It’s often said that OSGi solves the class path hell. And whenever somebody says this, he comes up with the diamond problem:

Some package A needs packages B and C. B and C both need a package D, but in two different versions. And you are screwed!

OSGi solves this problem, because every bundle gets its own class loader and every Java class is identified not only by its name but also by its class loader. This makes it possible to load the same class in different versions for packages B and C. Problem solved!

But what happens when packages B and C return objects from package D? Let’s make a stupid example: package D contains a model: Person.class and Address.class. Packages B and C are different factory strategies for this model: package B reads data from a web service, package C from a local file. And package A uses both of them:
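
A minimal sketch of what package A could look like (all class and package names here are invented for illustration):

    package a;   // package A

    import b.WebServicePersonFactory;   // factory from package B (compiled against D version 1.x)
    import c.FilePersonFactory;         // factory from package C (compiled against D version 2.x)
    import d.Person;                    // model class from package D

    public class Client {

        public static void main(String[] args) {
            Person fromWebService = new WebServicePersonFactory().create();   // returns a D-1.x Person
            Person fromFile = new FilePersonFactory().create();               // returns a D-2.x Person

            // compiles fine, but in OSGi the two Person classes come from
            // two different class loaders
            System.out.println(fromWebService + " / " + fromFile);
        }
    }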

The problem with this code is that it looks perfectly fine and will compile, too. But it will fail at run time with a ClassCastException! Why? Because Person.class seen by package B was loaded with a different class loader than Person.class seen by package C. So for Java, those are two different classes!

To solve this problem in OSGi, you would write a service with an interface in packages B and C, describe it in XML and wire it in package A. You would expose those services explicitly and create a bunch of XML. Then you would package everything with an OSGi compatible manifest, assign a version number to each bundle, re-pack third party libraries as OSGi bundles too, define a start-up order for the bundles, load an OSGi runtime environment and then – a couple of hours later – you really have solved the diamond problem.

IMHO

I think solving the class path hell with OSGi means introducing a class loader and bundle hell instead. In pure Java it’s simple: if something is on the class path, you can use it. In OSGi, this is more complicated, and if you fail to strictly use OSGi services to decouple your software, you will end up with a highly coupled bunch of bundles which might throw ClassCastExceptions, because you still cannot use version 1.5 and version 2 of a class at the same time.

Best regards,
Thomas

Why using Spring’s @Value annotation is bad

@Value

Configuration is an important topic for every application which has more than a couple of hundred lines of code. If you are using Spring, you would typically use Spring’s @Value annotation to load values from a Java properties file. This could look like this:
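
For example, something like the following minimal sketch (the service class is made up, the property key is the one referred to below):

    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.stereotype.Service;

    @Service
    public class SomeService {

        // Spring injects the value of this key from the properties file
        @Value("${my.config.property.key}")
        private String someConfigValue;

        public void doSomething() {
            // ... use someConfigValue ...
        }
    }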

If you do so, Spring would inject the value for the key my.config.property.key from your properties file right into your class and you are done. Easy! Maybe too easy…

What will happen…

What will happen when you use this technique throughout your application? Well, whenever you need some property, you will inject it. You will inject it in services, controllers and components, and you will probably inject the same property in different classes. That’s easy and absolutely what the mechanism is about.

However, this means nothing else than scattering your configuration across your whole application. Every class can pick a couple of properties to use. You will have no idea or control over which class is using which properties. You will end up doing a full-text search on your project to find out where a single key is used. You will have a lot of fun if you want to rename one of those keys, or when you need to set each and every property for each and every class in your unit tests.

Configuration is a service

Instead, configuration should be an encapsulated service of your application, just like any other functionality. You encapsulate persistence in DAOs, you encapsulate REST services in controllers and you encapsulate security in authenticators. So why not encapsulate configuration?

If you do so, you will have a single point of responsibility to load and get your configuration from. As with any other service, you can easily change the implementation. Maybe you don’t want to load your properties from a properties file, but from a database or web service! If you encapsulate your configuration in a service, that’s not a big deal. You just change the implementation. You can also write some unit tests for your configuration (Why not? Configuration can become complicated when properties are combined or a certain configuration is determined based on a couple of other properties!) or you can do sanity checks for properties in the configuration service. You can do logging, you can do security (maybe the path to the password file shouldn’t be visible everywhere, right?), you can do caching, you can do… well, you get the point.
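
Such a configuration service could look roughly like this (a sketch; class and property names are made up):

    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.stereotype.Service;

    @Service
    public class Configuration {

        @Value("${my.config.mail.server}")
        private String mailServer;

        @Value("${my.config.mail.port}")
        private int mailPort;

        // the single point where the mail server address comes from -
        // easy to find, easy to log, easy to mock in a unit test
        public String getMailServer() {
            return mailServer;
        }

        public int getMailPort() {
            return mailPort;
        }
    }

Other services then inject this Configuration bean instead of picking single @Value properties themselves.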

Best regards,
Thomas

Thread pools with Java’s ExecutorService

Some days ago I had to do a migration of legacy data to a new system. The task was mainly about copying stuff from the old system to the new one. This was done via a REST interface of the new system. So the task was completely I/O bound and most of the time the migration script was waiting for the response of a REST call. A perfect situation to do some threading!

The idea was to start a new thread for every data record to migrate. So while the first record was already waiting for its REST response, the second record would be built and sent to the REST service, and so on. This would hopefully speed up the process a lot.

However, since I had about 5,000 records, I didn’t want to spawn a new thread for each of those. 5,000 threads (at the same time) would be hard to handle – for my own system as well as for the REST service. So I decided to use Java’s ExecutorService to run only a fixed number of threads at a time:
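
A simplified sketch of what this looked like (migrate() stands for building a record and calling the REST service):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class Migration {

        public static void main(String[] args) throws InterruptedException {
            // a thread pool with a fixed size of 10 threads
            ExecutorService executor = Executors.newFixedThreadPool(10);

            // submit 100 tasks - only 10 of them will run at the same time
            for (int i = 0; i < 100; i++) {
                final int record = i;
                executor.execute(() -> migrate(record));
            }

            // no new tasks can be submitted, wait until all submitted tasks are done
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.HOURS);
        }

        private static void migrate(int record) {
            // placeholder for building the record and calling the REST service
        }
    }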

In the example above I do the following:

  1. I create an ExecutorService with a fixed pool size of 10 threads. So at any time, a maximum of 10 threads will run!
  2. I submit 100 tasks. However, if 10 tasks are already running, the ExecutorService will wait until one of them has finished before starting the next one.
  3. I call shutdown() and awaitTermination(...) on the ExecutorService, which makes it impossible to submit new tasks (the 100 tasks submitted before will still run) and blocks until all 100 tasks have finished.

This pattern makes it possible to submit a large number of tasks for execution, but only run a fixed number of threads in a pool.

Best regards,
Thomas

Anti Patterns

A couple of days ago I went to a tech talk by eBay in Berlin. The talk was held by a developer of the eBay Kleinanzeigen team (a subsidiary of eBay in Germany) who talked about development patterns his team has learned over the years. He mentioned behavioral patterns, design patterns and coding patterns he and his colleagues discovered as good practices for their projects. Since nearly everything can be seen as a pattern, the talk was a mix of lessons learned and actual advice on how to structure code. Cool! It is always good to talk and hear about coding experiences, and it also reminded me of something which is often more important than patterns: anti patterns!

The idea of anti patterns always comes to my mind when I’m reading some old code. And it comes up even more often if the code is super sophisticated and fancy. The more people try to over-engineer their code, the more they build things which look like some complicated pattern but are actually anti patterns.

Would you like some examples? Here’s a very trivial anti pattern, not a big thing. It’s just a bad habit. It looks like this:
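
A made-up example of such a class (Person is just some arbitrary model class):

    public class PersonFactory {

        // the "safety" constructor - prevents instantiating a class
        // that nobody would ever instantiate anyway
        private PersonFactory() {
        }

        public static Person create(String name) {
            return new Person(name);
        }
    }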

Anything wrong here? No, just a factory. Anything useless? Yes, a little bit. It is very common for a class with only static methods (like a factory) to have a private constructor. Why? Because the developer thought it would be evil to instantiate it and he must prevent that for the safety of everybody in the room! So he creates something which looks like a singleton (but isn’t). He does this in every suitable class, because it’s his pattern. But it is actually nothing. What would happen if you instantiated such an object? Nothing. And if you called the static methods on that instance? Nothing either. It wouldn’t look good, but it would work and your IDE would warn you. So what problem does the private constructor solve? There is none. It just looks like some useful pattern, but actually does nothing but clutter your code. An anti pattern.

There are more of those small bad habits (like using final everywhere) but also bigger ones (like an interface for everything). I’m going to think about this a little more and come up with another example!

Best regards,
Thomas

Making a Spring bean session scoped

Spring can not only inject beans (components, services, entities, however you want to call them), it can also inject them according to a certain scope. This is great if you have stateful objects which belong (for example) to a dedicated user and not to the whole application. To do that, Spring introduced the @Scope annotation.

The usage of @Scope is straightforward: just annotate your bean with it and specify the scope to use:
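
For example (ShoppingCart is just a made-up example of a bean that holds per-user state):

    import org.springframework.context.annotation.Scope;
    import org.springframework.stereotype.Component;

    @Component
    @Scope("session")   // one instance per web session
    public class ShoppingCart {
        // state that belongs to a single user's session
    }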

The value of @Scope can be one of singleton (one single instance for the whole app), prototype (a new instance for every injection), request (an instance per web request) or session (an instance per web session). See here.

The last two scopes (request and session) are only available in Spring web applications where the current session is exposed. To expose it, you need to add a listener to your web.xml which makes the request context available:
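
The listener is Spring’s RequestContextListener; the web.xml entry looks roughly like this:

    <listener>
        <listener-class>
            org.springframework.web.context.request.RequestContextListener
        </listener-class>
    </listener>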

If you have both, the annotated bean and the listener in your web.xml, you will typically face another problem: how do you use a session scoped bean in a normal (not session scoped) service?

To solve this problem, Spring provides a proxy mechanism. This means Spring can wire a proxy with the same interface as your actual bean instead of the bean itself. The proxy will not be session scoped (it will be a singleton), but it will fetch the corresponding session scoped bean for each request. This means you can wire your session scoped bean into a not session scoped service and the proxy will do the rest behind the scenes.

To use such a proxy, you can just add it to the @Scope annotation like this:
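
For example, continuing the made-up ShoppingCart bean from above (use ScopedProxyMode.INTERFACES instead if your bean implements an interface):

    import org.springframework.context.annotation.Scope;
    import org.springframework.context.annotation.ScopedProxyMode;
    import org.springframework.stereotype.Component;

    @Component
    @Scope(value = "session", proxyMode = ScopedProxyMode.TARGET_CLASS)
    public class ShoppingCart {
        // the injected object is a proxy; each call is delegated
        // to the bean of the current session
    }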

If you do so, you will soon discover the next problem: what about your integration tests, which will probably fail now because they have no session? The easiest fix for this is to add the following bean configuration, which provides a session context for each thread (see the post by Tarun Sapra for the original description):
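
A sketch of that configuration, following Tarun Sapra’s approach: Spring’s SimpleThreadScope is registered under the name “session”, so every test thread gets its own “session”:

    <bean class="org.springframework.beans.factory.config.CustomScopeConfigurer">
        <property name="scopes">
            <map>
                <entry key="session">
                    <bean class="org.springframework.context.support.SimpleThreadScope"/>
                </entry>
            </map>
        </property>
    </bean>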

Best regards,
Thomas

A Windows SSO (for Java on client and server)

A couple of months ago I worked on a single sign-on (SSO) for a Windows client and server made in Java. The scenario was the following:

  • A client made with Java running on Windows
  • A server made with Java running on Windows
  • Both were logged in to the same domain (an Active Directory LDAP)

The question was how the server could get the identity (the name of the Windows account) of the client and – of course – how it could trust this information. If the client just sent a name (e.g. from Java’s System.getProperty("user.name") method), it could send anything.

The solution to this dilemma (trusting what the client sends to you) is to use a (so called) trusted third party. A trusted third party is an instance which both client and server know and trust. The client authenticates itself to this party and the server can verify requests against it. In the scenario above, the domain of the company (an Active Directory LDAP) is the trusted third party. Each client identifies itself against this domain when it logs in to Windows. Its Windows username and password are checked by the domain/LDAP. On the other side, the server also has access to the domain controller and can verify information sent by the client.

The nice thing about this is that the Windows domain is already configured on nearly every machine in a company. Every company bigger than maybe five people will have a Windows domain to log in to. Therefore, an SSO based on the Windows domain will work right out of the box in most cases, and we don’t need any configuration in our Java code, since everything is already configured in Windows.

Java Native Access (JNA)

To use Windows and the domain controller for authentication, we can use native Windows APIs. To call those APIs in Java, we can use the Java Native Access (JNA) library, which you can find on GitHub at https://github.com/twall/jna and on Maven central:
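
The Maven coordinates look like this (the version is just the one current at the time of writing; the Windows-specific classes live in the jna-platform artifact):

    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.1.0</version>
    </dependency>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna-platform</artifactId>
        <version>4.1.0</version>
    </dependency>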

For example, to get all user groups of the current user, you would do:
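
A small sketch using JNA’s Advapi32Util:

    import com.sun.jna.platform.win32.Advapi32Util;
    import com.sun.jna.platform.win32.Advapi32Util.Account;

    public class ShowGroups {

        public static void main(String[] args) {
            // asks Windows for the current user and his groups
            System.out.println(Advapi32Util.getUserName());
            for (Account group : Advapi32Util.getCurrentUserGroups()) {
                System.out.println(group.fqn);   // e.g. MYDOMAIN\Some Group
            }
        }
    }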

Waffle

On top of JNA there is a library called Waffle which encapsulates all the functionality you need to implement user authentication. You can find it on GitHub at https://github.com/dblock/waffle and also on Maven central:
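
The Maven coordinates at the time of writing looked like this (the project has since moved, so group id and version may differ for newer releases):

    <dependency>
        <groupId>com.github.dblock.waffle</groupId>
        <artifactId>waffle-jna</artifactId>
        <version>1.7</version>
    </dependency>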

You can use Waffle to create a token on the client, send it to the server (e.g. over HTTP or whatever) and to validate that token on the server. At the end of this process (create, send and validate) you will know on the server who the client is – for sure!

Here is an example of how to identify a client on the server. Note that this piece of code is executed completely on one machine. However, you could easily split it into two parts, one on the client and one on the server. The only thing you would need to do is to exchange the byte[] tokens between client and server. I commented the appropriate lines of code.
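
The following is a condensed sketch along the lines of the example in Waffle’s documentation (exact class and method names differ slightly between Waffle versions, so treat it as an outline rather than copy-and-paste code):

    import com.sun.jna.platform.win32.Sspi;
    import com.sun.jna.platform.win32.Sspi.SecBufferDesc;

    import waffle.windows.auth.IWindowsCredentialsHandle;
    import waffle.windows.auth.IWindowsSecurityContext;
    import waffle.windows.auth.impl.WindowsAccountImpl;
    import waffle.windows.auth.impl.WindowsAuthProviderImpl;
    import waffle.windows.auth.impl.WindowsCredentialsHandleImpl;
    import waffle.windows.auth.impl.WindowsSecurityContextImpl;

    public class SsoHandshake {

        public static void main(String[] args) {
            final String securityPackage = "Negotiate";

            // CLIENT: take the credentials of the currently logged-in Windows user
            IWindowsCredentialsHandle clientCredentials =
                    WindowsCredentialsHandleImpl.getCurrent(securityPackage);
            clientCredentials.initialize();

            // CLIENT: create the initial security context (this produces the first token)
            WindowsSecurityContextImpl clientContext = new WindowsSecurityContextImpl();
            clientContext.setPrincipalName(WindowsAccountImpl.getCurrentUsername());
            clientContext.setCredentialsHandle(clientCredentials);   // older versions take clientCredentials.getHandle()
            clientContext.setSecurityPackage(securityPackage);
            clientContext.initialize(null, null, WindowsAccountImpl.getCurrentUsername());

            // SERVER: the provider validates the received tokens against the Windows domain
            WindowsAuthProviderImpl provider = new WindowsAuthProviderImpl();
            IWindowsSecurityContext serverContext = null;

            do {
                if (serverContext != null) {
                    // CLIENT: answer the continuation token sent back by the server
                    SecBufferDesc continueToken =
                            new SecBufferDesc(Sspi.SECBUFFER_TOKEN, serverContext.getToken());
                    clientContext.initialize(clientContext.getHandle(), continueToken,
                            WindowsAccountImpl.getCurrentUsername());
                }
                // SERVER: accept the client's token ("client-connection" identifies the handshake state)
                serverContext = provider.acceptSecurityToken(
                        "client-connection", clientContext.getToken(), securityPackage);
            } while (clientContext.isContinue() || serverContext.isContinue());

            // SERVER: at this point the client's identity is verified by Windows
            System.out.println(serverContext.getIdentity().getFqn());
        }
    }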

(By the way, I asked about this myself on Stack Overflow some time ago.)

The only thing that is a little bit complicated about this solution is that you need to do a small handshake between client and server. The client sends a token to the server, which responds with another token, which the client needs to answer again to get the final “you are authenticated” token from the server. To do this, you need to hold some state on the server for the duration of the handshake. Since the handshake is done in a second or two, I just used a limited cache from Google’s Guava library to hold maybe 100 client contexts on the server.

The exchanged tokens are validated against the underlying Windows and its domain.

Best regards,
Thomas

DeployMan (command line tool to deploy Docker images to AWS)

DeployMan

Yesterday, I published a tool called DeployMan on GitHub. DeployMan is a command line tool to deploy Docker images to AWS and was the software prototype for my master thesis. I wrote my thesis at Informatica in Stuttgart-Weilimdorf, so first of all, I want to say thank you to Thomas Kasemir for the opportunity to put this online!

Disclaimer

At the time I am writing this post, DeployMan is a pure prototype. It was created for academic research and as a demo for my thesis. It is not ready for production. If you need a solid tool to deploy Docker images (to AWS), have a look at Puppet, CloudFormation (for AWS), Terraform, Vagrant, fig (for Docker) or any other orchestration tool that has come up in the last couple of years.

What DeployMan does

DeployMan can create new AWS EC2 instances and deploy a predefined stack of Docker images on them. To do so, DeployMan takes a configuration file called a formation. A formation specifies what the EC2 machine should look like and which Docker images (and which configurations) should be deployed. Docker images can either be deployed from a Docker registry (the public one or a private one) or as tarballs from S3 storage. Together with each image, a configuration folder is pulled from S3 and mounted into the running container.

Here is an example of a formation which deploys an Nginx server with a static HTML page:

Interfaces

DeployMan provides a command line interface to start instances and do some basic monitoring of the deployment process. Here is a screenshot which shows some formations (which can be started) and the output of a started Logstash server:

To keep track of the deployment process in a more pleasant way, DeployMan has a web interface. The web interface shows details about the machines, such as the deployed images and which containers are running. Here is what a Logstash server would look like:

The project

You can find the project on GitHub at https://github.com/tuhrig/DeployMan. I wrote a detailed README.md which explains how to build and use DeployMan. To test DeployMan, you need an AWS account (there are also free accounts).

The project is made with Java 8, Maven, the AWS Java API, the Docker Java API and a lot of small stuff like Apache Commons. The web interface is based on Spark (for the server), Google’s AngularJS and Twitter’s Bootstrap CSS.

Best regards,
Thomas

JAXB vs. GSON and Jackson

XML vs. JSON

The discussion about XML and JSON is as old as the internet itself. If you Google XML vs. JSON you will get about 2.2 million results. And here is mine – a comparison of the parsing speed of JAXB, GSON and Jackson.

XML

XML holds data between named nodes which can also have attributes. Each node can hold child nodes or some data. A typical XML file could look like this:
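
For example (made-up data; the same data is used for the JSON example below):

    <?xml version="1.0" encoding="UTF-8"?>
    <course name="Software Engineering">
        <students>
            <student name="Thomas" age="25"/>
            <student name="Christian" age="31"/>
        </students>
        <topics>
            <topic name="Modularity"/>
            <topic name="OSGi"/>
        </topics>
    </course>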

XML itself is pretty simple, but comes with a totally over-engineered ecosystem. There exist different types of validation schemas, query languages, transformation languages and weird dialects. If you had an XML course during your studies, the most difficult part was to instantiate the XML parser in Java (File > DocumentBuilderFactory > DocumentBuilder > Document > Element).

JSON

JSON holds data as JavaScript objects. It has two types of notations: lists enclosed in [...] and objects enclosed in {...}. The document from above could look like this:
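
The same (made-up) course as JSON:

    {
        "name": "Software Engineering",
        "students": [
            { "name": "Thomas", "age": 25 },
            { "name": "Christian", "age": 31 }
        ],
        "topics": [
            { "name": "Modularity" },
            { "name": "OSGi" }
        ]
    }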

Compared to XML, JSON is leaner and more lightweight. The format contains less clutter and there are not as many tools for it as for XML (e.g. no query or transformation languages).

Parsing Speed Test

Since JSON is a leaner format, I was wondering if a JSON file could be parsed faster than an equivalent XML file. In the end, both formats contain the same data in a hierarchical representation of nodes and elements. So is there any difference?

Test data

I created some simple test data: a class called Course holds a list of Student objects and Topic objects. Each object has some properties such as a name. Before each test, I created a new Course object filled with random values. I ran my tests with 200, 2,000, 20,000, 100,000 and 200,000 students/topics and repeated each test 500 times.
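
The serialization itself boils down to calls like these (a simplified sketch; Course is the test class described above):

    import java.io.StringWriter;
    import javax.xml.bind.JAXBContext;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.google.gson.Gson;

    public class WriteTest {

        public static void write(Course course) throws Exception {
            // JAXB (XML) - Course needs an @XmlRootElement annotation
            StringWriter xml = new StringWriter();
            JAXBContext.newInstance(Course.class).createMarshaller().marshal(course, xml);

            // GSON (JSON)
            String gsonJson = new Gson().toJson(course);

            // Jackson (JSON)
            String jacksonJson = new ObjectMapper().writeValueAsString(course);
        }
    }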

Candidates

Library Format Description
JAXB XML The official Java Architecture for XML Binding (on Maven)
GSON JSON A JSON library by Google (on Maven)
Jackson JSON An open source JSON library (on Maven)

JSON writes faster, a little

The first result of my tests was that Jackson (JSON) writes data a little bit faster than JAXB (XML) and GSON (JSON). The difference is not that big.

JSON reads faster, a lot

More interesting was the fact that both JSON implementations (Jackson and GSON) read data much faster than JAXB.

JSON files are smaller

OK, this one is a no-brainer, but JSON files are (of course) smaller than XML files. In my example, the JSON file was about 68% of the size of the corresponding XML file. However, this highly depends on your data. If the data inside the nodes is large, the overhead of the XML tags is not as significant as for very small data (e.g. <number>5</number>). The file generated by GSON has, of course, the same size as the file generated by Jackson.

Best regards,
Thomas

Docker vs. Heroku

For a couple of weeks now I have been working with Docker as an application container for Amazon EC2. Despite my eternal fight with the Docker registry, I am absolutely amazed by Docker and have enjoyed the experience.

But sometimes it is hard to explain what Docker is and what it has to do with all these cloud, PaaS and scalability topics. So I thought a little bit about the similar concepts of Docker and Heroku – maybe the most popular PaaS provider. But let’s start with a small…

Disclaimer

Docker and Heroku may have similar concepts (as you will see below), but they are two completely different things: while Docker is an open source software project, Heroku is a commercial service provider. You can download, build and install Docker on your own laptop or participate in its online community. On Heroku, you can create a user account, pay some money (maybe) and get a really great service and hosting experience for your applications and code. So obviously, Docker and Heroku are very different things. But some of their core concepts have at least some similarities.

Docker vs. Heroku

Docker Heroku
Dockerfile BuildPack
Image Slug
Container Dyno
Index Add-Ons
CLI CLI

Docker and Heroku have a lot of similarities, especially in their core concepts. This makes Docker interesting for people who are looking for an alternative to Heroku – maybe on their own infrastructure.

Dockerfile vs. BuildPack

Docker images can be built with a Dockerfile. A Dockerfile is a set of commands, e.g. to add files and folders or to install packages. It defines what the final image should look like. Here is an example of a Dockerfile which installs memcached (similar to the example on the official website):
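
A minimal sketch of such a Dockerfile (close to, but not literally, the documentation example):

    # start from an official Ubuntu base image
    FROM ubuntu

    # install memcached
    RUN apt-get update
    RUN apt-get install -y memcached

    # run memcached as the daemon user and expose the default port
    CMD ["memcached", "-u", "daemon"]
    EXPOSE 11211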

Heroku’s counterpart is the so-called BuildPack. BuildPacks are also sets of scripts which are used to set up the final state of an image. Heroku comes with a couple of default BuildPacks, e.g. for Java, Python or the Play! framework. But you can also write your own. Here’s a snippet of the Heroku BuildPack for Java apps:

BTW, there are even projects to enable the usage of Heroku’s BuildPacks for Docker images (like this).

Image vs. Slug

When you build a Dockerfile, you get a Docker image. Such an image contains all data, files, dependencies and settings you need for your application. You can exchange those images and start them right away on any machine with Docker installed.

When you run a build on Heroku, the BuildPack creates a so-called Slug. Those slugs are “compressed and pre-packaged copies of your application”, as Heroku says. Similar to Docker’s images, they contain all dependencies and can be deployed and started in a very short time.

Container vs. Dyno

After starting a Docker image, you have a running container of this image. You can start an image multiple times to get multiple isolated containers of the same application. This enables you to build an image once and easily start multiple instances of it.

Heroku does the very same. After you build your app with your BuildPack, you get a slug which you can run on a Dyno. Such a dyno is “a lightweight container running a single user-specified command” as Heroku describes it.

Heroku even uses LXC for virtualization of their containers (dynos), which is the same technology Docker uses at its core.

Index vs. Add Ons

Docker images can be shared with the community. This is possible by uploading them to the official Docker index. All images on this index can be downloaded and used by everyone. Most of them are documented very well and can be started with a single command. This makes it possible to run a lot of applications as building blocks. Here’s an example of how to run Elasticsearch:
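
For example, with a single command like this (the image name is just an example of a popular Elasticsearch image at the time):

    docker run -d -p 9200:9200 -p 9300:9300 dockerfile/elasticsearch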

A similar concept applies to Heroku’s add-on market. You can use (or buy) different pre-configured add-ons for your application (e.g. for Elasticsearch). This makes it possible to build a complex app out of common building blocks – just as Docker does!

So both Docker’s index and Heroku’s add-ons encourage a service-oriented way of developing applications and reusing components.

CLI

Although the four points mentioned before are the most important concepts of both, Docker and Heroku have one more thing in common: both have a powerful command line interface which allows you to manage containers. E.g. you can run heroku ps to see all your running dynos or docker ps to see all your running containers, or you can request the logs of a certain container.

Best regards,
Thomas

Development speed, the Docker remote API and a pattern of frustration

One of the challenges Docker is facing right now is its own development speed. Since its initial release in January 2013, there have been over 7,000 commits (in one year!) by more than 400 contributors. There are more than 1,800 forks on GitHub and Docker puts out approximately one new release per month. Docker is developing at a very fast pace right now and this is really great to see!

However, this very high development speed leaves a lot of third-party tools behind. If you develop a tool for Docker, you have to keep a very high pace. If not, your tool is outdated within a month.

Docker remote API client libraries

A good example of how this development speed affects projects are the remote API libraries for Docker. Docker offers a JSON API to access Docker in a programmatic way. It enables you, for example, to list all running containers and to stop a specific one. All via JSON and HTTP requests.

To use this JSON API in a convenient way, people have created bindings for their favorite programming languages. As you can see below, there exist bindings for JavaScript, Ruby, Java and many more. I have used some of them myself and I am really thankful for the great work their developers have done!

But many of those libraries are outdated at the time I am writing this. To be exact: all of them are outdated! The current remote API version of Docker is v1.11 (see here for more), which none of the remote API libraries supports right now. Many of them don’t even support v1.10 or v1.9.

Here is the list of remote API tools as you find it at http://docs.docker.io/reference/api/remote_api_client_libraries/.

Language Name Remote API
Python docker-py v1.9
Ruby docker-api v1.10
JavaScript (NodeJS) dockerode v1.10
JavaScript (NodeJS) docker.io v1.7
JavaScript (Angular) WebUI dockerui v1.8
Java docker-java v1.8
Erlang erldocker v1.4
Go dockerclient v1.10
PHP Docker-PHP v1.9
Scala reactive-docker v1.10

How to deal with rapidly evolving APIs

How to deal with rapidly evolving APIs is a difficult question and IMHO Docker made the right decision. By solely providing a JSON API, Docker chose a modern and universal technique. A JSON API can be used in any language or even in a web browser. JSON (together with a RESTful API) is the state-of-the-art technique to interact with services. Docker even leaves the possibility to fall back to an old API version by adding a version identifier to the request. Well done.

But the decision to stay “universal” (by solely providing a JSON API) also means not getting specific. Getting specific (which means using Docker in a certain programming language) is left to the developers of third party tools. These tools are also evolving rapidly right now, no matter whether they are remote API bindings, deployment tools (like Deis.io) or hosting solutions (like CoreOS). This enriches the Docker ecosystem and makes the project even more interesting.

Bad third party tools will reflect back on you

The problem is that even if Docker did a good job (which they did!), outdated or poorly implemented third party tools will reflect back on Docker, too. If you use a third party library (which you maybe found via the official website) and it works fine, you will be happy with Docker and the third party library. But if the library doesn’t work next month, because you updated Docker and the library doesn’t take care of the API version, you will be frustrated with the tool and with Docker.

Pattern of frustration

This pattern of frustration occurs a lot in software development. Bad libraries cause frustration with the underlying tool itself. Let’s take Java as an example. A lot of people complain that Java is verbose, uses class explosions as a pattern and makes things much more complicated than they should be. The famous AbstractSingletonProxyFactoryBean class of the Spring framework is just one such example (see +Paul Lewis). Another example is reading a file in Java, which used to be an awful pain:
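
A sketch of the pre-Java-7 way of reading a file into a String:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    public class OldSchoolFileReading {

        public static String read(String path) throws IOException {
            // File -> FileReader -> BufferedReader -> StringBuilder, just to get a String
            StringBuilder content = new StringBuilder();
            BufferedReader reader = new BufferedReader(new FileReader(new File(path)));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    content.append(line).append("\n");
                }
            } finally {
                reader.close();
            }
            return content.toString();
        }
    }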

And even the new NIO API which came with Java 7 is not as easy as it could be:
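
A sketch with the Java 7 NIO API:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class Nio2FileReading {

        public static String read(String path) throws IOException {
            // String -> Path -> static method -> byte[] -> String
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        }
    }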

You need to turn a String into a Path to pass it to a static method whose output you need to turn into a String again. Great idea! But what about something like this:
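
For example a one-liner in that spirit, shown here with Apache Commons IO (only as an illustration of the kind of API one would wish for; the snippet originally shown at this point is not preserved):

    import java.io.File;
    import java.io.IOException;
    import org.apache.commons.io.FileUtils;

    public class SimpleFileReading {

        public static String read(String path) throws IOException {
            // one call, no wrapping of readers and builders
            return FileUtils.readFileToString(new File(path), "UTF-8");
        }
    }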

However, it is not the fault of Java, but of a poorly implemented library: if you need to put a File into a FileReader, which you need to put into a BufferedReader, just to read a file line by line into a StringBuilder, you are using a terrible I/O library! But anyway, you will be frustrated with Java and how verbose it is (and maybe also with the API itself).

This pattern applies to many other things: you are angry at your smartphone because of a poorly coded app. You are angry at Eclipse because it crashes with a newly installed plugin. And so on…

I hope this pattern of frustration will not apply to Docker and that the community will develop a stable ecosystem of tools which provides a solid basis for development and deployment with Docker. A tool like Docker lives through its ecosystem. If the tools are buggy or outdated, people will be frustrated with Docker – and that would be a shame, because Docker is really great!

Best regards,
Thomas