The thesis writing toolbox

I spent the last weeks (or months?) at my desk writing my Master thesis about OSGi, PaaS and Docker. After my Bachelor thesis two years ago, this was my second big “literary work”. And like any computer scientist, I love tools! So here is the toolbox I used to write my thesis.

LyX/LaTeX

[Screenshot: the LyX editor]

I did most of my work with LyX, a WYSIWYM (what you see is what you mean) editor for LaTeX. LaTeX is a document markup language written by Leslie Lamport in 1984 for the typesetting system TeX (by Donald E. Knuth). So what does this mean? It means that TeX is the foundation for LaTeX (and LyX). It gives you the ability to set text on paper, print words in italics or leave space between two lines. But it is pretty raw and hard to handle. LaTeX is a collection of useful functions (called macros) that make working with TeX more pleasant. Instead of marking a certain word in bold and underlining it, you can just say it is a headline and LaTeX will do the rest for you. A simple LaTeX document could look something like this:
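    % a minimal example document
    \documentclass{article}

    \begin{document}

    \section{Introduction}
    This is a simple \LaTeX{} document. Some words are printed \textbf{bold},
    others are \emph{italic}.

    \end{document}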

By choosing the document class article, LaTeX will automatically render your text on A4 portrait paper with a 10 pt font size and so on. You do not have to worry about the layout, just about the content. This is how the above code looks as a PDF:

[Screenshot: the compiled PDF in TeXworks]

The main difference between writing your document in LaTeX and writing it in (e.g.) Microsoft Word is that you do not mix content and styling. If you write with Microsoft Word, you always see your document as the final result (e.g. a PDF) will look. The Microsoft Word document looks the same as the PDF. This principle is called WYSIWYG (what you see is what you get). In LaTeX however, you only see your raw content until you press the compile button to generate a PDF. The styling is separated from the content and only applied in the final step when generating the PDF. This is useful, because you do not have to fight with your formatting all the time – but you have to write some pretty ugly markup.

This is where LyX comes into play. LyX is an editor to work with LaTeX without writing a single line of code. It follows the WYSIWYM (what you see is what you mean) principle. This means you see your document not in its final form, but as you mean it: headlines will be big, bold words will be bold and italic words will be italic. Just as you mean it. The final styling, however, comes at the end when generating the PDF. The LyX screenshot from above looks like this as a PDF:

[Screenshot: the thesis PDF in Adobe Reader]

JabRef

An important part of every thesis is literature. To organize my literature I use a tool called JabRef. Its website looks very ugly, but the tool itself is really cool. It lets you build a library of all the books and articles you want to use as references. This is my personal library for my Master thesis (with 53 entries in total):

[Screenshot: my literature library in JabRef]

JabRef generates and organizes a so-called BibTeX file. This is a simple text file in which every book or article has its own entry. A typical entry looks something like this:
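    @book{knuth1984,
      author    = {Donald E. Knuth},
      title     = {The {\TeX}book},
      publisher = {Addison-Wesley},
      year      = {1984}
    }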

Every book or article I read gets its own entry in this file. I can create new entries with JabRef itself or with generators like www.literatur-generator.de. I can link my library to my LyX document, so every time I want to make a new reference or citation, I just open a dialog and pick the corresponding entry:

[Screenshot: the citation dialog in LyX]

LyX will automatically create the reference/citation and the corresponding entry in the bibliography:

[Screenshot: the bibliography in the generated PDF]

latexdiff

When you write a large document such as a thesis, you will probably make mistakes. Your university will want some changes and you will improve your document until its final version. However, a lot of people will not read it twice. Instead, they read it once, give you some advice and then only want to see the changes. With latexdiff you can compare two LaTeX documents and create a document visualizing the changes. Here is an example from the PDF page shown above:

[Screenshot: the latexdiff output PDF]

As you can see, I changed a word, inserted a comma and corrected a typo.
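Generating such a diff is a two-step job: let latexdiff compare the two versions and then compile the resulting file as usual. A minimal sketch, assuming the old and new versions live in plain .tex files:

    latexdiff thesis_old.tex thesis_new.tex > diff.tex
    pdflatex diff.tex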

A great tutorial about latexdiff can be found at https://www.sharelatex.com/blog/2013/02/16/using-latexdiff-for-marking-changes-to-tex-documents.html.

Google Drive

To make graphics I use Google Drive. It is free, it is online and it is very easy. The feature I like most about Google Drive is the grid: you can align objects to each other so your drawings look straight.

[Screenshot: a drawing in Google Drawings]

Dropbox

If you lose your thesis you are screwed! Since LyX/LaTeX documents are pure text, you could easily put them into a version control system such as Git or SVN. However, you will probably also use some graphics, maybe some other PDFs and things like that. To organize all of this, I simply use Dropbox. Not only does Dropbox save your files for you, it also has a history. So you can easily restore a previous version of your document:

[Screenshot: the revision history in Dropbox]

eLyXer – HTML export


eLyXer is a tool to export LyX documents as HTML. Although LyX is primarily meant to create PDF documents, it is nice to have an HTML version too. eLyXer is already built into LyX, so you can export your document with a few clicks:

[Screenshot: the HTML export dialog in LyX]

Here is the CSS I used to style the HTML output:

The result looks like this:

[Screenshot: the HTML version of my thesis in the browser]

The thesis writing toolbox

LyX: A WYSIWYM editor for LaTeX to write documents without a single line of code.
JabRef: An organizer for literature.
latexdiff: Compares two LaTeX documents and generates a visualization of their differences.
eLyXer: Exports LyX documents to HTML.

Best regards,
Thomas

JAXB vs. GSON and Jackson

XML vs. JSON

The discussion about XML and JSON is as old as the internet itself. If you Google XML vs. JSON you will get about 2.2 million results. And here is mine – a comparison of the parsing speed of JAXB, GSON and Jackson.

XML

XML holds data between named nodes which can also have attributes. Each node can hold child nodes or some data. A typical XML file could look something like this:
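    <course name="Distributed Systems">
      <students>
        <student>
          <name>John</name>
          <age>25</age>
        </student>
        <student>
          <name>Mary</name>
          <age>24</age>
        </student>
      </students>
      <topics>
        <topic>OSGi</topic>
        <topic>Docker</topic>
      </topics>
    </course>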

XML itself is pretty simple, but it comes with a totally over-engineered ecosystem. There exist different kinds of validation schemas, query languages, transformation languages and weird dialects. If you had an XML course during your studies, the most difficult part was instantiating the XML parser in Java (File > DocumentBuilderFactory > DocumentBuilder > Document > Element).

JSON

JSON holds data as JavaScript objects. It has two types of notation: lists enclosed in [...] and objects enclosed in {...}. The document from above could look like this:
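    {
      "name": "Distributed Systems",
      "students": [
        { "name": "John", "age": 25 },
        { "name": "Mary", "age": 24 }
      ],
      "topics": [ "OSGi", "Docker" ]
    }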

Compared to XML, JSON is leaner and more lightweight. The format contains less clutter and there are not as many tools for it as for XML (e.g. no query or transformation languages).

Parsing Speed Test

Since JSON is a leaner format, I was wondering if a JSON file could be parsed faster than an equivalent XML file. In the end, both formats contain the same data in a hierarchical representation of nodes and elements. So is there any difference?

Test data

I created some simple test data: a class called Course holds a list of Student objects and Topic objects. Each object has some properties such as a name. Before each test, I create a new Course object filled with random values. I ran my tests with 200, 2,000, 20,000, 100,000 and 200,000 students/topics and repeated each test 500 times.
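The test data is a handful of plain Java classes along these lines (a sketch; names and fields are illustrative, and public fields are used just to keep it short):

    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement
    public class Course {
        public String name;
        public List<Student> students = new ArrayList<Student>();
        public List<Topic> topics = new ArrayList<Topic>();
    }

    class Student {
        public String name;
        public int age;
    }

    class Topic {
        public String name;
    }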

Candidates

Name | Format | Language | Comment
JAXB | XML | Java | The official Java Architecture for XML Binding (available via Maven)
GSON | JSON | Java | By Google (available via Maven)
Jackson | JSON | Java | An open source project (available via Maven)
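Writing a Course with the three candidates then boils down to a few calls each. A rough sketch (not my exact benchmark code):

    import java.io.StringWriter;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.google.gson.Gson;

    public class SerializationSketch {

        public static void main(String[] args) throws Exception {
            Course course = new Course(); // filled with random values in the real test

            // JAXB (XML)
            JAXBContext context = JAXBContext.newInstance(Course.class);
            Marshaller marshaller = context.createMarshaller();
            StringWriter xml = new StringWriter();
            marshaller.marshal(course, xml);

            // GSON (JSON)
            String gsonJson = new Gson().toJson(course);

            // Jackson (JSON)
            String jacksonJson = new ObjectMapper().writeValueAsString(course);
        }
    }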

JSON writes faster, a little

The first result of my tests was that Jackson (JSON) writes data a little bit faster than JAXB (XML) and GSON (JSON). The difference is not that big.

JSON reads faster, a lot

More interesting was the fact that both JSON implementations (Jackson and GSON) read data much faster than JAXB.

JSON files are smaller

OK, this one is a no-brainer, but JSON files are (of course) smaller than XML files. In my example, the JSON file was about 68% of the size of the corresponding XML file. However, this highly depends on your data. If the data inside the nodes is large, the overhead of the XML tags matters less than for very small values (e.g. <number>5</number>). The file generated by GSON has, of course, the same size as the file generated by Jackson.

Best regards,
Thomas

Who is using OSGi?

Who is using OSGi? If you take a look at http://www.osgi.org/About/Members you will see more than a hundred members of the OSGi Alliance and a lot of big players like IBM or Oracle. But let's do a little investigation of our own with Google Trends.

Google Trends

Google Trends is a service where you can search for a particular keyword and get a timeline. The timeline shows you when the keyword was searched for and how many requests were made. That is a great way to estimate how popular a certain technology is at a given time. Google will also show you where the search requests came from – and that is where we start.

OSGi in Google Trends

If we search for “OSGi” on Google Trends we see a chart like the one shown below. As we can see, OSGi is somewhat past its peak and the interest in the technology has been decreasing for a couple of years.

But even more interesting is the map which shows where the search requests came from. As we can see, most requests came from China. In fourth place is Germany.

If we take a closer look at Germany, we see that most requests come from the south of the country.

But we can get even more specific and view the exact cities. It is a little bit hard to see in the chart, but you can click on the link at the bottom to see the full report. You will see Walldorf, Karlsruhe and Stuttgart on top. So what? Well, in Walldorf there is one big player that is not on the member list of the OSGi Alliance: SAP.

We can do the very same with the USA and we will end up in California and Redwood City, where companies like Oracle and Informatica are located.

Best regards,
Thomas

Docker Registry REST API

The Docker Registry

The Docker registry is Docker's built-in way to share images. It is an open-source project and can be found at https://github.com/dotcloud/docker-registry in the official repository of DotCloud. You can set it up on your private server (maybe in the cloud) and push and pull your images to it. You can also secure it, e.g. with SSL and an NGINX proxy (maybe I will write about this later).

The REST API

Similar to Docker itself, the registry provides a REST API to interact with it. Using the REST API, you can list all images, search or browse a certain repository. The only prerequisite is that you define a search back-end in the registry's config.yaml:
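In my setup this meant adding a search back-end entry to the configuration, something like the following (the exact key names may differ between registry versions, so treat this as a sketch):

    common:
        search_backend: sqlalchemy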

Now you can use the REST API like this:
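The examples below assume that the registry is reachable at localhost:5000 and speaks the v1 API; foo/bar and <image_id> are placeholders for your own repository and image.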

List a certain repository
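    # list all tags of the repository foo/bar
    curl http://localhost:5000/v1/repositories/foo/bar/tags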

Search
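    # search for images whose name contains "ubuntu"
    curl http://localhost:5000/v1/search?q=ubuntu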

Get info about a certain image
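    # show the JSON metadata of a single image
    curl http://localhost:5000/v1/images/<image_id>/json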

List all images

And thanks to bwilcox from StackOverflow, this is how you can list all images:

More

Best regards,
Thomas

Cloud vendors with Windows

The cloud is built on Linux – that is my own humble opinion. But is it really? To answer this question for myself, I took a look at a bunch of cloud vendors to see what they have got under the hood. Here is what I found.

But note that the list is neither complete nor representative. I am also comparing two very different things: IaaS and PaaS. While IaaS vendors like AWS provide virtual machines, PaaS vendors like Heroku provide tooling to set up complete environments.

However, the list shows that most of the vendors use Linux as their base system, and the further you move in the PaaS direction, the more Windows vanishes.

Vendor | Windows | Linux | Type | Comment
Microsoft Azure | yes | yes | IaaS |
AWS | yes | yes | IaaS | AWS has a lot of Linux distributions and Windows versions on its IaaS EC2.
AWS Elastic Beanstalk | yes | yes | IaaS |
eNlight Cloud | yes | yes | | CentOS, Red Hat Enterprise Linux, SUSE Linux, Oracle Linux, Ubuntu, Fedora, Debian, Windows Server 2003, Windows Server 2008, Windows 7.
Google App Engine | | | PaaS | Google App Engine has a sandbox and hides the OS.
Google Compute Engine | yes | yes | IaaS | Linux, FreeBSD, Microsoft Windows.
Heroku | | yes | PaaS | Ubuntu.
Jelastic | | yes | PaaS |
HP Cloud | | yes | IaaS | Based on OpenStack.
OpenShift | | yes | PaaS | Red Hat Enterprise Linux.
Engine Yard | | yes | PaaS | Ubuntu, Gentoo.
Rackspace | yes | yes | |
Cloud Foundry | | yes | PaaS |

Best regards,
Thomas

How to know you are inside a Docker container

How do you know that you are living in the Matrix? Well, I do not know, but at least I know how to tell whether you are inside a Docker container or not.

The Docker Matrix

Docker provides virtualization based on Linux Containers (LXC). LXC is a technology that provides operating system level virtualization for processes on Linux. This means that processes can be executed in isolation without starting a real and heavy virtual machine. All processes are executed on the same Linux kernel, but still have their own namespaces, users and file system.

An important feature of such virtualization is that applications inside a virtual environment do not know that they are not running on real hardware. An application will see the same environment, no matter if it is running on real or virtual resources.

/proc

However, there are some tricks. The /proc file system provides an interface to the kernel data structures of processes. It is a pseudo file system and most of it is read-only. Every process on Linux has an entry in this file system (named by its PID):
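    # every process shows up as a numbered directory (output shortened)
    $ ls /proc
    1  14  25  419  ...  cgroups  cpuinfo  meminfo  uptime  version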

In this directory we find information about the executed program, its command line arguments and its working directory. And since Linux kernel 2.6.24, we also find a file called cgroup:
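    # the per-process directory contains a cgroup file (output shortened)
    $ ls /proc/1/
    cgroup  cmdline  cwd  environ  exe  fd  root  status  ...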

This file contains information about the control groups the process belongs to. Normally, it looks something like this:
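    # on the host, the control group paths are just "/" (output shortened)
    $ cat /proc/1/cgroup
    ...
    4:memory:/
    3:cpuacct:/
    2:cpu:/
    1:cpuset:/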

But since LXC (and therefore Docker) makes use of cgroups, this file looks different inside a container:
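    # inside a container, the paths carry the name of the container
    # (the exact prefix - /lxc/ or /docker/ - depends on the Docker version)
    $ cat /proc/1/cgroup
    ...
    4:memory:/docker/<container-id>
    3:cpuacct:/docker/<container-id>
    2:cpu:/docker/<container-id>
    1:cpuset:/docker/<container-id>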

As you can see, some resources (like the CPU) belong to a control group named after the container. We can make this a little bit easier if we use the keyword self instead of the PID. The keyword self always references the folder of the calling process:
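    $ cat /proc/self/cgroup
    ...
    2:cpu:/docker/<container-id>
    1:cpuset:/docker/<container-id>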

And we can wrap this into a function (thanks to Henk Langeveld from StackOverflow):

More

Best regards,
Thomas

Layering of Docker images

Docker images are great! They are not only portable application containers, they are also building blocks for application stacks. Using a Docker registry or the public Docker index, you can compose setups just by downloading the right Docker image.

But Docker images are not only building blocks for applications, they also use a kind of building block themselves: layers. Every Docker image consists of a set of layers which make up the final image.

Layers

Let us consider the following Dockerfile to build a simple Ubuntu image with an Apache installation:
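    # illustrative Dockerfile - the exact commands may differ
    FROM ubuntu
    RUN apt-get update
    RUN apt-get install -y apache2
    RUN touch a.txt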

If we build the image by calling docker build -t test/a . we get an image called a, belonging to a repository called test. We can see the history of our image by calling docker history test/a:

As we can see, the final image a consists of six intermediate images. The first three layers belong to the Ubuntu base image and the rest is ours: one layer for every build instruction.

We will see the benefit of this layering if we build a slightly different image. Let's consider this Dockerfile to build nearly the same image (only the text file in the last instruction has a different name):
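    # illustrative Dockerfile - identical except for the last instruction
    FROM ubuntu
    RUN apt-get update
    RUN apt-get install -y apache2
    RUN touch b.txt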

When we build this file, the first thing we will notice is that the build is much faster. Since we already created intermediate images for the first three instructions (namely FROM..., RUN... and RUN...), Docker will reuse those layers for the new image. Only the last layer will be created from scratch. The history of this image will look like this:

As we can see, all layers are the same as for image a, except the first one, where we touch a different file!

Benefits

Those layers (or intermediate images, or whatever you want to call them) have some benefits. Once we have built them, Docker will reuse them for new builds. This makes the builds much faster. This is great for continuous integration, where we want to build an image at the end of each successful build (e.g. in Jenkins). But the builds are not only faster, the images are also smaller, since intermediate images are shared between images.

But maybe the best thing is rollbacks: since every image contains all of its building steps, we can easily go back to a previous step if we want to. This can be done by tagging a certain layer. Let's take a look at image b again:

If we want to make a rollback and remove the last layer (maybe the file should be called c.txt instead of b.txt) we can do so by tagging the layer 9977b78fbad7:
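Tagging the layer could look something like this (assuming we want the rolled-back state to become the new test/b):

    docker tag 9977b78fbad7 test/b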

Let’s take a look at the new history:

Our last layer is gone, and with it the text file b.txt!

Best regards,
Thomas

Docker vs. Heroku


For a couple of weeks now I have been working with Docker as an application container for Amazon's EC2. Despite my eternal fight with the Docker registry, I am absolutely amazed by Docker and have enjoyed my experience.

But sometimes it is hard to explain what Docker is and what it has to do with all these cloud, PaaS and scalability topics. So I thought a little bit about the concepts Docker shares with Heroku – maybe the most popular PaaS provider. But let's start with a small…

Disclaimer

Docker and Heroku may have similar concepts (as you will see below), but they are two completely different things: while Docker is an open source software project, Heroku is a commercial service provider. You can download, build and install Docker on your own laptop or participate in its online community. On Heroku, you can create a user account, pay some money (maybe) and get a really great service and hosting experience for your applications and code. So obviously, Docker and Heroku are very different things. But some of their core concepts have at least some similarities.

Docker vs. Heroku

Docker | Heroku
Dockerfile | BuildPack
Image | Slug
Container | Dyno
Index | Add-Ons
CLI | CLI

Docker and Heroku have a lot of similarities, especially in their core concepts. This makes Docker an interesting option for people who are looking for an alternative to Heroku – maybe on their own infrastructure.

Dockerfile vs. BuildPack

Docker images can be built with a Dockerfile. A Dockerfile is a set of commands, e.g. to add files and folders or to install packages. It defines how the final image should look. Here is an example of a Dockerfile which installs memcached, similar to the one from the official website:
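    # memcached via apt-get (an illustrative sketch, not the verbatim example)
    FROM ubuntu
    RUN apt-get update
    RUN apt-get install -y memcached
    EXPOSE 11211
    CMD ["memcached", "-u", "daemon"]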

Heroku's counterpart is so-called BuildPacks. BuildPacks are also a set of scripts which are used to set up the final state of an image. Heroku comes with a couple of default BuildPacks, e.g. for Java, Python or the Play! framework. But you can also write your own. The heart of a BuildPack is its bin/compile script, which receives the application's build directory and turns it into a runnable app. For the Java BuildPack this boils down to something like the following:
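    #!/usr/bin/env bash
    # bin/compile <build-dir> <cache-dir>
    # illustrative sketch of a compile script, not the actual Heroku code
    BUILD_DIR=$1
    CACHE_DIR=$2

    # download and unpack a JDK into the build directory (omitted here)

    # build the application with Maven
    cd $BUILD_DIR
    mvn -B -DskipTests=true clean install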

BTW, there are even projects to enable the usage of Heroku’s BuildPacks for Docker images (like this).

Image vs. Slug

When you build a Dockerfile, you get a Docker image. Such an image contains all the data, files, dependencies and settings you need for your application. You can exchange those images and start them right away on any machine with Docker installed.

When you run a build on Heroku, the BuildPack creates a so-called slug. Those slugs “are compressed and pre-packaged copies of your application”, as Heroku puts it. Similar to Docker's images, they contain all dependencies and can be deployed and started in a very short time.

Container vs. Dyno

After starting a Docker image, you have a running container of this image. You can start an image multiple times to get multiple isolated containers of the same application. This enables you to build an image once and easily start multiple instances of it.

Heroku does the very same. After you build your app with your BuildPack, you get a slug which you can run on a Dyno. Such a dyno is “a lightweight container running a single user-specified command” as Heroku describes it.

Heroku even uses LXC for virtualization of their containers (dynos), which is the same technology Docker uses at its core.

Index vs. Add Ons

Docker images can be shared with the community. This is possible by uploading them to the official Docker index. All images on this index can be downloaded and used by everyone. Most of them are documented very well and can be started with a single command. This makes it possible to use a lot of applications as building blocks. Here's an example of how to run elasticsearch:
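A sketch of what this looks like, assuming the dockerfile/elasticsearch image from the index:

    # pull the image from the index and start elasticsearch on port 9200
    docker pull dockerfile/elasticsearch
    docker run -d -p 9200:9200 dockerfile/elasticsearch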

A similar concept applies to Heroku's add-on market. You can use (or buy) different pre-configured add-ons for your application (e.g. for elasticsearch). This makes it possible to build a complex app out of common building blocks – just as Docker does!

So both, Docker’s index and Heroku’s add-ons, underline a service oriented way of developing applications and reusing components.

CLI


Although the four points mentioned before are the most important concepts of both, Docker and Heroku have one more thing in common: both have a powerful command line interface which allows you to manage containers. E.g. you can run heroku ps to see all your running dynos or docker ps to see all your running containers, or you can request the logs of a certain container.

Resources

Best regards,
Thomas

Development speed, the Docker remote API and a pattern of frustration

One of the challenges Docker is facing right now is its own development speed. Since its initial release in January 2013, there have been over 7,000 commits (in one year!) by more than 400 contributors. There are more than 1,800 forks on GitHub and Docker puts out approximately one new release per month. Docker is under super fast development right now and this is really great to see!

However, this very high development speed leaves a lot of third-party tools behind. If you develop a tool for Docker, you have to keep up a very high pace. If not, your tool is outdated within a month.

Docker remote API client libraries

A good example of how this development speed affects projects is the remote API client libraries for Docker. Docker offers a JSON API to access Docker in a programmatic way. It enables you, for example, to list all running containers and stop a specific one. All via JSON and HTTP requests.

To use this JSON API in a convenient way, people created bindings for their favorite programming language. As you can see below, there exist bindings for JavaScript, Ruby, Java and many more. I used some of them on my own and I am really thankful for the great work their developers have done!

But many of those libraries are outdated at the time I am writing this. To be exact: all of them are outdated! The current remote API version of Docker is v1.11 (see here for more), which none of the remote API libraries supports right now. Many of them do not even support v1.10 or v1.9.

Here is the list of remote API tools as you find it at http://docs.docker.io/reference/api/remote_api_client_libraries/.

Language | Name | Remote API version
Python | docker-py | v1.9
Ruby | docker-api | v1.10
JavaScript (NodeJS) | dockerode | v1.10
JavaScript (NodeJS) | docker.io | v1.7
JavaScript (Angular) WebUI | dockerui | v1.8
Java | docker-java | v1.8
Erlang | erldocker | v1.4
Go | dockerclient | v1.10
PHP | Docker-PHP | v1.9
Scala | reactive-docker | v1.10

How to deal with rapidly evolving APIs

How to deal with rapidly evolving APIs is a difficult question and IMHO Docker made the right decision. By solely providing a JSON API, Docker chose a modern and universal technique. A JSON API can be used from any language or even from a web browser. JSON (together with a RESTful API) is the state-of-the-art technique to interact with services. Docker even leaves the possibility to fall back to an old API version by adding a version identifier to the request. Well done.
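Such a versioned request could look like this (a sketch, assuming the Docker daemon has been configured to listen on TCP port 4243):

    # list all running containers, explicitly requesting API version v1.10
    curl http://localhost:4243/v1.10/containers/json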

But the decision to stay “universal” (by solely providing a JSON API) also means not getting specific. Getting specific (which means using Docker from a certain programming language) is left to the developers of third party tools. These tools are also evolving rapidly right now, no matter whether they are remote API bindings, deployment tools (like Deis.io) or hosting solutions (like CoreOS). This enriches the Docker ecosystem and makes the project even more interesting.

Bad third party tools will reflect badly on you

The problem is that even if Docker did a good job (which they did!), outdated or poorly implemented third party tools will reflect badly on Docker, too. If you use a third party library (which you maybe found via the official website) and it works fine, you will be happy with Docker and the third party library. But if the library does not work next month because you updated Docker and the library does not take care of the API version, you will be frustrated with the tool and with Docker.

Pattern of frustration

This pattern of frustration occurs a lot in software development. Bad libraries cause frustration with the tool itself. Let's take Java as an example. A lot of people complain that Java is verbose, uses class explosions as a pattern and makes things much more complicated than they should be. The famous AbstractSingletonProxyFactoryBean class of the Spring framework is just such an example (see +Paul Lewis). Another example is reading a file in Java, which used to be an awful pain:
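A sketch of the pre-Java-7 way of reading a file into a String:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadFileOldSchool {
        public static void main(String[] args) throws IOException {
            // File -> FileReader -> BufferedReader -> StringBuilder
            BufferedReader reader = new BufferedReader(new FileReader(new File("some.txt")));
            StringBuilder builder = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append("\n");
            }
            reader.close();
            String content = builder.toString();
            System.out.println(content);
        }
    }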

And even the new NIO API which came with Java 7 is not as easy as it could be:
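With NIO it boils down to something like this:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ReadFileNio {
        public static void main(String[] args) throws Exception {
            // String -> Path -> static method -> byte[] -> String
            String content = new String(Files.readAllBytes(Paths.get("some.txt")), StandardCharsets.UTF_8);
            System.out.println(content);
        }
    }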

You need to put a String into a Path to pass it into a static method whose output you need to put into a String again. Great idea! But what about something like this:
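For example (a hypothetical convenience method, not an existing JDK call):

    // hypothetical API - this method does not exist in the JDK
    String content = Files.readFile("some.txt");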

However, it is not the fault of Java, but of a poorly implemented third party tool. If you need to put a File into a FileReader which you need to put into a BufferedReader to be able to read a file line by line into a StringBuilder you use a terrible I/O library! But anyway, you will be frustrated about Java and how verbose it is (and maybe also about the API itself).

This pattern applies to many other things: You are angry about your smartphone, because of a poorly coded app. You are angry about Eclipse because it crashes with a newly installed plugin. And so on…

I hope this pattern of frustration will not apply to Docker and that the community will develop a stable ecosystem of tools to provide a solid basis for development and deployment with Docker. A tool like Docker lives through its ecosystem. If the tools are buggy or outdated, people will be frustrated with Docker – and that would be a shame, because Docker is really great!

Best regards,
Thomas

Java 8 for Eclipse Kepler via the Eclipse Marketplace

Eclipse Foundation Announces Java 8 Support! One day after my post about Java 8 in Eclipse Kepler (4.3) and Eclipse Luna (4.4), the Eclipse Foundation announced official support for Java 8 in Eclipse Kepler. Here is their blog post, straight outta Ottawa, Canada:

http://eclipse.org/org/press-release/20140402_eclipse_foundation_announces_java8_support.php

You can now install Java 8 support to Eclipse Kepler (4.3) via the Eclipse Marketplace:

[Screenshot: Java 8 support in the Eclipse Marketplace]

A little bit late (!), but finally the easiest way to use Java 8 in Eclipse.

Best regards,
Thomas