A Windows SSO (for Java on client and server)

A couple of months ago I worked on a single sign on (SSO) for a Windows client and server made in Java. The scenario was the following:

  • A client made with Java running on Windows
  • A server made with Java running on Windows
  • Both were logged in to the same domain (an Active Directory LDAP)

The question was how the server could get the identity (the name of the Windows account) of the client and – of course – how it could trust this information. If the client just sent a name (e.g. from Java's System.getProperty("user.name")), it could send anything.

The solution to this dilemma (trusting what the client sends to you) is to use a so-called trusted third party. A trusted third party is an instance which both client and server know and trust. The client authenticates itself to this party and the server can verify requests against it. In the scenario above, the domain of the company (an Active Directory LDAP) is the trusted third party. Each client identifies itself against this domain when it logs in to Windows. Its Windows username and password are checked by the domain/LDAP. On the other side, the server also has access to the domain controller and can verify information sent by the client.

The nice thing about this is that the Windows domain is already configured on nearly every machine in a company. Every company bigger than maybe five people will have a Windows domain to log in to. Therefore, an SSO based on the Windows domain will work right out of the box in most cases, and we don't need any configuration in our Java code, since everything is already configured in Windows.

Java Native Access (JNA)

To use Windows and the domain controller for authentication, we can use native Windows APIs. To call those APIs from Java, we can use the Java Native Access (JNA) library, which you can find on GitHub at https://github.com/twall/jna and on Maven Central.
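The dependencies could be declared like this (the version is just an example):

    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.1.0</version>
    </dependency>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna-platform</artifactId>
        <version>4.1.0</version>
    </dependency>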

For example, you can get all user groups of the current user with a single call.
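A minimal sketch using JNA's bundled Win32 helper classes (this must run on a Windows machine):

    import com.sun.jna.platform.win32.Advapi32Util;
    import com.sun.jna.platform.win32.Advapi32Util.Account;

    public class CurrentUserGroups {

        public static void main(String[] args) {
            // Asks Windows (and, if the machine is joined, the domain)
            // for the groups of the currently logged-in user.
            for (Account group : Advapi32Util.getCurrentUserGroups()) {
                System.out.println(group.fqn); // fully qualified name, e.g. DOMAIN\Developers
            }
        }
    }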

Waffle

On top of JNA exists a library called Waffle which encapsulates all the functionality you need to implement user authentication. You can find it on GitHub at https://github.com/dblock/waffle and also on Maven Central.
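Again, the version is only an example:

    <dependency>
        <groupId>com.github.dblock.waffle</groupId>
        <artifactId>waffle-jna</artifactId>
        <version>1.7</version>
    </dependency>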

You can use Waffle to create a token on the client, send it to the server (e.g. over HTTP or whatever) and validate that token on the server. At the end of this process (create, send and validate) you will know on the server who the client is – for sure!

Here is an example of how to identify a client on the server. Note that this piece of code is executed completely on one machine. However, you could easily split it into two parts, one on the client and one on the server. The only thing you would need to do is to exchange the byte[] tokens between client and server. I commented the appropriate lines of code.
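A condensed sketch along the lines of Waffle's documentation (class and method names may vary slightly between Waffle versions):

    import com.sun.jna.platform.win32.Sspi;
    import com.sun.jna.platform.win32.Sspi.SecBufferDesc;
    import waffle.windows.auth.IWindowsIdentity;
    import waffle.windows.auth.IWindowsSecurityContext;
    import waffle.windows.auth.impl.WindowsAuthProviderImpl;
    import waffle.windows.auth.impl.WindowsSecurityContextImpl;

    public class SsoRoundTrip {

        public static void main(String[] args) {
            String securityPackage = "Negotiate";

            // CLIENT: create a security context for the logged-in Windows user
            IWindowsSecurityContext clientContext =
                    WindowsSecurityContextImpl.getCurrent(securityPackage, "localhost");

            // SERVER: the provider validates tokens against Windows/the domain
            WindowsAuthProviderImpl provider = new WindowsAuthProviderImpl();
            IWindowsSecurityContext serverContext = null;

            do {
                if (serverContext != null) {
                    // CLIENT: in a real setup, this token was just received from the server
                    SecBufferDesc continueToken =
                            new SecBufferDesc(Sspi.SECBUFFER_TOKEN, serverContext.getToken());
                    clientContext.initialize(clientContext.getHandle(), continueToken, "localhost");
                }

                // CLIENT -> SERVER: this byte[] is what you would send over the wire
                byte[] tokenForServer = clientContext.getToken();

                // SERVER: accept and validate the client's token
                serverContext = provider.acceptSecurityToken("client-1", tokenForServer, securityPackage);
            } while (clientContext.isContinue() || serverContext.isContinue());

            // SERVER: now we know who the client is - for sure!
            IWindowsIdentity identity = serverContext.getIdentity();
            System.out.println(identity.getFqn()); // e.g. DOMAIN\user
        }
    }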

(By the way, I asked this myself on Stack Overflow some time ago.)

The only thing that is a little bit complicated with this solution is that you need to do a small handshake between client and server. The client sends a token to the server, which responds with another token, which the client needs to answer again to get the final "you are authenticated" token from the server. To do this, you need to hold some state on the server for the duration of the handshake. Since the handshake is done in a second or two, I just used a limited cache from Google's Guava library to hold maybe 100 client contexts on the server.
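Such a cache could look like this (a sketch; size and timeout are arbitrary choices):

    import java.util.concurrent.TimeUnit;
    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import waffle.windows.auth.IWindowsSecurityContext;

    public class HandshakeCache {

        // Holds the security context of each client between the requests
        // of the handshake, keyed by a connection id.
        private final Cache<String, IWindowsSecurityContext> contexts =
                CacheBuilder.newBuilder()
                        .maximumSize(100)                        // ~100 parallel handshakes
                        .expireAfterWrite(30, TimeUnit.SECONDS)  // a handshake only takes seconds
                        .build();

        public void put(String connectionId, IWindowsSecurityContext context) {
            contexts.put(connectionId, context);
        }

        public IWindowsSecurityContext get(String connectionId) {
            return contexts.getIfPresent(connectionId);
        }
    }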

The exchanged tokens are validated against the underlying Windows and its domain.

Best regards,
Thomas

DeployMan (command line tool to deploy Docker images to AWS)

DeployMan


Yesterday, I published a tool called DeployMan on GitHub. DeployMan is a command line tool to deploy Docker images to AWS and was the software prototype for my master thesis. I wrote my thesis at Informatica in Stuttgart-Weilimdorf, so first of all, I want to say thank you to Thomas Kasemir for the opportunity to put this online!

Disclaimer

At the time I am writing this post, DeployMan is a pure prototype. It was created for academic research and as a demo for my thesis. It is not ready for production. If you need a solid tool to deploy Docker images (to AWS), have a look at Puppet, CloudFormation (for AWS), Terraform, Vagrant, fig (for Docker) or any other orchestration tool that has come up in the last couple of years.

What DeployMan does

DeployMan can create new AWS EC2 instances and deploy a predefined stack of Docker images on them. To do so, DeployMan takes a configuration file called a formation. A formation specifies what the EC2 machine should look like and which Docker images (and which configurations) should be deployed. Docker images can be deployed either from a Docker registry (the public one or a private one) or as tarballs from S3 storage. Together with each image, a configuration folder is pulled from S3 and mounted into the running container.

Here is an example: a formation which deploys an Nginx server with a static HTML page.
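The exact file format is documented in the project's README; the following sketch only illustrates the idea (all keys are made up for this illustration):

    {
      "name": "nginx-static-site",
      "machine": {
        "type": "t1.micro",
        "region": "eu-west-1"
      },
      "images": [
        {
          "source": "registry",
          "image": "nginx",
          "config": "s3://my-bucket/nginx-config"
        }
      ]
    }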

Interfaces

DeployMan provides a command line interface to start instances and do some basic monitoring of the deployment process. Here is a screenshot which shows some formations (which can be started) and the output of a started Logstash server:

(Screenshot: formations and the console output of a started Logstash server)

To keep track of the deployment process in a more pleasant way, DeployMan has a web interface. The web interface shows details about machines, such as the deployed images and which containers are running. Here is what a Logstash server looks like:

(Screenshot: machine details in the web interface)

The project


You can find the project on GitHub at https://github.com/tuhrig/DeployMan. I wrote a detailed README.md which explains how to build and use DeployMan. To test DeployMan, you need an AWS account (there are also free accounts).

The project is made with Java 8, Maven, the AWS Java API, the Docker Java API and a lot of small stuff like Apache Commons. The web interface is based on Spark (for the server), Google’s AngularJS and Twitter’s Bootstrap CSS.

Best regards,
Thomas

Presentation of my master thesis

Over the last six months, I wrote my master thesis about porting an enterprise OSGi application to a PaaS. Last Monday, the 21st of July 2014, I presented the main results of my thesis to my professor (best greetings to you, Mr. Goik!) and to my colleagues (thanks to all of you!) at Informatica in Stuttgart-Weilimdorf, Germany (where I had written my thesis based on one of their product information management applications, called Informatica PIM).

Here are the slides of my presentation.

While my master thesis also covers topics like OSGi, VMs and JEE application servers, the presentation focuses on my final solution: a deployment process for the cloud. Based on Docker, the complete software stack used for the Informatica PIM server was packaged into separate, self-contained images. Those images were stored in a repository and used to automatically set up cloud instances on Amazon Web Services (AWS).

The presentation gives answers to the following questions:

  • What is cloud computing and what is AWS?
  • What are containers and what is Docker?
  • How can we deploy containers?

To automate the deployment process of Docker images, I implemented my own little tool called DeployMan. It shows up at the end of my slides and I will write about it here in a couple of days. Although there are a lot of tools out there to automate Docker deployments (e.g. fig or Maestro), I wanted to do my own experiments and create a prototype for my thesis.

Enjoy!

Best regards,
Thomas

The thesis writing toolbox

I spent the last weeks (or months?) at my desk writing my master thesis about OSGi, PaaS and Docker. After my bachelor thesis two years ago, this was my second big "literary work". And like any computer scientist, I love tools! So here is the toolbox I used to write my thesis.

LyX/LaTeX

(Screenshot: the LyX editor)

I did most of my work with LyX, a WYSIWYM (what you see is what you mean) editor for LaTeX. LaTeX is a document markup language written by Leslie Lamport in 1984 for the typesetting system TeX (by Donald E. Knuth). So what does this mean? It means that TeX is the foundation for LaTeX (and LyX). It gives you the ability to set text on paper, print words in italics or leave space between two lines. But it is pretty raw and hard to handle. LaTeX is a collection of useful functions (called macros) to make working with TeX more pleasant. Instead of marking a certain word bold and underlining it, you can just say it is a headline and LaTeX will do the rest for you. A simple LaTeX document looks like this:
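(a minimal example:)

    \documentclass{article}

    \begin{document}

    \section{Some headline}

    Hello world, this is a simple \LaTeX\ document.
    \textbf{This word is bold} and \textit{this one is italic}.

    \end{document}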

By choosing the document class article, LaTeX will automatically render your text on A4 portrait paper with a 10 pt font size and so on. You do not have to worry about the layout, just about the content. This is what the above code looks like as a PDF:

(Screenshot: the PDF rendered from the LaTeX code above)

The main difference between writing your document in LaTeX and writing it in (e.g.) Microsoft Word is that you do not mix content and styling. If you write with Microsoft Word, you always see your document as the final result (e.g. a PDF) will look. The Microsoft Word document looks the same as the PDF. This principle is called WYSIWYG (what you see is what you get). In LaTeX, however, you only see your raw content until you press the compile button to generate a PDF. The styling is separated from the content and only applied in the final step when generating the PDF. This is useful, because you do not have to fight with your formatting all the time – but you have to write some pretty ugly markup.

This is where LyX comes into play. LyX is an editor to work with LaTeX without writing a single line of code. It follows the WYSIWYM (what you see is what you mean) principle. This means you see your document not in its final form, but as you mean it. Headlines will be big, bold words will be bold and italic words will be italic. Just as you mean it. However, the final styling comes in the end when generating the PDF. The LyX screenshot from above will look like this as a PDF:

(Screenshot: the PDF generated from the LyX document)

JabRef

An important part of every thesis is literature. To organize my literature I use a tool called JabRef. Its website looks very ugly, but the tool itself is really cool. It lets you build a library of all the books and articles you want to use as references. This is my personal library for my master thesis (with 53 entries in total):

(Screenshot: my JabRef library)

JabRef generates and organizes a so-called BibTeX file. This is a simple text file where every book has its own entry:
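(a made-up example entry:)

    @book{Knuth1984,
      author    = {Donald E. Knuth},
      title     = {The {TeX}book},
      publisher = {Addison-Wesley},
      year      = {1984}
    }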

Every book or article I read gets its own entry in this file. I can make new entries with JabRef itself or with generators like www.literatur-generator.de. I can link my library to my LyX document, so every time I want to make a new reference or citation, I just open a dialog and pick the corresponding entry:

(Screenshot: the citation dialog in LyX)

LyX will automatically create the reference/citation and the corresponding entry in the bibliography:

(Screenshot: a citation and the corresponding bibliography entry in the PDF)

latexdiff

When you write a large document such as a thesis, you will make mistakes. Your university will want some changes and you will improve your document until its final version. However, a lot of people will not read it twice. Instead, they read it once, give you some advice and then only want to see the changes. With latexdiff you can compare two LaTeX documents and create a document visualizing the changes. Here is an example from the PDF page shown above:

(Screenshot: a page rendered by latexdiff with the changes highlighted)

As you can see, I changed a word, inserted a comma and corrected a typo.
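On the command line, the workflow could look like this (file names are examples):

    latexdiff thesis_old.tex thesis_new.tex > diff.tex
    pdflatex diff.tex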

A great tutorial about latexdiff can be found at https://www.sharelatex.com/blog/2013/02/16/using-latexdiff-for-marking-changes-to-tex-documents.html.

Google Drive

To make graphics I use Google Drive. It is free, it is online and it is very easy to use. The feature I like most about Google Drive is the grid: you can align objects to each other, so your drawings will look straight.

(Screenshot: a drawing on the grid in Google Drawings)

Dropbox

If you lose your thesis, you are screwed! Since LyX/LaTeX documents are pure text, you could easily put them into a version control system such as Git or SVN. However, you will probably also have some graphics, maybe some other PDFs and stuff like that. To organize all this, I simply use Dropbox. Not only does Dropbox save your files for you, it also has a history. So you can easily restore a previous version of your document:

(Screenshot: the revision history in Dropbox)

eLyXer – HTML export


eLyXer is a tool to export LyX documents as HTML. Although LyX is meant to create PDF documents in the first place, it is nice to have an HTML version too. eLyXer is already built into LyX, so you can export your document with a few clicks:

(Screenshot: the HTML export menu in LyX)
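eLyXer can also be invoked from the command line; a sketch (options and file names are examples and may differ between versions):

    elyxer.py --css thesis.css thesis.lyx thesis.html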

I also styled the HTML output with a little CSS.
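A minimal sketch of what such a stylesheet can look like (illustrative, not my original file):

    body {
        font-family: Georgia, serif;
        max-width: 42em;
        margin: 0 auto;
    }

    h1, h2, h3 {
        font-family: Helvetica, Arial, sans-serif;
    }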

The result looks like this:

(Screenshot: the HTML version of the thesis)

The thesis writing toolbox

  • LyX – A WYSIWYM editor for LaTeX to write documents without a single line of code.
  • JabRef – An organizer for literature.
  • latexdiff – Compares two LaTeX documents and generates a visualization of their differences.
  • eLyXer – Exports LyX documents to HTML.

Best regards,
Thomas

JAXB vs. GSON and Jackson

XML vs. JSON

The discussion about XML and JSON is as old as the internet itself. If you google "XML vs. JSON" you will get about 2.2 million results. And here is mine – a comparison of the parsing speed of JAXB, GSON and Jackson.

XML

XML holds data between named nodes which can also have attributes. Each node can hold child nodes or some data. A typical XML file could look like this:
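(a sketch, matching the test data used later in this post:)

    <course name="Computer Science">
        <student>
            <name>John Doe</name>
        </student>
        <student>
            <name>Jane Doe</name>
        </student>
        <topic>
            <name>OSGi</name>
        </topic>
    </course>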

XML itself is pretty simple, but comes with a totally over-engineered ecosystem. There are different types of validation schemas, query languages, transformation languages and weird dialects. If you had an XML course during your studies, the most difficult part was instantiating the XML parser in Java (File > DocumentBuilderFactory > DocumentBuilder > Document > Element).

JSON

JSON holds data as JavaScript objects. It has two types of notation: lists enclosed in [...] and objects enclosed in {...}. The document from above could look like this:
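(the same data, again just a sketch:)

    {
        "name": "Computer Science",
        "students": [
            { "name": "John Doe" },
            { "name": "Jane Doe" }
        ],
        "topics": [
            { "name": "OSGi" }
        ]
    }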

Compared to XML, JSON is leaner and more lightweight. The format contains less clutter and there are not as many tools as for XML (e.g. no query or transformation languages).

Parsing Speed Test

Since JSON is a leaner format, I was wondering whether a JSON file can be parsed faster than an equivalent XML file. In the end, both formats contain the same data in a hierarchical representation of nodes and elements. So is there any difference?

Test data

I created some simple test data: a class called Course holds a list of Student objects and Topic objects. Each object has some properties such as a name. Before each test, I create a new Course object filled with random values. I ran my tests with 200, 2,000, 20,000, 100,000 and 200,000 students/topics and repeated each test 500 times.
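A sketch of the test data (class names as described above; only Course is annotated, which is all JAXB needs here):

    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement
    public class Course {
        public String name;
        public List<Student> students = new ArrayList<>();
        public List<Topic> topics = new ArrayList<>();
    }

    class Student {
        public String name;
    }

    class Topic {
        public String name;
    }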

Candidates

Library | Format | Description
JAXB    | XML    | the official Java Architecture for XML Binding (on Maven Central)
GSON    | JSON   | a JSON library by Google (on Maven Central)
Jackson | JSON   | an open-source JSON library (on Maven Central)
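To give an idea of the three APIs, here is a sketch of one write operation each (using the Course class sketched above):

    import java.io.StringWriter;
    import javax.xml.bind.JAXBContext;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.google.gson.Gson;

    public class WriteComparison {

        public static void main(String[] args) throws Exception {
            Course course = new Course();
            course.name = "Computer Science";

            // JAXB (XML)
            StringWriter xml = new StringWriter();
            JAXBContext.newInstance(Course.class).createMarshaller().marshal(course, xml);

            // GSON (JSON)
            String gson = new Gson().toJson(course);

            // Jackson (JSON)
            String jackson = new ObjectMapper().writeValueAsString(course);

            System.out.println(xml + "\n" + gson + "\n" + jackson);
        }
    }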

JSON writes faster, a little

The first result of my tests was that Jackson (JSON) writes data a little bit faster than JAXB (XML) and GSON (JSON). The difference is not that big, though.

JSON reads faster, a lot

More interesting was the fact that both JSON implementations (Jackson and GSON) read data much faster than JAXB.

JSON files are smaller

OK, this one is a no-brainer, but JSON files are (of course) smaller than XML files. In my example, the JSON file was about 68% of the size of the corresponding XML file. However, this highly depends on your data. If the data inside the nodes is large, the overhead of the XML tags matters less than for very small data (e.g. <number>5</number>). The file generated by GSON has, of course, the same size as the file generated by Jackson.

Best regards,
Thomas

Who is using OSGi?

Who is using OSGi? If you take a look at http://www.osgi.org/About/Members you will see more than a hundred members of the OSGi Alliance, including big players like IBM and Oracle. But let's do a little investigation of our own with Google Trends.

Google Trends

Google Trends is a service where you can search for a particular keyword and get a timeline. The timeline shows you when the keyword was searched and how many requests were made. That is a great way to estimate how popular a certain technology is at a given time. Google will also show you where the search requests were made – and that is where we start.

OSGi in Google Trends

If we search for "OSGi" on Google Trends, we see a chart just like the one shown below. As we can see, OSGi is somewhat past its peak and interest in the technology has been decreasing for a couple of years.

But even more interesting is the map which shows where the search requests were made. As we can see, most requests came from China. In fourth place is Germany.

If we take a closer look at Germany, we see that most requests come from the south of the country.

But we can get even more specific and view the exact cities. It is a little bit hard to see in the chart, but you can click on the link at the bottom to see the full report. You will see Walldorf, Karlsruhe and Stuttgart on top. So what? Well, in Walldorf there is one big player who is not on the member list of the OSGi Alliance: SAP.

We can do the very same with the USA and we will end up in California and Redwood City, where companies like Oracle and Informatica are located.

Best regards,
Thomas

Docker Registry REST API

The Docker Registry

The Docker registry is Docker's built-in way to share images. It is an open-source project and can be found at https://github.com/dotcloud/docker-registry in the official repository of DotCloud. You can set it up on your private server (maybe in the cloud) and push and pull your images to it. You can also secure it, e.g. with SSL and an Nginx in front (maybe I will write about this later).

The REST API

Similar to Docker itself, the registry provides a REST API to interact with it. Using the REST API, you can list all images, search or browse a certain repository. The only prerequisite is that you define a search back-end in the registry's config.yaml:
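(a sketch of the relevant line; the SQLAlchemy back-end ships with the registry:)

    search_backend: sqlalchemy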

Now you can use the Rest API like this:

List a certain repository
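A sketch, assuming the registry runs at localhost:5000 (repositories are addressed by namespace and name):

    curl http://localhost:5000/v1/repositories/library/ubuntu/tags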

Search
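Again assuming localhost:5000:

    curl "http://localhost:5000/v1/search?q=ubuntu"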

Get info about a certain image
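The image is addressed by its ID (a placeholder here):

    curl http://localhost:5000/v1/images/<image-id>/json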

List all images

And thanks to bwilcox from StackOverflow, this is how you can list all images:
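It boils down to querying the search endpoint without a search term (a sketch, again assuming localhost:5000):

    curl http://localhost:5000/v1/search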

More

Best regards,
Thomas

Cloud vendors with Windows

The cloud is built on Linux – that is my own humble opinion. But is it really? To answer this question for myself, I took a look at a bunch of cloud vendors to see what they have under the hood. Here is what I found.

But note that the list is neither complete nor representative. I am also comparing two very different things: IaaS and PaaS. While IaaS vendors like AWS provide virtual machines, PaaS vendors like Heroku provide tooling to set up complete environments.

However, the list shows that most of the vendors use Linux as their base system, and the further you go in the PaaS direction, the more Windows vanishes.

Vendor                | Windows | Linux | Type | Comment
Microsoft Azure       | yes     | yes   | IaaS |
AWS                   | yes     | yes   | IaaS | AWS offers a lot of Linux distributions and Windows versions on its IaaS EC2.
AWS Elastic Beanstalk | yes     | yes   | IaaS |
eNlight Cloud         | yes     | yes   | –    | CentOS, Red Hat Enterprise Linux, SUSE Linux, Oracle Linux, Ubuntu, Fedora, Debian, Windows Server 2003, Windows Server 2008, Windows 7.
Google App Engine     | –       | –     | PaaS | Google App Engine has a sandbox and hides the OS.
Google Compute Engine | yes     | yes   | IaaS | Linux, FreeBSD, Microsoft Windows.
Heroku                | –       | yes   | PaaS | Ubuntu.
Jelastic              | –       | yes   | PaaS |
HP Cloud              | –       | yes   | IaaS | Based on OpenStack.
OpenShift             | –       | yes   | PaaS | Red Hat Enterprise Linux.
Engine Yard           | –       | yes   | PaaS | Ubuntu, Gentoo.
Rackspace             | yes     | yes   | –    |
Cloud Foundry         | –       | yes   | PaaS |

Best regards,
Thomas

How to know you are inside a Docker container

How do you know that you are living in the Matrix? Well, I do not know, but at least I can tell you how to find out whether you are inside a Docker container or not.

The Docker Matrix

Docker provides virtualization based on Linux Containers (LXC). LXC is a technology that provides operating-system-level virtualization for processes on Linux. This means that processes can be executed in isolation without starting a real and heavy virtual machine. All processes are executed on the same Linux kernel, but still have their own namespaces, users and file system.

An important feature of such virtualization is that applications inside a virtual environment do not know that they are not running on real hardware. An application will see the same environment, no matter if it is running on real or virtual resources.

/proc

However, there are some tricks. The /proc file system provides an interface to kernel data structures of processes. It is a pseudo file system and most of it is read-only. And every process on Linux has an entry in this file system, named by its PID:
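(output shortened:)

    $ ls /proc
    1  14  96  ...  cpuinfo  meminfo  uptime  version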

In this directory, we find information about the executed program, its command line arguments and its working directory. And since Linux kernel 2.6.24, we also find a file called cgroup:
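(output shortened:)

    $ ls /proc/1
    cgroup  cmdline  cwd  environ  exe  ...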

This file contains information about the control group the process belongs to. Normally, it looks something like this:
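A sketch of typical content on a plain host (the exact lines depend on kernel and distribution):

    $ cat /proc/1/cgroup
    4:cpu:/
    3:cpuset:/
    2:memory:/
    1:blkio:/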

But since LXC (and therefore Docker) makes use of cgroups, this file looks different inside a container:
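Inside a container, the paths carry the container's ID (again a sketch; <container-id> stands for the real hash):

    $ cat /proc/1/cgroup
    4:cpu:/docker/<container-id>
    3:cpuset:/docker/<container-id>
    2:memory:/docker/<container-id>
    1:blkio:/docker/<container-id>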

As you can see, some resources (like the CPU) belong to a control group with the name of the container. We can make this a little bit easier if we use the keyword self instead of the PID. The keyword self always references the folder of the calling process:
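    $ cat /proc/self/cgroup   # same output as above, but for the calling process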

And we can wrap this into a function (thanks to Henk Langeveld from StackOverflow):
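A sketch of such a shell function (it succeeds if any control group path of the current process contains a "docker" segment):

    running_in_docker() {
      # print every line of /proc/self/cgroup whose second path
      # segment is "docker"; read succeeds only if at least one
      # such line exists, and its exit code becomes the result
      awk -F/ '$2 == "docker"' /proc/self/cgroup | read -r _
    }

    # usage
    if running_in_docker; then
      echo "inside a Docker container"
    fi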

More

Best regards,
Thomas

Layering of Docker images

Docker images are great! They are not only portable application containers, they are also building blocks for application stacks. Using a Docker registry or the public Docker index, you can compose setups just by downloading the right Docker image.

But Docker images are not only building blocks for applications, they also use a kind of "building block" themselves: layers. Every Docker image consists of a set of layers which make up the final image.

Layers

Let us consider the following Dockerfile to build a simple Ubuntu image with an Apache installation:
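(a sketch, reconstructed from the build steps discussed below:)

    FROM ubuntu:14.04
    RUN apt-get update
    RUN apt-get install -y apache2
    RUN touch a.txt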

If we build the image by calling docker build -t test/a . we get an image called a, belonging to a repository called test. We can see the history of our image by calling docker history test/a:
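(image IDs and sizes are illustrative:)

    $ docker history test/a
    IMAGE          CREATED         CREATED BY                               SIZE
    f51a80a6a53c   2 minutes ago   /bin/sh -c touch a.txt                   0 B
    9977b78fbad7   2 minutes ago   /bin/sh -c apt-get install -y apache2    55.1 MB
    2b9f4b41d5f4   3 minutes ago   /bin/sh -c apt-get update                20.8 MB
    c4ff7513909d   6 weeks ago     /bin/sh -c #(nop) CMD [/bin/bash]        0 B
    cc58e55aa5a5   6 weeks ago     /bin/sh -c #(nop) ADD file:... in /      188.1 MB
    511136ea3c5a   13 months ago                                            0 B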

As we can see, the final image a consists of six intermediate images. The first three layers (the oldest ones, at the bottom) belong to the Ubuntu base image and the rest is ours: one layer for every build instruction.

We will see the benefit of this layering if we build a slightly different image. Let's consider this Dockerfile to build nearly the same image (only the text file in the last instruction has a different name):
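(again a sketch, differing only in the last line:)

    FROM ubuntu:14.04
    RUN apt-get update
    RUN apt-get install -y apache2
    RUN touch b.txt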

When we build this file (docker build -t test/b .), the first thing we notice is that the build is much faster. Since we already created intermediate images for the first three instructions (namely FROM..., RUN... and RUN...), Docker will reuse those layers for the new image. Only the last layer is created from scratch. The history of this image will look like this:
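(again with illustrative IDs – note that only the topmost layer differs:)

    $ docker history test/b
    IMAGE          CREATED         CREATED BY                               SIZE
    a3b1b1f4e02d   1 minute ago    /bin/sh -c touch b.txt                   0 B
    9977b78fbad7   2 minutes ago   /bin/sh -c apt-get install -y apache2    55.1 MB
    2b9f4b41d5f4   3 minutes ago   /bin/sh -c apt-get update                20.8 MB
    c4ff7513909d   6 weeks ago     /bin/sh -c #(nop) CMD [/bin/bash]        0 B
    cc58e55aa5a5   6 weeks ago     /bin/sh -c #(nop) ADD file:... in /      188.1 MB
    511136ea3c5a   13 months ago                                            0 B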

As we see, all layers are the same as for image a, except for the first one, where we touch a different file!

Benefits

Those layers (or intermediate images or whatever you call them) have some benefits. Once we have built them, Docker will reuse them for new builds. This makes builds much faster. This is great for continuous integration, where we want to build an image at the end of each successful build (e.g. in Jenkins). But the builds are not only faster, the images are also smaller, since intermediate images are shared between images.

But maybe the best thing is rollbacks: since every image contains all of its building steps, we can easily go back to a previous step if we want to. This can be done by tagging a certain layer. Let's take a look at image b again:
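(shortened, with the IDs from the illustration above:)

    $ docker history test/b
    IMAGE          CREATED         CREATED BY                               SIZE
    a3b1b1f4e02d   1 minute ago    /bin/sh -c touch b.txt                   0 B
    9977b78fbad7   2 minutes ago   /bin/sh -c apt-get install -y apache2    55.1 MB
    ...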

If we want to make a rollback and remove the last layer (maybe the file should be called c.txt instead of b.txt), we can do so by tagging the layer 9977b78fbad7:
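(a sketch; with the Docker versions of that time, -f forces the re-tag of an existing name:)

    $ docker tag -f 9977b78fbad7 test/b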

Let’s take a look at the new history:

Our last layer is gone, and with it the text file b.txt!

Best regards,
Thomas