DeployMan (command line tool to deploy Docker images to AWS)

Yesterday, I published a tool called DeployMan on GitHub. DeployMan is a command line tool to deploy Docker images to AWS and was the software prototype for my master thesis. I wrote my thesis at Informatica in Stuttgart-Weilimdorf, so first of all, I want to say thank you to Thomas Kasemir for the opportunity to put this online!

Disclaimer

At the time I am writing this post, DeployMan is a pure prototype. It was created for academic research and as a demo for my thesis. It is not ready for production. If you need a solid tool to deploy Docker images (to AWS), have a look at Puppet, CloudFormation (for AWS), Terraform, Vagrant, fig (for Docker) or any of the other orchestration tools that have come up in the last couple of years.

What DeployMan does

DeployMan can create new AWS EC2 instances and deploy a predefined stack of Docker images on them. To do so, DeployMan reads a configuration file called a formation. A formation specifies what the EC2 machine should look like and which Docker images (and which configurations) should be deployed. Docker images can be deployed either from a Docker registry (the public one or a private one) or as tarballs from S3 storage. Together with each image, a configuration folder is pulled from S3 and mounted into the running container.

Here is an example of a formation which deploys an Nginx server with a static HTML page:
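The exact file format is described in the project's README; just to give an idea, a formation along the following lines (all field names here are made up for this sketch) defines the machine and the images to deploy:

    {
      "name": "nginx-demo",
      "instance": {
        "type": "t1.micro",
        "region": "eu-west-1"
      },
      "images": [
        {
          "image": "nginx",
          "config": "s3://my-bucket/nginx-config"
        }
      ]
    }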

Interfaces

DeployMan provides a command line interface to start instances and do some basic monitoring of the deployment process. Here is a screenshot which shows some formations (which can be started) and the output of a started Logstash server:

[Screenshot: Run_Logstash_Server]

To keep track of the deployment process in a more pleasant way, DeployMan also has a web interface. The web interface shows details for each machine, such as the deployed images and which containers are running. Here is what a Logstash server looks like:

[Screenshot: Machine_Details]

The project

You can find the project on GitHub at https://github.com/tuhrig/DeployMan. I wrote a detailed README.md which explains how to build and use DeployMan. To test DeployMan, you need an AWS account (there are also free accounts).

The project is made with Java 8, Maven, the AWS Java API, the Docker Java API and a lot of small stuff like Apache Commons. The web interface is based on Spark (for the server), Google’s AngularJS and Twitter’s Bootstrap CSS.

Best regards,
Thomas

Presentation of my master thesis

Over the last six months, I wrote my master thesis about porting an enterprise OSGi application to a PaaS. Last Monday, the 21st of July 2014, I presented the main results of my thesis to my professor (best greetings to you, Mr. Goik!) and to my colleagues (thanks to all of you!) at Informatica in Stuttgart-Weilimdorf, Germany, where I wrote my thesis based on one of their product information management applications, Informatica PIM.

Here are the slides of my presentation.

While my master thesis also covers topics like OSGi, VMs and JEE application servers, the presentation focuses on my final solution: a deployment process for the cloud. Based on Docker, the complete software stack used for the Informatica PIM server was packaged into separate, self-contained images. Those images were stored in a repository and used to automatically set up cloud instances on Amazon Web Services (AWS).

The presentation gives answers to the following questions:

  • What is cloud computing and what is AWS?
  • What are containers and what is Docker?
  • How can we deploy containers?

To automate the deployment process of Docker images, I implemented my own little tool called DeployMan. It shows up at the end of my slides and I will write about it here in a couple of days. Although there are a lot of tools out there to automate Docker deployments (e.g. fig or Maestro), I wanted to do my own experiments and create a prototype for my thesis.

Enjoy!

Best regards,
Thomas

Docker Registry REST API

The Docker Registry

The Docker registry is Docker’s built-in way to share images. It is an open-source project and can be found at https://github.com/dotcloud/docker-registry in the official repository of DotCloud. You can set it up on your own private server (maybe in the cloud) and push and pull your images to and from it. You can also secure it, e.g. with SSL and an Nginx proxy in front (maybe I will write about this later).

The REST API

Similar to Docker itself, the registry provides a REST API to interact with it. Using the REST API, you can list all images, search, or browse a certain repository. The only prerequisite is that you define a search back-end in the registry’s config.yaml:
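The registry ships with a simple SQLAlchemy back-end, which can be enabled with a single line (an excerpt of config.yaml):

    search_backend: sqlalchemy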

Now you can use the REST API like this:

List a certain repository
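Assuming the registry runs on localhost:5000, the tags of a repository can be listed like this (the repository name foo/bar is just an example):

    curl http://localhost:5000/v1/repositories/foo/bar/tags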

Search
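A search against the back-end configured above (the query term is just an example):

    curl "http://localhost:5000/v1/search?q=ubuntu"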

Get info about a certain image
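Details of a single image can be requested by its ID (replace the placeholder with a real image ID):

    curl http://localhost:5000/v1/images/<image_id>/json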

List all images

And thanks to bwilcox from StackOverflow, this is how you can list all images:
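If I remember correctly, the trick is simply to call the search endpoint without a query, which makes the registry return everything it has indexed:

    curl http://localhost:5000/v1/search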

Best regards,
Thomas

Cloud vendors with Windows

The cloud is built on Linux – at least that is my own humble opinion. But is it really? To answer this question for myself, I took a look at a bunch of cloud vendors to see what they have under the hood. Here is what I found.

But note that the list is neither complete nor representative. I am also comparing two very different things: IaaS and PaaS. While IaaS vendors like AWS provide virtual machines, PaaS vendors like Heroku provide tooling to set up complete environments.

However, the list shows that most of the vendors use Linux as their base system, and the further you go in the PaaS direction, the more Windows vanishes.

Vendor | Windows | Linux | Type | Comment
Microsoft Azure | yes | yes | IaaS |
AWS | yes | yes | IaaS | AWS offers a lot of Linux distributions and Windows versions on EC2.
AWS Elastic Beanstalk | yes | yes | IaaS |
eNlight Cloud | yes | yes | | CentOS, Red Hat Enterprise Linux, SUSE Linux, Oracle Linux, Ubuntu, Fedora, Debian, Windows Server 2003, Windows Server 2008, Windows 7.
Google App Engine | | | PaaS | Google App Engine has a sandbox and hides the OS.
Google Compute Engine | yes | yes | IaaS | Linux, FreeBSD, Microsoft Windows
Heroku | | yes | PaaS | Ubuntu
Jelastic | | yes | PaaS |
HP Cloud | | yes | IaaS | Based on OpenStack.
OpenShift | | yes | PaaS | Red Hat Enterprise Linux
Engine Yard | | yes | PaaS | Ubuntu, Gentoo
Rackspace | yes | yes | |
Cloud Foundry | | yes | PaaS |

Best regards,
Thomas

How to know you are inside a Docker container

How do you know that you are living in the Matrix? Well, I do not know, but at least I know how to tell whether you are inside a Docker container or not.

The Docker Matrix

Docker provides virtualization based on Linux Containers (LXC). LXC is a technology that provides operating-system-level virtualization for processes on Linux. This means that processes can be executed in isolation without starting a real, heavyweight virtual machine. All processes are executed on the same Linux kernel, but still have their own namespaces, users and file system.

An important feature of such virtualization is that applications inside a virtual environment do not know that they are not running on real hardware. An application will see the same environment, no matter if it is running on real or virtual resources.

/proc

However, there are some tricks. The /proc file system provides an interface to kernel data structures of processes. It is a pseudo file system and most of it is read-only. But every process on Linux will have an entry in this file system (named by its PID):
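For example (the PIDs will of course look different on your machine):

    $ ls /proc
    1  2  43  108  2173  ...  cpuinfo  meminfo  uptime  version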

In this directory, we find information about the executed program, its command line arguments or working directory. And since the Linux kernel 2.6.24, we also find a file called cgroup:
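Sticking with the made-up PID 2173 from above:

    $ ls /proc/2173/
    cgroup  cmdline  cwd  environ  exe  fd  maps  mounts  root  stat  status  ...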

This file contains information about the control group the process belongs to. Normally, it looks something like this:
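On an ordinary system (the exact list of controllers depends on your kernel), all paths simply point to the root hierarchy:

    $ cat /proc/2173/cgroup
    11:name=systemd:/
    10:hugetlb:/
    ...
    3:memory:/
    2:cpu:/
    1:cpuset:/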

But since LXC (and therefore Docker) makes use of cgroups, this file looks different inside a container:
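Inside a container, the paths carry the container's ID instead (the ID below is just an example):

    $ cat /proc/1/cgroup
    11:name=systemd:/docker/3601745b3bd5...
    ...
    3:memory:/docker/3601745b3bd5...
    2:cpu:/docker/3601745b3bd5...
    1:cpuset:/docker/3601745b3bd5...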

As you can see, some resources (like the CPU) belong to a control group named after the container. We can make this a little bit easier by using the keyword self instead of the PID. The keyword self always references the folder of the calling process:
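So inside a container, the check does not even need a PID:

    $ cat /proc/self/cgroup
    ...
    2:cpu:/docker/3601745b3bd5...
    1:cpuset:/docker/3601745b3bd5...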

And we can wrap this into a function (thanks to Henk Langeveld from StackOverflow):
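A minimal variant of the idea, just grepping /proc/self/cgroup, could look like this:

    running_in_docker() {
      # succeeds if any control group of the current process is owned by Docker
      grep -q ':/docker/' /proc/self/cgroup 2>/dev/null
    }

    # usage
    if running_in_docker; then
      echo "Inside a Docker container"
    fi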

Best regards,
Thomas

Layering of Docker images

Docker images are great! They are not only portable application containers, they are also building blocks for application stacks. Using a Docker registry or the public Docker index, you can compose setups just by downloading the right Docker image.

But Docker images are not only building blocks for applications, they also use a kind of “building block” themselves: layers. Every Docker image consists of a set of layers which make up the final image.

Layers

Let us consider the following Dockerfile to build a simple Ubuntu image with an Apache installation:
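A minimal version of such a Dockerfile could look like this (the base image tag and the marker file are just examples):

    FROM ubuntu:14.04
    RUN apt-get update
    RUN apt-get install -y apache2
    RUN touch /tmp/a.txt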

If we build the image by calling docker build -t test/a . we get an image called a, belonging to a repository called test. We can see the history of the image by calling docker history test/a:
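The output is only sketched here; the IDs, dates and sizes are illustrative, not real values:

    $ docker history test/a
    IMAGE          CREATED          CREATED BY                                SIZE
    ab93c3f2e5a1   2 minutes ago    /bin/sh -c touch /tmp/a.txt               0 B
    9977b78fbad7   2 minutes ago    /bin/sh -c apt-get install -y apache2     55 MB
    d2a1e7cf09ba   3 minutes ago    /bin/sh -c apt-get update                 20 MB
    9f676bd305a4   3 weeks ago      /bin/sh -c #(nop) ADD ubuntu.tar.xz ...   200 MB
    1c7f181e78b9   3 weeks ago      /bin/sh -c #(nop) MAINTAINER ...          0 B
    511136ea3c5a   13 months ago                                              0 B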

As we can see, the final image a consists of six intermediate images (layers). The first three layers belong to the Ubuntu base image and the rest is ours: one layer for every build instruction.

We will see the benefit of this layering if we build a slightly different image. Let’s consider this Dockerfile to build nearly the same image (only the text file in the last instruction has a different name):
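Again only sketched; everything is identical except for the last instruction:

    FROM ubuntu:14.04
    RUN apt-get update
    RUN apt-get install -y apache2
    RUN touch /tmp/b.txt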

When we build this file, the first thing we will notice is that the build is much faster. Since we already created intermediate images for the first three instructions (namely FROM..., RUN... and RUN...), Docker will reuse those layers for the new image. Only the last layer will be created from scratch. The history of this image will look like this:
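Again with illustrative values; note that only the topmost layer is new:

    $ docker history test/b
    IMAGE          CREATED          CREATED BY                                SIZE
    fe289e3c2a7d   1 minute ago     /bin/sh -c touch /tmp/b.txt               0 B
    9977b78fbad7   5 minutes ago    /bin/sh -c apt-get install -y apache2     55 MB
    d2a1e7cf09ba   6 minutes ago    /bin/sh -c apt-get update                 20 MB
    9f676bd305a4   3 weeks ago      /bin/sh -c #(nop) ADD ubuntu.tar.xz ...   200 MB
    1c7f181e78b9   3 weeks ago      /bin/sh -c #(nop) MAINTAINER ...          0 B
    511136ea3c5a   13 months ago                                              0 B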

As we can see, all layers are the same as for image a, except for the first (topmost) one, where we touch a different file!

Benefits

Those layers (or intermediate images or whatever you want to call them) have some benefits. Once we have built them, Docker will reuse them for new builds. This makes builds much faster. This is great for continuous integration, where we want to build an image at the end of each successful build (e.g. in Jenkins). But the builds are not only faster, the images are also smaller, since intermediate images are shared between images.

But maybe the best thing about layers is rollbacks: since every image contains all of its build steps, we can easily go back to a previous step if we want to. This can be done by tagging a certain layer. Let’s take a look at the history of image b again (shown above).

If we want to make a rollback and remove the last layer (maybe the file should be called c.txt instead of b.txt), we can do so by tagging the layer 9977b78fbad7:
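The rollback is a single tag command (depending on your Docker version, overwriting an existing tag may require the -f flag):

    docker tag 9977b78fbad7 test/b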

Let’s take a look at the new history:
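Sketched with the same illustrative values as above:

    $ docker history test/b
    IMAGE          CREATED          CREATED BY                                SIZE
    9977b78fbad7   5 minutes ago    /bin/sh -c apt-get install -y apache2     55 MB
    d2a1e7cf09ba   6 minutes ago    /bin/sh -c apt-get update                 20 MB
    9f676bd305a4   3 weeks ago      /bin/sh -c #(nop) ADD ubuntu.tar.xz ...   200 MB
    1c7f181e78b9   3 weeks ago      /bin/sh -c #(nop) MAINTAINER ...          0 B
    511136ea3c5a   13 months ago                                              0 B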

Our last layer is gone – and with it, the text file b.txt!

Best regards,
Thomas

Docker vs. Heroku

For a couple of weeks now, I have been working with Docker as an application container for Amazon’s EC2. Despite my eternal fight with the Docker registry, I am absolutely amazed by Docker and have enjoyed the experience.

But sometimes it is hard to explain what Docker is and what it has to do with all these cloud, PaaS and scalability topics. So I thought a little bit about the concepts Docker and Heroku have in common – Heroku being maybe the most popular PaaS provider. But let’s start with a small…

Disclaimer

Docker and Heroku may have similar concepts (as you will see below), but they are two completely different things: while Docker is an open-source software project, Heroku is a commercial service provider. You can download, build and install Docker on your own laptop or participate in its online community. On Heroku, you can create a user account, pay some money (maybe) and get a really great service and hosting experience for your applications and code. So obviously, Docker and Heroku are very different things. But some of their core concepts have at least some similarities.

Docker vs. Heroku

Docker | Heroku
Dockerfile | BuildPack
Image | Slug
Container | Dyno
Index | Add-Ons
CLI | CLI

Docker and Heroku have a lot of similarities, especially in their core concepts. This makes Docker interesting for people who are looking for an alternative to Heroku – maybe on their own infrastructure.

Dockerfile vs. BuildPack

Docker images can be built with a Dockerfile. A Dockerfile is a set of commands, e.g. to add files and folders or to install packages. It defines what the final image should look like. Here is an example of a Dockerfile which installs memcached from the official website:
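The following is a sketch in the spirit of the old memcached example from the Docker documentation; the package list and download URL are from memory and may need adjusting:

    FROM ubuntu
    RUN apt-get update
    RUN apt-get install -y build-essential wget libevent-dev

    # download and build memcached from the official website
    RUN wget -O memcached.tar.gz http://memcached.org/latest
    RUN tar -xzf memcached.tar.gz
    RUN cd memcached-* && ./configure && make && make install

    # memcached refuses to run as root
    RUN useradd -r -s /bin/false memcached
    USER memcached

    EXPOSE 11211
    CMD ["memcached", "-v"]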

Heroku’s counterpart is the so-called BuildPack. BuildPacks are also a set of scripts which are used to set up the final state of an image. Heroku comes with a couple of default BuildPacks, such as for Java, Python or the Play! framework. But you can also write your own. Here’s how a BuildPack for Java apps is structured:
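Instead of the full script, here is the general pattern every bin/compile follows (the real BuildPack lives at https://github.com/heroku/heroku-buildpack-java; the steps below are only a stripped-down illustration):

    #!/usr/bin/env bash
    # bin/compile <build-dir> <cache-dir>
    BUILD_DIR=$1
    CACHE_DIR=$2

    echo "-----> Installing OpenJDK"
    # download a JDK into $BUILD_DIR/.jdk, reusing $CACHE_DIR where possible

    echo "-----> Building with Maven"
    # run Maven against the application in $BUILD_DIR
    # e.g. mvn -B -DskipTests=true clean install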

BTW, there are even projects to enable the usage of Heroku’s BuildPacks for Docker images (like this).

Image vs. Slug

When you build a Dockerfile, you get a Docker image. Such an image contains all data, files, dependencies and settings you need for your application. You can exchange those images and start them right away on any machine with Docker installed.

When you run a build on Heroku, the BuildPack creates a so-called Slug. Those slugs “are compressed and pre-packaged copies of your application”, as Heroku says. Similar to Docker’s images, they contain all dependencies and can be deployed and started in a very short time.

Container vs. Dyno

After starting a Docker image, you have a running container of this image. You can start an image multiple times to get multiple isolated containers of the same application. This enables you to build an image once and easily start multiple instances of it.

Heroku does the very same. After you build your app with your BuildPack, you get a slug which you can run on a Dyno. Such a dyno is “a lightweight container running a single user-specified command” as Heroku describes it.

Heroku even uses LXC for virtualization of their containers (dynos), which is the same technology Docker uses at its core.

Index vs. Add Ons

Docker images can be shared with the community. This is possible by uploading them to the official Docker index. All images on this index can be downloaded and used by everyone. Most of them are documented very well and can be started with a single command. This makes it possible to run a lot of applications as building blocks. Here’s an example of how to run elasticsearch:
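For example, with one of the community Elasticsearch images from the index (the image name is only an example; any other image works the same way):

    docker run -d -p 9200:9200 -p 9300:9300 dockerfile/elasticsearch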

A similar concept applies to Heroku’s add-on market. You can use (or buy) different pre-configured add-ons for your application (e.g. for elasticsearch). This makes it possible to build a complex app out of common building blocks – just as Docker does!

So both Docker’s index and Heroku’s add-ons encourage a service-oriented way of developing applications and reusing components.

CLI

Although the four points mentioned before are the most important shared concepts, Docker and Heroku have one more thing in common: both provide a powerful command line interface which allows you to manage containers. For example, you can run heroku ps to see all your running dynos, or docker ps to see all your running containers, and both CLIs let you request the log of a certain container.
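For example (both CLIs follow the same pattern):

    # Heroku
    heroku ps                 # list the dynos of the current app
    heroku logs --tail        # stream the log of the app

    # Docker
    docker ps                 # list all running containers
    docker logs <container>   # show the log of a certain container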

Best regards,
Thomas