Using the console on Windows

I tell you a secret: I’m a Windows user. I don’t use OSX, I don’t use Linux, I use Windows. And I tell you something more: I like it πŸ˜‰

Usually I develop Java or JavaScript applications which perfectly run under Windows, Linux, OSX or what so ever. So developing under Windows is no problem at all. Runtime environments, IDEs, editors – Windows has it all. However, people keep wondering how I can do the most simple task:

How do you connect to another server? Do you use Putty? – Argh, no, I just type ssh some.server.com and I’m done.

Or:

Do you use your IDE to work with GIT? Or do you have Source Tree? – Argh, no, I just type git add . and git commit -m "..." and I’m done.

But I also know, that not everybody is doing it like this. People use the weirdest tools and techniques when working under Windows. A lot of people use that f***ing small Windows CMD, Putty with its broken key-format or CygWin to be a little bit more Linux-like. But I don’t like all of these. The Windows CMD is unusable, Putty is unnecessary and CygWin is a monster you don’t need. Here is what I do.

Don’t use the Windows default CMD

The first thing I do, is to don’t use Windows’ default CMD. Why? It can’t even mark and copy things! However, there’s an easy and open-source alternative: ConEmu. ConEmu has all the simple things you expect: colors, tabs, resizable, copy-and-paste and much more. You can get it from GitHub and it even works without installation.

2016-05-21 21_51_21-Settings

Use GIT as a toolbox for Windows

The other thing I do, is to use GIT as a toolbox for Windows. When people are talking about using the console, they are actually talking about using tools. They talks about SCP, SSH or CURL like they come with their console – but they don’t! All of those things are just individual programs installed on their machine. They are not related to the command line! So why don’t install them on Windows?

If you use GIT, you already have everything you need in the bin folder (e.g. on C:\Program Files (x86)\Git\bin\):

ssh, scp, curl, cat, grep, less, sh, bash, ls, mv, cp, diff, gzip, and much more…

2016-05-21 22_09_29-Using the console on Windows – Thomas Uhrig

The only thing you need to do is to put GIT’s bin folder to your Windows variable path. You will have everything at a fingertip in your console.

2016-05-21 22_26_43-Environment Variables

Put my SSH-certificates to my user-folder

The last thing I usually do is to put my SSH-certificates to my user folder at C:\Users\tug\.ssh, so that SSH can find them.

Best regards,
Thomas

More

Resizing Vagrant box disk space

Vagrant is a great tool to provision virtual machines! As I’m a passionated Windows user, Vagrant is the weapon of my choice whenever I need to use some Linux-only tools such as Docker. I spinn up a new Linux VM, already configured with the things I need and start working. However, when it comes to resizing a disk, Vagrant is not nice to you…

The problem

Vagrant doesn’t provide any out-of-the-box option to configure or to change the disk size. The disk size of a VM totally depends on the base image used for the VM. There are base images with a 10 GB disk, some with a 20 GB disk and some other with a 40 GB disk. There is no Vagrant option to change this – and even worse: most Vagrant boxes use VMDK disks which cannot be resized!

Resizing (manually) with VirtualBox

As Vagrant doesn’t provide any out-of-the-box functionality, we need to do the resizing “manually”. Of course, we can write a script for this, too, but for now we keep it simple and do it by hand.

  1. First we need to convert the VMDK disk to a VDI disk which can be resized. We do this with the VBoxManage tool which comes with the VirtualBox installation:
  2. Now we can easily resize this VDI disk, e.g. to 50 GB:
  3. The last step we need to do is just to use the new disk instead of the old one. We can do this by cloning the VDI disk back to the original VMDK disk or within a view clicks in VirtualBox:

    2016-01-06 16_32_02-Oracle VM VirtualBox Manager

That’s it! Now start your VM with vagrant up and check the disk space. It’s at 50 GB and we have new free space again!

Best regards,
Thomas

More

Cloud vendors with Windows

The cloud is build on Linux – that is my own humbling opinion. But is it really? To answer this question for myself, I took a look at a bunch of cloud vendors to see what they got under the hood. Here is what I found.

But note that the list is neither complete nor representative. I am also comparing two very different things: IaaS and PaaS. While IaaS vendors like AWS provide virtual machines, PaaS vendors like Heroku provide a tooling to setup complete environments.

However, the list shows that most of the vendors use Linux as their base system and the more you go to the PaaS direction, the more Windows vanishes.

Vendor Windows Linux Type Comment
Microsoft Azure yes yes IaaS
AWS yes yes IaaS AWS has a lot of Linux distributions and Windows version on their IaaS EC2.
AWS Elastic Beanstalk yes yes IaaS
eNlight Cloud yes yes CentOS, Red Hat Enterprise Linux, SUSE Linux, Oracle Linux, Ubuntu, Fedora, Debian, Window Server 2003, Windows Server 2008, Windows 7.
Google App Engine PaaS Google App Engine has a sandbox and hides the OS.
Google Compute Engine yes yes IaaS Linux, FreeBSD, Microsoft Windows
Heroku yes PaaS Ubuntu
Jelastic yes PaaS
HP Cloud yes IaaS Based on OpenStack.
OpenShift yes PaaS Red Hat Enterprise Linux
Engine Yard yes PaaS Ubuntu, Gentoo
Rackspace yes yes
Cloud Foundry yes PaaS

Best regards,
Thomas

How to know you are inside a Docker container

How to know that you are living in the Matrix? Well, I do not know, but at least I know how to tell you if you are inside a Docker container or not.

The Docker Matrix

Docker provides virtualization based on Linux Containers (LXC). LXC is a technology to provide operating system virtualization for processes on Linux. This means, that processes can be executed in isolation without starting a real and heavy virtual machine. All processes will be executed on the same Linux kernel, but will still have their own namespaces, users and file system.

An important feature of such virtualization is that applications inside a virtual environment do not know that they are not running on real hardware. An application will see the same environment, no matter if it is running on real or virtual resources.

/proc

However, there are some tricks. The /proc file system provides an interface to kernel data structures of processes. It is a pseudo file system and most of it is read-only. But every process on Linux will have an entry in this file system (named by its PID):

In this directory, we find information about the executed program, its command line arguments or working directory. And since the Linux kernel 2.6.24, we also find a file called cgroup:

This file contains information about the control group the process belongs to. Normally, it looks something like this:

But since LXC (and therefore Docker) makes use of cgroups, this file looks different inside a container:

As you can see, some resources (like the CPU) are belonging to a control group with the name of the container. We can make this a little bit easier if we use the keyword self instead of the PID. The keyword self will always reference the folder of the calling process:

And we can wrap this into a function (thanks to Henk Langeveld from StackOverflow):

More

Best regards,
Thomas

Flatten a Docker container or image

Docker containers and respectively images can become fairly large. I recently worked with a Docker image which was over 7 GB big. However, it is pretty easy to flatten an image at the end.

Difference between save and export

As I described in my last post (http://tuhrig.de/difference-between-save-and-export-in-docker), there are two ways to persist a Docker images or container:

  • A Docker image can be saved to a tarball and loaded back again. This will preserve the history of the image.

  • A Docker container can be exported to a tarball and imported back again. This will not preserve the history of the container.

No history

We can use this mechanism to flatten and shrink a Docker container. If we save an image to the disk, its whole history will be preserved, but if we export a container, its history gets lost and the resulting tarball will be much smaller.

We can see the history of a image be running docker tag <LAYER ID> <IMAGE NAMEgt;:

So if we export a container (either an already running one or just start a new one from an image) it will lose its history and all previous layers. This will make it impossible to make a rollback to a certain layer, but it will also shrink the image. My >7 GB image is now >3 GB large, which saves more than 50% of disk space.

Flatten a Docker container

So it is only possible to “flatten” a Docker container, not an image. So we need to start a container from an image first. Then we can export and import the container in one line:

What else?

You can use some common Linux tricks to shrink Docker images. One simple trick is to clear the cache of the package manager. So depending on which base image you use you can do something like this (for an Ubuntu/Debian system, for more see here):

Resources

Best regards,
Thomas

Difference between save and export in Docker

I recently played around with Docker, an application container and virtualizationβ€Ž technology for Linux. It was pretty cool and I was able to create Docker images and containers within a couple of minutes. Everything was working right out of the box!

At the end of my day I wanted to persist my work. I stumbled over the Docker commands save and export and wondered what their difference is. So I went to StackOverflow and asked a question which was nicely answered by mbarthelemy. Here is what I found out.

How Docker works (in a nutshell)

Docker is based on so called images. These images are comparable to virtual machine images and contain files, configurations and installed programs. And just like virtual machine images you can start instances of them. A running instance of an image is called container. You can make changes to a container (e.g. delete a file), but these changes will not affect the image. However, you can create a new image from a running container (and all it changes) using docker commit <container-id> <image-name>.

Let’s make an example:

Now we have two different images (busybox and busybox-1) and we have a container made from busybox which also contains the change (the new folder /home/test). Let’s see how we can persist our changes.

Export

Export is used to persist a container (not an image). So we need the container id which we can see like this:

To export a container we simply do:

The result is a TAR-file which should be around 2.7 MB big (slightly smaller than the one from save).

Save

Save is used to persist an image (not a container). So we need the image name which we can see like this:

To save an image we simply do:

The result is a TAR-file which should be around 2.8 MB big (slightly bigger than the one from export).

The difference

Now after we created our TAR-files, let’s see what we have. First of all we clean up a little bit – we remove all containers and images we have right now:

We start with our export we did from the container. We can import it like this:

We can do the same for the saved image:

So what’s the difference between both? Well, as we saw the exported version is slightly smaller. That is because it is flattened, which means it lost its history and meta-data. We can see this by the following command:

If we run the command we will see an output like the following. As you can see there, the exported-imported image has lost all of its history whereas the saved-loaded image still have its history and layers. This means that you cannot do any rollback to a previous layer if you export-import it while you can still do this if you save-load the whole (complete) image (you can go back to a previous layer by using docker tag <LAYER ID> <IMAGE NAME>).

Best regards,
Thomas

3 ways of installing Oracle XE 11g on Ubuntu

During the last few days I struggled with installing Oracle XE 11g on an Ubuntu VM. And with “the last few days” I mean “the last few weeks”. Sad, but true. Here is what I learned about installing Oracle XE 11g on Ubuntu. But please note that I am not a Linux specialist nor an Oracle specialist – it was my first try. And as you will see, I try to make my life as easy as possible. But let’s start with the very basics.

About Oracle XE 11g

Oracle Database Express Edition 11g (or just Oracle XE) is the free version of Oracles 11g database. It was released in 2011 and is the second free version of Oracle’s database. The first free version was Oracles XE 10g, which was released in 2005. The latest paid version of Oracle’s database is 12c, which was released in 2013.

What is really confusing is the fact that the release date of the paid versions is different from the release date of the free versions. So don’ be confused, the latest free database from Oracle is 11g.

Version Paid Free
Version 9i: 2001
Version 10g: 2003 2005
Version 11g: 2007 2011
Version 12c: 2013

By the way, the i, g and c in the database names stand for internet, grid and cloud.

You can download Oracle XE from the link below – and this is where the pain begins. First of all you need an Oracle account, but this is free and easy. Then you have the choice between two packages:

  • One package for Windows which only runs on a 32-bit machine as Oracle says. But don’t worry, it also runs on a 64-bit machine (like mine) and although the unzipped installation folder is called DISK1 there is no DISK2 or something. Just run the setup.exe.
  • One package for Linux which is meant to run on a 64-bit machine as Oracle says. And here is the first pain: The package is only available as RPM so you first have to run alien on it to convert it for Ubuntu.

#1 – Installing Oracle XE by hand

b7e

My first approach was to install Oracle XE by hand. Although this seems to be the straightforward solution, it was the most painful one. You have to convert the RPM package to a DEP package, create a chkconfig script, create some mystic kernel parameters in a 60-oracle.conf, set your swap space to 2GB or more, create some more folders, install the DEB package, configure the database, export some environment variables like ORACLE_HOME (are you still with me?), reload that changes and then start the Oracle service and connect via SQLPlus. Easy, isn’t it? That’s why Oracle is making so much money with consulting πŸ˜€

And how to know all this steps? Well, use Google, because the best descriptions I found are not from Oracle. To be precise, I found no official description how to setup the Oracle XE database. Here is what I found on blogs and forums. Just choose one that suits you best. They are all really good, thanks to the people who wrote this!

#2 – Installing Oracle XE with Vagrant and Puppet

Vagrant is a free tool to create Linux VMs automatically. This means you can run Vagrant with a simple configuration file (called Vagrantfile) which describes a Linux VM (e.g. how many CPUs it should have or which image should be installed). Vagrant will create a VM according to this configuration and start it. This gives you the ability to create the same machine with the exact same configuration over and over again.

Puppet is a tool to orchestrate Linux machines. This means Puppet can automatically install programs, create folders or write files. You can download and use it for free. It perfectly integrates into Vagrant. So you can create a VM with Vagrant and as soon as it is ready it can be orchestrated by Puppet.

The really great thing about Vagrant and Puppet is that such scripts can be exchanged. This means that you can write a setup of scripts for something (e.g. to install Oracle XE) and share it with other people. And this is what those to guys did:

  • Matthew Baldwin wrote a complete Puppet (and Vagrant) setup to install Oracle XE on CentOS (a “RPM-Linux“). You find it here on GitHub.
  • Hilverd Reker wrote a complete Puppet (and Vagrant) setup to install Oracle XE on Ubuntu 12.04 (a “DEB-Linux“). You find it here on GitHub.

Both projects work the same:

  1. Install Vagrant, VirtualBox and Puppet
  2. Checkout the GitHub repository or download it as a ZIP-file
  3. Download the Oracle XE installation files and put it in some folder which you can read in the README.md of the projects
  4. Go to the root folder of the project and run vagrant up. This will download a Linux image, install it to a VM and do all the rest (including installing Oracle XE). Note that this will take some time depending on your internet connection!
  5. Now you can call vagrant ssh and you got a running Linux VM with Oracle XE!

Both projects work really well and install an Oracle XE instance in a couple of minutes.

#3 – Installing Oracle XE with Docker

homepage-docker-logo

This one is the nicest way to install Oracle XE. Docker is an application container for Linux. It is based on LXC and gives you the ability to package complete application including their dependencies to a self-containing file (called an image). These images can be exchanged and run on every Linux machine where Docker is installed! Awesome!

Docker images are also shared around the community on https://index.docker.io. And this is where this two guys come into play:

  • Wei-Ming Wu made a Docker image containing Oracle XE. You can find it here.
  • Alexei Ledenev extended Wei-Ming Wu’s image to also use the Oracle web console (APEX). You can find it here.

Both projects work the same:

  1. Install Docker on your Linux machine. You can find instructions for that at http://docs.docker.io/en/latest/installation/ubuntulinux. But it is nothing more then this:
  2. Pull the image to your machine:
  3. Run the image:
  4. That’s it. Absolutely simple.

The great thing is that those images are not as big as a virtual machine. They only contain the actual application and its environment. But they are packed in a way that they can be executed everywhere right out of the box but are still isolated. You can also make changes to the images and create another image containing your changed system.

Resources

Best regards,
Thomas

Install OpinionTrends with nginx and memcached

OpinionTrends is build with Python and Flask. Therefore you can run it without any additional server right out of the box. Batteries included! However, it is much more common and much more efficient if you run it with a web server and an application server. A widely used combination for Python web applications is nginx together with uWSGI. In this tutorial I want to show how we set up this two tools to run TechTrends and OpinionTrends. The tutorial starts at the very beginning and the only thing I assume is that you run a Linux machine with Python and easy_install.

Note: OpinionTrends is the new version of TechTrends. However, it is just a code name. So when I talk about OpinionTrends or TechTrends I mean the same thing.

OpinionTrends

Base folder

The first thing we have to do is to create the base directory for TechTrends. By convention a web application is located in /var/www. All of our code will go into this folder.

Clone & Update

Now we can start to set-up the application from scratch. OpinionTrends is deployed with git. So we just check-out the code and switch to the new opinion branch:

After the first checkout an update is super easy. Just do a pull for the latest code:

Configuration

Now we have to make some simple settings for TechTrends. To do this, we create a new file called config.py in the folder Configuration in our checked-out project. There is also a file called config_template.py in this folder which is an empty but well documented template for the configuration. The file contains all individual settings for the application. It should look like this:

Dependencies

Now we have to install the libraries needed by TechTrends. To do this we use easy_install. All dependencies are in a file called requirements.txt in the root folder of TechTrends. We have to install all dependencies in this file:

Note: As you see, the installation of the dependencies is very easy in theory. However, some dependencies such as scikit-learn are sometimes hard to install since they are not pure Python and use come C bindings. If you have problems installing the whole requirements.txt at once try to install every dependency manually on its own.

nginx

800px-Nginx_Logo.svg

Install

Installing nginx is not a big thing:

After we installed and started nginx we can verify if it is running correctly by taking our favorite browser and surf to our machine.

By the way, stopping nginx is also very simple:

Configuration

Now we have to configure ngix to point to TechTrends instead to the default welcome page. To do this we first remove the default configuration:

Now we create our own configuration in /var/www/techtrends/nginx.conf. It should look like this:

We link this file to nginx a restart it:

Now we should get a Bad Gateway exception. Perfect! This tells use that nginx found our configuration and that every things looks good – except of the missing uWSGI!

uWSGI

logo_uWSGI

Install

uWGSI is the protocol between our Python application and nginx. It is their way of communication. First we have to install it:

Configuration

Now we create a configuration file in /var/www/techtrends/uwsgi.ini. It should look like this:

Now we can start uWSGI as a daemon in the background:

Done! Nginx is serving TechTrends now. However, we should get an exception again since memcached is still missing. If we want to use TechTrends without memcache we have to change a value in the config.py in the Configuration folder in TechTrends base directory. We have to set DEBUG = True to not use memcache.

memcached

memcached_banner75

One of the biggest performance improvements you can do (I guess in general, but especially for TechTrends) is to use memcached. Memcached is a key-value in-memory store to cache frequently requested data. In TechTrends we use it to store whole pages and JSON responses from our API.

Install

Install memcached first:

And that’s it! Memcached is installed and running now. You can restart memcached (e.g. to clear it) like this:

Congratulations

Congratulations! TechTrends should run now in a stable production mode. We installed nginx, uWSGI and memcached. We also configured it to work together. Great! But – there are still some open points we have to do.

crontab

TechTrends/OpinionTrends has two regularly scheduled jobs. One job is the crawler which crawls posts from Reddit and Hackernews and the other job is the training and restart of the application. To execute this jobs we set-up a cron tab. First we create two files which execute these jobs. The first file is called crawl.sh and looks like this:

The second file is called restart.sh and looks like this:

Both files should be in the root folder (/var/www/techtrends/) of TechTrends. Now we add those two files to our locale crontab. We can edit the it like this:

It should look like this:

This crontab will run the crawler every 30 minutes and once a day at 3 o’clock it will trigger the training and restart of the whole application. So the data will growth every 30 minutes and will be indexed once a day. That’s it.

Best regards,
Thomas