A Windows SSO (for Java on client and server)

A couple of months ago I worked on a single sign-on (SSO) for a Windows client and server made in Java. The scenario was the following:

  • A client made with Java running on Windows
  • A server made with Java running on Windows
  • Both were logged in to the same domain (an Active Directory LDAP)

The question was how the server could get the identity (the name of the Windows account) of the client and – of course – how it could trust this information. But if the client just sent a name (e.g. the value of Java’s System.getProperty("user.name")), it could send anything.

The solution to this dilemma (trusting what the client sends to you) is to use a so-called trusted third party. A trusted third party is an instance which both client and server know and trust. The client authenticates itself to this party and the server can verify requests against it. In the scenario above, the domain of the company (an Active Directory LDAP) is the trusted third party. Each client identifies itself against this domain when it logs in to Windows. Its Windows username and password are checked by the domain/LDAP. On the other side, the server also has access to the domain controller and can verify information sent by the client.

The nice thing about this is that the Windows domain is already configured on nearly every machine in a company. Every company bigger than maybe five people will have a Windows domain to log in to. Therefore, an SSO based on the Windows domain will work right out of the box in most cases and we don’t need any configuration in our Java code, since it is already configured in Windows.

Java Native Access (JNA)

To use Windows and the domain controller for authentication, we can use native Windows APIs. To call those APIs in Java, we can use the Java Native Access (JNA) library, which you can find on GitHub at https://github.com/twall/jna and on Maven central:
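The Maven coordinates look something like this (the version is just an example – check Maven central for the current one):

    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.1.0</version>
    </dependency>
    <dependency>
        <!-- contains the ready-made Windows API mappings such as Advapi32Util -->
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna-platform</artifactId>
        <version>4.1.0</version>
    </dependency>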

For example, to get all user groups of the current user, you would do:
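A minimal sketch using JNA’s ready-made Advapi32Util mapping:

    import com.sun.jna.platform.win32.Advapi32Util;

    public class ListGroups {
        public static void main(String[] args) {
            // ask Windows for all groups of the currently logged-in user
            for (Advapi32Util.Account group : Advapi32Util.getCurrentUserGroups()) {
                System.out.println(group.fqn);
            }
        }
    }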

Waffle

On top of JNA exists a library called Waffle which encapsulates all functionality you need to implement user authentication. You can find it on GitHub at https://github.com/dblock/waffle and also on Maven central:
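The Maven coordinates look something like this (again, the version is just an example):

    <dependency>
        <groupId>com.github.dblock.waffle</groupId>
        <artifactId>waffle-jna</artifactId>
        <version>1.6</version>
    </dependency>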

You can use Waffle to create a token on the client, send it to the server (e.g. over HTTP or whatever) and to validate that token on the server. At the end of this process (create, send and validate) you will know on the server who the client is – for sure!

Here is an example of how to identify a client on the server. Note that this piece of code is executed completely on one machine. However, you could easily split it into two parts, one on the client and one on the server. The only thing you would need to do is to exchange the byte[] tokens between client and server. I commented the appropriate lines of code.
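Here is a sketch of it, along the lines of the example in the Waffle documentation (the connection id is just a placeholder):

    import waffle.windows.auth.IWindowsCredentialsHandle;
    import waffle.windows.auth.IWindowsSecurityContext;
    import waffle.windows.auth.impl.WindowsAccountImpl;
    import waffle.windows.auth.impl.WindowsAuthProviderImpl;
    import waffle.windows.auth.impl.WindowsCredentialsHandleImpl;
    import waffle.windows.auth.impl.WindowsSecurityContextImpl;
    import com.sun.jna.platform.win32.Sspi;
    import com.sun.jna.platform.win32.Sspi.SecBufferDesc;

    String securityPackage = "Negotiate";
    String username = WindowsAccountImpl.getCurrentUsername();

    // CLIENT: get the credentials of the currently logged-in Windows user
    IWindowsCredentialsHandle clientCredentials =
            WindowsCredentialsHandleImpl.getCurrent(securityPackage);
    clientCredentials.initialize();

    // CLIENT: create the initial security context (and with it the first token)
    WindowsSecurityContextImpl clientContext = new WindowsSecurityContextImpl();
    clientContext.setPrincipalName(username);
    clientContext.setCredentialsHandle(clientCredentials.getHandle());
    clientContext.setSecurityPackage(securityPackage);
    clientContext.initialize(null, null, username);

    // SERVER: the provider which accepts and validates client tokens
    WindowsAuthProviderImpl provider = new WindowsAuthProviderImpl();
    IWindowsSecurityContext serverContext = null;

    do {
        if (serverContext != null) {
            // CLIENT: answer the token the server sent back (next round of the handshake)
            SecBufferDesc continueToken =
                    new SecBufferDesc(Sspi.SECBUFFER_TOKEN, serverContext.getToken());
            clientContext.initialize(clientContext.getHandle(), continueToken, username);
        }

        // --> here the client would send its byte[] token to the server (e.g. over HTTP)

        // SERVER: accept the client token and create a response token
        serverContext = provider.acceptSecurityToken(
                "client-connection", clientContext.getToken(), securityPackage);

        // <-- here the server would send its byte[] token back to the client

    } while (clientContext.isContinue() || serverContext.isContinue());

    // SERVER: now we know - for sure - who the client is
    System.out.println(serverContext.getIdentity().getFqn());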

(By the way, I asked about this myself on Stack Overflow some time ago.)

The only thing that is a little bit complicated about this solution is that you need to do a small handshake between client and server. The client sends a token to the server, which responds with another token, which the client needs to answer again to get the final “you are authenticated” token from the server. To do this, you need to hold some state on the server for the duration of the handshake. Since the handshake is done in a second or two, I just used a limited cache from Google’s Guava library to hold maybe 100 client contexts on the server.
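Such a cache is a one-liner with Guava (a sketch – the size and the timeout are arbitrary values):

    import java.util.concurrent.TimeUnit;
    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import waffle.windows.auth.IWindowsSecurityContext;

    // holds at most 100 half-finished handshakes, dropped after 10 seconds
    Cache<String, IWindowsSecurityContext> handshakes = CacheBuilder.newBuilder()
            .maximumSize(100)
            .expireAfterWrite(10, TimeUnit.SECONDS)
            .build();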

The exchanged tokens are validated against the underlying Windows and its domain.

Best regards,
Thomas

DeployMan (command line tool to deploy Docker images to AWS)

DeployMan


Yesterday, I published a tool called DeployMan on GitHub. DeployMan is a command line tool to deploy Docker images to AWS and was the software prototype for my master thesis. I wrote my thesis at Informatica in Stuttgart-Weilimdorf, so first of all, I want to say thank you to Thomas Kasemir for the opportunity to put this online!

Disclaimer

At the time I am writing this post, DeployMan is a pure prototype. It was created for academic research and as a demo for my thesis. It is not ready for production. If you need a solid tool to deploy Docker images (to AWS), have a look at Puppet, CloudFormation (for AWS), Terraform, Vagrant, fig (for Docker) or any other orchestration tool that came up in the last couple of years.

What DeployMan does

DeployMan can create new AWS EC2 instances and deploy a predefined stack of Docker images on them. To do so, DeployMan takes a configuration file called a formation. A formation specifies what the EC2 machine should look like and which Docker images (and which configurations) should be deployed. Docker images can either be deployed from a Docker registry (the public one or a private one) or as tarballs from S3 storage. Together with each image, a configuration folder is pulled from S3 and mounted into the running container.

Here is an example of a formation which deploys a Nginx server with a static HTML page:
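Since I don’t have the original file at hand any more, here is only a hypothetical sketch of such a formation – the names, keys and values are made up, the real syntax is described in the README.md of the project:

    name: nginx-static-page
    machine:
      type: t1.micro                       # the EC2 instance type (assumption)
    images:
      - image: nginx                       # from the public Docker registry
        config: s3://my-bucket/nginx-conf  # configuration folder pulled from S3
                                           # and mounted into the container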

Interfaces

DeployMan provides a command line interface to start instances and do some basic monitoring of the deployment process. Here is a screenshot which shows some formations (which can be started) and the output of a started Logstash server:

(Screenshot: the available formations and the console output of a started Logstash server)

To keep track of the deployment process in a more pleasant way, DeployMan has a web interface. The web interface shows details about machines, such as the deployed images and which containers are running. Here is what it looks like for a Logstash server:

(Screenshot: the machine details of a Logstash server)

The project


You can find the project on GitHub at https://github.com/tuhrig/DeployMan. I wrote a detailed README.md which explains how to build and use DeployMan. To test DeployMan, you need an AWS account (there are also free accounts).

The project is made with Java 8, Maven, the AWS Java API, the Docker Java API and a lot of small stuff like Apache Commons. The web interface is based on Spark (for the server), Google’s AngularJS and Twitter’s Bootstrap CSS.

Best regards,
Thomas

JAXB vs. GSON and Jackson

XML vs. JSON

The discussion about XML and JSON is as old as the internet itself. If you Google XML vs. JSON you will get about 2.2 million results. And here is mine – a comparison of the parsing speed of JAXB, GSON and Jackson.

XML

XML holds data between named nodes, which can also have attributes. Each node can hold child nodes or some data. A typical XML file – here with illustrative course data, matching the test data used later in this post – would look like this:
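    <!-- an illustrative example; the element names are made up for this post -->
    <course name="Programming 101">
        <students>
            <student name="Alice" />
            <student name="Bob" />
        </students>
        <topics>
            <topic name="Java" />
            <topic name="XML" />
        </topics>
    </course>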

XML itself is pretty simple, but it comes with a totally over-engineered ecosystem. There exist different types of validation schemes, query languages, transformation languages and weird dialects. If you had an XML course during your studies, the most difficult part was to instantiate the XML parser in Java (File > DocumentBuilderFactory > DocumentBuilder > Document > Element).

JSON

JSON holds data as JavaScript objects. It has two types of notations: lists enclosed in [...] and objects enclosed in {...}. The document from above could look like this:
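Using the same illustrative data as above:

    {
        "name": "Programming 101",
        "students": [
            { "name": "Alice" },
            { "name": "Bob" }
        ],
        "topics": [
            { "name": "Java" },
            { "name": "XML" }
        ]
    }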

Compared to XML, JSON is leaner and more lightweight. The format contains less clutter and there are not as many tools as for XML (e.g. no query or transformation languages).

Parsing Speed Test

Since JSON is a leaner format, I was wondering if a JSON file could be parsed faster than an equivalent XML file. In the end, both formats contain the same data in a hierarchical representation of nodes and elements. So is there any difference?

Test data

I created some simple test data. A class called Course holds a list of Student objects and Topic objects. Each object has some properties, such as a name. Before each test, I create a new Course object filled with random values. I made my tests with 200, 2.000, 20.000, 100.000 and 200.000 students/topics and repeated each test 500 times.
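The classes are plain POJOs; a sketch (getters, setters and most annotations omitted):

    @XmlRootElement
    public class Course {
        public String name;
        public List<Student> students = new ArrayList<>();
        public List<Topic> topics = new ArrayList<>();
    }

    public class Student {
        public String name;
    }

    public class Topic {
        public String name;
    }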

Candidates

Library   Format   Language   Description
JAXB      XML      Java       The official Java Architecture for XML Binding (on Maven central)
GSON      JSON     Java       Developed by Google (on Maven central)
Jackson   JSON     Java       An open-source project (on Maven central)

JSON writes faster, a little

The first result of my tests was that Jackson (JSON) writes data a little bit faster than JAXB (XML) and GSON (JSON). The difference is not that big, though.

JSON reads faster, a lot

More interesting was the fact that both JSON implementations (Jackson and GSON) read data much faster than JAXB.

JSON files are smaller

OK, this one is a no-brainer, but JSON files are (of course) smaller than XML files. In my example, the JSON file was about 68% of the size of the corresponding XML file. However, this highly depends on your data. If the data inside the nodes is large, the overhead of the XML tags matters less than for very small values (e.g. <number>5</number>). The file generated by GSON has, of course, the same size as the file generated by Jackson.

Best regards,
Thomas

Development speed, the Docker remote API and a pattern of frustration

One of the challenges Docker is facing right now is its own development speed. Since its initial release in January 2013, there have been over 7.000 commits (in one year!) by more than 400 contributors. There are more than 1.800 forks on GitHub and Docker brings out approximately one new release per month. Docker is developing at a very fast pace right now and this is really great to see!

However, this very high development speed leaves a lot of third-party tools behind. If you develop a tool for Docker, you have to keep a very high pace. If not, your tool is outdated within a month.

Docker remote API client libraries

A good example of how this development speed affects projects are the remote API client libraries for Docker. Docker offers a JSON API to access Docker in a programmatic way. It enables you, for example, to list all running containers and to stop a specific one. All via JSON and HTTP requests.
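Both are plain HTTP calls; a sketch (assuming the Docker daemon listens on TCP port 4243, which was the conventional port back then):

    # list all running containers
    curl http://localhost:4243/v1.11/containers/json

    # stop a specific container
    curl -X POST http://localhost:4243/v1.11/containers/<CONTAINER ID>/stop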

To use this JSON API in a convenient way, people created bindings for their favorite programming languages. As you can see below, there exist bindings for JavaScript, Ruby, Java and many more. I have used some of them myself and I am really thankful for the great work their developers have done!

But many of those libraries are outdated at the time I am writing this. To be exact: all of them are outdated! The current remote API version of Docker is v1.11 (see here for more) which none of the remote API libraries supports right now. Many of them don’t even support v1.10 or v1.9.

Here is the list of remote API tools as you find it at http://docs.docker.io/reference/api/remote_api_client_libraries/.

Language Name Remote API
Python docker-py v1.9
Ruby docker-api v1.10
JavaScript (NodeJS) dockerode v1.10
JavaScript (NodeJS) docker.io v1.7
JavaScript (Angular) WebUI dockerui v1.8
Java docker-java v1.8
Erlang erldocker v1.4
Go dockerclient v1.10
PHP Docker-PHP v1.9
Scala reactive-docker v1.10

How to deal with rapidly evolving APIs

How to deal with rapidly evolving APIs is a difficult question and IMHO Docker made the right decision. By solely providing a JSON API, Docker chose a modern and universal technique. A JSON API can be used in any language or even in a web browser. JSON (together with a RESTful API) is the state-of-the-art technique to interact with services. Docker even leaves the possibility to fall back to an old API version by adding a version identifier to the request. Well done.

But the decision to stay “universal” (by solely providing a JSON API) also means not getting specific. Getting specific (which means using Docker in a certain programming language) is left to the developers of third party tools. These tools are also evolving rapidly right now, no matter if they are remote API bindings, deployment tools (like Deis.io) or hosting solutions (like CoreOS). This enriches the Docker ecosystem and makes the project even more interesting.

Bad third party tools reflect badly on you

The problem is, even if Docker did a good job (which they did!), outdated or poorly implemented third party tools will reflect badly on Docker, too. If you use a third party library (which you maybe found via the official website) and it works fine, you will be happy with Docker and the third party library. But if the library doesn’t work next month because you updated Docker and the library doesn’t take care of the API version, you will be frustrated about the tool and about Docker.

Pattern of frustration

This pattern of frustration occurs a lot in software development. Bad libraries cause frustration about the tool itself. Let’s take Java as an example. A lot of people complain that Java is verbose, uses class explosions as a pattern and makes things much more complicated than they should be. The famous AbstractSingletonProxyFactoryBean class of the Spring framework is just one example (see +Paul Lewis). Another example is reading a file in Java, which used to be an awful pain:
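A sketch of the classic pre-Java-7 pattern:

    // File -> FileReader -> BufferedReader -> StringBuilder...
    StringBuilder content = new StringBuilder();
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader(new File("file.txt")));
        String line;
        while ((line = reader.readLine()) != null) {
            content.append(line).append("\n");
        }
    } finally {
        if (reader != null) {
            reader.close();
        }
    }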

And even the new NIO API which came with Java 7 is not as easy as it could be:
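A sketch:

    // String -> Path -> static method -> byte[] -> String
    byte[] bytes = Files.readAllBytes(Paths.get("file.txt"));
    String content = new String(bytes, StandardCharsets.UTF_8);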

You need to put a String into a Path to pass it into a static method whose output you need to put into a String again. Great idea! But what about something like this:
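For example what Apache Commons IO offers (FileUtils is from commons-io):

    // one call, done
    String content = FileUtils.readFileToString(new File("file.txt"));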

However, it is not the fault of Java, but of a poorly implemented third party tool. If you need to put a File into a FileReader which you need to put into a BufferedReader to be able to read a file line by line into a StringBuilder, you are using a terrible I/O library! But anyway, you will be frustrated about Java and how verbose it is (and maybe also about the API itself).

This pattern applies to many other things: You are angry about your smartphone, because of a poorly coded app. You are angry about Eclipse because it crashes with a newly installed plugin. And so on…

I hope this pattern of frustration will not apply to Docker and that the community will develop a stable ecosystem of tools to provide a solid basis for development and deployment with Docker. A tool like Docker lives through its ecosystem. If the tools are buggy or outdated, people will be frustrated about Docker – and that would be a shame, because Docker is really great!

Best regards,
Thomas

Java 8 in Eclipse (Kepler and Luna)

Java 8 has been officially available for a couple of days now. It was released on March 18th this year. Yeah! But – Eclipse, the biggest and most popular Java IDE, is a little bit behind schedule (at least IMHO). There is no official Eclipse version for Java 8 right now!

But there are two other things: an up-to-date nightly build of the new Eclipse version Luna, which will be released with Java 8 support, and a patch for the current Eclipse version Kepler! Since it is sometimes (= always) hard to find the correct Eclipse version on eclipse.org or anything useful at all, here is how to do it.

Java 8 in Eclipse Juno (4.2)

If you still use Eclipse Juno (4.2), you use an old version of Eclipse without Java 8 support and I strongly recommend using a new version (as described below). If you cannot do that (or don’t want to), here is an old tutorial from me on how to set up Java 8 in Eclipse Juno: http://tuhrig.de/java-8-in-eclipse-juno.

Java 8 in Eclipse Kepler (4.3)


Kepler (4.3) is the current version of Eclipse. You can download it from http://www.eclipse.org/downloads. After you downloaded it, it will not have Java 8 support right out of the box! To add it, you have to install a feature patch:

  1. Download and install Java 8 (e.g. from Oracle JRE/JDK or OpenJDK)
  2. Add it to Eclipse under Window > Preferences > Java > Installed JREs
  3. Install the feature patch via Help > Install New Software...

You have to restart Eclipse during the installation and accept a license and click several next-buttons. But after you installed the feature patch, you have Java 8 support in Eclipse Kepler (4.3). If you have problems with the installation process, download a new Eclipse Kepler version.


Java 8 in Eclipse Luna (4.4)


Luna (4.4) is the upcoming version of Eclipse. It will be released this summer and it will contain Java 8 support. But you can already download some nightly builds of it which contain Java 8 support.

You can download Eclipse Luna here. Just make sure you download the correct build version, since not all builds have Java 8 support already! When you download the correct version, you can just run Eclipse and use the new Java 8 features (you also need to install Java 8, e.g. from Oracle JRE/JDK or OpenJDK).


By the way, some nightly builds still contain test errors. This is due to the fact that Eclipse Luna is still in development. But to play around with Java 8 or for a small project it should already be good enough.


A simple Java 8 example

To test your Eclipse IDE, here is a very simple Java 8 example:
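A minimal sketch using a lambda expression:

    import java.util.Arrays;
    import java.util.List;

    public class Hello {
        public static void main(String[] args) {
            List<String> names = Arrays.asList("Anna", "Bob", "Charlie");
            // a lambda expression instead of an anonymous inner class
            names.forEach(name -> System.out.println("Hello " + name));
        }
    }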

Troubleshooting

If you have problems with Java 8 in Eclipse check your project settings. If your compliance level is not set to 1.8, you cannot use Java 8. And if you don’t even have the compliance level 1.8 then your Eclipse somehow doesn’t support Java 8 and something went wrong. In this case, get yourself a new Kepler version and install the feature patch or download an appropriate Luna build as described above.



Best regards,
Thomas

Flatten a Docker container or image

Docker containers (and images, respectively) can become fairly large. I recently worked with a Docker image which was over 7 GB in size. However, it is pretty easy to flatten an image at the end.

Difference between save and export

As I described in my last post (http://tuhrig.de/difference-between-save-and-export-in-docker), there are two ways to persist a Docker image or container:

  • A Docker image can be saved to a tarball and loaded back again. This will preserve the history of the image.

  • A Docker container can be exported to a tarball and imported back again. This will not preserve the history of the container.

No history

We can use this mechanism to flatten and shrink a Docker container. If we save an image to the disk, its whole history will be preserved, but if we export a container, its history gets lost and the resulting tarball will be much smaller.

We can see the history of an image by running docker history <IMAGE NAME>, for example:
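    # shows all layers the image consists of
    docker history <IMAGE NAME>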

So if we export a container (either an already running one or one just started from an image), it loses its history and all previous layers. This makes it impossible to roll back to a certain layer, but it also shrinks the image. My image of over 7 GB is now around 3 GB, which saves more than 50% of disk space.

Flatten a Docker container

So it is only possible to “flatten” a Docker container, not an image. Therefore we need to start a container from an image first. Then we can export and import the container in one line:
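    # export the container and import it back as a new, flat image
    # (the image name is just a placeholder)
    docker export <CONTAINER ID> | docker import - my-flat-image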

What else?

You can use some common Linux tricks to shrink Docker images. One simple trick is to clear the cache of the package manager. So depending on which base image you use you can do something like this (for an Ubuntu/Debian system, for more see here):
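    # remove the package manager cache (Ubuntu/Debian)
    apt-get clean
    rm -rf /var/lib/apt/lists/*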


Best regards,
Thomas

Difference between save and export in Docker

I recently played around with Docker, an application container and virtualization technology for Linux. It was pretty cool and I was able to create Docker images and containers within a couple of minutes. Everything was working right out of the box!

At the end of my day I wanted to persist my work. I stumbled over the Docker commands save and export and wondered what their difference is. So I went to StackOverflow and asked a question which was nicely answered by mbarthelemy. Here is what I found out.

How Docker works (in a nutshell)

Docker is based on so-called images. These images are comparable to virtual machine images and contain files, configurations and installed programs. And just like virtual machine images, you can start instances of them. A running instance of an image is called a container. You can make changes to a container (e.g. delete a file), but these changes will not affect the image. However, you can create a new image from a running container (and all its changes) using docker commit <container-id> <image-name>.

Let’s make an example:
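    # start a busybox container and create a new folder inside it
    docker run busybox mkdir /home/test

    # look up the id of the (now stopped) container
    docker ps -a

    # commit the changed container to a new image called busybox-1
    docker commit <CONTAINER ID> busybox-1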

Now we have two different images (busybox and busybox-1) and we have a container made from busybox which also contains the change (the new folder /home/test). Let’s see how we can persist our changes.

Export

Export is used to persist a container (not an image). So we need the container id which we can see like this:
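    docker ps -a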

To export a container we simply do:
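    docker export <CONTAINER ID> > export.tar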

The result is a TAR file which should be around 2.7 MB in size (slightly smaller than the one from save).

Save

Save is used to persist an image (not a container). So we need the image name which we can see like this:
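    docker images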

To save an image we simply do:
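    docker save busybox-1 > save.tar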

The result is a TAR file which should be around 2.8 MB in size (slightly bigger than the one from export).

The difference

Now after we created our TAR-files, let’s see what we have. First of all we clean up a little bit – we remove all containers and images we have right now:
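    # remove all containers, then remove all images
    docker rm $(docker ps -a -q)
    docker rmi $(docker images -q)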

We start with the export we did from the container. We can import it like this:
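    cat export.tar | docker import - busybox-1-export:latest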

We can do the same for the saved image:
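    docker load < save.tar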

So what’s the difference between the two? Well, as we saw, the exported version is slightly smaller. That is because it is flattened, which means it lost its history and meta-data. We can see this with the following command:
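    # show the layers/history of each image
    docker history <IMAGE NAME>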

If we run the command we will see an output like the following. As you can see there, the exported-imported image has lost all of its history whereas the saved-loaded image still has its history and layers. This means that you cannot roll back to a previous layer if you export-import it, while you can still do this if you save-load the whole (complete) image (you can go back to a previous layer by using docker tag <LAYER ID> <IMAGE NAME>).

Best regards,
Thomas

3 ways of installing Oracle XE 11g on Ubuntu

During the last few days I struggled with installing Oracle XE 11g on an Ubuntu VM. And with “the last few days” I mean “the last few weeks”. Sad, but true. Here is what I learned about installing Oracle XE 11g on Ubuntu. But please note that I am neither a Linux specialist nor an Oracle specialist – it was my first try. And as you will see, I try to make my life as easy as possible. But let’s start with the very basics.

About Oracle XE 11g

Oracle Database Express Edition 11g (or just Oracle XE) is the free version of Oracle’s 11g database. It was released in 2011 and is the second free version of Oracle’s database. The first free version was Oracle XE 10g, which was released in 2005. The latest paid version of Oracle’s database is 12c, which was released in 2013.

What is really confusing is the fact that the release dates of the paid versions differ from the release dates of the free versions. So don’t be confused, the latest free database from Oracle is 11g.

Version   Paid   Free
9i        2001   –
10g       2003   2005
11g       2007   2011
12c       2013   –

By the way, the i, g and c in the database names stand for internet, grid and cloud.

You can download Oracle XE from the link below – and this is where the pain begins. First of all you need an Oracle account, but this is free and easy. Then you have the choice between two packages:

  • One package for Windows which only runs on a 32-bit machine as Oracle says. But don’t worry, it also runs on a 64-bit machine (like mine) and although the unzipped installation folder is called DISK1 there is no DISK2 or something. Just run the setup.exe.
  • One package for Linux which is meant to run on a 64-bit machine as Oracle says. And here is the first pain: The package is only available as RPM so you first have to run alien on it to convert it for Ubuntu.
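The conversion with alien looks something like this (the exact file name depends on the version you download):

    # convert the RPM package to a DEB package
    sudo alien --scripts oracle-xe-11.2.0-1.0.x86_64.rpm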

#1 – Installing Oracle XE by hand


My first approach was to install Oracle XE by hand. Although this seems to be the straightforward solution, it was the most painful one. You have to convert the RPM package to a DEP package, create a chkconfig script, create some mystic kernel parameters in a 60-oracle.conf, set your swap space to 2GB or more, create some more folders, install the DEB package, configure the database, export some environment variables like ORACLE_HOME (are you still with me?), reload that changes and then start the Oracle service and connect via SQLPlus. Easy, isn’t it? That’s why Oracle is making so much money with consulting 😀

And how do you know all these steps? Well, use Google, because the best descriptions I found are not from Oracle. To be precise, I found no official description of how to set up the Oracle XE database. Here is what I found on blogs and forums. Just choose whichever suits you best. They are all really good – thanks to the people who wrote them!

#2 – Installing Oracle XE with Vagrant and Puppet

Vagrant is a free tool to create Linux VMs automatically. This means you can run Vagrant with a simple configuration file (called Vagrantfile) which describes a Linux VM (e.g. how many CPUs it should have or which image should be installed). Vagrant will create a VM according to this configuration and start it. This gives you the ability to create the same machine with the exact same configuration over and over again.

Puppet is a tool to orchestrate Linux machines. This means Puppet can automatically install programs, create folders or write files. You can download and use it for free. It perfectly integrates into Vagrant. So you can create a VM with Vagrant and as soon as it is ready it can be orchestrated by Puppet.
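A minimal Vagrantfile with Puppet provisioning looks something like this (the box name and paths are just examples):

    Vagrant.configure("2") do |config|
      config.vm.box = "precise64"            # the base image ("box")
      config.vm.provision :puppet do |puppet|
        puppet.manifests_path = "manifests"  # folder with the Puppet scripts
        puppet.manifest_file  = "site.pp"
      end
    end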

The really great thing about Vagrant and Puppet is that such scripts can be exchanged. This means that you can write a set of scripts for something (e.g. to install Oracle XE) and share it with other people. And this is what these two guys did:

  • Matthew Baldwin wrote a complete Puppet (and Vagrant) setup to install Oracle XE on CentOS (a “RPM-Linux“). You find it here on GitHub.
  • Hilverd Reker wrote a complete Puppet (and Vagrant) setup to install Oracle XE on Ubuntu 12.04 (a “DEB-Linux“). You find it here on GitHub.

Both projects work the same:

  1. Install Vagrant, VirtualBox and Puppet
  2. Checkout the GitHub repository or download it as a ZIP-file
  3. Download the Oracle XE installation files and put them in the folder described in the README.md of the projects
  4. Go to the root folder of the project and run vagrant up. This will download a Linux image, install it to a VM and do all the rest (including installing Oracle XE). Note that this will take some time depending on your internet connection!
  5. Now you can call vagrant ssh and you have a running Linux VM with Oracle XE!
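    # create and provision the VM (downloads the base image on the first run)
    vagrant up

    # log in to the running VM
    vagrant ssh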

Both projects work really well and install an Oracle XE instance in a couple of minutes.

#3 – Installing Oracle XE with Docker


This one is the nicest way to install Oracle XE. Docker is an application container for Linux. It is based on LXC and gives you the ability to package complete applications including their dependencies into a self-contained file (called an image). These images can be exchanged and run on every Linux machine where Docker is installed! Awesome!

Docker images are also shared around the community on https://index.docker.io. And this is where these two guys come into play:

  • Wei-Ming Wu made a Docker image containing Oracle XE. You can find it here.
  • Alexei Ledenev extended Wei-Ming Wu’s image to also use the Oracle web console (APEX). You can find it here.

Both projects work the same:

  1. Install Docker on your Linux machine. You can find instructions at http://docs.docker.io/en/latest/installation/ubuntulinux, but it is nothing more than a single command
  2. Pull the image to your machine
  3. Run the image
  4. That’s it. Absolutely simple. The exact commands are shown below.
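The whole installation looks something like this (shown for Wei-Ming Wu’s image; the install command follows the Docker documentation of that time):

    # 1. install Docker on Ubuntu
    curl -s https://get.docker.io/ubuntu/ | sudo sh

    # 2. pull the image to your machine
    sudo docker pull wnameless/oracle-xe-11g

    # 3. run the image (SSH and the Oracle listener get mapped to local ports)
    sudo docker run -d -p 49160:22 -p 49161:1521 wnameless/oracle-xe-11g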

The great thing is that those images are not as big as a virtual machine. They only contain the actual application and its environment. But they are packed in a way that they can be executed everywhere right out of the box but are still isolated. You can also make changes to the images and create another image containing your changed system.


Best regards,
Thomas

Show next image by clicking images in NextGEN Gallery V2.0.40

Last year I posted two modifications for the WordPress plugin NextGEN Gallery and its Thickbox and Shutter effect (here is the old one for Thickbox and the old one for Shutter). The modification makes it possible to click on the current image to go forward to the next image. To close the slideshow, you can just click somewhere beneath the image. This is the same behavior as on many popular websites such as Facebook.

Unfortunately, a new version of the NextGEN Gallery plugin was released which breaks my fix. The new version V2.0.40 was released in November 2013 and doesn’t work with my old fix any more!

So here is the new fix.

Fix for Thickbox in NextGen Gallery V2.0.40

Adjusting the Thickbox effect in the current version of NextGEN Gallery (V2.0.40) is quite easy. It is nearly the same fix as for the old version (only the markup for the close button is different). Here it is:

Open the file thickbox.js in your WordPress installation. You can do this via FTP or with a WordPress plugin such as wp-FileManager. In both cases you will find the file in ../wordpress/wp-includes/js/thickbox/thickbox.js. You have to make the following two changes to the file:

Around line 132 you will find the following line:

Replace the complete line with the following code:

Now you have to add a new line around line 161. Under the line:

you have to add:

Save the file. That’s it.

Fix for Shutter in NextGen Gallery V2.0.40

Open the file shutter.js in your WordPress installation. You can do this via FTP or with a WordPress plugin such as wp-FileManager. In both cases you will find the file in ../wordpress/wp-content/plugins/nextgen_gallery/products/photocrati_nextgen/modules/lightbox/static/shutter/shutter.js. You have to make the following two changes to the file:

Around line number 143 there should be a line like this:

Replace the complete line with the following code:

Now you have to add a new function just before this code around line 153:

Add this function to the file:

Note that the comma at the end is important! Now the file should look like this:

(Screenshot: the modified shutter.js)

Save the file. That’s it. And by the way, this is the same fix as for the old version!

Download the files

I know that fixing some JavaScript files by hand is ugly, so here are the modified files. Just replace the original ones:

shutter.js (replace ../wordpress/wp-content/plugins/nextgen_gallery/products/photocrati_nextgen/modules/lightbox/static/shutter/shutter.js)

thickbox.js (replace ../wordpress/wp-includes/js/thickbox/thickbox.js)

Best regards,
Thomas

Install OpinionTrends with nginx and memcached

OpinionTrends is built with Python and Flask. Therefore you can run it without any additional server right out of the box. Batteries included! However, it is much more common and much more efficient to run it with a web server and an application server. A widely used combination for Python web applications is nginx together with uWSGI. In this tutorial I want to show how to set up these two tools to run TechTrends and OpinionTrends. The tutorial starts at the very beginning and the only thing I assume is that you run a Linux machine with Python and easy_install.

Note: OpinionTrends is the new version of TechTrends. However, it is just a code name. So when I talk about OpinionTrends or TechTrends I mean the same thing.

OpinionTrends

Base folder

The first thing we have to do is to create the base directory for TechTrends. By convention a web application is located in /var/www. All of our code will go into this folder.
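    # create the base directory
    sudo mkdir -p /var/www/techtrends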

Clone & Update

Now we can start to set-up the application from scratch. OpinionTrends is deployed with git. So we just check-out the code and switch to the new opinion branch:
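Something like this (the repository URL is omitted here):

    git clone <REPOSITORY URL> /var/www/techtrends
    cd /var/www/techtrends
    git checkout opinion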

After the first checkout an update is super easy. Just do a pull for the latest code:
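    cd /var/www/techtrends
    git pull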

Configuration

Now we have to make some simple settings for TechTrends. To do this, we create a new file called config.py in the folder Configuration in our checked-out project. There is also a file called config_template.py in this folder which is an empty but well documented template for the configuration. The file contains all individual settings for the application. It should look like this:
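Since the real settings are documented in config_template.py, here is only a hypothetical sketch:

    # /var/www/techtrends/Configuration/config.py
    # (made-up values; see config_template.py for the real options)
    DEBUG = False                    # True disables memcached (see below)
    MEMCACHED_HOST = 'localhost'     # hypothetical setting
    MEMCACHED_PORT = 11211           # hypothetical setting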

Dependencies

Now we have to install the libraries needed by TechTrends. To do this we use easy_install. All dependencies are in a file called requirements.txt in the root folder of TechTrends. We have to install all dependencies in this file:
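Since easy_install has no built-in support for requirements files, a simple loop does the trick (a sketch):

    for dependency in $(cat requirements.txt); do
        sudo easy_install $dependency
    done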

Note: As you see, the installation of the dependencies is very easy in theory. However, some dependencies such as scikit-learn are sometimes hard to install since they are not pure Python and use some C bindings. If you have problems installing the whole requirements.txt at once, try to install every dependency manually on its own.

nginx


Install

Installing nginx is not a big thing:
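    sudo apt-get install nginx
    sudo /etc/init.d/nginx start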

After we have installed and started nginx, we can verify that it is running correctly by pointing our favorite browser at our machine.

By the way, stopping nginx is also very simple:
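    sudo /etc/init.d/nginx stop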

Configuration

Now we have to configure nginx to point to TechTrends instead of the default welcome page. To do this we first remove the default configuration:
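    sudo rm /etc/nginx/sites-enabled/default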

Now we create our own configuration in /var/www/techtrends/nginx.conf. It should look like this:
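A minimal sketch (the socket address is an arbitrary choice – it only has to match the uwsgi.ini below):

    server {
        listen      80;
        server_name localhost;

        location / {
            include    uwsgi_params;
            uwsgi_pass 127.0.0.1:8080;  # must match the socket in uwsgi.ini
        }
    }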

We link this file into the nginx configuration and restart nginx:
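    sudo ln -s /var/www/techtrends/nginx.conf /etc/nginx/sites-enabled/techtrends
    sudo /etc/init.d/nginx restart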

Now we should get a Bad Gateway exception. Perfect! This tells us that nginx found our configuration and that everything looks good – except for the missing uWSGI!

uWSGI


Install

uWSGI is the application server that sits between our Python application and nginx; the uwsgi protocol is their way of communication. First we have to install it:
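    sudo easy_install uwsgi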

Configuration

Now we create a configuration file in /var/www/techtrends/uwsgi.ini. It should look like this:
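A minimal sketch (module and callable are assumptions – they depend on how the Flask application object is exposed in the code):

    [uwsgi]
    socket   = 127.0.0.1:8080   # must match the uwsgi_pass in nginx.conf
    chdir    = /var/www/techtrends
    module   = server           # hypothetical: the module containing the app
    callable = app              # hypothetical: the Flask application object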

Now we can start uWSGI as a daemon in the background:
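    uwsgi --ini /var/www/techtrends/uwsgi.ini --daemonize /var/log/uwsgi.log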

Done! Nginx is serving TechTrends now. However, we should get an exception again since memcached is still missing. If we want to use TechTrends without memcached, we have to change a value in the config.py in the Configuration folder in the TechTrends base directory: we have to set DEBUG = True to not use memcached.

memcached


One of the biggest performance improvements you can do (I guess in general, but especially for TechTrends) is to use memcached. Memcached is a key-value in-memory store to cache frequently requested data. In TechTrends we use it to store whole pages and JSON responses from our API.

Install

Install memcached first:
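    sudo apt-get install memcached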

And that’s it! Memcached is installed and running now. You can restart memcached (e.g. to clear it) like this:
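    sudo /etc/init.d/memcached restart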

Congratulations

Congratulations! TechTrends should run in a stable production mode now. We installed nginx, uWSGI and memcached and configured them to work together. Great! But – there are still a few open points left.

crontab

TechTrends/OpinionTrends has two regularly scheduled jobs. One job is the crawler, which crawls posts from Reddit and Hackernews, and the other job is the training and restart of the application. To execute these jobs we set up a crontab. First we create two files which execute the jobs. The first file is called crawl.sh and could look something like this:
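A hypothetical sketch (how the crawler is actually triggered depends on the project):

    #!/bin/sh
    # crawl.sh - trigger the crawler
    cd /var/www/techtrends
    python crawler.py    # hypothetical entry point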

The second file is called restart.sh and could look something like this:
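Again a hypothetical sketch:

    #!/bin/sh
    # restart.sh - re-train and restart the application
    cd /var/www/techtrends
    python train.py      # hypothetical training entry point
    killall uwsgi        # stop the old uWSGI workers
    uwsgi --ini /var/www/techtrends/uwsgi.ini --daemonize /var/log/uwsgi.log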

Both files should be in the root folder (/var/www/techtrends/) of TechTrends. Now we add those two files to our local crontab. We can edit it like this:
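    crontab -e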

It should look like this:
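    # run the crawler every 30 minutes
    */30 * * * * /var/www/techtrends/crawl.sh

    # re-train and restart the application every day at 3 o'clock
    0 3 * * * /var/www/techtrends/restart.sh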

This crontab will run the crawler every 30 minutes, and once a day at 3 o’clock it will trigger the training and restart of the whole application. So the data will grow every 30 minutes and will be indexed once a day. That’s it.

Best regards,
Thomas