Java vs. JavaScript Build Tools

When you come from a Java background like I do and you take a look at JavaScript build tools, the sheer mass of tools is overwhelming.The eco system is evolving very fast with new tools coming up every couple of months. Below, I tried to give a comparison between Java build tools and their equivalent in the JavaScript world.

Preface

The list below aims to show similarities – not differences. Of course, every tool has its own set of features and individual strengths. The tools on the left are not the very as the tools on right, just for another software stack. However, to see some similarities might help to better understand how the JavaScript world works.

Runtime

java nodejs
  • Platform where all tools will run on
  • Executes scripts, tests, etc.

Task Runner

gradleant gruntgulp
  • Executes scripts and orchestrates tasks
  • Copy files, uglify, minify, package…

Package Manager

maven bower300
  • Dependency Management
  • Versioning

Test Runner

junit karma
  • Tests
  • Test Suits

Mocking/Assertion Library

bildschirmfoto-2016-11-14-um-09-17-07 Jasmine
  • Assertions and expectations (assertThat(1 + 1, is(2)))
  • Mocks and stups

Best regards,
Thomas

DeployMan (command line tool to deploy Docker images to AWS)

DeployMan

2014-07-29 11_34_11-Java EE - Eclipse

Yesterday, I published a tool called DeployMan on GitHub. DeployMan is a command line tool to deploy Docker images to AWS and was the software prototype for my master thesis. I wrote my thesis at Informatica in Stuttgart-Weilimdorf, so first of all, I want to say thank you to Thomas Kasemir for the opportunity to put this online!

Disclaimer

At the time I am writing this post, DeployMan is a pure prototype. It was created for academic research and as a demo for my thesis. It is not ready ready for production. If you need a solid tool to deploy Docker images (to AWS), have a look at Puppet, CloudFormation (for AWS), Terraform, Vagrant, fig (for Docker) or any other orchestration tool that came up in the last couple of years.

What DeployMan does

DeployMan can create new AWS EC2 instances and deploy a predefined stack of Docker images on it. To do so, DeployMan takes a configuration file called a formation. A formation specifies how the EC2 machine should look like and which Docker images (and which configurations) should be deployed. Docker images can either be deployed from a Docker registry (the public one or a private) or a tarballs from a S3 storage. Together with each image, a configuration folder will pulled from a S3 storage and mounted to the running container.

Here is an example of a formation which deploys a Nginx server with a static HTML page:

Interfaces

DeployMan provides a command line interface to start instances and do some basic monitoring of the deployment process. Here is a screenshot which shows some formations (which can be started) and the output of a started Logstash server:

Run_Logstash_Server

To keep track of the deployment process in a more pleasant way, DeployMan has a web interface. The web interface shows details to machines, such as the deployed images and which containers are running. Here is how a Logstash server would look like:

Machine_Details

The project

GitHub-Mark

You can find the project on GitHub at https://github.com/tuhrig/DeployMan. I wrote a detailed README.md which explains how to build and use DeployMan. To test DeployMan, you need an AWS account (there are also free accounts).

The project is made with Java 8, Maven, the AWS Java API, the Docker Java API and a lot of small stuff like Apache Commons. The web interface is based on Spark (for the server), Google’s AngularJS and Twitter’s Bootstrap CSS.

Best regards,
Thomas

Show next image by clicking images in NextGEN Gallery V2.0.40

Last year I posted two modifications for the WordPress plugin NextGen Gallery and its Thickbox and Shutter effect (here is the old one for Thickbox and the old one for Shutter). The modification makes it possible to click on the current image to go forward to the next image. To close the slideshow you can just click somewhere beneath the image. This is the same behavior as on many popular website such as Facebook.

Unfortunately, NextGen Gallery released a new version of its WordPress plugin which breaks my fix. The new version of NextGen Gallery V2.0.40 was released in November 2013 and doesn’t work with my fix any more!

So here is the new fix.

Fix for Thickbox in NextGen Gallery V2.0.40

To adjust the Thickbox effect in the current version of NextGen Gallery (V2.0.40) is quite easy. It is nearly the same fix as for the old version (only the markup for the close button is different). Here it is:

Open the file thickbox.js in our WordPress installation. You can do this via FTP or with a WordPress plugin such as wp-FileManager. In both cases you will find the file in ../wordpress/wp-includes/js/thickbox/thickbox.js. You have to do the following two changes to the file:

Around line 132 you will find the following line:

Replace the complete line with the following code:

Now you have to add a new line around line 161. Under the line:

you have to add:

Save the file. That’s it.

Fix for Shutter in NextGen Gallery V2.0.40

Open the file shutter.js in our WordPress installation. You can do this via FTP or with a WordPress plugin such as wp-FileManager. In both cases you will find the file in ../wordpress/wp-content/plugins/nextgen_gallery/products/photocrati_nextgen/modules/lightbox/static/shutter/shutter.js. You have to do the following two changes to the file:

Around line number 143 there should be a line like this:

Replace the complete line with the following code:

Now you have to add a new function just before this code around line 153:

Add this function to the file:

Note that the comma at the end is important! Now the file should look like this:

NextGEN-Shutter-Mod

Save the file. That’s it. And by the way, this is the same fix as for the old version!

Download the files

I know that fixing some JavaScript files is ugly, so here are the modified files. Just replace the original one:

shutter.js (replace ../wordpress/wp-content/plugins/nextgen_gallery/products/photocrati_nextgen/modules/lightbox/static/shutter/shutter.js)

thickbox.js (replace ../wordpress/wp-includes/js/thickbox/thickbox.js)

Best regards,
Thomas

Install OpinionTrends with nginx and memcached

OpinionTrends is build with Python and Flask. Therefore you can run it without any additional server right out of the box. Batteries included! However, it is much more common and much more efficient if you run it with a web server and an application server. A widely used combination for Python web applications is nginx together with uWSGI. In this tutorial I want to show how we set up this two tools to run TechTrends and OpinionTrends. The tutorial starts at the very beginning and the only thing I assume is that you run a Linux machine with Python and easy_install.

Note: OpinionTrends is the new version of TechTrends. However, it is just a code name. So when I talk about OpinionTrends or TechTrends I mean the same thing.

OpinionTrends

Base folder

The first thing we have to do is to create the base directory for TechTrends. By convention a web application is located in /var/www. All of our code will go into this folder.

Clone & Update

Now we can start to set-up the application from scratch. OpinionTrends is deployed with git. So we just check-out the code and switch to the new opinion branch:

After the first checkout an update is super easy. Just do a pull for the latest code:

Configuration

Now we have to make some simple settings for TechTrends. To do this, we create a new file called config.py in the folder Configuration in our checked-out project. There is also a file called config_template.py in this folder which is an empty but well documented template for the configuration. The file contains all individual settings for the application. It should look like this:

Dependencies

Now we have to install the libraries needed by TechTrends. To do this we use easy_install. All dependencies are in a file called requirements.txt in the root folder of TechTrends. We have to install all dependencies in this file:

Note: As you see, the installation of the dependencies is very easy in theory. However, some dependencies such as scikit-learn are sometimes hard to install since they are not pure Python and use come C bindings. If you have problems installing the whole requirements.txt at once try to install every dependency manually on its own.

nginx

800px-Nginx_Logo.svg

Install

Installing nginx is not a big thing:

After we installed and started nginx we can verify if it is running correctly by taking our favorite browser and surf to our machine.

By the way, stopping nginx is also very simple:

Configuration

Now we have to configure ngix to point to TechTrends instead to the default welcome page. To do this we first remove the default configuration:

Now we create our own configuration in /var/www/techtrends/nginx.conf. It should look like this:

We link this file to nginx a restart it:

Now we should get a Bad Gateway exception. Perfect! This tells use that nginx found our configuration and that every things looks good – except of the missing uWSGI!

uWSGI

logo_uWSGI

Install

uWGSI is the protocol between our Python application and nginx. It is their way of communication. First we have to install it:

Configuration

Now we create a configuration file in /var/www/techtrends/uwsgi.ini. It should look like this:

Now we can start uWSGI as a daemon in the background:

Done! Nginx is serving TechTrends now. However, we should get an exception again since memcached is still missing. If we want to use TechTrends without memcache we have to change a value in the config.py in the Configuration folder in TechTrends base directory. We have to set DEBUG = True to not use memcache.

memcached

memcached_banner75

One of the biggest performance improvements you can do (I guess in general, but especially for TechTrends) is to use memcached. Memcached is a key-value in-memory store to cache frequently requested data. In TechTrends we use it to store whole pages and JSON responses from our API.

Install

Install memcached first:

And that’s it! Memcached is installed and running now. You can restart memcached (e.g. to clear it) like this:

Congratulations

Congratulations! TechTrends should run now in a stable production mode. We installed nginx, uWSGI and memcached. We also configured it to work together. Great! But – there are still some open points we have to do.

crontab

TechTrends/OpinionTrends has two regularly scheduled jobs. One job is the crawler which crawls posts from Reddit and Hackernews and the other job is the training and restart of the application. To execute this jobs we set-up a cron tab. First we create two files which execute these jobs. The first file is called crawl.sh and looks like this:

The second file is called restart.sh and looks like this:

Both files should be in the root folder (/var/www/techtrends/) of TechTrends. Now we add those two files to our locale crontab. We can edit the it like this:

It should look like this:

This crontab will run the crawler every 30 minutes and once a day at 3 o’clock it will trigger the training and restart of the whole application. So the data will growth every 30 minutes and will be indexed once a day. That’s it.

Best regards,
Thomas

Continuous Integration Development Workflow

This semester I attended an university course called System Engineering and Management taught by Prof. Walter Kriha. The course has a slightly different topic every year and is made up of presentations from students, research assistants and other lecturers. The topic in this year was continuous integration and software deployment. Together with Jan Müller, I set up a continuous integration environment to develop a Python web app.

Our demo app

We made a simple demo app to demonstrate your workflow and to show the purpose of the components in our CI environment. We build our app with Python and Flask on the server, a simple template rendering for the client and a MongoDB at the back-end. The app itself was a simple CRUD application to add, update and delete users from a list.

The CI environment

ci_env_2

GitHub

We used GitHub as our central repository and to communicate with the rest of the world. The idea (for a real world team) was to use GitHub as a repository, as a wiki, as an issue tracker and as a community tool, e.g. for pull requests. However, we did not push any code directly to GitHub! Here is why:

GerritHub

We used GerritHub as a review tool and as our central repository for all development tasks. To do this, GerritHub cloned the original repository from GitHub and automatically synchronized both of them. As a developer we checked out code from GerritHub (not from GitHub!) and also pushed our changes to GerritHub (and, guess what, not to GitHub directly!). After a new change was pushed to GerritHub, all developers of the project received an email notification about the change. The change itself had to be reviewed and approved in GerritHub by at least one other developer. Only after a change was reviewed and approved it went into our code base on GitHub (GerritHub merged it into the code based). This enforced a certain policy in our development process where every piece of code must be reviewed by somebody else.

BuildBot

After the code was pushed to GerritHub, approved there and merged into GitHub, it was time for our build system. We set up a so called post commit hook in GitHub, which informed our build system about code changes. So each time new code arrived from GerritHub in our GitHub repository our build system started a new build. Since our app was made in Python we decided to use BuildBot (which itself is made in Python) as our build system. However, what we did is possible with any other build system, too. BuildBot did a set of tasks for us:

  1. Check out the latest code from GitHub
  2. Execute all unit tests
  3. Run static code analysis such as PEP8
  4. Build an artifact to download
  5. Deploy the app to our demo machine

Demo machine

We set up a demo machine where our BuildBot deployed ever (successful) build. With this automated deployment we always had an up-to-date and running instance of our app. This is great as a team since everybody of the team can see the latest features and show them to potential customers.

Demo videos

I made a short (~10 minutes) video to demonstrate the development workflow in this environment. The video covers a full development cycle of a new feature from the first check-out of the project to the final result on the demo machine. I made a voice-over in English and in German, both should contain the same (more or less). The video is cut to make it as short as possible, so you won’t have to wait for my Eclipse to start…

What I do in the video is the following:

  1. Check out the project and run make sure it is in a stable state
  2. Implement a new feature in a test-driven way
  3. Push the new feature
  4. Watch BuildBot doing its work
  5. Enjoy the final result on our demo machine

Demo Video (English): CI Development Workflow

Video: http://www.youtube.com/watch?v=ZYiwhvRDRak



Demo Video (German): CI Development Workflow

Video: http://www.youtube.com/watch?v=tza03NG1xwo



Best regards,
Thomas

Opinion Mining on Hackernews and Reddit

TechTrends

Last semester two of my friends and I made some sort of a search engine for Hackernews and Reddit. The idea was to collect all articles published on those two platforms and search them for trends. It should be possible to type-in a certain keyword such as “Bitcoin” and retrieve a trend chart showing when and how many articles have been published to “Bitcoin”.

The result was TechTrends. Based on Radim Řehůřek’s Gensim (a Python framework for text classification) we build a web application which crawls Hackernews and Reddit continuously and offers a simple web interface to search trends in these posts. You can find more posts about TechTrends here.

OpinionTrends

This winter semester I started to implement a new feature for TechTrends. I wanted to build an opinion mining and sentiment analysis for all posts. Based on the comments for each post on Hackernews and Reddit I wanted to classify all posts in one of three categories:

  • Positive, which means that most of the comments are praising the article.
  • Negative, which means that most of the comments are criticizing the article.
  • Neutral, which means that there is a balance between positive and negative comments.

I gave it the code name OpinionTrends.

The basic idea

The basic idea was to train a supervised classifier to categorize each comment and therefore each post. This should work similar to a spam filter in an email application: Each email marked as ‘spam’ will train a classifier which can categorize emails on its own in good or bad (which actually means spam). I wanted to do the same but with comments instead of emails and with three instead of two categories.

Training

A classifier is only as good as its training data. In case of a spam filter the training data are emails marked as ‘spam’ by the user. This makes the training data very good and very individual to each user. In my case I decided to use Amazon product reviews.

Amazon Product Reviews

Amazon product reviews are a great way to retrieve training data. They are marked with stars in 5 categories, they are available in many languages and for many domains and you can crawl them very easily. The only thing you have to do is to sign up a free developer account on Amazon and get started with your favorite language (there are libraries for most common languages out there).

Once the classifier is trained, it can be saved to a file and used further on. It doesn’t need to be updated anymore. However, the performance of the application depends completely on the classifier. Therefore it should be trained and tested carefully.

Validation

I tested different classifiers and different pre-processing steps of the Amazon Reviews. Below you can see a comparison between a Bayes Classifier and a SVM. The SVM beats the Bayes Classifier by 10% or more. However, its performance also depends dramatically on the pre-processing of the raw reviews.

Type of classfier No pre-processing Bi-grams Only adjectives No stop words
Bayes Classifier 71% accuracy 65% accuracy 70% accuracy 72% accuracy
SVM 85% accuracy 86% accuracy 78% accuracy 84% accuracy

Problems with validation

All tests were made with a 10-fold cross-validation. The only problem with those tests is, that I trained and tested with reviews from Amazon, but my final data were comments on blog posts which is not the exact same domain. Since Hackernews and Reddit are both about computer science I used reviews from SSDs, Microsoft and Apple software or computer games to be as close as possible to my final domain. However, I can’t really validate my final results. This has two reasons:

  • I don’t have a huge number of tagged blog posts and comments to compare them with the results of the classifier.
  • Comments are very subjective. In many cases you can not decide for sure whether a comment is positive or negative. Some few comments are very clear and easy (I hate your article.), but a lot of comments are something in between (I love your website but I hate the color.). Even I as a human beeing can not decide if they are positive or negative and if I could decide it my friend would argue with me.

My final result

OpinionTrends is in its last steps since a couple of days. Next week will present it at the Media Night of my university (a fairy for student projects). You can read more about it here.

OpinionTrends is also online and in some kind of a stable-state. However, it is still under development: http://opiniontrends.mi.hdm-stuttgart.de/

This is how it works

OpinionTrends works the same as TechTrends. You go to the website, type-in a keyword and get the results. The only really new thing is a brand new tab on the result page.

op_trends_new_tab

When you click on it you will see a new chart similar to the blue one on the Timeline tab. The chart has three colors and is very easy to read. The green bars represent the positive articles, the red bars represent the negative articles and the light yellow bars represent the neutral articles. The neutral articles are visualized on the positive scale and on the negative scale with same amount.

op_trends_nsa_chart

Above you see the result for NSA, which is actually a very good example since the overwhelming opinion about the NSA is very negative which you can see perfectly in the chart.

You can click on each bar to see a pop-up showing the articles behind the bar. You can jump directly to the article or open the discussion with all comments on this article.

op_trends_pop_up

Examples

Here are some good examples to show you how OpinionTrends works. The best one I found so fare is a search for NSA. The opinion is very negative as everyone would expect.

op_trends_nsa_chart

The opinion on Git is much more balanced. It has not only a nearly equal number of positive and negative articles, it has also a lager number of neutral posts.

op_trend_git

The opinion on Python is much better. Is has a lot of neutral posts, but beside them, Python has fare more positive than negative posts.

op_trend_python

More…

OpinionTrends has some more features such as individual settings so adjust each search. However, I think this is too much for this post. You can get a lot more information directly on the project site. TechTrends/OpinionTrends is also open source, so you can checkout the source code from BitBucket. OpinionTrends is in its own branch!

I hope you enjoy it and I would be really happy about some feedback.

Best regards,
Thomas

Show next image by clicking images in NextGEN Gallery with shutter effect


Here‘s the same for NextGEN + Thickbox!


Last year I posted an article about modifying NextGEN Gallery (with Thickbox) to show the next image when you click on the current one. This behaviour is well known from Facebook an co. To go to the next picture, just click on the current one. To close the slideshow, just click beside the picture. Easy. You can find the article here.

Since I used the Thickbox effect in my modification, a lot of people asked me if it is possbile with the Shutter effect, too. Of course it is!

Here’s what you have to do to modify NextGEN Gallery with the Shutter effect to display the next image by clicking on the current one. I tested it with WordPress 3.7.1 und NextGen 2.0.33.

1 – Open the file wp-content/plugins/nextgen_gallery/products/photocrati_nextgen/modules/lightbox/static/shutter/shutter.js

2 – Arround line number 143 there should be a line like this:

You have to replace this line with the following one:

3 – Now you have to add a new function just before

Here’s the function you have to add:

Now the file should look like this:

NextGEN-Shutter-Mod

That’s it! I hope it works for everyone.

Best regards,
Thomas

First steps with Ractive.js

During the summer I had a little bit of free time. I had no university courses, almost no on-going projects and was only working for two days a week. So I decided to build a small web app.

Since I was couriosly waiting for some grades to arrive in the online portal of my university, I decided to build a (mobile) app to check this.

I decided to build the app with Python on the server and some sort of JavaScript framework on the client. Many of my fellow students were talking about AngularJS, but I wasn’t sure if I should take such a big framework for my first steps (I never used a framework like this in JavaScript before). Then I stumbeled over this link on HN. It was about an Angluar-like framework called Ractive.js.

Ractive.js

Ractive.js provides two main features (in my opinion):

  1. It provides a two-way data-binding. This means that you can hand-in a JSON object and directly bind its attributes to HTML elements. So you don’t have to update certain HTML elements manually (e.g. via JQuery).
  2. It provides a template rendering. This means you can write HTML templates (as simple *.html files for example) with mustach-syntax and render them on the client via JavaScript. To develop single-page applications gets very easy with this.

The server part

First of all, I did the server part. I used a combination of twill and Beautiful Soup to parse the web portal of my university. This was (and still is) a little bit slow, but at least no big deal. Then I build a small API to access these via AJAX. I used this API later on to update my page. E.g. the call http://127.0.0.1:5000/grades would return a JSON containing all of your grads:

I have build similar APIs for your personal schedule, for our canteen and some more.

The client part

For the client part, I used Pure (CSS framework) and (of course) Ractive.js. This made my templates really short and gave me a good experience on both, classical browsers and mobil. Here’s the full template to render the grade-table:

I simply loaded the template (which is actually a *.html file) in my main-template. Afterwards I have it in a variable called gradesTemplate in this example:

Then I am able to render it, e.g. by adding an appropriate click-listener to a link. The listener will fetch some data via AJAX and render the templated (loaded before) with it:

I’m not completly happy with using JQuery, but to put it all together was quite easy.

The result

Here are some screenshot of my final version (it’s not running online and you would need an account of my university to use it, so some screenshot are the best I could provide).

The code

You can find the code on BitBucket:

Best regards,
Thomas Uhrig

TechTrends – Searching trends on HN and Reddit

It’s done! Last Friday (26th July 2013) was the final presentation of our semester project TechTrends. Together with Raphael Brand and Hannes Pernpeintner I worked the last 5 months on this project – and we are really happy about it.

What is TechTrends?

TechTrends is basically a search-engine for HN and Reddit (just the programming sub-reddit). You can type in a key-word and you will find a bunch of similar articles. But our main concern was not to find just articles, but also to find trends. Therefor, you will not only get a pure list of articles, you will get a chart showing when articles have been found for your query. For each article we calculate a popularity, indicating how important that article is. Based on these popularities, we draw the trend-chart for your search.


reddit_logo

hn_logo

How does it work?

TechTrends has six major parts (see the graphic below). First of all, we crawl HN and Reddit all 15 minutes to get the latest links. In the second part we get the actual content form each link and store it in our database. Then we do a preprocessing on this pure text content to remove stop-words, digits and so on. After that, we use the great Gensim Server from Radim Řehůřek to build an index of all documents. The next (and last part on the server) is a JSON-based web API to access all of our data (here is its documentation). On top of these API we built our user-interface – the actual website.

techtrends-components

Presentation

Here is the video of our final presentation about TechTrends in our university (on the 26the July 2013). Our presentation is about 70 minutes long and we explain a lot of details about our project. The video is available on http://events.mi.hdm-stuttgart.de/2013-07-25-vortr%C3%A4ge-programming-intelligent-applications#techtrends or below. It also contains the presentations of the other groups, but we are the first on the video.You can find the slides under the video.

Video

Thanks to Stephan Soller and his team, our presentation has been recorded on video. You can see the video below or on http://events.mi.hdm-stuttgart.de/.

Slides

Here are the slides on my speaker deck account.

What’s next?

We have a lot of ideas for the future of TechTrends. We are thinking for example about a mobile version or a reporting tool. But in my oppinion, the most important step is to make TechTrends more easy to customize. Currently, we are focused on HN and Reddit. However, everything but the actual crawlers is independent of the underlying source. With a little bit of work, you can easily implement a new crawler for the data source of your choice. Making this customization more comfortable and easy is our next goal for project.

More

There is plenty more! Here you go:

Best regards,
Thomas Uhrig

TechTrends Final Presentation

Tomorrow morning is the final presentation of our semester project TechTrends. I posted several articles about TechTrends (see here) and I will definetely post one more after tomorrow. But for now, here’s our final presentation.



The presentation shows how we built TechTrends and covers different aspects of the development process. It talks about crawling Hackernews and Reddit, preprocessing and learning a model to query. We also describe problems, further ideas and many more. The presentation will take about 60 to 70 minutes (and everybody is still welcome tomorrow).

The presentation will also be streamed live on http://events.mi.hdm-stuttgart.de.

Best regards,
Thomas Uhrig