Hello Berlin!


The last months have been an interesting and busy time for me (that’s why I didn’t write as much as I had planned, but I already have some new drafts in my WordPress!). After I finished my master’s thesis at Informatica in Stuttgart (greetings to the whole team!), I moved to Berlin for my first “real” job. I started at a company called Westernacher Products and Services, which builds web applications and does consulting for legal institutions in Germany. Based on an up-to-date technology stack with Spring, Gradle, ExtJS and AngularJS, Westernacher implements document management and all kinds of business applications. As part of the development team, I am looking forward to a bunch of interesting projects and a lot of new stuff to learn. Nice to be here!

Best regards,
Thomas

Presentation of my master’s thesis

Over the last six months, I wrote my master’s thesis about porting an enterprise OSGi application to a PaaS. Last Monday, the 21st of July 2014, I presented the main results of my thesis to my professor (best greetings to you, Mr. Goik!) and to my colleagues (thanks to all of you!) at Informatica in Stuttgart-Weilimdorf, Germany (where I had written my thesis based on one of their product information management applications, called Informatica PIM).

Here are the slides of my presentation.

While my master’s thesis also covers topics like OSGi, VMs and JEE application servers, the presentation focuses on my final solution: a deployment process for the cloud. Based on Docker, the complete software stack used for the Informatica PIM server was packaged into separate, self-contained images. Those images were stored in a repository and used to automatically set up cloud instances on Amazon Web Services (AWS).
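The thesis itself isn’t reproduced here, but the core of such a setup can be sketched: a cloud instance boots with a user-data script that pulls the prepared Docker images and starts one container each. The function below only assembles such a script; the registry URL and image names are made-up placeholders, not the ones from the actual thesis.

```python
def build_user_data(registry, images):
    """Assemble a user-data shell script that pulls the given Docker
    images from a registry and starts one container per image.
    Registry and image names are hypothetical placeholders."""
    lines = ["#!/bin/bash", "set -e"]
    for image in images:
        full_name = "%s/%s" % (registry, image)
        lines.append("docker pull %s" % full_name)
        lines.append("docker run -d %s" % full_name)
    return "\n".join(lines)

# A script like this would be passed as user-data when launching
# the cloud instance, e.g. via the AWS API.
script = build_user_data("registry.example.com", ["pim-server", "pim-db"])
print(script)
```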

The presentation gives answers to the following questions:

  • What is cloud computing and what is AWS?
  • What are containers and what is Docker?
  • How can we deploy containers?

To automate the deployment process of Docker images, I implemented my own little tool called DeployMan. It shows up at the end of my slides and I will write about it here in a couple of days. Although there are a lot of tools out there to automate Docker deployments (e.g. fig or Maestro), I wanted to do my own experiments and create a prototype for my thesis.

Enjoy!

Best regards,
Thomas

Media Night Apps

This year there is something new at the Media Night, the student fair of our university.

The booklet is dead!

In the past, the Media Night had a printed booklet containing its program and time table. The booklet was produced by one of our printing faculties, for which the HdM is pretty famous, and was distributed for free during the evening of the fair.

old_bookelt

Long live the booklet!

However, as of this semester the good old booklet is history! Instead, the Media Night has a brand new app with all projects, all students and even indoor navigation. You can find it in Apple’s App Store and Google’s Play Store.

Most innovative projects

But why have only one app if you could have two? In case you have an Apple iPad or iPhone, you can get an app about the 10 most innovative projects of this Media Night. And – TechTrends is one of them!

innovative_app

I hope you enjoy the apps, the projects and the evening.

Best regards from the Media Night,
Thomas

Continuous Integration Development Workflow

This semester I attended a university course called System Engineering and Management, taught by Prof. Walter Kriha. The course has a slightly different topic every year and is made up of presentations from students, research assistants and other lecturers. This year’s topic was continuous integration and software deployment. Together with Jan Müller, I set up a continuous integration environment to develop a Python web app.

Our demo app

We made a simple demo app to demonstrate our workflow and to show the purpose of each component in our CI environment. We built our app with Python and Flask on the server, simple template rendering for the client and MongoDB as the back-end. The app itself was a simple CRUD application to add, update and delete users from a list.
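The CRUD core of such a demo app can be sketched like this. The real app served these operations through Flask routes backed by MongoDB; in this stdlib-only sketch an in-memory dictionary stands in for the database, and the class and method names are illustrative, not the ones we actually used.

```python
class UserStore:
    """In-memory stand-in for the MongoDB-backed user list."""

    def __init__(self):
        self._users = {}
        self._next_id = 1

    def add(self, name):
        """Create a user and return it with its generated id."""
        user = {"id": self._next_id, "name": name}
        self._users[self._next_id] = user
        self._next_id += 1
        return user

    def update(self, user_id, name):
        """Rename an existing user."""
        self._users[user_id]["name"] = name
        return self._users[user_id]

    def delete(self, user_id):
        """Remove a user from the list."""
        del self._users[user_id]

    def all(self):
        """Return all users, e.g. to render the list template."""
        return list(self._users.values())
```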

The CI environment

ci_env_2

GitHub

We used GitHub as our central repository and to communicate with the rest of the world. The idea (for a real-world team) was to use GitHub as a repository, a wiki, an issue tracker and a community tool, e.g. for pull requests. However, we did not push any code directly to GitHub! Here is why:

GerritHub

We used GerritHub as a review tool and as our central repository for all development tasks. To do this, GerritHub cloned the original repository from GitHub and automatically synchronized both of them. As developers, we checked out code from GerritHub (not from GitHub!) and also pushed our changes to GerritHub (and, guess what, not to GitHub directly!). After a new change was pushed to GerritHub, all developers of the project received an email notification about the change. The change itself had to be reviewed and approved in GerritHub by at least one other developer. Only after a change was reviewed and approved did it go into our code base on GitHub (GerritHub merged it into the code base). This enforced a policy in our development process: every piece of code must be reviewed by somebody else.

BuildBot

After the code was pushed to GerritHub, approved there and merged into GitHub, it was time for our build system. We set up a so-called post-commit hook in GitHub, which informed our build system about code changes. So each time new code arrived from GerritHub in our GitHub repository, our build system started a new build. Since our app was made in Python, we decided to use BuildBot (which itself is written in Python) as our build system. However, what we did is possible with any other build system, too. BuildBot did a set of tasks for us:

  1. Check out the latest code from GitHub
  2. Execute all unit tests
  3. Run static code analysis such as PEP8
  4. Build an artifact to download
  5. Deploy the app to our demo machine

Demo machine

We set up a demo machine to which our BuildBot deployed every (successful) build. With this automated deployment we always had an up-to-date, running instance of our app. This is great for a team, since everybody on the team can see the latest features and show them to potential customers.

Demo videos

I made a short (~10 minutes) video to demonstrate the development workflow in this environment. The video covers a full development cycle of a new feature, from the first check-out of the project to the final result on the demo machine. I made a voice-over in English and in German; both contain (more or less) the same content. The video is cut to make it as short as possible, so you won’t have to wait for my Eclipse to start…

What I do in the video is the following:

  1. Check out the project and make sure it is in a stable state
  2. Implement a new feature in a test-driven way
  3. Push the new feature
  4. Watch BuildBot doing its work
  5. Enjoy the final result on our demo machine

Demo Video (English): CI Development Workflow

Video: http://www.youtube.com/watch?v=ZYiwhvRDRak



Demo Video (German): CI Development Workflow

Video: http://www.youtube.com/watch?v=tza03NG1xwo



Best regards,
Thomas

Opinion Mining on Hackernews and Reddit

TechTrends

Last semester, two of my friends and I made some sort of a search engine for Hackernews and Reddit. The idea was to collect all articles published on those two platforms and search them for trends. It should be possible to type in a certain keyword such as “Bitcoin” and retrieve a trend chart showing when and how many articles have been published about “Bitcoin”.
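The trend chart behind this idea boils down to counting matching articles per time bucket. A stdlib sketch (the article list here is made up for illustration; the real data came from our crawler):

```python
from collections import Counter

# Hypothetical (month, title) pairs standing in for crawled articles.
articles = [
    ("2013-03", "Bitcoin hits new high"),
    ("2013-03", "Why Bitcoin matters"),
    ("2013-04", "Bitcoin crash explained"),
    ("2013-04", "New Python release"),
]

def trend(articles, keyword):
    """Count, per month, how many article titles mention the keyword.
    The resulting mapping is what a trend chart would be drawn from."""
    counts = Counter()
    for month, title in articles:
        if keyword.lower() in title.lower():
            counts[month] += 1
    return dict(counts)
```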

The result was TechTrends. Based on Radim Řehůřek’s Gensim (a Python framework for text classification), we built a web application which crawls Hackernews and Reddit continuously and offers a simple web interface to search for trends in these posts. You can find more posts about TechTrends here.

OpinionTrends

This winter semester, I started to implement a new feature for TechTrends: opinion mining and sentiment analysis for all posts. Based on the comments on each post on Hackernews and Reddit, I wanted to classify all posts into one of three categories:

  • Positive, which means that most of the comments are praising the article.
  • Negative, which means that most of the comments are criticizing the article.
  • Neutral, which means that there is a balance between positive and negative comments.

I gave it the code name OpinionTrends.

The basic idea

The basic idea was to train a supervised classifier to categorize each comment and therefore each post. This works similarly to a spam filter in an email application: each email marked as ‘spam’ trains a classifier which can then categorize emails on its own as good or bad (i.e. spam). I wanted to do the same, but with comments instead of emails and with three categories instead of two.
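To make the idea concrete, here is a minimal Naive Bayes text classifier with add-one smoothing, trained on a few hand-labeled toy comments. This is only a sketch of the general technique, not the classifier I actually used (which was trained on Amazon reviews, see below).

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def train(self, samples):
        # samples: list of (text, label) pairs
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        for text, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counter in self.word_counts.values():
            self.vocab.update(counter)

    def classify(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.class_counts:
            # log prior + summed log likelihood of each word
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```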

Training

A classifier is only as good as its training data. In the case of a spam filter, the training data are emails marked as ‘spam’ by the user. This makes the training data very good and very individual to each user. In my case, I decided to use Amazon product reviews.

Amazon Product Reviews

Amazon product reviews are a great way to retrieve training data. They are rated on a scale of one to five stars, they are available in many languages and for many domains, and you can crawl them very easily. The only thing you have to do is sign up for a free developer account with Amazon and get started with your favorite language (there are libraries for most common languages out there).

Once the classifier is trained, it can be saved to a file and reused later on; it doesn’t need to be updated anymore. However, the performance of the application depends completely on the classifier, so it should be trained and tested carefully.
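Saving a trained model to a file can be as simple as pickling it, sketched below with Python’s pickle module. The model here is a stand-in dictionary, not my actual classifier.

```python
import os
import pickle
import tempfile

# A stand-in for a trained classifier: any picklable object works.
trained_model = {"positive": {"great": 3}, "negative": {"bad": 2}}

path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")

# Save the trained model once...
with open(path, "wb") as f:
    pickle.dump(trained_model, f)

# ...and load it again on every start of the application.
with open(path, "rb") as f:
    restored = pickle.load(f)
```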

Validation

I tested different classifiers and different pre-processing steps on the Amazon reviews. Below you can see a comparison between a Bayes classifier and an SVM. The SVM beats the Bayes classifier by 10% or more. However, its performance also depends dramatically on the pre-processing of the raw reviews.

Type of classifier | No pre-processing | Bi-grams     | Only adjectives | No stop words
Bayes Classifier   | 71% accuracy      | 65% accuracy | 70% accuracy    | 72% accuracy
SVM                | 85% accuracy      | 86% accuracy | 78% accuracy    | 84% accuracy
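The accuracies above were measured with 10-fold cross-validation. The fold logic itself is simple enough to sketch in plain Python; `train_and_score` is a hypothetical placeholder for whichever classifier (Bayes, SVM, …) is being evaluated.

```python
def cross_validate(samples, train_and_score, k=10):
    """Split samples into k folds; for each fold, train on the other
    k-1 folds, test on the held-out fold, and return mean accuracy.
    `train_and_score(train, test)` must return an accuracy in [0, 1]."""
    fold_size = len(samples) // k
    accuracies = []
    for i in range(k):
        test = samples[i * fold_size:(i + 1) * fold_size]
        train = samples[:i * fold_size] + samples[(i + 1) * fold_size:]
        accuracies.append(train_and_score(train, test))
    return sum(accuracies) / len(accuracies)
```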

Problems with validation

All tests were made with 10-fold cross-validation. The only problem with those tests is that I trained and tested with reviews from Amazon, but my final data were comments on blog posts, which is not exactly the same domain. Since Hackernews and Reddit are both about computer science, I used reviews of SSDs, of Microsoft and Apple software, and of computer games to be as close as possible to my final domain. However, I can’t really validate my final results. This has two reasons:

  • I don’t have a huge number of tagged blog posts and comments to compare them with the results of the classifier.
  • Comments are very subjective. In many cases you cannot decide for sure whether a comment is positive or negative. A few comments are very clear and easy (I hate your article.), but a lot of comments are somewhere in between (I love your website but I hate the color.). Even I as a human being cannot decide if they are positive or negative, and even if I could, a friend of mine might disagree.

My final result

OpinionTrends has been in its final stages for a couple of days now. Next week I will present it at the Media Night of my university (a fair for student projects). You can read more about it here.

OpinionTrends is also online and in a more or less stable state. However, it is still under development: http://opiniontrends.mi.hdm-stuttgart.de/

This is how it works

OpinionTrends works the same way as TechTrends. You go to the website, type in a keyword and get the results. The only really new thing is a brand new tab on the result page.

op_trends_new_tab

When you click on it, you will see a new chart similar to the blue one on the Timeline tab. The chart has three colors and is very easy to read. The green bars represent the positive articles, the red bars the negative articles and the light yellow bars the neutral articles. The neutral articles are visualized on both the positive and the negative scale with the same amount.

op_trends_nsa_chart

Above you see the result for NSA, which is actually a very good example: the overwhelming opinion about the NSA is very negative, and you can see that perfectly in the chart.

You can click on each bar to see a pop-up showing the articles behind the bar. You can jump directly to the article or open the discussion with all comments on this article.

op_trends_pop_up

Examples

Here are some good examples to show you how OpinionTrends works. The best one I have found so far is a search for NSA. The opinion is very negative, as everyone would expect.

op_trends_nsa_chart

The opinion on Git is much more balanced. Not only does it have a nearly equal number of positive and negative articles, it also has a larger number of neutral posts.

op_trend_git

The opinion on Python is much better. It has a lot of neutral posts, but besides them, Python has far more positive than negative posts.

op_trend_python

More…

OpinionTrends has some more features, such as individual settings to adjust each search. However, I think this is too much for this post. You can get a lot more information directly on the project site. TechTrends/OpinionTrends is also open source, so you can check out the source code from BitBucket. OpinionTrends is in its own branch!

I hope you enjoy it and I would be really happy about some feedback.

Best regards,
Thomas

Media Night Winter Semester 2013/2014

opinion-trends-poster

During the last summer semester, two friends of mine and I made a student project called TechTrends. TechTrends is a web application that lets you search for articles and trends in the field of computer science. Based on posts from Reddit and Hackernews, it provides an intelligent search on a growing number of articles and blogs.

During this winter semester, I continued the project and implemented a sentiment analysis for TechTrends. Based on the existing infrastructure, such as our database and our crawler, I added an automated categorization of articles according to their comments on Hackernews and Reddit.

You can find the old, stable version of our project at http://techtrends.mi.hdm-stuttgart.de/. The up-to-date development version is available at http://opiniontrends.mi.hdm-stuttgart.de/.

media_night_ws13

I will present the project at the Media Night at our university next week. It’s open for everybody and free of charge. It starts around 6 pm, but you can come whenever you want; there is no schedule. Every project has its own booth, where it is presented and where you can ask questions and get in touch with the people behind it.

You can find the program and information about all projects at http://www.hdm-stuttgart.de/medianight.

What? – Media Night Winter 2013
When? – 16th January 2014, from 6 pm to 10 pm
Where? – Hochschule der Medien, Nobelstraße 10, 70569 Stuttgart

Best regards,
Thomas

First steps with Ractive.js

During the summer, I had a little bit of free time. I had no university courses, almost no ongoing projects and was only working two days a week. So I decided to build a small web app.

Since I was curiously waiting for some grades to arrive in the online portal of my university, I decided to build a (mobile) app to check them.

I decided to build the app with Python on the server and some sort of JavaScript framework on the client. Many of my fellow students were talking about AngularJS, but I wasn’t sure if I should take such a big framework for my first steps (I had never used a JavaScript framework like this before). Then I stumbled over this link on HN. It was about an Angular-like framework called Ractive.js.

Ractive.js

Ractive.js provides two main features (in my opinion):

  1. It provides two-way data-binding. This means that you can hand in a JSON object and directly bind its attributes to HTML elements. So you don’t have to update certain HTML elements manually (e.g. via jQuery).
  2. It provides template rendering. This means you can write HTML templates (as simple *.html files, for example) with mustache syntax and render them on the client via JavaScript. This makes developing single-page applications very easy.

The server part

First of all, I did the server part. I used a combination of twill and Beautiful Soup to parse the web portal of my university. This was (and still is) a little bit slow, but no big deal. Then I built a small API to access the data via AJAX. I used this API later on to update my page. E.g. the call http://127.0.0.1:5000/grades would return a JSON document containing all of your grades:
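The original response snippet didn’t survive in this post. A hypothetical shape (invented field names, not the portal’s real ones) built with the stdlib json module might look like this:

```python
import json

# Hypothetical example of what /grades might return; the field names
# are illustrative, not taken from the real university portal.
grades = [
    {"course": "System Engineering and Management", "grade": 1.3},
    {"course": "Rich Media Systems", "grade": 1.7},
]

response = json.dumps({"grades": grades})
```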

I built similar APIs for the personal schedule, for our canteen and some more.

The client part

For the client part, I used Pure (a CSS framework) and (of course) Ractive.js. This made my templates really short and gave me a good experience on both classic browsers and mobile devices. Here’s the full template to render the grade table:

I simply loaded the template (which is actually a *.html file) in my main template. Afterwards, I have it in a variable, called gradesTemplate in this example:

Then I am able to render it, e.g. by adding an appropriate click listener to a link. The listener will fetch some data via AJAX and render the template (loaded before) with it:

I’m not completely happy about using jQuery, but putting it all together was quite easy.

The result

Here are some screenshots of my final version (it’s not running online and you would need an account at my university to use it, so some screenshots are the best I can provide).

The code

You can find the code on BitBucket:

Best regards,
Thomas Uhrig

Image Optimization for Websites

Last Monday (5th of August 2013), I finished the last project work for this semester: a paper about image optimization for websites. Together with Annette Landmesser, I wrote the paper for the course Entwicklung von Rich Media Systemen (development of rich media systems) taught by Jakob Schröter at the HdM Stuttgart.

You can find the paper directly on Google Drive here.

Our presentation can be found here.

Best regards,
Thomas Uhrig

TechTrends – Searching trends on HN and Reddit

It’s done! Last Friday (26th of July 2013) was the final presentation of our semester project TechTrends. Together with Raphael Brand and Hannes Pernpeintner, I worked on this project for the last 5 months – and we are really happy about it.

What is TechTrends?

TechTrends is basically a search engine for HN and Reddit (just the programming subreddit). You can type in a keyword and you will find a bunch of similar articles. But our main concern was not just to find articles, but to find trends. Therefore, you will not only get a pure list of articles, you will also get a chart showing when articles have been found for your query. For each article we calculate a popularity, indicating how important that article is. Based on these popularities, we draw the trend chart for your search.


reddit_logo

hn_logo

How does it work?

TechTrends has six major parts (see the graphic below). First of all, we crawl HN and Reddit every 15 minutes to get the latest links. In the second part, we fetch the actual content from each link and store it in our database. Then we do preprocessing on this raw text content to remove stop words, digits and so on. After that, we use the great Gensim Server from Radim Řehůřek to build an index of all documents. The next (and last) part on the server is a JSON-based web API to access all of our data (here is its documentation). On top of this API we built our user interface – the actual website.
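The preprocessing step can be sketched in a few lines of stdlib Python. This is a simplified version; the stop-word list here is a tiny stand-in for the one we actually used.

```python
import re

# A tiny illustrative stop-word list; the real one was much larger.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def preprocess(text):
    """Lower-case the text, keep only alphabetic tokens (dropping
    digits and punctuation), and filter stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
```

The cleaned token lists are what gets fed into the index; everything downstream (indexing, querying) only ever sees these tokens.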

techtrends-components

Presentation

Here is the video of our final presentation about TechTrends at our university (on the 26th of July 2013). Our presentation is about 70 minutes long and we explain a lot of details about our project. The video is available on http://events.mi.hdm-stuttgart.de/2013-07-25-vortr%C3%A4ge-programming-intelligent-applications#techtrends or below. It also contains the presentations of the other groups, but we are the first on the video. You can find the slides under the video.

Video

Thanks to Stephan Soller and his team, our presentation has been recorded on video. You can see the video below or on http://events.mi.hdm-stuttgart.de/.

Slides

Here are the slides on my speaker deck account.

What’s next?

We have a lot of ideas for the future of TechTrends. We are thinking, for example, about a mobile version or a reporting tool. But in my opinion, the most important step is to make TechTrends easier to customize. Currently, we are focused on HN and Reddit. However, everything but the actual crawlers is independent of the underlying source. With a little bit of work, you can easily implement a new crawler for the data source of your choice. Making this customization more comfortable and easy is our next goal for the project.

More

There is plenty more! Here you go:

Best regards,
Thomas Uhrig

TechTrends Final Presentation

Tomorrow morning is the final presentation of our semester project TechTrends. I have posted several articles about TechTrends (see here) and I will definitely post one more after tomorrow. But for now, here’s our final presentation.



The presentation shows how we built TechTrends and covers different aspects of the development process. It talks about crawling Hackernews and Reddit, preprocessing and learning a model to query. We also describe problems, further ideas and much more. The presentation will take about 60 to 70 minutes (and everybody is still welcome tomorrow).

The presentation will also be streamed live on http://events.mi.hdm-stuttgart.de.

Best regards,
Thomas Uhrig