Working with GerritHub.io

This semester I attended a university course called System Engineering and Management taught by Prof. Walter Kriha. Together with Jan Müller, I built a continuous integration environment for the course. While we were looking for a code review system, we found a platform called GerritHub.io which offers a Gerrit system for GitHub as a service. Here is our experience with this service.

Gerrit

gerrit-logo

Gerrit is an open-source tool for doing code reviews with Git. At its core it encapsulates a Git implementation (JGit in this case) and behaves exactly like Git for a developer. You can clone a Gerrit repository, commit and push changes to it. The only difference is that code must be reviewed by somebody else before it is merged into the real repository. Gerrit acts as a layer between the developer and the repository. It is used to enforce a certain review policy in (big) teams.
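In practice, the only visible difference for the developer is the ref you push to. A typical round trip might look like this (the server name and project are placeholders, not a real GerritHub.io address):

```shell
# Clone the project through Gerrit instead of the upstream repository
git clone ssh://user@gerrit.example.com:29418/my-project
cd my-project

# Work and commit as usual
git commit -am "Fix validation bug"

# Push to the magic refs/for/<branch> ref to open a review,
# instead of pushing to the branch directly
git push origin HEAD:refs/for/master
```

Only after a reviewer approves the change does Gerrit merge it into the real master branch.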

GerritHub.io

A good idea

GerritHub.io offers a Gerrit system for GitHub as a service. This is very nice, because setting up Gerrit on your own can be very annoying. Gerrit is written in Java and comes with a Maven POM file. This should make things easy, but it doesn’t: some dependencies needed by Gerrit are not in public Maven repositories, so you first have to obtain these libraries yourself. If you need a certain plugin for Gerrit (e.g. to connect easily to GitHub), it gets even worse. Most plugins depend on a specific version of Gerrit, and usually not the current one. The plugins and Gerrit don’t share the same versioning scheme either. Therefore, GerritHub.io is a really good idea.

Let’s begin with something positive: it’s easy!

Using GerritHub.io is really easy. You have to create a developer account and connect it with your GitHub account. Then you can clone your GitHub repository to GerritHub.io. From this point on you only push and pull to GerritHub.io, not to GitHub any more. GerritHub.io lets you adjust a few settings and provides the code review. After a review it automatically merges the code back to GitHub, so both repositories stay synchronized. It’s really easy.

IMHO, it’s very buggy

conversation_with_GerritHub

GerritHub.io is a really good idea and it is really easy to use, but – and this is my own humble opinion – it is really, really buggy.

We used GerritHub.io for no longer than two months in a small team of two people. We pushed maybe 20 commits. In this time we hit approximately 5 problems with the platform which we couldn’t fix on our own and which blocked us completely. There were no error messages and, as far as I can tell, nothing that we as users did wrong. So we wrote an email to the support. And another one. And another one…

The support of GerritHub.io was really nice and really fast. But more importantly, they always fixed our problems. Sometimes it was something with their file system, sometimes a problem with OAuth, sometimes something else. Whatever it was, this should not happen. I know that GerritHub.io is free and maybe a very new project. But it is backed by a company called GerritForge.com that tries to make money with such products, and this doesn’t cast a positive light on the company. Besides the blocking bugs on GerritHub.io, there were some minor ones, like the search/filter function that didn’t find any projects, or the user interface that sometimes just stopped loading anything. IMHO, a project should not go public like this.

If GerritHub.io fixes these problems, the platform could be a really great product. And maybe the problems are already fixed when you read this post. But right now, I cannot recommend this platform. It’s crap.

Best regards,
Thomas

Media Night Winter Semester 2013/2014

opinion-trends-poster

During the last summer semester, two friends of mine and I made a student project called TechTrends. TechTrends was a web application that let you search for articles and trends in the field of computer science. Based on posts from Reddit and Hackernews, it provided an intelligent search on a growing number of articles and blogs.

During this winter semester I continued the project and implemented a sentiment analysis for TechTrends. Based on the existing infrastructure, such as our database and our crawler, I added an automated categorization of articles according to their comments on Hackernews and Reddit.

You can find the old and stable version of our project at http://techtrends.mi.hdm-stuttgart.de/. The up-to-date development version is available at http://opiniontrends.mi.hdm-stuttgart.de/.

media_night_ws13

I will present the project at the Media Night at our university next week. It’s open to everybody and free. It will start around 6 pm, but you can come whenever you want; there is no schedule. Every project has its own booth, where it is presented and where you can ask questions and get in touch with the people behind it.

You can find the program and information about all projects on http://www.hdm-stuttgart.de/medianight.

What? – Media Night Winter 2013
When? – 16th January 2014, from 6 pm to 10 pm
Where? – Hochschule der Medien, Nobelstraße 10, 70569 Stuttgart

Best regards,
Thomas

TechTrends – Searching trends on HN and Reddit

It’s done! Last Friday (26th July 2013) we gave the final presentation of our semester project TechTrends. Together with Raphael Brand and Hannes Pernpeintner I worked on this project for the last 5 months – and we are really happy about it.

What is TechTrends?

TechTrends is basically a search engine for HN and Reddit (just the programming subreddit). You can type in a keyword and you will find a bunch of similar articles. But our main concern was not just to find articles, but to find trends. Therefore, you will not only get a plain list of articles, you will also get a chart showing when articles have been found for your query. For each article we calculate a popularity score, indicating how important that article is. Based on these scores, we draw the trend chart for your search.


reddit_logo

hn_logo

How does it work?

TechTrends has six major parts (see the graphic below). First of all, we crawl HN and Reddit every 15 minutes to get the latest links. In the second part we fetch the actual content from each link and store it in our database. Then we preprocess this raw text content to remove stop words, digits and so on. After that, we use the great Gensim Server by Radim Řehůřek to build an index of all documents. The next (and last) part on the server is a JSON-based web API to access all of our data (here is its documentation). On top of this API we built our user interface – the actual website.
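The preprocessing step can be sketched in a few lines of Python. The stop-word list here is a tiny illustrative subset, not the one we actually used:

```python
import re

# Illustrative subset of a real stop-word list
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}

def preprocess(text):
    """Lower-case the text, keep only alphabetic tokens and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The 2 new Bitcoin articles of 2013 are trending in tech!"))
# → ['new', 'bitcoin', 'articles', 'are', 'trending', 'tech']
```

The cleaned token lists are what gets fed into the Gensim index.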

techtrends-components

Presentation

Here is the video of our final presentation about TechTrends at our university (on 26th July 2013). Our presentation is about 70 minutes long and we explain a lot of details about our project. The video is available on http://events.mi.hdm-stuttgart.de/2013-07-25-vortr%C3%A4ge-programming-intelligent-applications#techtrends or below. It also contains the presentations of the other groups, but we are the first in the video. You can find the slides below the video.

Video

Thanks to Stephan Soller and his team, our presentation has been recorded on video. You can see the video below or on http://events.mi.hdm-stuttgart.de/.

Slides

Here are the slides on my speaker deck account.

What’s next?

We have a lot of ideas for the future of TechTrends. We are thinking, for example, about a mobile version or a reporting tool. But in my opinion, the most important step is to make TechTrends easier to customize. Currently we are focused on HN and Reddit. However, everything but the actual crawlers is independent of the underlying source. With a little bit of work, you can easily implement a new crawler for the data source of your choice. Making this customization more comfortable and easy is our next goal for the project.
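To illustrate the idea, a new crawler only has to deliver links in a common shape, and everything downstream stays unchanged. The class and method names below are hypothetical, not the actual TechTrends API:

```python
class Crawler:
    """Minimal interface a data-source crawler has to fulfil (hypothetical)."""

    def fetch_links(self):
        """Return a list of (title, url) tuples for the latest posts."""
        raise NotImplementedError

class StaticCrawler(Crawler):
    """A trivial crawler over a fixed list, standing in for a real source."""

    def __init__(self, links):
        self.links = links

    def fetch_links(self):
        return list(self.links)

def crawl_all(crawlers):
    """The rest of the pipeline only depends on the Crawler interface."""
    links = []
    for crawler in crawlers:
        links.extend(crawler.fetch_links())
    return links

print(crawl_all([StaticCrawler([("Bitcoin rising", "http://example.com/1")])]))
```

A real crawler would fetch from its source inside fetch_links; the aggregation logic never has to change.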

More

There is plenty more! Here you go:

Best regards,
Thomas Uhrig

TechTrends Final Presentation

Tomorrow morning is the final presentation of our semester project TechTrends. I have posted several articles about TechTrends (see here) and I will definitely post one more after tomorrow. But for now, here’s our final presentation.



The presentation shows how we built TechTrends and covers different aspects of the development process. It talks about crawling Hackernews and Reddit, preprocessing and learning a model to query. We also describe problems, further ideas and much more. The presentation will take about 60 to 70 minutes (and everybody is still welcome tomorrow).

The presentation will also be streamed live on http://events.mi.hdm-stuttgart.de.

Best regards,
Thomas Uhrig

A GPS diary in four weeks with Play!

During the last four weeks I did another small project for one of my university courses this semester. The course was about ORM modelling and abstractions for databases. We did some exercises with Hibernate and discussed different aspects of ORM modelling. At the end of the course, every student made a small web project using a framework and ORM tool of his choice.

Most of us used Play! and ebeans to build their applications – me too. I didn’t have any experience with Play!, but it was very pleasant to work with. Together with Foundation CSS, prototyping was extremely fast. And ebeans also did a great job: during the whole project I didn’t write a single line of SQL – nice!

Online demo

You can see a demo on http://tad.mi.hdm-stuttgart.de/. Just log in as tad@tad.de with 123456 as the password to see a demo account with some data.

Presentation

My presentation about TAD: http://tuhrig.de/wp-content/uploads/tad/presentation.html

Code

The code is available on my university’s own Git repository, but it is public:

git clone https://version.mi.hdm-stuttgart.de/git/TAD

Best regards,
Thomas Uhrig

TechTrends Presentation

Next Friday (July 26th 2013) the final presentation of TechTrends will take place at our university. The presentation will take about 60 minutes and will cover topics like architecture, crawling, data storage and front-end design. Everybody interested is welcome (please send me an email beforehand). Here’s the whole schedule for next week:

09.00h-10.10h Tech Trends (by Brand, Uhrig, Pernpeintner)
10.15h-11.25h Newsline (by Förder, Golpashin, Wetzel, Keller)
11.45h-12.35h Intelligent Filters for Image Search (by Landmesser, Mussin)
12.40h-13.50h Nao Face Recognition (by Sandrock, Schneider, Müller)
13.55h-14.35h GPU-driven deep CNN (by Schröder)

The presentations will take place in room 056 (Aquarium). I will upload the presentation to my speakerdeck account at the end of next week.

Best regards,
Thomas Uhrig

TechTrends at the Media Night 2013 of the Media University Stuttgart

During this summer semester, two friends of mine and I made a student project called TechTrends. TechTrends is a web application that lets you search for articles and trends in the field of computer science. We crawl posts from Reddit and Hackernews and provide an intelligent search on them. You can type in a keyword (e.g. bitcoin) and get a timeline showing you when articles for this topic have been published.

techtrends_medianight

We will present our project at the Media Night at our university next week. It’s open to everybody and free. It will start around 6 pm, but you can come whenever you want; there is no schedule. Every project has its own booth, where it is presented and where you can ask questions and get in touch with the people behind it.

You can find the program and information about all projects on http://www.hdm-stuttgart.de/medianight.

What? – Media Night 2013
When? – 27th June 2013, from 6 pm to 10 pm
Where? – Hochschule der Medien, Nobelstraße 10, 70569 Stuttgart

Our project is online on http://techtrends.mi.hdm-stuttgart.de/.

Best regards,
Thomas Uhrig

Writing an online scraper on Google App Engine (Python)

Sometimes you need to collect data – for visualization, data mining, research or whatever you want. But collecting data takes time, especially when data should be collected over a long period.
Typically you would use a dedicated machine (e.g. a server) to do this, rather than using your own laptop or PC to crawl the internet for weeks. But setting up a server can be complicated and time-consuming – nothing you would do for a small private project.

A good and free alternative is the Google App Engine (GAE). The GAE is a web-hosting service from Google which offers a platform for Java and Python applications. It comes with its own user authentication system and its own database. If you already have a Google account, you can upload up to ten applications for free. However, the free version has some limitations, e.g. you only get a 1 GB database with a maximum of 50,000 write operations per day (more details).

One big advantage of the GAE is the possibility to create cron-jobs. A cron-job is a task that is executed at fixed points in time, e.g. every 10 minutes. Exactly what you need to build a scraper!

But let’s do it step by step:

1. Registration

First of all, you need a Google account and you must be registered with the GAE. After your registration, you can create a new application (go to https://appengine.google.com and click on Create Application).

01_gae_registration

Choose the name for your application wisely, you can’t change it later on!

2. Install Python, GAE SDK and Google Eclipse plugin

To start programming for the GAE, you need to set up a few things. Since we want to develop an application in Python, Python (v. 2.7) must be installed on your computer. You also need to install the GAE SDK for Python. Optionally, you can install the Google plugin for Eclipse together with PyDev, which I would recommend because it makes life much easier.

3. Create your application

Now you can start developing your application! Open Eclipse and create a new PyDev Google App Engine Project. To make a GAE application, we need at least two files: a main Python script and the app.yaml (a configuration file). Since we also want to create a cron-job, we need a third file (cron.yaml) to define this job. For reading an RSS stream we additionally use a third-party library called feedparser.py. Just download the ZIP file and unpack the file feedparser.py into your project folder (this is OK for the beginning). A very simple scrawler could look like this:

Scrawler.py
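The real handler lived in Scrawler.py; a stripped-down sketch of the scraping core might look like this. Note that this is written against today’s standard library instead of Python 2.7 with feedparser, and the feed URL is a placeholder:

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://example.com/feed.rss"  # placeholder for the real feed

def extract_titles(rss_xml):
    """Parse an RSS document and return the titles of all items."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title") for item in root.iter("item")]

def scrape():
    """Fetch the feed and return the latest titles (what the cron-job triggers)."""
    with urllib.request.urlopen(FEED_URL) as response:
        return extract_titles(response.read())

if __name__ == "__main__":
    print(scrape())
```

On the GAE, this logic would sit inside a request handler, and each scraped entry would be written to the datastore.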

app.yaml
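The app.yaml tells the GAE which runtime to use and which script handles incoming requests. A minimal version for a Python 2.7 app could look like this (the application name is illustrative, and the script entry assumes the handler module exposes a WSGI application called app):

```yaml
application: techrepscrawler   # must match the name registered on GAE
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*
  script: scrawler.app
```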

Note: The application name must be the same as the one you registered on Google in the first step!

cron.yaml
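The cron.yaml defines the job itself: the given URL is simply requested on the given schedule. This sketch assumes the scraping handler is mapped to the root path:

```yaml
cron:
- description: scrape the latest feed entries
  url: /
  schedule: every 10 minutes
```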

Done! Your project should look like this now (including feedparser.py):

02_gae_project

4. Test it on your own machine

Before we deploy the application on the GAE, we want to test it locally to see if it really works. To do so, we have to create a new run configuration in Eclipse. Click on the small arrow next to the green run button and choose “Run configurations…”. Then create a new “Google App Engine” configuration and fill in the following parameters (see the pictures):

Name:
GAE (you can choose anything as name)

Project:
TechRepScrawler (your project in your Eclipse workspace)

Main Module:
C:\Program Files (x86)\Google\google_appengine\dev_appserver.py (dev_appserver.py in your GAE installation folder)

Program Arguments:
--admin_port=9000 "C:\Users\Thomas\workspace_python\TechRepScrawler"

04_gae_run_config_1

04_gae_run_config_2

After starting the GAE locally on your computer using the run configuration, just open your browser and go to http://localhost:8080/ to see the running application. You can also open an admin perspective on http://localhost:9000/ to see, e.g., some information about your data.

5. Deploy your application to GAE

The next – and last! – step is to deploy the application to the GAE. Using the Google Eclipse plugin, this is as easy as it can be. Just right-click on your project, go to PyDev: Google App Engine and click upload. Now your app will be uploaded to the GAE! The first time, you will be asked for your credentials, that’s all.

05_gae_upload

06_gae_deployed

Now your app is online and available for everyone! The cron-job will refresh it every 10 minutes (which just means it will hit your site like any other user would). Here’s how it should look:

07_last

Best regards,
Thomas Uhrig

Java 8 in Eclipse (Juno)


Note: Here is an up-to-date tutorial for the new Eclipse versions Kepler and Luna: http://tuhrig.de/java-8-in-eclipse-kepler-and-luna


Java 7 has been generally available since last July. However, the even newer Java 8 is already available as an Early Access Preview. It can be downloaded as part of the OpenJDK, including lambda support. To use it with Eclipse, some additional steps are required. Here’s how it goes:

  1. Download the JDK from http://jdk8.java.net/lambda and install it.
  2. Create a new Java project in Eclipse.
  3. Change some project settings in Eclipse in order to use the new javac compiler of Java 8:
    1. Click right on your project and select Properties.
    2. Select Builders on the left side and uncheck the default Java Builder.
    3. Click on New and select Program as the type of the new builder.
    4. Open javac from the JDK 8 on your disk (it’s in the folder bin).
    5. Configure the new builder by providing some arguments:
    6. Select the tab Build Options and check the During auto builds box, to let the builder run automatically.
  4. Done! Now you can write and execute Java 8 code!

The main feature of Java 8 is the ability to write lambdas and use a functional-like programming style. This can be very useful, especially in GUI programming. It can reduce the amount of code, because anonymous classes that only implement a listener interface are avoided. Here’s a very simple example:
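A sketch of such a frame with two buttons, one listener passed as a method reference and one as a lambda (class and method names are mine, not from the original snippet):

```java
import java.awt.BorderLayout;
import java.awt.event.ActionEvent;
import javax.swing.JButton;
import javax.swing.JFrame;

public class Lambda8Demo {

    public static void main(String[] args) {
        JFrame frame = new JFrame("Java 8 listeners");
        JButton first = new JButton("Method reference");
        JButton second = new JButton("Lambda");

        // A method reference replaces the anonymous ActionListener
        first.addActionListener(Lambda8Demo::buttonPressed);

        // A lambda expression replaces the anonymous ActionListener
        second.addActionListener(event -> System.out.println("Lambda pressed!"));

        frame.getContentPane().add(first, BorderLayout.NORTH);
        frame.getContentPane().add(second, BorderLayout.SOUTH);
        frame.pack();
        frame.setVisible(true);
    }

    // Matches ActionListener#actionPerformed(ActionEvent), so it can be
    // passed wherever an ActionListener is expected
    private static void buttonPressed(ActionEvent event) {
        System.out.println("Method reference pressed!");
    }
}
```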

The code creates a common JFrame with two buttons. An action listener is added to each button. But instead of implementing an anonymous class, a method reference is passed in the first case and a lambda in the second. This reduces the amount of code compared with the old solution, which looked like this:
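For comparison, here is a sketch of the pre-Java-8 style for the second listener, spelled out as an anonymous class (again with illustrative names):

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JButton;

public class OldStyleDemo {

    public static void main(String[] args) {
        JButton second = new JButton("Lambda");

        // Before Java 8: a whole anonymous class just to implement one method
        second.addActionListener(new ActionListener() {
            @Override
            public void actionPerformed(ActionEvent event) {
                System.out.println("Lambda pressed!");
            }
        });

        second.doClick(); // fires the listener, prints "Lambda pressed!"
    }
}
```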

However, the Java compiler will generate exactly such an anonymous class from the lambda expression or method reference. Therefore, the passed method must fulfill the interface described by ActionListener. Also, we can only use lambdas in place of interfaces that describe one single method. Such interfaces are called functional interfaces.

Since Eclipse Juno is made for Java 7, it will not recognize the new lambda syntax. Hence, the editor will mark parts of the code as errors. But it is still possible to compile and execute the code using the new javac compiler from Java 8.

Best regards,
Thomas Uhrig

Thomas’ Functional Interpreter

I just finished a small functional interpreter for a university course. To be precise, the interpreter itself is not functional; rather, it understands a functional, LISP-like language/syntax. The whole project is written in Java 7 and also provides an interface to all Java classes on the class-path.

I also tried to create a small IDE in Swing. I built an editor with syntax-highlighting and code-completion (thanks to the RSyntaxTextArea project), an environment inspector and a small debugger to keep track of the stack. Take a look at it.

I called it ThoFu, for Thomas’ Functional Interpreter. I think it tastes good… You can get some brief impressions from the slideshow below.

The whole project is open source; you can find it as an Eclipse project on my Bitbucket account. It contains the actual code, all libraries, screenshots, unit tests and (of course) some documentation like UML diagrams. Here are the links to the Git repository:

Bitbucket: https://bitbucket.org/wordless/thofu-interpreter
Git: https://bitbucket.org/wordless/thofu-interpreter.git

Here is the executable JAR-file (Note: You need Java 7). Just download it, double-click it and go-ahead:

Executable program: ThoFu Interpreter.jar
Read-Me: README.md

And this is how it goes. First, a simple “hello world” program:
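The exact ThoFu syntax is documented in the repository; in a LISP-like language, a hello-world boils down to a single expression along these lines (hypothetical surface syntax, not verified against the interpreter):

```
(print "Hello World")
```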

Now, a more sophisticated one: here’s a quicksort algorithm for my interpreter. It contains some user-defined functions, multi-line statements and some comments.

And this is how the Java API works. The code will create a JFrame with a JButton and a JLabel. When you press the button, the listener will change the text of the label from “click me” to “hello“. The syntax is inspired by the Java Dot Notation by Peter Norvig. I used the Apache Commons Lang library for the reflection tricks to load classes and so on.

You can just copy and paste the code snippets into the editor, or save them as plain text files and open them. They are also in the repository. The JUnit tests contain many more examples as well.

For those who are interested in the topic: you can find my resources (libraries, tools and literature) in the Read-Me.

Best regards,
Thomas Uhrig