Last Monday (5 August 2013) I finished the final project of this semester: a paper on image optimization for websites. Together with Annette Landmesser I wrote the paper for the course Entwicklung von Rich Media Systemen taught by Jakob Schröter at HdM Stuttgart.
You can find the paper directly on Google Drive here.
It’s done! Last Friday (26th July 2013) was the final presentation of our semester project TechTrends. Together with Raphael Brand and Hannes Pernpeintner I have worked on this project for the last five months – and we are really happy with it.
What is TechTrends?
TechTrends is basically a search engine for HN and Reddit (just the programming subreddit). You can type in a keyword and find a bunch of similar articles. But our main concern was not just to find articles, but also to find trends. Therefore, you will not only get a plain list of articles; you will also get a chart showing when articles matching your query were found. For each article we calculate a popularity score, indicating how important that article is. Based on these scores, we draw the trend chart for your search.
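To illustrate the idea, here is a minimal sketch of how per-article popularity scores could be aggregated into the data points of such a trend chart. The record layout, function name, and simple summing are my assumptions, not the project's actual formula:

```python
from collections import defaultdict
from datetime import date

# Hypothetical article records for one query: (publication date, popularity score).
articles = [
    (date(2013, 5, 1), 8),
    (date(2013, 5, 1), 3),
    (date(2013, 6, 1), 5),
]

def trend_series(articles):
    """Sum popularity scores per day; each entry is one point on the trend line."""
    buckets = defaultdict(float)
    for published, popularity in articles:
        buckets[published] += popularity
    return sorted(buckets.items())

print(trend_series(articles))
# → [(date(2013, 5, 1), 11.0), (date(2013, 6, 1), 5.0)]
```

A real implementation would likely bucket by week or month and normalize the scores, but the principle is the same.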
How does it work?
TechTrends has six major parts (see the graphic below). First of all, we crawl HN and Reddit every 15 minutes to get the latest links. In the second part we fetch the actual content from each link and store it in our database. Then we preprocess this raw text content to remove stop-words, digits, and so on. After that, we use the great Gensim Server from Radim Řehůřek to build an index of all documents. The next (and last) part on the server is a JSON-based web API to access all of our data (here is its documentation). On top of this API we built our user interface – the actual website.
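The preprocessing step can be sketched in a few lines. This is only an illustration of the stop-word and digit removal described above; the stop-word list and function name are placeholders, not the project's actual code:

```python
import re

# A tiny illustrative stop-word list; a real pipeline would use a much fuller one.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}

def preprocess(text):
    """Lower-case the text, keep only alphabetic tokens (dropping digits and
    punctuation), and remove stop-words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The 10 best JavaScript frameworks of 2013"))
# → ['best', 'javascript', 'frameworks']
```

The cleaned token lists are what would then be handed to Gensim for indexing.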
We have a lot of ideas for the future of TechTrends. We are thinking, for example, about a mobile version or a reporting tool. But in my opinion, the most important step is to make TechTrends easier to customize. Currently, we are focused on HN and Reddit. However, everything but the actual crawlers is independent of the underlying source. With a little bit of work, you can easily implement a new crawler for the data source of your choice. Making this customization more comfortable and easy is our next goal for the project.
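The pluggable-crawler idea could look something like the following. This interface is my own sketch of the described architecture; the actual TechTrends code base may structure it differently:

```python
from abc import ABC, abstractmethod

class Crawler(ABC):
    """Everything source-specific lives behind this interface; the rest of the
    pipeline (content fetching, preprocessing, indexing, API) stays unchanged."""

    @abstractmethod
    def fetch_latest(self):
        """Return a list of (url, title, timestamp) tuples from the source."""

class LobstersCrawler(Crawler):
    """Hypothetical crawler for a new data source."""

    def fetch_latest(self):
        # Fetch and parse the source's feed here; returning an empty list
        # keeps the sketch runnable without network access.
        return []

# The scheduler would simply call fetch_latest() on every registered crawler.
print(LobstersCrawler().fetch_latest())
```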