OpinionTrends is build with Python and Flask. Therefore you can run it without any additional server right out of the box. Batteries included! However, it is much more common and much more efficient if you run it with a web server and an application server. A widely used combination for Python web applications is nginx together with uWSGI. In this tutorial I want to show how we set up this two tools to run TechTrends and OpinionTrends. The tutorial starts at the very beginning and the only thing I assume is that you run a Linux machine with Python and easy_install.
Note: OpinionTrends is the new version of TechTrends. However, it is just a code name. So when I talk about OpinionTrends or TechTrends I mean the same thing.
OpinionTrends
Base folder
The first thing we have to do is to create the base directory for TechTrends. By convention a web application is located in /var/www
. All of our code will go into this folder.
mkdir /var/www/techtrends
Clone & Update
Now we can start to set-up the application from scratch. OpinionTrends is deployed with git
. So we just check-out the code and switch to the new opinion
branch:
git clone https://bitbucket.org/RaBrand/techtrends.git git fetch && git checkout opinion
After the first checkout an update is super easy. Just do a pull
for the latest code:
git pull
Configuration
Now we have to make some simple settings for TechTrends. To do this, we create a new file called config.py
in the folder Configuration
in our checked-out project. There is also a file called config_template.py
in this folder which is an empty but well documented template for the configuration. The file contains all individual settings for the application. It should look like this:
import logging class Config: DB_FILE = ‘/var/www/links.sqlite’ CORPUS_FILE = ‘/var/www/corpus.sqlite’ SIMILARITY_SERVER = ‘/var/www/simserver’ CACHE = ‘/var/www/cache.txt’ CLASSIFIER = ‘/var/www/classifier.pkl’ DEBUG = False PORT = 80 HOST = “0.0.0.0” METHOD = ‘lsi’ RESPECT_ROBOTS_TXT = ‘True’ AMAZON_CREDENTIALS = “” TWITTER_CREDENTIALS = “”
Dependencies
Now we have to install the libraries needed by TechTrends. To do this we use easy_install. All dependencies are in a file called requirements.txt
in the root folder of TechTrends. We have to install all dependencies in this file:
easy_install requirements.txt
Note: As you see, the installation of the dependencies is very easy in theory. However, some dependencies such as scikit-learn
are sometimes hard to install since they are not pure Python and use come C bindings. If you have problems installing the whole requirements.txt
at once try to install every dependency manually on its own.
nginx
Install
Installing nginx is not a big thing:
sudo apt-get install nginx sudo /etc/init.d/nginx start
After we installed and started nginx we can verify if it is running correctly by taking our favorite browser and surf to our machine.
By the way, stopping nginx is also very simple:
/etc/init.d/nginx stop
Configuration
Now we have to configure ngix to point to TechTrends instead to the default welcome page. To do this we first remove the default configuration:
sudo rm /etc/nginx/sites-enabled/default
Now we create our own configuration in /var/www/techtrends/nginx.conf
. It should look like this:
server { listen 80; server_name _; location / { try_files $uri @yourapplication; } location /static { root /var/www/techtrends/WebServer; expires 7d; add_header Pragma public; add_header Cache-Control “public, must-revalidate, proxy-revalidate”; } location @yourapplication { include uwsgi_params; uwsgi_pass unix:/var/www/techtrends/uwsgi.sock; } }
We link this file to nginx a restart it:
sudo ln -s /var/www/techtrends/nginx.conf /etc/nginx/conf.d/ sudo /etc/init.d/nginx restart
Now we should get a Bad Gateway
exception. Perfect! This tells use that nginx found our configuration and that every things looks good - except of the missing uWSGI!
uWSGI
Install
uWGSI is the protocol between our Python application and nginx. It is their way of communication. First we have to install it:
pip install uwsgi
Configuration
Now we create a configuration file in /var/www/techtrends/uwsgi.ini
. It should look like this:
[uwsgi] plugins = python base = /var/www/techtrends app = start_webserver module = %(app) socket = /var/www/techtrends/%n.sock chmod-socket = 666 callable = app logto = /var/log/uwsgi/%n.log
Now we can start uWSGI as a daemon in the background:
uwsgi –ini /var/www/techtrends/uwsgi.ini –daemonize uwsgi_daemon.log
Done! Nginx is serving TechTrends now. However, we should get an exception again since memcached is still missing. If we want to use TechTrends without memcache we have to change a value in the config.py
in the Configuration
folder in TechTrends base directory. We have to set DEBUG = True
to not use memcache.
memcached
One of the biggest performance improvements you can do (I guess in general, but especially for TechTrends) is to use memcached. Memcached is a key-value in-memory store to cache frequently requested data. In TechTrends we use it to store whole pages and JSON responses from our API.
Install
Install memcached first:
sudo apt-get install memcached
And that’s it! Memcached is installed and running now. You can restart memcached (e.g. to clear it) like this:
sudo /etc/init.d/memcached restart
Congratulations
Congratulations! TechTrends should run now in a stable production mode. We installed nginx, uWSGI and memcached. We also configured it to work together. Great! But - there are still some open points we have to do.
crontab
TechTrends/OpinionTrends has two regularly scheduled jobs. One job is the crawler which crawls posts from Reddit and Hackernews and the other job is the training and restart of the application. To execute this jobs we set-up a cron tab. First we create two files which execute these jobs. The first file is called crawl.sh
and looks like this:
python /var/www/techtrends/start_scraper.py > /var/www/techtrends/crontab_scraper.txt
The second file is called restart.sh
and looks like this:
rm -r /var/www/simserver python /var/www/techtrends/start_training.py > /var/www/techtrends/crontab_training.txt pkill -9 -f start_daemon.py nohup python /var/www/techtrends/start_daemon.py > /var/www/techtrends/nohup_daemon.txt & /etc/init.d/memcached restart killall uwsgi uwsgi –ini /var/www/techtrends/uwsgi.ini –daemonize uwsgi_daemon.log
Both files should be in the root folder (/var/www/techtrends/
) of TechTrends. Now we add those two files to our locale crontab
. We can edit the it like this:
crontab -e
It should look like this:
PYTHONPATH=:/var/www/techtrends:/var/www/techtrends/Configuration:/var/www/techtrends/Database:/var/www/techtrends/Maintenance:/var/www/techtrends/Opinion:/var$
m h dom mon dow command
*/30 * * * * cd /var/www/techtrends; sh crawl.sh 0 3 * * * cd /var/www/techtrends; sh restart.sh
This crontab
will run the crawler every 30 minutes and once a day at 3 o’clock it will trigger the training and restart of the whole application. So the data will growth every 30 minutes and will be indexed once a day. That’s it. Best regards, Thomas