# Self-signed TLS server and client using Twisted

## Prerequisites

- openssl
- twisted & pyOpenSSL modules - `pip install twisted[tls]`
- treq module - `pip install treq`
- Basic knowledge of Twisted TCP servers/clients

## Generate a self-signed certificate

Generate the server's private key using a secret, which is SuperSecretPassword in this case.

```
openssl genrsa -aes256 -passout pass:SuperSecretPassword -out server.key 2048
```

Create a CSR (certificate signing request). Ensure the FQDN (fully qualified domain name) matches the hostname of the server, otherwise the server won't be validated properly.

```
openssl req -new -key server.key -passin pass:SuperSecretPassword -out server.csr
# Common Name (e.g. server FQDN or YOUR name) []:localhost
```

Sign the CSR with the server's key to produce the certificate.

```
openssl x509 -req -passin pass:SuperSecretPassword -days 1024 -in server.csr -signkey server.key -out server.crt
```

For development purposes, remove the passphrase from the private key.

```
openssl rsa -in server.key -out server_no_pass.key -passin pass:SuperSecretPassword
```
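With the certificate and the password-less key in place, a Twisted TLS server can load them directly. Below is a minimal sketch (not taken from the post) of a TLS echo server built on `DefaultOpenSSLContextFactory`; the port number, file names, and the echo protocol are illustrative assumptions.

```python
# Minimal Twisted TLS echo server sketch using the files generated above.
# Assumes server_no_pass.key / server.crt sit next to this script and that
# port 8443 is free; both are arbitrary choices for illustration.
from twisted.internet import protocol, reactor, ssl


class Echo(protocol.Protocol):
    """Write every byte received straight back to the client."""

    def dataReceived(self, data):
        self.transport.write(data)


class EchoFactory(protocol.Factory):
    protocol = Echo


# Build a TLS context from the password-less private key and the
# self-signed certificate.
context_factory = ssl.DefaultOpenSSLContextFactory(
    "server_no_pass.key", "server.crt"
)

reactor.listenSSL(8443, EchoFactory(), context_factory)
reactor.run()
```

A quick way to poke at it during development is `openssl s_client -connect localhost:8443`, which prints the self-signed certificate and then echoes back whatever you type.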
# WSGI and Scrapy

A common question on Scrapy's Stack Overflow tag is "How do I use Scrapy with Flask, Django, or any other Python web framework?" Most people are used to Scrapy's generated projects and CLI options, which make crawling a breeze, but get confused when trying to integrate Scrapy into a WSGI web framework. A common traceback encountered is `ReactorNotRestartable`, which stems from the underlying Twisted framework. It occurs because, unlike asyncio or Tornado, Twisted's event loop (the reactor) cannot be restarted once stopped (the reason is a bit out of scope). So the trick to integrating Scrapy and WSGI frameworks comes down to taming Twisted. Luckily, integrating asynchronous Twisted code with synchronous code has become quite easy and is only getting easier. This post demonstrates how to:

- Embed a crawler in a WSGI app and run it using Twisted's `twist web` WSGI server (a sketch follows this list).
- Embed a crawler in a WSGI app and run it with any WSGI server.
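As a rough illustration of the first item, here is a minimal sketch assuming a Flask app, a trivial spider against the quotes.toscrape.com demo site, and port 8080. Instead of the `twist web` runner, it wires up the equivalent `WSGIResource` by hand so everything fits in one file: the Flask app is served from Twisted's thread pool while Scrapy's `CrawlerRunner` shares the same, never-restarted reactor.

```python
# Sketch: a Flask (WSGI) app and a Scrapy crawler sharing one Twisted reactor.
# The spider, route, and port are placeholder choices for illustration.
from flask import Flask
import scrapy
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource

app = Flask(__name__)
runner = CrawlerRunner()


class QuoteSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        for text in response.css(".quote .text::text").getall():
            yield {"quote": text}


@app.route("/crawl")
def crawl():
    # WSGIResource runs this view in a thread-pool thread, so hop back to
    # the reactor thread to schedule the crawl safely.
    reactor.callFromThread(runner.crawl, QuoteSpider)
    return "crawl scheduled"


# Serve the Flask app from within Twisted itself: no second event loop and
# no reactor restarts, so ReactorNotRestartable never comes up.
resource = WSGIResource(reactor, reactor.getThreadPool(), app)
reactor.listenTCP(8080, Site(resource))
reactor.run()
```

Hitting http://localhost:8080/crawl schedules the spider on the already-running reactor; the request returns immediately while the crawl proceeds in the background.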