Showing posts from October, 2017

Embed Scrapy in WSGI Application

WSGI and Scrapy

A common Scrapy question on Stack Overflow is "How to use Scrapy with Flask, Django, or any other Python web framework?" Most people are used to Scrapy's generated projects and CLI options, which make crawling a breeze, but get confused when trying to integrate Scrapy into a WSGI web framework. A common traceback is ReactorNotRestartable, which stems from the underlying Twisted framework. It occurs because, unlike asyncio or Tornado, Twisted's event loop (the reactor) cannot be restarted once it has been stopped (the reason is a bit out of scope). So the trick to integrating Scrapy with WSGI frameworks is taming Twisted. Luckily, integrating async Twisted code with synchronous code has become quite easy and is only getting easier.

In this post, the following will be demonstrated:

Embed a crawler in a WSGI app and run it using Twisted's twist web WSGI server.
Embed a crawler in a WSGI app and run it any WSGI serve