Looking at Scrapy

As much as I’ve found basic web scraping with urllib and BeautifulSoup to be really simple, it leaves some things to be desired. The BeautifulSoup project has languished, and recent versions have switched to an HTML parser that copes less well with the poorly encoded pages found on real websites.
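
For a sense of what that kind of basic scrape involves, here is a minimal sketch that pulls the links out of a page. It uses only the standard library, so html.parser stands in for BeautifulSoup here, and the sample HTML, the LinkCollector class and the extract_links helper are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect an absolute URL for every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Hypothetical helper: return every link found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page. A real script would fetch the HTML first, for
# example with urllib.request.urlopen(url).read(), and decode it.
SAMPLE_HTML = (
    '<p><a href="/about">About</a> '
    '<a href="http://example.com/x">X</a>'
    "<a>no href</a></p>"
)

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "http://example.org/"):
        print(link)
```

Fetching, decoding, retrying and keeping track of which pages have been visited are all left to the script's author in this style, which is the part a framework is meant to take over.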

Scrapy is a full-on framework for scraping websites. It offers many features, including a standalone command-line interface and a daemon tool, that make scraping websites much more systematic and organized.

I have yet to build any substantial scraping scripts based on Scrapy, but judging from the snippets I’ve read at http://snippets.scrapy.org, the documentation at http://doc.scrapy.org and the project blog at http://blog.scrapy.org, it seems like a solid project with a good future and a lot of really great features that will make my scripts more automatable and standardized.