Based on my last post about scraping the Google SERP, I decided to make a small change and scrape the organic search results from Bing instead.
I wasn’t able to find a way to display 100 results per page in the Bing results, so this script only returns the top 10. It could be enhanced to loop through the pages of results, but I have left that out of this code (a rough sketch of what that loop might look like follows the code below).
Example Usage:
$ python BingScrape.py
http://twitter.com/halotis
http://www.halotis.com/
http://www.halotis.com/progress/
http://doi.acm.org/10.1145/367072.367328
http://runtoloseweight.com/privacy.php
http://twitter.com/halotis/statuses/2391293559
http://friendfeed.com/mfwarren
http://www.date-conference.com/archive/conference/proceedings/PAPERS/2001/DATE01/PDFFILES/07a_2.pdf
http://twitterrespond.com/
http://heatherbreen.com
Here’s the Python Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/

import urllib, urllib2

from BeautifulSoup import BeautifulSoup

def bing_grab(query):
    # Build the Bing search URL and spoof a browser User-Agent
    address = "http://www.bing.com/search?q=%s" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'})
    urlfile = urllib2.urlopen(request)
    page = urlfile.read(200000)
    urlfile.close()

    # Each organic result is an <h3> inside the div with id="results";
    # grab the href of the first link in each one
    soup = BeautifulSoup(page)
    links = [x.find('a')['href'] for x in soup.find('div', id='results').findAll('h3')]

    return links

if __name__ == '__main__':
    # Example: print the top 10 organic results for "halotis"
    links = bing_grab('halotis')
    print '\n'.join(links)
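If you did want to go beyond the first page, here is a rough sketch of how the pagination loop might look. It assumes Bing accepts a first query parameter as the result offset (1, 11, 21, ...); I haven’t verified that parameter name or how many pages Bing will serve, so treat bing_grab_pages as a hypothetical starting point rather than tested code.

def bing_grab_pages(query, pages=3):
    # Sketch: fetch several pages of results by stepping the (assumed) 'first' offset parameter
    all_links = []
    for page_num in range(pages):
        offset = page_num * 10 + 1
        address = "http://www.bing.com/search?q=%s&first=%d" % (urllib.quote_plus(query), offset)
        request = urllib2.Request(address, None, {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'})
        urlfile = urllib2.urlopen(request)
        page = urlfile.read(200000)
        urlfile.close()

        soup = BeautifulSoup(page)
        results_div = soup.find('div', id='results')
        if results_div is None:
            # No more results (or the page layout changed) -- stop paging
            break
        all_links.extend(x.find('a')['href'] for x in results_div.findAll('h3'))
    return all_links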