This is a simple Python script for Twitter that checks your friends' timeline and prints any links that have been posted. It also visits each URL, finds the actual title of the destination page, and prints that alongside the link. The script demonstrates an easy way to gather some of the hottest trends on the internet the moment they happen.
If you set up a Twitter account within a niche and follow a few of the key players in that niche, you can collect any links they post, check whether they are on topic (using some keyword heuristics), and then either notify yourself of the interesting content or automatically scrape it for use on one of your related websites. That gives you perhaps the most up-to-date content possible, before it hits Google Trends. It also gives you a chance to promote it before the social news sites find it (or to be the first to submit it to them).
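As a rough illustration of the keyword check mentioned above, here is a minimal sketch. The function name `is_on_topic`, the `keywords` list, and the `min_hits` threshold are all hypothetical and not part of the script in this post. It is written for Python 3 and only looks at whole words in the page title, so treat it as a starting point rather than a finished filter.

```python
import re


def is_on_topic(title, keywords, min_hits=1):
    """Return True if at least `min_hits` keywords appear in the title.

    `title` may be None (for pages with no <title> tag), in which case
    the link is treated as off topic. Matching is case-insensitive and
    only considers whole words.
    """
    if not title:
        return False
    # Split the title into lowercase words so "Blog" matches "blog".
    words = set(re.findall(r"[a-z0-9']+", title.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits >= min_hits


if __name__ == '__main__':
    niche = ['blog', 'seo']  # hypothetical niche keywords
    print(is_on_topic('Why Link Exchanges Are a Terrible, No-Good Idea - Food Blog Alliance', niche))
    print(is_on_topic('Gallery: Cute animals in the news this week', niche))
```

You could call something like this on each title before printing the link, and only notify yourself when it returns True.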
With a bit more work you could parse out the meta tag keywords and description, crawl the website, or find and cut out the content from the page. If it’s a blog, you could post a comment.
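Pulling out the meta keywords and description could look something like the sketch below. It assumes Python 3 and uses the standard library's `html.parser` module; the names `MetaTagParser` and `extract_meta` are made up for this example and are not part of the script in this post. It expects the page HTML as a string, which you would need to download and decode yourself.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the content of <meta name="keywords"> and
    <meta name="description"> tags (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        # Attribute values can be None, so fall back to an empty string.
        name = (attrs.get('name') or '').lower()
        content = attrs.get('content')
        if name in ('keywords', 'description') and content:
            self.meta[name] = content.strip()


def extract_meta(html):
    """Return a dict with any 'keywords' and 'description' values found."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


if __name__ == '__main__':
    sample = ('<html><head><title>T</title>'
              '<meta name="Keywords" content="seo, links">'
              '<meta name="description" content=" A page " />'
              '</head></html>')
    print(extract_meta(sample))
```

A real parser is used here instead of a regular expression because meta tags vary a lot in attribute order and quoting.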
Example Usage:
$ python TwitterLinks.py
http://bit.ly/s8rQX - Twitter Status - Tweets from users you follow may be missing from your timeline
http://bit.ly/26hiT - Why Link Exchanges Are a Terrible, No-Good Idea - Food Blog Alliance
http://FrankAndTrey.com - Frank and Trey
http://bit.ly/yPRHp - Gallery: Cute animals in the news this week
...
And here’s the Python code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/

try:
    import json
except ImportError:
    import simplejson as json  # http://undefined.org/python/#simplejson
import twitter  # http://code.google.com/p/python-twitter/

from urllib2 import urlopen
import re

SETTINGS = {'user': 'twitter user name', 'password': 'your password here'}


def listFriendsURLs(user, password):
    # Matches the first http/https URL in a status message.
    re_pattern = '.*?((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s"]*))'  # HTTP URL
    rg = re.compile(re_pattern, re.IGNORECASE | re.DOTALL)

    api = twitter.Api(user, password)
    timeline = api.GetFriendsTimeline(user)

    for status in timeline:
        m = rg.search(status.text)
        if m:
            httpurl = m.group(1)
            # Follow the link (including any URL shortener redirect)
            # and look up the title of the destination page.
            title = getTitle(httpurl)
            print httpurl, '-', title


def getTitle(url):
    # Download the page and return the text of its <title> tag,
    # or None if the page has no title.
    req = urlopen(url)
    html = req.read()

    re_pattern = '<title>(.*?)</title>'
    rg = re.compile(re_pattern, re.IGNORECASE | re.DOTALL)

    m = rg.search(html)
    if m:
        title = m.group(1)
        return title.strip()
    return None


if __name__ == '__main__':
    listFriendsURLs(SETTINGS['user'], SETTINGS['password'])