List the Links in Your Twitter Timeline with a Python Script

This is a simple Python script that checks your Twitter friends timeline and prints out any links that have been posted. In addition, it visits each URL, finds the title of the destination page and prints it alongside the link. It is an easy way to spot the links the people you follow are sharing as soon as they are posted, which makes it a quick way to catch trending content early.
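Matching the title with a regular expression works for most pages, but it leaves HTML entities such as &amp; undecoded. As a minimal sketch, assuming Python 3 (the script below is Python 2), the title lookup could instead use the standard library's html.parser. TitleParser and extract_title are hypothetical names, and downloading the page is left out so the example runs offline.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any <title> after the first one
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Title text may arrive in several chunks, so collect them all
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None


page = "<html><head><title>\n  Frank &amp; Trey\n</title></head><body></body></html>"
print(extract_title(page))  # Frank & Trey
```

Because the parser decodes entities, a title written as "Frank &amp; Trey" in the HTML comes out as "Frank & Trey".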

If you set up a Twitter account within a niche and follow a few of the key players in that niche, you can find any links they post, check whether they are on topic (using keyword matching or other heuristics), and then either notify yourself of the interesting content or automatically scrape it for use on one of your related websites. That gives you about the freshest content possible, often before it shows up in Google Trends. It also gives you a chance to promote it before the social news sites find it (or to be the first to submit it to them).
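One simple on-topic heuristic is to count how many niche keywords appear as whole words in a link's page title. The sketch below assumes Python 3, single-word keywords and a made-up keyword list; is_on_topic, min_hits and NICHE_KEYWORDS are illustrative names, not part of the script.

```python
import re


def is_on_topic(text, keywords, min_hits=1):
    """Return True if at least min_hits distinct keywords appear in text.

    Matching is case-insensitive and on whole words only, so the keyword
    "seo" does not match inside "museo". Multi-word phrases are not handled.
    """
    words = set(re.findall(r"\w+", text.lower()))
    hits = {kw for kw in keywords if kw.lower() in words}
    return len(hits) >= min_hits


NICHE_KEYWORDS = ["seo", "backlinks", "adsense", "affiliate"]  # hypothetical niche
title = "Why Link Exchanges Are a Terrible Idea for SEO and Backlinks"
print(is_on_topic(title, NICHE_KEYWORDS, min_hits=2))  # True
```

Requiring two or more hits filters out titles that mention one common word in passing, at the cost of missing some relevant links.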

With a bit more work you could parse out the meta keywords and description tags, crawl the website, or extract the main content from the page. If it's a blog, you could post a comment.
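For the meta tag idea, a Python 3 sketch (an assumption, since the original script is Python 2) that pulls the keywords and description out of an already-downloaded page might look like this. MetaParser and page_meta are hypothetical names.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs, keyed by lowercased name."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here too by default
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content")
        # Keep the first occurrence of each name
        if name and content is not None and name not in self.meta:
            self.meta[name] = content.strip()


def page_meta(html):
    """Return (list of keywords, description or None) from a page's meta tags."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    raw = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    return keywords, parser.meta.get("description")


page = ('<head><meta name="Keywords" content="twitter, python, links">'
        '<meta name="description" content="List the links in your timeline." /></head>')
print(page_meta(page))
# (['twitter', 'python', 'links'], 'List the links in your timeline.')
```

The keywords could then feed straight into an on-topic check alongside the page title.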

Example Usage:

$ python TwitterLinks.py
http://bit.ly/s8rQX - Twitter Status - Tweets from users you follow may be missing from your timeline
http://bit.ly/26hiT - Why Link Exchanges Are a Terrible, No-Good Idea - Food Blog Alliance
http://FrankAndTrey.com - Frank and Trey
http://bit.ly/yPRHp - Gallery: Cute animals in the news this week
...

And here's the Python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import twitter  # http://code.google.com/p/python-twitter/
 
import re
from urllib2 import urlopen

# Replace these with your own Twitter login details.
SETTINGS = {'user': 'your twitter user name', 'password': 'your password here'}
 
# Matches "http://" or "https://", a word character, then everything up to
# the next whitespace or double quote.
URL_PATTERN = re.compile(r'https?://\w[^\s"]*', re.IGNORECASE)

def listFriendsURLs(user, password):
    """Print every link in the friends timeline along with its page title."""
    api = twitter.Api(user, password)
    timeline = api.GetFriendsTimeline(user)

    for status in timeline:
        # findall catches every link in a tweet, not just the first one.
        for httpurl in URL_PATTERN.findall(status.text):
            title = getTitle(httpurl)
            print httpurl, '-', title
 
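TITLE_PATTERN = re.compile(r'<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)

def getTitle(url):
    """Return the <title> text of the page at url, or None."""
    try:
        # The timeout stops one slow site from stalling the whole run.
        html = urlopen(url, timeout=10).read()
    except Exception:
        # Dead links, HTTP errors and malformed URLs are common in tweets;
        # skip them instead of crashing the script.
        return None

    m = TITLE_PATTERN.search(html)
    if m:
        # Collapse the newlines and runs of spaces many titles contain.
        return ' '.join(m.group(1).split())
    return None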
 
if __name__ == '__main__':
    listFriendsURLs(SETTINGS['user'], SETTINGS['password'])
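Tweets often end a link with punctuation, and popular links get retweeted, so the same URL can show up several times in one timeline. Here is a minimal Python 3 sketch, assuming the same URL pattern idea as the script, that extracts every link from a list of status texts, trims trailing punctuation and drops duplicates. extract_links and TRAILING are hypothetical names, and the sample tweets are made up.

```python
import re

# Same idea as the script's pattern: "http://" or "https://", then a word
# character, then anything up to whitespace or a double quote.
URL_RE = re.compile(r'https?://\w[^\s"]*', re.IGNORECASE)
# Punctuation that usually belongs to the sentence rather than the link.
TRAILING = '.,;:!?)\'"'


def extract_links(texts):
    """Return every distinct URL found in texts, in first-seen order."""
    seen = set()
    links = []
    for text in texts:
        for url in URL_RE.findall(text):
            url = url.rstrip(TRAILING)  # drop sentence punctuation after a link
            if url not in seen:
                seen.add(url)
                links.append(url)
    return links


tweets = [
    "New post: http://bit.ly/26hiT. Also see http://FrankAndTrey.com!",
    "RT @someone http://bit.ly/26hiT",
]
print(extract_links(tweets))
# ['http://bit.ly/26hiT', 'http://FrankAndTrey.com']
```

Deduplicating before fetching means each page is downloaded only once. Stripping a trailing ")" will also cut a genuine closing parenthesis from some URLs (certain Wikipedia links, for example), so adjust TRAILING to taste.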