Category Archives: Software

Django Dash is a 48-hour competition where teams from all over the world build a Django-based web application. You can see the finished web applications at http://djangodash.com/. All the projects were required to be open sourced and committed to either github.com or bitbucket.org. The cool thing about that is you can see how all these web applications were built from the ground up and get a feel for how to build really compelling Django apps.

The results have now been posted.

I’m sure there are lots of little tricks for quick development buried in those repos. I know I’ll be digging through them for inspiration for future projects.

Just 2 days ago I had an idea for a website. Today I’ve launched the site. I had a lot of fun coding the web application – that is why I love using Django.

I have to say that, considering I’m getting a bit rusty with Django and had never used any of the Facebook APIs, I’m pretty impressed that I was able to put together a reasonably decent-looking and functional website in just 2 days.

Want to take a look at the web application? http://like.halotis.com is where I’ve hosted it for the time being.

The concept is pretty simple. People can post quotes, phrases, links, or jokes and then like it on Facebook. The integration with Facebook will allow the application to promote itself virally when popular “Likeables” spread through the network.

It was a pretty simple concept with a trivial data model, so most of the development time was spent fiddling with the CSS and layouts. I certainly have plenty of ideas for increasing engagement on the site going forward.
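
The data model really is that trivial. I’m not posting the app’s actual code here, but a rough sketch of what a “Likeable” model might look like in Django (field names are my own guess, not necessarily what the site uses):

from django.db import models

class Likeable(models.Model):
    """A quote, phrase, link, or joke that visitors can Like on Facebook."""
    title = models.CharField(max_length=200)
    body = models.TextField(blank=True)    # the quote/joke text
    link = models.URLField(blank=True)     # optional link to share
    created = models.DateTimeField(auto_now_add=True)

    def __unicode__(self):
        return self.title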

This may be the first of many future django facebook applications from me.

I will admit that I have trouble from time to time remembering to keep my blog active and publish frequently. Compounding the problem is that I have many tens of blogs out there that have become stale and effectively dead due to lack of attention. To help solve this problem I’ve put together a simple nagging script, scheduled to run regularly, that checks a bunch of my sites to see how fresh the content is. Once a site passes a specified threshold for the number of days since the last post, I get a nagging email reminding me to write something for it.

#!/usr/bin/env python 
# -*- coding: utf-8 -*-
# (C) 2010 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import feedparser  # available at feedparser.org
import gmail       # helper module that provides send_email(), used below to send the nag
 
import datetime
import re
 
SETTINGS = [{'name':'yourblog.com', 'url':'http://www.example.com/feed/', 'email':'YOUR EMAIL', 'frequency':'3'},
            ]
 
today = datetime.datetime.today()
 
def check_blog(blog):
 
    d = feedparser.parse(blog['url'])
    lastPost = d.entries[0]
 
    # build a datetime from the feed entry's parsed timestamp
    pubDate = datetime.datetime(*lastPost.modified_parsed[:6])
 
    if today - pubDate > datetime.timedelta(days=int(blog['frequency'])):
        print "send email - last post " + str((today-pubDate).days) + " days ago."
        gmail.send_email(blog['name'] + ' needs attention! last post ' + str((today-pubDate).days) + " days ago.", 'Please write a post', to_addr=blog['email'])
    else:
        print "good - last post " + str((today-pubDate).days) + " days ago."
 
if __name__ == '__main__':
    for blog in SETTINGS:
        check_blog(blog)

I’ve added this script to the halotis-collection on bitbucket.org if you’re interested in pulling it from there.

Bash is an incredibly powerful shell, and being proficient with it can make a massive difference in your productivity. Small tips and tricks can sometimes make a big difference in how you work. The shortcuts I’ve listed below are mostly readline functionality, so they also work in many other command-line programs. This is not a complete list, just some of my favorites.

Commands for Moving

These are the basics. The real standouts here are the word-movement commands; they can save you plenty of time compared to navigating only with the arrow keys.

  • beginning-of-line (Ctrl-a)
    Move to the start of the current line.
  • end-of-line (Ctrl-e)
    Move to the end of the line.
  • forward-char (Ctrl-f)
    Move forward a character.
  • backward-char (Ctrl-b)
    Move back a character.
  • forward-word (Meta-f)
    Move forward to the end of the next word. Words are composed of alphanumeric characters (letters and digits).
  • backward-word (Meta-b)
    Move back to the start of the current or previous word. Words are composed of alphanumeric characters (letters and digits).
  • clear-screen (Ctrl-l)
    Clear the screen leaving the current line at the top of the screen. With an argument, refresh the current line without clearing the screen.

Commands for Manipulating the History

These can be lifesavers, especially if you’re running the same or similar commands over and over. For example, to replay a sequence of commands from your history, Ctrl-o is much faster than pressing ‘up’ a bunch of times to run the first command and then ‘up’ the same number of times for each following command – use Ctrl-o, or maybe even a keyboard macro.

  • accept-line (Newline, Return)
    Accept the line regardless of where the cursor is. If this line is non-empty, add it to the history list according to the state of the HISTCONTROL variable. If the line is a modified history line, then restore the
    history line to its original state.
  • previous-history (Ctrl-p)
    Fetch the previous command from the history list, moving back in the list.
  • next-history (Ctrl-n)
    Fetch the next command from the history list, moving forward in the list.
  • beginning-of-history (Meta-< )
    Move to the first line in the history.
  • end-of-history (Meta->)
    Move to the end of the input history, i.e., the line currently being entered.
  • reverse-search-history (Ctrl-r)
    Search backward starting at the current line and moving `up’ through the history as necessary. This is an incremental search.
  • forward-search-history (Ctrl-s)
    Search forward starting at the current line and moving `down’ through the history as necessary. This is an incremental search.
  • yank-nth-arg (Meta-Ctrl-y)
    Insert the first argument to the previous command (usually the second word on the previous line) at point. With an argument n, insert the nth word from the previous command (the words in the previous command begin
    with word 0). A negative argument inserts the nth word from the end of the previous command. Once the argument n is computed, the argument is extracted as if the “!n” history expansion had been specified.
  • yank-last-arg (Meta-., Meta-_)
    Insert the last argument to the previous command (the last word of the previous history entry). With an argument, behave exactly like yank-nth-arg. Successive calls to yank-last-arg move back through the history
    list, inserting the last argument of each line in turn. The history expansion facilities are used to extract the last argument, as if the “!$” history expansion had been specified.
  • shell-expand-line (Meta-Ctrl-e)
    Expand the line as the shell does. This performs alias and history expansion as well as all of the shell word expansions. See HISTORY EXPANSION below for a description of history expansion.
  • history-expand-line (Meta-^)
    Perform history expansion on the current line. See HISTORY EXPANSION below for a description of history expansion.
  • insert-last-argument (Meta-., Meta-_)
    A synonym for yank-last-arg.
  • operate-and-get-next (Ctrl-o)
    Accept the current line for execution and fetch the next line relative to the current line from the history for editing. Any argument is ignored.
  • edit-and-execute-command (Ctrl-xCtrl-e)
    Invoke an editor on the current command line, and execute the result as shell commands. Bash attempts to invoke $VISUAL, $EDITOR, and emacs as the editor, in that order.

Commands for Changing Text

  • delete-char (Ctrl-d)
    Delete the character at point. If point is at the beginning of the line, there are no characters in the line, and the last character typed was not bound to delete-char, then return EOF.
  • quoted-insert (Ctrl-q, Ctrl-v)
    Add the next character typed to the line verbatim. This is how to insert characters like Ctrl-q, for example.
  • tab-insert (Ctrl-v TAB)
    Insert a tab character.
  • transpose-chars (Ctrl-t)
    Drag the character before point forward over the character at point, moving point forward as well. If point is at the end of the line, then this transposes the two characters before point. Negative arguments have no
    effect.
  • transpose-words (Meta-t)
    Drag the word before point past the word after point, moving point over that word as well. If point is at the end of the line, this transposes the last two words on the line.
  • upcase-word (Meta-u)
    Uppercase the current (or following) word. With a negative argument, uppercase the previous word, but do not move point.
  • downcase-word (Meta-l)
    Lowercase the current (or following) word. With a negative argument, lowercase the previous word, but do not move point.
  • capitalize-word (Meta-c)
    Capitalize the current (or following) word. With a negative argument, capitalize the previous word, but do not move point.

Killing and Yanking

Killing and yanking can be a tremendous time saver over copy/paste with the mouse.

  • kill-line (Ctrl-k)
    Kill the text from point to the end of the line.
  • backward-kill-line (Ctrl-x Backspace)
    Kill backward to the beginning of the line.
  • unix-line-discard (Ctrl-u)
    Kill backward from point to the beginning of the line. The killed text is saved on the kill-ring.
  • kill-word (Meta-d)
    Kill from point to the end of the current word, or if between words, to the end of the next word. Word boundaries are the same as those used by forward-word.
  • backward-kill-word (Meta-Backspace)
    Kill the word behind point. Word boundaries are the same as those used by backward-word.
  • shell-kill-word (Meta-d)
    Kill from point to the end of the current word, or if between words, to the end of the next word. Word boundaries are the same as those used by shell-forward-word.
  • shell-backward-kill-word (Meta-Backspace)
    Kill the word behind point. Word boundaries are the same as those used by shell-backward-word.
  • unix-word-rubout (Ctrl-w)
    Kill the word behind point, using white space as a word boundary. The killed text is saved on the kill-ring.
  • delete-horizontal-space (Meta-\)
    Delete all spaces and tabs around point.

Completing

There are some powerful completing shortcuts.

  • complete (TAB)
    Attempt to perform completion on the text before point. Bash attempts completion treating the text as a variable (if the text begins with $), username (if the text begins with ~), hostname (if the text begins with
    @), or command (including aliases and functions) in turn. If none of these produces a match, filename completion is attempted.
  • possible-completions (Meta-?)
    List the possible completions of the text before point.
  • insert-completions (Meta-*)
    Insert all completions of the text before point that would have been generated by possible-completions.
  • complete-filename (Meta-/)
    Attempt filename completion on the text before point.
  • possible-filename-completions (Ctrl-x /)
    List the possible completions of the text before point, treating it as a filename.
  • complete-username (Meta-~)
    Attempt completion on the text before point, treating it as a username.
  • possible-username-completions (Ctrl-x ~)
    List the possible completions of the text before point, treating it as a username.
  • complete-variable (Meta-$)
    Attempt completion on the text before point, treating it as a shell variable.
  • possible-variable-completions (Ctrl-x $)
    List the possible completions of the text before point, treating it as a shell variable.
  • complete-hostname (Meta-@)
    Attempt completion on the text before point, treating it as a hostname.
  • possible-hostname-completions (Ctrl-x @)
    List the possible completions of the text before point, treating it as a hostname.
  • complete-command (Meta-!)
    Attempt completion on the text before point, treating it as a command name. Command completion attempts to match the text against aliases, reserved words, shell functions, shell builtins, and finally executable file‐
    names, in that order.
  • possible-command-completions (Ctrl-x !)
    List the possible completions of the text before point, treating it as a command name.
  • dynamic-complete-history (Meta-TAB)
    Attempt completion on the text before point, comparing the text against lines from the history list for possible completion matches.
  • complete-into-braces (Meta-{)
    Perform filename completion and insert the list of possible completions enclosed within braces so the list is available to the shell (see Brace Expansion above).

Keyboard Macros

These can be useful if you’re running the same few commands over and over. For example, when I’m working in my IDE and then want to run some tests, I can quickly create a macro the first time I run my couple of commands to clean, build, and run the tests. When I want to run that sequence again it’s very quick, and doesn’t require hunting/searching through the history.

  • start-kbd-macro (Ctrl-x ()
    Begin saving the characters typed into the current keyboard macro.
  • end-kbd-macro (Ctrl-x ))
    Stop saving the characters typed into the current keyboard macro and store the definition.
  • call-last-kbd-macro (Ctrl-x e)
    Re-execute the last keyboard macro defined, by making the characters in the macro appear as if typed at the keyboard.

Miscellaneous

  • prefix-meta (ESC)
    Metafy the next character typed. ESC f is equivalent to Meta-f.
  • undo (Ctrl-_, Ctrl-x Ctrl-u)
    Incremental undo, separately remembered for each line.
  • tilde-expand (Meta-&)
    Perform tilde expansion on the current word.
  • set-mark (Ctrl-@, Meta-Space)
    Set the mark to the point. If a numeric argument is supplied, the mark is set to that position.
  • exchange-point-and-mark (Ctrl-x Ctrl-x)
    Swap the point with the mark. The current cursor position is set to the saved position, and the old cursor position is saved as the mark.
  • character-search (Ctrl-])
    A character is read and point is moved to the next occurrence of that character. A negative count searches for previous occurrences.
  • character-search-backward (Meta-Ctrl-])
    A character is read and point is moved to the previous occurrence of that character. A negative count searches for subsequent occurrences.
  • insert-comment (Meta-#)
    Without a numeric argument, the value of the readline comment-begin variable is inserted at the beginning of the current line. If a numeric argument is supplied, this command acts as a toggle: if the characters at
    the beginning of the line do not match the value of comment-begin, the value is inserted, otherwise the characters in comment-begin are deleted from the beginning of the line. In either case, the line is accepted as
    if a newline had been typed. The default value of comment-begin causes this command to make the current line a shell comment. If a numeric argument causes the comment character to be removed, the line will be exe‐
    cuted by the shell.
  • display-shell-version (Ctrl-x Ctrl-v)
    Display version information about the current instance of bash.

I’ve been watching a lot of Holmes Inspection on HGTV lately. If you’re not familiar with the show, Mike Holmes is a general contractor who goes into a home that has some problems – usually bad renovations have left it structurally unsound, with mold problems, overloaded electrical panels, and a long list of things not built to code. Holmes and his team then come in to rebuild the house right.

One of the interesting dynamics you see on the show is the teacher-apprentice relationship between Mike Holmes and his lead contractor Damon. Mike, being the expert, will assess the house first and then watch Damon assess it, quizzing him along the way. The tone of those conversations makes it clear there is a trusting relationship between the two of them, and over the seasons you can see Damon grow more assertive and confident in his skills.

Construction is a lot like programming. A developer has to write code so that it is stable and structured to stay that way in the future. Rebuilding a house is akin to refactoring a code base – many of the problems are easy to identify, and while the solutions sometimes require creativity, they generally follow patterns and best practices. Initial assessments rarely reveal the full extent of a house’s problems, which leads to time and cost overruns – typical of software development as well.

With these similarities, perhaps it makes sense to model programming education after the more established and historically proven teacher-apprentice style used in construction. This would allow beginner and intermediate programmers to continue learning once they enter the job market. Constructive code review from peers is really just a start. Working in an environment with a good mix of senior and junior developers, where you are challenged to explain and defend your code and design choices, is how to create a truly great development team.

Will the industry naturally move in this direction as it matures, or is it doomed because companies are only interested in hiring the cheapest entry-level programmers and only giving them time to push out code?

I have been doing a lot of web development work lately. Mostly learning about how different people create their workflows and manage local development, testing, staging, and production deployment of code.

In the past I have used Apache Ant for deploying Java applications. It is a bit cumbersome. Ant uses XML config files, which become limiting once you try to do something non-standard and can require writing custom Java code to create new directives. The resulting XML is not always easy to read.

For the last few days I have been using Fabric to write a few simple deploy scripts and I think this is a much nicer way of doing it. You get the full power of Python but a very simple syntax and easy command line usage.

Here’s a very simple deploy script that I am using to deploy some static files to my web server.

from fabric.api import *
 
# Fabric 0.9.0 compatible
# usage: $ fab prod deploy
 
REMOTE_HG_PATH = '/home/halotis/bin/hg'
 
def prod():
    """Set the target to production."""
    env.user = 'USERNAME'
    env.hosts = ['USERNAME.webfactional.com']
    env.remote_app_dir = 'webapps/APPLICATION'
    env.remote_push_dest = 'ssh://USERNAME@USERNAME.webfactional.com/%s' % env.remote_app_dir
    env.tag = 'production'
 
 
def deploy():
    """Deploy the site.
 
    This will tag the repository, and push changes to the remote location.
    """
    require('hosts', provided_by=[prod, ])
    require('remote_app_dir', provided_by=[prod, ])
    require('remote_push_dest', provided_by=[prod, ])
    require('tag', provided_by=[prod, ])
 
    local("hg tag --force %s" % env.tag)
    local("hg push %s --remotecmd %s" % (env.remote_push_dest, REMOTE_HG_PATH))
    run("cd %s; hg update -C %s" % (env.remote_app_dir, env.tag))

For this to work, though, you need to have a few things set up:

  • SSH access to the remote server
  • Mercurial (hg) installed on the remote server and on your development machine
  • A bootstrapped remote repository – FTP the .hg folder to the destination location
  • Fabric installed on your local development machine – $ pip install fabric

Find out more about Fabric from the official site.
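
One nice thing about this layout is that adding another deployment target is just another environment function. As a sketch (made-up host names, and you’d also add the new function to the provided_by lists in deploy()), a staging target could look like this:

def staging():
    """Set the target to staging."""
    env.user = 'USERNAME'
    env.hosts = ['staging.example.com']
    env.remote_app_dir = 'webapps/APPLICATION_staging'
    env.remote_push_dest = 'ssh://USERNAME@staging.example.com/%s' % env.remote_app_dir
    env.tag = 'staging'

Deploying to staging would then just be: $ fab staging deploy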

I have been hard at work testing out different approaches to Adwords. One of the keys is that I’m scripting a lot of the management of campaigns, ad groups, keywords, and ads. The Adwords API could be used, but it has a cost that would be significant for the size of my campaigns, so I have been using the Adwords Editor to help manage everything. What makes it excellent is that the tool can import and export campaigns to and from CSV files, which makes it pretty simple to play with the data.

To get a file that this script will work with, go to the File menu in the Google Adwords Editor, select “Export to CSV”, and then “Export Selected Campaigns”. It will write out a CSV file.

This Python script will read those output csv files into a Python data structure which you can then manipulate and write back out to a file.

With the file modified you can then use the Adwords Editor’s “Import CSV” facility to get your changes back into the Editor and then uploaded to Adwords.

Being able to pull this data into Python, modify it, and then get it back into Adwords means that I can do a lot of really neat things:

  • Create massive campaigns with a large number of targeted ads
  • Invent bidding strategies that act individually at the keyword level
  • Automate some of the management
  • Pull in statistics from CPA networks to calculate ROIs
  • Convert text ads into image ads

Here’s the script:

#!/usr/bin/env python
# coding=utf-8
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
"""
read and write exported campaigns from Adwords Editor
 
"""
 
import codecs
import csv
 
FIELDS = ['Campaign', 'Campaign Daily Budget', 'Languages', 'Geo Targeting', 'Ad Group', 'Max CPC', 'Max Content CPC', 'Placement Max CPC', 'Max CPM', 'Max CPA', 'Keyword', 'Keyword Type', 'First Page CPC', 'Quality Score', 'Headline', 'Description Line 1', 'Description Line 2', 'Display URL', 'Destination URL', 'Campaign Status', 'AdGroup Status', 'Creative Status', 'Keyword Status', 'Suggested Changes', 'Comment', 'Impressions', 'Clicks', 'CTR', 'Avg CPC', 'Avg CPM', 'Cost', 'Avg Position', 'Conversions (1-per-click)', 'Conversion Rate (1-per-click)', 'Cost/Conversion (1-per-click)', 'Conversions (many-per-click)', 'Conversion Rate (many-per-click)', 'Cost/Conversion (many-per-click)']
 
def readAdwordsExport(filename):
 
    campaigns = {}
 
    f = codecs.open(filename, 'r', 'utf-16')
    reader = csv.DictReader(f, delimiter='\t')
 
    for row in reader:
        #remove empty values from dict
        row = dict((i, j) for i, j in row.items() if j!='' and j != None)
        if row.has_key('Campaign Daily Budget'):  # campain level settings
            campaigns[row['Campaign']] = {}
            for k,v in row.items():
                campaigns[row['Campaign']][k] = v
        if row.has_key('Max Content CPC'):  # AdGroup level settings
            if not campaigns[row['Campaign']].has_key('Ad Groups'):
                campaigns[row['Campaign']]['Ad Groups'] = {}
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']] = row
        if row.has_key('Keyword'):  # keyword level settings
            if not campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']].has_key('keywords'):
                campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['keywords'] = []
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['keywords'].append(row)
        if row.has_key('Headline'):  # ad level settings
            if not campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']].has_key('ads'):
                campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['ads'] = []
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['ads'].append(row)
    return campaigns
 
def writeAdwordsExport(data, filename):
    f = codecs.open(filename, 'w', 'utf-16')
    writer = csv.DictWriter(f, FIELDS, delimiter='\t')
    writer.writerow(dict(zip(FIELDS, FIELDS)))
    for campaign, d in data.items():
        writer.writerow(dict((i,j) for i, j in d.items() if i != 'Ad Groups'))
        for adgroup, ag in d['Ad Groups'].items():
            writer.writerow(dict((i,j) for i, j in ag.items() if i != 'keywords' and i != 'ads'))
            for keyword in ag['keywords']:
                writer.writerow(keyword)            
            for ad in ag['ads']:
                writer.writerow(ad)
    f.close()
 
if __name__=='__main__':
    data = readAdwordsExport('export.csv')
    print 'Campaigns:'
    print data.keys()
    writeAdwordsExport(data, 'output.csv')
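
As a quick illustration of the kind of thing this enables, here’s a sketch (hypothetical file names and multiplier) that raises the Max CPC bid on every keyword by 10% and writes the file back out for re-import:

data = readAdwordsExport('export.csv')
for campaign in data.values():
    for adgroup in campaign.get('Ad Groups', {}).values():
        for keyword in adgroup.get('keywords', []):
            if 'Max CPC' in keyword:
                # bump the keyword-level bid by 10%
                keyword['Max CPC'] = '%.2f' % (float(keyword['Max CPC']) * 1.1)
writeAdwordsExport(data, 'adjusted.csv')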

This code is available in my public repository: http://bitbucket.org/halotis/halotis-collection/

This is more of a helpful snippet than a useful program, but it can sometimes be useful to have some user agent strings handy for web scraping.

Some websites check the user agent string and will filter the results of a request. It’s a very simple way to prevent automated scraping, but it is also very easy to get around. The user agent can also be checked by spam filters to help detect automated posting.

A great resource for finding and understanding what user agent strings mean is UserAgentString.com.

This simple snippet uses a file containing the list of user agent strings that you want to use. It sources that file and returns a random entry from the list.

Here’s my source file UserAgents.txt:

Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20090913 Firefox/3.5.3
Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.1) Gecko/20090718 Firefox/3.5.1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.1 (KHTML, like Gecko) Chrome/4.0.219.6 Safari/532.1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Win64; x64; Trident/4.0)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 2.0.50727; InfoPath.2)Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)
Mozilla/4.0 (compatible; MSIE 6.1; Windows XP)

And here is the python code that makes getting a random agent very simple:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import random
 
SOURCE_FILE='UserAgents.txt'
 
def get():
    f = open(SOURCE_FILE)
    agents = f.readlines()
 
    return random.choice(agents).strip()
 
def getAll():
    f = open(SOURCE_FILE)
    agents = f.readlines()
    return [a.strip() for a in agents]
 
if __name__=='__main__':
    agents = getAll()
    for agent in agents:
        print agent

You can grab the source code for this along with my other scripts from the bitbucket repository.
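
As a quick usage sketch (assuming the snippet above is saved as useragents.py – the module name is my choice), dropping a random agent into a urllib2 request looks like this:

import urllib2

import useragents  # the snippet above, saved as useragents.py

request = urllib2.Request('http://www.example.com/',
                          headers={'User-Agent': useragents.get()})
html = urllib2.urlopen(request).read()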

Digg is by far the most popular social news site on the internet. With its simple “thumbs up” system, the users of the site promote the most interesting and highest quality stories, and the best of those make it to the front page. What you end up with is a filtered view of the most interesting stuff.

It’s a great site and one that I visit every day.

I wanted to write a script that makes use of the search feature on Digg so that I could scrape out and re-purpose the best stuff to use elsewhere. The first step in writing that larger (top secret) program was to start with a scraper for Digg search.

The short Python script I came up with returns the search results from Digg in a standard Python data structure, so it’s simple to use. It parses out the title, destination, comment count, Digg link, Digg count, and summary for the top 100 search results.

You can perform advanced searches on digg by using a number of different flags:

  • +b Add to see buried stories
  • +p Add to see only promoted stories
  • +np Add to see only unpromoted stories
  • +u Add to see only upcoming stories
  • Put terms in “quotes” for an exact search
  • -d Remove the domain from the search
  • Add -term to exclude a term from your query (e.g. apple -iphone)
  • Begin your query with site: to only display stories from that URL.

This script also allows the search results to be sorted:

from DiggSearch import digg_search
digg_search('twitter', sort='newest')  #sort by newest first
digg_search('twitter', sort='digg')  # sort by number of diggs
digg_search('twitter -d')  # sort by best match
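
The results come back as a list of plain dictionaries, so post-processing is easy. For example, a quick sketch that keeps only the heavily dugg stories (note that digg_count is scraped as a string, so it needs converting):

results = digg_search('twitter', sort='digg')
popular = [r for r in results if int(r['digg_count'].replace(',', '')) > 100]
for r in popular:
    print r['title'], '->', r['destination']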

Here’s the Python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import urllib,urllib2
import re
 
from BeautifulSoup import BeautifulSoup
 
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
 
def remove_extra_spaces(data):
    p = re.compile(r'\s+')
    return p.sub(' ', data)
 
def digg_search(query, sort=None, pages=10):
    """Returns a list of the information I need from a digg query
    sort can be one of [None, 'digg', 'newest']
    """
 
    digg_results = []
    for page in range(1, pages+1):
 
        #create the URL
        address = "http://digg.com/search?s=%s" % (urllib.quote_plus(query))
        if sort:
            address = address + '&sort=' + sort
        if page > 1:
            address = address + '&page=' + str(page)
 
        # GET the page
        request = urllib2.Request(address, None, {'User-Agent': USER_AGENT})
        urlfile = urllib2.urlopen(request)
        html = urlfile.read(200000)  # read at most 200KB
        urlfile.close()

        # scrape it
        soup = BeautifulSoup(html)
        links = soup.findAll('h3', id=re.compile("title\d"))
        comments = soup.findAll('a', attrs={'class':'tool comments'})
        diggs = soup.findAll('strong', id=re.compile("diggs-strong-\d"))
        body = soup.findAll('a', attrs={'class':'body'})
        for i in range(0,len(links)):
            item = {'title':remove_extra_spaces(' '.join(links[i].findAll(text=True))).strip(), 
                    'destination':links[i].find('a')['href'],
                    'comment_count':int(comments[i].string.split()[0]),
                    'digg_link':comments[i]['href'],
                    'digg_count':diggs[i].string,
                    'summary':body[i].find(text=True)
                    }
            digg_results.append(item)
 
        #last page early exit
        if len(links) < 10:
            break
 
    return digg_results
 
if __name__=='__main__':
    #for testing
    results = digg_search('twitter -d', 'digg', 2)
    for r in results:
        print r

You can grab the source code from the bitbucket repository.

Have you ever wanted to track and assess your SEO efforts by seeing how they change your position in Google’s organic SERP? With this script you can now track and chart your position for any number of search queries and find the position of the site/page you are trying to rank.

This will allow you to visually identify any target keyword phrases that are doing well, and which ones may need some more SEO work.

This python script has a number of different components.

  • SEOCheckConfig.py is used to add new target search queries to the database.
  • SEOCheck.py searches Google and saves the best position found (within the top 100 results).
  • SEOCheckCharting.py graphs all the results.

The charts produced look like this:

[Chart: SEOCheck keyword positions plotted over time]

The main part of the script is SEOCheck.py. This script should be scheduled to run regularly (I have mine running 3 times per day on my webfaction hosting account).

For a small SEO consultancy business this type of application generates the feedback and reports that you should be using to communicate with your clients. It identifies where the efforts should go and how successful you have been.

To use this set of scripts you will first need to edit and run the SEOCheckConfig.py file. Add the queries and domains that you’d like to check to the SETTINGS variable, then run the script to load them into the database.
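
The actual format of SETTINGS isn’t shown here, but based on the description (a search query plus the domain you want to rank for it), a hypothetical entry might look something like this:

SETTINGS = [
    {'query': 'seo position tracking script', 'domain': 'halotis.com'},
    {'query': 'python serp checker',          'domain': 'halotis.com'},
]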

Then schedule SEOCheck.py to run periodically. On Windows you can do that using Scheduled Tasks:
[Screenshot: Windows Scheduled Task dialog]

On either Mac OSX or Linux you can use crontab to schedule it.
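
For example, a crontab entry that runs the check three times a day might look like this (the path to the script is of course a placeholder):

0 */8 * * * python /path/to/SEOCheck.py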

To generate the Chart simply run the SEOCheckCharting.py script. It will plot all the results on one graph.

You can find and download all the source code for this in the HalOtis-Collection on bitbucket. It requires BeautifulSoup, matplotlib, and sqlalchemy libraries to be installed.