Random User Agent String Python Script

This is more of a helpful snippit than a useful program but it can sometimes be useful to have some user agent strings handy for web scraping.

Some websites check the user agent string and will filter the results of a request. It’s a very simple way to prevent automated scraping. But it is very easy to get around. The user agent can also be checked by spam filters to help detect automated posting.

A great resource for finding and understanding what user agent strings mean is UserAgentString.com.

This simple snippit uses a file containing the list of user agent strings that you want to use. It can very simply source that file and return a random one from the list.

Here’s my source file UserAgents.txt:

Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20090913 Firefox/3.5.3
Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.1) Gecko/20090718 Firefox/3.5.1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.1 (KHTML, like Gecko) Chrome/4.0.219.6 Safari/532.1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Win64; x64; Trident/4.0)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 2.0.50727; InfoPath.2)Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)
Mozilla/4.0 (compatible; MSIE 6.1; Windows XP)

And here is the python code that makes getting a random agent very simple:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import random
 
SOURCE_FILE='UserAgents.txt'
 
def get():
    f = open(SOURCE_FILE)
    agents = f.readlines()
 
    return random.choice(agents).strip()
 
def getAll():
    f = open(SOURCE_FILE)
    agents = f.readlines()
    return [a.strip() for a in agents]
 
if __name__=='__main__':
    agents = getAll()
    for agent in agents:
        print agent

You can grab the source code for this along with my other scripts from the bitbucket repository.