Category Archives: Internet

Neverblue is a CPA network that I have found to be one of the better ones out there. If you are not familiar with the CPA side of internet marketing, it’s where you get paid for each person you refer who performs a certain action (CPA = Cost Per Action). The action could be anything from providing a ZIP code or email address to purchasing a sample. The marketer who promotes the offer can get quite a good payout, anywhere from $0.05 to $50+ per action.

Marketers find offers to promote through networks like Neverblue, which acts as the middleman: it finds and pays the marketers, and it finds businesses with offers for them to promote.

Neverblue is unique in that it programs its own platform and has developed some nice APIs and interfaces for getting your performance and tracking statistics programmatically. I promote a bunch of their offers and make a decent amount of money through them, so I thought I should write a script that downloads my statistics, stores them somewhere, and meshes them with my PPC data to calculate return on investment numbers per keyword.
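To illustrate the ROI-per-keyword idea, here is a minimal sketch (written for Python 3) that joins CPA payouts with PPC costs by keyword. It assumes both data sources have already been reduced to a keyword to dollars mapping; the `roi_by_keyword` helper, the keywords, and the amounts are all hypothetical. ROI is computed as (revenue - cost) / cost.

```python
from decimal import Decimal

def roi_by_keyword(payouts, costs):
    """Return {keyword: ROI} where ROI = (revenue - cost) / cost.

    payouts: hypothetical {keyword: CPA revenue} mapping
    costs:   hypothetical {keyword: PPC spend} mapping
    Keywords with zero spend are skipped to avoid dividing by zero.
    """
    result = {}
    for keyword, cost in costs.items():
        if cost == 0:
            continue
        revenue = payouts.get(keyword, Decimal("0"))  # no conversions means no revenue
        result[keyword] = (revenue - cost) / cost
    return result

# Hypothetical example data, not real Neverblue or PPC figures
payouts = {"free sample": Decimal("40.00"), "zip code offer": Decimal("5.00")}
costs = {"free sample": Decimal("25.00"), "zip code offer": Decimal("10.00"),
         "coupon": Decimal("3.00")}

for keyword, roi in sorted(roi_by_keyword(payouts, costs).items()):
    print(keyword, f"{roi:.0%}")
```

Using `Decimal` keeps currency arithmetic exact; a keyword that cost money but earned nothing shows up as -100%.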

Getting data from Neverblue is a three-step process:

  1. Request a report to be generated
  2. Wait for that report to finish
  3. Request the results of the report

This is a bit more complex than most download processes, but it is a pretty flexible way to request bigger datasets without the HTTP request timing out.
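The schedule, wait, fetch pattern can be sketched independently of any HTTP library. This minimal Python 3 version takes three callables (hypothetical stand-ins for the report-schedule, report-status, and download requests that the script below makes) and polls until the status reads `completed`, giving up after a fixed number of checks. The function name, retry count, and delay are assumptions, not part of Neverblue’s API.

```python
import time

def run_report(schedule, get_status, fetch, retries=10, delay=2.0, sleep=time.sleep):
    """Schedule a report, poll until it is 'completed', then fetch it.

    schedule()      -> job id          (hypothetical stand-in for reportSchedule)
    get_status(job) -> status string   (hypothetical stand-in for reportStatus)
    fetch(job)      -> report contents (hypothetical stand-in for the download step)
    Raises TimeoutError if the report is not ready after `retries` checks.
    """
    job = schedule()
    for _ in range(retries):
        if get_status(job) == "completed":
            return fetch(job)
        sleep(delay)  # wait before polling again so we don't hammer the server
    raise TimeoutError(f"report {job} not completed after {retries} checks")

# Offline demo with fake callables: the job completes on the third status check.
statuses = iter(["queued", "running", "completed"])
report = run_report(
    schedule=lambda: "job-1",
    get_status=lambda job: next(statuses),
    fetch=lambda job: "Date,Payout\n2009-08-20,40.00",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(report)
```

Passing `sleep` in as a parameter makes the loop easy to test without actually waiting.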

So here’s a short Python script I wrote based on Neverblue’s sample PHP script. It just prints out the payout information for yesterday.

Example Usage:

$ python
2009-08-20 $40.00

Here’s the Python code that gets the statistics from neverblue:

#!/usr/bin/env python
# encoding: utf-8
"""
Created by Matt Warren on 2009-08-12.
Copyright (c) 2009 All rights reserved.
"""
import urllib2
import time
import csv
import os
from urllib import urlencode
try:
    from xml.etree import ElementTree
except ImportError:
    from elementtree import ElementTree
username='Your Neverblue login (email)'
password='Your Neverblue password'
url = ''
schedule_url = url + 'reportSchedule/'
status_url   = url + 'reportStatus/'
download_url = url + 'reportDownloadUrl/'
REALM = ''
SERVER_RETRIES = 30  # how many times to check the report status (example value)
SERVER_DELAY = 2     # seconds to wait between status checks (example value)
def install_opener():
    # create a password manager
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    # Add the username and password.
    password_mgr.add_password(REALM, url, username, password)
    handler = urllib2.HTTPBasicAuthHandler(password_mgr)
    # create "opener" (OpenerDirector instance)
    opener = urllib2.build_opener(handler)
    # Install the opener.
    # Now all calls to urllib2.urlopen use our opener.
    urllib2.install_opener(opener)
def request_report():
    # step 1: ask the server to generate yesterday's report
    params={'type':'date', 'relativeDate':'yesterday', 'campaign':0}
    req = urllib2.Request(schedule_url + '?' + urlencode(params))
    handle = urllib2.urlopen(req)
    xml = handle.read()
    tree = ElementTree.fromstring(xml)
    # parse the reportJob code from the XML
    reportJob = tree.find('reportJob').text
    return reportJob
def check_status(reportJob):
    # step 2: poll until the report is completed or we run out of retries
    params = {'reportJob': reportJob}
    for i in range(0, SERVER_RETRIES):
        req = urllib2.Request(status_url + '?' + urlencode(params))
        handle = urllib2.urlopen(req)
        xml = handle.read()
        tree = ElementTree.fromstring(xml)
        reportStatus = tree.find('reportStatus').text
        if reportStatus == 'completed':
            break
        time.sleep(SERVER_DELAY)
    return reportStatus
def get_results(reportJob):
    # step 3: look up the download URL of the finished report, then fetch the CSV
    params = {'reportJob':reportJob, 'format':'csv'}
    req = urllib2.Request(download_url + '?' + urlencode(params))
    handle = urllib2.urlopen(req)
    xml = handle.read()
    tree = ElementTree.fromstring(xml)
    downloadURL = tree.find('downloadUrl').text
    report = urllib2.urlopen(downloadURL).read()
    reader = csv.DictReader( report.split( '\n' ) )
    for row in reader:
        print row['Date'], row['Payout']
if __name__=='__main__':
    install_opener()
    reportJob = request_report()
    reportStatus = check_status(reportJob)
    if reportStatus == 'completed':
        get_results(reportJob)

If you’re interested in trying to make money with CPA offers, I highly recommend using Neverblue. It has some really profitable offers and is probably the most advanced platform for international offers out there right now.

Amazon has a very comprehensive associate program that allows you to promote just about anything imaginable for any niche and earn a commission on anything you refer. The size of the catalog is what makes Amazon such a great program, and people make good money promoting Amazon products.

There is a great Python library called boto for accessing Amazon’s other web services, such as S3 and EC2. However, it doesn’t support the Product Advertising API.

With the Product Advertising API you have access to everything that you can read on the Amazon site about each product. This includes the product description, images, editorial reviews, customer reviews, and ratings. That is a lot of great information you could easily put to good use on your websites.

So how do you get at this information from within a Python program? The complicated part is dealing with the request authentication that Amazon has put in place. To make that a bit easier, I used the connection component from boto.
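For the curious, here is a minimal Python 3 sketch of what that authentication involves, in the style of AWS Signature Version 2 (which the script below selects with `SignatureVersion = '2'`): sort the query parameters, percent-encode them per RFC 3986, sign the string `GET\nhost\npath\nquery` with HMAC-SHA256 using the secret key, and Base64-encode the digest. The `sign_v2` helper, the host, the keys, and the parameter values are placeholders; this illustrates the scheme and is not boto’s implementation.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign_v2(params, secret_key, host, path="/onca/xml", verb="GET"):
    """Return a canonical query string with a Signature parameter appended.

    Parameters are sorted by name, keys and values are percent-encoded
    (only A-Z a-z 0-9 - _ . ~ left as-is), and the string-to-sign is the
    verb, host, path, and query joined by newlines.
    """
    canonical = "&".join(
        f"{quote(str(k), safe='-_.~')}={quote(str(v), safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    string_to_sign = "\n".join([verb, host.lower(), path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return canonical + "&Signature=" + quote(signature, safe="-_.~")

# Placeholder values only; a real request needs your own keys and the API host.
params = {
    "Service": "AWSECommerceService",
    "Operation": "ItemSearch",
    "AWSAccessKeyId": "YOUR_ACCESS_KEY_ID",
    "Timestamp": "2009-08-17T12:00:00Z",
}
print(sign_v2(params, "YOUR_SECRET_KEY", "example.com"))
```

Because the server rebuilds the same canonical string and compares signatures, any difference in parameter order or encoding makes the request fail, which is why the sorting and encoding rules matter.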

Here’s a demonstration snippet of code that will print out the top 10 best-selling books on Amazon right now.

Example Usage:

$ python
Glenn Beck's Common Sense: The Case Against an Out-of-Control Government, Inspired by Thomas Paine by Glenn Beck
Culture of Corruption: Obama and His Team of Tax Cheats, Crooks, and Cronies by Michelle Malkin
The Angel Experiment (Maximum Ride, Book 1) by James Patterson
The Time Traveler's Wife by Audrey Niffenegger
The Help by Kathryn Stockett
South of Broad by Pat Conroy
Paranoia by Joseph Finder
The Girl Who Played with Fire by Stieg Larsson
The Shack [With Headphones] (Playaway Adult Nonfiction) by William P. Young
The Girl with the Dragon Tattoo by Stieg Larsson

To use this code you’ll need an Amazon Associates account, and you’ll need to fill in the access keys and associate tag used for authentication.

Product Advertising API Python code:

#!/usr/bin/env python
# encoding: utf-8
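"""
Created by Matt Warren on 2009-08-17.
Copyright (c) 2009 All rights reserved.
"""
import time
import urllib
try:
    from xml.etree import ElementTree as ET
except ImportError:
    from elementtree import ElementTree as ET
from boto.connection import AWSQueryConnection
# fill in your own credentials and associate tag
AWS_ACCESS_KEY_ID = 'YOUR_ACCESS_KEY_ID'
AWS_SECRET_ACCESS_KEY = 'YOUR_SECRET_ACCESS_KEY'
AWS_ASSOCIATE_TAG = 'YOUR_ASSOCIATE_TAG'
AWS_HOST = ''  # Product Advertising API host name
def amazon_top_for_category(browseNodeId):
    aws_conn = AWSQueryConnection(
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY, is_secure=False,
        host=AWS_HOST)
    aws_conn.SignatureVersion = '2'
    # ItemSearch within one browse node, sorted by sales rank
    params = dict(
        Service='AWSECommerceService',
        Operation='ItemSearch',
        AWSAccessKeyId=AWS_ACCESS_KEY_ID,
        AssociateTag=AWS_ASSOCIATE_TAG,
        SearchIndex='Books',
        BrowseNode=browseNodeId,
        Sort='salesrank',
        ResponseGroup='ItemAttributes',
        Timestamp=time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()))
    verb = 'GET'
    path = '/onca/xml'
    # let boto compute the query string and the request signature
    qs, signature = aws_conn.get_signature(params, verb, path)
    qs = path + '?' + qs + '&Signature=' + urllib.quote(signature)
    response = aws_conn._mexe(verb, qs, None, headers={})
    tree = ET.fromstring(response.read())
    # the response uses an XML namespace; pull it out of the root tag
    NS = tree.tag.split('}')[0][1:]
    for item in tree.find('{%s}Items'%NS).findall('{%s}Item'%NS):
        attributes = item.find('{%s}ItemAttributes'%NS)
        title = attributes.findtext('{%s}Title'%NS)
        author = attributes.findtext('{%s}Author'%NS)
        print title, 'by', author
if __name__ == '__main__':
    amazon_top_for_category(1000) #Amazon category number for US Books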

Inlinks (aka backlinks) are an important aspect of your SEO strategy. They are one of the ways people find your website, and they signal to search engines that your website is important and should rank well, so it is worth keeping an eye on this statistic. The saying “you can’t manage what you can’t measure” applies here: if you want your website to rank well you need to manage your inlinks, and so you need to measure them.

This script requires a Yahoo! AppID, which you can get from the Yahoo! Developer Network, because it uses the REST API for Yahoo! Site Explorer rather than scraping pages.

The script simply returns the total number of results, but you could easily extend it to print out all your inlinks. I will be using it to track my inlink count over time by running it every day and storing the result in a database.
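Here is a minimal Python 3 sketch of the storage side of that plan, using the standard library’s `sqlite3` module with an in-memory database. The table layout, the `open_db`, `record_count`, and `daily_changes` helpers, and the sample counts are hypothetical; in practice you would pass in the value returned by `yahoo_inlinks_count` and point `open_db` at a database file.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) a database with one row per site per day."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inlinks ("
        " site TEXT NOT NULL, day TEXT NOT NULL, total INTEGER NOT NULL,"
        " PRIMARY KEY (site, day))"
    )
    return conn

def record_count(conn, site, day, total):
    """Store a day's count; re-running on the same day overwrites that day's row."""
    conn.execute("INSERT OR REPLACE INTO inlinks (site, day, total) VALUES (?, ?, ?)",
                 (site, day, total))
    conn.commit()

def daily_changes(conn, site):
    """Return [(day, total, change_since_previous_day), ...] in date order."""
    rows = conn.execute("SELECT day, total FROM inlinks WHERE site = ? ORDER BY day",
                        (site,)).fetchall()
    changes, previous = [], None
    for day, total in rows:
        changes.append((day, total, None if previous is None else total - previous))
        previous = total
    return changes

# Hypothetical counts, not real Site Explorer data
conn = open_db()
record_count(conn, "http://www.example.com", "2009-08-10", 120)
record_count(conn, "http://www.example.com", "2009-08-11", 126)
record_count(conn, "http://www.example.com", "2009-08-12", 131)
for row in daily_changes(conn, "http://www.example.com"):
    print(row)
```

Storing days as ISO `YYYY-MM-DD` strings means a plain `ORDER BY day` sorts them chronologically.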

Example Usage:

$ python

Here’s the Python Code:

#!/usr/bin/env python 
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
import urllib2, sys, urllib
try:
    import json
except ImportError:
    import simplejson as json  # for Python versions before 2.6
YAHOO_APP_ID = 'YOUR_YAHOO_APP_ID'
def yahoo_inlinks_count(query):
    if not query.startswith('http://'):
        raise Exception('site must start with "http://"')
    request = '' + YAHOO_APP_ID + '&query=' + urllib.quote_plus(query) + '&output=json&results=0'
    try:
        results = json.load(urllib2.urlopen(request))
    except Exception:
        raise Exception("Web services request failed")
    return results['ResultSet']['totalResultsAvailable']
if __name__=='__main__':
    print 'checking', sys.argv[1]
    print yahoo_inlinks_count(sys.argv[1])

Clickbank is an amazing service that lets anyone easily create and sell information products as a publisher, or sell other people’s products for a commission as an affiliate. Clickbank handles the credit card transactions and refunds, while affiliates can earn as much as 90% of a product’s price as commission. It’s a pretty easy system to use, and I have used it both as a publisher and as an affiliate to make significant amounts of money online.

The script I have today is a Python program that uses Clickbank’s REST API to download the latest transactions for your affiliate IDs and stuffs the data into a database.

The reason for doing this is that it keeps the data in your control and lets you see all of the transactions for all your accounts in one place, without having to constantly log in to each account. I’m going to be including this data in my Business Intelligence Dashboard Application.

One of the new things I did while writing this script was to use SQLAlchemy to abstract the database. This means it should be trivial to switch it over to MySQL: just change the connection string.

Also note that to use this script you’ll need the “Clerk API Key” and the “Developer API Key” from your Clickbank account. To generate those keys, go to the Account Settings tab from the account dashboard. If you have more than one affiliate ID, you’ll need one Clerk API Key per affiliate ID.

This is the biggest script I have shared on this site yet. I hope someone finds it useful.

Here’s the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
import csv
import httplib
import logging
from datetime import datetime
from sqlalchemy import Table, Column, Integer, String, MetaData, Date, DateTime, Float
from sqlalchemy.schema import UniqueConstraint
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
LOG_FILENAME = 'ClickbankLoader.log'
logging.basicConfig(filename=LOG_FILENAME, level=logging.INFO)
# SQLAlchemy connection string (example SQLite file); change it to use MySQL instead
CONNSTRING = 'sqlite:///clickbank_stats.sqlite'
#generate these keys in the Account Settings area of ClickBank when you log in.
DEV_API_KEY = 'YOUR_DEVELOPER_API_KEY'
ACCOUNTS = [{'account':'YOUR_AFFILIATE_ID',  'API_key': 'YOUR_API_KEY' },]
Base = declarative_base()
class ClickBankList(Base):
    __tablename__ = 'clickbanklist'
    __table_args__ = (UniqueConstraint('date','receipt','item'),{})
    id                 = Column(Integer, primary_key=True)
    account            = Column(String)
    processedPayments  = Column(Integer)
    status             = Column(String)
    futurePayments     = Column(Integer)
    firstName          = Column(String)
    state              = Column(String)
    promo              = Column(String)
    country            = Column(String)
    receipt            = Column(String)
    pmtType            = Column(String)
    site               = Column(String)
    currency           = Column(String)
    item               = Column(String)
    amount             = Column(Float)
    txnType            = Column(String)
    affi               = Column(String)
    lastName           = Column(String)
    date               = Column(DateTime)
    rebillAmount       = Column(Float)
    nextPaymentDate    = Column(DateTime)
    email              = Column(String)
    format = '%Y-%m-%dT%H:%M:%S'
    def __init__(self, account, processedPayments, status, futurePayments, firstName, state, promo, country, receipt, pmtType, site, currency, item, amount , txnType, affi, lastName, date, rebillAmount, nextPaymentDate, email):
        # empty CSV fields are left unset so the column stays NULL
        self.account            = account
        if processedPayments != '':
            self.processedPayments  = processedPayments
        self.status             = status
        if futurePayments != '':
            self.futurePayments     = futurePayments
        self.firstName          = firstName
        self.state              = state
        self.promo              = promo
        self.country            = country
        self.receipt            = receipt
        self.pmtType            = pmtType
        self.site               = site
        self.currency           = currency
        self.item               = item
        if amount != '':
            self.amount             = amount
        self.txnType            = txnType
        self.affi               = affi
        self.lastName           = lastName
        # keep only YYYY-MM-DDTHH:MM:SS (the first 19 characters) before parsing
        self.date               = datetime.strptime(date[:19], self.format)
        if rebillAmount != '':
            self.rebillAmount       = rebillAmount
        if nextPaymentDate != '':
            self.nextPaymentDate    = datetime.strptime(nextPaymentDate[:19], self.format)
        self.email              = email
    def __repr__(self):
        return "<clickbank ('%s - %s - %s - %s')>" % (self.account, self.date, self.receipt, self.item)
def get_clickbank_list(API_key, DEV_key):
    conn = httplib.HTTPSConnection('')
    conn.putrequest('GET', '/rest/1.0/orders/list')
    conn.putheader("Accept", 'text/csv')
    conn.putheader("Authorization", DEV_key+':'+API_key)
    conn.endheaders()
    response = conn.getresponse()
    if response.status != 200:
        logging.error('HTTP error %s' % response.status)
        raise Exception(response.status)
    csv_data = response.read()
    return csv_data
def load_clickbanklist(csv_data, account, dbconnection=CONNSTRING, echo=False):
    engine = create_engine(dbconnection, echo=echo)
    metadata = Base.metadata
    # create the clickbanklist table if it doesn't exist yet
    metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()
    data = csv.DictReader(iter(csv_data.split('\n')))
    for d in data:
        item = ClickBankList(account, **d)
        #check for duplicates before inserting
        checkitem = session.query(ClickBankList).filter_by(date=item.date, receipt=item.receipt, item=item.item).all()
        if not checkitem:
            logging.info('inserting new transaction %s' % item)
            session.add(item)
    session.commit()
if  __name__=='__main__':
    try:
        for account in ACCOUNTS:
            csv_data = get_clickbank_list(account['API_key'], DEV_API_KEY)
            load_clickbanklist(csv_data, account['account'])
    except Exception:
        logging.exception('Clickbank load failed')
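The script avoids duplicates by querying for each transaction before inserting it. An alternative, sketched here in Python 3 with the standard library’s `sqlite3` module rather than SQLAlchemy, is to let the database enforce the same `UNIQUE (date, receipt, item)` constraint and use SQLite’s `INSERT OR IGNORE`. Only a few of the Clickbank columns are kept, the `load_transactions` and `count_rows` helpers are hypothetical, and the CSV rows are made-up sample data. Note that `INSERT OR IGNORE` is SQLite syntax; MySQL spells it `INSERT IGNORE`.

```python
import csv
import io
import sqlite3

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM clickbanklist").fetchone()[0]

def load_transactions(conn, csv_text, account):
    """Insert each CSV row once; a row repeating (date, receipt, item) is skipped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clickbanklist ("
        " id INTEGER PRIMARY KEY, account TEXT, date TEXT, receipt TEXT,"
        " item TEXT, amount REAL, UNIQUE (date, receipt, item))"
    )
    before = count_rows(conn)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # an empty amount becomes NULL, like the script leaving '' fields unset
        amount = float(row["amount"]) if row["amount"] else None
        conn.execute(
            "INSERT OR IGNORE INTO clickbanklist (account, date, receipt, item, amount)"
            " VALUES (?, ?, ?, ?, ?)",
            (account, row["date"], row["receipt"], row["item"], amount),
        )
    conn.commit()
    return count_rows(conn) - before  # number of genuinely new transactions

# Hypothetical sample rows; the second download repeats the first transaction.
first = "date,receipt,item,amount\n2009-08-20T10:00:00,ABC123,1,47.00\n"
second = first + "2009-08-21T09:30:00,DEF456,1,\n"

conn = sqlite3.connect(":memory:")
print(load_transactions(conn, first, "YOUR_AFFILIATE_ID"))   # 1 new row
print(load_transactions(conn, second, "YOUR_AFFILIATE_ID"))  # 1 new row, 1 duplicate ignored
```

Pushing the check into the database saves one query per row and stays correct even if two loader runs overlap.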

I spent some time trying to find a snippet of example code that used Feedburner’s Awareness API with Python, but Google wasn’t much help, so I put one together for you.

One thing I didn’t realize about Feedburner stats is that if a feed publicly displays a chicklet on its site (like mine does), its RSS data is available without authentication. Therefore it’s possible to display a history of other sites’ RSS statistics on your own site. Maybe you can think of a way to use that data.

By adding a dates argument to the REST URL you can request historical data from the API, but as written the script just prints out the current statistics.
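As a sketch of the historical request, this Python 3 snippet builds a `dates=` range covering the last N days and parses `entry` elements the same way the script does. The `date_range_param` and `feed_entries` helpers are hypothetical, the sample XML is made up to mirror the element and attribute names the script reads (`feed`, `entry`, `date`, `reach`, `circulation`, `hits`), and how the parameter is appended to the URL prefix is an assumption.

```python
import datetime
import xml.etree.ElementTree as ElementTree

def date_range_param(end, days):
    """Return 'dates=START,END' covering `days` days that end on `end` (inclusive)."""
    start = end - datetime.timedelta(days=days - 1)
    return "dates=%s,%s" % (start.isoformat(), end.isoformat())

def feed_entries(content):
    """Return (uri, date, reach, circulation, hits) tuples from a response document."""
    tree = ElementTree.fromstring(content)
    return [
        (feed.get("uri"), entry.get("date"), int(entry.get("reach")),
         int(entry.get("circulation")), int(entry.get("hits")))
        for feed in tree.findall("feed")
        for entry in feed.findall("entry")
    ]

print(date_range_param(datetime.date(2009, 8, 11), 7))

# Made-up response shaped like the one the script parses
sample = """<rsp stat="ok"><feed uri="HalotisBlog">
<entry date="2009-08-10" reach="30" circulation="85" hits="200"/>
<entry date="2009-08-11" reach="32" circulation="87" hits="217"/>
</feed></rsp>"""
entries = feed_entries(sample)
average = sum(e[3] for e in entries) / len(entries)
print("average circulation:", average)
```

With a week or more of entries, averaging circulation smooths out the day-to-day noise in the raw numbers.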

Example Usage:

$ python
HalotisBlog :
2009-08-11 - 32 - 87 - 217

Here’s the Python source code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
import urllib2
try:
    from xml.etree import ElementTree
except ImportError:
    from elementtree import ElementTree
#add a dates=YYYY-MM-DD,YYYY-MM-DD argument to the url to get all data in a date range
url_prefix = ''
URIs = ['HalotisBlog',]
def print_feedburner(content):
    tree = ElementTree.fromstring(content)
    for feed in tree.findall('feed'):
        print feed.get('uri'), ':'
        for entry in feed.findall('entry'):
            print entry.get('date'), '-', entry.get('reach'), '-', entry.get('circulation'), '-', entry.get('hits')
if  __name__=='__main__':
    for uri in URIs:
        content = urllib2.urlopen(url_prefix + uri).read()
        print_feedburner(content)

OK, Yahoo search is on the way out and will be replaced by the search engine behind Bing, but that transition won’t happen until sometime in 2010. Until then, Yahoo still has 20% of the search engine market share, which makes it an important source of traffic for your websites.

This script is similar to the Google and Bing SERP scrapers that I posted earlier on this site, but Yahoo’s pages were slightly more complicated to parse. Yahoo routes its result links through a redirect service, so extracting the real URLs required some regular expression matching.
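To show what that regular expression does, here is a Python 3 sketch that applies the script’s `/\*\*(.*)` pattern to a redirect-style link and decodes the result. The `real_url` helper is hypothetical, and the sample href is a made-up example of the format (a tracking prefix, then `**`, then the percent-encoded target URL), not a captured Yahoo link.

```python
import re
from urllib.parse import unquote_plus

# Same pattern as the script: capture everything after the first "/**"
URL_PATTERN = re.compile(r"/\*\*(.*)")

def real_url(redirect_href):
    """Return the decoded target URL, or the href unchanged if it isn't a redirect."""
    match = URL_PATTERN.search(redirect_href)
    return unquote_plus(match.group(1)) if match else redirect_href

# Hypothetical redirect link in the shape described above
href = "http://rds.example.com/_ylt=abc/SIG=123/EXP=456/**http%3a//www.halotis.com/%3fp%3d1"
print(real_url(href))  # http://www.halotis.com/?p=1
```

Falling back to the original href, instead of indexing `findall(...)[0]` as the script does, avoids an `IndexError` if a result link ever comes back without the redirect wrapper.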

I will be putting all these little components together into a larger program later.

Example Usage:

$ python

Here’s the Script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
import urllib,urllib2
import re
from BeautifulSoup import BeautifulSoup
def yahoo_grab(query):
    address = "" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read()
    soup = BeautifulSoup(page)
    url_pattern = re.compile('/\*\*(.*)')
    links =   [urllib.unquote_plus(url_pattern.findall(x.find('a')['href'])[0]) for x in soup.find('div', id='web').findAll('h3')]
    return links
if __name__=='__main__':
    # Example: print the result URLs
    links = yahoo_grab('halotis')
    print '\n'.join(links)

Based on my last post about scraping the Google SERP, I decided to make a small change to scrape the organic search results from Bing.

I wasn’t able to find a way to display 100 results per page in Bing, so this script only returns the top 10. It could be enhanced to loop through the pages of results, but I have left that out of this code.
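If you do want more than the first page, one way to loop is to request successive pages by result offset. This Python 3 sketch only builds the page URLs. It assumes Bing accepts a 1-based `first` query parameter as the offset and uses a placeholder base URL, so check both against the live search URL before relying on them; the `page_urls` helper is hypothetical.

```python
from urllib.parse import urlencode

def page_urls(base_url, query, pages, per_page=10):
    """Return one search URL per page, offsetting by `per_page` results each time.

    Assumes a 1-based 'first' parameter (first=1, 11, 21, ...) selects the page.
    """
    return [
        base_url + "?" + urlencode({"q": query, "first": 1 + page * per_page})
        for page in range(pages)
    ]

for url in page_urls("http://search.example.com/search", "halotis", 3):
    print(url)
```

Each URL could then be fetched and parsed the same way as `bing_grab`, ideally with a pause between requests.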

Example Usage:

$ python

Here’s the Python Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
import urllib,urllib2
from BeautifulSoup import BeautifulSoup
def bing_grab(query):
    address = "" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read()
    soup = BeautifulSoup(page)
    links =   [x.find('a')['href'] for x in soup.find('div', id='results').findAll('h3')]
    return links
if __name__=='__main__':
    # Example: print the result URLs
    links = bing_grab('halotis')
    print '\n'.join(links)

Here’s a short script that will scrape the first 100 listings in the Google organic results.

You might want to use this to find the position of your sites and track how they rank for certain target keyword phrases over time. That could be a very good way to determine, for example, whether your SEO efforts are working. Or you could use the list of URLs as a starting point for some other web crawling activity.
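For rank tracking, here is a small Python 3 sketch that finds the 1-based position of a domain in a list of result URLs such as the one `google_grab` returns. The `rank_of` helper and the sample URLs are hypothetical, and matching on the host name while ignoring a leading `www.` is a design choice you may want to change.

```python
from urllib.parse import urlparse

def _host(url):
    """Lower-cased host name without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def rank_of(domain, links):
    """Return the 1-based position of the first link on `domain`, or None if absent."""
    target = _host("//" + domain)  # '//' makes urlparse treat the domain as a host
    for position, link in enumerate(links, start=1):
        if _host(link) == target:
            return position
    return None

# Hypothetical result list
links = ["http://en.example.org/wiki/Halotis",
         "http://www.halotis.com/",
         "http://www.halotis.com/about/"]
print(rank_of("halotis.com", links))  # 2
```

Logging this number per keyword each day gives you the over-time ranking history described above.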

As written, the script just dumps the list of URLs to a text file.

It uses the BeautifulSoup library to help with parsing the HTML page.

Example Usage:

$ python
$ cat links.txt

Here’s the script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
import urllib,urllib2
from BeautifulSoup import BeautifulSoup
def google_grab(query):
    address = "" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read()
    soup = BeautifulSoup(page)
    links =   [x['href'] for x in soup.findAll('a', attrs={'class':'l'})]
    return links
if __name__=='__main__':
    # Example: search results written to links.txt
    links = google_grab('halotis')
    f = open('links.txt', 'w')
    f.write('\n'.join(links))
    f.close()

I was a bit hesitant to post this script since it is such a powerful marketing tool that it could do real damage in the hands of a spammer. The basic premise is to respond directly to someone’s tweet if they mention your product or service. For example, I might want to send a tweet directly to anyone who mentions twitter and python in a tweet to let them know about this blog. This accomplishes the same thing as the TwitterHawk service, except you won’t have to pay per tweet.

To do this I had a choice. I could use a service like and then write a script that responded to the emails in my inbox, or I could use the Twitter Search API directly. The search API is so dead simple that I wanted to try that route.

The other thing to consider is that I don’t want to send a tweet to the same person more than once so I need to keep a list of twitter users that I have responded to. I used pickle to persist that list of usernames to disk so that it sticks around between uses.
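A compact version of that bookkeeping, sketched here for Python 3, keeps the usernames in a `set` (fast membership checks, no duplicates) and pickles it in binary mode, writing to a temporary file and renaming it so a crash mid-write can’t corrupt the saved list. The file name and the `load_users`/`save_users` helpers are hypothetical; the script below uses a plain list with the same idea.

```python
import os
import pickle
import tempfile

def load_users(path):
    """Return the set of users already replied to, or an empty set on first run."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError):
        return set()

def save_users(path, users):
    """Dump the set to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("wb", dir=directory, delete=False) as tmp:
        pickle.dump(users, tmp)
    os.replace(tmp.name, path)

# Example round trip in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "tweetback.pck")
replied = load_users(path)  # empty on the first run
replied.update({"nooble", "ichiro_j"})
save_users(path, replied)
print(sorted(load_users(path)))  # ['ichiro_j', 'nooble']
```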

The query functionality provided by the Twitter Search API is pretty cool and offers much more power than I have used in this script. For example, it is possible to geo-target searches, look up hashtags, or find reply tweets. You can check out the full spec at

Lastly, to keep things simple, I’m ignoring the pagination in the search results, so this script only responds to the first page of results. Adding a loop over the pages would be pretty straightforward, but I didn’t want to clutter up the code.

Example Usage:

>>> import tweetBack
>>> tweetBack.tweet_back('python twitter', 'Here is a blog with some good Python scripts you might find interesting', 'twitter_username', 'twitter_password')
@nooble sent message
@ichiro_j sent message
@Ghabrie1 sent message

Here’s the Python Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
try:
    import json as simplejson
except ImportError:
    import simplejson  # for Python versions before 2.6
import twitter     # the python-twitter library
import urllib
import pickle
TWITTER_USER = 'username'
TWITTER_PASSWORD = 'password'
USER_LIST_FILE = 'tweetback.pck'
#read stored list of twitter users that have been responded to already in a file
try:
    f = open(USER_LIST_FILE, 'r')
    user_list = pickle.load(f)
    f.close()
except (IOError, EOFError):
    # first run: no file yet, so nobody has been replied to
    user_list = []
def search_results(query):
    url = '' + '+'.join(query.split())
    return simplejson.load(urllib.urlopen(url))
def tweet_back(query, tweet_reply, username=TWITTER_USER, password=TWITTER_PASSWORD):
    results = search_results(query)
    api = twitter.Api(username, password)
    try:
        for result in results['results']:
            if result['from_user'] not in user_list:
                api.PostUpdate('@' + result['from_user'] + ' ' + tweet_reply)
                print '@' + result['from_user'] + ' sent message'
                # remember this user so they never get the message twice
                user_list.append(result['from_user'])
    except Exception:
        print 'Failed to post update. You may have gone over the Twitter API limit; please wait and try again.'
    #write the user_list to disk
    f = open(USER_LIST_FILE, 'w')
    pickle.dump(user_list, f)
    f.close()
if __name__=='__main__':
    tweet_back('python twitter', 'Here is a blog with some good Python scripts you might find interesting')

Update: thanks tante for the simplejson note.

Business Intelligence is a multi-billion-dollar industry powered by heavy hitters like SAP, Oracle, and HP. The problem it attempts to solve is mining the mountains of data a business creates or collects, finding patterns in it, and presenting it in intelligent ways.

At the very simplest level, business intelligence starts with having data (lots of it) in files, databases, on the web, or inside applications, and then pulling all that data together to make inferences from it. Ultimately, it is displayed in a simplified form in a dashboard that the CEO can use to make effective high-level decisions. In a big business with lots of moving parts this is invaluable, since making the best decisions requires the best information.

For small business owners the big enterprise solutions are far out of reach, but the competitive advantage of having this data presented to you in the right way is still huge.

That’s why, over the next while, I’m going to build out my own business intelligence dashboard geared towards internet marketers and post the scripts here as I go, for you to take and use in your own business. Some of the scripts I have in mind include:

  • Get clickbank transactions
  • Get CPA report data
  • Website traffic data
  • Get PPC advertising data
  • Google Adsense data
  • Compute Profit & Revenue numbers
  • Ranking in Google/Yahoo/Bing SERP
  • Backlinks, Google Alerts, Blog Activity

Finally, when all these pieces are in place, I will bundle them into one program you can run and use as your own business intelligence dashboard. Putting all this information in one place will save you the time of opening 10 browser tabs and logging in to all these sites to look at the numbers. It will systematize many of your ad hoc processes and make your decisions easier to make. It will take your business to the next level.

If you have any suggestions for additional functionality in this dashboard that you think would be useful just leave a comment on this post.

Keep up with the progress on this project over the next few weeks by subscribing to the RSS feed.