Tag Archives: scripts

I found a really neat bit of .bat file magic that lets you save your Python script code in a .bat file and run it in Windows just like any other script. The nice thing about this is that you don’t have to create a separate “launch.bat” file containing a single “start python script.py” line.

This makes running Python scripts on Windows more like it is on Linux/Mac, where you can simply add a #!/usr/bin/env python line to the script and run it directly.
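For comparison, the Linux/Mac version is just a plain Python file with a shebang line (hello.py is only a name I’m using for this example):

#!/usr/bin/env python
print "hello world"

$ chmod +x hello.py
$ ./hello.py
hello world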

Here’s the bit of tricky batch file magic that does it:

@setlocal enabledelayedexpansion && python -x "%~f0" %* & exit /b !ERRORLEVEL!
#start python code here
print "hello world"

The way it works is that the first line of the file does two things:

  1. It starts the Python interpreter, passing in the file itself ("%~f0") along with any command line arguments (%*); the -x option tells Python to skip the first line (which contains the .bat file code).
  2. When Python finishes, exit /b !ERRORLEVEL! ends the batch file and passes Python’s exit code back to the caller.

This nifty trick makes it much nicer for writing admin scripts with python on Windows.

Update: fixed to properly pass command line arguments (the %* argument passes the bat file’s command line arguments through to Python).
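To see the argument passing in action, here is a complete example file (hello.bat is just a name I picked for illustration):

@setlocal enabledelayedexpansion && python -x "%~f0" %* & exit /b !ERRORLEVEL!
#start python code here
import sys
print "arguments:", sys.argv[1:]

Running hello.bat foo bar will print arguments: ['foo', 'bar'], and the batch file’s exit code will match the Python script’s.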

I was a bit hesitant to post this script since it is such a powerful marketing tool that it could easily be abused by a spammer. The basic premise is to respond directly to someone’s tweet when they mention your product or service. So, for example, I might want a reply to go out automatically to anyone who mentions Twitter and Python in a tweet, letting them know about this blog. This accomplishes the same thing as the TwitterHawk service, except you won’t have to pay per tweet.

To do this I had a choice. I could use a service like TweetBeep.com and then write a script that responded to the emails in my inbox, or I could use the Twitter Search API directly. The search API is so dead simple that I wanted to try that route.

The other thing to consider is that I don’t want to send a tweet to the same person more than once, so I need to keep a list of the Twitter users I have already responded to. I used pickle to persist that list of usernames to disk so that it sticks around between runs.

The query functionality provided by the Twitter Search API is pretty cool and provides much more power than I have used in this script. For example, it is possible to geo-target searches, look up hashtags, or find tweets that are replies. You can check out the full spec at http://apiwiki.twitter.com/Twitter-API-Documentation

Lastly, to keep things a bit simpler I’m ignoring pagination in the search results, so this script will only respond to the first page worth of results. Adding a loop over the pages would be pretty straightforward, but I didn’t want to clutter up the code.
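For reference, here’s a rough sketch of what that per-page loop could look like. It assumes the Search API’s page parameter and the next_page field in the JSON response (not verified here), and uses the same urllib/simplejson imports as the script below:

def search_all_results(query, max_pages=5):
    results = []
    for page in range(1, max_pages + 1):
        url = ('http://search.twitter.com/search.json?q=' +
               '+'.join(query.split()) + '&page=' + str(page))
        data = simplejson.load(urllib.urlopen(url))
        results.extend(data['results'])
        if 'next_page' not in data:  # no more pages to fetch
            break
    return results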

Example Usage:

>>> import tweetBack
>>> tweetBack.tweet_back('python twitter', 'Here is a blog with some good Python scripts you might find interesting http://halotis.com', 'twitter_username', 'twitter_password')
@nooble sent message
@ichiro_j sent message
@Ghabrie1 sent message
.....

Here’s the Python Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
try:
    import json as simplejson
except ImportError:
    import simplejson  # http://undefined.org/python/#simplejson
import twitter     #http://code.google.com/p/python-twitter/
 
import urllib
import pickle
 
TWITTER_USER = 'username'
TWITTER_PASSWORD = 'password'
 
USER_LIST_FILE = 'tweetback.pck'
 
#read the stored list of twitter users that have already been responded to
try:
    f = open(USER_LIST_FILE, 'r')
    user_list = pickle.load(f)
    f.close()
except (IOError, EOFError):
    user_list = []
 
def search_results(query):
    url = 'http://search.twitter.com/search.json?q=' + '+'.join(query.split())
    return simplejson.load(urllib.urlopen(url))
 
def tweet_back(query, tweet_reply, username=TWITTER_USER, password=TWITTER_PASSWORD):
    results = search_results(query)
 
    api = twitter.Api(username, password)
    try:
        for result in results['results']:
            if result['from_user'] not in user_list:
                api.PostUpdate('@' + result['from_user'] + ' ' + tweet_reply)
                print '@' + result['from_user'] + ' sent message'
 
                user_list.append(result['from_user'])
    except Exception:
        print 'Failed to post update. May have gone over the Twitter API limit; please wait and try again.'
 
    #write the user_list to disk
    f = open(USER_LIST_FILE, 'w')
    pickle.dump(user_list, f)
    f.close()
 
if __name__=='__main__':
    tweet_back('python twitter', 'Here is a blog with some good Python scripts you might find interesting http://halotis.com')

Update: thanks tante for the simplejson note.

Business Intelligence (BI) is a multi-billion dollar industry powered by heavy hitters like SAP, Oracle, and HP. The problem they attempt to solve is mining through the mountains of data created or collected by a business and finding intelligent ways to present it or to spot patterns in it.

At the very simplest level, business intelligence starts with having data – lots of it – in files, databases, on the web, or inside applications, then pulling all that data together to draw inferences from it, and ultimately displaying it in a simplified form on a dashboard that the CEO can use to make effective high-level decisions. In a big business with lots of moving parts this is invaluable, since to make the best decisions you need the best information.

For small business owners the big enterprise solutions are far out of reach, but the competitive advantage of having this data presented to you in the right way is still huge.

That’s why, over the coming weeks, I’m going to be posting scripts as I build out my own business intelligence dashboard geared towards the internet marketer. As I go, I will post the scripts here for you to take and use in your own business. Some of the scripts I have in mind include:

  • ClickBank transactions
  • CPA report data
  • Website traffic data
  • PPC advertising data
  • Google AdSense data
  • Computed profit and revenue numbers
  • Rankings in the Google/Yahoo/Bing SERPs
  • Backlinks, Google Alerts, and blog activity

Finally, when all these pieces are put together, I will bundle them into one program you can run and use as your own business intelligence dashboard. By putting all this information in one place you will save the time spent opening 10 tabs in your browser to log in to all these sites and look at the numbers. It will systematize many of your ad hoc processes and make your decisions easier. It will take your business to the next level.

If you have any suggestions for additional functionality in this dashboard that you think would be useful just leave a comment on this post.

Keep up with the progress on this project over the next few weeks by subscribing to the RSS feed.

A reader suggested that it might be useful to have a script that could take an RSS feed, translate it into another language, and republish that feed somewhere else. Thankfully that’s pretty easy to do in Python.

I wrote this script by taking bits and pieces from some of the other scripts that I’ve posted on this blog in the past. It’s surprising just how much of a resource this site has turned into.

It uses the Google Translate service to convert the RSS feed content from one language to another and simply echoes the new RSS content to standard out. If you wanted to republish the content, you could direct the output to a file and upload that to your web server.

Example Usage:

$ python translateRSS.py
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"><channel><title>HalOtis Marketing</title><link>http://www.halotis.com</link><description>Esprit d&#39;entreprise dans le 21ème siècle</description>
.....
</channel></rss>
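If you want to republish the result, you can direct the output straight to a file (the filename here is just an example) and upload that file to your web server:

$ python translateRSS.py > feed_fr.xml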

Here’s the Script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import feedparser  # available at feedparser.org
from translate import translate  # available at http://www.halotis.com/2009/07/20/translating-text-using-google-translate-and-python/
import PyRSS2Gen # available at http://www.dalkescientific.com/Python/PyRSS2Gen.html
 
import datetime 
import re
 
def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)
 
def translate_rss(sl, tl, url):
 
    d = feedparser.parse(url)
 
    #unfortunately feedparser doesn't output rss so we need to create the RSS feed using PyRSS2Gen
    items = [PyRSS2Gen.RSSItem( 
        title = translate(sl, tl, x.title), 
        link = x.link, 
        description = translate(sl, tl, remove_html_tags(x.summary)), 
        guid = x.link, 
        pubDate = datetime.datetime( 
            x.modified_parsed[0], 
            x.modified_parsed[1], 
            x.modified_parsed[2], 
            x.modified_parsed[3], 
            x.modified_parsed[4], 
            x.modified_parsed[5])) 
        for x in d.entries]
 
    rss = PyRSS2Gen.RSS2( 
        title = d.feed.title, 
        link = d.feed.link, 
        description = translate(sl, tl, d.feed.description), 
        lastBuildDate = datetime.datetime.now(), 
        items = items) 
    #emit the feed 
    xml = rss.to_xml()
 
    return xml
 
if __name__ == '__main__':
    feed = translate_rss('en', 'fr', 'http://www.halotis.com/feed/')
    print feed

Today’s script will perform a search on Technorati and then scrape out the search results. It is useful because Technorati is up to date about what is happening in the blogosphere, and that gives you a way to tune into everything going on there.

The scope of the blogosphere, matched with Technorati’s ability to sort results by most recent, is what makes this very powerful. This script will help you find up-to-the-moment content which you can then data-mine for whatever purpose you want.

Possible uses:

  • Create a tag cloud of what is happening today within your niche
  • Aggregate the content into your own site
  • Post it to Twitter
  • Convert the search results into an RSS feed (a sketch of this is shown after the code below)

And here’s the Python code:

import urllib2
 
from BeautifulSoup import BeautifulSoup
 
def get_technorati_results(query, page_limit=10):
 
    page = 1
    links = []
 
    while page < page_limit :
        url='http://technorati.com/search/' + '+'.join(query.split()) + '?language=n&page=' + str(page)
        req = urllib2.Request(url)
        HTML = urllib2.urlopen(req).read()
        soup = BeautifulSoup(HTML)
 
        next = soup.find('li', attrs={'class':'next'}).find('a')
 
        #links is a list of (url, summary, title) tuples
        for entry in soup.find('div', id='results').findAll('li', attrs={'class':'hentry'}):
            url = entry.find('blockquote')['cite']
            summary = ''.join(entry.find('blockquote').findAll(text=True))
            title = ''.join(entry.find('h3').findAll(text=True))
            links.append((url, summary, title))
 
        if next :
            page = page+1
        else :
            break
 
    return links
 
if __name__=='__main__':
    links = get_technorati_results('halotis marketing')
    print links
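As an example of the last idea in the list above, the (url, summary, title) tuples could be fed into PyRSS2Gen (the same library used in the feed translation script earlier on this page) to produce an RSS feed. This is just a rough sketch of the idea, building on the get_technorati_results function above:

import datetime
import PyRSS2Gen  # available at http://www.dalkescientific.com/Python/PyRSS2Gen.html

def results_to_rss(links):
    items = [PyRSS2Gen.RSSItem(title=title, link=url, description=summary,
                               guid=url, pubDate=datetime.datetime.now())
             for (url, summary, title) in links]
    rss = PyRSS2Gen.RSS2(title='Technorati search results',
                         link='http://technorati.com/',
                         description='Recent blog posts matching the search',
                         lastBuildDate=datetime.datetime.now(),
                         items=items)
    return rss.to_xml()

print results_to_rss(get_technorati_results('halotis marketing'))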

This script will create an image on the fly of a user’s most recent Twitter message. It could be used as an email or forum signature, or any place that allows you to embed a custom image, such as on a blog or website.

I saw a website that did this the other day and wanted to try to duplicate the functionality.  Turns out it was pretty trivial even for someone with very little PHP experience. So I felt inspired enough to create a new website based on this script and called it TwitSig.us. Check it out.

It creates a small image showing the user’s latest tweet, their username, and the timestamp, drawn over a background graphic.

And here’s the code that does it:

<?php
include "twitter.php"; // from http://twitter.slawcup.com/twitter.class.phps
 
$t = new twitter();
$res = $t->userTimeline($_GET["user"], 1);
 
$my_img = imagecreatefrompng ( "base.png" );
 
$grey = imagecolorallocate( $my_img, 150, 150, 150 );
$red = imagecolorallocate( $my_img, 255, 0,  0 );
$text_colour = imagecolorallocate( $my_img, 0, 0, 0 );
 
if($res===false){
	imagestring( $my_img, 4, 30, 25, "no messages at this time",
	  $text_colour );
} else {
	$newtext = wordwrap($res->status->text, 65, "\n");
	imagettftext( $my_img, 10, 0, 10, 35, $text_colour, "Arial.ttf", $newtext);
	imagettftext( $my_img, 10, 0, 90, 15, $red, "Arial Bold.ttf", "@".$_GET["user"]);
	imagettftext( $my_img, 10, 0, 225, 15, $grey, "Arial.ttf", strftime("%a %d %b %H:%M %Y", strtotime($res->status->created_at)));
}
 
header( "Content-type: image/png" );
imagepng( $my_img );
?>

To get this script working for yourself you’ll need to make sure that you have the two font files and the base.png file for the background image that the text is put on.
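Once the script is hosted somewhere, embedding the image is just a normal image tag pointing at the script, with the Twitter username passed in the query string. The domain and filename here are only placeholders:

<img src="http://example.com/twitsig.php?user=halotis" alt="my latest tweet" />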

There are a number of services out there, such as Google Cash Detective, that will run searches on Google and save the advertisements so you can track who is advertising on which keywords over time. It’s actually a very accurate technique for finding out which ads are profitable.

After tracking a keyword for several weeks it’s possible to see which ads have been running consistently over time. The nature of Pay Per Click is that only profitable advertisements will continue to run long term. So if you can identify which ads are profitable for which keywords, it should be possible to duplicate them and get some of that profitable traffic for yourself.

The following script is a Python program that perhaps breaks the Google terms of service, so consider it a guide for how this kind of HTML parsing could be done. It spoofs the User-Agent to appear as though it is a real browser, then runs a search for each keyword stored in an SQLite database and stores the ads displayed for that keyword back in the database.

The script makes use of the awesome Beautiful Soup library, which makes parsing HTML content really easy. But because of the nature of web scraping it is quite fragile: it makes several assumptions about the structure of the Google results page, and if Google changes their site the script could break.
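One thing the script doesn’t show is the database itself: it expects an espionage.sqlite file with three tables already in it. Based on the queries in the code, the schema needs to look roughly like this (the column types are my guesses):

import sqlite3

conn = sqlite3.connect('espionage.sqlite')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS KeywordList
             (phraseID INTEGER PRIMARY KEY, keywordPhrase TEXT)''')
c.execute('''CREATE TABLE IF NOT EXISTS AdTable
             (adID INTEGER PRIMARY KEY, destination TEXT, title TEXT,
              line1 TEXT, line2 TEXT, site TEXT, phraseID INTEGER)''')
c.execute('''CREATE TABLE IF NOT EXISTS ShowTime
             (adID INTEGER, date TEXT, time TEXT, position INTEGER)''')
conn.commit()
conn.close()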

#!/usr/bin/env python
 
import sys
import urllib2
import re
import sqlite3
import datetime
 
from BeautifulSoup import BeautifulSoup  # available at: http://www.crummy.com/software/BeautifulSoup/
 
conn = sqlite3.connect("espionage.sqlite")
conn.row_factory = sqlite3.Row
 
def get_google_search_results(keywordPhrase):
	"""make the GET request to Google.com for the keyword phrase and return the HTML text
	"""
	url='http://www.google.com/search?hl=en&q=' + '+'.join(keywordPhrase.split())
	req = urllib2.Request(url)
	req.add_header('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13')
	page = urllib2.urlopen(req)
	HTML = page.read()
	return HTML
 
def scrape_ads(text, phraseID):
	"""Scrape the text as HTML, find and parse out all the ads and store them in a database
	"""
	soup = BeautifulSoup(text)
	#get the ads on the right hand side of the page
	ads = soup.find(id='rhsline').findAll('li')
	position = 0
	for ad in ads:
		position += 1
 
		#display url
		parts = ad.find('cite').findAll(text=True)
		site = ''.join([word.strip() for word in parts]).strip()
		ad.find('cite').replaceWith("")
 
		#the header line
		parts = ad.find('a').findAll(text=True)
		title = ' '.join([word.strip() for word in parts]).strip()
 
		#the destination URL
		href = ad.find('a')['href']
		start = href.find('&q=')
		if start != -1 :
			dest = href[start+3:]
		else :
			dest = None
			print 'error', href
 
		ad.find('a').replaceWith("")
 
		#body of ad
		brs = ad.findAll('br')
		for br in brs:
			br.replaceWith("%BR%")
		parts = ad.findAll(text=True)
		body = ' '.join([word.strip() for word in parts]).strip()
		line1 = body.split('%BR%')[0].strip()
		line2 = body.split('%BR%')[1].strip()
 
		#see if the ad is in the database
		c = conn.cursor()
		c.execute('SELECT adID FROM AdTable WHERE destination=? and title=? and line1=? and line2=? and site=? and phraseID=?', (dest, title, line1, line2, site, phraseID))
		result = c.fetchall() 
		if len(result) == 0:
			#NEW AD - insert into the table
			c.execute('INSERT INTO AdTable (`destination`, `title`, `line1`, `line2`, `site`, `phraseID`) VALUES (?,?,?,?,?,?)', (dest, title, line1, line2, site, phraseID))
			conn.commit()
			c.execute('SELECT adID FROM AdTable WHERE destination=? and title=? and line1=? and line2=? and site=? and phraseID=?', (dest, title, line1, line2, site, phraseID))
			result = c.fetchall()
		elif len(result) > 1:
			continue
 
		adID = result[0]['adID']
 
		c.execute('INSERT INTO ShowTime (`adID`,`date`,`time`, `position`) VALUES (?,?,?,?)', (adID, datetime.datetime.now(), datetime.datetime.now(), position))
		conn.commit()
 
 
def do_all_keywords():
	c = conn.cursor()
	c.execute('SELECT * FROM KeywordList')
	result = c.fetchall()
	for row in result:
		html = get_google_search_results(row['keywordPhrase'])
		scrape_ads(html, row['phraseID'])
 
if __name__ == '__main__' :
	do_all_keywords()

It is extremely useful to send emails from scripts.  Emails can alert you to errors as soon as they happen or can give you regular status updates about the running of your programs.

I have several scripts that run regularly to update various websites or scrape data from different places, and when dealing with the internet things change all the time. Code breaks constantly as the things it depends on change, so to make sure everything continues to run it’s important to be notified when errors happen.

One of the greatest ways to do this is to have your programs send email messages to you.  I use Google’s Gmail SMTP server to relay my messages to me.  That way I don’t have to rely on having sendmail installed on the machine or hooking into something like MS Outlook to compose an email.

This small simple script uses smtplib to send simple text emails using Gmail’s SMTP service.

#!/usr/bin/python
 
import smtplib
from email.MIMEText import MIMEText
 
GMAIL_LOGIN = 'myemail@gmail.com'
GMAIL_PASSWORD = 'password'
 
 
def send_email(subject, message, from_addr=GMAIL_LOGIN, to_addr=GMAIL_LOGIN):
    msg = MIMEText(message)
    msg['Subject'] = subject
    msg['From'] = from_addr
    msg['To'] = to_addr
 
    server = smtplib.SMTP('smtp.gmail.com', 587)  # port 587 for STARTTLS (465 is for SSL connections)
    server.ehlo()
    server.starttls()
    server.ehlo()
    server.login(GMAIL_LOGIN,GMAIL_PASSWORD)
    server.sendmail(from_addr, to_addr, msg.as_string())
    server.close()
 
 
if __name__=="__main__":
    send_email('test', 'This is a test email')
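To tie this back to the error-alert idea above, you can wrap a script’s main work in a try/except and mail yourself the traceback when something blows up. A minimal sketch using the send_email function from this script (do_work is just a placeholder for whatever your script actually does):

import traceback

def do_work():
    pass  # placeholder for your script's real work

if __name__ == '__main__':
    try:
        do_work()
    except Exception:
        send_email('Script failed', traceback.format_exc())
        raise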