Category Archives: Python

My first thought yesterday, when I started trying to add a lookup for a user's country based on IP address, was that it was going to be tricky. I figured I would have to create some models, fill them with data fixtures, and do some manual queries against the database.

Turns out it was fairly trivial to do.

Django comes with some handy modules for doing it all for you. It just requires installing a C library from MaxMind and downloading their free Country data file.

To install the MaxMind GeoIP C library on the Mac I used Homebrew:

brew install geoip

On the server I had to do the same thing on Linux:

sudo yum install geoip

Then I downloaded the free country data file from MaxMind and put it in my Django project’s ‘geo’ directory:

$ curl -O http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
$ gunzip GeoIP.dat.gz 
$ mv GeoIP.dat <project_root>/geo/

To finish the setup I needed to add a line to the settings.py file:

import os
PROJECT_ROOT = os.path.dirname(__file__)
GEOIP_PATH = os.path.join(PROJECT_ROOT, 'geo')

Getting the country of a connecting user was then rather simple.

from django.contrib.gis.utils import GeoIP
g = GeoIP()
country = g.country_code(request.META['REMOTE_ADDR'])
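
To put it in context, here's a rough sketch of how the lookup might sit inside a view to switch behavior by country. The view and template names are just placeholders, not taken from a real project:

from django.contrib.gis.utils import GeoIP
from django.shortcuts import render_to_response

def landing(request):
    # look up the two-letter country code for the connecting IP
    g = GeoIP()
    country = g.country_code(request.META['REMOTE_ADDR'])
    if country == 'CA':
        # show the Canadian version of the landing page
        return render_to_response('landing_ca.html')
    return render_to_response('landing.html')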

For the complete details on GeoIP, check out the official Django documentation.

I usually use Linux for Python and Django development. However, last night my Linux PC choked up yet again due to bad video drivers and I was forced to do a hard reboot.

That was the final straw that made me switch over to using my Mac for most of my development work going forward.

I keep my current projects in Dropbox so that they are always up to date across all the computers I use day to day.  So there was nothing to do to get those files migrated over to the Mac.

My Python and Django development environment is pretty lightweight. I use:

  • virtualenv
  • vim
  • textmate
  • terminal with zsh
  • mercurial

I don’t do Django work in an IDE. I find them a bit too heavy for coding. Instead I opt for either Vim or TextMate: very simple text editing with little clutter or UI to get in the way.

I use the standard Mac Terminal app, but I’ve changed the default bash shell over to zsh using Oh My Zsh, which can be installed with this one line:

curl -L https://github.com/robbyrussell/oh-my-zsh/raw/master/tools/install.sh | sh

Virtualenv is a necessity.  It keeps all the projects I work on isolated so that I can set versions for all the libraries and come back to a project a year later and still have it work immediately.  Setting up virtualenv and virtualenvwrapper on the Mac was fairly easy:

$ sudo easy_install pip
$ sudo pip install virtualenv virtualenvwrapper
$ mkdir ~/.virtualenvs
$ echo "export WORKON_HOME=$HOME/.virtualenvs" >> ~/.zshrc
$ echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.zshrc

Restart the terminal after that. Now ‘workon’ and ‘mkvirtualenv’ will be available for creating and working on virtualenv Python environments.

Mercurial is what I use to version control all my projects. I just find it makes more sense than git and the command line is much simpler and cleaner.

That’s pretty much it.

My mobile app control server is turning into a bit of a powerhouse.  The latest and perhaps most exciting addition to the server has been support for sending Apple Push Notifications and registering devices for those notifications.

The goal of this is to make use of the push notifications for updating the apps, and cross promoting new apps to help build bigger and bigger launches.

One of the things that’s possible to do (but which I have not seen very many examples of yet) is having an alert open a link. It’s a very powerful feature when you consider how easy it is to link directly to the App Store with a LinkShare affiliate link. So when I find out about any hot new game release or sale event, I can let my users know about it.

It took a while to figure out how to handle the notification payload so here’s the code snippet:

- (BOOL) application:(UIApplication*)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
    NSString *link = [[launchOptions objectForKey:@"UIApplicationLaunchOptionsRemoteNotificationKey"] objectForKey:@"link"];
    if (link != nil) {
        //abort app startup and goto link
        NSURL *url = [NSURL URLWithString:link];
        [[UIApplication sharedApplication] openURL:url];
    }
...

With that bit of code in the App Delegate I can send a notification with a {"link":"http://halotis.com"} custom payload, and when the user taps the alert it will direct them to the website. Because the openURL call can handle lots of custom URL schemes, it becomes possible to link directly into the Facebook app (if it is installed), link to the app rating page to ask the user to rate the app in iTunes, open a map location, or even dial a phone number.
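
On the server side the payload is just JSON: the custom "link" key sits alongside the standard aps dictionary. Here's a rough sketch of how the control server could build it; the helper function is hypothetical, and actually delivering the payload over the APNs connection is left out:

import json

def build_push_payload(message, link=None):
    # hypothetical helper: builds only the JSON payload, not the APNs connection
    payload = {"aps": {"alert": message, "sound": "default"}}
    if link is not None:
        payload["link"] = link  # custom key that the App Delegate snippet reads
    return json.dumps(payload)

print(build_push_payload("Hot new game on sale!", "http://halotis.com"))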

Why not use something like Urban Airship to do all this? Well, mostly because I want to consolidate app management as much as possible. The more features I put into my custom server, the more likely I am to leave it open and make use of it.

Over the holidays I somehow found the time to code up an advertisement server for use with my mobile apps. The reason for writing a custom solution is that I wanted to use non-standard image sizes which I can then pull into the apps and games in unique ways, so that they don’t have the look and feel of an ad.

The custom ads will be popped up in a balloon, animated across the screen or hidden in a drawer waiting to pounce.  With a unique delivery of the ad I hope it can stand out without getting in the way or feel annoying.  With the non-standard sizes I hope it won’t get the immediate “this is an ad” response from people who see it.

One of the things I have learned about doing print advertising and direct mailers is that to be noticed you have to do the unexpected. Show someone something that they simply can’t ignore. That’s why the classic 5-cent letters work so well – a nickel attached to a letter will immediately get noticed, and you will surely open it up. Getting outside of the normal bar-style ad along the bottom of the screen is just one simple way to break out of the expected.

Going with the standard ad services out there such as Admob or one of the many other networks is a good way to get paid, but (from my testing) is a terrible way to advertise.  Erroneous and fraudulent clicks are rampant on phones and paying for clicks simply doesn’t come close to breaking even in most cases.

I needed ads that could be targeted and integrated with the look and feel of an app. To do that I wanted the flexibility of an image, but I also wanted the ability to provide both app-specific ads and network-wide ads. So, for example, if there is a free and a paid version of an app, I can always advertise the paid version in the free app. Moving the ads to the server also allows me to test different images or run special promotions.
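
As a rough idea of what that looks like in Django models – this is a simplified sketch, not the actual schema of my ad server – an ad can either be tied to one app or left network-wide:

from django.db import models

class App(models.Model):
    name = models.CharField(max_length=100)

class Ad(models.Model):
    image = models.ImageField(upload_to='ads/')  # non-standard sized creative
    target_url = models.URLField()
    # a null app means the ad runs network-wide across every app
    app = models.ForeignKey(App, null=True, blank=True, on_delete=models.CASCADE)
    active = models.BooleanField(default=True)

    def is_network_wide(self):
        return self.app is None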

Looking forward to seeing how it performs live in January.

This event has been a long time in the making and now the day has finally arrived.

Most readers of this blog would be interested in the technical details of the site.

Automatic Blog Machine is a Django application hosted on Amazon’s cloud infrastructure. The project has evolved over several years from various scripts that I had been using to personally manage some of my websites.

There were a number of pain points in the development and lessons learned.

PayPal Integration
The most difficult part of the coding was integrating with PayPal. It was also probably one of the more innovative parts from a sales/marketing perspective. I used the django-paypal app to handle the dirty work, but it was really the signup process that I thought worked out well.

For signups, the sales page presents an account creation form. This form creates an inactive user account and then redirects the user to the PayPal site to complete the purchase. When they finish the purchase, PayPal sends an IPN POST request back to the site, which activates the account. This allows me to create some commitment and consistency in the buying process, which should (in theory) boost conversions.
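
The activation hook itself is tiny. Here's a minimal sketch of the idea, assuming django-paypal's payment_was_successful IPN signal and that the PayPal 'custom' field carries the new user's id – the names are illustrative rather than the exact code:

from django.contrib.auth.models import User
from paypal.standard.ipn.signals import payment_was_successful

def activate_account(sender, **kwargs):
    # sender is the IPN record; 'custom' is assumed to hold the user id
    try:
        user = User.objects.get(pk=sender.custom)
    except User.DoesNotExist:
        return
    user.is_active = True
    user.save()

payment_was_successful.connect(activate_account)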

PayPal’s API is a bit of a mess and testing it is somewhat difficult. I’m looking forward to seeing what the guys at Stripe have to offer to compete.

Performance
There were a few weeks during development, after I ramped up the number of sites being managed, when the entire website was deathly slow. Because there are a number of Python processes scheduled to run through cron, it wasn’t easy to see which job was causing the bottleneck.

Eventually I discovered that one of the major culprits was the use of print statements in the code. A lot of the older code was logging with prints to standard out and then redirecting to a file. That was good enough when the scripts were a lot simpler and I was running them manually from the command line. Switching to Python's logging module made a massive difference.
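
The change itself is small. Something along these lines replaced the print-and-redirect pattern (the file name and logger name here are just examples):

import logging

# configure once per script; previously this was print plus shell redirection
logging.basicConfig(
    filename='cron_jobs.log',
    level=logging.INFO,
    format='%(asctime)s %(name)s %(levelname)s %(message)s',
)
log = logging.getLogger('blogmachine.tasks')

log.info('starting scheduled run')
log.error('failed to publish post for %s', 'example.com')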

Effort
Django is a great framework that makes it very quick to get things put together. What takes time, however, is a lot of the little things: creating logos in various sizes for different pages, writing content for pages, and messing with JavaScript or CSS. I was surprised at how much time it took to get videos done. Even a 2 minute video takes a minimum of about 40 minutes to film, get onto the computer, edit, export, and upload to the web. The 12 minute sales video took a full day to finish, plus a few hours to write a script.

The coding, thanks to Python and Django, took almost no time.

Development Tips/Tricks
Because I was the only developer on this project I was able to work in a way that probably wouldn’t work well for a team.

All the code I was working on was kept in a Dropbox folder – this allowed me to have access to the latest changes across all the computers I might use.

I used Fabric to deploy both the code and the media files. Static media files were hosted on Amazon S3. The code was committed to a private repository on Bitbucket and then remotely pulled to the server. This seemed to work out really well and made deployments very easy and quick to do.
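
The fabfile is short. Here's a sketch of its general shape, with the host name, paths, and reload step made up rather than copied from the real deployment:

from fabric.api import cd, env, local, run

env.hosts = ['myserver.example.com']  # placeholder host

def deploy():
    local('hg push')              # push the latest commits to Bitbucket
    with cd('/srv/myproject'):    # placeholder project path on the server
        run('hg pull -u')         # pull and update the checkout remotely
        run('touch django.wsgi')  # nudge the app server to reload the code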

Virtualenv and pip also were a great help for managing the libraries and python versions across the computers I have.

Don’t forget about some of the nice things that users need. Just an hour after launch I realized that I wasn’t sending welcome emails and I was hiding the link to the login page. So for the first few people I had to manually send a message explaining how to log in. Somewhat embarrassing.

Spend some time thinking about how best to organize the templates. As the project grew, I found I was doing a lot more copy/paste, and it resulted in some minor bugs where pages had missing links. It also left me with probably a few more files than were really necessary for base templates that were 90% identical.

Much of the work done on this happens in scripts that are scheduled to run – there’s no webpage getting hit, so errors are not visible when browsing the site. It was really helpful to have those scripts set up so that errors generated emails with stack traces – that allowed me to quickly track down some lingering problems.
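
The pattern is simple enough to sketch. This version leans on Django's mail_admins helper and assumes ADMINS is configured in settings, which is close to, but not exactly, what the real scripts do:

import traceback
from django.core.mail import mail_admins

def run_job(job):
    """Run a scheduled job and email a stack trace to ADMINS if it fails."""
    try:
        job()
    except Exception:
        mail_admins('Scheduled job failed', traceback.format_exc())
        raise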

Final Point
I would definitely recommend Python and Django to anyone doing web application projects. They’re really a pleasure to use. As a developer, though, I totally underestimated the time it takes to get the UI just right and to think of all the various pages and emails that need to go out.

Someone asked me recently to develop a content spinner algorithm which can take a document and produce variations of that document. I thought it was an interesting thing to think about, and very rarely do I get to think about algorithms like this in my normal day-to-day programming.

The document contains variation options with a special syntax. For example:

{Hi|Hello|Good morning}, my name is Matt and I have {something {important|special} to say|a favorite book}.

The algorithm will recursively go through the string and generate a new string by choosing an option provided in curly braces separated by pipes.

import random
 
def spin(content):
    """takes a string like
 
    {Hi|Hello|Good morning}, my name is Matt and I have {something {important|special} to say|a favorite book}.
 
    and randomly selects from the options in curly braces
    to produce unique strings.
    """
    start = content.find('{')
    end = content.find('}')
 
    if start == -1 and end == -1:
        #none left
        return content
    elif start == -1:
        return content
    elif end == -1:
        raise ValueError("unbalanced brace")
    elif end < start:
        return content
    elif start < end:
        rest = spin(content[start+1:])
        end = rest.find('}')
        if end == -1:
            raise ValueError("unbalanced brace")
        return content[:start] + random.choice(rest[:end].split('|')) + spin(rest[end+1:])
 
if __name__=='__main__':
    print(spin('{Hi|Hello|Good morning}, my name is Matt and I have {something {important|special} to say|a favorite book}.'))

This is the project that I have been focused on building for the last few months, and it has expanded from work that I have been doing off and on for several years. I’m excited that it finally has a home of its own and is nearly ready to release to the public.

I will be finishing up the design over the next month or so with the plan to do a very small announcement in early January to let some early adopters in to start using the system.

You are probably a lot like me and have a couple of idle domain names that you’ve picked up over the years and never really done anything with. You also have a few websites that are actually pretty decent but are not getting the traffic they deserve from the search engines. That’s why I created the Automatic Blog Machine – to solve these two problems for myself. I can easily create a blog, hook it into the system, and then forget about it. It will run for months, building traffic and links, attracting advertisers, and driving traffic to the websites I actually care about.

I’m using this system to build out my network of websites and grow an increasing base of pages I can then sell through Google DoubleClick for Publishers, sell text link ads on, or use to promote Amazon products, eBay auctions, or any number of other affiliate products. By integrating an ad server I can quickly and easily put an ad across the entire network at no cost to me and immediately drive massive amounts of traffic.

One concern is that the sites should not be spammy, and I took great care to make sure the content that the Automatic Blog Machine creates is unique, natural, and readable. That means auto-translation is not used, because it creates hard-to-read content; it also means there is a strategy for both internal and external linking. Getting these things to work correctly was actually pretty complex. It requires me to data-mine a lot of content in order to build each blog post.

I’m pretty proud of the development of this tool and I’m looking forward to letting people in to see how it works.

Check out the Automatic Blog Machine

After seeing the successful launch of the Autoblog Samurai product come through my email box over the last week, I thought it might be time to dig up the scripts that I wrote several years ago to attempt the same thing. Over the past few years I have run a couple of autoblogs but never really turned the concept into something that was really profitable or very easy to use (even though the blogs that I did run were making a small profit).

But after seeing the amount of excitement that Autoblog Samurai has been able to create around their software I’m intrigued enough to give it a second shot.

So I have started to revamp my existing hodgepodge of scripts into a proper web-based application. It will be a Django-based web application that allows users to configure many blogs and pipe many content sources into each one. Wrapped around everything will be a number of specific monetization tools, cross-promotion tools, and hopefully some analytics built right in.

The one big perk of having it web based is that it will never sleep, unlike your home computer, which may not always be turned on and have the software running. That bit of out-of-sight, out-of-mind might actually mean that users forget they’re running a bunch of autoblogs until they get a check in the mail from Google AdSense.

I’m not sure yet if I’ll make the software public or if I’ll keep it for myself.

After an hour of work on it last night I actually got a functional prototype system running. There are a couple of things that need to be cleaned up, and features to be added to make it competitive. But most of the work will be designing and polishing a nice user interface.

I would like to be able to create a system that can scale to 10,000+ blogs and publish content to all of them as often as every minute.  I think that would be an interesting experiment in internet marketing.

Connecting to a Google Gmail account is easy with Python using the built-in imaplib library. It’s possible to download, read, mark, and delete messages in your Gmail account by scripting it.

Here’s a very simple script that prints out the latest email received:

#!/usr/bin/env python

import imaplib

# connect to Gmail over IMAP/SSL and log in
M = imaplib.IMAP4_SSL('imap.gmail.com', 993)
M.login('myemailaddress@gmail.com', 'password')

# select the inbox; select() returns the message count in a one-element list
status, count = M.select('Inbox')

# the highest sequence number is the most recent message
status, data = M.fetch(count[0], '(UID BODY[TEXT])')

print(data[0][1])
M.close()
M.logout()

As you can see, not a lot of code is required to log in and check an email. However, imaplib provides just a very thin layer over the IMAP protocol, and you’ll have to refer to the documentation on how IMAP works and the commands available to really use imaplib. As you can see in the fetch command, the “(UID BODY[TEXT])” bit is a raw IMAP instruction. In this case I’m calling fetch with the size of the Inbox folder, because the most recent email is listed last (the sequence number of the newest message equals the message count), and telling it to return the body text of the email. There are many more complex ways to navigate an IMAP inbox. I recommend playing with it in the interpreter and connecting directly to the server with telnet to understand exactly what is happening.
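
The same raw-command approach covers marking and deleting. A quick follow-up sketch, using the same throwaway credentials as above:

import imaplib

M = imaplib.IMAP4_SSL('imap.gmail.com', 993)
M.login('myemailaddress@gmail.com', 'password')
status, count = M.select('Inbox')
latest = count[0]                       # sequence number of the newest message
M.store(latest, '+FLAGS', '\\Seen')     # mark it as read
M.store(latest, '+FLAGS', '\\Deleted')  # flag it for deletion
M.expunge()                             # actually remove flagged messages
M.close()
M.logout()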

Here’s a good resource for quickly getting up to speed with IMAP: Accessing IMAP email accounts using telnet.

As much as I’ve found basic web scraping to be really simple with urllib and BeautifulSoup, it leaves some things to be desired. The BeautifulSoup project has languished, and recent versions have switched the HTML parser for one that is less able to cope with the poorly encoded pages on real websites.

Scrapy is a full-on framework for scraping websites, and it offers many features including a standalone command-line interface and daemon tool to make scraping websites much more systematic and organized.

I have yet to build any substantial scraping scripts based on Scrapy, but judging from the snippets I’ve read at http://snippets.scrapy.org, the documentation at http://doc.scrapy.org, and the project blog at http://blog.scrapy.org, it seems like a solid project with a good future and a lot of really great features that will make my scripts more automatable and standardized.
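
For reference, a minimal spider in a recent Scrapy release looks roughly like this – the target site and selector are placeholders, not something I’ve actually scraped:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['http://example.com/']

    def parse(self, response):
        # yield one item per heading found on the page
        for title in response.css('h1::text').getall():
            yield {'title': title}

Saved as example_spider.py, it can be run on its own with the scrapy runspider command, without setting up a full project.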