Tag Archives: google

It occurred to me yesterday, while reading What Happened to Yahoo, an essay by technology investor and author Paul Graham, that RIM seems to be making the same mistakes.

Yahoo was making a lot of money – so much, in fact, that they stopped trying to make more of it. They blatantly ignored new product ideas and improvements to their services because it was too easy for the sales guys to sell banner ad space for millions of dollars. Along came a competitor (Google) with a fundamentally more valuable product (Adwords) that very quickly destroyed Yahoo’s banner advertising business. The lack of development at Yahoo over the years had unfortunately hollowed out the corporate culture, leaving them ill-equipped to invent new products. They certainly couldn’t keep up with all the smart people Google was hiring.

And so the inevitable happened. The business shrank, forcing them to divest from important products and “re-focus” on core products.

I suspect that RIM made the same critical mistake. They owned the smartphone market until Apple came out of the blue with the iPhone. However, it took Apple about three years of development to release the first iPhone. Certainly at some point in those three years the pieces of technology that make the iPhone possible were discussed and tossed aside by people at RIM. They were far too committed to their crappy apps and QWERTY keyboards to try anything that might be an improvement.

The money RIM made hand over fist came from enterprises that would buy expensive servers to manage all their corporate phones. It was the only solution for so long that RIM really didn’t have to sell it. Every new phone was only a minor improvement on the existing model, but people bought them, so RIM probably thought they were doing a good enough job.

Now that there’s a competitor in the market, RIM finds itself in a bad situation. The phone’s operating system was worked on by mediocre programmers who left it in a state where it now has to be tossed out – worse, RIM couldn’t trust its internal programmers to rebuild their own operating system, so they brought in QNX to do a proper job of it. The hardware hasn’t seen any major revisions or new ideas. And the backend servers that enterprises spent lots of money on have become obsolete – the iPhone can connect directly to Microsoft Exchange.

RIM’s business is now in decline and I suspect that they will soon find themselves cutting product lines to “re-focus” on key markets. They’ll continue to slide until eventually they become an acquisition target for the likes of Nokia or Motorola.

My suggestion to RIM to get out of this is to step back and address some internal issues. They need to hire the smartest programmers they can find – steal them from Google, Apple, Facebook, and Microsoft if they have to. Give those programmers the freedom to work on projects that aspire to do amazing things for users (not businesses) and hold them to that vision. Rebuild the corporate culture to find and embrace new ideas, more efficient ways of doing things, and new business ventures to make money from. Encourage employees to mingle between departments to discuss ideas, and give them the time and means to work on them.

Want another idea? Start a business incubator program following the model of Y Combinator but focused on BlackBerry apps. Hire students for four months, give them about $20,000, and help with the legal work of starting a business as well as networking connections. In exchange, take a 5%–10% equity stake in the business. It would help establish the developer community, potentially produce some killer apps or games for the platform, and could eventually result in a huge payout if these companies ever get acquired or IPO.

I suspect however that RIM is a ship too big to turn.

I have been hard at work testing different approaches to Adwords.  One of the keys is that I’m scripting a lot of the management of campaigns, ad groups, keywords, and ads.  The Adwords API could be used, but the cost of using it would be significant at the size of my campaigns.  So I have been using the Adwords Editor to manage everything.  What makes it excellent is that the tool can import and export CSV files, which makes it pretty simple to play with the data.

To get a file that this script will work with, go to the File menu in Google Adwords Editor and select “Export to CSV”, then “Export Selected Campaigns”.  It will write out a CSV file.

This Python script will read those output csv files into a Python data structure which you can then manipulate and write back out to a file.

With the file modified you can then use the Adwords Editor’s “Import CSV” facility to get your changes back into the Editor and then uploaded to Adwords.

Having this ability to pull this data into Python, modify it, and then get it back into Adwords means that I can do a lot of really neat things:

  • create massive campaigns with a large number of targeted ads
  • invent bidding strategies that act individually at the keyword level
  • automate some of the management
  • pull in statistics from CPA networks to calculate ROIs
  • convert text ads into image ads
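As a taste of what that kind of manipulation looks like, here is a minimal sketch of a keyword-level bid adjustment pass over the nested campaign → ad group → keyword structure that the script below builds. The `bump_keyword_bids` helper and the sample data are my own illustration, not part of the HalOtis scripts:

```python
def bump_keyword_bids(campaigns, factor):
    """Multiply every keyword-level Max CPC by `factor`, in place."""
    for campaign in campaigns.values():
        for adgroup in campaign.get('Ad Groups', {}).values():
            for keyword in adgroup.get('keywords', []):
                if 'Max CPC' in keyword:
                    keyword['Max CPC'] = '%.2f' % (float(keyword['Max CPC']) * factor)

# A tiny sample in the same nested shape: campaign -> ad groups -> keywords
campaigns = {'Camp1': {'Campaign': 'Camp1', 'Ad Groups': {
    'AG1': {'Ad Group': 'AG1',
            'keywords': [{'Keyword': 'widgets', 'Max CPC': '0.25'}]}}}}
bump_keyword_bids(campaigns, 1.2)
print(campaigns['Camp1']['Ad Groups']['AG1']['keywords'][0]['Max CPC'])  # 0.30
```

Run a pass like this between the export and the re-import and every bid in the account moves in one step.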

Here’s the script:

#!/usr/bin/env python
# coding=utf-8
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
"""Read and write exported campaigns from Adwords Editor."""
import codecs
import csv
FIELDS = ['Campaign', 'Campaign Daily Budget', 'Languages', 'Geo Targeting', 'Ad Group', 'Max CPC', 'Max Content CPC', 'Placement Max CPC', 'Max CPM', 'Max CPA', 'Keyword', 'Keyword Type', 'First Page CPC', 'Quality Score', 'Headline', 'Description Line 1', 'Description Line 2', 'Display URL', 'Destination URL', 'Campaign Status', 'AdGroup Status', 'Creative Status', 'Keyword Status', 'Suggested Changes', 'Comment', 'Impressions', 'Clicks', 'CTR', 'Avg CPC', 'Avg CPM', 'Cost', 'Avg Position', 'Conversions (1-per-click)', 'Conversion Rate (1-per-click)', 'Cost/Conversion (1-per-click)', 'Conversions (many-per-click)', 'Conversion Rate (many-per-click)', 'Cost/Conversion (many-per-click)']
def readAdwordsExport(filename):
    campaigns = {}
    f = codecs.open(filename, 'r', 'utf-16')
    reader = csv.DictReader(f, delimiter='\t')
    for row in reader:
        # remove empty values from the dict
        row = dict((i, j) for i, j in row.items() if j != '' and j is not None)
        if row.has_key('Campaign Daily Budget'):  # campaign level settings
            campaigns[row['Campaign']] = {}
            for k,v in row.items():
                campaigns[row['Campaign']][k] = v
        if row.has_key('Max Content CPC'):  # AdGroup level settings
            if not campaigns[row['Campaign']].has_key('Ad Groups'):
                campaigns[row['Campaign']]['Ad Groups'] = {}
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']] = row
        if row.has_key('Keyword'):  # keyword level settings
            if not campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']].has_key('keywords'):
                campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['keywords'] = []
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['keywords'].append(row)
        if row.has_key('Headline'):  # ad level settings
            if not campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']].has_key('ads'):
                campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['ads'] = []
            campaigns[row['Campaign']]['Ad Groups'][row['Ad Group']]['ads'].append(row)
    return campaigns
def writeAdwordsExport(data, filename):
    f = codecs.open(filename, 'w', 'utf-16')
    writer = csv.DictWriter(f, FIELDS, delimiter='\t')
    writer.writerow(dict(zip(FIELDS, FIELDS)))
    for campaign, d in data.items():
        writer.writerow(dict((i,j) for i, j in d.items() if i != 'Ad Groups'))
        for adgroup, ag in d.get('Ad Groups', {}).items():
            writer.writerow(dict((i,j) for i, j in ag.items() if i != 'keywords' and i != 'ads'))
            for keyword in ag.get('keywords', []):
                writer.writerow(keyword)
            for ad in ag.get('ads', []):
                writer.writerow(ad)
if __name__=='__main__':
    data = readAdwordsExport('export.csv')
    print 'Campaigns:'
    print data.keys()
    writeAdwordsExport(data, 'output.csv')

This code is available in my public repository: http://bitbucket.org/halotis/halotis-collection/

Last week I started testing some new concepts on Adwords. A week has passed and I wanted to recap what has happened and some things that I have noticed and learned so far.

First off, the strategy I’m testing uses the content network exclusively. As a result, some of the standard SEM practices don’t really apply: click-through rates are dramatically lower than on search, and it takes some time to get used to 0.1% CTRs. It takes a lot of impressions to get traffic at those levels.

Luckily the inventory of ad space from people using Adsense is monstrous, and as a result there are plenty of opportunities for placements. So for my limited week of testing I have had about 150,000 impressions on my ads, resulting in 80 clicks.

The other thing to note is that comparatively few advertisers run ads on the content network, so the competition is almost non-existent. That makes the price per click very low. The total ad spend for the first week of testing was less than $10.
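A quick back-of-the-envelope calculation using the figures above shows just how cheap this traffic is (treating the $10 spend as an upper bound, since the actual total was a bit less):

```python
impressions = 150000
clicks = 80
spend = 10.00  # "less than $10", so treat this as an upper bound

ctr = clicks / float(impressions)  # click-through rate
cpc = spend / clicks               # average cost per click, at most

print('CTR: %.3f%%' % (ctr * 100))  # CTR: 0.053%
print('CPC at most: $%.3f' % cpc)   # CPC at most: $0.125
```

At a 0.053% CTR you need roughly 1,900 impressions per click, but at around 12 cents per click the impressions are nearly free.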

I have run into a number of problems in my testing that I never expected.

  • It’s not possible to use the Adwords API to build flash ads with the Display Ad Builder :(
  • There seems to be a bug with the Adwords Editor when trying to upload a lot of image ads.
  • It takes a long time for image ads to be approved and start running (none of my image ads have been approved yet)
  • Paying to use the Adwords API works out to be very expensive for the scale I want to use it at.
  • Optimizing the price is time-consuming since it can take days to see enough results.

With all those problems I’m still optimistic that I can find a way to scale things up more radically.  So far in the past week I have written a number of scripts that have helped me build out the campaigns, ad groups and ads.  It has gotten to the point where I can now upload over 1000 text ads to new campaigns, ad groups and keywords in one evening.
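The basic trick behind generating that many ads in one evening is to take the cross product of keyword lists and headline templates before exporting everything to CSV. The sketch below is my own illustration of the approach, not the actual HalOtis scripts; the template strings and field names are hypothetical:

```python
import itertools

def build_ads(keywords, templates):
    """Generate one ad row per (keyword, template) pair, substituting the
    keyword into the headline template."""
    rows = []
    for kw, tpl in itertools.product(keywords, templates):
        rows.append({'Ad Group': kw,
                     'Keyword': kw,
                     'Headline': tpl % kw.title()})
    return rows

ads = build_ads(['red widgets', 'blue widgets'],
                ['Buy %s Online', 'Cheap %s'])
print(len(ads))  # 4
```

With a few dozen keywords and a handful of templates this produces thousands of tightly targeted ads that can be pasted straight into the Adwords Editor import.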

Since so far the testing has been inconclusive I’m going to hold off sharing the scripts I have for this strategy.  If it works out you can count on me recording some videos of the technique and the scripts to help.

Have you ever wanted to track and assess your SEO efforts by seeing how they change your position in Google’s organic SERP? With this script you can now track and chart your position for any number of search queries and find the position of the site/page you are trying to rank.

This will allow you to visually identify any target keyword phrases that are doing well, and which ones may need some more SEO work.

This python script has a number of different components.

  • SEOCheckConfig.py script is used to add new target search queries to the database.
  • SEOCheck.py searches Google and saves the best position (in the top 100 results)
  • SEOCheckCharting.py graphs all the results

The charts produced look like this:


The main part of the script is SEOCheck.py. This script should be scheduled to run regularly (I have mine running 3 times per day on my webfaction hosting account).
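The core of what SEOCheck.py does – finding where a domain first appears in an ordered list of result URLs – can be sketched as a simple scan. The function below is my own illustration of the idea rather than the exact code from the repository:

```python
def best_position(result_urls, domain):
    """Return the 1-based position of the first result hosted on `domain`
    (or a subdomain of it), or None if it doesn't appear at all."""
    for position, url in enumerate(result_urls, start=1):
        # crude host extraction: strip the scheme, then the path
        host = url.split('//', 1)[-1].split('/', 1)[0]
        if host == domain or host.endswith('.' + domain):
            return position
    return None

results = ['http://example.org/page',
           'http://www.halotis.com/blog/',
           'http://halotis.com/']
print(best_position(results, 'halotis.com'))  # 2
```

Saving that position with a timestamp on every run is what gives the charts their data points.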

For a small SEO consultancy business this type of application generates the feedback and reports that you should be using to communicate with your clients. It identifies where the efforts should go and how successful you have been.

To use this set of scripts you will first need to edit and run the SEOCheckConfig.py file. Add your own queries and domains that you’d like to check to the SETTINGS variable, then run the script to load them into the database.

Then schedule SEOCheck.py to run periodically. On Windows you can do that using Scheduled Tasks.

On Mac OS X or Linux you can use crontab to schedule it.
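For example, a crontab entry that runs the check three times a day might look like this (the interpreter and script paths are placeholders for your own setup):

```
# m h  dom mon dow  command
0 6,14,22 * * * /usr/bin/python /home/you/seocheck/SEOCheck.py
```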

To generate the Chart simply run the SEOCheckCharting.py script. It will plot all the results on one graph.

You can find and download all the source code for this in the HalOtis-Collection on bitbucket. It requires BeautifulSoup, matplotlib, and sqlalchemy libraries to be installed.

Here’s a short script that will scrape the first 100 listings in the Google organic results.

You might want to use this to find the positions of your sites and track them for certain target keyword phrases over time. That could be a very good way to determine, for example, whether your SEO efforts are working. Or you could use the list of URLs as a starting point for some other web crawling activity.

As the script is written it will just dump the list of URLs to a txt file.

It uses the BeautifulSoup library to help with parsing the HTML page.

Example Usage:

$ python GoogleScrape.py
$ cat links.txt

Here’s the script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
import urllib,urllib2
from BeautifulSoup import BeautifulSoup
def google_grab(query):
    address = "http://www.google.com/search?q=%s&num=100&hl=en&start=0" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read(200000)
    soup = BeautifulSoup(page)
    links =   [x['href'] for x in soup.findAll('a', attrs={'class':'l'})]
    return links
if __name__=='__main__':
    # Example: search results written to links.txt
    links = google_grab('halotis')
    f = open('links.txt', 'w')
    f.write('\n'.join(links))
    f.close()

This isn’t my script, but I thought it would appeal to readers of this blog.  It’s a script that will look up the Google PageRank for any website, using the same interface the Google Toolbar does. I’d like to thank Fred Cirera for writing it, and you can check out his blog post about this script here.

I’m not exactly sure what I would use this for, but it might have applications for anyone who wants to do some really advanced SEO work, such as PageRank sculpting, or finding the best websites to put links on.

The reason it is such an involved bit of math is that it needs to compute a checksum in order to work. It should be pretty reliable since it doesn’t involve any scraping.

Example usage:

$ python pagerank.py http://www.google.com/
PageRank: 10	URL: http://www.google.com/
$ python pagerank.py http://www.mozilla.org/
PageRank: 9	URL: http://www.mozilla.org/
$ python pagerank.py http://halotis.com
PageRank: 3   URL: http://www.halotis.com/

And the script:

#!/usr/bin/env python
#  Script for getting Google Page Rank of page
#  Google Toolbar 3.0.x/4.0.x Pagerank Checksum Algorithm
#  original from http://pagerank.gamesaga.net/
#  this version was adapted from http://www.djangosnippets.org/snippets/221/
#  by Corey Goldberg - 2010
#  Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
import sys
import urllib
def get_pagerank(url):
    hsh = check_hash(hash_url(url))
    gurl = 'http://www.google.com/search?client=navclient-auto&features=Rank:&q=info:%s&ch=%s' % (urllib.quote(url), hsh)
    try:
        f = urllib.urlopen(gurl)
        rank = f.read().strip()[9:]
    except Exception:
        rank = 'N/A'
    if rank == '':
        rank = '0'
    return rank
def  int_str(string, integer, factor):
    for i in range(len(string)) :
        integer *= factor
        integer &= 0xFFFFFFFF
        integer += ord(string[i])
    return integer
def hash_url(string):
    c1 = int_str(string, 0x1505, 0x21)
    c2 = int_str(string, 0, 0x1003F)
    c1 >>= 2
    c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
    c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
    c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)
    t1 = (c1 & 0x3C0) << 4
    t1 |= c1 & 0x3C
    t1 = (t1 << 2) | (c2 & 0xF0F)
    t2 = (c1 & 0xFFFFC000) << 4
    t2 |= c1 & 0x3C00
    t2 = (t2 << 0xA) | (c2 & 0xF0F0000)
    return (t1 | t2)
def check_hash(hash_int):
    hash_str = '%u' % (hash_int)
    flag = 0
    check_byte = 0
    i = len(hash_str) - 1
    while i >= 0:
        byte = int(hash_str[i])
        if 1 == (flag % 2):
            byte *= 2
            byte = byte / 10 + byte % 10
        check_byte += byte
        flag += 1
        i -= 1
    check_byte %= 10
    if 0 != check_byte:
        check_byte = 10 - check_byte
        if 1 == flag % 2:
            if 1 == check_byte % 2:
                check_byte += 9
            check_byte >>= 1
    return '7' + str(check_byte) + hash_str
if __name__ == '__main__':
    if len(sys.argv) != 2:
        url = 'http://www.google.com/'
    else:
        url = sys.argv[1]
    print get_pagerank(url)

Sometimes it can be quite useful to be able to translate content from one language to another from within a program. There are many compelling reasons why you might like the idea of auto-translating text. The reason I’m interested in writing this script is that it is sometimes useful to create unique content online for SEO reasons. Search engines like to see unique content rather than words that have been copied and pasted from other websites. What you’re looking for in web content is:

  1. A lot of it.
  2. Highly related to the keywords you’re targeting.

When trying to get a great position in the organic search results, it is important to recognize that you’re competing against an army of low-cost outsourced writers who are pumping out page after page of mediocre content and then running scripts to generate thousands of back-links to the sites they are trying to rank.  It is nearly impossible to get the top spot for any desirable keyword if you’re writing all the content yourself.  You need some help with this.

That’s where Google Translate comes in.

Take an article from somewhere and push it through a round trip of translation, such as English->French->English, and the content will be unique enough that it won’t raise any flags for having been copied from somewhere else on the internet.  The result may not read well, but it makes fodder for the search engines to eat up.

Using this technique it is possible to build massive websites of unique content overnight and have it quickly rank highly.
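The round trip itself is just a composition of two calls to a translate function with the same signature as the script below (`translate(sl, tl, text)`). Here is a minimal sketch, with the translator passed in as a parameter so the idea is independent of how the translation is actually fetched; the toy dictionary stands in for the real service:

```python
def round_trip(text, translate, src='en', pivot='fr'):
    """Reword `text` by translating src -> pivot -> src."""
    return translate(pivot, src, translate(src, pivot, text))

# With a real translator the output is a reworded variant of the input;
# this fake lookup table just demonstrates the composition.
fake = {('en', 'fr', 'hello'): 'bonjour', ('fr', 'en', 'bonjour'): 'hi'}
print(round_trip('hello', lambda sl, tl, t: fake[(sl, tl, t)]))  # hi
```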

Unfortunately Google doesn’t provide an API for translating text.  That means the script has to resort to scraping, which is inherently prone to breaking.  The script uses BeautifulSoup to help with the parsing of the HTML content. (Note: I had to use the older 3.0.x series of BeautifulSoup to successfully parse the content.)

The code for this was based on this script by technobabble.

import sys
import urllib2
import urllib
from BeautifulSoup import BeautifulSoup # available at: http://www.crummy.com/software/BeautifulSoup/
def translate(sl, tl, text):
    """ Translates a given text from source language (sl) to
        target language (tl) """
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)')]
    translated_page = opener.open(
        "http://translate.google.com/translate_t?" + 
        urllib.urlencode({'sl': sl, 'tl': tl}),
        data=urllib.urlencode({'hl': 'en',
                               'ie': 'UTF8',
                               'text': text.encode('utf-8'),
                               'sl': sl, 'tl': tl}))
    translated_soup = BeautifulSoup(translated_page)
    return translated_soup('div', id='result_box')[0].string
if __name__=='__main__':
    print translate('en', 'fr', u'hello')

To generate unique content you can use this within your own python program like this:

from translate import translate  # assuming the script above was saved as translate.py
content = get_content()  # your own function that fetches the source article
new_content = translate('fr', 'en', translate('en', 'fr', content))

There are a number of services out there such as Google Cash Detective that will go run some searches on Google and then save the advertisements so you can track who is advertising for what keywords over time. It’s actually a very accurate technique for finding out what ads are profitable.

After tracking a keyword for several weeks it’s possible to see what ads have been running consistently over time. The nature of Pay Per Click is that only profitable advertisements will continue to run long term. So if you can identify what ads, for what keywords are profitable then it should be possible to duplicate them and get some of that profitable traffic for yourself.
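Once the sightings are in a database, spotting the consistent (and therefore probably profitable) ads reduces to counting how many distinct weeks each ad was seen in. A sketch of that query in plain Python – the `(ad_id, date)` sighting format is my own illustration of what a tracking database might hold:

```python
from collections import defaultdict
from datetime import date

def persistent_ads(sightings, min_weeks=3):
    """Return ad ids seen in at least `min_weeks` distinct ISO weeks.
    `sightings` is a list of (ad_id, date) pairs."""
    weeks = defaultdict(set)
    for ad_id, day in sightings:
        weeks[ad_id].add(day.isocalendar()[:2])  # (ISO year, ISO week)
    return sorted(ad for ad, w in weeks.items() if len(w) >= min_weeks)

sightings = [('ad1', date(2009, 7, 1)), ('ad1', date(2009, 7, 8)),
             ('ad1', date(2009, 7, 15)), ('ad2', date(2009, 7, 1))]
print(persistent_ads(sightings))  # ['ad1']
```

Ads that survive this filter are the ones worth studying and imitating.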

The following script is a Python program that perhaps breaks the Google terms of service, so consider it a guide for how this kind of HTML parsing could be done. It spoofs the User-agent to appear as though it is a real browser, then searches for every keyword stored in an SQLite database and stores the ads displayed for each keyword back in the database.

The script makes use of the awesome Beautiful Soup library. Beautiful Soup makes parsing HTML content really easy. But because of the nature of scraping the web it is very fragile since it makes several assumptions about the structure of the Google results page and if they change their site then the script could break.

#!/usr/bin/env python
import sys
import urllib2
import re
import sqlite3
import datetime
from BeautifulSoup import BeautifulSoup  # available at: http://www.crummy.com/software/BeautifulSoup/
conn = sqlite3.connect("espionage.sqlite")
conn.row_factory = sqlite3.Row
def get_google_search_results(keywordPhrase):
	"""make the GET request to Google.com for the keyword phrase and return the HTML text"""
	url='http://www.google.com/search?hl=en&q=' + '+'.join(keywordPhrase.split())
	req = urllib2.Request(url)
	req.add_header('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/ Safari/525.13')
	page = urllib2.urlopen(req)
	HTML = page.read()
	return HTML
def scrape_ads(text, phraseID):
	"""Scrape the text as HTML, find and parse out all the ads and store them in a database"""
	soup = BeautifulSoup(text)
	#get the ads on the right hand side of the page
	ads = soup.find(id='rhsline').findAll('li')
	position = 0
	for ad in ads:
		position += 1
		#display url
		parts = ad.find('cite').findAll(text=True)
		site = ''.join([word.strip() for word in parts]).strip()
		#the header line
		parts = ad.find('a').findAll(text=True)
		title = ' '.join([word.strip() for word in parts]).strip()
		#the destination URL
		href = ad.find('a')['href']
		start = href.find('&q=')
		if start != -1 :
			dest = href[start+3:]
		else :
			dest = None
			print 'error', href
		#body of ad
		brs = ad.findAll('br')
		for br in brs:
			br.replaceWith('%BR%')  # mark line breaks so the ad body can be split
		parts = ad.findAll(text=True)
		body = ' '.join([word.strip() for word in parts]).strip()
		body_parts = body.split('%BR%')
		line1 = body_parts[0].strip()
		line2 = body_parts[1].strip() if len(body_parts) > 1 else ''
		#see if the ad is in the database
		c = conn.cursor()
		c.execute('SELECT adID FROM AdTable WHERE destination=? and title=? and line1=? and line2=? and site=? and phraseID=?', (dest, title, line1, line2, site, phraseID))
		result = c.fetchall() 
		if len(result) == 0:
			#NEW AD - insert into the table
			c.execute('INSERT INTO AdTable (`destination`, `title`, `line1`, `line2`, `site`, `phraseID`) VALUES (?,?,?,?,?,?)', (dest, title, line1, line2, site, phraseID))
			c.execute('SELECT adID FROM AdTable WHERE destination=? and title=? and line1=? and line2=? and site=? and phraseID=?', (dest, title, line1, line2, site, phraseID))
			result = c.fetchall()
		elif len(result) > 1:
			print 'warning: duplicate ads in database for', dest
		adID = result[0]['adID']
		c.execute('INSERT INTO ShowTime (`adID`,`date`,`time`, `position`) VALUES (?,?,?,?)', (adID, datetime.datetime.now(), datetime.datetime.now(), position))
def do_all_keywords():
	c = conn.cursor()
	c.execute('SELECT * FROM KeywordList')
	result = c.fetchall()
	for row in result:
		html = get_google_search_results(row['keywordPhrase'])
		scrape_ads(html, row['phraseID'])
if __name__ == '__main__' :
	do_all_keywords()
	conn.commit()