Tag Archives: seo

Have you ever wanted to track and assess your SEO efforts by seeing how they change your position in Google’s organic SERP? With this script you can now track and chart your position for any number of search queries and find the position of the site/page you are trying to rank.

This will allow you to visually identify any target keyword phrases that are doing well, and which ones may need some more SEO work.

This project is made up of several Python scripts:

  • SEOCheckConfig.py adds new target search queries to the database.
  • SEOCheck.py searches Google and saves the best position found in the top 100 results.
  • SEOCheckCharting.py graphs all the results.

The charts produced look like this:

[seocheck example chart]

The main part of the system is SEOCheck.py. This script should be scheduled to run regularly (I have mine running 3 times per day on my WebFaction hosting account).

For a small SEO consultancy business, this type of application generates the feedback and reports that you should be using to communicate with your clients. It identifies where your efforts should go and how successful you have been.

To use this set of scripts, first edit and run the SEOCheckConfig.py file. Add your own queries and domains that you’d like to check to the SETTINGS variable, then run the script to load them into the database.
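
I haven’t reproduced SEOCheckConfig.py here, but as a rough illustration the SETTINGS variable might pair each target query with the domain you want to find in the results. Something like this (the exact structure is my assumption; check the file in the repository):

# hypothetical layout - see SEOCheckConfig.py in the repository for the real one
SETTINGS = [
    {'query': 'internet marketing scripts', 'domain': 'halotis.com'},
    {'query': 'google serp scraper',        'domain': 'halotis.com'},
]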

Then schedule SEOCheck.py to run periodically. On Windows you can do that using Scheduled Tasks:
[screenshot: Scheduled Task dialog]

On either Mac OS X or Linux you can use crontab to schedule it.
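
For example, a crontab entry like this runs the check three times a day (the path is just a placeholder; point it at wherever you keep the script):

0 */8 * * * python /home/user/seocheck/SEOCheck.py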

To generate the Chart simply run the SEOCheckCharting.py script. It will plot all the results on one graph.
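
The real charting script is in the repository, but the heart of it boils down to something like this matplotlib sketch (the seocheck.db filename and results table layout here are my assumptions, not the actual schema):

import sqlite3
import matplotlib.pyplot as plt

# hypothetical schema: results(query TEXT, date TEXT, position INTEGER)
conn = sqlite3.connect('seocheck.db')
queries = [row[0] for row in conn.execute('SELECT DISTINCT query FROM results')]
for query in queries:
    rows = conn.execute('SELECT position FROM results WHERE query=? ORDER BY date', (query,)).fetchall()
    plt.plot([r[0] for r in rows], label=query)

plt.gca().invert_yaxis()  # position 1 should sit at the top of the chart
plt.ylabel('Google position')
plt.legend()
plt.show()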

You can find and download all the source code for this in the HalOtis-Collection on bitbucket. It requires the BeautifulSoup, matplotlib, and SQLAlchemy libraries to be installed.

Yesterday I found a website that got me thinking.

It’s a niche website with a lot of unique content that is heavily SEO optimized. The business model is very simple – there are 4 CPA lead generation offers on each page, and the goal is quite apparent: keep people on the site with the content long enough to drive them to one of the CPA offers. I imagine that the site is very successful at making money.

It’s a simple model, and a website like that is not technically difficult to create. However, when I started to think about how to duplicate this business model for myself, I found it to be a mountainous undertaking.

And so I’m going to ask you: if you needed 30,000-50,000 words of high quality unique content that would keep people on your website for 10+ minutes, how would you get it? The two options I could think of are:

  1. Pay someone to research and write the content for you
  2. Buy a website that already has the unique content on it

Both of those options could cost $$$. Is there another way to get great content fast without spending so much money? Leave a comment with your thoughts.

By the way… the interesting website I found that started this train of thought is NursingSchool.org. How much do you think a website like that would be making? How much would it cost to develop?

Inlinks (aka backlinks) are an important aspect of your SEO strategy. They are one of the ways people find your website, and they are an indicator to search engines that your website is important and should rank well. So it is worth keeping an eye on this statistic for your website. The saying “you can’t manage what you can’t measure” applies here: if you want your website to rank well you need to manage your inlinks, and to manage them you need to measure them.

This script requires a Yahoo! AppID, which you can get from the Yahoo! Developer Network, because it uses the Yahoo! Site Explorer REST API rather than scraping any pages.

The script simply returns the total number of results, but you could easily extend it to print out all your inlinks (there’s a sketch of that after the code below). I will be using this to track my inlink count over time by running it every day and storing the result in a database.
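
For the tracking part, something as small as this would do (a minimal sketch assuming a local SQLite file; the table layout is my own, not from a released script):

import sqlite3
import datetime

from yahooInlinks import yahoo_inlinks_count

conn = sqlite3.connect('inlinks.db')
conn.execute('CREATE TABLE IF NOT EXISTS inlinks (date TEXT, site TEXT, count INTEGER)')

site = 'http://www.halotis.com'
conn.execute('INSERT INTO inlinks VALUES (?, ?, ?)',
             (datetime.date.today().isoformat(), site, yahoo_inlinks_count(site)))
conn.commit()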

Example Usage:

$ python yahooInlinks.py http://www.halotis.com
checking http://www.halotis.com
937

Here’s the Python Code:

#!/usr/bin/env python 
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import urllib2, sys, urllib
 
try:
    import json
except ImportError:
    import simplejson as json  # http://undefined.org/python/#simplejson
 
YAHOO_APP_ID = 'YOUR API KEY'
 
def yahoo_inlinks_count(query):
    if not query.startswith('http://'):
        raise Exception('site must start with "http://"')
 
    request = 'http://search.yahooapis.com/SiteExplorerService/V1/inlinkData?appid=' + YAHOO_APP_ID + '&query=' + urllib.quote_plus(query) + '&output=json&results=0'
 
    try:
        results = json.load(urllib2.urlopen(request))
    except:
        raise Exception("Web services request failed")
 
    return results['ResultSet']['totalResultsAvailable']
 
if __name__=='__main__':
    print 'checking', sys.argv[1]
    print yahoo_inlinks_count(sys.argv[1])
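
And here’s the kind of extension I had in mind for listing the inlinks themselves: drop this into the script above and ask the API for actual results instead of results=0. I’m assuming each entry in ResultSet/Result carries a Url field, so verify against the JSON you get back:

def yahoo_inlinks_list(query, count=50):
    # same endpoint, but request real results instead of just the total
    request = 'http://search.yahooapis.com/SiteExplorerService/V1/inlinkData?appid=' + YAHOO_APP_ID + '&query=' + urllib.quote_plus(query) + '&output=json&results=' + str(count)
    results = json.load(urllib2.urlopen(request))
    return [r['Url'] for r in results['ResultSet']['Result']]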

Ok, Yahoo search may be on the way out, set to be replaced by the engine behind Bing, but that transition won’t happen until sometime in 2010. Until then Yahoo still has 20% of the search engine market share, and it remains an important source of traffic for your websites.

This script is similar to the Google and Bing SERP scrapers that I posted earlier on this site, but Yahoo’s pages were slightly more complicated to parse because they use a redirect service in their URLs, which required some regular expression matching.
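
Concretely, each result link on Yahoo’s SERP is a redirect URL with the real destination embedded after a /** marker, so the script pulls it out and unquotes it. A quick demonstration with a made-up href:

import re, urllib

href = 'http://rds.yahoo.com/_ylt=A0/SIG=11x/**http%3a//www.halotis.com/'
url_pattern = re.compile('/\*\*(.*)')
print urllib.unquote_plus(url_pattern.findall(href)[0])  # http://www.halotis.com/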

I will be putting all these little components together into a larger program later.

Example Usage:

$ python yahooScrape.py
http://www.halotis.com/
http://www.halotis.com/2007/08/27/automation-is-key-automate-the-web/
http://twitter.com/halotis
http://www.scribd.com/halotis
http://www.topless-sandal.com/product_info.php/products_id/743?tsSid=71491a7bb080238335f7224573598606
http://feeds.feedburner.com/HalotisBlog
http://www.planet-tonga.com/sports/haloti_ngata.shtml
http://blog.oregonlive.com/ducks/2007/08/kellens_getting_it_done.html
http://friendfeed.com/mfwarren
http://friendfeed.com/mfwarren?start=30

Here’s the Script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import urllib,urllib2
import re
 
from BeautifulSoup import BeautifulSoup
 
def yahoo_grab(query):
 
    address = "http://search.yahoo.com/search?p=%s" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read(200000)
    urlfile.close()
 
    soup = BeautifulSoup(page)
    # the real destination is embedded in Yahoo's redirect URL after the /** marker
    url_pattern = re.compile('/\*\*(.*)')
    links = [urllib.unquote_plus(url_pattern.findall(x.find('a')['href'])[0]) for x in soup.find('div', id='web').findAll('h3')]
 
    return links
 
if __name__=='__main__':
    # Example: Search written to file
    links = yahoo_grab('halotis')
    print '\n'.join(links)

Based on my last post about scraping the Google SERP, I decided to make the small changes needed to scrape the organic search results from Bing.

I wasn’t able to find a way to display 100 results per page in the Bing results, so this script only returns the top 10. It could be enhanced to loop through the pages of results, but I have left that out of this code; a sketch of the idea follows.
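
For reference, here is roughly what the looping version could look like. Bing pages through results with the first query parameter (as far as I recall, first=11 starts the second page; verify against the live URLs before relying on this):

import urllib, urllib2
from BeautifulSoup import BeautifulSoup

def bing_grab_pages(query, pages=3):
    links = []
    for page in range(pages):
        # first=1 starts page 1, first=11 page 2, and so on
        address = "http://www.bing.com/search?q=%s&first=%d" % (urllib.quote_plus(query), page * 10 + 1)
        request = urllib2.Request(address, None, {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'})
        html = urllib2.urlopen(request).read(200000)
        soup = BeautifulSoup(html)
        links += [x.find('a')['href'] for x in soup.find('div', id='results').findAll('h3')]
    return links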

Example Usage:

$ python BingScrape.py
http://twitter.com/halotis
http://www.halotis.com/
http://www.halotis.com/progress/
http://doi.acm.org/10.1145/367072.367328
http://runtoloseweight.com/privacy.php
http://twitter.com/halotis/statuses/2391293559
http://friendfeed.com/mfwarren
http://www.date-conference.com/archive/conference/proceedings/PAPERS/2001/DATE01/PDFFILES/07a_2.pdf
http://twitterrespond.com/
http://heatherbreen.com

Here’s the Python Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import urllib,urllib2
 
from BeautifulSoup import BeautifulSoup
 
def bing_grab(query):
 
    address = "http://www.bing.com/search?q=%s" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read(200000)
    urlfile.close()
 
    soup = BeautifulSoup(page)
    links = [x.find('a')['href'] for x in soup.find('div', id='results').findAll('h3')]
 
    return links
 
if __name__=='__main__':
    # Example: Search written to file
    links = bing_grab('halotis')
    print '\n'.join(links)

Here’s a short script that will scrape the first 100 listings in the Google organic results.

You might want to use this to find the position of your sites and track their position for certain target keyword phrases over time. That could be a very good way to determine, for example, if your SEO efforts are working. Or you could use the list of URLs as a starting point for some other web crawling activity.
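
Turning the scraped list into a rank check is then straightforward: look for the first URL that belongs to your domain. A small helper built on the google_grab function below:

def serp_position(domain, links):
    # 1-based position of the first result on the given domain, or None if absent
    for i, link in enumerate(links):
        if domain in link:
            return i + 1
    return None

# e.g. serp_position('halotis.com', google_grab('halotis'))  ->  1 with the results above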

As the script is written it will just dump the list of URLs to a txt file.

It uses the BeautifulSoup library to help with parsing the HTML page.

Example Usage:

$ python GoogleScrape.py
$ cat links.txt
http://www.halotis.com/
http://www.halotis.com/2009/07/01/rss-twitter-bot-in-python/
http://www.blogcatalog.com/blogs/halotis.html
http://www.blogcatalog.com/topic/sqlite/
http://ieeexplore.ieee.org/iel5/10358/32956/01543043.pdf?arnumber=1543043
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1543043
http://doi.ieeecomputersociety.org/10.1109/DATE.2001.915065
http://rapidlibrary.com/index.php?q=hal+otis
http://www.tagza.com/Software/Video_tutorial_-_URL_re-directing_software-___HalOtis/
http://portal.acm.org/citation.cfm?id=367328
http://ag.arizona.edu/herbarium/db/get_taxon.php?id=20605&show_desc=1
http://www.plantsystematics.org/taxpage/0/genus/Halotis.html
http://www.mattwarren.name/
http://www.mattwarren.name/2009/07/31/net-worth-update-3-5/
http://newweightlossdiet.com/privacy.php
http://www.ingentaconnect.com/content/nisc/sajms/1988/00000006/00000001/art00002?crawler=true
http://www.ingentaconnect.com/content/nisc/sajms/2000/00000022/00000001/art00013?crawler=true
http://www.springerlink.com/index/etm69yghjva13xlh.pdf
http://www.springerlink.com/index/b7fytc095bc57x59.pdf
......
$

Here’s the script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import urllib,urllib2
 
from BeautifulSoup import BeautifulSoup
 
def google_grab(query):
 
    address = "http://www.google.com/search?q=%s&num=100&hl=en&start=0" % (urllib.quote_plus(query))
    request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
    urlfile = urllib2.urlopen(request)
    page = urlfile.read(200000)
    urlfile.close()
 
    soup = BeautifulSoup(page)
    links = [x['href'] for x in soup.findAll('a', attrs={'class':'l'})]
 
    return links
 
if __name__=='__main__':
    # Example: Search written to file
    links = google_grab('halotis')
    open("links.txt","w+b").write("\n".join(links))

Sometimes it can be quite useful to translate content from one language to another from within a program. There are many compelling reasons why you might like the idea of auto-translating text. My own interest in writing this script is that it can be used to create unique content online for SEO reasons. Search engines like to see unique content rather than words that have been copied and pasted from other websites. What you’re looking for in web content is:

  1. A lot of it.
  2. Highly related to the keywords you’re targeting.

When trying to get a great position in the organic search results, it is important to recognize that you’re competing against an army of low-cost outsourced writers who are pumping out page after page of mediocre content and then running scripts to generate thousands of back-links to the sites they are trying to rank. It is all but impossible to get the top spot for any desirable keyword if you’re writing all the content yourself. You need some help with this.

That’s where Google Translate comes in.

Take an article from somewhere, push it through a round trip of translation such as English->French->English, and the content will be unique enough that it won’t raise any flags as having been copied from somewhere else on the internet. The result may not read well, but it makes fine fodder for the search engines to eat up.

Using this technique it is possible to build massive websites of unique content overnight and have it quickly rank highly.

Unfortunately Google doesn’t provide an API for translating text. That means the script has to resort to scraping, which is inherently prone to breaking. The script uses BeautifulSoup to help with parsing the HTML content. (Note: I had to use the older 3.0.x series of BeautifulSoup to successfully parse the content.)

The code for this was based on this script by technobabble.

import sys
import urllib2
import urllib
 
from BeautifulSoup import BeautifulSoup # available at: http://www.crummy.com/software/BeautifulSoup/
 
def translate(sl, tl, text):
    """ Translates a given text from source language (sl) to
        target language (tl) """
 
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)')]
 
    translated_page = opener.open(
        "http://translate.google.com/translate_t?" + 
        urllib.urlencode({'sl': sl, 'tl': tl}),
        data=urllib.urlencode({'hl': 'en',
                               'ie': 'UTF8',
                               'text': text.encode('utf-8'),
                               'sl': sl, 'tl': tl})
    )
 
    translated_soup = BeautifulSoup(translated_page)
 
    return translated_soup('div', id='result_box')[0].string
 
if __name__=='__main__':
    print translate('en', 'fr', u'hello')

To generate unique content you can use this from within your own Python program like this (get_content and publish_content are stand-ins for your own functions):

from translate import translate
 
content = get_content()        # stand-in: fetch your source article here
new_content = translate('fr', 'en', translate('en', 'fr', content))
publish_content(new_content)   # stand-in: post the spun article to your site