
Sometimes it’s useful to know where all the back-links to a website are coming from.

If you are competing with a site, its back-links show how your competition is promoting itself. Finding out who links to your competitors is a shortcut to the good places to get links from, and it can also point you to potential clients or useful contacts for your business.

If you’re buying or selling a website, the number and quality of its back-links help determine its value. Checking the links to a site should be on the checklist you use when buying a website.

With that in mind, I wrote a short Python script that scrapes the links to a particular domain from the list that Alexa provides.

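import urllib2

from BeautifulSoup import BeautifulSoup

def get_alexa_linksin(domain):
    """Return (href, text) tuples for the sites Alexa lists as linking to domain."""
    page = 0
    linksin = []

    while True:
        # Alexa paginates the list; the page number follows the semicolon.
        url = 'http://www.alexa.com/site/linksin;' + str(page) + '/' + domain
        html = urllib2.urlopen(url).read()
        soup = BeautifulSoup(html)

        container = soup.find(id='linksin')
        if container is None:
            # No link list found on this page, so stop.
            break

        # A link with class "next" means there is another page of results.
        next_link = container.find('a', attrs={'class': 'next'})

        # Collect the back-links, leaving out the pagination link itself.
        linksin += [(link['href'], link.string)
                    for link in container.findAll('a', href=True)
                    if link.get('class') != 'next']

        if next_link:
            page += 1
        else:
            break

    return linksin

if __name__ == '__main__':
    linksin = get_alexa_linksin('halotis.com')
    print linksin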
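
A raw list of links is hard to judge at a glance, so one possible next step is to group the results by the site they come from. The sketch below is not part of the original script: it is written for Python 3 using only the standard library, the helper name count_linking_domains is made up for illustration, and it assumes its input has the same shape as the list returned by get_alexa_linksin, a list of (href, text) tuples. The sample data is invented.

```python
from collections import Counter
from urllib.parse import urlparse


def count_linking_domains(linksin):
    """Count how many back-links come from each referring host.

    `linksin` is assumed to be a list of (href, text) tuples.
    Entries whose href has no host part (relative URLs) are skipped.
    """
    counts = Counter()
    for href, _text in linksin:
        host = urlparse(href).netloc.lower()
        # Treat www.example.com and example.com as the same site.
        if host.startswith('www.'):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts


if __name__ == '__main__':
    # Invented sample data, for illustration only.
    sample = [
        ('http://www.example.com/post-1', 'Post 1'),
        ('http://example.com/post-2', 'Post 2'),
        ('http://blog.example.org/links', 'Links'),
        ('/relative/path', 'Skipped'),
    ]
    for host, n in count_linking_domains(sample).most_common():
        print(host, n)
```

Sorting with most_common() puts the sites that link most often at the top, which is a rough starting point for deciding which referrers to look at first. It says nothing about the quality of each link, which still has to be checked by hand.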