Category Archives: Internet

This is the project that I have been focused on building for the last few months, and it has grown out of work that I have been doing off and on for several years.  I’m excited that it finally has a home of its own and is nearly ready to release to the public.

I will be finishing up the design over the next month or so, with the plan to make a small announcement in early January and let some early adopters in to start using the system.

You are probably a lot like me and have a couple of idle domain names that you’ve picked up over the years and never really done anything with.  You also have a few websites that are actually pretty decent but not getting the traffic they deserve from the search engines.  That’s why I created the Automatic Blog Machine: to solve these two problems for myself.  I can easily create a blog, hook it into the system, and then forget about it.  It will run for months, building traffic and links, attracting advertisers, and driving visitors to the websites I actually care about.

I’m using this system to build out my network of websites and an ever-growing base of pages that I can then sell through Google DoubleClick for Publishers, sell text link ads on, or use to promote Amazon products, eBay auctions, or any number of other affiliate offers.  By integrating an ad server I can quickly and easily put an ad across the entire network at no cost to me and immediately drive massive amounts of traffic.

One concern is that the sites should not be spammy, so I took great effort to make sure the content the Automatic Blog Machine creates is unique, natural, and readable.  That means auto-translation is not used, because it creates hard-to-read content, and it means there is a deliberate strategy for both internal and external linking.  Getting these things to work correctly was actually pretty complex; it requires me to data-mine a lot of content in order to build each blog post.

I’m pretty proud of the development of this tool and I’m looking forward to letting people in to see how it works.

Check out the Automatic Blog Machine

For my next web application I am using Amazon EC2 cloud infrastructure for hosting.  This is going to be by far the most demanding web application I have ever launched in terms of memory, processing, and disk requirements, so it requires a scalable system to run on.  One of the frustrations I’ve had with shared hosting is the limited memory: on a number of occasions my applications have been killed for going over the memory limits, leaving my sites unreachable for hours before I notice and can restart the processes.

I have been toying with Amazon EC2 for the last few days and it’s pretty fun to spin up different boxes and play around with them for a few pennies.  With the new micro instances going for about $15/month, it’s actually very competitive with the price of a shared hosting service, but with much more flexibility.
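If you’d rather script the spin-up than click through the AWS console, it only takes a few lines.  Here’s a rough sketch using the boto3 library (not something from my own setup; the AMI ID, region, and key pair name are placeholders you’d swap for your own):

#!/usr/bin/env python
# Sketch only: launch a single micro instance with boto3.
# The AMI ID, region, and key pair name below are placeholders.
import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')

instances = ec2.create_instances(
    ImageId='ami-00000000',   # placeholder Ubuntu AMI
    InstanceType='t2.micro',
    KeyName='my-keypair',     # placeholder key pair
    MinCount=1,
    MaxCount=1,
)
print(instances[0].id)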

It’s pretty cool to have a server running in the cloud 24/7.  I can ssh into it and use Vim, git, and Mercurial to pull and edit my code.  I can manage the Ubuntu server the same way I would my desktop; full root access to install packages keeps things simple.  Not having to worry about the peculiarities of shared systems or locked-down web interfaces for installing apps is refreshing.  Having direct access to tweak the Apache and PostgreSQL config files makes things much more familiar to me, and it makes it much easier to keep my development environment similar to my production environment.

As things grow it won’t be hard to split off database servers and add in load balancing, all without the upfront costs of server hardware or the restrictions of shared hosting.

On a recent trip home to my parents’ house I was asked to do what most tech-savvy guys are asked: “Please fix my computer.”  This time around I found my parents’ computer had been infected with something that diverted both Google and Yahoo search results to parked advertising sites.  It was pretty frustrating and made using the internet almost impossible.  Do a search for almost anything and the results would look fine, but the click would go somewhere other than the intended site.  Mom had wanted to toss the computer out the window several times over the last few months of dealing with this problem.

Numerous anti-virus applications had been downloaded and installed in an effort to fix the problem, but nothing was working; whatever had made this happen wasn’t still on the system.  So I checked the Internet Explorer add-ons for anything suspicious.  Nothing.  Then I decided to install Google Chrome, but when I went to the download page everything was in Dutch.  Hmmm.  I know Google does some IP geolocation to determine a default language for their site, so something must be rerouting the traffic through Holland.

Running traceroute on the command line revealed the problem: google.com was resolving to an IP address belonging to a hosting company in the Netherlands.  It seems that some malware had gotten onto the computer and modified the hosts file to redirect search traffic to a proxy service.  It’s a scary thought.  Everything going to and from google.com or yahoo.com was potentially being sniffed and injected with any number of malicious attacks.
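If you want to check for the same thing on your own machine, a couple of lines of Python will show what a hostname actually resolves to locally (this is just a quick sketch of my own, not something from the clean-up; gethostbyname goes through the normal resolver, so a hijacked hosts file shows up here too):

#!/usr/bin/env python
# Quick sanity check: print what a few search domains resolve to locally.
# A hijacked hosts file will show up as an unexpected IP address here.
import socket

for host in ('google.com', 'yahoo.com'):
    print('%s -> %s' % (host, socket.gethostbyname(host)))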

I tried to edit the hosts file, but it was hidden and locked down.  Grr.  A quick search, though, turned up a Microsoft support Fix It script that could revert the hosts file.  Everything was fixed.

But that got me thinking about how much money whoever had this proxy server running was making.  Even just redirecting the search results from every infected computer would easily bring in massive amounts of traffic.  They could have stolen huge numbers of Gmail account passwords and injected their own links into any Google AdSense ads.

If this had been deployed across some of the larger botnets, the number of infected computers could be in the 100,000+ range.  At 1 click per infected computer per day, that’s over 100,000 clicks a day, and with some smart affiliate links it would be making tens of thousands of dollars in profit (minimum) per day.  A stunning amount of money from a simple script and a proxy server.

If the proxy server had been just a little more discreet with the redirections, it might never have been noticed…

Just finished up at Jeff Walker’s PLF3 Live event in Arizona and I have to say that it was fantastic.  This is the first time I have ever attended a live internet marketing event, but it surely won’t be the last.

After three long days of interviews, case studies, guest speakers, and teaching, I think I have enough tips and advice to keep me going for a while.  But the biggest thing I got out of the event was the chance to meet other entrepreneurs anxious to build their own businesses.  I found a number of people I could help, either with technical work or through joint ventures with some of my other projects.  I also got a lot of great feedback about my new business idea and made some amazing contacts, including some heavy hitters in the industry, who are very interested in using it and promoting it when it’s done.

It’s got me really amped up and excited to push my development timeline and get something out and available.  I am really onto something unique in a hugely underserved segment of the market.

Between the breaks I squeezed in the time to plan out the next 6 months of development, roughly schedule 5 launches, define the feature sets for 3 major releases, and detail the programming required before going public.

I’m supposed to be on vacation for the next 10 days…  not sure how much actual vacation I’m going to get now.

Progress on my auto-blogging Django application has been surprisingly swift.  I have added support for a number of different content sources, deployed the application to my server, and scheduled the cron jobs, and it is currently running in test mode for one blog.  I will be expanding quickly to 100 blogs powered by this software by the end of the month.
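The scheduled part is nothing fancy: the cron jobs just call Django management commands.  A skeleton of that pattern looks roughly like this (the app name, command name, and schedule are made up for illustration, not the actual code):

# blogs/management/commands/publish_posts.py  (hypothetical layout and names)
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Build and publish the next scheduled post for each active blog"

    def handle(self, *args, **options):
        # placeholder: pull from the content sources, assemble a post, publish it
        self.stdout.write("published posts for all active blogs")

# crontab entry, e.g. run every 30 minutes:
# */30 * * * * /path/to/env/bin/python /path/to/project/manage.py publish_posts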

It’s not quite ready to demo and show off to anyone; the code and UI are fairly ugly.  I’m actually surprised it works at this early stage of development.  But that is proving to be a great asset, since an early release lets me start testing the application in a production environment and make sure everything works.

I have developed a short business plan around this application to scale it up to 1000 blogs by spring 2011, at which point I hope it will be polished enough to release as a service to other users.  Development will occur in 5 phases, with a launch expected in Q2 2011:

  1. Develop the basic system and scale up to 100 test blogs by Nov 1st
  2. Test various monetization and linking strategies until the end of the year
  3. Develop additional automation tools and scale the test to 1000 sites by Feb 2011
  4. Bring on some beta testers in Q1 2011
  5. Launch to the public in Q2 2011

At some point, when the complexity grows, I may try to bring on another Python/Django developer to help offload some of the work and perhaps speed up development.

In the short term, however, I’m going to be focused on creating new blogs and configuring them in the application.  If every blog takes 10 minutes to set up, then even at 5 per night it will take almost 3 weeks to reach my 100-blog goal.  Unfortunately it is tedious work that needs to be done.  Future versions of the software will try to make this whole process easy to outsource.

I am still extremely excited to see how well this performs in the real world.

After seeing the successful launch of the Autoblog Samurai product come through my email box over the last week, I thought it might be time to dig up the scripts I wrote several years ago to attempt the same thing.  Over the past few years I have run a couple of autoblogs but never turned the concept into something that was really profitable or very easy to use (even though the blogs I did run were making a small profit).

But after seeing the amount of excitement that Autoblog Samurai has been able to create around their software I’m intrigued enough to give it a second shot.

So I have started to revamp my existing hodgepodge of scripts into a proper web-based application.  It will be a Django-based web application that allows users to configure many blogs and pipe many content sources into each one.  Wrapped around everything will be a number of specific monetization tools, cross-promotion tools, and hopefully some analytics built right in.
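To give a rough idea of what “many blogs, many content sources” looks like in Django terms, the core of the data model is just two related models.  This is an illustrative sketch with made-up field names, not the real schema:

# models.py -- illustrative sketch only, not the actual schema
from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=100)
    publish_url = models.URLField()            # where the generated posts get pushed
    active = models.BooleanField(default=True)

class ContentSource(models.Model):
    blog = models.ForeignKey(Blog, related_name='sources', on_delete=models.CASCADE)
    feed_url = models.URLField()               # RSS feed, API endpoint, etc.
    last_checked = models.DateTimeField(null=True, blank=True)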

The one big perk of having it web based is that it never sleeps, unlike your home computer, which may not always be turned on and have the software running.  That bit of out-of-sight, out-of-mind might actually mean that users forget they’re running a bunch of autoblogs until they get a check in the mail from Google AdSense.

I’m not sure yet if I’ll make the software public or if I’ll keep it for myself.

After an hour of work on it last night I actually got a functional prototype running.  There are a couple of things that need to be cleaned up, and features to add to make it competitive, but most of the work will be designing and polishing a nice user interface.

I would like to be able to create a system that can scale to 10,000+ blogs and publish content to all of them as often as every minute.  I think that would be an interesting experiment in internet marketing.

Connecting to a Google Gmail account is easy with Python using the built-in imaplib library.  It’s possible to download, read, mark, and delete messages in your Gmail account by scripting it.

Here’s a very simple script that prints out the latest email received:

#!/usr/bin/env python

import imaplib

# Connect to Gmail's IMAP server over SSL and log in
M = imaplib.IMAP4_SSL('imap.gmail.com', 993)
M.login('myemailaddress@gmail.com', 'password')

# select() returns the number of messages in the mailbox
status, count = M.select('Inbox')

# The most recent message has the highest message number, so fetch it
# and ask for just the body text (the string is a raw IMAP fetch spec)
status, data = M.fetch(count[0], '(UID BODY[TEXT])')

print data[0][1]

M.close()
M.logout()

As you can see, not a lot of code is required to log in and check an email.  However, imaplib provides just a very thin layer over the IMAP protocol, and you’ll have to refer to the documentation on how IMAP works and the commands available in order to really use it.  As you can see in the fetch command, the “(UID BODY[TEXT])” bit is a raw IMAP instruction.  In this case I’m calling fetch with the size of the Inbox folder, because the most recent email is listed last (the message number of the most recent message is the count), and telling it to return the body text of the email.  There are many more complex ways to navigate an IMAP inbox.  I recommend playing with it in the interpreter and connecting directly to the server with telnet to understand exactly what is happening.
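As a slightly richer example of those raw IMAP strings (my addition, not part of the original script), here’s the same kind of connection using UID-based commands and an IMAP SEARCH to pull the headers of unread messages:

#!/usr/bin/env python

import imaplib

M = imaplib.IMAP4_SSL('imap.gmail.com', 993)
M.login('myemailaddress@gmail.com', 'password')
M.select('Inbox')

# The search criteria string is raw IMAP, just like the fetch spec above
status, data = M.uid('search', None, '(UNSEEN)')
for uid in data[0].split():
    status, msg = M.uid('fetch', uid, '(BODY[HEADER.FIELDS (SUBJECT FROM)])')
    print msg[0][1]

M.close()
M.logout()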

Here’s a good resource for quickly getting up to speed with IMAP: Accessing IMAP email accounts using telnet

As much as I’ve found basic web scraping to be really simple with urllib and BeautifulSoup, it leaves some things to be desired.  The BeautifulSoup project has languished, and recent versions have switched to an HTML parser that is less able to cope with the poorly encoded pages found on real websites.

Scrapy is a full-on framework for scraping websites, and it offers many features, including a stand-alone command-line interface and a daemon tool, that make scraping websites much more systematic and organized.

I have yet to build any substantial scraping scripts based on Scrapy, but judging from the snippets I’ve read at http://snippets.scrapy.org, the documentation at http://doc.scrapy.org, and the project blog at http://blog.scrapy.org, it seems like a solid project with a good future and a lot of really great features that will make my scripts more automatable and standardized.
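To give a sense of the shape of a Scrapy project, here’s about the smallest useful spider I can sketch (using Scrapy’s spider API; the spider name and start URL are placeholders, and this isn’t code from my own scripts):

# link_spider.py -- minimal sketch; spider name and start URL are placeholders
import scrapy

class LinkSpider(scrapy.Spider):
    name = 'links'
    start_urls = ['http://example.com/']

    def parse(self, response):
        # yield one item per outgoing link on the page
        for href in response.css('a::attr(href)').getall():
            yield {'url': response.urljoin(href)}

Run it with the command-line tool (scrapy runspider link_spider.py -o links.json) and Scrapy handles the request scheduling, throttling settings, and output serialization for you.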

I got an email the other day from Frank Kern, who was pimping another make-money-online product from his cousin Trey.  The Number Effect is a DVD containing the results of an experiment where he created an affiliate link for every one of the 12,000 products for sale on ClickBank, sent paid (PPV) traffic to all of those links, and found which ones were profitable.  He found 54 niches with profitable campaigns out of the 12,000.

Trey went on to talk about the software he had written for this experiment.  It apparently took his outsourced programmer a fair bit of work to get it going.

I thought it would be fun to try and implement the same script myself. It took about 1 hour to program the whole thing.

So if you want to create your own ClickBank affiliate link for every ClickBank product for sale, here’s a script that will do it.  Keep in mind that I never did any work to make this thing fast, and it takes about 8 hours to scrape all 13,000 products, create the affiliate links, and resolve the URLs for where each one goes.  Sure I could make it faster, but I’m lazy.

Here’s the python script to do it:

#!/usr/bin/env python
# encoding: utf-8
"""
ClickBankMarketScrape.py

Created by Matt Warren on 2010-09-07.
Copyright (c) 2010 HalOtis.com. All rights reserved.

"""

import re
import urllib2

from BeautifulSoup import BeautifulSoup

CLICKBANK_URL = 'http://www.clickbank.com'
MARKETPLACE_URL = CLICKBANK_URL + '/marketplace.htm'
AFF_LINK_FORM = CLICKBANK_URL + '/info/jmap.htm'

AFFILIATE = 'mfwarren'

product_links = []
product_codes = []
pages_to_scrape = []


def get_category_urls():
    # Scrape the marketplace page and collect the parent category links
    request = urllib2.Request(MARKETPLACE_URL, None)
    urlfile = urllib2.urlopen(request)
    page = urlfile.read()
    urlfile.close()

    soup = BeautifulSoup(page)
    parentCatLinks = [x['href'] for x in soup.findAll('a', {'class': 'parentCatLink'})]
    return parentCatLinks


def get_products():
    # Walk every page in pages_to_scrape, build a hoplink for each product,
    # resolve where it redirects to, and log the results to a CSV file
    fout = open('ClickBankLinks.csv', 'w')

    while len(pages_to_scrape) > 0:

        url = pages_to_scrape.pop()
        request = urllib2.Request(url, None)
        urlfile = urllib2.urlopen(request)
        page = urlfile.read()
        urlfile.close()

        soup = BeautifulSoup(page)

        results = [x.find('a') for x in soup.findAll('tr', {'class': 'result'})]

        # Queue up the next page of results, if there is one
        nextLink = soup.find('a', title='Next page')
        if nextLink:
            pages_to_scrape.append(nextLink['href'])

        for product in results:
            try:
                product_code = str(product).split('.')[1]
                product_codes.append(product_code)
                # The anchor markup looks like <a href="...">Title</a>; grab the title
                m = re.search('^<(.*)>(.*)<', str(product))
                title = m.group(2)
                my_link = get_hoplink(product_code)
                request = urllib2.Request(my_link)
                urlfile = urllib2.urlopen(request)
                display_url = urlfile.url
                #page = urlfile.read()  # continue here if you want to scrape keywords etc from the landing page

                print my_link, display_url
                product_links.append({'code': product_code, 'aff_link': my_link, 'dest_url': display_url})
                fout.write(product_code + ', ' + my_link + ', ' + display_url + '\n')
                fout.flush()
            except Exception:
                continue  # handle cases where the destination url is offline

    fout.close()


def get_hoplink(vendor):
    # Submit the ClickBank hoplink form and pull the generated affiliate link
    # out of the response page
    request = urllib2.Request(AFF_LINK_FORM + '?affiliate=' + AFFILIATE + '&promocode=&submit=Create&vendor=' + vendor + '&results=', None)
    urlfile = urllib2.urlopen(request)
    page = urlfile.read()
    urlfile.close()
    soup = BeautifulSoup(page)
    link = soup.findAll('input', {'class': 'special'})[0]['value']
    return link


if __name__ == '__main__':
    urls = get_category_urls()
    for url in urls:
        pages_to_scrape.append(CLICKBANK_URL + url)
    get_products()
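If you did want to speed it up, most of those 8 hours go into resolving each hoplink one at a time, and that part is easy to parallelize with a thread pool.  A rough sketch of just that step (my addition, not part of the script above):

# Rough sketch: resolve hoplinks concurrently instead of one at a time.
from multiprocessing.dummy import Pool  # thread-backed Pool, fine for I/O-bound work
import urllib2

def resolve(link):
    try:
        return link, urllib2.urlopen(link).url  # follow redirects to the landing page
    except Exception:
        return link, None                       # destination offline, etc.

hoplinks = []  # fill with the affiliate links built by get_hoplink()
pool = Pool(10)                                 # 10 requests in flight at a time
results = pool.map(resolve, hoplinks)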

Django Dash is a 48-hour competition where teams from all over the world build a Django-based web application.  You can see the finished web applications at http://djangodash.com/.  All the projects were required to be open source and committed to either github.com or bitbucket.org.  The cool thing about that is you can see how all these web applications were built from the ground up and get a feel for how to build really compelling Django apps.

The results have now been posted.

I’m sure there are lots of little tricks for quick development buried in those repos.  I know I’ll be digging through them to get some inspiration for future projects.