Author Archives: Matt Warren

I have noticed something very different about how I develop my own projects compared to the ones where I'm working for someone else.  I think it's a pattern many people follow: how we tackle hard intellectual problems in a work setting is shaped by the expectations of others.

When working on my own projects I often find myself sitting in a comfortable chair to think.  I close my eyes and ponder various solutions, wrap my head around implementation details, and work out the complexities and issues that might arise during implementation.  All that thought usually results in some a-ha moments and, hopefully, less re-work.

On the other hand, when working with a team or for someone else, I'm anxious to show progress for every hour of effort.  This means more thinking while doing, and occasionally finding myself in a spot where I need to stop and refactor, or re-implement something differently.

It occurred to me that refactoring code in your head is very quick, so the iterations on improving the design of any solution are also quick.  It's very easy to just toss out an entire train of thought and start somewhere else, but tossing out hundreds of lines of code can feel like a loss.  Productivity could improve by orders of magnitude if people spent more time quietly contemplating their code before writing it.

The other thing I find is that when I really know what I'm going to write, having thought about it deeply, the coding part is just a matter of typing my thoughts out into the computer.  It is then that typing speed can actually become the limiting factor in how fast you can code, rather than splitting time between thinking and typing.

It seems that as I've gained experience, the effort of thinking about how to implement something has given way to instinct and habit, which is a good thing.  But some problems demand a thoughtful response, and for those, maybe the best place to accomplish great work is from your comfortable La-Z-Boy.


We all only have so much time and attention, and with work, friends, and family constantly pulling us in various directions, we can only give our own causes and ambitions a fraction of ourselves.

Opportunity cost is what we lose out on by taking one path over another.  In a way, opportunity cost is infinite.  We could at any time take any of thousands of possible actions which could, in aggregate, lead to very different futures.

In the present, though, opportunity cost is speculative.  We can't know for certain how much better or worse one action will turn out than another.  Still, we usually have enough information to make informed choices.

Making the optimal choice is non-trivial.  And so we apply tools to help us make smart decisions.  We go to school to understand how things work, study history to learn how things have been done in the past, and use frameworks to help guide us to use best practices.

Working against these best practices are countless other factors: emotional impulses, reactions to external events, others asking things of you, habits, and constraints.  They act like friction, preventing us from making better decisions.  Some of these we can easily recognize and counteract, while others are very difficult to ignore or fix.


I'm going deeper in my learning about how to successfully implement machine learning algorithms this year, starting with a survey of all the resources out there for learning this stuff.

It is a fast-moving field, and as such newer techniques and tools won't be covered in older books or tutorials.

MOOCs are now a great way to get up to speed on the Deep Learning approaches to Machine Learning.  And while there are some good quality general books out there about ML, most are currently in pre-order.

The most appealing to me right now is the course on Udacity, presented by Google, which uses TensorFlow in iPython notebooks to teach how to build and apply ML.  The best thing is that it:

  1. is in Python
  2. uses the latest ML library, TensorFlow (developed at Google)
  3. is free

As with all learning, the best way to learn is by doing it yourself and practicing enough to make it stick.

This is not the first resource I've used to learn about topics in Machine Learning and it won't be the last.  Taking multiple courses, reading multiple books, and tackling multiple problems on your own is the best way to ensure you have no gaps and a well-rounded, deep understanding of the concepts.

Actually mastering a new skill is hard and there are no shortcuts.  Accept that and jump into the challenge.

This year seems to be a big year for AI development. Deep Learning approaches are going to be applied to more areas and I expect most of the big name tech companies will continue to expand their research in the area.

The encouraging thing for the rest of us developers is going to be the opening up of core technologies.  The algorithms themselves are not significantly complicated, and the true value comes from the data used to train these models.  So there is some incentive for companies like Google to open-source their AI tooling.  It gives more developers the chance to push the boundaries of AI techniques, while the companies themselves maintain ownership of the critical training data used to get the best results from these models.

What that means is that this year there will be more than a few new start-ups trying to turn these AIs into web services, or sell trained libraries as tools you can use in your own code.

Take for instance, something like sentiment analysis.  There are already quite a few APIs you can easily tie into to get this sort of analysis added to your own projects.

This year I expect this will expand into a large variety of areas.

Spell checking is ripe for disruption.  For too long, spell checking has relied on simple dictionary lookups and Levenshtein distance to guess at correct spellings.  These are relatively crude compared to an approach that understands the context within a sentence and can offer much more probable corrections.
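For comparison, here is a minimal sketch of the classic approach described above: a dictionary lookup ranked by Levenshtein edit distance, with no notion of sentence context.  The tiny dictionary is made up purely for illustration.

```python
def levenshtein(a, b):
    # Classic dynamic programming: prev[j] holds the edit distance
    # between a prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def suggest(word, dictionary):
    # Pick the dictionary word with the smallest edit distance.
    return min(dictionary, key=lambda w: levenshtein(word, w))

print(suggest("helo", ["hello", "help", "halo"]))  # "hello"
```

Notice the weakness: all three candidates are one edit away, so the suggestion is just whichever comes first, with no awareness of what the surrounding sentence is about.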

Google has open-sourced TensorFlow, and it has already gotten some significant attention from the developer community.  As more developers learn how to use these tools this year, you’ll see a lot of very interesting developments.

One of my goals for the year is to get deeper into learning the new generation of AI algorithms and practice getting good at applying them to real problems.  AI has been one of those areas that always fascinated me, and then I took the AI course at university and learned that it just wasn't as difficult, or as interesting, once the covers had been lifted on the mystique of it.

There are many approaches to algorithms that can be classified as AI.  If you consider that AI is the ability of a program to be given a dataset and then answer questions outside that dataset, then something as simple as a linear regression counts as AI.

#!/usr/bin/env python3
import random

def linear_regression(x, y):
    length = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    # Σx**2 and Σxy
    sum_x_squared = sum(a * a for a in x)
    sum_of_products = sum(x[i] * y[i] for i in range(length))
    a = (sum_of_products - (sum_x * sum_y) / length) / (sum_x_squared - (sum_x ** 2) / length)
    b = (sum_y - a * sum_x) / length
    return a, b  # y = ax + b

if __name__ == '__main__':
    simple_data = [[0, 10], [0, 10]]  # slope=1, intercept=0
    print(linear_regression(*simple_data))

    # slope should be ~0, intercept near the triangular distribution's mean (~63)
    random_data = [list(range(1000)), [random.triangular(20, 99, 70) for i in range(1000)]]
    print(linear_regression(*random_data))

In a real-world example this would be expanded into an N-dimensional regression where each dimension is an attribute.  As the data gets bigger and bigger, regressions need more advanced techniques to compute things efficiently.  But ultimately it never feels like you're doing something emergent; you're just doing math.

Decision trees are another popular form of AI algorithm.  In the most basic form this is just a binary tree of questions.  To answer a question like “do I have cancer?” you start at the top of the tree and answer yes/no questions at each node until you reach a leaf, which should provide the answer.  Again, these get more advanced as they are applied to more difficult use cases, but they never really get to the point where they feel like an intelligence.
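A toy version of such a tree takes only a few lines of Python.  The questions and labels here are made up purely for illustration; real systems learn the tree structure from data.

```python
def classify(node, answers):
    # Walk from the root to a leaf, following yes/no branches.
    # Internal nodes are dicts; leaves are plain answer strings.
    while isinstance(node, dict):
        node = node["yes"] if answers[node["question"]] else node["no"]
    return node

# Hypothetical diagnostic tree, hand-written for the example.
tree = {
    "question": "abnormal scan?",
    "yes": {
        "question": "biopsy positive?",
        "yes": "high risk",
        "no": "follow up",
    },
    "no": "low risk",
}

print(classify(tree, {"abnormal scan?": True, "biopsy positive?": False}))  # follow up
```

It really is just a chain of if/else decisions, which is exactly why it never feels like an intelligence.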

Neural networks and the new research in deep learning approaches are by far the most interesting, and yet they are also still nowhere near a state of general intelligence.  A neuron in a neural network is a simple program that takes input, modifies it, sends the result as output, and accepts feedback to reinforce positive modifications.  These neurons are then connected into vast networks, usually in layers.
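A minimal sketch of one such neuron is the classic perceptron, where the feedback is a simple rule that nudges the weights whenever the output is wrong.  This is not how modern deep networks are trained (they use backpropagation across many layers), but the core unit looks like this:

```python
def neuron(weights, bias, inputs):
    # Weighted sum of inputs, squashed to a 0/1 step output.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron(weights, bias, inputs)
            # The "feedback": adjust weights toward correct answers.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Learn logical AND, a linearly separable toy problem.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
print([neuron(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

A single neuron like this can only learn linearly separable functions; the power comes from wiring many of them together into layered networks.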

The breakthrough in deep learning is that we can provide reinforcement at different layers in the network for successively more specific things and get better results.  Applied to a data set, these can do remarkably well at things like identifying faces in a photo.

There is a bit of artistry required to apply these to a real-world problem.  Given data and a question to answer from it: which type of algorithm do you use, how does the data need to be cleaned up or refactored, and how will you train and verify your AI algorithm afterwards?  There are enough options that just choosing a path is often the most difficult task.

The whole AI space is still in its infancy and really needs a genius to come in and shake everything up.  All the current approaches are narrow in scope, and a breakthrough is required to find a path that will lead to a strong general AI.

Luck and success come to those who persist.  And for the last couple of months I've been planning and doing some market research on a way to pivot halotis into a new business.

As part of this transition I’ll be re-doing this website and you’ll see some new kinds of posts published here.

I’m hopeful that you’ll come along for the journey into a new space and watch as this business is re-born.

If you're a frequent reader of this blog you might notice that everything is now going through https.

Thanks to the awesome people at Let's Encrypt I've managed to get free SSL certs for many of my websites.  If it's free, why not encrypt all the things?

Nobody uses telnet anymore when ssh is far more ubiquitous and secure.  Yet unencrypted http is still vastly more popular than the secure https version.  Why?  Mostly because of the cost and complexity of setting up https.

Let's Encrypt is a pretty neat tool.  You run it on the server directly and it will validate the site automatically, generate the keys, install them into Apache or nginx, and modify the configurations to enable https for those sites.  It's free and takes just a few seconds to generate and install the certificates.

Now that it's easy and free to set up SSL certificates and enable HTTPS, I'm hopeful that it will force other SSL providers to compete, and that encrypted web traffic will become the norm instead of the exception.

In a one-night spark of interest I thought I'd write a web framework that deploys to Amazon Lambda and API Gateway.

A web framework that deploys to a serverless environment is an interesting idea.  Costs scale directly with usage, and in some cases the base cost drops all the way to $0.

Now that Amazon Lambda supports Python I thought it was worth trying to make something that allows you to manage a larger project in a way that is familiar to most web application developers.

To start with, I based the framework on the awesome Falcon web framework because it is 1. high performance, 2. API focused, and 3. very easy to understand.

What makes this framework unique is that deployment needs to be baked in.  The framework needs to understand how the APIs are built in order to chop them up into stand-alone functions and upload them to Amazon.
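As a rough sketch of the idea (the class name, helper, and event shapes below are hypothetical, not the framework's actual API), each resource method could be wrapped as its own Lambda-style handler and deployed independently:

```python
import json

class HelloResource:
    # One API resource; each on_* method maps to an HTTP verb,
    # loosely following Falcon's resource convention.
    def on_get(self, params):
        return {"message": "hello %s" % params.get("name", "world")}

def make_handler(resource, method):
    # Wrap one resource method as a Lambda-style handler(event, context),
    # returning the dict shape API Gateway proxy integrations expect.
    def handler(event, context):
        body = getattr(resource, method)(event.get("queryStringParameters") or {})
        return {"statusCode": 200, "body": json.dumps(body)}
    return handler

# Deployment would upload each of these as its own Lambda function,
# wired to an API Gateway route.
handler = make_handler(HelloResource(), "on_get")
print(handler({"queryStringParameters": {"name": "Matt"}}, None))
```

The key design point is that the routing table lives in the framework, so the deploy step can enumerate every resource method and ship each one as a separate function.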

The net result is an easy to use framework for developing APIs that can deploy directly to a scalable platform.

It's still in the early alpha stages, but I'm hopeful that this or something like it will catch on in the future.  For those interested, you can follow the development on GitHub.

I got this idea from the folks at Yelp, and from how they manage their deployments and several internal infrastructure tools.

The basic concept is this: if you had a source of information about all your projects in a simple, easy-to-work-with format, what kind of tooling would be easy to build on top of it?

Yelp does this with a git repository full of YAML files.  Coupled with a handful of commit hooks, cron jobs, and rsync, they are able to provide local access to this information to any script that wants it.

If you had the ability to get information about all your projects, what kinds of things would you want to know:

  • where is the git repository
  • where are the servers hosted
  • who should be notified if there are problems
  • how is it deployed
  • how can it be monitored
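A project file in such a repository might look something like this (all names, hosts, and fields here are hypothetical, invented for illustration):

```yaml
# projects/billing-api.yaml (hypothetical example)
name: billing-api
git: git@github.com:example/billing-api.git
servers:
  - billing-01.example.com
  - billing-02.example.com
notify:
  - dev-team@example.com
deploy:
  method: elastic-beanstalk
monitor:
  health_check: https://billing.example.com/status
```

Any script can then glob the directory, parse each file, and act on the answers to the questions above.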

With this information what kind of easy to write scripts could be developed?

  • connect to all servers and check for security patches
  • check all git repositories for outdated requirements
  • validate status of all services and notify developers of problems
  • build analytics of activity across all projects
  • be the source of information for a business dashboard

What's also interesting about this approach is that it easily handles new strategies.  If you're deploying to Heroku, Elastic Beanstalk, raw EC2, Digital Ocean, or through new deployment services, it doesn't matter.  Create a new document with the information needed for the new method and write the scripts that know how to use it.

By not using a web service or database, you gain the simplicity of just reading local files.  This low bar makes it trivial to implement new ideas.

A meta-project, a project with information about other projects, is an intriguing and powerful idea.

A recent project I have been working on involved a custom-built Linux distro running on an ARMv6 piece of hardware.  We figured we were fairly immune to getting hacked, given the obscure old hardware and pared-down Linux distro.

Unfortunately, early in development, for ease of working on things, we chose a guessable root password.  By the time (months later) that we wanted to plug our device into the internet for testing, we'd long since forgotten the state we had left the root account in.

It took just 1 week of being connected to the internet for the device to be hacked and malware installed.

An investigation uncovered just how unsophisticated an attack was required to gain access.

So a lesson was learned by everyone on the team: basic security precautions, such as using a strong root password, should be taken from the start, not put off.