Netflix has been in the news quite a bit lately. Regardless of the side you pick on this first world problem, there is something really neat that they do that I wanted to share with a larger audience. If you read Harvard Business Review, you already know what I am talking about.

Andrew McAfee published an article entitled “What Every CEO Needs to Know About the Cloud.” In this basic primer for business folks, McAfee describes something that Netflix created called the Chaos Monkey, a process largely credited with preparing the company to weather the Amazon EC2 outage with minimal issues of its own while others, like Foursquare, experienced problems for days. McAfee brings it up in the section of the article dealing with the reliability issues that can come with cloud services. According to McAfee:

[Chaos Monkey’s] job is to automatically and randomly shut down major parts of the company’s technology environment. Because Netflix learned to handle its own Chaos Monkey, it was prepared to deal with the breakdown caused by Amazon.

When I read that, I was shocked. I don’t know how every company runs its IT systems, but I am certainly not familiar with any other company that intentionally disrupts its own systems to keep employees ready for anything. If the article is accurate, the Chaos Monkey could be revenue-impacting for Netflix.

The rest of the world has much to learn from Netflix (including all that other stuff putting them in the news), and I think we should all start asking, “Where is our information security Chaos Monkey?” I suppose we have enough chaos in our environments today that we might think someone created a version of the Chaos Monkey for us.

We need more semi-controlled security events to keep our employees fresh and ready for the uncontrolled ones coming from the outside. Our version of the Chaos Monkey could do things like:

  • Interrupt backup routines
  • Phish employees
  • Spoof caller ID and place “trusted calls from IT” to unsuspecting users
  • Redirect requests for common sites to look-alikes to see if employees are fooled
  • Pop up bad certificate errors
  • Offer new software packages as “security patches”
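
For the curious, here is a minimal sketch of what the "random" part of a security Chaos Monkey might look like. It only picks a team and a drill from the list above and prints the result; it does not touch any real system. The names DRILLS and pick_drill, and the example team names, are hypothetical and are not taken from Netflix's tool or from McAfee's article.

```python
import random

# Hypothetical drill names based on the list above.
# Nothing here runs a drill; it only chooses one.
DRILLS = [
    "interrupt backup routines",
    "phish employees",
    "place spoofed 'trusted calls from IT'",
    "redirect common sites to look-alikes",
    "pop up bad certificate errors",
    "offer new software packages as 'security patches'",
]


def pick_drill(teams, rng=None):
    """Return a random (team, drill) pair for the next exercise."""
    if rng is None:
        rng = random.Random()
    return rng.choice(teams), rng.choice(DRILLS)


if __name__ == "__main__":
    # Example team names are placeholders.
    team, drill = pick_drill(["finance", "engineering", "support"])
    print(f"Next drill for {team}: {drill}")
```

Passing in your own random.Random instance makes the selection repeatable, which could be handy if you ever need to explain after the fact why a particular team got picked.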

What features would you add to the Chaos Monkey? (Check out Netflix’s post on their Chaos Monkey here!)

