The Hosting site Rule of 3
Many people have heard of the SaaS Rule of 40. But what about the Hosting site Rule of 3? I don’t know if that’s really the name, because I just made it up. I talk with a lot of enterprise executives about their companies, especially private equity portfolio companies. This means they are often working with software that has been successful for a long time, has achieved product-market fit, and was almost certainly designed before the explosion of cloud computing. I often state, to their surprise, that of course:
It’s cheaper to run in 3 data centers than in 2.
That’s impossible, right? In what universe is 3 less than 2?
Most of those applications were developed at a time when the way to handle failures in an environment was to have a “disaster recovery” site. This means that the primary site would handle all the traffic and if there was ever an adverse event, the site would fail over to the secondary. There were and still are a number of problems with this scenario:
- We need to purchase 200% of capacity and the secondary site is sitting idle doing absolutely nothing. It needs to be fully built out to handle the full website traffic, but often just sits for months or years. An incredible waste of capital.
- Some sites never bother to do the full build out and spend all that capital (understandably). They hope they’ll get the primary site back up before things get “too bad”.
- Many sites never get around to testing the failover or fail to simulate the correct conditions. When it comes time to use their secondary site, they sometimes encounter a strange loop (the secondary site requires the primary site to be active to perform the failover because that had been the situation when they’d tested it!). This actually happened at Salesforce.
By contrast:
- If we run active-active-active in 3 locations, we only need to purchase 150% of the site capacity: each location is sized for 50% of peak traffic, so any two can carry the full load if the third is lost. All resources are always used. Keep this in mind the next time someone tells you how expensive the cloud is.
- We even have a measure of headroom to absorb burst traffic when it arrives, which gives us advance warning to deploy additional resources (manually or automatically) if necessary.
- We can lose an entire data center and it’s not a panic event. The site keeps working without strain and we can either spin up more resources in the existing sites, or work on bringing back the 3rd site.
- The networking is vastly simplified. In AWS, if you deploy across 3 availability zones, the load balancers are deployed that way as well.
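The arithmetic behind the 200% and 150% figures can be sketched in a few lines. This is a minimal illustration, assuming identical sites, evenly balanced traffic, and that we want to survive the loss of exactly one site; the function name `required_capacity` and its parameters are mine, not from any library.

```python
def required_capacity(sites: int, tolerated_failures: int = 1) -> float:
    """Total capacity to buy, as a multiple of peak load.

    Assumption: sites are identical and the surviving sites must be able
    to carry 100% of peak load on their own after the tolerated failures.
    """
    survivors = sites - tolerated_failures
    if survivors < 1:
        raise ValueError("need at least one surviving site")
    per_site = 1.0 / survivors   # each site's share of peak load
    return per_site * sites      # total purchased across all sites


for n in (2, 3, 4):
    print(f"{n} sites: buy {required_capacity(n):.0%} of peak capacity")
# 2 sites: buy 200% of peak capacity
# 3 sites: buy 150% of peak capacity
# 4 sites: buy 133% of peak capacity
```

Under these assumptions the total falls as sites are added, because each site only has to cover its share of what the survivors must carry.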
Of course, it’s possible to deploy the two sites as active-active, even if it’s rarely done. But each site must still be able to carry the full load alone, so we still purchase 200% of capacity, an extra 50% of site capacity compared with three sites. For architectures that are making the transition, I generally work with the engineering teams on moving session data from the hosts to a cache and the local data files to an object store. This way, the cloud provider handles many of the high-availability (HA) features, leaving the application architecture concerns to the portfolio company engineering team.
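To make the session move concrete, here is a rough sketch of what "session data in a cache instead of on the host" looks like from the application's side. Everything here is hypothetical: `InMemorySharedStore` is a stand-in for whatever managed cache you actually use, and `AppHost` is an illustrative handler, not a real framework class.

```python
from typing import Dict, List, Optional


class InMemorySharedStore:
    """Stand-in for a shared, managed cache (hypothetical, not a real client)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        data = self._data.get(session_id)
        return dict(data) if data is not None else None

    def put(self, session_id: str, data: dict) -> None:
        self._data[session_id] = dict(data)


class AppHost:
    """An application host that keeps no session state of its own."""

    def __init__(self, name: str, sessions: InMemorySharedStore) -> None:
        self.name = name
        self.sessions = sessions

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        # Load the session from the shared store, never from local memory.
        session = self.sessions.get(session_id) or {"cart": []}
        session["cart"] = session["cart"] + [item]
        self.sessions.put(session_id, session)
        return session["cart"]


store = InMemorySharedStore()
site_a = AppHost("site-a", store)
site_b = AppHost("site-b", store)

site_a.add_to_cart("session-1", "book")
cart = site_b.add_to_cart("session-1", "pen")  # a different host serves the next request
print(cart)  # ['book', 'pen']
```

Because neither host holds the session, a request can land in any location, which is what lets us lose a site without losing users' state.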
I remember when I learned this long ago, I actually felt foolish that I hadn’t reasoned it out myself. How are your sites deployed?