As I work through my day job as the guy in charge of Cloud Services ecosystem development for Microsoft, I have the distinct pleasure of working on some very hard problems. During my travels, I also get to spend time thinking about topics on which not much brain matter has been spent. Over the last couple of weeks, I have been thinking about a unique circumstance that could present itself to developers as development continues to migrate to cloud platforms. Dare Obasanjo has written about some of the attendant challenges with a SaaS platform, but I wanted to throw out a few new ideas.
In times past, ISVs would develop their software and ship it off to their customers when there was a sale. Way, way back, diskettes were sent, which evolved to CDs, then to DVDs, and eventually net-based delivery. For the really complicated stuff, a bona fide “systems engineer” would show up with the servers under his arm. The beauty of this model, from the standpoint of running a business, was that you knew exactly how much profit you were making on a customer. Sales guys are kept on a loose rein, for the most part, so that they can get the deal done, and their lever is product price. CFOs can then plug all the data into their spreadsheets and derive the customer profitability from that.
You see, in that model it doesn’t generally matter if you write bad code. The first step in a customer service call for any enterprise software package is to poke the box. No matter what the issue is, the first step is to reset the box. How does that construct carry over to a cloud-based world? With shared hosting, the damage any single tenant can do is limited to the other deployments on that shared set of servers. With a true cloud – a fabric of machines shared across all deployments – there is no proverbial machine to poke. Mercifully, the cloud architects have thought this through, and are building self-healing into the infrastructure.
What this example does not envisage is the profitability impact of bad code. When code goes awry on a customer site, only that customer is impacted. Profitability never enters into the equation. Service support calls, if they are not charged for, are at least baked into the standard service and support contract. Further, a machine lockup cannot spread in this environment. In a cloud, thanks to the magic of elasticity, errant code running without proper controls in place can actually spread like a virus, causing machine images to spin up, which in turn drains the profitability of that customer account.
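To make “proper controls” a little more concrete, here is a minimal sketch of one such control: a guard that caps how many new instances a scale-up request may launch, based on an hourly spending budget. Everything in it (the function name, the budget, the hourly rate) is a hypothetical illustration, not any particular cloud platform's API.

```python
def allowed_new_instances(requested: int, running: int,
                          hourly_budget: float,
                          cost_per_instance_hour: float) -> int:
    """Return how many of the requested instances may actually be launched.

    All parameters are hypothetical: the budget and the hourly rate are
    whatever the vendor decides an account is allowed to burn per hour.
    """
    # The most instances the budget can pay for in one hour.
    max_instances = int(hourly_budget // cost_per_instance_hour)
    # Room left under that ceiling, never negative.
    headroom = max(0, max_instances - running)
    # Grant the request only up to the remaining headroom.
    return min(max(0, requested), headroom)


# Errant code asks for 50 more machines while 18 are already running.
# A budget of 10.00 per hour at 0.50 per instance-hour allows 20 in total.
print(allowed_new_instances(50, 18, hourly_budget=10.0,
                            cost_per_instance_hour=0.5))  # 2
```

The point of the sketch is only that the ceiling is expressed in money rather than in machines, so a runaway loop hits a financial limit before it hits the P&L.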
Consumptive pricing for end customers hasn’t yet hit the mainstream. Most pricing (if you are charging at all) is based on price per seat per month or year. If you are hosting your web app with a hoster, your profitability is locked in when you make the sale. There is some variability associated with network throughput charges, but for the most part, you’re locked in. In the cloud world, where all of your resources are billed out on a consumptive basis, profitability is now a function of variable cost inputs. These inputs are impacted by how well your code performs.
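A rough sketch of that arithmetic, assuming revenue is fixed per seat while cost varies with consumed instance-hours. The `Account` type and every figure below are made up purely for illustration; real bills have many more line items.

```python
from dataclasses import dataclass


@dataclass
class Account:
    seats: int
    price_per_seat: float          # monthly revenue per seat (hypothetical)
    instance_hours: float          # compute consumed this month
    cost_per_instance_hour: float  # hypothetical cloud rate


def monthly_margin(account: Account) -> float:
    """Fixed per-seat revenue minus consumption-driven cost."""
    revenue = account.seats * account.price_per_seat
    cost = account.instance_hours * account.cost_per_instance_hour
    return revenue - cost


# Same customer, same price, same seat count. Only code efficiency differs.
efficient = Account(seats=100, price_per_seat=20.0,
                    instance_hours=1000, cost_per_instance_hour=0.5)
wasteful = Account(seats=100, price_per_seat=20.0,
                   instance_hours=6000, cost_per_instance_hour=0.5)

print(monthly_margin(efficient))  # 1500.0
print(monthly_margin(wasteful))   # -1000.0
```

Nothing about the sale changed between the two accounts; the second one simply burns six times the compute to serve the same seats, and the margin goes negative.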
Amazon loves to hold out Animoto as an example of the greatness of their platform. They love to show the chart on the left here. In a couple of days, usage of the Animoto service exploded. There’s an accounting of the event in a blog post by the AWS team. If you do the quick math, they were supporting approximately 74 users per machine instance, and their user-to-instance density was declining as user accounts increased. The story they like to tell from this chart is “wow, we were able to spin up 3000 machines overnight. It’s amazing!” What I see is more along the lines of “holy crap, what is your code doing that you need that many instances for that many users?” I don’t mean to impugn Animoto here, but I don’t want the point to be lost: the profitability of your project could disappear overnight on account of code behaving badly.
The design and efficiency of your code are largely in your hands. What about a potentially more onerous situation – customers behaving badly? Revisiting the software license model, a customer can only use so many cycles on the servers to which your application is deployed. That customer can abuse the code, but they cannot impact the experience of any other customer, and they can’t cause you to deliver more machines to support their load unless they pay for them. As such, you’ve collected your money, and they have no impact on your profitability. In a world where SaaS vendors haven’t yet figured out how to do consumptive pricing for their offerings, the very real possibility exists that a small handful of customers can abuse your application and destroy not only the profitability of the account, but of your entire P&L.
The simple answer is “quotas,” though that is somewhat harder to enforce in practice when the consumptive unit is not easily measurable. What kind of a quota can you put in place for a business intelligence application where a micromanaging mid-level employee is running wild with “what-if” scenarios? As more applications migrate to the cloud, there will be plenty of scenarios where the work units are not as simple to constrain as disk or processor usage.
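One way to picture the problem is to invent an abstract “work unit” and meter against it. The sketch below does that for the what-if example, assuming (and this is a big assumption) that scenarios multiplied by rows scanned is a fair proxy for cost. The class, the function, and the numbers are all hypothetical.

```python
class QuotaExceeded(Exception):
    """Raised when a customer would exceed their work-unit allowance."""


def what_if_units(scenarios: int, rows_per_scenario: int) -> int:
    # Hypothetical cost proxy: each scenario rescans its rows once.
    return scenarios * rows_per_scenario


class WorkUnitQuota:
    def __init__(self, monthly_limit: int):
        self.monthly_limit = monthly_limit
        self.used = {}  # customer id -> units consumed this period

    def charge(self, customer_id: str, units: int) -> int:
        """Record usage and return the units remaining, or refuse the work."""
        used = self.used.get(customer_id, 0)
        if used + units > self.monthly_limit:
            raise QuotaExceeded(f"{customer_id} is over quota")
        self.used[customer_id] = used + units
        return self.monthly_limit - self.used[customer_id]


quota = WorkUnitQuota(monthly_limit=1_000_000)
print(quota.charge("analyst-42", what_if_units(5, 100_000)))  # 500000

try:
    # Ten more scenarios would need 1,000,000 units; only 500,000 remain.
    quota.charge("analyst-42", what_if_units(10, 100_000))
except QuotaExceeded:
    print("throttled")
```

The bookkeeping is the easy part. The hard part is the one-line `what_if_units` function: if the proxy doesn't track what the scenario really costs you to run, the quota protects nothing.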
Consumptive costing models, automatic scaling of applications, and difficult-to-define atomic quota units have the potential to create serious financial challenges for cloud-based application vendors. These challenges will manifest themselves by imposing rigid software efficiency design goals on development teams, and by forcing the operations team to entertain the notion of firing bad customers. Designing good software is certainly not a new topic, but the possibility of bankrupting a company is not something architects have ever really had to think about.