The Value of Location-Independence
At the dawn of the computing age, since the mountain of equipment—relays, dials, vacuum tubes, etc.—would not come to the users, the users would come to the mountain. Fast forward half a century, and the equipment, or at least applications, services, and content, do come to the user over global networks. An important attribute of today’s world is this ubiquity and availability, regardless of whether one is using wired, wireless, converged, or satellite networks.
Latency can also impact other application contexts, from ruining the flow of natural conversation and thus hindering collaboration when delays cross 200 milliseconds, to reducing revenue in eCommerce and online search applications [Hamilton, 2009].
To make things worse, for many applications it is not just the latency of a single round trip request-response transaction that matters, but the latency of multiple round trips, for example, as numerous objects such as images are fetched to load a web page.
Moreover those latencies add up. A wait of 300 milliseconds or 3 seconds for a web page to load is not a lot, until you multiply it by thousands of knowledge or contact center workers, each with hundreds of transactions per day, or consider the importance of timeliness under battlefield conditions.
As a result, a single-instance datacenter is poorly suited to these types of tasks. Unlike a simple outbound stream, which may be buffered or delayed without substantial impact, these are interactive, real-time services.
Given human response times in the tens and low hundreds of milliseconds, the circumference of the Earth, and the speed of light in fiber (only 124 miles per millisecond), supporting a global user base requires a dispersed services architecture. While there are many thorny issues of coordination, consistency, availability, and partition tolerance, we will address a simple one: the investment implications of such distributed architectures, especially given the potential of common infrastructure.
Latency is strongly, but not perfectly, correlated with distance. The correlation is imperfect because of routing anomalies in wireline networks, the specifics of router hops and optical-electronic-optical conversions, and so forth, before congestion or link outages even enter the picture. However, given the strong correlation, we can use distance as a proxy for latency. On a plane, both worst-case and expected latency are proportional to the radius of the circle centered on the service node. Consequently, while there are variations due to whether the coverage strategy is circle packing (like cannonballs, with gaps in between) or circle covering (full coverage but with overlaps), the area covered is proportional to the square of the radius and to the number of service nodes (Weinman, 2011).
For n service nodes on a plane, the area A covered depends on the radius r related to the latency/distance, and a constant of proportionality k that depends on the packing/covering strategy. Thus, A = knπr². Therefore, if the area A is a constant, we can rearrange terms to realize that r = √(A/(kπn)), that is, r ∝ 1/√n.
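The planar relationship can be checked numerically. The sketch below simply solves A = knπr² for r; the function name, the region size, and the default k = 1 are illustrative assumptions, not values from the text.

```python
import math

def radius_for_nodes(area, n, k=1.0):
    """Radius r satisfying A = k * n * pi * r**2 for n service nodes.

    k is the packing/covering constant; k = 1.0 is an assumed placeholder.
    """
    return math.sqrt(area / (k * n * math.pi))

if __name__ == "__main__":
    area = 1_000_000.0  # hypothetical region size, in square miles
    single = radius_for_nodes(area, 1)
    for n in (1, 4, 16, 64, 256):
        r = radius_for_nodes(area, n)
        # Each 4x increase in nodes only halves the radius (and thus latency).
        print(f"n={n:4d}  r={r:8.1f} miles  ratio to n=1: {r / single:.4f}")
```

Under these assumptions, quadrupling the node count halves the radius each time, which is the inverse-square-root behavior in action.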
This simple equation gives us a 1/√n term, but for geometric rather than statistical reasons. Similar economic characteristics ensue: it doesn’t take many nodes to make rapid initial gains, but returns then diminish rapidly. Getting worst-case global network round-trip latency from 160 milliseconds to 80 or 40 or 20 takes only a handful or a couple of dozen nodes, but after that, thousands or millions of nodes will only result in microsecond or nanosecond improvements.
To be precise, the Earth is not a plane. At best, it approximates a sphere. Consequently, if we are to devise a formula, we need to consider packings or coverings not of circles, but of “spherical caps” (like baseball caps, but without brims). A useful formula is that the surface area S of a spherical cap is proportional to its height h: S = 2πRh. Since h = R(1 − cos β), we can calculate the surface area of the cap in terms of its angular radius β: S = 2πR²(1 − cos β). If we double this angular radius, which is equivalent to doubling the worst-case distance and thus latency along the surface of the sphere, we have a surface area of 2πR²(1 − cos 2β), rewriteable as 2πR²(1 − cos² β + sin² β), that is, 4πR² sin² β. Doubling the angular radius therefore multiplies the area by 2(1 + cos β): nearly 4 for small caps, as on a plane, but only 2 when a hemisphere grows to cover the whole sphere.
To help understand this, suppose we are placing service nodes. If we have just one and place it, say, at the North Pole, the worst-case distance is to the antipodal point: the South Pole. If we have two service nodes, optimal placement would be, in effect, the North Pole and the South Pole, and the worst-case distance is that to a point on the equator, say, Quito. By increasing the number of service nodes from 1 to 2, we have halved the worst-case distance.
In other words, rather than the 1/√n law, we have a 1/n rule when n is 1 or 2. However, as n increases we get closer and closer to 1/√n, as the surface of the sphere increasingly approximates a plane locally.
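The transition from the two-node case to planar behavior can be illustrated with the cap-area formula S = 2πR²(1 − cos β). The sketch below makes an idealizing assumption that is not in the text: the n caps are equal and their areas sum exactly to the sphere's 4πR², ignoring the gaps or overlaps any real packing or covering has. Under that assumption cos β = 1 − 2/n. The function names are hypothetical.

```python
import math

def cap_angular_radius(n):
    """Angular radius (radians) of each of n equal caps.

    Assumes n * 2*pi*R^2 * (1 - cos(beta)) = 4*pi*R^2, i.e. the cap areas
    sum exactly to the sphere's area (an idealization), so
    cos(beta) = 1 - 2/n.
    """
    return math.acos(1.0 - 2.0 / n)

def planar_estimate(n):
    """Small-angle approximation: 1 - cos(beta) ~ beta**2 / 2, so beta ~ 2/sqrt(n)."""
    return 2.0 / math.sqrt(n)

if __name__ == "__main__":
    for n in (1, 2, 4, 16, 100, 10_000):
        exact = cap_angular_radius(n)
        approx = planar_estimate(n)
        print(f"n={n:6d}  beta={exact:.4f} rad  planar 2/sqrt(n)={approx:.4f}")
```

With n = 1 the angular radius is π (pole to pole) and with n = 2 it is π/2 (pole to equator), so the distance is halved. For large n the exact value and the 2/√n estimate converge, consistent with the sphere looking locally like a plane.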
Given that 1/√n is exactly correct on a plane and increasingly correct on a sphere, especially at the scale of continents or countries, it tells us that the diminishing returns due to the inverse square root make private investment increasingly difficult. An enterprise or government would be investing more and more capital in service nodes to get less and less improvement in response time.
However, depending on the application profile, the cloud can alter the balance. Specifically, if the cost is based on pay-per-use, having, say, 1 node with 10,000 users or 10,000 nodes each with 1 user will cost the same. In practice, there will be some tradeoffs. For example, shorter transaction routes reduce aggregate network usage requirements on a bandwidth-mile basis and thus save either on (customers’) network costs or (providers’) infrastructure investments. On the other hand, maintaining multiple copies of a non-trivial application takes up storage space, which costs money, and can incur additional license fees. User data that partitions cleanly is cost-insensitive to division, but excessively mobile users may incur data transport costs that are nontrivial. In short, both public cloud and private implementations have application-dependent characteristics regarding user experience and network, storage, and processing costs.
The Value of Utility Pricing
Conventional wisdom suggests that cloud services must be cheaper due to immense economies of scale. However, empirical data offer a mixed and nuanced view (McKinsey, 2009; Harms and Yamartino, 2010). Briefly, economies of scale certainly may exist for large service providers. However, the question is not whether large service providers exhibit economies of scale, but whether those economies are large enough, relative to enterprise or government scale, to overcome the margin, SG&A, and uncollectables costs that any sustainable, rational, commercial service must necessarily bear, and whether the total package exhibits any net competitive price advantage. Moreover, cloud technology is a moving target: any current advantages due to, say, proprietary automation or provisioning technology may not be sustainable in the long term as vendors arise that offer them to all players, including small cloud providers, enterprises, and governments.
Some people believe that if cloud services are more costly on a unit basis then they should be avoided: after all, why pay more? However, this misses the cloud value proposition generally, including specifically the value of utility pricing.
The complete value proposition of pay-per-use pricing includes the benefit that, regardless of unit cost, these resources are paid for only if used, in contradistinction to owned, dedicated resources.
In everyday life, one often pays more for utility pricing. A midsize car may be financed, leased, or depreciated for roughly $300 per month or $10 per day. That same car from a rental car service might cost $45 per day, even after allowing for insurance and car washes and so forth. One does not shy away from “overpaying,” because one is in fact not overpaying: one is saving money relative to owning that car but using it for only one or two days.
Consequently, some rules about cloud value are clear (Weinman, 2011c). If the unit cost of the cloud is in fact cheaper than the unit cost of an owned resource, then the cloud should always be the solution. In effect, if I can rent a car for $5 a day, and owning it costs $10, I should use a rental for both short term and long term requirements.
If the cloud costs the same and resource demand is flat, both strategies cost the same. Where it gets interesting is when the cloud costs more, but the resource demand is variable.
Let the demand for resources be denoted by D(t), where 0 ≤ t ≤ T. Let the peak demand be P = max(D(t)) over that period and the average demand be A = μ(D(t)) over the same period. Let the utility premium, the ratio of the cloud unit cost to the owned and dedicated unit cost, be U. For the rental car example above, U would be 4.5 (= $45/$10). Let the baseline cost for owned resources be B. It should be noted that linear usage-sensitive pricing implies that we only care about A, not P, for the utility-priced resources, since the total cost of meeting demand D(t) is A × U × B × T, which is the value of the definite integral ∫₀ᵀ U × B × D(t) dt. For an owned solution that must meet peak resource requirements, the total cost is P × B × T. For the utility-priced resources to be a lower total cost solution, we must simply ensure that A × U × B × T < P × B × T. The B and the T drop out, leading us to the insight that the cloud is cheaper when (P/A) > U. In English, this means that if the cloud’s unit cost is U times that of a dedicated resource, but the peak-to-average ratio is higher than that, a pure cloud solution will be less expensive.
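The comparison can be sketched in a few lines. The code below treats demand as a list of samples at equal time steps (an assumption; the text defines D(t) continuously), and the function names and the sample demand profile are hypothetical. The values B = 10 and U = 4.5 are taken from the rental car example.

```python
def owned_cost(demand, base_cost):
    """Owned capacity sized for peak demand: P * B * T."""
    return max(demand) * base_cost * len(demand)

def cloud_cost(demand, base_cost, premium):
    """Pay-per-use cost: sum over time of U * B * D(t), equal to A * U * B * T."""
    return sum(demand) * base_cost * premium

def cloud_is_cheaper(demand, premium):
    """True when the peak-to-average ratio P/A exceeds the utility premium U."""
    peak = max(demand)
    average = sum(demand) / len(demand)
    return peak / average > premium

if __name__ == "__main__":
    # Hypothetical spiky demand: 1 unit for nine periods, 11 units for one.
    demand = [1, 1, 1, 1, 1, 1, 1, 1, 1, 11]
    base_cost = 10   # B: owned unit cost per period ($10/day in the car example)
    premium = 4.5    # U: $45 / $10 in the car example
    print("owned:", owned_cost(demand, base_cost))
    print("cloud:", cloud_cost(demand, base_cost, premium))
    print("cloud cheaper:", cloud_is_cheaper(demand, premium))
```

Here the peak is 11 and the average is 2, so P/A = 5.5 exceeds U = 4.5 and the pure cloud option costs less, even though each unit is 4.5 times as expensive. With flat demand the ratio is 1 and ownership wins.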
It’s beyond the scope of this article, but it can be shown that for most variable demand situations, a hybrid of owned, dedicated resources and pay-per-use resources is cost-optimal. This matches our intuitive optimization strategy: use owned resources such as homes and owned cars for long-duration usage, and rented ones such as hotels or rental cars for short-duration usage. The key factor turns out to be the percentage of time within the planning period that a given resource level is needed. If the percentage of time that a given level of demand is needed is greater than the inverse of the utility premium, we can take that tranche of demand and serve it more cheaply via ownership. If it is less than the inverse of the utility premium, it may be served more cheaply via pay-per-use rental. Finally, if it is equal, there is a breakeven zone where either approach will work well.
Think of it this way: you probably need a car every day for your commute to work, so owning is cheaper than renting. A spare car that you need only one day a year, when your car is serviced, is best met through a rental car. If your child is home from college 3 months out of the year, and rentals cost 4 times as much, the cost is equal, regardless of the approach you use. If you have 3 kids in college, whether you rent 0, 1, 2, or 3 cars won’t matter.
In practice (Weinman, 2009), demands often are highly spiky. In commercial contexts, news stories, marketing promotions, special events, product launches, Internet flash floods, and the like drive traffic spikes, enhancing the value of pay-per-use pricing. In government applications there also are spikes, e.g., due to tax filing, e-government initiatives, and the like. For defense and intelligence applications, there are often tradeoffs between resource requirements and time limits on effective use of information. For example, an initiative might require extremely fast turnaround (say, processing hundreds of hours of drone video) to provide actionable intelligence. This creates a demand spike with national security implications.
One important cost component that we have not addressed yet is the cost of the network in hybrid solutions. If a hybrid solution would be optimal, but a fixed price $20 million network is required to make it work, the cost equations clearly will suffer. If the network is variably priced, that will shift where the breakeven point is for the hybrid. In the college student example, either a daily bus fare or a bus pass to get to and from the rental car location would alter the attractiveness of the rental car option, as would accounting for the value of time spent.