Posted Tuesday, January 29th, 2008 at 5:18 pm under companies.

## S3 numbers revisited: six orders of magnitude does matter

OK. I should have realized in my original posting that the October 2007 figure of 10,000,000,000,000 objects was the source of the problem. I knew S3 could not be doubling every week, and that Amazon could not be making \$11B a month, but I didn’t see the now-obvious error in the input.

So what sort of money are they actually making?

Don MacAskill pointed me to this article at Forbes which says the number of objects at the end of 2007 was up to 14B from 10B in October. So let’s suppose the number now stands at 15B (1.5e10) and that Amazon are currently adding about 1B objects a month.

I’ll leave the other assumptions alone, for now.

Amazon’s S3 pricing for storage is \$0.15 per GB per month. Assume all this data is stored on their cheaper US servers and that objects take on average 1K bytes. So that’s roughly 1.5e10 * 1e3 / 1e9 = 1.5e4 gigabytes in storage, for which Amazon charges \$0.15 per GB per month, or \$2250 in total.

Next, let’s do incoming data transfer cost, at \$0.10 per GB. That’s simply 2/3rds of the data storage charge, so we add another 2/3 * \$2250, or \$1500.

Then the PUT requests that transmit the new objects: 1B new objects were added in the last month. Each of those takes a PUT, and these are charged at \$0.01 per thousand, so that’s 1e9 / 1e3 * \$0.01, or \$10,000.
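The three charges so far can be checked in a few lines of Python. This is only a sketch of the arithmetic above: the variable names are mine, and the prices and object counts are the ones quoted in this post.

```python
# Inputs quoted in the post; names are illustrative only.
OBJECTS_STORED = 1.5e10       # total objects currently in S3 (assumed)
NEW_OBJECTS = 1e9             # objects added in the last month (assumed)
AVG_OBJECT_BYTES = 1e3        # assumed average object size: 1K bytes
BYTES_PER_GB = 1e9

STORAGE_PER_GB = 0.15         # $ per GB per month
TRANSFER_IN_PER_GB = 0.10     # $ per GB
PUT_PER_THOUSAND = 0.01       # $ per 1,000 PUT requests

stored_gb = OBJECTS_STORED * AVG_OBJECT_BYTES / BYTES_PER_GB
storage_charge = stored_gb * STORAGE_PER_GB
# The post bills incoming transfer on the same volume as storage,
# which is why it comes out at 2/3 of the storage charge.
transfer_in_charge = stored_gb * TRANSFER_IN_PER_GB
put_charge = NEW_OBJECTS / 1e3 * PUT_PER_THOUSAND

print(f"stored volume: {stored_gb:,.0f} GB")
print(f"storage:       ${storage_charge:,.0f}")
print(f"transfer in:   ${transfer_in_charge:,.0f}")
print(f"PUT requests:  ${put_charge:,.0f}")
```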

Lastly, some of the stored data is being retrieved. Some will just be backups, and never touched, and some will simply not be looked at in a given month. Let’s assume that just 1% of all (i.e., not just the new) objects and data are retrieved in any given month.

That’s 1.5e10 * 1e3 * 0.01 / 1e9 = 150 GB of outgoing data, or 0.15 TB. That’s much less than 10TB, so all this goes out at the highest rate, \$0.18 per GB, giving another \$27 in revenue.

And if 1% of objects are being pulled back, that’s 1.5e10 * 0.01 = 1.5e8 GET operations, which are charged at \$0.01 per 10K. So that’s 1.5e8 / 1e4 * \$0.01 = \$150 for the GETs.
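The retrieval side works the same way. A minimal sketch, again with my own variable names and the 1% retrieval guess from above:

```python
# Inputs quoted in the post; names are illustrative only.
OBJECTS_STORED = 1.5e10        # total objects (assumed)
AVG_OBJECT_BYTES = 1e3         # assumed average object size: 1K bytes
BYTES_PER_GB = 1e9
RETRIEVED_FRACTION = 0.01      # guess: 1% of objects fetched per month

TRANSFER_OUT_PER_GB = 0.18     # $ per GB, the rate for the first 10 TB
GET_PER_TEN_THOUSAND = 0.01    # $ per 10,000 GET requests

retrieved_objects = OBJECTS_STORED * RETRIEVED_FRACTION
outgoing_gb = retrieved_objects * AVG_OBJECT_BYTES / BYTES_PER_GB
transfer_out_charge = outgoing_gb * TRANSFER_OUT_PER_GB
get_charge = retrieved_objects / 1e4 * GET_PER_TEN_THOUSAND

print(f"GET requests:  {retrieved_objects:,.0f}")
print(f"outgoing data: {outgoing_gb:,.0f} GB")
print(f"transfer out:  ${transfer_out_charge:,.0f}")
print(f"GET charge:    ${get_charge:,.0f}")
```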

This gives a total of \$2250 + \$1500 + \$10,000 + \$27 + \$150 = \$13,927 in the last month.

And that doesn’t look at all like \$11B!

Where did all that revenue go? Mainly it’s not there because Amazon only added 1e9 objects in the last month, not 1e15. That’s six orders of magnitude. So instead of \$11B in PUT charges, they make a mere \$11K. That’s about enough to pay one programmer.

I created a simple Amazon S3 Model spreadsheet where you can play with the numbers. The cells with the orange background are the variables you can change in the model. The variables we don’t have a good grip on are the average size of objects and the percentage of objects retrieved each month. If you increase average object size to 1MB, revenue jumps to \$3.7M.
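If you would rather poke at the numbers in code, here is a rough Python equivalent. It is a sketch, not a copy of the spreadsheet: the function name `s3_monthly_revenue` and its parameters are made up for this example, and all outgoing data is billed at the flat \$0.18 top rate, so large scenarios come out slightly high.

```python
def s3_monthly_revenue(objects_stored=1.5e10, new_objects=1e9,
                       avg_object_bytes=1e3, retrieved_fraction=0.01):
    """Return a per-category revenue breakdown in dollars for one month.

    Defaults are the guesses used in the post; prices are the ones quoted there.
    """
    stored_gb = objects_stored * avg_object_bytes / 1e9
    retrieved_objects = objects_stored * retrieved_fraction
    outgoing_gb = stored_gb * retrieved_fraction
    return {
        "storage": stored_gb * 0.15,              # $0.15 per GB per month
        "transfer_in": stored_gb * 0.10,          # $0.10 per GB, same volume as storage
        "put": new_objects / 1e3 * 0.01,          # $0.01 per 1,000 PUTs
        "transfer_out": outgoing_gb * 0.18,       # simplification: flat top rate
        "get": retrieved_objects / 1e4 * 0.01,    # $0.01 per 10,000 GETs
    }


baseline = sum(s3_monthly_revenue().values())
one_mb = sum(s3_monthly_revenue(avg_object_bytes=1e6).values())
print(f"1K objects:  ${baseline:,.0f}")
print(f"1MB objects: ${one_mb:,.0f}")
```

With the defaults this reproduces the \$13,927 total, and with 1MB objects it gives a little under \$3.8M, in the same region as the spreadsheet's figure.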

BTW, the spreadsheet makes a simplification: it treats all data as being owned by one user and uses that to calculate download cost. In reality there are many users, and most of them will be paying for all their download data at the top rate. Also note that my % of objects retrieved is a simplification. It would be better to estimate how many objects are retrieved (i.e., including objects being retrieved multiple times) as well as estimating the download data amount. I roll these both into one number.
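To see how much the one-user simplification can matter, here is a hedged sketch of tiered download pricing. Only the \$0.18 rate for the first 10TB comes from this post; the two lower tiers in `TIERS` are assumptions that should be checked against Amazon's price list, and the even split across 1,000 users is purely hypothetical. The 150,000 GB figure is the download volume implied by 1MB objects (1.5e10 objects * 1e6 bytes * 1%).

```python
# (tier size in GB, $ per GB). Only the first rate is quoted in the post;
# the other two rows are ASSUMED for illustration.
TIERS = [
    (10_000, 0.18),         # first 10 TB
    (40_000, 0.16),         # next 40 TB (assumed)
    (float("inf"), 0.13),   # everything above 50 TB (assumed)
]


def transfer_out_charge(gb):
    """Charge for `gb` of downloads by a single account under TIERS."""
    charge = 0.0
    for size, rate in TIERS:
        used = min(gb, size)    # portion of the volume that falls in this tier
        charge += used * rate
        gb -= used
        if gb <= 0:
            break
    return charge


total_gb = 150_000              # download volume in the 1MB-object scenario
users = 1_000                   # hypothetical number of equal-sized customers

as_one_user = transfer_out_charge(total_gb)
as_many_users = users * transfer_out_charge(total_gb / users)

print(f"one big user:     ${as_one_user:,.0f}")
print(f"{users:,} small users: ${as_many_users:,.0f}")
```

Under these assumptions the single-user view gives \$21,200 while the many-users view gives \$27,000, so treating everything as one account understates download revenue once the volume passes the first tier.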

• johander

Hi Terry – this is great info. I’m curious if you ever tried to estimate the # of customers or S3 access tokens required to drive the above numbers? Any idea how many blocks customers store on average?

What I’m trying to figure out is what the distribution of customers looks like in a typical cloud storage app like S3. If you took all the storage, put customers into buckets based on how much data they have stored, and looked at the histogram, what would you see?

My assumption is that the distribution is highly skewed towards 0 with the majority of customers storing a small amount of data and the tail representing the bulk. Old 80/20 rule. 20% represents 80% of that data.

Do you have any thoughts on this?
