The keynote speaker at LISA09 (www.usenix.org/events/lisa09/) was Werner Vogels, CTO of Amazon, talking about cloud computing. I went all Chuck-Thompson-style-stenographer during the talk, and my raw notes are below. I hope they’re useful for people who are interested in cloud computing but couldn’t attend (either the conference or the keynote because of training, etc.).

-----

animoto – takes music files and photos and makes movies
- very seductive
- started as college kids with laptops; still working just off laptops and the cloud
signed up 25k customers an hour (facebook app)

infrastructure as a service

setting things straight:
* not amazon selling excess, designed as self-standing business
amazon is customer of aws, but others even larger

* sa jobs change if you insert cloud services into portfolio, but
not designed to end SA jobs

definitions liked and disliked
cloud = things somewhere managed by someone else
cloud is target for criticism, even when technology is used by something else
(Air New Zealand failed due to mainframe failure, not “cloud”)

a style of computing where massively scalable IT-related capabilities are provided as a service across the Internet to multiple external customers – Gartner 2008

missing “on demand” – and releasing when no longer needed
missing “pay as you go” – if you don’t use, you don’t pay

foreign concept in computing, but not in other infrastructure (power, water, gas)
true utility – only pay for what you use

precursors:
: software as a service
- what it meant to create software that was operated, efficiently, at scale
– different expertise

: distributed computing
- needed new algorithms to work at a large, cost effective scale
- academics of 5 years ago didn’t cover it

: virtualization
- beyond cpu – disk, guarantees on IO, networking

: SOA – service oriented architecture
- simple description of service, don’t need libraries/software to determine how to use
http, REST, xml, whatever… then build your own w/ software
- no integration, pure tools

if major business magazines start calling Bezos crazy, we’re doing things right

first 5 years, amazon a traditional web shop. one single app ran amazon
- different command line flags for different purposes
- centralization not all bad

~2002 rearchitecture needed, and then Target wanted amz to do a site for them
- made them realize major opportunity out there
- to achieve a level of scale amz could not reach by self
- principle: drive infrastructure costs down, and we can lower retail prices
- competing on price is good for customers

next architecture should be platform
- service oriented approach, internal and externally
- now some of the largest ecommerce sites run on this platform

- take a piece of business logic, what data it operated on: break out to API
- API becomes only source to data
- bit by bit transition: business logic could move slowly
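
A minimal sketch of the "API becomes only source to data" idea (editor's illustration, not Amazon's code). The service owns its data store and callers only go through its methods. `CartService`, `add_item`, and `get_cart` are hypothetical names, and the dict stands in for whatever database a real service would wrap.

```python
class CartService:
    """Hypothetical service that owns its data; callers never touch the store."""

    def __init__(self):
        # Private store: a real service would wrap a database here.
        self._carts = {}

    def add_item(self, customer_id, sku, qty=1):
        if qty <= 0:
            raise ValueError("qty must be positive")
        cart = self._carts.setdefault(customer_id, {})
        cart[sku] = cart.get(sku, 0) + qty

    def get_cart(self, customer_id):
        # Return a copy so callers cannot mutate the service's data directly.
        return dict(self._carts.get(customer_id, {}))


service = CartService()
service.add_item("c1", "book-123", 2)
service.add_item("c1", "book-123")

cart = service.get_cart("c1")
cart["book-123"] = 99  # changes only the caller's copy
print(service.get_cart("c1"))
```

Because every read and write goes through the methods, the storage behind them could be swapped out bit by bit without callers noticing, which is the transition the notes describe.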

each amazon page hits 250-300 different amazon services to build/return
- each action/logic an independent service

technical architecture, and management architecture
- teams run services, build them, improve them, maintain them, enhance them
- no separation between development and IT
- thought: engineers would be in contact with customer, and motivated to drive improvement
- didn’t want an isolated operations department, did not want to isolate engineers

- allows fast innovation, mobility
– get a small team, standard way to get resources, try things out

600 services means 600 teams

internal rules for how to deal with failures
- if we lose a whole data center, customer should not be affected
- two data centers, may violate SLA, but functionally should be 100% correct

teams needed to scale w/ capacity
- doing that in parallel 600 times over: not so good
- became obvious when teams started communicating more and more w/ networking group

70/30 switch
70% of time scaling of servers, dealing w/ infrastructure matters
- more databases, new storage techs, failover testing

(mini aside from slides)
“game days” – just to make sure guarantees are met
- turn off a data center
- see what happens
- first time, looked really good on paper but didn’t work well in reality
- not just failover, beepers going off, switching to secondaries, etc.
- restore operation was difficult
– but if you don’t exercise frequently, you cannot tolerate it when it happens in real life

30% of time, energy, dollars on differentiated value creation
70% of time, energy, dollars on undifferentiated heavy lifting

technical debt for creating complicated solutions
- pay it off by making a shared service platform
- hoped to flip 70/30 switch

there isn’t an enterprise pattern you won’t find in amazon (search, prediction tracking, batch processing, etc.)

for some reason we tend to put DCs next to trailer parks, and trailer parks attract tornadoes

horror stories about data center loss

need multiple data centers; can’t pour money into one good one, need to focus on multiple good/ok ones?

disk failures are not age related (in large numbers)
8-10% of disks will fail per year, if they make it past infant mortality
- burn in is important
no prediction of which disk will fail per node
one guy in DC whose full time job is nothing else but replacing disks
10 guys with disk dense nodes
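
Rough arithmetic on why disk replacement becomes a full-time job (editor's sketch). The 8-10% annual failure rate is from the notes; the fleet size of 100,000 disks is an assumption picked only for illustration, and failures are assumed to be spread evenly over the year.

```python
def expected_failures_per_day(disk_count, annual_failure_rate):
    """Expected disk replacements per day, assuming failures spread evenly over the year."""
    return disk_count * annual_failure_rate / 365


fleet = 100_000  # assumed fleet size, not a number from the talk
low = expected_failures_per_day(fleet, 0.08)
high = expected_failures_per_day(fleet, 0.10)
print(f"{low:.1f} to {high:.1f} failed disks per day")
```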

capacity planning for the busiest day of the year is hard
– average 200n a day, bursts to 900n in Q4/holidays
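
A quick worked example of the capacity problem (editor's sketch). The 200 and 900 figures are from the notes, with the unit left as written; the assumption is that capacity is fixed year-round at the peak level.

```python
average_load = 200  # "200n a day" from the notes; unit left as written
peak_load = 900     # Q4/holiday burst from the notes

# If you buy enough capacity for the peak, this is how much of it
# the average day actually uses.
utilization = average_load / peak_load
idle_fraction = 1 - utilization
print(f"utilization {utilization:.0%}, idle {idle_fraction:.0%}")
```

Under that assumption roughly three quarters of the hardware sits idle on an average day, which is the case for being able to release capacity when it is not needed.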

web-scale computing
- scalable infrastructure

virtualized infrastructure, start w/ computing
give engineers requirement to start building against software platform
amz does not mandate tools, languages — trusts engineers to make best choices, hiring practices to support this
– so forcing them to switch to programming paradigm would be bad
- so virtual machines were a part of the platform

amz was really good at server provisioning
- 5-6 hours from request to use
- but still did not incent engineers to release capacity when they no longer needed it
- needed to switch to minutes to provision
– learned (now) that this still does not incent release behavior

- behavior comes from automation, because setting things up is no longer difficult

chef, scalr, open source tools
- “provisioning is programmable”

VMs and xen -> amazon ec2 (elastic compute cloud) was born
- we’re also cheap
- emulate data centers as “availability zones”, worldwide regions

storage
70% of the storage was not relational – just key value
- 1 index
- turned into amazon s3

20% of storage was single table, lots of indexes but no other tables
- amazon simpledb
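
To make the two access patterns concrete, here is a small in-memory sketch (editor's illustration). It is not the S3 or SimpleDB API; `SingleTable`, `put`, and `query` are hypothetical names. The first part is pure key/value with one index (the key); the second is one table with several attribute indexes and no joins.

```python
from collections import defaultdict

# Key/value pattern: the key is the only index.
blobs = {}
blobs["photos/cat.jpg"] = b"...bytes..."


class SingleTable:
    """One table, several attribute indexes, no other tables and no joins."""

    def __init__(self, indexed_attrs):
        self._rows = {}
        self._indexes = {attr: defaultdict(set) for attr in indexed_attrs}

    def put(self, item_id, attrs):
        old = self._rows.get(item_id, {})
        for attr, index in self._indexes.items():
            if attr in old:
                index[old[attr]].discard(item_id)  # drop stale index entry
            if attr in attrs:
                index[attrs[attr]].add(item_id)
        self._rows[item_id] = dict(attrs)

    def query(self, attr, value):
        # Look up by any indexed attribute, not just the item id.
        return sorted(self._indexes[attr].get(value, set()))


table = SingleTable(["author", "year"])
table.put("b1", {"author": "knuth", "year": 1968})
table.put("b2", {"author": "knuth", "year": 1973})
table.put("b3", {"author": "tanenbaum", "year": 1981})
print(table.query("author", "knuth"))
```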

amazon elastic block store – treat it like a hard disk, present to ec2
relational databases, squid

relational database service – “dba as a service”
not inf scalable database, but they run mysql for you – tune to size and storage
but they run it for you
use std mysql libraries
- not something they originally designed internally, matched customer need
- but now that it exists amz engineers use it

saw how efficient engrs became using this service

amz infra services
principles:
scalable – incr/decr capacity in minutes
cost-effective – low rate, pay as you go
reliable – amazon’s proven infra (aware of cost of outages to customers)
secure – multilayer security

developed as tools
- each should be independent, or seductive on its own
- tools in your tool box, not forced to use all of them together

amazon virtual private cloud (vpc)
can create a walled garden in the cloud
vpn back to your environment
give a cidr block to vpc, then start creating subnets
- can allocate ec2 instances in those subnets
- either part of your block in your enterprise
looks like these nodes in the cloud are part of your local infra
std management tools, deployment tools, auditing tools still work
seamless extension of data center to cloud
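
The CIDR-block-then-subnets step can be sketched with Python's standard `ipaddress` module (editor's illustration; this only does the address math and does not call any AWS API). The 10.0.0.0/16 block and the instance address are assumed example values.

```python
import ipaddress

vpc_block = ipaddress.ip_network("10.0.0.0/16")  # assumed example block
# Carve the block into /24 subnets and keep the first three.
subnets = list(vpc_block.subnets(new_prefix=24))[:3]

for subnet in subnets:
    print(subnet, "addresses:", subnet.num_addresses)

# An instance address allocated inside the second subnet.
instance_ip = ipaddress.ip_address("10.0.1.25")
print(instance_ip in subnets[1])
```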

if you release resources, you save money
increase agility – massive server park, around the world, multiple data centers
- removes constraints to thinking about this
- access to infinite scalable resources

enterprise computing these days is kinda 1990s
many haven’t reached client/server yet
too many amber terminals still in use
so “new IT” development happening in cloud

82B objects in s3 2009q3

enterprise partners: oracle, ibm, ms, redhat, sun
- but they’ve adopted different licensing models
- stop hammering them about cloud strategies – ask about how licensing is changing
– make sure they are changing (pay as you go)
- ibm is more creative: use all for free in development, production: pay small ec2 upcharge
pay a bit more on an hourly basis (10c/hour instead of 8.5c/hour)
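
Working through the hourly numbers (editor's arithmetic). The two rates are from the notes; the 30-day month of continuous use is an assumption for illustration.

```python
base_rate = 0.085     # dollars per hour, from the notes
licensed_rate = 0.10  # dollars per hour with the upcharge, from the notes

upcharge = licensed_rate - base_rate
hours = 24 * 30  # assumed: one instance running for a 30-day month

print(f"upcharge: ${upcharge:.3f}/hour ({upcharge / base_rate:.1%})")
print(f"30-day month: ${base_rate * hours:.2f} vs ${licensed_rate * hours:.2f}")
```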

use case trends:
DR, HPC, collaborations, large scale analysis, load testing, marking compliance
- allocate cloud for DR purpose if needed, and use for R&D, load testing

example case:
storage in s3, computing power on customer desktop (adobe air)
- back in time analysis of stock ticker info
-> could build new product w/o infrastructure

washington post example of disseminating hillary clinton’s schedule

guardian all expense reports for government, let users read all of them, mark suspect ones
-> crowdsourcing used the cloud

Internet Scaling
indy.com, indy500.com – 8 flash streams, lots of video feeds, etc. 3 times a year
espn
playfish – scaling gaming online
- farmville (?) 25M concurrent users
most marketing campaigns need the scalability

eharmony – 250 marriages a day
- use a relational database (ha ha ha – he didn’t get joke first time)
- large map reduce jobs go at night, new results in the morning

netflix picked amz b/c needed a platform that would really scale up
- even if they compete w/ amz for VOD
- amz wants netflix to be successful

can get HIPAA certified for putting data in amz
load testing turbotax.com prior to tax day

amz really wants the feedback to tweak operations, create new business models
- relational database service (rds)
- reserved/predicted ec2 at savings

reduce prices of storage, networking, and pass savings to aws

@werner on twitter
mynameise.com/werner
- integrates all these social network info

aws.amazon.com/ “and a credit card is all you need”

Questions:

- what happens if oracle kills sun/mysql?
A: not something they have much control over,

- chef, cfengine, config automation: how do you do network? is the net static, or do you provide tools to manage net?
A: build and provide some high level tools
elastic load balancer
except for vpc part, don’t give customers network control themselves
machines have firewalls and users can adjust those
but no control over edge routers, pieces

- virtual gardens/data security: strongly regulated data. how does that get affected?
A: many different layers in this story
what happens if you remove object? what happens if disk fails?
guarantees are necessary for aws to get certifications themselves
for the rest, it’s something amz tries to help their customers w/
security an end to end issue, so encryption still a good solution
see customers split data/metadata to encrypt payload but not index/table metadata
- but then who manages the keys? amz does not do key management
some customers achieved HIPAA, FERPA
- whitepaper on website about this
but end-to-end issue
popular healthcare as a platform providers to work in this space
large parts work in the cloud, large parts stay in the walled garden

End of questions/talk.



4 Comments to “LISA09 Keynote: Power of Infrastructure as a Service – raw notes”

  1. Werner | November 4th, 2009 at 10:52 am

    Dave, the numbers for storage were 70% (not 17%) for key/value and 20% for single table.

  2. Dave | November 4th, 2009 at 10:56 am

    Thanks, Werner. I’ll edit that in. Thanks for the talk, and the followup.

  3. Andy | November 5th, 2009 at 1:14 am

    I always enjoy learning what other people think about Amazon Web Services and how they use them. Check out my very own tool CloudBerry Explorer that helps to
    manage S3 on Windows . It is a freeware. http://cloudberrylab.com/

  4. Learn | November 22nd, 2009 at 4:28 pm

    It looks like the cloud computing model(AWS) is working.
    Thanks for sharing the details.
