Many Cloud Providers:
1. AWS
a. EC2 : Elastic Compute Cloud
b. S3 : Simple Storage Service
c. EBS : Elastic Block Store, accessed by EC2 instances
2. Microsoft Azure
3. Google Compute Engine
4. RightScale, Salesforce, EMC, GigaSpaces, 10gen, DataStax, Oracle, VMware, Yahoo, Cloudera, etc.
Two Categories:
1. Private cloud : accessible only to company employees
2. Public cloud : provides service to any paying customer
Advantages:
* Cloud computing saves money and time when bringing up new compute and/or storage instances
- A new server can be up and running in 3 minutes, versus 7.5 weeks to deploy a server internally
- A 64-node Linux cluster can be online in 5 minutes, versus 3 months if deployed internally
- Reduction in IT operational costs by roughly 30%
- A private cloud of virtual servers inside its datacenter has saved Sybase $2 million annually, because the company can share computing power and storage resources across servers
- Startups can harness large computing resources without buying their own machines
What is a cloud?
Cloud = Lots of storage + compute cycles nearby
Compute is brought closer to data rather than data being moved closer to compute
1. A single-site cloud (aka "Datacenter") consists of
a. Compute nodes (grouped into racks)
b. Switches connecting the racks - many top-of-rack switches are connected to one core switch in a 2-level network topology
c. A network topology, e.g. hierarchical
d. Storage (backend) nodes connected to the network
e. Front-end for submitting jobs and receiving client requests
f. Software services
2. A geographically distributed cloud consists of multiple such sites, each perhaps with a different structure and services
History:
1. First data centers (1940-1960) - ENIAC, ILLIAC - they occupied entire halls
2. Time sharing companies and data processing industry - punch cards as input and output (Honeywell, IBM, Xerox)
3. Clusters/Grids (1980-2012) - Personal computers, Cray, Berkeley NOW Project, supercomputers, server farms (e.g. Oceano), BitTorrent, GriPhyN
4. Clouds and datacenters (2000 - present) - similar to the data processing era, but with different workloads
Technology Trends
1. Moore's law : CPU compute capacity doubles every 18 months; earlier this came from higher CPU frequency, now it comes from a larger number of cores
2. Storage doubles every 12 months
3. Bandwidth doubles every 9 months
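To get a feel for what these doubling periods imply, here is a minimal Python sketch that compounds each trend over a chosen number of months. The function name `growth_factor` and the dictionary `DOUBLING_PERIODS_MONTHS` are made up for illustration, and the doubling periods are simply the rough figures from the list above, not measured data.

```python
# Doubling periods (in months) taken from the notes above; rough rules of thumb.
DOUBLING_PERIODS_MONTHS = {
    "compute": 18,
    "storage": 12,
    "bandwidth": 9,
}


def growth_factor(months: float, doubling_period_months: float) -> float:
    """Return how many times a quantity multiplies after `months`,
    assuming it doubles once every `doubling_period_months`."""
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    horizon = 36  # three years, chosen only as an example
    for resource, period in DOUBLING_PERIODS_MONTHS.items():
        factor = growth_factor(horizon, period)
        print(f"{resource}: x{factor:g} after {horizon} months")
```

Under these assumptions, three years gives 4x compute, 8x storage and 16x bandwidth, so the three resources do not grow at the same rate.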
User Trends
Biologists are producing PB/year of data which needs to be stored and processed
Prophecies:
1. Computer facility operating like a utility (power or water company)
2. Plug your thin client into the computing utility and play your favorite Intensive Compute and Communicate Application
Unix is a precursor for this vision
What's new in today's clouds?
1. Massive scale : Large datacenters
2. On-demand access : Pay-as-you-go, no upfront commitment
3. Data-intensive nature : TBs, PBs, EBs - daily logs, forensics, web data, compressed data
4. New cloud programming paradigms : MapReduce/Hadoop, NoSQL/Cassandra/MongoDB, etc
I. MASSIVE SCALE
Power is either off-site (hydro-electric or coal) or on-site (solar panels)
WUE (Water Usage Effectiveness) = Annual Water Usage / IT Equipment Energy
PUE (Power Usage Effectiveness) = Total Facility Power / IT Equipment Power
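As a worked example of the two ratios above, the following Python sketch computes PUE and WUE for a hypothetical datacenter. The function names and all input numbers are invented for illustration. The units are also an assumption (kW for power, liters and kWh for WUE), since the notes do not specify them.

```python
def pue(total_facility_power_kw: float, it_equipment_power_kw: float) -> float:
    """PUE = Total Facility Power / IT Equipment Power (dimensionless)."""
    if it_equipment_power_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_power_kw / it_equipment_power_kw


def wue(annual_water_usage_liters: float, it_equipment_energy_kwh: float) -> float:
    """WUE = Annual Water Usage / IT Equipment Energy (liters per kWh here)."""
    if it_equipment_energy_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return annual_water_usage_liters / it_equipment_energy_kwh


if __name__ == "__main__":
    # Hypothetical facility: 1500 kW drawn in total, 1000 kW of it by IT equipment.
    print("PUE:", pue(1500.0, 1000.0))
    # Hypothetical year: 1.8 million liters of water, 1 million kWh of IT energy.
    print("WUE:", wue(1_800_000.0, 1_000_000.0), "L/kWh")
```

For both metrics lower is better: a PUE of 1.0 would mean every watt entering the facility reaches IT equipment, with nothing spent on cooling or other overhead.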
II. ON-DEMAND ACCESS : *AAS CLASSIFICATION
1. HaaS : Hardware as a Service - access to barebones hardware machines, but this is a security risk
2. IaaS : Infrastructure as a Service - avoids the security holes of HaaS. Flexible computing and storage infrastructure. Virtualization is one way of achieving this. Eg: AWS
3. PaaS : Platform as a Service - Flexible computing and storage infrastructure, coupled with a software platform (not in terms of VMs); easier but less flexible than IaaS. Eg: Google AppEngine/Compute Engine
4. SaaS : Software as a Service - access to software services when you need them. Eg: Google Docs, MS Office on demand
III. DATA-INTENSIVE COMPUTING
1. Computation-Intensive computing - MPI-based, high-performance computing, Grids; typically runs on supercomputers
2. Data-Intensive computing - store data at datacenters and use compute nodes nearby, since moving enormous amounts of data would unnecessarily consume a lot of bandwidth; compute nodes run computation services. CPU utilization is no longer the most important resource metric; I/O (disk and/or network) is.
IV. NEW CLOUD PROGRAMMING PARADIGMS
Easy to write and run highly parallel programs in new cloud programming paradigms
1. Google : MapReduce and Sawzall
2. Amazon : Elastic MapReduce service
3. Yahoo : Hadoop + Pig, WebMap
4. Facebook : Hadoop + Hive
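To illustrate why the MapReduce style makes parallel programs easy to write, here is a toy word count in plain Python. It is only a single-process sketch of the map, shuffle and reduce steps and does not use the Hadoop or Google APIs. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "the data the"]))
```

The programmer writes only the map and reduce functions. In a real framework, each map call can run on a different node next to its data and each key can be reduced independently, which is where the parallelism comes from.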
Economics of Clouds:
2 categories of clouds - public vs private clouds
Outsource or Own?
Do a cost analysis and determine the break-even point for the duration that the cloud/service will be operational
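A break-even analysis of this kind can be sketched in Python as follows. The cost model is a deliberate simplification and an assumption of this example: owning has a one-time upfront cost plus a fixed monthly operating cost, and outsourcing has only a fixed monthly cost. All function names and dollar amounts are hypothetical.

```python
from typing import Optional


def break_even_months(upfront_cost: float,
                      own_monthly_cost: float,
                      cloud_monthly_cost: float) -> Optional[float]:
    """Months after which owning becomes cheaper than outsourcing.

    Returns None when owning never catches up, i.e. when the cloud is
    no more expensive per month than running your own hardware.
    """
    monthly_saving = cloud_monthly_cost - own_monthly_cost
    if monthly_saving <= 0:
        return None
    return upfront_cost / monthly_saving


def cheaper_option(months: float,
                   upfront_cost: float,
                   own_monthly_cost: float,
                   cloud_monthly_cost: float) -> str:
    """Compare total cost of both options over the planned service lifetime."""
    own_total = upfront_cost + own_monthly_cost * months
    cloud_total = cloud_monthly_cost * months
    return "own" if own_total < cloud_total else "outsource"


if __name__ == "__main__":
    # Hypothetical numbers: $120,000 of hardware, $2,000/month to run it,
    # versus $7,000/month for an equivalent public cloud deployment.
    print("Break-even (months):", break_even_months(120_000, 2_000, 7_000))
    print("12-month service:", cheaper_option(12, 120_000, 2_000, 7_000))
    print("36-month service:", cheaper_option(36, 120_000, 2_000, 7_000))
```

With these made-up figures the break-even point is 24 months, so a service expected to run for one year favors outsourcing while one expected to run for three years favors owning.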