VMware – VMworld 2014 – Wrap-up

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream

Overall Impressions

Since a lot of my blog posts this week have made use of bullet points, why don’t I keep it up here?

  • Massive scale conference (compared to what I’ve been to in Australia)
  • Very helpful staff
  • So good to meet people I know from twitter in real life
  • Content at the presentations was invaluable
  • Solutions Exchange was great too (I’ll pay for all those t-shirts with a bit of extra e-mail though)
  • It’s a long way from Australia but worth the trip

Parties

VMunderground

Great event. I spent a lot of time meeting people I’d only ever communicated with on twitter. Here’s a picture I took from the outside area.

VMunderground

EMC Agents of Change

Held at the Minna Gallery, EMC put on a Bond-themed party with some nice food and drink. It was a little bit packed, but lots of fun. I also picked up some martini glasses and a shaker. The shaker didn’t really make it though.

glass

VMworld Party

The Black Keys played in Moscone North and I think they sounded great considering the pretty awful acoustics in the room. Here’s a picture.

BlackKeys

Swag

Too many t-shirts to list. And some nice glassware that won’t survive the baggage handlers I’m sure. But here’s a photo anyway.

swag

Thank You

Once again I’d like to thank Corey and Amanda from the VMware Community & Social Media team for the amazing opportunity to come to VMworld this year, and I’m looking forward to (hopefully) getting back next year. I’d also like to say thanks to Sean Thulin from EMC, and the other EMC Elect people I ran into, for making me feel welcome and part of the community. Finally, a big shoutout to the other bloggers and vExperts I met throughout the week – it’s great to meet people from such a diverse range of backgrounds (and geographical regions) who are driven by a common purpose.

Next_Year

 

VMware – VMworld 2014 – Thursday General Session

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream 

This is a summary of my notes from Thursday’s general session. Firstly, a photo from the blogger’s table.

VMW_Thurs_1

Robin Matlock was up first. She offered the following statistics on the week:

  • 9300 HOLs, 87000+ VMs this week (that number will go up as the HOLs don’t close until 3pm)
  • Destination giveback raised $248,460

She then introduced Jane McGonigal (@avantgame).
She shared research on how games change the way we do things. Check out her book “Reality is Broken”. Here are a few stats:

  • 1 billion people spend an hour a day playing games on some kind of connected device (phones, consoles, tablets, etc)
  • 300 million minutes a day playing Angry Birds
  • Call of Duty – 170 hours per year on average
  • Gallup – 81% of people are not engaged with their jobs.
  • 92% of 2-year-old children in the US play games

“It’s inevitable, soon we’ll all be gamers”

Use our skills as gamers to do good in the world. She has been studying this for 13 years and has a PhD in how games can change our lives. Positive emotions can be had from gaming.

VMW_Thurs_2

She then got the crowd to engage in massively-multiplayer thumb wrestling. The first to pin someone’s thumb wins. Positive emotions are the neurological foundation of resilience – “the opposite of play isn’t work – it’s depression”. Self-suppression (escaping feelings) vs self-expansion (you want something from the game – to challenge yourself).

 

Robin then introduces the next speaker, James Patten.

James does a lot with how we interact with computers. He’s very interested in building new interfaces that take advantage of human abilities, rather than us having to do all of the work to get things out of machines. Check out some of the projects on his site, they’re pretty cool. Maybe the coffee table is the computer of the future? Turn the surface into an interactive space where the objects come alive to help you solve a complex problem.

 

The final speaker is Sean Gourley.

He covers a number of topics, including the concept of Man vs the Machine and an example using Kasparov’s Freestyle Chess. It’s not about artificial intelligence, it’s about augmented intelligence (humans and machines teaming up to solve problems). Go check out Quid – it’s a really awesome looking tool (based on the short demo I saw).

And that’s a wrap. 4 stars.

 

VMware – VMworld 2014 – STO1424 – Massively Scaling Virtual SAN Implementations

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream

STO1424 – Massively Scaling Virtual SAN Implementations

STO1424

This session was presented by

Adobe Marketing Cloud Background

  • Massively scaled SaaS-based infrastructure
  • Globally distributed
  • 10s of thousands of servers
  • World class operations – Techops
  • Supports operations for multiple product teams that form the Adobe Marketing Cloud

How do they do massive scale?

Frans covered some pretty simple stuff here, but it’s worth listing as a reminder.

  • Standardisation – build on standard commodity hardware
  • Configuration management – CMDB to track and manage device services, roles, etc
  • Automation – Cobbler and Puppet are used to deploy a large number of machines in a short period of time
  • Self service – provision resources based on workload and product requirements

They don’t want to build “snowflakes”.
VSAN is just another tool in their toolbox. VSAN isn’t going to replace their current storage, it’s complementary. It’s not going to solve every problem you have, so you need to know your workload.

First Use Case: Core

  • A simple solution to provide core services in every DC – DNS, mail, monitoring, authentication, kickstart, etc
  • Beach Head – DC standup tool.
  • Highly available
  • Not dependent on SAN
  • Standard hardware
  • Took a 1RU configuration, added memory, NICs and reconfigured disk setup to produce “Core” platform.
  • Becomes the building block used to build and manage other services from (Cloud, vCache)

Cache to vCache (It was a Journey)

  • Cache: a server role in a digital marketing group with a large server footprint (approx 8000)
  • Processes billions of hits a day
  • Very sensitive to MTTR
  • Hardware only, mostly blades
  • Actual servers small footprint – 16GB RAM, 146GB HDD, Low CPU usage
  • CentOS – custom monitoring and mgmt tools
  • Widely distributed
  • Current hardware was up for refresh
  • Software wasn’t able to take advantage of the hardware

Requirements: Enter vCache

  • Keep the hardware close to the original platform
  • Do not change server configs
  • Better MTTR
  • NO SAN
  • 4:1 consolidation ratio, starting with 3:1
  • Solution for in-depth monitoring and anomaly detection
  • Automate deployment
  • Deploy 3500 hosts and 14000 VMs globally

vCache version 0.1 (PoC)
Step 1

  • Needed to see if Cache could even run as a VM
  • Used William Lam’s post on virtuallyghetto for SSD emulation on existing hardware
  • Kickstarted a lot of hosts (7) at once – not happy, as there wasn’t enough IO to do it. One at a time was OK.
  • Did ok with 10 million hits per day – but had problems with vMotion and HA.
  • Result: sort of worked, but you really need SSDs to do it properly.

vCache Version 0.5
Step 2 – Meeting the requirements

  • Blade chassis is NOT the best platform for VSAN deployment. For them it works because they had low disk requirements and a 4:1 consolidation ratio
  • Selected MLC SSD – this came down to cost for them.
  • Setup a VSAN cluster chassis (16 nodes)
  • vCenter resides on Core
  • HA enabled and DRS fully automated

Lessons learned from 0.1 – 0.5

  • Use real world traffic to understand the load capability
  • Use VSAN Observer
  • Test as many scenarios as possible – Chaos Monkey
  • With no memory reservation, they filled disks quicker than expected
  • Stick to the HCL or lose data
  • There’s a KB on enabling VSAN Observer

The Final design

  • Management cluster – Core runs the vCenter appliance
  • Multiple vCenters for segmented impact when failure occurs
  • Setup auto deploy
  • Build host profiles
  • Establish a standard server naming strategy
  • 6 clusters per vCenter, 16 hosts per cluster, 4 VMs per host
  • VSAN spans a chassis but no more (they don’t always have 10Gbps in their DCs)
  • VMs: 16GB, 146GB, 8vCPU and Memory reservation set to 16GB
  • Blade: 96GB, …
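As a quick sanity check (my own arithmetic, not from the session), the per-vCenter building block above lines up neatly with the earlier requirement to deploy 3,500 hosts and 14,000 VMs:

# Back-of-the-envelope check of the vCache design figures quoted above.
vms_per_host = 4
hosts_per_cluster = 16
clusters_per_vcenter = 6

hosts_per_vcenter = hosts_per_cluster * clusters_per_vcenter   # 96 hosts per vCenter
vms_per_vcenter = hosts_per_vcenter * vms_per_host             # 384 VMs per vCenter

print(14000 / vms_per_host)        # 3500.0 hosts needed - matches the stated goal
print(3500 / hosts_per_vcenter)    # ~36.5, so roughly 37 vCenters globally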

Use Adobe SKMS / CMDB as the automation platform

  • SKMS – a portal for device management
  • CMDB – configuration management database
  • Custom build that has tools for deployment (virtual / physical)
  • Tracks device states
  • Contains device information
  • Provides API access to other services to consume
  • Some services including: Cobbler, Puppet, DNS, self service portal
  • Used a lot of concepts from Project Zombie

Automation of vCache

  • Deploy vCenter appliance via Puppet
  • https://forge.puppetlabs.com/vmware/vcsa

Auto deploy

  • Does a lot of the work
  • It has shortcomings – can only deploy to one vCenter in a DC
  • Alan Renouf has a workaround

Chassis Setup

  • DC receives, racks and cables, sets up the management IP, sets to “Racked pending deployment”
  • Chassis configuration script goes out
  • Blades boot via iPXE chaining, checks if it’s configured, runs a firmware update if required and vCache disk configuration script then chains to Auto Deploy.
  • Configured blades boot via Auto Deploy to vCenter for the configured subnet

Blade Setup

  • Cluster gets created in vCenter via a script.

VM Setup

  • Creates an empty vCache template
  • Clone 48 VMs via template
  • MAC addresses, devices names, etc get added to CMDB
  • Set to “Pending Config”
  • Cobbler set to “Ready to PXE”
  • VMs power on at this point
  • VMs kickstart and the Puppet manifest is applied
  • Machines marked as “Image complete”
  • They are then added to monitoring and moved to the cache spare pool ready for use
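For what it’s worth, here’s a minimal pyVmomi sketch of the “clone 48 VMs via template” step listed above. This is my own illustration rather than Adobe’s actual tooling, and the vCenter address, credentials and object names are all assumptions.

# Hedged pyVmomi sketch of the clone step above (illustrative only).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='secret', sslContext=ssl._create_unverified_context())  # lab only
content = si.RetrieveContent()

def find(vimtype, name):
    # Simplistic inventory lookup, fine for a sketch.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

template = find(vim.VirtualMachine, 'vcache-template')
cluster = find(vim.ClusterComputeResource, 'vcache-cluster-01')
folder = find(vim.Folder, 'vcache-vms')

spec = vim.vm.CloneSpec(
    location=vim.vm.RelocateSpec(pool=cluster.resourcePool),
    powerOn=False, template=False)

tasks = [template.CloneVM_Task(folder=folder, name='vcache-{0:02d}'.format(i), spec=spec)
         for i in range(1, 49)]   # 48 clones, as per the notes above
# Real tooling would wait on the tasks, push MAC addresses and device names into
# the CMDB, and set the devices to "Pending Config" before Cobbler takes over.
Disconnect(si)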

Final steps

  • Standard Operating Procedure (SOP) Design
  • Additional Testing – finding the upper limits of what they can do with this design
  • Incident simulations
  • Alert Tooling – keeping an eye on trends in the environment

What’s Next?

  • Move away from big blade chassis to something smaller
  • Look at Puppet Razor as a deployment option
  • Testing Mixed workloads on VSAN
  • All Flash VSAN
  • Using OpenStack as the CML
  • Looking at Python for provisioning

Andrew then came on and spoke about getting into the Experiment – Prototype – Scale Cycle as a way to get what you need done.

VSAN Automation Building Blocks

VSAN Observer with vCOps

VSAN and OpenStack

Workloads on VSAN

  • Design policy per workload type
  • IO dependent? CPU? or RAM?
  • Core management services: vCenter, Log Insight, vCenter Operations Manager
  • Scale-out services: Storm, Cassandra, Zookeeper cluster
  • What would you like to run? Anything you can virtualise.

And that’s it. Very useful session. I always enjoy customer presentations that don’t just cover the marketing side of things but actually talk about real-world use cases. 4.5 stars.

VMware – VMworld 2014 – BCO2701 – vSphere HA Best Practices and FT Tech Preview

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream

BCO2701 – vSphere HA Best Practices and FT Tech Preview

BCO2701

If you’ve read my other posts from this week it’s probably pretty clear that I should give up the charade of not providing a transcript of the session. So let’s just call them notes and not talk about it any more, yeah?

This session was presented by:

  • GS Khalsa – VMware Senior Technical Marketing Manager
  • Manoj Krishnan – VMware Engineer – vSphere HA

Agenda:

  • What’s New
  • Failure Events
  • Best Practices (Network, HA and VSAN, Host Isolation Response and Admission Control)
  • Tech Previews

What’s New
In vSphere 5.5

  • Protection for VSAN VMs (major)
  • AppHA Integration
  • VM-VM Anti-affinity rule

What is HA?

  • Minimises downtime
  • Auto VM recovery in minutes
  • Protects against 3 types of failures:

– Infrastructure – host, vm
– Connectivity – host network isolation, datastore incurs PDL
– App – GuestOS hangs / crashes, app hangs / crashes

  • Cluster with a master and multiple slaves
  • Heartbeats via network and storage
  • The HA network can be the management network or the VSAN network (if using VSAN)

Failure Events
Once the HA agents are up and running, they go into election mode.

What if a slave dies? The master declares the host dead and restarts the affected VMs on another host. If the master fails, the slaves go into election mode and restart VMs. Note that vCenter only talks to one master, so if you have a network partition, this can impact where VMs will start up in a failure event.

Host Isolation
The host declares itself isolated if it can’t talk to the master via the network, and lets the master know via the datastore heartbeat.

Best Practices

Network

  • Redundant HA network
  • Fewest hops possible
  • Consistent portgroup names, network labels
  • Route based on originating port ID
  • Failback policy = no
  • Enable portfast, Edge, etc
  • Same MTU size
  • Disable host monitoring during network maintenance
  • vmknics on separate subnets

Storage

  • heartbeats – all hosts should see the same datastores
  • choose a heartbeat datastore that is fault isolated from ha network and resilient to failures
  • override auto-selected options as required

HA and VSAN

  • Heartbeat datastore is not necessary for VSAN cluster
  • Add non-VSAN datastore to cluster hosts if VM MAC address collisions on the VM network are a significant concern
  • Choose a datastore that is fault isolated from the VSAN network
  • Set the isolation address with das.isolationAddressX
  • Configure HA to not use the default management network gateway
  • If VSAN is non-routable, use a pingable isolation address somewhere else in the network
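As an illustration of the isolation address settings above, here’s a hedged pyVmomi sketch (mine, not from the session) that sets an explicit isolation address and tells HA not to use the default gateway for isolation checks, via the cluster’s HA advanced options. The cluster name, address and connection details are assumptions.

# Hedged pyVmomi sketch: HA advanced options for a VSAN cluster (illustrative only).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='secret', sslContext=ssl._create_unverified_context())  # lab only
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'vsan-cluster-01')  # assumed name

das_options = [
    vim.option.OptionValue(key='das.isolationaddress0', value='192.168.10.1'),
    vim.option.OptionValue(key='das.usedefaultisolationaddress', value='false'),
]
spec = vim.cluster.ConfigSpecEx(dasConfig=vim.cluster.DasConfigInfo(option=das_options))
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
Disconnect(si)

The same dasConfig option list is also where a per-cluster setting like das.config.fdm.isolationPolicyDelaySec (mentioned in the next section) would go.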

Host Isolation Response

To delay response in 5.1+, use das.config.fdm.isolationPolicyDelaySec

There are a few options:

  • Leave Powered On (default with 5.1+)
  • Shutdown (default with 4.x)
  • Power Off

Which to use? It depends.

  • What type of VM?
  • Will host likely retain access to storage?
  • Will the VM likely retain access to the network?

Admission Control
If you turn admission control off, you’re limiting what HA can do for you.

You can reserve resources in case of host failures

  • Ensures resources are available
  • No guarantee VMs will be happy after a failure though
  • Working on closing this gap

How to do it?

  • Select the appropriate policy
  • Enable DRS
  • Simulate failures to test by using maintenance mode and the Impact Assessment fling

Make adjustments if:

  • VMs are not restarted
  • Desired performance is not realised

Reducing Impediments

  • Maximise utility by enabling DRS in automatic mode
  • Maximise hosts a given VM can run on
  • Ensure sufficient resources
  • Ensure critical VMs get what they need (HA restart priority – this doesn’t guarantee VM restart order)

There are 3 Admission Control Policies
1. Percentage of Cluster Resources

  • Often best choice
  • Maximises cluster resource use
  • Use when reservation varies considerably

2. Number of hosts policy (slot algorithm)

  • Maximises chances of restarting VMs with reservations
  • Avoids fragmentation
  • Often conservative – if this is a concern, use dedicated failover host policy.

3. Dedicated Failover Host(s)

  • Best if VMs have large resource reservations
  • Impact to VMs on other hosts minimised
  • Restart time can be longer
  • Failover hosts are idle prior to failure
  • If you use this, select largest hosts to be failover hosts.

Tech Previews – FT and HA
FT for zero-RPO, zero-RTO (The version in the beta)

  • 64GB RAM per protected VM
  • 4 vCPUs
  • Uses individual, separate storage (no longer shared) for Primary and Secondary VM
  • Complete “re-imagining” of how FT functioned.
  • Lockstep was hard to scale, now uses “incredibly fast check-pointing”.

HA – VM Component Protection

  • Problem: Host has storage connectivity issue – APD or PDL
  • Difficult to manage VMs
  • Approach: Move VMs to a healthier host
  • You can choose how you want to protect and under what circumstances (sounds a bit like Australia’s border protection policy – sigh).

Admission Control Fling – vRAS

  • Coming in the next month or two
  • vSphere Resource and Availability Service
  • Assess impact of host failures and VM migration on resources using DRS dump files
  • Provides sample what-if scenarios

And that’s it. Top session, very informative. 4 stars.

VMware – VMworld 2014 – BCO2629 – Site Recovery Manager and vSphere Replication: What’s New Technical Deep Dive

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream

BCO2629 – Site Recovery Manager and vSphere Replication: What’s New Technical Deep Dive

BCO2629

Jeff Hunter – VMware, Senior Technical Marketing Architect (@jhuntervmware)

Ken Werneburg – VMware, Senior Technical Marketing Architect (@vmKen)

Intro from Jeff
These guys do the Uptime category on the VMware blog.

Now to Ken
Talks about the transition to software-defined storage and availability.

SRM 5.8

Abbreviations

  • SRM – vCenter Site Recovery Manager
  • ABR – array-based replication
  • VR – vSphere Replication
  • VRMS – vSphere Replication Management Server
  • VRS – vSphere Replication Server
  • SVR – Standalone vSphere Replication (No SRM involved)

What does SRM do?
Key features:

  • Centralised recovery
  • Non-disruptive testing
  • Automated DR
  • Integrated with VMware product stack

Used for Disaster Recovery, Disaster Avoidance and Planned Migration (Unexpected, Expected and Planned)

Recovery Workflows

  • Failover Automation
  • Non-disruptive failover testing
  • Planned Migration
  • Failback Automation

What’s new?

  • DR for the SDDC
  • Enhanced scalability
  • Simplified Ops

DR for the SDDC
Architecture

  • SRM using ABR
  • vCAC management across 2 sites
  • Integration via vCO plugin for SRM
  • New APIs for PowerCLI integration

Capabilities

  • Self-service DR provisioning using vCAC blueprints
  • Automated protection mapping according to pre-defined tiers

Benefits

  • DR control as a service to application tenants
  • Quicker time to market for apps
  • Reduced complexity for infrastructure admins

SRM protection is exposed through the vCAC portal, then runs a standard vCO workflow after provisioning.

You can also

  • Create protection groups and add VMs
  • Find protection groups by datastore
  • Add protection to unprotected VMs in the replication datastore
  • And almost everything available via the SRM API can be used by vCO

Scalability in SRM 5.8

  • 5000 protected VMs per individual SRM (with ABR only, still 500 for VR)
  • 2000 concurrent VM recoveries

Performance Improvements
There are a whole bunch of tweaks that have been done (up to 75% faster RTO, particularly with ABR).

This was tested using

  • 250 protection groups
  • 2000 VMs with IP customisation on

With the old method it took 29h (storage time 17h15); with the new method, 13h (storage time 4h13). Total time drops dramatically when you leave out IP customisation.
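A quick bit of arithmetic (mine, not from the slides) suggests where the “up to 75% faster” figure comes from – it lines up with the drop in the storage phase rather than the total:

# Rough percentage improvements from the timings quoted above (Python 3).
old_total, new_total = 29.0, 13.0            # hours, end to end
old_storage = 17 + 15 / 60                   # 17h15 storage time
new_storage = 4 + 13 / 60                    # 4h13 storage time

print(round(100 * (old_total - new_total) / old_total))        # ~55% faster overall
print(round(100 * (old_storage - new_storage) / old_storage))  # ~76% faster storage phase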

Converged UI with plugin for vSphere Web Client (finally)
Rule-based management at subnet level
dr-IP-customizer was a bit of hard work. You can now setup IP customisation rules via the web client on a per subnet basis.

VSAN + VR and SRM
VSAN compatible with:

  • VR
  • SPBM configured as part of replication
  • SRM
  • SRM configuration based on VR

VR and SRM

  • Supports asynchronous replication – 15min RPO
  • VM-centric based protection
  • Automated DR op and orchestration
  • Automated failover – user defined plans

SRM can use BOTH ABR and VR. SRM will see existing standalone replication protected VMs. SRM can also install VR from scratch if required.

What’s new in VR 5.8? (Jeff)
What is VR?
Per-VM host-based replication integrated with vSphere. Note that this is included with Essentials Plus and higher.

Features:

  • Easy virtual appliance deployment
  • Integration with web client
  • Protect any VM
  • Flexible RPO
  • Quick recovery for VMs
  • Replication engine for SRM
  • Compatible with SAN, NAS, local and VSAN storage
  • Replicate workloads to vCenter Server and vCloud Air (New feature)

Use cases:

  • Data protection and DR
  • DC migration
  • Replication engine for SRM
  • Standalone Replication
  • Within the same site
  • Across Sites with vCenter Server and vCloud Air (“Replicate to a cloud provider”)

Components:

  • vCenter Server, Web Client
  • VR Agent (VRA) built into vSphere
  • VRMS (virtual appliance)
  • VRS (virtual appliance)

Consistency
Disk consistency within a VM: Yes, across VMs? No.

Application Consistency? Supports VSS quiescing on Windows. Use this only if you need to.

VR Reporting

  • Replicated VMs (by VC and by Host)
  • Transferred bytes, RPO violations

VR Multiple point in time recovery (MPIT)

  • Up to 24 recovery points (also supported in 5.5)
  • Recovered as VM with snapshots – recovers latest replica, uses snapshot manager to recover

Best Practices

“Just because you can doesn’t mean you should”.

  • Set RPO only to what’s needed
  • Set MPIT to only what’s needed
  • Use VSS only if needed

VR Resources

All in all a very informative and useful session. 4.5 stars.

VMware – VMworld 2014 – STO3161- What Can Virtual Volumes Do For You?

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream 

STO3161 – What can virtual volumes do for you?

VMW_Tues_STO3161

STO3161 was presented by:

  • Matt Cowger, (@mcowger), EMC
  • Suzy Visvanathan, VMware – Product Manager VVOLs

There were two different tracks that they wanted to cover

  1. How will this be of benefit from a business perspective?
  2. What’s going on at the 201 technical level?

Suzy starts with the SDDC overview, with the goal of VVOLs being to transform storage by aligning it with application demands.

STO3161_1

Today

  • Create fixed-size, uniform LUNs
  • Lack of granular control
  • Complex provisioning cycles
  • LUN-centric storage configurations

Today’s problems

  • Extensive manual bookkeeping to match VMs to LUNs
  • LUN-granularity hinders per-VM SLAs
  • Overprovisioning
  • Wasted resources, time, high costs
  • Frequent data migrations

It’s not about VSAN or VVOLs, it’s about how to make the external array more feature-rich, more in control. Regardless of the storage you use, they want VMware to be the platform. Here’s a picture.

STO3161_2

Suzy finishes by saying they’ve virtualised storage, but it’s not “slick” yet.

Now Matt explains the concept of Virtual Volumes.
“How many of you think LUNs suck? Only half? Are the rest of you using NFS?”

STO3161_3

At a high-level:

  • There’s no filesystem
  • Managed through VASA APIs
  • Arrays are partitioned into containers (storage containers)
  • VM disks, called virtual volumes, stored natively on the storage containers
  • IO from ESX to array through an access point called a Protocol Endpoint (PE) – this is like a Gatekeeper in VMAX, it just processes commands. There’s one PE configured per array
  • Data services are offloaded to the array
  • All managed through storage policy-based management framework

VNXe 3200 is the first place you’re going to see this.

STO3161_4

“NFS vs FC vs iSCSI is a minor implementation detail now”

Storage pools host VVOL containers. You can look at capability profiles for various containers

Because the array is completely aware of the VM, you can do cool stuff, like offloading snapshots. Array managed cloning – better than VAAI XCOPY.
What we really want to do is manage applications by service level objectives via policy-based automation. This is what VMAX3 is all about.

So where does ViPR fit? Isn’t that what you just showed us?
There are array-specific details (i.e. Gold on VMAX vs Gold on VNX). These can be different on each array. That’s not ideal though. ViPR provides a single point for the storage to talk to, a single point for vSphere / VASA to talk to, a consistent view.

VNXe 3200 Virtual Volumes Beta starts in Q4 2014 (e-mail [email protected] for more information).

Note that there’s no support for SRM in the first version. They are working on per-VM replication. The arrays can replicate VVOLs though. RecoverPoint support for per-VM is coming too.

By making VVOLs work the same across all of the protocols, you get to be interested in what the storage arrays can do, not the protocols.

Hope that helps some. Matt and Suzy did a great job presenting. 4.5 stars.

 

VMware – VMworld 2014 – Tuesday General Session

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream 

Firstly, a blurry photo from the blogger’s table. You can make out Chris Wahl and Mike Laverick.

VMW_Tuesday_Keynote

Ben Fathi – CTO – kicks off by talking about the VMware Foundation and Destination Giveback. You can find out more here.

A theme from yesterday that is repeated in today’s keynote is that of conflict:

  • traditional applications vs cloud-native applications
  • IT vs developers
  • on-premises vs off-premises
  • safe and secure vs instant, elastic

According to VMware, it’s not a choice you have to make. VMware believes in the power of “&”. The SDDC architecture gives you the power to do this.

Sanjay Poonen – Executive VP and GM of EUC then takes the stage.
We live in a heterogeneous world. VMware wants to cover all the platforms and help you work at the speed of life. There are three major areas covered here:

  • Desktop;
  • mobile;
  • content.

VMware want to bring the “all about the end-user” folks together with the “all about the infrastructure” people.
Desktop – Unified VDI and App publishing, Desktop-as-a-Service, real-time app delivery, rich user experience (there was a partnership announcement with NVIDIA, Google and VMware – 3D on NVIDIA-powered Chromebooks).

Mobility – Device management, application management, content management, email management.
Security, multi-tenant, scale, privacy, access-control, self-service.

A partnership with SAP and VMW is announced – the key benefits being better integration / lower TCO, and faster time to installation. I’ve just never thought of SAP as a leader in mobility.

Content-collaboration
Anywhere, anytime access, hybrid deployment, enterprise-grade security, unified access to all content.

And this is all integrated. Apparently.

VMware Workspace Suite – Horizon desktop, AirWatch mobile and Content Locker, with a Workspace portal

Kit Colbert, CTO of EUC then takes the stage.
Define centrally, implement locally – this is what the mobile cloud architecture enables.

The recorded demo is focussed on healthcare with some cool scenarios. They also demoed CloudVolumes with workspace integration.

Project Fargo is also discussed.

With their EUC strategy, VMware is going for

  • A unified experience, on any device, anywhere
  • customers driving industry change
  • optimised for the SDDC

Raghu Raghuram – EVP of SDDC then takes the stage.
“You are Team SDDC”
vSphere 6 Beta (over 10000 downloads)
Virtual SAN
NSX is now GA (150 customers)
vRealize

VMware is delivering the power of AND – this is something I forgot to talk about when I summarised yesterday’s keynote.

The SDDC is 1 destination with 3 choices of how to get there – BYO, Converged, Hyper-converged.
Broad topics are:

  • hardware choice;
  • open cloud infrastructure;
  • all applications; and
  • management policy.

With these SDDC components, you can have the right (standard) building block. -> For VMware this is EVO.
Ben Fathi comes back on stage.
15 minutes – that’s how long it takes to get EVO:RAIL (Virtual infrastructure) up and running VMs.

EVO Supports:

  • 100 Server VMs, 250 desktop VMs;
  • deploys in 15 minutes,
  • Supports non-disruptive upgrades

It is comprised of 4 identical but independent nodes with all the storage, compute and networking included.
You can scale out – 4 RAIL units can be connected together for a 16 node cluster.
The simple, Web-based UI can be used to create VMs that are small / medium / large. You can also leverage the standard management suite if you want to. This is GA Q3 2014.

EVO RACK (Cloud Infrastructure) – vCloud Suite, VSAN, NSX.
Data Centre Scale – go from Zero to Application in less than 2 hrs.

VMware Integrated OpenStack is now in beta. Apparently “the best way to run OpenStack is on VMware”.
vSphere, NSX, VSAN + vRealize (operations, visibility, cost management)

vSphere 6 Beta features
SMP FT for scale-up apps (4 vCPUs) [applause]
Application mobility – cross-vCenter vMotion, long-distance vMotion (NSX enables this). [more applause]

Cloud-native Applications (Ben)
Containers have been around for a while (10 – 15 yrs). Then Docker came along. VMware believes in containers without compromise – persistence, network and security, resource management. They tell us they’re “working to make containers a first-class citizen in the SDDC”. Working with Pivotal, Google and Docker to make sure this happens.

 

vRealize (Product name fail of the year)

Now looking at on-premises – cloud automation – cloud operations – cloud business – as-a-service
Management by Policy (this is where the crowd start to lose interest – there’s something about management stuff that just isn’t sexy to the great unwashed).

Raghu says “The future is here – it is just not evenly distributed”. At least I think that’s what he said. I’m not sure what that means though.

Finally, Simone Brunozzi – VP and Chief Technologist – Hybrid Cloud takes the stage. @simon shows us footage of an alert in vCloud Air via Google Glass. And some other vRealize stuff. And our attention spans have once again failed us.
“Hybridity”

All in all, slightly more meat than yesterday, which was useful. 4 stars.

 

VMware – VMworld 2014 – Monday General Session

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream

Sorry this one’s a bit later than the others, but there’s a bit to get through. I don’t want this to be a transcript of the keynote because, frankly, that wouldn’t be quite as interesting as being there. And it would make for a pretty long read. But, there was some neat dancing, loud music, and interesting graphics on the screens to start.

 

The keynote starts with Robin Matlock, CMO of VMware, with a brief mention of the quake in Napa. She introduces the topic of change by way of introducing the diversity of the audience – the only constant in our business is change. Believe in what you’re doing.

 

And then it’s Pat Gelsinger’s (VMware CEO) turn to take the stage. The overall theme is bravery and informed risk taking. The main announcements from Pat are covered below.

Then Project Marvin was unveiled – check the links below. Interestingly, the 6 initial hardware partners are Dell, EMC, Fujitsu, Inspur, NetOne and SuperMicro. Duncan has a pretty excellent summary here.

Pat also discussed vRealize and the VMware / Openstack integration, Simon from El Reg has a good write-up here.

 

Bill Fathers also presented on cloud and where that’s going for VMware. Excellent sense of humour.

 

To wrap-up, Carl Eschenbach – President and COO, took the stage for some customer testimonials and general good feelings.

All in all, nothing earth shattering here, although I think EVO is going to be a pretty cool play. 3.5 stars.

 

 

VMware – VMworld 2014 – VAPP2457 – Hitting It Out Of The Park: vSphere and MLB Network

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream 

VAPP2457 – Hitting It Out Of The Park: vSphere and MLB Network

This session has been one of my highlights so far, simply because I’m fascinated by sports broadcasting and the infrastructure that goes into media production at a broadcast level in general. Again, I’ll try not to post a transcript, but just some of the bits I found interesting. Which means it will be a transcript, because this stuff rocks in my opinion. But firstly, the proof of life photo.

VMW_Monday_MLB

 

This session was presented by Tab Butler – MLB Network Director – Media Management & Post Production. Tab knows his stuff, and I found this to be a particularly entertaining session for that reason alone. And these guys are everywhere, at least in America.

Tab starts out with a bit of background of the Network, and it sounds pretty intense.

The big challenge was working out how to bring up a back office infrastructure within a few weeks. vSphere was the key to the success here. To give you an idea of the timeframe they were working to, they gave their hardware vendors a PO on the Friday, and were hopeful that they could deliver on the Saturday. I simply can’t imagine that happening in AU.

For production they used Grass Valley Aurora on standard (Dell 9G) servers – about 400 of them. They have 80 recording channels bringing content in, and the data is stored on 2 SANs. Tab load balances across the two by having an American League SAN and a National League SAN. They also used Apple Final Cut Pro 7 (still a staple for performance editing). This is then fed into a file-based service called the Omneon system (I swear there’s more acquisitions going on in this industry than in the storage industry) sitting on more (13) file-based SANs. There wasn’t a media asset management system agile enough to meet their requirements so they built their own – the Diamond System. This is a big ole system for logging metadata to a database. Seems simple enough, but it still took a while to put it together. Their in-house development continues today.

Funnily enough, the SANs fill up quickly, and content needs to be moved within 8 hours to the archive, based on LTO-4 (StorageTek / Sun / Oracle SL8500).

MLB Productions (the keeper of the archives) also leverage this system. This is where the footage you see in movies and documentaries comes from. There’s an on-site and off-site copy of the library.

Today, they record 7 hours minimum of baseball for every hour played:

  • A clean feed (no graphics) from the home team and from the away team
  • With graphics from the home team and from the away team
  • Dugout isolation camera
  • Backup records, and other iso records that people want as well.

In 2009 / 10 it was all getting too much. So how about running virtual in the production space? They asked Grass Valley what they did to scale at trade shows, and the answer came back: VMware. With the help of an initially reluctant Grass Valley team, they put it on a pair of IBM servers running 8Gbps FC to NetApp E-Series and it’s still running today.

The Diamond System has grown, and now runs on Simplivity Omnicube CN-3000s, while Omnicube CN-2000s run on the back-end for development.  Incidentally, if you’re at VMworld in SF this year and want to know more about what the Omnicube can do, go and talk to David at the Simplivity stand. He’s a good guy and knows his stuff.

There’s over 500,000 hours of content in the archive now. To put it in perspective, as of 2008 they had around 150,000 hours of content, and they’ve tripled that in the last six years. It’s not just MLB Productions or MLB Network using it, it’s also used for umpire review, and by the teams as a critical source of information and analysis.

Five years in though, they were running into issues with the post-production infrastructure, because a bunch of stuff was either approaching EOSL or well and truly there.

So they went for:

  • Open architecture (vSphere, NetApp E-Series)
  • Scalability and performance (UCS)
  • Convergence (Simplivity)

All with a CIO-mandated 7-year ROI. Eeep! And issues with power, cooling and space. And 25% infrastructure growth.

They also introduced optical routers (and a lot of glass). And seriously responsive KVMs for editors (all on one fibre). And there’s 192 cores of fibre between sites as well. This has really changed the game for their editors in terms of efficiency. Between the two sites they have 320Gbps as a starting point for bandwidth, with a lot of room to scale up if required. 10Gbps to every server in the racks. Cisco 5596s delivering 8Gbps FC to servers as required. One other cool thing they have is monitors on top of the extra-high racks to keep an eye on what’s happening on the servers. They now have 136 recording channels (increased from 80 previously). They’re using an SDCAM HD50 codec – 50Mbps (16 audio tracks). The SANs currently hold 4500 hours of content, but Tab expects this to scale 10x in the next 7 years. Everything sits on a StorNext 5.1 file system.
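For a sense of scale, some rough arithmetic of my own (not from the session) on what those recording figures translate to on disk, ignoring the 16 audio tracks and container overhead:

# 4500 hours of content at a ~50Mbps video codec, in round numbers.
hours = 4500
mbps = 50
megabytes = hours * 3600 * (mbps / 8.0)   # seconds x MB per second
print(round(megabytes / 1e6, 1), "TB")    # ~101.2 TB of video essence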

Matching the end-user experience (read expectations) and the technology available today has been a key design goal of the build. Sometimes they need bare-metal, but UCS works fine for that too. They don’t have a lot of people looking after this either, so they need to use everything VMware offers to keep it running smoothly. There’s also some AWS in the mix.

The challenge in metadata management is sticking to a taxonomy and not changing it (they haven’t changed it since 2011). The tighter they are with the metadata, the easier it is to find what they need.

And that’s about it. Sorry I got a bit carried away, but I found this session fascinating. 5 stars.

VMware – VMworld 2014 – STO2197 – Storage DRS: Deep Dive and Best Practices

Disclaimer: I recently attended VMworld 2014 – SF.  My flights and accommodation were paid for by myself, however VMware provided me with a free pass to the conference and various bits of swag. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

banner-hero-mon-stream

STO2197 – Storage DRS: Deep Dive and Best Practices

I’ll try not to make this a transcript, but rather will cover the highlights of the session. My apologies if the notes are a little rough. But firstly, a picture.

VMW_Monday_SDRS

 

Storage DRS: Deep Dive and Best Practices
Luis Useche, VMware
Mustafa Uysal, VMware

This was covered with VMware’s standard disclaimer (i.e. that some of the features covered were not yet released, etc.). This is obviously worth noting when you want to try this at home. Which leads me to the next point – get yourself on to the vSphere 6 beta.

Luis starts by presenting the problem: you have multiple workloads on shared storage, then backup starts and takes all the IO and kills your production IO. Ideally, you want it evenly distributed, with your backup using just enough IO to finish on time.

This can be achieved with Storage Performance Controls:
Shares: relative importance of VMs – IOPS allocated in proportion.
In this scenario the backup can still take over, so you want to lower its priority. But you might still be dedicating too much to your backup when you really want your apps to have maximum access to IO.

Limit: Maximum IOPS allowed per VM

Reservations: Minimum IOPS per VM
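To make those three controls concrete, here’s a hedged pyVmomi sketch (my own, not from the session) that sets custom shares, a limit and a reservation on a single virtual disk – the per-disk storage IO allocation that the scheduler and SIOC act on. The VM name and connection details are assumptions.

# Hedged pyVmomi sketch: shares / limit / reservation on a virtual disk (illustrative only).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='secret', sslContext=ssl._create_unverified_context())  # lab only
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == 'backup-proxy-01')   # assumed name

disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))
disk.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(
    shares=vim.SharesInfo(level=vim.SharesInfo.Level.custom, shares=500),  # de-prioritise the backup
    limit=1000,                                                            # cap at 1000 IOPS
    reservation=100)                                                       # guarantee 100 IOPS (5.5+)

spec = vim.vm.ConfigSpec(deviceChange=[
    vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
        device=disk)])
vm.ReconfigVM_Task(spec=spec)
Disconnect(si)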

 

With ESX 5.5 a new IO Scheduler (mClock) was introduced.

In this example, the allocation of A and B will be in 1000:3000 ratio subject to reservation and limit constraints. [imagine this is a pretty table]
IOPS (Capacity)    VM A      VM B
1000               250       750
500                200       300
200                133.33    66.66

Compared to the old scheduler, it supports reservations, shares and limits. Interestingly, it also breaks large IOs into 32KB chunks for accounting purposes (they aren’t really split). Run the following command via esxcli to check out your settings.

esxcli system settings advanced list -o /Disk/SchedulerWithReservation
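To make the table above a little more concrete, here’s a small Python sketch of the allocation behaviour as I read it – purely illustrative, not VMware’s scheduler code. The 1000:3000 shares come from the example; the reservations (200 for VM A, 100 for VM B) are my assumption, chosen because they reproduce the numbers in the table.

# Illustrative proportional-share allocation with reservation/limit clamping.
def allocate(capacity, vms):
    total_res = sum(v['reservation'] for v in vms)
    if capacity < total_res:
        # Not enough IOPS to cover reservations: scale them down proportionally.
        return {v['name']: capacity * v['reservation'] / total_res for v in vms}
    alloc = {v['name']: 0.0 for v in vms}
    active = list(vms)        # VMs still being allocated purely by shares
    remaining = capacity
    while active:
        total_shares = sum(v['shares'] for v in active)
        clamped = []
        for v in active:
            fair = remaining * v['shares'] / total_shares
            lo, hi = v['reservation'], v['limit']
            if fair < lo or fair > hi:
                alloc[v['name']] = min(max(fair, lo), hi)
                clamped.append(v)
        if not clamped:
            for v in active:
                alloc[v['name']] = remaining * v['shares'] / total_shares
            break
        remaining -= sum(alloc[v['name']] for v in clamped)
        active = [v for v in active if v not in clamped]
    return alloc

vms = [
    {'name': 'A', 'shares': 1000, 'reservation': 200, 'limit': float('inf')},  # assumed reservation
    {'name': 'B', 'shares': 3000, 'reservation': 100, 'limit': float('inf')},  # assumed reservation
]
for cap in (1000, 500, 200):
    print(cap, allocate(cap, vms))
# 1000 -> A 250,    B 750
#  500 -> A 200,    B 300
#  200 -> A 133.33, B 66.66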

This all works great when the datastore is used by one host. When you have multiple hosts, you need to combine the local IO scheduler with SIOC. This lets you:

  • control congestion on datastores
  • detect congestion based on average IO latency
  • once congestion is detected, throttle IOs
  • The VMs shares, reservations and limits on each host will determine priority

The default threshold is 90% of peak IOPS capacity. You can change this to another percentage or to an absolute (ms) value. As for what to set the congestion threshold to – a high value is good for overall throughput, while a low ms value is better for latency. If reservations are not satisfied, SIOC will notify SDRS for further action.
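If you want to drive the congestion threshold programmatically rather than through the UI, something along these lines should work – a hedged pyVmomi sketch based on my reading of the StorageResourceManager API, with the datastore name and connection details assumed:

# Hedged pyVmomi sketch: enable SIOC on a datastore with an automatic threshold.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='secret', sslContext=ssl._create_unverified_context())  # lab only
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.Datastore], True)
ds = next(d for d in view.view if d.name == 'shared-ds-01')   # assumed name

spec = vim.StorageResourceManager.IORMConfigSpec(
    enabled=True,
    congestionThresholdMode='automatic',   # percentage-of-peak mode
    percentOfPeakThroughput=90)            # the 90% default mentioned above
content.storageResourceManager.ConfigureDatastoreIORM_Task(datastore=ds, spec=spec)
Disconnect(si)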

  • IO reservations are at a cluster level
  • monitors SIOC reservation enforcement
  • balances reserved IOPS in cluster
  • reservations as VM placement constraints (hard or soft)
  • per-datastore reservable IOPS (can be manually overridden)

Mustafa goes on to cover a number of the features of SDRS.
Uses a Datastore Cluster to achieve:

  • Ease of storage management
  • Initial placement of VMs
  • Out of space avoidance
  • IO Load Balancing
  • Virtual disk affinity (anti-affinity)
  • Datastore maintenance mode
  • Add datastores

So how does SDRS work with thin provisioning, dedupe and auto-tiering?
Mustafa uses a capacity management example. There is a great diagram that breaks down used vs provisioned vs allocated (actual) capacity. You can use PercentIdleMBInSpaceDemand, default – 25%. You can make this high or low (overcommit goes high).
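My reading of how this knob behaves (an assumption worth validating, not something spelled out in the session) is that SDRS counts used space plus the configured percentage of the idle, provisioned-but-unused space when working out space demand. A quick illustration with made-up numbers:

# Hypothetical space demand calculation for a thin-provisioned datastore.
def space_demand(used_gb, provisioned_gb, percent_idle=25):
    idle = provisioned_gb - used_gb
    return used_gb + idle * percent_idle / 100.0

print(space_demand(400, 1000))         # default 25%  -> 550 GB counted
print(space_demand(400, 1000, 100))    # conservative -> 1000 GB, no overcommit
print(space_demand(400, 1000, 0))      # aggressive   -> 400 GB, maximum overcommit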

 

Or you can use thin provisioning. You can thin provision at either level (hypervisor, or datastore via the array), but don’t use both.

With arrays – SDRS manages logical space usage.

  • VASA v1 integration handles space outage signal from the backing pool.
  • VASA v2 SDRS controls space usage in backing pools.

You can also overcommit with dedupe. SDRS supports this using actual capacity for placements. With VASA v2, SDRS will manage logical space while keeping virtual disks in the same dedupe pool.

What about Auto-tiering? Should you use SDRS with this?
The whole reason behind auto-tiering is to try and push latency down even as IO load increases. SIOC sets a threshold, as does SDRS. Auto-tiering isn’t magic though, there will come a time where you have to move workload off the datastore. SDRS automation controls are being introduced in the vSphere 6 beta.

How about SDRS integration with array-based replication?
The problem is SDRS didn’t understand different replication policies. Now, SDRS recommendations are in sync with replication policies. There’s also accounting of replication overhead due to Storage vMotion activity.

And vSphere Replication (VR)? Is that covered?
SDRS discovers VR-replicas in the datastore.
SDRS understands the space usage of replica disks.
SDRS coordinates moves with VR.

SIOC Best Practices
Avoid mixing vSphere LUNs and non-vSphere LUNs on the same physical storage. If you do, SIOC will detect and raise an alarm.

Configure the host IO queue size with the highest allowed value. This provides maximum flexibility for SIOC throttling.

Keep the congestion threshold conservatively high. This will improve overall utilisation. Set it lower if latency is more important than throughput.

Datastore clusters should have:

  • similar performance (may not be identical)
  • similar capabilities

– data management
– backup

Connectivity
Provide the maximum possible host and datastore connectivity
More datastores in cluster = better space and IO balance
A larger datastore size can lead to better SDRS performance

Policy-based Management
You used to have to have datastores with identical storage profiles, now you can have a datastore with any profile.

 

And that’s all I have. Great session, very informative, 4 stars.