Random Short Take #13

Here are a few links to some random news items and other content that I found interesting. You might find them interesting too. Let’s dive in to lucky number 13.

Pure Storage Goes All In On Hybrid … Cloud

I recently had the opportunity to hear from Chadd Kenney about Pure Storage’s Cloud Data Services announcement and thought it worthwhile covering here. But before I get into that, Pure have done a little re-branding recently. You’ll now hear them referring to Cloud Data Infrastructure (their on-premises instances of FlashArray, FlashBlade, FlashStack) and Cloud Data Management (being their Pure1 instances).


The Announcement

So what is “Cloud Data Services”? It comprises:

  • Cloud Block Store;
  • CloudSnap; and
  • StorReduce for AWS.

According to Kenney, “[t]he right strategy is and not or, but the enterprise is not very cloudy, and the cloud is not very enterprise-y”. If you’ve spent time in any IT organisation, you’ll see that there is, indeed, a “Cloud divide” in play. What we’ve seen in the last 5 – 10 years is a marked difference in application architectures, consumption and management, and even storage offerings.

[image courtesy of Pure Storage]


Cloud Block Store

The first part of the puzzle is probably the most interesting for those of us struggling to move traditional application stacks to a public cloud solution.

[image courtesy of Pure Storage]

According to Pure, Cloud Block Store offers:

  • High reliability, efficiency, and performance;
  • Hybrid mobility and protection; and
  • Seamless APIs on-premises and cloud.

Kenney likens building a Purity solution on AWS to the approach Pure took in the early days of their existence, when they took off-the-shelf components and used optimised software to make them enterprise-ready. Now they’re doing the same thing with AWS, and addressing a number of the shortcomings of the underlying infrastructure through the application of the Purity architecture.


So why would you want to run virtual Pure controllers on AWS? The idea is that Cloud Block Store:

  • Aggregates performance and reliability across many cloud stores;
  • Can be deployed HA across two availability zones (using ActiveCluster);
  • Is always thin, deduplicated, and compressed;
  • Delivers instant space-saving snapshots; and
  • Is always encrypted.

Management and Orchestration

If you have previous experience with Purity, you’ll appreciate the management and orchestration experience remains the same.

  • Same management, with Pure1 managing on-premises instances and instances in the cloud
  • Consistent APIs on-premises and in cloud
  • Plugins to AWS and VMware automation
  • Open, full-stack orchestration

Use Cases

Pure say that you can use this kind of solution in a number of different scenarios, including DR, backup, and migration in and between clouds. If you want to use ActiveCluster between AWS regions, you might have some trouble with latency, but in those cases other replication options are available.

[image courtesy of Pure Storage]

Note that Cloud Block Store is available in a few different deployment configurations:

  • Test/Dev – using a single controller instance (EBS can’t be attached to more than one EC2 instance)
  • Production – ActiveCluster (2 controllers, either within or across availability zones)



CloudSnap

Pure tell us that we’ve moved away from “disk to disk to tape” as a data protection philosophy and we now should be looking at “Flash to Flash to Cloud”. CloudSnap allows FlashArray snapshots to be easily sent to Amazon S3. Note that you don’t necessarily need FlashBlade in your environment to make this work.

[image courtesy of Pure Storage]

For the moment, this is only certified on AWS.


StorReduce for AWS

Pure acquired StorReduce a few months ago and now they’re doing something with it. If you’re not familiar with them, “StorReduce is an object storage deduplication engine, designed to enable simple backup, rapid recovery, cost-effective retention, and powerful data re-use in the Amazon cloud”. You can leverage any array, or existing backup software – it doesn’t need to be a Pure FlashArray.


According to Pure, you get a lot of benefits with StorReduce, including:

  • Object fabric – secure, enterprise ready, highly durable cloud object storage;
  • Efficient – Reduces storage and bandwidth costs by up to 97%, enabling cloud storage to cost-effectively replace disk & tape;
  • Fast – Fastest Deduplication engine on the market. 10s of GiB/s or more sustained 24/7;
  • Cloud Native – Native S3 interface enabling openness, integration, and data portability. All Data & Metadata stored in object store;
  • Single namespace – Stores in a single data hub across your data centre to enable fast local performance and global data protection; and
  • Scalability – Software nodes scale linearly to deliver 100s of PBs and 10s of GBs bandwidth.


Thoughts and Further Reading

The title of this post was a little misleading, as Pure have been doing various cloud things for some time. But sometimes I give in to my baser instincts and like to try and be creative. It’s fine. In my mind the Cloud Block Store for AWS piece of the Cloud Data Services announcement is possibly the most interesting one. It seems like a lot of companies are announcing these kinds of virtualised versions of their hardware-based appliances that can run on public cloud infrastructure. Some of them are just encapsulated instances of the original code, modified to deal with a VM-like environment, whilst others take better advantage of the public cloud architecture.

So why are so many of the “traditional” vendors producing these kinds of solutions? Well, the folks at AWS are pretty smart, but it’s a generally well understood fact that the enterprise moves at enterprise pace. To that end, they may not be terribly well positioned to spend a lot of time and effort to refactor their applications to a more cloud-friendly architecture. But that doesn’t mean that the CxOs haven’t already been convinced that they don’t need their own infrastructure anymore. So the operations folks are being pushed to migrate out of their DCs and into public cloud provider infrastructure. The problem is that, if you’ve spent a few minutes looking at what the likes of AWS and GCP offer, you’ll see that they’re not really doing things in the same way that their on-premises comrades are. AWS expects you to replicate your data at an application level, for example, because those EC2 instances will sometimes just up and disappear.

So how do you get around the problem of forcing workloads into public cloud without a lot of the safeguards associated with on-premises deployments? You leverage something like Pure’s Cloud Block Store. It overcomes a lot of the issues associated with just running EC2 on EBS, and has the additional benefit of giving your operations folks a consistent management and orchestration experience. Additionally, you can still do things like run ActiveCluster between and within Availability Zones, so your mission critical internal kitchen roster application can stay up and running when an EC2 instance goes bye bye. You’ll pay a bit less or more than you would with normal EBS, but you’ll get some other features too.

I’ve argued before that if enterprises are really serious about getting into public cloud, they should be looking to work towards refactoring their applications. But I also understand that the reality of enterprise application development means that this type of approach is not always possible. After all, enterprises are (generally) in the business of making money. If you come to them and can’t show exactly how they’ll save money by moving to public cloud (and let’s face it, it’s not always an easy argument), then you’ll find it even harder to convince them to undertake significant software engineering efforts simply because the public cloud folks like to do things a certain way. I’m rambling a bit, but my point is that these types of solutions solve a problem that we all wish didn’t exist, but does.

Justin did a great write-up here that I recommend reading. Note that both Cloud Block Store and StorReduce are in Beta with planned general availability in 2019.

Getting Started With The Pure Storage CLI

I used to write a lot about how to manage CLARiiON and VNX storage environments with EMC’s naviseccli tool. I’ve been doing some stuff with Pure Storage FlashArrays in our lab and thought it might be worth covering off some of the basics of their CLI. This will obviously be no replacement for the official administration guide, but I thought it might come in useful as a starting point.



Unlike EMC’s CLI, there’s no executable to install – it’s all on the controllers. If you’re using Windows, PuTTY is still a good choice as an ssh client. Otherwise the macOS ssh client does a reasonable job too. When you first set up your FlashArray, a virtual IP (VIP) was configured. It’s easiest to connect to the VIP, and Purity then directs your session to whichever controller is the current primary controller. Note that you can also connect via the physical IP address if that’s how you want to do things.

The first step is to log in to the array as pureuser, with the password that you’ve definitely changed from the default one.

login as: pureuser
pureuser@10.xxx.xxx.30's password:
Last login: Fri Aug 10 09:36:05 2018 from 10.xxx.xxx.xxx

Mon Aug 13 10:01:52 2018
Welcome pureuser. This is Purity Version 4.10.4 on FlashArray purearray

“purehelp” is the command to run to list available commands.

pureuser@purearray> purehelp
Available commands:

If you want to get some additional help with a command, you can run “command -h” (or --help).

pureuser@purearray> purevol -h
usage: purevol [-h]

positional arguments:
    add                 add volumes to protection groups
    connect             connect one or more volumes to a host
    copy                copy a volume or snapshot to one or more volumes
    create              create one or more volumes
    destroy             destroy one or more volumes or snapshots
    disconnect          disconnect one or more volumes from a host
    eradicate           eradicate one or more volumes or snapshots
    list                display information about volumes or snapshots
    listobj             list objects associated with one or more volumes
    monitor             display I/O performance information
    recover             recover one or more destroyed volumes or snapshots
    remove              remove volumes from protection groups
    rename              rename a volume or snapshot
    setattr             set volume attributes (increase size)
    snap                take snapshots of one or more volumes
    truncate            truncate one or more volumes (reduce size)

optional arguments:
  -h, --help            show this help message and exit

There’s also a facility to access the man page for commands. Just run “pureman command” to access it.

Want to see how much capacity there is on the array? Run “purearray list --space”.

pureuser@purearray> purearray list --space
Name       Capacity  Parity  Thin Provisioning  Data Reduction  Total Reduction  Volumes  Snapshots  Shared Space  System  Total
purearray  12.45T    100%    86%                2.4 to 1        17.3 to 1        350.66M  3.42G      3.01T         0.00    3.01T
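
If you want to keep an eye on capacity from a script, this output is easy enough to pick apart. Here’s a minimal Python sketch, assuming the columns are separated by two or more spaces (as they are in the sample above) and that the size suffixes are binary multiples. The function names are mine rather than anything that ships with Purity, and the sample text is just the output pasted from the session above.

```python
import re

# Sample output pasted from the session above (not fetched from a live array).
SAMPLE = """\
Name       Capacity  Parity  Thin Provisioning  Data Reduction  Total Reduction  Volumes  Snapshots  Shared Space  System  Total
purearray  12.45T    100%    86%                2.4 to 1        17.3 to 1        350.66M  3.42G      3.01T         0.00    3.01T
"""

# Binary multipliers: an assumption about how the array reports sizes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_table(text):
    """Split the header line and each data row on runs of two or more spaces."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0])
    return [dict(zip(header, re.split(r"\s{2,}", line))) for line in lines[1:]]


def to_bytes(size):
    """Convert a string such as '12.45T' or '0.00' to a number of bytes."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    return float(size)


def percent_used(row):
    """Total space consumed as a percentage of usable capacity."""
    return 100 * to_bytes(row["Total"]) / to_bytes(row["Capacity"])


if __name__ == "__main__":
    array = parse_table(SAMPLE)[0]
    print(f"{array['Name']}: {percent_used(array):.1f}% used")
```

Splitting on two or more spaces matters here because values like “2.4 to 1” contain single spaces of their own.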

Need to check the software version or general availability of the controllers? Run “purearray list --controller”.

pureuser@purearray> purearray list --controller
Name  Mode       Model   Version  Status
CT0   secondary  FA-450  4.10.4   ready
CT1   primary    FA-450  4.10.4   ready
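
The same output lends itself to a quick sanity check. The sketch below is Python, with function names of my own invention. It assumes that “ready” is the healthy status (it’s the only value shown above, so other status strings are unknown to me) and that you’d expect exactly one primary controller running the same Purity version as its partner.

```python
# Sample output pasted from the session above (not fetched from a live array).
SAMPLE = """\
Name  Mode       Model   Version  Status
CT0   secondary  FA-450  4.10.4   ready
CT1   primary    FA-450  4.10.4   ready
"""


def parse_controllers(text):
    """Return one dict per controller; no value in this table contains a space."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def check_controllers(controllers):
    """Return a list of problems; an empty list means nothing looks wrong."""
    problems = []
    for ctrl in controllers:
        # Assumption: anything other than "ready" deserves a closer look.
        if ctrl["Status"] != "ready":
            problems.append(f"{ctrl['Name']} is {ctrl['Status']}")
    versions = {ctrl["Version"] for ctrl in controllers}
    if len(versions) > 1:
        problems.append(f"mismatched versions: {sorted(versions)}")
    if sum(ctrl["Mode"] == "primary" for ctrl in controllers) != 1:
        problems.append("expected exactly one primary controller")
    return problems


if __name__ == "__main__":
    print(check_controllers(parse_controllers(SAMPLE)) or "controllers look fine")
```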


Connecting A Host

To connect a host to an array (assuming you’ve already zoned it to the array), you’d use the following commands.

purehost create hostname
purehost create --wwnlist WWNs hostname
purehost list
purevol connect --host [host] [volume]
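
If you’re onboarding more than a couple of hosts, it can be worth generating these commands rather than typing them. This is a rough Python sketch that checks each WWN looks like a 64-bit Fibre Channel name (16 hex digits) before building the command strings. The helper names and the example host, WWN and volume are made up, and I’m assuming that `--wwnlist` takes a comma-separated list in colon-separated form, so check that against the administration guide before relying on it.

```python
import re


def normalise_wwn(wwn):
    """Return the WWN as upper-case, colon-separated byte pairs."""
    cleaned = wwn.replace(":", "").upper()
    if not re.fullmatch(r"[0-9A-F]{16}", cleaned):
        raise ValueError(f"not a valid WWN: {wwn!r}")
    return ":".join(cleaned[i:i + 2] for i in range(0, 16, 2))


def host_commands(hostname, wwns, volume):
    """Build the command sequence for one host and one volume."""
    # Assumption: multiple WWNs are passed as a comma-separated list.
    wwn_list = ",".join(normalise_wwn(w) for w in wwns)
    return [
        f"purehost create --wwnlist {wwn_list} {hostname}",
        "purehost list",
        f"purevol connect --host {hostname} {volume}",
    ]


if __name__ == "__main__":
    # Hypothetical host, WWN and volume names, purely for illustration.
    for command in host_commands("esx01", ["10000000c9123456"], "vol01"):
        print(command)
```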


Host Groups

You might need to create a Host Group if you’re running ESXi and want to have multiple hosts accessing the same volumes. Here’re the commands you’ll need. Firstly, create the Host Group.

purehgroup create [hostgroup]

Add the hosts to the Host Group (these hosts should already exist on the array).

purehgroup setattr --hostlist host1,host2,host3 [hostgroup]

You can then assign volumes to the Host Group.

purehgroup connect --vol [volume] [hostgroup]
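
The three Host Group steps can be strung together for a whole ESXi cluster. Here’s a small Python sketch that just produces the command strings in order. The function name and the example cluster, host and volume names are hypothetical, and nothing here talks to an array.

```python
def hostgroup_commands(hostgroup, hosts, volumes):
    """Return the commands to create a Host Group, add hosts and connect volumes."""
    if not hosts:
        raise ValueError("a Host Group needs at least one host")
    commands = [
        f"purehgroup create {hostgroup}",
        # The hosts must already exist on the array.
        f"purehgroup setattr --hostlist {','.join(hosts)} {hostgroup}",
    ]
    # One connect command per volume, mirroring the single-volume form above.
    commands += [f"purehgroup connect --vol {vol} {hostgroup}" for vol in volumes]
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for command in hostgroup_commands("esx-cluster01", ["esx01", "esx02"], ["ds01", "ds02"]):
        print(command)
```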


Other Volume Operations

Some other neat (and sometimes destructive) things you can do with volumes are listed below.

To resize a volume, use the following commands.

purevol setattr --size 500G [volume]
purevol truncate --size 20G [volume]

Note that when you truncate a volume, a snapshot is kept for 24 hours so that you can roll back if required. This is good if you’ve shrunk a volume to be smaller than the data on it and have consequently munted the filesystem.
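
Because growing and shrinking use different subcommands (setattr to increase, truncate to reduce, as per the purevol help output), it’s easy to reach for the wrong one. A minimal Python sketch of that decision is below. The function name is mine, sizes are whole gigabytes with the G suffix for simplicity, and the volume name in the example is made up.

```python
def resize_command(volume, current_gb, new_gb):
    """Pick the purevol subcommand for a resize, or None if nothing changes."""
    if new_gb <= 0:
        raise ValueError("new size must be positive")
    if new_gb == current_gb:
        return None
    if new_gb > current_gb:
        # Growing a volume is non-destructive.
        return f"purevol setattr --size {new_gb}G {volume}"
    # Shrinking can cut into data on the volume, hence the separate subcommand.
    return f"purevol truncate --size {new_gb}G {volume}"


if __name__ == "__main__":
    # Hypothetical volume name, for illustration only.
    print(resize_command("vol01", 100, 500))
    print(resize_command("vol01", 100, 20))
```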

When you destroy a volume it immediately becomes unavailable to hosts, but remains on the array for 24 hours. Note that you’ll need to disconnect the volume from any hosts connected to it first.

purevol disconnect [volume] --host [hostname]
purevol destroy [volume]

If you’re running short of capacity, or are just curious about when a deleted volume will disappear, use the following command.

purevol list --pending

If you need the capacity back immediately, the deleted volume can be eradicated with the following command.

purevol eradicate [volume]
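
If you’re trying to work out whether it’s worth eradicating a volume by hand or just waiting, the arithmetic is simple. This Python sketch assumes the 24 hour window mentioned above and that you know roughly when the volume was destroyed. The function names and the timestamps in the example are made up.

```python
from datetime import datetime, timedelta

# The window a destroyed volume stays on the array, per the text above.
PENDING_WINDOW = timedelta(hours=24)


def eradication_time(destroyed_at):
    """When a destroyed volume would disappear if left alone."""
    return destroyed_at + PENDING_WINDOW


def time_remaining(destroyed_at, now):
    """How long is left to recover the volume (never negative)."""
    return max(eradication_time(destroyed_at) - now, timedelta(0))


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    destroyed = datetime(2018, 8, 13, 10, 0)
    print(eradication_time(destroyed))
    print(time_remaining(destroyed, datetime(2018, 8, 13, 22, 0)))
```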


Further Reading

The Pure CLI is obviously not a new thing, and plenty of bright folks have already done a few articles about how you can use it as part of a provisioning workflow. This one from Chadd Kenney is a little old now but still demonstrates how you can bring it all together to do something pretty useful. You can obviously extend that to do some pretty interesting stuff, and there’s solid parity between the GUI and CLI in the Purity environment.

It seems like a small thing, but the fact that there’s no need to install an executable is a big thing in my book. Array vendors (and infrastructure vendors in general) insisting on installing some shell extension or command environment is a pain in the arse, and should be seen as an act of hostility akin to requiring Java to complete simple administration tasks. The sooner we get everyone working with either HTML5 or simple ssh access the better. In any case, I hope this was a useful introduction to the Purity CLI. Check out the Administration Guide for more information.

Pure//Accelerate 2018 – Wrap-up and Link-o-rama

Disclaimer: I recently attended Pure//Accelerate 2018.  My flights, accommodation and conference pass were paid for by Pure Storage via the Analysts and Influencers program. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here’s a quick post with links to the other posts I did covering Pure//Accelerate 2018, as well as links to other articles related to the event that I found interesting.


Gestalt IT Articles

I wrote a series of articles about Pure Storage for Gestalt IT.

Pure Storage – You’ve Come A Long Way

//X Gon Give it to Ya

Green is the New Black

The Case for Data Protection with FlashBlade



Here’re the posts I did during the show. These were mainly from the analyst sessions I attended.

Pure//Accelerate 2018 – Wednesday General Session – Rough Notes

Pure//Accelerate 2018 – Thursday General Session – Rough Notes

Pure//Accelerate 2018 – Wednesday – Chat With Charlie Giancarlo

Pure//Accelerate 2018 – (Fairly) Full Disclosure


Pure Storage Press Releases

Here are some of the press releases from Pure Storage covering the major product announcements and news.

The Future of Infrastructure Design: Data-Centric Architecture

Introducing the New FlashArray//X: Shared Accelerated Storage for Every Workload

Pure Storage Announces AIRI™ Mini: Complete, AI-Ready Infrastructure for Everyone

Pure Storage Delivers Pure Evergreen Storage Service (ES2) Along with Major Upgrade to Evergreen Program

Pure Storage Launches New Partner Program


Pure Storage Blog Posts

A New Era Of Storage With NVMe & NVMe-oF

New FlashArray//X Family: Shared Accelerated Storage For Every Workload

Building A Data-Centric Architecture To Power Digital Business

Pure’s Evergreen Delivers Right-Sized Storage, Again And Again And Again

Pure1 Expands AI Capabilities And Adds Full Stack Analytics



I had a busy but enjoyable week. I would have liked to get to more of the technical sessions, but being given access to some of the top executives and engineering talent in the company via the Analyst and Influencer Experience was invaluable. Thanks again to Pure Storage (particularly Armi Banaria and Terri McClure) for having me along to the show.

Pure//Accelerate 2018 – (Fairly) Full Disclosure

Disclaimer: I recently attended Pure//Accelerate 2018.  My flights, accommodation and conference pass were paid for by Pure Storage via the Analysts and Influencers program. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my notes on gifts, etc, that I received as a conference attendee at Pure//Accelerate 2018. This is by no stretch an interesting post from a technical perspective, but it’s a way for me to track and publicly disclose what I get and how it looks when I write about various things. I’m going to do this in chronological order, as that was the easiest way for me to take notes during the week. While everyone’s situation is different, I took 5 days of unpaid leave to attend this conference.



My wife dropped me at the BNE domestic airport and I had some ham and cheese and a few coffees in the Qantas Club. I flew Qantas economy class to SFO via SYD. The flights were paid for by Pure Storage. Plane food was consumed on the flight. It was a generally good experience, and I got myself caught up with Season 3 of Mr. Robot. Pure paid for a car to pick me up at the airport. My driver was the new head coach of the San Francisco City Cats ABA team, so we talked basketball most of the trip. I stayed at a friend’s place until late Monday and then checked in to the Marriott Marquis in downtown San Francisco. The hotel costs were also covered by Pure.



When I picked up my conference badge I was given a Pure Storage and Rubrik co-branded backpack. On Tuesday afternoon we kicked off the Analyst and Influencer Experience with a welcome reception at the California Academy of Sciences. I helped myself to a Calicraft Coast Kolsch and 4 Aliciella Bitters. I also availed myself of the charcuterie selection, cheese balls and some fried shrimp. The most enjoyable part of these events is catching up with good folks I haven’t seen in a while, like Vaughn and Craig.

As we left we were each given a shot glass from the Academy of Sciences that was shaped like a small beaker. Pure also had a small box of Sweet 55 chocolate delivered to our hotel rooms. That’s some seriously good stuff. Sorry it didn’t make it home kids.

After the reception I went to dinner with Alastair Cooke, Chris Evans and Matt Leib at M.Y. China in downtown SF. I had the sweet and sour pork and rice and 2 Tsingtao beers. The food was okay. We split the bill 4 ways.



We were shuttled to the event venue early in the morning. I had a sausage and egg breakfast biscuit, fruit and coffee in the Analysts and Influencers area for breakfast. I need to remind myself that “biscuits” in their American form are just not really my thing. We were all given an Ember temperature control ceramic mug. I also grabbed 2 Pure-flavoured notepads and pens and a Pure Code t-shirt. Lunch in the A&I room consisted of chicken roulade, salmon, bread roll, pasta and Perrier sparkling spring water. I also grabbed a coffee in between sessions.

Christopher went down to the Solutions Expo and came back with a Quantum sticker (I am protecting data from the dark side) and Veeam 1800mAh keychain USB charger for me. I also grabbed some stickers from Justin Warren and some coffee during another break. No matter how hard I tried I couldn’t trick myself into believing the coffee was good.

There was an A&I function at International Smoke and I helped myself to cheese, charcuterie, shrimp cocktail, ribs, various other finger foods and 3 gin and tonics. I then skipped the conference entertainment (The Goo Goo Dolls) to go with Stephen Foskett and see Terra Lightfoot and The Posies play at The Independent. The car to and from the venue and the tickets were very kindly covered by Stephen. I had two 805 beers while I was there. It was a great gig. 5 stars.



For breakfast I had fruit, a chocolate croissant and some coffee. Scott Lowe kindly gave me a printed copy of ActualTech’s latest Gorilla Guide to Converged Infrastructure. I also did a whip around the Solutions Expo and grabbed:

  • A Commvault glasses cleaner;
  • 2 plastic Zerto water bottles;
  • A pair of Rubrik socks;
  • A Cisco smart wallet and pen;
  • Veeam webcam cover, retractable charging cable and $5 Starbucks card; and
  • A Catalogic pen.

Lunch was boxed. I had the Carne Asada, consisting of Mexican style rice, flat iron steak, black beans, avocado, crispy tortilla and cilantro. We were all given 1GB USB drives with copies of the presentations from the A&I Experience on them as well. That was the end of the conference.

I had dinner at ThirstyBear Brewing Co with Alastair, Matt Leib and Justin. I had the Thirstyburger, consisting of Richards Ranch grass-fed beef, mahón cheese, chorizo-andalouse sauce, arugula, housemade pickles, panorama bun, and hand-cut fried kennebec patatas. This was washed down with two glasses of The Admiral’s Blend.



As we didn’t fly out until Friday evening, Alastair and I spent some time visiting the Museum of Modern Art. vBrownBag covered my entry to the museum, and the Magritte exhibition was terrific. We then lunched in Chinatown at a place (Maggie’s Cafe) that reminded me a lot of the Chinese places in Brisbane. Before I went to the airport I had a few beers in the hotel bar. These were kindly paid for by Justin Warren. On Friday evening Pure paid for a car to take Justin and me to SFO for our flight back to Australia. Justin gets extra thanks for having me as his plus one in the fancier lounges that I normally don’t have access to.

Big thanks to Pure Storage for having me over for the week, and big thanks to everyone who spent time with me at the event (and after hours) – it’s a big part of why I keep coming back to these types of events.

Pure//Accelerate 2018 – Wednesday – Chat With Charlie Giancarlo

Disclaimer: I recently attended Pure//Accelerate 2018.  My flights, accommodation and conference pass were paid for by Pure Storage via the Analysts and Influencers program. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my notes from the “Chat with Charlie Giancarlo” session for Analysts and Influencers at Pure//Accelerate.


Chat with Charlie Giancarlo

You’ve said culture is important. How do you maintain it? “It’s the difference between hiring missionaries and hiring mercenaries”. People are on a mission, out there to prove something, drive the company forward. There’s not an exact, formulaic way of doing it. Hire people you have experience with in the industry. Pure does have a good interview process. It tends to bring out different sides of the interviewee at the same time. We use objective tests for engineering people. Check on the cultural backgrounds of sales talent.

Are there any acquisitions on the horizon? Gaps you want to fill? We have an acquisition strategy. We’ve decided where we’re going, identified the gaps, looked at buy versus build versus partner. There’s a lot of research to do around strengths and weaknesses, fit, culture. There are different types of companies in the world. Get rich quick, play in your own sandbox, people who are on a mission. We have gaps in our product lines. We could be more cloud, more hyper-converged. FlashBlade is still not a 3.0 product.

Other companies are under pressure to be more software or cloud. Given your hardware background, how’s that for you? Our original product was software on commodity hardware. All except one SDS vendor sells hardware. At the end of the day, selling pure software that goes on any box is insanely hard to achieve. Majority of SDS still sell on hardware – one throat to choke. Some customers, at scale, might be able to do this with us. Why build our own hardware? 4RU and 1PB versus 1.5 racks. We understood the limitations of commodity hardware. We’re not wedded to the hardware – we’re wedded to providing more value-add to our customers.

Has anyone taken you up on the offer? Some are enquiring.

One of your benefits has been focus, one thing to sell. You just mentioned your competitors don’t have that. Now you’re looking at other stuff? We’re making data easier and more economic to consume. Making the entire stack easier to consume. When I say more HCI, what do I mean? Box with compute, storage and network and you can double it, etc. Another way to look at HCI is a single pane of glass for orchestration, automated discovery, ease of use issue. Customers want us to extend beyond storage.

Single throat to choke and HCI. You provide the total stack, or become an OEM. I have no intention of selling compute. It’s controlled by the semi-conductor company or the OS company.

How about becoming an OEM provider? If they were willing, I’d be all ears. But it’s rare. Dell, Cisco, they’re not OEM companies. Margin models are tough with this.

Developing international sales? Our #2 goal is to scale internationally. Our goals align the company around a few things. It’s not just more sales people. It’s the feature set (eg ActiveCluster). ActiveCluster is more valuable in Europe than anywhere else. US – size is difficult (distance). In Europe they have a lot of 100km apart DCs. Developing support capability. Scaling marketing, legal, finance. It’s a goal for the whole company.

The last company to get to $1B was NetApp. What chance does Pure have to make it to $5B? Well, I hope we do. It’s my job. Storage? That’s a terrible business! Friends in different companies have a lot of different opinions about it. Pure could be the Arista of storage? The people who are engaged in storage don’t believe in storage anymore. They’re not investing in the business. It’s a contrarian model. Compete in business, not just tech. We’re investing 20% in R&D. You need to invest a certain amount in a product line. They have a lot of product lines. We could be bought – we’re a public company. But Dell won’t buy. HPE have bought Nimble. Hitachi don’t really buy people. Who does that leave? I think we have a chance of staying independent.

You could buy somebody. I believe we have a very good sales force. There are a lot of ways to build an acquisition strategy. We have a good sales force.

You’re a public company. You haven’t been doing well. What if Mr Elliott comes into your company? (Activist investor). Generally they like companies with lots of cash. Or companies spending too much on R&D without getting results. We’re growing fast. We just posted 40% profit. Puritans might believe our market cap should be higher. The more we can show that we grow, the more exciting things might be. I don’t think we’re terribly attractive to an activist right now.

Storage is not an interesting place to be. But it’s all about data. Do you see that shifting with investors? What would cause that? I believe we need to innovate too. I think that the investors would need to believe that some of the messages we’re sending today, and over the next year, create an environment where our long term growth path is higher and stronger than it is today. Sometimes it’s sheer numbers, not storyline. The market believes that NetApp, EMC, etc. can cause pricing and growth challenges for us for a long time. We need them to believe we’re immune to those challenges.

How about China as a marketplace? China as a competitive threat with new technologies? China is a formidable country in every respect. Competition, market. It’s more difficult than it was 10 years ago as a market. Our administration hasn’t helped, China has put a lot of rules and regulations in place. I wish we’d focus on those, not the trade deficit. It’s a market we’re experimenting in. If it only works out as well as our competitors can achieve, it may not be worthwhile. And the issue of competition. I worry about Huawei, particularly in third world countries. Viable, long-lasting commercial concerns. In Europe it’s a bit different. The Chinese are very innovative. The US does well because of a market of 300 million, China has 1.4 billion people.

Joe Tucci said 4-5 years ago that the industry was merging. He said you can’t survive as a small player. How many times have we seen this picture? Huge conglomerates falling apart under their own weight. I hate to disagree with Joe. It’s a misunderstanding of scale. It’s about individual products and capabilities, not the size of the business. If you’re just big, and not growing, you no longer have scale. All you’ve done is create a large company with a lot of under scaled products. Alan Kay “perspective is worth 40 IQ points” [note: it’s apparently 80, maybe I misheard].

Interesting session. 4 stars.

Pure//Accelerate 2018 – Thursday General Session – Rough Notes

Disclaimer: I recently attended Pure//Accelerate 2018.  My flights, accommodation and conference pass were paid for by Pure Storage via the Analysts and Influencers program. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from Thursday’s General Session at Pure//Accelerate 2018.


Dave Hatfield

Dave Hatfield takes the stage, reports that there have been over 10,000 viewers and participants for the show. Cast your minds back to the “Summer of love” in 1968. This was also the time of the first big tech demo – “The Mother of All Demos” by Doug Engelbart – and it included the introduction of the mouse, network computing, hypertext linking, collaboration, multiple windows. You can see the clip here.

Pure is about embracing the tools of transformation.


Dr Kate Darling

Dr Kate Darling (MIT Media Lab) then takes the stage. She is a researcher with expertise in AI and robotics. She just flew in from Hong Kong. She mentions she had a baby 6 months ago. People say to her “Kate, it must be so interesting to watch your baby develop and compare it to AI development”. She says “[m]y baby is a million times more interesting than anything we’ve developed”.

AI is going to shape the world her baby’s growing up in. Like electricity, we don’t know how it will shape things yet. Some of the applications are really cool. A lot of it is happening behind the scenes. E.g. They took a Lyft to the airport and the driver was using Waze (which uses AI). There’s a bit of hype that goes on, and fear that AI might self-evolve and kill us all. This distracts from the benefits. And the actual problems we face right now (privacy, security, etc). Leads people to over-estimate where we are right now in terms of development.

She works in robotics. We’ve been doing this for centuries. We’re a long way from them taking over the world and killing us all. If you search for AI (via google images) you see human brain / robots pictures. Constantly comparing AI to human intelligence. This image is heavily influenced by sci-fi and pop culture. Automation will have an impact on labour markets. But AI is not like human intelligence. We’ve developed AI that is much smarter than people. But the AI is also a lot dumber. E.g. Siri, I’m bleeding, call me an ambulance. Ok, I’ll call you “an ambulance” from now on.

[image source http://www.derppicz.com/siri-call-me-an-ambulance/]

We’ve been using animals for 1000s of years, and we still use them. E.g., Dolphins for echo-location. Autonomous and unpredictable agents. Their skills are different to ours, and they can partner with us and extend our abilities. We should be thinking outside of the “human replacement” box.


  • Japan looks to AI to simplify patent screening
  • Recognise patterns in peoples’ energy consumption
  • Spam filters

Work in human – robot interaction. People’s psychological reactions to physical robots. Treat them like they’re alive, even though they’re machines. Perceive movement in our personal space as intent. The Roomba is really dumb. Military robots – soldiers become attached to bomb disposal robots. Paro Robotics – seal used in nursing homes. A lot of people don’t like the idea of robots for them. But this replaces animal therapy, not human care.

AI can shape how we relate to our tools, and how we relate to each other. The possibilities are endless.

If you’re interested in AI. It’s kind of a “hypey buzzword” thrown around at conferences. It’s not a method and more of a goal. Most of what we do is machine learning. eg. Hot dog example from Silicon Valley. If you’re into AI, you’ll need data scientists. They’re in high demand. If you want to use AI in your business, it’s important to educate yourself.

Need to be aware of some of the pitfalls. Check out “Weapons of Math Destruction” by Cathy O’Neil.

There are so many amazing new tools being developed. OSS machine learning libraries. There’s a lot to worry about as a parent, but there’s a lot to look forward to as well. eg. AI that sorts LEGO. Horses replaced by cars. Cars now being replaced by a better version of an autonomous horse.


Dave Hatfield

Dave Hatfield takes the stage again. How can you speed up tasks that are mundane so you can do things that are more impactful? You need a framework and a way to ask the questions about the pitfalls. DevOps – institutionalised knowledge of how to become software businesses. Introduces Jez Humble.


Jez Humble

Why does DevOps matter? 

The enterprise is comprised of business, engineering, and operations. The idea for a project occurs, it’s budgeted, delivered and thrown over the wall to ops. Who’s practicing Agile? All about more collaboration. Business people don’t really like that. Now delivering into production all the time and Operations aren’t super happy about that. Operations then create a barrier (through change management), ensuring nothing ever changes.

How does DevOps help?

No real definition. The DevOps Movement is “a cross-functional community of practice dedicated to the study of building, evolving and operating rapidly changing, secure, resilient systems at scale”. There’s some useful reading (Puppet’s State of DevOps Reports) here, here, and here.

Software delivery as a competitive advantage

High performers were more than twice as likely to achieve or exceed the following objectives

  • Quantity of products or services
  • Operating efficiency
  • Customer satisfaction
  • Quality of products or services provided
  • Achieving organisational and mission goals
  • Measures that demonstrate to external parties whether or not the organisation is achieving intended results

IT Performance

  • Lead time for changes
  • Release frequency
  • Time to restore service
  • Change fail rate
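As a rough illustration of how these four measures could be calculated from deployment records, here's a minimal Python sketch. The `Deployment` record and its field names are my own assumptions for the example and aren't taken from the State of DevOps reports.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one change reaching production."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it was running in production
    failed: bool = False                    # degraded service or needed remediation
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def it_performance(deployments: list[Deployment], period_days: int) -> dict:
    """Summarise the four IT performance measures over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [hours(d.committed_at, d.deployed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [hours(d.deployed_at, d.restored_at) for d in failures if d.restored_at]
    return {
        "lead_time_hours": median(lead_times),
        "deploys_per_day": len(deployments) / period_days,
        "time_to_restore_hours": median(restores) if restores else None,
        "change_fail_rate": len(failures) / len(deployments),
    }
```

Medians are used here so that one slow change doesn't skew the picture. That's a design choice for the sketch, not something the reports prescribe.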

We’re used to thinking about throughput and stability as a trade-off – that’s not really the case. High performers do both.

2016 IT performance by Cluster 

(From the 2016 report)

Deployment Frequency

For the primary application or service you work on, how often does your organisation deploy code?

  • High IT Performers: On demand (multiple deploys per day)
  • Medium IT Performers: Between once per week and once per month
  • Low IT Performers: Between once per month and every 6 months

Lead time for changes

For the primary application or service you work on, what is your lead time for changes (i.e. how long does it take to go from code commit to code successfully running in production)?

  • High IT Performers: Less than an hour
  • Medium IT Performers: Between one week and one month
  • Low IT Performers: Between one month and 6 months

Mean time to recover (MTTR)

For the primary application or service you work on, how long does it generally take to restore service when a service incident occurs (e.g. unplanned outage, service impairment)?

  • High IT Performers: Less than an hour
  • Medium IT Performers: Less than one day
  • Low IT Performers: Less than one day

Change failure rate

For the primary application or service you work on, what percentage of the changes either result in degraded service or subsequently require remediation (e.g. lead to service impairment, service outage, require a hotfix, rollback, fix forward, patch)?

  • High IT Performers: 0-15%
  • Medium IT Performers: 31-45%
  • Low IT Performers: 16-30%


“It’s about culture and architecture”. DevOps isn’t about hiring “DevOps experts”. Go solve the boring problems that no-one wants to do. Help your people grow. Grow your own DevOps experts. Re-orgs suck the energy out of a company. They often don’t produce better outcomes. Have people who need to work together, sit together. The cloud’s great, but you can do continuous delivery with mainframes. Tools are great, but buying “DevOps tools” doesn’t change the outcomes. “Please don’t give developers access to Prod”. DevOps is learning to work in small batches (product dev and org change). You can’t move fast with water / scrum / fall.

Architectural Outcomes

Can my team …

  • Make large-scale changes to the design of its system without the permission of somebody outside the team or depending on other teams?
  • Complete its work without needing fine-grained communication and coordination with people outside the team?
  • Deploy and release its product or service on demand, independently of other services the product or service depends on?
  • Do most of its testing on demand, without requiring an integrated test environment?
  • Perform deployments during normal business hours with negligible downtime?

Deploying on weekends? We should be able to deploy during the day with negligible downtime.

  • DevOps is learning to build quality in. “Cease dependence on mass inspection to achieve quality. Improve the process and build quality into the product in the first place”. W. Edwards Deming.
  • DevOps is enabling cross-functional collaboration through value streams
  • DevOps is developing a culture of experimentation
  • DevOps is continually working to get better

Check out the Accelerate book from Jez.

The Journey

  • Agree and communicate measurable business goals
  • Give teams support and resources to experiment
  • Talk to other teams
  • Achieve quick wins and share learnings
  • Never be satisfied, always keep going


Dave Hatfield

Dave Hatfield takes the stage again. Don’t do re-orgs? We had 4 different groups of data scientists pop up in a company of 2300. All doing different things. All the data was in different piggy banks. We got them all to sit together and that made a huge difference. “We need to be the ambassadors of change and transformation. If you don’t do this, one of your competitors will”.

Please buy our stuff. Thanks for your time. Next year the conference will be in September. We’re negotiating the contracts right now and we’ll let you know soon.

Solid session. 4.5 stars.

Pure//Accelerate 2018 – Wednesday General Session – Rough Notes

Disclaimer: I recently attended Pure//Accelerate 2018.  My flights, accommodation and conference pass were paid for by Pure Storage via the Analysts and Influencers program. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my rough notes from Wednesday’s General Session at Pure//Accelerate 2018.

[A song plays. “NVMe” to the tune of Naughty By Nature’s OPP]


Charlie Giancarlo

Charlie Giancarlo (Pure Storage CEO) takes the stage. We share a mission: to power innovation. Storage is a really important part of making that mission happen. We’re in the zettabyte era, we don’t even talk about EB anymore. It’s not the silicon age, or the Internet age, or the social age. We’re talking about the original gold rush of 1849. The amount of gold in data is unlimited. We need the tools to pick the gold out of the data. The data heroes are us. And we’re announcing a bunch of tools to mine the gold.

Who am I? What has allowed us to get to success? Where are we going?

I’m new here, I’ve just gotten through my 3rd quarter. I’ve been an engineer, entrepreneur, CTO, equity partner – entirely tech focused. I’ve made a living looking at innovation on the basis of looking at it as a 3-legged stool.

What are the components that advance tech?

  • Networking
  • Compute (Processing)
  • Storage

They advance on their own timelines, and don’t always keep pace, and the industry sometimes gets out of balance. Data centre and system architectures adjust to accommodate this.


Density has multiplied by a factor of 10 in 10 years (slow down of Moore’s Law), made up for this by massive scale in the DC


Multiplied by 10 in 8 years, 10Gbps about 10 years ago, and 100Gbps about 2 years ago


  • Multiplied by a factor of 1000
  • Storage vendors just haven’t kept up
  • Storage on white boxes?

Pure came in and brought balance to this picture, allowing storage to keep up with networking and compute.

It’s all about

  • Business model
  • Customer experience
  • Technology

“Data is the most important asset that you have”

Pure Storage became a billion dollar company in revenue last year (8 years in, 5 years after it introduced its first product). It’s cashflow positive and “growing like a bat out of hell” with over 4800 customers. Had less than 500 customers just 4 years ago. And a large chunk of customers are cloud providers. Also in 30% of the Fortune 500.


The software economy sits on top of the compute / network / storage stool. Companies are becoming more digital. Last year they talked about Domino’s, and this year they’re using AI to analyse your phone calls. Your calls are being answered by an AI engine that takes your order. An investment bank has more computer engineers and developers than they have investment bankers. Companies need to feed these apps with data. Data is where the money is.

DC Architectures

  • Monolithic scale-up – client / server (1990s)
  • Virtualised – x86 + virtualisation (2000s)
  • Scale-out  – cloud (2010s)

Previously big compute – apps are rigid, now there’s big data – apps are fluid, data is shared

“Data-centric” architecture

  • Faster
  • 100% shared
  • Simpler
  • Built for rapid innovation

Dedicated storage and stateless compute


  • Large booking, travel, e-commerce site
  • PAIGE.AI – cancer pathology – digitised samples from the last decade
  • Man AHL – economy and stock market modelling

Behind all these companies is a “data hero”

Over 80% of CxOs believe that the speed of analysing data would be one of their biggest competitive issues, but CIOs worried about not being able to keep up with data coming in to the enterprise.

“We empower innovators to build a better world with data”

Beyond AFA

  • Modern data pipeline
  • A whole new model
  • Pure “on-demand”
  • The AI Era

“New meets Now”

It takes great people to make a great company – the amazing “Puritans”. Pure have an NPS of 83.7 – best in B2B.


Matt Kixmoeller

Matt Kixmoeller takes the stage. We need a new architecture to unlock the value of data. Back in 2009, Michael Jackson died, Obama was in, and Fusion-IO had just started. Pure came along and had the idea of building an AFA. Today we’re going to bring you the sequel.

  • There’s basically SAN / NAS and DAS (which has seen a resurgence in web scale era)
  • DAS reality – many apps, rigid scaling, either too much storage or too much compute

New technologies to re-wire the DC

  • Diverse, fast compute (CPU, GPU, FPGA)
  • Fast networks and protocols (RoCE, NVMe-oF)
  • Diverse SSD
  • Eliminates the outside the box penalty
  • Gets CPUs totally focussed on applications

What if we can finally unite SAN and DAS into a data-centric architecture?

Gartner have identified “Shared accelerated storage”. “The NVMe-oF protocol … will help balance the performance and simplicity of direct-attached storage (DAS) with the scalability and manageability of shared storage”.

“Tier 0”? – they’re making the same mistake again. Pure are focused on shared accelerated storage available for all.

Tomorrow can look like this

  • Diskless, stateless, elastic compute (containers, VMs, bare metal)
  • Shared accelerated storage (block, file, object)
  • Fast, converged networks
  • Open, full-stack orchestration


Keith Martin

Keith Martin (ServiceNow) takes the stage

  • Dealing with high volumes of data
  • Tremendous growth in net new data
  • 18 months ago, doing basic web scale, DAS architecture
  • Filling up DCs at a very fast clip
  • Stopped and analysed everything there was

What happens in an hour in the DC?

In one hour our customers:

  • 7.5 million performance analytics scores computed
  • 730,000 configuration items added
  • 274,000 notifications sent
  • 76,000 assets added
  • 49,200 live feed messages
  • 36,300 change requests
  • 15,600 workflow activities

Every hour of the day our engineering teams:

  • Develop code across the globe in 9 global development locations (SD, SC, SF, Kirkland, London, Amsterdam, Tel Aviv, Hyderabad, Bangalore)
  • Use 450 copies of ServiceNow for quality engineering testing
  • Run 100,000 automated quality engineering tests

In one hour on our infrastructure

  • 25 billion database queries
  • 112 million HTTP requests
  • 2.5 million emails
  • 25.3 million API calls
  • 493TB of backups
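To put those hourly infrastructure figures into a more familiar per-second form, here's a quick worked calculation in Python. The only assumption I've added is decimal units for the backup figure (1TB = 1000GB).

```python
SECONDS_PER_HOUR = 3600

# Hourly figures as quoted in the session
hourly = {
    "database queries": 25_000_000_000,
    "HTTP requests": 112_000_000,
    "emails": 2_500_000,
    "API calls": 25_300_000,
}


def per_second(count_per_hour: float) -> float:
    """Convert an hourly count into an average per-second rate."""
    return count_per_hour / SECONDS_PER_HOUR


rates = {name: per_second(count) for name, count in hourly.items()}
for name, rate in rates.items():
    print(f"{name}: ~{rate:,.0f} per second")

# Backups: 493TB per hour, assuming 1TB = 1000GB
backup_gb_per_second = 493 * 1000 / SECONDS_PER_HOUR
print(f"backups: ~{backup_gb_per_second:.1f} GB per second")
```

That works out to roughly 6.9 million database queries and about 137GB of backup data every second, averaged over the hour.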

We were going through

  • 30K hard drives
  • 3500+ servers
  • >2000 failed HDDs per year

CPU time was being consumed with backup data movement and restore times were becoming longer and longer. They started to look at the FlashBlade. With its small footprint and low power it was a really interesting option for them. It was really easy to setup and use. They let the engineers out of their cages to play with it in the lab and found it was surprisingly hard to break. So they’ve decided to start using FlashBlade in production as their standard for protection data.

Achieving 3x density now

Each rack has:

  • 30 1RU servers
  • 1000 compute cores
  • 1.5PB effective Flash

Decided to test and implement FlashArray as well and they’re excited about FlashArray//X. ServiceNow cares about uptime. Pure has the best non-disruptive upgrade, expansion and repair model. DAS can prove to be expensive at scale.


Matt Kixmoeller

Kix takes the stage again

  • 2016: FlashBlade – the world’s first AFA for big data
  • 2017: FlashArray//X

Introducing the FlashArray//X Family

  • //X10
  • //X20
  • //X50
  • //X70
  • //X90


Bill Cerreta

Bill Cerreta takes the stage.

  • The FlashArray was launched in 2012, Purity was built to optimise Flash
  • //M chassis designed for NVMe
  • Deep integration of software and hardware

Where are we going with Flash?

SCM, QLC. We’ve eliminated translation layers. The //X90, for example, has

  • Dual-Protocol controllers – speak to both SAS and NVMe
  • The 10 through 90 have 25GbE onboard
  • Everything’s NVMe-oF ready and this will be added via software later in the year
  • Double the write bandwidth of //M
  • This year, they’re all in on //X
  • 7 generations of evergreen, non-disruptive upgrades [photo]
  • //X makes everything faster (compared to //M)

Neil Vachharajani takes the stage briefly to talk MongoDB on shared accelerated storage.

Kix continues.

Priced for mainstream adoption

  • Early attempts at NVMe cost 10x more than AFAs
  • //X, when introduced last year, was 25% more than //M
  • $0 premium for //X over //M on an effective capacity basis

[Customer video – Berrios]


Jason Nadeau

Jason Nadeau takes the stage. Most infrastructure wasn’t built to allow data to flow freely.

  • 10s of products
  • Complex design
  • Silos, difficult to share


Data-centric Architecture

  • Consolidated and simplified
  • Real-time
  • On-demand and self-driving
  • Ready for tomorrow
  • Multi-cloud


  • FlashArray
  • FlashBlade
  • FlashStack
  • AIRI

API-first model and software at the heart of the architecture.


Sandeep Singh

Sandeep Singh takes the stage. A lot of companies have managed to virtualise. A lot have managed to “flash-ify”. But a lot of them have yet to automate and “service-ize”, to “container-ize”, or to adopt multi-cloud.

Automate and service-ize – on every cloud platform

  • VMware SDDC – VMware SDDC validated design
  • Open automation – pre-built open full-stack automation toolkits
  • Openshift PaaS – container-based reference architecture

Simon Dodsley takes the stage to talk with Sandeep about MongoDB deployments in less than a minute (down from 5 days).

Sandeep continues. Container adoption is increasing quickly but there’s a lack of storage support for persistent containers. Pure have container plug-ins for Docker, Kubernetes. Containerized apps want to consume storage as-a-service. Introducing Pure Service Orchestrator.


Introduced ActiveCluster last year. Snapshots and snapshot mobility (portable snapshots introduced last year) are important.

  • Snap to NFS is generally available now
  • CloudSnap to AWS S3 (available in late 2018)
  • DeltaSnap open API (Veeam, Catalogic, Actifio, Commvault, Rubrik, Cohesity)


Jason Nadeau

Jason Nadeau comes back on stage. Data as-a-service consumption. Leases aren’t pay per use and aren’t a service-like experience

Introducing Evergreen Storage Service (ES2)

  • Pay per used GB
  • True open
  • Terms as short as 12 months
  • Always evergreen
  • Onboard in days
  • Always “better-than-cloud” economics

Capex with Evergreen storage, Opex with ES2

[Video on PAIGE.AI]


Matt Burr

Matt Burr takes the stage. Unlocking the value of what was once cold data. New era demands a new data mindset.

  • How has the value of data changed?
  • How can you extract that value?
  • How can you get started today?

A robot will replace a human surgeon. A machine has learned to adapt faster than the human brain can. More and more data will live in the hotter tier. What tools can make this valuable? Change in the piggy bank – like data. But data is stuck in silos.

  • Data warehouse
  • Data lake
  • Modern data pipeline
  • AI data pipeline

$/GB used to make sense. We need new metrics. $/flops? $/simulation? Real value is generated by simplifying and accelerating the data flow. Build a data hub on FlashBlade. FlashBlade is 16 months old (GA in January 2017).

Invites NVIDIA’s Rob Ober on stage


Rob Ober

“The time has come for GPU computing”

  • Moore’s Law is flattening an awful lot
  • NVIDIA as “the AI computing platform”
  • “The more you buy, the more you save”

Traditional hyper scale cluster – 300 dual-CPU servers, 180KW power, or you can deploy 1 DGX-2, 10KW.

Science fiction is being made possible

  • Ultrasound retrofit
  • 5G beam
  • Molecule modelling 1/10 millionth $

Scaling AI

  • Design guesswork
  • Deployment complexity
  • Multiple points of support

AI scaling is hard, “not like your traditional infrastructure”


  • Jointly-validated solution
  • Faster, simplified deployment
  • Trusted expertise and support


Matt Kixmoeller

Kix takes the stage again. There’s a big gap in AI infrastructure, with customers spread across varying stages of journey from single server -> scale-out infrastructure. Introduces AIRI Mini and they’re also extending AIRI to Cisco.


Data Warehouse pitfalls

  • Performance not keeping up with data
  • Pricing extortions and over-provisioning
  • Inflexible appliances built for a single workload

Progress has to have a foundation.

Customer example of telco in Asia moving from Exadata to FlashBlade

Introducing FlashStack for Oracle Data Warehouse

Set your data free


Dave Hatfield

Dave Hatfield takes the stage. Thanks for coming. Over 5000 people in the Bill Graham Civic Auditorium and a lot watching on-line. Customers, partners. Be sure to check out the “petting zoo” (Solutions Pavilion). We wanted to have something that was “not your father’s storage show. Your father’s storage show happened last month”. Anyone been to a Grateful Dead show? It’s a community experience, you don’t know what will happen next.

And that’s a wrap.

Pure Storage ActiveCluster – Background Information

I’ve been doing a bunch of research into Pure Storage’s ActiveCluster product recently. I was all set to do an article that explains how to set it up and what a vSphere Metro Cluster looks like with it in place, but Cody Hosterman has beaten me to the punch. Given that it’s more his job than mine to write this stuff, and that he works for Pure Storage, I’m okay with that. In any case, I thought it would be worthwhile to jot down some thoughts and notes and share some links to Cody’s work, if for no other reason than it gives me an aggregation point for my thoughts.



I was lucky enough to be at Pure//Accelerate in 2017 when ActiveCluster was announced and covered it at a high level here. If you’re unfamiliar with ActiveCluster, it’s “a fully symmetric active/active bidirectional replication solution that provides synchronous replication for RPO zero and automatic transparent failover for RTO zero. ActiveCluster spans multiple sites enabling clustered arrays and clustered ESXi hosts to be used to deploy flexible active/active datacenter configurations.” (https://kb.vmware.com/s/article/51656).

[image courtesy of Pure Storage]



There are a few bits that are needed to make ActiveCluster work (besides Purity 5.0 on your FlashArray):

  • Replication Network;
  • Pods; and
  • Pure1 Cloud Mediator.


Replication Network

The replication network is used for the initial asynchronous transfer of data to stretch a pod, to synchronously transfer data and configuration information between arrays, and to resynchronise a pod. For this network to work, you should note the following criteria apply:

  • The maximum tolerable RTT is 5ms between clustered FlashArrays;
  • 4x 10GbE replication ports per array (two per controller). Two replication ports per controller are required to ensure redundant access from the primary controller to the other array;
  • 4x dedicated replication IP addresses per array;
  • A redundant, switched replication network. Direct connection of FlashArrays for replication is not supported; and
  • Adequate bandwidth between arrays to support bi-directional synchronous writes and bandwidth for resynchronizing. This depends on the write rate of the hosts at both sites.
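The criteria above lend themselves to a simple pre-flight check. Here's a minimal sketch in Python of what that might look like. The `ReplicationPlan` fields are hypothetical names I've made up for the example, and the bandwidth rule (sum of both sites' peak write rates plus 25% headroom for resynchronisation) is my own conservative assumption rather than Pure's sizing guidance.

```python
from dataclasses import dataclass

MAX_RTT_MS = 5.0        # maximum tolerable RTT between clustered FlashArrays
PORTS_PER_ARRAY = 4     # 10GbE replication ports per array (two per controller)
IPS_PER_ARRAY = 4       # dedicated replication IP addresses per array
RESYNC_HEADROOM = 0.25  # assumption: spare capacity kept for resynchronising


@dataclass
class ReplicationPlan:
    """Hypothetical description of a planned replication network."""
    rtt_ms: float
    ports_per_array: int
    ips_per_array: int
    switched_and_redundant: bool  # direct connection is not supported
    link_gbps: float
    site_a_write_gbps: float      # peak host write rate at site A
    site_b_write_gbps: float      # peak host write rate at site B


def check_plan(plan: ReplicationPlan) -> list[str]:
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if plan.rtt_ms > MAX_RTT_MS:
        problems.append(f"RTT of {plan.rtt_ms}ms exceeds {MAX_RTT_MS}ms")
    if plan.ports_per_array < PORTS_PER_ARRAY:
        problems.append(f"need {PORTS_PER_ARRAY} replication ports per array")
    if plan.ips_per_array < IPS_PER_ARRAY:
        problems.append(f"need {IPS_PER_ARRAY} replication IPs per array")
    if not plan.switched_and_redundant:
        problems.append("replication network must be redundant and switched")
    needed = (plan.site_a_write_gbps + plan.site_b_write_gbps) * (1 + RESYNC_HEADROOM)
    if plan.link_gbps < needed:
        problems.append(f"link of {plan.link_gbps}Gbps is below the {needed}Gbps estimate")
    return problems
```

The write rates are the part you'll need to measure yourself, which is really the point of the exercise.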

So, you need to know (and understand) your workload, and you need some reasonable bandwidth between the arrays. This shouldn’t be unexpected, but it’s clearly well suited to a metro deployment.



Pods

A Pod is a replication namespace. Once a pod is created, the pod (and the volumes inside it) can be controlled from either FlashArray. If you create a snapshot, that snapshot is created on both sides. If snapshots exist on the volume before it’s added to the pod, those snapshots will be copied over when you add it in. The pod itself acts as a consistency group.


Pure1 Cloud Mediator

The Pure1 Cloud Mediator is used to arbitrate split-brain scenarios. It sits in the cloud and keeps an eye on stuff. Think of it as the Vanilla Ice of the ActiveCluster (before he went off and did moto-x and renovation shows). For “dark” sites, an on-premises mediator (VM) can also be deployed.


A Few Other Notes

A few other things to note about the behaviour of ActiveCluster:

  • Data reduction is performed independently between arrays. This is cool because you might have a mix of workloads at each data centre;
  • If the arrays lose connection to the mediator they will continue to serve data and synchronously replicate as long as array to array communication is active; and
  • If both arrays lose communication with each other and with the mediator, this is a dual failure and both the mirrored volumes become unavailable until communication with the other array or the mediator can be re-established. Non-mirrored volumes would not be affected in this instance and would still be accessible.
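The failure behaviour described above can be summarised as a small decision function. This is just a sketch of the logic as I understand it from the notes, not Purity’s actual implementation. In particular, the `wins_mediator_race` flag is my assumption about how the mediator arbitrates when the arrays can’t see each other (one array keeps serving, the other stops).

```python
def mirrored_volumes_online(
    replication_link_up: bool,
    can_reach_mediator: bool,
    wins_mediator_race: bool = True,
) -> bool:
    """Does this array keep serving the volumes in a stretched pod?"""
    if replication_link_up:
        # Array-to-array communication is active, so the arrays keep serving
        # data and replicating synchronously, with or without the mediator.
        return True
    # The arrays can't see each other. Only an array that reaches the
    # mediator (and, by assumption, wins arbitration) keeps serving.
    return can_reach_mediator and wins_mediator_race


def non_mirrored_volumes_online() -> bool:
    """Volumes outside a stretched pod aren't affected by these failures."""
    return True
```

The dual failure case falls out of the last line of `mirrored_volumes_online`: with no replication link and no mediator, neither array can safely carry on.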


Disaster Avoidance Or Recovery?

Before deploying ActiveCluster, you should think about what kind of goal you’re trying to achieve. Disaster Avoidance assumes that some element of the primary site (Site A) is unavailable due to a disaster. DA uses synchronous replication only and requires a stretched cluster technology (such as VMware vSphere Metro Cluster) to provide active / active workload availability across both sites. Disaster Recovery, on the other hand, assumes that workloads are deployed in an active / passive configuration across sites. There are advantages to each approach, depending on what your recovery point objective (RPO) is, and what your recovery time objective (RTO) is. If you have a very low RPO and RTO requirement, the added expense of deploying a synchronous replication solution (not the Pure bit, but the supporting infrastructure) is worth it. If you have a greater tolerance for a higher RPO and / or RTO, an asynchronous solution (and the less stringent replication network requirements) may be a better fit for you.

You should also think about whether the topology you’re deploying is Uniform or Non-Uniform. A Uniform configuration provides hosts with access across Sites. This requires a bit more investment in terms of stretched FC fabrics (assuming you’re using FC and not iSCSI). This is generally the topology deployed for metro clusters.

You might decide, however, to deploy a Non-Uniform configuration for simpler disaster recovery. In that case, there’s no requirement to have cross-site FC links in place, but your time to recover will be impacted. You’ll also want to look at something like VMware Site Recovery Manager to orchestrate the recovery of workloads at the secondary site.



Whilst I think ActiveCluster is a very neat piece of technology, you should be doing a whole lot of thinking about other (possibly very boring) stuff before you take the plunge and decide to deploy vMSC sitting on an ActiveCluster environment. Disaster Avoidance (and Recovery) require a lot of planning and understanding of what’s important to your business before you deploy a solution. In the next little while I hope to be able to report back with some results from testing, and talk a bit about other protection scenarios, including metro clusters with asynchronous protection off to the side.

Storage Field Day Exclusive at Pure//Accelerate 2017 – FlashBlade 2.0

Disclaimer: I recently attended Storage Field Day 13.  My flights, accommodation and other expenses were paid for by Tech Field Day and Pure Storage. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


These are my rough notes from a session I attended on “Day 0” of Pure//Accelerate 2017 (aka Storage Field Day Exclusive at Pure//Accelerate 2017). Videos of the session can be found here and you can grab my raw notes from here. I try to avoid dumping a bunch of dot points in Tech Field Day posts, but as this one covered some key announcements, I thought most of the information was useful presented as is.


A Year of FlashBlade

Par Botes spoke to us briefly about the progress made with FlashBlade in the past year. Originally internally codenamed “Wedding Cake” as it was a white box, the performance is “better than you think”, going from 500K IOPS and 15GB/s, to 1.2M IOPS and 15GB/s, to 1.5M IOPS and 16GB/s in the first six months since GA. I first encountered FlashBlade in the flesh at Storage Field Day 10. You can read more about that here.


Scaling Beyond 15 Blades

Rob Lee took some time to talk about scaling beyond 15 blades in a chassis.

  • Linear capacity scale – single namespace growing to dozens of PB scale
  • Linear IOPS and throughput – single namespace / IP scales IOPS and throughput with capacity
  • Preserve simplicity – more capacity adds IOPS & throughput with zero administration


Logical View

  • Fabric – scale raw bandwidth without adding management
  • Processing – software dynamically schedules processing resources globally
  • Data – place data as a single system across all blades


High level Architecture

  • Integrated Networking – combined internal and external networks. Load balance connections across all blades.
  • Distributed control – partition and distribute control of namespace, data, and metadata across all blades
  • Distributed Data – Distribute persistent data across all blades – high-frequency transactions in NVRAM and longer-lived data in N+2 erasure-coded flash

[image courtesy of Pure Storage]


FlashBlade Data Distribution

  • Wide-stripe erasure coding with +2 redundancy
  • Scaling past a single chassis
  • External network load balancer, inter-chassis network switching
  • Added external Fabric Module
  • intra-chassis network switching
  • External Fabric Module is 2 rackmounted switches


Scaling Fabric Bandwidth

32-port 100Gb/s switch (1.6Tb/s north-south). Here’s a photo of Rob talking about this.

  • Controller load balancing and capacity load balancing
  • East-west traffic
  • NVRAM/Flash data access
  • Metadata coordination
  • 1.6Tb/s across chassis, 300Gb/s within chassis


Control Placement

Adding a blade is straightforward

  • Partitions rebalance to new blade – stops running on old blade and boots on new blade
  • No data movement required, only compute (data stays in-place)
  • Partition load balancing on a per-blade basis – not chassis constrained
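To make the “only compute moves” idea a bit more concrete, here’s a minimal Python sketch of rebalancing partition ownership when a blade is added. The partition and blade names are made up, and the balancing rule (move just enough partitions off the busiest blades) is my own simplification rather than how Purity actually schedules things.

```python
def add_blade(assignment: dict[int, str], new_blade: str) -> dict[int, str]:
    """Return a new partition -> blade ownership map after adding a blade.

    Only control of a partition moves. Nothing here touches where the
    data itself lives, mirroring the "data stays in-place" point above.
    """
    blades = set(assignment.values()) | {new_blade}
    target = len(assignment) // len(blades)  # partitions per blade, rounded down
    load = {blade: 0 for blade in blades}
    for owner in assignment.values():
        load[owner] += 1

    result = dict(assignment)
    for partition in sorted(result):
        if load[new_blade] >= target:
            break
        owner = result[partition]
        if load[owner] > target:
            # Partition stops running on the old blade, boots on the new one
            result[partition] = new_blade
            load[owner] -= 1
            load[new_blade] += 1
    return result
```

With 12 partitions spread over three blades, adding a fourth moves three partitions and leaves the other nine where they were.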


Data Placement

  • Data erasure coded across n+2 RAID stripes
  • 15 blades – 13-wide stripe (11+2 parity shards)
  • RAID Groups are dynamic – selected as needed
  • RAID Groups can cross chassis boundaries

As you fill the chassis, it becomes beneficial to constrain the RAID group to a chassis. Note also that there’s enhanced resiliency (n+2 per chassis, without additional overhead) and reduced inter-chassis bandwidth requirements for rebuild operations.
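As a quick worked example of what an 11+2 stripe means for capacity, here’s the arithmetic in Python. The 100TB raw figure is purely illustrative, and the result ignores any other overheads (metadata, spare space, data reduction) that a real system would have.

```python
def stripe_efficiency(data_shards: int, parity_shards: int) -> float:
    """Fraction of each stripe that holds data rather than parity."""
    return data_shards / (data_shards + parity_shards)


def usable_capacity(raw: float, data_shards: int, parity_shards: int) -> float:
    """Capacity left for data after parity, in the same unit as raw."""
    return raw * stripe_efficiency(data_shards, parity_shards)


# 15 blades: 13-wide stripe made up of 11 data shards + 2 parity shards
efficiency = stripe_efficiency(11, 2)
print(f"efficiency: {efficiency:.1%}, parity overhead: {1 - efficiency:.1%}")
print(f"100TB raw -> {usable_capacity(100, 11, 2):.1f}TB before other overheads")
```

So a 13-wide stripe gives up a little over 15% of raw capacity to parity in exchange for surviving the loss of any two shards.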



  • Software creates parallelism and scale; hardware enables access to data
  • Software/hardware integration without tight coupling
  • Simplicity/reliability created by software control of the network fabric


Native Objects

Brian Gold presented a section of the session on Why Objects?

Next Generation Apps

  • Cloud-native development
  • Rich metadata databases


  • Large & streaming: AI training, media serving, analytics
  • Small & random: time-series metrics, real-time streams


  • No visible partitions
  • Unified management


Classic Object Gateways

  • Object API gateway -> file system (index to track metadata)
  • File system becomes a bottleneck when scaling to billions of objects
  • Purity (FlashArray and FlashBlade) – objects at the core


Object Read Path

Request Arrival

  • Extract bucket and object names from request
  • Decode bucket and object names
  • Get bucket ID from authority
  • Get object ID from bucket authority
  • Forward read request to object authority

Read data

  • Read object data from flash
  • Forward back to protocol handler
  • Decompress and form response to client
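The read path above can be sketched as a two-phase lookup. This is a toy, single-process illustration in Python only. The dictionaries stand in for the distributed authorities and flash, and every name and ID in it is hypothetical.

```python
import zlib
from urllib.parse import unquote

# Hypothetical in-memory stand-ins for the distributed components
BUCKET_IDS = {"photos": 7}                             # bucket name -> bucket ID
OBJECT_IDS = {(7, "cats/tabby.jpg"): 1001}             # (bucket ID, object name) -> object ID
FLASH = {1001: zlib.compress(b"...image bytes...")}    # object ID -> compressed data


def parse_request(path: str) -> tuple[str, str]:
    """Extract and decode the bucket and object names from a path-style request."""
    bucket, _, key = path.lstrip("/").partition("/")
    return unquote(bucket), unquote(key)


def read_object(path: str) -> bytes:
    # Phase 1: metadata lookup
    bucket_name, object_name = parse_request(path)
    bucket_id = BUCKET_IDS[bucket_name]               # get bucket ID from authority
    object_id = OBJECT_IDS[(bucket_id, object_name)]  # get object ID from bucket authority

    # Phase 2: data access
    compressed = FLASH[object_id]                     # read object data from flash
    return zlib.decompress(compressed)                # decompress and form the response
```

Swap the bucket and object names for an export and a file path and the shape of the lookup doesn’t change much, which lines up with the second takeaway below.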

Two takeaways

  • Two phases – metadata lookup and data access; distributed everything
  • Basically identical to how a file is read via NFS

FlashBlade is S3-compatible for the moment. Purity is really a key-value database.


Looking Forward

The next generation of applications require new storage interfaces. There was a demo using TensorFlow.

  • Converting raw pixels (ultimate in unstructured data) to structured data
  • Now imagine if you’ve got 10s of thousands of cameras
  • Object detection -> message queue -> object indexing, streaming queries, time-series analysis
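The shape of that pipeline (detection, then a message queue, then indexing) can be sketched in a few lines. This is a toy under stated assumptions: detect_objects is a hypothetical stub that stands in for a real model, and an in-process queue stands in for a real message broker. It isn’t the demo that was shown.

```python
import queue
from collections import defaultdict


def detect_objects(camera_id: int, frame_no: int) -> list[dict]:
    """Hypothetical stub: a real pipeline would run a model over raw pixels."""
    labels = ["car", "person"] if frame_no % 2 == 0 else ["person"]
    return [{"camera": camera_id, "frame": frame_no, "label": label} for label in labels]


def run_pipeline(cameras: int, frames: int) -> dict:
    """Detection -> message queue -> index keyed by detected label."""
    messages: queue.Queue = queue.Queue()

    # Producer side: every detection becomes a structured message.
    for cam in range(cameras):
        for frame in range(frames):
            for detection in detect_objects(cam, frame):
                messages.put(detection)

    # Consumer side: drain the queue into a simple label index.
    index = defaultdict(list)
    while not messages.empty():
        detection = messages.get()
        index[detection["label"]].append((detection["camera"], detection["frame"]))
    return index


if __name__ == "__main__":
    for label, hits in run_pipeline(cameras=2, frames=4).items():
        print(label, len(hits))
```

Scale the camera count up to tens of thousands and the index becomes the interesting storage problem: lots of small, random writes alongside the large streaming reads of the raw footage.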



Par wrapped up by talking about:

  • “The big bang of intelligence”
  • Modern Compute – parallel architecture driving performance
  • New Algorithms – modern approaches for superhuman accuracy
  • Big Data – Data is the new oil
  • “Massively parallel is the new normal”
  • 4th Industrial Revolution (2010 – now) – AI, Big Data, Cloud, IoT, Computing, digital to intelligence

I was a bit confused by FlashBlade when I first heard about it, and suggested that the 12 months post Storage Field Day 10 would be critical to the success of the product. Pure have managed to blow me away with the progress they’ve made with the product since GA, the breadth of customers and use cases they’ve lined up, and the overall level of forward thinking that’s gone into the product. You can use it to do some really cool stuff. The biggest problem I’ve had with the “data is the new oil” paradigm is that, unlike real oil, a lot of companies don’t actually know what to do with their data. FlashBlade is not going to magically fix this for you, but it’s going to give you some pretty compelling infrastructure that solves some of the problem of how to do stuff effectively with massive amounts of data.

Object storage is the new hot, and has been for a little while. Putting together a product like FlashBlade has certainly gotten Pure into a bunch of accounts where they weren’t traditionally successful with FlashArray. It’s also given their more traditional customers a different option for tackling big data problems. Pure strike me as being fiendishly focused on delivering something special with FlashBlade, and certainly don’t appear to be slowing down when adding new features to the platform. There have been some really cool features added, including support for 17TB blades (almost by accident) and increasing scalability to 75 blades. I’m looking forward to seeing what’s next for FlashBlade. You can read the blog post about the FlashBlade 2.0 announcement here.