Random Short Take #28

New year, same old format for news bites. This is #28 – the McKinnie Edition. I always thought Alfonzo looked a bit like that cop in The Deuce. Okay – it’s clear that some of these numbers are going to be hard to work with, but I’ll keep it going for a little while longer (the 30s are where you find a lot of the great players).

  • In what seems like pretty big news, Veeam has been acquired by Insight Partners. You can read the press release here, and Anton Gostev shares his views on it here.
  • This one looks like a bit of a science project, but I find myself oddly intrigued by it. You can read the official announcement here. Pre-orders are open now, and I’ll report back some time in March or April when / if the box turns up.
  • I loved this article from Chin-Fah on ransomware and NAS environments. I’m looking forward to catching up with Chin-Fah next week (along with all of the other delegates) at Storage Field Day 19. Tune in here if you want to see us on camera.
  • Speaking of ransomware, this article from Joey D’Antoni provided some great insights into the problem and what we can do about it.
  • A lot of my friends overseas are asking about the bush fires in Australia. There’s a lot in the media about it, and this article about the impact on infrastructure from Preston made for some thought-provoking reading.
  • I still use Plex heavily, and spend a lot of time moving things from optical discs to my NAS. This article covers a lot of the process I use too. I’ve started using tinyMediaManager as well – it’s pretty neat.
  • All the kids (and vendor executives) are talking about Kubernetes. It’s almost like we’re talking about public cloud or big data. Inspired in part by what he saw at Cloud Field Day 6, Keith weighs in on the subject here and I recommend you take the time to read (and understand) what he’s saying.
  • I enjoy reading Justin’s disclosure posts, even when he throws shade on my state (“Queensland is Australia’s Florida”). Not that he’s wrong, mind you.

Random Short Take #27

Welcome to my semi-regular, random news post in a short format. This is #27. You’d think it would be hard to keep naming them after basketball players, and it is. None of my favourite players ever wore 27, but Marvin Barnes did surface as a really interesting story, particularly when it comes to effective communication with colleagues. Happy holidays too, as I’m pretty sure this will be the last one of these posts I do this year. I’ll try and keep it short, as you’ve probably got stuff to do.

  • This story of serious failure on El Reg had me in stitches.
  • I really enjoyed this article by Raj Dutt (over at Cohesity’s blog) on recovery predictability. As an industry we talk an awful lot about speeds and feeds and supportability, but sometimes I think we forget about keeping it simple and making sure we can get our stuff back as we expect.
  • Speaking of data protection, I wrote some articles for Druva about, well, data protection and things of that nature. You can read them here.
  • There have been some pretty important CBT-related patches released by VMware recently. Anthony has provided a handy summary here.
  • Everything’s an opinion until people actually do it, but I thought this research on cloud adoption from Leaseweb USA was interesting. I didn’t expect to see everyone putting their hands up and saying they’re all in on public cloud, but I was also hopeful that we, as an industry, hadn’t made things as unclear as they seem to be. Yay, hybrid!
  • Site sponsor StorONE has partnered with Tech Data Global Computing Components to offer an All-Flash Array as a Service solution.
  • Backblaze has done a nice job of talking about data protection and cloud storage through the lens of Star Wars.
  • This tip on removing particular formatting in Microsoft Word documents really helped me out recently. Yes I know Word is awful.
  • Someone was nice enough to give me an acknowledgement for helping review a non-fiction book once. Now I’ve managed to get a character named after me in one of John Birmingham’s epics. You can read it out of context here. And if you’re into supporting good authors on Patreon – then check out JB’s page here. He’s a good egg, and his literary contributions to the world have been fantastic over the years. I don’t say this just because we live in the same city either.

Storage Field Day – I’ll Be At Storage Field Day 19

Here’s some news that will get you in the holiday spirit. I’ll be heading to the US in late January for another Storage Field Day event. If you haven’t heard of the very excellent Tech Field Day events, you should check them out. I’m looking forward to time travel and spending time with some really smart people for a few days. It’s also worth checking back on the Storage Field Day 19 website during the event (January 22 – 24) as there’ll be video streaming and updated links to additional content. You can also see the list of delegates and event-related articles that have been published.

I think it’s a great line-up of both delegates and presenting companies (including a “secret company”) this time around. I know most of them, but there may also still be a few companies added to the line-up. I’ll update this if and when they’re announced.

I’d like to publicly thank in advance the nice folks from Tech Field Day who’ve seen fit to have me back, as well as my employer for letting me take time off to attend these events. Also big thanks to the companies presenting. It’s going to be a lot of fun. Seriously. If you’re in the Bay Area and want to catch up prior to the event, please get in touch. I’ll have some free time, so perhaps we could check out a Warriors game on the 18th and discuss the state of the industry?

Scale Computing Makes Big Announcement About Small HE150

Scale Computing recently announced the HE150 series of small edge servers. I had the chance to chat with Alan Conboy about the announcement, and thought I’d share some thoughts here.

 

Edge, But Smaller

I’ve written in the past about additions to the HC3 Edge Platform. But those things had a rack-mount form factor. The newly announced HE150 runs on Intel NUC devices. Wait, what? That’s right, hyper-converged infrastructure on really small PCs. But don’t you need a bunch of NICs to do HC3 properly? There’s no backplane switch requirement, as Scale uses software-defined networking to tunnel the backplane network across the NIC. The HC3 platform uses less than 1GB RAM per node, and each node has 2 cores. The storage sits on an NVMe drive and you can get hold of this stuff at a retail price of around $5K US for 3 nodes.

[image courtesy of Scale Computing]

Scale at Scale?

How do you deploy these kinds of things at scale then? Conboy tells me there’s full Ansible integration, RESTful API deployment capabilities, and they come equipped with Intel AMT. In short, these things can turn up at the remote site, be plugged in, and be ready to go.

Where would you?

The HE150 solution is 100% specific to multi-site edge implementations. It’s not trying to go after workloads that would normally be serviced by the HE500 or HE1000. Where it can work though, is with:

  • Oil and Gas exploration – with one in each ship (they need 4-5 VMs to handle sensor data to make command decisions)
  • Grocery and retail chains
  • Manufacturing platforms
  • Telcos – pole-side boxes

In short, think of environments that require some amount of compute and don’t have IT people to support it.

 

Thoughts

I’ve been a fan of what Scale Computing has been doing with HCI for some time now. Scale’s take on making things simple across the enterprise has been refreshing. While this solution might surprise some folks, it strikes me that there’s an appetite for this kind of thing in the marketplace. The edge is often a place where less is more, and there are often not a lot of resources available to do basic stuff, like deploy a traditional, rackmounted compute environment. But a small, 3-node HCI cluster that can be stacked away in a stationery cupboard? That might just work. Particularly if you only need a few virtual machines to meet those compute requirements. As Conboy pointed out to me, Scale isn’t looking to use this as a replacement for the higher-performing options it has available. Rather, this solution is perfect for highly distributed retail environments where they need to do one or two things and it would be useful if they didn’t do those things in a data centre located hundreds of kilometres away.

If you’re not that excited about Intel NUCs though, you might be happy to hear that solutions from Lenovo will be forthcoming shortly.

The edge presents a number of challenges to enterprises, in terms of both its definition and how to deal with it effectively. Ultimately, the success of solutions like this will hinge on ease of use, reliability, and whether they really are fit for purpose. The good folks at Scale don’t like to go off half-cocked, so you can be sure some thought went into this product – it’s not just a science project. I’m keen to see what the uptake is like, because I think this kind of solution has a place in the market. The HE150 is available for purchase from Scale Computing now. It’s also worth checking out the Scale Computing presentations at Tech Field Day 20.

InfiniteIO And Your Data – Making Meta Better

InfiniteIO recently announced its new Application Accelerator. I had the opportunity to speak about the news with Liem Nguyen (VP of Marketing) and Kris Meier (VP of Product Management) from InfiniteIO and thought I’d share some thoughts here.

 

Metadata Is Good, And Bad

When you think about file metadata you might think about photos and the information they store that tells you about where the photo was taken, when it was taken, and the kind of camera used. Or you might think of an audio file and the metadata that it contains, such as the artist name, year of release, track number, and so on. Metadata is a really useful thing that tells us an awful lot about data we’re storing. But things like simple file read operations make use of a lot of metadata just to open the file:

  • During the typical file read, 7 out of 8 operations are metadata requests which significantly increases latency; and
  • Up to 90% of all requests going to NAS systems are for metadata.

[image courtesy of InfiniteIO]

 

Fire Up The Metadata Engine

Imagine how much faster storage would be if it only had to service 10% of the requests it does today. The Application Accelerator helps with this by:

  • Separating metadata request processing from file I/O
  • Responding directly to metadata requests at the speed of DRAM – much faster than a file system

[image courtesy of InfiniteIO]

The cool thing is it’s a simple deployment – installed like a network switch requiring no changes to workflows.
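
To put the numbers quoted above into perspective, here’s a quick back-of-the-envelope sketch in Python. It assumes a best case where every metadata request is answered by the accelerator and never reaches the NAS, and the request count is a made-up figure for illustration only. The function name is mine, not anything from InfiniteIO.

```python
def remaining_load(total_requests: int, metadata_fraction: float) -> int:
    """Requests the NAS still has to serve if all metadata is answered elsewhere."""
    if not 0.0 <= metadata_fraction <= 1.0:
        raise ValueError("metadata_fraction must be between 0 and 1")
    return round(total_requests * (1.0 - metadata_fraction))


# Hypothetical workload size, chosen purely for illustration.
total = 1_000_000

# The two ratios quoted above: 7 out of 8 operations, and up to 90% of requests.
for label, fraction in [("7 of 8 operations", 7 / 8), ("up to 90% of requests", 0.90)]:
    left = remaining_load(total, fraction)
    print(f"{label}: {left:,} of {total:,} requests still reach the NAS")
```

In practice the saving will depend on how much of the metadata traffic can actually be served from the accelerator, so treat this as an upper bound rather than a prediction.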

 

Thoughts and Further Reading

Metadata is a key part of information management. It provides data with a lot of extra information that makes that data more useful to applications that consume it and to the end users of those applications. But this metadata has a cost associated with it. You don’t think about the amount of activity that happens with simple file operations, but there is a lot going on. It gets worse when you look at activities like AI training and software build operations. The point of a solution like the Application Accelerator is that, according to InfiniteIO, your primary storage devices could be performing at another level if another device was doing the heavy lifting when it came to metadata operations.

Sure, it’s another box in the data centre, but the key to the Application Accelerator’s success is the software that sits on the platform. When I saw the name my initial reaction was that filesystem activities aren’t applications. But they really are, and more and more applications are leveraging data on those filesystems. If you could reduce the load on those filesystems to the extent that InfiniteIO suggests then the Application Accelerator becomes a critical piece of the puzzle.

You might not care about increasing the performance of your applications when accessing filesystem data. And that’s perfectly fine. But if you’re using a lot of applications that need high performance access to data, or your primary devices are struggling under the weight of your workload, then something like the Application Accelerator might be just what you need. For another view, Chris Mellor provided some typically comprehensive coverage here.

Random Short Take #26

Welcome to my semi-regular, random news post in a short format. This is #26. I was going to start naming them after my favourite basketball players. This one could be the Korver edition, for example. I don’t think that’ll last though. We’ll see. I’ll stop rambling now.

Excelero And The NVEdge

It’s been a little while since I last wrote about Excelero. I recently had the opportunity to catch up with Josh Goldenhar and Tom Leyden and thought I’d share some of my thoughts here.

 

NVMe Performance Good, But Challenging

NVMe has really delivered storage performance improvements in recent times.

All The Kids Are Doing It

Great performance:

  • Up to 1.2M IOPs, 6GB/s per drive
  • Ultra-low latency (20μs)

Game changer for data-intensive workloads:

  • Mission-Critical Databases
  • Analytical Processing
  • AI and Machine Learning

But It’s Not Always What You’d Expect

IOPs and Bandwidth Utilisation

  • Applications struggle to use local NVMe performance beyond 3-4 drives
  • Stranded IOPS and / or bandwidth = poor ROI

Sharing is the Logical Answer, with local latency

  • Physical disaggregation is often operationally desirable
  • 24 Drive servers are common and readily available
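
The stranded performance problem is easier to see with some rough arithmetic. This sketch uses the figures from the lists above (up to 1.2M IOPs per drive, applications topping out at around 4 drives, and a 24-drive server). It assumes every drive hits its peak number, which is optimistic, and the function name is just something I made up for the example.

```python
PER_DRIVE_IOPS = 1_200_000  # "up to 1.2M IOPs" per drive
DRIVES_PER_SERVER = 24      # "24 Drive servers are common"
USABLE_DRIVES = 4           # upper end of the "3-4 drives" an application can use


def stranded_iops(drives: int, usable_drives: int, per_drive_iops: int) -> tuple[int, float]:
    """Return (stranded IOPS, utilisation) for a server using only local NVMe."""
    if drives <= 0:
        raise ValueError("drives must be positive")
    potential = drives * per_drive_iops
    used = min(drives, usable_drives) * per_drive_iops
    return potential - used, used / potential


stranded, utilisation = stranded_iops(DRIVES_PER_SERVER, USABLE_DRIVES, PER_DRIVE_IOPS)
print(f"Stranded: {stranded:,} IOPS, utilisation: {utilisation:.1%}")
```

On those assumptions only around a sixth of the performance you paid for is being used, which is the poor ROI the list is getting at.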

Data Protection Desired

  • NVMe performs, but by itself offers no data protection
  • Local data protection does not protect against server failures

Some NVMe-over-fabrics solutions offer controller-based data protection, but limit IOPs and bandwidth, and sacrifice latency.

 

Scale Up Or Out?

NVMesh – Scale-out design: data centre scale

  • Disaggregated & converged architecture
  • No CPU overhead: no noisy neighbours
  • Lowest latency: 5μs

NVEdge – Scale-up design: rack scale

  • Disaggregated architecture
  • Full bandwidth even at 4K IO
  • Client-less architecture with NVMe-oF initiators
  • Enterprise-ready: RAID 1/0, High Availability with fast failover, Thin Provisioning, CRC

 

Flexible Deployment Models

There are a few different ways you can deploy Excelero.

Converged – Local NVMe drives in Application Servers

  • Single, unified storage pool
  • NVMesh initiator and client on all nodes
  • NVMesh bypasses server CPU
  • Various protection levels
  • No dedicated storage servers needed
  • Linearly scalable
  • Highest aggregate bandwidth

Top-of-Rack Flash

  • Single, unified storage pool
  • NVMesh Target runs on dedicated storage nodes
  • NVMesh Client runs on application servers
  • Applications get performance of local NVMe storage
  • Various Protection Levels
  • Linearly scalable

Data Protection

There are also a number of options when it comes to data resiliency.

[image courtesy of Excelero]

Networking Options

You can choose either TCP/IP or RDMA. TCP/IP offers a latency hit, but it works with any NIC (and your existing infrastructure). RDMA has super low latency, but is only available on a limited subset of NICs.

 

NVEdge Then?

Excelero described NVEdge as “block storage software for building NVMe Flash Arrays for demanding workflows such as AI, ML and databases in the Cloud and at the Edge”.

Scale-up architecture

  • High NVMe AFA performance, leveraging NVMe-oF
  • Full bandwidth performance even at 4K block size

High availability, supporting:

  • Dual-port NVMe drives
  • Dual controllers (with fast failover, less than 100ms)
  • Active / active controller operation and active/passive logical volume access

Data services include:

  • RAID 1/0 data protection
  • Thin Provisioning: thousands of striped volumes of up to 1PB each
  • Enterprise grade block checksums (CRC 16/32/64).
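
If you haven’t looked at block checksums before, the idea is pretty simple: store a CRC alongside each block on write, and recompute it on read to catch silent corruption. Here’s a minimal sketch using CRC-32 from Python’s standard library. This is only an illustration of the general technique, not how Excelero implements it, and the block size and function names are my own assumptions.

```python
import zlib

BLOCK_SIZE = 4096  # 4K block, chosen for illustration


def write_block(data: bytes) -> tuple[bytes, int]:
    """Return the block together with the CRC-32 that would be stored with it."""
    return data, zlib.crc32(data)


def verify_block(data: bytes, stored_crc: int) -> bool:
    """Recompute the CRC-32 on read and compare it with the stored value."""
    return zlib.crc32(data) == stored_crc


block, crc = write_block(bytes(BLOCK_SIZE))

# Simulate silent corruption by flipping a single bit in the block.
corrupted = bytearray(block)
corrupted[100] ^= 0x01

print("clean block verifies:", verify_block(block, crc))
print("corrupted block verifies:", verify_block(bytes(corrupted), crc))
```

A checksum like this only tells you something went wrong. You still need a second copy (the RAID 1/0 part) to recover the good data.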

Hardware Compatibility?

Supported Platforms

  • x86-based systems for higher aggregate performance
  • SmartNIC-based architectures for lower power & cost

HW Requirements

  • Each controller has PCIe connectivity to all drives
  • Controllers can communicate over a network
  • Controllers communicate over both the network and drive pairs to identify connectivity (failure) issues

Supported Networking

  • RDMA (InfiniBand or Ethernet)
  • TCP/IP networking

 

Thoughts and Further Reading

NVMe has been a good news story for folks struggling with the limitations of the SAS protocol. I’ve waxed lyrical in the past about how impressed I was with Excelero’s offering. Not every workload is necessarily suited to NVMesh though, and NVEdge is an interesting approach to solving that problem. Where NVMesh provides a tonne of flexibility when it comes to deployment options and the hardware used, NVEdge doubles down on availability and performance for different workloads.

NVMe isn’t a handful of magic beans that will instantly have your storage workloads. You need to be able to feed it to really get value from it, and you need to be able to protect it too. It comes down to understanding what it is you’re trying to achieve with your applications, rather than just splashing cash on the latest storage protocol in the hope that it will make your business more money.

At this point I’d make some comment about data being the new oil, but I don’t really have enough background in the resources sector to be able to carry that analogy much further than that. Instead I’ll say this: data (in all of its various incarnations) is likely very important to your business. It might be something relatively straightforward like seismic data, or financial results, or policy documents, or it may be the value that you can extract from that data by having fast access to a lot of it. Whatever you’re doing with it, you’re likely investing in hardware and software that helps you get to that value. Excelero appears to have focused on ensuring that the ability to access data in a timely fashion isn’t the thing that holds you back from achieving your data value goals.

Datrium Enhances DRaaS – Makes A Cool Thing Cooler

Datrium recently made a few announcements to the market. I had the opportunity to speak with Brian Biles (Chief Product Officer, Co-Founder), Sazzala Reddy (Chief Technology Officer and Co-Founder), and Kristin Brennan (VP of Marketing) about the news and thought I’d cover it here.

 

Datrium DRaaS with VMware Cloud

Before we talk about the new features, let’s quickly revisit the DRaaS for VMware Cloud offering, announced by Datrium in August this year.

[image courtesy of Datrium]

The cool thing about this offering was that, according to Datrium, it “gives customers complete, one-click failover and failback between their on-premises data center and an on-demand SDDC on VMware Cloud on AWS”. There are some real benefits to be had for Datrium customers, including:

  • Highly optimised, and more efficient than some competing solutions;
  • Consistent management for both on-premises and cloud workloads;
  • Eliminates the headaches as enterprises scale;
  • Single-click resilience;
  • Simple recovery from current snapshots or old backup data;
  • Cost-effective failback from the public cloud; and
  • Purely software-defined DRaaS on hyperscale public clouds for reduced deployment risk long term.

But what if you want a little flexibility in terms of where those workloads are recovered? Read on.

Instant RTO

So you’re protecting your workloads in AWS, but what happens when you need to stand up stuff fast in VMC on AWS? This is where Instant RTO can really help. There’s no rehydration or backup “recovery” delay. Datrium tells me you can perform massively parallel VM restarts (hundreds at a time) and you’re ready to go in no time at all. The full RTO varies by run-book plan, but by booting VMs from a live NFS datastore, you know it won’t take long. Failback uses VADP.

[image courtesy of Datrium]

The only cost during normal business operations (when not testing or deploying DR) is the cost of storing ongoing backups. And these are automatically deduplicated, compressed and encrypted. In the event of a disaster, Datrium DRaaS provisions an on-demand SDDC in VMware Cloud on AWS for recovery. All the snapshots in S3 are instantly made executable on a live, cloud-native NFS datastore mounted by ESX hosts in that SDDC, with caching on NVMe flash. Instant RTO is available from Datrium today.
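
To give a rough feel for why deduplicated and compressed backups keep that storage cost down, here’s a small Python sketch of a content-addressed block store. It is an illustration of the general technique only and says nothing about how Datrium actually does it. The function names and block contents are assumptions, and encryption is left out because the standard library doesn’t include a cipher.

```python
import hashlib
import zlib


def store_blocks(blocks: list[bytes], store: dict[str, bytes]) -> list[str]:
    """Add blocks to a content-addressed store and return the list of digests."""
    recipe = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # deduplicate: identical blocks are stored once
            store[digest] = zlib.compress(block)  # compress before storing
        recipe.append(digest)
    return recipe


def restore_blocks(recipe: list[str], store: dict[str, bytes]) -> list[bytes]:
    """Rebuild the original block sequence from its list of digests."""
    return [zlib.decompress(store[digest]) for digest in recipe]


store: dict[str, bytes] = {}
backup = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # first and third blocks match
recipe = store_blocks(backup, store)

stored_bytes = sum(len(value) for value in store.values())
print(f"{len(backup)} blocks backed up, {len(store)} unique blocks stored")
print(f"{sum(len(b) for b in backup)} bytes in, {stored_bytes} bytes stored")
```

The backup is described by a short list of digests, so a new snapshot that shares most of its blocks with an older one adds very little to the bill.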

DRaaS Connect

DRaaS Connect extends the benefits of Instant RTO DR to any vSphere environment. DRaaS Connect is available for two different vSphere deployment models:

  • DRaaS Connect for VMware Cloud offers instant RTO disaster recovery from an SDDC in one AWS Availability Zone (AZ) to another;
  • DRaaS Connect for vSphere On Prem integrates with any vSphere physical infrastructure on-premises.

[image courtesy of Datrium]

DRaaS Connect for vSphere On Prem extends Datrium DRaaS to any vSphere on-premises infrastructure. It will be managed by a DRaaS cloud-based control plane to define VM protection groups and their frequency, replication and retention policies. On failback, DRaaS will return only changed blocks back to vSphere and the local on-premises infrastructure through DRaaS Connect.
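
Returning only changed blocks on failback matters because you’re usually paying to move data out of the cloud. Here’s a minimal sketch of the idea in Python, comparing block hashes to work out what needs to travel. Hash comparison is a stand-in I’ve chosen for the example, not a description of how DRaaS Connect tracks changes, and it only handles blocks that were modified or appended.

```python
import hashlib


def changed_blocks(original: list[bytes], current: list[bytes]) -> dict[int, bytes]:
    """Return {block index: new contents} for blocks that differ from the original."""
    original_digests = [hashlib.sha256(block).digest() for block in original]
    changes = {}
    for index, block in enumerate(current):
        is_new = index >= len(original_digests)
        if is_new or hashlib.sha256(block).digest() != original_digests[index]:
            changes[index] = block
    return changes


def apply_changes(original: list[bytes], changes: dict[int, bytes]) -> list[bytes]:
    """Apply the changed blocks to the on-premises copy."""
    result = list(original)
    for index in sorted(changes):
        if index < len(result):
            result[index] = changes[index]
        else:
            result.append(changes[index])  # assumes appended blocks are contiguous
    return result


on_prem = [b"a" * 8, b"b" * 8, b"c" * 8]            # state before the failover
in_cloud = [b"a" * 8, b"X" * 8, b"c" * 8, b"d" * 8]  # state after running in the cloud

changes = changed_blocks(on_prem, in_cloud)
print(f"{len(changes)} of {len(in_cloud)} blocks need to be sent back")
```

The less that changed while you were running in the cloud, the less you have to send home.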

The other cool things to note about DRaaS Connect are that:

  • There’s no Datrium DHCI system required
  • It’s a downloadable VM
  • You can start protecting workloads in minutes

DRaaS Connect will be available in Q1 2020.

 

Thoughts and Further Reading

Datrium announced some research around disaster recovery and ransomware in enterprise data centres in concert with the product announcements. Some of it wasn’t particularly astonishing, with folks keen to leverage pay as you go models for DR, and wanting easier mechanisms for data mobility. What was striking is that one of the main causes of disasters is people, not nature. Years ago I remember we used to plan for disasters that invariably involved some kind of flood, fire, or famine. Nowadays, we need to plan for some script kid pumping some nasty code onto our boxes and trashing critical data.

I’m a fan of companies that focus on disaster recovery, particularly if they make it easy for consumers to access their services. Disasters happen frequently. It’s not a matter of if, just a matter of when. Datrium has acknowledged that not everyone is using their infrastructure, but that doesn’t mean it can’t offer value to customers using VMC on AWS. I’m not 100% sold on Datrium’s vision for “disaggregated HCI” (despite Hugo’s efforts to educate me), but I am a fan of vendors focused on making things easier to consume and operate for customers. Instant RTO and DRaaS Connect are both features that round out the DRaaS for VMware Cloud on AWS quite nicely.

I haven’t dived as deep into this as I’d like, but Andre from Datrium has written a comprehensive technical overview that you can read here. Datrium’s product overview is available here, and the product brief is here.

Brisbane VMUG – November 2019

The November 2019 edition of the Brisbane VMUG meeting will be held on Tuesday 26th November at Fishburners from 4pm – 6pm. It’s sponsored by VMware and promises to be a great afternoon.

Here’s the agenda:

  • VMUG Intro
  • VMware Presentation: VMware the network company! A presentation by Francois Prowse
  • Q&A
  • Refreshments and drinks

Join us for an end of year celebration to thank the VMUG community for all their efforts in 2019 as well as hearing from VMware from a different perspective. You can find out more information and register for the event here. I hope to see you there. Also, if you’re interested in sponsoring one of these events, please get in touch with me and I can help make it happen.

Random Short Take #25

Want some news? In a shorter format? And a little bit random? Here’s a short take you might be able to get behind. Welcome to #25. This one seems to be dominated by things related to Veeam.

  • Adam recently posted a great article on protecting VMConAWS workloads using Veeam. You can read about it here.
  • Speaking of Veeam, Hal has released v2 of MS Office 365 Backup Analysis Tool. You can use it to work out how much capacity you’ll need to protect your O365 workloads. And you can figure out what your licensing costs will be, as well as a bunch of other cool stuff.
  • And in more Veeam news, the VeeamON Virtual event is coming up soon. It will be run across multiple timezones and should be really interesting. You can find out more about that here.
  • This article by Russ on copyright and what happens when bots go wild made for some fascinating reading.
  • Tech Field Day turns 10 years old this year, and Stephen has been running a series of posts covering some of the history of the event. Sadly I won’t be able to make it to the celebration at Tech Field Day 20, but if you’re in the right timezone it’s worthwhile checking it out.
  • Need to connect to an SMB share on your iPad or iPhone? Check out this article (assuming you’re running iOS 13 or iPadOS 13.1).
  • It grinds my gears when this kind of thing happens. But if the mighty corporations have launched a line of products without thinking it through, we shouldn’t expect them to maintain that line of products. Right?
  • Storage and Hollywood can be a real challenge. This episode of Curtis’s podcast really got into some of the details with Jeff Rochlin.