Komprise Continues To Gain Momentum

I first encountered Komprise at Storage Field Day 17, and was impressed by the offering. I recently had the opportunity to take a briefing with Krishna Subramanian, President and COO at Komprise, and thought I’d share some of my notes here.

 

Momentum

Funding

The primary reason for our call was to discuss Komprise’s Series C funding round of US $24 million. You can read the press release here. Some noteworthy achievements include:

  • Revenue has more than doubled every single quarter, with existing customers steadily growing the amount of data they manage with Komprise; and
  • Some customers are now managing hundreds of PB with Komprise.

 

Key Verticals

Komprise are currently operating in the following key verticals:

  • Genomics and health care, with rapidly growing footprints;
  • Financial services and insurance (5 of the 10 largest insurance companies in the world apparently use Komprise);
  • Universities, particularly research-heavy environments; and
  • Media and entertainment.

 

What’s It Do Again?

Komprise manages unstructured data over three key protocols (NFS, SMB, S3). You can read more about the product itself here, but some of the key features include the ability to “Transparently archive data”, as well as being able to put a copy of your data in another location (the cloud, for example).

 

So What’s New?

One of Komprise’s recent announcements was NAS-to-NAS migration. If, for example, you’d like to migrate your data from an Isilon environment to FlashBlade, all you have to do is set one as the source and the other as the target. ACLs are fully preserved across all scenarios, and Komprise does all the heavy lifting in the background.

They’re also working on what they call “Deep Analytics”. Komprise already aggregates file analytics data very efficiently. They’re now working on indexing metadata on files and exposing that index. This will give you “a Google-like search on all your data, no matter where it sits”. The idea is that you can find data using any combination of metadata. The feature is in beta right now, and part of the new funding is being used to expand and grow this capability.

 

Other Things?

Komprise can be driven entirely via its API, making it potentially interesting for service providers and VARs wanting to add support for unstructured data and associated offerings to their solutions. You can also use Komprise to “confine” data: if you’re not sure whether any applications are still using a dataset, it can be quarantined first. Once you understand which applications are using which data (and when), you can use this feature to perform staged deletions.
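To give a feel for what “driven entirely via its API” might look like in practice, here’s a rough Python sketch. To be clear, the endpoint names, parameters, and response fields below are hypothetical placeholders of my own invention, not Komprise’s actual API; the point is simply the workflow of querying the analytics and then confining a candidate dataset.

```python
import requests

BASE_URL = "https://komprise.example.com/api/v1"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}       # token obtained out of band

# Hypothetical call: list shares with data untouched for more than a year.
shares = requests.get(
    f"{BASE_URL}/shares",
    headers=HEADERS,
    params={"lastAccessedBefore": "365d"},
    timeout=30,
).json()

for share in shares:
    print(f'{share["name"]}: {share["coldCapacityTB"]} TB of cold data')

# Hypothetical call: confine (quarantine) a share so you can watch for
# application breakage before committing to a staged deletion.
resp = requests.post(
    f"{BASE_URL}/shares/{shares[0]['id']}/confine",
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()
```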

 

Thoughts

I don’t often write articles about companies getting additional funding. I’m always very happy when they do, as it means someone thinks they’re on the right track, and that people will continue to stay employed. I thought this was interesting enough news to cover though, given that unstructured data, and its growth and management challenges, is an area I’m interested in.

When I first wrote about Komprise I joked that I needed something like this for my garage. I think it’s still a valid assertion, in a way. The enterprise, at least in the unstructured file space, is a mess based on what I’ve seen in the wild. Users and administrators continue to struggle with the sheer volume and size of the data they have under their management. Tools such as this can provide valuable insights into what data is being used in your organisation, and, perhaps more importantly, who is using it. My favourite part is that you can actually do something with this knowledge, using Komprise to copy, migrate, or archive old (and new) data to other locations to potentially reduce the load on your primary storage.

I bang on all the time about the importance of archiving solutions in the enterprise, particularly when companies have petabytes of data under their purview. Yet, for reasons that I can’t fully comprehend, a number of enterprises continue to ignore the problem they have with data hoarding, instead opting to fill their DCs and cloud storage with old data that they don’t use (and very likely don’t need to store). Some of this is due to the fact that some of the traditional archive solution vendors have moved on to other focus areas. And some of it is likely due to the fact that archiving can be complicated if you can’t get the business to agree to stick to their own policies for document management. In just the same way as you can safely delete certain financial information after an amount of time has elapsed, so too can you do this with your corporate data. Or, at the very least, you can choose to store it on infrastructure that doesn’t cost a premium to maintain. I’m not saying “Go to work and delete old stuff”. But, you know, think about what you’re doing with all of that stuff. And if there’s no value in keeping the “kitchen cleaning roster May 2012.xls” file any more, think about deleting it? Or, consider a solution like Komprise to help you make some of those tough decisions.

VMware vSphere and NFS – Some Links

Most of my experience with vSphere storage has revolved around various block storage technologies, such as DAS, FC and iSCSI. I recently began an evaluation of one of those fresh new storage startups running an NVMe-based system. We didn’t have the infrastructure to support NVMe-oF in our lab, so we’ve used NFS to connect the datastores to our vSphere environment. Obviously, at this point, it’s less about maximum performance and more about basic functionality. In any case, I thought it might be useful to include a series of links regarding NFS and vSphere that I’ve been using both to get up and running and to troubleshoot some minor issues we hit along the way, along with a small scripted example at the end of the list. Note that most of these links cover vSphere 6.5, as our lab is currently running that version.

Basics

Create an NFS Datastore

How to add NFS export to VMware ESXi 6.5

NFS Protocols and ESXi

Best Practice

Best Practices for running VMware vSphere on Network Attached Storage

Troubleshooting

Maximum supported volumes reached (1020652)

Increasing the default value that defines the maximum number of NFS mounts on an ESXi/ESX host (2239)

Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts (1003967)
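And if you’d rather script the datastore creation than click through the vSphere Client, here’s a minimal pyVmomi sketch of mounting an NFS export as a datastore. The vCenter address, credentials, NFS server, export path, and datastore name are placeholders for a lab environment rather than anything from the links above, so treat it as a starting point only.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; use proper certificates in production
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = view.view[0]  # first ESXi host found; select yours explicitly in practice

    spec = vim.host.NasVolume.Specification()
    spec.remoteHost = "nfs01.lab.local"      # NFS server presenting the export
    spec.remotePath = "/exports/nvme-ds01"   # export path on the array
    spec.localPath = "nvme-nfs-01"           # datastore name as seen by vSphere
    spec.accessMode = "readWrite"
    spec.type = "NFS"                        # "NFS41" for NFS v4.1 mounts

    host.configManager.datastoreSystem.CreateNasDatastore(spec)
finally:
    Disconnect(si)
```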

Random Short Take #11

Here are a few links to some random news items and other content that I found interesting. You might find them interesting too. Maybe. Happy New Year, too. I hope everyone’s feeling fresh and ready to tackle 2019.

  • I’m catching up with the good folks from Scale Computing in the next little while, but in the meantime, here’s what they got up to last year.
  • I’m a fan of the fruit company nowadays, but if I had to build a PC, this would be it (hat tip to Stephen Foskett for the link).
  • QNAP announced the TR-004 over the weekend and I had one delivered on Tuesday. It’s unusual that I have cutting edge consumer hardware in my house, so I’ll be interested to see how it goes.
  • It’s not too late to register for Cohesity’s upcoming Helios webinar. I’m looking forward to running through some demos with Jon Hildebrand and talking about how Helios helps me manage my Cohesity environment on a daily basis.
  • Chris Evans has published NVMe in the Data Centre 2.0 and I recommend checking it out.
  • I went through a basketball card phase in my teens. This article sums up my somewhat confused feelings about the card market (or lack thereof).
  • Elastifile Cloud File System is now available on the AWS Marketplace – you can read more about that here.
  • WekaIO have posted some impressive numbers over at spec.org if you’re into that kind of thing.
  • Applications are still open for vExpert 2019. If you haven’t already applied, I recommend it. The program is invaluable in terms of vendor and community engagement.

 

 

Storage Field Day – I’ll Be At Storage Field Day 18

Here’s some good news for you. I’ll be heading to the US in late February for another Storage Field Day event. If you haven’t heard of the very excellent Tech Field Day events, you should check them out. I’m looking forward to time travel and spending time with some really smart people for a few days. It’s also worth checking back on the Storage Field Day 18 website during the event (February 27 – March 1) as there’ll be video streaming and updated links to additional content. You can also see the list of delegates and event-related articles that have been published.

I think it’s a great line-up of both delegates and presenting companies (including a “secret company”) this time around. I know them all pretty well, but there may also still be a few companies added to the line-up. I’ll update this if and when they’re announced.

I’d like to publicly thank in advance the nice folks from Tech Field Day who’ve seen fit to have me back, as well as my employer for letting me take time off to attend these events. Also big thanks to the companies presenting. It’s going to be a lot of fun. Seriously. If you’re in the Bay Area and want to catch up prior to the event, please get in touch. I’ll have some free time, so perhaps we could check out a Warriors game on the 23rd and discuss the state of the industry? ;)

OT – I Voted. Now It’s Over To You

Eric Siebert has opened up voting for the Top vBlog 2018. I’m listed on the vLaunchpad, and you can vote for me under the storage and independent blog categories as well. There are a bunch of great blogs listed on Eric’s vLaunchpad, so if nothing else you may discover someone you haven’t heard of before, and chances are they’ll have something to say that’s worth checking out. If this stuff seems a bit needy, it is. But it’s also nice to have people actually acknowledging what you’re doing. I’m hoping that people find this blog useful, because it really is a labour of love (random vendor t-shirts notwithstanding).

Elastifile Announces v3.0

Elastifile recently announced version 3.0 of their product. I had the opportunity to speak to Jerome McFarland (VP of Marketing) and thought I’d share some information from the announcement here. If you haven’t heard of them before, “Elastifile augments public cloud capabilities and facilitates cloud consumption by delivering enterprise-grade, scalable file storage in the cloud”.

 

The Announcement

ClearTier

One of the major features of the 3.0 release is “ClearTier”, delivering integration between file and object storage in public clouds. With ClearTier, you have object storage expanding the file system namespace. The cool thing about this is that Elastifile’s ECFS provides transparent read / write access to all data. No need to re-tool applications to take advantage of the improved economics of object storage in the public cloud.

How Does It Work?

All data is accessible through ECFS via a standard NFS mount, and application access to object data is routed automatically. Data tiering occurs automatically according to user-defined policies specifying:

  • The targeted capacity ratio between file and object;
  • Eligibility for data demotion (i.e. minimum time since last access); and
  • Promotion policies that control the response to object data access.
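To make the shape of such a policy concrete, here’s how I’d describe one in plain Python. The field names are mine, not Elastifile’s actual policy schema; they simply mirror the three knobs above.

```python
# Purely illustrative -- these field names are not Elastifile's actual schema.
cleartier_policy = {
    # Targeted capacity ratio between the file and object tiers
    "capacity_ratio": {"file": 0.20, "object": 0.80},
    # Eligibility for demotion: data idle for this long can move to object
    "demotion": {"min_days_since_last_access": 30},
    # Promotion policy: what happens when demoted data is accessed again
    "promotion": {"on_access": "promote_back_to_file"},
}

def target_tier(file_record, policy):
    """Toy decision: demote anything idle longer than the policy allows."""
    idle = file_record["days_since_last_access"]
    if idle >= policy["demotion"]["min_days_since_last_access"]:
        return "object"
    return "file"

print(target_tier({"days_since_last_access": 90}, cleartier_policy))  # -> object
```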

Bursting

ClearTier gets even more interesting when you combine it with Elastifile’s CloudConnect, by using CloudConnect to get data to the public cloud in the first place, and then using ClearTier to push data to object storage.

[image courtesy of Elastifile]

It becomes a simple process, and consists of two steps:

  1. Move on-premises data (from any NAS) to cloud-based object storage using CloudConnect; and
  2. Deploy ECFS with a pointer to the designated object store.

Get Snappy

ClearTier also provides the ability to store snapshots on an object tier. Snapshots occur automatically according to user-defined policies specifying:

  • Data to include;
  • Destination for snapshot (i.e. file storage / object storage); and
  • Schedule for snapshot creation.

The great thing is that all snapshots are accessible through ECFS via the same NFS mount.

 

Thoughts And Further Reading

I was pretty impressed with Elastifile’s CloudConnect solution when they first announced it. When you couple CloudConnect with something like ClearTier, and have it sitting on top of the ECFS foundation, it strikes me as a pretty cool solution. If you’re using applications that rely heavily on NFS, for example, ClearTier gives you a way to leverage the traditionally low cost of cloud object storage with the improved performance of file. I like the idea that you can play with the ratio of file and object, and I’m a big fan of not having to re-tool my file-centric applications to take advantage of object economics. The ability to store a bunch of snapshots on the object tier also adds increased flexibility in terms of data protection and storage access options.

The ability to burst workloads is exactly the kind of technical public cloud use case that we’ve been talking about in slideware for years now. The reality, however, has been somewhat different. It looks like Elastifile are delivering a solution that competes aggressively with some of the leading cloud providers’ object solutions, whilst also giving the storage array vendors, now dabbling in cloud solutions, pause for thought. There are a bunch of interesting use cases, particularly if you need to access a lot of compute and large data sets via file-based storage in a cloud environment for short periods of time. If you’re looking for a cost-effective, scalable storage solution, I think that Elastifile are worth checking out.

Storage Field Day 17 – Wrap-up and Link-o-rama

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

This is a quick post to say thanks once again to Stephen and Ben, and the presenters at Storage Field Day 17. I had a super fun and educational time. For easy reference, here’s a list of the posts I did covering the events (they may not match the order of the presentations).

Storage Field Day – I’ll Be At Storage Field Day 17

Storage Field Day 17 – (Fairly) Full Disclosure

I Need Something Like Komprise For My Garage

NGD Systems Are On The Edge Of Glory

Intel’s Form Factor Is A Factor

StarWind Continues To Do It Their Way

 

Also, here are a number of links to posts by my fellow delegates (in no particular order). They’re all very smart people, and you should check out their stuff, particularly if you haven’t before. I’ll attempt to keep this updated as more posts are published. But if it gets stale, the Storage Field Day 17 landing page will have updated links.

 

Max Mortillaro (@DarkkAvenger)

I will be at Storage Field Day 17! Wait what is a “Storage Field Day”?

The Rise of Computational Storage

Komprise: Data Management Made Easy

What future for Intel Optane?

 

Ray Lucchesi (@RayLucchesi)

Screaming IOP performance with StarWind’s new NVMeoF software & Optane SSDs

GreyBeards talk Computational Storage with Scott Shadley VP Marketing NGD Systems

 

Howard Marks (@DeepStorageNet)

 

Arjan Timmerman (@ArjanTim)

EP10 – Computational Storage: A Paradigm Shift In The Storage Industry with Scott Shadley and NGD Systems

EP11 – Data Management with Komprise: Transformation without Disruption – with Krishna Subramanian

Enable your Data: Komprise

 

Aaron Strong (@TheAaronStrong)

Komprise Systems Overview from #SFD17

NGD Systems from #SFD17

StarWind NVMeoF

 

Jeffrey Powers (@Geekazine)

Komprise Transforming Data Management with Disruption at SFD17

Starwind NVMe Over Fabrics for SMB and ROBO at SFD17

NGD Systems Help Make Cat Searches Go Faster with Better Results at SFD17

 

Joe Houghes (@JHoughes)

 

Luigi Danakos (@NerdBlurt)

Tech Stand UP Episode 8 – SFD17 – Initial Thoughts on Komprise Podcast

Tech Stand Up Episode 9 – SFD17 – Initial thoughts NGD Systems Podcast

 

Mark Carlton (@MCarlton1983)

 

Enrico Signoretti (@ESignoretti)

Secondary Storage Is The New Primary

The Era of Composable Storage Infrastructures is Coming

The Fascinating World Of Computational Storage

Secondary Data and Komprise with Krishna Subramanian

 

Jon Hudson (@_Desmoden)

 

[photo courtesy of Ramon]

StarWind Continues To Do It Their Way

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

StarWind recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here.

 

StarWind Do All Kinds Of Stuff

I’ve written enthusiastically about StarWind previously. If you’re unfamiliar with them, they have three main focus areas:

They maintain a strict focus on the SMB and Enterprise ROBO markets, and aren’t looking to be the next big thing in the enterprise any time soon.

 

So What’s All This About NVMe [over Fabrics]?

According to Max and the team, NVMe over Fabrics is “the next big thing in [network] storage”. Here’s a photo of Max saying just that.

Why Hate SAS?

It’s not that people hate SAS; it’s just that the SAS protocol was designed for disk, while NVMe was designed for Flash devices.

Here’s how the two stack up (SAS / iSCSI / iSER on the one hand, NVMe [over Fabrics] on the other):

  • A complex driver built around archaic SCSI vs. a simple driver built around a block device (R/W);
  • A single short queue per controller vs. one controller per device, with no bottlenecks;
  • A single short queue per device vs. many long queues per device;
  • Serialised access with locks vs. non-serialised access with no locks; and
  • Many-to-One-to-Many vs. Many-to-Many, true point-to-point.

 

You Do You, Boo

StarWind have developed their own NVMe SPDK for Windows Server (as Intel doesn’t currently provide one). In early development they had some problems with high CPU overheads. CPU might be a “cheap resource”, but you still don’t want to use up 8 cores dishing out IO for a single device. They’ve managed to move a lot of the work to user space and cut down on core consumption. They’ve also built their own Linux (CentOS) based initiator for NVMe over Fabrics. They’ve developed an NVMe-oF initiator for Windows by combining a Linux initiator and stub driver in the hypervisor. “We found the elegant way to bring missing SPDK functionality to Windows Server: Run it in a VM with proper OS! First benefit – CPU is used more efficiently”. They’re looking to do something similar with ESXi in the very near future.

 

Thoughts And Further Reading

I like to think of StarWind as the little company from Ukraine that can. They have a long, rich heritage in developing novel solutions to everyday storage problems in the data centre. They’re not necessarily trying to take over the world, but they’ve demonstrated before that they have an ability to deliver solutions that are unique (and sometimes pioneering) in the marketplace. They’ve spent a lot of time developing software storage solutions over the years, so it makes sense that they’d be interested to see what they could do with the latest storage protocols and devices. And if you’ve ever met Max and Anton (and the rest of their team), it makes even more sense that they wouldn’t necessarily wait around for Intel to release a Windows-based SPDK to see what type of performance they could get out of these fancy new Flash devices.

All of the big storage companies are coming out with various NVMe-based products, and a number are delivering NVMe over Fabrics solutions as well. There’s a whole lot of legacy storage that continues to dominate the enterprise and SMB storage markets, but I think it’s clear from presentations such as StarWind’s that the future is going to look a lot different in terms of the performance available to applications (both at the core and edge).

You can check out this primer on NVMe over Fabrics here, and the ratified 1.0a specification can be viewed here. Ray Lucchesi, as usual, does a much better job than I do of explaining things, and shares his thoughts here.

Intel’s Form Factor Is A Factor

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

The Intel Optane team recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here. I urge you to check out the videos, as there was a heck of a lot of stuff in there. But rather than talk about benchmarks and the SPDK, I’m going to focus on what’s happening with Intel’s approach to storage in terms of the form factor.

 

Of Form Factors And Other Matters Of Import

An Abbreviated History Of Drive Form Factors

Let’s start with a little bit of history to get you going. IBM introduced the first hard drive – the IBM 350 disk storage unit – in 1956. Over time we’ve gone from a variety of big old drives to smaller form factors. I’m not old enough to reminisce about the Winchester drives, but I do remember the 5.25″ drives in the XT. Wikipedia provides as good a place to start as any if you’re interested in knowing more about hard drives. In any case, we now have the following prevailing drive form factors in use:

  • 3.5″ drives – still reasonably common in desktop computers and “cheap and deep” storage systems;
  • 2.5″ drives (SFF) – popular in laptops and used as a “dense” form factor for a variety of server and storage solutions;
  • U.2 – mainstream PCIe SSD form factor that has the same dimensions as 2.5″ drives; and
  • M.2 – designed for laptops and tablets.

Challenges

There are a number of challenges associated with the current drive form factors. The most notable of these is the density issue. Drive (and storage) vendors have been struggling for years to try and cram more and more devices into smaller spaces whilst increasing device capacities as well. This has led to problems with cooling, power, and overall reliability. Basically, there’s only so much you can put in 1RU without the whole lot melting.

 

A Ruler? Wait, what?

Intel’s “Ruler” is a long (or short), ruler-like drive based on the EDSFF (Enterprise and Datacenter SSD Form Factor) specification. There’s a tech brief you can view here. There are a few different versions (basically long and short), and it still leverages NVMe via PCIe.

[image courtesy of Intel]

It’s Denser

You can cram a lot of these things in a 1RU server, as Super Micro demonstrated a few months ago.

  • Up to 32 E1.L 9.5mm drives per 1RU
  • Up to 48 E1.S drives per 1RU

Which means you could be looking at around a petabyte of raw storage in 1RU (using 32TB E1.L drives). This number is only going to go up as capacities increase. Instead of half a rack of 4TB SSDs, you can do it all in 1RU.
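For what it’s worth, the arithmetic behind that petabyte figure (using the 32TB E1.L drives mentioned above) is straightforward:

$$32 \ \text{drives per RU} \times 32\ \text{TB per drive} = 1024\ \text{TB} \approx 1\ \text{PB}$$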

It’s Cooler

Cooling has been a problem for storage systems for some time. A number of storage vendors have found out the hard way that jamming a bunch of drives in a small enclosure has a cost in terms of power and cooling. Intel tell us that they’ve had some (potentially) really good results with the E1.L and E1.S based on testing to date (in comparison to traditional SSDs). They talked about:

  • Up to 2x less airflow needed per E1.L 9.5mm SSD vs. U.2 15mm (based on Intel’s internal simulation results); and
  • Up to 3x less airflow needed per E1.S SSD vs. U.2 7mm.

Still Serviceable

You can also replace these things when they break. Intel say the drives are:

  • Fully front serviceable with an integrated pull latch;
  • Equipped with integrated, programmable LEDs; and
  • Capable of remote, drive-specific power cycling.

 

Thoughts And Further Reading

SAN and NAS became popular in the data centre because you could jam a whole bunch of drives in a central location and you weren’t limited by what a single server could support. For some workloads though, having storage decoupled from the server can be problematic either in terms of latency, bandwidth, or both. Some workloads need their storage as close to the processor as possible. Technologies such as NVMe over Fabrics are addressing that issue to an extent, and other vendors are working to bring the compute closer to the storage. But some people just want to do what they do, and they need more and more storage to do it. I think the “ruler” form factor is an interesting approach to the issue traditionally associated with cramming a bunch of capacity in a small space. It’s probably going to be some time before you see this kind of thing in data centres as a matter of course, because it takes a long time to change the way that people design their servers to accommodate new standards. Remember how long it took for SFF drives to become as common in the DC as they are? No? Well it took a while. Server designs are sometimes developed years (or at least months) ahead of their release to the market. That said, I think Intel have come up with a really cool idea here, and if they can address the cooling and capacity issues as well as they say they can, this will likely take off. Of course, the idea of having 1PB of data sitting in 1RU should be at least a little scary in terms of failure domains, but I’m sure someone will work that out. It’s just physics after all, isn’t it?

There’s also an interesting article at The Register on the newer generation of drive form factors that’s worth checking out.

NGD Systems Are On The Edge Of Glory

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

NGD Systems recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here.

 

Edgy

Storage and compute / processing requirements at the edge aren’t necessarily new problems. People have been trying to process data outside of their core data centres for some time now. NGD Systems have a pretty good handle on the situation, and explained it thusly:

  • A massive amount of data is now produced at the edge;
  • AI algorithms demand large amounts of data; and
  • Moving data to cloud is often not practical.

They’ve taken a different approach with “computational storage”, moving the compute to the storage. The problem to solve then becomes one of Power/TB + $/GB + in-situ processing. Their focus has been on delivering a power-efficient, low-cost, computational storage solution.

A Novel Solution – Move Compute to Storage

Key attributes:

  • Maintain a familiar methodology (no new learning);
  • Use standard protocols (NVMe) and processes (no new commands);
  • Minimise interface traffic (power and time savings); and
  • Enhance a limited footprint for maximum benefit (customer TCO).

Moving Computation to Data is Cheaper than moving Data

  • A computation requested by an application is much more efficient if it is executed near the data it operates on:
    • Minimises network traffic;
    • Increases effective throughput and performance of the system (e.g. the Hadoop Distributed File System); and
    • Enables distributed processing.
  • This is especially true for big data and analytics: large data sets and unstructured data.
  • The traditional approach – high-performance servers coupled with SAN/NAS storage – is eventually limited by networking bottlenecks (see the toy sketch below).
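To illustrate the point (and only to illustrate it – this is a toy sketch of the idea, not NGD’s programming model or API), compare filtering a data set on the host with pushing the predicate down to wherever the data lives:

```python
# Toy illustration of in-situ processing -- not NGD's actual interface.
records = [{"id": i, "tag": "cat" if i % 100 == 0 else "other"}
           for i in range(1_000_000)]

def host_side_filter(fetch_all, predicate):
    """Traditional model: every record crosses the interconnect, then the host filters."""
    return [r for r in fetch_all() if predicate(r)]        # ~1,000,000 records moved

def in_situ_filter(drive_execute, predicate):
    """Computational storage model: the predicate travels to the drive,
    and only the matching records travel back."""
    return drive_execute(predicate)                         # ~10,000 records moved

# Stand-ins for "the drive" that holds the data.
fetch_all = lambda: records
drive_execute = lambda pred: [r for r in records if pred(r)]

is_cat = lambda r: r["tag"] == "cat"
assert in_situ_filter(drive_execute, is_cat) == host_side_filter(fetch_all, is_cat)
```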

 

Thoughts and Further Reading

NGD are targeting some interesting use cases, including:

  • Hyperscalers;
  • Content Delivery Networks; and
  • The “Fog Storage” market.

They say these storage solutions address the need for low-power, more efficient compute without placing strain on edge and “fog” platforms. I found the CDN use case to be particularly interesting. When you have a bunch of IP-addressable drives sitting in a remote point of presence, it can sometimes be a pain to have them talking back to a centralised server to get decryption keys for protected content, for example. In this case you can have the drives do the key handling and authentication themselves, providing faster access to content than would otherwise be possible in latency-constrained environments.

It seems silly to quote Gaga Herself when writing about tech, but I think NGD Systems are taking a really interesting approach to solving some of the compute problems at the edge. They’re not just talking about jamming a bunch of disks together with some compute. Instead, they’re jamming the compute into each of the disks. It’s not a traditional approach to solving some of the challenges of the edge, but it seems like it has legs for those use cases mentioned above. Edge compute and storage is often deployed in reasonably rugged environments that are not as well equipped as large DCs in terms of cooling and power. The focus on delivering processing at the storage layer, using minimal power and standard protocols, is intriguing. They say they can do it at a reasonable price too, making the solution all the more appealing for those companies facing difficulties with more traditional edge storage and compute solutions.

You can check out the specifications of the Newport Platform here. Note that the available capacities depend on the form factor you’re consuming. There’s also a great paper on computational storage that you can download from here. For some other perspectives on computational storage, check out Max’s article here, and Enrico’s article here.