Datadobi Announces DobiMigrate 5.8 – Introduces Chain of Custody

Datadobi recently announced version 5.8 of its DobiMigrate software and introduced a “Chain of Custody” feature. I had the opportunity to speak to Carl D’Halluin and Michael Jack about the announcement and thought I’d share some thoughts on it here.

 

Don’t They Do File Migration?

If you’re unfamiliar with Datadobi, it’s a company that specialises in NAS migration software. It tends to get used a lot by the major NAS vendors as a rock-solid method of moving data off a competitor’s box and onto theirs. Datadobi has been around for quite a while, and a lot of the founders have heritage with EMC Centera.

Chain of Custody?

So what exactly does the Chain of Custody feature offer?

  • Tracking files and objects throughout an entire migration
  • Full photo-finish of source and destination system at cutover time
  • Forensic input which can serve as future evidence of tampering
  • Available for all migrations.
    • No performance hit.
    • No enlarged maintenance window.

[image courtesy of Datadobi]

Why Is This Important?

Organisations are subject to a variety of legislative requirements the world over to ensure that data presented as evidence in courts of law hasn’t been tampered with. Some of them spend an inordinate amount of money ensuring that their document management systems (and the hardware those systems reside on) offer all kinds of compliance and governance features, so that you can reliably get up in front of a judge and say that nothing has been messed with. Or you can reliably say that it has been messed with. Either way though, it’s reliable. Unfortunately, nothing lasts forever (not even those Centera cubes we put in years ago).

So what do you do when you have to migrate your data from one platform to another? If you’ve just used rsync or robocopy to get the data from one share to another, how can you reliably prove that you’ve done so without corrupting or otherwise tampering with the data? Logs are just files, after all, so what’s to stop someone “losing” some data along the way?
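This is essentially the gap a chain of custody fills. Datadobi hasn’t shared the internals of how DobiMigrate does it, but the core idea is easy to sketch: hash every file on the source, hash every file on the destination at cutover, and compare the two manifests. Here’s a minimal Python illustration of that concept (the share paths are made up, and a real implementation would also need to protect the manifests themselves from tampering, for example by signing them and storing them somewhere immutable):

```python
# Toy illustration only: build a per-file hash manifest for a directory tree
# and compare source against destination at cutover. This is not how
# DobiMigrate works internally; it just shows the kind of evidence a chain
# of custody is built from.
import hashlib
from pathlib import Path


def build_manifest(root: str) -> dict:
    """Map each file's relative path to its SHA-256 digest."""
    base = Path(root)
    manifest = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(base))] = digest
    return manifest


def compare(source: dict, destination: dict) -> list:
    """Return a list of discrepancies between two manifests."""
    problems = []
    for rel_path, digest in source.items():
        if rel_path not in destination:
            problems.append(f"missing on destination: {rel_path}")
        elif destination[rel_path] != digest:
            problems.append(f"content mismatch: {rel_path}")
    problems.extend(
        f"unexpected extra file: {rel_path}"
        for rel_path in destination
        if rel_path not in source
    )
    return problems


if __name__ == "__main__":
    # Hypothetical mount points for the old and new shares.
    issues = compare(build_manifest("/mnt/old_share"), build_manifest("/mnt/new_share"))
    print("photo-finish clean" if not issues else "\n".join(issues))
```

The point isn’t the hashing itself, it’s that the record of the hashes is generated and retained independently of the copy tool, which is exactly what an rsync log can’t give you.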

It turns out that a lot of folks in the legal profession have been aware that this was a problem for a while, but they’ve looked the other way. I am no lawyer, but as it was explained to me, if you introduce some doubt into the reliability of the migration process, it’s easy enough for the other side to counter that your stuff may not have been so reliable either, and the whole thing becomes something of a shambles. Of course, there’s likely a more coherent way to explain this, but this is a tech blog and I’m being lazy.

 

Thoughts

I’ve done all kinds of data migrations over the years. I think I’ve been fortunate that I’ve never specifically had to deal with a system that was being relied on seriously for legislative reasons, because I’m sure that some of those migrations were done more by the seat of my pants than anything else. Usually the last thing on the organisation’s mind was whether the migration activity was compliant or not. Instead, the focus of the project manager was normally to get the data from the old box to the new box as quickly as possible and with as little drama / downtime as possible.

If you’re working on this stuff in a large financial institution though, you’ll likely have a different focus. And I’m sure the last thing your corporate counsel wants to hear is that you’ve been playing a little fast and loose with data over the years. I anticipate this announcement will be greeted with some happiness by people who’ve been saddled with these kinds of daunting tasks in the past. As we move to a more and more digital world, we need to carry some of the concepts from the physical world across. It strikes me that Datadobi has every reason to be excited about this announcement. You can read the press release here.

 

QNAP – TR-004 Firmware Issue Workaround

I’ve been a user of QNAP products for over 10 years now. I have a couple of home systems running at the moment, including a TS-831X with a TR-004 enclosure attached to it. Last week I was prompted to update the external enclosure firmware to 1.1.0. After I did that, I had an issue where, once the unit spun down its disks, the volume would be marked as “Not active” by the system and I’d lose access to the data. Recovery was simple enough – I could either reboot the box or manually recover the enclosure via the QTS interface. I raised a job with QNAP web support, and we went back and forth with troubleshooting over the course of a week. The ticket was eventually escalated, and it was acknowledged that the current fix was to roll back to version 1.0.4 of the enclosure firmware.

The box is only used for media storage for Plex, but I figured it was worth backing up the contents of the external enclosure to another location in case something went wrong with the rollback. In any case, I’ve not done a downgrade on a QNAP device before, so I thought it was worth documenting the procedure here.

For some reason I needed to use Chrome over Safari in this example. I don’t know why that is. But whatever. In QTS, click on Storage & Snapshots, then Storage. Click on External RAID Management and then click on Check for Update.

You’ll see in this example, the installed TR-004 version is 1.1.0. Click on Browse to get the firmware file you want to roll back to.

You’ll get a stern warning that this kind of thing might cause problems.

Take a backup. Then tick the box.

The update will progress. It doesn’t take too long.

You then need to power off the enclosure and power it back on.

And, hopefully, your data will still be there. One side effect I noted was that the shared folder on that particular volume no longer had the correct permissions associated with the share. Fortunately, this is a home environment, and I’m using one user account to provide access to the share. I don’t know what you’d do if you had a complicated permissions situation in place.

And there you go. Like most things with QNAP, it’s a fairly simple process. This is the first time I’ve had to use QNAP support, and I found them responsive and helpful. I’ll report back if I get any other issues with the enclosure.

FalconStor Announces StorSafe

Remember FalconStor? You might have used its VTL product years ago? Or perhaps the Network Storage Server product? Anyway, it’s still around, and recently announced a new product. I had the opportunity to speak to Todd Brooks (CEO) and David Morris (VP Products) to discuss StorSafe, something FalconStor is positioning as “the industry’s first enterprise-class persistent data storage container”.

 

What Is It?

StorSafe is essentially a way to store data via containers. It has the following features:

  • Long-term archive storage capacity reduction drives low cost;
  • Multi-cloud archive storage;
  • Automatic archive integrity validation & journaling in the cloud;
  • Data egress fee optimisation; and
  • Unified Management and Analytics Console.

Persistent Virtual Storage Container

StorSafe is a bit different to the type of container you might expect from a company with VTL heritage.

  • Does not rely on traditional tape formats, e.g. LTO constraints
  • Variable Payload Capacity of Archival Optimisation by Type
  • Execution capabilities for Advanced Features
  • Encryption, Compression, and Best-in-Class Deduplication
  • Erasure coding for Redundancy across On-premise/Clouds
  • Portable – Transfer Container to Storage System or any S3 Cloud
  • Archival Retention for 10, 25, 50, & 100 years

[image courtesy of FalconStor]

Multi-Cloud Erasure Coding

  • The VSC is sharded into multiple Mini-Containers that are protected with Erasure Coding (a simplified sketch of the idea follows below)
  • These Mini-Containers can then be moved to multiple local, private data centres, or cloud destinations for archive
  • Tier Containers depending on Access Criticality or Limited Access needs

[image courtesy of FalconStor]
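FalconStor didn’t go into the specifics of its erasure coding scheme, so the following is just a toy illustration of the sharding idea: split a container into data shards plus parity, spread the shards across destinations, and rebuild from the survivors if one goes missing. A real implementation would use something like Reed-Solomon so it can survive multiple failures; this single-parity sketch only survives one.

```python
# Toy single-parity erasure code: split a blob into k data shards plus one
# XOR parity shard, so any ONE lost shard can be rebuilt from the survivors.
# Purely illustrative of the mini-container idea; FalconStor's actual scheme
# is not public and would tolerate more than a single failure.
from functools import reduce
from typing import List, Optional


def xor_all(chunks: List[bytes]) -> bytes:
    """XOR a list of equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)


def shard(blob: bytes, k: int = 4) -> List[bytes]:
    """Return k data shards plus one parity shard."""
    size = -(-len(blob) // k)               # ceiling division
    padded = blob.ljust(size * k, b"\0")    # pad so every shard is the same size
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    return data + [xor_all(data)]


def rebuild(shards: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one missing shard (marked None) from the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) <= 1, "single parity only survives one lost shard"
    if missing:
        shards[missing[0]] = xor_all([s for s in shards if s is not None])
    return shards


if __name__ == "__main__":
    pieces = shard(b"archive container payload", k=4)
    pieces[2] = None                        # pretend one destination is unreachable
    print(b"".join(rebuild(pieces)[:4]).rstrip(b"\0"))
```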

 

Thoughts And Further Reading

If you’re familiar with my announcement posts, you’ll know that I try to touch on the highlights provided to me by the vendor about its product, and add my own interpretation. I feel like I haven’t really done StorSafe justice here, however. It’s a cool idea in a lot of ways: take a bunch of storage, dump it all over the place in a distributed fashion, and have it be highly accessible and resilient. This isn’t designed for high performance storage requirements. This is very much focused on the kinds of data you’d be keen to store long-term, maybe on tape. I can’t tell you what this looks like from an implementation or performance perspective, so I can’t tell you whether the execution matches up with the idea that FalconStor has had. I find the promise of portability, particularly for data that you want to keep for a long time, extremely compelling. So let’s agree that this idea seems interesting, and watch this space for more on this as I learn more about it. You can read the press release here, and check out Mellor’s take on it here.

Random Short Take #31

Welcome to Random Short Take #31. A lot of good players have worn 31 in the NBA. You’d think I’d call this the Reggie edition (and I appreciate him more after watching Winning Time), but this one belongs to Brent Barry. This may be related to some recency bias I have, based on the fact that Brent is a commentator in NBA 2K19, but I digress …

  • Late last year I wrote about Scale Computing’s big bet on a small form factor. Scale Computing recently announced that Jerry’s Foods is using the HE150 solution for in-store computing.
  • I find Plex to be a pretty rock solid application experience, and most of the problems I’ve had with it have been client-related. I recently had a problem with a server update that borked my installation though, and had to roll back. Here’s the quick and dirty way to do that on macOS.
  • Here are 7 contentious thoughts on data protection from Preston. I think there are some great ideas here and I recommend taking the time to read this article.
  • I recently had the chance to speak with Michael Jack from Datadobi about the company’s announcement of its new DIY Starter Pack for NAS migrations. Whilst it seems that the professional services market for NAS migrations has diminished over the last few years, there’s still plenty of data out there that needs to be moved from one box to another. Robocopy and rsync aren’t always the best option when you need to move this much data around.
  • There are a bunch of things that people need to learn to do operations well. A lot of them are learnt the hard way. This is a great list from Jan Schaumann.
  • Analyst firms are sometimes misunderstood. My friend Enrico Signoretti has been working at GigaOm for a little while now, and I really enjoyed this article on the thinking behind the GigaOm Radar.
  • Nexsan recently announced some enhancements to its “BEAST” storage platforms. You can read more on that here.
  • Alastair isn’t just a great writer and moustache aficionado, he’s also a trainer across a number of IT disciplines, including AWS. He recently posted this useful article on what AWS newcomers can expect when it comes to managing EC2 instances.

Storage Field Day 19 – Wrap-up and Link-o-rama

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

This is a quick post to say thanks once again to Stephen and Ben, and the presenters at Storage Field Day 19. I had a super fun and educational time. For easy reference, here’s a list of the posts I did covering the events (they may not match the order of the presentations).

Storage Field Day – I’ll Be at Storage Field Day 19

Storage Field Day 19 – (Fairly) Full Disclosure

Tiger Technology Is Bridging The Gap

Western Digital, Composable Infrastructure, Hyperscalers, And You

Infrascale Protects Your Infrastructure At Scale

MinIO – Not Your Father’s Object Storage Platform

Dell EMC Isilon – Cloudy With A Chance Of Scale Out

NetApp And The StorageGRID Evolution

Komprise – Non-Disruptive Data Management

Stellus Is Doing Something With All That Machine Data

Dell EMC PowerOne – Not V(x)block 2.0

WekaIO And A Fresh Approach

Dell EMC, DevOps, And The World Of Infrastructure Automation

Also, here’s a number of links to posts by my fellow delegates (in no particular order). They’re all very smart people, and you should check out their stuff, particularly if you haven’t before. I’ll attempt to keep this updated as more posts are published. But if it gets stale, the Storage Field Day 19 landing page will have updated links.

 

Becky Elliott (@BeckyLElliott)

SFD19: No Komprise on Knowing Thy Data

SFD19: DellEMC Does DevOps

 

Chin-Fah Heoh (@StorageGaga)

Hadoop is truly dead – LOTR version

Zoned Technologies With Western Digital

Is General Purpose Object Storage Disenfranchised?

Tiger Bridge extending NTFS to the cloud

Open Source and Open Standards open the Future

Komprise is a Winner

Rebooting Infrascale

DellEMC Project Nautilus Re-imagine Storage for Streams

Paradigm shift of Dev to Storage Ops

StorageGRID gets gritty

Dell EMC Isilon is an Emmy winner!

 

Chris M Evans (@ChrisMEvans)

Storage Field Day 19 – Vendor Previews

Storage Management and DevOps – Architecting IT

Stellus delivers scale-out storage with NVMe & KV tech – Architecting IT

Can Infrascale Compete in the Enterprise Backup Market?

 

Ray Lucchesi (@RayLucchesi)

097: GreyBeards talk open source S3 object store with AB Periasamy, CEO MinIO

Gaming is driving storage innovation at WDC

 

Enrico Signoretti (@ESignoretti)

Storage Field Day 19 RoundUp

Tiers, Tiers, and More Storage Tiers

The Hard Disk is Dead! (But Only in Your Datacenter)

Dell EMC PowerOne is Next-Gen Converged Infrastructure

Voices in Data Storage – Episode 35: A Conversation with Krishna Subramanian of Komprise

 

Gina Rosenthal (@GMinks)

Storage Field Day 19: Getting Back to My Roots

Is storage still relevant?

Tiger Technology Brings the Cloud to You

Taming Unstructured Data with Dell EMC Isilon

Project Nautilus emerged as Dell’s Streaming Data Platform

 

Joey D’Antoni (@JDAnton)

Storage Field Day 19–Current State of the Storage Industry #SFD19

Storage Field Day 19–Western Digital #SFD19

Storage Field Day 19 MinIO #SFD19

 

Keiran Shelden (@Keiran_Shelden)

California, Show your teeth… Storage Field Day 19

Western Digital Presents at SFD19

 

Ruairi McBride (@McBride_Ruairi)

 

Arjan Timmerman (@ArjanTim)

TECHunplugged at Storage Field Day 19

TECHunplugged VideoCast SFD19 Part 1

Preview Storage Field Day 19 – Day 1

 

Vuong Pham (@Digital_KungFu)

 

[photo courtesy of Stephen Foskett]

WekaIO And A Fresh Approach

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

WekaIO recently presented at Storage Field Day 19. You can see videos of their presentation here, and download my rough notes from here.

 

More Data And New Architectures

Liran Zvibel (Co-founder and CEO) spent some time talking about the explosion in data storage requirements in the next 4 – 5 years. It was suggested that most of this growth will come in the form of unstructured data. The problem with today’s storage systems, he suggested, was that storage is broken into “Islands of Compromise” categories – each category carries a leader. What does that mean exactly? DAS and SAN cannot share data easily, and the performance of a number of NAS and Object architectures isn’t great.

A New Storage Category

WekaIO is positioning itself in a new storage category. One that delivers:

  • The highest performance for any workload
  • Complete data shareability
  • Cloud native, hybrid cloud support
  • Full enterprise features
  • Simple management

Unique Product Differentiation

So what is it that sets WekaIO apart from the rest of the storage industry? Zvibel listed a number of differentiators, including:

  • Only POSIX namespace that scales to exabytes of capacity and trillions of files
  • Only networked file system that is faster than local storage
    • Massively parallel
    • Lowest latency
  • Snap to object
    • Unique blend of All-Flash and Object storage for instant backup to cloud storage (no backup software required)
  • Cloud burst from on-premises to public cloud
    • Fully hybrid cloud enabled with highest performance
  • End-to-end data encryption with no performance degradation
    • Critical for modern workloads and compliance

[image courtesy of Barbara Murphy]

 

Customer Examples

This all sounds great, but where is WekaIO really being used effectively? Barbara Murphy spent some time talking with the delegates about a number of customer examples across the following market verticals.

Life sciences

  • Genomics sequencing and analytics
  • Drug discovery
  • Microscopy

Deep Learning

  • Machine Learning / Artificial Intelligence
  • Real-time analytics
  • IoT

 

Thoughts and Further Reading

I’ve written enthusiastically about WekaIO before. It’s easy to get caught up in some of the hype that seems to go hand in hand with WekaIO presentations. But WekaIO has a lot of data to back up its claims, and it’s taken an interesting approach to solving traditional storage problems in a non-traditional fashion. I like that there’s a strong cloud story there, as well as the potential to leverage the latest hardware advancements to deliver the performance companies need.

The analysts and storage vendors drone on and on about the explosion in data growth over the coming years, but it’s a real problem. Our workload challenges are changing as well, and it seems like a new approach is needed for how we approach some of these challenges. The scale of the data that needs to be crunched doesn’t always mean that DAS is a good option. You’re more likely to see these kinds of challenges show up in the science and technology industries. And WekaIO seems to be well-positioned to meet these challenges, whether it’s in public cloud or on-premises. It strikes me that WekaIO’s focus on performance and resilience, along with a robust software-defined architecture, has it in a good position to tackle the types of workload problems we’re seeing at the edge and in AI / ML focused environments. I’m really looking forward to seeing what comes next for WekaIO.

Dell EMC PowerOne – Not V(x)block 2.0

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Dell EMC recently presented at Storage Field Day 19. You can see videos of the presentation here, and download my rough notes from here.

 

Not VxBlock 2.0?

Dell EMC describes PowerOne as “all-in-one autonomous infrastructure”. It’s converged infrastructure, meaning your storage, compute, and networking are all built into the rack. It’s a transportation-tested package and fully assembled when it ships. When it arrives, you can plug it in, fire up the API, and be up and going “within a few hours”.

Trey Layton is no stranger to Vblock / VxBlock, and he was very clear with the delegates that PowerOne is not replacing VxBlock. After all, VxBlock lets them sell Dell EMC external storage into Cisco UCS customers.

 

So What Is It Then?

It’s a rack (or racks) full of gear, all of which is now Dell EMC gear. And it’s highly automated, with some proper management around it too.

[image courtesy of Dell EMC]

So what’s in those racks?

  • PowerMax Storage – World’s “fastest” storage array
  • PowerEdge MX – industry leading compute
  • PowerSwitch – Declarative system fabric
  • PowerOne Controller – API-powered automation engine

PowerMax Storage

  • Zero-touch SAN config
  • Discovery / inventory of storage resources
  • Dynamically create storage volumes for clusters
  • Intelligent load balancing

PowerEdge MX Compute

  • Dynamically provision compute resources into clusters
  • Automated chassis expansion
  • Telemetry aggregation
  • Kinetic infrastructure

System Fabrics

  • Switches are 32Gbps
  • 98% reduction in network configuration steps
  • System fabric visibility and lifecycle management
  • Intent-based automated deployment and provision
  • PowerSwitch open networking

PowerOne Controller

  • Highly automates 1000s of tasks
  • Powered by Kubernetes and Ansible
  • Delivers next-gen autonomous outcomes via robust API capabilities

From a scalability perspective, you can go to 275 nodes in a pod, and you can look after up to 32 pods (I think). The technical specifications are here.
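Dell EMC didn’t share the API schema with us, so the snippet below is purely hypothetical (the controller address, endpoint, and payload fields are all invented), but it gives a sense of the declarative, outcome-oriented style the PowerOne Controller is pitching: describe the cluster you want and let the controller work out the compute, fabric, and storage steps.

```python
# Purely hypothetical sketch of declarative, API-driven provisioning in the
# style PowerOne describes. The controller address, endpoint, and payload
# fields are invented for illustration; refer to the actual PowerOne API
# documentation for the real interface.
import requests

CONTROLLER = "https://powerone-controller.example.local"  # hypothetical address


def request_cluster(name: str, nodes: int, capacity_tb: int) -> str:
    """Ask the controller for a cluster and return the job ID it hands back."""
    response = requests.post(
        f"{CONTROLLER}/api/v1/clusters",       # invented endpoint
        json={
            "name": name,
            "computeNodes": nodes,             # controller selects the MX sleds
            "storageCapacityTB": capacity_tb,  # controller carves PowerMax volumes
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["jobId"]


if __name__ == "__main__":
    print("submitted job:", request_cluster("analytics-01", nodes=8, capacity_tb=200))
```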

 

Thoughts and Further Reading

Converged infrastructure has always been an interesting architectural choice for the enterprise. When VCE first came into being 10+ years ago via Acadia, delivering consistent infrastructure experiences in the average enterprise was a time-consuming endeavour and not a lot of fun. It was also hard to do well. VCE changed a lot of that with Vblock, but you paid a premium. The reason you paid that premium was that VCE did a pretty decent job of putting together an architecture that was reliable and, more importantly, supportable by the vendor. It wasn’t just the IP behind this that made it successful though, it was the effort put into logistics and testing. And yes, a lot of that was built on the strength of spreadsheets and the blood, sweat and tears of the deployment engineers out in the field.

PowerOne feels like a very different beast in this regard. Dell EMC took us through a demo of the “unboxing” experience, and talked extensively about the lifecycle of the product. They also demonstrated many of the automation features included in the solution that weren’t always there with Vblock. I’ve been responsible for Vblock environments over the years, and a lot of the lifecycle management activities were very thoroughly documented, and extremely manual. PowerOne, on the other hand, doesn’t look like it relies extensively on documentation and spreadsheets to be managed effectively. But maybe that’s just because Trey and the team were able to demonstrate things so effectively.

So why would the average enterprise get tangled up in converged infrastructure nowadays? What with all the kids and their HCI solutions, and the public cloud, and the plethora of easy to consume infrastructure solutions available via competitive consumption models? Well, some enterprises don’t like relying on people within the organisation to deliver solutions for mission critical applications. These enterprises would rather leave that type of outcome in the hands of one trusted vendor. But they might still want that outcome to be hosted on-premises. Think of big financial institutions, and various government agencies looking after very important things. These are the kinds of customers that PowerOne is well suited to.

That doesn’t mean that what Dell EMC is doing with PowerOne isn’t innovative. In fact I think what they’ve managed to do is very innovative, within the confines of converged infrastructure. This type of approach isn’t for everyone though. There’ll always be organisations that can do it faster and cheaper themselves, but they may or may not have as much at stake as some of the other guys. I’m curious to see how much uptake this particular solution gets in the market, particularly in environments where HCI and public cloud adoption is on the rise. It strikes me that Dell EMC has turned a corner in terms of system integration too, as the out of the box experience looks really well thought out compared to some of its previous attempts at integration.

Stellus Is Doing Something With All That Machine Data

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Stellus Technologies recently came out of stealth mode. I had the opportunity to see the company present at Storage Field Day 19 and thought I’d share my thoughts here. You can grab a copy of my notes here.

 

Company Background

Jeff Treuhaft (CEO) spent a little time discussing the company background and its development up to this point in time.

  • Founded in 2016
  • Data Path architecture developed in 2017
  • Data path validations in 2018
  • First customer deployments in 2019
  • Commercial availability in 2020

 

The Problem

What’s the problem Stellus is trying to solve then? There’s been a huge rise in unstructured data (driven in large part by AI / ML workloads) and an exponential increase in the size of data sources that enterprises are working with. There have also been significant increases in performance requirements for unstructured data. This has been driven primarily by:

  • Life sciences;
  • Media and entertainment; and
  • IoT.

The result is that the storage solutions supporting these workloads need to:

  • Offer scalable, consistent performance;
  • Support common global namespaces;
  • Work with variable file sizes;
  • Deliver high throughput;
  • Ensure that there are no parallel access penalties;
  • Easily manage data over time; and
  • Function as a data system of record.

It’s Stellus’s belief that “[c]urrent competitors have built legacy file systems at the time when spinning disk and building private data centres were the focus”.

 

Stellus Data Platform

Bala Ganeshan (CTO and VP of Engineering) walked the delegates through the Stellus Data Platform.

Design Goals

  • Parallelism
  • Scale
  • Throughput
  • Constant performance
  • Decoupling capacity and performance
  • Independently scale performance and capacity on commodity hardware
  • Distributed, share-everything, KV-based data model with a data path ready for new memories
  • Consistently high performance even as system scales

File System as Software

  • Stores unstructured data closest to native format: objects
  • Data Services provided on Stellus objects
  • Stateless – state in Key Value Stores
  • User mode enables
    • On-premises
    • Cloud
    • Hybrid
  • Independent from custom hardware and kernel

Stellus doesn’t currently have deduplication capability built in.

Algorithmic Data Locality and Data Services

  • Enables scale by algorithmically determining location – no cluster-wide maps (see the sketch after this list)
  • Built for resilience to multiple failures – pet vs. cattle
  • Understands topology of persistent stores
  • Architecture maintains versions – enables data services such as snapshots
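Stellus didn’t detail the placement algorithm itself, but “determining location algorithmically, with no cluster-wide maps” is the problem that techniques like consistent or rendezvous hashing solve: any node can compute where an object lives from nothing more than the object’s key and the list of stores. A minimal rendezvous hashing sketch (my illustration, not Stellus code):

```python
# Minimal rendezvous (highest-random-weight) hashing: every client computes
# the same placement for a key from just the key and the list of key-value
# stores, so no cluster-wide location map is needed. A sketch of the general
# technique only, not Stellus's actual placement algorithm.
import hashlib


def score(key: str, store: str) -> int:
    """Deterministic pseudo-random weight for a (key, store) pair."""
    return int.from_bytes(hashlib.sha256(f"{key}:{store}".encode()).digest(), "big")


def place(key: str, stores: list, copies: int = 2) -> list:
    """Pick the 'copies' highest-scoring stores for this key."""
    return sorted(stores, key=lambda s: score(key, s), reverse=True)[:copies]


if __name__ == "__main__":
    kv_stores = ["kv-a", "kv-b", "kv-c", "kv-d"]   # hypothetical store names
    print(place("object-42", kv_stores))
    # Adding a store only moves the keys that now score higher on it,
    # which keeps rebalancing traffic small.
    print(place("object-42", kv_stores + ["kv-e"]))
```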

Key-Value-over-NVMe Fabrics

  • Decoupled data services and persistence requires transport
  • Architecture maintains native data structure – objects
  • NVMe-over-Fabric protocol enhanced to transport KV commands
  • Transport independent
    • RDMA
    • TCP/IP

Native Key-Value Stores

  • Unstructured data is generally immutable
  • Updates result in new objects (illustrated in the sketch after this list)
  • Available in different sizes and performance characteristics
  • Uses application-specific KV stores for different kinds of data, such as:
    • Immutable data
    • Short-lived updates
    • Metadata
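The “updates result in new objects” point is essentially content addressing: if an object’s key is derived from its content, nothing is ever overwritten in place, and an update simply produces a new key. A toy sketch of the pattern (again, my illustration rather than the Stellus data path):

```python
# Toy content-addressed object store: keys are derived from content, so
# objects are immutable and an "update" yields a new key rather than an
# overwrite. Illustrates the pattern described above; it is not the Stellus
# data path.
import hashlib


class ImmutableObjectStore:
    def __init__(self) -> None:
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store an object and return its content-derived key."""
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


if __name__ == "__main__":
    store = ImmutableObjectStore()
    v1 = store.put(b"sensor reading: 10.4")
    v2 = store.put(b"sensor reading: 10.5")  # the "update" is just a new object
    print(v1 != v2, store.get(v1), store.get(v2))
```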

 

Thoughts and Further Reading

Every new company emerging from stealth has a good story to tell. And they all want it to be a memorable one. I think Stellus certainly has a good story to tell in terms of how it’s taking newer technologies to solve more modern storage problems. Not every workload requires massive amounts of scalability at the storage layer. But for those that do, it can be hard to solve that problem with traditional storage architectures. The key-value implementation from Stellus allows it to do some interesting stuff with larger drives, and I can see how this will have appeal as we move towards the use of larger and larger SSDs to store data. Particularly as a large amount of modern storage workloads are leveraging unstructured data.

More and more NVMe-oF solutions are hitting the market now. I think this is a sign that evolving workload requirements are pushing the capabilities of traditional storage solutions. A lot of the data we’re dealing with is coming from machines, not people. It’s not about how I derive value from a spreadsheet. It’s about how I derive value from terabytes of log data from Internet of Things devices. This requires scale – in terms of both capacity and performance. Using key-value over NVMe-oF is an interesting approach to the challenge – one that I’m keen to explore further as Stellus makes its way in the market. In the meantime, check out Chris Evans’s article on Stellus over at Architecting IT.

Komprise – Non-Disruptive Data Management

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Komprise recently presented at Storage Field Day 19. You can see videos of the presentation here, and download my rough notes from here.

 

What Do You Need From A Data Management Solution?

Komprise took us through the 6 tenets used to develop the solution:

  • Insight into our data
  • Make the insight actionable
  • Don’t get in front of hot data
  • Show us a path to the cloud
  • Scale to manage massive quantities of data
  • Transparent data movement

3 Architectural pillars

  • Dynamic Data Analytics – analyses data so you can make the right decision before buying more storage or backup
  • Transparent Move Technology – moves data with zero interference to apps, users, or hot data
  • Direct Data Access – puts you in control of your data – not your vendor

Archive successfully

  • No disruption
    • Transparency
    • No interference with hot data
  • Save money
  • Without lock-in
  • Extract value

 

Architecture

So what does the Komprise architecture look like? There are a couple of components.

  • The Director is a VM that can be hosted on-premises or in a cloud. This hosts the console, exposes the API, and stores configuration information.
  • The Observer runs on-premises and can run on ESXi, or can be hosted on Linux bare metal. It’s used to discover the storage (and should be hosted in the same DC as said storage).
  • Deep Analytics indexes the files, and the Director can run queries against it. It can also be used to tag the data. Deep Analytics supports multiple Observers (across multiple DCs), giving you a “global metadata lake”, and delivers automatic performance throttling for scans.

One neat feature is that you can choose to put a second copy somewhere when you’re archiving data. Komprise said that the typical customer starting size is 1PB or more.
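Komprise surfaces all of this through its own console and API, which I don’t have to hand, so the following is just a generic illustration of the kind of policy Deep Analytics lets you express: scan the metadata of a share, find files that haven’t been touched for a given period, and total up what could be transparently moved off primary storage. The share path and thresholds are made up.

```python
# Generic illustration of a "find the cold data" policy, the sort of question
# a data management product like Komprise answers at much larger scale. This
# uses plain filesystem metadata and made-up thresholds; it is not the
# Komprise API.
import time
from pathlib import Path


def archive_candidates(root: str, days_cold: int = 180, min_size: int = 1_000_000):
    """Yield (path, size) for files not accessed in `days_cold` days."""
    cutoff = time.time() - days_cold * 86400
    for path in Path(root).rglob("*"):
        if path.is_file():
            stats = path.stat()
            if stats.st_atime < cutoff and stats.st_size >= min_size:
                yield str(path), stats.st_size


if __name__ == "__main__":
    total = 0
    for name, size in archive_candidates("/mnt/finance_share"):  # hypothetical share
        total += size
        print(f"{size / 1e9:8.2f} GB  {name}")
    print(f"Candidate data for transparent archiving: {total / 1e12:.2f} TB")
```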

 

Thoughts and Further Reading

I’ve previously written enthusiastically about what I’ve seen from Komprise. Data management is a difficult thing to get right at the best of times. I believe the growth in primary, unstructured storage has meant that the average punter / enterprise can’t really rely on file systems and directories to store data in a sensible location. There’s just so much stuff that gets generated daily. And a lot of it is important (well, at least a fair chunk of it is). One of the keys to getting value from the data you generate, though, is the ability to quickly access that data after it’s been generated. Going back to a file in 6 months’ time to refer to something can be immensely useful. But it’s a hard thing to do if you’ve forgotten about the file, or what was in it. So it’s a nice thing to have a tool that can track this stuff for you in a relatively sane fashion.

Komprise can also guide you down the path when it comes to intelligently accessing and storing your unstructured data. It can help with reducing your primary storage footprint, reducing your infrastructure spend and, hopefully, your operational costs. What’s more exciting, though, is the fact that all of this can be done in a transparent fashion to the end user. Betty in the finance department can keep generating documents that have ridiculous file names, and storing them forever, and Komprise will help you move those spreadsheets to where they’re of most use.

Storage is cheaper than it once was, but we’re also storing insanely big amounts of data. And for much longer than we have previously. Even if my effective $/GB stored is low compared to what it was in the year 2000, my number of GB stored is exponentially higher. Anything I can do to reduce that spend is going to be something that my enterprise is interested in. It seems like Komprise is well-positioned to help me do that. Its biggest customer has close to 100PB of data being looked after by Komprise.

You can download a whitepaper overview of the Komprise architecture here (registration required). For a different perspective on Komprise, check out Becky’s article here. Chin-Fah also shared his thoughts here.

Random Short Take #30

Welcome to Random Short Take #30. You’d think 30 would be an easy choice, given how much I like Wardell Curry II, but for this one I’m giving a shout out to Rasheed Wallace instead. I’m a big fan of ‘Sheed. I hope you all enjoy these little trips down NBA memory lane. Here we go.

  • Veeam 10’s release is imminent. Anthony has been doing a bang-up job covering some of the enhancements in the product. This article was particularly interesting because I work in a company selling Veeam and using vCloud Director.
  • Sticking with data protection, Curtis wrote an insightful article on backups and frequency.
  • If you’re in Europe or parts of the US (or can get there easily), like writing about technology, and you’re into cars and stuff, this offer from Cohesity could be right up your alley.
  • I was lucky enough to have a chat with Sheng Liang from Rancher Labs a few weeks ago about how it’s going in the market. I’m relatively Kubernetes illiterate, but it sounds like there’s a bit going on.
  • For something completely different, this article from Christian on Raspberry Pi, volumio and HiFiBerry was great. Thanks for the tip!
  • Spinning disk may be as dead as tape, if these numbers are anything to go by.
  • This was a great article from Matt Crape on home lab planning.
  • Speaking of home labs, Shanks posted an interesting article on what he has running. The custom-built rack is inspired.