Welcome to Random Short Take #76. Summer’s almost here. Let’s get random.
The nice folks at StorPool have announced StorPool Storage v20. I was lucky enough to catch up with Boyan and the team recently, and they told me about their work on supporting NVMe/TCP, StorPool on AWS, and NFS File Storage. It’s great stuff and worth checking out.
Long-term retention – all the kids are doing it, but there are some things you need to think about. Preston has posted a great article on it here.
Disclaimer: I recently attended Storage Field Day 22. Some expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
It’s probably been the age of data for some time now, but I couldn’t think of a catchy heading. One comment from the Komprise folks during the presentation that really stood out to me was “Data outlives its storage infrastructure”. If I think back ten years to how I thought about managing data movement, it was certainly tied to the storage platform hosting the data, rather than what the data did. Whenever I had to move from one array to the next, or one protocol to another, I wasn’t thinking in terms of where the data would necessarily be best placed to serve the business. Generally speaking, I was approaching the problem in terms of getting good performance for blocks and files, but rarely was I thinking in terms of the value of the data to the business. Nowadays, it seems that there’s an improved focus on getting the “[d]ata in the right place at the right time – not just for efficiency – but to extract maximum value”. We’re no longer thinking about data in terms of old stuff living on slow storage, and fresh bits living on the fast stuff. As the amount of data being managed in enterprises continues to grow at an insane rate, it’s becoming more important than ever to understand just what usefulness the data offers the business.
[image courtesy of Komprise]
The variety of storage platforms available now is also a little more extensive than it was last century, and that presents some more interesting challenges in getting the data to where it needs to be. As I mentioned earlier, data growth is going berserk the world over. Add to this the problem of ubiquitous cloud access (and IT departments struggling to keep up with the governance necessary to wrangle these solutions into some sensible shape), and most enterprises looking to save money wherever possible, and data management can present real problems to most enterprise shops.
[image courtesy of Komprise]
Analytics To The Rescue!
Komprise has come up with an analytics-driven approach to data management that is built on some sound foundational principles. The solution needs to:
Go beyond storage efficiency – it’s not just about dedupe and compression at a certain scale.
Be multi-directional – you need to be able to get stuff back.
Not disrupt users and workflows – do that and you may as well throw the solution in the bin.
Create new uses for your data – it’s all about value, after all.
Put your data first.
The final point is possibly the most critical one. If I think about the storage-centric approaches to data management that I’ve seen over the years, there’s definitely been a viewpoint that the underlying storage infrastructure would heavily influence how the data is used, rather than the data dictating how the storage platforms should be architected. Some of that is a question of visibility – if you don’t understand your data, it’s hard to come up with tailored solutions. Some of the problem is also the disconnect that seems to exist between “the business” and IT departments in a large number of enterprises. It’s not an easy problem to solve, by any stretch, but it does explain some of the novel approaches to data management that I’ve seen over the years.
Thoughts and Further Reading
Data management is hard, and it keeps getting harder because we keep making more and more data. And we frequently don’t have the time, or take the time, to work out what value the data actually has. This problem isn’t going to go away, so it’s good to see Komprise moving the conversation past that and into the realm of how we can best focus on deriving value from the data itself. There was certainly some interesting discussion during the presentation about the term analytics, and what that really meant in terms of the Komprise solution. Ultimately, though, I’m a fan of anything that elevates the conversation beyond “I can move your terabytes from this bucket to that bucket”. I want something that starts to tell me more about what type of data I’m storing, who’s using it, and how they’re using it. That’s when it gets interesting from a data management perspective. I think there’s a ways to go in terms of getting this solution right for everyone, but it strikes me that Komprise is on the right track, and I’m looking forward to seeing how the solution evolves alongside the storage technologies it’s using to get the most from everyone’s data. You can read more on the Komprise approach here.
Komprise has traditionally focused on unstructured data stored on-premises. It has now extended the capabilities of Komprise Intelligent Data Management to include cloud data. There’s currently support for Amazon S3 and Wasabi, with Google Cloud, Microsoft Azure, and IBM support coming soon.
So what do you get with this capability?
Analyse data usage across cloud accounts and buckets easily
Single view across cloud accounts, buckets, and storage classes
Analyse AWS usage by various metrics accurately based on access times
Explore different data archival, replication, and deletion strategies with instant cost projections
Optimise AWS costs with analytics-driven archiving
Continuously move objects by policy across Cloud Network Attached Storage (NAS), Amazon S3, Amazon S3 Standard-IA, Amazon S3 Glacier, and Amazon S3 Glacier Deep Archive
Minimise costs and penalties by moving data at the right time based on access patterns
Bridge to Big Data/Artificial Intelligence (AI) projects
Create virtual data lakes for Big Data, AI – search for exactly what you need across cloud accounts and buckets
Native access to moved data on each storage class with full data fidelity
Create Cyber Resiliency with AWS
Copy S3 data to AWS to protect from ransomware with an air-gapped copy
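Komprise doesn’t publish the internals of its policy engine, but the analytics-driven archiving idea can be sketched in a few lines of Python: pick a target storage class from how long ago an object was last accessed. The thresholds and the tier mapping below are entirely made up for illustration – in the product, policies are defined by the customer:

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds (days since last access) mapped to target
# S3 storage classes. These numbers are invented for the example;
# the real policy engine is far richer than a simple age ladder.
TIERS = [
    (30, "STANDARD"),       # hot data stays put
    (90, "STANDARD_IA"),    # cooling off
    (365, "GLACIER"),       # cold
]
DEFAULT_TIER = "DEEP_ARCHIVE"  # anything older than a year

def target_class(last_access: datetime, now: datetime) -> str:
    """Pick a storage class based on the object's last-access age."""
    age_days = (now - last_access).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return DEFAULT_TIER

now = datetime(2021, 6, 1)
print(target_class(now - timedelta(days=10), now))   # recently used
print(target_class(now - timedelta(days=400), now))  # untouched for over a year
```

The point isn’t the ladder itself – it’s that the decision is driven by observed access patterns rather than by where the data happens to live today.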
[image courtesy of Komprise]
Why Is This Good?
The move to cloud storage hasn’t been all beer and skittles for enterprise. Storing large amounts of data in public cloud presents enterprises with a number of challenges, including:
Poor visibility – “Bucket sprawl”
Insufficient data – Cloud does not easily track last access / data use
Cost complexity – Manual data movement can lead to unexpected retrieval cost surprises
Labour – Manually moving data is error-prone and time-consuming
Sample Use Cases
Some other reasons you might want to have Komprise manage your data include:
Finding ex-employee data stored in buckets.
Data migration – you might want to take a copy of your data from Wasabi to AWS.
There’s support for all unstructured data (file and object), so the benefits of Komprise can be enjoyed regardless of how you’re storing your unstructured data. It’s also important to note that there’s no change to the existing licensing model; you’re just now able to use the product on public cloud storage.
Effective data management remains a big challenge for enterprises. It’s no secret that public cloud storage is really just storage that lives in another company’s data centre. Sure, it might be object storage rather than file-based, but it’s still a bunch of unstructured data sitting in someone else’s facility. The way you consume that data may have changed, and certainly the way you pay for it has changed, but fundamentally it’s still your unstructured data sitting on a share or a filesystem. The problems you had on-premises still manifest in public cloud environments (data sprawl, capacity issues, and so on). That’s why the Komprise solution seems so compelling when it comes to managing your on-premises storage consumption, and extending that capability to cloud storage is a no-brainer. Unstructured data storage is frequently a bin fire of some sort or another, and the reason is that manual management doesn’t scale. I don’t mean the storage doesn’t scale – you can store petabytes all over the place if you like. But if you’re still hand-crafting your shares and manually moving data around, you’ll find the work becomes more and more time-consuming as your data storage needs grow.
One way to address this challenge is to introduce a level of automation, which is something that Komprise does quite well. If you’ve got many terabytes of data stored on-premises and in AWS buckets (or you’re looking to move some old data from on-premises to the cloud) and you’re not quite sure what it’s all for or how best to go about it, Komprise can certainly help you out.
Welcome to Random Short Take #39. Not a huge number of players have worn 39 in the NBA, and I’m not going to pretend I’m any real fan of The Dwightmare. But things are tough all around, so let’s remain optimistic and push through to number 40. Anyway, let’s get random.
I’ve known Howard Marks for a while now, and always relish the opportunity to speak with him when I can. This post is pretty hilarious, and I’m looking forward to reading the followup posts.
This is a great article from Alastair Cooke on COVID-19 and what En-Zed has done effectively to stop the spread. It was interesting to hear his thoughts on returning to the US, and I do agree that it’s going to be some time until I make the trip across the Pacific again.
Sometimes people get crazy ideas about how they might repurpose some old bits of technology. It’s even better when they write about their experiences in doing so. This article on automating an iPod Hi-Fi’s volume control over at Six Colors was fantastic.
Chris M. Evans put out a typically thought-provoking piece on data migration challenges recently that I think is worth checking out. I’ve been talking a lot to customers that are facing these challenges on a daily basis, and it’s interesting to see how, regardless of the industry vertical they operate in, it’s sometimes just a matter of the depth varying, so to speak.
I frequently bump into Ray Lucchesi at conferences, and he knows a fair bit about what does and doesn’t work. This article on his experiences recently with a number of virtual and online conferences is the epitome of constructive criticism.
Speaking of online conferences, the Australian VMUG UserCon will be virtual this year and will be held on the 30th July. You can find out more and register here.
Komprise recently announced the availability of its Elastic Data Migration solution. I was lucky enough to speak with Krishna Subramanian about the announcement and thought I’d share some of my notes here.
I’ve written about Komprise before. A few times, as it happens. Subramanian describes it as “analytics driven data management software”, capable of operating with NFS, SMB, and S3 storage. The data migration capability was added last year (at no additional charge), but it was initially focused on LAN-based migration.
Enter Elastic Data Migration
Elastic Data Migration isn’t just for LAN-based migrations though; it’s for customers who want to migrate to the cloud, or perhaps to another data centre. Invariably they’ll be doing this over a WAN, rather than a LAN. Given that WAN connections typically suffer from lower speeds and higher latencies, how does Komprise deal with this? I’m glad you asked. The solution addresses latency in a few ways:
Increased parallelism inside the software (based on Komprise VMs, and the nature of the data sets);
Reduced round trips over the network; and
Optimised protocol handling to reduce chatter (NFS, for example, is notoriously chatty).
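The parallelism point is worth making concrete. Here’s a minimal Python sketch – not Komprise’s code, just the general principle – showing that independent file transfers run through a thread pool can overlap, which is what hides per-file round-trip latency on a high-latency link:

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def migrate(src_dir: Path, dst_dir: Path, workers: int = 8) -> int:
    """Copy every file under src_dir to dst_dir, many at a time.

    Each copy is independent, so running them concurrently overlaps
    their latency - the same idea, writ small, behind increasing
    parallelism for WAN migrations. Returns the number of files copied.
    """
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(p: Path) -> None:
        target = dst_dir / p.relative_to(src_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(p, target)  # copy data and metadata

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, files))  # drain the iterator to surface errors
    return len(files)

# Tiny demo on a throwaway directory tree.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp())
for i in range(20):
    (src / f"file{i}.txt").write_text(f"payload {i}")
print(migrate(src, dst))  # 20
```

Over a LAN the serial and parallel versions finish in much the same time; over a link with tens of milliseconds of latency per operation, the difference becomes dramatic.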
Sounds simple enough, but Komprise is seeing some great results when compared to traditional tools such as rsync.
There are some other benefits over the more traditional tools, including GUI access that allows you to run hundreds of migrations simultaneously.
[image courtesy of Komprise]
Of course, if you’re not into doing things with GUIs (and it doesn’t always make sense where a level of automation is required), you can do this programmatically via API access.
Thoughts and Further Reading
Depending on what part of the IT industry you’re most involved in, the idea of data migrations may seem like something that’s a little old fashioned. Moving a bunch of unstructured data around using tools from way back when? Why aren’t people just using the various public cloud options to store their data? Well, I guess it’s partly because things take time to evolve and, based on the sorts of conversations I’m still regularly having, simple to use data migration solutions for large volumes of data are still required, and hard to come across.
Komprise has made its name making sense of vast chunks of unstructured data living under various rocks in enterprises. It also has a good story when it comes to archiving that data. It makes a lot of sense that it would turn its attention to improving the experience and performance of migrating a large number of terabytes of unstructured data from one source to another. There’s already a good story here in terms of extensive multi-protocol support and visibility into data sources. I like that Komprise has worked hard on the performance piece as well, and has removed some of the challenges traditionally associated with migrating unstructured data over WAN connections. Data migrations are still a relatively complex undertaking, but they don’t need to be painful.
One of the few things I’m sure of nowadays is that the amount of data we are storing is not shrinking. Komprise is working hard to make sense of what all that data is being used for. Once it knows what that data is for, it’s making it easy to put it in the place that you’ll get the most value from it. Whether that’s on a different NAS on your LAN, or sitting in another data centre somewhere. Komprise has published a whitepaper with the test results I referred to earlier, and you can grab it from here (registration required). Enrico Signoretti also had Subramanian on his podcast recently – you can listen to that here.
Disclaimer: I recently attended Storage Field Day 19. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Komprise took us through the 6 tenets used to develop the solution:
Insight into our data
Make the insight actionable
Don’t get in front of hot data
Show us a path to the cloud
Scale to manage massive quantities of data
Transparent data movement
3 Architectural Pillars
Dynamic Data Analytics – analyses data so you can make the right decision before buying more storage or backup
Transparent Move Technology – moves data with zero interference to apps, users, or hot data
Direct Data Access – puts you in control of your data – not your vendor
No interference with hot data
So what does the Komprise architecture look like? There are a couple of components.
The Director is a VM that can be hosted on-premises or in a cloud. This hosts the console, exposes the API, and stores configuration information.
The Observer runs on-premises and can run on ESXi, or can be hosted on Linux bare metal. It’s used to discover the storage (and should be hosted in the same DC as said storage).
Deep Analytics indexes the files, and the Director can run queries against it. It can also be used to tag the data. Deep Analytics supports multiple Observers (across multiple DCs), giving you a “global metadata lake” and can also deliver automatic performance throttling for scans.
One neat feature is that you can choose to put a second copy somewhere when you’re archiving data. Komprise said that the typical customer starting size is 1PB or more.
Thoughts and Further Reading
I’ve previously written enthusiastically about what I’ve seen from Komprise. Data management is a difficult thing to get right at the best of times. I believe the growth in primary, unstructured storage has meant that the average punter / enterprise can’t really rely on file systems and directories to store data in a sensible location. There’s just so much stuff that gets generated daily. And a lot of it is important (well, at least a fair chunk of it is). One of the keys to getting value from the data you generate, though, is the ability to quickly access that data after it’s been generated. Going back to a file in six months’ time to refer to something can be immensely useful. But it’s a hard thing to do if you’ve forgotten about the file, or what was in it. So it’s a nice thing to have a tool that can track this stuff for you in a relatively sane fashion.
Komprise can also guide you down the path when it comes to intelligently accessing and storing your unstructured data. It can help with reducing your primary storage footprint, reducing your infrastructure spend and, hopefully, your operational costs. What’s more exciting, though, is the fact that all of this can be done in a transparent fashion to the end user. Betty in the finance department can keep generating documents that have ridiculous file names, and storing them forever, and Komprise will help you move those spreadsheets to where they’re of most use.
Storage is cheaper than it once was, but we’re also storing insane amounts of data. And for much longer than we have previously. Even if my effective $/GB stored is low compared to what it was in the year 2000, my number of GB stored is exponentially higher. Anything I can do to reduce that spend is going to be something that my enterprise is interested in. It seems like Komprise is well-positioned to help me do that. Its biggest customer has close to 100PB of data being looked after by Komprise.
You can download a whitepaper overview of the Komprise architecture here (registration required). For a different perspective on Komprise, check out Becky’s article here. Chin-Fah also shared his thoughts here.
The primary reason for our call was to discuss Komprise’s Series C funding round of US $24 million. You can read the press release here. Some noteworthy achievements include:
Revenue more than doubled every single quarter, with existing customers steadily growing how much they manage with Komprise; and
Some customers now managing hundreds of PB with Komprise.
Komprise is currently operating in the following key verticals:
Genomics and health care, with rapidly growing footprints;
Financial and Insurance sectors (5 out of 10 of the largest insurance companies in the world apparently use Komprise);
A lot of universities (research-heavy environments); and
Media and entertainment.
What’s It Do Again?
Komprise manages unstructured data over three key protocols (NFS, SMB, S3). You can read more about the product itself here, but some of the key features include the ability to “Transparently archive data”, as well as being able to put a copy of your data in another location (the cloud, for example).
So What’s New?
One of Komprise’s recent announcements was NAS-to-NAS migration. Say, for example, you’d like to migrate your data from an Isilon environment to FlashBlade. All you have to do is set one as the source and the other as the target. ACLs are fully preserved across all scenarios, and Komprise does all the heavy lifting in the background.
They’re also working on what they call “Deep Analytics”. Komprise already aggregates file analytics data very efficiently. They’re now working on indexing metadata on files and exposing that index. This will give you “a Google-like search on all your data, no matter where it sits”. The idea is that you can find data using any combination of metadata. The feature is in beta right now, and part of the new funding is being used to expand and grow this capability.
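To make the “search by any combination of metadata” idea a bit more concrete, here’s a toy Python sketch of an inverted index over file metadata. The file records, field names, and values below are all invented, and Komprise’s actual Deep Analytics implementation is, of course, far more sophisticated than a dictionary of sets – but the AND-across-criteria query shape is the same idea:

```python
from collections import defaultdict

# Made-up metadata records standing in for files scattered across sites.
files = [
    {"path": "/nas1/projects/genome01.bam", "owner": "alice", "ext": "bam", "site": "dc1"},
    {"path": "/nas2/finance/q3.xlsx", "owner": "betty", "ext": "xlsx", "site": "dc2"},
    {"path": "/nas1/projects/genome02.bam", "owner": "alice", "ext": "bam", "site": "dc2"},
]

# Inverted index: (field, value) -> set of record numbers.
index = defaultdict(set)
for i, meta in enumerate(files):
    for key, value in meta.items():
        if key != "path":
            index[(key, value)].add(i)

def search(**criteria):
    """Return paths matching every key=value criterion (AND semantics)."""
    hits = set(range(len(files)))
    for key, value in criteria.items():
        hits &= index[(key, value)]  # intersect per criterion
    return sorted(files[i]["path"] for i in hits)

print(search(owner="alice", site="dc2"))  # ['/nas1/projects/genome02.bam']
```

Build the index once, then any combination of metadata criteria resolves to matching files without rescanning the storage – which is what makes a “global metadata lake” queryable at scale.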
Komprise can be driven entirely from an API, making it potentially interesting for service providers and VARs wanting to add support for unstructured data and associated offerings to their solutions. You can also use Komprise to “confine” data. The idea behind this is that data can be quarantined (if you’re not sure it’s being used by any applications). Using this feature you can perform staged deletions of data once you understand what applications are using what data (and when).
I don’t often write articles about companies getting additional funding. I’m always very happy when they do, as someone thinks they’re on the right track, and it means that people will continue to stay employed. I thought this was interesting enough news to cover though, given that unstructured data, and its growth and management challenges, is an area I’m interested in.
When I first wrote about Komprise I joked that I needed something like this for my garage. I think it’s still a valid assertion in a way. The enterprise, at least in the unstructured file space, is a mess based on what I’ve seen in the wild. Users and administrators continue to struggle with the sheer volume and size of the data they have under their management. Tools such as this can provide valuable insights into what data is being used in your organisation, and, perhaps more importantly, who is using it. My favourite part is that you can actually do something with this knowledge, using Komprise to copy, migrate, or archive old (and new) data to other locations to potentially reduce the load on your primary storage.
I bang on all the time about the importance of archiving solutions in the enterprise, particularly when companies have petabytes of data under their purview. Yet, for reasons that I can’t fully comprehend, a number of enterprises continue to ignore the problem they have with data hoarding, instead opting to fill their DCs and cloud storage with old data that they don’t use (and very likely don’t need to store). Some of this is due to the fact that some of the traditional archive solution vendors have moved on to other focus areas. And some of it is likely due to the fact that archiving can be complicated if you can’t get the business to agree to stick to their own policies for document management. In just the same way as you can safely delete certain financial information after an amount of time has elapsed, so too can you do this with your corporate data. Or, at the very least, you can choose to store it on infrastructure that doesn’t cost a premium to maintain. I’m not saying “Go to work and delete old stuff”. But, you know, think about what you’re doing with all of that stuff. And if there’s no value in keeping the “kitchen cleaning roster May 2012.xls” file any more, think about deleting it? Or, consider a solution like Komprise to help you make some of those tough decisions.
Disclaimer: I recently attended Storage Field Day 17. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Komprise recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here. Here’s a blurry photo (love that iPhone camera quality) of Kumar K. Goswami (Founder and CEO of Komprise) presenting.
What’s In Your Garage?
My current house has a good sized garage, and we only have one car. So I have a lot of space to store things in it. When we moved in we added some storage cupboards and some additional shelving to accommodate our stuff. Much like Parkinson’s Law (and the corollary for storage systems), the number of things in my garage has expanded to fill the available space. I have toys from when my children were younger, old university assignments, clothes, Christmas decorations, oft-neglected gym equipment. You get the idea. Every year I give a bunch of stuff away to charities or throw it out. But my primary storage (new things) keeps expanding too, so I need to keep moving stuff to my garage for storage.
If you’ve ever had the good (!) fortune of managing file servers, you’ll understand that there’s a lot of data being stored in corporate environments that people don’t know what to do with. As Komprise pointed out in their presentation, we’re “[d]rowning in unstructured data”. Komprise wants to help out by “[i]dentifying cold data and syphoning it off before it goes into the data workflow and data protection systems”. The idea is that it delivers non-disruptive data management. Unlike cleaning up my garage, things just move about based on policies.
How’s That Work Then?
Komprise works by moving unstructured data about the place. It’s a hybrid SaaS solution, with a console in the cloud, and “observers” running in VMs on-premises.
[image courtesy of Komprise]
I don’t want to talk too much about how the product works, as I think the video presentation does a better job of that than I would. And there’s also an excellent article on their website covering the Komprise Filesystem. From a visualisation perspective though, the dashboard presents a “green doughnut”, providing information including:
Data by age;
File analytics (size, types, top users, etc.); and
Policy modelling – set policies and see the projected ROI (customers enter their own costs).
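The ROI projection is conceptually simple arithmetic. As a hedged sketch – the dollar figures below are invented, since in the product customers enter their own costs – the annual saving is just the cold capacity multiplied by the cost gap between tiers:

```python
# Hypothetical per-TB annual costs; in practice the customer supplies these.
PRIMARY_COST = 500.0   # $/TB/year on primary NAS (made up)
ARCHIVE_COST = 50.0    # $/TB/year on an archive tier (made up)

def projected_savings(total_tb: float, cold_fraction: float) -> float:
    """Annual savings from moving the cold fraction to the cheaper tier."""
    cold_tb = total_tb * cold_fraction
    return cold_tb * (PRIMARY_COST - ARCHIVE_COST)

# A 1PB estate where 70% of the data hasn't been touched in a year.
print(projected_savings(1000, 0.7))  # 315000.0
```

Real projections also have to account for retrieval charges and data protection costs on each tier, but this is the shape of the calculation the dashboard is doing for you.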
When files are moved around, Komprise leaves a “breadcrumb” on the source storage. They were careful not to call it a stub – it’s a Komprise Dynamic Link – a 4KB symbolic link.
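To illustrate the breadcrumb concept – and only the concept, as Komprise’s Dynamic Link does considerably more than this – here’s what leaving a symbolic link behind after a move looks like in Python. The function name and directory layout are mine, not the product’s:

```python
import os
import tempfile
from pathlib import Path

def archive_with_breadcrumb(src: Path, archive_dir: Path) -> Path:
    """Move a file to the archive and leave a symlink at the old path.

    Anything opening the original path still resolves to the data,
    which is the property that makes the movement transparent to
    users and applications.
    """
    target = archive_dir / src.name
    src.rename(target)       # move the data to cheaper storage
    os.symlink(target, src)  # leave a breadcrumb at the old path
    return target

# Demo on throwaway directories standing in for primary and archive tiers.
primary = Path(tempfile.mkdtemp())
archive = Path(tempfile.mkdtemp())
doc = primary / "report.docx"
doc.write_text("quarterly numbers")

archive_with_breadcrumb(doc, archive)
print(doc.read_text())  # still readable via the original path
```

The distinction Komprise draws from a traditional stub matters: a stub usually requires an agent or filter driver to resolve, whereas a symbolic link is resolved natively by the filesystem and its clients.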
It’s A Real Problem
One thing that really struck me about Komprise’s presentation was when they said they wanted to “[m]ove things you don’t want to cheaper storage”. It got me thinking that a lot of corporate file servers are very similar to my garage. There’s an awful lot of stuff being stored on them. Some of it is regularly used (much like my Christmas decorations), and some of it not as much (more like my gym equipment). So why don’t we throw stuff out? Well, when you’re in business, you generally have to work within the confines of various frameworks and regulations. So it’s not as simple as saying “Let’s get rid of the old stuff we haven’t used in 24 months”. Unlike those particularly unhelpful self-help books on decluttering, trashing corporate data isn’t the same as throwing out old boxes of magazines.
This is a real problem for corporations, and is only going to get worse. More and more data is being generated every day, much of it simply dumped on unstructured file stores with little to no understanding of the data’s value. Komprise seems to be doing a good job of helping to resolve an old problem. I still naively like to think that this would be better if people would use document management systems properly and take some responsibility for their stuff. But, much like the mislabelled boxes of files in my garage, it’s often not that simple. People move on, don’t know what to do with the data, and assume that the IT folks will take care of it. I think solutions like the one from Komprise, while being technically very interesting, also have an important role to play in the enterprise. I’m just wondering if I can do something like this with all of the stuff in my garage.