Random Short Take #76

Welcome to Random Short Take #76. Summer’s almost here. Let’s get random.


Komprise – It’s About Data, Not Storage

Disclaimer: I recently attended Storage Field Day 22.  Some expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Komprise recently presented at Storage Field Day 22. You can see their videos from Storage Field Day 22 here, and download a PDF copy of my rough notes from here.


The Age Of Data, Not Storage

It’s probably been the age of data for some time now, but I couldn’t think of a catchy heading. One comment from the Komprise folks during the presentation that really stood out to me was “Data outlives its storage infrastructure”. If I think back ten years to how I thought about managing data movement, it was certainly tied to the storage platform hosting the data, rather than what the data did. Whenever I had to move from one array to the next, or one protocol to another, I wasn’t thinking in terms of where the data would necessarily be best placed to serve the business. Generally speaking, I was approaching the problem in terms of getting good performance for blocks and files, but rarely was I thinking in terms of the value of the data to the business. Nowadays, it seems that there’s an improved focus on getting the “[d]ata in the right place at the right time – not just for efficiency – but to extract maximum value”. We’re no longer thinking about data in terms of old stuff living on slow storage, and fresh bits living on the fast stuff. As the amount of data being managed in enterprises continues to grow at an insane rate, it’s becoming more important than ever to understand just what usefulness the data offers the business.

[image courtesy of Komprise]

The variety of storage platforms available now is also a little more extensive than it was last century, and that presents some more interesting challenges in getting the data to where it needs to be. As I mentioned earlier, data growth is going berserk the world over. Add to this the problem of ubiquitous cloud access (and IT departments struggling to keep up with the governance necessary to wrangle these solutions into some sensible shape), and most enterprises looking to save money wherever possible, and data management can present real problems to most enterprise shops.

[image courtesy of Komprise]


Analytics To The Rescue!

Komprise has come up with an analytics-driven approach to data management that is built on some sound foundational principles. The solution needs to:

  1. Go beyond storage efficiency – it’s not just about dedupe and compression at a certain scale.
  2. Be multi-directional – you need to be able to get stuff back.
  3. Not disrupt users and workflows – do that and you may as well throw the solution in the bin.
  4. Create new uses for your data – it’s all about value, after all.
  5. Put your data first.

The final point is possibly the most critical one. If I think about the storage-centric approaches to data management that I’ve seen over the years, there’s definitely been a viewpoint that the underlying storage infrastructure would heavily influence how the data is used, rather than the data dictating how the storage platforms should be architected. Some of that is a question of visibility – if you don’t understand your data, it’s hard to come up with tailored solutions. Some of the problem is also the disconnect that seems to exist between “the business” and IT departments in a large number of enterprises. It’s not an easy problem to solve, by any stretch, but it does explain some of the novel approaches to data management that I’ve seen over the years.


Thoughts and Further Reading

Data management is hard, and it keeps getting harder because we keep making more and more data. And we frequently don’t have the time, or take the time, to work out what value the data actually has. This problem isn’t going to go away, so it’s good to see Komprise moving the conversation past that and into the realm of how we can best focus on deriving value from the data itself. There was certainly some interesting discussion during the presentation about the term analytics, and what that really meant in terms of the Komprise solution. Ultimately, though, I’m a fan of anything that elevates the conversation beyond “I can move your terabytes from this bucket to that bucket”. I want something that starts to tell me more about what type of data I’m storing, who’s using it, and how they’re using it. That’s when it gets interesting from a data management perspective. I think there’s a ways to go in terms of getting this solution right for everyone, but it strikes me that Komprise is on the right track, and I’m looking forward to seeing how the solution evolves alongside the storage technologies it’s using to get the most from everyone’s data. You can read more on the Komprise approach here.

Storage Field Day 22 – (Fairly) Full Disclosure

Disclaimer: I recently attended Storage Field Day 22.  Some expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Here are my notes on gifts, etc, that I received as a conference attendee at Storage Field Day 22. This is by no stretch an interesting post from a technical perspective, but it’s a way for me to track and publicly disclose what I get and how it looks when I write about various things. With all of this stuff happening (waves hands around), it’s not going to be as lengthy as normal, but I did receive a box of stuff in the mail, so I wanted to disclose it.

The Tech Field Day team sent over some stickers, a TFD tote bag, a TFD pin, and a TFD patch. Fujifilm kindly gave me a 16GB USB drive (with both USB 2 and Lightning connectors), a webcam cover, a stylus, a USB charging cable, a Bluetooth tracker, a phone cradle, and a beach towel. Komprise sent over some neat socks, three Komprise-branded Titleist golf balls, and a sticker.

It wasn’t fancy food and limos this time around, but it was nonetheless an enjoyable event. Hopefully we can get back to in-person events some time this decade. Thanks again to Stephen and the team for having me back. Thanks also to my employer for giving me time away from the office to attend.

Komprise Announces Cloud Capability

Komprise recently made some announcements around extending its product to cloud. I had the opportunity to speak to Krishna Subramanian (President and COO) about the news and I thought I’d share some of my thoughts here.


The Announcement

Komprise has traditionally focused on unstructured data stored on-premises. It has now extended the capabilities of Komprise Intelligent Data Management to include cloud data. There’s currently support for Amazon S3 and Wasabi, with Google Cloud, Microsoft Azure, and IBM support coming soon.



So what do you get with this capability?

Analyse data usage across cloud accounts and buckets easily

  • Single view across cloud accounts, buckets, and storage classes
  • Analyse AWS usage by various metrics accurately based on access times
  • Explore different data archival, replication, and deletion strategies with instant cost projections

Optimise AWS costs with analytics-driven archiving

  • Continuously move objects by policy across Cloud Network Attached Storage (NAS), Amazon S3, Amazon S3 Standard-IA, Amazon S3 Glacier, and Amazon S3 Glacier Deep Archive
  • Minimise costs and penalties by moving data at the right time based on access patterns
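
Komprise hasn’t published the internals of its policy engine in these materials, but the core decision (pick a storage class based on how long it’s been since an object was accessed) is easy to picture. The thresholds and class names below are illustrative assumptions on my part, not the product’s actual policy:

```python
from datetime import datetime

# Illustrative tiering thresholds (max days since last access -> class).
# These cut-offs are assumptions for the sketch, not Komprise's policy.
TIERS = [
    (30, "STANDARD"),
    (90, "STANDARD_IA"),
    (365, "GLACIER"),
]
COLDEST = "DEEP_ARCHIVE"

def target_class(last_access: datetime, now: datetime) -> str:
    """Return the storage class an object should live in, given how
    long it has sat idle."""
    age_days = (now - last_access).days
    for max_age, storage_class in TIERS:
        if age_days <= max_age:
            return storage_class
    return COLDEST

now = datetime(2021, 6, 1)
print(target_class(datetime(2021, 5, 20), now))  # read recently -> STANDARD
print(target_class(datetime(2020, 1, 1), now))   # idle over a year -> DEEP_ARCHIVE
```

The decision itself is the easy part, of course; the hard work in a real system is gathering reliable last-access data in the first place, and then actually moving the objects without breaking anything.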

Bridge to Big Data/Artificial Intelligence (AI) projects

  • Create virtual data lakes for Big Data, AI – search for exactly what you need across cloud accounts and buckets
  • Native access to moved data on each storage class with full data fidelity

Create Cyber Resiliency with AWS

  • Copy S3 data to AWS to protect from ransomware with an air-gapped copy

[image courtesy of Komprise]


Why Is This Good?

The move to cloud storage hasn’t been all beer and skittles for enterprises. Storing large amounts of data in public cloud presents enterprises with a number of challenges, including:

  • Poor visibility – “Bucket sprawl”
  • Insufficient data – Cloud does not easily track last access / data use
  • Cost complexity – Manual data movement can lead to unexpected retrieval cost surprises
  • Labour – Manually moving data is error-prone and time-consuming
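
That cost complexity point is easy to demonstrate with a little arithmetic. The per-GB prices below are placeholders (real cloud pricing varies by region and changes over time), but the shape of the problem is real: storage and retrieval are priced separately, so manually shuffling data into a “cheap” class without understanding its access pattern can erode the saving, or worse.

```python
# Illustrative per-GB monthly prices in USD -- placeholders only,
# not current AWS rates.
PRICING = {
    "STANDARD":     {"store": 0.023,   "retrieve": 0.0},
    "STANDARD_IA":  {"store": 0.0125,  "retrieve": 0.01},
    "DEEP_ARCHIVE": {"store": 0.00099, "retrieve": 0.02},
}

def monthly_cost(gb_stored: float, gb_retrieved: float, storage_class: str) -> float:
    """Project a month's bill: what you keep plus what you read back."""
    p = PRICING[storage_class]
    return gb_stored * p["store"] + gb_retrieved * p["retrieve"]

# 100 TB stored, with 20 TB unexpectedly read back in a month:
for cls in PRICING:
    print(cls, round(monthly_cost(100_000, 20_000, cls), 2))
```

This is the kind of projection the Komprise analytics claim to give you up front, rather than leaving you to discover it on the invoice.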

Sample Use Cases

Some other reasons you might want to have Komprise manage your data include:

  • Finding ex-employee data stored in buckets.
  • Data migration – you might want to take a copy of your data from Wasabi to AWS.

There’s support for all unstructured data (file and object), so the benefits of Komprise can be enjoyed regardless of how you’re storing your unstructured data. It’s also important to note that there’s no change to the existing licensing model; you’re just now able to use the product on public cloud storage.



Effective data management remains a big challenge for enterprises. It’s no secret that public cloud storage is really just storage that lives in another company’s data centre. Sure, it might be object storage rather than file-based, but it’s still a bunch of unstructured data. The way you consume that data may have changed, and certainly the way you pay for it has changed, but fundamentally it’s still your unstructured data sitting on a share or a filesystem. The problems you had on-premises, though, still manifest in public cloud environments (i.e. data sprawl, capacity issues, etc.). That’s why the Komprise solution seems so compelling when it comes to managing your on-premises storage consumption, and extending that capability to cloud storage is a no-brainer. When it comes to storing unstructured data, it’s frequently a bin fire of some sort or another. The reason is that it doesn’t scale well. I don’t mean the storage doesn’t scale – you can store petabytes all over the place if you like. But if you’re still hand-crafting your shares and manually moving data around, you’ll notice that it becomes more and more time-consuming as time goes on (and your data storage needs grow).

One way to address this challenge is to introduce a level of automation, which is something that Komprise does quite well. If you’ve got many terabytes of data stored on-premises and in AWS buckets (or you’re looking to move some old data from on-premises to the cloud) and you’re not quite sure what it’s all for or how best to go about it, Komprise can certainly help you out.

Random Short Take #39

Welcome to Random Short Take #39. Not a huge number of players have worn 39 in the NBA, and I’m not going to pretend I’m any real fan of The Dwightmare. But things are tough all around, so let’s remain optimistic and push through to number 40. Anyway, let’s get random.

  • VeeamON 2020 was online this week, and Anthony Spiteri has done a great job of summarising the major technical session announcements here.
  • I’ve known Howard Marks for a while now, and always relish the opportunity to speak with him when I can. This post is pretty hilarious, and I’m looking forward to reading the followup posts.
  • This is a great article from Alastair Cooke on COVID-19 and what En-Zed has done effectively to stop the spread. It was interesting to hear his thoughts on returning to the US, and I do agree that it’s going to be some time until I make the trip across the Pacific again.
  • Sometimes people get crazy ideas about how they might repurpose some old bits of technology. It’s even better when they write about their experiences in doing so. This article on automating an iPod Hi-Fi’s volume control over at Six Colors was fantastic.
  • Chris M. Evans put out a typically thought-provoking piece on data migration challenges recently that I think is worth checking out. I’ve been talking a lot to customers that are facing these challenges on a daily basis, and it’s interesting to see how, regardless of the industry vertical they operate in, it’s sometimes just a matter of the depth varying, so to speak.
  • I frequently bump into Ray Lucchesi at conferences, and he knows a fair bit about what does and doesn’t work. This article on his experiences recently with a number of virtual and online conferences is the epitome of constructive criticism.
  • Speaking of online conferences, the Australian VMUG UserCon will be virtual this year and will be held on the 30th July. You can find out more and register here.
  • Finally, if you’ve spent any time with me socially, you’ll know I’m a basketball nut. And invariably I’ll tell you that Deftones is my favouritest band ever. So it was great to come across this article about White Pony on one of my favourite sports (and popular culture) websites. If you’re a fan of Deftones, this is one to check out.


Komprise Announces Elastic Data Migration

Komprise recently announced the availability of its Elastic Data Migration solution. I was lucky enough to speak with Krishna Subramanian about the announcement and thought I’d share some of my notes here.


Migration Evolution


I’ve written about Komprise before. A few times, as it happens. Subramanian describes it as “analytics driven data management software”, capable of operating with NFS, SMB, and S3 storage. The data migration capability was added last year (at no additional charge), but it was initially focused on LAN-based migration.

Enter Elastic Data Migration

Elastic Data Migration isn’t just for LAN-based migrations though; it’s for customers who want to migrate to the cloud, or perhaps to another data centre. Invariably they’ll be looking to do this over a WAN, rather than a LAN. Given that WAN connections typically suffer from lower speeds and higher latencies, how does Komprise deal with this? I’m glad you asked. The solution addresses latency thusly:

  • Increased parallelism inside the software (based on the Komprise VMs, and the nature of the data sets);
  • Reduced round trips over the network; and
  • Optimised protocol handling to cut down on chatter (NFS, for example, is notoriously chatty).

Sounds simple enough, but Komprise is seeing some great results when compared to traditional tools such as rsync.
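
Komprise hasn’t shared its migration code, obviously, but the parallelism point is easy to illustrate. The sketch below copies a directory tree with many transfers in flight at once; over a high-latency WAN, that’s what hides the per-file round-trip cost that makes serial, single-stream tools crawl. This is my toy illustration of the idea, not what the product actually does:

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def migrate_tree(src: Path, dst: Path, workers: int = 8) -> int:
    """Copy every file under src to dst, many transfers in flight at once.

    Over a high-latency link each file costs at least one round trip, so
    overlapping transfers hides most of that latency -- the same idea,
    at vastly larger scale, behind elastic migration.
    """
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(p: Path) -> None:
        target = dst / p.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)  # safe under concurrency
        shutil.copy2(p, target)                           # preserves timestamps

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, files))
    return len(files)

# Demo against a throwaway directory tree:
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dst"
    for i in range(20):
        f = src / f"dir{i % 4}" / f"file{i}.txt"
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text("x" * 100)
    print(migrate_tree(src, dst))  # 20
```

A real migration engine also has to worry about ACLs, sparse files, retries, and verification, which is exactly why “just use rsync” stops being a satisfying answer at scale.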

It’s Graphical

There are some other benefits over the more traditional tools, including GUI access that allows you to run hundreds of migrations simultaneously.

[image courtesy of Komprise]

Of course, if you’re not into doing things with GUIs (and a GUI doesn’t always make sense where a level of automation is required), you can do this programmatically via API access.


Thoughts and Further Reading

Depending on what part of the IT industry you’re most involved in, the idea of data migrations may seem like something that’s a little old fashioned. Moving a bunch of unstructured data around using tools from way back when? Why aren’t people just using the various public cloud options to store their data? Well, I guess it’s partly because things take time to evolve and, based on the sorts of conversations I’m still regularly having, simple to use data migration solutions for large volumes of data are still required, and hard to come across.

Komprise has made its name making sense of vast chunks of unstructured data living under various rocks in enterprises. It also has a good story when it comes to archiving that data. It makes a lot of sense that it would turn its attention to improving the experience and performance of migrating a large number of terabytes of unstructured data from one source to another. There’s already a good story here in terms of extensive multi-protocol support and visibility into data sources. I like that Komprise has worked hard on the performance piece as well, and has removed some of the challenges traditionally associated with migrating unstructured data over WAN connections. Data migrations are still a relatively complex undertaking, but they don’t need to be painful.

One of the few things I’m sure of nowadays is that the amount of data we are storing is not shrinking. Komprise is working hard to make sense of what all that data is being used for. Once it knows what that data is for, it’s making it easy to put it in the place that you’ll get the most value from it. Whether that’s on a different NAS on your LAN, or sitting in another data centre somewhere. Komprise has published a whitepaper with the test results I referred to earlier, and you can grab it from here (registration required). Enrico Signoretti also had Subramanian on his podcast recently – you can listen to that here.

Komprise – Non-Disruptive Data Management

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Komprise recently presented at Storage Field Day 19. You can see videos of the presentation here, and download my rough notes from here.


What Do You Need From A Data Management Solution?

Komprise took us through the 6 tenets used to develop the solution:

  • Insight into our data
  • Make the insight actionable
  • Don’t get in front of hot data
  • Show us a path to the cloud
  • Scale to manage massive quantities of data
  • Transparent data movement

3 Architectural pillars

  • Dynamic Data Analytics – analyses data so you can make the right decision before buying more storage or backup
  • Transparent Move Technology – moves data with zero interference to apps, users, or hot data
  • Direct Data Access – puts you in control of your data – not your vendor

Archive successfully

  • No disruption
    • Transparency
    • No interference with hot data
  • Save money
  • Without lock-in
  • Extract value



So what does the Komprise architecture look like? There are a couple of components.

  • The Director is a VM that can be hosted on-premises or in a cloud. This hosts the console, exposes the API, and stores configuration information.
  • The Observer runs on-premises and can run on ESXi, or can be hosted on Linux bare metal. It’s used to discover the storage (and should be hosted in the same DC as said storage).
  • Deep Analytics indexes the files, and the Director can run queries against it. It can also be used to tag the data. Deep Analytics supports multiple Observers (across multiple DCs), giving you a “global metadata lake” and can also deliver automatic performance throttling for scans.

One neat feature is that you can choose to put a second copy somewhere when you’re archiving data. Komprise said that the typical customer starting size is 1PB or more.


Thoughts and Further Reading

I’ve previously written enthusiastically about what I’ve seen from Komprise. Data management is a difficult thing to get right at the best of times. I believe the growth in primary, unstructured storage has meant that the average punter / enterprise can’t really rely on file systems and directories to store data in a sensible location. There’s just so much stuff that gets generated daily. And a lot of it is important (well, at least a fair chunk of it is). One of the keys to getting value from the data you generate, though, is the ability to quickly access that data after it’s been generated. Going back to a file in six months’ time to refer to something can be immensely useful. But it’s a hard thing to do if you’ve forgotten about the file, or what was in it. So it’s a nice thing to have a tool that can track this stuff for you in a relatively sane fashion.

Komprise can also guide you down the path when it comes to intelligently accessing and storing your unstructured data. It can help with reducing your primary storage footprint, reducing your infrastructure spend and, hopefully, your operational costs. What’s more exciting, though, is the fact that all of this can be done in a transparent fashion to the end user. Betty in the finance department can keep generating documents that have ridiculous file names, and storing them forever, and Komprise will help you move those spreadsheets to where they’re of most use.

Storage is cheaper than it once was, but we’re also storing insanely big amounts of data. And for much longer than we have previously. Even if my effective $/GB stored is low compared to what it was in the year 2000, my number of GB stored is exponentially higher. Anything I can do to reduce that spend is going to be something that my enterprise is interested in. It seems like Komprise is well-positioned to help me do that. Its biggest customer has close to 100PB of data being looked after by Komprise.

You can download a whitepaper overview of the Komprise architecture here (registration required). For a different perspective on Komprise, check out Becky’s article here. Chin-Fah also shared his thoughts here.

Komprise Continues To Gain Momentum

I first encountered Komprise at Storage Field Day 17, and was impressed by the offering. I recently had the opportunity to take a briefing with Krishna Subramanian, President and COO at Komprise, and thought I’d share some of my notes here.




The primary reason for our call was to discuss Komprise’s Series C funding round of US $24 million. You can read the press release here. Some noteworthy achievements include:

  • Revenue more than doubled every single quarter, with existing customers steadily growing how much they manage with Komprise; and
  • Some customers now managing hundreds of PB with Komprise.


Key Verticals

Komprise are currently operating in the following key verticals:

  • Genomics and health care, with rapidly growing footprints;
  • Financial and Insurance sectors (5 out of 10 of the largest insurance companies in the world apparently use Komprise);
  • A lot of universities (research-heavy environments); and
  • Media and entertainment.


What’s It Do Again?

Komprise manages unstructured data over three key protocols (NFS, SMB, S3). You can read more about the product itself here, but some of the key features include the ability to “Transparently archive data”, as well as being able to put a copy of your data in another location (the cloud, for example).


So What’s New?

One of Komprise’s recent announcements was NAS to NAS migration. Say, for example, you’d like to migrate your data from an Isilon environment to FlashBlade: all you have to do is set one as the source, and the other as the target. The ACLs are fully preserved across all scenarios, and Komprise does all the heavy lifting in the background.

They’re also working on what they call “Deep Analytics”. Komprise already aggregates file analytics data very efficiently. They’re now working on indexing metadata on files and exposing that index. This will give you “a Google-like search on all your data, no matter where it sits”. The idea is that you can find data using any combination of metadata. The feature is in beta right now, and part of the new funding is being used to expand and grow this capability.
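
The implementation details of Deep Analytics weren’t shared, but the “search on any combination of metadata” idea maps neatly onto an inverted index over file attributes. The records and attributes below are entirely made up for illustration:

```python
from collections import defaultdict

# Toy metadata records -- in the real product these would come from
# scans of file systems and object stores across multiple sites.
FILES = [
    {"path": "/share/genomics/run1.bam", "owner": "alice", "ext": "bam", "tag": "research"},
    {"path": "/share/finance/q1.xlsx", "owner": "bob", "ext": "xlsx", "tag": "finance"},
    {"path": "/share/genomics/run2.bam", "owner": "alice", "ext": "bam", "tag": "archive"},
]

def build_index(files):
    """Map each (attribute, value) pair to the set of matching paths."""
    index = defaultdict(set)
    for f in files:
        for key, value in f.items():
            if key != "path":
                index[(key, value)].add(f["path"])
    return index

def search(index, **criteria):
    """AND together any combination of metadata criteria."""
    sets = [index.get(item, set()) for item in criteria.items()]
    return set.intersection(*sets) if sets else set()

idx = build_index(FILES)
print(search(idx, owner="alice", ext="bam"))      # both .bam files
print(search(idx, owner="alice", tag="archive"))  # just run2.bam
```

Doing this across billions of files, without hammering the storage being scanned, is the genuinely hard part, and presumably where the funding is going.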


Other Things?

Komprise can be driven entirely from an API, making it potentially interesting for service providers and VARs wanting to add support for unstructured data and associated offerings to their solutions. You can also use Komprise to “confine” data. The idea behind this is that data can be quarantined (if you’re not sure it’s being used by any applications). Using this feature you can perform staged deletions of data once you understand what applications are using what data (and when).
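
The confine-then-delete idea is essentially a two-stage workflow, and a toy version of it is easy to sketch. This is my illustration of the concept, not Komprise’s implementation:

```python
import shutil
import time
from pathlib import Path

def quarantine(path: Path, quarantine_dir: Path) -> Path:
    """Stage 1: move a suspect file out of the way, keeping its name.

    If an application breaks, you put the file back; if nothing
    complains for long enough, it's a deletion candidate.
    """
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    target = quarantine_dir / path.name
    shutil.move(str(path), str(target))
    return target

def purge_untouched(quarantine_dir: Path, grace_seconds: float):
    """Stage 2: delete quarantined files whose last-modified time is
    older than the grace period. Returns the paths that were removed."""
    removed = []
    now = time.time()
    for p in quarantine_dir.iterdir():
        if now - p.stat().st_mtime > grace_seconds:
            p.unlink()
            removed.append(p)
    return removed
```

A production version would track when each file entered quarantine (rather than leaning on mtime) and log everything, but the staged shape is the point: you never jump straight from “probably unused” to “gone”.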



I don’t often write articles about companies getting additional funding. I’m always very happy when they do, as someone thinks they’re on the right track, and it means that people will continue to stay employed. I thought this was interesting enough news to cover though, given that unstructured data, and its growth and management challenges, is an area I’m interested in.

When I first wrote about Komprise I joked that I needed something like this for my garage. I think it’s still a valid assertion in a way. The enterprise, at least in the unstructured file space, is a mess based on what I’ve seen in the wild. Users and administrators continue to struggle with the sheer volume and size of the data they have under their management. Tools such as this can provide valuable insights into what data is being used in your organisation, and, perhaps more importantly, who is using it. My favourite part is that you can actually do something with this knowledge, using Komprise to copy, migrate, or archive old (and new) data to other locations to potentially reduce the load on your primary storage.

I bang on all the time about the importance of archiving solutions in the enterprise, particularly when companies have petabytes of data under their purview. Yet, for reasons that I can’t fully comprehend, a number of enterprises continue to ignore the problem they have with data hoarding, instead opting to fill their DCs and cloud storage with old data that they don’t use (and very likely don’t need to store). Some of this is due to the fact that some of the traditional archive solution vendors have moved on to other focus areas. And some of it is likely due to the fact that archiving can be complicated if you can’t get the business to agree to stick to their own policies for document management. In just the same way as you can safely delete certain financial information after an amount of time has elapsed, so too can you do this with your corporate data. Or, at the very least, you can choose to store it on infrastructure that doesn’t cost a premium to maintain. I’m not saying “Go to work and delete old stuff”. But, you know, think about what you’re doing with all of that stuff. And if there’s no value in keeping the “kitchen cleaning roster May 2012.xls” file any more, think about deleting it? Or, consider a solution like Komprise to help you make some of those tough decisions.

I Need Something Like Komprise For My Garage

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


Komprise recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here. Here’s a blurry photo (love that iPhone camera quality) of Kumar K. Goswami (Founder and CEO of Komprise) presenting.


What’s In Your Garage?

My current house has a good-sized garage, and we only have one car. So I have a lot of space to store things in it. When we moved in we added some storage cupboards and some additional shelving to accommodate our stuff. Much like Parkinson’s Law (and the corollary for storage systems), the number of things in my garage has expanded to fill the available space. I have toys from when my children were younger, old university assignments, clothes, Christmas decorations, and oft-neglected gym equipment. You get the idea. Every year I give a bunch of stuff away to charities or throw it out. But my primary storage (new things) keeps expanding too, so I need to keep moving stuff to my garage for storage.

If you’ve ever had the good (!) fortune of managing file servers, you’ll understand that there’s a lot of data being stored in corporate environments that people don’t know what to do with. As Komprise pointed out in their presentation, we’re “[d]rowning in unstructured data”. Komprise wants to help out by “[i]dentifying cold data and syphoning it off before it goes into the data workflow and data protection systems”. The idea is that it delivers non-disruptive data management. Unlike cleaning up my garage, things just move about based on policies.


How’s That Work Then?

Komprise works by moving unstructured data about the place. It’s a hybrid SaaS solution, with a console in the cloud, and “observers” running in VMs on-premises.

[image courtesy of Komprise]

I don’t want to talk too much about how the product works, as I think the video presentation does a better job of that than I would. And there’s also an excellent article on their website covering the Komprise Filesystem. From a visualisation perspective though, the dashboard presents a “green doughnut”, providing information including:

  • Data by age;
  • File analytics (size, types, top users, etc); and
  • Policies, with projected ROI based on those policies (customers enter their own costs).

When files are moved around, Komprise leaves a “breadcrumb” on the source storage. They were careful not to call it a stub – it’s a Komprise Dynamic Link – a 4KB symbolic link.
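
Komprise didn’t go into the mechanics of the Dynamic Link beyond that description, but the general pattern (move the cold file to cheaper storage and leave a link at the original path so applications keep working) can be sketched with a plain POSIX symlink standing in for whatever Komprise actually writes:

```python
import shutil
from pathlib import Path

def archive_with_breadcrumb(source: Path, archive_dir: Path) -> Path:
    """Move a cold file to archive storage, leaving a link in its place.

    Applications that open the original path follow the link and still
    see the data -- the move is transparent, which is the whole point.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.move(str(source), str(target))
    source.symlink_to(target)  # the "breadcrumb" left on primary storage
    return target
```

Presumably the care around the word “stub” is about exactly this sort of fragility: the link is only useful while both ends survive, so a real product has to manage the breadcrumbs, not just drop them.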


It’s A Real Problem

One thing that really struck me about Komprise’s presentation was when they said they wanted to “[m]ove things you don’t want to cheaper storage”. It got me thinking that a lot of corporate file servers are very similar to my garage. There’s an awful lot of stuff being stored on them. Some of it is regularly used (much like my Christmas decorations), and some of it not as much (more like my gym equipment). So why don’t we throw stuff out? Well, when you’re in business, you generally have to work within the confines of various frameworks and regulations. So it’s not as simple as saying “Let’s get rid of the old stuff we haven’t used in 24 months”. Unlike those particularly unhelpful self-help books on decluttering, trashing corporate data isn’t the same as throwing out old boxes of magazines.

This is a real problem for corporations, and is only going to get worse. More and more data is being generated every day, much of it simply dumped on unstructured file stores with little to no understanding of the data’s value. Komprise seem to be doing a good job of helping to resolve an old problem. I still naively like to think that this would be better if people would use document management systems properly and take some responsibility for their stuff. But, much like the mislabelled boxes of files in my garage, it’s often not that simple. People move on, don’t know what to do with the data, and assume that the IT folks will take care of it. I think solutions like the one from Komprise, while being technically very interesting, also have an important role to play in the enterprise. I’m just wondering if I can do something like this with all of the stuff in my garage.


Further Reading

I heartily recommend checking out Enrico’s post, as well as Aaron’s take on the data management problem.