Komprise – Non-Disruptive Data Management

Disclaimer: I recently attended Storage Field Day 19.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Komprise recently presented at Storage Field Day 19. You can see videos of the presentation here, and download my rough notes from here.


What Do You Need From A Data Management Solution?

Komprise took us through the 6 tenets used to develop the solution:

  • Insight into our data
  • Make the insight actionable
  • Don’t get in front of hot data
  • Show us a path to the cloud
  • Scale to manage massive quantities of data
  • Transparent data movement

3 Architectural pillars

  • Dynamic Data Analytics – analyses data so you can make the right decision before buying more storage or backup
  • Transparent Move Technology – moves data with zero interference to apps, users, or hot data
  • Direct Data Access – puts you in control of your data – not your vendor

Archive successfully

  • No disruption
    • Transparency
    • No interference with hot data
  • Save money
  • Without lock-in
  • Extract value



So what does the Komprise architecture look like? There are three main components.

  • The Director is a VM that can be hosted on-premises or in a cloud. This hosts the console, exposes the API, and stores configuration information.
  • The Observer runs on-premises and can run on ESXi, or can be hosted on Linux bare metal. It’s used to discover the storage (and should be hosted in the same DC as said storage).
  • Deep Analytics indexes the files, and the Director can run queries against it. It can also be used to tag the data. Deep Analytics supports multiple Observers (across multiple DCs), giving you a “global metadata lake” and can also deliver automatic performance throttling for scans.

One neat feature is that you can choose to put a second copy somewhere when you’re archiving data. Komprise said that the typical customer starting size is 1PB or more.


Thoughts and Further Reading

I’ve previously written enthusiastically about what I’ve seen from Komprise. Data management is a difficult thing to get right at the best of times. I believe the growth in primary, unstructured storage has meant that the average punter / enterprise can’t really rely on file systems and directories to store data in a sensible location. There’s just so much stuff that gets generated daily. And a lot of it is important (well, at least a fair chunk of it is). One of the keys to getting value from the data you generate, though, is the ability to quickly access that data after it’s been generated. Going back to a file in 6 months’ time to refer to something can be immensely useful. But it’s a hard thing to do if you’ve forgotten about the file, or what was in it. So it’s a nice thing to have a tool that can track this stuff for you in a relatively sane fashion.

Komprise can also guide you down the path when it comes to intelligently accessing and storing your unstructured data. It can help with reducing your primary storage footprint, reducing your infrastructure spend and, hopefully, your operational costs. What’s more exciting, though, is the fact that all of this can be done in a transparent fashion to the end user. Betty in the finance department can keep generating documents that have ridiculous file names, and storing them forever, and Komprise will help you move those spreadsheets to where they’re of most use.

Storage is cheaper than it once was, but we’re also storing insanely big amounts of data. And for much longer than we have previously. Even if my effective $/GB stored is low compared to what it was in the year 2000, my number of GB stored is exponentially higher. Anything I can do to reduce that spend is going to be something that my enterprise is interested in. It seems like Komprise is well-positioned to help me do that. Its biggest customer has close to 100PB of data being looked after by Komprise.

You can download a whitepaper overview of the Komprise architecture here (registration required). For a different perspective on Komprise, check out Becky’s article here. Chin-Fah also shared his thoughts here.

Komprise Continues To Gain Momentum

I first encountered Komprise at Storage Field Day 17, and was impressed by the offering. I recently had the opportunity to take a briefing with Krishna Subramanian, President and COO at Komprise, and thought I’d share some of my notes here.




The primary reason for our call was to discuss Komprise’s Series C funding round of US $24 million. You can read the press release here. Some noteworthy achievements include:

  • Revenue more than doubled every single quarter, with existing customers steadily growing how much they manage with Komprise; and
  • Some customers now managing hundreds of PB with Komprise.


Key Verticals

Komprise are currently operating in the following key verticals:

  • Genomics and health care, with rapidly growing footprints;
  • Financial and Insurance sectors (5 out of 10 of the largest insurance companies in the world apparently use Komprise);
  • A lot of universities (research-heavy environments); and
  • Media and entertainment.


What’s It Do Again?

Komprise manages unstructured data over three key protocols (NFS, SMB, S3). You can read more about the product itself here, but some of the key features include the ability to “Transparently archive data”, as well as being able to put a copy of your data in another location (the cloud, for example).


So What’s New?

One of Komprise’s recent announcements was NAS to NAS migration. Say, for example, you’d like to migrate your data from an Isilon environment to FlashBlade. All you have to do is set one as the source and the other as the target. The ACLs are fully preserved across all scenarios, and Komprise does all the heavy lifting in the background.

They’re also working on what they call “Deep Analytics”. Komprise already aggregates file analytics data very efficiently. They’re now working on indexing metadata on files and exposing that index. This will give you “a Google-like search on all your data, no matter where it sits”. The idea is that you can find data using any combination of metadata. The feature is in beta right now, and part of the new funding is being used to expand and grow this capability.
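
To make the “any combination of metadata” idea a little more concrete, here’s a minimal sketch in Python. To be clear, this is not how Komprise’s Deep Analytics works under the hood, and it isn’t their API. The `FileRecord` fields and the `search` function are names I’ve made up for illustration, and a real index would live in a proper search engine rather than a list in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata record for one file (fields are assumptions)."""
    path: str
    owner: str
    extension: str
    size_bytes: int
    tags: frozenset = field(default_factory=frozenset)


def search(index, **criteria):
    """Return the records that satisfy every supplied criterion.

    Each criterion is keyed by a FileRecord field name. Its value is either
    a plain value (compared for equality) or a callable predicate.
    """
    def matches(record):
        for name, wanted in criteria.items():
            actual = getattr(record, name)
            if callable(wanted):
                if not wanted(actual):
                    return False
            elif actual != wanted:
                return False
        return True

    return [record for record in index if matches(record)]
```

With something like this you could ask for `search(index, owner="betty", extension=".xls", size_bytes=lambda s: s > 10_000_000)` and get back only the records that satisfy all three conditions, regardless of which share each file sits on.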


Other Things?

Komprise can be driven entirely from an API, making it potentially interesting for service providers and VARs wanting to add support for unstructured data and associated offerings to their solutions. You can also use Komprise to “confine” data. The idea behind this is that data can be quarantined (if you’re not sure it’s being used by any applications). Using this feature you can perform staged deletions of data once you understand what applications are using what data (and when).
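
The confine-then-delete workflow is easy enough to picture as a two-stage process. The sketch below is my own illustration in Python and not Komprise’s implementation: `confine`, `purge`, the quarantine directory, and the grace period are all hypothetical names and parameters. It also takes a shortcut by resetting the file’s modification time to record when it was confined, which a real tool would track separately.

```python
import os
import shutil
import time


def confine(path, quarantine_dir):
    """Stage 1: move a file into quarantine and return its new location."""
    os.makedirs(quarantine_dir, exist_ok=True)
    dest = os.path.join(quarantine_dir, os.path.basename(path))
    if os.path.exists(dest):
        raise FileExistsError(dest)
    shutil.move(path, dest)
    # Simplification: reuse mtime as the "confined at" timestamp.
    os.utime(dest, None)
    return dest


def purge(quarantine_dir, grace_days, now=None):
    """Stage 2: delete quarantined files older than the grace period."""
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so we are not deleting mid-iteration.
    for entry in list(os.scandir(quarantine_dir)):
        if not entry.is_file():
            continue
        if now - entry.stat().st_mtime > grace_days * 86400:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

If an application complains during the grace period, you move the file back out of quarantine. If nobody notices, `purge` cleans it up on the next run.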



I don’t often write articles about companies getting additional funding. I’m always very happy when they do, as someone thinks they’re on the right track, and it means that people will continue to stay employed. I thought this was interesting enough news to cover though, given that unstructured data, and its growth and management challenges, is an area I’m interested in.

When I first wrote about Komprise I joked that I needed something like this for my garage. I think it’s still a valid assertion in a way. The enterprise, at least in the unstructured file space, is a mess based on what I’ve seen in the wild. Users and administrators continue to struggle with the sheer volume and size of the data they have under their management. Tools such as this can provide valuable insights into what data is being used in your organisation, and, perhaps more importantly, who is using it. My favourite part is that you can actually do something with this knowledge, using Komprise to copy, migrate, or archive old (and new) data to other locations to potentially reduce the load on your primary storage.

I bang on all the time about the importance of archiving solutions in the enterprise, particularly when companies have petabytes of data under their purview. Yet, for reasons that I can’t fully comprehend, a number of enterprises continue to ignore the problem they have with data hoarding, instead opting to fill their DCs and cloud storage with old data that they don’t use (and very likely don’t need to store). Some of this is due to the fact that some of the traditional archive solution vendors have moved on to other focus areas. And some of it is likely due to the fact that archiving can be complicated if you can’t get the business to agree to stick to their own policies for document management. In just the same way as you can safely delete certain financial information after an amount of time has elapsed, so too can you do this with your corporate data. Or, at the very least, you can choose to store it on infrastructure that doesn’t cost a premium to maintain. I’m not saying “Go to work and delete old stuff”. But, you know, think about what you’re doing with all of that stuff. And if there’s no value in keeping the “kitchen cleaning roster May 2012.xls” file any more, think about deleting it? Or, consider a solution like Komprise to help you make some of those tough decisions.

I Need Something Like Komprise For My Garage

Disclaimer: I recently attended Storage Field Day 17.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.


Komprise recently presented at Storage Field Day 17. You can see their videos from Storage Field Day 17 here, and download a PDF copy of my rough notes from here. Here’s a blurry photo (love that iPhone camera quality) of Kumar K. Goswami (Founder and CEO of Komprise) presenting.


What’s In Your Garage?

My current house has a good sized garage, and we only have one car. So I have a lot of space to store things in it. When we moved in we added some storage cupboards and some additional shelving to accommodate our stuff. Much like Parkinson’s Law (and the corollary for storage systems), the number of things in my garage has expanded to fill the available space. I have toys from when my children were younger, old university assignments, clothes, Christmas decorations, oft-neglected gym equipment. You get the idea. Every year I give a bunch of stuff away to charities or throw it out. But my primary storage (new things) keeps expanding too, so I need to keep moving stuff to my garage for storage.

If you’ve ever had the good (!) fortune of managing file servers, you’ll understand that there’s a lot of data being stored in corporate environments that people don’t know what to do with. As Komprise pointed out in their presentation, we’re “[d]rowning in unstructured data”. Komprise wants to help out by “[i]dentifying cold data and syphoning it off before it goes into the data workflow and data protection systems”. The idea is that it delivers non-disruptive data management. Unlike cleaning up my garage, things just move about based on policies.


How’s That Work Then?

Komprise works by moving unstructured data about the place. It’s a hybrid SaaS solution, with a console in the cloud, and “observers” running in VMs on-premises.

[image courtesy of Komprise]

I don’t want to talk too much about how the product works, as I think the video presentation does a better job of that than I would. And there’s also an excellent article on their website covering the Komprise Filesystem. From a visualisation perspective though, the dashboard presents a “green doughnut”, providing information including:

  • Data by age;
  • File analytics (size, types, top users, etc); and
  • The ability to set policies and see ROI based on the policy (the customer enters their own costs).
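
If you want a feel for what “data by age” and a policy-based ROI estimate involve, here’s a rough Python sketch. It’s my own illustration under some loud assumptions: the age buckets, the function names, and the cost figures you feed in are all hypothetical, and it uses modification time as a stand-in for how cold a file is. It says nothing about how Komprise actually scans or scores data.

```python
import os
import time
from collections import Counter

# Hypothetical buckets: (upper bound in days, label). Not Komprise's own.
AGE_BUCKETS = [
    (90, "< 3 months"),
    (365, "3-12 months"),
    (float("inf"), "> 1 year"),
]


def bucket_for(age_days):
    """Return the label of the first bucket whose upper bound exceeds the age."""
    for upper, label in AGE_BUCKETS:
        if age_days < upper:
            return label
    return AGE_BUCKETS[-1][1]


def bytes_by_age(root, now=None):
    """Walk a directory tree and total file sizes per age bucket (by mtime)."""
    now = time.time() if now is None else now
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable, so skip it
            age_days = (now - st.st_mtime) / 86400
            totals[bucket_for(age_days)] += st.st_size
    return totals


def monthly_saving(cold_bytes, primary_cost_per_gb, archive_cost_per_gb):
    """Estimated monthly saving from moving cold_bytes to cheaper storage.

    Both costs are per GB per month and are supplied by the caller.
    """
    gigabytes = cold_bytes / 1e9
    return gigabytes * (primary_cost_per_gb - archive_cost_per_gb)
```

You’d run `bytes_by_age` against a share, pick the bucket your policy treats as cold, and pass that byte count to `monthly_saving` along with your own storage costs.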

When files are moved around, Komprise leaves a “breadcrumb” on the source storage. They were careful not to call it a stub – it’s a Komprise Dynamic Link – a 4KB symbolic link.
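
For anyone who hasn’t played with symbolic links, the basic move-and-leave-a-link idea looks something like the Python below. This is only a sketch of the general technique on a POSIX file system. It is not the Komprise Dynamic Link, which works across NFS and SMB and does a lot more than this. The function name and the archive directory are hypothetical.

```python
import os
import shutil


def archive_with_link(src, archive_dir):
    """Move src into archive_dir and leave a symlink at the original path."""
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.join(archive_dir, os.path.basename(src))
    if os.path.exists(dest):
        raise FileExistsError(dest)
    shutil.move(src, dest)
    # The link lives where the file used to be and points at its new home.
    os.symlink(os.path.abspath(dest), src)
    return dest
```

Anything that opens the original path afterwards follows the link and reads the archived copy, which is the transparency the user cares about.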


It’s A Real Problem

One thing that really struck me about Komprise’s presentation was when they said they wanted to “[m]ove things you don’t want to cheaper storage”. It got me thinking that a lot of corporate file servers are very similar to my garage. There’s an awful lot of stuff being stored on them. Some of it is regularly used (much like my Christmas decorations), and some of it not as much (more like my gym equipment). So why don’t we throw stuff out? Well, when you’re in business, you generally have to work within the confines of various frameworks and regulations. So it’s not as simple as saying “Let’s get rid of the old stuff we haven’t used in 24 months”. Unlike those particularly unhelpful self-help books on decluttering, trashing corporate data isn’t the same as throwing out old boxes of magazines.

This is a real problem for corporations, and is only going to get worse. More and more data is being generated every day, much of it simply dumped on unstructured file stores with little to no understanding of the data’s value. Komprise seem to be doing a good job of helping to resolve an old problem. I still naively like to think that this would be better if people would use document management systems properly and take some responsibility for their stuff. But, much like the mislabelled boxes of files in my garage, it’s often not that simple. People move on, don’t know what to do with the data, and assume that the IT folks will take care of it. I think solutions like the one from Komprise, while being technically very interesting, also have an important role to play in the enterprise. I’m just wondering if I can do something like this with all of the stuff in my garage.


Further Reading

I heartily recommend checking out Enrico’s post, as well as Aaron’s take on the data management problem.