Komprise Announces Elastic Data Migration

Komprise recently announced the availability of its Elastic Data Migration solution. I was lucky enough to speak with Krishna Subramanian about the announcement and thought I’d share some of my notes here.


Migration Evolution

Komprise?

I’ve written about Komprise before. A few times, as it happens. Subramanian describes it as “analytics driven data management software”, capable of operating with NFS, SMB, and S3 storage. The data migration capability was added last year (at no additional charge), but it was initially focused on LAN-based migration.

Enter Elastic Data Migration

Elastic Data Migration isn’t just for LAN-based migrations though; it’s for customers who want to migrate to the cloud, or perhaps to another data centre. Invariably they’ll be looking to do this over a WAN, rather than a LAN. Given that WAN connections typically suffer from lower speeds and higher latencies, how does Komprise deal with this? I’m glad you asked. The solution addresses latency in three ways (there’s a rough sketch of the parallelism idea after the list):

  • Increasing parallelism inside the software (based on the Komprise VMs, and the nature of the data sets);
  • Reducing the number of round trips over the network; and
  • Optimising chatty protocols (NFS being a notable offender) to reduce protocol overhead.
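To be clear about what follows: this is a minimal sketch of the general parallelism idea in Python, not Komprise’s implementation, and the paths and worker count are purely illustrative. A serial copy pays the full round-trip cost for every file, while a pool of workers keeps several transfers in flight at once, which matters far more on a high-latency WAN than it does on a LAN.

import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def copy_file(src: Path, src_root: Path, dst_root: Path) -> Path:
    # Copy one file, preserving its relative path under the destination.
    dst = dst_root / src.relative_to(src_root)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy2 carries over timestamps and mode bits
    return dst

def migrate(src_root: Path, dst_root: Path, workers: int = 16) -> None:
    files = [p for p in src_root.rglob("*") if p.is_file()]
    # Each in-flight worker amortises the per-file latency of the link.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for dst in pool.map(lambda p: copy_file(p, src_root, dst_root), files):
            print(f"copied {dst}")

migrate(Path("/mnt/source_share"), Path("/mnt/target_share"))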

Sounds simple enough, but Komprise is seeing some great results when compared to traditional tools such as rsync.

It’s Graphical

There are some other benefits over the more traditional tools, including GUI access that allows you to run hundreds of migrations simultaneously.

[image courtesy of Komprise]

Of course, if you’re not into doing things with GUIs (and a GUI doesn’t always make sense where a level of automation is required), you can do this programmatically via API access.
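The snippet below is only a hypothetical sketch of what driving migrations programmatically tends to look like; the base URL, payload fields, and auth header are placeholders I’ve made up, not Komprise’s documented interface.

import requests

API = "https://komprise.example.com/api/v1"    # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}  # hypothetical auth scheme

def start_migration(source: str, target: str) -> str:
    # Create a migration job and return its (again, hypothetical) job ID.
    resp = requests.post(f"{API}/migrations",
                         json={"source": source, "target": target},
                         headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()["jobId"]

# Scripting a batch of jobs like this is where an API beats any GUI.
shares = [("nas1:/vol/projects", "s3://archive/projects"),
          ("nas1:/vol/finance", "s3://archive/finance")]
for src, dst in shares:
    print(start_migration(src, dst))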


Thoughts and Further Reading

Depending on what part of the IT industry you’re most involved in, the idea of data migrations may seem like something that’s a little old fashioned. Moving a bunch of unstructured data around using tools from way back when? Why aren’t people just using the various public cloud options to store their data? Well, I guess it’s partly because things take time to evolve and, based on the sorts of conversations I’m still regularly having, simple-to-use data migration solutions for large volumes of data are still required, and hard to come by.

Komprise has made its name making sense of vast chunks of unstructured data living under various rocks in enterprises. It also has a good story when it comes to archiving that data. It makes a lot of sense that it would turn its attention to improving the experience and performance of migrating large volumes of unstructured data from one source to another. There’s already a good story here in terms of extensive multi-protocol support and visibility into data sources. I like that Komprise has worked hard on the performance piece as well, and has removed some of the challenges traditionally associated with migrating unstructured data over WAN connections. Data migrations are still a relatively complex undertaking, but they don’t need to be painful.

One of the few things I’m sure of nowadays is that the amount of data we’re storing is not shrinking. Komprise is working hard to make sense of what all that data is being used for, and once it knows, to make it easy to put that data wherever you’ll get the most value from it, whether that’s a different NAS on your LAN or another data centre somewhere. Komprise has published a whitepaper with the test results I referred to earlier, and you can grab it from here (registration required). Enrico Signoretti also had Subramanian on his podcast recently – you can listen to that here.

Datadobi Announces DobiMigrate 5.8 – Introduces Chain of Custody

Datadobi recently announced version 5.8 of its DobiMigrate software and introduced a “Chain of Custody” feature. I had the opportunity to speak to Carl D’Halluin and Michael Jack about the announcement and thought I’d share some thoughts on it here.


Don’t They Do File Migration?

If you’re unfamiliar with Datadobi, it’s a company that specialises in NAS migration software. It tends to get used a lot by the major NAS vendors as a rock-solid method of moving data off a competitor’s box and onto theirs. Datadobi has been around for quite a while, and a lot of the founders have heritage with EMC Centera.

Chain of Custody?

So what exactly does the Chain of Custody feature offer?

  • Tracking of files and objects throughout an entire migration;
  • A full photo-finish of the source and destination systems at cutover time;
  • Forensic input that can serve as future evidence should tampering be alleged; and
  • Availability for all migrations, with:
    • No performance hit; and
    • No enlarged maintenance window.

[image courtesy of Datadobi]

Why Is This Important?

Organisations the world over are subject to a variety of legislative requirements that ensure the data presented as evidence in courts of law hasn’t been tampered with. Some of them spend an inordinate amount of money ensuring that their document management systems (and the hardware those systems reside on) offer all kinds of compliance and governance features, so that you can reliably get up in front of a judge and say that nothing has been messed with. Or you can reliably say that it has been messed with. Either way though, it’s reliable. Unfortunately, nothing lasts forever (not even those Centera cubes we put in years ago).

So what do you do when you have to migrate your data from one platform to another? If you’ve just used rsync or robocopy to get the data from one share to another, how can you reliably prove that you’ve done so without corrupting or otherwise tampering with the data? Logs are just files, after all, so what’s to stop someone “losing” some data along the way?
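To make that a bit more concrete, here’s a minimal sketch of the idea that underpins any chain of custody for a migration: hash every file on both sides and record the digests in a manifest. This is emphatically not how DobiMigrate implements the feature; it just shows why a plain rsync or robocopy log proves very little, while content digests (ideally signed and stored somewhere the migration team can’t touch) are much harder to quietly alter.

import hashlib
import json
from pathlib import Path

def digest(path: Path) -> str:
    # SHA-256 of the file's contents, streamed so large files are fine.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def manifest(root: Path) -> dict:
    # Map each file's relative path to its content digest.
    return {str(p.relative_to(root)): digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

source = manifest(Path("/mnt/old_nas/share"))
target = manifest(Path("/mnt/new_nas/share"))
mismatches = [k for k in source if source[k] != target.get(k)]
print(f"{len(source)} files checked, {len(mismatches)} mismatches")

# Persist the evidence; in practice you'd sign this and store it
# somewhere it can't be modified after the fact.
Path("custody_manifest.json").write_text(json.dumps(source, indent=2))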

It turns out that a lot of folks in the legal profession have been aware that this was a problem for a while, but they’ve looked the other way. I am no lawyer, but as it was explained to me, if you introduce some doubt into the reliability of the migration process, it’s easy enough for the other side to counter that your stuff may not have been so reliable either, and the whole thing becomes something of a shambles. Of course, there’s likely a more coherent way to explain this, but this is a tech blog and I’m being lazy.


Thoughts

I’ve done all kinds of data migrations over the years. I think I’ve been fortunate that I’ve never specifically had to deal with a system that was being relied on seriously for legislative reasons, because I’m sure that some of those migrations were done more by the seat of my pants than anything else. Usually the last thing on the organisation’s mind (?) was whether the migration activity was compliant or not. Instead, the focus of the project manager was normally to get the data from the old box to the new box as quickly as possible and with as little drama / downtime as possible.

If you’re working on this stuff in a large financial institution though, you’ll likely have a different focus. And I’m sure the last thing your corporate counsel wants to hear is that you’ve been playing a little fast and loose with data over the years. I anticipate this announcement will be greeted with some happiness by people who’ve been saddled with these kinds of daunting tasks in the past. As we move to a more and more digital world, we need to carry some of the concepts from the physical world across. It strikes me that Datadobi has every reason to be excited about this announcement. You can read the press release here.


Cohesity – NAS Data Migration Overview

Data Migration

Cohesity NAS Data Migration, part of SmartFiles, was recently announced as a generally available feature within the Cohesity DataPlatform 6.4 release (after being mentioned in the 6.3 release blog post). The idea behind it is that you can use the feature to migrate NAS data from a primary source to the Cohesity DataPlatform. It is supported for NAS storage registered as SMB or NFS (so it doesn’t necessarily need to be a NAS appliance as such; it can also be a file share hosted somewhere).


What To Think About

There are a few things to think about when you configure your migration policy, including:

  • The last time the file was accessed;
  • The last time the file was modified; and
  • The size of the file.

You also need to think about how frequently you want to run the job. Finally, it’s worth considering which View you want the archived data to reside on.
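As a rough illustration of how those criteria might combine, here’s a sketch of the policy logic in Python. This is not Cohesity’s code, and the thresholds are made up; it just shows the shape of the age-and-size filter the migration policy applies.

import time
from pathlib import Path

DAY = 86400  # seconds

def eligible(path: Path, min_age_days: int = 180, min_size: int = 1 << 20) -> bool:
    # A file is "cold" enough to migrate if it hasn't been accessed or
    # modified recently, and is at least min_size bytes (1MB here).
    st = path.stat()
    now = time.time()
    return (now - st.st_atime > min_age_days * DAY
            and now - st.st_mtime > min_age_days * DAY
            and st.st_size >= min_size)

share = Path("/mnt/primary_share")
candidates = [p for p in share.rglob("*") if p.is_file() and eligible(p)]
print(f"{len(candidates)} files match the policy")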


What Happens?

When the data is migrated, an SMB2 symbolic link with the same name as the original file is left in its place, and the original data is moved to the Cohesity View. Note that on Windows boxes, remote-to-remote symbolic links are disabled by default, so you need to run these commands:

C:\Windows\system32>fsutil behavior set SymlinkEvaluation R2R:1
C:\Windows\system32>fsutil behavior query SymlinkEvaluation
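(The first command enables remote-to-remote symbolic link evaluation; the second simply reports the current settings so you can confirm the change took effect.)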

Once the data is migrated to the Cohesity cluster, subsequent read and write operations are performed on the Cohesity host. You can move data back to the environment by mounting the Cohesity target View on a Windows client, and copying it back to the NAS.


Configuration Steps

To get started, select File Services, and click on Data Migration.

Click on Migrate Data to configure a migration job.

You’ll need to give it a name.


The next step is to select the Source. If you already have a NAS source configured, you’ll see it here. Otherwise you can register a Source.

Click on the arrow to expand the registered NAS mount points.

Select the mount point you’d like to use.

Once you’ve selected the mount point, click on Add.

You then need to select the Storage Domain (formerly known as a ViewBox) to store the archived data on.

You’ll need to provide a name, and configure schedule options.

You can also configure advanced settings, including QoS and exclusions. Once you’re happy, click on Migrate and the job will be created.

You can then run the job immediately, or wait for the schedule to kick in.


Other Things To Consider

You’ll need to think about your anti-virus options as well. You can register external anti-virus software or install the anti-virus app from the Cohesity Marketplace.


Thoughts And Further Reading

Cohesity has long positioned its secondary storage solution as something more than just a backup and recovery solution. There’s some debate about the difference between storage management and data management, but Cohesity seems to have done a good job of introducing yet another feature that helps users easily move data from their primary storage to their secondary storage environment. Plenty of backup solutions have positioned themselves as archive solutions, but many have been focused on moving protection data, rather than primary data, from the source. You’ll need to do some careful planning around sizing your environment, as there’s always a chance that an end user will turn up and start accessing files that you thought were stale. And I can’t say with 100% certainty that this solution will work transparently with every line-of-business application in your environment. But considering it’s aimed at SMB and NFS shares, it looks like it does what it says on the tin, and moves data from one spot to another.

You can read more about the new features in Cohesity DataPlatform 6.4 (Pegasus) on the Cohesity site, and Blocks & Files covered the feature here. Alastair also shared some thoughts on the feature here.

Broken sVMotion

It’s been a very long few weeks, and I’m looking forward to taking a few weeks off. I’ve been having a lot of fun doing a data migration from a CLARiiON CX3-40f and CX700 to a shiny, new CX4-960. To give you some background: instead of doing an EMC-supported data-in-place upgrade in August last year, we decided to buy another CX4-960 (on top of the already purchased CX4-960 upgrade kit) and migrate the data manually to the new array. The only minor problem with this is that there’s about 100TB of data in various forms that I need to get onto the new array. Sucks to be me, but I am paid by the hour.

I started off by moving VMs from one cluster to the new array using sVMotion, as there was a requirement to be a bit non-disruptive where possible. Unfortunately, on the larger volumes attached to fileservers, I had a few problems. I’ll list them, just for giggles:

There were three 800GB volumes, five 500GB volumes, and one 300GB volume that had 0% free space on the VMFS. And I mean 0%, not 5MB or 50MB: 0%. So that’s not cool for a few reasons. ESX likes to update journaling data on VMFS, because it’s a journaling filesystem. If you don’t give it space to do this, it can’t, and you’ll find volumes start to get remounted with very limited writability. If you try to storage VMotion these volumes, you’ll again be out of luck, as it wants to keep a dmotion file on the filesystem to track any changes to the vmdk file while the migration is happening. I found my old colleague Leo’s post to be helpful when a few migrations failed, but unfortunately the symptoms he described were not the same as mine; in my case, the VMs fell over entirely. More info from VMware can be had here.

If you want to move just a single volume, you can try your luck with this method, which I’ve used successfully before. But I was tired, and wanted to use vmkfstools since I’d already had an unexpected outage and had to get something sorted.

The problem with vmkfstools is that there’s no restartable copy option (as far as I know). So when you get 80% of the way through a 500GB file and it fails, well, that’s 400GB of pointless copying and time you’ll never get back. Multiply that out over three 800GB volumes and a few ornery 500GB vmdks and you’ll start to get a picture of what kind of week I had.
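For what it’s worth, a restartable copy isn’t a complicated idea, and there’s a minimal sketch of it below: check how much of the destination already exists and resume from that offset instead of starting from zero. This is purely illustrative (and not something you’d point at live VMFS volumes); the filenames are made up. It just shows what I was missing that week.

from pathlib import Path

CHUNK = 8 * 1024 * 1024  # 8MB reads keep syscall overhead low

def resumable_copy(src: Path, dst: Path) -> None:
    # Resume from wherever a previous, failed run got up to.
    done = dst.stat().st_size if dst.exists() else 0
    with src.open("rb") as s, dst.open("ab") as d:
        s.seek(done)
        while chunk := s.read(CHUNK):
            d.write(chunk)

resumable_copy(Path("fileserver1-flat.vmdk"),
               Path("/new_datastore/fileserver1-flat.vmdk"))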

After suffering through a number of failures, I ended up taking one node out of the 16-node cluster and placing it and its associated datastores (the ones I needed to migrate) in their own Navisphere storage group. That way, there’d be no “busy-ness” affecting the migration process (we had provisioned about 160 LUNs to the cluster at this stage and were, obviously, getting a few “SCSI reservation conflicts” and “resource temporarily unavailable” issues). This did the trick, and I was able to get some more stuff done. Now there’s only about 80TB to go before the end of April. Fun times.

And before you ask why I didn’t use SAN Copy: I don’t know. I suppose I’ve never had the opportunity to test it with ESX, and while I know the underlying technology is the same as MirrorView, I just really didn’t feel I was in a position to make that call. I probably should have just done it, but I didn’t really expect that I’d have as much trouble as I did with sVMotion and / or vmkfstools. So there you go.