Cloudtenna Announces DirectSearch

 

I had the opportunity to speak to Aaron Ganek about Cloudtenna and their DirectSearch product recently and thought I’d share some thoughts here. Cloudtenna recently announced $4M in seed funding, have Citrix as a key strategic partner, and are shipping a beta product today. Their goal is “[b]ringing order to file chaos!”.

 

The Problem

Ganek told me that there are three major issues with file management and the plethora of collaboration tools used in the modern enterprise:

  • Search is too much effort
  • Security tends to fall through the cracks
  • Enterprise IT is dangerously non-compliant

Search

Most of these collaboration tools are geared up for search, because people don’t tend to remember where they put files, or what they’ve called them. So you might have some files in your corporate Box account, and some in Dropbox, and then some sitting in Confluence. The problem with trying to find something is that you need to search each application individually. According to Cloudtenna, this:

  • Wastes time;
  • Leads to frustration; and
  • Often yields poor results.

Security

Security also becomes a problem when you have multiple storage repositories for corporate files.

  • There are too many apps to manage
  • It’s difficult to track users across applications
  • There’s no consolidated audit trail

Exposure

As a result of this, enterprises find themselves facing exposure to litigation, primarily because they can’t answer these questions:

  • Who accessed what?
  • When and from where?
  • What changed?

As some of my friends like to say, “people die from exposure”.

 

Cloudtenna – The DirectSearch Solution

Enter DirectSearch. At its core it’s a SaaS offering that:

  • Catalogues file activity across disparate data silos; and
  • Delivers machine learning services to mitigate the “chaos”.

Basically you point it at all of your data repositories and you can then search across all of them from one screen. The cool thing about the catalogue is not just that it tracks metadata and leverages full-text indexing, but that it also tracks user activity. It supports a variety of on-premises, cloud and SaaS applications (6 at the moment, 16 by September). You only need to log in once and there’s full ACL support – so users can only see what they’re meant to see.

According to Ganek, it also delivers some pretty fast search results, in the order of 400 – 600ms.

[image courtesy of Cloudtenna]
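
To make that a little more concrete, here’s a very rough sketch of the fan-out idea: one query sent to several repositories, with the merged results filtered against what the requesting user is allowed to see. The connector names and data are made up for illustration, and this is definitely not Cloudtenna’s actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical connectors and data - not Cloudtenna's actual API, just an
# illustration of fanning one query out across several repositories and
# filtering the merged results against what the requesting user can see.
class Connector:
    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # each doc: {"title": str, "acl": set of users}

    def search(self, query, user):
        return [
            {"source": self.name, "title": doc["title"]}
            for doc in self.documents
            if query.lower() in doc["title"].lower() and user in doc["acl"]
        ]

def federated_search(query, user, connectors):
    # Fan the query out to every repository in parallel, then merge the hits.
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(lambda c: c.search(query, user), connectors))
    return [hit for results in result_sets for hit in results]

connectors = [
    Connector("box", [{"title": "Q3 budget.xlsx", "acl": {"dan"}}]),
    Connector("dropbox", [{"title": "budget draft.docx", "acl": {"alice"}}]),
    Connector("confluence", [{"title": "Budget review notes", "acl": {"dan", "alice"}}]),
]

print(federated_search("budget", "dan", connectors))
# [{'source': 'box', 'title': 'Q3 budget.xlsx'},
#  {'source': 'confluence', 'title': 'Budget review notes'}]
```

Obviously the real product is doing full-text indexing and activity tracking on top of this, but a single query fanned out across silos, with per-user ACL filtering, is the core idea.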

I was interested to know a little more about how the machine learning could identify files that were being worked on by people in the same workgroup. Ganek said they didn’t rely on Active Directory group membership, as it was often outdated. Instead, they tracked file activity to create a “Shadow IT organisational chart” that could be used to identify who was collaborating on what, and tailor the search results accordingly.
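
I don’t have visibility into how Cloudtenna actually builds that chart, but as a rough illustration of the idea, you could derive collaboration affinity from shared file activity and use it to work out who’s collaborating with whom. All of the names and the scoring below are made up for the example.

```python
from collections import defaultdict
from itertools import combinations

# File activity events (file, user) - purely made-up data for illustration.
events = [
    ("report.docx", "dan"), ("report.docx", "alice"), ("report.docx", "bob"),
    ("budget.xlsx", "dan"), ("budget.xlsx", "bob"),
    ("notes.md", "alice"),
]

# Build a "shadow org chart": a weighted graph of who touches files with whom.
touched = defaultdict(set)
for filename, user in events:
    touched[filename].add(user)

affinity = defaultdict(int)
for users in touched.values():
    for a, b in combinations(sorted(users), 2):
        affinity[(a, b)] += 1

def collaborators(user):
    # People who most often work on the same files as `user`, strongest first.
    scores = defaultdict(int)
    for (a, b), weight in affinity.items():
        if user == a:
            scores[b] += weight
        elif user == b:
            scores[a] += weight
    return sorted(scores, key=scores.get, reverse=True)

# Search results could then be boosted when a file was recently touched by one
# of the searcher's closest collaborators.
print(collaborators("dan"))  # ['bob', 'alice'] - bob shares two files with dan
```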

 

Thoughts and Further Reading

I’ve spent a good part of my career in the data centre providing storage solutions for enterprises to host their critical data on. I talk a lot about data and how important it is to the business. I’ve worked at some established companies where thousands of files are created every day and terabytes of data are moved around. Almost without fail, file management has been a pain in the rear. Whether I’ve been using Box to collaborate, or sending links to files with Dropbox, or been stuck using Microsoft Teams (great for collaboration but hopeless from a management perspective), invariably files get misplaced or I find myself firing up a search window to try and track down this file or that one. It’s a mess because we don’t just work from a single desktop and carefully curated filesystem any more. We’re creating files on mobile devices, emailing them about, and gathering data from systems that don’t necessarily play well on some platforms. It’s a mess, but we need access to the data to get our jobs done. That’s why something like Cloudtenna has my attention. I’m looking forward to seeing them progress with the beta of DirectSearch, and I have a feeling they’re on to something pretty cool with their product. You can also read Rich’s thoughts on Cloudtenna over at the Gestalt IT website.

Come And Splash Around In NetApp’s Data Lake

Disclaimer: I recently attended Storage Field Day 15.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

NetApp recently presented at Storage Field Day 15. You can see videos of their presentation here, and download my rough notes from here.

 

You Say Day-ta, I Say Dar-ta

Santosh Rao (Senior Technical Director, Workloads and Ecosystems) took us through some of the early big data platform challenges NetApp are looking to address.

 

Early Generation Big Data Analytics Platform

These were designed to deliver initial analytics solutions and were:

  • Implemented as a proof of concept; and
  • Solved a point project need.

The primary considerations of these solutions were usually cost and agility. The focus was to:

  • Limit up-front costs and get the system operational quickly; and
  • Treat scalability, availability, and governance as afterthoughts.

A typical approach to this was to use cloud or commodity infrastructure. This ended up becoming the final architecture. The problem with this approach, according to NetApp, is that it led to unpredictable behaviour as copies manifested. You’d end up with 3-5 replicas of data copied across lines of business and various functions. Not a great situation.

 

Early Generation Analytics Platform Challenges

Other challenges with this architecture included:

  • Unpredictable performance;
  • Inefficient storage utilisation;
  • Media and node failures;
  • Total cost of ownership;
  • Not enterprise ready; and
  • Storage and compute tied together (creating imbalance).

 

Next Generation Data Pipeline

So what do we really need from a data pipeline? According to NetApp, the key is “Unified Insights across LoBs and Functions”. By this they mean:

  • A unified enterprise data lake;
  • Federated data sources across the 2nd and 3rd platforms;
  • In-place access to the data pipeline (copy avoidance);
  • Spanned across edge, core and cloud; and
  • Future proofed to allow shifts in architecture.

Another key consideration is the deployment. The first proof of concept is performed by the business unit, but it needs to scale for production use.

  • Scale edge, core and cloud as a single pipeline
  • Predictable availability
  • Governance, data protection, security on data pipeline

This provides for a lower TCO over the life of the solution.

 

Data Pipeline Requirements

We’re not just playing in the core any more, or exclusively in the cloud. This stuff is everywhere. And everywhere you look the requirements differ as well.

Edge

  • Massive data (few TB/device/day)
  • Real-time Edge Analytics / AI
  • Ultra Low Latency
  • Network Bandwidth
  • Smart Data Movement

Core

  • Ultra-high I/O bandwidth (20-200+ GB/s)
  • Ultra-low latency (microsecond to nanosecond)
  • Linear scale (1-128 node AI)
  • Overall TCO for 1-100+ PB

Cloud

  • Cloud analytics, AI/DL/ML
  • Consume and not operate
  • Cloud vendor vs on-premises stack
  • Cost-effective archive
  • Need to avoid cloud lock-in

Here’s a picture of what the data pipeline looks like for NetApp.

[Image courtesy of NetApp]

 

NetApp provided the following overview of what the data pipeline looks like for AI / Deep Learning environments. You can read more about that here.

[Image courtesy of NetApp]

 

What Does It All Mean?

NetApp have a lot of tools at their disposal, and a comprehensive vision for meeting the requirements of big data, AI and deep learning workloads from a number of different angles. It’s not just about performance, it’s about understanding where the data needs to be to be considered useful to the business. I think there’s a good story to tell here with NetApp’s Data Fabric, but it felt a little like there remains some integration work to do. Big data, AI and deep learning mean different things to different people, and there’s sometimes a reluctance to change the way people do things for the sake of adopting a new product. NetApp’s biggest challenge will be demonstrating the additional value they bring to the table, and the other ways in which they can help enterprises succeed.

NetApp, like some of the other Tier 1 storage vendors, has a broad portfolio of products at its disposal. The Data Fabric play is a big bet on being able to tie this all together in a way that their competitors haven’t managed to do yet. Ultimately, the success of this strategy will rely on NetApp’s ability to listen to customers and continue to meet their needs. As a few companies have found out the hard way, it doesn’t matter how cool you think your idea is, or how technically innovative it is, if you’re not delivering results for the business you’re going to struggle to gain traction in the market. At this stage I think NetApp are in a good place, and hopefully they can stay there by continuing to listen to their existing (and potentially new) customers.

For an alternative perspective, I recommend reading Chin-Fah’s thoughts from Storage Field Day 15 here.

Data Virtualisation is More Than Just Migration for Primary Data

Disclaimer: I recently attended Storage Field Day 10.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

[image: Primary Data logo]

Before I get started, you can find a link to my raw notes on Primary Data’s presentation here. You can also see videos of the presentation here. I’ve seen Primary Data present at SFD7 and SFD8, and I’ve typically been impressed with their approach to Software-Defined Storage (SDS) and data virtualisation generally. And I’m also quite a fan of David Flynn‘s whiteboarding chops.

[image: David Flynn presenting at Storage Field Day 10]

 

Data Virtualisation is More Than Just Migration

Primary Data spent some time during their presentation at SFD10 talking about Data Migration vs Data Mobility.


[image courtesy of Primary Data]

Data migration can be a real pain to manage. It’s quite often a manual process and is invariably tied to the capabilities of the underlying storage platform hosting the data. The cool thing about Primary Data’s solution is that it offers dynamic data mobility, aligning “data’s needs (objectives) with storage capabilities (service levels) through automated mobility, arbitrated by economic value and reported as compliance”. Sounds like a mouthful, but it’s a nice way of defining pretty much what everyone’s been trying to achieve with storage virtualisation solutions for the last decade or longer.

What I like about this approach is that it’s data-centric, rather than focused on the storage platform. Primary Data supports “anything that can be presented to Linux as a block device”, so the options to deploy this stuff are fairly broad. Once you’ve presented your data to DSX, there are some smart service level objectives (SLOs) that can be applied to the data. These can be broken down into the categories of protection, performance, and price/penalty:

Protection

  • Durability
  • Availability
  • Recoverability
  • Security
  • Priority
  • Sovereignty

Performance

  • IOPS / Bandwidth / Latency – Read / Write
  • Sustained / Burst

Price / Penalty

  • Per File
  • Per Byte
  • Per Operation

Access Control can also be applied to your data. With Primary Data, “[e]very storage container is a landlord with floorspace to lease and utilities available (capacity and performance)”.
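
To make the SLO idea a bit more concrete, here’s a toy sketch of matching a dataset’s objectives against storage service levels. The field names and the matching logic are my own invention for illustration, not Primary Data’s actual object model.

```python
from dataclasses import dataclass

# Toy model of matching a dataset's objectives against storage service levels.
# Field names and matching logic are invented for illustration; this is not
# Primary Data's actual object model.

@dataclass
class Objectives:
    min_iops: int
    max_latency_ms: float
    min_durability_nines: int
    max_cost_per_gb: float

@dataclass
class StorageTier:
    name: str
    iops: int
    latency_ms: float
    durability_nines: int
    cost_per_gb: float

    def meets(self, slo: Objectives) -> bool:
        return (self.iops >= slo.min_iops
                and self.latency_ms <= slo.max_latency_ms
                and self.durability_nines >= slo.min_durability_nines
                and self.cost_per_gb <= slo.max_cost_per_gb)

def place(slo: Objectives, tiers: list[StorageTier]) -> StorageTier | None:
    # Pick the cheapest tier that satisfies every objective (the "economic value" bit).
    candidates = [t for t in tiers if t.meets(slo)]
    return min(candidates, key=lambda t: t.cost_per_gb) if candidates else None

tiers = [
    StorageTier("all-flash", iops=500_000, latency_ms=0.5, durability_nines=6, cost_per_gb=0.80),
    StorageTier("hybrid", iops=50_000, latency_ms=5.0, durability_nines=6, cost_per_gb=0.20),
    StorageTier("object-archive", iops=500, latency_ms=100.0, durability_nines=11, cost_per_gb=0.02),
]

hot_database = Objectives(min_iops=100_000, max_latency_ms=1.0, min_durability_nines=5, max_cost_per_gb=1.00)
cold_archive = Objectives(min_iops=100, max_latency_ms=500.0, min_durability_nines=11, max_cost_per_gb=0.05)

print(place(hot_database, tiers).name)  # all-flash
print(place(cold_archive, tiers).name)  # object-archive
```

The interesting part of the real platform is that this sort of evaluation isn’t a one-off placement decision; it’s what drives the automated mobility described above.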

 

Further Reading and Final Thoughts

I like the approach to data virtualisation that Primary Data have taken. There are a number of tools on the market that claim to fully virtualise storage and offer mobility across platforms. Some of them do it well, and some focus more on the benefits provided around ease of migration from one platform to another.

That said, there’s certainly some disagreement in the market place on whether Primary Data could be considered a fully-fledged SDS solution. Be that as it may, I really like the focus on data, rather than silos of storage. I’m also a big fan of applying SLOs to data, particularly when it can be automated to improve the overall performance of the solution and make the data more accessible and, ultimately, more valuable.

Primary Data has a bunch of use cases that extend beyond data mobility as well, with deployment options including hyperconverged infrastructure, software-defined NAS, and clustering across existing storage platforms. Primary Data want to “do for storage what VMware did for compute”. I think the approach they’ve taken has certainly gotten them on the right track, and the platform has matured greatly in the last few years.

If you’re after some alternative (and better thought out) posts on Primary Data, you can read Jon’s post here. Max also did a good write-up here, while Chris M. Evans did a nice preview post on Primary Data that you can find here.

CrashPlan – Backup Adoption is the Killer App

I’ve been happily using CrashPlan for about a year now, after publicly breaking up with MozyHome, and sleeping around on Backblaze. I’ve signed up for another 3 years or so, so I’m fairly committed at this point. I’m a big fan of the Aussie DC presence and ability to use a local seed drive. The client itself is easy to use, and the pricing has been reasonable in my experience. But enough about my opinions.

I had a weird problem the other day on my main iMac where it looked like I had to re-seed all of my data. I’d had this problem before with MozyHome (link), but with a smaller set of data, so wasn’t too keen to re-upload over 900GB again.

[image: CrashPlan]

So I logged a case with support. A very nice gentleman named Daniel R got in contact with me and got me to send through some logs. I hadn’t realised I could clicky on the CrashPlan icon in the main window to open up a console. That’s kind of neat.

[screenshot: the CrashPlan console window]

I sent through the logs and Daniel got back in touch to have me modify my settings.xml file. No dice though. He then got back to me to advise that my archive was in a “maintenance queue” and he’d removed it from that queue and advised me to restart everything and see how it went. I’m fascinated by what the “maintenance queue” might be and how my archive ended up there.

Still no go, so he had me do a full uninstall (I think with prejudice) and re-install. The instructions for this process can be found here. For a complete uninstall, the following steps need to be done (on Mac OS X):

  1. Open the Finder
  2. Press Command-Shift-G and paste /Library/Application Support/CrashPlan/Uninstall.app into the dialog
  3. Double-click Uninstall
  4. Follow the prompts to complete the uninstall process
  5. Remove the following directory from your system (custom installation, as user): ~/Library/Application Support/CrashPlan

 

Once I’d re-installed everything, I could log back in with my normal credentials, and “adopt” the backup sitting in the Code42 DC that was assigned to my iMac. Simple as that. And then all I had to do was synchronize the changes. Seriously awesome, and so simple. No data loss, and smiles all round. And resolved in about 52 hours (including about 12 hours of them waiting for me to send logs through). And hence the title of the blog post. The ability to re-attach / adopt backups with new / replacement / freshly re-installed machines is a really cool feature that no doubt is saving people a fair bit of angst. It’s also not a feature you really think about until you actually need it.

So I’d like to publicly thank Daniel R at Code42 Support for his promptness and courtesy when dealing with me, as well as his ability to actually, well, resolve the issue.

CrashPlan – Initial thoughts and “feelings”

[Disclaimer: CrashPlan in AU provided me with a free 12-month Family subscription and use of a seed drive. This isn’t a paid review but clearly I’ve benefitted.]

So, a short time after my post on Backblaze and Mozy and why I was going for the cheapest (but not necessarily nastiest) personal cloud backup solution, the Australian arm of CrashPlan got in touch and offered to help get me started with them. So I thought I’d do a post to cover off on some initial thoughts and feelings and provide some public feedback on how it went. Just a reminder, every product is different, and every user’s circumstances are different, so don’t complain to me if you find that CrashPlan isn’t for you. Additionally, I hope you appreciate just how hard it is to take photos that look this bad.

So, the killer feature that CrashPlan offers for me, and residents of the US, is seeded backup. You can read more about how that works here. This was one of my complaints with Backblaze – I couldn’t get all of the data I wanted to up to the provider due to the extraordinarily shitty ADSL1 connection at my house. So gigabytes of home movies and other media were, beyond Time Machine backups, at risk. So, Adrian Johnson from Code42 offered me the use of a seeded backup drive, and I must say it’s been a really smooth experience. Again, here’re the rough steps, but you can look it up for yourself:

  • Support contacts me to confirm my details;
  • Courier arrives with hard drive;
  • I attach hard drive to computer and add it as a destination;
  • I backup my stuff to hard drive;
  • I box up hard drive and send by pre-paid courier back to CrashPlan;
  • They contact me when they receive it;
  • They contact me when seed data is uploaded at their end;
  • I restart cloudy backup. Everything is pretty much there, barring a few new files from iPhoto; and
  • Profit.

It was pretty much that simple. So, here are some pictures to fill in the space where I should be offering thoughts. Firstly, I was mildly panicked when I saw that the drive was formatted as FAT32. It seemed like that would just suck as a transfer mechanism, especially for large files.

[screenshot: the seed drive formatted as FAT32]

And at the start of the process, it certainly looked like it was going to take some time.

[screenshot: the initial transfer time estimate]

But the key thing with this service is compatibility. It is compatible with Mac OS X, Windows, Linux and Dots OS (?).

[photo: the seed drive compatibility details]

I also found that by fiddling with some of the power saving settings on my Mac I was able to get the transfer speeds up to a more reasonable level. Also, like most backup products, lots of small files will choke the I/O, whereas big DV files go through at a healthy clip. Note also that this isn’t a straight file transfer. The data is being de-duped, compressed and encrypted. So, you know, that can take some time. Particularly on an 850GB backup set.
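
I don’t know the internals of CrashPlan’s engine, but as a rough sketch of why this isn’t a straight file copy, here’s what chunk-level dedupe plus compression can look like. The chunk size, data and overall approach are arbitrary, and a real client would also encrypt each chunk after this step.

```python
import hashlib
import zlib

# Rough sketch of chunk-level dedupe plus compression, the sort of processing a
# backup client does before anything hits the wire. Chunk size and data are
# arbitrary; a real engine would also encrypt each chunk after this step.
CHUNK_SIZE = 4 * 1024 * 1024  # 4MB chunks

seen_chunks = {}  # chunk hash -> compressed bytes already "stored"

def backup_bytes(data: bytes) -> int:
    """Return roughly how many bytes would actually need to be uploaded."""
    sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in seen_chunks:
            continue  # chunk already stored: only a reference needs to go up
        compressed = zlib.compress(chunk)
        seen_chunks[digest] = compressed
        sent += len(compressed)
    return sent

# A big, repetitive DV-style file compresses down to very little...
big_file = b"frame" * 2_000_000  # ~10MB of highly repetitive data
print(backup_bytes(big_file))    # a tiny fraction of the ~10MB input
print(backup_bytes(big_file))    # 0 - second pass, every chunk is already stored

# ...whereas lots of small, unique files each still have to be hashed,
# compressed and sent individually.
small_files = [(f"photo-{i}-".encode() * 10) for i in range(5_000)]
print(sum(backup_bytes(f) for f in small_files))
```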

So what’s in the box? You get:

  • Instructions;
  • A LaCie rugged drive (1TB);
  • A USB3 cable; and
  • A pre-paid courier satchel to send it back in.

I took some photos, to make me look more like a tech journo.

[photos: the CrashPlan seed drive kit]

And, then, magically, a little over 2 weeks after the drive arrived, I have 850GB of my data in the cloud. Almost like magic.

[screenshot: 850GB backed up to the cloud]

There are a few other things you can do with CrashPlan but I’ll look to cover those off in the next post. Because I’m tired now. In short, the NAS compatibility is cool (if you’re a QNAP owner – check this post out), as is the ability to send data to your friends.

So, I’ll wrap up with some of what I thought were good things about the product. Firstly, I can pay in Australian dollars. This may not seem like a big thing, as we’ve had parity with the US for a while, but recently the dollar has dipped to 85 cents. So, on a USD$50 subscription, I end up paying, after conversion (50 / 0.85 ≈ 59) and card fees, around AUD$60. Which isn’t that big a deal, but it’s enough to make me pause. Secondly, the access to local support and a seed drive service is fricking awesome. And support have been helpful and informative every step of the way. Thirdly, CrashPlan pricing, for unlimited storage, is pretty competitive. Here’s a link to the Australian offering. Whether they can sustain that pricing remains to be seen. As an aside, I often wonder what Mozy’s pricing would have been like if they hadn’t been bought by EMC. But that may have had nothing to do with it.

So, in short, I’ve been really happy with my CrashPlan experience thus far, and am looking forward to doing some more stuff with it. I still won’t hesitate to recommend Backblaze to people, if it seems like a good fit for them, but I’m having a hard time arguing against a local presence and the somewhat parochial comfort that that provides. Thanks again to Adrian Johnson and the team at Code42 support for making this a really simple and effective exercise.

 

 

OT – Mozy, Backblaze and my race to the bottom …

Welcome back. I know it’s been a while, so I thought I’d try something different and do more of a thinky thing about my personal use of cloud backup. Strap yourselves in, because I don’t usually give my opinion on things, so this might just get really wild. Or not.

[Disclaimer: Backblaze haven’t paid for nor asked for my opinion. And Mozy have done nothing particularly heinous either. This is just my experience and opinion. What works for me mightn’t work for you.]

I’m about 2GB away from backing up the last 30GB of my holiday photos from Europe. As such, it seems like a perfect time to announce to my three loyal readers that I’ve switched my home cloud backup product from MozyHome to Backblaze. I’ve been running MozyHome on my Mac since 2009, and was generally happy with the performance and the product. It did some weird things at times, but Mozy support were generally pretty helpful, particularly when I took to my blog to rant about them. This is a good example of their support staff going beyond the call of duty. I even felt okay about their price structure change, although I don’t think it was very well handled with existing customers. In the meantime, I’d been looking at various home-brew NAS solutions and came across the Backblaze storage pod stuff (version 2 and 3 designs are here too). I’m no fan of them hippy startups, but there was something about Backblaze that got me interested. Not that my perpetually tolerant family would really put up with me building a storage pod for home use, but I liked that I could access the plans if I wanted to. So I kept researching, and tried out the client. And looked at the price.

And there you have it, my personal race to the bottom. I am the reason we have so much crap stuff in the world. I am the consumer who wants fast and quality for cheap. And that’s what I get with Backblaze. And it’s what I had for a while with MozyHome. And I imagine (without any evidence to back it up) that I would have had it with MozyHome to this day if Decho weren’t swallowed up by EMC. But here’s the hilarious thing: I’m on an ADSL1 internet connection. And I get about 300Kbps upload. If I’m lucky. And if nothing else is happening between my house and the exchange. Let me just clarify that it takes quite some time to get 220GB “to the cloud” when you have that kind of connection. Hell, I had a 13Mbps/13Mbps symmetrical connection at my hotel in Korea on my way back from Europe. So here’s where I get thinky. Firstly, major tech companies doing “cloud” backup aren’t necessarily thinking about suburbanites in Australia when they’re talking about what their products can do. And that’s okay, because they’re going to make a lot more money off the enterprise than they will off me. But am I in the minority? Is everyone else sitting on fat connections to the internet? Or are they just not pushing as much data up there? I mean, I haven’t even considered sending my home videos to the cloud yet. That’s another few hundred GB. My friend has access to the NBN – maybe I could take my computer to her house and just let it seed the data for a week (month?) or so? Maybe I wouldn’t have this problem if I didn’t have a family and an insatiable desire to keep every photo I ever took of my kids?
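
For the curious, the back-of-the-envelope maths on that is pretty grim (assuming a perfectly sustained 300Kbps and ignoring overhead, dedupe and compression):

```python
# Back-of-the-envelope: how long does 220GB take at ~300Kbps upload?
# Assumes a perfectly sustained link and ignores protocol overhead,
# dedupe and compression, so treat it as a rough upper bound on the pain.
data_gb = 220
upload_kbps = 300

data_bits = data_gb * 1000**3 * 8            # decimal GB to bits
seconds = data_bits / (upload_kbps * 1000)   # Kbps to bits per second
print(f"{seconds / 86400:.0f} days")         # roughly 68 days
```

Roughly 68 days of continuous uploading, assuming nothing else wants the link. Hence the enhancement request below.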

In any case, here’s my enhancement request for Backblaze. Let me send you a hard drive of my stuff to manually seed in your data centre. I’ll pay the shipping to the US. I’ll even fill out the stupid forms and show my ID. You can keep the drive. In the same way you offer a recovery service where I can order a hard drive of my data from you, let me do the reverse. Please. Pretty please. Because some of us don’t have fat pipes but we still have data we want to protect.

Okay, maybe it wasn’t as thinky as we’d all hoped. I should probably also point out that my race to the bottom is on price, not quality.