Come And Splash Around In NetApp’s Data Lake

Disclaimer: I recently attended Storage Field Day 15.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

NetApp recently presented at Storage Field Day 15. You can see videos of their presentation here, and download my rough notes from here.

 

You Say Day-ta, I Say Dar-ta

Santosh Rao (Senior Technical Director, Workloads and Ecosystems) took us through some of the early big data platform challenges NetApp are looking to address.

 

Early Generation Big Data Analytics Platform

These were designed to deliver initial analytics solutions and were:

  • Implemented as proofs of concept; and
  • Built to solve a point project need.

The primary considerations of these solutions were usually cost and agility:

  • The focus was to limit up-front costs and get the system operational quickly; and
  • Scalability, availability, and governance were afterthoughts.

A typical approach was to use cloud or commodity infrastructure, and this proof-of-concept build ended up becoming the final architecture. The problem with this approach, according to NetApp, is that it led to unpredictable behaviour as copies manifested. You’d end up with 3-5 replicas of data copied across lines of business and various functions. Not a great situation.
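To make the copy-sprawl point concrete, here’s a back-of-the-envelope sketch (my numbers, not NetApp’s) of what 3-5 downstream replicas do to the total storage footprint of an assumed data lake:

```python
# Illustrative only: the capacity cost of copy sprawl when each line of
# business keeps its own replica of the lake. The lake size is assumed.
lake_tb = 500                       # hypothetical source data lake, in TB
replica_counts = [3, 4, 5]          # the 3-5 replica range quoted above

# Total footprint = the original lake plus each replica.
footprints_tb = [lake_tb * (1 + r) for r in replica_counts]
# A 500 TB lake balloons to 2,000-3,000 TB once the replicas are counted.
```

Even before you consider the governance headache of keeping those copies in sync, the raw capacity bill alone makes the case for in-place access and copy avoidance.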

 

Early Generation Analytics Platform Challenges

Other challenges with this architecture included:

  • Unpredictable performance;
  • Inefficient storage utilisation;
  • Media and node failures;
  • Total cost of ownership;
  • Not enterprise ready; and
  • Storage and compute tied (creating imbalance).

 

Next Generation Data Pipeline

So what do we really need from a data pipeline? According to NetApp, the key is “Unified Insights across LoBs and Functions”. By this they mean:

  • A unified enterprise data lake;
  • Federated data sources across the 2nd and 3rd platforms;
  • In-place access to the data pipeline (copy avoidance);
  • Spanned across edge, core and cloud; and
  • Future proofed to allow shifts in architecture.

Another key consideration is the deployment. The first proof of concept is performed by the business unit, but it needs to scale for production use.

  • Scale edge, core and cloud as a single pipeline
  • Predictable availability
  • Governance, data protection, security on data pipeline

This provides for a lower TCO over the life of the solution.

 

Data Pipeline Requirements

We’re not just playing in the core any more, or exclusively in the cloud. This stuff is everywhere. And everywhere you look the requirements differ as well.

Edge

  • Massive data (few TB/device/day)
  • Real-time Edge Analytics / AI
  • Ultra Low Latency
  • Network Bandwidth
  • Smart Data Movement

Core

  • Ultra-high I/O bandwidth (20-200+ GBps)
  • Ultra-low latency (micro- to nanosecond)
  • Linear scale (1-128 node AI)
  • Overall TCO for 1-100+ PB

Cloud

  • Cloud analytics, AI/DL/ML
  • Consume and not operate
  • Cloud vendor vs on-premises stack
  • Cost-effective archive
  • Need to avoid cloud lock-in

Here’s a picture of what the data pipeline looks like for NetApp.

[Image courtesy of NetApp]

 

NetApp provided the following overview of what the data pipeline looks like for AI / Deep Learning environments. You can read more about that here.

[Image courtesy of NetApp]

 

What Does It All Mean?

NetApp have a lot of tools at their disposal, and a comprehensive vision for meeting the requirements of big data, AI and deep learning workloads from a number of different angles. It’s not just about performance, it’s about understanding where the data needs to be to be considered useful to the business. I think there’s a good story to tell here with NetApp’s Data Fabric, but it felt a little like there remains some integration work to do. Big data, AI and deep learning mean different things to different people, and there’s sometimes a reluctance to change the way people do things for the sake of adopting a new product. NetApp’s biggest challenge will be demonstrating the additional value they bring to the table, and the other ways in which they can help enterprises succeed.

NetApp, like some of the other Tier 1 storage vendors, has a broad portfolio of products at its disposal. The Data Fabric play is a big bet on being able to tie this all together in a way that their competitors haven’t managed to do yet. Ultimately, the success of this strategy will rely on NetApp’s ability to listen to customers and continue to meet their needs. As a few companies have found out the hard way, it doesn’t matter how cool you think your idea is, or how technically innovative it is, if you’re not delivering results for the business you’re going to struggle to gain traction in the market. At this stage I think NetApp are in a good place, and hopefully they can stay there by continuing to listen to their existing (and potentially new) customers.

For an alternative perspective, I recommend reading Chin-Fah’s thoughts from Storage Field Day 15 here.

NetApp Aren’t Just a Pretty FAS

Disclaimer: I recently attended Storage Field Day 12.  My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

 

Here are some notes from NetApp’s presentation at Storage Field Day 12. You can view the video here and download my rough notes here. I made a joke during the presentation about Dave Hitz being lucky enough to sit next to me, but he’s the smart guy in this equation.

While I’ve not had an awful lot to do with NetApp previously, it’s not often I get to meet guys like Dave in real life. As such I found the NetApp presentation to be a tremendous experience. But enough about stars in my eyes. Arthur Lent spent some time covering off two technologies that I found intriguing: SnapCenter and Cloud Control for Microsoft Office 365.

[image courtesy of Tech Field Day]

 

SnapCenter Overview

SnapCenter is a key part of NetApp’s data protection strategy. You can read about this here. Here’s an overview on what was delivered with version 1.0.

End-to-end Data Protection

  • Simple, scalable, single interfaces to protect enterprise data (physical and virtualised) across the data fabric;
  • Meets SLAs easily by leveraging NTAP technologies;
  • Replaces traditional tape infrastructure with backup to the cloud; and
  • Extensible using user-created custom plug-ins.

 

Efficient In-place Copy Data Management

  • Leverages your existing NTAP storage infrastructure;
  • Provides visibility of copies across the data fabric; and
  • Enables reuse of copies for test/dev, DR, and analytics.

 

Accelerated application development

  • Transforms traditional IT to be more agile
  • Empowers application and database admins to self-serve
  • Enables DevOps and data lifecycle management for faster time to market

Sounds pretty good? There’s more though …

 

New with SnapCenter Version 2.0

  • End-to-end data protection for NAS file services from flash to disk to cloud (public or private);
  • Flexible, cost-effective tape replacement solution;
  • Integrated file catalog for simplified file search and recovery across the hybrid cloud; and
  • Automated protection relationship management and pre-canned backup policies reduce management overhead.

SnapCenter now supports custom plug-ins, letting you create and use your own for applications that don’t have a native one. There are two community plug-ins available at release. Why use plug-ins?

  • Some mission-critical applications or DBs are difficult to back up;
  • Custom plug-ins offer a way to consistently back up almost anything;
  • Write the plug-in once and distribute it to multiple hosts through SnapCenter;
  • You get all the SnapCenter benefits; and
  • A plug-in only has the capabilities written into it.
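To illustrate the idea (this is a generic sketch of the quiesce/snapshot/unquiesce pattern that application-consistent backup plug-ins typically follow, with hypothetical names, not the actual SnapCenter plug-in API):

```python
# Hypothetical sketch: an application-consistent backup plug-in. The
# framework supplies the snapshot call; the plug-in only brackets it
# with application-specific quiesce/unquiesce logic.

class CustomAppPlugin:
    """Minimal plug-in for an imaginary application (illustrative only)."""

    def __init__(self, app_name):
        self.app_name = app_name
        self.log = []  # records what the plug-in did, in order

    def quiesce(self):
        # Pause writes and flush buffers so the snapshot is consistent.
        self.log.append(f"quiesce {self.app_name}")

    def unquiesce(self):
        # Resume normal application I/O.
        self.log.append(f"unquiesce {self.app_name}")

    def backup(self, take_snapshot):
        # Quiesce, snapshot, then always unquiesce, even on failure.
        self.quiesce()
        try:
            snapshot_id = take_snapshot()
        finally:
            self.unquiesce()
        self.log.append(f"snapshot {snapshot_id}")
        return snapshot_id


plugin = CustomAppPlugin("legacy-db")
snap = plugin.backup(lambda: "snap-0001")  # returns "snap-0001"
```

The key point from the list above is the last one: the plug-in only does what you write into it, so a consistent backup of a niche database is exactly as good (or as bad) as the quiesce logic you supply.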

 

Cloud Control for Microsoft Office 365

NetApp advised that this product would be “Available Soon”. I don’t know when that is, but you can read more about it here. NetApp says it offers a “[h]ighly scalable, multi-tenant SaaS offering for data protection, security, and compliance”. In short, it:

  • Is a SaaS offering to provide backup for Office 365 data: Exchange Online, SharePoint Online, OneDrive for Business;
  • Is an automated and simplified way to back up copies of customers’ critical data;
  • Provides flexibility – select your deployment model, archiving length, backup window;
  • Delivers search-and-browse features as well as granular recovery capabilities to find and restore lost data; and
  • Provides off-boarding capability to migrate users (mailboxes, files, folders) and site collections to on-premises.

 

Use Cases

  • Retain control of sensitive data as you move users, folders, mailboxes to O365;
  • Enable business continuity with fault-tolerant data protection;
  • Store data securely on NetApp at non-MS locations; and
  • Meet regulatory compliance with cloud-ready services.

 

Conclusion and Further Reading

In my opinion, the improvements in SnapCenter 2.0 demonstrate NetApp’s focus on improving some key elements of the offering, with the ability to use custom plug-ins being an awesome feature. I’m even more excited by Cloud Control for Office 365, simply because I’ve lost count of the number of enterprises that have shoved their email services up there (“low-hanging fruit” for cloud migration) and haven’t even considered how the hell they’re going to protect or retain the data in a useful way (“Doesn’t Microsoft do that for me?”). The number of times people have simply overlooked some of the regulatory requirements on corporate email services is troubling, to say the least. If you’re an existing or potential NetApp customer this kind of product is something you should be investigating post haste.

Of course, I’ve barely begun to skim the surface of NetApp’s Data Fabric offering. As a relative newcomer, I’m looking forward to diving into this further in the near future. If you’re thinking of doing the same, I recommend you check out this white paper on NetApp Data Fabric Architecture Fundamentals for a great overview of what NetApp are doing in this space.