ClearSky Data Are Here To Help

Disclaimer: I recently attended VMworld 2016 – US.  My flights were paid for by myself, VMware provided me with a free pass to the conference and various bits of swag, and Tech Field Day picked up my hotel costs. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event.  Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.



ClearSky Data presented recently at Tech Field Day Extra VMworld US 2016. You can see video from the presentation here. My rough notes on the session are here.



Lazarus Vekiarides, CTO and Co-founder, took us through an overview. “ClearSky’s Global Storage Network delivers enterprise storage, spanning the entire data lifecycle, as a fully-managed service”. Sounds good. I like when people talk about lifecycles, and fully managed. These things are hard to do though.

ClearSky are aiming to provide “the performance and availability of on-premises storage with the economics and scale of the cloud”. They do this with:

  • economics
  • scalability
  • reliability
  • security
  • performance

According to ClearSky, we’ve previously used a “Fragmented Hybrid” model when it comes to cloud storage.


I must have been watching too much Better Off Ted with my eldest daughter, but when I heard of the Global Storage Network, it sounded a lot like something from a Veridian Dynamics advertisement. It’s not though, it’s cooler than that. With the Global Storage Network, ClearSky brings it all together.


You can read a whitepaper from ClearSky here, and there’s a data sheet here.


These Pictures are Compelling, But What Is It?

ClearSky say they are changing how enterprises access data:

  • eliminate storage silos
  • pay only for what you use – up to 100% useable storage only
  • guaranteed 100% uptime
  • multi-site data access without replication
  • maximum of 30 minute response time for Sev 1 and 2 tickets


This is all delivered via a consumption-based model. The idea behind this is that you get charged for only the capacity you use, but your applications have all the performance they need. Like all good consumption models, if you delete data, you give the space back to ClearSky and are no longer billed for any of it.

“Customers simply plug into the ClearSky service to get the storage they need, when and where they need it, with the security, scalability and resilience that a business depends on.”


I’m Still Not Sure

That’s because I’m bad at explaining things. There’s an edge appliance (2RU appliance / 24 slots – about 6TB of flash cache) that is used. Cache is available (on resilient storage), but not copied. ClearSky POPs then offer distributed and optimised storage, with multiple copies to the cloud. Maybe a picture will explain it a bit better.


With this architecture, ClearSky manages the entire data lifecycle. Active data lives either next to your applications, or in the metro area near your applications. Any cold data, backup and DR stuff is stored as multiple copies of data geographically dispersed in the network.

There’s support for iSCSI or FC today. The write-back cache is processed every 10 minutes and pushed to the metro cache or cloud.
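To make the write-back idea a bit more concrete, here’s a rough Python sketch of a cache that acknowledges writes locally and pushes dirty blocks upstream on a 10 minute cycle. This is not ClearSky’s implementation. The class name, the dictionaries standing in for the edge cache and the metro cache / cloud, and the explicit "now" argument are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field

FLUSH_INTERVAL_SECONDS = 10 * 60  # the 10 minute cycle mentioned above


@dataclass
class WriteBackCache:
    """Toy write-back cache (hypothetical, for illustration only)."""
    backend: dict = field(default_factory=dict)  # stands in for metro cache / cloud
    cache: dict = field(default_factory=dict)    # stands in for the edge flash cache
    dirty: set = field(default_factory=set)      # blocks not yet pushed upstream
    last_flush: float = 0.0                      # time of the last flush, in seconds

    def write(self, block_id, data):
        # The write is complete as soon as it lands in the local cache.
        self.cache[block_id] = data
        self.dirty.add(block_id)

    def read(self, block_id):
        # Serve from the cache when possible, otherwise fetch and keep a copy.
        if block_id not in self.cache:
            self.cache[block_id] = self.backend[block_id]
        return self.cache[block_id]

    def maybe_flush(self, now):
        # Push dirty blocks upstream only once the interval has elapsed.
        if now - self.last_flush < FLUSH_INTERVAL_SECONDS:
            return 0
        flushed = len(self.dirty)
        for block_id in self.dirty:
            self.backend[block_id] = self.cache[block_id]
        self.dirty.clear()
        self.last_flush = now
        return flushed
```

The point of the pattern is that applications see local flash latency on writes, while the slower push to the next tier happens in the background on a timer.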


What Do I Use It For?

Data in the ClearSky network can be accessed from multiple locations without replication, offering mobility and availability.

Multi-site availability

  • Load balancing and disaster recovery

Workload mobility

  • In-metro and cross-metro
  • Application data can be accessed from other metros

And you can use it in all the ways you think you would, including DR, DC migration, and load balancing.


Make it Splunky

You probably know that companies use Splunk to analyse machine data. I’ve used it at home to munge squid logs when trying to track my daughter’s internet use. Splunk captures, indexes and correlates machine data in a searchable repository from which it can generate graphs, reports, alerts, and visualisations. Splunk demands high performance and agile storage, and ClearSky have some experience with this. There’s also a Splunk Reference Architecture. ClearSky say they’re a good fit for Splunk Enterprise. The indexers simply write to the ClearSky Edge Cache, and ClearSky manages index migration through the cache and storage layers, which greatly simplifies the solution. They also offer “[h]ighly consistent ingest performance, cloud capacity, and integrated backup using ClearSky snapshot technology”.



This was the first time I’d encountered ClearSky Data, and I liked the sound of a lot of what I heard. They make some big claims on performance, but the architecture seems to support these, at least on the face of it. I’m a fan of people who are into fully-managed data lifecycles. I hope to have the opportunity to dig further into this technology at some stage to see if they’re the real deal. People use caching solutions because they have the ability to greatly improve the perceived (and actual) performance of infrastructure. And managed services are certainly popular with enterprises looking at alternatives to their current, asset-heavy, models of storage consumption. If ClearSky can do everything it says it can, they are worth looking into further.

Scality’s RING has a lot going on




Here are my notes from Scality’s presentation at Tech Field Day Extra VMworld US 2016 Edition. You can get a rough copy here. You can also view videos of the Scality presentation here.


The Ring?

Like the movie? No. The RING. The Scality RING is object-based software-defined storage for the cloud. It runs on standard x86 servers to create a giant pool of storage.


[image via Scality website]

It also protects the data and provides 100% reliable, high performance access for any capacity-driven applications. While it can run on any x86 hardware, it was pointed out that “[s]ome servers are better than others”.

Customers are telling Scality that:

  • The “cloudification” of enterprise IT is accelerating
  • Enterprise wants “multiple clouds”
  • Object is the best for large capacity storage, and S3 is the standard API
  • Files are an integral part of enterprise IT
  • DevOps influences infrastructure choices

Scality have 116 customers so far, spread across the globe (50% North America, 35% EMEA, 15% APAC). Scality are big on hardware alliances (being a software play, this makes sense), and have agreements in place with HPE, Dell, and Cisco.


(The) RING 6.0 – A better sequel than we’d hoped for

Paul Speciale, VP of Products at Scality, took us through some of the features of RING 6.0.


The focus for Scality with 6.0 has been on

  • “Enterprization” – I’m not sure it’s a real word, but I do like the connotation
  • S3 Connector – Enterprise Deployments
  • Easy deployment model
  • Secure multi-tenancy and data at rest
  • Directory services federation
  • Utilisation reporting and management


Easy Deployment Model

  • All services deployed uniformly as Docker containers
  • Full scale-out: Any S3 request can be handled by any S3 Connector (“any-to-any”), standard IP load balancing and failover

Vault Service

  • Implements IAM Multi-tenancy with Accounts, Users, Groups, Roles, Access Key/Secret Key pairs
  • IAM REST compatible, managed via AWS CLI
  • Can be federated with Active Directory over ADFS/SAML 2.0

Metadata Service

  • S3 optimised service: fast, available, scale-out
  • Integral in RING layer – leveraged for Bucket & Vault metadata


Comprehensive IAM multi-tenancy and encryption

AWS Identity and Access Management (IAM)

  • S3 Connector implements all IAM multi-tenancy concepts: Accounts, Keys, Users, Groups, Roles
  • IAM policies for highly granular access control
  • AWS compatible: Management of IAM entities (Users, Groups) via standard AWS CLI and JSON policy language
  • Secure authentication via AWS Signature v4 and v2 HMAC schemes
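For anyone wondering what an “HMAC scheme” looks like in practice, here’s a minimal Python sketch of the AWS Signature Version 4 signing key derivation, which chains HMAC-SHA256 over the date, region and service. It uses only the standard library. The function names are mine, and the credential values you pass in are placeholders; I haven’t looked at how the S3 Connector implements this internally, so treat it as an illustration of the public AWS scheme rather than Scality’s code.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    # One HMAC-SHA256 step, returning raw bytes for use as the next key.
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # Signature v4 derives a scoped key: secret -> date -> region -> service.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    # The final signature is the hex HMAC of the "string to sign".
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the key is scoped to a date, region and service, a leaked signing key is only useful for a narrow window, which is a big part of why v4 is preferred over v2.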

Bucket-level Encryption

  • Per-bucket encryption-at-rest of object data (specified through header on Bucket PUT)
  • Encryption via AES 256-bit OpenSSL libraries
  • Integrates with customer-provided Key Management Service (KMS) via KMIP 1.1 API
  • KMS is invoked on PUT and GET operations



Federated Access SSO to S3

  • Requires a SAML 2.0 compatible IdP
  • IdP provides mapping from Enterprise Directory Server (AD)
  • Vault enables SSO via SAML assertion


S3 Utilization Reporting and Management

Stats and management framework

  • Real-time and historical statistics and metrics collected in scalable repository

Published RESTful APIs for monitoring and management

  • S3 Connector publishes key utilisation metrics (capacity, bandwidth and operations) at four levels of granularity
  • REST APIs for custom tool integrations

Management tools

  • User and Group management via standard AWS commands (CLI) and REST API
  • Integrated tools for graphing, metrics, log visualisation and search: Elasticsearch and Kibana, Grafana, Redis.


S3 Metadata – the scale-out engine of the connector

Metadata Service

  • Purpose-built for availability, resiliency, scale-out and fast performance for requirements of S3 operations
  • Key/value store replicated on SSDs (one per server)
  • Additional copy maintained as diff backup in RING for DR

The hard part: Distributed Consensus Algorithm

  • Leader with dynamic election and management of consistency (modified Raft protocol)
  • Can be distributed across DCs to enable multi-geo operations
  • By default, strict consistency rules enforced

High-availability and Performance

  • The cluster consists of multiple servers – odd number to provide majority quorum (5, 7 or 9)
  • As long as the majority (quorum) of servers is available, the service and Bucket remain available
  • Restarts failed servers with automated resynchronization
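The quorum arithmetic is worth spelling out, because it explains why the cluster sizes are odd. Here’s a small Python sketch of the general majority rule; the function names are my own, and this is the textbook calculation rather than anything taken from Scality’s code.

```python
def quorum_size(servers: int) -> int:
    # A majority is more than half of the servers.
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    # How many servers can be lost while a majority is still reachable.
    return servers - quorum_size(servers)


def is_available(servers: int, healthy: int) -> bool:
    # The service stays available while the healthy servers form a majority.
    return healthy >= quorum_size(servers)


if __name__ == "__main__":
    for n in (5, 7, 9):
        print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n), "failures")
```

An even-sized cluster buys you nothing extra: six servers need a quorum of four, so they tolerate two failures, the same as five servers.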


S3 Connector Scale-out at all levels


S3 as the best On-ramp to Object Storage


  • Developers can install and develop S3-based apps locally
  • Enterprises can host small, local object storage systems in production
  • Enterprise can host a local test/dev environment to learn about object storage


Scality Open Source S3 Server

S3 API Compatible with the S3 Connector

  • Single Docker Container for simplified deployment
  • Stores data in local Docker Volume (local storage)
  • Metadata managed in single key/value database
  • S3 compatible Bucket and Object operations, error and response codes

Downloadable on Docker Hub

  • Can be pulled via UI or Docker pull command as per instructions on
  • Can be hosted on laptops and single servers
  • Seamless transition to scale-out solution on RING

ISV Certified with multiple solutions

  • Backup, archive, sync-n-share, surveillance, migration



So what do you get with Scality?

  • S3 Server & S3 Connector
  • Provides a seamless transition from “free” test/dev single-server trial to full scale-out deployments (note that the trial is not available to robots).
  • Small to large deployments from local storage to full RING
  • Simple to deploy via Docker containers
  • Comprehensive Enterprise Deployment Features
  • Multi-tenancy
  • Active Directory SSO/federation


Further Reading and Thoughts

Justin did a comprehensive write-up on Scality here. Sure, I could have saved you a lot of time and sent you there in the first place, but that’s not how I roll. I admit I’m not super familiar with Scality and have yet to get cracking with the RING trial. That said, with version 6.0 they seem to have included a lot of features that enterprises are interested in when looking at object storage with cloudy tendencies. There’s decent support for file protocols such as NFS and SMB, just no block. I covered some of the other enterprise features above, and they’ve been around for a little while now. But that’s not what the kids are into these days in any case. If you’re looking into rolling your own object solution, I recommend giving Scality a spin.

So NooBaa, eh?



I had the opportunity to speak with NooBaa about six months ago. At the time they were still developing their product, but I thought it looked pretty cool. At Tech Field Day Extra, they demoed their cloud services engine. The company was founded by Yuval Dimnik (Co-founder and CEO) and Guy Margalit (Co-founder and CTO). If you’re familiar with Exanet or Dell FluidFS, you’ll be familiar with some of their capabilities. NooBaa was founded in 2014, with a product launch in September 2016, and a current headcount of 14 (they tell us they have a strong security/storage DNA).

“Customers don’t care how you do your tech, they care how it fixes their problems”


So NooBaa, eh?

They have thought about the name. A lot. It’s a pure software product enabling folks to create and provision cloud services

  • Storage (like AWS S3) – First!
  • Serverless compute (like AWS Lambda) – Future

The key is that the customer owns the service, with

  • Full control of who accesses what, and what stays on-premises
  • No cloud vendor lock-in

The services use

  • Heterogeneous resources – cloud resources and servers
  • In the cloud, on-premises, and spanned

So, take all the spare storage you have lying about on Windows and Linux VMs, bang it all in a single namespace and present it back to your object-friendly apps. Replicate it to the cloud if you like. Or use all your spare clouds. Sounds like a cool idea.


Design Considerations (once bitten, twice shy)

They wanted to design a product that behaves like the cloud, but gives you the choice to consume from on-premises or cloud.

But can you predict the unpredictable?

  • Cloud strategy? Everyone has one of those, they’re just not sure what it really means.
  • Growth rate? Oh, it grows a lot.
  • Hardware technologies? Yep, software still needs hardware.
  • Vendors? Who can really work out what they do?
  • Organisational changes?
  • Security issues and lurking “heart bleeds”?

Stuff is hard. Along with this, NooBaa were looking to add the following capabilities

  • On-premises, multi-cloud, and supporting cloud migration
  • P2P scalable capacity
  • Monitor hardware and adapt
  • Agnostic to the machine
  • Allowed to grow, allowed to shrink
  • User space as a religion – when you need to fix that you can do it right away


NooBaa is all about a hybrid approach to resources, supporting multiple cloud providers and on-premises resources. It also has support for multiple sites.


The key to NooBaa’s storage performance in what might seem to be non-performant environments is the way it stores data, as you can see in the below diagram.



Note that they’re not targeting low-latency workloads. At this stage they’re cloud agnostic and hoping to keep things that way. Heterogeneous resources are key for NooBaa. You can also sign up for the Community Edition – limited to 20TB aggregate object size.


Final Thoughts and Reading


The name doesn’t roll off the tongue, and the colour-scheme is very pretty. But I think this belies the thought that’s gone into this product. Yuval and his team have a strong background in scalable object storage, and I’m excited to see them finally come out of stealth. The concept of treating storage nodes as second class citizens is interesting, and I’m looking forward to taking the Community Edition for a spin when I get my act together in the near future. In the meantime, head over to Alastair’s blog for a more succinct write-up on what we saw. John White also did a great post here. You can grab a copy of my raw notes here, and watch NooBaa’s TFDx presentations here.


Paessler have been doing this for a while now



Paessler recently presented at Tech Field Day Extra – VMworld US 2016. My rough notes can be found here. You can see videos from the presentation here and here.


What’s a Paessler?

Benjamin Day, Senior Systems Engineer with Paessler, took us through some of the background on the company. Founded in 1997 in Nuremberg, Germany, they are 100% owned by founders and employees. The US is their largest market and they tell us that over 70% of Fortune 100 enterprises worldwide use PRTG.


What’s a Sensor?

PRTG is often referred to as “MRTG for Windows”. When I say often, I mean it was mentioned by Paessler yesterday. But they also say it on their website. You can get a product overview from here. You can also check out a demo here.

So what are sensors? PRTG is defined (built and licensed) at the sensor level. Pretty much anything you would monitor is a sensor (you can read more on that here). Note also that it’s one sensor, but not one metric (these are known as channels). Generally speaking you can count on using 5-10 sensors per device. Here’s an image I swiped from the Paessler website that kind of shows what sensors look like.

[image: PRTG sensors, via Paessler website]

Licences come in lots of 500, 1000, 2500, 5000, and Unlimited. The good thing is that they’re not named, so Christopher doesn’t have to monitor those printers if he really doesn’t want to.

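If you want a back-of-the-envelope idea of which licence you’d land on, the sums are simple enough to sketch in Python. The tier sizes and the 5-10 sensors per device rule of thumb come from the presentation, and the 100 sensor tier is the free licence; the function names and the choice to default to the pessimistic end of the range are my own assumptions, and none of this is an official Paessler sizing tool.

```python
# Sensor counts per licence tier; 100 is the free licence.
LICENCE_TIERS = [100, 500, 1000, 2500, 5000]


def estimate_sensors(devices: int, sensors_per_device: int = 10) -> int:
    # Rule of thumb is 5-10 sensors per device; default to the high end.
    return devices * sensors_per_device


def smallest_tier(sensor_count: int):
    # Return the smallest tier that fits, or None when only Unlimited will do.
    for tier in LICENCE_TIERS:
        if sensor_count <= tier:
            return tier
    return None


if __name__ == "__main__":
    needed = estimate_sensors(devices=120)
    print(needed, "sensors ->", smallest_tier(needed) or "Unlimited")
```

It’s worth running the numbers at both 5 and 10 sensors per device, as the gap between the two can easily push you up a tier.
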
From a notification perspective, there are a bunch of options to get the message out, and you can send things via:

  • Email;
  • SMS (through third-party or IP-enabled SMS gateways);
  • PRTG-enabled smart devices (there’s a mobile app);
  • syslog; and
  • SNMP traps.

There are also options for auto remediation, and you can do things via a script (PowerShell, shell, etc.) or, amongst other things, kick off a web action (handy for ticketing systems).


Thresholds and Notifications

There are all sorts of things you can do in terms of actions when you exceed thresholds, including:

  • Sending email
  • Sending push notifications (to a user or group, and you can customise the message)

You can modify the format (HTML, text, or text with custom content) and customise the priority. You can add an entry to event logs and send an Amazon Simple Notification Service (SNS) message. You might want to assign a ticket as well.

Note also that PRTG is multi-tenant capable, making it an interesting choice for service providers. There’s also an option to “white box” it with your own logo if you’re into that kind of thing. Note that MSP licensing is done in a different fashion to normal licensing.

My favourite thing (besides what seems like a pretty comprehensive monitoring capability and lightweight deployment requirement) is that every sensor has a QR code. And the PRTG app has a QR code scanner (you see where I’m going with this?). You can print out the device QR codes, scan them, and they’ll come up in PRTG. There’s no longer a requirement to faff about with long labels on hosts. If you’re using per port sensors on your switches, you can put a QR code on the cable.



Paessler have been doing this for almost 20 years now. It strikes me that the product seems easy to deploy and use while being fairly powerful and feature-rich. If you’d like to try PRTG out there’s a free license you can use for both personal and commercial use. This is limited to 100 sensors.

If you can monitor it with SNMP (their preference) or WMI, and are happy to use a Windows platform, then PRTG could be the tool for you. I recommend checking them out.


Tech Field Day – I’ll Be At TFD Extra at VMworld US 2016


Sure, the title is a bit of a mouthful. But I think it gets the point across. I mentioned recently that I’ll be heading to the US in less than a week for VMworld. This is a quick post to say that I’ll also have the opportunity to participate in my first Tech Field Day Extra event while at VMworld.  If you haven’t heard of the very excellent Tech Field Day events, you should check them out. You can also check back on the TFDx website during the event as there’ll likely be video streaming along with updated links to additional content. You can also see the list of delegates and event-related articles that they’ve published.

I think it’s a great line-up of companies this time around, with some I’m familiar with and some not so much. I’m attending the Tuesday session and will be hearing from ClearSky Data, NooBaa and Paessler.


It should be a lot of fun!