Speaking of old things, El Reg had some info on running (hobbyist) x86-64 editions of OpenVMS. I ran OpenVMS on a DEC Alpha AXP-150 at home for a brief moment, but that feels like it was a long time ago.
This article from JB on the Bowlo was excellent. I don’t understand why Australians are so keen on poker machines (or gambling in general), but it’s nice when businesses go against the grain a bit.
Disclaimer: I recently attended Storage Field Day 22. Some expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
You’ve heard of Fujifilm before, right? They do a whole bunch of interesting stuff – batteries, cameras, copiers. Nami Matsumoto, Director of DMS Marketing and Operations, took us through some of Fujifilm’s portfolio. Fujifilm’s slogan is “Value From Innovation”, and it certainly seems to be looking to extract maximum value from its $1.4B annual spend on research and development. The Recording Media Products Division is focussed on helping “companies future proof their data”.
[image courtesy of Fujifilm]
The Problem
The challenge, as always (it seems), is that data growth continues apace while budgets remain flat. As a result, both security and scalability are frequently sacrificed when solutions are deployed in enterprises.
Rapid data creation: “More than 59 Zettabytes (ZB) of data will be created, captured, copied, and consumed in the world this year” (IDC 2020)
Shift from File to Object Storage
Archive Market – 60 – 80%
Flat IT budgets
Cybersecurity concerns
Scalability
Enter The Archive
FUJIFILM Object Archive
Chris Kehoe, Director of DMS Sales and Engineering, spent time explaining what exactly FUJIFILM Object Archive was. “Object Archive is an S3 based archival tier designed to reduce cost, increase scale and provide the highest level of security for long-term data retention”. In short, it offers:
Predictable costs and TCO with no API or egress fees
Workloads?
It’s optimised to handle the long-term retention of data, which is useful if you’re doing any of these things:
Digital preservation
Scientific research
Multi-tenant managed services
Storage optimisation
Active archiving
What Does It Look Like?
There are a few components that go into the solution, including a:
Storage Server
Smart cache
Tape Server
[image courtesy of Fujifilm]
Tape?
That’s right, tape. The tape library supports LTO7, LTO8, and TS1160 drives. Data is written using the “OTFormat” specification (you can read about that here). The idea is that it packs a bunch of objects together so they get written to tape efficiently.
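The OTFormat specification itself is Fujifilm’s to describe (the link above covers it), but the packing idea is easy to illustrate. Purely as a conceptual sketch, and not a representation of how OTFormat actually lays data out, here’s the general shape of bundling lots of small objects into one big sequential blob that a tape drive can stream happily:

```python
import io
import tarfile


def pack_objects(objects):
    """Bundle many small (key, bytes) objects into one large sequential blob
    so it can be streamed to tape in a single pass rather than as thousands
    of tiny writes. Purely illustrative; this is not how OTFormat works."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for key, data in objects:
            info = tarfile.TarInfo(name=key)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical example: a thousand 4KB objects become one sequential blob.
blob = pack_objects((f"object-{i:04d}", b"x" * 4096) for i in range(1000))
print(f"{len(blob)} bytes ready to stream to the tape tier")
```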
The product starts at 1PB of licensing. You can read the Solution Brief here. There’s an informative White Paper here. And there’s one of those nice Infographic things here.
Deployment Example
So what does this look like from a deployment perspective? One example was a typical primary storage deployment, with data archived to an on-premises object storage platform (in this case NetApp StorageGRID). When your archive got really “cold”, it would be moved to the Object Archive.
[image courtesy of Fujifilm]
[image courtesy of Fujifilm]
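To make the flow a little more concrete, here’s a rough sketch of the “move it when it goes cold” step in boto3 terms. The endpoints, bucket names, credentials, and the 90-day threshold are all hypothetical, and in practice you’d lean on the vendors’ own policy engines rather than a script like this:

```python
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical endpoints and placeholder credentials; both tiers speak S3.
grid = boto3.client("s3", endpoint_url="https://storagegrid.example.internal",
                    aws_access_key_id="GRID_KEY", aws_secret_access_key="GRID_SECRET")
archive = boto3.client("s3", endpoint_url="https://object-archive.example.internal",
                       aws_access_key_id="ARCH_KEY", aws_secret_access_key="ARCH_SECRET")

CUTOFF = datetime.now(timezone.utc) - timedelta(days=90)  # assumed "cold" threshold

paginator = grid.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="warm-archive"):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < CUTOFF:
            # Copy the cold object down to the archive tier, then remove it
            # from the warm tier.
            body = grid.get_object(Bucket="warm-archive", Key=obj["Key"])["Body"].read()
            archive.put_object(Bucket="cold-archive", Key=obj["Key"], Body=body)
            grid.delete_object(Bucket="warm-archive", Key=obj["Key"])
```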
Thoughts
Years ago, when a certain deduplication storage appliance company was acquired by a big storage slinger, stickers with “Tape is dead, get over it” were given out to customers. I think I still have one or two in my office somewhere. And I think the sentiment is spot on, at least in terms of the standard tape library deployments I used to see in small to mid to large enterprise. The problem that tape was solving for those organisations at the time has largely been dealt with by various disk-based storage solutions. There are nonetheless plenty of use cases where tape is still considered useful. I’m not going to go into every single reason, but the cost per GB of tape, at a particular scale, is hard to beat. And when you want to safely store files for a long period of time, even offline? Tape, again, is hard to beat. This podcast from Curtis got me thinking about the demise of tape, and I think this presentation from Fujifilm reinforced the thinking that it was far from on life support – at least in very specific circumstances.
Data keeps growing, and we need to keep it somewhere, apparently. We also need to think about keeping it in a way that means we’re not continuing to negatively impact the environment. It doesn’t necessarily make sense to keep really old data permanently online, despite the fact that it has some appeal in terms of instant access to everything ever. Tape is pretty good when it comes to relatively low energy consumption, particularly given the fact that we can’t yet afford to put all this data on All-Flash storage. And you can keep it available in systems that can be relied upon to get the data back, just not straight away. As I said previously, this doesn’t necessarily make sense for the home punter, or even for the small to midsize enterprise (although I’m tempted now to resurrect some of my older tape drives and see what I can store on them). It really works better at large scale (dare I say hyperscale?). Given that we seem determined to store a whole bunch of data with the hyperscalers, and for a ridiculously long time, it makes sense that solutions like this will continue to exist, and evolve. Sure, Fujifilm has sold something like 170 million tapes worldwide. But this isn’t simply a tape library solution. This is a wee bit smarter than that. I’m keen to see how this goes over the next few years.
Disclaimer: I recently attended Storage Field Day 21. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
MinIO recently presented at Storage Field Day 21. You can see videos of the presentation here, and download my rough notes from here.
What Is It?
To quote the good folks at MinIO, it is a “high performance, Kubernetes-native object store”. It is designed to be used for large-scale data infrastructure, and was built from scratch to be cloud native.
[image courtesy of MinIO]
Design Principles
MinIO has been built with the following principles in mind:
Cloud Native – born in the cloud with “cloud native DNA”
Performance Focussed – MinIO believes it is the fastest object store in existence
Simplicity – designed for simplicity because “simplicity scales”
S3 Compatibility
MinIO is heavily focussed on S3 compatibility. It was first to market with V4 and one of the few vendors to support S3 Select. It has also been strictly consistent from inception.
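To give you an idea of what that compatibility looks like in practice, here’s a minimal sketch of using S3 Select via boto3 against a MinIO endpoint. The endpoint, credentials, bucket, and object are all placeholders:

```python
import boto3

# Hypothetical local MinIO endpoint and placeholder credentials.
s3 = boto3.client("s3", endpoint_url="http://localhost:9000",
                  aws_access_key_id="minioadmin", aws_secret_access_key="minioadmin")

# Run a SQL expression server-side over a CSV object rather than pulling
# the whole thing back over the wire.
resp = s3.select_object_content(
    Bucket="telemetry",
    Key="readings.csv",
    ExpressionType="SQL",
    Expression="SELECT s.sensor, s.value FROM s3object s WHERE s.value > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```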
Put Me In Your Favourite Box
The cloud native part of MinIO was no accident, and as a result more than 62% of MinIO instances run in containers (according to MinIO). 43% of those instances are also managed via Kubernetes. It’s not just about jamming this solution into your favourite container solution though. The lightweight nature of it means you can deploy it pretty much anywhere. As the MinIO folks pointed out during the presentation, MinIO is going everywhere that AWS S3 isn’t.
Thoughts And Further Reading
I love object storage. Maybe not in the way I love my family or listening to records or beer, but I do love it. It’s not just useful for storage for the great unwashed of the Internet, but also backup and recovery, disaster recovery, data archives, and analytics. And I’m a big fan of MinIO, primarily because of the S3 compatibility and simplicity of deployment. Like it or not, S3 is the way forward in terms of a standard for object storage for cloud native (and a large number of enterprise) workloads. I’ve written before about other vendors being focussed on this compatibility, and I think it’s great that MinIO has approached this challenge with just as much vigour. There are plenty of problems to be had deploying applications at the best of times, and being able to rely on the storage vendor sticking to the script in terms of S3 compatibility takes one more potential headache away.
The simplicity of deployment is a big part of what intrigues me about MinIO too. I’m old enough to remember some deployments of early generation on-premises object storage systems that involved a bunch of hardware and complicated software interactions for what ultimately wasn’t a great experience. Something like MinIO can be up and running on some pretty tiny footprints in no time at all. A colleague of mine shared some insights into that process here.
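To put the simplicity claim in perspective, the following sketch assumes a MinIO server already running locally (the quickstart is essentially a single binary pointed at a directory) and shows how little client code is involved. The credentials and names are placeholders:

```python
from minio import Minio

# Assumes a local MinIO instance left on its default development credentials.
client = Minio("localhost:9000", access_key="minioadmin",
               secret_key="minioadmin", secure=False)

# Create a bucket if it isn't there already.
if not client.bucket_exists("backups"):
    client.make_bucket("backups")

# Upload a local file and list what's in the bucket.
client.fput_object("backups", "etc-hosts-copy", "/etc/hosts")
for obj in client.list_objects("backups"):
    print(obj.object_name, obj.size)
```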
And that’s what makes this cool. It’s not that MinIO is trying to take a piece of the AWS pie. Rather, it’s positioning the solution as one that can operate everywhere that the hyperscalers aren’t. Putting object storage solutions in edge locations has historically been a real pain to do. That’s no longer the case. Part of this has to do with the fact that we’ve got access to really small computers and compact storage. But it also has a bit to do with lightweight code that can be up and running in a snap. Like some of the other on-premises object vendors, MinIO has done a great job of turning people on to the possibility of doing cool storage for cloud native workloads outside of the cloud. It seems a bit odd until you think about all of the use cases in enterprise that might work really well in cloud, but aren’t allowed to be hosted in the cloud. It’s my opinion that MinIO has done a great job of filling that gap (and exceeding expectations) when it comes to lightweight, easy to deploy object storage. I’m looking forward to seeing what’s next for them, particularly as the other vendors start to leverage the solution. For another perspective on MinIO’s growth, check out Ray’s article here.
Disclaimer: I recently attended Storage Field Day 20. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Intel recently presented at Storage Field Day 20. You can see videos of the presentation here, and download my rough notes from here.
Intel Optane Persistent Memory
If you’re a diskslinger, you’ve very likely heard of Intel Optane. You may have even heard of Intel Optane Persistent Memory. It’s a little different to Optane SSD, and Intel describes it as “memory technology that delivers a unique combination of affordable large capacity and support for data persistence”. It looks a lot like DRAM, but the capacity is greater, and there’s data persistence across power losses. This all sounds pretty cool, but isn’t it just another form factor for fast storage? Sort of, but the application of the engineering behind the product is where I think it starts to get really interesting.
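As a very rough conceptual sketch (and definitely not how you’d consume PMEM in anger, that’s what things like Intel’s PMDK libraries and DAOS below are for), app-direct persistent memory on Linux is typically surfaced as a DAX-mounted filesystem, and a memory-mapped file over it behaves like byte-addressable storage that survives a restart. The path below is hypothetical:

```python
import mmap
import os

PATH = "/mnt/pmem0/counter.bin"   # hypothetical DAX-mounted PMEM namespace

# Create or open a small backing file and map it into memory.
fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, 8)
buf = mmap.mmap(fd, 8)

# Read the current value, bump it, and write it back in place.
value = int.from_bytes(buf[:8], "little")
buf[:8] = (value + 1).to_bytes(8, "little")
buf.flush()          # flush the mapping so the update is persisted
buf.close()
os.close(fd)
print("restart count:", value + 1)
```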
Enter DAOS
Distributed Asynchronous Object Storage (DAOS) is described by Intel as “an open source software-defined scale-out object store that provides high bandwidth, low latency, and high I/O operations per second (IOPS) storage containers to HPC applications”. It’s ostensibly a software stack built from the ground up to take advantage of the crazy speeds you can achieve with Optane, and at scale. There’s a handy overview of the architecture available on Intel’s website. Traditional object (and other storage systems) haven’t really been built to take advantage of Optane in quite the same way DAOS has.
[image courtesy of Intel]
There are some cool features built into DAOS, including:
Ultra-fine grained, low-latency, and true zero-copy I/O
Advanced data placement to account for fault domains
Software-managed redundancy supporting both replication and erasure code with online rebuild
End-to-end (E2E) data integrity
Scalable distributed transactions with guaranteed data consistency and automated recovery
Dataset snapshot capability
Security framework to manage access control to storage pools
Software-defined storage management to provision, configure, modify, and monitor storage pools
Exciting? Sure is. There’s also integration with Lustre. The best thing about this is that you can grab it from Github under the Apache 2.0 license.
Thoughts And Further Reading
Object storage is in its relative infancy when compared to some of the storage architectures out there. It was designed to be highly scalable and generally does a good job of cheap and deep storage at “web scale”. It’s my opinion that object storage becomes even more interesting as a storage solution when you put a whole bunch of really fast storage media behind it. I’ve seen some media companies do this with great success, and there are a few of the bigger vendors out there starting to push the All-Flash object story. Even then, though, many of the more popular object storage systems aren’t necessarily optimised for products like Intel Optane PMEM. This is what makes DAOS so interesting – the ability for the storage to fundamentally do what it needs to do at massive scale, and have it go as fast as the media will let it go. You don’t need to worry as much about the storage architecture being optimised for the storage it will sit on, because the folks developing it have access to the team that developed the hardware.
The other thing I really like about this project is that it’s open source. This tells me that Intel are both focused on Optane being successful, and also focused on the industry making the most of the hardware it’s putting out there. It’s a smart move – come up with some super fast media, and then give the market as much help as possible to squeeze the most out of it.
You can grab the admin guide from here, and check out the roadmap here. Intel has plans to release a new version every 6 months, and I’m really looking forward to seeing this thing gain traction. For another perspective on DAOS and Intel Optane, check out David Chapa’s article here.
Datadobi recently announced S3 migration capabilities as part of DobiMigrate 5.9. I had the opportunity to speak to Carl D’Halluin and Michael Jack about the announcement and thought I’d share some thoughts on it here.
What Is It?
In short, you can now use DobiMigrate to perform S3 to S3 object storage migrations. It’s flexible too, offering the ability to migrate data from a variety of on-premises object systems up to public cloud object storage, between on-premises systems, or back to on-premises from public cloud storage. There’s support for a variety of S3 systems, including:
In the future Datadobi is looking to add support for AWS Glacier, object locks, object tags, and non-current object versions.
Why Would You?
There are quite a few reasons why you might want to move S3 data around. You could be seeing high egress charges from AWS because you’re accessing more data in S3 than you’d initially anticipated. You might be looking to move to the cloud and have a significant on-premises footprint that needs to go. Or you might be looking to replace your on-premises solution with a solution from another vendor.
How Would You?
The process used to migrate objects is fairly straightforward, and follows a pattern that will be familiar if you’ve done anything with any kind of storage migration tool before. In short, you set up a migration pair (source and destination), run a scan and first copy, then do some incremental copies. Once you’ve got a maintenance window, there’s a cutover where the final scan and copy is done. And then you’re good to go. Basically.
[image courtesy of Datadobi]
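DobiMigrate obviously does a great deal more than this (verification, chain of custody, reporting, and so on), but the scan / first copy / incremental pattern is easy to picture. The sketch below shows the rough shape of it in boto3 terms, with hypothetical endpoints, placeholder credentials, and a naive ETag comparison standing in for proper integrity checking:

```python
import boto3

# Hypothetical source and destination S3 endpoints with placeholder credentials.
src = boto3.client("s3", endpoint_url="https://onprem-object.example.internal",
                   aws_access_key_id="SRC_KEY", aws_secret_access_key="SRC_SECRET")
dst = boto3.client("s3", endpoint_url="https://s3.amazonaws.com",
                   aws_access_key_id="DST_KEY", aws_secret_access_key="DST_SECRET")


def scan(client, bucket):
    """Return {key: etag} for every object in the bucket."""
    listing = {}
    for page in client.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            listing[obj["Key"]] = obj["ETag"]
    return listing


def copy_pass(src_bucket, dst_bucket):
    """Copy anything missing or changed; repeated runs act as incrementals."""
    source, target = scan(src, src_bucket), scan(dst, dst_bucket)
    for key, etag in source.items():
        if target.get(key) != etag:
            body = src.get_object(Bucket=src_bucket, Key=key)["Body"].read()
            dst.put_object(Bucket=dst_bucket, Key=key, Body=body)


copy_pass("legacy-archive", "cloud-archive")   # first copy, then re-run until cutover
```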
Final Thoughts
Why am I so interested in these types of offerings? Part of it is that it reminds me of all of the time I burnt through earlier in my career migrating data from various storage platforms to other storage platforms. One of the funny things about storage is that there’s rarely enough to service demand, and it rarely delivers the performance you need after it’s been in use for a few years. As such, there’s always some requirement to move data from one spot to another, and to keep that data intact in terms of its permissions and metadata.
Amazon’s S3 offering has been amazing in terms of bringing object storage to the front of mind of many storage consumers who had previously only used block or file storage. Some of those users are now discovering that, while S3 is great, it can be expensive if you haven’t accounted for egress costs, or you’ve started using a whole lot more of it than initially anticipated. Some companies simply have to take their lumps, as everything is done in public cloud. But for those organisations with some on-premises footprint, the idea of being able to do performance oriented object storage in their own data centre holds a great deal of appeal. But how do you get it back on-premises in a reliable fashion? I believe that’s where Datadobi’s solution really shines.
I’m a fan of software that makes life easier for storage folk. Platform migrations can be a real pain to deal with, and are often riddled with risky propositions and daunting timeframes. Datadobi can’t necessarily change the laws of physics in a way that will keep your project manager happy, but it can do some stuff that means you won’t be quite as broken after a storage migration as you might have been previously. They already had a good story when it came to file storage migration, and the object to object story enhances it. Worth checking out.
Disclaimer: I recently attended Storage Field Day 19. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
MinIO recently presented at Storage Field Day 19. You can see videos of the presentation here, and download my rough notes from here.
MinIO – A Sharp Looking 3-Piece Suite
AB Periasamy spoke to the delegates first, describing MinIO as “a high performance, software-defined, distributed object storage server, designed for peta-scale data infrastructure”. It was built from scratch with the private cloud as its target and is comprised of three components:
MinIO Server
MinIO Client
MinIO SDK
He noted that “the private cloud is a very different beast to the public cloud”.
Why Object?
The MinIO founders felt strongly that data would continue to grow, S3 would overtake POSIX, and that the bulk of data would exist outside of AWS. It’s Periasamy’s opinion that the private cloud finally started emerging as a real platform last year.
Architecture
A number of guiding principles were adopted by MinIO when designing the platform. MinIO is:
Focused on performance. They believe it is the fastest object store in existence;
Cloud native. It is the most K8s-friendly solution available for the private cloud;
100% open-source enables an increasingly dominant position in the enterprise;
Built for scale using the same philosophy as web scalers; and
Designed for simplicity. Simplicity scales – across clients, clouds, and machines.
[image courtesy of MinIO]
Other Features
Some of the other key features MinIO is known for include:
Scalability;
Support for erasure coding;
Identity and Access Management capability;
Encryption; and
Lifecycle Management.
MinIO is written in Go and is 100% open source. “The idea of holding customers hostage with a license key – those days are over”.
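Because it’s all exposed via the S3 API, those features are driven with the same tooling you’d point at AWS. As an example, here’s a minimal sketch of setting a lifecycle rule on a MinIO bucket with boto3; the endpoint, credentials, bucket, and retention period are placeholders:

```python
import boto3

# Hypothetical MinIO endpoint and placeholder credentials.
s3 = boto3.client("s3", endpoint_url="http://localhost:9000",
                  aws_access_key_id="minioadmin", aws_secret_access_key="minioadmin")

# Expire anything under logs/ after 90 days; the server handles the rest.
s3.put_bucket_lifecycle_configuration(
    Bucket="app-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```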
Deployment Use Cases
MinIO delivers usable object storage capability in all of the places you would expect it to.
Big Data / Machine Learning environments
HDFS replacements
High performance data lake / warehouse infrastructure
Cloud native applications (replacing file and block)
Multi-cloud environments (portability)
Endpoint for streaming workloads
Thoughts and Further Reading
If you watch the MinIO presentation, or check my notes, you’ll see a lot of slides with some impressive numbers in terms of both performance and market penetration. MinIO is not your standard object storage stack. A number of really quite big customers use it internally to service their object storage requirements. And, because it’s open source, a whole lot of people are really curious about how it all works, and have taken it for a spin at some stage or another. The story here isn’t that MinIO is architecturally a bit different from some other vendors’ storage offerings. Rather, it’s the fact that it’s open source and accessible to every punter who wants to grab it. This is exactly the reason why neckbeards get excited about open source products. Because you can take a core infrastructure function, and build a product that does something extremely useful from it. And you can contribute back to the community.
The big question, though, is how to make money out of this kind of operating model. A well-known software company made a pretty decent stab at leveraging open source products as a business model, delivering enhanced support services as a way to keep the cash coming in. This is very much what MinIO is doing as well. It has a number of very big customers willing to pay for an enhanced support experience via a subscription. It’s an interesting idea. Come up with a product that does what it says it will quite well. Make it easy to get hold of. Help big companies adopt it at scale. Then keep them up and running when said open source code becomes a mission critical piece of their business workflow. I want this model to work, I really do. And I have no evidence to say that it won’t. The folks at MinIO were pretty confident about what they could deliver with SUBNET in terms of the return on investment. I’m optimistic that MinIO will be around for a while longer, as the product looks the goods, and the people behind the product have spent some time thinking through what this will look like in the future. I also recommend checking out Chin-Fah’s recent article for another perspective.
We spoke briefly about just how insane modern data requirements are becoming, in terms of both volume and performance. The example offered up was that of an Advanced Driver-Assistance System (ADAS). These things need a lot of capacity to work, with training data sets starting at 15PB and performance requirements approaching 100GB/s.
Autonomy – Level 2+
10 Deep neural networks needed
Survey car – 2MP cameras
2PB per year per car
100 NVIDIA DGX-1 servers per car
When your hot data is 15 – 30PB and growing – it’s a problem.
What’s New In 7?
SwiftStack has been working to address those kinds of challenges with version 7.
Ultra-scale Performance Architecture
SwiftStack has managed to get some pretty decent numbers under its belt, delivering over 100GB/s at scale with a platform that’s designed to scale linearly to higher levels. The numbers stack up well against some of their competitors, and have been validated through:
ProxyFS Edge
ProxyFS Edge takes advantage of SwiftStack’s file services to deliver distributed file services between edge, core, and cloud. The idea is that you can use it for “high-throughput, data-intensive use cases”.
[image courtesy of SwiftStack]
Enabling functionality:
Containerised deployment of ProxyFS agent for orchestrated elasticity
Caching at the edge, minimising latency for improved application performance
Load-balanced, high-throughput API-based communication to the core
1space File Connector
But what if you have a bunch of unstructured data sitting in file environments that you want to use with your more modern apps? 1space File Connector brings enterprise file data into the cloud namespace, and “[g]ives modern, cloud-native applications access to existing data without migration”. The thinking is that you can modernise your workflows at an incremental rate, rather than having to deal with the app and the storage all in one go.
[image courtesy of SwiftStack]
Enabling functionality:
Containerised deployment of the 1space File Connector for orchestrated elasticity
File data is accessible using S3 or Swift object APIs
Scales out and is load balanced for high-throughput
1space policies can be applied to file data when migration is desired
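In practical terms, that means an application that only speaks S3 can read data that was written to the filer over SMB or NFS, without anything being migrated first. A rough sketch of that consumption, with a hypothetical endpoint, bucket, and credentials:

```python
import boto3

# Hypothetical SwiftStack endpoint exposing the filer-backed namespace via S3.
s3 = boto3.client("s3", endpoint_url="https://swiftstack.example.internal",
                  aws_access_key_id="APP_KEY", aws_secret_access_key="APP_SECRET")

# Files written to the NAS share appear as objects under the connected bucket.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="filer-data",
                                                         Prefix="projects/2019/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])

# Read one of them directly from a cloud-native app: no copy, no migration.
doc = s3.get_object(Bucket="filer-data", Key="projects/2019/report.pdf")["Body"].read()
```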
The SwiftStack AI Architecture
SwiftStack has also developed a comprehensive AI Architecture model, describing it as “the customer-proven stack that enables deep learning at ultra-scale”. You can read more on that here.
Ultra-Scale Performance
Shared-nothing distributed architecture
Keep GPU compute complexes busy
Elasticity from Edge-to-Core-to-Cloud
With 1space, ingest and access data anywhere
Eliminate data silos and move beyond one cloud
Data Immutability
Data can be retained and referenced indefinitely as it was originally written
Enabling traceability, accountability, confidence, and safety throughout the life of a DNN
Optimal TCO
Compelling savings compared to public cloud or all-flash arrays
Real-World Confidence
Notable AI deployments for autonomous vehicle development
SwiftStack PRO
The final piece is the SwiftStack PRO offering, a support service delivering:
24×7 remote management and monitoring of your SwiftStack production cluster(s);
Incorporating operational best-practices learned from 100s of large-scale production clusters;
Including advanced monitoring software suite for log aggregation, indexing, and analysis; and
Operations integration with your internal team to ensure end-to-end management of your environment.
Thoughts And Further Reading
The sheer scale of data enterprises are working with every day is pretty amazing. And data is coming from previously unexpected places as well. The traditional enterprise workloads hosted on NAS or in structured applications are insignificant in size when compared to the PB-scale stuff going on in some environments. So how on earth do we start to derive value from these enormous data sets? I think the key is to understand that data is sometimes going to be in places that we don’t expect, and that we sometimes have to work around that constraint. In this case, SwiftStack has recognised that not all data is going to be sitting in the core, or the cloud, and it’s using some interesting technology to get that data where you need it to get the most value from it.
Getting the data from the edge to somewhere useable (or making it useable at the edge) is one thing, but the ability to use unstructured data sitting in file with modern applications is also pretty cool. There’s often reticence associated with making wholesale changes to data sources, and this solution helps to make that transition a little easier. And it gives the punters an opportunity to address data challenges in places that may have been inaccessible in the past.
SwiftStack has good pedigree in delivering modern scale-out storage solutions, and it’s done a lot of work to ensure that its platform adds value. Worth checking out.
The Backblaze Storage Pod (currently version 6) recently turned 10 years old. That’s a long time for something to be around (and successful) in a market like cloud storage. I asked Yev and Andy about where they saw the pod heading, and whether they thought there was room for Flash in the picture. Andy pointed out that, with around 900PB under management, Flash still didn’t look like the most economical medium for this kind of storage task. That said, they have seen the main HDD manufacturers starting to hit a wall in terms of the capacity per drive that they can deliver. Nonetheless, the challenge isn’t just performance, it’s also the fact that people are needing more and more capacity to store their stuff. And it doesn’t look like they can produce enough Flash to cope with that increase in requirements at this stage.
Version 7.0
We spoke briefly about what Pod 7.0 would look like, and it’s going to be a “little bit faster”, with the following enhancements planned:
Updating the motherboard
Upgrading the CPU, and considering a move to an AMD CPU
Updating the power supply units, perhaps moving to one unit
Upgrading from 10Gbase-T to 10GbE SFP+ optical networking
Upgrading the SATA cards
Modifying the tool-less lid design
They’re looking to roll this out in 2020 some time.
Tiger Style?
So what’s all this about Veeam, Tiger Bridge, and Backblaze B2? Historically, if you’ve been using Veeam from the cheap seats, it’s been difficult to effectively leverage object storage as a repository for longer-term data storage. Backblaze and Tiger Technology have gotten together to develop an integration that allows you to use B2 storage to copy your Veeam protection data to the Backblaze cloud. There’s a nice overview of the solution that you can read here, and you can read some more comprehensive instructions here.
Thoughts and Further Reading
I keep banging on about it, but ten years feels like a long time to be hanging around in tech. I haven’t managed to stay with one employer longer than 7 years (maybe I’m flighty?). Along with the durability of the solution, the fact that Backblaze made the design open source, and inspired a bunch of companies to do something similar, is a great story. It’s stuff like this that I find inspiring. It’s not always about selling black boxes to people. Sometimes it’s good to be a little transparent about what you’re doing, and relying on a great product, competitive pricing, and strong support to keep customers happy. Backblaze have certainly done that on the consumer side of things, and the team assures me that they’re experiencing success with the B2 offering and their business-oriented data protection solution as well.
The Veeam integration is an interesting one. While B2 is an object storage play, it’s not S3-compliant, so they can’t easily leverage a lot of the built-in options delivered by the bigger data protection vendors. What you will see, though, is that they’re super responsive when it comes to making integrations available across things like NAS devices, and stuff like this. If I get some time in the next month, I’ll look at setting this up in the lab and running through the process.
I’m not going to wax lyrical about how Backblaze is democratising data access for everyone, as they’re in business to make money. But they’re certainly delivering a range of products that is enabling a variety of customers to make good use of technology that has potentially been unavailable (in a simple to consume format) previously. And that’s a great thing. I glossed over the news when it was announced last year, but the “Rebel Alliance” formed between Backblaze, Packet and ServerCentral is pretty interesting, particularly if you’re looking for a more cost-effective solution for compute and object storage that isn’t reliant on hyperscalers. I’m looking forward to hearing about what Backblaze come up with in the future, and I recommend checking them out if you haven’t previously. You can read Ken’s take over at Gestalt IT here.
Disclaimer: I recently attended Storage Field Day 15. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.
Western Digital recently presented at Storage Field Day 15. You might recall there are a few different brands under the WD umbrella, including Tegile and HGST, and folks from both presented during Storage Field Day 15. I’d like to talk about the ActiveScale session, however, mainly because I’m interested in object solutions. I’ve written about Tegile previously, although obviously a fair bit has changed for them too. You can see their videos from Storage Field Day 15 here, and download a PDF copy of my rough notes from here.
ActiveScale, Probably Not What You Thought It Was
ActiveScale isn’t some kind of weight measurement tool for exercise fanatics, but rather the brand of scalable object system that HGST sells. It comes in two flavours: the P100 and X100. Apparently the letters in product names sometimes do mean things, with the “P” standing for Petabyte, and the “X” for Exabyte (possibly in the same way that X stands for Excellent). From a speeds and feeds perspective, the typical specs are as follows:
P100 – starts as low as 720TB, goes to 18PB. 17x 9s data durability, 4.6KVA typical power consumption; and
X100 – 5.4PB in a rack, 840TB – 52PB, 17x 9s data durability, 6.5KVA typical power consumption.
You can scale out to 9 expansion racks, with 52PB of scale out object storage goodness per namespace. Some of the key capabilities of the ActiveScale platform include:
Archive and Backup;
Active Data for Analytics;
Data Forever Architecture;
Versioning;
Encryption;
Replication;
Single Pane Management;
S3 Compatible APIs;
Multi-Geo Availability Zones; and
Scale Up and Scale Out.
They use “BitSpread” for dynamic data placement and you can read a little about their erasure coding mechanism here. “BitDynamics” assures continuous data integrity, offering the following features:
Background – verification process always running
Performance – not impacted by verification or repair
Automatic – all repairs happen with no intervention
There’s also a feature called “GeoSpread” for geographical availability.
Single – Distributed erasure coded copy;
Available – Can sustain the loss of an entire site; and
Efficient – Better than 2 or 3 copy replication.
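To put some rough numbers on the efficiency claim, the arithmetic below uses a made-up layout (18 fragments per object, 10 of them data, spread across three sites), not ActiveScale’s actual parameters:

```python
# Hypothetical layout: 18 fragments per object (10 data + 8 parity), spread
# evenly across 3 sites, so any 10 surviving fragments can rebuild the object.
# These numbers are illustrative only, not ActiveScale's actual scheme.
data_frags, total_frags, sites = 10, 18, 3
per_site = total_frags // sites

ec_overhead = total_frags / data_frags       # raw capacity per usable unit
replication_overhead = 3.0                   # three full copies

print(f"geo-spread erasure coding: {ec_overhead:.1f}x raw capacity")
print(f"3-copy replication:        {replication_overhead:.1f}x raw capacity")

# Losing a whole site drops 6 of the 18 fragments; 12 remain, which is still
# more than the 10 needed, so the data stays available.
assert total_frags - per_site >= data_frags
```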
What Do I Use It For Again?
Like a number of other object storage systems in the market, ActiveScale is being positioned as a very suitable platform for:
Media & Entertainment
Media Archive
Tape replacement and augmentation
Transcoding
Playout
Life Sciences
Bio imaging
Genomic Sequencing
Analytics
Thoughts And Further Reading
Unlike a lot of people, I find technical sessions discussing object storage at extremely large scale to be really interesting. It’s weird, I know, but there’s something that I really like about the idea of petabytes of storage servicing media and entertainment workloads. Maybe it’s because I don’t frequently come across these types of platforms in my day job. If I’m lucky I get to talk to folks about using object as a scalable archive platform. Occasionally I’ll bump into someone doing life sciences work in a higher education setting, but they’ve invariably built something that’s a little more home-brew than HGST’s offering. Every now and then I’m lucky enough to spend some time with media types who regale me with tales of things that go terribly wrong when the wrong bit of storage infrastructure is put in the path of a particular editing workflow or transcode process. Oh how we laugh. I can certainly see these types of scalable platforms being a good fit for archive and tape replacement. I’m not entirely convinced they make for a great transcode or playout platform, but I’m relatively naive when it comes to those kinds of workloads. If there are folks reading this who are familiar with that kind of stuff, I’d love to have a chat.
But enough with my fascination with the media and entertainment industry’s infrastructure requirements. From what I’ve seen of ActiveScale, it looks to be a solid platform with a lot of very useful features. Coupled with the cloud management feature it seems like they’re worth a look. Western Digital aren’t just making hard drives for your NAS (and other devices), they’re doing a whole lot more, and a lot of it is really cool. You can read El Reg’s article on the X100 here.
I haven’t covered SwiftStack in a little while, and they’ve been doing some pretty interesting stuff. They made some announcements recently, but a number of scheduling “challenges” and some hectic day job commitments prevented me from speaking to them until just recently. In the end I was lucky enough to snaffle 30 minutes with Mario Blandini and he kindly took me through the latest news.
6.0 Then, So What?
Universal Access
Universal Access is really very cool. Think of it as a way to write data in either file or object format, and then read it back in file or object format, depending on how you need to consume it.
[image courtesy of SwiftStack]
Key features include:
Gateway free – the data is stored in cloud-native format in a single namespace;
Accessible via file (SMB3 / NFS4) and / or object API (S3 / Swift). Note that this is not a replacement for NAS, but it will give you the ability to work with some of those applications that expect to see file in places; and
Applications can write data one way, access the data another way, and vice versa.
The great thing is that, according to SwiftStack, “Universal Access enables applications to take advantage of all data under management, no matter how it was written or where it is stored, without the need to refactor applications”.
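As a simple illustration of what that looks like from the consumer’s side, one application might write an object via the S3 API while another reads the same data back as an ordinary file from an SMB or NFS share presented out of the same namespace. Everything below (endpoint, credentials, bucket, mount point) is hypothetical:

```python
import boto3

# A cloud-native app writes an object via S3 (hypothetical endpoint/credentials)...
s3 = boto3.client("s3", endpoint_url="https://swiftstack.example.internal",
                  aws_access_key_id="APP_KEY", aws_secret_access_key="APP_SECRET")
s3.put_object(Bucket="reports", Key="q3.csv", Body=b"region,revenue\napac,1200\n")

# ...and a legacy application reads the same data back as a plain file over an
# NFS/SMB share presented from the same namespace (hypothetical mount point).
with open("/mnt/swiftstack/reports/q3.csv") as f:
    print(f.read())
```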
Universal Access Multi-Cloud
So what if you take two really neat features like, say, Cloud Sync and Universal Access, and combine them? You get access to a single, multi-cloud storage namespace.
[image courtesy of SwiftStack]
Thoughts
As Mario took me through the announcements he mentioned that SwiftStack are “not just an object storage thing based on Swift” and I thought that was spot on. Universal Access (particularly with multi-cloud) is just the type of solution that enterprises looking to add mobility to workloads are looking for. The problem for some time has been that data gets tied up in silos based on the protocol that a controller speaks, rather than the value of the data to the business. Products like this go a long way towards relieving some of the pressure on enterprises by enabling simpler access to more data. Being able to spread it across on-premises and public cloud locations also makes for simpler consumption models and can help business leverage the data in a more useful way than was previously possible. Add in the usefulness of something like Cloud Sync in terms of archiving data to public cloud buckets and you’ll start to see that these guys are onto something. I recommend you head over to the SwiftStack site and request a demo. You can read the press release here.