VMUG UserCon Sydney 2019 – See You There?

I’ll be presenting at VMUG UserCon in Sydney this year. If you’re unfamiliar with UserCon, it’s a free event run by the larger VMUG groups and brings together a mix of different folk, all of whom are focused on VMware technologies in one way or another. There are plenty of technical discussions to be had, as well as community talks on “soft” skills from people like me. I’ll be doing a presentation called “Build A Personal Brand: How to Start and Maintain a Blog!”. The alternative title is “Become an overnight sensation in just 11 years”. The first title is probably better though (and more accurate). It covers a lot of the basics of getting started with a blog, using it to maintain a voice in the community, and a bunch of lessons learnt over the last few years.

In any case, even if you don’t want to hear me talk, there’s sure to be something there that will spark your interest. The keynote speakers are Chris Wolf and Brian Madden – both very interesting people. The event’s being held at the Sydney International Convention Centre on Tuesday March 19th. And it’s free to boot. So get along if you can. And there’s also an event happening in Melbourne on Thursday 21st. I won’t be there, but I’m sure it will be very good. I know the VMUG teams are working on the schedule, and I’ll post that here as soon as I know it.

VMware vSphere and NFS – Some Links

Most of my experience with vSphere storage has revolved around various block storage technologies, such as DAS, FC and iSCSI. I recently began an evaluation of one of those fresh new storage startups running an NVMe-based system. We didn’t have the infrastructure to support NVMe-oF in our lab, so we’ve used NFS to connect the datastores to our vSphere environment. Obviously, at this point, it is less about maximum performance and more about basic functionality. In any case, I thought it might be useful to include a series of links regarding NFS and vSphere that I’ve been using both to get up and running and to troubleshoot some minor issues we hit along the way. Note that most of these links cover vSphere 6.5, as our lab is currently running that version.
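
For reference, mounting an export from the ESXi command line is pretty straightforward. This is just a rough sketch – the NFS server address, export path and datastore names below are placeholders rather than anything specific to my environment:

# Mount an NFS v3 export as a datastore
[root@esxi01:~] esxcli storage nfs add --host=192.168.0.50 --share=/export/ds01 --volume-name=NFS-DS01

# NFS 4.1 uses a separate namespace (and supports multiple server addresses)
[root@esxi01:~] esxcli storage nfs41 add --hosts=192.168.0.50 --share=/export/ds01 --volume-name=NFS41-DS01

# Check what's currently mounted
[root@esxi01:~] esxcli storage nfs list
[root@esxi01:~] esxcli storage nfs41 list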

Basics

Create an NFS Datastore

How to add NFS export to VMware ESXi 6.5

NFS Protocols and ESXi

Best Practice

Best Practices for running VMware vSphere on Network Attached Storage

Troubleshooting

Maximum supported volumes reached (1020652)

Increasing the default value that defines the maximum number of NFS mounts on an ESXi/ESX host (2239)

Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts (1003967)
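
On the subject of the maximum volumes KB above, checking and increasing the NFS mount limit from the ESXi command line looks something like the following. The value of 256 is just an example – check the KB for the limits that apply to your version, and note that the TCP/IP heap settings mentioned in the same article may need adjusting too:

# Check the current maximum number of NFS volumes
[root@esxi01:~] esxcli system settings advanced list -o /NFS/MaxVolumes

# Increase it (256 is an example value)
[root@esxi01:~] esxcli system settings advanced set -o /NFS/MaxVolumes -i 256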

Imanis Data and MDL autoMation Case Study

Background

I’ve covered Imanis Data in the past, but am the first to admit that their focus area is not something I’m involved with on a daily basis. They recently posted a press release covering a customer success story with MDL autoMation. I had the opportunity to speak with both Peter Smails from Imanis Data, as well as Eric Gutmann from MDL autoMation. Whilst I enjoy speaking to vendors about their successes in the market, I’m even more intrigued by customer champions and what they have to say about their experience with a vendor’s offering. It’s one thing to talk about what you’ve come up with as a product, and how you think it might work well in the real world. It’s entirely another thing to have a customer take the time to speak to people on your behalf and talk about how your product works for them. Ultimately, these are usually interesting conversations, and it’s always useful for me to hear about how various technologies are applied in the real world. Note that I spoke to them separately, so Gutmann wasn’t being pushed in a certain direction by Imanis Data – he’s just really enthusiastic about the solution.


The Case Study

The Customer

Founded in 2006, MDL autoMation (MDL) is “one of the automotive industry’s leaders in the application of IoT and SaaS-based technologies for process improvement, automated customer recognition, vehicle tracking and monitoring, personalised customer service and sales, and inventory management”. Gutmann explained to me that for them, “every single customer is a VIP”. There’s a lot of stuff happening on the back-end to make sure that the customer’s experience is an extremely smooth one. MongoDB provides the foundation for the solution. When they first deployed the environment, they used MongoDB Cloud Manager to protect the environment, but struggled to get it to deliver the results they required.


Key Challenges

MDL moved to another provider, and spent approximately six months getting it running. It worked well at the time and met their requirements, saving them money and delivering quick on-premises backups and quick restores. There were a few issues though, including:

  • Cost and complexity of backup and recovery for a 15-node, sharded MongoDB deployment across three data centres;
  • Time and complexity associated with the daily refresh of a non-sharded QA test cluster (it would take 2 days to refresh QA); and
  • Inability to use Active Directory for user access control.


Why Imanis Data?

So what got Gutmann and MDL excited about Imanis Data? There were a few reasons that Eric outlined for me, including:

  • 10x backup storage efficiency;
  • 26x faster QA refresh time – incremental restore;
  • 95% reduction in the number of policies to manage – thanks to the enterprise policy engine, the number of policies was reduced from 40 to 2; and
  • Native integration with Active Directory.

It was cheaper again than the previous provider, and, as Gutmann puts it, “[i]t took literally hours to implement the Imanis product”. MDL are currently protecting 1.6TB of data, and it takes 7 minutes every hour to back up any changes.


Conclusion and Further Reading

Data protection is a problem that everyone needs to deal with at some level. Whether you have “traditional” infrastructure delivering your applications, or one of those fancy new NoSQL environments, you still need to protect your stuff. There are a lot of built-in features with MongoDB to ensure it’s resilient, but keeping the data safe is another matter. Coupled with that is the fact that developers have relied on data recovery activities to get data into quality assurance environments for years now. Add all that together and you start to see why customers like MDL are so excited when they come across a solution that does what they need it to do.

Working in IT infrastructure (particularly operations) can be a grind at times. Something always seems to be broken or about to break. Something always seems to be going a little bit wrong. The best you can hope for at times is that you can buy products that do what you need them to do to ensure that you can produce value for the business. I think Imanis Data have a good story to tell in terms of the features they offer to protect these kinds of environments. It’s also refreshing to see a customer that is as enthusiastic as MDL is about the functionality and performance of the product, and the engagement as a whole. And as Gutmann pointed out to me, his CEO is always excited about the opportunity to save money. There’s no shame in being honest about that requirement – it’s something we all have to deal with one way or another.

Note that neither of us wanted to focus on the previous / displaced solution, as it serves no real purpose to talk about another vendor in a negative light. Just because that product didn’t do what MDL wanted it to do doesn’t mean it wouldn’t suit other customers and their particular use cases. Like everything in life, you need to understand what your needs and wants are, prioritise them, and then look for solutions that can fulfil those requirements.

Random Short Take #11

Here are a few links to some random news items and other content that I found interesting. You might find it interesting too. Maybe. Happy New Year too. I hope everyone’s feeling fresh and ready to tackle 2019.

  • I’m catching up with the good folks from Scale Computing in the next little while, but in the meantime, here’s what they got up to last year.
  • I’m a fan of the fruit company nowadays, but if I had to build a PC, this would be it (hat tip to Stephen Foskett for the link).
  • QNAP announced the TR-004 over the weekend and I had one delivered on Tuesday. It’s unusual that I have cutting edge consumer hardware in my house, so I’ll be interested to see how it goes.
  • It’s not too late to register for Cohesity’s upcoming Helios webinar. I’m looking forward to running through some demos with Jon Hildebrand and talking about how Helios helps me manage my Cohesity environment on a daily basis.
  • Chris Evans has published NVMe in the Data Centre 2.0 and I recommend checking it out.
  • I went through a basketball card phase in my teens. This article sums up my somewhat confused feelings about the card market (or lack thereof).
  • Elastifile Cloud File System is now available on the AWS Marketplace – you can read more about that here.
  • WekaIO have posted some impressive numbers over at spec.org if you’re into that kind of thing.
  • Applications are still open for vExpert 2019. If you haven’t already applied, I recommend it. The program is invaluable in terms of vendor and community engagement.


Cohesity – Helios Article and Upcoming Webinar

I’ve written about Cohesity’s Helios offering previously, and also wrote a short article on upgrading multiple clusters using Helios. I think it’s a pretty neat offering, so to that end I’ve written an article on Cohesity’s blog about some of the cool stuff you can do with Helios. I’m also privileged to be participating in a webinar in late January with Cohesity’s Jon Hildebrand. We’ll be running through some of these features from a more real-world perspective, including doing silly things like live demos. You can get further details on the webinar here.

Storage Field Day – I’ll Be At Storage Field Day 18

Here’s some good news for you. I’ll be heading to the US in late February for another Storage Field Day event. If you haven’t heard of the very excellent Tech Field Day events, you should check them out. I’m looking forward to time travel and spending time with some really smart people for a few days. It’s also worth checking back on the Storage Field Day 18 website during the event (February 27 – March 1) as there’ll be video streaming and updated links to additional content. You can also see the list of delegates and event-related articles that have been published.

I think it’s a great line-up of both delegates and presenting companies (including a “secret company”) this time around. I know them all pretty well, but there may also still be a few companies added to the line-up. I’ll update this if and when they’re announced. [Update: Here’s the full list of presenters for this event]


I’d like to publicly thank in advance the nice folks from Tech Field Day who’ve seen fit to have me back, as well as my employer for letting me take time off to attend these events. Also big thanks to the companies presenting. It’s going to be a lot of fun. Seriously. If you’re in the Bay Area and want to catch up prior to the event, please get in touch. I’ll have some free time, so perhaps we could check out a Warriors game on the 23rd and discuss the state of the industry? ;)

OpenMediaVault – Good Times With mdadm

Happy 2019. I’ve been on holidays for three full weeks and it was amazing. I’ll get back to writing about boring stuff soon, but I thought I’d post a quick summary of some issues I’ve had with my home-built NAS recently and what I did to fix them.

Where Are The Disks Gone?

I got an email one evening with the following message.
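
If you haven’t had the pleasure of receiving one of these before, the mdadm monitor notification looks something like the following. Note that the hostname and array details here are representative rather than a copy of the exact message I received:

This is an automatically generated mail message from mdadm
running on openmediavault

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda[0] sdb[1] sdd[2](F) sdc[3](F) sdf[4](F) sde[5](F) sdi[6] sdh[7]
      [8/4] [UU____UU]

unused devices: <none>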

I do enjoy the “Faithfully yours, etc” and the postscript is the most enlightening bit. See where it says [UU____UU]? Yeah, that’s not good. There are 8 disks that make up that device (/dev/md0), so it should look more like [UUUUUUUU]. But why would 4 out of 8 disks just up and disappear? I thought it was a little odd myself. I had a look at the ITX board that everything was attached to and realised that those 4 drives were plugged into a PCI SATA-II card. It seems that either the slot on the board or the card itself is now failing intermittently. I say “seems” because that’s all I can think of, as the S.M.A.R.T. status of the drives is fine.

Resolution, Baby

The short-term fix to get the filesystem back online and usable was the classic “assemble” switch with mdadm. Long-time readers of this blog may have witnessed me doing something similar with my QNAP devices from time to time. After I’d panic-rebooted the box a number of times (a silly thing to do, really), it finally responded to pings. Checking out /proc/mdstat wasn’t good though.

dan@openmediavault:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
unused devices: <none>

Notice the lack of, erm, devices there? That’s non-optimal. The fix requires a forced assembly of the devices comprising /dev/md0.

dan@openmediavault:~$ sudo mdadm --assemble --force --verbose /dev/md0 /dev/sd[abcdefhi]
[sudo] password for dan:
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdi is identified as a member of /dev/md0, slot 6.
mdadm: forcing event count in /dev/sdd(2) from 40639 upto 40647
mdadm: forcing event count in /dev/sdc(3) from 40639 upto 40647
mdadm: forcing event count in /dev/sdf(4) from 40639 upto 40647
mdadm: forcing event count in /dev/sde(5) from 40639 upto 40647
mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sdd
mdadm: clearing FAULTY flag for device 2 in /dev/md0 for /dev/sdc
mdadm: clearing FAULTY flag for device 5 in /dev/md0 for /dev/sdf
mdadm: clearing FAULTY flag for device 4 in /dev/md0 for /dev/sde
mdadm: Marking array /dev/md0 as 'clean'
mdadm: added /dev/sdb to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdc to /dev/md0 as 3
mdadm: added /dev/sdf to /dev/md0 as 4
mdadm: added /dev/sde to /dev/md0 as 5
mdadm: added /dev/sdi to /dev/md0 as 6
mdadm: added /dev/sdh to /dev/md0 as 7
mdadm: added /dev/sda to /dev/md0 as 0
mdadm: /dev/md0 has been started with 8 drives.

In this example you’ll see that /dev/sdg isn’t included in my command. That device is the SSD I use to boot the system. Sometimes Linux device conventions confuse me too. If you’re in this situation and you think this is just a one-off thing, then you should be okay to unmount the filesystem, run fsck over it, and re-mount it. In my case, this has happened twice already, so I’m in the process of moving data off the NAS onto some scratch space and have procured a cheap little QNAP box to fill its role.
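
If you do decide to take the unmount and fsck route, the process looks something like this. The mount point below is OpenMediaVault’s default label-based path and is an assumption on my part – substitute your own, and only run fsck against an unmounted filesystem:

# Unmount the filesystem that lives on the array (mount point is an example)
dan@openmediavault:~$ sudo umount /srv/dev-disk-by-label-data

# Check and repair the filesystem on the md device (assuming ext4 here)
dan@openmediavault:~$ sudo fsck.ext4 -f /dev/md0

# Re-mount it once the check comes back clean
dan@openmediavault:~$ sudo mount /dev/md0 /srv/dev-disk-by-label-data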


Conclusion

My rush to replace the homebrew device with a QNAP isn’t a knock on the OpenMediaVault project by any stretch. OMV itself has been very reliable and has done everything I needed it to do. Rather, my ability to build semi-resilient devices on a budget has simply proven quite poor. I’ve seen some nasty stuff happen with QNAP devices too, but at least any issues will be covered by some kind of manufacturer’s support team and warranty. My NAS is only covered by me, and I’m just not that interested in working out what could be going wrong here. If I’d built something decent I’d get some alerting back from the box telling me what’s happened to the card that keeps failing. But then I would have spent a lot more on this box than I would have wanted to.

I’ve been lucky thus far in that I haven’t lost any data of real import (the NAS devices are used to store media that I have on DVD or Blu-Ray – the important documents are backed up using Time Machine and Backblaze). It is nice, however, that a tool like mdadm can bring you back from the brink of disaster in a pretty efficient fashion.

Incidentally, if you’re a macOS user, you might have a bunch of .DS_Store files on your filesystem. Or stuff like .@Thumb or some such. These things are fine, but macOS doesn’t seem to like them when you’re trying to move folders around. This post provides some handy guidance on how to get rid of those files in a jiffy.
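
If you’d rather clean them up from the NAS end, a find one-liner will do the same job. The path here is just an example, and it’s worth running it with -print first to see what you’re about to remove:

# Preview the offending files first
dan@openmediavault:~$ find /srv/dev-disk-by-label-data -name '.DS_Store' -type f -print

# Then delete them
dan@openmediavault:~$ find /srv/dev-disk-by-label-data -name '.DS_Store' -type f -delete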

As always, if the data you’re storing on your NAS device (be it home-built or off the shelf) is important, please make sure you back it up. Preferably in a number of places. Don’t get yourself in a position where this blog post is your only hope of getting your one copy of your firstborn’s pictures from the first day of school back.

Random Short Take #10

Here are a few links to some random news items and other content that I found interesting. You might find it interesting too. Maybe. This will be the last one for this year. I hope you and yours have a safe and merry Christmas / holiday break.

  • Scale Computing have finally entered the Aussie market in partnership with Amnesium. You can read more about that here.
  • Alastair is back in the classroom, teaching folks about AWS. He published a bunch of very useful notes from a recent class here.
  • The folks at Backblaze are running a “Refer-A-Friend” promotion. If you’re looking to become a new Backblaze customer and sign up with my referral code, you’ll get some free time on your account. And I will too! Hooray! I’ve waxed lyrical about Backblaze before, and I recommend it. The offer runs out on January 6th 2019, so get a move on.
  • Howard did a nice article on VVols that I recommend checking out.
  • GDPR has been a challenge (within and outside the EU), but I enjoyed Mark Browne’s take on Cohesity’s GDPR compliance.
  • I’m quite a fan of the Netflix Tech Blog, and this article on the Netflix Media Database was a ripper.
  • From time to time I like to poke fun at my friends in the US for what seems like an excessive amount of shenanigans happening in that country, but there’s plenty of boneheaded stuff happening in Australia too. Read Preston’s article on the recently passed anti-encryption laws to get a feel for the heady heights of stupidity that we’ve been able to reach recently.


Updated Articles Page

I recently had the opportunity to upgrade my Cohesity lab environment using Helios and thought I’d run through the basics. There’s a new document outlining the process on the articles page.

Google WiFi – A Few Notes

Like a lot of people who work in IT as their day job, the IT situation at my house is a bit of a mess. I think the real reason for this is because, once the working day is done, I don’t want to put any thought into doing this kind of stuff. As a result, like a lot of tech folk, I have way more devices and blinking lights in my house than I really need. And I’m always sure to pile on a good helping of technical debt any time I make any changes at home. It wouldn’t be any fun without random issues to deal with from time to time.

Some Background – Apple Airport

I’ve been running an Apple Airport Extreme and a number of Airport Express devices in my house for a while in a mesh network configuration. Our house is 2 storeys and it was too hard to wire up properly with Ethernet after we bought it. I liked the Apple devices primarily because of the easy-to-use interface (via browser or phone), and Airplay, in my mind at least, was a killer feature. So I’ve stuck with these things for some time, despite the frequent flakiness I experienced with the mesh network (I’d often end up connected to an isolated access point with no network access – a reboot of the base station seemed to fix this) and the sometimes frustrating lack of visibility into what was going on in the network.

Enter Google Wifi

I had some Frequent Flier points available that meant I could get a 3-pack of Google access points for under $200 AU (I think that’s about $15 in US currency). I’d already put up the Christmas tree, so I figured I could waste a few hours on re-doing the home network. I’m not going to do a full review of the Google Wifi solution, but if you’re interested in that kind of thing, Josh Odgers does a great job of that here. In short, it took me about an hour to place the three access points in the house and get everything connected. I have about 30 – 40 devices running, some of which are hardwired to a switch connected to my ISP’s NBN gateway, and most of which connect wirelessly. 

So What’s The Problem?

The problem was that I’d kind of just jammed the primary Google Wifi point into the network (attached to a dumb switch downstream of the modem). As a result, everything connecting wirelessly via the Google network ended up in the 192.168.86.x range, while all of my other devices were in the existing 10.x.x.x range. This wasn’t a massive problem, as the Google solution does a great job of routing stuff between the “wan” and “lan” subnets, but I started to notice that my pi-hole device wasn’t picking up hostnames properly, and some devices were getting confused about which DNS server to use. Oh, and my port mapping for Plex was a bit messed up too. I also had wired devices (i.e. my desktop machine) that couldn’t see Airplay devices on the wireless network without turning on Wifi.

The Solution?

After a lot of Googling, I found part of the solution via this Reddit thread. Basically, what I needed to do was follow a more structured topology, with my primary Google device hanging off my ISP’s switch (and connected via the “wan” port on the Google Wifi device). I then connected the “lan” port on the Google device to my downstream switch (the one with the pi-hole, NAS devices, and other stuff connected to it). 
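
In rough terms, the wired side of the network ended up looking something like this (the labels are mine, and the other Wifi points mesh off the primary wirelessly as before):

NBN gateway / modem
  → primary Google Wifi point ("wan" port in, "lan" port out)
    → downstream switch
      → pi-hole, NAS devices, and other wired gear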

Now the pi-hole could play nicely on the network, and I could point my devices to it as the DNS server via the Google interface. I also added a few more reservations into my existing list of hostnames on the pi-hole (instructions here) so that it could correctly identify any non-DHCP clients. I also changed the DHCP range on the Google Wifi to a single IP address (the one used by the pi-hole) and made sure that there was a reservation set for the pi-hole on the Google side of things. The reason for this (I think) is that you can’t disable DHCP on the Google Wifi device. To solve the Plex port mapping issue, I set a manual port mapping on my ISP modem and pointed it to the static IP address of the primary Google Wifi device. I then created a port mapping on the Google side of things to point to my Plex Media Server. It took a little while, but eventually everything started to work. 
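
A quick way to confirm the DNS piece is behaving after a change like this is to query the pi-hole directly from a client. The IP address and hostname below are made up for the purposes of the example:

# Ask the pi-hole (10.0.0.2 in this example) to resolve a local hostname
dan@desktop:~$ nslookup nas.lan 10.0.0.2

# Or use dig, which also shows which server answered
dan@desktop:~$ dig @10.0.0.2 nas.lan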

It’s also worth noting that I was able to reconfigure the Airport Express devices connected to speakers to join the new Wifi network and I can still use Airplay around the house as I did before.

Conclusion 

This seems like a lot of mucking about for what is meant to be a plug-and-play wireless solution. In Google’s defence though, my home network topology is a bit more fiddly than the average punter’s would be. If I wasn’t so in love with pi-hole, and didn’t have devices that I wanted to keep on static IP addresses and a particular DNS server, then I wouldn’t have had as many problems as I did with the setup. From a performance and usability standpoint, I think the Google solution is excellent. Of course, this might all go to hell in a handbasket when I ramp up IPv6 in the house, but for now it’s been working well. Couple that with the fact that my networking skills are pretty subpar, and we should all just be happy I was able to post this article on the Internet from my house.