The concept of domains has been with CLARiiON and (later) VNX arrays since the early part of the 21st Century. The configuration is fairly simple, and, in keeping with the idea that you can do anything with naviseccli, I thought I’d do a quick post on using naviseccli to join SPs to a domain. This assumes you have security set up in your naviseccli environment, and you know the IPs of the SPs you’re trying to add to the domain.
You can set the master node for a domain with this command. Note that the nominated node can’t be a member of another domain at the time.
naviseccli -h SPA-IP-Address domain -setmaster SPB-IP-Address
WARNING: You are about to set the following node as the master of the domain: SPB-IP-Address
Proceed? (y/n) y
If a node is a problem, or you’re about to remove an array from your environment, it’s a good idea to remove it from the domain before you rip it out of the rack.
naviseccli -h SPA-IP-Address domain -remove SPA-IP-Address
WARNING: You are about to remove the following node from the domain: SPA-IP-Address
Proceed? (y/n) y
You may also wish to add another couple of nodes, particularly if you have a number of arrays in the environment.
naviseccli -h SPB-IP-Address domain -add SPA-IP-Address
WARNING: You are about to add the following node to the domain: SPA-IP-Address
Proceed? (y/n) y
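If you have a few arrays to work through, it can help to generate the command lines rather than type them. The following is a minimal Python sketch, not anything shipped by EMC: the function name `domain_command` is made up for illustration, and it only assembles the three invocations shown above (`-setmaster`, `-remove`, `-add`) without running them. It assumes naviseccli is on your path and that security is already configured.

```python
import shlex

# Map of friendly action names to the domain switches used in this post.
DOMAIN_ACTIONS = {"setmaster": "-setmaster", "remove": "-remove", "add": "-add"}


def domain_command(target_sp, action, node):
    """Build (but do not run) a naviseccli domain command as an argument list.

    target_sp: the SP the command is sent to (the -h argument)
    action:    one of "setmaster", "remove", "add"
    node:      the SP address being set as master, removed, or added
    """
    if action not in DOMAIN_ACTIONS:
        raise ValueError(f"unsupported domain action: {action}")
    return ["naviseccli", "-h", target_sp, "domain", DOMAIN_ACTIONS[action], node]


if __name__ == "__main__":
    # Print the command so it can be reviewed before anyone runs it by hand.
    print(shlex.join(domain_command("SPB-IP-Address", "add", "SPA-IP-Address")))
```

Printing rather than executing is deliberate: each of these commands asks for confirmation, and that is a prompt you probably want a human to answer.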
Many moons ago I wrote a brief article about accessing RemotelyAnywhere on the CX4. This was prompted by changes in Release 29 of FLARE that changed the access mechanism for remote console access on the SPs. I’ve been working on some VNX2s recently (or VNX with MCx – as EMC really would like them to be known), and I was curious as to whether the process was the same.
Pretty much, yep.
Nowadays there are a few ways to access RemotelyAnywhere on the VNX SP. There are a few different ports on the array that can be used, depending on your circumstances. In some environments, where you’re not allowed to touch the customer’s network with your own gear, the service port may be more appropriate. Here’s an image of the ports from EMC. The model of VNX you’re using will dictate the layout of the ports.
You can go via:
the SP’s management port: http://<SP IP address>:9519;
the SP’s service port: https://128.221.1.250:9519 (SP A) or https://128.221.1.251:9519 (SP B); and
the SP’s serial port: https://192.168.1.1:9519 (this assumes you’re connected via serial already – more on that below).
This is fairly straightforward, and you’ll need to be on a network that has access to the management ports.
So, you’re probably already aware that the best way to connect to the service port is to set your laptop TCP/IP settings as follows:
IP Address – 128.221.1.249 or 128.221.1.254
Subnet Mask – 255.255.255.248
Default Gateway – leave blank
DNS server entries – leave blank
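A quick way to sanity-check laptop settings like these before you walk into the data centre is to confirm that the address you’ve picked sits in the same subnet as the port you’re plugging into. Here’s a small Python sketch using the standard `ipaddress` module. The function names are made up for illustration, and the 192.0.2.x addresses in the usage note are documentation placeholders, not the array’s real service addresses.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """Return True if addr_b falls inside the subnet defined by addr_a and netmask."""
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network


def usable_hosts(addr, netmask):
    """List the usable host addresses in the subnet that contains addr."""
    network = ipaddress.ip_network(f"{addr}/{netmask}", strict=False)
    return [str(host) for host in network.hosts()]
```

For example, `same_subnet("192.0.2.9", "192.0.2.10", "255.255.255.248")` is true, while swapping the second address for `192.0.2.17` gives false. A 255.255.255.248 mask only leaves six usable addresses, so there isn’t much room to get it wrong.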
If you want to connect via the serial cable, you’ll need to set up a PPP connection on your laptop. The following steps assume that you’ve got a USB to serial adapter and you’re using a Windows 7 machine.
Right Click on your Computer icon and Select Manage
Click on Device Manager
Expand Ports (COM & LPT)
Look for the USB-to-Serial Comm Port (COM##)
The COM number shown here is the one you will select when configuring your PPP connection.
Create the COM Port
Click Start -> Control Panel -> Phone and Modem.
Click the Modems tab and click Add.
On the Install new Modem Pane, select the Don’t detect my modem box, then click Next.
Select Communications cable between two computers, then click Next.
Select the COM port from the previous step, then click Next.
Highlight the new modem and click Properties.
Select the Modem tab and adjust the maximum speed to 115200, then click OK.
Click OK again to exit the Phone and Modem screen.
In the Computer Management window, disable and then re-enable the USB Serial connection in Device Manager. Do this by right-clicking on it.
Setting Up the PPP Connection
Click Start -> Control Panel -> Network and Sharing Center, click Set up a new connection or network (at the bottom).
Click Next, select Set up a dial-up connection and click Next.
This screen should list the available modems. Select the Communications cable between two computers entry created above.
On the next screen put in a random phone number. This is required in order to complete this step. You need at least one digit, but you’ll remove it later. Next put in the username and password and give it a name. Then click on Connect. This connection will fail displaying: Connection Failed with error 777. Click on Set up the connection anyway.
You will get: “The connection to the Internet is ready to use”. Select Close.
The above connection should now appear in Network Connections. Open Control Panel and select Change Adapter Settings.
Right-click on your new Modem connection and select Cancel as Default Connection.
Right Click on your new Modem connection again and select Properties.
Modify the Settings
In the General tab, remove the phone number entry and leave it blank.
In the General tab, click Configure, set the maximum speed to 115200 and select Enable hardware flow control.
In the Options tab, click PPP settings and check that the top two boxes are selected (LCP extensions and SW compression).
In the Security tab, check that Data Encryption is set to Optional Encryption (connect even if no encryption).
In the Networking tab, check Internet Protocol Version 4 is selected and click on Properties.
In the Networking tab, choose Internet Protocol Properties, then the Advanced button. Uncheck Use default gateway on remote network.
Here’s what it looks like when you log in – enjoy.
I had a question about this come up this week and thought I’d already posted something about it. Seems I was lazy and didn’t. If you have access, have a look at Primus article emc311319 on EMC’s support site. If you don’t, here’s the rough guide to what it’s all about.
When a Storage Pool is created, a large number of private LUNs are bound on all the Pool drives and these are divided up between SP A and SP B. When a Pool LUN is created, it uses the allocation owner to determine which SP’s private LUNs should be used to store the Pool LUN slices. If the default and current owner are not the same as the allocation owner, the I/O will have to be passed over the CMI bus between the SPs to reach the Pool Private FLARE LUNs. This is a bad thing, and can lead to higher response times and general I/O bottlenecks.
OMG, I might have this issue, what should I do? You can change the default owner of a LUN by accessing the LUN properties in Unisphere. You can also change the default owner of a LUN thusly.
naviseccli -h <SP A or B> chglun -l <metalun> -d owner <0|1>
-d owner 0 = SP A
-d owner 1 = SP B
But what if you have too many LUNs where the allocation owner sits on one SP? And when did I start writing blog posts in the form of a series of questions? I don’t know the answer to the latter question. But for the first, the simplest remedy is to create a LUN on the alternate SP and use EMC’s LUN migration tool to get the LUN to the other SP. Finally, to match the current owner of a LUN to the default owner, simply trespass the LUN to the default owner SP.
Note that this is a problem from CX4 arrays through to VNX2 arrays. It does not apply to traditional RAID Group FLARE LUNs though, only Pool LUNs.
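To make the ownership rules a bit more concrete, here’s a rough Python sketch of the decision logic described in this post. It’s an illustration only: the `PoolLun` class and `ownership_advice` function are hypothetical, it doesn’t talk to an array, and you’d need to fill in the three owner values yourself from Unisphere or naviseccli output.

```python
from dataclasses import dataclass


@dataclass
class PoolLun:
    name: str
    allocation_owner: str  # "A" or "B"; fixed when the LUN is created
    default_owner: str     # "A" or "B"; can be changed
    current_owner: str     # "A" or "B"; changes when the LUN is trespassed


def ownership_advice(lun):
    """Suggest a remedy based on the three owner values of a pool LUN."""
    if lun.default_owner != lun.allocation_owner:
        # The allocation owner cannot be changed, so either set the default
        # owner back to match it or migrate to a new LUN on the other SP.
        return "non-optimal: align default owner with allocation owner, or migrate to a new LUN"
    if lun.current_owner != lun.default_owner:
        # Default and allocation owner agree; only the current owner is off.
        return "non-optimal: trespass the LUN back to its default owner"
    return "optimal"
```

So a LUN with allocation owner A, default owner A and current owner B just needs a trespass, while a LUN whose default owner is B but whose allocation owner is A is a candidate for a default owner change or a LUN migration.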
If you’re trying to do an OE upgrade on a VNX you might get the following error after you’ve run through the “Prepare for Installation” phase.
Turns out you just need to upgrade USM to the latest version. You can do this manually or via USM. Further information on this error can be found on support.emc.com by searching for the following Primus ID: emc321171.
Incidentally, I’d just like to congratulate EMC on how much simpler it is to upgrade FLARE / VNX OE nowadays than it was when I first started on FC and CX arrays. Sooo much nicer …
Mat’s been doing some useful scripting again. This time it’s a small PERL script that identifies the allocation owner and default owner of a pool LUN on a CX4 or VNX and lets you know whether the LUN is “non-optimal” or not. For those of you playing along at home, I found the following information on this (but can’t remember where I found it). “Allocation owner of a pool LUN is the SP that owns and maintains the metadata for that LUN. It is not advised to trespass the LUNs to an SP that is not the allocation owner. This introduces lag. The SP that provides the best performance for the pool LUN. The allocation owner SP is set by the system to match the default SP owner when you create the LUN. You cannot change the allocation owner after the LUN is created. If you change the default owner for the LUN, the software will display a warning that a performance penalty will occur if you continue.”
There’s a useful article by Jithin Nadukandathil on the ECN site, as well as a most excellent writeup by fellow EMC Elect member Jon Klaus here. In short, if you identify NonOptimal LUN ownership, your best option is to create a new LUN and migrate the data to that LUN via the LUN Migration tool. You can download a copy of the script here. Feel free to look at the other scripts that are on offer as well. Here’s what the output looks like.
Disclaimer: I don’t work for Stormons, and I’ve not been compensated for this post. I just think it’s a cool product that is worth checking out.
Didier from Stormons recently got in touch to let me know there’s now a Professional version of the software available as part of a subscription deal. I’ve previously covered Stormons here and here and think it’s pretty good stuff – and definitely worth checking out – particularly if you have a large environment to work with. Apparently EMC in Bangalore are heavy users of the product as well. The Professional Edition is offered on a subscription basis, and they’re running a discounted rate until May to celebrate the release. Find out more about it here. You can also still access the free edition from the downloads page.
Mat has come up with a few new scripts – FlipFASTTiering and ReplicationCapacity. They’re PERL scripts that you can use to list / modify FAST Tiering scheduling and report on MirrorView replication data respectively. Hopefully you’ll find them of some use. Further information can be found on the Utilities page.
EMC have also breathlessly announced the introduction of “Multi-Core Everything” or MCx as an Operating Environment replacement for FLARE. I thought I’d spend a little time going through what that means, based on information I’ve been provided by EMC.
MCx is a redesign of a number of components, providing functionality and performance improvements for:
Multi-Core Cache (MCC);
Multi-Core RAID (MCR);
Multi-Core FAST Cache (MCF); and
Active / Active data access.
As I mentioned in my introductory post, the OE has been redesigned to spread the operations across each core – providing linear scaling across cores. Given that FLARE pre-dated multi-core x86 CPUs, it seems like this has been a reasonable thing to implement.
EMC are also suggesting that this has enabled them to scale out within the dual-node architecture, providing increased performance, at scale. This is a response to a number of competitors who have suggested that the way to scale out in the mid-range is to increase the number of nodes. How well this scales in real-world scenarios remains to be seen.
With MCC, the cache engine has been modularized to take advantage of all the cores available in the system. There is now also no requirement to manually separate space for Read and Write Cache, meaning no management overhead in ensuring the cache is working in the most effective way regardless of the IO mix. Interestingly, cache is dynamically assigned, allocating space on-the-fly for reads, writes, thin metadata, etc. depending on the needs of the system at the time.
The larger overall working cache provides mutual benefit between reads and writes. Note also that data is not discarded from cache after a write destage (this greatly improves cache hits). The new caching model employs intelligent monitoring of the pace of disk writes to avoid forced flushing, and works on a standardized 8KB physical page size.
MCR introduces some much-needed changes to the historically clumsy way in which disks were managed in the CLARiiON / VNX. Of particular note is the handling of hot spares. Instead of having to define disks as permanent hot spares, the sparing function is now more flexible and any unassigned drive in the system can operate as a hot spare.
There are three policies for hot spares:
Recommended;
Custom; and
No hot spares.
The “Recommended” policy implements the same sparing model as today (1 spare per 30 drives). Note, however, that if there is an unused drive that can be used as a spare (even if you have the “No Hot Spare” policy set) and there is a fault in a RAID group, the system will use that drive. Permanent sparing also means that when a drive has been used in a sparing function, that drive is then a permanent part of the RAID Group or Pool so there is no need to re-balance and copy back the spare drive to a new drive. The cool thing about this is that it reduces the exposure and performance overhead of sparing operations. If the concept of storage pools that spread all over the place didn’t freak out the RAID Group huggers in your storage team, then the idea of spares becoming permanent might just do it. There will be a CLI command available to copy the spare back to the replaced drive, so don’t freak out too much.
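If you want a rough idea of how many unassigned drives that ratio implies, the arithmetic is simple enough to sketch in Python. Treat this as a back-of-the-envelope calculation rather than EMC’s actual algorithm: the function name is made up, and rounding up for a partial group of 30 is my assumption.

```python
import math


def recommended_spares(drive_count, drives_per_spare=30):
    """Estimate spares needed at one spare per `drives_per_spare` drives.

    Rounding up for a partial group is an assumption, not documented behaviour.
    """
    if drive_count < 0:
        raise ValueError("drive_count must not be negative")
    return math.ceil(drive_count / drives_per_spare)
```

Under that assumption, 30 drives need one spare and 31 drives need two.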
What I like the look of, having been stuck “optimising” CLARiiONs previously, is the idea of “Portable drives”, where drives can be physically removed from a RAID group and relocated to another disk shelf. Please note that you can’t use this feature to migrate RAID Groups to other VNX systems; you’ll need to use more traditional migration methods to achieve this.
Multi-Core FAST Cache
According to EMC, MCF is all about increased efficiency and performance. They’ve done this by moving MCC above the FAST Cache driver – all DRAM cache hits are processed without the need to check whether a block resides in FAST Cache, saving this CPU cycle overhead on all IOs. There’s also a faster initial warm-up for FAST Cache, resulting in better initial performance. Finally, instead of requiring 3 hits, if the FAST Cache is not full, it will perform more like a standard extension of cache and load data based on a single hit. Once the Cache is 80% full it reverts to the default 3-hit promotion policy.
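The promotion behaviour described above boils down to a threshold check. Here’s a minimal Python sketch of that rule as I understand it from EMC’s description. The function names are hypothetical and this is not how the FAST Cache driver is actually written; it just encodes “one hit while under 80% full, three hits after that”.

```python
def hits_required(used_fraction, threshold=0.80, default_hits=3):
    """Return how many hits a block needs before promotion to FAST Cache.

    used_fraction is how full the FAST Cache is, from 0.0 to 1.0.
    """
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must be between 0.0 and 1.0")
    # Below the threshold, behave like an extension of cache: a single hit.
    return 1 if used_fraction < threshold else default_hits


def should_promote(hit_count, used_fraction):
    """Decide whether a block with hit_count hits should be promoted."""
    return hit_count >= hits_required(used_fraction)
```

With the cache half full, `should_promote(1, 0.5)` is true; once it reaches 80%, a block needs three hits, so `should_promote(2, 0.8)` is false.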
Symmetric Active / Active
Real symmetric? Not that busted-arse ALUA? Sounds good to me. With Symmetric Active/Active, EMC are delivering true concurrent access via both SPs (no trespassing required – hooray!). However, this is currently only supported for Classic LUNs. It does have benefits for Pool LUNs by removing the performance impacts of trespassing the allocation owner. Basically, there’s a stripe locking service in play that’s providing access to the LUN for each SP, with the CMI being used for communication between the SPs. Here’s what LUN Parallel Access looks like.
And that’s about it. Make no mistake, MCx sounds like it should really improve performance for EMC’s mid-range. Remember though, given some of the fundamental changes outlined here, it’s unlikely that this will work on previous-generation VNX models.