How to script on Spectrum Virtualize – tips and tricks – Barry Whyte and Andrew Martin : IBM Storage

ORIGINALLY PUBLISHED 11th November 2017

Hi everyone,

I have been asked to write up some hints and tips about how to do scripting on SVC / Storwize / Spectrum Virtualize. I do a lot of scripting against our products – sometimes scripts that get published on developerworks (e.g. the Standby Node Script) and sometimes smaller scripts to help clients with specific issues.

There are a few basic things that we need to understand when doing scripting. Hopefully the tips below make some sense to you and will prove helpful.

Finding the commands that you need to run

How to do error checking

Where to run the script

Useful reference pages

Edit 29 May 2017: Getting vdisk properties efficiently

Finding the commands that you need to run

Arguably, finding the “right” command can be the hardest thing here. Our command line was designed in 2003, and the CLI often wasn’t changed when things got renamed. For example, the GUI and documentation all talk about Volumes, but the CLI calls them vdisks. So sometimes the right command can be a little hard to find. However, the commands are normally really well structured.

Tip: Let the GUI tell you what command to run to achieve your objective

Whenever you perform a task in the GUI, you will always see an action box that has a “view more details” twisty on it. If you click on that twisty you can see all of the action commands that were run to achieve your objective. This is not always enough to reproduce a full script, but it will give you a good start.

Tip: The help pages for the CLI commands are normally fairly complete and helpful, after you’ve read the first few and understand the formatting.

You can view these on the command line with “<command> -h” for the syntax (if you’ve forgotten whether the flag was -iogrp or -io_grp) or “help command” for the full detailed help. You can also find these commands in the knowledge center. They are in the “Command-line interface” section of the KC, and they are helpfully grouped into sections that will give you a leg up when trying to find a command when you don’t know what it’s called. For example, there is a subsection called “Drive commands” that gives you all of the commands related to drives.

When to do error checking

This part is really important, and often overlooked!

If you are just creating a one-time use script that you are going to use to create 50 volumes, then you don’t need to do any error checking – apart from watching the screen.

However if you are writing a long-running script, or a script that has multiple dependent steps (e.g. creating 50 volumes and mapping them to hosts then creating RC relationships) you probably want some form of error checking and retry mechanism.

It is important to know that SVC/Storwize has a lot of rate limiting on SSH connections when running command lines. These limits are there to try to prevent a flood of SSH commands overloading the configuration node and causing a performance impact (or worse). However, it does mean that if you have a lot of things using the CLI then your script is likely to have one or more of its SSH connections fail and need retrying. If you have 2 Spectrum Control servers or a lot of scripts, you could find that SSH connections fail much more often.

SVC/Storwize also has a rate limit on the number of pings per second per physical adapter. So ping tests dropping frames often just means that there are too many pings being sent to the adapter, rather than a real problem.

How to do error checking

You could in theory work out all of the possible error codes and handle them specifically. In practice this is nearly impossible. So my approach is much simpler – write a little module that does retries for you and validates that the result matches what you expect. If the result is unexpected, then retry.

For example – when you make a vdisk – you expect the result to be something like this:

 Virtual Disk, id [0], successfully created

Or if you are running an svcinfo command, almost all of them start with

id name ...

If the output is anything else then something went wrong. I will normally do a limited number of retries if the result isn’t what I expect and then kill the script.

So here’s some rough pseudo-code to do just that

function svccommand_wrapper (svc_command, expected_result, maximum_retries) {

    ssh_command = "ssh myuser@mycluster.domain.com " + svc_command + " 2>&1"

    counter = 0
    success = 0
    return_string = ""

    while ( counter < maximum_retries AND success not equal to 1) {

        return_string = execute_command (ssh_command)

        if (return_string matches expected_result) {
            success = 1
        } else {
            sleep (counter * 10 seconds)
        }
        counter = counter + 1
    }

    if (success equal 0) {

        exit script with an error because we’ve retried a number of times without success
    }

    return return_string
}

There are more subtleties to it than just this. For example, checking whether the return_string matches the expected_result is not trivial unless you understand Regular Expressions (RegExps). I use regular expressions in nearly all of my scripts, so if you don’t understand them yet I really recommend that you learn them, because they are incredibly powerful. Having said that, I have a vague recollection of trying to learn them <mumble> years ago, and they can be slightly overwhelming the first time you see them. If anyone has a good reference for learning RegExps then please put it in the comments below. I guess that should be a tip…

Tip: Learn Regular Expressions

Now that I’ve made a horrible job of trying to write pseudo-code, here’s a real function that I used in a recent script I was using with a customer (although this one doesn’t have the increasing sleep times that I recommended above). This is written in perl – which is my preferred scripting language:

sub svcinfo_wrapper {

    my ($cmd, $success_regexp, $loopcount, $give_up_regexp) = @_;
    my @cli_output;

    for my $attempt (1 .. $loopcount) {
        #only sleep before a retry, to avoid sleeping on the first attempt
        sleep 5 if $attempt > 1;

        #merge STDERR into STDOUT so that error text reaches the checks below
        @cli_output = `$cmd 2>&1`;

        if (scalar (@cli_output) >0 && $cli_output[0] =~ $success_regexp) {
            return @cli_output;
        } elsif (defined $give_up_regexp && scalar (@cli_output) >0 && $cli_output[0] =~ $give_up_regexp) {
            #known fatal response - flag it for the caller instead of retrying
            #($SVCINFO_GIVING_UP_MESSAGE is defined elsewhere in the script)
            unshift @cli_output, $SVCINFO_GIVING_UP_MESSAGE;
            return @cli_output;
        }

        my $firstline = (defined $cli_output[0])?$cli_output[0]:"";
        print "\n\tFailed - retrying ($firstline)";
    }

    die "Too many attempts to run $cmd have failed";
}

Tip: when running an ssh command you often want to get the error output as well as the normal output back into your logic. However, linux will normally send normal output on a channel known as STDOUT and error output on a channel known as STDERR. It is possible to be clever and get these pieces of data separately, but often it’s simpler to merge the two outputs together. The slightly magic looking “2>&1” means redirect STDERR to STDOUT, which merges the two outputs together. This is quite a useful thing to know.
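If you prefer the shell to perl, the pseudo-code above could be sketched roughly as follows, including the increasing sleep times. This is only a sketch: run_with_retry and SLEEP_STEP are names made up for this example, and it checks just the first line of output against an extended regular expression.

```shell
# Hypothetical helper: run_with_retry <max_attempts> <expected_regex> <command...>
# SLEEP_STEP is an assumed tunable: extra seconds to wait per failed attempt
SLEEP_STEP=${SLEEP_STEP:-10}

run_with_retry() {
    max_attempts=$1
    expected=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Merge STDERR into STDOUT so that error text is checked as well
        output=$("$@" 2>&1) || true
        first_line=$(printf '%s\n' "$output" | head -n 1)
        # Success means the first line matches the expected pattern
        if printf '%s\n' "$first_line" | grep -Eq -- "$expected"; then
            printf '%s\n' "$output"
            return 0
        fi
        echo "Attempt $attempt failed: $first_line" >&2
        # Wait a little longer after each failure, but not after the last one
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep $(( attempt * SLEEP_STEP ))
        fi
        attempt=$(( attempt + 1 ))
    done
    return 1
}
```

A call might look like: run_with_retry 5 '^id:name' ssh myuser@mysvccluster.domain.com svcinfo lsvdisk -delim : || exit 1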

Where to run the script

Some scripts are really simple and you can just run them on the command line of SVC these days (as long as you are running 7.5 or higher).

But most of the time you will need to run on a server somewhere that has network access to the SVC/Storwize machine. In this case you will need to create a login user that has an SSH public/private key pair and use that for authentication. It is almost impossible to write a script that uses SSH and a password.

If you don’t know about SSH keys – The “Secure Shell” subsection of the “Command-line interface” section in the knowledge center has a lot of useful information, and I’m sure you can google for even more information.

Important: You can configure a password on your SSH private key for additional security, but if you do this then you will also need to learn about either ssh-agent (linux) or Pageant (the PuTTY equivalent on Windows) so that your computer can cache a copy of the private key that can be used without a password.

Tip: The very first time you ssh to a server, you will get a prompt asking you to validate that the SSH fingerprint is correct. After that first attempt it shouldn’t ask you again. I’ve never found a reliable way of handling that behavior in a script – so you can resolve it one of two ways:

Manually SSH into the target machine before running your script for the first time
You can add -oStrictHostKeyChecking=no into the SSH command line to bypass the fingerprint checking. But this means that you disable the Man-in-the-middle protection that is included in SSH.

Also remember to log all output from your SSH commands somewhere so when you find a script that just seems to be hanging because of this issue, you will quickly be able to work out what is happening.
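One way to make a hanging prompt visible is to have ssh fail instead of waiting for input, and to log every call. The sketch below assumes an OpenSSH client. BatchMode and ConnectTimeout are standard OpenSSH options, but svc_ssh, SVC_USER, SVC_HOST and SVC_LOG are names invented for this example.

```shell
# Assumed settings for this sketch; adjust for your environment
SVC_USER=${SVC_USER:-myuser}
SVC_HOST=${SVC_HOST:-mysvccluster.domain.com}
SVC_LOG=${SVC_LOG:-./svc_script.log}

# Hypothetical wrapper: svc_ssh <command...>
svc_ssh() {
    status=0
    # BatchMode=yes makes ssh fail rather than wait at a prompt
    # (password, key passphrase, or unknown host key confirmation)
    output=$(ssh -o BatchMode=yes -o ConnectTimeout=15 \
        "$SVC_USER@$SVC_HOST" "$@" 2>&1) || status=$?
    # Record the command, its merged output, and the exit status
    {
        echo "$(date '+%Y-%m-%d %H:%M:%S') command: $*"
        printf '%s\n' "$output"
        echo "exit status: $status"
    } >> "$SVC_LOG"
    printf '%s\n' "$output"
    return "$status"
}
```

With this in place, a first run against a machine whose fingerprint has not been accepted should show up in the log as a failure rather than as a script that sits there forever.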

Scripting with SSH

SSH makes scripting really easy. If you would normally just run “lsvdisk -delim :” on the SVC command line, then you can run that same command via SSH with the command below. It will simply run “lsvdisk -delim :” on the machine called mysvccluster.domain.com after logging in with the user name myuser.

(Linux/Unix) ssh myuser@mysvccluster.domain.com lsvdisk -delim :

(Windows) plink myuser@mysvccluster.domain.com lsvdisk -delim :

It’s as simple as that, as long as you have set up your SSH key and stored it in the *default* location. If you’ve got your SSH key in a non-default location you’ll need to tell SSH where it is with the -i flag.

ssh -i /path/to/ssh/private/key myuser@mysvccluster.domain.com 'lsvdisk'

Note: If you have more complicated commands that include semicolons or ‘&’ or any other special bash characters you will need to learn about quoting and escaping special characters. However, if you stick to a single command per ssh command you should be OK.

IMPORTANT TIP: Be very careful with loops. You could write a script that loops forever and sends svcinfo lsnodestats to an SVC cluster with no sleeps between the commands. This will work – however it will put a large amount of load onto the SVC. If you do this you will probably cause a performance problem for your production applications. So if you are running any loop you need to put sleeps in there – especially if your loop will run more than 20 or 30 commands.
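As a rough illustration of a loop that paces itself, here is a small helper that runs a command a fixed number of times with a pause between runs. The name poll_with_sleep and its argument order are invented for this sketch, and the right interval depends on your system.

```shell
# Hypothetical helper: poll_with_sleep <interval_seconds> <iterations> <command...>
poll_with_sleep() {
    interval=$1
    iterations=$2
    shift 2
    i=1
    while [ "$i" -le "$iterations" ]; do
        "$@"
        # Pause between commands so the configuration node is not flooded
        if [ "$i" -lt "$iterations" ]; then
            sleep "$interval"
        fi
        i=$(( i + 1 ))
    done
}
```

For example: poll_with_sleep 30 20 ssh myuser@mysvccluster.domain.com svcinfo lsnodestats -delim :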

Output and Logging

When you are doing simple scripts for yourself, then this doesn’t matter. But if you have a long-running script or you have a script that is going to be run by other people this becomes VERY important.

Tip: Think about what is the right level of information to print to the user on the screen. Too much is often just as bad as too little.

Tip: Whatever print statements you consider “debug level” you should log to a file.

I’ve had a few scripts that I’ve given to other people that haven’t quite worked properly, and I’ve had to ask them to run the script again with -debug so I could work out what had gone wrong. As a result of that I normally write debug output to a file every time. That way, if I need to know what’s going wrong, I have the details available without them having to run the script again.

I often make little dummy functions called “debug, output and fault” that just print the outputs to the places that I need them to go (file only, screen and file, screen file and exit the program respectively). I know I could use more complex logging modules like Log4Perl (as an example) – but normally the simple solution is good enough.
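In shell, those three little functions might look something like the sketch below. LOG_FILE and log_line are names made up for this example, and the timestamp format is just one reasonable choice.

```shell
LOG_FILE=${LOG_FILE:-./myscript_debug.log}   # assumed log file location

# Append one timestamped line to the log file
log_line() {
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" >> "$LOG_FILE"
}

# debug: file only
debug() { log_line DEBUG "$*"; }

# output: screen and file
output() { printf '%s\n' "$*"; log_line INFO "$*"; }

# fault: screen, file, then exit the program
fault() { printf 'ERROR: %s\n' "$*" >&2; log_line ERROR "$*"; exit 1; }
```

Because debug always writes to the file, there is no -debug switch to forget: the detail is there whenever someone sends you the log.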

General Tips

Tip: Always use svcinfo and svctask in your script so that it will work on all products.

There are some subtle differences between the CLIs on the different products, like the fact that on SVC you would run lsnode and on Storwize the same command is lsnodecanister.

To make your life simple, if you always put “svcinfo” (for commands that start with ls) or “svctask” (for commands that change the configuration or state) at the beginning of your commands then if they run on one product, they will run on all products!

Tip: Use the -delim flag on ls<command> commands. This allows you to make it MUCH easier to parse output from the command. It puts a single character between each field, rather than an arbitrary number of spaces. If you try and make a computer read an output that is space delimited, you will be surprised how hard it gets quite quickly.

I normally use “-delim :” because that works well for most commands. However, it doesn’t work very well for any commands whose output includes IPv6 addresses, because IPv6 addresses contain a lot of colons.

Example:

      svcinfo lsmdiskgrp -delim :
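To show why a single delimiter character helps, here is a small sketch that splits each line on the colon. It assumes the first three columns of the lsmdiskgrp summary view are id, name and status (check the header on your own system), and print_pool_status is a name invented for this example.

```shell
# Hypothetical helper: reads "lsmdiskgrp -delim :" style output on stdin
# and prints "<name> is <status>" for each data row.
print_pool_status() {
    # IFS=: splits on the delimiter; "rest" soaks up the remaining columns
    while IFS=: read -r id name status rest; do
        # Skip the header row, whose first field is the literal text "id"
        [ "$id" = "id" ] && continue
        printf '%s is %s\n' "$name" "$status"
    done
}
```

Usage would be along the lines of: ssh myuser@mysvccluster.domain.com svcinfo lsmdiskgrp -delim : | print_pool_status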

Tip: The command line flag -nohdr can be a useful flag when running ls<command> commands – it doesn’t show the header line.

This means you can bypass the code you need to “skip” the first line. I’ve had a number of silly bugs caused by me forgetting to skip that first line of output. However – I’ve since found that the header line is a really good way of validating that the command has worked (see the error checking section earlier) – so I don’t use it as much any more.
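A sketch of using the header line as the validity check, and only then throwing it away, could look like this. strip_header is a made-up name, and the expected prefix is whatever header text you know the command should start with.

```shell
# Hypothetical helper: strip_header <expected_prefix>
# Reads command output on stdin. If the first line starts with the expected
# header text, the remaining lines are printed. Otherwise everything is sent
# to STDERR and the function returns 1.
strip_header() {
    expected_prefix=$1
    header=""
    IFS= read -r header || true
    case $header in
        "$expected_prefix"*)
            # Header looks right: pass the data rows through
            cat
            ;;
        *)
            echo "Unexpected output: $header" >&2
            cat >&2
            return 1
            ;;
    esac
}
```

For example: ssh myuser@mysvccluster.domain.com svcinfo lsvdisk -delim : | strip_header 'id:name'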

Some Examples

So that’s a lot of background and tips – let’s try some examples.

Example 1: Create 10 100GB vdisks, name them db2_data_<index> and map them to a host called “db2server”

This is a nice simple one we can actually do on the SVC/Storwize command line, without using SSH. This is using the bash command line. I simply ssh onto the SVC/Storwize once and either type or paste the script below.

Here’s the script with indentation to make it easier to read

for count in `seq 1 10`
do
    svctask mkvdisk -name db2_data_$count -iogrp 0 -mdiskgrp 0 -size 100 -unit GB
    svctask mkvdiskhostmap -host db2server db2_data_$count
done

When you put this on the command line you should be able to type or paste it in with the new lines – but here’s a single line version that can be pasted with a bit less of a problem

for count in `seq 1 10` ; do svctask mkvdisk -name db2_data_$count -iogrp 0 -mdiskgrp 0 -size 100 -unit GB;  svctask mkvdiskhostmap -host db2server db2_data_$count; done;

Tip: seq is a very useful program when you want to run a loop a number of times. It generates a list of numbers for you. There are a number of different things you can do with it. For example if I wanted to make vdisks named db2_data_31 to db2_data_40 I would use “seq 31 40” instead.

The for loop will set the value of $count to 1, then 2,3,4,5,6,7,8,9 then finally 10. For each value of $count it will run the code between the “do” and “done” labels.

This script will run the following commands

svctask mkvdisk -name db2_data_1 -iogrp 0 -mdiskgrp 0 -size 100 -unit GB;

svctask mkvdiskhostmap -host db2server db2_data_1;

svctask mkvdisk -name db2_data_2 -iogrp 0 -mdiskgrp 0 -size 100 -unit GB;

svctask mkvdiskhostmap -host db2server db2_data_2;

...

svctask mkvdisk -name db2_data_10 -iogrp 0 -mdiskgrp 0 -size 100 -unit GB;

svctask mkvdiskhostmap -host db2server db2_data_10;

Example 1A: The same as example 1 – but running it on a server rather than on the SVC/Storwize command line

Here’s exactly the same script – but running on a linux server in a bash shell, rather than on the SVC/Storwize

for count in `seq 1 10` ;
do
    ssh myuser@mysvccluster.domain.com svctask mkvdisk -name db2_data_$count -iogrp 0 -mdiskgrp 0 -size 100 -unit GB;
    ssh myuser@mysvccluster.domain.com svctask mkvdiskhostmap -host db2server db2_data_$count;
done;
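Since the mapping step depends on the vdisk having been created, a slightly more defensive version could stop at the first unexpected result. The sketch below assumes that both commands report success with text containing “successfully created”, as in the mkvdisk output shown earlier. The names svc_cmd, create_and_map and SVC_TARGET are invented for this example.

```shell
SVC_TARGET=${SVC_TARGET:-myuser@mysvccluster.domain.com}   # assumed login

# Thin wrapper so that the transport can be swapped out or tested
svc_cmd() { ssh "$SVC_TARGET" "$@" 2>&1; }

# Hypothetical helper: create_and_map <first_index> <last_index> <host>
create_and_map() {
    first=$1
    last=$2
    host=$3
    for count in $(seq "$first" "$last"); do
        name="db2_data_$count"
        result=$(svc_cmd svctask mkvdisk -name "$name" -iogrp 0 -mdiskgrp 0 -size 100 -unit GB) || true
        case $result in
            *"successfully created"*) ;;
            *) echo "mkvdisk $name failed: $result" >&2; return 1 ;;
        esac
        result=$(svc_cmd svctask mkvdiskhostmap -host "$host" "$name") || true
        case $result in
            *"successfully created"*) ;;
            *) echo "mkvdiskhostmap $name failed: $result" >&2; return 1 ;;
        esac
    done
}
```

Calling create_and_map 1 10 db2server would then do the same work as the loop above, but give up as soon as one step does not report success.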

Example 2: Restarting any stopped Remote Copy relationships every 10 minutes

I’m afraid I can’t do this in pseudo-code, so I’ll have to do it in perl instead… I’m not going to do error checking here for readability but ideally a script that is starting to get a bit more complicated should probably be doing error checking.

#!/usr/bin/perl

use strict;

use warnings;

my $ssh_cmd_prefix = 'ssh myuser@mysvccluster.domain.com';

while (1) {

       #Look for any stopped rc consistency groups

       my @stopped_cgs = `$ssh_cmd_prefix svcinfo lsrcconsistgrp  -delim :`;

       #output fields for lsrcconsistgrp
       #id:name:master_cluster_id:master_cluster_name:aux_cluster_id:aux_cluster_name:
       #primary:state:relationship_count:copy_type:cycling_mode:freeze_time

       foreach my $line (@stopped_cgs) {

             if ($line =~ /^id:/) {

                    #This is the header line - skip it

             } else {

                    my @parts = split (/:/, $line);

                    #We can see from the output fields above that state is the 8th field

                    # and name is the 2nd field - so in the array world that becomes 7 and 1

                    my $rccg_state = $parts[7];

                    my $rccg_name = $parts[1];

                    print "RCCG $rccg_name - in state $rccg_state\n";

                    #Only restart if in consistent_stopped or inconsistent_stopped

                    # NOT Idling - because if we are in idling then this might be a real DR

                    if ($rccg_state =~ /_stopped$/) {

                           print "Starting RCCG $rccg_name\n";

                           my $output = `$ssh_cmd_prefix svctask startrcconsistgrp $rccg_name 2>&1`;

                           print "Result was $output\n\n";

                    } else {

                           #Do Nothing

                    }

             }

       }

       #Sleep 10 minutes

       sleep 10*60;

}

Note: This is valid perl code – but I confess that I didn’t actually test it, so there might be some subtle bugs in there.
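For comparison, the filtering step of that script can be sketched in shell with awk. It relies on the field order listed in the perl comments (name is the 2nd field, state is the 8th), and stopped_rccg_names is a name made up for this example.

```shell
# Hypothetical helper: reads "lsrcconsistgrp -delim :" output on stdin and
# prints the name of every group whose state ends in "_stopped".
stopped_rccg_names() {
    # Skip the header row (first field "id"), then test the state field.
    # Idling groups are deliberately not matched.
    awk -F: '$1 != "id" && $8 ~ /_stopped$/ { print $2 }'
}
```

The output could then feed a loop that runs svctask startrcconsistgrp for each name, with a sleep between commands.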

Useful reference pages:

Syntax for the basic scripting languages such as bash – https://ss64.com/

When using perl I always google for “perldoc <thing I want to do>”. This search seems to find (in this order) perl manual pages, CPAN module pages and if those don’t work it will find the wisdom of the perl monks.

I hope this was useful. Please feel free to add your own suggestions and tips below in the comments.

Edit 29 May 2017: Getting vdisk properties efficiently

I have been at IBM TechU in Florida the last week. During the conference I met up with someone who had already done a lot of scripting with SVC, and I gave him a tip that made him want to throttle me and thank me at the same time. Since it seemed like a good tip I thought I would share it here.

If you want to get all of the properties for a lot of volumes, you normally have to do 10s or 100s of lsvdisk commands.

Well – if you want to be a bit more efficient you can get almost everything you want to know about every vdisk in the system with 2 or 3 commands:

1/ lsvdisk -gui

2/ lssevdiskcopy

3/ lsvdiskcopy

The -gui flag is an undocumented flag on almost all “ls” commands that adds many if not all of the detailed vdisk properties to the “summary” output. This can make writing scripts MUCH more useful. However this is an internal-use-only view so we are at liberty to add/remove columns at any time (unlike the normal views where we try really hard not to remove or re-order properties to avoid breaking scripts).

So lsvdisk -gui should give you most of the properties that appear before the first “copy” line in the detailed lsvdisk output.

lssevdiskcopy gives you details about copies 0 and/or 1 if those copies are thin or compressed. Most importantly that is where you get the used, and real capacities from.

If you have any non-thin copies and you want some properties for those you can use lsvdiskcopy (I don’t think you’ll ever need this).

You should be able to get everything you need from these three “summary” commands without running any detailed commands. So that gets you down from up to 10,000 detailed-view lsvdisk commands to 2 or 3 commands in total – that seems like a probably useful tip 🙂

Remember – if you write long-term scripts to use this, then you will need to check the header line to make sure that things haven’t moved!
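One way to protect a script against columns moving is to look each column up by its name in the header line rather than by position. The sketch below does that with awk. column_by_name is an invented name, and because the -gui view is undocumented you would need to check the actual column names on your own system.

```shell
# Hypothetical helper: column_by_name <column_name> [delimiter]
# Reads delimited "ls" output on stdin, finds the named column in the header
# line, and prints that column for every data row. Fails if it is missing.
column_by_name() {
    awk -F "${2:-:}" -v want="$1" '
        NR == 1 {
            # Locate the wanted column in the header line
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            if (!col) {
                print "column " want " not found in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        { print $col }
    '
}
```

For example: ssh myuser@mysvccluster.domain.com svcinfo lsvdisk -gui -delim : | column_by_name name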