If you are around my age or younger you probably didn’t have much exposure to tape backup technologies. Tapes are sooo 90’s, right?! I certainly didn’t expect that as an AWS consultant I would have to learn about tapes. But I did! One of our customers wanted to use AWS Storage Gateway (SGW) in the Virtual Tape Library (VTL) mode and use it for backups with the Veeam Backup suite. Veeam seems to “just work” with SGW VTL, but I like to understand how things work under the hood, so I decided to back up my Linux test system to the VTL using just the low-level Linux tools. That’s the best way to learn.
AWS Storage Gateway Virtual Tape Library in a nutshell
SGW VTL provides 10 virtual tape drives and we can create one or more virtual tapes to use in those drives. Each tape has an ID (barcode), e.g. AB123456, XYZ98765, etc. SGW VTL also provides a virtual media changer that can move the tapes into and out of the drives. Data written to a tape in a drive is stored locally on the SGW Cache disk and also immediately sent to S3.
In our customer’s case we’ve got 10 drives and 10 tapes (1TB each) and Veeam backs up to one tape after another. Once a tape is full it moves on to the next drive loaded with an empty tape and continues the backup there, and so on, until the backup cycle is finished. Next week it automatically swaps this week’s tape set for a fresh set. That’s all Veeam’s job; AWS SGW VTL only handles the drives and tapes and backs them up to S3.
Now the interesting part. Interesting to me at least. The VTL stores the tapes in Storage Elements (slots) and there are 3200 of them! Half of them are “normal” slots (IDs 1 ~ 1600) where we can store unused tapes and move them to and from the drives when needed. Tapes in these slots (as well as those in the drives) are backed by S3 and are ready to use.
The other half are “Import/Export Storage elements” (slots ID 1601 ~ 3200). Newly created virtual tapes pop up there and can be transferred to normal slots or loaded into virtual drives.
However, when a tape is moved into an Import/Export slot it is immediately archived to AWS Glacier and is no longer available for use. From Glacier it can either be retrieved in read-only mode for backup recovery or deleted permanently.
That’s, in a nutshell, what AWS SGW VTL does. The rest is your backup software’s job.
Now we will go step by step from installing the gateway, through using the media changer, to actually backing up and restoring data.
Step 1 – Install AWS Storage Gateway – Virtual Tape Library
Once the Storage Gateway is installed we create 5 new tapes, 100GiB each, with the barcode prefix “DEMO”. Yes, I know that 5x 100GiB is more than the disk space allocated to the SGW, but it doesn’t matter. The primary place where the tapes are stored is S3; the Cache and Upload Buffer disks are only used to, well, cache the data from S3 locally on the SGW and buffer the uploads to S3. We could just as well create 10x 2.5TB tapes and it would still work, albeit more slowly.
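The tapes can also be created from the AWS CLI; a minimal sketch, assuming your region and gateway ARN (the ARN below is a placeholder):

aws storagegateway create-tapes \
    --gateway-arn arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-AB6587C2 \
    --tape-size-in-bytes 107374182400 \
    --num-tapes-to-create 5 \
    --tape-barcode-prefix DEMO \
    --client-token demo-tapes-001    # any unique string; guards against duplicate requests

107374182400 bytes is 100 GiB (100 x 1024^3).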
We also create another EC2 instance with Amazon Linux for testing client access. We will need the iscsi-initiator-utils, mt-st and mtx packages – the latter two from the Fedora 27 Rawhide repository, as they are not available in the Amazon Linux repo.
[root@ip-172-31-15-7 ~]# yum install iscsi-initiator-utils lsscsi
[root@ip-172-31-15-7 ~]# rpm -ivh --nodeps ftp://rpmfind.net/linux/fedora/linux/development/rawhide/Everything/x86_64/os/Packages/m/mt-st-1.1-20.fc27.x86_64.rpm
[root@ip-172-31-15-7 ~]# rpm -ivh ftp://rpmfind.net/linux/fedora/linux/development/rawhide/Everything/x86_64/os/Packages/m/mtx-1.3.12-17.fc27.x86_64.rpm
Important: make sure that the st and sg kernel modules are loaded at boot. In Amazon Linux and RedHat Linux it’s best achieved by adding two lines into /etc/rc.modules:
[ec2-user@ip-172-31-15-7 ~]$ cat /etc/rc.modules
modprobe sg
modprobe st
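Note that on RedHat-family systems /etc/rc.modules is only executed at boot if the file itself is executable, so create it like this:

cat > /etc/rc.modules <<'EOF'
modprobe sg
modprobe st
EOF
chmod +x /etc/rc.modules    # the boot scripts skip the file unless it is executable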
It may also be a good idea to attach another large disk with some sample data to test the backups. I created a 200GB filesystem under /data200, downloaded lots and lots of Linux kernel tarballs from a nearby kernel.org mirror and unpacked them side by side. Those 200GB filled up pretty quickly.
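If you want to replicate that test data, here is a rough sketch; the EBS device name and the kernel versions are only examples:

# assumes a 200GB EBS volume attached as /dev/xvdf
mkfs -t ext4 /dev/xvdf
mkdir -p /data200
mount /dev/xvdf /data200
cd /data200
# fetch and unpack a few kernel tarballs side by side
for v in 4.10 4.11 4.12 4.13; do
    wget "https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-${v}.tar.xz"
    tar -xJf "linux-${v}.tar.xz"
done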
Step 2 – Connect VTL devices to Linux
First of all we have to discover and attach all the iSCSI targets that SGW VTL offers. Here 172.31.15.7 is my Linux box and 172.31.7.216 is the AWS Storage Gateway.
[root@ip-172-31-15-7 ~]# iscsiadm --mode discovery --type sendtargets --portal 172.31.7.216
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-09
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-05
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-06
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-07
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-mediachanger
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-08
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-02
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-03
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-04
172.31.7.216:3260,1 iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-10
The next step is to login to all the targets, in other words attach them as devices to the Linux box. First attach the media changer, then all the tape drives in a simple loop.
[root@ip-172-31-15-7 ~]# /sbin/iscsiadm --mode node --targetname iqn.1997-05.com.amazon:sgw-ab6587c2-mediachanger --portal 172.31.7.216:3260,1 --login
Logging in to [iface: default, target: iqn.1997-05.com.amazon:sgw-ab6587c2-mediachanger, portal: 172.31.7.216,3260] (multiple)
Login to [iface: default, target: iqn.1997-05.com.amazon:sgw-ab6587c2-mediachanger, portal: 172.31.7.216,3260] successful.
[root@ip-172-31-15-7 ~]# for i in $(seq --format=%02.0f 01 10); do /sbin/iscsiadm --mode node --targetname iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-${i} --portal 172.31.7.216:3260,1 --login; done
Logging in to [iface: default, target: iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01, portal: 172.31.7.216,3260] (multiple)
Login to [iface: default, target: iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01, portal: 172.31.7.216,3260] successful.
[...]
Logging in to [iface: default, target: iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-10, portal: 172.31.7.216,3260] (multiple)
Login to [iface: default, target: iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-10, portal: 172.31.7.216,3260] successful.
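To make these sessions come back automatically after a reboot you can flip the node records to automatic startup; a sketch, assuming the iscsid service itself is enabled:

/sbin/iscsiadm --mode node --targetname iqn.1997-05.com.amazon:sgw-ab6587c2-mediachanger \
    --portal 172.31.7.216:3260 --op update -n node.startup -v automatic
for i in $(seq --format=%02.0f 01 10); do
    /sbin/iscsiadm --mode node --targetname iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-${i} \
        --portal 172.31.7.216:3260 --op update -n node.startup -v automatic
done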
Now we’ve got all the remote tapes accessible as local devices: /dev/st0 ~ /dev/st9 for the tape drives and /dev/sgX for the media changer. The assignment of /dev/stX indexes to the tape drive IDs is a bit chaotic and I’m afraid it’s not even reboot-proof, i.e. the device names may be different the next time the system reboots. Likewise the media changer /dev/sgX index is a little unpredictable.
Fortunately there are stable symlinks for the stX and sgX names under /dev/tape/by-id and /dev/tape/by-path:
[ec2-user@ip-172-31-15-7 ~]$ ls -l /dev/tape/by-path/
total 0
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01-lun-0 -> ../../st4
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01-lun-0-nst -> ../../nst4
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-02-lun-0 -> ../../st9
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-02-lun-0-nst -> ../../nst9
[...]
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-09-lun-0 -> ../../st1
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-09-lun-0-nst -> ../../nst1
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-10-lun-0 -> ../../st0
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-10-lun-0-nst -> ../../nst0
[ec2-user@ip-172-31-15-7 ~]$ ls -l /dev/tape/by-id/
lrwxrwxrwx 1 root root 9 Sep 12 05:38 scsi-2414d14236 -> ../../sg9
That’s great: now we can refer to the media changer as /dev/tape/by-id/scsi-2414d14236 (and we could make this a symlink to /dev/changer as well), and tape drive 05, for example, will always be /dev/tape/by-path/ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-05-lun-0. Nice, stable, descriptive names, although a bit long. Never mind, we’ll live with that.
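If you do want the short /dev/changer name, a sketch (the udev match values come from the mtx inquiry in Step 3 below; the rule file name is arbitrary):

# one-off symlink – does not survive a reboot
ln -s /dev/tape/by-id/scsi-2414d14236 /dev/changer

# or a persistent udev rule, e.g. /etc/udev/rules.d/61-changer.rules:
# KERNEL=="sg*", ATTRS{vendor}=="AWS", ATTRS{model}=="Gateway-VTL", SYMLINK+="changer"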
Step 3 – Using the media changer
Let’s first ask the media changer what it is:
[root@ip-172-31-15-7 ~]# mtx -f /dev/tape/by-id/scsi-2414d14236 inquiry
Product Type: Medium Changer
Vendor ID: 'AWS '
Product ID: 'Gateway-VTL '
Revision: '0100'
Attached Changer API: No
That’s great, it’s AWS Gateway-VTL – good to have it confirmed.
Now list the tape drives, tapes and slots:
[root@ip-172-31-15-7 ~]# mtx -f /dev/tape/by-id/scsi-2414d14236 status
Storage Changer /dev/tape/by-id/scsi-2414d14236:10 Drives, 3200 Slots ( 1600 Import/Export )
Data Transfer Element 0:Empty
Data Transfer Element 1:Empty
Data Transfer Element 2:Empty
Data Transfer Element 3:Empty
[...]
Data Transfer Element 7:Empty
Data Transfer Element 8:Empty
Data Transfer Element 9:Empty
Storage Element 1:Empty:VolumeTag=
Storage Element 2:Empty:VolumeTag=
Storage Element 3:Empty:VolumeTag=
[...]
Storage Element 1598:Empty:VolumeTag=
Storage Element 1599:Empty:VolumeTag=
Storage Element 1600:Empty:VolumeTag=
Storage Element 1601 IMPORT/EXPORT:Full :VolumeTag=DEMO1FE3BA
Storage Element 1602 IMPORT/EXPORT:Full :VolumeTag=DEMO19E3BC
Storage Element 1603 IMPORT/EXPORT:Full :VolumeTag=DEMO18E3BD
Storage Element 1604 IMPORT/EXPORT:Full :VolumeTag=DEMO1EE3BB
Storage Element 1605 IMPORT/EXPORT:Full :VolumeTag=DEMO1BE3BE
Storage Element 1606 IMPORT/EXPORT:Empty:VolumeTag=
Storage Element 1607 IMPORT/EXPORT:Empty:VolumeTag=
[...]
Storage Element 3198 IMPORT/EXPORT:Empty:VolumeTag=
Storage Element 3199 IMPORT/EXPORT:Empty:VolumeTag=
Storage Element 3200 IMPORT/EXPORT:Empty:VolumeTag=
[root@ip-172-31-15-7 ~]#
Great, we’ve got:
- 10 tape drives (Data Transfer Element 0 ~ 9),
- 1600 “normal” slots (Storage Element 1 ~ 1600),
- 1600 “IMPORT/EXPORT” slots (Storage Element 1601 ~ 3200), and
- 5 tapes ready in the IMPORT slots 1601 ~ 1605 with labels DEMOxxxxxx – those were created in Step 1 above.
All the drives are currently empty and the tapes are sitting in the Import slots. Let’s load them into the drives. The command is mtx -f /dev/tape/by-id/scsi-{...} load <slotnum> <drive>:
[root@ip-172-31-15-7 ~]# mtx -f /dev/tape/by-id/scsi-2414d14236 status | grep -v Empty
Storage Changer /dev/tape/by-id/scsi-2414d14236:10 Drives, 3200 Slots ( 1600 Import/Export )
Storage Element 1601 IMPORT/EXPORT:Full :VolumeTag=DEMO1FE3BA
[root@ip-172-31-15-7 ~]# mtx -f /dev/tape/by-id/scsi-2414d14236 load 1601 0
Loading media from Storage Element 1601 into drive 0...done
[root@ip-172-31-15-7 ~]# mtx -f /dev/tape/by-id/scsi-2414d14236 status | grep -v Empty
Storage Changer /dev/tape/by-id/scsi-2414d14236:10 Drives, 3200 Slots ( 1600 Import/Export )
Data Transfer Element 0:Full (Storage Element 1 Loaded):VolumeTag = DEMO1FE3BA
Storage Element 1602 IMPORT/EXPORT:Full :VolumeTag=DEMO19E3BC
Storage Element 1603 IMPORT/EXPORT:Full :VolumeTag=DEMO18E3BD
Storage Element 1604 IMPORT/EXPORT:Full :VolumeTag=DEMO1EE3BB
Storage Element 1605 IMPORT/EXPORT:Full :VolumeTag=DEMO1BE3BE
Now we’ve got 1 tape in the first drive ready to use. The other 4 tapes are still in their Import slots.
We can also “unload” tapes from drives back to the slots and “transfer” them between slots. If a tape is unloaded or transferred to a “normal” slot (slot ID 1 ~ 1600) it will stay there, ready for another use. If, however, the tape is unloaded to an “Export” slot (slot ID 1601 ~ 3200) it will disappear, be immediately archived to Glacier, and no longer be available for loading back into a drive.
See man mtx for details on load, unload, transfer and other commands.
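For example (the slot numbers here are just illustrations):

# unload the tape from drive 0 back into normal slot 42 – it stays usable
mtx -f /dev/tape/by-id/scsi-2414d14236 unload 42 0

# move a tape from import slot 1602 to normal slot 1 without touching a drive
mtx -f /dev/tape/by-id/scsi-2414d14236 transfer 1602 1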
Step 4 – Backing up data
To back up the data we will use tar – the Tape ARchiver – and also the mt tool to find out some info about the tape drive that we will need.
The basic tar usage is probably familiar to you: tar -cf <archive-name> <files...>
When writing to a tape the archive-name is the tape drive device name – one of those /dev/stX or /dev/nstX devices. But which one?
We will have to go back to /dev/tape/by-path:
[root@ip-172-31-15-7 ~]# ls -l /dev/tape/by-path/
total 0
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-01-lun-0 -> ../../st4
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-01-lun-0-nst -> ../../nst4
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-02-lun-0 -> ../../st9
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-02-lun-0-nst -> ../../nst9
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-03-lun-0 -> ../../st6
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-03-lun-0-nst -> ../../nst6
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-04-lun-0 -> ../../st2
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-04-lun-0-nst -> ../../nst2
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-05-lun-0 -> ../../st7
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-05-lun-0-nst -> ../../nst7
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-06-lun-0 -> ../../st5
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-06-lun-0-nst -> ../../nst5
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-07-lun-0 -> ../../st3
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-07-lun-0-nst -> ../../nst3
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-08-lun-0 -> ../../st8
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-08-lun-0-nst -> ../../nst8
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-09-lun-0 -> ../../st1
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-09-lun-0-nst -> ../../nst1
lrwxrwxrwx 1 root root 9 Sep 12 05:38 ip-172.31.7.216...-tapedrive-10-lun-0 -> ../../st0
lrwxrwxrwx 1 root root 10 Sep 12 05:38 ip-172.31.7.216...-tapedrive-10-lun-0-nst -> ../../nst0
One option is to use the full long “iSCSI Path” name:
tar -c -f "/dev/tape/by-path/ip-172.31.7.216...-tapedrive-01-lun-0" <files>
That’s clear and descriptive, but long. These long names are actually symbolic links to the real Linux device names like /dev/st0, and of course we can use those instead if we want to save some typing. In the directory listing above you can see that tapedrive-01 is a symlink to /dev/st4.
tar -c -f /dev/st4 <file>
This command is exactly equivalent to the previous one. It’s shorter but not obvious which tape drive we are using.
To confuse things even more, mtx numbers the drives 0 ~ 9, the iSCSI target names run from tapedrive-01 to tapedrive-10, and the corresponding /dev/stX numbers are mixed up in no particular order. Phew, what a mess!
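A little helper makes the mapping easier to see; a sketch that resolves the stable by-path names to the kernel device nodes (adjust the glob and the prefix to your gateway ID):

# prints e.g. "tapedrive-01-lun-0 -> /dev/st4" for each drive
for link in /dev/tape/by-path/*-lun-0; do
    printf '%s -> %s\n' "${link##*:sgw-ab6587c2-}" "$(readlink -f "$link")"
done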
Now that we know the device name, let’s try to back up a cloned Linux kernel Git repository onto the virtual tape.
[root@ip-172-31-15-7 data100]# tar -cv -f /dev/st4 linux
linux/
linux/certs/
linux/certs/blacklist.c
linux/certs/Makefile
tar: /dev/st4: Cannot write: Invalid argument
tar: Error is not recoverable: exiting now
Hmm, not good. It took me a while to figure out this error. Let’s look at the kernel messages output:
[root@ip-172-31-15-7 data100]# dmesg
[...]
[75883.166817] st 7:0:0:0: [st4] Block limits 0 - 1048576 bytes.
[75883.174664] st 7:0:0:0: [st4] Write not multiple of tape block size.
Apparently we are writing with the wrong block size. But what is the correct size? That’s where mt comes to the rescue:
[root@ip-172-31-15-7 data100]# mt -f /dev/st4 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 65536 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
Tape block size is 65536 bytes – let’s try that with tar.
[root@ip-172-31-15-7 data100]# time tar -cv --record-size=65536 -f /dev/st4 linux
linux/
linux/certs/
linux/certs/blacklist.c
linux/certs/Makefile
linux/certs/Kconfig
linux/certs/.gitignore
linux/certs/system_certificates.S
linux/certs/blacklist_nohashes.c
linux/certs/blacklist_hashes.c
linux/certs/system_keyring.c
linux/certs/blacklist.h
linux/drivers/
[...]
linux/firmware/WHENCE
linux/firmware/myricom/
linux/firmware/myricom/lanai.bin.ihex
linux/firmware/ihex2fw.c
linux/firmware/r128/
linux/firmware/r128/r128_cce.bin.ihex

real    2m12.947s
user    0m0.640s
sys     0m3.172s
[root@ip-172-31-15-7 data100]#
Yay, that’s better! A quick look at the AWS Storage Gateway console confirms that some 2GB of data were written to the tape.
Now we can also list the contents of the tape:
[root@ip-172-31-15-7 data100]# time tar -tv --record-size=65536 -f /dev/st4
drwxrwxr-x ec2-user/ec2-user 0 2017-09-11 05:49 linux/
drwxrwxr-x ec2-user/ec2-user 0 2017-09-11 05:49 linux/certs/
-rw-rw-r-- ec2-user/ec2-user 4134 2017-09-11 05:49 linux/certs/blacklist.c
-rw-rw-r-- ec2-user/ec2-user 4152 2017-09-11 05:49 linux/certs/Makefile
-rw-rw-r-- ec2-user/ec2-user 3447 2017-09-11 05:49 linux/certs/Kconfig
-rw-rw-r-- ec2-user/ec2-user 44 2017-09-11 05:49 linux/certs/.gitignore
[...]
-rw-rw-r-- ec2-user/ec2-user 6742 2017-09-11 05:49 linux/firmware/ihex2fw.c
drwxrwxr-x ec2-user/ec2-user 0 2017-09-11 05:49 linux/firmware/r128/
-rw-rw-r-- ec2-user/ec2-user 5644 2017-09-11 05:49 linux/firmware/r128/r128_cce.bin.ihex
That’s what we expected!
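By the way, instead of matching tar’s --record-size to the drive, most tape drives can be switched to variable block mode with mt setblk 0. A hedged sketch – I haven’t verified that SGW VTL accepts it:

mt -f /dev/st4 setblk 0    # 0 = variable block size (assumption: the VTL drive allows this)
mt -f /dev/st4 status      # 'Tape block size' should then report 0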
/dev/stX vs /dev/nstX
You may have noticed that for each virtual tape drive we’ve got two devices – for example /dev/st4 and /dev/nst4. What’s the difference? The /dev/stX devices are rewinding and the /dev/nstX devices are non-rewinding. What does that mean?
After we finish writing our archive to /dev/st4 (or to /dev/tape/by-path/ip-...-tapedrive-01-lun-0) the drive automatically rewinds the virtual tape and positions the virtual “head” back at the beginning. The next write will start from the beginning and overwrite the previous archive; or the next read will read the archive that we just wrote.
On the other hand, when we finish writing to /dev/nst4 (or /dev/tape/by-path/ip-...-tapedrive-01-lun-0-nst) the head stays where it is, at that position on the tape, ready to write the next archive. This way we can write multiple archives on a single tape, one after another. The next read, however, will complain that we are at the end of the tape.
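A quick sketch of writing two archives back to back and then reading the second one (the directory names are just examples):

# write two archives in a row on the non-rewinding device
tar -c --record-size=65536 -f /dev/nst4 dir1
tar -c --record-size=65536 -f /dev/nst4 dir2

# rewind, skip over the first archive's file mark, then list the second archive
mt -f /dev/nst4 rewind
mt -f /dev/nst4 fsf 1
tar -t --record-size=65536 -f /dev/nst4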
Step 7 – Archiving the tapes to Glacier
I have backed up some 58 GiB of kernel source files onto our virtual tape and decided to preserve this precious collection for future generations. Note that at the moment the tape is in “Available” state.
To archive it, all I need to do is unload it from the tape drive into one of the Import/Export slots with IDs 1601 ~ 3200. Let’s unload it to slot 3200. Note that afterwards it will no longer appear in the list of tapes:
[root@ip-172-31-15-7 data200]# mtx -f /dev/tape/by-id/scsi-2414d14236 status | grep DEMO1FE3BA
Data Transfer Element 0:Full (Storage Element 1 Loaded):VolumeTag = DEMO1FE3BA
[root@ip-172-31-15-7 data200]# mtx -f /dev/tape/by-id/scsi-2414d14236 unload 3200 0
Unloading drive 0 into Storage Element 3200...done
[root@ip-172-31-15-7 data200]# mtx -f /dev/tape/by-id/scsi-2414d14236 status | grep DEMO1FE3BA
[root@ip-172-31-15-7 data200]#
At the same time, in the AWS Storage Gateway console, the tape status will change from Available to Archived. To use it again we will have to “Retrieve” it from Glacier. Note that you can retrieve it to the same AWS SGW VTL that created it or to a different one.
Once it is retrieved it will pop up again in an Import slot, in read-only mode. Then we can load it into one of the virtual tape drives and restore any data we need from it.
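Retrieval works from the AWS CLI too; a sketch with placeholder ARNs (look up the real ones with aws storagegateway describe-tape-archives):

aws storagegateway retrieve-tape-archive \
    --tape-arn arn:aws:storagegateway:us-east-1:123456789012:tape/DEMO1FE3BA \
    --gateway-arn arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-AB6587C2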
Step 8 – Delete it all
- Eject (unload) all tapes from the tape drives using “mtx unload”
- Re-archive all Retrieved tapes using “mtx unload” or “mtx transfer” into an Export slot (ID 1601 ~ 3200)
- Delete all Available and Archived tapes using the AWS Console or AWS CLI (see the sketch after this list).
- Delete the Storage Gateway sgw-abcd1234 using the AWS Console or AWS CLI.
- Shut down and delete the SGW EC2 instance or VMware VM.
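The tape and gateway deletion can be done with the AWS CLI as well; a sketch with placeholder ARNs:

# delete a tape that is still on the gateway (Available)
aws storagegateway delete-tape \
    --gateway-arn arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-AB6587C2 \
    --tape-arn arn:aws:storagegateway:us-east-1:123456789012:tape/DEMO19E3BC

# delete a tape that is already archived in Glacier
aws storagegateway delete-tape-archive \
    --tape-arn arn:aws:storagegateway:us-east-1:123456789012:tape/DEMO18E3BD

# finally delete the gateway itself
aws storagegateway delete-gateway \
    --gateway-arn arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-AB6587C2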
If you don’t follow these steps you may end up with messages like:
Gateways that failed deletion: sgw-AB6587C2
Tapes that failed deletion: DEMO18E3BD DEMO19E3BC
Cannot delete resources due to one or more resources’ status such as archiving or retrieving.
If you get any of these, follow the steps above and try again.
Backup schedules
With regard to backup schedules, we were deciding between two options.
One scenario is to have e.g. 4 sets of tapes (4x 10 tapes, labeled e.g. AAxxxxxx, BBxxxxxx, CCxxxxxx, DDxxxxxx), one set per week, and rotate them weekly between the drives and the “normal” slots. Week 1 backups go to the AA tapes, Week 2 backups to the BB tapes, etc., and Week 5 goes to the AA tapes again. These tapes are never stored away in Glacier and give you three recent weeks of backups plus the current week. That should be enough for most users and is simple to set up and manage; a rough sketch of the weekly swap follows below.
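A minimal sketch of that weekly swap with mtx, assuming the outgoing set sits in drives 0 ~ 9 and the incoming set waits in normal slots 1 ~ 10 (the slot layout is hypothetical):

CHANGER=/dev/tape/by-id/scsi-2414d14236
# park the outgoing tapes from drives 0-9 into normal slots 11-20
for d in $(seq 0 9); do
    mtx -f "$CHANGER" unload $((d + 11)) "$d"
done
# load the incoming set from normal slots 1-10 into the drives
for d in $(seq 0 9); do
    mtx -f "$CHANGER" load $((d + 1)) "$d"
done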
Another scenario is to create a new tape set every week: move the tapes from the Import slots to the drives, run the backups, and at the end of the week move them to the Export slots for storing in Glacier. It is more work, as it requires creating and deleting tapes every week, but it can of course be automated, for example with Lambda functions. This is more suitable for customers who want to keep their backups for a very long time, perhaps for compliance purposes.
For our customer we decided to implement the first scenario with 4 tape sets and no Glacier archiving.
Troubleshooting
tar: /dev/st0: Cannot open: No medium found
[root@ip-172-31-15-7 ~]# tar tf /dev/st0
tar: /dev/st0: Cannot open: No medium found
tar: Error is not recoverable: exiting now
You would assume that drive 0 is /dev/st0, right? But it’s not, not necessarily. If you loaded a tape into drive 0 and want to use it, you must look up which /dev/stX device to use in /dev/tape/by-path.
mtx: cannot open SCSI device ‘/dev/changer’ – No such file or directory
[root@ip-172-31-15-7 ~]# mtx -f /dev/tape/by-<TAB>
cannot open SCSI device '/dev/changer' - No such file or directory
cannot open SCSI device '/dev/changer' - No such file or directory
This is an annoying tab-completion bug in bash that assumes that a parameter for mtx should be /dev/changer<something>. It’s best to disable any mtx-specific tab-completion handling with:
~ # complete -r mtx
Put it in your ~/.bashrc for convenience.
mtx: cannot open SCSI device ‘/dev/changer’ – No medium found
[root@ip-172-31-15-7 ~]# mtx -f /dev/tape/by-id/scsi-2414d14236 status
cannot open SCSI device '/dev/tape/by-id/scsi-2414d14236' - No medium found
This was a confusing one. The device file was there but all requests were failing with “No medium found”.
The reason was that the ‘sg’ kernel module wasn’t loaded and /dev/tape/by-id/scsi-2414d14236 was a symlink to /dev/st5 instead of /dev/sg5. Load the ‘sg’ module and try again:
~ # modprobe sg
tar: /dev/st0: Cannot open: Read-only file system
[root@ip-172-31-15-7 ~]# tar -cv --record-size=65536 -f /dev/tape/by-path/ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01-lun-0 /data200/
tar: /dev/tape/by-path/ip-172.31.7.216:3260-iscsi-iqn.1997-05.com.amazon:sgw-ab6587c2-tapedrive-01-lun-0: Cannot open: Read-only file system
tar: Error is not recoverable: exiting now
Most likely the tape in the drive was “Retrieved” from Archive (Glacier) and is therefore Read-Only. Retrieved tapes can’t be modified, erased or overwritten.
No devices show up under /dev/tape/by-path
Most likely you haven’t logged in to the iSCSI targets (see Step 2 above), or you don’t have the st kernel module loaded. Run:
~ # modprobe st
SGW crashing out of memory
Problems deleting tapes or the storage gateway
See Step 8 – Delete it all above: unload or re-archive the tapes first, then delete the tapes, then delete the gateway.
That’s all I’ve got to write about AWS Storage Gateway – Virtual Tape Library and Linux. Let me know in the comments if you found it useful!