Exploring an IBM v7000 Storage Engine

Within the storage landscape today we have a multitude of products available to solve the never-ending storage challenge. Every storage product has its own set of features and characteristics which deliver certain elements of value to storage architects like myself. Recently an opportunity surfaced in which an IBM v7000 storage engine was available for me to review. So with the IBM v7000 in hand I started a process to evaluate what features and characteristics this product offers for creating interesting solutions to some common storage challenges. Before breaking into the main content of this blog I would like to state what this blog isn’t about. This is not a benchmark of everything the v7000 can do. Good performance is very subjective, and with enough resources and tuning we can solve any load requirement with any product. Nor is it a feature comparison exercise with other products in the same class. What I’m looking for is value in the product and how we can apply that value to solve storage problems.

If you’re not familiar with the v7000 it does not take long to understand and navigate it; the web based management GUI is logically straightforward. It is beneficial to understand how the v7000 presents storage when working in the GUI. Historically the v7000 is a version of the original IBM SAN Volume Controller, more commonly known as the IBM SVC, with an updated set of management interfaces and some useful new features. The primary concept of the SVC is to provide a virtually abstracted instance of any block storage architecture the SVC can interface with. The design uses a model in which we define units of managed disk, or MDisks. MDisks are block level devices or LUs, such as a single disk or an array of disks, which can be located internally or externally to the SVC’s clustered engines. MDisks can be placed into tiered storage resource pools or directly served as a block image. As the SVC virtually re-provisions its storage attachments to initiators it creates multi-pathed, nv-cache accelerated, active/active connections for its storage consumers even if the original MDisk lacks this function. This definitely adds value as a tool to solve storage problems, especially where you may need to migrate or restructure an active SAN environment.

The setup phase of this evaluation proved to be very easy. The v7000 ships with a USB Flash Drive which is pre-loaded with a simple Windows based initialization tool. This tool allows anyone to reach the browser management subsystem in less than 5 minutes. Once the system is configured the v7000 only accepts a small set of functions from the tool that require physical access, such as resetting the superuser password. This limits the potential for serious security events, like connecting the USB Flash Drive to an incorrect host and disruptively changing its interface configuration.

Here are some screenshots of what brought me to the GUI in less than 5 minutes.

IBM v7000 Init Tool

v7000 InitTool IP Address

v7000 InitTool Completion

For me this easy to use tool was actually not the most significant element of value in the initial system configuration. Under the covers of the graphical tool we can see that the v7000 provides the ability to configure a system from a text file when it’s brought up from the default factory state. A subset of the system command line interface functions is available via the USB based configuration file, satask.txt. The InitTool itself generates commands from this subset, or you can create your own once you know the appropriate statements. This means we can create an automated configuration process with it. I find this useful mostly in the context of disaster recovery and test lab provisioning. For example, if the alternate environment hosted a ZFS replicated set of LUNs on a commodity server, we could easily create a configuration driven from command line scripts that serves a cloned set of those LUNs in image mode using the v7000 or its lower cost v3700 counterpart. This would allow us to cost effectively mirror the functionality of a source environment on demand while also allowing us to return to a test lab or other previous state, all in the same day. In other words, we gain the agility to repeatedly restructure, on demand, what a set of VMware ESXi or other hypervisor clusters can see.
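As a rough illustration of the text file approach, the sketch below writes a one line satask.txt. The `satask mkcluster` line mirrors what I understand the InitTool generates for a new system, but treat the exact syntax as an assumption and check it against the IBM CLI reference before relying on it. The function name and the lab addresses are hypothetical.

```shell
#!/bin/sh
# Write a one-line satask.txt for a factory-state system.
# ASSUMPTION: the "satask mkcluster" flags below reflect my understanding of
# what the InitTool emits; confirm them against the IBM CLI documentation.
make_satask() {
    # $1 = cluster IP, $2 = gateway, $3 = netmask, $4 = output file
    printf 'satask mkcluster -clusterip %s -gw %s -mask %s\n' "$1" "$2" "$3" > "$4"
}

# Hypothetical lab addresses; written to a scratch location for inspection.
out="${TMPDIR:-/tmp}/satask.txt"
make_satask 192.168.1.50 192.168.1.1 255.255.255.0 "$out"
cat "$out"
```

Keeping the generator as a function means a DR or lab rebuild script can stamp out the file per site before it is copied to the USB Flash Drive.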

In order to evaluate the performance behaviors of the v7000 I decided it needed to go through what I like to call a little storage server hell. It’s a work exercise where we push the server to perform a 100% random IO load at 50% read and 50% write using 512 byte requests. In this case it is a test of what a specific client can drive under specific conditions. The client driving this load is a Windows 7 based VMware VM with 2 x 3.0 GHz Intel i7 vCPUs running IOMeter with 64 worker threads. The VMware hypervisor host is a vSphere 5.1 based ESXi instance connected over a dual 2 Gb Fiber Channel HBA. The loading VM resembles a very heavy storage consumer in a virtualized context. Almost all of these evaluations are exercised against 2 defined pools, dp1 and dp2. dp1 is hosted by the internal storage resources of the v7000 evaluation unit: mdisk0 is mapped as a RAID 5 array and mdisk4 is our mirrored SSD array. The SSD tier is not used in this specific test. We will also be exploring some externally configured FC attached storage which is defined to storage pool dp2; more on that follows later in this blog entry.

v7000 Test Base Pools

Let’s explore the test configuration and its results.

The specifics for the workload are as follows:

Storage Server Stress Scale (1 – 10) = 10
Workers (Threads) = 64
Disk Size in Sectors = 24,000,000
IO Size in Bytes = 512
Outstanding Requests per Worker = 1
Random Operation Percentage = 100
Read Operation Percentage = 50
Write Operation Percentage = 50
Fiber Channel Paths = 2
Fiber Channel Speed in Gb/s = 2
Settling Time in Seconds = 240

Test 1 = Random IO Storage Provisioning Hell
Tool = IOMeter
Volume Name = svc1-san-vol0
Raid Mode = 5
Thick Block Map Mode = 0
Compression = 0
SSD Cache = 0
NV Ram Cache in GB = 8

As we can observe, the virtual disk sector map size of 24M sectors exceeds the storage server’s 8 GB non-volatile cache, and this means the server will need to exercise random seeks for a significant portion of the requests. Let’s look at the results graphically.
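To put numbers on that comparison, the test area size follows directly from the sector count and the 512 byte sector size listed above. This is only a sketch of the arithmetic: the function names are mine, the 8 GB is taken as 8,000,000,000 bytes, and treating the whole cache as available to this one volume is an optimistic assumption.

```shell
#!/bin/sh
# Size of the IOMeter test area in bytes: sectors * bytes per sector.
test_area_bytes() {
    # $1 = disk size in sectors, $2 = sector size in bytes
    echo $(( $1 * $2 ))
}

# Best-case share of the test area that could sit in cache, in percent.
# ASSUMPTION: the entire cache is available to this volume (optimistic).
cache_cover_pct() {
    # $1 = test area in bytes, $2 = cache size in bytes
    LC_ALL=C awk -v area="$1" -v cache="$2" 'BEGIN {
        pct = 100 * cache / area
        if (pct > 100) pct = 100   # cache larger than the test area
        printf "%.1f\n", pct
    }'
}

area=$(test_area_bytes 24000000 512)
echo "test area: $area bytes"
echo "best-case cache coverage: $(cache_cover_pct "$area" 8000000000)%"
```

Even under that generous assumption roughly a third of the test area cannot be held in cache, so a steady share of the random requests has to reach the spindles.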

v7000 Performance iops under 100% rnd 50% read/write

 

Obviously the v7000 system deals with the punishment very efficiently. Specifically we can see the latency is very low even under significant stress. This demonstrates that the system performance excels even under the most demanding, completely random workloads. The v7000 design maturity is evident in this load test. I’m sure we could push it further than this, but we must keep the test results in context as the evaluation system only hosts an 8 drive 10k SAS RAID 5 array. The constraint is an effective way to observe how well the SAN Volume Controller cluster software performs.

For comparison I collapsed the 24M sector virtual disk down to 1M assigned sectors in an effort to observe what the storage cache based IO response would present under a 100% random 50% read/write IO load.

v7000 Performance iops under 100% rnd 50% read/write

Well, it was very apparent that the Windows 7 VM is the limiting factor here; however it still demonstrates the value in the engineering. The reported latency is 0 ms even with 23,500 random IOPS hitting the storage cluster.

I found the compression feature to be effective for almost any type of VM IO load. There are some basic rules of engagement one should follow when using compression on the v7000. The first rule concerns the load type: be wary of very high intensity random write loads. This is not because the v7000 will not perform well for this load; it’s rooted in the system’s CPU load when other CPU demands are present. You do not want to push the normal running CPU load over 60% for a sustained period, as that increases the possibility of excessive peak loading events. The second rule addresses the desire to use Easy Tier functionality. Compression does not produce a stable heat map pattern, since writing compressed data always shifts where the data lands and thus it will not remain at the last heat map hit point. You can still accelerate a compressed volume, but you would need to move the entire LUN to the SSD tier to be fully effective.

In order to gauge how well the v7000’s compression algorithm responds I chose to drive the engine with a typical Windows random 4k 70% read, 30% write load. Let’s observe the graphical result of a sustained 4 minute run.

The specifics for the workload are as follows:
Storage Server Stress Scale (1 – 10) = 6
Workers (Threads) = 32
Disk Size in Sectors = 24,000,000
IO Size in Bytes = 4096
Outstanding Requests per Worker = 1
Random Operation Percentage = 100
Read Operation Percentage = 70
Write Operation Percentage = 30
Fiber Channel Paths = 2
Fiber Channel Speed in Gb/s = 2
Settling Time in Seconds = 240

Test 2 = Typical Random IO Compressed Storage Provisioning
Tool = IOMeter
Volume Name = svc1-san-vol0
Raid Mode = 5
Thick Block Map Mode = 0
Compression = 1
SSD Cache = 0
NV Ram Cache in GB = 8

v7000 Compression Performance iops 4k 70-30 read-write mix

 

The results are very interesting: we can observe a substantial drop in the SAS backplane IOPS in the interface metrics section. This is an excellent result, as it will reduce the mdisk load and thus increase the data throughput. Another important element is the greatly reduced IO load at the mdisk layer. The input side volume is receiving a total of 2516 IOPS while the output side only requires 1452 IOPS. This is certainly a valued performance enhancing behavior when employing the compression feature. The final element I see as noteworthy is the very low latency at the provisioned volume, which never exceeds 3ms during the entire load run.
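The size of that reduction is easy to quantify from the two figures quoted above. A minimal sketch of the calculation follows; the function name is my own.

```shell
#!/bin/sh
# Percentage of IOPS removed between the volume layer and the mdisk layer.
iops_reduction_pct() {
    # $1 = IOPS arriving at the volume, $2 = IOPS reaching the mdisk
    LC_ALL=C awk -v front="$1" -v back="$2" 'BEGIN {
        printf "%.1f\n", 100 * (front - back) / front
    }'
}

# Figures taken from the compression run described above.
iops_reduction_pct 2516 1452
```

With the numbers from this run the back end sees a little over 42% fewer IO operations than the compressed volume receives.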

As a bonus we only consumed 17% of the CPU and achieved the following compression capacity gain:

v7000 Compression Gain on a Windows 7 VM

Even with a completely random data footprint in this performance test case, we gained 32% in capacity when using the compression feature and I’m quite happy with that result.

From the IOMeter client side the performance results do correlate, as demonstrated in this screenshot.

VM to v7000 IOMeter IOPS on compressed volume no ssd

Storage tiering is one of the more important elements the v7000 Storage Server can provision. All the marketing noise for this product emphasizes that it’s easy to use, and I would concur: it was very easy to use and it works without any effort. The v7000 provisions tiering by granting the storage administrator the ability to define performance classes of mdisk arrays within a pool. IBM engineers make use of IO activity heat maps to determine which block extents within a defined volume should be migrated to a higher performance tier. You do have control of the extent size when you create the pool itself. Once the pool is created you cannot change the extent size, nor should you need to. The default extent size is 256 MB, which I ran a series of performance checks on, and the IBM engineers have chosen a very good default. 256 MB fits general use VM provisioning most suitably, with the best performance over a range from 32 MB to 1 GB.

The v7000 engineers chose a 24 hour cycle of activity time within the heat map data to determine which extents should move to a higher tier, and I agree with this methodology. Many discussions about using shorter sampling time algorithms circulate on the web. I find that if the sampling period is too short or the extent size is too small the results are not favorable. When we move data too quickly we begin to thrash it around, and this is not efficient. Too much movement generates fragmentation, and it also uses too much backplane bandwidth and other system resources like cache unnecessarily. It also does not allow the system an opportunity to move the extent when the system is most idle.
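To make the heat map idea concrete, here is a toy sketch of ranking extents by IO activity and picking the busiest ones as promotion candidates. It only illustrates the concept and is not IBM’s algorithm; the input format (extent id and IO count per line), the function name and the sample counts are all made up.

```shell
#!/bin/sh
# Pick the N busiest extents from "extent_id io_count" lines on stdin.
# Illustration only: a real tiering engine weighs far more than a raw count.
hottest_extents() {
    # $1 = number of extents to select
    # Sort by IO count descending, break ties by lowest extent id.
    sort -k2,2nr -k1,1n | head -n "$1" | awk '{ print $1 }'
}

# Hypothetical 24 hour IO counts for five extents.
printf '%s\n' '0 120' '1 9500' '2 40' '3 7200' '4 15' | hottest_extents 2
```

A longer sampling window smooths these counts, which is one way to see why a 24 hour cycle moves fewer extents back and forth than a short one would.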

To observe the benefits of running tiered on the v7000 I chose to perform a before and after workload run using the same typical Windows 4k random IO with 70% read and 30% write. The specifics of the run are as follows and we will observe the results at the IOMeter client side.

Storage Server Stress Scale (1 – 10) = 6
Workers (Threads) = 32
Disk Size in Sectors = 24,000,000
IO Size in Bytes = 4096
Outstanding Requests per Worker = 1
Random Operation Percentage = 100
Read Operation Percentage = 70
Write Operation Percentage = 30
Fiber Channel Paths = 2
Fiber Channel Speed in Gb/s = 2
Settling Time in Seconds = 240

Test 3 = Typical Random IO Tiered Storage Provisioning
Tool = IOMeter
Volume Name = svc1-san-vol0
Raid Mode = 5
Thick Block Map Mode = 0
Compression = 0
SSD Cache = 1
NV Ram Cache in GB = 8

VM to v7000 IOMeter IOPS Tiered mode before heat map move

And after 24 hours the same test parameters yielded the following result.

VM to v7000 IOMeter IOPS Tiered mode after heat map move

Obviously the result demonstrates significant IOPS performance gains. In this test the first workload run executed for 4 minutes and the system was then left idle for a period of 24 hours. Subsequently the second run was performed for the same 4 minute length. Within the IOMeter results I found the throughput gain quite remarkable. I was not expecting to see such a significant increase in the Total MB/s value; it’s actually 22 times greater than the original run. I did have to run it a second time just to verify that it was not an anomaly in the original test run. After tearing down the volume, recreating it, rerunning the workload and waiting the required 24 hours it again presented the same result. It’s something I will have to investigate further as the reason eludes me for the moment. Nonetheless the numbers speak for themselves.

For me, one of the most important features the v7000 hosts is the ability to virtualize external storage systems that are presented via Fiber Channel. The reason I find value in this feature is that it grants the ability to move significant amounts of storage around without major impact to the primary external storage consumer. In addition to the migration capability, one can also front end an external storage host and synchronously mirror the data to a second external storage host.

The v7000 officially supports a significant number of FC based external storage systems. Personally I wanted to investigate whether the v7000 could handle an open source based product such as OpenSolaris, which today means any illumos based distribution. There is a synergy to be gained between the world of the SVC and the commodity open source world. With that idea in mind I built the required elements and ran some very interesting tests of provisioning OpenIndiana FC based LUNs to the v7000.

Let’s walk through some of the build elements.

The source storage host hardware consisted of off the shelf white box commodity components as follows:

1 – Antec Case
1 – LSI SAS3442 Adapter
1 – QL2462 Dual FC Adapter
8 GB – DRAM
1 – Intel i7 930 CPU
4 – Seagate NL SAS ST32000645SS
1 – USB Flash
1 – X58 Gigatech Mobo

The USB Flash Drive was loaded with an OpenIndiana USB based install using version oi_151a7.
The basic OpenIndiana storage configuration elements are as follows:

~# zpool create -f sp1 raidz1 c7t13d0 c7t14d0 c7t15d0 c7t16d0

(4 Disk Raidz1 array)

~# update_drv -a -i '"pciex1077,2432"' qlt

(FC Target Mode Driver Binding For COMSTAR)

~# zfs create -b 64K -s -V 256G sp1/zfs1-san-vol1

~# zfs create -b 64K -s -V 256G sp1/zfs1-san-vol2

(Some ZFS Posix Block Devices)

~# stmfadm create-lu /dev/zvol/rdsk/sp1/zfs1-san-vol1

~# stmfadm create-lu /dev/zvol/rdsk/sp1/zfs1-san-vol2

(Some COMSTAR exposed LUNs)

~# stmfadm create-hg svc1

~# stmfadm add-view -n 0 -h svc1 600144F0F5644400000050BD35750001

~# stmfadm add-view -n 1 -h svc1 600144F0F5644400000050BD35750002

(Some COMSTAR host groups and views to the LUNs now assigned with a GUID)

~# stmfadm add-hg-member -g svc1 wwn.500507680220146B

~# stmfadm add-hg-member -g svc1 wwn.500507680210146B

(Add the v7000 to the COMSTAR svc1 host group)

~# zfs create -s -b 8K -V 32G sp1/zfs1-san-vol3

~# stmfadm create-hg esx1

~# stmfadm add-hg-member -g esx1 wwn.210000e08b83cef2

~# stmfadm create-lu /dev/zvol/rdsk/sp1/zfs1-san-vol3

~# stmfadm add-view -n 4 -h esx1 600144F0F5644400000050CD19240001

(Create a volume to test the v7000 image mode)
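The provisioning steps above follow a repeatable pattern, so they lend themselves to a small helper. This sketch only prints the commands rather than running them. The function name and argument order are my own, and the GUID is left as a placeholder because it is only known once create-lu has run.

```shell
#!/bin/sh
# Print (not run) the commands that create a sparse zvol and expose it as a
# COMSTAR LU to an existing host group. Dry run only; review before executing.
lun_cmds() {
    # $1 = pool, $2 = volume, $3 = size, $4 = block size,
    # $5 = host group, $6 = LUN number
    echo "zfs create -s -b $4 -V $3 $1/$2"
    echo "stmfadm create-lu /dev/zvol/rdsk/$1/$2"
    # The GUID is reported by create-lu, so it stays a placeholder here.
    echo "stmfadm add-view -n $6 -h $5 <GUID reported by create-lu>"
}

# Reproduces the command shapes used for the image mode test volume.
lun_cmds sp1 zfs1-san-vol3 32G 8K esx1 4
```

Printing first and executing later keeps the helper safe to try on a production COMSTAR host.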

During the initial testing I found that the v7000 does indeed successfully connect to the open source based OpenIndiana storage host and the LUNs are identified as generic targets. After the LUNs were discovered they were added to the dp2 pool. As a comparison I chose to perform the typical Windows 4k 70/30 workload run on a newly created volume from the v7000.

Let’s observe the metrics presented on the v7000 performance console.

v7000 Performance iops 4k 70-30 read-write comstar

The performance is impressive for a 4 disk array mdisk presentation. There is an interesting v7000 caching effect revealed at the mdisks metric panel, where we can observe the write load is only 420 IOPS versus the virtual volume write rate of 1650 IOPS. This is definitely a beneficial impact of the non-volatile cache in the v7000 cluster. We can also see that the disk latency at the external side is considerably higher for write operations than at the virtual volume layer. As well, we can see the external storage host is also optimizing the operations, demonstrated by the gradual increase in FC operations on the Interface metrics panel. I’m very pleased to see that the v7000 can successfully serve an open source based storage target and that there are valuable optimizations gained from this configuration.
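The gap between those two write rates can be expressed as a simple ratio. This is a sketch of the arithmetic with a function name of my own. The panel does not tell us whether the drop comes from write coalescing, overwrites absorbed in cache, or both, so the ratio describes the effect and not the mechanism.

```shell
#!/bin/sh
# Ratio of write IOPS at the virtual volume to write IOPS at the mdisk.
write_absorb_ratio() {
    # $1 = volume write IOPS, $2 = mdisk write IOPS
    LC_ALL=C awk -v vol="$1" -v mdisk="$2" 'BEGIN {
        printf "%.2f\n", vol / mdisk
    }'
}

# Figures read from the v7000 performance console in this run.
write_absorb_ratio 1650 420
```

In this run nearly four volume level writes were serviced for every write that reached the external array.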

One element I was very interested in exploring was the image mode feature of the v7000 which gives us the ability to present a volume in passthrough mode. In other words the v7000 acts as the target presenting the external storage content as a block for block image. The same caching benefits observed in the above test are also presented when using the image mode. In this next test we will first present some storage from the external OpenIndiana host to a VMware ESXi 5.1 hypervisor and create a VMFS volume with it. We will then place the IOMeter client VM on the volume and run a load test using the Windows 4k 70/30 run. Then we will shut the VM down, remove the volume from the ESXi host and present the OpenIndiana LUN to the v7000 for import. Once imported into the v7000 in image mode we will present and add the LUN back to the ESXi host. Finally we will rescan the FC adapter for VMFS volumes and observe the result.

Let’s walk through the operation graphically.

TZVM IOMeter VM on OpenIndiana

TZVM running on the OpenIndiana ZFS VMFS volume named zfs1-san-vol3.

TZVM paths on OpenIndiana

Observing the VMware presented Fiber Channel paths for zfs1-san-vol3. Note the policy storage array type.

TZVM zfs1-san-vol3 Pre Image Mode IOMeter test

The pre-image mode migration IOMeter results are presented here for a 4 minute run. This is a 4K random 70/30 read write mix. At this point we need to shut down the VM, and we will also remove the zfs1-san-vol3 datastore from the ESXi host prior to re-introducing the same volume over the v7000 SVC engine. We simply remove the ESXi FC initiator member definition from the COMSTAR esx1 group, which prevents any connectivity to the original datastore instance. We do this to prevent any VMware snapshot detection issues. At the same time we will add the zfs1-san-vol3 LUN view to the COMSTAR svc1 group.

~# stmfadm remove-hg-member -g esx1 wwn.210000e08b83cef2

~# stmfadm add-view -n 4 -h svc1 600144F0F5644400000050CD19240001

 

v7000 SVC Image Mode View of zfs1-san-vol3 as mdisk7

No pool placement for the image is correct.

At this point we have run the v7000 mdisk detection and imported the newly discovered mdisk7 LUN, which is the zfs1-san-vol3 datastore. We do not add the image to a pool.

TZVM VMware Datastore zfs1-san-vol3 SVC Image Mode Add

We can now proceed to add the SVC image of zfs1-san-vol3 back to the ESXi host, and we can observe it’s now exposed as an IBM Fiber Channel presentation on LUN 4.

ESXi Resignaturing zfs1-san-vol3

When the datastore is added VMware does notice the NAA value is different, and it needs to confirm that we do want the current datastore volume to be mounted with the same signature as before. This is a typical response to a changed NAA. If this were not the correct LUN for this signature, accepting it would introduce instability to this VMware VMFS clustered datastore on all other ESXi hosts.

VMFS being Resolved

VMFS Resolved

VMware does indeed identify the external v7000 image mode presentation and we can observe zfs1-san-vol3 is completely intact.

 

v7000 Path Observation After Migration

We can observe the newly defined paths and take note of the policy mode storage array type, as it is now in SVC mode. We also now have 4 paths available.

TZVM IOMeter Result after v7000 Image

With the ESXi host now up and running with the re-established zfs1-san-vol3 datastore in image mode over the v7000, we can now rerun the 4k random 70/30 read write mix. We can observe an immediate gain of 500 IOPS, which is the write load hitting the v7000 nv-cache, 4 minutes into the test, and we can see the synergy working. I let the test run for an additional 6 minutes, for a total run of 10 minutes, to observe the full cache benefit of the v7000 as the storage virtualization head.

TZVM IOMeter Result after v7000 Image Full 10 Min

Obviously we can see the benefit of the ZFS ARC and the v7000 nv-cache working together to improve our system latency and IOPS flow. The information presented in this exploration does demonstrate that the v7000 brings value through many unique attributes and specifically drives a high degree of agility within the storage solutions scope.

Well this brings a close to this blog entry and I must say the results were very interesting and enlightening.

I hope you enjoyed the post.

Regards,

Mike

 

Site Contents: © 2013  Mike La Spina