Running ZFS over iSCSI as a VMware vmfs store

The first time I looked at ZFS it totally floored me. This is a file system that has changed the rules of storage systems as we know them, and it continues to do so. It is without doubt the best architecture to date, and now you can use it for your VMware stores.

Previously I had explored using it for a VMware store but ran into several real show stoppers, such as the VPD page response issue, which caused VMware to see only one usable iSCSI store. Things will be very different once Sun releases snv_93 or later to everyone. I am currently using the unreleased snv_93 iscsitgt code, and it works with VMware in every way you would want. Many thanks to the Sun engineers for adding NAA support to the iSCSI target service. With that said, let me divulge the details and behaviors of the first successful real-world X4500 ZFS iSCSI VMware implementation.

Let's look at the architectural view first.

X4500 iSCSI Architecture by Mike La Spina

The architecture follows a best-practice approach, with completely separate physical networks for the iSCSI storage data plane. All components have redundant power and network connectivity. The iSCSI storage backplane is configured with an aggregate and is VLAN’d off from the server management network. An inter-switch link (ISL) is defined between the physical HP 2900 switches but is not critical; it allows for more available data paths if additional interfaces are assigned on the ESX host side.
The Opensolaris aggregate and network components are configured as follows:

For those of you using Indiana: nwam is enabled by default, so it needs to be disabled and the default physical network service enabled.

svcadm disable svc:/network/physical:nwam
svcadm enable svc:/network/physical:default

The aggregate is defined using the data link administration utility, dladm, but first any existing bindings need to be cleared by unplumbing the interfaces.

e.g. ifconfig e1000g0 unplumb

Once they are cleared, the physical devices can be assigned using the following commands:

dladm create-aggr -d e1000g0 -d e1000g1 -P L2,L3 1
dladm create-aggr -d e1000g2 -d e1000g3 -P L2,L3 2

Here we have set the load-balancing policy to hash on layer 2 and layer 3 headers and defined two aggregates, aggr1 and aggr2. We can now define the VLAN-based interfaces, shown here as VLAN 500 on instances 1 and 2, matching the two aggr instances. Just apply the following formula to name each VLAN interface:

(Adapter Name) + (VLAN ID * 1000 + Adapter Instance), e.g. aggr + (500 * 1000 + 1) = aggr500001

ifconfig aggr500001 plumb up netmask
ifconfig aggr500002 plumb up netmask
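To make the naming rule concrete, here is a minimal shell sketch that computes the VLAN interface name and prints, without executing, the unplumb, dladm and ifconfig commands for one aggregate. The functions vlan_ifname and aggr_cmds are hypothetical helpers, and <address> and <netmask> are placeholders for your own values.

```shell
#!/bin/sh
# Hypothetical helpers: print (never run) the aggregate and VLAN setup.

# vlan_ifname DRIVER VLAN_ID INSTANCE
# Name = driver name followed by VLAN_ID * 1000 + INSTANCE, e.g. aggr500001.
vlan_ifname() {
    echo "$1$(( $2 * 1000 + $3 ))"
}

# aggr_cmds KEY VLAN_ID NIC1 NIC2
# Prints the command sequence for one two-port aggregate and its VLAN
# interface. <address> and <netmask> are placeholders to fill in yourself.
aggr_cmds() {
    key=$1 vlan=$2 nic1=$3 nic2=$4
    echo "ifconfig $nic1 unplumb"
    echo "ifconfig $nic2 unplumb"
    echo "dladm create-aggr -d $nic1 -d $nic2 -P L2,L3 $key"
    echo "ifconfig $(vlan_ifname aggr "$vlan" "$key") plumb <address> netmask <netmask> up"
}
```

For example, aggr_cmds 2 500 e1000g2 e1000g3 prints the sequence for aggr2, ending with an ifconfig of aggr500002. Printing rather than executing leaves room to review the commands before running them as root.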

To persist the network configuration on boot you will need to create hostname files and hosts entries for the services to apply on startup.

echo ss1.iscsi1 > /etc/hostname.aggr500001
echo ss1.iscsi2 > /etc/hostname.aggr500002

Edit /etc/hosts to have the following host entries.

::1 localhost ss1.local localhost loghost ss1 ss1.iscsi1 ss1.iscsi2
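A small sketch of the persistence step, assuming the hostname-file convention above: persist_hostname writes one hostname.<interface> file, and has_hosts_entry checks that the name really appears as a host field in the hosts file, since a hostname file holding a name only works if that name resolves at boot. Both helpers and the ETC_DIR variable are hypothetical; ETC_DIR defaults to /etc and can point at a scratch directory for a dry run.

```shell
#!/bin/sh
# Hypothetical helpers for the boot-time persistence step.
# ETC_DIR defaults to /etc; set it to a scratch directory to try things out.
ETC_DIR=${ETC_DIR:-/etc}

# persist_hostname INTERFACE NAME -> writes $ETC_DIR/hostname.INTERFACE
persist_hostname() {
    echo "$2" > "$ETC_DIR/hostname.$1"
}

# has_hosts_entry NAME -> succeeds if NAME is an exact host field
# (field 2 onward) on a non-comment line of $ETC_DIR/hosts.
has_hosts_entry() {
    awk -v name="$1" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$ETC_DIR/hosts"
}
```

Usage: persist_hostname aggr500001 ss1.iscsi1, then has_hosts_entry ss1.iscsi1 || echo "add ss1.iscsi1 to /etc/hosts".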

On the HP switches it's a simple static trunk definition on ports 1 and 2, using the following at the CLI:

trunk 1-2 trk1 trunk 

Once all the networking components are up, running, and persistent, it's time to define the ZFS store and iSCSI targets. I chose to include both mirrored and raidz pools. I needed to find and organize the cXtXdX device names using the cfgadm command; if you're not using an X4500, you could also issue a format command to see the controller, target, and disk names. I placed the raidz devices across controllers to improve I/O and distribute the load. It would not be prudent to place one array on a single SATA controller. Here is what it ends up looking like from the ZFS command view:

zpool create -f rp1 raidz1 c4t0d0 c4t6d0 c5t4d0 c8t2d0 c9t1d0 c10t1d0
zpool add rp1 raidz1 c4t1d0 c4t7d0 c5t5d0 c8t3d0 c9t2d0 c10t2d0
zpool add rp1 raidz1 c4t2d0 c5t0d0 c5t6d0 c8t4d0 c9t3d0 c10t3d0
zpool add rp1 raidz1 c4t3d0 c5t1d0 c5t7d0 c8t5d0 c9t5d0 c11t0d0
zpool add rp1 raidz1 c4t4d0 c5t2d0 c8t0d0 c8t6d0 c9t6d0 c11t1d0
zpool add rp1 raidz1 c4t5d0 c5t3d0 c8t1d0 c8t7d0 c10t0d0 c11t2d0
zpool add rp1 spare c11t3d0
zpool create -f mp1 mirror c10t4d0 c11t4d0
zpool add mp1 mirror c10t5d0 c11t5d0
zpool add mp1 mirror c10t6d0 c11t6d0
zpool add mp1 spare c9t7d0
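Since the point of the layout is to spread each raidz1 set across controllers, a quick check helps when you build your own device list. This sketch uses a hypothetical helper, max_per_controller, that takes the devices of one vdev and prints the largest number of them sharing a controller (the cN prefix of a cXtYdZ name).

```shell
#!/bin/sh
# Hypothetical helper: how many disks of one vdev share a controller?
# Assumes cXtYdZ device names; the controller is everything before the "t".

# max_per_controller DEV... -> prints the largest per-controller count
max_per_controller() {
    for dev in "$@"; do
        echo "${dev%%t*}"              # c10t1d0 -> c10
    done | sort | uniq -c | awk '$1 > max { max = $1 } END { print max + 0 }'
}
```

For instance, max_per_controller c4t0d0 c4t6d0 c5t4d0 c8t2d0 c9t1d0 c10t1d0 prints 2, since c4t0d0 and c4t6d0 share controller c4. A result of 1 means every disk of the vdev is on its own controller; with single-parity raidz1, a controller holding two disks of the same vdev is a component whose failure alone can fault that vdev.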

It only takes seconds to create terabytes of storage; it truly is a thing of beauty (geek!). Now it's time to define a few ZFS datasets and volumes in preparation for creating the iSCSI targets. I chose to create units of 750G, since VMware would not perform well with much more than that. This depends somewhat on the size of the VMs and the type of I/O, but an ESX host will generally serve a wide mix, so I try to keep each LUN to a reasonable size; otherwise you end up with SCSI reservation issues (that's a bad thing, chief).

You must also consider the I/O block size before creating a ZFS volume, because it cannot be changed later, so now is the time. It is set by adding -b 64K to the zfs create command. I chose a 64K block size, which aligns with the VMware default allocation size and thus optimizes performance. The -s option creates a sparse volume, aka thin provisioning. In this case the space was available, but it is my favorite way to allocate storage.

zfs create rp1/iscsi
zfs create -s -b 64K -V 750G rp1/iscsi/lun0
zfs create -s -b 64K -V 750G rp1/iscsi/lun1
zfs create -s -b 64K -V 750G rp1/iscsi/lun2
zfs create -s -b 64K -V 750G rp1/iscsi/lun3
zfs create mp1/iscsi
zfs create -s -b 64K -V 750G mp1/iscsi/lun0
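The volume creation above can also be scripted. This is a minimal sketch under a few assumptions: make_luns, run, to_bytes and valid_volblocksize are hypothetical helpers, DRY_RUN=1 (the default here) only prints the commands, and the accepted block sizes are powers of two from 512 bytes to 128K, assumed to be the volblocksize range for ZFS builds of this era. Validating up front matters because the block size cannot be changed once the volume exists.

```shell
#!/bin/sh
# Hypothetical helpers for scripting the sparse zvol creation.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# to_bytes 64K -> 65536 (plain byte counts and a K suffix only)
to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

# valid_volblocksize SIZE -> succeeds for a power of two from 512 to 128K
# (assumed range for ZFS builds of this era).
valid_volblocksize() {
    n=$(to_bytes "$1")
    [ "$n" -ge 512 ] && [ "$n" -le 131072 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

# make_luns POOL COUNT [SIZE] [BLOCKSIZE] -> POOL/iscsi plus lun0..lun(COUNT-1)
make_luns() {
    pool=$1 count=$2 size=${3:-750G} bs=${4:-64K}
    valid_volblocksize "$bs" || { echo "bad volblocksize: $bs" >&2; return 1; }
    run zfs create "$pool/iscsi"
    i=0
    while [ "$i" -lt "$count" ]; do
        run zfs create -s -b "$bs" -V "$size" "$pool/iscsi/lun$i"
        i=$((i + 1))
    done
}
```

make_luns rp1 4 prints the five rp1 commands above; DRY_RUN=0 make_luns mp1 1 would actually create mp1/iscsi and mp1/iscsi/lun0.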

Originally I wanted to build the ESX hosts using a local disk, but thanks to some bad IBM x346 engineering I could not use the QLA4050C together with the integrated Adaptec controller in the ESX host server hardware. So I decided to give boot from iSCSI a go, and here is the boot LUN definition I used for it. Note that the original architectural design calls for local disk, to prevent an ESX host failure in the event of an iSCSI path outage.

zfs create rp1/iscsi/boot
zfs create -s -V 16G rp1/iscsi/boot/esx1

Now that the ZFS stores are complete, we can create the iSCSI targets for the ESX hosts to use. I named each target alias to reflect something about the storage system, which makes the targets easier to work with. I also created an iSCSI configuration store so that the iSCSI targets persist across reboots. (This may now be included with OpenSolaris Indiana, but I have not tested it.)

mkdir /etc/iscsi/config
iscsitadm modify admin --base-directory /etc/iscsi/config
iscsitadm create target -u 0 -b /dev/zvol/rdsk/rp1/iscsi/lun0 ss1-zrp1
iscsitadm create target -u 1 -b /dev/zvol/rdsk/rp1/iscsi/lun1 ss1-zrp1
iscsitadm create target -u 2 -b /dev/zvol/rdsk/rp1/iscsi/lun2 ss1-zrp1
iscsitadm create target -u 3 -b /dev/zvol/rdsk/rp1/iscsi/lun3 ss1-zrp1
iscsitadm create target -b /dev/zvol/rdsk/mp1/iscsi/lun0 ss1-zmp1
iscsitadm create target -b /dev/zvol/rdsk/rp1/iscsi/boot/esx1 ss1-esx1-boot

Most blog examples of enabling targets show the ZFS command-line method, shareiscsi=on. This works well for creating a new IQN, but if you want to allocate additional LUNs under that IQN, you need to use the -b backing-store method used above.
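When several zvols share one target name, the LUN numbers simply count up from 0. Here is a sketch with a hypothetical target_cmds helper that prints the iscsitadm commands for one target, so the numbering stays consistent as volumes are added.

```shell
#!/bin/sh
# Hypothetical helper: print one "iscsitadm create target" line per zvol,
# all under the same target name, numbering LUNs from 0 in argument order.

# target_cmds TARGET ZVOL...
target_cmds() {
    target=$1
    shift
    lun=0
    for zvol in "$@"; do
        echo "iscsitadm create target -u $lun -b /dev/zvol/rdsk/$zvol $target"
        lun=$((lun + 1))
    done
}
```

target_cmds ss1-zrp1 rp1/iscsi/lun0 rp1/iscsi/lun1 rp1/iscsi/lun2 rp1/iscsi/lun3 reproduces the four ss1-zrp1 lines above; pipe the output to sh once it looks right.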

Now that we have some targets you should be able to list them using:

iscsitadm list target

Notice that we only see one IQN for ss1-zrp1; you can use the -v option to show all the LUNs if required.

Target: ss1-zrp1
iSCSI Name:
Connections: 2
Target: ss1-zmp1
iSCSI Name:
Connections: 2
Target: ss1-esx1-boot
iSCSI Name:
Connections: 2
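With more targets, a one-line summary per target makes it easy to spot one with no sessions. This sketch assumes the non-verbose Target:/Connections: layout shown above; summarise_targets is a hypothetical awk wrapper, not part of iscsitadm.

```shell
#!/bin/sh
# Hypothetical filter: turn non-verbose "iscsitadm list target" output into
# one "TARGET CONNECTIONS" line per target.
summarise_targets() {
    awk '$1 == "Target:"      { target = $2 }
         $1 == "Connections:" { print target, $2 }'
}
```

Usage: iscsitadm list target | summarise_targets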

It would be prudent to create some initiator entries on the target so you can control which initiator IQNs are authorized to connect to a particular target. Each entry gives an initiator IQN a short local name, such as esx1.1 and esx1.2, which the target ACLs then refer to.
This is an important step: it lets you use CHAP, or at least allow only named IQNs to connect to that target. iSNS also provides a similar service.

iscsitadm create initiator --iqn esx1.1
iscsitadm create initiator --iqn esx1.2

Now we can assign these initiators to a target, and the target will then accept only those initiators. You can also add CHAP authentication, but that's beyond the scope of this blog.

iscsitadm modify target --acl esx1.1 ss1-esx1-boot
iscsitadm modify target --acl esx1.2 ss1-esx1-boot
iscsitadm modify target --acl esx1.1 ss1-zrp1
iscsitadm modify target --acl esx1.2 ss1-zrp1
iscsitadm modify target --acl esx1.1 ss1-zmp1
iscsitadm modify target --acl esx1.2 ss1-zmp1
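The ACL assignments are a simple cross product of initiators and targets. Here is a sketch with a hypothetical acl_cmds helper that prints one iscsitadm modify target line per pair, in the same order as above.

```shell
#!/bin/sh
# Hypothetical helper: print an ACL command for every target/initiator pair.

# acl_cmds "INITIATOR..." "TARGET..."  (space-separated lists)
acl_cmds() {
    for t in $2; do            # unquoted on purpose: split the target list
        for i in $1; do        # likewise for the initiator list
            echo "iscsitadm modify target --acl $i $t"
        done
    done
}
```

acl_cmds "esx1.1 esx1.2" "ss1-esx1-boot ss1-zrp1 ss1-zmp1" prints the six lines above; adding a new ESX host then only means extending the initiator list.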

In order to boot from the target LUN, we need to configure the QLA4050C boot feature. You must do this on the ESX host by pressing Ctrl-Q during the boot cycle. It is simply a matter of entering the primary boot target IP, setting the mode to manual, and entering the IQN exactly as it was listed by the iscsitadm list target command.

Once the iqn is entered the ESX host software can be installed and configured.
Till next time….

