Protecting Active Directory with Snapshot Strategies

Using snapshots to protect Active Directory (AD) without careful planning will most definitely end up in a complete disaster. AD is a loosely consistent distributed multi-master database and it must not be treated as a static system.  Without carefully addressing how AD works with Time Stamps, Version Stamps, Update Sequence Numbers (USNs), Globally Unique Identification numbers (GUIDs), Relative Identification numbers (RIDs),  Security Identifiers (SIDs) and restoration requirements the system could quickly become unusable or severally damaged in the event of an incorrectly invoked snapshot reversion.

There are many negative scenarios that can occur if we were to re-introduce an AD replica to service from a snapshot instance without special handling. In the event of a snapshot based re-introduction the RID functional component is seriously impacted. In any AD system RIDs are created in range blocks and assigned for use to a participating Domain Controller (DC) by the RID master DC AD role. RIDs are used to create SIDs for all AD objects like Group or User objects and they must all be unique. Lets take a closer look at the SID to understand why RIDs are such a critical function.

A SID is composed with the following symbolic format: S-R-IA-SA-RID:

  • S: Indicates the type of value is a  SID.
  • R: Indicates the revision of the SID.
  • IA: Indicates the issuing authority. Most are the NT Authority identity number 5.
  • SA: Indicates the sub-authority aka domain identifier.
  • RID: Indicates the Relative ID.

Now looking at some real SID example values we see that on a DC instance only the RID component of the SID is unique as show here in red text.

DS0User1      = S-1-5-21-3725033245-1308764377-180088833-3212
DS0UserGroup1 = S-1-5-21-3725033245-1308764377-180088833-7611

When an older snapshot image of a DC is reintroduced it’s assigned RID range will likely have RID entries that were already used to generate SIDs. Those SIDs would have replicated to the other DCs in the AD forest. When the reintroduced DC starts up it will try to participate in replication and servicing authentications of accounts. Depending on the age and configuration of its secure channel the DC could be successfully connected. This snapshot reintroduction event should be avoided since any RID usage from the aged DC will very likely result in duplicated SID creations and is obviously very undesirable.

Under normal AD recovery methods we would either need to restore AD or build a new server and perform a DC promo on it and possibly seize DC roles if required . The most important element of an normal AD restore process is the DC GUID reinitialization function. The DC GUID value reinitialization operation  allows the restoration of an AD DC to occur correctly. A  newly generated GUID becomes part of the Domain Identifier and thus the DC can create SIDs that are unique despite the fact that the RID assignment range it holds may be from a previously used one.

When we use a snapshot image of a guest DC VM none of the required Active Directory restore requirements will occur on  system startup and thus we must manually bring the host online in DSRM mode without a network connection and then set the NTDS restore mode up. I see this as a serious security risk as there a is significant probability that the host could be brought online without these steps occurring and potentially create integrity issues.

One mitigation to this identified risk is to perform the required changes before a snapshot is captured and once the capture is complete revert the change back to the non-restore state. This action will completely prevent a snapshot image of a DC from coming online from a past time reference.

In order to achieve this level of server state and snapshot automation we would need to provision a service channel from our storage head to the involved VMs or for that matter any storage consumer. A service channel can provide other functionality beyond the NDTS state change as well. One example is the ability to flush I/O using VSS or sync etc.

We can now look at a practical example of how to implement this strategy on OpenSolaris based storage heads and W2K3 or W2K8 servers.

The first part of the process is to create the service channel on a VM or any other windows host which can support VB or Power Shell etc. In this specific case we need to provision an SSH Server daemon that will allow us to issue commands directed towards the storage consuming guest VM from the providing storage head. There are many possible products available that can provide this service. I personally like MobaSSH which I will use in this example. Since this is a Domain Controller we need to use the Pro version which supports domain based user authentication from our service channel VM.

We need to create a dedicated user that is a member of the domains BUILTINAdministrators group. This poses a security risk and thus you should mitigate it by restricting this account to only the machines it needs to service.

e.g. in AD restrict it to the DCs or possibly any involved VM’s to be managed and the Service Channel system itself.

Restricting user machine logins

A dedicated user allows us to define authentication from the storage head to the service channel VM  using a trusted ssh RSA key that is mapped to the user instance on both the VM and OpenSolaris storage host. This user will launch any execution process that is issued from the OpenSolaris storage head.

In this example I will use the name scu, which is short for Service Channel User.

First we need to create the scu user on our OpenSolaris storage head.

root@ss1:~# useradd -s /bin/bash -d /export/home/scu -P ‘ZFS File System Management’ scu
root@ss1:~# mkdir /export/home/scu
root@ss1:~# cp /etc/skel/* /export/home/scu
root@ss1:~# echo PATH=/bin:/sbin:/usr/ucb:/etc:. > /export/home/scu/.profile
root@ss1:~# echo export PATH >> /export/home/scu/.profile
root@ss1:~# echo PS1=$’${LOGNAME}@$(/usr/bin/hostname)’~#’ ‘ >> /export/home/scu/.profile

root@ss1:~# chown –R scu /export/home/scu
root@ss1:~# passwd scu

In order to use an RSA key for authentication we must first generate an RSA private/public key pair on the storage head. This is performed using ssh-keygen while logged in as the scu user. You must set the passphrase as blank otherwise the session will prompt for it.

root@ss1:~# su – scu

scu@ss1~#ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/export/home/scu/.ssh/id_rsa):
Created directory ‘/export/home/scu/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /export/home/scu/.ssh/id_rsa.
Your public key has been saved in /export/home/scu/.ssh/
The key fingerprint is:
0c:82:88:fa:46:c7:a2:6c:e2:28:5e:13:0f:a2:38:7f scu@ss1

We now have the public key available in the file named the content of this file must be copied to the target ssh instance file named .ssh/authorized_keys. The private key file named id_rsa MUST NOT be exposed to any other location and should be secured. You do not need to store the private key anywhere else as you can regenerate the pair anytime if required.

Before we can continue we must install and configure the target Service Channel VM with MobaSSH.

Its a simple setup, just download MobaSSH Pro to the target local file system.

Execute it.

Click install.

Configure only the scu domain based user and clear all others from accessing the host.


Moba Domain Users

Once MobaSSH is installed and restarted we can connect to it and finalize the secured ssh session. Don’t forget to add the scu user to your AD domains BUILTINAdministrators group before proceeding.  Also you need to perform an initial NT login to the Service Channel Windows VM using the scu user account prior to using the SSH daemon, this is required to create it’s home directories.

In this step we are using  putty to establish an ssh session to the Service Channel VM and then secure shelling to the storage server named ss1. Then we transfer the public key back to our self using scp and exit host ss1. Finally we use cat to append the public key file content to our  .ssh/authorized_key file in the scu users profile. Once these steps are complete we can establish an automated prompt less secured encrypted session from ss1 to the Service Channel Windows NT VM.

[Fri Dec 18 - 19:47:24] ~
[scu.ws0] $ ssh ss1
The authenticity of host ‘ss1 (’ can’t be established.
RSA key fingerprint is 5a:64:ea:d4:fd:e5:b6:bf:43:0f:15:eb:66:99:63:6b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘ss1,′ (RSA) to the list of known hosts.
Last login: Fri Dec 18 19:47:28 2009 from
Sun Microsystems Inc.   SunOS 5.11      snv_128 November 2008

scu@ss1~#scp .ssh/ ws0:/home/scu/.ssh/
scu@ws0′s password:           100% |*****************************|   217       00:00

[Fri Dec 18 - 19:48:09]
[scu.ws0] $ cat .ssh/ >> .ssh/authorized_keys

With our automated RSA key password definition completed we can proceed to customize the MobaSSH service instance to run as the scu user. We need to perform this modification in order to enable VB script WMI DCOM impersonate caller rights when instantiating objects. In this case we are calling a remote regedit object over WMI and modifying the NTDS service registry start up values and thus this can only be performed by an administrator account. This modification essentially extends the storage hosts capabilities to reach any Windows host that need integral system management function calls.

On our OpenSolaris Storage head we need to invoke a script which will remotely change the NTDS service state and then locally snapshot the provisioned storage  and lastly return the NTDS service back to a normal state.  To accomplish this function we will define a cron job. The cron job needs some basic configuration steps as follows.

The is required to submit a cron job, this allows us to create the job but not administer the cron service.
If an /etc/cron.d/cron.allow file exists then this RBAC setting will be overridden by the files existence and you will need to add the user to that file or convert to the best practice methods of RBAC.

root@ss1~# usermod -A scu
root@ss1~# crontab –e scu
59 23 * * * ./

Hint: crontab uses vi –  “vi cheat sheet”

The key sequence would be hit “i” and key in the line then hit “esc :wq” and to abort “esc :q!”

Be aware of the timezone the cron service runs under, you should check it and adjust it if required. Here is a example of whats required to set it.

root@ss1~# pargs -e `pgrep -f /usr/sbin/cron`

8550:   /usr/sbin/cron
envp[0]: LOGNAME=root
envp[1]: _=/usr/sbin/cron
envp[2]: LANG=en_US.UTF-8
envp[3]: PATH=/usr/sbin:/usr/bin
envp[4]: PWD=/root
envp[5]: SMF_FMRI=svc:/system/cron:default
envp[6]: SMF_METHOD=start
envp[7]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[8]: SMF_ZONENAME=global
envp[9]: TZ=PST8PDT

Let’s change it to CST6CDT

root@ss1~# svccfg -s system/cron:default setenv TZ CST6DST

Also the default environment path for cron may cause some script “command not found” issues, check for a path and adjust it if required.

root@ss1~# cat /etc/default/cron
# Copyright 1991 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#pragma ident   “%Z%%M% %I%     %E% SMI”

This one has no default path, add the path using echo.

root@ss1~# echo PATH=/usr/bin:/usr/sbin:/usr/ucb:/etc:. > /etc/default/cron
# svcadm refresh cron
# svcadm restart cron

With a cron job defined to run the script named in the default home directory of the scu user we are now ready to create the script content. Our OpenSolaris storage host needs to call a batch file on the remote Service Channel VM and it will execute  a vbscript from there to set the NTDS start up mode . To do this from a unix bash script we will use the following statements in the file.

ssh -t ws0 NTDS-PreSnapshot.bat
snap_date=”$(date +%d-%m-%y-%H:%M)”
pfexec zfs snapshot rp1/san/vol1@$snap_date
ssh -t ws0 NTDS-PostSnapshot.bat

Here we are running a secure shell call to the MobaSSH daemon with a -t option which runs the tty screen locally and this allows use to issue an “exit” from the remote calling script closing the secure shell. On the Service Channel VM the followng batch file vbscript calls are executed using the pre and post batch files illustrated as follows.

scu Batch Files

cscript NTDS-SnapshotRestoreModeOn.vbs DS0

cscript NTDS-SnapshotRestoreModeOff.vbs DS0


strComputer = Wscript.Arguments(0)  
const HKLM=&H80000002
Set oregService=GetObject(“WinMgmts:{impersonationLevel=impersonate}!\” & strComputer & “rootdefault:stdRegProv”)
oregService.SetDWordValue HKLM, “SYSTEMCurrentControlSetServicesntdsparameters”, “Database restored from   backup”, 1
Set oregService=Nothing


strComputer = Wscript.Arguments(0)  
const HKLM=&H80000002
Set oregService=GetObject(“WinMgmts:{impersonationLevel=impersonate}!\” & strComputer & “rootdefault:stdRegProv”)
oregService.SetDWordValue HKLM, “SYSTEMCurrentControlSetServicesntdsparameters”, “Database restored from   backup”, 0
Set oregService=Nothing

We now have Windows integrated storage volume snapshot functionality that allows an Active Directory domain controller to be securely protected using a snapshot strategy. In the event we need to fail back to a previous point in time there will be no danger that the snapshot will cause AD corruption. The integration process has other highly desirable capabilities such as the ability to call VSS snapshots and any other application backup preparatory function calls.  We could also branch out using more sophisticated PowerShell calls to VMware hosts in a fully automated recovery strategy using ZFS replication and remote sites.

Hope you enjoyed this entry.

Seasons Greetings to All.




Site Contents: © 2009  Mike La Spina

Controlling Snapshot Noise

The ability to perform file system, database and volume snapshots grants us many data protection benefits. However there are some serious problems that can occur if we do not carefully architect snapshot based storage infrastructures. This blog entry will discuss some of the issues with data noise induction and data integrity when using point in time data snapshot activities  and how we can reduce the negative aspects of these data protection methods.

With the emergence of snapshot technology in the data center data noise induction is an unwanted by product and needs to addressed. Active data within a file store or raw volume will have significant amounts of temporary data like  memory swaps and application temp files. This type of data is required for system operations but unfortunately it is completely useless within a point in time snapshot and simply consumes valuable storage space with no permanent value within the scope of system data protection. There are many sources of this undesirable data noise that we need to consider and define strategies to isolate and eliminate them where possible.

In some cases using raw stores eg. iSCSI, FC etc. we will have duplicate snapshot functionality points in the storage stream and this further complicates how we approach a solution to noise induction issues. One common example of snapshot functionality duplication is Microsoft Windows 2003 Volume Snapshot Service aka. VSS. If we enable VSS and an external snapshot service is employed then we are now provisioning snapshots of snapshots which of course is less than optimal because much of the delta between points in time are just redundant encapsulated data. There are some higher level advantages to allowing this to occur like provisioning self service end user restores and VSS aware file system quiescence but for the most part it is not optimal from a space consumption or performance and efficiency  perspective.

If we perform snapshots at multiple points in the data storage stream using VSS we will have three points of data delta. The changed data elements on the source store of the primary files, a copy-on-write set of  changed blocks of the same primary store including it’s meta data and finally the external snapshot delta and it’s meta data. As well if the two snapshot events were to occur at the same time it creates a non-integral copy of a snapshot and meta data which is just pure wasted space since it is inconsistent.

With the co-existing use of VSS we need to define what functionality is important to us. For example VSS is limited to 512 snapshots and 64 versions so if we need to exceed these limits we have to employ an external snapshot facility. Or perhaps we need to allow a user self service file restore functionality from a shared folder. In many cases this functionality is provided by the external snapshot provisioning point. OpenSolaris, EMC and NetApp are some examples of storage products that can provide such functionality. Of course my preference is Custom OpenSolaris storage servers or the S7000 series of storage product which is based on OpenSolaris and is well suited for the formally supported side of things.

Solely provisioning the snapshots externally verses native MS VSS can significantly reduce induced data noise if the external provider supports VSS features or provides tools to control VSS. VSS copy on write snapshot capability should not be active when using this strategy so as to eliminate the undesirable snapshot echo noise. Most environments will find that they require the use of snapshot services that exceed the native MS VSS capabilities.  Provisioning the snapshot function directly on a shared storage system is a significantly better strategy verses allowing a distributed deployment of storage management points across your infrastructure.

OpenSolaris and ZFS provides superior depth in snapshot provisioning than Microsoft shared folder snapshot copy services. Implementing ZFS dramatically reduces space consumption and allows snapshots up to the maximum capacity of the storage system and OpenSolaris provides MS SMB client access to the snapshots which users can manage recovery as a self service. By employing ZFS as a backing store, snapshot management is simplified and snapshots are  available for export to alternate share points by cloning and provisioning the point in time to data consumers performing a multitude of desirable tasks such as audits, validation, analysis and testing.

If we need to employ MS VSS snapshot services provisioned on a storage server that uses snapshot based data protection strategies  then we will need prevent  continuous snapshots on the storage server. We can use features like snap mirror and zfs replication to provision a replica of the data however this would need to be strictly limited in count e.g. 2 or 3 versions and timed to avoid a multiple system snapshot time collision. Older snapshots should be purged and we only allow the MS VSS snapshot provisioning to keep the data deltas.

Another common source of snapshot noise is temporary file data or memory swaps. Fortunately with this type of noise the solution is relatively easy to solve as we simply isolate this type of storage onto storage volumes or shares that are explicitly excluded from a snapshot service function. For example if we are using VMFS stores we can place vswp files on a designated VMFS volume and conversely within an operating system we can create a separate vmdk disk that maps to a VMFS volume which we also exclude from the snapshot function. This strategy requires that we ensure that any replication scheme incorporates the creation or one time replication of these volumes. Unfortunately this methodology does not play well with storage vmotion so one must ensure that the a relocation does not move the noisy vmdk’s back into the snapshot service provisioned stores.

VMware VMFS volume VM snapshots is a significant source of data noise. When a snapshot is initiated from within VMware all data writes are placed on delta file instances. These delta files will be captured on the external storage systems snapshot points and will remain there after the VM snapshot is removed. Significant amount of data delta are produced by VM based snapshots and sometimes mulitple deltas can exceed original vmdk size. An easy way to prevent this undesirable impact is to clone the VM to a store outside the snapshot provisioned stores rather than invoking snapshots.

Databases are probably the most challenging source of snapshot noise data and requires a different strategy than isolation because the data within a specific snapshot is all required to provide system integrity. For example we cannot isolate SQL log data because it is required to do a crash recovery or to roll forward etc.  We can isolate temp database stores since any data in those date stores would not be valid in a crash recovery state.

One strategy that I use as both a blanket method when we are not are able to use other methods and in concert with the previously discussed isolation methods is a snapshot roll-up function. This strategy simply reduces the number of long term snapshot copies that are kept. The format is based on a Grand Father, Father and Son (GFS) retention chain of the snapshot copies and is well suited for a variety of data types. The effect is to provide a reasonable amount of data protection points to satisfy most computing environments and keep the captured noise to a manageable value. For example if we were to snapshot without any management cycle every 15 minutes we would accumulate ~35,000 delta points of data over the period of 1 year. Conversely if we employ the GFS method we will accumulate 294 delta points of data over the period of 1 year. Obviously the consumption of storage resource is so greatly reduced that  we could keep many additional key points in time if we wished and still maintain a balance of recovery point verses consumption rate.

Let’s take a look at a simple real example of how snapshot noise can impact our storage system using VMware, OpenSolaris and ZFS block based iSCSI volume snapshots. In this example we have a simple Windows Vista VM that is sitting idle, in other words only the OS is loaded and it is power on and running.

First we take a ZFS snapshot of the VMFS ZFS iSCSI volume.

zfs snapshot sp1/ss1-vol0@beforevmsnap

Now we invoke a VMware based snapshot and have a look at the result.

root@ss1:~# zfs list -t snapshot
NAME                               USED  AVAIL  REFER  MOUNTPOINT
sp1/ss1-vol0@beforevmsnap          228M      -  79.4G  -

Keep in mind that we are not modifying any data files in this VM if we were to change data files the deltas would be much larger and in many cases with multiple VMware snapshots could exceed the VMs original size if it is allowed to remain for long periods of weeks and longer. The backing store snapshot initially consumes 228MB which will continue to grow as changes occurs on the volume. A significant part of the 228MB is the VMs memory image in this case and of course it has no permanent storage value.

sp1/ss1-vol0@after1stvmsnap          1.44M      -  79.5G  -

After the initial VMware snapshot occurs we create a new point in time ZFS snapshot and here we observe some noise in the next snapshot and again we have not changed any data files in the last minute or so.

sp1/ss1-vol0@after2ndvmsnap       1.78M      -  79.5G  -

And yet another ZFS snapshot a couple of minutes later shows more snapshot noise accumulation. This is one of the many issues that are present when we allow non-discretionary placement of files and temporary storage on snapshot based systems.

Now lets see the impact of destroying a snapshot that was created after we delete the VMware based snapshot.

root@ss1:~# zfs destroy sp1/ss1-vol0@beforevmsnap
root@ss1:~# zfs list -t snapshot
NAME                               USED  AVAIL  REFER  MOUNTPOINT
sp1/ss1-vol0@after1stvmsnap       19.6M      -  79.2G  -
sp1/ss1-vol0@after2ndvmsnap       1.78M      -  79.2G  -

Here we observe the reclamation of more than 200MB of storage. And this is why GFS based snapshot rollups can provide some level of noise control.

Well I hope you found this entry to be useful.

Til next time..





Site Contents: © 2009  Mike La Spina

Multi Protocol Storage Provisioning with COMSTAR

 COMSTAR is a new breed of open source storage product available to the world. What was traditionally a closed and proprietary storage capability is now available to our open source communities. With OpenSolaris and COMSTAR the ability to freely provision virtual storage services over very mature high end protocols on standard commodity server hardware is now a reality. High performance transports are integral within the feature sets of COMSTAR and Sun’s open source portfolio of projects. The COMSTAR product is revolutionary in its method of provisioning storage virtualization and transport services to storage resource consumers.
COMSTAR provisions virtualized SCSI block storage over multiple SCSI transport protocols. While this function class is not new to us the ease of implementation using COMSTAR certainly is. All the complexities of using a multi protocol target services platform are cleaned up. It is simple to use and facilitates advanced high performance storage provisioning at the block level.
The services within this product have multiple common storage provisioning applications. One very interesting application is a storage gateway server and this blog demonstrates howto build a Fiber Channel (FC) storage gateway using the COMSTAR service layers and as well provision some additional features using the target services.


 COMSTAR FC Gateway Architecture by Mike La Spina

In this example instance we are re-provisioning an existing storage system with an OpenSolaris COMSTAR configuration running on a commodity white box which functions as a storage server head that can compress, scrub, thin provision, replicate, snapshot and clone the existing block storage attachments. The example FC based storage could also be comprised of a JBOD FC array directly attached to the OpenSolaris storage head if we so desired or many other commonly available SCSI attachment methods. The objective here is to extend and enhance any block storage system with high performance transports and virtualization features. Of course we could also formalize the white box to an industrial strength host once we are satisfied that the proof of concept is mature and optimal.  

The reality is that many older existing FC storage systems are installed without these features primarily due to the excessive licensing costs of them. And even when these features are available, its use is probably restricted to like proprietary systems thus obsolescing the entire lot of any useful future functionality. But what if you could re-purpose an older storage system to act as a DR store or backup cache system or maybe a test and development environment. With today’s economy this is from a cost perspective, very attractive and can be accomplished with very little risk on the investment side.

One of the possible applications for this flexible storage service is the re-provisioning of existing LUN’s from an existing system to newer more flexible SCSI transport protocols. This is particularly useful when we need to re-target the existing storage system from FC to iSCSI or the likes of. We can begin by exploring this functionality and explain how COMSTAR can provide us with this service.

First we need to understand the high level functionality of the COMSTAR service layers. Virtual LUN’s on COMSTAR are provisioned with a service layer named the LU provider. This layer maps backing stores of various types to a storage GUID assignment and additionally defines other properties like the LUN ID and size dimensions. This layer allows us to carve out the available block storage devices that are accessible on our OpenSolaris storage host. For example if we attached an FC Initiator to an external storage system we can then map the accessible SCSI block devices to the LU provider layer and then present this virtualized LUN to the other COMSTAR service layers for further processing.

Once we have defined the LU’s we can present this storage resource to the SCSI Target Mode Framework Service (STMF) layer which acts as the storage gate keeper. At this layer we define which clients (initiators) can connect to the LU’s based on Membership of Target Groups and Host Groups that are assigned logical views of the LU(s). The STMF layer routes the defined LU(s) as SCSI targets over a multiprotocol interface connection pool to a Port Provider. Port Providers are the protocol connection service instances which can be the likes of FC, iSCSI, SAS, iSER, FCoE and so on. 

With these COMSTAR basics in mind let us begin by diving into some of the details of how this can be applied.  

Sun has detailed howto setup COMSTAR at so no need to re-invent the wheel here.

Just as a note SXCE snv_103 and up integrate the COMSTAR FC and iSCSI port provider code. With the COMSTAR software components and FC target setup we can demonstrate the re-provisioning of an existing FC based storage server. Since I don’t have the luxury of having a proprietary storage server at home I will emulate this storage using an additional COMSTAR white box to act as the FC storage target to be re-provisioned.

On the existing FC target system we need to create Raid0 arrays of three disks each which will total up to a set of six trios. We will use these six non-fault tolerant disk groups as vdevs for a ZFS raidz2 group. This will allow us to create fault tolerant arrays from the existing storage server. The reasons for sets of three Raid0 groupings are to reduce the possibility of reaching the LUN maximums of the proprietary storage system and also we do not want to erode the performance by layering Raid 5 groups. As well we can tolerate a disk failure in two of the trios since we have Raidz2 across the Raid0 trio groups. Additionally using these Raid0 disk groups actually lowers the array failure probability rate. For example if a second disk were to failure in a single Raid0 set there would be no additional impact to other trios, thus reducing the overall failure rate. 

To create the emulated FC storage system I have defined the following 16G ZFS sparse volumes respectively named trio1 through trio6 each as a representation of the 3 disk Raid0 spanned LUN on a source storage host named ss1. 

root@ss1:~# zfs create sp1/gw
root@ss1:~# zfs create -s -V 16G sp1/gw/trio1
root@ss1:~# zfs create -s -V 16G sp1/gw/trio2
root@ss1:~# zfs create -s -V 16G sp1/gw/trio3
root@ss1:~# zfs create -s -V 16G sp1/gw/trio4
root@ss1:~# zfs create -s -V 16G sp1/gw/trio5
root@ss1:~# zfs create -s -V 16G sp1/gw/trio6

Once these mockup volumes are created they are then defined as backing stores using the sbdadm utility as follows.

root@ss1:~# sbdadm create-lu /dev/zvol/rdsk/sp1/gw/trio1

Created the following LU:

              GUID                    DATA SIZE           SOURCE
——————————–  ——————-  —————-
600144f01eb3862c0000494b55cd0001      17179803648      /dev/zvol/rdsk/sp1/gw/trio1

All the backing stores were added to the LU provider service layer to which in turn were assigned to the STMF service layer. Here we can see the automatically generated GUID’s that are assigned to the ZFS backing stores.

root@ss1:~# sbdadm list-lu

Found 6 LU(s)

              GUID                    DATA SIZE           SOURCE
——————————–  ——————-  —————-
600144f01eb3862c0000494b56000006      17179803648      /dev/zvol/rdsk/sp1/gw/trio6
600144f01eb3862c0000494b55fd0005      17179803648      /dev/zvol/rdsk/sp1/gw/trio5
600144f01eb3862c0000494b55fa0004      17179803648      /dev/zvol/rdsk/sp1/gw/trio4
600144f01eb3862c0000494b55f80003      17179803648      /dev/zvol/rdsk/sp1/gw/trio3
600144f01eb3862c0000494b55f50002      17179803648      /dev/zvol/rdsk/sp1/gw/trio2
600144f01eb3862c0000494b55cd0001      17179803648      /dev/zvol/rdsk/sp1/gw/trio1

A host group was defined named GW1 and respectively these LU GUID’s were added to the GW1 host group as LU views assigning LUN 0 to 5.

Just as a note the group names are case sensitive. 
root@ss1:~#stmfadm create-hg GW1

Here we assigned the GUID’s a LUN value on the GW1 host group with the -n parm.   

root@ss1:~# stmfadm add-view -h GW1 -n 0 600144F01EB3862C0000494B55CD0001
root@ss1:~# stmfadm add-view -h GW1 -n 1 600144F01EB3862C0000494B55F50002
root@ss1:~# stmfadm add-view -h GW1 -n 2 600144F01EB3862C0000494B55F80003
root@ss1:~# stmfadm add-view -h GW1 -n 3 600144F01EB3862C0000494B55FA0004
root@ss1:~# stmfadm add-view -h GW1 -n 4 600144F01EB3862C0000494B55FD0005
root@ss1:~# stmfadm add-view -h GW1 -n 5 600144F01EB3862C0000494B56000006

With the LU’s now available in a host group view we can add the COMSTAR re-provisioning gateway server FC wwn’s to this host group and it will become available as a storage resource on the re-provisioning gateway server named ss2. We need to obtain the wwn from the gateway server using the fcinfo hba-port command.  
root@ss2:~# fcinfo hba-port
HBA Port WWN: 210000e08b100163
        Port Mode: Initiator
        Port ID: 10300
        OS Device Name: /dev/cfg/c8
        Manufacturer: QLogic Corp.
        Model: QLA2300
        Firmware Version: 03.03.27
        FCode/BIOS Version:  BIOS: 1.47;
        Serial Number: not available
        Driver Name: qlc
        Driver Version: 20080617-2.30
        Type: N-port
        State: online
        Supported Speeds: 1Gb 2Gb
        Current Speed: 2Gb
        Node WWN: 200000e08b100163
        NPIV Not Supported

Using the stmfadm utility we add the gateway server’s wwn address to the GW1 host group. 
root@ss1:~# stmfadm add-hg-member -g GW1 wwn.210000e08b100163

Once added to ss1 we can see that it is indeed available and online. 
root@ss1:~# stmfadm list-target -v

Target: wwn.2100001B320EFD58
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt2,0
    Sessions          : 1
        Initiator: wwn.210000E08B100163
            Alias: :qlc1
            Logged in since: Fri Dec 19 01:47:07 2008

The cfgadm command will scan for the newly available LUN’s and now we can access the emulated (aka boat anchor) storage system using our gateway server ss2. Of course we could also set up more initiators and access it over a multipath connection.  

cfgadm -a

root@ss2:~# format
Searching for disks…done

       0. c0t600144F01EB3862C0000494B55CD0001d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>

       1. c0t600144F01EB3862C0000494B55F50002d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>

       2. c0t600144F01EB3862C0000494B55F80003d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>

       3. c0t600144F01EB3862C0000494B55FA0004d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>

       4. c0t600144F01EB3862C0000494B55FD0005d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>

       5. c0t600144F01EB3862C0000494B56000006d0 <DEFAULT cyl 2086 alt 2 hd 255 sec 63>

Now that we some FC LUN connections configured from to the storage system to be re-provisioned we can create a ZFS based pool which grants us the ability to carve out the block storage in a virtual manner. As discussed previously we will use raid dp a.k.a. raidz2 to provide a higher level of availability with the zpool create raidz2 option command.

root@ss2:~# zpool create gwrp1 raidz2 c0t600144F01EB3862C0000494B55CD0001d0 c0t600144F01EB3862C0000494B55F50002d0 c0t600144F01EB3862C0000494B55F80003d0 c0t600144F01EB3862C0000494B55FA0004d0 c0t600144F01EB3862C0000494B55FD0005d0 c0t600144F01EB3862C0000494B56000006d0

A quick status check reveals all is well with the ZFS pool.

root@ss2:~# zpool status gwrp1
  pool: gwrp1
 state: ONLINE
 scrub: none requested

        NAME                                       STATE     READ WRITE CKSUM
        gwrp1                                      ONLINE       0     0     0
          raidz2                                   ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55CD0001d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55F50002d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55F80003d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55FA0004d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B55FD0005d0  ONLINE       0     0     0
            c0t600144F01EB3862C0000494B56000006d0  ONLINE       0     0     0

Let’s carve out some of this newly created pool as a 32GB sparse volume. The -p option creates the full path if it does not currently exist.

root@ss2:~# zfs create -p -s -V 32G gwrp1/stores/lun0

root@ss2:~# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
gwrp1                        220K  62.6G  38.0K  /gwrp1
gwrp1/stores                67.9K  62.6G  36.0K  /gwrp1/stores
gwrp1/stores/lun0           32.0K  62.6G  32.0K  -

With a slice of the pool created we can now assign a GUID within the LU Provider layer using the sbdadm utility.

root@ss2:~# sbdadm create-lu /dev/zvol/rdsk/gwrp1/stores/lun0

Created the following LU:

              GUID                    DATA SIZE           SOURCE
——————————–  ——————-  —————-
600144f07ed404000000496813070001      34359672832      /dev/zvol/rdsk/gwrp1/stores/lun0

The LU Provider layer can also provision sparse based storage. However in this case the ZFS backing store is already thin provisioned. If this were a physical disk backing store it would be prudent to use the LU Provider’s sparse/thin provisioning feature. At this point we are ready to create the STMF Host Group and View that will be used to demonstrate a real world example of the multi protocol capability with the COMSTAR OpenStorage ss2 host. In this case I will use VMware ESX as a storage consumer. To reflect the host group type we will name it ESX1 and then we need to add a view for the LU GUID of the virtualized storage.

root@ss2:~# stmfadm create-hg ESX1

root@ss2:~# stmfadm add-view -h ESX1 -n 1 600144f07ed404000000496813070001

root@ss2:~# stmfadm list-view -l 600144F07ED404000000496813070001
View Entry: 0
    Host group   : ESX1
    Target group : All
    LUN          : 1

With a view defined for the VMware hosts let’s add an ESX host FC HBA wwn membership to the defined ESX1 host group. We need to retrieve the wwn from the VMware server using either the console or a Virtual Infrastructure Client GUI. Personally I like the console esxcfg-info tool, however if it’s an ESXi host then the GUI will serve the info just as well.

VMware Screen shot WWN by Mike La Spina

[root@vh1 root]# esxcfg-info -s | grep ‘Adapter WWN’
                     |—-Adapter WWNN…………………………20:00:00:e0:8b:01:f7:e2

root@ss2:~# stmfadm add-hg-member -g ESX1 wwn.210000e08b01f7e2

And the result of this change after we issue a rescan on vmhba1 and create a VMFS volume named ss2-cstar-zs0.0 with the re-provisioned storage is reflected here.

VMware Screen shot VMFS volume by Mike La Spina

This crafted storage is now a thinly provisioned VMFS store that can deliver replication, snapshots, cloning, advanced error detection and can also be re-platformed to a new storage system at a later date using ZFS’s hardware autonomy. The storage server is very attractive as it creates a level of future proofing and insulates the storage consumers from proprietary vendor lock in. But that’s not the best part of this example. Let’s say you wish provide different tiers of connectivity services for your storage consumers. For example we could attach a development or test environment using an iSCSI protocol and the more critical environments can use FC or FCoE based protocol.
So let’s look at how we can add a second SCSI transport protocol to this interesting configuration.
Just as a note the new iSCSI port provider is a kernel based implementation and has superior performance to its predecessor iscsitgt user land implementation.

To add the iSCSI protocol we need to enable the iscsi/target port provider service.





root@ss2:~# svcadm enable iscsi/target

Now we need to create an iSCSI target and iSCSI initiator definition so that we can add the iSCSI initiator to the ESX1 host group. As well we should define a target portal group so we can control what host IP(s) will service this target.

root@ss2:~# itadm create-tpg 2

root@ss2:~# itadm create-target

root@ss2:~# itadm create-target -n -t 2
Target successfully created

By default the iqn will be created as a member of the All targets group.

If we left out the parameters the itadm utility would create an iqn GUID and use the default target portal group of 1. And yes for those familiar with the predecessor iscsitadm utility we can now create a iqn name at the command line.

At this point we need to define the initiator iqn to the iSCSI port provider service and if required additionally secure it using CHAP. We need to retrieve the VMware initiator iqn name from either the Virtual Infrastructure Client GUI or console command line. Just as a note if we did not specify a host group when we defined our view the default would allow any initiator FC, iSCSI or otherwise to connect to the LU and this may have a purpose but generally it is a bad practice to allow in most configurations. Once created the initiator is added to the ESX1 host group thus enables our second access protocol to the same LU.

[root@vh1 root]# esxcfg-info -s | grep ‘iqn’
         |—-ISCSI Name……………………………………
         |—-ISCSI Alias……………………………………

root@ss2:~# itadm create-initiator

root@ss2:~# stmfadm add-hg-member -g ESX1

After adding the ss2 iSCSI interface IP to VMware’s Software iSCSI initiator we now have a multipath multiprotocol connection to our COMSTAR storage host.

 VMware iqn example By Mike La Spina

VMware mpath example by Mike La Spina 

This is simply the most functional and advanced Open Source storage product in the world today. Here we have commodity white boxes serving advanced storage protocols in my home lab, can you imagine what could be done with Data Center class server hardware and Fishworks. You can begin to see the advantages of this future proof platform. As protocols like FCoE, Infiniband and iSER (iSCSI without the TCP session overhead) already working in COMSTAR the Sun Software Engineers and OpenSolaris community are crafting outstanding storage products.

Hope you found this blog to be interesting.







Site Contents: © 2009  Mike La Spina

ZFS Snapshot Rollup Bash Script

As a follow on to my blog entry Provisioning Disaster Recovery with ZFS, iSCSI and VMware I created this snapshot rollup script to help maintain the growing snapshots and minimize disk consumption. The script is an add-on to the zfsadm account cron jobs and runs under the security privileges of the zfsadm user detailed in that blog. An input text file is used to specify what ZFS path’s will be rolled up to a Grandfather Father Son backup scheme. All out of scope snapshots are destroyed leaving the current day’s and week’s snapshots, Friday weekly snapshots of the current month, each month’s end and as well, in time the year end snapshots. The cron job needs to run at minimum on target host but it would be prudent to run it on both systems. The script is aware of the possiblity that a snapshot may be cloned and will detect and log it. To add the job is simply a matter of adding it to the zfsadm users crontab.

# crontab –e zfsadm

0 3 * * * ./ zfsrollup.lst

Hint: crontab uses vi –  “vi cheat sheet”

The key sequence would be hit “i” and key in the line then hit “esc :wq” and to abort “esc :q!”

The job detailed here will run once a day at 3:00 AM which may need to be extended if you have a very slow link between the servers. If you intend to use this script as shown you should follow the additional details for adding a cronjob found in the original blog, items like time zone and the likes of are discussed there.

As well the script expects the gnu based versions of date and expr.

Here are the two files that are required


Hopefully you will find it to be useful.




Site Contents: © 2008  Mike La Spina

Provisioning Disaster Recovery with ZFS, iSCSI and VMware

OpenSolaris, ZFS, iSCSI and VMware are a great combination for provisioning Disaster Recovery (DR) systems at exceptionally low cost. There are some fundamentally well suited features of ZFS and VMFS volumes that provide a relatively simply and very efficient recovery process for VMware hosted non-zero RPO crash consistent recovery based environments. In this weblog I will demonstrate this capability and provide some step by step howto’s for replicating a ZFS, iSCSI and VMFS VMware based environment securely over a WAN or whatever you may have to a single ESXi remote server hosting a child OpenSolaris VM which provisions ZFS and iSCSI VMFS LUN’s to the parent ESXi host. The concept is to wrap the DR services into a single low cost self contained DR box that can be expanded out in the event of an actual DR incident while allowing for regular testing and validation processing without the high costs normally associated with standby DR systems. As one would expect this method becomes a very appealing solution for small to medium businesses who would normally abstain from DR provisioning activity due to the inherently high cost and complexity of DR.

The following diagram illustrates this DR system architecture.


 Disaster Recovery Architecture by Mike La Spina

When we have VMFS volumes backed by iSCSI based ZFS targets we gain the powerful replication capability of ZFS send and receive commands. This ZFS feature procures the ability to send an entire VMFS volume by way of a raw iSCSI target ZFS backing store. And once sent initially we can base all subsequent sends as a delta of change from a previous send snapshot which are respectively referred as snapshot deltas. Thus if we initially snapshot an iSCSI backing store and send the stream to a remote ZFS file system we can then send all the changed object data from that previous snapshot point to the current snapshot point and whatever else may be in between those snapshots. The result is a constant update of VMFS changes from the source ZFS file system to the remote ZFS file system of which can be completely different hardware. This ZFS hardware autonomy gift allows us to provision a much lower cost system on the DR remote side to host the VMFS volumes. For example the target system which is presented in this weblog is an IBM x3500 and the source is a SUN X4500 detailed in a previous blog.

There are some important considerations that should be kept in mind when we use snapshots to create any DR process. One of the most important areas to consider is the data change rates on the VMFS volumes that are to be included in the DR send/receive process. When we have VMware servers or VM’s that have low memory allocations (a.k.a. over committed memory) or application behaviors that swap to disk frequently we will observe high volumes of what I call disk noise or disk data change that has no permanent value. High amounts of disk noise will consume more storage and bandwidth on both systems when snapshots are present. In cases where the disk noise reaches a rate of 1GB/Day or more per volume it would be prudent to isolate the noise sources on a VMFS volume that will not be part of the replication strategy. You could for example create a VMFS LUN for swap and temp files on the local ESX host which can be ignored in the replication scope. Another important area is the growth rate of the overall storage may require routine pruning of older snapshots to reduce the total consumption of disk. For example if we have high change rates from database sources which can not be isolated we can at monthly intervals destroy all but one of the last months snapshots to conserve the available storage on both systems. This method still provisions a good DR process and as well provides a level of continuous data protection (CDP) and is simmilar to a grandfather/father/son preservation scheme.

Since we are handling valuable information we must use secure methods to access and handle the data transfers. This can be provisioned by using ssh and dedicated service accounts that will perform this one specific function. ZFS send and receive functions use an inherently secure approach by employing ssh as a transport tunnel when transmitting storage data to the target ZFS file system. This is just what we need to provision a secure exchange to a DR environment. Conversly we could use IPSec but this would be significantly more complex to achieve and complexity is not a good this when short implementation time is a priority.With this explanation wrapped up in our minds lets begin some of the detailed tasks that are required to procure this real world DR solution.

ESXi Server  

The DR VMware server is based on the free ESXi product and with this we can encapsulate the entire DR functionallity in one server hardware platform. Within the ESXi engine we need to install and configure an OpenSolaris 5.11 snv_98 or higher VM using VMFS as the storage provider. This ESXi server configuration consists of a single SATA boot LUN and this LUN also stores the OpenSolaris iSCSI VM. In addition to the boot LUN we will create the ZFS iSCSI stores on serveral additional SATA disks that are presented to the OpenSolaris VM as separate VMFS datastores which we will use to create large vmdk’s. The virtual vmdk disks will be assigned as vdev’s for our receiving ZFS zpool. Talk about rampent layering. At this point we have a OpenSolaris VM defined on a hardware platform on which OpenSolaris would normally never work with natively in this example. You have goto love what you can do with VMware virtualization. By the way when SUN’s xVM product is more mature it could provision the same fuctionallity with native ZFS provisioning and that alone really is worth a talk, but lets continue our focus on this platform for now.

There are many configuration options available on the network provisioning side of our ESXi host. In this case VLAN’s are definetly a solid choice for this application and is my prefered approach to controlling iSCSI data flow. We initially would only need to provide iSCSI access for the local OpenSolaris VM as this will provision a virtual SAN to the parent ESXi host. The parent ESXi host needs to be able mount the iSCSI target LUN’s that were available in the production environmant and validate that the DR process works. In the event of DR activation we would need to add external ESXi hosts and VLAN’s will provide both locally isolated iSCSI networks with easy expansion if these services are required externally all with out need to purchase external switch hardware for the system until it is required. Thus within the ESXi host we need to define a VLAN for the iSCSI SAN and an isolated VLAN for production VM validations and finally we need to define the replication and management network which can optionally use a VLAN or be untagged depending on your environment.

This virtualized DR environment grants advanced capabilties to perform rich system tests at commodity prices. Very attracive indeed. For example you can now clone the replicated VMFS LUN’s on the DR engine and with a liitle Solaris SMF iSCSI target service magic provision the clone as a duplicated ESX environment which does not impact the ongoing replication. As well we have network isolation and virtualization that allows the environment to exist in a closed fully functional remotely accessible world. This world can also be extended out as a production mirror test environment with dynamic revert back in time and repeat functionallity.

There are many possible ESXi network and disk configurations that would meet the DR server’s requirements. At the minimum we should provision the following elements.

  •  Provision a bootable single separate SATA disk with a minimum of 16G available for the VMFS LUN that will store the OpenSolaris iSCSI VM.
  •  Provision a minimum of three (optimally six) additional SATA disks or more if required as VMFS LUN’s to host the ZFS zpool vdev’s with vmdk’s.
  •  Provision a minimum of two 1Gb Ethernet adaptors, teamed would be preferable if more are available.
  •  Define vSwitch0 with a VLAN tagged VM Network portgroup to connect the replication side of the OpenSolaris iSCSI VM and a Service Console portgroup to manage the ESXi host.
  •  Define vSwitch1 with a VLAN tagged iSCSI VM kernel portgroup to service the iSCSI data plane and also define a VM Network portgroup on the same VLAN to connect with the target interface of the OpenSolaris iSCSI VM.
  •  Define the required isolated VLAN tagged identically named portgroups  as production on vSwitch0 and use a separated VLAN numbering set for them for isolation.
  •  Define the OpenSolaris VM with one adapter to connected to the production network portgroup and one adapter to attached to the iSCSI data plane portgroup to serve the iSCSI target IP.

Here is an example of what the VM disk assignments should look like.

VMFS LUN example by Mike La Spina

Once the ESXi server is successfully built and the Opensolaris iSCSI VM is installed and functional we can create the required elements for enabling ZFS replication.

Create Service Accounts

On the systems that will act as replication partners create zfsadm ID’s as service accounts using the provided commands.

# useradd -s /usr/bin/bash -d /export/home/zfsadm -P ‘ZFS File System Management’ zfsadm
# mkdir /export/home/zfsadm
# mkdir /export/home/zfsadm/backup
# cp /etc/skel/* /export/home/zfsadm
# echo PATH=/usr/bin:/usr/sbin:/usr/ucb:/etc:. > /export/home/zfsadm/.profile
# echo export PATH >> /export/home/zfsadm/.profile
# chown –R zfsadm /export/home/zfsadm
# passwd zfsadm

Note the parameter -P ‘ZFS File System Management’, This will grant the account an RBAC profile association to administratively manage our ZFS file system unlike root which is much too powerful and is all to often used by many of us. 

The next step is to generate some crypto keys for ssh connectivity we start this with a login as the newly created zfsadm user and run a secure shell locally to ensure you have a .ssh directory and key files created in the home drive for the zfsadm user. Note this directory is normally hidden. 

# ssh localhost
The authenticity of host ‘localhost (’ can’t be established.
RSA key fingerprint is 0c:aa:64:72:84:b5:04:1c:a2:d0:42:8e:9f:4e:09:9d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘localhost’ (RSA) to the list of known hosts.
# exit

Now that we have the .ssh directory we can create a crypto key pair and configure a relatively secure login without the need to enter a password for the remote host using this account.

Do not enter a passphrase, it needs to be blank.

# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/export/home/zfsadm/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /export/home/zfsadm/.ssh/id_dsa.
Your public key has been saved in /export/home/zfsadm/.ssh/
The key fingerprint is:
bf:58:7b:97:8d:b5:d2:31:26:14:4c:9f:ce:72:a7:20 zfsadm@ss1

The id_dsa file should not be exposed outside of this directory as it contains the private key of the pair, only the public key file needs to be exported. Now that our key pair is generated we need to append the public portion of the key pair to a file named authorized_keys2. 

# cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys2

Repeat all the Create Service Accounts steps and crypto key steps on the remote server as well.

We will use the Secure Copy command to place the public key file on each opposing hosts zfsadm users home directory so that when the ssh tunnel is started the remote host can decrypt the encrypted connection request completing the tunnel which is generated with the private part of the pair. This is why we must protect the private part of the pair from exposure. Granted we have also defined an additional layer of security here by defining a dedicated user for the ZFS send activity it is very important that the private key is secured properly and it is not necessary to back it up as you can regenerate them if required.

From the local server here named ss1 (The remote server is ss2)

# scp $HOME/.ssh/ ss2:$HOME/.ssh/
Password:      100% |**********************************************|   603       00:00
# scp ss2:$HOME/.ssh/ $HOME/.ssh/
Password:      100% |**********************************************|   603       00:00
# cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys2

And on the remote server ss2

# ssh ss2
# cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys2
# exit

This completes the trusted key secure login configuration and you should be able to secure shell from either system to the other without a password prompt using the zfsadm account. To further limit security exposure we could employe ipaddress restrictions and as well enable a firewall but this is beyond the scope of this blog.

 Target Pool and ZFS rights

As a prerequisite you need to create the receiving zpool on the target to allow the zfs sends to occur. The receiving zpool name should be the same as the source to allow ease in the re-serving of iSCSI targets. Earlier we granted the “ZFS File System Management” profile to this zfsadm user. This RBAC profile allows us to run a pfexec command which pre checks what profiles the user is assigned and then executes appropriately based on this assignment. The bonus here is you do not have to create granular rights assignments to the ZFS file system.

On the target server create your receiveing zpool.

# zpool create rp1 <your vdev’s>

Create a Cron Job

Using a cron job we will invoke our ZFS snapshots and send tasks to the target host with the execution of a bash script named We need to use the crontab command to create a job that will execute it as the zfsadm user, no other user except root can access this job and that a good thing considering it has the ability to shell to another host!

As root add the zfs user name to the /etc/cron.d/cron.allow file.

# echo zfsadm >> /etc/cron.d/cron.allow
# crontab –e zfsadm
59 23 * * * ./ zfs-daily.rpl

Hint: crontab uses vi –  “vi cheat sheet”

The key sequence would be hit “i” and key in the line then hit “esc :wq” and to abort “esc :q!”

Be aware of the timezone the cron service runs under, you should check it and adjust it if required. Here is a example of whats required to set it. 

# pargs -e `pgrep -f /usr/sbin/cron`

8550:   /usr/sbin/cron
envp[0]: LOGNAME=root
envp[1]: _=/usr/sbin/cron
envp[2]: LANG=en_US.UTF-8
envp[3]: PATH=/usr/sbin:/usr/bin
envp[4]: PWD=/root
envp[5]: SMF_FMRI=svc:/system/cron:default
envp[6]: SMF_METHOD=start
envp[7]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[8]: SMF_ZONENAME=global
envp[9]: TZ=PST8PDT

Let’s change it to CST6CDT

# svccfg -s system/cron:default setenv TZ CST6DST

Also the default environment path for cron may cause some script “command not found” issues, check for a path and adjust it if required.

# cat /etc/default/cron
# Copyright 1991 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#pragma ident   “%Z%%M% %I%     %E% SMI”

This one has no default path, add the path using echo.

# echo PATH=/usr/bin:/usr/sbin:/usr/ucb:/etc:. > /etc/default/cron
# svcadm refresh cron
# svcadm restart cron

Create Snapshot Replication Script

Here is the link for the replication script you will need to grant exec rights to this file e.g.

# chmod 755 

The replcation script needs to live in the zfsadm home directory /export/home/zfsadm at this point I only have the one script built but other ones are in the works like a grandfather/father/son snapshot rollup script. The first run of the script can take a considerable amount of time depending on the available bandwidth and size of the VMFS luns. This cron job runs at midnight and took 6 hours over 100MB’s of bandwidth the first time and less that 5 min thereafter. A secondary script that runs hourly and is rolled up at days end would be beneficial. I will get it around to that one and the grandfather/father/son script later.

At this point we have an automated DR process that provides a form of CDP. But we do not have a way to access it so we need to perform some additional steps. In order for VMware to use the relocated VMFS iSCSI targets we need to reinstate some critical configuration info that was stored on the source Service Management Facility (SMF) repository. Within the iscsitgtd service properties we have the Network Address Authority (NAA) value which is named GUID in the properties list. This value is very important, when a VMFS is initialized the NAA is written to the VMFS volume header and this will need to be redefined on the DR target so that VMware will recognize the data store as available. If the NAA on the header and target do not match, the volume will not be visible to the DR VMware ESXi host. To protect this configuration info we need to export it from the source host and send it to the target host.

Export SMF iSCSI configuration

The iscstgtd service configuration elements can be easily exported using the following  command.

# svccfg export iscsitgt > /export/home/zfsadm/backup/ss1-iscsitgt.xml

Once exported to the backup directory we can Secure Copy this directory to the target system and this directory may also contain other useful info like installation instructions and so forth.

# scp ss1:/export/home/zfsadm/backup/* ss2:/export/home/zfsadm/backup/

This scp directory copy can be added to the crontab script after it is performed once manually as it requires an interactive key signature trust authorization alternately it can be done manually after a configuration change occurs. I prefer the automated method so it is included in the script.

SMF iscsitgt import and iSCSI configuration details

To import the production service we would issue the following commands.

# svcadm disable iscsitgt
# svccfg delete iscsitgt
# svccfg import /export/home/zfsadm/backup/ss1-iscsitgt.xml

Importing the iscsitgt service configuration is a simple task but it does have some elements that will be problematic if they are left unchecked. For example iSCSI Target Portal Group Tag values are included with the exported/inport function and thus you may need to change the portal groups values to correct discovery failure when the ip addresses are different on the target system. Another potential issue is leaving the existing SMF config in place and then importing the new one on top of it. This is not a best practice as you may create an invalid SMF for the iscsitgt service with elements that are orphaned out etc. The SMF properties will have the backing store path from the source server and if the target server does not have the same zpool name this will need to be fixed. And lastly make sure you have the same iscsitgtd version on each end since it will have potential changes between the versions.

You will also need to add the ESXi software initiator to the iSCSI target(s) on the receiving server and grant access with an acl entry and chap info if used.

# iscsitadm create initiator –iqn vh0.0
# iscsitadm modify target –acl vh0.0 ss1-zstore0

To handle a TPGT configuration change its simply a matter of re-adding them with the iscsitadm utility as demonstrated here or possibly deleting the one that are not correct.

# iscsitadm create tpgt 1
# iscsitadm modify tpgt -i 1
# iscsitadm modify tpgt -i 1
# iscsitadm modify target -p 1 ss1-zstore0

To delete a tpgt that is not correct is very strait forward.

# iscsitadm delete target -p 1 ss1-zstore0
# iscsitadm delete tpgt -A 1

Where and 2 are the target interfaces that should participate in portal group 1 and ss1-zstore0 is the target alias. In some cases you may have to remove the tpgt  all together. The backing store is editable as well as many other SMF properties. To change a backing store value in the SMF we use the svccfg command as follows.

Here is an example of listing all the backing stores and then changing the /dev/zvol/rdsk/sp2/iscsi/lun0 so its on zpool sp1 instead of sp2

# svcadm enable iscsitgt
# svccfg -s iscsitgt listprop | grep backing-store

param_dr-zstore0_0/backing-store                astring  /dev/zvol/rdsk/sp2/iscsi/lun0
param_dr-zstore0_1/backing-store                astring  /dev/zvol/rdsk/sp1/iscsi/lun1

# svccfg -s iscsitgt setprop param_dr-zstore0_0/backing-store=/dev/zvol/rdsk/sp1/iscsi/lun0
# svccfg -s iscsitgt listprop | grep backing-store

param_dr-zstore0_0/backing-store                astring  /dev/zvol/rdsk/sp1/iscsi/lun0
param_dr-zstore0_1/backing-store                astring  /dev/zvol/rdsk/sp1/iscsi/lun1

Changing the backing store value is instrumental if you wish to mount the VMFS LUN’s to provision system validation or online testing. However do not attach the  file system from the active replicated zfs backing store to the ESXi server for validation or testing as it will fail any additional replications once it is modified outside of the active replication stream. You must first create a clone of a chosen snapshot and then modify the backing store to use this new backing store path. This method will present a read/write clone through the iscsitgt service and will have the same iqn names so no reconfiguration would be required to create different time windows into the data stores or reversion to a previous point.

Here is an example of how  this would be accomplished.

# zfs create sp1/iscsi/clones
# zfs clone sp1/iscsi/lun0@10-10-2008-23:45 sp1/iscsi/clones/lun0
# svcadm refresh iscsitgt
# svcadm restart iscsitgt

To change to a different snapshot time you would simply need to destroy or rename  the current clone and replace it with a new or renamed clone of an existing snapshot on the same clone backing store path.

# zfs destroy sp1/iscsi/clones/lun0
# zfs clone sp1/iscsi/lun0@10-11-2008-23:45 sp1/iscsi/clones/lun0
# svccfg -s iscsitgt setprop param_dr-zstore0_0/backing-store=/dev/zvol/rdsk/sp1/iscsi/clones/lun0
# svcadm refresh iscsitgt
# svcadm restart iscsitgt

VMware Software iSCSI configuration

The ESXi iSCSI software configuration is quite strait forward. In this architecture we need to place an interface of the OpenSolaris iSCSI target host on vSwitch1 which is where we defined the iSCSI-Net0 VM kernel network. To do this we create a VM Network portgroup on the same VLAN ID as the iSCSI VM kernel interface.

Here is an example of what this configuration looks like.

DR Net example by Mike La Spina

For more deatail on how to configure the iSCSI VM interfaces see this blog in this case you would not need to define an aggregate since there is only one interface for the iSCSI vSAN.

The final step in the configuration is to define a discovery target on the iSCSI software configuration panel and then rescan the vmhba for new devices.

Hopefully this blog was of interest for you.

Til next time….



Site Contents: © 2008  Mike La Spina