Updated ZFS Replication and Snapshot Rollup Script

Thanks to the efforts of Ryan Kernan we have an updated ZFS replication and snapshot rollup script. Ryan's OpenIndiana/Solaris/Illumos community contribution improves the script to allow for more dynamic source-to-target pool replication and changes the snapshot retention method to a specific number of snapshots rather than a Grandfather-Father-Son method.

zfs-replication.sh

Regards,

Mike  


Protecting Active Directory with Snapshot Strategies

Using snapshots to protect Active Directory (AD) without careful planning will most definitely end in complete disaster. AD is a loosely consistent, distributed, multi-master database and it must not be treated as a static system. Without carefully addressing how AD works with Time Stamps, Version Stamps, Update Sequence Numbers (USNs), Globally Unique Identifiers (GUIDs), Relative Identifiers (RIDs), Security Identifiers (SIDs) and restoration requirements, the system could quickly become unusable or severely damaged in the event of an incorrectly invoked snapshot reversion.

There are many negative scenarios that can occur if we re-introduce an AD replica to service from a snapshot instance without special handling. In a snapshot based re-introduction the RID functional component is seriously impacted. In any AD system RIDs are created in range blocks and assigned for use to a participating Domain Controller (DC) by the DC holding the RID master role. RIDs are used to create SIDs for all AD objects such as Group or User objects, and they must all be unique. Let's take a closer look at the SID to understand why RIDs are such a critical function.

A SID is composed with the following symbolic format: S-R-IA-SA-RID:

  • S: Indicates that the value is a SID.
  • R: Indicates the revision of the SID.
  • IA: Indicates the issuing authority. Most are the NT Authority identity number 5.
  • SA: Indicates the sub-authority aka domain identifier.
  • RID: Indicates the Relative ID.

Now looking at some real SID example values, we see that on a DC instance only the RID component of the SID is unique, as shown in the final component of each value here.

DS0\User1      = S-1-5-21-3725033245-1308764377-180088833-3212
DS0\UserGroup1 = S-1-5-21-3725033245-1308764377-180088833-7611

When an older snapshot image of a DC is reintroduced, its assigned RID range will likely contain RID entries that were already used to generate SIDs. Those SIDs would have replicated to the other DCs in the AD forest. When the reintroduced DC starts up it will try to participate in replication and service account authentications. Depending on the age and configuration of its secure channel, the DC could be successfully connected. This snapshot reintroduction event should be avoided since any RID usage from the aged DC will very likely result in duplicate SID creation, which is obviously very undesirable.

Under normal AD recovery methods we would either restore AD or build a new server, perform a dcpromo on it and possibly seize DC roles if required. The most important element of a normal AD restore process is the DC GUID reinitialization function. The DC GUID reinitialization operation allows the restoration of an AD DC to occur correctly. A newly generated GUID becomes part of the Domain Identifier and thus the DC can create SIDs that are unique despite the fact that the RID assignment range it holds may be a previously used one.

When we use a snapshot image of a guest DC VM, none of the required Active Directory restore steps will occur on system startup, and thus we must manually bring the host online in DSRM mode without a network connection and then set the NTDS restore mode. I see this as a serious security risk, as there is a significant probability that the host could be brought online without these steps occurring and potentially create integrity issues.

One mitigation to this identified risk is to perform the required changes before a snapshot is captured and once the capture is complete revert the change back to the non-restore state. This action will completely prevent a snapshot image of a DC from coming online from a past time reference.

In order to achieve this level of server state and snapshot automation we would need to provision a service channel from our storage head to the involved VMs, or for that matter any storage consumer. A service channel can provide other functionality beyond the NTDS state change as well; one example is the ability to flush I/O using VSS or sync etc.

We can now look at a practical example of how to implement this strategy on OpenSolaris based storage heads and W2K3 or W2K8 servers.

The first part of the process is to create the service channel on a VM or any other Windows host which can support VB or PowerShell etc. In this specific case we need to provision an SSH server daemon that will allow us to issue commands directed towards the storage-consuming guest VM from the storage head providing the storage. There are many products available that can provide this service. I personally like MobaSSH, which I will use in this example. Since this is a Domain Controller we need to use the Pro version, which supports domain based user authentication from our service channel VM.

We need to create a dedicated user that is a member of the domain's BUILTIN\Administrators group. This poses a security risk and thus you should mitigate it by restricting this account to only the machines it needs to service.

e.g. in AD, restrict it to the DCs, possibly any involved VMs to be managed, and the Service Channel system itself.

Restricting user machine logins

A dedicated user allows us to define authentication from the storage head to the service channel VM using a trusted SSH RSA key that is mapped to the user instance on both the VM and the OpenSolaris storage host. This user will launch any execution process that is issued from the OpenSolaris storage head.

In this example I will use the name scu, which is short for Service Channel User.

First we need to create the scu user on our OpenSolaris storage head.

root@ss1:~# useradd -s /bin/bash -d /export/home/scu -P 'ZFS File System Management' scu
root@ss1:~# mkdir /export/home/scu
root@ss1:~# cp /etc/skel/* /export/home/scu
root@ss1:~# echo PATH=/bin:/sbin:/usr/ucb:/etc:. > /export/home/scu/.profile
root@ss1:~# echo export PATH >> /export/home/scu/.profile
root@ss1:~# echo 'PS1="${LOGNAME}@$(/usr/bin/hostname)~# "' >> /export/home/scu/.profile

root@ss1:~# chown -R scu /export/home/scu
root@ss1:~# passwd scu

In order to use an RSA key for authentication we must first generate an RSA private/public key pair on the storage head. This is performed using ssh-keygen while logged in as the scu user. You must leave the passphrase blank, otherwise the session will prompt for it.

root@ss1:~# su - scu

scu@ss1~#ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/export/home/scu/.ssh/id_rsa):
Created directory '/export/home/scu/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /export/home/scu/.ssh/id_rsa.
Your public key has been saved in /export/home/scu/.ssh/id_rsa.pub.
The key fingerprint is:
0c:82:88:fa:46:c7:a2:6c:e2:28:5e:13:0f:a2:38:7f scu@ss1
scu@ss1~#

We now have the public key available in the file named id_rsa.pub. The content of this file must be copied to the target SSH instance's .ssh/authorized_keys file. The private key file named id_rsa MUST NOT be exposed to any other location and should be secured. You do not need to store the private key anywhere else, as you can regenerate the pair anytime if required.

Before we can continue we must install and configure the target Service Channel VM with MobaSSH.

It's a simple setup: just download MobaSSH Pro to the target local file system.

Execute it.

Click install.

Configure only the scu domain based user and clear all others from accessing the host.

e.g.

Moba Domain Users

Once MobaSSH is installed and restarted we can connect to it and finalize the secured SSH session. Don't forget to add the scu user to your AD domain's BUILTIN\Administrators group before proceeding. Also, you need to perform an initial NT login to the Service Channel Windows VM using the scu user account prior to using the SSH daemon; this is required to create its home directories.

In this step we are using putty to establish an SSH session to the Service Channel VM and then secure-shelling to the storage server named ss1. Then we transfer the public key back to ourselves using scp and exit host ss1. Finally we use cat to append the public key file content to the .ssh/authorized_keys file in the scu user's profile. Once these steps are complete we can establish an automated, prompt-less, encrypted session from ss1 to the Service Channel Windows VM.

[Fri Dec 18 – 19:47:24] ~
[scu.ws0] $ ssh ss1
The authenticity of host 'ss1 (10.10.0.1)' can't be established.
RSA key fingerprint is 5a:64:ea:d4:fd:e5:b6:bf:43:0f:15:eb:66:99:63:6b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ss1,10.10.0.1' (RSA) to the list of known hosts.
Password:
Last login: Fri Dec 18 19:47:28 2009 from ws0.laspina.ca
Sun Microsystems Inc.   SunOS 5.11      snv_128 November 2008

scu@ss1~#scp .ssh/id_rsa.pub ws0:/home/scu/.ssh/ss1-rsa-scu.pub
scu@ws0's password:
id_rsa.pub           100% |*****************************|   217       00:00
scu@ss1~#exit

[Fri Dec 18 – 19:48:09]
[scu.ws0] $ cat .ssh/ss1-rsa-scu.pub >> .ssh/authorized_keys

With our automated RSA key authentication in place we can proceed to customize the MobaSSH service instance to run as the scu user. We need to perform this modification in order to enable VB script WMI DCOM impersonate-caller rights when instantiating objects. In this case we are calling a remote registry object over WMI and modifying the NTDS service registry start-up values, which can only be performed by an administrator account. This modification essentially extends the storage host's capabilities to reach any Windows host that needs integral system management function calls.

On our OpenSolaris storage head we need to invoke a script which will remotely change the NTDS service state, then locally snapshot the provisioned storage, and lastly return the NTDS service back to a normal state. To accomplish this function we will define a cron job. The cron job needs some basic configuration steps as follows.

The solaris.jobs.user authorization is required to submit a cron job; this allows us to create the job but not administer the cron service.
If an /etc/cron.d/cron.allow file exists then this RBAC setting is overridden by the file's existence, and you will need to add the user to that file or convert to the best-practice RBAC method.
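
As a quick sanity check, something like the following sketch (file paths per a stock OpenSolaris install) confirms whether a cron.allow override is in play for the scu user before relying on the RBAC authorization alone.

# Sketch: check for a cron.allow override before relying on RBAC alone
if [ -f /etc/cron.d/cron.allow ]; then
    grep '^scu$' /etc/cron.d/cron.allow || echo "scu is not listed in cron.allow"
else
    echo "no cron.allow present - the solaris.jobs.user authorization applies"
fi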

root@ss1~# usermod -A solaris.jobs.user scu
root@ss1~# crontab -e scu
59 23 * * * ./vol1-snapshot.sh

Hint: crontab uses vi - http://www.kcomputing.com/kcvi.pdf  "vi cheat sheet"

The key sequence is: hit "i", key in the line, then hit "esc :wq" to save; to abort, use "esc :q!".

Be aware of the timezone the cron service runs under; you should check it and adjust it if required. Here is an example of what's required to set it.

root@ss1~# pargs -e `pgrep -f /usr/sbin/cron`

8550:   /usr/sbin/cron
envp[0]: LOGNAME=root
envp[1]: _=/usr/sbin/cron
envp[2]: LANG=en_US.UTF-8
envp[3]: PATH=/usr/sbin:/usr/bin
envp[4]: PWD=/root
envp[5]: SMF_FMRI=svc:/system/cron:default
envp[6]: SMF_METHOD=start
envp[7]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[8]: SMF_ZONENAME=global
envp[9]: TZ=PST8PDT

Let’s change it to CST6CDT

root@ss1~# svccfg -s system/cron:default setenv TZ CST6CDT

Also, the default environment path for cron may cause some script "command not found" issues; check for a path and adjust it if required.

root@ss1~# cat /etc/default/cron
#
# Copyright 1991 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
#pragma ident   "%Z%%M% %I%     %E% SMI"
CRONLOG=YES

This one has no default path; add the path by appending it with echo.

root@ss1~# echo PATH=/usr/bin:/usr/sbin:/usr/ucb:/etc:. >> /etc/default/cron
# svcadm refresh cron
# svcadm restart cron

With a cron job defined to run the script named vol1-snapshot.sh in the default home directory of the scu user, we are now ready to create the script content. Our OpenSolaris storage host needs to call a batch file on the remote Service Channel VM, which in turn executes a VBScript to set the NTDS start-up mode. To do this from a unix bash script we will use the following statements in the vol1-snapshot.sh file.

ssh -t ws0 NTDS-PreSnapshot.bat
snap_date="$(date +%d-%m-%y-%H:%M)"
pfexec zfs snapshot rp1/san/vol1@$snap_date
ssh -t ws0 NTDS-PostSnapshot.bat
exit

Here we are running a secure shell call to the MobaSSH daemon with the -t option, which allocates the tty locally; this allows us to issue an "exit" from the remote calling script, closing the secure shell. On the Service Channel VM the following VBScript calls are executed using the pre and post snapshot batch files illustrated as follows.

scu Batch Files

NTDS-PreSnapshot.bat
cscript NTDS-SnapshotRestoreModeOn.vbs DS0
exit

NTDS-PostSnapshot.bat
cscript NTDS-SnapshotRestoreModeOff.vbs DS0
exit

NTDS-SnapshotRestoreModeOn.vbs

strComputer = Wscript.Arguments(0)
const HKLM = &H80000002
Set oregService = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & strComputer & "\root\default:StdRegProv")
oregService.SetDWordValue HKLM, "SYSTEM\CurrentControlSet\Services\NTDS\Parameters", "Database restored from backup", 1
Set oregService = Nothing

NTDS-SnapshotRestoreModeOff.vbs

strComputer = Wscript.Arguments(0)
const HKLM = &H80000002
Set oregService = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & strComputer & "\root\default:StdRegProv")
oregService.SetDWordValue HKLM, "SYSTEM\CurrentControlSet\Services\NTDS\Parameters", "Database restored from backup", 0
Set oregService = Nothing

We now have Windows-integrated storage volume snapshot functionality that allows an Active Directory domain controller to be securely protected using a snapshot strategy. In the event we need to fall back to a previous point in time, there is no danger that the snapshot will cause AD corruption. The integration process has other highly desirable capabilities, such as the ability to call VSS snapshots and any other application backup preparatory function calls. We could also branch out using more sophisticated PowerShell calls to VMware hosts in a fully automated recovery strategy using ZFS replication and remote sites.
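
As a rough illustration of that replication side, an incremental ZFS send/receive from the storage head to a remote peer could look something like the sketch below; the @prev and @now snapshot names and the remote host ss2 are placeholders, not values from this environment.

# Sketch: incrementally replicate the protected volume to a remote storage head
pfexec zfs snapshot rp1/san/vol1@now
pfexec zfs send -i rp1/san/vol1@prev rp1/san/vol1@now | ssh ss2 pfexec zfs receive -F rp1/san/vol1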

Hope you enjoyed this entry.

Seasons Greetings to All.

Regards,

Mike




Controlling Snapshot Noise

The ability to perform file system, database and volume snapshots grants us many data protection benefits. However, there are some serious problems that can occur if we do not carefully architect snapshot based storage infrastructures. This blog entry discusses some of the issues with data noise induction and data integrity when using point-in-time snapshot activities, and how we can reduce the negative aspects of these data protection methods.

With the emergence of snapshot technology in the data center, data noise induction is an unwanted by-product and needs to be addressed. Active data within a file store or raw volume will have significant amounts of temporary data like memory swaps and application temp files. This type of data is required for system operations but it is completely useless within a point-in-time snapshot and simply consumes valuable storage space with no permanent value within the scope of system data protection. There are many sources of this undesirable data noise that we need to consider, and we should define strategies to isolate and eliminate them where possible.

In some cases, using raw stores (e.g. iSCSI, FC etc.) we will have duplicate snapshot functionality points in the storage stream, and this further complicates how we approach a solution to noise induction issues. One common example of snapshot functionality duplication is the Microsoft Windows 2003 Volume Shadow Copy Service (VSS). If we enable VSS and an external snapshot service is employed, then we are provisioning snapshots of snapshots, which of course is less than optimal because much of the delta between points in time is just redundant encapsulated data. There are some higher level advantages to allowing this to occur, like provisioning self-service end user restores and VSS-aware file system quiescence, but for the most part it is not optimal from a space consumption, performance or efficiency perspective.

If we perform snapshots at multiple points in the data storage stream using VSS we will have three points of data delta: the changed data elements on the source store of the primary files, a copy-on-write set of changed blocks of the same primary store including its metadata, and finally the external snapshot delta and its metadata. As well, if the two snapshot events were to occur at the same time, this creates a non-integral copy of a snapshot and metadata, which is just pure wasted space since it is inconsistent.

With the co-existing use of VSS we need to define what functionality is important to us. For example, VSS is limited to 512 snapshots and 64 versions, so if we need to exceed these limits we have to employ an external snapshot facility. Or perhaps we need to allow user self-service file restores from a shared folder. In many cases this functionality is provided by the external snapshot provisioning point. OpenSolaris, EMC and NetApp are some examples of storage products that can provide such functionality. Of course my preference is custom OpenSolaris storage servers or the S7000 series of storage products, which is based on OpenSolaris and is well suited for the formally supported side of things.

Solely provisioning the snapshots externally versus native MS VSS can significantly reduce induced data noise if the external provider supports VSS features or provides tools to control VSS. VSS copy-on-write snapshot capability should not be active when using this strategy, so as to eliminate the undesirable snapshot echo noise. Most environments will find that they require snapshot services that exceed the native MS VSS capabilities. Provisioning the snapshot function directly on a shared storage system is a significantly better strategy versus allowing a distributed deployment of storage management points across your infrastructure.

OpenSolaris and ZFS provide far greater depth in snapshot provisioning than the Microsoft shared folder snapshot copy service. Implementing ZFS dramatically reduces space consumption and allows snapshots up to the maximum capacity of the storage system, and OpenSolaris provides MS SMB client access to the snapshots, from which users can manage recovery as a self-service. By employing ZFS as a backing store, snapshot management is simplified and snapshots are available for export to alternate share points by cloning and provisioning the point in time to data consumers performing a multitude of desirable tasks such as audits, validation, analysis and testing.
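
For example, exposing a point in time to an audit or test consumer is a simple clone operation; a minimal sketch follows, where the dataset, snapshot and share names are purely illustrative.

# Sketch: clone a file system snapshot to an alternate SMB share point for audit use
zfs clone sp1/shares/projects@tuesday-2300 sp1/shares/projects-audit
zfs set sharesmb=name=projects-audit sp1/shares/projects-audit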

If we need to employ MS VSS snapshot services provisioned on a storage server that uses snapshot based data protection strategies, then we will need to prevent continuous snapshots on the storage server. We can use features like SnapMirror and ZFS replication to provision a replica of the data; however, this would need to be strictly limited in count, e.g. 2 or 3 versions, and timed to avoid a multiple-system snapshot time collision. Older snapshots should be purged, and we only allow the MS VSS snapshot provisioning to keep the data deltas.

Another common source of snapshot noise is temporary file data or memory swaps. Fortunately with this type of noise the solution is relatively easy: we simply isolate this type of storage onto volumes or shares that are explicitly excluded from the snapshot service function. For example, if we are using VMFS stores we can place vswp files on a designated VMFS volume, and conversely within an operating system we can create a separate vmdk disk that maps to a VMFS volume which we also exclude from the snapshot function. This strategy requires that we ensure any replication scheme incorporates the creation or one-time replication of these volumes. Unfortunately this methodology does not play well with Storage VMotion, so one must ensure that a relocation does not move the noisy vmdks back into the snapshot-provisioned stores.
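
On an OpenSolaris/COMSTAR backing store this can be as simple as carving out a volume that the snapshot job never references; the sketch below assumes a hypothetical scratch volume name and size.

# Sketch: a dedicated iSCSI volume for vswp and temp vmdk placement
# simply leave this volume out of the snapshot and replication jobs
pfexec zfs create -V 100G sp1/ss1-scratch0
pfexec sbdadm create-lu /dev/zvol/rdsk/sp1/ss1-scratch0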

VMware VM snapshots on VMFS volumes are another significant source of data noise. When a snapshot is initiated from within VMware, all data writes are placed in delta file instances. These delta files will be captured on the external storage system's snapshot points and will remain there after the VM snapshot is removed. Significant amounts of data delta are produced by VM based snapshots, and sometimes multiple deltas can exceed the original vmdk size. An easy way to prevent this undesirable impact is to clone the VM to a store outside the snapshot-provisioned stores rather than invoking snapshots.

Databases are probably the most challenging source of snapshot noise data and require a different strategy than isolation, because the data within a specific snapshot is all required to provide system integrity. For example, we cannot isolate SQL log data because it is required to do a crash recovery or to roll forward etc. We can isolate temp database stores, since any data in those data stores would not be valid in a crash recovery state.

One strategy that I use, both as a blanket method when we are not able to use other methods and in concert with the previously discussed isolation methods, is a snapshot roll-up function. This strategy simply reduces the number of long-term snapshot copies that are kept. The format is based on a Grandfather, Father, Son (GFS) retention chain of the snapshot copies and is well suited for a variety of data types. The effect is to provide a reasonable number of data protection points to satisfy most computing environments and keep the captured noise to a manageable value. For example, if we were to snapshot every 15 minutes without any management cycle we would accumulate ~35,000 delta points of data over the period of one year. Conversely, if we employ the GFS method we accumulate 294 delta points of data over the same period. Obviously the consumption of storage resources is so greatly reduced that we could keep many additional key points in time if we wished and still maintain a balance of recovery points versus consumption rate.
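
A roll-up like this can be driven from the storage head itself. The fragment below is only a sketch of pruning the short-interval "son" tier, assuming the snapshots carry a son- name prefix and that 96 of them (one day at 15 minute intervals) are retained; separate jobs would promote the daily and weekly copies.

# Sketch: keep only the newest 96 son-* snapshots of the volume
retain=96
snaps=$(zfs list -H -t snapshot -o name -s creation | grep "^sp1/ss1-vol0@son-")
purge=$(( $(echo "$snaps" | wc -l) - retain ))
if [ "$purge" -gt 0 ]; then
    echo "$snaps" | head -n "$purge" | while read snap; do
        pfexec zfs destroy "$snap"
    done
fi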

Let's take a look at a simple real example of how snapshot noise can impact our storage system using VMware, OpenSolaris and ZFS block based iSCSI volume snapshots. In this example we have a simple Windows Vista VM that is sitting idle; in other words, only the OS is loaded and it is powered on and running.

First we take a ZFS snapshot of the VMFS ZFS iSCSI volume.

zfs snapshot sp1/ss1-vol0@beforevmsnap

Now we invoke a VMware based snapshot and have a look at the result.

root@ss1:~# zfs list -t snapshot
NAME                               USED  AVAIL  REFER  MOUNTPOINT
sp1/ss1-vol0@beforevmsnap          228M      -  79.4G  -

Keep in mind that we are not modifying any data files in this VM; if we were to change data files the deltas would be much larger, and in many cases multiple VMware snapshots could exceed the VM's original size if they are allowed to remain for periods of weeks or longer. The backing store snapshot initially consumes 228MB, which will continue to grow as changes occur on the volume. A significant part of the 228MB is the VM's memory image in this case, and of course it has no permanent storage value.

sp1/ss1-vol0@after1stvmsnap          1.44M      -  79.5G  -

After the initial VMware snapshot occurs we create a new point-in-time ZFS snapshot, and here we observe some noise in the next snapshot even though we have not changed any data files in the last minute or so.

sp1/ss1-vol0@after2ndvmsnap       1.78M      -  79.5G  -

And yet another ZFS snapshot a couple of minutes later shows more snapshot noise accumulation. This is one of the many issues that are present when we allow non-discretionary placement of files and temporary storage on snapshot based systems.

Now let's see the impact of destroying the ZFS snapshot that was taken before the VMware snapshot, after we have deleted the VMware based snapshot.

root@ss1:~# zfs destroy sp1/ss1-vol0@beforevmsnap
root@ss1:~# zfs list -t snapshot
NAME                               USED  AVAIL  REFER  MOUNTPOINT
sp1/ss1-vol0@after1stvmsnap       19.6M      -  79.2G  -
sp1/ss1-vol0@after2ndvmsnap       1.78M      -  79.2G  -

Here we observe the reclamation of more than 200MB of storage. And this is why GFS based snapshot rollups can provide some level of noise control.

Well I hope you found this entry to be useful.

Til next time..

Regards,

Mike

 




Securing COMSTAR and VMware iSCSI connections

Connecting VMware iSCSI sessions to COMSTAR or any iSCSI target provider securely is required to maintain a reliable system. Without some level of initiator-to-target connection gate keeping we will eventually encounter a security event. This can happen from a variety of sources; for example, a non-cluster-aware OS can connect to an unsecured VMware shared storage LUN and cause severe damage to it, since the OS has no shared LUN access knowledge. All too often we assume that security is about confidentiality when it is actually more commonly about data availability and integrity, both of which will be compromised if an unintentional connection were to write on a shared LUN.

At the very minimum security level we should apply non-authenticated named initiator access grants to our targets. This low security method defines initiator-to-target connection states for lower security tolerant environments. It is applicable when confidentiality is not as important and security is maintained within the physical access control realm. It should also coincide with SAN fabric isolation and be strictly managed by the virtual system or storage administrators. Additionally, we can increase access security control by enabling CHAP authentication, which is a serious improvement over named initiators. I will demonstrate both of these security methods using the COMSTAR iSCSI providers and VMware within this blog entry.

Before we dive into the configuration details, let's examine how LUs are exposed. COMSTAR controls iSCSI target access using several combined elements. One of these elements is within the COMSTAR STMF facility, where we can assign membership of host and target groups. By default, if we do not define a host or target group, any created target will belong to an implied ALL group. This group, as we would expect, grants any connecting initiator membership to the ALL group assigned LUNs. These assignments are called views in the STMF state machine and are a mapping function of the Storage Block Driver service (SBD) to the STMF IT_nexus state tables.

This means that if we were to create an initiator without assigning a host group or host/target group combination, an initiator would be allowed unrestricted connectivity to any ALL group LUN views and possibly without any authentication at all. Allowing this to occur would of course be very undesirable from a security perspective in almost all cases. Conversely if we use a target group definition then only the initiators that connect to the respective target will see the LUN views which are mapped on that target definition instance.

While target groups do not significantly improve access security, they do provide a means of controlling accessibility based on the definition of interface connectivity classes, which in turn can be mapped to respective VLAN priority groups, bandwidth availability and applicable path fault tolerance capabilities. These are all important aspects of availability and unfortunately are seldom considered security concepts in many architectures.

Generally, on most simple storage configurations the use of target groups is not a requirement. However, they do provide a level of access control with LUN views. For example, we can assign LUN views to a target group, which in turn frees us from having to add the LUN view to each host group within shared LUN configurations like VMware stores. With combinations of host and target groups we can create more flexible methods in respect to shared LUN visibility. With the addition of simple CHAP authentication we can more effectively insulate target groups, primarily due to the ability to assign separate CHAP user and password values for each target.

Let's look at this visual depiction to help see the effect of using target and host groups.

COMSTAR host and target view depiction

In this depiction, any initiator that connects to the target group prod-tg1 will by default see the views that are mapped to that target group's interfaces. Additionally, if the initiator is also a member of the host group prod-esx1, those view mappings will also be visible.

One major difference between target groups and the ALL group is that you can define LU views en masse to an entire class of initiator connections, e.g. a production class. This becomes an important control element in a unified media environment where the use of VLANs separates visibility. Virtual interfaces can be created at the storage server and attached to VLANs respectively. Target groups become very desirable as a control within a unified computing context.
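
A minimal sketch of wiring up the depicted production class might look like the following; the target IQN is a placeholder, and note that STMF requires a target to be offline while it is added to a target group.

# stmfadm create-tg prod-tg1
# stmfadm offline-target iqn.2009-06.target.ss1.1
# stmfadm add-tg-member -g prod-tg1 iqn.2009-06.target.ss1.1
# stmfadm online-target iqn.2009-06.target.ss1.1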

Named Initiator Access

Enabling unauthenticated named initiator-to-target access with COMSTAR and VMware iSCSI services is a relatively simple operation. Let's examine how this method controls initiator access.

We will define two host groups, one for production esx hosts and one for test esx hosts.

# stmfadm create-hg prod-esx1

# stmfadm create-hg test-esx1

With these host groups defined, we individually assign LU views to the host groups and then define any initiator to be a member of one of the host groups, so that it will only see the views which belong to that host group plus any views assigned to the default ALL group.
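
For instance, mapping a shared VMFS LU so that only the production hosts see it could be sketched as follows; the LU GUID shown is a placeholder taken from a hypothetical stmfadm list-lu listing.

# stmfadm list-lu -v
# stmfadm add-view -h prod-esx1 -n 1 600144F04B0E3B0000004B0E51D10001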

To add a host initiator to a host group, we must first create it in the port provider of choice which in this case is the iSCSI port provider.

# itadm create-initiator iqn.1998-01.com.vmware:vh1.1

Once created the defined initiator can be added to a host group.

# stmfadm add-hg-member -g prod-esx1 iqn.1998-01.com.vmware:vh1.1

An ESX host initiator with this IQN name can now attach to our COMSTAR targets and will see any LU views that are added to the prod-esx1 host group. But there are still some issues here; for example, any ESX host with this initiator name will be able to connect to our targets and see the LUs. This is where CHAP can help to improve access control.

Adding CHAP Authentication on the iSCSI Target

Adding CHAP authentication is very easy to accomplish; we simply need to set a CHAP user name and secret on the respective iSCSI target. Here is an example of its application.

# itadm modify-target -s -u tcuid1 iqn.2009-06.target.ss1.1

Enter CHAP secret:
Re-enter secret:

The CHAP secret must be between 12 and 255 characters long. The addition of CHAP allows us to further reduce the risk of a potential storage security event. We can define additional targets, and they can each have different CHAP user names and/or secrets.
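
For example, a second target could carry its own credentials; the target name and CHAP user below are hypothetical.

# itadm modify-target -s -u tcuid2 iqn.2009-06.target.ss1.2

Enter CHAP secret:
Re-enter secret: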

CHAP is more secure when used with mutual authentication back to the source initiator, which is my preferred way to implement it on ESX 4 (ESX 3 does not support mutual CHAP). This mode does not stop a successful one-way authentication from an initiator to the target; it allows the initiator to require that the target host system's iSCSI service authenticate back to the initiator, which provides validation that the target is indeed the correct one. Here is an example of the target-side initiator definition that would provide this capability.

# itadm modify-initiator -s -u icuid1 iqn.1998-01.com.vmware:vh1.1

Enter CHAP secret:
Re-enter secret:

Configuring the ESX 4 Software iSCSI Initiator

On the ESX 4 host side we need to enter our initiator side CHAP values.

ESX 4 iSCSI Mutual CHAP

 

Be careful here: there are three places we can configure CHAP elements. The General tab allows a global point of administration where any target will inherit the entered values by default where applicable, e.g. target CHAP settings. The Dynamic Discovery tab can override the global settings, and likewise the Static Discovery tab overrides the global and dynamic ones. In this example we are configuring a dynamically discovered target to use mutual (aka bidirectional) authentication.

In closing, CHAP is a reasonable method to ensure that we correctly grant initiator-to-target connectivity assignments in an effort to promote better integrity and availability. It does not however provide much on the side of confidentiality; for that we need more complex solutions like IPsec.

Hope you found this blog interesting.

Regards,

Mike


Additional VMFS Backup Automation script features

I was conversing with William Lam about my blog entry Protecting ESX VMFS Stores with Automation and we exchanged ideas on the simple automation script that I originally posted. William is well versed in bash and has brought more functionality to the original automation script. We now have a rolling backup set 10 versions deep with folder-augmented organization based on the host name, store alias, date label and the rolling instance number. The updated script is named vmfs-bu2, linked here.

Thanks for your contribution William!

Regards,

Mike

