Protecting Active Directory with Snapshot Strategies

Using snapshots to protect Active Directory (AD) without careful planning will most definitely end up in a complete disaster. AD is a loosely consistent distributed multi-master database and it must not be treated as a static system.  Without carefully addressing how AD works with Time Stamps, Version Stamps, Update Sequence Numbers (USNs), Globally Unique Identification numbers (GUIDs), Relative Identification numbers (RIDs),  Security Identifiers (SIDs) and restoration requirements the system could quickly become unusable or severally damaged in the event of an incorrectly invoked snapshot reversion.

There are many negative scenarios that can occur if we were to re-introduce an AD replica to service from a snapshot instance without special handling. In the event of a snapshot based re-introduction the RID functional component is seriously impacted. In any AD system RIDs are created in range blocks and assigned for use to a participating Domain Controller (DC) by the RID master DC AD role. RIDs are used to create SIDs for all AD objects like Group or User objects and they must all be unique. Lets take a closer look at the SID to understand why RIDs are such a critical function.

A SID is composed with the following symbolic format: S-R-IA-SA-RID:

  • S: Indicates the type of value is a  SID.
  • R: Indicates the revision of the SID.
  • IA: Indicates the issuing authority. Most are the NT Authority identity number 5.
  • SA: Indicates the sub-authority aka domain identifier.
  • RID: Indicates the Relative ID.

Now looking at some real SID example values we see that on a DC instance only the RID component of the SID is unique as show here in red text.

DS0User1      = S-1-5-21-3725033245-1308764377-180088833-3212
DS0UserGroup1 = S-1-5-21-3725033245-1308764377-180088833-7611

When an older snapshot image of a DC is reintroduced it’s assigned RID range will likely have RID entries that were already used to generate SIDs. Those SIDs would have replicated to the other DCs in the AD forest. When the reintroduced DC starts up it will try to participate in replication and servicing authentications of accounts. Depending on the age and configuration of its secure channel the DC could be successfully connected. This snapshot reintroduction event should be avoided since any RID usage from the aged DC will very likely result in duplicated SID creations and is obviously very undesirable.

Under normal AD recovery methods we would either need to restore AD or build a new server and perform a DC promo on it and possibly seize DC roles if required . The most important element of an normal AD restore process is the DC GUID reinitialization function. The DC GUID value reinitialization operation  allows the restoration of an AD DC to occur correctly. A  newly generated GUID becomes part of the Domain Identifier and thus the DC can create SIDs that are unique despite the fact that the RID assignment range it holds may be from a previously used one.

When we use a snapshot image of a guest DC VM none of the required Active Directory restore requirements will occur on  system startup and thus we must manually bring the host online in DSRM mode without a network connection and then set the NTDS restore mode up. I see this as a serious security risk as there a is significant probability that the host could be brought online without these steps occurring and potentially create integrity issues.

One mitigation to this identified risk is to perform the required changes before a snapshot is captured and once the capture is complete revert the change back to the non-restore state. This action will completely prevent a snapshot image of a DC from coming online from a past time reference.

In order to achieve this level of server state and snapshot automation we would need to provision a service channel from our storage head to the involved VMs or for that matter any storage consumer. A service channel can provide other functionality beyond the NDTS state change as well. One example is the ability to flush I/O using VSS or sync etc.

We can now look at a practical example of how to implement this strategy on OpenSolaris based storage heads and W2K3 or W2K8 servers.

The first part of the process is to create the service channel on a VM or any other windows host which can support VB or Power Shell etc. In this specific case we need to provision an SSH Server daemon that will allow us to issue commands directed towards the storage consuming guest VM from the providing storage head. There are many possible products available that can provide this service. I personally like MobaSSH which I will use in this example. Since this is a Domain Controller we need to use the Pro version which supports domain based user authentication from our service channel VM.

We need to create a dedicated user that is a member of the domains BUILTINAdministrators group. This poses a security risk and thus you should mitigate it by restricting this account to only the machines it needs to service.

e.g. in AD restrict it to the DCs or possibly any involved VM’s to be managed and the Service Channel system itself.

Restricting user machine logins

A dedicated user allows us to define authentication from the storage head to the service channel VM  using a trusted ssh RSA key that is mapped to the user instance on both the VM and OpenSolaris storage host. This user will launch any execution process that is issued from the OpenSolaris storage head.

In this example I will use the name scu, which is short for Service Channel User.

First we need to create the scu user on our OpenSolaris storage head.

root@ss1:~# useradd -s /bin/bash -d /export/home/scu -P ‘ZFS File System Management’ scu
root@ss1:~# mkdir /export/home/scu
root@ss1:~# cp /etc/skel/* /export/home/scu
root@ss1:~# echo PATH=/bin:/sbin:/usr/ucb:/etc:. > /export/home/scu/.profile
root@ss1:~# echo export PATH >> /export/home/scu/.profile
root@ss1:~# echo PS1=$’${LOGNAME}@$(/usr/bin/hostname)’~#’ ‘ >> /export/home/scu/.profile

root@ss1:~# chown –R scu /export/home/scu
root@ss1:~# passwd scu

In order to use an RSA key for authentication we must first generate an RSA private/public key pair on the storage head. This is performed using ssh-keygen while logged in as the scu user. You must set the passphrase as blank otherwise the session will prompt for it.

root@ss1:~# su – scu

scu@ss1~#ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/export/home/scu/.ssh/id_rsa):
Created directory ‘/export/home/scu/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /export/home/scu/.ssh/id_rsa.
Your public key has been saved in /export/home/scu/.ssh/id_rsa.pub.
The key fingerprint is:
0c:82:88:fa:46:c7:a2:6c:e2:28:5e:13:0f:a2:38:7f scu@ss1
scu@ss1~#

We now have the public key available in the file named id_rsa.pub the content of this file must be copied to the target ssh instance file named .ssh/authorized_keys. The private key file named id_rsa MUST NOT be exposed to any other location and should be secured. You do not need to store the private key anywhere else as you can regenerate the pair anytime if required.

Before we can continue we must install and configure the target Service Channel VM with MobaSSH.

Its a simple setup, just download MobaSSH Pro to the target local file system.

Execute it.

Click install.

Configure only the scu domain based user and clear all others from accessing the host.

e.g.















Moba Domain Users















Once MobaSSH is installed and restarted we can connect to it and finalize the secured ssh session. Don’t forget to add the scu user to your AD domains BUILTINAdministrators group before proceeding.  Also you need to perform an initial NT login to the Service Channel Windows VM using the scu user account prior to using the SSH daemon, this is required to create it’s home directories.

In this step we are using  putty to establish an ssh session to the Service Channel VM and then secure shelling to the storage server named ss1. Then we transfer the public key back to our self using scp and exit host ss1. Finally we use cat to append the public key file content to our  .ssh/authorized_key file in the scu users profile. Once these steps are complete we can establish an automated prompt less secured encrypted session from ss1 to the Service Channel Windows NT VM.

[Fri Dec 18 - 19:47:24] ~
[scu.ws0] $ ssh ss1
The authenticity of host ‘ss1 (10.10.0.1)’ can’t be established.
RSA key fingerprint is 5a:64:ea:d4:fd:e5:b6:bf:43:0f:15:eb:66:99:63:6b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘ss1,10.10.0.1′ (RSA) to the list of known hosts.
Password:
Last login: Fri Dec 18 19:47:28 2009 from ws0.laspina.ca
Sun Microsystems Inc.   SunOS 5.11      snv_128 November 2008

scu@ss1~#scp .ssh/id_rsa.pub ws0:/home/scu/.ssh/ss1-rsa-scu.pub
scu@ws0′s password:
id_rsa.pub           100% |*****************************|   217       00:00
scu@ss1~#exit

[Fri Dec 18 - 19:48:09]
[scu.ws0] $ cat .ssh/ss1-rsa-scu.pub >> .ssh/authorized_keys

With our automated RSA key password definition completed we can proceed to customize the MobaSSH service instance to run as the scu user. We need to perform this modification in order to enable VB script WMI DCOM impersonate caller rights when instantiating objects. In this case we are calling a remote regedit object over WMI and modifying the NTDS service registry start up values and thus this can only be performed by an administrator account. This modification essentially extends the storage hosts capabilities to reach any Windows host that need integral system management function calls.

On our OpenSolaris Storage head we need to invoke a script which will remotely change the NTDS service state and then locally snapshot the provisioned storage  and lastly return the NTDS service back to a normal state.  To accomplish this function we will define a cron job. The cron job needs some basic configuration steps as follows.

The solaris.jobs.user is required to submit a cron job, this allows us to create the job but not administer the cron service.
If an /etc/cron.d/cron.allow file exists then this RBAC setting will be overridden by the files existence and you will need to add the user to that file or convert to the best practice methods of RBAC.

root@ss1~# usermod -A solaris.jobs.user scu
root@ss1~# crontab –e scu
59 23 * * * ./vol1-snapshot.sh

Hint: crontab uses vi – http://www.kcomputing.com/kcvi.pdf  “vi cheat sheet”

The key sequence would be hit “i” and key in the line then hit “esc :wq” and to abort “esc :q!”

Be aware of the timezone the cron service runs under, you should check it and adjust it if required. Here is a example of whats required to set it.

root@ss1~# pargs -e `pgrep -f /usr/sbin/cron`

8550:   /usr/sbin/cron
envp[0]: LOGNAME=root
envp[1]: _=/usr/sbin/cron
envp[2]: LANG=en_US.UTF-8
envp[3]: PATH=/usr/sbin:/usr/bin
envp[4]: PWD=/root
envp[5]: SMF_FMRI=svc:/system/cron:default
envp[6]: SMF_METHOD=start
envp[7]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[8]: SMF_ZONENAME=global
envp[9]: TZ=PST8PDT

Let’s change it to CST6CDT

root@ss1~# svccfg -s system/cron:default setenv TZ CST6DST

Also the default environment path for cron may cause some script “command not found” issues, check for a path and adjust it if required.

root@ss1~# cat /etc/default/cron
#
# Copyright 1991 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
#pragma ident   “%Z%%M% %I%     %E% SMI”
CRONLOG=YES

This one has no default path, add the path using echo.

root@ss1~# echo PATH=/usr/bin:/usr/sbin:/usr/ucb:/etc:. > /etc/default/cron
# svcadm refresh cron
# svcadm restart cron

With a cron job defined to run the script named vol1-snapshot.sh in the default home directory of the scu user we are now ready to create the script content. Our OpenSolaris storage host needs to call a batch file on the remote Service Channel VM and it will execute  a vbscript from there to set the NTDS start up mode . To do this from a unix bash script we will use the following statements in the vol1-snapshot.sh file.

ssh -t ws0 NTDS-PreSnapshot.bat
snap_date=”$(date +%d-%m-%y-%H:%M)”
pfexec zfs snapshot rp1/san/vol1@$snap_date
ssh -t ws0 NTDS-PostSnapshot.bat
exit

Here we are running a secure shell call to the MobaSSH daemon with a -t option which runs the tty screen locally and this allows use to issue an “exit” from the remote calling script closing the secure shell. On the Service Channel VM the followng batch file vbscript calls are executed using the pre and post batch files illustrated as follows.

scu Batch Files

NTDS-PreSnapshot.bat
cscript NTDS-SnapshotRestoreModeOn.vbs DS0
exit

NTDS-PostSnapshot.bat
cscript NTDS-SnapshotRestoreModeOff.vbs DS0
exit

NTDS-SnapshotRestoreModeOn.vbs

strComputer = Wscript.Arguments(0)  
const HKLM=&H80000002
Set oregService=GetObject(“WinMgmts:{impersonationLevel=impersonate}!\” & strComputer & “rootdefault:stdRegProv”)
oregService.SetDWordValue HKLM, “SYSTEMCurrentControlSetServicesntdsparameters”, “Database restored from   backup”, 1
Set oregService=Nothing

NTDS-SnapshotRestoreModeOff.vbs

strComputer = Wscript.Arguments(0)  
const HKLM=&H80000002
Set oregService=GetObject(“WinMgmts:{impersonationLevel=impersonate}!\” & strComputer & “rootdefault:stdRegProv”)
oregService.SetDWordValue HKLM, “SYSTEMCurrentControlSetServicesntdsparameters”, “Database restored from   backup”, 0
Set oregService=Nothing

We now have Windows integrated storage volume snapshot functionality that allows an Active Directory domain controller to be securely protected using a snapshot strategy. In the event we need to fail back to a previous point in time there will be no danger that the snapshot will cause AD corruption. The integration process has other highly desirable capabilities such as the ability to call VSS snapshots and any other application backup preparatory function calls.  We could also branch out using more sophisticated PowerShell calls to VMware hosts in a fully automated recovery strategy using ZFS replication and remote sites.

Hope you enjoyed this entry.

Seasons Greetings to All.

Regards,

Mike



Share
  • Share

Tags: , , , , , ,

Site Contents: © 2009  Mike La Spina

5 Comments

  • daniel says:

    Interesting approach, I knew about the hazards of restoring domain controllers that had been offline for more than the tombstone lifetime but this was new. I would’ve thought that the RID master somehow prevented this from happening? Either way, a Netapp consultant told me a couple of weeks ago that domain controllers can get upset from the vmware snapshot operation itself but had no direct explanation of why. Have you ever heard this claim?

  • Hi Daniel,

    This claim is true under certain conditions. If the underlying hosts disk IO rate is heavy loaded or under scaled the guest will suspend during the snapshot creation when including the running guest memory image. This can drift the VM clock due to halted CPU scheduling. Thus a time issue can occur on the Kerberos ticket side and other time sensitive operations.
    I have observed this behavior directly.

    Regards,

    Mike

  • daniel says:

    Thanks for the answer, I’ll make sure to create a seperate datastore for domain controllers and only snapshot the underlying netapp volume and not the guest VM.

  • ixus says:

    Hi Mike,

    Nice article. I have a very quick question. I have a primary Netapp Controller configured with many LUN’x which are connected to Linux Host’s. Can you provide tips. On how to configure :
    That hosts manage way we take the snapshots on the volume hosting their LUNs. My backup netapp controller takes this snapshot via. SNAPVAULT.

    Your prompt reply will be appreciated..:)
    Regards,

    IXUS

  • Ixus,

    I presume you want your linux host to control its own snapshot invocation.
    You would need to setup a cron job that invokes an rsh to the appliance.
    I don’t have an appliance to work with so I can’t show you the details.

    It’s fairly strait forward.

    You need to grant the hosts ip access to the appliance
    - see the Netapp man pages
    http://ecserv1.uwaterloo.ca/netapp/man/man8/na_rshd.8.html

    The cron job would then use an interactive echo to initiate a snapvault command against it’s data store.
    http://ecserv1.uwaterloo.ca/netapp/man/man1/na_snapvault.1.html

    Regards,

    Mike

Leave a Reply

XHTML: You can use these tags:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>