<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: About</title>
	<atom:link href="http://blog.laspina.ca/about/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.laspina.ca</link>
	<description>Blogging for technical minds.</description>
	<lastBuildDate>Fri, 27 Jan 2012 17:53:51 -0600</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Frederik</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3972</link>
		<dc:creator>Frederik</dc:creator>
		<pubDate>Wed, 21 Dec 2011 08:29:50 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3972</guid>
		<description>Mmmm, not what I expected. Still, many thanks. I&#039;m using dedicated machines, AMD and Intel, SAS and SATA, and see it on both. I did an analysis of the commits and made a shortlist of commits possibly responsible for this. Based on the list, I&#039;m rebuilding the code and testing to find the point at which the regression was introduced. Hopefully a single commit can be pinpointed for this. A pity you don&#039;t have the DDRdrive anymore...
I&#039;ll keep you posted about any progress.
Again, thank you for your time,
Frederik</description>
		<content:encoded><![CDATA[<p>Mmmm, not what I expected. Still, many thanks. I&#8217;m using dedicated machines, AMD and Intel, SAS and SATA, and see it on both. I did an analysis of the commits and made a shortlist of commits possibly responsible for this. Based on the list, I&#8217;m rebuilding the code and testing to find the point at which the regression was introduced. Hopefully a single commit can be pinpointed for this. A pity you don&#8217;t have the DDRdrive anymore&#8230;<br />
I&#8217;ll keep you posted about any progress.<br />
Again, thank you for your time,<br />
Frederik</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike La Spina</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3970</link>
		<dc:creator>Mike La Spina</dc:creator>
		<pubDate>Wed, 21 Dec 2011 04:01:32 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3970</guid>
		<description>Hi Frederik,

Here are the numbers from using dd in an OI VM:

148 slog
4294967296 bytes (4.3 GB) copied, 80.0237 s, 53.7 MB/s async
4294967296 bytes (4.3 GB) copied, 559.775 s, 7.7 MB/s sync

148 no slog
4294967296 bytes (4.3 GB) copied, 76.7171 s, 56.0 MB/s async
4294967296 bytes (4.3 GB) copied, 226.254 s, 19.0 MB/s sync

151 slog
4294967296 bytes (4.3 GB) copied, 76.3189 s, 56.3 MB/s async
4294967296 bytes (4.3 GB) copied, 559.104 s, 7.7 MB/s sync

151 no slog
4294967296 bytes (4.3 GB) copied, 79.7127 s, 53.9 MB/s async
4294967296 bytes (4.3 GB) copied, 174.932 s, 24.6 MB/s sync

I think this rules out the ZIL code as the cause.
I think it is more likely that you are dealing with a driver issue.

Keep in mind these tests are not using a fast slog; it&#039;s just an external disk over VT-d on an LSI 1068 SAS adapter and thus is not accelerated.
The other disk is a local vmdk and it is cached at the ESXi host.

Regards,
Mike</description>
		<content:encoded><![CDATA[<p>Hi Frederik,</p>
<p>Here are the numbers from using dd in an OI VM:</p>
<p>148 slog<br />
4294967296 bytes (4.3 GB) copied, 80.0237 s, 53.7 MB/s async<br />
4294967296 bytes (4.3 GB) copied, 559.775 s, 7.7 MB/s sync</p>
<p>148 no slog<br />
4294967296 bytes (4.3 GB) copied, 76.7171 s, 56.0 MB/s async<br />
4294967296 bytes (4.3 GB) copied, 226.254 s, 19.0 MB/s sync</p>
<p>151 slog<br />
4294967296 bytes (4.3 GB) copied, 76.3189 s, 56.3 MB/s async<br />
4294967296 bytes (4.3 GB) copied, 559.104 s, 7.7 MB/s sync</p>
<p>151 no slog<br />
4294967296 bytes (4.3 GB) copied, 79.7127 s, 53.9 MB/s async<br />
4294967296 bytes (4.3 GB) copied, 174.932 s, 24.6 MB/s sync</p>
<p>I think this rules out the ZIL code as the cause.<br />
I think it is more likely that you are dealing with a driver issue.</p>
<p>Keep in mind these tests are not using a fast slog; it&#8217;s just an external disk over VT-d on an LSI 1068 SAS adapter and thus is not accelerated.<br />
The other disk is a local vmdk and it is cached at the ESXi host.</p>
<p>Regards,<br />
Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike La Spina</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3969</link>
		<dc:creator>Mike La Spina</dc:creator>
		<pubDate>Wed, 21 Dec 2011 01:18:21 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3969</guid>
		<description>Hi Frederik,

I do not have the DDRdrive; however, I can simply use a VT-d attached disk as a slog. If this is a slog txg code issue, it should show up just the same. I am doing the unit testing now; the results will be available shortly.

Regards,
Mike</description>
		<content:encoded><![CDATA[<p>Hi Frederik,</p>
<p>I do not have the DDRdrive; however, I can simply use a VT-d attached disk as a slog. If this is a slog txg code issue, it should show up just the same. I am doing the unit testing now; the results will be available shortly.</p>
<p>Regards,<br />
Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederik</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3965</link>
		<dc:creator>Frederik</dc:creator>
		<pubDate>Tue, 20 Dec 2011 12:09:18 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3965</guid>
		<description>Hi Mike,

Thank you for investigating this. The regression only happens when using a dedicated slog on anything later than oi148 and, at least for the moment, doesn&#039;t involve NFS. It can be easily reproduced locally with dd.
# dd if=/dev/zero of=/mpool/4g.bin [oflag=sync] bs=32K count=128K
This more or less simulates the ESXi NFS writes, sync and async. I tried various block sizes, but that didn&#039;t make much of a difference.
For example, a 2-disk pool with and without a slog:
oi148 with slog async dd=218MB/s 
oi148 with slog sync dd=50MB/s zilstat iops/per txg=7500
oi148 NO slog async dd=216MB/s 
oi148 NO slog sync dd=5MB/s zilstat iops/per txg=750

illumos with slog async dd=223MB/s 
illumos with slog sync dd=13MB/s zilstat iops/per txg= first=4400 second=675 third=750 this pattern repeats
illumos NO slog async dd=221MB/s 
illumos NO slog sync dd=5MB/s zilstat iops/per txg=770

Do you still have the DDRdrive? It would be interesting to see if the same behaviour manifests itself using this device, since it isn&#039;t connected to an HBA.

I&#039;m very curious whether your tests involved a slog. Sorry to have wasted your time benching ESXi when it appears to be a local problem, but I never suspected that. On the brighter side, one can remove the slog nowadays without recreating the pool.

Again, many thanks, and I hope you have some more time and a dedicated slog to verify my latest finding,

Frederik</description>
		<content:encoded><![CDATA[<p>Hi Mike,</p>
<p>Thank you for investigating this. The regression only happens when using a dedicated slog on anything later than oi148 and, at least for the moment, doesn&#8217;t involve NFS. It can be easily reproduced locally with dd.<br />
# dd if=/dev/zero of=/mpool/4g.bin [oflag=sync] bs=32K count=128K<br />
This more or less simulates the ESXi NFS writes, sync and async. I tried various block sizes, but that didn&#8217;t make much of a difference.<br />
For example, a 2-disk pool with and without a slog:<br />
oi148 with slog async dd=218MB/s<br />
oi148 with slog sync dd=50MB/s zilstat iops/per txg=7500<br />
oi148 NO slog async dd=216MB/s<br />
oi148 NO slog sync dd=5MB/s zilstat iops/per txg=750</p>
<p>illumos with slog async dd=223MB/s<br />
illumos with slog sync dd=13MB/s zilstat iops/per txg= first=4400 second=675 third=750 this pattern repeats<br />
illumos NO slog async dd=221MB/s<br />
illumos NO slog sync dd=5MB/s zilstat iops/per txg=770</p>
<p>Do you still have the DDRdrive? It would be interesting to see if the same behaviour manifests itself using this device, since it isn&#8217;t connected to an HBA.</p>
<p>I&#8217;m very curious whether your tests involved a slog. Sorry to have wasted your time benching ESXi when it appears to be a local problem, but I never suspected that. On the brighter side, one can remove the slog nowadays without recreating the pool.</p>
<p>Again, many thanks, and I hope you have some more time and a dedicated slog to verify my latest finding,</p>
<p>Frederik</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike La Spina</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3962</link>
		<dc:creator>Mike La Spina</dc:creator>
		<pubDate>Tue, 20 Dec 2011 01:43:51 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3962</guid>
		<description>Hi Frederik,

A quick bench using ESXi 4.1 and an XP VM running on an NFS share served by an OI VM 148 -&gt; 151.1 shows no performance loss. It actually results in a 5%-10% improvement in IOPS over NFS. The bench was a simple IOMeter and SQLIO load using the same variables on both 148 and 151. Can you describe your OI host configuration, specifically the disk controller hardware and motherboard you are running? This sounds like a driver issue.

Regards,
Mike</description>
		<content:encoded><![CDATA[<p>Hi Frederik,</p>
<p>A quick bench using ESXi 4.1 and an XP VM running on an NFS share served by an OI VM 148 -&gt; 151.1 shows no performance loss. It actually results in a 5%-10% improvement in IOPS over NFS. The bench was a simple IOMeter and SQLIO load using the same variables on both 148 and 151. Can you describe your OI host configuration, specifically the disk controller hardware and motherboard you are running? This sounds like a driver issue.</p>
<p>Regards,<br />
Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederik</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3959</link>
		<dc:creator>Frederik</dc:creator>
		<pubDate>Mon, 19 Dec 2011 16:37:56 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3959</guid>
		<description>Much appreciated. Very coarse stats from zilstat show a drop from 3100-3400 I/Os per txg to 1900-2300 on anything later than oi148. I&#039;m using a fast, RAM-based dedicated slog on both machines, so there are no bottlenecks there. When inspecting the snoop files, I found that when copying, ESXi opened the new file (proc 7) on the NFS datastore with the fsync flag, as expected. Everything from mount, fsstat, etc. looks normal. I want to simulate the NFS RPCs issued by the ESXi shell cp command on a Linux client to see if I can reproduce the behavior on another platform.
TIA,
Frederik</description>
		<content:encoded><![CDATA[<p>Much appreciated. Very coarse stats from zilstat show a drop from 3100-3400 I/Os per txg to 1900-2300 on anything later than oi148. I&#8217;m using a fast, RAM-based dedicated slog on both machines, so there are no bottlenecks there. When inspecting the snoop files, I found that when copying, ESXi opened the new file (proc 7) on the NFS datastore with the fsync flag, as expected. Everything from mount, fsstat, etc. looks normal. I want to simulate the NFS RPCs issued by the ESXi shell cp command on a Linux client to see if I can reproduce the behavior on another platform.<br />
TIA,<br />
Frederik</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike La Spina</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3958</link>
		<dc:creator>Mike La Spina</dc:creator>
		<pubDate>Mon, 19 Dec 2011 15:58:12 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3958</guid>
		<description>Hi Frederik,

I see ... I will do some benchmarks on my end and see if it&#039;s something I have overlooked.

Regards,
Mike</description>
		<content:encoded><![CDATA[<p>Hi Frederik,</p>
<p>I see &#8230; I will do some benchmarks on my end and see if it&#8217;s something I have overlooked.</p>
<p>Regards,<br />
Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederik</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3955</link>
		<dc:creator>Frederik</dc:creator>
		<pubDate>Mon, 19 Dec 2011 13:33:25 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3955</guid>
		<description>I&#039;m aware of that issue. I even installed an updated (unpublished) e1000g driver a couple of days ago on top of my latest illumos bits, but that did not resolve the issue. To further exclude the DMA issue and narrow it down a bit more, I compiled iperf for the ESXi console and can reach 934 Mbit/s both ways from the ESXi service console to the storage servers. I can FTP at wire speed, on a single socket, to the ZFS storage from other clients. So basically the storage is OK EXCEPT when writing from ESXi. This can be observed from within the guest and from the service console when doing a copy from a fast local SSD to the NFS datastore. As soon as I reboot back into oi148, the writes are back to the expected performance. The second test server was a completely fresh install with minimal configuration. The same behaviour can be observed on that machine as well.
Do you know how to simulate, on a Linux client for example, the behavior of ESX when writing to vmdks on an NFS-backed store? Mounting the store from Linux with the sync option gave totally different behavior. Does ESX perhaps mount the store async but open the vmdk file with the sync flag?
Any other testing and/or debugging strategy would be more than welcome before I file a bug report.
Again, thank you for your time.</description>
		<content:encoded><![CDATA[<p>I&#8217;m aware of that issue. I even installed an updated (unpublished) e1000g driver a couple of days ago on top of my latest illumos bits, but that did not resolve the issue. To further exclude the DMA issue and narrow it down a bit more, I compiled iperf for the ESXi console and can reach 934 Mbit/s both ways from the ESXi service console to the storage servers. I can FTP at wire speed, on a single socket, to the ZFS storage from other clients. So basically the storage is OK EXCEPT when writing from ESXi. This can be observed from within the guest and from the service console when doing a copy from a fast local SSD to the NFS datastore. As soon as I reboot back into oi148, the writes are back to the expected performance. The second test server was a completely fresh install with minimal configuration. The same behaviour can be observed on that machine as well.<br />
Do you know how to simulate, on a Linux client for example, the behavior of ESX when writing to vmdks on an NFS-backed store? Mounting the store from Linux with the sync option gave totally different behavior. Does ESX perhaps mount the store async but open the vmdk file with the sync flag?<br />
Any other testing and/or debugging strategy would be more than welcome before I file a bug report.<br />
Again, thank you for your time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike La Spina</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3952</link>
		<dc:creator>Mike La Spina</dc:creator>
		<pubDate>Mon, 19 Dec 2011 06:49:32 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3952</guid>
		<description>The performance issue you have described is probably related to a known issue with a DMA driver function on certain network adapter calls, which is now fixed. You would need to point to the experimental repo and update the code to verify it on your install.

Regards,
Mike</description>
		<content:encoded><![CDATA[<p>The performance issue you have described is probably related to a known issue with a DMA driver function on certain network adapter calls, which is now fixed. You would need to point to the experimental repo and update the code to verify it on your install.</p>
<p>Regards,<br />
Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederik</title>
		<link>http://blog.laspina.ca/about/comment-page-1#comment-3948</link>
		<dc:creator>Frederik</dc:creator>
		<pubDate>Sun, 18 Dec 2011 23:42:21 +0000</pubDate>
		<guid isPermaLink="false">http://ux1.laspina.ca/?page_id=2#comment-3948</guid>
		<description>Hi Mike,

Have you by any chance used OpenIndiana 151a or the latest Illumos bits as a VMware datastore on one machine serving a different ESXi machine? I&#039;m experiencing a serious regression in write performance, both NFS and iSCSI. I see this on two test servers, AMD and Intel. Once I boot back into oi148, the last OpenSolaris bits before the gate closed, performance is as expected. When I reboot into later, Illumos-based bits, write performance is roughly halved. Since this is such a common setup, I&#039;m wondering why nobody else has reported this, except one other user who confirmed by mail that he still had the write performance drop. And because you use ESXi daily and have done a lot with OpenSolaris, I thought that if anybody would have seen this, it would be you. Before filing a bug report I would like to double-check with some known &quot;power users&quot;.
Thank you for your time.</description>
		<content:encoded><![CDATA[<p>Hi Mike,</p>
<p>Have you by any chance used OpenIndiana 151a or the latest Illumos bits as a VMware datastore on one machine serving a different ESXi machine? I&#8217;m experiencing a serious regression in write performance, both NFS and iSCSI. I see this on two test servers, AMD and Intel. Once I boot back into oi148, the last OpenSolaris bits before the gate closed, performance is as expected. When I reboot into later, Illumos-based bits, write performance is roughly halved. Since this is such a common setup, I&#8217;m wondering why nobody else has reported this, except one other user who confirmed by mail that he still had the write performance drop. And because you use ESXi daily and have done a lot with OpenSolaris, I thought that if anybody would have seen this, it would be you. Before filing a bug report I would like to double-check with some known &#8220;power users&#8221;.<br />
Thank you for your time.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

