I originally tried posting this back on April 11 - now that the list is "fixed" I want to try again - thanks:
Hi, we've aligned all our VMware VMDKs according to the NetApp best practices while tracking the pw.over_limit counter; see: http://www.vmadmin.info/2010/07/quantifying-vmdk-misalignment.html
Counters that indicate improper alignment (ref: ftp://service.boulder.ibm.com/storage/isv/NS3593-0.pdf): "There are various ways of determining if you do not have proper alignment. Using perfstat counters, under the wafl_susp section, "wp.partial_writes", "pw.over_limit", and "pw.async_read" are indicators of improper alignment. The "wp.partial_writes" counter is the block count of unaligned I/O. If more than a small number of partial writes happen, then IBM® System Storage™ N series with WAFL® (write anywhere file layout) will launch a background read. These are counted in "pw.async_read"; "pw.over_limit" is the block count of the writes waiting on disk reads."
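(Side note in case it helps anyone scripting this: below is a minimal Python sketch for pulling those three counters out of a perfstat text capture. It assumes the counters show up somewhere in the dump as "name = value" lines, which may not match every perfstat version, so treat the regex as a starting point rather than a known format.)

# Minimal sketch: pull the misalignment-related counters out of a perfstat
# text dump. Assumes (hypothetically) that the wafl_susp output prints the
# counters as "name = value" lines; adjust the regex to your perfstat format.
import re
import sys

COUNTERS = ("wp.partial_writes", "pw.over_limit", "pw.async_read")

def extract_counters(path):
    found = {}
    pattern = re.compile(r"(%s)\s*=\s*(\d+)" % "|".join(re.escape(c) for c in COUNTERS))
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                # Keep the last value seen for each counter in the dump.
                found[m.group(1)] = int(m.group(2))
    return found

if __name__ == "__main__":
    for name, value in sorted(extract_counters(sys.argv[1]).items()):
        print("%-20s %d" % (name, value))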
--
So the pw.over_limit counter is still recording a 5-minute average of 14, with 7-10 peaks in the 50-100 range at certain times of the day. If I look at the clients talking to the NetApp at those times, it's mostly Oracle RAC servers with storage for data and voting disks on NFS.
This leads me to the question: What, if any, are the other possible sources of unaligned IO on NetApp? All the references I find are about VMware VMDKs, but are there others, like Oracle, which may be doing block IO over NFS?
Many thanks
If you have database logs in your VMs, they can create what LOOKS like unaligned IO, but given the nature of how DB log writes happen, they are false positives.
There are several other causes of misaligned I/O. The first, as Jeff Mohler mentioned, is volumes holding transaction logs for databases: transactions can be very small and are written sequentially, so they can create crazy alignment statistics. Another is when you don't know exactly how the LUN type (the -t parameter when creating a LUN) and the operating system's partition offset work together.
For example, if you use a Windows 2008 server, create a LUN of type "windows" (not "windows_2008"), and partition the LUN with default parameters, you will have misaligned I/O.
Also, if you use a Windows 2003 server, create a LUN of type "windows", and partition the LUN using diskpart with "create partition primary align=64" (as recommended by most other SAN vendors), you will have misaligned I/O: even though the LUN has the correct OS type, it is not partitioned with the default offset, so you still get misalignment.
If you are not using virtualization, NetApp's LUN type must be set to the OS you are using, AND the partition must be created with default settings, or you will get misaligned I/O. If you are using virtualization, TR-3747 has all the information you need.
Also note that I am talking about _partition_ creation using default settings. Once the partition is created, you are free to format the LUN with any blocksize you like. For instance, for MS-SQL, selecting 4kB or 64kB blocks makes no difference for alignment, but the larger blocksize will yield better performance in SQL.
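To make the arithmetic behind this concrete: WAFL works in 4 kB blocks, and as far as I understand it the LUN type effectively shifts the LUN so that the OS's *default* partition start lands on a 4 kB boundary. Here is a minimal Python sketch of that check; the per-ostype skew values are my own illustrative guesses, not documented ONTAP internals, and the point is only the mod-4 kB arithmetic.

# Illustrative sketch of the ostype / partition-offset interplay: a guest write
# is aligned when (partition start offset + ostype skew) is a multiple of 4 KiB.
# The skew values below are assumptions for illustration only.
FOUR_K = 4096
OSTYPE_SKEW = {            # hypothetical skew per ostype (bytes)
    "windows": 512,        # compensates the classic 63-sector (32256 B) MBR start
    "windows_2008": 0,     # Win2008 defaults to a 1 MiB start, already 4K-aligned
}

def aligned(ostype, partition_start_bytes):
    """True if a partition starting at this byte offset ends up 4K-aligned."""
    return (partition_start_bytes + OSTYPE_SKEW[ostype]) % FOUR_K == 0

cases = [
    ("windows",      63 * 512),      # Win2003 default partition start -> aligned
    ("windows",      1024 * 1024),   # Win2008 default on a 'windows' LUN -> misaligned
    ("windows",      64 * 1024),     # diskpart 'align=64' on a 'windows' LUN -> misaligned
    ("windows_2008", 1024 * 1024),   # Win2008 default on a 'windows_2008' LUN -> aligned
]
for ostype, start in cases:
    verdict = "aligned" if aligned(ostype, start) else "MISALIGNED"
    print("%-13s start=%8d  %s" % (ostype, start, verdict))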
By the way, from ONTAP 8.0.1+, you can directly check alignment with the "lun show -v" command:
/vol/vol1/qtree1/lun1   300.0g (322126640640)   (r/w, online, mapped)
        Serial#: SKK4gZcq8s/m
        Share: none
        Space Reservation: disabled
        Multiprotocol Type: windows_2008
        Maps: win2008server
        Occupied Size: 809.2m (848465920)
        Creation Time: Mon May 16 12:35:58 CEST 2011
        Alignment: aligned
        Cluster Shared Volume Information: 0x0
Regards,
Anton van Bohemen
"By the way, from ONTAP 8.0.1+, you can directly check alignment with the “lun show –v” command:"
---
Gotta be careful here: this only works on the -direct- LUN.
This will not tell you how healthy any underlying virtual filesystems within the LUN may or may not be; it only reports on the high-level direct LUN itself.
On Fri, Aug 26, 2011 at 11:34 AM, Jeff Mohler <speedtoys.racing@gmail.com> wrote:
"By the way, from ONTAP 8.0.1+, you can directly check alignment with the “lun show –v” command:"
Gotta be careful here, this only works on the -direct- lun.
This will not tell you how healthy any underlying virtual filesystems within the LUN may or may not be, only the high level direct LUN itself.
Will easily queryable stats about misaligned VMDKs on NFS ever be available, without having to mount the datastores on Linux and check with fdisk...?
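(Not an answer to the "easily queryable" part, but the fdisk-style check itself is easy to script once a datastore is mounted somewhere. Below is a minimal Python sketch that reads the MBR of each -flat.vmdk and reports whether the partition start sectors are 4 kB multiples. It assumes MBR-partitioned guests and a conventional <datastore>/<vm>/<disk>-flat.vmdk layout; both are assumptions, and GPT guests would need different parsing.)

# Minimal sketch of the fdisk-style check, scripted: read the MBR of each
# *-flat.vmdk on an already-mounted NFS datastore and report whether every
# partition's starting LBA is a multiple of 8 sectors (8 x 512 B = 4 KiB).
# Assumes MBR-partitioned guests; GPT guests are not handled here.
import glob
import struct
import sys

def mbr_partition_starts(path):
    with open(path, "rb") as f:
        mbr = f.read(512)
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        return None                     # no MBR boot signature
    starts = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        if ptype == 0xEE:
            # GPT protective MBR; the real layout is in the GPT, not parsed here
            continue
        start_lba = struct.unpack("<I", entry[8:12])[0]
        if ptype != 0 and start_lba != 0:
            starts.append(start_lba)
    return starts

if __name__ == "__main__":
    # usage: python vmdk_align_check.py /mnt/datastore
    for vmdk in glob.glob(sys.argv[1] + "/*/*-flat.vmdk"):
        starts = mbr_partition_starts(vmdk)
        if not starts:
            print("%-60s no MBR partitions found" % vmdk)
            continue
        verdict = "aligned" if all(s % 8 == 0 for s in starts) else "MISALIGNED"
        print("%-60s start sectors %s -> %s" % (vmdk, starts, verdict))

It only reads the first 512 bytes of each flat file, so it is cheap enough to run across a whole datastore.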