I'd agree with the latency standpoint.
It's a good metric to determine when the users will start to bitch.
The latency might be hidden by the application, but that depends on how well it's written (single-threaded vs. multi-threaded, low concurrency vs. high concurrency).
Typically, this is all a function of CPU/disk (and sometimes memory) load. As a result, those are good to watch, and they are a direct result of the overall workload (CIFS/NFS/iSCSI IOPS).
The hard part is determining what tanks the workload, and why. Most of the time, users will only feel the impact if there is a long-running event that generates high latency - however, depending on the workload, a single spike that lasts only moments (a second or two) can be devastating. This is where knowing your environment comes in handy.
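If you want one place to eyeball those together, a simple option (just a sketch - assuming a filer you can rsh to, and adjust the interval to taste) is sysstat, which prints CPU, NFS/CIFS ops, network and disk throughput, cache age, CP type and disk utilisation side by side:

rsh $FILER 'sysstat -x 1'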
Glenn
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Blake Golliher
Sent: Wednesday, November 15, 2006 7:08 PM
To: George, Andrew
Cc: Rohit; toasters@mathworks.com
Subject: Re: Measuring Filer Performance
Latency is the primary metric for most of what my filers are doing. I usually watch it via nfs_hist (being sure to -z the stats first), or with stats, all wrapped up in a shell script:
# sample per-volume and per-protocol latency/ops counters every $INTER seconds, $SAMPLES times
rsh "$FILER" "priv set -q diag ; stats show -i $INTER -n $SAMPLES \
  volume:$VOL:avg_latency volume:$VOL:total_ops \
  volume:$VOL:read_latency volume:$VOL:read_ops \
  volume:$VOL:write_latency volume:$VOL:write_ops \
  volume:$VOL:other_latency volume:$VOL:other_ops \
  volume:$VOL:nfs_read_latency volume:$VOL:nfs_read_ops \
  volume:$VOL:nfs_write_latency volume:$VOL:nfs_write_ops \
  volume:$VOL:nfs_other_latency volume:$VOL:nfs_other_ops \
  nfsv3:nfs:nfsv3_avg_op_latency nfsv3:nfs:nfsv3_op_percent \
  nfsv3:nfs:nfsv3_write_ops nfsv3:nfs:nfsv3_write_latency \
  nfsv3:nfs:nfsv3_read_ops nfsv3:nfs:nfsv3_read_latency \
  nfsv3:nfs:nfsv3_ops"
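(Here $FILER, $INTER, $SAMPLES and $VOL are just local shell variables - e.g. FILER=filer01, INTER=5, SAMPLES=12, VOL=vol0, purely example values - which is why the remote command is in double quotes rather than single quotes, so the shell actually expands them.)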
It's good to note that while stats shows averages, nfs_hist shows a histogram, so it can spot the occasional very-high-latency operation. Good to watch.
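A rough sketch of how I'd drive that from a script (assumes rsh access; the one-minute window is arbitrary):

# zero the NFS histogram, let it accumulate, then dump it to look for outliers
rsh $FILER 'priv set -q diag ; nfs_hist -z'
sleep 60
rsh $FILER 'priv set -q diag ; nfs_hist'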
I just poked around; there is a cifs component for stats, and it shows some interesting metrics that CIFS users might want to watch.
filer01*> stats show cifsdomain cifs
cifs:cifs:cifs_ops:2658/s
cifs:cifs:cifs_op_count:
  GetAttr: 1296
  Read: 989
  Write: 0
  Lock: 1
  Open/Close: 23
  Directory: 0
  Other: 335
cifs:cifs:cifs_op_pct:
  GetAttr: 48%
  Read: 37%
  Write: 0%
  Lock: 0%
  Open/Close: 0%
  Directory: 0%
  Other: 12%
cifs:cifs:cifs_latency:3.27ms
cifsdomain:10.0.0.3:netlogon_latency:9.50ms
cifsdomain:10.0.0.3:netlogon_latency_base:6
cifsdomain:10.0.0.3:lsa_latency:0ms
cifsdomain:10.0.0.3:lsa_latency_base:0
cifsdomain:10.0.0.3:samr_latency:0ms
cifsdomain:10.0.0.3:samr_latency_base:0
cifsdomain:10.0.0.2:netlogon_latency:2.00ms
cifsdomain:10.0.0.2:netlogon_latency_base:1
cifsdomain:10.0.0.2:lsa_latency:0ms
cifsdomain:10.0.0.2:lsa_latency_base:0
cifsdomain:10.0.0.2:samr_latency:0ms
cifsdomain:10.0.0.2:samr_latency_base:0
cifsdomain:10.0.0.1:netlogon_latency:0ms
cifsdomain:10.0.0.1:netlogon_latency_base:0
cifsdomain:10.0.0.1:lsa_latency:0ms
cifsdomain:10.0.0.1:lsa_latency_base:0
cifsdomain:10.0.0.1:samr_latency:0ms
cifsdomain:10.0.0.1:samr_latency_base:0
There's also smb_hist.
That's from a 960 running 6.5.6PsomethingDsomething....
-Blake
On 11/15/06, George, Andrew <georgea@anz.com> wrote:
We only use our filers for CIFS (so this may be completely inapplicable to you).
- Cache age! Mainly because when I have seen filers that are worked to death, that's the value that tends to react to it (though there are a lot of secondary metrics, CP type among them, to support that).
- As mentioned by someone else... the magic metric is how badly the users are bitching. That being a little "rubbery", we also check cache age, CP type, disk utilisation, CPU usage, ops/sec and network traffic to determine if anything's bottlenecking.
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Rohit
Sent: Wednesday, 15 November 2006 9:59 PM
To: toasters@mathworks.com
Subject: Measuring Filer Performance
Hi Folks
I had some open questions and I hope you'll help me answer them based on your experience.
- If you just had a single metric to determine filer performance, what would it be? Would it be CPU usage, throughput, latency, IOPS... or anything else?
- At your place, what metrics do you use to determine if a filer is still good enough not to worry about?
thanks rohit
"This e-mail and any attachments to it (the "Communication") is,
unless otherwise stated, confidential, may contain copyright material and is for the use only of the intended recipient. If you receive the Communication in error, please notify the sender immediately by return e-mail, delete the Communication and the return e-mail, and do not read, copy, retransmit or otherwise deal with it. Any views expressed in the Communication are those of the individual sender only, unless expressly stated to be those of Australia and New Zealand Banking Group Limited ABN 11 005 357 522, or any of its related entities including ANZ National Bank Limited (together "ANZ"). ANZ does not accept liability in connection with the integrity of or errors in the Communication, computer virus, data corruption, interference or delay arising from or in respect of the Communication."
Hi,
I agree that latency is one of the main factors. But sometimes I wonder about the best way to measure it.
One way I know is to use "tethereal -i [interface] -z rpc,rtt,100003,3", which gives me an overview of the RTT as seen by the client.
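For example, something like this on the client (just a sketch - eth0 and the 60-second window are placeholders) should print per-procedure call counts and min/avg/max RTT when the capture ends:

tethereal -i eth0 -a duration:60 -z rpc,rtt,100003,3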
But I am not so sure what to use to see this from the filer's side. I know that statit gives me some kind of RTT for each disk, but where do you get a good overall response time for your filers?
Those counters shown by Blake (...stats show...) are very good, but I think they only show what the head sees. What is a reasonable value to add as "head overhead"?
For example:
- I run tethereal from a direct-attached client and see a read RTT of 12ms.
- I run stats show -i 1 nfsv3:nfs:nfsv3_read_latency and see 3ms.
- I run a ping from the client to the filer and get an RTT of 2ms.
- So I would think that the filer overhead is ~7ms (12 - 3 - 2).
Do you think this is the correct way to go?
Best Regards
Jochen