Hello,
We've got a 3020 cluster running ONTAP 7.2.3; each node in the cluster is connected to DS14 and DS14MK2 shelves. We recently added some shelves (DS14MK2), and as part of this maintenance we decided to migrate data off the existing traditional volumes onto new flexible volumes, since we're not interested in keeping traditional volumes around anymore. For this purpose we used ndmpcopy to migrate data between shelves *local* to each node in the cluster, e.g.:
filer> ndmpcopy -l 0 /vol/oldtradvol /vol/newflexvol
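For anyone following along: ndmpd has to be enabled on the filer before ndmpcopy will run, and -l 0 requests a level-0 (i.e. full) copy. Checking/enabling it looks something like:
filer> ndmpd status
filer> ndmpd on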
Since we'd done this kind of maintenance on our other 3020s before, and this hardware is identical to the filers we'd worked on in the past, we expected throughput of ~18-20g/hr. Unfortunately, the results were wildly inconsistent between the two nodes:
Node A performed as expected and I was able to migrate 300g of data overnight. Node B has consistently transferred only 3-6g/hr.
I'm not seeing any errors on node B and I've rechecked my cabling to make sure nothing is amiss.
Why might I see such disparity in results on node A and node B?
If I were to guess.... Does the slow volume have a large number of files? Are those files tiny?
Do a df -i to see how many inodes are in use. Have you ever increased maxfiles? Have you ever increased the volume option maxdirsize? Either of those two changes will adversely affect the speed of *any* file-based backup.
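For reference, those checks map to console commands along these lines (substitute your own volume name; maxdirsize shows up in the vol options output):
filer> df -i /vol/oldtradvol
filer> maxfiles oldtradvol
filer> vol options oldtradvol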
--tmac
RedHat Certified Engineer #804006984323821 (RHEL4) RedHat Certified Engineer #805007643429572 (RHEL5)
Principal Consultant
On Tue, Sep 16, 2008 at 1:33 PM, tmac tmacmd@gmail.com wrote:
If I were to guess.... Does the slow volume have a large number of files?
You betcha. :-)
Are those files tiny?
Some yes, some no.
Do a df -i to see how many inodes are in use.
We seem to be around 60% on average.
Have you ever increased maxfiles?
No.
Have you ever increased the volume option maxdirsize?
Yes. Values are consistently set at 20971 on source and destination volumes on *all* filers.
Either of those two changes will adversely affect the speed of *any* file-based backup.
Even so, when I consider that the other node had the same settings and didn't exhibit this slowness, I am left confused. :-(
You're moving from trad vols to flex vols? Did you perhaps expand the traditional volume by adding a disk or two when you started running out of space, as opposed to pre-allocating all your disks? You could be doing your reads from all your disks concurrently on one filer and one or two on the other. Also, is this filer being used for other tasks while the ndmp is occurring?
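A rough way to look at the layout side of that theory (volume name here is just an example) is to compare how many spindles each source volume actually spans:
filer> vol status -r oldtradvol
filer> sysconfig -r
vol status -r lists the raid groups and disks behind the volume; if the slow head's trad vol was grown a disk or two at a time, most of the existing data is likely still concentrated on the original spindles rather than spread across the whole disk set.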
-n
On Tue, Sep 16, 2008 at 2:24 PM, Nicholas Bernstein nick@nicholasbernstein.com wrote:
You're moving from trad vols to flex vols? Did you perhaps expand the traditional volume by adding a disk or two when you started running out of space, as opposed to pre-allocating all your disks?
I'm sure that this has happened over time as we've added capacity to the environment and upgraded from ONTAP 6.x to 7.x.
You could be doing your reads from all your disks concurrently on one filer and one or two on the other.
This makes sense, and comparing the filers that are impacted against those that are not, I'd bet this is the case.
- I've noticed that on the impacted filers I get lousy performance for the first, say, 150g of an ndmpcopy (<2g/hr on average) and more acceptable performance (around 9g/hr on average) after 150g has been transferred.
- I've also noticed that for an ndmpcopy between trad and flex vols where the trad vol is the same size but has far fewer inodes in use, performance is approximately equal to what I've seen on our other filers (18-20g/hr). I mention this since one of the other responders in this thread brought up inode usage/directory size, and I suspect that's one of the two things we're bumping up against here.
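If the layout theory holds, one thing that might confirm it (I haven't run this here, and reallocate has to be enabled first) is a reallocate measurement against the slow source volume:
filer> reallocate on
filer> reallocate measure /vol/oldtradvol
filer> reallocate status -v
The measure scan reports an optimization rating for the volume's layout, which should show whether the data on the old tradvol is poorly laid out across its disks.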
Also, is this filer being used for other tasks while the ndmp is occurring?
No, thankfully.
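A simple way to double-check that while the copy is running is to leave sysstat going on the head:
filer> sysstat -x 1
If NFS/CIFS ops and disk utilization stay near idle apart from the copy itself, nothing else is competing for the spindles.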
Any VMDK's on the slow volume?
Christopher
On Tue, Sep 16, 2008 at 5:41 PM, Christopher Mende Christopher.Mende@peakuptime.com wrote:
Any VMDK's on the slow volume?
No.
Is minra different on the two volume sets? Minra=on makes reads much slower and can affect NDMP performance (tape or ndmpcopy).
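To check, vol options lists the current setting for each volume, and it can be turned off per volume if it differs, e.g.:
filer> vol options oldtradvol
filer> vol options oldtradvol minra off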