I'm trying to restore a file on a filer (via NetBackup) and the restore is *crawling* (a few KB/s on avg).
One test I'd normally do would be to simply mount this tape up on the system and test that I could read it quickly. Is there any way to do this directly on the filer? I see 'dd', but I don't see how to read from the tape device.
I don't think I can run a 'restore' because the data on the tape will be wrapped with NetBackup header information. I just want to test that the data can be read quickly. So actually interpreting the data isn't important at this point.
Thanks.
Background: Directly attached LTO3 drive. Backups are normally nice and speedy (50+ MB/s). ndmpd probe suggests that it's read 15GB of data in over 24 hours. *way* too slow.
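For a sense of scale, the figures above can be turned into rates with a quick back-of-the-envelope calculation. This is only a sketch: it assumes decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes), and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes).

def average_rate_kb_per_s(total_bytes: float, seconds: float) -> float:
    """Average throughput in kB/s for a transfer of total_bytes over seconds."""
    return total_bytes / seconds / 1e3

def seconds_at_rate(total_bytes: float, mb_per_s: float) -> float:
    """Time in seconds needed to move total_bytes at a steady mb_per_s."""
    return total_bytes / (mb_per_s * 1e6)

restored_bytes = 15e9        # 15 GB reported by 'ndmpd probe'
elapsed_seconds = 24 * 3600  # "over 24 hours", so the rate below is an upper bound

rate = average_rate_kb_per_s(restored_bytes, elapsed_seconds)
expected = seconds_at_rate(restored_bytes, 50)  # usual backup speed: 50 MB/s

print(f"average restore rate: at most {rate:.0f} kB/s")
print(f"time for 15 GB at 50 MB/s: {expected / 60:.0f} minutes")
```

Under those assumptions the restore averages no more than about 174 kB/s, while the same 15 GB would take about 5 minutes at the normal backup speed.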
Darren,
I think I may know what your issue is. I ran into the same issue at a client while implementing a NearStore VTL 700 and performing restore tests on a FAS3040c. We saw backup speeds in excess of 100 MB/s. However, when we did restores, we saw only 4-6 MB/s. It turned out that the system was affected by BUG 230194 - http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=230194 . This is easy to test. If you are working with a clustered system, temporarily disable clustering (cf disable) and run your restore test. If the performance picks up as expected, then you are affected by this bug. According to NOW, it was first fixed in 7.2.2; however, my client was running 7.2.3P4 at the time and the bug was still present. We upgraded them to 7.2.3P8 (and will upgrade them to 7.3.1 when ready), and the problem was gone.
Regards, Andre M. Clark
From the notes of that bug:
Make sure to obtain a fix from bug#272701.
Perhaps that is related in some way to the fact you didn't see the fix until 7.2.3P8?
On Fri, Sep 19, 2008 at 12:56:40AM -0400, "André M. Clark" wrote:
Darren,
I think I may know what your issue is. I ran into the same issue at a client while implementing a NearStore VTL 700 and performing restore tests on a FAS3040c. We saw backup speeds in excess of 100 MB/s. However, when we did restores, we saw only 4-6 MB/s. It turned out that the system was affected by BUG 230194 - http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=230194 .
Darn. Nope, this is a standalone machine. No cluster license or interconnect present.
Any other guesses?
Thanks!
On Thu, Sep 18, 2008 at 11:23:36PM +0000, A Darren Dunham wrote:
Background: Directly attached LTO3 drive. Backups are normally nice and speedy (50+ MB/s). ndmpd probe suggests that it's read 15GB of data in over 24 hours. *way* too slow.
I guess I was just looking for a read technique when I asked. I didn't really think the system details were important, so I forgot to include them. But in case they are...
Filer: FAS6070
OnTap: 7.2.2

slot 0: FC Host Adapter 0g (Dual-channel, QLogic 2322 rev. 3, 64-bit, L-port, <UP>)
        Firmware rev:    3.3.21
        Host Loop Id:    7
        FC Node Name:    5:00a:098600:011457
        Cacheline size:  16
        FC Packet size:  2048
        SRAM parity:     Yes
        External GBIC:   No
        Link Data Rate:  2 Gbit
        111: Tape: IBM ULTRIUM-TD3 64D0
slot 0: FC Host Adapter 0h (Dual-channel, QLogic 2322 rev. 3, 64-bit, L-port, <UP>)
        Firmware rev:    3.3.21
        Host Loop Id:    7
        FC Node Name:    5:00a:098700:011457
        Cacheline size:  16
        FC Packet size:  2048
        SRAM parity:     Yes
        External GBIC:   No
        Link Data Rate:  2 Gbit
        111: Tape: IBM ULTRIUM-TD3 64D0
You can use 'mt' to mount a tape, enable diagnostics, and do some rudimentary testing of the device. If you are using NDMP to perform the restore, you can enable debugging ('ndmpd debug 70' for max info) to see if there are any issues there.
sysstat could be helpful during the restore to try to determine any bottlenecks on the filer end. Also, the syslog for any SCSI
Restores will be slower than backups. I don't know the max's for the FAS6070, but when we had an R200 we saw that we could run about 200MB/sec backups with the CPUs at 100%. According to a NetApp engineer, the most an R200 could write to disk was about 50MB/sec. Even so, a 15GB/24hr restore is incredibly slow. We've replaced our R200 with a FAS6070 with six LTO-4 drives and back up 8TB daily over an 18hr period. I haven't actually timed my test restores, as they have always seemed to finish within acceptable time periods.
I don't know if it makes a difference, but we don't use the onboard FC ports for our tape drives. We use quad port FC tape adapters.
On Fri, Sep 19, 2008 at 12:57:25PM -0400, Bill Holland wrote:
You can use 'mt' to mount a tape, enable diagnostics, and do some rudimentary testing of the device. If you are using NDMP to perform the restore, you can enable debugging ('ndmpd debug 70' for max info) to see if there are any issues there.
Yeah. Tape motion works fine. I just wanted to see if I could direct OnTap to read the tape rather than deal with netbackup.
While netbackup/NDMP is restoring the file, I see very slow reads on sysstat, but I don't know if that's an NDMP issue or a tape read issue.
sysstat could be helpful during the restore to try to determine any bottlenecks on the filer end. Also, the syslog for any SCSI
Restores will be slower than backups. I don't know the max's for the FAS6070, but when we had an R200 we saw that we could run about 200MB/sec backups with the CPUs at 100%.
I can't get near that on the LTO-3. I never get faster than 80MB/s. The restore is around 100 KB/s.
 CPU   NFS  CIFS  HTTP      Net kB/s     Disk kB/s   Tape kB/s Cache
                           in    out   read  write  read write   age
 54%   803    46     0  10642   5422  19045  15465   193     0   >60
 18%   224     1     0   1330   5169  13065  13240   193     0   >60
 22%   636     0     0   7691   6498  18857  12320    64     0   >60
 20%   673     0     0   8540   6608  23073  19397   129     0   >60
 29%   501     0     0   7063   5842  14262  29822    64     0   >60
 24%   520   105     0  11136   5959  15357   4519   258     0   >60
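To put a number on the tape reads in that sysstat sample, the tape-read column can be averaged with a few lines of Python. This is a rough sketch, not a general sysstat parser: it assumes the eleven-column layout shown in the header above, and the helper name and column index are illustrative.

```python
# Data rows copied from the sysstat sample above (header lines omitted).
SYSSTAT_SAMPLE = """\
54% 803 46 0 10642 5422 19045 15465 193 0 >60
18% 224 1 0 1330 5169 13065 13240 193 0 >60
22% 636 0 0 7691 6498 18857 12320 64 0 >60
20% 673 0 0 8540 6608 23073 19397 129 0 >60
29% 501 0 0 7063 5842 14262 29822 64 0 >60
24% 520 105 0 11136 5959 15357 4519 258 0 >60
"""

# Assumed column order (0-based): CPU, NFS, CIFS, HTTP, net in, net out,
# disk read, disk write, tape read, tape write, cache age.
TAPE_READ_COLUMN = 8
EXPECTED_FIELDS = 11

def tape_read_rates(text: str) -> list[int]:
    """Return the tape-read kB/s value from every data row in text."""
    rates = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers or wrapped lines: data rows start with a CPU percentage.
        if len(fields) != EXPECTED_FIELDS or not fields[0].endswith("%"):
            continue
        rates.append(int(fields[TAPE_READ_COLUMN]))
    return rates

rates = tape_read_rates(SYSSTAT_SAMPLE)
mean_rate = sum(rates) / len(rates)
print(f"tape read samples (kB/s): {rates}")
print(f"mean tape read rate: {mean_rate:.1f} kB/s")
```

For these six samples the mean comes out at roughly 150 kB/s, which is in line with the "around 100 KB/s" restore rate and far below the 80MB/s seen on backups.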
I'm actually trying to restore a single 100K file. I assume that it has to read through some of the headers at the beginning of the tape, but it was sure slow doing it.
Another potential issue.. is your backup software DAR capable/enabled? Which ndmp version are you running for your backups?
A> On Fri, Sep 19, 2008 at 12:57:25PM -0400, Bill Holland wrote:
You can use 'mt' to mount a tape, enable diagnostics, and do some rudimentary testing of the device. If you are using NDMP to perform the restore, you can enable debugging ('ndmpd debug 70' for max info) to see if there are any issues there.
A> Yeah. Tape motion works fine. I just wanted to see if I could
A> direct OnTap to read the tape rather than deal with netbackup.
Did you enable DAR with your NDMP backups? As I understand it, DAR allows the backup software to move the tape to the closest position near the file to be restored, which makes things much faster. Otherwise it needs to scan the tape sequentially...
A> While netbackup/NDMP is restoring the file, I see very slow reads
A> on sysstat, but I don't know if that's an NDMP issue or a tape read
A> issue.
Hmmm.... does NetBackup use a really small block size?
Maybe NDMP copy could be used to pull the backup off tape and send it to another node, where you could extract it via a pipe to tar (or whatever NetBackup uses for its internal format)?
sysstat could be helpful during the restore to try to determine any bottlenecks on the filer end. Also, the syslog for any SCSI
Restores will be slower than backups. I don't know the max's for the FAS6070, but when we had an R200 we saw that we could run about 200MB/sec backups with the CPUs at 100%.
A> I can't get near that on the LTO-3. I never get faster than
A> 80MB/s. The restore is around 100 KB/s.
That's just way too slow. I assume it's a fibre channel drive? All the way down to the drive? You'd think even a plain read/scan of the tape would fly.
What happens if you create your own backup tape(s) on the filer using plain 'dump' commands to the same drive. Can you then read it back more quickly? That will at least give you more confidence that you've got a good connection.
Speaking of that, maybe you've got a flaky fibre connection or dirty optics? Try re-seating both ends of the cable to make sure you're getting the right connection. I remember one with a 50pin (or was it 68?) where we bent one pin in the connector and it just failed down to old SCSI-1 speeds, but it did work. Just really really slowly...
A>  CPU   NFS  CIFS  HTTP      Net kB/s     Disk kB/s   Tape kB/s Cache
A>                            in    out   read  write  read write   age
A>  54%   803    46     0  10642   5422  19045  15465   193     0   >60
A>  18%   224     1     0   1330   5169  13065  13240   193     0   >60
A>  22%   636     0     0   7691   6498  18857  12320    64     0   >60
A>  20%   673     0     0   8540   6608  23073  19397   129     0   >60
A>  29%   501     0     0   7063   5842  14262  29822    64     0   >60
A>  24%   520   105     0  11136   5959  15357   4519   258     0   >60
A> I'm actually trying to restore a single 100K file. I assume that it has
A> to read through some of the headers at the beginning of the tape, but it was
A> sure slow doing it.
A> --
A> Darren
On Fri, Sep 19, 2008 at 02:51:03PM -0400, John Stoffel wrote:
A> On Fri, Sep 19, 2008 at 12:57:25PM -0400, Bill Holland wrote:
A> Yeah. Tape motion works fine. I just wanted to see if I could
A> direct OnTap to read the tape rather than deal with netbackup.
Did you enable DAR with your NDMP backups? As I understand it, DAR allows the backup software to move the tape to the closest position near the file to be restored, which makes things much faster. Otherwise it needs to scan the tape sequentially...
Correct. Or at least what I've done is to not disable it. My restoration log says that DAR is enabled, but it still took a couple of days to restore.
A> While netbackup/NDMP is restoring the file, I see very slow reads
A> on sysstat, but I don't know if that's an NDMP issue or a tape read
A> issue.
Hmmm.... does NetBackup use a really small block size?
It shouldn't, but that's why I want to try a 'dd' or similar to see how the tape performs. Other than 'restore', I can't see anything I can do in OnTAP to read the tape.
Maybe NDMP copy could be used to pull the backup off tape and send it to another node, where you could extract it via a pipe to tar (or whatever NetBackup uses for its internal format)?
I'm not sure how I'd do that. The data on tape is wrapped in a netbackup header, so I don't think OnTAP will like looking at the data.
A> I can't get near that on the LTO-3. I never get faster than
A> 80MB/s. The restore is around 100 KB/s.
That's just way too slow. I assume it's a fibre channel drive? All the way down to the drive? You'd think even a plain read/scan of the tape would fly.
Correct. Just a straight path from the controller to the drive. Not even a switch.
What happens if you create your own backup tape(s) on the filer using plain 'dump' commands to the same drive. Can you then read it back more quickly? That will at least give you more confidence that you've got a good connection.
Well, just the fact that the dump runs at good speed makes me think the connection is good. But yes, I suppose I can do a dump/restore test. But I'd rather test this tape... :-)
Speaking of that, maybe you've got a flaky fibre connection or dirty optics? Try re-seating both ends of the cable to make sure you're getting the right connection. I remember one with a 50pin (or was it 68?) where we bent one pin in the connector and it just failed down to old SCSI-1 speeds, but it did work. Just really really slowly...
Backup speeds (80MB/s) are fine.