hi,
we are finally getting legato networker going with ndmp. my f760 is running 5.36r2. it has a directly attached spectralogic treefrog. the sysstat data below is running at 1 second intervals. the filer is not being used yet, we are testing. i notice a regular pattern of cpu util going from about 10 to 20 to 30 to 40% and then back down. then quiet at about 7% for about 10 seconds then up and down again. is this normal for backup? thanks.
  7%    0     0     0     0     0   5328      0     0   5714    41
  7%    0     0     0     1     0   5477     16     0   5658    41
  7%    0     0     0     1     0   6152      0     0   6205    41
  6%    0     0     0     1     0   5653      0     0   5843    41
 CPU  NFS  CIFS  HTTP    Net kB/s     Disk kB/s    Tape kB/s Cache
                         in   out   read  write  read  write   age
  7%    0     0     0     0     0   5676     16     0   5407    41
  7%    0     0     0     1     1   5645      0     0   6335    41
  7%    0     0     0     1     0   5876      0     0   5960    41
  7%    0     0     0     1     1   5456     16     0   5530    41
 10%    0     1     0     1     1   6190      0     0   6150    41
 22%    0     0     0    14    26   8312      0     0   8479    41
 29%    0     0     0    14    26  12840    212     0  12485    41
 41%    0     0     0    40    78  10868      0     0  10936    41
 22%    0     0     0    14    26   9709      0     0   9533    41
 28%    0     0     0    27    52   9820     16     0   9400    41
 22%    0     0     0    14    25  11816      0     0  12349    41
 10%    0     0     0     1     1   7207      0     0   7011    41
 31%    0     0     0    27    52   9232     16     0   9400    41
 12%    0     0     0     0     0   9305      0     0   9779    41
  8%    0     0     0     1     0   6916      0     0   6820    41
  9%    0     0     0     1     0   7471     16     0   7565    41
 10%    0     0     0     1     0   7184      0     0   7864    41
 11%    0     0     0     1     0   7223    188     0   7257    41
  9%    0     0     0     1     0   7216     24     0   7188    41
  8%    0     0     0     0     0   7000      0     0   7188    41
On Mon, 5 Mar 2001, neil lehrer wrote:
Hi Neil & All,
To answer your question directly: the sysstat output looks fine to me. It looks like you caught it in the middle of writing file data to tape. Notice that in every 1-second interval, the amount of data read from disk is approximately the amount written to tape. I believe it's simply in the phase of copying files from the file system to tape.
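For anyone who wants to check the read-versus-write observation on their own sysstat log, here is a minimal Python sketch. It assumes the 11-column layout shown in this thread (CPU, NFS, CIFS, HTTP, net in/out, disk read/write, tape read/write, cache age); the function names are made up for illustration and the sample rows are copied from the output above.

```python
# Sample rows copied from the sysstat output in this thread.
# Assumed column order: CPU, NFS, CIFS, HTTP, net in, net out,
# disk read, disk write, tape read, tape write, cache age.
SAMPLE = """\
7% 0 0 0 0 0 5328 0 0 5714 41
10% 0 1 0 1 1 6190 0 0 6150 41
29% 0 0 0 14 26 12840 212 0 12485 41
41% 0 0 0 40 78 10868 0 0 10936 41
8% 0 0 0 0 0 7000 0 0 7188 41
"""


def parse_row(line):
    """Pick the CPU, disk-read and tape-write fields out of one sysstat line."""
    fields = line.split()
    return {
        "cpu": int(fields[0].rstrip("%")),
        "disk_read": int(fields[6]),
        "tape_write": int(fields[9]),
    }


def read_to_write_ratio(row):
    """Disk kB read per tape kB written during one interval."""
    return row["disk_read"] / row["tape_write"]


rows = [parse_row(line) for line in SAMPLE.splitlines()]
for row in rows:
    print(
        f'{row["cpu"]:3d}%  read={row["disk_read"]:6d}  '
        f'write={row["tape_write"]:6d}  ratio={read_to_write_ratio(row):.2f}'
    )
```

A ratio close to 1.0 in every interval is what you would expect if the filer is simply streaming file data from disk to tape.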
To explain the fluctuation in CPU usage, we need to bring dump, the utility that underlies NDMP backup on the filer, into the picture. NetApp filers have a native command, dump, that lets you back up a volume, or part of one, from the console or via rsh. It also serves as the engine for NDMP backups. NDMP applications manage the tape volumes, and they all invoke dump to stream the data to tape in BSD-dump format.
The CPU usage you are seeing here is mainly due to dump (since you said nothing else is running on the filer). Dump has its own CPU overhead, and several factors affect it.
First of all, the distribution of the required data blocks on disk affects its performance and CPU usage. If dump constantly has to block waiting for required data blocks, the CPU will obviously be idling (low CPU usage) and data throughput will be low as well. You should notice from the sysstat log that data throughput increases with CPU usage. The throughput per second per % of CPU remains relatively constant.
Another major factor that affects dump's CPU utilization is file size. Since the dump stream format is BSD-dump compliant, dump is required to build a 1 KB header for every file (a fixed CPU cost). If the files are relatively small, this CPU cost becomes proportionally large for the same throughput to tape.
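To put rough numbers on the header cost, here is a back-of-the-envelope Python sketch. It assumes exactly one 1 KB header per file and ignores everything else in the stream (directories, ACLs, any extra records for large files), so treat the results as an illustration rather than a measurement. The function names and the 6000 kB/s figure are just examples.

```python
HEADER_KB = 1  # per-file header size mentioned above


def header_fraction(file_kb):
    """Share of the tape stream taken up by headers for files of file_kb each."""
    return HEADER_KB / (file_kb + HEADER_KB)


def headers_per_second(stream_kb_per_s, file_kb):
    """How many per-file headers dump must build each second at a given tape rate."""
    return stream_kb_per_s / (file_kb + HEADER_KB)


# Same tape rate, very different header workloads (assumed example sizes):
for size_kb in (4, 64, 4096):
    print(
        f"{size_kb:5d} KB files: "
        f"{headers_per_second(6000, size_kb):8.1f} headers/s, "
        f"{header_fraction(size_kb):.2%} of the stream is headers"
    )
```

Under these assumptions, a stream of 4 KB files at 6000 kB/s needs 1200 headers per second, while 4 MB files need fewer than two, which is why small files push the CPU up at the same throughput.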
I am not sure if the backup in this particular case is a full volume/qtree dump, but the fact that dumps in Ontap 5.3.x releases read the inode file once in every phase might have some impact on the CPU.
I would also like to take this opportunity to bring the attention of this mailing list to some enhancements we introduced in Ontap 6.0.x to address the last point.
For a full volume/qtree dump, Ontap 5.3.x reads the inode file to select the inodes to dump. Since dump is partitioned into several phases (including dumpmaps, directories, files, and ACLs), dump runs through the inode file several times, each time just to determine whether a file or directory should be written out to tape in that particular phase. The time we spend searching the inode file can sometimes be blown out of proportion if you have a small set of files to dump in a large inode file. And we do this several times in a dump.
In Ontap 6.0.x, we enhanced this process by running through the inode file once, at the beginning, and building temporary bitmap files that indicate the directories, files and ACLs to be dumped. When it comes time to determine what to dump, we look it up in the bitmap files. The bitmap files are partially cached in memory, which greatly reduces the disk I/O needed to read the inode file. This also eliminates a lot of problems in Ontap 5.3.x involving switching tapes.
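As a rough illustration of the idea (not the actual Ontap code), here is a Python sketch of a single pass over an inode table that fills one bitmap per phase, so later phases only test a bit instead of rescanning. The Inode fields, phase names and function names are all hypothetical, and Python integers stand in for the on-disk bitmap files.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    number: int
    kind: str       # "dir" or "file" (hypothetical field)
    has_acl: bool   # hypothetical field
    selected: bool  # True if this inode belongs in the dump


def build_phase_bitmaps(inodes):
    """Scan the inode table once and set one bit per selected inode per phase."""
    bitmaps = {"dirs": 0, "files": 0, "acls": 0}
    for ino in inodes:
        if not ino.selected:
            continue
        phase = "dirs" if ino.kind == "dir" else "files"
        bitmaps[phase] |= 1 << ino.number
        if ino.has_acl:
            bitmaps["acls"] |= 1 << ino.number
    return bitmaps


def in_phase(bitmaps, phase, inode_number):
    """Bit test used by a phase instead of re-reading the inode table."""
    return bool((bitmaps[phase] >> inode_number) & 1)


table = [
    Inode(2, "dir", False, True),
    Inode(5, "file", True, True),
    Inode(7, "file", False, False),  # not part of this dump
]
bitmaps = build_phase_bitmaps(table)
print(in_phase(bitmaps, "dirs", 2), in_phase(bitmaps, "files", 7))
```

The point of the sketch is only the access pattern: one scan up front, then cheap lookups in each phase.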
In conclusion, we have seen far fewer problems with dump in 6.0.x than in 5.3.x. From that perspective, it's worth upgrading to 6.0.x, as it has a much more stable dump engine. Of course, you probably want to contact your tech support at NetApp to assess the viability of the upgrade.
Hope this helps.
=) Steve
neil lehrer nlehrer@ibb.gov writes:
we are finally getting legato networker going with ndmp. my f760 is running 5.36r2. it has a directly attached spectralogic treefrog. the sysstat data below is running at 1 second intervals. the filer is not being used yet, we are testing. i notice a regular pattern of cpu util going from about 10 to 20 to 30 to 40% and then back down. then quiet at about 7% for about 10 seconds then up and down again. is this normal for backup? thanks.
[...]
 CPU  NFS  CIFS  HTTP    Net kB/s     Disk kB/s    Tape kB/s Cache
                         in   out   read  write  read  write   age
  7%    0     0     0     0     0   5676     16     0   5407    41
  7%    0     0     0     1     1   5645      0     0   6335    41
  7%    0     0     0     1     0   5876      0     0   5960    41
  7%    0     0     0     1     1   5456     16     0   5530    41
 10%    0     1     0     1     1   6190      0     0   6150    41
 22%    0     0     0    14    26   8312      0     0   8479    41
 29%    0     0     0    14    26  12840    212     0  12485    41
 41%    0     0     0    40    78  10868      0     0  10936    41
 22%    0     0     0    14    26   9709      0     0   9533    41
 28%    0     0     0    27    52   9820     16     0   9400    41
 22%    0     0     0    14    25  11816      0     0  12349    41
 10%    0     0     0     1     1   7207      0     0   7011    41
 31%    0     0     0    27    52   9232     16     0   9400    41
 12%    0     0     0     0     0   9305      0     0   9779    41
  8%    0     0     0     1     0   6916      0     0   6820    41
  9%    0     0     0     1     0   7471     16     0   7565    41
I tried running a "sysstat 1" during our dumps this evening (dump, not ndmp, but it shouldn't make too much difference; tape drive is a DLT7000). The filer was only moderately quiet at the time though, and the 10 second periodicity I saw was clearly due to the CPs. That shouldn't be the case for you above: in fact I don't really understand why you have a trickle of disk write activity every 3 seconds at all. Does this happen even when a dump is not running?
Anyway, your dump rate would seem to be limited by the tape drive, and the disk read rate matches it very closely. The periods with higher rates are probably due to better compression being achieved. Better compression may correspond to smaller files, which would push the CPU rate up. What sort of data have you populated the filer with for these tests?
Chris Thompson University of Cambridge Computing Service, Email: cet1@ucs.cam.ac.uk New Museums Site, Cambridge CB2 3QG, Phone: +44 1223 334715 United Kingdom.
the filer is for general use, so word, excel, too many darned mp3 files, anything else they dump on it. i have copied the users' home dirs onto the filer for testing. we are nfs and cifs.