From: Eyal Traitel [mailto:eyal.traitel@motorola.com]
Where/What/Who... :) How do I enable it? First time I've ever heard about it being in 5.3.
These were introduced in 5.3.4. They are a summary form of a full core.
"Graham C. Knight" wrote:
Does anyone know if there is a way to have the mini-core files that are introduced in 5.3 mailed to a list *other* than my autosupport list
- or to disable them being mailed at all?
Currently these get delivered to the recipients on the "autosupport.to" list, and there isn't a way to tweak that. We plan to make this type of thing more configurable in the future.
Do you like getting copies of all the other autosupports, but want to avoid having the mini cores delivered to your mailbox? Or would you also like more flexibility regarding whether and where they get delivered?
----------------------------------------------------------------------
steve klinkner                                        klink@netapp.com
"Klinkner, Steve" wrote:
Do you like getting copies of all the other autosupports, but want to avoid having the mini cores delivered to your mailbox? Or would you also like more flexibility regarding whether and where they get delivered?
Yes. :-)
A different option for each type of e-mail would be preferable. I have a bunch of people getting our autosupport mail (it goes to a maillist in the company) - those people started complaining when a bunch of mini-cores came out of a filer we were having problems with.
It's a nitpick really - I hope not to have these things mailed very often. :-)
Graham
Steve Klinkner klink@netapp.com writes:
From: Eyal Traitel [mailto:eyal.traitel@motorola.com]
Where/What/Who... :) How do I enable it? First time I've ever heard about it being in 5.3.
These were introduced in 5.3.4. They are a summary form of a full core.
It's about time the man pages for na_savecore and na_crash were updated, isn't it? They don't even admit the existence of core.*.nz files, which have certainly existed at least since 5.2.1.
Incidentally, has anyone found a way of significantly compressing the .nz dumps before transfer to NetApp? When I tried gzip, I got only about a 2% reduction in size, which wasn't worth it.
Chris Thompson                  University of Cambridge Computing Service,
Email: cet1@ucs.cam.ac.uk       New Museums Site, Cambridge CB2 3QG,
Phone: +44 1223 334715          United Kingdom.
From Chris Thompson on Wed, 22 Mar 2000 22:10:46 GMT:
It's about time the man pages for na_savecore and na_crash were updated, isn't it? They don't even admit the existence of core.*.nz files, which have certainly existed at least since 5.2.1.
I agree.
Incidentally, has anyone found a way of significantly compressing the .nz dumps before transfer to NetApp? When I tried gzip, I got only about a 2% reduction in size, which wasn't worth it.
Isn't that the point of .nz cores? They are pre-compressed.
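A quick way to see why gzip barely helps: once data has been through a compressor, a second pass has almost nothing left to squeeze. A small Python sketch, purely illustrative and nothing NetApp ships, that shows the effect (consistent with the ~2% Chris saw):

    # Illustrative only: compressing already-compressed data gains almost nothing,
    # which is why gzip barely shrinks a .nz core.
    import zlib

    raw = b"wafl inode block checksum raid stripe " * 100000   # compressible stand-in data
    once = zlib.compress(raw, 9)     # what the filer already did to make the .nz core
    twice = zlib.compress(once, 9)   # gzipping it again on an admin host

    print("raw: %d bytes, once: %d, twice: %d" % (len(raw), len(once), len(twice)))
    print("second pass saves %.1f%%" % (100.0 * (1.0 - float(len(twice)) / len(once))))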
<soapbox>
Incidentally, have you noticed that it takes about five times longer to dump a core now that they are .nz compressed? This is a very frustrating "feature". The savecore process works while the filer is ON-LINE, whereas the dump process works while the filer is OFF-LINE.
If the goal is for the filer to be ON-LINE more than it is OFF-LINE, why delay the time it spends dumping with compressing a core when we could all gzip it when the filer is ON-LINE? Or why not put the compression code in the savecore process which can be executed after the filer is back ON-LINE?
I don't know about other SysAdmins, but I hate watching dots fly across the screen for 10 minutes while some Director of Engineering is screaming into the phone because their box isn't back ON-LINE. I'm especially frustrated when I know those dots could be doing their job after the Director has "enhanced their calm" when the box is back ON-LINE and making them money again.
</soapbox>
<rfe>
It would be nice if:
1) Compression was done during "savecore" rather than "dumping core"
2) Compression could be turned off via "options compress.cores.enable off"
Either would let us get boxes back on-line faster and compress the cores on our own time rather than on our customers' time (a rough sketch of that "own time" workflow follows below).
</rfe>
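For the "own time" part, something like the following would do -- a hypothetical admin-side sketch, not a NetApp tool; the mount path and filenames are made up:

    # Hypothetical sketch: gzip saved cores from an admin host while the
    # filer is already back ON-LINE and serving data. Paths below are made up.
    import glob
    import gzip
    import shutil

    CRASH_DIR = "/mnt/filer1/etc/crash"    # assumed NFS mount of the filer's root volume

    for core in glob.glob(CRASH_DIR + "/core.*"):
        if core.endswith(".gz") or core.endswith(".nz"):
            continue                       # skip anything already compressed
        with open(core, "rb") as src, gzip.open(core + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)   # stream-compress so RAM usage stays small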
Thanks!
-- Jeff
--
----------------------------------------------------------------------------
Jeff Krueger                                 E-Mail: jeff@qualcomm.com
NetApp File Server Lead                      Phone:  858-651-6709
IT Engineering and Support                   Fax:    858-651-6627
QUALCOMM, Incorporated                       Web:    www.qualcomm.com
Dear Jeff:
You said--
Incidentally, have you noticed that it takes about five times longer to dump a core now that they are .nz compressed? This is a very frustrating "feature". The savecore process works while the filer is ON-LINE, whereas the dump process works while the filer is OFF-LINE.
If the goal is for the filer to be ON-LINE more than it is OFF-LINE, why delay the time it spends dumping with compressing a core when we could all gzip it when the filer is ON-LINE? Or why not put the compression code in the savecore process which can be executed after the filer is back ON-LINE?
The problem lies in the implementation of core dump.
Cores are not dumped directly to the file system--at the time the filer panics, you can't risk touching the file system lest you corrupt it. So the current state--the core--is not written to a single reserved area on one disk; rather, the reserved areas on all the disks are filled up one by one with chunks of the core dump, and savecore unwinds this after reboot.
As filers have received more main memory, we began running out of reserved disk areas before the whole core was dumped. The actual ratio of memory to disk depends on memory size and disk model; the larger the ratio the more likely you'll be unable to dump the whole core. In response, we implemented the compressed core feature. Now if the filer computes there isn't enough disk space to save the entire core uncompressed, we compress the core before writing it out.
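As a rough mental model -- not actual Data ONTAP code, and assuming the 20 MB-per-disk reserve figure from the System Administrator's Guide that comes up later in this thread -- the check amounts to something like:

    # Rough model only -- not Data ONTAP source. Decide whether the core has to
    # be compressed before being spread across the per-disk reserved dump areas.
    RESERVED_MB_PER_DISK = 20          # assumed figure, quoted from the SAG

    def must_compress_core(memory_mb, num_disks):
        """True if an uncompressed core won't fit in the reserved dump areas."""
        dump_space_mb = num_disks * RESERVED_MB_PER_DISK
        return memory_mb > dump_space_mb

    # e.g. an F760 with 1 GB of RAM:
    print(must_compress_core(1024, 14))    # True  -> 14 disks give only 280 MB
    print(must_compress_core(1024, 52))    # False -> 52 * 20 MB = 1040 MB fits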
(The other obvious fix would be to change the size of the reserved disk area, but we were loath to do that, as we didn't want to make changes to the disk layout. Such changes would deeply affect both reverting back to previous releases and the migration of disks to new filer heads during an upgrade. Basically, we concluded that the data layout on the disk is sacrosanct.)
Finally, we concluded that panics were sufficiently rare events that we were willing to trade off some time during compression to ensure that we got the entire core, without too badly affecting our overall availability. Of course, the very fact that we had to make tradeoffs means that some customers in some configurations would see some degradation. We are constantly looking for ways to improve our self-diagnostic capability. In fact, the guy in the next office is looking at core dumps right now, and I'll make sure he has read your note.
Yours, Mike Tuciarone Platform Software
From "Michael J. Tuciarone" on Wed, 22 Mar 2000 16:54:54 PST:
Cores are not dumped directly to the file system--at the time the filer panics, you can't risk touching the file system lest you corrupt it. So the current state--the core--is not written to a single reserved area on one disk; rather, the reserved areas on all the disks are filled up one by one with chunks of the core dump, and savecore unwinds this after reboot.
This part I was aware of.
As filers have received more main memory, we began running out of reserved disk areas before the whole core was dumped. The actual
Oh. =(
ratio of memory to disk depends on memory size and disk model; the larger the ratio the more likely you'll be unable to dump the whole core. In response, we implemented the compressed core feature. Now if the filer computes there isn't enough disk space to save the entire core uncompressed, we compress the core before writing it out.
So it doesn't always compress the core during the dump? Even our filers with 56 of the 18GB drives take a noticeably longer time to dump. Admittedly, they have 1GB of RAM, but if the reserved areas of 52 4GB disks could hold a core from 512MB of RAM on an F630, why can't 56 18GB drives hold a core from 1GB of RAM on an F760? It seems like 5 times the amount of disk got added but only twice the amount of RAM.
(The other obvious fix would be to change the size of the reserved disk area, but we were loath to do that, as we didn't want to make changes to the disk layout. Such changes would deeply affect both reverting back to previous releases and the migration of disks to new filer heads during an upgrade. Basically, we concluded that the data layout on the disk is sacrosanct.)
Understandably so. Thanks for not changing the size - reverting would have been horrible.
Finally, we concluded that panics were sufficiently rare events that we
Uhhh... sufficiently rare? My customer's definition of sufficiently rare downtime is none whatsoever. =)
were willing to trade off some time during compression to ensure that we got the entire core, without too badly affecting our overall availability. Of course, the very fact that we had to make tradeoffs
In the end, of course, we'll spend the extra few minutes off-line to get the core so you folks can fix our problem(s). Unfortunately, this is the first time we've gotten a complete technical explanation of the problem (the memory to reserved-disk ratio). All we had heard before was "Guess what? You don't have to gzip your cores anymore!" which, obviously, didn't sit well. =)
Is there any metric we can use to know if the filer is going to compress the core or not? All our filers seem to compress all their cores.
Thanks for the in-depth response!
-- Jeff
--
----------------------------------------------------------------------------
Jeff Krueger                                 E-Mail: jeff@qualcomm.com
NetApp File Server Lead                      Phone:  858-651-6709
IT Engineering and Support                   Fax:    858-651-6627
QUALCOMM, Incorporated                       Web:    www.qualcomm.com
As filers have received more main memory, we began running out of reserved disk areas before the whole core was dumped. The actual ratio of memory to disk depends on memory size and disk model; the larger the ratio the more likely you'll be unable to dump the whole core. In response, we implemented the compressed core feature. Now if the filer computes there isn't enough disk space to save the entire core uncompressed, we compress the core before writing it out.
This makes me very interested in providing enough disks to allow uncompressed cores. However, looking at the documentation, I don't think that's going to happen. According to the SAG, each disk provides only 20 MB of dump space, so for an F760 with 1024 MB of main memory, fifty-two disks would be necessary for an uncompressed dump. Is this correct? The chart still says only fourteen for "256 or more." I've got fourteen on each filer; if twenty, say, would do it, I'd rearrange filers so that the most sensitive ones had twenty, but fifty-two is out of the question.
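For what it's worth, the fifty-two is just ceiling arithmetic on that 20 MB/disk figure, assuming the SAG number is still accurate:

    # Quick check of the fifty-two figure, assuming 20 MB of dump space per disk.
    import math

    memory_mb = 1024
    reserved_mb_per_disk = 20
    print(math.ceil(memory_mb / reserved_mb_per_disk))   # -> 52 disks needed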
ejt
Ethan Torretta ejt@tellme.com writes:
"Michael J. Tuciarone" tooch@netapp.com wrote:
As filers have received more main memory, we began running out of reserved disk areas before the whole core was dumped. The actual ratio of memory to disk depends on memory size and disk model; the larger the ratio the more likely you'll be unable to dump the whole core. In response, we implemented the compressed core feature. Now if the filer computes there isn't enough disk space to save the entire core uncompressed, we compress the core before writing it out.
This makes me very interested in providing enough disks to allow uncompressed cores. However, looking at the documentation, I don't think that's going to happen. According to the SAG, each disk provides only 20 MB of dump space, so for an F760 with 1024 MB of main memory, fifty-two disks would be necessary for an uncompressed dump. Is this correct? The chart still says only fourteen for "256 or more." I've got fourteen on each filer; if twenty, say, would do it, I'd rearrange filers so that the most sensitive ones had twenty, but fifty-two is out of the question.
The problem would seem to be that the 20 MB reserved area hasn't altered from the days of 2 GB, maybe 1 GB, discs, and so things have got out of proportion.
How about using the space wasted by "right-sizing" for dumps? Only if it exists, of course, but in practice it will, and will be a lot more than 20 MB per disc!
Chris Thompson                  University of Cambridge Computing Service,
Email: cet1@ucs.cam.ac.uk       New Museums Site, Cambridge CB2 3QG,
Phone: +44 1223 334715          United Kingdom.