Greetings,
I'm thinking about doing something that is not supported and was wondering if anyone had done the same or has more detailed insight.
We have a very busy cluster (6040s, 7.3.5.1P4). It looks like we are largely maxing out the heads on CPU. We are getting a pair of 6080s and really need to try to do the head swap live (takeover/giveback) if at all possible. The unsupported part is that I want to keep the 6040 NVRAM cards and put them in the 6080s as I swap them. The reason is that I would not have to change the system ID ownership on all the drives.
I know changing the system ID is generally not a big deal: you boot each head to maintenance mode and reassign the old SID to the new SID. In our case it worries me. Last week we were going to move a project to the other head by reassigning the appropriate drives for a couple of aggregates. While we were trying to reassign them, the SAS buses started panicking and crashed the controlling filer. The entire cluster was down. The ensuing mess took several hours to clean up.
If it crashed while trying to change ownership of a few drives, I'm afraid of what will happen when it tries to reassign all the old-SID drives for the new NVRAM card. I was hoping that if we could keep the cards, we could swap heads without changing SIDs and minimize our chance of repeating the crash. I could do the disks one at a time, but I have 796 drives on this cluster and would rather not.
Is there a requirement for the hardware to have the bigger memory cards? Since there are more CPUs, I can see how something might need the extra capacity; I just don't know what. We will probably have a downtime window in a couple of months when I can put the correct cards back in.
Thanks,
Jeff
Jeff,
An interesting point of view, and quite pragmatic too. However, I would not do that, and for quite a few reasons.
First, the undocumented nature of the NVRAM content. Are you sure the NVRAM contains just the system ID? I don't think so. I think it is quite possible that the NVRAM holds machine-specific content, which will just dig you deeper into problems if something goes wrong.
Second, I think your panicked filer is not an NVRAM issue; it sounds more like a software issue. Your Data ONTAP version is pretty old, and 7.3.5.* has a large number of WTF bugs; depending on what SAS HBA you are using, there are a few newer drivers built into more recent DOT releases. Having a look at the bug database is always recommended; enjoy the obscurity of some ONTAP bugs. ;)
HTH, Felix -- Felix Schröder Support Engineer
teamix GmbH Suedwestpark 35 90449 Nuernberg
fon: +49 911 30999-68 fax: +49 911 30999-99 mail: fs@teamix.de web: http://www.teamix.de
Amtsgericht Nürnberg, HRB 18320 Geschäftsführer: Oliver Kügow, Richard Müller
----- Original Message ----- From: "Jeff Cleverley" jeff.cleverley@avagotech.com To: toasters@teaparty.net Sent: Thursday, January 31, 2013 02:58:00 Subject: 6080 heads with 6040 NVRAM cards.
Felix,
Thanks for the reply. I figure the cards can't have any host-specific information on them, since that would be lost any time you had to replace one. I could see the larger card having storage blocks set aside for something like the added CPUs, etc.
We have been considering upgrading to a newer OS, but our CPU loading went up 30-50% when we upgraded to this release on one of our filers. NetApp never could figure out what changed. Not knowing what caused it made looking for bugs quite difficult :-) We've been afraid of what might happen going to a new OS.
At some point we may get 6280s or similar, which will force an OS upgrade. Unfortunately the cluster interconnect on the 6280 is SAS and not the InfiniBand connector, so we could not do a hot swap with that model. Otherwise we might have thought about bigger heads this time around.
We most likely won't do this but it is always good to get as much information as possible :-)
Jeff
Hi,
You are forgetting something. The FAS6040 and FAS6080 use different types of NVRAM cards with different amounts of memory (512 MB vs. 2 GB).
Even if it would technically work, which I doubt, it would certainly be a totally unsupported configuration.
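If it helps, both the installed card and the system ID the disks are keyed to are easy to check from the console; a minimal sketch (output omitted, and the exact wording varies by model and release):

    filer> sysconfig        # the first lines include the System ID
    filer> sysconfig -a     # per-slot card inventory, including the NVRAM card and its memory size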
Pascal.
-----Original Message----- From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Cleverley Sent: Thursday, January 31, 2013 2:58 AM To: toasters@teaparty.net Subject: 6080 heads with 6040 NVRAM cards.
In addition to all the other things mentioned:
You DO NOT (and should not) go into maintenance mode while a takeover is taking place! Really bad idea (unlocking mailbox disks and all that)!
INSTEAD: While in Takeover Mode (on the node that has taken over) do a disk _re_assign. This will reassign *all* disks belonging _to the partner_ to a new sysid.
This should work non-disruptively.
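Roughly, the whole swap then looks something like this (the sysids are placeholders you would read from sysconfig beforehand; take it as a sketch of the idea, not a verified runbook for your release):

    node1> cf takeover                       # node1, the remaining 6040, takes over node2
    ... power off node2, swap in the 6080 head, recable ...
    node1(takeover)> disk reassign -s <old_node2_sysid> -d <new_node2_sysid>
                                             # run on the node in takeover: moves every disk owned by
                                             # the old sysid to the new head's sysid in one pass
    node1(takeover)> cf giveback             # the new 6080 boots already owning its disks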
As for your previous aggregate reassign: how did you do it? I sure hope you

* turned auto-assign off on both nodes
* took the aggregate offline
* _un_assigned the disks from the owning node (disk assign xxx -s unowned -f)
* assigned them from the new node
  o look there: a new aggregate. should be the response of the new node
* turned auto-assign back on (if needed)

Anything else and I'm not surprised it panicked... Reassigning disks one by one (on online aggregates) leads to broken RAID groups and panics from that. A rough sketch of the sequence is below.
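Something like this (the aggregate name, disk names and node names are made up; repeat the two assign steps for every disk in the aggregate):

    node1> options disk.auto_assign off      # on both nodes
    node2> options disk.auto_assign off
    node1> aggr offline aggr_proj            # the aggregate being moved
    node1> disk assign 0a.16 -s unowned -f   # release each of its disks from node1
    node2> disk assign 0a.16 -o node2        # claim each disk on node2
    node2> aggr status                       # the 'new' aggregate should show up here; online it if needed
    node1> options disk.auto_assign on       # re-enable afterwards if you use it
    node2> options disk.auto_assign on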
Hope that helped
Sebastian
Greetings,
I thought I'd summarize what went on and pass along some key points that may help others in the future.
1. We used the 6080 NVRAM cards, because nobody could tell us for sure whether the 6040 cards would work properly.

2. We found you cannot live-swap 6040 and 6080 heads in a cluster. When you take over a 6040 and replace it with a 6080, things work until you do a giveback to what is now the 6080. The remaining 6040 head then disables the cluster because of the different card in the 6080. You can force a takeover of the 6040 if you want to skip the message about it corrupting your data.

3. The 6080 heads had previously booted 8.x images. We run 7.3.5.1. The systems would not boot in any way, and we could not get the CompactFlash cards to update after netboots and software installs. Ultimately we just swapped in the CompactFlash cards from the 6040s and everything worked fine.
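For anyone repeating this, a quick way to see what is actually on the boot device versus what is running (just a sketch, output omitted):

    node> version        # the Data ONTAP release currently running
    node> version -b     # the images present on the boot device (the CompactFlash card)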
Thanks for all the help and information on this.
Jeff