toasters December 2016

toasters@lists.teaparty.net

18 participants
14 discussions

System attention LED
by Alexander Griesser 01 Feb '17


Hi there,

I have two filers (1x FAS2240 and 1x FAS2650), and both have the system attention LED lit; I have no idea why. `system health status show` and Config Advisor both report A-OK, and there are no visible indicators of why this LED is still lit.

Any idea how to find the root cause of this LED on the front bezel, or how I can check or maybe reset the status of the LED through the CLI?

Thanks,

Alexander Griesser
Head of Systems Operations
ANEXIA Internetdienstleistungs GmbH
E-Mail: AGriesser(a)anexia-it.com
Web: http://www.anexia-it.com
Headquarters address, Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing Director: Alexander Windbichler
Commercial register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601


ONTAP9 SystemManager issues with quotas
by Sami Kapanen 11 Jan '17


Hi,

We are having issues with ONTAP9 SystemManager: the quota report doesn't show everything, and target filtering is not working. It also doesn't allow sorting the rows by the username or space-used columns.

Has anyone else seen this? We have a case open and escalated.

sk


Atomicity of rename on NFS
by Edward Rolison 03 Jan '17


Hello fellow NetApp Admins.

I have a bit of an odd one that I'm trying to troubleshoot, and whilst I'm not sure it's specifically filer related, it's NFS related (and is happening on a filer mount).

What happens is this: there's a process that updates a file and relies on 'rename()' being atomic. A journal is updated, then a reference pointer (file) is newly created and renamed over the old one. The expectation is that this file will always be there, because "rename()" is defined as an atomic operation.

But that's not quite what I'm getting. I have one NFS client doing its (atomic) rename, and another client (a different NFS host) reading the file and, occasionally, reporting 'no such file or directory'.

This causes an operation to fail, which in turn means that someone has to intervene in the process. This operation (and multiple extremely similar ones) happens at 5m intervals, and every few days (once a week maybe?) it fails for this reason, and our developers think that should be impossible. As such, it looks like a pretty narrow race condition.

So what I'm trying to figure out is, first off:

- Could this be a NetApp bug? We've moved from 7 mode to CDOT, and it didn't happen before. On the flip side though, I have no guarantee that it 'never happened before', because we weren't catching a race condition. (Moving to new tin and improving performance does increase race condition likelihood, after all.)

- Could this be a kernel bug? We're all on kernel 2.6.32-504.12.2.el6.x86_64, and whilst we're deploying Centos 7, the hosts involved aren't on it yet. (But that's potentially also just coincidence, as there are quite a few hosts and they're all on the same kernel version.)

- Is it actually impossible for a file A renamed over file B to generate ENOENT on a different client? Specifically, in RFC3530 we have: "The RENAME operation must be atomic to the client." So the client doing the rename sees an atomic operation, but the expectation is that a separate client will also perceive an 'atomic' change: once the cache is refreshed, the 'new' directory has the new files, and at no point was there 'no such file or directory', because it was either the old one or the newly renamed one. Is this actually a valid thing to think?

This is a bit of a complicated one, and has me clutching at straws a bit. I can't reliably reproduce it: a basic fast-spinning loop script on multiple clients doing read-write-rename didn't hit it. I've got pcaps running hoping to catch it 'in flight', but haven't yet managed to catch it happening.

Any suggestions would be gratefully received.
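For readers unfamiliar with the pattern this message describes, here is a minimal Python sketch of it under stated assumptions. The writer builds the new pointer file under a temporary name in the same directory and renames it over the old one; the reader retries briefly on ENOENT or ESTALE as a defensive workaround. The function names, retry count, and delay are hypothetical and not taken from the poster's application, and the retry neither explains nor fixes the underlying race; it only tolerates it.

```python
import errno
import os
import tempfile
import time


def publish_pointer(path: str, data: bytes) -> None:
    """Write data to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush file contents before the rename
        os.replace(tmp_path, path)  # rename over the existing name
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_pointer(path: str, attempts: int = 5, delay: float = 0.05) -> bytes:
    """Read path, retrying a few times if the name is briefly missing or stale."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError as exc:
            retryable = exc.errno in (errno.ENOENT, errno.ESTALE)
            if not retryable or attempt == attempts - 1:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

Creating the temporary file in the destination directory keeps the rename within one filesystem, which is a precondition for rename() to replace the target in a single step. Whether a second NFS client can still observe a gap depends on client-side caching and is exactly the open question in this thread.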


SOLVED: snapmirror source and target aggregate usage don't match?
by Rue, Randy 23 Dec '16


Hello All,

This provided an opportunity to find and implement some improvements in our dedupe/snapmirror configuration, but that wasn't the real problem. We disabled dedupe on all but the two SVMs that were showing space savings of >10% or a couple of hundred GB, and changed the dedupe schedule to run hourly, with ample time to finish before the hourly snapmirror schedule.

The real problem turned out to be that while we configure all our data volumes on the source cluster as thin provisioned, when DOT creates the target volume on the secondary, it is "thick." And as the Storage Efficiency settings aren't exposed anywhere in the GUI on the secondary cluster for those volumes, it took some digging and five people staring at the CLI output before we figured this out.

We changed all the target volumes to thin and will check/change new ones as they're created.

Life is good.

Randy
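This thread raises the question of making replication wait until dedupe has finished rather than relying on schedule gaps alone. A minimal Python sketch of that ordering follows. It deliberately uses no real ONTAP API: `is_dedupe_idle` and `start_replication` are hypothetical callables you would implement with whatever tooling you use (PowerShell, the NetApp SDK, or SSH), and the timeout and poll interval are assumptions.

```python
import time
from typing import Callable


def run_after_dedupe(
    is_dedupe_idle: Callable[[], bool],      # hypothetical: True once dedupe has finished
    start_replication: Callable[[], None],   # hypothetical: triggers the mirror update
    timeout_s: float = 1800.0,               # assumption: give up after 30 minutes
    poll_s: float = 30.0,                    # assumption: check every 30 seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until dedupe is idle, then start replication.

    Returns True if replication was started, False if the timeout expired first.
    """
    deadline = clock() + timeout_s
    while not is_dedupe_idle():
        if clock() >= deadline:
            return False  # caller decides: skip this cycle or replicate anyway
        sleep(poll_s)
    start_replication()
    return True
```

What to do on timeout is a policy decision the sketch leaves to the caller: skipping the cycle delays the mirror, while replicating anyway may transfer data that has not been deduplicated yet.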


Flash Cache vs Flash Pool
by Ehrenwald, Ian 21 Dec '16


Hello Toasters,

I'm thinking about if/how to implement Flash Pool in our new cDOT clusters, and was wondering if anyone could provide a bit of real-world guidance. We have two SAS aggregates: aggr_sas_600g_c1n1 with 8x24 600g and aggr_sas_1200g_c1n1 with 4x24 1.2t.

I've been doing a bunch of reading about Flash Pool vs Flash Cache and am trying to better understand their strengths and weaknesses. Flash Pool accelerates writes as well as reads (Flash Cache is reads only); however, with Flash Pool there seems to be the potential for slower cache access/throughput, since the data needs to travel the SAS path, whereas Flash Cache is probably DMA through PCIe. Maybe that's not a concern at all, I don't know. Additionally, it appears that using Flash Pool disables the Flash Cache functionality for the aggregates which are in hybrid mode (makes sense), but then we have expensive add-in cards doing nothing.

Our theoretical Flash Pool would be 2x24 200g, giving us about 5.5t of usable caching space to sprinkle into these aggregates. I've been running AWA on the cluster against those two SAS aggregates for ~24 hours and have come up with these stats:

### FP AWA Stats ###

    Host mrk_c1n1    Memory 61054 MB
    ONTAP Version NetApp Release 8.3.2P5: Tue Aug 23 01:27:00 PDT 2016

    Basic Information

        Aggregate              aggr_sas_600g_c1n1
        Current-time           Tue Nov 22 17:06:53 EST 2016
        Start-time             Mon Nov 21 12:48:17 EST 2016
        Total runtime (sec)    101918
        Interval length (sec)  600
        Total intervals        157
        In-core Intervals      1024

    Summary of the past 157 intervals
                                          max
                                          ------------
        Read Throughput (MB/s):           339.039
        Write Throughput (MB/s):          123.536
        Cacheable Read (%):               56
        Cacheable Write (%):              66
        Max Projected Cache Size (GiB):   787.463

    Summary Cache Hit Rate vs. Cache Size

        Referenced Cache Size (GiB): 714.650
        Referenced Interval: ID 132 starting at Tue Nov 22 12:33:05 EST 2016

        Size            20%   40%   60%   80%   100%
        Read Hit (%)     25    30    30    30     33
        Write Hit (%)     1     2     2     2      2

    The entire results and output of Automated Workload Analyzer (AWA) are
    estimates. The format, syntax, CLI, results and output of AWA may change in
    future Data ONTAP releases. AWA reports the projected cache size in capacity.
    It does not make recommendations regarding the number of data SSDs required.
    Please follow the guidelines for configuring and deploying Flash Pool; that
    are provided in tools and collateral documents. These include verifying the
    platform cache size maximums and minimum number and maximum number of data
    SSDs.

    Basic Information

        Aggregate              aggr_sas_1200g_c1n1
        Current-time           Tue Nov 22 17:06:53 EST 2016
        Start-time             Mon Nov 21 12:40:57 EST 2016
        Total runtime (sec)    102357
        Interval length (sec)  600
        Total intervals        158
        In-core Intervals      1024

    Summary of the past 158 intervals
                                          max
                                          ------------
        Read Throughput (MB/s):           914.247
        Write Throughput (MB/s):          257.318
        Cacheable Read (%):               41
        Cacheable Write (%):              26
        Max Projected Cache Size (GiB):   2412.178

    Summary Cache Hit Rate vs. Cache Size

        Referenced Cache Size (GiB): 2142.380
        Referenced Interval: ID 113 starting at Tue Nov 22 09:04:41 EST 2016

        Size            20%   40%   60%   80%   100%
        Read Hit (%)     34    38    38    38     41
        Write Hit (%)     7     7     7     7      9

### FP AWA Stats End ###

Aggregate aggr_sas_600g_c1n1 has a lot of random overwrites (66%) that could have been cached. The volumes in that aggregate are pretty much exclusively Oracle databases. The other aggregate, aggr_sas_1200g_c1n1, doesn't seem to be hit as hard.

Given those statistics, what would you do if your options were ~5.5t of Flash Pool vs buying another 2t Flash Cache card per node in this HA pair?

I seem to be missing the 'Projected Read Offload' and 'Projected Write Offload' statistics, which would have been very useful, mentioned in the Flash Pool documentation at https://library.netapp.com/ecmdocs/ECMP1368404/html/GUID-2C3EC0DF-FEFE-4871-A161-4A3BAC87DB69.html

Thanks for any insight you all can provide.

--
Ian Ehrenwald
Senior Infrastructure Engineer
Hachette Book Group, Inc.
1.617.263.1948 / ian.ehrenwald(a)hbgusa.com
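As a rough sanity check on the sizing question, the AWA figures quoted in this message can be compared with the planned Flash Pool capacity. The Python sketch below only does that arithmetic. It assumes that "about 5.5t" means 5.5 TiB (1 TiB = 1024 GiB), which the post does not state, and it ignores platform cache maximums and the fact that AWA labels its own numbers as estimates. It does not settle the Flash Pool vs Flash Cache choice.

```python
GIB_PER_TIB = 1024

# Figures copied from the AWA output quoted in the post.
awa_stats = {
    "aggr_sas_600g_c1n1": {
        "max_projected_cache_gib": 787.463,
        "cacheable_read_pct": 56,
        "cacheable_write_pct": 66,
    },
    "aggr_sas_1200g_c1n1": {
        "max_projected_cache_gib": 2412.178,
        "cacheable_read_pct": 41,
        "cacheable_write_pct": 26,
    },
}

# Assumption: "about 5.5t of usable caching space" is read as 5.5 TiB.
planned_flash_pool_gib = 5.5 * GIB_PER_TIB


def total_projected_cache_gib(stats: dict) -> float:
    """Sum the maximum projected cache size over all aggregates."""
    return sum(a["max_projected_cache_gib"] for a in stats.values())


total_gib = total_projected_cache_gib(awa_stats)
headroom_gib = planned_flash_pool_gib - total_gib

print(f"Projected cache across both aggregates: {total_gib:.3f} GiB")
print(f"Planned Flash Pool capacity (assumed):  {planned_flash_pool_gib:.3f} GiB")
print(f"Headroom:                               {headroom_gib:.3f} GiB")
```

Under that assumption the planned capacity exceeds the combined maximum projected cache size, so capacity alone would not be the deciding factor; the low projected write hit rates in the AWA tables are a separate consideration.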


Re: snapmirror source and target aggregate usage don't match?
by jordan slingerland 18 Dec '16


Maybe also review the dedup savings you are achieving, and consider disabling it on volumes getting savings of 10% or so or less, since the metadata required for dedup is something like 7%.

On Dec 17, 2016 9:23 PM, "tmac" <tmacmd(a)gmail.com> wrote:

> Trial by fire?
>
> You will likely need to get a rough baseline of how long a single dedupe run takes on a source volume. Then look at doing multiple volumes and see how much the process slows down.
>
> With spinning media, the dedupe process can take a while on larger volumes. You may need to get a little creative and group some of the volumes together and create schedules for the deduping to happen at specific times. I really doubt you are going to be able to dedupe 30-40 volumes totalling ~100TB every hour. You should probably stagger the runs throughout the day with different schedules.
>
> You can create custom vol eff. policies that only run for a certain duration ("-" is the default for no duration, or up to 999 hours, in whole hours), and the qos-policy (for vol eff operations!) can be set to background (run, but do not impede) or best-effort (may slightly impact operations).
>
> As far as the timing, you may want to look at scripting with PowerShell or the NetApp SDK: wait for a vol eff operation to finish, then do snapshots and/or mirroring.
>
> We could spend lots of time on this. I hope this helps at least a little.
>
> --tmac
> Tim McCarthy, Principal Consultant


RE: snapmirror source and target aggregate usage don't match?
by Rue, Randy 18 Dec '16


Thank you for a NA docs pointer from a list member and the opportunity for a little RTFM before I reply to the list!

Looks like my colleague's recollection of earlier versions still applies to 8.3. Essentially, we've been snapmirroring un-deduped data.

My next question is, is it realistic to run dedupes hourly on 30-40 volumes totalling ~100TB? Because that's a much easier proposition than amending our SLAs to lengthen our snapshot cycles.

And to get both on the same cycle, is it possible to make snapshots dependent on dedupe finishing, or do we just assume dedupe will complete, and if so, if it doesn't, what are the consequences? For example, if a dedupe that usually finishes at 5 minutes after the hour isn't done, and snapmirror runs then, will snapmirror then be syncing a full hour of full-sized changes?

Last question for now: assuming both are on the same schedule, how do I get the current lost space back? Will it be reclaimed when the schedules are synced and the snapshots have rolled off? Or do I need to destroy and recreate the target volumes?

Hope to hear from you, especially from any other shop running both snapmirror and dedupe.

Randy

(and if a solution requires DOT9 we do have an upgrade on our roadmap)

> Replying to just you for now. Hopefully this will help and you can report back.
>
> Look at this...
> https://library.netapp.com/ecm/ecm_download_file/ECMLP2348026
>
> Page 142.
>
> I think you may need to work out some scheduling and that may help. Going to look a bit more... I have another idea, but that may only be ONTAP 9 related.
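One suggestion made in this thread is to stagger dedupe runs across the day instead of running every volume hourly. The Python sketch below shows one simple way to draft such a plan by spreading volumes round-robin over a set of start hours. The volume names, the count of 36, and the two-hour spacing are all hypothetical; the sketch does not weigh volumes by size or measured run time, which the baseline measurements recommended in the thread would inform.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stagger(volumes: Sequence[str], start_hours: Sequence[int]) -> Dict[int, List[str]]:
    """Assign volumes round-robin to the given start hours (0-23)."""
    if not start_hours:
        raise ValueError("at least one start hour is required")
    plan: Dict[int, List[str]] = defaultdict(list)
    for index, volume in enumerate(volumes):
        plan[start_hours[index % len(start_hours)]].append(volume)
    return dict(plan)


# Hypothetical inventory: 36 volumes, one dedupe window every two hours.
volumes = [f"vol{n:02d}" for n in range(1, 37)]
start_hours = list(range(0, 24, 2))

plan = stagger(volumes, start_hours)
for hour in sorted(plan):
    print(f"{hour:02d}:00 -> {', '.join(plan[hour])}")
```

The resulting groups would still have to be turned into actual efficiency schedules on the cluster; that step is intentionally left out because it depends on the ONTAP version and tooling in use.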


snapmirror source and target aggregate usage don't match?
by Rue, Randy 17 Dec '16


Hello All,

We run two 8.3 filers with a list of vservers and their associated volumes, with each volume snapmirrored (volume level) from the active primary cluster to matching vserver/volumes on the passive secondary.

Both clusters have a similar set of aggregates of just about equal size. Both clusters' aggregates contain the same list of volumes of the same size, with the same space total/used/available on both sets.

But on the target cluster the same aggregates are reporting 30% more used space.

This is about on par with the dedupe savings we're getting on the primary, so when I first noticed this my thought was to check that dedupe was OK on the target. But if you look in the webUI, it reports that no "storage efficiency" is available on a replication target, and I ended up thinking this meant that the secondary data would have to be full-sized. I even recall asking someone and having this confirmed, but can't recall if that came from the vendor SE, our VAR SE, or a support tech.

Now we're approaching the space limit of the secondary cluster and I'm looking deeper. At this point, as it appears that for each volume the total/used/free space matches after dedupe on the source, I'm thinking that dedupe properties aren't exposed on the target but the data is still a true copy of the deduped original. This is supported by being able to view dedupe stats on the target via the CLI that show the same savings as on the source.

Note that we're also snapshotting these volumes, and while we're deduping daily, we're snapshotting hourly. A colleague mentioned remembering that this could mean mirrored data that's not deduped yet is being replicated full-size. But if so, wouldn't this be reflected in the dedupe stats on the target?

OK, just found that "storage aggregate show -fields usedsize,physical-used" on the primary/source cluster shows that used and physical-used are about identical for all aggrs. On the secondary/target, used is consistently larger than physical-used, and the total difference makes up the 30% I'm "missing."

Is this a problem with my reporting? Are we actually OK and I need to look at physical-used instead of used? Or if we're not OK, where is the space being used and can I get it back?

Thanks in advance for your guidance...

Randy
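The comparison described in this message, used versus physical-used per aggregate, can be sketched in a few lines of Python. All aggregate names and numbers below are hypothetical placeholders, not values from the poster's clusters, and the sketch assumes the two fields have already been collected by some other means; it does not parse real CLI output. The gap is expressed relative to physical-used, which is one possible reading of the "30%" in the post.

```python
from typing import Dict, Tuple


def used_vs_physical(aggregates: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {aggregate: (gap, gap as a percentage of physical-used)}."""
    report = {}
    for name, fields in aggregates.items():
        gap = fields["usedsize"] - fields["physical_used"]
        pct = 100.0 * gap / fields["physical_used"]
        report[name] = (gap, pct)
    return report


# Hypothetical values in TB, for illustration only.
target_cluster = {
    "aggr1": {"usedsize": 52.0, "physical_used": 40.0},
    "aggr2": {"usedsize": 39.0, "physical_used": 30.0},
}

for name, (gap, pct) in used_vs_physical(target_cluster).items():
    print(f"{name}: used exceeds physical-used by {gap:.1f} TB ({pct:.0f}%)")
```

A gap that is consistently large on one cluster and near zero on the other points at space that is reserved rather than written, which matches the thick-provisioned target volumes reported in the SOLVED message of this thread.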


AW: System attention LED
by Alexander Griesser 16 Dec '16


Hi,

one is running 9.1RC2 and the other one is running 8.1.2P4 (FAS2240).

Alexander Griesser

From: CPO
Sent: Thursday, 15 December 2016 14:23
To: Alexander Griesser <AGriesser(a)anexia-it.com>
Subject: Re: System attention LED

What version of Data ONTAP?


AW: System attention LED
by Alexander Griesser 15 Dec '16


Alright, thanks – it’s strange that one can only guess what the problem is with those LEDs :-/ Alexander Griesser Head of Systems Operations ANEXIA Internetdienstleistungs GmbH E-Mail: AGriesser(a)anexia-it.com<mailto:AGriesser@anexia-it.com> Web: http://www.anexia-it.com<http://www.anexia-it.com/> Anschrift Hauptsitz Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt Geschäftsführer: Alexander Windbichler Firmenbuch: FN 289918a | Gerichtsstand: Klagenfurt | UID-Nummer: AT U63216601 Von: Vervloesem Wouter [mailto:wouter.vervloesem@neoria.be] Gesendet: Donnerstag, 15. Dezember 2016 13:52 An: Alexander Griesser <AGriesser(a)anexia-it.com> Cc: jordan slingerland <jordan.slingerland(a)gmail.com>; toasters(a)teaparty.net Betreff: Re: System attention LED I had a similar problem in the past. It turned out that one of the PSU's had lost its power in the past, and the LED wasn't cleared correctly. The solution (for me) was to reseat the PSU in the system, few seconds after that everything was fine. Met vriendelijke groeten, Wouter Vervloesem<https://be.linkedin.com/pub/wouter-vervloesem/5/a63/a41> Storage Consultant Neoria NV<http://www.neoria.be/> Prins Boudewijnlaan 41 - 2650 Edegem T +32 3 451 23 82 | M +32 496 52 93 61 Op 15 dec. 2016, om 13:30 heeft Alexander Griesser <AGriesser(a)anexia-it.com<mailto:AGriesser@anexia-it.com>> het volgende geschreven: Hi, the LED is still on and I guess it should have been fully charged by now if this would have been the issue. So, any other ideas besides opening a case? Regarding the FAS2240 I mentioned in my initial e-mail: SP netapp1*> system fru led show 0 FRU LED ID 0 is off SP netapp1*> system fru led show 1 FRU LED ID 1 is on I did google a bit for this symptom and therefore updated the SP firmware to the latest compatible one for my ontap version (2.1.3P2), but that did not fix the issue. Any fix for this symptom except for halting both nodes? 

Best,
Alexander Griesser

From: toasters-bounces(a)teaparty.net [mailto:toasters-bounces@teaparty.net] On behalf of Alexander Griesser
Sent: Wednesday, 14 December 2016 23:29
To: jordan slingerland <jordan.slingerland(a)gmail.com>
Cc: toasters(a)teaparty.net
Subject: AW: AW: System attention LED

Maybe that has changed with ONTAP 9.1? I'm running 9.1RC2 here on this filer, and not because I'm brave, but because it got shipped with it ☺

Alexander Griesser

From: jordan slingerland [mailto:jordan.slingerland@gmail.com]
Sent: Wednesday, 14 December 2016 23:27
To: Alexander Griesser <AGriesser(a)anexia-it.com>
Cc: toasters(a)teaparty.net; Brad Thompson <brad.thompson877(a)gmail.com>
Subject: Re: AW: System attention LED

Hmm, I expected it to look like this, listing charge voltage, current and hours it can hold:
https://kb.netapp.com/support/s/article/how-to-check-the-nvram-battery-stat…

On Dec 14, 2016 5:16 PM, "Alexander Griesser" <AGriesser(a)anexia-it.com> wrote:

Looks good to me:

netapp01> environment status chassis list-sensors
Sensor Name     State        Current   Critical  Warning  Warning  Critical
                             Reading   Low       Low      High     High
-------------------------------------------------------------------------------------------------
PSU2 FRU        GOOD
PSU1 FRU        GOOD
SP Status       IPMI_HB_OK
mSATA Status    OK
mSATA Pres      PRESENT

Best,
Alexander Griesser

From: Brad Thompson [mailto:brad.thompson877@gmail.com]
Sent: Wednesday, 14 December 2016 23:09
To: Alexander Griesser <AGriesser(a)anexia-it.com>
Cc: toasters(a)teaparty.net
Subject: Re: System attention LED

The battery might be charging.
Check: environment status chassis list-sensors

On Wed, Dec 14, 2016 at 4:49 PM, Alexander Griesser <AGriesser(a)anexia-it.com> wrote:

Hi Brad,

netapp01> storage show fault
No faults found in storage subsystem
netapp02> storage show fault
No faults found in storage subsystem

No luck with `system fru led show` on the SP of the 2650:

SP netapp01*> system fru led show
system fru show - display fru id inventory data
system fru list - list fru id
system fru log show - display fru log
system fru log clear - clear fru log

But this one seems to work (I've highlighted the ones where the LED seems to be in the "on" state):

SP netapp01*> system led show
Invalid command usage
Usage: system led show <FRU-LED-ID>
Display status of the specified <FRU-LED-ID>.
<FRU-LED-ID>:
1 - SP Controller Fault LED
2 - SP Chassis Locate LED
3 - SP System Fault LED
4 - SP SAS Port A Fault LED
5 - SP SAS Port B Fault LED
6 - SP CNA 0 Port 1 Fault LED
7 - SP CNA 0 Port 2 Fault LED
8 - SP CNA 1 Port 1 Fault LED
9 - SP CNA 1 Port 2 Fault LED
10 - SP SFP Port 0 Fault LED
11 - SP SFP Port 1 Fault LED
12 - SP CPU DIMM 1 Fault LED
13 - SP CPU DIMM 2 Fault LED
14 - SP NVME 1 Fault LED
15 - SP MSATA Port Fault LED
16 - SP Battery Fault LED
17 - SP RTC Battery Fault LED (Coin Cell Battery)
all others are not valid
SP netapp01*>
SP netapp01*>
SP netapp01*> system led show 1
FRU LED ID 1 is on
SP netapp01*> system led show 2
FRU LED ID 2 is off
SP netapp01*> system led show 3
FRU LED ID 3 is on. Set by SP
SP netapp01*> system led show 4
FRU LED ID 4 is off
SP netapp01*> system led show 5
FRU LED ID 5 is off
SP netapp01*> system led show 6
FRU LED ID 6 is off
SP netapp01*> system led show 7
FRU LED ID 7 is off
SP netapp01*> system led show 8
FRU LED ID 8 is off
SP netapp01*> system led show 9
FRU LED ID 9 is off
SP netapp01*> system led show 10
FRU LED ID 10 is off
SP netapp01*> system led show 11
FRU LED ID 11 is off
SP netapp01*> system led show 12
FRU LED ID 12 is off
SP netapp01*> system led show 13
FRU LED ID 13 is off
SP netapp01*> system led show 14
FRU LED ID 14 is off
SP netapp01*> system led show 15
FRU LED ID 15 is off
SP netapp01*> system led show 16
FRU LED ID 16 is on
SP netapp01*> system led show 17
FRU LED ID 17 is off

On the second controller, all LEDs are off except for ID 3, which has been set by the partner SP and is therefore related. So thanks a bunch, I do at least finally know where this LED comes from ☺

Since this filer was unboxed and installed today, I'm not aware of any previous faults. Could it be that the battery is almost empty and needs to fully charge? Any way to check the battery charging state?

Best,
Alexander Griesser

From: Brad Thompson [mailto:brad.thompson877@gmail.com]
Sent: Wednesday, 14 December 2016 22:39
To: Alexander Griesser <AGriesser(a)anexia-it.com>
Cc: toasters(a)teaparty.net
Subject: Re: System attention LED

Is clustering enabled (cf status)? The fault LED will turn on if it's not enabled. What does "storage show fault" say?

Other than that, the fault light could be on due to a previous fault in the system that has since been cleared. Are you aware of any previous faults?

If you log in to the Service Processor you can run "priv set diag", followed by "system fru led show". That command will list the available FRUs whose status you can check. For example:

SP nodeA*> system fru led show
Invalid command usage
Usage: system fru led show <FRU-LED-ID>
Display status of the specified <FRU-LED-ID>.
<FRU-LED-ID>:
0 = SP Controller Fault LED
1 = SP Heartbeat LED
2 = SP System Fault LED
3 = SP Fan 1 Fault LED
4 = SP Fan 2 Fault LED
5 = SP Fan 3 Fault LED
6 = SP IOXM Fault LED
all others are not valid
SP nodeA*> system fru led show 2
FRU LED ID 2 is off

You can check the status of each FRU ID to see which one is on. From there, double-check that there aren't any actual faults with that component. Still no faults? You may just be able to turn off the LED using the "system fru led set" command.

On Wed, Dec 14, 2016 at 4:16 PM, Alexander Griesser <AGriesser(a)anexia-it.com> wrote:

Hi there,

I have two filers (1x FAS2240 and 1x FAS2650) and both have the system attention LED lit and I have no idea why. `System health status show` as well as Config Advisor are both A-OK and there are no visible indicators why this LED is still lit. Any idea how to find out the root cause of this LED on the front bezel, or how I can check or maybe reset the status of the LED through the CLI?

Thanks,
Alexander Griesser

_______________________________________________
Toasters mailing list
Toasters(a)teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
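
The per-LED check described in this thread (query each FRU LED ID on the SP and note which ones report "on") can be tedious to read by eye. Below is a minimal sketch, not a NetApp tool: it only parses a transcript that has already been pasted from an SP session and assumes the output lines look like the ones quoted above ("FRU LED ID 3 is on. Set by SP"). The function name `lit_leds` and the sample transcript are illustrative assumptions.

```python
import re

# Assumed line format, taken from the SP output quoted in this thread:
#   "FRU LED ID 3 is on. Set by SP"  or  "FRU LED ID 2 is off"
LED_LINE = re.compile(r"FRU LED ID (\d+) is (on|off)\b\.?\s*(.*)")


def lit_leds(transcript: str) -> dict:
    """Return {led_id: trailing note} for every LED reported as on."""
    lit = {}
    for line in transcript.splitlines():
        match = LED_LINE.search(line)
        if match and match.group(2) == "on":
            # group(3) holds any trailing note such as "Set by SP"
            lit[int(match.group(1))] = match.group(3).strip()
    return lit


# Hypothetical excerpt of a pasted SP session, shortened from the thread.
sample = """\
SP netapp01*> system led show 1
FRU LED ID 1 is on
SP netapp01*> system led show 2
FRU LED ID 2 is off
SP netapp01*> system led show 3
FRU LED ID 3 is on. Set by SP
SP netapp01*> system led show 16
FRU LED ID 16 is on
"""

print(lit_leds(sample))
```

The resulting IDs can then be matched by hand against the usage listing that the SP prints for the platform in question, since the ID-to-LED mapping differs between the two listings shown in this thread.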

2 1
