Here’s how I usually explain this –

 

NVRAM is a journal. Under normal circumstances, it is written but never read. When writes to disk occur, they will read from main RAM, not the NVRAM.  After that’s done, the NVRAM is simply overwritten. The only time it should be read is during an event such as a failure on that controller that requires the partner to take over. The partner replays the uncommitted transactions from that NVRAM onto the disks. In this respect, it’s just like the journal on any journaled filesystems and it’s much like the redo logging functions of databases. Most storage arrays work in a similar way.

 

The reason it’s so small compared to most arrays is because of WAFL. We don’t need that much NVRAM because when writes happen, ONTAP writes out single complete RAID stripes and calculates parity in memory. If there was a need to do lots of reads to regenerate parity, then we’d have to increase the NVRAM more to smooth out performance.

 

If a system is having write problems, it’s not because of NVRAM sizing or speed, it’s because the back-end disks’ capability to receive writes is being overwhelmed. Technically speaking, increasing NVRAM would probably delay the point at which performance problems begin, but it would be relief measured in seconds. If there are problems, it’s because the disks are overwhelmed. It’s a pretty easy thing to monitor because of the use of aggregates. I used to work for Oracle, and we’d just measure the aggregate utilization as a whole. If the aggregate is peaking at 56% utilization or something like that, then there’s plenty of headroom left. If utilization starts touching 90% then it’s probably time to start the purchase order for a new shelf (or checking the workload to see if something is incorrectly loading down the filer.

 

The sizing of the NVRAM is related mostly to CPU capability. When the writes are committed and the journal on NVRAM is invalidated and then overwritten, there is IO to be processed. If there was 500GB of NVRAM on a FAS2240, that would exceed the ability of the platform to process IO in a timely manner. You would see performance problems because the platform wouldn’t be able to write down anything close to 500GB of inbound writes during the disk writing phase. In that extreme of an example, the filer would probably cease responding for 60 seconds every so often.

 

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
Sent: Friday, April 12, 2013 6:38 AM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

 

Correct:   A copy is placed into NVRAM in parallel to main RAM, RAM allocation fills, CP happens, contents of both are thrown away.

Other magic happens in an HA pair..but the same process is followed.

 

On Thu, Apr 11, 2013 at 9:32 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:

Classification: UNCLASSIFIED
Caveats: NONE

While I think it's rational to state that increasing NVRAM won't result in faster writes to disk, I don't think that statement addresses the premise that "NVRAM has no place in the IO operation of a filer".  Here's a quote from NetApp KB article 3011527:

***
The NetApp Storage Controller has two buffers for accepting and caching write data. The Storage Controller can also only process one Consistency Point at a time due to this buffered writing scenario. Under certain circumstances, while one Consistency Point is being processed and written to disk, the second memory buffer can become full prior to the previous Consistency Point being completed. Since the Consistency Point process is global ( meaning ALL writes flow through this mechanism ), a Storage Controller in this situation must stop acknowledging ALL incoming write data requests until the first Consistency Point is completed and the corresponding Non-Volatile RAM (NVRAM) and local memory buffers are cleared.
***

--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds.  No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]

Sent: Thursday, April 11, 2013 4:01 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

...that is to say..if your controller was designed to work with 1.



On Thu, Apr 11, 2013 at 9:00 PM, Jeff Mohler <speedtoys.racing@gmail.com> wrote:


        NVRAM isnt the producer of writes, its not the arbiter of what writes it can do..its just matched to what will be kept in RAM an brokered within an HA pair.


        Plus, some really smart people made the determination.   Trust me, 48G of NVRAM wont write to disk any faster than 1.


        That's it..that simple.



        On Thu, Apr 11, 2013 at 8:45 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


                Classification: UNCLASSIFIED
                Caveats: NONE

                If NVRAM has no place in a NetApp storage system's I/O operations, why would the size of NVRAM correlate with the I/O capabilities of a NetApp storage system?

                --
                Jim Blackburn
                US Army
                james.m.blackburn33.civ@mail.mil
                Note: I am not authorized to obligate, commit, or execute United States government funds.  No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


                -----Original Message-----
                From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]

                Sent: Thursday, April 11, 2013 3:12 PM
                To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
                Cc: Toasters@teaparty.net
                Subject: Re: NVRAM weirdness (UNCLASSIFIED)

                Sure.   NVRAM only holds a copy of data to be written..if all goes well, it's discarded.




                On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


                        Classification: UNCLASSIFIED
                        Caveats: NONE

                        Would "capabilities" happen to be a reference to "I/O capabilities?"

                        --
                        Jim Blackburn
                        US Army
                        james.m.blackburn33.civ@mail.mil
                        Note: I am not authorized to obligate, commit, or execute United States government funds.  No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


                        -----Original Message-----

                        From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
                        Sent: Thursday, April 11, 2013 2:56 PM
                        To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
                        Cc: Toasters@teaparty.net

                        Subject: Re: NVRAM weirdness (UNCLASSIFIED)

                        It is sized appropriately to the capabilities it [controller] was designed for.

                        Hence..adding more, would not provide an improvement for any given system.





                        On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


                                Classification: UNCLASSIFIED
                                Caveats: NONE

                                If "NVRAM has no place in the IO operation of a filer", why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
                                --
                                Jim Blackburn
                                US Army
                                james.m.blackburn33.civ@mail.mil
                                Note: I am not authorized to obligate, commit, or execute United States government funds.  No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


                                -----Original Message-----
                                From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
                                Sent: Thursday, April 11, 2013 2:10 PM
                                To: Scott Eno
                                Cc: Toasters@teaparty.net
                                Subject: Re: NVRAM weirdness

                                For the OP and others..NVRAM has no place in the IO operation of a filer.

                                It holds copies of data that are in main RAM, just in -case- there is a problem with site power/etc.


                                There has been a continued misunderstanding of how Netapp NVRAM works over time..when in fact, it does no work at all.

                                :)


                                In sysstat..a view of "disks" being 100% busy is more a view of "storage" as a whole.  Not 'a' disk.   If storage is stuck..then CP time will also be likewise stuck at 100%...


                                Incorrect interpretation of sysstat and how Ontap works...can lead to a lot of confusion, and continued incorrect phraseology.






                                On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:


                                        Looks like you're hanging on a consistency point operation, but the sysstat output starts after that last CP was logged.  This looks very similar to the ":s" CP bug, but can't say for sure.  You have a long running CP operation, you NVRAM has filled up waiting for the operation to complete, but with no "s" after the "#", then it's not processing special files.  It's doing something else.

                                        Could be one of these bugs:

                                        651573 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
                                        651597 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
                                        651926 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
                                        651929 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
                                        655724 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
                                        657018 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
                                        650554 Inefficient pre-fetching of metadata blocks delays WAFL consistency point



                                        On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com> wrote:


                                                Hi all,


                                                Has anyone seen this pattern in a sysstat before?

                                                TOASTERa> sysstat -u 1
                                                 CPU   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk
                                                       ops/s      in    out    read  write    read  write    age    hit  time  ty  util
                                                 29%   36942       1      0     264      0       0      0     7s    91%  100%  #   100%
                                                 29%   36896       1      0    1616      0       0      0     7s    90%  100%  #   100%
                                                 28%   36658       1      0      32     24       0      0     7s   100%  100%  #   100%
                                                 28%   36625       1      0     108      0       0      0     7s    97%  100%  #   100%
                                                 27%   36318       1      0     136      8       0      0     7s   100%  100%  #   100%
                                                 28%   36391       1      0     364    144       0      0     7s    97%  100%  #   100%
                                                 25%   30387       1      0      88      0       0      0     7s    98%  100%  #   100%
                                                 21%   24450       1      0       8      0       0      0     7s   100%  100%  #   100%
                                                 20%   22042       1      0      24    672       0      0     7s   100%  100%  #   100%
                                                 18%   19781       1      0     284      8       0      0     7s    69%  100%  #   100%
                                                 18%   19636       1      0       0      0       0      0     7s   100%  100%  #   100%
                                                 16%   18066       1      0      24     24       0      0     7s   100%  100%  #   100%
                                                 15%   16999       1      0      56      0       0      0     7s    56%  100%  #   100%
                                                 15%   16622       1      0   11040   6496       0      0     7s    46%  100%  #   1824%
                                                 14%   15947       1      0      24     32       0      0     7s   100%  100%  #   100%
                                                 14%   15448       0      0       0      0       0      0     7s   100%  100%  #   100%
                                                 14%   15902       0      0    3304   4532       0      0     7s    88%  100%  #   100%
                                                 14%   15800       0      0      32     24       0      0     7s   100%  100%  #   100%
                                                 15%   16651       0      0       0      0       0      0     7s   100%  100%  #   100%
                                                 CPU   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk
                                                       ops/s      in    out    read  write    read  write    age    hit  time  ty  util
                                                 14%   15778       0      0    1520   2932       0      0     7s   100%  100%  #   100%
                                                 13%   14311       0      1      28     24       0      0     7s   100%  100%  #   100%
                                                 12%   12153       0      0       0      0       0      0     7s   100%  100%  #   100%
                                                 12%   11993       0      0       8      0       0      0     7s   100%  100%  #   100%
                                                 12%   12134       0      0    7368   5996       0      0     7s    18%  100%  #   100%

                                                 13%   11981       0      0   20356 14956 <tel:20356%20%C2%A014956>  <tel:20356%C2%A0%2014956>        0      0     7s   100%  100%  #   100%

                                                 12%   11978       0      0       0      0       0      0     7s   100%  100%  #   100%
                                                 11%   11617       0      0      24     24       0      0     7s   100%  100%  #   100%
                                                 12%   11708       0      0       8      0       0      0     7s   100%  100%  #   100%
                                                 11%   11630       0      0   11936  17872       0      0     7s   100%  100%  #   100%
                                                 11%   11639       0      0      24     32       0      0     7s   100%  100%  #   100%
                                                 12%   11791       0      0      12      0       0      0     7s    87%  100%  #   100%
                                                 22%   11874       0      0   64452  92188       0      0     7s   100%  100%  #   100%
                                                 12%   12015       0      0      32     24       0      0     7s   100%  100%  #   100%
                                                 12%   11954       0      0     556      0       0      0     7s    85%  100%  #   100%
                                                 12%   11899       0      0       0      8       0      0     7s   100%  100%  #   100%
                                                 13%   13408       0      0   23816  35600       0      0     7s    91%  100%  #   100%
                                                 13%   14610       0      0       0      0       0      0     7s   100%  100%  #   100%
                                                 14%   15355       0      0    1248      0       0      0     7s    72%  100%  #   100%
                                                 14%   15584       0      0   18652  15224       0      0     7s    98%  100%  #   100%
                                                 CPU   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk
                                                       ops/s      in    out    read  write    read  write    age    hit  time  ty  util
                                                 14%   15397       0      0       0      8       0      0     7s   100%  100%  #   100%
                                                 15%   15859       0      1       0      0       0      0     7s   100%  100%  #   100%
                                                 14%   15557       0      0    4944   4568       0      0     7s   100%  100%  #   100%
                                                 14%   15559       0      0       8      0       0      0     7s   100%  100%  #   100%
                                                 14%   15603       0      0       0      0       0      0     7s   100%  100%  #   100%
                                                 14%   15507       0      0    3652   4724       0      0     7s   100%  100%  #   100%
                                                 14%   15593       0      0      20      0       0      0     7s    75%  100%  #   100%
                                                 12%   12571       0      0     268      0       0      0     7s   100%  100%  #   100%
                                                 10%    9962       0      0      92   2072       0      0     7s    96%  100%  #   100%
                                                  9%    8521       0      0       0      0       0      0     7s   100%  100%  #   100%
                                                  8%    7680       0      0       4      8       0      0     7s   100%  100%  #   100%
                                                  8%    7459       0      0      24     24       0      0     7s   100%  100%  #   100%
                                                  8%    7050       0      0    2580   1600       0      0     7s   100%  100%  #   100%
                                                  8%    6961       0      0       8      0       0      0     7s   100%  100%  #   100%
                                                  8%    6954       0      0      24     24       0      0     7s   100%  100%  #   100%
                                                  7%    6936       0      0     584   2400       0      0     7s   100%  100%  #   100%
                                                  8%    7043       0      0     248      0       0      0     7s     0%  100%  #   100%
                                                  7%    6946       0      0      24     24       0      0     7s   100%  100%  #   100%
                                                  8%    7320       0      0    1860   7252       0      0     7s   100%  100%  #   100%
                                                 55%   19882       2      2   89920  29733       0      0     1     86%   99%  B    45%
                                                 CPU   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk
                                                       ops/s      in    out    read  write    read  write    age    hit  time  ty  util
                                                 88%   17409      31    187  267319 196942       0      0     6s    83%  100%  :    89%
                                                 87%   16064      11    249  304646 192912       0      0     1     78%  100%  :    85%
                                                 85%    6274      70     86  322234 233245       0      0     1     81%  100%  #    87%
                                                 85%   12486      18    340  236474 102383       0      0     3s    82%  100%  B    73%
                                                 98%   13842       3     29  247937 255188       0      0     2s    85%  100%  :    80%
                                                 99%   17878       1      1  348355 223367       0      0     3s    84%  100%  #    86%
                                                 99%    4117       1      1  359628 269212       0      0     3s    78%  100%  #    92%
                                                 83%   12622      10     31  252272 170064       0      0     2s    83%  100%  B    73%
                                                 94%   19321       4     10  248864 249292       0      0     2s    88%  100%  :    96%
                                                 99%   15893       0      0  318375 261482       0      0     2s    85%  100%  :    89%
                                                 88%   13326      34     76  284960 194032       0      0     2s    82%  100%  B    90%
                                                 86%   11106      73    104  267760 231524       0      0     2s    81%  100%  :   100%
                                                 72%   11160       0      0  206927 128869       0      0     3s    75%  100%  :   100%
                                                 63%   11304       0      0  220736  25276       0      0     3s    67%  100%  :   100%
                                                 68%   11216       0      0  193324  15184       0      0     3s    66%  100%  :   100%
                                                 65%   11343       0      0  175120  12588       0      0     3s    60%  100%  :   100%
                                                 52%    7384       0      0  205780 104564       0      0     3s    61%  100%  #   100%
                                                 83%    7205       0      0  261108 178256       0      0     3s    74%  100%  B    90%
                                                 75%   11256       0      0  225164  96332       0      0     3s    76%  100%  :    93%
                                                 99%    6279       0      0  296056 251191       0      0     2s    87%  100%  :    90%
                                                 CPU   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk
                                                       ops/s      in    out    read  write    read  write    age    hit  time  ty  util
                                                 65%    8616       0      0  220068 165749       0      0     2s    76%   73%  :    71%
                                                 89%    5443       0      1  338684 239932       0      0     3s    80%   94%  F    87%
                                                 95%    6726       0      0  379242 240750       0      0     3s    79%  100%  :    91%
                                                 83%    8721       1      0  342940 113394       0      0     2s    82%   61%  :    89%
                                                 77%   11971       1      0  314356     24       0      0     2s    85%    0%  -    92%


                                                ... and back to normal.

                                                Looking in Perf Advisor, it was during an extremely short, sharp latency spike that impacted every volume on every aggregate at exact the same time.  It was the fourth of four spikes that had happened a few minutes apart during the previous 20 mins.

                                                It looks like the NVRAM flush got stuck somehow but I have no idea what would cause that.

                                                We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.


                                                Thanks,
                                                Peta

                                                _______________________________________________
                                                Toasters mailing list
                                                Toasters@teaparty.net
                                                http://www.teaparty.net/mailman/listinfo/toasters




                                        _______________________________________________
                                        Toasters mailing list
                                        Toasters@teaparty.net
                                        http://www.teaparty.net/mailman/listinfo/toasters






                                --
                                ---
                                Gustatus Similis Pullus

                                Classification: UNCLASSIFIED
                                Caveats: NONE



                                _______________________________________________
                                Toasters mailing list
                                Toasters@teaparty.net
                                http://www.teaparty.net/mailman/listinfo/toasters





                        --
                        ---
                        Gustatus Similis Pullus

                        Classification: UNCLASSIFIED
                        Caveats: NONE



                        _______________________________________________
                        Toasters mailing list
                        Toasters@teaparty.net
                        http://www.teaparty.net/mailman/listinfo/toasters





                --
                ---
                Gustatus Similis Pullus

                Classification: UNCLASSIFIED
                Caveats: NONE



                _______________________________________________
                Toasters mailing list
                Toasters@teaparty.net
                http://www.teaparty.net/mailman/listinfo/toasters





        --
        ---
        Gustatus Similis Pullus




--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters




--
---
Gustatus Similis Pullus