Did you actually see an “rlw_update” process in the “ps” output? Could you show an example of “ps” output where this process appears? This is the first time I have heard of an “rlw_update” *process*. There is an “rlw_upgrading” aggregate flag …

I have a filer here where the RLW update is still running, and I do not see this process. RLW upgrading is performed as part of normal aggregate scrubbing. Maybe you are mistaking the extra load caused by aggregate scrubbing for load caused by RLW upgrading?

The thread you refer to mixes together at least half a dozen different performance-related problems, and in the end none of them turned out to be related to RLW. The thread is actually a pretty poor source of information, because it is no longer possible to tell which problem is being discussed.

Also, on forums.netapp.com a NetApp employee gave a pretty good explanation of what the RLW upgrade is.

-andrey

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Randy Rue
Sent: Tuesday, December 04, 2012 2:37 AM
To: toasters@teaparty.net
Subject: 8.1.1 RLW Performance Trouble, 8.1.2 issues?

Hello All,

After a few months of planning and preparation, we were just about to pull the trigger on an 8.1.1 upgrade to our v3170, currently running 7.3.5.1P5 with a 3Par T800 spinning 528 SATA spindles. We've crawled through release notes and upgrade notes, used the online "upgrade advisor" tools, checked this list and engaged our NetApp SE. We tested against a simulator, killed a chicken at midnight of the full moon, etc.

Then last week one of our admins found this link and asked in passing if this issue was relevant to our situation:

Data Ontap 8.1 upgrade - RLW_Upgrading process and other issues

https://communities.netapp.com/thread/22676

It looks like upgrading to 8.1.x implements a new layer of protection called RLW (RAID protection from Lost Writes). This requires adding some metadata to the disk system, and after the upgrade a background process, "rlw_update", runs for some period of time. The trouble is that this process does not "nice" itself as it's meant to when other processes are running. Worse, if it runs at the same time as other "not so nice" processes such as de-dupe, there can be disastrous performance problems. The problem is exacerbated if disk utilization is high, if slower disks are used, or if a lot of misaligned traffic is running. Users in the wild have reported the rlw_update process taking several weeks to finish, with severe performance problems for as long as it runs.

Yikes, I'm just glad we found out about this now and not Monday morning. Our filer consistently runs at ~90% disk utilization, is usually running two or three de-dupe processes, is running on SATA disks, and has one node that does nothing but serve NFS datastores to our VMware farm, which we suspect is running mostly misaligned VMDK files.

So now we've postponed the upgrade. We'll need to retest against 8.1.2 (or later). We're reluctant to upgrade to 8.1.2 until it's been out a while, especially as NetApp seems to be on a roll lately, releasing new versions to address performance issues that then seem to have performance issues of their own. We're also concerned about upgrading to any new version while the system is heavily loaded (we're also in the process of deploying a new block-level storage system that will offload more than half of the current performance load from the v3170).

So. This post is intended first as a heads-up to anyone in a similar situation who's about to upgrade to 8.1.1. Another takeaway might be the value of the official NetApp Communities pages: I've usually relied on this toasters list and/or our NetApp/VAR technical staff, but will also be following the "official" user community resources from now on.

And I also welcome any feedback from anyone with experience or information to offer. Anyone been in this situation? Anyone running 8.1.2 yet? Anyone have advice on upgrading a significantly busy system?

Hope to hear from you,

Randy Rue

Seattle