Hi,
We're probably going to be adding another FCAL controller to our 840 soon. Can volumes include disks that are on both controllers?
----- Original Message ----- From: "Matt Phelps" mphelps@cfa.harvard.edu To: toasters@mathworks.com Sent: Tuesday, January 09, 2001 11:52 AM Subject: Can volumes span FCAL controllers?
Hi,
We're probably going to be adding another FCAL controller to our 840 soon. Can volumes include disks that are on both controllers?
Yes, but there's a slight performance penalty in some cases under heavy writes, so it's not recommended.
Bruce
That's not entirely true -- you can span a volume across controllers, just don't span RAID groups across controllers.
Bruce Sterling Woodcock wrote:
----- Original Message ----- From: "Matt Phelps" mphelps@cfa.harvard.edu To: toasters@mathworks.com Sent: Tuesday, January 09, 2001 11:52 AM Subject: Can volumes span FCAL controllers?
Hi,
We're probably going to be adding another FCAL controller to our 840 soon. Can volumes include disks that are on both controllers?
Yes, but there's a slight performance penalty in some cases under heavy writes, so it's not recommended.
Bruce
--
Jason Santos
UNIX System Administrator
ON Semiconductor
jason.santos@onsemi.com
(602) 244-3769
----- Original Message ----- From: "Jason Santos" jason.santos@onsemi.com To: "Bruce Sterling Woodcock" sirbruce@ix.netcom.com Cc: "Matt Phelps" mphelps@cfa.harvard.edu; toasters@mathworks.com Sent: Thursday, January 11, 2001 7:07 PM Subject: Re: Can volumes span FCAL controllers?
That's not entirely true -- you can span a volume across controllers, just don't span RAID groups across controllers.
When a volume has 2 RAID groups, is the NVRAM split among RAID groups? How are CPs done?
Bruce
That's not entirely true -- you can span a volume across controllers, just don't span RAID groups across controllers.
When a volume has 2 RAID groups, is the NVRAM split among RAID groups? How are CPs done?
Bruce
As I understand it, NVRAM is used for logging write requests from clients -- not as a disk buffer cache. The filer periodically generates consistency points where the disk volumes are perfectly consistent. These occur no less often than every 10 seconds.
In order to update the disks efficiently, the filer allows write requests to accumulate for awhile and commits them in a coordinated fashion. This greatly reduces the load on the parity drives. (If you write a bunch of blocks in the same stripe at the same time, you only have to update the parity block once.)
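To make the parity arithmetic concrete, here is a small illustrative sketch (my own toy code, not anything from the filer): with RAID 4-style XOR parity, committing a whole stripe at once means computing the parity block a single time, instead of doing a read-modify-write of the parity disk for every individual block.

    from functools import reduce

    BLOCK_SIZE = 4096  # illustrative block size

    def xor_blocks(a, b):
        """XOR two equal-sized blocks byte by byte."""
        return bytes(x ^ y for x, y in zip(a, b))

    def full_stripe_parity(data_blocks):
        """One parity computation for the whole stripe -- one parity write."""
        return reduce(xor_blocks, data_blocks, bytes(BLOCK_SIZE))

    def per_block_parity_update(parity, old_block, new_block):
        """Read-modify-write style: new parity = parity XOR old data XOR new data.
        Doing this for every block touches the parity disk once per block."""
        return xor_blocks(xor_blocks(parity, old_block), new_block)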
Once a consistency point has been generated, the NVRAM log is cleaned up to make room for logging more incoming write requests.
Due to the design of WAFL, writing a new consistency point does not undo or damage the previous consistency point.
Here's how the filer recovers from a crash or loss of power when the volumes are inconsistent. When the filer comes up, it reverts back to the most recent consistency point (no more than 10 sec old) and replays all the write requests logged in NVRAM that arrived after the consistency point was generated.
So in answer to your question, the NVRAM is shared by all volumes and raid groups because it is a log of incoming write requests, not a disk buffer cache.
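Here's a toy model of the behaviour described above (purely illustrative, not ONTAP internals): incoming writes are logged in NVRAM and buffered in main memory; a consistency point flushes the buffered data for all volumes to disk and then frees the log; after a crash, the filer falls back to the last consistency point and replays the logged requests.

    class ToyFiler:
        """Illustrative model of NVRAM-as-request-log plus consistency points."""

        def __init__(self):
            self.nvram_log = []      # logged write requests (survive a crash)
            self.memory_buffer = {}  # pending writes, keyed by (volume, block)
            self.disk = {}           # on-disk state as of the last consistency point

        def write(self, volume, block, data):
            self.nvram_log.append((volume, block, data))   # log the request
            self.memory_buffer[(volume, block)] = data     # buffer the data in RAM
            return "ack"  # acknowledged to the client once the request is logged

        def consistency_point(self):
            # Flush every buffered write, for all volumes, in one coordinated burst.
            self.disk.update(self.memory_buffer)
            self.memory_buffer.clear()
            self.nvram_log.clear()   # entries are only needed until the CP completes

        def recover_from_crash(self):
            # RAM contents are lost; disk still holds the last consistency point.
            self.memory_buffer.clear()
            for volume, block, data in self.nvram_log:     # replay logged requests
                self.memory_buffer[(volume, block)] = data
            self.consistency_point()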
Steve Losen scl@virginia.edu phone: 804-924-0640
University of Virginia ITC Unix Support
Steve Losen scl@sasha.acc.virginia.edu wrote:
[in response to Bruce Sterling Woodcock sirbruce@ix.netcom.com]
That's not entirely true -- you can span a volume across controllers, just don't span RAID groups across controllers.
When a volume has 2 RAID groups, is the NVRAM split among RAID groups? How are CPs done?
[ Good introduction to consistency points for newbies ]
So in answer to your question, the NVRAM is shared by all volumes and raid groups because it is a log of incoming write requests, not a disk buffer cache.
But to resume the original thread, the question is whether or not the writes done as part of CPs are clustered in a way that helps to reduce the overheads of switching between FCAL controllers.
CPs for different volumes are logically distinct operations, but are in practice synchronised, either by the 10-second clock or by NVRAM filling up. In saying that one should avoid spreading a volume over multiple controllers, but can have different volumes on different controllers, the assumption is that writes (mostly) occur first to one volume, then to another.
If the same is to apply to RAID groups, then the assumption is that the writes associated with taking a CP on a single volume are (mostly) clustered by RAID group. This sounds entirely reasonable, given that the filer tries to write whole stripes, or at least stripes in which as many planes as possible are being updated.
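One way to picture the clustering described above (a sketch only, with made-up names): take the blocks dirtied since the last CP, sort them into their raid groups, and issue each group's writes as one batch, so a given adapter sees long runs of work rather than interleaved single-block requests.

    from collections import defaultdict

    def batch_cp_writes(dirty_blocks, raid_group_of):
        """Group dirty blocks by raid group so each adapter gets clustered batches.

        dirty_blocks  -- iterable of block ids dirtied since the last CP
        raid_group_of -- function mapping a block id to its raid group
        """
        batches = defaultdict(list)
        for block in dirty_blocks:
            batches[raid_group_of(block)].append(block)
        # Issue one batch per raid group; full stripes mean one parity update each.
        return dict(batches)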
Chris Thompson
University of Cambridge Computing Service,
New Museums Site, Cambridge CB2 3QG, United Kingdom
Email: cet1@ucs.cam.ac.uk
Phone: +44 1223 334715
Hello Toasters!
I hope to clarify the multiple-controller issue with a fairly long-winded note. If I leave out anyone's point about RAID groups or volumes spanning FCAL controllers, let me know, or let the whole list know. I'm game.
I suspect this precautionary rule of thumb arose from a hardware bug in the 700 series, burt 19290. There's a short description on NOW here:
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=19290
It's cryptic and brief, and should probably be reworded. I'll look into that. The long story is, arbitrating multiple PCI bus requests on 700-series hardware didn't go as smoothly as we'd hoped, resulting in performance degradation in this case. How often does the case come up, you ask? The worst case is when writes over 100TX Ethernet go to a quad Ethernet card (which causes a lot of interrupts) in slot 1, 2 or 3; an FCAL controller is also in slot 1, 2 or 3; and another FCAL controller with destination disks for the write op is in slot 4, 5, 6 or 7.

It's worth pointing out that gigabit controllers buffer data much more efficiently, and aren't throttled by waiting for PCI interrupts. A single Ethernet card doesn't have so much data to unload, so it can get the bus much more readily. A quad card has lots of data to unload, with very little buffer space on the controller. Scheduling interrupts for the NIC to unload the data, and two other interrupts to load data onto the two FCAL controllers, will not go as fast as the quad card would like. This is why the NIC starts reporting h/w overflows and bus underruns in ifstat.
For the raid group / volume distinction, writes are allocated per volume, and the free space in the given raid groups will determine where the writes go. NVRAM isn't divided on a per-volume or per-raid-group basis. The write data, by and large, isn't in NVRAM. The write transaction description is in NVRAM until the write is committed to disk, normally with the data being served from system memory. Teeny tiny writes are the exception to this.

If you blow out an FCAL controller (which really doesn't happen that often), WAFL won't see the disks on the end of the controller. If either a volume or a raid group in a given volume is split across this hypothetically blown controller, the affected volume will lose disks. If a raid group loses more than one disk, the volume goes away. Performance and redundancy are at odds for this configuration, but the exposure is low.
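As a rough sketch of the allocation idea (the real WAFL write allocator is far more involved, and these names are invented for illustration): within a volume, new blocks go to whichever raid group has room, so free space decides where a CP's writes land.

    def allocate_writes(blocks, raid_groups):
        """Assign each new block to the raid group with the most free space.

        raid_groups -- dict mapping group name to free block count
        Returns a dict mapping group name to the list of blocks placed there.
        """
        placement = {group: [] for group in raid_groups}
        for block in blocks:
            # Pick the group with the most free space remaining (a crude heuristic;
            # a real allocator also cares about stripe geometry and locality).
            target = max(raid_groups, key=raid_groups.get)
            if raid_groups[target] == 0:
                raise RuntimeError("volume is full")
            placement[target].append(block)
            raid_groups[target] -= 1
        return placement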
So, the following configurations are safe for multi-adapter volumes:
+ 800-series filers
+ GbE workloads
+ non write-intensive workloads
+ writes directed at multiple volumes simultaneously
+ 700-series implementing the workaround in burt 19290
Anyone left?
WAFL volumes can, and often do, span FCAL adapters. In many cases there is a performance benefit from spreading volumes across multiple FCALs, due to improved load balancing between adapters. Performance will not suffer if you split a RAID group (or volume) across adapters, barring bug 19290. Most of our F840 performance benchmarks are run in this configuration. Follow-up questions are welcome, but may not be answered until Monday. Enjoy!
For the raid group / volume distinction, writes are allocated per volume, and the free space in the given raid groups will determine where the writes go. NVRAM isn't divided on a per-volume or per-raid group basis. The write data, by and large, isn't in NVRAM. The write transaction description is in NVRAM until the write is committed to disk, normally with the data being served from system memory. Teeny tiny writes are the exception to this.
Yes, I understand that writes are logged simultaneously to both RAM and NVRAM and the writes come from RAM and NVRAM is only used in case of a crash. However, the data in RAM has to be written to disk before the NVRAM can be flushed. Normally NVRAM is divided into two sections anyway. The question is, with volumes and groups, is NVRAM divided further? When a CP is triggered, either by timer or log full, does it write out all volumes or all groups or one volume or one group or what?
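The "two sections" mentioned here are usually described as a double buffer; here is a hedged sketch of that idea (illustrative only, not ONTAP internals): one half logs new requests while the CP triggered by the other half filling up is still being written out.

    class DoubleBufferedNVRAM:
        """Illustrative two-half NVRAM log: one half fills while the other flushes."""

        def __init__(self, half_capacity, flush_to_disk):
            self.halves = [[], []]
            self.active = 0                     # half currently accepting new entries
            self.half_capacity = half_capacity
            self.flush_to_disk = flush_to_disk  # callback that performs the CP

        def log(self, entry):
            self.halves[self.active].append(entry)
            if len(self.halves[self.active]) >= self.half_capacity:
                full, self.active = self.active, 1 - self.active
                self.flush_to_disk(self.halves[full])  # CP for the half that filled
                self.halves[full].clear()              # reusable once the CP is done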
If one write has to span two different controllers, then there has to be at least a minimal timing impact. The impact will, of course, be "behind the scenes" and not affect the response time unless writes are sufficiently intensive that you're doing a cp_from_cp.
Bruce
* It was Sat, Jan 13, 2001 at 12:01:52AM -0800 when Bruce Sterling Woodcock wrote:
For the raid group / volume distinction, writes are allocated per volume, and the free space in the given raid groups will determine where the writes go. NVRAM isn't divided on a per-volume or per-raid group basis. The write data, by and large, isn't in NVRAM. The write transaction description is in NVRAM until the write is committed to disk, normally with the data being served from system memory. Teeny tiny writes are the exception to this.
Yes, I understand that writes are logged simultaneously to both RAM and NVRAM and the writes come from RAM and NVRAM is only used in case of a crash. However, the data in RAM has to be written to disk before the NVRAM can be flushed.
Not necessarily. If the filer doesn't crash, yes. The write is reported successful to the client as soon as it hits NVRAM.
The question is, with volumes and groups, is NVRAM divided further?
No, NVRAM isn't divided on a per-volume or per-raid group basis.
When a CP is triggered, either by timer or log full, does it write out all volumes or all groups or one volume or one group or what?
It writes out data from all volumes that had writes to them.
If one write has to span two different controllers, then there has to be at least a minimal timing impact. The impact will, of course, be "behind the scenes" and not affect the response time unless writes are sufficiently intensive that you're doing a cp_from_cp.
But you're sending out data to two controllers, with two different write queues, talking to different disks. The time spent passing along the interrupt, compared to the time spent doing the I/O, is negligible.
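A last sketch of that point about negligible overhead (the figures are invented, purely to show the orders of magnitude): handing a request to a second controller's queue costs on the order of microseconds, while the disk I/O itself costs milliseconds, so splitting a CP's writes across two queues barely moves the flush time.

    # Rough, invented figures purely to illustrate the ratio being argued.
    INTERRUPT_HANDOFF_US = 10   # cost of queueing a request on another controller
    DISK_WRITE_US = 5_000       # cost of the write I/O itself

    def cp_time_us(blocks_per_controller):
        """Approximate time to flush a CP split across controller queues."""
        per_queue = blocks_per_controller * (DISK_WRITE_US + INTERRUPT_HANDOFF_US)
        return per_queue  # queues drain in parallel, so the slowest one dominates

    print(cp_time_us(100))  # handoff overhead adds roughly 0.2% to the flush time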