From: Brian Tao [mailto:taob@risc.org]
Is anyone using NetApps as a data storage backend for MSCS-aware applications? I've read the white paper on SQL Server 7 and MSCS at http://www.netapp.com/tech_library/3084.html, but do those principles apply to clustered applications in general? I read Microsoft's own MSCS FAQ, and although it starts off saying that MSCS needs a shared storage backend, it goes on to mention only shared SCSI devices... nothing about Fibre Channel or CIFS. It also needs a shared "quorum" disk, which somehow allows MSCS to determine which servers in the cluster are down (but the FAQ doesn't say exactly how the negotiation is done).
The quorum device is a disk that is accessible to every member of the cluster that can potentially "form" the cluster. When Microsoft says it is a shared device, it means that it is accessible by more than one node (e.g. on a shared SCSI bus). It does not mean that multiple nodes write to the device at the same time; only one node can bring a disk online at a time. The quorum device must support the SCSI reserve, release, and bus reset commands.
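To make the three required commands concrete, here is a toy Python model of how a reservation behaves on the quorum disk. It is a sketch under assumptions, not a driver API: the class and method names are hypothetical, and it follows my understanding of SCSI-2 style reservations (a repeated reserve by the holder succeeds, a reserve by anyone else gets a reservation conflict, a release from a non-holder changes nothing, and a bus reset clears everything).

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    GOOD = "GOOD"
    RESERVATION_CONFLICT = "RESERVATION CONFLICT"


class QuorumDisk:
    """Hypothetical toy model of a disk on a shared SCSI bus."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None  # initiator holding the reservation

    def reserve(self, initiator: str) -> Status:
        # Succeeds if the disk is unreserved or already held by this same
        # initiator (a repeated reserve is how the owner "reasserts").
        if self.holder in (None, initiator):
            self.holder = initiator
            return Status.GOOD
        return Status.RESERVATION_CONFLICT

    def release(self, initiator: str) -> Status:
        # Only the holder's release clears the reservation; a release from
        # another initiator is modeled as a harmless no-op.
        if self.holder == initiator:
            self.holder = None
        return Status.GOOD

    def bus_reset(self) -> None:
        # A bus reset drops every reservation on the bus.
        self.holder = None
```

For example, after `nodeA` reserves the disk, a reserve from `nodeB` returns `RESERVATION_CONFLICT`; only after `bus_reset()` can `nodeB` reserve it.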
The basic algorithm for acquiring the quorum device is:
o Issue a bus reset (which blows away any existing SCSI reservations).
o Wait TWO reassertion intervals.
o Try to reserve the quorum device. If this succeeds, the node now owns the quorum; if a reservation conflict error is returned, then someone else owns the quorum and has reasserted the reservation.
o If ownership of the quorum was obtained, reassert the reservation every ONE reassertion interval.
This means that the owning node keeps the quorum by reasserting the reservation every, say, 1 second. A different node can determine whether the owning node is still up (say there's a network partition and they cannot talk) by issuing a bus reset to blow away the reservation, waiting 2 seconds, and then attempting the reserve. If the owner is still up, it will have reasserted the reservation in the meantime, and the challenger's reserve fails with a reservation conflict.
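A minimal discrete-time sketch of that challenge, assuming a 1-second reassertion interval and whole-second clock ticks. Node names "A" (owner) and "B" (challenger), the `Disk` stand-in, and `challenge()` are all hypothetical illustrations, not MSCS code; real timing would involve jitter that this ignores.

```python
from typing import Optional

INTERVAL = 1  # reassertion interval in seconds (the "say, 1 second" above)


class Disk:
    """Hypothetical stand-in for the quorum disk: tracks the reservation holder."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def reserve(self, node: str) -> bool:
        # True on success, False on a reservation conflict.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def bus_reset(self) -> None:
        # Blows away any existing reservation.
        self.holder = None


def challenge(owner_alive: bool) -> str:
    """Simulate one arbitration attempt by node B against current owner A."""
    disk = Disk()
    disk.reserve("A")  # A formed the cluster earlier and holds the quorum
    disk.bus_reset()   # t=0: B blows away all reservations
    # B waits TWO reassertion intervals; a live A reasserts once per interval.
    for t in range(1, 2 * INTERVAL + 1):
        if owner_alive and t % INTERVAL == 0:
            disk.reserve("A")
    # After the wait, B tries to reserve the quorum disk.
    if disk.reserve("B"):
        return "B owns the quorum"
    return "A still owns the quorum"
```

`challenge(owner_alive=True)` returns "A still owns the quorum", while `challenge(owner_alive=False)` returns "B owns the quorum". Presumably the wait is two intervals rather than one to leave slack: even if the reset lands just after the owner's last reassertion and the next one is a little late, a live owner still reasserts before the challenger tries.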
Note that this is how MSCS worked in NT4; I'm not sure about W2K.
-Steve