IIRC (and I'd check, since my info may be out of date) SAS's 50/50 mix is only 50/50 over the long run; it does a lot of reads (up to entire analysis sets) followed by a lot of writes, with quite a bit of I/O to temporary files too. It can be quite a handful to manage on shared storage. You need a good understanding of the app to be able to size it properly.
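If you want to see what it's actually doing before you size anything, sampling the disk counters while a representative job runs will show whether the mix is a steady 50/50 or alternating read-heavy and write-heavy phases. A rough sketch (Python with psutil, nothing SAS-specific, assumes the job is running against local disks):

# Rough sketch: sample system disk I/O counters while a representative
# SAS job runs, to see whether the read/write mix is steady or phased.
# Assumes the psutil package is available.
import time
import psutil

INTERVAL = 5  # seconds between samples

prev = psutil.disk_io_counters()
while True:
    time.sleep(INTERVAL)
    cur = psutil.disk_io_counters()
    read_mb = (cur.read_bytes - prev.read_bytes) / 1e6
    write_mb = (cur.write_bytes - prev.write_bytes) / 1e6
    total = read_mb + write_mb
    mix = (100.0 * read_mb / total) if total else 0.0
    print(f"read {read_mb:8.1f} MB  write {write_mb:8.1f} MB  ({mix:.0f}% reads this interval)")
    prev = cur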
Alex McDonald
NetApp, Office of the CTO
Sent on a BB, excuse the typos
----- Original Message -----
From: Raj Patel phigmov@gmail.com
To: Maxwell Reid max.reid@saikonetworks.com
Cc: toasters@mathworks.com
Sent: Fri Mar 11 20:16:12 2011
Subject: Re: SAN for SAS
Hi Max,
There are a couple of issues for these guys -
* Long copy times for the datasets from the data-center to their workstations
* Long processing times for I/O-intensive SAS processes on their PCs - they're just using SAS Workstation and batching up the work themselves (i.e. not distributed by design but by necessity)
WAN circuit costs are pricey, so ideally we'd centralise their workloads - either a powerful SAS server plus plenty of disk, or even 40 servers with individual workstation licenses (VDI, Citrix or physical) and shared disk. Their theory is that their tasks aren't necessarily CPU-bound but disk-I/O-bound.
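Before we spend money on either option I'd like to confirm that theory. A rough sketch like the one below (Python with psutil, watching whatever PID the SAS session is running under - just an illustration, nothing SAS-specific) should show low CPU alongside heavy disk traffic if they're right:

# Sanity check for the "disk I/O bound, not CPU bound" theory: watch the
# SAS process while a job runs. Sustained CPU well below 100% of one core,
# alongside heavy disk I/O, points at an I/O bottleneck.
import sys
import psutil

proc = psutil.Process(int(sys.argv[1]))  # PID of the running SAS process
try:
    while proc.is_running():
        cpu = proc.cpu_percent(interval=5)   # % of one core, sampled over 5 s
        io = proc.io_counters()              # cumulative bytes read/written
        print(f"cpu {cpu:5.1f}%  read {io.read_bytes/1e6:10.1f} MB  write {io.write_bytes/1e6:10.1f} MB")
except psutil.NoSuchProcess:
    pass  # the job finished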
SSDs are fine for reads, but they're skeptical about writes. SAS apparently supports RAM disks, but they're pretty pricey for the size they'd need.
A bit of googling indicates SAS has a distributed processing mechanism. I'll have to chat to them about licensing (I suspect it's not cheap).
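The pattern I have in mind is the usual split-the-data-and-farm-out-chunks approach. Purely as an illustration (toy Python multiprocessing on one box, not SAS - the file names and the per-chunk step are made up), something like:

# Illustration only: split the time series into chunks and process them
# with a pool of local workers; a distributed SAS setup would presumably
# do the equivalent across nodes.
from multiprocessing import Pool

def summarise(chunk_path):
    # Stand-in for the real per-chunk analysis; here it just counts rows.
    with open(chunk_path) as f:
        return chunk_path, sum(1 for _ in f)

if __name__ == "__main__":
    # Hypothetical file names - one chunk of the time series per task.
    chunks = [f"timeseries_part{i:02d}.csv" for i in range(40)]
    with Pool(processes=8) as pool:  # 8 local worker processes
        for path, rows in pool.imap_unordered(summarise, chunks):
            print(f"{path}: {rows} rows")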
Is anyone using their SANs for storing or running weather, population or financial simulations (i.e. massive data-sets with a 50/50 mix of reads and writes)?
Cheers to all for the tips so far!
Raj.
On 3/12/11, Maxwell Reid max.reid@saikonetworks.com wrote:
Hi Raj,
Would I be correct in assuming that the main problem you're trying to solve is reducing the replication via WAN to the local machines?
If so, moving the processing onto 40 machines in the datacenter, with or without a shared storage arrangement, would fix that issue; RPC- or protocol-based replication between the nodes would be required unless you switch to a clustered or network filesystem, in which case a SAN or NAS setup would work fine.
From the HPC side of things, sticking a bunch of PCIe SSDs in the nodes and connecting them together via 10GbE or Infiniband would certainly speed things up without the need for a shared disk pool, but that's probably overkill for what you're trying to accomplish.
~Max
On Fri, Mar 11, 2011 at 10:43 AM, Raj Patel phigmov@gmail.com wrote:
Hi,
A bit of a generic SAN question (not necessarily NetApp specific).
I've got a team of 40 people who use a statistical analysis package (SAS) to crunch massive time-series data sets.
They claim their biggest gripe is disk contention - not one person hitting the same data, but 40. So they process these data-sets locally on high-spec PCs with several disks (one for the OS, one for scratch, one for reads and one for writes).
I think they'd be much better off utilising shared storage (i.e. a SAN) in a datacenter, so at least the workloads are spread across multiple spindles and the data only needs to be copied or replicated within the datacenter, rather than schlepped up and down the WAN to their distributed team's PCs as it is now.
Are there any useful guides or comparisons covering best practice for designing HPC environments on shared infrastructure?
Other than knowing what SAS does, I'm not sure about its HPC capabilities (i.e. distributed computing, clustering, etc.), so I'll need to follow that angle up too.
Thanks in advance, Raj.