Hi Raj,

Would I be correct in assuming that the main problem you're trying to solve is reducing replication over the WAN to the local machines?

If so, moving the processing to 40 machines in the datacenter, with or without a shared storage arrangement, would fix that issue. RPC- or protocol-based replication between the nodes would still be required unless you switch to a clustered filesystem or a network filesystem, in which case a SAN or NAS setup would work fine.
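
Just to make the network-filesystem option concrete, here's a rough sketch of what the client side could look like on a Linux processing node. The NAS hostname and paths (nas01, /export/sasdata, /mnt/sasdata) are made up for illustration; the actual export and mount options would depend on your NAS and workload.

    # hypothetical NFS export from a NAS head called nas01;
    # every processing node mounts the same shared data area
    mount -t nfs -o rw,hard,rsize=65536,wsize=65536 nas01:/export/sasdata /mnt/sasdata

    # or, made persistent via /etc/fstab:
    nas01:/export/sasdata  /mnt/sasdata  nfs  rw,hard,rsize=65536,wsize=65536  0  0

With something like that in place, the 40 nodes read and write one copy of the data inside the datacenter instead of each pulling it over the WAN.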

From the HPC side of things, sticking a bunch of PCIe SSDs in the nodes and connecting them together via 10GbE or InfiniBand would certainly speed things up without the need for a shared disk pool, but that's probably overkill for what you're trying to accomplish.


~Max

On Fri, Mar 11, 2011 at 10:43 AM, Raj Patel <phigmov@gmail.com> wrote:
Hi,

A bit of a generic SAN question (not necessarily NetApp specific).

I've got a team of 40 people who use a statistical analysis package
(SAS) to crunch massive time-series data sets.

They claim their biggest gripe is disk contention - not necessarily
one person hitting the same data, but 40 people at once. So they
process these data sets locally on high-spec PCs with several disks
(one for OS, one for scratch, one for reads and one for writes).

I think they'd be much better off utilising shared storage (i.e. a
SAN) in a datacenter, so that at least the workloads are spread
across multiple spindles and they only need to copy or replicate the
data within the datacenter, rather than schlepping it up and down the
WAN, which is what they currently do to get it to their distributed
team's PCs.

Are there any useful guides or comparisons for best practice in
designing HPC environments on shared infrastructure?

Other than knowing what SAS does, I'm not sure about its HPC
capabilities (i.e. distributed computing, clustering, etc.), so I'll
need to follow that angle up too.

Thanks in advance,
Raj.