IIRC (and I'd check, since my info may be out of date) SAS is only 50/50 over the long run; it does a lot of reads (up to entire analysis sets) followed by a lot of writes, with quite a bit of I/O to temporary files as well. It can be quite a handful to manage on shared storage, and you need a good understanding of the app to size it properly.
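If you want a feel for that pattern before sizing anything, a rough Python sketch along these lines (hypothetical paths and sizes, not SAS itself) reproduces the bulk read -> scratch -> bulk write cycle and times it on a candidate volume:

    # Rough sketch only: stage a big sequential read through a temp "work"
    # file, then write the result back out, timing the whole pass.
    import os, shutil, tempfile, time

    SRC = "/mnt/candidate/analysis_set.dat"   # hypothetical large input
    DST = "/mnt/candidate/results.dat"        # hypothetical output
    CHUNK = 8 * 1024 * 1024                   # 8 MiB sequential chunks

    start = time.time()
    with open(SRC, "rb") as src, \
         tempfile.NamedTemporaryFile(dir="/mnt/candidate") as tmp:
        while True:                           # long read phase into scratch
            buf = src.read(CHUNK)
            if not buf:
                break
            tmp.write(buf)
        tmp.flush()
        os.fsync(tmp.fileno())
        tmp.seek(0)                           # long write phase back out
        with open(DST, "wb") as dst:
            shutil.copyfileobj(tmp, dst, CHUNK)
            dst.flush()
            os.fsync(dst.fileno())
    print("elapsed: %.1f s" % (time.time() - start))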



Alex McDonald
NetApp, Office of the CTO
Sent from a BlackBerry; excuse the typos

----- Original Message -----
From: Raj Patel <phigmov@gmail.com>
To: Maxwell Reid <max.reid@saikonetworks.com>
Cc: toasters@mathworks.com <toasters@mathworks.com>
Sent: Fri Mar 11 20:16:12 2011
Subject: Re: SAN for SAS

Hi Max,

There are a couple of issues for these guys -

* Long copy times for the datasets from the data center to their workstations
* Long processing times for I/O-intensive SAS jobs on their PCs -
they're just using SAS Workstation and batching up the work themselves
(i.e. not distributed by design but by necessity)

WAN circuit costs are pricey, so ideally we'd centralise their
workloads - either a powerful SAS server plus plenty of disk, or 40
servers with individual workstation licenses (VDI, Citrix or physical)
and shared disk. Their theory is that their tasks aren't necessarily
CPU-bound but disk-I/O-bound.
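
We should probably verify that theory on one of the existing PCs before
buying anything; assuming the machine is Linux and can run Python with the
psutil module, a rough sample like this shows whether iowait or CPU
dominates while one of their batches runs (the 60-second window is
arbitrary):

    # Rough sketch: sample CPU busy vs iowait and disk throughput for one
    # minute while a SAS batch is running (needs the psutil package).
    import psutil

    before = psutil.disk_io_counters()
    cpu = psutil.cpu_times_percent(interval=60)   # blocks for the sample
    after = psutil.disk_io_counters()

    read_mb = (after.read_bytes - before.read_bytes) / 1e6
    write_mb = (after.write_bytes - before.write_bytes) / 1e6
    print("cpu busy %.1f%%, iowait %.1f%%"
          % (100 - cpu.idle, getattr(cpu, "iowait", 0.0)))
    print("read %.0f MB, wrote %.0f MB in 60 s" % (read_mb, write_mb))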

SSDs are fine for reads, but they're skeptical about writes. SAS
apparently supports RAM disks, but those are pretty pricey at the size
they'd need.

A bit of googling indicates SAS has a distributed-processing
mechanism. I'll have to chat to them about licensing (I suspect it's
not cheap).

Is anyone using their SANs for storing or running weather, population
or financial simulations (i.e. massive datasets with a 50/50 mix of
reads and writes)?

Cheers to all for the tips so far!

Raj.

On 3/12/11, Maxwell Reid <max.reid@saikonetworks.com> wrote:
> Hi Raj,
>
> Would I be correct in assuming that the main problem you're trying to solve
> is reducing the replication via WAN to the local machines?
>
> In that case, moving the processing to 40 machines in the datacenter,
> with or without a shared storage arrangement, would fix that issue; RPC-
> or protocol-based replication between nodes would be required unless
> you're going to switch to a clustered filesystem or network filesystem,
> in which case a SAN or NAS setup would work fine.
>
> From the HPC side of things, sticking a bunch of PCIe SSDs in the nodes and
> connecting them together via 10GbE or InfiniBand would certainly speed
> things up without the need for a shared disk pool, but that's probably
> overkill for what you're trying to accomplish.
>
>
> ~Max
>
>
>
>
> On Fri, Mar 11, 2011 at 10:43 AM, Raj Patel <phigmov@gmail.com> wrote:
>
>> Hi,
>>
>> A bit of a generic SAN question (not necessarily NetApp specific).
>>
>> I've got a team of 40 people who use a statistical analysis package
>> (SAS) to crunch massive time-series data sets.
>>
>> They claim their biggest gripe is disk contention - not necessarily
>> one person using the same data, but 40 of them. So they process these
>> datasets locally on high-spec PCs with several disks (one for the OS,
>> one for scratch, one for reads and one for writes).
>>
>> I think they'd be much better off utilising shared storage (i.e. a SAN)
>> in a datacenter, so at least the workloads are spread across multiple
>> spindles and they only need to copy or replicate the data within the
>> datacenter rather than schlep it up and down the WAN, which is what
>> they currently do to get it to the distributed team's PCs.
>>
>> Are there any useful guides or comparisons for best practice in
>> designing HPC environments on shared infrastructure?
>>
>> Other than knowing what SAS does, I'm not sure of its HPC capabilities
>> (i.e. distributed computing, clustering, etc.), so I'll need to follow
>> that angle up too.
>>
>> Thanks in advance,
>> Raj.
>>
>