"Raj" == Raj Patel phigmov@gmail.com writes:
Raj> There are a couple of issues for these guys -
Raj> * Long copy times for the datasets from the data-center to their workstations
What percentage of time are they spending on waiting for their data?
Raj> * Long processing times for i/o intensive SAS processes on their
Raj> PC - they're just using SAS Workstation and batching up the work
Raj> themselves (ie not distributed by design but by necessity)
How busy are the CPUs on these systems? How much memory do they have? Are they swapping (or using lots of pagefile, in Windows terms) to disk while processing the data?
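If you can run Python on one of these boxes, a quick sanity check with the third-party psutil library (assuming it's installed; pip install psutil) would look something like this, run while a SAS job is grinding away:

    # Quick check of CPU, memory and swap/pagefile pressure.
    # Requires the third-party psutil package.
    import psutil

    # Sample CPU usage over one second, per core.
    cpu = psutil.cpu_percent(interval=1, percpu=True)
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()

    print("CPU per core (%):", cpu)
    print("RAM used: %.1f%% of %.1f GB" % (mem.percent, mem.total / 1e9))
    print("Swap/pagefile used: %.1f%%" % swap.percent)

If the cores are mostly idle and swap is quiet while the job crawls, that points at disk IO as the bottleneck.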
Raj> WAN circuit costs are pricey so ideally we'd centralise their
Raj> workloads - powerful SAS server plus plenty of disk or even 40
Raj> servers with individual workstation licenses (either VDI, Citrix
Raj> or real) and shared disk. Their theory is that their tasks aren't
Raj> necessarily CPU bound but disk i/o bound.
In that case, a simple test would be to set up one of their systems with RAID0, striping all of their volumes across multiple disks.
You mention that they have one disk for OS, one for temp, one for data, etc. Instead you should use those three or four (or more!) disks in a nice RAID0 stripe set, so they are reading and writing across multiple disks at once.
If that gives them a big speedup, then you know that they're IO bound in their simulations.
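A rough way to measure that (a sketch only; the two paths below are made up, so point them at the old single-disk volume and the new stripe set, and use a file bigger than RAM so you're not just timing the page cache) is to time a big sequential read on each layout:

    # Rough sequential-read throughput test: read a large existing
    # file in big chunks and report MB/sec. Paths are hypothetical.
    import time

    def read_throughput(path, chunk=8 * 1024 * 1024):
        total = 0
        start = time.time()
        with open(path, "rb") as f:
            while True:
                buf = f.read(chunk)
                if not buf:
                    break
                total += len(buf)
        secs = time.time() - start
        return total / secs / 1e6  # MB/sec

    print("single disk:", read_throughput("/data/single/bigfile.sas7bdat"))
    print("RAID0 stripe:", read_throughput("/data/stripe/bigfile.sas7bdat"))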
Raj> SSD's are fine for reads but they're skeptical about writes. SAS
Raj> apparently supports RAM disks but they're pretty pricey for the
Raj> size they'd need.
It all depends on how large the data sets are and how much RAM the systems have. You don't mention any of that, so it's hard to know.
But in general, if they're IO bound, then I'd look into a central system with multiple CPUs but, more importantly, LOTS of spindles set up as RAID1 mirrors and then striped across the pairs of mirrors (i.e. RAID10), so they get both speed and reliability.
I'd also look into stuffing lots of RAM into the system to see if they can use RAM for their temp space. But it all depends on where the bottleneck is.
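To get a feel for what RAM-backed temp space would buy them, you can time bulk writes to a tmpfs mount (like /dev/shm on Linux) against their current WORK disk. A sketch, with a made-up disk path:

    # Compare write throughput to a RAM-backed filesystem vs. a disk
    # path. /dev/shm is a standard tmpfs mount on Linux; the disk
    # path below is hypothetical.
    import os, time

    def write_throughput(path, size_mb=512, chunk=4 * 1024 * 1024):
        block = b"\0" * chunk
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(size_mb * 1024 * 1024 // chunk):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure it actually hit the device
        secs = time.time() - start
        os.remove(path)
        return size_mb / secs  # MB/sec

    print("RAM (tmpfs):", write_throughput("/dev/shm/testfile"))
    print("disk (WORK):", write_throughput("/sasdata/work/testfile"))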
Raj> A bit of googling indicates SAS has a distributed processing
Raj> mechanism. I'll have to chat to them about licensing (suspect its
Raj> not cheap).
Time = $$$ for a lot of people.
Raj> Anyone using their SAN's for storing or running weather,
Raj> population or financial simulations (ie massive data-sets with a
Raj> 50/50 mix of reads/writes)?
It sounds like you're doing streaming reads/writes, i.e. reading in a large data set in chunks, running calculations with a fair amount of locality, then streaming out results or processed data. In that case, throughput, not CPU, is your big problem.
And I suspect that Netapp won't be the solution here, but it's hard to know. Remember, a 1Gig network connection is 100 MBytes/sec max, so if you have 40 compute nodes all trying to push that much data at once, that's 4 GBytes/sec aggregate - a LOT. You'd want to go to 10Gig Ethernet, etc.
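The back-of-the-envelope math, with the per-node rate as an assumption (plug in whatever your tests actually show):

    # Back-of-the-envelope aggregate bandwidth check. The per-node
    # figure is an assumption - substitute measured numbers.
    nodes = 40
    per_node_mb_s = 100      # each node saturating 1GigE ~= 100 MB/sec
    aggregate = nodes * per_node_mb_s

    link_10gige_mb_s = 1000  # roughly 1000 MB/sec usable on 10GigE
    print("Aggregate demand: %d MB/sec" % aggregate)  # 4000 MB/sec
    print("10GigE links to carry it: %d" % -(-aggregate // link_10gige_mb_s))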
But first things first: you need to characterize the data set and examine where the bottlenecks are. Thinking about it more, if they're willing to load datasets over the WAN, the datasets can't be *that* big. So maybe it's TEMP writes that are killing their performance? Or contention on the single disk they're using for local storage.
So pull in a test case and look at what's really happening: network IO, disk IO, IO patterns, etc. Then you know where to spend your time/money to speed things up.
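On the disk side, sampling the kernel's IO counters every few seconds while a representative job runs gives you the read/write mix and throughput over time. A sketch, again using psutil:

    # Sample system-wide disk IO counters every 5 seconds while a SAS
    # job runs, to see the read/write mix and throughput over time.
    import time
    import psutil

    prev = psutil.disk_io_counters()
    for _ in range(60):  # ~5 minutes of samples
        time.sleep(5)
        cur = psutil.disk_io_counters()
        rd = (cur.read_bytes - prev.read_bytes) / 5 / 1e6
        wr = (cur.write_bytes - prev.write_bytes) / 5 / 1e6
        print("read: %6.1f MB/sec   write: %6.1f MB/sec" % (rd, wr))
        prev = cur

If that shows a 50/50 read/write mix like Raj describes, RAID10 plus lots of RAM is a much better fit than a read-optimized setup.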
John

John Stoffel - Senior Staff Systems Administrator - System LSI Group
Toshiba America Electronic Components, Inc. - http://www.toshiba.com/taec
john.stoffel@taec.toshiba.com - 508-486-1087