"Raj" == Raj Patel phigmov@gmail.com writes:
Raj> There are a couple of issues for these guys -
Raj> * Long copy times for the datasets from the data-center to their workstations
What percentage of time are they spending on waiting for their data?
Raj> * Long processing times for i/o intensive SAS processes on their
Raj> PC - they're just using SAS Workstation and batching up the work
Raj> themselves (ie not distributed by design but by necessity)
How busy are the CPUs on these systems? How much memory do they have? Are they swapping (or using lots of pagefile, in Windows terms) to disk while processing the data?
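If you can run Python on one of these boxes, a quick sanity check with the third-party psutil library (assuming it's installed; pip install psutil) would look something like this, run while a SAS job is grinding away:

    # Quick check of CPU, memory and swap/pagefile pressure.
    # Requires the third-party psutil package.
    import psutil

    # Sample CPU usage over one second, per core.
    cpu = psutil.cpu_percent(interval=1, percpu=True)
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()

    print("CPU per core (%):", cpu)
    print("RAM used: %.1f%% of %.1f GB" % (mem.percent, mem.total / 1e9))
    print("Swap/pagefile used: %.1f%%" % swap.percent)

If the cores are mostly idle and swap is quiet while the job crawls, that points at disk IO as the bottleneck.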
Raj> WAN circuit costs are pricey so ideally we'd centralise their
Raj> workloads - powerful SAS server plus plenty of disk or even 40
Raj> servers with individual workstation licenses (either VDI, Citrix
Raj> or real) and shared disk. Their theory is that their tasks aren't
Raj> necessarily CPU bound but disk i/o bound.
In that case, a simple test would be to set up one of their systems with RAID0, striping all of their volumes across multiple disks.
You mention that they have one disk for OS, one for temp, one for data, etc. Instead you should use those three or four (or more!) disks in a nice RAID0 stripe set, so they are reading and writing across multiple disks at once.
If that gives them a big speedup, then you know that they're IO bound in their simulations.
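A rough way to measure that (a sketch only; the two paths below are made up, so point them at the old single-disk volume and the new stripe set, and use a file bigger than RAM so you're not just timing the page cache) is to time a big sequential read on each layout:

    # Rough sequential-read throughput test: read a large existing
    # file in big chunks and report MB/sec. Paths are hypothetical.
    import time

    def read_throughput(path, chunk=8 * 1024 * 1024):
        total = 0
        start = time.time()
        with open(path, "rb") as f:
            while True:
                buf = f.read(chunk)
                if not buf:
                    break
                total += len(buf)
        secs = time.time() - start
        return total / secs / 1e6  # MB/sec

    print("single disk:", read_throughput("/data/single/bigfile.sas7bdat"))
    print("RAID0 stripe:", read_throughput("/data/stripe/bigfile.sas7bdat"))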
Raj> SSD's are fine for reads but they're skeptical about writes. SAS
Raj> apparently supports RAM disks but they're pretty pricey for the
Raj> size they'd need.
It all depends on how large the data sets are and how much RAM the systems have. You don't mention any of that, so it's hard to know.
But in general, if they're IO bound, then I'd look into a central system with multiple CPUs but, more importantly, LOTS of spindles set up as RAID1 mirrors and then striped across the pairs of mirrors (i.e. RAID10), so they get both speed and reliability.
I'd also look into stuffing lots of RAM into the system to see if they can use RAM for their temp space. But it all depends on where the bottleneck is.
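To get a feel for what RAM-backed temp space would buy them, you can time bulk writes to a tmpfs mount (like /dev/shm on Linux) against their current WORK disk. A sketch, with a made-up disk path:

    # Compare write throughput to a RAM-backed filesystem vs. a disk
    # path. /dev/shm is a standard tmpfs mount on Linux; the disk
    # path below is hypothetical.
    import os, time

    def write_throughput(path, size_mb=512, chunk=4 * 1024 * 1024):
        block = b"\0" * chunk
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(size_mb * 1024 * 1024 // chunk):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure it actually hit the device
        secs = time.time() - start
        os.remove(path)
        return size_mb / secs  # MB/sec

    print("RAM (tmpfs):", write_throughput("/dev/shm/testfile"))
    print("disk (WORK):", write_throughput("/sasdata/work/testfile"))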
Raj> A bit of googling indicates SAS has a distributed processing
Raj> mechanism. I'll have to chat to them about licensing (suspect its
Raj> not cheap).
Time = $$$ for a lot of people.
Raj> Anyone using their SAN's for storing or running weather,
Raj> population or financial simulations (ie massive data-sets with a
Raj> 50/50 mix of reads/writes)?
It sounds like you're doing streaming reads/writes, i.e. reading in a large data set in chunks, running calculations with a fair amount of locality, then streaming out results or processed data. In that case, throughput, not CPU, is your big problem.
And I suspect that Netapp won't be the solution here, but it's hard to know. Remember, a 1Gig network connection is 100 MBytes/sec max, so if you have 40 compute nodes all trying to push that much data at once, that's 4 GBytes/sec aggregate - a LOT. You'd want to go to 10Gig Ethernet, etc.
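The back-of-the-envelope math, with the per-node rate as an assumption (plug in whatever your tests actually show):

    # Back-of-the-envelope aggregate bandwidth check. The per-node
    # figure is an assumption - substitute measured numbers.
    nodes = 40
    per_node_mb_s = 100      # each node saturating 1GigE ~= 100 MB/sec
    aggregate = nodes * per_node_mb_s

    link_10gige_mb_s = 1000  # roughly 1000 MB/sec usable on 10GigE
    print("Aggregate demand: %d MB/sec" % aggregate)  # 4000 MB/sec
    print("10GigE links to carry it: %d" % -(-aggregate // link_10gige_mb_s))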
But first things first: you need to characterize the data set and examine where the bottlenecks are. Thinking about it more, if they're willing to load datasets over the WAN, the datasets can't be *that* big. So maybe it's TEMP writes that are killing their performance? Or contention on the single disk they're using for local storage.
So pull in a test case and look at what's really happening: network IO, disk IO, IO patterns, etc. Then you know where to spend your time/money to speed things up.
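On the disk side, sampling the kernel's IO counters every few seconds while a representative job runs gives you the read/write mix and throughput over time. A sketch, again using psutil:

    # Sample system-wide disk IO counters every 5 seconds while a SAS
    # job runs, to see the read/write mix and throughput over time.
    import time
    import psutil

    prev = psutil.disk_io_counters()
    for _ in range(60):  # ~5 minutes of samples
        time.sleep(5)
        cur = psutil.disk_io_counters()
        rd = (cur.read_bytes - prev.read_bytes) / 5 / 1e6
        wr = (cur.write_bytes - prev.write_bytes) / 5 / 1e6
        print("read: %6.1f MB/sec   write: %6.1f MB/sec" % (rd, wr))
        prev = cur

If that shows a 50/50 read/write mix like Raj describes, RAID10 plus lots of RAM is a much better fit than a read-optimized setup.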
John

John Stoffel - Senior Staff Systems Administrator - System LSI Group
Toshiba America Electronic Components, Inc. - http://www.toshiba.com/taec
john.stoffel@taec.toshiba.com - 508-486-1087