Hi,
A bit of a generic SAN question (not necessarily NetApp specific).
I've got a team of 40 people who use a statistical analysis package (SAS) to crunch massive time-series data sets.
They claim their biggest gripe is disk contention - not necessarily one person using the same data, but 40 of them at once. So they process these data-sets locally on high-spec PCs with several disks (one for OS, one for scratch, one for reads and one for writes).
I think they'd be much better off utilising shared storage (i.e. a SAN) in a datacenter, so at least the workloads are spread across multiple spindles and they only need to copy or replicate the data within the datacenter, rather than schlep it up and down the WAN (which is what they currently do to get it to their distributed team PCs).
Are there any useful guides or comparisons on best practice for designing HPC environments on shared infrastructure?
Other than knowing what SAS does, I'm not sure about its HPC capabilities (i.e. distributed computing, clustering, etc.), so I'll need to follow that angle up too.
Thanks in advance, Raj.
I think you are better off getting the performance specs off the PCs first.
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Raj Patel Sent: Friday, March 11, 2011 12:43 PM To: toasters@mathworks.com Subject: SAN for SAS
First thing I'd do is profile the performance characteristics of the app. Is it read or write intensive? Contiguous or random access?
A large cache (like on a NetApp box) can make up for a variety of sins if it's read intensive.
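If it helps, a rough way to get those numbers is to sample the OS disk counters while one of their typical SAS jobs is running. A minimal sketch in Python (assuming the psutil package is available on the workstation; the sample interval and count are arbitrary):

import time
import psutil

INTERVAL = 5   # seconds between samples
SAMPLES = 60   # roughly five minutes of observation

before = psutil.disk_io_counters()
for _ in range(SAMPLES):
    time.sleep(INTERVAL)
    now = psutil.disk_io_counters()
    reads = now.read_count - before.read_count
    writes = now.write_count - before.write_count
    rbytes = now.read_bytes - before.read_bytes
    wbytes = now.write_bytes - before.write_bytes
    # Large average request sizes hint at sequential access, small ones at random I/O.
    print("read %.1f MB/s  write %.1f MB/s  avg read %.0f KB  avg write %.0f KB" % (
        rbytes / INTERVAL / 1e6,
        wbytes / INTERVAL / 1e6,
        (rbytes / reads / 1024) if reads else 0,
        (wbytes / writes / 1024) if writes else 0))
    before = now

The read/write split tells you whether a big read cache will help; the average request size is a crude hint at sequential vs random access.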
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Raj Patel Sent: Friday, March 11, 2011 1:43 PM To: toasters@mathworks.com Subject: SAN for SAS
Hi Raj,
Would I be correct in assuming that the main problem you're trying to solve is reducing the replication via WAN to the local machines?
If so, moving the processing to 40 machines in the datacenter, with or without a shared storage arrangement, would fix that issue; RPC- or protocol-based replication between nodes would be required unless you're going to switch to a clustered or network filesystem, in which case a SAN or NAS setup would work fine.
From the HPC side of things, sticking a bunch of PCIe SSDs in the nodes and connecting them together via 10GbE or Infiniband would certainly speed things up without the need for a shared disk pool, but that's probably overkill for what you're trying to accomplish.
~Max
Hi Max,
There are a couple of issues for these guys -
* Long copy times for the datasets from the data-center to their workstations
* Long processing times for I/O-intensive SAS processes on their PCs - they're just using SAS Workstation and batching up the work themselves (i.e. not distributed by design but by necessity)
WAN circuit costs are pricey, so ideally we'd centralise their workloads - a powerful SAS server plus plenty of disk, or even 40 servers with individual workstation licenses (VDI, Citrix or physical) and shared disk. Their theory is that their tasks aren't necessarily CPU-bound but disk-I/O-bound.
SSDs are fine for reads, but they're skeptical about writes. SAS apparently supports RAM disks, but they're pretty pricey for the size they'd need.
A bit of googling indicates SAS has a distributed processing mechanism. I'll have to chat to them about licensing (I suspect it's not cheap).
Is anyone using their SANs for storing or running weather, population or financial simulations (i.e. massive data-sets with a 50/50 mix of reads and writes)?
Cheers to all for the tips so far!
Raj.
WAN circuit costs are pricey, so ideally we'd centralise their workloads - a powerful SAS server plus plenty of disk, or even 40 servers with individual workstation licenses (VDI, Citrix or physical) and shared disk. Their theory is that their tasks aren't necessarily CPU-bound but disk-I/O-bound.
SSDs are fine for reads, but they're skeptical about writes.
Just to be clear, I'm talking about SSDs that connect directly to the PCIe bus, not via a SAS or FC connection. Write bandwidth on these is somewhere around 1 gigabyte per second (not gigabits). If you need the speed, I wouldn't be skeptical about the technology anymore.
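For a ballpark number on any given drive, a quick Python sketch like this works (the 2 GB size and the test file location are placeholders; point it at the disk you want to measure):

import os, time

PATH = "testfile.bin"        # put this on the drive under test
CHUNK = 8 * 1024 * 1024      # 8 MB writes
TOTAL = 2 * 1024 ** 3        # 2 GB total

buf = os.urandom(CHUNK)
start = time.time()
with open(PATH, "wb") as f:
    written = 0
    while written < TOTAL:
        f.write(buf)
        written += CHUNK
    f.flush()
    os.fsync(f.fileno())     # force the data to the device, not just the OS cache
print("%.0f MB/s sequential write" % (TOTAL / (time.time() - start) / 1e6))
os.remove(PATH)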
~Max
Thanks for all the excellent feedback.
I'll have a chat with SAS directly to see what they recommend, and I might also touch base with our server vendor to see what their HPC offerings are like.
Cheers, Raj.
"Raj" == Raj Patel phigmov@gmail.com writes:
Raj> There are a couple of issues for these guys -
Raj> * Long copy times for the datasets from the data-center to their workstations
What percentage of time are they spending on waiting for their data?
Raj> * Long processing times for I/O-intensive SAS processes on their PCs - they're just using SAS Workstation and batching up the work themselves (i.e. not distributed by design but by necessity)
How busy are the CPUs on these systems? How much memory do they have? Are they swapping (or, in Windows terms, using a lot of pagefile) to disk while processing the data?
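A quick way to answer those questions is to log CPU, memory and swap while one of their typical jobs runs. A minimal Python/psutil sketch (the interval and duration are arbitrary):

import psutil

for _ in range(60):                       # ~5 minutes at 5-second intervals
    cpu = psutil.cpu_percent(interval=5)  # average CPU over the interval
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print("cpu %5.1f%%  mem used %4.1f GB  swap used %4.1f GB" % (
        cpu,
        (mem.total - mem.available) / 1024 ** 3,
        swap.used / 1024 ** 3))

If the CPUs sit mostly idle and swap stays flat while the job crawls, that points at disk I/O rather than compute or memory.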
Raj> WAN circuit costs are pricey, so ideally we'd centralise their workloads - a powerful SAS server plus plenty of disk, or even 40 servers with individual workstation licenses (VDI, Citrix or physical) and shared disk. Their theory is that their tasks aren't necessarily CPU-bound but disk-I/O-bound.
In that case, a simple test would be to set up their local system with RAID0, striping all their volumes across multiple disks.
You mention that they have one disk for OS, one for temp, one for data, etc. Instead you should use those three or four (or more!) disks in a nice RAID0 stripe set, so they are reading and writing across multiple disks at once.
If that gives them a big speedup, then you know that they're IO bound in their simulations.
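One crude way to quantify it is to run the same sequential write-then-read against the old single-disk layout and the new RAID0 stripe and compare the numbers. A rough Python sketch (the two paths are placeholders for wherever the old and new volumes are mounted):

import os, time

def throughput(path, total=2 * 1024 ** 3, chunk=8 * 1024 * 1024):
    buf = os.urandom(chunk)
    fname = os.path.join(path, "stripe_test.bin")
    start = time.time()
    with open(fname, "wb") as f:
        for _ in range(total // chunk):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    write_mb = total / (time.time() - start) / 1e6
    start = time.time()
    with open(fname, "rb") as f:
        while f.read(chunk):
            pass
    # Note: the read pass may be served from the OS cache unless total is well above RAM.
    read_mb = total / (time.time() - start) / 1e6
    os.remove(fname)
    return write_mb, read_mb

for label, path in [("single disk", r"D:\scratch"), ("RAID0 stripe", r"E:\scratch")]:
    w, r = throughput(path)
    print("%-12s  write %.0f MB/s  read %.0f MB/s" % (label, w, r))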
Raj> SSDs are fine for reads, but they're skeptical about writes. SAS apparently supports RAM disks, but they're pretty pricey for the size they'd need.
It all depends on how large the data sets are and how much RAM the systems have. You don't mention any of that, so it's hard to know.
But in general, if they're IO bound, then I'd look into having a central system with multiple CPUs, but more importantly LOTS of spindles set up in RAID1 mirrors, then striped across the pairs of mirrors so that they have both speed and reliability.
I'd also look into stuffing lots of RAM into the system to see if they can use RAM for their temp space. But it all depends on where the bottleneck is.
Raj> A bit of googling indicates SAS has a distributed processing mechanism. I'll have to chat to them about licensing (I suspect it's not cheap).
Time = $$$ for a lot of people.
Raj> Is anyone using their SANs for storing or running weather, population or financial simulations (i.e. massive data-sets with a 50/50 mix of reads and writes)?
It sounds like you're doing streaming reads/writes, i.e. reading in a large data set in chunks, running calculations with a fair amount of locality, then streaming out results or processed data. In that case, throughput, not CPU, is your big problem.
And I suspect that NetApp won't be the solution here, but it's hard to know. Remember, a 1Gig network connection is 100 MBytes/sec max, so if you have 40 compute nodes all trying to push that much data at once, that's a LOT. You'd want to go to 10Gig Ethernet, etc.
But first things first: you need to characterize your data set and examine where the bottlenecks are. Thinking about it more, if they're willing to load datasets over the WAN, they can't be *that* big. So maybe it's TEMP writes that are killing their performance, or contention on the single disk they're using for local storage?
So pull in a test case and look at what's really happening: network IO, disk IO, IO patterns, etc. Then you'll know where to spend your time/money to speed things up.
John

John Stoffel - Senior Staff Systems Administrator - System LSI Group
Toshiba America Electronic Components, Inc. - http://www.toshiba.com/taec
john.stoffel@taec.toshiba.com - 508-486-1087