>>>>> "Jr" == Jr Gardner <phil.gardnerjr@gmail.com> writes:
So does that mean you're indexing 300GB worth of data, or generating
300GB worth of Solr indexes?  How big are the generated indexes?  The
reason I ask is that if the indexes themselves are that big, and each
indexer writes its own full copy, you're writing 1.2TB of indexes per
cycle, all of it the same data...
So if you could split the indexers into two pools, and have each pair
have only one system doing the indexing... that might be a big
savings.
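To put rough numbers on the pooling idea (a sketch only; I'm assuming
four indexer nodes each writing a full 300GB index, which is my
reading of the 1.2TB figure):

```python
# Back-of-envelope write amplification.  All numbers are assumptions:
# four indexers, each building the same hypothetical 300 GB index.
INDEX_SIZE_GB = 300
INDEXERS = 4

# Today: every indexer writes its own full copy to the Netapp.
writes_now_gb = INDEX_SIZE_GB * INDEXERS        # 1200 GB per cycle

# Two pools, one indexer per pool: only two copies get written.
POOLS = 2
writes_pooled_gb = INDEX_SIZE_GB * POOLS        # 600 GB per cycle

savings_gb = writes_now_gb - writes_pooled_gb   # 600 GB saved per cycle
print(writes_now_gb, writes_pooled_gb, savings_gb)
```

If the cycle really is every three minutes, that saving compounds
fast.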
How was the performance on the handful of DL360s?  And how is the
performance of the slaves compared to the old setup?  Even if you're
beating the Netapp to death for a bit, maybe it's a net win?
So I think this answers my question: the master trawls the dataset
building the index, then the slaves copy those new indexes to their
own storage areas.
And it really is too bad that you can't just use the Netapp itself for
the replication, with either a snapshot or a SnapMirror, to push the
new files to the six slaves.  If the slaves are read-only, you should
be working hard to keep them from each reading the same files from the
master's volume and then writing six copies back to the Netapp.
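Here's the traffic math as I understand it (hypothetical numbers,
assuming the worst case where the whole 300GB index changes each
cycle):

```python
# Rough NFS traffic per replication cycle.  Assumptions: six slaves,
# and a worst-case 300 GB of changed index data shipped each cycle.
DELTA_GB = 300
SLAVES = 6

# Pull-based copying: each slave reads the new files off the master's
# volume and writes its own copy back to the Netapp, so the filer sees
# six reads plus six writes of the same bytes.
pull_traffic_gb = DELTA_GB * SLAVES * 2     # 3600 GB through the filer

# Filer-side snapshot/SnapMirror fan-out: the Netapp clones blocks
# internally, so the network sees roughly one write of the delta.
snap_traffic_gb = DELTA_GB                  # 300 GB

print(pull_traffic_gb, snap_traffic_gb)
```

That's an order of magnitude difference in I/O hitting the filer for
the same logical result.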
Now hopefully the slaves aren't re-writing all 300GB each cycle, but
the write numbers you show are simply huge!  You're seeing 10x the
writes compared to reads, which implies these slaves aren't set up
right.  They should be read-mostly!
Does the index really need to be updated every three minutes? That's
a pretty darn short time.
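For what it's worth, if you're on Solr's built-in HTTP replication
(rather than rsync scripts -- I'm guessing here), the three-minute
cycle would be the slave-side pollInterval in solrconfig.xml; the
masterUrl and core name below are placeholders:

```xml
<!-- Slave side of solrconfig.xml; masterUrl/core are hypothetical. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core/replication</str>
    <!-- HH:mm:ss -- the three-minute cycle described above -->
    <str name="pollInterval">00:03:00</str>
  </lst>
</requestHandler>
```

Stretching that interval to even ten or fifteen minutes would cut the
write load proportionally, if the application can tolerate staler
results.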
And is there other load on the 8040 cluster as well?
What's the load on the Netapp when none of the Solr nodes are writing
at all?  Are you still getting hit by lots of writes from other
clients then?  If so... you need more spindles.  And how full is the
aggregate?  And how busy/full are the other volumes?