Hi,
Anybody have insights on this article? Does our ONTAP "Lost Write Protection" address this issue?
Regards, Jenson
A bit of a flaw with SATA disk drives
June 03, 2008 By Jerome Wendt Network World Asia
High-capacity serial ATA (SATA) disk drives are now a mainstay in many storage systems and make it feasible for almost any company to obtain a storage system with terabytes of capacity at a reasonable cost. Yet these systems reveal a specific, known deficiency of SATA disk drives that demand companies exercise caution as to what environments they deploy these systems into.
A minor flaw with SATA disk drives that high capacity storage systems expose is their bit error rate. Bit errors occur infrequently - about once for every 100 trillion bits. However RAID technology, which is normally used by storage systems to protect against data loss, does not detect if a specific bit on a SATA drive becomes unreadable.
While this is normally not a problem on smaller systems, as storage systems add more capacity, the issue becomes more acute. On systems with more than 10TB of capacity the probability of a specific bit of data becoming unreadable is a distinct possibility. On systems with over 100TB, it becomes almost a certainty.
So the question becomes: Does losing access to one bit of data really matter? Often, it doesn't unless one stores deduplicated data on these systems which is now the fastest growing trend in data storage. When data is deduplicated, the storage system's need to read every bit of data becomes paramount. The inability to access even a bit of data can result in multiple files becoming unreadable since they all may depend on a specific bit of data to complete their reconstruction.
High capacity SATA-based storage systems are the answer to many companies' archiving and backup problems. But SATA bits can bite and using SATA drives to store large amounts of deduplicated data is not always the match made in heaven that vendors make them out to be.
Jerome Wendt is the president and lead analyst at DCIG Inc. You may read his blogs at http://www.dciginc.com/ .
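For a sense of scale, the article's figure of one error per 100 trillion bits can be turned into rough odds for reading a given amount of data once. This is only a back-of-the-envelope sketch: it assumes errors are independent, uses decimal terabytes, and the function names are made up for illustration.

```python
import math

# The article's figure: roughly one unreadable bit per 100 trillion bits read.
BITS_PER_ERROR = 1e14


def expected_errors(terabytes_read, bits_per_error=BITS_PER_ERROR):
    """Expected number of bit errors when the given amount of data is read once."""
    bits_read = terabytes_read * 1e12 * 8  # decimal terabytes -> bits
    return bits_read / bits_per_error


def prob_at_least_one_error(terabytes_read, bits_per_error=BITS_PER_ERROR):
    """Chance of at least one error, assuming independent errors (Poisson approximation)."""
    return -math.expm1(-expected_errors(terabytes_read, bits_per_error))


for tb in (1, 10, 100):
    print(f"{tb:>4} TB read: expected errors = {expected_errors(tb):.2f}, "
          f"P(at least one) = {prob_at_least_one_error(tb):.4f}")
```

Under these assumptions, a full read of 10 TB has roughly a 55% chance of hitting at least one error and a full read of 100 TB more than 99.9%, which lines up with the article's "distinct possibility" and "almost a certainty". It says nothing about whether the storage system can recover from the error.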
It does. This is old news in the industry for which an effective prescription would be Data ONTAP and current firmware ;-)
Stetson M. Webster Onsite Professional Services Engineer PS - North Amer. - East
NetApp 919.250.0052 Mobile Stetson.Webster@netapp.com http://www.netapp.com/
________________________________
From: Chong, Jenson Sent: Thursday, June 05, 2008 6:54 AM To: toasters@mathworks.com Subject: Flaw with SATA disks - not suitable for deduplication environment
Hi,
Anybody have insights on this article? Does our ONTAP "Lost Write Protection" address this issue?
Regards, Jenson
I disagree with the statement that "RAID technology [...] does not detect if a specific bit on a SATA drive becomes unreadable." If a bit is unreadable or reads back as the wrong value, RAID-5 (or NetApp's RAID-DP) can detect and fix the error when the data is read back.
(Note that if the data is never read back, then it's immaterial whether it's correct.)
Deduplication does not increase this risk. In fact, deduplication means that the duplicated data is read back more often, which should mean that any errors that occur would be detected sooner.
On Thu, Jun 05, 2008 at 08:16:26AM -0400, David Lee Lambert wrote:
I disagree with the statement that "RAID technology [...] does not detect if a specific bit on a SATA drive becomes unreadable." If a bit is unreadable or reads back as the wrong value, RAID-5 (or NetApp's RAID-DP) can detect and fix the error when the data is read back.
*bzzt* nope. RAID by itself does not "detect" bad data. It can correct bad data when it's detected (by other means, usually the hardware driver), but it doesn't detect it.
At least, that's not how it's implemented in any of the RAID systems that I'm aware of. To _detect_ bad data with RAID, you'd have to read an entire stripe and verify that the parity is correct. If it isn't, then AT LEAST ONE of the blocks is in error, but it's impossible to determine which one without additional information.
RAID-DP might distill that info from the diagonal parity (if it's a single-sector error), but you can hardly expect your super fast networked storage hardware device to go and solve crossword puzzles for you every time you request a block of data.
RAID works because drives don't just flip bits, they fail. Or the CRC on the drive block fails. Either way, there's some indication that something is amiss on a certain drive. RAID then offers the ability to reconstruct the failed data from the other drives, using the parity.
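The distinction between detecting and correcting can be sketched with plain XOR parity. This is a toy model with single parity and tiny in-memory blocks, not how any particular RAID implementation is written; all names here are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity):
    """True if the stored parity matches the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild_block(data_blocks, parity, failed_index):
    """Recompute one block from the others, given that we KNOW which one failed."""
    survivors = [b for i, b in enumerate(data_blocks) if i != failed_index]
    return xor_blocks(survivors + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Flip the same bit in block 0 and, separately, in block 1.
flip_in_0 = [b"AAA@", data[1], data[2]]
flip_in_1 = [data[0], b"BBBC", data[2]]

# Both stripes fail the parity check, and the mismatch looks identical,
# so parity alone cannot say which block is the bad one.
print(stripe_is_consistent(flip_in_0, parity), stripe_is_consistent(flip_in_1, parity))
print(xor_blocks(flip_in_0 + [parity]) == xor_blocks(flip_in_1 + [parity]))

# Once something else (a drive error, a CRC failure) points at block 1,
# the parity is enough to get the original contents back.
print(rebuild_block(flip_in_1, parity, failed_index=1))
```

The rebuild only works because the failed index is supplied from outside, which is the role the drive's own error reporting plays in the description above.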
(Note that if the data is never read back, then it's immaterial whether it's correct.)
Deduplication does not increase this risk. In fact, deduplication means that the duplicated data is read back more often, which should mean that any errors that occur would be detected sooner.
--
David L. Lambert Software Developer, Precision Motor Transport Group, LLC Work phone 517-349-3011 x215 Cell phone 586-873-8813
Folks, the original article refers to "bit errors" on SATA disk drives creating "problems". The type of problem this creates is typically a media error. Media errors are like death and taxes: you know they're inevitable, it's only a question of when.
Five years ago, NetApp realized that areal density was on the rise and that simply relying on disk access and weekly RAID scrubs to find and fix media errors was insufficient. We actually had a spike in double disk failures caused by media-related failures. Ironically, this was happening on the native FC disk subsystems, due to media substrate defects, rather than in SATA drive environments, which were just not on the scene yet.
With the 6.4.2 release of Data ONTAP, we created a new background media scan feature. This process runs in the background and ensures that media errors are detected and fixed by RAID before they escalate into a situation where double disk failures could occur.
Over the releases that have followed, this core piece of Data ONTAP has been modified and updated to keep pace with larger-capacity disk introductions. Background Media Scan, together with the advent of RAID-DP and on-disk checksums, effectively provides industry-leading data integrity in this area.
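The general idea of a background scan (find media errors while the rest of the stripe is still healthy, and repair them before a second failure lands in the same stripe) can be sketched roughly as follows. This is not Data ONTAP code: it assumes single XOR parity, models a sector the drive reports as unreadable as `None`, and every name in it is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(stripes):
    """Walk every stripe and repair media errors in place.

    Each stripe is a list holding its data blocks plus one parity block;
    None stands for a block the drive reported as unreadable.
    Returns (repaired, unrecoverable) stripe counts.
    """
    repaired = unrecoverable = 0
    for stripe in stripes:
        bad = [i for i, block in enumerate(stripe) if block is None]
        if len(bad) == 1:
            # One missing member: XOR of all the others gives it back.
            stripe[bad[0]] = xor_blocks([b for b in stripe if b is not None])
            repaired += 1
        elif len(bad) > 1:
            # Two errors in one stripe are beyond what single parity can fix.
            unrecoverable += 1
    return repaired, unrecoverable


a, b = b"\x01\x02", b"\x10\x20"
p = xor_blocks([a, b])
stripes = [[a, b, p], [None, b, p], [None, None, p]]
print(scrub(stripes))
print(stripes[1][0] == a)
```

The third stripe shows why finding errors early matters: a media error that sits unnoticed until a second one hits the same stripe can no longer be repaired with single parity.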
Hope this helps,
- Doug
Doug Coatney Senior Software Engineer Storage Systems Team
NetApp 408.822.3708 Direct 408.822.4579 Fax dougc@netapp.com www.netapp.com
-----Original Message----- From: Jan-Pieter Cornet [mailto:johnpc@xs4all.nl] Sent: Thursday, June 05, 2008 5:59 AM To: David Lee Lambert Cc: toasters@mathworks.com Subject: Re: Flaw with SATA disks - not suitable for deduplication environment
-- Jan-Pieter Cornet johnpc@xs4all.nl !! Disclamer: The addressee of this email is not the intended recipient. !! !! This is only a test of the echelon and data retention systems. Please !! !! archive this message indefinitely to allow verification of the logs. !!
On Thu, Jun 05, 2008 at 06:54:15PM +0800, Chong, Jenson wrote:
Hi,
Anybody have insights on this article? Does our ONTAP "Lost Write Protection" address this issue?
I don't have any particular insights, but I believe that you're reasonably safe with RAID-DP and block-level checksums (or is that ECC?).
NetApp formats drives with 520 bytes per sector, specifically to add 8 extra bytes of checksum/ECC code per sector. So ONTAP knows when a block is bad; single bit flips from faulty SATA drives can't fool it. Also, the weekly scrub should highlight those faults, so they don't pile up in unused corners of your filesystem.
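As a rough illustration of why a per-block checksum helps: it tells the reader which block is bad, and parity can then supply the correct contents. The sketch below uses `zlib.crc32` and single XOR parity purely as stand-ins; it is not ONTAP's actual checksum format or RAID-DP, and the function names are invented.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Return (blocks, checksums, parity) as they would be stored."""
    blocks = list(data_blocks)
    checksums = [zlib.crc32(b) for b in blocks]
    return blocks, checksums, xor_blocks(blocks)


def read_block(blocks, checksums, parity, index):
    """Return block `index`, repairing it from parity if its checksum does not match."""
    block = blocks[index]
    if zlib.crc32(block) == checksums[index]:
        return block
    # The checksum mismatch identifies THIS block as bad; rebuild it from the rest.
    survivors = [b for i, b in enumerate(blocks) if i != index]
    repaired = xor_blocks(survivors + [parity])
    if zlib.crc32(repaired) != checksums[index]:
        raise IOError("block could not be repaired from single parity")
    blocks[index] = repaired
    return repaired


blocks, checksums, parity = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
blocks[2] = b"CCCB"  # simulate a silent bit flip on one drive
print(read_block(blocks, checksums, parity, 2))
```

The read path notices the flipped bit without any help from the drive, which is the property the extra checksum bytes are there to provide.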
RAID-DP will make sure that during reconstruction of a single failed drive, single bit errors on the other drives have a backup on the second parity.
However, having said that... in the roughly 5 years that we have deployed SATA drives with NetApp, we've had one case where data could possibly have been lost (so we had to re-initialize the snapmirror; it was only used as backup), and another where data was lost that was not currently in use by the filesystem. This mainly happened when we had simultaneous disk failures (3 or more drives in the same machine, max 2 in the same RAID group).
We've never seen 2 simultaneous drive failures (within the rebuild time) with FC drives in the 10+ years we've used those.
So, to recap: there is some truth in this article, but with ONTAP it's certainly not as bad as it sounds. And even normal drives have CRCs appended to sectors, so I cannot imagine bits just flipping without the drivers giving a warning, even on other systems.