Brian Tao asks:
BTW, could someone at NetApp tell us what exactly is done during their pre-ship burn-in testing?
I asked the folks in manufacturing what exactly we do, and here's what I learned. The actual burn-in testing for each system consists of three phases:
(1) ICT (In-Circuit Test)
Individual boards are tested in isolation -- before systems are assembled -- using special ICT test harnesses.
My understanding is that the ICT harness has special pins that directly touch traces on the board. This lets ICT test individual components to identify failures at a very low level. (This helps us to trace problems back to their root cause. Maybe a given lot of capacitors from a particular vendor is out of spec.)
(2) System Functional Testing (for 1 day)
Next, the systems are assembled per the customer's order and then tested at the system level.
First they go through a 1-day functional test that checks CPU, memory subsystem, I/O, NVRAM, and the storage subsystem. This testing includes our standard system diagnostics.
(3) Stress Testing (for 2 days)
After the functional testing, systems go into 2 days of stress testing. Originally we generated load using UNIX clients, but now we use two filers in a back-to-back configuration to test each other. They each generate a load simulating 40 clients.
All accessories (stand-alone drives, memory, NICs, etc.) are put into a filer and run through this same test process prior to shipment.
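
Just to make the "simulating 40 clients" idea concrete, here is a minimal sketch of the sort of load generator you could point at a mounted filer: a bunch of worker threads, each writing and reading files in a loop. The mount point, client count, block size, and run time are all illustrative assumptions on my part; this is not the actual tool our filers use to drive each other.

    #!/usr/bin/env python
    # Rough sketch of a multi-client load generator.  Everything here
    # (mount point, sizes, durations) is an illustrative assumption.

    import os
    import threading
    import time

    MOUNT_POINT = "/mnt/testfiler"   # hypothetical mount of the unit under test
    NUM_CLIENTS = 40                 # matches the "40 simulated clients" figure
    RUN_SECONDS = 60                 # the real stress test runs for two days
    BLOCK = b"x" * 8192              # 8 KB write blocks

    def client(client_id):
        """One simulated client: write a file, read it back, repeat."""
        path = os.path.join(MOUNT_POINT, "client%02d.dat" % client_id)
        deadline = time.time() + RUN_SECONDS
        while time.time() < deadline:
            with open(path, "wb") as f:
                for _ in range(128):          # ~1 MB written per pass
                    f.write(BLOCK)
            with open(path, "rb") as f:
                while f.read(65536):          # read it all back
                    pass

    if __name__ == "__main__":
        threads = [threading.Thread(target=client, args=(i,))
                   for i in range(NUM_CLIENTS)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("load run complete")

In the real burn-in, of course, the load runs for the full two days rather than a minute.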
It's important to distinguish between pre-shipment burn-in and the Quality Assurance (QA) that we do as part of each new software or new hardware release.
The goal of QA is to ensure that the DESIGN is correct. The goal of burn-in is to ensure that the hardware is assembled correctly and that all components work. QA tends to focus on software coverage, while burn-in tends to focus on hardware component coverage.
So it is in QA that we try to think up nasty things to do to filers, much as you have been doing with your spare systems.
Our QA setup is actually very cool. We have a system called ANT (Automated Nightly Testing) that automatically builds our software every night, downloads it to a filer, and runs it through a series of functional tests. This gives us very quick feedback during development if something goes wrong. And of course, we're always trying to extend the automated tests, as we find clever new ways of breaking filers.
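
If it helps to picture it, the nightly loop is conceptually something like the sketch below: build, install onto a lab filer, run through the test list, report. The command names, paths, hostnames, and test names are made up for illustration; this is not the real ANT code.

    #!/usr/bin/env python
    # Conceptual sketch of a nightly build-and-test loop in the spirit
    # of ANT.  The commands, paths, and test names are illustrative
    # assumptions, not the actual implementation.

    import datetime
    import subprocess
    import sys

    FILER = "testfiler1"                       # hypothetical lab filer
    TESTS = ["nfs_basic", "cifs_basic", "snapshot_create", "raid_scrub"]

    def run(cmd):
        """Run a shell command, returning True on success."""
        print("%s: running: %s" % (datetime.datetime.now(), cmd))
        return subprocess.call(cmd, shell=True) == 0

    def nightly():
        # 1. Build tonight's software from the latest sources.
        if not run("make -C /build/ontap nightly"):
            return "BUILD FAILED"

        # 2. Download the freshly built image onto the lab filer.
        if not run("install_image --filer %s /build/ontap/image" % FILER):
            return "INSTALL FAILED"

        # 3. Run the automated functional tests against it.
        failures = [t for t in TESTS
                    if not run("run_test --filer %s %s" % (FILER, t))]
        return "PASS" if not failures else "FAILED: " + ", ".join(failures)

    if __name__ == "__main__":
        result = nightly()
        print(result)
        sys.exit(0 if result == "PASS" else 1)

The important part is that the whole cycle happens unattended every night, so a bad change shows up the next morning instead of weeks later.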
In addition, there are lots of manual tests, and longer-term stress tests that each release must go through before being released. Some of the manual testing includes things that are very difficult to automate, like pulling the power and screwing around with drives. Other tests are manual because we haven't yet gotten around to automating them.
Dave