That doesn't fully answer the question.
How does one deal with this problem *before* patching 5.3.6R2?
Does one have to nuke the filer to get quota functional again?
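For reference, the shape of the /etc/quotas entries involved (the qtree
paths below are illustrative, not an actual configuration):

    # user quota keyed by a username -- the name must resolve to a UID
    # via /etc/passwd or NIS; this is the entry type that triggered the crash
    perry           user   50M    -

    # qtree quotas, which need no UID resolution
    /vol/vol0/home  tree   500M   -
    /vol/vol0/proj  tree   1G     -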
-----Original Message-----
From: Heller, Jeffrey [mailto:Jeffrey.Heller@netapp.com]
Sent: Thursday, August 10, 2000 6:14 PM
To: Krishnan, Prabhakar; Jiang, Perry
Cc: toasters
Subject: RE: user quota crashes filer
This problem is fixed in 5.3.6R2, which is on NOW as of today.
-----Original Message-----
From: Prabhakar Krishnan [mailto:kpkar@netapp.com]
Sent: Thursday, August 10, 2000 1:54 PM
To: Jiang, Perry
Cc: toasters
Subject: Re: user quota crashes filer
This was a bug.
- A patch is available; contact your SE/CS.
- It is fixed in the next release.
"Jiang, Perry" wrote:
> Hi,
> I have a 740 filer with Data ONTAP 5.3.6
> I put a quota entry for username "perry" into /vol/vol0/etc/quotas
> perry user 50M -
>
> But there is no entry for "perry" in /vol/vol0/etc/passwd, so there is no
> UID for perry (no NIS running, of course).
> When I run "quota on vol0", the filer crashes.
>
> I have to boot up the filer from floppy disk, remove the entry of "perry" in
> /vol/vol0/etc/quotas file, then reboot the filer.
>
> After the filer is up, everything is OK. However, I can't enable quotas
> anymore. Every time I run "quota on vol0", the filer crashes.
>
> The only thing I have in /vol/vol0/etc/quotas now is a couple of entries for
> qtree quotas.
>
> Can anybody explain why the filer still crashes after I remove the "perry"
> entry?
> How can I make the quota work again?
> Do I have to rebuild the filer?
>
> Thanks
>
> Perry Jiang
>1. Does a hot backup (as described on
>http://www.netapp.com/tech_library/3049.html) present any
>restoration problems? Has anyone done a restore from a hot
>backup?
What about the handling of the control files?
In other hot-backup schemes I have seen, there is
a lot of guidance on how and when to run the
"ALTER DATABASE BACKUP CONTROLFILE ..." command,
and also on "ALTER SYSTEM SWITCH LOGFILE" or
"ARCHIVE LOG NEXT".
None of these is mentioned in the above Tech Library article.
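For concreteness, the sort of sequence I have seen recommended elsewhere
(the backup filename is a placeholder, and this is a sketch, not a tested
recipe):

    ALTER SYSTEM SWITCH LOGFILE;
    -- binary copy of the control file to a backup location
    ALTER DATABASE BACKUP CONTROLFILE TO '/backup/control.bkp' REUSE;
    -- or a human-readable CREATE CONTROLFILE script in the trace directory
    ALTER DATABASE BACKUP CONTROLFILE TO TRACE;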
/Gynt
______________________________________________
FREE Personalized Email at Mail.com
Sign up at http://www.mail.com/?sr=signup
I have been hearing this question from a lot of technically astute clients
lately. While I don't have all the answers, I can suggest at least one reason
why the protocol overhead inherent in NFS and CIFS is not as big a disadvantage
in practice as it appears to be in theory. The main point to consider is that
the time it takes to get the head on a disk drive positioned over the data to be
read is much, much greater than the time it takes to transfer the data.
Let's take, for example, the Seagate Barracuda 18 GB drive. The average seek time
is ~8 msec, and the average rotational latency is ~4 msec, so it will take on
average 12 msec to position the head over the data of interest and start the
operation. If the operation was to read in 4KB at a media-to-buffer rate of 20
MBytes/sec, it would be completed in ~0.2 msec, 2 orders of magnitude faster
than the time it took to get into position. If your average system has a lot of
processes, each doing I/O to different areas of the disk, you will spend on
average 60x more time doing seeks than transferring data. Reducing this overhead
clearly makes a much bigger difference to performance than increasing the
bandwidth of the path from the host to the disk.
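To put those numbers together (an illustrative back-of-envelope
calculation, not part of the original post):

    awk 'BEGIN {
        io_ms = 8 + 4 + 0.2     # avg seek + rotational latency + 4 KB transfer
        iops  = 1000 / io_ms    # ~82 random 4 KB reads per second
        printf "random 4 KB reads: ~%.0f KB/s vs. ~20480 KB/s sequential\n",
               iops * 4
    }'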
While the average times give you the feel, the edge times are very interesting
as well. The seek time for the same drive varies from ~1.5 msec to move a single
track, to ~15 msec to move across the platter. Thus, there is an order of
magnitude improvement possible if the filesystem was optimized to locate related
data together. Our file system, WAFL, does just that. Other filesystems which
put their metadata in a particular area of the disk are guaranteed to cause a
lot of mechanical seeks under normal load.
If you had just one single threaded application writing and reading large files
to dedicated local storage, then the seek latency described above would not be a
big issue. Most databases are not like that. In this environment the appropriate
emphasis is to improve performance by reducing the effects of the mechanical
latency involved in head seeks and rotational delays. That's what filers do, and
that is one reason why they perform so well in spite of the overhead of NFS or
CIFS.
Blaine Agnew
> -----Original Message-----
> From: Pesce, Nicholas (FUSA) [mailto:Nicholaspesce@firstusa.com]
> Sent: Wednesday, August 09, 2000 11:34 AM
> To: 'rdobbins@netmore.net'; keith@netapp.com
> Cc: toasters@mathworks.com
> Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp
> ques tions)
>
>
> I'm sorry Keith. But I've heard this argument before. NFS
> versus direct
> attach storage? I'm going to have to vote for a good direct attach
> solution. Why?
>
> NFS and CIFS have huge overhead.
>
>
> Also, I took a look at the link you provided. The information was
> incomplete at best.
>
> snip:
>
> The following results were obtained with Bluecurve's Dynameasure SQL
> Professional product. We were running MS SQL Server 6.5 on a
> Dell 4200 with
> two 200 Mhz Pentium Pro CPUs and 512 Mb of RAM. The local
> disk was on a Dell
> PowerEdge RAID adapter with 6 9Gb UltraSCSI disks. The filer
> was an F630
> with 512 Mb of RAM, and 14 9Gb UltraSCSI disks. The Dell was
> running RAID 0.
> The Filer, needless to say, was running RAID 4.
>
> snip
>
> What were the speeds of the drives? The CPU information here
> is important
> but not compared to drive information. Also I'd like to point out that
> writing to 3 disks (you are using raid 0 so the 6 disks are
> now effectively
> 3) spindles is most likely going to be slower than writing to
> 14 spindles.
>
> Following the logic of this link. (ignoring obvious disk information
> questions) Netapp would like me to believe that a machine running a
> non-parity raid platform (mirroring) and having a low
> protocol overhead
> (SCSI) is slower than a Network Appliance running a parity
> system (raid 4)
> and running NFS or CIFS (I'm assuming it was CIFS because it
> was an MS SQL
> Server, but I could be incorrect).
>
> I think I would like to see a test where the disk sizes and
> numbers were
> similar; I sincerely doubt the Netapp would do as well.
>
> I believe that Network Appliances are solid machines that
> perform their
> functions effectively, but they are NOT a replacement for
> direct attach
> where speed is essential.
> -----Original Message-----
> From: rdobbins@netmore.net [mailto:rdobbins@netmore.net]
> Sent: Tuesday, August 08, 2000 8:47 PM
> To: keith@netapp.com
> Cc: toasters@mathworks.com
> Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp
> ques tions)
>
>
>
> No one throughout recorded history, apart from assorted nuts, has ever
> believed that the Earth was flat; Eratosthenes calculated both its
> circumference to within 1% of the true value and its tilt
> relative to
> the plane of the ecliptic no later than 200 B.C. The only
> real problem was
> that of terra incognita - i.e., nobody knew exactly where the
> landmasses
> were located until a) someone sailed there, and b) accurate
> chronometers
> were developed by John Harrison in the 18th Century, enabling
> navigators to
> calculate longitude with a high degree of accuracy, and then
> relay that
> information to cartographers.
>
> Advances in spherical geometry a la Mercator assisted the
> latter group, of
> course.
>
> Historical canards aside, let me restate that I'm very
> interested in hearing
> about production experience with NetApp filers and Oracle
> over NFS. I've a
> 740 with a Gigabit Ethernet interface, plugged into a
> Catalyst 5509 doing
> MPLS, and so would be willing to entertain the notion if
> someone can give me
> anything beyond benchmarks.
>
> I know all about snapshots and all that, by the bye. It's
> -performance-
> which is the question.
>
> Thanks for the pointer to the link, I'll be sure and check it out.
>
> -----------------------------------------------------------
> Roland Dobbins <rdobbins@netmore.net> // 818.535.5024 voice
>
>
>
> -----Original Message-----
> From: Keith Brown [mailto:keith@netapp.com]
> Sent: Tuesday, August 08, 2000 5:48 PM
> To: rdobbins@netmore.net; Perry.Jiang@bmo.com
> Cc: toasters@mathworks.com
> Subject: Filer storage for databases, seriously? (Was: Re: NetApp
> questions)
>
>
>
> > As to running Oracle with the data and logfiles on a filer via NFS,
> > I should think that even with a NetApp using Gigabit Ethernet,
> > you'd take a -huge- performance hit as compared to a local
> > disk array.
>
> Beware conventional wisdom, Roland. People used to think the
> Earth was flat
> too. :-)
>
> While I wouldn't be so bold as to *guarantee* performance boosts in
> utilizing the filer storage approach for every database
> application under
> the Sun, the simple fact is that filers contain a myriad of
> features that
> are very attractive to the database market, and NetApp now draws a
> significant and growing portion of its revenues from this space.
>
> Snapshots & SnapRestore greatly simplify and enhance database
> backup and
> restore environments. The WAFL RAID design puts failure
> resiliency into the disk
> subsystem without forcing you take the performance hits
> inherent in general
> purpose RAID-5 designs or going to disk-doubling RAID-1 approaches.
> SnapMirror gives you database replication to offsite
> locations for disaster
> recovery purposes. WAFL's ready expandability lets you make
> room for growing
> databases without disrupting their operation. The list goes on...
>
> Oh.. and yes... performance very often gets a shot in the arm too!
>
> > I've no empirical data
> > to back this up, mind you;
>
> Don't worry. Nobody ever does, not even our direct-attach
> competitors, not
> that they can be too harshly criticized. Meaningful
> performance comparisons
> are tricky to architect, usually have a short shelf life, and
> customers have
> an understandable tendency not to believe vendor funded
> benchmarks anyway
> (due to the fact that the vendor performing and/or funding
> the benchmark
> almost always wins!).
>
> Nevertheless, we did publish a relatively innocuous one some
> time ago, which
> can be viewed here:
>
> http://www.netapp.com/tech_library/3044.html
> > it's just that there's so much overhead
> > associated with NFS even on an optimized platform
> > like the NetApp filer, I can't see it as being a win.
> There are certainly some "swings-and-roundabouts"-type things to consider
> when looking at the two approaches, and some people do conclude that there
> is more overhead in the network attach approach, dismissing it offhand.
> However, as far as performance goes, all the theory in the world is no
> substitute for the practical experience that could be gained by trying a
> solution for the application you have and actually bearing witness to how
> well it works and what it performs like.
> > If there's anyone out there with Oracle experience on filers
> > via NFS, either pro or con, I'd love to hear from you.
> I'm hoping there will be some on this list. As I mentioned, beware
> conventional wisdom. America might have been discovered hundreds of years
> before Columbus sailed over the horizon, if only all his predecessors hadn't
> been terrified of falling off the edge of the world!
> Keith
Due to instability issues with the recent release of the NOW site, on August
5th, we are going to "roll back" the new features and return to the previous
version of the site. We apologize for the inconvenience but the stability of the
site is a priority over new features.
We will notify you via this mechanism when we have determined the cause of the
instability and give you fair notice of our intent to move to the new release.
The new features that will be unavailable until further notice are the
following:
o Newly re-designed 'Order Status Tool'
o Purchase Web-Based training directly on-line!
o New NOW homepage
o Bugs On-Line Enhancements
If you have any comments or questions, please contact me at the address below.
----------------------------------------------------------------------
Barry Davis - Director of eService
Network Appliance, Inc.
Email: barry.davis@netapp.com
Get answers NOW! - NetApp On the Web - http://now.netapp.com
I'm just a NetApp engineer, so please take what I've got
to say on this subject with the appropriate amount of
salt.
The problem with comparing storage metaphors by analytic
methods is that storage systems are so complex. The disk
drive hardware and firmware, fabric protocols, networking
stacks, OS firmware, disk drivers, block access protocols,
file access protocols and file sharing protocols all interact
in subtle and tricky ways. Discussions about backplane or
fabric speed, I/O bandwidths, seek times and so on never
seem able to take these interactions into account.
Analytic discussions about the systems therefore fail in
many of the same ways that benchmarks fail. They retain
their allure for many people in spite of this.
There's just no substitute, though, for measurement of actual
end-to-end performance in real world situations. Not a
particularly fun realization, because it's expensive and a lot
of work to set up the real world situations and take the
measurements.
It's worth it though. Almost everyone I know who's gone
to the trouble agrees that the results are interesting enough--
and different enough from what even sophisticated analysis
expects--to make it worth the work.
Alan
===============================================================
Alan G. Yoder, Ph.D.                          agy@netapp.com
Network Appliance, Inc.
Sunnyvale, CA 408-822-6919
===============================================================
> -----Original Message-----
> From: Brian Tao [mailto:taob@risc.org]
> Sent: Thursday, August 10, 2000 11:33 AM
> To: Pesce, Nicholas (FUSA)
> Cc: keith@netapp.com; toasters@mathworks.com
> Subject: RE: Filer storage for databases, seriously? (Was: Re: NetApp
> ques tions)
>
>
> On Wed, 9 Aug 2000, Pesce, Nicholas (FUSA) wrote:
> >
> > I'm sorry Keith. But I've heard this argument before. NFS versus
> > Direct attach storage? I'm going to have to vote for a good direct
> > attach solution. Why?
> >
> > NFS and CIFS have huge overhead.
>
> Sure, as someone else mentioned, all other things being equal,
> direct-attach storage is faster than network-attached storage. The
> logic (which I've argued before as well) is simple: a NAS box talks
> to its drives as DAS. Thus, the DAS must necessarily be "faster"
> (yeah, a vague term). For example, setting caching aside, it is not
> possible for a filer to pump 50MB/s of data to an NFS client if it can
> only read 30MB/s off its own drives.
>
> However, all things are not equal, at least in benchmarks that are
> possible in the real world. Yes, NFS and CIFS add overhead compared
> to SCSI-over-FibreChannel or what have you. However, that is offset
> by an optimized OS (Data ONTAP), by an efficient filesystem (WAFL), by
> read and write caching, by an optimized TCP/IP stack, etc. If you
> could port all that and run it on DAS, then you might have a fair
> comparison.
>
> > I think I would like to see a test where the Disk sizes and number
> > were similar, I sincerely doubt the Netapp would do as well.
>
> Depends on the application, of course, but I've been surprised
> many times in the past when I thought for sure the Netapp would not be
> able to keep up. I have a 4x450-MHz E420R with a VxVM RAID-0 device,
> spread over 16 50GB 7200 rpm drives on two U2SCSI buses. The server
> also has a Gigabit Ethernet connection to an F740 with one shelf of
> 36GB 10000 rpm drives (5 data, 1 parity, 1 spare). The local
> filesystem is vxfs, mounted with delaylog and the largest allowable
> log area.
>
> I ran a few filesystem replication and backup/restore tests (this
> is our central tape server). The local filesystem handily beat the
> Netapp doing large sequential reads and writes (120MB/sec vs.
> 22MB/sec)... no surprise there. File deletions were a little closer
> (~2500 unlinks/sec on vxfs, ~2000 unlinks/sec on the Netapp). In all
> other tests, the Netapp was as fast or faster (sometimes by a large
> margin) than local filesystem. The Netapp seems to especially shine
> when you have multiple processes reading and writing to all points on
> the filesystem. vxfs does not appear to handle it as gracefully with
> dozens or hundreds of concurrent access requests.
>
> I re-ran some of the same tests with a Veritas RAID-5 volume (to
> be fair to the Netapp), but I stopped after the first couple. There
> is no contest at that point. Veritas software RAID-5 is dog-slow (I
> think I saw bursts of 8MB/sec sequential writes). Turn on a Veritas
> snapshot, and writes to the snapped filesystem go even further into
> the toilet. The performance degradation is cumulative with the number
> of snapshots. There is no such penalty on the Netapp.
>
> One caveat I should mention, since it bit us in the past: file
> locking performance. We have one application that, when running on
> the same type of hardware as above (E420R with those drives), spews
> forth 150,000 syscalls per second, according to Solaris' "vmstat".
> 80% of those calls are fcntl() locks/unlocks to various database files
> on disk. Poor programming practice aside, this application runs very
> slowly over NFS. It simply cannot match in-kernel file locking when
> you're dealing with a local filesystem. Besides that one exceptional
> application, we run Netapps for everything else (including Oracle).
> --
> Brian Tao (BT300, taob@risc.org)
> "Though this be madness, yet there is method in't"
>
Comments interspersed below:
-----Original Message-----
From: Aaron Sherman [mailto:ajs@ajs.com]
Sent: Wednesday, August 09, 2000 1:50 PM
To: toasters(a)mathworks.com
Subject: Snapshots and Oracle
I have a few snapshot questions for anyone else who's running Oracle:
1. Does a hot backup (as described on
http://www.netapp.com/tech_library/3049.html) present any
restoration problems? Has anyone done a restore from a hot
backup?
We (NetApp) have done lots of restores in testing to ensure that they work
as expected. However, I am unaware of a customer having had to do this
live up to this time. Generally, we consider this a good thing! However,
I would strongly suggest you test it in your environment before too long
so that you get a feel for how it works and differs from a more
"traditional" storage solution.
2. Hot backups are as close to "live" as you're supposed to
come, but it seems to me that a nice extra bit of paranoia
would be to take snapshots 2-3 times per day of the live
database. The image would be consistent, and Oracle would
come up off of it as it would from a crash (unlike, say a
tar of the data-files which would be junk). The only problem
would be that you would lose any half-completed
actions which were not correctly transactionalized. Can
anyone comment on this?
To be sure we're on the same page: a hot backup is where the updates have
"extra" info written to the redo logs while the datafiles are being
manipulated by the backup program. Theoretically enough information exists
in the redo logs to completely recover any actions occurring to the datafiles
during the backup. So we put the entire database
into backup mode, manually take a snapshot on the filer (typically via an
'rsh' from the Oracle server system), and then take the database out of
backup. This operation can certainly be repeated several times each day
with little adverse effect except for the growing number of disk blocks
consumed by each snapshot image. So if the rate of change of the
datafiles is moderate, this procedure works just fine. Of course, this
also requires correct management of the archived logs.
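As a minimal sketch of that sequence (run from the Oracle server; the
filer name, volume, snapshot name, and tablespace are placeholders, and
on older Oracle releases each tablespace must be put into backup mode
individually):

    sqlplus -s "/ as sysdba" <<'EOF'
    ALTER TABLESPACE users BEGIN BACKUP;
    EOF
    # take the snapshot on the filer while the tablespace is in backup mode
    rsh filer1 snap create vol0 hotbackup.0
    sqlplus -s "/ as sysdba" <<'EOF'
    ALTER TABLESPACE users END BACKUP;
    ALTER SYSTEM ARCHIVE LOG CURRENT;
    EOF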
Following this procedure, there are no half-completed actions that need
recovery. And given the short time this requires (usually 2-3 minutes)
it is not a bad thing to do several times each day.
However, if your thought was to just let a few snapshots occur using
the built-in (cron) mechanism on the filer, then your assumption is
correct that a handful of transactions will not have made it into
the datafiles. However, all logfile data will be correct since the
NVRAM is flushed prior to a snapshot being taken. In this way, while
the database is "inconsistent," it is in a state from which Oracle
should never have a problem "warm recovering".
3. Has anyone seen roughly how much storage an Oracle snapshot
takes? It seems to me that it would be larger than a
filesystem snapshot because more blocks are changing. Of
course, it depends on your read/write ratio, but is there a
general consensus?
Again, hard to say. A Snapshot per se does not take up space. It
locks down blocks currently in use and forces updates to these
same blocks to be written to new locations. So the additional
space occupied by a snapshot occurs after the snapshot is taken
and is directly related to the rate of modification to the database.
A lot of database customers have just left the default 20% setting
in place and not yet found a reason to change it.
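As an illustration of the mechanism (console commands as in Data ONTAP of
this vintage; the volume and snapshot names are made up):

    filer> snap create vol0 pre_batch     # lock down the blocks now in use
    ... run heavy updates against the database ...
    filer> snap list vol0                 # space held grows with the write rate
    filer> snap delete vol0 pre_batch     # release the blocks the snapshot held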
Actually, the space consumed is probably less than for a typical filesystem
snapshot, since lots of user applications don't modify blocks within a file;
they rewrite the whole file. Most editors, for instance, read a file into
memory, allow it to be edited, and then write the file back out from
the beginning. So even though we only modified a single word, the
entire file is generally written to new disk blocks. Oracle, on the
other hand, does modify files in-place and generally only modifies
all blocks during certain maintenance operations
required to restructure the internal elements.
Database logfiles, on the other hand, are getting completely modified
all the time. Of course, they generally are only appended to, so a
logfile in use across the snapshot will not occupy too much new space,
but once it is copied to the archive directory, the space does
not immediately become available. This is one reason why we recommend
segregating logfiles from datafiles onto their own volume; their
management and space allocation requirements are significantly different.
I've so far been very impressed with my filer. Oracle over NFS sounded
outright dumb to me at first, but damn if it isn't faster than local
disk.
Glad to hear you like it!

Bruce Clarke
Enterprise Solutions Development Manager
Network Appliance
Thanks.
--
Aaron Sherman
Systems Architect
HighWired.com -- The global high school community
http://www.highwired.com
300 North Beacon Street
Watertown, MA 02472
(617) 926-1850 x238
(617) 926-1861 fax
asherman(a)highwired-inc.com / ICQ#43677395 / Y!ajstrader
"We had some good machines, but they don't work no more."
-"Faded Flowers" / Shriekback