The WAFL file system stores its metadata in files, three of them:
- inode file
- block-map file
- inode map file
Is it possible to copy these files? I would like to gather precise statistics on this file system.
Thanks in advance,
Séb.
I'm not sure what your goal is, but wafl_susp -w might give you some of the answers you are looking for, or maybe wafl stats.
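For reference, a rough sketch of that console session (an assumption on my part: on most Data ONTAP 7-mode releases these commands sit behind advanced privilege, so the exact steps may differ on your version):

  priv set advanced   # assuming these are hidden at admin privilege on your release
  wafl_susp -w        # dump WAFL suspend/statistics counters
  wafl stats          # general WAFL statistics
  priv set admin      # drop back to normal privilege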
What are you looking for?
-Blake
Thanks, I'll check the manual. I need to query the file system instantly, like a SQL database.
I have 25 TB of data and many directories of 100,000 files; when I try "ls -al" it takes so long that I have time to go get a coffee :)
Just a hint:
'ls' sorts its output, so you won't see anything until the 'getdents' loop has finished.
You could try using 'find'; it should be a lot faster than ls (because it doesn't sort), but it will still take some time to complete.
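For example, from an NFS client (GNU find assumed, and /mnt/bigdir is just a placeholder path):

  # stream entries as they are read, with no sorting; count them on the fly
  find /mnt/bigdir -mindepth 1 -maxdepth 1 | wc -l

  # or dump size and name once, then grep/awk the flat listing afterwards
  find /mnt/bigdir -maxdepth 1 -type f -printf '%s %p\n' > /tmp/bigdir.listing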
Regards, Adrian
I have to deal with millions of objects in filesystems, and I highly recommend subdirectories. Look at your nfs_hist output: first do nfs_hist -z, then count to 30 and run nfs_hist again. It's a histogram of all NFS ops and how long they took, in millisecond buckets. I'd bet lookup is taking a very long time. When dealing with a large number of objects, sensible directory structures are key.
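A minimal sketch of that sequence on the filer console (the privilege level required for nfs_hist may vary by ONTAP release):

  nfs_hist -z    # zero the histogram counters
  # ... let the workload run for about 30 seconds ...
  nfs_hist       # per-op latency histogram in millisecond buckets

And if you can restructure the data, a simple fan-out by name hash keeps any one directory small. A hypothetical sketch run from an NFS client, assuming GNU md5sum is available:

  # move every file in the current directory into one of 256 two-hex-character buckets
  for f in *; do
    [ -f "$f" ] || continue                      # skip the bucket directories themselves
    d=$(printf '%s' "$f" | md5sum | cut -c1-2)
    mkdir -p "$d" && mv -- "$f" "$d/"
  done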
-Blake
Yes, but to be fair, this is a weakness in the WAFL filesystem. You cannot have everything, and WAFL has made a trade-off in the way it stores file metadata that makes it slow to handle large numbers of files in a directory.
I am not sure if NetApp is planning any enhancements in this area, or even what would be possible.
Anybody care to comment?
Regards, pdg
I'd argue that this is a general file system issue, not so much a WAFL issue. I don't think WAFL is particularly slow at this workload either; it does far better than most other NAS gear I've used for this workload. There are trickier things out there; BlueArc, for example, has a preloaded cache for metadata to help speed it along, but that's just fixing the problem by tossing it all in memory. If you compared file system operation to file system operation between NetApp and BlueArc, I'm sure you'd find similar performance issues for a directory with millions of objects.
But I do hope some of those WAFL guys can figure out a way to make handling lots of objects in a file system faster. It can be a huge pain.
-Blake
The problem itself is a general filesystem issue, I agree.
However, my understanding is that WAFL stores the metadata with the file, I assume as the first block. UFS stores the metadata in inodes, which are not stored with the file, although the system tries to keep inodes "close" to the file. Inodes reside together with other inodes, so a single read can pick up a lot of inodes.
So to me it seems possible that UFS may have an advantage when accessing metadata, because the number of individual block reads (and hence seeks etc.) could be a lot lower.
Obviously statistics come into play a lot here. That's why I am talking in generalities.
If anything I say above is incorrect, please correct me. I am not a FS developer.
Regards, pdg
A quote from http://www.csse.monash.edu.au/~carlo/SYSTEMS/Network-Appliance-0697.htm:
Additional features have been added to the WAFL design to enhance performance. The most important of these is a clever hashing mechanism to speed up searches for files in large directories. This facility, termed a directory hash, is in effect a directory level name cache which is designed to cache every single file in the directory, rather than a recently accessed subset as is the case in more conventional name caching models. Netapps published data suggest this provides a five-fold performance increase for a 30,000 file directory search.
The effect is that if you know the name of the file, the access is very fast.
Hi Peter,
From the classic TR 3001:
"Like Episode, WAFL uses files to store meta-data. The three most important WAFL meta-data files are the inode file (which contains all inodes), a free block bitmap file, and a free block count file. Keeping meta-data in files allows meta-data blocks to be written anywhere on disk. This is the origin of the name WAFL, which stands for Write Anywhere File Layout. WAFL has complete flexibility in its write allocation policies because no blocks are permanently assigned to fixed disk locations as they are in the Berkeley Fast File System (FFS)."
So data blocks and inodes are stored separately. This is why snapshots are so darn fast: ONTAP just copies those files over. This is an incredible innovation.
Techno-history buffs may also be interested in checking out the original paper describing WAFL that was presented at Usenix '94 at:
http://www.usenix.org/publications/library/proceedings/sf94/full_papers/hitz...
But I'm pretty sure that the reason ls on a directory with lots of objects takes a long time is that you run out of CPU on the single filer head. It's a generic FS issue which can't be solved unless you have a clustered NAS system that can use multiple heads to take up the extra load of sorting files and so on.
Full disclosure: I work for one of those clustered NAS companies :-)
Regards, Sandeep Cariapa
Peter D. Gray wrote:
Yes, but to be fair, this is a weakness in the wafl filesystem. You cannot have everything, and wafl has made a trade off
Actually, WAFL performs quite well with large directories, much better than the average filesystem. At least, that was the case in 2000 when we got our first filer.