Panther has Automatic File Defragmentation
Forum Regular
Join Date: Dec 2001
Location: Sacramento, Calif.
While we're talking about not-so-well-known or hidden new Panther features, let's not forget Automatic File Defragmentation.
In Panther, every time an application opens a file for reading, HFS+ checks whether the file is fragmented and no larger than 20MB. If so, it copies the file's contents to a contiguous region on the disk and frees the previously allocated blocks.
Q: What counts as a fragmented file?
As far as the auto-defrag code is concerned, this:
if (fp->ff_blocks &&
fp->ff_extents[7].blockCount != 0 &&
fp->ff_size <= (20 * 1024 * 1024))
which roughly means "files of 20MB or less that have data and a non-zero block count in the last (eighth) item of their list of extent structures." You'll have to chase all those structs through the code to find out whether that necessarily indicates fragmentation or is just a best guess.
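If you want to poke at that condition outside the kernel, here is a minimal user-space sketch. The two structs are stand-ins I made up with only the fields the quoted check touches (the real Darwin structures have more fields, and their exact types are an assumption here), and `should_defrag` is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * used by the quoted condition are modeled. */
struct extent {
    uint32_t startBlock;  /* first allocation block of the run */
    uint32_t blockCount;  /* number of contiguous blocks in the run */
};

struct fork_model {
    uint64_t ff_size;             /* logical size in bytes */
    uint32_t ff_blocks;           /* total allocated blocks */
    struct extent ff_extents[8];  /* the eight inline extent slots */
};

/* Mirrors the quoted test: the file has data, its eighth inline extent
 * slot is in use, and it is no larger than 20MB. */
bool should_defrag(const struct fork_model *fp)
{
    return fp->ff_blocks != 0 &&
           fp->ff_extents[7].blockCount != 0 &&
           fp->ff_size <= (20u * 1024u * 1024u);
}
```

Under this model, a 1MB file that only uses seven extent slots is left alone, while the same file with a non-zero eighth slot qualifies.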
Q: Is that code executed when the file is actually read, or when the file is just opened? I was thinking that it would be fairly straightforward to `find / -type f` to list all the files, and pipe that to a program (or script?) that would just call `open()` and then `close()` it back again. If it defrags on `open()`, then that would end up defragging all the eligible files, w/o the overhead of having to launch the proper program or deal with saving it and all that stuff.
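The open-then-close idea from the question could be sketched roughly like this. It uses only plain POSIX `open()` and `close()`; the function names are made up for illustration, and whether a bare open without a read is enough to trigger the relocation is exactly the open question, so treat this as an experiment rather than a defrag tool.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open one file read-only and close it again without reading any data.
 * Returns 0 on success, -1 if the open failed. */
int open_and_close(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Read newline-separated paths from `in` and open/close each one.
 * Returns the number of files that opened successfully. */
long open_and_close_all(FILE *in)
{
    char path[4096];
    long opened = 0;

    while (fgets(path, sizeof path, in) != NULL) {
        path[strcspn(path, "\n")] = '\0';  /* strip the trailing newline */
        if (path[0] != '\0' && open_and_close(path) == 0)
            opened++;
    }
    return opened;
}
```

A `main` that simply calls `open_and_close_all(stdin)` would let you feed it the output of `find / -type f`. Paths that contain a newline, or that are longer than the buffer, would be split incorrectly by this simple reader.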
As far as I can tell from reading the code: HFS+ references files using "extents". An extent is a data structure in the HFS catalog file (the file that keeps track of all the other files in the system). A given extent stores a start location and a length - essentially, it tracks a single contiguous chunk of bytes on disk. As a file becomes fragmented, it uses more and more extents to keep track of it (e.g. a 100-byte file might be tracked in 1 extent of 20 bytes, 3 extents of 10 bytes each, and 1 extent of 50 bytes). Up to 8 extents can be tracked directly in the catalog tree. If a file has more than 8 extents, the rest are tracked in another file - the Extents Overflow File. Access to a file that is tracked in both the Catalog and the Extents Overflow File is slow - both because it implies that the file is (relatively) highly fragmented, and because the Extents Overflow File probably isn't as "hot" as the Catalog file (not as likely to be cached). In general, access to a fragmented file is slow.
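To make the bookkeeping concrete, here is a small sketch of the extent idea as described above. The struct and function names are hypothetical, and the units are left abstract to match the byte-based example (on a real volume an extent counts allocation blocks).

```c
#include <stddef.h>
#include <stdint.h>

#define INLINE_EXTENTS 8  /* extents tracked directly in the catalog */

/* One contiguous run on disk: where it starts and how long it is. */
struct extent_model {
    uint32_t start;
    uint32_t length;
};

/* Total space described by a list of extents. */
uint64_t total_length(const struct extent_model *ext, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += ext[i].length;
    return total;
}

/* How many extents would spill into the Extents Overflow File. */
size_t overflow_extents(size_t count)
{
    return count > INLINE_EXTENTS ? count - INLINE_EXTENTS : 0;
}
```

The 100-byte example uses five extents (20 + 10 + 10 + 10 + 50), so everything fits in the catalog; a file with eleven extents would have three of them in the overflow file.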
What this code is doing is: when a file is opened, OSX will look at it. If the file has any data, is 20MB or smaller, and has 8 or more extents, it will be defragmented. Note that the defragmentation won't happen if the disk has very little contiguous free space left, since there is nowhere to move the file to.
I'd guess that the parameters (20MB and 8+ extents) were gathered through some sort of research or testing. They seem high for my taste, but I'll trust the engineers (particularly since Panther seems pretty fast). My guess as to why this happens _all_ the time: willingness to pay an up-front cost for speed in subsequent operations. There is also the possibility that most of the time an open is followed by a read, so defragging might not do much more than preheat the file cache. I'm not sure what "non-busy" means in the code; I'll have to do more digging. One nice thing about this scheme is that you don't spend time moving around files that no one cares about. If you don't use it, it doesn't get defragged.
There's also something called: "Hot-File-Adaptive-Clustering":
The rest of the mechanism is interesting, and hasn't been discussed yet - it looks like over a period of time, OSX keeps track of which files have been used the most and moves them to a "hot band". This is already a bonus, since it'll defrag the files during the move, but I'm not sure what the "hot band" is (probably the fastest area on the physical hard disk). Looks like this all happens in the background over several days.
Q: I'm trying to understand the Hot File Clustering. It sounds like a great idea - find all the frequently used files, and move them to the fastest part of the disk.
First of all, Hot File Clustering only works on Journaled HFS+ disks, and it only works on the boot disk. HFC reserves a chunk of the disk (5MB for every 1GB of disk) as the Hotfile area - this is the "fast" part of the disk.
This is where the hotfile mechanism does all its work. Over a period of about 60 hours, files that are read are added to a list of potential hotfile candidates. Once the 60 hours is up, all the files are examined. Each file is assigned a temperature (basically, the number of times the file was read during the 60 hour cycle), and merged into the hotfile catalog file. During this time, old files are cooled (dropping their temperature by half). At this point, cold files are moved out of the hotfile area to make room. Finally, hot files are moved into the hotfile area. Then the process starts over.
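Here is a rough sketch of the cooling and merging steps of one cycle, just to show the shape of the bookkeeping. Everything in it is hypothetical: the record layout, the function names, and in particular the choice to add a cycle's read count to the already-cooled temperature, which is my guess at what "merged" means and is not taken from the source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one tracked file. */
struct hot_entry {
    uint32_t file_id;
    uint32_t temperature;  /* roughly: how often the file was read */
};

/* Cool every existing entry by halving its temperature. */
void cool_all(struct hot_entry *entries, size_t count)
{
    for (size_t i = 0; i < count; i++)
        entries[i].temperature /= 2;
}

/* Merge one cycle's read count for a file into the table.
 * Assumption: reads are added to the existing temperature.
 * Returns the new entry count (unchanged if the table is full). */
size_t merge_reads(struct hot_entry *entries, size_t count, size_t capacity,
                   uint32_t file_id, uint32_t reads)
{
    for (size_t i = 0; i < count; i++) {
        if (entries[i].file_id == file_id) {
            entries[i].temperature += reads;
            return count;
        }
    }
    if (count < capacity) {
        entries[count].file_id = file_id;
        entries[count].temperature = reads;
        count++;
    }
    return count;
}

/* Index of the coldest entry, the first candidate to move out.
 * The caller must make sure count is greater than zero. */
size_t coldest(const struct hot_entry *entries, size_t count)
{
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (entries[i].temperature < entries[best].temperature)
            best = i;
    }
    return best;
}
```

With this model, a file that was read a lot in one cycle and then ignored loses half its temperature every cycle and eventually becomes the coldest entry, which is the aging behavior described above.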
Q: One thing that I don't understand at this point is that when files are moved into or out of the hotfile area, there seems to be a strong limit - no more than 50 files or 300 allocation blocks (whichever comes first). This implies that no file larger than about 1.2MB (300 * 4KB for a standard HFS+ disk) will ever get moved into the hotfile area. This seems bizarre to me (but then I suppose it's possible that OSX spends a lot of time reading a _lot_ of _tiny_ files). I also wonder why this is limited to the boot disk only... though if my supposition is correct (that OSX likes many tiny files), it might not help - most of what OSX would be reading is probably system files. Still, it probably couldn't hurt?
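Working the numbers: 300 allocation blocks of 4KB each is 1,228,800 bytes, or about 1.2MB. A small sketch of that budget check follows. The two limits and the 4KB block size are the figures quoted in this post; the function names and the way the limits are combined are my assumptions, not the kernel's logic.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_FILES_PER_CYCLE  50u    /* limit quoted in the post */
#define MAX_BLOCKS_PER_CYCLE 300u   /* limit quoted in the post */
#define BLOCK_SIZE_BYTES     4096u  /* 4KB allocation block, as assumed in the post */

/* Largest number of bytes that can be moved within the block limit. */
uint64_t max_bytes_per_cycle(void)
{
    return (uint64_t)MAX_BLOCKS_PER_CYCLE * BLOCK_SIZE_BYTES;
}

/* Allocation blocks needed to hold a file of the given size (rounded up). */
uint64_t blocks_needed(uint64_t size_bytes)
{
    return (size_bytes + BLOCK_SIZE_BYTES - 1) / BLOCK_SIZE_BYTES;
}

/* Would this file still fit in what is left of the budget? */
bool fits_in_cycle(uint64_t size_bytes, uint32_t files_moved,
                   uint64_t blocks_moved)
{
    return files_moved < MAX_FILES_PER_CYCLE &&
           blocks_moved + blocks_needed(size_bytes) <= MAX_BLOCKS_PER_CYCLE;
}
```

Under these assumptions a file of exactly 1,228,800 bytes uses the whole block budget by itself, and anything one byte larger never fits.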
Q: What is done with the information collected on the files that aren't defragged? Can it be accessed i.e. is it stored in a database of sorts? The reason I ask is that we've heard rumors about the extensive rewrite of the file system, and I'm wondering how much of this can be seen as groundwork for future advances.
Nothing is done with the files that aren't defragged. Eventually, they will be aged out of the candidate tree. The mismatch between allowed filesizes is apparently just a bug. This is a good optimization overall, but it's not rocket science - Windows has had this for years (since Win98 I think), though Windows does it more manually, and more comprehensively - they track a larger wad of files, and periodically (once a week?) schedule a defragmentation/optimization sweep.
It looks pretty much bolted onto HFS+, just like the journaling code.
davidb
Forum Regular
Join Date: Dec 2001
Location: Sacramento, Calif.
I am really surprised that no one has even commented on these stunning new features in Panther. I posted the information about auto-defragging and adaptive hot-file clustering thinking that this might spark some intelligent discussion. However, I still see people posting questions about what utility to buy to defrag Panther... totally ignoring these new built-in features. I hope the information in this post wasn't too deep for people to understand.
davidb
Addicted to MacNN
Join Date: Apr 2001
Location: europe
As long as you don't mention your source you can tell us something about the horse.
Nasrudin sat on a river bank when someone shouted to him from the opposite side: "Hey! how do I get across?" "You are across!" Nasrudin shouted back.
Professional Poster
Join Date: Dec 2000
I'm sure people appreciate it, but don't take it personally if they're not responding. On this topic there's not much to say besides 'I like it, good job Apple,' or the random 'mac sukz'.
Addicted to MacNN
Join Date: Oct 2001
Location: Yokohama, Japan
Maybe people aren't replying because this is old news.
Professional Poster
Join Date: Jul 2001
Location: New York, NY
Originally posted by rocheb:
I don't think I fully understand what this means.
I have been thinking of getting an external HD for my MP3s, even though I have more than enough room on my powerbook. I thought that the constant moving of files (adding, deleting, copying) would defrag my disk and things would become slow. Panther is able to fix this? Am I even talking about defragmentation here?!?!?
no. you're talking about disk fragmentation.
de-fragmentation is what you do to cure fragmentation.
But yes, that's the basic idea - apparently (and I didn't know this until this post) OS X will defrag small files for you automagically.
cpac
Senior User
Join Date: Nov 2002
Location: US
sorry but could anyone tell me why hitting this thread made my mac want to connect to some place at port 8100?
Dedicated MacNNer
Join Date: Jul 2002
Location: Germany
Originally posted by fortepianissimo:
sorry but could anyone tell me why hitting this thread made my mac want to connect to some place at port 8100?
Hi!
I came across something similar some time ago, and someone told me that it was due to a picture in a signature that was trying to be loaded.
Forum Regular
Join Date: Dec 2001
Location: Sacramento, Calif.
Originally posted by Taipan:
Hi!
I came across something similar some time ago, and someone told me that it was due to a picture in a signature that was trying to be loaded.
Sorry 'bout that...my signature includes a .jpg that is located at our MUG's website and it is accessed via port 8100.
davidb
Forum Regular
Join Date: Dec 2001
Location: Sacramento, Calif.
Originally posted by rocheb:
I don't think I fully understand what this means.
If Hot-file Clustering puts "your" most used files onto the Hot-band of the hard drive and the act of copying them there results in their being defragged, then over time the clustering of these most used files (in a defragged state) on the hottest band of the hard drive should make the functions of the "Hot-files" faster and more responsive.
I think that Journaling is a safety net for the integrity of all the files being moved around during auto-defragging and Hot-File Clustering.
davidb
Addicted to MacNN
Join Date: Apr 2001
Location: europe
What's the point of "Secure Delete" if the file has been moved around to another place meanwhile if Panther really has automatic file defragmentation (which I doubt)?
Shouldn't it then be possible to turn off defragmentation for people who need "Secure Delete"?
Mac Elite
Join Date: Sep 2000
Location: Canada
Originally posted by Developer:
What's the point of "Secure Delete" if the file has been moved around to another place meanwhile if Panther really has automatic file defragmentation (which I doubt)?
Shouldn't it then be possible to turn off defragmentation for people who need "Secure Delete"?
I only skimmed the thread, but I think Panther checks a file for de-fragmenting only when it's opened. Since you're not likely to securely delete a file that is open, this isn't a problem.
Grizzled Veteran
Join Date: Jan 2002
Location: Melbourne, Australia
Originally posted by Developer:
if Panther really has automatic file defragmentation (which I doubt)?
The code is from the Darwin source. You are free to download it if you want. This feature is very real.
A post on the Apple mailing lists by John Siracusa regarding automatic defragmenting.
The reply from an Apple Darwin Filesystem developer.
A query about Hot-file clustering
The response by the same Apple developer.
Addicted to MacNN
Join Date: Apr 2001
Location: europe
Originally posted by dtriska:
I only skimmed the thread, but I think Panther checks a file for de-fragmenting only when it's opened. Since you're not likely to securely delete a file that is open, this isn't a problem.
I don't think that it is an unlikely scenario that a user opens a file to see if it is the correct one and then deletes it. If that opening moves the file, the secure delete deletes the new defragmented copy but leaves the old fragmented remains alone.