MacNN Forums (http://forums.macnn.com/)

 
besson3c Jul 14, 2013 02:59 PM
Time Machine and HFS corruption
I have been reading that TM wanting to recreate a backup is often a sign of a corrupt backup. This has been happening frequently on my wife's MacBook Air (MBA). Recently when this happened I discovered that HFS was screwed up, and Disk Utility/fsck said it needed to boot from the recovery partition to repair the damage, so I did that. I was really shocked to see that a few days later, after more TM difficulties, it wanted to do the same thing again, with more HFS corruption reported (and the same message about needing to boot from the recovery partition).

Is it true that TM is reliant upon the HFS file structure (maybe the journal?) to keep its backups straight?


Why won't Apple give us a file system that is actually safe to write data to? This is insane. How are normal users supposed to know to run Disk Utility to check for something like this, or how to band-aid HFS when it blows up? Presumably this happens even more often on spinning hard drives (this happened on an SSD). Anyway, excuse the rant; you know my outspoken opinions on HFS being complete crap.

Has anybody experienced these TM errors before? Am I barking up the right tree?
 
Waragainstsleep Jul 14, 2013 05:13 PM
HFS does seem to get less reliable as time goes on. I see the old "OS X cannot repair..." message more often than I used to.
 
P Jul 15, 2013 03:39 PM
In my experience, a repaired HFS slice is significantly less reliable than a fresh one. Don't ask me why that is, but my way of working has always been to repair a slice to get stuff off it and then "reformat" (install a new file system).

Note that HFS+ with journaling doesn't just "go bad" spontaneously. Either there is a bug in the fs code in the kernel - which seems highly unlikely at this point - or the underlying drive is unreliable.
 
OreoCookie Jul 16, 2013 03:34 AM
You're preaching to the choir. Every year at WWDC, I wait for Apple to announce that they will replace their filesystem. Perhaps now that Macs are moving to SSDs, we'll get a filesystem tailored to solid-state storage and syncing? Not sure whether this is realistic, but a man can dream ;)
 
besson3c Jul 16, 2013 03:38 AM
Quote, Originally Posted by P (Post 4238699)
Note that HFS+ with journaling doesn't just "go bad" spontaneously. Either there is a bug in the fs code in the kernel - which seems highly unlikely at this point - or the underlying drive is unreliable.

Sure it can go bad spontaneously; there is no write integrity checking of any kind. Or do you mean go bad without any disk activity?
 
P Jul 17, 2013 05:10 PM
HFS+ without journalling could go bad for pure software reasons - a system crash while writing was enough. With journalling, it basically takes a bug in the FS code or a hardware defect for the journal to become corrupted. This last is what ZFS famously guards against.
 
besson3c Jul 18, 2013 04:55 AM
Quote, Originally Posted by P (Post 4239097)
HFS+ without journalling could go bad for pure software reasons - a system crash while writing was enough. With journalling, it basically takes a bug in the FS code or a hardware defect for the journal to become corrupted. This last is what ZFS famously guards against.
AFAIK the journal just keeps a record of what is being written so that the file system structure stays consistent, but there is no way for HFS to verify, with a checksum or otherwise, that each write actually landed intact. Couldn't this kind of bit rot affect the HFS data/metadata itself?
 
P Jul 18, 2013 05:37 AM
The journaling means that any metadata that is written is written twice, to ensure that the write completes (basically; it's a little more complicated than that). It does rely on the hardware performing according to spec, however: if the hardware fails to write the correct thing, that will never be detected. Before journaling, the file system could be corrupted if a write was interrupted; the fsck program that ran after reboot had to "guess" what the file system was supposed to say, and it could end up in an inconsistent state. With the journal, you are guaranteed that the writes to the file system either complete or are not done at all; they are what you call atomic.
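To make the "complete or not at all" behavior concrete, here is a toy write-ahead journal in Python. It is a simplified model built on our own assumptions, not the HFS+ journal format: every metadata update is first appended to a journal, then a commit record is written, and only then is the update applied in place (the "written twice" part). On recovery, only transactions that reached their commit record are replayed. The class and its crash flags are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournaledFS:
    """Toy model of metadata journaling (hypothetical, not the HFS+ format)."""
    metadata: dict = field(default_factory=dict)  # metadata "in place" on disk
    journal: list = field(default_factory=list)   # append-only journal records

    def write_transaction(self, txid, updates,
                          crash_before_commit=False, crash_before_apply=False):
        # First write: log every update in the journal.
        for key, value in updates.items():
            self.journal.append(("update", txid, key, value))
        if crash_before_commit:
            return  # simulated crash: commit record never reached the disk
        # The commit record marks the transaction as complete in the journal.
        self.journal.append(("commit", txid))
        if crash_before_apply:
            return  # simulated crash: committed but not yet written in place
        # Second write: apply the updates to the in-place metadata.
        self.metadata.update(updates)

    def recover(self):
        """Replay committed transactions (as at mount time); drop the rest."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, value = rec
                self.metadata[key] = value
        self.journal.clear()


fs = ToyJournaledFS()
fs.write_transaction(1, {"catalog": "A", "extents": "A"})
fs.write_transaction(2, {"catalog": "B", "extents": "B"}, crash_before_apply=True)
fs.write_transaction(3, {"catalog": "C"}, crash_before_commit=True)
fs.recover()
print(fs.metadata)  # {'catalog': 'B', 'extents': 'B'}
```

Transaction 2 is applied in full on replay and transaction 3 leaves no trace, so the metadata never ends up half-updated. Note that nothing in this model checks that the bytes read back are the bytes that were written; like the real journal, it simply trusts the storage.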

The silent corruption you're talking about is a hardware failure. HFS+, like most file systems, has no mechanism to detect or correct that. Drives have some of that in hardware, but it is not 100%. Of course that can happen in the file directory sectors as well (those are even the most likely sectors for it to happen in), but it's still a hardware error.

(Also, I think that HFS+ keeps an internal backup of some directory structures, so a catastrophic error can be fixed. Doesn't guard against silent corruption, however)
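The end-to-end checking that ZFS is known for can be sketched as follows: record a checksum for each block when it is written and verify it on every read, so a drive that silently returns wrong bytes is caught. The Python below is a minimal illustration under our own assumptions (a hypothetical in-memory block store, with CRC-32 from `zlib` as the checksum). ZFS itself stores checksums in the parent block pointers and uses algorithms such as fletcher4 or SHA-256; with a mirror or RAID-Z it can also repair the bad copy, which this sketch does not attempt.

```python
import zlib


class ChecksummedStore:
    """Hypothetical block store that detects silent corruption on read."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes as stored "on disk"
        self.checksums = {}  # block number -> CRC-32 recorded at write time

    def write(self, blockno: int, data: bytes) -> None:
        self.blocks[blockno] = data
        self.checksums[blockno] = zlib.crc32(data)

    def read(self, blockno: int) -> bytes:
        data = self.blocks[blockno]
        if zlib.crc32(data) != self.checksums[blockno]:
            raise OSError(f"checksum mismatch in block {blockno}")
        return data


store = ChecksummedStore()
store.write(7, b"catalog node")
# Simulate the drive silently flipping one bit of the stored copy.
corrupted = bytearray(store.blocks[7])
corrupted[0] ^= 0x01
store.blocks[7] = bytes(corrupted)
try:
    store.read(7)
except OSError as err:
    print(err)  # checksum mismatch in block 7
```

A checksum only detects the damage; without a second copy the data is still lost, but at least it is not silently handed back to Time Machine or any other reader as if it were good.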
 
besson3c Jul 18, 2013 06:06 AM
Quote, Originally Posted by P (Post 4239182)
The silent corruption you're talking about is a hardware failure. HFS+, like most file systems, have no mechanism to detect or correct that. Drives have some of that in hardware, but it is not 100%. Of course that can happen in the file directory sectors as well - it is even the most likely sectors for that to happen - but it's still a hardware error.

This makes sense, but then, my wife's laptop is a MacBook Air and it has been otherwise running beautifully. Isn't it kind of unlikely that there would be an SSD hardware failure?

What kind of (preferably free) tools do you guys recommend for checking on SSD health? I was reading that the SMART tools are sort of helpful, but SMART was essentially built for drives with mechanical parts so it doesn't seem worth running tools like SMARTReporter and such. I'd be interested in testing this theory that the HFS corruption is due to hardware failure.
 
P Jul 18, 2013 04:00 PM
That is a gap in the market. Manufacturers have tools for their own drives, but usually only for Windows. All we have is SMART and trying to read the raw data fields.
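For the "read the raw data fields" route, one option (our assumption; the thread doesn't mention it) is smartmontools, whose `smartctl -A` prints the ATA attribute table with the columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED and RAW_VALUE. The Python sketch below parses that text and flags attributes whose normalized value has fallen to the threshold. The sample output is made up for illustration, and which attributes actually track SSD wear differs by vendor.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # worst normalized value seen
    thresh: int  # normalized failure threshold (0 = never triggers a failure)
    raw: str     # vendor-specific raw value, kept as text


def parse_smart_attributes(text: str) -> list[SmartAttr]:
    """Parse the attribute table from `smartctl -A` output (ATA drives)."""
    attrs = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("ID#"):
            in_table = True  # header row; attribute rows follow
            continue
        fields = line.split()
        if not in_table or len(fields) < 10 or not fields[0].isdigit():
            continue
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        attrs.append(SmartAttr(int(fields[0]), fields[1], int(fields[3]),
                               int(fields[4]), int(fields[5]),
                               " ".join(fields[9:])))
    return attrs


def failing(attrs: list[SmartAttr]) -> list[SmartAttr]:
    """Attributes whose normalized value has dropped to the threshold."""
    return [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]


# Hypothetical sample output; real values depend on the drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
194 Temperature_Celsius     0x0022   070   055   000    Old_age   Always       -       30 (Min/Max 18/45)
"""

attrs = parse_smart_attributes(SAMPLE)
for a in attrs:
    print(a.name, a.raw)
print(failing(attrs))  # []
```

The raw value is kept as text because its format is vendor-specific (the temperature row above carries extra min/max data), so interpreting it is left to whoever knows the particular drive.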
 
joseph_techi Jul 31, 2013 02:42 PM
Hi,
I use an external drive to back up my Mac with Time Machine.
Sometimes I run into a problem: if you have to turn off your machine by holding down the power button instead of unmounting the Time Machine volume, even when it isn't actively backing up, you end up with a corrupted HFS volume that can only be fixed by running fsck_hfs -f. I'm not really sure whether it is a Time Machine issue or a hardware issue. I think you are experiencing the same problem, aren't you?
 
P Jul 31, 2013 03:45 PM
No, that's not the same at all. If you force a shutdown, the file directory will go bad. Normally it should be fixed at reboot, or at worst by running Disk Utility (which runs the same fsck_hfs tool, except without the -f flag). Not sure why you need the -f flag.
 
besson3c Jul 31, 2013 07:08 PM
Yeah, it's not the same problem, and my wife actually just got the Time Machine error again even with a clean file system. It looks like this can also happen with the Time Machine disk image becoming corrupt, although I don't know why that is happening for her but not for me.
 
All times are GMT -4.

Copyright © 2005-2007 MacNN. All rights reserved.