DiskFailure has a single purpose: to warn you about potential or imminent hard disk failure before the disk stops working or data is lost.
It warns you through alert dialogs, Growl notifications and the menubar icon. It also features* a detailed list of your drives with the monitored parameters for each, and a kernel log history of any relevant drive events.
If iCloud is enabled, all the data is stored in the cloud: the information for every disk and log is available on all your machines and accessible from any of them, so you can monitor disks remotely and never lose any history.
* It also includes a MagicPrefs plugin with the same core functionality (logging, notifications) as the standalone app.
Tips
From my own experience with rotational disk drives and information researched from external sources I have come to a number of conclusions. I do not imply any guarantee about any of them, however, and my anecdotal evidence is limited mostly to WD drives (external and internal):
- SMART stats and bad block negatives are very unreliable: your drive could be damaged even if SMART reports no problems and no bad blocks. Positives, on the other hand, are pretty reliable signs that the drive is damaged, but they usually surface too late or not at all.
- Power supply spikes/cuts can still damage rotational drives even with the parking failsafes built into them; surge protection or a UPS is a drive's best friend after good heat dissipation.
- Non-mechanical failures can lead to mechanical failures. This is a theory I have, based on observations that an initial loss of magnetic reliability on some sectors spirals down into mechanical failure: the drive servos struggle over and over with the compromised sectors instead of quickly marking them as bad and avoiding them, and this seems to lead to actual mechanical failure in the servos or elsewhere.
- The partition holding the operating system's caches and the user's home directory seems to degrade quickest, on account of wear and tear from the bulk of the file system operations (caching, logs, preferences, etc.).
- Seth Noble/Data Expedition has an excellent (and Mac-centric) article about data loss, recovery and everything in between: Hard Drive Recovery - UNIX Tips
On account of the above observations, my particular setup incorporates an external drive, which should run significantly cooler than one inside a machine, a UPS protecting it from power surges and cuts, and lastly fault separation by partitioning. I will go on to detail this:
I always split a drive into around five partitions: three smaller ones for operating systems and one or two for media and applications (the bulk of the data, which does not get written to often). I install the main OS on the first partition, a second OS on the second, and keep the third for backup or an older OS. This way I was able to isolate a read error occurring on the main OS partition of a one-year-old drive without junking the whole thing: within minutes I migrated the settings from the main OS to the second partition and stopped using the faulty partition altogether.
- Look in your console log for any read/write errors, things like "kernel[0]: jnl: disk1s10: flushing fs disk buffer", "kernel: disk0s7: operation was aborted." or "kernel: jnl: disk0s7: journal start/end pointers reset!", and note which partition experiences the errors.
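The console log check above can be partly automated. The following is a minimal Python sketch, not part of DiskFailure: it assumes you have already exported kernel log lines as text, and it only knows about the three example messages quoted above. The names `ERROR_HINTS`, `PARTITION_RE` and `count_partition_errors` are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Substrings taken from the example messages quoted above.
# Assumption: this list is illustrative, not an exhaustive set of disk errors.
ERROR_HINTS = (
    "flushing fs disk buffer",
    "operation was aborted",
    "journal start/end pointers reset",
)

# Partition identifier as it appears in the examples, e.g. disk0s7 or disk1s10.
PARTITION_RE = re.compile(r"\bdisk\d+s\d+\b")


def count_partition_errors(lines):
    """Count suspicious log lines per partition identifier."""
    counts = Counter()
    for line in lines:
        # Skip lines that contain none of the known error fragments.
        if not any(hint in line for hint in ERROR_HINTS):
            continue
        match = PARTITION_RE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts
```

Passing the exported log lines to `count_partition_errors` gives a per-partition tally; a partition that keeps accumulating errors is the one to stop relying on.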
FAQ
Q: Why does the volume name displayed on the left side of the logs not seem to match the actual partition with the issues?
A: This happens if partitions were added or removed since the log entry was recorded (recovery partitions count too).
Q: What do the red log lines mean?
A: It depends on the specific line, but generally it indicates a serious issue and is coupled with the drive status being set to failing.
Plugin
DiskFailure is available not only as a standalone application but also as a MagicPrefs plugin, which comes bundled inside the standalone application.
SMART
SMART stands for Self-Monitoring, Analysis and Reporting Technology. It was initially implemented in DiskFailure, but I eventually removed it due to its unreliability and Apple's sandboxing requirements, which prevent its use entirely. DiskFailure customers can download a standalone application (requires DiskFailure from the Mac App Store) to check SMART parameters at their leisure, or use the legacy DiskFailure1 below.
DiskFailure1
An unsupported legacy version of DiskFailure (version 1.x) is still available for download (requires DiskFailure from the Mac App Store).
It is un-sandboxed and implements SMART, so the following parameter thresholds also apply:
HDD
-bad blocks over 10
-temperature over 68°C/155°F
-drive boot count over 5000
-load cycles over 500000
SSD
-bad blocks over 1000
-temperature over 68°C/155°F
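To make the limits above concrete, here is a minimal Python sketch that compares a set of readings against them. It is not DiskFailure1's actual code: the dictionary keys (`bad_blocks`, `temperature_c`, `boot_count`, `load_cycles`) and the function name are hypothetical, and it assumes "over" means strictly greater than the listed value.

```python
# Limits copied from the HDD and SSD lists above.
# Assumption: the key names are illustrative, not real SMART attribute names.
THRESHOLDS = {
    "HDD": {
        "bad_blocks": 10,
        "temperature_c": 68,
        "boot_count": 5000,
        "load_cycles": 500000,
    },
    "SSD": {
        "bad_blocks": 1000,
        "temperature_c": 68,
    },
}


def exceeded_parameters(drive_type, readings):
    """Return the names of readings that are over the limit for this drive type.

    Assumption: "over" is read as strictly greater than the listed value.
    Parameters missing from `readings` are treated as not exceeded.
    """
    limits = THRESHOLDS[drive_type]
    return [
        name
        for name, limit in limits.items()
        if name in readings and readings[name] > limit
    ]
```

For example, `exceeded_parameters("HDD", {"bad_blocks": 11, "temperature_c": 40})` flags only `bad_blocks`, while the same readings on an SSD flag nothing, since the SSD bad block limit is much higher.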