Drive problems by keyword

10B8B
“10 bit to 8 bit” error flag
usually a Drive Interface Issue

AMNF
data Address Mark Not Found error flag
serious, a physical drive issue, may be indicative of a failing drive
the sector ID was found, but the start of data cannot be found, so the data for the sector is lost

ATA bus error
usually a Drive Interface Issue

BadCRC
usually indicates a bad cable
check each of the Drive Interface Issues below; the most likely match is Drive interface issue #2

Directory bread
the system has lost contact with the flash drive; see Unexpected loss of removable drive below

Dispar
usually a Drive Interface Issue

DRDY
Drive ReaDY flag, not a problem so ignore it

failed to IDENTIFY
bad, drive is not able to identify itself
usually a Drive Interface Issue

failed to recover
bad, no communications even after resetting the drive
usually a Drive Interface Issue

frozen
means the exception handler is ‘frozen’ while dealing with the error; uninformative so just ignore it

Handshk
Handshake error flag
usually a Drive Interface Issue

hard resetting link
not an error; a very common message indicating that the error handler is trying to reset the channel and attached drive(s) in order to resume normal communications

HostInt
Host interface error flag
usually a Drive Interface Issue

HSM violation
invalid ‘Host State Machine’ state or response, “STATUS value doesn’t match HSM requirement”
this error can be caused by almost anything: a buggy driver, a faulty device (buggy or crashed firmware on the drive), buggy firmware on the disk controller, and/or a bad SATA cable
in most cases it is ultimately fixed by an upgrade somewhere, to the driver or to one of the firmwares; if an upgrade is not yet available, a downgrade may be necessary instead (or you may have to live with it)

ICRC
interface CRC error
usually indicates a bad cable

IDNF
sector ID Not Found error flag
serious, a physical drive issue, may be indicative of a failing drive
since the sector ID could not be found, the sector cannot be found, and the data for the sector is lost

interface fatal error
usually a Drive Interface Issue

LinkSeq
usually a Drive Interface Issue

media error
generally indicates a bad sector, but should be confirmed by an increase in the Reallocated_Sector_Ct and/or Current_Pending_Sector counts on the SMART report

PHYRdyChg
usually a Drive Interface Issue

qc timeout
a queued command (qc) did not complete in time; not good; one of a number of timeouts

revalidation failed
not good
usually a Drive Interface Issue

soft resetting link
not an error; a very common message indicating that the error handler is trying to reset the channel and attached drive(s) in order to resume normal communications

timeout

UNC
UNCorrectable media error flag, usually associated with a bad sector

UnrecovData
usually a Drive Interface Issue
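
The keyword list above can be turned into a small lookup for scanning syslog lines. The following is only a sketch: the category names, the `KEYWORDS` table, and the `classify` function are made up for illustration, and the entries without a clear category in the list (HSM violation, qc timeout, timeout, Directory bread) are left out. BadCRC and ICRC are filed under interface issues because a cable is part of the drive interface.

```python
import re

INTERFACE = "drive interface issue"
PHYSICAL = "physical drive issue"
BENIGN = "not an error"

# Keyword -> category, transcribed from the keyword list above.
# Matching is case-insensitive because the kernel prints "Dispar".
KEYWORDS = {
    "10B8B": INTERFACE, "ATA bus error": INTERFACE, "BadCRC": INTERFACE,
    "Dispar": INTERFACE, "failed to IDENTIFY": INTERFACE,
    "failed to recover": INTERFACE, "Handshk": INTERFACE,
    "HostInt": INTERFACE, "ICRC": INTERFACE,
    "interface fatal error": INTERFACE, "LinkSeq": INTERFACE,
    "PHYRdyChg": INTERFACE, "revalidation failed": INTERFACE,
    "UnrecovData": INTERFACE,
    "AMNF": PHYSICAL, "IDNF": PHYSICAL, "media error": PHYSICAL,
    "UNC": PHYSICAL,
    "DRDY": BENIGN, "frozen": BENIGN,
    "hard resetting link": BENIGN, "soft resetting link": BENIGN,
}


def classify(line):
    """Return {keyword: category} for every known keyword found in line."""
    found = {}
    for keyword, category in KEYWORDS.items():
        # Require non-alphanumeric neighbours so that, for example,
        # "ICRC" does not match inside "BadCRC".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(keyword) + r"(?![A-Za-z0-9])"
        if re.search(pattern, line, re.IGNORECASE):
            found[keyword] = category
    return found


if __name__ == "__main__":
    print(classify("ata3: SError: { UnrecovData HostInt 10B8B BadCRC }"))
    print(classify("ata3.00: error: { UNC }"))
```

A line that only yields "not an error" keywords (DRDY, frozen, resetting link) can be skipped; the useful signal is in the interface and physical matches.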

Drive problems by error message

There are many kinds of drive errors. Examine each section below for the key words that most closely match the errors you see in your syslog.
The examples below often include the ATA channel number associated with a particular drive. The actual numbers are not important and will be different for each drive. The channel itself is usually something like ata3 or ata12; the attached drive will be something like ata2.00 or ata13.01. Most end in .00, as there is only one drive per SATA channel, but IDE or IDE-emulating channels may have two (e.g. ata4.00 and ata4.01, master and slave), and port multipliers and SAS channels can have even more, such as ata5.00, ata5.01, ata5.02, ata5.03, and ata5.04. For more information on these ata drive symbols, see Drive Symbols.
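
When sorting through a long syslog, it can help to group messages by channel and drive. The following is a minimal sketch of pulling the two numbers out of a line; the function name `parse_ata_id` and the regular expression are illustrative assumptions, not part of any tool.

```python
import re

# Matches "ata7:" (the channel) and "ata7.00:" (a drive on that channel).
ATA_ID = re.compile(r"\bata(\d+)(?:\.(\d+))?:")


def parse_ata_id(line):
    """Return (channel, drive) from a syslog line, or None if none is present.

    drive is None when the message refers to the channel itself.
    """
    match = ATA_ID.search(line)
    if match is None:
        return None
    channel = int(match.group(1))
    drive = int(match.group(2)) if match.group(2) is not None else None
    return channel, drive


if __name__ == "__main__":
    print(parse_ata_id("ata7.00: qc timeout (cmd 0xec)"))
    print(parse_ata_id("ata7: hard resetting link"))
```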

Drive Interface Issues
These are problems with the cables and connections to the drive, both power and data, or with the quality of the power supplied. If your errors match one of these, then your drive is almost certainly fine. Many drives have been returned or thrown out after numerous errors similar to the following issues, when the fault lay entirely with the cables, power, or connectors used, NOT the drive itself.
Drive interface issue #1
An example:
ata3.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen
ata3.00: irq_stat 0x08000000, interface fatal error
ata3: SError: { 10B8B BadCRC }
(the SError line often also includes Dispar, UnrecovData, and/or HostInt)
From an expert:
“Your machine seems to be suffering genuine link layer problem.
In most cases, this indicates hardware problem and in my experience,
common causes are (in the order of ballpark frequency)…
1. inadequate power supply
2. device and controller don’t like each other on 3Gbps
3. cable too long or flaky connector (especially with eSATA cables or genders or backplanes)
4. faulty controller or drive”

tejun (http://lkml.org/lkml/2008/12/2/426)
(written by one of the foremost experts)
The presence of BadCRC is a pretty good indicator of a poor quality SATA cable. However, if a better cable does not solve the issue, then it is probably a power problem (loose power cable or backplane connection, poor connectors, poor power splitter, overloaded power supply, too many drives on power rail, bad power supply, etc).

Drive interface issue #2
An example:
res 40/00:00:48:19:67/00:00:1e:00:00/40 Emask 0x50 (ATA bus error)
ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
These errors are usually related to a bad cable or cable connector, or possibly bad power. The presence of BadCRC or ICRC is a pretty good indicator of a poor quality SATA cable. However, if a better cable does not solve the issue, then it is probably a power problem (loose power cable or backplane connection, poor connectors, poor power splitter, overloaded power supply, too many drives on power rail, bad power supply, etc).

Drive interface issue #3
An example:
ata2.00: exception Emask 0x10 SAct 0x7ff4f SErr 0x400100 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { UnrecovData Handshk }
From an expert:
“This is transmission error. Most common causes are power related or
unreliable connection especially if backplanes are involved. Is the
problem still reproducible? If so, can you please try to move it to
different power connector and SATA port and see what changes?”

tejun
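
The SErr value on the exception line and the names on the SError line carry the same information: the value is a bitmask, and each name stands for one bit. The sketch below decodes it, assuming the bit positions of the SATA SError register as the Linux kernel names them. The table and the function name `decode_serror` are written out for illustration and should be checked against your kernel if in doubt.

```python
# (bit position, name as printed on the "SError: { ... }" line)
SERROR_BITS = [
    (0, "RecovData"), (1, "RecovComm"),
    (8, "UnrecovData"), (9, "Persist"), (10, "Proto"), (11, "HostInt"),
    (16, "PHYRdyChg"), (17, "PHYInt"), (18, "CommWake"), (19, "10B8B"),
    (20, "Dispar"), (21, "BadCRC"), (22, "Handshk"), (23, "LinkSeq"),
    (24, "TrStaTrns"), (25, "UnrecFIS"), (26, "DevExch"),
]


def decode_serror(value):
    """Return the names of the flags set in an SErr bitmask, lowest bit first."""
    return [name for bit, name in SERROR_BITS if value & (1 << bit)]


if __name__ == "__main__":
    # SErr 0x400100 is the value from the example in issue #3 above.
    print(decode_serror(0x400100))
```

For the example in issue #3, SErr 0x400100 has bits 8 and 22 set, which correspond to the UnrecovData and Handshk flags shown on its SError line.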

Drive interface issue #4
This is an example of what is probably a loose backplane or cable connection (either the SATA connection, the power connection, or both):
ata7.00: exception Emask 0x10 SAct 0x7 SErr 0x990000 action 0xa frozen
ata7.00: irq_stat 0x00400000, PHY RDY changed
ata7: SError: { PHYRdyChg 10B8B Dispar LinkSeq }
ata7.00: cmd 60/48:00:af:1b:97/00:00:10:00:00/40 tag 0 ncq 36864 in
res 40/00:10:87:5f:96/00:00:10:00:00/40 Emask 0x10 (ATA bus error)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: qc timeout (cmd 0xec)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata7.00: revalidation failed (errno=-5)
ata7: failed to recover some devices, retrying in 5 secs
Note: There are no CRC errors here; CRC errors would normally implicate a bad cable or two.
These problems are often related to a backplane, perhaps loose, perhaps vibration-related, perhaps defective. If the SATA link remains up for a while, but communications are clearly bad, then the emphasis should probably be on the power connection. The easiest way to test whether the backplane is at fault is to reinstall the drive outside of the backplane.
If there is no backplane involved, then the same considerations apply to the cable connections: each end of both the SATA and power cables, including any power cable splitters that may be involved. It is common to jostle the cables after opening a computer case, and SATA cables are notorious for coming loose if they aren’t the locking type. It is a good habit to check all SATA connections just before closing a case up.
Good quality SATA and power cables and splitters are strongly recommended. Always make certain that they are firmly connected, and not subject to vibration. This is even more important for backplanes: make sure that drives are firmly and well seated in their trays, and cannot be vibrated loose.

Physical Drive Issues
These are actual errors from the drive itself, perhaps a failing drive, or perhaps just failing sectors. In general, you will want to obtain a SMART report for the drive.
Drive media issue #1
A typical example:
ata3.00: cmd 60/00:10:4f:80:81/04:00:66:00:00/40 tag 2 ncq 524288 in
res 41/40:64:eb:80:81/85:03:66:00:00/40 Emask 0x409 (media error)
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
These are almost always associated with bad sectors. They should be confirmed by examining a SMART report. See the Troubleshooting page, Obtaining a SMART report section. Then run a SMART long test (instructions are in the same section) to locate the bad sectors. You may need to seek advice as to what to do next, as it will depend on your specific situation.
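
Confirming a media error means comparing the bad-sector counters between two SMART reports. The sketch below assumes the reports are saved text in the attribute-table layout that `smartctl -A` prints, where the raw value is the tenth column. The sample tables are made up purely for illustration, and the function names are hypothetical.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

# Made-up sample reports, for illustration only (not from a real drive).
SAMPLE_BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
SAMPLE_AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""


def parse_smart_attributes(report):
    """Map attribute name -> raw value (as text) for each attribute row."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs


def increased_counts(before, after):
    """Return {name: (old, new)} for watched attributes whose raw value grew."""
    old, new = parse_smart_attributes(before), parse_smart_attributes(after)
    changes = {}
    for name in WATCHED:
        if name in old and name in new:
            # Assumes the raw value of these two attributes is a plain integer.
            a, b = int(old[name]), int(new[name])
            if b > a:
                changes[name] = (a, b)
    return changes


if __name__ == "__main__":
    print(increased_counts(SAMPLE_BEFORE, SAMPLE_AFTER))
```

An empty result after a media error does not clear the drive; it only means these two counters have not moved between the two reports.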

Other Drive Issues
Unexpected loss of removable drive
“I get a lot of messages like the following in the syslog…
What are they and should I be concerned?”

Mar 10 14:59:10 Tower kernel: FAT: Directory bread(block 510) failed
Mar 10 14:59:10 Tower kernel: FAT: Directory bread(block 511) failed
Usually when those errors appear, the system has lost contact with the flash drive.
It could be the USB port (loose or faulty)
  Try re-seating the flash drive
  Try connecting to a different USB port
It could be that the flash drive is going bad
  Test it on another machine
It could be that a shared IRQ, one that serviced this USB port, has been disabled
  Check the syslog for evidence related to its IRQ
more to be added, as discovered

You will have to power off to get the system back, and unRAID will most likely want to start a parity check, because it cannot update the flash drive to record a proper shutdown. Any settings changes won’t be saved either, until the flash drive is accessible again.
