camscape - for excellent IT solutions itkb.ro - IT knowledge base

linux :: exception emask 0x0 sact 0x0 serr 0x0 action 0x6 frozen

David G.
Title: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Tags: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen, disable NCQ
Desc.: Why not to load every driver in the kernel, and how to disable NCQ
Code: KBLN0042 v1.0
Date: 24 November 2018

I have a large server with 10 drives in software RAID10. Under intensive writes, I get the following in the kernel log:


 

[ 6546.141374] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 6546.141379] ata3.00: failed command: FLUSH CACHE EXT
[ 6546.141388] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 20
[ 6546.141388]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 6546.141390] ata3.00: status: { DRDY }
[ 6546.141395] ata3: hard resetting link
[ 6546.473681] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 6546.475448] ata3.00: supports DRM functions and may not be fully accessible
[ 6546.480640] ata3.00: supports DRM functions and may not be fully accessible
[ 6546.484070] ata3.00: configured for UDMA/133
[ 6546.484074] ata3.00: retrying FLUSH 0xea Emask 0x4
[ 6546.484173] ata3.00: device reported invalid CHS sector 0
[ 6546.484180] ata3: EH complete
[ 7518.897796] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 7518.897800] ata4.00: failed command: FLUSH CACHE EXT
[ 7518.897810] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 10
[ 7518.897810]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 7518.897812] ata4.00: status: { DRDY }
[ 7518.897817] ata4: hard resetting link
[ 7519.240084] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 7519.241938] ata4.00: supports DRM functions and may not be fully accessible
[ 7519.247198] ata4.00: supports DRM functions and may not be fully accessible
[ 7519.250545] ata4.00: configured for UDMA/133
[ 7519.250548] ata4.00: retrying FLUSH 0xea Emask 0x4
[ 7519.250647] ata4.00: device reported invalid CHS sector 0
[ 7519.250654] ata4: EH complete
[ 7979.678045] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 7979.678049] ata7.00: failed command: FLUSH CACHE EXT
[ 7979.678058] ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 18
[ 7979.678058]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 7979.678061] ata7.00: status: { DRDY }
[ 7979.678066] ata7: hard resetting link
[ 7980.010310] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 7980.044636] ata7.00: supports DRM functions and may not be fully accessible
[ 7980.049900] ata7.00: supports DRM functions and may not be fully accessible
[ 7980.053390] ata7.00: configured for UDMA/133
[ 7980.053393] ata7.00: retrying FLUSH 0xea Emask 0x4
[ 7980.053547] ata7.00: device reported invalid CHS sector 0
[ 7980.053551] ata7: EH complete
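
On a busy server the kernel log is long, so it can help to count these exceptions per port before looking for a cause. The following is only a rough sketch: the function name count_ata_exceptions is made up for illustration, and it assumes the log lines have the same "ataN.NN: exception Emask" shape as the excerpt above.

```shell
#!/bin/sh
# Sketch: count "exception Emask" events per ATA device in a kernel log
# read from stdin. count_ata_exceptions is a hypothetical helper name.
count_ata_exceptions() {
    # Keep only the "ataN.NN: exception Emask" part of matching lines,
    # cut it down to the device name, then count occurrences per device.
    grep -o 'ata[0-9]*\.[0-9]*: exception Emask' \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Typical use (not run here):
#   dmesg | count_ata_exceptions
```

Each output line is a device name followed by its exception count. Several different ports showing up, as in the log above, makes a single bad drive less likely.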

 

 

So three drives (ata3, ata4 and ata7) timed out on FLUSH CACHE EXT and had their SATA links hard reset. What could be the cause?

Here are the options (all hardware):

  • problems with the drives themselves (unlikely when more than one drive fails the same way)
  • problems with SATA cables
  • not enough power from PSU, or faulty PSU

 

 

But we can try something else:

 

a. Disable NCQ, if the kernel cannot be rebuilt or reinstalled:

 

echo 1 > /sys/block/sdX/device/queue_depth

 

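Run this as root for every drive, replacing sdX with the device name (sda, sdb, ...). A queue depth of 1 effectively turns NCQ off. The setting is lost on reboot, so it has to be reapplied at boot.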
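
With 10 drives, a small loop saves typing. This is a minimal sketch, not a tested production script: the function name set_queue_depth and the SYSFS_BLOCK override are made up for illustration, and it assumes all the affected disks appear as sd* under /sys/block.

```shell
#!/bin/sh
# Sketch: write one queue depth to every sd* disk found under sysfs.
# SYSFS_BLOCK is a hypothetical override (default /sys/block) so the
# function can be tried against a scratch directory first.
set_queue_depth() {
    depth="$1"
    base="${SYSFS_BLOCK:-/sys/block}"
    for f in "$base"/sd*/device/queue_depth; do
        # Skip when the glob matched nothing or the file is not writable.
        [ -w "$f" ] || continue
        echo "$depth" > "$f"
        # Read the value back so the result is visible per drive.
        echo "$f -> $(cat "$f")"
    done
}

# Typical use as root (not run here):
#   set_queue_depth 1
```

The read-back line shows what the kernel actually accepted for each drive. Note that the loop touches every sd* device, including any that are not part of the array.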

 

 

b. Do not load unnecessary kernel modules. In my case, only the AHCI driver was needed. Once the kernel was compiled without "ATA SFF support", the problem went away.
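
Before rebuilding, it may be worth checking whether the running kernel has this option enabled at all. The sketch below looks for the CONFIG_ATA_SFF symbol, which is the Kconfig name behind "ATA SFF support". The function name ata_sff_state is made up for illustration, and the location of the config file is an assumption that varies by distribution.

```shell
#!/bin/sh
# Sketch: report whether a kernel config file enables "ATA SFF support".
# ata_sff_state is a hypothetical helper; pass it the path to a kernel
# config file (its location differs between distributions).
ata_sff_state() {
    cfg="$1"
    if grep -q '^CONFIG_ATA_SFF=y' "$cfg"; then
        echo enabled
    elif grep -q '^# CONFIG_ATA_SFF is not set' "$cfg"; then
        echo disabled
    else
        # Symbol absent or file unreadable: nothing can be concluded.
        echo unknown
    fi
}

# Typical use (not run here; the path is an assumption):
#   ata_sff_state "/boot/config-$(uname -r)"
```

To see which ATA-related modules are currently loaded, something like lsmod | grep -E 'ahci|ata_|pata_' gives a quick overview.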

 

 

Also, keep in mind that disabling NCQ hurts badly in environments with many concurrent I/O streams, such as hypervisors and database servers. In my test, writing a 1 TB file ran at 434 MB/s with solution a. (NCQ disabled) and at 620 MB/s with solution b. (kernel without the ATA SFF drivers), so disabling NCQ cost about 30% of sequential write throughput.