A peculiar case of I/O errors with cheap SFF-8643 cables

This saga could also be named “Why does my very expensive LSI 9305-24i drop disks?”.

I recently acquired (from eBay, as I cannot justify giving NZ$2000 to local scumbags) an LSI 9305-24i controller to drive a 24-bay NAS.

The reason for switching from the LSI 9112-8i with a RES2CV360 SAS2 expander is that the LSI 9112-8i is bottlenecked by its PCIe 2.0 x8 link, which limited the throughput of the 24-disk array to about 100MB/s per disk.
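
As a rough sanity check on that bottleneck, here is a back-of-the-envelope sketch of the theoretical ceiling. It assumes only the published PCIe 2.0 figures (5 GT/s per lane, 8b/10b encoding) and ignores protocol overhead and the expander entirely, so real numbers will be lower; the function name is just for illustration.

```python
# Theoretical per-disk ceiling for a PCIe 2.0 x8 HBA shared by 24 disks.
# Assumption: all disks stream at once and nothing but the PCIe link limits them.

RAW_MBIT_PER_LANE = 5000  # PCIe 2.0 signals at 5 GT/s per lane


def per_disk_ceiling_mb_s(lanes=8, disks=24):
    # 8b/10b encoding carries 8 data bits per 10 line bits; 8 bits per byte.
    per_lane_mb_s = RAW_MBIT_PER_LANE * 8 / 10 / 8  # 500 MB/s per lane
    return per_lane_mb_s * lanes / disks


print(f"{per_disk_ceiling_mb_s():.0f} MB/s per disk at best")
```

Even this best case sits below what the drives can stream, and the ~100MB/s I actually measured is lower still, presumably because of overhead this sketch leaves out.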

The LSI 9305-24i gave a significant boost in performance, with ~200MB/s per disk (maxing out the Seagate IronWolfs).

Unfortunately, because I needed SFF-8643 to SFF-8087 cables, my sourcing options were limited. I got 6x SFF-8643 to SFF-8087 cables from Amazon (HiFibre branded).

This whole setup worked wonders until random disks started dropping out of the array.

When that happened, I would see errors like these:

[ 1059.345207] mpt3sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[ 1060.094729] mpt3sas_cm0: log_info(0x31110635): originator(PL), code(0x11), sub_code(0x0635)
[ 1065.399121] mpt3sas_cm0: log_info(0x31120311): originator(PL), code(0x12), sub_code(0x0311)
[ 1065.399128] mpt3sas_cm0: log_info(0x31120311): originator(PL), code(0x12), sub_code(0x0311)

[ 1170.818427] mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x300062b20299153f)
[ 1170.818429] mpt3sas_cm0: removing handle(0x0024), sas_addr(0x300062b20299153f)
[ 1170.818429] mpt3sas_cm0: enclosure logical id(0x500062b202991533), slot(23)
[ 1170.818430] mpt3sas_cm0: enclosure level(0x0000), connector name(     )

[ 3758.857760] sd 0:0:49:0: [sdl] tag#4877 Add. Sense: Information unit iuCRC error detected
[ 3758.857761] sd 0:0:49:0: [sdl] tag#4877 CDB: Read(16) 88 00 00 00 00 00 00 ce 38 e0 00 00 01 00 00 00
[ 3758.857762] blk_update_request: I/O error, dev sdl, sector 13514976 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0
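
For anyone chasing something similar, here is a minimal sketch of how such a log could be summarised. The bit layout of the log_info word is inferred from the lines above (the driver prints the originator, code and sub_code next to the raw value); the originator names other than PL, the regexes and the function names are my own assumptions, not something taken from the driver documentation.

```python
import re
from collections import Counter

LOG_INFO_RE = re.compile(r"log_info\((0x[0-9a-fA-F]{8})\)")
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")

# Only 1 -> "PL" is confirmed by the log above; 0 and 2 are assumptions.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(word):
    """Split a 32-bit log_info word into the fields the driver prints."""
    return {
        "bus_type": (word >> 28) & 0xF,
        "originator": ORIGINATORS.get((word >> 24) & 0xF, "unknown"),
        "code": (word >> 16) & 0xFF,
        "sub_code": word & 0xFFFF,
    }


def summarize(lines):
    """Count log_info events per (originator, code, sub_code) and I/O errors per device."""
    codes = Counter()
    io_errors = Counter()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            info = decode_log_info(int(match.group(1), 16))
            codes[(info["originator"], info["code"], info["sub_code"])] += 1
        match = IO_ERROR_RE.search(line)
        if match:
            io_errors[match.group(1)] += 1
    return codes, io_errors


SAMPLE = [
    "[ 1059.345207] mpt3sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)",
    "[ 1060.094729] mpt3sas_cm0: log_info(0x31110635): originator(PL), code(0x11), sub_code(0x0635)",
    "[ 1065.399121] mpt3sas_cm0: log_info(0x31120311): originator(PL), code(0x12), sub_code(0x0311)",
    "[ 1065.399128] mpt3sas_cm0: log_info(0x31120311): originator(PL), code(0x12), sub_code(0x0311)",
    "[ 3758.857762] blk_update_request: I/O error, dev sdl, sector 13514976 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0",
]

codes, io_errors = summarize(SAMPLE)
for key, count in codes.items():
    print(key, count)
print(dict(io_errors))
```

Feeding it the output of dmesg instead of SAMPLE should show whether the errors pile up on one device or are spread around.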

After numerous swaps of cables, disks and backplanes, the issue was isolated to a particular port on the card, which made me believe the card was faulty.

I needed confirmation, so I bought another cable from Amazon (CableCreation brand).

The port problem persisted. The issue was very intermittent, but while poking about I discovered that I could trigger it by gently moving the cable while it was plugged in!

Not only that, but the issue could be replicated with other cables and ports.
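
I did not do this at the time, but a wiggle test like this could in principle be made less guesswork by watching the link error counters that the Linux SAS transport class exposes under /sys/class/sas_phy. The sketch below assumes those sysfs attributes are present and populated for this HBA; the helper names are made up for illustration.

```python
from pathlib import Path

# Link error counters exposed per phy by the Linux SAS transport class.
COUNTERS = (
    "invalid_dword_count",
    "running_disparity_error_count",
    "loss_of_dword_sync_count",
    "phy_reset_problem_count",
)


def read_phy_counters(root="/sys/class/sas_phy"):
    """Return {(phy_name, counter_name): value} for every readable counter."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    snapshot = {}
    for phy_dir in sorted(root_path.glob("phy-*")):
        for name in COUNTERS:
            try:
                value = int((phy_dir / name).read_text().strip())
            except (OSError, ValueError):
                continue  # attribute missing or unreadable on this phy
            snapshot[(phy_dir.name, name)] = value
    return snapshot


def diff_counters(before, after):
    """Keep only the counters that changed between two snapshots."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }
```

The idea would be to call read_phy_counters() once, nudge a cable, call it again and pass both snapshots to diff_counters(); any phy whose counters jump is a suspect.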

Here is a clip of the motion that causes the I/O errors:

[Clip: wiggle-wiggle, drop a disk or two]

This motion could be caused by someone walking, or even thermal expansion.

Here is a position where errors are less likely:

[Image: crappy SFF-8643 plugs close together]

Here is the position where it is most likely to drop a disk or two:

[Image: plugs spread apart]

So, at this stage I have sunk hundreds of dollars into cables and have an awesome card I can’t use.

I decided to buy Dell cables (this time 75cm long, so there is less tension on the plugs), and will probably make some clips (or even use hot glue) to secure them in the adapter.

At this stage, while I wait, I have gone back to the old configuration; I would rather have sub-optimal performance than data loss.

