This saga could also be named “Why does my very expensive LSI 9305-24i drop disks?”.
I recently acquired (from eBay, as I cannot justify giving NZ$2000 to local scumbags) an LSI 9305-24i controller to drive a 24-bay NAS.
The reason for switching from the LSI 9112-8i with a RES2CV360 SAS2 expander is that the LSI 9112-8i is bottlenecked by its x8 PCIe 2.0 link, which limits the throughput of a 24-disk array to about 100MB/s per disk.
The LSI 9305-24i gave a significant performance boost, with ~200MB/s per disk (maxing out the Seagate IronWolfs).
Unfortunately, because I needed SFF-8643 to SFF-8087 cables, my sourcing options were limited. I got 6x SFF-8643 to SFF-8087 cables from Amazon (HiFibre branded).
This whole setup worked wonders until random disks started dropping out of the array.
When that happened, I would see errors like these:
[ 1059.345207] mpt3sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[ 1060.094729] mpt3sas_cm0: log_info(0x31110635): originator(PL), code(0x11), sub_code(0x0635)
[ 1065.399121] mpt3sas_cm0: log_info(0x31120311): originator(PL), code(0x12), sub_code(0x0311)
[ 1065.399128] mpt3sas_cm0: log_info(0x31120311): originator(PL), code(0x12), sub_code(0x0311)
[ 1170.818427] mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x300062b20299153f)
[ 1170.818429] mpt3sas_cm0: removing handle(0x0024), sas_addr(0x300062b20299153f)
[ 1170.818429] mpt3sas_cm0: enclosure logical id(0x500062b202991533), slot(23)
[ 1170.818430] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 3758.857760] sd 0:0:49:0: [sdl] tag#4877 Add. Sense: Information unit iuCRC error detected
[ 3758.857761] sd 0:0:49:0: [sdl] tag#4877 CDB: Read(16) 88 00 00 00 00 00 00 ce 38 e0 00 00 01 00 00 00
[ 3758.857762] blk_update_request: I/O error, dev sdl, sector 13514976 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0
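For anyone staring at similar output, the log_info word can be split into the same fields the driver prints. The following is only a rough sketch: the bit layout is inferred from the lines above (for example 0x31111000 printing as originator PL, code 0x11, sub_code 0x1000), the function names are made up for illustration, and it does not try to explain what the individual codes mean.

```python
import re

# Originator names as they appear in mpt3sas log output. The mapping of
# numeric values to names is an assumption based on the lines above.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

LOG_INFO_RE = re.compile(r"log_info\(0x([0-9a-fA-F]{8})\)")


def decode_log_info(value):
    """Split a 32-bit log_info word into its bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,      # top nibble
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,         # one byte
        "sub_code": value & 0xFFFF,           # low 16 bits
    }


def decode_line(line):
    """Return the decoded fields for a kernel log line, or None if absent."""
    match = LOG_INFO_RE.search(line)
    if match is None:
        return None
    return decode_log_info(int(match.group(1), 16))


if __name__ == "__main__":
    sample = ("[ 1059.345207] mpt3sas_cm0: log_info(0x31111000): "
              "originator(PL), code(0x11), sub_code(0x1000)")
    print(decode_line(sample))
```

The driver already prints these fields, so the only real use is for logs where just the raw hex value has been kept.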
After numerous swaps of cables, disks, and backplanes, the issue was isolated to a particular port on the card, which made me believe the card was faulty.
I needed confirmation, so I bought another cable from Amazon (CableCreation brand).
The port problem persisted. The issue was very intermittent, but while I was poking about I discovered that I could trigger it by gently moving the cable while it was plugged in!
Not only that, but the issue could be replicated with other cables and ports.
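To keep track of which slots and disks misbehave while wiggling cables, the kernel log can be tallied. This is a minimal sketch that assumes the log lines look like the ones shown earlier in this post. The function name and the regular expressions are my own, not part of any tool, and a slot line is only counted when it directly follows a "removing handle" message.

```python
import re
from collections import Counter

# Patterns modelled on the kernel log lines shown earlier in this post.
SLOT_RE = re.compile(r"enclosure logical id\(0x[0-9a-fA-F]+\), slot\((\d+)\)")
IO_ERR_RE = re.compile(r"I/O error, dev (\w+), sector \d+")


def tally(lines):
    """Count removals per enclosure slot and I/O errors per block device."""
    removed_slots = Counter()
    io_errors = Counter()
    pending_removal = False
    for line in lines:
        if "removing handle(" in line:
            # The slot number is reported on a following line.
            pending_removal = True
            continue
        slot = SLOT_RE.search(line)
        if slot and pending_removal:
            removed_slots[int(slot.group(1))] += 1
            pending_removal = False
            continue
        err = IO_ERR_RE.search(line)
        if err:
            io_errors[err.group(1)] += 1
    return removed_slots, io_errors


if __name__ == "__main__":
    sample = [
        "[ 1170.818429] mpt3sas_cm0: removing handle(0x0024), sas_addr(0x300062b20299153f)",
        "[ 1170.818429] mpt3sas_cm0: enclosure logical id(0x500062b202991533), slot(23)",
        "[ 3758.857762] blk_update_request: I/O error, dev sdl, sector 13514976 op 0x0:(READ)",
    ]
    slots, errors = tally(sample)
    print("removals per slot:", dict(slots))
    print("I/O errors per device:", dict(errors))
```

Feeding it the lines of a saved dmesg dump before and after moving a cable should show whether the drops follow the cable, the port, or the disk.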
Here is a clip of the issue that is causing I/O errors:
This motion could be caused by someone walking, or even thermal expansion.
Here is a position where it is less likely to encounter an error:
Here is the position where it is most likely to drop a disk or two:
So, at this stage I have sunk hundreds of dollars into cables and have an awesome card I can’t use.
I decided to buy Dell cables (this time 75cm long, so there is less tension on the plugs), and will probably make some clips (or even use hot glue) to secure them in the adapter.
For now, while I wait, I have gone back to the old configuration. I would rather have sub-optimal performance than data loss.