2.2.14 & adaptec 1542 not recovering from SCSI bus reset

From: synrg@bgpc.dymaxion.ca
Date: Tue Feb 15 2000 - 09:08:53 EST

  • Next message: Jakub Jelinek: "Dynamic DMA for pci2000.c"

    I have had two incidents in which my system has hung under 2.2.14
    using an Adaptec 1542CF that I just installed, replacing the old
    Adaptec 1520A I had in there before.

    First incident:

    - I was in 'screen' ssh'd to this headless box, so I was able to
      look about a bit even though the disk was giving me IO errors
      every time I tried to access something on the disk
    - The last thing logged to syslog (cached, but not written, as
      I later found out upon a reboot) was a SCSI bus reset
    - After my investigation, I switched over a monitor & kbd to this
      box so I could see anything logged to the console .. unfortunately
      it had scrolled and so entirely consisted of ext2 complaining that
      it couldn't find certain inodes and had remounted the single
      filesystem on this box (/ on /dev/sda1) read-only
    - I saw no kernel oops during the first incident, but that's not
      saying it didn't happen, as the console messages had scrolled
    - Upon reboot, the e2fsck ran clean

    Second incident (this morning):

    - This time I immediately switched the monitor & kbd to the box
      to see if I could spot anything in the console messages
    - I saw the tail end of a kernel oops ... but with only partial
      data and no way to capture it, I couldn't do anything with it
    - I had no way of getting at syslog to see if there was a SCSI
      reset
    - Upon reboot, there were some e2fsck errors that needed manual
      intervention, but nothing terribly damaging
    - I saw no evidence of a SCSI bus reset this time, but that's not
      saying it didn't happen ... with no way to access syslog before
      the reboot, it could very well have been the same as in the first
      incident (i.e. the bus resets, it gets logged, but not written
      to disk before the system is rebooted)

    Combing the list archives for the past little while, I did notice
    a patch introduced in 2.2.14 (which I verified was applied in this
    version) that alleges to fix a kernel oops when the SCSI bus resets.
    I wonder if that patch solves the whole problem.

    I'd give more info if I could. Unfortunately, the nature of my
    problem (the box has a single ext2 filesystem which is immediately
    remounted read-only when the crash occurs, preventing logging of
    useful messages) makes it rather difficult to collect data. If
    someone has tips as to how I can get some useful information, I'll
    be happy to give it a try.

    Finally, as to what might be causing the SCSI bus resets, I don't know
    ... perhaps the cat is playing with my temptingly dangly external SCSI
    cable or the flimsy power cord for the scanner. I could try unhooking
    the SCSI scanner from the box and setting termination on the card to
    eliminate that as a possibility. Nevertheless, a bus reset shouldn't
    cause my system to completely hang and require a reboot, should it?

    Ben

    --
        nSLUG       http://www.nslug.ns.ca      synrg@sanctuary.nslug.ns.ca
        Debian      http://www.debian.org       synrg@debian.org
        Chebucto    http://www.chebucto.ns.ca   aa458@chebucto.ns.ca
    [ pgp key fingerprint = 7F DA 09 4B BA 2C 0D E0  1B B1 31 ED C6 A9 39 4F ]
    

    - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



    This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 09:22:34 EST