Re: linux 2.2.18-pre17: "Kernel panic: LRU list corrupted"

From: Andrea Arcangeli (andrea@suse.de)
Date: Tue Oct 24 2000 - 23:06:31 EDT


    On Mon, Oct 23, 2000 at 02:20:17PM -0700, H. Peter Anvin wrote:
    > Hi there,
    >
    > I wanted to let you know that I was trying 2.2.18-pre17 on
    > hera.kernel.org, a uniprocessor with an SMP motherboard. After about six
    > hours, it went catatonic, responding to pings and TCP SYNs but not doing
    > anything that required user space.
    >
    > On the console, it had multiple copies of the message:
    >
    > "Kernel panic: LRU list corrupted" [fs/buffer.c:438]
    >
    > ... but no register dump.
    >
    > I have fallen back to 2.2.17 and it has run stably for a few days now.

    I found one bug that can generate that kind of corruption and lockups, and it's
    in 2.2.17 too (it was also in the 2.2.18pre*aa kernels, even though some VM
    changes I made there made it extremely hard to reproduce).

    I fixed it in 2.2.18pre17aa1 (I suggest giving 2.2.18pre17aa1 a try, btw).

    I also included the fix in a new VM-global patch against vanilla 2.2.18pre17
    (the VM-global patch is also available as a single patch inside the
    2.2.18pre17aa1/ directory, but I have to maintain a separate version of it
    against clean 2.2.18pre17 due to silly rejects that I can't avoid):

            ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre17/VM-global-2.2.18pre17-7.bz2

    (I reproduced the hang with 2.2.18pre17aa1 while testing LVM snapshotting,
    because while an LV is under snapshot [as also while using raid5] WRITEA
    will block too.)

    Vanilla 2.2.18pre17 reproduces this bug an order of magnitude more easily,
    since it blocks there all the time; in my tree I had partly changed that
    blocking behaviour for performance reasons. That's why people reported that
    the VM-global patch "cured" the problem, but really it still had a small
    window for the bug too.
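    One plausible reading of the strict fix, sketched as a userspace model. All
    names here (struct buf, struct pg, reentrant_reclaim(), blocking_write(),
    sync_ring(), run_demo()) are hypothetical stand-ins, not kernel code. The
    model assumes the hazard is a re-entrant pass that, while
    sync_page_buffers() sleeps in ll_rw_block(), reaches the same page through
    page->buffers and frees its buffers, so the original walker then follows
    stale b_this_page links. Clearing page->buffers for the duration of the
    walk and restoring it afterwards, as the patch in this message does, hides
    the ring from such a pass. The real call chain is not shown in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct buffer_head / struct page.
 * These are NOT the kernel definitions. */
struct buf {
    int freed;              /* set when the simulated reclaim frees it */
    struct buf *this_page;  /* circular ring of buffers on one page */
};

struct pg {
    struct buf *buffers;    /* head of the ring, or NULL */
};

static int reclaim_calls;   /* times the re-entrant pass found buffers */

/* Hypothetical re-entrant reclaim: if the page still advertises buffers,
 * mark them all freed and detach them from the page. */
static void reentrant_reclaim(struct pg *page)
{
    struct buf *bh = page->buffers, *tmp = bh;

    if (!bh)
        return;             /* buffers hidden or gone: nothing to do */
    reclaim_calls++;
    do {
        tmp->freed = 1;
        tmp = tmp->this_page;
    } while (tmp != bh);
    page->buffers = NULL;
}

/* Models ll_rw_block() sleeping: another path runs reclaim meanwhile. */
static void blocking_write(struct pg *page, struct buf *p)
{
    (void)p;
    reentrant_reclaim(page);
}

/* Walk the ring and "write" each buffer. With hide set, detach
 * page->buffers during the walk and restore it afterwards. Returns how
 * many buffers were already freed when the walker reached them (a
 * use-after-free in a real kernel; here it is only a flag). */
static int sync_ring(struct pg *page, int hide)
{
    struct buf *bh = page->buffers, *tmp = bh;
    int stale = 0;

    if (hide)
        page->buffers = NULL;
    do {
        struct buf *p = tmp;
        tmp = tmp->this_page;
        if (p->freed)
            stale++;
        blocking_write(page, p);
    } while (tmp != bh);
    if (hide)
        page->buffers = bh;
    return stale;
}

/* Build a 4-buffer ring on one page and run sync_ring() on it. */
static int run_demo(int hide, int *reclaim_out)
{
    struct buf ring[4];
    struct pg page;
    int stale;

    for (int i = 0; i < 4; i++) {
        ring[i].freed = 0;
        ring[i].this_page = &ring[(i + 1) % 4];
    }
    page.buffers = &ring[0];
    reclaim_calls = 0;
    stale = sync_ring(&page, hide);
    assert(!hide || page.buffers == &ring[0]); /* head restored */
    *reclaim_out = reclaim_calls;
    return stale;
}
```

    In this model, run_demo(0, &calls) reports 3 stale buffers touched after a
    single reclaim pass, while run_demo(1, &calls) reports none and the ring
    head is restored when the walk finishes.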

    So now I have ported the strict fix to clean 2.2.18pre17. It's untested, but
    I'm almost sure it will fix the problem there too.

    --- 2.2.18pre17/fs/buffer.c.~1~ Tue Sep 5 02:28:47 2000
    +++ 2.2.18pre17/fs/buffer.c Wed Oct 25 04:38:34 2000
    @@ -1468,10 +1468,13 @@
     #define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
     #define buffer_busy(bh) ((bh)->b_count || ((bh)->b_state & BUFFER_BUSY_BITS))
     
    -static int sync_page_buffers(struct buffer_head *bh, int wait)
    +static int sync_page_buffers(struct page * page, int wait)
     {
    +       struct buffer_head * bh = page->buffers;
             struct buffer_head * tmp = bh;
     
    +       page->buffers = NULL;
    +
             do {
                     struct buffer_head *p = tmp;
                     tmp = tmp->b_this_page;
    @@ -1482,6 +1485,8 @@
                             ll_rw_block(WRITE, 1, &p);
             } while (tmp != bh);
     
    +       page->buffers = bh;
    +
             do {
                     struct buffer_head *p = tmp;
                     tmp = tmp->b_this_page;
    @@ -1533,7 +1538,7 @@
      busy:
             too_many = (nr_buffers * bdf_prm.b_un.nfract/100);
     
    -       if (!sync_page_buffers(bh, wait)) {
    +       if (!sync_page_buffers(page_map, wait)) {
     
                     /* If a high percentage of the buffers are dirty,
                      * wake kflushd

    The above strict version of the fix is downloadable from here too:

            ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre17/strict-VM-corruption-fix-1

    Andrea
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    Please read the FAQ at http://www.tux.org/lkml/


