[patch] timers again

From: Andrew Morton (andrewm@uow.edu.au)
Date: Sat Jun 17 2000 - 09:23:36 EDT

    This one fixes up the del_timer_sync deadlock detector a bit.

    - If a deadlock is detected, print the message and bust out, so the
      machine keeps running (in this case, del_timer_sync will end up
      acting like a really, really slow del_timer_async).

    - Limit the number of messages to 10.

    - Print some extra info (the timer handler address, a stack dump for
      x86).

    - I'm offended to discover that __builtin_return_address(0) returns
      the wrong address under gcc 2.7.2.3 with -fomit-frame-pointer so I
      open-coded the stack peek and made it x86-specific.

    - Moved the what-to-do instructions off of some random guy's web page
      and into Documentation/kernel-timers.txt
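
    The behaviour described in the list above (spin for a bounded number
    of iterations, report at most ten times, then give up instead of
    hanging) can be illustrated with a minimal userspace sketch. This is
    not the kernel code: `struct fake_timer`, `wait_for_handler()`,
    `SPIN_LIMIT` and `reports_left` are hypothetical names invented for
    the illustration, and the spin bound is an arbitrary assumption.

```c
#include <stdio.h>

/* Hypothetical stand-in for a kernel timer: "running" plays the role
 * of timer_is_running() and is set or cleared by the caller. */
struct fake_timer {
	volatile int running;
};

#define SPIN_LIMIT 100000	/* assumed bound, not the kernel's value */

static int reports_left = 10;	/* print at most ten diagnostics */

/* Returns 1 if the handler finished, 0 if we gave up waiting. */
static int wait_for_handler(struct fake_timer *t)
{
	long count = SPIN_LIMIT;

	/* Bounded busy-wait: stop when the handler is done or the
	 * budget is exhausted, whichever comes first. */
	while (t->running && --count)
		;
	if (count == 0) {
		if (reports_left) {
			--reports_left;
			fprintf(stderr, "wait_for_handler(%p): deadlock!\n",
				(void *)t);
		}
		return 0;	/* bust out so the caller keeps running */
	}
	return 1;
}
```

    Once the budget of ten reports is used up the function still bails
    out after the bounded spin, but it does so silently.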

    So, where do we now stand with timer deletion races?

    - Various fixlets for net/ipv4, net/core and net/ipv6 are in Alexey's
      hands.

    - With my earlier patch and the stuff which will trickle in from
      maintainers, the important net drivers are done.

    - IDE is safe.

    - drivers/char was mostly fixed in my timer_struct killer.

    OTOH:

    net/sunrpc: looks wrong - tasks can be woken after they've
                been removed from the wait queue.

    drivers/video: some races in the cursor flashing code.
                   Petr has a patch...

    drivers/scsi: Dunno yet. SCSI makes my brain hurt. It
                  could be that SCSI timers only ever expire under
                  catastrophic conditions, so we may not have to worry.

    drivers/net/slip.c: quagmire

    drivers/net/wan/*: hard to fix

    net/appletalk, net/decnet, net/ax25, net/ipx: look wrong, need work

    net/irda: looks wrong. Very hard to fix.

    drivers/net/pcmcia/*: probably not very important
                          for SMP. dhinds will be taking a look.

    Once SCSI is sorted we're OK for a classical SMP "server". But not for
    an SMP "desktop". It would be rather nice if more than one person was
    looking into this...

    --- linux-2.4.0-test1-ac19/kernel/timer.c Fri Jun 16 00:43:30 2000
    +++ linux-akpm/kernel/timer.c Sat Jun 17 23:21:21 2000
    @@ -244,9 +244,18 @@
                             while (timer_is_running(timer) && --count)
                                     ;
                             if (count == 0) {
    -                                printk( "del_timer_sync(%p): deadlock! Called from %p\n",
    -                                        timer, __builtin_return_address(0));
    -                                printk("See http://www.uow.edu.au/~andrewm/linux/deadlock.html\n");
    +                                static int ntimes = 10;
    +                                if (ntimes) {
    +                                        --ntimes;
    +                                        printk( "del_timer_sync(%p): deadlock!\n", timer);
    +                                        printk("handler=%p\n", timer->function);
    +                                        printk("See Documentation/kernel-timers.txt\n");
    +#ifdef CONFIG_X86
    +                                        printk("Called from %p\n", *((void **)&timer - 1));
    +                                        show_stack(0);
    +#endif
    +                                }
    +                                return ret;
                             }
                     }
             }
    --- linux-2.4.0-test1-ac19/Documentation/kernel-timers.txt Sat Jun 17 23:21:34 2000
    +++ linux-akpm/Documentation/kernel-timers.txt Sat Jun 17 16:54:34 2000
    @@ -0,0 +1,42 @@
    +Kernel timer deadlock diagnostics
    +Andrew Morton <andrewm@uow.edu.au>
    +17 June 2000
    +
    +Kernel 2.4.0-test introduced a mechanism to detect and break out of
    +deadlocks which may occur in a call to the del_timer_sync() function.
    +
    +If you're reading this file, then you have probably just seen a message like this:
    +
    + del_timer_sync(c0137680): deadlock!
    + handler=c0123340
    + See Documentation/kernel-timers.txt
    + Called from c0108890
    + cec7ff4c c0220803 00000000 d082913f 00000246 00000000 00000001 00000000
    + 00000001 00000000 d0828155 d08292a4 d0828a95 00000005 d0828a77 d0828445
    + 00000000 00000000 d0828795 00000000 00000100 cecc3f28 d0828052 ffffffea
    +Call Trace: [<c0227529> ... ]
    +
    +come out of your 2.4.x kernel.
    +
    +This means that the timer synchronisation code has detected a deadlock
    +condition which will have to be fixed.
    +
    +Please take the following steps to generate a call backtrace:
    +
    + 1. cd /usr/src/linux
    + 2. gdb vmlinux
    + 3. x/10i 0xc0108890 (The 'Called from' number above, with a '0x' in front)
    + 4. x/10i 0xc0123340 (The 'handler' address from above)
    + 5. x/10i 0xc0227529 (The 'Call Trace' number from above)
    + 6. x/10i .......... (Some more numbers from the 'Call Trace' list)
    +
    +Omit any addresses which do not start with "c0" (This is x86 specific.
    +If you're running another platform, you probably know what to do).
    +
    +Please send an email to linux-kernel@vger.rutgers.edu describing
    +
    +- what has happened
    +- how frequently you are seeing it
    +- the output from the gdb command above
    +- system description as per the REPORTING-BUGS file
    +
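
    The "omit any address which does not start with c0" rule from the
    documentation file can be expressed as a small filter. The sketch
    below is only an illustration: `looks_like_kernel_text()` is a
    hypothetical helper, and it assumes the dump words are eight
    lowercase hex digits as in the sample output, on an x86 kernel of
    this era.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if `word` is an 8-digit hex value beginning with "c0",
 * i.e. a candidate to feed to gdb's x/10i; returns 0 otherwise. */
static int looks_like_kernel_text(const char *word)
{
	size_t i;

	if (strlen(word) != 8 || strncmp(word, "c0", 2) != 0)
		return 0;
	for (i = 0; i < 8; i++)
		if (!isxdigit((unsigned char)word[i]))
			return 0;
	return 1;
}
```

    Applied to the sample dump, c0220803 would be kept, while d082913f
    and 00000246 would be skipped.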
