Re: kernel timer races

From: Andrey Savochkin (saw@saw.sw.com.sg)
Date: Sat May 27 2000 - 01:16:32 EDT

  • Next message: dart@iowaone.net: "epic100 && >=2.3.99pre7 && ping -f == reproducible lockup"

    Andrew,

    On Sat, May 27, 2000 at 02:37:04PM +1000, Andrew Morton wrote:
    > Andrey Savochkin wrote:
    > > It's more than del_timer_sync() because it ensures that the code segment
    > > isn't used by timer handler, and, thus, is suitable for module cleanups.
    >
    > Could you please explain this a little more? I don't see it.

    I meant that the 2.3.99 del_timer_sync() waits for timer_exit() call
    in the handler. The proposed code of wait_on_timers() waits for the end of
    run_timer_list() and, thus, guarantees that the handler has _returned_, not
    just called timer_exit().

    It means that
            set_do_not_readd_flag
            del_timer
            wait_on_timers()
            MOD_DEC_USE_COUNT
    is safe, while
            del_timer_sync() /* 2.3.99 implementation */
            MOD_DEC_USE_COUNT
    isn't.

    > Were you referring to the existing (2.3.99) del_timer_sync()?

    Yes.

    > > I've looked more precisely over your timer-race-6 patch.
    > > It has not so big overhead as I expected for the first time.
    > > However, I think it has a potential problem. In del_timer() you drop the
    > > spinlock, wait for the handler completion, and then acquire the lock again
    > > and check if the timer has been added again. It doesn't look very robust.
    >
    > You are correct.
    >
    > But remember that this whole code path is a one-in-a-million thing. So
    > having another one-in-a-million race within it has improved things by,
    > umm, a million. Plus if it ever does happen, we blurt out a warning,
    > and the end result is the same as what we have now.

    I'm just proposing to take the assumption that the handler is non-destructive
    if the code calls del_timer() and simplify and harden the code.

    We cannot touch timer_list structure in run_timer_list() because the handler
    _may_ destroy it's sure that other code doesn't call del_timer() or
    mod_timer() on it at the same time.

    But in del_timer() we may assume that the structure is valid during the whole
    call. I suspect that in your timer-race-6 code we may just go to the
    beginning of del_timer function after spinning on "*vtl == timer" and drop
    all the added_timer stuff.

    >
    > But I'll look at adding a 'goto retry' into that code path. Just have
    > to work out how to test it...

    [snip]
    > > I suggest to assume that if del_timer is called than the timer cannot be free'd
    > > or destroyed. If it isn't true, there will be race conditions in any case.
    > > The assumption will simplify things a lot.
    >
    > That is the assumption which the existing del_timer_sync() makes.
    >
    > Having recently looked at a zillion timer handlers I believe this
    > assumption is safe. But adding timer_exit() to every timer handler is
    > unattractive for obvious reasons (ditto the reinsert-protection flag).

    Yes.

    > The existing del_timer_sync also raises the possibility of the module
    > being unloaded prior to the handler returning, but this is so
    > ridiculously improbable that even I wouldn't make a fuss about it.

    Not too much ridiculous.
    It's a design problem. All design problems are problems independently of how
    often they show.

    >
    > > What do you think about it?
    >
    > I think I need to think some more :) Fun stuff.
    >
    > I hereby do solemnly swear to write Documentation/kernel-timers.txt!

    :-) You're taken at word!

    Best regards
                    Andrey

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.rutgers.edu
    Please read the FAQ at http://www.tux.org/lkml/



    This archive was generated by hypermail 2b29 : Sat May 27 2000 - 01:18:33 EDT