Re: (shm WARNING) shm still deadlocks

From: Manfred Spraul (manfreds@colorfullife.com)
Date: Mon Mar 20 2000 - 09:52:14 EST

    From: "Christoph Rohland" <hans-christoph.rohland@sap.com>
    >
    > I tried to reproduce this on 2.3.99-pre3-2. I ran 60 recursive
    > wget sessions over 100 Mbit against my apache and saw no lockup at
    > all. This is on SMP and UP, under memory pressure and without.
    >
    > Do you use any special apache modules?

    I don't have enough time for a full review of the recent changes, but at
    least the comment in IPC_RMID is dangerous:

        up(&shm_ids.sem);
        /* The kernel lock prevents new attaches being happening. [...]
        */
        shm_remove_name(id)

    This is unsafe. The scheduler automatically releases the big kernel
    lock whenever a task sleeps, so if VFS sleeps on a semaphore, or if
    shm_unlink sleeps on down(&shm_ids.sem), new attaches are no longer
    prevented and we get a crash.
    Unless VFS protects us, the following could happen:

    CPU0:
        sys_shmctl(IPC_RMID);
        up(&shm_ids.sem);
        do_unlink();
    CPU1:
        sys_shmget();
        down(&shm_ids.sem);
    CPU0:
        shm_unlink(): calls down(&shm_ids.sem);
            <<<< sleeps and releases the kernel lock.
    CPU1:
        sys_shmget() returns with success.
        sys_shmat(). attaches.

    Now the scheduler decides that the shm_unlink operation should continue,
    and it removes a segment that CPU1 has just attached: crash.
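
The interleaving above can be replayed as a small user-space model. This is only a sketch: struct seg_model, replay_rmid_race() and the recheck_after_sleep flag are hypothetical names invented for illustration, not kernel code, and the recheck is an assumed mitigation rather than the fix that went into the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one shm segment.
 * This is NOT the kernel's data structure. */
struct seg_model {
    bool id_valid;   /* segment can still be found by shmget()/shmat() */
    int  attaches;   /* number of current attaches */
    bool destroyed;  /* backing object has been torn down */
};

/*
 * Replay the CPU0/CPU1 interleaving as sequential steps.
 * Returns true if the segment ends up destroyed while still attached.
 * recheck_after_sleep is an assumed mitigation, not the kernel's fix.
 */
bool replay_rmid_race(bool recheck_after_sleep)
{
    struct seg_model s = { .id_valid = true, .attaches = 0, .destroyed = false };

    /* CPU0: sys_shmctl(IPC_RMID) does up(&shm_ids.sem) and enters
     * do_unlink(). The id is still valid at this point. */

    /* CPU0: shm_unlink() sleeps in down(&shm_ids.sem). Sleeping drops
     * the kernel lock, so nothing stops CPU1 from running now. */

    /* CPU1: sys_shmget() succeeds and sys_shmat() attaches. */
    if (s.id_valid)
        s.attaches++;

    /* CPU0: wakes up with the semaphore and continues the unlink. */
    if (recheck_after_sleep && s.attaches > 0)
        return false;   /* state changed while sleeping: back off */

    s.id_valid = false;
    s.destroyed = true;

    assert(s.attaches >= 0);
    return s.destroyed && s.attaches > 0;
}
```

Calling replay_rmid_race(false) reports the bad outcome (destroyed while attached). Calling replay_rmid_race(true) shows that re-validating the state after any point where the task may have slept avoids it in this model.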

    And a cosmetic note: the big kernel lock is more important (it is
    global) than the shm_ids.sem semaphore, so it should be taken last.
    I would reorder

        lock_kernel();
        down(&shm_ids.sem);
    to
        down(&shm_ids.sem);
        lock_kernel();

    This was the standard order for the mmap semaphore and lock_kernel in 2.2.
    Neither ordering will lock up, because the kernel lock is released
    whenever the task sleeps on the semaphore.
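
Why the order is only cosmetic can be sketched with a tiny lock-state model. The names struct lock_model and opposite_orders_deadlock() are hypothetical, and the model only tracks who owns which lock. With an ordinary lock, two tasks taking the locks in opposite orders wait on each other forever. With a lock that is dropped on sleep, as the kernel lock is, they do not.

```c
#include <assert.h>
#include <stdbool.h>

enum { NOBODY = -1 };

/* Hypothetical lock-state model: who holds which lock (task 0 or 1). */
struct lock_model {
    int bkl_owner;  /* stands in for lock_kernel() */
    int sem_owner;  /* stands in for down(&shm_ids.sem) */
};

/*
 * Task 0 takes the kernel lock first, task 1 takes the semaphore first.
 * Returns true if the two tasks end up waiting on each other forever.
 */
bool opposite_orders_deadlock(bool bkl_dropped_on_sleep)
{
    struct lock_model l = { .bkl_owner = NOBODY, .sem_owner = NOBODY };

    l.bkl_owner = 0;             /* task 0: lock_kernel() */
    l.sem_owner = 1;             /* task 1: down(&sem) */

    /* task 0: down(&sem) finds the semaphore taken and sleeps. */
    if (bkl_dropped_on_sleep)
        l.bkl_owner = NOBODY;    /* scheduler releases the kernel lock */

    /* task 1: lock_kernel() */
    if (l.bkl_owner != NOBODY)
        return true;             /* task 0 holds it while waiting on task 1 */
    l.bkl_owner = 1;

    /* task 1 finishes and releases both locks. */
    l.bkl_owner = NOBODY;
    l.sem_owner = NOBODY;

    /* task 0 wakes up: gets the semaphore, then reacquires the kernel lock. */
    l.sem_owner = 0;
    l.bkl_owner = 0;
    assert(l.sem_owner == 0 && l.bkl_owner == 0);
    return false;
}
```

In this model opposite_orders_deadlock(true) returns false and opposite_orders_deadlock(false) returns true, which is the difference between the kernel lock and an ordinary lock held across a sleep.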

    --
        Manfred
    
