spinlocks() are severely broken in 2.2.X and 2.4.X for modules

From: Jeff V. Merkey (jmerkey@timpanogas.com)
Date: Fri Jun 30 2000 - 18:31:02 EDT

  • Next message: Tim Waugh: "[patch] devfs_unregister documentation"

    I have spent the past five and one half weeks chasing down a severe
    memory corruption problem in the NWFS LRU. The problem I am reporting
    is what has caused ALL the bugs folks have seen over the past couple of
    months in NWFS with memory corruption. It's what's held up our next
    release by over two months. Despite screaming customers, I have held
    off on posting the next release until I understood clearly what was
    going on -- now I know -- Linux spinlocks() don't work when used by
    kernel modules in linux that include spinlock.h.

    NWFS is fully multi-threaded and uses fine grained locking. Much of
    Linux is not fine grained, which would explain why I may be one of the
    few folks to first see this problem. Initially, I was under the
    assumption that the spinlocks() in Linux worked, and because of this,
    focused my attention on my own code rather than scrutinize the code in
    Linux. I rewrote large sections of NWFS, put in traps, and checks, list
    consistency routines, etc. It was not until I put in a check for
    multiple users being inside the same lock that I located the problem.
    Well, I have completed a very thorough analysis of generated IA32 code,
    and I have discovered something rather shocking, which is that the
    spinlock code in Linux is severely broken, and this is not due to a
    coding error, but a problem with the GCC compiler apparently generating
    garbage. There's also several issues with using "lock bts" instead of
    "xchg eax, 1", which is the recommended method for implementing
    spinlocks() on IA32 intel systems.

    What's really scary here is that a great deal of new code has been
    written that depends on this spinlock code, and once the spinlock code
    gets fixed properly, we may see tons of deadlocks and lockups all over
    the place since this code has been there for a very long time.

    All the NWFS code that manifests this problem was built with the
    following GCC compiler and linker switches:

    Compiler Switches
    -----------------

    -D__SMP__ -DMODVERSIONS -DMODULE -D__KERNEL__ -Wall -Wstrict-prototypes
    -O2 -fomit-frame-pointer -fno-strength-reduce -pipe -m386 -DCPU=386
    -I/usr/src/linux/include -c

    Linker Switches
    ---------------

    -m elf_i386 -Map nwfs.map -r -o

    The code fragment in /usr/src/linux/include/asm/spinlock.h 2.2.16 that
    generates the spin_lock() code is as follows:

    #define spin_lock_string \
            "\n1:\t" \
            "lock ; btsl $0,%0\n\t" \
            "jc 2f\n" \
            ".section .text.lock,\"ax\"\n" \ // THIS CODING CONSTRUCT IS NOT
    WORKING WHEN INCLUDED BY A KERNEL MODULE
            "2:\t" \
            "testb $1,%0\n\t" \
            "jne 2b\n\t" \
            "jmp 1b\n" \
            ".previous"

    This macro is clearly trying to generate a code section that a failing
    lock attempt will jump to if the carry flag was set because the lock was
    already engaged. Unfortunately, for kernel modules that include the
    file spinlock.h this is clearly not what happens. The following code
    fragment is a call to spin_lock_irqsave() within a function called
    NWLockLRU(). As you can see from the code fragment, the lock, upon
    seeing the carry flag set because someone else held the lock, jumps into
    a completely different function's code and returns as though the user
    acquired the lock.

    With regard to 2.4.X kernels, the code fragment in
    /usr/src/linux/include/asm/spinlock.h 2.4.0-test2 that generates the
    spinlock() function code is as follows:

    #define spin_lock_string \
            "\n1:\t" \
            "lock ; decb %0\n\t" \
            "js 2f\n" \
            ".section .text.lock,\"ax\"\n" \ // THIS CODING CONSTRUCT IS NOT
    WORKING WHEN INCLUDED BY A KERNEL MODULE
            "2:\t" \
            "cmpb $0,%0\n\t" \
            "rep;nop\n\t" \
            "jle 2b\n\t" \
            "jmp 1b\n" \
            ".previous"

    We have an identical situation on 2.4.X here in that the spinlock, upon
    failing, jumps into an entirely different code section. The function
    NWLockLRU() is coded as follows in NWFS:

    void NWLockLRU(LRU_HANDLE *lru_handle)
    {
       spinlock_irqsave(&lru_handle->LRU_spinlock, lru_handle->LRU_flags);
       return;
    }

    what gets generated from spinlock.h and GCC in the 2.4.X case is as
    follows:

    <NWLockLRU>
    0x45a0 mov 0x4(%esp, 1),%edx
    0x45a4 pushf
    0s45a5 pop %eax
    0x45a6 cli
    0x45a7 mov %eax,0x80(%edx)
    0x45ad lock decb 0x7C(%edx)
    0x45b1 js 0x4639 <ReleaseWaiters+49> // if the lock fails it
    jumps to this address is in the function ReleaseWaiters() ????
    0x45b7 ret

    <ReleaseWaiters>
    0x4608 push %esi
    0x4609 push %ebx
    0x460a mov 0xc(%esp,1),%ebx
    0x460e cmpl $0x0,0x484(%ebx)
    0x4615 je 0x463d <ReleaseWaiters+53>
    0x4517 lea 0x4c(%ebx),%esi
    0x461a lea 0x0(%esi),%esi
    0x4620 push %esi
    0x4621 call 0x4622 <ReleaseWaiters+26>
    0x4626 mov 0x484(%ebx),%edx
    0x462c lea 0xFFFFFFFF(%eax),%edx
    0x462f mov %edx,0x484(%ebx)
    0x4635 add $0x4,%esp
    0x4638 cmp $0x1,%eax
    0x463b jne 0x4620 <ReleaseWaiters+24> // the call to
    spinlock_irqsave() ends up jumping here, which will exit and allow two
    0x463d pop %ebx // processors to acquire
    the lock at the same time.
    0x463e pop %esi
    0x463f ret

    This is clearly a severe problem. It is uncertain, however, whether it
    is occurring in the kernel proper, or if it only manifests itself when
    kernel modules, like file system drivers, include spinlock.h. It's very
    possible that a large number of drivers could be broken with this, since
    it is gneerated form a macro rather than a kernel function proper that
    is linked when a module loads. It appears that any drivers that have
    been built using these constructs may see this problem. What the code
    is trying to do is not what the compiler is actually outputing into the
    executable code.

    MANOS/NETWARE SPINLOCK IMPLEMENTATION
    --------------------------------------

    This is the spinlock code I wrote for MANOS (an open source os like
    NetWare) and also the code I wrote for the spinlocks that are used in
    NetWare today. I have tested a version of this code with GCC and NWFS
    to get around the spinlock() problem in Linux, and it seems to work
    correctly. It's provided as an example that uses the [xchg] instruction
    instead of "lock bts".

    _TEXT SEGMENT PUBLIC USE32 'CODE'

        ASSUME cs:_TEXT, ds:_DATA, es:_DATA, fs:_DATA

    ;
    ; void spin_lock(ULONG *lock);
    ;

    public spin_lock
    spin_lock:
          mov ecx, [esp + 4]
          xor edx, edx

    lock_xchg:
          mov eax, 1
          xchg eax, dword ptr [ecx]
          or eax, eax
          jz lock_acquired

    spin_test:
          inc edx
          cmp edx, 1FFFFFFFh
          jg deadlock_detected

          cmp dword ptr [ecx], 0
          jnz spin_test
          jmp lock_xchg

    lock_acquired:
          ret

    deadlock_detected:
          push OFFSET spin_lock_deadlock
          call panic
          add esp, 4
          xor eax, eax
          ret
    ;
    ; void spin_unlock(ULONG *lock);
    ;

    public spin_unlock
    spin_unlock:
          mov eax, dword ptr [esp + 4]
          mov dword ptr [eax], 0

    spin_unlock_exit:
          ret
    ;
    ; void spin_try_lock(ULONG *lock);
    ;

    public spin_try_lock
    spin_try_lock:
          mov ecx, [esp + 4]

          cmp dword ptr [ecx], 0
          jnz lock_busy

          mov eax, 1
          xchg eax, dword ptr [ecx]
          or eax, eax
          jnz lock_busy

    lock_obtained:
          mov eax, 1
          ret

    lock_busy:
          xor eax, eax
          ret

    I am at present now using this code (sorry about the MASM'ish structure
    -- gcc has problems with it, but TASM does not and can output COFF and
    ELF format) since the spinlock code in Linux does not seem to work
    properly with external kernel modules. I have not reviewed kernel
    binaries images to determine if the kernel proper has this problem, but
    would strongly advise someone go and look at it and see.

    Please advise.

    Jeff Merkey
    CEO, TRG

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.rutgers.edu
    Please read the FAQ at http://www.tux.org/lkml/



    This archive was generated by hypermail 2b29 : Fri Jun 30 2000 - 18:38:45 EDT