Re: [patch] string-486.h modified

From: Richard B. Johnson (root@chaos.analogic.com)
Date: Thu Aug 31 2000 - 10:41:28 EDT


    On Thu, 31 Aug 2000, Petko Manolov wrote:

    > Hi to all,
    >
    > I made this patch as some people request using
    > 486 optimized string routines for older
    > (486 and 586) machines.
    >

    With Intel processors, a 'rep' prefix will not execute its
    instruction at all if ecx is already zero, so you do not have to
    test for that first. Also, a taken jump often costs far more cycles
    than straight-line code. For instance, the fastest
    486 code for an unaligned copy is:

            movl SRC(%esp), %esi
            movl DST(%esp), %edi
            movl CNT(%esp), %ecx
            shrl $1,%ecx
            rep movsw
            adcl %ecx,%ecx
            rep movsb
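
    For anyone who does not read the flags tricks fluently, here is a rough
    C model of what those instructions do. It is only a sketch: the name
    copy_words_then_byte is made up for illustration, and the C compiler
    will not emit the same instructions. It shows why no test for a zero
    count is needed and how the carry from the shift becomes the count for
    the final byte move.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a C model of shrl / rep movsw / adcl / rep movsb. */
static void copy_words_then_byte(void *dst, const void *src, size_t cnt)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = cnt >> 1;   /* shrl $1,%ecx: ECX = cnt / 2        */
    size_t carry = cnt & 1;    /* ...and CF holds the bit shifted out */

    /* rep movsw: repeats ECX times, so zero times when ECX is 0. */
    while (words != 0) {
        memcpy(d, s, 2);       /* one 16-bit move */
        d += 2;
        s += 2;
        words--;
    }

    /* adcl %ecx,%ecx: ECX is 0 here, so the result is 0 + 0 + CF. */
    size_t tail = words + words + carry;

    /* rep movsb: copies the one odd byte, or nothing at all. */
    while (tail != 0) {
        *d++ = *s++;
        tail--;
    }
}
```

    A count of 0 falls straight through both loops, and a count of 5 moves
    two words followed by one byte.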

    If it's longword aligned, i.e., both source and destination addresses
    are clear in their low two bits, moving longwords through the edx
    register, with eax and ebx being the index registers, is faster, even with
    a beginning test for longword size.

            movl SRC(%esp), %eax
            movl DST(%esp), %ebx
            movl CNT(%esp), %ecx
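            testl $3, %ecx # is the count a whole number of longwords?
            jnz 2f # no: leave it to the byte/word copy
            shrl $2, %ecx # bytes -> longwords
            jz 2f # count was zero, nothing to copy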
    1: movl (%eax), %edx # Do NOT touch EAX in the next instruction
            movl %edx, (%ebx) # Do NOT touch EBX in the next instruction
            leal 4(%eax), %eax # Adjust EAX index now
            leal 4(%ebx), %ebx # Adjust EBX index now
            decl %ecx # does not change CY
            jnz 1b
     
    2:
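
    The same longword path can be sketched in C. This is an illustration
    under my own assumptions, not the kernel's routine: copy_longwords is a
    made-up name, it checks the two addresses as well as the count, and it
    returns 0 so that the caller can fall back to a byte/word copy when
    anything is not a multiple of four.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper. Returns 1 if the longword path handled the copy,
 * 0 if the caller has to fall back to a byte/word copy.
 */
static int copy_longwords(void *dst, const void *src, size_t cnt)
{
    /* The low two bits of both addresses and of the count must be clear. */
    if ((((uintptr_t)dst | (uintptr_t)src | cnt) & 3) != 0)
        return 0;

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t longs = cnt >> 2;       /* bytes -> longwords, like shrl $2 */

    /* Testing before the first move keeps a zero count from wrapping. */
    while (longs != 0) {
        uint32_t tmp;              /* plays the role of EDX */
        memcpy(&tmp, s, 4);        /* movl (%eax), %edx */
        memcpy(d, &tmp, 4);        /* movl %edx, (%ebx) */
        s += 4;                    /* leal 4(%eax), %eax */
        d += 4;                    /* leal 4(%ebx), %ebx */
        longs--;                   /* decl %ecx */
    }
    return 1;
}
```

    A caller would try this first and use the byte/word copy only when it
    returns 0.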

    To be able to run some instructions in parallel, you have to follow the
    idea shown in the above comments, i.e., don't touch an index register
    in the instructions immediately following its use to address memory.

    This will allow the memory access to occur during the parallel execution
    of the next instruction(s).

    The decl %ecx should really be put BETWEEN the two leal instructions so
    that the address calculation can occur in parallel with the register
    operation. LEA does not affect the flags, so the jnz still tests the
    result of the decl. In the example above I didn't do this
    because it makes the code unclear.
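
    To convince yourself that the reordering is safe, here is a toy C model
    of the flag behaviour. Everything in it (struct toy_cpu, toy_leal4,
    toy_decl, toy_loop) is invented for illustration, and it models only
    the zero flag, not timing. The point is that the decl sets ZF, the leal
    after it leaves ZF alone, and the jnz therefore sees the same result as
    in the original order.

```c
#include <stdint.h>

/* Toy register file: just the registers and the one flag the loop uses. */
struct toy_cpu {
    uint32_t eax, ebx, ecx;
    int zf;
};

/* leal 4(%reg), %reg: adds 4 and does not touch any flag. */
static void toy_leal4(uint32_t *reg)
{
    *reg += 4;
}

/* decl %ecx: decrements and sets ZF from the result. */
static void toy_decl(struct toy_cpu *c)
{
    c->ecx--;
    c->zf = (c->ecx == 0);
}

/*
 * The loop body in the reordered form. ECX must be nonzero on entry,
 * as at label 1 in the assembly. Returns the number of iterations.
 */
static unsigned toy_loop(struct toy_cpu *c)
{
    unsigned iterations = 0;

    do {
        /* the two movl instructions through EDX would go here */
        toy_leal4(&c->eax);    /* leal 4(%eax), %eax          */
        toy_decl(c);           /* decl %ecx, sets ZF          */
        toy_leal4(&c->ebx);    /* leal 4(%ebx), %ebx, ZF kept */
        iterations++;
    } while (!c->zf);          /* jnz 1b */

    return iterations;
}
```

    Starting with ecx = 3 the loop runs three times and both address
    registers advance by 12.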

    Registers used to address memory are not all the same on every x86. In
    16-bit addressing modes only BX, BP, SI and DI can address memory. The
    32-bit addressing modes, available since the i386, accept any general
    register as a base, so (%eax) works as well as (%ebx) here.

    Cheers,
    Dick Johnson

    Penguin : Linux version 2.2.15 on an i686 machine (797.90 BogoMips).

    "Memory is like gasoline. You use it up when you are running. Of
    course you get it all back when you reboot..."; Actual explanation
    obtained from the Micro$oft help desk.
