Re: [PATCH] Fast csum_partial_copy_generic and more

From: Artur Skawina (skawina@geocities.com)
Date: Fri May 19 2000 - 16:15:47 EDT


    kumon@flab.fujitsu.co.jp wrote:
    >
    > Unfortunately, the AS version does not show a significant gain. If the
    > cache is hit, it may show some advantage. But unfortunately, in the

    quite possible. it seems, assuming your numbers are accurate, i gave
    up investigating the prefetching too early. it was pretty obvious
    that on a p3 the prefetch instructions would give a speedup, but
    i wasn't sure the dummy read overhead would be worth it on p2.

    [if anybody wants to play with prefetch, you could start by
     adding two "prefetch" insns to the top of the loop. As these
     should do the right thing, won't generate exceptions, and can
     be trivially bypassed for older cpus, i'd expect the results
     to be even more spectacular. I don't have a prefetch-capable
     cpu to test this on, however...]
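
    A rough userspace sketch of the idea above, in C rather than the
    kernel's assembly. Everything in it is an assumption made for
    illustration: the name csum_copy_prefetch, the 32-byte block and
    64-byte lookahead, the choice of one hint for the source and one
    for the destination, and GCC/Clang's __builtin_prefetch standing
    in for the "prefetch" insn. The checksum is simplified to 32-bit
    words with end-around carry, and len must be a multiple of 4. It
    has not been measured, so whether it helps is still an open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK     32   /* bytes handled per outer-loop pass (assumed) */
#define LOOKAHEAD 64   /* prefetch distance in bytes (assumed, untuned) */

/* Hypothetical stand-in for the real routine: copies len bytes from src
 * to dst and returns sum plus all 32-bit words, with carries folded back. */
static uint32_t csum_copy_prefetch(const unsigned char *src, unsigned char *dst,
                                   size_t len, uint32_t sum)
{
    uint64_t acc = sum;
    size_t i, j, end;

    assert(len % 4 == 0);   /* simplification: no odd-tail handling */

    for (i = 0; i < len; i += BLOCK) {
        /* Two hints at the top of the loop. They are only hints: a
         * prefetch of an address past the buffer does not fault. */
        __builtin_prefetch((const void *)((uintptr_t)src + i + LOOKAHEAD), 0, 0);
        __builtin_prefetch((const void *)((uintptr_t)dst + i + LOOKAHEAD), 1, 0);

        end = (len - i < BLOCK) ? len : i + BLOCK;
        for (j = i; j < end; j += 4) {
            uint32_t w;

            memcpy(&w, src + j, 4);   /* alignment-safe load */
            memcpy(dst + j, &w, 4);   /* copy */
            acc += w;                 /* accumulate; carries land in the high half */
        }
    }

    /* Fold the carries back into 32 bits (end-around carry). */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}
```

    On a cpu without prefetch support the two hint lines can simply be
    compiled out, which is the "trivially bypassed" part.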

    > Strictly speaking, this prefetch may read just past the source region, by
    > at most 3 bytes. But it never causes trouble, because this excessive area

    what you could do is to not use SRC(), but have a dummy exception
    handler. (yeah, this would solve Andrea's "buffer overflow" too ;)

    I'll play with the patch, try to reproduce your numbers, and see
    if merging both patches would be a win.
    It likely won't happen until after the weekend, however.

    artur
