[PATCH] Fast csum_partial_copy_generic and more

From: kumon@flab.fujitsu.co.jp
Date: Fri May 19 2000 - 23:50:38 EDT

    kumon@flab.fujitsu.co.jp writes:
    > Strictly speaking, this prefetch may read at most 3 bytes just past
    > the source region. But it never causes trouble, because this excess
    > area and the last transferred byte reside in the same cache block.

    Sorry, I mixed up the explanation and the patch version.
    The above comment is based on long-word prefetching, but what I
    actually posted is a byte-prefetching version.

    In the posted version, the above comment does not apply.
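For the long-word variant that the quoted comment describes, the cache-block argument can be checked with a small C sketch. It is not part of the patch. It assumes the long-word load is 4-byte aligned and that a cache line is 32 bytes; neither detail is stated in this message, and the names `LINE_SIZE`, `aligned_word_start`, `overread_bytes`, `load_stays_in_line` and `check_all_positions` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* assumed cache line size in bytes */

/* Start address of the 4-byte aligned long word that contains addr. */
static uintptr_t aligned_word_start(uintptr_t addr)
{
    return addr & ~(uintptr_t)3;
}

/* Number of bytes past last_byte that an aligned 4-byte load covering it reads. */
static unsigned overread_bytes(uintptr_t last_byte)
{
    return (unsigned)(aligned_word_start(last_byte) + 3 - last_byte);
}

/* Nonzero if every byte of that load lies in the same cache line as last_byte. */
static int load_stays_in_line(uintptr_t last_byte)
{
    uintptr_t start = aligned_word_start(last_byte);
    return start / LINE_SIZE == last_byte / LINE_SIZE
        && (start + 3) / LINE_SIZE == last_byte / LINE_SIZE;
}

/* Check every byte position across several consecutive lines. */
static void check_all_positions(void)
{
    for (uintptr_t a = 0; a < 8 * LINE_SIZE; a++) {
        assert(overread_bytes(a) <= 3);
        assert(load_stays_in_line(a));
    }
}
```

Under these assumptions the load never reads more than 3 extra bytes and never leaves the line that holds the last transferred byte. An unaligned long-word load would not have this property.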

    I measured both versions; the performance difference between
    long-word prefetching and byte prefetching is almost unnoticeable.
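As a rough illustration of the two prefetch styles being compared, here is a C sketch of a copy loop that touches the next source line with a dummy load before copying the current one. It is only an analogy to the assembly in the patch, not a reproduction of it: the 32-byte `LINE_SIZE`, the function name `copy_with_touch` and the `use_word` flag are assumptions for this example, and unlike the long-word variant discussed above, the sketch keeps every touch inside the source buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u   /* assumed cache line size in bytes */

/*
 * Copy n bytes from src to dst. Before each line-sized chunk is copied,
 * the start of the following chunk is touched with a dummy load.
 * use_word != 0 selects a 4-byte touch (in the spirit of movl),
 * use_word == 0 selects a 1-byte touch (in the spirit of movb).
 */
static void copy_with_touch(unsigned char *dst, const unsigned char *src,
                            size_t n, int use_word)
{
    volatile uint32_t sink = 0;  /* keeps the touch loads from being dropped */

    for (size_t off = 0; off < n; off += LINE_SIZE) {
        size_t next = off + LINE_SIZE;
        size_t chunk = (n - off < LINE_SIZE) ? n - off : LINE_SIZE;

        if (use_word && next + sizeof(uint32_t) <= n) {
            uint32_t w;
            memcpy(&w, src + next, sizeof w);  /* long-word touch, in bounds */
            sink = w;
        } else if (next < n) {
            sink = src[next];                  /* byte touch */
        }
        memcpy(dst + off, src + off, chunk);
    }
    (void)sink;
}
```

Both modes produce the same copy; only the width of the touching load differs. The sketch says nothing about timing, since the compiler and the CPU are free to schedule these loads differently from the hand-written assembly.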

    According to the Intel documents, a byte access to part of a long-word
    register may cause a partial-register stall, so I think it is better
    to use movl instead of movb. IMHO, this applies to my case.

    Anyway, the following is architecturally better, ethically worse:

    + SRC(movl -32(%edx),%ebx) ; SRC(movl (%edx),%ebx)



