Re: Interesting analysis of linux kernel threading by IBM

From: J.D. Bakker (bakker@thorgal.et.tudelft.nl)
Date: Fri Jan 21 2000 - 21:59:22 EST

    At 19:21 +0100 21-01-2000, Davide Libenzi wrote:
    >Ian Soboroff <ian@cs.umbc.edu> wrote :
    >> in the vast majority of cases, i suspect it's easier and probably
    >> better to redesign the app than redesign the scheduler. that said,
    >> the improvements already done are quite good and needed.
    >
    >Why do you all talk about badly designed multithreaded apps?
    >Any technology that is badly applied performs worse!
    >Suppose you have a rendering job to be done.
    >It can be subdivided into a highly parallel system with distinct threads that
    >can run concurrently:
    >
    >1) Viewing transformation
    >2) Triangulation
    >3) Scan conversion
    >4) Texturing
    >5) Illumination
    >6) Frame output
    >
    >just to keep it simple.
    >Now can someone tell me why I would not split my job into threads?

    Oh, you can split your job into threads. But if you do it like that, you
    will take a performance hit, especially on SMP.

    As I see it, there are two ways to do multithreaded apps: one way (the one
    you show) which is easy for the programmer, and one way that yields high
    performance. Think about it: if you have multiple threads communicating in
    a pipeline through shared memory, you will have to bounce *lots* of data
    between processor caches. The scheduler has little or no impact on that.

    When I started multithreading our real-time video compressor, I used the
    model you sketched. Profiling on SMP machines soon showed that the system
    didn't scale; after reading the kernel source and some (IA32) processor
    manuals I saw The Light(tm). The key to getting high performance is
    avoiding cache line ping-pong and multiple threads writing to the same
    pages, two factors that would be ridiculous to try and handle in kernel
    space.
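
    To illustrate the cache line ping-pong point, here is a minimal sketch
    (not code from the compressor described above) of per-thread counters laid
    out so that no two threads write to the same cache line. The 64-byte line
    size, the MAX_THREADS limit, and the names padded_counter, count_worker and
    run_counters are all assumptions made for this example; check the line
    size of your own CPU before relying on it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed cache line size. 64 bytes is common on current x86 parts, but it
 * is an assumption for this sketch, not a figure from the post. */
#define CACHE_LINE 64
#define MAX_THREADS 16

/* One counter per thread. The alignment forces the struct size up to a full
 * cache line, so neighbouring array elements never share a line. */
struct padded_counter {
    _Alignas(CACHE_LINE) unsigned long value;
};

static struct padded_counter counters[MAX_THREADS];

struct job {
    int index;                /* which counter this thread owns */
    unsigned long iterations; /* how many increments to perform */
};

static void *count_worker(void *arg)
{
    const struct job *job = arg;

    for (unsigned long i = 0; i < job->iterations; i++)
        counters[job->index].value++;   /* writes only to its own line */
    return NULL;
}

/* Hypothetical helper: runs nthreads workers and returns the sum of all
 * counters once every worker has finished. */
unsigned long run_counters(int nthreads, unsigned long iterations)
{
    pthread_t tids[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int started[MAX_THREADS];
    unsigned long total = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (int i = 0; i < nthreads; i++) {
        counters[i].value = 0;
        jobs[i].index = i;
        jobs[i].iterations = iterations;
        started[i] = pthread_create(&tids[i], NULL, count_worker, &jobs[i]) == 0;
        if (!started[i])
            count_worker(&jobs[i]);     /* fall back to running inline */
    }
    for (int i = 0; i < nthreads; i++) {
        if (started[i])
            pthread_join(tids[i], NULL);
        total += counters[i].value;
    }
    return total;
}
```

    Without the alignment, several of these counters would sit in one cache
    line, and every increment on one CPU would invalidate that line in the
    caches of the others, even though the threads never touch each other's
    data.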

    The model that proved optimal is data-parallel (SIMD in spirit): splitting
    the image into multiple slices, each handled by a CPU-bound worker thread.
    I now use as many worker threads as there are CPUs in the system, plus two
    higher priority I/O threads (and I'm trying to get rid of one of those).
    This buys me near-linear speedup on a 4-CPU box where the old model would
    get 60% on a good day.
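
    The slice model can be sketched roughly as follows. This is an
    illustration under assumptions, not the compressor's actual code: the
    per-pixel work (inverting an 8-bit greyscale image) is a stand-in, the
    names slice_worker, worker_count and process_image are hypothetical, and
    the two I/O threads are left out. sysconf(_SC_NPROCESSORS_ONLN) is
    available on Linux/glibc but is not guaranteed on every POSIX system.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WORKERS 64

struct slice {
    unsigned char *pixels;  /* first byte of this worker's rows */
    size_t length;          /* number of bytes in the slice */
};

/* Stand-in for the real per-slice work: invert every pixel in place.
 * Each worker reads and writes only its own slice. */
static void *slice_worker(void *arg)
{
    struct slice *s = arg;

    for (size_t i = 0; i < s->length; i++)
        s->pixels[i] = (unsigned char)(255 - s->pixels[i]);
    return NULL;
}

/* One worker per online CPU, clamped to [1, MAX_WORKERS]. */
int worker_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    if (n < 1)
        n = 1;
    if (n > MAX_WORKERS)
        n = MAX_WORKERS;
    return (int)n;
}

/* Split the image into nworkers horizontal slices and process them in
 * parallel; returns once every slice is done. */
void process_image(unsigned char *image, size_t width, size_t height,
                   int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    struct slice slices[MAX_WORKERS];
    int started[MAX_WORKERS];
    size_t n, row = 0;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    n = (size_t)nworkers;

    for (size_t i = 0; i < n; i++) {
        /* Spread rows evenly; the first (height % n) slices get one extra. */
        size_t rows = height / n + (i < height % n ? 1 : 0);

        slices[i].pixels = image + row * width;
        slices[i].length = rows * width;
        row += rows;
        started[i] = pthread_create(&tids[i], NULL, slice_worker,
                                    &slices[i]) == 0;
        if (!started[i])
            slice_worker(&slices[i]);   /* fall back to running inline */
    }
    for (size_t i = 0; i < n; i++)
        if (started[i])
            pthread_join(tids[i], NULL);
}
```

    A caller would typically do process_image(frame, width, height,
    worker_count()). Because each slice is a contiguous run of rows, threads
    share at most one cache line at each slice boundary, in contrast to a
    pipeline where every stage touches the whole frame.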

    <insert Larry McVoy-quote on processes/threads vs pasta/salt>

    Sincerely,

    Jan-Derk Bakker
    [who firmly believes that a long-term (say 1-hour) load average > the
    number of CPUs in your system tells you that (a) your userland needs
    re-thinking or (b) you are using too light a machine]

    --
    Jan-Derk Bakker, bakker@mmc.et.tudelft.nl
    

    The lazy man's proverb: 'There's no business like slow business !'
