Re: Interesting analysis of linux kernel threading by IBM

From: Davide Libenzi (dlibenzi@maticad.it)
Date: Fri Jan 21 2000 - 19:40:02 EST


    On Fri, 21 Jan 2000, Larry McVoy wrote:
    > Rendering is what we call an embarrassingly parallel application.
    > In other words, very, very coarse grained parallelism works great for
    > this, in fact, it works orders of magnitude better than what you described.
    > Talk to Disney, Pixar, ILM, RFX - all of whom are heavily into this space,
    > all of whom I've visited personally to talk about their computing needs,
    > and all of whom use farms of uniprocessors for rendering. There are
    > a bunch of other ones too, Digital Design, Pacific something (used to be
    > Walnut Creek now are in Palo Alto), etc. All the production and post
    > production digital houses know that farms of machines that share nothing
    > but a network are the highest performance and least cost way to do
    > rendering.

    Are you saying that N processes running on N uniprocessor systems and
    exchanging data over a network perform better than a single N-way SMP
    system exchanging data in memory, because of cache effects (given the same
    software architecture)?

    > If you suggested a multithreaded application to do that to any of those
    > guys in a job interview, and stuck to your opinion that it was a good
    > idea, my prediction is that you would be standing on the street wondering
    > what happened in less than 5 minutes. Those people are doing hard work
    > on short schedules and really don't have time to waste.

    I haven't had your luck in meeting such interesting people, so I can't
    guess what they would say to me.

    > I am starting to wonder if you've ever coded up an application both ways
    > and tested it. If you had tried the rendering model that you suggested
    > and then tried the same thing all in one process, I believe that your
    > way would show dramatically lower performance. It's been shown that
    > while the model of fine grained parallelism, especially in data parallel
    > applications like what you are talking about, while that model can be
    > supported, the cache effects of doing so on an SMP dramatically _REDUCE_
    > the performance. It's always been seen that you are better off to divide
    > up the data, do all the different transformations to a chunk of data by
    > one process on one processor in one cache, rather than by spreading the
    > same data over a bunch of caches. In fact, all the research in parallel
    > applications boils down to ``how much can you divide up the data''.
    > If there is so much focus on that, all of it performance related, why
    > is it that you believe something that certainly seems to fly in the face
    > of both theory and practice?

    The rendering pipeline (as the name says) is a highly parallel environment
    in which each subsystem takes one type of data, transforms it into a new
    kind of data, and passes the result to the next subsystem. This is true for
    a scanline renderer (using shadow maps and environment mapping), not for a
    raytracer. In this environment I'd expect (you're right, I've only coded
    single-threaded renderers) that if I decompose the pipeline into N steps
    and I have an N-way SMP system, I'll get good performance. Here "good" does
    not mean TotalTime / N, but a time T such that:

    (TotalTime / N) < T << TotalTime
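As a rough illustration of the bound above, here is a minimal sketch of an idealized pipeline timing model. It is an assumption-laden toy, not a measurement: it assumes one CPU per stage, independent work items, and no cache, locking, or communication cost (exactly the costs under dispute in this thread). The stage times, item count, and function names are hypothetical.

```python
def sequential_time(stage_times, items):
    """Time for one CPU to push every item through every stage in turn."""
    return items * sum(stage_times)


def pipelined_time(stage_times, items):
    """Idealized time with one CPU per stage.

    The first item pays the full pipeline latency; after that, one item
    completes every max(stage_times), since the slowest stage is the
    bottleneck. Cache and communication costs are ignored (assumption).
    """
    return sum(stage_times) + (items - 1) * max(stage_times)


# Hypothetical, deliberately unbalanced 4-stage pipeline (arbitrary units).
stages = [2.0, 3.0, 1.0, 2.0]
items = 1000

total = sequential_time(stages, items)   # TotalTime
t = pipelined_time(stages, items)        # T
ideal = total / len(stages)              # TotalTime / N

print(f"TotalTime / N = {ideal}, T = {t}, TotalTime = {total}")
```

In this model T always stays above TotalTime / N, because of the pipeline fill time and any imbalance between stages, and below TotalTime whenever there is more than one stage and more than one item. How far real cache effects push T back toward TotalTime is something the model does not capture.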

    If even a highly parallel job like a renderer cannot be coded well for SMP,
    what do we keep SMP for?

    OK, probably the solution you are pushing is clusters of SMPs.

    But going back to what I asked at the top of this message: given a cluster
    of N computers, each an M-way SMP system, exchanging data over Ethernet,
    have you measured that (cost aside) a single M x N-way SMP system performs
    (scales) worse than the cluster?
    I can't believe that cache effects are bigger than the Ethernet bottleneck.

    Unfortunately I have neither a Beowulf system nor a 32-way SMP system to
    test my ideas on (only a poor 2-way).

    Davide.
