Re: spin lock contention: lru_list lock

From: David S. Miller (davem@redhat.com)
Date: Sat Feb 19 2000 - 11:42:32 EST

  • Next message: David Woodhouse: "Re: TCP stuck (RH 2.2.12-20 <-> 2.2.13)"

       Date: Sat, 19 Feb 2000 17:17:01 +0100
       From: Manfred Spraul <manfreds@colorfullife.com>

       Yes, the problem is that gtcd polls /dev/cdrom, and that every close()
       call calls __invalidate_buffers(). __invalidate_buffers() must walk the
       complete buffer chain, and as I wrote the buffer chain contained ~
       100000 entries.

       Either we ignore the problem, or we could add a per-kdev_t linked list.

    It's very a difficult issue, and I fully investigated this when
    I initially threaded the buffer cache layer. The summary of the
    difficulty can be discovered by just examining what
    try_to_free_buffers needs to do.

    It is clearly the case that buffers can be accessed and found
    via several tables. The hash tables in block/kdev keyed searches, the
    lru lists for dirtying and buffer flushing/unlocking, and the free
    lists when creating new buffers. Finally, the unused list is accessed
    for page I/O tagging buffer heads.

    try_to_free_buffers is just given an arbitrary page, we are not told
    explicitly whether these buffers are real true buffer heads, or just
    I/O tags for an entry in the page cache. (We can find out, this is
    why I am describing all of this detail).

    So the code there is super conservative and just locks out all the
    entry points to possibly finding a buffer anywhere, in any table.
    Without question, this scheme is much easier to verify than trying
    to fine grain it.

    This function must return with a status which represents the
    disposition of the buffers after trying to free them. It would
    be dangerous to do the locking on each buffer individually. For
    example, if we were to scan and lock based upon where each individual
    buffer is, try to free that one, and redo the locking for the next
    buffer we try to free up, older buffers we dealt with can reappear.

    In my initial threading attempts, I wished to experiment with the
    effects of per-hashchain locks in the buffer cache hash table. This
    has potential good scalability properties, however the issues
    surrounding correct operation of try_to_free_buffers(), as described
    above, prevented me from doing this.

    I do not mean to imply that relaxing the locks here is impossible.

    The problem which would need to be solved to allow this would be to
    somehow "grab" or "freeze" all buffer heads attached to a page, even
    if they are active. It can be done, but it would be tricky to get
    all of the semantics correct everywhere.

    Once this would be accomplished, another attempt could be made to
    use per-hashchain and per-lru-list locking to hopefully reduce
    contention.

    I know this would not fix properly the case you mention, but it does
    demonstrate why adjusting the locking strategy would be a bit
    difficult.

       I have another problem with flush_dirty_buffers():

       flush_dirty_buffers() quits as soon as it finds the first buffer where
       b_flushtime has not yet expired.

       I can't find the code that enforced this chronological ordering, esp.
       since the time until a buffer must be written to disk if different for
       metadata blocks and normal blocks.
       [just grep for b_flushtime, there is no sort algorithm]

    If age_super and ag_buffer tunables were identical, and refile_buffer
    would always add to the tail of the lru list, it might be correct.

    However, of course, neither of these two things are true.
    So I believe your analysis is correct :-)

    I wonder how much it really matters...

    Later,
    David S. Miller
    davem@redhat.com

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.rutgers.edu
    Please read the FAQ at http://www.tux.org/lkml/



    This archive was generated by hypermail 2b29 : Sat Feb 19 2000 - 11:50:37 EST