Re: [RFC] Generic deferred file writing

From: Eric W. Biederman (ebiederman@uswest.net)
Date: Sat Dec 30 2000 - 17:00:43 EST


    Linus Torvalds <torvalds@transmeta.com> writes:

    > In short, I don't see _those_ kinds of issues. I do see error reporting as
    > a major issue, though. If we need to do proper low-level block allocation
    > in order to get correct ENOSPC handling, then the win from doing deferred
    > writes is not very big.

    To get ENOSPC handling 99% correct all we need to do is decrement a counter
    that remembers how many disk blocks are free. If we need a better
    estimate than just the data blocks it should not be hard to add an
    extra callback to the filesystem.
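
    As a rough illustration of the counter idea, here is a minimal userspace
    sketch in C. The names `fs_space`, `reserve_blocks` and `release_blocks`
    are hypothetical and do not come from any kernel tree. It only shows the
    bookkeeping: write() reserves space by decrementing an estimate, without
    choosing any disk blocks.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-filesystem accounting; not an actual kernel structure. */
struct fs_space {
    unsigned long total_blocks;   /* size of the filesystem in blocks */
    unsigned long free_estimate;  /* blocks believed to be still available */
};

/*
 * Called at write() time. Reserves space for a deferred write by
 * decrementing the estimate; no on-disk allocation happens here.
 * Returns 0 on success or -ENOSPC if the estimate is too small.
 */
int reserve_blocks(struct fs_space *fs, unsigned long nblocks)
{
    if (nblocks > fs->free_estimate)
        return -ENOSPC;
    fs->free_estimate -= nblocks;
    return 0;
}

/*
 * Gives a reservation back, for example when the dirty data is
 * truncated away before it ever reaches the disk.
 */
void release_blocks(struct fs_space *fs, unsigned long nblocks)
{
    /* The estimate may never exceed the size of the filesystem. */
    assert(nblocks <= fs->total_blocks - fs->free_estimate);
    fs->free_estimate += nblocks;
}
```

    A filesystem that needs more than a data-block count (metadata,
    indirect blocks) would presumably adjust `nblocks` through its own
    callback before calling something like `reserve_blocks`.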

    There look to be some interesting cases to handle when we fill up a
    filesystem. Before actually failing and returning ENOSPC the
    filesystem might want to fsync itself and see how correct its
    estimates were. But that is the rare case and shouldn't affect
    performance.
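
    The rare path could look roughly like the sketch below, again in
    userspace C. The `sync_and_count` callback is an assumption made for
    illustration: it stands for "flush everything and report how many
    blocks are really free". `demo_true_count` is only a stand-in so the
    example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical accounting structure; not an actual kernel interface. */
struct fs_space {
    unsigned long free_estimate;
    /* Assumed callback: flush dirty data, return the true free count. */
    unsigned long (*sync_and_count)(void *fs_private);
    void *fs_private;
};

/* Stand-in callback for the example: reads the "true" count from memory. */
unsigned long demo_true_count(void *fs_private)
{
    return *(unsigned long *)fs_private;
}

/*
 * Fast path: decrement the estimate. Slow path, taken only when the
 * estimate says the filesystem is full: sync, correct the estimate,
 * and retry once before returning -ENOSPC.
 */
int reserve_blocks_checked(struct fs_space *fs, unsigned long nblocks)
{
    assert(fs->sync_and_count != NULL);

    if (nblocks <= fs->free_estimate) {
        fs->free_estimate -= nblocks;
        return 0;
    }

    fs->free_estimate = fs->sync_and_count(fs->fs_private);
    if (nblocks > fs->free_estimate)
        return -ENOSPC;

    fs->free_estimate -= nblocks;
    return 0;
}
```

    The sync only runs when the cheap check fails, so the common case
    stays a compare and a subtract.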

    <rant>
    In the long term VFS support for deferred writes looks like a major
    win. Knowing how large a file is before we write it to disk allows
    very efficient disk organization, and fast file access (esp combined
    with an extent based fs). Support for compressing files in real time
    falls out naturally. Support for filesystems that maintain coherency
    by never writing the same block back to the same disk location also
    appears.
    </rant>

    One other thing to think about for the VFS/MM layer is limiting the
    total number of dirty pages in the system (to what disk pressure shows
    the disk can handle), to keep system performance smooth when swapping.
    All cases except mmapped files are easy, and they can be handled by a
    modified page fault handler that directly puts the dirty bit on the
    struct page. (Except that is buggy with respect to clearing the dirty
    bit on the struct page.) In reality we would have to create a queue
    of pointers to dirty ptes from the page fault handler and, depending
    on a timer or memory pressure, move the dirty bits to the actual page.
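
    A toy model of that queue, in userspace C, might look like this. The
    `page`, `pte` and `dirty_queue` types here are simplified stand-ins
    invented for the sketch, not the kernel's own structures, and real
    code would also have to deal with TLB flushing and locking, which are
    left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 64

/* Simplified stand-ins for the kernel's page and pte. */
struct page { bool dirty; };
struct pte  { bool dirty; struct page *page; };

struct dirty_queue {
    struct pte *entries[QUEUE_MAX];
    size_t count;        /* queued pte pointers */
    size_t dirty_pages;  /* pages marked dirty through this queue */
};

/*
 * Called from the (modelled) page fault handler. Returns false when
 * the queue is full, in which case the caller should flush first.
 */
bool queue_dirty_pte(struct dirty_queue *q, struct pte *pte)
{
    assert(pte != NULL);
    if (q->count == QUEUE_MAX)
        return false;
    q->entries[q->count++] = pte;
    return true;
}

/*
 * Called on a timer or under memory pressure. Moves each dirty bit
 * from the pte to its page and empties the queue. Returns the number
 * of pages that became dirty during this call.
 */
size_t flush_dirty_ptes(struct dirty_queue *q)
{
    size_t moved = 0;

    for (size_t i = 0; i < q->count; i++) {
        struct pte *pte = q->entries[i];

        if (!pte->dirty)
            continue;
        pte->dirty = false;
        if (!pte->page->dirty) {
            pte->page->dirty = true;
            q->dirty_pages++;
            moved++;
        }
    }
    q->count = 0;
    return moved;
}
```

    With the dirty state collected on the page, `dirty_pages` gives the
    VFS/MM layer one number to compare against whatever limit the disk
    can sustain.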

    Combined with the code to make sync and fsync work on the page
    cache, would msync become obsolete?

    Of course the most important part is that when all of that is
    working, the VFS/MM layer would be perfect. World domination
    would be achieved. For who would be caught using an OS with an
    imperfect VFS layer? :)

    Eric



