Re: Slow TCP connection between linux and wince

From: Aki M Laukkanen (amlaukka@cc.helsinki.fi)
Date: Mon Jun 05 2000 - 12:09:51 EDT


    On Sat, 3 Jun 2000 kuznet@ms2.inr.ac.ru wrote:
    > Probably, receiver delays each ACK by 500msec. I have no idea
    > why it does this, because it is apparently illegal behaviour.
    > Look:

    Yes, it looks that way and I got the acks mixed up.

    > for these 5 seconds. Probably, your sender selected illegal sndbuf,
    > which is much less than receiver window.

    The application in question was ttcp. And your guess is quite close to
    the truth. I should've known what the cause was, since we've been hit by
    the exact same thing already (albeit on the receive side).

    tcp_snd_test: tail: 1, packets_in_flight: 95, snd_cwnd: 95
    end_seq: -685618378, snd_una: -685641578, snd_wnd: 32488

    This is debugging output at the time the delay begins and looks perfectly
    reasonable.

    tcp_snd_test: tail: 1, packets_in_flight: 75, snd_cwnd: 116
    end_seq: -685618190, snd_una: -685636454, snd_wnd: 32488

    And this is after the delay; in between, no calls were made to
    tcp_snd_test. So the real culprit is this test:

            if (sock_wspace(sk) >= tcp_min_write_space(sk) &&
                (sock = sk->socket) != NULL) {

    The key here is that mss was 256 - sock_wspace() is:

                    amt = sk->sndbuf - atomic_read(&sk->wmem_alloc);

    wmem_alloc contains the sum of the skb->truesize fields. If we do not
    take into account the overhead from struct sk_buff and alignment, this
    test would be roughly:

            65536-32767 >= 32767

    at the point where the sender is put to sleep, so the two numbers are
    almost equal. However, with an MTU of 296 (as given to pppd) it is:

    tcp_new_space: wspace: 10435, write_space: 27550
    tcp_new_space: wspace: 11015, write_space: 27260
    tcp_new_space: wspace: 11595, write_space: 26970
    tcp_new_space: wspace: 12175, write_space: 26680
    ... acks flow in ...
    tcp_new_space: wspace: 19715, write_space: 22910
    tcp_new_space: wspace: 20295, write_space: 22620
    tcp_new_space: wspace: 20875, write_space: 22330
    tcp_new_space: wspace: 21455, write_space: 22040

    Hmm, 11015-10435=580 - it'd make sense if there were two skbs allocated
    for each segment? Oh, I see skb_clone() in tcp_send_skb, right?
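
    To make the arithmetic concrete, here is a small userspace model of the
    wakeup test, not kernel code. It rests on two assumptions read off the
    log above: every line satisfies wspace + 2 * write_space = 65535, so I
    take sndbuf to be 65535 and the threshold to be wmem_alloc / 2. The
    names model_sock, model_wspace, model_min_write_space and
    queued_at_wakeup are made up for this sketch.

```c
#include <assert.h>

/* Userspace model only; the field names mirror the mail, not a kernel API. */
struct model_sock {
    int sndbuf;      /* send buffer limit in bytes */
    int wmem_alloc;  /* bytes currently charged against that limit */
};

/* Same arithmetic as the sock_wspace() line quoted above. */
int model_wspace(const struct model_sock *sk)
{
    int amt = sk->sndbuf - sk->wmem_alloc;

    return amt < 0 ? 0 : amt;
}

/* Assumption inferred from the log: threshold is half the charged memory. */
int model_min_write_space(const struct model_sock *sk)
{
    return sk->wmem_alloc / 2;
}

/*
 * Fill the buffer with segments that are each charged `charge` bytes, then
 * release one segment per ACK until the wakeup test first passes.
 * Returns the number of segments still queued at that moment.
 */
int queued_at_wakeup(int sndbuf, int charge)
{
    struct model_sock sk;
    int queued;

    assert(sndbuf > 0 && charge > 0);

    queued = sndbuf / charge;          /* segments that fit before sleeping */
    sk.sndbuf = sndbuf;
    sk.wmem_alloc = queued * charge;

    while (queued > 0 && model_wspace(&sk) < model_min_write_space(&sk)) {
        sk.wmem_alloc -= charge;       /* one segment acked and freed */
        queued--;
    }
    return queued;
}
```

    With the 580 bytes per segment seen in the log, queued_at_wakeup(65535,
    580) gives 75, which agrees with packets_in_flight: 75 after the delay.
    At 256 bytes of data per segment that is only 19200 bytes, well below
    snd_wnd of 32488. If only the data were charged (256 per segment), the
    model wakes the writer with 170 segments, or 43520 bytes, still queued,
    which is more than the window.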

    The disparity between this test and the available send window is the
    cause of the bursts. It also explains why the over-scheduling masked
    this behaviour. The following patch changes wmem_alloc to include only
    the actual data, and it seems to work. This is a hackish approach at
    best, though.

    diff -urN --exclude=*~ linux-2.4.0-test1-ac6.bak/net/ipv4/tcp.c linux-2.4.0-test1-ac6/net/ipv4/tcp.c
    --- linux-2.4.0-test1-ac6.bak/net/ipv4/tcp.c Mon Apr 24 23:59:57 2000
    +++ linux-2.4.0-test1-ac6/net/ipv4/tcp.c Mon Jun 5 18:48:59 2000
    @@ -960,6 +960,7 @@
                                     skb = alloc_skb(tmp, GFP_KERNEL);
                                     if (skb == NULL)
                                             goto do_oom;
    +                                skb->truesize = copy;
                                     skb_set_owner_w(skb, sk);
                             } else {
                                     /* If we didn't get any memory, we need to sleep. */

    Our second problem with this disparity is on the receive side. The
    scenario is essentially the same, but with an unreliable link (read:
    wireless) which drops packets. When a packet is dropped, the receiver
    keeps building an out-of-order queue, which grows to the limit of the
    receive buffer quite quickly. However, the sender keeps sending more
    because of the difference between the advertised window and the space
    actually allocated. This triggers tcp_input.c:prune_queue(), which
    purges the whole out-of-order queue to free up space, thus killing TCP
    performance quite effectively.
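
    A rough way to see when this can happen, again as a userspace sketch
    rather than kernel code: compare how many segments the advertised window
    lets the sender transmit with how many fit in the receive buffer once
    each one is charged its full allocation. The function names and every
    number in the note below are hypothetical; the real window calculation
    in the kernel is more involved than a single division.

```c
#include <assert.h>

/* Segments that fit in the receive buffer at `charge` bytes each. */
int segments_fit_in_rcvbuf(int rcvbuf, int charge)
{
    assert(rcvbuf > 0 && charge > 0);
    return rcvbuf / charge;
}

/* Segments the sender may have outstanding for a given advertised window. */
int segments_allowed_by_window(int window, int payload)
{
    assert(window >= 0 && payload > 0);
    return window / payload;
}

/*
 * Returns 1 if the sender can legally send more segments than the receive
 * buffer can hold while they wait in the out-of-order queue, 0 otherwise.
 */
int ooo_queue_can_overflow(int rcvbuf, int charge, int window, int payload)
{
    return segments_allowed_by_window(window, payload) >
           segments_fit_in_rcvbuf(rcvbuf, charge);
}
```

    For example, with a made-up 65535 byte buffer, a 32767 byte window and
    256 bytes of data per segment, a charge of 580 per segment overflows
    (127 segments allowed, 112 fit), while charging only the 256 bytes of
    data does not (255 fit).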

    The fix we use internally is similar, for the rmem_alloc case. I do think
    both of these situations are quite valid. I am not so sure about the
    correct fix, though.
