Re: 3c59x NIC overruns with multicast makes networking freeze

From: Arnaud Installe (arnaud@bach.tvd.kotnet.org)
Date: Sat Sep 23 2000 - 07:26:43 EDT

    The description of the NICs in question as found in
    /etc/sysconfig/hwconf
    are:

    3Com Corporation|3c900B-TPO [Etherlink XL TPO]
    3Com Corporation|3c905C-TX [Fast Etherlink]

    In one machine it's the first card that is getting lots of overruns; in
    the other it's the second. It depends on which card is connected to the
    network running at 60% usage.
     The machines are dual-processor capable, but only a few of them have 2
    processors inside. Yes, they're using standard IDE drives. (I've been
    telling the people at my company we *need* SCSI drives forever... :-P)
     I don't think the 3c59x driver uses promiscuous mode for multicasting, or
    does it ?
     BTW I'm not sure I understand Documentation/networking/multicast.txt.
    (Is it still up-to-date ?) I suppose IP-MRoute, and thus AllMulti, is only
    relevant for routers, while IP-Multicast is relevant for nodes transmitting
    and receiving multicast. So for nodes "hardware filters really help", which
    means we really shouldn't be using cards driven by 3c59x, right ? Am I
    correct that Intel EtherExpress Pro 10/100 cards would be a better choice ?

    On Sat, Sep 23, 2000 at 03:23:45PM +1100, Andrew Morton wrote:
    > Arnaud Installe wrote:
    > >
    > > Hi,
    > >
    > > Has anyone else seen a lot of overruns while serving multicast on a pretty
    > > loaded (60%) network, with 3c59x cards ?
    >
    > This is the first I've heard of it. Which sort of 3com card?
    >
    > > (BTW What exactly are
    > > "overruns" ? Are they buffer overflows on the NIC side or buffer
    > > overflows on the kernel side, because the software can't follow, or even
    > > something completely different ?)
    >
    > The NIC has an 8 kbyte on-board FIFO, some of which (3-5k) is used for
    > buffering incoming data.
    >
    > An Rx overrun occurs when that on-board FIFO has filled up, and more
    > data needs to go into it. It shouldn't fill up. This can be caused by
    > excessive PCI traffic, or by the software not being able to feed the NIC
    > new Rx buffers fast enough, or by the driver leaving the Rx engine in
    > the stalled state for too long.

    Do I understand correctly that those Rx buffers are DMA buffers provided
    by the kernel ?
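
    A toy model of the overrun mechanism described in the quoted text may
    help here. It is only a sketch, not anything taken from the driver: the
    function name, the tick-based timing and the byte counts are all
    assumptions made for illustration. Frames arrive into a fixed-size FIFO,
    the host drains a fixed number of bytes per tick, and a frame that does
    not fit is counted as an overrun.

```python
def count_overruns(arrivals, fifo_bytes, drain_per_tick):
    """Toy model of an on-board Rx FIFO (hypothetical, not driver code).

    arrivals:       frame size in bytes arriving on each tick (0 = idle tick)
    fifo_bytes:     FIFO space available for received data
    drain_per_tick: bytes the host moves out of the FIFO per tick
    Returns the number of frames lost because the FIFO had no room.
    """
    level = 0
    overruns = 0
    for frame in arrivals:
        # The host empties part of the FIFO first (0 models a stalled Rx path).
        level = max(0, level - drain_per_tick)
        if frame == 0:
            continue
        if level + frame > fifo_bytes:
            overruns += 1   # no room left: this frame is dropped
        else:
            level += frame
    return overruns


burst = [1500] * 10  # ten full-size frames back to back

# Host keeps up: nothing is lost.
print(count_overruns(burst, fifo_bytes=4096, drain_per_tick=1500))
# Host stalled for the whole burst: only the first two frames fit.
print(count_overruns(burst, fifo_bytes=4096, drain_per_tick=0))
```

    The 4096-byte figure is simply a value inside the 3-5k range mentioned
    above; the real split of the FIFO is not stated in this thread.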

    > You should try the 2.2.17 driver. There's a copy at
    > http://www.uow.edu.au/~andrewm/linux/3c59x-2.2.17+.gz

    So it's included in linux-2.2.17.tar.gz, is it ? Is it also included in
    any of the 2.4.0-test kernels ?

    > If it still happens (and it probably will) there are some things you can
    > look at:
    >
    > - Get a faster computer! How quick is it?

    Pentium III 450MHz dual-processor. I thought there might be some locking
    problem with the SMP code, so I installed a single-processor kernel.
    Problem didn't go away though.

    > - Change the driver by doubling the value of RX_RING_SIZE. Keep it a
    > power of two. This is a bit desperate, but it will give you some more
    > elasticity.

    What's the RX_RING_SIZE used for ?
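
    As a rough illustration of what a receive ring of RX_RING_SIZE slots
    buys, here is a minimal ring buffer sketch. It is not the 3c59x code:
    the class, the head/tail names and the sizes 32 and 64 are assumptions.
    It shows why a power-of-two size is convenient (the slot index can be
    computed with a mask) and how a larger ring absorbs a longer burst
    before the producer has to drop frames.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


class RxRing:
    """Hypothetical fixed-size receive ring; indices wrap with a bit mask."""

    def __init__(self, size):
        if not is_power_of_two(size):
            raise ValueError("ring size must be a power of two")
        self.size = size
        self.mask = size - 1          # valid only because size is 2**k
        self.slots = [None] * size
        self.head = 0                 # next slot the consumer reads
        self.tail = 0                 # next slot the producer fills

    def put(self, frame):
        """Store a frame; return False if the ring is full (frame dropped)."""
        if self.tail - self.head == self.size:
            return False
        self.slots[self.tail & self.mask] = frame
        self.tail += 1
        return True

    def get(self):
        """Return the oldest frame, or None if the ring is empty."""
        if self.head == self.tail:
            return None
        frame = self.slots[self.head & self.mask]
        self.head += 1
        return frame


def accepted_in_burst(ring_size, burst_len):
    """Frames accepted when burst_len frames arrive before any are consumed."""
    ring = RxRing(ring_size)
    return sum(1 for i in range(burst_len) if ring.put(i))


print(accepted_in_burst(32, 50))  # smaller ring: part of the burst is dropped
print(accepted_in_burst(64, 50))  # doubled ring: the whole burst fits
```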

    > - Make sure the NIC isn't sharing an interrupt with another device.
    > Look at /proc/interrupts and if it's shared, have a poke in the BIOS.

    Both NICs have their own IRQ.
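
    The /proc/interrupts check suggested above can be scripted. The sketch
    below assumes the 2.2-era layout (IRQ number, one counter per CPU, one
    controller-type field, then a comma-separated device list); newer
    kernels add columns, so the parsing would need adjusting there. The
    function name and the sample text are made up for illustration.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for IRQ lines that list more than one device.

    Assumes a 2.2-style /proc/interrupts layout; this is a sketch, not a
    general parser.
    """
    shared = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # skip the CPU header and NMI/ERR style lines
        fields = rest.split()
        # Drop the per-CPU interrupt counters (leading numeric fields).
        while fields and fields[0].isdigit():
            fields.pop(0)
        # Assumed: exactly one controller-type token, then the device names.
        names = " ".join(fields[1:])
        devices = [d.strip() for d in names.split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(irq)] = devices
    return shared


# Hypothetical sample, not taken from the machines in this thread.
sample = """\
           CPU0       CPU1
  0:     123456          0    IO-APIC-edge  timer
 10:       5000       4800   IO-APIC-level  eth0, usb-uhci
 11:       9000       8700   IO-APIC-level  eth1
NMI:          0
"""
print(shared_irqs(sample))
```

    On a live system the text would come from reading /proc/interrupts.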

    > - It's also possible that some other device exists on your system which
    > has a slow interrupt service routine, and is interrupting the 3c59x ISR
    > for a long time while the Rx engine is stalled. This is a pretty
    > unlikely scenario. Disabling interrupts around the call to
    > boomerang_rx() would be slightly interesting.

    I suppose IDE drives have slow ISRs ? Does unmasking the IRQs for hd's
    work reliably with most of today's IDE drives and controllers ? The IDE
    controller is an "Intel Corporation|82371AB PIIX4 IDE".

    > > After a while the network freezes. I can't ping nor ssh to the machine.
    > > Restarting the network scripts solves the problem.
    >
    > There was a bug in the driver which was fixed 6-8 weeks ago. If the
    > driver's Rx path completely runs out of memory and cannot allocate a new
    > buffer 16 times in a row, the receive path gets stuck and the interface
    > needs to be taken down. Moving to the 2.2.17 driver should eliminate
    > this possibility.

    Hmmm... I suppose when this happens it's normal that a lot of overruns occur ?
    Upgrading the driver will make the networking stable, but I suppose that
    won't prevent a lot of multicast UDP packets from getting dropped, or will
    it ? (Though unmasking the hd IRQ might.)

    > You should also try
    >
    > echo "512 1024 1536" > /proc/sys/vm/freepages
    >
    > to double the amount of memory which is reserved for drivers and such.
    > This change put a big smile on the face of a 2.2.12 user earlier this
    > week. (He had 512 megs and a reasonable, but heavy workload. I think
    > we're setting this too low).

    Ok.
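
    For reference, the arithmetic behind that echo line can be written as a
    small helper. The starting values "256 512 768" are inferred from the
    statement that the new values double the reserve; they are not quoted
    anywhere in this thread, and the helper name is hypothetical. The
    /proc/sys/vm/freepages file is specific to kernels of this era.

```python
def scaled_freepages(current, factor=2):
    """Scale the three freepages thresholds by an integer factor.

    current: the three space-separated values as read from the file.
    Returns the string to write back. Sketch only; no file I/O is done here.
    """
    values = [int(v) for v in current.split()]
    if len(values) != 3:
        raise ValueError("expected exactly three values")
    return " ".join(str(v * factor) for v in values)


print(scaled_freepages("256 512 768"))
```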

    > Please let me know the outcome.

    I sure will. Thanks,

                                                            Arnaud

    -- 
    Arnaud Installe                                             a.installe@ieee.org
    

    Administration: An ingenious abstraction in politics, designed to receive
    the kicks and cuffs due to the premier or president. -- Ambrose Bierce