Re: test13-pre6 (Fork Bug with Athlons? Temporary Fix)

From: Byron Stanoszek (gandalf@winds.org)
Date: Fri Dec 29 2000 - 22:08:49 EST


    On Fri, 29 Dec 2000, Linus Torvalds wrote:

    >
    > Ok, there's a test13-pre6 out there now, which does a partial sync with
    > Alan, in addition to hopefully fixing the innd shared mapping writeback
    > problem for good. Thanks to Marcelo Tosatti and others..

    I've been seeing a problem on my Athlon where the memory context switching
    code appears to conflict with fork(). It began with the test13-pre2 patch,
    and since nobody else seems to have seen (or at least reported) it since
    then, I felt I should look into it a little further.

    I narrowed the problem down to a subset of the MM patches in test13-pre2.
    Reverse-applying the attached 'context.patch' (i386 only) fixes the problem,
    but I'm not yet sure why. test13-pre2 and later work without any problems
    on Intel CPUs (tested on a Pentium 180 and a P3 800).

    Anyway, I can't see what the patch really changes, apart from the obvious
    conversion of 'void *segment' into a typedef'd struct. The only other
    explanation I can think of is that the compiler generates different code for
    it, but I think I can safely rule that out: I tried both 2.91.66 and 2.95.2,
    with P5 and K7 tuning flags (-march=i586, and -march=i686
    -malign-functions=4), and the Athlon still shows the problem. Maybe there's
    something in the attached patch that I've overlooked. Request for an extra
    pair of eyes, please. :)
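    To illustrate why I don't expect the type change by itself to matter, here
    is a minimal sketch comparing a bare 'void *' with a single-member struct
    that wraps it. The type and function names are hypothetical. They are not
    the definitions from context.patch.

```c
#include <assert.h>   /* assert() for the checks */
#include <stddef.h>   /* offsetof */

/* Hypothetical before/after shapes of the per-mm segment pointer, for
 * illustration only; these are NOT the definitions from context.patch. */
typedef void *old_context_t;                       /* bare pointer */
typedef struct { void *segment; } new_context_t;   /* struct wrapper */

/* Read the segment pointer back out of each representation. */
static void *old_segment(old_context_t ctx) { return ctx; }
static void *new_segment(new_context_t ctx) { return ctx.segment; }

/* Returns 1 if the wrapper has the same size as the bare pointer and the
 * pointer sits at offset 0, i.e. the in-memory layout is unchanged.
 * (Equal size is not promised by the C standard, but holds on i386.) */
static int layout_unchanged(void)
{
    return sizeof(new_context_t) == sizeof(old_context_t) &&
           offsetof(new_context_t, segment) == 0;
}
```

    On i386 the two have the same size and the pointer sits at offset 0, so
    the in-memory layout is identical. This only covers data layout. A function
    returning the struct may use a different return convention (through memory
    rather than a register) than one returning a bare pointer.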

    Here are the typical symptoms. The parent seems to die as soon as a forked
    child exits, which suggests to me that a new LDT isn't being initialized
    correctly:

    root:~> ps -aux
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    root 1 1.1 0.4 1228 532 ? S 21:42 0:05 init [3]
    root 2 0.0 0.0 0 0 ? SW 21:42 0:00 [keventd]
    root 3 0.0 0.0 0 0 ? SW 21:42 0:00 [kswapd]
    root 4 0.0 0.0 0 0 ? SW 21:42 0:00 [kreclaimd]
    root 5 0.0 0.0 0 0 ? SW 21:42 0:00 [bdflush]
    root 6 0.0 0.0 0 0 ? SW 21:42 0:00 [kupdate]
    root 289 0.0 0.4 1284 604 ? S 21:42 0:00 syslogd -m 0
    root 299 0.0 0.8 1912 1104 ? S 21:42 0:00 klogd
    root 351 0.0 1.2 9292 1576 ? S 21:42 0:00 named
    root 361 0.0 0.0 0 0 ? Z 21:42 0:00 [named <defunct>]
    root 363 0.0 1.2 9292 1576 ? S 21:42 0:00 named
    root 364 0.0 1.2 9292 1576 ? S 21:42 0:00 named
    root 365 0.0 0.7 2064 936 ? S 21:42 0:00 /usr/sbin/sshd
    ..etc
    (Note PID 361)

    root:~> strace nslookup sunsite.unc.edu
     :
     :
    rt_sigaction(SIGINT, {0x4003ce78, ~[], 0x4000000}, NULL, 8) = 0
    rt_sigaction(SIGTERM, {0x4003ce78, ~[], 0x4000000}, NULL, 8) = 0
    rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
    rt_sigaction(SIGHUP, {SIG_DFL}, NULL, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [HUP INT TERM], NULL, 8) = 0
    getpid() = 2615
    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    close(3) = 0
    socket(PF_INET6, SOCK_STREAM, 0) = -1 ENOSYS (Function not implemented)
    socket(PF_INET6, SOCK_STREAM, 0) = -1 ENOSYS (Function not implemented)
    socket(PF_INET6, SOCK_STREAM, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)
    --- SIGSEGV (Segmentation fault) ---
    +++ killed by SIGSEGV +++

    ---Example parent/child process:

    root:~> tar -xzvvf ../pkgs/zgv-5.2.tar.gz
     :
     :
    -rw------- rus/users 1356 2000-06-01 11:46:57 zgv-5.2/INSTALL
    -rw------- rus/users 17976 1994-08-23 16:09:05 zgv-5.2/COPYING
    -rw------- rus/users 1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
    -rw------- rus/users 120 2000-04-22 22:46:49 zgv-5.2/AUTHORS
    -rw------- rus/users 3714 2000-01-23 16:29:40 zgv-5.2/SECURITY
    Segmentation fault (core dumped)

    root:~> strace tar -xzvvf ../pkgs/zgv-5.2.tar.gz
     :
     :
    open("zgv-5.2/COPYING", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
    write(4, "\t\t GNU GENERAL PUBLIC LICENSE"..., 9728) = 9728
    read(3, "ccept this License. Therefore, "..., 10240) = 10240
    write(4, "ccept this License. Therefore, "..., 8248) = 8248
    close(4) = 0
    utime("zgv-5.2/COPYING", [2000/12/29-20:21:16, 1994/08/23-16:09:05]) = 0
    chown32("zgv-5.2/COPYING", 500, 100) = 0
    write(1, "-rw------- rus/users 1077 1"..., 72-rw------- rus/users 1077 1998-08-26 09:24:31 zgv-5.2/README.fonts
    ) = 72
    open("zgv-5.2/README.fonts", O_WRONLY|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 4
    write(4, "The copyright for *.bdf (taken f"..., 1024) = 1024
    read(3, "\"as\nis\" without express or impli"..., 10240) = 8192
    --- SIGCHLD (Child exited) ---
    --- SIGSEGV (Segmentation fault) ---
    +++ killed by SIGSEGV +++
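    For anyone who wants to try this on their own Athlon, here is a minimal
    reproducer sketch. It is my own illustration, not something taken from the
    patch or the traces, and the function names are hypothetical. It repeats
    the sequence seen in both traces. The parent forks, the child exits,
    SIGCHLD is delivered to a handler in the parent, and the parent carries on.
    It does not deliberately touch the LDT. If the crash depends on the process
    having its own LDT, it may not trigger the bug.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count of SIGCHLD deliveries. A handler is installed so that each child
 * exit produces a real signal frame and sigreturn in the parent, as in the
 * strace output above (the default SIGCHLD action is to ignore it). */
static volatile sig_atomic_t sigchld_seen = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    sigchld_seen++;
}

/* Hypothetical stress loop: fork `n` children that exit immediately, and
 * reap each one before forking the next. Returns the number of children
 * that exited cleanly, or -1 on error. On an unaffected kernel it should
 * return n. On the Athlon described above, the (unverified) expectation is
 * that the parent is killed by SIGSEGV once a child exits. */
static int fork_exit_cycles(int n)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;          /* let waitpid() resume after the handler */
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    int clean = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {                 /* fork failed */
            clean = -1;
            break;
        }
        if (pid == 0)
            _exit(0);                  /* child: exit at once */

        int status;
        if (waitpid(pid, &status, 0) != pid) {
            clean = -1;
            break;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }

    sigaction(SIGCHLD, &old, NULL);    /* restore the caller's handler */
    return clean;
}
```

    Calling fork_exit_cycles(1000) from main() and checking that it returns
    1000 is enough to tell whether the parent survives repeated child exits.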

    Ideas, anyone?

     -Byron

    -- 
    Byron Stanoszek                         Ph: (330) 644-3059
    Systems Programmer                      Fax: (330) 644-8110
    Commercial Timesharing Inc.             Email: bstanoszek@comtime.com
    

