Re: Problem: shell prompt doesn't return although the invoked program calls _exit().

From: Ishikawa (ishikawa@yk.rim.or.jp)
Date: Sat May 20 2000 - 10:49:57 EDT


    I could produce a short program that reproduces the symptom.

    Ishikawa wrote:

    > Hi,
    >
    > I am writing this message to a few e-mail aliases.
    >
    > This is because I could not figure out what is the cause of the problem.
    >
    > Does anyone have an idea what causes this problem?
    >
    > Observed Platform.
    > Debian GNU/Linux with kernels 2.2.14, 2.2.15, 2.2.16pre2
    >
    > Observed problem.
    >
    > A particular program, called `prog' in the following, never returns
    > to the shell prompt when invoked from the shell command line like this:
    >
    > ./prog -q < inputfile > outputfile 2>&1

    Hi,

    I was able to produce a short version of the C program that shows
    the symptom.
    It looks to me as if there is a problem in handling open tty ports when
    _exit() is called.

    Bash doesn't seem to be the cause of the problem. I am CC:ing bug-bash
    to let you know this.

    The C code is about 285 lines, not counting the lengthy opening comment,
    which I attach below.
    Please drop me a line if you would like to look at the source code for
    debugging, curiosity, etc.

    My suggested fix:
    I think the kernel ought to close the tty ports forcibly when the close
    is requested from within _exit(). [This loses any written data still
    lying in the buffer, but that is the program's intention. Exiting cleanly
    is probably the more important concern here.]

    Happy Hacking!

    Chiaki Ishikawa

    --- begin quote ---

    /*
     *
     * $Id: newtest.c,v 1.1 2000/05/20 14:38:32 ishikawa Exp ishikawa $
     *
     * You need to have two UNUSED serial ports.
     * You must not connect anything to them.
     *
     * (Actually, you can connect the two ports with
     * a cross cable (null-modem cable), and then the
     * symptom, that the calling shell does not return
     * to the shell prompt, doesn't seem to appear
     * even on Linux !!!)
     * On Solaris, the (original) program exits cleanly,
     * without such a cable.
     *
     * Overview:
     *
     * This program opens the two serial ports for
     * read/write. (/dev/ttyS[01]. You need to make these
     * world-read/writable if this program is run from normal
     * user account.)
     *
     * This program then sets the termios characteristics of the
     * serial ports to 8 bits, even parity, one stop bit,
     * in raw mode (no processing at all).
     * Flow control is hardware flow control, etc.
     *
     * Then it enters a loop.
     * For each loop step,
     * it calls usleep to sleep for a short period of time.
     *
     * usleep() is a library function. It calls nanosleep(), a system
     * call. My reference to nanosleep() in previous postings might have
     * been a little confusing. The source code doesn't mention nanosleep() at
     * all. If you need to filter these calls out of the verbose strace
     * output, say something like: strace -e trace=\!nanosleep,read -p PID.
     *
     * Back to the description of the program.
     * Then it calls time() to check the wall clock.
     * At each of these iterations, it
     * tries to read a character (if any) from each of the ports.
     * The read() doesn't block, since termios has been set up so that
     * read() returns immediately if no character is available, and
     * returns min(available chars, requested chars) otherwise.
     * It also writes one byte to the port.
     * It then updates its notion of the time elapsed since
     * the beginning of the program invocation. If one second
     * has passed since the last update, it prints the duration (sec).
     *
     * This iteration is repeated for two minutes, and then
     * the program calls exit(0).
     *
     * The problem symptom is this:
     * After the program calls exit(0),
     * the calling shell prompt doesn't return if no cross cable is
     * connected between the serial ports when this program is executed.
     * This happens on Linux.
     * This problem doesn't happen on Solaris 7 for x86.
     *
     * On Linux, ps output shows something like this. Note that
     * newtest appears inside a pair of "[]".
     * I am running the shell inside Emacs shell buffer.
     *
     * 378 ttyp3 S 0:00 /bin/bash -i
     * 582 ttyp3 SW 0:00 [newtest] <--- here!
     * 584 ttyp1 R 0:00 ps axg
     *
     * ps axglw showed
     *
     * 000 1001 378 351 0 0 2448 1280 wait4 S ttyp3 0:00 /bin/bash -i
     * 004 1001 582 378 0 0 0 0 tty_wa SW ttyp3 0:00 [newtest]
     *
     * At this stage, the output from the program is like this, and
     * the shell prompt has not returned yet.
     * ...., 119, 120, 120 sec. quitting...
     *
     * By monitoring the system calls executed by this program using
     * strace, I know that _exit(0) has been called by then.
     *
     * After a lot of experimenting, I have found out that
     * if the two ports are connected via a cross cable, the
     * shell prompt returns(!).
     * That I saw no problem back in March and early April was
     * probably because I had a cable hooked up to these ports back then.
     *
     * But again, the problem didn't happen on Solaris 7 for x86 (without
     * any cable at all).
     * For Solaris, you need to change the name of the tty device.
     *
     * I am not sure what the "tty_wa" in the "ps axglw" output means.
     * Waiting for something?
     * But, since _exit(0) by means of exit(0) has been called,
     * shouldn't the process exit immediately and SIGCHLD be
     * passed to the parent immediately, too?
     *
     * From the Linux man page for _exit(2):
     * --- begin quote ---
     * DESCRIPTION
     * _exit terminates the calling process immediately. Any open
     * file descriptors belonging to the process are closed; any
     * children of the process are inherited by process 1, init,
     * and the process's parent is sent a SIGCHLD signal.
     *
     * status is returned to the parent process as the process's
     * exit status, and can be collected using one of the wait
     * family of calls.
     * --- end quote ---
     *
     * (OK, so I see there must be a problem in
     * closing the file descriptors for the ttys? Hmm... )
     * Shouldn't we forcibly close the tty in this case, when _exit()
     * requests such an action?
     *
     * [ This program is a much shortened version of
     * a program written to explain event-driven programming, in
     * which one event type is the arrival of a certain packet
     * from a device connected to a serial port.
     * The intention was to produce skeleton code that can
     * be shown to programmers who might later need to port
     * the skeleton code to DOS(aga!), very simple embedded OSs, and
     * other OSs. (No select call, for example, for portability
     * reasons. )
     * ]
     */

    --- end quote ---
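
    For readers without the source, here is a minimal sketch of the port
    setup and one loop step as the comment above describes them. It is not
    the 285-line program itself and has not been checked against the hang.
    The function names, the 9600 baud rate, CLOCAL, and the O_NONBLOCK open
    (used only so that open() does not wait for a carrier) are assumptions;
    the post specifies none of them.

```c
#define _DEFAULT_SOURCE            /* exposes CRTSCTS on glibc */
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Fill *t with raw-mode 8E1 settings and RTS/CTS flow control.
 * The 9600 baud rate and CLOCAL are assumptions, not from the post. */
void fill_raw_8e1(struct termios *t)
{
    memset(t, 0, sizeof *t);       /* no input, output or line processing */
    /* PARENB without PARODD is even parity; no CSTOPB is one stop bit. */
    t->c_cflag = CS8 | PARENB | CREAD | CLOCAL | CRTSCTS;
    t->c_cc[VMIN] = 0;             /* VMIN 0 and VTIME 0: read() returns */
    t->c_cc[VTIME] = 0;            /* at once, even with nothing to read */
    cfsetispeed(t, B9600);
    cfsetospeed(t, B9600);
}

/* Open a serial device and apply the settings; returns the fd or -1. */
int open_port(const char *path)
{
    struct termios t;
    int flags;
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);

    if (fd == -1)
        return -1;
    fill_raw_8e1(&t);
    /* Drop O_NONBLOCK again so that VMIN/VTIME alone govern read(). */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 ||
        fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ||
        tcsetattr(fd, TCSANOW, &t) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* One loop step for one port: take a byte if present, then queue one.
 * Returns 0 on success, -1 on a read or write error. */
int poll_port_once(int fd)
{
    unsigned char c;

    if (read(fd, &c, 1) == -1)     /* a return of 0 means nothing waiting */
        return -1;
    c = 'x';                       /* arbitrary payload byte */
    return write(fd, &c, 1) == 1 ? 0 : -1;
}
```

    A caller would open /dev/ttyS0 and /dev/ttyS1 with open_port(), call
    poll_port_once() on each between usleep() calls for two minutes, and
    then call exit(0).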

    PS: writing to the (unconnected) serial port seems to trigger the problem.
    The data presumably lies waiting in the buffer associated with the serial
    line.
    I tested the above program on Linux 2.2.16pre3. (Alan Cox's pre-patch didn't
    update the uname -a output: it still says 2.2.16pre2 when in fact it is pre3.)
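
    If unsent output really is what keeps the close from finishing, a
    program that does not care about that data could discard it itself
    before exiting. The sketch below uses the POSIX call
    tcflush(fd, TCOFLUSH) for this. It is a possible user-space workaround
    under that assumption, not a tested fix for the kernels named here, and
    the helper name is made up for illustration.

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Discard output that was written but not yet transmitted, then close.
 * Hypothetical helper: returns 0 on success, -1 on error. */
int discard_pending_and_close(int fd)
{
    /* ENOTTY only means fd is not a terminal, so nothing is queued. */
    if (tcflush(fd, TCOFLUSH) == -1 && errno != ENOTTY)
        return -1;
    return close(fd);
}
```

    The program would call this on both serial descriptors just before
    exit(0), deliberately giving up any bytes still queued for the
    unconnected ports.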
