Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document mlogin session termination conditions #315

Closed
davepacheco opened this issue Jul 24, 2017 · 3 comments
Closed

document mlogin session termination conditions #315

davepacheco opened this issue Jul 24, 2017 · 3 comments

Comments

@davepacheco
Copy link
Contributor

We were surprised to find that if I backgrounded an infinite loop process in an mlogin session and then exited my shell, the mlogin job continued to run. It's stuck here, not having exited:

$ mlogin
 * created interactive job -- c9c2f7ec-b546-ed22-ad0c-cde30d6feecb
 * waiting for session... | established
dap@manta # while :; do :; done & 
[1] 74678
dap@manta # exit

When we looked at one example, the process tree for the agent looked like this:

  78813 /sbin/init
    79651 /opt/marlin/build/node/bin/node --abort-on-uncaught-exception /opt/ma
      3807  ./node lib/agent.js

There's no child process of 3807, and that process appears to be doing basically nothing. It's in portfs, as we'd expect for an idle Node program. The last entries in the log are:

[2017-07-22T00:07:16.695Z]  INFO: medusa-agent/3807 on dc5bfc46-d914-45b5-bafc-e713ef336f61: start (type=start, cwd=/, term=xterm-256color, columns=85, lines=80, command=/usr/bin/ctrun)
    arguments: [
      "-i",
      "core",
      "-l",
      "child",
      "/bin/bash",
      "--norc"
    ]
[2017-07-22T00:07:16.722Z]  INFO: medusa-agent/3874 on dc5bfc46-d914-45b5-bafc-e713ef336f61: started child process

We did not notice at first that the background bash process is still running, but it's been reparented to init:

root@MS10209 (us-east-1) ~]# ptree 78813
78353 zsched
  78813 /sbin/init
    78831 /lib/svc/bin/svc.startd
      78996 sulogin
        78997 <defunct>
    78839 /lib/svc/bin/svc.configd
    78945 /lib/inet/ipmgmtd
    79037 /usr/sbin/nscd
    79378 /usr/sbin/syslogd
    79651 /opt/marlin/build/node/bin/node --abort-on-uncaught-exception /opt/marlin/
      3807  ./node lib/agent.js
    10821 /bin/bash --norc

It's pid 10821:

[root@MS10209 (us-east-1) ~]# prstat -mLc -p 10821
Please wait...
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 10821 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.2   0 438   0   0 bash/1
Total: 1 processes, 1 lwps, load averages: 15.26, 12.57, 11.30
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 10821 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.0   0 352   0   0 bash/1
Total: 1 processes, 1 lwps, load averages: 15.01, 12.56, 11.30

Medusa is using pty.js here, and pty.js does not report the child having exited until it also gets a 'close' event on the pty. In this case, we have a child process that still has the pty open, so that's never going to happen. We initially worried this was a regression from #244, but it's not. It happens reliably with previous versions as well.

Marlin's termination condition is just that the top level process exited. It's a reasonable question whether mlogin should have similar semantics. After #244, we could achieve that pretty easily by adding the "-o noorphan" flag to ctrun, which is essentially what Marlin itself does. You can test this using mlogin's -c option, and it does cause this example to terminate immediately when you exit the shell.

@jclulow points out that one may want the mlogin session to continue in this case (e.g., if you've forked a child subshell and killed the parent one). With mlogin it seems better to err on the side of keeping the session open, since terminating it unexpectedly is usually frustrating for users. If you manage to run into this, you can always terminate the session by cancelling the job.

So the conclusion is that we should just document this behavior.

@davepacheco
Copy link
Contributor Author

Proposed change: https://cr.joyent.us/#/c/2255/

@davepacheco
Copy link
Contributor Author

The proposed change is make prepush clean aside from the known MPU test failures against a non-MPU-supporting Manta.

@davepacheco
Copy link
Contributor Author

I've submitted PS2, which was a non-trivial rebase. On this patchset, the manual page looks normal in my terminal and the change is make prepush clean. The MPU tests were skipped on this deployment, but I don't think there's any risk to those in this change.

joyent-automation pushed a commit that referenced this issue Dec 12, 2017
Reviewed by: Joshua M. Clulow <jmc@joyent.com>
Approved by: Joshua M. Clulow <jmc@joyent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant