Update Executor interface for Run and Exec #1563

Merged: 4 commits, Jul 13, 2020

Conversation

coryb (Collaborator) commented Jul 9, 2020

First step for #749: refactor the Executor interface to allow for Run (pid1) vs Exec'ing into running containers. Implements the proposal outlined in #749 (comment).

Signed-off-by: Cory Bennett <cbennett@netflix.com>
@coryb force-pushed the executor-exec-interface branch from 2122ce9 to 6d58121 on July 9, 2020 23:40
		Meta: executor.Meta{
			Args: []string{"ps", "-o", "pid,comm"},
		},
		Stdout: &nopCloser{stdout},
Member:

Can we have a test that also does stdin?
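
For reference, a minimal sketch of what such a test might look like, reusing the ProcessInfo and nopCloser wiring from this diff (the surrounding setup and the testExecStdin name are illustrative, not from the PR):

func testExecStdin(t *testing.T, ctx context.Context, w executor.Executor, id string) {
	stdin := bytes.NewBufferString("hello exec\n")
	stdout := &bytes.Buffer{}

	// echo stdin back via cat and assert it round-trips unchanged
	err := w.Exec(ctx, id, executor.ProcessInfo{
		Meta: executor.Meta{
			Args: []string{"cat"},
			Cwd:  "/",
		},
		Stdin:  ioutil.NopCloser(stdin),
		Stdout: &nopCloser{stdout},
	})
	require.NoError(t, err)
	require.Equal(t, "hello exec\n", stdout.String())
}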

	require.NoError(t, err, fmt.Sprintf("stdout=%q, stderr=%q", stdout.String(), stderr.String()))
	require.Equal(t, string(selfCmdline), stdout.String())
}

func TestRuncWorkerExec(t *testing.T) {
Member:

Could we reuse the same tests for runc/containerd? E.g. a shared package with a test function that takes an executor interface instance.

coryb (Collaborator Author):

Okay, will look into sharing some testing code to dedupe.

Member:

Just the test(s) themselves. Don't try to dedupe the setup logic.
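
For example, something shaped like this could live in a shared package (package and function names here are illustrative, not from the PR):

// Package tests holds executor-agnostic test bodies. Each worker's test
// file keeps its own setup and simply passes in the executor instance.
package tests

import (
	"context"
	"testing"

	"github.com/moby/buildkit/executor"
)

// RunCommonTests is an illustrative entry point: each worker test calls it
// with a factory that builds a fully set-up executor for that worker.
func RunCommonTests(t *testing.T, ctx context.Context, newExecutor func(t *testing.T) executor.Executor) {
	t.Run("run", func(t *testing.T) { testRun(t, ctx, newExecutor(t)) })
	t.Run("exec", func(t *testing.T) { testExec(t, ctx, newExecutor(t)) })
}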


	err = eg.Wait()
	// we expect pid1 to get canceled after we test the exec
	require.EqualError(t, errors.Cause(err), "context canceled")
Member:

nit: context.Canceled.Error()
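
i.e. presumably:

	require.EqualError(t, errors.Cause(err), context.Canceled.Error())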

type ProcessInfo struct {
	Meta           Meta
	Stdin          io.ReadCloser
	Stdout, Stderr io.WriteCloser
Member:

I guess the TODO in here is Resize <-chan Size. Can be in the next PR.
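
For reference, a sketch of where that field would land (the Size type here is an assumption, not part of this PR):

// hypothetical follow-up shape for ProcessInfo
type Size struct {
	Width, Height int
}

type ProcessInfo struct {
	Meta           Meta
	Stdin          io.ReadCloser
	Stdout, Stderr io.WriteCloser
	Resize         <-chan Size // terminal resize events for tty sessions
}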

@@ -310,6 +315,62 @@ func (w *runcExecutor) Exec(ctx context.Context, meta executor.Meta, root cache.
return nil
}

func (w *runcExecutor) Exec(ctx context.Context, id string, process executor.ProcessInfo) error {
	// first verify the container is running, if we get an error assume the container
Member:

Could we do some sync with a Go mutex/cond/channel in here first? E.g. detect that Run() has been called for the ID and has not returned yet. I guess there is still a race while the runc binary is being invoked that needs to be checked with retries, but the first check, and reacting to pid1 returning, would be fast then. Also, maybe Run() should take a channel that is closed to indicate a started event? Ideally, when pid1 returns while this method is running, all the errors returned in here would be about pid1 closing, and not, for example, that an open fails because the state file has been deleted.

coryb (Collaborator Author):

Yeah, I struggled a bit with race conditions between pid1 and exec. I think a channel passed to Run seems reasonable. Since the runc.Run call is blocking, I can set up a short goroutine to poll for the "running" state before sending on the channel, to avoid the ~200ms delay between runc.Run being called and the container actually running. I will try to rework it a bit to avoid the Exec retry loop.
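
A rough sketch of that polling goroutine (hypothetical; go-runc's State call exists, but the exact wiring here is an assumption):

// poll runc state until pid1 reports "running", then signal the caller;
// this runs alongside the blocking runc.Run call.
go func() {
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(10 * time.Millisecond):
		}
		state, err := w.runc.State(ctx, id)
		if err != nil {
			continue // container may not be created yet
		}
		if state.Status == "running" {
			close(started)
			return
		}
	}
}()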

Member:

Something like

Run(id string, started chan<- struct{}) (err error) {
  done := make(chan error, 1)

  e.mu.Lock()
  e.m[id] = done
  e.mu.Unlock()

  defer func() {
    e.mu.Lock()
    delete(e.m, id)
    e.mu.Unlock()
    done <- err
  }()

  close(started)
}

Exec(id string) (err error) {
  e.mu.Lock()
  done, ok := e.m[id]
  e.mu.Unlock()
  if !ok {
    return error404
  }

  defer func() {
    select {
    case err1 := <-done:
      err = errors.Wrapf(err1, "pid1 returned with error")
    default:
    }
  }()
}

?

I still think a retry is needed as well, because of the way we call runc without getting the create event.

coryb (Collaborator Author):

Makes sense, I will update.

	}
	if len(process.Meta.Env) > 0 {
		// merge exec env with pid1 env
		spec.Process.Env = append(spec.Process.Env, process.Meta.Env...)
Member:

I don't think we want to merge.

coryb (Collaborator Author):

Just to confirm: if PATH is set on pid1, and on a subsequent Exec we set just TERM, we expect the Exec to run without any PATH set?

Member:

PATH is kind of special; e.g. we have https://github.com/moby/buildkit/blob/master/solver/llbsolver/ops/exec.go#L728 and should probably do the same for exec. Do you see a case where exec would not know pid1's env when it wants to reuse some of it and can't just resend it?
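
i.e., on the exec path, something like this hedged sketch (system.DefaultPathEnv is buildkit's util/system default; applying it here is the suggestion, not existing code):

// if the exec's env does not define PATH, fall back to the default
// instead of inheriting pid1's env wholesale
hasPath := false
for _, e := range process.Meta.Env {
	if strings.HasPrefix(e, "PATH=") {
		hasPath = true
		break
	}
}
if !hasPath {
	process.Meta.Env = append(process.Meta.Env, "PATH="+system.DefaultPathEnv)
}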

coryb (Collaborator Author):

No, that seems fine to me. I was just not sure if we expected the clients to resend the entire env on exec.

Member:

It seems safer this way: the client has full control when they don't want the same env.

update run/exec tests for stdin and expected failures
move common tests for runc and containerd to shared tests package

Signed-off-by: Cory Bennett <cbennett@netflix.com>
coryb (Collaborator Author) commented Jul 10, 2020

@tonistiigi thanks for the help with the review. I think I have addressed all your issues; hopefully I captured what you wanted.

Looks like one test failed:
https://travis-ci.org/github/moby/buildkit/jobs/707044698#L2818

Not sure if this could be related to my PR or if the tests are sometimes flaky.


 	resolvConf, err := oci.GetResolvConf(ctx, w.root, nil, w.dnsConfig)
 	if err != nil {
-		return err
+		return sendErr(done, err)
Member:

This is better if wrapped in a single defer so there is no risk that wrapping was forgotten.

coryb (Collaborator Author):

Sure, I can do that. I tend to avoid that pattern myself since I have always found the unintentional redefinition of err to be problematic and harder to find, i.e.:

func Exec(...) (err error) {
  defer func() {
    done <- err
  }()
  resolvConf, err := oci.GetResolvConf(ctx, w.root, nil, w.dnsConfig)
  if err != nil {
    return err
  }
}

would send nil on done if GetResolvConf fails.

Member:

> would send nil on done if GetResolvConf fails.

No, it does not :)

coryb (Collaborator Author):

Hah, whoa, TIL. I tried it on the playground just to sanity-check; I must have misinterpreted the results, so much for sanity :(
I redid the test and of course you are correct, it works fine: https://play.golang.org/p/TvbDMw6q5Z2. I would have sworn I ran into this issue a while ago; maybe older Go versions had this issue? Dunno. Anyway, yeah, that will make the change much easier :)

tonistiigi (Member), Jul 11, 2020:

maybe this is what you were thinking https://play.golang.org/p/WJKt7EDGJYD

coryb (Collaborator Author):

I have no idea, maybe I ran into something like you suggest. I tried on an old go1.2 runtime and it was fine there also. So I guess I have just been wrong for years. Oh well, glad you set me straight :)
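
For anyone following along, the distinction the playground links demonstrate, as one standalone example: := at the top level of a function body reuses a named return err, while := inside a nested block shadows it.

package main

import (
	"errors"
	"fmt"
)

func fail() (int, error) { return 0, errors.New("boom") }

func topLevel() (err error) {
	defer func() { fmt.Println("deferred sees:", err) }() // prints "boom"
	v, err := fail() // same scope as the named return: err is reused
	_ = v
	return err
}

func nested() (err error) {
	defer func() { fmt.Println("deferred sees:", err) }() // prints <nil>
	if true {
		v, err := fail() // new err shadows the named return
		_, _ = v, err
	}
	return err
}

func main() {
	topLevel()
	nested()
}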

@coryb force-pushed the executor-exec-interface branch from 4f2fe1a to 36636d3 on July 11, 2020 00:04
Signed-off-by: Cory Bennett <cbennett@netflix.com>
@coryb force-pushed the executor-exec-interface branch from 36636d3 to 5909d16 on July 11, 2020 01:14
coryb (Collaborator Author) commented Jul 11, 2020

I think I addressed all the issues for round 2; let me know if you find more.

tonistiigi (Member) left a comment:

LGTM

@hinshun Do you want to verify this as well?

hinshun (Collaborator) commented Jul 11, 2020

@tonistiigi Yes, I'm out on a trip at the moment, but I'll give this a review tomorrow morning.

hinshun (Collaborator) left a comment:

Looks good overall, small nits.


	proc := spec.Process

	// TODO how do we get rootfsPath for oci.GetUser in case user passed in username rather than uid:gid?
Collaborator:

Do we need to fix this now? It looks like there is some existing code that deals with errors returned from oci.ParseUIDGID(meta.User) here:

uid, gid, err := oci.ParseUIDGID(meta.User)
if err != nil {
	lm := snapshot.LocalMounterWithMounts(rootMounts)
	rootfsPath, err := lm.Mount()
	if err != nil {
		return err
	}
	uid, gid, sgids, err = oci.GetUser(ctx, rootfsPath, meta.User)

coryb (Collaborator Author):

Yeah, the problem is that we need the rootfsPath to call oci.GetUser, and I could not figure out how to access it via the containerd APIs. It was available when creating the container, but I could not determine it after creation. We could track it in the same map where we track the done channel, and then I could use it later at Exec time?
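
Something like this, perhaps (illustrative shape only, not code from the PR):

// track what Run() set up so Exec() can find it later
type runningContainer struct {
	done       chan error
	rootfsPath string
}

// in Run(), guarded by w.mu:
//   w.running[id] = &runningContainer{done: done, rootfsPath: rootfsPath}
// in Exec(), look up w.running[id] to get both the done channel and rootfsPath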

 type Executor interface {
-	// TODO: add stdout/err
-	Exec(ctx context.Context, meta Meta, rootfs cache.Mountable, mounts []Mount, stdin io.ReadCloser, stdout, stderr io.WriteCloser) error
+	Run(ctx context.Context, id string, rootfs cache.Mountable, mounts []Mount, process ProcessInfo, started chan<- struct{}) error
Collaborator:

Can we add some godocs and details about the usage of started?
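
e.g. something along these lines (the wording is a suggestion, not the merged doc):

type Executor interface {
	// Run starts a container for the given process as pid1, with the given
	// rootfs and mounts, and blocks until it exits. If started is non-nil it
	// is closed once the container has actually started, which gates Exec.
	Run(ctx context.Context, id string, rootfs cache.Mountable, mounts []Mount, process ProcessInfo, started chan<- struct{}) error
	// Exec starts an additional process inside the container id, which must
	// already be running via Run, and blocks until that process exits.
	Exec(ctx context.Context, id string, process ProcessInfo) error
}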

Signed-off-by: Cory Bennett <cbennett@netflix.com>
hinshun (Collaborator) commented Jul 13, 2020

The only change is godocs, so I'll go ahead and merge this now.
