Update Executor interface for Run and Exec #1563
Conversation
Signed-off-by: Cory Bennett <cbennett@netflix.com>
Force-pushed from 2122ce9 to 6d58121.
worker/runc/runc_test.go (outdated):

	Meta: executor.Meta{
		Args: []string{"ps", "-o", "pid,comm"},
	},
	Stdout: &nopCloser{stdout},
can we have a test that also does stdin?
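A stdin check would essentially just wire a reader into ProcessInfo and assert on what comes back out. As a rough illustration of that wiring, here is a host-side toy using os/exec in place of the real runc test harness; none of this is code from the PR, and the names are made up:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// processInfo mirrors the shape of executor.ProcessInfo for this toy.
type processInfo struct {
	Args           []string
	Stdin          io.ReadCloser
	Stdout, Stderr io.Writer
}

// runLocal stands in for the executor: it runs the command on the host,
// wiring stdin/stdout/stderr the way an exec into a container would.
func runLocal(p processInfo) error {
	cmd := exec.Command(p.Args[0], p.Args[1:]...)
	cmd.Stdin = p.Stdin
	cmd.Stdout = p.Stdout
	cmd.Stderr = p.Stderr
	return cmd.Run()
}

func main() {
	stdout, stderr := &bytes.Buffer{}, &bytes.Buffer{}
	err := runLocal(processInfo{
		Args:   []string{"cat"}, // cat echoes stdin back to stdout
		Stdin:  io.NopCloser(strings.NewReader("hello stdin\n")),
		Stdout: stdout,
		Stderr: stderr,
	})
	fmt.Printf("err=%v stdout=%q stderr=%q\n", err, stdout.String(), stderr.String())
}
```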
	require.NoError(t, err, fmt.Sprintf("stdout=%q, stderr=%q", stdout.String(), stderr.String()))
	require.Equal(t, string(selfCmdline), stdout.String())
}

func TestRuncWorkerExec(t *testing.T) {
Could we reuse the same tests for runc/containerd? A shared package with a test function that takes an executor interface instance.
okay, will look into sharing some testing code to dedup.
Just the test(s) itself. Don't try to dedupe the setup logic.
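For concreteness, one hypothetical shape for such a shared package (the names here are illustrative, not necessarily what the PR ended up adding): the backends keep their own setup and pass a ready executor into shared test functions.

```go
package tests

import (
	"context"
	"testing"

	"github.com/moby/buildkit/executor"
)

// TestFunc is a single shared scenario run against an already-built executor.
type TestFunc func(t *testing.T, ctx context.Context, e executor.Executor)

// RunAll runs every shared scenario as a subtest. The runc and containerd
// packages call this with their own setup helper, so only the assertions
// are deduplicated, not the setup logic.
func RunAll(t *testing.T, newExecutor func(t *testing.T) executor.Executor, scenarios map[string]TestFunc) {
	for name, fn := range scenarios {
		t.Run(name, func(t *testing.T) {
			fn(t, context.Background(), newExecutor(t))
		})
	}
}
```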
worker/runc/runc_test.go (outdated):

	err = eg.Wait()
	// we expect pid1 to get canceled after we test the exec
	require.EqualError(t, errors.Cause(err), "context canceled")
nit: context.Canceled.Error()
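i.e., compare against the canonical message instead of a string literal (same imports as the surrounding test assumed):

```go
require.EqualError(t, errors.Cause(err), context.Canceled.Error())
```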
type ProcessInfo struct {
	Meta           Meta
	Stdin          io.ReadCloser
	Stdout, Stderr io.WriteCloser
I guess the TODO in here is Resize <-chan Size. Can be next PR.
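One possible shape for that follow-up, sketched here only for illustration (the field and type names are guesses, not something this PR adds):

```go
// WinSize is a terminal window size; the executor would drain Resize and
// apply each update to the process's TTY.
type WinSize struct {
	Rows uint32
	Cols uint32
}

type ProcessInfo struct {
	Meta           Meta
	Stdin          io.ReadCloser
	Stdout, Stderr io.WriteCloser
	Resize         <-chan WinSize // nil when the caller never resizes
}
```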
@@ -310,6 +315,62 @@ func (w *runcExecutor) Exec(ctx context.Context, meta executor.Meta, root cache.
 	return nil
 }

+func (w *runcExecutor) Exec(ctx context.Context, id string, process executor.ProcessInfo) error {
+	// first verify the container is running, if we get an error assume the container
Could we do some sync with a Go mutex/cond/channel in here first? E.g. detect that Run() has been called for the ID and has not returned yet. I guess there is still a race while the runc binary is being called that needs to be checked with retries, but the first check and reacting to pid1 returning would be fast then. Also, maybe Run() should take a channel that is closed to indicate the started event? Ideally, when pid1 returns while this method is running, all the errors returned in here would be about pid1 closing, and not, for example, about open failing because the state file has been deleted.
yeah, I struggled a bit with race conditions between pid1 and exec. I think a channel passed to Run seems reasonable. Since runc.Runc is blocking, I can set up a short goroutine to poll for the "running" state before sending on the channel, to avoid the ~200ms delay between runc.Run being called and the container actually running. I will try to rework it a bit to avoid the Exec retry loop.
Something like

Run(id string, started chan<- struct{}) (err error) {
	done := make(chan error, 1)
	e.mu.Lock()
	e.m[id] = done
	defer func() {
		e.mu.Lock()
		delete(e.m, id)
		e.mu.Unlock()
		done <- err
	}()
	e.mu.Unlock()
	close(started)
}

Exec(id string) (err error) {
	e.mu.Lock()
	done, ok := e.m[id]
	e.mu.Unlock()
	if !ok {
		return error404
	}
	defer func() {
		select {
		case pid1Err := <-done:
			err = errors.Wrapf(pid1Err, "pid1 returned with error")
		default:
		}
	}()
}

?
I still think retry is needed as well because of the way we call runc without getting the create event.
makes sense, I will update.
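To make the agreed-upon shape concrete, here is a compilable toy version of the pattern (procExecutor, pid1, and proc are stand-in names, not code from this PR): Run registers a per-id done channel and closes started once the container is up; Exec uses the map to fail fast on unknown ids and to attribute errors to pid1 exiting mid-exec.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// procExecutor is a toy stand-in for the real executor: it tracks a per-id
// "done" channel so Exec can distinguish "no such container" from
// "pid1 already exited".
type procExecutor struct {
	mu      sync.Mutex
	running map[string]chan error
}

func newProcExecutor() *procExecutor {
	return &procExecutor{running: make(map[string]chan error)}
}

// Run registers the id, closes started once the container is considered
// running, and publishes pid1's exit error on the done channel on return.
func (e *procExecutor) Run(id string, started chan<- struct{}, pid1 func() error) (err error) {
	done := make(chan error, 1)
	e.mu.Lock()
	e.running[id] = done
	e.mu.Unlock()
	defer func() {
		e.mu.Lock()
		delete(e.running, id)
		e.mu.Unlock()
		done <- err
	}()

	close(started) // tell callers that Exec may now be attempted
	return pid1()  // blocks until pid1 exits
}

// Exec fails fast for unknown ids and, if pid1 exits while the exec is in
// flight, attributes the failure to pid1 rather than to an unrelated error.
func (e *procExecutor) Exec(id string, proc func() error) (err error) {
	e.mu.Lock()
	done, ok := e.running[id]
	e.mu.Unlock()
	if !ok {
		return errors.New("container not found: " + id)
	}
	defer func() {
		select {
		case pid1Err := <-done:
			err = fmt.Errorf("pid1 exited during exec: %v (exec error: %v)", pid1Err, err)
		default:
		}
	}()
	return proc()
}

func main() {
	e := newProcExecutor()
	started := make(chan struct{})

	go e.Run("c1", started, func() error {
		time.Sleep(100 * time.Millisecond) // pretend pid1 stays up for a while
		return nil
	})

	<-started // wait for "running" before exec'ing, instead of a retry loop
	if err := e.Exec("c1", func() error { return nil }); err != nil {
		fmt.Println("exec error:", err)
		return
	}
	fmt.Println("exec ok")
}
```

As noted above, the real executor would still keep a short retry/poll around the runc invocation itself, since runc is called without a create event to wait on.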
executor/runcexecutor/executor.go (outdated):

	}
	if len(process.Meta.Env) > 0 {
		// merge exec env with pid1 env
		spec.Process.Env = append(spec.Process.Env, process.Meta.Env...)
I don't think we want to merge.
just to confirm ... if PATH is set on pid1, and on a subsequent Exec we set just TERM, we expect the Exec to run without any PATH set?
PATH is kind of special, e.g. we have https://github.com/moby/buildkit/blob/master/solver/llbsolver/ops/exec.go#L728 and should probably do the same for exec. Do you see a case where exec would not know the env of pid1 when it wants to reuse some of it and can't just resend it?
No, that seems fine to me. I was just not sure if we expected the clients to resend the entire env on exec.
It seems safer this way, to let the client have full control when they don't want the same env.
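A rough sketch of "do the same for exec" under that decision (the helper and constant names here are illustrative; the linked exec.go code may rely on a shared helper instead): the exec env fully replaces pid1's env, but a missing PATH falls back to a default rather than being left unset.

```go
// assumes: import "strings"

// defaultPathEnv mirrors the usual Linux default search path.
const defaultPathEnv = "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

// withDefaultPath returns env unchanged if it already sets PATH, and
// appends the default otherwise.
func withDefaultPath(env []string) []string {
	for _, kv := range env {
		if strings.HasPrefix(kv, "PATH=") {
			return env
		}
	}
	return append(env, defaultPathEnv)
}

// usage in Exec, in place of the merge discussed above:
//   spec.Process.Env = withDefaultPath(process.Meta.Env)
```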
update run/exec tests for stdin and expected failures; move common tests for runc and containerd to a shared tests package
Signed-off-by: Cory Bennett <cbennett@netflix.com>
@tonistiigi thanks for the help with the review. I think I have addressed all your issues; hopefully I captured what you wanted. Looks like one test failed: not sure if this could be related to my PR or if the tests are sometimes flaky.
	resolvConf, err := oci.GetResolvConf(ctx, w.root, nil, w.dnsConfig)
	if err != nil {
-		return err
+		return sendErr(done, err)
This is better if wrapped in a single defer so there is no risk that wrapping was forgotten.
Sure, I can do that. I tend to avoid that pattern myself since I have always found the unintentional redefinition of err to be problematic and harder to find, i.e.:

func Exec(...) (err error) {
	defer func() {
		done <- err
	}()
	resolvConf, err := oci.GetResolvConf(ctx, w.root, nil, w.dnsConfig)
	if err != nil {
		return err
	}

would send nil on done if GetResolvConf fails.
> would send nil on done if GetResolvConf fails.
No, it does not :)
Hah, whoa, TIL. I tried it on play just to sanity check, I must have misinterpreted the results, so much for sanity :(
I redid the test and of course you are correct, it works fine: https://play.golang.org/p/TvbDMw6q5Z2 I would have sworn I ran into this issue a while ago, maybe older Go versions had this issue? Dunno, anyway, yeah that will make the change much easier :)
maybe this is what you were thinking https://play.golang.org/p/WJKt7EDGJYD
I have no idea, maybe I ran into something like you suggest. I tried on an old go1.2 runtime and it was fine there also. So I guess I have just been wrong for years; oh well, glad you set me straight :)
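For readers puzzling over the exchange above, a small self-contained demonstration (hypothetical names, not code from this PR) of both cases: the snippet from the earlier comment works because resolvConf is the only new variable on the left of :=, so err stays the named result; the nil-on-done behaviour only appears when := runs inside an inner block and shadows the named result, which may be what the second playground link was getting at.

```go
package main

import (
	"errors"
	"fmt"
)

var errFetch = errors.New("fetch failed")

// fetch is a stand-in for a call like oci.GetResolvConf.
func fetch() (string, error) {
	return "", errFetch
}

// works mirrors the snippet from the discussion: resolvConf is the only new
// name on the left of :=, so err reuses the named result and the deferred
// send observes the real error.
func works(done chan error) (err error) {
	defer func() { done <- err }()
	resolvConf, err := fetch()
	if err != nil {
		return err
	}
	_ = resolvConf
	return nil
}

// shadowed shows the pitfall: := inside an inner block declares a *new* err,
// so an error handled there without an explicit "return err" never reaches
// the named result, and the deferred send reports nil.
func shadowed(done chan error) (err error) {
	defer func() { done <- err }()
	{
		resolvConf, err := fetch() // new err, scoped to this block
		if err != nil {
			fmt.Println("handled locally:", err) // forgot to propagate
		}
		_ = resolvConf
	}
	return nil
}

func main() {
	done := make(chan error, 1)

	_ = works(done)
	fmt.Println("works sent:   ", <-done) // fetch failed

	_ = shadowed(done)
	fmt.Println("shadowed sent:", <-done) // <nil>
}
```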
Force-pushed from 4f2fe1a to 36636d3.
Signed-off-by: Cory Bennett <cbennett@netflix.com>
Force-pushed from 36636d3 to 5909d16.
I think I addressed all the issues for round 2; let me know if you find more.
LGTM
@hinshun Do you want to verify this as well?
@tonistiigi Yes, I'm out on a trip atm, but I'll give this a review tmrw morning.
Looks good overall, small nits.
	proc := spec.Process

	// TODO how do we get rootfsPath for oci.GetUser in case user passed in username rather than uid:gid?
Do we need to fix this now? It looks like there is some existing code that deals with errors returned from oci.ParseUIDGID(meta.User) here:

buildkit/executor/containerdexecutor/executor.go, lines 107 to 114 in 5909d16:

	uid, gid, err := oci.ParseUIDGID(meta.User)
	if err != nil {
		lm := snapshot.LocalMounterWithMounts(rootMounts)
		rootfsPath, err := lm.Mount()
		if err != nil {
			return err
		}
		uid, gid, sgids, err = oci.GetUser(ctx, rootfsPath, meta.User)
Yeah, the problem is that we need the rootfsPath to call oci.GetUser, which I could not figure out how to access via the containerd APIs. It was available when creating the container, but I could not figure out how to determine it after creation. We could track it in the same map where we track the done channel, and then I could use it later at Exec time?
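A small sketch of that "track it in the same map" idea (field and type names are hypothetical): whatever Exec needs later is recorded next to the done channel when Run sets the container up.

```go
// assumes: import "sync"

// containerState is the per-container bookkeeping kept between Run and Exec.
type containerState struct {
	done       chan error // receives pid1's exit error when Run returns
	rootfsPath string     // recorded by Run after mounting, for oci.GetUser at Exec time
}

type containerdExecutor struct {
	mu      sync.Mutex
	running map[string]*containerState
}
```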
 type Executor interface {
-	// TODO: add stdout/err
-	Exec(ctx context.Context, meta Meta, rootfs cache.Mountable, mounts []Mount, stdin io.ReadCloser, stdout, stderr io.WriteCloser) error
+	Run(ctx context.Context, id string, rootfs cache.Mountable, mounts []Mount, process ProcessInfo, started chan<- struct{}) error
Can we add some godocs and details about the usage of started?
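For illustration, one possible wording for those godocs (the comment that actually got merged may read differently):

```go
type Executor interface {
	// Run starts a container with rootfs mounted and process as pid1, and
	// blocks until that process exits. If started is non-nil, it is closed
	// as soon as the container is running, i.e. once Exec may be called for
	// the same id.
	Run(ctx context.Context, id string, rootfs cache.Mountable, mounts []Mount, process ProcessInfo, started chan<- struct{}) error

	// Exec runs an additional process inside the container created by a
	// prior, still-running Run call with the same id.
	Exec(ctx context.Context, id string, process ProcessInfo) error
}
```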
Signed-off-by: Cory Bennett <cbennett@netflix.com>
The only change is godocs, so I'll go ahead and merge this now.
First step for #749: refactor the Executor interface to allow for Run (pid1) vs Exec'ing into running containers. Implements the proposal outlined in #749 (comment).