Commit d919943

Don't ignore failure to create cgroup after timeout
Before this commit, creating a cgroup would silently ignore timeouts and carry on. Concretely, this caused cases where a cgroup failed to create, but the caller didn't realize it and ended up looking for files that should exist (e.g. cgroup.controllers), only to find they didn't. It's very difficult for a caller to deal with this case, where NewSystemd succeeds but the group doesn't exist.

The origins of this code seem to trace back to an initial implementation written 5+ years ago: 5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672

runc added roughly the same logic to deal with the same issue: opencontainers/runc#3782

Now, containerd will also error if a cgroup cannot be created within the timeout window.

Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
1 parent 190de3b commit d919943

1 file changed: +4 -2 lines changed
cgroup2/manager.go

@@ -952,14 +952,16 @@ func startUnit(conn *systemdDbus.Conn, group string, properties []systemdDbus.Pr
 		}
 	}
 
+	systemdStartUnitTimeout := 30 * time.Second
 	select {
 	case s := <-statusChan:
 		if s != "done" {
 			attemptFailedUnitReset(conn, group)
 			return fmt.Errorf("error creating systemd unit `%s`: got `%s`", group, s)
 		}
-	case <-time.After(30 * time.Second):
-		log.G(ctx).Warnf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus. Continuing...", group)
+	case <-time.After(systemdStartUnitTimeout):
+		attemptFailedUnitReset(conn, group)
+		return fmt.Errorf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus after %v", group, systemdStartUnitTimeout)
 	}
 
 	return nil
