Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to use pidfd and epoll to wait init process exit #4517

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented Nov 9, 2024

This PR does some optimizations for runc delete -f.

  1. Nowadays, go runtime has tried to use unix.PidFDSendSignal to send signal to the process,
    this is helpful to reduce the risk of pid reuse attack. So we should replace unix.Kill with
    os.Process.Signal in runc when possible.
  2. But os.Process.Wait is used to wait the child process, to wait a unrelated process, we
    should introduce pidfd & epoll to reduce the sleep time when we want to detect the init
    process exited or not.
  3. For the kernel which doesn't support pidfd & epoll solution, we will fall back to the traditional
    unix.Kill solution, but for stopped containers or containers running in a low load machine,
    we don't need to wait 100ms to do the next detection.

Close: #4512

@lifubang
Copy link
Member Author

lifubang commented Nov 9, 2024

@abel-von PTAL

@kolyshkin
Copy link
Contributor

It seems that the first commit can be merged now and is definitely an improvement.

For the rest of it, give me a few days to review.

@kolyshkin
Copy link
Contributor

Reviewing this reminded me the next step needed for pidfd support in Go, so I wrote this proposal: golang/go#70352

@lifubang
Copy link
Member Author

Reviewing this reminded me the next step needed for pidfd support in Go, so I wrote this proposal: golang/go#70352

Wonderful proposal, I used to think that golang wouldn't support similar interfaces, but I think it's very useful, looking forward its coming.

@abel-von
Copy link

LGTM

@kolyshkin
Copy link
Contributor

@lifubang are you going to keep working on it? This looks very good overall

@lifubang
Copy link
Member Author

lifubang commented Dec 7, 2024

@lifubang are you going to keep working on it? This looks very good overall

Thanks, I'll work on it later.

@lifubang lifubang force-pushed the kill-via-pidfd branch 2 times, most recently from cc64599 to 16ae7fc Compare December 8, 2024 14:34
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I'm having a hard time reviewing this because it's hard to wrap around all the kill/destroy logic that we already have in place:

  • container.Signal;
  • terminate method of parentProcess;
  • destroy(c *Container);
  • etc.

All this seem like a bunch of code full of special cases and kludges, and (in my eyes) it cries to be refactored to be more straightforward and clean. Maybe I'm wrong but this stands in the way of me reviewing this.

As I can't finish this, here's my WIP review bits and pieces:

  1. The logic added might also be useful from func destroy(c *Container) error.

  2. Using epoll on pidfd can also be used when there are multiple pids (i.e. from signalAllProcesses, in the current code).

  3. This adds a new public method (container.Kill) when we already have container.Signal. I understand why, but maybe we should call it container.KillSync (or container.EnsureKilled) instead (so it's clear it not just sends the SIGKILL but also waits for the container to be killed). A note to container.Signal should be added referencing the new method. It should also be described in libcontainer's README.

@kolyshkin
Copy link
Contributor

This would be very nice to have in 1.3 but we might not have enough time to have it ready by 1.3-rc1.

@lifubang
Copy link
Member Author

2. Using epoll on pidfd can also be used when there are multiple pids (i.e. from signalAllProcesses, in the current code).

I think it's not worth to do:

  1. We have no killed check mechanism for shared pid ns container killing before;
  2. I think cgroupv1 will be deprecated in a short future, and we have changed to use cgroup.kill in cgroupv2.
    WDYT @kolyshkin

If we don't want to introduce a wait mechanism in signalAllProcesses, I think I will complete this PR in a short time.

@lifubang lifubang force-pushed the kill-via-pidfd branch 2 times, most recently from edc621d to 5f102d8 Compare February 25, 2025 03:46
@kolyshkin
Copy link
Contributor

I think it's not worth to do:

  1. We have no killed check mechanism for shared pid ns container killing before;
  2. I think cgroupv1 will be deprecated in a short future, and we have changed to use cgroup.kill in cgroupv2.
    WDYT @kolyshkin

If we don't want to introduce a wait mechanism in signalAllProcesses, I think I will complete this PR in a short time.

I agree, let's drop the 'signalAllProcesses` (frankly I don't remember why I thought it would be useful in there).

@kolyshkin
Copy link
Contributor

Will try to review the rest of it tomorrow.

Comment on lines 497 to 498
// KillAndWaitExit kills the container and waits for the init process to exit.
func (c *Container) KillAndWaitExit() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some consideration, I don't like the KillAndWaitExit name, because

  • when a process is being killed, it does not exit;
  • we're not waiting for a process to exit -- we're waiting for the kernel to finish killing it.

The KillAndWait name is not good either, as I said earlier (it implies wait syscall, which we don't call as it's not our child).

This is why I've suggested something like KillSync or EnsureKilled before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I choose EnsureKilled.

Comment on lines 459 to 460
// We don't need unix.PidfdSendSignal because go runtime will use it if possible.
_ = c.Signal(unix.SIGKILL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use c.signal here as we made sure that we have private pidns.

Maybe don't ignore error, because otherwise you might be waiting for 10 seconds for something that will never happen.

Or, you can be explicit and use PidfdSendSignal here to be 100% sure that it succeeded.

lifubang and others added 2 commits March 8, 2025 03:56
Signed-off-by: lifubang <lifubang@acmcoder.com>
When using unix.Kill to kill the container, we need a for loop to detect
the init process exited or not manually, we sleep 100ms each time in the
current, but for stopped containers or containers running in a low load
machine, we don't need to wait so long time. This change will reduce the
delete delay in some situations, especially for those pods with many
containers in.

Co-authored-by: Abel Feng <fshb1988@gmail.com>
Signed-off-by: lifubang <lifubang@acmcoder.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants