Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add graceful shutdown timer to GRPC frontend #7969

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

mattwittwer
Copy link

@mattwittwer mattwittwer commented Jan 27, 2025

What does the PR do?

This PR adds a shutdown timer to the gRPC endpoint for both infer and streaming infer requests. Inflight requests will be allowed to complete before the timer expires and new requests made after shutdown has started will be rejected.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

N/A

Where should the reviewer start?

Start with the new functions in src/grpc/grpc_server.cc and the flow in src/main.cc
GracefulStop();
unload the models in core
Stop();

Test plan:

Review L0_lifecycle and grpc related tests.
Updated shutdown behavior changes the tested responses for some lifecycle tests

  • CI Pipeline ID: 24730709

Caveats:

The updated shutdown process includes three responses. The CANCELLED state cannot be removed as it is part of the GRPC endpoint shutdown behavior:

  • "GRPC server is shutting down and has stopped accepting new requests."
  • "CANCELLED"
  • "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8001: Failed to connect to remote host: connect: Connection refused (111)"

Background

Shutdown behavior for the gRPC endpoint could lead to unexpected errors when unloading the models from the core

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

@mattwittwer mattwittwer added enhancement New feature or request grpc Related to the GRPC server labels Jan 27, 2025
@mattwittwer mattwittwer self-assigned this Jan 27, 2025
@@ -753,14 +753,20 @@ ModelInferHandler::Process(
StartNewRequest();
}

if (ExecutePrecondition(state)) {
if (accepting_new_conn_ && ExecutePrecondition(state)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a lock for accepting_new_conn_?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added lock: conn_mtx_
For both accepting_new_conn_ and cq_shutdown_

@statiraju statiraju requested a review from rmccorm4 January 27, 2025 20:05
@@ -753,14 +753,20 @@ ModelInferHandler::Process(
StartNewRequest();
}

if (ExecutePrecondition(state)) {
if (accepting_new_conn_ && ExecutePrecondition(state)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alongwith reading requests and returning error to the client, I would advice we also prevent calling

service_->RequestModelInfer(
      state->context_->ctx_.get(), &state->request_,
      state->context_->responder_.get(), cq_, cq_, state);

in L671.
The idea is that we should avoid additional activity on the completion queue once the server shutdown is detected. We should just go through the inflight requests and drain their state objects before exiting the server to prevent any memory leak.

When the “Graceful shutdown function” is called, the server immediately notifies all clients to stop sending new RPCs. Then after the clients have received that notification, the server will stop accepting new RPCs.

Source: https://grpc.io/docs/guides/server-graceful-stop/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the forceful shutdown until after the core unloads. This prevents responses from returning from the core once the forceful shutdown has initiated. Completion queues are unloaded only after the forceful shutdown is complete

@rmccorm4 rmccorm4 marked this pull request as draft January 28, 2025 00:16
@rmccorm4 rmccorm4 changed the title draft: mwittwer/grpc shutdown timer draft: feat: Add graceful shutdown timer to GRPC frontend Jan 28, 2025
@mattwittwer mattwittwer requested a review from indrajit96 March 3, 2025 17:52
@mattwittwer mattwittwer changed the title draft: feat: Add graceful shutdown timer to GRPC frontend feat: Add graceful shutdown timer to GRPC frontend Mar 3, 2025
@mattwittwer mattwittwer marked this pull request as ready for review March 3, 2025 18:48
@rmccorm4 rmccorm4 requested review from krishung5 and tanmayv25 March 4, 2025 23:02
#ifdef TRITON_ENABLE_GRPC
if (g_grpc_service) {
// Forceful shutdown of GRPC service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - why would we still need the forceful shutdown of GRPC while we have the graceful shutdown?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The graceful shutdown GracefulStop() rejects any new GRPC requests but does not stop ongoing requests. It waits until the timeout completes and then returns but the GRPC server may still exist. Unloading the triton server core first prevents writing back to unloaded states in the GRPC server.

The forceful shutdown in Stop() then cancels any remaining requests and stops the GRPC server if it hasn't stopped already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request grpc Related to the GRPC server
Development

Successfully merging this pull request may close these issues.

3 participants