puller may get stuck in some cases #12081

Closed
lidezhu opened this issue Mar 6, 2025 · 0 comments · Fixed by #12080
Labels
affects-8.5 This bug affects the 8.5.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. severity/moderate type/bug The issue is confirmed as a bug.

Comments

lidezhu (Collaborator) commented Mar 6, 2025

What did you do?

On the TiKV side, to distinguish connections from different versions of TiCDC, TiKV maintains a feature list for every connection.
TiKV uses the first request on each connection to set that connection's feature list.
https://github.com/tikv/tikv/blob/a34740fefaf69092d14f6af5160e8e5ff1c507f8/components/cdc/src/service.rs#L450

If a connection does not enable FeatureGate::BATCH_RESOLVED_TS, it will not receive any resolved ts messages.
https://github.com/tikv/tikv/blob/a34740fefaf69092d14f6af5160e8e5ff1c507f8/components/cdc/src/endpoint.rs#L443

FeatureGate::BATCH_RESOLVED_TS is enabled when the TiCDC version in the request header is larger than 4.0.8.

But on the TiCDC side, the deregister request's header does not carry TiCDC version information. So if the first request on a connection is a deregister request, that connection can never receive any resolved ts messages.
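
The behavior described above can be modeled with a small Go sketch. This is not TiKV's code (TiKV is written in Rust); the names `parseVersion`, `batchResolvedTsEnabled`, `conn`, and `onRequest` are hypothetical, and the "larger than 4.0.8" boundary simply follows the wording of this issue. It only illustrates that the feature list is latched by the first request, so a first request without version information leaves the gate off for the lifetime of the connection.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "major.minor.patch" (optionally prefixed with "v").
// ok is false for an empty or malformed string. Pre-release suffixes are
// not handled in this sketch.
func parseVersion(v string) (parts [3]int, ok bool) {
	fields := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(fields) != 3 {
		return parts, false
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil || n < 0 {
			return parts, false
		}
		parts[i] = n
	}
	return parts, true
}

// batchResolvedTsEnabled models the gate as described in this issue:
// enabled when the header version is larger than 4.0.8. A missing
// version never enables it. The exact boundary is an assumption.
func batchResolvedTsEnabled(headerVersion string) bool {
	got, ok := parseVersion(headerVersion)
	if !ok {
		return false
	}
	threshold := [3]int{4, 0, 8}
	for i := range got {
		if got[i] != threshold[i] {
			return got[i] > threshold[i]
		}
	}
	return false
}

// conn is a hypothetical stand-in for the per-connection state on the
// TiKV side. Only the first request sets the feature list.
type conn struct {
	featuresSet     bool
	batchResolvedTs bool
}

func (c *conn) onRequest(headerVersion string) {
	if c.featuresSet {
		return // later requests do not change the feature list
	}
	c.featuresSet = true
	c.batchResolvedTs = batchResolvedTsEnabled(headerVersion)
}

func main() {
	c := &conn{}
	c.onRequest("")      // deregister request: no version in the header
	c.onRequest("8.5.0") // a later register request cannot re-enable the gate
	fmt.Println("batch resolved ts enabled:", c.batchResolvedTs)
}
```

In this model, swapping the two `onRequest` calls in `main` leaves the gate enabled, which matches the observation that the problem only appears when the deregister request arrives first.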

This problem occurs when a dispatcher registers and then deregisters within a very short time.
Detailed steps:

  1. A request worker receives a region to send;
  2. Before sending, the request worker finds that the region of the table is already stopped, so it discards the region;
  3. The request worker then receives an unregister signal for the table and sends the deregister request to TiKV. Because this is the first request on the connection, TiKV disables FeatureGate::BATCH_RESOLVED_TS for the connection, so the connection will never receive any resolved ts.
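
The three steps can be replayed in a simplified simulation. Everything here is a hypothetical model, not TiCDC or TiKV code: `fakeStore`, `worker`, `sendRegion`, `unregister`, and `simulate` are invented names, and the version check is reduced to "header has a version or not". The `withVersion` flag shows what would happen in this model if the deregister request carried the version; it is a what-if, not a description of the change in #12080.

```go
package main

import "fmt"

type requestKind int

const (
	register requestKind = iota
	deregister
)

type request struct {
	kind    requestKind
	version string // empty models a header without version information
}

// fakeStore stands in for the TiKV side of a single connection.
type fakeStore struct {
	seenFirst       bool
	batchResolvedTs bool
}

func (s *fakeStore) handle(r request) {
	if !s.seenFirst {
		s.seenFirst = true
		// Simplification: any non-empty version enables the gate.
		s.batchResolvedTs = r.version != ""
	}
}

// worker models a request worker that owns one connection.
type worker struct {
	store   *fakeStore
	stopped map[int64]bool // table IDs that were already stopped
	version string
}

// sendRegion returns false when the region is discarded (step 2).
func (w *worker) sendRegion(tableID int64) bool {
	if w.stopped[tableID] {
		return false
	}
	w.store.handle(request{kind: register, version: w.version})
	return true
}

// unregister sends the deregister request (step 3).
func (w *worker) unregister(withVersion bool) {
	r := request{kind: deregister}
	if withVersion {
		r.version = w.version
	}
	w.store.handle(r)
}

// simulate replays the steps and reports whether the connection ends up
// with batch resolved ts enabled.
func simulate(withVersion bool) bool {
	s := &fakeStore{}
	w := &worker{store: s, stopped: map[int64]bool{1: true}, version: "8.5.0"}
	w.sendRegion(1)           // steps 1 and 2: region received, then discarded
	w.unregister(withVersion) // step 3: deregister is the first request sent
	w.sendRegion(2)           // a later table reuses the same connection
	return s.batchResolvedTs
}

func main() {
	fmt.Println("deregister without version:", simulate(false))
	fmt.Println("deregister with version:   ", simulate(true))
}
```

In this model the first call reports `false` and the second `true`: once the discarded region leaves the deregister request as the first thing on the wire, later register requests on the same connection cannot turn the gate back on.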

What did you expect to see?

No response

What did you see instead?

The puller never receives any resolved ts messages.

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

(paste TiKV version here)

TiCDC version (execute cdc version):

(paste TiCDC version here)
lidezhu added the area/ticdc, type/bug, and severity/moderate labels Mar 6, 2025
lidezhu added the affects-8.5 label Mar 6, 2025
ti-chi-bot bot pushed a commit that referenced this issue Mar 6, 2025