[fix][txn] fix concurrent error cause txn stuck in TransactionBufferHandlerImpl#endTxn #23551

Merged
lhotari merged 3 commits into apache:master from TakaHiR07:fix_concurrent_error_in_TransactionBufferHandlerImpl
Nov 4, 2025

TakaHiR07 (Contributor) commented Nov 4, 2024

Fixes #23550

Motivation

After investigating the code, I found a concurrency bug in TransactionBufferHandlerImpl#checkRequestCredits() and checkPendingRequests() that causes the issue above.

Currently, the config TransactionBufferClientMaxConcurrentRequests limits the number of concurrent requests. However, if the requests and the response are executed in the following order, requests get permanently stuck in the queue.
(To simplify the case, assume the permit count is 1.)

step | actor | action
1 | request-1 | start checkRequestCredits()
2 | request-1 | compareAndSet requestCredits to 0
3 | request-1 | execute endTxn
4 | request-2 | start checkRequestCredits()
5 | request-2 | get currentPermits = 0
6 | response-1 | trigger onResponse(), set requestCredits to 1
7 | response-1 | trigger checkPendingRequests(); permits == 1 && pendingRequests is empty, so the while loop breaks
8 | request-2 | currentPermits == 0 (stale value from step 5) && pendingRequests is empty, so add op to pendingRequests
9 | request-3 | start checkRequestCredits()
10 | request-3 | currentPermits == 1 && pendingRequests is not empty, so also add op to pendingRequests

At this point no request is in flight, so no response will arrive to remove an op from pendingRequests. Every new request is simply added to pendingRequests and is never executed.
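The steps above can be replayed deterministically on a single thread. The following is only a minimal sketch of the credit gate as described in this PR, not the Pulsar source: the class name CreditGateRaceSketch, the split of checkRequestCredits() into readCredits() and decide(), and the use of plain strings as ops are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified, hypothetical model of the credit gate; not the Pulsar source. */
final class CreditGateRaceSketch {
    final AtomicInteger requestCredits = new AtomicInteger(1); // permit = 1
    // The replay below is single-threaded, so a plain ArrayDeque is enough here.
    final Queue<String> pendingRequests = new ArrayDeque<>();

    /** First half of the check: snapshot the current credits. */
    int readCredits() {
        return requestCredits.get();
    }

    /** Second half: act on a possibly stale snapshot. Returns true if the op may be sent. */
    boolean decide(String op, int currentPermits) {
        if (currentPermits > 0 && pendingRequests.peek() == null) {
            if (requestCredits.compareAndSet(currentPermits, currentPermits - 1)) {
                return true;
            }
            return decide(op, requestCredits.get()); // lost the CAS, retry with a fresh read
        }
        pendingRequests.add(op);
        return false;
    }

    /** Response path: return the credit, then drain while a credit and a queued op both exist. */
    int onResponse() {
        requestCredits.incrementAndGet();
        int drained = 0;
        while (true) {
            int permits = requestCredits.get();
            if (permits > 0 && pendingRequests.peek() != null) {
                if (requestCredits.compareAndSet(permits, permits - 1)) {
                    pendingRequests.poll();
                    drained++;
                }
            } else {
                break;
            }
        }
        return drained;
    }

    /** Replays steps 1-10 in order and returns the final state of the gate. */
    static CreditGateRaceSketch replay() {
        CreditGateRaceSketch gate = new CreditGateRaceSketch();
        int seenBy1 = gate.readCredits();             // step 1
        gate.decide("request-1", seenBy1);            // steps 2-3: credits 1 -> 0, op is sent
        int seenBy2 = gate.readCredits();             // steps 4-5: reads 0
        gate.onResponse();                            // steps 6-7: credits back to 1, queue empty
        gate.decide("request-2", seenBy2);            // step 8: stale 0, so the op is queued
        gate.decide("request-3", gate.readCredits()); // steps 9-10: credit free, queue not empty
        return gate;
    }

    public static void main(String[] args) {
        CreditGateRaceSketch gate = replay();
        System.out.println("credits=" + gate.requestCredits.get()
                + " pending=" + gate.pendingRequests);
    }
}
```

In this model the replay ends with one free credit and two queued ops, while nothing is in flight, which matches the stuck state described above.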

Modifications

The root cause is that currently only onResponse() can remove an op from pendingRequests. But when onResponse() runs, the requestOp may not have been added to pendingRequests yet.

  • So one modification is to also check the pendingRequests queue in checkRequestCredits().
  • The while(true) loop in checkPendingRequests() is not necessary: when one response comes back, taking one requestOp from pendingRequests is enough.
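One possible reading of these two modifications is sketched below. It is not the merged diff: the class CreditGateFixSketch, the helper drainOne(), and the sent list that stands in for actually sending an op are hypothetical names, and the replay only covers the single interleaving from the Motivation section, so it does not prove correctness under every schedule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the described fix; names and structure are assumptions. */
final class CreditGateFixSketch {
    final AtomicInteger requestCredits = new AtomicInteger(1); // permit = 1
    final Queue<String> pendingRequests = new ConcurrentLinkedQueue<>();
    // Stands in for sending the op; only touched by the single-threaded replay below.
    final List<String> sent = new ArrayList<>();

    int readCredits() {
        return requestCredits.get();
    }

    /** Request path: send directly if possible, otherwise queue and re-check the queue. */
    void submit(String op, int currentPermits) {
        if (currentPermits > 0 && pendingRequests.peek() == null
                && requestCredits.compareAndSet(currentPermits, currentPermits - 1)) {
            sent.add(op);
            return;
        }
        pendingRequests.add(op);
        drainOne(); // modification 1: the request path also checks the pending queue
    }

    /** Response path: return the credit and dispatch at most one queued op. */
    void onResponse() {
        requestCredits.incrementAndGet();
        drainOne(); // modification 2: one response takes one op, no drain-everything loop
    }

    /** Dispatches at most one queued op; the loop only retries after a lost CAS. */
    private void drainOne() {
        while (true) {
            int permits = requestCredits.get();
            if (permits <= 0 || pendingRequests.peek() == null) {
                return;
            }
            if (requestCredits.compareAndSet(permits, permits - 1)) {
                String op = pendingRequests.poll();
                if (op != null) {
                    sent.add(op);
                } else {
                    requestCredits.incrementAndGet(); // queue emptied meanwhile, give the credit back
                }
                return;
            }
        }
    }

    /** Replays the interleaving from the Motivation section, then the two later responses. */
    static CreditGateFixSketch replay() {
        CreditGateFixSketch gate = new CreditGateFixSketch();
        gate.submit("request-1", gate.readCredits()); // sent directly, credits 1 -> 0
        int staleSnapshot = gate.readCredits();       // request-2 reads 0
        gate.onResponse();                            // response-1: credits 1, queue still empty
        gate.submit("request-2", staleSnapshot);      // queued, then drained by the request path
        gate.submit("request-3", gate.readCredits()); // no credit left, stays queued
        gate.onResponse();                            // response-2 dispatches request-3
        gate.onResponse();                            // response-3 returns the last credit
        return gate;
    }

    public static void main(String[] args) {
        CreditGateFixSketch gate = replay();
        System.out.println("sent=" + gate.sent + " pending=" + gate.pendingRequests);
    }
}
```

With the request path re-checking the queue after it enqueues, request-2 no longer depends on a response that has already been processed, so all three ops are dispatched in this replay.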

It is hard to add a test for this concurrency case.

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Nov 4, 2024
TakaHiR07 (Contributor, Author) commented Nov 4, 2024

@codelipenghui @congbobo184 Can you help review this PR?

lhotari reviewed Nov 5, 2024
TakaHiR07 force-pushed the fix_concurrent_error_in_TransactionBufferHandlerImpl branch from b679478 to e5428b9 Compare November 6, 2024 09:38
lhotari requested review from Demogorgon314, Technoboy- and congbobo184 November 6, 2024 11:58
lhotari approved these changes Nov 6, 2024
lhotari (Member) left a comment

LGTM

lhotari added this to the 4.1.0 milestone Nov 6, 2024
lhotari assigned TakaHiR07 Nov 6, 2024
congbobo184 approved these changes Nov 11, 2024
congbobo184 (Contributor) left a comment

LGTM

dao-jun approved these changes Nov 11, 2024

codecov-commenter commented Jul 30, 2025

Codecov Report

Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
Project coverage is 74.29%. Comparing base (676ba07) to head (ba2daca).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
...tion/buffer/impl/TransactionBufferHandlerImpl.java 33.33% 1 Missing and 1 partial
Additional details and impacted files

@@ Coverage Diff @@
## master #23551 +/- ##
=============================================
+ Coverage 38.56% 74.29% +35.73%
- Complexity 13262 33920 +20658
=============================================
Files 1856 1913 +57
Lines 145287 149503 +4216
Branches 16877 17372 +495
=============================================
+ Hits 56025 111074 +55049
+ Misses 81696 29582 -52114
- Partials 7566 8847 +1281
Flag Coverage Δ
inttests 26.24% <0.00%> (+0.06%)
systests 22.75% <0.00%> (-0.01%)
unittests 73.81% <33.33%> (+39.07%)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...tion/buffer/impl/TransactionBufferHandlerImpl.java 66.25% <33.33%> (+15.32%)

... and 1410 files with indirect coverage changes

lhotari requested a review from congbobo184 August 27, 2025 09:23
coderzc modified the milestones: 4.1.0, 4.2.0 Sep 1, 2025
congbobo184 approved these changes Sep 3, 2025
lhotari merged commit c4f125c into apache:master Nov 4, 2025
53 of 54 checks passed
lhotari pushed a commit that referenced this pull request Nov 4, 2025
...andlerImpl#endTxn (#23551)

Co-authored-by: fanjianye
(cherry picked from commit c4f125c)
lhotari pushed a commit that referenced this pull request Nov 4, 2025
...andlerImpl#endTxn (#23551)

Co-authored-by: fanjianye
(cherry picked from commit c4f125c)
lhotari pushed a commit that referenced this pull request Nov 4, 2025
...andlerImpl#endTxn (#23551)

Co-authored-by: fanjianye
(cherry picked from commit c4f125c)
lhotari added the cherry-picked/branch-3.0 label Nov 4, 2025
lhotari pushed a commit that referenced this pull request Nov 4, 2025
...andlerImpl#endTxn (#23551)

Co-authored-by: fanjianye
(cherry picked from commit c4f125c)
lhotari added the cherry-picked/branch-3.3 label Nov 4, 2025
ganesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 6, 2025
...andlerImpl#endTxn (apache#23551)

Co-authored-by: fanjianye
(cherry picked from commit c4f125c)
(cherry picked from commit 74931c9)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 6, 2025
...andlerImpl#endTxn (apache#23551)

Co-authored-by: fanjianye
(cherry picked from commit c4f125c)
(cherry picked from commit 74931c9)
nodece pushed a commit to nodece/pulsar that referenced this pull request Nov 12, 2025
...andlerImpl#endTxn (apache#23551)

Co-authored-by: fanjianye
(cherry picked from commit c4f125c)

Reviewers

lhotari approved these changes

dao-jun approved these changes

congbobo184 approved these changes

Technoboy- (awaiting requested review)

Demogorgon314 (awaiting requested review)

Assignees

TakaHiR07

Projects

None yet

Milestone

4.2.0

Development

Successfully merging this pull request may close these issues.

[Bug][txn] txn committing stuck and never finish commit process

6 participants