
[Bug/Question] RpcException: timeout when waiting for send fragments RPC (exec_plan_fragment_prepare) #59561

Unanswered
fukai321 asked this question in A - General / Q&A
Jan 5, 2026 · 0 comments

  1. Describe the bug
    When executing a query, the FE fails with a java.util.concurrent.TimeoutException: it waited 5000 ms but did not receive a response from the BE while sending the execution plan fragment (exec_plan_fragment_prepare).

  2. Environment
    Doris Version: 1.2.1
    Cluster Scale: 1 FE (16C 32G), 3 BE (16C 64G)

  3. Error Log
    The following error was captured in fe.log:
    Caused by: java.util.concurrent.TimeoutException: Waited 5000 milliseconds (plus 35 milliseconds, 423986 nanoseconds delay) for io.grpc.stub.ClientCalls$GrpcFuture@2eafe466[status=PENDING, info=[GrpcFuture{clientCall={delegate=ClientCallImpl{method=MethodDescriptor{fullMethodName=doris.PBackendService/exec_plan_fragment_prepare, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@2b90f55c, responseMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@20aab861, schemaDescriptor=org.apache.doris.proto.PBackendServiceGrpc$PBackendServiceMethodDescriptorSupplier@2a53d3aa}}}}]]
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:506) ~[spark-dpp-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.Coordinator.waitRpc(Coordinator.java:716) ~[doris-fe.jar:1.0-SNAPSHOT]
    ... 13 more
    2026-01-04 23:44:15,017 WARN (mysql-nio-pool-193363|1797346) [StmtExecutor.execute():591] execute Exception. stmt[429323510, c363048331a44cfb-b3b40b967e6e7e65]
    org.apache.doris.rpc.RpcException: timeout when waiting for send fragments RPC. Wait(sec): 5, host: 192.168.130.8
    at org.apache.doris.qe.Coordinator.waitRpc(Coordinator.java:749) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.Coordinator.sendFragment(Coordinator.java:677) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.Coordinator.exec(Coordinator.java:552) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.sendResult(StmtExecutor.java:1140) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1120) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:520) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:407) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:322) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:463) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:690) ~[doris-fe.jar:1.0-SNAPSHOT]
    at org.apache.doris.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_92]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_92]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_92]
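
From the stack trace, the 5-second wait happens in Coordinator.waitRpc on the FE. As a stopgap I am considering raising that timeout; my assumption (please correct me if wrong) is that it is controlled by the FE config remote_fragment_exec_timeout_ms (default 5000 ms), which can be changed at runtime:

```sql
-- Assumption: remote_fragment_exec_timeout_ms backs the 5000 ms wait in
-- Coordinator.waitRpc. Raising it to 10 s is only a mitigation, not a fix;
-- a runtime change does not survive an FE restart, so it would also need
-- to go into fe.conf if it turns out to help.
ADMIN SET FRONTEND CONFIG ("remote_fragment_exec_timeout_ms" = "10000");
```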

Additional Context
Frequency: This issue is intermittent, occurring more than 10 times per day.

Performance: Most of the time, executing the exact same SQL on the FE returns results very quickly. This suggests that the plan fragment distribution is not consistently slow, but rather fails due to sporadic RPC timeouts.

Impact: It causes occasional, unpredictable query failures. I am looking for help identifying whether this is caused by gRPC connection pooling issues or by transient BE thread exhaustion in version 1.2.1.
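
In case the cause is BE-side thread exhaustion, these are the be.conf settings I am guessing are relevant (names taken from my reading of the BE config docs; please correct me if they do not apply to 1.2.1):

```
# be.conf -- assumed-relevant settings; values are examples, not recommendations
brpc_num_threads = 32                 # brpc worker threads; default -1 (= number of cores)
fragment_pool_thread_num_max = 2048   # upper bound on fragment execution threads
```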

Any guidance on troubleshooting or relevant parameters to tune would be greatly appreciated. Thank you!
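
Before tuning anything, I would like to confirm whether the timeouts cluster on a single BE. This is the quick check I am running against fe.log (log path assumed; adjust for your deployment):

```shell
# Count "timeout when waiting for send fragments RPC" occurrences per BE host
# in fe.log, to see whether one backend dominates the failures.
grep -o 'timeout when waiting for send fragments RPC.*host: [0-9.]*' fe.log \
  | awk -F'host: ' '{print $2}' | sort | uniq -c | sort -rn
```

If one host dominates, the problem is likely on that BE (GC pauses, thread exhaustion, disk stalls); if the failures are spread evenly, an FE-side or network cause seems more plausible.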
