Skip to content

Comments

RATIS-2413. Support different RetryPolicy implementations.#1354

Merged
szetszwo merged 4 commits intoapache:masterfrom
slfan1989:RATIS-2413
Feb 20, 2026
Merged

RATIS-2413. Support different RetryPolicy implementations.#1354
szetszwo merged 4 commits intoapache:masterfrom
slfan1989:RATIS-2413

Conversation

@slfan1989
Copy link
Contributor

What changes were proposed in this pull request?

This PR implements support for different RetryPolicy implementations with a flexible comma-separated format:

New Format:

<classname>,<retry-policy-parameters>

Key Changes:

  • Enhanced RetryPolicy.parse() method to support explicit class name specification:

    • ExponentialBackoffRetry,100ms,5s,100 - Exponential backoff retry
    • MultipleLinearRandomRetry,1ms,10,1s,20,5s,1000 - Multiple linear random retry
  • Backward Compatibility - Legacy format (without class name) is still supported:

    • 1ms,10,1s,20,5s,1000 defaults to MultipleLinearRandomRetry
    • Implemented smart detection via isLegacyMultipleLinearRandomRetryParams() to distinguish between legacy duration-based format and unknown class names
  • Improved Error Handling - Fails fast with clear error messages when:

    • Unknown class name is provided (e.g., UnknownRetry,1ms,10)
    • Required parameters are missing

What is the link to the Apache JIRA

RATIS-2413. Support different RetryPolicy implementations.

How was this patch tested?

Add Unit Tests.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slfan1989 , thanks for working on this!

  • Let's add another parse(..) method to print out the conf key.
  • Please update GrpcLogAppender and NettyClientStreamRpc.

See the comments inlined and also https://issues.apache.org/jira/secure/attachment/13080872/1354_review.patch


static boolean isLegacyMultipleLinearRandomRetryParams(String firstElement) {
// The legacy format starts with a duration token, not a class name.
final String trimmed = firstElement.trim().replace("_", "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not needed since TimeDuration.valueOf(..) will take care of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with your point. The code has been updated accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is still here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out this issue. I have revised and improved the relevant code accordingly.

Comment on lines 101 to 102
throw new IllegalArgumentException("Failed to parse MultipleLinearRandomRetry: args.length = "
+ args.length + " <= 1 for " + commaSeparated);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say the that "the parameter list is empty", i.e.

"Failed to parse MultipleLinearRandomRetry: the parameter list is empty for " + commaSeparated

Comment on lines 520 to 524
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also update this paragraph?

For `MultipleLinearRandomRetry`, the parameter "1ms,10, 1s,20, 5s,1000" means
that the wait time is $1$ms on average for the first 10 retries.
Then, it becomes $1$s on average for next 20 retries
and $5$s on average for the last 1000 retires.

For `ExponentialBackoffRetry`, the parameter "100ms,5s,100" means
that the base wait time is $100$ms, the maximum wait time is $5$s
and the number of attempts is 100.
The wait time is $\max(2^{(n-1)}\times 100\text{ms}, 5\text{s})$ on average for the $n$-th retry.
In other words,
the wait time is on average $100$ms, $200$ms, $400$ms, $800$ms, $1.6$s, $3.2$s, $5$s, $5$s and so on.

Note that the actual wait time is randomized by a multiplier in the range $[0.5, 1.5)$ for all retry policies.

@slfan1989
Copy link
Contributor Author

@slfan1989 , thanks for working on this!

  • Let's add another parse(..) method to print out the conf key.
  • Please update GrpcLogAppender and NettyClientStreamRpc.

See the comments inlined and also https://issues.apache.org/jira/secure/attachment/13080872/1354_review.patch

@szetszwo Thank you very much for taking the time to review code! I will carefully consider your suggestions and improve the relevant parts accordingly.

@slfan1989
Copy link
Contributor Author

@szetszwo Thanks for the review! I’ve addressed your comments and updated the code accordingly. Thanks again for your time.

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slfan1989 , thanks for the update! Please see the comments inlined.


static boolean isLegacyMultipleLinearRandomRetryParams(String firstElement) {
// The legacy format starts with a duration token, not a class name.
final String trimmed = firstElement.trim().replace("_", "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is still here.

For `ExponentialBackoffRetry`, the parameter "100ms,5s,100" means
that the base wait time is 100ms, the maximum wait time is 5s
and the number of attempts is 100.
The wait time is min(2^(n-1) * 100ms, 5s) on average for the n-th retry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch that it is min but not max.

BTW, github supports LaTeX.

Since using plain md also looks good in this case, I am fine if we do not use LaTeX here. 😀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the note! Since GitHub supports LaTeX, I agree it reads more clearly here. I’ve updated the markdown to use LaTeX for the min notation.

image

and the number of attempts is 100.
The wait time is min(2^(n-1) * 100ms, 5s) on average for the n-th retry.
In other words,
the wait time is on average 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s, 5s, 5s and so on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my mistake: it should be "the wait times are ..."

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@szetszwo szetszwo merged commit 493ce43 into apache:master Feb 20, 2026
16 checks passed
@slfan1989
Copy link
Contributor Author

+1 the change looks good.

@szetszwo Thank you for reviewing the code and for the valuable feedback — really appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants