Skip to content

RATIS-2235. Allow only one thread to perform appendLog #1206

Merged
szetszwo merged 3 commits intoapache:masterfrom
SzyWilliam:jira2235
Jan 6, 2025
Merged

RATIS-2235. Allow only one thread to perform appendLog #1206
szetszwo merged 3 commits intoapache:masterfrom
SzyWilliam:jira2235

Conversation

@SzyWilliam
Copy link
Copy Markdown
Member

@SzyWilliam SzyWilliam marked this pull request as draft January 2, 2025 15:57
Copy link
Copy Markdown
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SzyWilliam , thanks for working this! Please see a comment inlined.

Comment on lines +1648 to +1652
final CompletableFuture<Void> appendLog = appendLogFuture.updateAndGet(f -> f.thenCompose(ignored -> {
final List<CompletableFuture<Long>> futures = entries.isEmpty() ? Collections.emptyList()
: state.getLog().append(requestRef.delegate(entries));
return JavaUtils.allOf(futures);
}));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should retain the ref first, i.e.

@@ -1641,9 +1645,8 @@ class RaftServerImpl implements RaftServer.Division,
       state.updateConfiguration(entries);
     }
     future.join();
-
-    final List<CompletableFuture<Long>> futures = entries.isEmpty() ? Collections.emptyList()
-        : state.getLog().append(requestRef.delegate(entries));
+    final CompletableFuture<Void> appendLog = entries.isEmpty()? appendLogFuture.get()
+        : appendLog(requestRef.delegate(entries));
     proto.getCommitInfosList().forEach(commitInfoCache::update);
 
     CodeInjectionForTesting.execute(LOG_SYNC, getId(), null);
@@ -1657,7 +1660,7 @@ class RaftServerImpl implements RaftServer.Division,
 
     final long commitIndex = effectiveCommitIndex(proto.getLeaderCommit(), previous, entries.size());
     final long matchIndex = isHeartbeat? RaftLog.INVALID_LOG_INDEX: entries.get(entries.size() - 1).getIndex();
-    return JavaUtils.allOf(futures).whenCompleteAsync((r, t) -> {
+    return appendLog.whenCompleteAsync((r, t) -> {
       followerState.ifPresent(fs -> fs.updateLastRpcTime(FollowerState.UpdateType.APPEND_COMPLETE));
       timer.stop();
     }, getServerExecutor()).thenApply(v -> {
@@ -1675,6 +1678,13 @@ class RaftServerImpl implements RaftServer.Division,
     });
   }
 
+  private CompletableFuture<Void> appendLog(ReferenceCountedObject<List<LogEntryProto>> entriesRef) {
+    entriesRef.retain();
+    return appendLogFuture.updateAndGet(f -> f.thenCompose(
+            ignored -> JavaUtils.allOf(state.getLog().append(entriesRef))))
+        .whenComplete((v, e) -> entriesRef.release());
+  }
+
   private long checkInconsistentAppendEntries(TermIndex previous, List<LogEntryProto> entries) {
     // Check if a snapshot installation through state machine is in progress.
     final long installSnapshot = snapshotInstallationHandler.getInProgressInstallSnapshotIndex();

@SzyWilliam SzyWilliam marked this pull request as ready for review January 3, 2025 14:54
@SzyWilliam
Copy link
Copy Markdown
Member Author

@szetszwo Thanks for reviewing this pr and help with the code. Now I learned how to use zero-copy in async programming, thank you!

@szetszwo
Copy link
Copy Markdown
Contributor

szetszwo commented Jan 3, 2025

The test failures seem related. Not yet sure what is the problem.

state.updateConfiguration(entries);
}
future.join();
final CompletableFuture<Void> appendLog = entries.isEmpty()? appendLogFuture.get()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SzyWilliam , I see the problem now -- when it is empty, it should not wait for the previous append. Otherwise, it blocks the heartbeats.

    final CompletableFuture<Void> appendLog = entries.isEmpty()? CompletableFuture.completedFuture(null)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a tricky one. Thanks for spotting and troubleshooting this!

@SzyWilliam
Copy link
Copy Markdown
Member Author

@szetszwo the tests are all passed now, PTAL ~

Copy link
Copy Markdown
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@szetszwo szetszwo merged commit 9b74401 into apache:master Jan 6, 2025
@SzyWilliam
Copy link
Copy Markdown
Member Author

@szetszwo Thanks for reviewing and merging this PR!

SzyWilliam added a commit to SzyWilliam/ratis that referenced this pull request Jan 9, 2025
szetszwo pushed a commit to szetszwo/ratis that referenced this pull request May 23, 2025
szetszwo pushed a commit to szetszwo/ratis that referenced this pull request May 23, 2025
slfan1989 pushed a commit to slfan1989/ratis that referenced this pull request May 6, 2026
slfan1989 pushed a commit to slfan1989/ratis that referenced this pull request May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants