refactor(submitter): concurrent submitter #3287

Open
julienrbrt wants to merge 12 commits into main from julien/speedup-submitter

Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Apr 24, 2026

Overview

Attempts to improve the submitter by sending concurrently: instead of waiting for the DA answer before moving on, we fire the submission and continue.
Useful when blob throughput needs to be high.

Related to #3244: Fiber takes time to return, so asynchronous submission is necessary there. This PR mainly investigates whether that improvement can be generalized to mainline evnode.

Summary by CodeRabbit

  • New Features

    • Submissions are now performed asynchronously with lifecycle hooks and a graceful shutdown to avoid blocking callers.
  • Refactor

    • Improved retry/backoff and batch-size handling to better surface and handle oversized items.
    • Pending-item management updated for more accurate claim/reset behavior and robust re‑try of failed ranges.
  • Bug Fixes

    • More reliable error handling to prevent permanently lost pending items after failures.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

Extends the cache manager with in-flight range reset methods and refactors pending tracking to use claims/gaps; converts DA submission to asynchronous, callback-driven submission with centralized retry/backoff and lifecycle Close().

Changes

Cohort / File(s) Summary
Cache Manager & Public Resetters
block/internal/cache/manager.go, block/internal/cache/pending_headers.go, block/internal/cache/pending_data.go
Added ResetInFlightHeaderRange(start, end) and ResetInFlightDataRange(start, end) methods to the PendingManager API and respective concrete types.
Pending Base (claims & gaps)
block/internal/cache/pending_base.go, block/internal/cache/pending_base_test.go
Refactored pending selection to track contiguous in-flight claims and gaps; getPending now allocates bounded unclaimed ranges, registers claims, and resetInFlightRange converts removed claims back into retryable gaps; helpers for range ops added.
Cache Tests & Small test updates
block/internal/cache/manager_test.go, block/internal/cache/pending_headers_test.go, block/internal/cache/pending_data_test.go
Updated tests to exercise in-flight reset behavior and reposition assertions to validate counts immediately after resets.
DA Submitter Async, Retry & Lifecycle
block/internal/submitting/da_submitter.go, block/internal/submitting/submitter.go, block/internal/submitting/da_submitter_tracing.go, block/internal/submitting/da_submitter_tracing_test.go
Converted SubmitHeaders/SubmitData signatures to accept onSubmitSuccess/onSubmitError callbacks and made submission asynchronous with submitWithRetry, backoff, size-limited batching, and Close() to wait for in-flight goroutines; tracing wrapper updated accordingly.
Submitter & DA submitter tests
block/internal/submitting/da_submitter_test.go, block/internal/submitting/da_submitter_integration_test.go, block/internal/submitting/da_submitter_mocks_test.go, block/internal/submitting/submitter_test.go
Reworked tests and mocks to the callback-based API; added tests for batch-level retry edge cases, adjusted helpers to use Close() and verify callback-driven outcomes.

Sequence Diagrams

sequenceDiagram
    participant Submitter as Submitter
    participant Cache as Cache (pending manager)
    participant DASubmitter as DA Submitter
    participant DA as DA Layer

    Submitter->>Cache: GetPendingHeaders()
    Cache->>Cache: select contiguous unclaimed range\nregister in-flight claim
    Cache-->>Submitter: pending items & range

    Submitter->>DASubmitter: SubmitHeaders(batch, onSuccess, onError)
    DASubmitter-->>Submitter: return nil (async)

    DASubmitter->>DASubmitter: spawn goroutine -> submitWithRetry
    DASubmitter->>DA: submitToDA(batch)

    alt Success
        DA-->>DASubmitter: success
        DASubmitter->>Cache: update post-submit state
        DASubmitter->>Submitter: call onSuccess()
    else Retryable failure
        DA-->>DASubmitter: error
        DASubmitter->>DASubmitter: backoff & retry
    else Non-retryable / terminal
        DA-->>DASubmitter: error
        DASubmitter->>Submitter: call onError(error)
        Submitter->>Cache: ResetInFlightHeaderRange(start,end)
    end
sequenceDiagram
    participant SubmitLoop as Submit loop
    participant DASubmitter as DA Submitter
    participant Cache as pendingBase

    SubmitLoop->>Cache: GetPendingData()
    Cache->>Cache: choose first unclaimed contiguous range\nregister claim, remove overlapping gaps
    Cache-->>SubmitLoop: pending items + range

    SubmitLoop->>DASubmitter: SubmitData(batch, onSuccess, onError)
    DASubmitter-->>SubmitLoop: nil

    DASubmitter->>DASubmitter: submitWithRetry -> onSuccess/onError callbacks
    alt onError called
        SubmitLoop->>Cache: ResetInFlightDataRange(start,end)
        Cache->>Cache: remove claim, add failing portion as gap
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

T:testing

Suggested reviewers

  • chatton
  • tac0turtle

Poem

🐰 I hop through claims and gaps with glee,
Async submits now set me free.
Callbacks sing when batches fly,
Reset the ranges, give retries a try! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.83%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title 'refactor(submitter): concurrent submitter' is clear and directly related to the main change: making the submitter handle concurrent sends instead of waiting synchronously.
Description check | ✅ Passed | The PR description provides a clear overview explaining the concurrent-send refactoring, its motivation (high blob throughput), and context (related to Fiber latency improvements).
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@github-actions
Contributor

github-actions Bot commented Apr 24, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Apr 29, 2026, 11:37 AM

@claude
Contributor

claude Bot commented Apr 24, 2026

Claude finished @julienrbrt's task in 3m 19s


Code Review: refactor(submitter): concurrent submitter

  • Read and analyze changed files
  • Review cache layer changes (pending_base.go, manager.go, pending_headers.go, pending_data.go)
  • Review submitter changes (da_submitter.go, submitter.go, da_submitter_tracing.go)
  • Review test coverage and quality
  • Post comprehensive review

Overview

The PR converts DA submission from synchronous to asynchronous and adds in-flight range tracking so concurrent claims don't re-submit the same heights. The architecture is sound, but there are several correctness bugs ranging from critical to minor that should be addressed before merging.


Critical Issues

1. Partial Success Callback Uses Wrong Index — Data Loss Bug

block/internal/submitting/da_submitter.go, SubmitHeaders / SubmitData

This is the most serious new bug I found. On a partial success (SubmittedCount < total), submitWithRetry slices marshaled forward but the onSuccess callback still indexes into the original headers slice without an offset:

// In SubmitHeaders, the closure captures the full headers slice:
s.wg.Go(func() {
    s.submitWithRetry(ctx, envelopes, namespace, func(submittedCount int, daHeight uint64) {
        postSubmit(headers[:submittedCount], ...)  // ← always from position 0!
    }, ...)
})

// In submitWithRetry:
case datypes.StatusSuccess:
    onSuccess(submitted, res.Height)
    marshaled = marshaled[submitted:]  // advance marshaled window
    // …but headers[:submittedCount] still starts at index 0!

Scenario: 5 headers [h1..h5], first round submits 2 (SubmittedCount=2):

  • onSuccess(2, …) → postSubmit(headers[:2]) → marks h1, h2 DA-included ✓
  • marshaled advances to [e3, e4, e5]

Second round submits 3 (SubmittedCount=3):

  • onSuccess(3, …) → postSubmit(headers[:3]) → marks h1, h2, h3 ✗
  • h4 and h5 are never marked DA-included, and SetLastSubmittedHeaderHeight is called with 3 instead of 5

The fix is to track an offset in the closure.
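A minimal sketch of the offset fix under those assumptions (`submitAll` and the `postSubmit` stand-in are illustrative, not the PR's code):

```go
package main

import "fmt"

// submitAll walks the batch in rounds, advancing an offset so each success
// callback sees only the newly submitted items, never items from round one.
func submitAll(items []string, rounds []int, postSubmit func(batch []string)) {
	offset := 0
	for _, submitted := range rounds {
		// was: postSubmit(items[:submitted]) — always re-marks from index 0
		postSubmit(items[offset : offset+submitted])
		offset += submitted
	}
}

func main() {
	var marked []string
	// 5 headers, DA accepts 2 then 3 (two partial successes)
	submitAll([]string{"h1", "h2", "h3", "h4", "h5"}, []int{2, 3},
		func(batch []string) { marked = append(marked, batch...) })
	fmt.Println(marked) // every header marked exactly once
}
```

With the offset in place, the second round marks h3-h5 instead of re-marking h1-h3.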

2. SubmittedCount Not Validated — Infinite Loop or Panic

block/internal/submitting/da_submitter.go:384

When res.SubmittedCount == 0, marshaled = marshaled[0:] is a no-op and the loop spins until MaxAttempts. When res.SubmittedCount > len(marshaled), marshaled[submitted:] panics. The DA client response is external and must be validated:

case datypes.StatusSuccess:
    submitted := int(res.SubmittedCount)
    // Missing: validate 0 < submitted <= len(marshaled)
    if onSuccess != nil {
        onSuccess(submitted, res.Height)
    }

(CodeRabbit also flagged this — confirming it's a real issue.)
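A hedged sketch of the missing validation (treating an out-of-range count as a terminal error is one reasonable policy here; the real code might prefer to retry instead):

```go
package main

import (
	"errors"
	"fmt"
)

// validateSubmitted guards the external DA response: a count of zero would
// spin the retry loop until MaxAttempts, and a count past the remaining
// window would panic on marshaled[submitted:].
func validateSubmitted(submitted, remaining int) error {
	if submitted <= 0 {
		return errors.New("DA reported success but submitted nothing")
	}
	if submitted > remaining {
		return fmt.Errorf("DA submitted %d items but only %d were sent", submitted, remaining)
	}
	return nil
}

func main() {
	fmt.Println(validateSubmitted(2, 5)) // valid partial success
	fmt.Println(validateSubmitted(0, 5)) // rejected: would loop forever
	fmt.Println(validateSubmitted(7, 5)) // rejected: would panic on slicing
}
```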


Major Issues

3. lastHeight Read Outside inFlightMu — Duplicate Height Race

block/internal/cache/pending_base.go:76-124

getPending reads pb.lastHeight at line 80 before acquiring inFlightMu, while setLastSubmittedHeight updates lastHeight and trims claims inside inFlightMu. Under concurrent calls:

  1. getPending snapshots lastHeight = N
  2. setLastSubmittedHeight(M) runs (M > N): increments lastHeight to M and trims claims for [1..M]
  3. getPending acquires inFlightMu and calls findAvailableRange(gaps, claims, N, storeHeight)
  4. With stale lastHeight=N and trimmed claims, heights [N+1..M] (already submitted!) appear unclaimed → returned again

Fix: move the lastHeight.Load() inside the same inFlightMu critical section, and acquire inFlightMu in setLastSubmittedHeight while updating lastHeight.
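The fix can be sketched as a single critical section that snapshots both pieces of state together (the field names and the plain-slice claim representation are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type pending struct {
	mu         sync.Mutex
	lastHeight uint64
	claims     []uint64 // in-flight heights, kept sorted
}

// snapshot reads lastHeight and the claims atomically, so a concurrent
// ack cannot trim claims while a stale lastHeight is still in hand.
func (p *pending) snapshot() (uint64, []uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.lastHeight, append([]uint64(nil), p.claims...)
}

// ack advances lastHeight and trims acknowledged claims under the same lock.
func (p *pending) ack(h uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.lastHeight = h
	trimmed := p.claims[:0]
	for _, c := range p.claims {
		if c > h {
			trimmed = append(trimmed, c)
		}
	}
	p.claims = trimmed
}

func main() {
	p := &pending{lastHeight: 3, claims: []uint64{4, 5, 6}}
	p.ack(5)
	last, claims := p.snapshot()
	fmt.Println(last, claims) // 5 [6]
}
```

Because both reads happen under one lock, no interleaving can pair a stale lastHeight with already-trimmed claims.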

4. Tracing Span Closes Before Async Work Completes

block/internal/submitting/da_submitter_tracing.go:33-61

defer span.End() fires when inner.SubmitHeaders returns — immediately after enqueuing the goroutine. The span never records retry latency, final success, or async errors. Wrap the callbacks to propagate the span lifetime to the actual completion path.

5. onError(nil) on Context Cancellation — Misleading Log + Cache Reset

block/internal/submitting/da_submitter.go:355-360

if err := waitForBackoffOrContext(ctx, rs.Backoff); err != nil {
    if onError != nil {
        onError(nil)  // nil error reaches submitter's onError
    }
    return
}

In submitter.go, the onError handler resets the in-flight range and then logs "failed to submit headers" with nil err — which prints confusingly. Distinguish cancellation from failure, e.g. pass ctx.Err() instead of nil.


Minor Issues

6. Dead / Broken Method: ResetInFlightHeaderHeight()

block/internal/cache/manager.go:315-317

func (m *implementation) ResetInFlightHeaderHeight() {
    m.pendingHeaders.ResetInFlightHeaderRange(0, 0)
}

This method is not in the PendingManager interface and passing (0, 0) to resetInFlightRange has no effect since all heights are ≥ 1. This appears to be dead code left over from an earlier API and should be removed.

7. Missing Doc Comments on Exported Methods

block/internal/cache/pending_headers.go:83 and pending_data.go:88

ResetInFlightHeaderRange and ResetInFlightDataRange are new exported methods with no doc comments, violating the project's Go documentation guideline. Add one-line descriptions.

8. fakeDASubmitter Drops Callbacks — Test Coverage Gap

block/internal/submitting/submitter_test.go:424-440

The fake submitter ignores onSubmitSuccess and onSubmitError, meaning the submission loop test only verifies that SubmitHeaders/SubmitData were called, not that the success/error lifecycle correctly updates timestamps and in-flight cache state.

9. Missing defer submitter.Close() in Tests

block/internal/submitting/da_submitter_test.go:216-218, da_submitter_integration_test.go:110

Close() is called after assertions, so an assertion failure leaks the async worker goroutine into subsequent tests. Use defer submitter.Close() (or t.Cleanup) immediately after construction.


Positive Observations

  • The pendingBase claim/gap bookkeeping is well-designed. The sorted-slice helpers (insertClaim, removeGapRange, trimClaimsBelow) are clean and readable.
  • The new TestPendingBase_* test suite thoroughly exercises the in-flight and gap semantics — this is quality work.
  • The backoff/retry logic is correct and the limitBatchBySizeBytes size-cap is a good safety net.
  • The Stop() 5-second timeout cleanly avoids a hanging shutdown.
  • Partial handling in submitWithRetry via batch-halving on StatusTooBig is well-structured.
  • makeHeaderPostSubmit and makeDataPostSubmit correctly centralise the post-submission cache update logic.

Summary

The architecture of the concurrent submitter is solid, but issues #1 (partial success index bug) and #2 (SubmittedCount validation) are correctness bugs that can cause heights to never be marked DA-included or cause panics from DA client responses. Issue #3 (the lastHeight race) can cause duplicate submissions. These three should be fixed before merging. The tracing issue (#4) and dead code (#6) are lower priority but worth cleaning up in this PR since they touch the same files.

@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 70.16575% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.52%. Comparing base (49ef5c9) to head (3b5682c).
⚠️ Report is 5 commits behind head on main.

Files with missing lines | Patch % | Lines
block/internal/submitting/da_submitter.go | 73.49% | 29 Missing and 15 partials ⚠️
block/internal/submitting/submitter.go | 11.11% | 2 Missing and 6 partials ⚠️
block/internal/submitting/da_submitter_tracing.go | 66.66% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3287      +/-   ##
==========================================
+ Coverage   62.45%   62.52%   +0.06%     
==========================================
  Files         122      122              
  Lines       13047    13078      +31     
==========================================
+ Hits         8149     8177      +28     
+ Misses       4012     4011       -1     
- Partials      886      890       +4     
Flag | Coverage | Δ
combined | 62.52% <70.16%> | +0.06% ⬆️

Flags with carried forward coverage won't be shown.


@julienrbrt julienrbrt marked this pull request as ready for review April 29, 2026 08:05
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
block/internal/submitting/da_submitter_tracing.go (1)

33-61: ⚠️ Potential issue | 🟠 Major

The span now ends before the async submission does.

inner.SubmitHeaders/SubmitData now return right after scheduling background work, so defer span.End() closes the span before retries/callbacks run. Async failures routed through onSubmitError never reach the span, and the recorded latency is only the enqueue time. Wrap the callbacks and end/annotate the span from the actual completion path instead.

Also applies to: 64-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/da_submitter_tracing.go` around lines 33 - 61, The
current tracedDASubmitter starts a span in SubmitHeaders (and similarly in
SubmitData) but defers span.End(), which closes the span before async work
completes; instead remove the immediate defer span.End() and wrap the
onSubmitSuccess and onSubmitError callbacks with closures that record
errors/status on the span and call span.End() when the async completion path
runs; i.e., in tracedDASubmitter.SubmitHeaders and SubmitData, create
wrappedOnSuccess := func(){ span.SetStatus(codes.Ok, ""); onSubmitSuccess();
span.End() } and wrappedOnError := func(err error){ span.RecordError(err);
span.SetStatus(codes.Error, err.Error()); onSubmitError(err); span.End() } (or
equivalent wrappers matching original signatures) and pass those to
t.inner.SubmitHeaders/SubmitData so the span lifetime and annotations reflect
actual completion.
block/internal/cache/pending_base.go (1)

80-96: ⚠️ Potential issue | 🟠 Major

Take lastHeight and the in-flight ranges under one synchronization boundary.

getPending() reads lastHeight before cloning inFlightClaims/gaps, while setLastSubmittedHeight() updates lastHeight and trims those slices independently. With the new concurrent submitter, an interleaving where lastHeight is stale but the claims have already been trimmed can make findAvailableRange() hand out heights that were just acknowledged, causing duplicate DA submissions.

As per coding guidelines "Be careful with concurrent access to shared state".

Also applies to: 175-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/cache/pending_base.go` around lines 80 - 96, getPending()
currently reads pb.lastHeight (pb.lastHeight.Load()) outside the pb.inFlightMu
critical section and then clones pb.inFlightClaims and pb.gaps, which allows an
interleaving with setLastSubmittedHeight() that trims those slices and updates
lastHeight causing findAvailableRange() to return already-acknowledged heights;
fix by moving the read of pb.lastHeight inside the same
pb.inFlightMu.Lock()/Unlock() block where you clone inFlightClaims and gaps so
lastHeight and the in-flight ranges are read atomically, and apply the same
locking discipline to setLastSubmittedHeight() (acquire pb.inFlightMu while
trimming inFlightClaims/gaps and updating pb.lastHeight) to prevent races when
findAvailableRange, getPending, and setLastSubmittedHeight interact.
🧹 Nitpick comments (3)
block/internal/cache/pending_data.go (1)

88-90: Document the new exported reset helper.

ResetInFlightDataRange is public, but it has no doc comment. That violates the repo's Go guideline for exported identifiers and will likely fail linting.

💡 Suggested fix
+// ResetInFlightDataRange clears the in-flight data claim state for heights in [start, end].
 func (pd *PendingData) ResetInFlightDataRange(start, end uint64) {

As per coding guidelines, Document exported types and functions in Go code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/cache/pending_data.go` around lines 88 - 90, Add a Go doc
comment for the exported method ResetInFlightDataRange on type PendingData:
describe what the method does, its parameters (start, end uint64) and any
important behavior or side-effects (it delegates to pd.base.resetInFlightRange
to reset the in-flight data range). Place the comment immediately above the func
declaration for ResetInFlightDataRange so it satisfies Go exported identifier
documentation guidelines.
block/internal/cache/pending_headers.go (1)

83-85: Document the new exported reset helper.

ResetInFlightHeaderRange is public, but it has no doc comment. That violates the repo's Go guideline for exported identifiers and will likely fail linting.

💡 Suggested fix
+// ResetInFlightHeaderRange clears the in-flight header claim state for heights in [start, end].
 func (ph *PendingHeaders) ResetInFlightHeaderRange(start, end uint64) {

As per coding guidelines, Document exported types and functions in Go code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/cache/pending_headers.go` around lines 83 - 85, Add a Go doc
comment for the exported method PendingHeaders.ResetInFlightHeaderRange
describing its purpose and behavior: explain that it resets the in-flight header
range tracked by the PendingHeaders instance between the inclusive start and end
uint64 indices, mention any side effects (it delegates to
ph.base.resetInFlightRange), and document the parameters (start, end) and any
expectations (e.g., inclusive bounds or preconditions). Place the comment
immediately above the ResetInFlightHeaderRange method.
block/internal/submitting/submitter_test.go (1)

424-440: Exercise the new callbacks in the fake submitter.

This test double now accepts onSubmitSuccess/onSubmitError, but it still discards them. That means the loop test only verifies enqueueing, not the success/error lifecycle that now updates timestamps and resets in-flight cache state.

Consider invoking the callbacks when non-nil or adding a focused test that covers that contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/submitter_test.go` around lines 424 - 440, The
fakeDASubmitter currently drops the provided callbacks in SubmitHeaders and
SubmitData; update these methods (SubmitHeaders and SubmitData on
fakeDASubmitter) to call the supplied on-success and on-error callbacks when
they are non-nil so the test exercises the full success/error lifecycle (e.g.,
invoke the success callback when you want the fake to simulate success, or
invoke the error callback with a test error to simulate failure), while
preserving the existing signaling to chHdr/chData; alternatively add a focused
test that uses a fake submitter which invokes those callbacks to assert
timestamps and in-flight cache resets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@block/internal/submitting/da_submitter_integration_test.go`:
- Around line 101-110: The test currently calls daSubmitter.Close() only at the
end, risking resource leaks if earlier assertions fail; after creating the async
submitter (the daSubmitter variable), ensure cleanup is guaranteed by
registering a deferred close or t.Cleanup call—e.g., immediately after
daSubmitter is constructed call defer daSubmitter.Close() or t.Cleanup(func(){
daSubmitter.Close() }) so the Close() method on daSubmitter always runs even if
the test fails early.

In `@block/internal/submitting/da_submitter_test.go`:
- Around line 216-218: The test currently calls submitter.Close() after
assertions which can leak the submitter's async worker if an assertion fails;
change the teardown to run immediately after setup by invoking defer
submitter.Close() (or t.Cleanup(func(){ submitter.Close() })) right after the
submitter is created so Close() always runs even on test failures — update the
tests that call submitter.SubmitHeaders(...) and later submitter.Close() (e.g.,
the cases around SubmitHeaders and the other similar test) to use deferred
cleanup instead.

In `@block/internal/submitting/da_submitter.go`:
- Around line 388-398: The datalayer success branch uses res.SubmittedCount
directly which can be 0 or >len(marshaled) and cause infinite loops or panics;
in the datypes.StatusSuccess case (around res.SubmittedCount handling) validate
that submitted := int(res.SubmittedCount) is >0 and <= len(marshaled) before
calling onSuccess or advancing the window (marshaled = marshaled[submitted:]);
if submitted==0 treat as a reject/error (update rs.Attempt or return/log and do
not spin) and if submitted>len(marshaled) treat as malformed input (log/error
and reject) so only a validated count is passed to onSuccess and used to slice
marshaled.

In `@block/internal/submitting/submitter.go`:
- Around line 236-250: The code enqueues a batch as in-flight via
GetPendingHeaders/GetPendingData but if s.daSubmitter.SubmitHeaders or
SubmitData returns an immediate error the in-flight claim is never released;
update the error path in submitter.go around s.daSubmitter.SubmitHeaders and the
analogous SubmitData call so that before logging or returning on synchronous
error you call s.cache.ResetInFlightHeaderRange(headers[0].Height(),
headers[len(headers)-1].Height()) (and for data use the corresponding
ResetInFlightDataRange with the first/last data heights), then proceed to
log/handle the error (including the existing ErrOversizedItem handling) so the
claimed heights are retried.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 996c9001-1856-439f-b61e-aa2ea4669b4e

📥 Commits

Reviewing files that changed from the base of the PR and between 49ef5c9 and 31093e6.

📒 Files selected for processing (16)
  • block/internal/cache/manager.go
  • block/internal/cache/manager_test.go
  • block/internal/cache/pending_base.go
  • block/internal/cache/pending_base_test.go
  • block/internal/cache/pending_data.go
  • block/internal/cache/pending_data_test.go
  • block/internal/cache/pending_headers.go
  • block/internal/cache/pending_headers_test.go
  • block/internal/submitting/da_submitter.go
  • block/internal/submitting/da_submitter_integration_test.go
  • block/internal/submitting/da_submitter_mocks_test.go
  • block/internal/submitting/da_submitter_test.go
  • block/internal/submitting/da_submitter_tracing.go
  • block/internal/submitting/da_submitter_tracing_test.go
  • block/internal/submitting/submitter.go
  • block/internal/submitting/submitter_test.go

Comment on lines 101 to 110

  // Submit headers and data - cache returns both items and marshalled bytes
  headers, marshalledHeaders, err := cm.GetPendingHeaders(context.Background())
  require.NoError(t, err)
- require.NoError(t, daSubmitter.SubmitHeaders(context.Background(), headers, marshalledHeaders, cm, n))
+ require.NoError(t, daSubmitter.SubmitHeaders(context.Background(), headers, marshalledHeaders, cm, n, nil, nil))

  dataList, marshalledData, err := cm.GetPendingData(context.Background())
  require.NoError(t, err)
- require.NoError(t, daSubmitter.SubmitData(context.Background(), dataList, marshalledData, cm, n, gen))
+ require.NoError(t, daSubmitter.SubmitData(context.Background(), dataList, marshalledData, cm, n, gen, nil, nil))

  daSubmitter.Close()

⚠️ Potential issue | 🟡 Minor

Defer Close() so cleanup always runs.

Calling Close() only at the end of the successful path can leak the async submitter if an earlier assertion fails. Move it into a defer or t.Cleanup immediately after construction.

💡 Suggested fix
 daSubmitter := NewDASubmitter(client, cfg, gen, common.DefaultBlockOptions(), common.NopMetrics(), zerolog.Nop(), noopDAHintAppender{}, noopDAHintAppender{})
+defer daSubmitter.Close()
@@
- daSubmitter.Close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/da_submitter_integration_test.go` around lines 101
- 110, The test currently calls daSubmitter.Close() only at the end, risking
resource leaks if earlier assertions fail; after creating the async submitter
(the daSubmitter variable), ensure cleanup is guaranteed by registering a
deferred close or t.Cleanup call—e.g., immediately after daSubmitter is
constructed call defer daSubmitter.Close() or t.Cleanup(func(){
daSubmitter.Close() }) so the Close() method on daSubmitter always runs even if
the test fails early.
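The deferred-cleanup pattern the bot asks for can be shown in isolation. A minimal sketch, assuming a hypothetical `asyncSubmitter` stand-in for the PR's DASubmitter (not code from the PR): registering `Close` with `defer` right after construction guarantees the background worker is released on every exit path, which is exactly what `t.Cleanup` gives you in a test.

```go
package main

import "fmt"

// asyncSubmitter is a hypothetical stand-in for the PR's DASubmitter: it owns
// a background worker that must be released via Close() on every exit path.
type asyncSubmitter struct{ closeCount int }

func (s *asyncSubmitter) Close() {
	s.closeCount++
	fmt.Println("submitter closed")
}

// runChecks mimics a test body. Registering Close with defer immediately
// after construction guarantees cleanup even when an assertion fails early,
// which is the pattern the review asks for (t.Cleanup works the same way).
func runChecks(s *asyncSubmitter, failEarly bool) {
	defer s.Close()
	if failEarly {
		fmt.Println("assertion failed, returning early")
		return
	}
	fmt.Println("all assertions passed")
}

func main() {
	early := &asyncSubmitter{}
	runChecks(early, true) // Close still runs on the early-return path
	late := &asyncSubmitter{}
	runChecks(late, false)
	fmt.Println("closed:", early.closeCount, late.closeCount) // closed: 1 1
}
```

In a real test the equivalent is `t.Cleanup(func() { daSubmitter.Close() })` immediately after `NewDASubmitter`, so a failed `require.NoError` cannot skip teardown.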

Comment on lines +216 to +218
err = submitter.SubmitHeaders(ctx, headers, marshalledHeaders, cm, signer, nil, nil)
require.NoError(t, err)
submitter.Close()
Contributor

⚠️ Potential issue | 🟡 Minor

Defer Close() in the success-path tests.

Both tests close the submitter only after the assertions. If one of those assertions fails, teardown is skipped and the async worker can leak into later tests. Use defer or t.Cleanup right after setup instead.

💡 Suggested fix
 submitter, st, cm, mockDA, gen := setupDASubmitterTest(t)
+defer submitter.Close()
@@
- submitter.Close()

Also applies to: 331-333

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/da_submitter_test.go` around lines 216 - 218, The
test currently calls submitter.Close() after assertions which can leak the
submitter's async worker if an assertion fails; change the teardown to run
immediately after setup by invoking defer submitter.Close() (or
t.Cleanup(func(){ submitter.Close() })) right after the submitter is created so
Close() always runs even on test failures — update the tests that call
submitter.SubmitHeaders(...) and later submitter.Close() (e.g., the cases around
SubmitHeaders and the other similar test) to use deferred cleanup instead.

Comment on lines +388 to +398
case datypes.StatusSuccess:
submitted := int(res.SubmittedCount)
if onSuccess != nil {
onSuccess(submitted, res.Height)
}
s.logger.Info().Str("itemType", itemType).Int("count", submitted).Msg("successfully submitted items to DA layer")
if submitted == len(marshaled) {
return
}
// partial success: advance window
marshaled = marshaled[submitted:]
Contributor

⚠️ Potential issue | 🔴 Critical

Validate SubmittedCount before using it.

res.SubmittedCount comes from the DA client. A value of 0 leaves marshaled unchanged and rs.Attempt unmodified, so this loop spins forever; a value larger than the batch size panics here and in the outer headers[:submittedCount] / signedDataList[:submittedCount] callbacks. Reject out-of-range counts before calling onSuccess.

Suggested fix
 		case datypes.StatusSuccess:
 			submitted := int(res.SubmittedCount)
+			if submitted <= 0 || submitted > len(marshaled) {
+				err := fmt.Errorf("invalid submitted count %d for batch size %d", submitted, len(marshaled))
+				s.recordFailure(common.DASubmitterFailureReasonUnknown)
+				s.logger.Error().Err(err).Str("itemType", itemType).Msg("DA layer returned invalid submitted count")
+				if onError != nil {
+					onError(err)
+				}
+				return
+			}
 			if onSuccess != nil {
 				onSuccess(submitted, res.Height)
 			}
 			s.logger.Info().Str("itemType", itemType).Int("count", submitted).Msg("successfully submitted items to DA layer")

As per coding guidelines "Validate all inputs from external sources in Go code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/da_submitter.go` around lines 388 - 398, The
datalayer success branch uses res.SubmittedCount directly which can be 0 or
>len(marshaled) and cause infinite loops or panics; in the datypes.StatusSuccess
case (around res.SubmittedCount handling) validate that submitted :=
int(res.SubmittedCount) is >0 and <= len(marshaled) before calling onSuccess or
advancing the window (marshaled = marshaled[submitted:]); if submitted==0 treat
as a reject/error (update rs.Attempt or return/log and do not spin) and if
submitted>len(marshaled) treat as malformed input (log/error and reject) so only
a validated count is passed to onSuccess and used to slice marshaled.
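The guard the bot proposes can be isolated as a small pure function. A minimal sketch, where `applySubmitted` is a hypothetical helper illustrating the review's bounds check, not code from the PR:

```go
package main

import "fmt"

// applySubmitted advances the submission window by a count reported by an
// external DA client. The count is validated first: accepting 0 would leave
// the batch unchanged (an infinite retry loop), and accepting a count larger
// than the batch would make batch[submitted:] panic.
func applySubmitted(batch [][]byte, submitted int) ([][]byte, error) {
	if submitted <= 0 || submitted > len(batch) {
		return batch, fmt.Errorf("invalid submitted count %d for batch size %d", submitted, len(batch))
	}
	return batch[submitted:], nil
}

func main() {
	batch := [][]byte{[]byte("h1"), []byte("h2"), []byte("h3")}

	rest, err := applySubmitted(batch, 2) // partial success: window advances
	fmt.Println(len(rest), err)           // 1 <nil>

	if _, err := applySubmitted(batch, 0); err != nil {
		fmt.Println("rejected:", err) // zero progress is surfaced, not retried as-is
	}
	if _, err := applySubmitted(batch, 5); err != nil {
		fmt.Println("rejected:", err) // out-of-range count can no longer panic the slice
	}
}
```

The same check also protects the outer `headers[:submittedCount]` and `signedDataList[:submittedCount]` callbacks, since a validated count is in range for both.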

Comment thread block/internal/submitting/submitter.go Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
block/internal/submitting/da_submitter.go (1)

384-395: ⚠️ Potential issue | 🔴 Critical

Validate SubmittedCount before using it.

The res.SubmittedCount from the DA client is used without validation. If submitted == 0, the loop continues with unchanged marshaled and reset backoff, potentially causing an infinite loop. If submitted > len(marshaled), the slice operation at line 394 will panic.

🛡️ Proposed fix to validate SubmittedCount
 		case datypes.StatusSuccess:
 			submitted := int(res.SubmittedCount)
+			if submitted <= 0 || submitted > len(marshaled) {
+				s.recordFailure(common.DASubmitterFailureReasonUnknown)
+				err := fmt.Errorf("invalid submitted count %d for batch size %d", submitted, len(marshaled))
+				s.logger.Error().Err(err).Str("itemType", itemType).Msg("DA layer returned invalid submitted count")
+				if onError != nil {
+					onError(err)
+				}
+				return
+			}
 			if onSuccess != nil {
 				onSuccess(submitted, res.Height)
 			}

As per coding guidelines: "Validate all inputs from external sources in Go code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/da_submitter.go` around lines 384 - 395, The code
uses res.SubmittedCount directly which can be 0 or >len(marshaled); validate it
before slicing and advancing. In the datypes.StatusSuccess branch (symbols:
res.SubmittedCount, marshaled, onSuccess, rs.Next, reasonSuccess, pol) ensure
submitted := int(res.SubmittedCount) is checked: if submitted <= 0 treat as no
progress (do not reset backoff — call rs.Fail or return to avoid infinite loop),
if submitted > len(marshaled) cap it to len(marshaled) before calling onSuccess
and slicing; only advance marshaled and call rs.Next when a positive, bounded
submitted value was applied.
🧹 Nitpick comments (2)
block/internal/submitting/submitter.go (1)

249-260: Minor: Avoid logging when err is nil.

When onError is called with nil (on context cancellation or empty batch from submitWithRetry), this logs an error with no actual error. Consider guarding the log statement.

♻️ Proposed improvement
 						onError := func(err error) {
 							if len(headers) > 0 {
 								s.cache.ResetInFlightHeaderRange(headers[0].Height(), headers[len(headers)-1].Height())
 							}
 							if errors.Is(err, common.ErrOversizedItem) {
 								s.logger.Error().Err(err).
 									Msg("CRITICAL: Header exceeds DA blob size limit - halting to prevent live lock")
 								s.sendCriticalError(fmt.Errorf("unrecoverable DA submission error: %w", err))
 								return
 							}
-							s.logger.Error().Err(err).Msg("failed to submit headers")
+							if err != nil {
+								s.logger.Error().Err(err).Msg("failed to submit headers")
+							}
 						}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/submitter.go` around lines 249 - 260, The onError
closure should not log when err is nil: after calling
s.cache.ResetInFlightHeaderRange(headers[0].Height(),
headers[len(headers)-1].Height()) (if headers present), add an early guard if
err == nil { return } so you skip the error handling below; keep the
oversized-item check (errors.Is(err, common.ErrOversizedItem)),
s.logger.Error().Err(err).Msg("failed to submit headers"), and
s.sendCriticalError(...) as-is for non-nil errors.
block/internal/submitting/da_submitter.go (1)

326-331: Consider documenting the onError(nil) contract.

Calling onError(nil) for empty batches and context cancellations signals completion without error, but the callback name suggests an error occurred. This works correctly with the caller's errors.Is(err, common.ErrOversizedItem) check, but the semantics could be clearer.

Consider renaming to onComplete(error) or documenting that nil indicates graceful termination without retry failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/submitting/da_submitter.go` around lines 326 - 331, The
callback parameter onError in da_submitter.go is used to signal both actual
errors and graceful completion by calling onError(nil) for empty
batches/cancellations; update the code to make this contract explicit by either
renaming the parameter from onError to onComplete (and update all call sites and
related tests) or by adding a clear doc comment on the onError
parameter/signature (and any public type that exposes it) stating that a nil
argument indicates graceful termination/no-retry, while a non-nil error
indicates a retryable/failure condition; ensure you update comments and any
place that inspects the value (e.g., callers that use errors.Is(err,
common.ErrOversizedItem)) to reflect the new name or documented behavior.
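One way to make the contract explicit, as the bot suggests, is a named callback type whose doc comment pins down the nil case. This is a sketch with hypothetical names (`OnComplete`, `submit`); the PR itself passes a plain `func(error)`.

```go
package main

import (
	"errors"
	"fmt"
)

// OnComplete reports the outcome of one async submission attempt.
// A nil error means graceful termination (empty batch or context
// cancellation): nothing failed and nothing should be retried.
// A non-nil error means the batch failed and its in-flight range
// should be reset.
type OnComplete func(err error)

// submit is a toy submission path exercising both sides of the contract.
func submit(items []string, done OnComplete) {
	if len(items) == 0 {
		done(nil) // graceful completion, not a failure
		return
	}
	if len(items) > 2 { // pretend the DA layer rejects large batches
		done(errors.New("batch too large"))
		return
	}
	done(nil)
}

func main() {
	report := func(err error) {
		if err == nil {
			fmt.Println("completed, nothing to retry")
			return
		}
		fmt.Println("failed:", err)
	}
	submit(nil, report)                     // completed, nothing to retry
	submit([]string{"a", "b", "c"}, report) // failed: batch too large
}
```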

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bbad2e4d-1de5-4835-bdcd-e5d352bbf063

📥 Commits

Reviewing files that changed from the base of the PR and between 31093e6 and da7df02.

📒 Files selected for processing (3)
  • block/internal/cache/pending_base.go
  • block/internal/submitting/da_submitter.go
  • block/internal/submitting/submitter.go
✅ Files skipped from review due to trivial changes (1)
  • block/internal/cache/pending_base.go
