Report HTML document-loader failures clearly by dahlia · Pull Request #913 · fedify-dev/fedify

dahlia · 2026-06-30T23:55:33Z

Fedify already treats HTML as a possible discovery page, not as an immediate failure. That is useful for servers that expose an ActivityPub representation through an alternate link, so this PR keeps that behavior. The problem is the fallback after discovery fails: once the loader has seen an HTML/XHTML response and found no usable ActivityPub alternate, passing the same body to the JSON parser turns a remote HTML page into a generic SyntaxError.

This moves the failure to the point where the document loader still has the right context. In packages/vocab-runtime/src/docloader.ts, the HTML/XHTML branch now clones the response before reading the body, keeps the existing header and in-document alternate discovery, then throws FetchError if no distinct ActivityPub or JSON-LD alternate can be found. Oversized HTML follows the same typed error path because Fedify cannot scan it safely. In both cases the error keeps the document URL, the response, and the Content-Type in the message.

The tests in packages/vocab-runtime/src/docloader.test.ts cover a real HTML page with no ActivityPub alternate and assert that the error is a FetchError with the original response metadata. The ReDoS regression test still checks that hostile HTML completes quickly, but now expects the same document-loader error instead of relying on the JSON parser. I also narrowed the malformed Link-header test to a JSON-LD response, so it continues to cover Link parser recovery without depending on the old HTML fallback.

CHANGES.md records this under @fedify/vocab-runtime for the next minor release.

Fixes #912.

Reject HTML and XHTML document-loader responses that do not advertise an ActivityPub alternate document before they reach the JSON parser. This surfaces remote HTML pages as FetchError failures with response context instead of generic SyntaxError crashes. Add regression coverage for HTML responses without ActivityPub alternates and keep the ReDoS regression test aligned with the new error path. Fixes fedify-dev#912 Assisted-by: Codex:gpt-5.5

netlify · 2026-06-30T23:55:38Z

✅ Deploy Preview for fedify-json-schema canceled.

Name	Link
🔨 Latest commit	`359f270`
🔍 Latest deploy log	https://app.netlify.com/projects/fedify-json-schema/deploys/6a44e96b48f70a0008ab3052

dahlia · 2026-06-30T23:55:43Z

@codex review

gemini-code-assist

Code Review

This pull request modifies getDocumentLoader() in @fedify/vocab-runtime to throw a FetchError when HTML/XHTML responses are too large or lack an ActivityPub alternate link, rather than attempting to parse the HTML as JSON. This improves error handling by surfacing document loading failures with descriptive messages instead of generic JSON parser crashes. Tests have been updated to assert this new behavior. There are no review comments, so I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c6750f718

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

coderabbitai · 2026-06-30T23:58:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9871ceac-28f2-47f0-b976-5b6b514e8624

📥 Commits

Reviewing files that changed from the base of the PR and between c4db929 and 359f270.

📒 Files selected for processing (2)

packages/vocab-runtime/src/docloader.test.ts
packages/vocab-runtime/src/docloader.ts

💤 Files with no reviewable changes (1)

packages/vocab-runtime/src/docloader.test.ts

📝 Walkthrough

Walkthrough

getRemoteDocument() in @fedify/vocab-runtime now bounds HTML/XHTML reads and throws FetchError for oversized responses or HTML without an ActivityPub alternate link. Tests and the changelog were updated to match the new failure mode.

Changes

HTML document loader error handling

Layer / File(s)	Summary
Bounded HTML error path `packages/vocab-runtime/src/docloader.ts`	HTML/XHTML responses are read through a bounded helper, oversized responses fail with `FetchError`, alternate-link scanning uses the bounded text, and JSON fallback is now guarded.
FetchError coverage `packages/vocab-runtime/src/docloader.test.ts`	Adjusts mocks and adds coverage for missing alternates, oversized HTML, canceled bodies, reader fallback, and the ReDoS regression.
Changelog note `CHANGES.md`	Documents the new `getDocumentLoader()` behavior for HTML/XHTML responses without an ActivityPub alternate link.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Suggested reviewers: 2chanhaeng, sij411

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: improving HTML document-loader failure reporting.
Description check	✅ Passed	The description is directly about the docloader HTML fallback error handling changes and related tests.
Linked Issues check	✅ Passed	The changes satisfy `#912` by replacing JSON parse crashes with typed FetchErrors and preserving response context.
Out of Scope Changes check	✅ Passed	The test updates and changelog note are aligned with the issue and no unrelated changes stand out.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/vocab-runtime/src/docloader.ts`:
- Around line 205-217: The HTML size guard in docloader.ts is applied too late
because alternate-link scanning in the response handling path still buffers the
full body via response.text() after cloning the stream. Update the document
fetch logic in the relevant docloader flow (around the same response-processing
code used for the FetchError path) so MAX_HTML_SIZE is enforced before
buffering, either by short-circuiting on Content-Length or by streaming/reading
incrementally with an early stop, and keep the existing logger.warn and
FetchError behavior intact.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9dc5289d-6a6e-423f-a8da-b8aae6a2bb84

📥 Commits

Reviewing files that changed from the base of the PR and between e2224ef and 4c6750f.

📒 Files selected for processing (3)

CHANGES.md
packages/vocab-runtime/src/docloader.test.ts
packages/vocab-runtime/src/docloader.ts

codecov · 2026-07-01T00:05:27Z

Codecov Report

❌ Patch coverage is 91.17647% with 6 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
packages/vocab-runtime/src/docloader.ts	91.17%	3 Missing and 3 partials ⚠️

Files with missing lines	Coverage Δ
packages/vocab-runtime/src/docloader.ts	`93.93% <91.17%> (+1.14%)`	⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Keep the HTML document-loader fallback narrow by parsing responses that look like JSON even when they are served with an HTML Content-Type. This preserves compatibility with misconfigured ActivityPub endpoints while still reporting real HTML pages without alternates as FetchError failures. Read HTML responses through a bounded stream instead of buffering the full body before checking the scan limit. Oversized responses now short-circuit from Content-Length when available, or while reading when the stream crosses the limit. fedify-dev#913 (comment) fedify-dev#913 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-07-01T00:07:51Z

/gemini review

dahlia · 2026-07-01T00:07:51Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b00ea874a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

gemini-code-assist

Code Review

This pull request updates getDocumentLoader() to reject HTML and XHTML responses that do not advertise an ActivityPub alternate document with a FetchError instead of attempting to parse them as JSON. It introduces a helper function readBoundedText to safely read response bodies up to a 1MB limit to prevent ReDoS attacks and excessive memory usage. The feedback suggests adding a defensive fallback to response.text() in readBoundedText to ensure compatibility with environments or test mocks where response.body.getReader is not available.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Cancel oversized response bodies when Content-Length already exceeds the HTML scan limit, so document-loader failures do not leave the fetch stream open while returning early. Fall back to response.text() when a custom or non-standard response body lacks getReader(). This keeps the bounded reader usable with fetch mocks and older runtimes while preserving the streaming path where available. fedify-dev#913 (comment) fedify-dev#913 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-07-01T00:34:38Z

@codex review

dahlia · 2026-07-01T00:34:38Z

/gemini review

gemini-code-assist

Code Review

This pull request updates @fedify/vocab-runtime to reject HTML and XHTML responses that do not advertise an ActivityPub alternate document with a FetchError instead of attempting to parse the HTML as JSON. It introduces a MAX_HTML_SIZE limit of 1MB and helper functions like readBoundedText to safely stream and read HTML responses up to this limit, preventing JSON parser crashes on remote HTML error pages. Additionally, corresponding unit tests have been added to verify these behaviors, including handling oversized HTML and fallback scenarios. There are no review comments, and I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector · 2026-07-01T00:40:09Z

Codex Review: Didn't find any major issues. Breezy!

Reviewed commit: c4db929f49

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

sij411

I approve this since compatibility concern was already handled.

2chanhaeng

Some suggestions?

The bounded HTML path can rely on the standard Response body type instead of defensive assertions for non-standard mocks. The parser now preserves HTML-labelled JSON compatibility by attempting JSON.parse() directly and converts syntax failures into the same FetchError used for ordinary HTML. fedify-dev#913 (comment) fedify-dev#913 (comment) fedify-dev#913 (comment) fedify-dev#913 (comment) fedify-dev#913 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-07-01T10:18:51Z

@codex review

dahlia · 2026-07-01T10:18:51Z

/gemini review

chatgpt-codex-connector · 2026-07-01T10:23:05Z

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Reviewed commit: 359f27073b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

gemini-code-assist

Code Review

This pull request updates @fedify/vocab-runtime to reject HTML and XHTML responses that do not advertise an ActivityPub alternate document with a FetchError instead of attempting to parse the HTML as JSON. It introduces a MAX_HTML_SIZE limit of 1MB and implements bounded text reading (readBoundedText) to safely cancel response bodies that exceed this limit, preventing potential ReDoS or memory issues. Additionally, unit tests have been updated and expanded to cover these new error conditions and body cancellation behaviors. There are no review comments provided, so I have no feedback to address.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

dahlia added this to the Fedify 2.4 milestone Jun 30, 2026

dahlia self-assigned this Jun 30, 2026

dahlia requested a review from 2chanhaeng as a code owner June 30, 2026 23:55

dahlia added the component/vocab Activity Vocabulary related label Jun 30, 2026

gemini-code-assist Bot reviewed Jun 30, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 30, 2026

View reviewed changes

Comment thread packages/vocab-runtime/src/docloader.ts Outdated

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

Comment thread packages/vocab-runtime/src/docloader.ts Outdated

chatgpt-codex-connector Bot reviewed Jul 1, 2026

View reviewed changes

Comment thread packages/vocab-runtime/src/docloader.ts

gemini-code-assist Bot reviewed Jul 1, 2026

View reviewed changes

Comment thread packages/vocab-runtime/src/docloader.ts

gemini-code-assist Bot reviewed Jul 1, 2026

View reviewed changes

dahlia requested a review from sij411 July 1, 2026 01:40

sij411 previously approved these changes Jul 1, 2026

View reviewed changes

2chanhaeng reviewed Jul 1, 2026

View reviewed changes

dahlia dismissed sij411’s stale review via 359f270 July 1, 2026 10:18

gemini-code-assist Bot reviewed Jul 1, 2026

View reviewed changes

dahlia requested review from 2chanhaeng and sij411 July 1, 2026 10:28

Uh oh!

Uh oh!

Conversation

dahlia commented Jun 30, 2026

Uh oh!

netlify Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for fedify-json-schema canceled.

Uh oh!

dahlia commented Jun 30, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dahlia commented Jul 1, 2026

Uh oh!

dahlia commented Jul 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

dahlia commented Jul 1, 2026

Uh oh!

dahlia commented Jul 1, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

chatgpt-codex-connector Bot commented Jul 1, 2026

Uh oh!

sij411 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

2chanhaeng left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dahlia commented Jul 1, 2026

Uh oh!

dahlia commented Jul 1, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jul 1, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

netlify Bot commented Jun 30, 2026 •

edited

Loading

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

codecov Bot commented Jul 1, 2026 •

edited

Loading

sij411 left a comment •

edited

Loading