gw: implement PROXY protocol#361

Merged
kvinwang merged 17 commits into master from pp on Apr 21, 2026
Conversation

@kvinwang
Collaborator

@kvinwang kvinwang commented Sep 30, 2025

Summary

Extend dstack-gateway with three layered controls over per-instance traffic, all delivered through existing channels so they're covered by attestation or operator-level auth:

  1. PROXY protocol (v1/v2) outbound — decided per (instance, port) by the app itself.
  2. Port whitelist (restrict_mode) — apps declare which ports the gateway will forward; everything else is rejected at TCP-accept time (fail-close).
  3. Admin override — operators can override an instance's port policy via an Admin RPC, taking precedence over what the instance reports.

Background & security

An earlier revision encoded PP as a p suffix in the SNI subdomain (app-8080p.domain.com). That's client-controlled: a client could connect to a PP-expecting port without the suffix, the gateway would skip writing the PP header, and the backend would fall back to the gateway's WireGuard IP as the source. Effectively a source-address spoof.

PP (and by extension the port policy) must be declared by the app itself and delivered to the gateway through channels that clients cannot forge.

Design

1. AppCompose.port_policy (dstack-types)

Apps declare a single nested port_policy in app-compose.json:

{
  "port_policy": {
    "ports": [{ "port": 8080, "pp": true }],
    "restrict_mode": true
  }
}

Because it's part of app-compose, both ports and restrict_mode are measured into compose_hash and covered by attestation.

2. Reported at registration (RegisterCvmRequest.port_policy)

New CVMs include their port_policy in the existing WireGuard registration RPC. Wrapped in optional PortPolicy so the gateway can distinguish "not reported" (old dstack-util) from "reported with no restrictions".

3. Stored per-instance, synced across gateway nodes

InstanceInfo/InstanceData grow:

  • port_policy: Option<PortPolicy> — what the instance reported.
  • port_policy_hash: String — the compose_hash it was learned against.
  • admin_port_policy: Option<PortPolicy> — operator override via Admin RPC.

All three persist in the existing WaveKV inst/{instance_id} record, so per-instance decisions survive gateway restarts and propagate across the cluster without extra keys.

4. Effective policy resolution

The proxy data path (both TLS passthrough and termination) calls filter_allowed_addresses before connect_multiple_hosts:

  • Effective policy = admin_override.or(instance_reported).
  • restrict_mode = true and port not in ports → deny (TCP closed pre-handshake).
  • Policy unknown (registered CVM, neither layer cached) → deny (fail-close), schedule lazy fetch.
  • Instance not registered (e.g. localhost shortcut) → bypass.
  • should_send_pp(instance_id, port) reads from the same effective policy.
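The resolution order above can be sketched in plain Rust. Type and function names here are illustrative, not the actual dstack-gateway definitions:

```rust
#[derive(Clone)]
struct PortEntry {
    port: u16,
    pp: bool,
}

#[derive(Clone)]
struct PortPolicy {
    ports: Vec<PortEntry>,
    restrict_mode: bool,
}

/// Effective policy = admin override, else instance-reported.
fn effective_policy(
    admin_override: Option<&PortPolicy>,
    instance_reported: Option<&PortPolicy>,
) -> Option<PortPolicy> {
    admin_override.or(instance_reported).cloned()
}

/// Fail-close: unknown policy denies; restrict_mode denies unlisted ports.
fn is_port_allowed(policy: Option<&PortPolicy>, port: u16) -> bool {
    match policy {
        None => false, // registered CVM, no cached policy -> deny
        Some(p) if !p.restrict_mode => true,
        Some(p) => p.ports.iter().any(|e| e.port == port),
    }
}

/// PP is sent only when the effective policy lists the port with pp = true.
fn should_send_pp(policy: Option<&PortPolicy>, port: u16) -> bool {
    policy
        .map(|p| p.ports.iter().any(|e| e.port == port && e.pp))
        .unwrap_or(false)
}

fn main() {
    let admin = PortPolicy {
        ports: vec![PortEntry { port: 22, pp: true }],
        restrict_mode: true,
    };
    let eff = effective_policy(Some(&admin), None);
    assert!(is_port_allowed(eff.as_ref(), 22));
    assert!(!is_port_allowed(eff.as_ref(), 9999));
    assert!(should_send_pp(eff.as_ref(), 22));
}
```

Note the asymmetry: an unknown policy fails closed for port filtering but also yields pp = false, which is the safe direction for both decisions.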

5. Admin RPC surface (mTLS auth, operator-level)

rpc SetInstancePortPolicy(SetInstancePortPolicyRequest) returns (google.protobuf.Empty);
rpc ClearInstancePortPolicy(ClearInstancePortPolicyRequest) returns (google.protobuf.Empty);
rpc GetInstancePortPolicy(GetInstancePortPolicyRequest) returns (GetInstancePortPolicyResponse);

Get returns { effective, source: "admin" | "instance" | "none", instance_reported, admin_override } so "why was port X denied" is answerable in one call. Set on an unknown instance_id errors with "instance ... not found" (404 semantics) — operators don't pre-create policy rows.

Admin override survives app upgrades: a compose_hash change invalidates the instance-reported policy as before, but the operator's intent persists. Explicit clear is required to revert.

6. Backward compatibility

Legacy CVMs that don't ship port_policy at registration:

  • Registration stores port_policy: None with the attested compose_hash.
  • Gateway prewarms at registration time; on data-path cache miss it lazy-fetches via the agent's Info() RPC, parses tcb_info.app_compose, and writes the result back to WaveKV.
  • Legacy CVMs that can't publish app_compose (e.g. public_tcbinfo=false) cache the default open policy (restrict_mode=false) so they keep working.

Subtleties:

  • Re-registration with port_policy=None does not wipe previously cached policy.
  • Re-registration with a different compose_hash invalidates the instance-reported policy but not the admin override.
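Both subtleties can be sketched as a single registration hook (struct and field names are illustrative, not the actual InstanceData definition):

```rust
struct PortPolicy {
    ports: Vec<u16>,
    restrict_mode: bool,
}

struct InstanceRecord {
    port_policy: Option<PortPolicy>,       // instance-reported
    port_policy_hash: String,              // compose_hash it was learned against
    admin_port_policy: Option<PortPolicy>, // operator override
}

fn on_register(rec: &mut InstanceRecord, reported: Option<PortPolicy>, compose_hash: &str) {
    if rec.port_policy_hash != compose_hash {
        // App upgraded in place: the instance-reported policy is stale.
        rec.port_policy = None;
        rec.port_policy_hash = compose_hash.to_string();
        // admin_port_policy is deliberately left untouched.
    }
    if reported.is_some() {
        // Only overwrite the cache when the CVM actively reports a policy;
        // reported = None (legacy CVM) keeps whatever was cached.
        rec.port_policy = reported;
    }
}

fn main() {
    let mut rec = InstanceRecord {
        port_policy: Some(PortPolicy { ports: vec![22], restrict_mode: true }),
        port_policy_hash: "hash-a".into(),
        admin_port_policy: Some(PortPolicy { ports: vec![8080], restrict_mode: true }),
    };
    // Legacy re-registration, same hash: cache kept.
    on_register(&mut rec, None, "hash-a");
    assert!(rec.port_policy.is_some());
    // New compose_hash: instance-reported cleared, admin override survives.
    on_register(&mut rec, None, "hash-b");
    assert!(rec.port_policy.is_none());
    assert!(rec.admin_port_policy.is_some());
}
```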

7. Inbound PP

inbound_pp_enabled (server config) tells the gateway to read a PP header from the inbound TCP stream — used when the gateway sits behind a PP-aware LB like Cloudflare. When disabled, the gateway synthesises a PP header from the real TCP peer. Either way, the resulting header is what gets forwarded to the backend (when enabled per-port).
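The synthesis path (inbound_pp_enabled = false) can be sketched as below. The helper name is hypothetical; the v1 text format itself (`PROXY TCP4 <src> <dst> <sport> <dport>\r\n`) is the one defined by the PROXY protocol spec:

```rust
use std::net::SocketAddr;

// Build a PROXY protocol v1 header from the real TCP peer, as the gateway
// does when it is not behind a PP-aware LB. `synth_pp_v1` is an
// illustrative name, not the actual function in gateway/src/pp.rs.
fn synth_pp_v1(src: SocketAddr, dst: SocketAddr) -> String {
    let fam = match src {
        SocketAddr::V4(_) => "TCP4",
        SocketAddr::V6(_) => "TCP6",
    };
    format!(
        "PROXY {} {} {} {} {}\r\n",
        fam,
        src.ip(),
        dst.ip(),
        src.port(),
        dst.port()
    )
}

fn main() {
    let src: SocketAddr = "107.131.79.101:50000".parse().unwrap();
    let dst: SocketAddr = "10.8.42.1:8080".parse().unwrap();
    assert_eq!(
        synth_pp_v1(src, dst),
        "PROXY TCP4 107.131.79.101 10.8.42.1 50000 8080\r\n"
    );
}
```

Whether this header or one parsed from upstream is forwarded, the backend sees the same shape, which is why the per-port pp flag is the only remaining decision on the outbound side.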

Config surface

In gateway.toml under [core.proxy]:

agent_port = 8090            # Guest-agent port inside each CVM (used for Info() fetch)
inbound_pp_enabled = false   # Read PP header from upstream (e.g. Cloudflare)

[core.proxy.timeouts]
pp_header = "5s"             # Timeout for reading inbound PP header

[core.proxy.port_policy_fetch]
timeout         = "10s"
max_retries     = 5
backoff_initial = "1s"
backoff_max     = "30s"
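The fetcher's doubling-with-cap backoff implied by this block can be sketched as follows (illustrative helper, not the actual implementation):

```rust
use std::time::Duration;

// Delay doubles from backoff_initial on each retry, capped at backoff_max.
fn backoff_schedule(initial: Duration, max: Duration, retries: u32) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut d = initial;
    for _ in 0..retries {
        delays.push(d.min(max));
        d = (d * 2).min(max);
    }
    delays
}

fn main() {
    // Six waits with the defaults above: 1 + 2 + 4 + 8 + 16 + 30 = 61s,
    // the "≈ 1 min" worst case mentioned later in this PR.
    let delays = backoff_schedule(Duration::from_secs(1), Duration::from_secs(30), 6);
    let total: Duration = delays.iter().sum();
    assert_eq!(total, Duration::from_secs(61));
}
```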

Test plan

  • cargo check --workspace
  • cargo test -p dstack-gateway (24 tests pass, including 9 new policy/override tests)
  • cargo fmt --all
  • cargo clippy -- -D warnings
  • Manual: deploy a CVM with port_policy.ports: [{port: 8080, pp: true}], confirm PP header is received at the backend
  • Manual: deploy a legacy CVM (no port_policy), confirm traffic still flows via lazy fetch
  • Manual (tdxlab): set admin override {restrict_mode: true, ports: [22, 8080]}, confirm port 9999 is denied pre-TLS-handshake and logged as port 9999 denied by app port policy
  • Manual (tdxlab): clear admin override, confirm source flips back to instance
  • Manual (tdxlab): SetInstancePortPolicy on unregistered instance returns "instance ... not found"
  • Manual: rolling-upgrade a KMS-provisioned CVM to a new compose_hash, confirm admin override survives while instance-reported is re-fetched
  • Manual: behind Cloudflare with inbound_pp_enabled = true, confirm client IP is propagated end-to-end
tdxlab smoke test session

Setup: 3-instance gateway on tdxlab, admin RPC on 127.0.0.1:13005. Target instance 9c1c4eacf6f05332be8c1acb5c6a7b6535357602 (app 712eab2f507b963e11144ae67218177e93ac2a24) has sshd on port 22.

# 1. Baseline — instance reported the default open policy.
$ curl -s -X POST http://127.0.0.1:13005/prpc/GetInstancePortPolicy?json \
    -H 'Content-Type: application/json' \
    -d '{"instance_id":"9c1c4eacf6f05332be8c1acb5c6a7b6535357602"}'
{"effective":{"ports":[],"restrict_mode":false},"source":"instance",
 "instance_reported":{"ports":[],"restrict_mode":false},"admin_override":null}

# 2. Apply admin override: restrict_mode + only 22 and 8080.
$ curl -s -X POST http://127.0.0.1:13005/prpc/SetInstancePortPolicy?json \
    -H 'Content-Type: application/json' \
    -d '{"instance_id":"9c1c4eacf6f05332be8c1acb5c6a7b6535357602",
         "policy":{"ports":[{"port":22,"pp":true},{"port":8080}],
                   "restrict_mode":true}}'
null

$ curl -s -X POST http://127.0.0.1:13005/prpc/GetInstancePortPolicy?json \
    -H 'Content-Type: application/json' \
    -d '{"instance_id":"9c1c4eacf6f05332be8c1acb5c6a7b6535357602"}'
{"effective":{"ports":[{"port":22,"pp":true},{"port":8080,"pp":false}],
              "restrict_mode":true},
 "source":"admin",
 "instance_reported":{"ports":[],"restrict_mode":false},
 "admin_override":{"ports":[{"port":22,"pp":true},{"port":8080,"pp":false}],
                   "restrict_mode":true}}

# 3. Enforcement: direct instance routing through the gateway.
# Port 22 is in the whitelist — TLS handshake completes.
$ openssl s_client -quiet \
    -connect 9c1c4eacf6f05332be8c1acb5c6a7b6535357602-22.tdxlab.dstack.org:13004 \
    -servername 9c1c4eacf6f05332be8c1acb5c6a7b6535357602-22.tdxlab.dstack.org \
    </dev/null 2>&1 | head -3
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = E8

# Port 9999 is NOT in the whitelist — gateway closes TCP before TLS accept.
$ openssl s_client -quiet \
    -connect 9c1c4eacf6f05332be8c1acb5c6a7b6535357602-9999.tdxlab.dstack.org:13004 \
    -servername 9c1c4eacf6f05332be8c1acb5c6a7b6535357602-9999.tdxlab.dstack.org \
    </dev/null 2>&1 | head -1
...:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:...

# Matching log line from the gateway:
# ERROR conn{id=3}: dstack_gateway::proxy:
#   connection error: port 9999 denied by app port policy
#   for 9c1c4eacf6f05332be8c1acb5c6a7b6535357602 (1 candidate(s))

# 4. Clear override — source flips back to "instance".
$ curl -s -X POST http://127.0.0.1:13005/prpc/ClearInstancePortPolicy?json \
    -H 'Content-Type: application/json' \
    -d '{"instance_id":"9c1c4eacf6f05332be8c1acb5c6a7b6535357602"}'
null

$ curl -s -X POST http://127.0.0.1:13005/prpc/GetInstancePortPolicy?json \
    -H 'Content-Type: application/json' \
    -d '{"instance_id":"9c1c4eacf6f05332be8c1acb5c6a7b6535357602"}'
{"effective":{"ports":[],"restrict_mode":false},"source":"instance",
 "instance_reported":{"ports":[],"restrict_mode":false},"admin_override":null}

# 5. Unknown instance — 404 semantics.
$ curl -s -w '\nHTTP=%{http_code}\n' \
    -X POST http://127.0.0.1:13005/prpc/SetInstancePortPolicy?json \
    -H 'Content-Type: application/json' \
    -d '{"instance_id":"ffffffffffffffffffffffffffffffffffffffff",
         "policy":{"ports":[],"restrict_mode":true}}'
{"error":"instance ffffffffffffffffffffffffffffffffffffffff not found"}
HTTP=400

Add PROXY protocol support to the gateway with two server-side config
options instead of client-controlled SNI suffixes:

- inbound_pp_enabled: read PP headers from upstream load balancers
- outbound_pp_enabled: send PP headers to backend apps

The original PR#361 used a 'p' suffix in the SNI subdomain to toggle
outbound PP per-connection. This is a security flaw: a client could
connect to a PP-expecting port without sending PP headers, allowing
source address spoofing. Both flags are now server-side config only.

Replace the global outbound_pp_enabled switch with a per-(instance, port)
lookup so different ports of the same backend can have different PP
behaviour. PP is declared by the app and reported to the gateway through
authenticated channels — never by client SNI.

Pipeline:

1. dstack-types::AppCompose grows a "ports" array. Each entry carries a
   port number and a "pp" flag. Because it's part of app-compose.json it
   is measured into compose_hash and attested.

2. RegisterCvmRequest grows an optional PortAttrsList. New CVMs include
   their port_attrs at WireGuard registration time. The optional wrapper
   lets the gateway distinguish "not reported" (legacy CVM) from
   "reported empty" (new CVM with no PP-enabled port).

3. The gateway stores port_attrs on InstanceInfo and persists/syncs it
   via WaveKV (InstanceData), keyed by instance_id (different instances
   of the same app may run different code).

4. AddressInfo now carries instance_id, and connect_multiple_hosts
   returns the winner's instance_id. The proxy looks up that instance's
   port_attrs to decide whether to send a PROXY header.

5. Backward compat: if an instance has no port_attrs (legacy CVM), the
   gateway lazily fetches them via the agent's Info() RPC, parses
   tcb_info.app_compose, and caches the result in WaveKV.

PROXY protocol module is unchanged; only the *decision* of whether to
send a header moves from a global config to a per-instance lookup.

A re-registration from a legacy CVM carries port_attrs=None, which
previously wiped any value learned at an earlier registration or lazy
fetch. Gateway restart + CVM re-register would then force a redundant
Info() fetch. Keep cached attrs unless the caller actively reports new
ones; same instance_id implies same compose_hash, so the cache cannot
go stale.

Same instance_id with a different compose_hash means the app was
upgraded in place (typical for KMS-provisioned CVMs that reuse their
disk). Previously, a legacy-style re-registration (port_attrs=None)
would preserve stale cached attrs across such upgrades because the
gateway assumed instance_id ↔ compose_hash was stable.

Track the compose_hash each cached port_attrs was learned against
(taken directly from the attested AppInfo, not from client input).
Mismatch clears the cache so the lazy Info() fetch runs again.
@kvinwang kvinwang changed the title from "gw: Implement proxy protocol" to "gw: implement PROXY protocol with per-instance control" on Apr 16, 2026
@kvinwang kvinwang changed the title from "gw: implement PROXY protocol with per-instance control" to "gw: implement PROXY protocol" on Apr 16, 2026
@kvinwang
Collaborator Author

kvinwang commented Apr 16, 2026

End-to-end test on tdxlab

Deployed an nginx app with pp=true on port 8080 and pp=false on port 8081, exercised both forward and reverse paths.

Test endpoints

App ID a66e97bc022d18ba265a520ac2272ecfe6fdb17b, instance ID 59dae6c9db509ecc6358c5ee3d4e51419ac9f804. CVM uses dstack-0.5.8 (image does not yet ship port_policy at registration), so the policy is populated via the lazy Info() fetch path. Backward-compat path verified end to end.

Outbound PP (per-port, declared in app-compose.json)

{
  "port_policy": {
    "ports": [{"port": 8080, "pp": true}, {"port": 8081, "pp": false}],
    "restrict_mode": false
  }
}
Port pp Backend sees
8080 true proxy_protocol_addr=107.131.79.101 (real client)
8081 false remote_addr=10.8.42.1 (gateway WG IP — client IP lost, as expected)

Inbound PP (gateway behind a PP-aware LB)

Set inbound_pp_enabled = true, moved gateway listen to :13006, fronted with haproxy on :13004 using send-proxy-v2:

client (107.131.79.101) → haproxy:13004 → [PP v2] → gateway:13006 → [PP v2] → backend

Result on pp=true port: origin addr = 107.131.79.101 — the real client IP propagates through both hops.

@kvinwang kvinwang marked this pull request as ready for review April 16, 2026 07:29
Copilot AI review requested due to automatic review settings April 16, 2026 07:29
Contributor

Copilot AI left a comment


Pull request overview

Adds end-to-end PROXY protocol (v1/v2) support in dstack-gateway, with per-(instance_id, port) outbound decisioning sourced from attested app-compose metadata (and lazily fetched for legacy CVMs), plus optional inbound PP parsing when the gateway is behind a PP-aware LB.

Changes:

  • Introduce AppCompose.ports / PortAttrs and propagate port attributes through CVM registration (protobuf + dstack-util).
  • Add gateway-side PP header read/synthesis and conditional outbound PP header injection per selected backend instance/port.
  • Persist per-instance port attributes in WaveKV with compose-hash invalidation and legacy lazy fetch via guest-agent Info().

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
guest-agent/src/rpc_service.rs Updates test fixture for new AppCompose.ports field.
gateway/src/proxy/tls_terminate.rs Injects outbound PP header (when enabled) before bridging to backend.
gateway/src/proxy/tls_passthough.rs Carries PP header through passthrough flow; returns winning instance_id from racing connect; injects PP header per port.
gateway/src/proxy/port_attrs.rs New: per-instance/port lookup with legacy lazy fetch via agent Info().
gateway/src/proxy.rs Reads/synthesizes inbound PP header before SNI extraction; passes header through proxy paths.
gateway/src/pp.rs New: inbound PROXY protocol v1/v2 parse + synthesized header creation + display helper.
gateway/src/models.rs Extends InstanceInfo with port_attrs and port_attrs_hash.
gateway/src/main_service/tests.rs Adjusts test calls for updated registration/new_client signatures.
gateway/src/main_service/snapshots/dstack_gateway__main_service__tests__config.snap Snapshot update for new InstanceInfo fields.
gateway/src/main_service/snapshots/dstack_gateway__main_service__tests__config-2.snap Snapshot update for new InstanceInfo fields.
gateway/src/main_service.rs Wires registration to store port attrs + compose hash; persists/invalidates cache; exposes instance lookup/update helpers; threads instance_id through address selection.
gateway/src/main.rs Registers new pp module.
gateway/src/kv/mod.rs Persists port_attrs and port_attrs_hash in InstanceData; defines PortFlags.
gateway/src/debug_service.rs Updates debug registration call signature.
gateway/src/config.rs Adds agent_port, inbound_pp_enabled, and timeouts.pp_header.
gateway/rpc/proto/gateway_rpc.proto Adds PortAttrs / PortAttrsList; extends RegisterCvmRequest with optional port_attrs.
gateway/gateway.toml Adds inbound_pp_enabled and pp_header timeout configuration.
gateway/Cargo.toml Adds proxy-protocol dependency.
dstack-util/src/system_setup.rs Sends port_attrs during registration based on app-compose ports.
dstack-types/src/lib.rs Adds AppCompose.ports and PortAttrs schema.
Cargo.toml Adds workspace dependency pin for proxy-protocol.
Cargo.lock Locks new dependency graph for proxy-protocol and transitive deps.


Three fixes from review:

1. Treat the wire-format `port: uint32` as out-of-range when it can't fit
   in u16 (instead of silently truncating to a different valid port). Use
   `u16::try_from` and skip invalid entries.

2. Move the legacy `Info()` lazy fetch off the connection critical path:
   - `should_send_pp` is now sync. On a cache hit it returns the declared
     value; on a miss it enqueues the instance for the background worker
     and returns `pp = false` immediately, so a slow/missing CVM agent
     never blocks a proxied connection.
   - A single background task (`spawn_fetcher`) drains the queue, dedupes
     in-flight instance ids via a HashSet, applies a configurable
     timeout (`timeouts.port_attrs_fetch`, default 10s), and writes the
     result back to WaveKV.

3. Add unit tests in `pp.rs` for the inbound PROXY parser: v1/v2 IPv4
   happy paths, no-prefix rejection, v1 missing terminator, v2
   over-length cap, and the address synthesis/Display helpers.
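Fix 1 in isolation can be sketched as below (the `valid_ports` helper is hypothetical):

```rust
// Wire-format ports arrive as u32; anything that doesn't fit in u16 is
// skipped rather than silently truncated to a different valid port.
fn valid_ports(wire_ports: &[u32]) -> Vec<u16> {
    wire_ports
        .iter()
        .filter_map(|&p| u16::try_from(p).ok())
        .collect()
}

fn main() {
    // 70000 > u16::MAX; `as u16` would truncate it to 4464, a
    // different, valid-looking port. try_from drops it instead.
    assert_eq!(valid_ports(&[22, 8080, 70000]), vec![22, 8080]);
}
```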

When a CVM registers without port_attrs (legacy CVM, or compose_hash
mismatch invalidated the cache), enqueue a background fetch right away
instead of waiting for the first proxied connection to discover the
miss. Reduces the window during which the fast path returns a wrong
`pp = false` because the cache hasn't been populated yet.

The fetcher dedupes in-flight ids, so this is safe to enqueue on every
registration that ends up without cached attrs.

Right after registration, the WireGuard handshake hasn't completed yet
and the agent's TCP port isn't reachable. The previous one-shot fetch
would fail and leave the cache empty, falling back to pp=false until
the next connection (which would itself eat one more failed fetch).

Move the timeout/retry policy into a dedicated config block so it can
be tuned per deployment:

  [core.proxy.port_attrs_fetch]
  timeout = "10s"          # per-attempt Info() RPC timeout
  max_retries = 5          # extra attempts after the initial try
  backoff_initial = "1s"   # doubles each retry up to backoff_max
  backoff_max = "30s"

Worst-case 1+2+4+8+16+30 ≈ 1 min covers a reasonable WG warmup window.

Bail out early when the instance is no longer in state (recycled while
queued) — the unknown-instance error chain is the signal.

Don't waste a 1-minute retry budget on errors that can't recover. Two
classes:

- Transient → retry: TCP/RPC failure, Info() timeout. The CVM may just
  be warming up.
- Permanent → bail: instance was recycled (no longer in state), tcb_info
  isn't valid JSON, missing app_compose key, or app_compose itself
  fails to parse. Same input each retry, same failure.

`tcb_info` empty (public_tcbinfo=false) still goes through the success
path with an empty map cached, as before — that's not a fetch failure.
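A sketch of the transient/permanent split (the error variants are illustrative stand-ins for the real error chain):

```rust
enum FetchError {
    Rpc,               // TCP/RPC failure or Info() timeout
    InstanceGone,      // recycled while queued
    BadTcbInfoJson,    // tcb_info isn't valid JSON
    MissingAppCompose, // app_compose key absent
    BadAppCompose,     // app_compose itself fails to parse
}

fn should_retry(e: &FetchError) -> bool {
    match e {
        // The CVM may just be warming up (WG handshake not done yet).
        FetchError::Rpc => true,
        // Same input on every retry, same failure: bail immediately.
        FetchError::InstanceGone
        | FetchError::BadTcbInfoJson
        | FetchError::MissingAppCompose
        | FetchError::BadAppCompose => false,
    }
}

fn main() {
    assert!(should_retry(&FetchError::Rpc));
    assert!(!should_retry(&FetchError::InstanceGone));
    assert!(!should_retry(&FetchError::BadAppCompose));
}
```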

Thread the new gateway config knobs through the dstack-app deployment:

- .env / .app_env gains `INBOUND_PP_ENABLED` (default false). Set to
  true only when the gateway runs behind a PP-aware L4 LB; otherwise
  every connection would be rejected because the parser would try to
  read a PP header that isn't there.

- docker-compose.yaml forwards the new env vars plus the retry/backoff
  knobs for the background port_attrs fetcher and the pp_header read
  timeout.

- entrypoint.sh writes the corresponding fields into gateway.toml,
  including the new [core.proxy.port_attrs_fetch] section.

Defaults match the in-repo gateway.toml so existing deployments
continue to work without any .env changes.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 4 comments.



kvinwang and others added 4 commits April 16, 2026 01:24

The pre-existing script had three latent issues that weren't checked
because the file hadn't been touched. Modifying it for the PP rollout
brings it into the prek diff, so fix them now:

- SC1091: `source .env` — explicitly mark dynamic include
- SC2002: replace `cat … | tr …` with `tr … < file` redirect
- SC2086: quote $WG_ADDR in the cut pipeline

Apps can now declare `port_policy.restrict_mode: true` in app-compose to
make the gateway reject connections to any port not listed under
`port_policy.ports`. Combined with this PR's existing PROXY protocol
support, this gives apps explicit control over which ports the gateway
will forward.

Implementation:

- `AppCompose` gets a single nested `port_policy: PortPolicy` field
  (replacing the flat `ports` field added earlier on this branch).
  Keeps related config grouped and gives `restrict_mode` a meaningful
  namespace instead of a vague top-level flag.
- Gateway internals collapse the previous `port_attrs: Option<BTreeMap>`
  + `port_attrs_hash: String` into a single `Option<PortPolicy>`
  carrying `{ ports, restrict_mode }`. One Option distinguishes
  "not reported" from "reported empty" cleanly.
- RPC: `RegisterCvmRequest.port_attrs` (PortAttrsList) becomes
  `port_policy` (PortPolicy { ports, restrict_mode }).
- Enforcement: `filter_allowed_addresses` runs before
  `connect_multiple_hosts` in both TLS-terminate and TLS-passthrough
  paths. A denied connection bubbles up as a normal error and the
  proxy closes the inbound TCP stream — no special HTTP response.
- Failure mode: fail-close. An unknown policy (cache miss) denies the
  connection and triggers a background fetch, so subsequent connections
  proceed once the policy is known. Apps that opt into restrict_mode
  must run a CVM that reports policy at registration time; legacy
  CVMs that fall back to the lazy `Info()` fetch get the open default
  (`restrict_mode = false`) so they keep working.
- Unknown instance_ids (e.g. the `localhost` shortcut) bypass the
  check — the policy machinery only applies to registered CVMs.
- Rename the lazy-fetch config block from `port_attrs_fetch` to
  `port_policy_fetch` for consistency (only ever shipped on this
  unmerged branch).

Tests:

- 4 new unit tests covering allow/deny, disabled mode, fail-close on
  unknown policy, and the unknown-instance bypass.
- Snapshot tests updated for the renamed fields.

Operators can now override an instance's port policy through the Admin
service, taking precedence over what the instance reports for itself.

Three new methods on the Admin service:

- SetInstancePortPolicy(instance_id, policy)
  Persists an admin override on the instance record. Errors with
  "instance not found" (404 semantics) if the instance is not
  registered — operators don't pre-create policy rows.

- ClearInstancePortPolicy(instance_id)
  Removes the override. Effective policy reverts to whatever the
  instance reported (or none, fail-close again).

- GetInstancePortPolicy(instance_id) -> { effective, source,
  instance_reported, admin_override }
  Returns both layers and the resolved effective policy. The `source`
  string ("admin" | "instance" | "none") makes "why was port X
  rejected" answerable in one call.

Storage and resolution:

- `InstanceData.admin_port_policy: Option<PortPolicy>` lives next to
  the existing `port_policy` field, persisted to WaveKV under the
  same `inst/{id}` record so overrides sync across nodes for free.
- `instance_port_policy()` (used by both `is_port_allowed` and
  `should_send_pp`) now returns `admin_port_policy.or(port_policy)`.
- App upgrade (compose_hash change) clears the instance-reported
  policy as before, but the admin override survives. Operator intent
  is stronger than app updates; they can re-evaluate explicitly.
- Lazy fetch only ever writes `port_policy`, never touches the
  override.

Tests:

- 5 new unit tests: override beats instance, override can open what
  the instance restricts (and vice versa), clearing reverts, unknown
  instance errors, override survives compose_hash change.
@kvinwang kvinwang merged commit 8eac462 into master Apr 21, 2026
15 checks passed