
feat(kms): make self-authorization enforcement configurable #651

Merged
kvinwang merged 2 commits into master from feat/kms-optional-self-auth
Apr 16, 2026


@kvinwang
Collaborator

Summary

Add core.enforce_self_authorization (default true) so KMS can opt out of the self-attestation step on trusted RPCs when run intentionally outside a TEE.

Motivation

Commit 06d89a2 ("kms: enforce self authorization on trusted RPCs") makes every trusted RPC first call `local_kms_boot_info` -> `app_attest`, which dials `/var/run/dstack(.sock)` (or its newer path) for a TDX quote. When KMS runs on a non-TEE host (local dev / integration testing setups), that socket does not exist, so the `OnceCell`-cached `self_boot_info` can never initialize and every trusted request returns 400 with `KMS self authorization failed: ...: No such file or directory (os error 2)`.

Existing host-mode setups kept working only because their long-lived KMS process initialized the cache once, while a socket happened to be present, and then held it indefinitely. Any restart breaks them.
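The failure mode described above can be illustrated with a minimal, std-only sketch. It is not the KMS code: `BootInfo`, `attest`, and `SelfBootInfo` are hypothetical stand-ins, and `std::sync::OnceLock` is used here in place of whatever `OnceCell` type the crate actually uses. The point is only that a cache populated solely by a successful attestation stays empty forever when the socket is missing, yet keeps serving once it has been filled.

```rust
use std::io;
use std::path::Path;
use std::sync::OnceLock;

/// Hypothetical stand-in for the boot info produced by self-attestation.
#[derive(Debug, Clone, PartialEq)]
pub struct BootInfo {
    pub quote: Vec<u8>,
}

/// Hypothetical stand-in for the attestation call: it fails with
/// `NotFound` when the guest-agent socket path does not exist.
pub fn attest(socket: &Path) -> io::Result<BootInfo> {
    if !socket.exists() {
        return Err(io::Error::from(io::ErrorKind::NotFound));
    }
    // A real implementation would request a TDX quote here.
    Ok(BootInfo { quote: vec![0u8; 4] })
}

/// Cache that is only ever populated by a successful attestation.
pub struct SelfBootInfo {
    cell: OnceLock<BootInfo>,
}

impl SelfBootInfo {
    pub fn new() -> Self {
        Self { cell: OnceLock::new() }
    }

    /// Returns the cached value, or tries to attest and cache the result.
    /// A failed attempt leaves the cell empty, so the next call fails again.
    pub fn get_or_attest(&self, socket: &Path) -> io::Result<&BootInfo> {
        if let Some(info) = self.cell.get() {
            return Ok(info);
        }
        let info = attest(socket)?;
        // If another thread filled the cell first, its value is kept.
        Ok(self.cell.get_or_init(|| info))
    }
}
```

With a missing socket path every call to `get_or_attest` returns an error. After one successful call the cached value is returned even if the path later disappears, which matches the "works until restart" behavior noted above.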

Changes

  • New field `KmsConfig::enforce_self_authorization` (defaults to `true` via `#[serde(default)]`, so existing configs and TEE deployments are unchanged).
  • `RpcHandler::ensure_self_allowed` and the free `ensure_self_kms_allowed` short-circuit and return `Ok(())` when the flag is `false`.
  • Default `kms.toml` documents the option with an explicit `= true` and a comment warning it should only be flipped for non-TEE local runs.
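The short-circuit described in the list above can be sketched as follows. This is a simplified illustration, not the code from the PR: the `KmsConfig` here carries only the one field, the `attest` closure is a hypothetical stand-in for the self-attestation plus auth check, and the default is expressed with a hand-written `Default` impl so the example needs no external crates. In a serde-based config, note that a bare `#[serde(default)]` on a `bool` falls back to `false`, so a default-true field typically needs a function-path default; how the real crate does this is not shown in the source.

```rust
/// Hypothetical, reduced config: only the field this PR introduces.
#[derive(Debug, Clone)]
pub struct KmsConfig {
    pub enforce_self_authorization: bool,
}

impl Default for KmsConfig {
    fn default() -> Self {
        // Strict by default, so existing deployments keep the gate.
        Self { enforce_self_authorization: true }
    }
}

/// Sketch of the gate in front of trusted RPCs. `attest` is a hypothetical
/// closure standing in for the self-attestation and authorization step.
pub fn ensure_self_allowed<F>(config: &KmsConfig, attest: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    if !config.enforce_self_authorization {
        // Enforcement disabled: return early without attempting to attest.
        return Ok(());
    }
    attest().map_err(|e| format!("KMS self authorization failed: {e}"))
}
```

With the default config a failing `attest` surfaces as an error prefixed with `KMS self authorization failed`. With `enforce_self_authorization: false` the closure is never called and the gate returns `Ok(())`.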

Test plan

  • `cargo clippy -p dstack-kms -- -D warnings -D clippy::expect_used -D clippy::unwrap_used` clean
  • `cargo fmt --check --all` clean
  • `cargo test -p dstack-kms` passes
  • Manual: run KMS on a non-TEE host with `enforce_self_authorization = false` and confirm `SignCert` / `GetTempCaCert` succeed (verifies the bypass path)
  • Manual: confirm a TEE deployment with the default config still gates trusted RPCs as before

Add core.enforce_self_authorization (default true) so trusted RPCs and
the onboard bootstrap path can skip the local self-attestation step
when KMS is intentionally run outside a TEE — e.g. local dev/testing
where there is no /var/run/dstack(.sock) to dial.

Default stays strict (true) so production deployments are unchanged.
When set to false, both RpcHandler::ensure_self_allowed and the free
ensure_self_kms_allowed return early without attempting to attest.

Why: the strict-by-default check (introduced in 06d89a2) makes any
non-TEE host KMS instance unable to serve a single request because
the OnceCell-cached self_boot_info can never initialize. This blocks
local CVM testing setups that previously relied on an unauthenticated
host KMS process.
Copilot AI review requested due to automatic review settings April 16, 2026 09:58
Contributor

Copilot AI left a comment


Pull request overview

Adds a configuration switch to let dstack-kms run in non-TEE environments by optionally skipping the “self-authorization” (self-attestation + auth-api check) gate that currently blocks all trusted RPCs when the guest-agent socket is unavailable.

Changes:

  • Introduces core.enforce_self_authorization in KmsConfig with a default of true.
  • Short-circuits self-authorization checks in both the main RPC handler and upgrade/onboarding authority path when the flag is false.
  • Documents the option in the default kms.toml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| kms/src/main_service/upgrade_authority.rs | Skips ensure_self_kms_allowed self-authorization when config disables enforcement. |
| kms/src/main_service.rs | Skips per-request self-authorization gate for trusted RPCs when enforcement is disabled. |
| kms/src/config.rs | Adds KmsConfig::enforce_self_authorization with a default-true serde default. |
| kms/kms.toml | Documents enforce_self_authorization = true and warns it’s for non-TEE dev/testing only. |


Comment thread kms/src/main_service.rs
Contributor

Copilot AI commented Apr 16, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Warning

Firewall rules blocked me from connecting to the following address:

  • registry.npmmirror.com (DNS block), triggered by `npm ci`

@kvinwang kvinwang enabled auto-merge April 16, 2026 10:27
@kvinwang kvinwang merged commit aa23e9b into master Apr 16, 2026
15 checks passed
3 participants