feat(vmm): preserve serial logs across VM restarts #548
Merged
QEMU truncates serial.log on each boot, losing previous crash/panic logs that are critical for debugging restart loops.
- Append serial.log to serial.history.log with a boot timestamp separator before each VM restart, capped at a configurable max size (default 4MB)
- Add boot separator timestamps to stdout.log and stderr.log so individual boot sessions are clearly delimited
- Add `serial_history_max_bytes` config option (supports human-readable sizes like "4MB", "512KB")
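The append-and-cap step can be sketched in Rust roughly as follows. This is a hypothetical, in-memory illustration rather than the PR's code: `append_boot_log` and `trim_front` are made-up names, the timestamp is passed in as an already formatted RFC 3339 string, and moving the cut forward to the next newline is an assumption (the PR only says the history is trimmed from the front to keep the most recent data).

```rust
/// Hypothetical helper (not the PR's implementation): append the previous
/// boot's serial output to the history buffer behind a boot separator, then
/// trim the oldest bytes so the history stays within `max_bytes`.
fn append_boot_log(history: &mut Vec<u8>, serial: &[u8], boot_ts: &str, max_bytes: usize) {
    // Separator format as described in the PR: ===== boot @ <rfc3339> =====
    let separator = format!("===== boot @ {} =====\n", boot_ts);
    history.extend_from_slice(separator.as_bytes());
    history.extend_from_slice(serial);
    // A panic log may end mid-line; terminate it so the next separator
    // starts on its own line.
    if !serial.is_empty() && !serial.ends_with(b"\n") {
        history.push(b'\n');
    }
    trim_front(history, max_bytes);
}

/// Drop bytes from the front so at most `max_bytes` remain (newest data kept).
/// Assumption: the cut is advanced past the next newline so the history does
/// not start mid-line, unless that would leave nothing behind.
fn trim_front(history: &mut Vec<u8>, max_bytes: usize) {
    if history.len() <= max_bytes {
        return;
    }
    let mut start = history.len() - max_bytes;
    if let Some(pos) = history[start..].iter().position(|&b| b == b'\n') {
        if start + pos + 1 < history.len() {
            start += pos + 1;
        }
    }
    // Remove everything before `start`.
    history.drain(..start);
}
```

Keeping the newest bytes matches the stated goal: the log of the most recent crash is the one needed to diagnose a restart loop.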
github.sha in pull_request events is a temporary merge commit that only
exists in GitHub's internal refs (refs/pull/N/merge) and is never pushed
to the repo. Dockerfiles that do `git clone && git checkout ${DSTACK_REV}`
fail with exit 128 because this SHA is unreachable.
Use github.event.pull_request.head.sha for PR events, falling back to
github.sha for push-to-branch events where it is always a real commit.
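The fallback works because a GitHub Actions expression `a || b` evaluates to the first truthy operand, and a missing (null) or empty value is falsy. A small Rust model of that resolution is shown below; `checkout_rev` is a hypothetical name, and `None` stands in for push events, which carry no `pull_request` payload.

```rust
/// Model of `github.event.pull_request.head.sha || github.sha`.
/// `pr_head_sha` is the PR head commit (None on push events); an empty
/// string is falsy in GitHub expressions, so it also falls back.
fn checkout_rev<'a>(pr_head_sha: Option<&'a str>, github_sha: &'a str) -> &'a str {
    match pr_head_sha {
        // PR event: use the real branch HEAD, which is fetchable by `git checkout`.
        Some(sha) if !sha.is_empty() => sha,
        // Push event (or empty value): github.sha is a real pushed commit.
        _ => github_sha,
    }
}
```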
Problem
When a dstack VM crashes (a kernel panic with `panic=1`) or is restarted, QEMU truncates `serial.log` on the next boot, losing all previous boot logs. This is especially painful in TDX VMs, where non-resettable CPUs trigger `cpus are not resettable, terminating` and then rapid restart loops, making it impossible to diagnose the original panic.
Changes
vmm: preserve serial logs across restarts
- Append the previous `serial.log` content to `serial.history.log` with a timestamped `===== boot @ <rfc3339> =====` separator
- Cap `serial.history.log` at a configurable max size (default 4MB), trimmed from the front to keep the most recent data
- Add boot separators to `stdout.log` and `stderr.log` so individual QEMU sessions are clearly delimited
- New config option `serial_history_max_bytes` (supports human-readable sizes: `"4M"`, `"512K"`)
ci: fix Docker Build Check on PRs
- `github.sha` for `pull_request` events is a temporary merge commit that only exists in GitHub's internal refs and is never pushed to the repo, so Dockerfiles doing `git clone && git checkout ${DSTACK_REV}` fail with exit 128
- Use `github.event.pull_request.head.sha || github.sha` so PRs use the actual branch HEAD, while push-to-branch events continue to use `github.sha`
Test plan
- `serial.history.log` is created with the previous boot's serial output and a `===== boot @ <timestamp> =====` separator
- `stdout.log` and `stderr.log` contain boot separators between restarts
- `serial_history_max_bytes`: tested with a `1K` limit; an 88KB history was correctly trimmed to 991 bytes, keeping the tail content
- `serial.log` still works normally for the current boot
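Limits such as `"4M"`, `"512K"` or the `1K` used in testing have to be turned into a byte count. A possible parser is sketched below under stated assumptions: `parse_size` is a hypothetical name, both `K` and `KB` spellings are accepted because the commit message and the PR description use different forms, and multipliers are assumed to be binary (1K = 1024 bytes), which the PR does not specify.

```rust
/// Hypothetical parser for sizes like "4MB", "4M", "512KB", "512K" or "4096".
/// Assumes binary multipliers; the real option's convention is not stated.
fn parse_size(input: &str) -> Result<u64, String> {
    let s = input.trim();
    // Split into leading ASCII digits and the (possibly empty) unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    if digits.is_empty() {
        return Err(format!("missing number in {:?}", input));
    }
    let value: u64 = digits
        .parse()
        .map_err(|e| format!("bad number in {:?}: {}", input, e))?;
    // Case-insensitive unit; whitespace between number and unit is allowed.
    let multiplier: u64 = match unit.trim().to_ascii_uppercase().as_str() {
        "" | "B" => 1,
        "K" | "KB" => 1 << 10,
        "M" | "MB" => 1 << 20,
        "G" | "GB" => 1 << 30,
        other => return Err(format!("unknown size unit {:?}", other)),
    };
    // Reject values that do not fit in u64 instead of wrapping.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size {:?} overflows u64", input))
}
```

Fractional values such as `1.5M` are rejected by this sketch; whether the real option accepts them is not stated.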