Timon

Timon is a self-hosted monitoring daemon for servers and scripts. It tracks the health of probes (recurring checks) and jobs (scripts with a start and an end), opens incidents automatically when things go wrong, and sends notifications via configurable webhooks.

How it works

Timon runs as a background daemon on your machine. Your scripts and cron jobs push their health status to the daemon over a Unix socket. The daemon applies your rules, manages incidents, and dispatches webhook notifications.

your scripts  ──push──►  timon daemon  ──►  SQLite database
                                       ──►  webhook calls

Installation

curl -fsSL https://raw.githubusercontent.com/draftloop/timon/master/install.sh | sh

Or build from source:

go build -o timon .

Getting started

1. Start the daemon

timon daemon

The daemon reads its configuration from ~/.config/timon/timon.toml or /etc/timon/timon.toml. It starts with sensible defaults if no config file is found.

2. Push your first probe

timon push probe myapp.health healthy --comment "All good"

3. Check the status

timon status

Commands

`timon daemon`

Start the daemon. Reads config, opens the Unix socket, and starts background tasks.

timon daemon

For quick testing, running timon daemon in a terminal is enough. For production use, set it up as a system service via the install script.

`timon status`

Show active incidents and the health of all known probes and jobs.

ACTIVE INCIDENTS (1)
  INC-3  open  "db.backup is critical"  1h ago

PROBES & JOBS (3)
  myapp.health   ✓ healthy — 4m ago   "All good"
  db.backup      ✗ critical — 1h ago              INC-3
  nightly.sync   ✓ healthy — 8h ago

Use watch to get a live-updating view refreshed every 2 seconds:

watch -n2 timon status

`timon summary`

Print a one-line health summary. Useful for shell prompts or status bars.

timon summary
# Timon — 1 active incidents · 1 critical (db.backup) · 0 stale · 0 warning · 2 healthy · 0 running jobs

timon summary --short
# Timon — 1 active incidents · 1 critical · 0 stale · 0 warning · 2 healthy · 0 running jobs

`timon push probe <code> <health>`

Push a health report for a probe. Creates the probe automatically on first push.

timon push probe <code> <healthy|warning|critical> [flags]

Flag	Description
`--comment <text>`	Optional comment attached to this sample
`--stale-after <duration>`	Flag the probe as stale if no push arrives within this delay
`--stale-incident-after <duration>`	Same as `--stale-after`, and also opens an incident

timon push probe myapp.health healthy \
    --comment "All good" \
    --stale-after 5m \
    --stale-incident-after 15m

`timon push job start <code>`

Start a new job run. Prints the run UID, which must be passed to subsequent step and end calls.

timon push job start <code> [flags]

Flag	Description
`--comment <text>`	Optional start comment
`--stale-after <duration>`	Flag the job as stale if no push arrives within this delay after it ends
`--stale-incident-after <duration>`	Same as `--stale-after`, and also opens an incident
`--overtime-incident-after <duration>`	Open an incident if the job runs longer than this delay
`--overlap-incident`	Open an incident if a new run starts while one is already running (default: true)

TIMON_JOB_RUN=$(timon push job start nightly.sync \
    --comment "Starting nightly sync" \
    --overtime-incident-after 30m \
    --stale-after 25h \
    --stale-incident-after 26h)

`timon push job step <code:run-uid> <label> <health>`

Push a step to an ongoing job run. The run UID is passed directly in the code argument using the code:run-uid notation.

timon push job step <code:run-uid> <label> <healthy|warning|critical> [flags]

Flag	Description
`--end`	End the run after this step; the step label is used as end comment if `--end-comment` is not set
`--end-comment <text>`	Override the end comment (only with `--end`)

timon push job step nightly.sync:$TIMON_JOB_RUN "Exported data" healthy
timon push job step nightly.sync:$TIMON_JOB_RUN "Uploaded to S3" healthy
timon push job step nightly.sync:$TIMON_JOB_RUN "Sent report" healthy --end

`timon push job end <code:run-uid>`

End a job run. This is an alternative to passing --end to the last job step when you want to end the run in a separate call.

timon push job end <code:run-uid> [--comment <text>]

`timon push incident <title> [description]`

Manually open an incident. Prints the incident code (INC-<id>).

timon push incident "Payment gateway down" "Stripe returning 503 since 14:32"
# INC-7

`timon show <code>`

Show details of a probe, job, specific run/sample, or incident.

timon show myapp.health          # probe overview + sample history
timon show myapp.health:<uid>    # specific probe sample
timon show nightly.sync          # job overview + run history
timon show nightly.sync:<uid>    # specific job run with steps
timon show INC-7                 # incident details + timeline

`timon annotate <INC-id> <note>`

Add a note to an incident's timeline.

timon annotate INC-7 "Contacted Stripe support, ticket #8821"

`timon resolve <INC-id>`

Mark an incident as resolved.

timon resolve INC-7
timon resolve INC-7 --note "Fixed by rolling back the payment service to v2.4.1"

`timon delete <code>`

Permanently delete a probe or job (and all its history), a specific sample or run, or an incident. Prompts for confirmation unless --yes is passed.

By default, deletion is refused if the target is linked to an active incident, or if an incident is not yet resolved. Use --force to override.

timon delete myapp.health           # delete probe and all its samples
timon delete nightly.sync           # delete job and all its runs
timon delete nightly.sync:abc123    # delete a specific run
timon delete INC-7                  # delete a resolved incident
timon delete myapp.health --force   # delete even if linked to an active incident
timon delete myapp.health --yes     # skip confirmation prompt

`timon truncate`

Bulk-delete old samples, runs, and resolved incidents based on retention durations. Items linked to active incidents are silently skipped — no error is returned, making it safe to use in batch scripts.

timon truncate [<code>] [--keep <duration>] [--keep-healthy <d>] [--keep-warning <d>] [--keep-critical <d>] [--keep-incidents <d>]

Flag	Description
`<code>`	Optional probe or job code — restrict the truncation to this probe or job
`--keep <duration>`	Delete samples and runs older than this duration (shorthand for all three health flags)
`--keep-healthy <duration>`	Retention duration for healthy samples and runs
`--keep-warning <duration>`	Retention duration for warning samples and runs
`--keep-critical <duration>`	Retention duration for critical samples and runs
`--keep-incidents <duration>`	Delete resolved incidents older than this duration

--keep is mutually exclusive with --keep-healthy, --keep-warning, and --keep-critical. At least one flag is required.

When health flags are used, omitted flags inherit from the next higher severity: --keep-healthy defaults to --keep-warning, which defaults to --keep-critical. Samples and runs of a given health are kept indefinitely if no applicable flag is set.

Unfinished runs that never received a warning or critical step (health unknown) are treated as critical for truncation purposes.

timon truncate --keep 30d                                  # all samples and runs older than 30 days
timon truncate myapp.health --keep 30d                     # samples of a single probe
timon truncate --keep-healthy 7d --keep-critical 90d       # warning inherits critical: 90d
timon truncate --keep 30d --keep-incidents 90d             # samples, runs, and resolved incidents
timon truncate --keep-incidents 180d                       # resolved incidents only

Incidents

Incidents are opened automatically based on the rules you set, or manually with timon push incident.

Trigger	Cause	Auto-generated title
`critical`	A probe push with health `critical`, or a job run that ended with at least one `critical` step	`<code> is critical`
`stale`	No push received before `--stale-incident-after` expires	`<code> is stale`
`job_overtime`	A job run exceeds `--overtime-incident-after`	`<code> is overtime`
`job_overlap`	A new run starts while one is already running	`<code> is overlapping`
`manual`	Created explicitly with `timon push incident`	(user-supplied)

An incident transitions through the following states:

open  ──(probe recovers)──►  recovered  ──(degrades again)──►  relapsed
  │                              │                                  │
  └──────────────────────────────┴──────────(timon resolve)────────►  resolved

Resolving an incident (timon resolve) is permanent and can be done from any state. Recovered/relapsed transitions happen automatically as health reports come in.

Configuration

Config is loaded from the first file found among:

~/.config/timon/timon.toml
/etc/timon/timon.toml

All settings are optional. Durations accept ns, us, ms, s, m, h, d, w, mo, y. Units above h are approximate (d = 24h, w = 168h, mo = 720h, y = 8760h); use h or smaller when precision matters.

For local development, a minimal config is enough:

[daemon]
data_dir  = "/tmp/timon/"
log_level = "debug"

Full reference:

[daemon]
hostname      = "prod-server-1"   # used in webhooks; defaults to machine hostname
data_dir      = "/etc/timon/"     # SQLite database location; defaults to /etc/timon/
log_dir       = "/var/log/timon/" # log file location when installed as a service; logs go to stdout otherwise
log_level     = "info"            # silent | fatal | error | warn | info | debug
ping_interval = "5m"              # send a timon.ping webhook event on this interval

[[webhook]]
on      = ["incident.open", "incident.relapsed"]
url     = "https://gotify.internal/message?token=CHANGE_ME"
cert    = "/usr/local/share/ca-certificates/extra/myca.crt"  # optional custom CA certificate to trust
headers = { "X-My-Header" = "yes" }
body    = """
{ "message": {{ if .incident.description }}{{ json .incident.description }}{{ else }}{{ json .incident.title }}{{ end }}, "title": {{ json .incident.title }}, "priority": 8 }
"""

[webhook.retry]
attempts = 3     # retries after the initial attempt (0 = no retry); defaults to 5
timeout  = "10s" # per-request timeout
delay    = "5s"  # delay between attempts

Webhook events

Event	Description
`incident.open`	An incident was opened
`incident.recovered`	An incident recovered
`incident.relapsed`	A recovered incident relapsed
`incident.resolved`	An incident was manually resolved
`incident.annotated`	An annotation was added to an incident
`timon.ping`	Periodic heartbeat (requires `ping_interval`)
`timon.started`	The daemon started

Webhook template

The body is a Go template. The following variables are always available, plus additional ones depending on the event:

Variable / Function	Description
`._hostname`	Daemon hostname
`._event`	Event name (e.g. `incident.open`)
`._timestamp`	RFC3339 timestamp of the event
`{{ json .value }}`	Encode a value as a JSON string
`{{ urlencode .value }}`	URL-encode a string

Example: cron job monitoring

Store the code:run-uid pair in a variable to avoid repeating the job code on every call. The [ -n "$RUN" ] && guard ensures timon calls are silently skipped if the daemon is unreachable — the actual job always runs.

#!/bin/sh

_UID=$(timon push job start db.backup \
    --overtime-incident-after 1h \
    --stale-after 25h \
    --stale-incident-after 26h)
RUN="${_UID:+db.backup:$_UID}"

pg_dump mydb | gzip > /backups/mydb.gz
[ -n "$RUN" ] && timon push job step "$RUN" "Dump completed" healthy

aws s3 cp /backups/mydb.gz s3://mybucket/
[ -n "$RUN" ] && timon push job step "$RUN" "Uploaded to S3" healthy

[ -n "$RUN" ] && timon push job end "$RUN" --comment "Backup OK"

Example: probe from a cron job

# /etc/cron.d/timon-probe
*/5 * * * * root /usr/local/bin/check-myapp.sh

#!/bin/sh
# check-myapp.sh

if curl -sf http://localhost:8080/health > /dev/null; then
  timon push probe myapp.health healthy --stale-incident-after 10m
else
  timon push probe myapp.health critical --stale-incident-after 10m
fi

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
cmd		cmd
internal		internal
.gitignore		.gitignore
.goreleaser.yaml		.goreleaser.yaml
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum
install.sh		install.sh
main.go		main.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Timon

How it works

Installation

Getting started

Commands

`timon daemon`

`timon status`

`timon summary`

`timon push probe <code> <health>`

`timon push job start <code>`

`timon push job step <code:run-uid> <label> <health>`

`timon push job end <code:run-uid>`

`timon push incident <title> [description]`

`timon show <code>`

`timon annotate <INC-id> <note>`

`timon resolve <INC-id>`

`timon delete <code>`

`timon truncate`

Incidents

Configuration

Webhook events

Webhook template

Example: cron job monitoring

Example: probe from a cron job

License

About

Uh oh!

Releases 2

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Timon

How it works

Installation

Getting started

Commands

timon daemon

timon status

timon summary

timon push probe <code> <health>

timon push job start <code>

timon push job step <code:run-uid> <label> <health>

timon push job end <code:run-uid>

timon push incident <title> [description]

timon show <code>

timon annotate <INC-id> <note>

timon resolve <INC-id>

timon delete <code>

timon truncate

Incidents

Configuration

Webhook events

Webhook template

Example: cron job monitoring

Example: probe from a cron job

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Contributors

Uh oh!

Languages

`timon daemon`

`timon status`

`timon summary`

`timon push probe <code> <health>`

`timon push job start <code>`

`timon push job step <code:run-uid> <label> <health>`

`timon push job end <code:run-uid>`

`timon push incident <title> [description]`

`timon show <code>`

`timon annotate <INC-id> <note>`

`timon resolve <INC-id>`

`timon delete <code>`

`timon truncate`