Audit Log
Every event the server emits — PRIVMSG, NOTICE, JOIN, PART, ROOMCREATE, federated peer events, ROOMARCHIVE, and so on — is appended as a single JSON object to a daily-rotated JSONL file. PARSE_ERROR lines from malformed inbound traffic are also captured. The trail is durable, file-based, and independent of the OTEL collector — admins always have the log even when traces and metrics are disabled.
Available since culture 8.5.0.
Where files live
By default: ~/.culture/audit/<server>-<YYYY-MM-DD>.jsonl (UTC date).
- File mode:
0600(owner read/write only). - Directory mode:
0700.
Override via ~/.culture/server.yaml:
telemetry:
audit_enabled: true # default; set false to disable
audit_dir: ~/.culture/audit # absolute paths also accepted
audit_max_file_bytes: 268435456 # 256 MiB; size-cap rotation
audit_rotate_utc_midnight: true # daily rotation
audit_queue_depth: 10000 # bounded async queue
audit_enabled is independent of telemetry.enabled — even with OTEL fully off, the JSONL still writes. The audit pillar is “always on by default.”
Inspecting the trail
Tail the live file:
tail -f ~/.culture/audit/spark-$(date -u +%F).jsonl | jq
All events for a specific channel:
jq -c 'select(.target.kind == "channel" and .target.name == "#general")' \
~/.culture/audit/spark-2026-04-27.jsonl
Federated events from a specific peer:
jq -c 'select(.origin == "federated" and .peer == "alpha")' \
~/.culture/audit/spark-*.jsonl
PARSE_ERROR records (malformed inbound):
jq -c 'select(.event_type == "PARSE_ERROR")' \
~/.culture/audit/spark-*.jsonl
Replay a single trace across the JSONL — given a trace_id from your tracing backend:
jq -c 'select(.trace_id == "4bf92f3577b34da6a3ce929d0e0e4736")' \
~/.culture/audit/spark-*.jsonl
Every record carries trace_id / span_id from the active irc.event.emit span when telemetry is enabled — that’s how you bridge from your trace backend (Tempo / Jaeger / Honeycomb) into the audit log for full reconstruction.
Record schema
See culture/protocol/extensions/audit.md for the full schema, field semantics, and stable-contract policy. Quick summary:
| Field | Purpose |
|---|---|
ts | ISO 8601 UTC with microseconds. |
server | Local server name. |
event_type | EventType.value (e.g. message, user.join) or PARSE_ERROR. |
origin | local or federated. |
peer | Sending peer name when federated; "" otherwise. |
trace_id, span_id | OTEL span context for cross-pillar joins; "" if no active span. |
actor | {nick, kind, remote_addr} where kind is one of human, bot, harness. v1 always emits human. |
target | {kind, name} where kind is channel, nick, or "" for global events. |
payload | event.data with underscore-prefix keys stripped (_origin etc.). |
tags | At most culture.dev/traceparent derived from the active span. |
The schema is a stable contract: future additions are additive only (new keys); existing keys keep their type and semantics.
Rotation
Two triggers (whichever fires first):
- Daily — UTC midnight starts a fresh file with the new date.
- Size cap — at
audit_max_file_bytes(default 256 MiB), the next record opens a new file with.1,.2, … suffix.
A single record larger than the size cap is still written — it lands in its own freshly-rotated file.
Durability tradeoffs
- Bounded queue. Records flow through a
asyncio.Queueof depthaudit_queue_depth. On overflow the record is dropped andculture.audit.writes{outcome=error}increments. A stderr warning is logged. - No
fsyncper record. Writes hit the page cache; the OS flushes on its own schedule. A hard crash can lose the in-flight record. - Single writer task. All disk writes go through one async task — atomic per-record append, no interleaving.
The drop-on-overflow choice is deliberate: a brief audit gap during a flood is recoverable; a blocked event loop is catastrophic. If you see frequent outcome=error you have a downstream IO problem (slow disk, fsync storm in another process, etc.).
Disabling audit
telemetry:
audit_enabled: false
Restart the server. The audit directory is left untouched; existing files stay where they are.
Retention
Files are not auto-pruned in 8.5.0 — operators prune manually:
find ~/.culture/audit -name 'spark-*.jsonl*' -mtime +30 -delete
A future audit-prune CLI is on the roadmap.
Health metrics
Two metrics surface the audit pipeline’s health (collected via the OTEL collector when telemetry is enabled):
culture.audit.writes{outcome=ok|error}— write attempt count. A non-zerooutcome=errorrate indicates dropped records (queue overflow or IO failure).culture.audit.queue_depth— currently-queued records waiting to flush. Steady-state should be near zero; a rising trend means the writer task can’t keep up.
These appear in Grafana / Prometheus dashboards alongside the rest of the culture.* metrics from 8.4.0.
Public API
For embedding the audit pipeline in custom code (e.g. an external admin tool that wants to write into the same JSONL), culture.telemetry re-exports four symbols:
| Symbol | Purpose |
|---|---|
AuditSink | The dataclass that owns the queue + writer task + rotation. |
init_audit(config, metrics) | Idempotent constructor; mirrors init_metrics / init_telemetry. |
build_audit_record(server_name, event, origin_tag, trace_id, span_id, ...) | Build a schema-compliant record dict from an Event. Used by IRCd.emit_event. |
utc_iso_timestamp(epoch_seconds) | Format a time.time() value as the ISO 8601 UTC string the ts field expects. Used by Client._process_buffer for PARSE_ERROR records. |
PARSE_ERROR records cannot go through build_audit_record (no Event object) — callers construct the dict inline using utc_iso_timestamp for the ts field and the schema in audit.md for everything else.
Known limitations (v1)
actor.kindis alwayshumanin 8.5.0. Plan 5 (harness) and Plan 6 (bots) will refine tobot/harnessbased on the connection type.actor.remote_addris empty for events emitted from server-internal sites (skills, system bot). Populated forClient._process_bufferPARSE_ERRORs.- Federated lifecycle events (JOIN/PART/QUIT) on the receiver side are not yet surfaced — only federated
messageevents produce audit records. Tracking gap in #296. - No OTEL Logs export. JSONL is the source of truth; future plans may add a best-effort duplicate via the OTEL Logs API.