
Log Service

The Log Service provides centralized logging for all Edd Cloud services with real-time streaming capabilities.

Features

  • Centralized Collection: All services send logs via gRPC
  • Real-time Streaming: SSE and WebSocket log streaming to the dashboard
  • Log Levels: DEBUG, INFO, WARN, ERROR
  • Source Filtering: Filter logs by service/pod name
  • GFS Persistence: Async, durable persistence of Warn+ logs and audit events to GFS with automatic 14-day retention

Architecture

Storage Model

Logs are stored in two layers:

In-Memory Ring Buffers (primary)

Each unique source + level combination gets its own circular buffer holding up to 1000 entries. These are the primary serving layer — subscribers receive recent entries from ring buffers on connect, then live updates via pub/sub broadcast.

  • Ephemeral: Lost on pod restart
  • Per-replica: Each log-service pod has independent buffers
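The ring-buffer layer can be pictured with a minimal Go sketch. The `Entry`, `Ring`, and `Store` types below are hypothetical names chosen for illustration, not the service's actual types, and the sketch omits the locking a concurrent server would need.

```go
package main

import "fmt"

// Entry is a hypothetical, trimmed-down log entry used only in this sketch.
type Entry struct {
	Source  string
	Level   int
	Message string
}

// Ring is a fixed-capacity circular buffer that overwrites its oldest entry when full.
type Ring struct {
	buf   []Entry
	start int // index of the oldest entry
	size  int // number of valid entries
}

func NewRing(capacity int) *Ring { return &Ring{buf: make([]Entry, capacity)} }

// Push appends e, evicting the oldest entry once the buffer is full.
func (r *Ring) Push(e Entry) {
	if r.size < len(r.buf) {
		r.buf[(r.start+r.size)%len(r.buf)] = e
		r.size++
		return
	}
	r.buf[r.start] = e
	r.start = (r.start + 1) % len(r.buf)
}

// Snapshot returns the buffered entries, oldest first.
func (r *Ring) Snapshot() []Entry {
	out := make([]Entry, 0, r.size)
	for i := 0; i < r.size; i++ {
		out = append(out, r.buf[(r.start+i)%len(r.buf)])
	}
	return out
}

// key identifies one buffer: a unique source + level combination.
type key struct {
	Source string
	Level  int
}

// Store keeps one ring per source + level combination (not safe for concurrent use).
type Store struct {
	capacity int
	rings    map[key]*Ring
}

func NewStore(capacity int) *Store {
	return &Store{capacity: capacity, rings: make(map[key]*Ring)}
}

// Add routes e to the ring for its source and level, creating it on first use.
func (s *Store) Add(e Entry) {
	k := key{e.Source, e.Level}
	r, ok := s.rings[k]
	if !ok {
		r = NewRing(s.capacity)
		s.rings[k] = r
	}
	r.Push(e)
}

// Recent returns what a new subscriber would receive on connect.
func (s *Store) Recent(source string, level int) []Entry {
	if r, ok := s.rings[key{source, level}]; ok {
		return r.Snapshot()
	}
	return nil
}

func main() {
	s := NewStore(1000) // documented per-buffer capacity
	for i := 0; i < 1005; i++ {
		s.Add(Entry{Source: "gateway", Level: 2, Message: fmt.Sprintf("warn %d", i)})
	}
	recent := s.Recent("gateway", 2)
	fmt.Println(len(recent), recent[0].Message) // 1000 warn 5
}
```

Keying the buffers by source and level means a noisy DEBUG stream from one service cannot evict another service's recent errors.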

GFS Persistence (Warn+ and audit events, durable)

Only Warn and above (WARN, ERROR) and audit-marked entries (see Security Audit Trail) are persisted to GFS. All other DEBUG and INFO entries are live-only and never written to storage. Entries that qualify for persistence are never dropped: when the persistence queue is full, the enqueue blocks (with a shutdown escape) rather than discarding entries.
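A blocking enqueue with a shutdown escape is commonly written in Go as a two-way `select`. The sketch below is an illustration under assumptions: the `Enqueue` function, the `Entry` type, and the `stop` channel are hypothetical names, and a `context.Context` would serve equally well as the shutdown signal.

```go
package main

import "fmt"

// Entry is a hypothetical log entry type for this sketch.
type Entry struct{ Message string }

// Enqueue blocks until the queue accepts e or stop is closed.
// It returns false only when shutdown released a blocked caller.
func Enqueue(stop <-chan struct{}, queue chan<- Entry, e Entry) bool {
	select {
	case queue <- e:
		return true
	case <-stop:
		return false
	}
}

func main() {
	queue := make(chan Entry, 1) // tiny capacity keeps the demo short
	stop := make(chan struct{})

	fmt.Println(Enqueue(stop, queue, Entry{Message: "first"})) // true: the queue had room

	close(stop) // simulate shutdown while the queue is still full
	fmt.Println(Enqueue(stop, queue, Entry{Message: "second"})) // false: released by shutdown
}
```

Because the sender blocks instead of dropping, a slow GFS turns into backpressure on producers rather than into silent data loss.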

A background drain worker batches entries and hands them to a dedicated writer goroutine that appends to GFS with exponential-backoff retry. The writer never discards a batch on error — it retries the same batch until GFS accepts it or the service shuts down. One writer goroutine ensures appends to each /<date>/<source>.jsonl file remain ordered.
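The batching half of the drain worker might look like the sketch below, which flushes when a batch fills up or when the flush interval elapses. `Drain` and its `flush` callback are hypothetical; in the real service the callback would hand the batch to the writer goroutine.

```go
package main

import (
	"fmt"
	"time"
)

// Entry is a hypothetical log entry type for this sketch.
type Entry struct{ Message string }

const (
	BatchSize     = 200             // documented batch size
	FlushInterval = 5 * time.Second // documented flush interval
)

// Drain collects entries from queue and calls flush when the batch reaches
// batchSize or when interval elapses with entries pending. It returns after
// queue is closed, flushing whatever is left.
func Drain(queue <-chan Entry, batchSize int, interval time.Duration, flush func([]Entry)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	batch := make([]Entry, 0, batchSize)
	emit := func() {
		if len(batch) == 0 {
			return
		}
		flush(batch)
		batch = make([]Entry, 0, batchSize) // fresh slice: flush may keep the old one
	}

	for {
		select {
		case e, ok := <-queue:
			if !ok {
				emit()
				return
			}
			batch = append(batch, e)
			if len(batch) >= batchSize {
				emit()
			}
		case <-ticker.C:
			emit()
		}
	}
}

func main() {
	queue := make(chan Entry, 8)
	for i := 0; i < 5; i++ {
		queue <- Entry{Message: fmt.Sprintf("entry %d", i)}
	}
	close(queue)

	// A batch size of 2 (instead of BatchSize) keeps the demo output short.
	Drain(queue, 2, FlushInterval, func(b []Entry) {
		fmt.Println("flush", len(b)) // prints "flush 2", "flush 2", "flush 1"
	})
}
```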

A daily retention sweeper runs on startup and every 24 hours, deleting archived log files strictly older than 14 days.
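The "strictly older than 14 days" rule can be sketched as a pure function over date-named directories. This is one plausible reading, assuming whole UTC days are compared: a directory dated exactly 14 days before today is kept. The names `Expired` and `mustDay` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const retentionDays = 14 // documented retention window

// mustDay parses a YYYY-MM-DD string as midnight UTC and panics on bad input.
func mustDay(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

// Expired returns the date directories (named YYYY-MM-DD) that are strictly
// older than the retention window relative to now. Names that do not parse
// as dates are left alone.
func Expired(dirs []string, now time.Time) []string {
	today := now.UTC().Truncate(24 * time.Hour)
	cutoff := today.AddDate(0, 0, -retentionDays)

	var out []string
	for _, d := range dirs {
		day, err := time.Parse("2006-01-02", d)
		if err != nil {
			continue // not a date directory; never delete it
		}
		if day.Before(cutoff) {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	now := mustDay("2026-02-22").Add(10 * time.Hour)
	dirs := []string{"2026-02-07", "2026-02-08", "2026-02-22", "lost+found"}
	fmt.Println(Expired(dirs, now)) // [2026-02-07]
}
```

A sweeper built on this would call `Expired` once at startup and then on every tick of a 24-hour `time.Ticker`, deleting the returned directories.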

Archived files are laid out by date and source, for example:

/core-logs/2026-02-08/gateway.jsonl
/core-logs/2026-02-08/cluster-monitor.jsonl
/core-logs/2026-02-08/auth-service.jsonl

| Setting | Value |
| --- | --- |
| Persisted levels | WARN and above (plus audit-marked entries) |
| Batch size | 200 entries |
| Flush interval | 5 seconds |
| GFS namespace | core-logs |
| Queue capacity | 10 000 entries |
| Overflow behavior | Blocking (backpressure) |
| Retry behavior | Exponential backoff, max 30s, never drops |
| Retention | 14 days (daily sweep) |
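The retry behavior listed above (exponential backoff capped at 30 seconds, never dropping a batch) can be sketched as follows. The 500 ms starting delay, the injected `sleep` function, and all identifiers are assumptions made for illustration; only the 30-second cap and the retry-until-success-or-shutdown rule come from this page.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const maxBackoff = 30 * time.Second // documented cap

var errUnavailable = errors.New("gfs unavailable")

// delays and recordDelay stand in for time.Sleep so the demo finishes instantly.
var delays []time.Duration

func recordDelay(d time.Duration) { delays = append(delays, d) }

// nextBackoff doubles the delay and caps it at maxBackoff.
func nextBackoff(d time.Duration) time.Duration {
	d *= 2
	if d > maxBackoff {
		return maxBackoff
	}
	return d
}

// WriteWithRetry retries appendFn with the same batch until it succeeds or
// stop is closed. It returns false only on shutdown; it never gives up on
// its own. A production version would also make the sleep interruptible.
func WriteWithRetry(batch []string, appendFn func([]string) error,
	sleep func(time.Duration), stop <-chan struct{}) bool {
	delay := 500 * time.Millisecond // assumed starting delay
	for {
		if err := appendFn(batch); err == nil {
			return true
		}
		select {
		case <-stop:
			return false
		default:
		}
		sleep(delay)
		delay = nextBackoff(delay)
	}
}

func main() {
	failures := 8
	flaky := func([]string) error {
		if failures > 0 {
			failures--
			return errUnavailable
		}
		return nil
	}
	ok := WriteWithRetry([]string{"entry"}, flaky, recordDelay, make(chan struct{}))
	fmt.Println(ok, delays) // true [500ms 1s 2s 4s 8s 16s 30s 30s]
}
```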

If GFS is unavailable at startup, persistence is disabled and logs are kept in memory only.

Limitations

  • No log replay from GFS on startup — ring buffers start empty after a pod restart
  • Each log-service replica has independent ring buffers (no cross-replica sharing)
  • DEBUG and INFO entries are never persisted to GFS unless they are audit-marked; all others exist only in the in-memory ring buffers

Security Audit Trail

Backend services emit structured audit events at security-relevant points. Audit events are ordinary log entries that carry an audit=true attribute. The log-service persistence filter treats this attribute as an override: any entry that is either Warn+ or audit-marked is archived to GFS. This means audit events are always persisted (14-day retention), even when their severity is Info.
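The persistence rule described above reduces to a small predicate. The sketch below assumes the `audit` attribute is stored as a JSON boolean and uses the numeric level values from the Log Levels table; `Entry` and `ShouldPersist` are hypothetical names.

```go
package main

import "fmt"

// Numeric levels as listed in the Log Levels table.
const (
	LevelDebug = 0
	LevelInfo  = 1
	LevelWarn  = 2
	LevelError = 3
)

// Entry is a hypothetical, trimmed-down log entry for this sketch.
type Entry struct {
	Level      int
	Attributes map[string]any
}

// ShouldPersist reports whether an entry is archived to GFS:
// anything at Warn or above, or any entry carrying audit=true.
func ShouldPersist(e Entry) bool {
	if e.Level >= LevelWarn {
		return true
	}
	audit, ok := e.Attributes["audit"].(bool) // reading a nil map is safe
	return ok && audit
}

func main() {
	fmt.Println(ShouldPersist(Entry{Level: LevelError}))                                             // true
	fmt.Println(ShouldPersist(Entry{Level: LevelInfo, Attributes: map[string]any{"audit": true}})) // true
	fmt.Println(ShouldPersist(Entry{Level: LevelInfo}))                                              // false
}
```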

What Is Audited

| Category | Events |
| --- | --- |
| Authentication | Login success/failure, logout, 2FA success/failure |
| Authorization | All previously-silent 403 denials from compute and sfs |
| Credential lifecycle | API token issue/revoke, passkey register/remove, password change |
| Privileged actions | User create/update/delete, service-account create/update/delete |
| Resource actions | Container create/delete, terminal session open/close, file delete, namespace visibility change |
| Registry operations | Image push, image pull, image delete |
| Gateway rejects | SSH connection rejected, no-route (unresolvable hostname) |

Log Level by Outcome

Audit events use outcome-based log levels to keep the live default view useful:

| outcome value | Log level | Effect |
| --- | --- | --- |
| denied | WARN | Appears in live Warn view; alert-eligible |
| failure | WARN | Appears in live Warn view; alert-eligible |
| success | INFO | Archived via audit marker; excluded from the noisy live default view |

Standard Fields

Every audit event includes the following fields in its attributes object:

| Field | Description | Example |
| --- | --- | --- |
| audit | Always true; marks the entry for guaranteed persistence | true |
| action | Namespaced verb describing the operation | authz.denied, token.issue, container.create |
| outcome | Result of the action | success, failure, denied |
| actor | User ID, service-account ID, or anonymous | user:42, sa:ci-runner, anonymous |
| client_ip | Source IP of the request, when available | 203.0.113.55 |
| request_id | Correlates the event end-to-end via the gateway's X-Request-ID header | req_01j9... |
| resource | The object being acted on | container:abc123, token:tk_xyz, image:registry.cloud.eddisonso.com/user/app:v2 |

Never-logged

Audit events intentionally never contain secrets. Only identifiers are recorded — usernames, token IDs, key fingerprints, image references. Credentials, raw tokens, and passwords never appear in log output.
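Putting the outcome-based levels and the standard fields together, a service-side helper for emitting audit events could look like the sketch below. `AuditEvent`, `Attrs`, `Emit`, and `LevelForOutcome` are hypothetical names, not part of gfslog; the demo writes through a plain `slog` JSON handler, whose output shape differs from the log-service entry format.

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// LevelForOutcome maps an audit outcome to its log level:
// denied and failure log at WARN, success logs at INFO.
func LevelForOutcome(outcome string) slog.Level {
	switch outcome {
	case "denied", "failure":
		return slog.LevelWarn
	default:
		return slog.LevelInfo
	}
}

// AuditEvent holds the standard fields. ClientIP and RequestID may be empty.
// Only identifiers belong here, never raw tokens or passwords.
type AuditEvent struct {
	Action, Outcome, Actor, ClientIP, RequestID, Resource string
}

// Attrs returns the event as slog key/value pairs, always including audit=true.
func (e AuditEvent) Attrs() []any {
	attrs := []any{
		"audit", true,
		"action", e.Action,
		"outcome", e.Outcome,
		"actor", e.Actor,
		"resource", e.Resource,
	}
	if e.ClientIP != "" {
		attrs = append(attrs, "client_ip", e.ClientIP)
	}
	if e.RequestID != "" {
		attrs = append(attrs, "request_id", e.RequestID)
	}
	return attrs
}

// Emit logs the event at the level implied by its outcome.
func Emit(ctx context.Context, logger *slog.Logger, msg string, e AuditEvent) {
	logger.Log(ctx, LevelForOutcome(e.Outcome), msg, e.Attrs()...)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	Emit(context.Background(), logger, "authorization denied", AuditEvent{
		Action:   "authz.denied",
		Outcome:  "denied",
		Actor:    "user:17",
		ClientIP: "203.0.113.55",
		Resource: "container:c8f2a1b3d4e5",
	})
}
```

Routing every audit event through one helper keeps the field names consistent, which is what makes the `audit == true` filters shown later on this page reliable.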

Example Audit Entry

A denied authorization event from the compute service:

{
  "timestamp": 1750549200,
  "level": 2,
  "source": "edd-compute",
  "message": "authorization denied",
  "attributes": {
    "audit": true,
    "action": "authz.denied",
    "outcome": "denied",
    "actor": "user:17",
    "client_ip": "203.0.113.55",
    "request_id": "req_01j9xk2mq3v4w5n6p7r8s9t0u",
    "resource": "container:c8f2a1b3d4e5"
  }
}

Retrieving Audit Events

Audit events are stored in the normal log archive alongside all other Warn+ entries — no separate pipeline or storage path is required.

Download a full day's audit log:

GET https://health.cloud.eddisonso.com/logs/download?date=YYYY-MM-DD
Authorization: Bearer <admin-token>

The returned .zip archive contains a human-readable .log file and a raw .jsonl file. Filter the .jsonl file locally for audit=true entries:

# Extract the jsonl and filter audit events
unzip -p edd-cloud-logs-2026-06-20.zip edd-cloud-logs-2026-06-20.jsonl \
| jq 'select(.attributes.audit == true)'

# Narrow further to denied/failure outcomes only
unzip -p edd-cloud-logs-2026-06-20.zip edd-cloud-logs-2026-06-20.jsonl \
| jq 'select(.attributes.audit == true and .attributes.outcome != "success")'

Watch denials and failures in real time (live stream, Warn filter):

GET https://health.cloud.eddisonso.com/sse/logs?level=WARN
Authorization: Bearer <admin-token>

All outcome=denied and outcome=failure audit events surface here because they log at WARN.

Client Library

Services use the gfslog package to send logs:

package main

import (
	"errors"
	"log/slog"

	"eddisonso.com/go-gfs/pkg/gfslog"
)

func main() {
	logger := gfslog.NewLogger(gfslog.Config{
		Source:         "my-service",
		LogServiceAddr: "log-service:50051",
		// Info is the recommended default. Set LevelDebug only temporarily for deep
		// debugging so per-operation debug logs are not shipped to the centralized stream.
		MinLevel: slog.LevelInfo,
	})
	slog.SetDefault(logger.Logger)
	defer logger.Close()

	// Now use standard slog
	slog.Info("Service started", "port", 8080)

	err := errors.New("connection refused") // placeholder error for illustration
	slog.Error("Connection failed", "error", err)
}

API Endpoints

gRPC (Internal)

| Method | Description |
| --- | --- |
| PushLog(PushLogRequest) | Send a log entry |
| StreamLogs(StreamLogsRequest) | Stream logs (server-side streaming) |
| GetLogs(GetLogsRequest) | Query recent log entries |

HTTP/SSE and WebSocket (External)

| Endpoint | Auth | Description |
| --- | --- | --- |
| GET /sse/logs | JWT required (admin only) | Stream logs via SSE |
| GET /sse/logs?source=<name> | JWT required (admin only) | Filter by source name |
| GET /sse/logs?level=<level> | JWT required (admin only) | Filter by minimum level |
| WS /ws/logs | JWT required (admin only) | Stream logs via WebSocket |
| GET /logs/download?date=YYYY-MM-DD | JWT required (admin only) | Download a day's logs (all sources, merged) as a .zip containing .log + .jsonl |

The download endpoint returns a .zip archive containing two files: a human-readable edd-cloud-logs-<date>.log and a raw edd-cloud-logs-<date>.jsonl, both containing entries from all sources merged and sorted chronologically for the requested UTC date. The token may be passed via the Authorization header or the ?token= query parameter (same as /ws/logs). Returns 400 for a malformed date and 404 if no logs exist for that date.
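As a rough illustration of the download contract, the sketch below builds the request URL and validates the date locally, so a malformed value fails before the server would answer 400. `DownloadURL` is a hypothetical helper; the host is the one used in the examples on this page. Passing an empty token leaves the query parameter off, for callers that send the Authorization header instead.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// baseURL is the host used in the examples on this page.
const baseURL = "https://health.cloud.eddisonso.com"

// DownloadURL returns the /logs/download URL for a UTC date in YYYY-MM-DD form.
// A non-empty token is attached as the ?token= query parameter.
func DownloadURL(date, token string) (string, error) {
	if _, err := time.Parse("2006-01-02", date); err != nil {
		return "", fmt.Errorf("malformed date %q: want YYYY-MM-DD", date)
	}
	q := url.Values{}
	q.Set("date", date)
	if token != "" {
		q.Set("token", token)
	}
	return baseURL + "/logs/download?" + q.Encode(), nil
}

func main() {
	u, err := DownloadURL("2026-06-20", "")
	fmt.Println(u, err) // https://health.cloud.eddisonso.com/logs/download?date=2026-06-20 <nil>

	_, err = DownloadURL("20-06-2026", "")
	fmt.Println(err != nil) // true
}
```

Query-string tokens tend to end up in proxy and browser logs, so the Authorization header is usually the safer choice where the client can set it.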

Phase 1 — Admin-only log access

Log streaming currently requires a valid JWT with admin privileges. Because log entries do not yet carry per-user or per-namespace ownership data, non-admin users cannot be scoped to their own container logs at this time.

Per-user log streaming is planned for Phase 2, which will add a namespace field to log entries so that non-admin users can stream logs from their own compute-{userID}-* containers. This requires coordinated changes across all log-producing services.

Log Entry Format

{
  "timestamp": 1707350096,
  "level": 1,
  "source": "edd-storage",
  "message": "Request processed",
  "attributes": {
    "method": "GET",
    "path": "/storage/files",
    "duration": "15ms"
  }
}

Log Levels

| Level | Value | Description |
| --- | --- | --- |
| DEBUG | 0 | Detailed debugging information |
| INFO | 1 | General operational messages |
| WARN | 2 | Warning conditions |
| ERROR | 3 | Error conditions |
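For consumers of the .jsonl archive or the live stream, the entry format and level values above map onto Go types roughly as follows. The `Entry` and `Level` types are hypothetical, and treating `timestamp` as Unix seconds is an inference from the example values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Level mirrors the numeric values in the Log Levels table.
type Level int

const (
	LevelDebug Level = iota // 0
	LevelInfo               // 1
	LevelWarn               // 2
	LevelError              // 3
)

// String returns the level name used in the docs and query parameters.
func (l Level) String() string {
	switch l {
	case LevelDebug:
		return "DEBUG"
	case LevelInfo:
		return "INFO"
	case LevelWarn:
		return "WARN"
	case LevelError:
		return "ERROR"
	}
	return fmt.Sprintf("Level(%d)", int(l))
}

// Entry matches the JSON log entry format.
type Entry struct {
	Timestamp  int64          `json:"timestamp"` // assumed to be Unix seconds
	Level      Level          `json:"level"`
	Source     string         `json:"source"`
	Message    string         `json:"message"`
	Attributes map[string]any `json:"attributes"`
}

// ParseEntry decodes one JSON line, such as a line of the .jsonl archive.
func ParseEntry(line string) (Entry, error) {
	var e Entry
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}

func main() {
	line := `{"timestamp":1707350096,"level":1,"source":"edd-storage","message":"Request processed","attributes":{"method":"GET"}}`
	e, err := ParseEntry(line)
	if err != nil {
		panic(err)
	}
	ts := time.Unix(e.Timestamp, 0).UTC().Format(time.RFC3339)
	fmt.Println(ts, e.Level, e.Source, e.Message)
	// 2024-02-07T23:54:56Z INFO edd-storage Request processed
}
```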

Frontend Integration

The dashboard streams logs via SSE:

const params = new URLSearchParams();
params.set('source', 'edd-storage');
params.set('level', 'INFO');

const eventSource = new EventSource(`/sse/logs?${params}`);

eventSource.onmessage = (event) => {
  const entry = JSON.parse(event.data);
  console.log(`[${entry.source}] ${entry.message}`);
};

Configuration

| Flag | Description | Default |
| --- | --- | --- |
| -grpc | gRPC listen address | :50051 |
| -http | HTTP listen address | :8080 |
| -master | GFS master address | gfs-master:9000 |
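For reference, flags of this shape are typically declared with Go's standard `flag` package. The sketch below reproduces the three flags and their defaults; the `Config` struct, the `ParseFlags` function, and the "log-service" flag-set name are assumptions, not the service's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config is a hypothetical container for the three documented flags.
type Config struct {
	GRPCAddr   string
	HTTPAddr   string
	MasterAddr string
}

// ParseFlags parses args (without the program name) using the documented defaults.
func ParseFlags(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("log-service", flag.ContinueOnError)
	fs.StringVar(&c.GRPCAddr, "grpc", ":50051", "gRPC listen address")
	fs.StringVar(&c.HTTPAddr, "http", ":8080", "HTTP listen address")
	fs.StringVar(&c.MasterAddr, "master", "gfs-master:9000", "GFS master address")
	err := fs.Parse(args)
	return c, err
}

func main() {
	cfg, err := ParseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the problem and usage
	}
	fmt.Printf("%+v\n", cfg)
}
```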