Platform Status & Transparency
In brief
The public status page is petanque.life's transparency commitment: real health checks against the API, DB, Redis, Auth and SWA components, a 60-second background collector with 90-day rolling uptime history, admin-published incidents with timeline updates, and a 30-second client refresh with an honest fail-red fallback when the status service itself is unreachable. It is served in four languages and linked from every footer.
How it works
Trust in a B2B/B2G platform survives outages only when the operator is transparent during them. The status page solves this by going beyond the typical green-square theatre. A background job, CollectStatusSamplesJob, runs every 60 seconds and probes each component the platform depends on: the API health endpoint, a MongoDB ping, a Redis ping, the Auth service and the static web apps (SWA).
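One tick of the collector can be sketched as below. This is an illustrative sketch, not the actual petanque.life code: the `HealthSample` shape, the `Probe` signature and the degraded-latency threshold are all assumptions.

```typescript
// Hypothetical sketch of one CollectStatusSamplesJob tick.
// Field names and the 1s "degraded" threshold are assumptions.
type ComponentStatus = "operational" | "degraded" | "down";

interface HealthSample {
  component: string;
  status: ComponentStatus;
  latencyMs: number;
  at: Date;
}

// A probe resolves when the component answers, rejects when it doesn't.
type Probe = () => Promise<void>;

const DEGRADED_MS = 1000; // assumed latency threshold for "degraded"

async function collectSample(component: string, probe: Probe): Promise<HealthSample> {
  const start = Date.now();
  try {
    await probe();
    const latencyMs = Date.now() - start;
    return {
      component,
      status: latencyMs > DEGRADED_MS ? "degraded" : "operational",
      latencyMs,
      at: new Date(),
    };
  } catch {
    // Probe rejected or timed out: record the component as down.
    return { component, status: "down", latencyMs: Date.now() - start, at: new Date() };
  }
}

// One tick of the 60-second job: probe every component in parallel.
async function collectAllSamples(probes: Record<string, Probe>): Promise<HealthSample[]> {
  return Promise.all(
    Object.entries(probes).map(([name, probe]) => collectSample(name, probe)),
  );
}
```

Running the probes in parallel keeps one slow component from delaying the others' samples within the 60-second budget.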
Each probe writes a ServiceHealthSample document with status, latency and timestamp into a rolling-window collection with a 90-day TTL — long enough for SLA reporting, short enough to stay cheap. The visible /status page reads aggregated samples and renders per-service uptime over the last 24 hours, 7 days and 90 days, plus current state. Operators can publish StatusIncident records from the sys console with a title, severity, affected services and a timeline of updates — these render at the top of the status page during active incidents and below the fold afterwards as historical post-mortems.
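The per-window uptime figure is a straightforward aggregation over the stored samples. A minimal sketch, assuming one sample per minute and counting "degraded" as up (only "down" reduces uptime) — both of which are assumptions, as is the choice to report 100% for an empty window:

```typescript
// Sketch of trailing-window uptime aggregation. Field names mirror the
// ServiceHealthSample described above but are otherwise assumptions.
interface Sample {
  status: "operational" | "degraded" | "down";
  at: Date;
}

// Uptime over a trailing window: the share of samples in the window that
// were not "down". With one sample per minute, 24h ≈ 1440 samples.
function uptimePercent(samples: Sample[], windowMs: number, now = new Date()): number {
  const cutoff = now.getTime() - windowMs;
  const inWindow = samples.filter((s) => s.at.getTime() >= cutoff);
  if (inWindow.length === 0) return 100; // design choice: no data reads as up
  const up = inWindow.filter((s) => s.status !== "down").length;
  return Math.round((up / inWindow.length) * 10_000) / 100; // two decimals
}
```

The same function serves all three windows (24h, 7d, 90d) by varying `windowMs`, and the 90-day TTL guarantees the collection never holds more history than the widest window needs.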
The client-side fetcher refreshes every 30 seconds without a page reload so visitors always see fresh state, and — critically — if the fetch times out the UI fails red ('We can't reach the status service either') rather than silently showing stale green. The page is fully translated into EN/FR/ES/SV and is linked from the footer of every marketing page and from inside the admin app, so any user, customer or prospect who suspects an outage gets a same-second answer without escalating to support.
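The fail-red behaviour hinges on the fetcher never leaving the UI in an ambiguous state. A browser-side sketch — the endpoint path, timeout value and state names are assumptions:

```typescript
// Hypothetical client-side poller: every outcome maps to an explicit state,
// so a dead status service can never masquerade as green.
type StatusView =
  | { kind: "ok"; payload: unknown }
  | { kind: "unreachable" }; // render red: "We can't reach the status service either"

async function fetchStatus(timeoutMs = 5000): Promise<StatusView> {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), timeoutMs);
  try {
    const res = await fetch("/api/status", { signal: ctrl.signal });
    if (!res.ok) return { kind: "unreachable" }; // HTTP errors also fail red
    return { kind: "ok", payload: await res.json() };
  } catch {
    return { kind: "unreachable" }; // timeout or network failure: fail red, never stale green
  } finally {
    clearTimeout(timer);
  }
}

// Poll every 30s without a page reload; render once immediately on load.
function startPolling(render: (v: StatusView) => void): void {
  const tick = () => fetchStatus().then(render);
  tick();
  setInterval(tick, 30_000);
}
```

Treating non-2xx responses and timeouts identically is what makes the fallback "honest": the renderer only ever sees fresh `ok` data or an explicit `unreachable`.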
Key capabilities
- Real health checks against API, DB, Redis, Auth and SWA components
- 60-second background collector job recording rolling samples (90-day TTL)
- Per-service uptime aggregated over 24h, 7d and 90d windows
- Admin-publishable incidents with severity, affected services and update timeline
- 30-second client-side auto-refresh without a page reload
- Honest fail-red fallback when the status service itself is unreachable
- Linked from every marketing footer and inside the admin app, EN/FR/ES/SV
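The 90-day rolling window in the list above is typically implemented in MongoDB as a TTL index, so expired samples are purged automatically instead of by a cleanup job. A configuration sketch under assumed database, collection and field names:

```typescript
// Assumed setup for the rolling window: a TTL index on the sample timestamp
// expires documents 90 days after "at". Names are illustrative, not the
// actual petanque.life schema.
import { MongoClient } from "mongodb";

const NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60;

async function ensureSampleTtlIndex(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    await client
      .db("petanque")
      .collection("serviceHealthSamples")
      .createIndex({ at: 1 }, { expireAfterSeconds: NINETY_DAYS_SECONDS });
  } finally {
    await client.close();
  }
}
```

MongoDB's TTL monitor removes expired documents in the background, which is what keeps the collection "long enough for SLA reporting, short enough to stay cheap" without any application-side pruning.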
In practice
A federation IT contact gets a Slack ping that 'Petanque Life is down' from a club admin trying to issue a license. He opens petanque.life/status from his bookmarks. The page shows API: degraded latency, DB: operational, Auth: operational, with an active incident published 4 minutes ago: 'Elevated API latency in EU-North — investigating.' Two updates follow within the next 10 minutes.
He posts the status URL to his federation's Slack channel and the noise stops — everyone sees the same official source rather than competing speculation. The full incident closes with a post-mortem 90 minutes later, which becomes part of the visible history.
Features of this subsystem
| ID | Status | Features |
|---|---|---|
| F19.12.01 | Delivered | Live status page with real health checks (API, DB, Redis, Auth, SWA), 60s-interval background collector (CollectStatusSamplesJob), ServiceHealthSample rolling-window uptime (90d TTL), StatusIncident admin reporting, client-side fetch with 30s auto-refresh and honest fail-red fallback on timeout, 4 languages ✅ PL-T045 |