Petanque Life

Security Operations

F21.17 · 8 features

In summary

Security Operations is the surface a `sys_security` operator lives in during a live threat. It combines a cross-tenant active user-sessions panel with bulk revoke; a suspicious-activity queue covering impossible travel, new devices on high-value accounts, and brute force; a failed-login heatmap; API-key and OAuth-client rotation; a Service Principal inventory; idempotent secret-rotation runbooks for six secret classes; per-tenant IP allowlists; and a two-person-armed emergency kill-switch.

How it works

The active user-sessions panel lists every end-user session across all tenants, filterable by tenant, role, origin, and a `suspicious` flag. Bulk-revoke takes a list of session ids and tears them all down in a single POST. All mutations require fresh-auth.
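As a rough illustration of the filter-then-bulk-revoke flow, the sketch below selects flagged sessions and builds the single POST described above. The endpoint path comes from the feature list on this page; the `Session` shape, the `select_sessions` helper, and the `session_ids` body field are assumptions made for the example, not a documented contract.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical shape of one row in the sessions panel.
    id: str
    tenant: str
    role: str
    origin: str
    suspicious: bool


def select_sessions(sessions, *, tenant=None, suspicious=None):
    """Filter by tenant and/or the suspicious flag; None means 'do not filter'."""
    return [
        s for s in sessions
        if (tenant is None or s.tenant == tenant)
        and (suspicious is None or s.suspicious == suspicious)
    ]


def build_bulk_revoke_request(session_ids):
    """Describe the one POST that revokes every listed session.

    The `session_ids` field name is an assumption for this sketch.
    """
    ids = sorted(set(session_ids))  # de-duplicate so each session appears once
    if not ids:
        raise ValueError("at least one session id is required")
    return {
        "method": "POST",
        "path": "/sys/security/user-sessions/bulk-revoke",
        "body": json.dumps({"session_ids": ids}),
    }


sessions = [
    Session("s1", "club-a", "member", "web", False),
    Session("s2", "club-a", "admin", "mobile", True),
    Session("s3", "club-b", "member", "web", True),
]
flagged = select_sessions(sessions, suspicious=True)
request = build_bulk_revoke_request(s.id for s in flagged)
```

Sending the request (and satisfying the fresh-auth requirement) is left out on purpose, since the page does not describe how that check is performed.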

The suspicious-activity queue persists `SysSuspiciousLoginEvent` rows from three rules — `impossible_travel` (two logins from geo-distant IPs within an implausible window), `new_device_hva` (unknown device on a high-value account), and `brute_force` (failure spike per IP/user). Triage is a single POST with action `none/notify/revoke/lock`. The failed-login heatmap aggregates `sys_failed_login_samples` by IP, user, country, or hour.
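To make the `impossible_travel` rule concrete, here is a minimal sketch of one common way to implement such a check: compute the great-circle distance between two geolocated logins and flag the pair when the implied speed is implausible. The 900 km/h ceiling, the 100 km noise floor, and the function names are assumptions for illustration; the page does not state the thresholds the platform actually uses.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumed ceiling, roughly airliner cruise speed
MIN_DISTANCE_KM = 100.0    # assumed floor to ignore geo-IP jitter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_kmh=MAX_PLAUSIBLE_KMH):
    """prev and curr are (timestamp, latitude, longitude) tuples for one user."""
    distance = haversine_km(prev[1], prev[2], curr[1], curr[2])
    if distance < MIN_DISTANCE_KM:
        return False  # close enough to be the same place
    hours = (curr[0] - prev[0]).total_seconds() / 3600
    if hours <= 0:
        return True  # far apart at the same instant
    return distance / hours > max_kmh


t0 = datetime(2024, 1, 1, 12, 0)
stockholm = (59.33, 18.07)
sydney = (-33.87, 151.21)
flagged = is_impossible_travel((t0, *stockholm), (t0 + timedelta(minutes=12), *sydney))
```

A production rule would also need to handle VPN exits and missing geolocation, which this sketch ignores.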

API-key and OAuth-client management offers cross-tenant lists with rotate and revoke for both `ApiToken` and `M2MClient`; the raw secret is returned exactly once after rotation, never persisted in plaintext server-side. Service Principal management treats `SysServicePrincipal` rows as inventory, surfacing expiry and last-used; schedule-rotation queues a future rotation, and rotate executes it. Secret rotation is the centrepiece: idempotent stepwise runbooks for six secret classes (`db`, `jwt_keys`, `webhook`, `stripe`, `sendgrid`, `bankgirot`) with `advance/retry/cancel` verbs and a pluggable `RotationStepRunner`.
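The "returned exactly once, never persisted in plaintext" property can be sketched as follows. This is an in-memory stand-in, not the platform's `ApiToken` storage: the `TokenStore` class and the choice of a SHA-256 digest are assumptions, picked because a digest is a common way to verify high-entropy random secrets without keeping them.

```python
import hashlib
import hmac
import secrets


class TokenStore:
    """Hypothetical store that keeps only a digest of each current secret."""

    def __init__(self):
        self._digests = {}  # token id -> hex SHA-256 digest of the live secret

    def rotate(self, token_id):
        """Replace the secret and return the new plaintext exactly once."""
        raw = secrets.token_urlsafe(32)
        self._digests[token_id] = hashlib.sha256(raw.encode()).hexdigest()
        return raw  # the caller must copy it now; it cannot be read back later

    def revoke(self, token_id):
        """Drop the token so no secret verifies against it."""
        self._digests.pop(token_id, None)

    def verify(self, token_id, presented):
        """Check a presented secret against the stored digest in constant time."""
        stored = self._digests.get(token_id)
        if stored is None:
            return False
        candidate = hashlib.sha256(presented.encode()).hexdigest()
        return hmac.compare_digest(stored, candidate)


store = TokenStore()
old_secret = store.rotate("partner-key")
new_secret = store.rotate("partner-key")  # the old secret stops working here
```

Because only the digest is kept, a lost secret cannot be recovered; the operator has to rotate again.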

Re-running a runbook resumes from the last completed step, so a partial rotation never leaves the platform stuck. Per-tenant IP allowlists set CIDR ranges that the tenant authentication path enforces — useful for federations on a stable office network. The emergency kill-switch is the nuclear option: arming requires two operators, each entering a single-use `SysKillSwitchApprovalCode` with a 10-minute TTL, and once armed the middleware returns `503 tenant_kill_switch_armed` for every non-sys tenant-scoped request until disarmed.
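The resume-from-last-completed-step behaviour of the rotation runbooks can be sketched with a persisted step counter. Everything below is illustrative: the `Runbook` class, its status strings, and the step names are assumptions, and the runner is modelled as a plain callable standing in for the pluggable `RotationStepRunner`, whose real interface is not shown on this page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Runbook:
    secret_class: str
    steps: List[str]
    completed: int = 0       # number of finished steps; this is what gets persisted
    status: str = "pending"

    def advance(self, runner: Callable[[str, str], None]) -> str:
        """Run the next unfinished step; finished steps are never re-run."""
        if self.status in ("done", "cancelled"):
            return self.status
        if self.completed >= len(self.steps):
            self.status = "done"
            return self.status
        try:
            runner(self.secret_class, self.steps[self.completed])
        except Exception:
            self.status = "failed"  # the counter is untouched, so a retry resumes here
            return self.status
        self.completed += 1
        self.status = "done" if self.completed == len(self.steps) else "in_progress"
        return self.status

    def retry(self, runner: Callable[[str, str], None]) -> str:
        """Re-attempt the step that failed; a no-op unless the runbook is failed."""
        if self.status != "failed":
            return self.status
        self.status = "in_progress"
        return self.advance(runner)

    def cancel(self) -> str:
        if self.status != "done":
            self.status = "cancelled"
        return self.status


calls = []
failures = {"switch": 1}  # simulate one outage on the third step


def flaky_runner(secret_class, step):
    calls.append(step)
    if failures.get(step, 0) > 0:
        failures[step] -= 1
        raise RuntimeError("simulated outage")


book = Runbook("stripe", ["generate", "deploy_dual", "switch", "revoke_old"])
while book.status not in ("done", "failed"):
    book.advance(flaky_runner)
status_after_outage = book.status  # "failed", with two steps completed
book.retry(flaky_runner)
while book.status != "done":
    book.advance(flaky_runner)
```

In this run the first two steps execute once each, and only the interrupted step is attempted a second time.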

The kill-switch is the only way an on-call security engineer can stop the platform end-to-end without redeploying.
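The two-person arming flow might look roughly like the sketch below. The 10-minute TTL, the single-use codes, and the `503 tenant_kill_switch_armed` response come from this page; the `KillSwitch` class, binding each code to the operator it was issued to, and never expiring a recorded approval are simplifying assumptions.

```python
import secrets
from datetime import datetime, timedelta, timezone

CODE_TTL = timedelta(minutes=10)


class KillSwitch:
    """Hypothetical in-memory model of two-person arming."""

    def __init__(self):
        self._codes = {}         # code -> (operator, issued_at)
        self._approvals = set()  # distinct operators who redeemed a valid code
        self.armed = False

    def issue_code(self, operator, now):
        code = secrets.token_hex(4)
        self._codes[code] = (operator, now)
        return code

    def approve(self, operator, code, now):
        entry = self._codes.pop(code, None)  # pop makes every code single-use
        if entry is None:
            return False
        owner, issued_at = entry
        if owner != operator or now - issued_at > CODE_TTL:
            return False
        self._approvals.add(operator)
        if len(self._approvals) >= 2:  # a set, so one operator cannot count twice
            self.armed = True
        return True

    def disarm(self):
        self._approvals.clear()
        self.armed = False

    def check(self, is_sys_request):
        """Return the (status, error) pair a middleware could emit."""
        if self.armed and not is_sys_request:
            return 503, "tenant_kill_switch_armed"
        return 200, None


t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
switch = KillSwitch()
alice_code = switch.issue_code("alice", t0)
bob_code = switch.issue_code("bob", t0)
switch.approve("alice", alice_code, t0 + timedelta(minutes=1))
switch.approve("bob", bob_code, t0 + timedelta(minutes=2))
```

After both approvals, `switch.check(False)` yields the 503 pair while sys requests keep passing, which is what lets operators reach the disarm action.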

Key capabilities

  • Cross-tenant active user-sessions panel with filter and bulk-revoke
  • Suspicious-activity queue: impossible_travel, new_device_hva, brute_force; one-POST triage
  • Failed-login heatmap by IP / user / country / hour
  • API-key and OAuth-client rotation with single-time raw-secret return
  • Service Principal inventory with expiry, last-used, schedule-rotation, rotate
  • Idempotent six-class secret-rotation runbooks with advance/retry/cancel and resume-from-step
  • Per-tenant IP allowlist (CIDR) enforced at tenant auth
  • Two-person-armed emergency kill-switch with single-use codes (10-minute TTL) returning 503
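The per-tenant CIDR allowlist in the list above can be illustrated with Python's standard `ipaddress` module. The helper names are hypothetical, and treating an empty allowlist as "no restriction" is an assumption; the page does not say how the platform handles tenants without a configured list.

```python
import ipaddress


def parse_allowlist(cidrs):
    """Turn CIDR strings into network objects (host bits are tolerated)."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def is_allowed(client_ip, allowlist):
    """Return True when the client address falls inside any allowed range.

    An empty allowlist is treated as unrestricted (assumed behaviour).
    """
    if not allowlist:
        return True
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network versions differ.
    return any(addr in net for net in allowlist)


office = parse_allowlist(["203.0.113.0/24", "2001:db8::/32"])
inside = is_allowed("203.0.113.42", office)
outside = is_allowed("198.51.100.7", office)
```

The addresses used here are from the ranges reserved for documentation, so they do not correspond to any real tenant.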

In practice

The suspicious-activity queue raises three `impossible_travel` events for the same user inside 12 minutes. The on-call security engineer triages with action `revoke`, and all sessions for that user are torn down. She opens the failed-login heatmap and sees a clear cluster from one IP range.

She sets an IP allowlist for the user's tenant and rotates a leaked partner API key, copying the new secret once and sending it to the partner through a secure channel. Later in the week, a planned Stripe-key rotation walks through the runbook one step at a time; an interrupted run resumes cleanly from step 4. During a separate ransomware drill she arms the kill-switch with a colleague, every tenant request returns 503 within seconds, and the disarm thirty minutes later restores normal operation.

Features in this subsystem

| ID | Status | Feature |
| --- | --- | --- |
| F21.17.01 | Delivered | Active user-sessions panel: cross-tenant end-user sessions with filter (tenant, role, origin, suspicious) and bulk-revoke. `GET /sys/security/user-sessions`, `DELETE /sys/security/user-sessions/{id}`, and `POST /sys/security/user-sessions/bulk-revoke`; fresh-auth on mutations. ✅ PL-T137 |
| F21.17.02 | Delivered | Suspicious-activity queue: three rules (`impossible_travel`, `new_device_hva`, `brute_force`) persist `SysSuspiciousLoginEvent` rows; triage via `POST /sys/security/suspicious/{id}/triage` with action `none`, `lock_user`, or `revoke_sessions`. Implemented (PL-T137) |
| F21.17.03 | Delivered | Failed-login heatmap: `GET /sys/security/failed-logins?group_by=ip\|email&window=24h` aggregates `SysFailedLoginSample` for abuse detection. Implemented (PL-T137) |
| F21.17.04 | Delivered | API-key and OAuth-client management: cross-tenant list, rotate, and revoke for `ApiToken` and `M2MClient`; raw secret returned once after rotation. Implemented (PL-T137) |
| F21.17.05 | Delivered | Service Principal management: `SysServicePrincipal` inventory with expiry, last-used, and rotation scheduling; the schedule-rotation and rotate endpoints mirror the manual Azure procedure in docs/engineering/security/service-principals.md. Implemented (PL-T137) |
| F21.17.06 | Delivered | Secret-rotation runbook launcher: idempotent stepwise rotation for six secret classes (`db`, `jwt_keys`, `webhook`, `stripe`, `sendgrid`, `bankgirot`) with advance/retry/cancel verbs; pluggable `RotationStepRunner`. Implemented (PL-T137) |
| F21.17.07 | Delivered | Per-tenant IP allowlist: `GET`/`PUT /sys/tenants/{id}/ip-allowlist` with a CIDR list; enforced during tenant authentication. Implemented (PL-T137) |
| F21.17.08 | Delivered | Emergency kill-switch: two-person armed via single-use `SysKillSwitchApprovalCode` (10-minute TTL); middleware returns `503 tenant_kill_switch_armed` for every non-sys tenant-scoped request until disarmed. Implemented (PL-T137) |