Platform: JLT-Lane Mesh | Env: Production | Docs: v1.2.0
● All Systems Operational | Last Deploy: Mar 22

Platform Runbooks

The operational response layer of JLT-Lane — where platform signals, system behavior, and recovery procedures become repeatable action.

Why this page exists

Architecture explains how the platform is designed. Observability shows how it behaves. Runbooks explain how to respond when that behavior needs to be verified, corrected, or recovered.

In the broader Engineering Mesh, runbooks sit after observability and before delivery. They are the layer that turns system signals into operational confidence.

Runbook catalog

These playbooks support repeatable operations across the JLT-Lane sandbox suite, observability stack, and local recovery workflows.

Sandbox Startup

Start the Node.js service, Prometheus, Grafana, and supporting containers in the local sandbox environment, then verify the stack is healthy.

  • Docker Compose startup
  • Container verification
  • Metrics endpoint validation
  • Prometheus target checks
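
The container verification step can be sketched as a small parser over the output of `docker compose ps --format json`. This is a minimal sketch, not the runbook's own tooling. It assumes Docker Compose v2, whose JSON output has been a single array in some releases and one object per line in others, and the service names in the sample are hypothetical.

```python
import json


def parse_compose_ps(output: str) -> list[dict]:
    """Parse `docker compose ps --format json` output.

    Handles both shapes seen across Compose v2 releases:
    a single JSON array, or one JSON object per line.
    """
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unhealthy_services(containers: list[dict]) -> list[str]:
    """Return services that are not running or whose healthcheck is unhealthy."""
    bad = []
    for c in containers:
        name = c.get("Service") or c.get("Name", "<unknown>")
        state = c.get("State", "")
        health = c.get("Health", "")  # empty when no healthcheck is defined
        if state != "running" or health == "unhealthy":
            bad.append(name)
    return sorted(bad)


# Hypothetical sample output; real service names depend on the compose file.
sample = "\n".join([
    '{"Service": "app", "State": "running", "Health": "healthy"}',
    '{"Service": "prometheus", "State": "exited", "Health": ""}',
    '{"Service": "grafana", "State": "running", "Health": ""}',
])
print(unhealthy_services(parse_compose_ps(sample)))
```

An empty result suggests every container is up. Any name in the list is a candidate for log inspection before moving on to the metrics and target checks.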

Prometheus Target Debug

Diagnose missing metrics, scrape failures, or Prometheus targets that are marked down when dashboards show incomplete or empty data.

  • Check /metrics
  • Inspect Prometheus targets
  • Validate Docker networking
  • Confirm scrape configuration
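
One way to inspect targets without the web UI is the Prometheus HTTP API endpoint `/api/v1/targets`, which reports each active target's health and last scrape error. The sketch below filters that payload for targets that are not up. The base URL `http://localhost:9090`, the job names, and the sample payload are assumptions for illustration only.

```python
import json
import urllib.request


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the targets payload from Prometheus (the base URL is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def down_targets(payload: dict) -> list[dict]:
    """List active targets whose health is not "up", with the last scrape error."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        {
            "job": t.get("labels", {}).get("job", ""),
            "scrape_url": t.get("scrapeUrl", ""),
            "health": t.get("health", "unknown"),
            "last_error": t.get("lastError", ""),
        }
        for t in active
        if t.get("health") != "up"
    ]


# Hypothetical payload in the shape returned by /api/v1/targets.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus"},
             "scrapeUrl": "http://localhost:9090/metrics",
             "health": "up", "lastError": ""},
            {"labels": {"job": "node-app"},
             "scrapeUrl": "http://app:8080/metrics",
             "health": "down", "lastError": "connection refused"},
        ]
    },
}
for t in down_targets(sample):
    print(f'{t["job"]}: {t["health"]} ({t["last_error"]}) at {t["scrape_url"]}')
```

Against a live stack, `down_targets(fetch_targets())` gives the same summary. A connection or DNS error on the scrape URL usually points at Docker networking or the scrape configuration rather than the application: inside the Prometheus container, `localhost` refers to Prometheus itself, not to the application container.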

Grafana Dashboard Setup

Configure Grafana to use Prometheus as a datasource and build dashboard panels for CPU, memory, and request visibility.

  • Datasource configuration
  • Panel creation
  • PromQL examples
  • Dashboard validation
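
As a starting point for the PromQL examples step, the sketch below lists one candidate query per panel and builds a Prometheus instant-query URL (`/api/v1/query`), so each expression can be checked against Prometheus before it goes into a Grafana panel. The `process_*` metrics are typically exposed by the default collectors of common Prometheus client libraries. `http_requests_total` and the `localhost:9090` base URL are assumptions, not names taken from the platform.

```python
from urllib.parse import urlencode

# Candidate panel queries. http_requests_total is a hypothetical counter
# that the application would need to register itself.
PANEL_QUERIES = {
    "CPU usage (cores)": "rate(process_cpu_seconds_total[5m])",
    "Resident memory (bytes)": "process_resident_memory_bytes",
    "Request rate (req/s)": "sum(rate(http_requests_total[5m]))",
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL with the expression URL-encoded."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


for title, promql in PANEL_QUERIES.items():
    print(f"{title}: {instant_query_url('http://localhost:9090', promql)}")
```

If a query returns an empty result here, the matching Grafana panel will show no data either, which helps separate a dashboard problem from a scrape problem.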

Docker Recovery

Recover the platform when containers fail to start, networks drift, or the local Docker environment becomes unstable.

  • Restart Docker Desktop
  • Reset the Compose stack
  • Inspect failing containers
  • Prune stale resources safely
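
The reset, inspect, and prune steps can be sketched as an ordered command plan that runs in dry-run mode by default. This is an illustrative sketch under stated assumptions: it assumes the Docker Compose v2 CLI (`docker compose`) run from the directory that holds the compose file, the service name is hypothetical, and restarting Docker Desktop remains a manual step outside the script.

```python
import shlex
import subprocess
from typing import List, Optional


def recovery_plan(service: Optional[str] = None, prune: bool = False) -> List[List[str]]:
    """Build an ordered list of recovery commands, inspection first."""
    logs = ["docker", "compose", "logs", "--tail", "50"]
    if service:
        logs.append(service)
    plan = [
        ["docker", "compose", "ps", "--all"],  # which containers exist and their state
        logs,                                  # recent output from failing containers
        ["docker", "compose", "down", "--remove-orphans"],
    ]
    if prune:
        # Removes only stopped containers and unused networks; volumes are untouched.
        plan.append(["docker", "container", "prune", "--force"])
        plan.append(["docker", "network", "prune", "--force"])
    plan.append(["docker", "compose", "up", "-d"])
    return plan


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command


# Dry run with a hypothetical service name: prints the commands only.
run(recovery_plan(service="prometheus", prune=True))
```

Keeping the prune commands optional and scoped to containers and networks is a deliberate choice in this sketch: a broader prune that also removes volumes would delete stored Prometheus and Grafana data.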

Metrics Endpoint Debug

Debug the /metrics endpoint when Prometheus cannot scrape data or the application is running without exposing expected telemetry.

  • Call /metrics directly
  • Verify service port exposure
  • Review application logs
  • Confirm metric registration
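
To confirm metric registration, the response body from `/metrics` can be compared against the metric names the dashboards expect. The sketch below parses the Prometheus text exposition format and reports missing names. The sample body and `http_requests_total` are hypothetical; in practice the body would come from calling the endpoint directly.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format, e.g. `name{label="x"} 1.0`.
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def exposed_metric_names(body: str) -> set[str]:
    """Collect sample names from a /metrics response, skipping comment lines."""
    names = set()
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank, # HELP, or # TYPE
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(body: str, expected: list[str]) -> list[str]:
    """Return expected sample names that the endpoint does not expose."""
    return sorted(set(expected) - exposed_metric_names(body))


# Hypothetical response body for illustration.
sample = """\
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.42
# HELP http_requests_total Hypothetical request counter.
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/"} 17
"""
print(missing_metrics(sample, [
    "process_cpu_seconds_total",
    "http_requests_total",
    "process_resident_memory_bytes",
]))
```

The comparison is on sample names, so a histogram would appear as its `_bucket`, `_sum`, and `_count` series rather than under its base name.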

Operating model

These runbooks follow a common troubleshooting pattern designed to keep recovery explainable, safe, and repeatable across environments.

Observe symptom
        ↓
Verify service/container state
        ↓
Check metrics or targets
        ↓
Confirm configuration
        ↓
Restart / recover safely
        ↓
Validate platform behavior

The goal is not only to fix issues, but to make the path to recovery visible and teachable.
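
The pattern above can be sketched as a small driver that runs diagnostic checks in order, performs one recovery action at the first failure, and then re-validates. This is an illustrative sketch only: the check names mirror the steps in the flow, while the lambda checks and the `restart` function are hypothetical stand-ins for real probes and recovery commands.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the name of the first that fails, if any."""
    for name, check in checks:
        if not check():
            return name
    return None


def troubleshoot(checks: List[Check], recover: Callable[[], None]) -> List[str]:
    """Diagnose in order, recover once, then re-validate; return a readable log."""
    log = []
    failed = first_failure(checks)
    if failed is None:
        log.append("all checks passed; no recovery needed")
        return log
    log.append(f"first failing check: {failed}")
    recover()
    log.append("recovery action executed")
    still_failing = first_failure(checks)
    log.append("validated" if still_failing is None else f"still failing: {still_failing}")
    return log


# Hypothetical stand-ins: a target is down until the recovery action runs.
state = {"targets_up": False}
checks = [
    ("service/container state", lambda: True),
    ("metrics or targets", lambda: state["targets_up"]),
    ("configuration", lambda: True),
]


def restart() -> None:
    state["targets_up"] = True


print(troubleshoot(checks, restart))
```

Returning a log rather than a bare pass or fail keeps the path to recovery visible, in line with the goal stated above.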

Where runbooks sit in the Engineering Mesh

Runbooks are the operational bridge between observability and delivery.

Architecture
        ↓
Sandbox
        ↓
Observability
        ↓
Runbooks
        ↓
Delivery

The sandbox creates a safe place to generate and observe platform signals. Runbooks turn those signals into repeatable operational action.

From runbooks to action

These related pages connect operational procedures to the rest of the platform.

Planned expansions

  • Observability stack reset
  • Container health verification
  • Incident simulation: Grafana “No Data”
  • Prometheus scrape failure recovery

The long-term goal is to make this page the operational entry point for troubleshooting and platform recovery across the JLT-Lane ecosystem.