Appearance
Fail-Open Resilience
AgentWatch is designed with a fundamental principle: our infrastructure never causes your production to go down. This is achieved through fail-open architecture at every layer.
What "Fail-Open" Means
When AgentWatch infrastructure is unreachable or experiencing issues, budget checks silently fail open — allowing API calls to proceed to the upstream provider without enforcement. This ensures your agents always work, even during AgentWatch outages.
Fail-Open Behavior by Mode
SDK-Only Mode (Recommended for most users)
When agentwatch_primary_provider is NOT set, only budget checks and telemetry go through AgentWatch:
python
client = WatchedOpenAI(
agentwatch_api_key="aw_live_xxx",
agentwatch_session_budget_usd=2.00,
agentwatch_enforcement_mode=True,
# No agentwatch_primary_provider — SDK calls go directly to OpenAI
)If AgentWatch is down:
- Budget check HTTP call fails (timeout at 2s)
- SDK catches exception, logs debug message
- API call proceeds directly to OpenAI
- Telemetry is silently dropped
Proxy Mode
When agentwatch_primary_provider is set, all traffic routes through AgentWatch:
python
client = WatchedOpenAI(
agentwatch_api_key="aw_live_xxx",
agentwatch_primary_provider="openai",
agentwatch_session_budget_usd=2.00,
agentwatch_enforcement_mode=True,
)If AgentWatch is down:
- The SDK automatically falls back to the direct provider URL
- No manual intervention required
Configuration
Fail-open is the default behavior. To change it:
python
# Fail-open (default) — AgentWatch outage doesn't affect your app
client = WatchedOpenAI(
agentwatch_enforcement_fail_open=True, # default
...
)
# Fail-closed — AgentWatch outage blocks API calls
client = WatchedOpenAI(
agentwatch_enforcement_fail_open=False,
...
)What Happens During an Outage
| Component | Behavior |
|---|---|
| Budget check | Fails open, API call proceeds |
| Telemetry logging | Silently dropped, no data loss |
| Anomaly detection | Disabled (requires edge processing) |
| Dashboard | May show stale data |
| Compliance reports | Skipped for the period |
Monitoring AgentWatch Health
Monitor AgentWatch availability via the health endpoint:
bash
curl https://agent-watch.dev/healthz
# Returns: okIf this endpoint returns non-200, AgentWatch is experiencing issues and fail-open is active.