
AgentWatch

Proactive LLM Governance Platform: stop runaway agent loops before they burn your budget.

What is AgentWatch?

AgentWatch is an ultra-low-latency API proxy and SDK that intercepts LLM API requests at the edge and enforces budget constraints on them. It acts as a proactive governance layer between your application and upstream providers like OpenAI and Anthropic.

The Problem

As engineering teams adopt autonomous LLM agents — coding assistants, research bots, recursive planners — they face a critical financial vulnerability: the runaway loop.

If an agent gets stuck in a recursive error-correction loop, it can execute hundreds of API calls per minute. Because each iteration appends the previous output to the context window, every prompt is larger than the last, and cumulative token spend grows roughly quadratically. A single stuck agent can burn thousands of dollars in minutes.

Iteration 1:       1,000 tokens  →  $0.003
Iteration 10:     10,000 tokens  →  $0.030
Iteration 50:    250,000 tokens  →  $0.750
Iteration 100: 1,000,000 tokens  →  $3.000
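
The growth arithmetic can be sketched in a few lines of Python. This is a simplified model, not AgentWatch internals: it assumes exactly 1,000 tokens are appended per iteration and a flat input price of $3 per million tokens (the rate implied by the figures above). The function name `loop_cost` and both constants are illustrative assumptions. It reports cumulative spend, so its totals are larger than the per-iteration figures listed above.

```python
PRICE_PER_TOKEN_USD = 3.00 / 1_000_000  # assumed flat input price: $3 per 1M tokens
TOKENS_PER_STEP = 1_000                 # assumed tokens appended on every iteration


def loop_cost(iterations: int) -> tuple[int, float]:
    """Return (cumulative prompt tokens, cumulative USD) for a context-appending loop."""
    total_tokens = 0
    for i in range(1, iterations + 1):
        # Iteration i resends everything accumulated so far, so its prompt
        # holds i * TOKENS_PER_STEP tokens (linear growth per call).
        total_tokens += i * TOKENS_PER_STEP
    # Summing a linearly growing prompt gives a quadratic cumulative total.
    return total_tokens, total_tokens * PRICE_PER_TOKEN_USD


for n in (1, 10, 50, 100):
    tokens, usd = loop_cost(n)
    print(f"after {n:>3} iterations: {tokens:>9,} tokens billed, ${usd:.2f}")
```

Under these assumptions, doubling the number of iterations roughly quadruples the bill, which is why a cap on cumulative session spend catches a stuck loop that a per-request limit would miss.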

Passive monitoring tools only report this after the budget is gone. AgentWatch prevents it before the call is made.

The Solution

AgentWatch provides three layers of protection:

1. Synchronous Budget Enforcement

Before any upstream LLM call, the SDK performs a pre-flight check against the AgentWatch edge. If the session's cumulative token cost exceeds the configured limit, the request is blocked immediately and the SDK raises an AgentBudgetExceeded exception.

```python
from agentwatch import WatchedOpenAI, AgentBudgetExceeded

# Wrap the OpenAI client with a $2.00 per-session budget.
client = WatchedOpenAI(
    agentwatch_api_key="aw_live_xxx",
    agentwatch_session_budget_usd=2.00,
    agentwatch_enforcement_mode=True,  # block over-budget requests
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Refactor this module..."}],
    )
except AgentBudgetExceeded as e:
    # Raised by the pre-flight check; no upstream call was made.
    print(f"Blocked: spent ${e.spent:.4f}, limit ${e.limit:.4f}")
```
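
To make the check-then-call ordering concrete, here is a minimal in-process sketch of the same idea. It is not the SDK's implementation: the real check is a request to the AgentWatch edge, and the names `SessionBudget`, `BudgetExceeded`, `preflight`, and `record` are hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical stand-in for the SDK's AgentBudgetExceeded."""

    def __init__(self, spent: float, limit: float) -> None:
        super().__init__(f"spent ${spent:.4f}, limit ${limit:.4f}")
        self.spent = spent
        self.limit = limit


@dataclass
class SessionBudget:
    """Tracks cumulative spend for one session (assumed local state)."""

    limit_usd: float
    spent_usd: float = 0.0

    def preflight(self) -> None:
        """Raise before the upstream call if cumulative spend exceeds the limit."""
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(self.spent_usd, self.limit_usd)

    def record(self, cost_usd: float) -> None:
        """Add the cost of a completed call to the running total."""
        self.spent_usd += cost_usd


budget = SessionBudget(limit_usd=2.00)
calls_made = 0
try:
    for _ in range(10):
        budget.preflight()   # check first, so a blocked call costs nothing
        budget.record(0.90)  # pretend every call costs $0.90
        calls_made += 1
except BudgetExceeded as e:
    print(f"Blocked after {calls_made} calls: {e}")
```

In this sketch the call that crosses the limit still goes through, and the next one is blocked. That follows from checking cumulative spend before each call rather than estimating the cost of the call about to be made.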

2. Inline Anomaly Detection

AgentWatch detects runaway behavior before the budget is exhausted. It maintains a rolling window of the last 5 iterations per session and calculates the prompt-token growth ratio between consecutive iterations. If three consecutive iterations each show more than 1.4x prompt growth over the previous one, the hallmark of a context-appending loop, AgentWatch fires an alert via Slack webhook.
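
The detection rule described above can be sketched as follows. The window size, threshold, and run length come from the description; the class name `GrowthDetector` and its `observe` method are hypothetical, and the sketch returns a boolean where a real deployment would send the Slack alert.

```python
from collections import deque

WINDOW = 5              # iterations kept per session
GROWTH_THRESHOLD = 1.4  # prompt-growth ratio that counts as suspicious
CONSECUTIVE = 3         # suspicious ratios in a row needed to flag a loop


class GrowthDetector:
    """Flags a session whose prompt size keeps growing by more than 1.4x."""

    def __init__(self) -> None:
        # deque(maxlen=...) drops the oldest entry automatically.
        self.prompt_tokens = deque(maxlen=WINDOW)

    def observe(self, tokens: int) -> bool:
        """Record one iteration's prompt size; return True if the loop signature is present."""
        self.prompt_tokens.append(tokens)
        sizes = list(self.prompt_tokens)
        # Ratio of each iteration to the one before it; an empty previous
        # prompt is treated as "no growth" to avoid dividing by zero.
        ratios = [(b / a if a > 0 else 0.0) for a, b in zip(sizes, sizes[1:])]
        recent = ratios[-CONSECUTIVE:]
        return len(recent) == CONSECUTIVE and all(r > GROWTH_THRESHOLD for r in recent)


detector = GrowthDetector()
flags = [detector.observe(t) for t in (1_000, 1_500, 2_250, 3_400)]
print(flags)  # [False, False, False, True]
```

Three growth ratios require four observations, so the fourth iteration is the earliest point at which this rule can fire.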

3. Fail-Open Resilience

If AgentWatch infrastructure experiences downtime, budget checks silently fail open. Your production traffic is never interrupted. This is a core design principle: AgentWatch downtime never causes customer outages.
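
The fail-open pattern itself is small enough to show in full. The sketch below is a generic illustration, not AgentWatch code: `allowed_fail_open` and the two example check functions are hypothetical names, and a real check would be a network call with a short timeout.

```python
from typing import Callable


def allowed_fail_open(check: Callable[[], bool]) -> bool:
    """Run a budget check; if the check itself fails, allow the request."""
    try:
        return check()
    except Exception:
        # The governance layer is unreachable or erroring: fail open so
        # production traffic continues without enforcement.
        return True


def healthy_over_budget() -> bool:
    return False  # the check ran and says "block"


def edge_unreachable() -> bool:
    raise TimeoutError("budget service did not respond")


print(allowed_fail_open(healthy_over_budget))  # False
print(allowed_fail_open(edge_unreachable))     # True
```

The trade-off is explicit: while the check is failing, requests are not budget-limited, which is the price of never blocking traffic because of the governance layer.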

Key Features

| Feature | Description |
| --- | --- |
| Session Tracking | Global state tracked across your entire agent network via sub-1ms Cloudflare KV edge storage |
| Budget Enforcement | Synchronous pre-call budget ceiling check. Drops requests instantly if limits are exceeded |
| Anomaly Detection | Identifies the stuck-loop signature (three consecutive iterations of more than 1.4x context growth) as early as iteration 4 |
| Fail-Open | AgentWatch downtime never causes customer outages |
| 5 Providers | OpenAI, Anthropic, Groq, xAI, and Gemini are all supported |
| SOC 2 CC6.1 | Compliance telemetry reports with audit-ready summaries |
| Team Budgets | Monthly USD caps per team with hard-stop enforcement |

Supported Providers

| Provider | Status |
| --- | --- |
| OpenAI | Supported |
| Anthropic | Supported |
| Groq | Supported |
| xAI (Grok) | Supported |
| Gemini | Supported |

Architecture

AgentWatch runs on Cloudflare's global edge infrastructure:

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    Python SDK    │────▶│ Cloudflare Edge  │────▶│   LLM Provider   │
│ (WatchedOpenAI)  │     │   (AgentWatch)   │     │ (OpenAI,         │
└──────────────────┘     └────────┬─────────┘     │  Anthropic)      │
                                  │               └──────────────────┘
                                  │ KV (session state)
                                  │ Queue (telemetry buffer)
                                  ▼
                         ┌──────────────────┐
                         │     Supabase     │
                         │ (Postgres logs)  │
                         └──────────────────┘

Next Steps