AgentWatch

Proactive LLM Governance Platform: prevent runaway agent loops from burning your budget before they start.

What is AgentWatch?

AgentWatch is an ultra-low-latency API proxy and SDK that intercepts LLM API requests at the edge and enforces budget constraints on them. It acts as a proactive governance layer between your application and upstream providers like OpenAI and Anthropic.

The Problem

As engineering teams adopt autonomous LLM agents — coding assistants, research bots, recursive planners — they face a critical financial vulnerability: the runaway loop.

If an agent gets stuck in a recursive error-correction loop, it can execute hundreds of API calls per minute. Because each iteration appends the previous output to the context window, every call resends a larger prompt, so cumulative token usage grows roughly quadratically with the number of iterations. A single stuck agent can burn thousands of dollars in minutes.

Iteration 1:        1,000 tokens  →  $0.003
Iteration 10:      10,000 tokens  →  $0.030
Iteration 50:     250,000 tokens  →  $0.750
Iteration 100:  1,000,000 tokens  →  $3.000

Passive monitoring tools only report this after the budget is gone. AgentWatch prevents it before the call is made.
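The growth described above can be sketched with a toy model. This is only an illustration under assumed parameters: each iteration appends a fixed 1,000 tokens to the prompt, and input is priced at $3.00 per million tokens (the rate implied by the figures above). The function name `cumulative_cost` and both parameters are hypothetical, and the totals under this model will not match the illustrative figures above exactly.

```python
def cumulative_cost(iterations, tokens_per_step=1_000, usd_per_million=3.0):
    """Return (total_tokens, total_usd) billed by a context-appending loop.

    Assumption: every iteration appends `tokens_per_step` tokens and then
    resends the entire prompt, so the whole context is billed again.
    """
    prompt_tokens = 0
    total_tokens = 0
    for _ in range(iterations):
        prompt_tokens += tokens_per_step  # context grows linearly per call
        total_tokens += prompt_tokens     # but the full prompt is re-billed
    return total_tokens, total_tokens * usd_per_million / 1_000_000


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        tokens, usd = cumulative_cost(n)
        print(f"after {n:>3} iterations: {tokens:>9,} tokens, ${usd:.3f}")
```

Under these assumptions the per-call prompt grows linearly, while the running total follows 1,000 · n(n+1)/2 tokens, which is the quadratic term that makes long loops expensive.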

The Solution

AgentWatch provides three layers of protection:

1. Synchronous Budget Enforcement

Before any upstream LLM call, the SDK performs a pre-flight check to the AgentWatch edge. If the session's cumulative token cost exceeds the configured limit, the request is blocked instantly and an AgentBudgetExceeded exception is raised.

python

from agentwatch import WatchedOpenAI, AgentBudgetExceeded

# Wrap the OpenAI client so each call is checked against the session budget.
client = WatchedOpenAI(
    agentwatch_api_key="aw_live_xxx",
    agentwatch_session_budget_usd=2.00,  # session spend ceiling in USD
    agentwatch_enforcement_mode=True,
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Refactor this module..."}],
    )
except AgentBudgetExceeded as e:
    # Raised by the pre-flight check when the session is over budget.
    print(f"Blocked: spent ${e.spent:.4f}, limit ${e.limit:.4f}")
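To make the pre-flight idea concrete, here is a minimal in-memory sketch of a budget gate. It is not the AgentWatch implementation: `SessionBudget`, `BudgetExceeded`, `preflight`, and `record` are hypothetical names, and the real check runs against shared edge state rather than a local object.

```python
class BudgetExceeded(Exception):
    """Hypothetical stand-in for the SDK's budget exception."""

    def __init__(self, spent, limit):
        super().__init__(f"spent ${spent:.4f}, limit ${limit:.4f}")
        self.spent = spent
        self.limit = limit


class SessionBudget:
    """Tracks cumulative spend for one session and gates new calls."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def preflight(self):
        # Run before the upstream call: block once spend exceeds the limit.
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(self.spent_usd, self.limit_usd)

    def record(self, cost_usd):
        # Run after a completed call to add its cost to the session total.
        self.spent_usd += cost_usd
```

A caller would invoke `preflight()` before each request and `record()` after each response, so the request that would follow an over-budget session never leaves the process.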

2. Inline Anomaly Detection

AgentWatch detects runaway behavior before the budget is exhausted. It maintains a rolling window of the last 5 iterations per session and calculates the prompt-token growth ratio between consecutive iterations. If three consecutive iterations each show more than 1.4x prompt growth, the hallmark of a context-appending loop, an alert is fired via Slack webhook.
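The rule above can be sketched in a few lines. This is an assumption-laden illustration, not the production detector: the class name `GrowthDetector` and its parameters are hypothetical, and alert delivery (the Slack webhook) is left out.

```python
from collections import deque


class GrowthDetector:
    """Flags a session whose prompt size keeps growing by more than `ratio`."""

    def __init__(self, window=5, ratio=1.4, streak=3):
        self.prompts = deque(maxlen=window)  # last `window` prompt sizes
        self.ratio = ratio
        self.streak = streak

    def observe(self, prompt_tokens):
        """Record one iteration; return True if the loop signature is present."""
        self.prompts.append(prompt_tokens)
        values = list(self.prompts)
        run = 0  # length of the current trailing streak of fast growth
        for prev, cur in zip(values, values[1:]):
            if prev > 0 and cur / prev > self.ratio:
                run += 1
            else:
                run = 0
        return run >= self.streak
```

Three growth ratios require four observations, so with these defaults the earliest possible detection is on the fourth iteration of a session.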

3. Fail-Open Resilience

If AgentWatch infrastructure experiences downtime, budget checks silently fail open, so your production traffic is never interrupted. This is a core design principle: AgentWatch downtime never causes customer outages.
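A fail-open check can be sketched as a thin wrapper around the budget lookup. The function name `preflight_allows` and the `check_budget` callable are hypothetical; the sketch assumes that infrastructure failures surface as `OSError` subclasses such as `ConnectionError` or `TimeoutError`.

```python
def preflight_allows(check_budget):
    """Return True if the request may proceed.

    `check_budget` is a zero-argument callable that returns True (within
    budget) or False (over budget), and raises an OSError subclass when the
    budget service cannot be reached.
    """
    try:
        return bool(check_budget())
    except OSError:
        # Fail open: an unreachable budget service must not block traffic.
        return True
```

The trade-off is deliberate: during an outage, spend is temporarily unenforced, but application requests keep flowing.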

Key Features

Feature	Description
Session Tracking	Global state tracked across your entire agent network via sub-1ms Cloudflare KV edge storage
Budget Enforcement	Synchronous pre-call budget ceiling check. Drops requests instantly if limits are exceeded
Anomaly Detection	Flags three consecutive iterations of more than 1.4x prompt growth, the signature of a stuck loop, as early as iteration 4
Fail-Open	AgentWatch downtime never causes customer outages
5 Providers	OpenAI, Anthropic, Groq, xAI, Gemini — all supported
SOC 2 CC6.1	Compliance telemetry reports with audit-ready summaries
Team Budgets	Monthly USD caps per team with hard-stop enforcement

Supported Providers

Provider	Status
OpenAI	Supported
Anthropic	Supported
Groq	Supported
xAI (Grok)	Supported
Gemini	Supported

Architecture

AgentWatch runs on Cloudflare's global edge infrastructure:

┌─────────────────┐     ┌───────────────────┐     ┌───────────────┐
│   Python SDK    │────▶│  Cloudflare Edge  │────▶│ LLM Provider  │
│ (WatchedOpenAI) │     │   (AgentWatch)    │     │ (OpenAI,      │
└─────────────────┘     └─────────┬─────────┘     │  Anthropic)   │
                                  │               └───────────────┘
                                  │ KV (session state)
                                  │ Queue (telemetry buffer)
                                  ▼
                        ┌───────────────────┐
                        │     Supabase      │
                        │  (Postgres logs)  │
                        └───────────────────┘
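The data flow in the diagram can be approximated with in-memory stand-ins. Everything here is hypothetical and only mirrors the shape of the pipeline: a dict plays the role of the KV session store, `queue.Queue` plays the telemetry buffer, and a list plays the Postgres log table. The names `handle_request` and `drain_telemetry` are illustrative.

```python
import queue

kv = {}                    # stand-in for edge KV: session_id -> spent USD
telemetry = queue.Queue()  # stand-in for the telemetry buffer
log_rows = []              # stand-in for the Postgres log table


def handle_request(session_id, cost_usd, forward):
    """Forward one call upstream, then update state on the hot path."""
    response = forward()  # proxy the request to the LLM provider
    kv[session_id] = kv.get(session_id, 0.0) + cost_usd
    telemetry.put({"session": session_id, "cost_usd": cost_usd})
    return response


def drain_telemetry():
    """Move buffered events into the log store, off the request path."""
    moved = 0
    while True:
        try:
            event = telemetry.get_nowait()
        except queue.Empty:
            break
        log_rows.append(event)
        moved += 1
    return moved
```

Keeping the log write behind a buffer means the request path only touches the fast session store, while durable logging happens asynchronously.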

Next Steps

Quickstart — Get running in under 2 minutes
Architecture — Understand the system design
Session Budgets — Configure budget enforcement
Python SDK — Full SDK reference
