Architecture

AgentWatch is built on Cloudflare's global edge infrastructure so that budget enforcement runs close to your application and adds little latency to production traffic. Typical figures are listed under Performance Characteristics.

System Overview

┌─────────────────────────────────────────────────────────────┐
│                        Your Application                      │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  WatchedOpenAI / wrap()                             │    │
│  │  ├── Budget check (pre-flight)                      │    │
│  │  ├── Telemetry logging (async)                      │    │
│  │  └── PII detection (in-memory)                      │    │
│  └─────────────────────────────────────────────────────┘    │
└───────────────────────────┬─────────────────────────────────┘
                            │ HTTPS
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                  Cloudflare Edge Network                     │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  AgentWatch Worker                                   │    │
│  │  ├── Authentication (Bearer token)                   │    │
│  │  ├── Rate limiting (per-tenant)                      │    │
│  │  ├── Budget check (KV lookup)                        │    │
│  │  ├── Rule evaluation (custom anomaly rules)          │    │
│  │  ├── Proxy routing (5 providers)                     │    │
│  │  ├── Failover (automatic provider switching)         │    │
│  │  ├── Stream budget enforcement (Durable Object)      │    │
│  │  └── Telemetry ingestion → Queue → Supabase         │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │  Cloudflare   │  │  Durable     │  │  Cloudflare  │      │
│  │  KV           │  │  Objects     │  │  Queues      │      │
│  │  (session     │  │  (atomic     │  │  (telemetry  │      │
│  │   state)      │  │   counters)  │  │   buffer)    │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼


┌─────────────────────────────────────────────────────────────┐
│                     Supabase (Postgres)                       │
│  ├── llm_request_logs (telemetry)                            │
│  ├── auth_events (SSO audit trail)                           │
│  ├── api_access_log (API access audit)                       │
│  ├── model_pricing (cost estimation)                         │
│  └── report_dispatches (idempotency)                         │
└─────────────────────────────────────────────────────────────┘

Request Flow

1. SDK Call Initiated

The SDK (Python or TypeScript) intercepts chat.completions.create() and performs:

  1. Budget check — HTTP GET to /v1/budget-check with session ID and limit
  2. PII scanning — In-memory scan of prompt text for sensitive data
  3. Upstream call — Forward to AgentWatch proxy (or direct to provider)
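The three steps above can be sketched as a wrapper around the provider call. This is a minimal illustration and not the SDK's actual code: `guarded_completion`, `check_budget`, `scan_pii`, and `call_upstream` are hypothetical names, and the email regex stands in for whatever detectors the SDK really uses. What the SDK does with PII findings is not stated here, so the sketch only reports them.

```python
import re
from typing import Callable

# Hypothetical detector: the real SDK's PII patterns are not documented here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_pii(prompt: str) -> list[str]:
    """Return the PII-like substrings found in the prompt (in memory only)."""
    return EMAIL_RE.findall(prompt)


def guarded_completion(
    prompt: str,
    check_budget: Callable[[], bool],
    call_upstream: Callable[[str], str],
) -> dict:
    """Run the pre-flight steps in order, then make the upstream call."""
    # Step 1: budget check. A False result means the session is over its limit.
    if not check_budget():
        return {"status": "blocked", "reason": "budget_exceeded", "pii": []}

    # Step 2: PII scan of the prompt text.
    findings = scan_pii(prompt)

    # Step 3: upstream call (the proxy or the provider, depending on mode).
    return {"status": "ok", "response": call_upstream(prompt), "pii": findings}
```

Passing the budget check and the upstream call in as callables keeps the ordering logic testable without any network access.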

2. Edge Proxy Processing

The Worker receives the request and executes:

  1. Authentication — Validate Bearer token (KV lookup + static map fallback)
  2. Rate limiting — Per-tenant rate limit check
  3. Rule evaluation — Custom anomaly rules from tenant configuration
  4. Proxy routing — Route to the correct upstream provider
  5. Failover — On 403/429/5xx, automatically switch to fallback provider
  6. Response forwarding — Stream response back to SDK
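The failover step can be illustrated with a short sketch. It assumes providers are tried in a fixed order and that a response counts as failed only on the statuses named above (403, 429, 5xx). The Worker itself is not written in Python, and `route_with_failover` and `send` are hypothetical names used for illustration.

```python
from typing import Callable, Sequence


def should_fail_over(status: int) -> bool:
    """Failover triggers named in the text: 403, 429, and any 5xx."""
    return status in (403, 429) or 500 <= status <= 599


def route_with_failover(
    providers: Sequence[str],
    send: Callable[[str], tuple[int, str]],
) -> tuple[str, int, str]:
    """Try providers in order; return (provider, status, body) of the first usable response."""
    last = None
    for name in providers:
        status, body = send(name)
        last = (name, status, body)
        if not should_fail_over(status):
            return last
    if last is None:
        raise ValueError("no providers configured")
    # Every provider failed: surface the final error to the caller.
    return last
```

Other 4xx responses, such as 400, are returned as-is in this sketch, since retrying a malformed request against another provider would not help.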

3. Asynchronous Telemetry

After the response is returned:

  1. Queue dispatch — Log record sent to Cloudflare Queue
  2. KV update — Session token count incremented
  3. Anomaly check — Rolling window growth ratio analysis
  4. Supabase write — Queue consumer writes to llm_request_logs
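One plausible reading of "rolling window growth ratio" is to compare token usage in the most recent window of requests against the window before it. The sketch below takes that reading; the window size, the threshold, and the `GrowthRatioDetector` class are assumptions for illustration, not AgentWatch's actual parameters or code.

```python
from collections import deque


class GrowthRatioDetector:
    """Flags a session when recent token usage grows much faster than before."""

    def __init__(self, window: int = 3, threshold: float = 2.0) -> None:
        self.window = window
        self.threshold = threshold
        # Keep two consecutive windows: the previous one and the most recent one.
        self.samples = deque(maxlen=2 * window)

    def observe(self, tokens: int) -> bool:
        """Record one request's token count; return True if it looks anomalous."""
        self.samples.append(tokens)
        if len(self.samples) < 2 * self.window:
            return False  # not enough history yet
        values = list(self.samples)
        previous = sum(values[: self.window])
        recent = sum(values[self.window:])
        if previous == 0:
            return recent > 0
        return recent / previous >= self.threshold
```

A ratio-based check like this reacts to sudden acceleration (for example, an agent stuck in a loop) rather than to absolute spend, which the budget limit already covers.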

Key Design Decisions

Fail-Open by Default

AgentWatch is designed so that its own failures do not take down your application. If the edge proxy is unreachable:

  • SDK-only mode: Budget check fails silently, API call proceeds to provider
  • Proxy mode: SDK falls back to direct provider URL automatically
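Both fail-open behaviours can be sketched in a few lines. The function names and the `proxy_reachable` probe are hypothetical, and the sketch assumes that any exception from the budget service should be treated as "allowed".

```python
from typing import Callable


def budget_allows(check: Callable[[], bool]) -> bool:
    """SDK-only mode: any error reaching the budget service means 'allowed'."""
    try:
        return check()
    except Exception:
        # Assumption: errors are swallowed here; a real SDK would likely log them.
        return True


def pick_base_url(
    proxy_url: str,
    provider_url: str,
    proxy_reachable: Callable[[str], bool],
) -> str:
    """Proxy mode: use the proxy when it answers, otherwise go direct to the provider."""
    try:
        return proxy_url if proxy_reachable(proxy_url) else provider_url
    except Exception:
        return provider_url
```

The trade-off is deliberate: while the edge is unreachable, budgets are not enforced, but the application keeps working.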

Tenant Isolation

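All KV keys are namespaced by tenant ID: t:{tenantId}:s:{sessionId}:*. Provided the tenant ID in each key comes from the authenticated token rather than from request input, one tenant's requests cannot address another tenant's keys through the KV layer.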
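A key builder for this scheme might look like the sketch below. The `session_key` helper and the `tokens` field name are hypothetical. Rejecting `:` inside components is an added assumption, shown because an ID containing the separator could otherwise spill into another tenant's prefix.

```python
def session_key(tenant_id: str, session_id: str, field: str) -> str:
    """Build a KV key of the form t:{tenantId}:s:{sessionId}:{field}."""
    for part in (tenant_id, session_id, field):
        # Empty parts or embedded separators would make the namespace ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"t:{tenant_id}:s:{session_id}:{field}"


# Example: the token counter for one session of tenant "acme".
key = session_key("acme", "sess-1", "tokens")
```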

Timing-Safe Authentication

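Token comparison uses crypto.subtle.timingSafeEqual (a Cloudflare Workers extension to the Web Crypto API) to prevent timing attacks. Because the comparison runs in constant time, the time it takes does not depend on how many leading bytes of a presented token are correct, so response timing does not leak information about valid tokens.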
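The same idea can be shown with Python's standard library, where `hmac.compare_digest` plays the role that `crypto.subtle.timingSafeEqual` plays in the Worker. Hashing both values first is an assumption of this sketch, not a documented AgentWatch detail. It gives the comparison equal-length inputs, so token length is not leaked either.

```python
import hashlib
import hmac


def tokens_match(presented: str, expected: str) -> bool:
    """Compare two tokens without leaking where they first differ."""
    # Hash both sides so the compared buffers always have equal length.
    a = hashlib.sha256(presented.encode("utf-8")).digest()
    b = hashlib.sha256(expected.encode("utf-8")).digest()
    return hmac.compare_digest(a, b)
```

A plain `==` on strings can return as soon as it finds a mismatch, which is the timing signal a constant-time comparison avoids.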

Async Telemetry Pipeline

Payload logging and risk scanning are offloaded to a background thread (SDK) or ctx.waitUntil() (Worker). The client receives the provider's response without waiting for telemetry to be written, so logging adds no latency to the request path.
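On the SDK side, the background-thread pattern could look roughly like this. `TelemetryWorker` and its `sink` callable are hypothetical names; the real SDK's queueing, batching, and retry behaviour are not described here.

```python
import queue
import threading
from typing import Callable


class TelemetryWorker:
    """Hands log records to a daemon thread so callers never wait on the sink."""

    def __init__(self, sink: Callable[[dict], None]) -> None:
        self._queue = queue.Queue()
        self._sink = sink
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, record: dict) -> None:
        """Enqueue and return immediately."""
        self._queue.put(record)

    def _run(self) -> None:
        while True:
            record = self._queue.get()
            try:
                self._sink(record)
            except Exception:
                pass  # telemetry failures must not affect the request path
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until everything queued so far has been processed."""
        self._queue.join()
```

`flush()` exists for shutdown and tests. On the request path only `log()` is called, and it returns as soon as the record is queued.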

Performance Characteristics

Operation                     Latency
Budget check (KV read)        5-50ms (network round-trip)
PII detection (100KB)         <50ms
Rule evaluation (50 rules)    <10ms
Telemetry dispatch            Non-blocking (async)
Total proxy overhead          <100ms (excluding upstream)