Architecture
AgentWatch is built on Cloudflare's global edge infrastructure to deliver low-latency budget enforcement while keeping telemetry off the critical path of your production traffic.
System Overview
┌────────────────────────────────────────────────────────────┐
│ Your Application                                           │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ WatchedOpenAI / wrap()                               │  │
│  │ ├── Budget check (pre-flight)                        │  │
│  │ ├── Telemetry logging (async)                        │  │
│  │ └── PII detection (in-memory)                        │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────┬──────────────────────────────┘
                              │ HTTPS
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Cloudflare Edge Network                                    │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ AgentWatch Worker                                    │  │
│  │ ├── Authentication (Bearer token)                    │  │
│  │ ├── Rate limiting (per-tenant)                       │  │
│  │ ├── Budget check (KV lookup)                         │  │
│  │ ├── Rule evaluation (custom anomaly rules)           │  │
│  │ ├── Proxy routing (5 providers)                      │  │
│  │ ├── Failover (automatic provider switching)          │  │
│  │ ├── Stream budget enforcement (Durable Object)       │  │
│  │ └── Telemetry ingestion → Queue → Supabase           │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   │
│   │ Cloudflare   │   │ Durable      │   │ Cloudflare   │   │
│   │ KV           │   │ Objects      │   │ Queues       │   │
│   │ (session     │   │ (atomic      │   │ (telemetry   │   │
│   │ state)       │   │ counters)    │   │ buffer)      │   │
│   └──────────────┘   └──────────────┘   └──────────────┘   │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Supabase (Postgres)                                        │
│ ├── llm_request_logs (telemetry)                           │
│ ├── auth_events (SSO audit trail)                          │
│ ├── api_access_log (API access audit)                      │
│ ├── model_pricing (cost estimation)                        │
│ └── report_dispatches (idempotency)                        │
└────────────────────────────────────────────────────────────┘
Request Flow
1. SDK Call Initiated
The SDK (Python or TypeScript) intercepts chat.completions.create() and performs:
- Budget check: an HTTP GET to /v1/budget-check with the session ID and limit
- PII scanning: an in-memory scan of the prompt text for sensitive data
- Upstream call: the request is forwarded to the AgentWatch proxy (or directly to the provider)
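The pre-flight budget check can be sketched in TypeScript as below. This is a minimal illustration, not the SDK's actual code: the query parameter names session_id and limit, the Bearer header, and the { allowed: boolean } response shape are assumptions, and buildBudgetCheckUrl and preflightBudgetCheck are hypothetical names. It shows the fail-open rule: anything other than an explicit denial lets the call proceed.

```typescript
// Minimal sketch of a fail-open pre-flight budget check.
// Assumptions (not from the AgentWatch docs): the query parameter names
// session_id and limit, the Bearer header, and a JSON body shaped like
// { allowed: boolean }. All names here are hypothetical.

// Only the subset of fetch that this sketch uses.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

function buildBudgetCheckUrl(baseUrl: string, sessionId: string, limit: number): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/budget-check?session_id=${encodeURIComponent(sessionId)}&limit=${limit}`;
}

// Resolves to false only when the service explicitly denies the call.
async function preflightBudgetCheck(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string,
  sessionId: string,
  limit: number,
): Promise<boolean> {
  try {
    const res = await fetchImpl(buildBudgetCheckUrl(baseUrl, sessionId, limit), {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) return true; // fail open on any non-2xx status
    const body = (await res.json()) as { allowed?: unknown } | null;
    return body?.allowed !== false;
  } catch {
    return true; // fail open: proxy unreachable, DNS failure, invalid JSON, ...
  }
}
```

In a real SDK, fetchImpl would simply be the global fetch; injecting it keeps the sketch testable without a live endpoint.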
2. Edge Proxy Processing
The Worker receives the request and executes:
- Authentication: validate the Bearer token (KV lookup, falling back to a static map)
- Rate limiting: per-tenant rate-limit check
- Rule evaluation: apply the custom anomaly rules from the tenant's configuration
- Proxy routing: route the request to the correct upstream provider
- Failover: on a 403, 429, or 5xx response, automatically switch to a fallback provider
- Response forwarding: stream the response back to the SDK
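A rough sketch of the failover step, assuming the Worker tries an ordered list of providers and moves on only for the statuses listed above. The provider names, the UpstreamResult shape, and the send callback are hypothetical; a real implementation would also need to decide how to treat network errors, which this sketch does not catch.

```typescript
// Sketch of status-based failover. Assumptions: providers are tried in a
// fixed priority order, and `send` (hypothetical) performs one upstream call.

interface UpstreamResult {
  status: number;
  body: string;
}

type SendFn = (provider: string) => Promise<UpstreamResult>;

// 403, 429, and any 5xx trigger a switch to the next provider.
function shouldFailover(status: number): boolean {
  return status === 403 || status === 429 || (status >= 500 && status <= 599);
}

async function routeWithFailover(
  providers: string[],
  send: SendFn,
): Promise<{ provider: string; result: UpstreamResult }> {
  if (providers.length === 0) throw new Error("no providers configured");
  let last: { provider: string; result: UpstreamResult } | undefined;
  for (const provider of providers) {
    const result = await send(provider);
    last = { provider, result };
    // Success or a non-failover error (e.g. 400) is returned immediately.
    if (!shouldFailover(result.status)) return last;
  }
  // Every provider triggered failover: surface the last upstream response.
  return last!;
}
```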
3. Asynchronous Telemetry
After the response is returned:
- Queue dispatch: the log record is sent to a Cloudflare Queue
- KV update: the session's token count is incremented
- Anomaly check: a rolling-window growth-ratio analysis
- Supabase write: a Queue consumer writes the record to llm_request_logs
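The anomaly check is described only as a rolling-window growth-ratio analysis. One plausible reading, sketched here under stated assumptions (per-request token counts are kept in order; the ratio is tokens in the latest window divided by tokens in the preceding window; window size and threshold are tenant settings), is the following. growthRatio and isAnomalous are hypothetical names.

```typescript
// Hypothetical rolling-window growth-ratio check. Assumptions: tokenCounts
// holds per-request token totals in arrival order, and the growth ratio is
// (sum of the latest window) / (sum of the window just before it).

function growthRatio(tokenCounts: number[], windowSize: number): number | null {
  // Need two full windows of history before a ratio is meaningful.
  if (windowSize <= 0 || tokenCounts.length < 2 * windowSize) return null;
  const n = tokenCounts.length;
  const sum = (xs: number[]) => xs.reduce((acc, x) => acc + x, 0);
  const current = sum(tokenCounts.slice(n - windowSize));
  const previous = sum(tokenCounts.slice(n - 2 * windowSize, n - windowSize));
  // Growth from zero is unbounded; zero-to-zero counts as flat.
  if (previous === 0) return current > 0 ? Infinity : 1;
  return current / previous;
}

function isAnomalous(tokenCounts: number[], windowSize: number, threshold: number): boolean {
  const ratio = growthRatio(tokenCounts, windowSize);
  return ratio !== null && ratio >= threshold;
}
```

With a window of 3 and a threshold of 2.5, a session whose per-request usage jumps from 100 to 300 tokens has a ratio of 900 / 300 = 3 and is flagged.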
Key Design Decisions
Fail-Open by Default
AgentWatch is designed never to cause customer outages. If the edge proxy is unreachable:
- SDK-only mode: the budget check fails silently and the API call proceeds to the provider
- Proxy mode: the SDK automatically falls back to the provider's direct URL
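Proxy-mode fail-open could look roughly like this sketch, assuming the SDK is configured with both the AgentWatch proxy URL and the provider's direct URL. The post callback and postFailOpen are hypothetical. In this sketch only transport failures trigger the fallback, matching the "unreachable" condition above; an HTTP error that comes back through the proxy is returned unchanged.

```typescript
// Sketch of proxy-mode fail-open. Assumption: `post` (hypothetical) rejects
// only on transport failures and resolves with any HTTP status it receives.

type PostFn = (url: string, body: string) => Promise<{ status: number; text: string }>;

async function postFailOpen(
  post: PostFn,
  proxyUrl: string,
  directUrl: string,
  body: string,
): Promise<{ via: "proxy" | "direct"; status: number; text: string }> {
  try {
    // Any HTTP status means the proxy was reachable: return it as-is.
    const res = await post(proxyUrl, body);
    return { via: "proxy" as const, ...res };
  } catch {
    // Transport failure (DNS error, connection refused, timeout): go direct.
    const res = await post(directUrl, body);
    return { via: "direct" as const, ...res };
  }
}
```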
Tenant Isolation
All KV keys are namespaced by tenant ID: t:{tenantId}:s:{sessionId}:*. Cross-tenant data access is impossible through the KV layer.
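A sketch of building keys in the documented t:{tenantId}:s:{sessionId}:* layout. The field name ("tokens") and the rule that IDs must be non-empty and free of ":" are assumptions, and the helper names are hypothetical. Rejecting the delimiter keeps every key unambiguous, so an ID containing ":" cannot produce a key that falls inside another tenant's prefix.

```typescript
// Sketch of tenant-scoped KV key construction. Assumption (not from the docs):
// IDs must be non-empty and must not contain the ":" delimiter.

function assertSafeId(kind: string, id: string): void {
  if (id.length === 0 || id.includes(":")) {
    throw new Error(`invalid ${kind}: ${JSON.stringify(id)}`);
  }
}

// Key for one field of one session, e.g. t:acme:s:sess-42:tokens
function sessionKey(tenantId: string, sessionId: string, field: string): string {
  assertSafeId("tenantId", tenantId);
  assertSafeId("sessionId", sessionId);
  return `t:${tenantId}:s:${sessionId}:${field}`;
}

// Prefix under which every key of a single tenant lives.
function tenantPrefix(tenantId: string): string {
  assertSafeId("tenantId", tenantId);
  return `t:${tenantId}:`;
}
```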
Timing-Safe Authentication
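Token comparison uses crypto.subtle.timingSafeEqual to prevent timing attacks. Because the comparison runs in constant time, its duration does not depend on how many leading bytes of a submitted token match the real one, so response timing cannot be used to recover a token byte by byte.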
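On Workers, crypto.subtle.timingSafeEqual compares two buffers of equal byte length, so callers encode both tokens to bytes first. To show the underlying idea in a form that runs outside the Workers runtime, here is a portable stand-in that XOR-accumulates every character instead of returning at the first mismatch. It is an illustration only: JavaScript engines give no hard timing guarantees, so the built-in should be preferred where available. The authenticate helper and its header parsing are hypothetical.

```typescript
// Portable illustration of constant-time comparison (XOR-accumulate), standing
// in for crypto.subtle.timingSafeEqual outside Workers. It always scans the
// full length of the longer input rather than stopping at the first mismatch.

function constantTimeEqual(a: string, b: string): boolean {
  const len = Math.max(a.length, b.length);
  let diff = a.length ^ b.length; // non-zero whenever the lengths differ
  for (let i = 0; i < len; i++) {
    const ca = a.charCodeAt(i) | 0; // charCodeAt returns NaN past the end; | 0 maps it to 0
    const cb = b.charCodeAt(i) | 0;
    diff |= ca ^ cb;
  }
  return diff === 0;
}

// Hypothetical helper: checks an "Authorization: Bearer <token>" header value.
function authenticate(authorizationHeader: string | null, expectedToken: string): boolean {
  const prefix = "Bearer ";
  if (authorizationHeader === null || !authorizationHeader.startsWith(prefix)) return false;
  return constantTimeEqual(authorizationHeader.slice(prefix.length), expectedToken);
}
```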
Async Telemetry Pipeline
Payload logging and risk scanning are offloaded to a background thread (SDK) or to ctx.waitUntil() (Worker), so the client receives the provider's response as soon as it arrives; telemetry adds no latency to the response path.
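The Worker side of this pattern can be sketched as follows. WaitUntilContext declares only the waitUntil method of the Workers ExecutionContext so the sketch runs anywhere; sendTelemetry stands in for the Queue dispatch and, like respondThenLog and TelemetryRecord, is hypothetical. The response is returned without awaiting telemetry, and a failed telemetry send is swallowed so it cannot affect the client.

```typescript
// Sketch of deferring telemetry with waitUntil. WaitUntilContext mirrors only
// the waitUntil method of the Workers ExecutionContext; the record shape and
// sendTelemetry callback are hypothetical.

interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface TelemetryRecord {
  tenantId: string;
  sessionId: string;
  totalTokens: number;
}

function respondThenLog<T>(
  upstream: T,
  record: TelemetryRecord,
  ctx: WaitUntilContext,
  sendTelemetry: (r: TelemetryRecord) => Promise<void>,
): T {
  // Register the background task; swallow its errors so they never reach the client.
  ctx.waitUntil(sendTelemetry(record).catch(() => undefined));
  return upstream; // returned immediately, without awaiting telemetry
}
```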
Performance Characteristics
| Operation | Latency |
|---|---|
| Budget check (KV read) | 5-50ms (network round-trip) |
| PII detection (100KB) | <50ms |
| Rule evaluation (50 rules) | <10ms |
| Telemetry dispatch | Non-blocking (async) |
| Total proxy overhead | <100ms (excluding upstream) |