Skip to content

Setting up a secure LLM proxy on Azure

Notes on architecting a traced, audited LLM proxy for enterprise API consumption.

Why a proxy layer

Routing every team’s LLM calls through a central proxy gives you one place to:

  • Authenticate and rate-limit per team or user
  • Trace and audit token consumption
  • Swap or load-balance across model providers without touching client code

Core pieces

  • LiteLLM Proxy as the routing layer — presents an OpenAI-compatible API in front of multiple providers
  • Azure API Management in front of the proxy for auth, quotas, and network policy
  • Application Insights for tracing latency and error rates per route

Latency wins

Most of the win came from connection reuse and provider-side keep-alives, not from removing hops — a proxy adds a hop, but a well-configured one adds single-digit milliseconds while giving you full observability.