Setting up a secure LLM proxy on Azure

Why a proxy layer

Routing every team’s LLM calls through a central proxy gives you one place to:

Authenticate and rate-limit per team or user
Trace and audit token consumption
Swap or load-balance across model providers without touching client code

Core pieces

LiteLLM Proxy as the routing layer — presents an OpenAI-compatible API in front of multiple providers
Azure API Management in front of the proxy for auth, quotas, and network policy
Application Insights for tracing latency and error rates per route

Latency wins

Most of the win came from connection reuse and provider-side keep-alives, not from removing hops — a proxy adds a hop, but a well-configured one adds single-digit milliseconds while giving you full observability.