Introduction
LLM Gateway is an intelligent routing layer that sits between your application and multiple LLM providers. It automatically selects the best model for each request based on cost, latency, reliability, and your preferences.
Cost Optimization
Save 30-50% on LLM costs with intelligent model selection
Automatic Failover
Minimize downtime with multi-provider fallbacks
Full Observability
Execution receipts for every request with routing decisions
Drop-in Compatible
Works with OpenAI SDK, LangChain, and any OpenAI-compatible client
Quick Start
Get up and running in under 5 minutes. Here's everything you need.
1. Get your API key
Sign up and create an API key from the dashboard.
2. Make your first request
Use cURL or any HTTP client to make a request:
curl https://api.llmroute.xyz/v1/chat/completions \
-H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'
3. Or use the OpenAI SDK
Just change the base URL—everything else stays the same:
from openai import OpenAI
client = OpenAI(
api_key="your-llm-gateway-api-key",
base_url="https://api.llmroute.xyz/v1"
)
response = client.chat.completions.create(
model="auto", # Let the gateway choose the best model
messages=[
{"role": "user", "content": "Hello, world!"}
]
)
print(response.choices[0].message.content)
Or in TypeScript:
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'your-llm-gateway-api-key',
baseURL: 'https://api.llmroute.xyz/v1',
});
const response = await client.chat.completions.create({
model: 'auto',
messages: [
{ role: 'user', content: 'Hello, world!' }
],
});
console.log(response.choices[0].message.content);
Authentication
All API requests require authentication using a Bearer token.
Authorization: Bearer llm_gw_xxxxxxxxxxxxxxxxxxxx
API Key Types
| Type | Prefix | Use Case |
|---|---|---|
| Production | llm_gw_prod_ | Live production traffic |
| Development | llm_gw_dev_ | Testing and development |
Security Best Practices
- Never expose API keys in client-side code
- Use environment variables to store keys
- Rotate keys periodically
- Use different keys for development and production
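For instance, a key can be read from the environment instead of being hard-coded (the variable name `LLM_GATEWAY_API_KEY` matches the cURL example above; the fallback value here is a placeholder for local development only):

```python
import os

# Prefer the environment over hard-coded literals; the fallback is for local dev only.
api_key = os.environ.get("LLM_GATEWAY_API_KEY", "llm_gw_dev_placeholder")
```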
Auto-Routing
Auto-routing is the core feature of LLM Gateway. When you set model: "auto", the gateway evaluates multiple factors to select the optimal model.
How It Works
Request Analysis
We analyze your request including prompt complexity, expected output length, and any constraints you specify.
Provider Evaluation
We check real-time availability, latency, and cost across all enabled providers.
Model Selection
Using your preferences and our optimization algorithms, we select the best model.
Execution & Fallback
We execute the request with automatic fallback if the primary choice fails.
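The gateway's actual selection algorithm is not public, but conceptually the steps above amount to scoring each candidate against your constraints and picking the best. A purely illustrative sketch:

```python
# Illustrative only: the gateway's real algorithm is not documented.
# Score each candidate on cost, latency, and quality, then pick the best.
candidates = [
    {"model": "openai/gpt-4o",            "cost": 0.005,  "latency_ms": 900, "quality": 0.95},
    {"model": "anthropic/claude-3-haiku", "cost": 0.0004, "latency_ms": 400, "quality": 0.85},
    {"model": "mistral/mistral-small",    "cost": 0.0002, "latency_ms": 350, "quality": 0.78},
]

def score(c, priority="cost", min_quality=0.8):
    if c["quality"] < min_quality:
        return float("-inf")          # fails the quality constraint outright
    if priority == "cost":
        return -c["cost"]             # cheaper is better
    if priority == "latency":
        return -c["latency_ms"]       # faster is better
    return c["quality"]               # otherwise optimize for quality

best = max(candidates, key=lambda c: score(c, priority="cost"))
# -> claude-3-haiku: cheapest model that still meets the quality floor
```

The numbers attached to each candidate are made up for the example; in practice they would come from the real-time provider evaluation step.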
Routing Hints
You can provide hints to influence routing decisions:
{
"model": "auto",
"messages": [...],
"x-routing-hints": {
"priority": "cost", // "cost" | "latency" | "quality"
"max_latency_ms": 2000, // Maximum acceptable latency
"prefer_providers": ["anthropic", "openai"],
"exclude_providers": ["cohere"],
"min_quality_score": 0.8
}
}
Providers & Models
LLM Gateway supports all major LLM providers. You can use auto-routing or specify a model directly.
Supported Providers
| Provider | Models | Features |
|---|---|---|
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo | Functions, Vision, JSON mode |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku | Long context, Tool use |
| Google | Gemini Pro, Gemini Ultra | Multimodal, Long context |
| Mistral | Mistral Large, Mistral Medium, Mistral Small | Fast inference, Cost-effective |
| Cohere | Command R+, Command R | RAG optimized, Multilingual |
Direct Model Access
You can also request a specific model directly:
{
"model": "openai/gpt-4o",
"messages": [...]
}
// Or with provider prefix
{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [...]
}
Fallbacks & Reliability
LLM Gateway automatically handles failures with intelligent fallback strategies.
Automatic Fallback
When a provider fails, we automatically retry with the next best option.
Circuit Breaker
We implement circuit breakers to prevent cascading failures:
- Closed: Normal operation, requests pass through
- Open: Provider temporarily disabled after repeated failures
- Half-Open: Testing provider recovery with limited traffic
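The three states above can be sketched in a few lines. This is a minimal illustration of the pattern, not the gateway's internal implementation; the threshold and timeout values are arbitrary:

```python
import time

class CircuitBreaker:
    """Minimal sketch of the closed -> open -> half-open cycle."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None
        self.state = "closed"

    def allow_request(self):
        if self.state == "open":
            # After the timeout, allow limited traffic to probe recovery.
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"
                return True
            return False
        return True  # closed or half_open: requests pass through

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()

breaker = CircuitBreaker(failure_threshold=3)
for _ in range(3):
    breaker.record_failure()
# After repeated failures the provider is temporarily disabled (state == "open").
```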
Custom Fallback Chain
{
"model": "auto",
"messages": [...],
"x-fallback-chain": [
"anthropic/claude-3-5-sonnet",
"openai/gpt-4o",
"mistral/mistral-large"
]
}
Cost Optimization
LLM Gateway helps you save 30-50% on LLM costs through intelligent routing and real-time cost analysis.
How We Optimize Costs
Smart Model Selection
We match request complexity to the most cost-effective model that meets quality requirements.
Real-time Pricing
We track pricing changes across providers and adjust routing instantly.
Spend Limits
Set daily, weekly, or monthly spend limits to prevent unexpected charges.
Cost Alerts
Get notified when spending exceeds thresholds or anomalies are detected.
Setting Spend Limits
{
"limits": {
"daily_usd": 100,
"monthly_usd": 2500,
"per_request_usd": 0.50
},
"alerts": {
"threshold_percent": 80,
"webhook_url": "https://your-app.com/webhook/spend-alert"
}
}
Semantic Cache
Semantic caching reduces costs by up to 90% for repeated or similar queries. Unlike exact-match caching, semantic cache understands the meaning of your prompts and returns cached responses for semantically similar requests.
💰 Real Savings Example
A customer support chatbot handling 10,000 queries/day saw 73% cache hits, reducing their monthly LLM costs from $3,200 to $864—saving over $2,300/month.
How It Works
Embedding Generation
Your prompt is converted to a vector embedding that captures its semantic meaning.
Similarity Search
We search for cached prompts with similar embeddings using vector similarity (cosine distance).
Threshold Check
If similarity exceeds your threshold (default 95%), the cached response is returned instantly.
Cache Miss
If no match is found, the request goes to the LLM and the response is cached for future use.
Example: Similar Queries Hit Cache
Query 1: "What is the capital of France?"
→ LLM call, response cached
Query 2: "What's France's capital city?"
→ Cache hit! 98% similarity, instant response
Query 3: "Tell me the capital of France"
→ Cache hit! 96% similarity, instant response
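The lookup behind these examples can be sketched as a cosine-similarity search over cached embeddings. The vectors below are toy stand-ins for real embeddings, and the threshold matches the default above:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def lookup(cache, query_embedding, threshold=0.95):
    """Return the cached response if any stored prompt is similar enough."""
    best_entry, best_sim = None, 0.0
    for entry in cache:
        sim = cosine_similarity(entry["embedding"], query_embedding)
        if sim > best_sim:
            best_entry, best_sim = entry, sim
    if best_sim >= threshold:
        return best_entry["response"]  # cache hit: skip the LLM call
    return None                        # cache miss: call the LLM, then cache it

# Toy vectors standing in for real prompt embeddings.
cache = [{"embedding": [1.0, 0.0, 0.1], "response": "Paris"}]
hit = lookup(cache, [1.0, 0.02, 0.09])   # nearly identical direction -> hit
miss = lookup(cache, [0.0, 1.0, 0.0])    # unrelated direction -> miss
```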
Configuration
{
"cache_enabled": true,
"similarity_threshold": 0.95, // 0.90-0.99, higher = stricter matching
"ttl_seconds": 3600, // Cache expiration (1 hour default)
"exclude_models": [] // Models to never cache
}
Cache is isolated per organization—your cached responses are never shared with other customers.
Provider Connections
Connect your own API keys from LLM providers to get direct pricing and use your existing quotas. LLM Gateway manages routing, fallbacks, and observability while you maintain your provider relationships.
Supported Providers
OpenAI
GPT-4o, GPT-4 Turbo, GPT-3.5
Anthropic
Claude 3.5 Sonnet, Claude 3 Opus
Google AI
Gemini Pro, Gemini Ultra
Mistral
Mistral Large, Medium, Small
Groq
Llama 3, Mixtral (ultra-fast)
Together AI
Open source models
Cohere
Command R+, Command R
AWS Bedrock
Claude, Llama, Titan
Benefits of BYOK (Bring Your Own Keys)
- Direct pricing: Pay provider rates directly, no markup on API costs
- Use existing quotas: Leverage your negotiated rate limits and credits
- Enterprise agreements: Maintain your BAAs and compliance contracts
- Gradual migration: Test LLM Gateway without changing billing
How to Connect a Provider
- Navigate to Connections in the dashboard
- Click Add Connection and select your provider
- Enter your API key (encrypted with AES-256)
- Test the connection to verify it works
- Enable the provider for routing
Security Note
Your API keys are encrypted at rest using AES-256 and envelope encryption. Keys are only decrypted in memory when making requests to providers.
AI Recommendations
LLM Gateway analyzes your usage patterns and provides personalized recommendations to optimize costs, improve performance, and enhance reliability.
Types of Recommendations
Cost Savings
Identify cheaper models that provide equivalent quality for your use cases.
"Switch from GPT-4 to Claude 3 Haiku for simple Q&A—save 85% with similar quality."
Performance Optimization
Find faster models or providers with lower latency for your region.
"Route to Groq for latency-sensitive requests—get 5x faster responses."
Reliability Improvements
Add fallback providers or adjust retry strategies based on failure patterns.
"Add Anthropic as fallback—reduce failed requests by 94%."
Usage Insights
Understand your traffic patterns and optimize accordingly.
"Enable semantic cache—73% of your queries are semantically similar."
How Recommendations Work
- We analyze your last 30 days of usage data
- Recommendations are generated weekly or on-demand
- Each recommendation includes estimated savings/impact
- Apply recommendations with one click in the dashboard
- Track the impact after applying changes
GPU Compute
Run open-source models on dedicated GPU infrastructure for maximum control, privacy, and cost efficiency at scale.
🚀 When to Use GPU Compute
- High volume: 100K+ requests/day where per-token costs add up
- Data privacy: Keep all data on your own infrastructure
- Custom models: Run fine-tuned or specialized models
- Predictable costs: Fixed GPU pricing vs variable per-token
Available GPU Options
| GPU | VRAM | Best For | Price/hr |
|---|---|---|---|
| NVIDIA A10G | 24GB | Llama 3 8B, Mistral 7B | $1.20 |
| NVIDIA A100 40GB | 40GB | Llama 3 70B, Mixtral | $3.50 |
| NVIDIA A100 80GB | 80GB | Large models, fine-tuning | $5.00 |
| NVIDIA H100 | 80GB | Maximum performance | $8.00 |
Cost Comparison Example
Scenario: 500,000 requests/day with Llama 3 70B
API Provider
~$4,500/month
Based on ~$0.0003/1K tokens
Dedicated GPU
~$2,520/month
A100 40GB at $3.50/hr × 24 hrs × 30 days = $2,520
💰 Save $1,980/month (44%) with dedicated GPU
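The comparison can be checked with the rates from the tables in this section. Note the API figure assumes roughly 1K tokens per request, which is an assumption of this example rather than a stated average:

```python
# GPU side: A100 40GB at the hourly rate from the table above, running 24/7.
gpu_hourly = 3.50
gpu_monthly = gpu_hourly * 24 * 30        # -> ~$2,520

# API side: 500K requests/day, assuming ~1K tokens/request at ~$0.0003 per 1K tokens.
requests_per_day = 500_000
api_monthly = requests_per_day * 30 * 0.0003   # -> ~$4,500

savings = api_monthly - gpu_monthly            # -> ~$1,980
savings_pct = round(savings / api_monthly * 100)  # -> 44
```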
Deployment Options
- Managed: We handle deployment, scaling, and maintenance
- Self-hosted: Run on your own infrastructure with our container images
- Hybrid: Route between managed GPUs and API providers based on load
Chat Completions
The Chat Completions API is fully compatible with the OpenAI specification.
Request Format
{
"model": "auto",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}
Response Format
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1704067200,
"model": "anthropic/claude-3-haiku",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 8,
"total_tokens": 33
},
"x-llm-gateway": {
"request_id": "req_xyz789",
"routing_decision": "cost_optimized",
"cost_usd": 0.00012,
"latency_ms": 234,
"fallback_used": false
}
}
Streaming
Stream responses in real-time using Server-Sent Events (SSE).
from openai import OpenAI
client = OpenAI(
api_key="your-api-key",
base_url="https://api.llmroute.xyz/v1"
)
stream = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Execution Receipts
Every request generates an execution receipt with full details about the routing decision, cost, and performance.
{
"request_id": "req_abc123xyz",
"timestamp": "2024-01-15T10:30:00Z",
"routing": {
"strategy": "cost_optimized",
"candidates_evaluated": 5,
"winner": "anthropic/claude-3-haiku",
"reason": "Lowest cost meeting quality threshold"
},
"execution": {
"provider": "anthropic",
"model": "claude-3-haiku-20240307",
"latency_ms": 234,
"routing_overhead_ms": 12,
"tokens": {
"prompt": 150,
"completion": 89,
"total": 239
}
},
"cost": {
"provider_cost_usd": 0.00023,
"platform_fee_usd": 0.00002,
"total_usd": 0.00025,
"savings_vs_default_usd": 0.00089
},
"fallback": {
"used": false,
"attempts": []
}
}
Access receipts via the dashboard or the x-llm-gateway response header.
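Receipt fields can be inspected programmatically. The dictionary below is a trimmed copy of the sample payload above, not a live response:

```python
# Trimmed copy of the sample receipt shown above.
receipt = {
    "execution": {
        "latency_ms": 234,
        "routing_overhead_ms": 12,
        "tokens": {"prompt": 150, "completion": 89, "total": 239},
    },
    "cost": {"total_usd": 0.00025, "savings_vs_default_usd": 0.00089},
}

# Routing overhead as a fraction of total request latency.
overhead_pct = (receipt["execution"]["routing_overhead_ms"]
                / receipt["execution"]["latency_ms"] * 100)

# Effective price per 1K tokens for this request.
usd_per_1k_tokens = (receipt["cost"]["total_usd"]
                     / receipt["execution"]["tokens"]["total"] * 1000)
```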
Rate Limits
Rate limits protect the system and ensure fair usage across all customers.
| Plan | RPM | TPM | Daily Requests |
|---|---|---|---|
| Free | 20 | 40,000 | 1,000 |
| Starter | 100 | 200,000 | 10,000 |
| Pro | 500 | 1,000,000 | 100,000 |
| Enterprise | Custom | Custom | Unlimited |
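When a limit is exceeded, APIs conventionally respond with HTTP 429; a hedged retry sketch follows (429 and `Retry-After` handling here is the standard pattern, not documented gateway behavior, and `send` is a stand-in for your HTTP call):

```python
import time

def request_with_backoff(send, max_attempts=5):
    """Retry a callable that returns (status, headers, body), backing off on 429.

    `send` is a placeholder for your actual HTTP request function.
    """
    delay = 1.0
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return body
        # Honor Retry-After if present; otherwise back off exponentially.
        wait = float(headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError("rate limited after retries")
```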
Rate Limit Headers
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704067260
Dashboard Overview
The dashboard provides real-time visibility into your LLM usage, costs, and performance.
Analytics
Request volume, latency percentiles, and success rates
Cost Tracking
Real-time spend, savings analysis, and billing
Request Logs
Searchable logs with execution receipts
Configuration
Routing preferences, rate limits, and alerts
API Keys
Manage API keys for different environments and use cases.
Creating Keys
- Navigate to Settings → API Keys in the dashboard
- Click "Create New Key"
- Select environment (Production or Development)
- Set optional rate limits and expiration
- Copy and securely store your key
API keys are shown only once. Store them securely in your environment variables or secrets manager.
Billing & Usage
Pay only for what you use with transparent, per-request pricing.
Pricing Model
No monthly minimums. No hidden fees. Cancel anytime.
Routing Preferences
Customize how the auto-router selects models for your requests.
{
"default_priority": "cost",
"quality_threshold": 0.8,
"max_latency_ms": 3000,
"allowed_providers": ["openai", "anthropic", "mistral"],
"blocked_providers": [],
"allowed_models": [],
"blocked_models": [],
"fallback_enabled": true,
"fallback_chain": ["anthropic/claude-3-haiku", "mistral/mistral-small"]
}
Security Overview
LLM Gateway is built with security-first principles. Your data and API keys are protected by multiple layers of enterprise-grade security.
Encrypted at Rest & Transit
All data encrypted using AES-256. TLS 1.3 for all connections.
Secure API Keys
Keys are hashed with SHA-256. Raw keys shown only once on creation.
IP Allowlisting
Restrict API access to known IP addresses with CIDR support.
Full Audit Logging
Every action is logged for compliance and security review.
Infrastructure Security
- Deployed on cloud infrastructure with industry-standard security
- DDoS protection and Web Application Firewall (WAF)
- Automated security updates and patching
- Multi-region deployment for high availability
Access Controls
- Role-based access control (RBAC) for team members
- Per-API-key IP restrictions
- Automatic key expiration and rotation reminders
- Brute force protection with automatic lockout
- Rate limiting at multiple levels (org, key, IP)
Data Protection
We take a privacy-first approach to handling your data. By default, we don't store your prompts or responses.
What We Store
| Data Type | Stored | Purpose |
|---|---|---|
| Request metadata | Yes | Billing, analytics, debugging |
| Token counts | Yes | Accurate billing |
| Latency metrics | Yes | Performance monitoring |
| Routing decisions | Yes | Optimization & transparency |
| Prompt content | No* | Optional for debugging |
| Response content | No* | Optional for debugging |
* Content retention can be enabled per-organization for debugging purposes.
PII Protection
Built-in guardrails can detect and redact sensitive information before it reaches LLM providers:
- Email addresses and phone numbers
- Credit card numbers and SSNs
- API keys and tokens
- Custom patterns via regex
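Conceptually, regex-based redaction works like this. The patterns below are simplified illustrations, not the gateway's actual detectors, and production systems use far more robust matching:

```python
import re

# Hypothetical, simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text):
    """Replace each detected PII span with a labeled placeholder."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{name.upper()}]", text)
    return text

clean = redact("Contact jane@example.com, SSN 123-45-6789.")
```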
{
"pii_detection_enabled": true,
"pii_action": "redact",
"pii_types": ["email", "phone", "ssn", "credit_card", "api_key"]
}
Compliance
LLM Gateway is built with security and privacy in mind, providing tools to help you meet your compliance requirements.
Audit Logging
Comprehensive audit trail for all actions
Data Privacy
No prompt/response storage by default
Access Control
RBAC, IP restrictions, key rotation
Data Residency
Control where your data is processed:
- US region (default)
- EU region available for GDPR compliance
- Custom regions available for Enterprise plans
Audit & Reporting
Comprehensive audit logs for security and compliance:
- API key creation, rotation, and revocation
- Configuration changes and policy updates
- Access attempts and authentication events
- Export logs in JSON or CSV format
- Configurable retention (7-365 days)
Need a Security Review?
Enterprise customers can request security questionnaires, penetration test reports, and custom compliance documentation.
Contact Security Team
OpenAI SDK
Use LLM Gateway as a drop-in replacement for the OpenAI SDK.
from openai import OpenAI
client = OpenAI(
api_key="your-llm-gateway-api-key",
base_url="https://api.llmroute.xyz/v1"
)
# All OpenAI SDK methods work the same
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "Hello!"}]
)
Or in TypeScript:
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.LLM_GATEWAY_API_KEY,
baseURL: 'https://api.llmroute.xyz/v1',
});
const response = await client.chat.completions.create({
model: 'auto',
messages: [{ role: 'user', content: 'Hello!' }],
});
LangChain
Integrate LLM Gateway with LangChain for complex AI workflows.
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="auto",
openai_api_key="your-llm-gateway-api-key",
openai_api_base="https://api.llmroute.xyz/v1"
)
# Use with chains, agents, etc.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
prompt = PromptTemplate(
input_variables=["topic"],
template="Write a short poem about {topic}."
)
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(topic="the moon")
LlamaIndex
Use LLM Gateway with LlamaIndex for RAG and document Q&A.
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
Settings.llm = OpenAI(
model="auto",
api_key="your-llm-gateway-api-key",
api_base="https://api.llmroute.xyz/v1"
)
# Now use LlamaIndex as normal
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
Ready to get started?
Create your free account and start routing LLM requests in minutes.