SynapseX Proxy
The SynapseX Proxy is a lightweight gateway that adds quantum reranking to any OpenAI-compatible LLM service. Use it to enhance responses from OpenAI, FlexAI, or any compatible API.
Overview
┌─────────────────────────────────────────────────────────────┐
│                       SYNAPSEX PROXY                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Your Application                                           │
│        │                                                    │
│        ▼  POST /v1/chat/completions                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                    SynapseX Proxy                    │   │
│  │                                                      │   │
│  │  1. Receive request                                  │   │
│  │  2. Generate K requests (varied temperatures)        │   │
│  │  3. Forward to upstream LLM                          │   │
│  │  4. Collect K responses                              │   │
│  │  5. Apply quantum reranking                          │   │
│  │  6. Return best response                             │   │
│  │                                                      │   │
│  └──────────────────────────────────────────────────────┘   │
│        │                                                    │
│        ▼                                                    │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                     Upstream LLM                     │   │
│  │                                                      │   │
│  │   OpenAI │ FlexAI │ Azure OpenAI │ vLLM │ Ollama     │   │
│  │                                                      │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
When to Use the Proxy
| Scenario | Solution |
|---|---|
| Already using OpenAI/FlexAI, want better quality | Proxy ✓ |
| Need your own model + multi-tenancy | SynapseX API |
| Want quantum reranking only | Proxy ✓ |
| Need custom model training | SynapseX API |
| Stateless, edge deployment | Proxy ✓ |
Quick Start
1. Configure Upstream
Set your upstream LLM provider:
# Environment variables
export SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1"
export SYNAPSEX_PROXY_UPSTREAM_KEY="sk-openai-xxx"
export SYNAPSEX_PROXY_API_KEY="sk-synapsex-proxy-xxx"
2. Start Proxy
# Docker
docker run -p 8080:8080 \
  -e SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
  -e SYNAPSEX_PROXY_UPSTREAM_KEY="sk-openai-xxx" \
  softquantus/synapsex-proxy:latest
3. Use the Proxy
Replace your OpenAI base URL:
from openai import OpenAI

# Before: direct OpenAI
# client = OpenAI(api_key="sk-openai-xxx")

# After: through SynapseX Proxy
client = OpenAI(
    api_key="sk-synapsex-proxy-xxx",
    base_url="http://localhost:8080/v1"
)

response = client.chat.completions.create(
    model="gpt-4",  # Passed through to upstream
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4
    }
)

print(response.choices[0].message.content)
API Reference
Chat Completions
POST /v1/chat/completions
Request Body:
{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 500,
  "use_rerank": "quantum_cpu",
  "rerank_k": 4,
  "rerank_temp_spread": 0.3
}
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | Required | Upstream model ID |
| messages | array | Required | Conversation messages |
| use_rerank | string | "classic" | "off", "classic", "quantum_cpu", "quantum_gpu" |
| rerank_k | int | 4 | Number of candidates |
| rerank_temp_spread | float | 0.3 | Temperature variation range |
| stream | boolean | false | Enable streaming |
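The proxy derives each candidate's sampling temperature from the base `temperature` and `rerank_temp_spread`. The exact algorithm is internal; a minimal sketch of one plausible scheme (even spacing across the spread, clamped to the valid OpenAI range), using a hypothetical `candidate_temperatures` helper:

```python
def candidate_temperatures(base: float, k: int, spread: float) -> list[float]:
    """Spread k sampling temperatures evenly across [base - spread/2, base + spread/2],
    clamped to the valid range [0.0, 2.0]. Illustrative only."""
    if k == 1:
        return [base]
    lo = base - spread / 2
    step = spread / (k - 1)
    return [round(min(2.0, max(0.0, lo + i * step)), 2) for i in range(k)]

# With the defaults above (temperature=0.7, rerank_k=4, rerank_temp_spread=0.3):
print(candidate_temperatures(0.7, 4, 0.3))  # [0.55, 0.65, 0.75, 0.85]
```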
Response:
{
  "id": "chatcmpl-proxy-abc123",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 50,
    "total_tokens": 65
  },
  "meta": {
    "proxy": true,
    "upstream": "openai",
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "selected_index": 2,
    "quality_score": 0.94
  }
}
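The extra `meta` block is proxy-specific and is ignored by standard OpenAI clients. A small helper (hypothetical, not part of any SDK) that summarizes it from a parsed response dict:

```python
def rerank_summary(resp: dict) -> str:
    """Summarize the proxy's rerank metadata from a parsed chat-completion response."""
    meta = resp.get("meta", {})
    if not meta.get("proxy"):
        return "direct response (no proxy metadata)"
    return (f"tier={meta['rerank_tier']}, "
            f"candidate {meta['selected_index'] + 1} of {meta['candidates']}, "
            f"quality {meta['quality_score']:.2f}")

resp = {"meta": {"proxy": True, "rerank_tier": "quantum_cpu",
                 "candidates": 4, "selected_index": 2, "quality_score": 0.94}}
print(rerank_summary(resp))  # tier=quantum_cpu, candidate 3 of 4, quality 0.94
```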
Disable Reranking
For passthrough mode:
{
  "model": "gpt-4",
  "messages": [...],
  "use_rerank": "off"
}
Configuration
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| SYNAPSEX_PROXY_UPSTREAM_URL | Yes | - | Upstream API URL |
| SYNAPSEX_PROXY_UPSTREAM_KEY | Yes | - | Upstream API key |
| SYNAPSEX_PROXY_API_KEY | No | - | Proxy authentication key |
| SYNAPSEX_PROXY_PORT | No | 8080 | Server port |
| SYNAPSEX_PROXY_DEFAULT_RERANK | No | classic | Default rerank tier |
| SYNAPSEX_PROXY_DEFAULT_K | No | 4 | Default candidate count |
| SYNAPSEX_PROXY_QCOS_API_KEY | No | - | QCOS API key for quantum |
| SYNAPSEX_PROXY_LOG_LEVEL | No | INFO | Logging level |
Config File
Alternatively, use config.yaml:
upstream:
  url: "https://api.openai.com/v1"
  api_key: "${OPENAI_API_KEY}"
  timeout: 30

proxy:
  port: 8080
  api_key: "sk-synapsex-proxy-xxx"

rerank:
  default_tier: "quantum_cpu"
  default_k: 4
  temp_spread: 0.3

softqcos:
  api_key: "${QCOS_API_KEY}"
  endpoint: "https://api.softqcos.softquantus.com/v2"

logging:
  level: "INFO"
  format: "json"
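The `${OPENAI_API_KEY}`-style placeholders are resolved from environment variables when the config is loaded. How such expansion typically works, sketched with a hypothetical `expand_env` helper (the proxy's own rules may differ):

```python
import os
import re

def expand_env(text: str) -> str:
    """Replace ${VAR} placeholders with the value of the environment variable VAR,
    or an empty string if unset. Illustrative sketch only."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(expand_env('api_key: "${OPENAI_API_KEY}"'))  # api_key: "sk-demo"
```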
Deployment Options
Docker
docker run -d \
  --name synapsex-proxy \
  -p 8080:8080 \
  -e SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
  -e SYNAPSEX_PROXY_UPSTREAM_KEY="sk-xxx" \
  -e SYNAPSEX_PROXY_QCOS_API_KEY="softqcos-xxx" \
  softquantus/synapsex-proxy:latest
Docker Compose
version: '3.8'

services:
  synapsex-proxy:
    image: softquantus/synapsex-proxy:latest
    ports:
      - "8080:8080"
    environment:
      SYNAPSEX_PROXY_UPSTREAM_URL: "https://api.openai.com/v1"
      SYNAPSEX_PROXY_UPSTREAM_KEY: "${OPENAI_API_KEY}"
      SYNAPSEX_PROXY_QCOS_API_KEY: "${QCOS_API_KEY}"
    restart: unless-stopped
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: synapsex-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: synapsex-proxy
  template:
    metadata:
      labels:
        app: synapsex-proxy
    spec:
      containers:
        - name: proxy
          image: softquantus/synapsex-proxy:latest
          ports:
            - containerPort: 8080
          env:
            - name: SYNAPSEX_PROXY_UPSTREAM_URL
              value: "https://api.openai.com/v1"
            - name: SYNAPSEX_PROXY_UPSTREAM_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: synapsex-proxy
spec:
  selector:
    app: synapsex-proxy
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
Azure Container Apps
az containerapp create \
  --name synapsex-proxy \
  --resource-group myResourceGroup \
  --environment myEnvironment \
  --image softquantus/synapsex-proxy:latest \
  --target-port 8080 \
  --ingress external \
  --env-vars \
    SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
    SYNAPSEX_PROXY_UPSTREAM_KEY=secretref:openai-key
Upstream Providers
OpenAI
upstream:
  url: "https://api.openai.com/v1"
  api_key: "sk-xxx"
Azure OpenAI
upstream:
  url: "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT"
  api_key: "xxx"
  headers:
    api-version: "2024-02-01"
FlexAI
upstream:
  url: "https://api.flexai.com/v1"
  api_key: "flexai-xxx"
Local vLLM
upstream:
  url: "http://localhost:8000/v1"
  api_key: "not-needed"
Ollama
upstream:
  url: "http://localhost:11434/v1"
  api_key: "not-needed"
Multi-Upstream Routing
Route requests to different providers:
upstreams:
  openai:
    url: "https://api.openai.com/v1"
    api_key: "sk-openai-xxx"
    models: ["gpt-4", "gpt-3.5-turbo"]
  flexai:
    url: "https://api.flexai.com/v1"
    api_key: "flexai-xxx"
    models: ["llama-3-70b", "mixtral-8x7b"]
  local:
    url: "http://localhost:8000/v1"
    api_key: ""
    models: ["local-model"]

routing:
  strategy: "model_match"  # Route based on model name
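With `model_match`, the proxy picks whichever upstream lists the requested model. A minimal sketch of that lookup, with the upstream table transcribed from the config above:

```python
UPSTREAMS = {
    "openai": {"url": "https://api.openai.com/v1", "models": ["gpt-4", "gpt-3.5-turbo"]},
    "flexai": {"url": "https://api.flexai.com/v1", "models": ["llama-3-70b", "mixtral-8x7b"]},
    "local": {"url": "http://localhost:8000/v1", "models": ["local-model"]},
}

def route(model: str) -> str:
    """Return the name of the first upstream whose model list contains `model`."""
    for name, cfg in UPSTREAMS.items():
        if model in cfg["models"]:
            return name
    raise ValueError(f"no upstream serves model {model!r}")

print(route("llama-3-70b"))  # flexai
```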
Performance
Latency Impact
| Tier | Overhead | Total with K=4 |
|---|---|---|
| Off | 0ms | Upstream only |
| Classic | +5ms | Upstream × 4 + 5ms |
| Quantum CPU | +50ms | Upstream × 4 + 50ms |
| Quantum GPU | +100ms | Upstream × 4 + 100ms |
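Reading the table: with K candidates, the worst case is K upstream calls plus the rerank overhead (candidates requested in parallel would shrink the upstream term toward a single call). A quick back-of-envelope, assuming an illustrative 800 ms upstream call:

```python
def worst_case_latency_ms(upstream_ms: float, k: int, overhead_ms: float) -> float:
    """Worst case from the table: K upstream calls plus rerank overhead."""
    return upstream_ms * k + overhead_ms

# 800 ms upstream, K=4, quantum_cpu overhead of 50 ms:
print(worst_case_latency_ms(800, 4, 50))  # 3250.0
```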
Optimization Tips
- Use streaming - Response starts before all candidates complete
- Reduce K for speed - K=2 is faster with modest quality gain
- Cache upstream responses - Enable response caching
- Use regional endpoints - Deploy proxy near upstream
Monitoring
Metrics Endpoint
GET /metrics
Returns Prometheus-compatible metrics:
synapsex_proxy_requests_total{tier="quantum_cpu"} 1234
synapsex_proxy_latency_seconds_bucket{le="0.1"} 500
synapsex_proxy_upstream_requests_total{provider="openai"} 4936
synapsex_proxy_rerank_quality_score{tier="quantum_cpu"} 0.91
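These can be collected with a standard Prometheus scrape job pointed at the proxy's port (job name and target are examples):

```yaml
scrape_configs:
  - job_name: "synapsex-proxy"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
```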
Health Check
GET /health
{
  "status": "healthy",
  "upstream": "connected",
  "softqcos": "connected",
  "version": "2.1.0"
}
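In Kubernetes, `/health` maps naturally onto a liveness probe for the proxy container (probe timings here are illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 30
```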
Security
Authentication
Protect your proxy with an API key:
proxy:
  api_key: "sk-synapsex-proxy-xxx"
  require_auth: true
Rate Limiting
rate_limit:
  enabled: true
  requests_per_minute: 100
  burst: 20
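`burst` on top of `requests_per_minute` is classic token-bucket behavior: up to `burst` requests can land at once, then tokens refill at the per-minute rate. An illustrative sketch, not the proxy's actual implementation:

```python
import time

class TokenBucket:
    """Allow `burst` requests immediately; refill at rpm/60 tokens per second."""

    def __init__(self, rpm: int, burst: int):
        self.rate = rpm / 60.0
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rpm=100, burst=2)
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```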
Request Logging
logging:
  log_requests: true
  log_responses: false  # Don't log response content
  redact_keys: true     # Redact API keys in logs
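`redact_keys` masks secrets before log lines are written. One common approach, sketched here for `sk-`-style keys (the proxy's actual patterns are not documented, so treat this as illustrative):

```python
import re

def redact(line: str) -> str:
    """Mask sk-style API keys, keeping a short prefix for debugging."""
    return re.sub(r"\b(sk-[A-Za-z0-9]{3})[A-Za-z0-9-]+", r"\1***", line)

print(redact("auth header: Bearer sk-openai-abc123"))  # auth header: Bearer sk-ope***
```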
Pricing
The proxy itself is free to deploy. You pay for:
| Component | Cost |
|---|---|
| Upstream API calls | As per provider (OpenAI, etc.) |
| QCOS Quantum Reranking | See QCOS Pricing |
Note: With K=4 reranking, you make 4× the upstream API calls.
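Concretely, assuming illustrative prices of $2.50 per million input tokens and $10 per million output tokens (not quoted rates), the upstream cost per request scales with K because the prompt is sent K times and each candidate generates its own completion:

```python
def upstream_cost_usd(prompt_toks: int, completion_toks: int, k: int,
                      in_per_m: float = 2.50, out_per_m: float = 10.0) -> float:
    """Cost of one reranked request: K upstream calls, each billed for the full
    prompt plus its own completion. Prices are illustrative."""
    per_call = prompt_toks / 1e6 * in_per_m + completion_toks / 1e6 * out_per_m
    return round(k * per_call, 6)

# 500-token prompt, ~400-token completions, K=4 vs no reranking:
print(upstream_cost_usd(500, 400, 4))  # 0.021
print(upstream_cost_usd(500, 400, 1))  # 0.00525
```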
Next Steps
- ⚛️ Quantum Reranking - How reranking works
- 📖 API Reference - Full API documentation
- 🔧 QCOS Integration - Direct quantum access