# SynapseX Proxy
The SynapseX Proxy is a lightweight gateway that adds quantum reranking to any OpenAI-compatible LLM service. Use it to enhance responses from OpenAI, FlexAI, or any compatible API.
## Overview
```
Your Application
        │
        ▼  POST /v1/chat/completions
┌────────────────────────────────────────────────┐
│ SynapseX Proxy                                 │
│                                                │
│ 1. Receive request                             │
│ 2. Generate K requests (varied temperatures)   │
│ 3. Forward to upstream LLM                     │
│ 4. Collect K responses                         │
│ 5. Apply quantum reranking                     │
│ 6. Return best response                        │
└────────────────────────────────────────────────┘
        │
        ▼
┌────────────────────────────────────────────────┐
│ Upstream LLM                                   │
│ OpenAI │ FlexAI │ Azure OpenAI │ vLLM │ Ollama │
└────────────────────────────────────────────────┘
```
## When to Use the Proxy
| Scenario | Solution |
|---|---|
| Already using OpenAI/FlexAI, want better quality | Proxy ✅ |
| Need your own model + multi-tenancy | SynapseX API |
| Want quantum reranking only | Proxy ✅ |
| Need custom model training | SynapseX API |
| Stateless, edge deployment | Proxy ✅ |
## Quick Start

### 1. Configure Upstream

Set your upstream LLM provider:

```bash
# Environment variables
export SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1"
export SYNAPSEX_PROXY_UPSTREAM_KEY="sk-openai-xxx"
export SYNAPSEX_PROXY_API_KEY="sk-synapsex-proxy-xxx"
```
### 2. Start the Proxy

```bash
# Docker
docker run -p 8080:8080 \
  -e SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
  -e SYNAPSEX_PROXY_UPSTREAM_KEY="sk-openai-xxx" \
  softquantus/synapsex-proxy:latest
```
### 3. Use the Proxy

Replace your OpenAI base URL:

```python
from openai import OpenAI

# Before: direct OpenAI
# client = OpenAI(api_key="sk-openai-xxx")

# After: through SynapseX Proxy
client = OpenAI(
    api_key="sk-synapsex-proxy-xxx",
    base_url="http://localhost:8080/v1",
)

response = client.chat.completions.create(
    model="gpt-4",  # Passed through to upstream
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4,
    },
)

print(response.choices[0].message.content)
```
## API Reference

### Chat Completions

```
POST /v1/chat/completions
```

Request body:

```json
{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 500,
  "use_rerank": "quantum_cpu",
  "rerank_k": 4,
  "rerank_temp_spread": 0.3
}
```
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | Required | Upstream model ID |
| `messages` | array | Required | Conversation messages |
| `use_rerank` | string | `"classic"` | `"off"`, `"classic"`, `"quantum_cpu"`, or `"quantum_gpu"` |
| `rerank_k` | int | `4` | Number of candidates |
| `rerank_temp_spread` | float | `0.3` | Temperature variation range |
| `stream` | boolean | `false` | Enable streaming |
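
The same request as a raw `curl` call (a sketch; the proxy key and `localhost` port assume the Quick Start setup above):

```bash
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-synapsex-proxy-xxx" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "use_rerank": "quantum_cpu",
        "rerank_k": 4
      }'
```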
Response:
```json
{
  "id": "chatcmpl-proxy-abc123",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 50,
    "total_tokens": 65
  },
  "meta": {
    "proxy": true,
    "upstream": "openai",
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "selected_index": 2,
    "quality_score": 0.94
  }
}
```
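
The `meta` block is a proxy extension, so typed SDK response objects may not surface it. A minimal sketch, assuming the setup above, reads it from the raw JSON with `requests`:

```python
import requests

# Call the proxy directly so the non-standard "meta" block is visible.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Authorization": "Bearer sk-synapsex-proxy-xxx"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "use_rerank": "quantum_cpu",
        "rerank_k": 4,
    },
    timeout=120,
)
meta = resp.json().get("meta", {})
print(f"selected candidate {meta.get('selected_index')} of {meta.get('candidates')}, "
      f"quality {meta.get('quality_score')}")
```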
### Disable Reranking

For passthrough mode:

```json
{
  "model": "gpt-4",
  "messages": [...],
  "use_rerank": "off"
}
```
## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `SYNAPSEX_PROXY_UPSTREAM_URL` | Yes | - | Upstream API URL |
| `SYNAPSEX_PROXY_UPSTREAM_KEY` | Yes | - | Upstream API key |
| `SYNAPSEX_PROXY_API_KEY` | No | - | Proxy authentication key |
| `SYNAPSEX_PROXY_PORT` | No | `8080` | Server port |
| `SYNAPSEX_PROXY_DEFAULT_RERANK` | No | `classic` | Default rerank tier |
| `SYNAPSEX_PROXY_DEFAULT_K` | No | `4` | Default candidate count |
| `SYNAPSEX_PROXY_QCOS_API_KEY` | No | - | QCOS API key for the quantum rerank tiers |
| `SYNAPSEX_PROXY_LOG_LEVEL` | No | `INFO` | Logging level |
### Config File

Alternatively, use a `config.yaml`:

```yaml
upstream:
  url: "https://api.openai.com/v1"
  api_key: "${OPENAI_API_KEY}"
  timeout: 30

proxy:
  port: 8080
  api_key: "sk-synapsex-proxy-xxx"

rerank:
  default_tier: "quantum_cpu"
  default_k: 4
  temp_spread: 0.3

softqcos:
  api_key: "${QCOS_API_KEY}"
  endpoint: "https://api.softqcos.softquantus.com/v2"

logging:
  level: "INFO"
  format: "json"
```
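
To use the file, mount it into the container. A sketch, assuming the image reads `/app/config.yaml` (the path is an assumption; check the image documentation):

```bash
# Mount config.yaml read-only into the container
# (the /app/config.yaml path is an assumption).
docker run -d -p 8080:8080 \
  -v "$(pwd)/config.yaml:/app/config.yaml:ro" \
  softquantus/synapsex-proxy:latest
```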
## Deployment Options

### Docker

```bash
docker run -d \
  --name synapsex-proxy \
  -p 8080:8080 \
  -e SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
  -e SYNAPSEX_PROXY_UPSTREAM_KEY="sk-xxx" \
  -e SYNAPSEX_PROXY_QCOS_API_KEY="softqcos-xxx" \
  softquantus/synapsex-proxy:latest
```
### Docker Compose

```yaml
version: '3.8'
services:
  synapsex-proxy:
    image: softquantus/synapsex-proxy:latest
    ports:
      - "8080:8080"
    environment:
      SYNAPSEX_PROXY_UPSTREAM_URL: "https://api.openai.com/v1"
      SYNAPSEX_PROXY_UPSTREAM_KEY: "${OPENAI_API_KEY}"
      SYNAPSEX_PROXY_QCOS_API_KEY: "${QCOS_API_KEY}"
    restart: unless-stopped
```
### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: synapsex-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: synapsex-proxy
  template:
    metadata:
      labels:
        app: synapsex-proxy
    spec:
      containers:
        - name: proxy
          image: softquantus/synapsex-proxy:latest
          ports:
            - containerPort: 8080
          env:
            - name: SYNAPSEX_PROXY_UPSTREAM_URL
              value: "https://api.openai.com/v1"
            - name: SYNAPSEX_PROXY_UPSTREAM_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: synapsex-proxy
spec:
  selector:
    app: synapsex-proxy
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```
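
The Deployment reads the upstream key from an `openai-credentials` Secret, which must exist first. One way to create it:

```bash
# Create the Secret referenced by the Deployment above.
kubectl create secret generic openai-credentials \
  --from-literal=api-key="sk-openai-xxx"
```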
### Azure Container Apps

```bash
az containerapp create \
  --name synapsex-proxy \
  --resource-group myResourceGroup \
  --environment myEnvironment \
  --image softquantus/synapsex-proxy:latest \
  --target-port 8080 \
  --ingress external \
  --secrets openai-key="sk-openai-xxx" \
  --env-vars \
    SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
    SYNAPSEX_PROXY_UPSTREAM_KEY=secretref:openai-key
```

The `secretref:openai-key` reference requires the secret to be defined on the app, hence the `--secrets` flag.
## Upstream Providers

### OpenAI

```yaml
upstream:
  url: "https://api.openai.com/v1"
  api_key: "sk-xxx"
```
### Azure OpenAI

```yaml
upstream:
  url: "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT"
  api_key: "xxx"
  headers:
    api-version: "2024-02-01"
```
### FlexAI

```yaml
upstream:
  url: "https://api.flexai.com/v1"
  api_key: "flexai-xxx"
```
### Local vLLM

```yaml
upstream:
  url: "http://localhost:8000/v1"
  api_key: "not-needed"
```
### Ollama

```yaml
upstream:
  url: "http://localhost:11434/v1"
  api_key: "not-needed"
```
## Multi-Upstream Routing

Route requests to different providers:

```yaml
upstreams:
  openai:
    url: "https://api.openai.com/v1"
    api_key: "sk-openai-xxx"
    models: ["gpt-4", "gpt-3.5-turbo"]
  flexai:
    url: "https://api.flexai.com/v1"
    api_key: "flexai-xxx"
    models: ["llama-3-70b", "mixtral-8x7b"]
  local:
    url: "http://localhost:8000/v1"
    api_key: ""
    models: ["local-model"]

routing:
  strategy: "model_match"  # Route based on model name
```
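
With `model_match` routing, the `model` field in each request selects the upstream. For example, given the config above, this request (a sketch) would be forwarded to the `flexai` upstream:

```bash
# "llama-3-70b" is listed under the flexai upstream, so it routes there.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-synapsex-proxy-xxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3-70b", "messages": [{"role": "user", "content": "Hi"}]}'
```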
## Performance

### Latency Impact

| Tier | Overhead | Total with K=4 |
|---|---|---|
| Off | 0 ms | Upstream only |
| Classic | +5 ms | Upstream × 4 + 5 ms |
| Quantum CPU | +50 ms | Upstream × 4 + 50 ms |
| Quantum GPU | +100 ms | Upstream × 4 + 100 ms |
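
For example, with a 500 ms upstream call, K=4, and the `quantum_cpu` tier, the table's model gives roughly 500 × 4 + 50 = 2,050 ms end to end.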
### Optimization Tips

- **Use streaming**: the response starts before all candidates complete (see the sketch after this list)
- **Reduce K for speed**: K=2 is faster and still gives a modest quality gain
- **Cache upstream responses**: enable response caching
- **Use regional endpoints**: deploy the proxy near the upstream
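
A minimal sketch combining the first two tips, using the same client setup as the Quick Start:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-synapsex-proxy-xxx",
    base_url="http://localhost:8080/v1",
)

# Stream with a smaller candidate pool: lower latency, still reranked.
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize quantum reranking in one line"}],
    stream=True,
    extra_body={"use_rerank": "quantum_cpu", "rerank_k": 2},
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```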
## Monitoring

### Metrics Endpoint

```
GET /metrics
```

Returns Prometheus-compatible metrics:

```
synapsex_proxy_requests_total{tier="quantum_cpu"} 1234
synapsex_proxy_latency_seconds_bucket{le="0.1"} 500
synapsex_proxy_upstream_requests_total{provider="openai"} 4936
synapsex_proxy_rerank_quality_score{tier="quantum_cpu"} 0.91
```
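
A matching Prometheus scrape job might look like this (job name and target are illustrative):

```yaml
# prometheus.yml -- illustrative job name and target
scrape_configs:
  - job_name: "synapsex-proxy"
    metrics_path: /metrics
    static_configs:
      - targets: ["synapsex-proxy:8080"]
```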
### Health Check

```
GET /health
```

```json
{
  "status": "healthy",
  "upstream": "connected",
  "softqcos": "connected",
  "version": "2.1.0"
}
```
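
The endpoint is a natural target for container probes. A sketch for the Kubernetes Deployment above (thresholds are illustrative):

```yaml
# Add to the proxy container spec; values are illustrative.
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```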