SynapseX Proxy

The SynapseX Proxy is a lightweight gateway that adds quantum reranking to any OpenAI-compatible LLM service. Use it to enhance responses from OpenAI, FlexAI, or any compatible API.

Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        SYNAPSEX PROXY                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                               β”‚
β”‚  Your Application                                             β”‚
β”‚        β”‚                                                      β”‚
β”‚        β–Ό  POST /v1/chat/completions                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ SynapseX Proxy                                          β”‚  β”‚
β”‚  β”‚                                                         β”‚  β”‚
β”‚  β”‚  1. Receive request                                     β”‚  β”‚
β”‚  β”‚  2. Generate K requests (varied temperatures)           β”‚  β”‚
β”‚  β”‚  3. Forward to upstream LLM                             β”‚  β”‚
β”‚  β”‚  4. Collect K responses                                 β”‚  β”‚
β”‚  β”‚  5. Apply quantum reranking                             β”‚  β”‚
β”‚  β”‚  6. Return best response                                β”‚  β”‚
β”‚  β”‚                                                         β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚        β”‚                                                      β”‚
β”‚        β–Ό                                                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Upstream LLM                                            β”‚  β”‚
β”‚  β”‚                                                         β”‚  β”‚
β”‚  β”‚ OpenAI β”‚ FlexAI β”‚ Azure OpenAI β”‚ vLLM β”‚ Ollama         β”‚  β”‚
β”‚  β”‚                                                         β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
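
Conceptually, the proxy runs a best-of-K loop. The sketch below is illustrative only, not the proxy's actual code: it fans out to the upstream at varied temperatures and picks a winner, with a hypothetical score() function standing in for the quantum reranker.

from openai import OpenAI

# Illustrative sketch of the proxy's pipeline (steps 2-6 above).
# `score` is a hypothetical placeholder; real scoring happens in QCOS.
upstream = OpenAI(api_key="sk-openai-xxx", base_url="https://api.openai.com/v1")

def best_of_k(messages, model="gpt-4", k=4, base_temp=0.7, spread=0.3):
    # 2. Generate K request variants with temperatures spread around base_temp
    temps = [base_temp + spread * (i / max(k - 1, 1) - 0.5) for i in range(k)]
    # 3-4. Forward to the upstream LLM and collect K candidate responses
    candidates = [
        upstream.chat.completions.create(model=model, messages=messages, temperature=t)
        for t in temps
    ]
    # 5-6. Rerank the candidates and return the best one
    def score(c):  # placeholder heuristic, NOT the quantum reranker
        return len(c.choices[0].message.content or "")
    return max(candidates, key=score)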

When to Use the Proxy

Scenario                                           Solution
Already using OpenAI/FlexAI, want better quality   Proxy βœ“
Need your own model + multi-tenancy                SynapseX API
Want quantum reranking only                        Proxy βœ“
Need custom model training                         SynapseX API
Stateless, edge deployment                         Proxy βœ“

Quick Start

1. Configure Upstream

Set your upstream LLM provider:

# Environment variables
export SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1"
export SYNAPSEX_PROXY_UPSTREAM_KEY="sk-openai-xxx"
export SYNAPSEX_PROXY_API_KEY="sk-synapsex-proxy-xxx"

2. Start Proxy

# Docker
docker run -p 8080:8080 \
  -e SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
  -e SYNAPSEX_PROXY_UPSTREAM_KEY="sk-openai-xxx" \
  softquantus/synapsex-proxy:latest

3. Use the Proxy

Replace your OpenAI base URL:

from openai import OpenAI

# Before: direct OpenAI
# client = OpenAI(api_key="sk-openai-xxx")

# After: through SynapseX Proxy
client = OpenAI(
    api_key="sk-synapsex-proxy-xxx",
    base_url="http://localhost:8080/v1",
)

response = client.chat.completions.create(
    model="gpt-4",  # passed through to upstream
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4,
    },
)

print(response.choices[0].message.content)
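
The SDK is optional: the rerank options are ordinary JSON body fields (see the API reference below). A raw-HTTP equivalent, assuming the proxy accepts the standard OpenAI-style Authorization header:

import requests

# Same request over plain HTTP; rerank options are top-level JSON fields.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Authorization": "Bearer sk-synapsex-proxy-xxx"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "use_rerank": "quantum_cpu",
        "rerank_k": 4,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])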

API Reference

Chat Completions

POST /v1/chat/completions

Request Body:

{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 500,
  "use_rerank": "quantum_cpu",
  "rerank_k": 4,
  "rerank_temp_spread": 0.3
}

Field                Type      Default     Description
model                string    Required    Upstream model ID
messages             array     Required    Conversation messages
use_rerank           string    "classic"   "off", "classic", "quantum_cpu", "quantum_gpu"
rerank_k             int       4           Number of candidates
rerank_temp_spread   float     0.3         Temperature variation range
stream               boolean   false       Enable streaming
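
The exact mapping from rerank_temp_spread to per-candidate temperatures is not specified here. A plausible sketch, under the assumption that candidates are spread evenly around the request temperature (the proxy's real scheme may differ):

def candidate_temperatures(base: float, k: int, spread: float) -> list[float]:
    # Assumption: spread candidates evenly across [base - spread/2, base + spread/2],
    # clamped to the API's valid temperature range.
    if k == 1:
        return [base]
    step = spread / (k - 1)
    return [max(0.0, min(2.0, base - spread / 2 + i * step)) for i in range(k)]

print(candidate_temperatures(0.7, 4, 0.3))  # [0.55, 0.65, 0.75, 0.85]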

Response:

{
  "id": "chatcmpl-proxy-abc123",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 50,
    "total_tokens": 65
  },
  "meta": {
    "proxy": true,
    "upstream": "openai",
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "selected_index": 2,
    "quality_score": 0.94
  }
}
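
The meta block is a proxy extension on top of the standard schema. The openai Python SDK tolerates unknown response fields, so reading it via model_extra usually works (an assumption about SDK behavior; falling back to a plain HTTP client and raw JSON is the safe route):

# Continuing the Quick Start example: read the proxy's extra `meta` block.
# `model_extra` holds fields the SDK's response model does not declare.
meta = (response.model_extra or {}).get("meta", {})
print("tier:", meta.get("rerank_tier"))
print("selected candidate:", meta.get("selected_index"))
print("quality score:", meta.get("quality_score"))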

Disable Reranking

For passthrough mode:

{
  "model": "gpt-4",
  "messages": [...],
  "use_rerank": "off"
}
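
With the Python client from the Quick Start, the same passthrough request looks like:

# Passthrough: one upstream call, no candidate fan-out, no reranking.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"use_rerank": "off"},
)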

Configuration

Environment Variables

Variable                        Required   Default   Description
SYNAPSEX_PROXY_UPSTREAM_URL     Yes        -         Upstream API URL
SYNAPSEX_PROXY_UPSTREAM_KEY     Yes        -         Upstream API key
SYNAPSEX_PROXY_API_KEY          No         -         Proxy authentication key
SYNAPSEX_PROXY_PORT             No         8080      Server port
SYNAPSEX_PROXY_DEFAULT_RERANK   No         classic   Default rerank tier
SYNAPSEX_PROXY_DEFAULT_K        No         4         Default candidate count
SYNAPSEX_PROXY_QCOS_API_KEY     No         -         QCOS API key for quantum tiers
SYNAPSEX_PROXY_LOG_LEVEL        No         INFO      Logging level

Config File

Alternatively, use config.yaml:

upstream:
  url: "https://api.openai.com/v1"
  api_key: "${OPENAI_API_KEY}"
  timeout: 30

proxy:
  port: 8080
  api_key: "sk-synapsex-proxy-xxx"

rerank:
  default_tier: "quantum_cpu"
  default_k: 4
  temp_spread: 0.3

softqcos:
  api_key: "${QCOS_API_KEY}"
  endpoint: "https://api.softqcos.softquantus.com/v2"

logging:
  level: "INFO"
  format: "json"
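
The ${OPENAI_API_KEY}-style placeholders are expanded from the environment. A minimal sketch of that expansion (illustrative only, not the proxy's actual loader):

import os
import re
import yaml  # pip install pyyaml

# Expand "${VAR}" placeholders from the environment before parsing the YAML.
def load_config(path: str) -> dict:
    text = open(path).read()
    text = re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)
    return yaml.safe_load(text)

config = load_config("config.yaml")
print(config["upstream"]["url"])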

Deployment Options

Docker

docker run -d \
  --name synapsex-proxy \
  -p 8080:8080 \
  -e SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
  -e SYNAPSEX_PROXY_UPSTREAM_KEY="sk-xxx" \
  -e SYNAPSEX_PROXY_QCOS_API_KEY="softqcos-xxx" \
  softquantus/synapsex-proxy:latest

Docker Compose

version: '3.8'
services:
  synapsex-proxy:
    image: softquantus/synapsex-proxy:latest
    ports:
      - "8080:8080"
    environment:
      SYNAPSEX_PROXY_UPSTREAM_URL: "https://api.openai.com/v1"
      SYNAPSEX_PROXY_UPSTREAM_KEY: "${OPENAI_API_KEY}"
      SYNAPSEX_PROXY_QCOS_API_KEY: "${QCOS_API_KEY}"
    restart: unless-stopped

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: synapsex-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: synapsex-proxy
  template:
    metadata:
      labels:
        app: synapsex-proxy
    spec:
      containers:
        - name: proxy
          image: softquantus/synapsex-proxy:latest
          ports:
            - containerPort: 8080
          env:
            - name: SYNAPSEX_PROXY_UPSTREAM_URL
              value: "https://api.openai.com/v1"
            - name: SYNAPSEX_PROXY_UPSTREAM_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: synapsex-proxy
spec:
  selector:
    app: synapsex-proxy
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer

Azure Container Apps

az containerapp create \
  --name synapsex-proxy \
  --resource-group myResourceGroup \
  --environment myEnvironment \
  --image softquantus/synapsex-proxy:latest \
  --target-port 8080 \
  --ingress external \
  --env-vars \
    SYNAPSEX_PROXY_UPSTREAM_URL="https://api.openai.com/v1" \
    SYNAPSEX_PROXY_UPSTREAM_KEY=secretref:openai-key

Upstream Providers

OpenAI

upstream:
  url: "https://api.openai.com/v1"
  api_key: "sk-xxx"

Azure OpenAI

upstream:
  url: "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT"
  api_key: "xxx"
  headers:
    api-version: "2024-02-01"

FlexAI

upstream:
  url: "https://api.flexai.com/v1"
  api_key: "flexai-xxx"

Local vLLM

upstream:
  url: "http://localhost:8000/v1"
  api_key: "not-needed"

Ollama

upstream:
  url: "http://localhost:11434/v1"
  api_key: "not-needed"

Multi-Upstream Routing

Route requests to different providers:

upstreams:
  openai:
    url: "https://api.openai.com/v1"
    api_key: "sk-openai-xxx"
    models: ["gpt-4", "gpt-3.5-turbo"]

  flexai:
    url: "https://api.flexai.com/v1"
    api_key: "flexai-xxx"
    models: ["llama-3-70b", "mixtral-8x7b"]

  local:
    url: "http://localhost:8000/v1"
    api_key: ""
    models: ["local-model"]

routing:
  strategy: "model_match"  # route based on the requested model name
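
With strategy "model_match", the proxy presumably picks the upstream whose models list contains the requested model. A sketch of that lookup, assuming the dict mirrors the YAML above:

# Illustrative "model_match" routing over the config above.
upstreams = {
    "openai": {"url": "https://api.openai.com/v1", "models": ["gpt-4", "gpt-3.5-turbo"]},
    "flexai": {"url": "https://api.flexai.com/v1", "models": ["llama-3-70b", "mixtral-8x7b"]},
    "local":  {"url": "http://localhost:8000/v1", "models": ["local-model"]},
}

def route(model: str) -> str:
    # Return the base URL of the first upstream that serves the model.
    for name, cfg in upstreams.items():
        if model in cfg["models"]:
            return cfg["url"]
    raise ValueError(f"no upstream serves model {model!r}")

print(route("llama-3-70b"))  # https://api.flexai.com/v1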

Performance

Latency Impact

Tier          Overhead   Total with K=4
Off           0 ms       Upstream only
Classic       +5 ms      Upstream Γ— 4 + 5 ms
Quantum CPU   +50 ms     Upstream Γ— 4 + 50 ms
Quantum GPU   +100 ms    Upstream Γ— 4 + 100 ms

For example, with an 800 ms upstream call and K=4, a quantum_cpu request totals roughly 4 Γ— 800 ms + 50 ms β‰ˆ 3.25 s.

Optimization Tips

  1. Use streaming - the response starts before all candidates complete (see the sketch after this list)
  2. Reduce K for speed - K=2 is faster, with a more modest quality gain
  3. Cache upstream responses - enable response caching
  4. Use regional endpoints - deploy the proxy near the upstream
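
Combining tips 1 and 2, a streaming request through the Quick Start client (assuming the proxy streams in the usual OpenAI chunk format):

# Stream through the proxy with a reduced candidate count.
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
    extra_body={"use_rerank": "quantum_cpu", "rerank_k": 2},
)
for chunk in stream:
    # Some chunks may carry no delta; guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)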

Monitoring

Metrics Endpoint

GET /metrics

Returns Prometheus-compatible metrics:

synapsex_proxy_requests_total{tier="quantum_cpu"} 1234
synapsex_proxy_latency_seconds_bucket{le="0.1"} 500
synapsex_proxy_upstream_requests_total{provider="openai"} 4936
synapsex_proxy_rerank_quality_score{tier="quantum_cpu"} 0.91
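
Prometheus normally scrapes this endpoint, but the exposition format is plain text and easy to spot-check by hand:

import requests

# Quick look at the proxy's request counters without a Prometheus server.
text = requests.get("http://localhost:8080/metrics").text
for line in text.splitlines():
    if line.startswith("synapsex_proxy_requests_total"):
        print(line)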

Health Check

GET /health

{
  "status": "healthy",
  "upstream": "connected",
  "softqcos": "connected",
  "version": "2.1.0"
}

Security

Authentication

Protect your proxy with an API key:

proxy:
  api_key: "sk-synapsex-proxy-xxx"
  require_auth: true

Rate Limiting

rate_limit:
  enabled: true
  requests_per_minute: 100
  burst: 20
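
When the limit is exceeded, a rate-limited proxy conventionally answers with HTTP 429 (this document does not confirm the status code). A client-side retry with exponential backoff under that assumption:

import time
import requests

# Retry on HTTP 429 with exponential backoff (1s, 2s, 4s, ...).
# Assumes the proxy signals rate limiting with 429, which is conventional
# but not confirmed here.
def post_with_backoff(url, payload, headers, retries=5):
    for attempt in range(retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=120)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)
    resp.raise_for_status()
    return resp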

Request Logging

logging:
  log_requests: true
  log_responses: false  # don't log response content
  redact_keys: true     # redact API keys in logs

Pricing

The proxy itself is free to deploy. You pay for:

Component                Cost
Upstream API calls       As per provider (OpenAI, etc.)
QCOS Quantum Reranking   See QCOS Pricing

Note: With K=4 reranking, you make 4Γ— the upstream API calls, and pay roughly 4Γ— the upstream token cost.


Next Steps