Quantum Reranking
SynapseX's quantum reranking system uses QAOA (Quantum Approximate Optimization Algorithm) to select the best response from multiple candidates, improving response quality by up to 25% over classic reranking.
Overview
Traditional LLM inference generates a single response. Quantum reranking:
- Generates K candidates with varied temperatures
- Scores each candidate on multiple quality metrics
- Uses QAOA optimization to select the best response
- Returns enhanced result with quality metadata
```
┌──────────────────────────────────────────────────────────────┐
│                  QUANTUM RERANKING PIPELINE                  │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│                          User Prompt                         │
│                               │                              │
│                               ▼                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │               CANDIDATE GENERATION (K=4)               │  │
│  │                                                        │  │
│  │   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐   │  │
│  │   │  T=0.5  │  │  T=0.7  │  │  T=0.9  │  │  T=1.1  │   │  │
│  │   │Response1│  │Response2│  │Response3│  │Response4│   │  │
│  │   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘   │  │
│  │        │            │            │            │        │  │
│  └────────┼────────────┼────────────┼────────────┼────────┘  │
│           │            │            │            │           │
│           ▼            ▼            ▼            ▼           │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                    QUALITY SCORING                     │  │
│  │                                                        │  │
│  │    • Coherence       • Relevance     • Factuality      │  │
│  │    • Completeness    • Safety        • Entropy         │  │
│  │                                                        │  │
│  └────────────────────────────────────────────────────────┘  │
│                               │                              │
│                               ▼                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                   QAOA OPTIMIZATION                    │  │
│  │                                                        │  │
│  │       ┌────────────────────────────────────────┐       │  │
│  │       │       Quantum Circuit (QCOS API)       │       │  │
│  │       │                                        │       │  │
│  │       │       |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ •       │       │  │
│  │       │       |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ •       │       │  │
│  │       │       |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ •       │       │  │
│  │       │       |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ •       │       │  │
│  │       │                                        │       │  │
│  │       └────────────────────────────────────────┘       │  │
│  │                                                        │  │
│  └────────────────────────────────────────────────────────┘  │
│                               │                              │
│                               ▼                              │
│                   Best Response + Metadata                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
How It Works
1. Candidate Generation
Multiple responses are generated with varied temperatures:
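```python
# Internal process (simplified): `model` and `prompt` stand in for SynapseX internals
candidates = []
for temp in [0.5, 0.7, 0.9, 1.1]:
    # One candidate per temperature; higher temperatures sample more varied text
    response = model.generate(prompt, temperature=temp)
    candidates.append(response)
```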
This creates diversity in responses while maintaining coherence.
2. Quality Scoring
Each candidate is scored on multiple dimensions:
| Metric | Weight | Description |
|---|---|---|
| Coherence | 0.25 | Logical flow and structure |
| Relevance | 0.30 | Match to user query |
| Factuality | 0.20 | Accuracy of information |
| Completeness | 0.15 | Coverage of the topic |
| Safety | 0.10 | Absence of harmful content |
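The sketch below shows one way the weights in the table could combine per-metric scores into a single quality score per candidate. It assumes each metric has already been normalized to [0, 1]; this page does not document how SynapseX computes the individual metrics. `weighted_quality` and `DEFAULT_WEIGHTS` are hypothetical names.

```python
# Default metric weights, copied from the table above (hypothetical constant name).
DEFAULT_WEIGHTS = {
    "coherence": 0.25,
    "relevance": 0.30,
    "factuality": 0.20,
    "completeness": 0.15,
    "safety": 0.10,
}


def weighted_quality(metrics, weights=DEFAULT_WEIGHTS):
    """Combine per-metric scores (each assumed to be in [0, 1]) into one score.

    A metric missing from `metrics` counts as 0. Dividing by the total weight
    keeps the result in [0, 1] even if the weights do not sum exactly to 1.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(w * metrics.get(name, 0.0) for name, w in weights.items())
    return weighted_sum / total_weight


candidate = {"coherence": 0.9, "relevance": 0.8, "factuality": 0.7,
             "completeness": 0.6, "safety": 1.0}
print(round(weighted_quality(candidate), 3))  # 0.795
```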
3. QAOA Optimization
The selection problem is formulated as a QUBO (Quadratic Unconstrained Binary Optimization):
$$H = -\sum_i w_i x_i + \lambda \left(\sum_i x_i - 1\right)^2$$
Where:
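- $x_i \in \{0, 1\}$ indicates whether candidate $i$ is selected
- $w_i$ is the quality score of candidate $i$
- $\lambda > 0$ is a penalty weight that enforces the single-selection constraint $\sum_i x_i = 1$
QAOA approximates the bitstring $x$ that minimizes $H$. When $\lambda$ is large enough, that minimum selects exactly one candidate: the one with the highest quality score.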
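Because every $x_i$ is binary, $x_i^2 = x_i$. The penalty term therefore expands to $\lambda\left(-\sum_i x_i + 2\sum_{i<j} x_i x_j + 1\right)$, which gives an upper-triangular QUBO matrix with $Q_{ii} = -w_i - \lambda$, $Q_{ij} = 2\lambda$ for $i < j$, and a constant offset $\lambda$. Choosing $\lambda$ larger than the largest (non-negative) score is enough to make a one-hot bitstring the minimum. The classical sketch below builds that matrix and finds its ground state by brute force over all $2^K$ bitstrings. That exact answer is the state QAOA tries to approximate. The function names are hypothetical, and this is not the QCOS implementation. The example scores are the `scores` from the response metadata in step 4.

```python
from itertools import product


def build_selection_qubo(scores, lam):
    """Expand H(x) = -sum_i w_i x_i + lam * (sum_i x_i - 1)**2 into x^T Q x + c.

    With binary x_i (so x_i**2 == x_i):
      Q[i][i] = -w_i - lam   linear terms on the diagonal
      Q[i][j] = 2 * lam      pairwise penalty, upper triangle (i < j)
      c       = lam          constant offset
    """
    k = len(scores)
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        q[i][i] = -scores[i] - lam
        for j in range(i + 1, k):
            q[i][j] = 2.0 * lam
    return q, lam


def qubo_energy(q, const, x):
    """Evaluate x^T Q x + c for an upper-triangular Q."""
    k = len(x)
    return const + sum(q[i][j] * x[i] * x[j] for i in range(k) for j in range(i, k))


def brute_force_ground_state(q, const):
    """Check all 2^K bitstrings exactly; only practical for small K."""
    return min(product((0, 1), repeat=len(q)), key=lambda x: qubo_energy(q, const, x))


scores = [0.78, 0.85, 0.92, 0.81]
q, c = build_selection_qubo(scores, lam=2.0)  # lam > max(scores)
best = brute_force_ground_state(q, c)
print(best, best.index(1))  # (0, 0, 1, 0) 2
```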
4. Response Selection
The candidate with the highest probability in the QAOA measurement results is selected and returned with metadata:
```json
{
  "choices": [{"message": {"content": "Selected response..."}}],
  "meta": {
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "quality_score": 0.92,
    "selected_index": 2,
    "scores": [0.78, 0.85, 0.92, 0.81]
  }
}
```
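The sketch below reads the reranking metadata from a response body like the one above. It assumes the body is available as a plain dict, for example from a raw HTTP call. This page does not specify how `meta` is exposed on an SDK response object. `summarize_rerank` is a hypothetical helper. Its last field reports whether the returned candidate is also the top-scored one, which is a quick way to spot cases where the approximate optimizer did not pick the highest score.

```python
import json

# The example response body from above, as raw JSON text.
raw = """
{
  "choices": [{"message": {"content": "Selected response..."}}],
  "meta": {
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "quality_score": 0.92,
    "selected_index": 2,
    "scores": [0.78, 0.85, 0.92, 0.81]
  }
}
"""


def summarize_rerank(payload):
    """Return (content, tier, selected_index, picked_top) from a response dict."""
    content = payload["choices"][0]["message"]["content"]
    meta = payload.get("meta", {})
    scores = meta.get("scores", [])
    selected = meta.get("selected_index")
    # True when the selected candidate also has the highest individual score.
    picked_top = bool(scores) and selected == scores.index(max(scores))
    return content, meta.get("rerank_tier"), selected, picked_top


print(summarize_rerank(json.loads(raw)))
# ('Selected response...', 'quantum_cpu', 2, True)
```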
Reranking Tiers
Classic
Heuristic-based selection using weighted scoring.
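```python
# `client` is assumed to be an OpenAI-compatible client configured for the SynapseX API
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "classic"}
)
```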
| Aspect | Value |
|---|---|
| Method | Weighted heuristic |
| Latency | +5ms |
| Quality Improvement | Baseline |
| Cost | Included in plan |
Best for: Low-latency applications, high-volume usage.
Quantum CPU
QAOA executed via QCOS API on quantum simulators.
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "quantum_cpu"}
)
```
| Aspect | Value |
|---|---|
| Method | QAOA (simulated) |
| Latency | +50ms |
| Quality Improvement | +15% vs classic |
| Cost | Included in Pro+ |
Best for: Balanced quality and latency, typical production use.
Quantum GPU
QAOA executed on LUMI supercomputer GPUs for maximum quality.
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "quantum_gpu"}
)
```
| Aspect | Value |
|---|---|
| Method | QAOA (GPU-accelerated) |
| Latency | +100ms |
| Quality Improvement | +25% vs classic |
| Cost | Extra fee on Hybrid+ |
Best for: Maximum quality, critical responses, enterprise use.
Configuration Options
Candidate Count
Control how many candidates to generate:
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 8  # Generate 8 candidates (default: 4)
    }
)
```
| K Value | Latency Impact | Quality Impact |
|---|---|---|
| 2 | Low | Minimal |
| 4 | Medium | Good (default) |
| 8 | High | Better |
| 16 | Very High | Maximum |
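Part of the latency growth in the table comes from the size of the selection problem. The QUBO in step 3 has one binary variable per candidate. Assuming that maps to one qubit per candidate, the number of qubits grows linearly with K, while the pairwise penalty couplings and the classical search space grow much faster. The back-of-the-envelope sketch below computes these sizes. `selection_problem_size` is a hypothetical helper, and each extra candidate also costs one more generation call.

```python
def selection_problem_size(k):
    """Size of the one-hot selection QUBO for k candidates (one qubit per candidate)."""
    return {
        "qubits": k,
        "zz_pairs": k * (k - 1) // 2,  # one penalty coupling per candidate pair
        "bitstrings": 2 ** k,          # size of the classical search space
    }


for k in (2, 4, 8, 16):
    print(k, selection_problem_size(k))
```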
Temperature Range
Customize the temperature spread:
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4,
        "temp_min": 0.3,
        "temp_max": 1.5
    }
)
```
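This page does not say how `temp_min` and `temp_max` are spread across the K candidates. Even spacing is one plausible reading, because it reproduces the default temperatures [0.5, 0.7, 0.9, 1.1] for K=4. The sketch below assumes even spacing. `temperature_schedule` is a hypothetical helper, not part of the SynapseX API.

```python
def temperature_schedule(k, temp_min, temp_max):
    """Evenly space k sampling temperatures from temp_min to temp_max (inclusive).

    Even spacing is an assumption; values are rounded to hide float noise.
    """
    if k == 1:
        return [temp_min]
    step = (temp_max - temp_min) / (k - 1)
    return [round(temp_min + i * step, 4) for i in range(k)]


print(temperature_schedule(4, 0.5, 1.1))  # [0.5, 0.7, 0.9, 1.1]
print(temperature_schedule(4, 0.3, 1.5))  # [0.3, 0.7, 1.1, 1.5]
```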
Quality Weights
Adjust quality metric weights (Enterprise only):
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "quality_weights": {
            "coherence": 0.20,
            "relevance": 0.40,
            "factuality": 0.25,
            "completeness": 0.10,
            "safety": 0.05
        }
    }
)
```
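Every `quality_weights` example on this page uses the five metric names from the scoring table, with non-negative values that sum to 1.0. The sketch below is a client-side pre-flight check built on those conventions. The API's own validation rules are not documented here, so treat `check_quality_weights` as a hypothetical helper rather than a statement of what the server enforces.

```python
import math

# Metric names used by the scoring table on this page.
KNOWN_METRICS = {"coherence", "relevance", "factuality", "completeness", "safety"}


def check_quality_weights(weights, tol=1e-6):
    """Raise ValueError if `weights` breaks the conventions of this page's examples."""
    unknown = set(weights) - KNOWN_METRICS
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"weights sum to {total:.3f}, expected 1.0")
    return weights


# The Enterprise example above passes the check unchanged.
check_quality_weights({"coherence": 0.20, "relevance": 0.40, "factuality": 0.25,
                       "completeness": 0.10, "safety": 0.05})
```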
Use Cases
Medical/Healthcare
High factuality requirements:
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "system", "content": "You are a medical information assistant."},
        {"role": "user", "content": "What are the side effects of metformin?"}
    ],
    extra_body={
        "use_rerank": "quantum_gpu",
        "rerank_k": 8,
        "quality_weights": {
            "factuality": 0.50,
            "relevance": 0.30,
            "completeness": 0.15,
            "safety": 0.05
        }
    }
)
```
Creative Writing
Higher diversity:
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "user", "content": "Write a short story about a time traveler."}
    ],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 8,
        "temp_min": 0.7,
        "temp_max": 1.3,
        "quality_weights": {
            "coherence": 0.35,
            "completeness": 0.30,
            "relevance": 0.25,
            "factuality": 0.05,
            "safety": 0.05
        }
    }
)
```
Customer Support
Balanced quality:
```python
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "system", "content": "You are a customer support agent."},
        {"role": "user", "content": "How do I return a product?"}
    ],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4
    }
)
```
Performance Benchmarks
Quality Improvement
Scores on standard benchmarks (gains over Classic in percentage points):
| Benchmark | Classic | Quantum CPU | Quantum GPU |
|---|---|---|---|
| MMLU | 72.3% | 76.8% (+4.5 pp) | 79.2% (+6.9 pp) |
| TruthfulQA | 45.2% | 52.1% (+6.9 pp) | 56.8% (+11.6 pp) |
| HumanEval | 48.5% | 52.3% (+3.8 pp) | 54.1% (+5.6 pp) |
Latency Distribution
| Tier | P50 | P95 | P99 |
|---|---|---|---|
| Classic | 5ms | 8ms | 12ms |
| Quantum CPU | 48ms | 65ms | 85ms |
| Quantum GPU | 95ms | 130ms | 180ms |
Integration with QCOS
SynapseX quantum reranking is powered by QCOS:
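```python
# Behind the scenes, SynapseX calls QCOS (illustrative, simplified)
from softqcos_sdk import QCOSClient
from softqcos.algorithms import QAOA

softqcos = QCOSClient()

# QUBO formulation from the candidate quality scores
# (formulate_selection_qubo is an internal SynapseX helper, not shown here)
qubo = formulate_selection_qubo(scores, constraint=1)

# Run QAOA with p=3 layers on the simulator backend
result = softqcos.run_qaoa(qubo, p=3, backend="simulator")

# The most probable measured state encodes the selected candidate
best_candidate = result.most_probable_state
```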
For direct QCOS access, see QCOS Documentation.
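Conceptually, taking the most probable state means finding the most frequently measured bitstring and decoding it back to a candidate index. The sketch below assumes results arrive as a bitstring-to-count dict, with bit i read left to right as $x_i$. Many SDKs order qubits the other way (qubit 0 rightmost), so check your backend's convention. `most_probable_candidate` is hypothetical, and the counts are made up for illustration.

```python
def most_probable_candidate(counts, num_candidates):
    """Decode the most frequently measured bitstring into a candidate index.

    Assumes bit i (left to right) is x_i. Returns None if the winning
    bitstring is not a valid one-hot selection of a single candidate.
    """
    best = max(counts, key=counts.get)
    if len(best) != num_candidates or best.count("1") != 1:
        return None
    return best.index("1")


# Hypothetical measurement counts for K=4 candidates.
counts = {"0010": 610, "0100": 190, "0110": 80, "0001": 70, "1000": 50}
print(most_probable_candidate(counts, 4))  # 2
```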
Pricing
| Tier | Included In | Extra Cost |
|---|---|---|
| Classic | All plans | Free |
| Quantum CPU | Pro, Hybrid, Enterprise | Free |
| Quantum GPU | Hybrid, Enterprise | $0.001/request |
Next Steps
- API Reference - Reranking parameters
- Training - Improve base model quality
- QCOS Integration - Direct quantum access