
Quantum Reranking

SynapseX's quantum reranking system uses QAOA (Quantum Approximate Optimization Algorithm) to select the best response from multiple candidates, improving response quality by up to 25% compared with classic reranking.

Overview

Traditional LLM inference generates a single response. Quantum reranking:

  1. Generates K candidates with varied temperatures
  2. Scores each candidate on multiple quality metrics
  3. Uses QAOA optimization to select the best response
  4. Returns the selected response with quality metadata

┌──────────────────────────────────────────────────────────────────────┐
│                      QUANTUM RERANKING PIPELINE                      │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│                             User Prompt                              │
│                                  │                                   │
│                                  ▼                                   │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │                  CANDIDATE GENERATION (K=4)                  │   │
│   │                                                              │   │
│   │   ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐   │   │
│   │   │  T=0.5  │    │  T=0.7  │    │  T=0.9  │    │  T=1.1  │   │   │
│   │   │Response1│    │Response2│    │Response3│    │Response4│   │   │
│   │   └────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘   │   │
│   │        │              │              │              │        │   │
│   └────────┼──────────────┼──────────────┼──────────────┼────────┘   │
│            │              │              │              │            │
│            ▼              ▼              ▼              ▼            │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │                       QUALITY SCORING                        │   │
│   │                                                              │   │
│   │         • Coherence     • Relevance     • Factuality         │   │
│   │         • Completeness  • Safety        • Entropy            │   │
│   │                                                              │   │
│   └──────────────────────────────────────────────────────────────┘   │
│                                  │                                   │
│                                  ▼                                   │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │                      QAOA OPTIMIZATION                       │   │
│   │                                                              │   │
│   │            ┌────────────────────────────────────┐            │   │
│   │            │     Quantum Circuit (QCOS API)     │            │   │
│   │            │                                    │            │   │
│   │            │     |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▒     │            │   │
│   │            │     |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▒     │            │   │
│   │            │     |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▒     │            │   │
│   │            │     |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▒     │            │   │
│   │            │                                    │            │   │
│   │            └────────────────────────────────────┘            │   │
│   │                                                              │   │
│   └──────────────────────────────────────────────────────────────┘   │
│                                  │                                   │
│                                  ▼                                   │
│                      Best Response + Metadata                        │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

How It Works

1. Candidate Generation

Multiple responses are generated with varied temperatures:

# Internal process (simplified)
candidates = []
for temp in [0.5, 0.7, 0.9, 1.1]:
    response = model.generate(prompt, temperature=temp)
    candidates.append(response)

This creates diversity in responses while maintaining coherence.

2. Quality Scoring

Each candidate is scored on multiple dimensions:

| Metric | Weight | Description |
|---|---|---|
| Coherence | 0.25 | Logical flow and structure |
| Relevance | 0.30 | Match to user query |
| Factuality | 0.20 | Accuracy of information |
| Completeness | 0.15 | Coverage of the topic |
| Safety | 0.10 | Absence of harmful content |
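
As a rough sketch of how the weights above might combine into a single score, the following assumes each metric is already normalized to the range [0, 1] and that the overall score is a plain weighted sum. The function name `weighted_quality_score` and the per-metric values are hypothetical and not part of the SynapseX API.

```python
# Default weights from the table above; they sum to 1.0.
DEFAULT_WEIGHTS = {
    "coherence": 0.25,
    "relevance": 0.30,
    "factuality": 0.20,
    "completeness": 0.15,
    "safety": 0.10,
}


def weighted_quality_score(metrics, weights=DEFAULT_WEIGHTS):
    """Combine per-metric scores in [0, 1] into one weighted score.

    Hypothetical helper: assumes a simple weighted sum, which the
    documentation does not confirm.
    """
    return sum(weights[name] * metrics[name] for name in weights)


# Made-up metric values for one candidate, purely for illustration.
example_metrics = {
    "coherence": 0.9,
    "relevance": 0.8,
    "factuality": 1.0,
    "completeness": 0.6,
    "safety": 1.0,
}
example_score = weighted_quality_score(example_metrics)
```

Because the weights sum to 1.0, the combined score stays in the same [0, 1] range as the individual metrics.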

3. QAOA Optimization

The selection problem is formulated as a QUBO (Quadratic Unconstrained Binary Optimization):

$$H = -\sum_i w_i x_i + \lambda \left(\sum_i x_i - 1\right)^2$$

Where:

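  • $x_i \in \{0, 1\}$ indicates whether candidate $i$ is selected
  • $w_i$ is the quality score of candidate $i$
  • $\lambda$ is a penalty weight that enforces the single-selection constraint

The QAOA circuit searches for the bit string $x$ that minimizes $H$. Because the quality term enters with a negative sign, minimizing $H$ corresponds to selecting the single candidate with the highest quality score.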
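
To make the objective concrete, the sketch below evaluates $H$ by brute force over all $2^4$ bit strings for four example scores. It is a classical check of the formulation, not QAOA itself; the function names and the choice $\lambda = 2$ are illustrative assumptions. With non-negative scores, any $\lambda$ larger than the largest score is sufficient for the minimum to be a one-hot selection of the best candidate.

```python
from itertools import product


def selection_energy(x, scores, lam):
    """H = -sum_i w_i x_i + lam * (sum_i x_i - 1)^2 for a bit tuple x."""
    reward = sum(w * xi for w, xi in zip(scores, x))
    penalty = lam * (sum(x) - 1) ** 2
    return -reward + penalty


def brute_force_minimum(scores, lam):
    """Return the bit tuple with the lowest energy (exhaustive search)."""
    candidates = product((0, 1), repeat=len(scores))
    return min(candidates, key=lambda x: selection_energy(x, scores, lam))


scores = [0.78, 0.85, 0.92, 0.81]  # example scores, chosen for illustration
best = brute_force_minimum(scores, lam=2.0)
# best is the one-hot tuple (0, 0, 1, 0): only candidate index 2 is selected
```

Exhaustive search is only practical for small K; it is shown here because it makes the role of the penalty term easy to inspect.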

4. Response Selection

The candidate with the highest QAOA probability is selected and returned with metadata:

{
  "choices": [{"message": {"content": "Selected response..."}}],
  "meta": {
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "quality_score": 0.92,
    "selected_index": 2,
    "scores": [0.78, 0.85, 0.92, 0.81]
  }
}
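
The metadata can be inspected on the client side. The sketch below parses the sample payload above with the standard `json` module and reports how far the selected candidate's score is ahead of the runner-up. The helper name `summarize_rerank_meta` is hypothetical, and real responses may carry additional fields.

```python
import json

SAMPLE = """
{
  "choices": [{"message": {"content": "Selected response..."}}],
  "meta": {
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "quality_score": 0.92,
    "selected_index": 2,
    "scores": [0.78, 0.85, 0.92, 0.81]
  }
}
"""


def summarize_rerank_meta(payload):
    """Return (tier, selected_index, margin over the runner-up score)."""
    meta = payload["meta"]
    scores = meta["scores"]
    selected = meta["selected_index"]
    runner_up = max(s for i, s in enumerate(scores) if i != selected)
    return meta["rerank_tier"], selected, scores[selected] - runner_up


payload = json.loads(SAMPLE)
tier, selected, margin = summarize_rerank_meta(payload)
```

A small margin may be a signal that the candidates were of similar quality and that a cheaper tier would have produced a comparable result.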

Reranking Tiers

Classic

Heuristic-based selection using weighted scoring.

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "classic"}
)

| Aspect | Value |
|---|---|
| Method | Weighted heuristic |
| Latency | +5ms |
| Quality Improvement | Baseline |
| Cost | Included in plan |

Best for: Low-latency applications, high-volume usage.


Quantum CPU

QAOA executed via QCOS API on quantum simulators.

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "quantum_cpu"}
)

| Aspect | Value |
|---|---|
| Method | QAOA (simulated) |
| Latency | +50ms |
| Quality Improvement | +15% vs classic |
| Cost | Included in Pro+ |

Best for: Balanced quality and latency, typical production use.


Quantum GPU

QAOA executed on LUMI supercomputer GPUs for maximum quality.

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "quantum_gpu"}
)

| Aspect | Value |
|---|---|
| Method | QAOA (GPU-accelerated) |
| Latency | +100ms |
| Quality Improvement | +25% vs classic |
| Cost | Extra fee on Hybrid+ |

Best for: Maximum quality, critical responses, enterprise use.


Configuration Options

Candidate Count

Control how many candidates to generate:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 8  # Generate 8 candidates (default: 4)
    }
)

| K Value | Latency Impact | Quality Impact |
|---|---|---|
| 2 | Low | Minimal |
| 4 | Medium | Good (default) |
| 8 | High | Better |
| 16 | Very High | Maximum |

Temperature Range

Customize the temperature spread:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4,
        "temp_min": 0.3,
        "temp_max": 1.5
    }
)
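
The documentation does not state how the K temperatures are distributed between `temp_min` and `temp_max`. One plausible scheme, consistent with the default values 0.5, 0.7, 0.9, 1.1 used for K=4, is even spacing. The sketch below assumes that, and `temperature_schedule` is a hypothetical helper rather than a SynapseX function.

```python
def temperature_schedule(k, temp_min, temp_max):
    """Return k evenly spaced temperatures from temp_min to temp_max.

    Assumption: linear spacing with both endpoints included.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return [temp_min]
    step = (temp_max - temp_min) / (k - 1)
    # Round to suppress floating-point noise such as 0.7000000000000001.
    return [round(temp_min + i * step, 4) for i in range(k)]


default_like = temperature_schedule(4, 0.5, 1.1)
wide_spread = temperature_schedule(4, 0.3, 1.5)
```

Under this assumption, widening the range while keeping K fixed increases the gap between neighbouring temperatures, so the candidates are likely to differ more from one another.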

Quality Weights

Adjust quality metric weights (Enterprise only):

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "quality_weights": {
            "coherence": 0.20,
            "relevance": 0.40,
            "factuality": 0.25,
            "completeness": 0.10,
            "safety": 0.05
        }
    }
)
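
All weight sets shown on this page sum to 1.0. Whether the API enforces this is not stated, so a client-side check such as the following may be a reasonable precaution before sending a request. `validate_quality_weights` and its tolerance are assumptions for illustration only.

```python
import math

KNOWN_METRICS = {"coherence", "relevance", "factuality", "completeness", "safety"}


def validate_quality_weights(weights, tol=1e-9):
    """Raise ValueError if weights use unknown metrics, are negative,
    or do not sum to 1.0 (within tol). Returns the weights unchanged."""
    unknown = set(weights) - KNOWN_METRICS
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return weights


custom = validate_quality_weights({
    "coherence": 0.20,
    "relevance": 0.40,
    "factuality": 0.25,
    "completeness": 0.10,
    "safety": 0.05,
})
```

The validated dictionary can then be passed as the `quality_weights` value in `extra_body`.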

Use Cases

Medical/Healthcare

High factuality requirements:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "system", "content": "You are a medical information assistant."},
        {"role": "user", "content": "What are the side effects of metformin?"}
    ],
    extra_body={
        "use_rerank": "quantum_gpu",
        "rerank_k": 8,
        "quality_weights": {
            "factuality": 0.50,
            "relevance": 0.30,
            "completeness": 0.15,
            "safety": 0.05
        }
    }
)

Creative Writing

Higher diversity:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "user", "content": "Write a short story about a time traveler."}
    ],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 8,
        "temp_min": 0.7,
        "temp_max": 1.3,
        "quality_weights": {
            "coherence": 0.35,
            "completeness": 0.30,
            "relevance": 0.25,
            "factuality": 0.05,
            "safety": 0.05
        }
    }
)

Customer Support

Balanced quality:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "system", "content": "You are a customer support agent."},
        {"role": "user", "content": "How do I return a product?"}
    ],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4
    }
)

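Performance Benchmarks

Quality Improvement

Measured on standard benchmarks. Gains in parentheses are absolute differences from Classic, in percentage points (pp):

| Benchmark | Classic | Quantum CPU | Quantum GPU |
|---|---|---|---|
| MMLU | 72.3% | 76.8% (+4.5 pp) | 79.2% (+6.9 pp) |
| TruthfulQA | 45.2% | 52.1% (+6.9 pp) | 56.8% (+11.6 pp) |
| HumanEval | 48.5% | 52.3% (+3.8 pp) | 54.1% (+5.6 pp) |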
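
The gains in parentheses are absolute differences in percentage points. The relative improvement over Classic can be derived from the same figures, as in this short sketch (the variable and function names are illustrative). On these numbers, the TruthfulQA relative gains work out to roughly 15% and 26%, in line with the per-tier improvement figures quoted earlier, while the relative gains on MMLU and HumanEval are smaller.

```python
BENCHMARKS = {
    # benchmark: (classic, quantum_cpu, quantum_gpu), accuracy in percent
    "MMLU": (72.3, 76.8, 79.2),
    "TruthfulQA": (45.2, 52.1, 56.8),
    "HumanEval": (48.5, 52.3, 54.1),
}


def gains(baseline, value):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = value - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for name, (classic, cpu, gpu) in BENCHMARKS.items():
    print(name, "CPU:", gains(classic, cpu), "GPU:", gains(classic, gpu))
```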

Latency Distribution

| Tier | P50 | P95 | P99 |
|---|---|---|---|
| Classic | 5ms | 8ms | 12ms |
| Quantum CPU | 48ms | 65ms | 85ms |
| Quantum GPU | 95ms | 130ms | 180ms |

Integration with QCOS

SynapseX quantum reranking is powered by QCOS:

# Behind the scenes, SynapseX calls QCOS
from softqcos_sdk import QCOSClient
from softqcos.algorithms import QAOA

softqcos = QCOSClient()

# QUBO formulation from quality scores
# (scores: per-candidate quality scores from the scoring step)
qubo = formulate_selection_qubo(scores, constraint=1)

# Run QAOA
result = softqcos.run_qaoa(qubo, p=3, backend="simulator")

# Get best solution
best_candidate = result.most_probable_state

For direct QCOS access, see QCOS Documentation.
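
The helper `formulate_selection_qubo` is not defined on this page. One way it could be written follows from expanding the objective $H$ from the QAOA Optimization section using $x_i^2 = x_i$: for a single selection, the linear terms become $-(w_i + \lambda)$ on the diagonal, each pair $i < j$ picks up $2\lambda$, and a constant $\lambda$ is left over. The sketch below builds that upper-triangular matrix in plain Python. The return format is an assumption, `penalty` is a hypothetical parameter, and the real SDK may expect a different structure.

```python
def formulate_selection_qubo(scores, constraint=1, penalty=2.0):
    """Build an upper-triangular QUBO matrix Q and constant offset so that
    H(x) = sum_{i<=j} Q[i][j] * x_i * x_j + offset
         = -sum_i w_i x_i + penalty * (sum_i x_i - constraint)^2.
    """
    n = len(scores)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Linear terms sit on the diagonal because x_i^2 == x_i for bits.
        q[i][i] = -scores[i] + penalty * (1 - 2 * constraint)
        for j in range(i + 1, n):
            q[i][j] = 2.0 * penalty  # pairwise penalty for co-selection
    offset = penalty * constraint ** 2
    return q, offset


def qubo_energy(q, offset, x):
    """Evaluate the QUBO energy for a bit sequence x."""
    n = len(x)
    return offset + sum(
        q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n)
    )


q, offset = formulate_selection_qubo([0.78, 0.85, 0.92, 0.81])
energy_best = qubo_energy(q, offset, (0, 0, 1, 0))
```

Keeping the constant offset separate is a design choice: it does not change which bit string is optimal, but it lets the matrix energies be compared directly with the original formula.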


Pricing

| Tier | Included In | Extra Cost |
|---|---|---|
| Classic | All plans | Free |
| Quantum CPU | Pro, Hybrid, Enterprise | Free |
| Quantum GPU | Hybrid, Enterprise | $0.001/request |

Next Steps