Quantum Reranking

SynapseX's quantum reranking system uses QAOA (Quantum Approximate Optimization Algorithm) to select the best response from multiple candidates, improving response quality by up to 25% over classic reranking (Quantum GPU tier).

Overview

Traditional LLM inference generates a single response. Quantum reranking instead:

1. Generates K candidate responses at varied sampling temperatures
2. Scores each candidate on multiple quality metrics
3. Uses QAOA optimization to select the best candidate
4. Returns the selected response with quality metadata

┌─────────────────────────────────────────────────────────────────────────┐
│                       QUANTUM RERANKING PIPELINE                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   User Prompt                                                           │
│        │                                                                │
│        ▼                                                                │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                   CANDIDATE GENERATION (K=4)                    │   │
│   │                                                                 │   │
│   │   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐            │   │
│   │   │ T=0.5   │  │ T=0.7   │  │ T=0.9   │  │ T=1.1   │            │   │
│   │   │Response1│  │Response2│  │Response3│  │Response4│            │   │
│   │   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘            │   │
│   │        │            │            │            │                 │   │
│   └────────┼────────────┼────────────┼────────────┼─────────────────┘   │
│            │            │            │            │                     │
│            ▼            ▼            ▼            ▼                     │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                         QUALITY SCORING                         │   │
│   │                                                                 │   │
│   │   • Coherence      • Relevance      • Factuality                │   │
│   │   • Completeness   • Safety         • Entropy                   │   │
│   │                                                                 │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                    │                                    │
│                                    ▼                                    │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                        QAOA OPTIMIZATION                        │   │
│   │                                                                 │   │
│   │            ┌───────────────────────────────────────┐            │   │
│   │            │      Quantum Circuit (QCOS API)       │            │   │
│   │            │                                       │            │   │
│   │            │   |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▢          │            │   │
│   │            │   |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▢          │            │   │
│   │            │   |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▢          │            │   │
│   │            │   |0⟩ ─[H]─[RZ]─[ZZ]─[RX]─ ▢          │            │   │
│   │            │                                       │            │   │
│   │            └───────────────────────────────────────┘            │   │
│   │                                                                 │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                    │                                    │
│                                    ▼                                    │
│                        Best Response + Metadata                         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

How It Works

1. Candidate Generation

Multiple responses are generated with varied temperatures:

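# Internal process (simplified)
# model and prompt stand in for SynapseX internals; this is not a public API.
candidates = []
for temp in [0.5, 0.7, 0.9, 1.1]:  # default schedule for K=4
    response = model.generate(prompt, temperature=temp)
    candidates.append(response)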

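Spreading the temperatures produces a diverse candidate pool. Low-temperature candidates stay conservative, while higher-temperature candidates explore more varied phrasings. The scoring step then penalizes any candidate whose coherence suffers.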

2. Quality Scoring

Each candidate is scored on multiple dimensions:

Metric	Weight	Description
Coherence	0.25	Logical flow and structure
Relevance	0.30	Match to user query
Factuality	0.20	Accuracy of information
Completeness	0.15	Coverage of the topic
Safety	0.10	Absence of harmful content
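
The sketch below shows one way the per-metric scores could be combined into the single quality score $w_i$, using the default weights above. It makes three assumptions. First, each metric is already normalized to [0, 1]. Second, the classic tier simply picks the highest weighted score. Third, the function names and metric values are hypothetical and not part of the SynapseX API.

```python
# Default metric weights from the table above (they sum to 1.0).
DEFAULT_WEIGHTS = {
    "coherence": 0.25,
    "relevance": 0.30,
    "factuality": 0.20,
    "completeness": 0.15,
    "safety": 0.10,
}


def weighted_score(metrics: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine per-metric scores (assumed to lie in [0, 1]) into one quality score."""
    # A metric missing from `metrics` contributes 0 (an assumption of this sketch).
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items())


def classic_select(all_metrics: list[dict[str, float]]) -> int:
    """Hypothetical classic-tier selection: index of the highest weighted score."""
    scores = [weighted_score(m) for m in all_metrics]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical metric values for two candidates.
candidates = [
    {"coherence": 0.9, "relevance": 0.7, "factuality": 0.8, "completeness": 0.6, "safety": 1.0},
    {"coherence": 0.8, "relevance": 0.9, "factuality": 0.9, "completeness": 0.8, "safety": 1.0},
]
print([round(weighted_score(m), 3) for m in candidates])
print(classic_select(candidates))
```

The second candidate wins mainly because relevance carries the largest weight (0.30).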

3. QAOA Optimization

The selection problem is formulated as a QUBO (Quadratic Unconstrained Binary Optimization):

$$H = -\sum_i w_i x_i + \lambda \left(\sum_i x_i - 1\right)^2$$

Where:

$x_i \in \{0, 1\}$ indicates whether candidate $i$ is selected
$w_i$ is the quality score of candidate $i$
$\lambda > 0$ is a penalty weight that enforces the single-selection constraint
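
Expanding the penalty term makes the QUBO coefficients explicit. Because each $x_i$ is binary, $x_i^2 = x_i$, so

$$\left(\sum_i x_i - 1\right)^2 = -\sum_i x_i + 2\sum_{i<j} x_i x_j + 1$$

and therefore

$$H = \sum_i \left(-w_i - \lambda\right) x_i + 2\lambda \sum_{i<j} x_i x_j + \lambda$$

Selecting no candidate gives $H = \lambda$, and selecting exactly one candidate $i$ gives $H = -w_i$. For a set $S$ of $k \ge 2$ candidates, with $w_{\max} = \max_i w_i$:

$$H_S = -\sum_{i \in S} w_i + \lambda (k-1)^2 \;\ge\; -k\,w_{\max} + \lambda (k-1) \;>\; -w_{\max} \quad \text{if } \lambda > w_{\max} \ge 0$$

Assuming non-negative scores, any penalty $\lambda > w_{\max}$ therefore makes "select only the highest-scoring candidate" the global minimum of $H$. This bound is a derivation for illustration, not a documented SynapseX setting.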

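The QAOA circuit searches for the assignment $x$ that minimizes $H$. When $\lambda$ is large enough, the minimizer selects the single candidate with the highest quality score. Because QAOA is approximate, its output is a probability distribution over assignments rather than a guaranteed optimum.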
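
To sanity-check the formulation on a small case, the sketch below builds the QUBO coefficients from a list of scores. It then minimizes $H$ by brute force over all $2^K$ assignments. Brute force stands in for QAOA here: it is exact, but only practical for small K. The helper names are hypothetical and are not part of SynapseX or QCOS. The default penalty max(scores) + 1.0 is an assumption chosen to satisfy $\lambda > \max_i w_i$.

```python
from itertools import product


def build_selection_qubo(scores, lam=None):
    """Return (linear, quadratic, offset) for H = -sum w_i x_i + lam * (sum x_i - 1)^2."""
    if lam is None:
        # Any lam > max(scores) makes the single best candidate the global minimum.
        lam = max(scores) + 1.0
    n = len(scores)
    linear = [-w - lam for w in scores]  # coefficient of x_i (uses x_i^2 = x_i)
    quadratic = {(i, j): 2 * lam for i in range(n) for j in range(i + 1, n)}  # x_i x_j, i < j
    return linear, quadratic, lam  # constant term equals lam


def energy(x, linear, quadratic, offset):
    """Evaluate the QUBO energy for a 0/1 assignment x."""
    h = offset + sum(c * xi for c, xi in zip(linear, x))
    h += sum(c * x[i] * x[j] for (i, j), c in quadratic.items())
    return h


def brute_force_minimum(scores):
    """Exhaustively search all 2^K assignments (exact, small K only)."""
    linear, quadratic, offset = build_selection_qubo(scores)
    return min(product((0, 1), repeat=len(scores)),
               key=lambda x: energy(x, linear, quadratic, offset))


scores = [0.78, 0.85, 0.92, 0.81]
print(brute_force_minimum(scores))  # → (0, 0, 1, 0)
```

The minimizer is the one-hot assignment for candidate 2, the highest score, which is what a well-converged QAOA run should concentrate its probability on.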

4. Response Selection

The candidate with the highest QAOA probability is selected and returned with metadata. In this example, selected_index is zero-based and points to the candidate scored 0.92:

{
  "choices": [{"message": {"content": "Selected response..."}}],
  "meta": {
    "rerank_tier": "quantum_cpu",
    "candidates": 4,
    "quality_score": 0.92,
    "selected_index": 2,
    "scores": [0.78, 0.85, 0.92, 0.81]
  }
}
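
The selection rule can be sketched as follows. The sketch assumes the QAOA result arrives as a mapping from bitstrings to measured probabilities. It keeps only one-hot bitstrings, since those are the valid single selections, and falls back to the highest classical score if none was observed. Three details are assumptions rather than documented SynapseX behavior: the fallback, the bitstring ordering (character i = candidate i), and the function name.

```python
def select_candidate(probabilities: dict[str, float], scores: list[float]) -> dict:
    """Pick the most probable one-hot bitstring and build response metadata."""
    # Keep only valid single selections: exactly one '1' in the bitstring.
    one_hot = {bits: p for bits, p in probabilities.items() if bits.count("1") == 1}
    if one_hot:
        best_bits = max(one_hot, key=one_hot.get)
        index = best_bits.index("1")  # assumes character i corresponds to candidate i
    else:
        # Hypothetical fallback: behave like the classic tier.
        index = max(range(len(scores)), key=scores.__getitem__)
    return {
        "candidates": len(scores),
        "quality_score": scores[index],
        "selected_index": index,
        "scores": scores,
    }


# Illustrative (made-up) measurement probabilities for K = 4.
probs = {"0010": 0.61, "0100": 0.17, "0001": 0.09, "1000": 0.05, "0110": 0.08}
meta = select_candidate(probs, [0.78, 0.85, 0.92, 0.81])
print(meta["selected_index"], meta["quality_score"])  # → 2 0.92
```

Bit ordering conventions differ between quantum toolkits (some print qubit 0 as the rightmost character), so check the convention of the backend you use before mapping bitstrings to indices.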

Reranking Tiers

Classic

Heuristic-based selection using weighted scoring.

# client: an OpenAI-compatible client configured for the SynapseX API (setup not shown)
response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "classic"}
)

Aspect	Value
Method	Weighted heuristic
Latency	+5ms
Quality Improvement	Baseline
Cost	Included in plan

Best for: Low-latency applications, high-volume usage.

Quantum CPU

QAOA executed via QCOS API on quantum simulators.

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "quantum_cpu"}
)

Aspect	Value
Method	QAOA (simulated)
Latency	+50ms
Quality Improvement	+15% vs classic
Cost	Included in Pro+

Best for: Balanced quality and latency, typical production use.

Quantum GPU

QAOA executed on LUMI supercomputer GPUs for maximum quality.

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={"use_rerank": "quantum_gpu"}
)

Aspect	Value
Method	QAOA (GPU-accelerated)
Latency	+100ms
Quality Improvement	+25% vs classic
Cost	Extra fee on Hybrid+

Best for: Maximum quality, critical responses, enterprise use.

Configuration Options

Candidate Count

Control how many candidates to generate:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 8  # Generate 8 candidates (default: 4)
    }
)

K Value	Latency Impact	Quality Impact
2	Low	Minimal
4	Medium	Good (default)
8	High	Better
16	Very High	Maximum
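
The QUBO has one binary variable per candidate, so QAOA needs one qubit per candidate; the pipeline diagram shows four qubits for K=4. A full classical statevector simulation therefore tracks 2^K complex amplitudes. The arithmetic below assumes 16-byte (complex128) amplitudes and describes a generic simulator, not the QCOS implementation. It covers only the size of the QAOA state, not the cost of generating and scoring K candidates. The helper name is hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # complex128 (two 64-bit floats); assumes double precision


def qaoa_problem_size(k: int) -> dict[str, int]:
    """Resources for a K-candidate selection QUBO simulated as a full statevector."""
    states = 2 ** k  # one qubit per candidate -> 2^K basis states
    return {
        "qubits": k,
        "basis_states": states,
        "valid_selections": k,  # only one-hot bitstrings select exactly one candidate
        "statevector_bytes": states * BYTES_PER_AMPLITUDE,
    }


for k in (2, 4, 8, 16):
    size = qaoa_problem_size(k)
    print(f"K={k:>2}: {size['basis_states']:>6} states, "
          f"{size['valid_selections']} valid, {size['statevector_bytes']:>8} bytes")
```

Even at K = 16 the full statevector is 1 MiB, and only 16 of its 65,536 basis states are valid selections.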

Temperature Range

Customize the temperature spread:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4,
        "temp_min": 0.3,
        "temp_max": 1.5
    }
)
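
One plausible way to turn rerank_k, temp_min, and temp_max into per-candidate temperatures is to space them evenly between the two bounds. Even spacing is an assumption, chosen because it reproduces the default schedule [0.5, 0.7, 0.9, 1.1] from the candidate generation step when K=4, temp_min=0.5, and temp_max=1.1. This page does not document the spacing SynapseX actually uses, and the function name is hypothetical.

```python
def temperature_schedule(k: int, temp_min: float, temp_max: float) -> list[float]:
    """Evenly spaced sampling temperatures from temp_min to temp_max (inclusive)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return [temp_min]  # a single candidate just uses the lower bound (assumption)
    step = (temp_max - temp_min) / (k - 1)
    # Round to suppress floating-point noise such as 0.9000000000000001.
    return [round(temp_min + i * step, 6) for i in range(k)]


print(temperature_schedule(4, 0.5, 1.1))  # → [0.5, 0.7, 0.9, 1.1]
print(temperature_schedule(4, 0.3, 1.5))  # → [0.3, 0.7, 1.1, 1.5]
```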

Quality Weights

Adjust quality metric weights (Enterprise only):

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "use_rerank": "quantum_cpu",
        "quality_weights": {
            "coherence": 0.20,
            "relevance": 0.40,
            "factuality": 0.25,
            "completeness": 0.10,
            "safety": 0.05
        }
    }
)
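
Every quality_weights example on this page sums to 1.0. The sketch below is a client-side check you could run before sending a request. Two of its rules are assumptions inferred from the examples, not documented API rules: that weights must sum to 1.0, and that unknown metric names are rejected. validate_quality_weights is a hypothetical helper.

```python
import math

KNOWN_METRICS = {"coherence", "relevance", "factuality", "completeness", "safety"}


def validate_quality_weights(weights: dict[str, float], tol: float = 1e-6) -> dict[str, float]:
    """Raise ValueError if the weights look malformed; otherwise return them unchanged."""
    unknown = set(weights) - KNOWN_METRICS
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"weights sum to {total:.4f}, expected 1.0")
    return weights


enterprise_weights = validate_quality_weights({
    "coherence": 0.20, "relevance": 0.40, "factuality": 0.25,
    "completeness": 0.10, "safety": 0.05,
})
```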

Use Cases

Medical/Healthcare

High factuality requirements:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "system", "content": "You are a medical information assistant."},
        {"role": "user", "content": "What are the side effects of metformin?"}
    ],
    extra_body={
        "use_rerank": "quantum_gpu",
        "rerank_k": 8,
        "quality_weights": {
            "factuality": 0.50,
            "relevance": 0.30,
            "completeness": 0.15,
            "safety": 0.05
        }
    }
)

Creative Writing

Higher sampling temperatures for more varied candidates, with factuality down-weighted:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "user", "content": "Write a short story about a time traveler."}
    ],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 8,
        "temp_min": 0.7,
        "temp_max": 1.3,
        "quality_weights": {
            "coherence": 0.35,
            "completeness": 0.30,
            "relevance": 0.25,
            "factuality": 0.05,
            "safety": 0.05
        }
    }
)

Customer Support

Balanced quality:

response = client.chat.completions.create(
    model="synapsex-chat",
    messages=[
        {"role": "system", "content": "You are a customer support agent."},
        {"role": "user", "content": "How do I return a product?"}
    ],
    extra_body={
        "use_rerank": "quantum_cpu",
        "rerank_k": 4
    }
)

Performance Benchmarks

Quality Improvement

Measured on standard benchmarks. Values in parentheses are absolute gains over Classic, in percentage points (pp):

Benchmark	Classic	Quantum CPU	Quantum GPU
MMLU	72.3%	76.8% (+4.5 pp)	79.2% (+6.9 pp)
TruthfulQA	45.2%	52.1% (+6.9 pp)	56.8% (+11.6 pp)
HumanEval	48.5%	52.3% (+3.8 pp)	54.1% (+5.6 pp)
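
The gains in the table are differences in benchmark score, that is, percentage points. Dividing by the Classic score gives the relative improvement, which is a larger number. For example, the TruthfulQA Quantum GPU gain of 11.6 points over a 45.2% baseline is about 25.7% relative. The snippet only re-derives numbers from the table above; the helper name is hypothetical.

```python
# Benchmark scores (%) copied from the table above: (classic, quantum_cpu, quantum_gpu).
RESULTS = {
    "MMLU": (72.3, 76.8, 79.2),
    "TruthfulQA": (45.2, 52.1, 56.8),
    "HumanEval": (48.5, 52.3, 54.1),
}


def gains(baseline: float, score: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = score - baseline
    return round(absolute, 1), round(100 * absolute / baseline, 1)


for name, (classic, cpu, gpu) in RESULTS.items():
    cpu_pp, cpu_rel = gains(classic, cpu)
    gpu_pp, gpu_rel = gains(classic, gpu)
    print(f"{name:<10} CPU +{cpu_pp} pp ({cpu_rel}%)  GPU +{gpu_pp} pp ({gpu_rel}%)")
```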

Latency Distribution

Reranking latency added per request:

Tier	P50	P95	P99
Classic	5ms	8ms	12ms
Quantum CPU	48ms	65ms	85ms
Quantum GPU	95ms	130ms	180ms

Integration with QCOS

SynapseX quantum reranking is powered by QCOS:

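# Behind the scenes, SynapseX calls QCOS (simplified)
from softqcos_sdk import QCOSClient
from softqcos.algorithms import QAOA

qcos = QCOSClient()  # named so it does not shadow the softqcos package

# QUBO formulation from the quality scores of step 2.
# formulate_selection_qubo is an internal SynapseX helper (not shown);
# constraint=1 requires exactly one candidate to be selected.
qubo = formulate_selection_qubo(scores, constraint=1)

# Run QAOA with p=3 layers on the simulator backend
result = qcos.run_qaoa(qubo, p=3, backend="simulator")

# Most probable measured state, expected to be one-hot over the K candidates
best_candidate = result.most_probable_state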

For direct QCOS access, see QCOS Documentation.
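
The following is a from-scratch p = 1 QAOA statevector simulation of the selection QUBO in plain Python, with no quantum SDK. It is not the QCOS implementation. It differs from the example above in three ways: it uses a single QAOA layer (the QCOS example uses p=3), it runs a coarse grid search over the angles γ and β instead of a classical optimizer, and it computes exact probabilities instead of measurement shots. All names are hypothetical.

```python
import cmath
import math
from itertools import product


def selection_energies(scores, lam):
    """H(x) = -sum w_i x_i + lam * (sum x_i - 1)^2 for every bitstring (bit i of the index = x_i)."""
    n = len(scores)
    energies = []
    for idx in range(2 ** n):
        x = [(idx >> i) & 1 for i in range(n)]
        energies.append(-sum(w * xi for w, xi in zip(scores, x)) + lam * (sum(x) - 1) ** 2)
    return energies


def qaoa_p1_probabilities(energies, n, gamma, beta):
    """One QAOA layer: cost phase exp(-i*gamma*H), then the RX mixer exp(-i*beta*X) on every qubit."""
    dim = 2 ** n
    state = [1 / math.sqrt(dim)] * dim  # |+>^n: uniform superposition
    state = [a * cmath.exp(-1j * gamma * e) for a, e in zip(state, energies)]
    c, s = math.cos(beta), math.sin(beta)
    for q in range(n):  # exp(-i*beta*X_q) mixes each basis state with its bit-q flip
        state = [c * state[k] - 1j * s * state[k ^ (1 << q)] for k in range(dim)]
    return [abs(a) ** 2 for a in state]


def run_qaoa_p1(scores, lam=None, steps=24):
    """Grid-search gamma and beta to minimize <H>; return (expected energy, probabilities)."""
    n = len(scores)
    lam = max(scores) + 1.0 if lam is None else lam
    energies = selection_energies(scores, lam)
    best = None
    for i, j in product(range(steps), repeat=2):
        gamma, beta = math.pi * i / steps, math.pi * j / (2 * steps)
        probs = qaoa_p1_probabilities(energies, n, gamma, beta)
        expectation = sum(p * e for p, e in zip(probs, energies))
        if best is None or expectation < best[0]:
            best = (expectation, probs)
    return best  # probabilities are indexed by the bitstring's integer value


scores = [0.78, 0.85, 0.92, 0.81]
expected_energy, probs = run_qaoa_p1(scores)
one_hot = {1 << i: i for i in range(len(scores))}  # basis index -> candidate index
best_index = one_hot[max(one_hot, key=lambda k: probs[k])]
print(round(expected_energy, 3), best_index)
```

A single layer only mildly biases the distribution toward low-energy states, which is why production runs use more layers (the p parameter) and a proper optimizer for the angles.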

Pricing

Tier	Included In	Extra Cost
Classic	All plans	Free
Quantum CPU	Pro, Hybrid, Enterprise	Free
Quantum GPU	Hybrid, Enterprise	$0.001/request
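
For budgeting, the Quantum GPU surcharge scales linearly with request volume. The small sketch below assumes the $0.001 fee is charged once per reranked request and that nothing else varies; the helper name is hypothetical.

```python
QUANTUM_GPU_FEE_USD = 0.001  # per request, from the pricing table


def monthly_gpu_rerank_cost(requests_per_day: int, days: int = 30) -> float:
    """Extra Quantum GPU reranking cost in USD, rounded to cents."""
    return round(requests_per_day * days * QUANTUM_GPU_FEE_USD, 2)


print(monthly_gpu_rerank_cost(10_000))  # → 300.0
```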

Next Steps

📖 API Reference - Reranking parameters
🎓 Training - Improve base model quality
⚛️ QCOS Integration - Direct quantum access
