Arrange Act Assert

Jag Reehal's thinking on things, mostly product development

Reranking: Improving Search Relevance with the AI SDK

23 Dec 2025

Reranking improves search relevance by reordering documents based on their relevance to a query. Unlike embedding-based similarity search, reranking models are specifically trained to understand the relationship between queries and documents, often producing more accurate relevance scores.


The Problem: Vector Search vs Query Relevance

Vector search finds semantically similar documents but may miss the most relevant one when the query uses different vocabulary. For example, a query like "How do I get a refund?" might match documents containing "billing" or "payment" (similar financial vocabulary) rather than the document that actually answers the question.
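To make the failure mode concrete, here is a toy sketch with hand-picked three-dimensional vectors standing in for real embeddings (the numbers and the "meaning" of each dimension are invented for illustration, not model output):

```typescript
// Toy "embeddings" to illustrate how cosine similarity rewards shared
// vocabulary. Dimensions loosely mean [finance-terms, account-terms, data-terms].
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const query = [0.9, 0.5, 0.0];           // "How do I get a refund?"
const billingDoc = [1.0, 0.2, 0.0];      // heavy financial vocabulary
const refundPolicyDoc = [0.5, 0.9, 0.1]; // actually answers the question

// The billing doc wins on vocabulary overlap even though the refund
// policy doc is the one that answers the question.
console.log(cosineSimilarity(query, billingDoc).toFixed(3));
console.log(cosineSimilarity(query, refundPolicyDoc).toFixed(3));
```

In this toy setup the billing document scores higher purely because its vector points in almost the same "financial vocabulary" direction as the query.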

Two-Stage Retrieval Pipeline

The recommended approach combines fast vector search with precise reranking:

┌───────────────────────────────────────────────────────────────┐
│                    All Documents (1000s)                      │
└──────────────────────────┬────────────────────────────────────┘
                           │
                           ▼
              ┌───────────────────────┐
              │   Vector Search       │
              │   (Fast, Broad)       │
              └──────────┬────────────┘
                         │
                         ▼
              ┌─────────────────────────┐
              │   Top 20-100 Candidates │
              └──────────┬──────────────┘
                         │
                         ▼
              ┌────────────────────────┐
              │   Reranking            │
              │   (Precise, Slower)    │
              └──────────┬─────────────┘
                         │
                         ▼
              ┌─────────────────────────┐
              │   Top 3-10 Results      │
              └─────────────────────────┘
  1. Broad retrieval: Vector search returns top 20-100 candidates quickly (tune for latency/recall)
  2. Precise reranking: Re-score candidates for query relevance using the rerank function
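The control flow of the pipeline above can be sketched as a generic function. The two scorer parameters are placeholders for real model calls (they are assumptions for this sketch, not AI SDK APIs):

```typescript
// Two-stage retrieval sketch. The scoring functions stand in for a real
// embedding model (stage 1) and reranking model (stage 2).
type Scorer = (query: string, doc: string) => number;

function twoStageRetrieve(
  query: string,
  documents: string[],
  vectorScore: Scorer,  // stage 1: fast, broad
  rerankScore: Scorer,  // stage 2: precise, slower
  candidateCount = 20,  // tune for latency/recall
  topN = 3,
): string[] {
  // Stage 1: broad retrieval — keep the top candidates by vector score.
  const candidates = documents
    .map(doc => ({ doc, score: vectorScore(query, doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, candidateCount);

  // Stage 2: precise reranking — re-score only the candidates.
  return candidates
    .map(({ doc }) => ({ doc, score: rerankScore(query, doc) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map(({ doc }) => doc);
}
```

In a real system, stage 1 would be a vector index lookup and stage 2 a rerank call; the point of the sketch is that the expensive scorer only ever sees the candidate set, not the whole corpus.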

Example: Customer Support Knowledge Base

Using a knowledge base with 6 articles, we compare vector search with reranking.

For clarity, examples below rerank small document sets directly. In production, rerank only the top candidates from vector search (see Two-Stage Retrieval Pipeline).

Vector Search Results

import { embed, cosineSimilarity } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const query = 'How do I get a refund?';
const queryEmbedding = await embed({
  model: ollama.textEmbeddingModel('nomic-embed-text'),
  value: query,
});

// Compare with pre-embedded articles (assumes articles and
// articleEmbeddings were prepared earlier, e.g. with embedMany)
const scores = articles.map((article, i) => ({
  article,
  score: cosineSimilarity(queryEmbedding.embedding, articleEmbeddings[i].embedding),
}));

const topResults = scores.sort((a, b) => b.score - a.score).slice(0, 3);

Results:

| Rank | Article | Score |
| ---- | ------- | ----- |
| 1 | Billing and Payment Methods | 0.6073 |
| 2 | Account Plan Changes ✓ correct | 0.5862 |
| 3 | Exporting Your Data | 0.5797 |

Vector search ranks "Billing and Payment Methods" #1 because it shares financial vocabulary, but it doesn't answer the refund question.

After Reranking

Using the rerank function:

import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const { ranking, rerankedDocuments } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: supportArticles.map(a => ({
    id: a.id,
    title: a.title,
    content: a.content,
  })),
  query: 'How do I get a refund?',
  topN: 3,
});

Results:

| Rank | Article | Score | Original Index |
| ---- | ------- | ----- | -------------- |
| 1 | Account Plan Changes (KB003) | 0.5376 | 2 |
| 2 | How to Reset Your Password (KB001) | 0.2920 | 0 |
| 3 | Billing and Payment Methods (KB002) | 0.2815 | 1 |

Reranking correctly identifies KB003 (which contains "Refunds are available within 14 days") as the most relevant result.

Comparison: Vector Search vs Reranking

Testing 7 queries designed to expose vocabulary mismatch issues:

| Query | Vector Search #1 | After Rerank #1 | Result |
| ----- | ---------------- | --------------- | ------ |
| "How do I get a refund?" | Billing and Payment Methods | Account Plan Changes | ✓ IMPROVED |
| "money back" | Billing and Payment Methods | Account Plan Changes | ✓ IMPROVED |
| "account protection options" | Two-Factor Authentication Setup ✓ | Two-Factor Authentication Setup ✓ | Same |
| "stop my account" | Account Plan Changes ✓ | Account Plan Changes ✓ | Same |
| "download all my info" | Exporting Your Data ✓ | Exporting Your Data ✓ | Same |
| "forgot my login" | How to Reset Your Password ✓ | How to Reset Your Password ✓ | Same |
| "how much can I call the API" | API Rate Limits ✓ | API Rate Limits ✓ | Same |

Reranking fixed both vocabulary-mismatch queries ("How do I get a refund?" and "money back") without disturbing the five queries vector search already answered correctly.

Working with Object Documents

Reranking supports structured documents (JSON objects), making it ideal for searching through databases, emails, or other structured content:

import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const emails = [
  { from: 'aws-billing@amazon.com', subject: 'Your AWS Invoice', date: '2024-12-15' },
  { from: 'team@linear.app', subject: 'Weekly Project Update', date: '2024-12-14' },
  { from: 'noreply@github.com', subject: 'Security alert: new sign-in', date: '2024-12-14' },
  { from: 'sales@datadog.com', subject: 'Your Datadog Quote - Enterprise Plan', date: '2024-12-13' },
  { from: 'no-reply@vercel.com', subject: 'Deployment failed: main branch', date: '2024-12-13' },
  { from: 'hr@company.com', subject: 'Holiday Schedule Reminder', date: '2024-12-12' },
];

const { ranking, rerankedDocuments } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: emails,
  query: 'Find the pricing quote from Datadog',
  topN: 3,
});

Results:

| Rank | From | Subject | Score |
| ---- | ---- | ------- | ----- |
| 1 | sales@datadog.com | Your Datadog Quote - Enterprise Plan | 0.6830 |
| 2 | aws-billing@amazon.com | Your AWS Invoice | 0.3238 |
| 3 | hr@company.com | Holiday Schedule Reminder | 0.1946 |

Use Case: RAG Context Selection

For RAG applications, reranking selects the most relevant documents to include in the LLM context:

import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const docs = [
  'The createClient() function initializes a new API client...',
  'Authentication uses Bearer tokens...',
  'Rate limiting returns HTTP 429 when exceeded...',
  'Webhooks are sent via POST to your configured endpoint...',
  'Error responses follow RFC 7807 Problem Details format...',
  'Pagination uses cursor-based navigation...',
];

const { rerankedDocuments, ranking } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: docs,
  query: 'How do I handle API errors?',
  topN: 2,
});

// Use rerankedDocuments for LLM context
const context = rerankedDocuments.join('\n\n');

Results:

| Rank | Document | Score |
| ---- | -------- | ----- |
| 1 | Error responses follow RFC 7807 Problem Details format... | 0.4662 |
| 2 | Rate limiting returns HTTP 429 when exceeded... | 0.4447 |

These top documents would be passed to the LLM for answer generation.

Understanding the Results

The rerank function returns a comprehensive result object:

import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const { ranking, rerankedDocuments, originalDocuments } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: supportArticles,
  query: 'How do I get a refund?',
  topN: 3,
});

Return value properties:

| Property | Description |
| -------- | ----------- |
| ranking | Sorted array of { originalIndex, score, document } |
| rerankedDocuments | Documents sorted by relevance (convenience property) |
| originalDocuments | Original documents array (unchanged) |

Each item in the ranking array contains the document's originalIndex in the input array, its relevance score, and the document itself.

Note: Scores are not comparable across models or providers; only relative ordering within a single rerank call matters.
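Given the shape described above, rerankedDocuments is simply the projection of ranking onto its document field. A minimal sketch of that relationship (the type here is assumed for illustration; check the AI SDK's own type definitions for the authoritative shape):

```typescript
// Assumed sketch of a ranking item, generic over the document type.
type RankingItem<DOC> = {
  originalIndex: number; // position in the input documents array
  score: number;         // relevance score, valid within this rerank call only
  document: DOC;         // the document itself
};

// rerankedDocuments is ranking projected onto its document field.
function toRerankedDocuments<DOC>(ranking: RankingItem<DOC>[]): DOC[] {
  return ranking.map(item => item.document);
}
```

This is why you can ignore rerankedDocuments entirely and work from ranking alone whenever you also need scores or original indices.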

Filtering by Score Threshold

In production, you may want to discard results with low relevance scores. Filter the ranking array to only include results above a threshold:

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents,
  query,
  topN: 10,
});

// Filter to only include results with score > 0.4
const relevantResults = ranking.filter(item => item.score > 0.4);
// Returns only documents with relevance score above threshold

Reranking Providers & Models

The AI SDK supports multiple reranking providers. You can use reranking models from:

Cohere

import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents,
  query,
  topN: 3,
});

Amazon Bedrock

import { rerank } from 'ai';
import { bedrock } from '@ai-sdk/amazon-bedrock';

const { ranking } = await rerank({
  model: bedrock.reranking('cohere.rerank-v3-5:0'),
  documents,
  query,
  topN: 3,
});

Ollama (Local)

import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const { ranking } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents,
  query,
  topN: 3,
});

Note: Some reranking models (e.g., Cohere rerankers) are cross-encoders that jointly score query–document pairs, while others (e.g., embedding-based rerankers) trade some accuracy for speed and locality. Cross-encoders typically provide higher accuracy but require API calls, while embedding-based rerankers can run locally with lower latency.
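The architectural difference can be sketched conceptually with toy functions (neither is a real model; the encoders below are placeholders for illustration):

```typescript
// Conceptual contrast between the two reranker families.
type Vec = number[];

const dot = (a: Vec, b: Vec) => a.reduce((s, x, i) => s + x * b[i], 0);

// Embedding-based (bi-encoder) reranker: query and document are encoded
// independently, then compared — the model never sees them together.
function biEncoderScore(embed: (text: string) => Vec, query: string, doc: string): number {
  return dot(embed(query), embed(doc));
}

// Cross-encoder: one model call over the pair, so the model can attend
// across query and document tokens jointly (more accurate, more expensive).
function crossEncoderScore(scorePair: (pair: string) => number, query: string, doc: string): number {
  return scorePair(`${query} [SEP] ${doc}`);
}
```

The practical consequence: bi-encoder document vectors can be precomputed and indexed, while a cross-encoder must run once per query–document pair at query time.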

Settings

Top-N Results

Use topN to limit the number of results returned. This is useful for retrieving only the most relevant documents:

import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents: ['doc1', 'doc2', 'doc3', 'doc4', 'doc5'],
  query: 'relevant information',
  topN: 3, // Return only top 3 most relevant documents
});

Provider Options

Reranking model settings can be configured using providerOptions for provider-specific parameters. This is particularly useful for handling long documents that might otherwise be truncated:

import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents: ['long document 1...', 'long document 2...'],
  query: 'relevant information',
  providerOptions: {
    cohere: {
      maxTokensPerDoc: 1000, // Limit tokens per document
    },
  },
});

Available options are provider-specific; check your provider's documentation for the full list.

Best Practices

  1. Use vector search for broad retrieval first: Retrieve top 20-100 candidates before reranking (tune for latency/recall trade-offs). In practice, reranking typically adds tens of milliseconds for 20–50 documents, depending on model and provider.

  2. Tune topN based on use case:

    • RAG: topN: 3 (small context window)
    • Search results: topN: 10
    • Auto-reply selection: topN: 1
  3. Monitor performance: Track click-through rates, RAG answer accuracy, and queries requiring refinement.

Summary

| Metric | Vector Search | With Reranking |
| ------ | ------------- | -------------- |
| Accuracy (top-1) | 71% (5/7) | 100% (7/7) |
| Vocabulary mismatch handling | Poor | Excellent |
| Speed | Fast | +tens of milliseconds for 20–50 docs |
| Best for | Broad retrieval | Final ranking |

Reranking improves search relevance by reordering documents based on query-document relationships rather than just semantic similarity. For search systems where precision matters, such as customer support, RAG, email, and documentation, reranking improves result quality with minimal code changes.

See the ai-sdk-reranking-example repository for runnable code.

ai ai-sdk reranking search