# Reranking: Improving Search Relevance with the AI SDK
23 Dec 2025

Reranking improves search relevance by reordering documents based on their relevance to a query. Unlike embedding-based similarity search, reranking models are specifically trained to understand the relationship between queries and documents, often producing more accurate relevance scores.

## The Problem: Vector Search vs Query Relevance
Vector search finds semantically similar documents but may miss the most relevant one when the query uses different vocabulary. For example, a query like "How do I get a refund?" might match documents containing "billing" or "payment" (similar financial vocabulary) rather than the document that actually answers the question.
## Two-Stage Retrieval Pipeline
The recommended approach combines fast vector search with precise reranking:
```
┌───────────────────────────────────────────────┐
│             All Documents (1000s)             │
└───────────────────────┬───────────────────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │     Vector Search     │
            │     (Fast, Broad)     │
            └───────────┬───────────┘
                        │
                        ▼
           ┌─────────────────────────┐
           │  Top 20-100 Candidates  │
           └────────────┬────────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │       Reranking       │
            │   (Precise, Slower)   │
            └───────────┬───────────┘
                        │
                        ▼
           ┌─────────────────────────┐
           │    Top 3-10 Results     │
           └─────────────────────────┘
```
- Broad retrieval: Vector search returns the top 20-100 candidates quickly (tune this number for latency/recall trade-offs)
- Precise reranking: Re-score those candidates for query relevance using the `rerank` function
## Example: Customer Support Knowledge Base
Using a knowledge base with 6 articles, we compare vector search with reranking.
For clarity, examples below rerank small document sets directly. In production, rerank only the top candidates from vector search (see Two-Stage Retrieval Pipeline).
### Vector Search Results
```typescript
import { embed, cosineSimilarity } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const query = 'How do I get a refund?';

const queryEmbedding = await embed({
  model: ollama.textEmbeddingModel('nomic-embed-text'),
  value: query,
});

// Compare with pre-embedded articles (`articles` and `articleEmbeddings`
// are assumed to be prepared ahead of time)
const scores = articles.map((article, i) => ({
  article,
  score: cosineSimilarity(queryEmbedding.embedding, articleEmbeddings[i].embedding),
}));

const topResults = scores.sort((a, b) => b.score - a.score).slice(0, 3);
```
Results:

| Rank | Article | Score | Correct? |
|---|---|---|---|
| 1 | Billing and Payment Methods | 0.6073 | |
| 2 | Account Plan Changes | 0.5862 | ✓ correct |
| 3 | Exporting Your Data | 0.5797 | |
Vector search ranks "Billing and Payment Methods" #1 because it shares financial vocabulary, but it doesn't answer the refund question.
### After Reranking
Using the `rerank` function:

```typescript
import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const { ranking, rerankedDocuments } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: supportArticles.map(a => ({
    id: a.id,
    title: a.title,
    content: a.content,
  })),
  query: 'How do I get a refund?',
  topN: 3,
});
```
Results:

| Rank | Article | Score | Original Index |
|---|---|---|---|
| 1 | Account Plan Changes (KB003) | 0.5376 | 2 |
| 2 | How to Reset Your Password (KB001) | 0.2920 | 0 |
| 3 | Billing and Payment Methods (KB002) | 0.2815 | 1 |
Reranking correctly identifies KB003 (which contains "Refunds are available within 14 days") as the most relevant result.
## Comparison: Vector Search vs Reranking
Testing 7 queries designed to expose vocabulary mismatch issues:

| Query | Vector Search #1 | After Rerank #1 | Result |
|---|---|---|---|
| "How do I get a refund?" | Billing and Payment Methods | Account Plan Changes ✓ | IMPROVED |
| "money back" | Billing and Payment Methods | Account Plan Changes ✓ | IMPROVED |
| "account protection options" | Two-Factor Authentication Setup ✓ | Two-Factor Authentication Setup ✓ | Same |
| "stop my account" | Account Plan Changes ✓ | Account Plan Changes ✓ | Same |
| "download all my info" | Exporting Your Data ✓ | Exporting Your Data ✓ | Same |
| "forgot my login" | How to Reset Your Password ✓ | How to Reset Your Password ✓ | Same |
| "how much can I call the API" | API Rate Limits ✓ | API Rate Limits ✓ | Same |
Results:
- Vector Search accuracy: 5/7 (71%)
- After Reranking: 7/7 (100%)
- Queries improved: 2 (both vocabulary mismatch cases)
## Working with Object Documents
Reranking supports structured documents (JSON objects), making it ideal for searching through databases, emails, or other structured content:
```typescript
import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const emails = [
  { from: 'aws-billing@amazon.com', subject: 'Your AWS Invoice', date: '2024-12-15' },
  { from: 'team@linear.app', subject: 'Weekly Project Update', date: '2024-12-14' },
  { from: 'noreply@github.com', subject: 'Security alert: new sign-in', date: '2024-12-14' },
  { from: 'sales@datadog.com', subject: 'Your Datadog Quote - Enterprise Plan', date: '2024-12-13' },
  { from: 'no-reply@vercel.com', subject: 'Deployment failed: main branch', date: '2024-12-13' },
  { from: 'hr@company.com', subject: 'Holiday Schedule Reminder', date: '2024-12-12' },
];

const { ranking, rerankedDocuments } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: emails,
  query: 'Find the pricing quote from Datadog',
  topN: 3,
});
```
Results:

| Rank | From | Subject | Score |
|---|---|---|---|
| 1 | sales@datadog.com | Your Datadog Quote - Enterprise Plan | 0.6830 |
| 2 | aws-billing@amazon.com | Your AWS Invoice | 0.3238 |
| 3 | hr@company.com | Holiday Schedule Reminder | 0.1946 |
## Use Case: RAG Context Selection
For RAG applications, reranking selects the most relevant documents to include in the LLM context:
```typescript
import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const docs = [
  'The createClient() function initializes a new API client...',
  'Authentication uses Bearer tokens...',
  'Rate limiting returns HTTP 429 when exceeded...',
  'Webhooks are sent via POST to your configured endpoint...',
  'Error responses follow RFC 7807 Problem Details format...',
  'Pagination uses cursor-based navigation...',
];

const { rerankedDocuments, ranking } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: docs,
  query: 'How do I handle API errors?',
  topN: 2,
});

// Use rerankedDocuments for LLM context
const context = rerankedDocuments.join('\n\n');
```
Results:

| Rank | Document | Score |
|---|---|---|
| 1 | Error responses follow RFC 7807 Problem Details format... | 0.4662 |
| 2 | Rate limiting returns HTTP 429 when exceeded... | 0.4447 |
These top documents would be passed to the LLM for answer generation.
## Understanding the Results
The `rerank` function returns a comprehensive result object:

```typescript
import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const { ranking, rerankedDocuments, originalDocuments } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents: supportArticles,
  query: 'How do I get a refund?',
  topN: 3,
});
```
Return value properties:

| Property | Description |
|---|---|
| `ranking` | Sorted array of `{ originalIndex, score, document }` |
| `rerankedDocuments` | Documents sorted by relevance (convenience property) |
| `originalDocuments` | Original documents array (unchanged) |

Each item in the `ranking` array contains:

- `originalIndex`: Position in the original documents array
- `score`: Relevance score (higher is more relevant)
- `document`: The original document

Note: Scores are not comparable across models or providers; only the relative ordering within a single `rerank` call matters.
## Filtering by Score Threshold
In production, you may want to discard results with low relevance scores. Filter the ranking array to only include results above a threshold:
```typescript
import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents,
  query,
  topN: 10,
});

// Keep only results with a relevance score above 0.4
const relevantResults = ranking.filter(item => item.score > 0.4);
```
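If you want a guaranteed non-empty result, a small helper (hypothetical, not part of the SDK) can combine the threshold with a fallback to the single best match:

```typescript
// Keep results above `threshold`; if nothing qualifies, fall back to the
// single best result so the caller always gets something to show.
// Assumes `ranking` is already sorted by descending score, as `rerank` returns it.
export function filterByScore<T extends { score: number }>(
  ranking: T[],
  threshold: number,
): T[] {
  const kept = ranking.filter(item => item.score > threshold);
  return kept.length > 0 ? kept : ranking.slice(0, 1);
}
```

Whether an empty result or a low-confidence fallback is preferable depends on the application; for RAG it is often better to return nothing and let the model say it does not know.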
## Reranking Providers & Models
The AI SDK supports multiple reranking providers. You can use reranking models from:
### Cohere
```typescript
import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents,
  query,
  topN: 3,
});
```
### Amazon Bedrock
```typescript
import { rerank } from 'ai';
import { bedrock } from '@ai-sdk/amazon-bedrock';

const { ranking } = await rerank({
  model: bedrock.reranking('cohere.rerank-v3-5:0'),
  documents,
  query,
  topN: 3,
});
```
### Ollama (Local)
```typescript
import { rerank } from 'ai';
import { ollama } from 'ai-sdk-ollama';

const { ranking } = await rerank({
  model: ollama.embeddingReranking('embeddinggemma'),
  documents,
  query,
  topN: 3,
});
```
Note: Some reranking models (e.g., Cohere rerankers) are cross-encoders that jointly score query–document pairs, while others (e.g., embedding-based rerankers) trade some accuracy for speed and locality. Cross-encoders typically provide higher accuracy but require API calls, while embedding-based rerankers can run locally with lower latency.
## Settings
### Top-N Results
Use `topN` to limit the number of results returned. This is useful for retrieving only the most relevant documents:

```typescript
import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents: ['doc1', 'doc2', 'doc3', 'doc4', 'doc5'],
  query: 'relevant information',
  topN: 3, // Return only the top 3 most relevant documents
});
```
### Provider Options
Reranking model settings can be configured via `providerOptions` for provider-specific parameters. This is particularly useful for handling long documents that might otherwise be truncated:

```typescript
import { rerank } from 'ai';
import { cohere } from '@ai-sdk/cohere';

const { ranking } = await rerank({
  model: cohere.reranking('rerank-v3.5'),
  documents: ['long document 1...', 'long document 2...'],
  query: 'relevant information',
  providerOptions: {
    cohere: {
      maxTokensPerDoc: 1000, // Limit tokens per document
    },
  },
});
```
Examples (provider-specific; check provider docs for available options):

- Cohere: `maxTokensPerDoc` - Maximum tokens per document (default varies by model)
- Other providers may expose additional options via `providerOptions`
## Best Practices
- Use vector search for broad retrieval first: Retrieve the top 20-100 candidates before reranking (tune for latency/recall trade-offs). In practice, reranking typically adds tens of milliseconds for 20–50 documents, depending on model and provider.
- Tune `topN` based on use case:
  - RAG: `topN: 3` (small context window)
  - Search results: `topN: 10`
  - Auto-reply selection: `topN: 1`
- Monitor performance: Track click-through rates, RAG answer accuracy, and queries requiring refinement.
## Summary
| Metric | Vector Search | With Reranking |
|---|---|---|
| Accuracy (top-1) | 71% (5/7) | 100% (7/7) |
| Vocabulary mismatch handling | Poor | Excellent |
| Speed | Fast | +tens of milliseconds for 20–50 docs |
| Best for | Broad retrieval | Final ranking |
Reranking improves search relevance by reordering documents based on query-document relationships rather than just semantic similarity. For search systems where precision matters (customer support, RAG, email, documentation), reranking improves result quality with minimal code changes.
See the ai-sdk-reranking-example repository for runnable code.