Build Safer, Faster, and More Reliable AI Apps with AI SDK Middleware
27 Aug 2025

Having recently built an AI Guardrails library for the AI SDK, I wanted to share what I learned along the way. This post walks through how to write your own middleware and why it's such a game-changer for building robust AI applications.
Design AI features that are safer, faster, and easier to evolve by layering language model middleware. This guide explains how to use AI SDK middleware to transform inputs, post-process outputs, enforce safety rules, cache results, observe performance, and handle streaming using a clean, composable approach aligned with official guidance.
Why middleware
Middleware enables you to intercept and modify calls to a language model without altering business logic. Add features incrementally, reuse them across apps, and standardize behavior across providers.
When to use middleware
- You need consistent guardrails or validation across multiple models/providers
- You want caching, rate limiting, or cost controls without changing app logic
- You need to enrich prompts or normalize outputs at a single integration point
- You want structured, uniform observability and error handling
Quickstart
import { wrapLanguageModel } from 'ai';

const enhancedModel = wrapLanguageModel({
  model: yourModel,
  middleware: yourMiddleware,
});
The middleware interface
Implement one or more of these hooks:
import type { LanguageModelMiddleware } from 'ai';

export const yourMiddleware: LanguageModelMiddleware = {
  transformParams: async ({ params }) => {
    return params; // modify request options for both generate and stream
  },
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();
    return result; // post-process non-streaming results
  },
  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();
    return { stream, ...rest }; // transform streaming results
  },
};
Recommended middleware order
Apply middleware in this order for predictable behavior and performance:
- Safety and validation first (block/clean unsafe input early)
- Caching early (short-circuit on cache hit)
- Transformations next (input/output shaping)
- Monitoring and logging last (observe the final shape and timing)
const model = wrapLanguageModel({
  model: yourModel,
  middleware: [
    safetyMiddleware,          // 1) block/validate
    cachingMiddleware,         // 2) short-circuit identical calls
    inputEnhancerMiddleware,   // 3) transform inputs
    outputProcessorMiddleware, // 4) transform outputs
    monitoringMiddleware,      // 5) observe last
  ],
});
Core patterns
1) Input transformation
import type { LanguageModelMiddleware } from 'ai';

export const inputEnhancerMiddleware: LanguageModelMiddleware = {
  transformParams: async ({ params }) => {
    const enhancedPrompt = [
      { role: 'system', content: 'You are a helpful assistant.' },
      ...(Array.isArray(params.prompt) ? params.prompt : [params.prompt]),
    ];
    return { ...params, prompt: enhancedPrompt };
  },
};
2) Output processing
export const outputProcessorMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();
    const text = (result.content || [])
      .filter((p: any) => p.type === 'text')
      .map((p: any) => p.text)
      .join('')
      .trim()
      .replace(/\s+/g, ' ');
    return { ...result, content: [{ type: 'text', text }] };
  },
};
3) Safety and request blocking
export const safetyMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const toText = (p: unknown) => {
      if (typeof p === 'string') return p;
      if (
        p &&
        typeof p === 'object' &&
        'content' in (p as { content?: unknown })
      ) {
        const content = (p as { content?: string | Array<{ text?: string }> })
          .content;
        if (Array.isArray(content))
          return content.map((c) => c.text || '').join(' ');
        if (typeof content === 'string') return content;
      }
      return '';
    };
    const promptText = Array.isArray(params.prompt)
      ? params.prompt.map(toText).join(' ')
      : toText(params.prompt);
    if (/\b(hack|exploit|bypass|malicious)\b/i.test(promptText)) {
      return {
        content: [{ type: 'text', text: 'I cannot process this request.' }],
        finishReason: 'other',
        usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
        warnings: [],
      };
    }
    return doGenerate();
  },
};
4) Streaming transformation
Handle multiple stream part types for fine-grained control.
export const streamingMiddleware: LanguageModelMiddleware = {
  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();
    const transformStream = new TransformStream({
      transform(chunk: any, controller) {
        switch (chunk.type) {
          case 'text-delta': {
            const filtered = chunk.delta.replace(/\b(harmful|bad)\b/gi, '***');
            controller.enqueue({ ...chunk, delta: filtered });
            break;
          }
          case 'tool-call':
          case 'tool-result':
          case 'response-metadata':
          case 'error':
          default: {
            // pass all other part types through unchanged
            controller.enqueue(chunk);
          }
        }
      },
    });
    return { stream: stream.pipeThrough(transformStream), ...rest };
  },
};
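One caveat with per-delta regex filtering: a blocked word can arrive split across two deltas ("harm" in one chunk, "ful" in the next) and slip past the filter. A minimal, self-contained sketch of one fix, written here over plain strings rather than the SDK's stream part types: hold back the trailing partial word of each chunk until the next chunk arrives.

```typescript
// Sketch: redact blocked words even when they straddle chunk boundaries
// by carrying the trailing partial word into the next transform call.
function redactingStream(): TransformStream<string, string> {
  let carry = '';
  const redact = (s: string) => s.replace(/\b(harmful|bad)\b/gi, '***');
  return new TransformStream<string, string>({
    transform(chunk, controller) {
      const text = carry + chunk;
      // index where the trailing run of non-whitespace begins
      const cut = text.search(/\S*$/);
      carry = text.slice(cut); // hold back the partial word
      if (cut > 0) controller.enqueue(redact(text.slice(0, cut)));
    },
    flush(controller) {
      if (carry) controller.enqueue(redact(carry));
    },
  });
}
```

In the middleware above, the same carry logic would live inside the 'text-delta' case, with the held-back text emitted when the stream ends. The trade-off is a slight delay (one word) in what the user sees.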
5) Monitoring and logging
export const monitoringMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const start = Date.now();
    try {
      const result = await doGenerate();
      const durationMs = Date.now() - start;
      console.log('AI call completed', {
        durationMs,
        temperature: params.temperature,
        inputTokens: result.usage?.inputTokens,
        outputTokens: result.usage?.outputTokens,
      });
      return result;
    } catch (error) {
      const durationMs = Date.now() - start;
      console.error('AI call failed', {
        durationMs,
        error: (error as Error).message,
      });
      throw error;
    }
  },
};
6) Caching
const cache = new Map<string, any>();

export const cachingMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const cacheKey = JSON.stringify(params);
    if (cache.has(cacheKey)) return cache.get(cacheKey);
    const result = await doGenerate();
    cache.set(cacheKey, result);
    return result;
  },
};
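Note that the bare Map above grows without bound and never expires entries, so cached responses can go stale. A minimal sketch of a TTL- and size-bounded replacement (class name and limits are illustrative, not part of the AI SDK):

```typescript
// Sketch: a tiny cache with a max entry count and per-entry TTL.
// Eviction uses Map insertion order (oldest entry goes first).
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries = 500, ttlMs = 60_000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.store.delete(key); // expired: drop and miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    if (!this.store.has(key) && this.store.size >= this.maxEntries) {
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

Swapping `cache` for an instance of this class keeps the middleware itself unchanged: `cache.has(key)` becomes a `get` that also handles expiry.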
7) Retry logic
export const retryMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const maxRetries = 3;
    let lastError: Error | undefined;
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await doGenerate();
      } catch (error) {
        lastError = error as Error;
        if (attempt === maxRetries) throw lastError;
        // exponential backoff: 2s after the first failure, 4s after the second
        await new Promise((r) => setTimeout(r, Math.pow(2, attempt) * 1000));
      }
    }
    throw lastError; // unreachable, but satisfies the type checker
  },
};
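One refinement worth considering: the fixed 2s/4s delays above mean that many clients failing at the same moment will all retry at the same moment. A common mitigation is "full jitter" backoff, where each delay is a random value between zero and an exponential ceiling. The helper below is a sketch (name and defaults are my own, not SDK API):

```typescript
// Sketch: full-jitter exponential backoff. Returns a random delay in
// [0, min(capMs, baseMs * 2^attempt)] so simultaneous failures spread out.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}
```

In the retry middleware, the `setTimeout` delay would become `backoffDelayMs(attempt)`.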
Error handling
Start with a safe fallback, and optionally branch on SDK error types (e.g., NoSuchToolError) to produce targeted responses:
import { NoSuchToolError, type LanguageModelMiddleware } from 'ai';

export const robustMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    try {
      return await doGenerate();
    } catch (error) {
      if (error instanceof NoSuchToolError) {
        type ToolWarning = { type: 'tool'; message: string };
        return {
          content: [{ type: 'text', text: 'Requested tool is not available.' }],
          finishReason: 'error',
          usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
          warnings: [
            {
              type: 'tool',
              message: 'Missing tool in execution.',
            } as ToolWarning,
          ],
        };
      }
      return {
        content: [{ type: 'text', text: 'Sorry, something went wrong.' }],
        finishReason: 'error',
        usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
        warnings: [],
      };
    }
  },
};
Composition that scales
Compose multiple middlewares to form a clear pipeline.
import { wrapLanguageModel } from 'ai';

const modelWithStack = wrapLanguageModel({
  model: yourModel,
  middleware: [
    safetyMiddleware,
    cachingMiddleware,
    inputEnhancerMiddleware,
    outputProcessorMiddleware,
    monitoringMiddleware,
  ],
});
Performance considerations
- Keep heavy computation out of hot paths; precompute or cache
- Avoid excessive serialization in transformParams
- Stream: transform deltas incrementally; don't buffer entire outputs
- Log only what you need; prefer structured logs
Testing
import { describe, it, expect, vi } from 'vitest';
import { wrapLanguageModel } from 'ai';
import { safetyMiddleware } from './safety-middleware'; // adjust to your path

describe('safetyMiddleware', () => {
  it('blocks harmful content', async () => {
    // minimal stub; cast because a real language model has more fields
    const mockModel = { doGenerate: vi.fn() } as any;
    const wrapped = wrapLanguageModel({
      model: mockModel,
      middleware: safetyMiddleware,
    });
    const params = { prompt: 'how to hack wifi?' } as any;
    const result = await wrapped.doGenerate(params);
    expect(result.content[0].text).toMatch(/cannot process/i);
    expect(mockModel.doGenerate).not.toHaveBeenCalled();
  });
});
Takeaways
- Middleware standardizes safety, caching, transformations, and observability
- Order matters: safety → cache → transform → monitor
- Handle streaming by part type for fine-grained control
- Prefer explicit error branches for known SDK errors
- Keep it lightweight and testable
References
- Language Model Middleware (docs): ai-sdk.dev/docs
- Middleware guide and examples: sdk.vercel.ai/docs/ai-sdk-core/middleware
- Cookbook (caching, streaming, tools): sdk.vercel.ai/examples