Arrange Act Assert

Jag Reehal's thinking on things, mostly product development

Build Safer, Faster, and More Reliable AI Apps with AI SDK Middleware

27 Aug 2025

Having recently built an AI Guardrails library for the AI SDK, I wanted to share what I learned along the way. This post walks through how to write your own middleware and why it's such a game-changer for building robust AI applications.

Design AI features that are safer, faster, and easier to evolve by layering language model middleware. This guide explains how to use AI SDK middleware to transform inputs, post-process outputs, enforce safety rules, cache results, observe performance, and handle streaming using a clean, composable approach aligned with official guidance.

Why middleware

Middleware enables you to intercept and modify calls to a language model without altering business logic. Add features incrementally, reuse them across apps, and standardize behavior across providers.

When to use middleware

Reach for middleware when a concern cuts across features or providers rather than belonging to one call site: safety checks, caching, prompt shaping, output cleanup, observability, and retries are all natural fits, and each is covered in the patterns below.

Quickstart

import { wrapLanguageModel } from 'ai';

const enhancedModel = wrapLanguageModel({
  model: yourModel,

  middleware: yourMiddleware,
});

The middleware interface

Implement one or more of these hooks:

import type { LanguageModelMiddleware } from 'ai';

export const yourMiddleware: LanguageModelMiddleware = {
  transformParams: async ({ params }) => {
    return params; // modify request options for both generate and stream
  },

  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();

    return result; // post-process non-streaming results
  },

  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();

    return { stream, ...rest }; // transform streaming results
  },
};
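To build intuition for how the hooks nest, here's a deliberately stripped-down sketch of middleware composition. This is not the SDK's implementation; it just assumes (consistent with the ordering advice in this post) that the first middleware in the array runs outermost:

```typescript
// A stripped-down model of how middleware wrapping nests (illustrative only;
// the real wrapLanguageModel is more involved).
type Generate = () => Promise<string>;

type MiddlewareSketch = {
  wrapGenerate: (opts: { doGenerate: Generate }) => Promise<string>;
};

function applyMiddleware(
  base: Generate,
  middlewares: MiddlewareSketch[],
): Generate {
  // Fold from the right so the first middleware in the array runs outermost.
  return middlewares.reduceRight<Generate>(
    (next, mw) => () => mw.wrapGenerate({ doGenerate: next }),
    base,
  );
}

const logging: MiddlewareSketch = {
  wrapGenerate: async ({ doGenerate }) => `log(${await doGenerate()})`,
};

const upcase: MiddlewareSketch = {
  wrapGenerate: async ({ doGenerate }) => (await doGenerate()).toUpperCase(),
};

const run = applyMiddleware(async () => 'hello', [logging, upcase]);

run().then(console.log); // "log(HELLO)" - logging wraps upcase, which wraps the model
```

The fold direction is the whole point: whichever middleware sits first in the array sees the call first and the result last, which is why safety can block before anything else runs.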

Recommended middleware order

Apply middleware in this order for predictable behavior and performance:

const model = wrapLanguageModel({
  model: yourModel,

  middleware: [
    safetyMiddleware, // 1) block/validate

    cachingMiddleware, // 2) short-circuit identical calls

    inputEnhancerMiddleware, // 3) transform inputs

    outputProcessorMiddleware, // 4) transform outputs

    monitoringMiddleware, // 5) observe last
  ],
});
flowchart LR

  A[User request] --> B[Safety & Validation]

  B --> C{Blocked?}

  C -->|Yes| X[Return safe fallback]

  C -->|No| D[Caching]

  D --> E{Cache hit?}

  E -->|Yes| Y[Return cached result]

  E -->|No| F[Input Transformation]

  F --> G[Model Inference]

  G --> H[Output Processing]

  H --> I[Monitoring / Logging]

  I --> J[Response]

Core patterns

1) Input transformation

import type { LanguageModelMiddleware } from 'ai';

export const inputEnhancerMiddleware: LanguageModelMiddleware = {
  transformParams: async ({ params }) => {
    const enhancedPrompt = [
      { role: 'system', content: 'You are a helpful assistant.' },

      ...(Array.isArray(params.prompt) ? params.prompt : [params.prompt]),
    ];

    return { ...params, prompt: enhancedPrompt };
  },
};

2) Output processing

export const outputProcessorMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();

    const text = (result.content || [])

      .filter((p: any) => p.type === 'text')

      .map((p: any) => p.text)

      .join('')

      .trim()

      .replace(/\s+/g, ' ');

    // Note: this collapses all output into a single text part; preserve
    // tool-call parts if your app uses tools.
    return { ...result, content: [{ type: 'text', text }] };
  },
};
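The cleanup applied above (`trim` plus whitespace collapsing) is a pure transformation; extracting it keeps the rule reusable and testable on its own:

```typescript
// Collapse runs of whitespace and trim the ends - the same cleanup the
// output-processing middleware applies to generated text.
function normalizeText(raw: string): string {
  return raw.trim().replace(/\s+/g, ' ');
}

console.log(normalizeText('  hello   world \n')); // "hello world"
```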

3) Safety and request blocking

export const safetyMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const toText = (p: unknown) => {
      if (typeof p === 'string') return p;

      if (
        p &&
        typeof p === 'object' &&
        'content' in (p as { content?: unknown })
      ) {
        const content = (p as { content?: string | Array<{ text?: string }> })
          .content;

        if (Array.isArray(content))
          return content.map((c) => c.text || '').join(' ');

        if (typeof content === 'string') return content;
      }

      return '';
    };

    const promptText = Array.isArray(params.prompt)
      ? params.prompt.map(toText).join(' ')
      : toText(params.prompt);

    if (/\b(hack|exploit|bypass|malicious)\b/i.test(promptText)) {
      return {
        content: [{ type: 'text', text: 'I cannot process this request.' }],

        finishReason: 'other',

        usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },

        warnings: [],
      };
    }

    return doGenerate();
  },
};
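In practice you'd likely make the blocklist configurable rather than hard-coding the regex; a sketch with a hypothetical `createScreen` factory:

```typescript
// Hypothetical factory: build a screening predicate from a configurable
// word list instead of an inlined regex.
function createScreen(blockedWords: string[]): (text: string) => boolean {
  const pattern = new RegExp(`\\b(${blockedWords.join('|')})\\b`, 'i');
  return (text) => pattern.test(text);
}

const isBlocked = createScreen(['hack', 'exploit', 'bypass', 'malicious']);

console.log(isBlocked('how to hack wifi')); // true
console.log(isBlocked('how to bake bread')); // false
```

The middleware's early-return branch would then call `isBlocked(promptText)` instead of testing a literal regex, so the policy can live in config.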

4) Streaming transformation

Handle multiple stream part types for fine-grained control.

export const streamingMiddleware: LanguageModelMiddleware = {
  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();

    const transformStream = new TransformStream({
      transform(chunk: any, controller) {
        switch (chunk.type) {
          case 'text-delta': {
            const filtered = chunk.delta.replace(/\b(harmful|bad)\b/gi, '***');

            controller.enqueue({ ...chunk, delta: filtered });

            break;
          }

          case 'tool-call':

          case 'tool-result':

          case 'response-metadata':

          case 'error':

          default: {
            controller.enqueue(chunk);
          }
        }
      },
    });

    return { stream: stream.pipeThrough(transformStream), ...rest };
  },
};
flowchart TD

  S[Model Stream] --> T[TransformStream]

  T -->|text-delta| D[Filter delta - redact words]

  D --> U[Emit filtered text-delta]

  T -->|tool-call| C[Pass-through tool-call]

  T -->|tool-result| R[Pass-through tool-result]

  T -->|response-metadata| M[Pass-through metadata]

  T -->|error| E[Pass-through error]

  U --> K[Client]

  C --> K

  R --> K

  M --> K

  E --> K
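The redaction rule itself is a pure string transform; pulling it out of the stream handler makes it easy to unit-test. A small sketch (`redactDelta` is my own name, not part of the SDK):

```typescript
// Pure redaction helper mirroring the regex used in the middleware above.
const BLOCKED = /\b(harmful|bad)\b/gi;

export function redactDelta(delta: string): string {
  return delta.replace(BLOCKED, '***');
}

// Inside a TransformStream it slots in exactly where the wrapStream hook
// filters text deltas; every other chunk type passes through untouched.
const redactingStream = new TransformStream({
  transform(chunk: any, controller) {
    if (chunk.type === 'text-delta' && typeof chunk.delta === 'string') {
      controller.enqueue({ ...chunk, delta: redactDelta(chunk.delta) });
    } else {
      controller.enqueue(chunk);
    }
  },
});

console.log(redactDelta('a bad and harmful idea')); // "a *** and *** idea"
```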

5) Monitoring and logging

export const monitoringMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const start = Date.now();

    try {
      const result = await doGenerate();

      const durationMs = Date.now() - start;

      console.log('AI call completed', {
        durationMs,

        temperature: params.temperature,

        inputTokens: result.usage?.inputTokens,

        outputTokens: result.usage?.outputTokens,
      });

      return result;
    } catch (error) {
      const durationMs = Date.now() - start;

      console.error('AI call failed', {
        durationMs,

        error: (error as Error).message,
      });

      throw error;
    }
  },
};

6) Caching

// Simple in-memory cache. Note: it grows without bound, and JSON.stringify
// keys are sensitive to property order.
const cache = new Map<string, any>();

export const cachingMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const cacheKey = JSON.stringify(params);

    if (cache.has(cacheKey)) return cache.get(cacheKey);

    const result = await doGenerate();

    cache.set(cacheKey, result);

    return result;
  },
};
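One caveat with `JSON.stringify(params)` as a cache key: property order matters, so two logically identical param objects can miss the cache. A sketch of an order-stable key (my own helper, not an SDK utility):

```typescript
// Serialize with object keys sorted recursively, so logically equal params
// always produce the same cache key regardless of property order.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return '[' + value.map(stableStringify).join(',') + ']';
  }
  if (value && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ':' + stableStringify(obj[k]));
    return '{' + entries.join(',') + '}';
  }
  return JSON.stringify(value);
}

const a = stableStringify({ temperature: 0.2, prompt: 'hi' });
const b = stableStringify({ prompt: 'hi', temperature: 0.2 });

console.log(a === b); // true
```

Swapping `JSON.stringify(params)` for `stableStringify(params)` in the middleware is a one-line change.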

7) Retry logic

export const retryMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const maxRetries = 3;

    let lastError: Error | undefined;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await doGenerate();
      } catch (error) {
        lastError = error as Error;

        if (attempt === maxRetries) throw lastError;

        await new Promise((r) => setTimeout(r, Math.pow(2, attempt) * 1000));
      }
    }

    // Unreachable (the final attempt rethrows above), but satisfies
    // TypeScript's return-type analysis.
    throw lastError;
  },
};
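A fixed `2^attempt` backoff can synchronize many failing clients into retry bursts; adding jitter spreads them out. A sketch of the delay schedule, with `jitterRatio` as a hypothetical knob (0 reproduces the fixed schedule above):

```typescript
// Exponential backoff with optional jitter. attempt is 1-based, matching
// the retry loop above: 2s, 4s, 8s before jitter.
function backoffDelayMs(attempt: number, baseMs = 1000, jitterRatio = 0): number {
  const exponential = Math.pow(2, attempt) * baseMs;
  const jitter = exponential * jitterRatio * Math.random();
  return exponential + jitter;
}

console.log(backoffDelayMs(1)); // 2000
console.log(backoffDelayMs(2)); // 4000
```

The retry loop would call `setTimeout(r, backoffDelayMs(attempt, 1000, 0.5))` to retry somewhere between 1x and 1.5x the base delay.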

Error handling

Start with a safe fallback, and optionally branch on SDK error types (e.g., NoSuchToolError) to produce targeted responses:

import { NoSuchToolError, type LanguageModelMiddleware } from 'ai';

export const robustMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    try {
      return await doGenerate();
    } catch (error) {
      if (error instanceof NoSuchToolError) {
        type ToolWarning = { type: 'tool'; message: string };

        return {
          content: [{ type: 'text', text: 'Requested tool is not available.' }],

          finishReason: 'error',

          usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },

          warnings: [
            {
              type: 'tool',

              message: 'Missing tool in execution.',
            } as ToolWarning,
          ],
        };
      }

      return {
        content: [{ type: 'text', text: 'Sorry, something went wrong.' }],

        finishReason: 'error',

        usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },

        warnings: [],
      };
    }
  },
};
flowchart TD

  A[doGenerate] --> B{Throws?}

  B -->|No| Z[Success]

  B -->|Yes| C{NoSuchToolError}

  C -->|Yes| D[Tool fallback]

  C -->|No| E[Generic fallback]

Composition that scales

Compose multiple middlewares to form a clear pipeline.

import { wrapLanguageModel } from 'ai';

const modelWithStack = wrapLanguageModel({
  model: yourModel,

  middleware: [
    safetyMiddleware,

    cachingMiddleware,

    inputEnhancerMiddleware,

    outputProcessorMiddleware,

    monitoringMiddleware,
  ],
});

Performance considerations

Place caching early in the stack so identical calls short-circuit before expensive transforms run, keep transformParams and per-chunk stream transforms cheap (they sit on the hot path of every request), and have monitoring middleware log summaries rather than full payloads.

Testing

import { describe, it, expect, vi } from 'vitest';
import { wrapLanguageModel } from 'ai';
import { safetyMiddleware } from './safety-middleware'; // adjust to your path

describe('safetyMiddleware', () => {
  it('blocks harmful content', async () => {
    // Minimal stand-in for a language model: only doGenerate matters here,
    // so cast rather than implement the full interface.
    const mockModel = { doGenerate: vi.fn() } as any;

    const wrapped = wrapLanguageModel({
      model: mockModel,

      middleware: safetyMiddleware,
    });

    const params = { prompt: 'how to hack wifi?' } as any;

    const result = await wrapped.doGenerate(params);

    expect(result.content[0].text).toMatch(/cannot process/i);

    expect(mockModel.doGenerate).not.toHaveBeenCalled();
  });
});

Takeaways

Middleware lets you layer safety, caching, observability, and resilience onto any AI SDK model without touching business logic. Start with one small hook, keep each middleware focused on a single concern, and compose them in an order that blocks early and observes last.
