Arrange Act Assert

Jag Reehal's thinking on things, mostly product development

Build Safer, Faster, and More Reliable AI Apps with AI SDK Middleware

27 Aug 2025

Having recently built an AI Guardrails library for the AI SDK, I wanted to share what I learned along the way. This post will walk you through how you can write your own middleware, and why it's such a game-changer for building robust AI applications.

Design AI features that are safer, faster, and easier to evolve by layering language model middleware. This guide explains how to use AI SDK middleware to transform inputs, post-process outputs, enforce safety rules, cache results, observe performance, and handle streaming using a clean, composable approach aligned with official guidance.

Why middleware

Middleware enables you to intercept and modify calls to a language model without altering business logic. Add features incrementally, reuse them across apps, and standardize behavior across providers.
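To make the interception idea concrete, here is a minimal, self-contained sketch of the pattern. It is illustrative only: `Params`, `Result`, `Hook`, and `withHook` are simplified stand-ins, not the AI SDK's actual types.

```typescript
// Simplified stand-ins for the SDK's request/response types
type Params = { prompt: string };
type Result = { text: string };
type Generate = (params: Params) => Promise<Result>;

type Hook = {
  transformParams?: (params: Params) => Params;
  wrapGenerate?: (doGenerate: () => Promise<Result>) => Promise<Result>;
};

// Wrap a generate function so the hook can intercept the call on the way in
// (transformParams) and on the way out (wrapGenerate); callers are unchanged.
function withHook(generate: Generate, hook: Hook): Generate {
  return async (params) => {
    const finalParams = hook.transformParams?.(params) ?? params;
    const doGenerate = () => generate(finalParams);
    return hook.wrapGenerate ? hook.wrapGenerate(doGenerate) : doGenerate();
  };
}

// A fake model plus a hook that trims the prompt and uppercases the output
const baseModel: Generate = async (p) => ({ text: `echo:${p.prompt}` });
const wrapped = withHook(baseModel, {
  transformParams: (p) => ({ prompt: p.prompt.trim() }),
  wrapGenerate: async (doGenerate) => {
    const result = await doGenerate();
    return { text: result.text.toUpperCase() };
  },
});
```

The calling code only ever sees a `Generate` function, which is the whole point: features are layered on without the business logic knowing.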

When to use middleware

Reach for middleware whenever a concern cuts across model calls rather than belonging to one feature: enforcing safety rules, caching identical requests, enriching prompts, cleaning up outputs, or recording latency and token usage. The patterns below cover each of these.

Quickstart

import { wrapLanguageModel } from 'ai';

const enhancedModel = wrapLanguageModel({
  model: yourModel,
  middleware: yourMiddleware,
});

The middleware interface

Implement one or more of these hooks:

import type { LanguageModelMiddleware } from 'ai';

export const yourMiddleware: LanguageModelMiddleware = {
  // Modify request options; applies to both generate and stream calls
  transformParams: async ({ params }) => {
    return params;
  },

  // Post-process non-streaming results
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();
    return result;
  },

  // Transform streaming results chunk by chunk
  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();
    return { stream, ...rest };
  },
};

Recommended middleware order

Apply middleware in this order for predictable behavior and performance:

const model = wrapLanguageModel({
  model: yourModel,
  middleware: [
    safetyMiddleware, // 1) block/validate
    cachingMiddleware, // 2) short-circuit identical calls
    inputEnhancerMiddleware, // 3) transform inputs
    outputProcessorMiddleware, // 4) transform outputs
    monitoringMiddleware, // 5) observe last
  ],
});

The resulting pipeline: a user request hits safety and validation first (blocked requests return a safe fallback immediately), then caching (a cache hit returns the stored result), then input transformation, model inference, output processing, and finally monitoring/logging before the response goes back to the caller.

Core patterns

1) Input transformation

import type { LanguageModelMiddleware } from 'ai';

export const inputEnhancerMiddleware: LanguageModelMiddleware = {
  transformParams: async ({ params }) => {
    // Prepend a system message, normalizing the prompt to an array first
    const enhancedPrompt = [
      { role: 'system', content: 'You are a helpful assistant.' },
      ...(Array.isArray(params.prompt) ? params.prompt : [params.prompt]),
    ];
    return { ...params, prompt: enhancedPrompt };
  },
};
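A hypothetical helper extracted from this pattern, prepending a system message only when one is not already present, so applying the middleware twice cannot duplicate it. The `Message` shape is simplified for illustration.

```typescript
// Simplified message shape for illustration
type Message = { role: string; content: string };

// Prepend a system message unless the conversation already has one
function withSystemPrompt(messages: Message[], system: string): Message[] {
  if (messages.some((m) => m.role === 'system')) return messages;
  return [{ role: 'system', content: system }, ...messages];
}
```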

2) Output processing

export const outputProcessorMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const result = await doGenerate();

    // Join the text parts, then collapse runs of whitespace
    const text = (result.content || [])
      .filter((p: any) => p.type === 'text')
      .map((p: any) => p.text)
      .join('')
      .trim()
      .replace(/\s+/g, ' ');

    return { ...result, content: [{ type: 'text', text }] };
  },
};
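The cleanup step is just a pure string transform; pulling it out as a function makes it trivially testable on its own:

```typescript
// Collapse runs of whitespace and trim the edges, as in the middleware above
function normalizeText(text: string): string {
  return text.trim().replace(/\s+/g, ' ');
}
```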

3) Safety and request blocking

export const safetyMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    // Flatten a prompt entry (string or message object) to plain text
    const toText = (p: unknown) => {
      if (typeof p === 'string') return p;
      if (
        p &&
        typeof p === 'object' &&
        'content' in (p as { content?: unknown })
      ) {
        const content = (p as { content?: string | Array<{ text?: string }> })
          .content;
        if (Array.isArray(content))
          return content.map((c) => c.text || '').join(' ');
        if (typeof content === 'string') return content;
      }
      return '';
    };

    const promptText = Array.isArray(params.prompt)
      ? params.prompt.map(toText).join(' ')
      : toText(params.prompt);

    // Short-circuit without ever calling the model when the prompt is blocked
    if (/\b(hack|exploit|bypass|malicious)\b/i.test(promptText)) {
      return {
        content: [{ type: 'text', text: 'I cannot process this request.' }],
        finishReason: 'other',
        usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
        warnings: [],
      };
    }

    return doGenerate();
  },
};
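The keyword check can likewise live as a standalone predicate. A regex blocklist is a coarse heuristic (real guardrails usually pair it with a classifier), but it shows the short-circuit shape:

```typescript
// Same blocklist as the middleware above
const BLOCKED = /\b(hack|exploit|bypass|malicious)\b/i;

function isBlocked(promptText: string): boolean {
  return BLOCKED.test(promptText);
}
```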

4) Streaming transformation

Handle multiple stream part types for fine-grained control.

export const streamingMiddleware: LanguageModelMiddleware = {
  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();

    const transformStream = new TransformStream({
      transform(chunk: any, controller) {
        switch (chunk.type) {
          case 'text-delta': {
            // Redact blocked words from each text delta
            const filtered = chunk.delta.replace(/\b(harmful|bad)\b/gi, '***');
            controller.enqueue({ ...chunk, delta: filtered });
            break;
          }
          case 'tool-call':
          case 'tool-result':
          case 'response-metadata':
          case 'error':
          default: {
            // Pass every other part type through unchanged
            controller.enqueue(chunk);
          }
        }
      },
    });

    return { stream: stream.pipeThrough(transformStream), ...rest };
  },
};
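The redaction itself is a pure function, and the plumbing can be exercised with the standard Web Streams API (global in Node 18+) without involving the SDK at all. A sketch, not production code:

```typescript
// Same word filter as the middleware above
function redact(delta: string): string {
  return delta.replace(/\b(harmful|bad)\b/gi, '***');
}

// Push string deltas through a TransformStream and collect the output
async function filterDeltas(deltas: string[]): Promise<string> {
  const source = new ReadableStream({
    start(controller) {
      for (const d of deltas) controller.enqueue(d);
      controller.close();
    },
  });
  const transform = new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(redact(chunk as string));
    },
  });

  let out = '';
  const reader = source.pipeThrough(transform).getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    out += value;
  }
  return out;
}
```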

In the stream, text-delta parts are filtered (blocked words redacted) before being emitted, while tool-call, tool-result, response-metadata, and error parts pass through to the client unchanged.

5) Monitoring and logging

export const monitoringMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const start = Date.now();

    try {
      const result = await doGenerate();
      console.log('AI call completed', {
        durationMs: Date.now() - start,
        temperature: params.temperature,
        inputTokens: result.usage?.inputTokens,
        outputTokens: result.usage?.outputTokens,
      });
      return result;
    } catch (error) {
      console.error('AI call failed', {
        durationMs: Date.now() - start,
        error: (error as Error).message,
      });
      throw error;
    }
  },
};

6) Caching

const cache = new Map<string, any>();

export const cachingMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    // Serialized params act as the cache key, so identical calls short-circuit
    const cacheKey = JSON.stringify(params);
    if (cache.has(cacheKey)) return cache.get(cacheKey);

    const result = await doGenerate();
    cache.set(cacheKey, result);
    return result;
  },
};
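One caveat with JSON.stringify keys: they grow with the prompt and are compared byte-for-byte on every lookup. Hashing keeps keys fixed-size. A sketch using node:crypto; the Map above still has no TTL or eviction, which a production cache would need:

```typescript
import { createHash } from 'node:crypto';

// Fixed-size cache key derived from the serialized params
function cacheKey(params: unknown): string {
  return createHash('sha256').update(JSON.stringify(params)).digest('hex');
}
```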

7) Retry logic

export const retryMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const maxRetries = 3;
    let lastError: Error | undefined;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await doGenerate();
      } catch (error) {
        lastError = error as Error;
        if (attempt === maxRetries) throw lastError;
        // Exponential backoff: 2s, 4s, 8s
        await new Promise((r) => setTimeout(r, Math.pow(2, attempt) * 1000));
      }
    }

    // Unreachable, but satisfies TypeScript's return-path analysis
    throw lastError;
  },
};
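The delay schedule above (2^attempt seconds) is worth pulling out as a function, with a cap so late attempts don't wait unboundedly. Adding random jitter is a common refinement, omitted here to keep the function deterministic:

```typescript
// Exponential backoff: 2s, 4s, 8s, ... capped at capMs
function backoffMs(attempt: number, capMs = 30_000): number {
  return Math.min(Math.pow(2, attempt) * 1000, capMs);
}
```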

Error handling

Start with a safe fallback, and optionally branch on SDK error types (e.g., NoSuchToolError) to produce targeted responses:

import { NoSuchToolError, type LanguageModelMiddleware } from 'ai';

export const robustMiddleware: LanguageModelMiddleware = {
  wrapGenerate: async ({ doGenerate }) => {
    try {
      return await doGenerate();
    } catch (error) {
      // Targeted fallback when the model requested a tool that doesn't exist
      if (error instanceof NoSuchToolError) {
        type ToolWarning = { type: 'tool'; message: string };
        return {
          content: [{ type: 'text', text: 'Requested tool is not available.' }],
          finishReason: 'error',
          usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
          warnings: [
            {
              type: 'tool',
              message: 'Missing tool in execution.',
            } as ToolWarning,
          ],
        };
      }

      // Generic safe fallback for any other failure
      return {
        content: [{ type: 'text', text: 'Sorry, something went wrong.' }],
        finishReason: 'error',
        usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
        warnings: [],
      };
    }
  },
};

In short: when doGenerate throws a NoSuchToolError, return the tool-specific fallback; any other error gets the generic fallback; successful calls pass through untouched.

Composition that scales

Compose multiple middlewares to form a clear pipeline.

import { wrapLanguageModel } from 'ai';

const modelWithStack = wrapLanguageModel({
  model: yourModel,
  middleware: [
    safetyMiddleware,
    cachingMiddleware,
    inputEnhancerMiddleware,
    outputProcessorMiddleware,
    monitoringMiddleware,
  ],
});

Performance considerations

Order matters for latency as well as correctness: placing safety and caching first means blocked or cached requests never reach the model, while putting monitoring last keeps its timings honest about the full pipeline. Streaming transforms run on every chunk, so keep per-chunk work cheap. And note that an unbounded in-memory cache like the Map above grows with every distinct set of params; add a TTL or eviction policy before shipping it.

Testing

import { describe, it, expect, vi } from 'vitest';
import { wrapLanguageModel } from 'ai';
// Hypothetical path; import safetyMiddleware from wherever you defined it
import { safetyMiddleware } from './safety-middleware';

describe('safetyMiddleware', () => {
  it('blocks harmful content', async () => {
    // Minimal stub; a real test would satisfy the full language model interface
    const mockModel = { doGenerate: vi.fn() } as any;

    const wrapped = wrapLanguageModel({
      model: mockModel,
      middleware: safetyMiddleware,
    });

    const params = { prompt: 'how to hack wifi?' } as any;
    const result = await wrapped.doGenerate(params);

    expect(result.content[0].text).toMatch(/cannot process/i);
    expect(mockModel.doGenerate).not.toHaveBeenCalled();
  });
});

Takeaways

Middleware lets you layer safety, caching, input and output transformation, monitoring, and retries onto any model without touching business logic. Keep each middleware focused on one concern, compose them in a deliberate order, and test them in isolation against a mocked model.
