How to cache chat model responses

Prerequisites

This guide assumes familiarity with the following concepts:

LangChain provides an optional caching layer for chat models. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times. It can speed up your application by reducing the number of API calls you make to the LLM provider.

import { ChatOpenAI } from "@langchain/openai";

// To make the caching really obvious, lets use a slower model.
const model = new ChatOpenAI({
  model: "gpt-4",
  cache: true,
});

In Memory Cache

The default cache is stored in-memory. This means that if you restart your application, the cache will be cleared.

console.time();

// The first time, it is not yet in cache, so it should take longer
const res = await model.invoke("Tell me a joke!");
console.log(res);

console.timeEnd();

/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
      additional_kwargs: { function_call: undefined, tool_calls: undefined }
    },
    lc_namespace: [ 'langchain_core', 'messages' ],
    content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
    name: undefined,
    additional_kwargs: { function_call: undefined, tool_calls: undefined }
  }
  default: 2.224s
*/

console.time();

// The second time it is, so it goes faster
const res2 = await model.invoke("Tell me a joke!");
console.log(res2);

console.timeEnd();
/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
      additional_kwargs: { function_call: undefined, tool_calls: undefined }
    },
    lc_namespace: [ 'langchain_core', 'messages' ],
    content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
    name: undefined,
    additional_kwargs: { function_call: undefined, tool_calls: undefined }
  }
  default: 181.98ms
*/

Caching with Redis

LangChain also provides a Redis-based cache. This is useful if you want to share the cache across multiple processes or servers. To use it, you'll need to install the redis package:

npm
Yarn
pnpm

npm install ioredis @langchain/community

yarn add ioredis @langchain/community

pnpm add ioredis @langchain/community

Then, you can pass a cache option when you instantiate the LLM. For example:

import { ChatOpenAI } from "@langchain/openai";
import { Redis } from "ioredis";
import { RedisCache } from "@langchain/community/caches/ioredis";

const client = new Redis("redis://localhost:6379");

const cache = new RedisCache(client, {
  ttl: 60, // Optional key expiration value
});

const model = new ChatOpenAI({ cache });

const response1 = await model.invoke("Do something random!");
console.log(response1);
/*
  AIMessage {
    content: "Sure! I'll generate a random number for you: 37",
    additional_kwargs: {}
  }
*/

const response2 = await model.invoke("Do something random!");
console.log(response2);
/*
  AIMessage {
    content: "Sure! I'll generate a random number for you: 37",
    additional_kwargs: {}
  }
*/

await client.disconnect();

API Reference:

ChatOpenAI from @langchain/openai
RedisCache from @langchain/community/caches/ioredis

Caching on the File System

danger

This cache is not recommended for production use. It is only intended for local development.

LangChain provides a simple file system cache. By default the cache is stored a temporary directory, but you can specify a custom directory if you want.

const cache = await LocalFileCache.create();

Next steps

You've now learned how to cache model responses to save time and money.

Next, check out the other how-to guides on chat models, like how to get a model to return structured output or how to create your own custom chat model.

How to cache chat model responses

In Memory Cache

Caching with Redis

API Reference:

Caching on the File System

Next steps

Was this page helpful?

You can leave detailed feedback on GitHub.

How to cache chat model responses

In Memory Cache​

Caching with Redis​

API Reference:

Caching on the File System​

Next steps​

Was this page helpful?

You can leave detailed feedback on GitHub.

In Memory Cache

Caching with Redis

Caching on the File System

Next steps