Skip to main content

How to stream chat model responses

All chat models implement the Runnable interface, which comes with a default implementations of standard runnable methods (i.e.ย invoke, batch, stream, streamEvents).

The default streaming implementation provides an AsyncGenerator that yields a single value: the final output from the underlying chat model provider.

tip

The default implementation does not provide support for token-by-token streaming, but it ensures that the the model can be swapped in for any other model as it supports the same standard interface.

The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.

See which integrations support token-by-token streaming here.

Streamingโ€‹

Below, we use a --- to help visualize the delimiter between tokens.

Pick your chat model:

Install dependencies

yarn add @langchain/openai 

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
model: "gpt-3.5-turbo",
temperature: 0
});
for await (const chunk of await model.stream(
"Write me a 1 verse song about goldfish on the moon"
)) {
console.log(`${chunk.content}
---`);
}

---
Here
---
is
---
a
---

---
1
---

---
verse
---
song
---
about
---
gol
---
dfish
---
on
---
the
---
moon
---
:
---


Gol
---
dfish
---
on
---
the
---
moon
---
,
---
swimming
---
through
---
the
---
sk
---
ies
---
,
---

Floating
---
in
---
the
---
darkness
---
,
---
beneath
---
the
---
lunar
---
eyes
---
.
---

Weight
---
less
---
as
---
they
---
drift
---
,
---
through
---
the
---
endless
---
voi
---
d,
---

D
---
rif
---
ting
---
,
---
swimming
---
,
---
exploring
---
,
---
this
---
new
---
worl
---
d unexp
---
lo
---
ye
---
d.
---

---

---

Stream eventsโ€‹

Chat models also support the standard streamEvents() method.

This method is useful if youโ€™re streaming output from a larger LLM application that contains multiple steps (e.g., a chain composed of a prompt, chat model and parser).

let idx = 0;

for await (const event of model.streamEvents(
"Write me a 1 verse song about goldfish on the moon",
{
version: "v1",
}
)) {
idx += 1;
if (idx >= 5) {
console.log("...Truncated");
break;
}
console.log(event);
}
{
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
event: "on_llm_start",
name: "ChatAnthropic",
tags: [],
metadata: {},
data: { input: "Write me a 1 verse song about goldfish on the moon" }
}
{
event: "on_llm_stream",
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
tags: [],
metadata: {},
name: "ChatAnthropic",
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: "",
additional_kwargs: {
id: "msg_01DqDQ9in33ZhmrCzdZaRNMZ",
type: "message",
role: "assistant",
model: "claude-3-haiku-20240307"
},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "",
name: undefined,
additional_kwargs: {
id: "msg_01DqDQ9in33ZhmrCzdZaRNMZ",
type: "message",
role: "assistant",
model: "claude-3-haiku-20240307"
},
response_metadata: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: []
}
}
}
{
event: "on_llm_stream",
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
tags: [],
metadata: {},
name: "ChatAnthropic",
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: "Here",
additional_kwargs: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "Here",
name: undefined,
additional_kwargs: {},
response_metadata: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: []
}
}
}
{
event: "on_llm_stream",
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
tags: [],
metadata: {},
name: "ChatAnthropic",
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: " is",
additional_kwargs: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: " is",
name: undefined,
additional_kwargs: {},
response_metadata: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: []
}
}
}
...Truncated

Next stepsโ€‹

Youโ€™ve now seen a few ways you can stream chat model responses.

Next, check out this guide for more on streaming with other LangChain modules.


Was this page helpful?


You can leave detailed feedback on GitHub.