Bun vs Go for Streaming LLM Agent Runtime: A Raw Benchmark

Updated June 20, 2026

Published June 20, 2026

Well, you may have heard that they rewrote Bun in Rust and merged over 1 million lines of code to main back in May. See this PR to witness the moment in history.

Before that, there was Anton Putra's Go vs Bun comparison video and The PrimeTime's reaction to that video. After watching both, I got inspired, got curious, and wanted to run my own version with a workload closer to what I actually build day to day. Anton's video only covers DB CRUD but never touches SSE streaming though, so I thought this was definitely worth checking out.

If you're building an LLM agent, most of the behavior looks the same no matter what framework you pick, whether it's LangChain, LangGraph, or Langchiggity-whatever-else. You run a while loop while the model yields tokens, and your runtime streams them back to the client in whatever format your frontend can plug into a pretty UI.

Either way, the bottleneck usually isn't your code, it's the model itself. At the time I'm writing this (June 2026), we still don't have enough GPU power to serve models the way we have bandwidth for broadband internet. Even if you build the backend in the most performant language on earth, the model is still what slows you down.

But your runtime still has to keep a lot of those slow, long-lived streaming connections open at once without falling over. That part is on you, and it's where the language you pick starts to matter.

So I put Bun and Go head to head on the same job, each one running nothing but the HTTP server built into the runtime.

The setup

No framework. Just the lowest-level HTTP primitive each runtime ships with:

Go: net/http (stdlib)
Bun: Bun.serve()


CPU	4 × Intel® Core™ i5-7300HQ CPU @ 2.50GHz
RAM	8 GiB
OS	CachyOS 7.0.12
Go version	1.26.4
Bun version	1.3.13

Both servers expose the same contract:

text

POST /chat{ "message": "hello" }

Each request follows the same path: take the incoming message, send it to the stub, and stream the response back to the client token by token until the stub signals it's done. No database, no middleware, nothing else in between.

The stub is a tiny Go server standing in for OpenAI's /chat/completions endpoint. It returns a fixed number of SSE tokens with a small delay between each one, so the server under test has to keep the connection open and wait the same way it would against a real model.

The tools I used:

k6 to generate the load and verify every response came back 200 with a [DONE] at the end of the stream
pidstat (from the sysstat package) to sample CPU and memory once a second, measured against the actual server PID rather than the go run wrapper (more on that later)
gnuplot to plot those samples into the charts further down

Here's the raw k6 output from each run. Both passed every check, and the ~1 second response time is just the stub's token delay, not the server struggling.

Go

bash

1         /\      Grafana   /‾‾/2    /\  /  \     |\  __   /  /3   /  \/    \    | |/ /  /   ‾‾\4  /          \   |   (  |  (‾)  |5 / __________ \  |_|\_\  \_____/6 7 8     execution: local9        script: k6/baseline.js10        output: -11 12     scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. graceful stop):13              * default: 10 looping VUs for 1m0s (gracefulStop: 30s)14 15 16 17  █ TOTAL RESULTS18 19    checks_total.......: 1180    19.665/s20    checks_succeeded...: 100.00% 1180 out of 118021    checks_failed......: 0.00%   0 out of 118022 23    ✓ status 20024    ✓ has DONE25 26    HTTP27    http_req_duration..............: avg=1.01s min=1.01s med=1.01s max=1.02s p(90)=1.01s p(95)=1.02s28      { expected_response:true }...: avg=1.01s min=1.01s med=1.01s max=1.02s p(90)=1.01s p(95)=1.02s29    http_req_failed................: 0.00%  0 out of 59030    http_reqs......................: 590    9.8325/s31 32    EXECUTION33    iteration_duration.............: avg=1.01s min=1.01s med=1.01s max=1.02s p(90)=1.01s p(95)=1.02s34    iterations.....................: 590    9.8325/s35    vus............................: 10     min=10       max=1036    vus_max........................: 10     min=10       max=1037 38    NETWORK39    data_received..................: 918 kB 15 kB/s40    data_sent......................: 96 kB  1.6 kB/s

Bun

bash

1         /\      Grafana   /‾‾/2    /\  /  \     |\  __   /  /3   /  \/    \    | |/ /  /   ‾‾\4  /          \   |   (  |  (‾)  |5 / __________ \  |_|\_\  \_____/6 7 8 9    execution: local  10       script: k6/baseline.js  11       output: -  12  13    scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. graceful stop):  14             * default: 10 looping VUs for 1m0s (gracefulStop: 30s)  15  16  17  18 █ TOTAL RESULTS    19  20   checks_total.......: 1200    19.704167/s  21   checks_succeeded...: 100.00% 1200 out of 1200  22   checks_failed......: 0.00%   0 out of 1200  23  24   ✓ status 200  25   ✓ has DONE  26  27   HTTP  28   http_req_duration..............: avg=1.01s min=1s med=1.01s max=1.02s p(90)=1.01s p(95)=1.01s  29     { expected_response:true }...: avg=1.01s min=1s med=1.01s max=1.02s p(90)=1.01s p(95)=1.01s  30   http_req_failed................: 0.00%  0 out of 600  31   http_reqs......................: 600    9.852084/s  32  33   EXECUTION  34   iteration_duration.............: avg=1.01s min=1s med=1.01s max=1.02s p(90)=1.01s p(95)=1.01s  35   iterations.....................: 600    9.852084/s  36   vus............................: 10     min=10       max=10  37   vus_max........................: 10     min=10       max=10  38  39   NETWORK  40   data_received..................: 934 kB 15 kB/s  41   data_sent......................: 98 kB  1.6 kB/s

What I measured

This run was 10 virtual users sending requests for a minute, no database, just the proxy path. The CPU and memory came out clean:

CPU (%CPU, one sample per second):

Memory (RSS, MB):

RESULT: memory comparison chart, bun vs go

	Go	Bun
CPU usage	~0-2%	~6-9%
RSS memory	~17-22 MB	~85-92 MB

Go used roughly 4x less memory than Bun and stayed near zero on CPU doing the same work. It wasn't close.

What this run actually covers

To be clear about the scope: this was the lightest version of the test I could run. 10 users, one minute, one message each, no database, no real model, just the proxy path streaming fake tokens. It only tells you how each runtime behaves when it is barely working.

What I still want to do:

run k6's stress profile (ramping to 200 users) to find where each one starts to break down
run a soak test (sustained load for 10+ minutes) to catch any slow memory growth
wire up the chat history path I keep mentioning

So Go won. Is it settled?

Not really. A real agent runtime has to handle a lot more than what I tested here:

multiple tool calls, parallel or sequential
a long-running agentic loop that runs several turns on its own
sub-agent spawning (Go should do well here)
multimodal input and output
full CRUD on chat threads
streaming pub/sub

Those are the parts that decide whether a runtime holds up in production, and I haven't tested any of them.

There's also the ecosystem, which no benchmark shows. Bun, TypeScript and Python have far more AI tooling today: streaming helpers, tool-calling libraries, SDKs built for the language. Go's small footprint doesn't help much if it takes three times as long to build the agent because the library you want only ships for Node and Python.

So Go wins on raw numbers, clearly. Whether that matters to you depends on what you're optimizing for: the cloud bill, or how fast you can ship.

Would I still choose Bun?

TL;DR:

Choose Bun (or even Python) if:

You wanna ship fast
You wanna keep up with the latest innovations in AI (see every agentic framework like LangThis, LangThat, they all ship as a JS or Python package first since those are the most popular languages for this stuff)
You run on serverless and don't care about cold starts or resource usage
You're not serving a high-traffic production workload, think internal-use chatbot or an agent popup on an existing SaaS product (which is hard to access anyway)
Even if you do need to scale Bun (or a slow a$$ language like Python), you can always throw money at the scaling problem, lol

Choose Go if:

Your team is already familiar with Go
You're serving a high-traffic production workload (I'm talking millions of RPS like ChatGPT, not Theo Browne's t-something chat)
You care about resource cost at scale and don't want your cloud bill running its own agentic loop
You want a single static binary you can just ship, no runtime to install, no node_modules folder bigger than the app itself lol
Cold start time actually matters to you (Go boots instantly on serverless, Bun and Node not so much)

Notes

This experiment is maybe 1% of what I actually want to do. Next I want to bring in other fast languages like Rust and C++, and compare things at the framework level too, like LangChain JS, LangChain Python, Vercel AI SDK and Eino. All the code lives here if you want to run it yourself.

Bun vs Go for Streaming LLM Agent Runtime: A Raw Benchmark

Updated June 20, 2026

Published June 20, 2026

Well, you may have heard that they rewrote Bun in Rust and merged over 1 million lines of code to main back in May. See this PR to witness the moment in history.

But your runtime still has to keep a lot of those slow, long-lived streaming connections open at once without falling over. That part is on you, and it's where the language you pick starts to matter.

So I put Bun and Go head to head on the same job, each one running nothing but the HTTP server built into the runtime.

The setup

No framework. Just the lowest-level HTTP primitive each runtime ships with:

Go: net/http (stdlib)
Bun: Bun.serve()


CPU	4 × Intel® Core™ i5-7300HQ CPU @ 2.50GHz
RAM	8 GiB
OS	CachyOS 7.0.12
Go version	1.26.4
Bun version	1.3.13

Both servers expose the same contract:

text

POST /chat{ "message": "hello" }

The tools I used:

k6 to generate the load and verify every response came back 200 with a [DONE] at the end of the stream
pidstat (from the sysstat package) to sample CPU and memory once a second, measured against the actual server PID rather than the go run wrapper (more on that later)
gnuplot to plot those samples into the charts further down

Here's the raw k6 output from each run. Both passed every check, and the ~1 second response time is just the stub's token delay, not the server struggling.

Go

bash

1         /\      Grafana   /‾‾/2    /\  /  \     |\  __   /  /3   /  \/    \    | |/ /  /   ‾‾\4  /          \   |   (  |  (‾)  |5 / __________ \  |_|\_\  \_____/6 7 8     execution: local9        script: k6/baseline.js10        output: -11 12     scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. graceful stop):13              * default: 10 looping VUs for 1m0s (gracefulStop: 30s)14 15 16 17  █ TOTAL RESULTS18 19    checks_total.......: 1180    19.665/s20    checks_succeeded...: 100.00% 1180 out of 118021    checks_failed......: 0.00%   0 out of 118022 23    ✓ status 20024    ✓ has DONE25 26    HTTP27    http_req_duration..............: avg=1.01s min=1.01s med=1.01s max=1.02s p(90)=1.01s p(95)=1.02s28      { expected_response:true }...: avg=1.01s min=1.01s med=1.01s max=1.02s p(90)=1.01s p(95)=1.02s29    http_req_failed................: 0.00%  0 out of 59030    http_reqs......................: 590    9.8325/s31 32    EXECUTION33    iteration_duration.............: avg=1.01s min=1.01s med=1.01s max=1.02s p(90)=1.01s p(95)=1.02s34    iterations.....................: 590    9.8325/s35    vus............................: 10     min=10       max=1036    vus_max........................: 10     min=10       max=1037 38    NETWORK39    data_received..................: 918 kB 15 kB/s40    data_sent......................: 96 kB  1.6 kB/s

Bun

bash

1         /\      Grafana   /‾‾/2    /\  /  \     |\  __   /  /3   /  \/    \    | |/ /  /   ‾‾\4  /          \   |   (  |  (‾)  |5 / __________ \  |_|\_\  \_____/6 7 8 9    execution: local  10       script: k6/baseline.js  11       output: -  12  13    scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. graceful stop):  14             * default: 10 looping VUs for 1m0s (gracefulStop: 30s)  15  16  17  18 █ TOTAL RESULTS    19  20   checks_total.......: 1200    19.704167/s  21   checks_succeeded...: 100.00% 1200 out of 1200  22   checks_failed......: 0.00%   0 out of 1200  23  24   ✓ status 200  25   ✓ has DONE  26  27   HTTP  28   http_req_duration..............: avg=1.01s min=1s med=1.01s max=1.02s p(90)=1.01s p(95)=1.01s  29     { expected_response:true }...: avg=1.01s min=1s med=1.01s max=1.02s p(90)=1.01s p(95)=1.01s  30   http_req_failed................: 0.00%  0 out of 600  31   http_reqs......................: 600    9.852084/s  32  33   EXECUTION  34   iteration_duration.............: avg=1.01s min=1s med=1.01s max=1.02s p(90)=1.01s p(95)=1.01s  35   iterations.....................: 600    9.852084/s  36   vus............................: 10     min=10       max=10  37   vus_max........................: 10     min=10       max=10  38  39   NETWORK  40   data_received..................: 934 kB 15 kB/s  41   data_sent......................: 98 kB  1.6 kB/s

What I measured

This run was 10 virtual users sending requests for a minute, no database, just the proxy path. The CPU and memory came out clean:

CPU (%CPU, one sample per second):

Memory (RSS, MB):

	Go	Bun
CPU usage	~0-2%	~6-9%
RSS memory	~17-22 MB	~85-92 MB

Go used roughly 4x less memory than Bun and stayed near zero on CPU doing the same work. It wasn't close.

What this run actually covers

What I still want to do:

run k6's stress profile (ramping to 200 users) to find where each one starts to break down
run a soak test (sustained load for 10+ minutes) to catch any slow memory growth
wire up the chat history path I keep mentioning

So Go won. Is it settled?

Not really. A real agent runtime has to handle a lot more than what I tested here:

multiple tool calls, parallel or sequential
a long-running agentic loop that runs several turns on its own
sub-agent spawning (Go should do well here)
multimodal input and output
full CRUD on chat threads
streaming pub/sub

Those are the parts that decide whether a runtime holds up in production, and I haven't tested any of them.

So Go wins on raw numbers, clearly. Whether that matters to you depends on what you're optimizing for: the cloud bill, or how fast you can ship.

Would I still choose Bun?

TL;DR:

Choose Bun (or even Python) if:

You wanna ship fast
You wanna keep up with the latest innovations in AI (see every agentic framework like LangThis, LangThat, they all ship as a JS or Python package first since those are the most popular languages for this stuff)
You run on serverless and don't care about cold starts or resource usage
You're not serving a high-traffic production workload, think internal-use chatbot or an agent popup on an existing SaaS product (which is hard to access anyway)
Even if you do need to scale Bun (or a slow a$$ language like Python), you can always throw money at the scaling problem, lol

Choose Go if:

Your team is already familiar with Go
You're serving a high-traffic production workload (I'm talking millions of RPS like ChatGPT, not Theo Browne's t-something chat)
You care about resource cost at scale and don't want your cloud bill running its own agentic loop
You want a single static binary you can just ship, no runtime to install, no node_modules folder bigger than the app itself lol
Cold start time actually matters to you (Go boots instantly on serverless, Bun and Node not so much)