Draft: Bun, Go, and Rust on SSE Streaming - 8-Way LLM Proxy Benchmark
Updated June 26, 2026
Published June 26, 2026
Eight servers. Two scenarios. One machine. All proxying SSE token streams from a mock LLM stub.
Machine
| component | spec |
|---|---|
| CPU | Intel Core i5-7300HQ @ 2.50 GHz (4 cores) |
| RAM | 8 GiB |
| OS | CachyOS 7.0.12 |
Servers under test
| server | runtime | framework |
|---|---|---|
| bun-http | Bun 1.x | Bun.serve() |
| bun-hono | Bun 1.x | Hono 4.6 |
| go-http | Go 1.26 | net/http |
| go-gin | Go 1.26 | Gin 1.10 |
| go-fiber | Go 1.26 | Fiber v2 |
| rust-http | Rust 1.x | std::net (sync, thread-per-conn) |
| rust-actix | Rust 1.x | actix-web 4 |
| rust-axum | Rust 1.x | axum 0.7 + hyper 1 |
Test pipeline
No database. STUB_URL points every server at the mock LLM stub. pidstat samples the server PID's CPU and RSS at 1 Hz. On every iteration, k6 checks for an HTTP 200 status and a [DONE] event at the end of the stream.
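For reference, an SSE token stream is a sequence of "data: ..." lines, with each event terminated by a blank line. The sketch below assumes an OpenAI-style stream that ends with a "data: [DONE]" event and uses "\n" line endings (the SSE format also allows "\r\n"). format_sse and stream_completed are hypothetical helpers that show roughly what a stub emits and what the per-iteration k6 check asserts; they are not the benchmark's actual code.

```rust
/// Frame each token as one SSE event ("data: <payload>\n\n") and append a
/// terminating "[DONE]" event (assumed OpenAI-style stub format).
fn format_sse(tokens: &[&str]) -> String {
    let mut out = String::new();
    for tok in tokens {
        out.push_str("data: ");
        out.push_str(tok);
        out.push_str("\n\n");
    }
    out.push_str("data: [DONE]\n\n");
    out
}

/// Mirror of the per-iteration check: HTTP 200 and a final "[DONE]" event.
fn stream_completed(status: u16, body: &str) -> bool {
    if status != 200 {
        return false;
    }
    // Events are separated by a blank line; look at the last non-empty one.
    let last_event = body
        .split("\n\n")
        .map(str::trim)
        .filter(|e| !e.is_empty())
        .last();
    last_event == Some("data: [DONE]")
}

fn main() {
    let body = format_sse(&["Hello", ",", " world"]);
    println!("{}", stream_completed(200, &body)); // true
    println!("{}", stream_completed(200, "data: Hello\n\n")); // false: truncated stream
    println!("{}", stream_completed(502, &body)); // false: upstream error
}
```

Checking the last event rather than searching for "[DONE]" anywhere means a stream that is cut off mid-way fails the check even if a token happened to contain that text.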
Scenarios
| scenario | VUs | duration |
|---|---|---|
| baseline | 10 constant | 60 s |
| stress | 10 to 100 to 200 to 0 | ~70 s |
Stress test - top-line numbers
Charts (stress): throughput (req/s, higher is better) · p95 latency (ms, lower is better) · CPU usage (%, lower is better) · peak memory (MB RSS, lower is better)
Baseline vs stress - throughput delta
Under light load, all eight servers cluster tightly. Under stress, bun-hono is the only server whose throughput goes up; every other server drops 3-13%.
Full results - stress
| server | req/s | avg ms | p95 ms | max ms | CPU avg % | peak RSS MB | fail % |
|---|---|---|---|---|---|---|---|
| bun-hono | 1609.6 | 43.9 | 115.4 | 356.0 | 61.8 | 129.0 | 0 |
| bun-http | 1543.1 | 45.8 | 125.4 | 312.8 | 66.0 | 122.1 | 0 |
| go-http | 1423.8 | 49.7 | 201.8 | 942.8 | 82.6 | 37.4 | 0 |
| go-fiber | 1410.6 | 50.2 | 207.5 | 973.2 | 78.5 | 35.8 | 0 |
| go-gin | 1373.4 | 51.6 | 213.2 | 1052.0 | 81.8 | 41.7 | 0 |
| rust-axum | 1187.1 | 59.8 | 268.8 | 1751.7 | 102.5 | 23.4 | 0 |
| rust-actix | 1123.7 | 63.1 | 294.3 | 1543.9 | 107.6 | 25.5 | 0 |
| rust-http | 813.1 | 87.2 | 286.4 | 966.7 | 120.8 | 18.1 | 0 |
3-way comparison - normalized scores (higher = better)
bun-hono · go-http · rust-axum. Each axis is normalized so that the best value in class scores 100.
p95 latency, max latency, and memory are inverted: lower ms or MB gives a higher score.
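The exact normalization formula is not stated, so here is a sketch under two assumptions: higher-is-better axes score 100 × value / best, inverted axes score 100 × best / value, and "best" is taken among the three servers shown rather than all eight. The axis set (throughput, p95, max latency, CPU, memory) is also assumed; the input numbers come from the Full results table.

```rust
/// Normalize one axis so the best value scores 100 (assumed ratio-based scheme).
/// Higher-is-better: 100 * v / max. Lower-is-better (inverted): 100 * min / v.
fn normalize_axis(values: &[f64], higher_is_better: bool) -> Vec<f64> {
    if higher_is_better {
        let best = values.iter().copied().fold(f64::MIN, f64::max);
        values.iter().map(|&v| 100.0 * v / best).collect()
    } else {
        let best = values.iter().copied().fold(f64::MAX, f64::min);
        values.iter().map(|&v| 100.0 * best / v).collect()
    }
}

fn main() {
    let servers = ["bun-hono", "go-http", "rust-axum"];
    // (axis, [bun-hono, go-http, rust-axum], higher_is_better), stress results.
    let axes: [(&str, [f64; 3], bool); 5] = [
        ("throughput", [1609.6, 1423.8, 1187.1], true),
        ("p95 latency", [115.4, 201.8, 268.8], false),
        ("max latency", [356.0, 942.8, 1751.7], false),
        ("cpu", [61.8, 82.6, 102.5], false),
        ("memory", [129.0, 37.4, 23.4], false),
    ];
    for (axis, values, higher) in axes.iter() {
        let scores = normalize_axis(values, *higher);
        print!("{axis:12}");
        for (name, s) in servers.iter().zip(&scores) {
            print!("  {name}={s:.1}");
        }
        println!();
    }
}
```

A min-max scheme (worst = 0) would exaggerate the gaps between three points; the ratio form keeps every score proportional to the raw measurement.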
Memory share by server - stress peak RSS
The two Bun servers account for 58% of the combined peak RSS of all eight servers; the three Rust servers account for 15%.
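The shares can be recomputed from the peak RSS column of the Full results table. This small sketch groups servers by name prefix (a convenience assumption, since every server name starts with its runtime) and divides each group's sum by the 433.0 MB total.

```rust
/// Peak RSS (MB) under stress, from the Full results table.
const PEAK_RSS_MB: [(&str, f64); 8] = [
    ("bun-hono", 129.0), ("bun-http", 122.1),
    ("go-http", 37.4), ("go-fiber", 35.8), ("go-gin", 41.7),
    ("rust-axum", 23.4), ("rust-actix", 25.5), ("rust-http", 18.1),
];

/// Percentage of the combined peak RSS used by servers whose name starts with `prefix`.
fn share_pct(prefix: &str) -> f64 {
    let total: f64 = PEAK_RSS_MB.iter().map(|(_, mb)| mb).sum();
    let part: f64 = PEAK_RSS_MB
        .iter()
        .filter(|(name, _)| name.starts_with(prefix))
        .map(|(_, mb)| mb)
        .sum();
    100.0 * part / total
}

fn main() {
    // Prints 58%, 27% and 15% for Bun, Go and Rust respectively.
    for prefix in ["bun-", "go-", "rust-"] {
        println!("{prefix:6}{:.0}%", share_pct(prefix));
    }
}
```

Go's three servers make up the remaining 27%.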
Rust and TCP_NODELAY
Both rust-actix and rust-axum came in at ~260 req/s in early runs: on par with the other servers on fresh connections, but 4-5x slower on keep-alive connections. Root cause: hyper and actix-web do not set TCP_NODELAY on accepted sockets. With Nagle's algorithm enabled, the kernel holds each small SSE write until the previous segment is acknowledged, and because the client delays its ACK (about 40 ms on Linux), each request stalls for roughly 40 ms. Fix: .tcp_nodelay(true) on HttpServer for actix; a custom hyper http1 accept loop that calls stream.set_nodelay(true) for axum. Throughput after the fix: actix 263 to 1124 req/s, axum 266 to 1187 req/s.
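The same socket option is available in Rust's standard library as TcpStream::set_nodelay (Go's net package, by contrast, enables TCP_NODELAY on TCP connections by default). The sketch below is a minimal keep-alive SSE server in the spirit of rust-http. It is not the benchmark's code and not the actix or axum fix. It disables Nagle on every accepted socket and then sends each event as its own small chunked write. The token list, the 127.0.0.1:8080 address, and the helper names are illustrative assumptions, and request bodies are ignored (GET only).

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

const HEAD: &str = concat!(
    "HTTP/1.1 200 OK\r\n",
    "Content-Type: text/event-stream\r\n",
    "Cache-Control: no-cache\r\n",
    "Transfer-Encoding: chunked\r\n",
    "\r\n"
);

/// Read one request head; returns Ok(false) once the client closes the connection.
fn read_request_head(reader: &mut impl BufRead) -> io::Result<bool> {
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Ok(false); // EOF: client closed the keep-alive connection
        }
        if line == "\r\n" || line == "\n" {
            return Ok(true); // blank line ends the request head
        }
    }
}

/// Send one SSE event as a single HTTP/1.1 chunk: "<hex len>\r\n<data>\r\n".
fn write_event(stream: &mut TcpStream, payload: &str) -> io::Result<()> {
    let event = format!("data: {payload}\n\n");
    let chunk = format!("{:x}\r\n{}\r\n", event.len(), event);
    stream.write_all(chunk.as_bytes())
}

fn handle(mut stream: TcpStream) -> io::Result<()> {
    // The fix under discussion: disable Nagle so small writes go out immediately.
    stream.set_nodelay(true)?;
    let mut reader = BufReader::new(stream.try_clone()?);
    // Keep-alive: serve requests on this socket until the client hangs up.
    while read_request_head(&mut reader)? {
        stream.write_all(HEAD.as_bytes())?;
        for token in ["Hello", ",", " world", "[DONE]"] {
            write_event(&mut stream, token)?;
        }
        stream.write_all(b"0\r\n\r\n")?; // zero-length chunk ends the response
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for conn in listener.incoming() {
        let stream = conn?;
        // Thread-per-connection, like the std::net server under test.
        thread::spawn(move || {
            if let Err(e) = handle(stream) {
                eprintln!("connection error: {e}");
            }
        });
    }
    Ok(())
}
```

Each event is formatted into one buffer before write_all, so the only small writes are the ones the protocol requires; without set_nodelay(true), the writes after the first in each response are the ones Nagle would hold back on a reused connection.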
At 200 VUs, bun-hono uses ~92 MB more RSS than go-http (129 vs 37 MB) and gets back 13% more throughput, 43% lower p95 latency (115 vs 202 ms), a worst-case tail of 356 ms instead of 943 ms, and about 21 percentage points less CPU (61.8% vs 82.6%, roughly 25% less). Is that trade worth it? For a latency-sensitive streaming service with a container budget of 256 MB or more, yes: the memory delta is cheap for what you get. Reach for Go only when packing many processes onto the same box, or when memory is genuinely constrained. Reach for Rust when you need the smallest footprint (18-26 MB peak RSS here) and can accept the throughput and tail-latency cost the Rust proxies currently show.
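The bun-hono vs go-http figures can be checked directly against the Full results table. This sketch computes the relative deltas, and also the CPU difference in percentage points, because "21% less CPU" and "25% less CPU" describe the same two measurements in different units.

```rust
/// Stress numbers from the Full results table: [req/s, p95 ms, max ms, CPU %, peak RSS MB].
const BUN_HONO: [f64; 5] = [1609.6, 115.4, 356.0, 61.8, 129.0];
const GO_HTTP: [f64; 5] = [1423.8, 201.8, 942.8, 82.6, 37.4];

/// Relative change of `a` versus baseline `b`, in percent.
fn pct_change(a: f64, b: f64) -> f64 {
    100.0 * (a - b) / b
}

fn main() {
    let [rps_b, p95_b, _, cpu_b, mem_b] = BUN_HONO;
    let [rps_g, p95_g, _, cpu_g, mem_g] = GO_HTTP;
    println!("extra memory:   {:.1} MB", mem_b - mem_g); // 91.6 MB
    println!("throughput:     {:+.1}%", pct_change(rps_b, rps_g)); // +13.0%
    println!("p95 latency:    {:+.1}%", pct_change(p95_b, p95_g)); // -42.8%
    println!("CPU (relative): {:+.1}%", pct_change(cpu_b, cpu_g)); // -25.2%
    println!("CPU (points):   {:+.1} pp", cpu_b - cpu_g); // -20.8 pp
}
```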