Sandbox Provider Leaderboard

Sandbox Benchmarks

A leaderboard of common benchmarks for each of our sandbox providers.

Last run: May 13, 2026

Performance Over Time

[Chart: Composite Score over time]

Detailed Metrics

Provider      Score   Median   P95      P99      Success
Declaw        98.2    0.17s    0.19s    0.20s    100%
Daytona       94.4    0.51s    0.64s    0.65s    100%
Tensorlake    94.5    0.52s    0.60s    0.60s    100%
Archil        93.3    0.63s    0.72s    0.73s    100%
E2B           92.1    0.69s    0.92s    1.00s    100%
Vercel        89.4    0.80s    1.33s    1.67s    100%
Blaxel        83.7    1.49s    1.84s    1.85s    100%
Modal         81.7    1.62s    2.11s    2.17s    100%
Upstash       72.0    1.79s    3.92s    4.00s    98%
Namespace     77.8    2.11s    2.36s    2.41s    100%
Cloudflare    73.7    2.24s    2.92s    3.72s    100%
Runloop       32.8    5.62s    8.02s    9.01s    100%
CodeSandbox   20.8    6.53s    13.04s   14.80s   100%
Hopx          0.0     15.97s   16.50s   16.56s   100%

Want to see a provider added?

Let us know on X

Methodology

What We Measure

Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.
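As a rough illustration of the TTI definition above, one iteration could be timed as in the sketch below. The `Sandbox` and `SandboxFactory` interfaces, the `measureTti` helper, the injectable clock, and the `echo ok` probe command are all assumptions made for this example. They stand in for compute.sandbox.create() and runCommand() and are not the SDK's actual types or the benchmark's real harness.

```typescript
// Hypothetical minimal shapes for illustration; the real SDK types may differ.
interface Sandbox {
  runCommand(command: string): Promise<unknown>;
}

interface SandboxFactory {
  create(): Promise<Sandbox>;
}

type TtiResult =
  | { ok: true; ttiMs: number }
  | { ok: false; error: string };

// Times one iteration: the clock starts just before create() is called and
// stops when the first runCommand() resolves. A thrown error is recorded as
// a failed iteration rather than a timing sample.
async function measureTti(
  factory: SandboxFactory,
  now: () => number = Date.now, // injectable clock (assumption, eases testing)
  command: string = "echo ok" // assumed probe command
): Promise<TtiResult> {
  const start = now();
  try {
    const sandbox = await factory.create();
    await sandbox.runCommand(command);
    return { ok: true, ttiMs: now() - start };
  } catch (error) {
    return { ok: false, error: String(error) };
  }
}
```

Treating a thrown error as a failed iteration, instead of retrying, is one possible reading of "first successful runCommand()"; a harness that retries until success would report different numbers.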

Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.
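The median, P95, P99, and success columns can be derived from the per-iteration samples of a run. The sketch below shows one way to do that. It assumes the nearest-rank percentile method, which the page does not specify, and the names `percentile`, `summarize`, and `RunSummary` are hypothetical.

```typescript
interface RunSummary {
  medianMs: number;
  p95Ms: number;
  p99Ms: number;
  successRate: number; // fraction from 0 to 1
}

// Nearest-rank percentile (an assumption; the benchmark may interpolate):
// the smallest sample such that at least p% of samples are at or below it.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return NaN;
  const rank = Math.max(1, Math.ceil((p * sortedMs.length) / 100));
  return sortedMs[rank - 1] ?? NaN;
}

// Summarizes one run: timings come from successful iterations only, while
// the success rate is measured against every attempted iteration.
function summarize(successfulTtisMs: number[], attempted: number): RunSummary {
  const sorted = [...successfulTtisMs].sort((a, b) => a - b);
  return {
    medianMs: percentile(sorted, 50),
    p95Ms: percentile(sorted, 95),
    p99Ms: percentile(sorted, 99),
    successRate: attempted > 0 ? sorted.length / attempted : 0,
  };
}
```

Under this method, a run with 100 successful iterations reports the 95th smallest sample as P95 and the 99th smallest as P99.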

Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.

Staggered Test: Sandboxes are launched with a 200ms delay between consecutive launches.

Burst Test: All sandboxes are launched concurrently in a single burst.
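The three launch patterns differ only in when each launch starts. A minimal scheduling sketch follows. The `runMode` function and its `launch` callback are hypothetical, and the staggered mode assumes each launch starts on its timer without waiting for earlier launches to finish.

```typescript
type Mode = "sequential" | "staggered" | "burst";

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `count` launches in the given mode and resolves with their results
// (for example, TTI in milliseconds) in launch order.
async function runMode(
  mode: Mode,
  count: number,
  launch: (index: number) => Promise<number>,
  staggerMs: number = 200
): Promise<number[]> {
  if (mode === "sequential") {
    // One at a time: wait for each launch before starting the next.
    const results: number[] = [];
    for (let i = 0; i < count; i++) {
      results.push(await launch(i));
    }
    return results;
  }
  // Burst uses no delay; staggered offsets launch i by i * staggerMs.
  const delay = mode === "staggered" ? staggerMs : 0;
  const pending = Array.from({ length: count }, async (_, i) => {
    const offset = i * delay;
    if (offset > 0) await sleep(offset);
    return launch(i);
  });
  return Promise.all(pending);
}
```

Because `Promise.all` rejects on the first error, a `launch` callback used with this sketch should catch its own failures and report them in its result.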

How We Score

The Composite Score is a weighted blend of three timing metrics (median, P95, and P99 TTI), multiplied by the success rate. Each metric is scored against a fixed 10-second ceiling: score = 100 × (1 − value / 10,000), with value in milliseconds and the result floored at 0. A 200ms median therefore scores 98, and anything ≥10s scores 0.

The weighted timing score is then multiplied by the success rate (a fraction from 0 to 1), so providers that fail frequently are penalized proportionally. The timing metrics are weighted as follows:

  • Median: 60% — primary signal for typical experience
  • P95: 25% — tail latency / consistency
  • P99: 15% — extreme tail latency
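The scoring rule and weights above can be written out as a short calculation. This is a sketch of the formula as described on this page, not the benchmark's actual source. The names `metricScore` and `compositeScore` are hypothetical, and the floor at 0 reflects the statement that anything at or above 10s scores 0.

```typescript
const CEILING_MS = 10000; // fixed 10-second ceiling
const WEIGHTS = { median: 0.6, p95: 0.25, p99: 0.15 };

// Scores one timing metric: 100 at 0ms, falling linearly to 0 at the ceiling.
function metricScore(valueMs: number): number {
  return Math.max(0, 100 * (1 - valueMs / CEILING_MS));
}

// Weighted blend of the three metric scores, scaled by the success rate (0 to 1).
function compositeScore(
  medianMs: number,
  p95Ms: number,
  p99Ms: number,
  successRate: number
): number {
  const timing =
    WEIGHTS.median * metricScore(medianMs) +
    WEIGHTS.p95 * metricScore(p95Ms) +
    WEIGHTS.p99 * metricScore(p99Ms);
  return timing * successRate;
}
```

With the Declaw row from the table (0.17s, 0.19s, 0.20s, 100% success), `compositeScore(170, 190, 200, 1)` evaluates to about 98.2. With the Upstash row (1.79s, 3.92s, 4.00s, 98% success) it evaluates to about 72.0. Small differences from published scores are possible because the table shows rounded timings.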

Sandbox Benchmarks FAQs

Have another question? Email us.

What is a sandbox?

A sandbox is any environment where you can run code in isolation. It could be a VM, a bare-metal machine, a container, or anything else with compute resources.