
Sandbox Provider Leaderboard

Sandbox Benchmarks

A leaderboard of common benchmarks for each of our sandbox providers.

Last run: April 13, 2026

Performance Over Time

[Chart: Composite Score for each provider over time]

Detailed Metrics

Provider      Score   Median   P95      P99      Success
Daytona       98.3    0.10s    0.28s    0.28s    100%
Vercel        95.7    0.38s    0.48s    0.55s    100%
E2B           94.0    0.44s    0.76s    0.96s    100%
Blaxel        95.4    0.44s    0.47s    0.48s    100%
Hopx          7.9     1.05s    1.37s    1.37s    9%
Modal         83.0    1.52s    1.94s    1.99s    100%
Cloudflare    79.5    1.72s    2.33s    2.90s    100%
Namespace     80.6    1.77s    1.96s    2.61s    100%
Runloop       78.9    1.96s    2.32s    2.40s    100%
CodeSandbox   13.8    3.79s    13.07s   14.46s   37%

Want to see a provider added?

Let us know on X

Methodology

What We Measure

Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.
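The TTI measurement can be sketched as a simple timer around the two calls. This is an illustrative sketch, not the benchmark's actual harness: `createSandbox` stands in for a provider SDK's `compute.sandbox.create()`, and the `Sandbox` type is an assumed shape.

```typescript
// Minimal sketch of a TTI measurement. `createSandbox` and the Sandbox
// shape are hypothetical stand-ins for a provider's SDK.
type Sandbox = { runCommand: (cmd: string) => Promise<void> };

async function measureTTI(createSandbox: () => Promise<Sandbox>): Promise<number> {
  const start = performance.now();
  const sandbox = await createSandbox();   // provision the sandbox
  await sandbox.runCommand("echo ready");  // first successful command inside it
  return performance.now() - start;        // TTI in milliseconds
}
```

A failed `create` or `runCommand` would reject the promise, which is how an iteration counts against the success rate.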

Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.

Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.

Staggered Test: Sandboxes are launched with 200ms delays between each.

Burst Test: All sandboxes are launched concurrently in a single burst.
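The three launch patterns differ only in how sandbox creations are scheduled. A minimal sketch, where `launchOne` is a hypothetical helper that creates one sandbox, waits for it to become interactive, and resolves with its TTI:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Sequential: one at a time, each waits for the previous to become interactive.
async function sequential(launchOne: () => Promise<number>, n: number): Promise<number[]> {
  const results: number[] = [];
  for (let i = 0; i < n; i++) results.push(await launchOne());
  return results;
}

// Staggered: start each launch 200ms after the previous one, without waiting
// for earlier launches to finish.
async function staggered(launchOne: () => Promise<number>, n: number): Promise<number[]> {
  const pending: Promise<number>[] = [];
  for (let i = 0; i < n; i++) {
    pending.push(launchOne());
    if (i < n - 1) await sleep(200);
  }
  return Promise.all(pending);
}

// Burst: all launches fired concurrently in a single burst.
async function burst(launchOne: () => Promise<number>, n: number): Promise<number[]> {
  return Promise.all(Array.from({ length: n }, () => launchOne()));
}
```

The burst case is the stress test: it surfaces cold-start queuing and rate limits that the sequential case hides.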

How We Score

The Composite Score is a weighted blend of timing metrics multiplied by the success rate. Each metric is scored against a fixed 10-second ceiling: 100 × (1 − value / 10,000ms), so a 200ms median scores 98 and anything ≥10s scores 0.

The weighted timing score is then multiplied by the success rate (0–1), so providers that fail frequently are penalized proportionally.

  • Median: 60% — primary signal for typical experience
  • P95: 25% — tail latency / consistency
  • P99: 15% — extreme tail latency
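The formula and weights above translate directly into code. A sketch (function names are illustrative), checked against Daytona's row in the table:

```typescript
// Ceiling and weights from the methodology above.
const CEILING_MS = 10_000;
const WEIGHTS = { median: 0.6, p95: 0.25, p99: 0.15 };

// Score one timing metric against the fixed 10-second ceiling:
// 100 × (1 − value / 10,000ms), floored at 0 for values ≥ 10s.
function metricScore(ms: number): number {
  return Math.max(0, 100 * (1 - ms / CEILING_MS));
}

// Composite: weighted blend of timing scores, multiplied by success rate (0–1).
function compositeScore(medianMs: number, p95Ms: number, p99Ms: number, successRate: number): number {
  const timing =
    WEIGHTS.median * metricScore(medianMs) +
    WEIGHTS.p95 * metricScore(p95Ms) +
    WEIGHTS.p99 * metricScore(p99Ms);
  return timing * successRate;
}

// Daytona's row: 0.10s median, 0.28s P95, 0.28s P99, 100% success.
console.log(compositeScore(100, 280, 280, 1).toFixed(1)); // → "98.3"
```

The success-rate multiplier is what drops Hopx to 7.9 despite sub-1.5s latencies: its timing score of roughly 88 is multiplied by a 9% success rate.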

Sandbox Benchmarks FAQs

Have another question? Email us.

What is a sandbox?

A sandbox is anywhere you can run code in isolation. It could be a VM, bare metal, or a container: any environment with compute resources.