Sandbox Provider Leaderboard
Sandbox Benchmarks
A leaderboard of common benchmarks for each of our sandbox providers.
Performance Over Time
Composite Score
Detailed Metrics
| Provider | Score | Median | P95 | P99 | Success |
|---|---|---|---|---|---|
| Daytona | 98.3 | 0.10s | 0.28s | 0.28s | 100% |
| Vercel | 95.7 | 0.38s | 0.48s | 0.55s | 100% |
| E2B | 94.0 | 0.44s | 0.76s | 0.96s | 100% |
| Blaxel | 95.4 | 0.44s | 0.47s | 0.48s | 100% |
| Hopx | 7.9 | 1.05s | 1.37s | 1.37s | 9% |
| Modal | 83.0 | 1.52s | 1.94s | 1.99s | 100% |
| Cloudflare | 79.5 | 1.72s | 2.33s | 2.90s | 100% |
| Namespace | 80.6 | 1.77s | 1.96s | 2.61s | 100% |
| Runloop | 78.9 | 1.96s | 2.32s | 2.40s | 100% |
| CodeSandbox | 13.8 | 3.79s | 13.07s | 14.46s | 37% |
Methodology
What We Measure
Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.
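A minimal sketch of how one TTI sample could be timed. The `ComputeLike` and `SandboxLike` interfaces are hypothetical stand-ins that mirror only the two calls named above; the real SDK's types, arguments, and return values may differ. The `echo ready` probe command and the single-attempt failure handling are also assumptions, and the actual harness may retry until a command succeeds.

```typescript
// Hypothetical minimal shapes mirroring the two calls named in the text;
// the real SDK's types and signatures may differ.
interface SandboxLike {
  runCommand(command: string): Promise<unknown>;
}
interface ComputeLike {
  sandbox: { create(): Promise<SandboxLike> };
}

interface TtiSample {
  ok: boolean;   // true if create() and the first runCommand() both succeeded
  ttiMs: number; // elapsed milliseconds until success or failure
}

// Time one sample: start the clock just before create() and stop it once the
// first command returns. Any thrown error counts as a failed iteration.
async function measureTti(compute: ComputeLike): Promise<TtiSample> {
  const start = performance.now();
  try {
    const sandbox = await compute.sandbox.create();
    await sandbox.runCommand("echo ready"); // assumed probe command
    return { ok: true, ttiMs: performance.now() - start };
  } catch {
    return { ok: false, ttiMs: performance.now() - start };
  }
}
```

Recording failures as data instead of letting them throw keeps a single bad iteration from aborting the run and makes the success rate easy to compute afterwards.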
Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.
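The table's Median, P95, P99, and Success columns could be derived from one run's samples roughly as follows. This is a sketch under two assumptions the page does not state: percentiles use the nearest-rank method, and timing statistics are computed over successful iterations only.

```typescript
// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return NaN;
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

interface RunSummary {
  medianMs: number;
  p95Ms: number;
  p99Ms: number;
  successRate: number; // fraction of iterations that succeeded, 0 to 1
}

// Summarize one run. Failed iterations lower the success rate but are
// excluded from the timing percentiles (an assumption).
function summarize(samples: { ok: boolean; ttiMs: number }[]): RunSummary {
  const okMs = samples
    .filter((s) => s.ok)
    .map((s) => s.ttiMs)
    .sort((a, b) => a - b);
  return {
    medianMs: percentile(okMs, 50),
    p95Ms: percentile(okMs, 95),
    p99Ms: percentile(okMs, 99),
    successRate: samples.length === 0 ? 0 : okMs.length / samples.length,
  };
}
```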
Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.
Staggered Test: Sandboxes are launched with 200ms delays between each.
Burst Test: All sandboxes are launched concurrently in a single burst.
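The three launch patterns can be sketched as small scheduling helpers. The names `runSequential`, `runStaggered`, and `runBurst` are illustrative and not taken from the benchmark repo. Each task is assumed to be a function that starts one measurement and never rejects (for example, one that records failures as data).

```typescript
type Task<T> = () => Promise<T>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sequential: start each launch only after the previous one has finished.
async function runSequential<T>(tasks: Task<T>[]): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {
    results.push(await task());
  }
  return results;
}

// Staggered: start launches a fixed delay apart without waiting for
// earlier launches to finish. Tasks are assumed not to reject.
async function runStaggered<T>(tasks: Task<T>[], delayMs = 200): Promise<T[]> {
  const pending: Promise<T>[] = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(delayMs);
    pending.push(tasks[i]());
  }
  return Promise.all(pending);
}

// Burst: start every launch at once and wait for all of them.
function runBurst<T>(tasks: Task<T>[]): Promise<T[]> {
  return Promise.all(tasks.map((task) => task()));
}
```

All three helpers return results in launch order, so the same summary code can be applied regardless of which pattern produced the samples.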
How We Score
The Composite Score is a weighted blend of timing metrics multiplied by the success rate. Each timing metric (median, P95, P99) is scored against a fixed 10-second ceiling: 100 × (1 − value / 10,000 ms), floored at 0, so a 200 ms median scores 98 and anything at or above 10 s scores 0.
The weighted timing score is then multiplied by the success rate (0–1), so providers that fail frequently are penalized proportionally.
- Median: 60% (primary signal for typical experience)
- P95: 25% (tail latency and consistency)
- P99: 15% (extreme tail latency)
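The scoring rule above can be written out as a short sketch. The function and constant names are illustrative, and inputs are assumed to be in milliseconds with the success rate as a fraction between 0 and 1.

```typescript
const CEILING_MS = 10000; // fixed 10-second ceiling
const WEIGHTS = { median: 0.6, p95: 0.25, p99: 0.15 };

// Score one timing metric on a 0-100 scale; values at or above the
// ceiling are floored at 0.
function metricScore(valueMs: number): number {
  return Math.max(0, 100 * (1 - valueMs / CEILING_MS));
}

// Weighted timing score multiplied by the success rate (0 to 1).
function compositeScore(
  medianMs: number,
  p95Ms: number,
  p99Ms: number,
  successRate: number,
): number {
  const timing =
    WEIGHTS.median * metricScore(medianMs) +
    WEIGHTS.p95 * metricScore(p95Ms) +
    WEIGHTS.p99 * metricScore(p99Ms);
  return timing * successRate;
}
```

Applied to the rounded values in the table, `compositeScore(100, 280, 280, 1)` gives about 98.28, matching Daytona's 98.3, and `compositeScore(3790, 13070, 14460, 0.37)` gives about 13.79, matching CodeSandbox's 13.8. In the second case the P95 and P99 both exceed the ceiling and contribute 0.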