Sandbox Provider Leaderboard
Sandbox Benchmarks
A leaderboard of common benchmarks for each of our sandbox providers.
[Chart: Performance Over Time (Composite Score)]
Detailed Metrics
| Provider | Score | Median | P95 | P99 | Success |
|---|---|---|---|---|---|
| Isorun | 99.2 | 0.08s | 0.10s | 0.10s | 100% |
| Declaw | 98.0 | 0.18s | 0.23s | 0.23s | 100% |
| Northflank | 96.6 | 0.33s | 0.37s | 0.37s | 100% |
| E2B | 93.3 | 0.58s | 0.79s | 0.82s | 100% |
| Daytona | 93.1 | 0.59s | 0.81s | 0.85s | 100% |
| Modal | 93.3 | 0.65s | 0.69s | 0.70s | 100% |
| Vercel | 91.0 | 0.81s | 1.00s | 1.06s | 100% |
| Blaxel | 91.5 | 0.82s | 0.90s | 0.91s | 100% |
| Tensorlake | 52.8 | 1.21s | 10.62s | 10.63s | 100% |
| Archil | 77.0 | 2.19s | 2.44s | 2.51s | 100% |
| Cloudflare | 71.3 | 2.51s | 3.33s | 3.56s | 100% |
| Runloop | 66.4 | 3.19s | 3.62s | 3.62s | 100% |
| Upstash | 27.4 | 5.43s | 12.64s | 13.02s | 100% |
| CodeSandbox | 23.9 | 6.75s | 8.57s | 9.46s | 100% |
| Hopx | 0.0 | 16.25s | 16.96s | 17.01s | 100% |
Methodology
What We Measure
Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.
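As a rough illustration of the TTI definition above, a single measurement could be timed as in the sketch below. The `Sandbox` and `SandboxApi` interfaces, the `measureTti` name, and the `echo ready` command are assumptions made for this example; they stand in for `compute.sandbox.create()` and `runCommand()` and are not the SDK's actual types or the benchmark's actual harness.

```typescript
// Hypothetical minimal shapes for illustration; the real SDK types may differ.
interface Sandbox {
  runCommand(command: string): Promise<unknown>;
}

interface SandboxApi {
  create(): Promise<Sandbox>;
}

// Returns one TTI sample in milliseconds: the time from calling create()
// until the first runCommand() resolves. If either call rejects, the error
// propagates so the caller can count the iteration as a failure.
async function measureTti(
  api: SandboxApi,
  now: () => number = Date.now, // injectable clock, useful for testing
): Promise<number> {
  const start = now();
  const sandbox = await api.create();
  await sandbox.runCommand("echo ready"); // placeholder command (assumption)
  return now() - start;
}
```

This sketch issues a single command and does not retry; whether the real harness retries until the first success is not stated here.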
Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.
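To show how a run's samples might be reduced to the Median, P95, P99, and Success columns, here is a minimal sketch. The percentile method used by the benchmark is not stated, so the nearest-rank method below is an assumption, as are the `percentile`, `summarize`, and `RunSummary` names.

```typescript
// Nearest-rank percentile over a sorted copy of the samples.
// Assumption: the benchmark's actual percentile method may differ
// (for example, it may interpolate between neighboring samples).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; multiply before dividing to avoid floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

interface RunSummary {
  medianMs: number;
  p95Ms: number;
  p99Ms: number;
  successRate: number; // 0 to 1
}

// ttisMs holds the TTI of each successful iteration; attempts is the
// total number of iterations in the run (100 in the setup described above).
function summarize(ttisMs: number[], attempts: number): RunSummary {
  return {
    medianMs: percentile(ttisMs, 50),
    p95Ms: percentile(ttisMs, 95),
    p99Ms: percentile(ttisMs, 99),
    successRate: ttisMs.length / attempts,
  };
}
```

With 100 samples, nearest-rank picks the 50th, 95th, and 99th smallest values, so P99 is the second-slowest launch of the run.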
Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.
Staggered Test: Sandboxes are launched 200ms apart, without waiting for earlier launches to become interactive.
Burst Test: All sandboxes are launched concurrently in a single burst.
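The three launch patterns above differ only in when each launch starts. The sketch below is one way to express them, assuming a hypothetical `launch` function that resolves to a single TTI sample in milliseconds; the function names and the injectable `wait` parameter are illustrative and not taken from the benchmark code.

```typescript
// Hypothetical: one call launches one sandbox and resolves to its TTI in ms.
type Launch = () => Promise<number>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sequential: wait for each sandbox to become interactive before the next.
async function runSequential(launch: Launch, count: number): Promise<number[]> {
  const results: number[] = [];
  for (let i = 0; i < count; i++) {
    results.push(await launch());
  }
  return results;
}

// Staggered: launch i starts i * delayMs after the run begins,
// regardless of whether earlier launches have finished.
function runStaggered(
  launch: Launch,
  count: number,
  delayMs: number = 200,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<number[]> {
  return Promise.all(
    Array.from({ length: count }, (_, i) => wait(i * delayMs).then(launch)),
  );
}

// Burst: every launch starts at once.
function runBurst(launch: Launch, count: number): Promise<number[]> {
  return Promise.all(Array.from({ length: count }, () => launch()));
}
```

`Promise.all` rejects as soon as any launch fails, so a real harness that reports a success rate would more likely record each failure and continue; that handling is omitted here for brevity.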
How We Score
The Composite Score is a weighted blend of the three timing metrics (weights listed below), multiplied by the success rate. Each metric, measured in milliseconds, is scored against a fixed 10-second ceiling: 100 × (1 − value / 10,000), floored at 0, so a 200ms median scores 98 and anything ≥10s scores 0.
The weighted timing score is then multiplied by the success rate (0–1), so providers that fail frequently are penalized proportionally.
- Median: 60% (primary signal for typical experience)
- P95: 25% (tail latency / consistency)
- P99: 15% (extreme tail latency)
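Putting the formula and weights together, the scoring can be sketched as follows. The function and constant names are illustrative; the ceiling, weights, floor at 0, and success-rate multiplier come from the description above.

```typescript
const CEILING_MS = 10000; // fixed 10-second ceiling
const WEIGHTS = { median: 0.6, p95: 0.25, p99: 0.15 };

// Scores one timing metric on a 0-100 scale; values at or above
// the ceiling score 0 rather than going negative.
function metricScore(valueMs: number): number {
  return Math.max(0, 100 * (1 - valueMs / CEILING_MS));
}

// Weighted timing score multiplied by the success rate (0 to 1).
function compositeScore(
  medianMs: number,
  p95Ms: number,
  p99Ms: number,
  successRate: number,
): number {
  const timing =
    WEIGHTS.median * metricScore(medianMs) +
    WEIGHTS.p95 * metricScore(p95Ms) +
    WEIGHTS.p99 * metricScore(p99Ms);
  return timing * successRate;
}
```

For example, `compositeScore(1210, 10620, 10630, 1)` comes out at about 52.7: both tail metrics exceed the ceiling and score 0, so only the median contributes. That is close to the 52.8 listed for Tensorlake; small gaps like this are expected when the inputs are the rounded latencies shown in the table.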