Sandbox Provider Leaderboard
Sandbox Benchmarks
A leaderboard of common benchmarks for each of our sandbox providers.
Provider Leaderboard
Detailed Metrics
| Provider | Score | Median | P95 | P99 | Success |
|---|---|---|---|---|---|
| Declaw | 98.2 | 0.17s | 0.19s | 0.20s | 100% |
| Daytona | 94.4 | 0.51s | 0.64s | 0.65s | 100% |
| Tensorlake | 94.5 | 0.52s | 0.60s | 0.60s | 100% |
| Archil | 93.3 | 0.63s | 0.72s | 0.73s | 100% |
| E2B | 92.1 | 0.69s | 0.92s | 1.00s | 100% |
| Vercel | 89.4 | 0.80s | 1.33s | 1.67s | 100% |
| Blaxel | 83.7 | 1.49s | 1.84s | 1.85s | 100% |
| Modal | 81.7 | 1.62s | 2.11s | 2.17s | 100% |
| Upstash | 72.0 | 1.79s | 3.92s | 4.00s | 98% |
| Namespace | 77.8 | 2.11s | 2.36s | 2.41s | 100% |
| Cloudflare | 73.7 | 2.24s | 2.92s | 3.72s | 100% |
| Runloop | 32.8 | 5.62s | 8.02s | 9.01s | 100% |
| CodeSandbox | 20.8 | 6.53s | 13.04s | 14.80s | 100% |
| Hopx | 0.0 | 15.97s | 16.50s | 16.56s | 100% |
Methodology
What We Measure
Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.
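As a rough illustration of the TTI definition above, the sketch below times one create-then-run cycle. The `Sandbox` and `SandboxFactory` interfaces, the `echo ready` command, and the injectable `now` clock are assumptions made for this example; they are not the SDK's actual types, and the real harness may retry or probe differently.

```typescript
// Hypothetical minimal shapes for illustration; the real SDK's types may differ.
interface Sandbox {
  runCommand(command: string): Promise<{ exitCode: number }>;
}

interface SandboxFactory {
  create(): Promise<Sandbox>;
}

interface TtiResult {
  ok: boolean;
  ttiMs: number;
}

// Time from calling create() until the first command returns.
// Assumption: a single attempt, with "successful" meaning exit code 0.
async function measureTti(
  factory: SandboxFactory,
  now: () => number = Date.now,
): Promise<TtiResult> {
  const start = now();
  try {
    const sandbox = await factory.create();
    const result = await sandbox.runCommand("echo ready");
    return { ok: result.exitCode === 0, ttiMs: now() - start };
  } catch {
    // A failed create or command is recorded as an unsuccessful iteration.
    return { ok: false, ttiMs: now() - start };
  }
}
```

Catching errors inside `measureTti` keeps one failed iteration from aborting a whole run, so failures can be counted toward the success rate instead.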
Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.
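One way the 100 iterations of a run could be reduced to the median, P95, P99, and success columns is sketched below. The nearest-rank percentile method and the choice to compute timings over successful iterations only are assumptions of this example; the benchmark's actual aggregation may differ.

```typescript
interface Sample {
  ok: boolean;
  ttiMs: number;
}

interface Summary {
  medianMs: number;
  p95Ms: number;
  p99Ms: number;
  successRate: number; // 0 to 1
}

// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return NaN;
  const rank = Math.ceil((p * sortedMs.length) / 100);
  return sortedMs[Math.max(rank, 1) - 1];
}

// Assumption: timing percentiles use successful iterations only,
// while failures are reflected in successRate.
function summarize(samples: Sample[]): Summary {
  const okMs = samples
    .filter((s) => s.ok)
    .map((s) => s.ttiMs)
    .sort((a, b) => a - b);
  return {
    medianMs: percentile(okMs, 50),
    p95Ms: percentile(okMs, 95),
    p99Ms: percentile(okMs, 99),
    successRate: samples.length === 0 ? 0 : okMs.length / samples.length,
  };
}
```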
Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.
Staggered Test: Sandboxes are launched with a 200 ms delay between consecutive launches, without waiting for earlier sandboxes to become interactive.
Burst Test: All sandboxes are launched concurrently in a single burst.
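The three launch patterns above differ only in when each launch starts. A minimal sketch, assuming a hypothetical `launch` function that resolves to a TTI in milliseconds and never rejects (the function names here are illustrative, not taken from the benchmark code):

```typescript
// Hypothetical: one launch resolves to its TTI in milliseconds and never rejects.
type Launch = () => Promise<number>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sequential: wait for each sandbox to become interactive before starting the next.
async function runSequential(launch: Launch, count: number): Promise<number[]> {
  const results: number[] = [];
  for (let i = 0; i < count; i++) {
    results.push(await launch());
  }
  return results;
}

// Staggered: start a new launch every delayMs without waiting for earlier ones.
async function runStaggered(
  launch: Launch,
  count: number,
  delayMs = 200,
): Promise<number[]> {
  const pending: Promise<number>[] = [];
  for (let i = 0; i < count; i++) {
    if (i > 0) await sleep(delayMs);
    pending.push(launch());
  }
  return Promise.all(pending);
}

// Burst: start every launch at once and wait for all of them.
async function runBurst(launch: Launch, count: number): Promise<number[]> {
  return Promise.all(Array.from({ length: count }, () => launch()));
}
```

Sequential runs isolate per-sandbox startup cost, while staggered and burst runs also expose how a provider behaves under overlapping requests.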
How We Score
The Composite Score is a weighted blend of three timing metrics (median, P95, and P99 TTI), multiplied by the success rate. Each metric is scored against a fixed 10-second ceiling: 100 × (1 − value / 10,000 ms), floored at 0, so a 200 ms median scores 98 and anything ≥ 10 s scores 0.
The weighted timing score is then multiplied by the success rate (0–1), so providers that fail frequently are penalized proportionally. The timing weights are:
- Median: 60% (primary signal for typical experience)
- P95: 25% (tail latency / consistency)
- P99: 15% (extreme tail latency)
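The scoring rule and weights above can be written out as a short function. This is a sketch of the formula as described, not the benchmark's actual code; the names `ProviderStats`, `metricScore`, `compositeScore`, and `round1` are made up for illustration. The example input uses the rounded timings from the Declaw row of the table.

```typescript
interface ProviderStats {
  medianMs: number;
  p95Ms: number;
  p99Ms: number;
  successRate: number; // 0 to 1
}

const CEILING_MS = 10000;
const WEIGHTS = { median: 0.6, p95: 0.25, p99: 0.15 };

// 100 × (1 − value / 10,000 ms), floored at 0 for values at or above the ceiling.
function metricScore(valueMs: number): number {
  return Math.max(0, 100 * (1 - valueMs / CEILING_MS));
}

// Weighted timing score multiplied by the success rate.
function compositeScore(s: ProviderStats): number {
  const timing =
    WEIGHTS.median * metricScore(s.medianMs) +
    WEIGHTS.p95 * metricScore(s.p95Ms) +
    WEIGHTS.p99 * metricScore(s.p99Ms);
  return timing * s.successRate;
}

const round1 = (x: number) => Math.round(x * 10) / 10;

// Declaw row: 0.17s median, 0.19s P95, 0.20s P99, 100% success.
const declaw = { medianMs: 170, p95Ms: 190, p99Ms: 200, successRate: 1 };
console.log(round1(compositeScore(declaw))); // 98.2
```

Because the table shows timings rounded to 10 ms, recomputing a score from the displayed values can differ from the published score by about 0.1.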