Sandbox Provider Leaderboard
Sandbox Benchmarks
A leaderboard of benchmark results for each of our sandbox providers.
Provider Leaderboard
[Charts omitted: Performance Over Time, Composite Score]
Detailed Metrics
| # | Provider | Score | Median | P95 | P99 | Success |
|---|----------|-------|--------|-----|-----|---------|
| 1 | | 90.2 | 0.44s | 0.95s | 3.15s | 100% |
| 2 | | 89.8 | 0.71s | 1.44s | 1.54s | 100% |
| 3 | | 88.1 | 1.06s | 1.38s | 1.42s | 100% |
| 4 | | 88.8 | 1.10s | 1.15s | 1.17s | 100% |
| 5 | | 83.9 | 1.60s | 1.62s | 1.65s | 100% |
| 6 | | 81.6 | 1.76s | 1.94s | 1.98s | 100% |
| 7 | | 79.0 | 2.02s | 2.18s | 2.27s | 100% |
| 8 | | 69.5 | 2.24s | 4.12s | 4.48s | 100% |
| 9 | | 74.0 | 2.27s | 3.02s | 3.20s | 100% |
| 10 | | 75.0 | 2.39s | 2.64s | 2.68s | 100% |
Want to see a provider added? Let us know on X.
Methodology
What We Measure
Every benchmark measures Time to Interactive (TTI) — the elapsed time from calling compute.sandbox.create() to the first successful runCommand() inside the sandbox.
Each provider is tested with 100 iterations per run. Benchmarks run automatically via GitHub Actions on a recurring schedule. All results are committed to the public benchmarks repo.
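The measurement above can be sketched as follows. This is a minimal illustration, not the benchmark's actual harness: the `createSandbox` stub below is hypothetical and stands in for a provider SDK's `compute.sandbox.create()`, simulating a boot delay; only the timing pattern (create → first successful command) reflects the methodology described here.

```typescript
// A sandbox exposes runCommand(), mirroring the runCommand() call in the text.
interface Sandbox {
  runCommand(cmd: string): Promise<{ exitCode: number }>;
}

// Hypothetical stub provider: resolves after a simulated boot delay.
async function createSandbox(bootMs: number): Promise<Sandbox> {
  await new Promise((resolve) => setTimeout(resolve, bootMs));
  return {
    async runCommand(_cmd: string) {
      return { exitCode: 0 };
    },
  };
}

// Time to Interactive: elapsed ms from create() to the first
// successful command inside the sandbox.
async function measureTTI(bootMs: number): Promise<number> {
  const start = performance.now();
  const sandbox = await createSandbox(bootMs);
  const result = await sandbox.runCommand("echo ok");
  if (result.exitCode !== 0) throw new Error("command failed");
  return performance.now() - start;
}

measureTTI(50).then((tti) => console.log(`TTI: ${tti.toFixed(1)}ms`));
```

In the real benchmark, the clock stops only on a *successful* command, so provider-side failures surface as failed iterations rather than artificially low timings.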
- Sequential Test: Sandboxes are launched one at a time, waiting for each to become interactive before starting the next.
- Staggered Test: Sandboxes are launched with 200ms delays between each.
- Burst Test: All sandboxes are launched concurrently in a single burst.
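The three launch patterns can be sketched over a generic async `launch` function (hypothetical; in the real benchmark each launch creates a sandbox and waits for it to become interactive):

```typescript
type Launch = () => Promise<number>; // resolves to a TTI measurement in ms

// Sequential: wait for each sandbox to become interactive before the next.
async function sequential(launch: Launch, n: number): Promise<number[]> {
  const results: number[] = [];
  for (let i = 0; i < n; i++) results.push(await launch());
  return results;
}

// Staggered: start launches a fixed delay apart, but let them overlap.
async function staggered(
  launch: Launch,
  n: number,
  delayMs = 200,
): Promise<number[]> {
  const pending: Promise<number>[] = [];
  for (let i = 0; i < n; i++) {
    pending.push(launch());
    if (i < n - 1) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return Promise.all(pending);
}

// Burst: fire all launches at once and collect every result.
function burst(launch: Launch, n: number): Promise<number[]> {
  return Promise.all(Array.from({ length: n }, () => launch()));
}
```

Sequential isolates per-sandbox latency, staggered approximates a steady stream of requests, and burst stresses the provider's ability to cold-start many sandboxes at once.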
How We Score
The Composite Score is a weighted blend of timing metrics multiplied by the success rate. Each metric is scored against a fixed 10-second ceiling: 100 × (1 − value / 10,000ms), so a 200ms median scores 98 and anything ≥10s scores 0.
The weighted timing score is then multiplied by the success rate (0–1), so providers that fail frequently are penalized proportionally.
- Median: 60% — primary signal for typical experience
- P95: 25% — tail latency / consistency
- P99: 15% — extreme tail latency
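The scoring formula above can be worked through in a few lines. Plugging in the rank-1 row from the table (median 0.44s, P95 0.95s, P99 3.15s, 100% success) yields roughly 90.3, closely matching the listed 90.2 (the small gap is presumably rounding in the published figures):

```typescript
// Each timing metric is scored against the fixed 10-second ceiling:
// 100 * (1 - value / 10,000ms), floored at 0 for values >= 10s.
const CEILING_MS = 10_000;

function metricScore(ms: number): number {
  return Math.max(0, 100 * (1 - ms / CEILING_MS));
}

// Weighted blend (60/25/15) multiplied by the success rate (0-1).
function compositeScore(
  medianMs: number,
  p95Ms: number,
  p99Ms: number,
  successRate: number,
): number {
  const weighted =
    0.6 * metricScore(medianMs) +
    0.25 * metricScore(p95Ms) +
    0.15 * metricScore(p99Ms);
  return weighted * successRate;
}

// Rank-1 row: median 0.44s, P95 0.95s, P99 3.15s, 100% success.
console.log(compositeScore(440, 950, 3150, 1).toFixed(1)); // "90.3"
```

Because the success rate multiplies the whole weighted score, a provider with great latency but a 90% success rate loses 10% of its composite score outright.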
Sandbox Benchmarks FAQs
Have another question? Email us.