01
ai compute
models, agents, and inference at any scale
deploy any model to gpu-backed endpoints. bring your own weights or pull from our model registry. built-in support for vllm, tgi, ollama, and custom serving runtimes. chain models into autonomous agents with tool calling, memory, and human-in-the-loop controls.
// capabilities
- h100, a100, and l40s gpus on-demand or reserved
- auto-scaling inference from zero to thousands of replicas
- managed model registry with versioning and rollback
- agent orchestration with built-in tool calling and memory
- vector storage and rag pipelines included
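a minimal sketch of what an endpoint definition could look like. the class name, field names, and the runtime/gpu lists are illustrative assumptions, not the platform's actual sdk:

```python
from dataclasses import dataclass

# Hypothetical option sets, mirroring the runtimes and gpus named above.
SUPPORTED_RUNTIMES = {"vllm", "tgi", "ollama", "custom"}
SUPPORTED_GPUS = {"h100", "a100", "l40s"}

@dataclass
class EndpointSpec:
    """Illustrative spec for a gpu-backed model endpoint."""
    model: str          # your own weights or a registry reference
    runtime: str        # serving runtime, e.g. "vllm"
    gpu: str            # gpu class to schedule on
    min_replicas: int = 0   # 0 allows scale-to-zero
    max_replicas: int = 1

    def validate(self) -> "EndpointSpec":
        if self.runtime not in SUPPORTED_RUNTIMES:
            raise ValueError(f"unknown runtime: {self.runtime}")
        if self.gpu not in SUPPORTED_GPUS:
            raise ValueError(f"unknown gpu: {self.gpu}")
        if not (0 <= self.min_replicas <= self.max_replicas):
            raise ValueError("need 0 <= min_replicas <= max_replicas")
        return self

# Example: an auto-scaling vllm endpoint that can scale to zero.
spec = EndpointSpec(model="my-org/my-model", runtime="vllm", gpu="h100",
                    min_replicas=0, max_replicas=8).validate()
```

the same shape extends naturally to agent definitions, with tools and memory configured alongside the model reference.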
02
adaptive compute
serverless and dedicated, per-service
choose the right execution model for each service. use serverless for bursty, event-driven workloads that scale to zero. switch to dedicated instances for sustained throughput and predictable latency. mix both in the same project.
// capabilities
- serverless functions with sub-100ms cold starts
- dedicated containers with reserved cpu and memory
- switch between modes without redeploying code
- canary releases and traffic splitting built-in
- per-service autoscaling policies
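the serverless-vs-dedicated decision can be captured as a simple heuristic. the function below is a sketch under assumed thresholds (1,000 requests/hour, 5x peak-to-average) chosen for illustration, not platform defaults:

```python
def pick_execution_mode(requests_per_hour: float, peak_to_avg_ratio: float) -> str:
    """Heuristic sketch: bursty or low-volume traffic scales to zero
    cheaply on serverless; sustained, steady load gets predictable
    latency on a dedicated instance. Thresholds are illustrative."""
    if requests_per_hour < 1_000 or peak_to_avg_ratio > 5:
        return "serverless"
    return "dedicated"

# A nightly batch trigger is bursty -> serverless; a steady api is not.
mode_batch = pick_execution_mode(requests_per_hour=100, peak_to_avg_ratio=1.2)
mode_api = pick_execution_mode(requests_per_hour=50_000, peak_to_avg_ratio=1.5)
```

because mode is a per-service setting rather than a code concern, flipping it is a config change, which is what makes mixing both in one project practical.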
03
data layer
managed storage for every workload
serverless postgres, redis, and s3-compatible object storage. provision instantly, branch for dev, replicate for prod. automatic backups, encryption, and point-in-time recovery without managing a single server.
// capabilities
- serverless postgres with instant branching
- redis for caching, queues, and real-time state
- s3-compatible object store for files and embeddings
- read replicas in 40+ regions
- point-in-time recovery up to 30 days