// capabilities

  • h100, a100, and l40s gpus, on-demand or reserved
  • auto-scaling inference from zero to thousands of replicas
  • managed model registry with versioning and rollback
  • agent orchestration with built-in tool calling and memory
  • vector storage and rag pipelines included

// capabilities

  • serverless functions with sub-100ms cold starts
  • dedicated containers with reserved cpu and memory
  • switch between modes without redeploying code
  • canary releases and traffic splitting built in
  • per-service autoscaling policies

// capabilities

  • serverless postgres with instant branching
  • redis for caching, queues, and real-time state
  • s3-compatible object store for files and embeddings
  • read replicas in 40+ regions
  • point-in-time recovery up to 30 days