01
ai compute
models, agents, and inference at any scale
deploy any model to gpu-backed endpoints. bring your own weights or pull from our model registry. built-in support for vllm, tgi, ollama, and custom serving runtimes. chain models into autonomous agents with tool calling, memory, and human-in-the-loop controls.
// capabilities
- h100, a100, and l40s gpus on-demand or reserved
- auto-scaling inference from zero to thousands of replicas
- managed model registry with versioning and rollback
- agent orchestration with built-in tool calling and memory
- vector storage and rag pipelines included
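a minimal sketch of what an endpoint definition could look like. the class name, field names, and the runtime/gpu lists are illustrative assumptions, not the platform's actual sdk:

```python
from dataclasses import dataclass

# Hypothetical option sets, mirroring the runtimes and gpus named above.
SUPPORTED_RUNTIMES = {"vllm", "tgi", "ollama", "custom"}
SUPPORTED_GPUS = {"h100", "a100", "l40s"}

@dataclass
class EndpointSpec:
    """Illustrative spec for a gpu-backed model endpoint."""
    model: str          # your own weights or a registry reference
    runtime: str        # serving runtime, e.g. "vllm"
    gpu: str            # gpu class to schedule on
    min_replicas: int = 0   # 0 allows scale-to-zero
    max_replicas: int = 1

    def validate(self) -> "EndpointSpec":
        if self.runtime not in SUPPORTED_RUNTIMES:
            raise ValueError(f"unknown runtime: {self.runtime}")
        if self.gpu not in SUPPORTED_GPUS:
            raise ValueError(f"unknown gpu: {self.gpu}")
        if not (0 <= self.min_replicas <= self.max_replicas):
            raise ValueError("need 0 <= min_replicas <= max_replicas")
        return self

# Example: an auto-scaling vllm endpoint that can scale to zero.
spec = EndpointSpec(model="my-org/my-model", runtime="vllm", gpu="h100",
                    min_replicas=0, max_replicas=8).validate()
```

the same shape extends naturally to agent definitions, with tools and memory configured alongside the model reference.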
02
adaptive compute
serverless and dedicated, per-service
choose the right execution model for each service. use serverless for bursty, event-driven workloads that scale to zero. switch to dedicated instances for sustained throughput and predictable latency. mix both in the same project.
// capabilities
- serverless functions with sub-100ms cold starts
- dedicated containers with reserved cpu and memory
- switch between modes without redeploying code
- canary releases and traffic splitting built-in
- per-service autoscaling policies
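the serverless-vs-dedicated decision can be captured as a simple heuristic. the function below is a sketch under assumed thresholds (1,000 requests/hour, 5x peak-to-average) chosen for illustration, not platform defaults:

```python
def pick_execution_mode(requests_per_hour: float, peak_to_avg_ratio: float) -> str:
    """Heuristic sketch: bursty or low-volume traffic scales to zero
    cheaply on serverless; sustained, steady load gets predictable
    latency on a dedicated instance. Thresholds are illustrative."""
    if requests_per_hour < 1_000 or peak_to_avg_ratio > 5:
        return "serverless"
    return "dedicated"

# A nightly batch trigger is bursty -> serverless; a steady api is not.
mode_batch = pick_execution_mode(requests_per_hour=100, peak_to_avg_ratio=1.2)
mode_api = pick_execution_mode(requests_per_hour=50_000, peak_to_avg_ratio=1.5)
```

because mode is a per-service setting rather than a code concern, flipping it is a config change, which is what makes mixing both in one project practical.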
03
data layer
managed storage for every workload
serverless postgres, redis, and s3-compatible object storage. provision instantly, branch for dev, replicate for prod. automatic backups, encryption, and point-in-time recovery without managing a single server.
// capabilities
- serverless postgres with instant branching
- redis for caching, queues, and real-time state
- s3-compatible object store for files and embeddings
- read replicas in 40+ regions
- point-in-time recovery up to 30 days