pricing

pay for what you use

start free with gpu access. scale with serverless, dedicated, or both.

free

$0 forever

experiment with models and ship side projects.

  • 50 gpu-hours / month
  • 500K serverless invocations
  • 5 GB storage
  • community support
  • 3 services max
  • shared inference endpoints

pro

$79/mo

for teams shipping ai products to production.

  • 500 gpu-hours / month
  • unlimited invocations
  • 100 GB storage
  • priority support
  • unlimited services
  • dedicated or serverless compute
  • custom domains
  • team seats included

enterprise

custom/org

reserved capacity, compliance, and dedicated support.

  • everything in pro
  • reserved gpu capacity (h100, a100)
  • 99.99% uptime sla
  • dedicated solutions architect
  • sso / saml / scim
  • vpc peering and private networking
  • on-premise and hybrid options
  • custom model hosting agreements

faq

frequently asked questions

what is a gpu-hour?

one hour of compute on a single gpu. if you run inference on an h100 for 30 minutes, that is 0.5 gpu-hours.
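the arithmetic generalizes to multiple gpus: a minimal sketch (the `gpu_hours` helper is illustrative, not part of the platform's api):

```python
def gpu_hours(num_gpus: int, hours: float) -> float:
    """gpu-hours = number of gpus x wall-clock hours of compute."""
    return num_gpus * hours

# one h100 for 30 minutes -> 0.5 gpu-hours
print(gpu_hours(1, 0.5))  # 0.5

# four gpus for 2 hours -> 8 gpu-hours
print(gpu_hours(4, 2))  # 8
```

so a month of the free tier (50 gpu-hours) covers, for example, 100 half-hour single-gpu inference runs.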

can i mix serverless and dedicated?

yes. you can set the compute mode per-service. run your api serverless and your model inference on dedicated gpus in the same project.
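a hypothetical project layout showing the idea (service names, keys, and values are illustrative, not the platform's actual config schema):

```python
# hypothetical per-service compute config; the schema here is
# illustrative only, not the platform's real configuration format.
project = {
    "name": "my-ai-app",
    "services": [
        {"name": "api", "compute": "serverless"},
        {"name": "inference", "compute": "dedicated", "gpu": "a100"},
    ],
}

# each service in the same project picks its own compute mode
modes = {s["name"]: s["compute"] for s in project["services"]}
print(modes)  # {'api': 'serverless', 'inference': 'dedicated'}
```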

do you offer startup credits?

qualifying startups can receive up to $100K in credits. contact us for details.

what happens if i exceed my free tier?

we will notify you before any overage. you can upgrade to pro or set hard spending limits. we never charge without your consent.