GPU compute rental

Rent production GPU capacity. Deploy models through one API.

StellarComputerX packages approved GPU supply into shared, reserved, and dedicated compute lanes. Teams can lease capacity, run open-model templates, and expose them through the same OpenAI-compatible API and Credit settlement layer.

4 lanes: Shared, reserved, dedicated, and managed model deployment
60+ models: Production-hosted routes can run on approved nodes
1 API: Stable endpoint across model and compute changes
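Because the endpoint is OpenAI-compatible, moving a workload between lanes only changes the model alias, not the client code. A minimal sketch of building such a request; the base URL, API key, and alias here are hypothetical placeholders, not documented SCX values:

```python
import json

# Hypothetical base URL for illustration only; the real value
# comes from the SCX console.
BASE_URL = "https://api.example-scx.invalid/v1"

def chat_request(model_alias: str, prompt: str, api_key: str):
    """Build an OpenAI-compatible chat completion request.

    The URL and payload shape stay fixed; only `model_alias`
    changes when the underlying GPU lane changes.
    """
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model_alias,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, headers, body

url, headers, body = chat_request("my-llm-route", "Hello", "sk-demo")
```

Any OpenAI-compatible client library would produce the same wire format; the sketch only makes the stable request shape explicit.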

Capacity configurator

Configure GPU capacity as you would in a cloud console, then expose it as an API lane.

Pick billing posture, region, instance, runtime, storage, and network controls. The quote stays attached to the model-serving path instead of becoming a disconnected infrastructure workflow.

01. Billing and region

02. Instance specification

Select a production-ready GPU shape. Prices are indicative Compute Credit rates, intended for capacity planning.

| Instance | GPU | vCPU | Memory | Network | Best for | Status | Credit/hr |
|---|---|---|---|---|---|---|---|
| scx.h100.8xlarge | 8 x H100 80GB | 96 | 960 GiB | 400 / 800 Gbps | Frontier inference, distributed fine-tuning | Consult | 58.40 |
| scx.h100.4xlarge | 4 x H100 80GB | 64 | 640 GiB | 400 Gbps | Large reasoning models, batch inference | Consult | 31.20 |
| scx.h100.1xlarge | 1 x H100 80GB | 32 | 256 GiB | 200 / 400 Gbps | Reasoning, high-throughput inference | Consult | 8.80 |
| scx.a100.4xlarge | 4 x A100 80GB | 64 | 512 GiB | 200 Gbps | Fine-tuning, high-concurrency serving | Reservable | 12.60 |
| scx.a100.2xlarge | 2 x A100 80GB | 48 | 384 GiB | 100 / 200 Gbps | Fine-tuning, embeddings, LLM serving | Reservable | 6.40 |
| scx.a100.1xlarge | 1 x A100 80GB | 24 | 192 GiB | 100 Gbps | LLM serving, LoRA jobs, embeddings | Reservable | 3.40 |
| scx.l40s.4xlarge | 4 x L40S 48GB | 48 | 384 GiB | 100 Gbps | Multimodal inference, image/video jobs | Ready | 9.70 |
| scx.l40s.2xlarge | 2 x L40S 48GB | 32 | 256 GiB | 100 Gbps | Vision-language models, vLLM pools | Ready | 5.20 |
| scx.l40s.1xlarge | 1 x L40S 48GB | 16 | 128 GiB | 50 / 100 Gbps | Multimodal inference, vision, vLLM | Ready | 2.80 |
| scx.l20.4xlarge | 4 x L20 48GB | 48 | 384 GiB | 100 Gbps | Cost-efficient 70B serving pools | Ready | 6.80 |
| scx.l20.2xlarge | 2 x L20 48GB | 32 | 192 GiB | 50 / 100 Gbps | Private assistants, model gateways | Ready | 3.70 |
| scx.l20.1xlarge | 1 x L20 48GB | 16 | 96 GiB | 25 / 50 Gbps | Cost-efficient model serving | Ready | 1.95 |
| scx.a10.2xlarge | 2 x A10 24GB | 16 | 128 GiB | 25 Gbps | Small model serving, rerank, dev pools | Ready | 1.72 |
| scx.a10.large | 1 x A10 24GB | 8 | 64 GiB | 10 / 25 Gbps | Embedding, rerank, dev workloads | Ready | 0.92 |
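With the shape list above, lane selection can be scripted before requesting capacity. An illustrative sketch: the instance data is copied from the table, but the selection rule (cheapest immediately-deployable shape with enough total GPU memory) is our own heuristic, not an SCX API:

```python
# (instance, total GPU memory in GiB, status, credits/hr),
# a subset of the shapes from the table above.
SHAPES = [
    ("scx.a100.1xlarge", 80,  "Reservable", 3.40),
    ("scx.l40s.4xlarge", 192, "Ready",      9.70),
    ("scx.l40s.2xlarge", 96,  "Ready",      5.20),
    ("scx.l20.4xlarge",  192, "Ready",      6.80),
    ("scx.l20.2xlarge",  96,  "Ready",      3.70),
    ("scx.a10.2xlarge",  48,  "Ready",      1.72),
]

def cheapest_ready(min_gpu_gib: float):
    """Cheapest 'Ready' shape with at least min_gpu_gib of GPU memory."""
    fits = [s for s in SHAPES if s[2] == "Ready" and s[1] >= min_gpu_gib]
    return min(fits, key=lambda s: s[3]) if fits else None

# Example: a quantized ~70B model needing roughly 90 GiB of
# weights plus cache headroom lands on scx.l20.2xlarge.
pick = cheapest_ready(90)
```

Reservable and Consult shapes are excluded here because they require a reservation or a conversation before deployment; a real planner would treat them as additional tiers rather than dropping them.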
03. Runtime template

Base image

04. Network and security

Bandwidth cap

05. Review

Instances: 1
Estimated hourly: 4.02 Compute Credits / hr
Estimated monthly: 2,894.40 Compute Credits / 720 h
Attach to API route · Request this capacity
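The monthly figure in the quote is simply the hourly rate projected across a 720-hour month. A quick check of that arithmetic (the 720-hour month is taken from the quote itself; the helper function is ours):

```python
def monthly_estimate(hourly_credits: float, instances: int = 1,
                     hours: int = 720) -> float:
    """Project monthly Compute Credits from an hourly rate.

    Uses the configurator's 720-hour month convention.
    """
    return round(hourly_credits * instances * hours, 2)

# Matches the quote above: 4.02 credits/hr -> 2,894.40 credits/720 h.
monthly = monthly_estimate(4.02)
```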

Product advantages

Built for AI model operators: performance, elasticity, network posture, and billing all sit behind a unified model-access surface.

01. Vetted GPU supply

Approved providers, node health checks, and deployment-ready capacity instead of anonymous marketplace listings.

02. Elastic reservation

Start with shared inference, reserve throughput when traffic stabilizes, then move to dedicated pools when boundaries matter.

03. Model-serving templates

Prebuilt runtime patterns for vLLM-style serving, embeddings, rerankers, multimodal routes, and batch jobs.

04. Credit settlement

On-demand usage, reserved capacity, and dedicated deployments reconcile into one Compute Credit ledger.

Rental modes

Choose capacity by operating need, not only by GPU SKU. Every mode can feed the unified API layer when the workload becomes a customer-facing model route.

On-demand GPU (hourly / daily)

Launch experiments, evaluation jobs, or burst inference without long commitments.

Reserved inference pool (monthly reserve)

Keep warm throughput for production model aliases with predictable latency and budget posture.

Dedicated node group (private pool)

Pin workloads to isolated nodes for enterprise traffic, compliance, regional posture, or sustained utilization.

Managed open-model deployment (API-ready)

Ask SCX to deploy mature open models on selected GPU supply and expose a stable route.
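The four modes above can be read as a simple decision rule: managed when SCX should run the model, dedicated when isolation matters, reserved once traffic is steady, on-demand otherwise. An illustrative sketch; this ordering is our reading of the mode descriptions, not an SCX matching algorithm:

```python
def pick_mode(steady_traffic: bool, needs_isolation: bool,
              want_managed: bool) -> str:
    """Map operating needs to a rental mode, per the lanes above."""
    if want_managed:
        return "managed open-model deployment"
    if needs_isolation:
        return "dedicated node group"
    if steady_traffic:
        return "reserved inference pool"
    return "on-demand GPU"
```

In practice a workload often walks this ladder over time, starting on-demand and graduating to reserved or dedicated as traffic and compliance requirements firm up.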

Use cases

Map each workload to the right compute lane before reserving capacity.

LLM production inference

Serve chat, reasoning, and agent routes with warm pools, autoscaling posture, and route-level observability.

Private model deployment

Run approved open models or customer-tuned models on isolated capacity while keeping the public API contract stable.

Fine-tuning and evaluation

Schedule training, LoRA jobs, benchmark batches, and regression suites on GPU families matched to memory needs.

Multimodal and media jobs

Support vision, OCR, image generation, audio transcription, and rendering workloads from the same capacity layer.

Embedding and retrieval

Deploy embedding and reranking services close to application traffic with predictable cost and throughput.

Enterprise reserved capacity

Lock budget, region, security posture, and invoice flow before scaling high-value production traffic.

Activation flow

A clear path from workload demand to live endpoint, with deployment and settlement connected from day one.

01. Submit workload profile

Model family, modality, context window, traffic estimate, region, latency target, and compliance needs.

02. Match GPU lane

SCX recommends on-demand, reserved, dedicated, or managed deployment capacity with a clear cost posture.

03. Deploy runtime

Bring your own container or use platform templates for inference, embeddings, rerankers, and batch workers.

04. Route and settle

Expose the workload through API aliases, monitor health, and reconcile usage through Compute Credit.
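The workload profile submitted in step 01 can be captured as a small structured record. A sketch of one possible shape; the field names come from the step description, but the types and example values are assumptions:

```python
from dataclasses import dataclass, asdict, field

@dataclass
class WorkloadProfile:
    """Step 01 fields; types and defaults are illustrative assumptions."""
    model_family: str            # e.g. an open LLM family name
    modality: str                # "text", "vision", "audio", ...
    context_window: int          # tokens
    traffic_estimate_rps: float  # requests per second at peak
    region: str
    latency_target_ms: int
    compliance_needs: list = field(default_factory=list)

profile = WorkloadProfile(
    model_family="example-70b",       # hypothetical model name
    modality="text",
    context_window=32768,
    traffic_estimate_rps=25.0,
    region="eu-west",
    latency_target_ms=500,
    compliance_needs=["data-residency"],
)
```

Serializing this record (e.g. via `asdict`) gives a submission payload that carries every input the lane-matching step in the flow above says it needs.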

StellarComputerX Compute

Need a GPU plan for a production model?

Send the workload profile. SCX can propose GPU family, serving mode, reservation shape, and API exposure path.