GPU compute rental

Rent production GPU capacity. Deploy models through one API.

StellarComputerX packages approved GPU supply into shared, reserved, and dedicated compute lanes. Teams can lease capacity, run open-model templates, and expose them through the same OpenAI-compatible API and Credit settlement layer.

4 lanes: Shared, reserved, dedicated, and managed model deployment
60+ models: Production-hosted routes can run on approved nodes
1 API: Stable endpoint across model and compute changes
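Because the endpoint is OpenAI-compatible, moving a workload between lanes only changes the model alias, not the client code. A minimal sketch of building such a request; the base URL, API key, and alias here are hypothetical placeholders, not documented SCX values:

```python
import json

# Hypothetical base URL for illustration only; the real value
# comes from the SCX console.
BASE_URL = "https://api.example-scx.invalid/v1"

def chat_request(model_alias: str, prompt: str, api_key: str):
    """Build an OpenAI-compatible chat completion request.

    The URL and payload shape stay fixed; only `model_alias`
    changes when the underlying GPU lane changes.
    """
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model_alias,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, headers, body

url, headers, body = chat_request("my-llm-route", "Hello", "sk-demo")
```

Any OpenAI-compatible client library would produce the same wire format; the sketch only makes the stable request shape explicit.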

Capacity configurator

Configure GPU capacity as you would in a cloud console, then expose it as an API lane.

Pick billing posture, region, instance, runtime, storage, and network controls. The quote stays attached to the model-serving path instead of becoming a disconnected infrastructure workflow.

01. Billing and region

02. Instance specification

Select a production-ready GPU shape. Prices are indicative Compute Credit rates, intended for capacity planning.

| Instance | GPU | vCPU | Memory | Network | Best for | Status | Credit/hr |
|---|---|---|---|---|---|---|---|
| scx.h100.8xlarge | 8 x H100 80GB | 96 | 960 GiB | 400 / 800 Gbps | Frontier inference, distributed fine-tuning | Consult | 58.40 |
| scx.h100.4xlarge | 4 x H100 80GB | 64 | 640 GiB | 400 Gbps | Large reasoning models, batch inference | Consult | 31.20 |
| scx.h100.1xlarge | 1 x H100 80GB | 32 | 256 GiB | 200 / 400 Gbps | Reasoning, high-throughput inference | Consult | 8.80 |
| scx.a100.4xlarge | 4 x A100 80GB | 64 | 512 GiB | 200 Gbps | Fine-tuning, high-concurrency serving | Reservable | 12.60 |
| scx.a100.2xlarge | 2 x A100 80GB | 48 | 384 GiB | 100 / 200 Gbps | Fine-tuning, embeddings, LLM serving | Reservable | 6.40 |
| scx.a100.1xlarge | 1 x A100 80GB | 24 | 192 GiB | 100 Gbps | LLM serving, LoRA jobs, embeddings | Reservable | 3.40 |
| scx.l40s.4xlarge | 4 x L40S 48GB | 48 | 384 GiB | 100 Gbps | Multimodal inference, image/video jobs | Ready | 9.70 |
| scx.l40s.2xlarge | 2 x L40S 48GB | 32 | 256 GiB | 100 Gbps | Vision-language models, vLLM pools | Ready | 5.20 |
| scx.l40s.1xlarge | 1 x L40S 48GB | 16 | 128 GiB | 50 / 100 Gbps | Multimodal inference, vision, vLLM | Ready | 2.80 |
| scx.l20.4xlarge | 4 x L20 48GB | 48 | 384 GiB | 100 Gbps | Cost-efficient 70B serving pools | Ready | 6.80 |
| scx.l20.2xlarge | 2 x L20 48GB | 32 | 192 GiB | 50 / 100 Gbps | Private assistants, model gateways | Ready | 3.70 |
| scx.l20.1xlarge | 1 x L20 48GB | 16 | 96 GiB | 25 / 50 Gbps | Cost-efficient model serving | Ready | 1.95 |
| scx.a10.2xlarge | 2 x A10 24GB | 16 | 128 GiB | 25 Gbps | Small model serving, rerank, dev pools | Ready | 1.72 |
| scx.a10.large | 1 x A10 24GB | 8 | 64 GiB | 10 / 25 Gbps | Embedding, rerank, dev workloads | Ready | 0.92 |
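With the shape list above, lane selection can be scripted before requesting capacity. An illustrative sketch: the instance data is copied from the table, but the selection rule (cheapest immediately-deployable shape with enough total GPU memory) is our own heuristic, not an SCX API:

```python
# (instance, total GPU memory in GiB, status, credits/hr),
# a subset of the shapes from the table above.
SHAPES = [
    ("scx.a100.1xlarge", 80,  "Reservable", 3.40),
    ("scx.l40s.4xlarge", 192, "Ready",      9.70),
    ("scx.l40s.2xlarge", 96,  "Ready",      5.20),
    ("scx.l20.4xlarge",  192, "Ready",      6.80),
    ("scx.l20.2xlarge",  96,  "Ready",      3.70),
    ("scx.a10.2xlarge",  48,  "Ready",      1.72),
]

def cheapest_ready(min_gpu_gib: float):
    """Cheapest 'Ready' shape with at least min_gpu_gib of GPU memory."""
    fits = [s for s in SHAPES if s[2] == "Ready" and s[1] >= min_gpu_gib]
    return min(fits, key=lambda s: s[3]) if fits else None

# Example: a quantized ~70B model needing roughly 90 GiB of
# weights plus cache headroom lands on scx.l20.2xlarge.
pick = cheapest_ready(90)
```

Reservable and Consult shapes are excluded here because they require a reservation or a conversation before deployment; a real planner would treat them as additional tiers rather than dropping them.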
03. Runtime template

Base image

04. Network and security

Bandwidth cap

05. Review

Instances: 1
Estimated hourly: 4.02 Compute Credits / hr
Estimated monthly: 2,894.40 Compute Credits / 720 h
Attach to API route · Request this capacity
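The monthly figure in the quote is simply the hourly rate projected across a 720-hour month. A quick check of that arithmetic (the 720-hour month is taken from the quote itself; the helper function is ours):

```python
def monthly_estimate(hourly_credits: float, instances: int = 1,
                     hours: int = 720) -> float:
    """Project monthly Compute Credits from an hourly rate.

    Uses the configurator's 720-hour month convention.
    """
    return round(hourly_credits * instances * hours, 2)

# Matches the quote above: 4.02 credits/hr -> 2,894.40 credits/720 h.
monthly = monthly_estimate(4.02)
```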

Product advantages

Built for AI model operators: performance, elasticity, network posture, and billing all sit behind a unified model-access surface.

01. Vetted GPU supply

Approved providers, node health checks, and deployment-ready capacity instead of anonymous marketplace listings.

02. Elastic reservation

Start with shared inference, reserve throughput when traffic stabilizes, then move to dedicated pools when boundaries matter.

03. Model-serving templates

Prebuilt runtime patterns for vLLM-style serving, embeddings, rerankers, multimodal routes, and batch jobs.

04. Credit settlement

On-demand usage, reserved capacity, and dedicated deployments reconcile into one Compute Credit ledger.

Rental modes

Choose capacity by operating need, not only by GPU SKU. Every mode can feed the unified API layer when the workload becomes a customer-facing model route.

On-demand GPU (hourly / daily)

Launch experiments, evaluation jobs, or burst inference without long commitments.

Reserved inference pool (monthly reserve)

Keep warm throughput for production model aliases with predictable latency and budget posture.

Dedicated node group (private pool)

Pin workloads to isolated nodes for enterprise traffic, compliance, regional posture, or sustained utilization.

Managed open-model deployment (API-ready)

Ask SCX to deploy mature open models on selected GPU supply and expose a stable route.
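The four modes above can be read as a simple decision rule: managed when SCX should run the model, dedicated when isolation matters, reserved once traffic is steady, on-demand otherwise. An illustrative sketch; this ordering is our reading of the mode descriptions, not an SCX matching algorithm:

```python
def pick_mode(steady_traffic: bool, needs_isolation: bool,
              want_managed: bool) -> str:
    """Map operating needs to a rental mode, per the lanes above."""
    if want_managed:
        return "managed open-model deployment"
    if needs_isolation:
        return "dedicated node group"
    if steady_traffic:
        return "reserved inference pool"
    return "on-demand GPU"
```

In practice a workload often walks this ladder over time, starting on-demand and graduating to reserved or dedicated as traffic and compliance requirements firm up.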

Use cases

Map each workload to the right compute lane before reserving capacity.

LLM production inference

Serve chat, reasoning, and agent routes with warm pools, autoscaling posture, and route-level observability.

Private model deployment

Run approved open models or customer-tuned models on isolated capacity while keeping the public API contract stable.

Fine-tuning and evaluation

Schedule training, LoRA jobs, benchmark batches, and regression suites on GPU families matched to memory needs.

Multimodal and media jobs

Support vision, OCR, image generation, audio transcription, and rendering workloads from the same capacity layer.

Embedding and retrieval

Deploy embedding and reranking services close to application traffic with predictable cost and throughput.

Enterprise reserved capacity

Lock budget, region, security posture, and invoice flow before scaling high-value production traffic.

Activation flow

A clear path from workload demand to live endpoint, with deployment and settlement connected from day one.

01. Submit workload profile

Model family, modality, context window, traffic estimate, region, latency target, and compliance needs.

02. Match GPU lane

SCX recommends on-demand, reserved, dedicated, or managed deployment capacity with a clear cost posture.

03. Deploy runtime

Bring your own container or use platform templates for inference, embeddings, rerankers, and batch workers.

04. Route and settle

Expose the workload through API aliases, monitor health, and reconcile usage through Compute Credit.
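The workload profile submitted in step 01 can be captured as a small structured record. A sketch of one possible shape; the field names come from the step description, but the types and example values are assumptions:

```python
from dataclasses import dataclass, asdict, field

@dataclass
class WorkloadProfile:
    """Step 01 fields; types and defaults are illustrative assumptions."""
    model_family: str            # e.g. an open LLM family name
    modality: str                # "text", "vision", "audio", ...
    context_window: int          # tokens
    traffic_estimate_rps: float  # requests per second at peak
    region: str
    latency_target_ms: int
    compliance_needs: list = field(default_factory=list)

profile = WorkloadProfile(
    model_family="example-70b",       # hypothetical model name
    modality="text",
    context_window=32768,
    traffic_estimate_rps=25.0,
    region="eu-west",
    latency_target_ms=500,
    compliance_needs=["data-residency"],
)
```

Serializing this record (e.g. via `asdict`) gives a submission payload that carries every input the lane-matching step in the flow above says it needs.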

StellarComputerX Compute

Need a GPU plan for a production model?

Send the workload profile. SCX can propose GPU family, serving mode, reservation shape, and API exposure path.