GPU compute rental
Rent production GPU capacity. Deploy models through one API.
StellarComputerX packages approved GPU supply into shared, reserved, and dedicated compute lanes. Teams lease capacity, run open-model templates, and expose the resulting routes through the same OpenAI-compatible API and Compute Credit settlement layer.
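To make that contract concrete, here is a minimal sketch of calling a rented lane through the OpenAI-compatible surface. The base URL, API key, and model alias are hypothetical placeholders for illustration, not documented SCX values.

```python
# Minimal sketch: calling an SCX-hosted model through the OpenAI-compatible API.
# The base URL, key, and model alias below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scx.example/v1",  # hypothetical gateway URL
    api_key="scx-credit-key",               # usage settles against Compute Credit
)

resp = client.chat.completions.create(
    model="team-llm-prod",  # hypothetical alias backed by a rented GPU lane
    messages=[{"role": "user", "content": "Summarize today's deploy log."}],
)
print(resp.choices[0].message.content)
```

Because the surface is OpenAI-compatible, existing client code should only need the base URL and alias swapped.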
Capacity configurator
Configure GPU capacity like a cloud console, then expose it as an API lane.
Pick billing posture, region, instance, runtime, storage, and network controls. The quote stays attached to the model-serving path instead of becoming a disconnected infrastructure workflow.
Configurator steps: Base (billing and region) → Instance → Runtime → Network → Review.
Instance specification
Select a production-ready GPU shape. Prices are indicative Compute Credits per hour for capacity planning; a quick planning calculation follows the table.
| Instance | GPU | vCPU | Memory | Network | Best for | Status | Credits/hr |
|---|---|---|---|---|---|---|---|
| scx.h100.8xlarge | 8 x H100 80GB | 96 | 960 GiB | 400 / 800 Gbps | Frontier inference, distributed fine-tuning | Consult | 58.40 |
| scx.h100.4xlarge | 4 x H100 80GB | 64 | 640 GiB | 400 Gbps | Large reasoning models, batch inference | Consult | 31.20 |
| scx.h100.1xlarge | 1 x H100 80GB | 32 | 256 GiB | 200 / 400 Gbps | Reasoning, high-throughput inference | Consult | 8.80 |
| scx.a100.4xlarge | 4 x A100 80GB | 64 | 512 GiB | 200 Gbps | Fine-tuning, high-concurrency serving | Reservable | 12.60 |
| scx.a100.2xlarge | 2 x A100 80GB | 48 | 384 GiB | 100 / 200 Gbps | Fine-tuning, embeddings, LLM serving | Reservable | 6.40 |
| scx.a100.1xlarge | 1 x A100 80GB | 24 | 192 GiB | 100 Gbps | LLM serving, LoRA jobs, embeddings | Reservable | 3.40 |
| scx.l40s.4xlarge | 4 x L40S 48GB | 48 | 384 GiB | 100 Gbps | Multimodal inference, image/video jobs | Ready | 9.70 |
| scx.l40s.2xlarge | 2 x L40S 48GB | 32 | 256 GiB | 100 Gbps | Vision-language models, vLLM pools | Ready | 5.20 |
| scx.l40s.1xlarge | 1 x L40S 48GB | 16 | 128 GiB | 50 / 100 Gbps | Multimodal inference, vision, vLLM | Ready | 2.80 |
| scx.l20.4xlarge | 4 x L20 48GB | 48 | 384 GiB | 100 Gbps | Cost-efficient 70B serving pools | Ready | 6.80 |
| scx.l20.2xlarge | 2 x L20 48GB | 32 | 192 GiB | 50 / 100 Gbps | Private assistants, model gateways | Ready | 3.70 |
| scx.l20.1xlarge | 1 x L20 48GB | 16 | 96 GiB | 25 / 50 Gbps | Cost-efficient model serving | Ready | 1.95 |
| scx.a10.2xlarge | 2 x A10 24GB | 16 | 128 GiB | 25 Gbps | Small model serving, rerank, dev pools | Ready | 1.72 |
| scx.a10.large | 1 x A10 24GB | 8 | 64 GiB | 10 / 25 Gbps | Embedding, rerank, dev workloads | Ready | 0.92 |
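Since the rates above are indicative Credits per hour, a monthly estimate is simple arithmetic. The sketch below uses two rates from the table; the ~730-hour month and full utilization are planning assumptions.

```python
# Back-of-envelope capacity planning from the indicative table rates above.
# Full utilization and a ~730-hour month are assumptions, not billing rules.
RATES = {"scx.a100.1xlarge": 3.40, "scx.l40s.2xlarge": 5.20}  # Credits/hr

def monthly_credits(instance: str, nodes: int = 1, hours: float = 730.0) -> float:
    """Estimate Compute Credits for `nodes` instances over roughly one month."""
    return RATES[instance] * nodes * hours

# A warm 2-node A100 pool: 3.40 * 2 * 730 = 4964 credits/month (indicative).
print(monthly_credits("scx.a100.1xlarge", nodes=2))
```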
The remaining steps set the runtime template and base image, then the network and security controls, including the bandwidth cap. A hypothetical quote request covering these fields is sketched below.
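The following sketch shows how a quote could stay attached to the serving path: one request carries the configurator fields, and the response pairs a price with a route alias. The endpoint, payload shape, and response shape are illustrative assumptions, not a documented SCX API.

```python
# Hypothetical quote request mirroring the configurator steps above
# (Base -> Instance -> Runtime -> Network). Endpoint and field names are
# illustrative assumptions, not a documented SCX API.
import requests

quote = {
    "billing": "reserved",                  # Base: billing posture
    "region": "eu-central",                 # Base: region
    "instance": "scx.a100.2xlarge",         # Instance: GPU shape from the table
    "runtime": {"template": "vllm-serve", "base_image": "scx/vllm:latest"},
    "network": {"security": "private-vpc", "bandwidth_cap_gbps": 100},
}

resp = requests.post("https://api.scx.example/v1/capacity/quotes", json=quote)
resp.raise_for_status()
# A quote attached to the serving path would return the route alias alongside
# the price, e.g. {"credits_per_hour": 6.40, "route": "team-llm-prod"}.
print(resp.json())
```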
Product advantages
Built for AI model operators: performance, elasticity, network posture, and billing all sit behind a unified model-access surface.
Vetted GPU supply
Approved providers, node health checks, and deployment-ready capacity instead of anonymous marketplace listings.
Elastic reservation
Start with shared inference, reserve throughput once traffic stabilizes, then move to dedicated pools when isolation boundaries matter.
Model-serving templates
Prebuilt runtime patterns for vLLM-style serving, embeddings, rerankers, multimodal routes, and batch jobs.
Credit settlement
On-demand usage, reserved capacity, and dedicated deployments reconcile into one Compute Credit ledger.
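As a sketch of that reconciliation, assuming illustrative usage-record shapes, every mode reduces to hours times rate in the same ledger:

```python
# Sketch of settlement: usage from every rental mode reconciles into one
# Compute Credit ledger. Record shapes here are illustrative assumptions.
from collections import defaultdict

usage = [
    {"mode": "on_demand", "instance": "scx.a10.large",     "hours": 12,  "rate": 0.92},
    {"mode": "reserved",  "instance": "scx.a100.2xlarge",  "hours": 730, "rate": 6.40},
    {"mode": "dedicated", "instance": "scx.h100.4xlarge",  "hours": 730, "rate": 31.20},
]

ledger: dict[str, float] = defaultdict(float)
for rec in usage:
    ledger[rec["mode"]] += rec["hours"] * rec["rate"]

total = sum(ledger.values())  # one figure to invoice, regardless of mode
print(dict(ledger), total)
```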
Rental modes
Choose capacity by operating need, not only by GPU SKU. Every mode can feed the unified API layer when the workload becomes a customer-facing model route.
On-demand GPU
Launch experiments, evaluation jobs, or burst inference without long commitments.
Reserved inference pool
Keep warm throughput for production model aliases with predictable latency and budget posture; promoting a shared route into this mode is sketched after these cards.
Dedicated node group
Pin workloads to isolated nodes for enterprise traffic, compliance, regional posture, or sustained utilization.
Managed open-model deployment
Ask SCX to deploy mature open models on selected GPU supply and expose a stable route.
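Because every mode feeds the same API layer, moving a route between lanes need not change the public alias. The sketch below imagines such a promotion; the endpoint, fields, and pool name are hypothetical.

```python
# Hypothetical sketch: promote a route from shared inference to a reserved
# pool without changing its public alias. Endpoint and fields are assumptions.
import requests

resp = requests.patch(
    "https://api.scx.example/v1/routes/team-llm-prod",
    json={"capacity_mode": "reserved", "pool": "a100-eu-pool-1"},
)
resp.raise_for_status()
# Callers keep using model="team-llm-prod"; only the backing lane changes.
```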
Use cases
Map each workload to the right compute lane before reserving capacity.
LLM production inference
Serve chat, reasoning, and agent routes with warm pools, autoscaling posture, and route-level observability.
Private model deployment
Run approved open models or customer-tuned models on isolated capacity while keeping the public API contract stable.
Fine-tuning and evaluation
Schedule training, LoRA jobs, benchmark batches, and regression suites on GPU families matched to memory needs.
Multimodal and media jobs
Support vision, OCR, image generation, audio transcription, and rendering workloads from the same capacity layer.
Embedding and retrieval
Deploy embedding and reranking services close to application traffic with predictable cost and throughput; a minimal call is sketched after this list.
Enterprise reserved capacity
Lock budget, region, security posture, and invoice flow before scaling high-value production traffic.
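For the embedding use case, the same OpenAI-compatible surface applies. This minimal sketch assumes a hypothetical embedding alias served from rented L40S or A10 capacity.

```python
# Minimal embedding call through the OpenAI-compatible surface, assuming a
# hypothetical embedding alias backed by rented GPU capacity.
from openai import OpenAI

client = OpenAI(base_url="https://api.scx.example/v1", api_key="scx-credit-key")

emb = client.embeddings.create(
    model="team-embed-v1",  # hypothetical alias for an open embedding model
    input=["gpu rental quote", "reserved inference pool"],
)
print(len(emb.data), len(emb.data[0].embedding))  # vectors and their dimension
```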
Activation flow
A clear path from workload demand to live endpoint, with deployment and settlement connected from day one.
01. Submit workload profile
Model family, modality, context window, traffic estimate, region, latency target, and compliance needs (an illustrative profile is sketched after these steps).
02. Match GPU lane
SCX recommends on-demand, reserved, dedicated, or managed deployment capacity with a clear cost posture.
03. Deploy runtime
Bring your own container or use platform templates for inference, embeddings, rerankers, and batch workers.
04. Route and settle
Expose the workload through API aliases, monitor health, and reconcile usage through Compute Credit.
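An illustrative workload profile for step 01 might look like the following; the field names mirror the items listed above but are assumptions, not a documented SCX schema.

```python
# Illustrative workload profile for step 01. Field names are assumptions that
# mirror the profile items above, not a documented SCX schema.
workload_profile = {
    "model_family": "llama-3-70b",
    "modality": "text",
    "context_window": 32_768,
    "traffic_estimate_rps": 40,
    "region": "us-east",
    "latency_target_ms": 800,
    "compliance": ["data-residency", "private-networking"],
}
print(workload_profile)
```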
StellarComputerX Compute
Need a GPU plan for a production model?
Send the workload profile. SCX can propose GPU family, serving mode, reservation shape, and API exposure path.
