GPU costs dominate machine learning budgets. A single H100 can cost $2-4/hour in the cloud—$48-96/day or $1,440-2,880/month. Choose the wrong provider and you’ll hemorrhage money. Choose right and you’ll cut costs by 60%.
This guide compares real 2026 GPU rental prices across all major cloud providers and specialized GPU marketplaces, then shows when renting beats buying.
GPU Rental Market Overview: 2026
The GPU rental market has bifurcated into two tiers:
Tier 1: Enterprise Cloud (AWS, Google Cloud, Azure)
Premium pricing ($0.76-$3.50/hour), enterprise SLAs, 99.99% uptime guarantees, managed infrastructure, global availability.
Tier 2: Specialist Providers (RunPod, Lambda Labs, CoreWeave, Modal)
Discount pricing ($0.40-$2.50/hour), variable uptime (95-99%), community support, limited geographic availability.
Most ML engineers use Tier 2 for R&D and Tier 1 for production. This hybrid approach costs 40% less than pure enterprise.
H100 Pricing Across Providers
The H100 is the gold standard for large language model training. Prices in April 2026:
| Provider | On-Demand | Spot/Discount | 1-Year Reserved |
|---|---|---|---|
| AWS (us-east-1) | $2.88/hr | $0.86/hr (70% off) | $1.97/hr |
| Google Cloud | $3.12/hr | $1.09/hr (65% off) | $2.34/hr |
| Lambda Labs | $3.98/hr | N/A | $3.18/hr |
| RunPod | $2.49/hr | $1.50/hr (40% off) | N/A |
| CoreWeave | $2.70/hr | $1.62/hr (40% off) | $2.16/hr |
For continuous training, AWS spot instances at $0.86/hour are unbeatable on price. However, AWS can reclaim a spot instance with only a two-minute interruption notice, risking hours of lost progress if you don’t checkpoint.
For reliability, RunPod’s $2.49/hour on-demand avoids preemption entirely and still undercuts both AWS on-demand ($2.88) and Lambda Labs ($3.98).
H200 Pricing (The Emerging Leader)
NVIDIA’s H200 pairs H100-class compute with roughly 75% more memory (141GB vs 80GB of HBM), making it better for long-context models and larger batch sizes.
| Provider | On-Demand | Spot/Discount |
|---|---|---|
| Lambda Labs | $3.98/hr | N/A |
| RunPod | $3.19/hr | $1.91/hr |
| CoreWeave | $3.49/hr | $2.09/hr |
| AWS | Not yet available | Not yet available |
H200 will likely dominate by year-end as AWS and Google Cloud add inventory.
A100 Pricing (The Value Play)
The A100, released in 2020, remains competitive for inference and smaller-model training. On-demand it costs less than half the H100’s price while delivering roughly a third of the compute (~20 TFLOPS vs ~55 TFLOPS).
| Provider | On-Demand | Spot/Discount |
|---|---|---|
| AWS | $1.21/hr | $0.36/hr |
| Google Cloud | $1.02/hr | $0.31/hr |
| RunPod | $0.44/hr | $0.26/hr |
| CoreWeave | $0.62/hr | $0.37/hr |
RunPod’s $0.44/hour A100 is shockingly cheap. For inference workloads (serving models), A100 often suffices and costs 1/6th of H100 pricing.
L40S and Consumer GPU Pricing
The L40S is a data-center GPU built for professional graphics and lighter ML workloads rather than large-scale training. Prices:
- AWS: $0.70/hour on-demand, $0.21/hour spot
- RunPod: $0.44/hour on-demand
- CoreWeave: $0.37/hour on-demand
L40S is useful for:
- Rendering (3D graphics inference)
- Image generation (SDXL, Flux inference)
- Small model training (<10B parameters)
- Development and testing
For a startup building a product with Stable Diffusion, L40S at CoreWeave ($0.37/hour) is ideal. It costs $266/month for continuous operation, but most inference workloads need far less.
Consumer GPU Rentals (RTX 4090 and 5090)
Consumer GPUs offer value for specific niches:
- RTX 4090: $0.30-0.50/hour (sufficient for SDXL, Llama 13B inference)
- RTX 5090: $0.80-1.20/hour (emerging, limited availability)
These are rarely worth it for professional ML. An 80GB A100 costs only about twice as much per hour but offers more than 3x the VRAM (80GB vs the 4090’s 24GB) and far better multi-GPU scaling.
On-Premise vs Cloud Break-Even Analysis
Should you buy a GPU instead of renting? The math depends on usage hours.
H100 Purchase Cost: $20,000 (retail)
Cost per hour at different cloud providers:
- AWS on-demand: $2.88/hour × 8,760 hours/year = $25,229/year
- AWS spot: $0.86/hour × 8,760 hours/year = $7,534/year
- RunPod on-demand: $2.49/hour × 8,760 hours/year = $21,812/year
Break-even for H100 purchase:
- At AWS on-demand ($2.88/hr): 6,944 hours (~9.5 months of continuous use)
- At AWS spot ($0.86/hr): 23,256 hours (2.7 years of continuous use)
- At RunPod on-demand ($2.49/hr): 8,032 hours (~11 months of continuous use)
Practical usage scenarios:
- 50 hours/month (600 hours/year): Rent from RunPod. Yearly at $2.49/hr: ~$1,494. Buying costs $20,000 upfront plus electricity and never pays off at this usage.
- 200 hours/month (2,400 hours/year): Rent from AWS spot. Yearly: $2,064. Break-even never occurs.
- 500 hours/month (6,000 hours/year): Rent from RunPod unless you expect sustained usage. Yearly: $14,940. Buying costs $20,000 upfront plus ~$2,000/year electricity, so a purchase only breaks even after roughly 18-19 months.
- 1,000 GPU-hours/month (12,000 hours/year, which exceeds one card’s 8,760-hour annual maximum and so implies at least two GPUs): Buy. Renting from RunPod costs $29,880/year; two H100s ($40,000) plus ~$4,000/year electricity pay for themselves in about 18 months.
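The rent-vs-buy arithmetic above can be sketched as a small helper. The prices and electricity figures below are this article’s illustrative assumptions, not live quotes:

```python
# Illustrative rent-vs-buy break-even helper. All dollar figures are
# assumptions from this article, not current provider pricing.

def breakeven_months(purchase_usd: float,
                     rental_usd_per_hr: float,
                     hours_per_month: float,
                     electricity_usd_per_month: float = 0.0) -> float:
    """Months until cumulative rental spend exceeds purchase + running costs."""
    monthly_rental = rental_usd_per_hr * hours_per_month
    monthly_saving = monthly_rental - electricity_usd_per_month
    if monthly_saving <= 0:
        return float("inf")  # renting is always cheaper at this usage level
    return purchase_usd / monthly_saving

# 500 GPU-hours/month on RunPod on-demand ($2.49/hr) vs a $20,000 H100
# drawing roughly $167/month in electricity (~$2,000/year):
months = breakeven_months(20_000, 2.49, 500, electricity_usd_per_month=167)
print(f"Break-even after ~{months:.0f} months")  # → ~19 months
```

Plugging in the 50-hours/month scenario returns infinity, matching the conclusion that buying never pays off at light usage.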
Use the GPU Cost Calculator to find your break-even point with your monthly usage.
Cost Per FLOP Comparison
Different GPUs deliver different efficiency. Here’s the cost per TFLOP-hour (hourly price divided by throughput in trillions of floating-point operations per second):
| GPU | TFLOPS | Cost/Hour | Cost per TFLOP-hour |
|---|---|---|---|
| H100 (AWS on-demand) | 55 | $2.88 | $0.052 |
| H100 (AWS spot) | 55 | $0.86 | $0.016 |
| A100 (AWS on-demand) | 20 | $1.21 | $0.061 |
| A100 (RunPod) | 20 | $0.44 | $0.022 |
| L40S (CoreWeave) | 7.4 | $0.37 | $0.050 |
AWS spot H100 wins on cost per TFLOP-hour, but RunPod’s on-demand A100 gets within about 40% of that efficiency with no preemption risk. For production systems, RunPod A100 at $0.022/TFLOP-hour is the sweet spot.
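The table’s efficiency column is simple division, which makes it easy to recompute or extend as quotes change. A minimal sketch using this article’s figures:

```python
# Recompute the cost-per-TFLOP-hour column from this article's figures
# (TFLOPS, $/hour). Swap in your own quotes to extend the comparison.
options = {
    "H100 (AWS on-demand)": (55, 2.88),
    "H100 (AWS spot)":      (55, 0.86),
    "A100 (AWS on-demand)": (20, 1.21),
    "A100 (RunPod)":        (20, 0.44),
    "L40S (CoreWeave)":     (7.4, 0.37),
}

# Sort cheapest-per-TFLOP-hour first and print one line per option.
for name, (tflops, usd_per_hr) in sorted(
        options.items(), key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{name:22s} ${usd_per_hr / tflops:.3f}/TFLOP-hr")
```

Sorting by the ratio rather than the headline hourly rate is the point: the cheapest GPU per hour is not always the cheapest per unit of compute.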
Network and Storage Costs (The Hidden Killer)
GPU rental hourly rates hide additional costs:
Data egress: AWS charges $0.12/GB to move data out (ingress is free), so egressing a 100GB model costs $12 each time. RunPod doesn’t charge for local transfers, which can save transfer-heavy teams hundreds or thousands of dollars monthly.
Storage: AWS charges $0.023/GB/month for EBS. A 500GB dataset costs $11.50/month. RunPod offers free NVMe storage, making it superior for data-intensive workflows.
Network bandwidth: traffic between instances in the same AWS availability zone is free, but cross-region transfer costs around $0.02/GB; distributed training that spans regions becomes expensive quickly, so keep multi-GPU jobs within one zone.
Total ownership cost for cloud GPU work can be 30% higher than the GPU hourly rate suggests.
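These add-on charges are easy to fold into a single estimate. A rough total-cost sketch, using the storage and egress rates quoted above as assumptions:

```python
# Rough monthly total-cost-of-ownership sketch: GPU hours plus the storage
# and egress charges discussed above. Per-GB rates default to this
# article's quoted AWS figures and are assumptions, not live pricing.

def monthly_tco(gpu_usd_per_hr: float, hours: float,
                storage_gb: float = 0.0, storage_usd_per_gb: float = 0.023,
                egress_gb: float = 0.0, egress_usd_per_gb: float = 0.12) -> float:
    compute = gpu_usd_per_hr * hours          # the headline line item
    storage = storage_gb * storage_usd_per_gb  # e.g. EBS at $0.023/GB/month
    egress = egress_gb * egress_usd_per_gb     # e.g. $0.12/GB out of AWS
    return compute + storage + egress

# 200 hours of AWS spot H100 with a 500GB dataset and 100GB of model egress:
total = monthly_tco(0.86, 200, storage_gb=500, egress_gb=100)
print(f"${total:.2f}/month")  # $172 compute + $11.50 storage + $12 egress
```

Even in this modest example, storage and egress add over 10% on top of the raw GPU bill, which is why hourly-rate comparisons alone can mislead.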
Spot vs On-Demand: When to Use Each
Use spot instances ($0.30-0.86/hour) for:
- Model training (checkpoints every 30 minutes protect against preemption)
- Data preprocessing
- Batch inference jobs
- Research and experimentation
- Development and testing
Use on-demand instances ($2.49-3.98/hour) for:
- Real-time inference (serving models to users)
- Fine-tuning with short training windows
- Time-sensitive production jobs
- Enterprise SLA requirements
A hybrid approach uses spot for research and on-demand for production, reducing average costs by 40-50%.
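The checkpointing that makes spot training safe is a simple save-and-resume pattern. A framework-agnostic sketch (a real job would save model and optimizer state with your framework’s own save API; here a plain dict stands in, persisted with pickle):

```python
# Framework-agnostic sketch of checkpoint-and-resume for spot instances.
# The dict stands in for model/optimizer state; a real training job would
# use its framework's save/load API instead of pickle.
import os
import pickle

CKPT = "checkpoint.pkl"

def load_checkpoint():
    """Resume from the last save if one exists, else start fresh."""
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "loss": None}

def save_checkpoint(state):
    """Write to a temp file, then atomically swap it into place, so a
    preemption mid-write can never leave a corrupt checkpoint."""
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)

state = load_checkpoint()
for step in range(state["step"], 100):
    # Stand-in for one training step.
    state = {"step": step + 1, "loss": 1.0 / (step + 1)}
    if state["step"] % 25 == 0:  # checkpoint periodically, not every step
        save_checkpoint(state)
# If the instance is preempted, rerunning this script resumes from the
# last saved step instead of step 0.
```

The atomic-rename trick matters on spot hardware: a preemption can land at any instant, including mid-write.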
Recommendations by Use Case
Scenario 1: Individual researcher training a model
Use RunPod spot H100 at $1.50/hour. Cost for 1,000 hours training: $1,500. If premature termination causes restarts (+20% overhead), cost is $1,800. Acceptable for research.
Scenario 2: Startup running inference at scale
Use AWS A100 on-demand at $1.21/hour in auto-scaling group. Peak load 8 hours/day requires 2-3 GPUs. Monthly: 8 hours × 30 days × 2.5 GPUs × $1.21 = $726. Alternative: CoreWeave A100 at $0.62/hour = $372/month. CoreWeave saves 50%.
Scenario 3: Company doing continuous training
500 hours/month with reliability requirements. AWS reserved H100 at $1.97/hour × 500 = $985/month. Or buy an H100 ($20,000) and run 500 hours/month. Payback in 20 months plus electricity and maintenance. Renting is safer.
Scenario 4: Large-scale research (Kaggle competition)
1,000+ hours needed urgently. Use AWS spot H100 at $0.86/hour × 1,000 = $860. Risk of preemption is worth the 70% savings vs on-demand ($2,880).
Pro Tips for Minimizing GPU Costs
1. Use mixed precision (FP16) - Cuts memory usage by 50%, allowing smaller GPUs. Train on A100 instead of H100 and save $2/hour.
2. Profile before scaling - Verify your job actually uses the GPU. Many ML workloads bottleneck on data loading, not GPU compute. Profiling saves unnecessary rental costs.
3. Batch inference requests - Serve 100 requests in one GPU call instead of 100 separate calls. Amortizing launch and data-transfer overhead can cut the GPU-hours needed for the same traffic by an order of magnitude.
4. Use spot instances with checkpoints - AWS spot H100 saves about $2/hour vs on-demand ($0.86 vs $2.88). With a checkpoint every 30 minutes, a preemption costs at most half an hour of lost work, so spot wins unless preemptions are nearly constant.
5. Compare total cost of ownership - H100 rental looks expensive until you compare with storage, network, and personnel costs. Use the GPU Cost Calculator for full accounting.
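Tip 3 is worth seeing concretely. A toy sketch of request batching, where `run_on_gpu` stands in for a real model forward pass (the point is the call count, not the computation):

```python
# Toy sketch of inference batching (tip 3 above). `run_on_gpu` is a
# stand-in for a model forward pass; each call represents fixed
# kernel-launch and data-transfer overhead you pay regardless of batch size.
gpu_calls = 0

def run_on_gpu(batch):
    global gpu_calls
    gpu_calls += 1                    # count fixed per-call overhead
    return [x * 2 for x in batch]     # stand-in for model inference

requests = list(range(100))

# Naive: one GPU call per request -> 100 calls' worth of overhead.
naive = [run_on_gpu([r])[0] for r in requests]

# Batched: chunk requests into groups of 25 -> only 4 calls.
gpu_calls = 0
batched = []
for i in range(0, len(requests), 25):
    batched.extend(run_on_gpu(requests[i:i + 25]))

print(gpu_calls)       # 4 calls instead of 100
assert naive == batched  # same results either way
```

Real serving stacks (batching schedulers in inference servers) do the same thing with a small time window: hold requests briefly, then run them as one batch.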
Future Trends
NVIDIA’s Blackwell architecture (2026-2027) will offer 2x performance of H100. Expect current H100 prices to drop 40% as inventory shifts to Blackwell. Planning a 6-month project? Wait for Blackwell to avoid overpaying for aging hardware.
AMD’s MI300X offers 30% cost savings over H100 but has lower adoption. By 2027, expect AMD to capture 30-40% of the GPU market, driving competitive pricing.
Conclusion
GPU rental costs range from roughly $0.30/hour (consumer RTX 4090) to $3.98/hour (H100/H200 at Lambda Labs) depending on provider, model, and discounts. Smart teams use RunPod for research (save 40%), AWS for production (reliability), and A100 instead of H100 for inference (save 60%). The difference between optimal and naive GPU spending is 60-70% of total ML infrastructure cost.