Specifications Compared
| Spec | H200 | RTX-4090 |
|---|---|---|
| TDP | 700W | 450W |
| VRAM | 141 GB | 24 GB |
| CUDA Cores | 16,896 | 16,384 |
| Memory Type | HBM3e | GDDR6X |
| Architecture | Hopper | Ada Lovelace |
| Form Factors | SXM, NVL | PCIe |
| Interconnect | NVLink, PCIe 5.0, InfiniBand | PCIe 4.0 |
| Tensor Cores | 528 | 512 |
| FP8 Performance | 3,958 TFLOPS | 660 TFLOPS |
| FP16 Performance | 1,979 TFLOPS | 165 TFLOPS |
| FP32 Performance | 67 TFLOPS | 82.6 TFLOPS |
| FP64 Performance | 34 TFLOPS | 1.3 TFLOPS |
| INT8 Performance | 3,958 TOPS | 660 TOPS |
| Memory Bandwidth | 4,800 GB/s | 1,008 GB/s |
Performance Analysis
The H200 dominates in FP16 performance at 1979 TFLOPS compared to the RTX 4090's 165 TFLOPS, accelerating AI training where half-precision computations prevail and enabling faster convergence on large datasets. For inference, the H200's 3958 TFLOPS FP8 throughput vastly outpaces the RTX 4090's 660 TFLOPS, supporting higher query volumes in production environments. However, the RTX 4090 leads in FP32 at 82.6 TFLOPS over the H200's 67 TFLOPS, benefiting simulations or rendering that demand single-precision accuracy.
Memory specifications define real-world usability: the H200's 141 GB HBM3e allows batch sizes for models exceeding 100 billion parameters without swapping, while the RTX 4090's 24 GB GDDR6X limits it to smaller batches or models under 30 billion parameters. The 4800 GB/s bandwidth on the H200 sustains data flow for these scales, reducing bottlenecks versus the RTX 4090's 1008 GB/s. Higher TDP of 700W on the H200 reflects its density, contrasting the 450W RTX 4090 suited for lighter power envelopes.
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
H200 NVL
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
Vultr | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 960GB Storage | Atlanta | $1.99/GPU/hr | Available | ||
![]() Lambda Labs | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 64 vCPU 432GB RAM 4096GB Storage | Virginia | $2.29/GPU/hr | Available | ||
Nebius | NVIDIA H200 SXM 141GB VRAM | 141GB | 16 vCPU 200GB RAM | 🌍Europe | $2.45/GPU/hr | |||
![]() CoreWeave | 8×NVIDIA H200 SXM 141GB VRAM | 141GB | 128 vCPU 0GB RAM 61440GB Storage | United States | $2.58/GPU/hr $20.64/hr total (8×) | |||
![]() Ori | 4×NVIDIA H200 SXM 141GB VRAM | 141GB | 96 vCPU 960GB RAM 12000GB Storage | London | $3.50/GPU/hr $14.00/hr total (4×) | Available |
RTX 4090
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
![]() TensorDock | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 0 vCPU 0GB RAM | Chubbuck, Idaho | $0.39/GPU/hr | Available | ||
![]() TensorDock | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 0 vCPU 0GB RAM | Orlando, Florida | $0.48/GPU/hr | Available | ||
![]() TensorDock | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 0 vCPU 0GB RAM | Winnipeg, Manitoba | $0.50/GPU/hr | Available | ||
![]() Vast.ai | 4×NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 96 vCPU 472GB RAM 3034GB Storage | Sweden | $0.53/GPU/hr $2.13/hr total (4×) | Available | ||
![]() Vast.ai | 4×NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 80 vCPU 157GB RAM 856GB Storage | United Kingdom | $0.67/GPU/hr $2.67/hr total (4×) | Available |
When to Choose the H200 NVL
The H200 excels in large-scale LLM training and inference where 141 GB VRAM handles massive models without distribution overhead. Enterprise users benefit from NVLink interconnects and PCIe 5.0 for multi-GPU scaling at $0.50 per hour starting price.
Scientific computing with high memory demands favors the H200's 4800 GB/s bandwidth for simulations involving terabyte-scale datasets.
When to Choose the RTX 4090
The RTX 4090 suits budget-conscious developers for fine-tuning smaller models or Stable Diffusion tasks, leveraging 24 GB VRAM at $0.16 per hour. Its PCIe 4.0 form factor integrates easily into standard cloud instances without specialized infrastructure.
Gaming-adjacent workloads or prototyping benefit from 82.6 TFLOPS FP32 performance at lower average $0.46 per hour costs.
Use Cases
The H200's 141 GB HBM3e VRAM supports training models over 100 billion parameters in single-GPU setups. Its 1979 TFLOPS FP16 throughput accelerates convergence compared to the RTX 4090's 24 GB limit.
H200 delivers 3958 TFLOPS FP8 for high-throughput serving of large models. The 4800 GB/s bandwidth sustains large batch sizes unavailable on RTX 4090.
RTX 4090 handles fine-tuning of models under 30 billion parameters efficiently at $0.16 per hour. Its 165 TFLOPS FP16 suffices for cost-effective iterations.
24 GB GDDR6X on RTX 4090 supports image generation workflows at low $0.46 per hour average. Higher availability across 112 offers aids quick prototyping.
H200's 4800 GB/s bandwidth processes large datasets in simulations. NVLink interconnect enables multi-GPU precision tasks beyond RTX 4090 capabilities.
Frequently Asked Questions
How much more VRAM does the H200 have than the RTX 4090?▾
The H200 provides 141 GB HBM3e VRAM, nearly six times the RTX 4090's 24 GB GDDR6X. This enables handling of much larger AI models without sharding. Bandwidth follows suit at 4800 GB/s versus 1008 GB/s.
What is the FP16 performance difference between H200 and RTX 4090?▾
H200 achieves 1979 TFLOPS FP16, over 12 times the RTX 4090's 165 TFLOPS. This gap accelerates deep learning training significantly. FP8 shows even larger disparity at 3958 TFLOPS versus 660 TFLOPS.
Which GPU is cheaper in the cloud?▾
RTX 4090 starts at $0.16 per hour with $0.46 average across 112 offers, far below H200 NVL's $0.50 starting and $2.39 average on four offers. Availability favors RTX 4090 for small-scale use. H200 justifies cost for scale.
Does the H200 support better interconnects than RTX 4090?▾
H200 includes NVLink, PCIe 5.0, and InfiniBand for multi-GPU clusters. RTX 4090 limits to PCIe 4.0 in consumer form factors. This suits H200 for distributed training.
What are the power requirements for these GPUs?▾
H200 demands 700W TDP for its density, versus RTX 4090's 450W. Higher power on H200 correlates with 141 GB VRAM performance. Data centers accommodate this readily.
Is RTX 4090 better in FP32 than H200?▾
RTX 4090 delivers 82.6 TFLOPS FP32, exceeding H200's 67 TFLOPS. This aids FP32-heavy tasks like certain simulations. However, AI workloads prioritize FP16 and FP8.
Which is cheaper to rent, the H200 or the RTX 4090?▾
Cloud rental prices for both the H200 and RTX 4090 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the H200 have compared to the RTX 4090?▾
The H200 has 141 GB of HBM3e memory. The RTX 4090 has 24 GB of GDDR6X memory.
Can I find H200 and RTX 4090 GPUs available to rent right now?▾
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the H200 and the RTX 4090?▾
The H200 uses the Hopper architecture (2024) while the RTX 4090 uses Ada Lovelace (2022). The H200 delivers 12.0x the FP16 throughput and 4.8x the memory bandwidth of the RTX 4090.




