Specifications Compared
| Spec | H200 | L40S |
|---|---|---|
| TDP | 700W | 350W |
| VRAM | 141 GB | 48 GB |
| CUDA Cores | 16,896 | 18,176 |
| Memory Type | HBM3e | GDDR6 |
| Architecture | Hopper | Ada Lovelace |
| Form Factors | SXM, NVL | PCIe |
| Interconnect | NVLink, PCIe 5.0, InfiniBand | PCIe 4.0 |
| Tensor Cores | 528 | 568 |
| FP8 Performance | 3,958 TFLOPS | 724 TFLOPS |
| FP16 Performance | 1,979 TFLOPS | 362 TFLOPS |
| FP32 Performance | 67 TFLOPS | 91 TFLOPS |
| FP64 Performance | 34 TFLOPS | 1.4 TFLOPS |
| INT8 Performance | 3,958 TOPS | 724 TOPS |
| Memory Bandwidth | 4,800 GB/s | 864 GB/s |
Performance Analysis
Memory specifications dominate real-world differences. The H200's 141 GB of HBM3e can hold the weights of a model around 100 billion parameters on a single GPU when quantized to 8 bits (roughly 1 byte per parameter), whereas the L40S's 48 GB of GDDR6 forces multi-GPU sharding far sooner. The larger capacity also leaves room for bigger training batches, and the H200's 4800 GB/s bandwidth keeps memory-bound kernels fed, reducing time per epoch, while 864 GB/s on the L40S constrains throughput for memory-bound workloads.
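A rough way to sanity-check which models fit on one card is to multiply the parameter count by the bytes stored per parameter. The sketch below counts weights only; activations, KV cache, gradients, and optimizer state are deliberately ignored, so real requirements are higher. The helper names `weights_gb` and `fits` are illustrative, not part of any library.

```python
# Weights-only estimate: ignores activations, KV cache, gradients, and optimizer state.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}
VRAM_GB = {"H200": 141, "L40S": 48}  # capacities from the spec table


def weights_gb(params_billion: float, dtype: str) -> float:
    """Size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion: float, dtype: str, gpu: str) -> bool:
    """True if the weights alone fit in a single GPU's VRAM."""
    return weights_gb(params_billion, dtype) <= VRAM_GB[gpu]


for size in (13, 70, 100):
    for dtype in ("fp16", "fp8"):
        for gpu in VRAM_GB:
            verdict = "fits" if fits(size, dtype, gpu) else "needs sharding"
            print(f"{size}B @ {dtype} on {gpu}: {weights_gb(size, dtype):.0f} GB -> {verdict}")
```

Under these assumptions a 100B model fits on one H200 only at 8-bit precision, and a 70B model exceeds a single L40S even at 8 bits.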
On paper, FP16 throughput favors the H200 at 1979 TFLOPS over the L40S's 362 TFLOPS, a peak advantage of roughly 5.5x for mixed-precision training. FP8 inference shows a similar gap: 3958 TFLOPS versus 724 TFLOPS, which suits high-throughput serving of quantized LLMs. FP32 reverses the ranking, with the L40S at 91 TFLOPS exceeding the H200's 67 TFLOPS, which benefits simulations that require single-precision accuracy.
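The ratios quoted here follow directly from the spec table. As a minimal sketch, the snippet below recomputes them from the listed peak figures; these are spec-sheet numbers, not measured benchmarks, and the `SPECS` dictionary and `h200_over_l40s` helper are names made up for illustration.

```python
# Peak throughput in TFLOPS, copied from the spec table (not measured results).
SPECS = {
    "H200": {"fp8": 3958, "fp16": 1979, "fp32": 67},
    "L40S": {"fp8": 724, "fp16": 362, "fp32": 91},
}


def h200_over_l40s(metric: str) -> float:
    """Ratio of H200 to L40S peak throughput for one precision."""
    return SPECS["H200"][metric] / SPECS["L40S"][metric]


for metric in ("fp8", "fp16", "fp32"):
    print(f"{metric}: H200 is {h200_over_l40s(metric):.2f}x the L40S")
```

A ratio below 1.0 for FP32 reflects the L40S lead in single precision.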
Power draw tracks capability: the H200's 700W TDP demands robust cooling, while the L40S at 350W suits dense deployments. For multi-GPU scaling, the H200 offers NVLink and PCIe 5.0, whereas the L40S is limited to PCIe 4.0.
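One way to relate power draw to capability is peak throughput per watt of TDP. The sketch below divides the table's peak TFLOPS by TDP; it assumes the card runs at its rated TDP and reaches peak throughput, which real workloads rarely do, so treat the output as an upper-bound comparison. The function name `tflops_per_watt` is illustrative.

```python
TDP_WATTS = {"H200": 700, "L40S": 350}  # from the spec table
PEAK_TFLOPS = {
    "H200": {"fp16": 1979, "fp32": 67},
    "L40S": {"fp16": 362, "fp32": 91},
}


def tflops_per_watt(gpu: str, metric: str) -> float:
    """Peak TFLOPS divided by rated TDP; an idealized efficiency figure."""
    return PEAK_TFLOPS[gpu][metric] / TDP_WATTS[gpu]


for gpu in TDP_WATTS:
    for metric in ("fp16", "fp32"):
        print(f"{gpu} {metric}: {tflops_per_watt(gpu, metric):.3f} TFLOPS/W")
```

By this measure the H200 leads in FP16 efficiency despite its higher TDP, while the L40S leads in FP32.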
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
H200 SXM
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Vultr | NVIDIA GH200 Grace Hopper | 96GB | 72 vCPU, 480GB RAM, 960GB Storage | Atlanta | $1.99/GPU/hr | Available |
| Lambda Labs | NVIDIA GH200 Grace Hopper | 96GB | 64 vCPU, 432GB RAM, 4096GB Storage | Virginia | $2.29/GPU/hr | Available |
| Nebius | NVIDIA H200 SXM | 141GB | 16 vCPU, 200GB RAM | 🌍 Europe | $2.45/GPU/hr | |
| CoreWeave | 8× NVIDIA H200 SXM | 141GB | 128 vCPU, 0GB RAM, 61440GB Storage | United States | $2.58/GPU/hr ($20.64/hr total, 8×) | |
| Ori | 4× NVIDIA H200 SXM | 141GB | 96 vCPU, 960GB RAM, 12000GB Storage | London | $3.50/GPU/hr ($14.00/hr total, 4×) | Available |
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48GB | 0 vCPU, 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48GB | 16 vCPU, 94GB RAM | 🌍 Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40S | 48GB | 12 vCPU, 72GB RAM, 625GB Storage | Iowa | $0.88/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40S | 48GB | 24 vCPU, 144GB RAM, 1250GB Storage | Iowa | $0.88/GPU/hr ($1.76/hr total, 2×) | Available |
| Massed Compute | NVIDIA L40S | 48GB | 12 vCPU, 72GB RAM, 625GB Storage | Iowa | $0.88/GPU/hr | Available |
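Multi-GPU listings quote both a per-GPU rate and an instance total; the total is the per-GPU rate multiplied by the GPU count. The sketch below reproduces the totals for the multi-GPU offers listed above and extends them to a hypothetical 24-hour run. The `Offer` class and the 24-hour duration are illustrative assumptions, and the prices are a snapshot that will drift.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    gpu: str
    gpu_count: int
    price_per_gpu_hr: float  # USD, snapshot from the listings above

    def total_per_hr(self) -> float:
        """Hourly cost of the whole instance."""
        return round(self.gpu_count * self.price_per_gpu_hr, 2)

    def run_cost(self, hours: float) -> float:
        """Cost of keeping the instance up for the given number of hours."""
        return round(self.total_per_hr() * hours, 2)


offers = [
    Offer("CoreWeave", "H200 SXM", 8, 2.58),
    Offer("Ori", "H200 SXM", 4, 3.50),
    Offer("Massed Compute", "L40S", 2, 0.88),
]

for o in offers:
    print(f"{o.provider} {o.gpu_count}x {o.gpu}: "
          f"${o.total_per_hr():.2f}/hr, ${o.run_cost(24):.2f} per 24 h")
```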
When to Choose the H200 SXM
The H200 excels in large-scale LLM training and inference, where 141 GB of VRAM per GPU holds large models with less sharding. Scenarios include pre-training 175B+ parameter transformers on multi-GPU clusters: 4800 GB/s bandwidth sustains high batch sizes, and 1979 TFLOPS FP16 shortens time per epoch.
Enterprises with NVLink clusters prioritize the H200 for its 3958 TFLOPS FP8 in production inference at scale, even though its pricing starts higher, at $1.19/hr.
When to Choose the L40S
The L40S suits cost-sensitive inference and fine-tuning of models under 70B parameters, offering 48 GB of GDDR6 with pricing starting at $0.40/hr. Its lower 350W TDP enables higher density in PCIe servers for Stable Diffusion or visualization pipelines.
FP32 workloads like scientific simulations favor the L40S's 91 TFLOPS over the H200's 67 TFLOPS, with PCIe 4.0 sufficient for single-node tasks.
Use Cases
For large-model training, the H200's 141 GB of HBM3e and 4800 GB/s bandwidth support large batch sizes for models over 100B parameters. The L40S's 48 GB limits scale.
For inference, 3958 TFLOPS of FP8 on the H200 delivers high throughput for large quantized models. The L40S suffices for smaller models but bottlenecks on VRAM.
For fine-tuning, 141 GB of VRAM on the H200 accommodates full model loading during PEFT on billion-parameter LLMs. The L40S's 48 GB requires more gradient checkpointing.
For image generation, the L40S's 48 GB of GDDR6 handles high-resolution generation efficiently, starting at $0.40/hr. The H200 is overkill for typical 512x512 inference.
For scientific computing, the L40S's 91 TFLOPS of FP32 outperforms the H200's 67 TFLOPS in simulations. Its lower 350W TDP aids multi-GPU scientific clusters.
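The bandwidth figures above translate into a simple ceiling for memory-bound LLM decoding. As a back-of-the-envelope sketch, assume batch size 1 and that every generated token requires reading all model weights from VRAM once; the token rate then cannot exceed bandwidth divided by weight size. This ignores KV-cache traffic, compute limits, and kernel overhead, so real throughput is lower. The 13B FP16 model (about 26 GB of weights) is a hypothetical example chosen because it fits on both cards.

```python
BANDWIDTH_GB_S = {"H200": 4800, "L40S": 864}  # from the spec table


def max_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on batch-1 decode rate when weight reads dominate."""
    return bandwidth_gb_s / weights_gb


WEIGHTS_GB = 13 * 2  # hypothetical 13B-parameter model at 2 bytes per parameter

for gpu, bw in BANDWIDTH_GB_S.items():
    print(f"{gpu}: at most {max_tokens_per_s(bw, WEIGHTS_GB):.1f} tokens/s")
```

Batching raises aggregate throughput by reusing each weight read across requests, which is why the bound applies to a single sequence rather than to a serving deployment.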
Frequently Asked Questions
What is the VRAM difference between H200 and L40S?
H200 provides 141 GB HBM3e VRAM, enabling larger models than L40S's 48 GB GDDR6. This gap affects batch sizes in training.
Which GPU has higher FP16 performance?
H200 achieves 1979 TFLOPS FP16, over 5x the L40S's 362 TFLOPS. This accelerates mixed-precision AI training.
How do cloud prices compare?
H200 starts at $1.19/hr (average $3.71/hr) across 22 offers; L40S at $0.40/hr (average $1.17/hr) across 21. L40S offers better value for lighter tasks.
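Hourly price alone does not show value per unit of compute. The sketch below divides the starting prices quoted here by each card's peak throughput from the spec table, giving dollars per peak TFLOP-hour. It assumes peak throughput is achieved and that starting prices are representative, neither of which holds in general; `usd_per_tflop_hour` is an illustrative name.

```python
START_PRICE_USD_HR = {"H200": 1.19, "L40S": 0.40}  # starting prices quoted above
PEAK_TFLOPS = {
    "H200": {"fp16": 1979, "fp32": 67},
    "L40S": {"fp16": 362, "fp32": 91},
}


def usd_per_tflop_hour(gpu: str, metric: str) -> float:
    """Starting hourly price divided by peak throughput for one precision."""
    return START_PRICE_USD_HR[gpu] / PEAK_TFLOPS[gpu][metric]


for metric in ("fp16", "fp32"):
    for gpu in START_PRICE_USD_HR:
        print(f"{gpu} {metric}: ${usd_per_tflop_hour(gpu, metric):.5f} per TFLOP-hour")
```

Under these assumptions the H200 is cheaper per FP16 TFLOP-hour despite its higher hourly rate, while the L40S is cheaper per FP32 TFLOP-hour.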
What are the TDP ratings?
H200 TDP is 700W, requiring advanced cooling; L40S is 350W for denser deployments. Power scales with performance.
Which supports better interconnects?
H200 includes NVLink, PCIe 5.0, and InfiniBand for multi-GPU scaling. L40S relies on PCIe 4.0 for single-node use.
Is FP32 better on H200 or L40S?
L40S delivers 91 TFLOPS FP32, surpassing H200's 67 TFLOPS. This benefits FP32-dominant scientific computing.
Which is cheaper to rent, the H200 or the L40S?
Cloud rental prices for both the H200 and L40S vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the H200 have compared to the L40S?
The H200 has 141 GB of HBM3e memory. The L40S has 48 GB of GDDR6 memory.
Can I find H200 and L40S GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the H200 and the L40S?
The H200 (released 2024) uses the Hopper architecture, while the L40S (released 2023) uses Ada Lovelace. On peak spec-sheet figures, the H200 delivers 5.5x the FP16 throughput and 5.6x the memory bandwidth of the L40S.





