Specifications Compared
| Spec | H200 | L40 |
|---|---|---|
| TDP | 700W | 300W |
| VRAM | 141 GB | 48 GB |
| CUDA Cores | 16,896 | 18,176 |
| Memory Type | HBM3e | GDDR6 |
| Architecture | Hopper | Ada Lovelace |
| Form Factors | SXM, NVL | PCIe |
| Interconnect | NVLink, PCIe 5.0, InfiniBand | |
| Tensor Cores | 528 | 568 |
| FP8 Performance | 3,958 TFLOPS | |
| FP16 Performance | 1,979 TFLOPS | 90.5 TFLOPS |
| FP32 Performance | 67 TFLOPS | 90.5 TFLOPS |
| FP64 Performance | 34 TFLOPS | |
| INT8 Performance | 3,958 TOPS | 724 TOPS |
| Memory Bandwidth | 4,800 GB/s | 864 GB/s |
Performance Analysis
The H200's peak FP16 throughput is 1,979 TFLOPS against the L40's 90.5 TFLOPS, roughly 21.9 times higher, which matters most for the tensor operations behind LLM training and inference. Its 3,958 TFLOPS of FP8 further accelerates quantized inference. In FP32 the order reverses: the H200's 67 TFLOPS is about 26% below the L40's 90.5 TFLOPS, a gap that matters mainly for graphics and other non-AI workloads. The net effect favors the H200 for deep learning, where training large models demands high FP16 throughput, and the L40 for FP32-heavy visualization.
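The ratios behind this comparison can be reproduced directly from the specification table. The snippet below is a minimal sketch; the `SPECS` dictionary and the `ratio` helper are illustrative names, and the figures are the peak numbers quoted on this page, not measured results.

```python
# Peak throughput figures copied from the specification table (TFLOPS).
SPECS = {
    "H200": {"fp16": 1979.0, "fp32": 67.0},
    "L40": {"fp16": 90.5, "fp32": 90.5},
}

def ratio(metric: str, a: str = "H200", b: str = "L40") -> float:
    """Return how many times larger a's peak figure is than b's."""
    return SPECS[a][metric] / SPECS[b][metric]

fp16_ratio = ratio("fp16")  # H200 relative to L40 in FP16
fp32_ratio = ratio("fp32")  # below 1.0 means the L40 is ahead

print(f"FP16: {fp16_ratio:.1f}x, FP32: {fp32_ratio:.2f}x")
```

With the table's values this gives an FP16 ratio of about 21.9 and an FP32 ratio of about 0.74, so the H200's FP32 peak is roughly a quarter lower than the L40's.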
Memory capacity and bandwidth often matter more in practice than peak compute. The H200's 141 GB of HBM3e is nearly three times the L40's 48 GB of GDDR6, which leaves room for proportionally larger batches or models, such as 70B-parameter LLMs, and reduces the need to split work across GPUs. Its 4,800 GB/s of bandwidth, about 5.6 times the L40's 864 GB/s, reduces memory stalls and shortens training epochs for memory-bound steps such as gradient computation. The L40's lower capacity and bandwidth confine it to smaller batches, which lengthens iteration times on memory-intensive tasks.
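A rough way to see what these capacity and bandwidth figures imply is to estimate the size of a model's weights and the time needed to read them once from memory. The sketch below assumes 2 bytes per parameter for FP16 and counts weights only; it ignores KV cache, activations, and optimizer state, so real workloads need more headroom. The function names are hypothetical helpers, not part of any library.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes): billions of params x bytes each."""
    return params_billion * bytes_per_param

def fits_on_one_gpu(params_billion: float, bytes_per_param: float, vram_gb: float) -> bool:
    """True if the weights alone fit; real workloads need extra headroom."""
    return weights_gb(params_billion, bytes_per_param) <= vram_gb

def weight_stream_ms(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Milliseconds to read every weight once at peak memory bandwidth."""
    return weights_gb(params_billion, bytes_per_param) / bandwidth_gb_s * 1000

# (VRAM in GB, memory bandwidth in GB/s) from the specification table.
GPUS = {"H200": (141, 4800), "L40": (48, 864)}

for name, (vram, bandwidth) in GPUS.items():
    print(
        name,
        "| 70B at FP16 fits:", fits_on_one_gpu(70, 2, vram),
        "| 7B at FP16 weight read:", f"{weight_stream_ms(7, 2, bandwidth):.1f} ms",
    )
```

Under these assumptions a 70B model at FP16 needs about 140 GB for weights, which only just fits in 141 GB and does not fit in 48 GB. The weight-read time is a lower bound on a memory-bound pass over the model, not a latency prediction.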
Power efficiency varies by workload. At a 700W TDP the H200 still delivers more FP16 throughput per watt, while the L40's 300W TDP suits dense deployments where the total power budget limits how many GPUs can be installed.
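The per-watt comparison can be checked with a short calculation. This is a sketch using the TDP and peak throughput figures from the specification table; TDP is a ceiling rather than measured draw, so treat the results as indicative only. `SPECS` and `tflops_per_watt` are illustrative names.

```python
# TDP (watts) and peak throughput (TFLOPS) from the specification table.
SPECS = {
    "H200": {"tdp_w": 700, "fp16": 1979.0, "fp32": 67.0},
    "L40": {"tdp_w": 300, "fp16": 90.5, "fp32": 90.5},
}

def tflops_per_watt(gpu: str, metric: str) -> float:
    """Peak TFLOPS divided by TDP for the given GPU and precision."""
    spec = SPECS[gpu]
    return spec[metric] / spec["tdp_w"]

efficiency = {
    (gpu, metric): tflops_per_watt(gpu, metric)
    for gpu in SPECS
    for metric in ("fp16", "fp32")
}

for (gpu, metric), value in efficiency.items():
    print(f"{gpu} {metric}: {value:.3f} TFLOPS/W")
```

On these numbers the H200 leads in FP16 per watt (about 2.83 versus 0.30 TFLOPS/W), while the L40 leads in FP32 per watt, which is why the answer depends on the workload.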
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
H200 NVL
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Vultr | NVIDIA GH200 Grace Hopper | 96 GB | 72 vCPU, 480 GB RAM, 960 GB storage | Atlanta | $1.99/GPU/hr | Available |
| Lambda Labs | NVIDIA GH200 Grace Hopper | 96 GB | 64 vCPU, 432 GB RAM, 4096 GB storage | Virginia | $2.29/GPU/hr | Available |
| Nebius | NVIDIA H200 SXM | 141 GB | 16 vCPU, 200 GB RAM | Europe | $2.45/GPU/hr | |
| CoreWeave | 8× NVIDIA H200 SXM | 141 GB | 128 vCPU, 0 GB RAM, 61440 GB storage | United States | $2.58/GPU/hr ($20.64/hr total, 8×) | |
| Ori | 2× NVIDIA H200 SXM | 141 GB | 48 vCPU, 480 GB RAM, 6000 GB storage | London | $3.50/GPU/hr ($7.00/hr total, 2×) | Available |
L40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48 GB | 0 vCPU, 0 GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40 | 48 GB | 8 vCPU, 94 GB RAM | Global | $0.82/GPU/hr | |
| RunPod | NVIDIA L40S | 48 GB | 16 vCPU, 94 GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40 | 48 GB | 14 vCPU, 72 GB RAM, 625 GB storage | Iowa | $0.86/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40 | 48 GB | 26 vCPU, 144 GB RAM, 1250 GB storage | Iowa | $0.86/GPU/hr ($1.72/hr total, 2×) | Available |
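
The two listings above can be summarized with a few lines of Python. This is a sketch over the snapshot of prices shown on this page; live prices change, the H200 list mixes GH200 and H200 SXM offers, and the L40 list mixes L40 and L40S offers. The list names and the `summarize` helper are illustrative.

```python
from statistics import mean

# Per-GPU hourly prices (USD) copied from the two listing tables above.
# These are a snapshot; live prices change.
H200_CLASS = [1.99, 2.29, 2.45, 2.58, 3.50]  # GH200 and H200 SXM offers
L40_CLASS = [0.55, 0.82, 0.86, 0.86, 0.86]   # L40 and L40S offers

def summarize(prices):
    """Return (lowest, mean) of a list of hourly prices, rounded to cents."""
    return round(min(prices), 2), round(mean(prices), 2)

h200_low, h200_mean = summarize(H200_CLASS)
l40_low, l40_mean = summarize(L40_CLASS)

print(f"H200-class: from ${h200_low:.2f}, mean ${h200_mean:.2f}")
print(f"L40-class:  from ${l40_low:.2f}, mean ${l40_mean:.2f}")
print(f"Mean price ratio: {h200_mean / l40_mean:.1f}x")
```

For the rows shown, the means come out near $2.56 and $0.79 per GPU-hour. Averages quoted elsewhere on the page may differ because they can be rounded or drawn from a larger set of offers than the five rows listed per GPU.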
When to Choose the H200 NVL
Choose the H200 NVL for large-scale LLM training or inference where the model needs more than 48 GB of VRAM, such as 100B+ parameter deployments. Its 141 GB of HBM3e and 4,800 GB/s of bandwidth keep large batches and working sets resident in GPU memory, and its 1,979 TFLOPS of FP16 shortens each iteration. NVLink enables multi-GPU scaling in NVL form factors, which suits research clusters.
Its 3,958 TFLOPS of FP8 throughput makes it well suited to quantized inference at enterprise scale, which can justify its average price of about $2.60 per GPU-hour.
When to Choose the L40
The L40 excels at cost-sensitive inference for models that fit within 48 GB, such as 7B LLMs, with 90.5 TFLOPS in both FP16 and FP32 and an average price of about $0.90 per GPU-hour across 15 offers. Its 300W TDP and PCIe form factor simplify dense server integration without specialized cooling.
Graphics and visualization tasks benefit from its 90.5 TFLOPS of FP32, which exceeds the H200's 67 TFLOPS for rendering in scientific simulations.
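One way to weigh price against throughput for the two GPUs is to divide an hourly price by peak throughput. The sketch below uses two example prices taken from the listings on this page (the Nebius H200 SXM offer and the RunPod L40 offer) and the peak figures from the specification table. `usd_per_peak_pflops_hour` is a hypothetical helper, and peak numbers overstate what real workloads achieve.

```python
def usd_per_peak_pflops_hour(price_per_gpu_hour: float, peak_tflops: float) -> float:
    """Hourly price per 1,000 TFLOPS (1 PFLOPS) of peak throughput."""
    return price_per_gpu_hour / peak_tflops * 1000

# Example prices from the listings: Nebius H200 SXM and RunPod L40.
H200 = {"price": 2.45, "fp16": 1979.0, "fp32": 67.0}
L40 = {"price": 0.82, "fp16": 90.5, "fp32": 90.5}

for name, gpu in (("H200", H200), ("L40", L40)):
    fp16_cost = usd_per_peak_pflops_hour(gpu["price"], gpu["fp16"])
    fp32_cost = usd_per_peak_pflops_hour(gpu["price"], gpu["fp32"])
    print(f"{name}: ${fp16_cost:.2f} per PFLOPS-hour FP16, ${fp32_cost:.2f} FP32")
```

At these example prices the H200 is far cheaper per unit of peak FP16 throughput, while the L40 is cheaper per unit of peak FP32 throughput, which matches the split between AI and visualization workloads described above.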
Use Cases
LLM training: The H200's 141 GB of VRAM and 1,979 TFLOPS of FP16 support training 100B+ parameter models with large batches. The L40's 48 GB limits the model and batch sizes it can handle.
LLM inference: The H200's 3,958 TFLOPS of FP8 and 4,800 GB/s of bandwidth enable high-throughput quantized serving of large LLMs. The L40 suits only smaller models that fit within 48 GB.
Fine-tuning: The H200's 141 GB of HBM3e accommodates full-model fine-tuning of models that would not fit in the L40's 48 GB, and its FP16 advantage shortens iterations.
Image generation: The L40's 90.5 TFLOPS of FP32 handles image generation efficiently at lower cost. The H200 is overkill unless scaling to very high resolutions.
Simulation and visualization: The L40's 90.5 TFLOPS in both FP32 and FP16 and its 300W TDP fit simulation and visualization work. The H200's 67 TFLOPS of FP32 is a weaker fit.
Frequently Asked Questions
What is the VRAM difference between H200 NVL and L40?
The H200 NVL provides 141 GB HBM3e VRAM, nearly three times the L40's 48 GB GDDR6. This enables larger models and batches on the H200. Bandwidth reaches 4800 GB/s on H200 versus 864 GB/s on L40.
How does FP16 performance compare?
The H200 delivers 1,979 TFLOPS of FP16, about 21.9 times the L40's 90.5 TFLOPS. This gap significantly accelerates AI training and inference. The H200 also reaches 3,958 TFLOPS in FP8 for quantized workloads.
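What are the cloud pricing ranges?
In the listings shown in the Live Cloud Pricing section, H200-class offers start at $1.99 per GPU-hour and average about $2.60 across five offers. L40 offers start at $0.82 per GPU-hour ($0.55 for the L40S variant) and average about $0.90 across 15 offers. The L40 offers better value for lighter workloads.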
Which has higher power consumption?
The H200 does, with a TDP of up to 700W in its SXM/NVL form factors, versus 300W for the PCIe L40. This makes the L40 easier to fit into dense, power-limited deployments.
Is the H200 better for LLM training?
Yes. The H200's 141 GB of VRAM and 1,979 TFLOPS of FP16 handle large-scale training that is impractical within the L40's 48 GB, and NVLink supports multi-GPU setups.
What architectures do they use?
The H200 (2024) uses the Hopper architecture. The L40 (2023) uses Ada Lovelace. Hopper is optimized for AI workloads, with tensor cores that support FP8.
Which is cheaper to rent, the H200 or the L40?
Cloud rental prices for both the H200 and L40 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the H200 have compared to the L40?
The H200 has 141 GB of HBM3e memory. The L40 has 48 GB of GDDR6 memory.
Can I find H200 and L40 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the H200 and the L40?
The H200 (2024) uses the Hopper architecture, while the L40 (2023) uses Ada Lovelace. The H200 delivers 21.9x the FP16 throughput and 5.6x the memory bandwidth of the L40.





