Specifications Compared
| Spec | GH200 | L40 |
|---|---|---|
| TDP | 900W | 300W |
| VRAM | 96 GB | 48 GB |
| CUDA Cores | 16,896 | 18,176 |
| Memory Type | HBM3 | GDDR6 |
| Architecture | Hopper | Ada Lovelace |
| Form Factors | SXM | PCIe |
| Interconnect | NVLink-C2C, PCIe 5.0 | |
| Tensor Cores | 528 | 568 |
| FP8 Performance | 3,958 TFLOPS | |
| FP16 Performance | 1,979 TFLOPS | 90.5 TFLOPS |
| FP32 Performance | 67 TFLOPS | 90.5 TFLOPS |
| FP64 Performance | 34 TFLOPS | |
| INT8 Performance | 3,958 TOPS | 724 TOPS |
| Memory Bandwidth | 4,000 GB/s | 864 GB/s |
Performance Analysis
The GH200's FP16 throughput of 1979 TFLOPS vastly outpaces the L40's 90.5 TFLOPS, accelerating neural network training and inference where half-precision dominates. Its FP32 rate of 67 TFLOPS trails the L40's identical 90.5 TFLOPS in FP16 and FP32, positioning the L40 better for FP32-centric workloads like traditional simulations. FP8 capability on GH200 at 3958 TFLOPS further enhances low-precision inference efficiency.
Memory differences prove critical: GH200's 96 GB HBM3 and 4000 GB/s bandwidth enable larger batch sizes in model training, reducing bottlenecks for datasets exceeding 48 GB GDDR6 on L40 with 864 GB/s. High bandwidth on GH200 sustains data flow for large language models, improving throughput by multiples over L40 limitations.
Power draw underscores trade-offs: GH200 requires 900W TDP versus L40's 300W, demanding robust cooling but delivering density in specialized racks.
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
GH200
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
Vultr | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 960GB Storage | Atlanta | $1.99/GPU/hr | Available | ||
![]() Lambda Labs | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 64 vCPU 432GB RAM 4096GB Storage | Virginia | $2.29/GPU/hr | Available | ||
![]() Denvr | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 7600GB Storage | Virginia | $3.87/GPU/hr | |||
![]() CoreWeave | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 7680GB Storage | United States | $6.50/GPU/hr |
L40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
![]() TensorDock | NVIDIA L40S 48GB VRAM | 48GB | 0 vCPU 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available | ||
![]() RunPod | NVIDIA L40 48GB VRAM | 48GB | 8 vCPU 94GB RAM | 🌍global | $0.82/GPU/hr | |||
![]() RunPod | NVIDIA L40S 48GB VRAM | 48GB | 16 vCPU 94GB RAM | 🌍global | $0.86/GPU/hr | |||
![]() Massed Compute | NVIDIA L40 48GB VRAM | 48GB | 14 vCPU 72GB RAM 625GB Storage | Iowa | $0.86/GPU/hr | Available | ||
![]() Massed Compute | 2×NVIDIA L40 48GB VRAM | 48GB | 26 vCPU 144GB RAM 1250GB Storage | Iowa | $0.86/GPU/hr $1.72/hr total (2×) | Available |
When to Choose the GH200
Select the GH200 for large-scale LLM training or HPC applications needing 96 GB HBM3 VRAM and 4000 GB/s bandwidth to manage massive datasets without swapping. Its 1979 TFLOPS FP16 and 3958 TFLOPS FP8 excel in AI supercomputing via NVLink-C2C interconnect.
In cloud settings, GH200 justifies $3.59 per hour average when peak performance trumps cost for time-sensitive projects.
When to Choose the L40
Choose the L40 for budget-conscious inference, fine-tuning, or graphics tasks fitting within 48 GB GDDR6 VRAM and 864 GB/s bandwidth. Balanced 90.5 TFLOPS across FP16 and FP32 supports versatile compute at 300W TDP.
With $0.88 per hour average pricing across 13 offers, L40 enables dense deployments in PCIe systems prioritizing efficiency over raw scale.
Use Cases
GH200's 96 GB HBM3 VRAM and 1979 TFLOPS FP16 support training massive models with large batch sizes. L40's 48 GB GDDR6 limits scale.
GH200 delivers 3958 TFLOPS FP8 and 4000 GB/s bandwidth for high-throughput serving. L40's 90.5 TFLOPS FP16 suffices only for smaller models.
L40's 90.5 TFLOPS FP32/FP16 and $0.88 per hour average handle most fine-tuning within 48 GB VRAM cost-effectively. GH200 overkill for sub-48 GB tasks.
L40's Ada Lovelace architecture and balanced 90.5 TFLOPS optimize image generation in 48 GB VRAM. Lower 300W TDP aids multi-GPU rendering.
GH200's Hopper design, 67 TFLOPS FP32, and NVLink-C2C excel in HPC simulations needing 96 GB high-bandwidth memory. L40 lacks interconnect scale.
Frequently Asked Questions
What is the VRAM capacity of GH200 versus L40?▾
GH200 provides 96 GB HBM3 VRAM, doubling L40's 48 GB GDDR6. This enables GH200 to load larger models without offloading. Bandwidth follows suit at 4000 GB/s for GH200 versus 864 GB/s.
How do GH200 and L40 compare in cloud pricing?▾
GH200 starts at $1.99 per hour, averaging $3.59 per hour across four offers. L40 begins at $0.67 per hour, averaging $0.88 per hour across 13 offers. Pricing reflects GH200's premium performance.
Which GPU performs better in FP16 for AI training?▾
GH200 achieves 1979 TFLOPS FP16, over 20 times L40's 90.5 TFLOPS. This gap accelerates deep learning training significantly. FP8 on GH200 reaches 3958 TFLOPS for inference.
What are the power requirements of these GPUs?▾
GH200 demands 900W TDP in SXM form, requiring advanced cooling. L40 uses 300W TDP in PCIe, suiting standard servers. Lower TDP on L40 reduces operational costs.
Is GH200 or L40 better for large batch sizes?▾
GH200's 4000 GB/s bandwidth and 96 GB VRAM support larger batches without bottlenecks. L40's 864 GB/s and 48 GB limit it to smaller scales. This impacts training throughput directly.
What architectures power GH200 and L40?▾
GH200 uses Hopper from 2023 with NVLink-C2C interconnect. L40 employs Ada Lovelace from 2023 in PCIe form. Hopper optimizes for HPC scale over Ada's graphics focus.
Which is cheaper to rent, the GH200 or the L40?▾
Cloud rental prices for both the GH200 and L40 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the GH200 have compared to the L40?▾
The GH200 has 96 GB of HBM3 memory. The L40 has 48 GB of GDDR6 memory.
Can I find GH200 and L40 GPUs available to rent right now?▾
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the GH200 and the L40?▾
The GH200 uses the Hopper architecture (2023) while the L40 uses Ada Lovelace (2023). The GH200 delivers 21.9x the FP16 throughput and 4.6x the memory bandwidth of the L40.





