Specifications Compared
| Spec | A40 | GH200 |
|---|---|---|
| TDP | 300W | 900W |
| VRAM | 48 GB | 96 GB |
| CUDA Cores | 10,752 | 16,896 |
| Memory Type | GDDR6 | HBM3 |
| Architecture | Ampere | Hopper |
| Form Factors | PCIe | SXM |
| Interconnect | NVLink | NVLink-C2C, PCIe 5.0 |
| Tensor Cores | 336 | 528 |
| FP16 Performance | 37.4 TFLOPS | 1,979 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 67 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 34 TFLOPS |
| INT8 Performance | 299 TOPS | 3,958 TOPS |
| Memory Bandwidth | 696 GB/s | 4,000 GB/s |
Performance Analysis
The GH200 vastly outpaces the A40 in compute capabilities: FP16 performance stands at 1979 TFLOPS compared to 37.4 TFLOPS, enabling over 50 times faster matrix operations critical for deep learning training. FP32 throughput reaches 67 TFLOPS on GH200 versus 37.4 TFLOPS on A40, supporting superior single-precision scientific simulations. The FP16 to FP32 balance on A40 suits general workloads evenly, but GH200's disparity favors low-precision training acceleration.
Memory specifications transform real-world usage: 96 GB HBM3 on GH200 versus 48 GB GDDR6 on A40 allows loading models twice as large without partitioning. Bandwidth of 4000 GB/s on GH200 dwarfs A40's 696 GB/s, reducing data transfer bottlenecks and enabling larger batch sizes in training by minimizing GPU memory swaps. This sustains higher throughput in inference pipelines handling high-resolution inputs.
Power demands reflect these gains: GH200's 900W TDP triples A40's 300W, necessitating robust cooling in SXM setups. FP8 performance at 3959 TFLOPS on GH200 excels in quantized inference, processing more tokens per second than A40's capabilities permit.
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
![]() TensorDock | NVIDIA RTX A4000 16GB VRAM | 16GB | 0 vCPU 0GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available | ||
![]() Vast.ai | 8×NVIDIA RTX A4000 16GB VRAM | 16GB | 80 vCPU 201GB RAM 1698GB Storage | United Kingdom | $0.15/GPU/hr $1.17/hr total (8×) | Available | ||
![]() Hyperstack | 2×NVIDIA RTX A4000 16GB VRAM | 16GB | 8 vCPU 43GB RAM 200GB Storage | Norway | $0.15/GPU/hr $0.30/hr total (2×) | Available | ||
![]() Hyperstack | NVIDIA RTX A4000 16GB VRAM | 16GB | 4 vCPU 21GB RAM 100GB Storage | Norway | $0.15/GPU/hr | Available | ||
![]() Vast.ai | 8×NVIDIA RTX A4000 16GB VRAM | 16GB | 80 vCPU 315GB RAM 2313GB Storage | United Kingdom | $0.16/GPU/hr $1.28/hr total (8×) | Available |
GH200
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
Vultr | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 960GB Storage | Atlanta | $1.99/GPU/hr | Available | ||
![]() Lambda Labs | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 64 vCPU 432GB RAM 4096GB Storage | Virginia | $2.29/GPU/hr | Available | ||
![]() Denvr | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 7600GB Storage | Virginia | $3.87/GPU/hr | |||
![]() CoreWeave | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 7680GB Storage | United States | $6.50/GPU/hr |
When to Choose the A40
The A40 suits cost-sensitive deployments where models fit within 48 GB VRAM. At $0.24 per hour starting price across 23 offers, it delivers 37.4 TFLOPS FP16 and FP32 performance reliably for fine-tuning or inference on mid-sized networks. Its 300W TDP and PCIe form factor simplify integration into existing clusters without high power infrastructure.
Legacy Ampere compatibility favors A40 for visualization or Stable Diffusion tasks not demanding extreme bandwidth. Average pricing of $1.26 per hour ensures economical scaling for teams avoiding GH200's premium.
When to Choose the GH200
The GH200 excels in large-scale AI training requiring 96 GB VRAM and 4000 GB/s bandwidth. Its 1979 TFLOPS FP16 throughput accelerates LLM pretraining, while FP8 at 3958 TFLOPS optimizes inference for production serving.
High-performance computing benefits from GH200's NVLink-C2C interconnect and Hopper architecture. Despite $1.99 per hour starting cost, it justifies investment for workloads bottlenecked by A40's 696 GB/s bandwidth or 37.4 TFLOPS limits.
Use Cases
GH200's 1979 TFLOPS FP16 outperforms A40's 37.4 TFLOPS by over 50 times, accelerating large model training. Bandwidth of 4000 GB/s supports massive batches.
FP8 at 3958 TFLOPS and 96 GB VRAM on GH200 handle high-throughput serving. A40's 48 GB limits scale for large LLMs.
A40 suffices for models under 48 GB at lower $0.24 per hour cost. GH200 accelerates larger fine-tunes with 1979 TFLOPS FP16.
A40's 37.4 TFLOPS FP32 and 48 GB VRAM fit image generation efficiently. Lower 300W TDP and pricing suit creative workflows.
GH200's 67 TFLOPS FP32 and 4000 GB/s bandwidth speed simulations. NVLink-C2C enhances multi-GPU scaling over A40.
Frequently Asked Questions
Which GPU has more VRAM, A40 or GH200?▾
The GH200 provides 96 GB HBM3 VRAM, doubling the A40's 48 GB GDDR6. This enables GH200 to load larger models without sharding. Bandwidth also favors GH200 at 4000 GB/s versus 696 GB/s.
How do A40 and GH200 compare in FP16 performance?▾
GH200 achieves 1979 TFLOPS FP16, over 52 times the A40's 37.4 TFLOPS. This gap accelerates AI training significantly. FP32 on GH200 reaches 67 TFLOPS versus A40's 37.4 TFLOPS.
What is the pricing difference between A40 and GH200?▾
A40 starts at $0.24 per hour averaging $1.26 per hour across 23 offers. GH200 begins at $1.99 per hour averaging $3.59 per hour over 4 offers. A40 offers better value for lighter workloads.
Does GH200 support FP8 precision?▾
GH200 delivers 3958 TFLOPS FP8, ideal for inference quantization. A40 lacks specified FP8 performance. This boosts GH200 in high-volume serving scenarios.
What are the power requirements for these GPUs?▾
A40 consumes 300W TDP in PCIe form factor. GH200 requires 900W in SXM with advanced cooling. GH200 demands more infrastructure for its performance gains.
Which GPU is newer, A40 or GH200?▾
GH200 uses Hopper architecture from 2023, versus A40's Ampere from 2020. Interconnects differ: GH200 has NVLink-C2C and PCIe 5.0. This generational advance drives GH200's spec superiority.
Which is cheaper to rent, the A40 or the GH200?▾
Cloud rental prices for both the A40 and GH200 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the GH200?▾
The A40 has 48 GB of GDDR6 memory. The GH200 has 96 GB of HBM3 memory.
Can I find A40 and GH200 GPUs available to rent right now?▾
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the GH200?▾
The A40 uses the Ampere architecture (2020) while the GH200 uses Hopper (2023). The GH200 delivers 52.9x the FP16 throughput and 5.7x the memory bandwidth of the A40.





