Specifications Compared
| Spec | A40 | H200 |
|---|---|---|
| TDP | 300W | 700W |
| VRAM | 48 GB | 141 GB |
| CUDA Cores | 10,752 | 16,896 |
| Memory Type | GDDR6 | HBM3e |
| Architecture | Ampere | Hopper |
| Form Factors | PCIe | SXM, NVL |
| Interconnect | NVLink | NVLink, PCIe 5.0, InfiniBand |
| Tensor Cores | 336 | 528 |
| FP16 Performance | 37.4 TFLOPS | 1,979 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 67 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 34 TFLOPS |
| INT8 Performance | 299 TOPS | 3,958 TOPS |
| Memory Bandwidth | 696 GB/s | 4,800 GB/s |
Performance Analysis
The H200 vastly outpaces the A40 in compute performance: its FP16 rating reaches 1979 TFLOPS compared to 37.4 TFLOPS, enabling over 50 times faster half-precision operations critical for deep learning training. FP32 performance stands at 67 TFLOPS versus 37.4 TFLOPS, providing a clear edge in single-precision tasks like simulations. The H200's FP8 capability of 3958 TFLOPS further accelerates inference for quantized models, an area where the A40 lacks equivalent support.
Memory specifications define real-world usability. The H200's 141 GB HBM3e VRAM supports batch sizes and model sizes unattainable on the A40's 48 GB GDDR6, preventing out-of-memory errors in large language model inference or training. Bandwidth of 4800 GB/s versus 696 GB/s reduces data transfer bottlenecks, allowing sustained high throughput in memory-intensive workloads such as transformer processing.
Power consumption reflects these differences: the H200's 700W TDP demands robust cooling and infrastructure, while the A40's 300W suits denser deployments. In training scenarios, the H200 shortens epochs dramatically due to superior FP16 and bandwidth; for inference, FP8 and VRAM enable serving larger models at lower latency.
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
![]() TensorDock | NVIDIA RTX A4000 16GB VRAM | 16GB | 0 vCPU 0GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available | ||
![]() Vast.ai | 8×NVIDIA RTX A4000 16GB VRAM | 16GB | 80 vCPU 201GB RAM 1698GB Storage | United Kingdom | $0.15/GPU/hr $1.17/hr total (8×) | Available | ||
![]() Hyperstack | 4×NVIDIA RTX A4000 16GB VRAM | 16GB | 16 vCPU 86GB RAM 500GB Storage | Norway | $0.15/GPU/hr $0.60/hr total (4×) | Available | ||
![]() Hyperstack | 2×NVIDIA RTX A4000 16GB VRAM | 16GB | 8 vCPU 43GB RAM 200GB Storage | Norway | $0.15/GPU/hr $0.30/hr total (2×) | Available | ||
![]() Hyperstack | NVIDIA RTX A4000 16GB VRAM | 16GB | 4 vCPU 21GB RAM 100GB Storage | Norway | $0.15/GPU/hr | Available |
H200 SXM
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
Vultr | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 72 vCPU 480GB RAM 960GB Storage | Atlanta | $1.99/GPU/hr | Available | ||
![]() Lambda Labs | NVIDIA GH200 Grace Hopper 96GB VRAM | 96GB | 64 vCPU 432GB RAM 4096GB Storage | Virginia | $2.29/GPU/hr | Available | ||
Nebius | NVIDIA H200 SXM 141GB VRAM | 141GB | 16 vCPU 200GB RAM | 🌍Europe | $2.45/GPU/hr | |||
![]() CoreWeave | 8×NVIDIA H200 SXM 141GB VRAM | 141GB | 128 vCPU 0GB RAM 61440GB Storage | United States | $2.58/GPU/hr $20.64/hr total (8×) | |||
![]() Ori | 4×NVIDIA H200 SXM 141GB VRAM | 141GB | 96 vCPU 960GB RAM 12000GB Storage | London | $3.50/GPU/hr $14.00/hr total (4×) | Available |
When to Choose the A40
The A40 excels in cost-sensitive environments requiring reliable performance without extreme scale. At a minimum cloud price of $0.24 per hour and 300W TDP, it fits PCIe-based servers for fine-tuning smaller models or Stable Diffusion generation within 48 GB VRAM limits. Its balanced 37.4 TFLOPS FP16 and FP32 suit scientific computing or inference where memory bandwidth of 696 GB/s suffices and budgets constrain options below the H200's $1.19 per hour minimum.
When to Choose the H200 SXM
Opt for the H200 SXM in high-end AI pipelines demanding massive scale. Its 141 GB HBM3e VRAM accommodates full-parameter training of large language models, while 4800 GB/s bandwidth sustains large batch sizes. The 1979 TFLOPS FP16 and 3958 TFLOPS FP8 deliver unmatched speed for LLM training and inference, justifying the $3.70 per hour average despite 700W TDP.
Use Cases
The H200's 141 GB HBM3e VRAM and 1979 TFLOPS FP16 performance enable training of massive language models without splitting, unlike the A40's 48 GB GDDR6 limit.
With 3958 TFLOPS FP8 and 141 GB VRAM, the H200 supports high-throughput serving of large models; the A40's 37.4 TFLOPS FP16 falls short for scale.
The H200's superior 4800 GB/s bandwidth and 67 TFLOPS FP32 accelerate fine-tuning on parameter-heavy models beyond the A40's 696 GB/s capacity.
The A40's 48 GB VRAM and 37.4 TFLOPS FP16 handle image generation workflows efficiently at lower cost of $1.31 per hour average.
The A40's balanced 37.4 TFLOPS FP32 suits moderate simulations at 300W TDP; the H200's 67 TFLOPS FP32 benefits memory-intensive HPC tasks.
Frequently Asked Questions
What is the VRAM capacity of the NVIDIA A40 versus H200 SXM?▾
The A40 provides 48 GB GDDR6 VRAM. The H200 SXM offers 141 GB HBM3e, enabling larger models and batch sizes.
How do the FP16 performances compare between A40 and H200?▾
The A40 delivers 37.4 TFLOPS in FP16. The H200 achieves 1979 TFLOPS, over 52 times higher for AI training.
What are the cloud pricing differences for A40 and H200 SXM?▾
A40 pricing starts at $0.24 per hour, averaging $1.31 across 23 offers. H200 SXM starts at $1.19 per hour, averaging $3.70 across 23 offers.
Which GPU has higher memory bandwidth, A40 or H200?▾
The A40 has 696 GB/s bandwidth. The H200 reaches 4800 GB/s, reducing bottlenecks in data-heavy workloads.
What are the TDP ratings for A40 and H200 SXM?▾
The A40 consumes 300W TDP. The H200 SXM requires 700W, suiting high-density data centers.
Does the H200 support FP8 precision unlike the A40?▾
The H200 provides 3958 TFLOPS in FP8 for efficient inference. The A40 lacks specified FP8 performance.
Which is cheaper to rent, the A40 or the H200?▾
Cloud rental prices for both the A40 and H200 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the H200?▾
The A40 has 48 GB of GDDR6 memory. The H200 has 141 GB of HBM3e memory.
Can I find A40 and H200 GPUs available to rent right now?▾
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the H200?▾
The A40 uses the Ampere architecture (2020) while the H200 uses Hopper (2024). The H200 delivers 52.9x the FP16 throughput and 6.9x the memory bandwidth of the A40.





