Specifications Compared
| Spec | A40 | H100 |
|---|---|---|
| TDP | 300W | 700W |
| VRAM | 48 GB | 80-94 GB |
| CUDA Cores | 10,752 | 16,896 |
| Memory Type | GDDR6 | HBM3 |
| Architecture | Ampere | Hopper |
| Form Factors | PCIe | SXM5, PCIe, NVL |
| Interconnect | NVLink | NVLink, PCIe 5.0, InfiniBand |
| Tensor Cores | 336 | 528 |
| FP16 Performance | 37.4 TFLOPS | 1,979 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 67 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 34 TFLOPS |
| INT8 Performance | 299 TOPS | 3,958 TOPS |
| Memory Bandwidth | 696 GB/s | 3,350 GB/s |
Performance Analysis
The H100 NVL's peak FP16 throughput reaches 1,979 TFLOPS, about 52.9 times the A40's 37.4 TFLOPS, which matters most for neural network training, where half precision dominates. The two peaks are not measured on the same basis: 1,979 TFLOPS is a Tensor Core figure with sparsity, while 37.4 TFLOPS is the A40's non-Tensor rate, so real-world speedups are smaller than the ratio suggests. FP32 throughput of 67 TFLOPS also exceeds the A40's 37.4 TFLOPS, which helps simulation and rendering tasks. The H100 NVL's 3,958 TFLOPS FP8 rate targets inference for quantized large language models.
Memory bandwidth of 3,350 GB/s on the H100 NVL, about 4.8 times the A40's 696 GB/s, shortens per-iteration time for memory-bound training. Its 80-94 GB of HBM3 allows larger batch sizes and holds models that exceed the A40's 48 GB of GDDR6 without splitting them across GPUs.
The H100 NVL's 700W TDP, versus 300W on the A40, demands more advanced cooling and power delivery, but its NVLink and PCIe 5.0 interconnects support scalable multi-GPU clusters for distributed workloads.
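The ratios quoted in this section follow directly from the specification table. A minimal sketch of that arithmetic, assuming the table's peak figures are taken at face value (the `SPECS` dictionary and `ratio` helper are illustrative names, not part of any tool on this page):

```python
# Peak figures copied from the specification table above.
SPECS = {
    "A40": {"fp16_tflops": 37.4, "fp32_tflops": 37.4, "bandwidth_gbs": 696, "tdp_w": 300},
    "H100": {"fp16_tflops": 1979, "fp32_tflops": 67, "bandwidth_gbs": 3350, "tdp_w": 700},
}


def ratio(metric: str, numerator: str = "H100", denominator: str = "A40") -> float:
    """Return how many times larger `metric` is on one GPU than on the other."""
    return SPECS[numerator][metric] / SPECS[denominator][metric]


for metric in SPECS["A40"]:
    print(f"{metric}: {ratio(metric):.1f}x")
# fp16_tflops: 52.9x
# fp32_tflops: 1.8x
# bandwidth_gbs: 4.8x
# tdp_w: 2.3x
```

These are ratios of datasheet peaks, so they bound the possible speedup rather than predict it for a given workload.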
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA RTX A4000 | 16 GB | 0 vCPU, 0 GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available |
| Vast.ai | 8× NVIDIA RTX A4000 | 16 GB | 80 vCPU, 201 GB RAM, 1698 GB storage | United Kingdom | $0.15/GPU/hr ($1.17/hr total) | Available |
| Hyperstack | 2× NVIDIA RTX A4000 | 16 GB | 8 vCPU, 43 GB RAM, 200 GB storage | Norway | $0.15/GPU/hr ($0.30/hr total) | Available |
| Hyperstack | NVIDIA RTX A4000 | 16 GB | 4 vCPU, 21 GB RAM, 100 GB storage | Norway | $0.15/GPU/hr | Available |
| Vast.ai | 8× NVIDIA RTX A4000 | 16 GB | 80 vCPU, 315 GB RAM, 2313 GB storage | United Kingdom | $0.16/GPU/hr ($1.28/hr total) | Available |
H100 NVL
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Hyperstack | 4× NVIDIA H100 PCIe | 80 GB | 124 vCPU, 720 GB RAM, 3300 GB storage | Canada | $1.90/GPU/hr ($7.60/hr total) | Available |
| Hyperstack | 2× NVIDIA H100 PCIe | 80 GB | 60 vCPU, 360 GB RAM, 1600 GB storage | Canada | $1.90/GPU/hr ($3.80/hr total) | Available |
| Hyperstack | 8× NVIDIA H100 PCIe | 80 GB | 252 vCPU, 1440 GB RAM, 6600 GB storage | Canada | $1.90/GPU/hr ($15.20/hr total) | Available |
| Hyperstack | NVIDIA H100 PCIe | 80 GB | 28 vCPU, 180 GB RAM, 850 GB storage | Canada | $1.90/GPU/hr | Available |
| Hyperstack | 8× NVIDIA H100 PCIe | 80 GB | 252 vCPU, 1440 GB RAM, 6600 GB storage | Canada | $1.95/GPU/hr ($15.60/hr total) | Available |
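The hourly totals in the H100 table above are the per-GPU price multiplied by the GPU count. A small sketch of that calculation, extended to a full job, might look like the following; `hourly_total` and `job_cost` are hypothetical helper names, and the 24-hour run is an assumed example rather than a figure from this page:

```python
from decimal import Decimal


def hourly_total(price_per_gpu: str, gpu_count: int) -> Decimal:
    """Total hourly price for a multi-GPU offer (price passed as a string, e.g. "1.90")."""
    return Decimal(price_per_gpu) * gpu_count


def job_cost(price_per_gpu: str, gpu_count: int, hours: int) -> Decimal:
    """Estimated cost of renting the offer for a whole number of hours."""
    return hourly_total(price_per_gpu, gpu_count) * hours


print(hourly_total("1.90", 4))   # 7.60, matching the 4x H100 PCIe row
print(hourly_total("1.95", 8))   # 15.60, matching the last 8x row
print(job_cost("1.90", 8, 24))   # 364.80 for an assumed 24-hour run on 8 GPUs
```

`Decimal` is used instead of floats so that currency amounts keep their exact two-decimal form.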
When to Choose the A40
The NVIDIA A40 fits budget-driven projects, with cloud pricing starting at $0.24 per hour and averaging $1.31 per hour. Its 300W TDP fits standard PCIe servers without power upgrades, making it a good match for inference on models that fit within 48 GB of VRAM or for Stable Diffusion generation at 37.4 TFLOPS FP16. Choose the A40 when workloads do not require Hopper-specific features such as FP8 precision.
When to Choose the H100 NVL
The NVIDIA H100 NVL targets large-scale AI training, with 1,979 TFLOPS FP16 and 80-94 GB of HBM3 for GPT-scale LLMs. Its 3,350 GB/s bandwidth keeps large batches fed, cutting time per training epoch relative to the A40. Choose the H100 NVL for inference throughput via 3,958 TFLOPS FP8 and for NVLink scaling in clusters.
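Whether a model fits in 48 GB or needs 80-94 GB can be roughed out from its parameter count and precision. The sketch below is an approximation under stated assumptions: it counts weights only, and the 20% `overhead` factor for activations and caches is an assumed placeholder, not a measured value. The names `weights_gb` and `fits` are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

# VRAM capacities from the specification table (the H100 is listed as 80-94 GB).
VRAM_GB = {"A40": 48, "H100-80": 80, "H100-94": 94}


def weights_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB, taking 1 GB as 1e9 bytes."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, gpu: str, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed overhead margin fit in the GPU's VRAM."""
    return weights_gb(params_billions, precision) * overhead <= VRAM_GB[gpu]


print(fits(13, "fp16", "A40"))      # True: about 26 GB of weights
print(fits(30, "fp16", "A40"))      # False: about 60 GB of weights
print(fits(30, "fp16", "H100-80"))  # True
print(fits(70, "fp8", "H100-94"))   # True: about 70 GB of weights
```

This checks capacity only. FP8 is a Hopper feature, so the `fp8` rows are meaningful for the H100 entries rather than the A40.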
Use Cases
The H100 NVL's 1,979 TFLOPS FP16 and 3,350 GB/s bandwidth enable training of massive LLMs with large batch sizes. The A40's 37.4 TFLOPS and 696 GB/s fall short at that scale.
FP8 performance of 3,958 TFLOPS on the H100 NVL delivers high-throughput quantized inference. Its 80-94 GB of VRAM supports loading full models that exceed the A40's 48 GB limit.
The H100 NVL accelerates fine-tuning with 67 TFLOPS FP32 and larger, faster memory, reducing iteration times. The A40 suffices only for small models that fit within its 48 GB of VRAM.
The A40's 48 GB of VRAM and 37.4 TFLOPS FP16 handle image generation efficiently at lower cost, from $0.24 per hour. The H100 NVL is overkill for typical diffusion models.
The H100 NVL's 67 TFLOPS FP32 and NVLink interconnect scale simulations better than the A40's 37.4 TFLOPS. Its bandwidth advantage also helps data-intensive HPC workloads.
Frequently Asked Questions
How much more powerful is H100 NVL than A40 in FP16?
The H100 NVL reaches a peak of 1,979 TFLOPS FP16, about 52.9 times the A40's 37.4 TFLOPS. Training benefits most from this gap. Inference benefits similarly from the H100 NVL's 3,958 TFLOPS FP8.
What is the VRAM difference between A40 and H100 NVL?
The A40 has 48 GB of GDDR6, while the H100 NVL provides 80-94 GB of HBM3. The larger capacity fits bigger models, and bandwidth reaches 3,350 GB/s versus the A40's 696 GB/s.
Which GPU is cheaper in the cloud?
The A40 starts at $0.24 per hour and averages $1.31 per hour across 23 offers. The H100 NVL starts at $1.40 per hour and averages $2.89 per hour across 9 offers. The A40 suits cost-sensitive workloads.
What are the TDP ratings for A40 and H100 NVL?
The A40 is rated at 300W TDP in its PCIe form factor. The H100 is rated at up to 700W; that figure applies to the SXM5 module, and the PCIe and NVL variants are rated lower. The higher power draw accompanies the performance gains.
Is H100 NVL better for LLM training?
Yes. The H100 NVL pairs 1,979 TFLOPS FP16 with 80-94 GB of VRAM for large LLMs, and its 3,350 GB/s bandwidth helps further. The A40's 37.4 TFLOPS limits the scale it can handle.
What interconnects do these GPUs support?
The A40 uses NVLink. The H100 NVL supports NVLink, PCIe 5.0, and InfiniBand, which enables larger multi-GPU clusters than the A40.
Which is cheaper to rent, the A40 or the H100?
Cloud rental prices for both the A40 and H100 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the H100?
The A40 has 48 GB of GDDR6 memory. The H100 has 80 to 94 GB of HBM3 memory.
Can I find A40 and H100 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the H100?
The A40 uses the Ampere architecture (2020) while the H100 uses Hopper (2022). The H100 delivers 52.9x the FP16 throughput and 4.8x the memory bandwidth of the A40.


