Specifications Compared
| Spec | A40 | B200 |
|---|---|---|
| TDP | 300W | 1000W |
| VRAM | 48 GB | 192 GB |
| CUDA Cores | 10,752 | 18,432 |
| Memory Type | GDDR6 | HBM3e |
| Architecture | Ampere | Blackwell |
| Form Factors | PCIe | SXM, NVL |
| Interconnect | NVLink | NVLink, PCIe 6.0, InfiniBand |
| Tensor Cores | 336 | 576 |
| FP16 Performance | 37.4 TFLOPS | 4,500 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 90 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 45 TFLOPS |
| INT8 Performance | 299 TOPS | 9,000 TOPS |
| Memory Bandwidth | 696 GB/s | 8,000 GB/s |
Performance Analysis
The compute gap defines what each card can do: B200 SXM's 4,500 TFLOPS FP16 is roughly 120x A40's 37.4 TFLOPS, which matters most in deep learning training, where half precision dominates. A40 delivers the same 37.4 TFLOPS at FP16 and FP32, which suits single-precision workloads, while B200 SXM's 90 TFLOPS FP32 and 9,000 TFLOPS FP8 favor mixed-precision inference on large models.
Memory bandwidth has the starkest real-world impact: B200 SXM's 8,000 GB/s is about 11.5x A40's 696 GB/s. Combined with four times the VRAM, this supports batch sizes roughly four to ten times larger in training, reducing memory-bound stalls and shortening epochs for LLMs exceeding 70B parameters. A40 handles smaller batches effectively but struggles with memory-bound workloads.
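As a rough illustration of the bandwidth gap, the sketch below estimates the minimum time each card needs to read its entire VRAM once at peak bandwidth. It uses only the VRAM and bandwidth figures from the specifications table; the function name `full_memory_sweep_ms` is made up for this example, and real workloads rarely sustain peak bandwidth, so the results are best read as lower bounds.

```python
# Figures copied from the specifications table (GB and GB/s).
VRAM_GB = {"A40": 48, "B200 SXM": 192}
BANDWIDTH_GB_S = {"A40": 696, "B200 SXM": 8000}


def full_memory_sweep_ms(gpu: str) -> float:
    """Lower-bound time in milliseconds to read all of a GPU's VRAM once.

    Assumes peak memory bandwidth is sustained, which is an idealization.
    """
    return VRAM_GB[gpu] / BANDWIDTH_GB_S[gpu] * 1000.0


for gpu in VRAM_GB:
    print(f"{gpu}: {full_memory_sweep_ms(gpu):.1f} ms per full pass over VRAM")
```

Under these assumptions the B200 SXM sweeps four times as much memory in roughly a third of the time, which is why memory-bound steps benefit so much from it.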
Power draw underscores the trade-offs: A40's 300W TDP fits standard PCIe servers, while B200 SXM's 1000W demands high-density SXM or NVL platforms with advanced cooling. Overall, B200 SXM transforms throughput for AI pipelines, leaving A40 best suited to legacy workloads or lighter inference.
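One way to weigh power draw against throughput is peak FP16 TFLOPS per watt of TDP. The sketch below computes it from the table values; the `SPECS` dictionary and `tflops_per_watt` helper are illustrative names, and TDP is only a proxy for actual power consumption under load.

```python
# Peak FP16 throughput and TDP, copied from the specifications table.
SPECS = {
    "A40": {"fp16_tflops": 37.4, "tdp_w": 300},
    "B200 SXM": {"fp16_tflops": 4500.0, "tdp_w": 1000},
}


def tflops_per_watt(gpu: str) -> float:
    """Peak FP16 TFLOPS divided by TDP in watts (a rough efficiency proxy)."""
    spec = SPECS[gpu]
    return spec["fp16_tflops"] / spec["tdp_w"]


for gpu in SPECS:
    print(f"{gpu}: {tflops_per_watt(gpu):.3f} peak FP16 TFLOPS per watt")
```

By this measure the higher TDP of the B200 SXM buys far more compute per watt, although the absolute 1000W still dictates the cooling and rack requirements described above.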
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA RTX A4000 | 16GB | 0 vCPU, 0GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available |
| Vast.ai | 8×NVIDIA RTX A4000 | 16GB | 80 vCPU, 201GB RAM, 1698GB Storage | United Kingdom | $0.15/GPU/hr ($1.17/hr total, 8×) | Available |
| Hyperstack | 2×NVIDIA RTX A4000 | 16GB | 8 vCPU, 43GB RAM, 200GB Storage | Norway | $0.15/GPU/hr ($0.30/hr total, 2×) | Available |
| Hyperstack | NVIDIA RTX A4000 | 16GB | 4 vCPU, 21GB RAM, 100GB Storage | Norway | $0.15/GPU/hr | Available |
| Vast.ai | 8×NVIDIA RTX A4000 | 16GB | 80 vCPU, 315GB RAM, 2313GB Storage | United Kingdom | $0.16/GPU/hr ($1.28/hr total, 8×) | Available |
B200 SXM
| Provider | GPU Model | VRAM | Host Specs | Region | Price |
|---|---|---|---|---|---|
| Nebius | NVIDIA B200 SXM | 192GB | 20 vCPU, 224GB RAM | Europe | $3.95/GPU/hr |
| Cirrascale | 8×NVIDIA B200 SXM | 192GB | 192 vCPU, 2048GB RAM, 43923GB Storage | United States | $4.79/GPU/hr ($38.32/hr total, 8×) |
| Cirrascale | 8×NVIDIA B200 SXM | 192GB | 192 vCPU, 2048GB RAM, 43923GB Storage | United States | $5.39/GPU/hr ($43.12/hr total, 8×) |
| Cirrascale | 8×NVIDIA B200 SXM | 192GB | 192 vCPU, 2048GB RAM, 43923GB Storage | United States | $5.69/GPU/hr ($45.52/hr total, 8×) |
| RunPod | NVIDIA B200 SXM | 192GB | 28 vCPU, 283GB RAM | North Carolina | $5.89/GPU/hr |
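Multi-GPU listings quote both a per-GPU rate and a node total. The relationship is a simple multiplication, sketched below with the B200 SXM rows from the listings above; `node_price_per_hour` is a hypothetical helper, not part of any provider's API.

```python
def node_price_per_hour(price_per_gpu_hr: float, gpu_count: int) -> float:
    """Total hourly price in USD for a node, rounded to cents."""
    return round(price_per_gpu_hr * gpu_count, 2)


# (provider, per-GPU price in USD/hr, GPU count) copied from the B200 SXM listings.
offers = [
    ("Nebius", 3.95, 1),
    ("Cirrascale", 4.79, 8),
    ("Cirrascale", 5.39, 8),
    ("Cirrascale", 5.69, 8),
    ("RunPod", 5.89, 1),
]

for provider, price, count in offers:
    total = node_price_per_hour(price, count)
    print(f"{provider}: {count}x at ${price:.2f}/GPU/hr -> ${total:.2f}/hr")
```

When comparing offers, the per-GPU rate is the fairer basis, since node totals scale with the number of GPUs bundled into the listing.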
When to Choose the A40
Select the A40 for budget-limited projects that need PCIe compatibility with existing servers. Its 48 GB of GDDR6 VRAM and 696 GB/s of bandwidth suffice for fine-tuning models under 30B parameters or running Stable Diffusion at 512x512 resolution, with pricing from $0.24 per hour across 24 offers.
A40 fits environments limited to a 300W TDP per card, or those that rely on NVLink without needing InfiniBand, such as professional visualization or scientific simulations on moderate datasets.
When to Choose the B200 SXM
Choose B200 SXM for large-scale LLM training or inference demanding 192 GB HBM3e VRAM and 8000 GB/s bandwidth. Its 4500 TFLOPS FP16 handles models over 1T parameters, enabling batch sizes that A40 cannot support.
B200 SXM suits high-performance clusters built around SXM form factors with NVLink, PCIe 6.0, or InfiniBand. Its 9,000 TFLOPS FP8 justifies the cost for efficient serving, even though pricing starts at $1.71 per hour.
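To put the starting prices in context, the sketch below divides each card's starting hourly price by its peak FP16 throughput. It assumes the $0.24 and $1.71 starting prices and the peak figures quoted on this page; `usd_per_tflops_hour` is an illustrative name, and sustained throughput in practice is lower than peak, so this is a comparison of list figures only.

```python
def usd_per_tflops_hour(price_per_hr: float, peak_tflops: float) -> float:
    """Hourly price in USD per peak TFLOPS of compute."""
    return price_per_hr / peak_tflops


# Starting prices and peak FP16 figures quoted on this page.
a40 = usd_per_tflops_hour(0.24, 37.4)
b200 = usd_per_tflops_hour(1.71, 4500.0)

print(f"A40:      ${a40:.5f} per peak FP16 TFLOPS-hour")
print(f"B200 SXM: ${b200:.5f} per peak FP16 TFLOPS-hour")
print(f"A40 costs about {a40 / b200:.1f}x more per peak TFLOPS")
```

On these list figures the B200 SXM is cheaper per unit of peak compute, so the A40's advantage lies in its lower absolute hourly cost for jobs that do not need the extra throughput.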
Use Cases
B200 SXM's 4500 TFLOPS FP16 and 192 GB HBM3e VRAM enable training of models over 1T parameters with large batches. A40's 37.4 TFLOPS and 48 GB limit it to smaller scales.
B200 SXM's 9000 TFLOPS FP8 and 8000 GB/s bandwidth support high-throughput serving of massive models. A40 manages lighter loads but bottlenecks on large batches.
A40's 48 GB VRAM handles fine-tuning of models under 70B parameters cost-effectively at $0.24 per hour. B200 SXM accelerates larger fine-tunes with 192 GB, but at a higher cost of $1.71 per hour.
A40's 37.4 TFLOPS FP16 and 48 GB VRAM generate images at 1024x1024 efficiently for most workflows. B200 SXM is more than this task needs.
A40's 37.4 TFLOPS FP32 and 300W TDP fit PCIe servers for simulations on moderate grids. B200 SXM's 1000W and SXM form factor suit only the most demanding HPC workloads.
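Whether a model fits on either card depends heavily on the numeric precision of its weights. The sketch below estimates weight memory alone as parameters times bytes per parameter and checks it against 48 GB and 192 GB. The precision table and the helper names are assumptions for illustration; activations, optimizer state, and KV cache are ignored, so real requirements are higher.

```python
# Bytes per parameter for a few common weight formats (illustrative assumption).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weights_gb(params_billions: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, vram_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM."""
    return weights_gb(params_billions, precision) <= vram_gb


for size in (30, 70):
    for precision in ("fp16", "fp8", "int4"):
        gb = weights_gb(size, precision)
        print(
            f"{size}B @ {precision}: {gb:.0f} GB | "
            f"A40 (48 GB): {fits(size, precision, 48)} | "
            f"B200 SXM (192 GB): {fits(size, precision, 192)}"
        )
```

Under these assumptions, a 70B model needs quantized weights to fit on a single A40, while the B200 SXM holds it at FP16 with room to spare.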
Frequently Asked Questions
What is the VRAM difference between A40 and B200 SXM?
A40 provides 48 GB GDDR6 VRAM, while B200 SXM offers 192 GB HBM3e. This quadruples capacity for B200 SXM, enabling larger models and batches.
How do FP16 performance levels compare?
A40 delivers 37.4 TFLOPS FP16, contrasted by B200 SXM's 4500 TFLOPS. B200 SXM provides roughly 120x faster half-precision compute for AI training.
What are the current cloud pricing ranges?
A40 starts at $0.24 per hour and averages $1.28 per hour across 24 offers. B200 SXM starts at $1.71 per hour and averages $4.60 per hour across 13 offers.
Which has higher memory bandwidth?
B200 SXM achieves 8000 GB/s, over 11x A40's 696 GB/s. This boosts B200 SXM for memory-intensive tasks like large-batch training.
What are the TDP and form factor differences?
A40 uses 300W in PCIe form, suiting standard servers. B200 SXM requires 1000W in SXM or NVL, needing specialized high-power racks.
Does B200 SXM support FP8?
Yes. B200 SXM reaches 9,000 TFLOPS FP8 for efficient inference. A40 has no listed FP8 figure and relies on FP16 at 37.4 TFLOPS.
Which is cheaper to rent, the A40 or the B200?
Cloud rental prices for both the A40 and B200 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the B200?
The A40 has 48 GB of GDDR6 memory. The B200 has 192 GB of HBM3e memory.
Can I find A40 and B200 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the B200?
The A40 uses the Ampere architecture (2020) while the B200 uses Blackwell (2024). The B200 delivers 120.3x the FP16 throughput and 11.5x the memory bandwidth of the A40.
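The headline ratios quoted on this page follow directly from the specifications table. A minimal sketch of the arithmetic is shown below; the `ratio` helper is an illustrative name, and the inputs are the table's peak figures.

```python
def ratio(b200_value: float, a40_value: float) -> float:
    """How many times larger the B200 figure is than the A40 figure."""
    return b200_value / a40_value


# Peak figures copied from the specifications table.
fp16 = ratio(4500.0, 37.4)      # FP16 TFLOPS
bandwidth = ratio(8000.0, 696)  # memory bandwidth in GB/s
vram = ratio(192, 48)           # VRAM in GB

print(f"FP16 throughput:  {fp16:.1f}x")
print(f"Memory bandwidth: {bandwidth:.1f}x")
print(f"VRAM capacity:    {vram:.1f}x")
```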



