Specifications Compared
| Spec | A40 | B200 |
|---|---|---|
| TDP | 300W | 1000W |
| VRAM | 48 GB | 192 GB |
| CUDA Cores | 10,752 | 18,432 |
| Memory Type | GDDR6 | HBM3e |
| Architecture | Ampere | Blackwell |
| Form Factors | PCIe | SXM, NVL |
| Interconnect | NVLink | NVLink, PCIe 6.0, InfiniBand |
| Tensor Cores | 336 | 576 |
| FP16 Performance | 37.4 TFLOPS | 4,500 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 90 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 45 TFLOPS |
| INT8 Performance | 299 TOPS | 9,000 TOPS |
| Memory Bandwidth | 696 GB/s | 8,000 GB/s |
Performance Analysis
The B200 NVL vastly outpaces the A40 in peak compute throughput: its 4,500 TFLOPS FP16 rating is about 120 times the A40's 37.4 TFLOPS, which on paper shortens large language model training by a similar factor, although real speedups depend on how well a workload uses the hardware. For FP32 workloads, the B200 NVL achieves 90 TFLOPS against the A40's 37.4 TFLOPS (the A40's FP16 and FP32 ratings are identical), benefiting precision-sensitive simulations. The large gap between the B200 NVL's FP16 and FP32 rates rewards mixed-precision training, which reduces memory use and shortens training time in deep learning pipelines.
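The ratios quoted in this comparison can be reproduced directly from the specification table. The sketch below is only arithmetic on those peak ratings; the dictionary layout and the `ratio` helper are illustrative names, not part of any vendor tool, and peak ratings are not measured benchmark results.

```python
# Peak figures copied from the specification table (vendor ratings, not benchmarks).
SPECS = {
    "A40": {"fp16_tflops": 37.4, "fp32_tflops": 37.4,
            "int8_tops": 299.0, "bandwidth_gbs": 696.0},
    "B200": {"fp16_tflops": 4500.0, "fp32_tflops": 90.0,
             "int8_tops": 9000.0, "bandwidth_gbs": 8000.0},
}


def ratio(metric: str) -> float:
    """Return the B200-to-A40 ratio for one metric in SPECS."""
    return SPECS["B200"][metric] / SPECS["A40"][metric]


for metric in SPECS["A40"]:
    print(f"{metric}: {ratio(metric):.1f}x")
```

The FP16 and bandwidth ratios come out at roughly 120x and 11.5x, while the FP32 ratio is only about 2.4x, which is why the advantage is largest for reduced-precision AI workloads.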
Memory specifications define real-world scalability: the B200 NVL's 192 GB of HBM3e at 8,000 GB/s holds far larger models and batches on a single GPU than the A40's 48 GB of GDDR6 at 696 GB/s, so fewer workloads have to spill to host memory or be split across cards, and it serves as a building block for multi-GPU training of trillion-parameter models. Larger batches on the B200 NVL raise training and inference throughput. However, the B200 NVL's 1000W TDP demands robust cooling, in contrast to the A40's 300W draw.
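A rough way to check whether a model fits in VRAM is to multiply its parameter count by the bytes per parameter. The sketch below counts weights only; activations, optimizer state, and KV cache add to the total, so it is a lower bound. The function names and the example model sizes are illustrative assumptions.

```python
# Standard storage widths in bytes per parameter.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weights_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(num_params: float, dtype: str, vram_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM."""
    return weights_gb(num_params, dtype) <= vram_gb


# Hypothetical model sizes chosen for illustration.
for label, n in [("7B", 7e9), ("70B", 70e9), ("1T", 1e12)]:
    need = weights_gb(n, "fp16")
    print(f"{label}: {need:,.0f} GB fp16 | "
          f"A40 (48 GB): {fits(n, 'fp16', 48)} | "
          f"B200 (192 GB): {fits(n, 'fp16', 192)}")
```

Under these assumptions a 70B-parameter model in FP16 (140 GB of weights) fits on one B200 but not on one A40, while a trillion-parameter model (2,000 GB of weights) exceeds a single card of either type and has to be sharded across multiple GPUs.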
Inference benefits from the B200 NVL's 9000 TFLOPS FP8 performance, enabling high-throughput serving of quantized models far beyond the A40's capabilities.
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA RTX A4000 16GB VRAM | 16GB | 0 vCPU, 0GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available |
| Vast.ai | 8×NVIDIA RTX A4000 16GB VRAM | 16GB | 80 vCPU, 201GB RAM, 1698GB Storage | United Kingdom | $0.15/GPU/hr ($1.17/hr total, 8×) | Available |
| Hyperstack | 4×NVIDIA RTX A4000 16GB VRAM | 16GB | 16 vCPU, 86GB RAM, 500GB Storage | Norway | $0.15/GPU/hr ($0.60/hr total, 4×) | Available |
| Hyperstack | NVIDIA RTX A4000 16GB VRAM | 16GB | 4 vCPU, 21GB RAM, 100GB Storage | Norway | $0.15/GPU/hr | Available |
| Hyperstack | 2×NVIDIA RTX A4000 16GB VRAM | 16GB | 8 vCPU, 43GB RAM, 200GB Storage | Norway | $0.15/GPU/hr ($0.30/hr total, 2×) | Available |
B200 NVL
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Nebius | NVIDIA B200 SXM 192GB VRAM | 192GB | 20 vCPU, 224GB RAM | Europe | $3.95/GPU/hr | |
| Cirrascale | 8×NVIDIA B200 SXM 192GB VRAM | 192GB | 192 vCPU, 2048GB RAM, 43923GB Storage | United States | $4.79/GPU/hr ($38.32/hr total, 8×) | |
| Cirrascale | 8×NVIDIA B200 SXM 192GB VRAM | 192GB | 192 vCPU, 2048GB RAM, 43923GB Storage | United States | $5.39/GPU/hr ($43.12/hr total, 8×) | |
| Cirrascale | 8×NVIDIA B200 SXM 192GB VRAM | 192GB | 192 vCPU, 2048GB RAM, 43923GB Storage | United States | $5.69/GPU/hr ($45.52/hr total, 8×) | |
| RunPod | NVIDIA B200 SXM 192GB VRAM | 192GB | 28 vCPU, 283GB RAM | California | $5.89/GPU/hr | |
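The per-GPU and per-instance prices in the B200 table are related by a simple multiplication. The sketch below recomputes the instance totals from the per-GPU rates; the `OFFERS` list and `instance_hourly` helper are illustrative, and rows that show no multiplier are assumed to be single-GPU instances.

```python
# Offers copied from the B200 table: (provider, USD per GPU-hour, GPU count).
# Assumption: rows without an "8x" style multiplier are single-GPU instances.
OFFERS = [
    ("Nebius", 3.95, 1),
    ("Cirrascale", 4.79, 8),
    ("Cirrascale", 5.39, 8),
    ("Cirrascale", 5.69, 8),
    ("RunPod", 5.89, 1),
]


def instance_hourly(per_gpu_hr: float, gpu_count: int) -> float:
    """Hourly cost of the whole instance, rounded to cents."""
    return round(per_gpu_hr * gpu_count, 2)


# Sort by per-GPU price so the cheapest rate comes first.
for provider, rate, count in sorted(OFFERS, key=lambda offer: offer[1]):
    total = instance_hourly(rate, count)
    print(f"{provider}: ${rate:.2f}/GPU/hr x {count} = ${total:.2f}/hr")
```

Comparing on the per-GPU rate rather than the instance total keeps single-GPU and 8-GPU offers on the same footing.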
When to Choose the A40
The A40 excels in cost-sensitive environments with moderate demands. Pricing from $0.24 per hour across 24 offers and a 300W TDP make it well suited to small-to-medium inference deployments or legacy Ampere-optimized codebases. Its PCIe form factor simplifies integration into existing data centers without NVLink complexity.
Choose the A40 for Stable Diffusion generation or fine-tuning models that fit in 48 GB of VRAM, where 37.4 TFLOPS FP16 and 696 GB/s of bandwidth suffice and a larger GPU would be overprovisioned.
When to Choose the B200 NVL
The B200 NVL suits frontier AI research and production-scale training. With 192 GB HBM3e VRAM, it loads massive models intact, and 4500 TFLOPS FP16 accelerates iterations on datasets exceeding A40 limits. NVLink, PCIe 6.0, and InfiniBand interconnects enable multi-GPU clusters for distributed workloads.
Opt for the B200 NVL in high-volume LLM inference, leveraging 9000 TFLOPS FP8 for quantized serving at scales unattainable by the A40.
Use Cases
The B200 NVL's 4,500 TFLOPS FP16, 192 GB of HBM3e VRAM, and 8,000 GB/s bandwidth enable training of trillion-parameter models with large batch sizes. The A40's 37.4 TFLOPS and 48 GB of GDDR6 limit it to smaller scales.
The B200 NVL's 9,000 TFLOPS FP8 supports high-throughput quantized inference for massive models that fit in 192 GB of VRAM. The A40 cannot match this scale with 48 GB and lower compute.
The A40 handles fine-tuning jobs that fit in 48 GB of VRAM, with 37.4 TFLOPS FP16 and a starting price of $0.24 per hour. The B200 NVL accelerates larger adapters with 4,500 TFLOPS, but at $10.50 per hour.
The A40's 48 GB of GDDR6 and 37.4 TFLOPS FP16 suffice for image generation pipelines at a low average of $1.28 per hour. The B200 NVL is overkill for typical resolutions.
The B200 NVL's 90 TFLOPS FP32 and 8,000 GB/s bandwidth speed up simulations with large datasets. The A40's 37.4 TFLOPS FP32 suits smaller HPC tasks.
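Hourly price alone does not show value for compute-bound work; dividing it by peak throughput gives a rough cost per TFLOP-hour. The sketch below uses the prices and FP16 ratings quoted in this section. The helper name is illustrative, and the result assumes a workload that can actually saturate each GPU, which smaller jobs often cannot.

```python
def usd_per_tflop_hour(price_per_hour: float, peak_tflops: float) -> float:
    """Rental cost divided by peak throughput (USD per TFLOP-hour)."""
    return price_per_hour / peak_tflops


# (label, USD per hour as quoted in this section, peak FP16 TFLOPS from the spec table)
CASES = [
    ("A40 at starting price", 0.24, 37.4),
    ("A40 at average price", 1.28, 37.4),
    ("B200 NVL at average price", 10.50, 4500.0),
]

for label, price, tflops in CASES:
    cost = usd_per_tflop_hour(price, tflops)
    print(f"{label}: ${cost:.4f} per peak FP16 TFLOP-hour")
```

By this measure the B200 NVL costs less per unit of peak FP16 compute, while the A40 costs far less per hour. The A40 therefore remains the cheaper choice whenever a job fits in 48 GB and cannot keep a B200 busy.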
Frequently Asked Questions
What is the VRAM difference between A40 and B200 NVL?
The A40 provides 48 GB GDDR6 VRAM, while the B200 NVL offers 192 GB HBM3e. This quadruples capacity for larger models on the B200 NVL. Memory bandwidth reaches 8000 GB/s on B200 NVL versus 696 GB/s on A40.
How do A40 and B200 NVL compare in FP16 performance?
The A40 delivers 37.4 TFLOPS FP16, while the B200 NVL achieves 4,500 TFLOPS, about 120 times the peak throughput for AI training. FP32 is 37.4 TFLOPS on the A40 versus 90 TFLOPS on the B200 NVL.
What are the cloud pricing details for these GPUs?
A40 pricing starts at $0.24 per hour and averages $1.28 per hour across 24 offers. The B200 NVL averages $10.50 per hour, based on a single offer. The A40 provides better value for lighter workloads.
Which GPU has higher power consumption?
The B200 NVL has a TDP of 1000W, compared to the A40's 300W. The A40's lower per-card draw makes it easier to deploy in power-constrained racks, while the B200 NVL requires advanced cooling infrastructure.
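Power draw per card and throughput per watt are different questions. The sketch below divides the peak ratings from the specification table by TDP; it assumes each card runs at its rated TDP while delivering peak throughput, which is an idealization, and the helper name is illustrative.

```python
# Peak ratings and TDP from the specification table.
GPUS = {
    "A40": {"tdp_w": 300.0, "fp16_tflops": 37.4, "fp32_tflops": 37.4},
    "B200": {"tdp_w": 1000.0, "fp16_tflops": 4500.0, "fp32_tflops": 90.0},
}


def tflops_per_watt(gpu: str, metric: str) -> float:
    """Peak throughput divided by TDP for one GPU and one precision."""
    return GPUS[gpu][metric] / GPUS[gpu]["tdp_w"]


for metric in ("fp16_tflops", "fp32_tflops"):
    for gpu in GPUS:
        print(f"{gpu} {metric}: {tflops_per_watt(gpu, metric):.3f} TFLOPS/W")
```

On these figures the B200 delivers far more FP16 throughput per watt, while the A40 is slightly ahead per watt in FP32, so which card is more efficient depends on the precision a workload uses.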
Is B200 NVL better for LLM training than A40?
Yes, the B200 NVL excels with 4,500 TFLOPS FP16 and 192 GB of VRAM for large models. The A40's 37.4 TFLOPS limits it to smaller training runs. The B200 NVL's 8,000 GB/s bandwidth supports bigger batches.
What interconnects do these GPUs support?
The A40 supports NVLink and ships in a PCIe form factor. The B200 NVL supports NVLink, PCIe 6.0, and InfiniBand, and comes in SXM and NVL form factors. This enables superior multi-GPU scaling on the B200 NVL.
Which is cheaper to rent, the A40 or the B200?
Cloud rental prices for both the A40 and B200 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the B200?
The A40 has 48 GB of GDDR6 memory. The B200 has 192 GB of HBM3e memory.
Can I find A40 and B200 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the B200?
The A40 uses the Ampere architecture (2020) while the B200 uses Blackwell (2024). The B200 delivers 120.3x the FP16 throughput and 11.5x the memory bandwidth of the A40.



