Specifications Compared
| Spec | A40 | RTX-4090 |
|---|---|---|
| TDP | 300W | 450W |
| VRAM | 48 GB | 24 GB |
| CUDA Cores | 10,752 | 16,384 |
| Memory Type | GDDR6 | GDDR6X |
| Architecture | Ampere | Ada Lovelace |
| Form Factors | PCIe | PCIe |
| Interconnect | NVLink | PCIe 4.0 |
| Tensor Cores | 336 | 512 |
| FP16 Performance | 37.4 TFLOPS | 165 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 82.6 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 1.3 TFLOPS |
| INT8 Performance | 299 TOPS | 660 TOPS |
| Memory Bandwidth | 696 GB/s | 1,008 GB/s |
Performance Analysis
Raw compute power favors the RTX 4090 decisively: its 165 TFLOPS FP16 performance quadruples the A40's 37.4 TFLOPS, accelerating mixed-precision training and inference in deep learning models. The FP32 rate of 82.6 TFLOPS on RTX 4090 doubles A40's 37.4 TFLOPS, benefiting simulations and graphics rendering that rely on single-precision floats. FP8 capability at 660 TFLOPS on RTX 4090 enables ultra-efficient inference for large language models, a feature absent on A40. Higher memory bandwidth of 1008 GB/s versus 696 GB/s on RTX 4090 supports larger batch sizes in training, reducing overhead in data-parallel workloads. However, A40's 48 GB VRAM doubles RTX 4090's 24 GB, allowing bigger models or datasets without splitting across GPUs, critical for memory-bound tasks like fine-tuning massive transformers. TDP differences matter too: A40's 300W suits dense clusters better than RTX 4090's 450W, which demands robust cooling. In real-world AI pipelines, RTX 4090 excels in speed-sensitive inference, while A40 handles capacity-intensive training.
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
![]() TensorDock | NVIDIA RTX A4000 16GB VRAM | 16GB | 0 vCPU 0GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available | ||
![]() Vast.ai | 8×NVIDIA RTX A4000 16GB VRAM | 16GB | 80 vCPU 201GB RAM 1698GB Storage | United Kingdom | $0.15/GPU/hr $1.17/hr total (8×) | Available | ||
![]() Hyperstack | 4×NVIDIA RTX A4000 16GB VRAM | 16GB | 16 vCPU 86GB RAM 500GB Storage | Norway | $0.15/GPU/hr $0.60/hr total (4×) | Available | ||
![]() Hyperstack | NVIDIA RTX A4000 16GB VRAM | 16GB | 4 vCPU 21GB RAM 100GB Storage | Norway | $0.15/GPU/hr | Available | ||
![]() Hyperstack | 2×NVIDIA RTX A4000 16GB VRAM | 16GB | 8 vCPU 43GB RAM 200GB Storage | Norway | $0.15/GPU/hr $0.30/hr total (2×) | Available |
RTX 4090
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status | Action | |
|---|---|---|---|---|---|---|---|---|
![]() TensorDock | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 0 vCPU 0GB RAM | Chubbuck, Idaho | $0.39/GPU/hr | Available | ||
![]() Vast.ai | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 64 vCPU 101GB RAM 140GB Storage | Iceland | $0.44/GPU/hr | Available | ||
![]() Vast.ai | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 32 vCPU 88GB RAM 106GB Storage | Iceland | $0.47/GPU/hr | Available | ||
![]() TensorDock | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 0 vCPU 0GB RAM | Orlando, Florida | $0.48/GPU/hr | Available | ||
![]() Vast.ai | NVIDIA GeForce RTX 4090 24GB VRAM | 24GB | 32 vCPU 101GB RAM 108GB Storage | Iceland | $0.53/GPU/hr | Available |
When to Choose the A40
Choose the A40 when VRAM capacity is paramount: its 48 GB GDDR6 supports loading full large language models that exceed 24 GB, avoiding model parallelism overhead. NVLink interconnect enables efficient multi-GPU scaling for enterprise training jobs, unlike PCIe 4.0 on RTX 4090. At 300W TDP, it fits power-constrained data centers better than 450W alternatives. Cloud users prioritizing stability over peak speed select A40 for scientific computing with high memory demands.
When to Choose the RTX 4090
Opt for RTX 4090 in compute-bound scenarios: 165 TFLOPS FP16 and 660 TFLOPS FP8 deliver superior inference throughput for real-time applications. Its 1008 GB/s bandwidth handles large batches efficiently, ideal for Stable Diffusion or fine-tuning. Lower pricing from $0.16 per hour average $0.48 across 97 offers provides better value than A40's $1.26 average. High availability and Ada Lovelace efficiency make it preferable for cost-sensitive prototyping.
Use Cases
A40's 48 GB VRAM accommodates larger models without sharding, unlike RTX 4090's 24 GB limit. NVLink supports multi-GPU scaling critical for extended training runs.
RTX 4090's 660 TFLOPS FP8 and 165 TFLOPS FP16 enable higher throughput for serving requests. Bandwidth of 1008 GB/s handles bigger batches than A40's 696 GB/s.
RTX 4090 offers faster 82.6 TFLOPS FP32 for quicker iterations, but A40's 48 GB VRAM suits memory-heavy adapters. Choice depends on model size versus speed needs.
RTX 4090's 165 TFLOPS FP16 and 1008 GB/s bandwidth generate images faster than A40's 37.4 TFLOPS and 696 GB/s. Lower $0.48 per hour average enhances accessibility.
A40's 48 GB VRAM and NVLink excel in memory-intensive simulations across multiple GPUs. 300W TDP fits cluster environments better than 450W.
Frequently Asked Questions
Which GPU has more VRAM, A40 or RTX 4090?▾
The A40 provides 48 GB GDDR6 VRAM, double the RTX 4090's 24 GB GDDR6X. This makes A40 suitable for larger models in training. RTX 4090 compensates with higher bandwidth at 1008 GB/s versus 696 GB/s.
How do A40 and RTX 4090 compare in cloud pricing?▾
RTX 4090 starts at $0.16 per hour with $0.48 average across 97 offers, cheaper than A40's $0.24 start and $1.26 average over 23 offers. More RTX 4090 instances ensure better availability. Price favors RTX 4090 for budget workloads.
What is the FP16 performance difference between A40 and RTX 4090?▾
RTX 4090 achieves 165 TFLOPS FP16, over four times A40's 37.4 TFLOPS. This boosts AI inference speed significantly. FP32 on RTX 4090 at 82.6 TFLOPS also doubles A40's 37.4 TFLOPS.
Does A40 or RTX 4090 have higher memory bandwidth?▾
RTX 4090 leads with 1008 GB/s bandwidth compared to A40's 696 GB/s. Higher bandwidth supports larger batch sizes in training. A40 counters with double the VRAM at 48 GB.
What are the TDP ratings for A40 and RTX 4090?▾
A40 consumes 300W TDP, lower than RTX 4090's 450W. Lower TDP aids dense deployments. RTX 4090's higher power enables its 165 TFLOPS FP16 peak.
Can A40 and RTX 4090 both use NVLink?▾
A40 supports NVLink for multi-GPU communication, while RTX 4090 relies on PCIe 4.0. NVLink benefits large-scale training on A40. PCIe suffices for most single-node RTX 4090 tasks.
Which is cheaper to rent, the A40 or the RTX 4090?▾
Cloud rental prices for both the A40 and RTX 4090 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the RTX 4090?▾
The A40 has 48 GB of GDDR6 memory. The RTX 4090 has 24 GB of GDDR6X memory.
Can I find A40 and RTX 4090 GPUs available to rent right now?▾
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the RTX 4090?▾
The A40 uses the Ampere architecture (2020) while the RTX 4090 uses Ada Lovelace (2022). The RTX 4090 delivers 4.4x the FP16 throughput and 1.4x the memory bandwidth of the A40.


