Specifications Compared
| Spec | A40 | RTX 5090 |
|---|---|---|
| TDP | 300W | 575W |
| VRAM | 48 GB | 32 GB |
| CUDA Cores | 10,752 | 21,760 |
| Memory Type | GDDR6 | GDDR7 |
| Architecture | Ampere | Blackwell |
| Form Factors | PCIe | PCIe |
| Interconnect | NVLink | PCIe 5.0 |
| Tensor Cores | 336 | 680 |
| FP16 Performance | 37.4 TFLOPS | 419 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 105 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 1.6 TFLOPS |
| INT8 Performance | 299 TOPS | 838 TOPS |
| Memory Bandwidth | 696 GB/s | 1,792 GB/s |
Performance Analysis
The RTX 5090 far outpaces the A40 in compute throughput: FP16 reaches 419 TFLOPS compared to 37.4 TFLOPS, enabling faster model training and inference in deep learning pipelines. FP32 performance hits 105 TFLOPS on the RTX 5090 against 37.4 TFLOPS on the A40, benefiting scientific simulations and rendering. The roughly 4x gap between FP16 and FP32 on the RTX 5090 reflects tensor-core acceleration aimed at AI workloads, while the A40's identical FP16 and FP32 figures (37.4 TFLOPS each) suit general-purpose compute.
Memory bandwidth of 1792 GB/s on the RTX 5090 supports larger batch sizes in training, reducing iteration times versus the A40's 696 GB/s. This gap proves critical for data-intensive workloads like large language models, where high throughput minimizes bottlenecks. However, the A40's 48 GB VRAM exceeds the RTX 5090's 32 GB, accommodating bigger models without splitting across GPUs.
Power draw reflects these capabilities: the RTX 5090 demands 575W TDP against the A40's 300W, influencing cluster density and cooling needs.
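The ratios behind this analysis can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it copies the peak figures from the specification table, and the names `SPECS`, `ratio`, and `per_watt` are hypothetical helpers, not part of any vendor tool. Peak numbers are theoretical maxima, so sustained results on real workloads will likely differ.

```python
# Peak figures copied from the specification table above.
SPECS = {
    "A40": {"fp16_tflops": 37.4, "fp32_tflops": 37.4,
            "bandwidth_gbs": 696.0, "tdp_w": 300.0},
    "RTX 5090": {"fp16_tflops": 419.0, "fp32_tflops": 105.0,
                 "bandwidth_gbs": 1792.0, "tdp_w": 575.0},
}


def ratio(metric: str) -> float:
    """RTX 5090 value divided by the A40 value for one metric."""
    return SPECS["RTX 5090"][metric] / SPECS["A40"][metric]


def per_watt(gpu: str, metric: str = "fp16_tflops") -> float:
    """Peak throughput per watt of TDP (a rough efficiency proxy)."""
    return SPECS[gpu][metric] / SPECS[gpu]["tdp_w"]


if __name__ == "__main__":
    for metric in ("fp16_tflops", "fp32_tflops", "bandwidth_gbs", "tdp_w"):
        print(f"{metric}: RTX 5090 / A40 = {ratio(metric):.1f}x")
    for gpu in SPECS:
        print(f"{gpu}: {per_watt(gpu):.3f} FP16 TFLOPS per watt")
```

On these figures the RTX 5090 draws about 1.9x the power for about 11.2x the peak FP16 throughput, so its peak FP16 efficiency per watt is higher despite the larger TDP.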
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA RTX A4000 | 16GB | 0 vCPU, 0GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available |
| Vast.ai | 8× NVIDIA RTX A4000 | 16GB | 80 vCPU, 201GB RAM, 1698GB storage | United Kingdom | $0.15/GPU/hr ($1.17/hr total, 8×) | Available |
| Hyperstack | 4× NVIDIA RTX A4000 | 16GB | 16 vCPU, 86GB RAM, 500GB storage | Norway | $0.15/GPU/hr ($0.60/hr total, 4×) | Available |
| Hyperstack | NVIDIA RTX A4000 | 16GB | 4 vCPU, 21GB RAM, 100GB storage | Norway | $0.15/GPU/hr | Available |
| Hyperstack | 2× NVIDIA RTX A4000 | 16GB | 8 vCPU, 43GB RAM, 200GB storage | Norway | $0.15/GPU/hr ($0.30/hr total, 2×) | Available |
RTX 5090
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA GeForce RTX 5090 | 32GB | 0 vCPU, 0GB RAM | Chubbuck, Idaho | $0.57/GPU/hr | Available |
| Vast.ai | NVIDIA GeForce RTX 5090 | 32GB | 384 vCPU, 94GB RAM, 570GB storage | Czechia | $0.81/GPU/hr | Available |
| Vast.ai | NVIDIA GeForce RTX 5090 | 32GB | 8 vCPU, 30GB RAM, 489GB storage | South Korea | $0.87/GPU/hr | Available |
| Vast.ai | NVIDIA GeForce RTX 5090 | 32GB | 16 vCPU, 30GB RAM, 583GB storage | South Korea | $0.87/GPU/hr | Available |
| Vast.ai | NVIDIA GeForce RTX 5090 | 32GB | 16 vCPU, 30GB RAM, 495GB storage | South Korea | $0.91/GPU/hr | Available |
When to Choose the A40
The A40 excels in scenarios requiring high VRAM capacity. With 48 GB GDDR6, it handles large models like those exceeding 32 GB without multi-GPU complexity, ideal for memory-bound LLM fine-tuning or scientific computing datasets.
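A back-of-the-envelope check shows where the 48 GB versus 32 GB difference starts to matter. This is a minimal sketch under stated assumptions: it counts weights only at a fixed number of bytes per parameter, and the 20% `overhead_fraction` for activations and runtime buffers is an illustrative guess rather than a measured value. The helper names are hypothetical. Fine-tuning typically needs considerably more memory than this, because gradients and optimizer state are not counted.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fits(params_billions: float, vram_gb: float, dtype: str = "fp16",
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus the assumed overhead fit in vram_gb."""
    needed = weights_gb(params_billions, dtype) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    for size in (7, 13, 18, 30):
        print(f"{size}B params in FP16: "
              f"A40 (48 GB) fits={fits(size, 48)}, "
              f"RTX 5090 (32 GB) fits={fits(size, 32)}")
```

Under these assumptions, a model around 18 billion parameters in FP16 is the kind of case that fits on a single A40 but not on a single RTX 5090.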
NVLink interconnect enables efficient multi-GPU setups, and its 300W TDP allows denser deployments. With 23 cloud offers averaging $1.26 per hour, it suits enterprise workloads that prioritize stability over peak speed.
When to Choose the RTX 5090
The RTX 5090 dominates high-throughput inference and training tasks. Its 419 TFLOPS FP16 and 838 TFLOPS FP8 deliver rapid processing for real-time AI serving, far surpassing the A40's 37.4 TFLOPS FP16.
Its 1792 GB/s bandwidth supports massive batch sizes, and lower cloud pricing, starting at $0.16 per hour and averaging $0.74 across 16 offers, provides cost savings. The Blackwell architecture is optimized for modern AI pipelines, making the RTX 5090 preferable for performance-critical applications.
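One way to weigh price against speed is to divide the average hourly rate by peak throughput. The sketch below uses the average prices quoted on this page ($1.26 for the A40, $0.74 for the RTX 5090) and the FP16 figures from the specification table; `usd_per_tflops_hour` is a hypothetical helper. It assumes peak throughput is actually reached, which real workloads rarely manage, and live prices change, so treat the result as a rough guide.

```python
# Average hourly prices and peak FP16 figures as quoted on this page.
OFFERS = {
    "A40": {"avg_usd_per_hr": 1.26, "fp16_tflops": 37.4},
    "RTX 5090": {"avg_usd_per_hr": 0.74, "fp16_tflops": 419.0},
}


def usd_per_tflops_hour(gpu: str) -> float:
    """Average hourly price divided by peak FP16 TFLOPS."""
    offer = OFFERS[gpu]
    return offer["avg_usd_per_hr"] / offer["fp16_tflops"]


if __name__ == "__main__":
    for gpu in OFFERS:
        print(f"{gpu}: ${usd_per_tflops_hour(gpu):.4f} per peak FP16 TFLOPS-hour")
```

This metric ignores VRAM capacity, so it says nothing about whether a given model fits on the cheaper card.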
Use Cases
The A40's 48 GB VRAM supports larger models without fragmentation, critical for training massive LLMs. Its NVLink aids multi-GPU scaling at lower 300W TDP.
RTX 5090's 838 TFLOPS FP8 and 419 TFLOPS FP16 enable ultra-fast serving. High 1792 GB/s bandwidth handles high-concurrency requests efficiently.
A40's 48 GB VRAM fits memory-heavy fine-tuning; RTX 5090's 419 TFLOPS FP16 accelerates iterations. Choice depends on model size versus speed needs.
RTX 5090's 105 TFLOPS FP32 and 1792 GB/s bandwidth speed up image generation pipelines. Its lower $0.74 average hourly cost keeps creative workflows affordable.
RTX 5090's 105 TFLOPS FP32 outperforms A40's 37.4 TFLOPS for simulations. PCIe 5.0 supports fast data transfers in HPC environments.
Frequently Asked Questions
Which GPU has more VRAM, A40 or RTX 5090?
The A40 provides 48 GB GDDR6 VRAM, exceeding the RTX 5090's 32 GB GDDR7. This makes the A40 better for memory-intensive tasks. Bandwidth favors the RTX 5090 at 1792 GB/s over 696 GB/s.
How do A40 and RTX 5090 compare in cloud pricing?
RTX 5090 starts at $0.16 per hour with average $0.74 across 16 offers. A40 begins at $0.24 per hour averaging $1.26 across 23 offers. The RTX 5090 offers better value for high-performance needs.
What is the FP16 performance difference between A40 and RTX 5090?
RTX 5090 achieves 419 TFLOPS FP16, over 11 times the A40's 37.4 TFLOPS. This gap accelerates AI training and inference. FP32 on RTX 5090 is 105 TFLOPS versus 37.4 TFLOPS.
Is the RTX 5090 or A40 better for multi-GPU setups?
A40 supports NVLink for high-speed multi-GPU communication. RTX 5090 relies on PCIe 5.0, suitable for fewer GPUs. A40's lower 300W TDP aids dense clusters.
Which has higher power consumption, A40 or RTX 5090?
RTX 5090 draws 575W TDP, nearly double the A40's 300W. This impacts cooling and density in cloud instances. Performance justifies the increase for speed-focused tasks.
What architectures do A40 and RTX 5090 use?
A40 uses Ampere from 2020; RTX 5090 uses Blackwell from 2025. Blackwell adds FP8 support at 838 TFLOPS, which the A40 lacks. The newer design improves efficiency in AI workloads.
Which is cheaper to rent, the A40 or the RTX 5090?
Cloud rental prices for both the A40 and RTX 5090 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the RTX 5090?
The A40 has 48 GB of GDDR6 memory. The RTX 5090 has 32 GB of GDDR7 memory.
Can I find A40 and RTX 5090 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the RTX 5090?
The A40 uses the Ampere architecture (2020) while the RTX 5090 uses Blackwell (2025). The RTX 5090 delivers 11.2x the FP16 throughput and 2.6x the memory bandwidth of the A40.


