Specifications Compared
| Spec | A40 | L40S |
|---|---|---|
| TDP | 300W | 350W |
| VRAM | 48 GB | 48 GB |
| CUDA Cores | 10,752 | 18,176 |
| Memory Type | GDDR6 | GDDR6 |
| Architecture | Ampere | Ada Lovelace |
| Form Factors | PCIe | PCIe |
| Interconnect | PCIe 4.0, NVLink | PCIe 4.0 |
| Tensor Cores | 336 | 568 |
| FP16 Performance | 37.4 TFLOPS | 362 TFLOPS |
| FP32 Performance | 37.4 TFLOPS | 91 TFLOPS |
| FP64 Performance | 0.6 TFLOPS | 1.4 TFLOPS |
| INT8 Performance | 299 TOPS | 724 TOPS |
| Memory Bandwidth | 696 GB/s | 864 GB/s |
Performance Analysis
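On the listed peak figures the L40S leads clearly: 362 TFLOPS FP16 versus 37.4 TFLOPS for the A40, a ratio of about 9.7. Treat that ratio with care. The A40 figure equals its FP32 rate, which suggests a non-Tensor-Core number, while the L40S figure is a Tensor Core rate, so the two are not like-for-like, and real training speedups depend on the model and usually fall well below the peak ratio. FP32 at 91 TFLOPS on the L40S is about 2.4 times the A40's 37.4 TFLOPS, which helps precision-sensitive simulations. The L40S also supports FP8 at roughly 724 TFLOPS (the same rate as its INT8 figure in the table), which suits quantized inference; the A40's Ampere Tensor Cores have no FP8 mode.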
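The ratios quoted in this section follow directly from the specifications table. A minimal Python sketch of that arithmetic, assuming the table values are taken at face value as peak figures (the `SPECS` dictionary and `ratio` helper are illustrative names, not part of any tool behind this page):

```python
# Peak figures copied from the specifications table: (A40, L40S).
SPECS = {
    "FP16 (TFLOPS)": (37.4, 362.0),
    "FP32 (TFLOPS)": (37.4, 91.0),
    "INT8 (TOPS)": (299.0, 724.0),
    "Memory bandwidth (GB/s)": (696.0, 864.0),
}


def ratio(a40: float, l40s: float) -> float:
    """Return how many times larger the L40S figure is than the A40 figure."""
    return l40s / a40


# Round to two decimals so the values can be compared with the prose.
ratios = {name: round(ratio(a40, l40s), 2) for name, (a40, l40s) in SPECS.items()}

for name, value in ratios.items():
    print(f"{name}: L40S is {value}x the A40")
```

These are ratios of peak numbers only; they say nothing about measured throughput on a particular model.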
Memory bandwidth of 864 GB/s on the L40S exceeds the A40's 696 GB/s by about 24 percent. Both cards have 48 GB of VRAM, so the largest batch that fits is similar; the extra bandwidth helps keep the compute units fed and raises throughput in memory-bound workloads such as fine-tuning large models. The L40S's 350W TDP versus the A40's 300W reflects its higher performance, but it requires adequate power delivery and cooling.
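One way to reason about memory-bound workloads is a simple roofline estimate: attainable throughput is the smaller of peak compute and bandwidth multiplied by the kernel's arithmetic intensity (FLOP per byte moved). The sketch below is a simplified model under stated assumptions: it uses the FP32 and bandwidth figures from the table, ignores caches and Tensor Cores, and the intensity of 10 FLOP/byte is a hypothetical value chosen for illustration.

```python
# Peak FP32 rate (TFLOPS) and memory bandwidth (GB/s) from the specifications table.
GPUS = {
    "A40": {"fp32_tflops": 37.4, "bandwidth_gbs": 696.0},
    "L40S": {"fp32_tflops": 91.0, "bandwidth_gbs": 864.0},
}


def ridge_point(fp32_tflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which compute and memory limits meet."""
    return (fp32_tflops * 1e12) / (bandwidth_gbs * 1e9)


def attainable_tflops(intensity: float, fp32_tflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * intensity), in TFLOPS."""
    memory_bound = bandwidth_gbs * 1e9 * intensity / 1e12
    return min(fp32_tflops, memory_bound)


for name, gpu in GPUS.items():
    ridge = ridge_point(**gpu)
    bound = attainable_tflops(10.0, **gpu)  # hypothetical kernel at 10 FLOP/byte
    print(f"{name}: ridge {ridge:.1f} FLOP/byte, bound {bound:.2f} TFLOPS")
```

Under this model a kernel at 10 FLOP/byte is memory-bound on both cards, so the estimated gain from the L40S is the bandwidth ratio (about 1.24x) rather than the 2.4x compute ratio.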
In practice, the L40S is the faster card for current frameworks that exploit Ada features such as FP8, while the A40 remains adequate for existing Ampere-era workloads but lags in raw speed.
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA RTX A4000 16GB VRAM | 16GB | 0 vCPU, 0GB RAM | Tallinn, Harjumaa | $0.08/GPU/hr | Available |
| Vast.ai | 8× NVIDIA RTX A4000 16GB VRAM | 16GB | 80 vCPU, 201GB RAM, 1698GB Storage | United Kingdom | $0.15/GPU/hr ($1.17/hr total, 8×) | Available |
| Hyperstack | 4× NVIDIA RTX A4000 16GB VRAM | 16GB | 16 vCPU, 86GB RAM, 500GB Storage | Norway | $0.15/GPU/hr ($0.60/hr total, 4×) | Available |
| Hyperstack | 2× NVIDIA RTX A4000 16GB VRAM | 16GB | 8 vCPU, 43GB RAM, 200GB Storage | Norway | $0.15/GPU/hr ($0.30/hr total, 2×) | Available |
| Hyperstack | NVIDIA RTX A4000 16GB VRAM | 16GB | 4 vCPU, 21GB RAM, 100GB Storage | Norway | $0.15/GPU/hr | Available |
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S 48GB VRAM | 48GB | 0 vCPU, 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S 48GB VRAM | 48GB | 16 vCPU, 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | 4× NVIDIA L40S 48GB VRAM | 48GB | 46 vCPU, 288GB RAM, 2500GB Storage | Iowa | $0.88/GPU/hr ($3.52/hr total, 4×) | Available |
| Massed Compute | 2× NVIDIA L40S 48GB VRAM | 48GB | 24 vCPU, 144GB RAM, 1250GB Storage | Iowa | $0.88/GPU/hr ($1.76/hr total, 2×) | Available |
| Massed Compute | NVIDIA L40S 48GB VRAM | 48GB | 12 vCPU, 72GB RAM, 625GB Storage | Iowa | $0.88/GPU/hr | Available |
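Multi-GPU rows in the pricing tables list both a per-GPU price and a total. The sketch below, assuming prices are plain dollar strings as displayed, shows how the two relate and why a rounded per-GPU figure may not multiply back to the total exactly (the Vast.ai row shows $0.15 per GPU but $1.17 for 8 GPUs). The function names are illustrative and do not come from any provider API.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def per_gpu_price(total_per_hour: str, gpu_count: int) -> Decimal:
    """Exact per-GPU hourly price implied by a listed total."""
    return Decimal(total_per_hour) / Decimal(gpu_count)


def to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount to whole cents, as a price list would display it."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def rental_cost(total_per_hour: str, hours: int) -> Decimal:
    """Cost of renting an instance at a fixed hourly total for a number of hours."""
    return to_cents(Decimal(total_per_hour) * hours)


exact = per_gpu_price("1.17", 8)      # 8-GPU row from the first table
print(exact, to_cents(exact))         # exact value versus the displayed, rounded value
print(rental_cost("3.52", 24))        # one day on the 4x L40S instance
```

`Decimal` is used instead of floats so that cent-level rounding behaves the way a price display would.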
When to Choose the A40
The A40 fits cost-sensitive or power-limited environments. Pricing starts at $0.24 per hour across 23 cloud offers, undercutting the L40S minimum of $0.40 per hour. Its 300W TDP suits servers with tighter power budgets, and it still provides 48 GB of GDDR6 VRAM. NVLink support allows bridged multi-GPU training on existing Ampere software stacks, which the PCIe-only L40S cannot match.
When to Choose the L40S
The L40S targets high-performance AI pipelines. Its peak rates of 362 TFLOPS FP16 and 724 TFLOPS FP8 far exceed the A40's listed figures (the A40 has no FP8 mode), speeding LLM training and inference, while 864 GB/s of bandwidth helps memory-bound work. At an average of $1.10 per hour across 18 offers, below the A40's $1.26 average, it offers strong value.
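The trade-off between the two cards can be made concrete with two rough value metrics: peak FP32 TFLOPS per dollar-hour and per watt. This is a sketch under stated assumptions: it uses the minimum hourly prices quoted in this section ($0.24 and $0.40), the table's FP32 and TDP figures, and a hypothetical `Gpu` dataclass; peak specs are not measured performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    fp32_tflops: float         # peak FP32 from the specifications table
    tdp_watts: float           # TDP from the specifications table
    min_price_per_hour: float  # lowest quoted cloud price, USD

    def tflops_per_dollar(self) -> float:
        """Peak FP32 TFLOPS obtained per dollar of hourly rental."""
        return self.fp32_tflops / self.min_price_per_hour

    def tflops_per_watt(self) -> float:
        """Peak FP32 TFLOPS per watt of TDP."""
        return self.fp32_tflops / self.tdp_watts


a40 = Gpu("A40", fp32_tflops=37.4, tdp_watts=300, min_price_per_hour=0.24)
l40s = Gpu("L40S", fp32_tflops=91.0, tdp_watts=350, min_price_per_hour=0.40)

for gpu in (a40, l40s):
    print(
        f"{gpu.name}: {gpu.tflops_per_dollar():.1f} TFLOPS per $/hr, "
        f"{gpu.tflops_per_watt():.3f} TFLOPS per W"
    )
```

On these figures the L40S comes out ahead on both metrics even at its higher entry price, but the result shifts with whichever live price is actually available.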
Use Cases
For large-model training, the L40S's 362 TFLOPS FP16 is about 9.7 times the A40's listed 37.4 TFLOPS on paper, which shortens training times. Its 864 GB/s of bandwidth helps keep large batches fed on 48 GB of VRAM.
For quantized serving, the L40S reaches 724 TFLOPS in FP8, a mode the A40 does not support. Its 362 TFLOPS FP16 rate helps keep response latency low.
For fine-tuning, the L40S's 91 TFLOPS FP32 and 362 TFLOPS FP16 both exceed the A40's listed 37.4 TFLOPS, accelerating parameter updates. The bandwidth edge helps memory-intensive tuning.
For image generation, the L40S's 362 TFLOPS FP16 is 9.7 times the A40's listed 37.4 TFLOPS, so diffusion models should generate images considerably faster, though rarely by the full peak ratio. The 48 GB of VRAM on both cards handles high-resolution diffusion models.
For simulations, the L40S's 91 TFLOPS FP32 is about 2.4 times the A40's 37.4 TFLOPS. FP64 remains low on both cards (1.4 versus 0.6 TFLOPS), so double-precision workloads gain little from either.
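Because both cards have 48 GB of VRAM, whether a model fits depends mainly on parameter count and precision. The following rough estimate counts weights only, assuming 1 GB = 10^9 bytes and a hypothetical 80 percent headroom factor to leave room for activations and caches; the model sizes are illustrative, not benchmarks.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1, "int8": 1}
VRAM_GB = 48  # both the A40 and the L40S


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str,
         vram_gb: float = VRAM_GB, headroom: float = 0.8) -> bool:
    """True if the weights fit within the assumed usable share of VRAM."""
    return weight_memory_gb(params_billions, precision) <= vram_gb * headroom


# Hypothetical model sizes, in billions of parameters.
for size, precision in [(13, "fp16"), (34, "fp16"), (34, "fp8"), (70, "fp8")]:
    status = "fits" if fits(size, precision) else "does not fit"
    print(f"{size}B @ {precision}: {weight_memory_gb(size, precision):.0f} GB, {status}")
```

Training needs substantially more memory than this estimate because of gradients and optimizer state, so treat the result as a lower bound for inference only.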
Frequently Asked Questions
Do the A40 and L40S have the same VRAM?
Yes. Both GPUs provide 48 GB of GDDR6 VRAM. The L40S runs its memory faster, giving 864 GB/s of bandwidth versus 696 GB/s on the A40.
Which GPU is cheaper in the cloud?
The A40 starts at $0.24 per hour (average $1.26 per hour across 23 offers). The L40S starts at $0.40 per hour (average $1.10 per hour across 18 offers). The A40 has the lower entry price, while the L40S is cheaper on average.
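What is the FP16 performance difference?
The L40S delivers 362 TFLOPS FP16, about 9.7 times the A40's listed 37.4 TFLOPS. The A40 figure equals its FP32 rate and appears to be a non-Tensor-Core number, so the practical gap is likely smaller, but it still favors the L40S for AI training.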
Which has higher TDP?
The L40S, at 350W versus 300W for the A40. The extra power budget supports its higher compute throughput but calls for more cooling and power delivery.
What architectures do they use?
The A40 uses Ampere (2020) and supports NVLink in addition to PCIe 4.0. The L40S uses Ada Lovelace (2023) and connects over PCIe 4.0 only.
Is the L40S better for inference?
Yes. The L40S's FP8 rate of 724 TFLOPS suits quantized inference, and the A40 has no FP8 mode. Its FP16 rate of 362 TFLOPS also exceeds the A40's listed 37.4 TFLOPS.
Which is cheaper to rent, the A40 or the L40S?
Cloud rental prices for both the A40 and L40S vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A40 have compared to the L40S?
The A40 has 48 GB of GDDR6 memory. The L40S also has 48 GB of GDDR6 memory, with higher bandwidth.
Can I find A40 and L40S GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A40 and the L40S?
The A40 uses the Ampere architecture (2020) while the L40S uses Ada Lovelace (2023). On the listed peak figures, the L40S delivers 9.7x the FP16 throughput and 1.2x the memory bandwidth of the A40.




