Specifications Compared
| Spec | A16 | L40S |
|---|---|---|
| TDP | 250W | 350W |
| VRAM | 16 GB | 48 GB |
| CUDA Cores | 2,560 | 18,176 |
| Memory Type | GDDR6 | GDDR6X |
| Architecture | Ampere | Ada Lovelace |
| Form Factors | PCIe | PCIe |
| Interconnect | PCIe 4.0 | PCIe 4.0 |
| Tensor Cores | 80 | 568 |
| FP16 Performance | 4.5 TFLOPS | 362 TFLOPS |
| FP32 Performance | 4.5 TFLOPS | 91 TFLOPS |
| Memory Bandwidth | 231 GB/s | 864 GB/s |
Performance Analysis
The L40S delivers far more raw compute than the A16. Its FP16 rating of 362 TFLOPS against the A16's 4.5 TFLOPS is roughly an 80x gap in peak throughput for the matrix operations central to deep learning inference. Its FP32 rating of 91 TFLOPS, versus 4.5 TFLOPS on the A16 (about 20x), speeds up training phases that rely on single-precision arithmetic. FP8 support at 724 TFLOPS on the L40S further benefits quantized inference for large language models.
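The ratios quoted above follow directly from the specification table. A minimal sketch of that arithmetic, assuming the table's peak ratings (the `SPECS` dictionary and `spec_ratios` helper are illustrative names, not part of any provider API):

```python
# Peak ratings copied from the specification table; these are
# theoretical maximums, not measured benchmark results.
SPECS = {
    "A16": {"fp16_tflops": 4.5, "fp32_tflops": 4.5, "bandwidth_gbs": 231, "vram_gb": 16},
    "L40S": {"fp16_tflops": 362, "fp32_tflops": 91, "bandwidth_gbs": 864, "vram_gb": 48},
}


def spec_ratios(fast: str, slow: str) -> dict:
    """Return the fast/slow ratio for every metric in the table."""
    return {metric: SPECS[fast][metric] / SPECS[slow][metric] for metric in SPECS[fast]}


ratios = spec_ratios("L40S", "A16")
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1f}x")
```

Ratios of peak ratings are an upper bound on real-world speedup; actual gains depend on the workload, precision, and how well the kernel saturates the hardware.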
Memory specifications strongly affect real-world usage. The L40S's 48 GB of GDDR6X VRAM supports models and batch sizes that do not fit in the A16's 16 GB of GDDR6, which reduces out-of-memory errors in tasks like fine-tuning. Bandwidth of 864 GB/s on the L40S, compared to 231 GB/s on the A16, eases data transfer bottlenecks and allows higher throughput in memory-intensive applications such as generative AI. Although the L40S has a 350W TDP versus the A16's 250W, its much higher peak throughput yields better performance per watt for demanding workloads.
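The performance-per-watt claim can be checked with simple division. The sketch below assumes the peak FP16 ratings and TDP values from the specification table; TDP is a design ceiling rather than measured power draw, so the result is an approximation (`tflops_per_watt` is a hypothetical helper name):

```python
def tflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Peak throughput divided by TDP, in TFLOPS per watt."""
    return peak_tflops / tdp_watts


# FP16 ratings and TDP values from the specification table.
a16_eff = tflops_per_watt(4.5, 250)
l40s_eff = tflops_per_watt(362, 350)

print(f"A16:  {a16_eff:.3f} FP16 TFLOPS/W")
print(f"L40S: {l40s_eff:.3f} FP16 TFLOPS/W")
print(f"L40S advantage: {l40s_eff / a16_eff:.1f}x")
```

On these figures the L40S comes out at roughly 1.03 FP16 TFLOPS per watt against 0.018 for the A16, so the 100W higher TDP is more than offset when the card is kept busy.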
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A16
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Vultr | 8×NVIDIA A16 64GB VRAM | 64GB | 48 vCPU 496GB RAM 1500GB Storage | Singapore | $0.47/GPU/hr ($3.77/hr total, 8×) | Available |
| Vultr | 8×NVIDIA A16 64GB VRAM | 64GB | 48 vCPU 496GB RAM 1500GB Storage | Atlanta | $0.47/GPU/hr ($3.77/hr total, 8×) | Available |
| Vultr | 8×NVIDIA A16 64GB VRAM | 64GB | 48 vCPU 496GB RAM 1500GB Storage | Bangalore | $0.47/GPU/hr ($3.77/hr total, 8×) | Available |
| Vultr | 2×NVIDIA A16 64GB VRAM | 64GB | 12 vCPU 128GB RAM 700GB Storage | Bangalore | $0.47/GPU/hr ($0.94/hr total, 2×) | Available |
| Vultr | 4×NVIDIA A16 64GB VRAM | 64GB | 24 vCPU 256GB RAM 1200GB Storage | Atlanta | $0.47/GPU/hr ($1.88/hr total, 4×) | Available |
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S 48GB VRAM | 48GB | 0 vCPU 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S 48GB VRAM | 48GB | 16 vCPU 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | 4×NVIDIA L40S 48GB VRAM | 48GB | 46 vCPU 288GB RAM 2500GB Storage | Iowa | $0.88/GPU/hr ($3.52/hr total, 4×) | Available |
| Massed Compute | 2×NVIDIA L40S 48GB VRAM | 48GB | 24 vCPU 144GB RAM 1250GB Storage | Iowa | $0.88/GPU/hr ($1.76/hr total, 2×) | Available |
| Massed Compute | NVIDIA L40S 48GB VRAM | 48GB | 12 vCPU 72GB RAM 625GB Storage | Iowa | $0.88/GPU/hr | Available |
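
Multi-GPU listings quote both a per-GPU rate and an instance total. The sketch below shows how the two relate and how one might pick the cheapest per-GPU offer; it uses a handful of rows from the tables above, and the `Offer` class and `cheapest` helper are illustrative names rather than any provider's API:

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    gpu: str
    gpu_count: int
    price_per_gpu_hr: float  # USD, as displayed in the listing

    @property
    def total_per_hr(self) -> float:
        """Instance price implied by the displayed per-GPU rate."""
        return round(self.gpu_count * self.price_per_gpu_hr, 2)


# A sample of rows from the pricing tables (not the full offer set).
OFFERS = [
    Offer("Vultr", "A16", 8, 0.47),
    Offer("Vultr", "A16", 2, 0.47),
    Offer("TensorDock", "L40S", 1, 0.55),
    Offer("RunPod", "L40S", 1, 0.86),
    Offer("Massed Compute", "L40S", 4, 0.88),
]


def cheapest(gpu: str) -> Offer:
    """Return the sampled offer with the lowest per-GPU hourly rate."""
    return min((o for o in OFFERS if o.gpu == gpu), key=lambda o: o.price_per_gpu_hr)


for offer in OFFERS:
    print(f"{offer.provider} {offer.gpu_count}x{offer.gpu}: ${offer.total_per_hr:.2f}/hr")
print("Cheapest L40S in sample:", cheapest("L40S").provider)
```

The 8×A16 total computed this way is $3.76/hr, one cent below the listed $3.77/hr, which suggests the displayed per-GPU rate is rounded from the instance total.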
When to Choose the A16
The A16 suits budget-conscious deployments with modest compute needs. It averages $0.48/hr, and its 74 live offers make it widely available for entry-level inference or virtual desktop infrastructure. With 16 GB VRAM and 4.5 TFLOPS FP16/FP32 at a 250W TDP, it handles smaller models efficiently and is a good fit for cost-sensitive environments that want to avoid overprovisioning.
When to Choose the L40S
Select the L40S for high-performance AI and graphics workloads requiring substantial resources. The 48 GB VRAM and 864 GB/s bandwidth accommodate large-scale models and big batches, while 362 TFLOPS FP16 and 91 TFLOPS FP32 deliver rapid training and inference. Despite a higher average of $1.11/hr across 21 offers, its PCIe 4.0 interconnect and 724 TFLOPS FP8 justify the investment for production-scale tasks.
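One way to weigh the higher hourly rate against the extra throughput is cost per unit of peak compute. This is a rough sketch assuming the average prices and peak FP16 ratings quoted on this page; it only holds when the workload can actually keep the GPU saturated (`usd_per_tflops_hour` is a hypothetical helper name):

```python
def usd_per_tflops_hour(price_per_hr: float, peak_tflops: float) -> float:
    """Hourly price divided by peak throughput, in USD per TFLOPS-hour."""
    return price_per_hr / peak_tflops


# Average prices and FP16 ratings as quoted on this page.
a16_cost = usd_per_tflops_hour(0.48, 4.5)
l40s_cost = usd_per_tflops_hour(1.11, 362)

print(f"A16:  ${a16_cost:.4f} per FP16 TFLOPS-hour")
print(f"L40S: ${l40s_cost:.4f} per FP16 TFLOPS-hour")
print(f"A16 costs {a16_cost / l40s_cost:.1f}x more per unit of peak FP16 compute")
```

For light workloads that never approach peak throughput, the lower absolute hourly price of the A16 matters more than this ratio.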
Use Cases
The L40S's 91 TFLOPS FP32 and 362 TFLOPS FP16 provide the compute power needed for training large models, far exceeding the A16's 4.5 TFLOPS.
With 48 GB VRAM and 724 TFLOPS FP8, the L40S supports high-throughput inference for LLMs, whereas the A16 is limited to 16 GB.
The L40S's 864 GB/s bandwidth and 362 TFLOPS FP16 handle larger batch sizes during fine-tuning, outperforming the A16's 231 GB/s.
Stable Diffusion benefits from the L40S's 48 GB VRAM for high-resolution generation, compared to the A16's 16 GB constraint.
Light simulations fit the A16's 4.5 TFLOPS FP32 at low cost, but complex ones require the L40S's 91 TFLOPS and higher bandwidth.
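The VRAM constraints in the use cases above can be estimated from parameter count and precision. The sketch below is a back-of-the-envelope check that counts model weights only; the 7B and 13B model sizes and the 80% headroom factor (leaving room for activations, KV cache, and framework overhead) are assumptions chosen for illustration:

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "8bit": 1}


def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights in GB (1e9 params x bytes / 1e9)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within the assumed usable share of VRAM."""
    return weights_gb(params_billion, precision) <= vram_gb * headroom


for size in (7, 13):
    for gpu, vram in (("A16", 16), ("L40S", 48)):
        print(f"{size}B fp16 on {gpu}: fits={fits(size, 'fp16', vram)}")
```

Under these assumptions a 7B model in FP16 (about 14 GB of weights) is already too tight for 16 GB, while a 13B model in FP16 (about 26 GB) fits comfortably in 48 GB. Training and fine-tuning need considerably more memory than this weights-only estimate.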
Frequently Asked Questions
What is the VRAM difference between A16 and L40S?
The A16 has 16 GB GDDR6 VRAM, while the L40S offers 48 GB GDDR6X. With three times the capacity, the L40S can hold significantly larger models entirely in GPU memory.
How do their FP16 performances compare?
The A16 delivers 4.5 TFLOPS FP16, whereas the L40S achieves 362 TFLOPS. This gap translates to much faster inference on the L40S for AI workloads.
What are the current cloud prices for these GPUs?
A16 pricing starts at $0.47/hr with an average of $0.48/hr across 74 offers. L40S starts at $0.40/hr but averages $1.11/hr across 21 offers.
Which GPU has higher memory bandwidth?
The L40S provides 864 GB/s, about 3.7 times the A16's 231 GB/s. Higher bandwidth reduces bottlenecks in data-heavy tasks like training.
What architectures do they use?
The A16 uses Ampere from 2021, and the L40S uses Ada Lovelace from 2023. The newer architecture is more efficient and adds FP8 support, rated at 724 TFLOPS.
How do TDPs compare?
The A16 has a 250W TDP, lower than the L40S's 350W. The A16's lower power draw suits edge or cost-optimized setups.
Which is cheaper to rent, the A16 or the L40S?
Cloud rental prices for both the A16 and L40S vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A16 have compared to the L40S?
The A16 has 16 GB of GDDR6 memory. The L40S has 48 GB of GDDR6X memory.
Can I find A16 and L40S GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A16 and the L40S?
The A16 uses the Ampere architecture (2021) while the L40S uses Ada Lovelace (2023). The L40S delivers 80.4x the FP16 throughput and 3.7x the memory bandwidth of the A16.


