Specifications Compared
| Spec | A100 | L40S |
|---|---|---|
| TDP | 400W (SXM4); 250-300W (PCIe) | 350W |
| VRAM | 40-80 GB | 48 GB |
| CUDA Cores | 6,912 | 18,176 |
| Memory Type | HBM2 (40GB) / HBM2e (80GB) | GDDR6 |
| Architecture | Ampere | Ada Lovelace |
| Form Factors | SXM4, PCIe | PCIe |
| Interconnect | NVLink, PCIe 4.0, InfiniBand | PCIe 4.0 |
| Tensor Cores | 432 | 568 |
| FP16 Tensor Core Performance | 312 TFLOPS | 362 TFLOPS |
| FP32 Performance | 19.5 TFLOPS | 91 TFLOPS |
| FP64 Performance | 9.7 TFLOPS | 1.4 TFLOPS |
| INT8 Tensor Core Performance | 624 TOPS | 724 TOPS |
| Memory Bandwidth | 1,555-2,039 GB/s | 864 GB/s |
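You can check the ratios quoted in this comparison against the table itself. The following is a minimal Python sketch. The dictionaries transcribe the peak figures above, and the bandwidth row assumes the 80GB SXM4 A100 value of 2,039 GB/s. Datasheet peaks are upper bounds, not measured throughput.

```python
# Peak datasheet figures transcribed from the specification table.
# Assumption: the A100 bandwidth entry uses the 80GB SXM4 value (2,039 GB/s).
A100 = {
    "fp16_tflops": 312.0,
    "fp32_tflops": 19.5,
    "fp64_tflops": 9.7,
    "int8_tops": 624.0,
    "bandwidth_gbs": 2039.0,
    "tdp_w": 400.0,
}
L40S = {
    "fp16_tflops": 362.0,
    "fp32_tflops": 91.0,
    "fp64_tflops": 1.4,
    "int8_tops": 724.0,
    "bandwidth_gbs": 864.0,
    "tdp_w": 350.0,
}


def ratios(baseline: dict, other: dict) -> dict:
    """Return other/baseline for every metric present in both dictionaries."""
    return {k: other[k] / baseline[k] for k in baseline if k in other}


if __name__ == "__main__":
    for metric, r in ratios(A100, L40S).items():
        print(f"{metric:>14}: L40S/A100 = {r:.2f}x")
```

With these inputs, the FP16 ratio comes out near 1.16 (the 16 percent gain cited in the analysis), FP32 near 4.67, and bandwidth near 0.42. In other words, the 80GB A100 moves data about 2.4 times faster than the L40S.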
Performance Analysis
The A100's HBM bandwidth (2,039 GB/s on the 80GB SXM4, 1,555 GB/s on the 40GB SXM4) far outpaces the L40S's 864 GB/s. Bandwidth does not change how large a batch fits, which is set by VRAM capacity. It does set how quickly kernels can stream weights, activations, and optimizer state. Memory-bound phases of scientific computing and LLM pretraining therefore spend less time waiting on memory on the A100, which keeps its tensor cores closer to their 312 TFLOPS FP16 peak. Conversely, the L40S delivers 362 TFLOPS FP16, about 16 percent more than the A100. Its 91 TFLOPS FP32 is more than four times the A100's 19.5 TFLOPS, which accelerates single-precision inference and graphics tasks. Its 724 TFLOPS FP8 capability further speeds up quantized model serving, which is common in production deployment. Overall, bandwidth favors the A100 for training throughput, while the L40S's compute density suits inference efficiency and its lower 350W TDP reduces operating costs.
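This bandwidth-versus-compute trade-off is usually reasoned about with a roofline model. A kernel's attainable throughput is the lesser of two limits: peak compute, or bandwidth times arithmetic intensity (FLOPs per byte moved from VRAM). Below is a minimal sketch that assumes the table's peak FP16 tensor figures and the 80GB SXM4 A100 bandwidth. The function names are illustrative, not from any library.

```python
def ridge_point(peak_tflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP per byte) where the roofline turns from memory- to compute-bound."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)


def attainable_tflops(peak_tflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Classic roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    memory_roof_tflops = bandwidth_gbs * 1e9 * intensity / 1e12
    return min(peak_tflops, memory_roof_tflops)


# Peak FP16 tensor throughput (TFLOPS) and memory bandwidth (GB/s) from the
# specification table; the A100 entry assumes the 80GB SXM4 part.
GPUS = {"A100": (312.0, 2039.0), "L40S": (362.0, 864.0)}

if __name__ == "__main__":
    for name, (peak, bw) in GPUS.items():
        print(f"{name}: ridge point ~{ridge_point(peak, bw):.0f} FLOP/byte")
    for intensity in (10, 100, 1000):  # FLOP per byte moved from VRAM
        row = {n: round(attainable_tflops(p, b, intensity), 1) for n, (p, b) in GPUS.items()}
        print(intensity, row)
```

Under these assumptions, the A100 reaches its FP16 roof at roughly 153 FLOP/byte, while the L40S needs roughly 419. A kernel at 100 FLOP/byte is therefore capped near 204 TFLOPS on the A100 but only about 86 TFLOPS on the L40S. Above roughly 419 FLOP/byte, the L40S's higher peak wins. Real kernels land below both roofs.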
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
A100 (80GB SXM4 and PCIe offers)
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Vast.ai | NVIDIA A100 SXM4 80GB | 80GB | 256 vCPU, 63GB RAM, 397GB storage | Slovenia | $0.73/GPU/hr | Available |
| LeaderGPU | 8× NVIDIA A100 PCIe 80GB | 80GB | 64 vCPU, 384GB RAM, 2000GB storage | Netherlands | $0.90/GPU/hr ($7.20/hr total, 8×) | Available |
| Vast.ai | 2× NVIDIA A100 SXM4 80GB | 80GB | 64 vCPU, 126GB RAM, 1114GB storage | Czechia | $1.00/GPU/hr ($2.00/hr total, 2×) | Available |
| Denvr | 4× NVIDIA A100 PCIe 80GB | 80GB | 64 vCPU, 512GB RAM, 7600GB storage | Virginia | $1.15/GPU/hr ($4.60/hr total, 4×) | |
| Denvr | 8× NVIDIA A100 SXM4 80GB | 80GB | 128 vCPU, 1024GB RAM, 15200GB storage | Virginia | $1.15/GPU/hr ($9.20/hr total, 8×) | |
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48GB | Not listed | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48GB | 16 vCPU, 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | 4× NVIDIA L40S | 48GB | 46 vCPU, 288GB RAM, 2500GB storage | Iowa | $0.88/GPU/hr ($3.52/hr total, 4×) | Available |
| Massed Compute | 2× NVIDIA L40S | 48GB | 24 vCPU, 144GB RAM, 1250GB storage | Iowa | $0.88/GPU/hr ($1.76/hr total, 2×) | Available |
| Massed Compute | NVIDIA L40S | 48GB | 12 vCPU, 72GB RAM, 625GB storage | Iowa | $0.88/GPU/hr | Available |
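Rental is billed per GPU-hour, so one rough way to compare these listings is dollars per hour per unit of peak compute. The sketch below hard-codes a few rows from the tables above, and those prices will have changed by the time you read this. It pairs each row with the peak FP16 figure from the specification table. `Offer` and its methods are hypothetical helpers, not a provider API. Peak TFLOPS ignores bandwidth limits and real utilization, so treat the result as a coarse screen only.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    gpu: str
    price_per_gpu_hr: float  # USD, snapshot value copied from the pricing tables
    gpu_count: int
    peak_fp16_tflops: float  # peak tensor-core FP16 from the specification table

    def total_per_hr(self) -> float:
        """Hourly price for the whole multi-GPU instance."""
        return self.price_per_gpu_hr * self.gpu_count

    def usd_per_peak_tflops_hr(self) -> float:
        """Dollars per hour per peak FP16 TFLOPS (ignores achieved utilization)."""
        return self.price_per_gpu_hr / self.peak_fp16_tflops


OFFERS = [
    Offer("Vast.ai", "A100 SXM4 80GB", 0.73, 1, 312.0),
    Offer("LeaderGPU", "A100 PCIe 80GB", 0.90, 8, 312.0),
    Offer("TensorDock", "L40S", 0.55, 1, 362.0),
    Offer("Massed Compute", "L40S", 0.88, 4, 362.0),
]

if __name__ == "__main__":
    # Cheapest peak compute first; multiply by 1,000 to express per peak PFLOPS.
    for o in sorted(OFFERS, key=Offer.usd_per_peak_tflops_hr):
        print(f"{o.provider:<15} {o.gpu:<15} ${o.total_per_hr():.2f}/hr total, "
              f"${o.usd_per_peak_tflops_hr() * 1000:.2f}/hr per peak PFLOPS")
```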
When to Choose the A100 SXM4 40GB
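Select the A100 SXM4 40GB when memory bandwidth, multi-GPU scaling, or double precision dominates. Its HBM delivers 1,555 GB/s (2,039 GB/s on the 80GB SXM4) versus 864 GB/s on the L40S. Bandwidth-bound kernels in distributed LLM training, such as normalization, elementwise, and optimizer-update steps, therefore stall less. Bandwidth does not raise the maximum batch size, however. The 40GB card holds less than the L40S's 48 GB, so choose the 80GB variant when capacity also matters. NVLink lets A100s exchange gradients much faster than over PCIe. The A100's 9.7 TFLOPS FP64 (versus 1.4 TFLOPS on the L40S) makes it the better fit for double-precision simulations, and its 312 TFLOPS FP16 tensor peak still serves mixed-precision training over long runs.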
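For a purely bandwidth-bound step, a lower bound on runtime is the number of bytes read divided by memory bandwidth. Below is a minimal sketch that assumes a hypothetical 30 GB working set read once per step, small enough to fit on all three parts. The 1,555 GB/s figure for the 40GB SXM4 comes from NVIDIA's A100 datasheet. Real steps also spend time on compute, caching, and launch overhead, so actual times will be longer.

```python
# Memory bandwidth in GB/s. The 40GB SXM4 value (1,555 GB/s) is NVIDIA's
# published A100 figure; the other two match the specification table.
BANDWIDTH_GBS = {
    "A100 SXM4 40GB": 1555.0,
    "A100 SXM4 80GB": 2039.0,
    "L40S": 864.0,
}


def min_pass_time_ms(bytes_moved: float, bandwidth_gbs: float) -> float:
    """Lower bound, in milliseconds, for streaming `bytes_moved` through VRAM once."""
    return bytes_moved / (bandwidth_gbs * 1e9) * 1e3


if __name__ == "__main__":
    working_set = 30e9  # hypothetical: 30 GB read once per step
    for gpu, bw in BANDWIDTH_GBS.items():
        print(f"{gpu:<15} >= {min_pass_time_ms(working_set, bw):.1f} ms per pass")
```

Under this assumption, the bound works out to roughly 19 ms on the A100 40GB, 15 ms on the 80GB part, and 35 ms on the L40S.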
When to Choose the L40S
Choose the L40S for cost-effective inference pipelines. It offers 362 TFLOPS FP16, 91 TFLOPS FP32, and 724 TFLOPS FP8, with a starting price of $0.40/hr versus $1.00/hr for the A100 SXM4 40GB. These figures come from an earlier pricing snapshot, and the live rates in the tables above differ. Its PCIe form factor and 350W TDP simplify scaling in standard datacenter servers, and its superior single-precision performance suits fine-tuning and Stable Diffusion workloads.
Use Cases
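LLM pretraining: The A100's higher memory bandwidth (up to 2,039 GB/s versus 864 GB/s on the L40S) moves weights, activations, and gradients faster during intensive pretraining.
Quantized inference serving: The L40S provides 362 TFLOPS FP16 and 724 TFLOPS FP8 for efficient quantized serving. It also rents at a lower average rate: $1.13/hr versus $2.80/hr for the A100 SXM4 40GB in the same pricing snapshot.
FP32 fine-tuning: The L40S's 91 TFLOPS FP32 far exceeds the A100's 19.5 TFLOPS for math that runs on CUDA cores rather than tensor cores. Its 48 GB of VRAM also leaves more room for parameters and optimizer state.
Image generation: The Ada Lovelace architecture and 362 TFLOPS FP16 let the L40S generate images faster than the A100 at a lower 350W TDP.
Scientific simulation: The A100's bandwidth advantage over the L40S's 864 GB/s helps memory-bound simulations, and its 9.7 TFLOPS FP64 (versus 1.4 TFLOPS) suits double-precision codes.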
Frequently Asked Questions
Which has more VRAM: A100 SXM4 40GB or L40S?
The L40S offers 48 GB of GDDR6 VRAM, compared with 40 GB of HBM on the A100 SXM4 40GB. The extra 8 GB lets slightly larger models or batches fit, although the A100's higher bandwidth moves data faster. The 80GB A100 variant exceeds both in capacity.
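A quick way to see whether those extra 8 GB matter is to estimate weight memory as parameter count times bytes per parameter. Below is a minimal sketch. The 1.2 overhead multiplier, standing in for KV cache, activations, and the CUDA context, is a hypothetical placeholder, and real overhead depends on batch size, sequence length, and framework. The sketch also treats VRAM sizes as decimal gigabytes.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int8": 1}


def weights_gb(num_params: float, dtype: str) -> float:
    """Size of the model weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(num_params: float, dtype: str, vram_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: weights times a hypothetical overhead factor must fit in VRAM."""
    return weights_gb(num_params, dtype) * overhead <= vram_gb


if __name__ == "__main__":
    params = 18e9  # hypothetical 18B-parameter model
    for gpu, vram in {"A100 SXM4 40GB": 40, "L40S": 48, "A100 80GB": 80}.items():
        print(gpu, "fp16:", fits(params, "fp16", vram), "fp8:", fits(params, "fp8", vram))
```

In this example, 36 GB of FP16 weights plus overhead fits on the L40S but not on the 40GB A100. The FP8 version fits on all three.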
A100 vs L40S: which is cheaper in cloud?
In the pricing snapshot these figures come from, the L40S started at $0.40/hr and averaged $1.13/hr across 23 offers. The A100 SXM4 40GB started at $1.00/hr and averaged $2.80/hr across 4 offers. The L40S offered both broader availability and lower prices.
What is the FP32 performance difference?
The L40S achieves 91 TFLOPS FP32, over 4x the A100's 19.5 TFLOPS. This benefits single-precision work such as FP32 fine-tuning or graphics.
Does L40S support FP8?
Yes. The L40S delivers 724 TFLOPS FP8 for quantized inference, a format the A100's Ampere tensor cores do not support. FP8 also halves weight memory relative to FP16, which significantly speeds up low-precision serving.
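FP8 tensor-core support tracks the GPU's CUDA compute capability. The A100 is 8.0 (Ampere), and the L40S is 8.9 (Ada Lovelace). Below is a minimal sketch with a hypothetical helper. It assumes that every architecture from 8.9 onward, including Hopper (9.0), keeps FP8 support. The optional section uses PyTorch's `torch.cuda.get_device_capability`, which returns a `(major, minor)` tuple, and it is skipped when PyTorch is not installed.

```python
# CUDA compute capabilities: A100 is 8.0 (Ampere), L40S is 8.9 (Ada Lovelace).
KNOWN_CC = {"A100": (8, 0), "L40S": (8, 9)}


def has_fp8_tensor_cores(cc: tuple) -> bool:
    """Assumes FP8 tensor-core math is available at compute capability 8.9 and later."""
    return tuple(cc) >= (8, 9)


if __name__ == "__main__":
    for name, cc in KNOWN_CC.items():
        print(name, cc, "FP8:", has_fp8_tensor_cores(cc))
    try:
        import torch  # optional: query the local GPU if PyTorch is installed
        if torch.cuda.is_available():
            local_cc = torch.cuda.get_device_capability(0)
            print("local GPU", local_cc, "FP8:", has_fp8_tensor_cores(local_cc))
    except ImportError:
        pass
```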
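Which has higher TDP?
The A100 SXM4 is rated at 400W TDP versus 350W for the L40S, while PCIe A100 variants are rated at 250-300W. Compared with the SXM4 A100, the L40S's lower power draw reduces cooling costs in dense deployments.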
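To put the 50W TDP gap in perspective, you can convert it to energy per GPU-year. Below is a minimal sketch. It assumes the board draws a fixed fraction of its TDP around the clock and uses a hypothetical electricity price of $0.12/kWh. TDP is an upper bound, and cooling overhead is not included.

```python
def annual_energy_kwh(tdp_w: float, utilization: float = 1.0, hours: float = 8760.0) -> float:
    """Energy per GPU per year if the board drew `utilization` x TDP continuously."""
    return tdp_w * utilization * hours / 1000.0


if __name__ == "__main__":
    price_per_kwh = 0.12  # hypothetical electricity price in USD
    a100 = annual_energy_kwh(400)  # A100 SXM4 TDP from the table
    l40s = annual_energy_kwh(350)  # L40S TDP from the table
    print(f"A100: {a100:.0f} kWh, L40S: {l40s:.0f} kWh, "
          f"difference ~${(a100 - l40s) * price_per_kwh:.2f} per GPU-year")
```

At full TDP, the gap is 438 kWh per GPU-year, or about $53 at the assumed rate. The saving scales linearly with GPU count and utilization.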
Best interconnect for multi-GPU?
The A100 supports NVLink alongside PCIe 4.0, which enables faster multi-GPU scaling than the L40S, which relies on PCIe 4.0 alone. Use the A100 for tightly coupled training.
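You can estimate why link bandwidth matters for tightly coupled training from the bandwidth term of a ring all-reduce. In that scheme, each GPU sends and receives 2(n-1)/n of the gradient buffer per synchronization. Below is a minimal sketch under stated assumptions: a hypothetical 7B-parameter model with FP16 gradients on 8 GPUs. The two per-direction link bandwidths are illustrative placeholders, not measured NVLink or PCIe throughput. Latency and compute overlap are ignored.

```python
def ring_allreduce_seconds(message_bytes: float, n_gpus: int, link_gbs: float) -> float:
    """Bandwidth term of a ring all-reduce: each GPU sends and receives
    2 * (n - 1) / n of the message over its link (latency is ignored)."""
    if n_gpus < 2:
        return 0.0
    traffic = 2 * (n_gpus - 1) / n_gpus * message_bytes
    return traffic / (link_gbs * 1e9)


if __name__ == "__main__":
    grads = 7e9 * 2  # hypothetical 7B-parameter model with 2-byte FP16 gradients
    # Illustrative per-direction bandwidths in GB/s, not measured values.
    for label, bw in {"NVLink-class link": 300.0, "PCIe 4.0 x16-class link": 25.0}.items():
        print(f"{label}: ~{ring_allreduce_seconds(grads, 8, bw):.3f} s per all-reduce")
```

With these placeholder numbers, one synchronization takes about 0.08 s on the faster link versus about 0.98 s on the slower one. That roughly 12x gap repeats on every training step.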
Which is cheaper to rent, the A100 or the L40S?
Cloud rental prices for both the A100 and L40S vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the A100 have compared to the L40S?
The A100 comes with 40 GB (HBM2) or 80 GB (HBM2e) of memory. The L40S has 48 GB of GDDR6 memory.
Can I find A100 and L40S GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the A100 and the L40S?
The A100 uses the Ampere architecture (2020), while the L40S uses Ada Lovelace (2023). The L40S delivers about 1.2x the FP16 throughput of the A100, while the A100 80GB SXM4 provides about 2.4x the memory bandwidth of the L40S.





