Specifications Compared
| Spec | L40S | A100 |
|---|---|---|
| TDP | 350W | 400W |
| VRAM | 48 GB | 40 or 80 GB |
| CUDA Cores | 18,176 | 6,912 |
| Memory Type | GDDR6 | HBM2e |
| Architecture | Ada Lovelace | Ampere |
| Form Factors | PCIe | SXM4, PCIe |
| Interconnect | PCIe 4.0 | NVLink, PCIe 4.0, InfiniBand |
| Tensor Cores | 568 | 432 |
| FP8 Performance | 724 TFLOPS | Not supported |
| FP16 Performance | 362 TFLOPS | 312 TFLOPS |
| FP32 Performance | 91 TFLOPS | 19.5 TFLOPS |
| FP64 Performance | 1.4 TFLOPS | 9.7 TFLOPS |
| INT8 Performance | 724 TOPS | 624 TOPS |
| Memory Bandwidth | 864 GB/s | 2,039 GB/s |
Performance Analysis
The performance gaps between the L40S and A100 center on the precision formats that matter for AI. The L40S delivers 362 TFLOPS in FP16 and 91 TFLOPS in FP32, ahead of the A100's 312 TFLOPS and 19.5 TFLOPS. The roughly 4.7x FP32 lead favors the L40S for FP32-dominant tasks such as single-precision simulations, while its narrower FP16 lead helps mixed-precision training. FP64 reverses the picture: the A100's 9.7 TFLOPS is about seven times the L40S's 1.4 TFLOPS.
Memory bandwidth reveals a stark divide: the A100's 2,039 GB/s of HBM2e is about 2.4 times the L40S's 864 GB/s of GDDR6, which helps it sustain larger batch sizes when training and serving models such as LLMs. Higher bandwidth reduces data bottlenecks, so the A100 can stream larger working sets without stalling its compute units.
The L40S's 724 TFLOPS of FP8 throughput accelerates quantized inference and cuts serving latency; the A100 has no FP8 mode. Power draw is 350W for the L40S versus 400W for the A100, which affects how many cards fit in a given power budget. Interconnects favor the A100: NVLink alongside PCIe 4.0 scales across multiple GPUs better than the L40S's PCIe 4.0 alone.
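One way to see why the bandwidth gap matters is the roofline "ridge point": the number of floating-point operations a kernel must perform per byte of memory traffic before compute, rather than memory, becomes the limit. The sketch below is a simplification that uses only the peak FP16 and bandwidth figures from the table; real kernels rarely reach either peak, and the function and variable names are illustrative.

```python
# Peak figures copied from the specification table (vendor peaks, not measurements).
SPECS = {
    "L40S": {"fp16_tflops": 362.0, "bandwidth_gbs": 864.0},
    "A100": {"fp16_tflops": 312.0, "bandwidth_gbs": 2039.0},
}


def ridge_point(fp16_tflops: float, bandwidth_gbs: float) -> float:
    """Return the FLOPs per byte at which peak compute and peak bandwidth balance."""
    flops_per_second = fp16_tflops * 1e12
    bytes_per_second = bandwidth_gbs * 1e9
    return flops_per_second / bytes_per_second


RIDGE = {name: ridge_point(**spec) for name, spec in SPECS.items()}

for name, value in RIDGE.items():
    print(f"{name}: about {value:.0f} FLOPs per byte")
```

Under these assumptions the L40S needs roughly 419 FLOPs per byte to stay compute-bound and the A100 roughly 153, so a kernel with low arithmetic intensity is more likely to be limited by memory on the L40S.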
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48GB | 0 vCPU, 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48GB | 16 vCPU, 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | 4× NVIDIA L40S | 48GB | 46 vCPU, 288GB RAM, 2500GB Storage | Iowa | $0.88/GPU/hr ($3.52/hr total, 4×) | Available |
| Massed Compute | 2× NVIDIA L40S | 48GB | 24 vCPU, 144GB RAM, 1250GB Storage | Iowa | $0.88/GPU/hr ($1.76/hr total, 2×) | Available |
| Massed Compute | NVIDIA L40S | 48GB | 12 vCPU, 72GB RAM, 625GB Storage | Iowa | $0.88/GPU/hr | Available |
A100
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Vast.ai | 2× NVIDIA A100 SXM4 | 80GB | 256 vCPU, 126GB RAM, 5672GB Storage | Slovenia | $0.73/GPU/hr ($1.47/hr total, 2×) | Available |
| Vast.ai | 2× NVIDIA A100 SXM4 | 80GB | 256 vCPU, 126GB RAM, 769GB Storage | Slovenia | $0.73/GPU/hr ($1.47/hr total, 2×) | Available |
| LeaderGPU | 8× NVIDIA A100 PCIe | 80GB | 64 vCPU, 384GB RAM, 2000GB Storage | Netherlands | $0.90/GPU/hr ($7.20/hr total, 8×) | Available |
| Vast.ai | 2× NVIDIA A100 SXM4 | 80GB | 64 vCPU, 126GB RAM, 1114GB Storage | Czechia | $1.00/GPU/hr ($2.00/hr total, 2×) | Available |
| Denvr | 4× NVIDIA A100 PCIe | 80GB | 64 vCPU, 512GB RAM, 7600GB Storage | Virginia | $1.15/GPU/hr ($4.60/hr total, 4×) | |
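Multi-GPU listings quote a per-GPU rate, so the bill for a node is that rate times the GPU count times the hours used. The sketch below works this out for three of the listings above. The prices are a snapshot of this page and will change, and the `Listing` class and its method names are illustrative, not any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    provider: str
    gpu: str
    gpu_count: int
    price_per_gpu_hour: float  # USD, snapshot taken from the tables on this page

    def hourly_total(self) -> float:
        """Cost of the whole node for one hour."""
        return round(self.gpu_count * self.price_per_gpu_hour, 2)

    def job_cost(self, hours: float) -> float:
        """Cost of keeping the whole node for the given number of hours."""
        return round(self.hourly_total() * hours, 2)


LISTINGS = [
    Listing("Massed Compute", "L40S", 4, 0.88),
    Listing("LeaderGPU", "A100 PCIe 80GB", 8, 0.90),
    Listing("Denvr", "A100 PCIe 80GB", 4, 1.15),
]

for listing in LISTINGS:
    print(
        f"{listing.provider}: ${listing.hourly_total():.2f}/hr, "
        f"${listing.job_cost(24):.2f} for 24 hours"
    )
```

The hourly totals match the "total" figures in the tables. The Vast.ai rows list $1.47 for two GPUs at $0.73 each, which suggests the displayed per-GPU rate is rounded, so totals recomputed from it can be off by a cent.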
When to Choose the L40S
Opt for the L40S in workloads that demand high FP32 throughput: its 91 TFLOPS far exceeds the A100's 19.5 TFLOPS for graphics rendering or single-precision simulations. Released in 2023 on the Ada Lovelace architecture, it offers 724 TFLOPS of FP8 for modern quantized inference, and its 48 GB of GDDR6 handles a wide range of models within a 350W TDP.
The PCIe form factor simplifies single-node deployments by avoiding NVLink complexity, which suits cost-conscious users despite the $1.65 per hour starting price.
When to Choose the A100
Choose the A100 for bandwidth-intensive AI training: its 2,039 GB/s supports much larger batch sizes than the L40S's 864 GB/s, which speeds up LLM training. NVLink and InfiniBand give it better multi-GPU scaling than the PCIe-only L40S.
Abundant supply, with 34 offers starting at $0.13 per hour, makes it economical for large-scale deployments, and up to 80 GB of HBM2e VRAM fits very large models.
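A rough way to judge whether a model fits on one card is to multiply its parameter count by the bytes per parameter at the chosen precision. The sketch below counts weights only; activations, KV cache, and optimizer state add to the total. The 0.8 headroom factor and the 30-billion-parameter example are illustrative assumptions, not figures from this page.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}
VRAM_GB = {"L40S": 48, "A100 40GB": 40, "A100 80GB": 80}


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9


def fits(params_billions: float, precision: str, gpu: str, headroom: float = 0.8) -> bool:
    """True if the weights fit in the assumed usable fraction of the card's VRAM."""
    usable_gb = VRAM_GB[gpu] * headroom  # headroom is an assumed safety margin
    return weight_footprint_gb(params_billions, precision) <= usable_gb


# Hypothetical 30-billion-parameter model.
print(fits(30, "fp16", "A100 80GB"))  # 60 GB of weights against 64 GB usable
print(fits(30, "fp16", "L40S"))       # 60 GB of weights against 38.4 GB usable
print(fits(30, "fp8", "L40S"))        # 30 GB of weights against 38.4 GB usable
```

Under these assumptions the FP16 weights fit on an 80 GB A100 but not on the L40S, while FP8 quantization brings the same model within the L40S's 48 GB.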
Use Cases
For LLM training, the A100's 2,039 GB/s of bandwidth supports the larger batch sizes that help training converge, and NVLink scaling outperforms the L40S's PCIe links in multi-GPU setups.
For quantized serving, the L40S's 724 TFLOPS of FP8 accelerates inference, and its 362 TFLOPS of FP16 edges out the A100's 312 TFLOPS for low-latency responses.
For fine-tuning, the L40S's 91 TFLOPS of FP32 suits parameter-efficient methods, while the A100's 2,039 GB/s handles data-heavy fine-tuning. The choice depends on model scale and budget.
For diffusion model generation, the L40S's Ada architecture, 48 GB of VRAM, and 362 TFLOPS of FP16 are a good match, and its higher FP32 throughput of 91 TFLOPS aids rendering fidelity.
For FP32 simulations, the L40S's 91 TFLOPS vastly exceeds the A100's 19.5 TFLOPS, and its lower 350W TDP supports dense compute clusters. Double-precision codes are the exception: the A100's 9.7 TFLOPS of FP64 far exceeds the L40S's 1.4 TFLOPS.
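The density argument can be made concrete by dividing peak throughput by TDP and by counting how many cards fit in a fixed power budget. The sketch below uses only the table's peak and TDP figures. The 10 kW budget is a hypothetical example that ignores host, cooling, and networking power.

```python
TDP_WATTS = {"L40S": 350, "A100": 400}
PEAK_TFLOPS = {
    "L40S": {"fp16": 362.0, "fp32": 91.0, "fp64": 1.4},
    "A100": {"fp16": 312.0, "fp32": 19.5, "fp64": 9.7},
}


def tflops_per_watt(gpu: str, precision: str) -> float:
    """Peak throughput per watt of TDP (a ceiling, not a measured efficiency)."""
    return PEAK_TFLOPS[gpu][precision] / TDP_WATTS[gpu]


def gpus_within_budget(gpu: str, budget_watts: int) -> int:
    """Number of cards whose combined TDP fits in the budget (GPU power only)."""
    return budget_watts // TDP_WATTS[gpu]


BUDGET_WATTS = 10_000  # hypothetical rack budget

for gpu in TDP_WATTS:
    print(
        f"{gpu}: {gpus_within_budget(gpu, BUDGET_WATTS)} cards, "
        f"{tflops_per_watt(gpu, 'fp32'):.3f} FP32 TFLOPS/W, "
        f"{tflops_per_watt(gpu, 'fp64'):.4f} FP64 TFLOPS/W"
    )
```

With these inputs the budget holds 28 L40S cards or 25 A100 cards. The L40S leads on FP32 per watt, while the A100 leads on FP64 per watt.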
Frequently Asked Questions
Which GPU has higher FP32 performance?
The L40S achieves 91 TFLOPS FP32, far exceeding the A100's 19.5 TFLOPS. This gap benefits FP32-heavy tasks like simulations. FP16 remains close at 362 TFLOPS for L40S versus 312 TFLOPS for A100.
How does memory bandwidth compare?
The A100 offers 2,039 GB/s with HBM2e, over twice the L40S's 864 GB/s of GDDR6. Higher bandwidth supports larger batches in training. VRAM is 40 or 80 GB for the A100 against 48 GB for the L40S.
What are the current cloud prices?
L40S starts at $1.65 per hour, averaging $1.66 across three offers. A100 begins at $0.13 per hour, averaging $1.33 across 34 offers. Availability favors A100 significantly.
Which has better interconnects?
The A100 supports NVLink, PCIe 4.0, and InfiniBand for multi-GPU scaling. The L40S is limited to PCIe 4.0. This makes the A100 the better choice for clusters.
What is the TDP difference?
L40S draws 350W, lower than A100's 400W. This aids power-efficient deployments. Form factors include PCIe for both, with A100 adding SXM4.
Does the L40S support FP8?
L40S provides 724 TFLOPS FP8 for quantized inference, unavailable on A100. This leverages Ada Lovelace advances. FP16 is 362 TFLOPS on L40S.
Which is cheaper to rent, the L40S or the A100?
Cloud rental prices for both the L40S and A100 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L40S have compared to the A100?
The L40S has 48 GB of GDDR6 memory. The A100 has 40 or 80 GB of HBM2e memory.
Can I find L40S and A100 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the L40S and the A100?
The L40S uses the Ada Lovelace architecture (2023) while the A100 uses Ampere (2020). The A100 delivers 0.9x the FP16 throughput and 2.4x the memory bandwidth of the L40S.





