A40 vs L40S: 9.7x FP16 Gap, 48GB vs 48GB

Specifications Compared

Spec	A40	L40S
TDP	300W	350W
VRAM	48 GB	48 GB
CUDA Cores	10,752	18,176
Memory Type	GDDR6	GDDR6
Architecture	Ampere	Ada Lovelace
Form Factors	PCIe	PCIe
Interconnect	PCIe 4.0 + NVLink	PCIe 4.0
Tensor Cores	336	568
FP16 Performance	37.4 TFLOPS	362 TFLOPS
FP32 Performance	37.4 TFLOPS	91 TFLOPS
FP64 Performance	0.6 TFLOPS	1.4 TFLOPS
INT8 Performance	299 TOPS	724 TOPS
Memory Bandwidth	696 GB/s	864 GB/s

Performance Analysis

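On paper, the L40S leads in every compute metric. Its 362 TFLOPS of peak FP16 is about 9.7 times the A40's listed 37.4 TFLOPS, which shortens training for FP16-heavy models such as transformers, although measured speedups are usually smaller than a ratio of peak figures. The two FP16 numbers are also not on the same basis: 362 TFLOPS is a Tensor Core rate, while 37.4 TFLOPS matches the A40's non-tensor rate (NVIDIA's datasheet lists the A40's dense FP16 Tensor Core rate at 149.7 TFLOPS), so the like-for-like gap is narrower. FP32 at 91 TFLOPS is about 2.4 times the A40's 37.4 TFLOPS, which helps precision-sensitive simulations. The L40S also supports FP8, quoted here at 724 TFLOPS, a format that Ampere hardware such as the A40 lacks; this makes it well suited to quantized inference.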
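The ratios quoted in this section can be reproduced directly from the specification table. The short sketch below does that arithmetic; the `SPECS` dictionary and the `ratio` helper are names introduced here for illustration, and the inputs are the listed peak figures, not benchmark results.

```python
# Peak figures copied from the specification table (A40, L40S).
# These are listed peak numbers, not measured throughput.
SPECS = {
    "FP16 (TFLOPS)": (37.4, 362.0),
    "FP32 (TFLOPS)": (37.4, 91.0),
    "FP64 (TFLOPS)": (0.6, 1.4),
    "INT8 (TOPS)": (299.0, 724.0),
    "Memory bandwidth (GB/s)": (696.0, 864.0),
}


def ratio(a40: float, l40s: float) -> float:
    """Return how many times larger the L40S figure is than the A40 figure."""
    return l40s / a40


# Ratio of L40S to A40 for every row of the table.
RATIOS = {name: ratio(a40, l40s) for name, (a40, l40s) in SPECS.items()}

for name, value in RATIOS.items():
    print(f"{name}: L40S is {value:.2f}x the A40")
```

Rounded to one decimal place, the FP16 and FP32 rows give the 9.7x and 2.4x figures used throughout this comparison.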

The L40S's 864 GB/s of memory bandwidth exceeds the A40's 696 GB/s by about 24 percent, which helps keep the compute units fed when working sets approach the full 48 GB of VRAM. Memory-bound workloads, such as fine-tuning large models, can expect gains closer to this 24 percent than to the compute ratios above. The L40S's 350W TDP, versus 300W for the A40, reflects its higher performance and calls for correspondingly robust power delivery and cooling.
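One way to see why memory-bound workloads benefit less than the FP16 ratio suggests is a simple roofline estimate, where attainable throughput is the smaller of peak compute and bandwidth times arithmetic intensity. The sketch below is an illustration under that simplified model, using the listed FP16 and bandwidth figures; the function names and the sample intensities are assumptions introduced here, not measurements.

```python
def attainable_tflops(peak_tflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity).

    intensity is in FLOP per byte moved to or from VRAM.
    """
    # GB/s * FLOP/byte = GFLOP/s; divide by 1000 to express it in TFLOPS.
    memory_bound_tflops = bandwidth_gbs * intensity / 1000.0
    return min(peak_tflops, memory_bound_tflops)


# (peak FP16 TFLOPS, memory bandwidth GB/s) as listed in the specification table.
GPUS = {"A40": (37.4, 696.0), "L40S": (362.0, 864.0)}


def l40s_speedup(intensity: float) -> float:
    """Estimated L40S/A40 throughput ratio at a given arithmetic intensity."""
    a40 = attainable_tflops(*GPUS["A40"], intensity)
    l40s = attainable_tflops(*GPUS["L40S"], intensity)
    return l40s / a40


# Illustrative intensities: low values are memory-bound, high values compute-bound.
for intensity in (10, 100, 1000):
    print(f"{intensity:>5} FLOP/byte: estimated speedup {l40s_speedup(intensity):.2f}x")
```

Under this model, a kernel at 10 FLOP/byte is limited by bandwidth on both cards and gains only the 24 percent bandwidth difference, while the full ratio of peak figures appears only at very high arithmetic intensity.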

In practice, the L40S is the better fit for current frameworks that exploit Ada features such as FP8, while the A40 remains adequate for existing Ampere-era codebases but trails in raw speed.

Live Cloud Pricing

Real-time prices from 25+ providers. Updated every 60 seconds.

A40

Provider	GPU Model	VRAM	Host Specs	Region	Price
RunPod	NVIDIA RTX A4000 16GB VRAM	16GB	8 vCPU 25GB RAM	🌍global	$0.25/GPU/hr
Cirrascale	8×NVIDIA RTX A4000 16GB VRAM	16GB	40 vCPU 256GB RAM 2610GB Storage	United States	$0.27/GPU/hr $2.16/hr total (8×)
Cirrascale	8×NVIDIA RTX A4000 16GB VRAM	16GB	40 vCPU 256GB RAM 2610GB Storage	United States	$0.31/GPU/hr $2.48/hr total (8×)
Cirrascale	8×NVIDIA RTX A4000 16GB VRAM	16GB	40 vCPU 256GB RAM 2610GB Storage	United States	$0.33/GPU/hr $2.64/hr total (8×)
Cirrascale	8×NVIDIA RTX A4000 16GB VRAM	16GB	40 vCPU 256GB RAM 2610GB Storage	United States	$0.34/GPU/hr $2.72/hr total (8×)

L40S

Provider	GPU Model	VRAM	Host Specs	Region	Price	Status
Vast.ai	NVIDIA L40S 48GB VRAM	48GB	256 vCPU 189GB RAM 2798GB Storage	Slovenia	$0.80/GPU/hr	Available
Massed Compute	8×NVIDIA L40S 48GB VRAM	48GB	94 vCPU 576GB RAM 5000GB Storage	Iowa	$0.88/GPU/hr $7.04/hr total (8×)	Available
Massed Compute	2×NVIDIA L40S 48GB VRAM	48GB	24 vCPU 144GB RAM 1250GB Storage	Iowa	$0.88/GPU/hr $1.76/hr total (2×)	Available
Massed Compute	NVIDIA L40S 48GB VRAM	48GB	12 vCPU 72GB RAM 625GB Storage	Iowa	$0.88/GPU/hr	Available
Massed Compute	NVIDIA L40S 48GB VRAM	48GB	12 vCPU 72GB RAM 625GB Storage	Iowa	$0.88/GPU/hr	Available
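Multi-GPU rows in the tables above list both a per-GPU price and a total, for example `$0.88/GPU/hr $7.04/hr total (8×)`. The sketch below shows one way such a cell could be parsed and sanity-checked; the regular expression and the `parse_price` and `is_consistent` helpers are hypothetical and assume the cell format seen on this page.

```python
import re

# Matches "$0.88/GPU/hr" optionally followed by "$7.04/hr total (8×)".
PRICE_RE = re.compile(
    r"\$(\d+(?:\.\d+)?)/GPU/hr"
    r"(?:\s+\$(\d+(?:\.\d+)?)/hr total \((\d+)×\))?"
)


def parse_price(cell: str) -> tuple[float, int, float]:
    """Return (price per GPU-hour, GPU count, total price per hour)."""
    match = PRICE_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    per_gpu = float(match.group(1))
    if match.group(2) is None:
        # Single-GPU offer: the total equals the per-GPU price.
        return per_gpu, 1, per_gpu
    return per_gpu, int(match.group(3)), float(match.group(2))


def is_consistent(cell: str) -> bool:
    """Check that per-GPU price times GPU count equals the listed total."""
    per_gpu, count, total = parse_price(cell)
    # Compare in whole cents to avoid floating-point noise.
    return round(per_gpu * count * 100) == round(total * 100)


print(parse_price("$0.88/GPU/hr $7.04/hr total (8×)"))
print(is_consistent("$0.27/GPU/hr $2.16/hr total (8×)"))
```

Comparing on the per-GPU price, not the total, keeps single-GPU and 8-GPU offers on the same footing.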

When to Choose the A40

The A40 fits cost-sensitive or power-limited environments. Pricing starts at $0.24 per hour across 23 cloud offers, undercutting the L40S minimum of $0.40 per hour, and its 48 GB of GDDR6 VRAM at a 300W TDP suits legacy servers. NVLink support enables multi-GPU training on existing Ampere software stacks.

When to Choose the L40S

The L40S targets high-performance AI pipelines. Its 362 TFLOPS FP16 dwarfs the A40's 37.4 TFLOPS, and its 724 TFLOPS FP8 has no A40 equivalent, which speeds LLM training and inference, while 864 GB/s of bandwidth handles large batches. An average price of $1.10 per hour across 18 offers delivers strong value for Ada workloads.

Use Cases

LLM Training

L40S

The L40S's FP16 rate of 362 TFLOPS is about 9.7 times the A40's 37.4 TFLOPS, cutting training times for large models. Its higher 864 GB/s bandwidth supports bigger batches within 48 GB of VRAM.
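Because both cards have 48 GB of VRAM, the model size that fits is the same on either; only the speed differs. The sketch below gives a rough upper bound on parameter count for a few per-parameter byte budgets. The budgets are assumptions introduced here (2 bytes for FP16 weights, 1 byte for FP8 weights, and a commonly cited 16 bytes per parameter for mixed-precision training with Adam), and the estimate ignores activations, caches, and framework overhead.

```python
# 48 GB as listed for both GPUs, taking 1 GB = 10^9 bytes.
VRAM_BYTES = 48 * 10**9

# Hypothetical per-parameter byte budgets (assumptions, not from the spec table).
BYTES_PER_PARAM = {
    "FP16 weights only": 2,
    "FP8 weights only": 1,
    "mixed-precision Adam training": 16,
}


def max_params(bytes_per_param: int, vram_bytes: int = VRAM_BYTES) -> int:
    """Upper bound on parameter count that fits in VRAM at a given byte budget."""
    return vram_bytes // bytes_per_param


for label, budget in BYTES_PER_PARAM.items():
    billions = max_params(budget) / 1e9
    print(f"{label}: at most about {billions:.0f}B parameters")
```

These are ceilings, not recommendations: real workloads need headroom for activations and batch data, so practical limits are lower.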

LLM Inference

L40S

The L40S reaches 724 TFLOPS in FP8 for quantized serving, a format the A40 does not support. Its 362 TFLOPS FP16 also helps keep response latency low.

Fine-tuning

L40S

The L40S's 91 TFLOPS FP32 and 362 TFLOPS FP16 both exceed the A40's 37.4 TFLOPS at each precision, accelerating parameter updates. Its 24 percent bandwidth edge aids memory-intensive tuning.

Stable Diffusion

L40S

The L40S's 362 TFLOPS FP16 is about 9.7 times the A40's 37.4 TFLOPS, so image generation can run up to that much faster when it is compute-bound. 48 GB of VRAM handles high-resolution diffusion models.

Scientific Computing

L40S

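The L40S's 91 TFLOPS FP32 is about 2.4 times the A40's 37.4 TFLOPS, which benefits single-precision simulations. FP64 throughput is modest on both cards (1.4 versus 0.6 TFLOPS), so double-precision codes gain less in absolute terms.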

Frequently Asked Questions

Do the A40 and L40S have the same VRAM?

Both GPUs provide 48 GB of VRAM. Both use GDDR6, but the L40S runs it faster, with 864 GB/s of bandwidth versus the A40's 696 GB/s.

Which GPU is cheaper in the cloud?

A40 starts at $0.24 per hour (average $1.26 per hour across 23 offers). L40S begins at $0.40 per hour (average $1.10 per hour across 18 offers).
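Hourly price alone does not show value, so it can help to divide listed FP16 throughput by price. The sketch below does this with the minimum and average prices quoted in this answer; the `tflops_per_dollar` helper is a name introduced here, and the result is peak TFLOPS per dollar-hour, not a measured cost per training run.

```python
# Hourly prices quoted in this FAQ answer (USD per GPU-hour).
PRICES = {
    "A40": {"min": 0.24, "avg": 1.26},
    "L40S": {"min": 0.40, "avg": 1.10},
}

# Peak FP16 figures as listed in the specification table (TFLOPS).
FP16_TFLOPS = {"A40": 37.4, "L40S": 362.0}


def tflops_per_dollar(gpu: str, tier: str) -> float:
    """Listed peak FP16 TFLOPS obtained per dollar of hourly spend."""
    return FP16_TFLOPS[gpu] / PRICES[gpu][tier]


for gpu in PRICES:
    for tier in ("min", "avg"):
        value = tflops_per_dollar(gpu, tier)
        print(f"{gpu} at {tier} price: {value:.1f} TFLOPS per $/hr")
```

On these listed figures the L40S comes out ahead at both price points, even though the A40 has the lower starting price.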

What is the FP16 performance difference?

L40S delivers 362 TFLOPS FP16, 9.7 times the A40's 37.4 TFLOPS. This gap favors L40S for AI training.

Which has higher TDP?

L40S TDP is 350W, higher than A40's 300W. This supports greater compute but needs better cooling.
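The TDP difference is easier to judge as throughput per watt. The sketch below divides the listed peak FP16 figures by TDP; the helper name is introduced here for illustration, and TDP is used as a stand-in for actual power draw, which varies by workload.

```python
# (peak FP16 TFLOPS, TDP in watts) as listed in the specification table.
GPUS = {"A40": (37.4, 300.0), "L40S": (362.0, 350.0)}


def tflops_per_watt(gpu: str) -> float:
    """Listed peak FP16 TFLOPS divided by TDP (a proxy for real power draw)."""
    tflops, tdp_watts = GPUS[gpu]
    return tflops / tdp_watts


for gpu in GPUS:
    print(f"{gpu}: {tflops_per_watt(gpu):.3f} TFLOPS per watt")

# How many times more listed FP16 throughput per watt the L40S offers.
EFFICIENCY_RATIO = tflops_per_watt("L40S") / tflops_per_watt("A40")
print(f"L40S/A40 efficiency ratio: {EFFICIENCY_RATIO:.1f}x")
```

By this measure the 50W increase in TDP is small next to the increase in listed throughput.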

What architectures do they use?

A40 is Ampere from 2020 with NVLink. L40S is Ada Lovelace from 2023 with PCIe 4.0.

Is L40S better for inference?

Yes. The L40S's FP8 rate of 724 TFLOPS suits quantized inference, and the A40 has no FP8 support. Its FP16 rate of 362 TFLOPS also outpaces the A40's 37.4 TFLOPS.

Which is cheaper to rent, the A40 or the L40S?

Cloud rental prices for both the A40 and L40S vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.

How much VRAM does the A40 have compared to the L40S?

The A40 and the L40S each have 48 GB of GDDR6 memory.

Can I find A40 and L40S GPUs available to rent right now?

Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.

What is the main difference between the A40 and the L40S?

The A40 uses the Ampere architecture (2020) while the L40S uses Ada Lovelace (2023). The L40S delivers 9.7x the FP16 throughput and 1.2x the memory bandwidth of the A40.