L40 vs L40S: 4.0x FP16 Gap, 48GB vs 48GB

Specifications Compared

Spec	L40	L40S
TDP	300W	350W
VRAM	48 GB	48 GB
CUDA Cores	18,176	18,176
Memory Type	GDDR6	GDDR6X
Architecture	Ada Lovelace	Ada Lovelace
Form Factors	PCIe	PCIe
Interconnect		PCIe 4.0
Tensor Cores	568	568
FP16 Performance	90.5 TFLOPS	362 TFLOPS
FP32 Performance	90.5 TFLOPS	91 TFLOPS
INT8 Performance	724 TOPS	724 TOPS
Memory Bandwidth	864 GB/s	864 GB/s

Performance Analysis

The L40S significantly outperforms the L40 in half-precision compute: its 362 TFLOPS FP16 rate is four times the L40's 90.5 TFLOPS, which accelerates deep learning training and inference wherever models use mixed precision. FP32 performance is nearly identical, at 90.5 TFLOPS for the L40 and 91 TFLOPS for the L40S, so single-precision workloads such as scientific simulations see little difference. The L40S also lists an exclusive FP8 capability at 724 TFLOPS, which enables efficient quantized inference for large language models.
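The ratios quoted above follow directly from the specification table. A minimal Python sketch, using only the table's peak figures (the dictionary and function names are illustrative, not part of any vendor tooling):

```python
# Peak throughput figures copied from the specification table (TFLOPS).
SPECS = {
    "L40": {"fp16": 90.5, "fp32": 90.5},
    "L40S": {"fp16": 362.0, "fp32": 91.0},
}


def speedup(metric: str, baseline: str = "L40", candidate: str = "L40S") -> float:
    """Return the candidate/baseline ratio of peak throughput for one metric."""
    return SPECS[candidate][metric] / SPECS[baseline][metric]


fp16_ratio = speedup("fp16")  # 362 / 90.5
fp32_ratio = speedup("fp32")  # 91 / 90.5
print(f"FP16: {fp16_ratio:.2f}x, FP32: {fp32_ratio:.3f}x")
```

These are ratios of peak datasheet numbers; measured speedups on a real model will usually be smaller.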

Both GPUs share 864 GB/s of memory bandwidth, so memory-bound, data-heavy stages run at similar speed on either card. The L40S's 350W TDP, versus 300W for the L40, gives it more power headroom to sustain higher performance under load, at the price of greater power and cooling demands. In AI pipelines, the L40S can execute up to 4x more FP16 operations per second at peak, which shortens the time per training epoch for compute-bound models such as transformers.
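One way to reason about equal bandwidth paired with unequal compute is the roofline "ridge point": peak compute divided by memory bandwidth gives the arithmetic intensity (FLOPs per byte moved) a kernel needs before it stops being limited by memory. The sketch below is a back-of-the-envelope estimate from the table's peak figures, not a benchmark, and the function name is illustrative.

```python
def ridge_point(peak_tflops: float, bandwidth_gbs: float) -> float:
    """FLOPs per byte a kernel needs to be compute-bound rather than memory-bound."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)


BANDWIDTH_GBS = 864.0  # identical on both GPUs per the spec table

l40_fp16 = ridge_point(90.5, BANDWIDTH_GBS)    # roughly 105 FLOPs/byte
l40s_fp16 = ridge_point(362.0, BANDWIDTH_GBS)  # roughly 419 FLOPs/byte
print(f"L40: {l40_fp16:.0f} FLOPs/byte, L40S: {l40s_fp16:.0f} FLOPs/byte")
```

Under this simple model, FP16 kernels below roughly 105 FLOPs per byte are bandwidth-limited on both cards and should run at similar speed; the L40S's lead appears only on kernels with higher arithmetic intensity, such as large matrix multiplications.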

For inference, the L40S's FP8 advantage lowers latency when serving quantized models, while the shared 48 GB of VRAM lets either card hold a multi-billion-parameter LLM on a single GPU without splitting it across devices.
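To check whether a given model fits in 48 GB, a rough estimate is parameter count times bytes per parameter. The sketch below counts weights only; the 0.8 headroom factor is an assumed rule of thumb that leaves room for activations and KV cache, not a figure from this page, and the function names are illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}
VRAM_GB = 48  # both the L40 and the L40S


def weights_gb(params_billions: float, precision: str) -> float:
    """Memory for model weights alone, in GB (10^9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, headroom: float = 0.8) -> bool:
    """True if the weights fit within an assumed fraction of VRAM."""
    return weights_gb(params_billions, precision) <= VRAM_GB * headroom


for size in (7, 13, 34):
    print(size, {p: fits(size, p) for p in BYTES_PER_PARAM})
```

For example, a 13-billion-parameter model needs about 26 GB of weights at FP16 and fits on either card, while a 34-billion-parameter model needs about 68 GB at FP16 and only fits once quantized to one byte per parameter.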

Live Cloud Pricing

Real-time prices from 25+ providers. Updated every 60 seconds.

L40

Provider	GPU Model	VRAM	Host Specs	Region	Price	Status
Vast.ai	NVIDIA L40S 48GB VRAM	48GB	256 vCPU 189GB RAM 2798GB Storage	Slovenia	$0.80/GPU/hr	Available
RunPod	NVIDIA L40 48GB VRAM	48GB	8 vCPU 94GB RAM	Global	$0.82/GPU/hr
Massed Compute	NVIDIA L40 48GB VRAM	48GB	14 vCPU 72GB RAM 625GB Storage	Iowa	$0.86/GPU/hr	Available
Massed Compute	2×NVIDIA L40 48GB VRAM	48GB	26 vCPU 144GB RAM 1250GB Storage	Iowa	$0.86/GPU/hr $1.72/hr total (2×)	Available
Massed Compute	4×NVIDIA L40 48GB VRAM	48GB	50 vCPU 288GB RAM 2500GB Storage	Iowa	$0.86/GPU/hr $3.44/hr total (4×)	Available

L40S

Provider	GPU Model	VRAM	Host Specs	Region	Price	Status
Vast.ai	NVIDIA L40S 48GB VRAM	48GB	256 vCPU 189GB RAM 2779GB Storage	Slovenia	$0.80/GPU/hr	Available
Massed Compute	2×NVIDIA L40S 48GB VRAM	48GB	24 vCPU 144GB RAM 1250GB Storage	Iowa	$0.88/GPU/hr $1.76/hr total (2×)	Available
Massed Compute	4×NVIDIA L40S 48GB VRAM	48GB	46 vCPU 288GB RAM 2500GB Storage	Iowa	$0.88/GPU/hr $3.52/hr total (4×)	Available
Massed Compute	NVIDIA L40S 48GB VRAM	48GB	12 vCPU 72GB RAM 625GB Storage	Iowa	$0.88/GPU/hr	Available
Massed Compute	2×NVIDIA L40S 48GB VRAM	48GB	24 vCPU 144GB RAM 1250GB Storage	Iowa	$0.88/GPU/hr $1.76/hr total (2×)	Available
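Multi-GPU rows in the tables above pack a per-GPU rate, a total, and a GPU count into a single Price cell. The following sketch shows one way to split such a cell and confirm the total matches the per-GPU rate; the function name and regular expressions are assumptions based only on the cell format shown on this page.

```python
import re


def parse_price(cell: str) -> tuple[float, float, int]:
    """Return (per_gpu_usd, total_usd, gpu_count) from a Price cell."""
    per_gpu = float(re.search(r"\$([\d.]+)/GPU/hr", cell).group(1))
    total_match = re.search(r"\$([\d.]+)/hr total \((\d+)×\)", cell)
    if total_match is None:  # single-GPU offer: no total is shown
        return per_gpu, per_gpu, 1
    return per_gpu, float(total_match.group(1)), int(total_match.group(2))


per_gpu, total, count = parse_price("$0.88/GPU/hr $3.52/hr total (4×)")
# The listed total should equal the per-GPU rate times the GPU count.
consistent = abs(per_gpu * count - total) < 0.005
print(per_gpu, total, count, consistent)
```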

When to Choose the L40

The L40 suits cost-sensitive deployments where FP32 workloads dominate. Its 90.5 TFLOPS FP32 is within 1% of the L40S's 91 TFLOPS, making it a good fit for scientific computing or legacy graphics rendering. Its lower 300W TDP reduces operating costs in power-constrained clouds, and with 11 offers starting at $0.67 per hour it is attractive for long-running jobs.

Choose the L40 for balanced performance without paying a premium for FP16 throughput, for example in environments that prioritize efficiency over peak AI throughput.
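The efficiency argument for the L40 can be made concrete with the table's FP32 and TDP figures. This is a rough sketch: TDP is an upper bound on board power rather than measured draw, and the function names are illustrative.

```python
def tflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Peak throughput per watt of rated board power."""
    return peak_tflops / tdp_watts


def daily_energy_kwh(tdp_watts: float, hours: float = 24.0) -> float:
    """Energy used in a day if the card ran at its full TDP the whole time."""
    return tdp_watts * hours / 1000.0


l40_eff = tflops_per_watt(90.5, 300)   # FP32 TFLOPS per watt, L40
l40s_eff = tflops_per_watt(91.0, 350)  # FP32 TFLOPS per watt, L40S
print(f"L40: {l40_eff:.3f} TFLOPS/W, L40S: {l40s_eff:.3f} TFLOPS/W")
print(f"L40: {daily_energy_kwh(300):.1f} kWh/day, L40S: {daily_energy_kwh(350):.1f} kWh/day")
```

On these numbers the L40 delivers about 0.30 FP32 TFLOPS per watt against about 0.26 for the L40S, and draws at most 7.2 kWh per day against 8.4 kWh.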

When to Choose the L40S

The L40S excels in modern AI tasks that use FP16 and FP8 precision. Its 362 TFLOPS FP16 is four times the L40's rate, speeding up LLM training and inference, while 724 TFLOPS FP8 benefits quantized serving. Its average price is higher, at $1.10 per hour across 18 offers, but the cheapest offers start at $0.40 per hour, which keeps it viable on a budget.

Select the L40S for high-throughput workloads such as fine-tuning large models, where its 350W TDP gives it the headroom to sustain higher compute.
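Whether the higher average price is worth paying depends on cost per unit of compute rather than cost per hour. A minimal sketch, using the average prices and peak FP16 figures quoted on this page (the function name is illustrative, and peak throughput overstates what real workloads achieve):

```python
def usd_per_pflops_hour(price_per_hour: float, peak_tflops: float) -> float:
    """Hourly price divided by peak throughput expressed in PFLOPS."""
    return price_per_hour / (peak_tflops / 1000.0)


l40_cost = usd_per_pflops_hour(0.86, 90.5)    # average L40 price, FP16 peak
l40s_cost = usd_per_pflops_hour(1.10, 362.0)  # average L40S price, FP16 peak
print(f"L40: ${l40_cost:.2f}, L40S: ${l40s_cost:.2f} per PFLOPS-hour (FP16 peak)")
```

At average prices this works out to roughly $9.50 per FP16 PFLOPS-hour on the L40 against roughly $3.04 on the L40S, so the L40S is cheaper per unit of FP16 compute despite the higher hourly rate. Repeating the calculation with the FP32 figures favors the L40.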

Use Cases

LLM Training

L40S

The L40S's 362 TFLOPS FP16 is four times the L40's 90.5 TFLOPS, reducing training time for large models. Both cards offer 48 GB of VRAM, so the batch sizes that fit are the same.

LLM Inference

L40S

FP8 at 724 TFLOPS on the L40S enables low-latency serving of quantized models, and its FP16 advantage over the L40 speeds up real-time queries.

Fine-tuning

L40S

The L40S's 362 TFLOPS FP16, versus 90.5 TFLOPS on the L40, speeds up parameter updates. Identical 864 GB/s bandwidth means neither card has an edge in moving large datasets through memory.

Stable Diffusion

L40S

The L40S's FP16 throughput allows up to 4x faster image generation than the L40 at peak. 48 GB of VRAM handles high-resolution diffusion models.

Scientific Computing

L40

The L40's 90.5 TFLOPS FP32 nearly matches the L40S's 91 TFLOPS for simulations, and its lower 300W TDP cuts costs in FP32-heavy tasks.

Frequently Asked Questions

What is the main performance difference between L40 and L40S?

The L40S offers 362 TFLOPS FP16 versus the L40's 90.5 TFLOPS, a 4x boost for AI tasks. It also adds 724 TFLOPS FP8, which the L40 lacks, while FP32 is nearly equal at 91 TFLOPS versus 90.5 TFLOPS.

Which has better pricing on gpuperhour.com?

The L40S has the cheaper minimum, starting at $0.40 per hour across 18 offers, compared with $0.67 per hour across 11 offers for the L40. On average, however, the L40 is cheaper at $0.86 per hour versus $1.10 for the L40S.

Do L40 and L40S have the same VRAM?

Both provide 48 GB of VRAM with 864 GB/s bandwidth. The L40 is listed with GDDR6 and the L40S with GDDR6X, but because the bandwidth is identical, the memory type makes no practical difference to throughput.

What is the TDP difference?

The L40 is rated at 300W and the L40S at 350W. The higher TDP gives the L40S more headroom to sustain peak performance in demanding workloads.

Are they the same architecture?

Yes. Both use Ada Lovelace (2023) in PCIe form factors. The L40S is listed with a PCIe 4.0 interconnect.

Is L40S better for inference?

Yes. The L40S's 724 TFLOPS FP8 excels at quantized LLM inference, and its 362 TFLOPS FP16 also outperforms the L40's 90.5 TFLOPS.

Which is cheaper to rent, the L40 or the L40S?

Cloud rental prices for both the L40 and L40S vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.

How much VRAM does the L40 have compared to the L40S?

The L40 has 48 GB of GDDR6 memory. The L40S has 48 GB of GDDR6X memory.

Can I find L40 and L40S GPUs available to rent right now?

Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.

What is the main difference between the L40 and the L40S?

Both the L40 and the L40S use the Ada Lovelace architecture (2023). The L40S delivers 4.0x the FP16 throughput of the L40 with the same memory bandwidth.