Specifications Compared
| Spec | L4 | L40 |
|---|---|---|
| TDP | 72W | 300W |
| VRAM | 24 GB | 48 GB |
| CUDA Cores | 7,424 | 18,176 |
| Memory Type | GDDR6 | GDDR6 |
| Architecture | Ada Lovelace | Ada Lovelace |
| Form Factors | PCIe | PCIe |
| Interconnect | PCIe 4.0 | |
| Tensor Cores | 232 | 568 |
| FP8 Performance | 242 TFLOPS | |
| FP16 Performance | 121 TFLOPS | 90.5 TFLOPS |
| FP32 Performance | 30.3 TFLOPS | 90.5 TFLOPS |
| FP64 Performance | 0.5 TFLOPS | |
| INT8 Performance | 242 TOPS | 724 TOPS |
| Memory Bandwidth | 300 GB/s | 864 GB/s |
Performance Analysis
Floating-point throughput shapes which workloads each card suits. On the listed figures, the L4's 121 TFLOPS FP16 exceeds the L40's 90.5 TFLOPS, which favors the mixed-precision inference common for LLMs. Conversely, the L4's 30.3 TFLOPS FP32 is about a third of the L40's 90.5 TFLOPS, putting the L40 ahead for training phases that rely on single-precision accumulation.
The L4's 242 TFLOPS FP8 further raises quantized inference throughput. Memory bandwidth governs how quickly weights and activations reach the compute units: the L40's 864 GB/s is about 2.9x the L4's 300 GB/s, which reduces stalls in memory-bound kernels and helps keep larger batches fed.
The L40's 48 GB of VRAM, double the L4's 24 GB, accommodates larger models without offloading, while the L4's 72W TDP allows higher density in power-limited clusters. In practice, the L40 favors memory-bound training and the L4 favors power-efficient inference.
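The ratios behind these comparisons follow directly from the specification table. The sketch below uses only the figures listed there; the dictionary layout and the helper names `ratio` and `per_watt` are illustrative choices, not part of any vendor tool. It computes the L40-to-L4 ratio for each metric and the listed throughput per watt of TDP.

```python
# Figures copied from the specification table above; all names are illustrative.
SPECS = {
    "L4": {"tdp_w": 72, "vram_gb": 24, "fp16_tflops": 121.0,
           "fp32_tflops": 30.3, "bandwidth_gbs": 300},
    "L40": {"tdp_w": 300, "vram_gb": 48, "fp16_tflops": 90.5,
            "fp32_tflops": 90.5, "bandwidth_gbs": 864},
}


def ratio(metric: str, numerator: str = "L40", denominator: str = "L4") -> float:
    """Return how many times larger `metric` is on one GPU than on the other."""
    return SPECS[numerator][metric] / SPECS[denominator][metric]


def per_watt(gpu: str, metric: str) -> float:
    """Return a listed metric divided by the GPU's TDP in watts."""
    return SPECS[gpu][metric] / SPECS[gpu]["tdp_w"]


if __name__ == "__main__":
    for metric in ("vram_gb", "fp16_tflops", "fp32_tflops", "bandwidth_gbs"):
        print(f"{metric}: L40/L4 = {ratio(metric):.2f}x")
    for gpu in SPECS:
        print(f"{gpu}: {per_watt(gpu, 'fp16_tflops'):.2f} FP16 TFLOPS per watt")
```

On these figures the L40 has 2.88x the bandwidth and roughly 3x the FP32 rate, while the L4 lists about 1.68 FP16 TFLOPS per watt against roughly 0.30 for the L40. TDP is a design ceiling rather than measured draw, so the per-watt values are rough indicators only.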
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L4
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Vast.ai | NVIDIA L4 | 24 GB | 64 vCPU, 101 GB RAM, 485 GB storage | Iceland | $0.33/GPU/hr | Available |
| RunPod | NVIDIA L4 | 24 GB | 12 vCPU, 50 GB RAM | Global | $0.39/GPU/hr | |
L40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48 GB | 0 vCPU, 0 GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40 | 48 GB | 8 vCPU, 94 GB RAM | Global | $0.82/GPU/hr | |
| RunPod | NVIDIA L40S | 48 GB | 16 vCPU, 94 GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40 | 48 GB | 14 vCPU, 72 GB RAM, 625 GB storage | Iowa | $0.86/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40 | 48 GB per GPU | 26 vCPU, 144 GB RAM, 1250 GB storage | Iowa | $0.86/GPU/hr ($1.72/hr total for 2×) | Available |
When to Choose the L4
The L4 stands out in low-power, high-density inference deployments. Its 72W TDP allows more GPUs per server than the L40's 300W, which suits edge cloud and cost-sensitive scaling. Starting at $0.32/hr, it delivers 121 TFLOPS FP16 and 242 TFLOPS FP8 for throughput-oriented serving of models that fit within 24 GB of VRAM.
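The density argument can be made concrete with simple arithmetic. The following sketch counts how many GPUs fit under a GPU-only power budget and totals their listed VRAM and FP16 throughput. The 1200 W budget is a hypothetical value chosen for illustration, and the calculation ignores host power, cooling, and the number of physical PCIe slots.

```python
def gpus_within_budget(power_budget_w: float, tdp_w: float) -> int:
    """Count GPUs whose combined TDP fits in the budget (GPU power only)."""
    return int(power_budget_w // tdp_w)


def fleet_summary(power_budget_w: float, tdp_w: float,
                  vram_gb: float, fp16_tflops: float) -> dict:
    """Aggregate listed VRAM and FP16 throughput for the GPUs that fit."""
    count = gpus_within_budget(power_budget_w, tdp_w)
    return {
        "gpus": count,
        "total_vram_gb": count * vram_gb,
        "total_fp16_tflops": count * fp16_tflops,
    }


if __name__ == "__main__":
    BUDGET_W = 1200  # hypothetical GPU power budget, not a figure from this page
    print("L4 ", fleet_summary(BUDGET_W, tdp_w=72, vram_gb=24, fp16_tflops=121.0))
    print("L40", fleet_summary(BUDGET_W, tdp_w=300, vram_gb=48, fp16_tflops=90.5))
```

Under that assumed budget, sixteen L4 cards fit against four L40 cards. Real servers rarely expose sixteen usable slots, so slot count is often the binding limit before power is.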
When to Choose the L40
The L40 is the stronger choice for memory-heavy training and fine-tuning. With 48 GB of GDDR6 VRAM and 864 GB/s of bandwidth, it handles larger batches and models than the L4's 24 GB and 300 GB/s allow. Its matching 90.5 TFLOPS FP16 and FP32 rates support varied AI pipelines, at a higher starting cost of $0.67/hr.
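Whether a model fits in 24 GB or needs 48 GB can be estimated from its parameter count. The sketch below multiplies parameters by bytes per parameter and adds a flat overhead for activations and caches. The 20% overhead is an assumed placeholder, not a measured value, and the estimate covers weights for inference only; training also needs room for gradients and optimizer state.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def estimated_vram_gb(params_billion: float, precision: str,
                      overhead_fraction: float = 0.2) -> float:
    """Estimate VRAM in GB: weights plus an assumed flat overhead.

    One billion parameters at one byte each is 1 GB (using 1 GB = 1e9 bytes).
    """
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    return weights_gb * (1 + overhead_fraction)


def fits(params_billion: float, precision: str, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """Return True if the estimate is within the card's VRAM."""
    return estimated_vram_gb(params_billion, precision, overhead_fraction) <= vram_gb


if __name__ == "__main__":
    for params, precision in ((7, "fp16"), (13, "fp16"), (13, "fp8")):
        need = estimated_vram_gb(params, precision)
        print(f"{params}B {precision}: ~{need:.1f} GB, "
              f"L4 fits={fits(params, precision, 24)}, "
              f"L40 fits={fits(params, precision, 48)}")
```

With these assumptions a 13B-parameter model in FP16 needs about 31 GB, which exceeds the L4 but fits the L40, while the same model in FP8 drops to about 16 GB and fits both.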
Use Cases
For model training, the L40's 90.5 TFLOPS FP32 matches its FP16 rate and outperforms the L4's 30.3 TFLOPS FP32 for gradient computations. Its 48 GB of VRAM handles larger models than the L4's 24 GB.
For inference serving, the L4's 121 TFLOPS FP16 and 242 TFLOPS FP8 exceed the L40's listed 90.5 TFLOPS FP16 for mixed-precision workloads. Its lower 72W TDP and $0.32/hr starting price suit high-throughput deployments.
For fine-tuning, the L40's matching 90.5 TFLOPS FP16/FP32 rates and 864 GB/s bandwidth enable efficient adapter training on large models. Its 48 GB of VRAM leaves more room than the L4's 24 GB for larger batches of training data.
For image generation, the L40's 48 GB of VRAM supports high-resolution output without swapping, which the L4's 24 GB can rule out. Its 864 GB/s bandwidth also speeds up texture processing.
For simulation workloads, the L4 fits FP16-heavy jobs at 121 TFLOPS within a 72W envelope, while the L40 handles FP32-dominant tasks at 90.5 TFLOPS with 48 GB of VRAM for complex datasets.
Frequently Asked Questions
What is the VRAM difference between L4 and L40?
L4 provides 24 GB GDDR6 VRAM, while L40 doubles it to 48 GB. This allows L40 to load larger models without offloading to system RAM.
How do L4 and L40 compare in FP16 performance?
The L4 is listed at 121 TFLOPS FP16, above the L40's 90.5 TFLOPS. The L4's edge suits inference, while the L40's FP16 figure equals its FP32 figure, which suits workloads that mix both precisions.
Which GPU has higher memory bandwidth?
The L40 offers 864 GB/s, nearly three times the L4's 300 GB/s. The higher bandwidth keeps memory-bound kernels fed and helps when training with larger batches.
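A common back-of-envelope estimate shows why bandwidth matters: if a memory-bound step has to stream its whole working set from VRAM once, the step rate cannot exceed bandwidth divided by working-set size. The sketch below applies that bound. The 14 GB working set is a hypothetical example, and real throughput will be lower because the bound ignores compute time, caching, and kernel overheads.

```python
def max_passes_per_second(bandwidth_gbs: float, working_set_gb: float) -> float:
    """Upper bound on full passes over the working set per second.

    Assumes each pass reads every byte of the working set exactly once
    and that memory bandwidth is the only limit.
    """
    if working_set_gb <= 0:
        raise ValueError("working_set_gb must be positive")
    return bandwidth_gbs / working_set_gb


if __name__ == "__main__":
    WORKING_SET_GB = 14  # hypothetical, e.g. weights read once per step
    for name, bandwidth in (("L4", 300), ("L40", 864)):
        bound = max_passes_per_second(bandwidth, WORKING_SET_GB)
        print(f"{name}: at most {bound:.1f} passes/s over {WORKING_SET_GB} GB")
```

For this assumed working set the bound is about 21 passes per second on the L4 and about 62 on the L40, the same 2.88x ratio as the bandwidth figures themselves.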
What are the power consumption and pricing differences?
The L4 has a 72W TDP and starts at $0.32/hr (average $0.68/hr across 15 offers); the L40 has a 300W TDP and starts at $0.67/hr (average $0.88/hr across 13 offers). The L4 suits efficiency-focused rentals.
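Hourly price alone hides how much of each resource a dollar buys. The sketch below divides the starting prices quoted above by the listed VRAM and throughput figures. The `OFFERS` layout and the function name are illustrative, and since the prices are live data they will differ from these snapshot values over time.

```python
# Starting prices and specs as quoted on this page; names are illustrative.
OFFERS = {
    "L4": {"price_per_hr": 0.32, "vram_gb": 24,
           "fp16_tflops": 121.0, "fp32_tflops": 30.3},
    "L40": {"price_per_hr": 0.67, "vram_gb": 48,
            "fp16_tflops": 90.5, "fp32_tflops": 90.5},
}


def cost_per_unit(gpu: str, metric: str) -> float:
    """Return dollars per hour for one unit of the given metric."""
    offer = OFFERS[gpu]
    return offer["price_per_hr"] / offer[metric]


if __name__ == "__main__":
    for metric in ("vram_gb", "fp16_tflops", "fp32_tflops"):
        l4 = cost_per_unit("L4", metric)
        l40 = cost_per_unit("L40", metric)
        cheaper = "L4" if l4 < l40 else "L40"
        print(f"$/hr per {metric}: L4={l4:.4f}, L40={l40:.4f} -> {cheaper} cheaper")
```

At these starting prices the two cards cost nearly the same per GB of VRAM, the L4 is cheaper per listed FP16 TFLOPS, and the L40 is cheaper per FP32 TFLOPS.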
Is L4 or L40 better for AI inference?
L4 excels with 121 TFLOPS FP16 and 242 TFLOPS FP8 at lower cost and power. L40 suits inference needing more than 24 GB VRAM.
Do both GPUs use the same architecture?
Yes, both employ Ada Lovelace from 2023 in PCIe form factors. Differences stem from tiering: L4 optimizes efficiency, L40 emphasizes capacity.
Which is cheaper to rent, the L4 or the L40?
Cloud rental prices for both the L4 and L40 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L4 have compared to the L40?
The L4 has 24 GB of GDDR6 memory. The L40 has 48 GB of GDDR6 memory.
Can I find L4 and L40 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
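What is the main difference between the L4 and the L40?
Both the L4 and the L40 use the Ada Lovelace architecture (2023), so the difference lies in their tiering. On the listed figures, the L4 delivers about 1.3x the FP16 throughput of the L40 (121 versus 90.5 TFLOPS) at a 72W TDP, while the L40 offers about 2.9x the memory bandwidth (864 versus 300 GB/s), twice the VRAM, and about 3x the FP32 throughput.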



