Specifications Compared
| Spec | L4 | L40S |
|---|---|---|
| TDP | 72W | 350W |
| VRAM | 24 GB | 48 GB |
| CUDA Cores | 7,424 | 18,176 |
| Memory Type | GDDR6 | GDDR6 |
| Architecture | Ada Lovelace | Ada Lovelace |
| Form Factors | PCIe | PCIe |
| Interconnect | PCIe 4.0 | PCIe 4.0 |
| Tensor Cores | 232 | 568 |
| FP8 Performance | 242 TFLOPS | 724 TFLOPS |
| FP16 Performance | 121 TFLOPS | 362 TFLOPS |
| FP32 Performance | 30.3 TFLOPS | 91 TFLOPS |
| FP64 Performance | 0.5 TFLOPS | 1.4 TFLOPS |
| INT8 Performance | 242 TOPS | 724 TOPS |
| Memory Bandwidth | 300 GB/s | 864 GB/s |
Performance Analysis
Compute throughput is the core difference: the L40S delivers 362 TFLOPS FP16 versus the L4's 121 TFLOPS, roughly three times the peak tensor throughput for mixed-precision training. The same ratio holds for FP32, where the L40S's 91 TFLOPS against the L4's 30.3 TFLOPS benefits general-purpose computing and simulations that need single-precision arithmetic.
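The ratios quoted above can be checked directly from the specifications table. The snippet below is a minimal sketch; the `SPECS` dictionary and the `throughput_ratios` helper are illustrative names, not part of any vendor tool, and the figures are peak numbers rather than measured throughput.

```python
# Peak throughput figures copied from the specifications table above.
SPECS = {
    "L4": {"fp8_tflops": 242.0, "fp16_tflops": 121.0, "fp32_tflops": 30.3},
    "L40S": {"fp8_tflops": 724.0, "fp16_tflops": 362.0, "fp32_tflops": 91.0},
}


def throughput_ratios(base: str = "L4", other: str = "L40S") -> dict[str, float]:
    """Return other/base for every metric, rounded to two decimals."""
    return {
        metric: round(SPECS[other][metric] / SPECS[base][metric], 2)
        for metric in SPECS[base]
    }


if __name__ == "__main__":
    for metric, ratio in throughput_ratios().items():
        print(f"{metric}: {ratio}x")
```

All three ratios land within a percent of 3x, which is why the precision format matters less than the absolute scale when choosing between the two cards.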
The memory subsystem widens the gap in practice. The L40S's 864 GB/s bandwidth, nearly three times the L4's 300 GB/s, relieves the memory-bound stages of inference and training, which matters most for large language models. Its 48 GB of VRAM against the L4's 24 GB also allows larger batch sizes and holds models exceeding 24 GB without offloading to host memory, which shortens fine-tuning runs.
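One way to see why bandwidth matters for language models is a common back-of-the-envelope bound: at batch size 1, each generated token has to read every weight once, so decode speed cannot exceed bandwidth divided by model size. The sketch below applies that rule of thumb; the function name and the 7-billion-parameter FP16 example are assumptions for illustration, and real throughput will be lower once KV-cache reads and kernel overheads are included.

```python
def decode_tokens_per_second_bound(
    bandwidth_gb_s: float, params_billion: float, bytes_per_param: float
) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    Assumes every weight is read exactly once per generated token and
    ignores KV-cache traffic, so this is an optimistic ceiling.
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # Hypothetical 7B-parameter model stored in FP16 (2 bytes per parameter).
    for name, bandwidth in (("L4", 300.0), ("L40S", 864.0)):
        bound = decode_tokens_per_second_bound(bandwidth, 7.0, 2.0)
        print(f"{name}: at most {bound:.1f} tokens/s")
```

Because the model size cancels out, the ratio between the two bounds is simply the bandwidth ratio, 864 / 300 = 2.88.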
Power draw contrasts sharply: the L4's 72W TDP allows dense deployments, while the L40S's 350W budget buys higher peak performance under sustained load. For quantized inference, where low-precision formats dominate production serving, the L40S reaches 724 TFLOPS FP8 versus the L4's 242 TFLOPS.
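The power trade-off is easier to judge as throughput per watt. The following is a rough sketch using the table's TDP and peak figures; `GPUS` and `tflops_per_watt` are illustrative names, and TDP is a board power limit rather than a measured draw, so treat the results as nominal.

```python
# Figures copied from the specifications table. TDP is a power limit,
# not a measurement, so these ratios are nominal only.
GPUS = {
    "L4": {"tdp_w": 72.0, "fp16_tflops": 121.0, "fp8_tflops": 242.0},
    "L40S": {"tdp_w": 350.0, "fp16_tflops": 362.0, "fp8_tflops": 724.0},
}


def tflops_per_watt(gpu: str, metric: str) -> float:
    """Peak throughput for `metric` divided by the card's TDP."""
    spec = GPUS[gpu]
    return spec[metric] / spec["tdp_w"]


if __name__ == "__main__":
    for gpu in GPUS:
        fp16 = tflops_per_watt(gpu, "fp16_tflops")
        fp8 = tflops_per_watt(gpu, "fp8_tflops")
        print(f"{gpu}: {fp16:.2f} FP16 TFLOPS/W, {fp8:.2f} FP8 TFLOPS/W")
```

On paper the L4 comes out ahead per watt (about 1.68 versus 1.03 FP16 TFLOPS/W), which is the arithmetic behind its appeal for dense, power-limited deployments.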
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L4
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| Vast.ai | NVIDIA L4 | 24 GB | 64 vCPU, 101 GB RAM, 485 GB storage | Iceland | $0.33/GPU/hr | Available |
| RunPod | NVIDIA L4 | 24 GB | 12 vCPU, 50 GB RAM | Global | $0.39/GPU/hr | |
| RunPod | NVIDIA L40 | 48 GB | 8 vCPU, 94 GB RAM | Global | $0.82/GPU/hr | |
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48 GB | 0 vCPU, 0 GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48 GB | 16 vCPU, 94 GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40S | 48 GB | 12 vCPU, 72 GB RAM, 625 GB storage | Iowa | $0.88/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40S | 48 GB | 24 vCPU, 144 GB RAM, 1250 GB storage | Iowa | $0.88/GPU/hr ($1.76/hr total) | Available |
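To compare listings on value rather than headline price, one option is peak FP16 TFLOPS per dollar-hour. The sketch below uses the lowest L4 ($0.33) and L40S ($0.55) listings shown above as a snapshot; the `OFFERS` structure and function name are illustrative, prices change frequently, and peak TFLOPS is only a proxy for delivered performance.

```python
# Snapshot of the lowest listed price per GPU-hour from the tables above,
# paired with peak FP16 figures from the specifications table.
OFFERS = {
    "L4": {"price_per_hour": 0.33, "fp16_tflops": 121.0},
    "L40S": {"price_per_hour": 0.55, "fp16_tflops": 362.0},
}


def tflops_per_dollar_hour(gpu: str) -> float:
    """Peak FP16 TFLOPS obtained per dollar spent per hour."""
    offer = OFFERS[gpu]
    return offer["fp16_tflops"] / offer["price_per_hour"]


if __name__ == "__main__":
    for gpu in OFFERS:
        print(f"{gpu}: {tflops_per_dollar_hour(gpu):.0f} peak FP16 TFLOPS per $/hr")
```

At these particular prices the L40S offers more peak compute per dollar (roughly 658 versus 367), so the L4's advantage is its lower absolute cost and power draw rather than its price-performance.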
When to Choose the L4
The L4 suits budget-conscious inference and light workloads. Its 72W TDP enables high-density server configurations, and cloud pricing from about $0.33 per GPU-hour in the listings above keeps costs low for always-on serving. Deploy it for edge AI or small-batch LLM inference where 24 GB VRAM and 121 TFLOPS FP16 suffice without overprovisioning.
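For always-on serving, the hourly rate is easier to reason about as a monthly bill. This is a minimal sketch assuming 730 hours per month (8,760 hours per year divided by 12) and the $0.33 L4 listing from the pricing table; the function and constant names are illustrative, and real invoices may add storage or network charges.

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 hours / 12 months


def monthly_cost(price_per_gpu_hour: float, gpu_count: int = 1) -> float:
    """Cost of running `gpu_count` GPUs continuously for one month."""
    return price_per_gpu_hour * gpu_count * HOURS_PER_MONTH


if __name__ == "__main__":
    # $0.33/GPU/hr is the lowest L4 listing in the pricing table.
    print(f"1x L4 always on: ${monthly_cost(0.33):.2f}/month")
    print(f"4x L4 always on: ${monthly_cost(0.33, gpu_count=4):.2f}/month")
```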
Power-limited environments favor the L4. Cooling requirements stay low, and the single-slot, low-profile PCIe card fits servers that cannot host a 350W accelerator, which makes it a good choice for prototyping or non-critical tasks.
When to Choose the L40S
The L40S is the stronger choice for demanding training and large-model inference. With 48 GB VRAM and 362 TFLOPS FP16, it handles models and batches that do not fit in the L4's 24 GB. Its 864 GB/s bandwidth keeps large batches fed, shortening training times.
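Whether a model fits in 24 GB or 48 GB can be estimated from its parameter count and numeric precision. The sketch below is a rough check for inference only; the 1.2x overhead factor for KV cache and activations is an assumption chosen for illustration, and the example model sizes are hypothetical.

```python
VRAM_GB = {"L4": 24.0, "L40S": 48.0}  # from the specifications table


def estimated_vram_gb(
    params_billion: float, bytes_per_param: float, overhead: float = 1.2
) -> float:
    """Weights size in GB times an assumed overhead factor for runtime state."""
    return params_billion * bytes_per_param * overhead


def fits(gpu: str, params_billion: float, bytes_per_param: float) -> bool:
    """True if the estimated footprint is within the GPU's VRAM."""
    return estimated_vram_gb(params_billion, bytes_per_param) <= VRAM_GB[gpu]


if __name__ == "__main__":
    # (label, billions of parameters, bytes per parameter)
    examples = [("7B FP16", 7.0, 2.0), ("13B FP16", 13.0, 2.0), ("70B 4-bit", 70.0, 0.5)]
    for label, params, width in examples:
        need = estimated_vram_gb(params, width)
        print(f"{label}: ~{need:.1f} GB, L4={fits('L4', params, width)}, "
              f"L40S={fits('L40S', params, width)}")
```

Under these assumptions a 13B-parameter FP16 model already exceeds the L4's 24 GB while fitting comfortably on the L40S, which matches the 24 GB threshold described above.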
High-performance computing also favors the L40S. FP32 at 91 TFLOPS accelerates simulations, and listed prices of $0.55 to $0.88 per GPU-hour above are a modest premium for the throughput gains in production pipelines.
Use Cases
For large-scale training, the L40S's 48 GB VRAM and 362 TFLOPS FP16 handle large batches effectively, while the L4's 24 GB limits the model sizes it can train.
For quantized inference serving, the L40S's 724 TFLOPS FP8 supports high-volume workloads, and its 864 GB/s bandwidth sustains more concurrent requests than the L4's 300 GB/s.
For fine-tuning mid-sized models, the L40S's 91 TFLOPS FP32 and doubled VRAM speed up parameter updates; the L4's 30.3 TFLOPS FP32 limits throughput.
For image generation, the L4's 24 GB VRAM and 121 TFLOPS FP16 suffice for standard outputs, while the L40S's 48 GB suits high-resolution or batched workflows.
For simulations and other HPC tasks, the L40S's 91 TFLOPS FP32 outperforms the L4's 30.3 TFLOPS, and its greater bandwidth helps data-intensive work.
Frequently Asked Questions
Which GPU has higher performance for AI training?
The L40S provides 362 TFLOPS FP16 and 91 TFLOPS FP32, roughly three times the L4's 121 TFLOPS FP16 and 30.3 TFLOPS FP32. This gap shortens training cycles for deep learning models.
How do VRAM capacities compare between L4 and L40S?
The L40S offers 48 GB of GDDR6, double the L4's 24 GB of GDDR6. The larger VRAM on the L40S accommodates bigger models without offloading.
What are the cloud rental prices for these GPUs?
In the listings on this page, L4 offers start at $0.33 per GPU-hour and L40S offers start at $0.55 per GPU-hour. See the Live Cloud Pricing section for current rates.
Which has better memory bandwidth?
L40S achieves 864 GB/s, nearly three times the L4's 300 GB/s. This improves data throughput for large batch inference.
What is the power consumption difference?
The L4 has a 72W TDP, which favors efficiency and density. The L40S has a 350W TDP, nearly five times higher, in exchange for higher sustained performance.
Are both GPUs suitable for inference?
Yes, but L40S's 724 TFLOPS FP8 excels in high-volume serving. L4's 242 TFLOPS FP8 fits lighter loads at lower cost.
Which is cheaper to rent, the L4 or the L40S?
Cloud rental prices for both the L4 and L40S vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L4 have compared to the L40S?
The L4 has 24 GB of GDDR6 memory. The L40S has 48 GB of GDDR6 memory.
Can I find L4 and L40S GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the L4 and the L40S?
Both GPUs use the Ada Lovelace architecture and date from 2023; they differ in scale. The L40S delivers about 3.0x the FP16 throughput and 2.9x the memory bandwidth of the L4, with twice the VRAM, at a 350W TDP versus 72W.



