Specifications Compared
| Spec | L40S | RTX 4090 |
|---|---|---|
| TDP | 350W | 450W |
| VRAM | 48 GB | 24 GB |
| CUDA Cores | 18,176 | 16,384 |
| Memory Type | GDDR6 (ECC) | GDDR6X |
| Architecture | Ada Lovelace | Ada Lovelace |
| Form Factors | PCIe | PCIe |
| Interconnect | PCIe 4.0 | PCIe 4.0 |
| Tensor Cores | 568 | 512 |
| FP8 Tensor Performance | 724 TFLOPS | 660 TFLOPS |
| FP16 Tensor Performance | 362 TFLOPS | 165 TFLOPS |
| FP32 Performance | 91 TFLOPS | 82.6 TFLOPS |
| FP64 Performance | 1.4 TFLOPS | 1.3 TFLOPS |
| INT8 Tensor Performance | 724 TOPS | 660 TOPS |
| Memory Bandwidth | 864 GB/s | 1,008 GB/s |
Performance Analysis
The L40S leads in FP16 tensor throughput, 362 TFLOPS versus the RTX 4090's 165 TFLOPS, which accelerates mixed-precision training where models run most of their math in half precision; on compute-bound workloads the gap shortens wall-clock training time for large neural networks. Its FP32 rate of 91 TFLOPS edges out the RTX 4090's 82.6 TFLOPS, a modest benefit for single-precision scientific simulations. The L40S's 48 GB of VRAM, double the RTX 4090's 24 GB, sustains larger batch sizes in LLM training and reduces the overhead of offloading model state that the 24 GB limit forces on the RTX 4090.
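To make the VRAM argument concrete, here is a rough sketch of how much training state fits on each card. It assumes a common rule of thumb of about 16 bytes per parameter for mixed-precision training with Adam (FP16 weights and gradients, FP32 master weights, and two FP32 optimizer moments) and ignores activations, so real limits are lower. `BYTES_PER_PARAM` and both helper functions are illustrative, not taken from any framework.

```python
# Assumed rule of thumb (not a measured figure): bytes of model/optimizer state
# per parameter for mixed-precision training with Adam:
#   FP16 weights (2) + FP16 gradients (2) + FP32 master weights (4)
#   + FP32 Adam first moment (4) + FP32 Adam second moment (4) = 16 bytes.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4

# VRAM capacities from the specification table, in decimal GB.
VRAM_GB = {"L40S": 48, "RTX 4090": 24}


def training_state_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Training state in GB (1 GB = 1e9 bytes), excluding activations and overhead."""
    return num_params * bytes_per_param / 1e9


def max_trainable_params(vram_gb: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Largest parameter count whose training state alone fits in vram_gb."""
    return vram_gb * 1e9 / bytes_per_param


if __name__ == "__main__":
    for gpu, vram in VRAM_GB.items():
        billions = max_trainable_params(vram) / 1e9
        print(f"{gpu}: about {billions:.1f}B parameters before activations")
```

Under these assumptions the L40S holds roughly 3B parameters of full training state versus about 1.5B on the RTX 4090; parameter-efficient fine-tuning or optimizer offloading shifts these limits considerably.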
Memory bandwidth favors the RTX 4090 at 1,008 GB/s versus 864 GB/s, which improves throughput in bandwidth-bound inference such as small-batch LLM token generation. For FP8 inference, the L40S's 724 TFLOPS exceeds the RTX 4090's 660 TFLOPS, helping serve quantized models at scale. The L40S's lower 350 W TDP, versus 450 W, eases dense packing in cloud servers, though real-world efficiency depends on how memory-intensive the workload is.
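The bandwidth-versus-compute trade-off can be sketched with a simple roofline model. This is a back-of-the-envelope illustration using the spec-sheet peaks from the table; the helper names and the hypothetical 7B-parameter FP16 model (14 GB of weights) are assumptions, and real kernels rarely reach peak.

```python
# Peak figures from the specification table: FP16 tensor TFLOPS and memory GB/s.
SPECS = {
    "L40S": {"fp16_tflops": 362.0, "bandwidth_gbs": 864.0},
    "RTX 4090": {"fp16_tflops": 165.0, "bandwidth_gbs": 1008.0},
}


def ridge_point(fp16_tflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOPs per byte) where the roofline switches
    from bandwidth-bound to compute-bound."""
    return (fp16_tflops * 1e12) / (bandwidth_gbs * 1e9)


def attainable_tflops(intensity: float, fp16_tflops: float, bandwidth_gbs: float) -> float:
    """Roofline estimate in TFLOPS: min(peak compute, intensity * bandwidth)."""
    return min(fp16_tflops, intensity * bandwidth_gbs * 1e9 / 1e12)


def decode_tokens_per_s_bound(bandwidth_gbs: float, weights_gb: float) -> float:
    """Upper bound on batch-1 decode speed if each token streams all weights once
    (ignores KV-cache reads and kernel overheads)."""
    return bandwidth_gbs / weights_gb


if __name__ == "__main__":
    weights_gb = 7e9 * 2 / 1e9  # hypothetical 7B-parameter model in FP16 = 14 GB
    for gpu, s in SPECS.items():
        ridge = ridge_point(**s)
        tps = decode_tokens_per_s_bound(s["bandwidth_gbs"], weights_gb)
        print(f"{gpu}: ridge ~{ridge:.0f} FLOPs/byte, batch-1 decode <= {tps:.0f} tokens/s")
```

A kernel below the ridge point (about 419 FLOPs/byte on the L40S and 164 on the RTX 4090) is bandwidth-bound. Batch-1 decoding reads every weight once per token at very low arithmetic intensity, which is why it tends to favor the RTX 4090's higher bandwidth.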
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48 GB | Not listed | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48 GB | 16 vCPU, 94 GB RAM | Global | $0.86/GPU/hr | Not listed |
| Massed Compute | NVIDIA L40S | 48 GB | 12 vCPU, 72 GB RAM, 625 GB storage | Iowa | $0.88/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40S | 48 GB per GPU | 24 vCPU, 144 GB RAM, 1,250 GB storage | Iowa | $0.88/GPU/hr ($1.76/hr total) | Available |
RTX 4090
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA GeForce RTX 4090 | 24 GB | Not listed | Chubbuck, Idaho | $0.39/GPU/hr | Available |
| Vast.ai | NVIDIA GeForce RTX 4090 | 24 GB | 32 vCPU, 101 GB RAM, 152 GB storage | Iceland | $0.40/GPU/hr | Available |
| TensorDock | NVIDIA GeForce RTX 4090 | 24 GB | Not listed | Orlando, Florida | $0.48/GPU/hr | Available |
| Vast.ai | NVIDIA GeForce RTX 4090 | 24 GB | 32 vCPU, 101 GB RAM, 108 GB storage | Iceland | $0.53/GPU/hr | Available |
| Vast.ai | 4× NVIDIA GeForce RTX 4090 | 24 GB per GPU | 80 vCPU, 157 GB RAM, 856 GB storage | United Kingdom | $0.67/GPU/hr ($2.67/hr total) | Available |
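One way to compare listed offers across cards of different size is to normalize price by capacity and by peak compute. This sketch hard-codes the lowest per-GPU prices shown in the tables, which are a snapshot that will change; the dictionary and function names are illustrative.

```python
# Lowest per-GPU hourly prices in the pricing tables (a snapshot; live prices change).
LOWEST_PRICE_PER_GPU_HR = {"L40S": 0.55, "RTX 4090": 0.39}

# Capacity and FP16 tensor throughput from the specification table.
VRAM_GB = {"L40S": 48, "RTX 4090": 24}
FP16_TFLOPS = {"L40S": 362.0, "RTX 4090": 165.0}


def price_per_gb_hour(gpu: str) -> float:
    """Dollars per GB of VRAM per hour at the listed price."""
    return LOWEST_PRICE_PER_GPU_HR[gpu] / VRAM_GB[gpu]


def price_per_tflop_hour(gpu: str) -> float:
    """Dollars per peak FP16 TFLOP per hour at the listed price."""
    return LOWEST_PRICE_PER_GPU_HR[gpu] / FP16_TFLOPS[gpu]


if __name__ == "__main__":
    for gpu in LOWEST_PRICE_PER_GPU_HR:
        print(
            f"{gpu}: ${price_per_gb_hour(gpu):.4f}/GB-hr, "
            f"${price_per_tflop_hour(gpu):.5f}/TFLOP-hr"
        )
```

At these particular listed prices, the L40S works out cheaper both per GB of VRAM and per peak FP16 TFLOP; with the higher L40S averages cited elsewhere on this page, the comparison flips.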
When to Choose the L40S
The L40S excels in enterprise deployments that need 48 GB of VRAM: training or fine-tuning models whose memory footprint exceeds 24 GB fits on a single card instead of being split across several. Its stronger FP16 (362 TFLOPS) and FP32 (91 TFLOPS) throughput handles compute-intensive tasks efficiently, though it rents for more per hour (an average of $1.66 across three offers when these figures were compiled).
When to Choose the RTX 4090
The RTX 4090 suits budget-conscious users, averaging $0.39 per hour across 75 offers when these figures were compiled. Workloads such as Stable Diffusion or small-batch inference run well within its 24 GB of VRAM and 1,008 GB/s of bandwidth. Its wider availability makes it a good fit for prototyping or high-volume parallel jobs where 165 TFLOPS of FP16 suffices.
Use Cases
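Training large models: The L40S's 48 GB of VRAM and 362 TFLOPS of FP16 support larger models and batch sizes without offloading. The RTX 4090's 24 GB limits how far a single card can scale on large models.
Inference serving: The RTX 4090's 1,008 GB/s bandwidth helps high-throughput serving of models that fit in 24 GB. The L40S's 724 TFLOPS of FP8 handles larger quantized models efficiently.
Fine-tuning: The L40S's 91 TFLOPS of FP32 and 48 GB of VRAM speed up parameter-efficient fine-tuning of large models. On the RTX 4090, memory-heavy adapter setups run into the 24 GB limit.
Image generation: The RTX 4090's 24 GB of VRAM and 1,008 GB/s bandwidth generate images quickly from around $0.39 per hour. The L40S is overkill at typical resolutions.
Scientific simulation: The RTX 4090's 82.6 TFLOPS of FP32 and lower hourly cost (from $0.27 per hour when these figures were compiled) suit single-precision simulations, where the L40S's extra memory and tensor throughput are often unnecessary. Neither card suits FP64-heavy HPC, since both deliver under 1.5 TFLOPS of FP64.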
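Whether a quantized model fits on one card is mostly weight-size arithmetic. The sketch below assumes 2 bytes per weight for FP16 and 1 byte for FP8, and reserves a hypothetical 20% of VRAM for KV cache, activations, and runtime overhead; the model sizes are illustrative, not benchmarks.

```python
# Bytes per weight for each format (FP16 = 2 bytes, FP8 = 1 byte).
BYTES_PER_WEIGHT = {"fp16": 2.0, "fp8": 1.0}

# VRAM capacities from the specification table, in decimal GB.
VRAM_GB = {"L40S": 48.0, "RTX 4090": 24.0}


def weights_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in decimal GB for a model with params_billion parameters."""
    return params_billion * BYTES_PER_WEIGHT[dtype]


def fits(params_billion: float, dtype: str, vram_gb: float, headroom: float = 0.2) -> bool:
    """True if the weights fit while reserving `headroom` (an assumed fraction)
    of VRAM for KV cache, activations, and runtime overhead."""
    return weights_gb(params_billion, dtype) <= vram_gb * (1.0 - headroom)


if __name__ == "__main__":
    for size in (7, 13, 30):  # hypothetical model sizes, in billions of parameters
        for dtype in ("fp16", "fp8"):
            verdict = {gpu: fits(size, dtype, v) for gpu, v in VRAM_GB.items()}
            print(f"{size}B {dtype}: {verdict}")
```

With these assumptions, a 30B-parameter model in FP8 (about 30 GB of weights) fits on the L40S but not on the RTX 4090, and a 13B model in FP16 (about 26 GB) likewise needs the larger card.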
Frequently Asked Questions
Which has more VRAM, L40S or RTX 4090?
The L40S provides 48 GB of GDDR6 VRAM with ECC, twice the RTX 4090's 24 GB of GDDR6X. The extra capacity suits large-model training, while the RTX 4090's 24 GB suffices for inference on models whose weights and KV cache fit within it.
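What is the FP16 performance difference?
The L40S delivers 362 TFLOPS of FP16 tensor throughput, more than double the RTX 4090's 165 TFLOPS. This speeds up compute-bound mixed-precision training. Large-batch inference sees similar gains, while small-batch inference is often limited by memory bandwidth, where the RTX 4090 leads.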
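How do cloud prices compare?
When these figures were compiled, RTX 4090 rentals started at $0.27 per hour and averaged $0.39 across 75 offers, while L40S rentals started at $1.65 and averaged $1.66 across three offers. Prices change constantly, so the live tables may show different rates. For many buyers, hourly cost is the deciding factor.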
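Hourly price alone can mislead when one card finishes a job faster. The sketch below estimates per-job cost under a simplifying, hypothetical assumption that runtime scales inversely with peak FP16 tensor throughput, which holds only for compute-bound work; it uses the snapshot average prices quoted in this answer, and all names are illustrative.

```python
# Peak FP16 tensor TFLOPS from the specification table.
FP16_TFLOPS = {"L40S": 362.0, "RTX 4090": 165.0}

# Snapshot average hourly prices quoted in this FAQ answer (they change often).
AVG_PRICE_PER_HR = {"L40S": 1.66, "RTX 4090": 0.39}


def job_hours(baseline_hours: float, baseline_gpu: str, gpu: str) -> float:
    """Runtime on `gpu`, assuming runtime scales inversely with peak FP16 TFLOPS."""
    return baseline_hours * FP16_TFLOPS[baseline_gpu] / FP16_TFLOPS[gpu]


def job_cost(baseline_hours: float, baseline_gpu: str, gpu: str, price_per_hr: float) -> float:
    """Estimated dollar cost of the job on `gpu` at `price_per_hr`."""
    return job_hours(baseline_hours, baseline_gpu, gpu) * price_per_hr


def break_even_price(reference_gpu: str, reference_price: float, gpu: str) -> float:
    """Hourly price at which `gpu` matches `reference_gpu` on per-job cost."""
    return reference_price * FP16_TFLOPS[gpu] / FP16_TFLOPS[reference_gpu]


if __name__ == "__main__":
    for gpu, price in AVG_PRICE_PER_HR.items():
        cost = job_cost(10, "RTX 4090", gpu, price)
        print(f"{gpu}: ${cost:.2f} for a job taking 10 h on the RTX 4090")
    print(f"L40S break-even: ${break_even_price('RTX 4090', 0.39, 'L40S'):.2f}/hr")
```

For a job that takes 10 hours on the RTX 4090, this model gives about $3.90 on the RTX 4090 versus about $7.57 on the L40S at $1.66 per hour; the L40S only becomes cheaper per job below roughly $0.86 per hour.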
Which has higher memory bandwidth?
The RTX 4090 offers 1,008 GB/s, about 17% more than the L40S's 864 GB/s. Bandwidth helps data-heavy tasks such as image processing and small-batch LLM inference. The L40S offsets this with twice the VRAM capacity.
What are the TDP ratings?
The L40S is rated at 350 W TDP, versus 450 W for the RTX 4090. The lower rating allows more GPUs per server or rack in multi-GPU cloud deployments, though actual power draw varies by workload.
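The density argument is simple arithmetic on the TDP ratings. This sketch assumes a hypothetical 3,000 W GPU power budget per server and ignores CPU, memory, and cooling power, as well as the fact that GPUs rarely sit at full TDP.

```python
import math

# TDP ratings from the specification table, in watts.
TDP_W = {"L40S": 350, "RTX 4090": 450}

BUDGET_W = 3000  # hypothetical per-server GPU power budget


def gpus_per_budget(budget_w: float, tdp_w: float) -> int:
    """How many GPUs fit in a power budget at full TDP (host power ignored)."""
    return math.floor(budget_w / tdp_w)


def kwh_per_day(tdp_w: float) -> float:
    """Energy in kWh for one GPU running at full TDP for 24 hours."""
    return tdp_w * 24 / 1000


if __name__ == "__main__":
    for gpu, tdp in TDP_W.items():
        print(f"{gpu}: {gpus_per_budget(BUDGET_W, tdp)} GPUs per {BUDGET_W} W, "
              f"{kwh_per_day(tdp):.1f} kWh/day each")
```

Under that budget, eight L40S cards fit versus six RTX 4090s, and a full day at TDP uses 8.4 kWh versus 10.8 kWh per GPU.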
Are both on Ada Lovelace?
Yes. Both use NVIDIA's Ada Lovelace architecture; the RTX 4090 launched in 2022 and the L40S in 2023. The shared architecture gives them the same generation of tensor cores, including FP8 support, while the L40S is tuned for data-center deployment.
Which is cheaper to rent, the L40S or the RTX 4090?
Cloud rental prices for both the L40S and RTX 4090 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L40S have compared to the RTX 4090?
The L40S has 48 GB of GDDR6 memory with ECC. The RTX 4090 has 24 GB of GDDR6X memory.
Can I find L40S and RTX 4090 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the L40S and the RTX 4090?
Both use the Ada Lovelace architecture; the L40S launched in 2023 and the RTX 4090 in 2022. The main differences are memory and throughput: the RTX 4090 has half the VRAM (24 GB versus 48 GB), about 0.46x the FP16 tensor throughput, and about 1.17x the memory bandwidth of the L40S.
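The relative figures in this answer come from dividing each RTX 4090 specification by the matching L40S value in the specification table. A minimal sketch of that arithmetic, using an illustrative `SPECS` dictionary:

```python
# Selected values from the specification table.
SPECS = {
    "fp16_tflops": {"L40S": 362.0, "RTX 4090": 165.0},
    "bandwidth_gbs": {"L40S": 864.0, "RTX 4090": 1008.0},
    "vram_gb": {"L40S": 48.0, "RTX 4090": 24.0},
    "tdp_w": {"L40S": 350.0, "RTX 4090": 450.0},
}


def ratio(metric: str) -> float:
    """RTX 4090 value divided by the L40S value for one metric."""
    return SPECS[metric]["RTX 4090"] / SPECS[metric]["L40S"]


if __name__ == "__main__":
    for metric in SPECS:
        print(f"{metric}: RTX 4090 = {ratio(metric):.2f}x L40S")
```

Values below 1 mean the RTX 4090 has less of that resource: the FP16 ratio is about 0.46 and the VRAM ratio exactly 0.5, while bandwidth (about 1.17) and TDP (about 1.29) are higher on the RTX 4090.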



