Specifications Compared
| Spec | L40S | RTX 3090 |
|---|---|---|
| TDP | 350W | 350W |
| VRAM | 48 GB | 24 GB |
| CUDA Cores | 18,176 | 10,496 |
| Memory Type | GDDR6 | GDDR6X |
| Architecture | Ada Lovelace | Ampere |
| Form Factors | PCIe | PCIe |
| Interconnect | PCIe 4.0 (no NVLink) | PCIe 4.0 + 2-way NVLink bridge |
| Tensor Cores | 568 | 328 |
| FP8 Performance | 724 TFLOPS | Not supported |
| FP16 Performance | 362 TFLOPS (Tensor Core) | 35.6 TFLOPS (non-Tensor) |
| FP32 Performance | 91 TFLOPS | 35.6 TFLOPS |
| FP64 Performance | 1.4 TFLOPS | Not listed |
| INT8 Performance | 724 TOPS | Not listed |
| Memory Bandwidth | 864 GB/s | 936 GB/s |
Performance Analysis
The L40S leads in compute throughput. Its 362 TFLOPS FP16 rating is roughly ten times the RTX 3090's 35.6 TFLOPS, but that headline ratio sets the L40S's Tensor Core figure against the RTX 3090's non-Tensor figure, so real mixed-precision training speedups are smaller. At FP32, 91 TFLOPS versus 35.6 TFLOPS gives the L40S about a 2.6x edge in single-precision work; neither card targets double-precision simulation, and the L40S is rated at only 1.4 TFLOPS FP64. FP8 at 724 TFLOPS, which the RTX 3090 lacks, accelerates inference for quantized models and can cut serving latency.
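The headline ratios in this comparison can be reproduced directly from the spec table. A minimal Python sketch follows; the dictionary and function names are illustrative. Keep in mind that the FP16 ratio divides a Tensor Core rate by a non-Tensor rate, so it is a spec-sheet ratio, not a measured training speedup.

```python
# Spec-sheet figures copied from the comparison table (TFLOPS, GB/s).
L40S = {"fp16": 362.0, "fp32": 91.0, "bandwidth": 864.0}
RTX_3090 = {"fp16": 35.6, "fp32": 35.6, "bandwidth": 936.0}


def spec_ratio(metric: str) -> float:
    """Return the L40S / RTX 3090 ratio for one metric, rounded to two decimals."""
    return round(L40S[metric] / RTX_3090[metric], 2)


for metric in ("fp16", "fp32", "bandwidth"):
    # A ratio below 1.0 means the RTX 3090 is ahead on that metric.
    print(f"{metric}: {spec_ratio(metric)}x")
```

The bandwidth ratio comes out below 1.0 because the RTX 3090 has the faster memory.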
Memory capacity is often the deciding factor in practice. The L40S's 48 GB of VRAM is double the RTX 3090's 24 GB, which leaves room for larger batches and longer contexts and reduces out-of-memory errors during fine-tuning and inference. A 20-billion-parameter model needs about 40 GB for its FP16 weights alone, which fits on the L40S but not on the RTX 3090 without 8-bit or lower quantization. The RTX 3090's bandwidth advantage (936 GB/s versus 864 GB/s, about 8%) is real but modest, and it stops mattering once VRAM capacity becomes the binding limit.
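To see why models around 20B parameters are the dividing line, here is a rough weights-only estimate. It assumes 2 bytes per parameter at FP16, 1 at INT8 and 0.5 at 4-bit, and 1 GB = 10^9 bytes. Activations, KV cache and framework overhead are deliberately ignored, so a "fits" result is a necessary condition, not a guarantee.

```python
# Assumed storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(num_params: float, precision: str) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Excludes activations, KV cache, CUDA context and framework overhead,
    so real usage is higher.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits(num_params: float, precision: str, vram_gb: float) -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return weight_gb(num_params, precision) <= vram_gb


for precision in ("fp16", "int8", "int4"):
    gb = weight_gb(20e9, precision)
    print(
        f"20B @ {precision}: {gb:.0f} GB | "
        f"L40S 48 GB: {fits(20e9, precision, 48)} | "
        f"RTX 3090 24 GB: {fits(20e9, precision, 24)}"
    )
```

At INT8 the 20 GB of weights technically fit in 24 GB, but with little headroom left for the KV cache or batching.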
Interconnect differences matter in multi-GPU setups. The L40S has no NVLink, so multi-GPU L40S servers exchange data over PCIe 4.0 (and over the network between nodes). The RTX 3090 supports a 2-way NVLink bridge, which gives a pair of cards a faster peer-to-peer path for small-scale distributed training.
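For data-parallel training, a back-of-the-envelope gradient synchronization cost can be estimated from the standard ring all-reduce volume: each GPU sends 2(N-1)/N times the payload. The sketch below assumes about 31.5 GB/s per direction for PCIe 4.0 x16 (the theoretical link rate, before protocol overhead) and a hypothetical 7B-parameter model with FP16 gradients. For an NVLink-bridged RTX 3090 pair, pass the bridge's rated bandwidth instead. Results are idealized lower bounds.

```python
# Theoretical PCIe 4.0 x16 rate per direction in GB/s (assumption; real P2P is lower).
PCIE4_X16_GBPS = 31.5


def ring_allreduce_seconds(payload_gb: float, num_gpus: int, link_gbps: float) -> float:
    """Idealized ring all-reduce time in seconds.

    Each GPU sends 2 * (N - 1) / N * payload over full-duplex links running
    at link_gbps, with no latency or protocol overhead, so this is a lower bound.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    sent_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return sent_gb / link_gbps


# Hypothetical 7B-parameter model: FP16 gradients are 7e9 * 2 bytes = 14 GB.
grads_gb = 7e9 * 2 / 1e9
t = ring_allreduce_seconds(grads_gb, 2, PCIE4_X16_GBPS)
print(f"2 GPUs over PCIe 4.0: at least {t:.2f} s per gradient sync")
```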
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S 48GB VRAM | 48GB | 0 vCPU 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S 48GB VRAM | 48GB | 16 vCPU 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40S 48GB VRAM | 48GB | 12 vCPU 72GB RAM 625GB Storage | Iowa | $0.88/GPU/hr | Available |
| Massed Compute | 2×NVIDIA L40S 48GB VRAM | 48GB per GPU | 24 vCPU 144GB RAM 1250GB Storage | Iowa | $0.88/GPU/hr ($1.76/hr total for 2×) | Available |
RTX 3090
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA GeForce RTX 3090 24GB VRAM | 24GB | 0 vCPU 0GB RAM | Wilmington, Delaware | $0.20/GPU/hr | Available |
| TensorDock | NVIDIA GeForce RTX 3090 24GB VRAM | 24GB | 0 vCPU 0GB RAM | Dallas, Texas | $0.21/GPU/hr | Available |
| Vast.ai | 4×NVIDIA GeForce RTX 3090 24GB VRAM | 24GB per GPU | 32 vCPU 403GB RAM 104GB Storage | Iceland | $0.25/GPU/hr ($1.01/hr total for 4×) | Available |
| Vast.ai | 4×NVIDIA GeForce RTX 3090 24GB VRAM | 24GB per GPU | 32 vCPU 252GB RAM 1440GB Storage | Finland | $0.27/GPU/hr ($1.07/hr total for 4×) | Available |
| LeaderGPU | 8×NVIDIA GeForce RTX 3090 24GB VRAM | 24GB per GPU | 64 vCPU 384GB RAM 2000GB Storage | Netherlands | $0.29/GPU/hr ($2.29/hr total for 8×) | Available |
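Raw hourly prices hide how much you get per dollar. This sketch normalizes the cheapest single-GPU offer listed for each card ($0.55/GPU/hr for the L40S, $0.20/GPU/hr for the RTX 3090) by VRAM and by spec-sheet FP16 throughput. The `Offer` class and helper names are hypothetical, and the prices are a snapshot that will drift.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    gpu: str
    price_per_gpu_hr: float  # USD, taken from the live-pricing snapshot
    vram_gb: float
    fp16_tflops: float  # spec-sheet figure from the comparison table


def usd_per_vram_gb_hr(o: Offer) -> float:
    """Hourly price divided by VRAM: cost of one GB of GPU memory per hour."""
    return o.price_per_gpu_hr / o.vram_gb


def usd_per_fp16_tflops_hr(o: Offer) -> float:
    """Hourly price divided by headline FP16 throughput."""
    return o.price_per_gpu_hr / o.fp16_tflops


offers = [
    Offer("L40S", 0.55, 48, 362.0),
    Offer("RTX 3090", 0.20, 24, 35.6),
]

for o in offers:
    print(
        f"{o.gpu}: ${usd_per_vram_gb_hr(o):.4f} per GB-hr, "
        f"${usd_per_fp16_tflops_hr(o):.5f} per FP16 TFLOPS-hr"
    )
```

With these inputs the RTX 3090 is cheaper per GB of VRAM, while the L40S is cheaper per headline FP16 TFLOPS. Because the FP16 figures compare Tensor Core and non-Tensor rates, the second metric flatters the L40S.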
When to Choose the L40S
The L40S excels in demanding AI workloads that need large VRAM and high throughput: training or fine-tuning large language models benefits from 48 GB of GDDR6 and 362 TFLOPS of FP16 Tensor Core throughput, which allow larger batches than a 24 GB card can hold. FP8 at 724 TFLOPS makes it well suited to high-throughput inference servers running quantized models at scale.
Datacenter deployments favor the L40S for its Ada Lovelace efficiency, despite an average price of about $1.10 per hour, when projects need 91 TFLOPS of FP32 for generative AI or single-precision scientific computing.
When to Choose the RTX 3090
The RTX 3090 suits budget-conscious users with moderate workloads: a starting price of $0.08 per hour across 52 live offers (when these aggregates were collected) makes it a cost-effective choice for prototyping or small-scale fine-tuning where 24 GB of VRAM suffices.
Its 936 GB/s memory bandwidth, about 8% higher than the L40S's, gives a small edge in bandwidth-sensitive tasks such as Stable Diffusion image generation or memory-bound simulations, and the 2-way NVLink bridge provides a fast peer-to-peer link between two cards without the L40S's price premium.
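Whether the pricier card is cheaper per job depends on how much faster it finishes. Since job cost is runtime times hourly rate, the L40S breaks even when its speedup equals the price ratio. A minimal sketch using the average prices quoted on this page ($1.10/hr for the L40S, $0.41/hr for the RTX 3090) follows; the 10-hour job and the 2x and 4x speedups are hypothetical inputs, not benchmarks.

```python
def job_cost(baseline_hours: float, price_per_hr: float, speedup: float = 1.0) -> float:
    """Cost of one job: runtime (baseline hours divided by speedup) times hourly price."""
    return baseline_hours / speedup * price_per_hr


def break_even_speedup(price_fast: float, price_slow: float) -> float:
    """Speedup at which the pricier GPU's job cost equals the cheaper GPU's.

    price_fast * T / s == price_slow * T  implies  s == price_fast / price_slow
    """
    return price_fast / price_slow


# Average hourly prices quoted on this page; treat them as a snapshot.
L40S_AVG, RTX_3090_AVG = 1.10, 0.41

s = break_even_speedup(L40S_AVG, RTX_3090_AVG)
print(f"L40S must be more than {s:.2f}x faster to cost less per job")

# Hypothetical 10-hour RTX 3090 job at assumed L40S speedups.
for speedup in (2.0, 4.0):
    print(
        f"assumed {speedup:.0f}x speedup: "
        f"L40S ${job_cost(10, L40S_AVG, speedup):.2f} vs "
        f"RTX 3090 ${job_cost(10, RTX_3090_AVG):.2f}"
    )
```

At these averages the L40S needs roughly a 2.7x speedup to win on cost; for jobs that do not fit in 24 GB at all, the comparison is moot.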
Use Cases
The L40S's 362 TFLOPS FP16 and 48 GB VRAM handle large models with bigger batches than the RTX 3090's 35.6 TFLOPS and 24 GB.
FP8 at 724 TFLOPS on the L40S accelerates quantized inference, paired with double the VRAM for high concurrency.
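91 TFLOPS FP32 and 48 GB VRAM make parameter-efficient fine-tuning (such as LoRA or QLoRA) of models around 20B parameters practical on a single card; full fine-tuning at that scale still needs far more memory than one GPU provides.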
The RTX 3090's 936 GB/s bandwidth speeds image generation, and its $0.41/hr average cost suits frequent creative workflows.
Both cards are rated at 350W TDP and offer solid FP32 throughput, 91 TFLOPS on the L40S and 35.6 TFLOPS on the RTX 3090; choose by budget and scale.
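Memory needs for fine-tuning depend heavily on the method. The sketch below uses a common rule of thumb (an assumption, not a measured figure): full fine-tuning with mixed-precision Adam takes about 16 bytes per parameter (FP16 weights and gradients, FP32 master weights, and two FP32 Adam moments), whereas QLoRA keeps the frozen base model at roughly 0.5 bytes per parameter in 4-bit form. Activations and adapter weights are excluded.

```python
# Rule-of-thumb bytes per parameter (assumption; activations excluded).
# fp16 weights + fp16 grads + fp32 master weights + Adam m + Adam v
FULL_FT_ADAM_MIXED = 2 + 2 + 4 + 4 + 4
# Frozen 4-bit base weights only; LoRA adapters and activations are extra.
QLORA_BASE_4BIT = 0.5


def finetune_gb(num_params: float, bytes_per_param: float) -> float:
    """Estimated memory in GB (1 GB = 1e9 bytes) for the given per-parameter cost."""
    return num_params * bytes_per_param / 1e9


params = 20e9  # hypothetical 20B-parameter model
print(f"Full fine-tune, mixed-precision Adam: {finetune_gb(params, FULL_FT_ADAM_MIXED):.0f} GB")
print(f"QLoRA 4-bit base weights: {finetune_gb(params, QLORA_BASE_4BIT):.0f} GB")
```

For a 20B model this gives about 320 GB for full fine-tuning, far beyond a single 48 GB card, versus about 10 GB of base weights under QLoRA.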
Frequently Asked Questions
Which GPU has more VRAM, L40S or RTX 3090?
The L40S provides 48 GB of GDDR6 VRAM, double the RTX 3090's 24 GB of GDDR6X. This lets larger models fit on a single GPU for AI tasks, and batch sizes can grow without spilling into system memory.
How does FP16 performance compare between the L40S and RTX 3090?
The L40S is rated at 362 TFLOPS FP16 (Tensor Core) versus 35.6 TFLOPS for the RTX 3090 (non-Tensor). Mixed-precision training runs faster and inference latency drops on the L40S, though the real-world gap is smaller than this roughly 10x spec-sheet ratio suggests.
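What is the cloud pricing for these GPUs?
The L40S starts at $0.40/hr and averages $1.10/hr across 18 offers; the RTX 3090 starts at $0.08/hr and averages $0.41/hr across 52 offers. Prices track each card's performance tier, and the RTX 3090 is more widely available. These aggregates change constantly, so check the live tables for current rates.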
Does the L40S support FP8, and how does it compare?
Yes. The L40S reaches 724 TFLOPS FP8, while the RTX 3090's Ampere architecture has no FP8 Tensor Core support. FP8 boosts quantized inference throughput, which helps production LLM deployments scale.
Which has higher memory bandwidth?
The RTX 3090 leads with 936 GB/s versus 864 GB/s on the L40S, about 8% more. Bandwidth helps data-heavy tasks such as rendering and memory-bound LLM token generation, but in AI workloads VRAM capacity often matters more.
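For single-stream LLM text generation, memory bandwidth sets a hard ceiling: each new token requires reading every weight from VRAM once. The sketch below computes that ceiling for a hypothetical 7B-parameter model stored in FP16 (14 GB). It ignores KV-cache traffic and compute time, so real throughput will be lower, but the ratio between the two cards tracks their roughly 8% bandwidth gap.

```python
def max_decode_tokens_per_s(bandwidth_gbps: float, weight_gb: float) -> float:
    """Memory-bound upper bound for batch-size-1 LLM decoding.

    Every generated token streams all weights from VRAM once, so
    tokens/s <= bandwidth / weight bytes. KV-cache reads, compute time
    and kernel overheads are ignored.
    """
    return bandwidth_gbps / weight_gb


weights_gb = 7e9 * 2 / 1e9  # hypothetical 7B model in FP16 = 14 GB
for gpu, bw in (("L40S", 864.0), ("RTX 3090", 936.0)):
    print(f"{gpu}: at most {max_decode_tokens_per_s(bw, weights_gb):.1f} tokens/s")
```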
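Are both GPUs suitable for multi-GPU setups?
Yes, but they connect differently. The L40S relies on PCIe 4.0 and has no NVLink, while the RTX 3090 adds a 2-way NVLink bridge on top of PCIe 4.0. NVLink gives a pair of RTX 3090s a fast peer-to-peer link; PCIe 4.0 is the standard path for scaling L40S servers in the datacenter.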
Which is cheaper to rent, the L40S or the RTX 3090?
Per GPU-hour, the RTX 3090 is the cheaper of the two in every price figure on this page. Rates for both vary by provider, configuration, and availability; this page shows live pricing from 25+ providers updated every 60 seconds, so check the Live Cloud Pricing section for current rates.
How much VRAM does the L40S have compared to the RTX 3090?
The L40S has 48 GB of GDDR6 memory. The RTX 3090 has 24 GB of GDDR6X memory.
Can I find L40S and RTX 3090 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
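What is the main difference between the L40S and the RTX 3090?
The L40S (launched 2023) uses the Ada Lovelace architecture, while the RTX 3090 (launched 2020) uses Ampere. On spec sheets the L40S delivers about 10.2x the FP16 throughput (Tensor Core versus non-Tensor figures) and twice the VRAM, while the RTX 3090 has about 8% more memory bandwidth (936 GB/s versus 864 GB/s).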




