Specifications Compared
| Spec | L40S | V100 |
|---|---|---|
| TDP | 350W | 300W |
| VRAM | 48 GB | 16-32 GB |
| CUDA Cores | 18,176 | 5,120 |
| Memory Type | GDDR6 | HBM2 |
| Architecture | Ada Lovelace | Volta |
| Form Factors | PCIe | SXM2, PCIe |
| Interconnect | PCIe 4.0 | NVLink, PCIe 3.0 |
| Tensor Cores | 568 | 640 |
| FP8 Performance | 724 TFLOPS | N/A (no FP8 support on Volta) |
| FP16 Performance | 362 TFLOPS | 125 TFLOPS |
| FP32 Performance | 91 TFLOPS | 15.7 TFLOPS |
| FP64 Performance | 1.4 TFLOPS | 7.8 TFLOPS |
| INT8 Performance | 724 TOPS | Not listed |
| Memory Bandwidth | 864 GB/s | 900 GB/s |
Performance Analysis
The L40S leads in mixed-precision workloads: its FP16 rating of 362 TFLOPS is nearly triple the V100's 125 TFLOPS, which accelerates deep learning training and inference where half-precision is standard. The FP32 gap is even larger, at 91 TFLOPS for the L40S versus 15.7 TFLOPS for the V100, which benefits single-precision simulations and legacy code. FP64 is the exception: the V100's 7.8 TFLOPS is well ahead of the L40S's 1.4 TFLOPS, so double-precision scientific workloads still favor the V100.
VRAM capacity is the key differentiator: 48 GB on the L40S supports larger batch sizes and models that exceed the 16 GB of the V100 16GB (or the 32 GB of the larger variant), reducing the need for model parallelism. Bandwidth is similar at 864 GB/s versus 900 GB/s, so the V100 holds a slight edge in memory-bound tasks that fit within its capacity, while the L40S's extra capacity avoids sharding or offloading once a model no longer fits.
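To make the capacity argument concrete, here is a minimal sketch of a weights-only memory estimate. It assumes 2 bytes per parameter (FP16/BF16) and a hypothetical 20% reserve for activations and caches; neither figure comes from this page, and the function names are illustrative only. Real memory use during training is typically several times higher than the weights alone.

```python
# Rough VRAM check: do a model's weights fit on a given card?
# Assumptions (not from this page): weights only, 2 bytes per parameter
# (FP16/BF16), and a hypothetical 20% reserve for activations and caches.

VRAM_GB = {"L40S": 48, "V100 16GB": 16, "V100 32GB": 32}  # from the spec table


def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, vram_gb: float,
         bytes_per_param: int = 2, usable_fraction: float = 0.8) -> bool:
    """True if the weights fit within the usable share of VRAM."""
    return weights_gb(num_params, bytes_per_param) <= vram_gb * usable_fraction


if __name__ == "__main__":
    for params in (3e9, 7e9, 13e9):
        need = weights_gb(params)
        verdict = {gpu: fits(params, gb) for gpu, gb in VRAM_GB.items()}
        print(f"{params / 1e9:.0f}B params ~ {need:.0f} GB of weights: {verdict}")
```

Under these assumptions a 7-billion-parameter model in FP16 needs about 14 GB for weights, which leaves too little room on a 16 GB card but fits comfortably in 48 GB.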
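Power draw is close, at 350W TDP for the L40S and 300W for the V100. Because the L40S delivers far more peak FP16 and FP32 throughput for about 17% more power, its on-paper performance per watt is considerably higher for those precisions. The peak-spec ratios range from about 2.9x (FP16) to about 5.8x (FP32); actual speedups in frameworks like PyTorch for transformer-based models depend on the workload and may fall short of these peak figures.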
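The ratios discussed in this analysis can be recomputed from the spec table. The sketch below does that and adds a naive peak-TFLOPS-per-watt figure; the dictionary layout and function names are illustrative, and peak ratings are only an upper bound on real performance.

```python
# Recompute spec ratios and a naive performance-per-watt figure
# from the values in the spec table. Peak numbers only; real workloads differ.

SPECS = {
    "L40S": {"tdp_w": 350, "fp16_tflops": 362, "fp32_tflops": 91,
             "fp64_tflops": 1.4, "bandwidth_gbs": 864},
    "V100": {"tdp_w": 300, "fp16_tflops": 125, "fp32_tflops": 15.7,
             "fp64_tflops": 7.8, "bandwidth_gbs": 900},
}


def ratio(metric: str) -> float:
    """L40S value divided by V100 value for one metric."""
    return SPECS["L40S"][metric] / SPECS["V100"][metric]


def per_watt(gpu: str, metric: str) -> float:
    """Peak metric divided by TDP, e.g. TFLOPS per watt."""
    return SPECS[gpu][metric] / SPECS[gpu]["tdp_w"]


if __name__ == "__main__":
    for metric in ("fp16_tflops", "fp32_tflops", "fp64_tflops", "bandwidth_gbs"):
        print(f"{metric}: L40S/V100 = {ratio(metric):.2f}x")
    for gpu in SPECS:
        print(f"{gpu}: {per_watt(gpu, 'fp16_tflops'):.2f} peak FP16 TFLOPS per watt")
```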
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48GB | 0 vCPU, 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48GB | 16 vCPU, 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | 4× NVIDIA L40S | 48GB | 46 vCPU, 288GB RAM, 2500GB Storage | Iowa | $0.88/GPU/hr ($3.52/hr total, 4×) | Available |
| Massed Compute | 2× NVIDIA L40S | 48GB | 24 vCPU, 144GB RAM, 1250GB Storage | Iowa | $0.88/GPU/hr ($1.76/hr total, 2×) | Available |
| Massed Compute | NVIDIA L40S | 48GB | 12 vCPU, 72GB RAM, 625GB Storage | Iowa | $0.88/GPU/hr | Available |
Tesla V100 16GB
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA Tesla V100 16GB | 16GB | 0 vCPU, 0GB RAM | Texas | $0.19/GPU/hr | Available |
| TensorDock | NVIDIA Tesla V100 16GB | 16GB | 0 vCPU, 0GB RAM | New York City | $0.19/GPU/hr | Available |
| TensorDock | NVIDIA Tesla V100 32GB | 32GB | 0 vCPU, 0GB RAM | Texas | $0.29/GPU/hr | Available |
| TensorDock | NVIDIA Tesla V100 32GB | 32GB | 0 vCPU, 0GB RAM | New York City | $0.29/GPU/hr | Available |
| Lambda Labs | 8× NVIDIA Tesla V100 16GB | 16GB | 88 vCPU, 448GB RAM, 6041GB Storage | Texas | $0.79/GPU/hr ($6.32/hr total, 8×) | Available |
When to Choose the L40S
Opt for the L40S in scenarios demanding high VRAM and throughput, such as training large language models that need more than 16 GB or running FP8 inference at up to 724 TFLOPS. Its 48 GB of GDDR6 and Ada Lovelace features suit cloud users who prioritize speed over cost. Multi-GPU setups communicate over PCIe 4.0, since the L40S has no NVLink.
The L40S also suits fine-tuning and generative AI, where its 362 TFLOPS of FP16 throughput (about 2.9x the V100's 125 TFLOPS) can substantially cut training times.
When to Choose the Tesla V100 16GB
Choose the V100 16GB for cost-sensitive legacy applications or small-scale inference that fits within 16 GB of HBM2. At a starting price of $0.10 per hour, it offers value for prototyping or for workloads that use NVLink in older clusters.
It also remains viable for FP64-heavy scientific computing, where its 7.8 TFLOPS exceeds the L40S's 1.4 TFLOPS, and for memory-bound work, where its 900 GB/s bandwidth supports high-throughput data movement at a lower power draw than the L40S.
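One way to weigh price against performance is cost per peak TFLOP-hour. The sketch below uses the lowest prices shown in the pricing tables on this page ($0.55/hr for the L40S, $0.19/hr for the V100 16GB) as a snapshot; live prices change, and the names `PRICE_PER_HR`, `usd_per_tflop_hour`, and `cheaper_gpu` are illustrative.

```python
# Cost per peak TFLOP-hour, using snapshot prices from the pricing tables.
# Assumption: lowest listed per-GPU price; live prices will differ.

PRICE_PER_HR = {"L40S": 0.55, "V100 16GB": 0.19}  # USD per GPU-hour

PEAK_TFLOPS = {
    "L40S": {"fp16": 362, "fp32": 91, "fp64": 1.4},
    "V100 16GB": {"fp16": 125, "fp32": 15.7, "fp64": 7.8},
}


def usd_per_tflop_hour(gpu: str, precision: str) -> float:
    """Hourly price divided by peak throughput at the given precision."""
    return PRICE_PER_HR[gpu] / PEAK_TFLOPS[gpu][precision]


def cheaper_gpu(precision: str) -> str:
    """GPU with the lowest cost per peak TFLOP-hour at this precision."""
    return min(PRICE_PER_HR, key=lambda gpu: usd_per_tflop_hour(gpu, precision))


if __name__ == "__main__":
    for precision in ("fp16", "fp32", "fp64"):
        costs = {g: usd_per_tflop_hour(g, precision) for g in PRICE_PER_HR}
        print(precision, {g: f"${c:.4f}" for g, c in costs.items()},
              "->", cheaper_gpu(precision))
```

At these snapshot prices the two cards are nearly tied per FP16 TFLOP-hour, the L40S is cheaper per FP32 TFLOP-hour, and the V100 is far cheaper per FP64 TFLOP-hour.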
Use Cases
The L40S's 48 GB of VRAM accommodates much larger models, and its 362 TFLOPS of FP16 gives it roughly 2.9x the V100's 125 TFLOPS of peak training throughput.
FP8 at 724 TFLOPS and 48 GB of capacity enable high-batch inference on the L40S; the V100's 16 GB limits batch size and model size for large LLMs.
For fine-tuning, the L40S's 91 TFLOPS of FP32 and ample VRAM give it a clear advantage over the V100's 15.7 TFLOPS.
The Ada architecture's 362 TFLOPS of FP16 accelerates diffusion models, and 48 GB handles high-resolution generation.
The L40S offers 91 TFLOPS of FP32 for compute-bound work, but the V100's 900 GB/s bandwidth and lower $0.10/hr starting cost suit memory-bound tasks under 16 GB.
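The compute-bound versus memory-bound distinction in the last use case can be sketched with a simple roofline model. This is a standard back-of-the-envelope technique, not something this page's figures were derived from, and the function names are illustrative. It takes attainable throughput as the smaller of peak compute and bandwidth times arithmetic intensity (FLOPs per byte moved).

```python
# Simple roofline sketch using FP32 peak and memory bandwidth from the
# spec table. Arithmetic intensity is FLOPs performed per byte moved.

GPUS = {
    "L40S": {"fp32_tflops": 91.0, "bandwidth_gbs": 864.0},
    "V100": {"fp32_tflops": 15.7, "bandwidth_gbs": 900.0},
}


def ridge_point(gpu: str) -> float:
    """Intensity (FLOP/byte) above which the GPU becomes compute-bound."""
    spec = GPUS[gpu]
    return spec["fp32_tflops"] * 1e12 / (spec["bandwidth_gbs"] * 1e9)


def attainable_tflops(gpu: str, intensity: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * intensity), in TFLOPS."""
    spec = GPUS[gpu]
    memory_bound = spec["bandwidth_gbs"] * 1e9 * intensity / 1e12
    return min(spec["fp32_tflops"], memory_bound)


if __name__ == "__main__":
    for gpu in GPUS:
        print(f"{gpu}: ridge point ~ {ridge_point(gpu):.1f} FLOP/byte")
    for intensity in (4, 50, 200):
        bounds = {g: round(attainable_tflops(g, intensity), 2) for g in GPUS}
        print(f"intensity {intensity} FLOP/byte -> {bounds}")
```

In this model, kernels with low arithmetic intensity are limited by bandwidth on both cards, so the V100's slightly higher 900 GB/s gives it a small edge; at higher intensity the L40S's much larger FP32 peak takes over.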
Frequently Asked Questions
Which GPU has more VRAM: L40S or V100 16GB?
The L40S provides 48 GB of GDDR6 VRAM, triple the V100 16GB's 16 GB of HBM2. This lets the L40S hold larger models without sharding.
How does FP16 performance compare between the L40S and V100?
L40S achieves 362 TFLOPS in FP16, nearly 3x the V100's 125 TFLOPS. This boosts AI training and inference speeds significantly.
What are the cloud pricing differences for L40S vs V100 16GB?
L40S offers start at $0.40 per hour and average $1.13 across 23 offers; V100 16GB offers start at $0.10 per hour and average $0.81 across 25 offers. The V100 suits tight budgets, while the L40S suits performance-focused workloads.
Does the L40S or V100 have higher memory bandwidth?
V100 edges out with 900 GB/s versus L40S's 864 GB/s. However, L40S's 48 GB VRAM offsets this for larger workloads.
Which is better for FP32 tasks: L40S or V100?
L40S delivers 91 TFLOPS FP32, over 5x the V100's 15.7 TFLOPS. Choose L40S for demanding single-precision computing.
What interconnects do L40S and V100 support?
The L40S uses PCIe 4.0; the V100 supports NVLink and PCIe 3.0. PCIe 4.0 doubles the host-link bandwidth of PCIe 3.0, but NVLink on SXM2 V100 systems provides considerably higher GPU-to-GPU bandwidth than PCIe.
Which is cheaper to rent, the L40S or the V100?
Cloud rental prices for both the L40S and V100 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L40S have compared to the V100?
The L40S has 48 GB of GDDR6 memory. The V100 has 16 to 32 GB of HBM2 memory.
Can I find L40S and V100 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the L40S and the V100?
The L40S (2023) uses the Ada Lovelace architecture, while the V100 (2017) uses Volta. The L40S delivers about 2.9x the FP16 throughput of the V100 at roughly the same memory bandwidth (864 GB/s versus 900 GB/s, about 0.96x).



