Specifications Compared
| Spec | L40S | V100 |
|---|---|---|
| TDP | 350W | 300W (SXM2), 250W (PCIe) |
| VRAM | 48 GB | 16-32 GB |
| CUDA Cores | 18,176 | 5,120 |
| Memory Type | GDDR6 | HBM2 |
| Architecture | Ada Lovelace | Volta |
| Form Factors | PCIe | SXM2, PCIe |
| Interconnect | PCIe 4.0 | NVLink, PCIe 3.0 |
| Tensor Cores | 568 | 640 |
| FP8 Performance | 724 TFLOPS | Not supported |
| FP16 Performance | 362 TFLOPS | 125 TFLOPS |
| FP32 Performance | 91 TFLOPS | 15.7 TFLOPS |
| FP64 Performance | 1.4 TFLOPS | 7.8 TFLOPS |
| INT8 Performance | 724 TOPS | |
| Memory Bandwidth | 864 GB/s | 900 GB/s |
Performance Analysis
The L40S leads in most peak compute metrics. Its 362 TFLOPS FP16 rating is nearly three times the V100's 125 TFLOPS, which speeds up mixed-precision training and inference for deep learning models. FP32 throughput reaches 91 TFLOPS on the L40S, about 5.8 times the V100's 15.7 TFLOPS, benefiting simulations and graphics rendering that need single-precision accuracy. The L40S also offers FP8 at 724 TFLOPS for low-precision inference, which helps when serving quantized large language models. The V100 keeps one advantage: its 7.8 TFLOPS FP64 rating is more than five times the L40S's 1.4 TFLOPS.
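The ratios quoted in this section can be reproduced directly from the specification table. The following is a minimal sketch; the `SPECS` dictionary and the `ratio` helper are illustrative names, and the figures are peak ratings, not measured throughput.

```python
# Peak ratings copied from the specification table, in TFLOPS.
SPECS = {
    "L40S": {"fp16": 362.0, "fp32": 91.0, "fp64": 1.4},
    "V100": {"fp16": 125.0, "fp32": 15.7, "fp64": 7.8},
}


def ratio(metric: str, a: str = "L40S", b: str = "V100") -> float:
    """Return how many times larger a's peak rating is than b's."""
    return SPECS[a][metric] / SPECS[b][metric]


for metric in ("fp16", "fp32", "fp64"):
    print(f"{metric}: L40S is {ratio(metric):.2f}x the V100")
```

Peak ratings are upper bounds, so the speedup on a real workload will usually be smaller than these ratios suggest.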
Memory capacity is often decisive for real-world workloads. The L40S's 48 GB of GDDR6 supports larger batch sizes and models that exceed the V100's 16-32 GB of HBM2, reducing the need for model parallelism. The V100 has slightly higher bandwidth, 900 GB/s versus 864 GB/s, but the L40S's larger VRAM keeps more of the model and data resident on the GPU in tasks such as transformer training, which avoids frequent offloading to host memory.
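As a rough illustration of the capacity argument, the sketch below checks whether a model's weights would fit on each card. The model sizes, the `fits` helper, and the 20% headroom figure are assumptions made for this example only; actual memory use depends on batch size, sequence length, optimizer state, and the framework.

```python
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in GB (1e9 params * bytes each)."""
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: float,
         vram_gb: float, headroom: float = 0.2) -> bool:
    """True if the weights fit while keeping `headroom` of VRAM free.

    The 20% default headroom is an assumed placeholder for activations,
    caches, and framework overhead; real requirements vary by workload.
    """
    return weights_gb(params_billions, bytes_per_param) <= vram_gb * (1.0 - headroom)


# Hypothetical model sizes with FP16 weights (2 bytes per parameter).
for params in (7, 13):
    for name, vram in (("V100 16 GB", 16), ("V100 32 GB", 32), ("L40S 48 GB", 48)):
        verdict = "fits" if fits(params, 2, vram) else "does not fit"
        print(f"{params}B FP16 on {name}: {verdict}")
```

Under these assumptions, a 13-billion-parameter FP16 model fits only on the 48 GB card, while a 7-billion-parameter model also fits on the 32 GB V100.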
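Power and form factor influence scalability. The two GPUs have comparable TDPs, 350W for the L40S and 300W for the V100. The L40S ships only as a PCIe card, which simplifies integration into standard servers, while the V100 comes in SXM2 or PCIe variants and supports NVLink. PCIe 4.0 on the L40S provides double the bandwidth of the V100's PCIe 3.0, which speeds up host-to-GPU transfers and PCIe-based multi-GPU communication; NVLink-connected V100s do not depend on PCIe for GPU-to-GPU traffic.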
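The "double the bandwidth" figure follows from the per-lane signaling rates of the two PCIe generations: 8 GT/s for PCIe 3.0 and 16 GT/s for PCIe 4.0, both with 128b/130b encoding. A short sketch of the arithmetic, assuming a x16 slot and using the illustrative helper name `pcie_gb_per_s`:

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0/4.0 link.

    Each transfer carries one bit per lane, and 128b/130b encoding means
    128 of every 130 bits are payload. Protocol overhead is ignored.
    """
    payload_gbit_per_lane = gt_per_s * 128 / 130
    return payload_gbit_per_lane * lanes / 8  # bits to bytes


gen3 = pcie_gb_per_s(8.0)   # PCIe 3.0 x16
gen4 = pcie_gb_per_s(16.0)  # PCIe 4.0 x16

print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s")
print(f"Ratio: {gen4 / gen3:.1f}x")
```

These are theoretical link rates per direction; achieved transfer rates are lower once packet and protocol overhead are included.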
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48 GB | 0 vCPU, 0 GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48 GB | 16 vCPU, 94 GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40S | 48 GB | 12 vCPU, 72 GB RAM, 625 GB storage | Iowa | $0.88/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40S | 48 GB | 24 vCPU, 144 GB RAM, 1250 GB storage | Iowa | $0.88/GPU/hr ($1.76/hr total) | Available |
V100
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA Tesla V100 16GB | 16 GB | 0 vCPU, 0 GB RAM | Texas | $0.19/GPU/hr | Available |
| TensorDock | NVIDIA Tesla V100 16GB | 16 GB | 0 vCPU, 0 GB RAM | New York City | $0.19/GPU/hr | Available |
| TensorDock | NVIDIA Tesla V100 32GB | 32 GB | 0 vCPU, 0 GB RAM | Texas | $0.29/GPU/hr | Available |
| TensorDock | NVIDIA Tesla V100 32GB | 32 GB | 0 vCPU, 0 GB RAM | New York City | $0.29/GPU/hr | Available |
| Lambda Labs | 8× NVIDIA Tesla V100 16GB | 16 GB | 88 vCPU, 448 GB RAM, 6041 GB storage | Texas | $0.79/GPU/hr ($6.32/hr total) | Available |
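One way to compare the listings is to normalize price by peak throughput. The sketch below uses the lowest listed price for each GPU from the tables above, which is a snapshot that will drift as live prices change, together with the FP16 ratings from the specification table. The metric and the helper name `usd_per_peak_pflops_hour` are illustrative choices, not something this page publishes.

```python
# Lowest listed price per GPU-hour in the tables above (a snapshot, in USD).
LOWEST_PRICE = {"L40S": 0.55, "V100": 0.19}

# Peak FP16 ratings from the specification table, in TFLOPS.
PEAK_FP16_TFLOPS = {"L40S": 362.0, "V100": 125.0}


def usd_per_peak_pflops_hour(gpu: str) -> float:
    """Hourly price divided by peak FP16 throughput, scaled to 1000 TFLOPS."""
    return LOWEST_PRICE[gpu] / PEAK_FP16_TFLOPS[gpu] * 1000.0


for gpu in ("L40S", "V100"):
    print(f"{gpu}: ${usd_per_peak_pflops_hour(gpu):.2f} per peak PFLOPS-hour (FP16)")
```

At these snapshot prices the two GPUs come out almost identical per unit of peak FP16 compute, so the choice depends more on VRAM, precision support, and interconnect than on raw price per TFLOP.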
When to Choose the L40S
Opt for the L40S when a workload demands high VRAM and compute for modern AI. Its 48 GB of GDDR6 handles large language models during training or inference in cases where the model and batch need more than the V100's 16-32 GB of HBM2. Its FP8 rating of 724 TFLOPS suits quantized inference pipelines, and its 91 TFLOPS FP32 rating is about 5.8 times the V100's 15.7 TFLOPS.
The L40S also suits graphics-intensive tasks like Stable Diffusion at scale, where the Ada Lovelace architecture delivers 362 TFLOPS FP16 versus the V100's 125 TFLOPS.
When to Choose the V100
Choose the V100 for cost-sensitive legacy applications where 16-32 GB of HBM2 suffices. The offers listed above start at $0.19 per GPU-hour, which suits prototyping or small-scale training that does not need more than 32 GB of VRAM. Its 900 GB/s bandwidth is slightly higher than the L40S's 864 GB/s, a small edge for memory-bound scientific computing.
The V100 also fits multi-GPU legacy clusters that rely on NVLink, and it avoids the L40S's higher rates, which start at $0.55 per GPU-hour in the listings above.
Use Cases
The L40S's 48 GB of VRAM and 91 TFLOPS FP32 support larger models and batches than the V100's 16-32 GB and 15.7 TFLOPS. Its 362 TFLOPS FP16 rating is nearly three times the V100's 125 TFLOPS, which shortens training time for compute-bound jobs.
The L40S's 724 TFLOPS FP8 rating raises throughput for quantized inference, and its larger VRAM can hold models in full that would not fit within the V100's 16-32 GB.
The L40S's 362 TFLOPS FP16 speeds up fine-tuning of large models, and its 48 GB can avoid the sharding that the V100's 16-32 GB would require.
The Ada Lovelace architecture and 48 GB of VRAM enable high-resolution image generation at up to 362 TFLOPS FP16, well above the V100's 125 TFLOPS.
The V100's 900 GB/s bandwidth and NVLink suit memory-bound simulations at lower cost, from $0.19 per GPU-hour in the offers listed above. Its 15.7 TFLOPS FP32 suffices for many legacy codes, and its 7.8 TFLOPS FP64 exceeds the L40S's 1.4 TFLOPS.
Frequently Asked Questions
Which GPU has more VRAM: L40S or V100?
The L40S provides 48 GB of GDDR6 VRAM, exceeding the V100's 16-32 GB of HBM2. This capacity supports larger AI models without partitioning, while the V100 suits smaller workloads.
How do FP32 performance levels compare between L40S and V100?
The L40S delivers 91 TFLOPS FP32, about 5.8 times the V100's 15.7 TFLOPS. This gap accelerates single-precision tasks like simulations.
What are the current cloud pricing differences?
In the offers listed on this page, the L40S ranges from $0.55 to $0.88 per GPU-hour, and the V100 ranges from $0.19 to $0.79 per GPU-hour. Prices change frequently, so check the Live Cloud Pricing section for current rates.
Does V100 or L40S have higher memory bandwidth?
The V100 achieves 900 GB/s with HBM2, slightly above the L40S's 864 GB/s with GDDR6. The difference is about 4%, which mainly matters for bandwidth-bound tasks. The L40S compensates with more VRAM.
What architectures power these GPUs?
The L40S, released in 2023, uses the Ada Lovelace architecture, which adds FP8 support. The V100, released in 2017, uses Volta, the first NVIDIA architecture with Tensor Cores.
How do the TDP and form factors of the L40S and V100 compare?
The L40S has a 350W TDP in a PCIe form factor, versus up to 300W for the V100 in SXM2 or PCIe. The L40S suits standard PCIe servers; the V100 enables dense NVLink clusters.
Which is cheaper to rent, the L40S or the V100?
Cloud rental prices for both the L40S and V100 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L40S have compared to the V100?
The L40S has 48 GB of GDDR6 memory. The V100 has 16 to 32 GB of HBM2 memory.
Can I find L40S and V100 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the L40S and the V100?
The L40S (2023) uses the Ada Lovelace architecture, while the V100 (2017) uses Volta. The V100 delivers about 0.35x the FP16 throughput of the L40S (125 versus 362 TFLOPS) and about 1.04x its memory bandwidth (900 versus 864 GB/s).



