Specifications Compared
| Spec | L40S | T4 |
|---|---|---|
| TDP | 350W | 70W |
| VRAM | 48 GB | 16 GB |
| CUDA Cores | 18,176 | 2,560 |
| Memory Type | GDDR6 (ECC) | GDDR6 |
| Architecture | Ada Lovelace | Turing |
| Form Factor | PCIe, full-height dual-slot | PCIe, low-profile single-slot |
| Interconnect | PCIe 4.0 x16 | PCIe 3.0 x16 |
| Tensor Cores | 568 | 320 |
| FP8 Tensor Performance | 724 TFLOPS | Not supported |
| FP16 Tensor Performance | 362 TFLOPS | 65 TFLOPS |
| FP32 Performance | 91 TFLOPS | 8.1 TFLOPS |
| FP64 Performance | 1.4 TFLOPS | Not listed |
| INT8 Tensor Performance | 724 TOPS | 130 TOPS |
| Memory Bandwidth | 864 GB/s | 320 GB/s |
Performance Analysis
The L40S outperforms the T4 dramatically in floating-point compute. Comparing Tensor Core figures, 362 TFLOPS FP16 versus the T4's 65 TFLOPS is about 5.6 times the half-precision throughput, the figure that matters most for AI training and inference. FP32 performance of 91 TFLOPS on the L40S versus 8.1 TFLOPS on the T4 gives about 11 times the single-precision throughput for scientific simulations. The L40S also runs FP8 at 724 TFLOPS for quantized inference, a precision the T4's Turing Tensor Cores do not support.
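The speedup ratios follow directly from peak datasheet numbers. The sketch below is a minimal illustration, assuming peak dense figures and using the T4's 65 TFLOPS FP16 Tensor Core rating so that both cards are compared on the same kind of unit; `SPECS` and `peak_ratio` are hypothetical names, and sustained throughput in real workloads will be lower.

```python
# Peak dense datasheet figures (assumed). The T4's 65 TFLOPS is its
# FP16 Tensor Core (mixed-precision) rating, matching the L40S's 362.
SPECS = {
    "L40S": {"fp16_tensor_tflops": 362.0, "fp32_tflops": 91.0, "int8_tops": 724.0},
    "T4": {"fp16_tensor_tflops": 65.0, "fp32_tflops": 8.1, "int8_tops": 130.0},
}


def peak_ratio(metric: str, faster: str = "L40S", slower: str = "T4") -> float:
    """How many times larger one GPU's peak figure is than the other's."""
    return SPECS[faster][metric] / SPECS[slower][metric]


for metric in ("fp16_tensor_tflops", "fp32_tflops", "int8_tops"):
    print(f"{metric}: {peak_ratio(metric):.1f}x")
```

Comparing the L40S's Tensor Core FP16 figure against the T4's non-Tensor FP32 figure would inflate the gap roughly eightfold, which is why the sketch keys every ratio on a matching metric.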
Memory differences shape real-world usage: the L40S's 48 GB of VRAM is three times the T4's 16 GB, leaving room for larger models and larger batch sizes and reducing out-of-memory errors with large language models. Its 864 GB/s bandwidth versus 320 GB/s means 2.7 times faster data movement, which minimizes bottlenecks in training loops and in memory-bound steps such as LLM token generation, and allows higher throughput for diffusion models.
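One way to see why bandwidth matters is a rough ceiling for single-stream LLM decoding, which is often memory-bound. This is a simplified sketch, assuming every generated token reads all model weights from VRAM once and ignoring KV-cache reads and kernel overheads; the function name and the 7B FP16 example model are hypothetical.

```python
def decode_tokens_per_s_ceiling(bandwidth_gb_s: float, params_billion: float,
                                bytes_per_param: float) -> float:
    """Upper bound on batch-1 decode speed for a memory-bound LLM.

    Assumes each generated token streams every weight from VRAM once and
    ignores KV-cache reads, activations, and kernel overheads.
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params x bytes/param = GB
    return bandwidth_gb_s / weight_gb


# Hypothetical example: a 7B-parameter model in FP16 (2 bytes/param, ~14 GB).
for name, bandwidth in (("L40S", 864.0), ("T4", 320.0)):
    ceiling = decode_tokens_per_s_ceiling(bandwidth, params_billion=7, bytes_per_param=2)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

Under this model the ceiling scales linearly with bandwidth, so the gap tracks the 2.7x bandwidth ratio; measured speeds land below both ceilings, and a 14 GB model leaves the T4 very little room for the KV cache.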
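Power efficiency reveals trade-offs: the T4's 70W TDP suits dense and power-constrained deployments, but at high utilization the L40S's 350W delivers more performance per watt. Based on peak figures, it offers about 2.2 times the FP32 throughput per watt and roughly 11 percent more FP16 Tensor Core throughput per watt. With cloud pricing from $0.40 per hour, it is strong value for intensive workloads.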
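The per-watt comparison is simple arithmetic on peak throughput and TDP. A minimal sketch, assuming the board TDP approximates power draw at full load (real draw varies) and using the T4's 65 TFLOPS FP16 Tensor Core rating; `GPUS` and `tflops_per_watt` are hypothetical names.

```python
# Board TDP (W) and peak dense throughput (TFLOPS); TDP stands in for actual draw.
GPUS = {
    "L40S": {"tdp_w": 350.0, "fp32": 91.0, "fp16_tensor": 362.0},
    "T4": {"tdp_w": 70.0, "fp32": 8.1, "fp16_tensor": 65.0},
}


def tflops_per_watt(gpu: str, metric: str) -> float:
    """Peak throughput divided by TDP; actual power draw varies with load."""
    return GPUS[gpu][metric] / GPUS[gpu]["tdp_w"]


for metric in ("fp32", "fp16_tensor"):
    l40s, t4 = tflops_per_watt("L40S", metric), tflops_per_watt("T4", metric)
    print(f"{metric}: L40S {l40s:.3f} vs T4 {t4:.3f} TFLOPS/W ({l40s / t4:.1f}x)")
```

The result shows the L40S's efficiency edge is large in FP32 but modest on Tensor Core FP16, so the choice for mixed-precision inference depends more on VRAM and absolute throughput than on efficiency alone.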
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40S
| Provider | GPU Model | VRAM (per GPU) | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S 48GB | 48GB | 0 vCPU, 0GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S 48GB | 48GB | 16 vCPU, 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | 4× NVIDIA L40S 48GB | 48GB | 46 vCPU, 288GB RAM, 2500GB storage | Iowa | $0.88/GPU/hr ($3.52/hr total for 4×) | Available |
| Massed Compute | 2× NVIDIA L40S 48GB | 48GB | 24 vCPU, 144GB RAM, 1250GB storage | Iowa | $0.88/GPU/hr ($1.76/hr total for 2×) | Available |
| Massed Compute | NVIDIA L40S 48GB | 48GB | 12 vCPU, 72GB RAM, 625GB storage | Iowa | $0.88/GPU/hr | Available |
T4
| Provider | GPU Model | VRAM (per GPU) | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| AWS | NVIDIA Tesla T4 16GB | 16GB | 4 vCPU, 16GB RAM | Virginia | $0.53/GPU/hr | |
| AWS | NVIDIA Tesla T4 16GB | 16GB | 8 vCPU, 32GB RAM | Virginia | $0.75/GPU/hr | |
| AWS | 4× NVIDIA Tesla T4 16GB | 16GB | 48 vCPU, 192GB RAM | Virginia | $0.98/GPU/hr ($3.91/hr total for 4×) | |
| AWS | NVIDIA Tesla T4 16GB | 16GB | 16 vCPU, 64GB RAM | Virginia | $1.20/GPU/hr | |
| AWS | NVIDIA Tesla T4 16GB | 16GB | 32 vCPU, 128GB RAM | Virginia | $2.18/GPU/hr | |
When to Choose the L40S
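Select the L40S for large-scale AI training or inference with models that exceed the T4's 16 GB, such as 13B-parameter LLMs in FP16 (about 26 GB of weights) or 70B-parameter LLMs quantized to 4 bits (about 35 GB of weights, leaving limited room for the KV cache). Its 362 TFLOPS of FP16 Tensor Core throughput is about 5.6 times the T4's 65 TFLOPS, which shortens fine-tuning runs, while 48 GB of VRAM allows large batch sizes and 864 GB/s of bandwidth keeps them fed.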
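A quick way to check whether a model's weights fit is parameters times bits per parameter. The sketch below is an approximation, assuming weights dominate memory and reserving a hypothetical 20% of VRAM for the KV cache, activations, and runtime overhead; the function names and the reserve fraction are illustrative, not a sizing rule from any framework.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billion * bits_per_param / 8


def weights_fit(params_billion: float, bits_per_param: int, vram_gb: float,
                usable_fraction: float = 0.8) -> bool:
    """True if the weights fit in the share of VRAM left after a hypothetical
    20% reserve for the KV cache, activations, and runtime overhead."""
    return weight_memory_gb(params_billion, bits_per_param) <= vram_gb * usable_fraction


for params, bits in ((7, 16), (13, 16), (70, 4), (70, 16)):
    size = weight_memory_gb(params, bits)
    print(f"{params}B @ {bits}-bit = {size:.0f} GB | "
          f"L40S: {weights_fit(params, bits, 48.0)} | T4: {weights_fit(params, bits, 16.0)}")
```

Long contexts or large batches grow the KV cache well beyond a fixed 20%, so treat a `True` here as necessary rather than sufficient.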
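The L40S excels in generative tasks like Stable Diffusion, combining high FP16 throughput with FP8 at 724 TFLOPS where the inference stack supports it. It also offers better economics: its $1.10 average hourly cost is below the T4's $1.66 average while delivering several times the compute.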
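Hourly price alone understates the difference; dividing price by throughput gives cost per unit of compute. A minimal sketch, assuming the average hourly prices quoted on this page (a snapshot that changes with live pricing) and peak FP16 Tensor Core figures rather than measured throughput; `cost_per_100_tflops_hour` is a hypothetical helper.

```python
def cost_per_100_tflops_hour(price_per_hour: float, peak_tflops: float) -> float:
    """Dollars per hour for each 100 TFLOPS of peak FP16 Tensor Core throughput."""
    return price_per_hour / peak_tflops * 100


# Average hourly prices quoted on this page (a snapshot; live prices change).
offers = {"L40S": (1.10, 362.0), "T4": (1.66, 65.0)}
for name, (price, tflops) in offers.items():
    print(f"{name}: ${cost_per_100_tflops_hour(price, tflops):.2f} per 100 peak FP16 TFLOPS per hour")
```

The same calculation works with any current offer from the Live Cloud Pricing tables; substitute measured throughput for your own workload to get a more realistic figure.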
When to Choose the T4
Choose the T4 for lightweight inference on small models fitting within 16 GB VRAM, such as basic computer vision tasks, where its 70W TDP minimizes power costs in edge or dense server setups. The 320 GB/s bandwidth suffices for low-latency serving without the L40S's 350W draw.
It suits budget-conscious deployments and legacy applications, with pricing from $0.53 per hour and 65 TFLOPS of FP16 Tensor Core throughput, which is adequate for non-demanding workloads.
Use Cases
The L40S's 48 GB of VRAM and 362 TFLOPS of FP16 Tensor Core throughput handle large datasets and models, far surpassing the T4's 16 GB and 65 TFLOPS.
724 TFLOPS FP8 and 864 GB/s bandwidth on the L40S enable high-throughput serving of large LLMs, which the T4 cannot match with 16 GB of VRAM, no FP8 support, and 320 GB/s of bandwidth.
91 TFLOPS FP32 and 48 GB VRAM support efficient fine-tuning of mid-to-large models, exceeding the T4's FP32 throughput by over 11 times.
The L40S's 362 TFLOPS of FP16 Tensor Core throughput and ample VRAM generate images rapidly at scale, while the T4's 16 GB limits batch sizes and resolutions.
For single-precision simulations, 91 TFLOPS FP32 outperforms the T4's 8.1 TFLOPS, with 864 GB/s bandwidth accelerating data-heavy computations. Double-precision codes are a poor fit for either card: the L40S offers only 1.4 TFLOPS FP64.
Frequently Asked Questions
What is the VRAM difference between the L40S and the T4?
The L40S provides 48 GB of GDDR6 VRAM, three times the T4's 16 GB of GDDR6. This lets the L40S hold larger models without offloading weights to CPU memory, and it allows significantly larger batch sizes.
Which GPU has higher performance in FP16?
The L40S achieves 362 TFLOPS of FP16 Tensor Core throughput, about 5.6 times the T4's 65 TFLOPS. This gap accelerates AI training and inference, although real-world speedups depend on whether a workload is compute-bound or memory-bound.
How do cloud prices compare?
When these figures were captured, the L40S started at $0.40 per hour with an average of $1.10 across 18 offers, cheaper than the T4's $0.53 starting price and $1.66 average across 6 offers. Given its much higher throughput, value favors the L40S for compute-heavy work.
What are the power requirements?
The L40S has a 350W TDP, suited for high-performance servers, versus the T4's efficient 70W TDP for low-power deployments. Choose based on cooling and density needs.
Is the L40S compatible with PCIe systems?
Yes. Both GPUs are PCIe cards. The L40S is a full-height, dual-slot card using PCIe 4.0 x16, while the T4 is a low-profile, single-slot card using PCIe 3.0 x16, which lets it fit in a wider range of servers.
Which is better for memory bandwidth?
The L40S delivers 864 GB/s, 2.7 times the T4's 320 GB/s. This reduces bottlenecks in data-intensive tasks like training.
Which is cheaper to rent, the L40S or the T4?
Cloud rental prices for both the L40S and T4 vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L40S have compared to the T4?
The L40S has 48 GB of GDDR6 memory. The T4 has 16 GB of GDDR6 memory.
Can I find L40S and T4 GPUs available to rent right now?
Yes. This page shows real-time availability across 25+ cloud GPU providers. The Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the L40S and the T4?
The L40S uses the Ada Lovelace architecture (2023) while the T4 uses Turing (2018). The L40S delivers about 5.6x the FP16 Tensor Core throughput, about 11x the FP32 throughput, and 2.7x the memory bandwidth of the T4.



