Specifications Compared
| Spec | L40S | MI355X |
|---|---|---|
| TDP | 350 W | 750 W |
| VRAM | 48 GB | 288 GB |
| CUDA Cores | 18,176 | N/A |
| Memory Type | GDDR6X | HBM3e |
| Architecture | Ada Lovelace | CDNA 4 |
| Form Factors | PCIe | OAM |
| Interconnect | PCIe 4.0 | Infinity Fabric |
| Tensor Cores | 568 | N/A |
| FP8 Performance | 724 TFLOPS | 4,600 TFLOPS |
| FP16 Performance | 362 TFLOPS | 2,300 TFLOPS |
| FP32 Performance | 91 TFLOPS | 2,300 TFLOPS |
| FP64 Performance | 1.4 TFLOPS | 72 TFLOPS |
| INT8 Performance | 724 TOPS | 4,600 TOPS |
| Memory Bandwidth | 864 GB/s | 8,000 GB/s |
Performance Analysis
Raw compute performance puts the MI355X far ahead: its 2,300 TFLOPS FP16 is about 6.4 times the L40S's 362 TFLOPS, and its listed 2,300 TFLOPS FP32 is about 25 times the L40S's 91 TFLOPS. FP8 reaches 4,600 TFLOPS on the MI355X against 724 TFLOPS on the L40S. The gap matters for both training and inference: the L40S handles FP16-dominant neural network training on its tensor cores, while the MI355X's equal listed FP16 and FP32 throughput favors mixed-precision training and FP32-heavy scientific simulations.
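The ratios quoted above follow directly from the table. A minimal sketch that recomputes them, assuming the peak figures in the specification table are taken at face value (the dictionary layout and function name are illustrative, not part of any vendor tool):

```python
# Peak throughput figures copied from the specification table (TFLOPS).
SPECS = {
    "FP8": {"L40S": 724, "MI355X": 4600},
    "FP16": {"L40S": 362, "MI355X": 2300},
    "FP32": {"L40S": 91, "MI355X": 2300},
}


def speedup_ratios(specs):
    """Return the MI355X / L40S peak-throughput ratio for each precision."""
    return {precision: v["MI355X"] / v["L40S"] for precision, v in specs.items()}


ratios = speedup_ratios(SPECS)
for precision, ratio in ratios.items():
    # Peak ratios only; sustained throughput depends on the workload.
    print(f"{precision}: {ratio:.1f}x")
```

These are ratios of peak numbers, so they bound the achievable speedup rather than predict it.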
Memory capacity determines what fits on one card: the MI355X's 288 GB of HBM3e holds models that exceed the L40S's 48 GB of GDDR6X, so a single GPU can host a large LLM that would otherwise need sharding. The extra capacity also leaves room for larger batches, while the 8,000 GB/s bandwidth (versus 864 GB/s) keeps memory-bound kernels fed and cuts the time spent waiting on data.
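A rough way to check whether a model fits on one card is to multiply its parameter count by the bytes stored per parameter. The sketch below does that for weights only; the 80% usable-VRAM share is an assumed allowance for activations, KV cache, and runtime overhead, and the 70-billion-parameter model is a hypothetical example.

```python
# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

# VRAM capacities from the specification table, in GB.
VRAM_GB = {"L40S": 48, "MI355X": 288}


def weights_gb(params_billions, precision):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions, precision, gpu, usable_share=0.8):
    """True if the weights fit in the assumed usable share of VRAM.

    usable_share is a hypothetical allowance, not a measured value.
    """
    return weights_gb(params_billions, precision) <= VRAM_GB[gpu] * usable_share


# Hypothetical 70B-parameter model stored in FP16 (about 140 GB of weights).
for gpu in VRAM_GB:
    print(gpu, fits(70, "fp16", gpu))
```

Under these assumptions the 70B FP16 model needs sharding on the L40S but fits on a single MI355X; real headroom depends on sequence length, batch size, and framework.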
Power and interconnects add context: the L40S's 350 W TDP allows more cards within a given rack power budget than the MI355X's 750 W, while PCIe 4.0 offers broad host compatibility and Infinity Fabric targets multi-GPU scaling. Overall, the MI355X prioritizes peak throughput for frontier workloads, while the L40S trades peak performance for lower cost and easier deployment in production inference.
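The density trade-off can be made concrete with a little arithmetic. This sketch uses the TDP and FP16 figures from the table and a hypothetical 10,500 W rack budget that counts GPU TDP only (no CPUs, fans, or networking), so treat the results as illustrative:

```python
# TDP and peak FP16 figures from the specification table.
GPUS = {
    "L40S": {"fp16_tflops": 362, "tdp_w": 350},
    "MI355X": {"fp16_tflops": 2300, "tdp_w": 750},
}


def tflops_per_watt(gpu):
    """Peak FP16 TFLOPS per watt of TDP."""
    spec = GPUS[gpu]
    return spec["fp16_tflops"] / spec["tdp_w"]


def gpus_per_budget(gpu, budget_w):
    """Whole GPUs that fit in a power budget, counting GPU TDP only."""
    return budget_w // GPUS[gpu]["tdp_w"]


RACK_BUDGET_W = 10_500  # hypothetical budget, chosen for illustration

for name, spec in GPUS.items():
    count = gpus_per_budget(name, RACK_BUDGET_W)
    total = count * spec["fp16_tflops"]
    print(f"{name}: {count} GPUs, {total} peak FP16 TFLOPS, "
          f"{tflops_per_watt(name):.2f} TFLOPS/W")
```

With these inputs the L40S fits more cards into the budget, while the MI355X delivers more peak FP16 per watt.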
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40S
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48 GB | 0 vCPU, 0 GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40S | 48 GB | 16 vCPU, 94 GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | 4× NVIDIA L40S | 48 GB | 46 vCPU, 288 GB RAM, 2,500 GB storage | Iowa | $0.88/GPU/hr ($3.52/hr total, 4×) | Available |
| Massed Compute | 2× NVIDIA L40S | 48 GB | 24 vCPU, 144 GB RAM, 1,250 GB storage | Iowa | $0.88/GPU/hr ($1.76/hr total, 2×) | Available |
| Massed Compute | NVIDIA L40S | 48 GB | 12 vCPU, 72 GB RAM, 625 GB storage | Iowa | $0.88/GPU/hr | Available |
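To turn the hourly rates listed above into a job estimate, multiply the per-GPU rate by GPU count and hours. The sketch below uses a static snapshot of the five L40S offers shown here; live prices change, and the tuple layout and helper names are illustrative assumptions rather than any provider's API.

```python
# Snapshot of the L40S offers listed above: (provider, gpus, usd_per_gpu_hour).
OFFERS = [
    ("TensorDock", 1, 0.55),
    ("RunPod", 1, 0.86),
    ("Massed Compute", 4, 0.88),
    ("Massed Compute", 2, 0.88),
    ("Massed Compute", 1, 0.88),
]


def job_cost(usd_per_gpu_hour, gpus, hours):
    """Total cost in USD for running `gpus` GPUs for `hours` hours."""
    return round(usd_per_gpu_hour * gpus * hours, 2)


def cheapest(offers, min_gpus=1):
    """Lowest per-GPU rate among offers with at least `min_gpus` GPUs."""
    eligible = [o for o in offers if o[1] >= min_gpus]
    return min(eligible, key=lambda o: o[2]) if eligible else None


provider, gpus, rate = cheapest(OFFERS, min_gpus=4)
print(provider, job_cost(rate, gpus, hours=24))
```

Filtering by `min_gpus` matters because the cheapest single-GPU offer may not be available in the multi-GPU configuration a job needs.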
When to Choose the L40S
The L40S is the better choice for deployments that need immediate availability and low cost. Cloud pricing starts at $0.40 per hour and averages $1.10 per hour across 18 offers, whereas the MI355X currently has no live instances. Its 350 W TDP and PCIe form factor let it drop into existing PCIe 4.0 systems without specialized OAM support.
Current workloads like Stable Diffusion generation or fine-tuning models under 48 GB VRAM benefit from the L40S's 362 TFLOPS FP16 and 724 TFLOPS FP8, delivering reliable performance without overprovisioning power or memory.
When to Choose the MI355X
The MI355X stands out for workloads demanding extreme scale and memory capacity. Its 288 GB HBM3e VRAM handles LLMs that exceed the L40S's 48 GB limit, while 8000 GB/s bandwidth supports massive batch sizes in training.
Compute-heavy tasks benefit from its 2,300 TFLOPS FP16, 2,300 TFLOPS FP32, and 4,600 TFLOPS FP8, which suit FP32-intensive scientific computing and large-scale inference. The trade-offs are the 750 W TDP and the OAM form factor, which requires compatible server hardware.
Use Cases
The MI355X's 288 GB HBM3e VRAM and 8,000 GB/s bandwidth support massive batch sizes and models exceeding the L40S's 48 GB limit. Its 2,300 TFLOPS FP16 outperforms the L40S's 362 TFLOPS, shortening the time per training step.
4600 TFLOPS FP8 on the MI355X accelerates high-throughput inference for large models, surpassing the L40S's 724 TFLOPS. 288 GB VRAM enables deployment without multi-GPU sharding.
The L40S's 48 GB VRAM suffices for most fine-tuning tasks under that threshold, with immediate availability at $0.40 per hour. 362 TFLOPS FP16 handles efficient iterations without the MI355X's 750 W overhead.
Stable Diffusion models fit within 48 GB GDDR6X, and the L40S's 724 TFLOPS FP8 delivers fast generation. Lower 350 W TDP and PCIe compatibility suit creative workflows.
2300 TFLOPS FP32 on the MI355X excels in simulations requiring high precision, far beyond the L40S's 91 TFLOPS. Infinity Fabric aids multi-node scaling.
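For the inference use case above, memory bandwidth often matters as much as FLOPS. A common rule of thumb is that single-stream token generation must read every weight once per token, so bandwidth divided by weight size gives a rough ceiling on tokens per second. The sketch below applies that rule to the table's bandwidth figures; it ignores KV-cache traffic, batching, and compute limits, and the 20 GB model size is a hypothetical example.

```python
# Memory bandwidth from the specification table, in GB/s.
BANDWIDTH_GBPS = {"L40S": 864, "MI355X": 8000}


def decode_ceiling_tokens_per_s(gpu, weights_gb):
    """Rough upper bound on batch-1 decode speed for a memory-bound model.

    Assumes each generated token streams all weights from VRAM exactly once.
    """
    return BANDWIDTH_GBPS[gpu] / weights_gb


# Hypothetical model with 20 GB of weights, small enough to fit on both cards.
for gpu in BANDWIDTH_GBPS:
    print(f"{gpu}: <= {decode_ceiling_tokens_per_s(gpu, 20):.1f} tokens/s")
```

Measured throughput will be lower than this ceiling, and batching changes the picture because several requests share each pass over the weights.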
Frequently Asked Questions
What is the VRAM difference between L40S and MI355X?
The L40S provides 48 GB GDDR6X VRAM, while the MI355X offers 288 GB HBM3e. This sixfold increase enables the MI355X to load much larger models without distribution across multiple GPUs.
How do FP16 performance figures compare?
The MI355X achieves 2300 TFLOPS FP16, over six times the L40S's 362 TFLOPS. This gap accelerates AI training and inference on the MI355X for FP16-heavy workloads.
What are the current cloud prices for these GPUs?
L40S instances start at $0.40 per hour with an average of $1.10 per hour across 18 offers. The MI355X has no live cloud offers available yet.
Which GPU has higher memory bandwidth?
The MI355X delivers 8,000 GB/s with HBM3e, compared to the L40S's 864 GB/s with GDDR6X. Higher bandwidth speeds up memory-bound work such as token generation and large-batch data movement.
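What are the TDP ratings?
The L40S is rated at 350 W TDP, less than half the MI355X's 750 W. The lower per-card draw makes the L40S easier to deploy densely in power-constrained racks, although by the table's figures the MI355X delivers more peak FP16 throughput per watt (about 3.1 versus 1.0 TFLOPS/W).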
Which is better for FP32 workloads?
The MI355X provides 2300 TFLOPS FP32, vastly superior to the L40S's 91 TFLOPS. It suits scientific computing and simulations needing high single-precision performance.
Which is cheaper to rent, the L40S or the MI355X?
Cloud rental prices for both the L40S and MI355X vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L40S have compared to the MI355X?
The L40S has 48 GB of GDDR6X memory. The MI355X has 288 GB of HBM3e memory.
Can I find L40S and MI355X GPUs available to rent right now?
L40S instances are available now; the MI355X has no live cloud offers yet. This page shows real-time availability across 25+ cloud GPU providers, and the Live Cloud Pricing section displays only in-stock offers with current pricing.
What is the main difference between the L40S and the MI355X?
The L40S uses the Ada Lovelace architecture (2023) while the MI355X uses CDNA 4 (2025). The MI355X delivers 6.4x the FP16 throughput and 9.3x the memory bandwidth of the L40S.


