Specifications Compared
| Spec | L40 | MI355X |
|---|---|---|
| TDP | 300W | 750W |
| VRAM | 48 GB | 288 GB |
| CUDA Cores | 18,176 | N/A (AMD GPU) |
| Memory Type | GDDR6 | HBM3e |
| Architecture | Ada Lovelace | CDNA 4 |
| Form Factors | PCIe | OAM |
| Interconnect | PCIe 4.0 x16 (no NVLink) | Infinity Fabric |
| Tensor Cores | 568 | N/A (AMD uses Matrix Cores) |
| FP16 Performance | 90.5 TFLOPS (non-tensor); 181 TFLOPS (Tensor Core, dense) | 2,300 TFLOPS (matrix) |
| FP32 Performance | 90.5 TFLOPS | 157.3 TFLOPS (vector) |
| INT8 Performance | 724 TOPS (with sparsity) | 4,600 TOPS |
| Memory Bandwidth | 864 GB/s | 8,000 GB/s |
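As a quick sanity check on the ratios quoted on this page, the sketch below derives them from the table's headline figures. The dictionary layout and the `ratios` helper are hypothetical names chosen for illustration. Note that the L40 FP16 entry is its non-tensor rate, so that particular ratio overstates a like-for-like comparison.

```python
# Headline spec-sheet figures from the comparison table (hypothetical dict layout).
# The L40 FP16 value is its non-tensor rate; the MI355X value is its matrix rate.
L40 = {"vram_gb": 48, "bandwidth_gbs": 864, "fp16_tflops": 90.5, "tdp_w": 300}
MI355X = {"vram_gb": 288, "bandwidth_gbs": 8000, "fp16_tflops": 2300, "tdp_w": 750}


def ratios(base: dict, other: dict) -> dict:
    """Return other/base for every spec key the two dicts share, to one decimal."""
    return {key: round(other[key] / base[key], 1) for key in base if key in other}


if __name__ == "__main__":
    for spec, factor in ratios(L40, MI355X).items():
        print(f"{spec}: MI355X is {factor}x the L40")
```

Running it reproduces the 6x VRAM, roughly 9.3x bandwidth and roughly 25.4x headline FP16 factors cited in the text.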
Performance Analysis
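Compute performance differs dramatically between the GPUs. On the headline figures, the MI355X's 2,300 TFLOPS of FP16 matrix throughput is about 25x the L40's 90.5 TFLOPS, but the L40 figure is its non-tensor rate; against the L40's roughly 181 TFLOPS of dense Tensor Core FP16, the gap is closer to 13x. That advantage speeds up neural network training, where FP16 and BF16 mixed precision dominate. FP32 is much closer: the MI355X's 157.3 TFLOPS vector rate is about 1.7x the L40's 90.5 TFLOPS. The MI355X's 4,600 TFLOPS of FP8 further boosts quantized inference.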
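To make the throughput gap concrete, a common back-of-the-envelope estimate puts dense transformer training at roughly 6 FLOPs per parameter per token. The sketch below turns that into wall-clock hours. The function name, the 7B-parameter / 1B-token job and the 40% utilization default are assumptions for illustration, not measured results; the 181 TFLOPS entry is the L40's dense Tensor Core FP16 rating from NVIDIA's datasheet.

```python
def training_hours(params: float, tokens: float, peak_tflops: float,
                   utilization: float = 0.4) -> float:
    """Estimate single-GPU training time in hours.

    Uses the ~6 * params * tokens FLOP rule of thumb for dense transformers.
    `utilization` is an assumed fraction of peak throughput actually sustained.
    """
    total_flops = 6.0 * params * tokens
    sustained_flops_per_s = peak_tflops * 1e12 * utilization
    return total_flops / sustained_flops_per_s / 3600.0


if __name__ == "__main__":
    # Hypothetical job: 7B parameters trained on 1B tokens.
    for name, peak in [("L40 (non-tensor FP16)", 90.5),
                       ("L40 (Tensor Core FP16, dense)", 181.0),
                       ("MI355X (FP16 matrix)", 2300.0)]:
        print(f"{name}: {training_hours(7e9, 1e9, peak):.1f} h")
```

Because the estimate is linear in peak throughput, the predicted speedup is simply the ratio of the peak figures plugged in, which is why the choice of L40 figure matters so much.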
Memory specifications matter just as much in practice. The MI355X's 288 GB of HBM3e can hold models that would have to be split across several L40s, each limited to 48 GB. Its 8,000 GB/s of bandwidth, about 9.3x the L40's 864 GB/s, keeps memory-bound work such as large-model inference fed, while the extra capacity leaves room for larger batch sizes during training.
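A rough way to check whether a model fits on one GPU is to multiply its parameter count by bytes per parameter. The sketch below assumes 2 bytes per parameter for FP16/BF16 weights and the widely used rule of thumb of about 16 bytes per parameter for full fine-tuning with mixed-precision Adam (weights, gradients, an FP32 master copy and two optimizer moments). It ignores activations and KV cache, and the 10% headroom and function names are hypothetical.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}
ADAM_MIXED_PRECISION_BYTES = 16.0  # 2 weights + 2 grads + 4 FP32 master + 8 Adam moments


def weights_gb(params: float, dtype: str = "fp16") -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def full_finetune_gb(params: float) -> float:
    """Rule-of-thumb memory for full fine-tuning (activations not included)."""
    return params * ADAM_MIXED_PRECISION_BYTES / 1e9


def fits(required_gb: float, vram_gb: float, headroom: float = 0.9) -> bool:
    """True if the requirement fits in the usable fraction of VRAM (assumed 90%)."""
    return required_gb <= vram_gb * headroom


if __name__ == "__main__":
    for label, need in [("70B inference (FP16)", weights_gb(70e9)),
                        ("7B full fine-tune", full_finetune_gb(7e9))]:
        print(f"{label}: {need:.0f} GB | L40 fits: {fits(need, 48)} | "
              f"MI355X fits: {fits(need, 288)}")
```

For example, a 70B-parameter model needs about 140 GB for FP16 weights alone, which exceeds a single L40 but fits on one MI355X.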
Power draw differs as well: the L40's 300 W TDP suits dense, conventionally cooled PCIe servers, while the MI355X's 750 W TDP demands robust cooling in exchange for its much higher throughput.
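Because TDP alone says little about efficiency, it can help to normalize throughput by power. The sketch below uses the table's TDP and headline FP16 figures to compute peak TFLOPS per watt and how many GPUs, and how much aggregate peak FP16, fit in a fixed power budget. The 10 kW budget and helper names are hypothetical, and TDP is a board-level limit rather than measured draw; host CPUs, networking and cooling are excluded.

```python
def tflops_per_watt(tflops: float, tdp_w: float) -> float:
    """Peak throughput per watt of board power."""
    return tflops / tdp_w


def gpus_in_budget(budget_w: int, tdp_w: int, tflops: float) -> tuple[int, float]:
    """GPUs that fit in a power budget (GPU TDP only) and their summed peak TFLOPS."""
    count = budget_w // tdp_w
    return count, count * tflops


if __name__ == "__main__":
    budget = 10_000  # hypothetical 10 kW allocated to GPUs
    for name, tdp, fp16 in [("L40", 300, 90.5), ("MI355X", 750, 2300.0)]:
        n, total = gpus_in_budget(budget, tdp, fp16)
        print(f"{name}: {tflops_per_watt(fp16, tdp):.2f} TFLOPS/W, "
              f"{n} GPUs, {total:,.0f} TFLOPS peak FP16 in {budget} W")
```

On these headline figures the MI355X comes out roughly 10x ahead per watt (about 3.07 vs 0.30 TFLOPS/W), so the L40's lower TDP mainly matters where per-slot power or air cooling is the binding constraint.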
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40 (listings include L40S)
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S 48GB VRAM | 48GB | Not listed | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40 48GB VRAM | 48GB | 8 vCPU, 94GB RAM | Global | $0.82/GPU/hr | |
| RunPod | NVIDIA L40S 48GB VRAM | 48GB | 16 vCPU, 94GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40 48GB VRAM | 48GB | 14 vCPU, 72GB RAM, 625GB storage | Iowa | $0.86/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40 48GB VRAM | 48GB per GPU | 26 vCPU, 144GB RAM, 1250GB storage | Iowa | $0.86/GPU/hr ($1.72/hr total) | Available |
When to Choose the L40
The L40 excels in cost-effective, readily available cloud deployments. When this comparison was compiled, L40 pricing started at $0.67 per hour and averaged $0.89 per hour across 14 offers; live listings change constantly, and some providers list the related L40S alongside it. Its 300 W TDP and standard PCIe form factor fit conventional servers without specialized infrastructure. Choose it for workloads that fit within 48 GB of VRAM, such as moderate fine-tuning or serving several smaller models.
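The pricing listings mix L40 and L40S cards, so any starting price depends on filtering by model. The sketch below hard-codes the snapshot rows from the pricing table (prices will have changed since), picks the cheapest exact-match L40 offer and estimates the cost of a job. The `Offer` class and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    model: str
    gpus: int
    price_per_gpu_hr: float


# Snapshot of the rows in the pricing table; live prices change constantly.
OFFERS = [
    Offer("TensorDock", "L40S", 1, 0.55),
    Offer("RunPod", "L40", 1, 0.82),
    Offer("RunPod", "L40S", 1, 0.86),
    Offer("Massed Compute", "L40", 1, 0.86),
    Offer("Massed Compute", "L40", 2, 0.86),
]


def cheapest(offers: list[Offer], model: str) -> Offer:
    """Lowest per-GPU hourly price among offers whose model matches exactly."""
    return min((o for o in offers if o.model == model),
               key=lambda o: o.price_per_gpu_hr)


def job_cost(offer: Offer, gpu_hours: float) -> float:
    """Cost of a job measured in GPU-hours at the offer's per-GPU rate."""
    return offer.price_per_gpu_hr * gpu_hours


if __name__ == "__main__":
    best = cheapest(OFFERS, "L40")
    print(f"Cheapest L40: {best.provider} at ${best.price_per_gpu_hr:.2f}/GPU/hr")
    print(f"100 GPU-hours: ${job_cost(best, 100):.2f}")
```

Matching on the exact model string keeps the cheaper L40S listing from being counted as an L40 price.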
When to Choose the MI355X
The MI355X targets large-scale AI training and inference. With 288 GB of HBM3e and 2,300 TFLOPS of FP16 matrix throughput, it can hold many large LLMs on a single GPU, and its 8,000 GB/s of bandwidth supports high batch sizes. Its OAM form factor and Infinity Fabric GPU-to-GPU links are built for dense multi-GPU nodes in HPC and AI clusters, at the cost of a 750 W TDP.
Use Cases
Large-scale training: the MI355X's 288 GB of VRAM and 2,300 TFLOPS of FP16 accommodate large models and datasets without partitioning them across GPUs. The L40's 48 GB and 90.5 TFLOPS limit it to smaller runs.
Inference: the MI355X's 4,600 TFLOPS of FP8 and 8,000 GB/s of bandwidth enable high-throughput serving of large models. On the L40, 48 GB of VRAM caps model and batch size, and 864 GB/s of bandwidth limits token throughput.
Fine-tuning: the L40's 48 GB of VRAM is enough for many fine-tuning jobs on small and mid-sized models. The MI355X's 288 GB handles larger parameter counts and trains them faster.
Image generation: the L40's 48 GB of GDDR6 and 864 GB/s of bandwidth handle typical image-generation workloads efficiently at rental prices from about $0.67 per hour. The MI355X is usually overkill here.
HPC and simulation: the MI355X's 157.3 TFLOPS of FP32, large memory, and Infinity Fabric links suit memory-hungry simulations. The L40's 90.5 TFLOPS of FP32 suits lighter HPC tasks.
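The inference comparisons above rest largely on memory bandwidth: at small batch sizes, generating each token requires streaming roughly all model weights from memory, so bandwidth divided by weight size gives an approximate upper bound on single-stream decode speed. The sketch below applies that roofline-style approximation. It ignores KV-cache traffic, batching and kernel efficiency, and the function name is hypothetical.

```python
from typing import Optional


def decode_tokens_per_s_bound(params: float, bytes_per_param: float,
                              bandwidth_gbs: float, vram_gb: float) -> Optional[float]:
    """Approximate upper bound on batch-1 decode speed, or None if weights don't fit.

    Assumes every generated token reads all weights once from GPU memory.
    """
    weight_gb = params * bytes_per_param / 1e9
    if weight_gb > vram_gb:
        return None  # would need multiple GPUs or more aggressive quantization
    return bandwidth_gbs / weight_gb


if __name__ == "__main__":
    gpus = {"L40": (864, 48), "MI355X": (8000, 288)}
    for model, params in [("13B FP16", 13e9), ("70B FP16", 70e9)]:
        for gpu, (bw, vram) in gpus.items():
            bound = decode_tokens_per_s_bound(params, 2, bw, vram)
            shown = "does not fit" if bound is None else f"<= {bound:.0f} tokens/s"
            print(f"{model} on {gpu}: {shown}")
```

Under this approximation the bound scales directly with bandwidth, so for any model that fits on both cards the MI355X's ceiling is about 9.3x the L40's.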
Frequently Asked Questions
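Which GPU has higher FP16 performance?
The MI355X, with 2,300 TFLOPS of FP16 matrix throughput versus 90.5 TFLOPS for the L40. That headline gap is about 25x, but the L40 figure is its non-tensor rate; against the L40's roughly 181 TFLOPS of dense Tensor Core FP16, the gap is closer to 13x. In FP32, the MI355X's 157.3 TFLOPS is about 1.7x the L40's 90.5 TFLOPS.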
What is the VRAM difference between L40 and MI355X?
The MI355X has 288 GB of HBM3e, six times the L40's 48 GB of GDDR6, so it can load much larger models on a single GPU. Its bandwidth is also far higher: 8,000 GB/s versus 864 GB/s.
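How do power requirements compare?
The L40 has a 300 W TDP in a PCIe card, while the MI355X requires 750 W in an OAM module. The L40 suits power-constrained, conventionally cooled clusters; the MI355X needs advanced cooling. For GPU-to-GPU communication, the MI355X uses Infinity Fabric, whereas the L40 relies on PCIe.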
Is MI355X available in cloud pricing?
No MI355X offers were listed when this page was compiled. L40 pricing started at $0.67 per hour, averaging $0.89 per hour across 14 offers. The MI355X launched in 2025.
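What architectures power these GPUs?
The L40 uses NVIDIA's Ada Lovelace architecture, introduced in 2022; the MI355X uses AMD's CDNA 4 (2025). Both support FP8: the MI355X is rated at 4,600 TFLOPS, while the L40's fourth-generation Tensor Cores run FP8 at about 362 TFLOPS dense.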
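Which supports larger batch sizes better?
The MI355X. Its 288 GB of VRAM leaves far more room for activations and KV cache than the L40's 48 GB, allowing much larger batches, and its 8,000 GB/s of bandwidth (versus 864 GB/s) keeps those batches fed. This reduces training and inference bottlenecks on the MI355X.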
Which is cheaper to rent, the L40 or the MI355X?
Cloud rental prices for both the L40 and MI355X vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
How much VRAM does the L40 have compared to the MI355X?
The L40 has 48 GB of GDDR6 memory. The MI355X has 288 GB of HBM3e memory.
Can I find L40 and MI355X GPUs available to rent right now?
The Live Cloud Pricing section shows real-time availability and current pricing across 25+ cloud GPU providers. L40 offers are listed there; when this page was compiled, no MI355X offers were listed.
What is the main difference between the L40 and the MI355X?
The L40 uses the Ada Lovelace architecture (2022), while the MI355X uses CDNA 4 (2025). On headline spec-sheet figures, the MI355X delivers 25.4x the FP16 throughput (comparing its matrix rate with the L40's non-tensor rate), 6x the VRAM, and 9.3x the memory bandwidth of the L40.


