Specifications Compared
| Spec | L40 | MI325X |
|---|---|---|
| TDP | 300W | 750W |
| VRAM | 48 GB | 256 GB |
| CUDA Cores | 18,176 | |
| Memory Type | GDDR6 | HBM3e |
| Architecture | Ada Lovelace | CDNA 3 |
| Form Factors | PCIe | OAM |
| Interconnect | | Infinity Fabric |
| Tensor Cores | 568 | |
| FP16 Performance | 90.5 TFLOPS | 1,307 TFLOPS |
| FP32 Performance | 90.5 TFLOPS | 1,307 TFLOPS |
| INT8 Performance | 724 TOPS | 2,614 TOPS |
| Memory Bandwidth | 864 GB/s | 6,000 GB/s |
Performance Analysis
The MI325X lists 1,307 TFLOPS for both FP16 and FP32, compared to the L40's 90.5 TFLOPS. This roughly 14-fold increase in peak throughput translates to faster matrix multiplications, which are essential for neural network training and inference. Training large language models benefits directly from this throughput, which can substantially shorten training runs on equivalent datasets.
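The ratios quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that only divides the peak figures from the specification table; the `SPECS` dictionary and the `spec_ratio` helper are hypothetical names introduced for illustration, and peak ratios say nothing about achieved performance on a real workload.

```python
# Peak figures copied from the specification table on this page.
# The dictionary layout and helper name are illustrative assumptions.
SPECS = {
    "L40": {"fp16_tflops": 90.5, "int8_tops": 724.0, "bandwidth_gbs": 864.0},
    "MI325X": {"fp16_tflops": 1307.0, "int8_tops": 2614.0, "bandwidth_gbs": 6000.0},
}


def spec_ratio(metric: str, numerator: str = "MI325X", denominator: str = "L40") -> float:
    """Return how many times larger `metric` is on one GPU than on the other."""
    return SPECS[numerator][metric] / SPECS[denominator][metric]


if __name__ == "__main__":
    for metric in SPECS["L40"]:
        print(f"{metric}: {spec_ratio(metric):.1f}x")
```

The FP16 and bandwidth ratios come out at about 14.4x and 6.9x, which matches the figures used elsewhere on this page.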
Memory differences strongly affect real-world usage. The MI325X's 256 GB of HBM3e VRAM can hold the weights of a model with over 100 billion parameters at 16-bit precision (2 bytes per parameter) on a single GPU, and larger models with quantization, while the L40's 48 GB of GDDR6 limits users to smaller models or multi-GPU setups. Coupled with 6,000 GB/s of bandwidth versus 864 GB/s, the MI325X can sustain larger batch sizes before memory becomes the bottleneck, which improves throughput in data-intensive tasks.
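A rough way to reason about which models fit on each card is to multiply the parameter count by the bytes per parameter and compare the result against usable VRAM. The sketch below does only that. It assumes 1 GB = 10^9 bytes and reserves a hypothetical 20% of VRAM for activations, KV cache, and framework overhead; real headroom depends heavily on batch size, sequence length, and whether the workload is training or inference.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weights_gb(params_billions: float, dtype: str) -> float:
    """Memory for the model weights alone, in GB (assuming 1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes / 1e9 bytes-per-GB simplifies to this.
    return params_billions * BYTES_PER_PARAM[dtype]


def fits(params_billions: float, dtype: str, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving `overhead_fraction` of VRAM.

    The 20% default is an illustrative assumption, not a measured value.
    """
    usable_gb = vram_gb * (1.0 - overhead_fraction)
    return weights_gb(params_billions, dtype) <= usable_gb


def max_params_billions(vram_gb: float, dtype: str,
                        overhead_fraction: float = 0.2) -> float:
    """Largest model (in billions of parameters) whose weights fit."""
    return vram_gb * (1.0 - overhead_fraction) / BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name, vram in (("L40", 48), ("MI325X", 256)):
        for dtype in ("fp16", "fp8"):
            limit = max_params_billions(vram, dtype)
            print(f"{name} {dtype}: up to ~{limit:.0f}B parameters (weights only)")
```

Under these assumptions, a 70-billion-parameter model in FP16 (140 GB of weights) fits on one MI325X but not on one L40, while a 13-billion-parameter model in FP16 (26 GB) fits on either.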
Power and precision add nuance. The L40's 300W TDP allows denser deployments than the MI325X's 750W, but the MI325X's FP8 throughput of 2,614 TFLOPS favors inference on quantized models, which typically yields higher tokens per second in production serving.
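The trade-off between power draw and throughput can be made concrete by dividing peak throughput by TDP and by counting how many cards fit in a fixed power budget. The sketch below uses the TDP and throughput figures as listed in the specification table on this page. The `Gpu` class and the 6,000 W budget are hypothetical, and the calculation counts GPU power only, ignoring host CPUs, networking, and cooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    tdp_watts: float
    fp16_tflops: float
    int8_tops: float

    def fp16_tflops_per_watt(self) -> float:
        """Peak FP16 throughput per watt of TDP."""
        return self.fp16_tflops / self.tdp_watts

    def int8_tops_per_watt(self) -> float:
        """Peak INT8 throughput per watt of TDP."""
        return self.int8_tops / self.tdp_watts

    def cards_per_budget(self, budget_watts: float) -> int:
        """Whole cards that fit in a GPU-only power budget."""
        return int(budget_watts // self.tdp_watts)


# Values as listed in the specification table on this page.
L40 = Gpu("L40", tdp_watts=300, fp16_tflops=90.5, int8_tops=724)
MI325X = Gpu("MI325X", tdp_watts=750, fp16_tflops=1307, int8_tops=2614)

if __name__ == "__main__":
    budget = 6000  # hypothetical GPU power budget in watts
    for gpu in (L40, MI325X):
        cards = gpu.cards_per_budget(budget)
        print(
            f"{gpu.name}: {cards} cards, "
            f"{cards * gpu.fp16_tflops:.0f} peak FP16 TFLOPS total, "
            f"{gpu.fp16_tflops_per_watt():.2f} TFLOPS/W"
        )
```

With these inputs the L40 fits more cards into the same budget, while the MI325X delivers more peak throughput per watt, which is the nuance the paragraph above describes.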
Live Cloud Pricing
Real-time prices from 25+ providers. Updated every 60 seconds.
L40
| Provider | GPU Model | VRAM | Host Specs | Region | Price | Status |
|---|---|---|---|---|---|---|
| TensorDock | NVIDIA L40S | 48 GB | 0 vCPU, 0 GB RAM | Wolverhampton | $0.55/GPU/hr | Available |
| RunPod | NVIDIA L40 | 48 GB | 8 vCPU, 94 GB RAM | Global | $0.82/GPU/hr | |
| RunPod | NVIDIA L40S | 48 GB | 16 vCPU, 94 GB RAM | Global | $0.86/GPU/hr | |
| Massed Compute | NVIDIA L40 | 48 GB | 14 vCPU, 72 GB RAM, 625 GB storage | Iowa | $0.86/GPU/hr | Available |
| Massed Compute | 2× NVIDIA L40 | 48 GB | 26 vCPU, 144 GB RAM, 1,250 GB storage | Iowa | $0.86/GPU/hr ($1.72/hr total, 2×) | Available |
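To compare offers like these programmatically, one approach is to filter by GPU model, pick the lowest per-GPU price, and multiply by GPU count and hours to estimate a job's cost. The sketch below hardcodes the five rows shown above as a snapshot. The `Offer` type and helper names are hypothetical, and live prices will differ from this snapshot.

```python
from typing import List, NamedTuple, Optional


class Offer(NamedTuple):
    provider: str
    model: str
    gpu_count: int
    price_per_gpu_hour: float  # USD


# Snapshot of the offers listed in the table on this page.
OFFERS: List[Offer] = [
    Offer("TensorDock", "L40S", 1, 0.55),
    Offer("RunPod", "L40", 1, 0.82),
    Offer("RunPod", "L40S", 1, 0.86),
    Offer("Massed Compute", "L40", 1, 0.86),
    Offer("Massed Compute", "L40", 2, 0.86),
]


def cheapest_offer(model: str, offers: List[Offer] = OFFERS) -> Optional[Offer]:
    """Lowest per-GPU price for an exact model name, or None if no offer matches."""
    matches = [o for o in offers if o.model == model]
    if not matches:
        return None
    return min(matches, key=lambda o: o.price_per_gpu_hour)


def job_cost(offer: Offer, hours: float) -> float:
    """Total cost in USD of renting every GPU in the offer for `hours`."""
    return offer.price_per_gpu_hour * offer.gpu_count * hours


if __name__ == "__main__":
    best = cheapest_offer("L40")
    if best is not None:
        print(f"Cheapest L40 in snapshot: {best.provider} at ${best.price_per_gpu_hour:.2f}/GPU/hr")
        print(f"24-hour job: ${job_cost(best, 24):.2f}")
```

Matching on the exact model name keeps L40 and L40S offers separate, which matters because the two cards are listed side by side in this table.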
When to Choose the L40
The L40 excels in scenarios demanding immediate availability and power efficiency. With cloud pricing starting at $0.67 per hour across 14 offers, it delivers cost-effective access for workloads fitting within 48 GB VRAM, such as fine-tuning mid-sized models or inference on Stable Diffusion. Its 300W TDP and PCIe form factor support easier integration into existing clusters without extensive cooling upgrades.
When to Choose the MI325X
The MI325X stands out for memory-constrained, large-scale AI tasks. Its 256 GB of HBM3e VRAM and 6,000 GB/s of bandwidth enable single-GPU handling of very large LLMs and support batch sizes that are infeasible on the L40's 48 GB of GDDR6. Its 1,307 TFLOPS of FP16/FP32 and 2,614 TFLOPS of FP8 throughput accelerate training and quantized inference, making it well suited to research that pushes model scale.
Use Cases
The MI325X's 256 GB of HBM3e VRAM and 1,307 TFLOPS of FP16 throughput handle massive models and large batches on a single GPU. The L40's 48 GB limits scalability.
FP8 at 2,614 TFLOPS and 6,000 GB/s of bandwidth on the MI325X boost quantized serving throughput. The L40's Ada Lovelace Tensor Cores also support FP8, but at far lower peak throughput.
The L40's 90.5 TFLOPS of FP32 and 48 GB of VRAM suffice for mid-sized models at $0.67 per hour. The MI325X is overkill for most such cases.
The L40's 48 GB of GDDR6 meets image generation needs efficiently at a 300W TDP. The MI325X's higher specs are unnecessary here.
The MI325X's 6,000 GB/s of bandwidth and Infinity Fabric excel in data-parallel simulations. The L40's 864 GB/s trails.
Frequently Asked Questions
What is the VRAM capacity of the L40 versus MI325X?
The L40 provides 48 GB of GDDR6 VRAM. The MI325X offers 256 GB of HBM3e, about 5.3 times the capacity, which leaves room for correspondingly larger models on a single device.
How do FP16 performance levels compare?
The L40 achieves 90.5 TFLOPS FP16. The MI325X reaches 1,307 TFLOPS, roughly 14 times higher, which accelerates AI training.
What are the current cloud prices?
The L40 starts at $0.67 per hour and averages $0.89 across 14 offers. The MI325X has no live offers available.
Which GPU has higher memory bandwidth?
The MI325X delivers 6,000 GB/s with HBM3e. The L40 provides 864 GB/s with GDDR6, nearly seven times less.
What are the TDP ratings?
The L40 is rated at 300W. The MI325X is rated at 750W, which demands robust power and cooling infrastructure.
What form factors do they use?
The L40 uses PCIe for standard server compatibility. The MI325X uses OAM with Infinity Fabric for AMD ecosystems.
Which is cheaper to rent, the L40 or the MI325X?
Cloud rental prices for both the L40 and MI325X vary by provider, configuration, and availability. This page shows live pricing from 25+ providers updated every 60 seconds. Scroll to the Live Cloud Pricing section to compare current rates.
Can I find L40 and MI325X GPUs available to rent right now?
This page shows real-time availability across 25+ cloud GPU providers, and the Live Cloud Pricing section displays only in-stock offers with current pricing. At the time of writing, L40 offers are listed, while the MI325X has no live offers.
What is the main difference between the L40 and the MI325X?
The L40 uses the Ada Lovelace architecture (2023) while the MI325X uses CDNA 3 (2024). The MI325X delivers 14.4x the FP16 throughput and 6.9x the memory bandwidth of the L40.


