Machine Learning & AI on Apple Silicon
For on-device ML inference, Neural Engine TOPS is the headline number—it determines how quickly models like Stable Diffusion or local LLMs generate output. For training and fine-tuning, unified memory size is often the bottleneck: larger models need more RAM, and Apple Silicon's unified architecture lets the GPU and NPU share the full pool. Memory bandwidth governs how fast weights move through the system, directly impacting tokens-per-second for LLM inference.
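The memory-and-bandwidth reasoning above can be sketched numerically. The formulas below are common rules of thumb, not Apple specifications: inference RAM is approximated as weights plus ~20% overhead for KV cache and activations, and single-stream decode speed is bounded by how fast all weights can be streamed from memory once per token.

```python
# Back-of-the-envelope sizing for local LLM inference.
# These formulas are rules of thumb, not Apple specifications.

def model_memory_gb(params_b: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Approximate RAM needed: weights plus ~20% for KV cache and activations."""
    weight_gb = params_b * bits_per_weight / 8  # params in billions -> GB
    return weight_gb * overhead

def decode_tokens_per_sec(bandwidth_gb_s: float, params_b: float, bits_per_weight: int) -> float:
    """Upper bound for single-stream decode: every weight is read once per token."""
    bytes_per_token_gb = params_b * bits_per_weight / 8
    return bandwidth_gb_s / bytes_per_token_gb

# A 70B model quantized to 4 bits on an M5 Max (614 GB/s, 128 GB):
print(round(model_memory_gb(70, 4), 1))            # 42.0 GB -> fits in 128 GB
print(round(decode_tokens_per_sec(614, 70, 4), 1)) # ~17.5 tokens/s ceiling
```

Real throughput lands below this ceiling (compute, cache misses, and framework overhead all take a cut), but the bound explains why bandwidth, not TOPS, usually dominates LLM decode speed.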
Laptops
The M5 Max delivers 61 NPU TOPS and up to 128 GB of unified memory with 614 GB/s bandwidth—enough to run 70B-parameter LLMs on a laptop. The M5 Pro is a practical choice for models up to ~30B parameters. The base M5 handles smaller models and inference workloads well but tops out at 32 GB.
| Spec | M5 (10c CPU / 10c GPU) | M5 (10c CPU / 8c GPU) | A18 Pro (6c CPU / 5c GPU) | M5 Max (18c CPU / 40c GPU) | M5 Max (18c CPU / 32c GPU) | M5 Pro (18c CPU / 20c GPU) | M5 Pro (15c CPU / 16c GPU) |
|---|---|---|---|---|---|---|---|
| Devices | MacBook Air 15″, MacBook Air 13″, MacBook Pro 14″, iPad Pro 13″, iPad Pro 11″, Apple Vision Pro | MacBook Air 13″ | MacBook Neo | MacBook Pro 16″, MacBook Pro 14″ | MacBook Pro 16″, MacBook Pro 14″ | MacBook Pro 16″, MacBook Pro 14″ | MacBook Pro 14″ |
| Neural Engine cores | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| Neural Engine TOPS | 61 | 61 | 35 | 61 | 61 | 61 | 61 |
| CPU cores | 10 | 10 | 6 | 18 | 18 | 18 | 15 |
| Super cores | 4 | 4 | – | 6 | 6 | 6 | 5 |
| Performance cores | – | – | 2 | 12 | 12 | 12 | 10 |
| GPU cores | 10 | 8 | 5 | 40 | 32 | 20 | 16 |
| TFLOPS | 5.13 | 4.11 | – | 20.53 | 16.42 | 10.27 | 8.21 |
| Memory bandwidth (GB/s) | 153.6 | 153.6 | 60 | 614 | 460 | 307 | 307 |
| Memory type | LPDDR5X-9600 | LPDDR5X-9600 | LPDDR5 | LPDDR5X-9600 | LPDDR5X-9600 | LPDDR5X-9600 | LPDDR5X-9600 |
| Memory options (GB) | 16, 24, 32 | 16, 24, 32 | 8 | 48, 64, 128 | 36 | 24, 48, 64 | 24, 48 |
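To connect the table's maximum memory configurations back to the model-size guidance above, a quick filter helps. The fit rule here (weights at 8-bit quantization plus ~20% overhead) is an assumption for illustration, not an Apple figure; at 4-bit, each chip can run roughly twice the parameter count.

```python
# Which laptop chips from the table above can hold a given model?
# Memory figures are the table's maximum configurations; the fit rule
# (8-bit weights + ~20% overhead) is a rule of thumb, not an Apple spec.

max_memory_gb = {
    "M5 Max": 128,
    "M5 Pro": 64,
    "M5": 32,
    "A18 Pro": 8,
}

def fits(params_b: float, mem_gb: int, bits: int = 8, overhead: float = 1.2) -> bool:
    """True if a params_b-billion-parameter model fits in mem_gb of RAM."""
    return params_b * bits / 8 * overhead <= mem_gb

model_sizes = [3, 7, 13, 30, 70]  # billions of parameters
for chip, mem in max_memory_gb.items():
    runnable = [p for p in model_sizes if fits(p, mem)]
    print(f"{chip:8s} ({mem:3d} GB): up to {runnable[-1] if runnable else 0}B")
```

Under these assumptions the output matches the prose: 70B on the M5 Max, ~30B on the M5 Pro, ~13B on the base M5.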
Desktops
For larger models and training workloads, desktop Macs offer more memory headroom. The M3 Ultra in the Mac Studio supports up to 256 GB of unified memory. If your model fits in 128 GB, the M4 Max is the better value: it delivers slightly higher Neural Engine throughput (38 vs 36 TOPS) at a lower price. The M4 in the iMac and Mac mini is suitable for inference on smaller models (up to ~13B parameters).
| Spec | M4 (10c CPU / 10c GPU) | M4 (8c CPU / 8c GPU) | M4 Pro (14c CPU / 20c GPU) | M4 Pro (12c CPU / 16c GPU) | M4 Max (16c CPU / 40c GPU) | M4 Max (14c CPU / 32c GPU) | M3 Ultra (32c CPU / 80c GPU) | M3 Ultra (28c CPU / 60c GPU) |
|---|---|---|---|---|---|---|---|---|
| Devices | iMac, Mac mini | iMac | Mac mini | Mac mini | Mac Studio | Mac Studio | Mac Studio | Mac Studio |
| Neural Engine cores | 16 | 16 | 16 | 16 | 16 | 16 | 32 | 32 |
| Neural Engine TOPS | 38 | 38 | 38 | 38 | 38 | 38 | 36 | 36 |
| CPU cores | 10 | 8 | 14 | 12 | 16 | 14 | 32 | 28 |
| Performance cores | 4 | 4 | 10 | 8 | 12 | 10 | 24 | 20 |
| GPU cores | 10 | 8 | 20 | 16 | 40 | 32 | 80 | 60 |
| TFLOPS | 4.26 | 3.41 | 8.52 | 6.82 | 17.04 | 13.64 | 28.262 | 21.197 |
| Memory bandwidth (GB/s) | 120 | 120 | 273 | 273 | 546 | 409.6 | 819.2 | 819.2 |
| Memory type | LPDDR5X-7500 | LPDDR5X-7500 | LPDDR5X-8533 | LPDDR5X-8533 | LPDDR5X-8533 | LPDDR5X-8533 | LPDDR5-6400 | LPDDR5-6400 |
| Memory options (GB) | 8, 16, 24, 32 | 8, 16, 24, 32 | 24, 48, 64 | 24, 48, 64 | 48, 64, 128 | 36 | 96, 256 | 96, 256 |
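The TFLOPS rows in both tables scale linearly with GPU core count, which is consistent with FP32 throughput ≈ cores × ALUs per core × 2 FLOPs per cycle (fused multiply-add) × clock. The sketch below reverse-engineers the implied GPU clock from the desktop table's figures; the 128-ALUs-per-core count is an assumption based on Apple's recent GPU designs, not a published spec.

```python
# Reverse-engineer the implied GPU clock from the desktop table's TFLOPS
# figures, assuming 128 FP32 ALUs per GPU core and 2 FLOPs per cycle (FMA).
# Both assumptions are inferred, not published Apple specs.

def implied_clock_ghz(tflops: float, gpu_cores: int, alus_per_core: int = 128) -> float:
    flops_per_cycle = gpu_cores * alus_per_core * 2  # FMA counts as 2 FLOPs
    return tflops * 1e12 / flops_per_cycle / 1e9

for name, tflops, cores in [("M4", 4.26, 10), ("M4 Max", 17.04, 40), ("M3 Ultra", 28.262, 80)]:
    print(f"{name}: ~{implied_clock_ghz(tflops, cores):.2f} GHz")
```

Under these assumptions, every M4-family part implies the same ~1.66 GHz clock (confirming linear scaling with core count), while the M3 Ultra comes out lower at ~1.38 GHz, which is why its 80 cores deliver less than 2× the M4 Max's 40-core figure.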