Whisper.cpp - NVIDIA RTX 3090 vs Apple M1 Max 24c GPU

Have you ever found yourself in the tech aisle, scratching your head and wondering, "Why should I pick the M1 Max when the PC is flexing its muscles with better test scores?" Well, my friends, it's a story worth telling, and it's all about what makes your tech heart happy!

About a month ago, I finally made up my mind and purchased a used Mac Studio for about $1,200. It is the base model with the M1 Max 10-core CPU, 24-core GPU, 32GB RAM, 512GB SSD and, last but not least, standard 10GBASE-T Ethernet that is also compatible with 2.5GBASE-T and 5GBASE-T.

The famed “Apple Experience” is no different on this machine. It is quiet, it is lightning fast for what I do daily, and it barely runs above 40 degrees C. I love everything about this machine, and I would probably say this is the best computer for $1,200.

What’s really interesting is that Apple has taken the Unified Memory approach: a single RAM pool shared between system RAM and VRAM. The concept is nothing new, since most modern CPUs with integrated graphics use system RAM as the iGPU’s VRAM. However, no one else has paired such a powerful iGPU with their CPU, nor given an iGPU this much memory bandwidth: 400GB/s.

But, how does this compare with a “real” computer? Let’s find out.

Specs:

PC:

CPU: Ryzen 7 7700X 8-core

GPU: NVIDIA RTX 3090 24GB

RAM: 32GB DDR5-5600

SSD: Kingston KC3000 1TB

If you build it out exactly like mine, you are probably looking at $1,200.

Mac Studio:

CPU: M1 Max 10-core

GPU: 24-core

RAM+VRAM: 32GB unified LPDDR5

SSD: Integrated 512GB + Samsung 980 Pro 2TB in a TB3 enclosure

Tests:

This section will be updated over time. The first entry compares performance when using Whisper.cpp to transcribe a 19-minute podcast with both Mandarin Chinese and English spoken.

Whisper.cpp built-in benchmark

Configuration: 4 threads, PC with cuBLAS and Mac with Core ML.

Result

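Model: ggml-large-v3. Times are in ms, lower is better. The last column is the M1 Max time divided by the RTX 3090 time, so a value above 1 means the RTX 3090 was faster.

| Category | RTX 3090 | M1 Max 24c GPU | M1 Max ÷ RTX 3090 |
| --- | --- | --- | --- |
| Load Time | 2081.38 ms | 1015.44 ms | 0.49x |
| Encode Time | 133.96 ms | 571.48 ms | 4.27x |
| Decode Time | 3424.47 ms | 3728.00 ms | 1.09x |
| Batchd Time | 980.06 ms | 2398.49 ms | 2.45x |
| Prompt Time | 511.76 ms | 2164.99 ms | 4.23x |
| Total Time | 5050.95 ms | 8863.60 ms | 1.75x |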

The first thing that caught my eye is how much faster the Mac Studio loads the model: about one second quicker than the PC, and we will see it win that category again. However, that is about the only place where the Mac outperforms a PC with a REAL GPU. The RTX 3090 obliterates the M1 Max 24c GPU in every category that depends on raw GPU compute.
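For anyone who wants to redo the comparison, here is a minimal sketch that recomputes the gap as a plain ratio (M1 Max time divided by RTX 3090 time) from the benchmark timings. The numbers are copied from the results above; the `BENCH_MS` dictionary and the `slowdown` helper are names I made up for illustration and are not part of Whisper.cpp.

```python
# Timings in milliseconds from the built-in benchmark: (RTX 3090, M1 Max).
BENCH_MS = {
    "load":   (2081.38, 1015.44),
    "encode": (133.96, 571.48),
    "decode": (3424.47, 3728.00),
    "batchd": (980.06, 2398.49),
    "prompt": (511.76, 2164.99),
    "total":  (5050.95, 8863.60),
}


def slowdown(rtx_ms: float, m1_ms: float) -> float:
    """Return how many times longer the M1 Max took than the RTX 3090."""
    return m1_ms / rtx_ms


for name, (rtx_ms, m1_ms) in BENCH_MS.items():
    ratio = slowdown(rtx_ms, m1_ms)
    # A ratio above 1 means the RTX 3090 finished that stage sooner.
    winner = "RTX 3090" if ratio > 1 else "M1 Max"
    print(f"{name:<7} {ratio:5.2f}x  faster: {winner}")
```

A ratio is easier to read than a signed percentage here, because "0.49x" and "4.27x" cannot be mistaken for "49% slower" or "426% faster".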

Whisper.cpp, 19-minute audio transcription, with Mandarin Chinese and English spoken.

This is more of a real-world test with an actual workload.

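Model: ggml-large-v3. Times are in ms, lower is better. The last column is the M1 Max time divided by the RTX 3090 time, so a value above 1 means the RTX 3090 was faster.

| Category | RTX 3090 | M1 Max 24c GPU | M1 Max ÷ RTX 3090 |
| --- | --- | --- | --- |
| Load Time | 2008.59 ms | 1026.31 ms | 0.51x |
| Fallbacks | 0p / 1h | 3p / 23h | N/A |
| Mel Time | 382.32 ms | 386.86 ms | 1.01x |
| Sample Time | 6628.67 ms | 6191.47 ms | 0.93x |
| Encode Time | 5368.99 ms | 24429.72 ms | 4.55x |
| Decode Time | 2616.88 ms | 751.55 ms | 0.29x |
| Batchd Time | 69074.67 ms | 244340.34 ms | 3.54x |
| Prompt Time | 1542.44 ms | 6026.75 ms | 3.91x |
| Total Time | 87792.72 ms | 283626.41 ms | 3.23x |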

As expected, the longer the workload runs, the less the initial load time matters. In this real-world test, the Mac Studio took almost 5 minutes to transcribe the 19-minute recording, while the RTX 3090 took only about a minute and a half. The speed difference is real; even Apple can’t bend the laws of physics.
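Another way to look at those totals is speed relative to real time, meaning how many seconds of audio each machine gets through per second of wall-clock time. The sketch below assumes the podcast is exactly 19 minutes long (the article only gives the rounded length), and `realtime_speed` is a hypothetical helper name, not something Whisper.cpp reports.

```python
# Assumption: the recording is treated as exactly 19 minutes of audio.
AUDIO_SECONDS = 19 * 60

# Total times in milliseconds from the 19-minute transcription test.
TOTAL_MS = {
    "RTX 3090": 87792.72,
    "M1 Max 24c GPU": 283626.41,
}


def realtime_speed(audio_seconds: float, wall_ms: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return audio_seconds / (wall_ms / 1000.0)


for machine, wall_ms in TOTAL_MS.items():
    speed = realtime_speed(AUDIO_SECONDS, wall_ms)
    print(f"{machine}: about {speed:.1f}x faster than real time")
```

Under that assumption the RTX 3090 lands at roughly 13x real time and the M1 Max at roughly 4x, so both machines transcribe comfortably faster than the audio plays.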

Ah, the eternal question: why choose the M1 Max when it seems to lag behind the PC in almost every test? I pondered over this a bit and realized it's all about understanding our own needs and preferences.

First up, let's chat about Windows. Oh, Windows 11, you quirky character! It's like that friend who means well but keeps tripping over their own feet. It's veered off from being the reliable workhorse we all knew, morphing into something of a billboard for Microsoft's other ventures. From its merry-go-round of updates to its dance of inconsistent UIs, Windows 11 is a bit like a variety show – you never know what you're getting next! And let's not forget those oh-so-energetic animations that seem more about flair than function. It's like a tech version of "trying too hard to be cool."

Now, let's switch gears to the Mac experience. Picture this: A serene workspace where your computer hums along quietly, almost whispering. That's the Mac Studio for you! Running tests on the PC feels like commanding a space shuttle – the roar of the fans, the RTX 3090 working overtime, guzzling power like it's going out of style. Now, compare that to the Mac Studio, which is more like a zen master, calm and composed, sipping just 80W of power. It's a stark contrast, isn't it?
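To put the 80W figure next to the run times from the 19-minute test, here is a rough sketch of the energy arithmetic. It assumes the Mac Studio averaged 80W for the whole run, which is my reading of the number above and not a separate measurement, and I did not measure the PC's draw at all. So instead of guessing it, the code works out the average power at which the PC would use the same energy for the job. The helper names are made up for illustration.

```python
# Assumption: the Mac Studio averaged 80 W for the entire transcription run.
MAC_WATTS = 80.0

# Total run times from the 19-minute transcription test, in seconds.
MAC_SECONDS = 283626.41 / 1000.0
PC_SECONDS = 87792.72 / 1000.0


def energy_wh(watts: float, seconds: float) -> float:
    """Energy in watt-hours for a constant power draw over a duration."""
    return watts * seconds / 3600.0


def break_even_watts(ref_watts: float, ref_seconds: float, other_seconds: float) -> float:
    """Average power at which the other machine uses the same energy."""
    return ref_watts * ref_seconds / other_seconds


mac_wh = energy_wh(MAC_WATTS, MAC_SECONDS)
pc_limit = break_even_watts(MAC_WATTS, MAC_SECONDS, PC_SECONDS)

print(f"Mac Studio energy for the job: {mac_wh:.1f} Wh")
print(f"PC matches that only if it averages under {pc_limit:.0f} W")
```

Because the PC finishes in under a third of the time, it can draw roughly three times the Mac's power and still break even on energy for this one job. Whether it actually stays under that limit depends on a whole-system power measurement that this post does not have.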

So, while the PC, with its beefy CPU and GPU, flexes its muscles and shows off its power, there's something undeniably charming about the Mac Studio's quiet confidence. It's like choosing between a flashy sports car and a sleek, eco-friendly electric vehicle. Both have their allure, but in the end, it's about what brings joy and ease to your daily life.

In a nutshell, it's not always about raw power; sometimes, it's the quiet ones that make the biggest impact!