Using Mac M2 Ultra 192GB to Self-Host LLMs?

shaserlark@sh.itjust.works · 1 year ago

just_another_person@lemmy.world · 1 year ago

I’ve not run such things on Apple hardware, so can’t speak to the functionality, but you’d definitely be able to do it cheaper with PC hardware.

The problem with this kind of setup is going to be heat. There are definitely cheaper mini PCs, but I wouldn’t think they have the space for this much memory AND a GPU, so you’d maybe be looking for an AMD APU/NPU combo. You could easily build something about the size of a game console that does this for maybe $1.5k.

awesomesauce309@midwest.social · 1 year ago

For context length, VRAM is important: you can’t split a context across memory pools, so it would be limited to maybe 16GB. With the M series you get a lot more space since RAM and VRAM are the same pool, but it’s RAM at Apple prices. You can get a 24GB+ setup way cheaper than some NVIDIA server card, though.
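To put rough numbers on the context-length point, here is a minimal sketch of a KV-cache size estimate. It assumes a standard transformer that stores one key and one value vector per layer per token. The model shape used (80 layers, 8 KV heads, head dimension 128, 16-bit cache) and the 32k context are illustrative assumptions, not measurements from any setup in this thread.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Approximate KV-cache size: one K and one V vector per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens


# Hypothetical 70B-class model shape (assumed values, for illustration only).
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, n_tokens=32_768)
print(f"{size / 2**30:.1f} GiB")
```

With those assumed numbers the cache alone comes to about 10 GiB, on top of the model weights, which is why a 16GB pool fills up quickly.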

shaserlark@sh.itjust.works · 1 year ago

Yeah, the VRAM of the Mac M series is very attractive for running models at full context length, and the memory bandwidth is quite good for token generation relative to the price, power consumption and heat generation of NVIDIA GPUs.

Since I’ll have to put this in my kitchen/living room that’d be a big plus but idk how well prompt processing would work if I send over like 80k tokens.
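On the bandwidth side, a common rule of thumb is that token generation has to read every weight once per token, so memory bandwidth divided by model size gives a ceiling on tokens per second. A minimal sketch, assuming the 800 GB/s Apple quotes for the M2 Ultra and a hypothetical ~35 GB of weights (roughly a 70B model at about 4 bits per parameter, which is my assumption, not a tested configuration):

```python
def max_decode_tokens_per_s(bandwidth_gb_s, weights_gb):
    """Upper bound on generation speed if each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


M2_ULTRA_BANDWIDTH_GB_S = 800  # Apple's published memory bandwidth figure
ASSUMED_WEIGHTS_GB = 35        # assumption: ~70B parameters at ~4 bits each

bound = max_decode_tokens_per_s(M2_ULTRA_BANDWIDTH_GB_S, ASSUMED_WEIGHTS_GB)
print(f"at most ~{bound:.0f} tokens/s")
```

This is only a ceiling for generation. Prompt processing is generally limited by compute rather than bandwidth, so it says nothing about how long an 80k-token prompt would take.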

shaserlark@sh.itjust.works · 1 year ago

I’d honestly be open to that, but wouldn’t an AMD setup take up a lot of space, consume lots of power and be loud?

It seems like in terms of price & speed, the Macs suck compared to other options, but if you don’t have a lot of space and don’t want to hear an airplane engine constantly I’m wondering if there are options.

just_another_person@lemmy.world · 1 year ago

~~I just looked, and the MM maxes out at 24G anyway. Not sure where you got the thought of 196GB at.~~ NVM, you said M2 Ultra.

Look, you have two choices. Just pick one. Whichever is more cost effective and works for you is the winner. Talking it down to the Nth degree here isn’t going to help you with the actual barriers to entry you’ve put in place.

windowsphoneguy@feddit.org · 1 year ago

Mac Mini M4 Pro can be ordered with up to 64GB shared memory

shaserlark@sh.itjust.works · 1 year ago

I understand what you’re saying, but I’m coming to this community because I like getting more input, hearing about the experience of others and potentially learning about things I didn’t know about. I wouldn’t ask specifically in this community if I didn’t want to optimize my setup as much as I can.

just_another_person@lemmy.world · 1 year ago

Here’s a quick idea of what you’d want in a PC build https://newegg.io/2d410e4

shaserlark@sh.itjust.works · 1 year ago

Thanks, that’s very helpful! Will look into that type of build

just_another_person@lemmy.world · 1 year ago

You can have a slightly bigger package in PC form that does 4x the work for half the price. That’s the gist.

BorgDrone@lemmy.one · 1 year ago

> you’d definitely be able to do it cheaper with PC hardware.

You can get a GPU with 192GB VRAM for less than a Mac? Sign me up please.

just_another_person@lemmy.world · 1 year ago

An AMD APU uses system RAM as VRAM, so…yeah. Same for the NPU.

BorgDrone@lemmy.one · 1 year ago

And what is the memory bandwidth on these APUs?

just_another_person@lemmy.world · 1 year ago

As fast as it gets to the CPU. That should be pretty obvious.

BorgDrone@lemmy.one · 1 year ago

Which is how fast?
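For a ballpark, theoretical peak DRAM bandwidth is channels × bus width × transfer rate. A minimal sketch, assuming a dual-channel DDR5-5600 configuration with 64-bit channels (an assumption for illustration, not the spec of any particular APU), compared against the 800 GB/s Apple quotes for the M2 Ultra:

```python
def peak_dram_bandwidth_gb_s(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s for a DRAM configuration."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * mega_transfers_per_s / 1000


# Assumed configuration: dual-channel DDR5-5600 (hypothetical example system).
apu_peak = peak_dram_bandwidth_gb_s(channels=2, mega_transfers_per_s=5600)
m2_ultra = 800  # GB/s, Apple's published figure

print(f"assumed APU system: {apu_peak:.1f} GB/s")
print(f"M2 Ultra is roughly {m2_ultra / apu_peak:.1f}x that")
```

Real-world throughput will be below the theoretical peak on both sides, so treat the ratio as an order-of-magnitude comparison only.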

OhVenus_Baby@lemmy.ml · 1 year ago

Up to half of system RAM*