Hiya,

Recently upgraded my server to an i5-12400 CPU, and I've been wanting to push it a bit. I'm looking to host my own LLM tasks and workloads, such as building pipelines that scan open-source projects for vulnerabilities and insecure code, to mention one of the things I want to start doing. Inspiration for this came from reading about the recent scans of the curl project.

Sidenote: I have no intention of swamping devs with AI bug reports. I simply want to scan projects that I personally use, so I'm aware of their current state and future changes before I blindly update the apps I host.

What budget-friendly GPU should I be looking for? AFAIK VRAM is quite important, the higher the better. What other features do I need to be on the lookout for?
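
To make it concrete, here is a minimal sketch of the kind of pipeline step I have in mind, assuming a local Ollama instance on its default port; the model name is just a placeholder:

```python
# Toy sketch: ask a locally hosted model to review one source file.
# Assumes an Ollama server on localhost:11434; the model name is a placeholder.
import pathlib
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5-coder:14b"  # placeholder; whatever fits the GPU

def review_file(path: str) -> str:
    code = pathlib.Path(path).read_text(errors="replace")
    prompt = (
        "Review the following code for vulnerabilities and insecure patterns. "
        "List findings with line references, or say 'no findings'.\n\n" + code
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(review_file("src/example.c"))
```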

  • Domi@lemmy.secnd.me · 2 hours ago

    Not sure if it counts as “budget friendly” but the best and cheapest method right now to run decently sized models is a Strix Halo machine like the Bosgame M5 or the Framework Desktop.

    Not only does it have 128GB of VRAM/RAM, it sips power at 10W idle and 120W full load.

    It can run models like gpt-oss-120b or glm-4.5-air (Q4/Q6) at full context length and even larger models like glm-4.6, qwen3-235b, or minimax-m2 at Q3 quantization.

    Running these models is otherwise not currently possible without putting 128GB of RAM in a server mainboard or paying the Nvidia tax to get an RTX 6000 Pro.
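
    Rough napkin math on why that much memory matters; the parameter counts and bits-per-weight below are rough assumptions, not measurements:

    ```python
    # Back-of-the-envelope VRAM estimate: weights only, ignoring KV cache overhead.
    # Parameter counts and effective bits per weight are rough assumptions.
    def weight_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for name, params, bits in [
        ("gpt-oss-120b @ ~Q4", 120, 4.5),
        ("glm-4.5-air @ ~Q6", 106, 6.5),
        ("qwen3-235b @ ~Q3", 235, 3.5),
    ]:
        print(f"{name}: ~{weight_gb(params, bits):.0f} GB for the weights alone")
    ```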

  • dieTasse@feddit.org · 1 hour ago

    I bought a used RTX 2060 12GB VRAM edition for about 150 bucks and it runs pretty well. 4B models run well, and I've even run 12B models; while it's not the fastest experience, it's still decent enough. The sad truth is that Nvidia GPUs are miles better than any other cards for AI, even when running Linux.

  • Sims@lemmy.ml · 1 hour ago

    Compute prices are not going down. If you want budget-friendly compute, then you may need to look at NPUs. They are slower, and are unfortunately also getting more expensive per TOPS. Electricity prices in the West are going up as limited output gets bought up by datacenters instead. The newest GPUs are more efficient in terms of power per TOPS, but they are not budget friendly.

    I have not seen a recent comparison of GPU vs NPU (TOPS vs price vs power consumption), but on AliExpress I saw a small 40 TOPS NPU as an NVMe stick with 16GB RAM that draws 10 watts or so (search for ‘ai accelerator nvme’). This little thing can scan files 24/7 and your i5 can help out at peaks. AFAIK you can run a distributed model that also runs on, and uses, your i5’s memory/compute, so if you max out the i5’s memory, perhaps the combined compute is enough for a larger model? Maybe a few NPU sticks can work together on the same model?

    Alternatively, you could look for the next version of a Huawei GPU card (the first one had teething problems AFAIK), or one from the other upcoming producers in China. They’ll come faster and faster, but are earmarked for local consumption first.

    Another suggestion is to buy one of the old P40/80 cards (from the ‘Pascal’ chipset, I think) (or was it K40/80??). They should still support a decent range of modern quantization needs and often have 24GB RAM. A refurbished miner card costs around $50-70+, can’t remember exactly though.

    Lastly, you could try something different. If the i5, with enough memory, can run a large model slowly, you could add a dedicated KV cache, so most of the tokens won’t need to be recalculated. Memory and bandwidth are the most important here, but any old server could be upgraded to be a dedicated KV cache server (might need a network upgrade though!).
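
    To illustrate the caching idea only (this is a toy, not a real inference server; the model call here is a stand-in):

    ```python
    # Toy illustration: reuse previously computed state for a repeated prompt
    # prefix instead of recomputing it. The "model" below is a stand-in.
    import hashlib

    cache = {}  # hash of prompt prefix -> opaque precomputed state

    def expensive_prefill(prefix: str):
        print("recomputing prefill for prefix...")
        return {"tokens_processed": len(prefix)}  # stand-in for real KV state

    def generate(prefix: str, question: str) -> str:
        key = hashlib.sha256(prefix.encode()).hexdigest()
        if key not in cache:
            cache[key] = expensive_prefill(prefix)  # slow path, done once
        state = cache[key]                          # fast path on repeats
        return f"answer to {question!r} reusing {state['tokens_processed']} cached prefix tokens"

    # Same project context asked about twice: the second call skips the prefill.
    print(generate("long project context...", "any insecure code?"))
    print(generate("long project context...", "anything changed since the last scan?"))
    ```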

    Anyway, ideas are easy and cheap. If I were you, I would ask an AI for a little Python app where you can add a product and it returns a graph comparing it with other products, showing optimality over time given price, power, and TOPS, GPU vs NPU. Good hunting…
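
    Something along these lines; every number below is a made-up placeholder just to show the shape of the comparison:

    ```python
    # Toy cost-over-time comparison: upfront price plus electricity, per TOPS.
    # All figures below are made-up placeholders; plug in real ones.
    KWH_PRICE = 0.30       # $/kWh, assumption
    HOURS_PER_DAY = 8      # average daily load, assumption

    products = {
        # name: (price $, power W, TOPS)
        "some-gpu": (800, 300, 300),
        "some-npu-stick": (120, 10, 40),
    }

    for years in (1, 2, 3):
        hours = years * 365 * HOURS_PER_DAY
        print(f"--- after {years} year(s) ---")
        for name, (price, watts, tops) in products.items():
            energy_cost = watts / 1000 * hours * KWH_PRICE
            total = price + energy_cost
            print(f"{name}: total ${total:.0f}, ${total / tops:.2f} per TOPS")
    ```

    Printing the same numbers per year is enough to see where the lines cross; plotting them is an easy next step.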

    • irmadlad@lemmy.world · 3 hours ago

      <giggle> I’ve self-hosted a few of the bite-sized LLMs. The thing that’s keeping me from having a full-blown, self-hosted AI platform is that my little GeForce 1650 just doesn’t have the ass to really do it up right. If I’m going to consult with AI, I want the answers within at least 3 or 4 minutes, not hours. LOL

      • Diplomjodler@lemmy.world · 3 hours ago

        Quite so. The cheapest card that I’d put any kind of real AI workload on is the 16GB Radeon 9060 XT. That’s not what I would call budget-friendly, which is why I consider a budget-friendly AI GPU to be a mythical beast.

  • comrade_twisty@feddit.org · 6 hours ago

    AFAIK the budget-friendliest local AI solutions currently are Mac Minis! Thanks to their unified CPU/GPU/RAM architecture they are powerhouses for AI and astonishingly well priced for what they can put out.

    • troed@fedia.io · 5 hours ago

      Agree, this is exactly what I went with recently in the same situation.

  • Eager Eagle@lemmy.world · 4 hours ago

    Intel has some GPUs that are more cost-effective than Nvidia’s when it comes to VRAM.

    The Arc A770 is selling for $370 in the US, and the new B50 for $399, both with 16GB.

    The B60 has 24GB, but I’m not sure where to find it.
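
    Quick dollars-per-GB math using the prices quoted in this thread (prices move, so treat it as a snapshot):

    ```python
    # Dollars per GB of VRAM, using prices mentioned in this thread (snapshot only).
    cards = {
        "Arc A770 16GB": (370, 16),
        "Arc B50 16GB": (399, 16),
        "Used RTX 3090 24GB": (800, 24),  # price quoted elsewhere in the thread
    }

    for name, (price, vram_gb) in cards.items():
        print(f"{name}: ${price / vram_gb:.0f} per GB")
    ```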

  • snekerpimp@lemmy.world · 5 hours ago

    Everyone is mentioning Nvidia, but AMD’s ROCm has improved tremendously in the last few years, making a 6900 XT 16GB an attractive option for me. I currently have a 6700 XT 12GB that works no problem with Ollama and ComfyUI, and an Instinct MI25 16GB that works with some fiddling as well. From what I understand, an MI50 32GB requires less fiddling. However, the Instinct line is passively cooled, so finding a way to cool them might be a reason to stay away.

    Edit: I should add, my experience is on a few Linux distributions; I cannot attest to the experience on Windows.
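
    For anyone curious, the “fiddling” on consumer Radeon cards often comes down to one environment override before starting the runtime. A rough sketch; the exact version value depends on your card, so double-check it:

    ```python
    # Commonly cited workaround for ROCm on consumer Radeon cards: spoof a
    # supported GPU target before launching the runtime. The value below is
    # the one usually quoted for RDNA2 cards (e.g. 6700 XT); verify for yours.
    import os
    import subprocess

    env = os.environ.copy()
    env["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"  # assumption: RDNA2-class card

    # Launch Ollama's server with the override applied.
    subprocess.run(["ollama", "serve"], env=env, check=True)
    ```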

  • drkt@scribe.disroot.org · 6 hours ago

    It’s all VRAM, that’s the bottleneck for even the best GPUs. AMD support is spotty so you should stay in Nvidia’s claws unless you know what you’re doing. Figure out what kind of money you’re willing to part with, and then get whatever Nvidia GPU gets you the most VRAM.

  • state_electrician@discuss.tchncs.de · 6 hours ago

    I heard about people using multiple used 3090s on a single motherboard for this. Apparently it delivers a lot of bang for the buck compared to a single card with loads of VRAM.

      • MalReynolds@slrpnk.net · 47 minutes ago

        Nah, NVLink is irrelevant for inference workloads (inference happens almost entirely on the cards; models are split up over multiple cards and tokens are piped over PCIe as necessary). It’s mildly useful for training, but you’ll get there without it.
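
        For what it’s worth, splitting a model across two cards is just a loader option in most runtimes. A llama-cpp-python sketch with a placeholder model path (assumes a CUDA-enabled build):

        ```python
        # Sketch: split one GGUF model across two GPUs. Path and split ratio
        # are placeholders; requires a CUDA-enabled build of llama-cpp-python.
        from llama_cpp import Llama

        llm = Llama(
            model_path="models/some-70b-q4.gguf",  # placeholder
            n_gpu_layers=-1,          # offload all layers to the GPUs
            tensor_split=[0.5, 0.5],  # share the weights evenly across two cards
            n_ctx=8192,
        )

        print(llm("Say hello.", max_tokens=32)["choices"][0]["text"])
        ```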

  • poVoq@slrpnk.net · 5 hours ago

    Recent models run surprisingly well on CPUs if you have sufficient regular RAM. You can also use a low-VRAM GPU and offload parts of the model to the CPU. If you are just starting out and want to play around, I would try that first. 64GB of system RAM is a good amount for that.
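
    A minimal llama-cpp-python sketch of that CPU/GPU split; the model path is a placeholder and the layer count is whatever fits your VRAM:

    ```python
    # Sketch: keep most of the model in system RAM and push only some layers
    # to a small GPU. Model path and layer count are placeholders to tune.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/some-12b-q4.gguf",  # placeholder
        n_gpu_layers=20,  # only this many layers go to the GPU; the rest stays on CPU
        n_ctx=4096,
    )

    print(llm("Summarize what a KV cache is.", max_tokens=64)["choices"][0]["text"])
    ```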

  • afk_strats@lemmy.world · 6 hours ago

    3090 24GB ($800 USD)

    3060 12GB x 2 if you have 2 PCIe slots (<$400 USD)

    Radeon MI50 32GB with Vulkan (<$300) if you have more time, space, and will to tinker

  • marud@piefed.marud.fr · 5 hours ago

    Don’t forget that the “budget-friendly” card’s cost does not include the “non-budget-friendly” power bill that goes with it.