Hiya,

Recently upgraded my server to an i5-12400 CPU, and have neen wanting to push my server a bit. Been looking to host my own LLM tasks and workloads, such as building pipelines to scan open-source projects for vulnerabilities and insecure code, to mention one of the things I want to start doing. Inspiration for this started after reading the recent scannings of the Curl project.

Sidenote: I have no intention of swamping devs with AI bugreports, i will simply want to scan projects that i personally use to be aware of its current state and future changes, before i blindly update apps i host.

What budget friendly GPU should i be looking for? Afaik VRAM is quite important, higher the better. What other features do i need to be on the look out for?

  • Domi@lemmy.secnd.me
    link
    fedilink
    English
    arrow-up
    7
    ·
    6 hours ago

    Not sure if it counts as “budget friendly” but the best and cheapest method right now to run decently sized models is a Strix Halo machine like the Bosgame M5 or the Framework Desktop.

    Not only does it have 128GB of VRAM/RAM, it sips power at 10W idle and 120W full load.

    It can run models like gpt-oss-120b or glm-4.5-air (Q4/Q6) at full context length and even larger models like glm-4.6, qwen3-235b, or minimax-m2 at Q3 quantization.

    Running these models is otherwise not currently possible without putting 128GB of RAM in a server mainboard or paying the Nvidia tax to get a RTX 6000 Pro.