Hiya,
Recently upgraded my server to an i5-12400 CPU and have been wanting to push it a bit. I've been looking to host my own LLM tasks and workloads, such as building pipelines that scan open-source projects for vulnerabilities and insecure code, to mention one of the things I want to start doing. Inspiration for this started after reading about the recent scans of the Curl project.
Sidenote: I have no intention of swamping devs with AI bug reports. I simply want to scan projects that I personally use, to be aware of their current state and future changes before I blindly update apps I host.
What budget-friendly GPU should I be looking for? AFAIK VRAM is quite important, the higher the better. What other features do I need to be on the lookout for?


Compute prices are not going down. If you want budget-friendly compute, you may need to look at NPUs. They are slower, and are unfortunately also getting more expensive per TOPS. Electricity prices in the West are going up as the available output gets bought up by datacenters instead. The newest GPUs are more efficient in terms of power per TOPS, but they are not budget friendly.
I have not seen a recent comparison of GPU vs NPU (TOPS vs price vs power consumption), but on AliExpress I saw a small 40 TOPS NPU as an NVMe stick with 16 GB RAM that draws 10 watts or so (search for 'ai accelerator nvme'). This little thing can scan files 24/7, and your i5 can help out in peaks. AFAIK you can run a distributed model that also uses your i5's memory/compute, so if you max out the i5's memory, perhaps the combined pool is enough for a larger model? Maybe a few NPU sticks can work together on the same model?
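You can sanity-check the "combined pool" idea with back-of-envelope math before buying anything. A minimal sketch below; the host RAM figure, parameter counts, quantization bit-widths, and the 20% runtime overhead are all assumptions to replace with your own numbers:

```python
# Rough estimate: does a quantized model fit in the combined RAM of
# an NPU stick (16 GB) plus a maxed-out i5 host (assumed 64 GB here)?
# All figures are ballpark assumptions, not measurements.

GIB = 1024**3

def model_bytes(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate memory footprint of the weights.
    `overhead` covers KV cache, activations and runtime buffers (assumed ~20%)."""
    return params_billions * 1e9 * (bits_per_weight / 8) * overhead

pool_gib = 16 + 64  # NPU stick RAM + host RAM (assumption)

for params, bits in [(7, 4), (13, 4), (34, 4), (70, 4), (70, 8)]:
    need = model_bytes(params, bits) / GIB
    verdict = "fits" if need <= pool_gib else "too big"
    print(f"{params:>3}B @ {bits}-bit: ~{need:5.1f} GiB -> {verdict} in {pool_gib} GiB pool")
```

Whether the runtime can actually split a model across the stick and the host is a separate question, but at least this tells you if the memory side is plausible.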
Alternatively, you could look for the next version of a Huawei GPU card (the first one had teething problems, AFAIK), or one from the other upcoming producers in China. They'll come faster and faster, but are earmarked for domestic consumption first.
Another suggestion is to buy one of the old P40/P100 cards (from the 'Pascal' generation, I think) (or was it K40/80??). They should still support a decent range of modern quantization needs and often have 24 GB of VRAM. A refurbished miner card costs around $50-70+, can't remember exactly though.
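If you do grab one of these, check which generation you actually got before picking quantization backends, since Pascal (sm 6.x) has no tensor cores and weak FP16, while Kepler (sm 3.x) is too old for most modern stacks. A minimal sketch using PyTorch (assumes a CUDA-enabled torch install; the capability-to-generation mapping only covers a few examples):

```python
# Quick check of an old datacenter card's compute capability and VRAM.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found")

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    vram_gib = torch.cuda.get_device_properties(i).total_memory / 1024**3
    # Partial mapping from capability major version to architecture
    gen = {3: "Kepler (K40/K80)", 6: "Pascal (P40/P100)",
           7: "Volta/Turing", 8: "Ampere"}.get(major, "unknown")
    print(f"{name}: sm_{major}{minor}, ~{vram_gib:.0f} GiB VRAM, {gen}")
```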
Lastly, you could try something different. If the i5, with enough memory, can run a large model slowly, you could add a dedicated KV cache, so most of the tokens won't need to be recalculated. Memory and bandwidth are the most important factors here, and any old server could be upgraded into a dedicated KV cache server (might need a network upgrade though!).
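The core idea is just prefix reuse: key/value tensors for an already-seen prompt prefix get stored once and looked up instead of recomputed. A toy sketch of the lookup layer, where the store, the hashing scheme, and the KV "blob" are all illustrative stand-ins, not a real serving API:

```python
# Toy prefix-keyed KV store: cache key/value tensors per prompt prefix so
# repeated prefixes (system prompts, file headers) are not recomputed.
# In practice this lives on a RAM-heavy box and is fetched over the network,
# which is why memory size and link bandwidth matter more than compute.
import hashlib

class PrefixKVCache:
    def __init__(self):
        self._store = {}  # prefix hash -> opaque KV blob

    @staticmethod
    def _key(token_ids: list[int]) -> str:
        return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

    def get_longest_prefix(self, token_ids: list[int]):
        """Return (num_cached_tokens, kv_blob) for the longest cached prefix."""
        for end in range(len(token_ids), 0, -1):
            blob = self._store.get(self._key(token_ids[:end]))
            if blob is not None:
                return end, blob
        return 0, None

    def put(self, token_ids: list[int], kv_blob) -> None:
        self._store[self._key(token_ids)] = kv_blob

# Usage: check the cache, then only run the model on the uncached tail.
cache = PrefixKVCache()
prompt = [1, 17, 42, 99, 7]
cache.put(prompt[:3], kv_blob="<KV tensors for first 3 tokens>")  # hypothetical blob
hit_len, kv = cache.get_longest_prefix(prompt)
print(f"cached tokens: {hit_len}, recompute only {len(prompt) - hit_len} tokens")
```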
Anyway, ideas are easy and cheap. If I were you, I would ask an AI for a little Python app where you can add a product and get back a graph comparing it with other products, showing optimality over time given prices/power and TOPS, GPU vs NPU.
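Something like this could start from a few lines of Python. A minimal sketch computing cost per TOPS over time; every price, wattage, and TOPS figure below is a made-up placeholder to swap for real listings, and plotting (e.g. with matplotlib) is left out to keep it dependency-free:

```python
# Compare total cost of ownership per TOPS over time: purchase price
# plus electricity for 24/7 operation. All numbers are placeholders.
ELECTRICITY_PER_KWH = 0.30  # assumed price in $/kWh

products = {
    # name: (purchase_price_usd, watts, tops)
    "NPU nvme stick":   (120, 10, 40),
    "Used P40":         (180, 250, 47),
    "New midrange GPU": (600, 200, 200),
}

def cost_per_tops(price: float, watts: float, tops: float, hours: float) -> float:
    """Total cost of ownership divided by TOPS after `hours` of 24/7 use."""
    energy_cost = watts / 1000 * hours * ELECTRICITY_PER_KWH
    return (price + energy_cost) / tops

for name, (price, watts, tops) in products.items():
    row = "  ".join(f"{months:>2}mo: ${cost_per_tops(price, watts, tops, months * 730):6.2f}"
                    for months in (6, 12, 24))
    print(f"{name:<18} $/TOPS -> {row}")
```

The longer the horizon, the more the power draw dominates the purchase price, which is exactly where the low-watt NPU sticks start looking interesting. Good hunting…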