Thanks! Hadn’t thought of YouTube at all, but it’s super helpful. I guess that’ll help me decide whether the extra RAM is worth it, considering that inference will be much slower if I don’t go NVIDIA.
Yeah, I was thinking about running something like Code Qwen 72B, which apparently requires 145 GB of RAM for the full model. But if it’s super slow, especially with large context, and I can only run small models at acceptable speed anyway, it may be worth going NVIDIA just for CUDA.
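(For reference, that figure tracks with the back-of-envelope math: FP16 weights take about 2 bytes per parameter, so 72B × 2 B ≈ 144 GB before you add the KV cache. A 4-bit quant of the same model would land somewhere around 36–45 GB, which is why quantized builds fit on much smaller machines.)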
Meh, ofc I don’t.
Thanks, that’s very helpful! I’ll look into that type of build.
I understand what you’re saying, but I’m coming to this community because I like getting more input, hearing about others’ experiences, and potentially learning about things I didn’t know about. I wouldn’t ask in this community specifically if I didn’t want to optimize my setup as much as I can.
Interesting! Is there any kind of model you could run at a reasonable speed?
I guess it could amortize over time, but if the usability sucks, that may make it not worth it. OTOH, I really don’t want to send my data to any company.
I’d honestly be open to that, but wouldn’t an AMD setup take up a lot of space and consume lots of power / be loud?
It seems like, in terms of price and speed, the Macs suck compared to other options, but if you don’t have a lot of space and don’t want to hear an airplane engine running constantly, I’m wondering what the options are.
Yeah, the unified memory of the Mac M series is very attractive for running models at full context length, and the memory bandwidth is quite good for token generation, compared to the price, power consumption, and heat output of NVIDIA GPUs.
Since I’ll have to put this in my kitchen/living room, that’d be a big plus, but idk how well prompt processing would work if I send over something like 80k tokens.
So would this work well, e.g., with the *arr stack? Because most of the services wouldn’t even need to run all the time.
I’m intrigued! But how does it compare to React, which is pretty straightforward? I’m not a frontend dev, so what’s really great about React is that it works super well with LLMs.
Thanks! Super helpful, and I’d love to have the compose and install script. I also looked into the Helm charts, but I’m still wondering whether I should eventually go down that route.
Thanks! What about CPU usage? How many CPUs did you assign to the environment the container runs in?
Thanks! What resources are you running it on? I’m looking into a VPS that could host it, and ChatGPT recommends 4-8 vCPUs and 16 GB of RAM, which sounds reasonable. But let’s say I’m running it on k8s: does that leave any room for, e.g., running other services on the same cluster?
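On k8s you can make that explicit with resource requests and limits, so the scheduler knows how much headroom is left for everything else. A rough sketch, assuming the service runs as a Deployment; the name, image, and numbers are all placeholders, not a sizing recommendation:

```yaml
# Hypothetical Deployment fragment: cap this service so the rest of
# the cluster keeps headroom for other workloads.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: example/my-service:latest
          resources:
            requests:         # what the scheduler reserves for this pod
              cpu: "2"
              memory: 8Gi
            limits:           # hard ceiling for this container
              cpu: "4"
              memory: 12Gi
```

With requests like that, an 8 vCPU / 16 GB node would still have several cores and a few GB of RAM free for other services; the scheduler only counts the requests, so whatever you don’t reserve stays schedulable.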
Thank you! I’m running a Servarr stack over Docker Compose and have managed some Kubernetes clusters in the past (although poorly, tbh). Any idea how complicated it is in comparison? Also, do you use their Helm charts?
I would deploy the whole app via k8s Helm charts, use the CI/CD tools, and set up Traefik/Ingress for load balancing with Cloudflare pointing at it. In the future I might be collaborating with other people, so I’d want the architecture to be solid.
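For the Traefik part, the usual pattern is a standard Ingress that Cloudflare’s DNS points at. A minimal sketch, assuming Traefik is installed as the ingress controller with its usual `websecure` entrypoint; the hostname and service name are placeholders:

```yaml
# Hypothetical Ingress routed by Traefik; Cloudflare DNS would point
# app.example.com at the cluster's external IP / load balancer.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  annotations:
    # tell Traefik to serve this route on its TLS entrypoint
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  ingressClassName: traefik
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app        # Service created by the Helm chart
                port:
                  number: 80
```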
I would actually want to use it to integrate with k8s: I’d deploy the app on Kubernetes and do load balancing plus point a Cloudflare domain at it, so I’d need the whole thing to be solid. I think I do need a lot of the features, but I don’t necessarily need GitLab specifically if something FOSS could offer the same.
Thanks! May I ask what kind of setup you were running, and whether there’s any feature you miss that existed in GitLab but doesn’t exist in Forgejo?
Thanks! This actually looks really interesting. Did you try doing CI/CD with it? In the future I’d probably collaborate with others who’d also be using my self-hosted Git. What would be critical for me is being able to set it up so that when I open a PR, that branch automatically gets deployed to a dev Kubernetes environment; when I merge into main, it automatically deploys to staging; and only when I release a tag does it end up in prod. I’d also like to do secrets management through the platform. I like that Forgejo is non-commercial, and I’d prefer it over GitLab if it can do these things well.
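For what it’s worth, Forgejo Actions uses GitHub-Actions-compatible workflow syntax, so that flow could look roughly like the sketch below. This is only an illustration: the chart path, release names, namespace mapping, and the `KUBECONFIG` secret are my assumptions, not anything Forgejo prescribes, and the runner label depends on your runner setup.

```yaml
# .forgejo/workflows/deploy.yml (hypothetical)
# PR -> dev, push to main -> staging, released tag -> prod.
name: deploy
on:
  pull_request:              # every PR gets deployed to the dev cluster
  push:
    branches: [main]         # merges to main go to staging
    tags: ["v*"]             # release tags go to prod

jobs:
  deploy:
    runs-on: docker          # label depends on how your runner is registered
    steps:
      - uses: actions/checkout@v4
      - name: Pick target environment
        id: env
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            echo "target=dev" >> "$GITHUB_OUTPUT"
          elif [ "${{ github.ref_type }}" = "tag" ]; then
            echo "target=prod" >> "$GITHUB_OUTPUT"
          else
            echo "target=staging" >> "$GITHUB_OUTPUT"
          fi
      - name: Helm deploy
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}  # stored as a repo/org secret
        run: |
          echo "$KUBECONFIG_DATA" > kubeconfig
          KUBECONFIG=./kubeconfig helm upgrade --install \
            myapp-${{ steps.env.outputs.target }} ./chart \
            --namespace "${{ steps.env.outputs.target }}" --create-namespace
```

Secrets (like that kubeconfig) live in the repo or org settings and get injected via the `secrets` context, which covers the secrets-management part of your requirement.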
Yeah, I found some stats now, and indeed you’re going to wait something like an hour for processing if you throw 80-100k tokens into a powerful model. With APIs that kind of works instantly. Not surprising, but just to give a comparison. Bummer.
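(The arithmetic backs that up: at, say, ~25 tokens/s of prompt processing, 90k tokens take roughly an hour, while hardware that chews through prompts at thousands of tokens/s finishes the same input in well under a minute.)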