Just a stranger trying things.

  • 2 Posts
  • 33 Comments
Joined 2 years ago
cake
Cake day: July 16th, 2023

help-circle
  • The Hobbyist@lemmy.ziptoSelfhosted@lemmy.worldI installed Ollama. Now what?
    link
    fedilink
    English
    arrow-up
    25
    arrow-down
    1
    ·
    edit-2
    9 days ago

    Ollama is very useful but also rather barebones. I recommend installing Open-Webui to manage models and conversations. It will also be useful if you want to tweak more advanced settings like system prompts, seed, temperature and others.

    You can install open-webui using docker or just pip, which is enough if you only care about serving yourself.

    Edit: open-webui also renders markdown, which makes formatting and reading much more appealing and useful.

    Edit2: you can also plug ollama into continue.dev, an extension to vscode which brings the LLM capabilities to your IDE.



  • The Hobbyist@lemmy.ziptoSelfhosted@lemmy.worldImmich - 2024 Recap 🎊
    link
    fedilink
    English
    arrow-up
    2
    arrow-down
    1
    ·
    edit-2
    1 month ago

    I’m not saying it’s not true, but nowhere on that page is there the word donation. And if it is, the fact that it is described and a license, tied to a server or a user causes a lot of confusion to me, especially when combined with the fact that there is no paywall but that it requires registration.

    Why use the term license, server and user? Why not simply say donation and with the option of displaying the support by getting exclusive access to a badge like signal does?

    Again, I’m very happy immich is free, it is great software and it deserves support but this is just super confusing to me and the buy.immich.app link does not clarify things nor does that blog post.

    Edit: typo


  • Hi and thank you so much for the fantastic work on Immich! I’m hoping to get a chance to try it out soon, with the first stable release!

    One question on the financial support page: is it not a donation? There is a per server and a per user purchase, but I thought immich was exclusively self hosted, is it not? Or is this more like a way to say thanks while giving some hints as to how immich is being used privately? Or is there a way to actually pay to have immich host a server for one?

    Thanks for clarifying!







  • I didn’t say it can’t. But I’m not sure how well it is optimized for it. From my initial testing it queues queries and submits them one after another to the model, I have not seen it batch compute the queries, but maybe it’s a setup thing on my side. vLLM on the other hand is designed specifically for the multi co current user use case and has multiple optimizations for it.


  • The Hobbyist@lemmy.ziptoSelfhosted@lemmy.worldSelf-hosting LLMs
    link
    fedilink
    English
    arrow-up
    23
    ·
    edit-2
    3 months ago

    I run the Mistral-Nemo(12B) and Mistral-Small (22B) on my GPU and they are pretty code. As others have said, the GPU memory is one of the most limiting factors. 8B models are decent, 15-25B models are good and 70B+ models are excellent (solely based on my own experience). Go for q4_K models, as they will run many times faster than higher quantization with little performance degradation. They typically come in S (Small), M (Medium) and (Large) and take the largest which fits in your GPU memory. If you go below q4, you may see more severe and noticeable performance degradation.

    If you need to serve only one user at the time, ollama +Webui works great. If you need multiple users at the same time, check out vLLM.

    Edit: I’m simplifying it very much, but hopefully should it is simple and actionable as a starting point. I’ve also seen great stuff from Gemma2-27B

    Edit2: added links

    Edit3: a decent GPU regarding bang for buck IMO is the RTX 3060 with 12GB. It may be available on the used market for a decent price and offers a good amount of VRAM and GPU performance for the cost. I would like to propose AMD GPUs as they offer much more GPU mem for their price but they are not all as supported with ROCm and I’m not sure about the compatibility for these tools, so perhaps others can chime in.

    Edit4: you can also use openwebui with vscode with the continue.dev extension such that you can have a copilot type LLM in your editor.


  • Indeed, quite surprising. You got to “stroke their fur the right way” so to speak haha

    Also, I’m increasingly more impressed with the rapid progress reaching open-weights models: initially I was playing with Llama3.1-8B which is already quite useful for simple querries. Then lately I’ve been trying out Mistral-Nemo (12B) and Mistrall-Small (22B) and they are quite much more capable. I have a 12GB GPU and so far those are the most powerful models I can run decently. I’m using them to help me in writing tasks for ansible, learning the inner workings of the Linux kernel and some bootloader stuff. I find them quite helpful!



  • I have no idea if ollama can handle multi-GPU. The 70B in it’s q2_k quantized form requires already 26GB of memory, so you would need at least that to run it well and that would only imply it could be entirely run on GPU, which is the best case scenario, but not at what speed.

    I know some people with apple silicon who have enough memory to run the 70B model and for them it runs fast enough to be usable. You may be able to find more info about it online.





  • The interface called open-webui can run in a container, but ollama runs as a service on your system, from my understanding.

    The models are local and only answer queries by default. It all happens on the system without any additional tools. Now, if you want to give them internet access, you can, it is an option you have to setup and open-webui makes that possible though I have not tried it myself. I just see it.

    I have never heard of any llm “answer base queries offline before contacting their provider for support”. It’s almost impossible for the LLM to do it by itself without you setting things up for it that way.


  • whats great is that with ollama and webui, you can as easily run it all on one computer locally using the open-webui pip package or in a remote server using the container version of open-webui.

    Ive run both and the webui is really well done. It offers a number of advanced options, like the system prompt but also memory features, documents for RAG and even a built in python ide for when you want to execute python functions. You can even enable web browsing for your model.

    I’m personally very pleased with open-webui and ollama and they both work wonders together. Hoghly recommend it! And the latest llama3.1 (in 8 and 70B variants) and llama3.2 (in 1 and 3B variants) work very well, even on CPU only, for the latter! Give it a shot, it is so easy to set up :)