• 1 Post
  • 186 Comments
Joined 2 years ago
cake
Cake day: March 22nd, 2024

help-circle
  • Oh, both! Yeah. I didn’t even think of that, but [AIT]/[AIP] as separate tags makes a lot of sense.

    I’d like being able to filter by either, actually.

    I guess two tags runs the risk of “rules too complex for some to follow,” but that’s more of a moderation load question. I have no say in that, heh.


  • For what it’s worth, I asked my self-hosted LLM (MiMo 2.5, no network access outside my desktop), and it came with [AIT] (AI-Topic).

    …I think that’s my favorite so far. [AIP] would work too.

    I feel like that “obfuscates” the tag enough to blunt impulse downvotes in /new and feeds, without being deceptive or anything.


  • brucethemoose@lemmy.worldtoSelfhosted@lemmy.worldSelfhosted & AI
    link
    fedilink
    English
    arrow-up
    6
    ·
    edit-2
    7 hours ago

    I’m not consistent about it yet, but because of exactly this, I’m trying to differentiate the two when I talk.

    Responsible automation? I use ML or machine learning.

    The grift consuming the world? A Tech Bro? “AI”

    I think one of the saddest things is the conflation between the two, like you can’t even talk about one without invoking the other. Or it opening up that whole ethical debate, when you’re just talking about, like, a 100M transcription model trained by one research in some university on a potato.



  • TBD indeed. But it will effectively ‘downrank’ posts and their visibility, maybe into the negative vote range. I’ve seen highly negative scores across the board in more machine-learning focused subs, and that’s without a tag that catches the eye so easily.

    I think even modifying the acronym could make a difference, though (as I ninja edited).


  • Also:

    Anything with an [AI] tag, first thing in the title, will have a drive-by downvote issue.

    Not sure how to deal with that, or if its even a concern.


    EDIT:

    Maybe it should be something else that’s not such a loaded keyword?

    [ML] for Machine Learning? [SAI]? [LAI]?

    I’ve been messing with ‘AI’ for a decade, and even I hate what the term has come to represent.



  • +1

    Home-AI oriented channels like Reddit’s localllama are filled with self promotion garbage, and more will trickle here over time… I’m not even against self promo or heavy coding assistance, but 9-times-out-of-10, the linked repo is nonsense, or straight-up fraudulent. And being obviously vibe-coded is a common tell.

    Good to get ahead of this.

    Also, +1 on supressing driveby insults. If the post is tagged up front, there’s no need. That being said, it should be okay for users to call out an obvious grift, or a “nonsense repo” that’s actually pure slop.


  • A 3060?

    Exllama/TabbyAPI is still worth looking at if you are trying to run a model purely in GPU RAM. It’s easily the most VRAM efficient backend, it just doesn’t support CPU offloading (which is useful for MoEs if you have considerable spare CPU RAM) and more optimized for 4xxx and up Nvidia cards.

    And TabbyAPI has a docker container you can use. Look for “exl3” models on huggingface.


  • If you’re using docker anyway, and “fast” pure GPU models, you might try a vllm container while you’re at it.

    It should be much faster than even llama.cpp, albeit at the cost of context length, and it supports some exotic 4-bit quantization like SPQA.

    Same with TabbyAPI. It’s quantization is SOTA, though it does not support CPU offloading, and it’s speed is somewhere between vllm and llama.cpp.


  • Mostly, yeah.

    Sometimes it’s better to “cut it close,” with (for instance) a 27B model that’s nearly OOMing your VRAM fully offloaded, but you know will be fine in regular use without too many programs open.

    In my case, with MiMo 2.5, it fills both my CPU and GPU RAM rather completely, so it’s best to set a static value so I don’t swap CPU RAM, and don’t OOM on the GPU either.





  • brucethemoose@lemmy.worldtoSelfhosted@lemmy.worldDo you host your own AI?
    link
    fedilink
    English
    arrow-up
    25
    arrow-down
    1
    ·
    edit-2
    3 days ago

    I completely disagree.

    Frankly, I find the description “VC funding a FOSS” offensive. They aren’t funding the engine. I’ve been messing with LLM inference engines since 2022, and Ollama is the worst I’ve seen in the community.

    They misname models for SEO. They leech off llama.cpp while deliberately hiding attribution yet redirecting GH support requests there. They sometimes make their own GGUFs+forked releases which are broken and incompatibile with upstream llama.cpp, just so they can get a release out a day ahead for hype, even though it doesn’t really work and they’ll never upstream one line. They set a default context size thats basically unusable, they screw up chat templates and deep internal code with no obvious indicators, they release suboptimal quants without iMatrix, they gate you into their internal quantization repo and model card format, they hide model downloads on your hard drive, they mess with standard APIs for no good reason other than to mess up other backends. I could go on and on.

    And if that’s all fine, they’re enshittifying the app with closed code, and pointers to cloud models.

    They GIVE LLM inference a bad name, by making it a terrible quality engine that happens to show up in search as the “default.” Hence the comments below of people being unimpressed with local inference. And they sap attention from actual llama.cpp devs, without contributing a single dime. Everyone in the localllama communtity hates their guts, and that’s not even getting into the interpersonal drama they’ve stirred.

    They are a leech that’s a net drag to the whole community, that we can’t get rid of because they’re attention grifters. And they’ve gotten worse and worse over time.


    It’s more morale to use any cloud API over Ollama, in my eyes. They’re a grift.


    EDIT: And, to be clear, I’m not against VC funded downstream stuff.

    LM Studio is good! Even though it’s closed source.

    Tons of downstream projects are great.




  • brucethemoose@lemmy.worldtoSelfhosted@lemmy.worldDo you host your own AI?
    link
    fedilink
    English
    arrow-up
    7
    arrow-down
    2
    ·
    edit-2
    3 days ago

    https://sleepingrobots.com/dreams/stop-using-ollama/

    And that’s not even all of it. Basically they break models in many ways, and they’re slimey Tech Bros.

    LM Studio is better, and easy.

    If you’re on Nvidia, and want to run optimally, I would use the ik_llama.cpp fork. On AMD, regular llama.cpp. On a Mac, use an MLX runner (Like LM Studio) with an MLX quant (ideally an MLX-DWQ quant).

    It’s all pretty technical, and… thats kinda the point. LLMs are just too performance sensitive and too finicky to not have a grasp of how they work. There is no “easy button” to run them without bad results, there can’t be.

    But if you don’t have time for that and just want to see if it’s worth it, I’d suggest self hosing your own UI, and trying the dirt cheap APIs of models you can theoretically run on your setup. This will give you a “best case” taste of what they’re capable of.