𞋴𝛂𝛋𝛆

  • 5 Posts
  • 55 Comments
Joined 2 years ago
Cake day: June 9th, 2023

  • Yeah, I’m kind of volunteering for the mod part. In truth, I think it would take the respective instance admin setting up such a thing specifically, like creating a throwaway so that the actual user is not propagated to other admins or carried in the full ActivityPub feed. The one instance admin would know and would have the ability to filter or block, but that information would never escape the one server. As a mod I would be blind to potential bad actors and would only filter liberally at the community and comment level. So basically a normal community that replaces the OP’s name with Anon and never shares the real ID with anyone.


  • I keep seeing people go to the effort of creating a throwaway account to say or post stuff they want or need to externalize on the threadiverse. I’m willing to bet that for every person who goes to that much effort, there are somewhere between 10 and 100 people who lack an outlet and the motivation to do the same. Greentext is just a mutual pretext on my part for genuinely caring about people who are under pressure right now and in need of an outlet in a way that is not well supported by the fediverse or ActivityPub.

    We are small enough here that regular names and people can hold meaning through familiarity and memorable history. Anonymous kind words and social interaction from these people may hold considerably more value and meaning within this social dynamic than is afforded elsewhere.






  • Not necessarily. Like I don’t have my YT stuff stored anywhere anymore.

    Shorter-format stuff, sure, and that seems to be the only real focus for PeerTube now. Most of the YT stuff I posted was bits and pieces of my journey of creating a product photography studio and the progress I was making while still in my collar with a broken neck. I also made electrical hobby and bicycle stuff. I typically uploaded long-format videos of 20–40 minutes detailing what I tried and what did or did not work when fixing stuff that is supposed to be unserviceable or undocumented, and reverse-engineering type content. Some of those proved to be a reference I used many years later. My digital storage has never been at a very high quality level. Most of my motivation is like here on Lemmy; I want to share and be a little social while maybe providing some useful tidbit that helps someone. I’d rather relegate the digital archiving to someone else, mostly because my life has never been well supported or super stable.


  • We probably also need to get more of us actually uploading to PeerTube and posting stuff here with better integration.

    The first step is streamlining account creation and uploading. Is there a go-to post for how to sign up? Which servers are stable versus maybe not so much? Really useful video content is a major undertaking for technically useful stuff. I did several on YT in the past, some in the hundreds of thousands of views, about how to fix or hack stuff where I was the only source posted. Editing something well takes at least an hour per minute of finished video, and twice that when you include a good setup and recording. So I’d be far more bummed if that stuff got lost to instances disappearing. That is probably the biggest hesitation I have had. IMO, useful original content is the holy grail for this kind of thing, or maybe that is just my perspective bias.



  • Thanks for the cross-post.

    Citations needed on the mod tool complaints. I mod one of the largest communities on Lemmy. In two years, I’ve had only around a couple dozen situations that required actual mod action. The tools are perfectly adequate for the volume of users, in my opinion.

    We all took it a little hard when some regular users left. I get that. There will always be people coming and going for various reasons.

    There is also always an issue with narcissists that tend to get involved with moderating for the wrong reasons.

    All humans are lazy at times, and all of us have a right to pick up and leave if we choose. Blaming the tools as a scapegoat for one’s laziness or inadequacy, or to mask one’s financial limitations, seems to me like a narcissistic way to toss in the towel and check out, like an attempt to drag others down too.

    I wish those that want to leave all the best, and I’ll still be here hanging around if you ever want to come back, friend. Regardless, thanks for what you contributed to this place in the time we spent as digital neighbors.




  • I haven’t looked into the issue of PCIe lanes and the GPU.

    I don’t think it should matter much with a narrower PCIe bus, in theory, if I understand correctly (unlikely). The only time a lot of data is transferred is when the model layers are initially loaded. With Oobabooga, when I load a model, most of the time my desktop RAM monitor widget does not even have time to refresh and tell me how much memory was used on the CPU side. What is loaded on the GPU is around 90% static. I have a script that monitors this so that I can tune the maximum number of layers. I leave overhead room for the context to build up over time, but there are no major changes happening aside from the initial load. One just sets the number of layers to offload to the GPU and loads the model. However many seconds that takes is an irrelevant startup delay that only happens once when initiating the server.
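
    As a rough illustration of that one-time offload step, here is a minimal sketch using llama-cpp-python (the library Oobabooga wraps for GGUF models); the model path and layer count are placeholder assumptions you would tune against your own VRAM monitoring:

    ```python
    # Minimal sketch, assuming llama-cpp-python is installed and the GGUF
    # path below is swapped for a real file. n_gpu_layers is the knob being
    # tuned; the big PCIe transfer happens once, inside the constructor.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/example-model.Q4_K_M.gguf",  # hypothetical path
        n_gpu_layers=33,  # raise until VRAM monitoring shows you near the limit
        n_ctx=8192,       # leave headroom for the context to grow over time
    )

    # Inference afterward moves comparatively little data over the bus.
    print(llm("Hello", max_tokens=16)["choices"][0]["text"])
    ```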

    So assuming the kernel modules and hardware support the narrower bandwidth, it should work… I think. There are laptops with options for an external GPU over Thunderbolt too, so I don’t think the PCIe bus is too baked in.


  • 𞋴𝛂𝛋𝛆@lemmy.world to Selfhosted@lemmy.world · Consumer GPUs to run LLMs · 3 months ago

    Anything under 16 GB of VRAM is a no-go. Your number of CPU cores is important. Use Oobabooga Textgen for an advanced llama.cpp setup that splits between the CPU and GPU. You’ll need at least 64 GB of RAM, or be willing to offload layers to NVMe with DeepSpeed. I can run up to a 72b model with 4-bit quantization in GGUF on a 12700 laptop with a mobile 3080Ti, which has 16 GB of VRAM (mobile is like that).

    I prefer to run an 8×7b mixture-of-experts model because only 2 of the 8 experts are ever running at the same time. I run that in 4-bit quantized GGUF and it takes 56 GB total to load. Once loaded, it is about like a 13b model for speed but has ~90% of the capabilities of a 70b. The streaming speed is faster than my fastest reading pace.

    A 70b model streams at my slowest tenable reading pace.

    Both of these options are vastly more capable than any of the smaller model sizes, even if you screw around with training. Unfortunately, this streaming speed is still pretty slow for most advanced agentic stuff. Maybe if I had 24 to 48 GB it would be different; I cannot say. If I were building now, I would be looking at which hardware options have the largest L1 cache and the most cores that include the most advanced AVX instructions. Generally, anything with efficiency cores drops the advanced AVX instructions, and because the CPU schedulers in kernels are usually unable to handle this asymmetry, consumer junk has poor AVX support. It is quite likely that many of the problems Intel has had in recent years have been due to how they tried to block consumer stuff from accessing the advanced P-core instructions that were only blocked in microcode. Using them requires disabling the e-cores or setting up CPU set isolation in Linux or BSD distros.
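
    For the CPU set isolation route, here is a minimal Linux-only sketch; the core IDs are an assumption, so check `lscpu --extended` to see which logical CPUs are actually P-cores on a given machine:

    ```python
    # Minimal sketch: restrict the current process to an assumed set of
    # P-core logical CPU IDs so the scheduler never lands it on an e-core.
    import os

    p_cores = {0, 1, 2, 3, 4, 5, 6, 7}  # assumption: verify with `lscpu --extended`
    os.sched_setaffinity(0, p_cores)    # 0 means the current process
    print("running on CPUs:", sorted(os.sched_getaffinity(0)))
    ```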

    You need good Linux support even if you run Windows. Most good and advanced stuff with AI will be done in WSL if you haven’t ditched Windows for whatever reason. Use https://linux-hardware.org/ to check device support.

    The reason I mentioned avoiding consumer e-cores is that there have been some articles popping up lately about all-P-core hardware.

    The main constraint for the CPU is the L2 to L1 cache bus width. Researching this deeply may be beneficial.

    Splitting the load between multiple GPUs may be an option too. As of a year ago, the cheapest option for a 16 GB GPU in a machine was a second-hand 12th-gen Intel laptop with a 3080Ti, by a considerable margin when it is all added up. It is noisy, gets hot, and I hate it many times, wishing I had gotten a server-like setup for AI, but I have something, and that is what matters.
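
    If you do go multi-GPU, llama.cpp can proportion the offloaded layers across cards; here is a minimal sketch with llama-cpp-python’s tensor_split parameter, where the path and the even split are placeholder assumptions:

    ```python
    # Minimal sketch, assuming two GPUs and a CUDA build of llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/example-model.Q4_K_M.gguf",  # hypothetical path
        n_gpu_layers=-1,          # offload every layer that fits
        tensor_split=[0.5, 0.5],  # assumption: split layers evenly across two cards
    )
    ```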


  • 𞋴𝛂𝛋𝛆@lemmy.world to Fediverse@lemmy.world · *Permanently Deleted* · 4 months ago

    I blocked NSQ because one of its active mods is a bot.

    Lemmy in general does not handle conceptual abstractions well at all. I think it is great to question seemingly obvious subjects and to poll user depth and intelligence regularly. I hate getting blindsided by someone asking stupid questions like this in real life and having to take the time to think out which of many angles I would like to address the issue from. I find it useful and healthy to see how others address such a question and how people respond to the various approaches. This is fundamental to the intuitive usefulness of NSQ, and when that utility is hampered, it effectively renders the community useless.

    I rather ineffectively volunteered to take over the community myself when I encountered poor moderation from a bot with no accountable individual to address. Instead, I block the community and consider its existence an embarrassment.