Either Lemmy search isn’t great or I’m too new, so I can’t tell if this has been posted here before.

  • vegetaaaaaaa@lemmy.world
    1 year ago

    awesome-selfhosted maintainer here. This critique comes up often (and I sometimes agree…) but it’s hard to properly “fix”:

    Any rule that enforces some kind of “quality” guideline has to be explicitly written into the contribution guidelines so as not to waste submitters’ (and maintainers’) time.

    As you can see there are already minimal rules in place (software has to be actively maintained, properly documented, first release must be older than 4 months, must of course be fully Free and Open-source…). Anything more is very hard to word objectively or is plain unfair - in the last 7 years (!) maintaining the list I’ve spent countless hours thinking about it.

    For example, rejecting new projects because an existing/already listed one effectively does the same thing would give an unfair advantage to older projects, effectively “locking out” newer ones. Moreover, you will rarely find two projects that have the exact same feature set, workflow, release frequency, technical requirements… and every user has different needs and requirements, so yeah, users of the list are expected to do some research to find the best solution to their particular needs.

    This is, of course, less true for some categories (why are there so many pastebins??). But again, it’s hard to find clear and objective criteria to determine what deserves to be listed and what does not.

    If we started rejecting projects because “I don’t have a need for it” or “I already use a somewhat equivalent solution and am not going to switch”, that would discard 90% of entries in the list (and not necessarily the worst ones). I do check that projects being added are in a “production-ready” state and ask more questions during reviews if needed. But it’s hard to be more selective than we already are without falling into subjective “I like/I don’t like” reasoning (let’s ban all Node.js-based projects, npm is horrible and a security liability. Let’s also ban all projects that are so convoluted and impossible to build and install properly that Docker is the only installation option. Follow my thoughts?)

    Also, Free Software has always been very fragmented, which is both a strength and a weakness. The list simply reflects that.

    Another idea I contemplated is linking each project to a “review” thread for the software in question. But I will not host or moderate such a forum/review board, and it would be heavily brigaded by PR departments looking to promote their companies’ software.

    An HTML version is coming out soon (based on the same data) that will hopefully make the list easier to browse.

    I am open to other suggestions, keeping in mind the points above…

    250+ self hostable apps

    1268 exactly.

    You can help clean up the list of unmaintained projects by working on this issue

    • somedaysoon@lemmy.world
      1 year ago

      I just wanted to give you my two cents and say that I appreciate the way you have it. And also thank you for all the thought you’ve put into it because I don’t want someone making subjective decisions for me and I’m glad you understand that position.

    • JoeyJoeJoeJr@lemmy.ml
      1 year ago

      I would imagine the source for most projects is hosted on GitHub, or similar platforms? Perhaps you could consider forks, stars, and followers as “votes” and sort each sub category based on the votes. I would imagine that would be scriptable - the script could be included in the awesome list repo, and run periodically. It would be kind of interesting to tag “releases” and see how the sort order changes over time. If you wanted to get fancy, the sorting could probably happen as part of a CI task.

      If workable, the obvious benefit is you don’t have to exclude anything for subjective reasons, but it’s easier for readers of the list to quickly find the “most used” options.

      Just an idea off the top of my head. You may have already thought about it, and/or it may be full of holes.
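A minimal sketch of the "sort each sub category by votes" idea, assuming each entry has already been reduced to a name, a category, and an optional star count. The `Entry` shape, the `sort_by_votes` helper, and the sample data are all hypothetical and not taken from the awesome-selfhosted repository; entries with no star count (for example, projects on forges without that feature) are simply placed last here, which is one possible choice among several.

```python
from typing import Optional, TypedDict


class Entry(TypedDict):
    # Hypothetical, simplified shape for one list entry.
    name: str
    category: str
    stars: Optional[int]  # None when the forge exposes no star count


def sort_by_votes(entries: list[Entry]) -> dict[str, list[str]]:
    """Group entries by category, most-starred first; unknown counts go last."""
    by_category: dict[str, list[Entry]] = {}
    for entry in entries:
        by_category.setdefault(entry["category"], []).append(entry)
    return {
        category: [
            e["name"]
            for e in sorted(
                items,
                # Known counts first, then descending stars, then name as a tie-break.
                key=lambda e: (e["stars"] is None, -(e["stars"] or 0), e["name"].lower()),
            )
        ]
        for category, items in sorted(by_category.items())
    }


# Hypothetical sample data, not real projects or real star counts.
SAMPLE: list[Entry] = [
    {"name": "PasteA", "category": "Pastebins", "stars": 120},
    {"name": "PasteB", "category": "Pastebins", "stars": None},
    {"name": "PasteC", "category": "Pastebins", "stars": 4500},
    {"name": "WikiA", "category": "Wikis", "stars": 300},
]

if __name__ == "__main__":
    for category, names in sort_by_votes(SAMPLE).items():
        print(category, names)
```

Run periodically (or from a CI job), a script along these lines would only need the star counts refreshed beforehand; how those are fetched is left out of the sketch.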

      • vegetaaaaaaa@lemmy.world
        1 year ago

        would imagine that would be scriptable - the script could be included in the awesome list repo, and run periodically.

        The next version of the list will be based on https://github.com/awesome-selfhosted/awesome-selfhosted-data (raw YAML data), so it will be much easier to integrate with scripts. There is already a CI system running at https://github.com/awesome-selfhosted/awesome-selfhosted-data/actions, and a preview of an enriched export at https://nodiscc.github.io/awesome-selfhosted-html-preview/ that takes stars, last update dates, and other metadata into account. This will all go live “soon”.

        Perhaps you could consider forks, stars, and followers as “votes” and sort each sub category based on the votes.
        it’s easier for readers of the list to quickly find the “most used” options.

        This would exclude (or move to the bottom of the list) all projects that are not hosted on these (mostly proprietary) platforms. Right now only metadata from GitHub is being parsed. In the future it will expand to GitLab, maybe Gitea instances or similar, but it will take time, and not all platforms have these stars/followers/forks features. This would also induce a huge bias, as GitHub projects will have a lot more forks/followers/… than projects hosted on independent forges. Star counts can also be (and absolutely are) manipulated by some projects that want to get “trending”.

        Also popularity != quality. A project whose code is hosted on cgit can be as good as or even better than a project on GitHub (even more so in the context of self-hosting…).

        Just an idea off the top of my head. You may have already thought about it, and/or it may be full of holes.

        It was a good idea :) But as you can see, it has its flaws.

        • JoeyJoeJoeJr@lemmy.ml
          1 year ago

          it has its flaws.

          Yep yep. I was aware of some of what you pointed out - I think this might be a “perfect is the enemy of good” scenario, though. GitHub alone accounts for over 84% (based on the awesome-selfhosted-data repo):

          $ grep -r 'source_code_url' | cut -d ' ' -f 2 | cut -d '/' -f 3 | sort | uniq -c | sort -rn | head -n 15
             1068 github.com
               36 gitlab.com
                7 git.mills.io
                6 sourceforge.net
                6 framagit.org
                4 www.atlassian.com
                4 codeberg.org
                3 git.drupalcode.org
                3 git.cloudron.io
                2 repos.goffi.org
                2 git.tt-rss.org
                2 git.sr.ht
                2 cvsweb.openbsd.org
                1 yetishare.com
                1 www.wiz.cn
          
          $ python -c "print($(grep -r 'source_code_url' . | grep github.com | wc -l) / $(ls -1 | wc -l))"
          0.8422712933753943
          

          Adding in gitlab gets you to 87%:

          $ python -c "print($(grep -r 'source_code_url' . | grep -i -e github.com -e gitlab.com | wc -l) / $(ls -1 | wc -l))"
          0.8706624605678234
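For anyone who would rather not chain `grep`, `cut`, and `uniq`, the same tally can be sketched in Python. This is only an illustration: `host_counts` and `share` are hypothetical helper names, the sample URLs are made up, and the share is computed over the URLs passed in rather than over the number of files as in the shell version. The last two lines just redo the arithmetic from the counts shown above (1068 GitHub and 36 GitLab entries; 1268 total is the figure the maintainer gave).

```python
from collections import Counter
from urllib.parse import urlparse


def host_counts(urls):
    """Count how many source URLs point at each host."""
    return Counter(urlparse(url).netloc.lower() for url in urls)


def share(counts, hosts):
    """Fraction of all counted URLs that live on any of the given hosts."""
    total = sum(counts.values())
    return sum(counts[host] for host in hosts) / total if total else 0.0


# Made-up sample URLs, standing in for the source_code_url values.
SAMPLE_URLS = [
    "https://github.com/example/a",
    "https://github.com/example/b",
    "https://gitlab.com/example/c",
    "https://codeberg.org/example/d",
]

if __name__ == "__main__":
    counts = host_counts(SAMPLE_URLS)
    print(counts.most_common())
    print(share(counts, ["github.com"]))
    print(share(counts, ["github.com", "gitlab.com"]))
    # Arithmetic check against the numbers quoted in the thread.
    print(1068 / 1268)
    print((1068 + 36) / 1268)
```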

          Also popularity != quality.

          True, but a thriving community generally means more resources, guides, etc, which can be important, especially for self-hosted solutions.

          In any case, the project is great, and much appreciated. Additionally, the enriched html version looks fantastic, and exposes most of the metadata* I’d want to see, regardless of how it’s sorted.

          *One other item to track, that I thought about after making my previous comment - number of contributors. It gives an additional data point on the size of the community, as well as an idea of how many people can be hit by busses before the continued development of the project gets called into question.