Lemmy search isn’t great, or I’m too new, and I can’t tell if this has been posted here before.

  • JoeyJoeJoeJr@lemmy.ml
    2 years ago

    I would imagine the source for most projects is hosted on GitHub, or similar platforms? Perhaps you could consider forks, stars, and followers as “votes” and sort each subcategory based on the votes. I would imagine that would be scriptable - the script could be included in the awesome list repo, and run periodically. It would be kind of interesting to tag “releases” and see how the sort order changes over time. If you wanted to get fancy, the sorting could probably happen as part of a CI task.
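A minimal sketch of what such a sorting script might look like. The `Project` fields, the unweighted sum used as the "vote" count, and the sample categories in the test data are all assumptions made for illustration; nothing here reflects how the awesome-selfhosted tooling actually works, and fetching the counters from a forge API is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Project:
    # Hypothetical record; the field names are illustrative only.
    name: str
    category: str
    stars: Optional[int] = None      # None when the forge exposes no such counter
    forks: Optional[int] = None
    followers: Optional[int] = None


def votes(project: Project) -> int:
    """Sum the available popularity counters; missing counters count as 0."""
    return sum(v or 0 for v in (project.stars, project.forks, project.followers))


def sort_by_votes(projects: Iterable[Project]) -> Dict[str, List[Project]]:
    """Group projects by category, most-voted first, ties broken by name."""
    grouped: Dict[str, List[Project]] = {}
    for project in projects:
        grouped.setdefault(project.category, []).append(project)
    return {
        category: sorted(items, key=lambda p: (-votes(p), p.name.lower()))
        for category, items in sorted(grouped.items())
    }
```

Nothing is excluded: a project with no counters at all simply lands at the end of its subcategory.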

    If workable, the obvious benefit is that you don’t have to exclude anything for subjective reasons, while still making it easier for readers of the list to quickly find the “most used” options.

    Just an idea off the top of my head. You may have already thought about it, and/or it may be full of holes.

    • vegetaaaaaaa@lemmy.world
      2 years ago

      > would imagine that would be scriptable - the script could be included in the awesome list repo, and run periodically.

      The next version of the list will be based on https://github.com/awesome-selfhosted/awesome-selfhosted-data (raw YAML data), so it will be much easier to integrate with scripts. There is already a CI system running at https://github.com/awesome-selfhosted/awesome-selfhosted-data/actions, and a preview of an enriched export at https://nodiscc.github.io/awesome-selfhosted-html-preview/ that takes stars/last update dates and other metadata into account. This will all go live “soon”.

      > Perhaps you could consider forks, stars, and followers as “votes” and sort each sub category based on the votes.

      > it’s easier for readers of the list to quickly find the “most used” options.

      This would exclude (or move to the bottom of the list) all projects that are not hosted on these (mostly proprietary) platforms. Right now only metadata from GitHub is being parsed; in the future it will expand to GitLab, maybe Gitea instances or similar, but it will take time, and not all platforms have these stars/followers/forks features. This would also induce a huge bias, as GitHub projects will have a lot more forks/followers/… than projects hosted on independent forges. Star counts can also be (and absolutely are) manipulated by some projects that want to get “trending”.

      Also, popularity != quality. A project whose code is hosted on cgit can be as good as or even better than a project on GitHub (even more so in the context of self-hosting…).

      > Just an idea off the top of my head. You may have already thought about it, and/or it may be full of holes.

      It was a good idea :) But as you can see, it has its flaws.

      • JoeyJoeJoeJr@lemmy.ml
        2 years ago

        > it has its flaws.

        Yep yep. I was aware of some of what you pointed out - I think this might be a “perfect is the enemy of good” scenario, though. GitHub alone accounts for over 84% of the listed projects (based on the awesome-selfhosted-data repo):

        $ grep -r 'source_code_url' | cut -d ' ' -f 2 | cut -d '/' -f 3 | sort | uniq -c | sort -rn | head -n 15
           1068 github.com
             36 gitlab.com
              7 git.mills.io
              6 sourceforge.net
              6 framagit.org
              4 www.atlassian.com
              4 codeberg.org
              3 git.drupalcode.org
              3 git.cloudron.io
              2 repos.goffi.org
              2 git.tt-rss.org
              2 git.sr.ht
              2 cvsweb.openbsd.org
              1 yetishare.com
              1 www.wiz.cn
        
        $ python -c "print($(grep -r 'source_code_url' . | grep github.com | wc -l) / $(ls -1 | wc -l))"
        0.8422712933753943
        

        Adding in gitlab gets you to 87%:

        $ python -c "print($(grep -r 'source_code_url' . | grep -i -e github.com -e gitlab.com | wc -l) / $(ls -1 | wc -l))"
        0.8706624605678234
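For anyone who would rather not chain `grep`/`cut`, the same tally can be sketched in plain Python. This assumes each entry is a small YAML file with a single top-level `source_code_url:` line, which is what the shell pipeline above relies on as well; the helper names are made up for illustration, and a real script would more likely use a proper YAML parser.

```python
import re
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

# Matches a top-level "source_code_url: <url>" line, with optional quotes.
URL_LINE = re.compile(r"^source_code_url:\s*['\"]?(\S+?)['\"]?\s*$")


def source_host(yaml_text: str) -> Optional[str]:
    """Return the host part of the entry's source_code_url, if it has one."""
    for line in yaml_text.splitlines():
        match = URL_LINE.match(line)
        if match:
            return urlparse(match.group(1)).netloc.lower() or None
    return None


def host_counts(entries: Iterable[str]) -> Counter:
    """Count entries per source-code host, skipping entries without a URL."""
    return Counter(h for h in map(source_host, entries) if h)


def share(counts: Counter, hosts: Iterable[str], total: int) -> float:
    """Fraction of all `total` entries whose source lives on one of `hosts`."""
    return sum(counts[h] for h in hosts) / total
```

Passing the text of every data file to `host_counts`, then calling `share(counts, ["github.com"], total)` with the file count as `total`, mirrors the `python -c` one-liners above, including the choice of dividing by the number of files rather than the number of matched URLs.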

        > Also popularity != quality.

        True, but a thriving community generally means more resources, guides, etc., which can be important, especially for self-hosted solutions.

        In any case, the project is great, and much appreciated. Additionally, the enriched html version looks fantastic, and exposes most of the metadata* I’d want to see, regardless of how it’s sorted.

        *One other item to track, that I thought about after making my previous comment - number of contributors. It gives an additional data point on the size of the community, as well as an idea of how many people can be hit by busses before the continued development of the project gets called into question.
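A rough sketch of how that could be measured from the output of `git shortlog -sn` (commit count, then author name, one author per line). The 50% threshold and the definition of "bus factor" used here (the smallest group of top committers covering that share of all commits) are assumptions chosen for illustration, not an established metric of the list.

```python
from typing import Dict


def parse_shortlog(output: str) -> Dict[str, int]:
    """Parse `git shortlog -sn` style output into {author: commit_count}."""
    commits: Dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # "<count><whitespace><author name>"
        if len(parts) == 2 and parts[0].isdigit():
            author = parts[1].strip()
            commits[author] = commits.get(author, 0) + int(parts[0])
    return commits


def bus_factor(commits: Dict[str, int], threshold: float = 0.5) -> int:
    """Smallest number of top committers covering `threshold` of all commits."""
    total = sum(commits.values())
    if total == 0:
        return 0
    covered = 0
    for rank, count in enumerate(sorted(commits.values(), reverse=True), start=1):
        covered += count
        if covered / total >= threshold:
            return rank
    return len(commits)
```

`len(parse_shortlog(output))` gives the plain contributor count; `bus_factor` adds a hint about how concentrated the work is.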