Anubis is awesome and I want to talk about it

SmokeyDope@piefed.social · edit-2 3 months ago

Anubis is awesome and I want to talk about it

non_burglar@lemmy.world · 3 months ago

Anubis is an elegant solution to the ai bot scraper issue, I just wish the solution to everything wasn’t just spending compute everywhere. In a world where we need to rethink our energy consumption and generation, even on clients, this is a stupid use of computing power.

Leon@pawb.social · edit-2 3 months ago

It also doesn’t function without JavaScript. If you’re security or privacy conscious chances are not zero that you have JS disabled, in which case this presents a roadblock.

On the flip side of things, if you are a creator and you’d prefer to not make use of JS (there’s dozens of us) then forcing people to go through a JS “security check” feels kind of shit. The alternative is to just take the hammering, and that feels just as bad.

No hate on Anubis. Quite the opposite, really. It just sucks that we need it.

SmokeyDope@piefed.social · edit-2 3 months ago

Theres a compute option that doesnt require javascript. The responsibility lays on site owners to properly configure IMO, though you can make the argument its not default I guess.

https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh

From docs on Meta Refresh Method

Meta Refresh (No JavaScript)

The metarefresh challenge sends a browser a much simpler challenge that makes it refresh the page after a set period of time. This enables clients to pass challenges without executing JavaScript.

To use it in your Anubis configuration:

# Generic catchall rule
- name: generic-browser
  user_agent_regex: >-
    Mozilla|Opera
  action: CHALLENGE
  challenge:
    difficulty: 1 # Number of seconds to wait before refreshing the page
    algorithm: metarefresh # Specify a non-JS challenge method

This is not enabled by default while this method is tested and its false positive rate is ascertained. Many modern scrapers use headless Google Chrome, so this will have a much higher false positive rate.

z3rOR0ne@lemmy.ml · 3 months ago

Yeah I actually use the noscript extension and i refuse to just whitelist certain sites unless I’m very certain I trust them.

I run into Anubis checks all the time and while I appreciate the software, having to consistently temporarily whitelist these sites does get cumbersome at times. I hope they make this noJS implementation the default soon.

Prathas@lemmy.zip · edit-2 2 months ago

Wait, you keep temporarily allowing them over and over again? Why temporary?

Sincerely,
Another NoScript fan

quick_snail@feddit.nl · 3 months ago

This is why we need these sites to have .onions. Tor Browser has a PoW that doesn’t require js

Nate Cox@programming.dev · edit-2 1 month ago

deleted by creator

bitcrafter@programming.dev · 3 months ago

What would you propose as an alternative?

Nate Cox@programming.dev · edit-2 1 month ago

deleted by creator

cadekat@pawb.social · 3 months ago

Scarcity is what powers this type of challenge: you have to prove you spent a certain amount of electricity in exchange for access to the site, and because electricity isn’t free, this imposes a dollar cost on bots.

You could skip the detour through hashes/electricity and do something with a proof-of-stake cryptocurrency, and just pay for access. The site owner actually gets compensated instead of burning dead dinosaurs.

Obviously there are practical roadblocks to this today that a JavaScript proof-of-work challenge doesn’t face, but longer term…

Nate Cox@programming.dev · edit-2 1 month ago

[This comment has been deleted by an automated system]

termaxima@slrpnk.net · 3 months ago

I am very annoyed that I have to enable cloudflare’s JavaScript on so many websites, I would much prefer if more of them used Anubis so I didn’t have third-party JavaScript running as often.

( coming from an annoying user who tries to enable the fewest things possible in NoScript )

Deathray5@lemmynsfw.com · 3 months ago

Unrelated but one day I won’t get gender envy from random cartoon woman

Arghblarg@lemmy.ca · edit-2 3 months ago

I have a script that watches apache or caddy logs for poison link hits and a set of bot user agents, adding IPs to an ipset blacklist, blocking with iptables. I should polish it up for others to try. My list of unique IPs is well over 10k in just a few days.

git repos seem to be real bait for these damn AI scrapers.

0_o7@lemmy.dbzer0.com · 3 months ago

I don’t mind Anubis but the challenge page shouldn’t really load an image. It’s wasting extra bandwidth for nothing.

Just parse the challenge and move on.

Allero@lemmy.today · 3 months ago

Afaik, you can set it up not to have any image, or have any other one.

Voroxpete@sh.itjust.works · edit-2 3 months ago

It’s actually a brilliant monetization model. If you want to use it as is, it’s free, even for large corporate clients.

If you want to get rid of the puppygirls though, that’s when you have to pay.

Kilgore Trout@feddit.it · edit-2 3 months ago

It’s a palette of 10 colours. I would guess it uses an indexed colorspace, reducing the size to a minimum.
edit: 28 KB on disk

CameronDev@programming.dev · 3 months ago

A HTTP get request is a few hundred bytes. The response is 28KB. Thats 280x. If a large botnet wanted to denial of service an Anubis protected site, requesting that image could be enough.

Ideally, Anubis should serve as little data as possible until the POW is completed. Caching the POW algorithm (and the image) to a CDN would also mitigate the issue.

Kilgore Trout@feddit.it · 2 months ago

I might agree, still one could argue that brand recognisability is contributing to the service as well.

CameronDev@programming.dev · 2 months ago

Definitely, which is why i suggested hosting the image + js on a CDN. Keeps brand awareness, and lets the CDN take the brunt of any malicious activity. with a bit of code-golfing, the data served by Anubis directly prior to POW could be a few hundred bytes, without impacting its functionality.

sixty@sh.itjust.works · 3 months ago

Yeah im not gonna use this anime stuff

Nate Cox@programming.dev · edit-2 1 month ago

deleted by creator

Cyberflunk@lemmy.world · 3 months ago

thank you! this needed said.

This post is a bit critical of a small well-intentioned project, so I felt obliged to email the maintainer to discuss it before posting it online. I didn’t hear back.

i used to watch the dev on mastodon, they seemed pretty radicalized on killing AI, and anyone who uses it (kidding!!) i’m not even surprised you didn’t hear back

great take on the software, and as far as i can tell, playwright still works/completes the unit of work. at scale anubis still seems to work if you have popular content, but does hasnt stopped me using claude code + virtual browsers

im not actively testing it though. im probably very wrong about a few things, but i know anubis isn’t hindering my personal scraping, it does fuck up perplexity and chatgpt bots, which is fun to see.

good luck Blue team!

Nate Cox@programming.dev · edit-2 1 month ago

[This comment has been deleted by an automated system]

sudo@programming.dev · 3 months ago

I’ve repeatedly stated this before: Proof of Work bot-management is only Proof of Javascript bot-management. It is nothing to a headless browser to by-pass. Proof of JavaScript does work and will stop the vast majority of bot traffic. That’s how Anubis actually works. You don’t need to punish actual users by abusing their CPU. POW is a far higher cost on your actual users than the bots.

Last I checked Anubis has an JavaScript-less strategy called “Meta Refresh”. It first serves you a blank HTML page with a <meta> tag instructing the browser to refresh and load the real page. I highly advise using the Meta Refresh strategy. It should be the default.

I’m glad someone is finally making an open source and self hostable bot management solution. And I don’t give a shit about the cat-girls, nor should you. But Techaro admitted they had little idea what they were doing when they started and went for the “nuclear option”. Fuck Proof of Work. It was a Dead On Arrival idea decades ago. Techaro should strip it from Anubis.

I haven’t caught up with what’s new with Anubis, but if they want to get stricter bot-management, they should check for actual graphics acceleration.

rtxn@lemmy.world · edit-2 3 months ago

POW is a far higher cost on your actual users than the bots.

That sentence tells me that you either don’t understand or consciously ignore the purpose of Anubis. It’s not to punish the scrapers, or to block access to the website’s content. It is to reduce the load on the web server when it is flooded by scraper requests. Bots running headless Chrome can easily solve the challenge, but every second a client is working on the challenge is a second that the web server doesn’t have to waste CPU cycles on serving clankers.

POW is an inconvenience to users. The flood of scrapers is an existential threat to independent websites. And there is a simple fact that you conveniently ignored: it fucking works.

sudo@programming.dev · 3 months ago

Its like you didn’t understand anything I said. Anubis does work. I said it works. But it works because most AI crawlers don’t have a headless browser to solve the PoW. To operate efficiently at the high volume required, they use raw http requests. The vast majority are probably using basic python requests module.

You don’t need PoW to throttle general access to your site and that’s not the fundamental assumption of PoW. PoW assumes (incorrectly) that bots won’t pay the extra flops to scrape the website. But bots are paid to scape the website users aren’t. They’ll just scale horizontally and open more parallel connections. They have the money.

poVoq@slrpnk.net · 3 months ago

You are arguing a strawman. Anubis works because because most AI scrapers (currently) don’t want to spend extra on running headless chromium, and because it slightly incentivises AI scrapers to correctly identify themselves as such.

Most of the AI scraping is frankly just shoddy code written by careless people that don’t want to ddos the independent web, but can’t be bothered to actually fix that on their side.

sudo@programming.dev · edit-2 3 months ago

You are arguing a strawman. Anubis works because because most AI scrapers (currently) don’t want to spend extra on running headless chromium

WTF, That’s what I already said? That was my entire point from the start!? You don’t need PoW to force headless usage. Any JavaScript challenge will suffice. I even said the Meta Refresh challenge Anubis provides is sufficient and explicitly recommended it.

poVoq@slrpnk.net · 3 months ago

And how do you actually check for working JS in a way that can’t be easily spoofed? Hint: PoW is a good way to do that.

Meta refresh is a downgrade in usability for everyone but a tiny minority that has disabled JS.

sudo@programming.dev · 3 months ago

And how do you actually check for working JS in a way that can’t be easily spoofed? Hint: PoW is a good way to do that.

Accessing the browsers API in any way is way harder to spoof than some hashing. I already suggested checking if the browser has graphics acceleration. That would filter out the vast majority of headless browsers too. PoW is just math and is easy to spoof without running any JavaScript. You can even do it faster than real JavaScript users something like Rust or C.

Meta refresh is a downgrade in usability for everyone but a tiny minority that has disabled JS.

What are you talking about? It just refreshes the page without doing any of the extra computation that PoW does. What extra burden does it put on users?

poVoq@slrpnk.net · 3 months ago

If you check for GPU (not generally a bad idea) you will have the same people that currently complain about JS, complain about this breaking with their anti-fingerprinting browser addons.

But no, you can’t spoof PoW obviously, that’s the entire point of it. If you do the calculation in Javascript or not doesn’t really matter for it to work.

In the current shape Anubis has zero impact on usability for 99% of the site visitors, not so with meta refresh.

___qwertz___@feddit.org · 3 months ago

Funnily enough, PoW was a hot topic in academia around the late 90s / early 2000, and it’s somewhat clear that the autor of Anubis has not read much about the discussion back then.

There was a paper called “Proof of work does not work” (or similar, can’t be bothered to look it up) that argued that PoW can not work for spam protection, because you have to support both low-powered consumer devices while blocking spammers with heavy hardware. And that is very valid concern. Then there was a paper arguing that PoW can still work, as long as you scale the difficulty in such a way that a legit user (e.g. only sending one email) has a low difficulty, while a spammer (sending thousands of emails) has a high difficulty.

The idea of blocking known bad actors actually is used in email quite a lot in forms of DNS block lists (DNSBLs) such as spamhaus (this has nothing to do with PoW, but such a distributed list could be used to determine PoW difficulty).

Anubis on the other hand does nothing like that and a bot developed to pass Anubis would do so trivially.

Sorry for long text.

ORbituary@lemmy.dbzer0.com · 3 months ago

It’s a great service. I hate the character.

CoyoteFacts@piefed.ca · 3 months ago

You can customize the images if you want: https://anubis.techaro.lol/docs/admin/botstopper#customizing-images

Nate Cox@programming.dev · edit-2 1 month ago

[This comment has been deleted by an automated system]

ORbituary@lemmy.dbzer0.com · 3 months ago

Not sure why you’re getting down votes for just asking a question.

Nate Cox@programming.dev · edit-2 1 month ago

deleted by creator

Lemminary@lemmy.world · edit-2 3 months ago

Not idol worship, rather, it’s silly to complain about JS when tools like NoScript allow you to selectively choose what runs instead of guessing what it is. It’s simply a documentation page like it says on the URL. I mean, they’re incredibly tame on the danger scale to leave your guard all the way up and instead take a jab at the entire community that had nothing to do with your personal choices.

Nate Cox@programming.dev · edit-2 1 month ago

[This comment has been deleted by an automated system]

A_norny_mousse@feddit.org · edit-2 3 months ago

At the time of commenting, this post is 8h old. I read all the top comments, many of them critical of Anubis.

I run a small website and don’t have problems with bots. Of course I know what a DDOS is - maybe that’s the only use case where something like Anubis would help, instead of the strictly server-side solution I deploy?

I use CrowdSec (it seems to work with caddy btw). It took a little setting up, but it does the job.
(I think it’s quite similar to fail2ban in what it does, plus community-updated blocklists)

Am I missing something here? Why wouldn’t that be enough? Why do I need to heckle my visitors?

Despite all that I still had a problem with bots knocking on my ports spamming my logs.

By the time Anubis gets to work, the knocking already happened so I don’t really understand this argument.

If the system is set up to reject a certain type of requests, these are microsecond transactions of no (DDOS exception) harm.

poVoq@slrpnk.net · edit-2 3 months ago

AI scraping is a massive issue for specific types of websites, such as git forges, wikis and to a lesser extend Lemmy etc, that rely on complex database operations that can not be easily cached. Unless you massively overprovision your infrastructure these web-applications come to a grinding halt by constantly maxing out the available CPU power.

The vast majority of the critical commenters here seem to talk from a point of total ignorance about this, or assume operators of such web applications have time for hyperviligance to constantly monitor and manually block AI scrapers (that do their best to circumvent more basic blocks). The realistic options for such operators are right now: Anubis (or similar), Cloudflare or shutting down their servers. Of these Anubis is clearly the least bad option.

chunes@lemmy.world · 3 months ago

Sounds like maybe webapps are a bad idea then.

If they need dynamism, how about releasing a desktop application?

quick_snail@feddit.nl · 3 months ago

Kinda sucks how it makes websites inaccessible to folks who have to disable JavaScript for security.

poVoq@slrpnk.net · 3 months ago

I kinda sucks how AI scrapers make websites inaccessible to everyone 🙄