Ugh, apparently yesterday a bot visited my Forgejo instance and queried everything, which caused Forgejo to create repo archives for everything. Git on the instance is 2.1 GB in size, but the repo archive filled up everything and is 120 GB. I really didn’t expect such a spike.
That meant that it filled up the whole hard drive and the server and all the services and websites on it went down while I was sleeping.
Luckily it seems that just deleting that directory fixes the problem temporarily. I also disabled the possibility of downloading archived from the UI but I’m not sure if this will prevent bots from generating those archives again. I also can’t just make the directory read only because it uses it for other things like mirroring, etc too.
For small instances like mine those archives are quite a headache.
For now I asked chatgtp to help me to implement a simple return 403 on bot user agent. I looked into my logs and collected the bot names which I saw. I know it won’t hold forever but for now it’s quite nice, I just added this file to /etc/nginx/conf.d/block_bots.conf and it gets run before all the vhosts and rejects all bots. The rest just goes normally to the vhosts. This way I don’t need to implement it in each vhost seperatelly.
➜ jeena@Abraham conf.d cat block_bots.conf # /etc/nginx/conf.d/block_bots.conf # 1️⃣ Map user agents to $bad_bot map $http_user_agent $bad_bot { default 0; ~*SemrushBot 1; ~*AhrefsBot 1; ~*PetalBot 1; ~*YisouSpider 1; ~*Amazonbot 1; ~*VelenPublicWebCrawler 1; ~*DataForSeoBot 1; ~*Expanse,\ a\ Palo\ Alto\ Networks\ company 1; ~*BacklinksExtendedBot 1; ~*ClaudeBot 1; ~*OAI-SearchBot 1; ~*GPTBot 1; ~*meta-externalagent 1; } # 2️⃣ Global default server to block bad bots server { listen 80 default_server; listen [::]:80 default_server; listen 443 ssl default_server; listen [::]:443 ssl default_server; # dummy SSL cert for HTTPS ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem; ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key; # block bad bots if ($bad_bot) { return 403; } # close connection for anything else hitting default server return 444; }