Maybe CrowdSec could add a blocklist for LLM scrapers.
(Justin)
Tech nerd from Sweden
Ah, ok I see.
You only need dedup if your data is duplicated.
Nope, you don’t need a VPS to use it; it comes with an SFTP interface.
https://www.hetzner.com/storage/storage-box/
Offsite backup for $2/TB with no download fees, a third of the price of B2.
Hetzner Storage Box is super cheap and works with rclone. They also have a web interface for scheduling regular ZFS snapshots, so you don’t have to worry about accidental deletions or ransomware.
Hardware-wise:
Software-wise, too many projects to count, lol.
Renovate is a very useful tool for automatically updating containers. It watches a git repo and bumps outdated dependency versions, such as container image tags.
I have it configured to automatically deploy minor updates, and for bigger updates, it opens a pull request and sends me an email.
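A minimal sketch of that policy, assuming Renovate’s `packageRules` with `matchUpdateTypes` and `automerge` options: minor, patch, and digest bumps are automerged, while major bumps get Renovate’s default treatment of an ordinary pull request. The email part isn’t modeled here, and `update_type` is a hypothetical helper that only illustrates the major/minor/patch split for plain X.Y.Z versions; Renovate does its own version parsing.

```python
import json


def renovate_config() -> dict:
    """Build a renovate.json that automerges minor, patch and digest updates.

    Major updates match no rule here, so Renovate falls back to its
    default of opening a pull request for review.
    """
    return {
        "$schema": "https://docs.renovatebot.com/renovate-schema.json",
        "extends": ["config:recommended"],
        "packageRules": [
            {
                "matchUpdateTypes": ["minor", "patch", "digest"],
                "automerge": True,
            }
        ],
    }


def update_type(old: str, new: str) -> str:
    """Hypothetical helper: classify a bump between two X.Y.Z versions."""
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    if new_parts[0] != old_parts[0]:
        return "major"  # would become a pull request
    if new_parts[1] != old_parts[1]:
        return "minor"  # would be automerged
    return "patch"      # would be automerged


if __name__ == "__main__":
    print(json.dumps(renovate_config(), indent=2))
    print(update_type("1.4.2", "1.5.0"), update_type("1.4.2", "2.0.0"))
```

Writing the JSON from code is only for illustration; in practice the dict above would simply live as `renovate.json` in the repo root.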
Yeah, full VMs are pretty old school. Containers offer a lot more management and automation options, not to mention lower compute overhead.
Red Hat doesn’t even recommend that businesses use VMs anymore, and for legacy apps it offers a virtualization tool that runs the VMs inside containers. It’s called OpenShift Virtualization.
Yeah, Unraid is the same; it just adds a GUI to make it easier to learn. The downside is that Unraid is very non-standard and basically impossible to back up or manage in source control the way vanilla Docker or Kubernetes can be.
You should keep your Docker/Kubernetes configuration in git, and then have something like rclone take daily backups of all your data to something like a Hetzner Storage Box. That is the setup I have.
My entire kubernetes configuration: https://codeberg.org/jlh/h5b/src/branch/main/argo/custom_applications
My backup cronjob: https://codeberg.org/jlh/h5b/src/branch/main/argo/custom_applications/backups/rclone-velero.yaml
With something like this, your entire setup could crash and burn, and you would still have everything you need to restore safely stored offsite.
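One way a daily rclone job can keep history instead of only mirroring the latest state, sketched in Python. It assumes a remote already set up with `rclone config` under the hypothetical name `storagebox`, and uses `rclone sync` with `--backup-dir`, which moves files that the sync would overwrite or delete into a separate directory on the same remote. The paths are placeholders; the actual cronjob is the linked YAML above, not this script.

```python
import datetime
import shlex
import subprocess


def daily_backup_cmd(source: str, remote: str, day: datetime.date) -> list[str]:
    """Build the rclone command for one day's backup run.

    `remote` is a hypothetical rclone remote name. Files that would be
    overwritten or deleted in backups/current are moved into a dated
    archive directory (same remote, non-overlapping path) instead.
    """
    current = f"{remote}:backups/current"
    archive = f"{remote}:backups/archive/{day.isoformat()}"
    return ["rclone", "sync", source, current, "--backup-dir", archive]


def run_backup(source: str, remote: str) -> None:
    """Run today's backup; raises CalledProcessError if rclone fails."""
    cmd = daily_backup_cmd(source, remote, datetime.date.today())
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command; calling run_backup() needs rclone installed.
    print(shlex.join(daily_backup_cmd("/srv/data", "storagebox", datetime.date(2024, 5, 1))))
```

Scheduling `run_backup` from cron or a Kubernetes CronJob gives one dated archive folder per day alongside the current mirror.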
RAM is definitely the limiting factor. The one server with a 5600X and 64GiB of RAM handled it pretty well, though, as long as I wasn’t doing CPU transcoding.
I’ve since added two N100 boxes with 16GiB and two first-gen 32-core Epycs with 64GiB of RAM. All pretty cost-effective and quiet.
The N100 CPUs get overloaded sometimes if they’re running too many databases, but usually it balances pretty well.
Yeah, most of them are just high-availability replicas; there are probably only about 100-200 actual services/microservices.
I have gone up to about 300-400 or so. Currently running about 5 machines averaging about 100 each.
Ah, cool, interesting!
Interesting, this seems to have better documentation and feedback than the external-dns operator
Right, something like a Hetzner Storage Box is a good complement to RAID 5 for following the 3-2-1 backup rule. You can use rclone to sync your backup to Hetzner, and even encrypt it, and they can take automatic snapshots on their end to protect against ransomware.
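For the encryption part, rclone can layer a `crypt` remote on top of an SFTP remote. The sketch below renders such an `rclone.conf`; host, user, key path, remote names, and the `encrypted` directory are all placeholders, and the crypt password is assumed to be the output of `rclone obscure`, since rclone stores crypt passwords in obscured form rather than plain text.

```python
import configparser
import io


def rclone_conf(host: str, user: str, key_file: str, obscured_password: str) -> str:
    """Render an rclone.conf with an SFTP remote and a crypt remote on top of it."""
    # interpolation=None so any '%' in paths or secrets is written literally.
    conf = configparser.ConfigParser(interpolation=None)
    # Plain SFTP remote for the storage box (all values are placeholders).
    conf["storagebox"] = {
        "type": "sftp",
        "host": host,
        "user": user,
        "key_file": key_file,
    }
    # Crypt remote: data written to "storagebox-crypt:" is encrypted locally
    # and stored under the "encrypted" directory of the SFTP remote.
    conf["storagebox-crypt"] = {
        "type": "crypt",
        "remote": "storagebox:encrypted",
        "password": obscured_password,
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(rclone_conf("uXXXXX.example.invalid", "uXXXXX", "~/.ssh/id_ed25519", "OBSCURED"))
```

With that config in place, something like `rclone sync /srv/data storagebox-crypt:` uploads only ciphertext, and the provider-side snapshots then protect the encrypted copies.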
Looks like a good setup to me. HDDs have a lot of downsides, so if you can afford the extra $20/TB, an all-flash array is super useful. mdadm is rock solid.
The only issue, I think, is that it’s not possible to expand this array like you can with LVM or ZFS, so just watch out for that.
LVM is a good way to do RAID on Linux.
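As a rough illustration, a RAID 5 logical volume can be created with `lvcreate --type raid5`. The sketch below only builds the command; the volume group `vg0`, the LV name, and the size are hypothetical. Note that for raid5, `--stripes` counts data stripes, with parity added on top, so three data stripes need four physical volumes.

```python
import shlex


def lvm_raid5_cmd(vg: str, name: str, size: str, data_stripes: int) -> list[str]:
    """Build an lvcreate command for a RAID 5 logical volume.

    For raid5, --stripes excludes parity, so the volume group needs at
    least data_stripes + 1 physical volumes.
    """
    # Guard chosen for this sketch: at least three disks in total.
    if data_stripes < 2:
        raise ValueError("use at least 2 data stripes (3 disks) for RAID 5")
    return [
        "lvcreate",
        "--type", "raid5",
        "--stripes", str(data_stripes),
        "--size", size,
        "--name", name,
        vg,
    ]


if __name__ == "__main__":
    # Print rather than run: lvcreate needs root and an existing volume group.
    print(shlex.join(lvm_raid5_cmd("vg0", "data", "1T", 3)))
```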
I’m looking to the future and at what might be a good replacement that offers a blend of power efficiency, flexibility, and storage cost.
Any modern CPU will improve energy efficiency. The AMD AM4 platform and Intel N100 are very cheap. The AMD SP3 platform is also very cheap and has a ton of PCIe lanes and memory expandability for GPUs, NVMe, and running lots of VMs.
For storage cost, used HDDs are currently $8/TB, generic NVMe is $45/TB, and used enterprise SSDs are $50/TB; decide which to invest in depending on your needs. Used enterprise drives will be the fastest and most durable for databases and RAID.
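To compare those options for a RAID 5 array, the parity overhead matters: one drive’s worth of capacity goes to parity, so n drives give (n - 1) drives of usable space. A small sketch using the per-TB prices above (a snapshot, not current quotes); it ignores that real drives only come in fixed sizes.

```python
PRICE_PER_TB = {
    "used HDD": 8.0,
    "generic NVMe": 45.0,
    "used enterprise SSD": 50.0,
}


def raid5_cost(usable_tb: float, drives: int, price_per_tb: float) -> float:
    """Raw-drive cost of a RAID 5 array with the requested usable capacity.

    Each drive must hold usable_tb / (drives - 1) TB because one drive's
    worth of space is spent on parity.
    """
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    per_drive_tb = usable_tb / (drives - 1)
    return per_drive_tb * drives * price_per_tb


if __name__ == "__main__":
    # Example: 8 TB usable on 5 drives means 10 TB of raw capacity.
    for media, price in PRICE_PER_TB.items():
        print(f"{media}: ${raid5_cost(8, 5, price):.0f}")
```

For 8 TB usable across five drives this works out to $80 for used HDDs versus $450 to $500 for flash.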
https://www.ebay.com/sch/i.html?_nkw=pm983+m.2&_trksid=p4432023.m4084.l1313
SSD prices are expected to decrease faster than HDD prices, and will probably overtake HDDs for value in ~5 years.
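A back-of-the-envelope way to check an estimate like that, assuming both prices fall by a constant fraction per year: solve ssd * (1 - s)^t = hdd * (1 - h)^t for t, which gives t = ln(ssd / hdd) / ln((1 - h) / (1 - s)). The decline rates are pure assumptions you plug in, not forecasts.

```python
import math


def years_until_crossover(hdd: float, ssd: float, hdd_drop: float, ssd_drop: float) -> float:
    """Years until SSD $/TB reaches HDD $/TB under constant yearly declines.

    hdd_drop and ssd_drop are assumed fractional price drops per year
    (e.g. 0.05 for 5%).
    """
    if ssd <= hdd:
        return 0.0  # already at or below HDD price
    if ssd_drop <= hdd_drop:
        return math.inf  # SSDs never catch up under these assumptions
    return math.log(ssd / hdd) / math.log((1 - hdd_drop) / (1 - ssd_drop))


if __name__ == "__main__":
    # $8/TB HDDs vs $45/TB NVMe, with hypothetical 5% and 30% yearly drops.
    print(round(years_until_crossover(8, 45, 0.05, 0.30), 1))
```

With the $8 and $45 prices above and those hypothetical 5% and 30% yearly drops, the crossover lands at about 5.7 years.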
About dGPUs: the Intel A310 is the best transcoding GPU out there. Used Nvidia datacenter GPUs probably have the best VRAM/$ and PyTorch compatibility for AI.
The Nextcloud Helm chart is nice.