Hey, the short answer is yes, you can.
I would elaborate a little more:
- First, you have the problem of sourcing the data. In essence, Crowdsec won’t be able to go and fetch those logs for you dynamically, but can go and take those logs from a file (you can do a dirty solution like a sidecar deployment) or from a stream. You can deploy crowdsec in multiple modes, and you can have many instances that talk to each other. You can also simply have some process tailing the pod logs and sending them to a file crowdsec has access to or serving them as a stream (see https://doc.crowdsec.net/docs/data_sources/intro).
- The above means that it doesn’t really matter whether you run Crowdsec inside your cluster (it does have a Helm chart) or on the host. Ultimately all it matters is that crowdsec has access to your pods logs (for example, the logs of your ingress controller).
- The next piece is the remediation component. What do you want crowdsec to do, once it is able to detect bad IPs? If you want to just add IPs to the firewall, then it might make more sense running it on the host(s) you use in ingress, if you want to add the IPs to network policies you can do it, but you need to develop your own remediation components. I am planning to write a remediation component that will add the IPs to Hetzner firewall, some other systems are already supported, but this would be a way to basically block the IPs outside your cluster. For nginx ingress controller there is already a pre-made remediation component .
In practice I personally would choose a simple setup where the interesting logs are just forwarded (in Syslog format for example) to a single crowdsec instance. If you have ingress from a single node, I’d go for running it on the host and banning via firewall, if you have multiple ingress nodes, then I would run it inside the cluster and ban via a loadBalancer/cloud firewall/whatever you have in front.
In essence, I would spend some time to think about your preferences, and it might take a little bit to make the setup clean, but I think you have plenty of flexibility to do what you prefer. Let me know if you want to bounce some more ideas!
Comfort is the main reason, I suppose. If I mess up Wireguard config, even to debug the tunnel I need to go to the KVM console. It also means that if I go to a different place and I have to SSH into the box I can’t plug my Yubikey and SSH from there. It’s a rare occurrence, but still…
Ultimately I do understand both point of view. The thing is, SSH bots pose no threats after the bare minimum hardening for SSH has been done. The resource consumption is negligible, so it has no real impact.
To me the tradeoff is slight inconvenience vs slightly bigger attack surface (in case of CVEs). Ultimately everyone can decide which compromise is acceptable for them, but I would say that the choice is not really a big one.