How about a federated system for sharing “known safe” image attestations? That way, the trust list is something managed locally by each participating instance.
Edit: thinking about it some more, a federated image classification system would allow some instances to be more strict than others.
As I said, I don’t think you need to: manually subscribing to each trusted instance via ActivityPub should suffice. The pass/fail determination can then be made locally, at the moment an instance queries for known images.
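
To make that a bit more concrete, here is a rough sketch of the idea in Python. Everything in it is hypothetical: the `Attestation` and `Instance` names, the `min_attestations` knob, and the use of a SHA-256 digest as the image key are my own assumptions, and the ActivityPub delivery itself is not modelled (`receive` just stands in for whatever arrives from an instance you follow). The point is only that each instance keeps its own trust list and decides pass/fail at query time, so one instance can be stricter than another.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attestation:
    """Hypothetical 'known safe' claim published by an instance."""
    issuer: str      # domain of the instance that made the claim
    image_hash: str  # SHA-256 hex digest of the image bytes (assumed key)
    label: str       # e.g. "safe"


@dataclass
class Instance:
    """One participating instance with a locally managed trust list."""
    trusted: set[str] = field(default_factory=set)  # instances we subscribed to
    min_attestations: int = 1                       # local strictness (assumed knob)
    store: dict[str, set[str]] = field(default_factory=dict, init=False)

    def receive(self, att: Attestation) -> None:
        # Keep only "safe" claims from instances we chose to subscribe to.
        if att.issuer in self.trusted and att.label == "safe":
            self.store.setdefault(att.image_hash, set()).add(att.issuer)

    def is_known_safe(self, image_bytes: bytes) -> bool:
        # Pass/fail is decided here, at query time, using the current trust list,
        # so unsubscribing from an instance takes effect immediately.
        digest = hashlib.sha256(image_bytes).hexdigest()
        issuers = self.store.get(digest, set()) & self.trusted
        return len(issuers) >= self.min_attestations
```

With this shape, a lenient instance could accept an image that a single trusted peer has attested, while a stricter one sets `min_attestations=2` and rejects the same image until a second trusted peer vouches for it. An exact hash only matches byte-identical files, so a real system would probably want something more tolerant of re-encoding; that is outside this sketch.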