I think this project has some tools that might automate that:
https://0xacab.org/dCF/deCloudflare
They ID and track every website that joins #Cloudflare. It’s a huge effort but those guys are on top of it. A script could check the list of domains against their list. There is also this service (from the same devs) which does some checks:
https://karma.crimeflare.eu.org:1984/api/is/cloudflare/html/
One caveat: if a non-CF domain (e.g. example.tld) has a CF host (e.g. somehost.example.tld), that tool will return YES for the whole domain.
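Here is roughly what I mean by a script checking domains against their list. It's only a sketch: I'm assuming the list has already been exported to plain text with one domain per line (the actual format in the repo may differ), and the function names are made up for illustration. It matches exact hostnames only, so a CF subdomain does not flag the parent domain.

```python
def parse_domain_list(lines):
    """Build a set of lowercase domains from an iterable of text lines.

    Assumed format (hypothetical): one domain per line, blank lines and
    lines starting with '#' are ignored.
    """
    domains = set()
    for line in lines:
        entry = line.strip().lower().rstrip(".")
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def flag_cloudflare(sites, cf_domains):
    """Map each site to True if it is in cf_domains (exact match only)."""
    return {
        site: site.strip().lower().rstrip(".") in cf_domains
        for site in sites
    }
```

In practice you would pass an open file handle to parse_domain_list() and feed the result, together with your own site list, to flag_cloudflare().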
Manually adjusting availability is a can of worms that I don’t want to open
I would suggest not bothering with any complex math. Simply do the calculation as you normally do, but if a site is on Cloudflare, cap the calculated figure at 98%. Probably most (if not all) CF sites would be at 100% anyway, so they would just drop by 2 percentage points. The cap would need to be explained somewhere, and the beauty of that is it would help inform people that the CF walled garden is excluding people. Cloudflare’s harm persists to a large extent because people are unaware that it’s an exclusive walled garden that marginalizes people.
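The cap could be as small as this. Again just a sketch: the function and constant names are hypothetical, and I'm assuming availability is stored as a percentage between 0 and 100.

```python
# Maximum availability (in percent) reported for a Cloudflare site.
CF_CAP = 98.0


def adjusted_availability(calculated_pct, is_cloudflare):
    """Return the usual figure, capped at CF_CAP for Cloudflare sites."""
    if is_cloudflare:
        return min(calculated_pct, CF_CAP)
    return calculated_pct
```

A CF site calculated at 100% comes out at 98%, one already below 98% is left alone, and non-CF sites are never touched.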
Glad to see they are tagged. It could evolve more but the tags are the most important thing.