I don't know if this is even possible, but I'd like to see a feature to allow/deny user agents. Basically, I've had a nasty issue over the last few days with Googlebot being blocked, and the only thing I could see in the logs was a port scan IP block ...
After 4 days, I said what the heck, I'll look in CSF -> iptables last x log entries ... and guess what ... Googlebot was blocked ...
So I'd like to know whether this is possible or not ... would anyone else care to see this happen?
User Agent Allow Deny ?
Only process tracking?
chirpy wrote: This was added some time ago with csf.rignore
Chirpy, I read in the csf.rignore file that this mainly has to do with process tracking. If for example I insert:
.googlebot.com
Would this also prevent the GoogleBot IP from being inserted in the csf.deny list by other-than-process-tracking means?
Thanks much!
FOLLOW-UP:
I guess not. After doing that, I tried adding a Googlebot IP to csf.deny like this:
/etc/csf/csf.pl -d 66.249.70.132 # googlebot block test
Adding 66.249.70.132 to csf.deny and iptables DROP...
DROP all opt -- in !lo out * 66.249.70.132 -> 0.0.0.0/0
DROP all opt -- in * out !lo 0.0.0.0/0 -> 66.249.70.132
And as you can see, the Googlebot would have been blocked anyway, despite 66.249.70.132 properly reversing out to google.com, so csf.rignore does not seem to help with this after all.
So obviously we need some kind of automatic Googlebot whitelist, or at least a block-prevention facility that would keep the Googlebot IPs from ending up in the csf.deny file. Is there anything in the works which would help us here? ... like a csf.rallow?
This is an important matter, as we are losing accounts because of Googlebot blocks that are made inadvertently.
Just to elaborate why this capability is needed:
----------------------
csf.rignore is only for process tracking.
This means that googlebot can be blocked for other reasons and csf.rignore would not prevent this.
For example, we have a cron-driven shell script set up to look at the general Apache access log and block any IP that hits the GENERAL log with too many 404 errors. This is usually the result of a hacker probe hitting the server IP, looking for hackable programs. (NOTE: Our script does not look at the individual accounts for 404 errors.)
The problem for us is that when we suspend an account, we find that the googlebot will occasionally go mad and start hitting the account's suspend page over and over again, and all of this winds up as 404 hits to the general Apache access log, thus the googlebot is blocked.
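For anyone wanting to reproduce this, here is a rough sketch of the kind of 404 counting described above. It is not our actual cron script: the function names `count_404s` and `offenders`, the threshold of 20, and the assumption that the log uses the Common/Combined Log Format are all made up for illustration.

```python
import re
from collections import Counter

# Assumes Common/Combined Log Format: IP, two ident fields, [timestamp],
# "request line", then the HTTP status code.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3})(?:\s|$)')


def count_404s(lines):
    """Return a Counter mapping client IP to its number of 404 responses."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts


def offenders(lines, threshold=20):
    """Return a sorted list of IPs with at least `threshold` 404 hits."""
    return sorted(ip for ip, n in count_404s(lines).items() if n >= threshold)
```

Feeding it the lines of the general access log (for example `offenders(open(path))`, with `path` being wherever your Apache log lives) gives the candidate IPs. A crawler hammering a suspended account's page shows up in that list just like a probe does, which is exactly the problem.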
Now, if only csf.rignore worked globally for any csf blocks, then it would be great. As it is, for the time being I guess we are left with writing our own little script which would do the rDNS lookup before implementing:
/etc/csf/csf.pl -d
or
/etc/csf/csf.pl -td
... which should be simple for us to do, but this is just beyond my capabilities for the moment.
----------------------
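The rDNS check mentioned above could look something like the following. Treat it as a minimal sketch under stated assumptions, not a tested part of csf: the names `is_trusted_crawler`, `deny_command` and `TRUSTED_SUFFIXES` are hypothetical, the suffix list is only my guess at what one would whitelist, and the forward lookup covers IPv4 only. It does a reverse lookup, checks the hostname suffix, and then confirms that the hostname resolves back to the same IP, so a faked PTR record alone does not pass.

```python
import socket

# Assumed whitelist of hostname suffixes; adjust to taste.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_trusted_crawler(ip, reverse=reverse_lookup, forward=forward_lookup,
                       suffixes=TRUSTED_SUFFIXES):
    """True only if ip reverses to a trusted suffix AND resolves back to ip."""
    host = reverse(ip)
    if not host:
        return False
    host = host.rstrip(".").lower()
    if not host.endswith(suffixes):
        return False
    return ip in forward(host)


def deny_command(ip, comment, **lookups):
    """Return the csf deny command as an argument list, or None to skip.

    Nothing is executed here; the caller decides whether to run it.
    """
    if is_trusted_crawler(ip, **lookups):
        return None
    return ["/etc/csf/csf.pl", "-d", ip, comment]
```

A blocking script would call `deny_command(ip, "too many 404s")` and only hand the result to something like `subprocess.run` when it is not `None`. The same wrapper would work for the `-td` form by swapping the flag. The lookups are passed in as parameters so the logic can be exercised without touching DNS.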