
User Agent Allow/Deny?

Posted: 16 Feb 2009, 22:21
by bin_asc
I don't know if this is at all possible, but what I'd like to see would be a feature to allow/deny user agents. Basically, I've had a nasty issue over the last few days with Googlebot being blocked, and the only thing I could see in the logs was an IP blocked for a port scan ...
After 4 days, I said what the heck, I'll look in CSF -> Iptables last x log entries ... and guess what ... Googlebot blocked ...
So I'd like to know whether this is possible or not ... anyone else care to see this happen?

Posted: 16 Feb 2009, 22:23
by bin_asc
Another idea would be allowing access based on rDNS info ... but that would slow down communication a bit ... maybe some sort of cache could be used while the resolution is done on the IP?
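
Something like this is what I have in mind for the cache part, just as a rough sketch (untested; the cache directory and the use of "host" are only examples):

#!/bin/sh
# Rough idea only: cache reverse DNS answers so repeated checks on the same IP are fast.
# The cache directory is made up for this example.
CACHE_DIR=/var/cache/rdns
mkdir -p "$CACHE_DIR"

rdns_lookup() {
    ip="$1"
    cache_file="$CACHE_DIR/$ip"
    # Serve from the cache if we have already resolved this IP
    if [ -f "$cache_file" ]; then
        cat "$cache_file"
        return
    fi
    # Otherwise do the real PTR lookup and remember the answer
    name=$(host "$ip" | awk '/pointer/ {print $NF; exit}')
    echo "$name" > "$cache_file"
    echo "$name"
}

rdns_lookup "$1"    # usage: ./rdns-cache.sh <ip address>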

Posted: 19 Feb 2009, 16:48
by chirpy
This was added some time ago with csf.rignore

Posted: 19 Feb 2009, 16:50
by bin_asc
I didn't know that *goes to check the files*.

Only process tracking?

Posted: 01 May 2009, 07:57
by jols
chirpy wrote: This was added some time ago with csf.rignore
Chirpy, I read in the csf.rignore file that this mainly has to do with process tracking. If for example I insert:

.googlebot.com

Would this also prevent Googlebot IPs from being inserted into the csf.deny list by means other than process tracking?

Thanks much!


FOLLOW-UP:

I guess not. After doing that, I tried adding a Googlebot IP to csf.deny like this:

/etc/csf/csf.pl -d 66.249.70.132 # googlebot block test
Adding 66.249.70.132 to csf.deny and iptables DROP...
DROP all opt -- in !lo out * 66.249.70.132 -> 0.0.0.0/0
DROP all opt -- in * out !lo 0.0.0.0/0 -> 66.249.70.132

And as you can see, Googlebot would have been blocked anyway, despite 66.249.70.132 properly reversing to google.com, so csf.rignore does not seem to help with this after all.

So obviously we need some kind of automatic Googlebot whitelist, or at least a block-prevention facility that would keep Googlebot IPs from ending up in the csf.deny file. Is there anything in the works that would help us here ... something like a csf.rallow?

This is an important matter, as we are losing accounts because of Googlebot blocks that are made inadvertently.

Posted: 04 May 2009, 00:26
by jols
Just to elaborate on why this capability is needed:
----------------------
csf.rignore is only for process tracking.

This means that Googlebot can be blocked for other reasons, and csf.rignore would not prevent this.

For example, we have a cron-driven shell script set up to look at the general Apache access log and block any IP that hits that (GENERAL) log with too many 404 errors. These are usually the result of a hacker probe hitting the server IP, looking for exploitable programs. (NOTE: Our script does not look at the individual accounts' logs for 404 errors.)
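
Roughly, the script does something like the following (the log path and threshold here are just placeholders, not our real values):

#!/bin/sh
# Sketch of the cron job: count 404s per IP in the general Apache access log
# and hand the noisy ones to csf. The path and threshold are placeholders.
LOG=/usr/local/apache/logs/access_log
THRESHOLD=50

# In the common/combined log format the status code is field 9
awk '$9 == 404 {print $1}' "$LOG" | sort | uniq -c |
while read count ip; do
    if [ "$count" -gt "$THRESHOLD" ]; then
        /etc/csf/csf.pl -d "$ip"
    fi
done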

The problem for us is that when we suspend an account, we find that Googlebot will occasionally go mad and start hitting the account's suspend page over and over again, and all of this winds up as 404 hits in the general Apache access log, so Googlebot gets blocked.

Now, if only csf.rignore worked globally for any csf block, it would be great. As it is, for the time being I guess we are left with writing our own little script to do the rDNS lookup before running:

/etc/csf/csf.pl -d

or

/etc/csf/csf.pl -td

... which should be fairly simple to do, but it is just beyond my capabilities for the moment.
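
Something along these lines is what I am picturing, i.e. a small wrapper that checks the reverse DNS before blocking (untested on my part; the hostname suffixes are just the obvious Google ones):

#!/bin/sh
# Sketch of a wrapper around csf -d that skips IPs whose reverse DNS points at
# Google's crawler. The suffix list and messages are only examples.
ip="$1"

name=$(host "$ip" | awk '/pointer/ {print $NF; exit}')
name=${name%.}    # strip the trailing dot from the PTR answer

case "$name" in
    *.googlebot.com|*.google.com)
        echo "Skipping $ip ($name) - looks like Googlebot"
        ;;
    *)
        /etc/csf/csf.pl -d "$ip"
        ;;
esac

A forward lookup on the returned name, to check that it resolves back to the same IP, would make this harder to spoof, but the above is the basic idea.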

Posted: 07 May 2009, 09:56
by chirpy
Erm, no. csf.rignore stops lfd from blocking those IP addresses if they trigger any of its settings. If you deliberately block them manually using csf -d, then it will block them as requested.

Posted: 11 May 2009, 08:12
by jols
And yes, that is the case, i.e. we use csf -d in a custom shell script to block IPs after too many bad hits to the server.