Sep 24, 2006

baiduspider bad bot ignores robots.txt

baiduspider+(+http://www.baidu.com/search/spider.htm)
202.108.11.106
202.108.11.108
60.28.17.43

This is a Chinese search engine that, I think, indexes sites written in Chinese.
Since my sites are in English, I don't understand why it's trying to index me.

Baidu says to add "baiduspider" to your robots.txt file. I did this months ago, but it's back.
It is ignoring the robots.txt file.
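
For reference, here is a minimal sketch of what a robots.txt ban on this bot normally looks like, and how a well-behaved crawler would read it. It assumes the standard two-line record (User-agent plus Disallow: /) and feeds it to Python's standard urllib.robotparser. The is_allowed helper, the example.com URL, and the Googlebot comparison are placeholders for illustration, not taken from any real server.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: a blanket ban for one crawler only.
ROBOTS_TXT = """\
User-agent: baiduspider
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"  # placeholder URL
    print("baiduspider allowed:", is_allowed("baiduspider", page))
    print("Googlebot allowed:", is_allowed("Googlebot", page))
```

A check like this only shows what a compliant crawler should do. A bot that keeps requesting pages after being disallowed has to be blocked on the server side.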

The above IPs are blacklisted as spammers. See link. You have to click on OPEN RBL and then, when the second window opens, click on LOOKUP. This will display all the block lists in red.


It has been added to the user-agent ban list and is blocked, but it just ignores the
error and keeps coming back. It's time to add the IPs to the server IP ban.

deny from 202.108.11.106
deny from 202.108.11.108
deny from 60.28.17.43

More work is needed to find all its IPs.
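
One way to hunt down the rest of its IPs is to pull them out of the access log. The sketch below is only a rough starting point: it assumes the log is in Apache's combined format (client IP first, user agent as the last quoted field), and the find_bot_ips function and the sample log lines are made up for illustration.

```python
import ipaddress
import re

# Combined log format: client IP first, user agent as the last quoted field.
LOG_LINE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def find_bot_ips(log_lines, needle="baiduspider"):
    """Return the unique client IPs whose user agent contains needle."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # line is not in the assumed format
        ip, user_agent = match.groups()
        if needle in user_agent.lower():
            try:
                ips.add(ipaddress.ip_address(ip))
            except ValueError:
                continue  # first field was a hostname, not an IP
    # Sort numerically, IPv4 before IPv6, so the output is stable.
    ordered = sorted(ips, key=lambda ip: (ip.version, int(ip)))
    return [str(ip) for ip in ordered]


# Hypothetical sample lines; real input would come from the access log file.
SAMPLE_LOG = [
    '202.108.11.106 - - [24/Sep/2006:10:00:00 +0000] "GET /robots.txt HTTP/1.1" '
    '200 120 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '60.28.17.43 - - [24/Sep/2006:10:00:05 +0000] "GET /contact.html HTTP/1.1" '
    '403 210 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '192.0.2.10 - - [24/Sep/2006:10:01:00 +0000] "GET / HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
    '202.108.11.106 - - [24/Sep/2006:10:02:00 +0000] "GET / HTTP/1.1" '
    '403 210 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
]

if __name__ == "__main__":
    for ip in find_bot_ips(SAMPLE_LOG):
        print(f"deny from {ip}")
```

Each printed line is in the same "deny from" form as the list above, so new addresses can be pasted straight into the ban list. This only catches requests that still announce themselves as baiduspider; anything using a different user agent would need another approach.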

4 comments:

Scott Allen said...

I get these punks on my server all the time too. I agree...why are they indexing English sites? I banned em too. I set my .htaccess to send them back to their own site. Thought that was funny. :)

Anonymous said...

Your site gets sent to Baidu when you use the mass site submission tools found on some sites.

Anonymous said...

I have found that this Baiduspider uses many IPs, and I keep adding them to my firewall to block them. Every time they get to scan one of my sites, I start getting spam from all over on my contact us form. I have no doubt that this spider is scanning sites to find contact us forms it can pass on to spammers. This is why it is scanning English sites. Every time I block them and change my contact us form file name, the spam stops until they come in on a new IP, and then the spam starts again. There is no doubt that they are responsible.

kamran said...

It's a spam spider. You can see for yourself that it always seeks out pages that contain an email address or a contact form so it can post spam comments. Today I tried to block IP ranges, but it did not care about those either and went wild, hitting my site from different IPs. It's very annoying: it consumes a large amount of bandwidth and skews page view data.