May 10, 2008

67.202.15.206 compute-1.amazonaws.com www.powerset.com

This company, powerset.com, says "we employ a small army of PhDs," but they know nothing about building bots. The blog they run won't even take a comment without returning an error page.

bad-behavior 403 Required header 'Accept' missing
Agent: Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com-crawl@powerset.com]
67.202.15.206 ec2-67-202-15-206.z-1.compute-1.amazonaws.com

amazonaws.com keeps showing up in my logs. It looks like this is Amazon's web hosting division, so we may be able to ban it without banning Amazon itself.
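
Before banning anything, it helps to see how many log entries actually come from these hosts. Here is a minimal sketch in Python, assuming each input line is an IP followed by its reverse DNS name, as in the entry above. The function name `ec2_hits` and the `.compute-1.amazonaws.com` suffix test are my own choices for illustration, not part of any tool mentioned here.

```python
import re

# Assumption: hosted machines resolve to names ending in
# .compute-1.amazonaws.com, as in the log entry quoted above.
EC2_SUFFIX = re.compile(r"\.compute-1\.amazonaws\.com$", re.IGNORECASE)


def ec2_hits(lines):
    """Return (ip, hostname) pairs whose hostname is under compute-1.amazonaws.com."""
    hits = []
    for line in lines:
        parts = line.split()
        # Expect "IP hostname"; skip anything shorter.
        if len(parts) >= 2 and EC2_SUFFIX.search(parts[1]):
            hits.append((parts[0], parts[1]))
    return hits
```

Because the test is on the hostname suffix, an address that resolves to an ordinary amazon.com name would not be flagged.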

1 comment:

Anonymous said...

From my Apache HTTPD.CONF:
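# stop amazonaws addresses scraping /jargon
RewriteEngine on
# client's reverse DNS name ends in .compute-1.amazonaws.com
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC]
# and the request is for an .html page under a /jargon/ path
RewriteCond %{REQUEST_URI} ^.*/jargon/.*\.html$ [NC]
# send everything except the index page to /jargon/index.html (301)
RewriteRule !^/jargon/index\.html$ /jargon/index.html [L,R=permanent]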

(excuse the wrap)
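
The rules in the comment above can be sanity-checked offline before they go into a live config. This is a rough Python sketch that mirrors the two conditions and the negated rule pattern using the same regular expressions. The function name `would_redirect` is hypothetical, and it assumes the path has no query string and that the rule is evaluated in server context (leading slash present). It only imitates the matching logic, not Apache itself.

```python
import re

# Same patterns as the RewriteCond lines; [NC] becomes IGNORECASE.
HOST_COND = re.compile(r"^.*\.compute-1\.amazonaws\.com$", re.IGNORECASE)
URI_COND = re.compile(r"^.*/jargon/.*\.html$", re.IGNORECASE)
# The RewriteRule pattern is negated with "!", so a match here means "leave alone".
RULE_EXEMPT = re.compile(r"^/jargon/index\.html$")


def would_redirect(remote_host, path):
    """Return True if the quoted rules would redirect this request."""
    return (
        RULE_EXEMPT.search(path) is None
        and HOST_COND.search(remote_host) is not None
        and URI_COND.search(path) is not None
    )
```

Exempting the index page is what keeps the redirect from looping: the target /jargon/index.html is never redirected again.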