This company, powerset.com, says "we employ a small army of PhDs", but they know nothing about building bots. The blog they run won't even accept comments without returning an error page.
bad-behavior 403 Required header 'Accept' missing
Agent: Mozilla/5.0 (compatible; zermelo; +http://www.powerset.com) [email:paul@page-store.com-crawl@powerset.com]
67.202.15.206 ec2-67-202-15-206.z-1.compute-1.amazonaws.com
amazonaws.com keeps showing up in my logs. It looks like this is the web hosting division of Amazon, so we may be able to ban it without banning Amazon itself.
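One way to do that, as a rough sketch only: the fragment below assumes Apache 2.4 with mod_authz_host, and it assumes you want to block the whole site rather than a single directory. The Location path is a placeholder, and the hostname suffix is taken from the log line above. It refuses any client whose reverse DNS name ends in compute-1.amazonaws.com and leaves everyone else alone.

```apache
# Sketch, assuming Apache 2.4 (mod_authz_host). Adjust the path to taste.
<Location "/">
    <RequireAll>
        # Let everyone in by default...
        Require all granted
        # ...except clients whose hostname ends in this suffix
        Require not host compute-1.amazonaws.com
    </RequireAll>
</Location>
```

Require host does a double reverse DNS lookup on each client address regardless of the HostnameLookups setting, so expect some extra DNS traffic. Other EC2 regions use different hostname suffixes, so this only covers addresses that resolve under compute-1.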
1 comment:
From my Apache httpd.conf:
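# Stop amazonaws addresses scraping /jargon
RewriteEngine on
# Match clients whose hostname ends in .compute-1.amazonaws.com
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC]
# ...and only when they request an .html page under /jargon/
RewriteCond %{REQUEST_URI} ^.*/jargon/.*\.html$ [NC]
# Redirect everything except the index page itself to the index (301)
RewriteRule !^/jargon/index\.html$ /jargon/index.html [L,R=permanent]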
(excuse the wrap)