Jan 29, 2008

woriobot heritrix/1.10.0 +http://worio.com) bot

Mozilla/5.0 (compatible; heritrix/1.6.0 +http://www.worio.com/) worio.com

A new bot just showed up claiming another beta test.

This bot is blocked by Bad Behavior for using improper headers.

Klaas said...
Could you elaborate on the problem with the headers? I'm eager to fix real and perceived problems with our crawler.

Here is the BB error:

bad-behavior 403 Required header 'Accept' missing
Agent: Mozilla/5.0 (compatible; woriobot heritrix/1.10.0 +http://worio.com) worio.com

You're just going to have to test it on a blog using Bad Behavior.
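
For a rough idea of what that rule checks, here is a minimal Python sketch. It is not Bad Behavior's actual code (Bad Behavior is a PHP plugin), and the function name `check_required_headers` and the `example.com` host are made up for illustration. It only assumes the rule rejects any request that arrives without an `Accept` header, as the error message suggests.

```python
def check_required_headers(headers):
    """Return (status, reason) for a dict of request headers.

    Hypothetical helper: mimics the 'Required header missing' rule,
    not the real Bad Behavior implementation.
    """
    # HTTP header names are case-insensitive, so normalize before checking.
    names = {name.lower() for name in headers}
    if "accept" not in names:
        return 403, "Required header 'Accept' missing"
    return 200, "OK"


# Headers roughly as the crawler sent them: a User-Agent but no Accept.
crawler_headers = {
    "User-Agent": "Mozilla/5.0 (compatible; woriobot heritrix/1.10.0 +http://worio.com) worio.com",
    "Host": "example.com",  # placeholder host, assumed for the example
}

status, reason = check_required_headers(crawler_headers)
print(status, reason)
```

Under that assumption, sending something like `Accept: */*` with each request would be enough to pass this particular check.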

If it were a worthwhile bot I would whitelist it, but since it doesn't do anything yet, why bother? If your project ever gets off the ground, let me know and I will erase this post.


Klaas said...
This comment has been removed by a blog administrator.

SpottedLop said...

I wouldn't whitelist it. Whatever it is, it totally ignored my robots.txt file, and the website it pointed to, worio.com, has no information, just a login page. Not okay.

Furmen Sakume said...

I blocked them via Cloudflare. The bot shows the signs of an average content scraper: it ignores the robots.txt file even though the log shows it read it, and when crawling it repeatedly tries to index pages that return 403.
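
For anyone who wants to confirm from their own logs that a crawled URL really was disallowed, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the helper name `is_allowed` are all assumptions for illustration, not taken from any site mentioned here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; substitute the rules from your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler would skip the first URL and fetch the second.
print(is_allowed(ROBOTS_TXT, "woriobot", "http://example.com/private/page.html"))
print(is_allowed(ROBOTS_TXT, "woriobot", "http://example.com/index.html"))
```

If the access log shows the bot requesting URLs for which this returns `False` after it has already fetched robots.txt, that supports the claim that it read the file and ignored it.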