Nov 30, 2006

; Windows NT; ....../1.0 What is this

Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )

Another fake browser detected.

OrgName: UUNET Technologies, Inc.
Address: 22001 Loudoun County Parkway
City: Ashburn
StateProv: VA
PostalCode: 20147
Country: US

Nov 29, 2006


Not sure what this one is

Nov 28, 2006

anothrrobot ( RSS ABUSE

anothrrobot (

The above IP is banned as a single-stage open SMTP relay or HTTP proxy. See here

anothrrobot (

The above IP is also banned as a spammer. See here

Located in Shanghai, China.

This RSS robot is said to read your RSS feeds and then push them to the end user. But it keeps reloading the RSS feed over and over.

Example of abuse: I set my RSS feed Time To Live (TTL) to allow no more than one load per day, but this bot ignores the TTL and loads the feed every minute.

So it's banned.
After banning it I am seeing hits from another dynamic China IP address. I don't think this is a real feed service.
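Here is a rough sketch of how this kind of TTL abuse could be spotted automatically. It is not the setup used on this site. The class name, the violation threshold and the in-memory storage are all assumptions made for illustration. RSS 2.0 expresses TTL in minutes, so one load per day is 1440 minutes, or 86400 seconds.

```python
import time


class FeedThrottle:
    """Tracks the last feed fetch per client and counts early re-fetches."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.last_fetch = {}   # client IP -> time of the last fetch that respected the TTL
        self.violations = {}   # client IP -> number of fetches made before the TTL expired

    def check(self, ip, now=None):
        """Return True if this fetch respects the TTL, False if it came too early."""
        now = time.time() if now is None else now
        last = self.last_fetch.get(ip)
        if last is not None and now - last < self.ttl_seconds:
            self.violations[ip] = self.violations.get(ip, 0) + 1
            return False
        self.last_fetch[ip] = now
        return True

    def should_ban(self, ip, max_violations=10):
        """Hypothetical policy: ban once a client has ignored the TTL this many times."""
        return self.violations.get(ip, 0) >= max_violations
```

A bot that reloads the feed every minute reaches the threshold in about ten minutes, while a reader that comes back once a day never accumulates a violation.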

Domain name:

Registrant Contact:
Zheng XY
13501863736 fax: 13501863736
15L,Huamin Building, No.728,Yanan Xi Rd.
Shanghai ShangHai 200051

Administrative Contact:
Zheng XY
13501863736 fax: 13501863736
15L,Huamin Building, No.728,Yanan Xi Rd.
Shanghai ShangHai 200051

Technical Contact:
Product Team
64677272 fax: 64727880
Room 306,MingYuan Tower,1199 Fu Xing Road (M)
Shanghai Shanghai 200031

Billing Contact:
Product Team
64677272 fax: 64727880
Room 306,MingYuan Tower,1199 Fu Xing Road (M)
Shanghai Shanghai 200031


Created: 2006-03-16
Expires: 2008-03-16


outfoxbot/0.5 (for internet experiments; http://;
All Hits From

This bot runs on an IP banned for sending out China spam.

It is an unknown bot, likely an email harvester.

Nov 26, 2006 abuse probes

mozilla/4.0 (compatible; msie 6.0; windows nt 5.2; win64; amd64)

Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

I show this useragent being used only by bots, not by real users.
mozilla/4.0 (compatible; msie 6.0; windows nt 5.0; avant browser []; hotbar

I show this useragent being used only by bots, not by real users.

mozilla/5.0 (windows; u; windows nt 5.0; en-us; rv:1.7.5) gecko/20050207 firefox/1.0.1 is at it again. It goes straight to my contact form, then back to the homepage, then back to the contact form.

A few minutes later it shows up on one of my PHP-Nuke sites trying to load a module that I do not run. After 14 tries it gave up and started on the homepage. After 6 tries it gave up on the homepage. It should be banned from all sites.

Listed in

Nov 25, 2006

converacrawler/0.9d bot

converacrawler/0.9d (+

This bot came in and refused to take no for an answer; it tried to load every page I had. So it's clear that it didn't spider my site to get those links. It got them from Google or somewhere else.

converacrawler was originally banned as an email harvester, but it now looks like a real search site at

They claim you can add this to robots.txt.

User-agent: ConveraCrawler
Disallow: /
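
To see what that rule means for a crawler that actually honors robots.txt, here is a small sketch using Python's standard robots.txt parser. The helper name and the second useragent in the usage note are made up for illustration. It only shows how a well-behaved client would read the rule and proves nothing about what ConveraCrawler really does.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted above, exactly as it would appear in robots.txt.
RULES = """\
User-agent: ConveraCrawler
Disallow: /
"""


def allowed(user_agent, path, rules=RULES):
    """Return True if a compliant crawler with this useragent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)
```

With this rule, allowed("converacrawler/0.9d", "/index.html") is False, while any other useragent (for example a hypothetical "SomeOtherBot/1.0") is still allowed everywhere.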

Nov 24, 2006 Violates robots.txt

The website has a robot that comes in, ignores your robots.txt file, takes a snapshot of your website and then posts it as a thumbnail on its site.
It doesn't matter if you block all images from bots like this:

User-agent: *
Disallow: /images/

The robot refuses to abide by the commands in robots.txt.

abuse from

Mozilla/4.0 compatible

Ran into this probe today from several IPs. Looks like they were testing out useragents on different IPs.

Nov 23, 2006 spammer

mozilla/5.0 (compatible; googlebot/2.1; +
mozilla/5.0 (compatible; googlebot/2.1; + is the leading provider for data for direct marketing campaigns

Hmm, why would a direct marketer be sending out a bot that fakes Google and attempts to post spam into our scripts?

It looks like this is a spammer site and they are getting into blog spam.

The website won't load, but a cached copy is still stored in Google.

Update: this bot keeps trying scripts that don't exist.
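
One way to tell a fake Googlebot like this from the real one is the reverse and forward DNS check that Google itself recommends: the IP must reverse-resolve to a googlebot.com or google.com host, and that host must resolve back to the same IP. The sketch below is only an illustration. The function name and the injectable resolver arguments are assumptions (they are there so the check can be tried without network access), and it only handles IPv4.

```python
import socket


def is_real_googlebot(ip, reverse=None, forward=None):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    # Default resolvers use the system DNS; callers may pass their own for testing.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not host.lower().rstrip(".").endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must map back to the IP that made the request.
        return ip in forward(host)
    except OSError:
        # No reverse record or a failed lookup: treat the visitor as unverified.
        return False
```

A visitor that claims googlebot/2.1 in its useragent but fails this check is faking it.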

PHP-Nuke Attacker Bots

I have been seeing a lot of bots hitting my PHP-Nuke sites. It's not clear why they are trying to load the following files, since they are not used in the current version and have never been located on my server.

The files they are attempting to post to are

I have set up an autoban on these files to track the attacks. Here will be the results of what I find.
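
For anyone wondering what an autoban on trap files can look like, here is a minimal sketch. It is not the script running on my sites. The class name, the trap paths in the usage note and the in-memory ban list are all hypothetical.

```python
class AutoBan:
    """Bans any client that requests a file that has never existed on the server."""

    def __init__(self, trap_paths):
        self.trap_paths = set(trap_paths)
        self.banned = {}  # client IP -> the trap path that triggered the ban

    def handle(self, ip, path):
        """Return the HTTP status to send for this request."""
        if ip in self.banned:
            return 403
        if path in self.trap_paths:
            # Record which file was probed so the attacks can be tracked later.
            self.banned[ip] = path
            return 403
        return 200
```

For example, AutoBan({"/trap/old-forum-file.php"}) returns 200 for normal pages, 403 for the first hit on the trap file, and 403 for everything that IP requests afterwards.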

It's now clear what this is: an attack on the phpBB forum software that PHP-Nuke uses. The problem is that this version is modified and the attack won't work on PHP-Nuke. But that doesn't stop the robot attacks.

IPs of phpBB hackers <- worst abuser

Nov 22, 2006

Exalead image theft

Exalead snapshots your site and lists it in its search system. It also tries to hotlink all your images in a page view window. Sites like mine using hotlink protection will display an image theft notice when they do this.

I thought I had stopped this snapshot bot without stopping its crawler, but they have again changed the useragent for it.

See here and here for more.

Block useragents:

NG/2.0,Image crawler
NG/4.0,image crawler
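
The two entries above are in a "pattern,reason" format. As a sketch only, here is one way such a list could be applied to incoming useragents. The function names are made up, and the case-insensitive substring match is an assumption, not necessarily how the ban list on this site works.

```python
BLOCK_LIST = """\
NG/2.0,Image crawler
NG/4.0,image crawler
"""


def load_rules(text):
    """Parse 'pattern,reason' lines into (lowercased pattern, reason) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pattern, _, reason = line.partition(",")
        rules.append((pattern.strip().lower(), reason.strip()))
    return rules


def match_block(user_agent, rules):
    """Return the reason for the first matching rule, or None if nothing matches."""
    ua = user_agent.lower()
    for pattern, reason in rules:
        if pattern in ua:
            return reason
    return None
```

Substring matching is deliberately loose, so a short pattern like NG/2.0 would also catch any other useragent that happens to contain those characters. Tighten it if that causes false bans.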

The robot does not comply with basic robots.txt commands to not load images.

User-agent: *
Disallow: /images/

Nov 21, 2006

nodomaintransfer abuse is back

It will show up as a domain nodomaintransfer??.com, with the ?? replaced by a number. This is a guestbook spammer.

It is now suspected that they are registering throwaway domains so that when they get caught they can just switch to a new one. I have seen the above ones; if you have seen other combos please post them.

On another note, it's odd that we also see Singapore peepsurf running a proxy, now suspected to be connected.

domain ban
nodomaintransfer,Guestbook Spammer

sumitbot_hansrajbot RufusBot Submit Bot spammer

sumitbot_hansrajbot (sumitbot_hansrajbot;

IP has been flagged as a spammer. Also see SPAMBAG on

Why are we crawling?
We crawl the web towards the goal of developing a new kind of index/search tool that will bring substantial and previously unavailable exposure to websites. We're in "stealth mode" for the next few months for business reasons, but watch this page for more details on our product.

Yeah, same old story. But if it's true, why don't you have a real domain name, and why are you running on an IP flagged as a source of spam? Get a real hosting account with a real domain and someone might believe you.

We identify ourselves with the name RufusBot in our crawls

The code below can be used to disallow access to all parts of your site just for our bot.
User-Agent: RufusBot
Disallow: /

Sorry, that statement is false. It identifies itself as The Submit Bot in crawls. Submitting what? Spam?

It's not clear what this host is. Is it an ISP or a hosting company?



This bot is using an IP flagged as a spammer. This unknown bot is banned.

blogbot/1.0 Locus.CS.UCLA.EDU

blogbot/1.0 (ucla cs dept
All Hits From Locus.CS.UCLA.EDU

It is unknown what this bot is for, so it's banned.

webbot/0.1

mozilla/5.0 (compatible; webbot/0.1;

This bot fell right into bot traps and then kept trying to spider all my sites.

It is a Russian (.ru) robot.

Banned due to abuse: it does not follow robots.txt.

Nov 17, 2006 scraper

400 Required header 'Accept' missing
Mozilla/5.0 (compatible; heritrix/1.8.0 +

My mission is helping companies mine the online world. I seek innovators like you, who provide insights into unmet needs, trends, and market activity. Using Accelovation Market Discovery™ software (MDS), I help automate market research, allowing companies to more effectively and economically identify and take advantage of new opportunities for innovation and growth.

This bot was caught hammering my site and getting blocked on all PHP pages by BB (Bad Behavior).
I recommend adding this robot to your robots.txt file.

Case Studies
Major consumer packaged goods companies use Accelovation to identify new innovations that will become their next billion dollar businesses.
Multiple Fortune 500 chemical companies use Accelovation to discover new markets for existing capabilities, while keeping tabs on the competition.
A Fortune 100 telecommunications company identifies patent infringers to win multi-million dollar awards via automated Accelovation searches.

Really? Stealing my content so some big company can make money off of it is theft.
Helping big companies find ideas that they can take from us and patent is theft.
And worse yet, once they take your ideas and patent them they come back and sue you for patent infringement.


Nov 10, 2006

Running remote scripts

elseif(intval(get_cfg_var('allow_url_fopen')) && function_exists('file')) {
    if($content = @file("".$QueryString))
        echo @join('', $content);
elseif(function_exists('curl_init')) {
    $ch = curl_init ("".$QueryString);
    curl_setopt ($ch, CURLOPT_HEADER, 0);
    curl_exec ($ch);

Take care. The site will not answer my questions about the security problems.

Beware of the PHP script that they want you to put on your server. It allows them to take total control of your server. Instead of pulling content and displaying it on your server, it loads the script from the remote server and then runs it.

This is a huge security violation. They can spam from your server, run bots, or do anything they want. They will control your server.

Until they release a real script that just prints the content to the screen so it cannot be executed, or answer emails about why they won't change it, do not use that service.

More testing shows that the remote content can be loaded and then scanned for any PHP code before it's displayed, but you will have to write your own script to do this. If anyone else wants to help test some safe scripts using this service, let me know. We need to make sure we know all the exploits we need to scan for.
Scanning for
should prevent any PHP code from running. Any more ideas?
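
As a starting point for that kind of scan, here is a rough sketch. The marker list is my own assumption and is certainly not complete, and the function names are made up. The important part is that the fetched content is treated as plain data: it is checked, then printed, and never passed to anything that would execute it.

```python
# Assumed markers that suggest embedded code rather than plain content.
# This list is illustrative only; a real scan would need more.
CODE_MARKERS = ("<?", "<%", "<script")


def find_code_markers(content):
    """Return the markers found in the remote content (case-insensitive)."""
    lowered = content.lower()
    return [marker for marker in CODE_MARKERS if marker in lowered]


def safe_to_print(content):
    """True only if none of the assumed code markers appear in the content."""
    return not find_code_markers(content)
```

Content that fails the check should simply be dropped rather than cleaned up, since a partial cleanup is easy to get wrong.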

Nov 9, 2006

Website Contact Form: How the Robots Attack

If you have a website you likely have a contact form so you do not have to list your email address.

The rise of blogs has also created roving spambots that post to comment forms. They are attempting to find blogs and guestbooks, but they are also posting to our website contact forms.

Here is an example of a robot that came from

The robot read the form from my HTML page and copied all the form fields, including the hidden ones. It then submitted all the proper fields, leaving the unused ones blank. It added data to the name and city fields.

The city field contained 'k o s t a n a y' (spaces added). The name contained a random name. It is believed that this was a test message designed to post to everything, then come back a month later and scan Google to find out which sites end up displaying the test phrase, which in this case is the city.

Once it finds out which sites it got into, it will come back and post its spam message.

The strange thing about this robot is that it has a bug. It doesn't understand your reset or clear button, so it tries to submit that field as well, like this:
reset=Reset form

So if you find your reset field being posted to your form, you should reject the entry.
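
A minimal sketch of that check follows. Browsers do not include a reset button in the data they submit, so seeing its name in a POST body is a strong sign of a robot. The function name is hypothetical, and the field name "reset" is taken from the example above; use whatever name your own reset button has.

```python
from urllib.parse import parse_qs

# Names of buttons that a real browser never submits with the form data.
BUTTON_ONLY_FIELDS = {"reset"}


def submitted_reset_button(post_body):
    """Return True if the POST body contains a field only a robot would send."""
    fields = parse_qs(post_body, keep_blank_values=True)
    return any(name in fields for name in BUTTON_ONLY_FIELDS)
```

A body such as "name=Jane&city=Somewhere&reset=Reset+form" is flagged, while the same body without the reset field passes.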

Posting a key field or password field won't help, because the bot will read the field and repost it. However, after detecting this bot I changed my key and found that it's still trying to post under the old key, so it reads your key once and then doesn't do any updates.

In order to protect your forms from this bot, I recommend using PHP to create your form page and then posting the current date as a hidden field along with a rotating key. Then test for these when the data is submitted. This type of bot may pass the first test but none of the ones after that. In fact it may not even pass the first test if it doesn't post on the same day it scans.
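
The date plus rotating key idea can be sketched like this. It is written in Python rather than PHP and is only an illustration, not the code behind my forms. The field names, the secret and the use of an HMAC to derive the daily key are all assumptions.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET = b"change-me"  # hypothetical server-side secret, never sent to the browser


def make_key(day, secret=SECRET):
    """Derive the key for a given day, so the key rotates automatically every day."""
    return hmac.new(secret, day.isoformat().encode(), hashlib.sha256).hexdigest()


def hidden_fields(today, secret=SECRET):
    """Values to place in the form's hidden inputs when the page is generated."""
    return {"form_date": today.isoformat(), "form_key": make_key(today, secret)}


def verify(fields, today, secret=SECRET, max_age_days=0):
    """Accept the post only if the date is recent and the key matches that date."""
    try:
        day = date.fromisoformat(fields.get("form_date", ""))
    except ValueError:
        return False
    if not (timedelta(0) <= today - day <= timedelta(days=max_age_days)):
        return False
    expected = make_key(day, secret)
    return hmac.compare_digest(fields.get("form_key", "").encode(), expected.encode())
```

With max_age_days=0 a post is accepted only on the day the form was loaded, which matches the same-day test described above. Raising it to 1 would let through visitors who load the form just before midnight. A robot replaying an old date and key fails the age test, and one sending today's date with an old key fails the key test.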

For my forms that are on HTML pages I have changed my PHP submission form. It now displays a page asking the user to press submit again to verify the post. This inserts another date and key code in the input that the robots cannot duplicate. Not only do they not know what the key will be ahead of time, but they would have to submit the data twice with different keys to get in, something they are not programmed to do.

The verify button takes the place of the CAPTCHA and works just as well so far.

Nov 8, 2006

Google IP Falls into bot traps

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; sv1; .net clr 1.1.4322)

This IP is owned by Google and is used by Google Web Accelerator.

The problem is that Google is not following the robots.txt file, so it's falling into bot traps.

Or, if it's not Google Web Accelerator falling into traps, then people are using the IP as a proxy.

The question is what to do about this.

Nov 7, 2006

403 A User-Agent is required but none was provided

This one also has no useragent. Not sure what it's doing.

bad-behavior 403 A User-Agent is required but none was provided

These 2 came in at the same time and are the same bot from 2 IPs.

It's not clear what this is. The website gives an error and I can find no record in Google.

Nov 6, 2006 Is a Health search system



They have started changing useragents lately. The site says it's using the Voyager useragent.
What is your crawler's HTTP user-agent string?


That's really strange, since it keeps using cfetch/1.0 most of the time.
I had been banning them by IP but will try the robots file again.

Add this to robots.txt
User-agent: voyager
Disallow: /

This is an ISP, Telekom Malaysia. It creates a lot of guestbook spam and we had banned it. It will be unbanned as an experiment to see what we get.

Jakarta Commons

See last post. This one came in using Jakarta Commons and was blocked by BB, so they started changing IPs, I guess thinking I was blocking them by IP.

Here is a list. The first one had a longer useragent.

Jakarta Commons-HttpClient/3.0.1 UP.Link/

Jakarta Commons-HttpClient/3.0.1

Still coming; IP list updated.

Kind of looks like either he has accounts on all of these or the computers are compromised or they are some type of proxy...

400 Header 'Connection' contains invalid values

400 Header 'Connection' contains invalid values
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

This bot or hacker (I don't know which) hit my site and was stopped by BB.
So it started changing IPs but still uses the same useragent and headers. What's strange is all the IPs it's using.

Here is a list.

Also see next post about similar action using useragent 'Jakarta Commons'

Nov 3, 2006
SE - Sweden
What kind of domain is this?
Its website is blank and its robot visits with no useragent.

mozilla/4.0 (compatible; msie 5.01; windows nt)

This one is a spammer; the IP is on the spam blocklist.

What gave him away is the windows nt useragent. This is invalid.

Posted: October 31 2006 Post subject: suspicious link in my stats

Does anybody have a clue what this is? I've had it show up three times now. Obviously, the entry link and exit link have nothing to do with my site.

Here is the stat info

October 31st 2006 16:42:51
7 seconds
Konqueror 3.5
1600x1200
Returning Visits: 0
Location: Florida, Miami, United States
host (
Referring URL: No referring link

A check on this site, buzzlogic, shows that it's a snoop bot for corporations to check on who is talking about them. Another snoop bot.