Michael Scheidell | 5 Nov 12:54 2011

websense violating robots.txt to scan your web site. Fwd: FW: alert: New event: ET SCAN WebHack Control Center User-Agent Inbound (WHCC/)

Websense is violating robots.txt files.

It seems Websense is forging user agents in attempts to scan/mirror/download your web site.

whois 208.80.194.26 gives you Websense.
Grepping your web server logs for 208.80.194.* gives you a lot more interesting data.

10/31-23:00:04 <trust1> TCP 208.80.194.26:37330 --> 10.70.1.13:80
[1:2003924:13] ET SCAN WebHack Control Center User-Agent Inbound (WHCC/)
[Classification: A Network Trojan was detected] [Priority: 1]


grep robots.txt access.log | grep 208.80.194

This gives you NOTHING, ZERO, NADA. They won't download your robots.txt file because they don't want you to know they are a robot.

So how can you use robots.txt to block them? You can't. (Well, robots.txt is NOT an ACL; it's only for well-behaved robots, not robots that pretend to be humans.)
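
For anyone who wants the same check outside of grep, here is a minimal Python sketch. It assumes the access log is in Apache combined format, as the samples in this thread appear to be; the function name `robots_fetches` and the `SAMPLE` list are made up for illustration.

```python
def robots_fetches(lines, prefix):
    """Count all requests and robots.txt requests from clients whose IP starts with prefix."""
    total = robots = 0
    for line in lines:
        fields = line.split()
        # Combined log format: field 0 is the client IP, field 6 is the request path.
        if len(fields) < 7 or not fields[0].startswith(prefix):
            continue
        total += 1
        # Ignore any query string when comparing the path.
        if fields[6].split("?", 1)[0] == "/robots.txt":
            robots += 1
    return total, robots


# Two of the log lines quoted in this thread.
SAMPLE = [
    '208.80.194.28 - - [04/Nov/2011:00:03:51 -0400] '
    '"GET /press-room/first-alerts/vulnerability-in-dell-oem-xp-install.html HTTP/1.0" '
    '200 27813 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; '
    '{EA977ADD-F89E-CD12-4114-52C1061160B7}; .NET CLR 1.1.4322)"',
    '208.80.194.26 - - [03/Nov/2011:06:13:48 -0400] "GET /search.html HTTP/1.0" '
    '200 15075 "-" "Mozilla/4.0 (compatible; MSIE 6.0; America Online Browser 1.1; '
    'rev1.2; Windows NT 5.1; SV1; FunWebProducts; hbtools 4.7.0)"',
]

total, robots = robots_fetches(SAMPLE, "208.80.194.")
print(f"{total} requests, {robots} for robots.txt")
```

On a real log you would pass the open file object instead of `SAMPLE`. The trailing dot in the prefix keeps it from also matching addresses such as 208.80.1940.x in malformed lines.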

check this out:
208.80.194.28 - - [04/Nov/2011:00:03:51 -0400] "GET /press-room/first-alerts/vulnerability-in-dell-oem-xp-install.html HTTP/1.0" 200 27813 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {EA977ADD-F89E-CD12-4114-52C1061160B7}; .NET CLR 1.1.4322)"

what about this:

208.80.194.26 - - [03/Nov/2011:06:13:48 -0400] "GET /search.html HTTP/1.0" 200 15075 "-" "Mozilla/4.0 (compatible; MSIE 6.0; America Online Browser 1.1; rev1.2; Windows NT 5.1; SV1; FunWebProducts; hbtools 4.7.0)"

208.80.194.28 - - [02/Nov/2011:06:36:18 -0400] "GET / HTTP/1.0" 301 226 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; MEGAUPLOAD 2.0)"

Notice anything unusual? What about the HTTP/1.0? Hmm? MSIE 6.0, RUNNING AS AN HTTP/1.0 BROWSER?

What category of sig would you recommend?
'stealth web crawl robot'?
Looking for 'HTTP/1.0' and MSIE and Mozilla?

grep 208.80.194 access.log | egrep -v 'MSIE.*Windows'

(gives me zero)
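
The heuristic being discussed (a client that claims to be MSIE on Windows but speaks HTTP/1.0) can also be tried against the log before committing to a sig. This is a rough sketch, again assuming Apache combined log format; `LOG_RE`, `claims_msie_over_http10` and `SAMPLE` are illustrative names, not part of any tool.

```python
import re

# Combined log format: ip ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
# The closing quote of the agent is not required, so truncated lines still parse.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ \S+ (?P<proto>HTTP/[\d.]+)" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)'
)


def claims_msie_over_http10(line):
    """True if the request used HTTP/1.0 while the User-Agent claims MSIE on Windows."""
    m = LOG_RE.match(line)
    if m is None:
        return False
    return m.group("proto") == "HTTP/1.0" and re.search(r"MSIE.*Windows", m.group("agent")) is not None


# One Websense line and the (truncated) non-Websense line quoted in this thread.
SAMPLE = [
    '208.80.194.26 - - [03/Nov/2011:06:13:48 -0400] "GET /search.html HTTP/1.0" '
    '200 15075 "-" "Mozilla/4.0 (compatible; MSIE 6.0; America Online Browser 1.1; '
    'rev1.2; Windows NT 5.1; SV1; FunWebProducts; hbtools 4.7.0)"',
    '46.246.125.94 - - [01/Nov/2011:05:39:11 -0400] "GET /company/board/victornappe.html HTTP/1.0" '
    '200 21414 "http://www.secnap.com/company/board/victornappe.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Maxthon; .NET CLR 1.1.43',
]

for line in SAMPLE:
    print(line.split()[0], claims_msie_over_http10(line))
```

Both sample lines are flagged, including the one from outside the Websense range, so the pattern alone does not single out Websense.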

so, something like:

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"ET POLICY Stealth web crawler"; flow:to_server,established; content:"HTTP/1.0"; content:"MSIE"; http_header; content:"Windows"; nocase; http_header; classtype:web-application-activity; sid:1000001; rev:1;)

(the sid is only a placeholder from the local range)


Too broad? I get lots of hits, for example:

46.246.125.94 - - [01/Nov/2011:05:39:11 -0400] "GET /company/board/victornappe.html HTTP/1.0" 200 21414 "http://www.secnap.com/company/board/victornappe.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Maxthon; .NET CLR 1.1.43


Tighten it up:

alert tcp [208.80.192.0/21] any -> $HTTP_SERVERS $HTTP_PORTS (msg:"ET POLICY Stealth websense crawler"; flow:to_server,established; content:"HTTP/1.0"; content:"MSIE"; http_header; content:"Windows"; nocase; http_header; classtype:web-application-activity; sid:1000002; rev:1;)

(again, the sid is only a placeholder from the local range)


--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
SECNAP Network Security Corporation
  • Best Mobile Solutions Product of 2011
  • Best Intrusion Prevention Product
  • Hot Company Finalist 2011
  • Best Email Security Product
  • Certified SNORT Integrator


Martin Holste | 5 Nov 15:16 2011

Re: websense violating robots.txt to scan your web site. Fwd: FW: alert: New event: ET SCAN WebHack Control Center User-Agent Inbound (WHCC/)

In defense of Websense, malicious sites (and more commonly, sites hacked to be malicious) are set up to serve non-malicious content to spiders to hide their presence.  To combat this, Websense ignores the robots.txt and spoofs the UA.  Remember that their goal is not to mirror content but rather to assess a site, so following robots.txt doesn't make sense for them.  In any case, the sig should be helpful.

Nathan | 5 Nov 20:42 2011

Re: websense violating robots.txt to scan your web site. Fwd: FW: alert: New event: ET SCAN WebHack Control Center User-Agent Inbound (WHCC/)

On 11/05/11 06:54, Michael Scheidell wrote:
> websense violating robots.txt files.
> 
> seems websense is forging user agents, in attempts to scan/mirror/ download your
> web site.

WebSense is a great annoyance.  They consistently miscategorize items; for
example the NRA-ILA is classified as "Weapons/Firearms" while the Brady Campaign
is classified as civil/government.  Ridiculous one-way agenda-based filtering.

I've kicked them off packetmail and they turned around and classified it as
"Illegal".

I freely ban these IP ranges. I believe the below to be a fairly comprehensive
list of WebSense networks and their for-profit pet tricks.

#AS13448 WEBSENSE , Inc.
  /sbin/iptables -A blacklist -s 66.194.6.0/24 		-j DROP
  /sbin/iptables -A blacklist -s 67.117.201.128/28 	-j DROP
  /sbin/iptables -A blacklist -s 91.194.158.0/23 	-j DROP
  /sbin/iptables -A blacklist -s 204.15.64.0/21 	-j DROP
  /sbin/iptables -A blacklist -s 192.132.210.0/24 	-j DROP
  /sbin/iptables -A blacklist -s 207.114.184.0/24 	-j DROP
  /sbin/iptables -A blacklist -s 208.80.192.0/21 	-j DROP
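
To see whether an address from your logs falls inside one of these ranges before adding the rules, a short Python sketch with the standard `ipaddress` module is enough. The ranges are copied from the list above; `BLOCKED` and `is_blocked` are illustrative names.

```python
import ipaddress

# Ranges copied from the iptables list above.
BLOCKED = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "66.194.6.0/24",
        "67.117.201.128/28",
        "91.194.158.0/23",
        "204.15.64.0/21",
        "192.132.210.0/24",
        "207.114.184.0/24",
        "208.80.192.0/21",
    )
]


def is_blocked(ip):
    """True if the address falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)


# Addresses taken from the log lines earlier in the thread.
for ip in ("208.80.194.26", "208.80.194.28", "46.246.125.94"):
    print(ip, is_blocked(ip))
```

The two 208.80.194.x crawler addresses fall inside 208.80.192.0/21, while 46.246.125.94 is not covered by any range in the list.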

Nathan
