histrionics | 17 May 01:13

Timeout not obeyed when trying to open bad url


Here is something I've never seen before:

I have a list of URLs fed into mechanize (which uses net/http to grab
pages).

I have it set up like this:
require 'mechanize'
require 'timeout'

agent = WWW::Mechanize.new
error_count = 0
begin
  # Give up on the request if it takes longer than 2 seconds
  Timeout::timeout(2) {
    @tracked_page = agent.get("http://#{site_url}")
  }
rescue Timeout::Error => timeout_error
  puts "I TIMED OUT AFTER 2 SECS BUT IM TRYING AGAIN: #{timeout_error}"
  error_count += 1
  if error_count < 5
    # error_count is a local variable, not an instance variable
    puts "ATTEMPT NUMBER #{error_count} QUITTING AFTER 4 TRIES"
    retry
  end
end

This is all well and good. It works fine and catches any timeout
exceptions, except when it's trying to deal with one particular URL
(www.webdevking.com).

This URL is not currently resolving to any host. It returns "unknown
host" when you try to connect to it.

Frederick Cheung | 17 May 02:33

Re: Timeout not obeyed when trying to open bad url


On 17 May 2008, at 00:16, histrionics wrote:

>
> Here is something Ive never seen before:
>
I've seen this before. Bottom line is, Ruby's threading sucks: when
running C code (extensions, or bits of the stdlib that call through to
C code) you can block the entire Ruby interpreter, and while it is
blocked the timeout never gets a chance to fire. Things that can do
this include MySQL queries, some parts of name resolution, and, it
would appear, some other bits of the networking libraries.
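
For what it's worth, here is a minimal sketch of the retry logic pulled
out into a helper, so the attempt counter and the fetch stay separate.
The name with_retries and its parameters are made up for illustration
and are not part of mechanize. It also rescues SocketError, which is
what a failed name lookup normally raises. It still relies on Timeout,
so it does not get around the blocking described above.

```ruby
require 'socket'   # defines SocketError
require 'timeout'

# Hypothetical helper (an assumption, not a mechanize API): runs the block
# under a timeout and retries it up to max_attempts times. Returns the
# block's value, or nil once every attempt has failed.
def with_retries(max_attempts = 4, seconds = 2)
  attempts = 0
  begin
    attempts += 1
    Timeout.timeout(seconds) { yield attempts }
  rescue Timeout::Error, SocketError => e
    puts "Attempt #{attempts} of #{max_attempts} failed: #{e.class}"
    retry if attempts < max_attempts
    nil
  end
end
```

With the agent from the original post it would be called as
page = with_retries { agent.get("http://#{site_url}") }
and a nil result means every attempt failed.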

Fred