Rüdiger Asche | 30 Jun 2012 00:33

strange issue with tcp-connect on Windows...

Hi there,
 
Today's bug from hell might not be a Racket-related issue (to me it looks more like a Windows thing), but maybe somebody has already come across something similar, so here goes:
 
I have a Racket application serving as a stress test against a TCP server: it calls tcp-connect() to the remote server, exchanges some data, closes the ports, and starts all over again. To make it more interesting, I have 12 instances of this application running concurrently, so there is quite a lot of connection establishment and teardown going on.
 
So far, so good. The only thing I need to mention is that I call tcp-connect with only the minimum necessary parameters (remote name and remote port).
 
All of this runs fine for a while, until ALL of the stress clients begin to raise WSAEADDRINUSE exceptions. I understand that WSAEADDRINUSE means an application is trying to bind a socket to a local port that is already in use by another connection, but in the case of tcp-connect(), I assume that the call with no local port translates to a bind() with port 0, which instructs the TCP layer to select a free port? Or does the Racket TCP/IP translation layer do its own magic with local port selection, which might be buggy?
 
I also understand that the OS will keep bound sockets around for a while (which can easily be verified by looking at the netstat output), but the "normal" BSD-compatible implementation of bind(...0...) will skip ports held by sockets in those states, so I don't quite understand where the WSAEADDRINUSE error comes from unless the port selection is made outside of the low-level network software...
 
any ideas?
 
Thanks!
  
____________________
  Racket Users list:
  http://lists.racket-lang.org/users
Neil Van Dyke | 30 Jun 2012 02:36

Re: strange issue with tcp-connect on Windows...

Matthew can address the Racket internals questions, but one thing I'd 
check, if you haven't already...

At the moment of the error, have all the free TCP ports on one of the 
involved nodes been exhausted?  You can probably check this on the 
individual hosts close enough to the moment of the error, even if you 
can't check at the exact moment.
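A minimal Python sketch of that check, assuming the Windows `netstat -an` (or `-ano`) column layout of Proto, Local Address, Foreign Address, State. It tallies TCP sockets per state, optionally counting only those whose local port falls inside a given ephemeral range. The function name `count_tcp_states` is hypothetical. A large TIME_WAIT count inside the ephemeral range would support the port-exhaustion theory.

```python
def count_tcp_states(netstat_text, port_range=None):
    """Count TCP sockets per state in Windows `netstat -an` style output.

    Assumes lines shaped like: TCP  local_addr:port  remote_addr:port  STATE
    (a trailing PID column from `-ano` is ignored). If port_range=(lo, hi)
    is given, only sockets whose local port is in [lo, hi] are counted.
    """
    counts = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (which have no state).
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local, state = fields[1], fields[3]
        # rsplit handles IPv6 addresses such as [::]:135 as well.
        local_port = int(local.rsplit(":", 1)[1])
        if port_range and not (port_range[0] <= local_port <= port_range[1]):
            continue
        counts[state] = counts.get(state, 0) + 1
    return counts
```

On the Windows host, the input text could come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`; running it on each node right after the first error shows whether the ephemeral range is full of TIME_WAIT entries.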

You can also reconstruct the likely TCP stack state of each host after
the fact, if you capture all the TCP traffic of each node on the wire,
and you can also see whether exchanges that should be happening (e.g.,
the shutdown handshake) are missing or not happening in a timely
manner.  (To help sniff traffic between N hosts on a LAN without the
additional complications of hacking an OpenWRT device, I got an old
*unswitched* Ethernet hub off eBay.  You might already have better
network test equipment, but I'm mentioning this for any random person
who reads this later.)

Neil V.

Matthew Flatt | 30 Jun 2012 16:03

Re: strange issue with tcp-connect on Windows...

At Sat, 30 Jun 2012 00:33:36 +0200, Rüdiger Asche wrote:
> All of this runs fine for a while, until ALL of the stress clients
> begin to raise WSAEADDRINUSE exceptions. [...] in the case of
> tcp-connect(), I assume that the call with no local port translates
> to a bind() with port 0 which instructs the TCP layer to select a
> free port?

That's essentially correct. More precisely, Racket doesn't call bind(),
so connect() is responsible for picking a free port number.

> Or does the Racket TCP/IP translation layer do 
> its own magic with local port selection

No --- unless you provide a fourth argument to `tcp-connect', of
course, in which case bind() is used before connect(), but you
mentioned that you provide only two arguments to `tcp-connect'.
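To make the two code paths concrete, here is a Python sketch using the BSD socket API rather than Racket; the helper `connect_and_report` is hypothetical. With no local port it calls connect() directly, as the two-argument `tcp-connect` does, and with a local port it calls bind() first, roughly as the four-argument form does. Either way, `getsockname()` reveals the local port that was actually used, and binding to port 0 likewise asks the stack to choose a free one.

```python
import socket


def connect_and_report(host, port, local_host="", local_port=None):
    """Connect to (host, port), close, and return the local port used.

    local_port=None: no bind(); connect() picks an ephemeral port
    (the two-argument tcp-connect case described above).
    local_port=N:    bind((local_host, N)) before connect(); N=0 still
    lets the stack choose any free port.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if local_port is not None:
            s.bind((local_host, local_port))
        s.connect((host, port))
        # The local (address, port) pair the OS assigned to this connection.
        return s.getsockname()[1]
    finally:
        # Closing first on the client side typically leaves this local
        # port in TIME_WAIT for a while.
        s.close()
```

Because the client closes first in this sketch, each call typically parks one local port in TIME_WAIT, which is how a tight connect/close loop across many processes can drain the ephemeral range even though no code ever picks a port explicitly.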

> any ideas?

With 12 clients running full blast on the same machine(?), maybe you
really are running out of ports for connect() to pick from, as Neil
suggests?

It appears that Windows XP uses the range 1025 to 5000 for ephemeral
ports, while later versions of Windows apparently use 49152 to 65535.
So, there are 3976 or 16384 port numbers for connect() to choose from,
depending on the version of Windows that you're using.
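The arithmetic, plus a rough bound on the sustainable connection rate, as a small Python sketch. It assumes the client side closes first (so its local ports are the ones parked in TIME_WAIT) and a TIME_WAIT period of 240 seconds, the commonly cited Windows default that is adjustable through the `TcpTimedWaitDelay` registry value; treat both as assumptions, not measurements. Since the ephemeral range is shared by every process on the host, the bound applies to all 12 clients combined.

```python
def ephemeral_port_count(lo, hi):
    """Number of ports in the inclusive range [lo, hi]."""
    return hi - lo + 1


def max_sustained_rate(num_ports, time_wait_seconds):
    """Rough upper bound on new outgoing connections per second for the
    whole host, if every closed connection holds its local port in
    TIME_WAIT for time_wait_seconds (an assumption, see lead-in)."""
    return num_ports / time_wait_seconds


RANGES = {
    "XP": (1025, 5000),              # range quoted for Windows XP
    "later Windows": (49152, 65535),  # range quoted for later versions
}
TIME_WAIT_ASSUMED = 240  # seconds; assumed default, not measured

for name, (lo, hi) in RANGES.items():
    n = ephemeral_port_count(lo, hi)
    rate = max_sustained_rate(n, TIME_WAIT_ASSUMED)
    print(f"{name}: {n} ports, ~{rate:.1f} conn/s")
# XP: 3976 ports, ~16.6 conn/s
# later Windows: 16384 ports, ~68.3 conn/s
```

Under these assumptions, 12 clients that together open more than about 17 connections per second on XP would eventually exhaust the range, which matches the "runs fine for a while, then everything fails" pattern.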
