Mark Richardson | 24 Aug 2011 02:30

Mico stuck on a socket

Hi all, I haven't written in a while, but I've come up against something very unusual.

We recently switched over to MICO 2.3.13 with some home-grown multi-threaded queue handlers.  Everything was working fine until one service hung.  We killed it and everything worked again for a week, then the same thing happened.  After that, some other services started experiencing the same problem.

As a quick fix, we put in a 30-minute timer: if no CORBA functions are called within that time, the service kills itself.  This works because we use the IMR, so when a client asks MICO where the service is, the MICO daemon automatically starts the service up again.  This fixed one of the services, and we haven't had any problems with it since.
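
For anyone who wants to try the same workaround, here is a minimal sketch of the idle-timer idea. It is written in Python for brevity, it is not the code from the service (which is C++ on MICO), and the names `IdleWatchdog`, `touch` and `check` are made up for illustration. The assumption is that every CORBA upcall calls `touch()` and that a background thread runs `run()`.

```python
import threading
import time


class IdleWatchdog:
    """Calls on_idle once no activity has been recorded for timeout_s seconds.

    Hypothetical helper for illustration only; not part of MICO.
    """

    def __init__(self, timeout_s, on_idle, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._on_idle = on_idle
        self._clock = clock          # injectable so the logic can be tested
        self._last = clock()
        self._lock = threading.Lock()

    def touch(self):
        """Record activity; call this from every request handler."""
        with self._lock:
            self._last = self._clock()

    def idle_for(self):
        """Seconds elapsed since the last recorded activity."""
        with self._lock:
            return self._clock() - self._last

    def check(self):
        """Fire on_idle and return True if the idle limit has been reached."""
        if self.idle_for() >= self._timeout_s:
            self._on_idle()
            return True
        return False

    def run(self, poll_s=1.0):
        """Poll until the idle limit is reached; meant for a daemon thread."""
        while not self.check():
            time.sleep(poll_s)
```

In a real service `on_idle` would end the process (for example with `os._exit`), and the watchdog would be started with something like `threading.Thread(target=IdleWatchdog(1800, on_idle).run, daemon=True).start()`, leaving the restart to the IMR as described above.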

However, the original service is still getting stuck, so after much debugging and rooting around in the MICO code, we found an unusual situation.  The service receives the CORBA header telling it how long the incoming message is (the header is 12 bytes, and the incoming message is approximately 8300 bytes).  The socket is read numerous times to get the entire 8300 bytes, but after about 5200 bytes, the reading stops.  This wouldn't be so bad on its own, but every thread in the service stops receiving messages (none of them respond to any new messages coming in).  The home-grown queue thread is still running and watching for new messages, but no other messages come in.
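
To make the read pattern concrete, here is a rough sketch of reading one GIOP message: a 12-byte header (the "GIOP" magic, two version bytes, a flags byte, a message-type byte and a 4-byte body length), followed by a loop that reads until the whole body has arrived. This is an illustration in Python, not MICO's actual reader, and the function names and the timeout value are assumptions. The one deliberate difference from a plain blocking read is the per-read timeout, so a message that stalls partway through raises an error instead of blocking forever.

```python
import socket
import struct

GIOP_HEADER_LEN = 12  # magic(4) + version(2) + flags(1) + type(1) + size(4)


def recv_exact(sock, n, timeout_s):
    """Read exactly n bytes, or raise if the peer goes quiet or closes.

    The timeout applies to each recv() call, not to the whole message.
    """
    sock.settimeout(timeout_s)
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # raises socket.timeout when idle
        if not chunk:
            raise ConnectionError(
                f"peer closed with {remaining} of {n} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_giop_message(sock, timeout_s=30.0):
    """Return (header, body) for one GIOP message read from sock."""
    header = recv_exact(sock, GIOP_HEADER_LEN, timeout_s)
    if header[:4] != b"GIOP":
        raise ValueError("bad GIOP magic")
    # Lowest bit of the flags byte: 1 = little-endian, 0 = big-endian.
    little_endian = header[6] & 0x01
    (body_len,) = struct.unpack("<I" if little_endian else ">I", header[8:12])
    body = recv_exact(sock, body_len, timeout_s)
    return header, body
```

With a reader like this, the case above (about 5200 of 8300 body bytes arriving and then nothing) would surface as a `socket.timeout` after `timeout_s` seconds rather than as a silent hang.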

The strange part is that if I use "lsof" and compare its file descriptor with the one in the debugger, I see that the socket connection is "ESTABLISHED".  But if I go to the other machine (the one sending the message), I don't see any socket connection at all: no FIN_WAIT, no CLOSED, nothing!  So the service will never receive the last 3100 bytes.
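
A connection that is ESTABLISHED on one end and gone on the other is usually called half-open, and a blocked read has no way to notice it unless something probes the peer. One common mitigation is TCP keepalive. The sketch below (Python, for illustration only) turns it on for a socket; `SO_KEEPALIVE` is standard, while the per-socket tuning options are platform-specific, so they are applied only where Python exposes them. The helper name and the numeric values are assumptions, and whether MICO offers a hook for setting this on its own sockets is not something this sketch addresses.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive so a dead peer is eventually detected.

    idle_s, interval_s and probes are illustrative values, not
    recommendations; they are applied only where the platform supports
    per-socket tuning.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # option not available on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass  # platform refused the option; system defaults apply
```

Once keepalive probes go unanswered, a blocked `recv()` on that socket fails with an error instead of waiting indefinitely, which gives the service a chance to drop the connection.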

I've tried to reproduce this in a test environment and have hammered the service without any problems.  The service runs on HP-UX 11i and the client normally runs on an SGI.  I have done all my testing with the client on HP-UX 11i and Linux, but never on SGI (that's coming next).  Does anybody know of inherent socket issues with SGI, or between HP-UX 11i and SGI?

I can give specific debug info and lines of code if anyone is interested.  Any help is appreciated!

Thanks,
Mark
_______________________________________________
Mico-devel mailing list
Mico-devel <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mico-devel
