8 Jun 2012 17:47
Re: Strace output
CS Lee <geek00l <at> gmail.com>
2012-06-08 15:47:00 GMT
hi Carter,
I was running argus on the Bivio and radium on another Linux box, with the two connected via a direct 10G link.
Now I run everything on the Bivio box. In order for argus to run in the foreground so I can check it, I need to force it to run on 1 CPU. I start argus and radium, nothing much happens and both keep running; however, when I use ra to connect to radium, after a while here's what I get -
argus
argus[1708.48c93490]: 08 Jun 12 22:05:38.142199 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142226 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142251 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142277 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
Killed
radium -
radium[1756]: 22:03:15.953146 connect from localhost
radium[1756]: 22:03:55.399637 ArgusWriteOutSocket(0x49b5a4e8) client not processing: disconnecting
radium[1756]: 22:05:47.968393 connect to 10.0.0.1:561 failed 'Connection refused'
ra just quit
By the way, argus is now running on less than 1G of traffic. I used to run argus on a gigabit network and never saw such an issue. Bivio is new to me, though, as I have never used it before.
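
The repeated `ArgusWriteSocket: write (4, ..., 32, ...) -1` lines above are what one would expect from a writer whose peer has stopped draining its socket, which is also Carter's interpretation in the quoted reply. The following is a minimal Python sketch of that general mechanism, not argus code: the function name `write_until_blocked` is hypothetical, and it assumes the failure is the usual "would block" error on a non-blocking socket, which the log itself does not confirm since it does not show errno.

```python
import socket


def write_until_blocked(record_size=32, max_writes=1_000_000):
    """Send fixed-size records to a peer that never reads.

    Returns (sent, blocked, resumed):
      sent    - records accepted before the kernel buffers filled
      blocked - True if a send failed because it would block
      resumed - True if a send succeeded again after the peer read some data
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)          # a full buffer raises instead of stalling
    record = b"\0" * record_size       # 32 bytes, the write size shown in the log
    sent, blocked, resumed = 0, False, False
    try:
        for _ in range(max_writes):
            try:
                writer.send(record)
            except BlockingIOError:    # assumed EAGAIN/EWOULDBLOCK: write() gives -1
                blocked = True
                break
            sent += 1
        if blocked:
            reader.recv(65536)         # the slow "client" finally reads something
            try:
                writer.send(record)
                resumed = True
            except BlockingIOError:
                resumed = False
    finally:
        writer.close()
        reader.close()
    return sent, blocked, resumed


if __name__ == "__main__":
    sent, blocked, resumed = write_until_blocked()
    print(f"records sent before blocking: {sent}, blocked: {blocked}, resumed: {resumed}")
```

The number of records accepted before the writer blocks depends on the kernel's socket buffer sizes, so it will differ between machines. The point is only that a writer facing a non-reading peer has to queue, drop, or disconnect, which matches the "client not processing: disconnecting" message from radium.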
On Fri, Jun 8, 2012 at 10:44 PM, Carter Bullard <carter <at> qosient.com> wrote:
Hey CS Lee,

OK, so two things. First, there does seem to be a bug in how argus tries to gracefully recover from this type of problem. I am working on that now. Second, we need to get things such that the argus data flow is stable, then add components to see what is causing the problem. Also, we'd like to insulate argus from all this, so that it doesn't die.

What seems to be the problem is your clients are connecting, but not reading flow data fast enough (my interpretation of the write failure messages, and possibly the "client not ready" messages). Argus is designed to allow for a large number of write errors that are related to client queuing and flow control, but the real bug is that argus is not dealing with slow clients very well, leaving data in queues, not clearing status quickly enough, and then giving up, but not terminating properly.

As a workaround to this problem, we need to get the first link in your data chain, argus -> radium, so that the channel never back pressures argus.

Does the argus radium connection work without any ra* clients attached?
Where does your radium run? On the Bivio or another machine?

If radium is not running on Bivio, I would recommend that we do that, so that radium is managing the interface that remote clients interact with, and argus only sees a single consistent connect from a single radium. But I will also recommend that you run a radium on the remote machine, so that the data chain is [ argus -> radium ] -> [ radium -> ra* ].

Let's get the data flow going reliably, without ra* clients, and then see what is going on when it attaches.

Carter

On Jun 8, 2012, at 7:07 AM, CS Lee wrote:
<bivio-argus-strace.log>

hi Carter,

I'm not sure if this is useful to help, here's the output from strace -

strace -c /usr/local/sbin/argus -i s0.e0
argus[28208]: 08 Jun 12 17:12:50.271411 started
argus[28208]: 08 Jun 12 17:12:50.292235 ArgusGetInterfaceStatus: interface s0.e0 is up
argus[28208]: 08 Jun 12 17:14:18.699681 connect from 10.0.0.3
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.68   41.720000      164252       254       126 futex
  0.17    0.072972         973        75           mmap
  0.12    0.050000       50000         1         1 restart_syscall
  0.02    0.009062         432        21           munmap
  0.00    0.000884          34        26         5 setsockopt
  0.00    0.000144           3        46        10 open
  0.00    0.000000           0       112           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0        62           close
  0.00    0.000000           0         1           waitpid
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         4           time
  0.00    0.000000           0         1           setuid
  0.00    0.000000           0         2           getuid
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         5           brk
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0        56         1 ioctl
  0.00    0.000000           0         3           clone
  0.00    0.000000           0        28           mprotect
  0.00    0.000000           0         3           _llseek
  0.00    0.000000           0         1           select
  0.00    0.000000           0         1           writev
  0.00    0.000000           0         2           sched_get_priority_max
  0.00    0.000000           0         2           sched_get_priority_min
  0.00    0.000000           0         8           rt_sigaction
  0.00    0.000000           0         2           rt_sigprocmask
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         5           mmap2
  0.00    0.000000           0         1           stat64
  0.00    0.000000           0        30           fstat64
  0.00    0.000000           0         2           getdents64
  0.00    0.000000           0         5           fcntl64
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0       126           clock_gettime
  0.00    0.000000           0         1           tgkill
  0.00    0.000000           0         1           get_robust_list
  0.00    0.000000           0         1           SYS_317
  0.00    0.000000           0        27           socket
  0.00    0.000000           0         8           bind
  0.00    0.000000           0         7         3 connect
  0.00    0.000000           0         1           listen
  0.00    0.000000           0         5           getsockname
  0.00    0.000000           0         4           sendto
  0.00    0.000000           0         9           getsockopt
  0.00    0.000000           0        11           recvmsg
------ ----------- ----------- --------- --------- ----------------
100.00   41.853062                   966       147 total

Hopefully this strace is helpful.

--
Best Regards,
CS Lee<geek00L[at]gmail.com>
http://geek00l.blogspot.com
http://defcraft.net
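
For anyone wanting to work with the `strace -c` summary above programmatically, here is a small Python sketch that parses such a table and reports each syscall's share of the traced time. The function name `parse_strace_summary` and the dictionary field names are illustrative choices, not part of strace; the sample is an excerpt of the table above (the six rows with nonzero time, plus the total line).

```python
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.68   41.720000      164252       254       126 futex
  0.17    0.072972         973        75           mmap
  0.12    0.050000       50000         1         1 restart_syscall
  0.02    0.009062         432        21           munmap
  0.00    0.000884          34        26         5 setsockopt
  0.00    0.000144           3        46        10 open
------ ----------- ----------- --------- --------- ----------------
100.00   41.853062                   966       147 total
"""


def parse_strace_summary(text):
    """Parse the per-syscall rows of an `strace -c` summary into dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields (no errors) or 6 fields (with errors).
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            row = {
                "pct_time": float(parts[0]),
                "seconds": float(parts[1]),
                "usecs_per_call": int(parts[2]),
                "calls": int(parts[3]),
                "errors": int(parts[4]) if len(parts) == 6 else 0,
                "syscall": parts[-1],
            }
        except ValueError:
            continue  # header or separator line
        rows.append(row)
    return rows


if __name__ == "__main__":
    rows = parse_strace_summary(SAMPLE)
    total_seconds = sum(r["seconds"] for r in rows)
    for r in sorted(rows, key=lambda r: r["seconds"], reverse=True):
        share = 100.0 * r["seconds"] / total_seconds
        print(f'{r["syscall"]:<16} {r["calls"]:>5} calls {r["errors"]:>4} errors {share:6.2f}% of time')
```

The total line is skipped on purpose: its usecs/call column is blank, so a naive whitespace split would misread its calls and errors counts as a data row.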