22 Jul 2012 16:43
Race condition in garbage collector
Hi,
I am writing this email with Bcc to the ECL mailing list and to the GC developers mailing list. I just discovered a serious race condition that prevents our program from exiting. It occurs between the exit code associated with a call to dlclose() and the exit code of a POSIX thread.
Roughly, we run ECL, load a bunch of libraries (DLLs), and then quit the program. At exit time two things happen: the libraries have to be unloaded and the servicing threads exit. This results in the program hanging, as shown below.
1) This thread is a servicing one. It is trying to exit and in the process it acquires the GC lock, but for some reason the thread then calls into dyld. I still haven't located where in the GC this happens, but from the symptoms it seems to be close to GC_unregister...
(gdb) thread 2
(gdb) bt
#0 0x00007fff88009bf2 in __psynch_mutexwait ()
#1 0x00007fff897d31a1 in pthread_mutex_lock ()
#2 0x00007fff84eae623 in dyldGlobalLockAcquire ()
#3 0x00007fff6172a745 in __dyld__ZN26ImageLoaderMachOCompressed20doBindFastLazySymbolEjRKN11ImageLoader11LinkContextEPFvvES5_ ()
#4 0x00007fff61717922 in __dyld__ZN4dyld18fastBindLazySymbolEPP11ImageLoaderm ()
#5 0x00007fff84eae716 in dyld_stub_binder_ ()
#6 0x0000000101d01458 in C.88.15036 ()
#7 0x0000000101c73100 in GC_inner_start_routine (sb=0x1041deeb0, arg=0x102117ea0) at pthread_start.c:67
#8 0x0000000101c6eb1c in GC_call_with_stack_base (fn=0x101c73030 <GC_inner_start_routine>, arg=0x102117ea0) at misc.c:1510
#9 0x0000000101c74565 in GC_start_routine (arg=0x102117ea0) at pthread_support.c:1504
#10 0x00007fff897d48bf in _pthread_start ()
#11 0x00007fff897d7b75 in thread_start ()
2) This thread is the main one. It is trying to close a bunch of libraries, none of which are related to the thread above. However, when dlclose() is called, some code associated with the garbage collector runs and tries to take the GC lock, and we enter the race condition.
(gdb) thread 1
[Switching to thread 1 (process 37491), "com.apple.main-thread"]
0x00007fff88009bf2 in __psynch_mutexwait ()
(gdb) bt
#0 0x00007fff88009bf2 in __psynch_mutexwait ()
#1 0x00007fff897d31a1 in pthread_mutex_lock ()
#2 0x0000000101c74833 in GC_lock () at pthread_support.c:1784
#3 0x0000000101c6c53d in GC_remove_roots (b=0x104f03220, e=0x104f03238) at mark_rts.c:311
#4 0x0000000101c61f20 in GC_dyld_image_remove (hdr=0x104eff000, slide=4377800704) at dyn_load.c:1319
#5 0x00007fff61714bdd in __dyld__ZN4dyld11removeImageEP11ImageLoader ()
#6 0x00007fff6171858d in __dyld__ZN4dyld20garbageCollectImagesEv ()
#7 0x00007fff6171c432 in __dyld_dlclose ()
#8 0x00007fff84eaebd5 in dlclose ()
#9 0x0000000101c2ae8c in dlclose_wrapper [inlined] () at /Users/jjgarcia/devel/ecl/src/c/ffi/libraries.d:432
#10 0x0000000101c2ae8c in ecl_library_close (block=0x103be4e00) at libraries.d:432
#11 0x0000000101c2af79 in ecl_library_close_all () at libraries.d:448
#12 0x0000000101b1a84d in cl_shutdown () at main.d:301
#13 0x0000000101b1a964 in si_exit (narg=4377800704) at main.d:839
#14 0x0000000101b13e47 in main ()
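Read together, the two backtraces look like a lock-order inversion: the exiting thread appears to hold the GC lock while waiting for dyld's global lock, and the main thread holds dyld's lock (inside dlclose()) while waiting for the GC lock. The following is only a minimal sketch of that pattern, not code from ECL, the GC, or dyld. The names gc_lock, loader_lock, worker and count_blocked_threads are hypothetical stand-ins, and pthread_mutex_trylock() is used in place of a blocking lock so that the sketch reports the inversion instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: gc_lock for the collector's lock,
   loader_lock for the dynamic loader's global lock. */
static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;

static atomic_int holders;   /* threads that hold their first lock */
static atomic_int finished;  /* threads that have probed their second lock */
static atomic_int blocked;   /* threads whose second lock was unavailable */

struct order { pthread_mutex_t *first, *second; };

static void *worker(void *p)
{
    struct order *o = p;
    assert(o->first != o->second);

    pthread_mutex_lock(o->first);
    atomic_fetch_add(&holders, 1);
    /* Wait until both threads hold their first lock. */
    while (atomic_load(&holders) < 2)
        sched_yield();

    /* A blocking pthread_mutex_lock() here would never return. */
    if (pthread_mutex_trylock(o->second) != 0)
        atomic_fetch_add(&blocked, 1);
    else
        pthread_mutex_unlock(o->second);

    /* Keep the first lock until both threads have probed. */
    atomic_fetch_add(&finished, 1);
    while (atomic_load(&finished) < 2)
        sched_yield();
    pthread_mutex_unlock(o->first);
    return NULL;
}

/* Runs both lock orders concurrently and returns how many threads
   would have blocked forever on their second lock. */
int count_blocked_threads(void)
{
    struct order exiting = { &gc_lock, &loader_lock };  /* like thread 2 */
    struct order closing = { &loader_lock, &gc_lock };  /* like thread 1 */
    pthread_t t1, t2;

    atomic_store(&holders, 0);
    atomic_store(&finished, 0);
    atomic_store(&blocked, 0);

    pthread_create(&t1, NULL, worker, &closing);
    pthread_create(&t2, NULL, worker, &exiting);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&blocked);
}
```

Called from a main() and linked with -pthread, count_blocked_threads() reports that both threads find their second lock taken. With blocking locks, that is the state in which neither thread can make progress.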
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com