Gangadharan S.A. | 15 Apr 19:05 2009
Picon

Re: psycopg2 2.0.8 - segmentation fault

Program received signal SIGABRT, Aborted.
[Switching to Thread 1283557696 (LWP 12437)]
0x00002b2fdc478535 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00002b2fdc478535 in raise () from /lib64/libc.so.6
#1  0x00002b2fdc479990 in abort () from /lib64/libc.so.6
#2  0x00002b2fdc4af6db in __libc_message () from /lib64/libc.so.6
#3  0x00002b2fdc4b48fe in malloc_printerr () from /lib64/libc.so.6
#4  0x00002b2fdc4b5f36 in free () from /lib64/libc.so.6
#5  0x00002b2fe062f7e7 in connection_dealloc (obj=0x2aaaaac0ef30) at psycopg/connection_type.c:464

I managed to reproduce and fix the issue. This is happening because when dealloc-ing, the "Py_BEGIN_ALLOW_THREADS;" in conn_close is letting other threads launch the garbage collector which in turn ends up running the dealloc a second time.

The fix is to untrack the connnection object from GC, before going ahead with dealloc, as said in http://docs.python.org/c-api/gcsupport.html

"Similarly, the deallocator for the object must conform to a similar pair of rules:
  1. Before fields which refer to other containers are invalidated, PyObject_GC_UnTrack must be called.
  2. The object’s memory must be deallocated using PyObject_GC_Del."
I have attached script to reproduce the issue and the 1 line patch against current trunk to fix it.

Thanks,
Gangadharan


Attachment (double_dealloc_fix.diff): text/x-patch, 482 bytes
Attachment (trigger_double_dealloc.py): text/x-python, 2686 bytes
_______________________________________________
Psycopg mailing list
Psycopg@...
http://lists.initd.org/mailman/listinfo/psycopg
Federico Di Gregorio | 15 Apr 23:41 2009

Re: psycopg2 2.0.8 - segmentation fault

Il giorno mer, 15/04/2009 alle 22.35 +0530, Gangadharan S.A. ha scritto:
>         Program received signal SIGABRT, Aborted.
>         [Switching to Thread 1283557696 (LWP 12437)]
>         0x00002b2fdc478535 in raise () from /lib64/libc.so.6
>         (gdb) bt
>         #0  0x00002b2fdc478535 in raise () from /lib64/libc.so.6
>         #1  0x00002b2fdc479990 in abort () from /lib64/libc.so.6
>         #2  0x00002b2fdc4af6db in __libc_message ()
>         from /lib64/libc.so.6
>         #3  0x00002b2fdc4b48fe in malloc_printerr ()
>         from /lib64/libc.so.6
>         #4  0x00002b2fdc4b5f36 in free () from /lib64/libc.so.6
>         #5  0x00002b2fe062f7e7 in connection_dealloc
>         (obj=0x2aaaaac0ef30) at psycopg/connection_type.c:464
> 
> I managed to reproduce and fix the issue. This is happening because
> when dealloc-ing, the "Py_BEGIN_ALLOW_THREADS;" in conn_close is
> letting other threads launch the garbage collector which in turn ends
> up running the dealloc a second time.
> 
> The fix is to untrack the connnection object from GC, before going
> ahead with dealloc, as said in
> http://docs.python.org/c-api/gcsupport.html
> 
> "Similarly, the deallocator for the object must conform to a similar
> pair of rules: 
>      1. Before fields which refer to other containers are invalidated,
>         PyObject_GC_UnTrack must be called.

One always learn something. Thank you _very_ much for this patch.

federico

--

-- 
Federico Di Gregorio                         http://people.initd.org/fog
Debian GNU/Linux Developer                                fog@...
INIT.D Developer                                           fog@...
  Se consideri l'uso del software libero una concessione tu stesso,
   come potrai proporla agli altri?                         -- Nick Name
_______________________________________________
Psycopg mailing list
Psycopg@...
http://lists.initd.org/mailman/listinfo/psycopg
Gangadharan S.A. | 19 Apr 23:28 2009
Picon

Re: psycopg2 2.0.8 - segmentation fault

I managed to reproduce and fix the issue. This is happening because when dealloc-ing, the "Py_BEGIN_ALLOW_THREADS;" in conn_close is letting other threads launch the garbage collector which in turn ends up running the dealloc a second time.

The fix is to untrack the connnection object from GC, before going ahead with dealloc, as said in http://docs.python.org/c-api/gcsupport.html

There was a similar problem with cursor class. Cursor dealloc can internally invoke connection dealloc and hence we can get into the same double dealloc problem for cursor deallocation. The effects of this double dealloc are not immediately visible though, as cursor dealloc seems idempotent for the most part. It was leading to gradual memory corruption and eventually a segmentation fault.

Have attached the script to reproduce this issue and the one line fix for it.

There are other issues I noticed. Unfortunately, I won't be able to spend time on these in the near future, so I'm putting them out here for anyone else who can fix them. If not, I will try and get to these when time permits:

1. All / most of the classes in psycopg2 code seem to have Py_TPFLAGS_HAVE_GC, so they need to a PyObject_GC_UnTrack as their first step: http://docs.python.org/c-api/gcsupport.html . But since psycopg2 classes haven't implemented tp_clear either, the actual double dealloc probably won't happen, unless people inherit these classes. (Heap classes get the subtype_clear, tp_clear implementation.)

2. Classes should implement tp_clear wherever a cycle is possible (such as a connection object and its "async_cursor" cursor object). Without this, there is a memory leak of cyclic trash accumulation. Note, that once this is implemented, all Py_DECREF and PY_CLEAR in the _dealloc methods need to replaced by Py_XDECREF , as things may get null-ed by tp_clear, before invocation of dealloc.

3. Wherever the code creates or deletes python objects (such as list, dict, string etc), other than in the constructors and desctructors, the code should acquire the GIL before doing so. I think this is a general guideline for the python C/API (I remember seeing this somewhere, but am unable to find an online reference now.). This is because any python operation can lead to a garbage collection run (Gargabe collection runs get triggered when the difference between the number of python object allocs and deallocs reaches a certain value) and it needs to be ensured that there is atmost only one thread running the garbage collector at a time.

Thanks,
Gangadharan
_______________________________________________
Psycopg mailing list
Psycopg@...
http://lists.initd.org/mailman/listinfo/psycopg
Gangadharan S.A. | 19 Apr 23:37 2009
Picon

Re: psycopg2 2.0.8 - segmentation fault


Have attached the script to reproduce this issue and the one line fix for it.

Oops, missed the attachement. Here it is now.

Thanks,
Gangadharan
Attachment (cursor_dealloc_fix.diff): text/x-diff, 445 bytes
_______________________________________________
Psycopg mailing list
Psycopg@...
http://lists.initd.org/mailman/listinfo/psycopg
Karsten Hilbert | 19 Apr 23:38 2009
Picon
Picon

Re: psycopg2 2.0.8 - segmentation fault

Just a short thanks for your tracking down such issues.

Karsten

On Mon, Apr 20, 2009 at 02:58:12AM +0530, Gangadharan S.A. wrote:

> > I managed to reproduce and fix the issue. This is happening because when
> > dealloc-ing, the "Py_BEGIN_ALLOW_THREADS;" in conn_close is letting other
> > threads launch the garbage collector which in turn ends up running the
> > dealloc a second time.
> >
> > The fix is to untrack the connnection object from GC, before going ahead
> > with dealloc, as said in http://docs.python.org/c-api/gcsupport.html

...

--

-- 
GPG key ID E4071346  <at>  wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
James Henstridge | 20 Apr 12:00 2009
Picon

Re: psycopg2 2.0.8 - segmentation fault

On Mon, Apr 20, 2009 at 5:28 AM, Gangadharan S.A.
<gangadharan@...> wrote:
>> I managed to reproduce and fix the issue. This is happening because when
>> dealloc-ing, the "Py_BEGIN_ALLOW_THREADS;" in conn_close is letting other
>> threads launch the garbage collector which in turn ends up running the
>> dealloc a second time.
>>
>> The fix is to untrack the connnection object from GC, before going ahead
>> with dealloc, as said in http://docs.python.org/c-api/gcsupport.html
>
> There was a similar problem with cursor class. Cursor dealloc can internally
> invoke connection dealloc and hence we can get into the same double dealloc
> problem for cursor deallocation. The effects of this double dealloc are not
> immediately visible though, as cursor dealloc seems idempotent for the most
> part. It was leading to gradual memory corruption and eventually a
> segmentation fault.
>
> Have attached the script to reproduce this issue and the one line fix for
> it.
>
> There are other issues I noticed. Unfortunately, I won't be able to spend
> time on these in the near future, so I'm putting them out here for anyone
> else who can fix them. If not, I will try and get to these when time
> permits:
>
> 1. All / most of the classes in psycopg2 code seem to have
> Py_TPFLAGS_HAVE_GC, so they need to a PyObject_GC_UnTrack as their first
> step: http://docs.python.org/c-api/gcsupport.html . But since psycopg2
> classes haven't implemented tp_clear either, the actual double dealloc
> probably won't happen, unless people inherit these classes. (Heap classes
> get the subtype_clear, tp_clear implementation.)

If the type only uses Py_CLEAR() or equivalent to free its resources,
then it isn't necessary to untrack the object (although it probably
won't hurt).

This macro essentially does the following:

    tmp = self->whatever;
    self->whatever = NULL;
    if (tmp != NULL) {
        /* free tmp */
    }

Given that the code is protected by the GIL, there isn't a race in
unsetting the object.

> 2. Classes should implement tp_clear wherever a cycle is possible (such as a
> connection object and its "async_cursor" cursor object). Without this, there
> is a memory leak of cyclic trash accumulation. Note, that once this is
> implemented, all Py_DECREF and PY_CLEAR in the _dealloc methods need to
> replaced by Py_XDECREF , as things may get null-ed by tp_clear, before
> invocation of dealloc.

As noted above, Py_CLEAR() is GC safe.

Also, for immutable objects that don't cause cycles on allocation it
is generally safe to go without a tp_clear() method.  As an example,
if an object holds a reference to a dictionary, there is no way to
change which dictionary is pointed to and the dictionary is allocated
by the constructor, then you don't need a tp_clear() to kill that
reference: the other objects in a potential cycle (e.g. the dictionary
itself) are responsible.

The async_cursor bit does sound like a cycle we should handle though.

> 3. Wherever the code creates or deletes python objects (such as list, dict,
> string etc), other than in the constructors and desctructors, the code
> should acquire the GIL before doing so. I think this is a general guideline
> for the python C/API (I remember seeing this somewhere, but am unable to
> find an online reference now.). This is because any python operation can
> lead to a garbage collection run (Gargabe collection runs get triggered when
> the difference between the number of python object allocs and deallocs
> reaches a certain value) and it needs to be ensured that there is atmost
> only one thread running the garbage collector at a time.

In C function/method calls, the GIL is acquired by default so things
are probably safer than you make out.  Such manipulation functions
need to be GC safe if they're decrementing reference counts though.
Py_CLEAR() the old value, incref the new value before assigning to the
pointer.  Alternatively, untracking and retracking the object is a big
hammer that can be used to avoid the problem.

James.
Gangadharan S.A. | 20 Apr 14:26 2009
Picon

Re: psycopg2 2.0.8 - segmentation fault

> 1. All / most of the classes in psycopg2 code seem to have
> Py_TPFLAGS_HAVE_GC, so they need to a PyObject_GC_UnTrack as their first
> step: http://docs.python.org/c-api/gcsupport.html . But since psycopg2
> classes haven't implemented tp_clear either, the actual double dealloc
> probably won't happen, unless people inherit these classes. (Heap classes
> get the subtype_clear, tp_clear implementation.)

If the type only uses Py_CLEAR() or equivalent to free its resources,
then it isn't necessary to untrack the object (although it probably
won't hurt).

This macro essentially does the following:

   tmp = self->whatever;
   self->whatever = NULL;
   if (tmp != NULL) {
       /* free tmp */
   }

Given that the code is protected by the GIL, there isn't a race in
unsetting the object.
 
That is true. But there are times when the code voluntarily releases the GIL from inside the destructor, such as the conn_close in my above 2 fixes. Then, the garbage collector can end up running concurrent to the current dealloc. In those cases, we need to untrack the object first.

Btw, I'm trying to decide whether I should manually patch psycopg for my organization or if I should wait for a release of psycopg with both my patches in. Can someone please make an estimate on when there will be a release with both these patches applied (double_dealloc_fix.diff and cursor_dealloc_fix.diff) ?

Thanks,
Gangadharan


_______________________________________________
Psycopg mailing list
Psycopg@...
http://lists.initd.org/mailman/listinfo/psycopg
Federico Di Gregorio | 20 Apr 14:28 2009

Re: psycopg2 2.0.8 - segmentation fault

Il giorno lun, 20/04/2009 alle 17.56 +0530, Gangadharan S.A. ha scritto:
> 
> Btw, I'm trying to decide whether I should manually patch psycopg for
> my organization or if I should wait for a release of psycopg with both
> my patches in. Can someone please make an estimate on when there will
> be a release with both these patches applied (double_dealloc_fix.diff
> and cursor_dealloc_fix.diff) ?

Your first patch is already in bzr. I'll apply the second one then
release ASAP. 

federico

--

-- 
Federico Di Gregorio                         http://people.initd.org/fog
Debian GNU/Linux Developer                                fog@...
INIT.D Developer                                           fog@...
                             Best friends are often failed lovers. -- Me
_______________________________________________
Psycopg mailing list
Psycopg@...
http://lists.initd.org/mailman/listinfo/psycopg
Gangadharan S.A. | 20 Apr 15:23 2009
Picon

Re: psycopg2 2.0.8 - segmentation fault

Your first patch is already in bzr. I'll apply the second one then
release ASAP.


Great. Thanks!

Gangadharan
_______________________________________________
Psycopg mailing list
Psycopg@...
http://lists.initd.org/mailman/listinfo/psycopg

Gmane