Kristján Valur Jónsson | 17 Apr 12:55 2012

issue 9141, finalizers and gc module

Hello there.

For those familiar with the intricacies of the gcmodule.c, I would like to draw your attention to http://bugs.python.org/issue9141.

 

I would like to consult with you to find out more about finalizers/gc in order to improve the in-file documentation.

 

Traditionally, it has not been possible to collect objects that have __del__ methods, or more generally, finalizers.  Instead, they and any objects that are reachable from them, are put in gc.garbage.
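
A minimal sketch of the behaviour described above, assuming nothing beyond the standard gc module. The class `Node` and the helper `collect_cycle` are made up for illustration. On interpreters of the era of this thread the two objects should end up in gc.garbage with their finalizers never run; interpreters from Python 3.4 on (PEP 442) run the finalizers and collect the cycle instead, so the sketch only reports which of the two happened.

```python
import gc


class Node(object):
    """Object with a finalizer that takes part in a reference cycle."""

    finalized = []  # names whose __del__ has run

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        Node.finalized.append(self.name)


def collect_cycle():
    """Build a two-object cycle, drop it, and report what the collector did."""
    gc.collect()  # flush unrelated garbage first
    del Node.finalized[:]
    parked_before = len(gc.garbage)

    a, b = Node("a"), Node("b")
    a.other, b.other = b, a  # a <-> b
    del a, b  # only the cycle keeps the two objects alive now

    gc.collect()
    parked = len(gc.garbage) - parked_before
    return sorted(Node.finalized), parked


finalized, parked = collect_cycle()
print("finalizers run: %r, objects parked in gc.garbage: %d" % (finalized, parked))
```

Either both finalizers ran and nothing was parked, or no finalizer ran and both objects were parked; which one you see depends on the interpreter version.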

 

What I want to know is, why is this limitation in place?  Here are two possibilities:

1) "The order of calling finalizers in a cycle is undefined, so it is not a solvable problem".  But this would allow a single object with only internal cycles to be collected.  Currently this is not the case.

2) "During collection, the interpreter is in a fragile state (linked lists of gc objects with refcount bookkeeping in place) and no unknown code can be allowed to run".  This is what I personally think is the true reason.

 

The reason I'm asking is that Python has traditionally tested for finalizers by checking the tp_del slot of the object's type.  This will be set if the object has a __del__ method.  Since generators were added, they use the tp_del slot for their own finalizers, but special code was put in place so that the generators could tell whether the finalizer was "trivial" or not (trivial meaning "just doing Py_DECREF()").

This allowed generators to be collected too, if they were in a common, trivial state, but otherwise they had to be put in gc.garbage.
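
The difference between a "trivial" and a non-trivial generator can be sketched roughly as follows. The names `plain`, `guarded` and `collect_generator_cycle` are hypothetical, and the `holder` list exists only to tie each generator into a cycle with its own frame. A generator suspended at a bare yield has nothing to clean up; one suspended inside try/finally has pending code that must run when it is finalized. Older interpreters would park the second kind in gc.garbage, while newer ones (3.4 and later) run the finally block during collection, so the sketch reports either outcome.

```python
import gc


def plain(log, holder):
    yield  # suspended with nothing to clean up: the "trivial" state


def guarded(log, holder):
    try:
        yield
    finally:
        log.append("cleanup")  # arbitrary code that runs on finalization


def collect_generator_cycle(make):
    """Put a suspended generator into a cycle and report what gc did."""
    gc.collect()
    parked_before = len(gc.garbage)

    log, holder = [], []
    gen = make(log, holder)
    next(gen)           # suspend at the yield
    holder.append(gen)  # gen -> frame -> holder -> gen
    del gen, holder

    gc.collect()
    return log, len(gc.garbage) - parked_before


print("plain:  ", collect_generator_cycle(plain))
print("guarded:", collect_generator_cycle(guarded))
```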

 

Yesterday, I stumbled upon the fact that tp_dealloc of iobase objects also calls an internal finalizer, one that isn't exposed in any tp_del slot:  It will invoke a PyObject_CallMethod(self, "close", "") on itself.  This will happen whenever iobase objects are part of a cycle that needs to be cleared.  This can cause arbitrary code to run.  There are even provisions made for the resurrection of the iobase objects based on the action of this close() call.
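
A small sketch of that effect, using a hypothetical subclass `TracingIO` whose close() records that it was called. The object is tied into a cycle with itself and then left for the collector. Which internal hook makes the call may differ between interpreter versions; the sketch only observes that user-level close() code runs as a side effect of collecting the cycle.

```python
import gc
import io


class TracingIO(io.IOBase):
    """IOBase subclass whose close() leaves a trace."""

    calls = []

    def close(self):
        TracingIO.calls.append("close")  # arbitrary Python code
        super(TracingIO, self).close()


def close_cycle_member():
    """Collect a TracingIO that is only reachable through a cycle."""
    del TracingIO.calls[:]
    f = TracingIO()
    f.myself = f  # self-referencing cycle
    del f
    gc.collect()
    return list(TracingIO.calls)


print(close_cycle_member())
```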

 

Clearly, this has the potential to be non-trivial, and therefore, again, I see this as an argument for my proposed patch in issue 9141.  But others have voiced worries that if we stop collecting iobase objects, that would be a regression.

 

So, I ask you:  What is allowed during tp_clear()?  Is this a hard rule?  What is the reason?

 

Kristján

 

martin | 17 Apr 16:45 2012

Re: issue 9141, finalizers and gc module

> What I want to know is, why is this limitation in place?  Here are  
> two possibilities:
>
> 1)      "The order of calling finalizers in a cycle is undefined so  
> it is not a solvable problem".  But this would allow a single  
> object, with only internal cycles to be collected.  Currently this  
> is not the case.

It's similar to this, but not exactly: "A finalizer in a cycle may
try to refer back to an object that was already cleared, and fail
because of that; this may cause arbitrary failures changing from
run to run".
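
The failure mode can be imitated in plain Python without involving the collector at all. In this sketch `Resource`, its `clear()` and its `finalize()` are hypothetical stand-ins for an object, its tp_clear and its __del__; nothing here is the real gcmodule.c logic. It only shows why running a finalizer after its peer was cleared goes wrong.

```python
class Resource(object):
    def __init__(self, name):
        self.name = name
        self.peer = None

    def clear(self):
        # Stand-in for tp_clear: drop every reference the object holds.
        self.__dict__.clear()

    def finalize(self):
        # Stand-in for __del__: touches the peer on the way out.
        return "%s says goodbye to %s" % (self.name, self.peer.name)


def finalize_in_cycle(clear_peer_first):
    a, b = Resource("a"), Resource("b")
    a.peer, b.peer = b, a
    if clear_peer_first:
        b.clear()  # the collector happened to clear b before finalizing a
    try:
        return a.finalize()
    except AttributeError as exc:
        return "finalizer failed: %s" % type(exc).__name__


print(finalize_in_cycle(clear_peer_first=False))
print(finalize_in_cycle(clear_peer_first=True))
```

Since the order in which cycle members get cleared is not something the finalizer can control, the second outcome could appear or disappear from run to run.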

It's true that a cycle involving only a single object with __del__
could be safely collected. This special case was not implemented.

> 2)      "During collection, the interpreter is in a fragile state  
> (linked lists of gc objects with refcount bookkeeping in place) and  
> no unknown code can be allowed to run".  This is the reason I  
> personally think is the true reason.

No, that's not the case at all. As Antoine explains in the issue,
there are plenty of ways in which Python code can be run when breaking
a cycle. Not only weakrefs, but also objects released as a consequence
of tp_clear which weren't *in* the cycle (but hung from it).
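
The weakref case is easy to observe. In this sketch (all names are illustrative) a weak reference with a callback is kept alive outside a cycle; collecting the cycle runs the callback, which is ordinary Python code, in the middle of gc.collect(). The callback also checks that the weak reference is already dead by then, so it cannot get back to the referent.

```python
import gc
import weakref


class Member(object):
    pass


events = []


def on_collect(ref):
    # Runs while the cycle is being broken; the referent is already gone.
    events.append(ref() is None)


def collect_watched_cycle():
    del events[:]
    a, b = Member(), Member()
    a.other, b.other = b, a
    watcher = weakref.ref(a, on_collect)  # lives outside the cycle
    del a, b
    gc.collect()
    return watcher, list(events)


watcher, seen = collect_watched_cycle()
print(seen)
```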

> So, I ask you:  What is allowed during tp_clear()?  Is this a hard  
> rule?  What is the reason?

We are all consenting adults. Everything is allowed - you just have to
live with the consequences.

Regards,
Martin

Kristján Valur Jónsson | 17 Apr 19:22 2012

Re: issue 9141, finalizers and gc module


> -----Original Message-----
> 
> No, that's not the case at all. As Antoine explains in the issue, there are
> plenty of ways in which Python code can be run when breaking a cycle. Not
> only weakrefs, but also objects released as a consequence of tp_clear which
> weren't *in* the cycle (but hung from it).
I see, that makes sense.  The rule, then, is that we cannot delete objects with finalizers that can reach
other garbage, simply because the finalizer may find those objects in an unexpected (cleared) state and thus
cause weird errors.
(Weakrefs are a special case, apparently dealt with separately, and the callback cannot refer back to the
referent.)
This reasoning belongs in gcmodule.c, I think.
> 
> > So, I ask you:  What is allowed during tp_clear()?  Is this a hard
> > rule?  What is the reason?
> 
> We are all consenting adults. Everything is allowed - you just have to live with
> the consequences.

Well, we specifically decided that objects with __del__ methods that are part of a cycle cannot be run.
The same reasoning was applied to generators, if they are in a certain state.
What makes iobase so special that its 'close' method can be run even if it is part of a cycle?
Why not allow it for all objects, then?

At the very least, I think this behaviour (this exception for iobase) merits being explicitly documented.

Kristján

Antoine Pitrou | 17 Apr 20:30 2012

Re: issue 9141, finalizers and gc module

On Tue, 17 Apr 2012 17:22:57 +0000
Kristján Valur Jónsson <kristjan <at> ccpgames.com> wrote:
> > 
> > We are all consenting adults. Everything is allowed - you just have to live with
> > the consequences.
> 
> Well, we specifically decided that objects with __del__ methods that are part of a cycle cannot be run.
> The same reasoning was applied to generators, if they are in a certain state.
> What makes iobase so special that its 'close' method can be run even if it is part of a cycle?

The reason is that making file objects uncollectable when they are part
of a reference cycle would be a PITA and a serious regression for many
applications, I think.

> Why not allow it for all objects, then?

I'm not the author of the original GC design. Perhaps it was
deliberately conservative at the time? I think PyPy has a more tolerant
solution for finalizers in reference cycles, perhaps they can explain it
here.

Regards

Antoine.

Maciej Fijalkowski | 17 Apr 23:29 2012

Re: issue 9141, finalizers and gc module

On Tue, Apr 17, 2012 at 8:30 PM, Antoine Pitrou <solipsis <at> pitrou.net> wrote:

> On Tue, 17 Apr 2012 17:22:57 +0000
> Kristján Valur Jónsson <kristjan <at> ccpgames.com> wrote:
> > >
> > > We are all consenting adults. Everything is allowed - you just have to live with
> > > the consequences.
> >
> > Well, we specifically decided that objects with __del__ methods that are part of a cycle cannot be run.
> > The same reasoning was applied to generators, if they are in a certain state.
> > What makes iobase so special that its 'close' method can be run even if it is part of a cycle?
>
> The reason is that making file objects uncollectable when they are part
> of a reference cycle would be a PITA and a serious regression for many
> applications, I think.
>
> > Why not allow it for all objects, then?
>
> I'm not the author of the original GC design. Perhaps it was
> deliberately conservative at the time? I think PyPy has a more tolerant
> solution for finalizers in reference cycles, perhaps they can explain it
> here.
>
> Regards
>
> Antoine.

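PyPy breaks cycles randomly. I think a pretty comprehensive description of what happens is here:

http://morepypy.blogspot.com/2008/02/python-finalizers-semantics-part-1.html
http://morepypy.blogspot.com/2008/02/python-finalizers-semantics-part-2.html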


Cheers,
fijal
Kristján Valur Jónsson | 20 Apr 15:33 2012

Re: issue 9141, finalizers and gc module

Thanks. I wonder if these semantics might not belong in CPython too, us being consenting adults and all that :)

 

K

 


Maciej Fijalkowski | 20 Apr 15:35 2012

Re: issue 9141, finalizers and gc module

On Fri, Apr 20, 2012 at 3:33 PM, Kristján Valur Jónsson <kristjan <at> ccpgames.com> wrote:

> Thanks. I wonder if these semantics might not belong in CPython too, us being consenting adults and all that :)


I would say it's saner, but it's just my opinion :)

Cheers,
fijal
 

 



martin | 18 Apr 09:11 2012

Re: issue 9141, finalizers and gc module

> Well, we specifically decided that objects with __del__ methods that  
> are part of a cycle cannot be run.
> The same reasoning was applied to generators, if they are in a certain state.
> What makes iobase so special that its 'close' method can be run even  
> if it is part of a cycle?

It's a hack, and I find it well-documented in iobase.c. It explains what
tricks it has to go through to still invoke methods from tp_del.

Invoking methods in tp_clear I find fairly harmless, in comparison. My only
concern is that errors are silently ignored. However, I don't think this
matters in practice, since io objects typically are not part of cycles, anyway.

> Why not allow it for all objects, then?

It's *allowed* for all objects. Why do you think it is not?

It must be opt-in, though. In the IO case, there are certain drawbacks;
not being able to report errors is the most prominent one. Any other object
implementation will have to evaluate whether to follow the iobase approach,
or implement a regular __del__. I personally consider the resurrection in
tp_del a much more serious problem, though, as this goes explicitly against
the design of the release procedure. For iobase, it's ok since it can evolve
along with the rest of the code base. Any third-party author would have to
accept that such approach can break from one Python release to the next.

I wonder why Python couldn't promise to always invoke tp_clear on GC
objects; ISTM that this would remove the need for resurrection in tp_del.

> At the very least, I think this behaviour (this exception for  
> iobase) merits being explicitly documented.

I find all of this well-documented in iobase.c. If you think anything
else needs to be said, please submit patches.

Regards,
Martin

Kristján Valur Jónsson | 20 Apr 15:28 2012

Re: issue 9141, finalizers and gc module


> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com <at> python.org
> [mailto:python-dev-bounces+kristjan=ccpgames.com <at> python.org] On
> Behalf Of martin <at> v.loewis.de
> Sent: 18. apríl 2012 07:11
> To: python-dev <at> python.org
> Subject: Re: [Python-Dev] issue 9141, finalizers and gc module
> 
> Invoking methods in tp_clear I find fairly harmless, in comparison. My only
> concern is that errors are silently ignored. However, I don't think this matters
> in practice, since io objects typically are not part of cycles, anyway.
> 
> > Why not allow it for all objects, then?
> 
> It's *allowed* for all objects. Why do you think it is not?
> 
Oh, because dynamic classes with __del__ methods are deliberately not collected but put in gc.garbage. 
And the special case of the generator object, etc. etc.

iobase.c probably documents its own needs well enough.  The fact that I had to raise this question here,
though, means that the source code  for gcmodule.c doesn't have enough information to explain exactly the
problem that it has with calling finalizers.
It seems to me that it worries that __del__ methods may not run to completion because of attribute errors,
and that it would have to silence such errors to not cause unexpected noise.
That is the impression I get from this discussion.  Correctness over memory conservation, or something
like that.

Btw, regarding object resurrection, I was working on a patch to get that to work better, particularly with subclasses.
You may want to check out issue 8212, whence this discussion originates.

K

