Dmitri Fedoruk | 13 Sep 17:15

lxml + mod_python: cannot unmarshal code objects in restricted execution mode

Hello everyone,

I'm developing a mod_python application that is based on XML/XSLT
transformations.

I used 4Suite libraries for that, but as the speed was unacceptable
for me, I switched to lxml. Everything became much easier and 10 times
faster, but I've encountered the subject problem.

In brief: all my data and XSLT are stored and transferred in UTF-8.
With 4Suite everything was fine all the time. With lxml it works fine
from the console, but inside mod_python it occasionally dies, about one
time out of three. Strangely, the same code with the same data either
works or dies, seemingly at random.

As far as I have found, there was a similar problem with PyXML and
the encodings module (that one is a problem with UTF), but there was no
clear solution.

So, my configuration is the following:
Python 2.5.1
Server version: Apache/2.2.4 (FreeBSD)
mod_python-3.3.1

And the relevant parts of my code are these:

def extApplyXslt(xslt, data, logger ):
    try:
        strXslt = urllib2.urlopen(xslt).read()
        # i have to read the xslt url to the python string

Lee Brown | 13 Sep 17:40

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

Greetings!

The first thing I'd suggest is to also post your query to the mod_python list.

A few questions:

Are you trying to execute this code in a Handler or in a Filter?  There's a world
of hidden trouble lurking in Filters because of their re-entrant nature.

Which Apache MPM are you using?  If you're using a multiple-process module, you
might try switching to a single-process-multiple-thread module to see if this
behavior changes.


Stefan Behnel | 13 Sep 17:45

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode


Dmitri Fedoruk wrote:
> I'm developing a mod_python application that is based on XML\XSLT
> transforming.
> 
> I used 4Suite libraries for that, but as the speed was unacceptable
> for me, I switched to lxml. Everything became much easier and 10 times
> faster

Thanks for sharing that. :)

> but I've encountered the subject problem.
> 
> In brief: all my data and XSLT are stored and transferred in UTF-8.
> With 4Suite everything was fine all the time. With lxml it works fine
> from the console, but inside mod_python it occasionally dies, about one
> time out of three. Strangely, the same code with the same data either
> works or dies, seemingly at random.
> 
> As far as I have found, there was a similar problem with PyXML and
> encodings module, this is the problem with UTF, but there was no clear
> solution.
> 
> So, my configuration is the following:
> Python 2.5.1
> Server version: Apache/2.2.4 (FreeBSD)
> mod_python-3.3.1

Looks like you forgot to mention the lxml version you are using.


David Danier | 13 Sep 18:02

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

> Everything became much easier and 10 times
> faster, but I've encountered the subject problem.

Same problem here, but with different code and versions:
 * Django as web framework
 * Apache 2.0.59 and 2.2.4
 * lxml 1.3.x (all versions)
 * mod_python 3.2.10 and 3.3.1
 * libxml2 2.6.28 / libxslt 1.1.20

I think this might have something to do with mod_python fiddling with
__builtins__. At least, googling for the error message told me that
Python switches to restricted mode when this happens (but this might be one
trigger of many). lxml seems to run callbacks in its own "sandbox"
(or something like it; at least it seems to be a different environment
from the one the outer code has), which works fine unless restricted mode is
triggered.

Somehow restricted mode is only mentioned in the docs for RExec
(http://docs.python.org/lib/module-rexec.html), which should not be
available any more, so I don't know exactly what lxml does to use callbacks.

Some further bug-hunting revealed that the "unmarshaling" error
only occurred if all modules I used in the callback were loaded before the
callback ran. If I load them inside the callback, the error differs.
Example:
------------8<----------------------------------------------------
# unmarshaling error
from foo import bar
def callback(ctx, ...):

Lee Brown | 13 Sep 18:07

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

Greetings!

Sorry, I should have stated my first question more clearly.  Are you calling
your routines from within a Mod Python requestHandler object or an outputFilter
object? 


David Danier | 13 Sep 18:50

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

> Sorry, I should have stated my first question more clearly.  Are you calling
> your routines from within a Mod Python requestHandler object or an outputFilter
> object? 

It is called out of a RequestHandler, but I'm not really doing this
myself. Django does most of the work, see:
http://www.djangoproject.com/documentation/modpython/
http://code.djangoproject.com/browser/django/trunk/django/core/handlers/modpython.py#L176

Greetings, David Danier

David Danier | 13 Sep 19:05

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

> Somehow restricted mode is only mentioned in the docs for RExec
> (http://docs.python.org/lib/module-rexec.html), which should not be
> available any more, so I don't know exactly what lxml does to use callbacks.

By accident, I found another place that mentions restricted mode:
http://www.modpython.org/live/current/doc-html/pyapi-interps.html

I think this paragraph describes the problem pretty well:
------------8<----------------------------------------------------
Note that if any third party module is being used which has a C code
component that uses the simplified API for access to the Global
Interpreter Lock (GIL) for Python extension modules, then the
interpreter name must be forcibly set to be "main_interpreter". This is
necessary as such a module will only work correctly if run within the
context of the first Python interpreter created by the process. If not
forced to run under the "main_interpreter", a range of Python errors can
arise, each typically referring to code being run in restricted mode.
---------------------------------------------------->8------------
(thanks to Lee Brown for asking about where lxml is called, it made me
read the mod_python-docs again)

I'll try to set up my site on mod_python using "PythonInterpreter
main_interpreter" in the config. According to the docs this might
help, but if I read this right it might produce namespace problems or at
least pollute some global namespace. As this takes some time, I will post
the result later.
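
For reference, the setting discussed here is a single mod_python directive
in the Apache configuration. The following is only a minimal sketch, assuming
a hypothetical handler module `myapp.handler` mounted at `/`; the only line
taken from the mod_python documentation quoted above is the
`PythonInterpreter main_interpreter` directive.

```apache
# Hypothetical location and handler name, for illustration only.
<Location "/">
    SetHandler mod_python
    PythonHandler myapp.handler
    # Run this code in the first interpreter created by the process,
    # as the mod_python documentation recommends for C extensions
    # that use the simplified GIL API.
    PythonInterpreter main_interpreter
</Location>
```

All code configured this way shares one interpreter, which is the source of
the namespace concern raised above.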

Perhaps it can be fixed in lxml by not using the "simplified API for
access to the Global Interpreter Lock (GIL) for Python extension modules"?


Stefan Behnel | 13 Sep 19:28

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

Hi,

David Danier wrote:
>> Somehow restricted mode is only mentioned in the docs for RExec
>> (http://docs.python.org/lib/module-rexec.html), which should not be
>> available any more, so I don't know exactly what lxml does to use callbacks.
> 
> Found another place that mentions restricted mode by accident:
> http://www.modpython.org/live/current/doc-html/pyapi-interps.html
> 
> I think this paragraph describes the problem pretty well:
> ------------8<----------------------------------------------------
> Note that if any third party module is being used which has a C code
> component that uses the simplified API for access to the Global
> Interpreter Lock (GIL) for Python extension modules, then the
> interpreter name must be forcibly set to be "main_interpreter". This is
> necessary as such a module will only work correctly if run within the
> context of the first Python interpreter created by the process. If not
> forced to run under the "main_interpreter", a range of Python errors can
> arise, each typically referring to code being run in restricted mode.
> ---------------------------------------------------->8------------
> (thanks to Lee Brown for asking about where lxml is called, it made me
> read the mod_python-docs again)

Thanks for the info, that's good to know.

> I'll try to set up my site on mod_python using "PythonInterpreter
> main_interpreter" in the config. According to the docs this might
> help, but if I read this right it might produce namespace problems or at
> least pollute some global namespace. As this takes some time, I will post

David Danier | 13 Sep 19:51

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

>> As this takes some time I will post
>> the result later.
> Please do.

Seems to work properly. But I'm not really sure how badly
"main_interpreter" is polluted now.

> No way. There's a reason why it is there which is the same why we use it: it's
> simple and usable. Using anything else would mean a lot of rewriting.

That's sad. What are the chances that patches addressing this problem are
accepted?
(Must review the code first, but I would really like a clean solution here)

> You might want to try compiling lxml with "--without-threading", though, which
> disables concurrency support completely (i.e. no more GIL freeing).

Works, too. But I'm not really sure if it is a good idea to do so, as
Py_NewInterpreter seems to create a thread, see
http://www.python.org/doc/current/api/initialization.html#l2h-820. But I
think this might not be a problem when not using a threaded Apache MPM.

Greetings, David Danier

Stefan Behnel | 15 Sep 17:48

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode


David Danier wrote:
>>> As this takes some time I will post
>>> the result later.
>> Please do.
> 
> Seems to work properly. But I'm not really sure how badly
> "main_interpreter" is polluted now.

I wouldn't expect much (namespace) pollution - unless there's real evidence
that this can become a problem.

And a crash is definitely a more important problem than namespace pollution.

>> No way. There's a reason why it is there which is the same why we use it: it's
>> simple and usable. Using anything else would mean a lot of rewriting.
> 
> That's sad. What are the chances that patches addressing this problem are
> accepted?
> (Must review the code first, but I would really like a clean solution here)

We always accept patches as long as there is general interest and/or a good
motivation behind them. But threading is pretty much an issue by itself in
lxml.etree. And the "simplified API" gives you a way to just say "release GIL
- call to libxml2 - acquire GIL" and "acquire GIL - run callback code - free
GIL". That's as easy as it can get - especially since Cython has support for
the latter nowadays. It is very unlikely that this can get any "cleaner" by
changing the thread-lock calls.
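
The two lock patterns named above can be illustrated with a toy model in pure
Python. This is only an analogy under stated assumptions: a `threading.Lock`
stands in for the GIL, and the names `library_call`, `callback` and
`python_level_transform` are hypothetical. The real mechanism in lxml is the
C-level simplified GIL API, which this sketch does not call.

```python
import threading

gil = threading.Lock()  # stand-in for the interpreter lock, NOT the real GIL
events = []             # records the order in which each stage runs


def callback(value):
    # Callback code is Python code, so the lock must be held while it runs.
    assert gil.locked()
    events.append(("callback", value))


def library_call(value):
    # Stand-in for a long-running libxml2 call that runs without the lock.
    assert not gil.locked()
    events.append(("library", value))
    # Pattern 2: "acquire GIL - run callback code - free GIL"
    with gil:
        callback(value)


def python_level_transform(value):
    # Python-level code holds the lock while it executes.
    with gil:
        events.append(("python", value))
        # Pattern 1: "release GIL - call to libxml2 - acquire GIL"
        gil.release()
        try:
            library_call(value)
        finally:
            gil.acquire()
        events.append(("python-done", value))


python_level_transform(1)
```

The point of the sketch is the nesting: the library call sits inside a
release/acquire pair, and the callback sits inside an acquire/release pair.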

>> You might want to try compiling lxml with "--without-threading", though, which

Dmitri Fedoruk | 14 Sep 10:28

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

Hello,

> I'll try to set up my site on mod_python using "PythonInterpreter main_interpreter" in the config.

Fine, works for me too. As I'm not very good at Python, I can't tell
whether this is good or evil, but this trick works and that's all I
need. Thanks!

Dmitri

Stefan Behnel | 13 Sep 18:53

Re: lxml + mod_python: cannot unmarshal code objects in restricted execution mode

... just forwarding to the list ...

[original mail by Dmitri Fedoruk]

On 9/13/07, Stefan Behnel wrote:
> Looks like you forgot to mention the lxml version you are using.
The most important thing: lxml-1.3.4

> As I already mentioned on c.l.py, you can pass the result of urlopen()
> directly into parse().
Thank you, that looks better.
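
The suggestion quoted above, passing a file-like object straight to parse()
instead of reading it into a string first, can be sketched as follows. This
is a hedged illustration, not the original code: it is written for Python 3,
an `io.BytesIO` object stands in for the result of `urllib2.urlopen(xslt)`,
the names `XSLT_SOURCE` and `parse_from_stream` are made up, and the fallback
to the standard library is only there so that the sketch runs without lxml
installed.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    import xml.etree.ElementTree as etree

# A tiny stylesheet used as sample data (hypothetical content).
XSLT_SOURCE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><out/></xsl:template>
</xsl:stylesheet>"""


def parse_from_stream(stream):
    # parse() reads from any file-like object, so no intermediate
    # .read() into a Python string is needed.
    return etree.parse(stream)


# io.BytesIO stands in for the object returned by urlopen().
doc = parse_from_stream(io.BytesIO(XSLT_SOURCE))
root_tag = doc.getroot().tag
```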

> Hmmm, I can't see where any "unmarshaling" should be taking place here -
> definitely not in XSLT(). And I don't get why this should only happen once in
> a while.
The point is that it then happens again and again, but I can't see any
regularity. Pretty random.

Here is the real code and its profiling output:

    try:
        xslt_parser = etree.XMLParser()
        xslt_parser.resolvers.add( PrefixResolver("XSLT") )

        inLogger.log(logging.INFO, "parser created" )

        xslt_doc = etree.parse( urllib2.urlopen(xslt) , xslt_parser)
        inLogger.log(logging.INFO, "%s parsed" % xslt )
