Kumar McMillan | 10 May 23:43

install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

I know this has been discussed over and over but I'm writing to see if
anyone has made a breakthrough yet.  The problem of course is that
Leopard's builtin libxml2 and libxslt are too old for lxml 2.0.  You
have to build libxml2 either from source or use a port.  There is
currently a problem with the libxml2 port, but the workaround is going
fine for me: http://trac.macports.org/ticket/15230 (I know because
postgres built just fine and I have some tests exercising psycopg2 as
well)

So after updating my libxml2 to 2.6.31 and libxslt to 1.1.23 and
updating my $PATH so that the new xml2-config and xslt-config can be
found, I can build lxml *without errors* but I see these warnings:

$ sudo easy_install lxml-2.0.5.tgz
Processing lxml-2.0.5.tgz
Running lxml-2.0.5/setup.py -q bdist_egg --dist-dir
/tmp/easy_install-3azY8e/lxml-2.0.5/egg-dist-tmp-t80esG
Building lxml version 2.0.5.
NOTE: Trying to build without Cython, pre-generated 'src/lxml/etree.c'
needs to be available.
Using build configuration of libxslt 1.1.23

ld: warning in /opt/local/lib/libxslt.dylib, file is not of required
architecture
ld: warning in /opt/local/lib/libexslt.dylib, file is not of required
architecture
ld: warning in /opt/local/lib/libxml2.dylib, file is not of required
architecture
[... and more like this ...]
...
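
A note on the ld warnings above: "file is not of required architecture" generally suggests that a library lacks one of the architectures the linker was asked to produce. The gcc command quoted later in this thread passes -arch i386 -arch ppc, so one plausible reading is that the MacPorts libraries were built for a single architecture. Below is a minimal Python sketch for checking that, assuming the standard Mach-O header layout. The helper names macho_archs and missing_archs are made up for illustration and are not part of lxml or MacPorts.

```python
import struct

# CPU type constants as defined in the Mach-O headers (mach/machine.h).
CPU_NAMES = {7: "i386", 18: "ppc", 0x01000007: "x86_64", 0x01000012: "ppc64"}

FAT_MAGIC = 0xCAFEBABE  # universal ("fat") binary; its header is big-endian

# Thin Mach-O magic numbers as they appear when read big-endian, mapped to
# the byte order of the rest of the header.
THIN_BYTE_ORDER = {
    0xFEEDFACE: ">",  # 32-bit, big-endian file
    0xFEEDFACF: ">",  # 64-bit, big-endian file
    0xCEFAEDFE: "<",  # 32-bit, little-endian file
    0xCFFAEDFE: "<",  # 64-bit, little-endian file
}


def _name(cputype):
    # Fall back to the raw number for CPU types not listed above.
    return CPU_NAMES.get(cputype, "cputype-%d" % cputype)


def macho_archs(data):
    """Return architecture names found in the leading bytes of a Mach-O file."""
    if len(data) < 8:
        return []
    magic = struct.unpack(">I", data[:4])[0]
    if magic == FAT_MAGIC:
        count = struct.unpack(">I", data[4:8])[0]
        archs = []
        for i in range(count):
            # Each fat_arch entry is 20 bytes; cputype is its first field.
            entry = data[8 + i * 20: 28 + i * 20]
            if len(entry) < 20:
                break
            archs.append(_name(struct.unpack(">I", entry[:4])[0]))
        return archs
    order = THIN_BYTE_ORDER.get(magic)
    if order is None:
        return []  # not a Mach-O file
    return [_name(struct.unpack(order + "I", data[4:8])[0])]


def missing_archs(data, required):
    """List the required architectures that the file does not contain."""
    present = set(macho_archs(data))
    return [arch for arch in required if arch not in present]
```

Reading the first few kilobytes of /opt/local/lib/libxml2.dylib and passing them to missing_archs(data, ["i386", "ppc"]) would list the architectures the build wants but the library does not provide. On OS X itself, lipo -info reports the same information.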

Stefan Behnel | 11 May 09:01

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Hi Kumar,

you ask why this is so hard? Simple answer: because no-one has contributed a
way so far to make it easier.

We had lots of reports about stuff not working and almost as many
work-arounds, but no-one came up with a patch that would allow building lxml
reliably at least on a subset of Mac-OS systems. And I just cannot believe
that there is no-one amongst the Mac-OS-X users who knows how to use distutils
to build a binary extension. Or at least someone who knows how to build C code
statically against a C library.

From my POV, Mac-OS seems to lack three things that make this problem
non-trivial. It doesn't have a standard package management system. Neither
does it have something like the Linux Standard Base, which dictates where
newly installed things belong. And it doesn't seem to support "rpath", which
would allow a binary to say "I know where my dependencies come from". Or at
least distutils don't support that on Mac. So everything I could try here on
Linux to make it work better is bound to fail.

Kumar McMillan wrote:
> I know this has been discussed over and over but I'm writing to see if
> anyone has made a breakthrough yet.  The problem of course is that
> Leopard's builtin libxml2 and libxslt are too old for lxml 2.0.  You
> have to build libxml2 either from source or use a port.
[lots of important details skipped to keep this at a higher level for now]
> Next, I tried doing a static build of lxml by setting
> STATIC_LIBRARY_DIRS = ['/opt/local/lib'] in setup.py and running:
> 
> python setup.py bdist_egg --static

Mike Meyer | 11 May 20:48

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

On Sun, 11 May 2008 09:01:01 +0200
Stefan Behnel <stefan_ml <at> behnel.de> wrote:

> you ask why this is so hard? Simple answer: because no-one has contributed a
> way so far to make it easier.

Gee, I had no trouble at all doing this last week (the release of
Oracle library bits for Intel OS-X means it's now desirable). I
installed macports, did a self-update, then installed py25-lxml.  It
installed python2.5.2 and the versions of libxml2 and libxslt that
were in macports as part of the process. Installing cx_Oracle after
that was more work.

> We had lots of reports about stuff not working and almost as many
> work-arounds, but no-one came up with a patch that would allow building lxml
> reliably at least on a subset of Mac-OS systems. And I just cannot believe
> that there is no-one amongst the Mac-OS-X users who knows how to use distutils
> to build a binary extension. Or at least someone who knows how to build C code
> statically against a C library.

I'm sorry, but my experience is that binary distributions make the
problems *worse*, not better - at least if you require multiple
different components to be installed. You have to make sure the
components all agree about the builds of any libraries they have in
common, and unless you have a coordinated build, that just doesn't
happen.

After all, I could build a binary distribution of lxml from macports,
but to use it, you'd have to have the macports versions of python,
libxml2 and libxslt. If you've got that, it's probably easier to

Kumar McMillan | 12 May 01:00

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

On Sun, May 11, 2008 at 1:48 PM, Mike Meyer <mwm <at> mired.org> wrote:
> On Sun, 11 May 2008 09:01:01 +0200
> Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>
>> you ask why this is so hard? Simple answer: because no-one has contributed a
>> way so far to make it easier.
>
> Gee, I had no trouble at all doing this last week (the release of
> Oracle library bits for Intel OS-X means it's now desirable). I
> installed macports, did a self-update, then installed py25-lxml.  It
> installed python2.5.2 and the versions of libxml2 and libxslt that
> were in macports as part of the process.

The build of lxml doesn't fail and you probably won't see any errors
unless you are using xpath.  In fact, running selftest.py after
building passes for me (I'm not sure if that runs all tests or not)
but I do get a consistent segfault in my program.

Looking at the macport of py25-lxml I don't see any flags that would
indicate they have accomplished statically linking the new libxml
libs.  I don't like to use ports of python modules because
/opt/local/bin/python doesn't mix well with a Framework python
installation from my experience.

Mike Meyer | 12 May 01:26

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

On Sun, 11 May 2008 18:00:26 -0500
"Kumar McMillan" <kumar.mcmillan <at> gmail.com> wrote:

> On Sun, May 11, 2008 at 1:48 PM, Mike Meyer <mwm <at> mired.org> wrote:
> > On Sun, 11 May 2008 09:01:01 +0200
> > Stefan Behnel <stefan_ml <at> behnel.de> wrote:
> >
> >> you ask why this is so hard? Simple answer: because no-one has contributed a
> >> way so far to make it easier.
> >
> > Gee, I had no trouble at all doing this last week (the release of
> > Oracle library bits for Intel OS-X means it's now desirable). I
> > installed macports, did a self-update, then installed py25-lxml.  It
> > installed python2.5.2 and the versions of libxml2 and libxslt that
> > were in macports as part of the process.
> 
> The build of lxml doesn't fail and you probably won't see any errors
> unless you are using xpath.  In fact, running selftest.py after
> building passes for me (I'm not sure if that runs all tests or not)
> but I do get a consistent segfault in my program.

Well, we make fairly heavy use of xpath (we use it to extract millions
of records/minute in our ETL system, plus provide default attributes
in the xml config file), so if it's a problem, I'm sure I'll see
it. The few tests I've run so far worked fine. Care to provide an
example that breaks?

> Looking at the macport of py25-lxml I don't see any flags that would
> indicate they have accomplished statically linking the new libxml
> libs.  I don't like to use ports of python modules because

Kumar McMillan | 12 May 02:23

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

On Sun, May 11, 2008 at 6:26 PM, Mike Meyer <mwm <at> mired.org> wrote:
>
> Well, we make fairly heavy use of xpath (we use it to extract millions
> of records/minute in our ETL system, plus provide default attributes
> in the xml config file), so if it's a problem, I'm sure I'll see
> it. The few tests I've run so far worked fine.

huh, yeah it does seem like you'd see a crash.  Maybe the py25-lxml
port gains some advantages from getting built within the macports
environment somehow.

> Care to provide an
> example that breaks?

unfortunately, I don't think I have one, not something that is decoupled
from the app I'm working on anyway.  The app I'm working on makes
heavy use of lxml.html to spider through the web, uses xpath() here
and there, and the test cases use xpaths for assertions.  However, I
see the segfault in strange places.  For example, if I run all tests
at once (I'm using nose) then I usually don't see a segfault.  But if
I run test cases by themselves I will generally see a segfault.  And
if I do, it is a consistent segfault.  Looking at the crash log I can
see that it's on an xpath lookup (I posted this earlier).  However, to
make matters worse, the test cases I can trigger segfaults in
generally do not seem to touch any of the xpath code :/

Nonetheless, all the workarounds I've mentioned stop the segfaults.

Stefan Behnel | 12 May 10:41

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Hi,

Mike Meyer wrote:
> On Sun, 11 May 2008 09:01:01 +0200
> Stefan Behnel <stefan_ml <at> behnel.de> wrote:
> 
>> you ask why this is so hard? Simple answer: because no-one has contributed a
>> way so far to make it easier.
> 
> Gee, I had no trouble at all doing this last week (the release of
> Oracle library bits for Intel OS-X means it's now desirable). I
> installed macports, did a self-update, then installed py25-lxml.  It
> installed python2.5.2 and the versions of libxml2 and libxslt that
> were in macports as part of the process. Installing cx_Oracle after
> that was more work.
> 
>> We had lots of reports about stuff not working and almost as many
>> work-arounds, but no-one came up with a patch that would allow building lxml
>> reliably at least on a subset of Mac-OS systems. And I just cannot believe
>> that there is no-one amongst the Mac-OS-X users who knows how to use distutils
>> to build a binary extension. Or at least someone who knows how to build C code
>> statically against a C library.
> 
> I'm sorry, but my experience is that binary distributions make the
> problems *worse*, not better

I wasn't talking about distributing binaries. I meant: someone has to provide
a way to configure the compiler so that it builds lxml statically on Mac-OS.

Stefan

Kumar McMillan | 12 May 00:49

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Stefan, thanks for all the info

On Sun, May 11, 2008 at 2:01 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
> From my POV, Mac-OS seems to lack three things that make this problem
> non-trivial. It doesn't have a standard package management system. Neither
> does it have something like the Linux Standard Base, which dictates where
> newly installed things belong. And it doesn't seem to support "rpath", which
> would allow a binary to say "I know where my dependencies come from". Or at
> least distutils don't support that on Mac. So everything I could try here on
> Linux to make it work better is bound to fail.

I don't have experience building native OS X applications but I've
done a little more research into the problem and I think it is
specifically this:

"/usr/lib/libxml2.2.dylib uses two-level namespace, meaning that the
Foundation framework will always use this one instead of yours"

-- from http://0xced.blogspot.com/2006/07/dealing-with-outdated-open-source-libs.html

What is two-level namespacing?  Good question.  I haven't quite
figured that out yet but as the blog post suggests, you can "flatten"
it at runtime by setting DYLD_FORCE_FLAT_NAMESPACE=1

And, by golly, this actually works -- that is, with it set in my shell,
the test cases that would otherwise segfault run smoothly.
Also, this doesn't screw up my lib paths like setting
DYLD_LIBRARY_PATH does (the conflict with subversion went away!).
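
As a minimal sketch of the runtime workaround just described, the variable can be set for a single child process instead of the whole shell session. The helper names flat_namespace_env and run_with_flat_namespace are hypothetical, and the nosetests invocation mentioned afterwards is only an assumed example of a test command.

```python
import os
import subprocess


def flat_namespace_env(base_env=None):
    """Return a copy of the environment with DYLD_FORCE_FLAT_NAMESPACE=1."""
    env = dict(os.environ if base_env is None else base_env)
    env["DYLD_FORCE_FLAT_NAMESPACE"] = "1"
    return env


def run_with_flat_namespace(argv):
    """Run argv as a child process with the flat namespace forced.

    Only the child sees the variable; the calling shell is left untouched.
    """
    return subprocess.call(argv, env=flat_namespace_env())
```

Something like run_with_flat_namespace(["nosetests", "path/to/test_module.py"]) would then confine the setting to the test run, so other tools started from the same shell are not affected.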

From more googling it does appear however that setting this var might

Stefan Behnel | 12 May 11:04

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Hi,

Kumar McMillan wrote:
> I don't have experience building native OS X applications

See? That seems to be a general problem amongst Mac-OS users. If no-one using
that platform knows how to build a C program, how am I supposed to know it?

> What is two-level namespacing?

*shrug*, I prefer an automatic static build on Mac-OS anyway.

> You say your patch removed the enforcement of STATIC_*_DIRS but that was
> never a problem.

It was, as it requires manual interaction by users that should only be
required in stupid "who-needs-a-system-compiler-anyway" environments like Windows.

> in fact, that seems to confuse gcc when building
> with --static since it produces orphaned -I args (no directory
> attached)

It just disables the requirement for setting the variables. It doesn't
configure anything so far. The config has to come from xml2-config and
xslt-config.

> Next, you suggest to adjust the sys.platform checks.  sys.platform
> always equals "darwin" on OS X

Ok, then the function will likely look something like this:

Kumar McMillan | 12 May 17:15

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

On Mon, May 12, 2008 at 4:04 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>  Kumar McMillan wrote:
>  > I don't have experience building native OS X applications
>
>  See? That seems to be a general problem amongst Mac-OS users.

Most people "use" computers; they don't build them.  Your work is
greatly appreciated! :)

>  > What is two-level namespacing?
>
>  *shrug*, I prefer an automatic static build on Mac-OS anyway.

me too, I think that would be the right solution.

>  > in fact, that seems to confuse gcc when building
>  > with --static since it produces orphaned -I args (no directory
>  > attached)
>
>  It just disables the requirement for setting the variables. It doesn't
>  configure anything so far. The config has to come from xml2-config and
>  xslt-config.

something is going wrong then with --static because I get "Python.h
not found" errors and the gcc command looked something like this:

gcc ... -I -I/path/to/python/headers

notice the orphaned -I call where, afaict, STATIC_INCLUDE_DIRS was
previously getting inserted.  Just a theory.

Stefan Behnel | 13 May 07:25

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Hi,

Kumar McMillan wrote:
> something is going wrong then with --static because I get "Python.h
> not found" errors and the gcc command looked something like this:
> 
> gcc ... -I -I/path/to/python/headers

That's a bug. Here is a patch.

Stefan

=== setupinfo.py
==================================================================
--- setupinfo.py        (revision 4206)
+++ setupinfo.py        (local)
@@ -15,8 +15,11 @@
 PACKAGE_PATH = "src/lxml/"

 def env_var(name):
-    value = os.getenv(name, '')
-    return value.split(os.pathsep)
+    value = os.getenv(name)
+    if value:
+        return value.split(os.pathsep)
+    else:
+        return []

 def ext_modules(static_include_dirs, static_library_dirs, static_cflags):
     if CYTHON_INSTALLED:
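
To see why the old env_var() led to the orphaned -I, note that splitting an empty string still yields one (empty) element. The sketch below mirrors the two versions from the patch, with an explicit environ argument added so it can be run in isolation. The include_flags helper is a hypothetical stand-in for the step that turns directories into compiler options, not the actual distutils code, and LXML_DEMO_UNSET_VAR is a made-up variable name.

```python
import os


def env_var_old(name, environ=os.environ):
    # Behaviour before the patch: an unset variable becomes [''].
    value = environ.get(name, '')
    return value.split(os.pathsep)


def env_var_new(name, environ=os.environ):
    # Behaviour after the patch: an unset or empty variable becomes [].
    value = environ.get(name)
    if value:
        return value.split(os.pathsep)
    else:
        return []


def include_flags(dirs):
    # Hypothetical stand-in: one -I option per include directory.
    return ['-I' + d for d in dirs]


if __name__ == '__main__':
    empty_env = {}
    print(include_flags(env_var_old('LXML_DEMO_UNSET_VAR', empty_env)))
    print(include_flags(env_var_new('LXML_DEMO_UNSET_VAR', empty_env)))
```

With the old version, the empty directory name turns into a bare -I option; with the patched version, no option is produced at all.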

Kumar McMillan | 14 May 06:54

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

On Tue, May 13, 2008 at 12:25 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>  > gcc ... -I -I/path/to/python/headers
>
>  That's a bug. Here is a patch.

closer!  thanks for that fix.  That got all the -I includes in order.

Next up, I'm pretty sure I need to pass -static to libtool so that it
honors the -lxml2.a (without -static, it says xml2.a -- lib not
found).  My idea for this was:

export LDFLAGS='-static'

and I got:

gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g
-bundle -undefined dynamic_lookup -static
build/temp.macosx-10.3-i386-2.5/src/lxml/lxml.etree.o
-L/Users/kumar/src/lxml-2.0/parts/libxml2/lib
-L/Users/kumar/src/lxml-2.0/parts/libxslt/lib -lxslt.a -lexslt.a
-lxml2.a -lz.a -lm.a -o build/lib.macosx-10.3-i386-2.5/lxml/etree.so
ld_classic: incompatible flag -bundle used (must specify "-dynamic" to be used)

so ... how do I stop it from adding -bundle?  Ideas for another approach?

Stefan Behnel | 14 May 08:01

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Hi,

Kumar McMillan wrote:
> On Tue, May 13, 2008 at 12:25 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>>  > gcc ... -I -I/path/to/python/headers
>>
>>  That's a bug. Here is a patch.
> 
> closer!  thanks for that fix.  That got all the -I includes in order.
> 
> Next up, I'm pretty sure I need to pass -static to libtool so that it
> honors the -lxml2.a (without -static, it says xml2.a -- lib not
> found).

It's not "-lxml2.a" but a plain "/path/to/libxml2.a" as parameter to link it
in just like the normal lxml.etree.o object file that was just compiled.
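
A rough sketch of that difference, assuming the archives follow the usual lib<name>.a naming: the helper names below are hypothetical, and the gcc flags are simply copied from the command quoted earlier in the thread. With distutils, such full paths are normally handed to the linker through the extra_objects argument of Extension; whether lxml's setup does so is not shown here.

```python
import os


def static_archive_paths(lib_dir, names):
    """Map library names such as 'xml2' to paths such as <lib_dir>/libxml2.a."""
    return [os.path.join(lib_dir, "lib%s.a" % name) for name in names]


def link_command(output, objects, archives):
    """Build a link command that lists archives like ordinary object files."""
    # No -l options are used: each archive appears by its full path.
    return (["gcc", "-bundle", "-undefined", "dynamic_lookup"]
            + list(objects) + list(archives) + ["-o", output])


if __name__ == "__main__":
    archives = static_archive_paths("/opt/local/lib", ["xslt", "exslt", "xml2"])
    print(" ".join(link_command("etree.so", ["lxml.etree.o"], archives)))
```

The point of the sketch is only the shape of the command line: the .a files sit next to lxml.etree.o as plain inputs, so the linker never has to search for "xml2.a" as a library name.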

Stefan

Kumar McMillan | 15 May 06:40

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Hello again

On Wed, May 14, 2008 at 1:01 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>> Next up, I'm pretty sure I need to pass -static to libtool so that it
>> honors the -lxml2.a (without -static, it says xml2.a -- lib not
>> found).
>
> It's not "-lxml2.a" but a plain "/path/to/libxml2.a" as parameter to link it
> in just like the normal lxml.etree.o object file that was just compiled.

when I tried the plain paths it says library cannot be found.  But
I've discovered that building with -static is a dead end.  It seems
that Apple all but disallows static linking completely:

http://developer.apple.com/qa/qa2001/qa1118.html

HOWEVER

after blood, sweat, and some tears (kidding) this is *all* I needed, it seems:

export CFLAGS="-flat_namespace"

...no static builds of the libxml2 libs, no buildout recipe.  I just set that and ran:

python setup.py bdist_egg \
    --with-xml2-config=/opt/local/bin/xml2-config \
    --with-xslt-config=/opt/local/bin/xslt-config

which uses the libxml2 and etc. installed by ports.  In fact, as long
as /opt/local/bin is on my path that should work without having to set
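
The build recipe above can also be scripted. This is a minimal sketch, assuming it is run from the top of an lxml source tree; build_env and build_command are hypothetical helper names, while the two --with-*-config options and the /opt/local/bin prefix are taken from the command in this message. Unlike the plain export, the sketch appends to any CFLAGS that are already set instead of replacing them, which is a design choice of the sketch and not something the thread prescribes.

```python
import os
import sys


def build_env(base_env=None):
    """Return a copy of the environment with -flat_namespace added to CFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    env["CFLAGS"] = (env.get("CFLAGS", "") + " -flat_namespace").strip()
    return env


def build_command(prefix="/opt/local/bin"):
    """Build the setup.py command line pointing at the ports' config scripts."""
    return [
        sys.executable, "setup.py", "bdist_egg",
        "--with-xml2-config=" + os.path.join(prefix, "xml2-config"),
        "--with-xslt-config=" + os.path.join(prefix, "xslt-config"),
    ]

# To actually run the build from an lxml source checkout (not executed here):
#     import subprocess
#     subprocess.check_call(build_command(), env=build_env())
```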

Stefan Behnel | 15 May 13:03

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

Hi Kumar,

Kumar McMillan wrote:
> so, I'm thinking this is just two lines of code added to cflags() ...
> 
> if sys.platform in ('darwin',):
>     result.append('-flat_namespace')

That's cool, thanks. I added it to the trunk and to the 2.0 branch. Let's see
if Mac users get along with 2.0.6 then...
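
For readers who want to see the two lines in context, here is a sketch of how such a cflags() helper could look. Only the sys.platform check and the appended flag come from the quoted snippet; the function signature, the static_cflags and platform parameters, and the rest of the body are assumptions for illustration and do not reproduce lxml's actual setupinfo.py.

```python
import sys


def cflags(static_cflags=None, platform=None):
    """Hypothetical helper collecting C compiler flags for the build."""
    if platform is None:
        platform = sys.platform
    # Start from any user-supplied flags (assumed parameter).
    result = list(static_cflags or [])
    # The two lines proposed in the thread: sys.platform is 'darwin' on OS X.
    if platform in ('darwin',):
        result.append('-flat_namespace')
    return result
```

The platform parameter exists only so the behaviour can be checked on a non-Mac machine, for example by calling cflags(platform='darwin').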

Thanks for the effort!

Stefan

Kumar McMillan | 16 May 04:47

Re: install lxml 2.0.5 on Mac OS X Leopard - why is it so hard?

On Thu, May 15, 2008 at 6:03 AM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>> if sys.platform in ('darwin',):
>>     result.append('-flat_namespace')
>
> That's cool, thanks. I added it to the trunk and to the 2.0 branch.

excellent

> Let's see
> if Mac users get along with 2.0.6 then...
>
> Thanks for the effort!

sure, no problem.

I researched this a bit more.  It seems that people generally consider
-flat_namespace a bad "hack," something to keep in mind.  However,
this seems to be because a few libraries take advantage of
-twolevel_namespace (the default gcc behavior as of OS X 10.3 or
something) so your binaries may cause other linked libs to behave
wrong.  The only specific example I could find of one that uses two
level namespaces was OpenGL, but maybe there are others.

Anyway, for lxml's purposes *I think* it is OK to use -flat_namespace
since there aren't many other libs involved.  Let's roll with it.
This is what etree links to :

$ otool -l path/to/lxml/etree.so

[snip]

