Antoine Pitrou | 23 Jun 2012 17:25

Empty directory is a namespace?


Hello,

I've just noticed the following:

$ mkdir foo
$ ./python
Python 3.3.0a4+ (default:837d51ba1aa2+1794308c1ea7+, Jun 23 2012,
14:43:41) [GCC 4.5.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
>>> foo
<module 'foo' (namespace)>

Should even an empty directory be a valid namespace package?

Regards

Antoine.

Guido van Rossum | 23 Jun 2012 17:38

Re: Empty directory is a namespace?

Yes. Otherwise, where to draw the line? What if it contains a single
dot file? What if it contains no Python files? What if it contains
only empty subdirectories?
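
A minimal sketch of the behaviour under discussion, assuming Python 3.3 or later. The module names `ns_demo_empty` and `ns_demo_dotfile` are hypothetical and chosen only to avoid clashing with installed packages. It creates one empty directory and one directory holding a single dot file, then imports both:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical names, used only for this illustration.
EMPTY = "ns_demo_empty"
DOTFILE = "ns_demo_dotfile"

with tempfile.TemporaryDirectory() as root:
    # One completely empty directory, one containing only a dot file.
    os.mkdir(os.path.join(root, EMPTY))
    os.mkdir(os.path.join(root, DOTFILE))
    open(os.path.join(root, DOTFILE, ".hidden"), "w").close()

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        empty_mod = importlib.import_module(EMPTY)
        dotfile_mod = importlib.import_module(DOTFILE)
        # Each package is made of exactly one directory (one "portion").
        empty_portions = len(empty_mod.__path__)
        dotfile_portions = len(dotfile_mod.__path__)
    finally:
        sys.path.remove(root)

# Namespace packages have no __init__.py, so no usable __file__.
empty_has_file = getattr(empty_mod, "__file__", None) is not None
print(empty_mod, empty_has_file, empty_portions, dotfile_portions)
```

Both imports succeed, which is the point of the "where to draw the line?" argument: the import system does not inspect the directory's contents before accepting it.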

On Sat, Jun 23, 2012 at 8:25 AM, Antoine Pitrou <solipsis <at> pitrou.net> wrote:
>
> Hello,
>
> I've just noticed the following:
>
> $ mkdir foo
> $ ./python
> Python 3.3.0a4+ (default:837d51ba1aa2+1794308c1ea7+, Jun 23 2012,
> 14:43:41) [GCC 4.5.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import foo
> >>> foo
> <module 'foo' (namespace)>
>
>
> Should even an empty directory be a valid namespace package?
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev <at> python.org

Antoine Pitrou | 23 Jun 2012 17:39

Re: Empty directory is a namespace?

On Sat, 23 Jun 2012 08:38:02 -0700
Guido van Rossum <guido <at> python.org> wrote:
> Yes. Otherwise, where to draw the line? What if it contains a single
> dot file? What if it contains no Python files? What if it contains
> only empty subdirectories?

That's true. I would have hoped for it to be recognized only when
there's at least one module or package inside, but it doesn't sound
easy to check for (especially in the recursive namespace packages case
- is that possible?).

Regards

Antoine.
martin | 23 Jun 2012 17:55

Re: Empty directory is a namespace?

> That's true. I would have hoped for it to be recognized only when
> there's at least one module or package inside, but it doesn't sound
> easy to check for (especially in the recursive namespace packages case
> - is that possible?).

Yes - a directory becomes a namespace package by not having an __init__.py,
so the "namespace package" case will likely become the default, and people
will start removing the empty __init__.pys when they don't need to support
3.2- anymore.

If you wonder whether a nested namespace package may have multiple portions:
that can also happen, i.e. if you have z3c.recipe.ldap, z3c.recipe.template,
z3c.recipe.sphinxdoc. They may all get installed as separate zip files,
each contributing a portion to z3c.recipe.
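
A rough sketch of a nested namespace package with two portions. It uses plain directories rather than zip files, and the hypothetical top-level name `demo_ns` stands in for `z3c` so it cannot collide with a real installation:

```python
import importlib
import os
import sys
import tempfile


def make_portion(root, leaf):
    """Create root/demo_ns/recipe/<leaf>.py with no __init__.py anywhere."""
    pkg_dir = os.path.join(root, "demo_ns", "recipe")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, leaf + ".py"), "w") as f:
        f.write("NAME = %r\n" % leaf)


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    # Two separate install locations, each contributing one portion.
    make_portion(a, "ldap")
    make_portion(b, "template")

    sys.path[:0] = [a, b]
    importlib.invalidate_caches()
    try:
        recipe = importlib.import_module("demo_ns.recipe")
        ldap = importlib.import_module("demo_ns.recipe.ldap")
        template = importlib.import_module("demo_ns.recipe.template")
        # The nested namespace package spans both install locations.
        portion_count = len(recipe.__path__)
    finally:
        sys.path.remove(a)
        sys.path.remove(b)

print(portion_count, ldap.NAME, template.NAME)
```

Here `demo_ns` and `demo_ns.recipe` are both namespace packages, and the submodules are found in whichever portion provides them.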

In the long run, I expect that we will see namespace packages such as
org.openstack, com.canonical, com.ibm, etc. Then, "com" is a namespace
package, com.canonical is a namespace package, and com.canonical.launchpad
might still be a namespace package with multiple portions.

Regards,
Martin

Antoine Pitrou | 23 Jun 2012 17:58

Re: Empty directory is a namespace?

On Sat, 23 Jun 2012 17:55:24 +0200
martin <at> v.loewis.de wrote:
> > That's true. I would have hoped for it to be recognized only when
> > there's at least one module or package inside, but it doesn't sound
> > easy to check for (especially in the recursive namespace packages case
> > - is that possible?).
> 
> Yes - a directory becomes a namespace package by not having an __init__.py,
> so the "namespace package" case will likely become the default, and people
> will start removing the empty __init__.pys when they don't need to support
> 3.2- anymore.

Have you tested the performance of namespace packages compared to
normal packages?

> In the long run, I expect that we will see namespace packages such as
> org.openstack, com.canonical, com.ibm, etc. Then, "com" is a namespace
> package, com.canonical is a namespace package, and com.canonical.launchpad
> might still be a namespace package with multiple portions.

I hope we are spared such naming schemes.

Regards

Antoine.

Martin v. Löwis | 24 Jun 2012 09:51

Re: Empty directory is a namespace?

On 23.06.2012 17:58, Antoine Pitrou wrote:
> On Sat, 23 Jun 2012 17:55:24 +0200
> martin <at> v.loewis.de wrote:
>>> That's true. I would have hoped for it to be recognized only when
>>> there's at least one module or package inside, but it doesn't sound
>>> easy to check for (especially in the recursive namespace packages case
>>> - is that possible?).
>>
>> Yes - a directory becomes a namespace package by not having an __init__.py,
>> so the "namespace package" case will likely become the default, and people
>> will start removing the empty __init__.pys when they don't need to support
>> 3.2- anymore.
> 
> Have you tested the performance of namespace packages compared to
> normal packages?

No, I haven't.

Regards,
Martin
PJ Eby | 24 Jun 2012 19:44

Re: Empty directory is a namespace?

On Sun, Jun 24, 2012 at 3:51 AM, "Martin v. Löwis" <martin <at> v.loewis.de> wrote:
> On 23.06.2012 17:58, Antoine Pitrou wrote:
> > On Sat, 23 Jun 2012 17:55:24 +0200
> > martin <at> v.loewis.de wrote:
> >>> That's true. I would have hoped for it to be recognized only when
> >>> there's at least one module or package inside, but it doesn't sound
> >>> easy to check for (especially in the recursive namespace packages case
> >>> - is that possible?).
> >>
> >> Yes - a directory becomes a namespace package by not having an __init__.py,
> >> so the "namespace package" case will likely become the default, and people
> >> will start removing the empty __init__.pys when they don't need to support
> >> 3.2- anymore.
> >
> > Have you tested the performance of namespace packages compared to
> > normal packages?
>
> No, I haven't.

It's probably not worthwhile; any performance cost increase due to looking at more sys.path entries should be offset by the speedup of any subsequent imports from later sys.path entries.

Or, to put it another way, almost all the extra I/O cost of namespace packages is paid only once, for the *first* namespace package imported.  In effect, this means that the amortized cost of using namespace packages actually *decreases* as namespace packages become more popular.  Also, the total extra overhead equals the cost of a listdir() for each directory on sys.path that would otherwise not have been checked for an import.  (So, for example, if even one import fails over the life of a program's execution, or it performs even one import from the last directory on sys.path, then there is no actual extra overhead.)

Of course, there are still cache validation stat() calls, and they make an initial import of a namespace package (vs. a self-contained package with __init__.py) cost an extra N stat() calls, where N is the number of sys.path entries that appear *after* the sys.path directory where the package is found.  (This cost must of course still be compared against the costs of finding, opening, and running an empty __init__.py[co] file, so it may actually still be quite competitive in many cases.)

For imports *within* a namespace package, similar considerations apply, except that N is smaller, and in the simple case of replacing a self-contained package with a namespace (but not adding any additional path locations), N will be zero, making imports from inside the namespace run exactly as quickly as normal imports.

In short, it's not worth worrying about, and definitely nothing that should cause people to spread an idea that __init__.py somehow speeds things up.  If there's a difference, it'll likely be lost in measurement noise, due to importlib's new directory caching mechanism.
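
The "looking at more sys.path entries" part can be illustrated with a small sketch. It assumes a current CPython and calls `importlib.machinery.PathFinder.find_spec` directly on two temporary directories; the package names `regular_demo` and `namespace_demo` are hypothetical. A regular package stops the search at the first hit, while a namespace package collects a portion from every entry:

```python
import os
import tempfile
from importlib.machinery import PathFinder


def search_locations(roots, name):
    """Return the directories the path finder associates with a package."""
    spec = PathFinder.find_spec(name, path=roots)
    return list(spec.submodule_search_locations)


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for root in (first, second):
        os.mkdir(os.path.join(root, "regular_demo"))
        os.mkdir(os.path.join(root, "namespace_demo"))
    # Only the copy in the first entry gets an __init__.py.
    open(os.path.join(first, "regular_demo", "__init__.py"), "w").close()

    # Regular package: the search ends at the first entry that has it.
    regular_locations = search_locations([first, second], "regular_demo")
    # Namespace package: every entry is consulted and contributes a portion.
    namespace_locations = search_locations([first, second], "namespace_demo")

print(len(regular_locations), len(namespace_locations))
```

This shows only which entries are consulted; it says nothing about how long the listdir() and stat() calls take on a given system.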

Antoine Pitrou | 24 Jun 2012 19:46

Re: Empty directory is a namespace?

On Sun, 24 Jun 2012 13:44:52 -0400
PJ Eby <pje <at> telecommunity.com> wrote:
> On Sun, Jun 24, 2012 at 3:51 AM, "Martin v. Löwis" <martin <at> v.loewis.de> wrote:
> 
> > On 23.06.2012 17:58, Antoine Pitrou wrote:
> > > On Sat, 23 Jun 2012 17:55:24 +0200
> > > martin <at> v.loewis.de wrote:
> > >>> That's true. I would have hoped for it to be recognized only when
> > >>> there's at least one module or package inside, but it doesn't sound
> > >>> easy to check for (especially in the recursive namespace packages case
> > >>> - is that possible?).
> > >>
> > >> Yes - a directory becomes a namespace package by not having an __init__.py,
> > >> so the "namespace package" case will likely become the default, and people
> > >> will start removing the empty __init__.pys when they don't need to support
> > >> 3.2- anymore.
> > >
> > > Have you tested the performance of namespace packages compared to
> > > normal packages?
> >
> > No, I haven't.
> >
> 
> It's probably not worthwhile; any performance cost increase due to looking
> at more sys.path entries should be offset by the speedup of any subsequent
> imports from later sys.path entries.
> 
> Or, to put it another way, almost all the extra I/O cost of namespace
> packages is paid only once, for the *first* namespace package imported.

And how about CPU cost?

> In short, it's not worth worrying about, and definitely nothing that
> should cause people to spread an idea that __init__.py somehow speeds
> things up.

The best way to avoid people spreading that idea would be to show hard
measurements.

Regards

Antoine.
Martin v. Löwis | 24 Jun 2012 21:27

Re: Empty directory is a namespace?

>> In short, it's not worth worrying about, and definitely nothing that
>> should cause people to spread an idea that __init__.py somehow speeds
>> things up.
> 
> The best way to avoid people spreading that idea would be to show hard
> measurements.

PJE wants people to spread an idea, not to avoid them doing so.

In any case, hard measurements might help to spread the idea, here are
mine. For the attached project, ec656d79b8ac gives, on my system

import time for a namespace package: 113µs (fastest run, hot caches)
import time for a regular package:   128µs (---- " ------)
first-time import of regular package: 1859µs (due to pyc generation)
(remove __init__.py and __pycache__ to construct the first setup)

So namespace packages are indeed faster than regular packages, at least
in some cases.
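
The attached spacetiming.tgz is not reproduced in this archive. The following is only a sketch of how a comparable hot-cache measurement could be set up, not the benchmark used above. The package names `bench_ns` and `bench_regular` are hypothetical, and absolute numbers will differ by machine and Python version:

```python
import importlib
import os
import sys
import tempfile
import timeit


def time_import(name, repeat=5, number=200):
    """Best per-import time in seconds, re-importing from scratch each time."""
    def run():
        sys.modules.pop(name, None)   # force the import system to run again
        importlib.import_module(name)
    return min(timeit.repeat(run, repeat=repeat, number=number)) / number


with tempfile.TemporaryDirectory() as root:
    # bench_ns has no __init__.py; bench_regular has an empty one.
    os.mkdir(os.path.join(root, "bench_ns"))
    os.mkdir(os.path.join(root, "bench_regular"))
    open(os.path.join(root, "bench_regular", "__init__.py"), "w").close()

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        ns_seconds = time_import("bench_ns")
        regular_seconds = time_import("bench_regular")
    finally:
        sys.path.remove(root)
        sys.modules.pop("bench_ns", None)
        sys.modules.pop("bench_regular", None)

print("namespace package: %.1f us" % (ns_seconds * 1e6))
print("regular package:   %.1f us" % (regular_seconds * 1e6))
```

Taking the minimum over several repeats mirrors the "fastest run, hot caches" convention quoted above.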

Regards,
Martin
Attachment (spacetiming.tgz): application/x-compressed-tar, 386 bytes
PJ Eby | 24 Jun 2012 21:51

Re: Empty directory is a namespace?

On Sun, Jun 24, 2012 at 3:27 PM, "Martin v. Löwis" <martin <at> v.loewis.de> wrote:
> >> In short, it's not worth worrying about, and definitely nothing that
> >> should cause people to spread an idea that __init__.py somehow speeds
> >> things up.
> >
> > The best way to avoid people spreading that idea would be to show hard
> > measurements.
>
> PJE wants people to spread an idea, not to avoid them doing so.
>
> In any case, hard measurements might help to spread the idea, here are
> mine. For the attached project, ec656d79b8ac gives, on my system
>
> import time for a namespace package: 113µs (fastest run, hot caches)
> import time for a regular package:   128µs (---- " ------)
> first-time import of regular package: 1859µs (due to pyc generation)
> (remove __init__.py and __pycache__ to construct the first setup)
>
> So namespace packages are indeed faster than regular packages, at least
> in some cases.

I don't really want to spread the idea that they're faster, either: the exact same benchmark can probably be made to turn out differently if you have, say, a hundred unzipped eggs on sys.path after the benchmark directory.  A more realistic benchmark would import more than one module, though...  and then it goes back and forth, dueling benchmarks that can always be argued against with a different benchmark measuring different things with other setup conditions.

That's what I meant by "lost in the noise": the outcome of the benchmark depends on which of many potentially-plausible setups and applications you choose to use as your basis for measurement, so it's silly to think that either omitting or including __init__.py should be done for performance reasons.  Do whatever your application needs, because it's not going to make much difference either way in any realistic program.
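
To make the dependence on setup concrete, here is a hedged sketch that times the same namespace-package import with a varying number of extra sys.path entries placed after the package's directory. The package name `trailing_ns_demo` and the empty `extraN` directories are hypothetical stand-ins for unzipped eggs; the result is deliberately not interpreted here, since it depends on the filesystem and Python version:

```python
import importlib
import os
import sys
import tempfile
import timeit

NAME = "trailing_ns_demo"  # hypothetical package name


def best_import_time(name, number=50, repeat=3):
    """Best per-import time in seconds with the module evicted each time."""
    def run():
        sys.modules.pop(name, None)
        importlib.import_module(name)
    return min(timeit.repeat(run, repeat=repeat, number=number)) / number


def time_with_trailing_entries(count):
    """Time importing NAME with `count` empty directories after it on sys.path."""
    with tempfile.TemporaryDirectory() as root:
        pkg_root = os.path.join(root, "pkgdir")
        os.makedirs(os.path.join(pkg_root, NAME))
        extras = []
        for i in range(count):
            extra = os.path.join(root, "extra%d" % i)
            os.mkdir(extra)
            extras.append(extra)

        saved = list(sys.path)
        sys.path[:0] = [pkg_root] + extras
        importlib.invalidate_caches()
        try:
            return best_import_time(NAME)
        finally:
            sys.path[:] = saved
            sys.modules.pop(NAME, None)


short_seconds = time_with_trailing_entries(0)
long_seconds = time_with_trailing_entries(100)
print("0 trailing entries:   %.1f us" % (short_seconds * 1e6))
print("100 trailing entries: %.1f us" % (long_seconds * 1e6))
```

Swapping in a regular package, or importing several modules per run, gives yet another benchmark, which is exactly the back-and-forth described above.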

Antoine Pitrou | 24 Jun 2012 21:51

Re: Empty directory is a namespace?

On Sunday, 24 June 2012 at 15:51 -0400, PJ Eby wrote:

> 
> I don't really want to spread the idea that they're faster, either:
> the exact same benchmark can probably be made to turn out differently
> if you have, say, a hundred unzipped eggs on sys.path after the
> benchmark directory.

Yes, the case where sys.path is long (thanks to setuptools) is precisely
what I was thinking about.

>   A more realistic benchmark would import more than one module,
> though...

Indeed.

> That's what I meant by "lost in the noise": the outcome of the
> benchmark depends on which of many potentially-plausible setups and
> applications you choose to use as your basis for measurement,

Should we forget to care about performance, just because different
setups might yield different results? That's a rather unconstructive
attitude.

Regards

Antoine.

_______________________________________________
Python-Dev mailing list
Python-Dev <at> python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org
Martin v. Löwis | 24 Jun 2012 22:13

Re: Empty directory is a namespace?

> Should we forget to care about performance, just because different
> setups might yield different results?

No, we are not forgetting about performance. You asked for a benchmark,
I presented one.

I fail to see your problem. I claim that the performance of namespace
packages is just fine, and presented a benchmark. PJE claims that the
performance of namespace packages is fine, and provided reasoning.

If you want to see two specific scenarios compared, provide *at least*
a description of what these scenarios are. Better, just do the
benchmark yourself.

In general, I think there is a widespread misunderstanding of how new
features impact performance. There are really several cases to
be considered:
1. What is the impact of the feature on existing applications which
   don't use it? This is difficult to measure since you first need
   to construct an implementation which doesn't have the feature,
   but is otherwise identical. This is often easy to reason about,
   though.
2. What is the performance of the feature when it is used? This is
   easy to measure, but difficult to evaluate. If you measure it,
   and you get some result - is that good, good enough, or bad?

For 1, it may be tempting to compare the new implementation with
the previous release. However, in this specific case, that is
misleading, since the entire import machinery was replaced. So
you really need to compare with a version of importlib that doesn't
have namespace packages.

Regards,
Martin
martin | 23 Jun 2012 17:47

Re: Empty directory is a namespace?

> Should even an empty directory be a valid namespace package?

Yes, that's what the PEP says, by BDFL pronouncement.

Regards,
Martin

