Require loaders set __package__ and __loader__


Brett Cannon-2
An open issue in PEP 302 is whether to require __loader__ attributes on modules. The claimed worry is memory consumption, but considering importlib and zipimport are already doing this that seems like a red herring. Requiring it, though, opens the door to people relying on its existence and thus starting to do things like loading assets with ``__loader__.get_data(path_to_internal_package_file)`` which allows code to not care how modules are stored (e.g. zip file, sqlite database, etc.).
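As a minimal sketch of the idiom mentioned above, the following uses a hypothetical in-memory loader to show how code can read an asset through ``__loader__.get_data()`` without knowing where the bytes are stored. `DictLoader`, the module name and the `pkg/banner.txt` path are assumptions made for illustration; only the `get_data()` method name comes from PEP 302.

```python
import types


class DictLoader:
    """Hypothetical loader that keeps file contents in a dict."""

    def __init__(self, files):
        self._files = files

    def get_data(self, path):
        # PEP 302: return the contents of `path` as bytes, or raise
        # IOError (an alias of OSError in Python 3) if it is missing.
        try:
            return self._files[path]
        except KeyError:
            raise OSError(path)


# Stand-in for a module that an import system has already created.
module = types.ModuleType("pkg.mod")
module.__loader__ = DictLoader({"pkg/banner.txt": b"hello from storage"})

# What code inside the module would write: no open(), no __file__.
banner = module.__loader__.get_data("pkg/banner.txt")
```

The same call would work unchanged if the loader read from a zip file or a database, which is the point of guaranteeing that ``__loader__`` is always set.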

What I would like to do is update the PEP to state that loaders are expected to set __loader__. Now importlib will get updated to do that implicitly so external code can expect it post-import, but requiring loaders to set it would mean that code executed during import can rely on it as well.

As for __package__, PEP 366 states that modules should set it but it isn't referenced by PEP 302. What I want to do is add a reference and make it required like __loader__. Importlib already sets it implicitly post-import, but once again it would be nice to do this pre-import.

To help facilitate both new requirements, I would update the importlib.util.module_for_loader decorator to set both on a module that doesn't have them before passing the module down to the decorated method. That way people already using the decorator don't have to worry about anything and it is one less detail to have to worry about. I would also update the docs on importlib.util.set_package and importlib.util.set_loader to suggest people use importlib.util.module_for_loader and only use the other two decorators for backwards-compatibility.
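The behaviour proposed for the decorator could look roughly like the sketch below. This is not the stdlib implementation: `module_for_loader_sketch` and `SourceStringLoader` are hypothetical names, and the sketch assumes the loader offers an `is_package()` method so that ``__package__`` can be derived from the module name.

```python
import functools
import sys
import types


def module_for_loader_sketch(method):
    """Hypothetical decorator: hand load_module() a prepared module."""

    @functools.wraps(method)
    def wrapper(self, fullname):
        module = sys.modules.get(fullname)
        is_reload = module is not None
        if not is_reload:
            module = types.ModuleType(fullname)
            sys.modules[fullname] = module
        # Only fill in attributes the module does not already carry.
        if getattr(module, "__loader__", None) is None:
            module.__loader__ = self
        if getattr(module, "__package__", None) is None:
            if self.is_package(fullname):
                module.__package__ = fullname
            else:
                module.__package__ = fullname.rpartition(".")[0]
        try:
            return method(self, module)
        except BaseException:
            # A failed first import must not leave a half-built module.
            if not is_reload:
                del sys.modules[fullname]
            raise

    return wrapper


class SourceStringLoader:
    """Hypothetical loader that executes a source string."""

    def __init__(self, source, package=False):
        self._source = source
        self._package = package

    def is_package(self, fullname):
        return self._package

    @module_for_loader_sketch
    def load_module(self, module):
        # Code executed during import can already see both attributes.
        exec(self._source, module.__dict__)
        return module


loader = SourceStringLoader("seen_during_import = (__loader__, __package__)")
module = loader.load_module("demo_pkg.demo_mod")
```

Because the attributes are set before the decorated method runs, the module body itself can rely on them, which is the pre-import guarantee discussed above.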

_______________________________________________
Python-Dev mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%2B1324100855712-1801473%40n6.nabble.com

Re: Require loaders set __package__ and __loader__

Eric Snow-2
On Sat, Apr 14, 2012 at 2:56 PM, Brett Cannon <[hidden email]> wrote:

> An open issue in PEP 302 is whether to require __loader__ attributes on
> modules. The claimed worry is memory consumption, but considering importlib
> and zipimport are already doing this that seems like a red herring.
> Requiring it, though, opens the door to people relying on its existence and
> thus starting to do things like loading assets with
> ``__loader__.get_data(path_to_internal_package_file)`` which allows code to
> not care how modules are stored (e.g. zip file, sqlite database, etc.).
>
> What I would like to do is update the PEP to state that loaders are expected
> to set __loader__. Now importlib will get updated to do that implicitly so
> external code can expect it post-import, but requiring loaders to set it
> would mean that code executed during import can rely on it as well.
>
> As for __package__, PEP 366 states that modules should set it but it isn't
> referenced by PEP 302. What I want to do is add a reference and make it
> required like __loader__. Importlib already sets it implicitly post-import,
> but once again it would be nice to do this pre-import.
>
> To help facilitate both new requirements, I would update the
> importlib.util.module_for_loader decorator to set both on a module that
> doesn't have them before passing the module down to the decorated method.
> That way people already using the decorator don't have to worry about
> anything and it is one less detail to have to worry about. I would also
> update the docs on importlib.util.set_package and importlib.util.set_loader
> to suggest people use importlib.util.module_for_loader and only use the
> other two decorators for backwards-compatibility.

+1

-eric

Re: Require loaders set __package__ and __loader__

Guido van Rossum
On Sat, Apr 14, 2012 at 2:15 PM, Eric Snow <[hidden email]> wrote:

> On Sat, Apr 14, 2012 at 2:56 PM, Brett Cannon <[hidden email]> wrote:
>> An open issue in PEP 302 is whether to require __loader__ attributes on
>> modules. The claimed worry is memory consumption, but considering importlib
>> and zipimport are already doing this that seems like a red herring.
>> Requiring it, though, opens the door to people relying on its existence and
>> thus starting to do things like loading assets with
>> ``__loader__.get_data(path_to_internal_package_file)`` which allows code to
>> not care how modules are stored (e.g. zip file, sqlite database, etc.).
>>
>> What I would like to do is update the PEP to state that loaders are expected
>> to set __loader__. Now importlib will get updated to do that implicitly so
>> external code can expect it post-import, but requiring loaders to set it
>> would mean that code executed during import can rely on it as well.
>>
>> As for __package__, PEP 366 states that modules should set it but it isn't
>> referenced by PEP 302. What I want to do is add a reference and make it
>> required like __loader__. Importlib already sets it implicitly post-import,
>> but once again it would be nice to do this pre-import.
>>
>> To help facilitate both new requirements, I would update the
>> importlib.util.module_for_loader decorator to set both on a module that
>> doesn't have them before passing the module down to the decorated method.
>> That way people already using the decorator don't have to worry about
>> anything and it is one less detail to have to worry about. I would also
>> update the docs on importlib.util.set_package and importlib.util.set_loader
>> to suggest people use importlib.util.module_for_loader and only use the
>> other two decorators for backwards-compatibility.
>
> +1

Funny, I was just thinking about having a simple standard API that
will let you open files (and list directories) relative to a given
module or package regardless of how the thing is loaded. If we
guarantee that there's always a __loader__ that's a first step, though
I think we may need to do a little more to get people who currently do
things like open(os.path.join(os.path.dirname(__file__),
'some_file_name')) to switch. I was thinking of having a stdlib
function that you give a module/package object, a relative filename,
and optionally a mode ('b' or 't') and returns a stream -- and sibling
functions that return a string or bytes object (depending on what API
the user is using either the stream or the data can be more useful).
What would we call those functions and where would they live?

--
--Guido van Rossum (python.org/~guido)

Re: Require loaders set __package__ and __loader__

Christian Heimes-2
On 15.04.2012 00:32, Guido van Rossum wrote:

> Funny, I was just thinking about having a simple standard API that
> will let you open files (and list directories) relative to a given
> module or package regardless of how the thing is loaded. If we
> guarantee that there's always a __loader__ that's a first step, though
> I think we may need to do a little more to get people who currently do
> things like open(os.path.join(os.path.dirname(__file__),
> 'some_file_name')) to switch. I was thinking of having a stdlib
> function that you give a module/package object, a relative filename,
> and optionally a mode ('b' or 't') and returns a stream -- and sibling
> functions that return a string or bytes object (depending on what API
> the user is using either the stream or the data can be more useful).
> What would we call those functions and where would they live?

pkg_resources has a similar API [1] that supports dotted names.
pkg_resources also does some caching for files that aren't stored on a
local file system (database, ZIP file, you name it). It should be
trivial to support both dotted names and module instances.

Christian

[1]
http://packages.python.org/distribute/pkg_resources.html#resourcemanager-api


Re: Require loaders set __package__ and __loader__

Brett Cannon-2
In reply to this post by Guido van Rossum


On Sat, Apr 14, 2012 at 18:32, Guido van Rossum <[hidden email]> wrote:
> On Sat, Apr 14, 2012 at 2:15 PM, Eric Snow <[hidden email]> wrote:
>> On Sat, Apr 14, 2012 at 2:56 PM, Brett Cannon <[hidden email]> wrote:
>>> An open issue in PEP 302 is whether to require __loader__ attributes on
>>> modules. The claimed worry is memory consumption, but considering importlib
>>> and zipimport are already doing this that seems like a red herring.
>>> Requiring it, though, opens the door to people relying on its existence and
>>> thus starting to do things like loading assets with
>>> ``__loader__.get_data(path_to_internal_package_file)`` which allows code to
>>> not care how modules are stored (e.g. zip file, sqlite database, etc.).
>>>
>>> What I would like to do is update the PEP to state that loaders are expected
>>> to set __loader__. Now importlib will get updated to do that implicitly so
>>> external code can expect it post-import, but requiring loaders to set it
>>> would mean that code executed during import can rely on it as well.
>>>
>>> As for __package__, PEP 366 states that modules should set it but it isn't
>>> referenced by PEP 302. What I want to do is add a reference and make it
>>> required like __loader__. Importlib already sets it implicitly post-import,
>>> but once again it would be nice to do this pre-import.
>>>
>>> To help facilitate both new requirements, I would update the
>>> importlib.util.module_for_loader decorator to set both on a module that
>>> doesn't have them before passing the module down to the decorated method.
>>> That way people already using the decorator don't have to worry about
>>> anything and it is one less detail to have to worry about. I would also
>>> update the docs on importlib.util.set_package and importlib.util.set_loader
>>> to suggest people use importlib.util.module_for_loader and only use the
>>> other two decorators for backwards-compatibility.
>>
>> +1
>
> Funny, I was just thinking about having a simple standard API that
> will let you open files (and list directories) relative to a given
> module or package regardless of how the thing is loaded. If we
> guarantee that there's always a __loader__ that's a first step, though
> I think we may need to do a little more to get people who currently do
> things like open(os.path.join(os.path.dirname(__file__),
> 'some_file_name')) to switch. I was thinking of having a stdlib
> function that you give a module/package object, a relative filename,
> and optionally a mode ('b' or 't') and returns a stream -- and sibling
> functions that return a string or bytes object (depending on what API
> the user is using either the stream or the data can be more useful).
> What would we call those functions and where would they live?

IOW go one level lower than get_data() and return the stream and then just have helper functions which I guess just exhaust the stream for you to return bytes or str? Or are you thinking that somehow providing a function that can get an explicit bytes or str object will be more optimized than doing something with the stream? Either way you will need new methods on loaders to make it work more efficiently since loaders only have get_data() which returns bytes and not a stream object. Plus there is currently no API for listing the contents of a directory.

As for what to call such functions, I really don't know since they are essentially abstract functions above the OS which work on whatever storage backend a module uses.

For where they should live, it depends if you are viewing this as more of a file abstraction or something that ties into modules. For the former it seems like shutil or something that dealt with higher order file manipulation. If it's the latter I would say importlib.util.


Re: Require loaders set __package__ and __loader__

Guido van Rossum
On Sat, Apr 14, 2012 at 3:50 PM, Brett Cannon <[hidden email]> wrote:

> On Sat, Apr 14, 2012 at 18:32, Guido van Rossum <[hidden email]> wrote:
>> Funny, I was just thinking about having a simple standard API that
>> will let you open files (and list directories) relative to a given
>> module or package regardless of how the thing is loaded. If we
>> guarantee that there's always a __loader__ that's a first step, though
>> I think we may need to do a little more to get people who currently do
>> things like open(os.path.join(os.path.dirname(__file__),
>> 'some_file_name')) to switch. I was thinking of having a stdlib
>> function that you give a module/package object, a relative filename,
>> and optionally a mode ('b' or 't') and returns a stream -- and sibling
>> functions that return a string or bytes object (depending on what API
>> the user is using either the stream or the data can be more useful).
>> What would we call those functions and where would they live?

> IOW go one level lower than get_data() and return the stream and then just
> have helper functions which I guess just exhaust the stream for you to
> return bytes or str? Or are you thinking that somehow providing a function
> that can get an explicit bytes or str object will be more optimized than
> doing something with the stream? Either way you will need new methods on
> loaders to make it work more efficiently since loaders only have get_data()
> which returns bytes and not a stream object. Plus there is currently no API
> for listing the contents of a directory.

Well, if it's a real file, and you need a stream, that's efficient,
and if you need the data, you can read it. But if it comes from a
loader, and you need a stream, you'd have to wrap it in a StringIO
instance. So having two APIs, one to get a stream, and one to get the
data, allows the implementation to be more optimal -- it would be bad
to wrap a StringIO instance around data only so you can read the data
from the stream again...
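The trade-off described above can be sketched as a pair of helpers: one returns bytes directly, the other returns a stream and only wraps bytes in an in-memory buffer when the loader has nothing better to offer. Everything here is hypothetical: `get_resource_data`, `get_resource_stream`, the optional `get_stream()` loader method and both demo loaders are assumptions, since loaders currently define only `get_data()`. The sketch uses `io.BytesIO` rather than StringIO because `get_data()` returns bytes.

```python
import io


def get_resource_data(loader, path):
    """Return the resource as bytes (hypothetical helper)."""
    # No stream is created just to be read back again.
    return loader.get_data(path)


def get_resource_stream(loader, path):
    """Return a binary stream for the resource (hypothetical helper)."""
    get_stream = getattr(loader, "get_stream", None)
    if get_stream is not None:
        # The loader can hand out a real stream, e.g. an open file.
        return get_stream(path)
    # Fallback: only bytes are available, so wrap them in memory.
    return io.BytesIO(loader.get_data(path))


class BytesOnlyLoader:
    """Demo loader offering only the existing get_data() method."""

    def __init__(self, files):
        self._files = files

    def get_data(self, path):
        return self._files[path]


class StreamingLoader(BytesOnlyLoader):
    """Demo loader that also offers the assumed get_stream() method."""

    def get_stream(self, path):
        # Stand-in for returning an already-open file object.
        return io.BufferedReader(io.BytesIO(self._files[path]))


files = {"pkg/data.bin": b"\x00\x01\x02"}
wrapped = get_resource_stream(BytesOnlyLoader(files), "pkg/data.bin")
native = get_resource_stream(StreamingLoader(files), "pkg/data.bin")
```

Keeping the two entry points separate lets each caller pay only for what it needs.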

> As for what to call such functions, I really don't know since they are
> essentially abstract functions above the OS which work on whatever storage
> backend a module uses.
>
> For where they should live, it depends if you are viewing this as more of a
> file abstraction or something that ties into modules. For the former it
> seems like shutil or something that dealt with higher order file
> manipulation. If it's the latter I would say importlib.util.

If pkg_resources is in the stdlib that would be a fine place to put it.

--
--Guido van Rossum (python.org/~guido)

Re: Require loaders set __package__ and __loader__

Brett Cannon-2
In reply to this post by Christian Heimes-2


On Sat, Apr 14, 2012 at 18:41, Christian Heimes <[hidden email]> wrote:
> On 15.04.2012 00:32, Guido van Rossum wrote:
>> Funny, I was just thinking about having a simple standard API that
>> will let you open files (and list directories) relative to a given
>> module or package regardless of how the thing is loaded. If we
>> guarantee that there's always a __loader__ that's a first step, though
>> I think we may need to do a little more to get people who currently do
>> things like open(os.path.join(os.path.dirname(__file__),
>> 'some_file_name')) to switch. I was thinking of having a stdlib
>> function that you give a module/package object, a relative filename,
>> and optionally a mode ('b' or 't') and returns a stream -- and sibling
>> functions that return a string or bytes object (depending on what API
>> the user is using either the stream or the data can be more useful).
>> What would we call those functions and where would they live?
>
> pkg_resources has a similar API [1] that supports dotted names.
> pkg_resources also does some caching for files that aren't stored on a
> local file system (database, ZIP file, you name it). It should be
> trivial to support both dotted names and module instances.


But that raises the question of whether this API should conflate module hierarchies with file directories. Are we trying to support reading files from within packages without caring about storage details but still fundamentally working with files, or are we trying to abstract away the concept of files and deal more with stored bytes inside packages? For the former you would essentially want the root package and then simply specify some file path. But for the latter you would want the module or package that is next to or containing the data and grab it from there.

And I just realized that we would have to be quite clear that for namespace packages it is what is in __file__ that people care about, else people might expect some search to be performed on their behalf. Namespace packages also dictate that you would want the module closest to the data in the hierarchy to make sure you went down the right directory (e.g. if you had the namespace package monty with modules spam and bacon but from different directories, you really want to make sure you grab the right module). I would argue that you can only go next to/within modules/packages; going up would just cause confusion on where you were grabbing from and going down could be done but makes things a little messier.

-Brett
 
> Christian
>
> [1]
> http://packages.python.org/distribute/pkg_resources.html#resourcemanager-api


Re: Require loaders set __package__ and __loader__

Brett Cannon-2
In reply to this post by Guido van Rossum


On Sat, Apr 14, 2012 at 18:56, Guido van Rossum <[hidden email]> wrote:
> On Sat, Apr 14, 2012 at 3:50 PM, Brett Cannon <[hidden email]> wrote:
>> On Sat, Apr 14, 2012 at 18:32, Guido van Rossum <[hidden email]> wrote:
>>> Funny, I was just thinking about having a simple standard API that
>>> will let you open files (and list directories) relative to a given
>>> module or package regardless of how the thing is loaded. If we
>>> guarantee that there's always a __loader__ that's a first step, though
>>> I think we may need to do a little more to get people who currently do
>>> things like open(os.path.join(os.path.dirname(__file__),
>>> 'some_file_name')) to switch. I was thinking of having a stdlib
>>> function that you give a module/package object, a relative filename,
>>> and optionally a mode ('b' or 't') and returns a stream -- and sibling
>>> functions that return a string or bytes object (depending on what API
>>> the user is using either the stream or the data can be more useful).
>>> What would we call those functions and where would they live?
>
>> IOW go one level lower than get_data() and return the stream and then just
>> have helper functions which I guess just exhaust the stream for you to
>> return bytes or str? Or are you thinking that somehow providing a function
>> that can get an explicit bytes or str object will be more optimized than
>> doing something with the stream? Either way you will need new methods on
>> loaders to make it work more efficiently since loaders only have get_data()
>> which returns bytes and not a stream object. Plus there is currently no API
>> for listing the contents of a directory.
>
> Well, if it's a real file, and you need a stream, that's efficient,
> and if you need the data, you can read it. But if it comes from a
> loader, and you need a stream, you'd have to wrap it in a StringIO
> instance. So having two APIs, one to get a stream, and one to get the
> data, allows the implementation to be more optimal -- it would be bad
> to wrap a StringIO instance around data only so you can read the data
> from the stream again...

Right, so the loader API would need to grow, which is fine and can be done in a backwards-compatible way using io.BytesIO and io.StringIO.
 

>> As for what to call such functions, I really don't know since they are
>> essentially abstract functions above the OS which work on whatever storage
>> backend a module uses.
>>
>> For where they should live, it depends if you are viewing this as more of a
>> file abstraction or something that ties into modules. For the former it
>> seems like shutil or something that dealt with higher order file
>> manipulation. If it's the latter I would say importlib.util.
>
> If pkg_resources is in the stdlib that would be a fine place to put it.

It's not.

-Brett
 


Re: Require loaders set __package__ and __loader__

Christian Heimes-2
In reply to this post by Guido van Rossum
On 15.04.2012 00:56, Guido van Rossum wrote:
> Well, if it's a real file, and you need a stream, that's efficient,
> and if you need the data, you can read it. But if it comes from a
> loader, and you need a stream, you'd have to wrap it in a StringIO
> instance. So having two APIs, one to get a stream, and one to get the
> data, allows the implementation to be more optimal -- it would be bad
> to wrap a StringIO instance around data only so you can read the data
> from the stream again...

We need a third way to access a file. The two methods get_data() and
get_stream() aren't sufficient for libraries that need a real file that
lives on the file system. In order to have real files the loader (or
some other abstraction layer) needs to create a temporary directory for
the current process and clean it up when the process ends. The file is
saved to the temporary directory the first time it's accessed.

The get_file() feature has a neat benefit. Since it transparently
extracts files from the loader, users can ship binary extensions and
shared libraries (dlls) in a ZIP file and use them without too much hassle.
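A rough sketch of the extraction scheme described above, assuming a module-level helper rather than a loader method: `get_file`, the cache variables and `_DemoLoader` are all hypothetical names. The data is written to a per-process temporary directory the first time it is requested, later requests reuse the extracted copy, and the directory is removed when the process exits.

```python
import atexit
import os
import shutil
import tempfile

_extract_dir = None   # per-process temporary directory, created lazily
_extracted = {}       # (loader, path) -> extracted filesystem path


def get_file(loader, path):
    """Return a real filesystem path holding the data at `path`."""
    global _extract_dir
    key = (loader, path)
    if key in _extracted:
        return _extracted[key]
    if _extract_dir is None:
        _extract_dir = tempfile.mkdtemp(prefix="loader-files-")
        # Clean up the whole directory when the process ends.
        atexit.register(shutil.rmtree, _extract_dir, ignore_errors=True)
    # Give every extracted file its own directory so the original
    # base name survives and cannot collide with another resource.
    target_dir = tempfile.mkdtemp(dir=_extract_dir)
    target = os.path.join(target_dir, os.path.basename(path))
    with open(target, "wb") as f:
        f.write(loader.get_data(path))
    _extracted[key] = target
    return target


class _DemoLoader:
    """Demo loader standing in for a zip- or database-backed one."""

    def get_data(self, path):
        return b"binary payload for " + path.encode("utf-8")


demo_loader = _DemoLoader()
first = get_file(demo_loader, "pkg/libdemo.so")
second = get_file(demo_loader, "pkg/libdemo.so")
```

Preserving the base name matters for shared libraries, which are often looked up by file name.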

Christian

Re: Require loaders set __package__ and __loader__

Guido van Rossum
On Sat, Apr 14, 2012 at 5:06 PM, Christian Heimes <[hidden email]> wrote:

> On 15.04.2012 00:56, Guido van Rossum wrote:
>> Well, if it's a real file, and you need a stream, that's efficient,
>> and if you need the data, you can read it. But if it comes from a
>> loader, and you need a stream, you'd have to wrap it in a StringIO
>> instance. So having two APIs, one to get a stream, and one to get the
>> data, allows the implementation to be more optimal -- it would be bad
>> to wrap a StringIO instance around data only so you can read the data
>> from the stream again...
>
> We need a third way to access a file. The two methods get_data() and
> get_stream() aren't sufficient for libraries that need a real file that
> lives on the file system. In order to have real files the loader (or
> some other abstraction layer) needs to create a temporary directory for
> the current process and clean it up when the process ends. The file is
> saved to the temporary directory the first time it's accessed.

Hm... Can you give an example of a library that needs a real file?
That sounds like a poorly designed API.

Perhaps you're talking about APIs that take a filename instead of a
stream? Maybe for those it would be best to start getting serious
about a virtual filesystem... (Sorry, probably python-ideas stuff).

> The get_file() feature has a neat benefit. Since it transparently
> extracts files from the loader, users can ship binary extensions and
> shared libraries (dlls) in a ZIP file and use them without too much hassle.

Yeah, DLLs are about the only example I can think of where even a
virtual filesystem doesn't help...

--
--Guido van Rossum (python.org/~guido)

Re: Require loaders set __package__ and __loader__

Nick Coghlan
On Sun, Apr 15, 2012 at 12:59 PM, Guido van Rossum <[hidden email]> wrote:
> Hm... Can you give an example of a library that needs a real file?
> That sounds like a poorly designed API.

If you're invoking a separate utility (e.g. via its command line
interface), you may need a real filesystem path that you can pass
along.

>> The get_file() feature has a neat benefit. Since it transparently
>> extracts files from the loader, users can ship binary extensions and
>> shared libraries (dlls) in a ZIP file and use them without too much hassle.
>
> Yeah, DLLs are about the only example I can think of where even a
> virtual filesystem doesn't help...

An important example, though. However, I still don't believe it is
something we should necessarily be rushing into implementing in the
standard library in the *same* release that finally completes the
conversion started so long ago with PEP 302.

Cheers,
Nick.

--
Nick Coghlan   |   [hidden email]   |   Brisbane, Australia

Re: Require loaders set __package__ and __loader__

Nick Coghlan
In reply to this post by Guido van Rossum
On Sun, Apr 15, 2012 at 8:32 AM, Guido van Rossum <[hidden email]> wrote:

> Funny, I was just thinking about having a simple standard API that
> will let you open files (and list directories) relative to a given
> module or package regardless of how the thing is loaded. If we
> guarantee that there's always a __loader__ that's a first step, though
> I think we may need to do a little more to get people who currently do
> things like open(os.path.join(os.path.dirname(__file__),
> 'some_file_name')) to switch. I was thinking of having a stdlib
> function that you give a module/package object, a relative filename,
> and optionally a mode ('b' or 't') and returns a stream -- and sibling
> functions that return a string or bytes object (depending on what API
> the user is using either the stream or the data can be more useful).
> What would we call those functions and where would they live?

We already offer pkgutil.get_data() for the latter API:
http://docs.python.org/library/pkgutil#pkgutil.get_data
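For reference, a small self-contained demonstration of that existing function. The package name `demo_resources_pkg` and the file `greeting.txt` are made up for the example; `pkgutil.get_data(package, resource)` itself is the documented stdlib call and returns bytes.

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk; the names are made up for the demo.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_resources_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "greeting.txt"), "wb") as f:
    f.write(b"hello\n")

sys.path.insert(0, root)
try:
    # Imports the package if necessary, then asks its loader for the bytes.
    data = pkgutil.get_data("demo_resources_pkg", "greeting.txt")
finally:
    sys.path.remove(root)
```

The resource name is given relative to the package, so the caller never touches ``__file__``.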

There's no get_file() or get_filename() equivalent, since there's no
relevant API formally defined for PEP 302 loader objects (the closest
we have is get_filename(), which is only defined for the actual module
objects, not for arbitrary colocated files).

Now that importlib is the official import implementation, and is fully
PEP 302 compliant, large sections of pkgutil should either be
deprecated (the import emulation) or updated to be thin wrappers
around importlib (the package walking components and other utility
functions).

Cheers,
Nick.

--
Nick Coghlan   |   [hidden email]   |   Brisbane, Australia

Re: Require loaders set __package__ and __loader__

Glyph Lefkowitz
In reply to this post by Guido van Rossum

On Apr 14, 2012, at 3:32 PM, Guido van Rossum wrote:

> Funny, I was just thinking about having a simple standard API that
> will let you open files (and list directories) relative to a given
> module or package regardless of how the thing is loaded.

Twisted has such a thing, mostly written by me, called twisted.python.modules.

Sorry if I'm repeating myself here, I know I've brought it up on this list before, but it seems germane to this thread.  I'd be interested in getting feedback from the import-wizards participating in this thread in case it is doing anything bad (in particular I'd like to make sure it will keep working in future versions of Python), but I think it may provide quite a good template for a standard API.


The API is fairly simple.

>>> from twisted.python.modules import getModule
>>> e = getModule("email") # get an abstract "module" object (un-loaded)
>>> e
PythonModule<'email'>
>>> walker = e.walkModules() # walk the module hierarchy
>>> walker.next()
PythonModule<'email'>
>>> walker.next()
PythonModule<'email._parseaddr'>
>>> walker.next() # et cetera
PythonModule<'email.base64mime'>
>>> charset = e["charset"] # get the 'charset' child module of the 'e' package
>>> charset.filePath
FilePath('.../lib/python2.7/email/charset.py')
>>> charset.filePath.parent().children() # list the directory containing charset.py

Worth pointing out is that although in this example it's a FilePath, it could also be a ZipPath if you imported stuff from a zipfile.  We have an adapter that inspects path_importer_cache and produces appropriately-shaped filesystem-like objects depending on where your module was imported from.  Thank you to authors of PEP 302; that was my religion while writing this code.

You can also, of course, ask to load something once you've identified it with the traversal API:

>>> charset.load()
<module 'email.charset' from '.../lib/python2.7/email/charset.pyc'>

You can also ask questions like this, which are very useful when debugging setup problems:

>>> ifaces = getModule("twisted.internet.interfaces")
>>> ifaces.pathEntry
PathEntry<FilePath('/Domicile/glyph/Projects/Twisted/trunk')>
>>> list(ifaces.pathEntry.iterModules())
[PythonModule<'setup'>, PythonModule<'twisted'>]

This asks what sys.path entry is responsible for twisted.internet.interfaces, and then what other modules could be loaded from there.  Just 'setup' and 'twisted' indicates that this is a development install (not surprising for one of my computers), since site-packages would be much more crowded.

The idiom for saying "there's a file installed near this module, and I'd like to grab it as a string", is pretty straightforward:

from twisted.python.modules import getModule
# 'data' holds the contents of "my-file", which sits next to this module
data = getModule(__name__).filePath.sibling("my-file").open().read()

And hopefully it's obvious from this idiom how one might get the pathname, or a stream rather than the bytes.
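
For comparison, here is a rough standard-library analogue of the idiom above. It is not the Twisted API: it swaps in pkgutil.get_data, which asks the package's loader for the bytes, so the same call works whether the package lives on disk or in a zip file. The names demo_pkg, demo_bundle.zip and read_bundled_text are made up for this sketch, and the zip is built on the fly only so the example is self-contained.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_bundled_text():
    """Build a throwaway zip holding a package plus a data file, then
    read the data file back through the package's loader."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical archive and package names, used only for this demo.
        archive = os.path.join(workdir, "demo_bundle.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("demo_pkg/__init__.py", "")
            zf.writestr("demo_pkg/my-file", "hello from inside the zip")

        # Putting the archive on sys.path lets zipimport find demo_pkg.
        sys.path.insert(0, archive)
        try:
            # pkgutil.get_data imports the package and delegates to
            # its loader's get_data(); the result is bytes.
            return pkgutil.get_data("demo_pkg", "my-file")
        finally:
            # Undo the import-system changes so the demo leaves no trace.
            sys.path.remove(archive)
            sys.modules.pop("demo_pkg", None)
            sys.path_importer_cache.pop(archive, None)
```

Calling read_bundled_text() returns the bytes written into the archive. The caller never touches a filesystem path for the data file itself, which is the property being discussed in this thread.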

-glyph


Re: Require loaders set __package__ and __loader__

Glyph Lefkowitz
In reply to this post by Guido van Rossum

On Apr 14, 2012, at 7:59 PM, Guido van Rossum wrote:

> On Sat, Apr 14, 2012 at 5:06 PM, Christian Heimes <[hidden email]> wrote:
>> Am 15.04.2012 00:56, schrieb Guido van Rossum:
>>> Well, if it's a real file, and you need a stream, that's efficient,
>>> and if you need the data, you can read it. But if it comes from a
>>> loader, and you need a stream, you'd have to wrap it in a StringIO
>>> instance. So having two APIs, one to get a stream, and one to get the
>>> data, allows the implementation to be more optimal -- it would be bad
>>> to wrap a StringIO instance around data only so you can read the data
>>> from the stream again...
>>
>> We need a third way to access a file. The two methods get_data() and
>> get_stream() aren't sufficient for libraries that need a real file that
>> lives on the file system. In order to have real files the loader (or
>> some other abstraction layer) needs to create a temporary directory for
>> the current process and clean it up when the process ends. The file is
>> saved to the temporary directory the first time it's accessed.
>
> Hm... Can you give an example of a library that needs a real file?
> That sounds like a poorly designed API.

Lots of C libraries use filenames or FILE*s where they _should_ be using some much more abstract things; i.e., constellations of function pointers that are isomorphic to Python's "file-like objects".  Are these APIs poorly designed?  Sure, but they also exist ;).

> Perhaps you're talking about APIs that take a filename instead of a
> stream? Maybe for those it would be best to start getting serious
> about a virtual filesystem... (Sorry, probably python-ideas stuff).

twisted.python.filepath... ;-)

>> The get_file() feature has a neat benefit. Since it transparently
>> extracts files from the loader, users can ship binary extensions and
>> shared libraries (dlls) in a ZIP file and use them without too much hassle.
>
> Yeah, DLLs are about the only example I can think of where even a
> virtual filesystem doesn't help...

In a previous life, I was frequently exposed to proprietary game-engine things that could only load resources (3D models, audio files, textures) from actual real files, and I had to do lots of unpacking stuff either from things tacked on to a .exe or inside a zip file.  (I don't know how common this is any more in that world but I suspect "very".)

Unfortunately, all the examples I can think of off the top of my head were in proprietary, now-defunct code. Smoothing over this kind of thing is exactly the sort of polish that open-sourcing tends to apply, so I would guess that code with this problem is mostly invisible to us.
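
To make the "real file" case from this exchange concrete, here is a minimal sketch of the extract-on-first-access idea: the bytes are written to a per-process temporary directory the first time they are requested, and the directory is removed when the process exits. No such get_file() exists in the standard library; the function name, its signature, and the flat-file naming are assumptions made for illustration.

```python
import atexit
import os
import shutil
import tempfile

_extract_dir = None  # created lazily, one per process


def _get_extract_dir():
    """Create the per-process extraction directory on first use and
    schedule its removal at interpreter exit."""
    global _extract_dir
    if _extract_dir is None:
        _extract_dir = tempfile.mkdtemp(prefix="resources-")
        atexit.register(shutil.rmtree, _extract_dir, ignore_errors=True)
    return _extract_dir


def get_file(name, read_data):
    """Hypothetical helper: return a filesystem path for a resource.

    read_data is a zero-argument callable returning the resource bytes
    (for example, a lambda wrapping a loader's get_data call). It is
    only invoked the first time the resource is requested. name must be
    a plain file name; subdirectories are not handled in this sketch.
    """
    path = os.path.join(_get_extract_dir(), name)
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(read_data())
    return path
```

The returned path can be handed to a C library that insists on a filename. Taking a callable rather than the bytes themselves means a second request for the same resource does not load the data again.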

-glyph

Re: Require loaders set __package__ and __loader__

Barry Warsaw
In reply to this post by Guido van Rossum
On Apr 14, 2012, at 03:32 PM, Guido van Rossum wrote:

>Funny, I was just thinking about having a simple standard API that
>will let you open files (and list directories) relative to a given
>module or package regardless of how the thing is loaded.

I tend to use the "basic resource access" API of pkg_resources.

http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access

I'm not suggesting that we adopt all of pkg_resources, but I think the 5
functions listed there, plus resource_filename() (from the next section)
provide basic functionality I've found very useful.

-Barry

Re: Require loaders set __package__ and __loader__

Barry Warsaw
In reply to this post by Glyph Lefkowitz
On Apr 15, 2012, at 02:12 PM, Glyph wrote:

>Twisted has such a thing, mostly written by me, called
>twisted.python.modules.
>
>Sorry if I'm repeating myself here, I know I've brought it up on this list
>before, but it seems germane to this thread.  I'd be interested in getting
>feedback from the import-wizards participating in this thread in case it is
>doing anything bad (in particular I'd like to make sure it will keep working
>in future versions of Python), but I think it may provide quite a good
>template for a standard API.
>
>The code's here: <http://twistedmatrix.com/trac/browser/trunk/twisted/python/modules.py>
>
>The API is fairly simple.
>
>>>> from twisted.python.modules import getModule
>>>> e = getModule("email") # get an abstract "module" object (un-loaded)

Got a PEP 8 friendly version? :)

-Barry

Re: Require loaders set __package__ and __loader__

Glyph Lefkowitz

On Apr 15, 2012, at 6:38 PM, Barry Warsaw wrote:

> On Apr 15, 2012, at 02:12 PM, Glyph wrote:
>
>> Twisted has such a thing, mostly written by me, called
>> twisted.python.modules.
>>
>> Sorry if I'm repeating myself here, I know I've brought it up on this list
>> before, but it seems germane to this thread.  I'd be interested in getting
>> feedback from the import-wizards participating in this thread in case it is
>> doing anything bad (in particular I'd like to make sure it will keep working
>> in future versions of Python), but I think it may provide quite a good
>> template for a standard API.
>>
>> The code's here: <http://twistedmatrix.com/trac/browser/trunk/twisted/python/modules.py>
>>
>> The API is fairly simple.
>>
>>>>> from twisted.python.modules import getModule
>>>>> e = getModule("email") # get an abstract "module" object (un-loaded)
>
> Got a PEP 8 friendly version? :)

No, but I'd be happy to do the translation manually if people actually prefer the shape of this API!

I am just pointing it out as a source of inspiration for whatever comes next, which I assume will be based on pkg_resources.

-glyph

Re: Require loaders set __package__ and __loader__

Brett Cannon-2
In reply to this post by Brett Cannon-2
Anyone other than Eric have something to say on this proposal? Obviously the discussion went tangential before I saw a clear consensus that what I was proposing was fine with people.

On Sat, Apr 14, 2012 at 16:56, Brett Cannon <[hidden email]> wrote:
An open issue in PEP 302 is whether to require __loader__ attributes on modules. The claimed worry is memory consumption, but considering importlib and zipimport are already doing this that seems like a red herring. Requiring it, though, opens the door to people relying on its existence and thus starting to do things like loading assets with ``__loader__.get_data(path_to_internal_package_file)`` which allows code to not care how modules are stored (e.g. zip file, sqlite database, etc.).

What I would like to do is update the PEP to state that loaders are expected to set __loader__. Now importlib will get updated to do that implicitly so external code can expect it post-import, but requiring loaders to set it would mean that code executed during import can rely on it as well.

As for __package__, PEP 366 states that modules should set it but it isn't referenced by PEP 302. What I want to do is add a reference and make it required like __loader__. Importlib already sets it implicitly post-import, but once again it would be nice to do this pre-import.

To help facilitate both new requirements, I would update the importlib.util.module_for_loader decorator to set both on a module that doesn't have them before passing the module down to the decorated method. That way people already using the decorator don't have to worry about anything and it is one less detail to have to worry about. I would also update the docs on importlib.util.set_package and importlib.util.set_loader to suggest people use importlib.util.module_for_loader and only use the other two decorators for backwards-compatibility.
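
A minimal sketch of the behaviour proposed above, assuming a hand-written decorator rather than the real importlib.util.module_for_loader (whose actual implementation is not shown in this thread): the decorator fills in __loader__ and __package__ before the decorated method runs, so code executed during import can already rely on both. The names module_with_import_attrs and DictLoader are hypothetical.

```python
import functools
import sys
import types


def module_with_import_attrs(load_module):
    """Hypothetical decorator: hand the wrapped method a module that
    already has __loader__ and __package__ set."""
    @functools.wraps(load_module)
    def wrapper(self, fullname):
        module = sys.modules.get(fullname)
        is_reload = module is not None
        if not is_reload:
            module = types.ModuleType(fullname)
            sys.modules[fullname] = module
        # Only fill in attributes that are missing or still None.
        if getattr(module, "__loader__", None) is None:
            module.__loader__ = self
        if getattr(module, "__package__", None) is None:
            if self.is_package(fullname):
                module.__package__ = fullname
            else:
                module.__package__ = fullname.rpartition(".")[0]
        try:
            return load_module(self, module)
        except BaseException:
            # Do not leave a half-initialised module behind.
            if not is_reload:
                sys.modules.pop(fullname, None)
            raise
    return wrapper


class DictLoader:
    """Toy loader that executes module source held in a dict."""

    def __init__(self, sources):
        self.sources = sources

    def is_package(self, fullname):
        return False  # this toy loader only serves plain modules

    @module_with_import_attrs
    def load_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)
        return module
```

With this in place, module source such as `seen = (__package__, __loader__ is not None)` can read both names while it is being executed, which is the pre-import guarantee the proposal asks loaders to provide.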



Re: Require loaders set __package__ and __loader__

Andrew Svetlov
+1 for initial proposition.

On Tue, Apr 17, 2012 at 6:59 PM, Brett Cannon <[hidden email]> wrote:

> Anyone other than Eric have something to say on this proposal? Obviously the
> discussion went tangential before I saw a clear consensus that what I was
> proposing was fine with people.
>
>
> On Sat, Apr 14, 2012 at 16:56, Brett Cannon <[hidden email]> wrote:
>>
>> An open issue in PEP 302 is whether to require __loader__ attributes on
>> modules. The claimed worry is memory consumption, but considering importlib
>> and zipimport are already doing this that seems like a red herring.
>> Requiring it, though, opens the door to people relying on its existence and
>> thus starting to do things like loading assets with
>> ``__loader__.get_data(path_to_internal_package_file)`` which allows code to
>> not care how modules are stored (e.g. zip file, sqlite database, etc.).
>>
>> What I would like to do is update the PEP to state that loaders are
>> expected to set __loader__. Now importlib will get updated to do that
>> implicitly so external code can expect it post-import, but requiring loaders
>> to set it would mean that code executed during import can rely on it as
>> well.
>>
>> As for __package__, PEP 366 states that modules should set it but it isn't
>> referenced by PEP 302. What I want to do is add a reference and make it
>> required like __loader__. Importlib already sets it implicitly post-import,
>> but once again it would be nice to do this pre-import.
>>
>> To help facilitate both new requirements, I would update the
>> importlib.util.module_for_loader decorator to set both on a module that
>> doesn't have them before passing the module down to the decorated method.
>> That way people already using the decorator don't have to worry about
>> anything and it is one less detail to have to worry about. I would also
>> update the docs on importlib.util.set_package and importlib.util.set_loader
>> to suggest people use importlib.util.module_for_loader and only use the
>> other two decorators for backwards-compatibility.



--
Thanks,
Andrew Svetlov

Re: Require loaders set __package__ and __loader__

Nick Coghlan
In reply to this post by Brett Cannon-2

+1 here. Previously, it wasn't a reasonable requirement, since CPython itself didn't comply with it.

--
Sent from my phone, thus the relative brevity :)

