Adding a builtins parameter to eval(), exec() and __import__().

Mark Shannon
I propose adding an optional (keyword-only?) 3rd parameter, "builtins"
to exec(), eval(), __import__() and any other functions that take
locals and globals as parameters.

Currently, Python code is evaluated in a context of three namespaces:
locals(), globals() and builtins.
However, eval & exec only take 2 (optional) namespaces as parameters
(locals and globals), so access to builtins is poorly defined.
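To make the current behaviour concrete, here is a small sketch (CPython 3.x assumed). exec() and eval() accept only globals and locals. The builtins they use are whatever the globals mapping holds under the '__builtins__' key, and exec() inserts that key automatically when it is absent:

```python
# Sketch of current CPython behaviour: builtins are reached only through
# the '__builtins__' key of the globals mapping passed to exec()/eval().

# 1. No '__builtins__' key: exec() inserts a reference to the current builtins.
ns = {}
exec("n = len('abc')", ns)
print("__builtins__" in ns, ns["n"])  # True 3

# 2. A restricted builtins dict, supplied by hand inside the globals mapping.
restricted = {"__builtins__": {"len": len}}
print(eval("len('abcd')", restricted))  # 4
try:
    eval("open('data.txt')", restricted)
except NameError as exc:
    print("blocked:", exc)  # blocked: name 'open' is not defined
```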

The reason I am proposing this here rather than on python-ideas is that
treating the triple of [locals, globals, builtins] as a single
"execution context" can be implemented in a really nice way.

Internally, the execution context of [locals, globals, builtins]
can be treated as a single immutable object (custom object or tuple).
Treating it as immutable means that it can be copied merely by taking a
reference. A nice trick in the implementation is to make a NULL locals
mean "fast" locals for function contexts. Frames could then acquire
their globals and builtins by a single reference copy from the function
object, rather than searching globals for a '__builtins__'
to find the builtins.
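A rough Python model of the idea, purely illustrative: `ExecutionContext` and `lookup()` are hypothetical names, not part of CPython or HotPy. The triple is an immutable NamedTuple, so "copying" the context into a new frame is just sharing a reference, and `locals=None` stands in for the fast locals of a function:

```python
from typing import Any, Mapping, NamedTuple, Optional

class ExecutionContext(NamedTuple):
    """Hypothetical immutable [locals, globals, builtins] triple."""
    locals: Optional[Mapping[str, Any]]  # None models "fast" function locals
    globals: dict
    builtins: Mapping[str, Any]

    def lookup(self, name: str) -> Any:
        # Name resolution order: locals -> globals -> builtins.
        for scope in (self.locals, self.globals, self.builtins):
            if scope is not None and name in scope:
                return scope[name]
        raise NameError(f"name {name!r} is not defined")

module_ctx = ExecutionContext(None, {"answer": 42}, {"len": len})

# A frame for a function defined in this module acquires its globals and
# builtins with one reference copy, not a search for '__builtins__'.
frame_ctx = module_ctx
print(frame_ctx is module_ctx)        # True
print(frame_ctx.lookup("answer"))     # 42
print(frame_ctx.lookup("len")("ab"))  # 2
```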

A unified execution context will speed up all calls
(to Python functions) as frame allocation and deallocation would be faster.
I used this implementation in my original HotPy VM, and it worked well.

It should also help with sandboxing, as it would make it easier to
analyse and thus control access to builtins, since the execution context
of all code would be easier to determine.

Currently, it is impossible to allow one function access to sensitive
functions like open(), while denying it to others, as any code can then
get the builtins of another function via f.__globals__['__builtins__'].
Separating builtins from globals could solve this.
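The leak can be reproduced in a few lines of current CPython. `read_config` is a hypothetical trusted helper whose namespace has the full builtins. Code evaluated with an empty set of builtins cannot name open() directly, but it can still reach it through the helper's `__globals__`:

```python
import builtins

# A "trusted" helper, defined in a namespace that has the full builtins.
trusted_ns = {"__builtins__": builtins.__dict__}
exec("def read_config(path):\n    return open(path).read()\n", trusted_ns)
read_config = trusted_ns["read_config"]

# "Untrusted" code gets no builtins at all, only a reference to the helper.
sandbox_ns = {"__builtins__": {}, "read_config": read_config}

try:
    eval("open", sandbox_ns)
except NameError:
    print("open() is not directly visible")

# ...but the helper's globals hand over the full builtins anyway.
leaked_open = eval("read_config.__globals__['__builtins__']['open']", sandbox_ns)
print(leaked_open is builtins.open)  # True
```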

Cheers,
Mark.
_______________________________________________
Python-Dev mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%2B1324100855712-1801473%40n6.nabble.com

Re: Adding a builtins parameter to eval(), exec() and __import__().

Benjamin Peterson
2012/3/7 Mark Shannon <[hidden email]>:
> Currently, it is impossible to allow one function access to sensitive
> functions like open(), while denying it to others, as any code can then
> get the builtins of another function via f.__globals__['__builtins__'].
> Separating builtins from globals could solve this.

I like this idea. We could finally kill __builtins__, too, which has
often been confusing for people.


--
Regards,
Benjamin

Re: Adding a builtins parameter to eval(), exec() and __import__().

Brett Cannon


On Wed, Mar 7, 2012 at 10:56, Benjamin Peterson <[hidden email]> wrote:
> 2012/3/7 Mark Shannon <[hidden email]>:
>> Currently, it is impossible to allow one function access to sensitive
>> functions like open(), while denying it to others, as any code can then
>> get the builtins of another function via f.__globals__['__builtins__'].
>> Separating builtins from globals could solve this.
>
> I like this idea. We could finally kill __builtins__, too, which has
> often been confusing for people.

I like it as well. It's a mess right now to try to grab the __import__() implementation and this would actually help clarify import semantics by saying that __import__() for any chained imports comes from __import__()'s locals, globals, or builtins arguments (in that order) or from the builtins module itself (i.e. tstate->builtins).
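For reference, in current CPython the import statement already fetches `__import__` from the builtins attached to the executing code's globals, which is the hook being described here. A small sketch; `tracing_import` is an illustrative name, not an existing API:

```python
import builtins

imported = []

def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Record the request, then delegate to the real import machinery.
    imported.append(name)
    return builtins.__import__(name, globals, locals, fromlist, level)

# A copy of the real builtins with __import__ swapped out.
custom_builtins = dict(builtins.__dict__, __import__=tracing_import)
ns = {"__builtins__": custom_builtins}

exec("import math\nroot = math.sqrt(16)", ns)
print(imported, ns["root"])  # ['math'] 4.0
```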


Adding a builtins parameter to eval(), exec() and __import__().

Jim Jewett


http://mail.python.org/pipermail/python-dev/2012-March/117395.html
Brett Cannon posted:

[in reply to Mark Shannon's suggestion of adding a builtins parameter
to match locals and globals]

> It's a mess right now to try to grab the __import__()
> implementation and this would actually help clarify import semantics by
> saying that __import__() for any chained imports comes from __import__()'s
> locals, globals, or builtins arguments (in that order) or from the builtins
> module itself (i.e. tstate->builtins).

How does that differ from today?

If you're saying that the locals and (module-level) globals aren't
always checked in order, then that is a semantic change.  Probably
a good change, but still a change -- and it can be made independently
of Mark's suggestion.

Also note that I would assume this was for sandboxing, and that
missing names should *not* fall back to the "real" globals, although
I would understand if bootstrapping required the import statement to
get special treatment.


(Note that I like Mark's proposed change; I just don't see how it
cleans up import.)


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ


Re: Adding a builtins parameter to eval(), exec() and __import__().

Mark Shannon
Jim J. Jewett wrote:

>
> http://mail.python.org/pipermail/python-dev/2012-March/117395.html
> Brett Cannon posted:
>
> [in reply to Mark Shannon's suggestion of adding a builtins parameter
> to match locals and globals]
>
>> It's a mess right now to try to grab the __import__()
>> implementation and this would actually help clarify import semantics by
>> saying that __import__() for any chained imports comes from __import__()'s
>> locals, globals, or builtins arguments (in that order) or from the builtins
>> module itself (i.e. tstate->builtins).
>
> How does that differ from today?

The idea is that you can change, presumably restrict, the builtins
separately from the globals for an import.

>
> If you're saying that the locals and (module-level) globals aren't
> always checked in order, then that is a semantic change.  Probably
> a good change, but still a change -- and it can be made independently
> of Mark's suggestion.
>
> Also note that I would assume this was for sandboxing,

Actually, I just think it's a cleaner implementation,
but sandboxing is a good excuse :)

> and that
> missing names should *not* fall back to the "real" globals, although
> I would understand if bootstrapping required the import statement to
> get special treatment.
>
>
> (Note that I like Mark's proposed change; I just don't see how it
> cleans up import.)

I don't think it cleans up import, but I'll defer to Brett on that.
I've included __import__() along with exec and eval as it is a place
where new namespaces can be introduced into an execution.
There may be others I haven't thought of.

Cheers,
Mark.

Re: Adding a builtins parameter to eval(), exec() and __import__().

Nick Coghlan
On Thu, Mar 8, 2012 at 10:06 PM, Mark Shannon <[hidden email]> wrote:
> I don't think it cleans up import, but I'll defer to Brett on that.
> I've included __import__() along with exec and eval as it is a place where
> new namespaces can be introduced into an execution.
> There may be others I haven't thought of.

runpy is another one.

However, the problem I see with "builtins" as a separate argument is
that it would be a lie.

The element that's most interesting about locals vs globals vs
builtins is the scope of visibility of their contents.

When I call out to another function in the same module, locals are not
shared, but globals and builtins are.

When I call out to code in a *different* module, neither locals nor
globals are shared, but builtins are still common.
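These sharing rules can be checked directly in current CPython. A sketch using two throwaway module objects (`mod_a` and `mod_b` are illustrative names) created with `types.ModuleType`:

```python
import types

mod_a = types.ModuleType("mod_a")
mod_b = types.ModuleType("mod_b")

# exec() inserts the current builtins into each module's namespace.
exec("def f():\n    return len\n", mod_a.__dict__)
exec("def g():\n    return len\n", mod_b.__dict__)

# Different modules: globals are not shared...
print(mod_a.f.__globals__ is mod_b.g.__globals__)  # False
# ...but both resolve 'len' through the same builtins namespace.
print(mod_a.f() is mod_b.g())  # True
```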

So there are two ways this purported extra "builtins" parameter could work:

1. Sandboxing - you try to genuinely give the execution context a
different set of builtins that's shared by all code executed, even
imports from other modules.  However, I assume this isn't what you
meant, since it is the domain of sandboxing utilities like Victor's
pysandbox and is known to be incredibly difficult to get right (hence
the demise of both rexec and Bastion and recent comments about known
segfault vulnerabilities that are tolerable in the normal case of
merely processing untrusted data with trusted code but anathema to a
robust CPython native sandboxing scheme that can still cope even when
the code itself is untrusted).

2. chained globals - just an extra namespace that's chained behind the
globals dictionary for name lookup, not actually shared with code
invoked from other modules.

The second approach is potentially useful, but:

1. "builtins" is *not* the right name for it (because any other code
invoked will still be using the original builtins)
2. it's already trivial to achieve such chained lookups in 3.3 by
passing a collections.ChainMap instance as the globals parameter:
http://docs.python.org/dev/library/collections#collections.ChainMap

collections.ChainMap also has the virtue of working with any current
API that accepts a globals argument and can be extended to an
arbitrary level of chaining, whereas this suggestion requires that all
such APIs be expanded to accept a third parameter, and could still
only chain lookups one additional step in doing so.
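One caveat worth noting when trying this: exec() and eval() require *globals* to be a real dict, so in practice the ChainMap has to be passed as the *locals* mapping. A sketch, where `extra` stands for the additional namespace. It also shows that the chained namespace is only consulted for top-level name lookups; functions defined in that code resolve names through their real globals:

```python
from collections import ChainMap

extra = {"greeting": "hello"}        # the additional namespace
module_globals = {"name": "world"}

# exec() rejects a non-dict globals argument...
try:
    exec("pass", ChainMap({}, extra))
except TypeError as exc:
    print("TypeError:", exc)

# ...but accepts any mapping as locals. Writes go to the first map.
scope = ChainMap({}, extra)
exec("msg = greeting + ' ' + name", module_globals, scope)
print(scope["msg"])  # hello world

# Functions defined there look names up in their real globals only.
exec("def f():\n    return greeting\n", module_globals, scope)
try:
    scope["f"]()
except NameError as exc:
    print("NameError:", exc)
```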

So a big -1 from me.

Cheers,
Nick.

P.S. I've referenced this talk before, but Tim Dawborn's effort from
PyCon AU last year about the sandboxing setup for
http://www.ncss.edu.au/ should be required viewing for anyone wanting
to understand the kind of effort it takes to fairly comprehensively
protect host servers from attacks when executing arbitrary untrusted
Python code on CPython. Implementing such protection is certainly
*possible* (since Tim's talk is all about one way to do it), but it's
not easy, and Tim's approach uses Linux OS level sandboxing rather
than relying on a Python language level sandbox. This was
largely due to a university requirement that the sandbox solution be
language agnostic, but it also serves to protect the sandbox from the
documented attacks against the CPython interpreter. Tim reviews a few
interesting attempts to break the sandbox around the 5 minute mark in
https://www.youtube.com/watch?v=y-WPPdhTKBU. (I did suggest he grab
our test_crashers directory to see what happened when they were run in
the sandbox, but I doubt it would be much more interesting than merely
calling "sys.exit()")

--
Nick Coghlan   |   [hidden email]   |   Brisbane, Australia

Re: Adding a builtins parameter to eval(), exec() and __import__().

Mark Shannon
Nick Coghlan wrote:
> On Thu, Mar 8, 2012 at 10:06 PM, Mark Shannon <[hidden email]> wrote:
>> I don't think it cleans up import, but I'll defer to Brett on that.
>> I've included __import__() along with exec and eval as it is a place where
>> new namespaces can be introduced into an execution.
>> There may be others I haven't thought of.
>
> runpy is another one.

Add that to the list.

>
> However, the problem I see with "builtins" as a separate argument is
> that it would be a lie.
>
> The element that's most interesting about locals vs globals vs
> builtins is the scope of visibility of their contents.
>
> When I call out to another function in the same module, locals are not
> shared, but globals and builtins are.
>
> When I call out to code in a *different* module, neither locals nor
> globals are shared, but builtins are still common.

Not necessarily. All functions in a module will inherit their globals
*and* builtins from the module, which gets them from __import__().
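This can be demonstrated with the existing `__builtins__` hook. A module whose namespace carries a restricted builtins dict keeps using it even when its functions are called from code that has the full builtins. `restricted` is an illustrative module name:

```python
import types

restricted = types.ModuleType("restricted")
restricted.__dict__["__builtins__"] = {"len": len}  # set before any code runs

exec(
    "def size(x):\n    return len(x)\n"
    "def opener(path):\n    return open(path)\n",
    restricted.__dict__,
)

# Called from here, where open() is available, the functions still see
# only the builtins of the module that defined them.
print(restricted.size("abc"))  # 3
try:
    restricted.opener("data.txt")
except NameError as exc:
    print("NameError:", exc)
```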

>
> So there are two ways this purported extra "builtins" parameter could work:
>
> 1. Sandboxing - you try to genuinely give the execution context a
> different set of builtins that's shared by all code executed, even
> imports from other modules.  

Victor's pysandbox seems pretty good to me; I had a go at breaking it
and failed. However, it is too restrictive.

Rather than make pysandbox more secure, I think my proposal could make
it more usable, as clearer guarantees about access and visibility can be
provided to the sandbox developer.
You shouldn't need to cripple introspection in order to limit access to
the builtins.

> However, I assume this isn't what you
> meant, since it is the domain of sandboxing utilities like Victor's
> pysandbox and is known to be incredibly difficult to get right (hence
> the demise of both rexec and Bastion and recent comments about known
> segfault vulnerabilities that are tolerable in the normal case of
> merely processing untrusted data with trusted code but anathema to a
> robust CPython native sandboxing scheme that can still cope even when
> the code itself is untrusted).

Basing the implementation around immutable "execution
contexts" means that the compiler will enforce things for us.
Static typing has its advantages, occasionally :)

As I stated elsewhere, the crashers can be fixed. I think Victor has
already fixed a couple.

>
> 2. chained globals - just an extra namespace that's chained behind the
> globals dictionary for name lookup, not actually shared with code
> invoked from other modules.

That's exactly what builtins already are. They are a fall back for
LOAD_GLOBAL and similar when something isn't found in the globals.
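The fallback order is easy to observe in current CPython. Inside a function, a name found in the globals shadows the builtin of the same name, and the builtins mapping is consulted only on a miss:

```python
ns = {
    "__builtins__": {"len": len, "abs": abs},
    "len": lambda obj: "shadowed",  # a global that hides the builtin
}
exec("def f():\n    return len('abc'), abs(-2)\n", ns)

# 'len' is found in globals; 'abs' falls back to the builtins mapping.
print(ns["f"]())  # ('shadowed', 2)
```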

>
> The second approach is potentially useful, but:
>
> 1. "builtins" is *not* the right name for it (because any other code
> invoked will still be using the original builtins)

Other code will use whatever builtins they were given at __import__.

The key point is that every piece of code already inherits locals,
globals and builtins from somewhere else.
We can already control locals (by which parameters are passed in) and
globals via exec, eval, __import__, and runpy (any others?)
but we can't control builtins.


One last point is that this is a low-impact change. All code using eval,
etc. will continue to work as before.
It also may speed things up a little.

Cheers,
Mark.


Re: Adding a builtins parameter to eval(), exec() and __import__().

Nick Coghlan
On Thu, Mar 8, 2012 at 11:40 PM, Mark Shannon <[hidden email]> wrote:
> Other code will use whatever builtins they were given at __import__.

Then they're not builtins - they're module-specific chained globals.
The thing that makes the builtins special is *who else* can see them
(i.e. all the other code in the process). If you replace
builtins.open, you replace it for everyone (that hasn't either
shadowed it or cached a reference to the original).
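The process-wide effect of patching the builtins module can be seen with a harmless builtin. This sketch patches `len` temporarily and restores it afterwards; `other` is an illustrative module:

```python
import builtins
import types

other = types.ModuleType("other")
exec("def count(x):\n    return len(x)\n", other.__dict__)

original_len = builtins.len
try:
    builtins.len = lambda obj: -1  # replaces len for everyone sharing builtins
    patched = other.count("abc")
finally:
    builtins.len = original_len    # always restore the real builtin

print(patched, other.count("abc"))  # -1 3
```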

> The key point is that every piece of code already inherits locals, globals
> and builtins from somewhere else.
> We can already control locals (by which parameters are passed in) and
> globals via exec, eval, __import__, and runpy (any others?)
> but we can't control builtins.

Correct - because controlling builtins is the domain of sandboxes.

> One last point is that this is a low-impact change. All code using eval,
> etc. will continue to work as before.
> It also may speed things up a little.

Passing in a ChainMap instance as the globals when you want to include
an additional namespace in the lookup chain is even lower impact.

A reference implementation and concrete use cases might change my
mind, but for now, I'm just seeing a horrendously complicated approach
with huge implications for the runtime data model semantics for
something that 3.3 already supports in a much simpler fashion.

Cheers,
Nick.

--
Nick Coghlan   |   [hidden email]   |   Brisbane, Australia

Re: Adding a builtins parameter to eval(), exec() and __import__().

Guido van Rossum
On Thu, Mar 8, 2012 at 5:57 AM, Nick Coghlan <[hidden email]> wrote:
> On Thu, Mar 8, 2012 at 11:40 PM, Mark Shannon <[hidden email]> wrote:
>> Other code will use whatever builtins they were given at __import__.
>
> Then they're not builtins - they're module-specific chained globals.
> The thing that makes the builtins special is *who else* can see them
> (i.e. all the other code in the process). If you replace
> builtins.open, you replace it for everyone (that hasn't either
> shadowed it or cached a reference to the original).

Looks like you two are talking about different things. There is only
one 'builtins' *module*.

But the __builtins__ that are actually used by any particular piece of
code is *not* taken by importing builtins. It is taken from what the
globals store under the key __builtins__.

This is a feature that was added specifically for sandboxing purposes,
but I believe it has found other uses too.

>> The key point is that every piece of code already inherits locals, globals
>> and builtins from somewhere else.
>> We can already control locals (by which parameters are passed in) and
>> globals via exec, eval, __import__, and runpy (any others?)
>> but we can't control builtins.
>
> Correct - because controlling builtins is the domain of sandboxes.

Incorrect (unless I misunderstand the context) -- when you control the
globals you control the __builtins__ set there.

>> One last point is that this is a low-impact change. All code using eval,
>> etc. will continue to work as before.
>> It also may speed things up a little.
>
> Passing in a ChainMap instance as the globals when you want to include
> an additional namespace in the lookup chain is even lower impact.
>
> A reference implementation and concrete use cases might change my
> mind, but for now, I'm just seeing a horrendously complicated approach
> with huge implications for the runtime data model semantics for
> something that 3.3 already supports in a much simpler fashion.

I can't say I'm completely following the discussion. It's not clear
whether what I just explained was already implicit in the conversation
or is new information.

In any case, the locals / globals / builtins chain is a
simplification; there are also any number of intermediate scopes
(between locals and globals) from which "nonlocal" variables may be
used. Like optimized function globals, these don't use a dict lookup
at all, they are determined by compile-time analysis.
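To illustrate the intermediate scopes: a variable declared `nonlocal` lives in a closure cell chosen at compile time, so it never appears in any locals or globals dict. A minimal check:

```python
def make_counter():
    count = 0
    def bump():
        nonlocal count  # resolved at compile time to a closure cell
        count += 1
        return count
    return bump

bump = make_counter()
bump()
bump()
print(bump.__code__.co_freevars)          # ('count',)
print(bump.__closure__[0].cell_contents)  # 2
print("count" in bump.__globals__)        # False
```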

--
--Guido van Rossum (python.org/~guido)

Re: Adding a builtins parameter to eval(), exec() and __import__().

Paul Moore
In reply to this post by Nick Coghlan
On 8 March 2012 12:52, Nick Coghlan <[hidden email]> wrote:
> 2. it's already trivial to achieve such chained lookups in 3.3 by
> passing a collections.ChainMap instance as the globals parameter:
> http://docs.python.org/dev/library/collections#collections.ChainMap

Somewhat OT, but collections.ChainMap is really cool. I hadn't noticed
it get added into 3.3, and as far as I can see, it's not in the
"What's New in 3.3" document. But it's little things like this that
*really* make the difference for me in new versions.

So thanks to whoever added it, and could we have a whatsnew entry, please?

Paul.

Re: Adding a builtins parameter to eval(), exec() and __import__().

Victor Stinner
In reply to this post by Mark Shannon-3
On 07/03/2012 16:33, Mark Shannon wrote:
> It should also help with sandboxing, as it would make it easier to
> analyse and thus control access to builtins, since the execution context
> of all code would be easier to determine.

pysandbox patches __builtins__ in:

  - the caller frame
  - the interpreter state
  - all modules

It uses a read-only dict with only a subset of __builtins__. It is
important to:

  - deny replacing a builtin function
  - deny adding a new "superglobal" variable
  - deny accessing a blocked function

If a module or something else leaks the real builtins dict, it would be
a vulnerability.

pysandbox is able to temporarily replace __builtins__ everywhere and then
restore the previous state.
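As a rough illustration of the "read-only dict with only a subset of builtins" idea (not pysandbox's actual code, which also patches frames, the interpreter state and modules), a dict subclass can reject writes. This is an assumption-laden sketch and not a security boundary on its own: code that can reach `dict.__setitem__`, `dict.update` or similar escape routes can bypass it. `ReadOnlyDict` is a hypothetical name:

```python
class ReadOnlyDict(dict):
    """Hypothetical write-protected mapping for use as __builtins__."""

    def __setitem__(self, key, value):
        raise TypeError("builtins are read-only")

    def __delitem__(self, key):
        raise TypeError("builtins are read-only")

# dict.__init__ fills the mapping without calling the overridden __setitem__.
safe_builtins = ReadOnlyDict(len=len, print=print)
ns = {"__builtins__": safe_builtins}

exec("n = len('abc')", ns)
print(ns["n"])  # 3

# Attempts to add a "superglobal" through the builtins mapping are refused.
try:
    exec("__builtins__['secret'] = 42", ns)
except TypeError as exc:
    print("refused:", exc)  # refused: builtins are read-only
```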

Can you please explain why/how pysandbox is too restrictive and how your
proposal would make it more usable?

> Currently, it is impossible to allow one function access to sensitive
> functions like open(), while denying it to others, as any code can then
> get the builtins of another function via f.__globals__['__builtins__'].
> Separating builtins from globals could solve this.

For a sandbox, it's a feature, or maybe a requirement :-)

It is a problem if a function with access to the trusted builtins dict is
also accessible in the sandbox. I don't remember why it is a problem:
pysandbox blocks access to the __globals__ attribute of functions.

Victor

Re: Adding a builtins parameter to eval(), exec() and __import__().

Nick Coghlan
In reply to this post by Guido van Rossum
On Fri, Mar 9, 2012 at 3:31 AM, Guido van Rossum <[hidden email]> wrote:
> But the __builtins__ that are actually used by any particular piece of
> code is *not* taken by importing builtins. It is taken from what the
> globals store under the key __builtins__.
>
> This is a feature that was added specifically for sandboxing purposes,
> but I believe it has found other uses too.

Agreed, but swapping out builtins for a different namespace is still
the exception rather than the rule. My impression of Mark's proposal
was that this approach would become the *preferred* way of doing
things, and that's the part I don't like at a conceptual level.

>>> The key point is that every piece of code already inherits locals, globals
>>> and builtins from somewhere else.
>>> We can already control locals (by which parameters are passed in) and
>>> globals via exec, eval, __import__, and runpy (any others?)
>>> but we can't control builtins.
>>
>> Correct - because controlling builtins is the domain of sandboxes.
>
> Incorrect (unless I misunderstand the context) -- when you control the
> globals you control the __builtins__ set there.

And this is where I don't like the idea at a practical level. We
already have a way to swap in a different set of builtins for a
certain execution context (i.e. set "__builtins__" in the global
namespace) for a small chunk of code, as well as allowing
collections.ChainMap to insert additional namespaces into the name
lookup path.

This proposal suggests adding an additional mapping argument to every
API that currently accepts a locals and/or globals mapping, thus
achieving... well, nothing substantial, as far as I can tell (aside
from a lot of pointless churn in a bunch of APIs, not all of which are
under our direct control).

> In any case, the locals / globals / builtins chain is a
> simplification; there are also any number of intermediate scopes
> (between locals and globals) from which "nonlocal" variables may be
> used. Like optimized function globals, these don't use a dict lookup
> at all, they are determined by compile-time analysis.

Acknowledged, but code executed via the exec API with both locals and
globals passed in is actually one of the few places where that lookup
chain survives in its original form (module level class definitions
being the other).
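The two places where the full locals -> globals -> builtins chain is still consulted by name can be sketched as follows (names are illustrative):

```python
# exec()/eval() with separate globals and locals: locals win, then globals,
# then builtins.
g = {"a": "from globals", "b": "only in globals"}
l = {"a": "from locals"}
print(eval("a", g, l), "|", eval("b", g, l))  # from locals | only in globals
print(eval("len", g, l) is len)               # True (falls through to builtins)

# A class body behaves the same way, while methods skip the class namespace.
src = (
    "x = 'global'\n"
    "class C:\n"
    "    x = 'class'\n"
    "    y = x\n"
    "    def m(self):\n"
    "        return x\n"
)
ns = {}
exec(src, ns)
print(ns["C"].y, ns["C"]().m())  # class global
```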

Now, rereading Mark's original message, a simpler proposal of having
*function objects* do an early lookup of
"self.__globals__['__builtins__']" at creation time and caching that
somewhere such that the frame objects can get hold of it (rather than
having to do the lookup every time the function gets called or a
builtin gets referenced) might be a nice micro-optimisation. It's the
gratuitous API changes that I'm objecting to, not the underlying idea
of binding the reference to the builtins namespace earlier in the
function definition process. I'd even be OK with leaving the default
builtins reference *out* of the globals namespace in favour of storing
a hidden reference on the frame objects.
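For what it's worth, CPython 3.10 later adopted essentially this early binding: function objects gained a `__builtins__` attribute, initialised from `__globals__['__builtins__']` when the function is created and used by its frames instead of a per-call lookup. A sketch that checks this, guarded by a version test so it only asserts the behaviour where it exists:

```python
import sys

ns = {"__builtins__": {"len": len}}
exec("def size(x):\n    return len(x)\n", ns)
size = ns["size"]

if sys.version_info >= (3, 10):
    # The builtins namespace is captured when the function is created...
    print(size.__builtins__ is ns["__builtins__"])  # True
    # ...so rebinding the globals' '__builtins__' later does not affect it.
    ns["__builtins__"] = {}
    print(size("abc"))  # 3
```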

Cheers,
Nick.

--
Nick Coghlan   |   [hidden email]   |   Brisbane, Australia

Re: Adding a builtins parameter to eval(), exec() and __import__().

Guido van Rossum
On Thu, Mar 8, 2012 at 4:33 PM, Nick Coghlan <[hidden email]> wrote:

> Now, rereading Mark's original message, a simpler proposal of having
> *function objects* do an early lookup of
> "self.__globals__['__builtins__']" at creation time and caching that
> somewhere such that the frame objects can get hold of it (rather than
> having to do the lookup every time the function gets called or a
> builtin gets referenced) might be a nice micro-optimisation. It's the
> gratuitous API changes that I'm objecting to, not the underlying idea
> of binding the reference to the builtins namespace earlier in the
> function definition process. I'd even be OK with leaving the default
> builtins reference *out* of the globals namespace in favour of storing
> a hidden reference on the frame objects.

Agreed on the gratuitous API changes. I'd like to hear Mark's response.

--
--Guido van Rossum (python.org/~guido)

Re: Adding a builtins parameter to eval(), exec() and __import__().

Mark Shannon
Guido van Rossum wrote:

> On Thu, Mar 8, 2012 at 4:33 PM, Nick Coghlan <[hidden email]> wrote:
>> On Fri, Mar 9, 2012 at 3:31 AM, Guido van Rossum <[hidden email]> wrote:
>>> But the __builtins__ that are actually used by any particular piece of
>>> code is *not* taken by importing builtins. It is taken from what the
>>> globals store under the key __builtins__.
>>>
>>> This is a feature that was added specifically for sandboxing purposes,
>>> but I believe it has found other uses too.
>> Agreed, but swapping out builtins for a different namespace is still
>> the exception rather than the rule. My Impression of Mark's proposal
>> was that this approach would become the *preferred* way of doing
>> things, and that's the part I don't like at a conceptual level.
>>
>>>>> The key point is that every piece of code already inherits locals, globals
>>>>> and builtins from somewhere else.
>>>>> We can already control locals (by which parameters are passed in) and
>>>>> globals via exec, eval, __import__, and runpy (any others?)
>>>>> but we can't control builtins.
>>>> Correct - because controlling builtins is the domain of sandboxes.
>>> Incorrect (unless I misunderstand the context) -- when you control the
>>> globals you control the __builtins__ set there.
>> And this is where I don't like the idea at a practical level. We
>> already have a way to swap in a different set of builtins for a
>> certain execution context (i.e. set "__builtins__" in the global
>> namespace) for a small chunk of code, as well as allowing
>> collections.ChainMap to insert additional namespaces into the name
>> lookup path.
>>
>> This proposal suggests adding an additional mapping argument to every
>> API that currently accepts a locals and/or globals mapping, thus
>> achieving... well, nothing substantial, as far as I can tell (aside
>> from a lot of pointless churn in a bunch of APIs, not all of which are
>> under our direct control).
>>
>>> In any case, the locals / globals / builtins chain is a
>>> simplification; there are also any number of intermediate scopes
>>> (between locals and globals) from which "nonlocal" variables may be
>>> used. Like optimized function locals, these don't use a dict lookup
>>> at all, they are determined by compile-time analysis.
>> Acknowledged, but code executed via the exec API with both locals and
>> globals passed in is actually one of the few places where that lookup
>> chain survives in its original form (module level class definitions
>> being the other).
>>
>> Now, rereading Mark's original message, a simpler proposal of having
>> *function objects* do an early lookup of
>> "self.__globals__['__builtins__']" at creation time and caching that
>> somewhere such that the frame objects can get hold of it (rather than
>> having to do the lookup every time the function gets called or a
>> builtin gets referenced) might be a nice micro-optimisation. It's the
>> gratuitous API changes that I'm objecting to, not the underlying idea
>> of binding the reference to the builtins namespace earlier in the
>> function definition process. I'd even be OK with leaving the default
>> builtins reference *out* of the globals namespace in favour of storing
>> a hidden reference on the frame objects.
>
> Agreed on the gratuitous API changes. I'd like to hear Mark's response.
>
C API or Python API?

The Python API would be changed, but in a backwards compatible way.
exec, eval and __import__ would all gain an optional (keyword-only?)
"builtins" parameter.

I see no reason to change any of the C API functions.
New functions taking an extra parameter could be added,
but it wouldn't be a requirement.
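For illustration only, the proposed keyword-only parameter can be emulated on top of today's eval(). This is a minimal sketch. `eval_with_builtins` is a hypothetical name and not a real Python API. Current CPython reads builtins only from `globals["__builtins__"]`, so the sketch has to inject them into a copy of the globals dict. As a result, any writes made by the evaluated code (e.g. via `:=`) land in that copy.

```python
def eval_with_builtins(source, globals, locals=None, *, builtins=None):
    """Hypothetical emulation of eval() with the proposed keyword-only
    "builtins" parameter. Not a real Python API."""
    if builtins is None:
        # Parameter omitted: behave exactly like today's eval(), which is
        # what makes the addition backwards compatible for existing callers.
        return eval(source, globals, locals)
    # Today the only channel for custom builtins is globals["__builtins__"],
    # so inject them into a copy to avoid mutating the caller's dict.
    g = dict(globals)
    g["__builtins__"] = builtins
    return eval(source, g, locals)


print(eval_with_builtins("abs(-3)", {}))                           # 3
print(eval_with_builtins("len('ab')", {}, builtins={"len": len}))  # 2
try:
    eval_with_builtins("abs(-3)", {}, builtins={"len": len})
except NameError as exc:
    print(exc)  # abs is not visible with the restricted builtins
```

Callers that never pass `builtins=` see no difference. This is the backwards-compatibility argument Mark makes. Nick's reply below addresses the cost to other APIs modelled on exec().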

Cheers,
Mark





Re: Adding a builtins parameter to eval(), exec() and __import__().

Nick Coghlan
On Fri, Mar 9, 2012 at 6:19 PM, Mark Shannon <[hidden email]> wrote:
> The Python API would be changed, but in a backwards compatible way.
> exec, eval and __import__ would all gain an optional (keyword-only?)
> "builtins" parameter.

No, some APIs effectively define *protocols*. For such APIs, *adding*
parameters is almost of comparable import to taking them away, because
they require that other APIs modelled on the prototype also change. In
this case, not only exec() has to change, but eval, __import__,
probably runpy, function creation, eventually any third party APIs for
code execution, etc, etc.

Adding a new parameter to exec is a change with serious implications,
and utterly unnecessary, since the API part is already covered by
setting __builtins__ in the passed in globals namespace (which is
appropriately awkward to advise people that they're doing something
strange with potentially unintended consequences or surprising
limitations).

That said, binding a reference to the builtins *early* (for example, at
function definition time or when a new invocation of the eval loop
first fires up) may be a reasonable idea, but you don't have to change
the user facing API to explore that option - it works just as well
with "__builtins__" as an optional value in the existing global
namespace.
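Nick is referring to an existing mechanism: in CPython, a `__builtins__` entry in the globals dict passed to exec() replaces the builtins that the executed code sees. The following is a minimal sketch of it. `run_restricted` is a hypothetical helper. This technique only limits name lookup and should not be mistaken for a security sandbox.

```python
import builtins


def run_restricted(source, allowed=("len", "print")):
    # Build a builtins dict containing only the allowed names.
    restricted_builtins = {name: getattr(builtins, name) for name in allowed}
    # exec() takes the builtins for this code from globals["__builtins__"],
    # so supplying our own dict here replaces the default builtins.
    namespace = {"__builtins__": restricted_builtins}
    exec(source, namespace)
    return namespace


ns = run_restricted("n = len([1, 2, 3])")
print(ns["n"])  # 3

try:
    run_restricted("f = open('x')")
except NameError as exc:
    print("blocked:", exc)  # open is not in the restricted builtins
```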

Cheers,
Nick.

--
Nick Coghlan   |   [hidden email]   |   Brisbane, Australia

Re: Adding a builtins parameter to eval(), exec() and __import__().

Mark Shannon-3
Nick Coghlan wrote:

> On Fri, Mar 9, 2012 at 6:19 PM, Mark Shannon <[hidden email]> wrote:
>> The Python API would be changed, but in a backwards compatible way.
>> exec, eval and __import__ would all gain an optional (keyword-only?)
>> "builtins" parameter.
>
> No, some APIs effectively define *protocols*. For such APIs, *adding*
> parameters is almost of comparable import to taking them away, because
> they require that other APIs modelled on the prototype also change. In
> this case, not only exec() has to change, but eval, __import__,
> probably runpy, function creation, eventually any third party APIs for
> code execution, etc, etc.
>
> Adding a new parameter to exec is a change with serious implications,
> and utterly unnecessary, since the API part is already covered by
> setting __builtins__ in the passed in globals namespace (which is
> appropriately awkward to advise people that they're doing something
> strange with potentially unintended consequences or surprising
> limitations).

It is the implementation that interests me.
Implementing the (locals, globals, builtins) triple as a single object
has advantages both in terms of internal consistency and efficiency.

I just thought I would expose this to the user as well.
I am now persuaded that I don't want to expose anything :)

>
> That said, binding a reference to the builtins *early* (for example, at
> function definition time or when a new invocation of the eval loop
> first fires up) may be a reasonable idea, but you don't have to change
> the user facing API to explore that option - it works just as well
> with "__builtins__" as an optional value in the existing global
> namespace.

OK. So, how about this:
(builtins refers to the dict used for variable lookup, not the module)

New eval pseudocode:
    def eval(code, globals, locals):
        triple = (locals, globals, globals["__builtins__"])
        return eval_internal(code, triple)

Similarly for exec, __import__ and runpy.

That way the (IMO clumsy) builtins = globals["__builtins__"]
only happens at a few known locations.
It should then be clear where all code gets its namespaces from.
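The following is a runnable approximation of the pseudocode above, offered as a sketch under stated assumptions. `Context`, `make_context` and `eval_in_context` are hypothetical names. The sketch keeps the only `globals["__builtins__"]` lookup inside `make_context`. It falls back to the builtins module when the key is missing, which is an assumption roughly matching what eval() does today. Real eval() still re-reads builtins from globals, so `eval_in_context` has to keep the two in sync.

```python
import builtins
import types
from typing import NamedTuple, Optional


class Context(NamedTuple):
    """Hypothetical immutable (locals, globals, builtins) triple."""
    locals: Optional[dict]
    globals: dict
    builtins: dict


def make_context(global_ns, local_ns=None):
    # The single place where globals["__builtins__"] is consulted.
    # Assumption: fall back to the builtins module if the key is missing.
    b = global_ns.get("__builtins__", builtins)
    if isinstance(b, types.ModuleType):
        b = vars(b)  # a module was stored; use its namespace dict
    return Context(local_ns, global_ns, b)


def eval_in_context(source, ctx):
    # Stand-in for the hypothetical eval_internal(code, triple). Current
    # eval() still re-reads builtins from globals, so keep the two in sync.
    ctx.globals.setdefault("__builtins__", ctx.builtins)
    return eval(source, ctx.globals, ctx.locals)


ctx = make_context({"x": 2})
print(eval_in_context("abs(-x) + 1", ctx))    # 3
print(ctx.builtins is vars(builtins))         # True

small = make_context({"__builtins__": {"len": len}})
print(eval_in_context("len('abc')", small))   # 3
```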

Namespaces should be inherited as follows:

frame:
    function scope: globals and builtins from function, locals from parameters.
    module scope: globals and builtins from module, locals == globals.
    in eval, exec, or runpy: all explicit.

function: globals and builtins from module (no locals)

module:  globals and builtins from import (no locals)

import: explicitly from __import__() or
implicitly from current frame in an import statement.

For frame and function, free and cell (nonlocal) variables would be
unchanged.

On entry the namespaces will be {}, {}, sys.modules['builtins'].__dict__

This is pretty much what happens anyway,
except that where code gets its builtins from is now well defined.
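Part of this scheme can already be observed in current CPython. A function keeps resolving builtins through the namespace it was defined in, regardless of who calls it. The demonstration below uses exec() to stand in for a module import. `custom_builtins`, `module_globals` and `greeting` are made-up names for illustration.

```python
import builtins

# A hypothetical "module" namespace whose builtins add one extra name.
custom_builtins = dict(vars(builtins), greeting="hello")
module_globals = {"__builtins__": custom_builtins}

# Define a function in that namespace, as importing a module would.
exec("def f():\n    return greeting", module_globals)
f = module_globals["f"]

# Called from here, f still resolves builtins through the namespace it was
# defined in, not through the caller's builtins.
print(f())                              # hello
print(f.__globals__ is module_globals)  # True
```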

Cheers,
Mark.

Re: Adding a builtins parameter to eval(), exec() and __import__().

Victor Stinner-3
In reply to this post by Mark Shannon-3
> The reason I am proposing this here rather than on python-ideas is that
> treating the triple of [locals, globals, builtins] as a single
> "execution context" can be implemented in a really nice way.
>
> Internally, the execution context of [locals, globals, builtins]
> can be treated as a single immutable object (custom object or tuple).
> Treating it as immutable means that it can be copied merely by taking a
> reference. A nice trick in the implementation is to make a NULL locals
> mean "fast" locals for function contexts. Frames could then acquire their
> globals and builtins by a single reference copy from the function object,
> rather than searching globals for a '__builtins__'
> to find the builtins.

Creating a new frame looks up __builtins__ in globals only if the
globals of the new frame are different from the globals of the previous
frame. Is that the case you would like to optimize? If globals is
unchanged, Python just increments the reference counter.

When are the globals different from those of the previous frame? Maybe
when you call a function from a different module?
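Here is a rough Python model of the frame-creation rule Victor describes. It is a sketch, not CPython's actual C code. `FakeFrame`, `new_frame` and the `lookups` counter are hypothetical. The fallback used when `__builtins__` is missing is simplified. The model reuses the caller's builtins when the globals are the same object, and performs a lookup otherwise, as in a call into another module.

```python
import builtins
import types
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFrame:
    """Hypothetical frame stand-in holding only the namespaces of interest."""
    f_globals: dict
    f_builtins: dict
    f_back: Optional["FakeFrame"] = None


lookups = 0  # how many times globals["__builtins__"] had to be consulted


def new_frame(global_ns, back=None):
    global lookups
    if back is not None and back.f_globals is global_ns:
        # Same globals as the calling frame: reuse its builtins
        # (in C this is just an incref).
        b = back.f_builtins
    else:
        # Different globals, e.g. a call into another module: look it up.
        lookups += 1
        b = global_ns.get("__builtins__", builtins)  # simplified fallback
        if isinstance(b, types.ModuleType):
            b = vars(b)
    return FakeFrame(global_ns, b, back)


mod_a = {"__builtins__": builtins}
mod_b = {"__builtins__": builtins}
top = new_frame(mod_a)            # first frame: lookup
inner = new_frame(mod_a, top)     # call within module a: no lookup
other = new_frame(mod_b, inner)   # call into module b: lookup
print(lookups)                    # 2
```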

Do you have an idea of the speedup of your optimization?

Victor

Re: Adding a builtins parameter to eval(), exec() and __import__().

Mark Shannon-3
Victor Stinner wrote:

>> The reason I am proposing this here rather than on python-ideas is that
>> treating the triple of [locals, globals, builtins] as a single
>> "execution context" can be implemented in a really nice way.
>>
>> Internally, the execution context of [locals, globals, builtins]
>> can be treated as a single immutable object (custom object or tuple).
>> Treating it as immutable means that it can be copied merely by taking a
>> reference. A nice trick in the implementation is to make a NULL locals
>> mean "fast" locals for function contexts. Frames could then acquire their
>> globals and builtins by a single reference copy from the function object,
>> rather than searching globals for a '__builtins__'
>> to find the builtins.
>
> Creating a new frame looks up __builtins__ in globals only if the
> globals of the new frame are different from the globals of the previous
> frame. Is that the case you would like to optimize? If globals is
> unchanged, Python just increments the reference counter.

I'm more interested in simplifying the code than performance.
With this proposed approach, there is no need to test where the globals
come from, or what the builtins are; just incref the namespace triple.
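The following toy Python model sketches what frame creation reduces to under Mark's approach. `Namespaces`, `Function` and `Frame` are hypothetical classes, not CPython's. The function binds the triple once, when it is created. Each new frame then just shares that triple, with no lookup and no comparison against the caller's globals. `None` plays the role of the NULL locals that stand for "fast" locals.

```python
import builtins
from typing import NamedTuple, Optional


class Namespaces(NamedTuple):
    """Hypothetical immutable (locals, globals, builtins) triple."""
    locals: Optional[dict]  # None plays the role of NULL, i.e. "fast" locals
    globals: dict
    builtins: dict


class Function:
    """Hypothetical function object that binds its namespaces when created."""
    def __init__(self, name, module_ns):
        self.name = name
        # Globals and builtins come from the defining module; locals are
        # None because a function frame uses fast locals.
        self.ns = Namespaces(None, module_ns.globals, module_ns.builtins)


class Frame:
    """Hypothetical frame: creation needs no lookup and no globals check."""
    def __init__(self, func):
        self.ns = func.ns  # a single reference copy (an incref in C)


mod_globals = {}
module_ns = Namespaces(mod_globals, mod_globals, vars(builtins))  # locals == globals
f = Function("f", module_ns)
frame = Frame(f)
print(frame.ns is f.ns)                     # True
print(frame.ns.builtins is vars(builtins))  # True
```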

>
> When are the globals different from those of the previous frame? Maybe
> when you call a function from a different module?
>
> Do you have an idea of the speedup of your optimization?

No. But it won't be slower.

Cheers,
Mark

Re: Adding a builtins parameter to eval(), exec() and __import__().

Mark Lawrence
On 09/03/2012 12:57, Mark Shannon wrote:

> No. But it won't be slower.
>
> Cheers,
> Mark

Please prove it; you have to convince a number of core developers,
including, but not limited to, the BDFL :).

--
Cheers.

Mark Lawrence.
