|
Hi,
I just uploaded PEP 414, which proposes an optional 'u' prefix for string literals in Python 3.

You can read the PEP online: http://www.python.org/dev/peps/pep-0414/

This is a follow-up to the discussion about this topic here on the mailing list and on Twitter/IRC over the last few weeks.

Regards,
Armin
_______________________________________________
Python-Dev mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%2B1324100855712-1801473%40n6.nabble.com |
|
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it.

A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.

--Guido

On Sat, Feb 25, 2012 at 12:23 PM, Armin Ronacher <[hidden email]> wrote:
> Hi,
>
> I just uploaded PEP 414 which proposes an optional 'u' prefix for string
> literals for Python 3.
>
> You can read the PEP online: http://www.python.org/dev/peps/pep-0414/
>
> This is a followup to the discussion about this topic here on the
> mailing list and on twitter/IRC over the last few weeks.
>
> Regards,
> Armin

--
--Guido van Rossum (python.org/~guido) |
|
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum <[hidden email]> wrote:
> A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.

Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.

Using myself as an example, my work projects (such as PulpDist [1]) are currently written to target Python 2.6, since that's the system Python on RHEL 6. As a web application, PulpDist has unicode literals *everywhere*, but (as Armin pointed out to me), turning on "from __future__ import unicode_literals" in every file would be incorrect, since many of them also include native strings (mostly related to attribute names and subprocess invocation, but probably a few WSGI related ones as well). The action-at-a-distance of that future import can also make the code hard to read and review (in particular, a diff doesn't tell you whether or not the future import is present in the original file).

It's going to be quite some time before I look at porting that code to Python 3, but, given the style of forward compatible code that I write (e.g. "print (X)", never "print X" or "print (X, Y)"; "except A as B:", never "except A, B:"), the lack of unicode literals in 3.x is the only significant sticking point I expect to encounter. If 3.3+ has Unicode literals, I expect that PulpDist *right now* would be awfully close to being source compatible (and any other discrepancies would just be simple fixes like adding conditional imports from new locations).

IIRC, I've previously opposed the restoration of unicode literals as a retrograde step. Looking at the implications for the future migration of PulpDist has changed my mind.

Regards,
Nick.

[1] https://fedorahosted.org/pulpdist/

--
Nick Coghlan | [hidden email] | Brisbane, Australia |
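To make the text/native distinction above concrete, here is a minimal sketch of the kind of mixed code being described, assuming Python 2.6/2.7 or 3.3+. The `Package` class and its `display_name` attribute are hypothetical names invented for illustration; they are not taken from PulpDist.

```python
class Package(object):
    """Hypothetical record type, used only for this illustration."""

    def __init__(self, display_name):
        self.display_name = display_name


# User-facing text: an explicit unicode literal
# (valid on 2.x and, under PEP 414, on 3.3+ as well).
pkg = Package(u'caf\xe9')

# Attribute names are native strings: bytes on 2.x, text on 3.x.
# No prefix is wanted here, which is why a blanket
# unicode_literals import would change the meaning of this line on 2.x.
field = 'display_name'
value = getattr(pkg, field)
```

Without the module-level future import, each literal says what it is on its own line, so a diff touching either line can be reviewed in isolation.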
|
In reply to this post by Armin Ronacher
The PEP does not consider an alternative idea such as using "from __future__ import unicode_literals" in code which needs to run on 2.x, together with e.g. a callable n('xxx') which can be used where native strings are needed. This avoids the need to reintroduce the u'xxx' literal syntax, makes it explicit where native strings are needed, is less obtrusive than u('xxx') or u'xxx' because typically there will be vastly fewer places where you need native strings, and is unlikely to impose a major runtime penalty when compared with u('xxx') (again, because of the lower frequency of occurrence).

Even if you have arguments against this idea, I think it's at least worth mentioning in the PEP with any counter-arguments you have.

Regards,
Vinay Sajip |
|
26.02.12 11:05, Vinay Sajip wrote:
> The PEP does not consider an alternative idea such as using "from __future__
> import unicode_literals" in code which needs to run on 2.x, together with e.g. a
> callable n('xxx') which can be used where native strings are needed. This avoids
> the need to reintroduce the u'xxx' literal syntax, makes it explicit where
> native strings are needed, is less obtrusive than u('xxx') or u'xxx' because
> typically there will be vastly fewer places where you need native strings, and
> is unlikely to impose a major runtime penalty when compared with u('xxx')
> (again, because of the lower frequency of occurrence).

n = str |
|
Serhiy Storchaka <storchaka <at> gmail.com> writes:
> n = str

Well, n to indicate that a native string is required.

Regards,
Vinay Sajip |
|
Vinay Sajip wrote:
> Serhiy Storchaka <storchaka <at> gmail.com> writes:
>
>> n = str
>
> Well, n to indicate that native string is required.

str indicates the native string type, because it *is* the native string type. By definition, str = str in both Python 2.x and Python 3.x. There's no point in aliasing it to "n".

Besides, "n" is commonly used for ints. It would be disturbing for me to read code with n a function or type, particularly one that returns a string.

I think your suggestion is not well explained. You suggested a function n, expected to take a string literal. The example you gave earlier was:

    n('xxx')

But it seems to me that this is a no-op, because 'xxx' is already the native string type. In Python 2, it gives a str (byte-string), which the n() function converts to a byte-string. In Python 3, it gives a str (unicode-string), which the n() function converts to a unicode-string.

--
Steven |
|
In reply to this post by Vinay Sajip
On Sun, Feb 26, 2012 at 7:05 PM, Vinay Sajip <[hidden email]> wrote:
> The PEP does not consider an alternative idea such as using "from __future__
> import unicode_literals" in code which needs to run on 2.x, together with e.g. a
> callable n('xxx') which can be used where native strings are needed. This avoids
> the need to reintroduce the u'xxx' literal syntax, makes it explicit where
> native strings are needed, is less obtrusive than u('xxx') or u'xxx' because
> typically there will be vastly fewer places where you need native strings, and
> is unlikely to impose a major runtime penalty when compared with u('xxx')
> (again, because of the lower frequency of occurrence).
>
> Even if you have arguments against this idea, I think it's at least worth
> mentioning in the PEP with any counter-arguments you have.

The PEP already mentions that. In fact, all bar the first paragraph in the "Rationale and Goals" section discusses it. However, it's the last paragraph that explains why using that particular future import is, in and of itself, a bad idea:

============
Additionally, the vast majority of people who maintain Python 2.x codebases are more familiar with Python 2.x semantics, and a per-file difference in literal meanings will be very annoying for them in the long run. A quick poll on Twitter about the use of the division future import supported my suspicions that people opt out of behaviour-changing future imports because they are a maintenance burden. Every time you review code you have to check the top of the file to see if the behaviour was changed. Obviously that was an unscientific informal poll, but it might be something worth considering.
============

As soon as you allow the use of "from __future__ import unicode_literals" or a module level "__metaclass__ = type", you can't review diffs in isolation any more - whether the diff is correct or not will depend on the presence or absence of a module level tweak to the language semantics.

Future imports work well for things like absolute imports, new keywords, or statements becoming functions - if the future import is missing when you expected it to be present (or vice-versa), the result is a quick SyntaxError or ImportError that will point you directly to the offending code. Unicode literals and implicitly creating new-style classes are a different matter - for those, if the module level modification takes place (or doesn't take place when you expected it to be there), you get unexpected changes in behaviour instead of a clear exception that refers directly to the source of the problem.

Cheers,
Nick.

--
Nick Coghlan | [hidden email] | Brisbane, Australia |
|
In reply to this post by Steven D'Aprano-8
On Sun, Feb 26, 2012 at 9:00 PM, Steven D'Aprano <[hidden email]> wrote:
> I think your suggestion is not well explained. You suggested a function n,
> expected to take a string literal. The example you gave earlier was:
>
> n('xxx')
>
> But it seems to me that this is a no-op, because 'xxx' is already the native
> string type. In Python 2, it gives a str (byte-string), which the n()
> function converts to a byte-string. In Python 3, it gives a str
> (unicode-string), which the n() function converts to a unicode-string.

Vinay's suggestion was that it be used in conjunction with the "from __future__ import unicode_literals" import, so that you could write:

    b""      # Binary data
    ""       # Text (unicode) data
    str("")  # Native string type

It reduces the problem (compared to omitting the import and using a u() function), but it's still ugly and still involves the "action at a distance" of the unicode literals import.

Cheers,
Nick.

--
Nick Coghlan | [hidden email] | Brisbane, Australia |
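For readers following the n() discussion, a minimal sketch of such a helper is given here, assuming the calling module uses "from __future__ import unicode_literals" on 2.x. No implementation was posted in this thread; the function body, the `PY2` flag and the `ascii` default encoding are all assumptions made for illustration.

```python
import sys

PY2 = sys.version_info[0] == 2


def n(s, encoding='ascii'):
    """Return s as the native str type of the running interpreter.

    Hypothetical helper: on 2.x a unicode literal is encoded to a byte
    string; on 3.x (or for values that are already native) s is returned
    unchanged.
    """
    if PY2 and not isinstance(s, str):
        return s.encode(encoding)
    return s


# Example: a value that must be a native string on both major versions.
header_name = n('Content-Type')
```

On 3.x the call is an identity operation, which is why it looks like a no-op unless the unicode_literals import is active on 2.x.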
|
In reply to this post by Nick Coghlan
Nick Coghlan <ncoghlan <at> gmail.com> writes:
> The PEP already mentions that. In fact, all bar the first paragraph in
> the "Rationale and Goals" section discusses it. However, it's the last

I didn't mean the __future__ import bit, but a discussion re. alternatives to u('xxx').

> Future imports work well for things like absolute imports, new
> keywords, or statements becoming functions - if the future import is
> missing when you expected it to be present (or vice-versa), the result
> is a quick SyntaxError or ImportError that will point you directly to
> the offending code. Unicode literals and implicitly creating new-style
> classes are a different matter - for those, if the module level
> modification takes place (or doesn't take place when you expected it
> to be there), you get unexpected changes in behaviour instead of a
> clear exception that refers directly to the source of the problem.

I don't disagree with anything you said here. Perhaps I've been doing too much work recently with single 2.x/3.x codebase projects, so I've just gotten to like using Unicode literals without the u prefix. However, as the proposal doesn't force one to use u prefixes, I'm not really objecting, especially if it speeds transition to 3.x.

Regards,
Vinay Sajip |
|
In reply to this post by Nick Coghlan
On 2/26/2012 6:14 AM, Nick Coghlan wrote:
> As soon as you allow the use of "from __future__ import
> unicode_literals" or a module level "__metaclass__ = type", you can't
> review diffs in isolation any more - whether the diff is correct or
> not will depend on the presence or absence of a module level tweak to
> the language semantics.
>
> Future imports work well for things like absolute imports, new
> keywords, or statements becoming functions - if the future import is
> missing when you expected it to be present (or vice-versa), the result
> is a quick SyntaxError or ImportError that will point you directly to
> the offending code. Unicode literals and implicitly creating new-style
> classes are a different matter - for those, if the module level
> modification takes place (or doesn't take place when you expected it
> to be there), you get unexpected changes in behaviour instead of a
> clear exception that refers directly to the source of the problem.

There are already __future__ imports that violate this principle: from __future__ import division. That doesn't mean I'm in favor of this new __future__, just keeping a wide angle on the viewfinder.

--Ned. |
|
In reply to this post by Armin Ronacher
Some microbenchmarks:
    $ python -m timeit -n 10000 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
    10000 loops, best of 100: 1.24 usec per loop
    $ python -m timeit -n 10000 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
    10000 loops, best of 100: 1.59 usec per loop
    $ python -m timeit -n 10000 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
    10000 loops, best of 100: 1.58 usec per loop
    $ python -m timeit -n 10000 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
    10000 loops, best of 100: 1.41 usec per loop
    $ python -m timeit -n 10000 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
    10000 loops, best of 100: 1.22 usec per loop

There is no significant overhead in using converters. |
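The same kind of comparison can be run from a script instead of the command line. The following is only a sketch, assuming Python 3.5+ (for the `globals` argument of `timeit.repeat`); the `u()` wrapper and the `per_loop_usec` helper are hypothetical names, and the numbers it produces will differ from the figures quoted above.

```python
import timeit


def u(s):
    """Hypothetical pure-Python identity wrapper standing in for a u() helper."""
    return s


def per_loop_usec(stmt, number=10000, repeat=5):
    """Best-of-`repeat` time for a single execution of stmt, in microseconds."""
    namespace = {'u': u, 'x': 123}
    best = min(timeit.repeat(stmt, number=number, repeat=repeat,
                             globals=namespace))
    return best / number * 1e6


results = {
    'literal': per_loop_usec("'foobarbaz_%d' % x"),
    'wrapped': per_loop_usec("u('foobarbaz_%d') % x"),
}
```

Comparing `results['wrapped']` against `results['literal']` gives the per-call cost of the wrapper on the machine at hand; it says nothing about a C implementation of such a function.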
|
In reply to this post by Ned Batchelder
On Sun, Feb 26, 2012 at 10:34 PM, Ned Batchelder <[hidden email]> wrote:
> There are already __future__ imports that violate this principle: from
> __future__ import division. That doesn't mean I'm in favor of this new
> __future__, just keeping a wide angle on the viewfinder.

Armin's straw poll was actually about whether or not people used the future import for division, rather than unicode literals. It is indeed the same problem - and several of us had a strong preference for forcing float division with "float(x) / y" over relying on the long distance effect of the future import (although it was only in this thread that I figured out exactly *why* I don't like those two, but happily used many of the other future imports when they were necessary).

Cheers,
Nick.

--
Nick Coghlan | [hidden email] | Brisbane, Australia |
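As a small illustration of the "float(x) / y" style mentioned above, the sketch below gives the same result with or without the division future import, on 2.x and 3.x alike. The `ratio` function name is hypothetical.

```python
def ratio(x, y):
    """True division that does not depend on `from __future__ import division`."""
    # Converting one operand makes the intent visible on this line,
    # so a reviewer does not need to check the top of the file.
    return float(x) / y


half = ratio(1, 2)
```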
|
In reply to this post by Steven D'Aprano-8
Nick Coghlan <ncoghlan <at> gmail.com> writes:
> It reduces the problem (compared to omitting the import and using a
> u() function), but it's still ugly and still involves the "action at a
> distance" of the unicode literals import.

I agree about the action-at-a-distance leading to non-obvious bugs and wasted head-scratching time caused by such. It could be mitigated somewhat by project-level conventions, e.g. that all string literals are Unicode on that project. Then, if you put yourself in the relevant mindset when working on that project, there are fewer surprises.

It's probably a matter of choosing the lesser among evils, since the proposal seems to allow mixing of literals with and without u prefixes in 3.x code - doesn't that also seem ugly?

When this came up earlier (when I think Chris McDonough raised it) the issue of what to do on 3.2 came up, and though it has been addressed somewhat in the PEP, it would be nice to see the suggested on-installation hook fleshed out a little more.

Regards,
Vinay Sajip |
|
In reply to this post by Serhiy Storchaka-2
Hi,
On 2/26/12 12:35 PM, Serhiy Storchaka wrote:
> Some microbenchmarks:
>
> $ python -m timeit -n 10000 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
> 10000 loops, best of 100: 1.24 usec per loop
> $ python -m timeit -n 10000 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
> 10000 loops, best of 100: 1.59 usec per loop
> $ python -m timeit -n 10000 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
> 10000 loops, best of 100: 1.58 usec per loop
> $ python -m timeit -n 10000 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
> 10000 loops, best of 100: 1.41 usec per loop
> $ python -m timeit -n 10000 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
> 10000 loops, best of 100: 1.22 usec per loop
>
> There are no significant overhead to use converters.

That's because what you're benchmarking here more than anything is the overhead of eval() :-) See the benchmark linked in the PEP for one that measures the actual performance of the string literal / wrapper.

Regards,
Armin |
|
In reply to this post by Guido van Rossum
On Saturday, February 25, 2012 at 10:13 PM, Guido van Rossum wrote:
> If this can encourage more projects to support Python 3 (even if it's
> only 3.3 and later) and hence improve adoption of Python 3, I'm all
> for it.
>
> A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
>
> --Guido

After having this explained quite a bit to me by the more web-savvy folks such as Armin and Chris M/etc, I am a +1. The rationale makes sense and, for much the same reason that Guido cites, I think this will help code bases using the single code base approach and assist with overall adoption.

+1

jesse |
|
In reply to this post by Ned Batchelder
Hi,
On 2/26/12 12:34 PM, Ned Batchelder wrote:
> There are already __future__ imports that violate this principle: from
> __future__ import division. That doesn't mean I'm in favor of this new
> __future__, just keeping a wide angle on the viewfinder.

That's actually mentioned in the PEP :-)

> A quick poll on Twitter about the use of the division future import
> supported my suspicions that people opt out of behaviour-changing
> future imports because they are a maintenance burden. Every time you
> review code you have to check the top of the file to see if the
> behaviour was changed.

Regards,
Armin |
|
In reply to this post by Vinay Sajip
Hi,
On 2/26/12 12:42 PM, Vinay Sajip wrote:
> When this came up earlier (when I think Chris McDonough raised it) the issue of
> what to do on 3.2 came up, and though it has been addressed somewhat in the PEP,
> it would be nice to see the suggested on-installation hook fleshed out a little
> more.

I wanted to do that, but the tokenizer module is quite ugly to customize in order to allow "u" prefixes on strings, which is why I postponed that. It would work similarly to how 2to3 is invoked, however.

In case this PEP gets approved I will refactor the tokenize module while adding support for "u" prefixes and use that as the basis for an installation hook for older Python 3 versions.

Regards,
Armin |
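To give a rough idea of what a token-based conversion step could look like, here is a sketch that strips u/U prefixes from string literals using the standard tokenize module. It is not the hook described above: it assumes it runs on an interpreter whose tokenizer already accepts the prefix (3.3+), whereas a real hook for 3.2 would need the tokenizer changes mentioned in the message. The `strip_u_prefixes` name is hypothetical.

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return source with a leading u/U removed from every string literal."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # On Python 3 the only valid prefix starting with u/U is the bare
        # unicode prefix, so dropping the first character is sufficient.
        if tok.type == tokenize.STRING and tok.string[0] in 'uU':
            tok = tok._replace(string=tok.string[1:])
        result.append(tok)
    return tokenize.untokenize(result)


converted = strip_u_prefixes("greeting = u'abc' + U\"def\"\n")
```

Working on tokens rather than on raw text means that a "u" appearing inside a string or a comment is left untouched.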
|
In reply to this post by Armin Ronacher
26.02.12 14:42, Armin Ronacher wrote:
> On 2/26/12 12:35 PM, Serhiy Storchaka wrote:
>> Some microbenchmarks:
>>
>> $ python -m timeit -n 10000 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
>> 10000 loops, best of 100: 1.24 usec per loop
>> $ python -m timeit -n 10000 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
>> 10000 loops, best of 100: 1.59 usec per loop
>> $ python -m timeit -n 10000 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
>> 10000 loops, best of 100: 1.58 usec per loop
>> $ python -m timeit -n 10000 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
>> 10000 loops, best of 100: 1.41 usec per loop
>> $ python -m timeit -n 10000 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
>> 10000 loops, best of 100: 1.22 usec per loop
>>
>> There are no significant overhead to use converters.
> That's because what you're benchmarking here more than anything is the
> overhead of eval() :-) See the benchmark linked in the PEP for one that
> measures the actual performance of the string literal / wrapper.

    $ python -m timeit -n 10000 -r 100 ""
    10000 loops, best of 100: 0.087 usec per loop

The overhead of eval is 5%.

Real code is not a single string literal; every string literal occurs together with a lot of code (getting and setting variables, attribute access, function calls, binary operators, unconditional and conditional jumps, etc), and the total effect of using a simple converter will be insignificant. |
|
In reply to this post by Vinay Sajip
This seems like too strong a statement:

"Python 2.6 and Python 2.7 support syntax features from Python 3 which for the most part make a unified code base possible. Many thought that the unicode_literals future import might make a common source possible, but it turns out that it's doing more harm than good."

While it may be true for *some* problem domains, such as WSGI apps, it is not true in general, IMO. I use this future import all the time in both libraries and applications and it's almost always helpful.

Cheers,
-Barry |
