Does anyone have (or know of) accurate totals and percentages on how
Python is used? I'm particularly interested in the following
groupings:

- new development vs. stable code-bases
- categories (web, scripts, "big data", computation, etc.)
- "bare metal" vs. on top of some framework
- regional usage

I'm thinking about this partly because of the discussion on
python-ideas about the perceived challenges of Unicode in Python 3.
All the rhetoric, anecdotal evidence, and use-cases there have little
meaning to me, with regard to Python as a whole, without an
understanding of who is actually affected.

For instance, if frameworks (like django and numpy) could completely
hide the arguable challenges of Unicode in Python 3--and most projects
were built on top of frameworks--then general efforts for making
Unicode easier in Python 3 should go toward helping framework writers.

Such usage numbers are useful for more than the Unicode discussion
(which I wish would get resolved and die so we could move on to more
interesting stuff :) ). They help us know where efforts could be
focused in general to make Python more powerful and easier to use
where it's already used extensively. They can show us the areas where
Python isn't used much, thus exposing a targeted opportunity to change
that.

Realistically, it's not entirely feasible to compile such information
at a comprehensive level, but even generally accurate numbers would be
a valuable resource. If the numbers aren't out there, what would be
some good approaches to discovering them? Thanks!

-eric
--
http://mail.python.org/mailman/listinfo/python-list
|
Eric Snow, 11.02.2012 22:02:
> - categories (web, scripts, "big data", computation, etc.)

No numbers, but from where I stand, the four largest areas where Python
is used appear to be (in increasing line length order):

a) web applications
b) scripting and tooling
c) high-performance computation
d) testing (non-Python/embedded/whatever code)

I'm sure others will manage to remind me of the one or two I forgot...

Stefan
|
On 2/11/2012 3:02 PM, Eric Snow wrote:
> I'm thinking about this partly because of the discussion on
> python-ideas about the perceived challenges of Unicode in Python 3.

> For instance, if frameworks (like django and numpy) could completely
> hide the arguable challenges of Unicode in Python 3--and most projects
> were built on top of frameworks--then general efforts for making
> Unicode easier in Python 3 should go toward helping framework writers.

Huh? I'll admit I'm a novice, but isn't Unicode mostly trivial in py3k
compared to 2.x? Or are you referring to porting 2.x to 3.x? I've been
under the impression that Unicode in 2.x can be painful at times, but
easy in 3.x.
I've been using 3.2 and Unicode hasn't been much of an issue.

--
CPython 3.2.2 | Windows NT 6.1.7601.17640
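A minimal sketch of why Unicode tends to feel simpler in Python 3: text (str) and raw binary data (bytes) are separate types, and Python 3 refuses to mix them implicitly, so the decoding step is explicit instead of guessed. The sample word is illustrative only.

```python
# Python 3 keeps text (str) and binary data (bytes) as distinct types.
text = "café"                   # a str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: b'caf\xc3\xa9'

print(len(text))   # 4 code points
print(len(data))   # 5 bytes, because 'é' takes two bytes in UTF-8

# Mixing the two types is an error rather than a silent implicit conversion.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# Decoding with the encoding that produced the bytes round-trips exactly.
assert data.decode("utf-8") == text
```

In Python 2 the equivalent `str + unicode` mix would attempt an implicit ASCII decode, which is where many of the "works until someone types an accent" surprises came from.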
|
On 11/02/2012 21:02, Eric Snow wrote:
> Does anyone have (or know of) accurate totals and percentages on how
> Python is used? I'm particularly interested in the following
> groupings:
>
> - new development vs. stable code-bases
> - categories (web, scripts, "big data", computation, etc.)
> - "bare metal" vs. on top of some framework
> - regional usage
>
> I'm thinking about this partly because of the discussion on
> python-ideas about the perceived challenges of Unicode in Python 3.
> All the rhetoric, anecdotal evidence, and use-cases there have little
> meaning to me, with regard to Python as a whole, without an
> understanding of who is actually affected.
>
> For instance, if frameworks (like django and numpy) could completely
> hide the arguable challenges of Unicode in Python 3--and most projects
> were built on top of frameworks--then general efforts for making
> Unicode easier in Python 3 should go toward helping framework writers.
>
> Such usage numbers are useful for more than the Unicode discussion
> (which I wish would get resolved and die so we could move on to more
> interesting stuff :) ). They help us know where efforts could be
> focused in general to make Python more powerful and easier to use
> where it's already used extensively. They can show us the areas where
> Python isn't used much, thus exposing a targeted opportunity to change
> that.
>
> Realistically, it's not entirely feasible to compile such information
> at a comprehensive level, but even generally accurate numbers would be
> a valuable resource. If the numbers aren't out there, what would be
> some good approaches to discovering them? Thanks!
>
> -eric

As others have said on other Python newsgroups, it ain't a problem. The
only time I've ever had a problem was with matplotlib, which couldn't
print a £ sign. I used a U to enforce unicode, job done. If I had a
major problem I reckon that a search on c.l.p would give me an answer
easy peasy.

--
Cheers.

Mark Lawrence.
|
On Sat, Feb 11, 2012 at 2:51 PM, Andrew Berg <[hidden email]> wrote:
> On 2/11/2012 3:02 PM, Eric Snow wrote:
>> I'm thinking about this partly because of the discussion on
>> python-ideas about the perceived challenges of Unicode in Python 3.
>
>> For instance, if frameworks (like django and numpy) could completely
>> hide the arguable challenges of Unicode in Python 3--and most projects
>> were built on top of frameworks--then general efforts for making
>> Unicode easier in Python 3 should go toward helping framework writers.
> Huh? I'll admit I'm a novice, but isn't Unicode mostly trivial in py3k
> compared to 2.x? Or are you referring to porting 2.x to 3.x? I've been
> under the impression that Unicode in 2.x can be painful at times, but
> easy in 3.x.
> I've been using 3.2 and Unicode hasn't been much of an issue.

My expectation is that yours is the common experience. However, in at
least one current thread (on python-ideas) and at a variety of times
in the past, _some_ people have found Unicode in Python 3 to make more
work. So that got me to thinking about whose experience is the general
case, and whether any concerns broadly apply to more than
framework/library writers (like django, jinja, twisted, etc.). Having
usage statistics would be helpful in identifying the impact of things
like Unicode in Python 3.

-eric
|
On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow <[hidden email]> wrote:
> However, in at
> least one current thread (on python-ideas) and at a variety of times
> in the past, _some_ people have found Unicode in Python 3 to make more
> work.

If Unicode in Python is causing you more work, isn't it most likely
that the issue would have come up anyway? For instance, suppose you
have a web form and you accept customer names, which you then store in
a database. You could assume that the browser submits it in UTF-8 and
that your database back-end can accept UTF-8, and then pretend that
it's all ASCII, but if you then want to upper-case the name for a
heading, somewhere you're going to need to deal with Unicode; and when
your programming language has facilities like str.upper(), that's
going to make it easier, not harder. Sure, the simple case is easier if
you pretend it's all ASCII, but it's still better to have language
facilities.

ChrisA
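A small sketch of the str.upper() point above: once the submitted bytes are decoded to a str, upper-casing is Unicode-aware, whereas the same name kept as raw UTF-8 bytes only has its ASCII letters changed. The sample name, and the assumption that the browser sent UTF-8, are illustrative.

```python
# A customer name as submitted by a browser, assumed here to be UTF-8 bytes.
raw = "Zoë Brontë".encode("utf-8")

# Treating the data as "just ASCII" bytes: bytes.upper() only changes a-z,
# so the non-ASCII letters are left untouched.
print(raw.upper().decode("utf-8"))   # ZOë BRONTë

# Decoding once at the boundary gives a str, and str.upper() knows about ë.
name = raw.decode("utf-8")
print(name.upper())                  # ZOË BRONTË
```

Decoding at the edge of the program and working with str internally is the usual design choice; the heading code then never has to care what encoding the form used.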
|
On Sat, Feb 11, 2012 at 6:28 PM, Chris Angelico <[hidden email]> wrote:
> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow <[hidden email]> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times
>> in the past, _some_ people have found Unicode in Python 3 to make more
>> work.
>
> If Unicode in Python is causing you more work, isn't it most likely
> that the issue would have come up anyway? For instance, suppose you
> have a web form and you accept customer names, which you then store in
> a database. You could assume that the browser submits it in UTF-8 and
> that your database back-end can accept UTF-8, and then pretend that
> it's all ASCII, but if you then want to upper-case the name for a
> heading, somewhere you're going to need to deal with Unicode; and when
> your programming language has facilities like str.upper(), that's
> going to make it easier, not harder. Sure, the simple case is easier if
> you pretend it's all ASCII, but it's still better to have language
> facilities.

Yeah, that's how I see it too. However, my sample size is much too
small to have any sense of the broader Python 3 experience. That's
what I'm going for with those Python usage statistics (if it's even
feasible).

-eric
|
On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <[hidden email]> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.
>
> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?

The argument being made is that in Python 2, if you try to read a file
that contains Unicode characters encoded with some unknown codec, you
don't have to think about it. Sure, you get moji-bake rubbish in your
database, but that's the fault of people who insist on not being
American. Or who spell Zoe with an umlaut.

In Python 3, if you try the same thing, you get an error. Fixing the
error requires thought, and even if that is only a minuscule amount of
thought, that's too much for some developers who are scared of Unicode.
Hence the FUD that Python 3 is too hard because it makes you learn
Unicode.

I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with
Joel Spolsky on this one: if you're a programmer in 2003 who doesn't
have at least a basic working knowledge of Unicode, you're the
equivalent of a doctor who doesn't believe in germs.

http://www.joelonsoftware.com/articles/Unicode.html

Gaining a basic working knowledge of Unicode is not that hard. You don't
need to be an expert, and it's just not that scary.

The use-case given is:

"I have a file containing text. I can open it in an editor and see it's
nearly all ASCII text, except for a few weird and bizarre characters like
£ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
error. What should I do that requires no thought?"

Obvious answers:

- Try decoding with UTF8 or Latin1. Even if you don't get the right
characters, you'll get *something*.

- Use open(filename, encoding='ascii', errors='surrogateescape')

(Or possibly errors='ignore'.)

--
Steven
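A minimal sketch of the two "obvious answers" above, using a throwaway temporary file in place of the real one (the file contents are made up). Latin-1 maps every byte to some character, so it never raises but can produce the wrong characters; the 'surrogateescape' handler carries each undecodable byte through as a lone surrogate so it can be written back out unchanged.

```python
import os
import tempfile

# Mostly-ASCII text with a few non-ASCII characters, saved as UTF-8.
original = "Price: £10 © 2012 ± 1 Zoë\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(original)
    path = f.name

try:
    # Answer 1: Latin-1 accepts any byte, so this never raises, but the
    # UTF-8 sequences come out as mojibake (e.g. '£' becomes 'Â£').
    with open(path, encoding="latin-1") as fh:
        as_latin1 = fh.read()

    # Answer 2: ASCII plus 'surrogateescape'. Each non-ASCII byte becomes a
    # lone surrogate (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
    with open(path, encoding="ascii", errors="surrogateescape", newline="") as fh:
        escaped = fh.read()

    # Encoding with the same handler restores the exact original bytes.
    restored = escaped.encode("ascii", errors="surrogateescape")
finally:
    os.remove(path)

print(as_latin1)
print(restored == original)   # True
```

Neither option tells you what the non-ASCII characters actually are; they only let the program keep going without losing the bytes.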
|
On Feb 11, 8:23 pm, Steven D'Aprano <steve+[hidden email]> wrote:
> On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> > On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> > <[hidden email]> wrote:
> >> However, in at
> >> least one current thread (on python-ideas) and at a variety of times in
> >> the past, _some_ people have found Unicode in Python 3 to make more
> >> work.
>
> > If Unicode in Python is causing you more work, isn't it most likely that
> > the issue would have come up anyway?
>
> The argument being made is that in Python 2, if you try to read a file
> that contains Unicode characters encoded with some unknown codec, you
> don't have to think about it. Sure, you get moji-bake rubbish in your
> database, but that's the fault of people who insist on not being
> American. Or who spell Zoe with an umlaut.

That's not the worst of it... I have many times had a block of text
that was valid ASCII except for some intermixed Unicode white-space.
Who the hell would even consider inserting Unicode white-space!!!

> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"
>
> Obvious answers:

the most obvious answer would be to read the file WITHOUT worrying
about asinine encoding.
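The "intermixed Unicode white-space" complaint is easy to handle once the text is a Python 3 str. A sketch, assuming the text has already been decoded: the standard unicodedata module names the offending characters, and str.split() with no arguments already treats characters such as NO-BREAK SPACE as whitespace. The sample text is made up.

```python
import unicodedata

# Mostly-ASCII text with a NO-BREAK SPACE (U+00A0) and an EM SPACE (U+2003).
text = "total\u00a0cost:\u2003ten pounds"

# Find every whitespace character that is not plain ASCII.
odd_spaces = sorted({c for c in text if c.isspace() and ord(c) > 127})
for c in odd_spaces:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")

# str.split() with no separator splits on all Unicode whitespace...
words = text.split()
print(words)   # ['total', 'cost:', 'ten', 'pounds']

# ...so normalising to single ASCII spaces is one line.
print(" ".join(words))
```

By contrast, bytes.split() only recognises ASCII whitespace, so the same text kept as undecoded bytes would not split at the no-break space.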
|
On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
<[hidden email]> wrote:
> On Feb 11, 8:23 pm, Steven D'Aprano <steve
> +[hidden email]> wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.

What this statement misunderstands, though, is that ASCII is itself an
encoding. Files contain bytes, and it's only what's external to those
bytes that gives them meaning. The famous "bush hid the facts" trick
with Windows Notepad shows the folly of trying to use internal
evidence to identify meaning from bytes.

Everything that displays text to a human needs to translate bytes into
glyphs, and the usual way to do this conceptually is to go via
characters. Pretending that it's all the same thing really means
pretending that one byte represents one character and that each
character is depicted by one glyph. And that's doomed to failure, unless
everyone speaks English with no foreign symbols - so, no mathematical
notations.

ChrisA
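To make "bytes only mean something once you pick an encoding" concrete, here is a sketch that decodes one byte string two ways. The "bush hid the facts" bytes are plain ASCII, but older Notepad versions guessed they were UTF-16 and displayed CJK characters; decoding them as UTF-16-LE reproduces that misreading.

```python
# Eighteen bytes that are valid in more than one encoding at once.
data = b"bush hid the facts"

as_ascii = data.decode("ascii")        # the intended reading
as_utf16 = data.decode("utf-16-le")    # each pair of bytes read as one code unit

print(as_ascii)        # bush hid the facts
print(len(as_utf16))   # 9 characters, all from the CJK ideograph block
print(as_utf16)

# Nothing inside the bytes says which reading is "right"; the encoding
# has to come from outside (a header, a spec, or an explicit argument).
```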
|
On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> Everything that displays text to a human needs to translate bytes into
> glyphs, and the usual way to do this conceptually is to go via
> characters. Pretending that it's all the same thing really means
> pretending that one byte represents one character and that each
> character is depicted by one glyph. And that's doomed to failure, unless
> everyone speaks English with no foreign symbols - so, no mathematical
> notations.

Pardon me, but you can't even write *English* in ASCII.

You can't say that it cost you £10 to courier your résumé to the head
office of Encyclopædia Britannica to apply for the position of Staff
Coördinator. (Admittedly, the umlaut on the second "o" looks a bit
stuffy and old-fashioned, but it is traditional English.)

Hell, you can't even write in *American*: you can't say that the recipe
for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.

ASCII truly is a blight on the world, and the sooner it fades into
obscurity, like EBCDIC, the better.

Even if everyone did change to speak ASCII, you still have all the
historical records and documents and files to deal with. Encodings are
not going away.

--
Steven
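A quick sketch of the point about English in ASCII: each sentence below (adapted from the post) is ordinary English, yet none survives a strict ASCII encode. Using errors='replace' shows which characters ASCII cannot represent, while UTF-8 handles all of them.

```python
sentences = [
    "It cost £10 to courier my résumé to Encyclopædia Britannica.",
    "The 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.",
]

for s in sentences:
    try:
        s.encode("ascii")
    except UnicodeEncodeError as exc:
        print("ASCII fails at position", exc.start, repr(s[exc.start]))
    # 'replace' substitutes '?' for each character ASCII cannot hold.
    print(s.encode("ascii", errors="replace").decode("ascii"))
    # UTF-8 can encode every one of them, and decoding gives the text back.
    assert s.encode("utf-8").decode("utf-8") == s

print("résumé".encode("ascii", errors="replace"))   # b'r?sum?'
```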
|
On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano
<[hidden email]> wrote:
> You can't say that it cost you £10 to courier your résumé to the head
> office of Encyclopædia Britannica to apply for the position of Staff
> Coördinator.

True, but if it cost you $10 (or 10 GBP) to courier your curriculum
vitae to the head office of Encyclopaedia Britannica to become Staff
Coordinator, then you'd be fine. And if it cost you $10 to post your
work summary to Britannica's administration to apply for this Staff
Coordinator position, you could say it without 'e' too. Doesn't mean
you don't need Unicode!

ChrisA
|
On Sat, 11 Feb 2012 18:36:52 -0800, Rick Johnson wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters
>> like £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I
>> get an error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying about
> asinine encoding.

Your mad leet reading comprehension skillz leave me in awe, Rick.

If you try to read a file containing non-ASCII characters encoded using
UTF8 on Windows without explicitly specifying either UTF8 as the
encoding, or an error handler, you will get an exception.

It's not just UTF8 either, but nearly all encodings. You can't even
expect to avoid problems if you stick to nothing but Windows, because
Windows' default encoding is localised: a file generated in (say) Israel
or Japan or Germany will use a different code page (encoding) by default
than one generated in (say) the US, Canada or UK.

--
Steven
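A sketch of what happens when UTF-8 bytes are read with a Windows "ANSI" code page instead. cp1252 is the usual default for Western European locales (including the US, UK and Germany), cp1255 for Hebrew and cp932 for Japanese. Depending on which bytes appear, the mistake shows up either as silent mojibake or as a UnicodeDecodeError, which is why the last step names the encoding explicitly. The sample words are arbitrary.

```python
import locale

# Roughly the encoding open() falls back to when encoding= is omitted;
# it depends on the platform and locale (cp1252 on many Western Windows setups).
print("default text encoding:", locale.getpreferredencoding(False))

# Sample text written out as UTF-8.
utf8_bytes = "café Ángel".encode("utf-8")

# 'é' is b'\xc3\xa9' in UTF-8. cp1252 maps both bytes to printable
# characters, so decoding "succeeds" but produces mojibake.
print(b"caf\xc3\xa9".decode("cp1252"))   # cafÃ©

# 'Á' is b'\xc3\x81' in UTF-8, and 0x81 is undefined in cp1252,
# so the same mistake on this text raises instead.
try:
    utf8_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 cannot decode byte", hex(utf8_bytes[exc.start]))   # 0x81

# Naming the encoding explicitly removes the dependence on the locale.
print(utf8_bytes.decode("utf-8"))        # café Ángel
```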
|
On 2/12/2012 12:10 AM, Steven D'Aprano wrote:
> It's not just UTF8 either, but nearly all encodings. You can't even
> expect to avoid problems if you stick to nothing but Windows, because
> Windows' default encoding is localised: a file generated in (say) Israel
> or Japan or Germany will use a different code page (encoding) by default
> than one generated in (say) the US, Canada or UK.

Generated by what? Windows will store a locale value for programs to
use, but programs use Unicode internally by default (i.e., API calls are
Unicode unless they were built for old versions of Windows), and the
default filesystem (NTFS) uses Unicode for file names. AFAIK, only the
terminal has a localized code page by default. Perhaps Notepad will
write text files with the localized code page by default, but that's an
application choice...

--
CPython 3.2.2 | Windows NT 6.1.7601.17640
|
On 12.2.2012 03:23, Steven D'Aprano wrote:
> The use-case given is:
>
> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"
>
> Obvious answers:
>
> - Try decoding with UTF8 or Latin1. Even if you don't get the right
> characters, you'll get *something*.
>
> - Use open(filename, encoding='ascii', errors='surrogateescape')
>
> (Or possibly errors='ignore'.)

These are not good answers, IMHO. The only answer I can think of,
really, is:

- pack your luggage, your submarine is waiting for you to peel onions
in it (with reference to Joel's article).

Meaning, really, you should learn your craft and pull your head out of
the sand. There is a wider world around you.

(and yes, I am Czech, so I need at least latin-2 for my language).

Best,

Matěj
|
On 12.2.2012 09:14, Matej Cepl wrote:
>> Obvious answers:
>>
>> - Try decoding with UTF8 or Latin1. Even if you don't get the right
>> characters, you'll get *something*.
>>
>> - Use open(filename, encoding='ascii', errors='surrogateescape')
>>
>> (Or possibly errors='ignore'.)
>
> These are not good answers, IMHO. The only answer I can think of, really,
> is:

A slightly less flameish answer to the question “What should I do,
really?” is a tough one: all these suggested answers are bad because
they don’t deal with the fact that your input data are obviously
broken. The rest is just pure GIGO … without fixing (and I mean, really,
fixing, not ignoring the problem, which is what the previous answers
suggest) your input, you’ll get garbage on output. And you should be
thankful to py3k that it showed the issue to you.

BTW, can you display the following line?

Příliš žluťoučký kůň úpěl ďábelské ódy.

Best,

Matěj
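The Czech pangram makes a handy test case for the argument that ignoring decode errors only hides broken input. The sketch assumes the bytes arrived as UTF-8: decoding with ASCII and errors='ignore' silently drops every accented letter, Latin-1 cannot represent the line at all, while Latin-2 (ISO 8859-2, Python codec alias 'latin2') and UTF-8 round-trip it.

```python
pangram = "Příliš žluťoučký kůň úpěl ďábelské ódy."
data = pangram.encode("utf-8")

# "Ignoring" the problem: no exception, but the accented letters are gone.
lossy = data.decode("ascii", errors="ignore")
print(lossy)
print(len(pangram) - len(lossy), "characters silently lost")

# Latin-1 has no 'ř', so it cannot represent this Czech text at all.
try:
    pangram.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode", repr(exc.object[exc.start]))

# Latin-2 (ISO 8859-2) covers Czech, and UTF-8 covers everything.
for codec in ("latin2", "utf-8"):
    assert pangram.encode(codec).decode(codec) == pangram
    print(codec, "round-trips OK")
```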
|
On Sun, 12 Feb 2012 01:05:35 -0600, Andrew Berg wrote:
> On 2/12/2012 12:10 AM, Steven D'Aprano wrote:
>> It's not just UTF8 either, but nearly all encodings. You can't even
>> expect to avoid problems if you stick to nothing but Windows, because
>> Windows' default encoding is localised: a file generated in (say)
>> Israel or Japan or Germany will use a different code page (encoding) by
>> default than one generated in (say) the US, Canada or UK.
> Generated by what? Windows will store a locale value for programs to
> use, but programs use Unicode internally by default

Which programs? And we're not talking about what they use internally,
but what they write to files.

> (i.e., API calls are
> Unicode unless they were built for old versions of Windows), and the
> default filesystem (NTFS) uses Unicode for file names.

No. File systems do not use Unicode for file names. Unicode is an
abstract mapping between code points and characters. File systems are
written using bytes.

Suppose you're a fan of Russian punk band Наӥв and you have a directory
of their music. The file system doesn't store the Unicode code points
1053 1072 1253 1074; it has to be encoded to a sequence of bytes first.
NTFS by default uses the UTF-16 encoding, which means the actual bytes
written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
byte-order mark \xff\xfe).

Windows has two separate APIs, one for "wide" characters, the other for
single bytes. Depending on which one you use, the directory will appear
to be called Наӥв or 0å2.

But in any case, we're not talking about the file name encoding. We're
talking about the contents of files.

> AFAIK, only the
> terminal has a localized code page by default. Perhaps Notepad will
> write text files with the localized code page by default, but that's an
> application choice...

Exactly. And unless you know what encoding the application chooses, you
will likely get an exception trying to read the file.

--
Steven
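The byte string quoted above can be checked directly. A minimal sketch: the code points of Наӥв, their UTF-16-LE encoding (the byte order NTFS uses for names on disk), and a rough simulation of the "0å2" reading made by taking each byte as one Latin-1 character and keeping only the printable ones. The real narrow ("ANSI") Windows API converts through the active code page rather than exposing raw UTF-16 bytes, so that last step illustrates the idea, not Windows' exact behaviour.

```python
name = "Наӥв"

# The abstract code points, as listed in the post.
print([ord(c) for c in name])        # [1053, 1072, 1253, 1074]

# NTFS stores names as 16-bit UTF-16 code units in little-endian order.
on_disk = name.encode("utf-16-le")
print(on_disk)                       # b'\x1d\x040\x04\xe5\x042\x04'

# Python's plain "utf-16" codec prepends a byte-order mark.
print(name.encode("utf-16")[:2])     # b'\xff\xfe' on little-endian machines

# Rough simulation of the "0å2" reading: one byte per character (Latin-1),
# keeping only the printable characters.
narrow = on_disk.decode("latin-1")
print("".join(c for c in narrow if c.isprintable()))   # 0å2
```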
|
On 2/12/2012 3:12 AM, Steven D'Aprano wrote:
> NTFS by default uses the UTF-16 encoding, which means the actual bytes
> written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
> byte-order mark \xff\xfe).

That's what I meant. Those bytes will be interpreted consistently across
all locales.

> Windows has two separate APIs, one for "wide" characters, the other for
> single bytes. Depending on which one you use, the directory will appear
> to be called Наӥв or 0å2.

Yes, and AFAIK, the wide API is the default. The other one only exists
to support programs that don't support the wide API (generally, such
programs were intended to be used on older platforms that lack that
API).

> But in any case, we're not talking about the file name encoding. We're
> talking about the contents of files.

Okay then. As I stated, this has nothing to do with the OS since
programs are free to interpret bytes any way they like.

--
CPython 3.2.2 | Windows NT 6.1.7601.17640
|
On 12/02/2012 08:26, Matej Cepl wrote:
> On 12.2.2012 09:14, Matej Cepl wrote:
>>> Obvious answers:
>>>
>>> - Try decoding with UTF8 or Latin1. Even if you don't get the right
>>> characters, you'll get *something*.
>>>
>>> - Use open(filename, encoding='ascii', errors='surrogateescape')
>>>
>>> (Or possibly errors='ignore'.)
>>
>> These are not good answers, IMHO. The only answer I can think of, really,
>> is:
>
> A slightly less flameish answer to the question “What should I do,
> really?” is a tough one: all these suggested answers are bad because
> they don’t deal with the fact that your input data are obviously
> broken. The rest is just pure GIGO … without fixing (and I mean, really,
> fixing, not ignoring the problem, which is what the previous answers
> suggest) your input, you’ll get garbage on output. And you should be
> thankful to py3k that it showed the issue to you.
>
> BTW, can you display the following line?
>
> Příliš žluťoučký kůň úpěl ďábelské ódy.
>
> Best,
>
> Matěj

Yes, in Thunderbird, Notepad, WordPad and Notepad++ on Windows Vista;
can't be bothered to try any other apps.

--
Cheers.

Mark Lawrence.
|
In article <[hidden email]>,
Chris Angelico <[hidden email]> wrote:

> On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
> <[hidden email]> wrote:
> > On Feb 11, 8:23 pm, Steven D'Aprano <steve
> > +[hidden email]> wrote:
> >> "I have a file containing text. I can open it in an editor and see it's
> >> nearly all ASCII text, except for a few weird and bizarre characters like
> >> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> >> error. What should I do that requires no thought?"
> >>
> >> Obvious answers:
> >
> > the most obvious answer would be to read the file WITHOUT worrying
> > about asinine encoding.
>
> What this statement misunderstands, though, is that ASCII is itself an
> encoding. Files contain bytes, and it's only what's external to those
> bytes that gives them meaning.

becoming a universal standard which lasted for decades, people who grew
up with it don't realize there was once any other way. Not just EBCDIC,
but also SIXBIT, RAD-50, tilt/rotate, packed card records, and so on.
Transcoding was a way of life, and if you didn't know what you were
starting with and aiming for, it was hopeless. Kind of like now where
we are again with Unicode.

</soapbox>
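Transcoding of the kind described above is still a one-liner when both ends are known. A sketch using Python's built-in cp037 codec (EBCDIC code page 037, used on IBM systems in the US and Canada): the same word has entirely different byte values in EBCDIC and ASCII, and converting means decoding from one and encoding to the other.

```python
word = "HELLO"

ebcdic = word.encode("cp037")   # EBCDIC code page 037
ascii_ = word.encode("ascii")

print(ebcdic.hex())   # c8c5d3d3d6
print(ascii_.hex())   # 48454c4c4f

# Transcoding: decode with the source encoding, encode with the target one.
# Getting either name wrong produces garbage or an error.
transcoded = ebcdic.decode("cp037").encode("ascii")
assert transcoded == ascii_

# Reading the EBCDIC bytes as if they were Latin-1 shows what happens when
# you don't know what you're starting with.
print(ebcdic.decode("latin-1"))
```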