Python usage numbers

Eric Snow-2
Does anyone have (or know of) accurate totals and percentages on how
Python is used?  I'm particularly interested in the following
groupings:

- new development vs. stable code-bases
- categories (web, scripts, "big data", computation, etc.)
- "bare metal" vs. on top of some framework
- regional usage

I'm thinking about this partly because of the discussion on
python-ideas about the perceived challenges of Unicode in Python 3.
All the rhetoric, anecdotal evidence, and use-cases there have little
meaning to me, in regards to Python as a whole, without an
understanding of who is actually affected.

For instance, if frameworks (like django and numpy) could completely
hide the arguable challenges of Unicode in Python 3--and most projects
were built on top of frameworks--then general efforts for making
Unicode easier in Python 3 should go toward helping framework writers.

Such usage numbers are useful for more than the Unicode discussion
(which I wish would get resolved and die so we could move on to more
interesting stuff :) ).  They help us know where efforts could be
focused in general to make Python more powerful and easier to use
where it's already used extensively.  They can also show us the areas
where Python isn't used much, thus exposing a targeted opportunity to
change that.

Realistically, it's not entirely feasible to compile such information
at a comprehensive level, but even generally accurate numbers would be
a valuable resource.  If the numbers aren't out there, what would be
some good approaches to discovering them?  Thanks!

-eric
--
http://mail.python.org/mailman/listinfo/python-list

Re: Python usage numbers

Stefan Behnel-3
Eric Snow, 11.02.2012 22:02:
> - categories (web, scripts, "big data", computation, etc.)

No numbers, but from my stance, the four largest areas where Python is used
appear to be (in increasing line length order):

a) web applications
b) scripting and tooling
c) high-performance computation
d) testing (non-Python/embedded/whatever code)

I'm sure others will manage to remind me of the one or two I forgot...

Stefan


Re: Python usage numbers

Andrew Berg-4
In reply to this post by Eric Snow-2
On 2/11/2012 3:02 PM, Eric Snow wrote:
> I'm thinking about this partly because of the discussion on
> python-ideas about the perceived challenges of Unicode in Python 3.

> For instance, if frameworks (like django and numpy) could completely
> hide the arguable challenges of Unicode in Python 3--and most projects
> were built on top of frameworks--then general efforts for making
> Unicode easier in Python 3 should go toward helping framework writers.
Huh? I'll admit I'm a novice, but isn't Unicode mostly trivial in py3k
compared to 2.x? Or are you referring to porting 2.x to 3.x? I've been
under the impression that Unicode in 2.x can be painful at times, but
easy in 3.x.
I've been using 3.2 and Unicode hasn't been much of an issue.
--
CPython 3.2.2 | Windows NT 6.1.7601.17640

Re: Python usage numbers

Mark Lawrence
In reply to this post by Eric Snow-2
On 11/02/2012 21:02, Eric Snow wrote:

> Does anyone have (or know of) accurate totals and percentages on how
> Python is used?  I'm particularly interested in the following
> groupings:
>
> - new development vs. stable code-bases
> - categories (web, scripts, "big data", computation, etc.)
> - "bare metal" vs. on top of some framework
> - regional usage
>
> I'm thinking about this partly because of the discussion on
> python-ideas about the perceived challenges of Unicode in Python 3.
> All the rhetoric, anecdotal evidence, and use-cases there have little
> meaning to me, in regards to Python as a whole, without an
> understanding of who is actually affected.
>
> For instance, if frameworks (like django and numpy) could completely
> hide the arguable challenges of Unicode in Python 3--and most projects
> were built on top of frameworks--then general efforts for making
> Unicode easier in Python 3 should go toward helping framework writers.
>
> Such usage numbers are useful for more than the Unicode discussion
> (which I wish would get resolved and die so we could move on to more
> interesting stuff :) ).  They help us know where efforts could be
> focused in general to make Python more powerful and easier to use
> where it's already used extensively.  They can also show us the areas
> where Python isn't used much, thus exposing a targeted opportunity to
> change that.
>
> Realistically, it's not entirely feasible to compile such information
> at a comprehensive level, but even generally accurate numbers would be
> a valuable resource.  If the numbers aren't out there, what would be
> some good approaches to discovering them?  Thanks!
>
> -eric

As others have said on other Python newsgroups, it ain't a problem.  The
only time I've ever had a problem was with matplotlib, which couldn't
print a £ sign.  I used a u prefix to force Unicode, job done.  If I had a
major problem I reckon that a search on c.l.p would give me an answer
easy peasy.

--
Cheers.

Mark Lawrence.


Re: Python usage numbers

Eric Snow-2
In reply to this post by Andrew Berg-4
On Sat, Feb 11, 2012 at 2:51 PM, Andrew Berg <[hidden email]> wrote:

> On 2/11/2012 3:02 PM, Eric Snow wrote:
>> I'm thinking about this partly because of the discussion on
>> python-ideas about the perceived challenges of Unicode in Python 3.
>
>> For instance, if frameworks (like django and numpy) could completely
>> hide the arguable challenges of Unicode in Python 3--and most projects
>> were built on top of frameworks--then general efforts for making
>> Unicode easier in Python 3 should go toward helping framework writers.
> Huh? I'll admit I'm a novice, but isn't Unicode mostly trivial in py3k
> compared to 2.x? Or are you referring to porting 2.x to 3.x? I've been
> under the impression that Unicode in 2.x can be painful at times, but
> easy in 3.x.
> I've been using 3.2 and Unicode hasn't been much of an issue.

My expectation is that yours is the common experience.  However, in at
least one current thread (on python-ideas) and at a variety of times
in the past, _some_ people have found Unicode in Python 3 to make more
work.  So that got me to thinking about whose experience is the
general case, and if any concerns broadly apply to more than
framework/library writers (like django, jinja, twisted, etc.).  Having
usage statistics would be helpful in identifying the impact of things
like Unicode in Python 3.

-eric

Re: Python usage numbers

Chris Angelico
On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow <[hidden email]> wrote:
> However, in at
> least one current thread (on python-ideas) and at a variety of times
> in the past, _some_ people have found Unicode in Python 3 to make more
> work.

If Unicode in Python is causing you more work, isn't it most likely
that the issue would have come up anyway? For instance, suppose you
have a web form and you accept customer names, which you then store in
a database. You could assume that the browser submits it in UTF-8 and
that your database back-end can accept UTF-8, and then pretend that
it's all ASCII, but if you then want to upper-case the name for a
heading, somewhere you're going to need to deal with Unicode; and when
your programming language has facilities like str.upper(), that's
going to make it easier, not harder. Sure, the simple case is easier if
you pretend it's all ASCII, but it's still better to have language
facilities.
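
A minimal sketch of the upper-casing point, assuming Python 3. The
customer name and the variable names are made up for illustration:

```python
# Hypothetical form input: the UTF-8 bytes a browser might submit for "zoë".
raw = "zo\u00eb".encode("utf-8")

# Pretending it's all ASCII: bytes.upper() only changes ASCII letters,
# so the ë stays in lower case.
pretend_ascii = raw.upper().decode("utf-8")   # 'ZOë'

# Decoding first lets str.upper() handle the whole name.
heading = raw.decode("utf-8").upper()         # 'ZOË'
```

Either way the bytes have to be decoded at some point; the only
question is whether the language helps once they are.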

ChrisA

Re: Python usage numbers

Eric Snow-2
On Sat, Feb 11, 2012 at 6:28 PM, Chris Angelico <[hidden email]> wrote:

> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow <[hidden email]> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times
>> in the past, _some_ people have found Unicode in Python 3 to make more
>> work.
>
> If Unicode in Python is causing you more work, isn't it most likely
> that the issue would have come up anyway? For instance, suppose you
> have a web form and you accept customer names, which you then store in
> a database. You could assume that the browser submits it in UTF-8 and
> that your database back-end can accept UTF-8, and then pretend that
> it's all ASCII, but if you then want to upper-case the name for a
> heading, somewhere you're going to need to deal with Unicode; and when
> your programming language has facilities like str.upper(), that's
> going to make it easier, not harder. Sure, the simple case is easier if
> you pretend it's all ASCII, but it's still better to have language
> facilities.

Yeah, that's how I see it too.  However, my sample size is much too
small to have any sense of the broader Python 3 experience.  That's
what I'm going for with those Python usage statistics (if it's even
feasible).

-eric

Re: Python usage numbers

Steven D'Aprano-11
In reply to this post by Eric Snow-2
On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:

> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <[hidden email]> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.
>
> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?

The argument being made is that in Python 2, if you try to read a file
that contains Unicode characters encoded with some unknown codec, you
don't have to think about it. Sure, you get mojibake rubbish in your
database, but that's the fault of people who insist on not being
American. Or who spell Zoe with an umlaut.

In Python 3, if you try the same thing, you get an error. Fixing the
error requires thought, and even if that is only a minuscule amount of
thought, that's too much for some developers who are scared of Unicode.
Hence the FUD that Python 3 is too hard because it makes you learn
Unicode.
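
A rough sketch of that difference, assuming Python 3. The sample text
and the temporary file are invented for the example, and binary mode
stands in for what Python 2's plain open() handed back:

```python
import os
import tempfile

# Hypothetical sample: mostly-ASCII text saved by some tool as Latin-1.
data = "Price: \u00a310 \u00a9 Zo\u00eb\n".encode("latin-1")

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

try:
    # Binary mode returns the bytes untouched: no error, but also no
    # idea what the bytes mean.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode has to decode. With a wrong guess (UTF-8 here) the
    # problem surfaces immediately instead of ending up in the database.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True
finally:
    os.remove(path)
```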

I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with
Joel Spolsky on this one: if you're a programmer in 2003 who doesn't have
at least a basic working knowledge of Unicode, you're the equivalent of a
doctor who doesn't believe in germs.

http://www.joelonsoftware.com/articles/Unicode.html

Learning a basic working knowledge of Unicode is not that hard. You don't
need to be an expert, and it's just not that scary.

The use-case given is:

"I have a file containing text. I can open it in an editor and see it's
nearly all ASCII text, except for a few weird and bizarre characters like
£ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
error. What should I do that requires no thought?"

Obvious answers:

- Try decoding with UTF8 or Latin1. Even if you don't get the right
characters, you'll get *something*.

- Use open(filename, encoding='ascii', errors='surrogateescape')

(Or possibly errors='ignore'.)
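
The three options above can be sketched as follows, assuming Python 3.
The file contents are made up, and newline="" is only there so the
bytes come back unchanged on every platform:

```python
import os
import tempfile

# Hypothetical file contents: UTF-8 text with a few non-ASCII characters.
original = "caf\u00e9 \u00a3 \u00a9\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(original)
    path = f.name

try:
    # Latin-1 maps every byte to a character, so it never fails, but a
    # wrong guess gives the wrong characters.
    with open(path, encoding="latin-1", newline="") as f:
        as_latin1 = f.read()

    # surrogateescape carries each undecodable byte through as a lone
    # surrogate, so the original bytes can be written back unchanged.
    with open(path, encoding="ascii", errors="surrogateescape",
              newline="") as f:
        escaped = f.read()
    round_tripped = escaped.encode("ascii", errors="surrogateescape")

    # errors="ignore" silently drops the bytes it cannot decode.
    with open(path, encoding="ascii", errors="ignore", newline="") as f:
        ignored = f.read()
finally:
    os.remove(path)
```

Only the surrogateescape route keeps enough information to
reconstruct the file; the other two lose or distort data.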



--
Steven

Re: Python usage numbers

Rick Johnson
On Feb 11, 8:23 pm, Steven D'Aprano <steve
+[hidden email]> wrote:

> On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> > On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> > <[hidden email]> wrote:
> >> However, in at
> >> least one current thread (on python-ideas) and at a variety of times in
> >> the past, _some_ people have found Unicode in Python 3 to make more
> >> work.
>
> > If Unicode in Python is causing you more work, isn't it most likely that
> > the issue would have come up anyway?
>
> The argument being made is that in Python 2, if you try to read a file
> that contains Unicode characters encoded with some unknown codec, you
> don't have to think about it. Sure, you get mojibake rubbish in your
> database, but that's the fault of people who insist on not being
> American. Or who spell Zoe with an umlaut.

That's not the worst of it... i have many times had a block of text
that was valid ASCII except for some intermixed Unicode white-space.
Who the hell would even consider inserting Unicode white-space!!!
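
For what it's worth, a small sketch of how such white-space could be
found and flattened once the text has been decoded, assuming Python 3.
The sample string is invented:

```python
# Hypothetical sample: looks like plain ASCII, but hides a no-break
# space (U+00A0) and an em space (U+2003).
text = "spam\u00a0eggs\u2003ham and toast"

# Report every whitespace character that is not ASCII, with its position.
odd_spaces = [(i, hex(ord(ch))) for i, ch in enumerate(text)
              if ch.isspace() and ord(ch) > 127]

# str.split() with no arguments splits on any Unicode whitespace, so one
# way to normalise is to split and re-join with plain spaces.
normalised = " ".join(text.split())
```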

> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"
>
> Obvious answers:

the most obvious answer would be to read the file WITHOUT worrying
about asinine encoding.

Re: Python usage numbers

Chris Angelico
On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
<[hidden email]> wrote:

> On Feb 11, 8:23 pm, Steven D'Aprano <steve
> +[hidden email]> wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.

What this statement misunderstands, though, is that ASCII is itself an
encoding. Files contain bytes, and it's only what's external to those
bytes that gives them meaning. The famous "bush hid the facts" trick
with Windows Notepad shows the folly of trying to use internal
evidence to identify meaning from bytes.

Everything that displays text to a human needs to translate bytes into
glyphs, and the usual way to do this conceptually is to go via
characters. Pretending that it's all the same thing really means
pretending that one byte represents one character and that each
character is depicted by one glyph. And that's doomed to failure,
unless everyone speaks English with no foreign symbols - so, no
mathematical notations.
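
A short sketch of the same point, assuming Python 3. The byte values
are just an example:

```python
data = b"\xc2\xa3"   # two bytes with no label attached

as_utf8 = data.decode("utf-8")      # '£'  : one character
as_latin1 = data.decode("latin-1")  # 'Â£' : two characters

# Pure ASCII text only looks encoding-free because ASCII, Latin-1 and
# UTF-8 all agree on bytes below 128.
plain = "heading"
same_everywhere = (plain.encode("ascii")
                   == plain.encode("latin-1")
                   == plain.encode("utf-8"))

# One byte per character stops holding as soon as a pound sign turns up.
byte_count = len("\u00a3".encode("utf-8"))  # 2
```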

ChrisA

Re: Python usage numbers

Steven D'Aprano-11
In reply to this post by Rick Johnson
On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:

> Everything that displays text to a human needs to translate bytes into
> glyphs, and the usual way to do this conceptually is to go via
> characters. Pretending that it's all the same thing really means
> pretending that one byte represents one character and that each
> character is depicted by one glyph. And that's doomed to failure, unless
> everyone speaks English with no foreign symbols - so, no mathematical
> notations.

Pardon me, but you can't even write *English* in ASCII.

You can't say that it cost you £10 to courier your résumé to the head
office of Encyclopædia Britannica to apply for the position of Staff
Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
and old-fashioned, but it is traditional English.)

Hell, you can't even write in *American*: you can't say that the recipe
for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.

ASCII truly is a blight on the world, and the sooner it fades into
obscurity, like EBCDIC, the better.

Even if everyone did change to speak ASCII, you still have all the
historical records and documents and files to deal with. Encodings are
not going away.


--
Steven

Re: Python usage numbers

Chris Angelico
On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano
<[hidden email]> wrote:
> You can't say that it cost you £10 to courier your résumé to the head
> office of Encyclopædia Britannica to apply for the position of Staff
> Coördinator.

True, but if it cost you $10 (or 10 GBP) to courier your curriculum
vitae to the head office of Encyclopaedia Britannica to become Staff
Coordinator, then you'd be fine. And if it cost you $10 to post your
work summary to Britannica's administration to apply for this Staff
Coordinator position, you could say it without 'e' too. Doesn't mean
you don't need Unicode!

ChrisA

Re: Python usage numbers

Steven D'Aprano-11
In reply to this post by Rick Johnson
On Sat, 11 Feb 2012 18:36:52 -0800, Rick Johnson wrote:

>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters
>> like £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I
>> get an error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying about
> asinine encoding.

Your mad leet reading comprehension skillz leave me in awe Rick.

If you try to read a file containing non-ASCII characters encoded using
UTF8 on Windows without explicitly specifying either UTF8 as the
encoding, or an error handler, you will get an exception.

It's not just UTF8 either, but nearly all encodings. You can't even
expect to avoid problems if you stick to nothing but Windows, because
Windows' default encoding is localised: a file generated in (say) Israel
or Japan or Germany will use a different code page (encoding) by default
than one generated in (say) the US, Canada or UK.
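
A small sketch of the code-page problem, assuming Python 3. The three
Windows code pages are picked only as examples, and the default
encoding it reports will differ from machine to machine:

```python
import locale

# The locale's preferred encoding, which open() traditionally falls back
# to when none is given. It depends on platform and settings, so no
# value is shown here.
default_encoding = locale.getpreferredencoding(False)

# One byte, three Windows code pages, three different letters.
byte = b"\xe0"
western = byte.decode("cp1252")   # U+00E0, Latin small a with grave
cyrillic = byte.decode("cp1251")  # U+0430, Cyrillic small a
hebrew = byte.decode("cp1255")    # U+05D0, Hebrew alef

# UTF-8 bytes read as ASCII fail outright rather than giving mojibake.
try:
    "\u00f6".encode("utf-8").decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True
```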



--
Steven

Re: Python usage numbers

Andrew Berg-4
On 2/12/2012 12:10 AM, Steven D'Aprano wrote:
> It's not just UTF8 either, but nearly all encodings. You can't even
> expect to avoid problems if you stick to nothing but Windows, because
> Windows' default encoding is localised: a file generated in (say) Israel
> or Japan or Germany will use a different code page (encoding) by default
> than one generated in (say) the US, Canada or UK.
Generated by what? Windows will store a locale value for programs to
use, but programs use Unicode internally by default (i.e., API calls are
Unicode unless they were built for old versions of Windows), and the
default filesystem (NTFS) uses Unicode for file names. AFAIK, only the
terminal has a localized code page by default.
Perhaps Notepad will write text files with the localized code page by
default, but that's an application choice...

--
CPython 3.2.2 | Windows NT 6.1.7601.17640

Re: Python usage numbers

Matěj Cepl
In reply to this post by Steven D'Aprano-11
On 12.2.2012 03:23, Steven D'Aprano wrote:

> The use-case given is:
>
> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"
>
> Obvious answers:
>
> - Try decoding with UTF8 or Latin1. Even if you don't get the right
> characters, you'll get *something*.
>
> - Use open(filename, encoding='ascii', errors='surrogateescape')
>
> (Or possibly errors='ignore'.)

These are not good answers, IMHO. The only answer I can think of, really, is:

- pack your luggage, your submarine is waiting for you to peel onions in it
(with reference to Joel's article). Meaning, really, you should
learn your craft and pull your head out of the sand. There is a wider
world around you.

(and yes, I am a Czech, so I need at least latin-2 for my language).

Best,

Matěj

Re: Python usage numbers

Matěj Cepl
On 12.2.2012 09:14, Matej Cepl wrote:

>> Obvious answers:
>>
>> - Try decoding with UTF8 or Latin1. Even if you don't get the right
>> characters, you'll get *something*.
>>
>> - Use open(filename, encoding='ascii', errors='surrogateescape')
>>
>> (Or possibly errors='ignore'.)
>
> These are not good answers, IMHO. The only answer I can think of, really,
> is:

A slightly less flameish answer to the question “What should I do,
really?” is a tough one: all these suggested answers are bad because
they don’t deal with the fact that your input data are obviously
broken. The rest is just pure GIGO … without fixing (and I mean really
fixing, not ignoring the problem, which is what the previous answers
suggest) your input, you’ll get garbage on output. And you should be
thankful to py3k that it showed the issue to you.

BTW, can you display the following line?

Příliš žluťoučký kůň úpěl ďábelské ódy.
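
Whether that line survives depends on the encoding in play. A quick
sketch, assuming Python 3; the helper can_encode is made up for the
example:

```python
import unicodedata

# Normalise to precomposed characters so the result does not depend on
# how the literal was typed or pasted.
line = unicodedata.normalize(
    "NFC", "Příliš žluťoučký kůň úpěl ďábelské ódy.")


def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


results = {enc: can_encode(line, enc)
           for enc in ("ascii", "latin-1", "iso8859-2", "utf-8")}
```

ASCII and Latin-1 cannot represent the sentence; Latin-2 (iso8859-2)
and UTF-8 can.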

Best,

Matěj

Re: Python usage numbers

Steven D'Aprano-11
In reply to this post by Steven D'Aprano-11
On Sun, 12 Feb 2012 01:05:35 -0600, Andrew Berg wrote:

> On 2/12/2012 12:10 AM, Steven D'Aprano wrote:
>> It's not just UTF8 either, but nearly all encodings. You can't even
>> expect to avoid problems if you stick to nothing but Windows, because
>> Windows' default encoding is localised: a file generated in (say)
>> Israel or Japan or Germany will use a different code page (encoding) by
>> default than one generated in (say) the US, Canada or UK.
> Generated by what? Windows will store a locale value for programs to
> use, but programs use Unicode internally by default

Which programs? And we're not talking about what they use internally, but
what they write to files.


> (i.e., API calls are
> Unicode unless they were built for old versions of Windows), and the
> default filesystem (NTFS) uses Unicode for file names.

No. File systems do not use Unicode for file names. Unicode is an
abstract mapping between code points and characters. File systems are
written using bytes.

Suppose you're a fan of Russian punk band Наӥв and you have a directory
of their music. The file system doesn't store the Unicode code points
1053 1072 1253 1074, it has to be encoded to a sequence of bytes first.

NTFS by default uses the UTF-16 encoding, which means the actual bytes
written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
byte-order mark \xff\xfe).

Windows has two separate APIs, one for "wide" characters, the other for
single bytes. Depending on which one you use, the directory will appear
to be called Наӥв or 0å2.
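
The numbers above can be checked from Python 3. This is only a sketch
of the encoding arithmetic, not of what NTFS or the Windows APIs do
internally, and reading the bytes as Latin-1 is an assumption standing
in for the single-byte view:

```python
name = "\u041d\u0430\u04e5\u0432"  # the band name, written with escapes

code_points = [ord(ch) for ch in name]  # [1053, 1072, 1253, 1074]
on_disk = name.encode("utf-16-le")      # b'\x1d\x040\x04\xe5\x042\x04'

# Misreading those bytes one byte per character and dropping the
# unprintable control characters leaves the garbled name.
misread = "".join(ch for ch in on_disk.decode("latin-1")
                  if ch.isprintable())
```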

But in any case, we're not talking about the file name encoding. We're
talking about the contents of files.


> AFAIK, only the
> terminal has a localized code page by default. Perhaps Notepad will
> write text files with the localized code page by default, but that's an
> application choice...

Exactly. And unless you know what encoding the application chooses, you
will likely get an exception trying to read the file.


--
Steven

Re: Python usage numbers

Andrew Berg-4
On 2/12/2012 3:12 AM, Steven D'Aprano wrote:
> NTFS by default uses the UTF-16 encoding, which means the actual bytes
> written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
> byte-order mark \xff\xfe).
That's what I meant. Those bytes will be interpreted consistently across
all locales.

> Windows has two separate APIs, one for "wide" characters, the other for
> single bytes. Depending on which one you use, the directory will appear
> to be called Наӥв or 0å2.
Yes, and AFAIK, the wide API is the default. The other one only exists
to support programs that don't support the wide API (generally, such
programs were intended to be used on older platforms that lack that API).

> But in any case, we're not talking about the file name encoding. We're
> talking about the contents of files.
Okay then. As I stated, this has nothing to do with the OS since
programs are free to interpret bytes any way they like.

--
CPython 3.2.2 | Windows NT 6.1.7601.17640

Re: Python usage numbers

Mark Lawrence
In reply to this post by Matěj Cepl
On 12/02/2012 08:26, Matej Cepl wrote:

> On 12.2.2012 09:14, Matej Cepl wrote:
>>> Obvious answers:
>>>
>>> - Try decoding with UTF8 or Latin1. Even if you don't get the right
>>> characters, you'll get *something*.
>>>
>>> - Use open(filename, encoding='ascii', errors='surrogateescape')
>>>
>>> (Or possibly errors='ignore'.)
>>
>> These are not good answers, IMHO. The only answer I can think of, really,
>> is:
>
> A slightly less flameish answer to the question “What should I do,
> really?” is a tough one: all these suggested answers are bad because
> they don’t deal with the fact that your input data are obviously
> broken. The rest is just pure GIGO … without fixing (and I mean really
> fixing, not ignoring the problem, which is what the previous answers
> suggest) your input, you’ll get garbage on output. And you should be
> thankful to py3k that it showed the issue to you.
>
> BTW, can you display the following line?
>
> Příliš žluťoučký kůň úpěl ďábelské ódy.
>
> Best,
>
> Matěj

Yes in Thunderbird, Notepad, Wordpad and Notepad++ on Windows Vista,
can't be bothered to try any other apps.

--
Cheers.

Mark Lawrence.


Re: Python usage numbers

Roy Smith
In reply to this post by Rick Johnson
In article <[hidden email]>,
 Chris Angelico <[hidden email]> wrote:

> On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
> <[hidden email]> wrote:
> > On Feb 11, 8:23 pm, Steven D'Aprano <steve
> > +[hidden email]> wrote:
> >> "I have a file containing text. I can open it in an editor and see it's
> >> nearly all ASCII text, except for a few weird and bizarre characters like
> >> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> >> error. What should I do that requires no thought?"
> >>
> >> Obvious answers:
> >
> > the most obvious answer would be to read the file WITHOUT worrying
> > about asinine encoding.
>
> What this statement misunderstands, though, is that ASCII is itself an
> encoding. Files contain bytes, and it's only what's external to those
> bytes that gives them meaning.
Exactly.  <soapbox class="wise-old-geezer">  ASCII was so successful at
becoming a universal standard which lasted for decades that people who
grew up with it don't realize there was once any other way.  Not just
EBCDIC, but also SIXBIT, RAD-50, tilt/rotate, packed card records, and
so on.  Transcoding was a way of life, and if you didn't know what you
were starting with and aiming for, it was hopeless.  Kind of like where
we are again now with Unicode.  </soapbox>
