Best practice for opening files for newbies?

Chris Barker
Folks,

I'm in the position of teaching Python to beginners (beginners to Python, anyway).

I'm teaching Python 2 -- because that is still what most of the code "in the wild" is written in. I do think I'll transition to Python 3 fairly soon, as it's not too hard for folks to back-port their knowledge, but for now, it's Py2 -- and I'm hoping not to have that debate on this thread.

But I do want to keep the 2->3 transition in mind, so where it's not too hard, I want to teach approaches that will transition well to py3.

So: there are way too many ways to open a simple file to read or write a bit of text (or binary):

open()
file()
io.open()
codecs.open()

others???

I'm thinking that the way to go now with modern Py2 is:

from io import open

then use open() .....

IIUC, this will give the user an open() that behaves the same way as py3's open() (identical?).
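
A minimal sketch of that approach, assuming a scratch file in a temporary directory (the name `example_notes.txt` is hypothetical) and an explicitly stated UTF-8 encoding. It is meant to run unchanged on Python 2.6+ and Python 3:

```python
from io import open  # on Python 3 this is the same object as the built-in open()
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example_notes.txt")

# Text mode: write and read unicode strings, with the encoding stated explicitly.
with open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9\n")

with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode: the same file seen as raw, undecoded bytes.
with open(path, "rb") as f:
    raw = f.read()
```

Passing `encoding` every time is a choice made for this sketch, so that the result does not depend on the platform default.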

The only issue (so far) I've run into is this:

In [51]: f = io.open("test_file.txt", 'w')

In [52]: f.write("some string")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-f874778a72a1> in <module>()
----> 1 f.write("some string")

TypeError: must be unicode, not str

I'm OK with that -- I think it's better for folks learning py2 now to get used to Unicode up front anyway.
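
For reference, a small sketch of how the write above can be made to succeed under Python 2: hand `write()` a unicode string, here by using the `u` prefix. The temporary directory and the explicit `encoding="utf-8"` are assumptions added for the illustration; `from __future__ import unicode_literals` at the top of the module would be another way to get the same effect.

```python
import io
import os
import tempfile

# Same file name as in the session above, placed in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "test_file.txt")

with io.open(path, "w", encoding="utf-8") as f:
    # The u prefix makes this a unicode string on Python 2;
    # Python 3.3+ accepts the prefix as well, so the line is portable.
    f.write(u"some string")

with io.open(path, encoding="utf-8") as f:
    content = f.read()
```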

But any other issues? Is this a good way to go?

By the way: I note that the default encoding for io.open on my system (OS-X) is utf-8, despite:
In [54]: sys.getdefaultencoding()
Out[54]: 'ascii'

How is that determined?
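
The io module documentation describes the default encoding as platform dependent, taken from locale.getpreferredencoding() rather than from sys.getdefaultencoding() (which governs implicit str/unicode conversions in Python 2). A small sketch for checking this on a given machine; the temporary file is only there for illustration, and the printed values will differ from system to system:

```python
import io
import locale
import os
import sys
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test_file.txt")

with io.open(path, "w") as f:    # no encoding argument given
    file_encoding = f.encoding   # the encoding io.open actually picked

print("io.open default:          " + file_encoding)
print("locale preferred:         " + locale.getpreferredencoding(False))
print("sys.getdefaultencoding(): " + sys.getdefaultencoding())
```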

-CHB



Chris Angelico
On Fri, Sep 19, 2014 at 2:19 AM,  <chris.barker at noaa.gov> wrote:
> So: there are way too many ways to open a simple file to read or write a bit of text (or binary):
>
> open()

Personally, I'd just use this, all the way through - and not importing
from io, either. But others may disagree.

Be clear about what's text and what's bytes, everywhere. When you do
make the jump to Py3, you'll have to worry about text files vs binary
files, and if you need to support Windows as well as Unix, you need to
get that right anyway, so just make sure you get the two straight.
Going Py3 will actually make your job quite a bit easier, there; but
even if you don't, save yourself a lot of trouble later on by keeping
the difference very clear. And you can save yourself some more
conversion trouble by tossing this at the top of every .py file you
write:

from __future__ import print_function, division, unicode_literals

But mainly, just go with the simple open() call and do the job the
easiest way you can. And go Py3 as soon as you can, because ...

> because that is still what most of the code "in the wild" is in.

... this statement isn't really an obvious truth any more (it's hard
to say what "most" code is), and it's definitely not going to remain
so for the long-term future. For people learning Python today, unless
they plan on having a really short career in programming, more of
their time will be after 2020 than before it, and Python 3 is the way
to go.

Plus, it's just way WAY easier to get Unicode right in Py3 than in
Py2. Save yourself the hassle!

ChrisA


Chris Barker
In reply to this post by Chris Barker
On Thursday, September 18, 2014 9:38:00 AM UTC-7, Chris Angelico wrote:
> On Fri, Sep 19, 2014 at 2:19 AM,  <chris.barker at noaa.gov> wrote:
> > So: there are way too many ways to open a simple file to read or write a bit of text (or binary):
> > open()
>
> Personally, I'd just use this, all the way through - and not importing
> from io, either. But others may disagree.

Well, the catch there is that it's seriously tricky to work with text files that aren't ASCII-compatible if you do that...
 
> Be clear about what's text and what's bytes, everywhere. When you do
> make the jump to Py3, you'll have to worry about text files vs binary
> files, and if you need to support Windows as well as Unix, you need to
> get that right anyway, so just make sure you get the two straight.

Yup -- I've always emphasized that point, but from a py2 perspective (and with the built-in open() file object), what is a utf-8 encoded file: text or bytes? It's bytes -- and you need to do the decoding yourself. Why make people do that?

In the past, I started with open(), ignored unicode for a while, then when I introduced unicode, I pointed them to codecs.open() (I hadn't discovered io.open yet). Maybe I should stick with this approach, but it feels like a bad idea.

> Save yourself a lot of trouble later on by keeping
> the difference very clear.

Exactly -- but it's equally clear, easier, and more robust to have two types of files: binary and text, where text requires a known encoding, rather than three types: binary, ascii text, and encoded text, which is really binary that you then decode to make text...

Think of something as simple and common as looping through the lines in a file!
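
To make that concrete, here is a rough sketch comparing the two ways of looping over lines. The file name and sample text are made up for the illustration, and binary mode stands in for the undecoded bytes that the Python 2 built-in open() hands back:

```python
import io
import os
import tempfile

# Hypothetical sample file containing non-ASCII text.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve\nr\u00e9sum\u00e9\n")

# Binary mode: every line arrives as bytes and has to be decoded by hand.
decoded_by_hand = []
with io.open(path, "rb") as f:
    for raw_line in f:
        decoded_by_hand.append(raw_line.decode("utf-8").rstrip())

# Text mode with a known encoding: every line is already text.
decoded_by_io = []
with io.open(path, "r", encoding="utf-8") as f:
    for line in f:
        decoded_by_io.append(line.rstrip())
```

Both loops end up with the same list of unicode strings; the second just never exposes the undecoded bytes to the student.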

> And you can save yourself some more
> conversion trouble by tossing this at the top of every .py file you
> write:
>
> from __future__ import print_function, division, unicode_literals

Yup -- I've been thinking of recommending that to my students as well -- particularly unicode_literals.
 
> But mainly, just go with the simple open() call and do the job the
> easiest way you can. And go Py3 as soon as you can, because ...

A discussion for another thread....

Thanks,
    -Chris