[Tutor] question about operator overloading


Albert-jan Roskam
Hi,

I am extending a program for a hobby project where potentially huge SPSS files are read. I would like to add functionality to append files. I thought it would be nice and intuitive to overload + and += for this. The code below is a gross simplification, but I want to get the basics right. Is this how operator overloading is usually done?

class Append(object):

    def __init__(self, file1, file2=None):
        """ file1 and file2 will actually be of a class of my own,
        which has a readFile method that is a generator that returns
        one record at a time """
        self.file1 = file1
        self.file2 = file2
        self.merged = []

    def __add__(self):
        self.file1.extend(self.file2)
        return self.file1

    def __iadd__(self):
        self.merged.extend(self.file1)
        return self.merged
       
    def writerows(self):
        rows = self.file1
        for row in rows:
            yield row

# overloading '+'
file1 = [[1, 2, 3], [4, 5, 6], [6, 6, 6]]       
file2 = [[1, 2, 3]]
app = Append(file1, file2)
merged = app.file1 + app.file2 # 'merged'  will not actually hold data
for line in app.writerows():
    print line

# overloading '+='
files = [file1, file2]
for i, f in enumerate(files):
    if i == 0:
        app = Append(f)
        app.merged = f
    else:
        app.merged += f
print app.merged
 
Thank you in advance!

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

_______________________________________________
Tutor maillist  -  [hidden email]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] question about operator overloading

Dave Angel-3
On 03/05/2012 03:16 PM, Albert-Jan Roskam wrote:

> Hi,
>
> I am extending a program for a hobby project where potentially huge spss files are read. I would like to add functionality to append files. I thought it would be nice and intuitive to overload + and += for this. The code below is a gross simplification, but I want to get the basics right. Is this the way how operator overloading is usually done?
>
>
> class Append(object):
>
>      def __init__(self, file1, file2=None):
>          """ file1 and file2 will actually be of a class of my own,
>          which has a readFile method that is a generator that returns
>          one record at a time """
>          self.file1 = file1
>          self.file2 = file2
>          self.merged = []
>
>      def __add__(self):
>          self.file1.extend(self.file2)
>          return self.file1
>
>      def __iadd__(self):
>          self.merged.extend(self.file1)
>          return self.merged
>        
>      def writerows(self):
>          rows = self.file1
>          for row in rows:
>              yield row
>
> # overloading '+'
> file1 = [[1, 2, 3], [4, 5, 6], [6, 6, 6]]      
> file2 = [[1, 2, 3]]
> app = Append(file1, file2)
> merged = app.file1 + app.file2 # 'merged'  will not actually hold data
> for line in app.writerows():
>      print line
>
> # overloading '+='
> files = [file1, file2]
> for i, f in enumerate(files):
>      if i == 0:
>          app = Append(f)
>          app.merged = f
>      else:
>          app.merged += f
> print app.merged
>

I hate to say it, but it's not even close.

When you say  app.file1 + app.file2,   you're not calling either of your
special methods you defined in Append.  You're just adding the file1 and
file2 attributes.  Since in your example these are lists, they do the
usual thing.

Similarly, your app.merged += f  does NOT call your __iadd__() method.

Just what kind of an object is an Append object supposed to be?  Classes
are usually for encapsulating data and behavior, not just to bundle up
some functions.

Normally, you should be defining the __add__() and __iadd__() methods in the class that file1 and file2 are instances of.  So if you want to make a dummy example, start by defining a (single) class that holds just one of these.  Then create two instances, and try adding and +='ing the two instances.
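
A minimal sketch of that suggestion, written for Python 3. The class name Records and its rows attribute are made up for illustration; they stand in for whatever class file1 and file2 really belong to.

```python
class Records(object):
    """Hypothetical container for the rows of one file."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __add__(self, other):
        # a + b: build and return a new object, leaving a and b untouched
        return Records(self.rows + other.rows)

    def __iadd__(self, other):
        # a += b: change a in place and return it
        self.rows.extend(other.rows)
        return self


first = Records([[1, 2, 3], [4, 5, 6]])
second = Records([[7, 8, 9]])

both = first + second      # calls Records.__add__(first, second)
first += second            # calls Records.__iadd__(first, second)
print(both.rows)
print(first.rows)
```

Both operators take a second parameter (the right-hand operand), which the methods in the original Append class were missing.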



DaveA



Re: [Tutor] question about operator overloading

Albert-jan Roskam
Hi Dave,

Aha! Good thing I asked. ;-) I've indeed been wondering where this __add__ method should live. The program as it is now has a Generic class, a Reader class and a Writer class. I thought an Append class was appropriate because it uses Reader and Writer (and probably also Generic) methods and the data is from multiple files. It reads a bunch of files (even though the term 'reading' is more a conceptual term here, as none of the data will be held in memory), appends them (__add__), and writes them to one merged file. Doesn't adding __add__ change the responsibility from 'thou shalt read one and only one file' into something less precise?

So if I understand you correctly, the following pseudocode is better?
merged = Reader.readFile(somefile1) + Reader.readFile(somefile2)
# ..which is the same as: Reader.readFile(somefile1).__add__(Reader.readFile(somefile2))
for line in merged:
  Writer.writerow(line)

Maybe this is why my 'top-down code' (what I posted earlier) and my 'bottom-up code' (some code that I wrote earlier) don't add up (pun intended!). In the bottom-up code there was no need for an Append class!
 
Regards,
Albert-Jan


From: Dave Angel <[hidden email]>
To: Albert-Jan Roskam <[hidden email]>
Cc: Python Mailing List <[hidden email]>
Sent: Monday, March 5, 2012 9:36 PM
Subject: Re: [Tutor] question about operator overloading


Re: [Tutor] question about operator overloading

Dave Angel-3
On 03/05/2012 04:10 PM, Albert-Jan Roskam wrote:

> Hi Dave,
>
> aha! Good thing I asked. ;-) I've indeed been thinking where this __add__ method should live. The program as it is now has a Generic class, a Reader class and a Writer class. I thought an Append class was appropriate because it uses Reader and Writer (and probably also Generic) methods and the data is from multiple files. It reads a bunch of files (even though the term 'reading' is more a conceptual term here, as none of the data will be held in memory), appends them (__add__), and writes them to one merged file. Doesn't adding __add__ change the responsibility from 'thou shallt read one and only one file' into something less precise?
>
>
> So if I understand you correctly, the following pseudocode is better?
>
> merged = Reader.readFile(somefile1) + Reader.readFile(somefile2)
> # ..which is the same as: Reader.readFile(somefile1).__add__(Reader.readFile(somefile2))
> for line in merged:
>    Writer.writerow(line)
>
>
> Maybe this is why my 'top-down code' (what I posted earlier) and my 'bottom-up code' (some code that I wrote earlier) don't add up (pun intended!). In the bottom-up code there was no need for an Append class!
>

Please don't top-post.  We've now lost all the context of what happened
before.

You still don't get it.  If you're going to add objects, they should be
objects that represent what's being added.  So the objects are of type
MyFile, not type Reader, whatever that is.  Reader and Writer sound
like a Java approach.

So you combine two files something like this:

file1 = MyFile(whatever, moreargs)
file2 = MyFile(whateverelse, lessargs)
file1 += file2

That last line will call the __iadd__() method of class MyFile.  self
will be file1, and the other parameter will be file2.  By convention,
it would add file2 to the end of the already-existing file1.

It's not clear what __add__() should mean for physical files.  It builds
something (MyFile instance) that represents two of them, but there's no
way to assign a filename to it, except after the fact.  So it might be
an in-memory equivalent (eg. a list).  It should NOT just extend the
first file.  Generally, it shouldn't modify either of its arguments.
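
One way to read the two conventions above, as a hedged Python 3 sketch. The MyFile class here is hypothetical (a thin wrapper around a text file path, with a made-up lines() helper), and returning a plain list from __add__ is just one of the options mentioned, not the only reasonable one.

```python
import os
import tempfile


class MyFile(object):
    """Hypothetical wrapper around one text file on disk."""

    def __init__(self, path):
        self.path = path

    def lines(self):
        with open(self.path) as f:
            return f.readlines()

    def __iadd__(self, other):
        # file1 += file2: append file2's lines to the end of file1 on disk
        with open(self.path, "a") as dest:
            dest.writelines(other.lines())
        return self

    def __add__(self, other):
        # file1 + file2: return an in-memory list, modify neither file
        return self.lines() + other.lines()


tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "file1.txt")
path2 = os.path.join(tmpdir, "file2.txt")
with open(path1, "w") as f:
    f.write("alpha\nbeta\n")
with open(path2, "w") as f:
    f.write("gamma\n")

file1 = MyFile(path1)
file2 = MyFile(path2)

combined = file1 + file2     # both files on disk are unchanged
file1 += file2               # file1 on disk now ends with file2's lines
```

Note that the in-memory list only makes sense for files small enough to hold in memory.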

By the way, you could have learned a lot in your original example by
just adding print statements in the two methods.
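
A sketch of that experiment in Python 3. It is a cut-down version of the original Append class: the methods are given the second parameter they were missing, their bodies are replaced by tracing code, and the calls list is an addition for illustration so the result can be inspected as well as printed.

```python
calls = []  # records which of Append's special methods actually ran


class Append(object):
    def __init__(self, file1, file2=None):
        self.file1 = file1
        self.file2 = file2
        self.merged = []

    def __add__(self, other):
        print("Append.__add__ called")
        calls.append("__add__")
        return self

    def __iadd__(self, other):
        print("Append.__iadd__ called")
        calls.append("__iadd__")
        return self


app = Append([[1, 2, 3]], [[4, 5, 6]])
merged = app.file1 + app.file2   # list + list: uses list.__add__
app.merged += [[7, 8, 9]]        # list += list: uses list.__iadd__
calls_before = list(calls)       # still empty: Append's methods never ran

app2 = app + app                 # now Append.__add__ runs
app += app                       # and Append.__iadd__ runs
print(calls)
```

Nothing is printed until the operands themselves are Append instances.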

--
DaveA

Re: [Tutor] question about operator overloading

Alan Gauld
On 05/03/12 21:25, Dave Angel wrote:

> It's not clear what __add__() should mean for physical files.

My guess would be similar to the cat command in Unix:

$ cat file1 file2 > file3

is equivalent to

file3 = file1 + file2

But of course, that's just my interpretation of file addition...

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/


Re: [Tutor] question about operator overloading

Joel Goldstick-2
On Mon, Mar 5, 2012 at 6:20 PM, Alan Gauld <[hidden email]> wrote:

> On 05/03/12 21:25, Dave Angel wrote:
>
>> It's not clear what __add__() should mean for physical files.
>
>
> My guess would be similar to the cat operator in Unix:
>
> $ cat file1, file2 > file3
>
> is equivalent to
>
> file3 = file1 + file2
>
> But of course, thats just my interpretation of file addition...
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
>
>
> _______________________________________________
> Tutor maillist  -  [hidden email]
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

If SPSS files are text (not binary), why not:

f1 = open("f1")read()
f2 = open("f2")read()

outfile = open("outfile", "w")
outfile.write(f1 + f2)
outfile.close()

You could put this in a function and pass all infiles as *filenames,
then loop to read each file and output result
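
That function could look roughly like this in Python 3. It keeps the assumption that the inputs are text files; the name concat_text_files and the temporary file names are made up for the example.

```python
import os
import tempfile


def concat_text_files(outname, *filenames):
    """Write the contents of each input file, in order, to outname."""
    with open(outname, "w") as outfile:
        for name in filenames:
            with open(name) as infile:
                outfile.write(infile.read())


# small demonstration with three throwaway files
tmpdir = tempfile.mkdtemp()
names = []
for i, text in enumerate(["one\n", "two\n", "three\n"]):
    name = os.path.join(tmpdir, "in%d.txt" % i)
    with open(name, "w") as f:
        f.write(text)
    names.append(name)

outname = os.path.join(tmpdir, "outfile.txt")
concat_text_files(outname, *names)
```

Only one input file is held in memory at a time, but each one is still read whole, so this does not suit very large files.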



--
Joel Goldstick

Re: [Tutor] question about operator overloading

Joel Goldstick-2
On Mon, Mar 5, 2012 at 6:35 PM, Joel Goldstick <[hidden email]> wrote:

> On Mon, Mar 5, 2012 at 6:20 PM, Alan Gauld <[hidden email]> wrote:
>> On 05/03/12 21:25, Dave Angel wrote:
>>
>>> It's not clear what __add__() should mean for physical files.
>>
>>
>> My guess would be similar to the cat operator in Unix:
>>
>> $ cat file1, file2 > file3
>>
>> is equivalent to
>>
>> file3 = file1 + file2
>>
>> But of course, thats just my interpretation of file addition...
>>
>> --
>> Alan G
>> Author of the Learn to Program web site
>> http://www.alan-g.me.uk/
>>
>>
>> _______________________________________________
>> Tutor maillist  -  [hidden email]
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>
> if spss files are text (not binary) why not:
>
oops forgot the dots.

> f1 = open("f1").read()
> f2 = open("f2").read()
>
> outfile = open("outfile", "w")
> outfile.write(f1 + f2)
> outfile.close()
>
> You could put this in a function and pass all infiles as *filenames,
> then loop to read each file and output result
>
>
>
> --
> Joel Goldstick



--
Joel Goldstick

Re: [Tutor] question about operator overloading

Steven D'Aprano-8
In reply to this post by Alan Gauld
Alan Gauld wrote:

> On 05/03/12 21:25, Dave Angel wrote:
>
>> It's not clear what __add__() should mean for physical files.
>
> My guess would be similar to the cat operator in Unix:
>
> $ cat file1, file2 > file3
>
> is equivalent to
>
> file3 = file1 + file2
>
> But of course, thats just my interpretation of file addition...

I think that's what Albert-Jan is probably thinking, but the two models are
not quite the same. I think that what he wants is probably closer to something
like the fileinput module. I think what he wants is to avoid this:

for f in (file1, file2, file3, file4):
     for record in f:
         process(record)

in favour of this:

all_the_files = file1 + file2 + file3 + file4  # merge file contents
for record in all_the_files:
     process(record)

Albert-Jan, am I close? If not, please explain what you are trying to accomplish.

If the files are small, the easy way is to just read their contents, add them
together as strings or lists, and then process the lot. But if the files are
big, or you want to process them on-demand instead of up-front, you need an
approach similar to fileinput.
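
For plain text files, the fileinput approach might look like the following Python 3 sketch. The file names and contents are invented for the demonstration, and appending to a list stands in for process(record).

```python
import fileinput
import os
import tempfile

# create two small throwaway text files
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("file1.txt", "a\nb\n"), ("file2.txt", "c\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

records = []
# fileinput opens each file in turn and yields one line at a time,
# so the files are never loaded whole
with fileinput.input(files=paths) as stream:
    for line in stream:
        records.append(line.rstrip("\n"))   # stand-in for process(record)
```

fileinput works on lines of text, so for a binary format it only serves as a model of the idea, not as a drop-in tool.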

Personally, all these Reader and Append objects make my brain hurt, and I
hardly ever use operator overloading, except perhaps for numeric types. Reader
objects, I can just get. But "Append" objects?

This may be useful:

http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.html

and also itertools:


from itertools import chain
merged = chain(file1, file2, file3, file4)
for record in merged:
     process(record)
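
If the + spelling is still wanted, chain can sit behind __add__ so that nothing is read until the records are iterated. This is only a sketch for Python 3: LazyRecords and make_source are hypothetical names, and each "source" is assumed to be a zero-argument callable that returns a fresh iterator of records.

```python
from itertools import chain


class LazyRecords(object):
    """Hypothetical wrapper: yields records from its sources on demand."""

    def __init__(self, *sources):
        # each source is a zero-argument callable returning a fresh iterator
        self.sources = sources

    def __add__(self, other):
        # remember both sets of sources; nothing is read yet
        return LazyRecords(*(self.sources + other.sources))

    def __iter__(self):
        return chain.from_iterable(source() for source in self.sources)


def make_source(rows):
    # stand-in for "open this file and yield its records"
    return lambda: iter(rows)


file1 = LazyRecords(make_source([1, 2]))
file2 = LazyRecords(make_source([3]))
file3 = LazyRecords(make_source([4, 5]))

all_the_files = file1 + file2 + file3    # cheap: only bookkeeping
result = [record for record in all_the_files]
```

Because the sum stores callables rather than data, its size does not grow with the size of the files.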



--
Steven

Re: [Tutor] question about operator overloading

Dave Angel-3
In reply to this post by Alan Gauld
On 03/05/2012 06:20 PM, Alan Gauld wrote:

> On 05/03/12 21:25, Dave Angel wrote:
>
>> It's not clear what __add__() should mean for physical files.
>
> My guess would be similar to the cat operator in Unix:
>
> $ cat file1, file2 > file3
>
> is equivalent to
>
> file3 = file1 + file2
>
> But of course, thats just my interpretation of file addition...
>

So somehow assigning the object to file3 will write the data to a file
by the name "file3" ?  I know about __add__(), but didn't know we had
__assign__()


--

DaveA


Re: [Tutor] question about operator overloading

Peter Otten
Dave Angel wrote:

> On 03/05/2012 06:20 PM, Alan Gauld wrote:
>> On 05/03/12 21:25, Dave Angel wrote:
>>
>>> It's not clear what __add__() should mean for physical files.
>>
>> My guess would be similar to the cat operator in Unix:
>>
>> $ cat file1, file2 > file3
>>
>> is equivalent to
>>
>> file3 = file1 + file2
>>
>> But of course, thats just my interpretation of file addition...
>>
>
> So somehow assigning the object to file3 will write the data to a file
> by the name "file3" ?  I know about __add__(), but didn't know we had
> __assign__()

That is indeed one problem that makes an approach based on operator
overloading clumsy here. You can either invent a name for file3
or defer writing the file:

file3 = file1 + file2
file3.save_as(filename)
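
The deferred-writing variant could be sketched like this in Python 3. It is an illustration under assumptions, not the implementation shown further down: this File class only remembers file names in __add__, and the save_as() method does the copying in binary mode.

```python
import os
import shutil
import tempfile


class File(object):
    """Hypothetical: a named file, or a pending concatenation of files."""

    def __init__(self, *filenames):
        self.filenames = list(filenames)

    def __add__(self, other):
        # no I/O here, just remember which files to join and in what order
        return File(*(self.filenames + other.filenames))

    def save_as(self, destname):
        # the copying happens only now, one chunk at a time
        with open(destname, "wb") as dest:
            for name in self.filenames:
                with open(name, "rb") as source:
                    shutil.copyfileobj(source, dest)
        return File(destname)


tmpdir = tempfile.mkdtemp()
name1 = os.path.join(tmpdir, "file1")
name2 = os.path.join(tmpdir, "file2")
with open(name1, "wb") as f:
    f.write(b"alpha\nbeta\n")
with open(name2, "wb") as f:
    f.write(b"DELTA\n")

pending = File(name1) + File(name2)      # nothing written yet
destname = os.path.join(tmpdir, "file3")
file3 = pending.save_as(destname)
```

The destination must not be one of the source files, since opening it with "wb" truncates it first.
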

Below is an implementation with a made-up destination file name:

$ cat roskam.py
import os

def remove(filename):
    try:
        os.remove(filename)
    except OSError:
        pass

class File(object):
    def __init__(self, filename):
        self.filename = filename
    def __iadd__(self, other):
        with  self.open("a") as dest, other.open() as source:
            dest.writelines(source)
        return self
    def __add__(self, other):
        result = File("+".join([self.filename, other.filename]))
        remove(result.filename)
        result += self
        result += other
        return result
    def open(self, *mode):
        return open(self.filename, *mode)
    def __str__(self):
        return self.filename + ":\n" + "".join(
            "    " + line for line in self.open())

if __name__ == "__main__":
    remove("file3")
    remove("file4")

    with open("file1", "w") as f:
        f.write("""\
alpha
beta
gamma
""")
    with open("file2", "w") as f:
        f.write("""\
DELTA
EPSILON
""")

    file1, file2, file3 = map(File, ["file1", "file2", "file3"])
    file3 += File("file1")
    file3 += File("file2")
   
    file4 = file2 + file1 + file2

    for f in file1, file2, file3, file4:
        print f

$ python roskam.py
file1:
    alpha
    beta
    gamma

file2:
    DELTA
    EPSILON

file3:
    alpha
    beta
    gamma
    DELTA
    EPSILON

file2+file1+file2:
    DELTA
    EPSILON
    alpha
    beta
    gamma
    DELTA
    EPSILON

The code is meant to illustrate the implementation of __add__() and
__iadd__(); I don't recommend that you actually use it. You can easily
achieve the same with a for loop that is concise and easy to understand:

import shutil

# destname and sourcenames are placeholders for your own file names
with open(destname, "wb") as dest:
    for sourcename in sourcenames:
        with open(sourcename, "rb") as source:
            shutil.copyfileobj(source, dest)



Re: [Tutor] question about operator overloading

Alan Gauld
In reply to this post by Dave Angel-3
On 06/03/12 02:42, Dave Angel wrote:

>> My guess would be similar to the cat operator in Unix:
>>
>> file3 = file1 + file2
>>
>
> So somehow assigning the object to file3 will write the data to a file
> by the name "file3" ? I know about __add__(), but didn't know we had
> __assign__()

We don't need any special assign behavior; it's just standard Python
assignment of the returned object to a name.

class MyFile(file):
....
    def __add__(self, file2):
       newFile = MyFile('foo.dat','wb')
       newFile.write(self.read())
       newFile.write(file2.read())
       return newFile

file3 = MyFile('spam.dat') + MyFile('baz.dat')

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/


Re: [Tutor] question about operator overloading

Albert-jan Roskam
In reply to this post by Steven D'Aprano-8
From: Steven D'Aprano <[hidden email]>
To: [hidden email]
Sent: Tuesday, March 6, 2012 1:58 AM
Subject: Re: [Tutor] question about operator overloading
Alan Gauld wrote:

> On 05/03/12 21:25, Dave Angel wrote:
>
>> It's not clear what __add__() should mean for physical files.
>
> My guess would be similar to the cat operator in Unix:
>
> $ cat file1, file2 > file3
>
> is equivalent to
>
> file3 = file1 + file2
>
> But of course, thats just my interpretation of file addition...

I think that's what Albert-Jan is probably thinking, but the two models are not quite the same. I think that what he wants is probably closer to something like the fileinput module. I think what he wants is to avoid this:
-----> First off, thank you all for your replies, including the replies after this mail. And sorry for top-posting in an earlier mail
-----> And yes indeed Steven and Alan, this is what I had in mind.
for f in (file1, file2, file3, file4):
    for record in f:
        process(record)

in favour of this:

all_the_files = file1 + file2 + file3 + file4  # merge file contents
for record in all_the_files:
    process(record)

Albert-Jan, am I close? If not, please explain what you are trying to accomplish.
----> What I had in mind was something like Peter Otten suggested:
merged = file1 + file2
merged.save_as(filename)
Your solution looks intuitive, but won't "all_the_files" become very large if file1 through file4 contain, say, 100 billion values each?

If the files are small, the easy way is to just read their contents, add them together as strings or lists, and then process the lot. But if the files are big, or you want to process them on-demand instead of up-front, you need an approach similar to fileinput.
----> See above. By the way, contrary to what somebody in this thread said, SPSS files are binary files, not text files.

Personally, all these Reader and Append objects make my brain hurt, and I hardly ever use operator overloading, except perhaps for numeric types. Reader objects, I can just get. But "Append" objects?

This may be useful:

http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.html
----> Nice one ;-))

and also itertools:


from itertools import chain
merged = chain(file1, file2, file3, file4)
for record in merged:
    process(record)
----> Very, *very* useful function, thank you!
----> this is (incomplete) code that I created without bothering about __add__:
with SavWriter(mergedSavFileName, varNames, varTypes) as sav_merged:
  for savFileName in glob.glob("d:/temp/*.sav"):
    with SavReader(savFileName) as sav_r:
      header = sav_r.next()
      for row in sav_r:
        sav_merged.writerow(row)
http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files-linux-mac-/
---> Maybe I am making my life too difficult by trying to use __add__?

-- Steven

Re: [Tutor] question about operator overloading

Joel Goldstick-2
On Tue, Mar 6, 2012 at 7:05 AM, Albert-Jan Roskam <[hidden email]> wrote:

> From: Steven D'Aprano <[hidden email]>
> To: [hidden email]
> Sent: Tuesday, March 6, 2012 1:58 AM
>
> Subject: Re: [Tutor] question about operator overloading
>
> Alan Gauld wrote:
>
>> On 05/03/12 21:25, Dave Angel wrote:
>>
>>> It's not clear what __add__() should mean for physical files.
>>
>> My guess would be similar to the cat operator in Unix:
>>
>> $ cat file1, file2 > file3
>>
>> is equivalent to
>>
>> file3 = file1 + file2
>>
>> But of course, thats just my interpretation of file addition...
>
> I think that's what Albert-Jan is probably thinking, but the two models are
> not quite the same. I think that what he wants is probably closer to
> something like the fileinput module. I think what he wants is to avoid this:
> -----> First off, thank you all for your replies, including the replies
> after this mail. And sorry for top-posting in an earlier mail
> -----> And yes indeed Steven and Alan, this is what I had in mind.
> for f in (file1, file2, file3, file4):
>     for record in f:
>         process(record)
>
> in favour of this:
>
> all_the_files = file1 + file2 + file3 + file4  # merge file contents
> for record in all_the_files:
>     process(record)
>
> Albert-Jan, am I close? If not, please explain what you are trying to
> accomplish.
> ----> What I had in mind was something like Peter Otten suggested:
> merged = file1 + file2
> merged.save_as(filename)
> Your solution looks intuitive, but won't "all_the_files" become very large
> if file1 through file4 contain, say, 100 billion values each?
>
>
> If the files are small, the easy way is to just read their contents, add
> them together as strings or lists, and then process the lot. But if the
> files are big, or you want to process them on-demand instead of up-front,
> you need an approach similar to fileinput.
> ----> see above. Btw, contrary to what somebody in this thread said, Spss
> files are binary files, not text files.

I chimed in on the assumption that these were text files.  Since
I was wrong about that, and if I read the discussion correctly,
there are two methods being contemplated.  One way is to read a file,
process it, write the result to an output file, read the next file,
process it and append to the output file.  The second method is to
concatenate all of the input files, then open the result and process it.

But if the SPSS files aren't text, then I assume they have some
structure that might not be concatenatable.  Not sure if that is a
word.
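
To illustrate why structure matters, here is a Python 3 sketch of the first method using CSV as a stand-in for a structured format. It says nothing about the real SPSS .sav layout; merge_with_header and the sample data are made up. Every input starts with a header row, so gluing the raw contents together would repeat that header in the middle of the output.

```python
import csv
import io


def merge_with_header(sources, dest):
    """Copy rows from each source to dest, writing the header only once."""
    writer = csv.writer(dest, lineterminator="\n")
    header_written = False
    for source in sources:
        reader = csv.reader(source)
        header = next(reader)          # every source starts with a header row
        if not header_written:
            writer.writerow(header)
            header_written = True
        for row in reader:             # one row at a time, never the whole file
            writer.writerow(row)


part1 = io.StringIO("id,name\n1,ann\n2,bob\n")
part2 = io.StringIO("id,name\n3,cid\n")
merged = io.StringIO()
merge_with_header([part1, part2], merged)
print(merged.getvalue())
```

The SavReader/SavWriter loop quoted below has the same shape: read the header of each input, then write only the data rows to the single output.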

>
>
> Personally, all these Reader and Append objects make my brain hurt, and I
> hardly ever use operator overloading, except perhaps for numeric types.
> Reader objects, I can just get. But "Append" objects?
>
> This may be useful:
>
> http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.html
> ----> Nice one ;-))
>
> and also itertools:
>
>
> from itertools import chain
> merged = chain(file1, file2, file3, file4)
> for record in merged:
>     process(record)
> ----> Very, *very* useful function, thank you!
> ----> this is (incomplete) code that I created without bothering about
> __add__:
> with SavWriter(mergedSavFileName, varNames, varTypes) as sav_merged:
>   for savFileName in glob.glob("d:/temp/*.sav"):
>     with SavReader(savFileName) as sav_r:
>       header = sav_r.next()
>       for row in sav_r:
>         sav_merged.writerow(row)
> http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files-linux-mac-/
> ---> Maybe I am making my life too difficult by trying to use __add__?
>
> -- Steven
> _______________________________________________
> Tutor maillist  -  [hidden email]
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
>
>
> _______________________________________________
> Tutor maillist  -  [hidden email]
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>



--
Joel Goldstick