zip as iterator and bad/good practices

Fabien
Folks,

I am developing a program which I'd like to be python 2 and 3
compatible. I am still relatively new to python and I use primarily py3
for development. Every once in a while I use a py2 interpreter to see if
my tests pass through.

I just spent several hours tracking down a bug which was related to the
fact that zip is an iterator in py3 but not in py2. Of course I did not
know about that difference. I've found the izip() function which should
do what I want, but that awful bug made me wonder: is it a bad practice
to interactively modify the list you are iterating over?

I am computing mass fluxes along glacier branches ordered by
hydrological order, i.e. branch i is guaranteed to flow in a branch
later in that list. Branches are objects which have a pointer to the
object they are flowing into.

In pseudo code:

for stuff, branch in zip(stuffs, branches):
        # compute flux
        ...
        # add to the downstream branch
        id_branch = branches.index(branch.flows_to)
        branches[id_branch].property.append(stuff_i_computed)

So, all downstream branches in python2 were missing information from
their tributaries. It is quite dangerous code but I can't find a more
elegant solution.

Thanks!

Fabien


Fabien
On 06/12/2015 05:00 PM, Fabien wrote:
> I've found the izip() function which should do what I want

I've just come across a stackoverflow post where they recommend:

from future_builtins import zip

which is OK since I don't want to support versions <= 2.6
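
One caveat with that import: future_builtins exists only on Python 2.6 and 2.7, so an unguarded import raises ImportError on Python 3. A minimal sketch of a guarded version, assuming the goal is simply an iterator-returning zip on both interpreters (the variable names are only for illustration):

```python
# future_builtins is a Python 2.6/2.7-only module. On Python 3 the import
# fails and the builtin zip, which is already lazy, is left untouched.
try:
    from future_builtins import zip
except ImportError:
    pass

pairs = zip([1, 2, 3], "abc")

# A lazy zip object is its own iterator; a Python 2 list would not be.
is_lazy = iter(pairs) is pairs
first = next(pairs)
print(is_lazy, first)
```

With the guard in place the same module header should load on 2.6, 2.7 and 3.x.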

Ian Kelly-2
In reply to this post by Fabien
On Fri, Jun 12, 2015 at 9:00 AM, Fabien <fabien.maussion at gmail.com> wrote:

> Folks,
>
> I am developing a program which I'd like to be python 2 and 3 compatible. I
> am still relatively new to python and I use primarily py3 for development.
> Every once in a while I use a py2 interpreter to see if my tests pass
> through.
>
> I just spent several hours tracking down a bug which was related to the fact
> that zip is an iterator in py3 but not in py2. Of course I did not know
> about that difference. I've found the izip() function which should do what I
> want

If you're supporting both 2 and 3, you may want to look into using the
third-party "six" library, which provides utilities for writing
cross-compatible code.  Using the correct zip() function with six is
just:

    from six.moves import zip

> but that awful bug made me wonder: is it a bad practice to
> interactively modify the list you are iterating over?

Generally speaking, yes, it's bad practice to add or remove items
because this may result in items being visited more than once or not
at all. Modifying or replacing items however is usually not an issue.

> I am computing mass fluxes along glacier branches ordered by hydrological
> order, i.e. branch i is guaranteed to flow in a branch later in that list.
> Branches are objects which have a pointer to the object they are flowing
> into.
>
> In pseudo code:
>
> for stuff, branch in zip(stuffs, branches):
>         # compute flux
>         ...
>         # add to the downstream branch
>         id_branch = branches.index(branch.flows_to)
>         branches[id_branch].property.append(stuff_i_computed)

Er, I don't see the problem here. The branch object in the zip list
and the branch object in branches should be the *same* object, so the
downstream branch update should be reflected when you visit it later
in the iteration, regardless of whether zip returns a list or an iterator.

Tangentially, unless you're using id_branch for something else that
isn't shown here, is it really necessary to search the list for the
downstream branch when it looks like you already have a reference to
it? Could the above simply be replaced with:

    branch.flows_to.property.append(stuff_i_computed)
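
A runnable sketch of that suggestion, under stated assumptions: the Branch class, the attribute names inflows and total, and the flux numbers are all hypothetical stand-ins (inflows replaces the pseudo code's "property" attribute to avoid clashing with the builtin of that name).

```python
class Branch:
    """Hypothetical stand-in for the branch objects in the pseudo code."""

    def __init__(self, name, flows_to=None):
        self.name = name
        self.flows_to = flows_to   # reference to the downstream Branch
        self.inflows = []          # fluxes received from tributaries
        self.total = None


outlet = Branch("outlet")
main = Branch("main", flows_to=outlet)
trib = Branch("trib", flows_to=main)

# Hydrological order: every branch appears before the one it flows into.
branches = [trib, main, outlet]
own_fluxes = [1.0, 2.0, 0.5]

for own_flux, branch in zip(own_fluxes, branches):
    branch.total = own_flux + sum(branch.inflows)
    if branch.flows_to is not None:
        # No list search needed: use the reference directly.
        branch.flows_to.inflows.append(branch.total)

print(outlet.total)
```

Because the loop only mutates objects that the list refers to, and never the list itself, it should behave the same whether zip returns a list or an iterator.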

Fabien
In reply to this post by Fabien
On 06/12/2015 05:26 PM, Ian Kelly wrote:

>> for stuff, branch in zip(stuffs, branches):
>>         # compute flux
>>         ...
>>         # add to the downstream branch
>>         id_branch = branches.index(branch.flows_to)
>>         branches[id_branch].property.append(stuff_i_computed)
> Er, I don't see the problem here. The branch object in the zip list
> and the branch object in branches should be the *same* object, so the
> downstream branch update should be reflected when you visit it later
> in the iteration, regardless of whether zip returns a list or an iterator.
>
> Tangentially, unless you're using id_branch for something else that
> isn't shown here, is it really necessary to search the list for the
> downstream branch when it looks like you already have a reference to
> it? Could the above simply be replaced with:
>
>      branch.flows_to.property.append(stuff_i_computed)

Thanks a lot for your careful reading! I overly simplified my example
and indeed this line works fine. I was adding things to "stuffs" too,
which is a list of lists... Sorry for the confusion!
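
The actual code is not shown in the thread, so the following is only a guess at the kind of difference involved: if a slot of one of the zipped lists is replaced during the loop, a lazy zip (Python 3) picks up the replacement, while a zip that was built up front as a list (Python 2 behaviour, emulated here with list(zip(...))) still holds the old object. The function names and data are hypothetical, and the sketch assumes it runs on Python 3.

```python
def run(zip_func):
    stuffs = [[1], [2], [3]]
    names = ["a", "b", "c"]
    seen = []
    for i, (stuff, _name) in enumerate(zip_func(stuffs, names)):
        if i + 1 < len(stuffs):
            # Replace the *next* slot of the list being zipped.
            stuffs[i + 1] = stuffs[i + 1] + [stuff[-1]]
        seen.append(stuff)
    return seen


def eager_zip(*seqs):
    # Emulates Python 2's zip: all tuples are built before the loop starts.
    return list(zip(*seqs))


lazy_result = run(zip)         # Python 3 builtin: items fetched on demand
eager_result = run(eager_zip)  # snapshot taken up front

print(lazy_result)
print(eager_result)
```

Appending to the inner lists in place would be visible in both cases; only replacing a slot of the outer list makes the two diverge.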

Fabien
In reply to this post by Fabien
On 06/12/2015 05:26 PM, Ian Kelly wrote:
>> but that awful bug made me wonder: is it a bad practice to
>> interactively modify the list you are iterating over?
> Generally speaking, yes, it's bad practice to add or remove items
> because this may result in items being visited more than once or not
> at all. Modifying or replacing items however is usually not an issue.
>

Thanks. In that case I was modifying items and needed them to be updated
during the loop. I kept the solution as is and my tests pass in 2 and 3.

I will consider using six. Currently all my modules begin with:


from __future__ import division
try:
     from itertools import izip as zip
except ImportError:
     pass

Which might even become longer if I find other bugs ;-)

Fabien

Mark Lawrence
In reply to this post by Fabien
On 12/06/2015 16:00, Fabien wrote:

> Folks,
>
> I am developing a program which I'd like to be python 2 and 3
> compatible. I am still relatively new to python and I use primarily py3
> for development. Every once in a while I use a py2 interpreter to see if
> my tests pass through.
>
> I just spent several hours tracking down a bug which was related to the
> fact that zip is an iterator in py3 but not in py2. Of course I did not
> know about that difference. I've found the izip() function which should
> do what I want, but that awful bug made me wonder: is it a bad practice
> to interactively modify the list you are iterating over?
>
> I am computing mass fluxes along glacier branches ordered by
> hydrological order, i.e. branch i is guaranteed to flow in a branch
> later in that list. Branches are objects which have a pointer to the
> object they are flowing into.
>
> In pseudo code:
>
> for stuff, branch in zip(stuffs, branches):
>      # compute flux
>      ...
>      # add to the downstream branch
>      id_branch = branches.index(branch.flows_to)
>      branches[id_branch].property.append(stuff_i_computed)
>
> So, all downstream branches in python2 were missing information from
> their tributaries. It is quite dangerous code but I can't find a more
> elegant solution.
>
> Thanks!
>
> Fabien
>

Start here https://docs.python.org/3/howto/pyporting.html

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


Laura Creighton-2
In reply to this post by Fabien
The real problem is removing things from lists when you are iterating
over them, not adding things to the end of lists.

Python 2.7.9 (default, Mar  1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> mylist = [1,2,3]
>>> for i in mylist:
...     print i
...     mylist.remove(i)
...
1
3
>>> mylist
[2]

Most people expect 1 2 and 3 to get printed, and mylist to be empty at
the end of this loop.

Laura

Terry Reedy
In reply to this post by Fabien
On 6/12/2015 11:00 AM, Fabien wrote:
> is it a bad practice
> to interactively modify the list you are iterating over?

One needs care.  Appending to the end of the list is OK, unless you
append a billion items or so ;-)  Appending to the end of a queue while
*removing* items from the front of the queue, where the queue resizes
itself at the front as needed, is standard for breadth-first search.  A
collections.deque can be used for this.  Depth-first search appends to and
deletes from the end (or top) of a stack, but this is NOT
forward-iteration as implemented by Python iterators.
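
A minimal sketch of the breadth-first pattern described above, using collections.deque as the queue. The graph and the function name bfs_order are made up for illustration.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:                     # loop on the queue, not a for-iteration
        node = queue.popleft()       # remove from the front ...
        order.append(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)   # ... while appending to the end
    return order


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = bfs_order(graph, "a")
print(order)
```

The while loop re-checks the queue on every pass, so growing and shrinking it inside the loop is safe in a way that mutating a list under a for loop is not.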

--
Terry Jan Reedy


Terry Reedy
In reply to this post by Laura Creighton-2
On 6/12/2015 4:34 PM, Laura Creighton wrote:
> The real problem is removing things from lists when you are iterating
> over them, not adding things to the end of lists.

One needs to iterate backwards.

>>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>>> for i in range(len(ints)-1, -1, -1):
...     if ints[i] % 2:
...         del ints[i]
...
>>> ints
[0, 2, 2, 4, 6]

But using a list comp and, if necessary, copying the result back into
the original list is much easier.

>>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>>> ints[:] = [i for i in ints if not i % 2]
>>> ints
[0, 2, 2, 4, 6]


--
Terry Jan Reedy


sohcahtoa82@gmail.com
In reply to this post by Laura Creighton-2
On Friday, June 12, 2015 at 4:44:08 PM UTC-7, Terry Reedy wrote:

> On 6/12/2015 4:34 PM, Laura Creighton wrote:
> > The real problem is removing things from lists when you are iterating
> > over them, not adding things to the end of lists.
>
> One needs to iterate backwards.
>
>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>
>  >>> for i in range(len(ints)-1, -1, -1):
>         if ints[i] % 2:
>                 del ints[i]
>
>  >>> ints
> [0, 2, 2, 4, 6]
>
> But using a list comp and, if necessary, copying the result back into
> the original list is much easier.
>
>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>  >>> ints[:] = [i for i in ints if not i % 2]
>  >>> ints
> [0, 2, 2, 4, 6]
>
>
> --
> Terry Jan Reedy

On the second line of your final solution, is there any reason you're using `ints[:]` rather than just `ints`?

Chris Angelico
On Sat, Jun 13, 2015 at 10:02 AM,  <sohcahtoa82 at gmail.com> wrote:

>>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>>  >>> ints[:] = [i for i in ints if not i % 2]
>>  >>> ints
>> [0, 2, 2, 4, 6]
>>
>>
>> --
>> Terry Jan Reedy
>
> On the second line of your final solution, is there any reason you're using `ints[:]` rather than just `ints`?

If you use "ints = [...]", it rebinds the name ints to the new list.
If you use "ints[:] = [...]", it replaces the entire contents of the
list with the new list. The two are fairly similar if there are no
other references to that list, but the replacement matches the
mutation behaviour of remove().

def just_some(ints):
    ints[:] = [i for i in ints if not i % 2]
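
To make the difference visible from the caller's side, here is a small sketch. The function names rebind and replace_contents are hypothetical; the second one does the same thing as just_some above.

```python
def rebind(ints):
    # Rebinds only the local name; the caller's list is untouched.
    ints = [i for i in ints if not i % 2]


def replace_contents(ints):
    # Slice assignment mutates the list object the caller passed in.
    ints[:] = [i for i in ints if not i % 2]


data = [0, 1, 2, 3, 4]
alias = data                 # a second reference to the same list

rebind(data)
after_rebind = list(data)

replace_contents(data)
after_replace = list(data)

print(after_rebind, after_replace, alias is data)
```

After the slice assignment, every other reference to the list (alias here) sees the filtered contents as well.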

ChrisA

sohcahtoa82@gmail.com
In reply to this post by sohcahtoa82@gmail.com
On Friday, June 12, 2015 at 5:27:21 PM UTC-7, Chris Angelico wrote:

> On Sat, Jun 13, 2015 at 10:02 AM,  <sohcahtoa82 at gmail.com> wrote:
> >>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
> >>  >>> ints[:] = [i for i in ints if not i % 2]
> >>  >>> ints
> >> [0, 2, 2, 4, 6]
> >>
> >>
> >> --
> >> Terry Jan Reedy
> >
> > On the second line of your final solution, is there any reason you're using `ints[:]` rather than just `ints`?
>
> If you use "ints = [...]", it rebinds the name ints to the new list.
> If you use "ints[:] = [...]", it replaces the entire contents of the
> list with the new list. The two are fairly similar if there are no
> other references to that list, but the replacement matches the
> mutation behaviour of remove().
>
> def just_some(ints):
>     ints[:] = [i for i in ints if not i % 2]
>
> ChrisA

Ah that makes sense.  Thanks.

Jimages
In reply to this post by Fabien


> On Jun 12, 2015, at 11:00 PM, Fabien <fabien.maussion at gmail.com> wrote:
> but that awful bug made me wonder: is it a bad practice to interactively modify the list you are iterating over?
Yes.
I am a newbie too, and I was also confused when I read the tutorial. It recommends making a copy before looping. So I tried this:
#--------------------------
Test = [1, 2]
for i in Test:
    Test.append(i)
#--------------------------
But when I ran it, the script did not end. I knew something must be wrong, so I launched the debugger and observed the list after each iteration:
Loop 1: [ 1, 2, 1]
Loop 2: [ 1, 2, 1, 2]
Loop 3: [ 1, 2, 1, 2, 1]
Loop 4: [ 1, 2, 1, 2, 1, 2]
......
So you can see that the loop will *never* end.
I think you can regard 'i' as a pointer. After each iteration the pointer moves on to the next element, but at the same time an element is appended, so the pointer will *never* reach the last element.
How to solve it? Change the code to this:
#--------------------------
Test = [1, 2]
for i in Test[:]:
    Test.append(i)
#--------------------------



Steven D'Aprano-11
In reply to this post by Fabien
On Sat, 13 Jun 2015 13:32:59 +0800, jimages wrote:

> I am a newbie too, and I was also confused when I read the tutorial. It
> recommends making a copy before looping. So I tried this:
> #--------------------------
> Test = [1, 2]
> for i in Test:
>     Test.append(i)
> #--------------------------

You don't make a copy of Test here. You could try this instead:

Test = [1, 2]
copy_test = Test[:]  # [:] makes a slice copy of the whole list
for i in copy_test:  # iterate over the copy
    Test.append(i)  # and append to the original

print(Test)


But an easier way is:

Test = [1, 2]
Test.extend(Test)
print(Test)


> But when I ran it, the script did not end. I knew something must be
> wrong, so I launched the debugger and observed the list after each
> iteration:
> Loop 1: [ 1, 2, 1]
> Loop 2: [ 1, 2, 1, 2]
> Loop 3: [ 1, 2, 1, 2, 1]
> Loop 4: [ 1, 2, 1, 2, 1, 2]
> ......
> So you can see that the loop will *never* end. I think you can regard 'i'
> as a pointer.

i is not a pointer. It is just a variable that gets a value from the
list, the same as:

    # first time through the loop
    i = Test[0]
    # second time through the loop
    i = Test[1]  # the second item


The for loop statement:

    for item in seq: ...

understands sequences, lists, and other iterables, not "item". item is
just an ordinary variable, nothing special about it. The for statement
takes the items in seq, one at a time, and assigns them to the variable
"item". In English:

    for each item in seq ...

or to put it another way:

    get the first item of seq
    assign it to "item"
    process the block
    get the second item of seq
    assign it to "item"
    process the block
    get the third item of seq
    assign it to "item"
    process the block
    ...

and so on, until seq runs out of items. But if you keep appending items
to the end, it will never run out.
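
The same steps can be spelled out with iter() and next(), which is roughly what a for loop does for you. This is only a sketch: the five-step cap is an assumption added so that the demonstration terminates.

```python
test = [1, 2]
it = iter(test)            # what "for item in test" obtains behind the scenes

steps = 0
while steps < 5:           # artificial cap; the plain for loop has none
    try:
        item = next(it)    # get the next item of the list
    except StopIteration:  # raised only once the iterator runs out of items
        break
    test.append(item)      # the list grows, so there is always a next item
    steps += 1

print(test)

# Iterating over a copy terminates, because the copy never grows.
test2 = [1, 2]
for item in test2[:]:
    test2.append(item)

print(test2)
```

Without the cap the first loop would never see StopIteration, which matches the runaway behaviour reported above.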


> Change code to this
> #--------------------------
> Test = [1, 2]
> for i in Test[:]:
>     Test.append(i)
> #--------------------------


Yes, this will work.


--
Steve

Oscar Benjamin-2
On 13 June 2015 at 08:17, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:

> On Sat, 13 Jun 2015 13:32:59 +0800, jimages wrote:
>
>> I am a newbie too, and I was also confused when I read the tutorial. It
>> recommends making a copy before looping. So I tried this:
>> #--------------------------
>> Test = [1, 2]
>> for i in Test:
>>     Test.append(i)
>> #--------------------------
>
> You don't make a copy of Test here. You could try this instead:
>
> Test = [1, 2]
> copy_test = Test[:]  # [:] makes a slice copy of the whole list
> for i in copy_test:  # iterate over the copy
>     Test.append(i)  # and append to the original
>
> print(Test)
>
>
> But an easier way is:
>
> Test = [1, 2]
> Test.extend(Test)
> print(Test)

I can't see anything in the docs that specifies the behaviour that
occurs here. If I change it to

    Test.extend(iter(Test))

then it borks my system in 1s after consuming 8GB of RAM (I recovered
with killall python in the tty).

According to the docs:
"""
list.extend(L)

Extend the list by appending all the items in the given list;
equivalent to a[len(a):] = L.
"""
https://docs.python.org/2/tutorial/datastructures.html#more-on-lists

The alternate form

    Test[len(Test):] = Test

is equivalent but

    Test[len(Test):] = iter(Test)

is not since it doesn't bork my system.

I looked here:
https://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
but I don't see anything that specifies how self-referential slice
assignment should behave.

I checked under pypy and all behaviour is the same but I'm not sure if
this shouldn't be considered implementation-defined or undefined
behaviour. It's not hard to see how a rearrangement of the list.extend
method would lead to a change of behaviour and I can't see that the
current behaviour is really guaranteed by the language and in fact
it's inconsistent with the docs for list.extend.
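
For reference, the three forms that terminate can be checked side by side. This sketch only records what CPython is observed to do; as noted above, the docs do not appear to guarantee it. The variable names are arbitrary.

```python
a = [1, 2]
a.extend(a)               # extend with the list itself

b = [1, 2]
b[len(b):] = b            # slice assignment with the list itself

c = [1, 2]
c[len(c):] = iter(c)      # slice assignment with an iterator over the list

print(a, b, c)
```

The runaway case, extend() given an iterator over the same list, is deliberately left out because it does not terminate.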

As an aside they say that pypy is fast but it took about 10 times
longer than cpython to bork my system. :)

--
Oscar

Steven D'Aprano-11
In reply to this post by Steven D'Aprano-11
On Sat, 13 Jun 2015 13:48:45 +0100, Oscar Benjamin wrote:

> On 13 June 2015 at 08:17, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:

>> But an easier way is:
>>
>> Test = [1, 2]
>> Test.extend(Test)
>> print(Test)
>
> I can't see anything in the docs that specifies the behaviour that occurs
> here.

Neither can I, but there is a test for it:

        a.extend(a)
        self.assertEqual(a, self.type2test([0, 0, 1, 0, 0, 1]))

https://hg.python.org/cpython/file/a985b6455fde/Lib/test/list_tests.py

> If I change it to
>
>     Test.extend(iter(Test))
>
> then it borks my system in 1s after consuming 8GB of RAM (I recovered
> with killall python in the tty).

The reason that fails should be obvious: as new items keep getting added
to Test, the iterator likewise sees more items to iterate over. I don't
know if this is documented, but you can see what happens here:

py> L = [10, 20]
py> it = iter(L)
py> L.append(next(it)); print(L)
[10, 20, 10]
py> L.append(next(it)); print(L)
[10, 20, 10, 20]
py> L.append(next(it)); print(L)
[10, 20, 10, 20, 10]
py> L.append(next(it)); print(L)
[10, 20, 10, 20, 10, 20]


So as Test.extend tries to iterate over iter(Test), it just keeps growing
as more items are added to Test.
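
The growth can be watched safely by capping the iterator with itertools.islice. The cap of 6 is an arbitrary assumption for the demonstration, and the result shown by running it reflects CPython's behaviour rather than anything the docs promise.

```python
from itertools import islice

L = [10, 20]

# extend() pulls items from the iterator one at a time and appends each,
# so the iterator keeps finding the items that were just added.
# islice stops it after 6 items instead of letting it run forever.
L.extend(islice(iter(L), 6))

print(L)
```

Six items end up appended to a two-item list, which shows the iterator reading well past the original end.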


--
Steven D'Aprano