Question about breaking out of a loop

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Question about breaking out of a loop

Hy Carrinski
I am working on code to solve a combinatorial probability problem, and
plan to send a link to the full code in a few days.

There is generator that yields tuples in a defined order into a loop
that performs a calculation. I would like to provide an option to stop
the calculation when the threshold is reached.

I have put a simplified sample of this on github:
https://gist.github.com/1012945

My questions are:
   1. Is it an antipattern to change a datatype to cause an exception?
   2. If so, how would you improve on my version 3 function?

The function in version 3 is pretty close to my current solution, but
the functions combinations(), f() and g() are standing in for more
computationally intensive functions. The potential antipattern
involves temporarily setting a value to None in a dictionary of
integers.

Thank you,
Hy
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Question about breaking out of a loop

Dirk Bergstrom-5
On 06/07/2011 12:40 PM, Hy Carrinski wrote:
> There is generator that yields tuples in a defined order into a loop
> that performs a calculation. I would like to provide an option to stop
> the calculation when the threshold is reached.
> The function in version 3 is pretty close to my current solution, but
> the functions combinations(), f() and g() are standing in for more
> computationally intensive functions.

This seems like a perfect example of premature optimization.  You've got
a loop with two computationally intensive operations per cycle, and
you're worried about optimizing away a single if-equals check per cycle.
  Will that single if statement really make so much difference once you
put the real (and presumably much more time consuming) functions in place?

--
        --------------------------------------
       Dirk Bergstrom           [hidden email]
              http://otisbean.com/
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Question about breaking out of a loop

Jeremy Fishman
In reply to this post by Hy Carrinski
Your second and third definitions aren't really different from each other.  Both incur a "per-iteration penalty", the first with an if-statement and the second with a dictionary lookup.

I bet you are not going to get a noticeable speedup over a simple if-statement check, but an alternative approach is to solve the problem you are checking for up-front:

>>> # warning: not a proof
... 
>>> def count(seq):
...   return sum(1 for e in seq)
... 
>>> def f(n, w, t):
...   return (c for c in combinations(range(n), w) if c[0] < t)
... 
>>> def g(n, w, t):
...   for i in range(t):
...     for c in combinations(range(i + 1, n), w - 1):
...       yield (i,) + c
... 
>>> [count(f(10, 5, i)) for i in range(5)]
[0, 126, 196, 231, 246]
>>> [count(g(10, 5, i)) for i in range(5)]
[0, 126, 196, 231, 246]
>>> list(f(5, 3, 2))
[(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)]
>>> list(g(5, 3, 2))
[(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)]


  - Jeremy

On Tue, Jun 7, 2011 at 12:40 PM, Hy Carrinski <[hidden email]> wrote:
I am working on code to solve a combinatorial probability problem, and
plan to send a link to the full code in a few days.

There is generator that yields tuples in a defined order into a loop
that performs a calculation. I would like to provide an option to stop
the calculation when the threshold is reached.

I have put a simplified sample of this on github:
https://gist.github.com/1012945

My questions are:
  1. Is it an antipattern to change a datatype to cause an exception?
  2. If so, how would you improve on my version 3 function?

The function in version 3 is pretty close to my current solution, but
the functions combinations(), f() and g() are standing in for more
computationally intensive functions. The potential antipattern
involves temporarily setting a value to None in a dictionary of
integers.

Thank you,
Hy
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies


_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Question about breaking out of a loop

Hy Carrinski
In reply to this post by Dirk Bergstrom-5
I agree that this optimization is not critical. Premature optimization
is certainly something that I try to avoid, and is an easy path to
follow.

My actual code has gone through a few rounds of refactoring and some
optimization. This is question involves a specific area which my
profiling has shown can introduce around a 10% decrease in runtime
based only on eliminating this conditional. These rounds have involved
effort to maintain and increase clarity. The computationally intensive
functions actually make use of caching so they do not consume much
time.

I did not include many of these details in the original posting.

By introducing the if statement, the runtime of the profiled actual
code increases by around 20%. While that increase is not too
important, the structure with the if statement does not seem right to
me because it introduces the overhead whether or not the threshold
option is exercised. Is there a more Pythonic way to introduce the
break without the commensurate increase in overhead? By the way, I do
think that this loop is the most appropriate place to introduce a
threshold.

w.r.t. Jeremy's recent interesting suggestions. I think that filtering
on the generator may not actually result in StopIteration without
computing the values. My actual generator makes a sequence of only the
values that I care about. After writing it, I found a posting by Tim
Peters from a few years ago at
http://code.activestate.com/recipes/218332/ which uses an algorithm
similar to mine (for the generator). But, please note that link is
pretty far astray from the present questions.

Thank you,
Hy


On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom <[hidden email]> wrote:

> On 06/07/2011 12:40 PM, Hy Carrinski wrote:
>>
>> There is generator that yields tuples in a defined order into a loop
>> that performs a calculation. I would like to provide an option to stop
>> the calculation when the threshold is reached.
>> The function in version 3 is pretty close to my current solution, but
>> the functions combinations(), f() and g() are standing in for more
>> computationally intensive functions.
>
> This seems like a perfect example of premature optimization.  You've got a
> loop with two computationally intensive operations per cycle, and you're
> worried about optimizing away a single if-equals check per cycle.  Will that
> single if statement really make so much difference once you put the real
> (and presumably much more time consuming) functions in place?
>
> --
>       --------------------------------------
>      Dirk Bergstrom           [hidden email]
>             http://otisbean.com/
>
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Question about breaking out of a loop

Jeremy Fishman
Yes thank you Hy for pointing out f() function will not break early.  I am not sure why I changed the implementation as I had intended f() to be a copy of your for-loop from loop_fcn_v2() to demonstrate equivalency.

I believe g() generates exactly the values expected and no more.
  - Jeremy


On Tue, Jun 7, 2011 at 1:42 PM, Hy Carrinski <[hidden email]> wrote:
I agree that this optimization is not critical. Premature optimization
is certainly something that I try to avoid, and is an easy path to
follow.

My actual code has gone through a few rounds of refactoring and some
optimization. This is question involves a specific area which my
profiling has shown can introduce around a 10% decrease in runtime
based only on eliminating this conditional. These rounds have involved
effort to maintain and increase clarity. The computationally intensive
functions actually make use of caching so they do not consume much
time.

I did not include many of these details in the original posting.

By introducing the if statement, the runtime of the profiled actual
code increases by around 20%. While that increase is not too
important, the structure with the if statement does not seem right to
me because it introduces the overhead whether or not the threshold
option is exercised. Is there a more Pythonic way to introduce the
break without the commensurate increase in overhead? By the way, I do
think that this loop is the most appropriate place to introduce a
threshold.

w.r.t. Jeremy's recent interesting suggestions. I think that filtering
on the generator may not actually result in StopIteration without
computing the values. My actual generator makes a sequence of only the
values that I care about. After writing it, I found a posting by Tim
Peters from a few years ago at
http://code.activestate.com/recipes/218332/ which uses an algorithm
similar to mine (for the generator). But, please note that link is
pretty far astray from the present questions.

Thank you,
Hy


On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom <[hidden email]> wrote:
> On 06/07/2011 12:40 PM, Hy Carrinski wrote:
>>
>> There is generator that yields tuples in a defined order into a loop
>> that performs a calculation. I would like to provide an option to stop
>> the calculation when the threshold is reached.
>> The function in version 3 is pretty close to my current solution, but
>> the functions combinations(), f() and g() are standing in for more
>> computationally intensive functions.
>
> This seems like a perfect example of premature optimization.  You've got a
> loop with two computationally intensive operations per cycle, and you're
> worried about optimizing away a single if-equals check per cycle.  Will that
> single if statement really make so much difference once you put the real
> (and presumably much more time consuming) functions in place?
>
> --
>       --------------------------------------
>      Dirk Bergstrom           [hidden email]
>             http://otisbean.com/
>


_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Question about breaking out of a loop

Hy Carrinski
Jeremy's function g() does produce outputs equivalent to the sample
code without using a conditional. Thank you also for including
examples. As a generator, it actually could serve as a three parameter
wrapper for itertools.combinations().

I hope that this present email can better specify the question. I very
much appreciate the good thoughts thus far.

1. The only information we have about the generator is that its
outputs have a defined order. I find this abstraction useful because
it may help any answers to this question to be more generally
applicable, and also contains the complexity of the code for the
generator itself by not considering multiple starting or ending
points.

> There is generator that yields tuples in a defined order into a loop
> that performs a calculation.

2. I wrote the gist in order to answer the primary question:

    Is it an antipattern to change a datatype to cause an exception?

3. I also began to think that itertools.takewhile() might be a better
option, but does not seem to be in this case. First, it introduces a
need for something larger than an integer to compare, for the case
where the threshold is None. Second, while it looked nice, it also
caused a significant slowdown (~50%)

Thanks,
Hy


On Tue, Jun 7, 2011 at 1:57 PM, Jeremy Fishman
<[hidden email]> wrote:

> Yes thank you Hy for pointing out f() function will not break early.  I am
> not sure why I changed the implementation as I had intended f() to be a copy
> of your for-loop from loop_fcn_v2() to demonstrate equivalency.
> I believe g() generates exactly the values expected and no more.
>   - Jeremy
>
>
> On Tue, Jun 7, 2011 at 1:42 PM, Hy Carrinski <[hidden email]> wrote:
>>
>> I agree that this optimization is not critical. Premature optimization
>> is certainly something that I try to avoid, and is an easy path to
>> follow.
>>
>> My actual code has gone through a few rounds of refactoring and some
>> optimization. This is question involves a specific area which my
>> profiling has shown can introduce around a 10% decrease in runtime
>> based only on eliminating this conditional. These rounds have involved
>> effort to maintain and increase clarity. The computationally intensive
>> functions actually make use of caching so they do not consume much
>> time.
>>
>> I did not include many of these details in the original posting.
>>
>> By introducing the if statement, the runtime of the profiled actual
>> code increases by around 20%. While that increase is not too
>> important, the structure with the if statement does not seem right to
>> me because it introduces the overhead whether or not the threshold
>> option is exercised. Is there a more Pythonic way to introduce the
>> break without the commensurate increase in overhead? By the way, I do
>> think that this loop is the most appropriate place to introduce a
>> threshold.
>>
>> w.r.t. Jeremy's recent interesting suggestions. I think that filtering
>> on the generator may not actually result in StopIteration without
>> computing the values. My actual generator makes a sequence of only the
>> values that I care about. After writing it, I found a posting by Tim
>> Peters from a few years ago at
>> http://code.activestate.com/recipes/218332/ which uses an algorithm
>> similar to mine (for the generator). But, please note that link is
>> pretty far astray from the present questions.
>>
>> Thank you,
>> Hy
>>
>>
>> On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom <[hidden email]> wrote:
>> > On 06/07/2011 12:40 PM, Hy Carrinski wrote:
>> >>
>> >> There is generator that yields tuples in a defined order into a loop
>> >> that performs a calculation. I would like to provide an option to stop
>> >> the calculation when the threshold is reached.
>> >> The function in version 3 is pretty close to my current solution, but
>> >> the functions combinations(), f() and g() are standing in for more
>> >> computationally intensive functions.
>> >
>> > This seems like a perfect example of premature optimization.  You've got
>> > a
>> > loop with two computationally intensive operations per cycle, and you're
>> > worried about optimizing away a single if-equals check per cycle.  Will
>> > that
>> > single if statement really make so much difference once you put the real
>> > (and presumably much more time consuming) functions in place?
>> >
>> > --
>> >       --------------------------------------
>> >      Dirk Bergstrom           [hidden email]
>> >             http://otisbean.com/
>> >
>
>
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Question about breaking out of a loop

Mark Voorhies
On Tuesday, June 07, 2011 05:03:26 pm Hy Carrinski wrote:
> 2. I wrote the gist in order to answer the primary question:
>
>     Is it an antipattern to change a datatype to cause an exception?

A different way to phrase this might be:
   What are reasonable sentinel patterns?

In Python, None is a reasonable sentinel value in a container of references,
in the same way that a null pointer is a reasonable sentinel value in C/C++.

It is also reasonable to use an exception to handle an "exceptional" case of
control flow (encountering the sentinel value), and you've shown that this
doesn't introduce overhead in Python.

So, I don't think there's anything inherently objectionable about your implementation
(comments about premature optimization notwithstanding).  It might be useful to
think of what you're doing as the special case: "marking a reference as null"
rather than the more general and potentially hackier: "changing a datatype".

Mark

_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Question about breaking out of a loop

Hy Carrinski
Thank you for the advice. I have updated the gist to include each of
the suggestions and to serve as a set of examples rather than a
question. Finally, I found that this may be a good application for
itertools.groupby().

https://gist.github.com/1012945

Thanks,
Hy

On Tue, Jun 7, 2011 at 5:22 PM, Mark Voorhies <[hidden email]> wrote:

> On Tuesday, June 07, 2011 05:03:26 pm Hy Carrinski wrote:
>> 2. I wrote the gist in order to answer the primary question:
>>
>>     Is it an antipattern to change a datatype to cause an exception?
>
> A different way to phrase this might be:
>   What are reasonable sentinel patterns?
>
> In Python, None is a reasonable sentinel value in a container of references,
> in the same way that a null pointer is a reasonable sentinel value in C/C++.
>
> It is also reasonable to use an exception to handle an "exceptional" case of
> control flow (encountering the sentinel value), and you've shown that this
> doesn't introduce overhead in Python.
>
> So, I don't think there's anything inherently objectionable about your implementation
> (comments about premature optimization notwithstanding).  It might be useful to
> think of what you're doing as the special case: "marking a reference as null"
> rather than the more general and potentially hackier: "changing a datatype".
>
> Mark
>
> _______________________________________________
> Baypiggies mailing list
> [hidden email]
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies
>
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies