

I am throwing the cat among the pigeons. ;)
In another thread I mentioned that I would like to have tail recursion in
Python. To be clear: not automatic, but asked for.
Looking at the replies I did hit a nerve. But I still want to
continue.
Some things are better expressed recursively for the people reading
the code. But there are two problems with that:
 You can run out of stack space
 It is less efficient
Most of the time the first problem is the most important one.
When I write factorial (I know it is already written, but I use it as
an example to show a point), the recursive variant cannot be called
with 1,000 without tail recursion. So for functions that could go very
deep, tail recursion would be a blessing.
By the way: I think that even if the recursion does not go deeper than
500, it is still a good idea to use tail recursion. Why use stack
space when it is not necessary?
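To make the stack-space point concrete, a minimal sketch (illustrative names, not the code from my library):

```python
import sys

def factorial_recursive(n):
    # Straightforward recursion: one stack frame per call.
    return 1 if n < 2 else n * factorial_recursive(n - 1)

def factorial_loop(n):
    # What tail-call elimination would effectively produce: constant stack.
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

try:
    factorial_recursive(sys.getrecursionlimit() + 100)
except RecursionError:
    print("recursive version blew the stack")

print(factorial_loop(10))  # 3628800, and n can be as large as you like
```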
But to my surprise tail recursion could even be more efficient. I
wrote two different versions of factorial with self-implemented tail
recursion. For bigger values both are more efficient than the
iterative version. And I expect that if the tail recursion is done by
the compiler instead of by hand, it will be a little faster still.
This is the output of a run of my code:
15:05:50: Start with the time needed to calculate 100000 times
15:05:50: Timing factorial_iterative (985): 32.265420768002514
15:06:22: Timing factorial_recursive (985): 58.381072121992474
15:07:21: Timing factorial_recursive_old (985): 64.46238571999129
15:08:25: Timing factorial_tail_recursion (985): 40.43312480399618
15:09:06: Timing factorial_tail_recursion_old(985): 39.70765891499468
15:09:45: Start with the time needed to calculate 1 times
No recursive, because without tail recursion you would run out of stack space
15:09:45: Timing factorial_iterative (100000): 3.9112528519763146
15:09:49: Timing factorial_tail_recursion (100000): 3.928693111985922
15:09:53: Timing factorial_tail_recursion_old(100000): 4.305187558988109
15:09:58: Timing factorial_iterative (200000): 18.081113666004967
15:10:16: Timing factorial_tail_recursion (200000): 16.660855480993632
15:10:32: Timing factorial_tail_recursion_old(200000): 18.169589380006073
15:10:51: Timing factorial_iterative (300000): 41.79109025900834
15:11:32: Timing factorial_tail_recursion (300000): 38.368264676013496
15:12:11: Timing factorial_tail_recursion_old(300000): 41.646923307009274
15:12:52: Timing factorial_iterative (400000): 78.35287749301642
15:14:11: Timing factorial_tail_recursion (400000): 73.17889478098368
15:15:24: Timing factorial_tail_recursion_old(400000): 89.64840986899799
15:16:53: Timing factorial_iterative (500000): 154.76221033901675
15:19:28: Timing factorial_tail_recursion (500000): 130.3837693700043
15:21:39: Timing factorial_tail_recursion_old(500000): 131.41286378499353
15:23:50: These result show that tail recursion can be interesting
They show also that the way you use tail recursion is important
As said, the most important reason is that code is often more elegant
when written recursively, but you cannot do that if it is possible
that the recursion goes very deep. But if recursive code would be
more elegant and faster, then it would really be interesting to have.
I would not opt for applying tail recursion automatically, but only
when the programmer asks for it. And if it is not possible, that
should be an error.
To make sure it was not a fluke, I ran it again:
16:01:30: Start with the time needed to calculate 100000 times
16:01:30: Timing factorial_iterative (985): 31.465190444985637
16:02:01: Timing factorial_recursive (985): 54.562154764978914
16:02:56: Timing factorial_recursive_old (985): 55.56128695001826
16:03:52: Timing factorial_tail_recursion (985): 36.27355203201296
16:04:28: Timing factorial_tail_recursion_old(985): 40.36879472099827
16:05:08: Start with the time needed to calculate 1 times
No recursive, because without tail recursion you would run out of stack space
16:05:08: Timing factorial_iterative (100000): 3.764512833993649
16:05:12: Timing factorial_tail_recursion (100000): 3.8083034529990982
16:05:16: Timing factorial_tail_recursion_old(100000): 4.107901128008962
16:05:20: Timing factorial_iterative (200000): 16.076719653996406
16:05:36: Timing factorial_tail_recursion (200000): 16.108007609989727
16:05:52: Timing factorial_tail_recursion_old(200000): 17.71343147099833
16:06:10: Timing factorial_iterative (300000): 37.82596729800571
16:06:48: Timing factorial_tail_recursion (300000): 40.308226338995155
16:07:28: Timing factorial_tail_recursion_old(300000): 41.254319412022596
16:08:09: Timing factorial_iterative (400000): 77.01277641401975
16:09:26: Timing factorial_tail_recursion (400000): 73.4060631209868
16:10:40: Timing factorial_tail_recursion_old(400000): 80.26402168802451
16:12:00: Timing factorial_iterative (500000): 131.84731978402124
16:14:12: Timing factorial_tail_recursion (500000): 125.31950747498195
16:16:17: Timing factorial_tail_recursion_old(500000): 133.39186109701404
16:18:30: These result show that tail recursion can be interesting
They show also that the way you use tail recursion is important

Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof


Op Saturday 2 May 2015 16:20 CEST schreef Cecil Westerhof:
> I am throwing the cat among the pigeons. ;)
>
> In another thread I mentioned that I liked to have tail recursion in
> Python. To be clear not automatic, but asked for.
>
> Looking at the replies I did hit a nerve. But I still want to
> continue.
>
> Some things are better expressed recursively for the people reading
> the code. But there are two problems with that:
>  You can get out of stack space
>  It is less efficient
>
> Most of the time the first problem is the most important.
>
> When I write factorial (I know it is already written, but I use it
> as an example to show a point), the recursive variant can not be
> called with 1.000 without tail recursion. So for functions that
> could go very deep, tail recursion would be a blessing.
>
> By the way: I think that even if the recursion does not go further
> as 500, it is still a good idea to use tail recursion. Why use stack
> space when it is not necessary?
I pushed the example to GitHub:
https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof


On Sunday, 3 May 2015 16:23:59 UTC+1, Cecil Westerhof wrote:
> > By the way: I think that even if the recursion does not go further
> > as 500, it is still a good idea to use tail recursion. Why use stack
> > space when it is not necessary?
>
> I pushed the example to GitHub:
> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py

You already know this, as your code shows, but tail call recursion elimination is only possible when you have a *direct* tail call (one with the result of the tail call returned immediately to the caller). Even the standard trivial factorial example doesn't have a direct tail call without rewriting to use an accumulator variable. That is a non-intuitive transformation for anyone who is not familiar with recursive functional languages and their idioms.
If you're rewriting your recursive function *anyway*, it's not that much harder in many (most?) cases to rewrite it iteratively.
An example of a function that naturally uses direct tail call recursion, but which doesn't have a simple iterative rewrite, would be more compelling. Not particularly compelling (to me, anyway) even so, but still better than factorial or Fibonacci examples.
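(For illustration, with my own names: the accumulator rewrite described above looks roughly like this.)

```python
def fact(n):
    # Not a direct tail call: the multiply happens after the
    # recursive call returns, so each frame must stay alive.
    return 1 if n < 2 else n * fact(n - 1)

def fact_acc(n, acc=1):
    # Direct tail call: the pending work travels in the accumulator,
    # so the caller returns the callee's result unchanged.
    return acc if n < 2 else fact_acc(n - 1, acc * n)

print(fact(6), fact_acc(6))  # 720 720
```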
Paul


Op Tuesday 5 May 2015 17:47 CEST schreef Paul Moore:
> On Sunday, 3 May 2015 16:23:59 UTC+1, Cecil Westerhof wrote:
>>> By the way: I think that even if the recursion does not go further
>>> as 500, it is still a good idea to use tail recursion. Why use
>>> stack space when it is not necessary?
>>
>> I pushed the example to GitHub:
>> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
> You already know this, as your code shows, but tail call recursion
> elimination is only possible when you have a *direct* tail call (one
> with the result of the tail call returned immediately to the
> caller). Even the standard trivial factorial example doesn't have a
> direct tail call, without rewriting to use an accumulator variable.
> Which is a nonintuitive transformation to anyone who's not familiar
> with recursive functional languages and their idioms.
>
> If you're rewriting your recursive function *anyway*, it's not that
> much harder in many (most?) cases to rewrite it iteratively.
>
> An example of a function that naturally uses direct tail call
> recursion, but which doesn't have a simple iterative rewrite, would
> be more compelling. Not particularly compelling (to me, anyway) even
> so, but still better than factorial or fibonnaci examples.
Well, I did not write many tail recursive functions. But what surprised
me was that for large values the 'tail recursive' version was more
efficient than the iterative version. And that was with me
implementing the tail recursion myself. I expect the code to be more
efficient when the compiler implements the tail recursion.

Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof


On 05/05/2015 12:18 PM, Cecil Westerhof wrote:
>
> Well, I did not write many tail recursive functions. But what surprised
> me was that for large values the 'tail recursive' version was more
> efficient as the iterative version. And that was with myself
> implementing the tail recursion. I expect the code to be more
> efficient when the compiler implements the tail recursion.
>
You've said that repeatedly, so I finally took a look at your webpage
https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py

I didn't have your framework to call the code, so I just extracted some
functions and did some testing. I do see some differences, where the
so-called tail_recursive functions are sometimes faster, but I did some
investigating to try to determine why.
I came up with the conclusion that sometimes the multiply operation
takes longer than other times. In particular, I can see more variation
between the two following loops than between your two functions.
def factorial_iterative(x, simple=False):
    assert x >= 0
    result = 1
    j = 2
    if not simple:
        for i in range(2, x + 1):
            result *= i
            j += 1
    else:
        for i in range(2, x + 1):
            result *= j
            j += 1
        pass
    return result
When "simple" is True, the function takes noticeably and
consistently longer. For example, it might take 116 seconds instead of
109. For the same counts, your code took 111.
I've looked at dis.dis(factorial_iterative), and can see no explicit
reason for the difference.

DaveA


Op Tuesday 5 May 2015 20:45 CEST schreef Dave Angel:
> On 05/05/2015 12:18 PM, Cecil Westerhof wrote:
>
>>
>> Well, I did not write many tail recursive functions. But what
>> surprised me was that for large values the 'tail recursive' version
>> was more efficient as the iterative version. And that was with
>> myself implementing the tail recursion. I expect the code to be
>> more efficient when the compiler implements the tail recursion.
>>
>
>
> You've said that repeatedly, so I finally took a look at your
> webpage
>
> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
> I didn't have your framework to call the code, so I just extracted
> some functions and did some testing.
I definitely need to take care of the documentation.
It can be called with:
python3 mathDecebal.py factorial
The problem is that it will do both a correctness and a speed test. I
have to split those into two different things. And use a different
file for both.
Maybe make a directory test and put a correctness_<function>.py and a
speed_<function>.py in it.
> I do see some differences,
> where the so-called tail_recursive functions are sometimes faster,
> but I did some investigating to try to determine why.
>
>
> I came up with the conclusion that sometimes the multiply operation
> takes longer than other times. And in particular, i can see more
> variation between the two following loops than between your two
> functions.
>
>
> def factorial_iterative(x, simple=False):
>     assert x >= 0
>     result = 1
>     j = 2
>     if not simple:
>         for i in range(2, x + 1):
>             result *= i
>             j += 1
>     else:
>         for i in range(2, x + 1):
>             result *= j
>             j += 1
>         pass
>
>     return result
>
> When the "simple" is True, the function takes noticeably and
> consistently longer. For example, it might take 116 instead of 109
> seconds. For the same counts, your code took 111.
>
> I've looked at dis.dis(factorial_iterative), and can see no explicit
> reason for the difference.
I would say that a variable that is filled by a range is different
from a normal variable. Do not ask me why. ;)
Even if you (the general you, not you personally) think that the tail
recursion is a waste of time, this is an interesting result, I think.

Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof


On Tue, May 5, 2015 at 12:45 PM, Dave Angel <davea at davea.name> wrote:
> When the "simple" is True, the function takes noticeably and consistently
> longer. For example, it might take 116 instead of 109 seconds. For the
> same counts, your code took 111.
I can't replicate this. What version of Python is it, and what value
of x are you testing with?
> I've looked at dis.dis(factorial_iterative), and can see no explicit reason
> for the difference.
My first thought is that maybe it's a result of the branch. Have you
tried swapping the branches, or reimplementing as separate functions
and comparing?


On 5/5/2015 12:18 PM, Cecil Westerhof wrote:
> Op Tuesday 5 May 2015 17:47 CEST schreef Paul Moore:
>
>> On Sunday, 3 May 2015 16:23:59 UTC+1, Cecil Westerhof wrote:
>>>> By the way: I think that even if the recursion does not go further
>>>> as 500, it is still a good idea to use tail recursion. Why use
>>>> stack space when it is not necessary?
>>>
>>> I pushed the example to GitHub:
>>> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
>> You already know this, as your code shows, but tail call recursion
>> elimination is only possible when you have a *direct* tail call (one
An 'indirect tail call' would not be a tail call.
>> with the result of the tail call returned immediately to the
>> caller). Even the standard trivial factorial example doesn't have a
>> direct tail call, without rewriting to use an accumulator variable.
>> Which is a nonintuitive transformation to anyone who's not familiar
>> with recursive functional languages and their idioms.
>>
>> If you're rewriting your recursive function *anyway*, it's not that
>> much harder in many (most?) cases to rewrite it iteratively.
For count functions, the main change between tail recursion and while
iteration is replacing 'if' with 'while' and converting the tail call
to assignment. (One may have to reverse the if-else first to put the
tail call in the if branch.)
from math import factorial as fac
print(fac(0), fac(1), fac(2), fac(6))

def fac_rt(n, i=2, res=1):
    if i <= n:
        return fac_rt(n, i+1, res*i)
    else:
        return res

def fac_iw(n):
    i = 2
    res = 1
    while i <= n:
        i, res = i+1, res*i
    return res

for i in (0, 1, 2, 6):
    print(fac(i) == fac_rt(i) == fac_iw(i))
>>>
1 1 2 720
True
True
True
True
For collection functions that process each item once, 'for item in
collection: ...' is nearly always easier to write in the first place.
>> An example of a function that naturally uses direct tail call
>> recursion, but which doesn't have a simple iterative rewrite, would
>> be more compelling.
Simple, easily converted functions like the above, with one recursive
call in one branch of an if-else, are the most common. Something with
multiple mutually exclusive tail calls, like the following,
def f_rt1(*args):
    if nonbase1:
        return f(*revisedargs1)
    elif base_condition:
        return base(args)
    else:
        return f(*revisedargs2)
must be converted to an if-else with all tail calls in the if branch.
def f_rt2(*args):
    if not base_condition:
        if nonbase1:
            return f(*revisedargs1)
        else:
            return f(*revisedargs2)
    else:
        return base(args)
Conversion would then be simple. The complication is that the
'base_condition' in the original version might not really be the base
condition, due to a dependence on nonbase1 being false. This is a
generic problem with reordering if-elif statements.
For nonlinear (branching) recursion, in which multiple recursive calls
may be made for one function call, the last recursive call may be a
tail call. An example is in-place quicksort. Eliminating just the tail
call may not be simple, but it also does not eliminate the possibility
of blowing the call stack. To do that, one must eliminate all
recursive calls by adding explicit stacks.
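As a sketch of that point (illustrative code, not from this thread): in-place quicksort with the tail call turned into a loop. Recursing on the smaller partition and looping on the larger one bounds the stack depth at O(log n), even though a recursive call remains:

```python
def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Lomuto partition around the last element.
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        # Recurse on the smaller side; loop on the larger side,
        # which replaces the former tail call.
        if i - lo < hi - i:
            quicksort(a, lo, i - 1)
            lo = i + 1
        else:
            quicksort(a, i + 1, hi)
            hi = i - 1

data = [5, 2, 9, 1, 5, 6]
quicksort(data)
print(data)  # [1, 2, 5, 5, 6, 9]
```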
> Well, I did not write many tail recursive functions. But what surprised
> me was that for large values the 'tail recursive' version was more
> efficient as the iterative version.
In your first thread, what you mislabelled the 'tail recursive
version' was an iterative while-loop version, while the 'iterative
version' was an iterative for-loop version. In this thread, you just
posted timings without code. I will not believe your claim until I see
one file that I can run myself, with an actual tail recursive
function, as above, that beats the equivalent while or for loop
version.

Terry Jan Reedy


On 05/05/2015 04:30 PM, Ian Kelly wrote:
> On Tue, May 5, 2015 at 12:45 PM, Dave Angel <davea at davea.name> wrote:
>> When the "simple" is True, the function takes noticeably and consistently
>> longer. For example, it might take 116 instead of 109 seconds. For the
>> same counts, your code took 111.
>
> I can't replicate this. What version of Python is it, and what value
> of x are you testing with?
>
>> I've looked at dis.dis(factorial_iterative), and can see no explicit reason
>> for the difference.
>
> My first thought is that maybe it's a result of the branch. Have you
> tried swapping the branches, or reimplementing as separate functions
> and comparing?
>
Logic is quite simple:

def factorial_iterative(x, simple=False):
    assert x >= 0
    result = 1
    j = 2
    if not simple:
        for i in range(2, x + 1):
            #print("range value is of type", type(i), "and value", i)
            #print("ordinary value is of type", type(j), "and value", j)
            result *= i
            j += 1
    else:
        for i in range(2, x + 1):
            result *= j
            j += 1
    return result
def loop(func, funcname, arg):
    start = time.time()
    for i in range(repeats):
        func(arg, True)
    print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))

    start = time.time()
    for i in range(repeats):
        func(arg)
    print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
repeats = 1
and arg is 10**4
loop(factorial_iterative, "factorial_iterative ", arg)
My actual program does the same thing with other versions of the
function, including Cecil's factorial_tail_recursion, and my optimized
version of that.
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
factorial_iterative (100000) took 3.807
factorial_iterative (100000) took 3.664
factorial_iterative (200000) took 17.07
factorial_iterative (200000) took 15.3
factorial_iterative (300000) took 38.93
factorial_iterative (300000) took 36.01
Note that I test them in the opposite order of where they appear in
the function. That's because I was originally using the simple flag to
test an empty loop. The empty loop is much quicker either way, so it's
not the issue. (But if it were, the for-range version is much
quicker.)
I think I'll take your last suggestion and write separate functions.

DaveA


Op Tuesday 5 May 2015 22:46 CEST schreef Terry Reedy:
>> Well, I did not write many tail recursive functions. But what
>> surprised me was that for large values the 'tail recursive' version
>> was more efficient as the iterative version.
>
> In your first thread, what you mislabelled 'tail recursive version'
> was an iterative while loop version
That is because Python has no tail recursion, so I needed to program
the tail recursion myself. Tail recursion would do under the hood what
I did there manually.
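One generic way to do that manually is a trampoline; a minimal sketch (my naming here, not the code from mathDecebal.py):

```python
import sys

def trampoline(func, *args):
    # Keep calling until func returns a result instead of a tuple of
    # "next call" arguments; stack depth stays constant throughout.
    result = func(*args)
    while isinstance(result, tuple):
        result = func(*result)
    return result

def fact_step(n, acc=1):
    # Instead of making a real tail call, return the next arguments.
    return acc if n < 2 else (n - 1, acc * n)

n = sys.getrecursionlimit() * 10
print(trampoline(fact_step, n) > 0)  # far beyond the recursion limit, no overflow
```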
> while the 'iterative version'
> was an iterative for loop version. In this thread, you just posted
> timings without code. I will not believe your claim until I see one
> file that I can run myself with an actual tail recursive function,
> as above, that beats the equivalent while or for loop version.
https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof


On Tue, May 5, 2015 at 3:00 PM, Dave Angel <davea at davea.name> wrote:
> def loop(func, funcname, arg):
>     start = time.time()
>     for i in range(repeats):
>         func(arg, True)
>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
>
>     start = time.time()
>     for i in range(repeats):
>         func(arg)
>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
Note that you're explicitly passing True in one case but leaving the
default in the other. I don't know whether that might be responsible
for the difference you're seeing.
Also, it's best to use the timeit module for timing code, e.g.:
>>> from timeit import Timer
>>> t1 = Timer("factorial_iterative(100000, False)", "from __main__ import factorial_iterative")
>>> t1.repeat(10, number=1)
[3.8517245299881324, 3.7571076710009947, 3.7780062559759244,
3.848508063936606, 3.7627131739864126, 3.8278848479967564,
3.776115525048226, 3.83024005102925, 3.8322679550619796,
3.8195601429324597]
>>> min(_), sum(_) / len(_)
(3.7571076710009947, 3.8084128216956743)
>>> t2 = Timer("factorial_iterative(100000, True)", "from __main__ import factorial_iterative")
>>> t2.repeat(10, number=1)
[3.8363616950809956, 3.753201302024536, 3.7838632150087506,
3.7670978900277987, 3.805312803015113, 3.7682680500438437,
3.856655619922094, 3.796431727008894, 3.8224815409630537,
3.765664782957174]
>>> min(_), sum(_) / len(_)
(3.753201302024536, 3.7955338626052253)
As you can see, in my testing the True case was actually marginally
(probably not significantly) faster in both the min and the average.


On Tue, May 5, 2015 at 3:23 PM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> On Tue, May 5, 2015 at 3:00 PM, Dave Angel <davea at davea.name> wrote:
>> def loop(func, funcname, arg):
>>     start = time.time()
>>     for i in range(repeats):
>>         func(arg, True)
>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
>>
>>     start = time.time()
>>     for i in range(repeats):
>>         func(arg)
>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
>
> Note that you're explicitly passing True in one case but leaving the
> default in the other. I don't know whether that might be responsible
> for the difference you're seeing.
I don't think that's the cause, but I do think that it has something
to do with the way the timing is being run. When I run your loop
function, I do observe the difference. If I reverse the order so that
the False case is tested first, I observe the opposite. That is, the
slower case is consistently the one that is timed *first* in the loop
function, regardless of which case that is.


On 05/05/2015 05:39 PM, Ian Kelly wrote:
> On Tue, May 5, 2015 at 3:23 PM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
>> On Tue, May 5, 2015 at 3:00 PM, Dave Angel <davea at davea.name> wrote:
>>> def loop(func, funcname, arg):
>>>     start = time.time()
>>>     for i in range(repeats):
>>>         func(arg, True)
>>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
>>>
>>>     start = time.time()
>>>     for i in range(repeats):
>>>         func(arg)
>>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
>>
>> Note that you're explicitly passing True in one case but leaving the
>> default in the other. I don't know whether that might be responsible
>> for the difference you're seeing.
>
> I don't think that's the cause, but I do think that it has something
> to do with the way the timing is being run. When I run your loop
> function, I do observe the difference. If I reverse the order so that
> the False case is tested first, I observe the opposite. That is, the
> slower case is consistently the one that is timed *first* in the loop
> function, regardless of which case that is.
>
I created two functions and timed them with timeit(), and the
difference is now below 3%.
And when I take your lead and double the loop() function so it runs each
test twice, I get steadily decreasing numbers.
I think all of this has been noise caused by the caching of objects,
including function objects. I was surprised by this, as the loops are
small enough that I'd have figured the function object would be fully
cached the first time it was called.
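One way to damp such warm-up and caching effects is to let timeit re-run each measurement several times and keep only the minimum (a sketch, not my actual harness):

```python
import timeit

setup = """
def fact(n):
    r = 1
    for i in range(2, n + 1):
        r *= i
    return r
"""
# repeat() re-runs the whole timing several times; taking the minimum
# discards runs inflated by caching, warm-up, or OS noise.
best = min(timeit.repeat("fact(2000)", setup=setup, repeat=5, number=10))
print("best of 5:", best)
```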

DaveA


On 5/5/2015 5:12 PM, Cecil Westerhof wrote:
> Op Tuesday 5 May 2015 22:46 CEST schreef Terry Reedy:
>
>>> Well, I did not write many tail recursive functions. But what
>>> surprised me was that for large values the ?tail recursive? version
>>> was more efficient as the iterative version.
>>
>> In your first thread, what you mislabelled 'tail recursive version'
>> was an iterative while loop version
>
> That is because Python has no tail recursion,
Yes it does. Please stop using 'tail recursion' to refer to the
absence of recursion or to the process of removing recursion. I wrote
a tail recursive function, and I believe you did too.
What Python does not have is automatic tail call optimization, or
specifically, tail recursion *elimination*. Both possibilities, the
general and specific, have been considered and rejected for excellent
reasons. I hinted at some of them in my post.
See https://en.wikipedia.org/wiki/Tail_call for 'tail recursion' and 'tail call elimination'.
What I believe you showed is that a while loop can be faster than a more
or less equivalent for loop that produces the same result by a slightly
different method. That is not surprising. Relative timings for CPython
vary a few percent between different combinations of Python version, C
compiler and settings, operating system, and cpu and other hardware.

Terry Jan Reedy


On Wed, 6 May 2015 07:23 am, Ian Kelly wrote:
> On Tue, May 5, 2015 at 3:00 PM, Dave Angel <davea at davea.name> wrote:
>> def loop(func, funcname, arg):
>>     start = time.time()
>>     for i in range(repeats):
>>         func(arg, True)
>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
>>
>>     start = time.time()
>>     for i in range(repeats):
>>         func(arg)
>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time() - start))
>
> Note that you're explicitly passing True in one case but leaving the
> default in the other. I don't know whether that might be responsible
> for the difference you're seeing.
It will be responsible for *some* difference (it is certainly faster
to pass one argument than two), but likely an entirely trivial amount
compared to the time taken to iterate some large number of times.
> Also, it's best to use the timeit module for timing code, e.g.:
>
>>>> from timeit import Timer
>>>> t1 = Timer("factorial_iterative(100000, False)", "from __main__ import factorial_iterative")
>>>> t1.repeat(10, number=1)
> [3.8517245299881324, 3.7571076710009947, 3.7780062559759244,
> 3.848508063936606, 3.7627131739864126, 3.8278848479967564,
> 3.776115525048226, 3.83024005102925, 3.8322679550619796,
> 3.8195601429324597]
>>>> min(_), sum(_) / len(_)
> (3.7571076710009947, 3.8084128216956743)
Only the minimum is statistically useful.
The reason is that every timing measurement has an error, due to random
fluctuations of the OS, CPU pipelining, caches, etc, but that error is
always positive. The noise always makes the code take longer to run, not
faster!
So we can imagine every measurement is made up of the "true" value T, plus
an error, where T = the perfectly repeatable timing that the function would
take to run if we could somehow eliminate every possible source of noise
and error in the system. We can't, of course, but we would like to estimate
T as closely as possible.
Without loss of generality, suppose we collect five timings:
times = [T+ε1, T+ε2, T+ε3, T+ε4, T+ε5]
where the epsilons ε are unknown errors due to noise. We don't know the
distribution of the errors, except that no epsilon can be less than zero.
We want an estimate for T, something which comes as close as possible to T.
It should be obvious that the following relationships hold:
mean(times) = T + mean([ε1, ε2, ε3, ε4, ε5])
min(times) = T + min([ε1, ε2, ε3, ε4, ε5])
in other words, both the mean of the timings and the minimum of the timings
are equivalent to the "true" timing, plus an error. It should also be
obvious that the mean of the epsilons must be larger than the smallest of
the errors (except in the miraculous case where all the epsilons are
exactly the same).
So the statistic with the minimum error is the minimum. Any other stat,
whether you use the mean, geometric mean, harmonic mean, mode, median, or
anything else, will have a larger error and be a worse estimate for
the "true" value T.
All because the errors are one-sided: they always make the timing
take longer, never shorter.
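The claim is easy to check numerically; a toy simulation, assuming one-sided noise as above:

```python
import random

random.seed(42)
T = 1.0  # the "true" timing
# Every measurement is T plus a nonnegative error.
times = [T + random.expovariate(10) for _ in range(1000)]

mean_estimate = sum(times) / len(times)
min_estimate = min(times)
# The minimum carries the smallest error, so it lands closer to T.
print(min_estimate - T < mean_estimate - T)  # True
```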

Steven


On Wed, 6 May 2015 02:18 am, Cecil Westerhof wrote:
> Well, I did not write many tail recursive functions. But what surprised
> me was that for large values the 'tail recursive' version was more
> efficient as the iterative version. And that was with myself
> implementing the tail recursion. I expect the code to be more
> efficient when the compiler implements the tail recursion.
You cannot know that. Python is not C or Java, and your intuition as to what
will be fast and what will be slow will not be accurate in Python. Python
has its own fast paths, and slow paths, and they are not the same as C's or
Java's.
I have been using Python for over 15 years, and I would not want to
guess whether a hypothetical compiler-generated
tail-recursion-elimination function would be faster or slower than
manually unrolling the recursion into a while loop.
I am surprised that you find a manually unrolled version with a while
loop faster than an iterative version. I assume the iterative version
uses a for-loop. In my experience, a for-loop is faster than a while
loop.
But perhaps that depends on the exact version and platform.

Steven


On Wed, 6 May 2015 05:42 am, Cecil Westerhof wrote:
> I would say that a variable that is filled by a range is different as
> a normal variable. Do not ask me why. ;)
I would say that you are wrong. If I have understood you correctly,
that cannot possibly be the case in Python: all Python variables work
the same way[1], and none of them can remember where they came from.
[1] Well, technically local variables and global variables are
implemented differently (locals inside a function use a C array and
globals use a dict), but apart from a few implementation details like
that, any two variables in the same scope[2] operate identically.
[2] Anyone who raises the issue of exec() or import * inside a function body
in Python 2 will be slapped with a halibut.
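The locals-vs-globals difference in footnote [1] is visible in the bytecode; a quick sketch with the dis module (names are illustrative):

```python
import dis

x = 1  # global: looked up in a dict at run time

def f():
    y = 2  # local: stored in an indexed array slot
    return x + y

# x compiles to LOAD_GLOBAL, y to LOAD_FAST.
ops = {ins.opname for ins in dis.get_instructions(f)}
print("LOAD_GLOBAL" in ops, "LOAD_FAST" in ops)  # True True
```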

Steven


On Wed, May 6, 2015 at 12:57 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Wed, 6 May 2015 05:42 am, Cecil Westerhof wrote:
>
>> I would say that a variable that is filled by a range is different as
>> a normal variable. Do not ask me why. ;)
>
>
> I would say that you are wrong. If I have understood you correctly, that
> cannot possibly be the case in Python, all Python variables work the same
> way[1], and none of them can remember where they came from.
My reading of the code was that one had two variables and the other
had just one. It could be that the noise in the numbers came from
cache locality differences.
No difference between the variables, just a peculiarity of CPUs and
how they use memory.
ChrisA


On Sun, 3 May 2015 12:20 am, Cecil Westerhof wrote:
[...]
> But to my surprise tail recursion could even be more efficient. I
> wrote two different versions of factorial with self implemented tail
> recursion. For bigger values both are more efficient. And I expect
> that if the tail recursion is done by the compiler instead of by hand,
> it will be a little faster still.
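For context, the quoted "self implemented tail recursion" is presumably
along these lines: the usual accumulator transformation, with the tail
call rewritten as a loop by hand. This is a sketch of that idea; the
names are mine, not taken from the quoted code:

```python
def factorial_recursive(n):
    # plain recursion: one stack frame per call, so a large n
    # exceeds the recursion limit and raises RecursionError
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

def factorial_tail(n, acc=1):
    # the tail-recursive form "factorial_tail(n - 1, acc * n)"
    # rewritten by hand as a loop, so it uses constant stack space
    while n > 1:
        acc *= n
        n -= 1
    return acc

print(factorial_tail(10))  # 3628800
# factorial_tail(100000) completes; factorial_recursive(100000) would
# raise RecursionError on a default CPython recursion limit.
```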
I decided to do some experimentation.
Firstly, when timing code, one should minimise the amount of "scaffolding"
used around the code you care about. Using a single function like this:
def factorial_iterative(x, flag):
    if flag:
        ...  # first implementation
    else:
        ...  # second implementation
is a bad idea, because you are not just timing the implementations, but also
the scaffolding that selects between them. Also, you increase the size of
the function, which increases the chance of cache misses and other
processor effects. So first step is to split the implementations into
separate functions. Here are my four implementations, the first two are
taken from your code:
def factorial_forloop1(n):
    assert n >= 0
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
def factorial_forloop2(n):
    # Version with a silly extra variable.
    assert n >= 0
    result = 1
    j = 2
    for i in range(2, n + 1):
        result *= j
        j += 1
    return result
from operator import mul
try:
    reduce
except NameError:
    from functools import reduce

def factorial_reduce(n):
    assert n >= 0
    return reduce(mul, range(2, n+1), 1)
def factorial_while(n):
    assert n >= 0
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result
A quick test to verify that they return the same results:
py> factorial_while(10)
3628800
py> factorial_reduce(10)
3628800
py> factorial_forloop1(10)
3628800
py> factorial_forloop2(10)
3628800
There's no point in optimising code that does the wrong thing!
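Another convenient cross-check is the stdlib's math.factorial. A sketch,
repeating one of the implementations above so the snippet stands alone:

```python
import math

def factorial_forloop1(n):
    # the straightforward iterative version from above
    assert n >= 0
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# compare against the reference implementation over a spread of inputs
for n in (0, 1, 2, 10, 100):
    assert factorial_forloop1(n) == math.factorial(n)
print("all results match math.factorial")
```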
Now, let's do some timing tests. It is best to use a well-tested timing
framework rather than invent your own dodgy one, so I use the timeit
module. Read the comments in the module to see why rolling your own is a
bad idea.
I'll start with some relatively small input sizes. Remember, the inputs
may be small, but the outputs will be very large.
from timeit import Timer
code = "fact(10); fact(20); fact(30)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
I'm in a bit of a hurry, so I may be a bit more slapdash than normal.
Normally, I would pick a larger number of trials, and a larger number of
iterations per trial, but here I'm going to use just the best of three
trials, of 10000 iterations each:
for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=10000)))
which prints:
0.22797810286283493 # while
0.17145151272416115 # reduce
0.16230526939034462 # forloop
0.22819281555712223 # silly forloop
(Comments added by me.) See my earlier post for why the minimum is the only
statistic worth looking at. These results show that:
- the version using while is significantly slower than the more
straightforward iterative versions using either reduce or a for-loop;
- adding an extra, unnecessary, variable to the for-loop, and incrementing
it by hand, carries as much overhead as using a while loop;
- reduce is slightly slower than a pure Python for-loop (at least according
to this simple trial, on my computer -- your results may vary);
- the obvious winner is the straightforward iterative version with a
for-loop.
Now I'm going to test it with a larger input:
py> big = factorial_forloop1(50000)
py> sys.getsizeof(big) # number of bytes
94460
py> len(str(big)) # number of digits
213237
Recreate the timer objects, and (again, because I'm in something of a hurry)
do a best-of-three with just 2 iterations per trial.
code = "fact(50000)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=2)))
which takes about two minutes on my computer, and prints:
8.604736926034093 # while loop
10.786483339965343 # reduce
10.91099695302546 # for loop
10.821452282369137 # silly version of the for loop
(Again, annotations are by me.)
These results are fascinating, and rather surprising. I think that they
demonstrate that, at this size of input argument, the time is dominated by
processing the BigNum int objects, not the iteration: whether we use
reduce, a straightforward for-loop, or a for-loop with an extra variable
makes very little difference, of the order of 1%.
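The bignum growth is easy to see directly: near the end of the loop, each
`result *= i` step is multiplying an integer hundreds of thousands of bits
long, so the large-integer arithmetic, not the loop construct, dominates.
A sketch (the `sys.set_int_max_str_digits` guard is only needed because
newer CPythons cap int-to-str conversion length by default):

```python
import sys

# Newer CPythons (3.11+) limit int-to-str conversion; lift the cap so we
# can measure the decimal length of 50000! for this demonstration.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(300000)

result = 1
for i in range(2, 50001):
    result *= i  # by the end, a single step multiplies a huge bignum

print(len(str(result)))  # 213237 digits, matching the count above
```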
I wouldn't read too much into the fact that the for-loop which does *more*
work, manually adjusting a second variable, is slightly faster. Unless
that can be proven to be consistently the case, I expect that is just
random noise. I ran some quick trials, and did not replicate that result:
py> for t in (t3, t4, t3, t4):
...     print(min(t.repeat(repeat=3, number=1)))
...
5.282072028145194 # for-loop
5.415546240285039 # silly for-loop
5.358346642926335 # for-loop
5.328046130016446 # silly for-loop
Note that, as expected, doing two iterations per trial takes about twice as
long as one iteration per trial: 5 seconds versus 10.
What is surprising is that for very large input like this, the while loop is
significantly faster than reduce or either of the for-loops. I cannot
explain that result.
(Just goes to show that timing code in Python can surprise you when you
least expect it.)
Oh, and for the record, I'm using Python 3.3 on Linux:
py> sys.version
'3.3.0rc3 (default, Sep 27 2012, 18:44:58) \n[GCC 4.1.2 20080704 (Red Hat
4.1.2-52)]'
Results may vary on other platforms and versions.

Steven


On Tue, May 5, 2015 at 7:27 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> Only the minimum is statistically useful.
I disagree. The minimum tells you how fast the code *can* run, under
optimal circumstances. The mean tells you how fast it *realistically*
runs, under typical load. Both can be useful to measure.
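Both statistics are cheap to collect from the timeit module. A minimal
sketch:

```python
from timeit import repeat
from statistics import mean

# Run several trials of the same statement. The minimum estimates the
# best case (least interference from other processes); the mean estimates
# typical behaviour under whatever load is present.
times = repeat("sum(range(1000))", repeat=5, number=1000)
print("min: ", min(times))
print("mean:", mean(times))
```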
