Challenge: optimizing isqrt


Challenge: optimizing isqrt

Steven D'Aprano-11
There is an algorithm for calculating the integer square root of any
positive integer using only integer operations:

def isqrt(n):
    if n < 0: raise ValueError
    if n == 0:
        return 0
    bits = n.bit_length()
    a, b = divmod(bits, 2)
    x = 2**(a+b)
    while True:
        y = (x + n//x)//2
        if y >= x:
            return x
        x = y

This returns the integer part of the square root of n, that is, the greatest
whole number less than or equal to the square root of n:

py> isqrt(35)
5
py> isqrt(36)
6


That makes it equivalent to int(math.sqrt(n)), which also happens to be
much, much faster, at least for small values of n. However, for large
values of n, using floating point intermediate calculations fails:

py> import math
py> int(math.sqrt(2**3000))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float

Another problem is that, above a certain size, the resolution of floats is
larger than 1, so you can't convert every int into a float without loss:

py> float(2**90-1) == 2**90-1
False

which means that using math.sqrt is not correct:

py> isqrt(2**90-1)
35184372088831
py> int(math.sqrt(2**90-1))  # Off by one.
35184372088832
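
A C double has a 53-bit significand, so every integer up to 2**53 survives
the round trip through float, but 2**53 + 1 does not:

py> float(2**53) == 2**53
True
py> float(2**53 + 1) == 2**53 + 1
False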


So, the challenge is to identify when it is safe to optimise isqrt(n) as
int(math.sqrt(n)):

Q1: What is the largest value of M, such that

all(isqrt(n) == int(math.sqrt(n)) for n in range(M))

returns True?

I have done a brute force test, and in nine hours confirmed that M is at
least 7627926244. I expect that a brute force test of every int
representable as a float would take *months* of processing time.

Q2: For values above M, is there a way of identifying which values of n are
okay to use the optimized version?

Q3: What is the largest value of n beyond which you can never use the float
optimization?


You can assume that Python floats are IEEE-754 C doubles, and that
math.sqrt() is correctly rounded.


--
Steven




Challenge: optimizing isqrt

Chris Angelico
On Sat, Nov 1, 2014 at 12:29 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:

> Q1: What is the largest value of M, such that
>
> all(isqrt(n) == int(math.sqrt(n)) for n in range(M))
>
> returns True?
>
> I have done a brute force test, and in nine hours confirmed that M is at
> least 7627926244. I expect that a brute force test of every int
> representable as a float would take *months* of processing time.

Here's something that should be faster than pure brute-force:

import math
for i in range(2**34): # 2**34 should take about nine hours
    sqrt = int(math.sqrt(i))
    if sqrt * sqrt > i:
        print("Too high at %d" % i)
        break
    if (sqrt+1) * (sqrt+1) <= i:
        print("Too low at %d" % i)
        break

There are two ways int(math.sqrt(i)) can be wrong: it can return a
number that's too high, or one that's too low. If it's too high,
squaring it (btw, int*int seems to be faster than int**2) will produce
a number higher than the original. If the result was too low, then the
next number up would have been within the correct bounds. There's no
need to actually calculate isqrt here.

I was able to probe up to 2**29 in seven minutes on my computer
(allocating one core, CPython 3.5 from trunk). If it's linear, going
as far as 2**34 should take no more than the nine hours you spent. I
expect time will be a bit worse than linear (working with small
numbers is marginally faster than working with big numbers), but if
it's at least reasonably close, probing as far as 2**53 would take
81555 days. So, uhh, a brute force by your method is going to take a
lot more than months. (And 2**53 isn't technically where ints stop
being representable, but I'd use that as the final cut-off, as it's
where *all* ints stop being representable. In any case, there's a
failure at 2**54-1, so there's no need to go past there.)

Hmm. The speed difference between your brute force and mine isn't all
that much. I'm estimating maybe doing at best twice as much in the
same time. So you're probably already using something similar to the
above, and this isn't adding much to the discussion. :(

ChrisA



Challenge: optimizing isqrt

Christian Gollwitzer
In reply to this post by Steven D'Aprano-11
Hi Steven,

let me start by answering from reverse:
 > Q3: What is the largest value of n beyond which you can never use the
float
 > optimization?
 >

A3: There is no such value, besides the upper limit of floats
(DBL_MAX ~ 10^308).

P3: If you feed a perfect square into the floating point square root
algorithm, and the root's mantissa is shorter than the mantissa width of
your float, it will always come out exactly. I.e., computing sqrt(25) in
FP math is no different from sqrt(25*2**200):

 >>> 25*2**200
40173451106474756888549052308529065063055074844569820882534400L
 >>> x=int(math.sqrt(25*2**200))
 >>> x
6338253001141147007483516026880L
 >>> x*x
40173451106474756888549052308529065063055074844569820882534400L
 >>>



On 01.11.14 02:29, Steven D'Aprano wrote:

> There is an algorithm for calculating the integer square root of any
> positive integer using only integer operations:
>
> def isqrt(n):
>      if n < 0: raise ValueError
>      if n == 0:
>          return 0
>      bits = n.bit_length()
>      a, b = divmod(bits, 2)
>      x = 2**(a+b)
>      while True:
>          y = (x + n//x)//2
>          if y >= x:
>              return x
>          x = y

> Q2: For values above M, is there a way of identifying which values of n are
> okay to use the optimized version?

A2: Do it in a different way.

Your above algorithm is obviously doing Heron- or Newton-Raphson
iterations, i.e. the same as with floating point math. The first line
before the while loop computes some approximation to sqrt(n). Instead of
doing bit shuffling, you could compute this seed by FP math and get closer
to the desired result, unless the integer is too large to be represented
by FP. Now, the terminating condition seems to rely on the fact that the
initial estimate x >= sqrt(n), but I don't really understand it. My guess
is that if you do x = int(sqrt(n)), then do the first iteration, then swap
x and y such that x > y, and then enter the loop, you would simply start
with a better estimate, in case the significant bits can be represented
by the float.

So this is my try, but not thoroughly tested:

import math

def isqrt(n):
    if n < 0: raise ValueError
    if n == 0:
        return 0
    bits = n.bit_length()
    # the highest exponent in 64-bit IEEE is 1023
    if n > 2**1022:
        a, b = divmod(bits, 2)
        x = 2**(a+b)
    else:
        x = int(math.sqrt(n))
        y = n//x
        if x < y:
            x, y = y, x
    while True:
        y = (x + n//x)//2
        if y >= x:
            return x
        x = y


Christian







Challenge: optimizing isqrt

Christian Gollwitzer
Addendum: If my method below works, you can also use it to speed up
computations for n > 2**1022, by splitting off an even power of two from
the integer and computing the FP sqrt of the mantissa for the seed, i.e.
doing the FP scaling manually.
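
In code, that seed computation might look something like this (a rough,
untested sketch; the helper name is mine):

import math

def fp_seed(n):
    # Split off an even power of two so the reduced value fits in a
    # double's 53-bit significand, take the FP root of that, and
    # shift the result back up.
    b = max(0, (n.bit_length() - 52 + 1) // 2)
    return int(math.sqrt(n >> (2 * b))) << b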

On 01.11.14 09:02, Christian Gollwitzer wrote:
> [snip]




Challenge: optimizing isqrt

Chris Angelico
In reply to this post by Christian Gollwitzer
On Sat, Nov 1, 2014 at 7:02 PM, Christian Gollwitzer <auriocus at gmx.de> wrote:

> Your above algorithm is obviously doing Heron- or Newton-Raphson iterations,
> so the same as with floating point math. The first line before the while
> loop computes some approximation to sqrt(n). Instead of doing bit shuffling,
> you could compute this by FP math and get closer to the desired result,
> unless the integer is too large to be represented by FP. Now, the
> terminating condition seems to rely on the fact that the initial estimate
> x>=sqrt(n), but I don't really understand it. My guess is that if you do
> x=int(sqrt(n)), then do the first iteration, then swap x and y such that
> x>y, then enter the loop, you would simply start with a better estimate in
> case that the significant bits can be represented by the float.

Part of the point of that algorithm is that it never uses FP, and is
therefore not limited by FP restrictions. As to the assumption that
x >= sqrt(n), that is safe: if the bit length is even (that is, n is
between an odd power of 2 and an even one - for example, 2**13 < 16000 <=
2**14), the initial estimate is the exact square root of the power of two
at the top of that range (a bit length of 14 means x is 2**7, and 128 ==
sqrt(16384)); if the bit length is odd (eg 2**8 < 300 <= 2**9), the
initial estimate rounds the halving upward (a bit length of 9 means x is
2**(9//2+1), and 32 > sqrt(512)). So there's a guarantee that the initial
estimate is no lower than the true square root.
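
A quick sketch that checks that bound empirically for the first million
integers:

for n in range(1, 10**6):
    a, b = divmod(n.bit_length(), 2)
    x = 2**(a + b)
    # the initial estimate squared is always at least n
    assert x * x >= n, n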

ChrisA



Challenge: optimizing isqrt

Christian Gollwitzer
In reply to this post by Christian Gollwitzer
On 01.11.14 09:13, Chris Angelico wrote:

> Part of the point of that algorithm is that it never uses FP, and is
> therefore not limited by FP restrictions.

which are???





Challenge: optimizing isqrt

Chris Angelico
On Sat, Nov 1, 2014 at 7:25 PM, Christian Gollwitzer <auriocus at gmx.de> wrote:
>> Part of the point of that algorithm is that it never uses FP, and is
>> therefore not limited by FP restrictions.
>
>
> which are???

Most notably, the inability to represent every integer beyond 2**53,
and the inability to represent any integer beyond 2**1024. This
algorithm should work fine with any positive integer.
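
For instance, with the pure-integer isqrt from the original post:

py> isqrt(2**3000) == 2**1500
True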

ChrisA



Challenge: optimizing isqrt

Christian Gollwitzer
In reply to this post by Christian Gollwitzer
On 01.11.14 09:33, Chris Angelico wrote:

> Most notably, the inability to represent every integer beyond 2**53,
> and the inability to represent any integer beyond 2**1024. This
> algorithm should work fine with any positive integer.

OK, so you did not bother to look at my proposed alternative
implementation. If I understood Steven correctly, he wanted to speed up
the original isqrt algorithm by using FP when this is possible. I have
shown how to do it for n < 2**1022 (maybe 2**1024, I'm too lazy to check
it). I admit that there is some micro-optimization left, e.g. the first
division is done twice, and the comparison should be bits > 1022, not
n > 2**1022, etc.

        Christian




Challenge: optimizing isqrt

Chris Angelico
On Sat, Nov 1, 2014 at 7:38 PM, Christian Gollwitzer <auriocus at gmx.de> wrote:

> OK, so you did not bother to look at my proposed alternative
> implementation. If I understood Steven correctly, he wanted to speed up
> the original isqrt algorithm by using FP when this is possible. I have
> shown how to do it for n < 2**1022 (maybe 2**1024, I'm too lazy to check
> it).
> [snip]

I did look at it. Trouble is, I don't know floating point's details
well enough to prove that there are no *other* limitations.

FWIW, I've proven the algorithm as far as 2**38. That's still a long
way short of 2**53, though, and getting as far as 2**39 would, with my
brute-force checker, require between 2.5 and 5 hours. I've made some
improvements over the original, but it's still slow.

ChrisA



Challenge: optimizing isqrt

Akira Li
In reply to this post by Steven D'Aprano-11
Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:

> There is an algorithm for calculating the integer square root of any
> positive integer using only integer operations:
> [snip]
>
> So, the challenge is to identify when it is safe to optimise isqrt(n) as
> int(math.sqrt(n)):
> [snip]

Where do you want to use your optimized isqrt(n)?

There could be specialized algorithms that work only in narrow, specific
circumstances, e.g., the inverse square root (1/sqrt(x)) implementation
from Quake III Arena with the 0x5f3759df constant in it (only of
historical interest now).

If you want to work with very large (thousands or millions of digits)
integers, then the GMP library might be faster than the default Python
integer implementation.
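
For instance, via the gmpy2 bindings (assuming they are installed):

>>> import gmpy2
>>> gmpy2.isqrt(2**3000) == 2**1500
True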


--
Akira




Challenge: optimizing isqrt

Serhiy Storchaka-2
In reply to this post by Steven D'Aprano-11
On 01.11.14 03:29, Steven D'Aprano wrote:
> There is an algorithm for calculating the integer square root of any
> positive integer using only integer operations:

Here is my optimized implementation inspired by Christian's ideas.

import math, sys

C1 = sys.float_info.radix ** sys.float_info.mant_dig  # 2**53
C2 = int(sys.float_info.max)   # largest finite double, as an int
C3 = C2.bit_length() - 2       # 1022

def isqrt(n):
    if n < 0:
        raise ValueError
    if n == 0:
        return 0
    if n <= C1:
        # n fits exactly in a double; trust math.sqrt directly
        return int(math.sqrt(n))
    if n <= C2:
        # math.sqrt still works here, but only as a seed for Newton
        x = int(math.sqrt(n))
    else:
        # n overflows a double: strip off an even power of two, take
        # the FP root of the reduced value, then shift the seed back up
        b = (n.bit_length() - C3) // 2
        x = int(math.sqrt(n >> (2 * b))) << b
    y = (x + n // x) // 2
    if y == x:
        return x
    while True:
        x = y
        y = (x + n // x) // 2
        if y >= x:
            return x
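
A quick way to cross-check this against the pure-integer version from the
start of the thread (a test sketch, not part of the original message):

import random

def isqrt_ref(n):
    # the integer-only algorithm from the original post
    if n < 0: raise ValueError
    if n == 0: return 0
    a, b = divmod(n.bit_length(), 2)
    x = 2**(a + b)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y

for _ in range(100000):
    m = random.randrange(1, 10**60)
    assert isqrt(m) == isqrt_ref(m), m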





Challenge: optimizing isqrt

Stephen Tucker
Serhiy,

This looks very good indeed. As a matter of interest, is there any
particular reason you have used 2*b instead of b+b? Might b+b be faster
than 2*b?

Also, in various lines, you use //2. Would >>1 be quicker? On reflection,
perhaps you have had to use //2 because >>1 cannot be used in those
situations.

Stephen Tucker.




On Thu, Nov 20, 2014 at 6:00 PM, Serhiy Storchaka <storchaka at gmail.com>
wrote:

> On 01.11.14 03:29, Steven D'Aprano wrote:
>> There is an algorithm for calculating the integer square root of any
>> positive integer using only integer operations:
>
> Here is my optimized implementation inspired by Christian's ideas.
> [snip]


Challenge: optimizing isqrt

Dave Angel-4
On 11/21/2014 03:15 AM, Stephen Tucker wrote:

> On Thu, Nov 20, 2014 at 6:00 PM, Serhiy Storchaka wrote:
>> Here is my optimized implementation inspired by Christian's ideas.
>> [snip]
>
> Serhiy,
>
> This looks very good indeed. As a matter of interest, is there any
> particular reason you have used 2*b instead of b+b? Might b+b be faster
> than 2*b?
>
> Also, in various lines, you use //2. Would >>1 be quicker? On reflection,
> perhaps you have had to use //2 because >>1 cannot be used in those
> situations.
>
> Stephen Tucker.

(Please don't top-post here.  I moved your remarks after the code you're
referencing, so that others may see it in the preferred order)

It's been probably two decades since I've been in a programming
situation where that kind of difference mattered.  In Python in
particular, the actual arithmetic or bit shifting is probably only one
part in a hundred of the total work, and on modern processors all these
operations tend to be very close in timing.

You're welcome to use the timeit module to try to measure it.  But I
doubt it makes a difference of more than one part in a thousand.  And in
some situations I can imagine b+b being slower than 2*b (for example, if
b were a global).
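
For instance, a minimal timeit comparison (the numbers are machine- and
version-dependent):

import timeit
for expr in ("2 * b", "b + b", "b << 1"):
    print(expr, timeit.timeit(expr, setup="b = 12345", number=10**7))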


--
DaveA



Challenge: optimizing isqrt

Serhiy Storchaka-2
In reply to this post by Stephen Tucker
On Friday, 21 Nov 2014 08:15:57, you wrote:
> This looks very good indeed. As a matter of interest, is there any
> particular reason you have used 2*b instead of b+b? Might b+b be faster
> than b*2?

Yes, it is slightly faster, but the effect is indiscernible in the total
time. In any case, there is no harm in using b+b.

> Also, in various lines, you use //2. Would >>1 be quicker? On reflection,
> perhaps you have had to use //2 because >>1 cannot be used in those
> situations.

I thought this effect would be insignificant too. But actually it is
measurable (about 10% for some inputs). Thanks, this optimization is
worth applying.




Challenge: optimizing isqrt

Dave Angel-4
On 11/25/2014 02:31 PM, Serhiy Storchaka wrote:

> On Friday, 21 Nov 2014 08:15:57, you wrote:
>> Also, in various lines, you use //2. Would >>1 be quicker? On
>> reflection, perhaps you have had to use //2 because >>1 cannot be used
>> in those situations.
>
> I thought this effect would be insignificant too. But actually it is
> measurable (about 10% for some inputs). Thanks, this optimization is
> worth applying.

Unfortunately, for many values, the version of the function with >>1 is
slower.  It's only when the argument is bigger than 10**40 that it's as
much as 1% faster.  But it's true that for really large values, it's
quicker.

A quick test shows that Python 3.3 is about 20% faster than 2.7 for both
of these functions.

My oversight earlier was assuming a native type.  Once you get into
"longs", which aren't supported directly by the processor, the shift will
likely become much faster than the divide.
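
Something like this shows the crossover (a sketch; the exact numbers are
machine-dependent):

import timeit
for label, setup in (("native-sized", "x = 12345"), ("400 digits", "x = 10**400")):
    for expr in ("x // 2", "x >> 1"):
        print(label, expr, timeit.timeit(expr, setup=setup, number=10**6))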

--
DaveA



Challenge: optimizing isqrt

Stephen Tucker
Another _possible_ performance improvement that is staring us in the face
is that 2*b could be replaced with b<<1. Given that b+b (an earlier
suggestion of mine) involves two table look-ups for b, whereas b<<1 only
involves one, it seems that the scope here for improvement is significant.

By the way, I hope this post is not "top-posted" as my previous one was.
Apologies for that - I am new to this kind of thing.



On Wed, Nov 26, 2014 at 12:46 AM, Dave Angel <davea at davea.name> wrote:

> Unfortunately, for many values, the version of the function with >>1 is
> slower.  It's only when the argument is bigger than 10**40 that it's as
> much as 1% faster.  But it's true that for really large values, it's
> quicker.
> [snip]


Challenge: optimizing isqrt

Steven D'Aprano-11
In reply to this post by Dave Angel-4
Stephen Tucker wrote:

> Another _possible_ performance improvement that is staring us in the face
> is that 2*b could be replaced with b<<1. Given that b+b (an earlier
> suggestion of mine) involves two table look-ups for b, whereas b<<1 only
> involves one, it seems that the scope here for improvement is significant.


As the Original Poster of this thread, I want to thank everyone who has
contributed and assure them that despite my silence so far, I do appreciate
your responses and I'm collecting the suggestions for later.


> By the way, I hope this post is not "top-posted" as my previous one was.
> Apologies for that - I am new to this kind of thing.

Alas, you have top posted. Has anyone explained what top posting is?

Reading from the top down, as is standard in English, an email conversation
will often take the following form. I'll use "=" to indicate quoted text
instead of the more common ">".

    = quoted comment
    = more text

    response

    = a question

    answer

    = a second question

    a second answer


Note that it reads like a conversation, "you say, I respond, you say, I
respond...". This is often called "interleaved posting" or "inline
posting", because the replies are interleaved between the questions. Also,
the quoted text is usually trimmed or "snipped" to remove parts which are
no longer relevant. The idea is to leave enough quoted content to establish
context, but without removing so much as to give a misleading impression,
or leaving so much in that the reader is overwhelmed by the volume of
quoted text.

To old-school email and Usenet users, inline posting is usually considered
far superior, *especially* for long and technical discussions like we often
find here.

Top posting usually ends up looking like this:


    response completely lacking any context

    = quoted comment
    = more text
    = text which is no longer relevant
    = a question
    = more text which is not relevant
    = and even more of it
    = a second question
    = and more text which is not being
    = commented on, so why is it left in?


Notice that it is all too easy to fail to answer the questions when top
posting. (To be fair, that's more a product of the person responding being
lazy or careless, but top posting does encourage lazy and careless
replies.)


Bottom posting often looks like this:

    = quoted comment
    = more text
    = text which is no longer relevant
    = a question
    = more text which is not relevant
    = and even more of it
    = a second question
    = and more text which is not being
    = commented on, so why is it left in?

    response lacking context


Note that the reader may end up having to scroll past many paragraphs (or
even pages!) of quoted text to discover the response. Which, all too often,
is a single sentence.


Thanks for contributing to the discussion.



--
Steven




Challenge: optimizing isqrt

Serhiy Storchaka-2
In reply to this post by Dave Angel-4
On 26.11.14 02:46, Dave Angel wrote:
> Unfortunately, for many values, the version of the function with >>1 is
> slower.  It's only when the argument is bigger than 10**40 that it's as
> much as 1% faster.  But it's true that for really large values, it's
> quicker.

Note that this path is used only for values above 2**53 (about 10**16).
For all tested values from 10**16 to 10**40, the difference between the
two versions is not larger than 1% (in one direction or the other). But
the difference between different runs of the benchmark for the same
version varies by about 1%. So the difference between the versions in
this range doesn't matter.