# Challenge: optimizing isqrt

## Challenge: optimizing isqrt

There is an algorithm for calculating the integer square root of any positive integer using only integer operations:

```python
def isqrt(n):
    if n < 0: raise ValueError
    if n == 0:
        return 0
    bits = n.bit_length()
    a, b = divmod(bits, 2)
    x = 2**(a+b)
    while True:
        y = (x + n//x)//2
        if y >= x:
            return x
        x = y
```

This returns the integer part of the square root of n, that is, the greatest whole number less than or equal to the square root of n:

    py> isqrt(35)
    5
    py> isqrt(36)
    6

That makes it equivalent to int(math.sqrt(n)), which also happens to be much, much faster, at least for small values of n. However, for large values of n, using floating point intermediate calculations fails:

    py> import math
    py> int(math.sqrt(2**3000))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: long int too large to convert to float

Another problem is that, above a certain size, the resolution of floats is larger than 1, so you can't convert every int into a float without loss:

    py> float(2**90-1) == 2**90-1
    False

which means that using math.sqrt is not correct:

    py> isqrt(2**90-1)
    35184372088831
    py> int(math.sqrt(2**90-1))  # Off by one.
    35184372088832

So, the challenge is to identify when it is safe to optimise isqrt(n) as int(math.sqrt(n)):

Q1: What is the largest value of M, such that

    all(isqrt(n) == int(math.sqrt(n)) for n in range(M))

returns True?

I have done a brute force test, and in nine hours confirmed that M is at least 7627926244. I expect that a brute force test of every int representable as a float would take *months* of processing time.

Q2: For values above M, is there a way of identifying which values of n are okay to use the optimized version?

Q3: What is the largest value of n beyond which you can never use the float optimization?

You can assume that Python floats are IEEE-754 C doubles, and that math.sqrt() is correctly rounded.

-- Steven
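Whatever math.sqrt returns for a given n, the candidate answer can be verified with two exact integer multiplications, which gives a cheap oracle for Q2. A small sketch (the helper name float_isqrt_ok is my own, and it assumes n is small enough to convert to a float at all):

```python
import math

def float_isqrt_ok(n):
    """Report whether int(math.sqrt(n)) equals the true integer square root.

    r is the correct floor of sqrt(n) if and only if r*r <= n < (r+1)*(r+1),
    and both comparisons use exact integer arithmetic.
    """
    r = int(math.sqrt(n))
    return r * r <= n < (r + 1) * (r + 1)
```

For example, float_isqrt_ok(2**90 - 1) is False, matching the off-by-one shown above.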

## Challenge: optimizing isqrt

On Sat, Nov 1, 2014 at 12:29 PM, Steven D'Aprano wrote:
> I have done a brute force test, and in nine hours confirmed that M is at
> least 7627926244. That took nine hours, and I expect that a brute force
> test of every int representable as a float would take *months* of
> processing time.

Here's something that should be faster than pure brute-force:

```python
import math
for i in range(2**34):  # 2**34 should take about nine hours
    sqrt = int(math.sqrt(i))
    if sqrt * sqrt > i:
        print("Too high at %d" % i)
        break
    if (sqrt+1) * (sqrt+1) <= i:
        print("Too low at %d" % i)
        break
```

There are two ways int(math.sqrt(i)) can be wrong: it can return a number that's too high, or one that's too low. If it's too high, squaring it (btw, int*int seems to be faster than int**2) will produce a number higher than the original. If the result was too low, then the next number up would have been within the correct bounds. There's no need to actually calculate isqrt here.

I was able to probe up to 2**29 in seven minutes on my computer (allocating one core, CPython 3.5 from trunk). If it's linear, going as far as 2**34 should take no more than the nine hours you spent. I expect the time will be a bit worse than linear (working with small numbers is marginally faster than working with big numbers), but if it's at least reasonably close, probing as far as 2**53 would take 81555 days. So, uhh, a brute force by your method is going to take a lot more than months. (And 2**53 isn't technically where ints stop being representable, but I'd use that as the final cut-off, as it's where *all* ints stop being representable. In any case, there's a failure at 2**54-1, so there's no need to go past there.)

Hmm. The speed difference between your brute force and mine isn't all that much. I'm estimating maybe doing at best twice as much in the same time. So you're probably already using something similar to the above, and this isn't adding much to the discussion. :(

ChrisA

## Challenge: optimizing isqrt

In reply to this post by Steven D'Aprano-11

Hi Steven,

let me start by answering from reverse:

> Q3: What is the largest value of n beyond which you can never use the float
> optimization?

A3: There is no such value, besides the upper limit of floats (DBL_MAX ~ 10^308).

P3: If you feed a perfect square into the floating point square root algorithm, with a mantissa of the root of length smaller than the bit width of your float, it will always come out perfectly. I.e., computing sqrt(25) in FP math is no different from sqrt(25*2**200):

    >>> 25*2**200
    40173451106474756888549052308529065063055074844569820882534400L
    >>> x = int(math.sqrt(25*2**200))
    >>> x
    6338253001141147007483516026880L
    >>> x*x
    40173451106474756888549052308529065063055074844569820882534400L

Am 01.11.14 02:29, schrieb Steven D'Aprano:
> There is an algorithm for calculating the integer square root of any
> positive integer using only integer operations:
>
> def isqrt(n):
>     if n < 0: raise ValueError
>     if n == 0:
>         return 0
>     bits = n.bit_length()
>     a, b = divmod(bits, 2)
>     x = 2**(a+b)
>     while True:
>         y = (x + n//x)//2
>         if y >= x:
>             return x
>         x = y
>
> Q2: For values above M, is there a way of identifying which values of n are
> okay to use the optimized version?

A2: Do it in a different way.

Your above algorithm is obviously doing Heron or Newton-Raphson iterations, so the same as with floating point math. The first line before the while loop computes some approximation to sqrt(n). Instead of doing bit shuffling, you could compute this by FP math and get closer to the desired result, unless the integer is too large to be represented by FP. Now, the terminating condition seems to rely on the fact that the initial estimate x >= sqrt(n), but I don't really understand it. My guess is that if you do x = int(sqrt(n)), then do the first iteration, then swap x and y such that x > y, then enter the loop, you would simply start with a better estimate in case that the significant bits can be represented by the float.

So this is my try, but not thoroughly tested:

```python
def isqrt(n):
    if n < 0: raise ValueError
    if n == 0:
        return 0
    bits = n.bit_length()
    # the highest exponent in 64bit IEEE is 1023
    if n > 2**1022:
        a, b = divmod(bits, 2)
        x = 2**(a+b)
    else:
        x = int(math.sqrt(n))
        y = n//x
        if x < y:
            x, y = (y, x)
    while True:
        y = (x + n//x)//2
        if y >= x:
            return x
        x = y
```

Christian
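As a quick sanity check of Christian's hybrid idea, one can compare it against Steven's pure-integer original over a range of small values and around a large perfect square. A sketch only, not an exhaustive proof (Steven's original and Christian's variant with the swap spelled out; the loop bounds are arbitrary):

```python
import math

def isqrt_int(n):
    # Steven's pure-integer original, used here as the reference.
    if n < 0: raise ValueError
    if n == 0:
        return 0
    bits = n.bit_length()
    a, b = divmod(bits, 2)
    x = 2**(a+b)
    while True:
        y = (x + n//x)//2
        if y >= x:
            return x
        x = y

def isqrt_hybrid(n):
    # Christian's variant: use a floating-point seed when n fits in a float.
    if n < 0: raise ValueError
    if n == 0:
        return 0
    bits = n.bit_length()
    if n > 2**1022:
        a, b = divmod(bits, 2)
        x = 2**(a+b)
    else:
        x = int(math.sqrt(n))
        y = n//x
        if x < y:
            x, y = y, x   # ensure the seed is not below the true root
    while True:
        y = (x + n//x)//2
        if y >= x:
            return x
        x = y

# Agreement on a small range, and around a large perfect square.
assert all(isqrt_hybrid(n) == isqrt_int(n) for n in range(1, 10000))
s = 10**100 + 12345
assert isqrt_hybrid(s*s) == s and isqrt_hybrid(s*s - 1) == s - 1
```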

## Challenge: optimizing isqrt

Addendum: If my method below works, you can also use it to speed up computations for n > 2**1022, by splitting off an even power of two from the integer and computing the FP sqrt of the mantissa for the seed, i.e. doing the FP manually.

Am 01.11.14 09:02, schrieb Christian Gollwitzer:
> Hi Steven,
>
> let me start by answering from reverse:
>
> So this is my try, but not thoroughly tested:
>
> def isqrt(n):
>     if n < 0: raise ValueError
>     if n == 0:
>         return 0
>     bits = n.bit_length()
>     # the highest exponent in 64bit IEEE is 1023
>     if n > 2**1022:
>         a, b = divmod(bits, 2)
>         x = 2**(a+b)
>     else:
>         x = int(math.sqrt(n))
>         y = n//x
>         if x < y:
>             x, y = (y, x)
>
>     while True:
>         y = (x + n//x)//2
>         if y >= x:
>             return x
>         x = y
>
> Christian

## Challenge: optimizing isqrt

In reply to this post by Christian Gollwitzer

On Sat, Nov 1, 2014 at 7:02 PM, Christian Gollwitzer wrote:
> Your above algorithm is obviously doing Heron or Newton-Raphson iterations,
> so the same as with floating point math. The first line before the while
> loop computes some approximation to sqrt(n). Instead of doing bit shuffling,
> you could compute this by FP math and get closer to the desired result,
> unless the integer is too large to be represented by FP. Now, the
> terminating condition seems to rely on the fact that the initial estimate
> x >= sqrt(n), but I don't really understand it.

Part of the point of that algorithm is that it never uses FP, and is therefore not limited by FP restrictions.

As to the assumption that x >= sqrt(n), that would be safe: if the bit length is even (that is, n is between an odd power of 2 and an even one, for example 2**13 < 16000 <= 2**14), the initial estimate is the exact square root of the power of two at the top of that range (a bit length of 14 means x is 2**7, and 128 == sqrt(16384)); if the bit length is odd (e.g. 2**8 < 300 <= 2**9), the initial estimate rounds the halving upward (a bit length of 9 means x is 2**(9//2+1) = 2**5, and 32 > sqrt(512)). So there's a guarantee that the initial estimate is no lower than the square root of the target number.

ChrisA
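The claim about the seed can be checked mechanically: the initial estimate x = 2**((bits+1)//2), i.e. 2**ceil(bits/2), always satisfies x*x >= n, so the `if y >= x: return x` exit can never fire before the estimate has come down to the true root. A small verification sketch:

```python
def seed(n):
    # The algorithm's initial estimate: 2**ceil(bit_length/2).
    bits = n.bit_length()
    a, b = divmod(bits, 2)
    return 2 ** (a + b)

# The two worked examples from the post above.
assert seed(16000) == 128   # even bit length (14): exact root of 2**14
assert seed(300) == 32      # odd bit length (9): rounds the halving up

# The seed is an upper bound on the square root for every n in this range.
for n in range(1, 1 << 16):
    x = seed(n)
    assert x * x >= n
```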

## Challenge: optimizing isqrt

In reply to this post by Christian Gollwitzer

Am 01.11.14 09:13, schrieb Chris Angelico:
> Part of the point of that algorithm is that it never uses FP, and is
> therefore not limited by FP restrictions.

which are???

## Challenge: optimizing isqrt

On Sat, Nov 1, 2014 at 7:25 PM, Christian Gollwitzer wrote:
>> Part of the point of that algorithm is that it never uses FP, and is
>> therefore not limited by FP restrictions.
>
> which are???

Most notably, the inability to represent every integer beyond 2**53, and the inability to represent any integer beyond 2**1024. This algorithm should work fine with any positive integer.

ChrisA
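Both restrictions are easy to demonstrate; a short sketch of each:

```python
# Below 2**53 every integer is exactly representable as an IEEE-754 double;
# 2**53 + 1 is the first integer that is not.
assert float(2**53) == 2**53
assert float(2**53 + 1) == float(2**53)   # rounds to the nearest double

# Beyond DBL_MAX (just under 2**1024), conversion fails outright.
try:
    float(2**1024)
except OverflowError:
    pass  # expected: int too large to convert to float
```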

## Challenge: optimizing isqrt

In reply to this post by Christian Gollwitzer

Am 01.11.14 09:33, schrieb Chris Angelico:
> Most notably, the inability to represent every integer beyond 2**53,
> and the inability to represent any integer beyond 2**1024. This
> algorithm should work fine with any positive integer.

OK, so you did not bother to look at my proposed alternative implementation. If I understood Steven correctly, he wanted to speed up the original isqrt algorithm by using FP when this is possible. I have shown how to do it for n < 2**1022 (maybe 2**1024, I'm too lazy to check it). I admit that there is some micro-optimization left, e.g. the first division is done twice, and the comparison should be bits > 1022, not n > 2**1022, etc.

Christian

## Challenge: optimizing isqrt

On Sat, Nov 1, 2014 at 7:38 PM, Christian Gollwitzer wrote:
> OK, so you did not bother to look at my proposed alternative implementation.
> If I understood Steven correctly, he wanted to speed up the original isqrt
> algorithm by using FP when this is possible. I have shown how to do it for
> n < 2**1022 (maybe 2**1024, I'm too lazy to check it).

I did look at it. Trouble is, I don't know floating point's details well enough to prove that there are no *other* limitations.

FWIW, I've proven the algorithm as far as 2**38. That's still a long way short of 2**53, though, and getting as far as 2**39 would, with my brute-force checker, require between 2.5 and 5 hours. I've made some improvements over the original, but it's still slow.

ChrisA

## Challenge: optimizing isqrt

In reply to this post by Steven D'Aprano-11

Steven D'Aprano writes:
> So, the challenge is to identify when it is safe to optimise isqrt(n) as
> int(math.sqrt(n)):
>
> Q2: For values above M, is there a way of identifying which values of n are
> okay to use the optimized version?

Where do you want to use your optimized isqrt(n)? There could be specialized algorithms that work only in narrow specific circumstances, e.g. the inverse square root (1/sqrt(x)) implementation from Quake III Arena that has the 0x5f3759df constant in it (only of historical interest now). If you want to work with very large (thousands, millions of digits) integers then the gmp library might be faster than the default Python integer implementation.

-- Akira

## Challenge: optimizing isqrt

In reply to this post by Steven D'Aprano-11

On 01.11.14 03:29, Steven D'Aprano wrote:
> There is an algorithm for calculating the integer square root of any
> positive integer using only integer operations:

Here is my optimized implementation inspired by Christian's ideas.

```python
import math, sys

C1 = sys.float_info.radix ** sys.float_info.mant_dig
C2 = int(sys.float_info.max)
C3 = C2.bit_length() - 2

def isqrt(n):
    if n < 0:
        raise ValueError
    if n == 0:
        return 0
    if n <= C1:
        return int(math.sqrt(n))
    if n <= C2:
        x = int(math.sqrt(n))
    else:
        b = (n.bit_length() - C3) // 2
        x = int(math.sqrt(n >> (2 * b))) << b
    y = (x + n // x) // 2
    if y == x:
        return x
    while True:
        x = y
        y = (x + n // x) // 2
        if y >= x:
            return x
```
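A quick way to gain confidence in this version is to spot-check it around perfect squares of widely varying sizes, where off-by-one errors would show up. A sketch (Serhiy's function is re-pasted so the snippet stands alone; the chosen test values are arbitrary):

```python
import math, sys

C1 = sys.float_info.radix ** sys.float_info.mant_dig   # 2**53 on IEEE-754
C2 = int(sys.float_info.max)
C3 = C2.bit_length() - 2

def isqrt(n):
    # Serhiy's implementation, copied from above.
    if n < 0:
        raise ValueError
    if n == 0:
        return 0
    if n <= C1:
        return int(math.sqrt(n))
    if n <= C2:
        x = int(math.sqrt(n))
    else:
        b = (n.bit_length() - C3) // 2
        x = int(math.sqrt(n >> (2 * b))) << b
    y = (x + n // x) // 2
    if y == x:
        return x
    while True:
        x = y
        y = (x + n // x) // 2
        if y >= x:
            return x

# n = s*s - 1, s*s, s*s + 1 exercise the boundaries around each square.
for s in (7, 10**6, 2**100, 10**500):
    assert isqrt(s * s) == s
    assert isqrt(s * s - 1) == s - 1
    assert isqrt(s * s + 1) == s
```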

## Challenge: optimizing isqrt

Serhiy,

This looks very good indeed. As a matter of interest, is there any particular reason you have used 2*b instead of b+b? Might b+b be faster than b*2?

Also, in various lines, you use //2. Would >>1 be quicker? On reflection, perhaps you have had to use //2 because >>1 cannot be used in those situations.

Stephen Tucker.

On Thu, Nov 20, 2014 at 6:00 PM, Serhiy Storchaka wrote:
> On 01.11.14 03:29, Steven D'Aprano wrote:
>> There is an algorithm for calculating the integer square root of any
>> positive integer using only integer operations:
>
> Here is my optimized implementation inspired by Christian's ideas.

## Challenge: optimizing isqrt

On 11/21/2014 03:15 AM, Stephen Tucker wrote:
> On Thu, Nov 20, 2014 at 6:00 PM, Serhiy Storchaka wrote:
>> Here is my optimized implementation inspired by Christian's ideas.
>
> Serhiy,
>
> This looks very good indeed. As a matter of interest, is there any
> particular reason you have used 2*b instead of b+b? Might b+b be faster
> than b*2?
>
> Also, in various lines, you use //2. Would >>1 be quicker? On reflection,
> perhaps you have had to use //2 because >>1 cannot be used in those
> situations.
>
> Stephen Tucker.

(Please don't top-post here. I moved your remarks after the code you're referencing, so that others may see it in the preferred order.)

It's been probably two decades since I've been in a programming situation where that kind of difference mattered. In Python in particular, the actual arithmetic or bit shifting is probably only one part in a hundred, and in modern processors all these operations tend to be very close in timing.

You're welcome to use the timeit module to try to measure it. But I doubt it makes a difference of more than one part in a thousand. And in some situations I can imagine b+b to be slower than 2*b. (For example, if b were global.)

-- DaveA

## Challenge: optimizing isqrt

In reply to this post by Stephen Tucker

On 21-Nov-2014 08:15:57, Stephen Tucker wrote:
> This looks very good indeed. As a matter of interest, is there any
> particular reason you have used 2*b instead of b+b? Might b+b be faster
> than b*2?

Yes, it is slightly faster, but the effect is indiscernible in the total time. But there is no harm in using b+b.

> Also, in various lines, you use //2. Would >>1 be quicker? On reflection,
> perhaps you have had to use //2 because >>1 cannot be used in those
> situations.

I thought this effect would be insignificant too. But actually it is measurable (about 10% for some input). Thanks, this optimization is worth applying.

## Challenge: optimizing isqrt

On 11/25/2014 02:31 PM, Serhiy Storchaka wrote:
>> This looks very good indeed. As a matter of interest, is there any
>> particular reason you have used 2*b instead of b+b? Might b+b be faster
>> than b*2?
>
> Yes, it is slightly faster, but the effect is indiscernible in the total
> time. But there is no harm in using b+b.
>
>> Also, in various lines, you use //2. Would >>1 be quicker? On reflection,
>> perhaps you have had to use //2 because >>1 cannot be used in those
>> situations.
>
> I thought this effect would be insignificant too. But actually it is
> measurable (about 10% for some input). Thanks, this optimization is worth
> applying.

Unfortunately, for many values, the version of the function with >>1 is slower. It's only when the argument is bigger than 10**40 that it's as much as 1% faster. But it's true that for really large values, it's quicker.

A quick test shows that 3.3 is about 20% faster for both these functions than 2.7.

My oversight earlier was assuming a native type. Once you get into "longs", which aren't supported by the processor, the shift will likely become much faster than divide.

-- DaveA
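The measurements Serhiy and Dave describe can be reproduced with the timeit module. A sketch (absolute numbers will vary by machine and Python version; the point is the relative cost of //2 versus >>1 for small versus very large ints):

```python
import timeit

# Compare floor division by two against a right shift, first on a
# machine-word sized int, then on a 5000-digit int, where CPython's
# bignum division costs more than a shift.
for setup in ("n = 12345", "n = 10**5000"):
    t_div = timeit.timeit("n // 2", setup=setup, number=10000)
    t_shift = timeit.timeit("n >> 1", setup=setup, number=10000)
    print("%-14s  //2: %.5fs   >>1: %.5fs" % (setup, t_div, t_shift))

# The two spellings agree (Python's >> is an arithmetic shift, i.e. floor):
assert 10**5000 // 2 == 10**5000 >> 1
```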

## Challenge: optimizing isqrt

Another _possible_ performance improvement that is staring us in the face is that 2*b could be replaced with b<<1. Given that b+b (an earlier suggestion of mine) involves two table look-ups for b, whereas b<<1 only involves one, it seems that the scope here for improvement is significant.

By the way, I hope this post is not "top-posted" as my previous one was. Apologies for that - I am new to this kind of thing.

On Wed, Nov 26, 2014 at 12:46 AM, Dave Angel wrote:
> Unfortunately, for many values, the version of the function with >>1 is
> slower. It's only when the argument is bigger than 10**40 that it's as
> much as 1% faster. But it's true that for really large values, it's
> quicker.
>
> A quick test shows that 3.3 is about 20% faster for both these functions
> than 2.7.
>
> My oversight earlier was assuming a native type. Once you get into
> "longs", which aren't supported by the processor, the shift will likely
> become much faster than divide.
>
> -- DaveA