Bug in timsort!?

classic Classic list List threaded Threaded
44 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Roy Smith
Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Zachary Ware-2
On Tue, Feb 24, 2015 at 3:34 PM, Roy Smith <roy at panix.com> wrote:
> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

http://bugs.python.org/issue23515

Note that the article does mention that Python is not vulnerable due
to this bug (and best I can tell has no behavioral issues); no
computer currently in existence can create a large enough array to
cause a problem.

--
Zach


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Mark Lawrence
In reply to this post by Roy Smith
On 24/02/2015 21:34, Roy Smith wrote:
> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
>

As you can clearly no longer rely on Python it looks like, after a long
and loving relationship, I'm forced into moving on.  PHP, Applescript,
Javascript or back to C or even CORAL 66, what is your (plural)
recommendation(s)?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

sohcahtoa82@gmail.com
In reply to this post by Roy Smith
On Tuesday, February 24, 2015 at 1:50:15 PM UTC-8, Mark Lawrence wrote:

> On 24/02/2015 21:34, Roy Smith wrote:
> > http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
> >
>
> As you can clearly no longer rely on Python it looks like, after a long
> and loving relationship, I'm forced into moving on.  PHP, Applescript,
> Javascript or back to C or even CORAL 66, what is your (plural)
> recommendation(s)?
>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence

Personally, I'm going with BASIC being interpreted by a JavaScript script being fed from a PHP script running on an out-dated version of Apache.

It includes an in-browser IDE that will be written as an ActiveX control and will require Internet Explorer 6.  An Adobe Flash version is in the works to allow it to run on modern browsers.


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Grant Edwards-7
In reply to this post by Roy Smith
On 2015-02-24, Roy Smith <roy at panix.com> wrote:

> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

I don't get it.

    3.2 Corrected Python merge_collapse function
   
    merge_collapse(MergeState *ms)
    {
        struct s_slice *p = ms->pending;
   
        assert(ms);
        while (ms->n > 1) {
            Py_ssize_t n = ms->n - 2;
            if (     n > 0   && p[n-1].len <= p[n].len + p[n+1].len
                || (n-1 > 0 &&  p[n-2].len <= p[n].len + p[n-1].len)) {
                if (p[n-1].len < p[n+1].len)
                    --n;
                if (merge_at(ms, n) < 0)
                    return -1;
            }
            else if (p[n].len <= p[n+1].len) {
                     if (merge_at(ms, n) < 0)
                            return -1;
            }
            else
                break;
        }
        return 0;
    }    

Or does "Python function" mean something else in this context?



   
--
Grant Edwards               grant.b.edwards        Yow! I request a weekend in
                                  at               Havana with Phil Silvers!
                              gmail.com            


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Ian Kelly-2
On Tue, Feb 24, 2015 at 3:45 PM, Grant Edwards <invalid at invalid.invalid> wrote:

> On 2015-02-24, Roy Smith <roy at panix.com> wrote:
>
>> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
>
> I don't get it.
>
>     3.2 Corrected Python merge_collapse function
>
>     merge_collapse(MergeState *ms)
>     {
>         struct s_slice *p = ms->pending;
>
>         assert(ms);
>         while (ms->n > 1) {
>             Py_ssize_t n = ms->n - 2;
>             if (     n > 0   && p[n-1].len <= p[n].len + p[n+1].len
>                 || (n-1 > 0 &&  p[n-2].len <= p[n].len + p[n-1].len)) {
>                 if (p[n-1].len < p[n+1].len)
>                     --n;
>                 if (merge_at(ms, n) < 0)
>                     return -1;
>             }
>             else if (p[n].len <= p[n+1].len) {
>                      if (merge_at(ms, n) < 0)
>                             return -1;
>             }
>             else
>                 break;
>         }
>         return 0;
>     }
>
> Or does "Python function" mean something else in this context?

Recall that CPython is implemented in C. ;-)


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Chris Angelico
In reply to this post by Roy Smith
On Wed, Feb 25, 2015 at 8:34 AM, Roy Smith <roy at panix.com> wrote:
> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

The post links to:

http://svn.python.org/projects/python/trunk/Objects/listobject.c

Is that still valid and current? It says svn, but does it pop through
to the Mercurial repo where the _real_ CPython code is stored?

ChrisA


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

MRAB-2
In reply to this post by Zachary Ware-2
On 2015-02-24 21:40, Zachary Ware wrote:

> On Tue, Feb 24, 2015 at 3:34 PM, Roy Smith <roy at panix.com> wrote:
>> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
>
> http://bugs.python.org/issue23515
>
> Note that the article does mention that Python is not vulnerable due
> to this bug (and best I can tell has no behavioral issues); no
> computer currently in existence can create a large enough array to
> cause a problem.
>
I think the key word here is "currently".



Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Skip Montanaro
On Tue, Feb 24, 2015 at 5:38 PM, MRAB <python at mrabarnett.plus.com> wrote:
> I think the key word here is "currently".

Sure, but it's not like you have to put out security updates tomorrow.
Even if/when we get to the point where machines can hold an array of
2**49 elements, I suspect people won't be using straight Python to
wrangle them. They will likely use other languages, and if using
Python, will be using something to accelerate things (PyPy, numpy,
etc). Does, for example, numpy's sort() function suffer from the same
flaw? Does PyPy use Timsort?

Skip


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Chris Angelico
On Wed, Feb 25, 2015 at 10:50 AM, Skip Montanaro
<skip.montanaro at gmail.com> wrote:
> Even if/when we get to the point where machines can hold an array of
> 2**49 elements, I suspect people won't be using straight Python to
> wrangle them.

Looking just at CPython, what is the absolute minimum storage space
for a massive list like that? My guess is that a 64-bit CPython might
get as good as eight bytes per element; playing around with smaller
figures (up to circa a million elements) shows it ranging from 8 to 9
bytes per element, bottoming out at just a smidge over 8, presumably
at the moments when there's no slack space (there's a fixed overhead,
which gets diminishingly significant). So an array of 2**49 elements
would take 2**52 bytes (4 petabytes) of storage - addressable, to be
sure, but an awful lot to be sorting.

Would it be sufficient to stick a comment into the source saying "This
may have problems with lists in excess of 2**49 elements", and leave
it until that's actually likely to happen?

ChrisA


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Mark Lawrence
In reply to this post by sohcahtoa82@gmail.com
On 24/02/2015 22:36, sohcahtoa82 at gmail.com wrote:

> On Tuesday, February 24, 2015 at 1:50:15 PM UTC-8, Mark Lawrence wrote:
>> On 24/02/2015 21:34, Roy Smith wrote:
>>> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
>>>
>>
>> As you can clearly no longer rely on Python it looks like, after a long
>> and loving relationship, I'm forced into moving on.  PHP, Applescript,
>> Javascript or back to C or even CORAL 66, what is your (plural)
>> recommendation(s)?
>>
>> --
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>>
>> Mark Lawrence
>
> Personally, I'm going with BASIC being interpreted by a JavaScript script being fed from a PHP script running on an out-dated version of Apache.
>
> It includes an in-browser IDE that will be written as an ActiveX control and will require Internet Explorer 6.  An Adobe Flash version is in the works to allow it to run on modern browsers.
>

Thanks for your help, much appreciated :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Chris Kaynor
In reply to this post by Chris Angelico
On Tue, Feb 24, 2015 at 4:07 PM, Chris Angelico <rosuav at gmail.com> wrote:

> On Wed, Feb 25, 2015 at 10:50 AM, Skip Montanaro
> <skip.montanaro at gmail.com> wrote:
> > Even if/when we get to the point where machines can hold an array of
> > 2**49 elements, I suspect people won't be using straight Python to
> > wrangle them.
>
> Looking just at CPython, what is the absolute minimum storage space
> for a massive list like that? My guess is that a 64-bit CPython might
> get as good as eight bytes per element; playing around with smaller
> figures (up to circa a million elements) shows it ranging from 8 to 9
> bytes per element, bottoming out at just a smidge over 8, presumably
> at the moments when there's no slack space (there's a fixed overhead,
> which gets diminishingly significant). So an array of 2**49 elements
> would take 2**52 bytes (4 petabytes) of storage - addressable, to be
> sure, but an awful lot to be sorting.
>
> Would it be sufficient to stick a comment into the source saying "This
> may have problems with lists in excess of 2**49 elements", and leave
> it until that's actually likely to happen?
>

I would agree that it will be quite a while as it stands before the bug
appears, so documenting it MAY be appropriate, as it is likely to be far
enough in the future that a number of unforeseen things may block the bug
from being an issue (TimSort may be replaced by something better, CPython
could die, etc).

That said, from what I understand, their proposed fix would be reasonably
easy to use, and have minimal cost, so it may be worthwhile to use just
because the issue is likely to be in use for quite a while before somebody
remembers the issue exists. Once somebody does hit it, it may take a while
to realize that TimSort is at fault, as it will only occur on some data
sets (my understanding is that the elements must be in certain
configurations with at-least the minimum number), so it will likely appear
as just a random crash occurring.

If it were fully up to me, I would do one of the following, in order of
preference:
1) Implement and test the fix provided on the blog. This requires the most
work due to licensing, reviewing, and testing.
2) Add a noisy warning/error to the code when sorting lists large enough to
potentially hit the error (probably 2**48 to leave some wiggle room). This
is much easier to do, but merely pushes the can down the road to be dealt
with later. It does still make it obvious what is going long when it
finally breaks.
3) Add a comment explaining the issue (likely linking back to the blog).
While by far the easiest "solution", again, this merely pushes the can down
the road, and it may be very non-obvious what went wrong when it breaks,
despite the comment.

Chris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20150224/4747e752/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Dave Angel-4
In reply to this post by Chris Angelico
On 02/24/2015 07:07 PM, Chris Angelico wrote:

> On Wed, Feb 25, 2015 at 10:50 AM, Skip Montanaro
> <skip.montanaro at gmail.com> wrote:
>> Even if/when we get to the point where machines can hold an array of
>> 2**49 elements, I suspect people won't be using straight Python to
>> wrangle them.
>
> Looking just at CPython, what is the absolute minimum storage space
> for a massive list like that? My guess is that a 64-bit CPython might
> get as good as eight bytes per element; playing around with smaller
> figures (up to circa a million elements) shows it ranging from 8 to 9
> bytes per element, bottoming out at just a smidge over 8, presumably
> at the moments when there's no slack space (there's a fixed overhead,
> which gets diminishingly significant). So an array of 2**49 elements
> would take 2**52 bytes (4 petabytes) of storage - addressable, to be
> sure, but an awful lot to be sorting.
>
> Would it be sufficient to stick a comment into the source saying "This
> may have problems with lists in excess of 2**49 elements", and leave
> it until that's actually likely to happen?
>
> ChrisA
>

One could do much better than that.  Put in a test for the length of the
list, and if it's too large for the current implementation, throw an
exception.  Much better than a comment.

--
DaveA


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Terry Reedy
In reply to this post by Roy Smith
On 2/24/2015 4:34 PM, Roy Smith wrote:
> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

Tim Peters is aware of this and opened http://bugs.python.org/issue23515


--
Terry Jan Reedy



Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Ned Deily
In reply to this post by Chris Angelico
In article
<CAPTjJmpQw-23mzgYHFjUC+y-C=vVm1dKePSpjG9+J3H8tHO7dA at mail.gmail.com>,
 Chris Angelico <rosuav at gmail.com> wrote:
> The post links to:
>
> http://svn.python.org/projects/python/trunk/Objects/listobject.c
>
> Is that still valid and current? It says svn, but does it pop through
> to the Mercurial repo where the _real_ CPython code is stored?

In general, no.  The python svn repo linked to above is frozen in time:
for most branches, as of the conversion to hg in 2011.  There was at
least one discussion thread soon thereafter about what to do about it:

http://comments.gmane.org/gmane.comp.python.devel/125252

It is still true that at least one repo in the svn.python.org is
actively maintained (e.g. "external").  I *think* all the others are now
obsolete.

http://svn.python.org/view/

--
 Ned Deily,
 nad at acm.org



Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Paddy
In reply to this post by Skip Montanaro
On Wednesday, 25 February 2015 00:08:32 UTC, Chris Angelico  wrote:
> On Wed, Feb 25, 2015 at 10:50 AM, Skip Montanaro
> <skip.mon-- at gmail.com> wrote:
> > Even if/when we get to the point where machines can hold an array of
> > 2**49 elements, I suspect people won't be using straight Python to
> > wrangle them.
>
<<snip>>
>
> Would it be sufficient to stick a comment into the source saying "This
> may have problems with lists in excess of 2**49 elements", and leave
> it until that's actually likely to happen?
>
> ChrisA

If we are given a proven fix with little downside then if we are to touch the source then we should MAKE THE FIX. To do otherwise would send the wrong signal to those that depend on the perceived integrity of the C-Python developers.


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Chris Angelico
On Wed, Feb 25, 2015 at 8:10 PM, Paddy <paddy3118 at gmail.com> wrote:
> If we are given a proven fix *with little downside* then if we are to touch the source then we should MAKE THE FIX.

(emphasis mine)

That's really the question, though. I'm in no position to evaluate the
patch and ascertain whether the fix does indeed have little or no
downside. But as the tracker issue proves, a greater mind than mine
has decided that it's to be applied, which means that, yes, it's worth
doing.

ChrisA


Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Robert Kern-2
In reply to this post by Grant Edwards-7
On 2015-02-24 22:45, Grant Edwards wrote:

> On 2015-02-24, Roy Smith <roy at panix.com> wrote:
>
>> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
>
> I don't get it.
>
>      3.2 Corrected Python merge_collapse function
>
>      merge_collapse(MergeState *ms)
>      {
>          struct s_slice *p = ms->pending;
>
>          assert(ms);
>          while (ms->n > 1) {
>              Py_ssize_t n = ms->n - 2;
>              if (     n > 0   && p[n-1].len <= p[n].len + p[n+1].len
>                  || (n-1 > 0 &&  p[n-2].len <= p[n].len + p[n-1].len)) {
>                  if (p[n-1].len < p[n+1].len)
>                      --n;
>                  if (merge_at(ms, n) < 0)
>                      return -1;
>              }
>              else if (p[n].len <= p[n+1].len) {
>                       if (merge_at(ms, n) < 0)
>                              return -1;
>              }
>              else
>                  break;
>          }
>          return 0;
>      }
>
> Or does "Python function" mean something else in this context?

"Corrected merge_collapse function [from the Python implementation of TimSort]"
as opposed to the Java implementation which was also discussed.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco



Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Sturla Molden
In reply to this post by Roy Smith
On 24/02/15 22:34, Roy Smith wrote:
> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
>

This is awful. It is broken for arrays longer than 2**49 elements. With
8 bytes per PyObject* pointer this is >4096 terabytes of RAM. I don't
see how we can fix this in time.

Oh yes, and they mention that TimSort is used on billions of devices due
to Android mobile phones. This is clearly very relevant for mobile
phones. Next thing you know your litte Samsung Galaxy with more than
4096 terabytes breaks down from a stack overflow in TimSort.


Sturla




Reply | Threaded
Open this post in threaded view
|

Bug in timsort!?

Jonas Wielicki
On 25.02.2015 14:58, Sturla Molden wrote:

> On 24/02/15 22:34, Roy Smith wrote:
>> http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
>>
>>
> [?]
>
> Oh yes, and they mention that TimSort is used on billions of devices due
> to Android mobile phones. This is clearly very relevant for mobile
> phones. Next thing you know your litte Samsung Galaxy with more than
> 4096 terabytes breaks down from a stack overflow in TimSort.

The Java version of the bug is reproducible with just 67108864 elements,
if I read the article correctly.

jwi

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://mail.python.org/pipermail/python-list/attachments/20150225/5568af42/attachment.sig>

123