memoize & ordering of kwargs.items()

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

memoize & ordering of kwargs.items()

Jonathan Hartley
Hey,

I've been writing my own 'memoize' function a lot recently - I'm using
it as an interview question.

Here's what I've got (with a corresponding series of tests):

     def memoize(wrapped):

         cache = {}

         @wraps(wrapped)
         def wrapper(*args, **kwargs):
             key = (args, tuple(kwargs.items()))
             if key not in cache:
                 cache[key] = wrapped(*args, **kwargs)
             return cache[key]

         return wrapper

Yesterday a candidate pointed out that this is wrong because .items()
for two equal dictionaries might return the (key,value) pairs in a
different order, presumably dependent on the respective dictionary's
history. This would produce a different 'key' and hence an erroneous
cache miss.

This implies that the second line of 'wrapper' should become:

               key = (args, tuple(sorted(kwargs.items())))

(I've added 'sorted')
Looking at lru_cache, I see that is exactly how it is implemented there.
http://hg.python.org/cpython/file/default/Lib/functools.py

However, I'm unable to come up with a test that proves this is
necessary. I'm can create two equal dictionaries which return their
.items() in a different order:

         # The intent is that 'small.items()' comes out in a different order
         # than 'large.items()'
         small = {'x':1, 'y':5}
         large = {hex(i): i for i in range(257)}
         large.update(small)
         for i in range(257):
             del large[hex(i)]

 >>> print small.items()
     [('y', 5), ('x', 1)]
 >>> print large.items()
     [('x', 1), ('y', 5)]

If I could magically transfer these dictionaries directly into the
'kwargs' of wrapper, then I think I'd be done. However, I have to pass
these dictionaries in to wrapper via the '**' mechanism. Printing
'kwargs.items()' within 'wrapper' shows that equal dictionaries always
return their items in the same order, so the 'sorted' call is apparently
not necessary.

Is 'sorted' really required? If so, how can I write a test to
demonstrate it?

Best regards,

     Jonathan

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       +44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

René Dudfield
Good morning,

1. Run it on different versions of python, or on different machines/OSes.  That usually results in different dictionary ordering.
2. create a dict subclass that returns items always randomised.

    class my_fun_randomising_item_dict_subclass(dict):
        def items(self):
            return random.shuffle(dict.items(self, key))
    def make_key(kwargs):
        return (args, tuple(sorted(my_fun_randomising_item_dict_subclass(kwargs).items())))

cheers,

On Fri, Nov 11, 2011 at 9:42 AM, Jonathan <[hidden email]> wrote:
Hey,

I've been writing my own 'memoize' function a lot recently - I'm using it as an interview question.

Here's what I've got (with a corresponding series of tests):

   def memoize(wrapped):

       cache = {}

       @wraps(wrapped)
       def wrapper(*args, **kwargs):
           key = (args, tuple(kwargs.items()))
           if key not in cache:
               cache[key] = wrapped(*args, **kwargs)
           return cache[key]

       return wrapper

Yesterday a candidate pointed out that this is wrong because .items() for two equal dictionaries might return the (key,value) pairs in a different order, presumably dependent on the respective dictionary's history. This would produce a different 'key' and hence an erroneous cache miss.

This implies that the second line of 'wrapper' should become:

             key = (args, tuple(sorted(kwargs.items())))

(I've added 'sorted')
Looking at lru_cache, I see that is exactly how it is implemented there.
http://hg.python.org/cpython/file/default/Lib/functools.py

However, I'm unable to come up with a test that proves this is necessary. I'm can create two equal dictionaries which return their .items() in a different order:

       # The intent is that 'small.items()' comes out in a different order
       # than 'large.items()'
       small = {'x':1, 'y':5}
       large = {hex(i): i for i in range(257)}
       large.update(small)
       for i in range(257):
           del large[hex(i)]

>>> print small.items()
   [('y', 5), ('x', 1)]
>>> print large.items()
   [('x', 1), ('y', 5)]

If I could magically transfer these dictionaries directly into the 'kwargs' of wrapper, then I think I'd be done. However, I have to pass these dictionaries in to wrapper via the '**' mechanism. Printing 'kwargs.items()' within 'wrapper' shows that equal dictionaries always return their items in the same order, so the 'sorted' call is apparently not necessary.

Is 'sorted' really required? If so, how can I write a test to demonstrate it?

Best regards,

   Jonathan

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       <a href="tel:%2B44%207737%20062%20225" value="+447737062225" target="_blank">+44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Ross Lawley-2
In reply to this post by Jonathan Hartley
Hi,


"CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions."

Given that I would use sorted to guarantee order.  Then no matter what anyone has done to the kwargs passed in the memoizing will not have a bug.

Ross

On Fri, Nov 11, 2011 at 8:42 AM, Jonathan <[hidden email]> wrote:
Hey,

I've been writing my own 'memoize' function a lot recently - I'm using it as an interview question.

Here's what I've got (with a corresponding series of tests):

   def memoize(wrapped):

       cache = {}

       @wraps(wrapped)
       def wrapper(*args, **kwargs):
           key = (args, tuple(kwargs.items()))
           if key not in cache:
               cache[key] = wrapped(*args, **kwargs)
           return cache[key]

       return wrapper

Yesterday a candidate pointed out that this is wrong because .items() for two equal dictionaries might return the (key,value) pairs in a different order, presumably dependent on the respective dictionary's history. This would produce a different 'key' and hence an erroneous cache miss.

This implies that the second line of 'wrapper' should become:

             key = (args, tuple(sorted(kwargs.items())))

(I've added 'sorted')
Looking at lru_cache, I see that is exactly how it is implemented there.
http://hg.python.org/cpython/file/default/Lib/functools.py

However, I'm unable to come up with a test that proves this is necessary. I'm can create two equal dictionaries which return their .items() in a different order:

       # The intent is that 'small.items()' comes out in a different order
       # than 'large.items()'
       small = {'x':1, 'y':5}
       large = {hex(i): i for i in range(257)}
       large.update(small)
       for i in range(257):
           del large[hex(i)]

>>> print small.items()
   [('y', 5), ('x', 1)]
>>> print large.items()
   [('x', 1), ('y', 5)]

If I could magically transfer these dictionaries directly into the 'kwargs' of wrapper, then I think I'd be done. However, I have to pass these dictionaries in to wrapper via the '**' mechanism. Printing 'kwargs.items()' within 'wrapper' shows that equal dictionaries always return their items in the same order, so the 'sorted' call is apparently not necessary.

Is 'sorted' really required? If so, how can I write a test to demonstrate it?

Best regards,

   Jonathan

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       <a href="tel:%2B44%207737%20062%20225" value="+447737062225" target="_blank">+44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Jonathan Hartley
In reply to this post by Jonathan Hartley
That's good to know René, but I *think* it's orthogonal to the question. Please correct me if I'm wrong.

If PyPy returns items in a different order than CPython, that doesn't matter to me, so long as every invocation of my function in a particular process ends up receiving a particular  order, that doesn't change until the process ends (or the cache is cleared.)

The dict subclass is a great idea - but if the change in order isn't ever manifested by a regular dict, then it implies to me that the 'sorted' call isn't actually required in real life, so the need for this whole test disappears.

See also my imminent reply to Ross.

    Jonathan


On 11/11/2011 08:49, René Dudfield wrote:
Good morning,

1. Run it on different versions of python, or on different machines/OSes.  That usually results in different dictionary ordering.
2. create a dict subclass that returns items always randomised.

    class my_fun_randomising_item_dict_subclass(dict):
        def items(self):
            return random.shuffle(dict.items(self, key))
    def make_key(kwargs):
        return (args, tuple(sorted(my_fun_randomising_item_dict_subclass(kwargs).items())))

cheers,

On Fri, Nov 11, 2011 at 9:42 AM, Jonathan <[hidden email]> wrote:
Hey,

I've been writing my own 'memoize' function a lot recently - I'm using it as an interview question.

Here's what I've got (with a corresponding series of tests):

   def memoize(wrapped):

       cache = {}

       @wraps(wrapped)
       def wrapper(*args, **kwargs):
           key = (args, tuple(kwargs.items()))
           if key not in cache:
               cache[key] = wrapped(*args, **kwargs)
           return cache[key]

       return wrapper

Yesterday a candidate pointed out that this is wrong because .items() for two equal dictionaries might return the (key,value) pairs in a different order, presumably dependent on the respective dictionary's history. This would produce a different 'key' and hence an erroneous cache miss.

This implies that the second line of 'wrapper' should become:

             key = (args, tuple(sorted(kwargs.items())))

(I've added 'sorted')
Looking at lru_cache, I see that is exactly how it is implemented there.
http://hg.python.org/cpython/file/default/Lib/functools.py

However, I'm unable to come up with a test that proves this is necessary. I'm can create two equal dictionaries which return their .items() in a different order:

       # The intent is that 'small.items()' comes out in a different order
       # than 'large.items()'
       small = {'x':1, 'y':5}
       large = {hex(i): i for i in range(257)}
       large.update(small)
       for i in range(257):
           del large[hex(i)]

>>> print small.items()
   [('y', 5), ('x', 1)]
>>> print large.items()
   [('x', 1), ('y', 5)]

If I could magically transfer these dictionaries directly into the 'kwargs' of wrapper, then I think I'd be done. However, I have to pass these dictionaries in to wrapper via the '**' mechanism. Printing 'kwargs.items()' within 'wrapper' shows that equal dictionaries always return their items in the same order, so the 'sorted' call is apparently not necessary.

Is 'sorted' really required? If so, how can I write a test to demonstrate it?

Best regards,

   Jonathan

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       <a moz-do-not-send="true" href="tel:%2B44%207737%20062%20225" value="+447737062225" target="_blank">+44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk



_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk

-- 
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       +44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Duncan Booth
In reply to this post by Ross Lawley-2
On Fri, Nov 11, 2011 at 8:50 AM, Ross Lawley <[hidden email]> wrote:

>> However, I'm unable to come up with a test that proves this is necessary.
>> I'm can create two equal dictionaries which return their .items() in a
>> different order:
>>
>>        # The intent is that 'small.items()' comes out in a different order
>>        # than 'large.items()'
>>        small = {'x':1, 'y':5}
>>        large = {hex(i): i for i in range(257)}
>>        large.update(small)
>>        for i in range(257):
>>            del large[hex(i)]
>>
There are two ways to create dictionaries that are equal but have
their keys in different order. The way you used ends up with a very
sparse dictionary after you deleted the padding keys. When you call a
function it copies the kwargs dictionary so at that point the
sparseness is lost as is your artificial key ordering.

The other way is simply to pick keys that that hash to the same value
modulo the size of the dictionary. Since the dictionary copy copies
the keys in the order they are stored you will get the same hash
conflict in the copy as the original.

>> Is 'sorted' really required? If so, how can I write a test to demonstrate
>> it?

This should help you:

>>> def foo(**kw):
        print(kw)

       
>>> foo(a=None, i=None)
{'a': None, 'i': None}
>>> foo(i=None, a=None)
{'i': None, 'a': None}
>>>

And here's an easy way to find some conflicts. You could use this code
to find a conflict instead of hardwiring the argument names in the
test.

>>> for p, q in itertools.product(ascii_letters, ascii_letters):
        d1, d2 = {p:None, q:None}, {q:None, p:None}
        if repr(d1) != repr(d2):
                print(d1)
                count += 1
                if count > 20:
                        break

               
{'a': None, 'i': None}
{'a': None, 'q': None}
{'a': None, 'y': None}
{'a': None, 'A': None}
{'a': None, 'I': None}
{'a': None, 'Q': None}
{'a': None, 'Y': None}
{'b': None, 'j': None}
{'b': None, 'z': None}
{'c': None, 'k': None}
{'c': None, 's': None}
{'c': None, 'C': None}
{'c': None, 'K': None}
{'c': None, 'S': None}
{'d': None, 'l': None}
{'d': None, 't': None}
{'d': None, 'D': None}
{'d': None, 'L': None}
{'d': None, 'T': None}
{'m': None, 'e': None}
{'u': None, 'e': None}
_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Jonathan Hartley
In reply to this post by Jonathan Hartley
Thanks Ross.

The thing is, the 'kwargs' dict that callers pass in is not the same dict as the kwargs that my wrapper function receives. My 'wrapper' function always receives a brand-new dictionary, created from scratch as the function is invoked. Hence, the kwargs never has any 'history' - they are always newly created, from the keyword args passed in to the function. This, I suspect, is why their 'items()' always appear in the same order.

I tried calling 'wrapper' with keyword args in a different order:

    wrapper(x=1, y=2, z=3)
    wrapper(y=2, x=1, z=3)
    ...etc, for every permutation

and by passing in **kwargs dictionaries with different histories (detailed in my original mail):

    wrapper(**kwargs)

but in every case, inside wrapper...

    def wrapper(*args, **kwargs):
        print kwargs.items()

...the items call always returns the dictionary's content in the same order. So I *suspect*, but cannot yet prove, that the '**' mechanism is already sorting the keyword args used to construct the kwargs dictionary (or is doing some equivalent, such that the dicts always end up identical-including-order-of-items())

I agree that 'to be sure' is enough of a justification for including the 'sorted' call anyway. But it irks me that I can't write a test to show it.

    Jonathan


On 11/11/2011 08:50, Ross Lawley wrote:
Hi,


"CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions."

Given that I would use sorted to guarantee order.  Then no matter what anyone has done to the kwargs passed in the memoizing will not have a bug.

Ross

On Fri, Nov 11, 2011 at 8:42 AM, Jonathan <[hidden email]> wrote:
Hey,

I've been writing my own 'memoize' function a lot recently - I'm using it as an interview question.

Here's what I've got (with a corresponding series of tests):

   def memoize(wrapped):

       cache = {}

       @wraps(wrapped)
       def wrapper(*args, **kwargs):
           key = (args, tuple(kwargs.items()))
           if key not in cache:
               cache[key] = wrapped(*args, **kwargs)
           return cache[key]

       return wrapper

Yesterday a candidate pointed out that this is wrong because .items() for two equal dictionaries might return the (key,value) pairs in a different order, presumably dependent on the respective dictionary's history. This would produce a different 'key' and hence an erroneous cache miss.

This implies that the second line of 'wrapper' should become:

             key = (args, tuple(sorted(kwargs.items())))

(I've added 'sorted')
Looking at lru_cache, I see that is exactly how it is implemented there.
http://hg.python.org/cpython/file/default/Lib/functools.py

However, I'm unable to come up with a test that proves this is necessary. I'm can create two equal dictionaries which return their .items() in a different order:

       # The intent is that 'small.items()' comes out in a different order
       # than 'large.items()'
       small = {'x':1, 'y':5}
       large = {hex(i): i for i in range(257)}
       large.update(small)
       for i in range(257):
           del large[hex(i)]

>>> print small.items()
   [('y', 5), ('x', 1)]
>>> print large.items()
   [('x', 1), ('y', 5)]

If I could magically transfer these dictionaries directly into the 'kwargs' of wrapper, then I think I'd be done. However, I have to pass these dictionaries in to wrapper via the '**' mechanism. Printing 'kwargs.items()' within 'wrapper' shows that equal dictionaries always return their items in the same order, so the 'sorted' call is apparently not necessary.

Is 'sorted' really required? If so, how can I write a test to demonstrate it?

Best regards,

   Jonathan

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       <a moz-do-not-send="true" href="tel:%2B44%207737%20062%20225" value="+447737062225" target="_blank">+44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk



_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk

-- 
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       +44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

René Dudfield
In reply to this post by Jonathan Hartley
On Fri, Nov 11, 2011 at 10:23 AM, Jonathan <[hidden email]> wrote:
That's good to know René, but I *think* it's orthogonal to the question. Please correct me if I'm wrong.

If PyPy returns items in a different order than CPython, that doesn't matter to me, so long as every invocation of my function in a particular process ends up receiving a particular  order, that doesn't change until the process ends (or the cache is cleared.)

The dict subclass is a great idea - but if the change in order isn't ever manifested by a regular dict, then it implies to me that the 'sorted' call isn't actually required in real life, so the need for this whole test disappears.

See also my imminent reply to Ross.

    Jonathan




Just because your code works Now, does not mean it will work in the future.  Especially when the ordering is explicitly marked as a non-deterministic implementation detail.  Relying on it to be deterministic when it is stated to be non-deterministic will mean that in the future you could get a failure.

Relying on implementation quirks is fine, but not reliable.  In your case though, the worst that can happen now is a cache miss.  In the future though, your cache could be completely useless because every time the data could be in a different order.

Testing on multiple platforms allows you to sometimes see a failure condition, which lets you test it.  Likewise, using a randomised dict subclass will let you test your code as if there is an error condition.

cya.

ps.  my randomised items subclass is buggy... I forgot that random.shuffle works in place, and doesn't return a shuffled sequence.

_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Jonathan Hartley
In reply to this post by Ross Lawley-2
On 11/11/2011 09:24, Duncan Booth wrote:
> pick keys that that hash to the same value modulo the size of the
> dictionary. Since the dictionary copy copies the keys in the order
> they are stored you will get the same hash conflict in the copy as the
> original.

Brilliant, thank-you. So I was just being 'lucky' in my choices of dict
keys. I believe this is the answer I am looking for. René & Ross's point
about being mindful of different implementations is still very pertinent
in my mind, but I'm happy for now.

Thanks everyone!

     Jonathan

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       +44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Jonathan Hartley
In reply to this post by Jonathan Hartley
On 11/11/2011 09:34, René Dudfield wrote:
On Fri, Nov 11, 2011 at 10:23 AM, Jonathan <[hidden email]> wrote:
That's good to know René, but I *think* it's orthogonal to the question. Please correct me if I'm wrong.

If PyPy returns items in a different order than CPython, that doesn't matter to me, so long as every invocation of my function in a particular process ends up receiving a particular  order, that doesn't change until the process ends (or the cache is cleared.)

The dict subclass is a great idea - but if the change in order isn't ever manifested by a regular dict, then it implies to me that the 'sorted' call isn't actually required in real life, so the need for this whole test disappears.

See also my imminent reply to Ross.

    Jonathan



Just because your code works Now, does not mean it will work in the future.  Especially when the ordering is explicitly marked as a non-deterministic implementation detail.  Relying on it to be deterministic when it is stated to be non-deterministic will mean that in the future you could get a failure.

Relying on implementation quirks is fine, but not reliable.  In your case though, the worst that can happen now is a cache miss.  In the future though, your cache could be completely useless because every time the data could be in a different order.

Testing on multiple platforms allows you to sometimes see a failure condition, which lets you test it.  Likewise, using a randomised dict subclass will let you test your code as if there is an error condition.

cya.

ps.  my randomised items subclass is buggy... I forgot that random.shuffle works in place, and doesn't return a shuffled sequence.


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk

Hey René,
Thanks for that! I guess I was getting distracted because I was interpreting the situation as the fact that .items() for an *arbitrary* dictionary may well be non-deterministic, but in this particular case, for dictionaries newly-created by the '**' mechanism, they appear to be non-deterministic. I was forgetting that this is (presumably) also an implementation detail. Regardless, it sounds like you are probably right on all counts. Thanks for your enlightening thoughts.
Cheers,
    Jonathan
-- 
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       +44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Jonathan Hartley
In reply to this post by Ross Lawley-2
On 11/11/2011 09:35, Jonathan wrote:

> On 11/11/2011 09:24, Duncan Booth wrote:
>> pick keys that that hash to the same value modulo the size of the
>> dictionary. Since the dictionary copy copies the keys in the order
>> they are stored you will get the same hash conflict in the copy as
>> the original.
>
> Brilliant, thank-you. So I was just being 'lucky' in my choices of
> dict keys. I believe this is the answer I am looking for. René &
> Ross's point about being mindful of different implementations is still
> very pertinent in my mind, but I'm happy for now.
>
> Thanks everyone!
>
>     Jonathan
>

For completeness, my final test is therefore:

     def test_not_dependant_on_order_of_kwargs(self):
         calls = []

         @memoize
         def counter(**kwargs):
             calls.append(1)

         for first in ascii_letters:
             for second in ascii_letters:
                 forward = {first:0, second:0}
                 reverse = {second:0, first:0}

                 del calls[:]
                 counter(**forward)
                 counter(**reverse)
                 self.assertEqual(len(calls), 1,
                     '%d for %s,%s' % (len(calls), first, second))


Without the 'sorted', this fails with:
======================================================================
FAIL: test_not_dependant_on_order_of_kwargs (__main__.MemoizeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "./test.py", line 58, in test_not_dependant_on_order_of_kwargs
     '%d for %s,%s' % (len(calls), first, second))
AssertionError: 2 for a,i

----------------------------------------------------------------------


Jubilant!

     Jonathan

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       +44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Jonathan Hartley
On 11/11/2011 11:14, Jonathan wrote:

> On 11/11/2011 09:35, Jonathan wrote:
>> On 11/11/2011 09:24, Duncan Booth wrote:
>>> pick keys that that hash to the same value modulo the size of the
>>> dictionary. Since the dictionary copy copies the keys in the order
>>> they are stored you will get the same hash conflict in the copy as
>>> the original.
>>
>> Brilliant, thank-you. So I was just being 'lucky' in my choices of
>> dict keys. I believe this is the answer I am looking for. René &
>> Ross's point about being mindful of different implementations is
>> still very pertinent in my mind, but I'm happy for now.
>>
>> Thanks everyone!
>>
>>     Jonathan
>>
>
> For completeness, my final test is therefore:
>
>     def test_not_dependant_on_order_of_kwargs(self):
>         calls = []
>
>         @memoize
>         def counter(**kwargs):
>             calls.append(1)
>
>         for first in ascii_letters:
>             for second in ascii_letters:
>                 forward = {first:0, second:0}
>                 reverse = {second:0, first:0}
>
>                 del calls[:]
>                 counter(**forward)
>                 counter(**reverse)
>                 self.assertEqual(len(calls), 1,
>                     '%d for %s,%s' % (len(calls), first, second))
>

oops, but that doesn't pass with the 'sorted'. This does:

         for index, first in enumerate(ascii_letters):
             for second in ascii_letters[index:]:

(i.e. I was testing each combination forwards and backwards, but then
later in the iteration also testing them again, in the opposite order,
which resulted in 0 extra calls to counter.)

--
Jonathan Hartley    [hidden email]    http://tartley.com
Made of meat.       +44 7737 062 225       twitter/skype: tartley


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Duncan Booth
In reply to this post by Jonathan Hartley
On Fri, Nov 11, 2011 at 10:14 AM, Jonathan <[hidden email]> wrote:

> Hey René,
> Thanks for that! I guess I was getting distracted because I was interpreting
> the situation as the fact that .items() for an *arbitrary* dictionary may
> well be non-deterministic, but in this particular case, for dictionaries
> newly-created by the '**' mechanism, they appear to be non-deterministic. I
> was forgetting that this is (presumably) also an implementation detail.
> Regardless, it sounds like you are probably right on all counts. Thanks for
> your enlightening thoughts.
> Cheers,
>     Jonathan

René is indeed correct to in what he says, but I think you are also
correct to want to write a test case that shows the code is not
depending on the dict order. Or in other words it is useful to have
the counterexample I supplied, but the absence of a counterexample
still wouldn't be an excuse for not fixing the code.

Likewise there was a question on SO about using (s is "") to test for
empty strings. There is a counterexample for Python 2.x but not so far
as I know for Python 3.x but that still means people shouldn't do it
in Python 3.x (the questioner in that case knew that but they wanted
an example to throw at their colleague).
_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Bugzilla from a.harrowell@gmail.com
In reply to this post by Jonathan Hartley

On Friday 11 Nov 2011 09:30:23 Jonathan wrote:

> I agree that 'to be sure' is enough of a justification for including the 'sorted' call anyway. But it irks me that I can't write a test to show it.


This is surely inherent in the fact that python dictionaries are not deterministically ordered. The dox don't say that you will always get key-val pairs back in *different* orders - just that you cannot rely on them being in the *same* order. Equally, you can't assume that they *won't*.



Further, the dox also suggest that the behaviour is not truly random, so you can't rely on statistical randomness. You just have to avoid the whole idea of order when retrieving items from a python dictionary and treat it purely as a hash table that supports both a get-item-by-id behaviour and also some iterative and setwise methods.



ISTR there are cases where two calls to dict.items() (and dict.keys() and dict.values()) *will* get the same order if there was no intervening write to the dict, but I suspect it's best to stick to the "right" behaviour.


--

The only thing worse than e-mail disclaimers...is people who send e-mail to lists complaining about them


_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk

signature.asc (205 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: memoize & ordering of kwargs.items()

Matt Hamilton
<base href="x-msg://2440/">
On 11 Nov 2011, at 12:54, Alexander Harrowell wrote:

This is surely inherent in the fact that python dictionaries are not deterministically ordered. The dox don't say that you will always get key-val pairs back in *different* orders - just that you cannot rely on them being in the *same* order. Equally, you can't assume that they *won't*.

In CPython they *are* deterministically ordered though through their implementation. The API just states that for the purposes of the API you cannot rely on the order of the keys/values returned as that is a side effect of the implementation.

-Matt

NETSIGHT

Matt Hamilton

Technical Director
Email
[hidden email]

Telephone
+44 (0) 117 909 0901


Web
www.netsight.co.uk

Address
40 Berkeley Square, Clifton 
Bristol BS8 1HU







_______________________________________________
python-uk mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-uk