[Django] #29215: Document potential change to behaviour of QuerySet methods when upgrading to Python 3.6

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[Django] #29215: Document potential change to behaviour of QuerySet methods when upgrading to Python 3.6

Django
#29215: Document potential change to behaviour of QuerySet methods when upgrading
to Python 3.6
-------------------------------------+-------------------------------------
               Reporter:  Matt       |          Owner:  nobody
  Fisher                             |
                   Type:             |         Status:  new
  Cleanup/optimization               |
              Component:             |        Version:  2.0
  Documentation                      |
               Severity:  Normal     |       Keywords:  documentation
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 Python 3.6 introduces
 [https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep468 a change]
 that preserves the ordering of keyword arguments -
 [https://www.python.org/dev/peps/pep-0468/ PEP 468: Preserving Keyword
 Argument Order]

 Since this is not a difference between Python 2 and 3 per se, it is not
 mentioned in the [https://docs.djangoproject.com/en/1.11/topics/python3/
 “Porting to Python 3”] Django 1.11 topic page.

 Preserving keyword argument ordering can change the SQL produced by
 QuerySet methods that accept `**kwargs`, specifically .get, .filter, and
 .exclude, and potentially also .values, .annotate, .aggregate, .create,
 .get_or_create, .update, and .update_or_create.

 As an example,
 `.filter(b__in=some_qs, a=1)`
 could produce
 `... WHERE (a=1 AND b IN (SELECT …))` in Python < 3.6 and
 `... WHERE (b IN (SELECT …) AND a=1)` in Python >= 3.6.
 The query is semantically equivalent but the postgres query planner can
 produce dramatically different query plans and resultant performance if
 the query has a significant number of joins.

 We had one particular frequently-run query that ran in 2-3 seconds in
 Python 2, and 80+ seconds in Python 3.6. The problem was only apparent in
 production, and presented as severely degraded responsiveness from the
 postgres RDS database. The additional load on the database made most other
 queries take longer as well, which made it difficult to track down the
 cause as it was not obvious that any queries could have changed. To
 compound the problem, the different query execution caused many more temp
 block writes on the RDS instance, which burned through our IOPS burst
 balance and further degraded site performance. Because the query didn’t
 change semantically, all our automated tests passed and we didn’t see the
 issue until it was in production under full load. Googling the problem
 produced no good results, which suggests it is not common, but in need of
 a useful resource.

 People may be affected by this change when upgrading from Python 2.7 to
 Python 3.6+ using Django 1.x, in which case it would be helpful to have a
 warning in the [https://docs.djangoproject.com/en/1.11/topics/python3/
 1.11 porting topic], but it will also occur when people are upgrading from
 Python 3.[0-5] to 3.6+ while potentially using Django 2, in which case the
 porting topic is no longer in the current docs. Not sure what would be an
 appropriate place for a warning in this case.

--
Ticket URL: <https://code.djangoproject.com/ticket/29215>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/053.a276fea7be60a828a7d9241ef92aa39c%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #29215: Document potential change to behaviour of QuerySet methods when upgrading to Python 3.6

Django
#29215: Document potential change to behaviour of QuerySet methods when upgrading
to Python 3.6
-------------------------------------+-------------------------------------
     Reporter:  Matt Fisher          |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Documentation        |                  Version:  2.0
     Severity:  Normal               |               Resolution:
     Keywords:  documentation        |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

 * cc: Simon Charette (added)


Comment:

 This behavior has little to do with Python 3.6 itself.

 If you had query performing better on Python 2.7 than on Python 3 it's
 probably because you weren't using
 [https://docs.python.org/2/using/cmdline.html?highlight=pythonhashseed#cmdoption-r
 a random hash seed] which is the default on Python 3. In other words, you
 were relying on a stable hash seed to build queries in a way that
 PostgreSQL query planner's would correctly optimize. If you had updated
 from 2.7 to Python 3.5 instead you would have hit the slow query issue
 from time to time when the interpreter chose an hash seed that made
 `list({'b__in', 'a'}) == ['b__in', 'a']`. In some way Python 3.6 allows
 you to reliably build your `WHERE` statement through kwargs ordering while
 it wasn't possible before.

 I guess we could add a section about the fact Python 3 has random hash
 seeding on by default and that may break code making assumptions about
 unordered data structures ordering. Maybe we should also suggest running
 Python 2.7 with the `-R` flag on when porting code?

--
Ticket URL: <https://code.djangoproject.com/ticket/29215#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/068.91974fd3de7338a81ffd249bbf4446ce%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #29215: Document potential change to behaviour of QuerySet methods when upgrading to Python 3.6

Django
In reply to this post by Django
#29215: Document potential change to behaviour of QuerySet methods when upgrading
to Python 3.6
-------------------------------------+-------------------------------------
     Reporter:  Matt Fisher          |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Documentation        |                  Version:  2.0
     Severity:  Normal               |               Resolution:
     Keywords:  documentation        |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

 I think the documentation suggests that `.filter(b__in=some_qs, a=1)` is
 equivalent to `.filter(a=1, b__in=some_qs)`. I'm not sure if we should
 document the subtleties you brought up about the actual SQL that's
 produced. I doubt there are any tests for this "feature." If we don't
 document it, then documenting a possible change regarding it doesn't make
 much sense. Well, it's now informally documented with this ticket.

 By the way, we're no longer updating the Django 1.11 documentation, which
 includes the "porting to Python 3" document. The Django 1.11 documentation
 [https://docs.djangoproject.com/en/1.11/howto/deployment/checklist
 /#python-options recommends] using `python -R` for a random hash seed,
 when using Python 2.

--
Ticket URL: <https://code.djangoproject.com/ticket/29215#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/068.d45c6f84a649decabcfd67bd7db66d36%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #29215: Document potential change to behaviour of QuerySet methods when upgrading to Python 3.6

Django
In reply to this post by Django
#29215: Document potential change to behaviour of QuerySet methods when upgrading
to Python 3.6
-------------------------------------+-------------------------------------
     Reporter:  Matt Fisher          |                    Owner:  nobody
         Type:                       |                   Status:  closed
  Cleanup/optimization               |
    Component:  Documentation        |                  Version:  2.0
     Severity:  Normal               |               Resolution:  invalid
     Keywords:  documentation        |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

 * status:  new => closed
 * resolution:   => invalid


Comment:

 There's not much we can realistically do here.

 I'd say it would be border-line for whether to include it in the 1.11
 porting topic page, but since that's no longer maintained that question is
 moot.

 I feel the issue itself out of scope for Django: we're generating correct
 SQL, which is meant to be declarative, i.e. independent of how the RDMS
 implements the query.
 This falls with any number of backend specific quirks that we just can't
 get into. (It's interesting what the Postgres folks would say about it.)

--
Ticket URL: <https://code.djangoproject.com/ticket/29215#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/068.df323d0b8ca326b77aaa632377dbcf25%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.