[Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

Django
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
               Reporter:  Mike       |          Owner:  nobody
  Lissner                            |
                   Type:  Bug        |         Status:  new
              Component:  Database   |        Version:  2.1
  layer (models, ORM)                |       Keywords:  prefetch,
               Severity:  Normal     |  multidatabase
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 I've been poking around the darker edges of multidatabase prefetching and
 I discovered a strange pattern. When you have cached prefetch values,
 they're not aware of which database they came from:

 {{{
 # This runs zero queries
 In [62]: pizzas =
 Pizza.objects.filter(pk=17).prefetch_related('toppings').using('replica')

 # This runs two, as expected
 In [63]: pizza0 = pizzas[0]

 # This runs zero, even though it normally would come from "default" and
 the cache came from "replica"
 In [64]: pizza0.toppings.all()
 Out[64]: [<Topping 1: Cheese>, <Topping 2: Bacon>]

 # This runs one query, even though this data should already be populated.
 In [65]: pizza0.toppings.all().using('replica')
 Out[65]: [<Topping 1: Cheese>, <Topping 2: Bacon>]
 }}}

 I was pretty surprised and confused that adding the using() method on the
 last line above busted the cache even when the database selected was the
 same as the one used to pre-populate the cache.

 I was also surprised (though less so) at the opposite, that *not*
 including the non-default DB (on line 64) *was* able to hit the cache
 (which was populated by "replica") instead of hitting the default database
 (as the query would normally do).

 On IRC chatting about this, the question was: "Well, should filter hit the
 DB or use the cache? What about exclude and other methods? Where do you
 draw the line?" I think the simple line to draw is whether something
 changes the SQL query. using() methods don't change SQL, so they should
 work properly on a cached result. filter() and exclude() do change the
 SQL, so they *shouldn't* use the cache.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/051.56ad9d27482adcbf7542976b9f3c028f%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

Django
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
     Reporter:  Mike Lissner         |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  prefetch,            |             Triage Stage:
  multidatabase                      |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

 * version:  2.1 => master
 * type:  Bug => Cleanup/optimization


Comment:

 Hi Mike. Thanks for the report.

 On the basis of the difference between `ln64` and `ln65` I'm inclined to
 accept this. `pizza0` came from `replica` and the `ln64` call just goes
 with that. (That seems right/expected to me...) I see your point that
 adding the `using()` shouldn't change that.

 At first glance the ''"if you didn't add a `filter()`/`exclude()`"'' idea
 seems OK too.

 Two questions though:

 1. What's the use-case for re-adding the `using()` call in `ln65`? I can't
 quite see why you'd do that...?
 2. Have you looked at all at what a patch might look like? (If it's simple
 enough, that helps...)

 Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/066.57b0ebb79613b99b5c58dc568ea61530%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

Django
In reply to this post by Django
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
     Reporter:  Mike Lissner         |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  prefetch,            |             Triage Stage:
  multidatabase                      |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Mike Lissner):

 Replying to [comment:1 Carlton Gibson]:
 > On the basis of the difference between `ln64` and `ln65` I'm inclined to
 accept this. `pizza0` came from `replica` and the `ln64` call just goes
 with that. (That seems right/expected to me...) I see your point that
 adding the `using()` shouldn't change that.

 Good! I made myself clear. Always a good thing.

 > At first glance the ''"if you didn't add a `filter()`/`exclude()`"''
 idea seems OK too.

 That was my analysis too: "This seems to be a reasonable line in the
 sand."

 > Two questions though:
 >
 > 1. What's the use-case for re-adding the `using()` call in `ln65`? I
 can't quite see why you'd do that...? (Another way of looking at `using()`
 might be "go back to **this** DB"...)

 I can't make a really clear argument for my use case, but basically it
 was, "I'm doing weird things with prefetching and I want to be sure Django
 pulls this data properly from the right DB/cache." In other words, it was
 me trying to be explicit about where the caches were *supposed* come from,
 and then getting bit by having the caches busted. From my reading, ln64 is
 totally unclear which DB it would use since it would normally use the
 "default" DB, but had a cached value, so which wins? My solution was to be
 more explicit to make sure it used the cache, but that backfired (luckily,
 I was watching the query logs).

 > 2. Have you looked at all at what a patch might look like? (If it's
 simple enough, that helps...)

 No, not at all, unfortunately. I've looked into the Django code from time
 to time, but mostly I stay on my side of the API, lest I get in trouble.
 Especially when it comes to the ORM bits.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/066.3365298f0f81fa890d2798b2735c86a4%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

Django
In reply to this post by Django
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
     Reporter:  Mike Lissner         |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  prefetch,            |             Triage Stage:
  multidatabase                      |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

 * Attachment "ticket_30079.patch" added.

 Adjustment of admonition wording.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/066.b8f11fbc27c6d72a45a02a2275015702%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

Django
In reply to this post by Django
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
     Reporter:  Mike Lissner         |                    Owner:  nobody
         Type:                       |                   Status:  closed
  Cleanup/optimization               |
    Component:  Documentation        |                  Version:  master
     Severity:  Normal               |               Resolution:  wontfix
     Keywords:  prefetch,            |             Triage Stage:
  multidatabase                      |  Unreviewed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

 * status:  new => closed
 * resolution:   => wontfix
 * has_patch:  0 => 1
 * component:  Database layer (models, ORM) => Documentation


Comment:

 OK, at best, this is a cleanup on the documentation. The admonition note
 in the
 [https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.query.QuerySet.prefetch_related
 `prefech_related` docs] already covers this:

 > Remember that, as always with **QuerySets**, any subsequent chained
 methods which imply a different database query will ignore previously
 cached results, and retrieve data using a fresh database query.

 `using()` is a ''subsequent chained method''. It returns a new QuerySet,
 and the results and prefetch caches are not transferred.

 Arguably, `using()`, whilst returning a new QuerySet, doesn't actually
 imply a different query.

 As such I drafted the attached patch.

 > ... as always with ``QuerySets``, any subsequent chained methods which
 return a new ``QuerySet``, normally implying a different database query
 ...

 In my opinion, the patch unnecessarily complicates the meaning. We know
 from our use of QuerySets that all the
 [https://docs.djangoproject.com/en/2.1/ref/models/querysets/#methods-that-
 return-new-querysets Methods that return new QuerySets] behave this way,
 even if we might forget it occasionally. Thus `wontfix`.

 Thanks for the report Mike!

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/066.9a293191d2f6aaf304c853bc39b82b11%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: [Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

Django
In reply to this post by Django
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
     Reporter:  Mike Lissner         |                    Owner:  nobody
         Type:                       |                   Status:  closed
  Cleanup/optimization               |
    Component:  Documentation        |                  Version:  master
     Severity:  Normal               |               Resolution:  wontfix
     Keywords:  prefetch,            |             Triage Stage:
  multidatabase                      |  Unreviewed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Mike Lissner):

 Whew, that's pretty subtle in both the old and the new versions of the
 documentation. I have to confess that I don't think of chaining as the
 trigger for a new queryset. I think about the query as a new trigger for a
 queryset. I guess I never read this part of the documentation or didn't
 take it to heart.

 I don't know, I don't think we can document our way out of this behavior,
 but I have no idea how difficult this is to fix technically.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups "Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/066.a41447b1d50e7e266f06284f22f44a40%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.