Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Jakub Kleň
Hi guys, I came across a problem today while testing my webapp with a full migrations applied from my legacy site for the first time.
The problem is I'm using Prefetch in my view query (get_queryset), and it works fine when I just fetch it, but the problem comes with the .count() query which generates sql with subqueries, and it takes really long to process by MySQL.
The nicest solution I'm thinking about would be to use 2 different queries for object_list / pagination count, because for count the query would be really simple, and wouldn't need the prefetches.
I think this scenario may be pretty likely to come across also for other Django users, so I'm thinking if it wouldn't be nice to have it directly in Django.
From my point of view, it seems like the implementation would be pretty straightforward and easy, and I would even like to look into it and contribute to the Django project if possible and if you think it would be useful.
I would like to add count_queryset param to Paginator.__init__, and def get_count_queryset to MultipleObjectMixin. Happy to hear your opinions on this :)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Adam Johnson-2
it takes really long to process by MySQL.

Which version? Subquery performance gets better in MySQL 5.6 and even more so on MariaDB.

 for count the query would be really simple, and wouldn't need the prefetches.

When counting, prefetches aren't actaully executed. Check the actual SQL with e.g. django-debug-toolbar .

On Fri, 29 Jun 2018 at 21:24, Jakub Kleň <[hidden email]> wrote:
Hi guys, I came across a problem today while testing my webapp with a full migrations applied from my legacy site for the first time.
The problem is I'm using Prefetch in my view query (get_queryset), and it works fine when I just fetch it, but the problem comes with the .count() query which generates sql with subqueries, and it takes really long to process by MySQL.
The nicest solution I'm thinking about would be to use 2 different queries for object_list / pagination count, because for count the query would be really simple, and wouldn't need the prefetches.
I think this scenario may be pretty likely to come across also for other Django users, so I'm thinking if it wouldn't be nice to have it directly in Django.
From my point of view, it seems like the implementation would be pretty straightforward and easy, and I would even like to look into it and contribute to the Django project if possible and if you think it would be useful.
I would like to add count_queryset param to Paginator.__init__, and def get_count_queryset to MultipleObjectMixin. Happy to hear your opinions on this :)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Adam

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAMyDDM1zPun1epd8G7BpV5z-bWXAq-rMkvmZvwgLGqsTzadGFQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Tom Forbes
In reply to this post by Jakub Kleň
Are you sure it is the prefetches that is causing this? As Adam pointed out these are correctly ignored. Annotations however are not, which can cause unnecessary work and longer execution times. i.e:

Book.objects.annotate(something=Max('pages__word_count')).count()

We have enough information to be able to strip the pages join when using count() it certain conditions are met, and this might be preferable than adding workarounds to various places.


On Fri, 29 Jun 2018, 22:00 Jakub Kleň, <[hidden email]> wrote:
Hi guys, I came across a problem today while testing my webapp with a full migrations applied from my legacy site for the first time.
The problem is I'm using Prefetch in my view query (get_queryset), and it works fine when I just fetch it, but the problem comes with the .count() query which generates sql with subqueries, and it takes really long to process by MySQL.
The nicest solution I'm thinking about would be to use 2 different queries for object_list / pagination count, because for count the query would be really simple, and wouldn't need the prefetches.
I think this scenario may be pretty likely to come across also for other Django users, so I'm thinking if it wouldn't be nice to have it directly in Django.
From my point of view, it seems like the implementation would be pretty straightforward and easy, and I would even like to look into it and contribute to the Django project if possible and if you think it would be useful.
I would like to add count_queryset param to Paginator.__init__, and def get_count_queryset to MultipleObjectMixin. Happy to hear your opinions on this :)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAFNZOJNKOE-8QA4w-74U42FCnX%2B685W9Wm8MJN_L0b7QVZYVKQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Jakub Kleň
After some more investigation yesterday, I found out that it isn't the prefetches and the subquery, and exactly as you write, it's annotations which generate joins, and for count() the difference is it has to go thru the whole db table, and that is slowing it down a lot. For my queryset, the normal .all() query takes 0.006s, and the count query takes 16.6s. And I noticed, that even for queries that didn't seem problematic (1 join or so), it actually slows them down a lot, so basically every queryset in a paginated view is getting slowed down if it's a bit more complex, and you're totally true that the real solution would be to optimize them directly for the .count() calls. That would be an awesome tweak! :)

Yesterday I was looking into Django, because I needed to implement some workaround for my app, and created a PR which doesn't seem to me like a good change now.
I basically implemented the stuff mentioned in my first message: https://github.com/django/django/pull/10114
I guess I can reply to the PR and close it now, sorry for the SPAM, I thought it was a good idea to have it there.

For now I have a workaround which basically adds the functionality in the PR to my views using a mixin, so It's fine, but I would be really happy to see the tweak you mentioned in Django some day :)

Thank you for your time,
Jakub.

On Saturday, June 30, 2018 at 5:40:32 PM UTC+2, Tom Forbes wrote:
Are you sure it is the prefetches that is causing this? As Adam pointed out these are correctly ignored. Annotations however are not, which can cause unnecessary work and longer execution times. i.e:

Book.objects.annotate(something=Max('pages__word_count')).count()

We have enough information to be able to strip the pages join when using count() it certain conditions are met, and this might be preferable than adding workarounds to various places.


On Fri, 29 Jun 2018, 22:00 Jakub Kleň, <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="kND-VXN_CAAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">kuk...@...> wrote:
Hi guys, I came across a problem today while testing my webapp with a full migrations applied from my legacy site for the first time.
The problem is I'm using Prefetch in my view query (get_queryset), and it works fine when I just fetch it, but the problem comes with the .count() query which generates sql with subqueries, and it takes really long to process by MySQL.
The nicest solution I'm thinking about would be to use 2 different queries for object_list / pagination count, because for count the query would be really simple, and wouldn't need the prefetches.
I think this scenario may be pretty likely to come across also for other Django users, so I'm thinking if it wouldn't be nice to have it directly in Django.
From my point of view, it seems like the implementation would be pretty straightforward and easy, and I would even like to look into it and contribute to the Django project if possible and if you think it would be useful.
I would like to add count_queryset param to Paginator.__init__, and def get_count_queryset to MultipleObjectMixin. Happy to hear your opinions on this :)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to <a href="javascript:" rel="nofollow" target="_blank" gdf-obfuscated-mailto="kND-VXN_CAAJ" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-develop...@googlegroups.com.
To post to this group, send email to <a href="javascript:" rel="nofollow" target="_blank" gdf-obfuscated-mailto="kND-VXN_CAAJ" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-d...@googlegroups.com.
Visit this group at <a href="https://groups.google.com/group/django-developers" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/group/django-developers&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/group/django-developers&#39;;return true;">https://groups.google.com/group/django-developers.
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com?utm_medium=email&amp;utm_source=footer" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;">https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com.
For more options, visit <a href="https://groups.google.com/d/optout" rel="nofollow" target="_blank" onmousedown="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;">https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a294cf3c-9509-4788-8f18-b4e33a580b4a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Jakub Kleň
In reply to this post by Adam Johnson-2
Hi, thank you for your response.

I'm currently running MySQL 5.7.17, and the time difference between the queries is really big (0.006s vs 16.6s).
I'm also wondering that something like this isn't handled by MySQL.

On Saturday, June 30, 2018 at 1:50:48 PM UTC+2, Adam Johnson wrote:
it takes really long to process by MySQL.

Which version? Subquery performance gets better in MySQL 5.6 and even more so on MariaDB.

 for count the query would be really simple, and wouldn't need the prefetches.

When counting, prefetches aren't actaully executed. Check the actual SQL with e.g. django-debug-toolbar .

On Fri, 29 Jun 2018 at 21:24, Jakub Kleň <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="s-t7_elyCAAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">kuk...@...> wrote:
Hi guys, I came across a problem today while testing my webapp with a full migrations applied from my legacy site for the first time.
The problem is I'm using Prefetch in my view query (get_queryset), and it works fine when I just fetch it, but the problem comes with the .count() query which generates sql with subqueries, and it takes really long to process by MySQL.
The nicest solution I'm thinking about would be to use 2 different queries for object_list / pagination count, because for count the query would be really simple, and wouldn't need the prefetches.
I think this scenario may be pretty likely to come across also for other Django users, so I'm thinking if it wouldn't be nice to have it directly in Django.
From my point of view, it seems like the implementation would be pretty straightforward and easy, and I would even like to look into it and contribute to the Django project if possible and if you think it would be useful.
I would like to add count_queryset param to Paginator.__init__, and def get_count_queryset to MultipleObjectMixin. Happy to hear your opinions on this :)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="s-t7_elyCAAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-develop...@googlegroups.com.
To post to this group, send email to <a href="javascript:" target="_blank" gdf-obfuscated-mailto="s-t7_elyCAAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">django-d...@googlegroups.com.
Visit this group at <a href="https://groups.google.com/group/django-developers" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/group/django-developers&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/group/django-developers&#39;;return true;">https://groups.google.com/group/django-developers.
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com?utm_medium=email&amp;utm_source=footer" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com?utm_medium\x3demail\x26utm_source\x3dfooter&#39;;return true;">https://groups.google.com/d/msgid/django-developers/a29e6b81-f05c-46af-a479-04e1f4bce705%40googlegroups.com.
For more options, visit <a href="https://groups.google.com/d/optout" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/d/optout&#39;;return true;">https://groups.google.com/d/optout.


--
Adam

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/a807e2d9-bf1d-4e3b-9b9d-8bea838e143e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Jakub Kleň
In reply to this post by Jakub Kleň
I checked MySQL 8 today, and it's the same story. Does anyone know how this performs in MariaDB or Postgresql?

On Friday, June 29, 2018 at 11:00:33 PM UTC+2, Jakub Kleň wrote:
Hi guys, I came across a problem today while testing my webapp with a full migrations applied from my legacy site for the first time.
The problem is I'm using Prefetch in my view query (get_queryset), and it works fine when I just fetch it, but the problem comes with the .count() query which generates sql with subqueries, and it takes really long to process by MySQL.
The nicest solution I'm thinking about would be to use 2 different queries for object_list / pagination count, because for count the query would be really simple, and wouldn't need the prefetches.
I think this scenario may be pretty likely to come across also for other Django users, so I'm thinking if it wouldn't be nice to have it directly in Django.
From my point of view, it seems like the implementation would be pretty straightforward and easy, and I would even like to look into it and contribute to the Django project if possible and if you think it would be useful.
I would like to add count_queryset param to Paginator.__init__, and def get_count_queryset to MultipleObjectMixin. Happy to hear your opinions on this :)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/7018c157-03cc-4f94-afa3-b9cdc753f36f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Ramiro Morales-2
In reply to this post by Tom Forbes
On Sat, Jun 30, 2018 at 12:40 PM Tom Forbes <[hidden email]> wrote:
Are you sure it is the prefetches that is causing this? As Adam pointed out these are correctly ignored. Annotations however are not, which can cause unnecessary work and longer execution times. i.e:

Book.objects.annotate(something=Max('pages__word_count')).count()

We have enough information to be able to strip the pages join when using count() it certain conditions are met, and this might be preferable than adding workarounds to various places.

FYI: This had already been reported in #23771 (https://code.djangoproject.com/ticket/23771) (and a couple of similar/duplicates tickets)

Regards,

--
Ramiro Morales
@ramiromorales

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAO7PdF_Y_vQqYNst5ZR1d9vxWaZMqhZwfF3fcZ4PC7tQv0hGJw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Optionally using a custom queryset for Paginator and MultipleObjectMixin (ListView)

Tom Forbes
Indeed, I had an attempt at doing this here (https://github.com/django/django/pull/8928/), but it seems a hard problem. I think there is huge potential here but I have no time to work on this anymore.

On Tue, 17 Jul 2018, 19:00 Ramiro Morales, <[hidden email]> wrote:
On Sat, Jun 30, 2018 at 12:40 PM Tom Forbes <[hidden email]> wrote:
Are you sure it is the prefetches that is causing this? As Adam pointed out these are correctly ignored. Annotations however are not, which can cause unnecessary work and longer execution times. i.e:

Book.objects.annotate(something=Max('pages__word_count')).count()

We have enough information to be able to strip the pages join when using count() it certain conditions are met, and this might be preferable than adding workarounds to various places.

FYI: This had already been reported in #23771 (https://code.djangoproject.com/ticket/23771) (and a couple of similar/duplicates tickets)

Regards,

--
Ramiro Morales
@ramiromorales

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAO7PdF_Y_vQqYNst5ZR1d9vxWaZMqhZwfF3fcZ4PC7tQv0hGJw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAFNZOJOf-rN%3DMmOMTxjSEjjrC7FOGwuOLF7zXGqP79aeg0-68g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.