A faster paginator for django

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

A faster paginator for django

Saleem Jaffer
Hi all,

The default paginator that comes with Django is inefficient when dealing with large tables. This is because the final query for fetching pages uses "OFFSET" which is basically a linear scan till the last index of the current page. Does it make sense to have a better paginator which does not use "OFFSET". 

If this sounds like a good idea, I have some ideas on how to do it and with some help from you guys I can implement it.

Saleem

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/020f66e0-ec58-47e2-be0b-00c3f1688c5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: A faster paginator for django

ludovic coues
The preferred way for this kind of scenario is to make a third party package. This let you release new version faster than the Django development cycle and it's super easy to install thanks to tools like pip.

Once your solution is stable, if it's popular enough, it could be incorporated into Django.

I'm really curious how you want to do it. I thought there was no other way to skip some row in an SQL query


On Wed, Dec 5, 2018, 13:15 Saleem Jaffer <[hidden email] wrote:
Hi all,

The default paginator that comes with Django is inefficient when dealing with large tables. This is because the final query for fetching pages uses "OFFSET" which is basically a linear scan till the last index of the current page. Does it make sense to have a better paginator which does not use "OFFSET". 

If this sounds like a good idea, I have some ideas on how to do it and with some help from you guys I can implement it.

Saleem

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/020f66e0-ec58-47e2-be0b-00c3f1688c5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAEuG%2BTar6iM5sctRSyPkOHtmL7u9pM%3D01bUvWn%2BZEeQHZsk2fg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: A faster paginator for django

Jason Johns
https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/ has some interesting alternatives.  I believe Django uses the first one. But the others have some tradeoffs that might not apply to all the dbs that django supports.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/b649819f-80c3-4b12-af07-0914c679287a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: A faster paginator for django

Adam Johnson-2
There are already some packages on listed on djangopackages that claim to implement different pagination strategies: https://djangopackages.org/grids/g/pagination/

On Wed, 5 Dec 2018 at 12:37, Jason Johns <[hidden email]> wrote:
https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/ has some interesting alternatives.  I believe Django uses the first one. But the others have some tradeoffs that might not apply to all the dbs that django supports.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/b649819f-80c3-4b12-af07-0914c679287a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Adam

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAMyDDM0R%3DHGSmRMs1mGnr9ZEBECPmRwTHF7_7U3FCBaHky69ug%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: A faster paginator for django

Curtis Maloney-2
In reply to this post by Saleem Jaffer
On 12/5/18 8:30 PM, Saleem Jaffer wrote:

> Hi all,
>
> The default paginator that comes with Django is inefficient when dealing
> with large tables. This is because the final query for fetching pages
> uses "OFFSET" which is basically a linear scan till the last index of
> the current page. Does it make sense to have a better paginator which
> does not use "OFFSET".
>
> If this sounds like a good idea, I have some ideas on how to do it and
> with some help from you guys I can implement it.

There are a number of alternatives to this, as well as low-cost
solutions to improve OFFSET / LIMIT pagination.

By adding an index to the sorting field(s), it can drastically improve
the "simple" case.

Beyond that, you start getting into cases with more significant
trade-offs.  I know Matthew Schinckel was recently working on a drop-in
replacement paginator that used "keyset pagination"
https://bitbucket.org/schinckel/django-keyset-pagination

There's a lot of published work on this topic, and I'd be very
interested to see, at least, a library implementing some of these
alternatives.

At the very least, we could want to ensure the documentation recommends
indexing on the ORDERing fields, where possible.

--
Curtis

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/f9c52aee-e28c-7ae0-306d-3ae2fbd828ee%40tinbrain.net.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: A faster paginator for django

Josh Smeaton
I think most people would typically sort/paginate on the primary key field, which still exhibits a linear scan, where 
the first few pages are fast and the last few pages take significantly longer. Just wanted to call that out in case there
were listeners thinking an index solves the problem. Sorting by an unindexed field just makes a bad query a horrible one.

On Thursday, 6 December 2018 07:32:47 UTC+11, Curtis Maloney wrote:
On 12/5/18 8:30 PM, Saleem Jaffer wrote:

> Hi all,
>
> The default paginator that comes with Django is inefficient when dealing
> with large tables. This is because the final query for fetching pages
> uses "OFFSET" which is basically a linear scan till the last index of
> the current page. Does it make sense to have a better paginator which
> does not use "OFFSET".
>
> If this sounds like a good idea, I have some ideas on how to do it and
> with some help from you guys I can implement it.

There are a number of alternatives to this, as well as low-cost
solutions to improve OFFSET / LIMIT pagination.

By adding an index to the sorting field(s), it can drastically improve
the "simple" case.

Beyond that, you start getting into cases with more significant
trade-offs.  I know Matthew Schinckel was recently working on a drop-in
replacement paginator that used "keyset pagination"
<a href="https://bitbucket.org/schinckel/django-keyset-pagination" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fbitbucket.org%2Fschinckel%2Fdjango-keyset-pagination\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHk1_lMYHq_cwH4qLjmY2oxL9rwqg&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fbitbucket.org%2Fschinckel%2Fdjango-keyset-pagination\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHk1_lMYHq_cwH4qLjmY2oxL9rwqg&#39;;return true;">https://bitbucket.org/schinckel/django-keyset-pagination

There's a lot of published work on this topic, and I'd be very
interested to see, at least, a library implementing some of these
alternatives.

At the very least, we could want to ensure the documentation recommends
indexing on the ORDERing fields, where possible.

--
Curtis

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/ea4f87d6-5b9a-4f82-9486-7a7f4b094607%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: A faster paginator for django

Kye Russell
In reply to this post by Saleem Jaffer
It might also be worth looking at the alternative pagination methods offered by Django REST Framework as a source of inspiration.

On Wednesday, December 5, 2018 at 8:15:22 PM UTC+8, Saleem Jaffer wrote:
Hi all,

The default paginator that comes with Django is inefficient when dealing with large tables. This is because the final query for fetching pages uses "OFFSET" which is basically a linear scan till the last index of the current page. Does it make sense to have a better paginator which does not use "OFFSET". 

If this sounds like a good idea, I have some ideas on how to do it and with some help from you guys I can implement it.

Saleem

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/0f431ffc-5ebf-4703-9e01-91240007c154%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: A faster paginator for django

Ivan Anishchuk
In reply to this post by Saleem Jaffer
Note: I'm only covering postgres here but the idea is pretty universal.

Good pagination starts with good sorting, so I'd suggest starting with having a good primary key. Integer PK could be enough, could be not quite (custom generator + postgres uuid field would be the next logical choice for me, something timestamp-based but small and fast in any case). I'd supplement that with a secondary id (modification/version id, exactly the same as pk but reset on every save). This way you can cover two most important sorting: by creation order and by modification order (you can even implement some simple data versioning if you use both as a composite pk but that's another story). Then, if you need sorting on other fields it would be extremely wise to create indices for every single sorting option (if the field you sort on is not unique, use index_together and pk and/or version id as the last field).

Now, having that, you can start with the actual pagination. Since you always sort by a unique keyset in this scheme, you should be able to paginate using greater than comparison and to make it generic enough let's assume you always sort/paginate by multiple fields (arbitrary number of them). You have to somehow pass the information about which fields you want to sort on and which keyset you want to compare with from client to server, you can invent your own weird way to do it but the best (I think) would be to use standard query filters and (multiple, if needed) order_by. "Cursor" pagination in Django Rest Framework, for example, is essentially this but supports only a single sorting field (cannot use unique keysets to supplement non-unique primary sorting fields) and for reason unknown encodes the filters to base64 bloating the size enormously. With this proposed scheme everything is transparent and flexible if just slightly verbose (you can always do something similar with base64 encoding or whatnot if you want it to become less transparent and longer). Your url would look kinda like this: /view/?order_by=slug&order_by=-id&slug_lt=some_title&id_lt=123&limit=50 (there could be less verbose options like ?o=-field1+field2-field3&k=+some_title&k=-123&l=50 but more urlsafe if you really care about a few bytes, but mostly just make your param names configurable and you should be fine) -- if you absolutely have to use non-unique keysets you can augment this by adding optional offset (small offset with any gt/lt is better than big offset and no filtering). Next, you need to calculate the next/previous pages from the current, there were some links above that explain it at length but basically to keep it stateless you should always query an extra item and use its value with >= for the next page and with < and reversed order for the previous.

Another beauty of keyset pagination is that it doesn't have to be exact. If an object was removed, comparisons will continue working. You can also trivially construct keysets for getting first and last pages (paginating back from the last would produce different pages but I don't think it matters that much in most real-life scenarios).

Keyset has a few disadvantages over page number, mainly no random page access. If you want to have it at the cost of slower queries, you can add optional offset (mentioned above) and run a count(*) query like page number paginator does (or perhaps run some faster estimation). Page number is just a shorter representation of limit-offset pagination when you now the total count, and combined with keyset it would still be much faster on long tables than just limit-offset (or page number).

This is roughly how I plan to implement it although I don't dream of building anything universal, I just want a small reusable module so I don't have to reimplement this in every project that needs it. If you ever release any code, feel free to ping me for review or testing, chances are I like your library and won't have to implement mine :) I have too many library ideas and not enough time to work on them all, collaboration could be much more better.

Ivan.

On Wed, Dec 5, 2018 at 3:15 PM Saleem Jaffer <[hidden email]> wrote:
Hi all,

The default paginator that comes with Django is inefficient when dealing with large tables. This is because the final query for fetching pages uses "OFFSET" which is basically a linear scan till the last index of the current page. Does it make sense to have a better paginator which does not use "OFFSET". 

If this sounds like a good idea, I have some ideas on how to do it and with some help from you guys I can implement it.

Saleem

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/020f66e0-ec58-47e2-be0b-00c3f1688c5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CADPNjZ53PUz8TcLLUQ7YOYUL-zfWeNsfLvrxwwsmNbswRLaakA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.